A Command task is a specific task that allows one or more shell commands (in UNIX) or DOS commands (in Windows) to run during the workflow.
Given below are the three major components of the Workflow Manager:
Task Developer
Worklet Designer
Workflow Designer
7. Describe the impact of multiple join conditions and join order in a Joiner
Transformation.
Answer: We can define one or more conditions based on equality between the
specified master and detail sources. Both ports in a condition must have the same
datatype.
If we need to use two ports in the join condition with non-matching datatypes we
must convert the datatypes so that they match. The Designer validates datatypes in
a join condition.
Additional ports in the join condition increase the time necessary to join two
sources.
The order of the ports in the join condition can impact the performance of the Joiner
transformation. If we use multiple ports in the join condition, the Integration Service
compares the ports in the order we specified.
The types of lookup cache are:
Static Cache
Dynamic Cache
Recache
Persistent Cache
Shared Cache
A Static Cache remains unchanged while a session is running.
10. Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected
to Target tables TGT1 and TGT2 respectively. How do you ensure TGT2 is loaded
after TGT1?
Answer: If we have multiple Source Qualifier transformations connected to multiple
targets, we can designate the order in which the Integration Service loads data into
the targets.
In the Mapping Designer, we need to configure the Target Load Plan based on the
Source Qualifier transformations in the mapping to specify the required loading order.
Task Developer
Worklet Designer
Workflow Designer
12. Differentiate between a repository server and a powerhouse.
Answer: The repository server mainly guarantees repository reliability and uniformity,
while the powerhouse server handles the execution of the various processes against
the server's database repository.
17. How can we create indexes after completing the load process?
Answer: With the help of the Command task at the session level, we can create indexes
after the loading procedure.
18. What are the advantages of using Informatica as an ETL tool over Teradata?
Answer: First up, Informatica is a data integration tool, while Teradata is an MPP
database with some scripting (BTEQ) and fast data movement (FastLoad, MultiLoad,
TPump, Parallel Transporter, etc.) capabilities.
Informatica over Teradata:
1) Metadata repository for the organization's ETL ecosystem. Informatica jobs
(sessions) can be arranged logically into worklets and workflows in folders. This leads
to an ecosystem that is easier to maintain and quicker for architects and analysts to
analyze and enhance.
2) Job monitoring and recovery: it is easy to monitor jobs using the Informatica
Workflow Monitor, and easier to identify and recover failed or slow-running jobs, with
the ability to restart from the failure row/step.
3) Informatica Marketplace: a one-stop shop for lots of tools and accelerators to make
the SDLC faster and improve application support.
4) Plenty of developers in the market with varying skill levels and expertise.
5) Lots of connectors to various databases, including support for Teradata MLoad,
TPump, FastLoad, and Parallel Transporter in addition to the regular (and slow) ODBC
drivers. Some 'exotic' connectors may need to be procured and hence could cost
extra. Examples: PowerExchange for Facebook, Twitter, etc., which source data from
such social media sources.
6) Surrogate key generation through shared sequence generators inside Informatica
could be faster than generating them inside the database.
7) If the company decides to move away from Teradata to another solution, then
vendors like Infosys can execute migration projects to move the data and change the
ETL code to work with the new database quickly, accurately, and efficiently using
automated solutions.
8) Pushdown optimization can be used to process the data in the database.
9) Ability to code ETL such that the processing load is balanced between the ETL
server and the database box; this is useful if the database box is aging and/or the ETL
server has a fast disk and large enough memory and CPU to outperform the database
in certain tasks.
10) Ability to publish processes as web services.
Teradata over Informatica:
Cheaper (initially): no initial ETL tool license costs (which can be significant), and
lower OPEX costs, as one doesn't need to pay for yearly support from Informatica
Corp.
A great choice if all the data to be loaded is available as structured files, which can
then be processed inside the database after an initial stage load.
A good choice for a lower-complexity ecosystem.
Only Teradata developers, or resources with good ANSI/Teradata SQL and BTEQ
knowledge, are required to build and enhance the system.
22. What are the various types of transformations?
Answer:
Aggregator transformation
Expression transformation
Filter transformation
Joiner transformation
Lookup transformation
Normalizer transformation
Rank transformation
Router transformation
Sequence generator transformation
Stored procedure transformation
Sorter transformation
Update strategy transformation
XML source qualifier transformation
23. When do you use SQL override in a lookup transformation?
Answer: You should override the lookup query in the following circumstances:
Override the ORDER BY clause. Create the ORDER BY clause with fewer columns to
increase performance. When you override the ORDER BY clause, you must suppress
the generated ORDER BY clause with a comment notation.
Note: If you use pushdown optimization, you cannot override the ORDER BY clause
or suppress the generated ORDER BY clause with a comment notation.
A lookup table name or column names contains a reserved word. If the table name
or any column name in the lookup query contains a reserved word, you must ensure
that they are enclosed in quotes.
Use parameters and variables. Use parameters and variables when you enter a
lookup SQL override. Use any parameter or variable type that you can define in the
parameter file. You can enter a parameter or variable within the SQL statement, or
use a parameter or variable as the SQL query. For example, you can use a session
parameter, $ParamMyLkpOverride, as the lookup SQL query, and set
$ParamMyLkpOverride to the SQL statement in a parameter file. The designer
cannot expand parameters and variables in the query override and does not validate
it when you use a parameter or variable. The integration service expands the
parameters and variables when you run the session.
A lookup column name contains a slash (/) character. When generating the default
lookup query, the designer and integration service replace any slash character (/) in
the lookup column name with an underscore character. To query lookup column
names containing the slash character, override the default lookup query, replace the
underscore characters with the slash character, and enclose the column name in
double-quotes.
Add a WHERE clause. Use a lookup SQL override to add a WHERE clause to the
default SQL statement. You might want to use the WHERE clause to reduce the
number of rows included in the cache. When you add a WHERE clause to a Lookup
transformation using a dynamic cache, use a Filter transformation before the Lookup
transformation to pass rows into the dynamic cache that match the WHERE clause.
Note: The session fails if you include large object ports in a WHERE clause.
Other. Use a lookup SQL override if you want to query lookup data from multiple
lookups or if you want to modify the data queried from the lookup table before the
Integration Service caches the lookup rows. For example, use TO_CHAR to convert
dates to strings.
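For instance, here is a minimal sketch of such an override (the LKP_CUSTOMER table
and column names are illustrative, not from the source; the trailing comment marker
suppresses the ORDER BY clause that the Integration Service generates):

SELECT CUSTOMER_NAME, CUSTOMER_ID
FROM LKP_CUSTOMER
WHERE ACTIVE_FLAG = 'Y'
ORDER BY CUSTOMER_ID --

The WHERE clause trims the cache, and the "--" at the end comments out the
generated ORDER BY so only the explicit one runs.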
25. What are the transformations that are not supported in Mapplet?
Answer: Normalizer transformations, COBOL sources, XML sources, XML Source
Qualifier transformations, target definitions, pre- and post-session stored procedures,
and other mapplets.
26. Describe data concatenation.
Answer: Data concatenation means bringing different pieces of a record together.
Note: the "Treat source rows as" session property is set to "Data driven" by default
when a mapping contains an Update Strategy transformation.
30. Which transformation builds only a single cache memory?
Answer: Rank builds two types of cache memory (index and data), but Sorter always
builds only one cache memory.
– The cache is also called a buffer.
Advantage:
SELECT query performance increases.
Disadvantage:
Maintenance cost increases due to the larger number of tables.
32. What is the Mapping Debugger?
Answer:
– The Debugger is a tool. By using it we can identify whether records are loaded or
not, and whether correct data is loaded from one transformation to another.
– If a session succeeds but records are not loaded, we have to use the Debugger tool.
i. It is a GUI-based client application that allows users to monitor ETL objects running
on an ETL server.
ii. It collects runtime statistics, such as the rows processed by each transformation.
36. If Informatica has its own scheduler, why use a third-party scheduler?
Answer: Clients use various applications (for example, mainframes and Oracle Apps
use the Tivoli scheduling tool), and integrating those different applications and
scheduling them is much easier with a third-party scheduler.
a. Star schema.
b. Snowflake schema.
c. Galaxy schema.
A schema is a data model that consists of one or more tables.
38. What are the new features of Informatica 9.x at the developer level?
Answer: From a developer’s perspective, some of the new features in Informatica 9.x
are as follows:
Now Lookup can be configured as an active transformation – it can return multiple
rows on a successful match
Now you can write SQL override on an un-cached lookup also. Previously you could
do it only on cached lookup
You can control the size of your session log. In a real-time environment, you can
control the session log file size or time
Database deadlock resilience feature – this ensures that your session does not
immediately fail if it encounters a database deadlock; it will now retry the
operation. You can configure the number of retry attempts.
39. Suppose we do not group by on any ports of the aggregator what will be the
output?
Answer: If we do not group values, the Integration Service returns only the last
row of the input rows.
42. How does a Rank Transform differ from Aggregator Transform functions MAX
and MIN?
Answer: Like the Aggregator transformation, the Rank transformation lets us group
information. The Rank transformation allows us to select a group of top or bottom
values, not just one value as with the Aggregator MAX and MIN functions.
1. All input groups and the output group must have matching ports. The precision,
datatype, and scale must be identical across all groups.
2. We can create multiple input groups, but only one default output group.
3. The Union transformation does not remove duplicate rows.
4. We cannot use a Sequence Generator or Update Strategy transformation
upstream from a Union transformation.
5. The Union transformation does not generate transactions.
1. If there are duplicates in the source database, a user can use the property in
source qualifier. A user must go to the Transformation tab and checkmark the
‘Select Distinct’ option. Also, a user can use SQL override for the same purpose.
The user can go to the Properties tab and in SQL query tab write the distinct query.
2. A user can use Aggregator and select ports as key to getting distinct values. If a
user wishes to find duplicates in the entire column, then all ports should be selected
as a group by key.
3. The user can also use Sorter with Sort distinct property to get distinct values.
4. Expression and Filter transformations can also be used to identify and remove
duplicates if the data is sorted.
5. When the Lookup transformation is configured to use a dynamic cache, a new port
is added to the transformation. This cache is updated as and when data is read. If a
source has duplicate records, the user can look in the dynamic lookup cache and then
use a Router to pass only the distinct records.
1. When two tables from the same source database have a primary key–foreign key
relationship, the sources can be linked to one Source Qualifier transformation.
2. Filtering rows, when the Integration Service adds a WHERE clause to the user's
default query.
3. When a user wants an outer join instead of an inner join, the join information is
entered in the user-defined join property.
4. When sorted ports are specified, the Integration Service adds an ORDER BY clause
to the default query.
5. If a user chooses to find distinct values, the Integration Service uses SELECT
DISTINCT in the default query.
When the data we need to filter is not in a relational source, the user should use a
Filter transformation instead. It lets rows that meet the specified filter condition pass
through and directly drops the rows that do not meet the condition.
Column A
Aanchal
Priya
Karishma
Snehal
Nupura
Step 1: Assign row numbers to each record. Generate the row numbers using an
expression transformation: create a variable port and increment it by 1 for each row.
After this, assign this variable port to an output port. After the expression
transformation, the ports will be as follows:
Variable_count= Variable_count+1
O_count=Variable_count
Create a dummy output port for the same expression transformation and assign 1 to
that port. This dummy port will always return 1 for each row.
Variable_count= Variable_count+1
O_count=Variable_count
Dummy_output=1
Column A O_count Dummy_output
Aanchal 1 1
Priya 2 1
Karishma 3 1
Snehal 4 1
Nupura 5 1
Step 2: Pass the above output to an aggregator and do not specify any group by
port; assign the O_count port to it. The aggregator will return the last row.
This step's final output will have the dummy port with value 1, and O_count
(call it O_total_records) will hold the total number of records in the source. The
aggregator output will be:
Dummy_output O_total_records
1 5
Step 3: Pass this output to a joiner transformation and apply a join on the dummy
port. The "sorted input" property should be checked in the joiner transformation;
only then can the user connect both the expression and aggregator transformations
to the joiner transformation. The joiner output will be:
Column A O_count O_total_records
Aanchal 1 5
Priya 2 5
Karishma 3 5
Snehal 4 5
Nupura 5 5
Step 4: After the joiner transformation, send this output to a filter transformation
with the condition O_count > O_total_records - 3, so that only the last three
records pass through:
Karishma 3 5
Snehal 4 5
Nupura 5 5
The target table has the same structure as the source. We will have two target
tables: one containing the NULL values and the other containing the non-NULL values.
Joiner transformation performance can be improved with a few simple steps:
1. The user must perform joins in the database whenever possible. When for some
tables this is not possible, the user can create a stored procedure and then join the
tables in the database.
2. When data is unsorted, a source with fewer rows should be considered the
master source.
3. For a sorted joiner transformation, a source with fewer duplicate key values
should be considered the master source.
Source Qualifier transformation:
– Can filter rows only from relational sources.
– Limits the row set extracted from a source.
– Enhances performance by minimizing the number of rows used in the mapping.
– The filter condition uses standard SQL and executes in the database.
Filter transformation:
– Can filter rows from any type of source system.
– Limits the row set sent to a target.
– Is added close to the source to filter out the unwanted data early and maximize
performance.
– Defines a condition using any statement or transformation function that returns
either TRUE or FALSE.
i. If the source is DBMS, you can use the property in Source Qualifier to select the
distinct records.
Or
you can also use the SQL Override to perform the same.
ii. You can use an Aggregator and select all the ports as key to get the distinct
values. After you pass all the required ports to the Aggregator, select as group-by
keys all the ports you need for de-duplication. If you want to find the duplicates
based on the entire columns, select all the ports as group-by key.
iii. You can use Sorter and use the Sort Distinct Property to get the distinct
values. Configure the sorter in the following way to enable this.
iv. You can use Expression and Filter transformations to identify and remove
duplicates if your data is sorted. If your data is not sorted, you may first use
a Sorter to sort the data and then apply this logic:
Use one expression transformation to flag the duplicates. We will use the
variable ports to identify the duplicate entries, based on Employee_ID.
v. When you change the property of the Lookup transformation to use the
Dynamic Cache, a new port, NewLookupRow, is added to the transformation.
The dynamic cache is updated as and when the data is read.
If the source has duplicate records, you can also use the dynamic lookup cache and
then a Router to select only the distinct rows.
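A minimal sketch of the duplicate-flagging expression ports from point iv (assuming
the input is sorted on Employee_ID; the port names are illustrative):

-- variable ports evaluate top-down, so the duplicate flag is computed
-- before the previous-key port is overwritten with the current key
V_IS_DUP  := IIF(EMPLOYEE_ID = V_PREV_ID, 1, 0)
V_PREV_ID := EMPLOYEE_ID
O_IS_DUP  := V_IS_DUP  -- output port; a Filter with O_IS_DUP = 0 keeps distinct rows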
3. What are the differences between Source Qualifier and Joiner Transformation?
The Source Qualifier can join data originating from the same source database.
We can join two or more tables with primary key-foreign key relationships by
linking the sources to one Source Qualifier transformation.
Un-cached lookup – Here, the lookup transformation does not create the cache.
For each record, it goes to the lookup source, performs the lookup, and returns a
value. So for 10K rows, it will go to the lookup source 10K times to get the related
values.
Cached lookup – In order to reduce the to-and-fro communication between the
Informatica server and the lookup source, we can configure the lookup
transformation to create the cache. In this way, the entire data from the lookup
source is cached, and all lookups are performed against the cache.
Based on the types of the Caches configured, we can have two types of caches,
Static and Dynamic.
The Integration Service performs differently based on the type of lookup cache
that is configured: an uncached lookup, a static cache, or a dynamic cache.
Persistent Cache
By default, the Lookup caches are deleted post successful completion of the
respective sessions but, we can configure to preserve the caches, to reuse it next
time.
Shared Cache
We can share the lookup cache between multiple transformations. We can share
an unnamed cache between transformations in the same mapping. We can
share a named cache between transformations in the same or different
mappings.
During session configuration, you can select a single database operation for all
rows using the Treat Source Rows As setting from the ‘Properties’ tab of the
session.
Once we have determined how to treat all rows in the session, we can also set
options for individual rows, which gives additional control over how each row
behaves. We need to define these options in the Transformations view on the
Mapping tab of the session properties.
Steps:
1. Design the mapping just like an 'INSERT only' mapping, without a Lookup or
Update Strategy transformation.
2. First set the Treat Source Rows As property as shown in the image below.
3. Next, set the properties for the target table as shown below. Choose the
properties Insert and Update else Insert.
These options will make the session as Update and Insert records without using
Update Strategy in Target Table.
When we need to update a huge table with few records and fewer inserts, we can
use this solution to improve the session performance.
The solution for such situations is not to use a Lookup transformation and
Update Strategy to insert and update records, because the Lookup transformation
may not perform well as the lookup table size increases, which degrades
performance.
9. Why are the Update Strategy and Union transformations active? Explain with
examples.
1. The Update Strategy changes the row types. It can assign the row types based on
the expression created to evaluate the rows, like IIF (ISNULL (CUST_DIM_KEY),
DD_INSERT, DD_UPDATE). This expression changes the row type to Insert for rows in
which CUST_DIM_KEY is NULL and to Update for rows in which CUST_DIM_KEY is
not null.
2. The Update Strategy can reject the rows. Thereby, with proper configuration, we
can also filter out some rows. Hence, sometimes the number of input rows may
not be equal to the number of output rows.
Union Transformation
In union transformation, though the total number of rows passing into the
Union is the same as the total number of rows passing out of it, the positions of
the rows are not preserved, i.e. row number 1 from input stream 1 might not be
row number 1 in the output stream. Union does not even guarantee that the
output is repeatable. Hence it is an Active Transformation.
10. How do you load only null records into target? Explain through mapping
flow.
Let us say, this is our source
Cust_id Cust_name Cust_amount Cust_Place Cust_zip
The target structure is also the same, but we have got two tables: one that will
contain the NULL records and one that will contain the non-NULL records.
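One possible flow (a sketch; it assumes a row counts as a "NULL record" when every
column is null, and the port name O_ALL_NULL is illustrative): pass the ports through
an Expression transformation that sets a flag, then route on that flag.

O_ALL_NULL := IIF(ISNULL(CUST_ID) AND ISNULL(CUST_NAME) AND ISNULL(CUST_AMOUNT)
                  AND ISNULL(CUST_PLACE) AND ISNULL(CUST_ZIP), 1, 0)
-- Router: the group with O_ALL_NULL = 1 goes to the NULL target;
-- the default group goes to the non-NULL target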
3. In the expression transformation make two ports, one "odd" and another "even".
4. Write the expression as below.
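A sketch of those two ports (assuming NEXTVAL comes from a Sequence Generator
connected in an earlier step):

O_ODD  := IIF(MOD(NEXTVAL, 2) = 1, 1, 0)  -- flags rows 1, 3, 5, ...
O_EVEN := IIF(MOD(NEXTVAL, 2) = 0, 1, 0)  -- flags rows 2, 4, 6, ...
-- a Router with groups O_ODD = 1 and O_EVEN = 1 then feeds the two targets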
12. How do you load first and last records into target table? How many ways are
there to do it? Explain through mapping flows.
The idea behind this is to add a sequence number to the records and then take
the Top 1 rank and Bottom 1 Rank from the records.
1. Drag and drop ports from source qualifier to two rank transformations.
2. Create a reusable sequence generator having start value 1 and connect the next
value to both rank transformations.
3. Set the rank properties as follows. The newly added sequence port should be
chosen as the Rank Port. There is no need to select any port as the Group By port.
Rank – 1: set the Top/Bottom property to Top and Number of Ranks to 1.
4. Rank – 2: set the Top/Bottom property to Bottom and Number of Ranks to 1.
13. I have 100 records in the source table, but I want to load 1, 5, 10, 15, 20 ... 100
into the target table. How can I do this? Explain with a detailed mapping flow.
This is applicable for any n = 2, 3, 4, 5, 6 ...; for our example, n = 5. We can apply
the same logic for any n.
The idea behind this is to add a sequence number to the records and divide the
sequence number by n (for this case, it is 5). If completely divisible, i.e. no
remainder, then send them to one target else, send them to the other one.
3. In the expression create a new port (validate) and write the expression as in the
sketch below.
4. Connect a filter transformation to the expression and write the condition in the
property as in the sketch below.
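A sketch of the validate port and the filter condition (assuming NEXTVAL comes from
a Sequence Generator starting at 1, and n = 5):

O_VALIDATE := MOD(NEXTVAL, 5)
-- Filter condition: O_VALIDATE = 0 passes rows 5, 10, 15, ... 100;
-- use O_VALIDATE = 1 instead if row 1 must also be included, as the question states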
14. How do you load unique records into one target table and duplicate records
into a different target table?
Source Table:
COL1 COL2 COL3
a b c
x y z
a b c
r f u
a b c
v f r
v f r
a b c
x y z
r f u
v f r
a b c
a b c
v f r
2. In aggregator transformation, group by the key column and add a new port. Call
it count_rec to count the key column.
3. Connect a router to the aggregator from the previous step. In router make two
groups: one named “original” and another as “duplicate”.
In original write count_rec=1 and in duplicate write count_rec>1.
The picture below depicts the group name and the filter conditions.
Connect two groups to corresponding target tables.
16. I have two tables with different source structures, but I want to load them into a
single target table. How do I go about it? Explain in detail through a mapping flow.
We can use a joiner if we want to join the data sources. Use a joiner and use the
matching column to join the tables.
We can also use a Union transformation if the tables have some common
columns and we need to join the data vertically. Create one Union
transformation, add the matching ports from the two sources to two different
input groups, and send the output group to the target.
The basic idea here is to use either a Joiner or a Union transformation to move the
data from two sources to a single target. Based on the requirement, we can
decide which one should be used.
SQL approach (a sketch completing the truncated query, assuming the usual
EMPLOYEES sample schema with department_id and salary columns):
SELECT * FROM (
    SELECT e.*, RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS sal_rank
    FROM employees e
)
WHERE sal_rank <= 3;
Informatica approach: use a Sorter (salary descending) followed by a Rank
transformation grouped by department, with the Top/Bottom property set to Top
and Number of Ranks set to 3.
This will give us the top 3 employees earning the maximum salary in their respective
departments.
18. How do you convert a single row from the source into three rows in the target?
We can use Normalizer transformation for this. If we do not want to use
Normalizer, then there is one alternate way for this.
We have a source table containing 3 columns: Col1, Col2 and Col3. There is only
1 row in the table, as follows:
a b c
The target table contains only 1 column, Col. Design a mapping so that the
target table contains 3 rows, as follows:
Col
a
2. Create a Union transformation with three input groups, configured in the
Group Ports tab.
3. Connect the sources with the three input groups of the union transformation.
20. How to join three sources using joiner? Explain though mapping flow.
We cannot join more than two sources using a single joiner. To join three
sources, we need to have two joiner transformations.
Let's say we want to join three tables – Employees, Departments and Locations –
using a Joiner. We will need two joiners. Joiner-1 will join Employees and
Departments, and Joiner-2 will join the output from Joiner-1 and the Locations
table.
3. Create the next joiner, Joiner-2. Take the Output from Joiner-1 and ports from
Locations Table and bring them to Joiner-2. Join these two data sources using
Location_ID.
4. The last step is to send the required ports from the Joiner-2 to the target or via
an expression transformation to the target table.
22. What are the types of schemas we have in a data warehouse, and what are the
differences between them?
There are three different data models:
1. Star schema
Here, the Sales fact table is a fact table, and the surrogate keys of each dimension
table are referenced here through foreign keys: for example, time key, item key,
branch key, location key. The fact table is surrounded by the dimension tables
such as Branch, Location, Time and Item. In the fact table there are dimension
keys such as time_key, item_key, branch_key and location_key, and the measures
are units_sold, dollars_sold and average_sales. Usually, the fact table consists of
more rows than the dimensions because it contains all the primary keys of the
dimensions along with its own measures.
2. Snowflake schema
The snowflake schema is a variant of the star schema in which some dimension
tables are normalized into multiple related tables.
3. Fact constellation (galaxy) schema
In a fact constellation, there are many fact tables sharing the same dimension
tables. This example illustrates a fact constellation in which the fact tables Sales
and Shipping share the dimension tables Time, Branch and Item.
A dimension table consists of the attributes about the facts. Dimensions store
the textual descriptions of the business. Without the dimensions, we cannot
measure the facts. The different types of dimension tables are explained in
detail below.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table
to which they are joined.
Eg: The date dimension table connected to the sales facts is identical to the date
dimension connected to the inventory facts.
Junk Dimension:
A junk dimension is a collection of random transactional codes, flags, and/or text
attributes that are unrelated to any particular dimension. The junk dimension is
simply a structure that provides a convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension. In
the fact table we need to maintain two keys referring to these dimensions.
Instead of that create a junk dimension which has all the combinations of gender
and marital status (cross join gender and marital status table and create a junk
table). Now we can maintain only one key in the fact table.
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and
doesn’t have its own dimension table.
Eg: A transactional code in a fact table.
Role-playing dimension:
Dimensions which are often used for multiple purposes within the same
database are called role-playing dimensions. For example, a date dimension can
be used for “date of sale”, as well as “date of delivery”, or “date of hire”.
A fact table is the one which consists of the measurements, metrics or facts of
business process. These measurable facts are used to know the business value
and to forecast the future business. The different types of facts are explained in
detail below.
Additive:
Additive facts are facts that can be summed up through all of the dimensions in
the fact table. A sales fact is a good example for additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions
in the fact table, but not the others.
Eg: Daily balances fact can be summed up through the customers dimension but
not through the time dimension.
Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the
dimensions present in the fact table.
Eg: Facts which have percentages, ratios calculated.
A fact table that contains aggregated facts is often called a summary table.
The SCD Type 1 methodology overwrites old data with new data, and therefore
does not need to track historical data.
4. Connect lookup to source. In Lookup fetch the data from target table and send
only CUSTOMER_ID port from source to lookup.
5. Give the lookup condition like this:
6. Then, send rest of the columns from source to one router transformation.
7. In router create two groups and give condition like this:
8. For new records we have to generate a new customer_id. For that, take a sequence
generator and connect the NEXTVAL column to an expression. Connect the New_rec
group from the router to Target1 (bring two instances of the target into the mapping,
one for new records and the other for old records). Then connect NEXTVAL from the
expression to the customer_id column of the target.
9. Bring the Change_rec group of the router to one Update Strategy and give the
condition like this:
10. Instead of 1 you can give DD_UPDATE in the Update Strategy, and then connect
to the target.
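A sketch of the conditions referenced in the steps above (assuming the lookup
returns the target's CUST_DIM_KEY; port names are illustrative):
– Lookup condition: CUSTOMER_ID = IN_CUSTOMER_ID
– Router group New_rec: ISNULL(CUST_DIM_KEY)
– Router group Change_rec: NOT ISNULL(CUST_DIM_KEY)
– Update Strategy expression for Change_rec: DD_UPDATE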
In a Type 2 Slowly Changing Dimension, if a record is added to the existing
table with new information, then both the original and the new record will be
present, with the new record having its own primary key.
4. All the procedures are similar to the SCD Type 1 mapping. The only difference is
that the New_rec group from the router goes to one Update Strategy with the
condition DD_INSERT, and a new surrogate key (new_pm) and version_no are added
before sending to the target.
5. The Old_rec group also goes to an Update Strategy with the condition DD_INSERT
and is then sent to the target (as a new version of the row).
Target load order (or) Target load plan is used to specify the order in which the
integration service loads the targets. You can specify a target load order based
on the source qualifier transformations in a mapping. If you have multiple
source qualifier transformations connected to multiple targets, you can specify
the order in which the integration service loads the data into the targets.
Target load order will be useful when the data of one target depends on the
data of another target. For example, the employees table data depends on the
departments data because of the primary-key and foreign-key relationship. So,
the departments table should be loaded first and then the employees table.
Target load order is useful when you want to maintain referential integrity when
inserting, deleting or updating tables that have the primary key and foreign key
constraints.
You can set the target load order or plan in the Mapping Designer: open the
mapping, select Mappings > Target Load Plan, and arrange the Source Qualifiers
in the required order.
30. Write the Unconnected lookup syntax and how to return more than one
column.
We can only return one port from an Unconnected Lookup transformation. As
the Unconnected Lookup is called from another transformation, we cannot
return multiple columns using it.
However, there is a trick. We can use the SQL override and concatenate the
multiple columns we need to return. When we call the lookup from another
transformation, we separate the columns again using substring.
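A sketch of the call syntax and the concatenation trick (the lookup name, table and
columns are illustrative):

-- calling an unconnected lookup from an expression port:
:LKP.LKP_GET_CUSTOMER(IN_CUSTOMER_ID)

-- lookup SQL override concatenating two columns with a delimiter:
SELECT first_name || '~' || last_name AS name_concat, customer_id
FROM customers

-- after the call, split name_concat again using SUBSTR/INSTR on the '~' delimiter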
I am pretty confident that after going through both of these Informatica interview
questions blogs, you will be fully prepared to take an Informatica interview without
any hiccups. If you wish to deep dive into Informatica with use cases, I recommend
you go through our website and enrol at the earliest.
Numerous transformations
Difficult requirements
Complex business logic
6. How do you load alternate records into different tables through a mapping
flow?
The concept is to add a sequence number to the records and then divide the record
number by 2. If it is evenly divisible, move the record to one target; if not, move it
to the other target. The steps are as follows:
Unit Testing
In unit testing, what we need to do is something like below:

Test case ID | Attribute | Check | Test data | Expected result | Status
STG_SCHM_DTLS_001 | LOAN_TYPE_ID | Should be NOT NULL; first 5 characters alphabetic (INSCH) and the remaining characters numeric; total length 16 | INSCH00000000002 | Record accepted | PASS
STG_SCHM_DTLS_002 | LOAN_TYPE_ID | Reject when NULL, first 5 characters not (INSCH), remaining characters non-numeric, or length <> 16 | INSCP001000000002 | Record rejected; inserted into the rejected file with an ERROR_ID and ERROR_DETAILS into ERROR_TABLE | PASS
STG_SCHM_DTLS_003 | LOAN_COMPANY_ID | Must be NOT NULL; first 4 characters alphabetic (INCO) and last 11 characters numeric; total length 15 | INCO00000000003 | Record accepted | PASS
STG_SCHM_DTLS_004 | LOAN_COMPANY_ID | Reject when NULL, first 4 characters not (INCO), last 11 characters non-numeric, or length <> 15 | INSO00000060003 | Record rejected; inserted into the rejected file with an ERROR_ID and ERROR_DETAILS into ERROR_TABLE | PASS
STG_SCHM_DTLS_005 | START_DATE | Should be a valid date | 12/9/1988 | Record accepted | PASS
STG_SCHM_DTLS_006 | START_DATE | Should not be loaded when it is not a valid date | 33FeB/88 | Record rejected; inserted into the rejected file with an ERROR_ID and ERROR_DETAILS into ERROR_TABLE | PASS
STG_SCHM_DTLS_007 | SCHEME_DESC | Should be alphabetic | AUTOMOBILE | Record accepted | PASS
STG_SCHM_DTLS_008 | SCHEME_DESC | Reject when the scheme description is not alphabetic | MOTO124 | Record rejected; inserted into the rejected file with an ERROR_ID and ERROR_DETAILS into ERROR_TABLE | PASS
STG_SCHM_DTLS_009 | PREMIUM_PER_LACS | Should be numeric | 5000 | Record accepted | PASS

In SCD Type 3, two columns should be added to identify a single attribute: it stores
one-time historical data along with the current data.
Extracting First Name and Last Name from Ename
Scenario: Suppose in the ename column there is a first name and a last name. In the
target we have to separate the ename column into firstname and lastname:
empno ename
1 Sekher
2 Prasad
1. Drag the source to the mapping area and connect it with an expression
transformation as shown below.
2. In the expression transformation create two output ports, one f_name and the
other l_name.
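A sketch of the two ports (assuming the first and last name are separated by a
single space):

F_NAME := SUBSTR(ENAME, 1, INSTR(ENAME, ' ') - 1)   -- text before the first space
L_NAME := SUBSTR(ENAME, INSTR(ENAME, ' ') + 1)      -- text after the first space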
Source
E_NO JOIN_DATE
------- ---------
1 07-JUL-11
2 05-JUL-11
3 05-MAY-11
If the current month is July 2011, then the target will be like this.
Target
E_NO JOIN_DATE
------- ---------
1 07-JUL-11
2 05-JUL-11
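A sketch of the filter condition for this scenario (SYSDATE is the built-in Informatica
system variable):

TO_CHAR(JOIN_DATE, 'MM-YYYY') = TO_CHAR(SYSDATE, 'MM-YYYY')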
Source
E_NO YEAR DAYNO
------ --------- - ---------
1 01-JAN-07 301
2 01-JAN-08 200
The year column is a date and dayno is a numeric value that represents a day (as in
365 for 31-Dec of that year). Convert the dayno to the corresponding year's month
and date and then send it to the target.
Target
E_NO YEAR_MONTH_DAY
------ --------- ----------
1 29-OCT-07
2 19-JUL-08
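A sketch of the conversion that matches the sample output above (ADD_TO_DATE is
the standard Informatica function; YEAR here is the 01-JAN date port):

YEAR_MONTH_DAY := ADD_TO_DATE(YEAR, 'DD', DAYNO)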
Scenario: From the order_delivery table, insert the records into the target where the
day difference between order_date and delivery_date is greater than 2 days. (Note:
see the last article, where we discussed finding the time in hours between two dates.)
Source
ORDER_NO ORDER_DATE DELIVERY_DATE
--------- --------- ---------
2 11-JAN-83 13-JAN-83
3 04-FEB-83 07-FEB-83
1 08-DEC-81 09-DEC-81
Target
ORDER_NO ORDER_DATE DELIVERY_ DATE
--------- -------- ------ --- ----------
2 11-JAN-83 13-JAN-83
3 04-FEB-83 07-FEB-83
We have to calculate the difference between order_date and delivery_date in hours
and send it to the target.
The output will be:
2. In the expression create one output port "diff" and make it integer type.
3. In that port write the condition like this and send it to the target.
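A sketch of that port, using the standard DATE_DIFF function (it returns the
difference in the units given by the format string):

DIFF := DATE_DIFF(DELIVERY_DATE, ORDER_DATE, 'HH')
-- for the previous scenario's filter, the day version would be:
-- DATE_DIFF(DELIVERY_DATE, ORDER_DATE, 'DD') > 2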
Check whether the Hire_Date is a date or not
2. In the expression create another output port hire_date1 and make it the date
data type, as shown in the picture.
3. In hire_date1 write the condition like this.
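A sketch of that condition (the 'MM/DD/YYYY' format is an assumption; use whatever
format the incoming string should match):

HIRE_DATE1 := IIF(IS_DATE(HIRE_DATE, 'MM/DD/YYYY'),
                  TO_DATE(HIRE_DATE, 'MM/DD/YYYY'), NULL)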
Scenario: Suppose you are importing a flat file emp.csv and the hire_date column is
in numeric format, like 20101111. Our objective is to convert it to a date with the
format 'YYYYMMDD'.
source
EMPNO HIRE_DATE(numeric)
------- -----------
1 20101111
2 20090909
target
EMPNO HIRE_DATE (date)
------ -----------
1 11/11/2010
2 09/09/2009
1. Connect the SQF to an expression.
2. In the expression make hire_date input-only and make another port hire_date1
an output port with the date data type.
3. In the output port hire_date1 write the condition as below.
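A sketch of that condition (TO_CHAR first turns the number into a string so TO_DATE
can parse it):

HIRE_DATE1 := TO_DATE(TO_CHAR(HIRE_DATE), 'YYYYMMDD')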
The way I achieved this: for each of the vowels in ename, I replaced it with null, and
in the total vowel count port I subtract the length of each vowel-stripped port from
the ename length, which gives me the individual count of each vowel; adding these
up for all vowels gives the total number of vowels present. Here are all the variable
ports.
For A write REPLACECHR(0,ENAME,'a',NULL)
For E write REPLACECHR(0,ENAME,'e',NULL)
For I write REPLACECHR(0,ENAME,'i',NULL)
For O write REPLACECHR(0,ENAME,'o',NULL)
For U write REPLACECHR(0,ENAME,'u',NULL)
And for o/p column total_vowels_count write expression like this
(length(ENAME)-length(A))
+
(length(ENAME)-length(E))
+
(length(ENAME)-length(I))
+
(length(ENAME)-length(O))
+
(length(ENAME)-length(U))
Scenario: There is an emp table, and from that table insert the data into the target
where sal < 3000 and reject the other rows.
Q24 The Emp table contains the salary and commission in USD; in the target, the
comm and sal will be converted to a given currency prefix, e.g. Rs.
Source
Target
2. In the expression make an output port sal1 and make sal an input-only port.
3. In sal1 write the condition as below.
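A sketch of that port (|| is the Informatica string concatenation operator):

SAL1 := 'Rs.' || TO_CHAR(SAL)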
Q23 In the source there are some records. Suppose I want to send them to three
targets: the first record goes to the first target, the second to the second target,
the third record to the third target, then the 4th to the 1st, the 5th to the 2nd, the
6th to the 3rd, and so on.
3. Drag all output ports of the expression to a router. In the router make three
groups and give the conditions like this.
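A sketch of those group conditions (assuming NEXTVAL comes from a Sequence
Generator starting at 1):

Group1: MOD(NEXTVAL, 3) = 1  -- rows 1, 4, 7, ...
Group2: MOD(NEXTVAL, 3) = 2  -- rows 2, 5, 8, ...
Group3: MOD(NEXTVAL, 3) = 0  -- rows 3, 6, 9, ...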
Currency convertor
Q22 Suppose that a source contains a column which holds the salary information
prefixed with the currency code , for example
EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO
7369 SMITH CLERK 7902 17-DEC-80 $300 20
7499 ALLEN SALESMAN 7698 20-FEB-81 £1600 30
7521 WARD SALESMAN 7698 22-FEB-81 ¥8500 30
In the target the different currencies will evaluate to a single currency value, for
example converting all to rupees.
1. The first thing to consider is that there are different types of currency, like pound,
dollar, yen, etc., so it's a good idea to use mapping parameters or variables. Go to
Mappings => Mapping Parameters and Variables, then create three parameters (for
this example) and set their initial values as below.
2. Then drag the source to the mapping area and connect it to an expression
transformation.
3. In the expression create an output port sal1 and make sal input-only, as below.
IIF(INSTR(SAL, '$') != 0, TO_INTEGER(SUBSTR(SAL, INSTR(SAL, '$') + 1, LENGTH(SAL) - 1)) * $$DOLAR,
IIF(INSTR(SAL, '£') != 0, TO_INTEGER(SUBSTR(SAL, INSTR(SAL, '£') + 1, LENGTH(SAL) - 1)) * $$POUND,
IIF(INSTR(SAL, '¥') != 0, TO_INTEGER(SUBSTR(SAL, INSTR(SAL, '¥') + 1, LENGTH(SAL) - 1)) * $$YEN)))
$$DOLAR, $$POUND and $$YEN are mapping parameters. You can multiply by the
price in rupees directly, for example the dollar price in rupees, i.e. 48.
5. Connect required output port from expression to target directly. And run the
session.
Q21: Reading a source file with the salary prefixed with $; in the target, the Sal
column must be stored as a number.
Source
EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO
7369 SMITH CLERK 7902 17-DEC-80 $800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 $1600 30
Target
EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 30
1. Drag the source to the mapping area and connect each port to an expression
transformation.
2. In the expression transformation add a new column sal1, make it an output port,
and make sal input-only, as shown in the picture.
3. In the expression write the condition like this.
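A sketch of that condition (LTRIM strips the leading '$' before the numeric
conversion):

SAL1 := TO_DECIMAL(LTRIM(SAL, '$'))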
Solution:
2. Create a mapping as shown in the figure (I have considered a simple scenario
where a particular department id will be filtered to the target).
3. In the filter set deptno = $$v1 (that means only dept no 20 records will go to the
target).
4. A mapping parameter's value can't change throughout the session, but a variable
can be changed. We can change a variable value by using a text file. I'll show it in
the next scenario.
Solution:
1. In the repository go to the menu "Tools", then "Queries". The Query Browser
dialog box will appear. Then click on the New button.
2. In the Query Editor, choose the folder name and object type as I have shown in
the picture.
Solution:
1. Drag your target file to the target designer and add a column as shown in the
picture. It's not a normal column: click on the 'Add FileName to the table' property
(I have given a red mark there).
2. Then drag your source to mapping area and connect it to an expression
transformation.
3. In expression transformation add a new port as string data type and make it output
port.
4. In that output port write the condition as described below and then map it to the
filename port of the target. Also send the other ports to the target. Finally run the
session. You will find two files, one with the sysdate and another '.out' file, which
you can delete.
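A sketch of that output port (the file name pattern is illustrative; SESSSTARTTIME is
the built-in session start timestamp):

O_FILENAME := 'EMP_' || TO_CHAR(SESSSTARTTIME, 'YYYYMMDD') || '.dat'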
Target table rows, with each row as the sum of all previous rows from the source
table.
Scenario: How to produce rows in the target table with every row as the sum of all
previous rows in the source table? See the source and target tables to understand
the scenario.
SOURCE TABLE
Id Sal
1 200
2 300
3 500
4 560
TARGET TABLE
Id Sal
1 200
2 500
3 1000
4 1560
2. In the expression add one column, make it an output port (sal1), and make the
sal port input-only. We will make use of a function named CUME() to solve our
problem, rather than using any complex mapping. Write the expression in sal1 as
CUME(SAL) and send the output rows to the target.
Concatenation of duplicate value by comma separation
Scenario: You have two columns in source table T1, in which col2 may contain
duplicate values. All the duplicate values in col2 will be transformed into
comma-separated values in column col2 of target table T2.
Source Table: T1
Col1 Col2
A x
B y
C z
A m
B n
Target Table: T2
col1 col2
A x,m
B y,n
C z
Solution:
1. We have to use the following transformations as below. First connect a sorter
transformation to the source and make col1 the key, with ascending order. After
that connect it to an expression transformation.
2. In the expression make four new ports and give them names as in the picture
below.
3. In concat_val write the expression as described below and send it to an aggregator.
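A sketch of those ports (variable ports; it assumes the input is sorted on col1, and
the aggregator that follows groups by col1 and keeps the last row per group):

V_CONCAT_VAL := IIF(COL1 = V_PREV_COL1, V_CONCAT_VAL || ',' || COL2, COL2)
V_PREV_COL1  := COL1
O_CONCAT_VAL := V_CONCAT_VAL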
Scenario: There is a source table and 3 destination tables T1, T2, T3. How to insert
records 1 to 10 into T1, records 11 to 20 into T2 and 21 to 30 into T3, then again
31 to 40 into T1, 41 to 50 into T2 and 51 to 60 into T3, and so on, i.e. in cyclic order.
Solution:
1. Drag the source and connect it to an expression. Connect the next value port of a
sequence generator to the expression.
2. Send all the ports to a router and make three groups as below:
Group1
Group2
Group3
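A sketch of those group conditions (assuming NEXTVAL starts at 1; each block of 10
rows cycles through the three targets):

Group1: MOD(TRUNC((NEXTVAL - 1) / 10), 3) = 0  -- rows 1-10, 31-40, ...
Group2: MOD(TRUNC((NEXTVAL - 1) / 10), 3) = 1  -- rows 11-20, 41-50, ...
Group3: MOD(TRUNC((NEXTVAL - 1) / 10), 3) = 2  -- rows 21-30, 51-60, ...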
Extracting every nth row
Scenario: How to load every nth row from a flat file / relational DB to the target?
Suppose n = 3; then the rows numbered 3, 6, 9, 12, and so on go to the target
table.
Solution:
2. In the expression create a new port (validate) and write the expression like in
the picture below.
3. Connect a filter transformation to the expression and write the condition in the
property like in the picture below.
Scenario 13: There are 4 departments in the Emp table: the first with 100, the 2nd
with 5, the 3rd with 30 and the 4th with 12 employees. Extract those dept numbers
which have more than 5 employees, to a target table.
Solution:
1. Put the source to mapping and connect the ports to aggregator transformation.
4. In the expression make four output ports (dept10, dept20, dept30, dept40) to
validate the dept no, and provide the expression accordingly (see the sketch after
this list).
5. Then connect to router transformation. And create a group and fill condition like
below.
6. Finally connect to target table having one column that is dept no.
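A simpler equivalent sketch of the validation the missing pictures convey (port
names are illustrative):

-- in the Aggregator, group by DEPTNO and add:
O_COUNT := COUNT(EMPNO)
-- in the Router group, keep departments with more than 5 employees:
O_COUNT > 5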
Solution:
sorter properties
3. Add the next value of the sequence generator to the expression (start the value
from 1 in the sequence generator).
sorter to exp mapping
4. Connect the expression transformation to a filter or router. In the property, set
the condition as follows –
Solution:
Step 1: Drag the source to mapping.
Step 2: Connect the router transformation to source and in router make 4 groups and
give condition like below.
router transformation
Scenario 10: How to separate the original records from the source table into a
separate target table by using a rank transformation?
Source Table
Col1 Col2 Col3
a b c
X y z
A B c
R F u
A B c
V F r
V F r
Target Table
Col1 Col2 Col3
A B c
X Y z
R F u
V F r
Solution:
Step 1: Bring the source to mapping.
Solution:
Step 1: Drag the source and connect to an expression transformation.
Step 2: Add the next value of a sequence generator to the expression transformation.
Step 3: In the expression transformation make two ports, one "odd" and another
"even", and write the expression like below.
expression property
Step 4: Connect a router transformation to expression.
Make two groups in the router and give the conditions like below.
rtr property
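The same odd/even sketch given earlier applies here (NEXTVAL from the sequence
generator of Step 2):

O_ODD  := IIF(MOD(NEXTVAL, 2) = 1, 1, 0)
O_EVEN := IIF(MOD(NEXTVAL, 2) = 0, 1, 0)
-- Router groups: ODD -> O_ODD = 1, EVEN -> O_EVEN = 1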
Solution
Step 1: Drag and drop the source to mapping.
Step 2: In the Source Qualifier SQL query, write:
select * from emp
minus
select * from emp where rownum <= (select count(*)/2 from emp)
Step 3: Then connect to the target, and run the mapping to see the results.
Solution:
3. Then connect to the target. Now you are ready to run the mapping and see it in
action.
Solution
Step 4: Choose the advanced option. Set 'Number of initial rows to skip' to 1 (it can
be more, as per the requirement).
adv properties
Scenario 4:
Solution:
Step 1: Drag and drop ports from source qualifier to two rank transformations.
Step 2: Create a reusable sequence generator having start value 1 and connect the next
value to both rank transformations.
Step 3: Set the rank properties as follows:
In Rank1, choose Top as the Top/Bottom property and set Number of Ranks to 1.
In Rank2, choose Bottom and set Number of Ranks to 1.
Step 4: Make two instances of the target.