
Joiner transformation vs SQL override

There is a simple reason why joining tables in the Source Qualifier (SQ) is better than joining
them in a Joiner transformation:
since you are going to join the tables on some join condition anyway, doing it in the
Source Qualifier prevents unnecessary records from passing through the pipeline until
they reach the Joiner transformation. Otherwise the Informatica server processes those
records needlessly, consuming memory and degrading mapping performance.

Base the decision on performance: if the number of records in the two tables is small,
you can use a Joiner; if you want the database server to do the bulk of the work, join
in the Source Qualifier.

The easiest answer is YES and NO. SQL overrides are better in some cases, Joiners
are better in others.

1. Assuming the source database has zero performance issues, a SQL override is
almost *ALWAYS* superior to a Joiner, because a database engine doesn't have the
overhead that Informatica has, for example, moving data from DB storage over
the network (usually) and writing the data to cache files on the Informatica server
to accomplish the join. Now -- database engines do similar tasks internally, but
they are highly optimized. It's their job :-)

** My soapbox :-) You shouldn't automatically default to "SQL overrides". You
can define the tables and fill in the "user defined join" and "source filter"
sections of the SQ object instead. There are some benefits to this (mostly ideological):
Informatica can generate the query dynamically based on these settings.

2) If your source database is overwhelmed, underpowered, or both, then the Joiner is
typically a better option, because a plain select statement does not load the machine
as much as joining the tables there would.

3) If your source data is *HUGE* then the Joiner is not the best option, for the
reasons listed in #1, plus the fact that it will require huge amounts of disk
space. If you have to join millions of rows each run, be prepared to dedicate
100 gigs to the cache.

4) Another option people forget is simply doing lookups. In a typical
data warehouse environment, I'll read from the fact table with a few filters and do
lookups on the dimensions. Newcomers seem to find this easier to understand
(especially support staff).

In our test environment, Oracle 10g performed the JOIN operation 24% faster than the
Informatica Joiner transformation without an index, and 42% faster with a database
index.
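The push-down argument in point 1 can be sketched outside Informatica with a small
example (using SQLite via Python purely as a stand-in for the source database; the
table and column names here are invented for illustration): when the database does the
join, only the matched rows ever travel to the client, instead of both full tables.

```python
import sqlite3

# In-memory stand-in for the source database (hypothetical tables).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(order_id INTEGER, cust_id INTEGER, amount INTEGER);
    CREATE TABLE customers(cust_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 100, 50), (2, 101, 75), (3, 999, 20);
    INSERT INTO customers VALUES (100, 'EAST'), (101, 'WEST');
""")

# Joiner-style: pull both tables to the client, then join them there.
orders = con.execute("SELECT * FROM orders").fetchall()
customers = dict(con.execute("SELECT * FROM customers").fetchall())
client_join = [(o, customers[c]) for o, c, _ in orders if c in customers]

# SQ-override style: the database performs the join; only matches travel.
db_join = con.execute("""
    SELECT o.order_id, c.region
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
""").fetchall()

print(len(orders), "rows pulled for the client-side join;",
      len(db_join), "rows pulled when the database joins")
```

The two approaches produce the same result set; the difference is only in how many
rows cross the wire, which is exactly the point made about the pipeline above.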

In Oracle there are many ways to delete duplicate records. Note that the examples
below are meant only to illustrate the different possibilities.

Consider the EMP table with the rows below:

create table emp(
EMPNO integer,
EMPNAME varchar2(20),
SALARY number);

10 Bill 2000
11 Bill 2000
12 Mark 3000
12 Mark 3000
12 Mark 3000
13 Tom 4000
14 Tom 5000
15 Susan 5000

1. Using rowid

SQL > delete from emp
where rowid not in
(select max(rowid) from emp group by empno);

This technique can be applied to almost all scenarios. The group by should be on
the columns that identify the duplicates.
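SQLite also exposes a rowid pseudo-column, so this technique can be tried end-to-end
with Python's built-in sqlite3 module. This is only a sketch against the sample EMP
data above; keeping MAX(rowid) means the last-inserted copy of each empno survives.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp(empno INTEGER, empname TEXT, salary INTEGER)")
rows = [(10, 'Bill', 2000), (11, 'Bill', 2000),
        (12, 'Mark', 3000), (12, 'Mark', 3000), (12, 'Mark', 3000),
        (13, 'Tom', 4000), (14, 'Tom', 5000), (15, 'Susan', 5000)]
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# Keep only the highest rowid per empno; delete every other copy.
con.execute("""
    DELETE FROM emp
    WHERE rowid NOT IN (SELECT MAX(rowid) FROM emp GROUP BY empno)
""")

survivors = con.execute("SELECT empno FROM emp ORDER BY empno").fetchall()
print([e for (e,) in survivors])   # → [10, 11, 12, 13, 14, 15]
```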

2. Using self-join

SQL > delete from emp e1
where rowid not in
(select max(rowid) from emp e2
where e1.empno = e2.empno);

3. Using row_number()

SQL > delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);

This is another efficient way to delete duplicates.
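The row_number() variant runs essentially unchanged on SQLite 3.25+ (which added
window functions), so it can also be checked from Python. As before, this is just a
sketch with a few sample rows: copies ranked greater than 1 within an empno partition
are the duplicates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp(empno INTEGER, empname TEXT, salary INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(12, 'Mark', 3000)] * 3 + [(13, 'Tom', 4000)])

# Number the copies within each empno; every row with rn > 1 is a duplicate.
con.execute("""
    DELETE FROM emp WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   row_number() OVER (PARTITION BY empno ORDER BY empno) AS rn
            FROM emp)
        WHERE rn > 1)
""")

print(con.execute(
    "SELECT empno, COUNT(*) FROM emp GROUP BY empno ORDER BY empno").fetchall())
# → [(12, 1), (13, 1)]
```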

4. Using dense_rank()

SQL > delete from emp where rowid in
(
select rid from
(
select rowid rid,
dense_rank() over(partition by empno order by rowid) rn
from emp
)
where rn > 1
);

Here you can use either rank() or dense_rank(), since both give unique values when
ordering by rowid.
5. Using group by

Consider the EMP table with the rows below:

10 Bill 2000
11 Bill 2000
12 Mark 3000
13 Mark 3000

SQL > delete from emp where
(empno,empname,salary) in
(
select max(empno),empname,salary from emp
group by empname,salary
);

This technique is only applicable in a few scenarios.
