
SORTER
__ Sorts rows in ascending or descending order and
can also return distinct records.

RANK
__ Top or bottom 'N' analysis.
JOINER
__ Joins two sources coming from different locations
or from the same location.

FILTER
__ Filters out the rows that do not meet the condition.
ROUTER
__ Useful for testing multiple conditions.
AGGREGATOR
__ Performs group calculations such as count,
max, min, sum and avg (mainly calculations
on multiple rows or groups).

NORMALIZER
__ Reads COBOL files (denormalized format).
Splits a single row into multiple rows.

SOURCE QUALIFIER
__ Performs many tasks such as overriding the default
SQL query, filtering records, and joining data from
two or more tables.
Represents the flat file or relational data.

UNION
__ Merges data from multiple sources, similar to
the UNION ALL SQL statement that combines the
results from two or more SQL statements. Like
the UNION ALL statement, the Union
transformation does not remove duplicate rows.

EXPRESSION
__ You can use the Expression transformation to
calculate values in a single row before you write
to the target.

LOOK UP
__ Use a Lookup transformation in a mapping to look
up data in a flat file or a relational table, view, or
synonym.

STORED PROCEDURE
__ Stored procedures automate tasks that are too
complicated for standard SQL statements. You can
call them by using the Stored Procedure transformation.

XML SOURCE QUALIFIER
__ When you add an XML source definition to a
mapping, you need to connect it to an XML Source
Qualifier transformation.

UPDATE STRATEGY
__ Flags rows for insert, delete, update, or reject.

ARCHITECTURE
Data Warehouse Architecture
STAR SCHEMA

Star schema architecture is the simplest
data warehouse design. Its main feature
is a table at the center, called the fact
table, surrounded by dimension tables that
allow browsing of specific categories,
summarizing, drill-downs and specifying
criteria.
Typically, most of the fact tables in a star
schema are in database third normal form,
while dimension tables are de-normalized
(second normal form).
Fact table
The fact table is not a typical relational
database table, as it is de-normalized on
purpose to enhance query response
times. The fact table typically contains
records that are ready to explore, usually
with ad hoc queries. Records in the fact
table are often referred to as events, due
to the time-variant nature of a data
warehouse environment.
The primary key for the fact table is a
composite of all the columns except
numeric values / scores (like QUANTITY,
TURNOVER, exact invoice date and time).
Typical fact tables in a global enterprise
data warehouse are (apart from those, there
may be some company or business specific
fact tables):
sales fact table - contains all details regarding sales
orders fact table - in some cases the table can be split into open orders and historical orders. Sometimes the values for historical orders are stored in a sales fact table.
budget fact table - usually grouped by month and loaded once at the end of a year.
forecast fact table - usually grouped by month and loaded daily, weekly or monthly.
inventory fact table - reports stocks, usually refreshed daily
Dimension table
Nearly all of the information in a typical
fact table is also present in one or more
dimension tables. The main purpose of
maintaining dimension tables is to allow
browsing the categories quickly and easily.
The primary keys of the dimension
tables are linked together to form the
composite primary key of the fact table. In
a star schema design, there is only one
de-normalized table for a given dimension.
Typical dimension tables in a data
warehouse are:
time dimension table
customers dimension table
products dimension table
key account managers (KAM) dimension table
sales office dimension table
Star schema example
An example of a star schema architecture
is depicted below.
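A star schema like the one described above can be sketched with Python's
built-in sqlite3 module. The table and column names (time_dim, product_dim,
sales_fact, quantity, turnover) are illustrative assumptions, not taken from
a real warehouse. The point is only that the fact table's primary key is the
composite of its dimension keys, while the numeric measures stay outside it.

```python
import sqlite3

# In-memory database; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE sales_fact (
    time_id    INTEGER REFERENCES time_dim(time_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    quantity   INTEGER,   -- measure, not part of the key
    turnover   REAL,      -- measure, not part of the key
    PRIMARY KEY (time_id, product_id)
);
INSERT INTO time_dim VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO product_dim VALUES (10, 'Widget', 'Acme'), (11, 'Gadget', 'Acme');
INSERT INTO sales_fact VALUES (1, 10, 5, 50.0), (1, 11, 2, 40.0), (2, 10, 3, 30.0);
""")

def turnover_by_month(connection):
    """Browse the fact table through the time dimension (one join)."""
    rows = connection.execute("""
        SELECT t.month, SUM(f.turnover)
        FROM sales_fact f JOIN time_dim t ON t.time_id = f.time_id
        GROUP BY t.month ORDER BY t.month
    """)
    return rows.fetchall()
```

Because each dimension is a single de-normalized table, any category can be
reached from the fact table with one join.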
SNOWFLAKE SCHEMA
Snowflake schema architecture is a more complex variation of a star
schema design. The main difference is that dimensional tables in a
snowflake schema are normalized, so they have a typical relational
database design.
Snowflake schemas are generally used when a dimensional table becomes
very big and when a star schema can't represent the complexity of a data
structure. For example, if a PRODUCT dimension table contains millions of
rows, the use of snowflake schemas should significantly improve
performance by moving out some data to another table (with BRANDS for
instance).
The problem is that the more normalized the dimension table is, the more
complicated the SQL joins that must be issued to query it. This is because
in order for a query to be answered, many tables need to be joined and
aggregates generated.
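The cost of normalizing a dimension can be seen in a small sqlite3 sketch.
The PRODUCT and BRANDS tables follow the example in the text; the column
names and sample rows are assumptions made for illustration. Grouping sales
by brand now needs two joins instead of one.

```python
import sqlite3

# Hypothetical snowflaked PRODUCT dimension: brand data moved to BRANDS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brands  (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT,
                      brand_id INTEGER REFERENCES brands(brand_id));
CREATE TABLE sales_fact (product_id INTEGER REFERENCES product(product_id),
                         quantity INTEGER);
INSERT INTO brands VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO product VALUES (10, 'Widget', 1), (11, 'Gadget', 1), (12, 'Gizmo', 2);
INSERT INTO sales_fact VALUES (10, 5), (11, 2), (12, 7);
""")

def quantity_by_brand(connection):
    """Fact -> product -> brands: one extra join per normalization level."""
    return connection.execute("""
        SELECT b.brand_name, SUM(f.quantity)
        FROM sales_fact f
        JOIN product p ON p.product_id = f.product_id
        JOIN brands  b ON b.brand_id = p.brand_id
        GROUP BY b.brand_name ORDER BY b.brand_name
    """).fetchall()
```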
An example of a snowflake schema architecture is depicted below.
GALAXY SCHEMA
For each star schema or snowflake schema it is possible to construct a
fact constellation schema.
This schema is more complex than the star or snowflake architecture
because it contains multiple fact tables. This allows dimension tables to be
shared amongst many fact tables.
That solution is very flexible; however, it may be hard to manage and
support.
The main disadvantage of the fact constellation schema is a more
complicated design, because many variants of aggregation must be
considered.
In a fact constellation schema, different fact tables are explicitly assigned
to the dimensions that are relevant for the given facts. This may be useful
in cases when some facts are associated with a given dimension level and
other facts with a deeper dimension level.
Use of that model is reasonable when, for example, there is a sales
fact table (with details down to the exact date and invoice header id) and
a fact table with a sales forecast which is calculated based on month, client
id and product id.
In that case, using two different fact tables at different levels of grouping
is realized through a fact constellation model.
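The sales and forecast example above can be sketched in plain Python. The
field names and figures are invented for illustration. The sales fact is kept
at invoice grain and the forecast fact at (month, client, product) grain, so
sales must be rolled up to the coarser grain before the two are compared.

```python
from collections import defaultdict

# Sales fact: one row per invoice (date-level grain). Sample data is made up.
sales_fact = [
    {"invoice_id": 1, "date": "2024-01-05", "client_id": 7, "product_id": 10, "amount": 100.0},
    {"invoice_id": 2, "date": "2024-01-20", "client_id": 7, "product_id": 10, "amount": 50.0},
    {"invoice_id": 3, "date": "2024-02-02", "client_id": 7, "product_id": 10, "amount": 80.0},
]
# Forecast fact: one row per (month, client_id, product_id).
forecast_fact = {("2024-01", 7, 10): 140.0, ("2024-02", 7, 10): 90.0}

def actual_vs_forecast(sales, forecast):
    """Roll sales up to the forecast grain and pair actual with forecast."""
    actual = defaultdict(float)
    for row in sales:
        key = (row["date"][:7], row["client_id"], row["product_id"])
        actual[key] += row["amount"]
    return {key: (actual.get(key, 0.0), planned) for key, planned in forecast.items()}
```

Both fact tables share the client and product dimensions; only the time
dimension is used at two different levels.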
Source System
A database, application, file, or other
storage facility from which the data in a
data warehouse is derived.
Mapping
The definition of the relationship and data
flow between source and target objects.
Meta data
Data that describes data and other
structures, such as objects, business rules,
and processes. For example, the schema
design of a data warehouse is typically
stored in a repository as meta data, which
is used to generate scripts used to build
and populate the data warehouse. A
repository contains meta data.
Staging Area
A place where data is processed before
entering the warehouse.
Cleansing
The process of resolving inconsistencies
and fixing the anomalies in source data,
typically as part of the ETL process.
Transformation
The process of manipulating data. Any
manipulation beyond copying is a
transformation. Examples include
cleansing, aggregating, and integrating
data from multiple sources.
Transportation
The process of moving copied or
transformed data from a source to a data
warehouse.
Target System
A database, application, file, or other
storage facility to which the "transformed
source data" is loaded in a data warehouse.
Figure 1.12: Sample ETL Process Flow
Informatica
Informatica is a powerful ETL tool from
Informatica Corporation, a leading provider
of enterprise data integration and
ETL software.
The important Informatica components
are:
Power Exchange
Power Center
Power Center Connect
Power Channel
Metadata Exchange
Power Analyzer
Super Glue
In Informatica, all the metadata information
about source systems, target systems and
transformations is stored in the
Informatica repository. Informatica's Power
Center Client and Repository Server access
this repository to store and retrieve
metadata.
Source and Target
Consider a bank that has many
branches throughout the world. In each
branch, data may be stored in different
source systems like Oracle, SQL Server,
Teradata, etc. When the bank decides to
integrate its data from several sources for
its management decisions, it may choose
one or more systems like Oracle, SQL Server,
Teradata, etc. as its data warehouse
target. Many organisations prefer
Informatica for that ETL process,
because Informatica is more powerful in
designing and building data warehouses. It
can connect to several sources and targets
to extract metadata from them, then
transform and load the data into
target systems.
Guidelines to work with Informatica Power
Center
Repository: This is where all the
metadata information is stored in the
Informatica suite. The Power Center
Client and the Repository Server
access this repository to
retrieve, store and manage
metadata.
Power Center Client: The Informatica
client is used for managing users,
identifying source and target
system definitions, creating
mappings and mapplets, creating
sessions, running workflows, etc.
Repository Server: The repository
server takes care of all the
connections between the repository
and the Power Center Client.
Power Center Server: The Power Center
server extracts data from the
source and then loads it into
targets.
Designer: Source Analyzer, Mapping
Designer and Warehouse Designer
are tools that reside within the Designer
wizard. Source Analyzer is used for
extracting metadata from source
systems.
Mapping Designer is used to create
mappings between sources and targets.
A mapping is a pictorial representation of
the flow of data from source to target.
Warehouse Designer is used for extracting
metadata from target systems; metadata
can also be created in the Designer itself.
Data Cleansing: PowerCenter's
data cleansing technology improves
data quality by validating, correctly
naming and standardizing
address data. A person's address
may not be the same in all source
systems because of typos, and the postal
code or city name may not match the
address. These errors can be
corrected by using the data cleansing
process, and standardized data can
be loaded into target systems (data
warehouse).
Transformation: Transformations help
to transform the source data
according to the requirements of the
target system. Sorting, Filtering,
Aggregation and Joining are some
examples of transformation.
Transformations ensure the quality of
the data being loaded into the target, and
this is done during the mapping
process from source to target.
Workflow Manager: A workflow helps to
load the data from source to target in
a sequential manner. For example, if
the fact tables are loaded before the
lookup tables, then the target system
will raise an error message since
the fact table is violating the foreign
key validation. To avoid this,
workflows can be created to ensure
the correct flow of data from source
to target.
Workflow Monitor: This monitor is
helpful in monitoring and tracking the
workflows created in each Power
Center Server.
Power Center Connect: This
component helps to extract data and
metadata from ERP systems like
IBM's MQSeries, PeopleSoft, SAP,
Siebel, etc. and other third party
applications.
Power Center Exchange: This
component helps to extract data and
metadata from ERP systems like
IBM's MQSeries, PeopleSoft, SAP,
Siebel, etc. and other third party
applications.
Informatica
Power Exchange
Informatica Power Exchange, as a stand-alone
service or along with Power Center,
helps organizations leverage data by
avoiding manual coding of data extraction
programs. Power Exchange supports batch,
real time and changed data capture
options for mainframe (DB2, VSAM, IMS,
etc.), mid range (AS400 DB2, etc.),
relational databases (Oracle, SQL Server,
DB2, etc.) and flat files on Unix, Linux and
Windows systems.
Power Channel
This helps to transfer large amounts of
encrypted and compressed data over LAN,
WAN, through firewalls, transfer files over
FTP, etc.
Meta Data Exchange
Metadata Exchange enables organizations
to take advantage of the time and effort
already invested in defining data structures
within their IT environment when used with
Power Center. For example, an organization
may be using data modeling tools, such as
Erwin, Embarcadero, Oracle Designer,
Sybase Power Designer, etc. for developing
data models. Functional and technical teams
will have spent much time and effort in
creating the data model's data
structures (tables, columns, data types,
procedures, functions, triggers, etc.). By
using Metadata Exchange, these data
structures can be imported into Power
Center to identify source and target
mappings, which saves time and effort.
There is no need for the Informatica developer
to create these data structures once again.
Power Analyzer
Power Analyzer provides organizations with
reporting facilities. PowerAnalyzer makes
accessing, analyzing, and sharing
enterprise data simple and easily available
to decision makers. PowerAnalyzer enables
them to gain insight into business processes and
develop business intelligence.
With PowerAnalyzer, an organization can
extract, filter, format, and analyze
corporate information from data stored in a
data warehouse, data mart, operational
data store, or other data storage models.
PowerAnalyzer works best with a dimensional
data warehouse in a relational database. It
can also run reports on data in any table in
a relational database that does not conform
to the dimensional model.
Super Glue
Superglue is used for loading metadata into a
centralized place from several sources.
Reports can be run against Superglue
to analyze metadata.
Power Mart
Power Mart is a departmental version of
Informatica for building, deploying, and
managing data warehouses and data
marts. Power Center is used for a corporate
enterprise data warehouse, and Power Mart
is used for departmental data warehouses
like data marts. Power Center supports
global repositories and networked
repositories and it can be connected to
several sources. Power Mart supports
a single repository and it can be connected
to fewer sources when compared to Power
Center. Power Mart can grow into
an enterprise implementation, and its
codeless environment aids developer
productivity.
Note: This is not a complete tutorial on
Informatica. We will add more Tips and
Guidelines on Informatica in the near future.
Informatica - Transformation
In Informatica, Transformations help to
transform the source data according to the
requirements of the target system and
ensure the quality of the data being
loaded into the target.
Transformations are of two types: Active
and Passive.
Active Transformation
An active transformation can change the
number of rows that pass through it from
source to target, i.e. it eliminates rows that
do not meet the condition in the
transformation.
Passive Transformation
A passive transformation does not change
the number of rows that pass through it, i.e.
it passes all rows through the
transformation.
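The distinction can be illustrated with two plain Python functions. This is
only an analogy with hypothetical names and sample rows, not Informatica
code: the passive step returns exactly one row per input row, while the
active step may return fewer.

```python
def passive_expression(rows):
    """Passive: adds a derived field, row count is unchanged."""
    return [{**row, "full_name": f"{row['first']} {row['last']}"} for row in rows]

def active_filter(rows, dept):
    """Active: rows that fail the condition are dropped."""
    return [row for row in rows if row["dept"] == dept]

# Made-up sample data.
employees = [
    {"first": "Ada", "last": "Lovelace", "dept": 10},
    {"first": "Alan", "last": "Turing", "dept": 20},
    {"first": "Grace", "last": "Hopper", "dept": 10},
]
```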
Transformations can be Connected or
UnConnected.
Connected Transformation
A connected transformation is connected to
other transformations or directly to the target
table in the mapping.
UnConnected Transformation
An unconnected transformation is not
connected to other transformations in the
mapping. It is called within another
transformation, and returns a value to that
transformation.
List of Transformations
The following Transformations are
available in
PowerCenter:
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
Advanced External Procedure Transformation
External Procedure Transformation
Union Transformation
In the following pages, we will explain all
the above Informatica Transformations and
their significance in the ETL process in
detail.
Aggregator Transformation
Aggregator transformation is an Active and
Connected transformation. This
transformation is useful to perform
calculations such as averages and sums
(mainly to perform calculations on multiple
rows or groups). For example, to calculate
the total of daily sales or to calculate the average
of monthly or yearly sales. Aggregate
functions such as AVG, FIRST, COUNT,
PERCENTILE, MAX, SUM, etc. can be used in
the Aggregator transformation.
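As a rough analogy in plain Python (the function and field names are
assumptions for illustration), a group-by on the day followed by SUM and AVG
looks like this. Many input rows collapse to one output row per group, which
is why the transformation is active.

```python
from collections import defaultdict

def aggregate_sales(rows):
    """Group rows by 'day' and compute the total and average amount."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["day"]].append(row["amount"])
    return {
        day: {"total": sum(amounts), "average": sum(amounts) / len(amounts)}
        for day, amounts in groups.items()
    }

# Made-up sample data.
daily_sales = [
    {"day": "Mon", "amount": 10.0},
    {"day": "Mon", "amount": 30.0},
    {"day": "Tue", "amount": 25.0},
]
```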
Expression Transformation
Expression transformation is a Passive and
Connected transformation. This can be
used to calculate values in a single row
before writing to the target. For example,
to calculate the discount of each product, to
concatenate first and last names, or to
convert a date to a string field.
Filter Transformation
Filter transformation is an Active and
Connected transformation. This can be
used to filter out rows in a mapping that do not
meet the condition. For example, to find
all the employees who are working in
Department 10 or to find the products
that fall between the rate categories $500
and $1000.
Joiner Transformation
Joiner Transformation is an Active and
Connected transformation. This can be
used to join two sources coming from two
different locations or from the same location.
For example, to join a flat file and a
relational source, to join two flat files, or
to join a relational source and an XML
source. In order to join two sources, there
must be at least one matching port. While
joining two sources, you must specify
one source as master and the other as
detail. The Joiner transformation supports
the following types of joins:
Normal
Master Outer
Detail Outer
Full Outer
Normal join discards all the rows of data
from the master and detail source that do
not match, based on the condition.
Master outer join discards all the
unmatched rows from the master source
and keeps all the rows from the detail
source and the matching rows from the
master source.
Detail outer join keeps all rows of data from
the master source and the matching rows
from the detail source. It discards the
unmatched rows from the detail source.
Full outer join keeps all rows of data from
both the master and detail sources.
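The four join types can be sketched as one small Python function. This is an
illustration of the semantics described above, not Informatica's
implementation; the function name, the join_type strings and the sample rows
are all assumptions.

```python
def joiner(master, detail, key, join_type="normal"):
    """Join two lists of dicts on `key`.

    join_type: "normal", "master_outer", "detail_outer" or "full_outer".
    """
    detail_keys = {d[key] for d in detail}
    out = []
    for d in detail:
        matches = [m for m in master if m[key] == d[key]]
        for m in matches:
            out.append({**m, **d})
        # Master outer and full outer keep detail rows without a master match.
        if not matches and join_type in ("master_outer", "full_outer"):
            out.append(dict(d))
    # Detail outer and full outer keep master rows without a detail match.
    if join_type in ("detail_outer", "full_outer"):
        out.extend(dict(m) for m in master if m[key] not in detail_keys)
    return out

# Made-up sample data: id 2 matches, id 1 is master only, id 3 is detail only.
master_rows = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
detail_rows = [{"id": 2, "qty": 5}, {"id": 3, "qty": 9}]
```

Note the naming: a master outer join keeps every detail row, and a detail
outer join keeps every master row.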
Lookup Transformation
Lookup transformation is Passive and it can
be either Connected or UnConnected.
It is used to look up data in a
relational table, view, or synonym. The Lookup
definition can be imported either from
source or from target tables.
For example, suppose we want to retrieve all the
sales of a product with ID 10 and
the sales data resides in
another table. Here, instead of using the
sales table as one more source, use a Lookup
transformation to look up the data for the
product with ID 10 in the sales table.
Difference between Connected and
UnConnected Lookup Transformation:
Connected lookup receives input values
directly from the mapping pipeline, whereas
UnConnected lookup receives values from a
:LKP expression in another
transformation.
Connected lookup returns multiple columns
from the same row, whereas UnConnected
lookup has one return port and returns one
column from each row.
Connected lookup supports user-defined
default values, whereas UnConnected
lookup does not support user-defined
default values.
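The three differences listed above can be mirrored in a loose Python analogy.
The function names, the return_port parameter and the sample table are
hypothetical. The connected form sits in the row pipeline, returns several
columns and accepts a default; the unconnected form is called like a function
and returns a single value.

```python
# Hypothetical lookup table keyed by product ID.
sales_by_product = {10: {"units": 120, "revenue": 2400.0}}

def connected_lookup(row, table, default):
    """In the pipeline: returns the row enriched with all lookup columns,
    falling back to user-defined default values when there is no match."""
    match = table.get(row["product_id"], default)
    return {**row, **match}

def unconnected_lookup(product_id, table, return_port="revenue"):
    """Called from an expression: returns one column, or None if no match."""
    match = table.get(product_id)
    return match[return_port] if match else None
```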
Normalizer Transformation
Normalizer Transformation is an Active and
Connected transformation. It is used mainly
with COBOL sources, where most of the
time data is stored in de-normalized
format. Also, the Normalizer transformation
can be used to create multiple rows from a
single row of data.
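Creating multiple rows from a single row can be sketched as follows. The
record layout (one store row with four quarterly columns) and all names are
invented for illustration.

```python
def normalizer(row, key_field, repeating_fields):
    """Turn one row with repeating columns into one row per occurrence."""
    return [
        {key_field: row[key_field], "occurrence": i, "value": row[name]}
        for i, name in enumerate(repeating_fields, start=1)
    ]

# Made-up de-normalized record: four quarterly figures in one row.
store = {"store": "S1", "q1": 100, "q2": 120, "q3": 90, "q4": 150}
quarterly = normalizer(store, "store", ["q1", "q2", "q3", "q4"])
```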
Rank Transformation
Rank transformation is an Active and
Connected transformation. It is used to
select the top or bottom rank of data. For
example, to select the top 10 regions where
the sales volume was very high or to select
the 10 lowest priced products.
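A top or bottom 'N' selection can be sketched with the standard library's
heapq module. The function name and the sample regions are assumptions.

```python
import heapq

def rank(rows, field, n, top=True):
    """Return the n rows with the highest (top) or lowest (bottom) `field`."""
    pick = heapq.nlargest if top else heapq.nsmallest
    return pick(n, rows, key=lambda row: row[field])

# Made-up sample data.
regions = [
    {"region": "North", "sales": 500},
    {"region": "South", "sales": 900},
    {"region": "East", "sales": 300},
    {"region": "West", "sales": 700},
]
```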
Router Transformation
Router is an Active and Connected
transformation. It is similar to the Filter
transformation. The only difference is that the Filter
transformation drops the data that does not
meet the condition, whereas the Router has an
option to capture the data that does not meet
the condition. It is useful to test multiple
conditions. It has input, output and default
groups. For example, if we want to split
data by State=Michigan,
State=California, State=New York and all
other states, it is easy to route each group
to a different table.
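The state example can be sketched in Python as a set of named conditions plus
a default group. The group names and sample rows are hypothetical, and the
sketch assumes a row goes to every group whose condition it meets.

```python
def router(rows, groups):
    """Send each row to every group whose condition it meets; rows that
    meet no condition land in the 'DEFAULT' group."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)
    return routed

state_groups = {
    "MICHIGAN": lambda r: r["state"] == "Michigan",
    "CALIFORNIA": lambda r: r["state"] == "California",
    "NEW_YORK": lambda r: r["state"] == "New York",
}
# Made-up sample data.
customers = [
    {"id": 1, "state": "Michigan"},
    {"id": 2, "state": "Texas"},
    {"id": 3, "state": "New York"},
]
```

Unlike a filter, nothing is lost: the Texas row is still available in the
default group.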
Sequence Generator Transformation
Sequence Generator transformation is a
Passive and Connected transformation. It is
used to create unique primary key values,
to cycle through a sequential range of
numbers, or to replace missing keys.
It has two output ports to connect to
transformations. By default it has two fields,
CURRVAL and NEXTVAL (you cannot add
ports to this transformation). The NEXTVAL port
generates a sequence of numbers when it is
connected to a transformation or target.
CURRVAL is the NEXTVAL value plus one, or
NEXTVAL plus the Increment By value.
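The relationship between the two ports can be modelled with a small class.
This is a simplified sketch of the behaviour as described above, with
hypothetical method names; it ignores cycling and end values.

```python
class SequenceGenerator:
    """Toy model: NEXTVAL yields the sequence, CURRVAL is the last
    NEXTVAL plus the Increment By value."""

    def __init__(self, start=1, increment_by=1):
        self._next = start
        self.increment_by = increment_by

    def nextval(self):
        value = self._next
        self._next += self.increment_by
        return value

    def currval(self):
        # After nextval() returned v, this is v + increment_by.
        return self._next
```

For example, with start=1 and increment_by=1, the first nextval() call
returns 1 and currval() then reports 2.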
Stored Procedure Transformation
Stored Procedure transformation is a
Passive transformation that can be Connected
or UnConnected. It is useful to automate
time-consuming tasks and it is also used in
error handling, to drop and recreate
indexes, to determine the space in a
database, to run a specialized calculation, etc.
The stored procedure must exist in the
database before creating a Stored
Procedure transformation, and the stored
procedure can exist in a source, target, or
any database with a valid connection to the
Informatica Server. A stored procedure is an
executable script with SQL statements and
control statements, user-defined variables
and conditional statements. In the case of the
Stored Procedure transformation, the procedure
is compiled and executed in a
relational data source. You need a database
connection to import the stored procedure
into your mapping.
Sorter Transformation
Sorter transformation is a Connected and
Active transformation. It sorts
data in either ascending or descending
order according to a specified field. It can
also be configured for case-sensitive sorting
and to specify whether the output rows
should be distinct.
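The three options (sort direction, case sensitivity, distinct output) can be
sketched as one Python function. The parameter names and sample data are
assumptions for illustration.

```python
def sorter(rows, field, descending=False, case_sensitive=True, distinct=False):
    """Sort dict rows by `field`, optionally dropping duplicate rows first."""
    if distinct:
        seen, unique = set(), []
        for row in rows:
            marker = tuple(sorted(row.items()))
            if marker not in seen:
                seen.add(marker)
                unique.append(row)
        rows = unique
    if case_sensitive:
        key = lambda row: row[field]
    else:
        key = lambda row: row[field].lower()
    return sorted(rows, key=key, reverse=descending)

# Made-up sample data with one duplicate row.
people = [{"name": "bob"}, {"name": "Alice"}, {"name": "Carl"}, {"name": "bob"}]
```

Dropping duplicates changes the row count, which is why a sorter configured
for distinct output behaves as an active transformation.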
Source Qualifier Transformation
Source Qualifier transformation is an Active
and Connected transformation. When
adding a relational or a flat file source
definition to a mapping, you must
connect it to a Source Qualifier
transformation. The Source Qualifier
performs various tasks such as
overriding the default SQL query, filtering
records, and joining data from two or more tables.
Update Strategy Transformation
Update Strategy transformation is an Active
and Connected transformation. It is used to
update data in the target table, either to
maintain the history of data or only recent changes.
You can specify how to treat source rows:
insert, update, delete or data driven.
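A data driven strategy amounts to computing a flag per row. The sketch below
uses numeric flags named after Informatica's DD_INSERT, DD_UPDATE, DD_DELETE
and DD_REJECT constants; the decision rules and the "deleted" field are
invented for illustration.

```python
# Flag values mirror the DD_* constants (0 to 3).
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_row(row, existing_keys):
    """Decide how a source row should be applied to the target.

    Hypothetical rules: no key -> reject, marked deleted -> delete,
    key already in target -> update, otherwise insert.
    """
    if row.get("id") is None:
        return DD_REJECT
    if row.get("deleted"):
        return DD_DELETE
    return DD_UPDATE if row["id"] in existing_keys else DD_INSERT
```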
XML Source Qualifier Transformation
XML Source Qualifier is a Passive and
Connected transformation. XML Source
Qualifier is used only with an XML source
definition. It represents the data elements
that the Informatica Server reads when it
executes a session with XML sources.
Advanced External Procedure
Transformation
Advanced External Procedure
transformation is an Active and Connected
transformation. It operates in conjunction
with procedures, which are created outside
of the Designer interface to extend
PowerCenter/PowerMart functionality. It is
useful in creating external transformation
applications, such as sorting and
aggregation, which require all input rows to
be processed before emitting any output
rows.
Union Transformation
The Union transformation is used to merge
multiple datasets from various streams or
pipelines into one dataset. This
transformation works like UNION ALL:
it does not remove any duplicate rows.
If duplicates are not expected at the
target, it is recommended to use an
Aggregator to remove them.
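The two steps can be sketched in Python. The function names and sample
streams are assumptions; the de-duplication step stands in for an Aggregator
that groups by all ports.

```python
def union_all(*streams):
    """Concatenate streams without removing duplicates (UNION ALL)."""
    return [row for stream in streams for row in stream]

def remove_duplicates(rows):
    """Keep the first occurrence of each distinct row."""
    seen, unique = set(), []
    for row in rows:
        marker = tuple(sorted(row.items()))
        if marker not in seen:
            seen.add(marker)
            unique.append(row)
    return unique

# Made-up sample streams that share one row.
stream_a = [{"id": 1}, {"id": 2}]
stream_b = [{"id": 2}, {"id": 3}]
merged = union_all(stream_a, stream_b)
```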
External Procedure Transformation
External Procedure transformation is an
Active transformation that can be Connected
or UnConnected. Sometimes the standard
transformations, such as the Expression
transformation, may not provide the
functionality that you want. In such cases
an External Procedure is useful to develop
complex functions within a dynamic link
library (DLL) or UNIX shared library, instead
of creating the necessary Expression
transformations in a mapping.
Differences between Advanced External
Procedure and External Procedure
Transformations:
External Procedure returns a single value,
whereas Advanced External Procedure
returns multiple values.
External Procedure supports COM and
Informatica procedures, whereas AEP
supports only Informatica procedures.
