__ Sorts data in ascending or descending order and can also return distinct records.
RANK __ Top or bottom 'N' analysis.
JOINER __ Joins two different sources coming from different locations or from the same location.
FILTER __ Filters out the rows that do not meet the condition.
ROUTER __ Useful for testing multiple conditions.
AGGREGATOR __ Performs group calculations such as count, max, min, sum, avg (mainly calculations on multiple rows or groups).
NORMALIZER __ Reads COBOL files (denormalized format). Splits a single row into multiple rows.
SOURCE QUALIFIER __ Performs many tasks such as overriding the default SQL query, filtering records, and joining data from two or more tables. Represents the flat file or relational data.
UNION __ Merges data from multiple sources, similar to the UNION ALL SQL statement that combines the results of two or more SQL statements. Like UNION ALL, the Union transformation does not remove duplicate rows.
EXPRESSION __ You can use the Expression transformation to calculate values in a single row before you write to the target.
LOOK UP __ Use a Lookup transformation in a mapping to look up data in a flat file or a relational table, view, or synonym.
STORED PROCEDURE __ Stored procedures automate tasks that are too complicated for standard SQL statements. You can call them by using the Stored Procedure transformation.
XML SOURCE QUALIFIER __ When you add an XML source definition to a mapping, you need to connect it to an XML Source Qualifier transformation.
UPDATE STRATEGY __ Flags rows for insert, delete, update, or reject.
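
As a rough illustration of how FILTER and ROUTER differ, the Python sketch below mimics both on a list of dictionaries. It is not Informatica code: the function names filter_rows and route_rows, the sample rows, and the "DEFAULT" group name are assumptions made only for this example.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"id": 1, "state": "Michigan"},
    {"id": 2, "state": "California"},
    {"id": 3, "state": "Texas"},
]

def filter_rows(rows, condition):
    """Filter-like behaviour: rows that fail the condition are dropped."""
    return [row for row in rows if condition(row)]

def route_rows(rows, groups):
    """Router-like behaviour: test several conditions in one pass.

    In this sketch a row goes to every group whose condition it meets;
    rows that meet none are kept in a catch-all "DEFAULT" group.
    """
    routed = defaultdict(list)
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)
    return dict(routed)

kept = filter_rows(rows, lambda r: r["state"] == "Michigan")
routed = route_rows(rows, {
    "MI": lambda r: r["state"] == "Michigan",
    "CA": lambda r: r["state"] == "California",
})
print(len(kept))       # rows surviving the filter
print(sorted(routed))  # group names produced by the router
```

The filter simply loses the Texas row, while the router keeps it in the default group so it can still be sent to a target.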
ARCHITECTURE

Data Warehouse Architecture

STAR SCHEMA
Star schema architecture is the simplest data warehouse design. Its main feature is a table at the center, called the fact table, surrounded by dimension tables that allow browsing of specific categories, summarizing, drill-downs and specifying criteria. Typically, most of the fact tables in a star schema are in third normal form, while dimension tables are de-normalized (second normal form).

Fact table

The fact table is not a typical relational database table, as it is de-normalized on purpose to enhance query response times. The fact table typically contains records that are ready to explore, usually with ad hoc queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data warehouse environment. The primary key of the fact table is a composite of all the columns except numeric values / scores (like QUANTITY, TURNOVER, exact invoice date and time).

Typical fact tables in a global enterprise data warehouse are listed below (apart from these, there may be some company-specific or business-specific fact tables):

sales fact table: contains all details regarding sales
orders fact table: in some cases the table can be split into open orders and historical orders. Sometimes the values for historical orders are stored in a sales fact table.
budget fact table: usually grouped by month and loaded once at the end of a year.
forecast fact table: usually grouped by month and loaded daily, weekly or monthly.
inventory fact table: reports stocks, usually refreshed daily.

Dimension table

Nearly all of the information in a typical fact table is also present in one or more dimension tables. The main purpose of maintaining dimension tables is to allow browsing the categories quickly and easily. The primary keys of the dimension tables are linked together to form the composite primary key of the fact table. In a star schema design, there is only one de-normalized table for a given dimension.
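
The relationship between a fact table and its dimension tables can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration under stated assumptions: the table and column names (dim_time, dim_customer, dim_product, fact_sales) and the sample rows are invented for the example and are not taken from any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: the primary key is a composite of the dimension keys;
-- quantity and turnover are the numeric measures.
CREATE TABLE fact_sales (
    time_id     INTEGER REFERENCES dim_time(time_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    quantity    INTEGER,
    turnover    REAL,
    PRIMARY KEY (time_id, customer_id, product_id)
);

INSERT INTO dim_time VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_customer VALUES (10, 'Acme');
INSERT INTO dim_product VALUES (100, 'Widget'), (101, 'Gadget');
INSERT INTO fact_sales VALUES
    (1, 10, 100, 5, 50.0),
    (1, 10, 101, 2, 40.0),
    (2, 10, 100, 3, 30.0);
""")

# A typical star-schema query: join the fact table to one dimension
# and summarise a measure by an attribute of that dimension.
turnover_by_month = conn.execute("""
    SELECT t.month, SUM(f.turnover)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY t.month
    ORDER BY t.month
""").fetchall()
print(turnover_by_month)
```

Each query needs only one join per dimension it touches, which is the simplicity the star schema is designed for.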
Typical dimension tables in a data warehouse are:

time dimension table
customers dimension table
products dimension table
key account managers (KAM) dimension table
sales office dimension table

Star schema example

An example of a star schema architecture is depicted below.

SNOWFLAKE SCHEMA

Snowflake schema architecture is a more complex variation of a star schema design. The main difference is that dimension tables in a snowflake schema are normalized, so they have a typical relational database design.

Snowflake schemas are generally used when a dimension table becomes very big and when a star schema can't represent the complexity of a data structure. For example, if a PRODUCT dimension table contains millions of rows, a snowflake schema should significantly improve performance by moving some data out to another table (BRANDS, for instance).

The problem is that the more normalized the dimension table is, the more complicated the SQL joins needed to query it. To answer a query, many tables need to be joined and aggregates generated.

An example of a snowflake schema architecture is depicted below.

GALAXY SCHEMA

For each star schema or snowflake schema it is possible to construct a fact constellation schema. This schema is more complex than the star or snowflake architecture because it contains multiple fact tables. This allows dimension tables to be shared among many fact tables. The solution is very flexible, but it may be hard to manage and support.

The main disadvantage of the fact constellation schema is a more complicated design, because many variants of aggregation must be considered.

In a fact constellation schema, each fact table is explicitly assigned to the dimensions that are relevant to its facts. This may be useful when some facts are associated with a given dimension level and other facts with a deeper dimension level.
This model is reasonable when, for example, there is a sales fact table (with details down to the exact date and invoice header id) and a sales forecast fact table calculated by month, client id and product id. In that case, using two fact tables at different levels of grouping is realized through a fact constellation model.

Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping
The definition of the relationship and data flow between source and target objects.

Meta data
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as meta data, which is used to generate the scripts that build and populate the data warehouse. A repository contains meta data.

Staging Area
A place where data is processed before entering the warehouse.

Cleaning
The process of resolving inconsistencies and fixing anomalies in source data, typically as part of the ETL process.

Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

Transportation
The process of moving copied or transformed data from a source to a data warehouse.

Target System
A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.

Figure 1.12: Sample ETL Process Flow

Informatica

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software.
The important Informatica components are:

Power Exchange
Power Center
Power Center Connect
Power Channel
Metadata Exchange
Power Analyzer
Super Glue

In Informatica, all the metadata about source systems, target systems and transformations is stored in the Informatica repository. Informatica's Power Center Client and Repository Server access this repository to store and retrieve metadata.

Source and Target

Consider a bank that has many branches throughout the world. In each branch, data may be stored in different source systems such as Oracle, SQL Server, Teradata, etc. When the bank decides to integrate its data from several sources for its management decisions, it may choose one or more systems such as Oracle, SQL Server, Teradata, etc. as its data warehouse target. Many organisations prefer Informatica for that ETL process because they consider it a powerful tool for designing and building data warehouses. It can connect to several sources and targets to extract metadata from them, transform the data, and load it into target systems.

Guidelines to work with Informatica Power Center

Repository: This is where all the metadata is stored in the Informatica suite. The Power Center Client and the Repository Server access this repository to retrieve, store and manage metadata.

Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions, running workflows, etc.

Repository Server: The repository server takes care of all the connections between the repository and the Power Center Client.

Power Center Server: The Power Center server extracts data from the sources and then loads it into the targets.

Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools that reside within the Designer wizard.
Source Analyzer is used for extracting metadata from source systems. Mapping Designer is used to create mappings between sources and targets. A mapping is a pictorial representation of the flow of data from source to target. Warehouse Designer is used for extracting metadata from target systems; alternatively, metadata can be created in the Designer itself.

Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming and standardizing address data. A person's address may not be the same in all source systems because of typos, and the postal code or city name may not match the address. These errors can be corrected by the data cleansing process, and the standardized data can be loaded into target systems (the data warehouse).

Transformation: Transformations help to transform the source data according to the requirements of the target system. Sorting, filtering, aggregation and joining are some examples of transformations. Transformations ensure the quality of the data being loaded into the target, and this is done during the mapping process from source to target.

Workflow Manager: A workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, the target system will raise an error because the fact table violates the foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.

Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.

Power Center Connect: This component helps to extract data and metadata from ERP systems such as IBM's MQSeries, PeopleSoft, SAP, Siebel, etc. and other third-party applications.

Power Center Exchange: This component helps to extract data and metadata from ERP systems such as IBM's MQSeries, PeopleSoft, SAP, Siebel, etc. and other third-party applications.
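
The load-ordering problem described under Workflow Manager can be reproduced in a few lines with Python's sqlite3 module. This is a sketch of the general idea, not of Informatica itself: the tables dim_product and fact_sales and the helpers load_fact and load_dimension are hypothetical, and SQLite stands in for the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE fact_sales (
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        quantity   INTEGER
    )
""")

def load_fact(conn):
    """Hypothetical fact load: one sales row for product 100."""
    conn.execute("INSERT INTO fact_sales VALUES (100, 5)")

def load_dimension(conn):
    """Hypothetical dimension (lookup) load: the product the fact row refers to."""
    conn.execute("INSERT INTO dim_product VALUES (100, 'Widget')")

# Wrong order: the fact row references a product that is not loaded yet.
try:
    load_fact(conn)
    wrong_order_failed = False
except sqlite3.IntegrityError:
    wrong_order_failed = True

# Correct order: dimension table first, then the fact table.
load_dimension(conn)
load_fact(conn)
fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(wrong_order_failed, fact_count)
```

A workflow that sequences the dimension load before the fact load plays the role of the last three statements here.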
Informatica Power Exchange

Informatica Power Exchange, as a stand-alone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time and changed data capture options for mainframe (DB2, VSAM, IMS, etc.), mid-range (AS400 DB2, etc.) and relational databases (Oracle, SQL Server, DB2, etc.), and for flat files on Unix, Linux and Windows systems.

Power Channel

This helps to transfer large amounts of encrypted and compressed data over LAN and WAN, through firewalls, to transfer files over FTP, etc.

Meta Data Exchange

Metadata Exchange, when used with Power Center, enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment. For example, an organization may be using data modeling tools such as Erwin, Embarcadero, Oracle Designer, Sybase Power Designer, etc. for developing data models. The functional and technical teams will have spent much time and effort creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which saves time and effort. There is no need for an Informatica developer to create these data structures again.

Power Analyzer

Power Analyzer provides organizations with reporting facilities. PowerAnalyzer makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers. PowerAnalyzer enables organizations to gain insight into business processes and develop business intelligence.

With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage models. PowerAnalyzer works best with a dimensional data warehouse in a relational database.
It can also run reports on data in any relational database table that does not conform to the dimensional model.

Super Glue

Superglue is used for loading metadata from several sources into a centralized place. Reports can be run against Superglue to analyze the metadata.

Power Mart

Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for corporate enterprise data warehouses, and Power Mart is used for departmental data warehouses such as data marts. Power Center supports global repositories and networked repositories and can be connected to several sources. Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow into an enterprise implementation, and its codeless environment helps developer productivity.

Informatica - Transformations

In Informatica, transformations help to transform the source data according to the requirements of the target system, and they ensure the quality of the data being loaded into the target.

Transformations are of two types: Active and Passive.

Active Transformation

An active transformation can change the number of rows that pass through it from source to target, i.e. it eliminates rows that do not meet the condition in the transformation.

Passive Transformation

A passive transformation does not change the number of rows that pass through it, i.e. it passes all rows through the transformation.

Transformations can be Connected or UnConnected.

Connected Transformation

A connected transformation is connected to other transformations or directly to the target table in the mapping.

UnConnected Transformation

An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation and returns a value to that transformation.
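
The active/passive distinction can be shown with two small Python functions. This is only an analogy, assuming rows are plain dictionaries; passive_expression and active_filter are hypothetical names, and the discount rate and price range are made-up values.

```python
# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"product": "Widget", "price": 600.0},
    {"product": "Gadget", "price": 1500.0},
    {"product": "Gizmo", "price": 900.0},
]

def passive_expression(rows, discount=0.10):
    """Passive: one output row per input row, with a derived column added."""
    return [{**row, "discounted": row["price"] * (1 - discount)} for row in rows]

def active_filter(rows, low=500.0, high=1000.0):
    """Active: rows outside the price range are dropped, so the row count can change."""
    return [row for row in rows if low <= row["price"] <= high]

expressed = passive_expression(rows)
filtered = active_filter(rows)
print(len(rows), len(expressed), len(filtered))
```

The expression-style function always returns as many rows as it receives, whereas the filter-style function may return fewer.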
List of Transformations

The following transformations are available in PowerCenter:

Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
Advanced External Procedure Transformation
External Procedure Transformation
Union Transformation

The following sections explain these Informatica transformations and their significance in the ETL process.

Aggregator Transformation

Aggregator transformation is an Active and Connected transformation. It is useful for performing calculations such as averages and sums (mainly calculations on multiple rows or groups), for example to calculate the total of daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, SUM, etc. can be used in the Aggregator transformation.

Expression Transformation

Expression transformation is a Passive and Connected transformation. It can be used to calculate values in a single row before writing to the target, for example to calculate the discount on each product, to concatenate first and last names, or to convert a date to a string field.

Filter Transformation

Filter transformation is an Active and Connected transformation. It can be used to filter out rows in a mapping that do not meet the condition, for example to find all the employees who work in Department 10 or the products that fall in the rate category between $500 and $1000.

Joiner Transformation

Joiner transformation is an Active and Connected transformation. It can be used to join two sources coming from two different locations or from the same location.
For example, it can join a flat file and a relational source, two flat files, or a relational source and an XML source.

In order to join two sources, there must be at least one matching port. When joining two sources, one source must be specified as the master and the other as the detail.

The Joiner transformation supports the following types of joins:

Normal
Master Outer
Detail Outer
Full Outer

Normal join discards all the rows of data from the master and detail sources that do not match, based on the condition.

Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.

Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.

Full outer join keeps all rows of data from both the master and detail sources.

Lookup Transformation

Lookup transformation is Passive and can be either Connected or UnConnected. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported from either source or target tables.

For example, suppose we want to retrieve all the sales of the product with ID 10 and the sales data resides in another table. Instead of using the sales table as one more source, use a Lookup transformation to look up the data for the product with ID 10 in the sales table.

Difference between Connected and UnConnected Lookup Transformation:

A Connected lookup receives input values directly from the mapping pipeline, whereas an UnConnected lookup receives values from a :LKP expression in another transformation.

A Connected lookup returns multiple columns from the same row, whereas an UnConnected lookup has one return port and returns one column from each row.

A Connected lookup supports user-defined default values, whereas an UnConnected lookup does not.
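
Returning to the Joiner transformation, the four join types described above can be easier to tell apart with a small worked example. The Python sketch below mimics them on lists of dictionaries. The function joiner, its join_type strings and the sample rows are assumptions for illustration only. Unmatched rows are emitted here without the other side's columns, whereas a real join would fill those columns with NULLs.

```python
def joiner(master, detail, key, join_type="normal"):
    """Join two lists of dicts on `key`, mimicking the four join types."""
    valid = {"normal", "master_outer", "detail_outer", "full_outer"}
    if join_type not in valid:
        raise ValueError(f"unknown join type: {join_type}")
    master_keys = {m[key] for m in master}
    detail_keys = {d[key] for d in detail}
    # Normal join: only master/detail pairs whose key matches.
    result = [{**m, **d} for d in detail for m in master if m[key] == d[key]]
    if join_type in ("master_outer", "full_outer"):
        # Master outer keeps every detail row, even without a master match.
        result += [dict(d) for d in detail if d[key] not in master_keys]
    if join_type in ("detail_outer", "full_outer"):
        # Detail outer keeps every master row, even without a detail match.
        result += [dict(m) for m in master if m[key] not in detail_keys]
    return result

master = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
detail = [{"id": 2, "qty": 5}, {"id": 3, "qty": 7}]

ids_by_type = {
    join_type: sorted(row["id"] for row in joiner(master, detail, "id", join_type))
    for join_type in ("normal", "master_outer", "detail_outer", "full_outer")
}
print(ids_by_type)
```

Note that the names describe which side loses its unmatched rows: a master outer join drops unmatched master rows, and a detail outer join drops unmatched detail rows.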
Normalizer Transformation

Normalizer transformation is an Active and Connected transformation. It is used mainly with COBOL sources, where data is most often stored in de-normalized format. The Normalizer transformation can also be used to create multiple rows from a single row of data.

Rank Transformation

Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data, for example the top 10 regions by sales volume or the 10 lowest-priced products.

Router Transformation

Router is an Active and Connected transformation. It is similar to the Filter transformation. The only difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet the condition. It is useful for testing multiple conditions. It has input, output and default groups. For example, to split data by State=Michigan, State=California, State=New York and all other states, a Router makes it easy to route the data to different tables.

Sequence Generator Transformation

Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values, to cycle through a sequential range of numbers, or to replace missing keys. It has two output ports for connecting to transformations. By default it has two fields, CURRVAL and NEXTVAL (you cannot add ports to this transformation). The NEXTVAL port generates a sequence of numbers when it is connected to a transformation or target. CURRVAL is the NEXTVAL value plus one, or NEXTVAL plus the Increment By value.

Stored Procedure Transformation

Stored Procedure transformation is a Passive transformation that can be Connected or UnConnected. It is useful for automating time-consuming tasks, and it is also used for error handling, dropping and recreating indexes, determining the space in a database, performing a specialized calculation, etc.
The stored procedure must exist in the database before a Stored Procedure transformation is created, and it can exist in a source, a target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script with SQL statements, control statements, user-defined variables and conditional statements. With a Stored Procedure transformation, the procedure is compiled and executed in a relational data source. You need a database connection to import the stored procedure into your mapping.

Sorter Transformation

Sorter transformation is an Active and Connected transformation. It sorts data in either ascending or descending order according to a specified field. It can also be configured for case-sensitive sorting and to specify whether the output rows should be distinct.

Source Qualifier Transformation

Source Qualifier transformation is an Active and Connected transformation. When a relational or flat file source definition is added to a mapping, it must be connected to a Source Qualifier transformation. The Source Qualifier performs various tasks such as overriding the default SQL query, filtering records, and joining data from two or more tables.

Update Strategy Transformation

Update Strategy transformation is an Active and Connected transformation. It is used to update data in the target table, either to maintain the history of the data or to keep only recent changes. You can specify how to treat source rows in the table: insert, update, delete or data driven.

XML Source Qualifier Transformation

XML Source Qualifier is a Passive and Connected transformation. It is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources.

Advanced External Procedure Transformation

Advanced External Procedure transformation is an Active and Connected transformation.
It operates in conjunction with procedures that are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful for creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.

Union Transformation

The Union transformation is used to merge multiple datasets from various streams or pipelines into one dataset. It works like UNION ALL: it does not remove any duplicate rows. If duplicates are not expected at the target, it is recommended to use an Aggregator to remove them.

External Procedure Transformation

External Procedure transformation is an Active transformation that can be Connected or UnConnected. Sometimes the standard transformations, such as the Expression transformation, may not provide the functionality that you want. In such cases, an External Procedure is useful for developing complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.

Differences between Advanced External Procedure and External Procedure Transformations:

External Procedure returns a single value, whereas Advanced External Procedure returns multiple values.

External Procedure supports COM and Informatica procedures, whereas Advanced External Procedure (AEP) supports only Informatica procedures.
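
For the Union transformation described above, the difference between a UNION ALL style merge and a de-duplicated result can be sketched in Python. This is an analogy only: union_all and remove_duplicates are hypothetical helpers, rows are modelled as tuples, and remove_duplicates stands in for whatever downstream step (such as an Aggregator grouping on all columns) removes the repeats.

```python
def union_all(*pipelines):
    """Union-like merge: concatenate rows from every pipeline, duplicates kept."""
    merged = []
    for pipeline in pipelines:
        merged.extend(pipeline)
    return merged

def remove_duplicates(rows):
    """Stand-in for a downstream de-duplication step; keeps first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

# Hypothetical rows from two pipelines; ("C2", "Bob") appears in both.
east = [("C1", "Ann"), ("C2", "Bob")]
west = [("C2", "Bob"), ("C3", "Cy")]

merged = union_all(east, west)
unique = remove_duplicates(merged)
print(len(merged), len(unique))
```

The merged list still contains the shared row twice, and only the separate de-duplication step reduces it to one copy.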