
INFORMATICA TRANSFORMATIONS AT A GLANCE

SORTER - sorts tables in ascending or descending order, and can also be used to obtain distinct records.
RANK - top or bottom 'N' analysis.
JOINER - joins two different sources coming from the same or different locations.
FILTER - filters the rows that do not meet the condition.
ROUTER - useful to test multiple conditions.
AGGREGATOR - performs group calculations such as COUNT, MAX, MIN, SUM and AVG (mainly calculations over multiple rows or groups).
NORMALIZER - reads COBOL files (de-normalized format) and splits a single row into multiple rows.
SOURCE QUALIFIER - represents flat-file or relational data; performs many tasks such as overriding the default SQL query, filtering records, and joining data from two or more tables.
UNION - merges data from multiple sources, similar to the UNION ALL SQL statement that combines the results of two or more SQL statements; like UNION ALL, it does not remove duplicate rows.
EXPRESSION - calculates values in a single row before you write to the target.
LOOKUP - looks up data in a flat file or a relational table, view, or synonym.
STORED PROCEDURE - calls stored procedures, which automate tasks that are too complicated for standard SQL statements.
XML SOURCE QUALIFIER - connected to every XML source definition added to a mapping.
UPDATE STRATEGY - flags rows for insert, delete, update, or reject.

DATA WAREHOUSE ARCHITECTURE

STAR SCHEMA

Star schema architecture is the simplest data warehouse design. The main feature of a star schema is a table at the center, called the fact table, surrounded by dimension tables which allow browsing of specific categories, summarizing, drill-downs and specifying criteria. Typically, most of the fact tables in a star schema are in database third normal form, while dimensional tables are de-normalized (second normal form).

Fact table

The fact table is not a typical relational database table, as it is de-normalized on purpose - to enhance query response times. The fact table typically contains records that are ready to explore, usually with ad hoc queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data warehouse environment. The primary key for the fact table is a composite of all the columns except the numeric values / scores (like QUANTITY, TURNOVER, exact invoice date and time). Typical fact tables in a global enterprise data warehouse are (apart from these, there may be some company- or business-specific fact tables):

• sales fact table - contains all details regarding sales
• orders fact table - in some cases the table can be split into open orders and historical orders; sometimes the values for historical orders are stored in a sales fact table
• budget fact table - usually grouped by month and loaded once at the end of a year
• forecast fact table - usually grouped by month and loaded daily, weekly or monthly
• inventory fact table - reports stocks, usually refreshed daily

Dimension table

Nearly all of the information in a typical fact table is also present in one or more dimension tables. The primary keys of each of the dimension tables are linked together to form the composite primary key of the fact table. In a star schema design, there is only one de-normalized table for a given dimension. The main purpose of maintaining dimension tables is to allow browsing the categories quickly and easily. Typical dimension tables in a data warehouse are:

• time dimension table
• customers dimension table
• products dimension table
• key account managers (KAM) dimension table
• sales office dimension table

Star schema example

An example of a star schema architecture is sketched below.
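To make the design concrete, here is a minimal ANSI SQL sketch of a star schema; the table and column names (d_time, d_product, f_sales) are illustrative assumptions, not a prescribed layout:

    -- Dimension tables: de-normalized, one table per dimension.
    CREATE TABLE d_time (
        time_id    INTEGER PRIMARY KEY,
        full_date  DATE,
        month_name VARCHAR(20),
        year_no    INTEGER
    );

    CREATE TABLE d_product (
        product_id   INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        brand_name   VARCHAR(100)  -- kept in-line; a snowflake design would move this out
    );

    -- Fact table: the primary key is a composite of the dimension keys,
    -- while the numeric measures (quantity, turnover) stay outside the key.
    CREATE TABLE f_sales (
        time_id    INTEGER REFERENCES d_time (time_id),
        product_id INTEGER REFERENCES d_product (product_id),
        quantity   INTEGER,
        turnover   DECIMAL(12,2),
        PRIMARY KEY (time_id, product_id)
    );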

SNOWFLAKE SCHEMA

Snowflake schema architecture is a more complex variation of a star schema design. The main difference is that dimensional tables in a snowflake schema are normalized, so they have a typical relational database design. Snowflake schemas are generally used when a dimensional table becomes very big and when a star schema cannot represent the complexity of a data structure. For example, if a PRODUCT dimension table contains millions of rows, the use of a snowflake schema should significantly improve performance by moving some data out to another table (with BRANDS, for instance). The problem is that the more normalized the dimension table is, the more complicated the SQL joins that must be issued to query it; in order for a query to be answered, many tables need to be joined and aggregates generated. An example of a snowflake schema architecture is sketched below.
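Reusing the illustrative names from the star schema sketch above, the PRODUCT/BRANDS split described in the text might look like this; again a hedged sketch, not a prescribed design:

    -- Brand attributes move out of the product dimension (normalization step).
    CREATE TABLE d_brand (
        brand_id   INTEGER PRIMARY KEY,
        brand_name VARCHAR(100)
    );

    -- Snowflaked product dimension: now a typical relational design.
    CREATE TABLE d_product_sf (
        product_id   INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        brand_id     INTEGER REFERENCES d_brand (brand_id)
    );

    -- The cost: one extra join whenever a query needs the brand.
    SELECT b.brand_name, SUM(f.turnover) AS total_turnover
    FROM f_sales f
    JOIN d_product_sf p ON p.product_id = f.product_id
    JOIN d_brand b ON b.brand_id = p.brand_id
    GROUP BY b.brand_name;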

GALAXY SCHEMA (FACT CONSTELLATION)

For each star schema or snowflake schema it is possible to construct a fact constellation schema. This schema is more complex than the star or snowflake architecture because it contains multiple fact tables, which allows dimension tables to be shared amongst them. In a fact constellation schema, different fact tables are explicitly assigned to the dimensions that are relevant for the given facts. This may be useful when some facts are associated with a given dimension level and other facts with a deeper dimension level. The main disadvantage of the fact constellation schema is a more complicated design, because many variants of aggregation must be considered. The solution is very flexible, but it may be hard to manage and support.

Use of this model is reasonable when, for example, there is a sales fact table (with details down to the exact date and invoice header id) and a fact table with a sales forecast which is calculated based on month, client id and product id. In that case, using two different fact tables on different levels of grouping is realized through a fact constellation model, as sketched below.
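A hedged SQL sketch of that sales/forecast example, with two fact tables at different grains sharing the product dimension (all names are illustrative):

    -- Detailed sales fact: down to the exact date and invoice header id.
    CREATE TABLE f_sales_detail (
        time_id    INTEGER REFERENCES d_time (time_id),
        product_id INTEGER REFERENCES d_product (product_id),
        invoice_id INTEGER,
        turnover   DECIMAL(12,2),
        PRIMARY KEY (time_id, product_id, invoice_id)
    );

    -- Forecast fact: coarser grain (month, client, product), but it shares
    -- the same product dimension as the detailed fact table.
    CREATE TABLE f_forecast (
        month_id   INTEGER,   -- coarser time key; could reference a month-level dimension
        client_id  INTEGER,
        product_id INTEGER REFERENCES d_product (product_id),
        forecast   DECIMAL(12,2),
        PRIMARY KEY (month_id, client_id, product_id)
    );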

ETL GLOSSARY

Source System - A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Staging Area - A place where data is processed before entering the warehouse.
Mapping - The definition of the relationship and data flow between source and target objects.
Meta data - Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as meta data, which is used to generate scripts that build and populate the data warehouse. A repository contains meta data.
Cleansing - The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Transformation - The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation - The process of moving copied or transformed data from a source to a data warehouse.
Target System - A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.

Source and Target

Consider a bank that has many branches throughout the world. In each branch, data may be stored in different source systems like Oracle, SQL Server, Teradata, etc. When the bank decides to integrate its data from several sources for its management decisions, it may choose one or more systems like Oracle, SQL Server, Teradata, etc. as its data warehouse target.

Figure 1.12: Sample ETL Process Flow (diagram not included here)

Informatica

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software. The important Informatica components are:

• Power Exchange
• Power Center
• Power Center Connect
• Power Channel
• Metadata Exchange
• Power Analyzer
• Super Glue

In Informatica, all the metadata information about source systems, target systems and transformations is stored in the Informatica repository. Informatica's Power Center Client and Repository Server access this repository to store and retrieve metadata.

Many organisations prefer Informatica for the ETL process, because Informatica is powerful in designing and building data warehouses.

Guidelines to work with Informatica Power Center

• Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server access this repository to retrieve, store and manage metadata.
• Repository Server: This server takes care of all the connections between the repository and the Power Center Client.
• Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions, running workflows, etc.
• Power Center Server: The Power Center server does the extraction from the sources and then loads the data into the targets. It can connect to several sources and targets to extract metadata from them.
• Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools that reside within the Designer wizard. Source Analyzer is used for extracting metadata from source systems. Mapping Designer is used to create mappings between sources and targets; a mapping is a pictorial representation of the flow of data from source to target. Warehouse Designer is used for extracting metadata from target systems, or the metadata can be created in the Designer itself.

• Transformation: Transformations help to transform the source data according to the requirements of the target system, and they ensure the quality of the data being loaded into the target. This is done during the mapping process from source to target. Sorting, filtering, aggregation and joining are some examples of transformations.
• Workflow Manager: A workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, the target system will raise an error, since the fact table violates foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.
• Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.
• Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming and standardizing address data. For example, a person's address may not be the same in all source systems because of typos, or the postal code and city name may not match the address. These errors can be corrected by the data cleansing process so that standardized data is loaded into the target systems (data warehouse).
• Power Center Connect: This component helps to extract data and metadata from ERP systems like IBM's MQSeries, Peoplesoft, SAP, Siebel, etc. and other third-party applications.

• Metadata Exchange: Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment when used with Power Center. For example, an organization may be using data modeling tools such as Erwin, Embarcadero, Oracle Designer, Sybase PowerDesigner, etc. for developing data models; the functional and technical teams will have spent much time and effort in creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which leverages that time and effort. There is no need for an Informatica developer to create these data structures once again.
• Power Exchange: Informatica Power Exchange, as a stand-alone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time and changed data capture options for mainframe (DB2, VSAM, IMS, etc.), mid-range (AS400 DB2, etc.) and relational databases (Oracle, SQL Server, DB2, etc.), and for flat files on UNIX, Linux and Windows systems.
• Power Channel: This helps to transfer large amounts of encrypted and compressed data over LAN or WAN, through firewalls, and to transfer files over FTP, etc.

• Power Analyzer: Power Analyzer provides organizations with reporting facilities. PowerAnalyzer makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers, enabling them to gain insight into business processes and develop business intelligence. With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage models. PowerAnalyzer works best with a dimensional data warehouse in a relational database, but it can also run reports on data in any relational table that does not conform to the dimensional model.
• Super Glue: Superglue is used for loading metadata into a centralized place from several sources. Reports can be run against Superglue to analyze the metadata.

Power Mart

Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for corporate enterprise data warehouses, while Power Mart is used for departmental data warehouses like data marts. Power Center supports global and networked repositories and can be connected to several sources; Power Mart supports a single repository and can be connected to fewer sources when compared to Power Center. Power Mart can grow extensibly to an enterprise implementation, and it supports developer productivity through a codeless environment.

Note: This is not a complete tutorial on Informatica. We will add more tips and guidelines on Informatica in the near future.

Informatica Transformations

In Informatica, transformations help to transform the source data according to the requirements of the target system, and they ensure the quality of the data being loaded into the target.

Transformations are of two types: Active and Passive.

Active Transformation: An active transformation can change the number of rows that pass through it from source to target, i.e. it can eliminate rows that do not meet the condition in the transformation.

Passive Transformation: A passive transformation does not change the number of rows that pass through it, i.e. it passes all rows through the transformation.

Transformations can also be Connected or Unconnected.

Connected Transformation: A connected transformation is connected to other transformations or directly to the target table in the mapping.

Unconnected Transformation: An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation and returns a value to that transformation.

List of Transformations

Following is the list of transformations available in PowerCenter:

• Aggregator Transformation
• Expression Transformation

• Filter Transformation
• Joiner Transformation
• Lookup Transformation
• Normalizer Transformation
• Rank Transformation
• Router Transformation
• Sequence Generator Transformation
• Stored Procedure Transformation
• Sorter Transformation
• Update Strategy Transformation
• XML Source Qualifier Transformation
• Advanced External Procedure Transformation
• External Procedure Transformation
• Union Transformation

In the following sections, we explain each of these Informatica transformations and their significance in the ETL process in detail.

Aggregator Transformation

Aggregator transformation is an Active and Connected transformation. It is useful for calculations such as averages and sums (mainly calculations over multiple rows or groups), for example to calculate the total of daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX and SUM can be used in the Aggregator transformation.

Expression Transformation

Expression transformation is a Passive and Connected transformation. It can be used to calculate values in a single row before writing to the target, for example to calculate the discount for each product, to concatenate first and last names, or to convert a date to a string field.

Filter Transformation

Filter transformation is an Active and Connected transformation. It can be used to filter out rows in a mapping that do not meet the condition, for example to find all the employees working in Department 10, or the products whose rate falls between $500 and $1000.
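Conceptually, these three transformations correspond to familiar SQL clauses: Filter to WHERE, Expression to a computed column, and Aggregator to GROUP BY with aggregate functions. A hedged sketch, assuming an illustrative sales table with sale_date, department_id, quantity and unit_price columns:

    SELECT
        sale_date,
        -- quantity * unit_price is an Expression-style row-level calculation
        SUM(quantity * unit_price) AS daily_total,    -- Aggregator: SUM over a group
        AVG(quantity * unit_price) AS daily_average,  -- Aggregator: AVG over a group
        COUNT(*)                   AS sale_count
    FROM sales
    WHERE department_id = 10                          -- Filter: failing rows are dropped
    GROUP BY sale_date;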

Joiner Transformation

Joiner transformation is an Active and Connected transformation. It can be used to join two sources coming from two different locations or from the same location, for example to join a flat file and a relational source, two flat files, or a relational source and an XML source. In order to join two sources, there must be at least one matching port, and while joining the two sources it is a must to specify one source as master and the other as detail. The Joiner transformation supports the following types of joins:

• Normal: discards all the rows of data from the master and detail sources that do not match, based on the condition.
• Master Outer: discards the unmatched rows from the master source, and keeps all the rows from the detail source plus the matching rows from the master source.
• Detail Outer: keeps all rows of data from the master source and the matching rows from the detail source; unmatched rows from the detail source are discarded.
• Full Outer: keeps all rows of data from both the master and detail sources.
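These four join types map closely onto standard SQL joins. A hedged sketch, assuming illustrative tables orders (designated as master) and customers (designated as detail):

    -- Normal join: unmatched rows from both sides are discarded.
    SELECT o.order_id, c.customer_name
    FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id;

    -- Master outer join: keeps all detail (customers) rows plus matching master rows.
    SELECT o.order_id, c.customer_name
    FROM orders o RIGHT OUTER JOIN customers c ON o.customer_id = c.customer_id;

    -- Detail outer join: keeps all master (orders) rows plus matching detail rows.
    SELECT o.order_id, c.customer_name
    FROM orders o LEFT OUTER JOIN customers c ON o.customer_id = c.customer_id;

    -- Full outer join: keeps all rows from both sides.
    SELECT o.order_id, c.customer_name
    FROM orders o FULL OUTER JOIN customers c ON o.customer_id = c.customer_id;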

Lookup Transformation

Lookup transformation is Passive, and it can be either Connected or Unconnected. It is used to look up data in a relational table, view, or synonym, and the lookup definition can be imported either from source or from target tables. For example, suppose we want to retrieve all the sales of a product with ID 10, and the sales data resides in another table. Instead of using the sales table as one more source, we can use a Lookup transformation to look up the data for the product with ID 10 in the sales table.

Differences between Connected and Unconnected Lookup transformations:

• A Connected lookup receives input values directly from the mapping pipeline, whereas an Unconnected lookup receives values from a :LKP expression in another transformation.
• A Connected lookup returns multiple columns from the same row, whereas an Unconnected lookup has one return port and returns one column from each row.
• A Connected lookup supports user-defined default values, whereas an Unconnected lookup does not.

Normalizer Transformation

Normalizer transformation is an Active and Connected transformation. It is used mainly with COBOL sources, where data is most often stored in de-normalized format. The Normalizer transformation can also be used to create multiple rows from a single row of data, as sketched below.
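What the Normalizer does can be pictured in plain SQL as pivoting repeating columns into rows. A hedged sketch, assuming an illustrative quarterly_sales table holding one de-normalized row per account (COBOL OCCURS-style):

    -- Input:  account_id | q1_sales | q2_sales | q3_sales | q4_sales  (one row)
    -- Output: one row per (account, quarter) - four rows from each input row.
    SELECT account_id, 1 AS quarter, q1_sales AS sales FROM quarterly_sales
    UNION ALL
    SELECT account_id, 2, q2_sales FROM quarterly_sales
    UNION ALL
    SELECT account_id, 3, q3_sales FROM quarterly_sales
    UNION ALL
    SELECT account_id, 4, q4_sales FROM quarterly_sales;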

Rank Transformation

Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data, for example to select the top 10 regions where the sales volume was highest, or the 10 lowest-priced products.

Router Transformation

Router transformation is an Active and Connected transformation. It is similar to the Filter transformation; the only difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet the condition. It is useful to test multiple conditions, and it has input, output and default groups. For example, if we want to split data by State=Michigan, State=California, State=New York and all other states, it is easy to route the data to different tables.

Sequence Generator Transformation

Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values, cycle through a sequential range of numbers, or replace missing keys. It has two output ports, CURRVAL and NEXTVAL, to connect to other transformations (you cannot add ports to this transformation). The NEXTVAL port generates a sequence of numbers when connected to a transformation or target; CURRVAL is NEXTVAL plus the Increment By value (by default, one).

Stored Procedure Transformation

Stored Procedure transformation is a Passive transformation, and it can be either Connected or Unconnected. It is useful to automate time-consuming tasks, and it is also used for error handling, dropping and recreating indexes, determining the space in a database, specialized calculations, etc. A stored procedure is an executable script with SQL statements, control statements, user-defined variables and conditional statements. The stored procedure must exist in the database before creating a Stored Procedure transformation, and it can exist in a source, target, or any database with a valid connection to the Informatica Server. The procedure is compiled and executed in the relational data source, and you need a database connection to import the stored procedure into your mapping.
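For a feel of what such a procedure looks like, here is a minimal sketch in generic SQL/PSM style; the exact syntax varies considerably between Oracle PL/SQL, SQL Server T-SQL and other engines, and the procedure name and logic are purely illustrative:

    -- A specialized calculation wrapped in a stored procedure.
    CREATE PROCEDURE get_discounted_price (
        IN  base_price   DECIMAL(10,2),
        IN  discount_pct DECIMAL(5,2),
        OUT final_price  DECIMAL(10,2)
    )
    BEGIN
        SET final_price = base_price * (1 - discount_pct / 100);
    END;

In a Connected Stored Procedure transformation, base_price and discount_pct would arrive on input ports and final_price would be returned on an output port.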

Sorter Transformation

Sorter transformation is a Connected and Active transformation. It allows data to be sorted in either ascending or descending order according to a specified field. It can also be configured for case-sensitive sorting and to specify whether the output rows should be distinct.

Source Qualifier Transformation

Source Qualifier transformation is an Active and Connected transformation. When adding a relational or flat-file source definition to a mapping, it is a must to connect it to a Source Qualifier transformation. The Source Qualifier performs various tasks such as overriding the default SQL query, filtering records, and joining data from two or more tables.

Update Strategy Transformation

Update Strategy transformation is an Active and Connected transformation. It is used to update data in the target table, either to maintain a history of the data or only the recent changes. You can specify how to treat source rows in the table: insert, update, delete, or data driven.
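The closest analogue in plain SQL to a data-driven update strategy is the MERGE statement (SQL:2003; support and syntax details vary by database). A hedged sketch with illustrative table names dim_customer and stg_customer:

    -- Rows that already exist in the target are updated; new rows are inserted.
    -- (Many dialects also allow a WHEN MATCHED ... THEN DELETE branch for deletes.)
    MERGE INTO dim_customer t
    USING stg_customer s
        ON (t.customer_id = s.customer_id)
    WHEN MATCHED THEN
        UPDATE SET customer_name = s.customer_name,
                   city          = s.city
    WHEN NOT MATCHED THEN
        INSERT (customer_id, customer_name, city)
        VALUES (s.customer_id, s.customer_name, s.city);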

XML Source Qualifier Transformation

XML Source Qualifier is a Passive and Connected transformation. It is used only with an XML source definition, and it represents the data elements that the Informatica Server reads when it executes a session with XML sources.

Advanced External Procedure Transformation

Advanced External Procedure transformation is an Active and Connected transformation. It operates in conjunction with procedures which are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful for creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.

External Procedure Transformation

External Procedure transformation is an Active transformation, and it can be Connected or Unconnected. Sometimes the standard transformations, such as the Expression transformation, may not provide the functionality that you want. In such cases an External Procedure is useful for developing complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.

Differences between Advanced External Procedure and External Procedure transformations:

• External Procedure returns a single value, whereas Advanced External Procedure returns multiple values.
• External Procedure supports COM and Informatica procedures, whereas Advanced External Procedure supports only Informatica procedures.

Union Transformation

Union transformation is used to merge multiple datasets from various streams or pipelines into one dataset. It works like UNION ALL: it does not remove any duplicate rows. It is recommended to use an Aggregator to remove duplicates if they are not expected at the target.
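In SQL terms, with illustrative pipeline tables sales_europe and sales_america, the behaviour is exactly that of UNION ALL, with an optional aggregation step for de-duplication:

    -- Union transformation: duplicates are kept.
    SELECT customer_id, turnover FROM sales_europe
    UNION ALL
    SELECT customer_id, turnover FROM sales_america;

    -- Aggregator-style de-duplication afterwards, if duplicates are not
    -- expected at the target.
    SELECT customer_id, turnover
    FROM (SELECT customer_id, turnover FROM sales_europe
          UNION ALL
          SELECT customer_id, turnover FROM sales_america) u
    GROUP BY customer_id, turnover;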
