Business Case: Why do we need ETL Tools?

Think of GE. The company has over 100 years of history and a presence in almost every industry. Over these years the company's management style has changed from bookkeeping to SAP. This transition did not happen in a single day. Along the way, the company used a wide array of technologies: hardware ranging from mainframes to PCs, data storage ranging from flat files to relational databases, and programming languages ranging from COBOL to Java. As a result, different businesses, or to be precise different sub-businesses within a business, ended up running different applications on different hardware and different architectures. Technologies were introduced as and when they were invented and as and when they were required. This led directly to a scenario in which the HR department runs on Oracle Applications, Finance runs on SAP, some parts of the process chain are supported by mainframes, some data is stored in Oracle, some on mainframes, some in VSAM files, and the list goes on. If one day the company requires a consolidated report of its assets, there are two ways to produce it.

The first is completely manual: generate different reports from the different systems and integrate them by hand. The second is to fetch all the data from the different systems and applications, build a Data Warehouse, and generate reports from it as required.

Obviously the second approach is the better one. Fetching the data from different systems, making it coherent, and loading it into a Data Warehouse requires some kind of extraction, cleansing, integration, and load. ETL stands for Extraction, Transformation and Load. ETL tools provide the facility to extract data from different non-coherent systems, cleanse it, merge it, and load it into target systems.

What is Informatica?

Informatica is a tool that supports all the steps of the Extraction, Transformation and Load process. Nowadays Informatica is also used as an integration tool.

Informatica is an easy-to-use tool. It has a simple visual interface, like forms in Visual Basic. You just drag and drop different objects (known as transformations) and design the process flow for data extraction, transformation and load. These process flow diagrams are known as mappings. Once a mapping is made, it can be scheduled to run as and when required. In the background, the Informatica server takes care of fetching data from the source, transforming it, and loading it into the target systems or databases.

Informatica can communicate with all major data sources (mainframe, RDBMS, flat files, XML, VSAM, SAP, etc.) and can move and transform data between them. It can move huge volumes of data very effectively, often better than bespoke programs written for one specific data movement. It can throttle transactions (do big updates in small chunks to avoid long locking and filling the transaction log). It can effectively join data from two distinct data sources (even an XML file can be joined with a relational table). In all, Informatica has the ability to integrate heterogeneous data sources effectively and convert raw data into useful information.
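
To make the extract, cleanse, merge and load steps concrete, here is a minimal sketch in plain Python. It is not Informatica and does not use any Informatica API. The two source extracts (an HR flat file and a finance XML document), the field names and the `assets` target table are all invented for illustration. The sketch joins a flat-file source with an XML source and loads the result in small committed batches, which is one simple way to picture the throttling idea described above.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source extracts; in practice these would come from separate systems.
HR_CSV = "emp_id,name\n1, alice \n2,BOB\n3,carol\n"
FINANCE_XML = """<assets>
  <asset emp_id="1" value="1200.50"/>
  <asset emp_id="2" value="300"/>
  <asset emp_id="2" value="99.5"/>
</assets>"""


def extract_employees(text):
    """Extract: read rows from a flat file (CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_assets(text):
    """Extract: read rows from an XML document."""
    root = ET.fromstring(text)
    return [dict(node.attrib) for node in root.iter("asset")]


def transform(employees, assets):
    """Cleanse the names, then join the two sources on emp_id."""
    names = {int(e["emp_id"]): e["name"].strip().title() for e in employees}
    rows = []
    for asset in assets:
        emp_id = int(asset["emp_id"])
        if emp_id in names:  # inner join: skip assets with no matching employee
            rows.append((emp_id, names[emp_id], float(asset["value"])))
    return rows


def load(conn, rows, batch_size=2):
    """Load in small batches, committing after each one to keep transactions short."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assets (emp_id INTEGER, name TEXT, value REAL)"
    )
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO assets VALUES (?, ?, ?)", batch)
        conn.commit()
        batches += 1
    return batches


def run_etl(conn):
    """Run extract, transform and load; return the number of committed batches."""
    rows = transform(extract_employees(HR_CSV), extract_assets(FINANCE_XML))
    return load(conn, rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print("batches committed:", run_etl(connection))
    report = connection.execute(
        "SELECT name, SUM(value) FROM assets GROUP BY name ORDER BY name"
    )
    for line in report:
        print(line)
```

The final query plays the role of the consolidated report: once the data from both systems sits in one target table, a single query answers a question that neither source could answer alone.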

Before we actually start working in Informatica, let's get an idea of the company that owns this wonderful product.

Some facts and figures about Informatica Corporation:

- Founded in 1993, based in Redwood City, California
- 1400+ employees, 3450+ customers, 79 of the Fortune 100 companies
- NASDAQ stock symbol: INFA; stock price: $18.74 (09/04/2009)
- Revenues in fiscal year 2008: $455.7M
- Informatica Developer Network: 20000 members

In short, Informatica is the world's leading ETL tool and it is rapidly gaining market share as an Enterprise Integration Platform.

Informatica PowerCenter is not just a tool but an end-to-end data processing and data integration environment. It enables organizations to collect, centrally process and redistribute data. It can be used simply to integrate two different systems, such as SAP and MQ Series, or to load data warehouses or Operational Data Stores (ODS). Informatica PowerCenter now also includes many add-on tools to report on the data being processed, the business rules applied, and the quality of the data before and after processing.

To facilitate this, PowerCenter is divided into different components:

PowerCenter Domain: As Informatica says, “The Power Center domain is the primary unit for management and administration within PowerCenter”. Doesn't make much sense? Right. So here is a simpler version. The PowerCenter domain is the collection of all the servers required to support PowerCenter functionality. Each domain has a gateway host (called the domain server). Whenever you want to use PowerCenter services, you send a request to the domain server. Based on the request type, it redirects your request to one of the PowerCenter services.

PowerCenter Repository: The repository is nothing but a relational database that stores all the metadata created in PowerCenter. Whenever you develop a mapping, session or workflow, execute them, or do anything meaningful (literally), entries are made in the repository.

Integration Service: The Integration Service does all the real work. It extracts data from sources, processes it according to the business logic, and loads data into targets.

Repository Service: The Repository Service is the one that understands the content of the repository. It fetches data from the repository and sends it back to the requesting components (mostly the client tools and the Integration Service).

PowerCenter Client Tools: The PowerCenter Client consists of multiple tools. They are used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The PowerCenter Client connects to the repository through the Repository Service to fetch details. It connects to the Integration Service to start workflows. So essentially the client tools are used to code and give instructions to the PowerCenter servers.

PowerCenter Administration Console: This is simply a web-based administration tool you can use to administer the PowerCenter installation.
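
The way the domain server redirects requests can be pictured with a small toy model. This is only a sketch of the idea in plain Python: the class names, the request dictionaries and the `wf_load_assets` workflow are hypothetical and do not correspond to any real PowerCenter interface.

```python
class RepositoryService:
    """Toy stand-in: the only component that reads and writes repository metadata."""

    def __init__(self):
        self._metadata = {}  # stands in for the relational repository database

    def handle(self, request):
        if request["action"] == "save":
            self._metadata[request["name"]] = request["definition"]
            return "saved"
        return self._metadata[request["name"]]


class IntegrationService:
    """Toy stand-in: runs a workflow after asking the Repository Service for it."""

    def __init__(self, repository_service):
        self._repo = repository_service

    def handle(self, request):
        definition = self._repo.handle({"action": "fetch", "name": request["name"]})
        return f"ran {request['name']}: {definition}"


class DomainGateway:
    """Receives every request and redirects it according to the request type."""

    def __init__(self):
        repo = RepositoryService()
        self._services = {
            "repository": repo,
            "integration": IntegrationService(repo),
        }

    def send(self, request):
        return self._services[request["type"]].handle(request)


gateway = DomainGateway()

# A client tool saves a workflow definition: the gateway routes it to the repository side.
gateway.send({
    "type": "repository",
    "action": "save",
    "name": "wf_load_assets",
    "definition": "extract -> cleanse -> load",
})

# A client tool starts the workflow: the gateway routes it to the integration side.
result = gateway.send({"type": "integration", "name": "wf_load_assets"})
print(result)
```

The caller only ever talks to the gateway. Which service does the work is decided by the request type, which mirrors the description of the domain server above.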

There are some more not-so-essential-to-know components, discussed below:

- Web Services Hub: The Web Services Hub exposes PowerCenter functionality to external clients through web services.
- SAP BW Service: The SAP BW Service extracts data from and loads data to SAP BW.
- Data Analyzer: Data Analyzer is like a reporting layer for performing analytics on data warehouse or ODS data.
- Metadata Manager: Metadata Manager is a metadata management tool that you can use to browse and analyze metadata from disparate metadata repositories. It shows how the data is acquired, what business rules are applied, and where data is populated, in readable reports.
- PowerCenter Repository Reports: PowerCenter Repository Reports are a set of prepackaged Data Analyzer reports and dashboards to help you analyze and manage PowerCenter metadata.

Informatica Software Architecture illustrated

Informatica's ETL product, known as Informatica PowerCenter, consists of 3 main components.

1. Informatica PowerCenter Client Tools: These are the development tools installed at the developer end. These tools enable a developer to

- Define the transformation process, known as a mapping (Designer)
- Define run-time properties for a mapping, known as sessions (Workflow Manager)
- Monitor execution of sessions (Workflow Monitor)
- Manage the repository, useful for administrators (Repository Manager)
- Report metadata (Metadata Reporter)

2. Informatica PowerCenter Repository: The repository is the heart of the Informatica tools. It is a kind of data inventory where all the data related to mappings, sources, targets, etc. is kept. This is the place where all the metadata for your application is stored. All the client tools and the Informatica Server fetch data from the repository. An Informatica client and server without a repository are the same as a PC without memory or a hard disk: it has the ability to process data but has no data to process. The repository can be treated as the backend of Informatica.

3. Informatica PowerCenter Server: The server is the place where all the executions take place. The server makes physical connections to sources and targets, fetches data, applies the transformations mentioned in the mapping, and loads the data into the target system.

The architecture diagram lists the following sources and targets:

Sources
- Standard: RDBMS, Flat Files, XML, ODBC
- Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2
- EAI: MQ Series, Tibco, JMS, Web Services
- Legacy: Mainframes (DB2, VSAM, IMS, IDMS, Adabas), AS400 (DB2, Flat File)
- Remote Sources

Targets
- Standard: RDBMS, Flat Files, XML, ODBC
- Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2
- EAI: MQ Series, Tibco, JMS, Web Services
- Legacy: Mainframes (DB2), AS400 (DB2)
- Remote Targets

This is sufficient knowledge to start with Informatica. So let's go straight to development in Informatica.
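
The split between the three components can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how PowerCenter stores or runs mappings: the `mappings` table, the JSON layout, the `OPERATIONS` table and the function names are all hypothetical. The point is only that the client writes metadata, the repository holds it in a relational database, and the server reads that metadata and applies it to real data.

```python
import json
import sqlite3

# The "repository": a relational database that holds only metadata, never business data.
repository = sqlite3.connect(":memory:")
repository.execute("CREATE TABLE mappings (name TEXT PRIMARY KEY, definition TEXT)")


def client_save_mapping(name, definition):
    """Client side: store the mapping definition as metadata in the repository."""
    repository.execute(
        "INSERT OR REPLACE INTO mappings VALUES (?, ?)",
        (name, json.dumps(definition)),
    )
    repository.commit()


# Hypothetical operations the toy server knows how to apply to a field.
OPERATIONS = {"upper": str.upper, "strip": str.strip}


def server_run_mapping(name, source_rows):
    """Server side: fetch the definition, apply each step, return rows for the target."""
    row = repository.execute(
        "SELECT definition FROM mappings WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        # A server without repository metadata has nothing to execute.
        raise KeyError(f"no mapping named {name!r} in the repository")
    definition = json.loads(row[0])
    target_rows = []
    for source in source_rows:
        out = dict(source)
        for step in definition["steps"]:
            out[step["field"]] = OPERATIONS[step["op"]](out[step["field"]])
        target_rows.append(out)
    return target_rows


client_save_mapping(
    "m_clean_names",
    {"steps": [{"field": "name", "op": "strip"}, {"field": "name", "op": "upper"}]},
)
loaded = server_run_mapping("m_clean_names", [{"id": 1, "name": "  alice "}])
print(loaded)
```

Asking the toy server to run a mapping that was never saved raises an error, which is the PC-without-a-hard-disk situation described above: the processing ability is there, but there is nothing to process.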

Informatica Product Line

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software. The important products provided by Informatica Corporation are listed below:

- Power Center
- Power Mart
- Power Exchange
- Power Center Connect
- Power Channel
- Metadata Exchange
- Power Analyzer
- Super Glue

Power Center & Power Mart: Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for corporate enterprise data warehouses, and Power Mart is used for departmental data warehouses such as data marts. Power Center supports global repositories and networked repositories and can be connected to several sources. Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow into an enterprise implementation, and its codeless environment helps developer productivity.

Power Exchange: Informatica Power Exchange, as a standalone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time and changed data capture options on mainframe (DB2, VSAM, IMS, etc.), on mid-range (AS400 DB2, etc.), and for relational databases (Oracle, SQL Server, DB2, etc.) and flat files on Unix, Linux and Windows systems.

Power Center Connect: This is an add-on to Informatica Power Center. It helps to extract data and metadata from ERP and messaging systems such as IBM's MQSeries, PeopleSoft, SAP and Siebel, and from other third-party applications.

Power Channel: This helps to transfer large amounts of encrypted and compressed data over LAN and WAN, through firewalls, to transfer files over FTP, etc.

Metadata Exchange: When used with Power Center, Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment. For example, an organization may be using data modelling tools such as Erwin, Embarcadero, Oracle Designer or Sybase Power Designer for developing data models. The functional and technical teams will have spent much time and effort creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which saves time and effort. There is no need for the Informatica developer to create these data structures once again.

Power Analyzer: Power Analyzer provides organizations with reporting facilities. It makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers. It enables them to gain insight into business processes and develop business intelligence. With Power Analyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage model. Power Analyzer works best with a dimensional data warehouse in a relational database. It can also run reports on data in any table in a relational database that does not conform to the dimensional model.

Super Glue: Super Glue is used for loading metadata into a centralized place from several sources. Reports can be run against Super Glue to analyze the metadata.

Note: This is not a complete tutorial on Informatica. We will add more tips and guidelines on Informatica in the near future. To know more about Informatica, visit its official website.

Informatica Transformations

A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Transformations can be of two types:

Active Transformation
An active transformation can change the number of rows that pass through the transformation, change the transaction boundary, or change the row type. For example, Filter, Transaction Control and Update Strategy are active transformations.

The key point to note is that the Designer does not allow you to connect multiple active transformations, or an active and a passive transformation, to the same downstream transformation or transformation input group, because the Integration Service may not be able to concatenate the rows passed by active transformations. However, the Sequence Generator transformation (SGT) is an exception to this rule. An SGT does not receive data. It generates unique numeric values. As a result, the Integration Service does not encounter problems concatenating rows passed by an SGT and an active transformation.

Passive Transformation
A passive transformation does not change the number of rows that pass through it, maintains the transaction boundary, and maintains the row type.

The key point to note is that the Designer allows you to connect multiple transformations to the same downstream transformation or transformation input group only if all transformations in the upstream branches are passive. The transformation that originates the branch can be active or passive.

Transformations can be Connected or Unconnected to the data flow.

Connected Transformation
A connected transformation is connected to other transformations or directly to the target table in the mapping.

Unconnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.
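
The difference between active and passive transformations is easiest to see by counting rows. The sketch below uses plain Python functions as stand-ins. None of these names are Informatica objects, and the order data and tax rates are made up for illustration. A filter and an aggregator change the row count (active), an expression-style step does not (passive), a lookup is called from inside another step and returns one value (unconnected), and a sequence generator receives no rows at all.

```python
def filter_rows(rows, predicate):
    """Active: may pass fewer rows than it receives."""
    return [row for row in rows if predicate(row)]


def add_tax(rows, rate_lookup):
    """Passive: one output row per input row; it only adds a column.

    rate_lookup is not wired into the row flow. It is called from inside this
    function and returns a single value, the way an unconnected transformation
    is called from within another transformation.
    """
    return [
        dict(row, tax=round(row["amount"] * rate_lookup(row["region"]), 2))
        for row in rows
    ]


def aggregate_by_region(rows):
    """Active: collapses many rows into one row per group."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return [{"region": region, "total": total} for region, total in totals.items()]


def sequence(start=1):
    """Receives no rows; it only generates unique numeric values."""
    value = start
    while True:
        yield value
        value += 1


TAX_RATES = {"EU": 0.2, "US": 0.1}  # hypothetical lookup data

orders = [
    {"region": "EU", "amount": 100},
    {"region": "US", "amount": 50},
    {"region": "EU", "amount": 0},
    {"region": "US", "amount": 25},
]

kept = filter_rows(orders, lambda row: row["amount"] > 0)  # 4 rows in, 3 rows out
taxed = add_tax(kept, TAX_RATES.get)                       # 3 rows in, 3 rows out
totals = aggregate_by_region(kept)                         # 3 rows in, 2 rows out

# Generated keys can be paired with the output of an active step because the
# generator has no rows of its own that would need to line up.
keys = sequence()
keyed = [dict(row, key=next(keys)) for row in totals]

print(len(orders), len(kept), len(taxed), len(totals))
print(keyed)
```

Trying to pair the three rows in `taxed` with the two rows in `totals` column by column has no sensible answer, which is a rough picture of why rows coming from two active branches cannot simply be concatenated.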

Informatica Certification Information

If you have wide and good hands-on experience in Informatica and you want to grow in the field of data integration, then Informatica Certification will help you achieve this. There are a variety of certifications available from Informatica.

Various Certifications from Informatica

(1) Informatica Certified Administrator
For PowerCenter administrators, testers and project managers.
Requirements: attain a passing score (70 percent or higher) on two exams:
- Architecture and Administration
- Advanced Administration

(2) Informatica Certified Designer
For PowerCenter transformation, mapping and mapplet developers.
Requirements: attain a passing score on three exams:
- Architecture and Administration
- Mapping Design
- Advanced Mapping Design

(3) Informatica Certified Consultant
For PowerCenter experts.
Requirements: attain a passing score on five exams:
- Architecture and Administration
- Mapping Design
- Advanced Administration
- Advanced Mapping Design
- Enablement Technologies

There are various tracks available for the above certifications, depending on the version of PowerCenter you are using:

- PowerCenter 5
- PowerCenter 6
- PowerCenter 7

However, candidates who earn titles in PowerCenter 5 can upgrade the certification to PowerCenter 6 by taking one PowerCenter 6 upgrade examination. Similarly, a title in PowerCenter 6 can be upgraded to PowerCenter 7.

Following is the list of exams available under the above-mentioned tracks.

PowerCenter 5 Track
- Exam A: PowerCenter 5 Architecture and Administration
- Exam B: PowerCenter 5 Mapping Design
- Exam C: PowerCenter 5 Advanced Administration
- Exam D: PowerCenter 5 Advanced Mapping Design
- Exam E: Enablement Technologies

PowerCenter 6 Track
- Exam G: PowerCenter 6 Architecture and Administration
- Exam H: PowerCenter 6 Mapping Design
- Exam J: PowerCenter 6 Advanced Administration
- Exam I: PowerCenter 6 Advanced Mapping Design
- Exam E: Enablement Technologies

PowerCenter 7 Track
- Exam M: PowerCenter 7 Architecture and Administration
- Exam N: PowerCenter 7 Mapping Design
- Exam O: PowerCenter 7 Advanced Administration
- Exam P: PowerCenter 7 Advanced Mapping Design
- Exam E: Enablement Technologies

Update Exams
- Exam F: PowerCenter 6 Update
- Exam L: PowerCenter 7 Update
- Exam Q: PowerCenter 8 Update

Some Important Links:
- Search Informatica Database for the list of Certified professionals
- List of Certifications available from Informatica
