Data warehousing is the end-to-end process of extracting, transforming, and loading data into the warehouse, and of making that data accessible to end users and applications.
A data mart stores data for a limited number of subject areas, such as marketing and sales data. It is used to support specific applications. An independent data mart is created directly from source systems. A dependent data mart is populated from a data warehouse.
[Architecture diagram: operational sources (transaction data, IBM IMS, VSAM, Oracle, ERP, web data, other internal data) feed a staging area and operational data store, where data is cleaned/scrubbed and transformed (e.g. with FirstLogic); ETL tools (Informatica, DataStage, Teradata/IBM load utilities) populate the data warehouse and downstream data marts (Finance, Marketing, Sales); metadata supports access by operational personnel through data analysis tools and applications (queries, reporting, DSS/EIS, data mining) such as SQL, Cognos, MicroStrategy, Business Objects, Essbase, SAS, Siebel, Microsoft tools, and web browsers.]
Need For ETL Tool
• Often performed by hand-written COBOL routines (not recommended because of high program maintenance and no automatically generated metadata)
• Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because "dirty data" in the source systems is copied as-is)
• Increasingly performed by specialized ETL software
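The extract-transform-load pattern the tools above automate can be shown in miniature. The following Python sketch is purely illustrative (the data, field names, and functions are invented for this example, not part of any ETL product): it extracts rows from raw CSV text, cleans and types them, and loads them into an in-memory target.

```python
import csv
import io

# Hypothetical source extract: raw CSV text as it might arrive from a source system.
raw = "cust_id,name,amount\n1, alice ,100\n2,BOB,250\n"

def extract(text):
    """Extract: parse the raw source into dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(row):
    """Transform: clean/scrub (trim whitespace, standardize case) and cast types."""
    return {
        "cust_id": int(row["cust_id"]),
        "name": row["name"].strip().title(),
        "amount": float(row["amount"]),
    }

def load(rows, target):
    """Load: append transformed rows to the target table."""
    target.extend(rows)

target_table = []
load([transform(r) for r in extract(raw)], target_table)
```

A real ETL tool adds exactly what this sketch lacks: generated metadata about sources, targets, and transformations, plus scheduling, logging, and restartability.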
Sample ETL Tools
• DataStage from Ascential Software
• SAS System from SAS Institute
• Informatica
• Data Integrator from BO (Business Objects)
• Hummingbird Genio Suite from Hummingbird Communications
• Oracle Express
• Ab Initio
• DecisionStream from Cognos
• MS DTS from Microsoft
Components of Informatica
• Repository Manager
• Designer
• Workflow Manager
Informatica provides the following integrated components:
• Informatica repository. The Informatica repository is at the center of the Informatica suite. You create a set of metadata tables within the repository database that the Informatica applications and tools access. The Informatica Client and Server access the repository to save and retrieve metadata.
• Informatica Client. Use the Informatica Client to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create sessions to run the mapping logic. The Informatica Client has three client applications: Repository Manager, Designer, and Workflow Manager.
• Informatica Server. The Informatica Server extracts the source data, performs the data transformation, and loads the transformed data into the targets.
Process Flow
The Informatica Server moves data from source to target based on the workflow and the metadata stored in the repository. A workflow is a set of instructions that tells the server how and when to run the tasks related to ETL; the server runs a workflow according to the conditional links connecting its tasks. A session is a type of workflow task that describes how to move data between a source and a target using a mapping. A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation.
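The idea of tasks connected by conditional links can be sketched in a few lines of Python. This is a toy model, not the Workflow Manager's actual behavior: the task names and the single previous-status condition are invented for illustration.

```python
# A toy workflow runner: each link carries a condition evaluated against the
# status of the previous task, a simplification of conditional links.

def session_task():
    # A real session would move data from source to target via a mapping.
    return "SUCCEEDED"

def email_task():
    # A notification task, run only if the link condition allows it.
    return "SUCCEEDED"

workflow = [
    (session_task, lambda prev: True),                # start task: always runs
    (email_task, lambda prev: prev == "SUCCEEDED"),   # link: run only on success
]

def run_workflow(tasks):
    prev_status, executed = None, []
    for task, condition in tasks:
        if condition(prev_status):     # evaluate the conditional link
            prev_status = task()
            executed.append(task.__name__)
    return executed

executed = run_workflow(workflow)
```

If the session failed, the link condition would evaluate false and the downstream task would be skipped, which is the behavior the slide describes.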
Sources
Power Mart and Power Center access the following sources:
• Relational. Oracle, Sybase, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
• File. Fixed and delimited flat file, COBOL file, and XML.
• Extended. If you use Power Center, you can purchase additional Power Connect products to access business sources such as PeopleSoft, SAP R/3, Siebel, and IBM MQSeries.
• Mainframe. If you use Power Center, you can purchase Power Connect for IBM DB2 for faster access to IBM DB2 on MVS.
• Other. Microsoft Excel and Access.
Targets
Power Mart and Power Center can load data into the following targets:
• Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
• File. Fixed and delimited flat files and XML.
• Extended. If you use Power Center, you can purchase an integration server to load data into SAP BW. You can also purchase Power Connect for IBM MQSeries to load data into IBM MQSeries message queues.
• Other. Microsoft Access.
You can load data into targets using ODBC or native drivers, FTP, or external loaders.
General Flow of Informatica
Step 1: Create the repository: create folders, create users, and assign permissions in the Repository Manager so they can work in the client tools.
Step 2: Connect to the repository from the Designer, import source and target tables, and create mappings.
Step 3: Create a workflow in the Workflow Manager, with its different tasks connected between them. In that workflow, a session is the task that points to a mapping created in the Designer.
Repository
The Informatica repository is a set of tables that stores the metadata you create using the Informatica Client tools. You create a database for the repository, and then use the Repository Manager to create the metadata tables in the database. You add metadata to the repository tables when you perform tasks in the Informatica Client application such as creating users, analyzing sources, developing mappings or mapplets, or creating sessions. The Informatica Server reads metadata created in the Client application when you run a session. The Informatica Server also creates metadata such as start and finish times of a session or session status.
Repository Contd..
When you use Power Center, you can develop global and local repositories to share metadata:
• Global repository. The global repository is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These objects may include operational or application source definitions, reusable transformations, mapplets, and mappings.
• Local repositories. A local repository is within a domain that is not the global repository. Use local repositories for development. From a local repository, you can create shortcuts to objects in shared folders in the global repository. These objects typically include source definitions, common dimensions and lookups, and enterprise standard transformations. You can also create copies of objects in non-shared folders.
Repository Architecture
Repository Client -> Repository Server -> Repository Agent -> Repository Database
Creating a Repository
To create a repository:
1. Launch the Repository Manager by choosing Programs-Power Center (or Power Mart) Client-Repository Manager from the Start Menu.
2. In the Repository Manager, choose Repository-Create Repository.
3. In the Create Repository dialog box, specify the name of the new repository, as well as the parameters needed to connect to the repository database through ODBC.
Note: You must be running the Repository Manager in Administrator mode to see the Create Repository option on the menu. Administrator mode is the default when you install the program.
Working with Repository
By default, two groups are created in the repository: Public and Administrators. By default, two users are created: Administrator and the database user used to connect to the repository. These groups and users cannot be deleted from the repository. The Administrators group has only read privilege for other user groups.
Working with Repository contd..
Informatica tools include two basic types of security:
• Privileges. Repository-wide security that controls which task or set of tasks a single user or group of users can access. You can perform various tasks for each privilege. Examples of these are Use Designer, Browse Repository, Session Operator, etc.
• Permissions. Security assigned to individual folders within the repository. Ex: Read, Write, and Execute.
Folders
Folders provide a way to organize and store all metadata in the repository, including mappings, schemas, and sessions. Folders are designed to be flexible, to help you organize your data warehouse logically. Each folder has a set of properties you can configure to define how users access the folder. For example, you can create a folder that allows all repository users to see objects within the folder, but not to edit them. Or you can create a folder that allows users to share objects within the folder.
Shared Folders
When you create a folder, you can configure it as a shared folder. Shared folders allow users to create shortcuts to objects in the folder. If you have a reusable transformation that you want to use in several mappings or across multiple folders, you can place the object in a shared folder. For example, you may have a reusable Expression transformation that calculates sales commissions. You can then use the object in other folders by creating a shortcut to the object.
Folder Permissions
Permissions allow repository users to perform tasks within a folder. With folder permissions, you can control user access to the folder and the tasks you permit them to perform. Folder permissions work closely with repository privileges: privileges grant access to specific tasks, while permissions grant access to specific folders with read, write, and execute qualifiers. However, any user with the Super User privilege can perform all tasks across all folders in the repository. Folders have the following types of permissions:
• Read permission. Allows you to view the folder as well as objects in the folder.
• Write permission. Allows you to create or edit objects in the folder.
• Execute permission. Allows you to execute or schedule a session or batch in the folder.
Creating Folders .
Other Features of Repository Manager
• Viewing and removing locks
• Adding a repository
• Backup and recovery of the repository
• Metadata reports, such as completed session details and lists of reports on jobs, sessions, workflows, etc.
Questions/Comments ? .
Working with Designer
• Connect to the repository using a user id and password.
• Access the folder.
• Import the source and target tables required for the mapping.
• Create the mapping.
Tools provided by Designer
• Source Analyzer: Import source definitions for flat file, XML, COBOL, and relational sources.
• Warehouse Designer: Import or create target definitions.
• Transformation Developer: Create reusable transformations.
• Mapplet Designer: Create mapplets.
• Mapping Designer: Create mappings.
Importing Sources .
Import from Database Use ODBC connection for importing from database .
Import from File .
Creating Targets
You can create target definitions in the Warehouse Designer for file and relational sources. Create definitions in the following ways:
• Import the definition for an existing target. Import the target definition from a relational target.
• Create a target definition based on a source definition. Drag one of the following existing source definitions into the Warehouse Designer to make a target definition:
  o Relational source definition
  o Flat file source definition
  o COBOL source definition
• Manually create a target definition. Create and design a target definition in the Warehouse Designer.
Creating targets .
Creation of simple mapping .
Creation of simple mapping Contd..
Switch to the Mapping Designer. Choose Mappings-Create. In the Mapping Name dialog box, enter <Mapping Name> as the name of the new mapping and click OK. The naming convention for mappings is m_MappingName. In the Navigator, under the <Repository Name> repository and <Folder Name> folder, click the Sources node to view source definitions added to the repository. While the workspace may appear blank, in fact it contains a new mapping without any sources, targets, or transformations.
Mapping creation Contd..
Click the icon representing the EMPLOYEES source and drag it into the workspace.
Mapping creation Contd..
The source definition appears in the workspace, and the Designer automatically connects a Source Qualifier transformation to the source definition. Click the Targets icon in the Navigator to open the list of all target definitions. Click and drag the icon for the T_EMPLOYEES target into the workspace. The target definition appears. After you add the target definition, the final step is connecting the Source Qualifier to this target definition.
Mapping creation Contd..
To connect the Source Qualifier to the target definition: Click once in the middle of the <Column Name> in the Source Qualifier. Hold down the mouse button and drag the cursor to the <Column Name> in the target. Then release the mouse button. An arrow (called a connector) now appears between the two columns.
Transformations
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. Data passes into and out of transformations through ports that you connect in a mapping or mapplet. Transformations can be active or passive.
Transformations
Active transformations:
• Aggregator: performs aggregate calculations
• Filter: serves as a conditional filter
• Router: serves as a conditional filter (more than one filter)
• Joiner: allows for heterogeneous joins
• Source Qualifier: represents all data queried from the source
• Update Strategy: allows for logic to insert, update, delete, or reject data
Passive transformations:
• Expression: performs simple calculations
• Lookup: looks up values and passes them to other objects
• Sequence Generator: generates unique ID values
• Stored Procedure: calls a stored procedure and captures return values
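The active/passive distinction can be made concrete with a short Python sketch. This is not Informatica code; the row data and function names are invented for illustration. The key point: an active transformation can change the number of rows passing through it, while a passive one cannot.

```python
rows = [{"dept": "SALES", "salary": 1000},
        {"dept": "HR", "salary": 800},
        {"dept": "SALES", "salary": 1200}]

# Active: a Filter-like step can change the number of rows.
def filter_rows(rows, condition):
    return [r for r in rows if condition(r)]

# Passive: an Expression-like step changes values but never the row count.
def expression(rows, fn):
    return [fn(dict(r)) for r in rows]

# Active: an Aggregator-like step collapses many rows into group totals.
def aggregate(rows, key, field):
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[field]
    return totals

sales = filter_rows(rows, lambda r: r["dept"] == "SALES")          # 3 rows -> 2
raised = expression(sales, lambda r: {**r, "salary": r["salary"] * 1.1})
totals = aggregate(rows, "dept", "salary")                          # 3 rows -> 2 groups
```

Filter and Aggregator reduce three rows to two; the Expression step emits exactly one output row per input row.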
Transformations Contd..
• Create the transformation. Create it in the Mapping Designer as part of a mapping, in the Mapplet Designer as part of a mapplet, or in the Transformation Developer as a reusable transformation.
• Configure the transformation. Each type of transformation has a unique set of options that you can configure.
• Connect the transformation to other transformations and target definitions. Drag one port to another to connect them in the mapping or mapplet.
Expression Transformation
You can use the Expression transformation to calculate values in a single row before you write to the target. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.
Expression Transformation: Calculating Values
To use the Expression transformation to calculate values for a single row, you must include the following ports:
• Input or input/output ports for each value used in the calculation. For example, when calculating the total price for an order, determined by multiplying the unit price by the quantity ordered, one port provides the unit price and the other provides the quantity ordered.
• Output port for the expression. You enter the expression as a configuration option for the output port. The return value for the output port needs to match the return value of the expression.
• Variable port. A variable port is used like a local variable inside the Expression transformation, and its value can be used in other calculations.
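The total-price example above can be sketched as a plain Python function. This is an analogy, not Informatica expression syntax; the discount logic is an invented example of why a variable port (a reusable intermediate value) is handy.

```python
def expression_transform(row):
    """Analogy for an Expression transformation acting on one row.

    Input ports: UNIT_PRICE, QUANTITY.
    Variable port: v_discount, an intermediate value reusable in other calculations.
    Output port: TOTAL.
    """
    v_discount = 0.05 if row["QUANTITY"] >= 10 else 0.0   # variable port (assumed rule)
    total = row["UNIT_PRICE"] * row["QUANTITY"] * (1 - v_discount)
    return {**row, "TOTAL": round(total, 2)}

small_order = expression_transform({"UNIT_PRICE": 9.99, "QUANTITY": 3})
bulk_order = expression_transform({"UNIT_PRICE": 10.0, "QUANTITY": 10})
```

Like the real transformation, the function computes its output from values in a single row only; anything spanning multiple rows would need an Aggregator instead.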
Source Qualifier Transformation
Every mapping includes a Source Qualifier transformation, representing all the columns of information read from a source and temporarily stored by the Informatica Server. In addition, you can add transformations that modify information before it reaches the target, such as calculating a sum, looking up a value, or generating a unique ID.
Source Qualifier Transformation
When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the records that the Informatica Server reads when it runs a session. You can use the Source Qualifier to perform the following tasks:
• Join data originating from the same source database. You can join two or more tables with primary-foreign key relationships by linking the sources to one Source Qualifier.
• Filter records when the Informatica Server reads source data. If you include a filter condition, the Informatica Server adds a WHERE clause to the default query.
• Specify an outer join rather than the default inner join. If you include a user-defined join, the Informatica Server replaces the join information specified by the metadata in the SQL query.
• Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.
• Select only distinct values from the source. If you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
• Create a custom query to issue a special SELECT statement for the Informatica Server to read source data. For example, you might use a custom query to perform aggregate calculations or execute a stored procedure.
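How these options shape the generated SELECT can be illustrated with a small string-building sketch. This is not Informatica's actual SQL generation code; the function and its parameters are invented to mirror the options listed above.

```python
def build_query(table, columns, source_filter=None, select_distinct=False,
                sorted_ports=0):
    """Assemble a default-style query from Source Qualifier-like options."""
    select = "SELECT DISTINCT" if select_distinct else "SELECT"
    sql = f"{select} {', '.join(columns)} FROM {table}"
    if source_filter:
        # Filter condition becomes a WHERE clause on the default query.
        sql += f" WHERE {source_filter}"
    if sorted_ports:
        # ORDER BY uses the first N ports, counting from the top of the qualifier.
        sql += f" ORDER BY {', '.join(columns[:sorted_ports])}"
    return sql

q = build_query("EMPLOYEES", ["EMP_ID", "NAME", "DEPT"],
                source_filter="DEPT = 'SALES'", sorted_ports=2)
```

With a filter and two sorted ports, the sketch yields a query with both a WHERE clause and an ORDER BY over the first two columns, matching the behavior the bullets describe.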
• . click Rename.Configuring Source Qualifier Transformation To configure a Source Qualifier: • • • In the Designer. In the Edit Transformations dialog box. Click the Properties tab.. and click OK. The naming convention for Source Qualifier transformations is SQ_TransformationName. Double-click the title bar of the Source Qualifier. enter a descriptive name for the transformation. open a mapping.
Configuring Source Qualifier
Option: Description
SQL Query: Defines a custom query that replaces the default query the Informatica Server uses to read data from sources represented in this Source Qualifier.
User-Defined Join: Specifies the condition used to join data from multiple sources represented in the same Source Qualifier transformation.
Source Filter: Specifies the filter condition the Informatica Server applies when querying records.
Number of Sorted Ports: Indicates the number of columns used when sorting records queried from relational sources. When selected, the Informatica Server adds an ORDER BY to the default query when it reads source records. The ORDER BY includes the number of ports specified, starting from the top of the Source Qualifier. If you select this option, the database sort order must match the session sort order.
Select Distinct: Specifies if you want to select only unique records. The Informatica Server includes a SELECT DISTINCT statement if you choose this option.
Tracing Level: Sets the amount of detail included in the session log when you run a session containing this transformation.
Joiner Transformation
While a Source Qualifier transformation can join data originating from a common source database, the Joiner transformation joins two related heterogeneous sources residing in different locations or file systems. If two relational sources contain keys, then a Source Qualifier transformation can easily join the sources on those keys; Joiner transformations typically combine information from two different sources that do not have matching keys, such as flat file sources. The combination of sources can be varied. You can use the following sources:
• Two relational tables existing in separate databases
• Two flat files in potentially different file systems
• Two different ODBC sources
• Two instances of the same XML source
• A relational table and a flat file source
• A relational table and an XML source
The Joiner transformation allows you to join sources that contain binary data.
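A heterogeneous join of the kind described above can be sketched in Python: one source is a flat file (CSV text) and the other an in-memory "relational" table. The data and the simplified master/detail cache are invented for illustration; a real Joiner builds an index cache over the master source, then streams the detail rows.

```python
import csv
import io

# Detail source: a flat file. Master source: rows from a "relational" table.
flat_file = "ORDER_ID,CUST_ID,AMOUNT\n1,10,99.5\n2,11,25.0\n3,99,5.0\n"
customers = [{"CUST_ID": 10, "NAME": "Acme"}, {"CUST_ID": 11, "NAME": "Globex"}]

def joiner(master_rows, detail_rows, key, join_type="NORMAL"):
    cache = {r[key]: r for r in master_rows}   # master rows are cached first
    joined = []
    for d in detail_rows:                      # detail rows stream through
        m = cache.get(d[key])
        if m is not None:
            joined.append({**m, **d})          # matched rows: normal join output
        elif join_type == "MASTER_OUTER":
            # master outer join: keep unmatched detail rows, null-fill master fields
            joined.append({**{k: None for k in master_rows[0]}, **d})
    return joined

details = [{**r, "CUST_ID": int(r["CUST_ID"])}   # align key types across sources
           for r in csv.DictReader(io.StringIO(flat_file))]
normal = joiner(customers, details, "CUST_ID")
outer = joiner(customers, details, "CUST_ID", "MASTER_OUTER")
```

The normal join drops the order with the unknown customer; the master outer join keeps it with a null customer name, mirroring the join types configured on the transformation.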
Creating a Joiner Transformation
To create a Joiner transformation:
• In the Mapping Designer, choose Transformation-Create. Select the Joiner transformation. Enter a name for the Joiner; the naming convention for Joiner transformations is JNR_TransformationName. Enter a description for the transformation. This description appears in the Repository Manager, making it easier for you or others to understand or remember what the transformation does. Click OK. The Designer creates the Joiner transformation.
• Drag all the desired input/output ports from the first source into the Joiner transformation. The Designer creates input/output ports for the source fields in the Joiner as detail fields by default. You can edit this property later.
• Select and drag all the desired input/output ports from the second source into the Joiner transformation. The Designer configures the second set of source fields as master fields by default.
• Double-click the title bar of the Joiner transformation to open the Edit Transformations dialog box.
• Select the Ports tab.
• Change the master/detail relationship if necessary by selecting the master source in the M column. Click any box in the M column to switch the master/detail relationship for the sources.
Keep in mind that you cannot use a Sequence Generator or Update Strategy transformation as a source to a Joiner transformation.
Creating a Joiner Transformation Select the Condition tab and set the condition. .
Configuring Joiner Transformation
Joiner Setting: Description
Case-Sensitive String Comparison: If selected, the Informatica Server uses case-sensitive string comparisons when performing joins on string columns.
Cache Directory: Specifies the directory used to cache master records and the index to these records. By default, the caches are created in a directory specified by the server variable $PMCacheDir. If you override the directory, be sure there is enough disk space on the file system. The directory can be a mapped or mounted drive.
Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
Lookup Transformation
Used to look up data in a relational table, view, synonym, or flat file. It compares Lookup transformation port values to lookup table column values based on the lookup condition.
Connected Lookups:
• Receive input values directly from another transformation in the pipeline.
• For each input row, the Informatica Server queries the lookup table or cache based on the lookup ports and the condition in the transformation.
• Pass return values from the query to the next transformation.
Unconnected Lookups:
• Receive input values from an expression using the :LKP reference qualifier (:LKP.lookup_transformation_name(argument, argument, ...)) to call the lookup, and return one value.
• With unconnected Lookups, you can pass multiple input values into the transformation, but only one column of data comes out of the transformation.
Lookup Transformation
You can configure the Lookup transformation to perform different types of lookups:
• Connected or unconnected. Connected and unconnected transformations receive input and send output in different ways.
• Cached or uncached. Sometimes you can improve session performance by caching the lookup table. If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts rows into the cache during the session. This enables you to look up values in the target and insert them if they do not exist. Informatica recommends that you cache the target table as the lookup.
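The static-versus-dynamic cache behavior can be sketched as follows. This is an illustrative model only (the cache contents and status strings are invented): a static cache never changes during the session, while a dynamic cache gains a row the first time a key is missed, so a second lookup of the same key finds it.

```python
# Cache primed from the lookup (here, target) table before rows flow through.
lookup_cache = {101: "existing customer"}

def lookup(key, new_value, dynamic=True):
    """Return (value, status) for a key, optionally inserting on a miss."""
    if key in lookup_cache:
        return lookup_cache[key], "FOUND"
    if dynamic:
        lookup_cache[key] = new_value    # dynamic cache: insert the missing row
        return new_value, "INSERTED"
    return None, "MISS"                  # static cache: never modified

first = lookup(202, "new customer")      # miss -> inserted into the cache
second = lookup(202, "new customer")     # now found in the updated cache
```

With `dynamic=False` the second call would still miss, which is exactly the difference between the two cache types described above.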
Differences between Connected and Unconnected Lookup
Connected lookup:
1) Receives input values directly from a transformation in the pipeline.
2) You can use a dynamic or static cache.
3) Cache includes all lookup columns used in the mapping.
4) Supports user-defined default values.
Unconnected lookup:
1) Receives input values from the result of an :LKP expression within another transformation.
2) You can use only a static cache.
3) Cache includes all lookup output ports.
4) Does not support user-defined default values.
Differences between Static and Dynamic Cache
Static cache:
1) You cannot insert into or update the cache.
2) The Informatica Server does not update the cache while it processes the Lookup transformation.
Dynamic cache:
1) You can insert rows into the cache as you pass them to the target.
2) The Informatica Server dynamically inserts data into the lookup cache and passes data to the target table.
Update Strategy Transformation
When you design your data warehouse, you need to decide what type of information to store in targets and how to handle changes to existing records. As part of your target table design, you need to determine whether to maintain all the historic data or just the most recent changes. For example, you might have a target table, T_CUSTOMERS, that contains customer data. When a customer address changes, you may want to save the original address in the table, instead of updating that portion of the customer record. In this case, you would create a new record containing the updated address, and preserve the original record with the old customer address. This illustrates how you might store historical information in a target table. However, if you want the T_CUSTOMERS table to be a snapshot of current customer data, you would update the existing customer record and lose the original address. The model you choose constitutes your update strategy. In Power Mart and Power Center, you set your update strategy at two different levels:
• Within a session. When you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database operations.
• Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.
Setting up Update Strategy at Session Level
During session configuration, you can select a single database operation for all records. For the Treat Rows As setting, you have the following options:
Insert: Treat all records as inserts. If inserting the record violates a primary or foreign key constraint in the database, the Informatica Server rejects the record.
Delete: Treat all records as deletes. For each record, if the Informatica Server finds a corresponding record in the target table (based on the primary key value), the Informatica Server deletes it. Note that the primary key constraint must exist in the target definition in the repository.
Update: Treat all records as updates. For each record, the Informatica Server looks for a matching primary key value in the target table. If it exists, the Informatica Server updates the record. Again, the primary key constraint must exist in the target definition.
Data Driven: The Informatica Server follows instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, delete, update, or reject. If the mapping for the session contains an Update Strategy transformation, this field is marked Data Driven by default. If you do not choose the Data Driven setting, the Informatica Server ignores all Update Strategy transformations in the mapping.
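The data-driven mode can be sketched as rows carrying an operation flag that the server then applies to the target. This Python model is illustrative only; the flag names mirror the DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT constants, and the dict-based target and key column are invented for the example.

```python
# Target table modeled as a dict keyed on the primary key CUST_ID.
target = {1: {"CUST_ID": 1, "CITY": "Boston"}}

def apply_row(target, row, flag):
    """Apply the operation an Update Strategy-style flag requests."""
    key = row["CUST_ID"]
    if flag == "DD_INSERT":
        target[key] = row
    elif flag == "DD_UPDATE" and key in target:
        target[key].update(row)          # update only when the key exists
    elif flag == "DD_DELETE":
        target.pop(key, None)            # delete the matching record, if any
    # DD_REJECT: the row is skipped (written to a reject file in practice)

apply_row(target, {"CUST_ID": 2, "CITY": "Austin"}, "DD_INSERT")
apply_row(target, {"CUST_ID": 1, "CITY": "Cambridge"}, "DD_UPDATE")
apply_row(target, {"CUST_ID": 3, "CITY": "Denver"}, "DD_REJECT")
```

Because each row carries its own flag, rows destined for the same table can be inserted, updated, deleted, or rejected within a single run, which is what the Data Driven setting enables.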
Update Strategy Settings
The setting you choose depends on your update strategy and the status of data in target tables:
Insert: Populate the target tables for the first time, or maintain a historical data warehouse. In the latter case, you must set this strategy for the entire data warehouse, not just a select group of target tables.
Delete: Clear target tables.
Update: Update target tables. You might choose this setting whether your data warehouse contains historical data or a snapshot. Later, when you configure how to update individual target tables, you can determine whether to insert updated records as new records or use the updated information to modify existing records in the target.
Data Driven: Exert finer control over how you flag records for insert, delete, update, or reject. Choose this setting if records destined for the same table need to be flagged on occasion for one operation (for example, update), or for a different operation (for example, reject). In addition, this setting provides the only way you can flag records for reject.