1. The document describes various Informatica transformations and components used for data integration and ETL processes.
2. Transformations like Router, Filter, Expression, Lookup, Joiner, Aggregator, Sorter, Rank are used to transform, filter, aggregate, join and sort data from different sources.
3. Components like Sequence Generator, Stored Procedure, Normalizer, XML Source Qualifier are used for generating sequences, calling stored procedures, normalizing data, and qualifying XML sources.
ROUTER [active]
-Provides multiple output groups from a single source.
-Tests the data for one or more conditions.
-Groups are of two types: input and output; output groups are user-defined and default.
-Ex: to test data based on 3 conditions, use one Router transformation instead of 3 Filters.

FILTER [active]
-Can pass only one condition; tests the data for one condition.
-The filter condition returns TRUE or FALSE for each row that passes through it; rows that do not meet the condition are removed.
-To maximize session performance, place the Filter as close to the source as possible in the mapping.

SEQUENCE GENERATOR
-Is for generating unique serial numbers; we use it when the data is not coming from the source.
-Provides two output ports: NEXTVAL and CURRVAL.
-Generates numeric values and can replace missing values.
-Properties: start value, increment by, end value, current value, cycle, number of cached values, reset.

AGGREGATOR [active; ports: I/O/V]
-Performs calculations on groups of data (min, max, average, total).
-Use the Aggregator to eliminate duplicate rows in flat files (group by all ports).
-Components: aggregate expression, group-by port, sorted input, aggregate cache.
-Don't use sorted input when the session uses incremental aggregation or the input data is data driven.

EXPRESSION [connected/passive]
-Is for computations: it calculates a value based on values within a single row.
-Ex: based on the price and quantity of a particular item, you can calculate the total purchase price for that item.

STORED PROCEDURE [passive; connected/unconnected]
-Calls a stored procedure; we use stored procedures for maintaining databases.
-Generates a status code to know whether the stored procedure completed successfully; the status code provides error handling for the Informatica server during a workflow.
-Unconnected: used to run nested stored procedures.

SORTER [connected/active]
-Allows sorting the data from the source in ascending or descending order according to a specified sort key; the data passing through the Sorter is sorted by that key.

RANK [connected/active]
-Selects the TOP or BOTTOM rank of data; we can find the largest or smallest numeric value in a group.
-Ex: select the top 10 clients in a company.

NORMALIZER
-Use a Normalizer instead of a Source Qualifier when you normalize a COBOL source.
-Normalizes records from COBOL and relational sources.
-In the Mapping Designer: Transformation > Create > Normalizer.

XML SOURCE QUALIFIER [passive/connected]
-Use this only with an XML source definition; you can link only one XML Source Qualifier to one XML source definition.
-It has one input/output port for every column in the XML source.

ACTIVE vs PASSIVE
-Active: can change the number of rows that passes through it. Ex: Filter removes rows that do not meet the filter condition.
-Passive: does not change the number of rows that passes through it. Ex: Expression performs a calculation on data.

LOOKUP [ports: I/O/L/R]
-Mostly used for "if the target row is present, update; else insert" operations: it checks whether the row we are processing already exists in the target; if it exists, update the target; if not, do an insert.
-We use a lookup to search for a related value, to perform some calculations, or to search whether records exist in the target table.
-In a lookup, do the SQL override first and then go for the condition.

Lookup SQL override:
-Overrides the default SQL statement used to query the lookup table; specifies the SQL statement to use for querying lookup values.
-Use only with lookup caching enabled.

Persistent cache: if the cache is persistent, we can save the lookup cache files and reuse them; if non-persistent, the files are deleted.
Shared cache: we can share the lookup cache between multiple transformations; an unnamed cache can be shared between transformations in the same mapping, a named cache between transformations in the same or different mappings.

CONNECTED vs UNCONNECTED LOOKUP
Connected: 1. can use a dynamic or static cache 2. supports user-defined default values 3. used for updating slowly changing dimension tables 4. is connected in the mapping.
Unconnected: 1. can use a static cache only 2. does not support user-defined default values 3. is used to fetch values based on incoming values 4. is not connected in the mapping 5. the return port is used only in this transformation.

DYNAMIC vs STATIC CACHE
Dynamic: 1. we can insert rows into the cache as we pass them to the target 2. when the condition is FALSE (the row is not in the cache or target), the server inserts the row.
Static: 1. we cannot insert or update rows 2. when the condition is TRUE, the server returns the value from the lookup table; when FALSE, it returns the default value for a connected lookup and NULL for an unconnected one.

JOINER [connected/active]
-Joins two related heterogeneous sources in different locations: two relational sources existing in separate databases, two different ODBC sources, or a relational table and an XML source with at least one matching port.
-The Joiner allows joining sources that contain binary data.
-Cannot join in the following situation: both input pipelines originate from the same Source Qualifier.
Join types:
-Normal: based on the condition, the server discards all rows of data from the master and detail sources that do not match.
-Master outer: keeps all rows of data from the detail source and the matching rows from the master source; unmatched master rows are discarded.
-Detail outer: keeps all rows from the master source and the matching rows from the detail source.
-Full outer: keeps all rows of data from both the master and detail sources.

SOURCE QUALIFIER
-To filter records as the server reads source data, to specify sorted ports, and to select only distinct values from the source.
-Properties: SQL query, user-defined join, source filter, number of sorted ports, pre- and post-SQL.
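The Filter/Router contrast above can be sketched in plain Python: a Filter applies one condition and drops failing rows, while a Router tests each row against every group condition and sends unmatched rows to a default group. This is an illustrative analogue only; none of these names are Informatica APIs.

```python
# Illustrative sketch of Filter vs Router semantics (not Informatica APIs).

def filter_rows(rows, condition):
    """Filter: a single condition; rows that fail it are dropped."""
    return [r for r in rows if condition(r)]

def route_rows(rows, conditions):
    """Router: each row is tested against every group condition; a row can
    land in more than one group, and rows matching none go to 'default'."""
    groups = {name: [] for name in conditions}
    groups["default"] = []
    for r in rows:
        matched = False
        for name, cond in conditions.items():
            if cond(r):
                groups[name].append(r)
                matched = True
        if not matched:
            groups["default"].append(r)
    return groups

rows = [{"amount": 50}, {"amount": 500}, {"amount": 5000}]
groups = route_rows(rows, {
    "small": lambda r: r["amount"] < 100,
    "medium": lambda r: r["amount"] < 1000,
})
kept = filter_rows(rows, lambda r: r["amount"] < 1000)
```

Note how one Router replaces several Filters: a single pass over the data populates all groups, which is the performance argument the notes make.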
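The "if row present, update; else insert" pattern that the Lookup notes describe, together with the dynamic-cache behaviour (insert into the cache when the condition is false), can be sketched with a dict standing in for the lookup cache. The key name `cust_id` and the function names are assumptions for illustration.

```python
# Sketch of the "update else insert" decision a dynamic lookup cache supports.
# A dict keyed on the natural key stands in for the cache; illustrative only.

def upsert(target_cache, incoming_rows):
    """Condition TRUE (key present) -> update; FALSE (key absent) -> insert,
    and the dynamic cache immediately learns the new row."""
    inserts, updates = [], []
    for row in incoming_rows:
        key = row["cust_id"]
        if key in target_cache:            # lookup condition is TRUE
            target_cache[key].update(row)
            updates.append(key)
        else:                              # lookup condition is FALSE
            target_cache[key] = dict(row)  # dynamic cache: insert new row
            inserts.append(key)
    return inserts, updates

cache = {1: {"cust_id": 1, "city": "Pune"}}
ins, upd = upsert(cache, [{"cust_id": 1, "city": "Mumbai"},
                          {"cust_id": 2, "city": "Delhi"}])
```

Because the cache is updated as rows flow through, a second occurrence of the same key within one run is correctly treated as an update rather than a duplicate insert.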
MIGRATION
You need to check several things before making folder copies between environments:
-Have a backup of the dev, QA, and production repositories.
-No Informatica jobs should be running or scheduled to run during the folder copy; you must wait until jobs finish, or stop all jobs before doing the move.
-Remove locks, and make sure users are out of the development, QA, and production repositories before moving any folders; be sure all work has been saved.
-View locks and remove all old locks that still remain after all developers have logged out (no jobs are running).
-Copy shared folders first, then project folders.
-Stop and start Informatica services; be sure scheduled jobs come back up.

Moving mappings, sessions, etc. from development to production:
-Copy the development folder to the production folder using the folder copy option in Repository Manager; in this way all the source and target tables and mappings are copied to the production environment.
-The copy option gets all the objects to production; export/import has to be done separately for each mapping, session, etc.
-When we copy a folder from one repository to another repository, everything is copied into the target repository.

REPOSITORY
-Stores folder information, or metadata, used by the Informatica server and client tools.
-Metadata represents different types of objects such as mappings, transformations, source definitions, and target definitions.
Metadata Reporter:
-A web-based application that allows you to run reports against the repository.
-Provides a number of reports, including reports on executed sessions, mappings, and source and target schemas.
-Documentation covers how to install and use the web-based Metadata Reporter to generate reports on metadata.

TESTING MAPPINGS AND SESSIONS
-We have to write test cases to check business validity and to make sure that we are getting the right target data.
-Example: when extracting data from source into a staging area, one test case is to check the row count of the source and then the row count of the staging tables and compare the results; the row counts should match.
-Usually in DW testing we have to write SQL queries to do the testing against source, staging, and target.
-We have to check for: 1. source-target validation 2. the scheduling process 3. constraints 4. full load, delta load, and reload processes 5. business logic 6. all DDL definitions.
-We have a QA lead and coordinate with him; depending on the data, different validations are done (e.g., dates, concatenations).
-Unit testing: testing a single module or mapping.
-Integration testing: combining two modules or mappings and checking whether data is getting reflected in the other module.

BAD DATA HANDLING
-We can clean the bad data before the Informatica load by using scripts (e.g., shell scripts) which check the number of columns, date formats, numeric values, and sizes of the data files.

SURROGATE KEY
-A system-generated, artificial primary key value; it is just a unique identifier or number for each row.
-A substitution for the natural primary key; the SK is unique for each row in the table.
-It allows maintaining historical records in the DW.

SESSION FILES
-Session log file: the server creates one for each session; it writes information about the session into the log file, such as initialization, creation of SQL commands, errors, and the load summary.
-Session detail file: contains load statistics for each target in the mapping, including information such as table name and number of rows written or rejected.
-Performance detail file: contains information about performance.
-Reject file: contains the rows of data that the writer does not write to targets.
-Control files: contain information about the target flat files, such as data format and loading instructions.

RECOVERING SESSIONS
-If you stop a session, or if an error causes a session to stop, refer to the session and error logs to determine the cause of the failure; correct the errors and then complete the session.
-Use one of the following to complete the session: run the session again if the server has not issued a commit; consider performing recovery if the server has issued at least one commit.
-Recovering a stand-alone session (a session that is not nested in a batch): if a stand-alone session fails, we can recover by using a menu command or pmcmd. These options are not available for batched sessions.

MAPPING PARAMETERS AND VARIABLES
-Mapping parameter: represents a constant value; we define it before running a session, and it retains the same value throughout the entire session.
 Ex: in my project, instead of creating separate mappings for each client or customer, we created one mapping for a single customer; before running the session, we enter the value of the parameter in the parameter file.
-Mapping variable: represents a value that can change through the session.
-Parameter file: defines the values for the parameters and variables used in a session; it is created with a text editor such as WordPad or Notepad. In the parameter file we can define mapping parameters, mapping variables, and session parameters. The session uses the parameter file to start all sessions in the workflow.
-Session parameters:
 Source file name: use when you want to change the name or location of the session source file between session runs.
 Target file name: use when you want to change the name or location of the session target file between session runs.
 Reject file name: use when you want to change the name or location of the session reject file between session runs.

Q. I am trying to run a workflow with a parameter file and one of the sessions keeps failing?
Ans: In the parameter file, the parameter may not be listed. Check the session properties to see whether the session parameters are defined correctly in the parameter file, and use pmcmd to start the session.

TARGET LOAD ORDER [Designer]
-Sets the order in which the server sends rows to different target definitions in a mapping.
-To set the target load order: 1. create a mapping that contains multiple Source Qualifier transformations 2. when the mapping is completed, select Mappings > Target Load Plan (dialog box) 3. select a Source Qualifier from the list 4. click the up and down buttons to move the Source Qualifier within the load order, then OK and save.

CONSTRAINT-BASED LOADING [Workflow Manager]
Ex: 1. 'A' has a primary key; B and C have foreign keys referencing the 'A' primary key. 2. C has a primary key that D references as a foreign key. These four tables receive records from a single active source.

COBOL COPY BOOKS
-A copy book must be a text file; a period must exist at the end of each line to separate one line from the other; field names cannot have parentheses; the level of a record type from 1 to 99 must start with a zero (0).
-The Designer cannot recognize a COBOL copy book (.cpy file) as a COBOL file (.cbl), because it lacks the proper format.
-To import the COBOL copy book in the Mapping Designer, we can insert it into a COBOL file template by using the COBOL statement "COPY"; after we insert the copy book into the template, we can save the file as a .cbl file and import it in the Designer.
-If the .cbl file and the .cpy file are not in the same local directory, the Designer prompts for the location of the .cpy file.
-When the copy book file contains tabs, the Designer expands tabs into spaces; by default, a tab character expands into 8 spaces.
-The OCCURS clause specifies the number of repeated occurrences of data items with the same format, in a table.
-The REDEFINES clause is used to have multiple field definitions for the same storage.

AGGREGATOR PERFORMANCE
-To improve the performance of the Aggregator and the session, use SORTED INPUT; sorted input indicates that the input data is pre-sorted by groups.
-Aggregate cache: the server stores group values in an index cache and row data in the data cache.
-Filter before aggregation.
-We cannot use sorted input when the source is data driven.
-Incremental aggregation: if we run the session with incremental aggregation enabled, we should not modify the index which stores historical aggregate information.
-Data driven: the default option if the mapping for the session contains an Update Strategy transformation; if we do not choose Data driven when a mapping contains an Update Strategy, the Workflow Manager shows a warning.

LOAD MANAGER AND DTM
When the server runs a workflow, the Load Manager:
-Locks the workflow and reads the workflow properties.
-Reads the parameter file and expands workflow variables.
-Runs workflow tasks and starts the DTM to run the session.
When the server runs a session, the DTM:
-Fetches the session and mapping from the repository.
-Creates the session log file.
-Runs pre-session shell commands.
-Creates and runs the mapping reader, writer, and transformation threads to extract, transform, and load data.
-Runs post-session shell commands.

SLOWLY CHANGING DIMENSIONS
Those dimensions which change over time are SCDs.
TYPE 1 (current data): new row -> insert it; any update in the source -> update the target (insert or update). "To keep the recent values in the target."
TYPE 2 (history): new row -> insert it; any update in the source -> insert a new row. "To keep the full history of changes in the target (the PK is changed)."
TYPE 3 (history and current): new row -> insert it. "To keep the current and previous values in the target."

PARTITIONING [Workflow Manager]
-Round-robin partition: each partition processes approximately the same number of rows.
-Hash partition: to group rows of data among the partitions, the server uses a hash function. There are two types of keys: hash auto-keys (use at Rank and Sorter) and hash user keys (to generate the partition key you specify a number of ports).
-Key range partition: used when the sources or targets in the pipeline are partitioned by key range. Ex: 1. set the partition type at the target instance to key range 2. create 3 partitions 3. choose ITEM_ID as the partition key 4. set the key ranges as follows: partition #1: start range 1000, end range 3000; partition #2: start range 3000, end range 6000.
-Pass-through (default): does not increase performance; by default the mapping contains this type at the Source Qualifier and target instance, with one reader and one writer thread.

DEBUGGER
"I used the Debugger to find out the changes between the transformations and at what level the change is occurring."
-When we run the Debugger, the Designer displays: the debug log (view messages from the Debugger), the target window (view target data), and the instance window (view transformation data).
-The Debugger runs a workflow for each session type; we can run the Debugger before or after running the workflow.
Debug process: 1. create breakpoints in the mapping to find the error condition 2. configure the Debugger 3. run the Debugger: the server reads the breakpoints and pauses the Debugger when a breakpoint evaluates to TRUE.

POWER MART vs POWER CENTER
-PowerCenter allows controlling several systems from a central point, allows global repositories, gives you the ability to partition your loads for performance (this allows multiple reader and writer threads), is for the EDW, and can source data from mainframe, legacy, ERP, and EAI systems.
-PowerMart does not allow central control or global repositories, is for data mart needs, and can source data from relational and flat files.
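The Type 2 behaviour described above ("any update in source, insert a new row; the PK is changed") can be sketched in Python: history is kept by expiring the current dimension row and inserting a new version under a new surrogate key. The column names and the `current_flag` convention are assumptions for illustration, not a fixed Informatica structure.

```python
# Minimal SCD Type 2 sketch: expire the current row, insert a new version
# with a new surrogate key. Column names here are illustrative assumptions.

def scd2_apply(dim, src_row, next_sk):
    """dim: list of dimension rows; src_row: incoming row keyed on the
    natural key 'cust_id'. Returns the (possibly incremented) next_sk."""
    current = [r for r in dim
               if r["cust_id"] == src_row["cust_id"] and r["current_flag"]]
    if current and current[0]["city"] == src_row["city"]:
        return next_sk                       # no change: nothing to do
    for r in current:                        # change detected: expire old version
        r["current_flag"] = False
    dim.append({"sk": next_sk, "cust_id": src_row["cust_id"],
                "city": src_row["city"], "current_flag": True})
    return next_sk + 1

dim = []
sk = 1
sk = scd2_apply(dim, {"cust_id": 7, "city": "Pune"}, sk)    # first load: insert
sk = scd2_apply(dim, {"cust_id": 7, "city": "Mumbai"}, sk)  # change: new version
```

After the second call the dimension holds both versions of customer 7, with only the newest flagged current; that is exactly the "full history" property the notes attribute to Type 2.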
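The key-range partitioning idea, mirroring the ITEM_ID example above, can be sketched as assigning each row to the partition whose configured range contains its key. Treating the start of a range as inclusive and the end as exclusive is an assumption made here so the two ranges do not overlap; Informatica's exact boundary handling may differ.

```python
# Sketch of key-range partitioning: rows go to the partition whose range
# contains the partition key. Boundary handling is an illustrative choice.

RANGES = [(1000, 3000), (3000, 6000)]   # partition #1, partition #2

def key_range_partition(rows, ranges, key="item_id"):
    parts = [[] for _ in ranges]
    for r in rows:
        for i, (lo, hi) in enumerate(ranges):
            if lo <= r[key] < hi:       # start inclusive, end exclusive
                parts[i].append(r)
                break
    return parts

parts = key_range_partition(
    [{"item_id": 1500}, {"item_id": 3000}, {"item_id": 4500}], RANGES)
```

Contrast with round-robin (even row counts regardless of values) and hash partitioning (a hash function groups related keys together): key range is the one that lets a partition line up with how the target itself is range-partitioned.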
MAPPLET
-If we are using the same set of transformations in N number of mappings, then we create that particular set as a single mapplet.
-You can't put a mapplet inside another mapplet.
-Mapplets are like pre-processor macros, not sub-routines: every mapplet instance is replaced by the contents of the mapplet.
-You cannot use the following objects in a mapplet: 1. Normalizer transformations 2. COBOL sources 3. XML Source Qualifier transformations 4. XML sources 5. target definitions 6. other mapplets.

WORKLET
-A worklet is like another task, but it actually contains a set of tasks.
-Create a worklet when you want to reuse a set of workflow logic in several workflows.
-"I used worklets within a workflow and just scheduled that one workflow."

WORKFLOW
-"My datamart is loaded with 3 batch incremental loads; we already have batch sessions running as workflows, so I have been assigned multiple assignments which are under the workflow."
-In the Workflow Manager, the Event-Wait and Event-Raise tasks are used to control the sequence of task execution in the workflow.
-Tasks we create in the Workflow Manager are non-reusable; tasks we create in the Task Developer are reusable.

SCHEDULING WORKFLOWS
There are 3 options:
1. Run on demand: manual; should not be checked if you have any of the scheduler settings on.
2. Run continuously: as per schedule, but a special option to keep the workflow in a loop.
3. Run on server initialization: as per schedule, but starts on server initialization; check this only if you want to run the workflow on initializing.

MAPPING STANDARDS
-Minimize the use of stored procedures.
-Naming conventions, like giving specific names.
-Error handling.
Audit maintenance / process tracking:
-How many input records you are getting when the session is loaded, how many records were loaded into the warehouse, and how many errors occurred.
-Which map, which session, and which table got invoked during the ETL process (because we are not touching the metadata).
-According to the audit, they need the statistics of the ETL process in their warehouse (found in the repository).

IF RUNNING CONCURRENT SESSIONS FAILS
-In the Workflow Manager, select the session and, on the General tab, select "Fail parent if this task fails."

PMCMD
-pmcmd is used to communicate with the Informatica server, and to start and stop workflows and tasks.
-Run a workflow by using the pmcmd command:
 pmcmd startworkflow -s <ip of infa server>:<port> -u <user> -p <password> -f <folder name> -wait <workflow name>
-Using pmcmd interactively at the Unix prompt:
 $ pmcmd
 pmcmd> connect -u <user name> -p <password> -s <server name>:<port no>
 pmcmd> getwfdetails -f <folder name> <WF name>
   (displays user name, start time, end time, failed/succeeded)
 pmcmd> getsessiondetails -f <folder name> <worklet.session name>
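A mapplet is described above as a reusable set of transformations whose instances are expanded like macros. A rough Python analogue: compose a fixed list of row-level transformations into one reusable callable and drop it into several "mappings". Purely illustrative; the function names are not Informatica APIs.

```python
# Mapplet-as-macro sketch: bundle transformations once, reuse everywhere.

def make_mapplet(*transforms):
    """Return a callable that applies the bundled transformations in order,
    like a mapplet instance expanded into a mapping."""
    def mapplet(rows):
        for t in transforms:
            rows = [t(r) for r in rows]
        return rows
    return mapplet

# the reusable set: trim a name field, then derive an upper-case code
clean_names = make_mapplet(
    lambda r: {**r, "name": r["name"].strip()},
    lambda r: {**r, "code": r["name"].strip().upper()},
)

# two different "mappings" reuse the same mapplet
out_a = clean_names([{"name": "  pune "}])
out_b = clean_names([{"name": "delhi"}])
```

The "macro, not sub-routine" point maps onto this sketch too: each call site gets the same logic expanded inline, so a change to the mapplet definition propagates to every mapping that uses it.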
SESSION PERFORMANCE TUNING
-If you look at the session log, you will come to know which part of the mapping took a long time.
-The SQL in the Source Qualifier might take a long time to return results if you are joining 3 large tables.
-The lookup cache creation will also take much time when caching 20 million records (this can be tuned by using a dynamic lookup or a persistent cache).
-If you feel that normalization takes much time, try to replace the Normalizer with an Expression using the right logic.
-Avoid using unwanted ports in the transformations.
Checklist:
1. Eliminate unnecessary ports in the Source Qualifier.
2. Add a WHERE clause wherever necessary.
3. Create indexes on the lookup condition ports.
4. Indexing each column in the lookup condition can improve session performance, particularly for large lookup tables.
5. Do not select a row into a lookup unless that row is needed.
6. Use bulk loading: disable the index updates during the bulk load and then refresh the indexes after the load; this is generally faster than dropping and rebuilding indexes.
7. Create appropriate indexes on the target table.

TARGET BOTTLENECKS
-We can identify target bottlenecks by configuring the session to write to a flat file target. If session performance increases when you write to a flat file, you have a target bottleneck.
-Tuning: drop indexes and key constraints, use bulk loading, increase the database network packet size, and optimize the target database.

SOURCE BOTTLENECKS
-Extract the query from the session log and run it; measure the time. If there is a difference, go for tuning.
-Tuning: optimize the query; use conditional filters.

READ AND WRITE PERFORMANCE
-Read performance: reduce the number of records processed; move the filters/aggregations to the beginning (the next steps then process fewer records).
-Write performance: drop indexes before loading (if there are no updates); disable constraints before the load and re-enable them after the load; consider increasing the commit interval; use hints.

DATABASE-SIDE TUNING
-Create indexes; make sure there is enough table space.
-Set SQL TRACE ON to display the statistics.
-Use the autotrace function in SQL*Plus to automatically see the explain plans.
-Use HINTS wherever needed.

EXPLAIN PLAN
-EXPLAIN PLAN gives you the plan of how your query will be executed, and also gives you the "cost" of the query.
 SQL> EXPLAIN PLAN SET statement_id = 'emp_sal' FOR
      SELECT ename, job, sal, dname
      FROM emp, dept
      WHERE emp.deptno = dept.deptno
      AND NOT EXISTS (SELECT * FROM salgrade
                      WHERE emp.sal BETWEEN losal AND hisal);
 SQL> EXPLAIN PLAN FOR SELECT * FROM emp;
 SQL> DESC plan_table;

TRACING AND TKPROF
-SET TRACE ON creates a trace file.
-TKPROF is just a translator that converts the trace file into a readable format (parsing/optimizing/executing statistics):
 $ TKPROF ora_12558.trc trace.txt
 (the formatted version is stored in trace.txt)
-TKPROF command-line options: EXPLAIN, TABLE, SYS, SORT, RECORD, PRINT, INSERT.

SESSION ERROR HANDLING
-Common errors: data type mismatches, column value truncated, table or view does not exist in the database, out of table space.
-Check the session log (run test loads with verbose mode ON to get more information about the data).
-Check the directory paths and the source file names.
-Check the source and target database connections.
-Remove the Test Load option after testing is done; run the Debugger.

Q. When you are running a session, your mapping is fine, your session is fine, and your session ran successfully, but the data is not loaded into your database?
Ans: It could be in test load. If it is an incremental load, the source record might already exist in the target table. Set the Target Load Option to "Normal".
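The notes above suggest a persistent lookup cache so the server need not re-cache millions of lookup rows on every run. A minimal Python analogue: build the lookup dict once, save it to disk, and reload it on later runs. The file name and the pickle format are illustrative choices, not Informatica's actual cache format.

```python
# Persistent-cache sketch: build once, persist, reuse on subsequent runs.
import os
import pickle

CACHE_FILE = "lkp_customer.cache"   # hypothetical cache file name

def load_or_build_cache(build_fn):
    """Reload the cache from disk if present; otherwise build and save it."""
    if os.path.exists(CACHE_FILE):             # persistent cache: reuse it
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f), True
    cache = build_fn()                         # first run: expensive build
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)
    return cache, False

build_calls = []
def build():
    build_calls.append(1)                      # count expensive builds
    return {i: "cust-%d" % i for i in range(1000)}

cache1, reused1 = load_or_build_cache(build)   # builds and saves
cache2, reused2 = load_or_build_cache(build)   # reloads from disk
os.remove(CACHE_FILE)                          # clean up the demo file
```

The trade-off mirrors the persistent vs non-persistent distinction in the Lookup notes: a persisted cache saves the build cost but goes stale if the lookup table changes between runs.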
IMPORTING SOURCES
Importing Excel sheets as a source:
1. Open the Excel file and select the columns and rows that you need to use as the source.
2. In the first row, enter the column names as they should appear for the table (don't include spaces).
3. Select from that row onwards up to the columns you require, then click (on the menu) Insert > Name > Define. If you have many sheets, create ranges for each of the sheets.
4. Format numeric columns as 'Number' and save the file.
5. Create an ODBC DSN pointing to the file.
6. Now in the Informatica Designer, choose 'Import from Database', specify the DSN name, and then choose the tables to be imported.

Importing flat files:
-While importing a flat file into the source, it will ask for delimiters; at that place there is a check box to treat consecutive characters as delimiters. Just enable it if needed.
-When you click "Import from file" in the Source menu, in the Source Analyzer the second screen has a check box to treat consecutive delimiters as one; check or uncheck this as per your requirement.

INFORMATICA VERSION DIFFERENCES
Informatica 5x vs 6x:
1. In 5x we have to delete the existing mapping and session and copy the updated one; in 6x we can overwrite the existing mapping and session.
2. We cannot pass a parameter in a Lookup SQL override in 5x; from 6x it is possible.
3. 5x has the Server Manager; 6x has the Workflow Manager, and the Workflow Monitor to watch the execution.
4. In 6x you can create flat file target definitions in the Designer to output data to flat files.
5. In 6x you can include multiple Source Qualifier transformations in a mapplet.
6. 6x cannot do a lookup on flat files.

Informatica 7x:
-Adds the UNION, CUSTOM, and XML (midstream) transformations.
-Lookup on flat files.
-Grid technologies (server grid): servers running on different operating systems can coexist on the same server grid.
-You can import and export repository objects (both dependent and independent).
-You can even use pmcmd with the repository; in the web application we can move a mapping so that it is available to others.

Union transformation:
-Is used to merge data from pipeline branches into one pipeline branch; it merges data from multiple sources, similar to the UNION ALL SQL statement that combines the results from two or more SQL statements.
-The Union transformation does not remove duplicate rows.
-We can connect heterogeneous sources to this transformation.
-We can create multiple input groups, but only one output group.

Custom transformation:
-Creates transformation applications, such as sorting and aggregation.
-Is used to create a transformation that requires multiple input groups, output groups, or both.

UPGRADING FROM VERSION 6.2 TO 7.0
1. Make a backup of the repository on the server.
2. Repository > File > Export > to a location on the same server.
3. Uninstall Informatica 6.2 and install Informatica 7.
4. In Repository 7.1: Import > from the same server.
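The Union transformation above merges pipeline branches the way SQL's UNION ALL does: rows from every input group flow to one output group and duplicates are NOT removed. A plain-Python analogue makes that property easy to see; the function name is illustrative.

```python
# UNION ALL sketch: concatenate all input branches, keep duplicates.

def union_all(*branches):
    """Merge rows from all input branches into one output branch,
    keeping duplicates, as the Union transformation does."""
    out = []
    for b in branches:
        out.extend(b)
    return out

merged = union_all([{"id": 1}, {"id": 2}],
                   [{"id": 2}],          # the duplicate row survives
                   [{"id": 3}])
```

If duplicate removal were wanted, a separate step (an Aggregator grouping on all ports, per the earlier notes) would have to follow, since the Union itself never de-duplicates.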