
session properties -> general tab -> treat input link as 'AND'/'OR'
session partition with aggregator transformation
preview data

********************************************************
High availability. You can use the high availability option to eliminate single points of failure in the PowerCenter environment and reduce service interruptions in the event of failure. High availability provides resilience, failover, and recovery for services.

infacmd
infasetup
pmrep
Versioned Objects
Workflow recovery

Custom transformation. The Custom transformation has the following enhancements: Procedures with thread-specific operations.
Java transformation.
***********************************************************
Error codes
BR - errors related to the reader process, including ERP, flat file, and relational sources
CMN - errors related to databases, memory allocation, lookup, joiner, and internal errors
DBGR - errors related to the debugger
SE, TM, WID - errors related to transformations
PMF - errors related to caching in aggregator, lookup, joiner, rank
RR - errors related to relational sources
EP - errors related to external procedures
LM - errors related to the Load Manager
REP - errors related to repository functions
VAR - errors related to mapping variables
WRT - errors related to the writer
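The prefix list above can be turned into a small lookup helper for reading logs. This is only a sketch: the helper name `describe_error_code` and the assumption that a message code has the form PREFIX_NUMBER (for example CMN_1039) are illustrative, not part of any Informatica tool.

```python
# Map of log message prefixes to the area they relate to (taken from the list above).
ERROR_PREFIXES = {
    "BR": "reader process (ERP, flat file, relational sources)",
    "CMN": "databases, memory allocation, lookup, joiner, internal errors",
    "DBGR": "debugger",
    "SE": "transformations",
    "TM": "transformations",
    "WID": "transformations",
    "PMF": "caching in aggregator, lookup, joiner, rank",
    "RR": "relational sources",
    "EP": "external procedure",
    "LM": "Load Manager",
    "REP": "repository functions",
    "VAR": "mapping variable",
    "WRT": "writer",
}


def describe_error_code(code: str) -> str:
    """Return the area a message code belongs to, assuming a PREFIX_NUMBER layout."""
    prefix, _, _number = code.partition("_")
    return ERROR_PREFIXES.get(prefix.upper(), "unknown prefix")


print(describe_error_code("CMN_1039"))
print(describe_error_code("WRT_8229"))
```

The numeric part is ignored here; only the prefix is used to narrow down where to look.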

***********************************************************
Oracle and Netezza - ROWNUM and LIMIT
************************************************************
Aggregator - Active & Connected
Source Qualifier - Active & Connected
Filter - Active & Connected
Expression - Passive & Connected
Joiner - Active & Connected
Lookup - Passive & Connected/Unconnected
HTTP - Passive & Connected
Normalizer - Active & Connected
Rank - Active & Connected
Router - Active & Connected
Sequence - Passive & Connected
Sorter - Active & Connected
Stored Procedure - Passive & Connected/Unconnected
Union - Active & Connected

*************************************************************************
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Lookup transformation:
Be sure to delete the unwanted columns from the lookup, as they affect the lookup cache very much.
If the Lookup transformation is after the source qualifier and there is no active transformation in between, you can as well go for the SQL override of the source qualifier.
If the cache that you assigned for the lookup is not sufficient to hold the data or index of the lookup, whatever data doesn't fit into the cache is spilt into the cache files designated in $PMCacheDir. When PowerCenter doesn't find the data you are looking up in the cache, it swaps the data from the file to the cache and keeps doing this until it finds the data. This is quite expensive, being an I/O operation. Increase the cache so that the whole data resides in memory.
Sequential and concurrent caches: The 8.x version of PowerCenter gives us the option to build the caches of the lookups either concurrently or sequentially, depending on the business rule. If no business rule dictates otherwise, concurrent cache building is a very handy option.

[HINTS]
Difference b/w Aggregator and Expression transformation?
The Expression transformation permits you to perform calculations on a row-by-row basis only. In the Aggregator you can perform calculations on groups.

HTTP Transformation
Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data or post data to the target or downstream transformation in the mapping.
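The Aggregator vs. Expression hint above can be illustrated outside PowerCenter. The sketch below is plain Python, not Informatica code; the sample rows and the function names `expression_step` and `aggregator_step` are made up for illustration. The row-by-row step returns one output row per input row, while the grouped step returns one row per group.

```python
from collections import defaultdict

# Hypothetical input rows.
rows = [
    {"dept": "A", "salary": 100},
    {"dept": "A", "salary": 300},
    {"dept": "B", "salary": 200},
]


def expression_step(rows):
    """Row-by-row calculation: every input row yields exactly one output row."""
    return [{**row, "bonus": row["salary"] // 10} for row in rows]


def aggregator_step(rows, group_by, value):
    """Group calculation: one output value per distinct group key (here a SUM)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[value]
    return dict(totals)


print(expression_step(rows))
print(aggregator_step(rows, "dept", "salary"))
```

This is also why the Aggregator is listed as active (it can change the number of rows) and the Expression as passive.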


****************************************************************************
Q. What types of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:
Standalone Repository : A repository that functions individually and is unrelated to any other repositories.
Global Repository : This is a centralized repository in a domain. This repository can contain shared objects across the repositories in a domain. The objects are shared through global shortcuts.

Local Repository : A local repository is within a domain and is not a global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository : This can be either a local or a global repository, but it allows version control for the repository. A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop, test and deploy metadata in the production environment.

Q. What is a code page?
A. A code page contains encoding to specify characters in a set of one or more languages. The code page is selected based on the source of the data. For example, if the source contains Japanese text, then the code page should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way that application stores, receives, and sends character data.

Q. Which databases can PowerCenter Server on Windows connect to?
A. PowerCenter Server on Windows can connect to the following databases:
IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata

Q. Which databases can PowerCenter Server on UNIX connect to?
A. PowerCenter Server on UNIX can connect to the following databases:
IBM DB2
Informix
Oracle
Sybase
Teradata

Informatica Mapping Designer

Q. How to execute a PL/SQL script from an Informatica mapping?
A. The Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. In the SP transformation, the PL/SQL procedure name can be specified. Whenever the session is executed, the session will call the PL/SQL procedure.

Q. How can you define a transformation? What are the different types of transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Below are the various transformations available in Informatica:
Aggregator
Application Source Qualifier
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Output
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier

Q. What is a source qualifier? What is meant by Query Override?
A. The Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or a flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.
PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query by changing the default settings of the transformation properties. The list of selected ports and the order in which they appear in the default query should not be changed in the overridden query.

Q. What is the Aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such as averages and sums. Unlike the Expression transformation, the Aggregator transformation can be used to perform calculations on groups. The Expression transformation permits calculations on a row-by-row basis only.
The Aggregator transformation contains group by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties.
The aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.

Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping that contains an Aggregator transformation, the session option for Incremental Aggregation can be enabled.
When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.

Q. How is the Union transformation used?
A. The Union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.

Q. Can two flat files be joined with the Joiner transformation?
A. Yes, the Joiner transformation can be used to join data from two flat file sources.

Q. What is a Lookup transformation?
A. This transformation is used to look up data in a flat file or a relational table, view or synonym. It compares Lookup transformation ports (input ports) to the source column values based on the lookup condition. Later, returned values can be passed to other transformations.

Q. Can a lookup be done on flat files?
A. Yes.

Q. What is the difference between a connected lookup and an unconnected lookup?
A. A connected lookup takes input values directly from other transformations in the pipeline.
An unconnected lookup doesn't take inputs directly from any other transformation, but it can be used in any transformation (like Expression) and can be invoked as a function using a :LKP expression. So, an unconnected lookup can be called multiple times in a mapping.

Q. What is a mapplet?
A. A mapplet is a reusable object that is created using the Mapplet Designer. The mapplet contains a set of transformations and allows us to reuse that transformation logic in multiple mappings.

Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The reusable transformation is stored as metadata, separate from any mapping that uses the transformation. Whenever any changes to a reusable transformation are made, all the mappings where the transformation is used will be invalidated.

Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row by row. By default every row is marked to be inserted in the target table. If the row has to be updated/inserted based on some logic, the Update Strategy transformation is used. The condition can be specified in Update Strategy to mark the processed row for update or insert.
The following options are available for update strategy:
DD_INSERT : If this is used, the Update Strategy flags the row for insertion. The equivalent numeric value of DD_INSERT is 0.
DD_UPDATE : If this is used, the Update Strategy flags the row for update. The equivalent numeric value of DD_UPDATE is 1.
DD_DELETE : If this is used, the Update Strategy flags the row for deletion. The equivalent numeric value of DD_DELETE is 2.
DD_REJECT : If this is used, the Update Strategy flags the row for rejection. The equivalent numeric value of DD_REJECT is 3.
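To make the update strategy options concrete, here is a minimal Python sketch of row flagging. The four constants and their numeric values are the ones used by the Update Strategy transformation; the decision rule inside `flag_row` (reject rows without a key, update known keys, insert new ones) and the `id` column are hypothetical, chosen only to show how a per-row condition maps to a flag.

```python
from enum import IntEnum


class RowFlag(IntEnum):
    """Update strategy constants and their numeric equivalents."""
    DD_INSERT = 0
    DD_UPDATE = 1
    DD_DELETE = 2
    DD_REJECT = 3


def flag_row(row: dict, existing_keys: set) -> RowFlag:
    """Hypothetical rule: reject rows with no key, update known keys, insert the rest."""
    key = row.get("id")
    if key is None:
        return RowFlag.DD_REJECT
    if key in existing_keys:
        return RowFlag.DD_UPDATE
    return RowFlag.DD_INSERT


existing = {1, 2}
for row in ({"id": 1}, {"id": 7}, {"id": None}):
    print(row, flag_row(row, existing).name)
```

In a mapping, the same idea is written as an expression inside the Update Strategy transformation rather than as a function.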

********************************************************
SESSION LOGS

Information that resides in a session log:
Allocation of system shared memory
Execution of pre-session commands / post-session commands
Session initialization
Creation of SQL commands for reader/writer threads
Start/end timings for target loading
Errors encountered during the session
Load summary of reader/writer/DTM statistics

Other Information
By default, the server generates log files based on the server code page.

Thread Identifier
Ex: CMN_1039
Reader and writer thread codes have 3 digits and transformation codes have 4 digits.
The number following a thread name indicates the following:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/Rank boundary number

Log File Codes
Error code   Description
BR           Related to the reader process, including ERP, relational and flat file
CMN          Related to database, memory allocation
DBGR         Related to debugger
EP           External Procedure
LM           Load Manager
TM           DTM
REP          Repository
WRT          Writer

Load Summary
(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected

Statistics details
(a) Requested rows shows the number of rows the writer actually received for the specified operation
(b) Applied rows shows the number of rows the writer successfully applied to the target (without error)
(c) Rejected rows shows the number of rows the writer could not apply to the target
(d) Affected rows shows the number of rows affected by the specified operation

Detailed transformation statistics
The server reports the following details for each transformation in the mapping:

(a) Name of transformation
(b) Number of input rows and name of the input source
(c) Number of output rows and name of the output target
(d) Number of rows dropped

Tracing Levels
Normal : Initialization and status information, errors encountered, transformation errors, rows skipped, summarized session details (not at the level of individual rows)
Terse : Initialization information as well as error messages, and notification of rejected data
Verbose Init : In addition to normal tracing, names of index and data files used and detailed transformation statistics
Verbose Data : In addition to Verbose Init, each row that passes into the mapping and detailed transformation statistics
NOTE: When you enter a tracing level in the session property sheet, you override the tracing levels configured for transformations in the mapping.

MULTIPLE SERVERS
With PowerCenter, we can register and run multiple servers against a local or global repository. Hence you can distribute the repository session load across available servers to improve overall performance. (You can use only one PowerMart server in a local repository.)

Issues in Server Organization
Moving the target database onto the appropriate server machine may improve efficiency.
All sessions/batches using data from other sessions/batches need to use the same server and be incorporated into the same batch.
Servers with different speeds/sizes can be used for handling the most complicated sessions.

Session/Batch Behavior
By default, every session/batch runs on its associated Informatica server, which is selected in the property sheet.
In batches that contain sessions with various servers, the property goes to the server of the outermost batch.

Session Failures and Recovering Sessions
Two types of errors occur in the server:
Non-Fatal
Fatal
(a) Non-Fatal Errors
A non-fatal error is an error that does not force the session to stop on its first occurrence. Establish the error threshold in the session property sheet with the stop on option. When you enable this option, the server counts non-fatal errors that occur in the reader, writer and transformations.
Reader errors can include alignment errors while running a session in Unicode mode.
Writer errors can include key constraint violations, loading NULL into a NOT NULL field and database errors.
Transformation errors can include conversion errors and any condition set up as an ERROR, such as NULL input.
(b) Fatal Errors
A fatal error occurs when the server cannot access the source, target or repository.

This can include loss of connection or target database errors, such as lack of database space to load data.
If the session uses Normalizer (or) Sequence Generator transformations and the server cannot update the sequence values in the repository, a fatal error occurs.
(c) Others
Usages of the ABORT function in mapping logic, to abort a session when the server encounters a transformation error.
Stopping the server using pmcmd (or) Server Manager.

Performing Recovery
When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the rowid of the last row committed to the target database. The server then reads all sources again and starts processing from the next rowid.
By default, perform recovery is disabled in setup. Hence it won't make entries in the OPB_SRVR_RECOVERY table.
The recovery session moves through the states of a normal session: scheduled, waiting to run, initializing, running, completed and failed. If the initial recovery fails, you can run recovery as many times as needed.
The normal reject loading process can also be done in the session recovery process.
The performance of recovery might be low, if
o Mapping contains mapping variables
o Commit interval is high

Unrecoverable Sessions
Under certain circumstances, when a session does not complete, you need to truncate the target and run the session from the beginning.

Commit Intervals
A commit interval is the interval at which the server commits data to relational targets during a session.
(a) Target based commit
The server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
During a session, the server continues to fill the writer buffer after it reaches the commit interval. When the buffer block is full, the Informatica server issues a commit command. As a result, the amount of data committed at the commit point generally exceeds the commit interval.
The server commits data to each target based on primary-foreign key constraints.
(b) Source based commit
The server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
During a session, the server commits data to the target based on the number of rows from an active source in a single pipeline. The rows are referred to as source rows.
A pipeline consists of a source qualifier and all the transformations and targets that receive data from the source qualifier.
Although the Filter, Router and Update Strategy transformations are active transformations, the server does not use them as active sources in a source based commit session.
When a server runs a session, it identifies the active source for each pipeline in the mapping.

The server generates a commit row from the active source at every commit interval. When each target in the pipeline receives the commit rows, the server performs the commit.

Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the server writes the rejected row into the reject file.
You can correct the rejected data and re-load it to relational targets, using the reject loading utility. (You cannot load rejected data into a flat file target.)
Each time you run a session, the server appends rejected data to the reject file.

Locating the BadFiles
$PMBadFileDir
Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.

Reading Rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us in finding the reason for rejecting, there are two main things.
(a) Row indicator
The row indicator tells the writer what to do with the row of wrong data.
Row indicator   Meaning   Rejected by
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer
If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
(b) Column indicator
A column indicator follows the first column of data, and another column indicator follows each subsequent column. They appear after every column of data and define the type of data preceding it.
Column indicator   Meaning      Writer treats as
D                  Valid data   Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
O                  Overflow     Bad data
N                  Null         Bad data
T                  Truncated    Bad data
NOTE: NULL columns appear in the reject file with commas marking their column.

Correcting Reject File
Use the reject file and the session log to determine the cause for rejected data.
Correct the mapping and target database to eliminate some of the rejected data when you run the session again.
Keep in mind that correcting the reject file does not necessarily correct the source of the reject.
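The row and column indicators described above can be decoded mechanically when inspecting a reject file. The sketch below is an assumption-laden illustration, not an Informatica utility: it assumes each line is comma-separated, starts with the row indicator followed by that indicator's own column indicator, and then alternates data value and column indicator. The overflow indicator is written as O, and the sample line and the name `parse_reject_line` are hypothetical.

```python
import csv

ROW_INDICATORS = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}
COLUMN_INDICATORS = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}


def parse_reject_line(line: str):
    """Decode one reject-file line under the assumed value/indicator pair layout."""
    fields = next(csv.reader([line]))
    row_flag = ROW_INDICATORS[fields[0]]
    # fields[1] is the column indicator that follows the row indicator itself.
    data = fields[2:]
    columns = [
        (data[i], COLUMN_INDICATORS[data[i + 1]])
        for i in range(0, len(data) - 1, 2)
    ]
    return row_flag, columns


# Hypothetical rejected row: flagged for reject, last column NULL (empty value).
flag, columns = parse_reject_line("3,D,1921,D,Nelson,D,,N")
print(flag)
print(columns)
```

A row flag of "reject" points at an update strategy expression rather than at the target database, which matters when deciding what to correct first.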

Trying to correct target rejected rows before correcting writer rejected rows is not recommended, since they may contain misleading column indicators.
For example, a series of N indicators might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to zero. However, if those rows also had a 3 in the row indicator column, the row was rejected by the writer because of an update strategy expression, not because of a target database restriction.
If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0 values in place of NULL values.

Why can the writer reject?
Data overflowed column constraints
An update strategy expression

Why can the target database reject?
Data contains a NULL column
Database errors, such as key violations

Steps for loading the reject file:
After correcting the rejected data, rename the rejected file to reject_file.
The rejloader uses the data movement mode configured for the server. It also uses the code page of the server/OS. Hence do not change these in the middle of the reject loading.
Use the reject loader utility:
Pmrejldr pmserver.cfg [folder name] [session name]

Other points
The server does not perform the following options when using the reject loader:
(a) Source based commit
(b) Constraint based loading
(c) Truncated target table
(d) FTP targets
(e) External loading

Multiple reject loaders
You can run the session several times and correct rejected data from the several sessions at once. You can correct and load all of the reject files at once, or work on one or two reject files, load them and work on the others at a later time.

External Loading
You can configure a session to use Sybase IQ, Teradata and Oracle external loaders to load session target files into the respective databases.
The External Loader option can increase session performance, since these databases can load information directly from files faster than they can run the SQL commands to insert the same data into the database.
Method:
When a session uses an External Loader, the session creates a control file and a target flat file. The control file contains information about the target flat file, such as data format and loading instructions for the External Loader. The control file has an extension of "*.ctl" and you can view the file in $PmtargetFilesDir.
For using an External Loader, the following must be done:
Configure an external loader connection in the server manager.
Configure the session to write to a target flat file local to the server.
Choose an external loader connection for each target file in the session property sheet.
Issues with External Loader:
Disable constraints
Performance issues

o Increase commit intervals
o Turn off database logging
Code page requirements
The server can use multiple External Loaders within one session (Ex: you have a session with two target files, one with the Oracle External Loader and another with the Sybase External Loader).

Other Information:
The External Loader performance depends upon the platform of the server.
The server loads data at different stages of the session.
The server writes External Loader initialization and completion messages in the session log. However, details about EL performance are generated in the EL log, which is stored in the same target directory.
If the session contains errors, the server continues the EL process. If the session fails, the server loads partial target data using EL.
The EL creates a reject file for data rejected by the database. The reject file has an extension of "*.ldr".
The EL saves the reject file in the target file directory.
You can load corrected data from the file using the database reject loader, and not through the Informatica reject load utility (for the EL reject file only).

Configuring EL in session
In the server manager, open the session property sheet.
Select File target, and then click flat file options.

Caches
The server creates index and data caches in memory for Aggregator, Rank, Joiner and Lookup transformations in a mapping.
The server stores key values in index caches and output values in data caches; if the server requires more memory, it stores overflow values in cache files.
When the session completes, the server releases cache memory, and in most circumstances, it deletes the cache files.

Caches Storage overflow:
Transformation   Index cache                                             Data cache
Aggregator       Stores group values as configured in the                Stores calculations based on the
                 Group-by ports                                          Group-by ports
Rank             Stores group values as configured in the                Stores ranking information based on the
                 Group-by ports                                          Group-by ports
Joiner           Stores index values for the master source table         Stores master source rows
                 as configured in the Joiner condition
Lookup           Stores lookup condition information                     Stores lookup data that's not stored in
                                                                         the index cache

Determining cache requirements
To calculate the cache size, you need to consider column and row requirements as well as processing overhead.
The server requires processing overhead to cache data and index information. Column overhead includes a null indicator, and row overhead can include row to key information.
Steps:

First, add the total column size in the cache to the row overhead.
Multiply the result by the number of groups (or) rows in the cache; this gives the minimum cache requirements.
For maximum requirements, multiply the minimum requirements by 2.

Location:
By default, the server stores the index and data files in the directory $PMCacheDir.
The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds 2GB, you may find multiple index and data files in the directory. The server appends a number to the end of the filename (PMAGG*.idx1, PMAGG*.idx2, etc).

Aggregator Caches
When the server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation.
When you partition a source, the server creates one memory cache and one disk cache for each partition. It routes data from one partition to another based on group key values of the transformation.
The server uses memory to process an Aggregator transformation with sorted ports. It doesn't use cache memory, so you don't need to configure the cache memory for Aggregators that use sorted ports.
Index cache: #Groups ((Σ column size) + 7)
Aggregate data cache: #Groups ((Σ column size) + 7)

Rank Cache
When the server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row.
If the Rank transformation is configured to rank across multiple groups, the server ranks incrementally for each group it finds.
Index cache: #Groups ((Σ column size) + 7)
Rank data cache: #Groups [(#Ranks * (Σ column size + 10)) + 20]

Joiner Cache:
When the server runs a session with a Joiner transformation, it reads all rows from the master source and builds memory caches based on the master rows.
After building these caches, the server reads rows from the detail source and performs the joins.
The server creates the index cache as it reads the master source into the data cache. The server uses the index cache to test the join condition. When it finds a match, it retrieves row values from the data cache.
To improve joiner performance, the server aligns all data for the joiner cache on an eight byte boundary.
Index cache: #Master rows [(Σ column size) + 16]
Joiner data cache: #Master rows [(Σ column size) + 8]

Lookup cache:
When the server runs a Lookup transformation, it builds a cache in memory when it processes the first row of data in the transformation.
The server builds the cache and queries it for each row that enters the transformation.
If you partition the source pipeline, the server allocates the configured amount of memory for each partition.
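The Aggregator, Rank and Joiner cache formulas above are simple enough to check with a few lines of arithmetic. The following Python sketch encodes them as written in these notes; the function names, the sample column sizes and the reading of the results as bytes are assumptions made for illustration. Per the steps above, each result is a minimum, and doubling it gives the maximum requirement.

```python
def aggregator_cache(groups, index_col_sizes, data_col_sizes):
    """Index: #Groups * (sum(col) + 7); data: #Groups * (sum(col) + 7)."""
    index = groups * (sum(index_col_sizes) + 7)
    data = groups * (sum(data_col_sizes) + 7)
    return index, data


def rank_cache(groups, ranks, index_col_sizes, data_col_sizes):
    """Index: #Groups * (sum(col) + 7); data: #Groups * (#Ranks * (sum(col) + 10) + 20)."""
    index = groups * (sum(index_col_sizes) + 7)
    data = groups * (ranks * (sum(data_col_sizes) + 10) + 20)
    return index, data


def joiner_cache(master_rows, index_col_sizes, data_col_sizes):
    """Index: #Master rows * (sum(col) + 16); data: #Master rows * (sum(col) + 8)."""
    index = master_rows * (sum(index_col_sizes) + 16)
    data = master_rows * (sum(data_col_sizes) + 8)
    return index, data


# Hypothetical sizing: 1000 groups, two group-by columns, two aggregated columns.
min_index, min_data = aggregator_cache(1000, [10, 4], [8, 8])
print("minimum:", min_index, min_data)
print("maximum:", 2 * min_index, 2 * min_data)
```

The column sizes passed in are whatever the ports in each cache occupy; the notes do not say how to derive them, so they are left as inputs.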

If two Lookup transformations share the cache, the server does not allocate additional memory for the second Lookup transformation.
The server creates index and data cache files in the lookup cache directory and uses the server code page to create the files.
Index cache: #Rows in lookup table [(Σ column size) + 16]
Lookup data cache: #Rows in lookup table [(Σ column size) + 8]

Transformations
A transformation is a repository object that generates, modifies or passes data.
(a) Active Transformation:
a. Can change the number of rows that pass through it (Filter, Normalizer, Rank ..)
(b) Passive Transformation:
a. Does not change the number of rows that pass through it (Expression, Lookup ..)
NOTE: Transformations can be connected to the data flow or they can be unconnected.
An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation and returns a value to that transformation.

Reusable Transformations:
When you add a reusable transformation to a mapping, the definition of the transformation exists outside the mapping while an instance appears within the mapping.
All the changes you make in the transformation will immediately reflect in instances.
You can create a reusable transformation by two methods:
(a) Designing in the transformation developer
(b) Promoting a standard transformation
Changes that reflect in mappings are those like expressions. If port names etc. are changed, they won't reflect.
You can create a non-reusable instance of a reusable transformation.

Handling High-Precision Data:
The server processes decimal values as doubles or decimals.
When you create a session, you choose to enable the decimal data type or let the server process the data as double (precision of 15).
Example:
You may have a mapping with decimal (20,0) that passes through. The value may be 40012030304957666903. If you enable decimal arithmetic, the server passes the number as it is. If you do not enable decimal arithmetic, the server passes 4.00120303049577 x 10^19.
If you want to process a decimal value with a precision greater than 28 digits, the server automatically treats it as a double value.

Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it would any other session, passing data through each transformation in the mapplet.
If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping using the mapplet.
Mapplet Objects:
(a) Input transformation

(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation

Mapplet Won't Support:
Joiner
Normalizer
Pre/Post session stored procedure
Target definitions
XML source definitions

Types of Mapplets:
(a) Active Mapplets : Contain one or more active transformations
(b) Passive Mapplets : Contain only passive transformations
Copied mapplets are not an instance of the original mapplet. If you make changes to the original, the copy does not inherit your changes.
You can use a single mapplet even more than once in a mapping.

Ports
Default value for I/P port : NULL
Default value for O/P port : ERROR
Default value for variables : Does not support default values

Session Parameters
These parameters represent values you might want to change between sessions, such as a DB connection or source file.
We can use a session parameter in a session property sheet, then define the parameters in a session parameter file.
The user defined session parameters are:
(a) DB connection
(b) Source file directory
(c) Target file directory
(d) Reject file directory
Description:
Use session parameters to make sessions more flexible. For example, you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database connection parameter, like $DBConnectionSource, and use it as the source database connection for the session.
When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After it completes, set the value to TransDB2 and run the session again.
NOTE:
You can use several parameters together to make session management easier.
Session parameters do not have default values. When the server cannot find a value for a session parameter, it fails to initialize the session.

Session Parameter File
A parameter file is created with a text editor.
In it, we can specify the folder and session name, then list the parameters and variables used in the session and assign each a value.
Save the parameter file in any directory and load it to the server.
We can define the following values in a parameter file:
o Mapping parameters
o Mapping variables
o Session parameters

Transformation. to ensure data integrity. mappings an d source/target schemas. When you want to use the same value for a mapping parameter each time you run th e session. Code pages contains the encoding to specify characters in a set of one or more l anguages. (b) After Session: real Debugging process Metadata Reporter: Web based application that allows to run reports against repository metadat a Reports including executed sessions. you can reuse a map ping by altering the parameter and variable values of the mappings in the sessio n. Compatibility between code pages is essential for accurate data movement. Unlike a mapping parameter. Cubes. lookup table dependencies. The various code page components are Operating system Locale settings Operating system code page Informatica server data movement mode Informatica server code page Informatica repository code page Locale (a) System Locale System Default (b) User locale setting for date. display © Input locale Mapping Parameter and Variables These represent values in mappings/mapplets. If we declare mapping parameters and variables in a mapping. Mapping objects: Source. You can override the parameter file for sessions contained in a batch by us ing a batch parameter file. based on the type of character data in the mappings. The server saves the value of a mapping variable to the re pository at the end of each successful run and used that value the next time you run the session. We can select a code page.n in a single parameter file by creating separate sections. This can reduce the overhead of creating multiple mappings when only certain att ributes of mapping needs to be changed. It uses 2 bytes for each character to move data and performs additional ch ecks at session level. Passes 8 bytes. Dimension Debugger We can run the Debugger in two situations (a) Before Session: After saving mapping. Default one b. . Target. multi byte character data b. 
a mapping variable represent a value that can change through the session. A batch parameter file has the same format as a sess ion parameter file Locale Informatica server can transform character data in two modes (a) ASCII a. for each session wit h in the parameter file. we can run some initial tests. Passes 7 byte. time. US-ASCII character data (b) UNICODE a.
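The session parameter file described in these notes (one section per folder and session, followed by name=value assignments) can be sketched as a small generator. This is a minimal illustration, not the server's own tooling: the `[folder.session]` heading style is an assumption that may differ between PowerCenter versions, and the folder and session names `Sales` and `s_load_trans` are hypothetical. Only `$DBConnectionSource`, `TransDB1` and `TransDB2` come from the notes.

```python
def render_parameter_file(sections):
    """Render {(folder, session): {name: value}} as parameter-file text."""
    lines = []
    for (folder, session), params in sections.items():
        # One section per session, headed by the folder and session name
        # (heading style is an assumption; check your server version).
        lines.append(f"[{folder}.{session}]")
        for name, value in params.items():
            lines.append(f"{name}={value}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Hypothetical folder and session names; connection names are from the notes.
first_run = render_parameter_file(
    {("Sales", "s_load_trans"): {"$DBConnectionSource": "TransDB1"}}
)
second_run = render_parameter_file(
    {("Sales", "s_load_trans"): {"$DBConnectionSource": "TransDB2"}}
)
print(first_run)
```

Running the same session twice with the two rendered files corresponds to the TransDB1 then TransDB2 example in the notes.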

Repository
Types of Repository
(a) Global Repository
a. This is the hub of the domain. Use the GR to store common objects that multiple developers can use through shortcuts. These may include operational or application source definitions, reusable transformations, mapplets and mappings.
(b) Local Repository
a. A Local Repository is a repository within a domain that is not the global repository. Use the Local Repository for development.
(c) Standard Repository
a. A repository that functions individually, unrelated and unconnected to other repositories.
NOTE: Once you create a global repository, you cannot change it to a local repository. However, you can promote a local repository to a global repository.

Batches
Provide a way to group sessions for either serial or parallel execution by the server.
Batches
o Sequential (runs sessions one after another)
o Concurrent (runs sessions at the same time)
Nesting Batches
Each batch can contain any number of sessions/batches. We can nest batches several levels deep, defining batches within batches. Nested batches are useful when you want to control a complex series of sessions that must run sequentially or concurrently.
Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by default. However, we can configure a batched session to run on its own schedule by selecting the "Use Absolute Time Session" option.
Server Behavior
A server configured to run a batch overrides the server configuration to run sessions within the batch. If you have multiple servers, all sessions within a batch run on the Informatica server that runs the batch.
The server marks a batch as failed if one of its sessions is configured to run if "Previous completes" and that previous session fails.
Concurrent Batch
In this mode, the server starts all of the sessions within the batch at the same time. Concurrent batches take advantage of the resources of the Informatica server, reducing the time it takes to run the sessions separately or in a sequential batch.
Concurrent batch in a Sequential batch
If you have concurrent batches with source-target dependencies that benefit from running those batches in a particular order, just like sessions, place them into a sequential batch.
Sequential Batch
If you have sessions with dependent source/target relationships, you can place them in a sequential batch, so that the Informatica server can run them in consecutive order.
There are two ways of running sessions under this category:
(a) Run the session only if the previous completes successfully
(b) Always run the session (this is the default)
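The two run options for a sequential batch, and the rule that a batch is marked failed when a "run only if previous completes" session follows a failed one, can be modelled roughly as below. This is a toy simulation rather than server code: the function and session names are hypothetical, and stopping the batch at the first blocked session is an assumption the notes do not confirm.

```python
def run_sequential_batch(sessions):
    """Simulate a sequential batch.

    sessions: list of (name, succeeds, only_if_previous_completes).
    Returns (names of sessions that ran, whether the batch is marked failed).
    """
    ran = []
    batch_failed = False
    previous_ok = True
    for name, succeeds, only_if_previous in sessions:
        if only_if_previous and not previous_ok:
            # Dependent session cannot start: the batch is marked failed.
            # Assumption: nothing further in the batch runs after this point.
            batch_failed = True
            break
        ran.append(name)
        previous_ok = succeeds
    return ran, batch_failed


# "Always run" (the default) keeps going after a failure.
print(run_sequential_batch([("s_stage", False, False), ("s_load", True, False)]))
# "Run only if previous completes" blocks the second session.
print(run_sequential_batch([("s_stage", False, False), ("s_load", True, True)]))
```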

Server Concepts
The Informatica server uses three system resources:
(a) CPU
(b) Shared Memory
(c) Buffer Memory
The Informatica server uses shared memory, buffer memory and cache memory for session information and to move data between session threads.

LM Shared Memory
The Load Manager uses both process and shared memory. The LM keeps the Informatica server list of sessions and batches, and the schedule queue, in process memory. Once a session starts, the LM uses shared memory to store session details for the duration of the session run or session schedule. This shared memory appears as the configurable parameter (LMSharedMemory), and the server allots 2,000,000 bytes as default. This allows you to schedule or run approximately 10 sessions at one time.

DTM Buffer Memory
The DTM process allocates buffer memory to the session based on the DTM buffer pool size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the session. The DTM divides memory into buffer blocks as configured in the buffer block size setting. (Default: 64,000 bytes per block)

Running a Session
The following tasks are done during a session:
1. LM locks the session and reads session properties
2. LM reads parameter file
3. LM expands server/session variables and parameters
4. LM verifies permissions and privileges
5. LM validates source and target code pages
6. LM creates session log file
7. LM creates DTM process
8. DTM process allocates DTM process memory
9. DTM initializes the session and fetches mapping
10. DTM executes pre-session commands and procedures
11. DTM creates reader, writer, transformation threads for each pipeline
12. DTM executes post-session commands and procedures
13. DTM writes historical incremental aggregation/lookup to repository
14. LM sends post-session emails

Stopping and aborting a session
If the session you want to stop is a part of a batch, you must stop the batch. If the batch is part of a nested batch, stop the outermost batch.
When you issue the stop command, the server stops reading data. It continues processing and writing data and committing data to targets.
If the server cannot finish processing and committing data, you can issue the ABORT command. It is similar to the stop command, except it has a 60 second timeout. If the server cannot finish processing and committing data within 60 seconds, it kills the DTM process and terminates the session.
NOTE: The ABORT command and the ABORT function are different.

Recovery
After a session has been stopped/aborted, the session results can be recovered. When the recovery is performed, the session continues from the point at which it stopped.
If you do not recover the session, the server runs the entire session the next time. Hence, after stopping/aborting, you may need to manually delete targets before the session runs again.
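As a quick arithmetic check on the default memory figures quoted in these notes (12,000,000 bytes of DTM buffer memory, 64,000-byte buffer blocks, 2,000,000 bytes of LM shared memory for roughly 10 sessions), the sketch below works out the implied block count and per-session share. The constant and function names are hypothetical, and the per-session figure is only the rough average implied by the notes, not a documented setting.

```python
DTM_BUFFER_POOL_BYTES = 12_000_000  # default DTM buffer memory, from the notes
BUFFER_BLOCK_BYTES = 64_000         # default buffer block size, from the notes
LM_SHARED_MEMORY_BYTES = 2_000_000  # default LMSharedMemory, from the notes
APPROX_SESSIONS = 10                # "approximately 10 sessions", from the notes


def whole_buffer_blocks(pool_bytes, block_bytes):
    """Number of complete buffer blocks that fit in the pool."""
    return pool_bytes // block_bytes


blocks = whole_buffer_blocks(DTM_BUFFER_POOL_BYTES, BUFFER_BLOCK_BYTES)
per_session = LM_SHARED_MEMORY_BYTES // APPROX_SESSIONS

print(blocks)       # 187 whole blocks (12,000,000 / 64,000 = 187.5)
print(per_session)  # 200000 bytes of shared memory per session, on average
```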

When can a Session Fail
o Server cannot allocate enough system resources
o Session exceeds the maximum number of sessions the server can run concurrently
o Server cannot obtain an execute lock for the session (the session is already locked)
o Server is unable to execute post-session shell commands or post-load stored procedures
o Server encounters database errors
o Server encounters transformation row errors (Ex: NULL value in non-null fields)
o Network related errors

When Pre/Post Shell Commands are useful
o To delete a reject file
o To archive target files before the session begins

Session Performance
o Minimum log (Terse)
o Partitioning source data
o Performing ETL for each partition, in parallel (for this, multiple CPUs are needed)
o Adding indexes
o Changing commit level
o Using a Filter transformation to remove unwanted data movement
o Increasing buffer memory, when handling a large volume of data
o Multiple lookups can reduce the performance. Verify the largest lookup table and tune the expressions.
o At session level, the causes are small cache size, low buffer memory and small commit interval.
o At system level:
  o WIN NT/2000 - use the Task Manager
  o UNIX: vmstat, iostat

Hierarchy of optimization
Target, Source, Mapping, Session, System.
Optimizing Target Databases:
o Drop indexes/constraints
o Increase checkpoint intervals
o Use bulk loading/external loading
o Turn off recovery
o Increase database network packet size
Source level:
o Optimize the query (using group by)
o Use conditional filters
o Connect to RDBMS using IPC protocol
Mapping:
o Optimize data type conversions
o Eliminate transformation errors
o Optimize transformations/expressions
Session:
o Concurrent batches
o Partition sessions
o Reduce error tracing
o Remove staging area
o Tune session parameters
System:
o Improve network speed
o Use multiple Informatica servers on separate systems
o Reduce paging

Session Process
The Informatica server uses both process memory and system shared memory to perform the ETL process. It runs as a daemon on UNIX and as a service on WIN NT.
The following processes are used to run a session:
(a) Load Manager process: starts a session and creates the DTM process, which creates the session.
(b) DTM process: creates threads to initialize the session, read, write and transform data, and handle pre/post session operations.
Load Manager process:
o Manages session/batch scheduling
o Locks session
o Reads parameter file
o Expands server/session variables, parameters
o Verifies permissions/privileges
o Creates session log file
DTM process:
The primary purpose of the DTM is to create and manage threads that carry out the session tasks. The DTM allocates process memory for the session and divides it into buffers. This is known as buffer memory. The default memory allocation is 12,000,000 bytes. The DTM creates the main thread, which is called the master thread; this manages all other threads.
Various threads and their functions:
o Master thread - handles stop and abort requests from the Load Manager.
o Mapping thread - one thread for each session. Fetches session and mapping information. Compiles mapping. Cleans up after execution.
o Reader thread - one thread for each partition. Relational sources use relational threads and flat files use file threads.
o Writer thread - one thread for each partition; writes to the target.
o Transformation thread - one or more transformation threads for each partition.

Note: When you run a session, the threads for a partitioned source execute concurrently. The threads use buffers to move/transform data.
******************************************************************************
What are the output files that the Informatica server creates while a session is running?
Informatica server log: The Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered and load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as table name and number of rows written or rejected. You can view this file by double-clicking the session in the monitor window.
Performance detail file: This file contains information known as session performance details, which helps you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: The Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as data format and loading instructions for the external loader.
Post session email: Post session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the following transformations:
o Aggregator transformation
o Joiner transformation
o Rank transformation
o Lookup transformation
******************************************************************************

******************************************************************************
Add an EXPRESSION transformation after a SQ and before the target. If the source or target definition changes, reconnecting ports is much easier.
Denormalization using aggregator
LAST function in aggregator
Returns the last row in the selected port. Optionally, you can apply a filter to limit the rows the Integration Service reads. You can nest only one other aggregate function within LAST.
LAST(AMOUNT, MONTH)
MQ Source qualifier transformation
***********************************************
Test load takes the number of rows from the SQ as specified in the test rows, and shows to which instance each row goes. The rows are not loaded to the target. Not working for flat file target.
***********************************************
EVENTWAIT-EVENTRAISE tasks
First create an event: right-click the workflow and edit -> events tab -> create an event.
Place an eventraise task in the workflow space and give the created event name in the "user defined event" (properties tab).
Place an eventwait task in the workflow space -> 2 options in the events tab:
predefined - this is a file watch
userdefined - this is an event created on the workflow properties events tab
Give the event created above as the user-defined event to watch. When the eventraise is executed, it creates the event -> it triggers the eventwait to continue.
***********************************************
Timer Task
You can specify the period of time to wait before the Integration Service runs the next task in the workflow with the Timer task. Two options:
Absolute time - give the exact time when to start the next task, or refer to a datetime variable
Relative time - give the hours, minutes, seconds
  from the start time of this task
  from the start time of the parent workflow/worklet
  from the start time of the top-level workflow
***********************************************
Decision Task
Use the Decision task instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the predefined condition variable in a Decision task to simplify link conditions. The same result can be achieved without this task also.
***********************************************
Assignment task
You can assign a value to a user-defined workflow variable with the Assignment task.
***********************************************
Control Task
fail me
fail parent
abort parent
fail top-level workflow
stop top-level workflow
abort top-level workflow
***********************************************
DECODE( ITEM_ID, 10, 'Flashlight', 14, 'Regulator', 20, 'Knife', 40, 'Tank', 'NONE' )
DECODE ( CONST_NAME, 'Five', 5, 'Pythagoras', '1.414213562', 'Archimedes', '3.141592654', 'Pi', 3.141592654 )
***********************************************
If you use an output default value other than ERROR, the default value overrides the ERROR function in an expression. For example, you use the ERROR function in an expression, and you assign the default value, '1234', to the output port. Each time the Integration Service encounters the ERROR function in the expression, it overrides the error with the value 1234 and passes 1234 to the next transformation. It does not skip the row, and it does not log an error in the session log.
IIF( SALARY < 0, ERROR ('Error. Negative salary found. Row skipped.'), EMP_SALARY )
SALARY     RETURN VALUE
10000      10000
-15000     'Error. Negative salary found. Row skipped.'
NULL       NULL
150000     150000
1005       1005
***********************************************
EXP
Returns e raised to the specified power (exponent), where e=2.71828183. For example, EXP(2) returns 7.38905609893065.
***********************************************
FIRST
Returns the first value found within a port or group. Optionally, you can apply a filter to limit the rows the Integration Service reads. You can nest only one other aggregate function within FIRST.
FIRST( value [, filter_condition ] )
FIRST( ITEM_NAME, ITEM_PRICE > 10 )
LAST( ITEM_NAME, ITEM_PRICE > 10 )
LPAD( PART_NUM, 6, '0' )
RTRIM( LAST_NAME, 'S.' )
*********************************************

FV
Returns the future value of an investment, where you make periodic, constant payments and the investment earns a constant interest rate.
Syntax
FV( rate, terms, payment [, present value, type] )
FV(0.0075, 12, -250, -2000, TRUE)
*********************************************
GET_DATE_PART( DATE_SHIPPED, 'HH12' )
GET_DATE_PART( DATE_SHIPPED, 'DD' )
GET_DATE_PART( DATE_SHIPPED, 'MON' )
*********************************************
GREATEST( QUANTITY1, QUANTITY2, QUANTITY3 )
*********************************************
INDEXOF( ITEM_NAME, 'diving hood', 'flashlight', 'safety knife' )
1 if the input value matches string1, 2 if the input value matches string2, and so on.
0 if the input value is not found.
NULL if the input is a null value.
*********************************************
INITCAP( string )
*********************************************
INSTR( COMPANY, 'a', 1, 2 )
*********************************************
Is_Date() returns 0 or 1
*********************************************
Mapping -> Debugger -> use the existing session instance -> click next instance [shows one row at a time as it moves throughout the mapping; displays the next row]
*********************************************
Is_Number()
Is_spaces() Returns whether a string value consists entirely of spaces.
LAST_DAY() Returns the date of the last day of the month for each date in a port.
********************************************
MAX( ITEM_NAME, MANUFACTURER_ID='104' )
The MAX function uses the same sort order that the Sorter transformation uses. However, the MAX function is case sensitive, and the Sorter transformation may not be case sensitive.
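The FV example in these notes, FV(0.0075, 12, -250, -2000, TRUE), can be checked by hand with the standard future-value formula. The sketch below assumes that FV follows the usual spreadsheet convention: payments and present value are entered as negative cash outflows, and `type` = TRUE means payments fall at the start of each period. The function name `future_value` is hypothetical.

```python
def future_value(rate, terms, payment, present_value=0.0, pay_at_start=False):
    """Future value of constant periodic payments at a constant interest rate."""
    growth = (1 + rate) ** terms
    if rate == 0:
        annuity = payment * terms  # no interest: payments simply add up
    else:
        # Payments at the start of a period earn one extra period of interest.
        timing = (1 + rate) if pay_at_start else 1
        annuity = payment * timing * (growth - 1) / rate
    # Outflows are negative, so negate to report the value as a positive amount.
    return -(present_value * growth + annuity)


fv = future_value(0.0075, 12, -250, -2000, pay_at_start=True)
print(round(fv, 2))  # 5337.96
```

With a 0.75% monthly rate over 12 terms, the 2000 already invested grows to about 2187.61 and the twelve 250 payments to about 3150.35, which together give the 5337.96 printed above.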

You can also use MAX to return the latest date or the largest numeric value in a port or group.
*******************************************
MD5 - Message-Digest algorithm 5
METAPHONE - Encodes string values. You can specify the length of the string that you want to encode.
*******************************************
The following expression returns the average order for a Stabilizing Vest, based on the first five rows in the Sales port, and thereafter, returns the average for the last five rows read:
MOVINGAVG( SALES, 5 )
MOVINGSUM( SALES, 5 )
*******************************************
POWER( NUMBERS, EXPONENT )
*******************************************
RAND (1)
Returns a random number between 0 and 1. This is useful for probability scenarios.
*******************************************
REG_EXTRACT
REPLACESTR ( CaseFlag, InputString, OldString1, [OldString2, ... OldStringN,] NewString )
REVERSE( string )
RPAD( ITEM_NAME, 16, '.' )
TO_FLOAT() - returns 0 if the value in the port is blank or a non-numeric character.
******************************************
What are global and local shortcuts?
Standalone Repository: A repository that functions individually and is unrelated to any other repositories.
Global Repository: This is a centralized repository in a domain. This repository can contain objects shared across the repositories in a domain. The objects are shared through global shortcuts.
Local Repository: A local repository is within a domain and is not a global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
**************************************************
When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. When you use bulk loading, weigh the importance of improved session performance against the ability to recover an incomplete session.
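The MOVINGAVG( SALES, 5 ) and MOVINGSUM( SALES, 5 ) notes describe a sliding window over the last five rows read. The sketch below shows that window logic in Python. It assumes that no value is produced until five rows have been read (represented here as `None`), and the sales figures are made-up sample data.

```python
from collections import deque


def moving_sum(values, window):
    """Sum of the last `window` rows; None until `window` rows have been read."""
    recent = deque(maxlen=window)  # automatically drops the oldest row
    results = []
    for value in values:
        recent.append(value)
        results.append(sum(recent) if len(recent) == window else None)
    return results


def moving_avg(values, window):
    """Average of the last `window` rows; None until `window` rows have been read."""
    return [None if s is None else s / window for s in moving_sum(values, window)]


sales = [600, 504, 36, 100, 550, 39, 490]  # hypothetical SALES rows
print(moving_sum(sales, 5))  # [None, None, None, None, 1790, 1229, 1215]
print(moving_avg(sales, 5))  # [None, None, None, None, 358.0, 245.8, 243.0]
```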

*************************************************
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'Y', VAL_A + VAL_B + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'N', VAL_A + VAL_B,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'Y', VAL_A + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'N', VAL_A,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'Y', VAL_B + VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'N', VAL_B,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'Y', VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'N', 0.0
))))))))
This expression requires 8 IIFs, 16 ANDs, and at least 24 comparisons.
If you take advantage of the IIF function, you can rewrite that expression as:
IIF(FLG_A='Y', VAL_A, 0.0)+ IIF(FLG_B='Y', VAL_B, 0.0)+ IIF(FLG_C='Y', VAL_C, 0.0)
*************************************************
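To see that the three-term rewrite gives the same result as the eight-branch expression, the sketch below mirrors both forms in Python and compares them for every combination of flags. It assumes each flag is only ever 'Y' or 'N'; the function names and sample values are hypothetical.

```python
from itertools import product


def nested_iif(flg_a, flg_b, flg_c, val_a, val_b, val_c):
    """Mirror of the eight-branch expression: one branch per flag combination."""
    if flg_a == "Y" and flg_b == "Y" and flg_c == "Y":
        return val_a + val_b + val_c
    if flg_a == "Y" and flg_b == "Y" and flg_c == "N":
        return val_a + val_b
    if flg_a == "Y" and flg_b == "N" and flg_c == "Y":
        return val_a + val_c
    if flg_a == "Y" and flg_b == "N" and flg_c == "N":
        return val_a
    if flg_a == "N" and flg_b == "Y" and flg_c == "Y":
        return val_b + val_c
    if flg_a == "N" and flg_b == "Y" and flg_c == "N":
        return val_b
    if flg_a == "N" and flg_b == "N" and flg_c == "Y":
        return val_c
    return 0.0


def summed_iif(flg_a, flg_b, flg_c, val_a, val_b, val_c):
    """Mirror of the rewrite: each flag contributes its value or 0.0."""
    return ((val_a if flg_a == "Y" else 0.0)
            + (val_b if flg_b == "Y" else 0.0)
            + (val_c if flg_c == "Y" else 0.0))


all_match = all(
    nested_iif(a, b, c, 1.5, 2.5, 4.0) == summed_iif(a, b, c, 1.5, 2.5, 4.0)
    for a, b, c in product("YN", repeat=3)
)
print(all_match)  # True
```

The rewrite needs only three comparisons per row because each flag is tested once, independently of the others.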

******************************************************
Look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Transformation
5. Session
6. Grid deployments
7. PowerCenter components
8. System
******************************************
Use the following methods to identify performance bottlenecks:
Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.
Analyze performance details. Analyze performance details, such as performance counters, to determine where session performance decreases.
Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.
Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.
******************************************
If the reader or writer thread is 100% busy, consider using string datatypes in the source or target ports. Non-string ports require more processing.
If a transformation thread is 100% busy, consider adding a partition point in the segment. When you add partition points to the mapping, the Integration Service increases the number of transformation threads it uses for the session. However, if the machine is already running at or near full capacity, do not add more threads.
If one transformation requires more processing time than the others, consider adding a pass-through partition point to the transformation.
*****************************************
Complete the following tasks to eliminate target bottlenecks:
Have the database administrator optimize database performance by optimizing the query.
Increase the database network packet size.
Configure index and key constraints.
****************************************
Source Bottlenecks
Performance bottlenecks can occur when the Integration Service reads from a source database. An inefficient query or small database network packet sizes can cause source bottlenecks.
Identifying Source Bottlenecks
You can read the thread statistics in the session log to determine if the source is the bottleneck. When the Integration Service spends more time on the reader thread than the transformation or writer threads, you have a source bottleneck.
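The thread-statistics rules in these notes can be collected into a small helper that turns busy percentages into hints. This is only a sketch of the rules as written here: the function name and the idea of passing the three percentages as plain numbers are assumptions, and the session log's actual layout is not modelled.

```python
def thread_hints(reader_busy, transformation_busy, writer_busy):
    """Return tuning hints from thread busy percentages (0-100)."""
    hints = []
    # More time on the reader thread than on the others: source bottleneck.
    if reader_busy > transformation_busy and reader_busy > writer_busy:
        hints.append("source bottleneck")
    # Reader or writer fully busy: non-string ports need more processing.
    if reader_busy >= 100 or writer_busy >= 100:
        hints.append("consider string datatypes in source or target ports")
    # Transformation thread fully busy: add a partition point in that segment,
    # unless the machine is already at or near full capacity.
    if transformation_busy >= 100:
        hints.append("consider adding a partition point")
    return hints


print(thread_hints(100, 40, 20))
print(thread_hints(30, 100, 20))
```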

If the session reads from a relational source, use the following methods to identify source bottlenecks:
Filter transformation
Read test mapping
Database query
If the session reads from a flat file source, you probably do not have a source bottleneck.

Using a Filter Transformation
You can use a Filter transformation in the mapping to measure the time it takes to read source data. Add a Filter transformation after each source qualifier. Set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.

Using a Read Test Mapping
You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformations in the mapping.
To create a read test mapping, complete the following steps:
1. Make a copy of the original mapping.
2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.
3. Remove all transformations.
4. Connect the source qualifiers to a file target.
Run a session against the read test mapping. If the session performance is similar to the original session, you have a source bottleneck.

Using a Database Query
To identify source bottlenecks, execute the read query directly against the source database. Copy the read query directly from the session log. Execute the query against the source database with a query tool such as isql. On Windows, you can load the result of the query in a file. On UNIX, you can load the result of the query in /dev/null.
Measure the query execution time and the time it takes for the query to return the first row.

Eliminating Source Bottlenecks
Complete the following tasks to eliminate source bottlenecks:
Set the number of bytes the Integration Service reads per line if the Integration Service reads from a flat file source.
Have the database administrator optimize database performance by optimizing the query.
Increase the database network packet size.
Configure index and key constraints.
If there is a long delay between the two time measurements in a database query, you can use an optimizer hint.
*****************************************************
Mapping Bottlenecks
If you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.
Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping. Configure the mapping with the least number of transformations and expressions to do the most amount of work possible. Delete unnecessary links between transformations to minimize the amount of data moved.
****************************************************
Optimizing the Line Sequential Buffer Length
If the session reads from a flat file source, you can improve session performance by setting the number of bytes the Integration Service reads per line. By default, the Integration Service reads 1024 bytes per line. If each line in the source file is less than the default setting, you can decrease the line sequential buffer length in the session properties.
***************************************************
Single-pass reading allows you to populate multiple targets with one source qualifier. Consider using single-pass reading if you have multiple sessions that use the same sources. You can combine the transformation logic for each mapping in one mapping and use one source qualifier for each source.
***************************************************
You can optimize performance for pass-through mappings. To pass directly from source to target without any other transformations, connect the Source Qualifier transformation directly to the target. If you use the Getting Started Wizard to create a pass-through mapping, the wizard creates an Expression transformation between the Source Qualifier transformation and the target.
***************************************************
A simple source filter on the source database can sometimes negatively impact performance because of the lack of indexes. You can use the PowerCenter conditional filter in the Source Qualifier to improve performance.
***************************************************
Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations. For example, many databases store U.S. ZIP code information as a Char or Varchar datatype. If you convert the zip code data to an Integer datatype, the lookup database stores the zip code 94303-1234 as 943031234. This helps increase the speed of the lookup comparisons based on zip code.
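The ZIP code conversion mentioned in these notes (94303-1234 stored as 943031234) amounts to dropping the hyphen and parsing the digits as an integer. A minimal sketch follows; the function name is hypothetical, and it assumes the input contains only digits and an optional hyphen.

```python
def zip_to_int(zip_code):
    """Convert a ZIP or ZIP+4 string to an integer by removing the hyphen."""
    # Caveat: leading zeros are lost in an integer ("02134" becomes 2134),
    # so both sides of the comparison must be converted the same way.
    return int(zip_code.replace("-", ""))


print(zip_to_int("94303-1234"))  # 943031234
```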

**************************************************
Optimizing aggregator
Grouping By Simple Columns - use integer values
Filtering Data Before You Aggregate
Limiting Port Connections - Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Using Sorted Input
Using Incremental Aggregation
If you can capture changes from the source that affect less than half the target, you can use incremental aggregation to optimize the performance of Aggregator transformations. When you use incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. The Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session.
You can increase the index and data cache sizes to hold all data in memory without paging to disk.
************************************************
Optimizing Joiner
Designate the master source as the source with fewer duplicate key values.
Designate the master source as the source with fewer rows.
Perform joins in a database when possible - Create a pre-session stored procedure to join the tables in a database. Use the Source Qualifier transformation to perform the join.
Join sorted data when possible - minimize disk input and output.
************************************************
Optimizing Lookup Transformations
Native Drivers
Caching Lookup Tables
If a mapping contains Lookup transformations, you might want to enable lookup caching. When you enable caching, the Integration Service caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Integration Service queries the lookup table on a row-by-row basis.
The result of the Lookup query and processing is the same, whether or not you cache the lookup table. However, using a lookup cache can increase session performance for smaller lookup tables. In general, you want to cache lookup tables that need less than 300 MB.
Use the following types of caches to increase performance:
Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.
Persistent cache. To save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a persistent cache can improve performance because the Integration Service builds the memory cache from the cache files instead of from the database.
Enabling Concurrent Caches - default Auto
**********************************************
Allocating Memory
For optimal performance, configure the Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service node. Allocate at least 16 MB of physical memory to sort data using the Sorter transformation. The Sorter cache size is set to 16,777,216 bytes by default. If the Integration Service cannot allocate enough memory to sort data, it fails the session.
If the amount of incoming data is greater than the amount of Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory. If the amount of incoming data is significantly greater than the Sorter cache size, the Integration Service may require much more than twice the amount of disk space available to the work directory.
Use the following formula to determine the size of incoming data:
# input rows ([Sum(column size)] + 16)
*********************************************
Use the Select Distinct option for the Source Qualifier transformation if you want the Integration Service to select unique values from a source. Use Select Distinct option to filter unnecessary data earlier in the data flow. This can improve performance.
***********************************************
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also, configure the Number of Cached Values property.
The Number of Cached Values property determines the number of values the Integration Service caches at one time. Make sure that the Number of Cached Values is not too small. Consider configuring the Number of Cached Values to a value greater than 1,000.
If you do not have to cache values, set the Number of Cached Values to 0. Sequence Generator transformations that do not use cache are faster than those that require cache.
********************************************
Transformation errors occur when the Integration Service encounters conversion errors, conflicting mapping logic, and any condition set up as an error, such as null input. Check the session log to see where the transformation errors occur. If the errors center around particular transformations, evaluate those transformation constraints.
********************************************
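The Sorter sizing rule quoted in these notes (incoming data = # input rows x (sum of column sizes + 16), work directory at least twice the incoming data) can be tried out with a small calculation. This is only a sketch in Python, not anything PowerCenter provides; the function names and the sample column sizes are made up for illustration, and the "at least twice" figure is a lower bound, not an exact requirement.

```python
DEFAULT_SORTER_CACHE_BYTES = 16_777_216  # default Sorter cache size quoted in the notes
ROW_OVERHEAD_BYTES = 16                  # the "+ 16" term of the formula


def incoming_data_bytes(input_rows, column_sizes):
    """Apply the formula: # input rows * (sum(column size) + 16)."""
    return input_rows * (sum(column_sizes) + ROW_OVERHEAD_BYTES)


def sorter_plan(input_rows, column_sizes, cache_bytes=DEFAULT_SORTER_CACHE_BYTES):
    """Estimate whether the sort fits in cache and the minimum work directory space.

    Hypothetical helper: when the data does not fit in the cache, the notes say
    the work directory needs at least twice the incoming data.
    """
    data = incoming_data_bytes(input_rows, column_sizes)
    spills = data > cache_bytes
    return {
        "incoming_bytes": data,
        "spills_to_work_dir": spills,
        "min_work_dir_bytes": 2 * data if spills else 0,
    }
```

For example, one million rows with columns of 10, 20 and 4 bytes come to 50,000,000 bytes, which exceeds the default cache, so the sketch reports a spill and at least 100,000,000 bytes of work directory space.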

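The idea behind incremental aggregation (apply only the captured changes to saved aggregates instead of recalculating from the whole source) can be illustrated outside PowerCenter. The sketch below is plain Python under simplifying assumptions: the aggregate is a SUM per group key, changes are insert-only rows, and the function names are hypothetical. It does not model how the Integration Service stores its aggregate cache.

```python
from collections import defaultdict


def full_aggregate(rows):
    """Recalculate SUM(amount) per key from every source row."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def apply_increment(saved_totals, new_rows):
    """Update saved totals using only the newly captured rows."""
    totals = dict(saved_totals)  # copy so the saved state is not mutated
    for key, amount in new_rows:
        totals[key] = totals.get(key, 0.0) + amount
    return totals
```

Running `apply_increment` on the saved totals with just the new rows gives the same result as `full_aggregate` over history plus new rows, while touching far fewer rows when the change set is small.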
****************************************************
Tools -> options -> general tab
Reload Tables/Mappings when Opening a Folder
Ask Whether to Reload the Tables/Mappings
Group Source by Database
********************************************
Target file having datetime as header
Header in session properties - Command - give direct command or batch script
********************************************
Join condition in SQL OVERRIDE - {} not needed for one join condition alone
Lookup SQL Override column order and the port order - does not affect

Update employee set employeename = case when empid = 1 then 'RK' when empid = 2 then 'Valluri' else null end

Incremental aggregation: When you enable both truncate target tables and incremental aggregation in the session properties, the Workflow Manager issues a warning that you cannot enable truncate target tables and incremental aggregation in the same session.
Test load: When you enable both truncate target tables and test load, the Integration Service disables the truncate table function, runs a test load session, and writes the following message to the session log:
**********************************************
Java Transformation Examples

if(isNull("LegacyAssociatedOrgRefno"))
{
    IdentifyingType="Parent";
    generateRow();
    IdentifyingType="Child";
    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
    generateRow();
}

if(isNull("LegacyAssociatedOrgRefno") && HOTYPE.equals("STHA"))
{
    LegacyAssociatedOrgRefno=-1;
    Level=0;
    IdentifyingType="Self";
    generateRow();
    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
}

Swapping columns example:

IdentifyingType = "Parent";
generateRow();
IdentifyingType = "Child";
LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;
generateRow();
int x = LegacyAssociatedOrgRefno;
int y = LegacyIdentifyingRefno;
LegacyIdentifyingRefno = x;
LegacyAssociatedOrgRefno = y;
generateRow();
**************************************************
UNIX
man
ls -ltr
wc [-l -w]
mv filename
diff
cp
rm
gzip
gunzip
ff - to find the file anywhere in the system
mkdir
cd
pwd
grep - looks for string in the file
ftp
chmod - 4+2+1 (r+w+x) ex: chmod 755
cat f1 f2 >f3
^-c (ctrl-c) to kill a running process
^-d (ctrl-d) to close an open window
head
tail
head -100 filename | tail -1 (prints line 100 of the file)

What is the difference between the cmp and diff commands?
cmp - Compares two files byte by byte and displays the first mismatch
diff - tells the changes to be made to make the files identical

tar - tape archiver. tar takes a bunch of files, and munges them into one .tar file. The files are often compressed with the gzip algorithm, and use the .tar.gz extension.
to create a tar: tar -cf archive.tar /directory
then to extract the archive to the current directory run: tar -xf archive.tar
to use gzip, just add a z to the options.
to create a tar.gz: tar -czf archive.tar.gz /dir
to extract it: tar -xzf archive.tar.gz
***************************************************
Dynamic flat file creation depending on the levels (vary with each trust)
Sorter - sort out the level
Transaction control - use TRANSACTION CONTROL transformation
filename port in target instance

IIF(Out_Level_old3 <> Out_Level_new3, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

Generating Flat File Targets By Transaction
You can generate a separate output file each time the Integration Service starts a new transaction. You can dynamically name each target flat file. To generate a separate output file for each transaction, add a FileName port to the flat file target definition. When you connect the FileName port in the mapping, the Integration Service creates a separate target file at each commit point. The Integration Service uses the FileName port value from the first row in each transaction to name the output file.
***************************************************
The Integration Service runs sessions and workflows. When you configure the Integration Service, you can specify where you want it to run.

IIF(NOT ISNULL(In_UniqueLegacyTreatmentFunctionIdentifier) AND v_TreatmentFnTypeCode_dom = 'CC_TRTFN', NULL, 'B61411(TreatmentFnTypeCode)')

IIF(ISNULL(v_UniqueLegacyTreatmentFunctionIdentifier_lkp), NULL, IIF(v_TreatmentFnTypeCode_dom = 'CC_TRTFN', NULL, 'B61411(TreatmentFnTypeCode)'))

IIF(ISNULL(v_UniqueLegacySpecialtyIdentifier_lkp), NULL, IIF(v_SpecTypeCode_dom = 'CC_CPSPE', NULL, 'B61413(v_SpecTypeCode)'))

IIF(ISNULL(v_UniqueLegacyCurrentSpecialtyIdentifier_lkp), NULL, IIF(v_CurrentSpecTypeCode_dom = 'CC_CPSPE', NULL, 'B61414(v_CurrentSpecTypeCode)'))
*************************************************
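The file-per-transaction behaviour described in these notes (sort by level, commit before each change of level, name the file from the first row of the transaction) can be mimicked in a few lines of Python to check the logic. This is an illustration only, not PowerCenter code: the function name, the row layout and the file naming scheme are assumptions, and the rows are expected to arrive already sorted by level.

```python
def split_by_transaction(rows, level_of, filename_of):
    """Group sorted rows into transactions, one output file per transaction.

    A new transaction starts whenever the level differs from the previous
    row, which mirrors
    IIF(old <> new, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION).
    Returns a list of (file name, rows) pairs; each file name is taken from
    the first row of its transaction.
    """
    transactions = []
    previous_level = object()  # sentinel that never equals a real level
    for row in rows:
        level = level_of(row)
        if level != previous_level:
            # Commit point: open a new "file" named from this first row.
            transactions.append((filename_of(row), []))
        transactions[-1][1].append(row)
        previous_level = level
    return transactions
```

A caller would pass something like `lambda r: r["level"]` for `level_of` and a function that builds the file name from the row for `filename_of`, then write each group of rows to its file.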

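The chmod arithmetic from the UNIX list (read = 4, write = 2, execute = 1, summed per owner/group/other) is easy to verify with a short Python sketch. The helper names here are invented for illustration; `stat.filemode` is the standard library call that renders a mode the way `ls -l` does.

```python
import stat


def octal_digit(read, write, execute):
    """Sum the permission bits: read = 4, write = 2, execute = 1."""
    return (4 if read else 0) + (2 if write else 0) + (1 if execute else 0)


def mode_string(owner, group, other):
    """Build a chmod argument such as '755' from three (r, w, x) tuples."""
    return "".join(str(octal_digit(*perms)) for perms in (owner, group, other))


def symbolic(mode_digits):
    """Render an octal mode for a regular file as ls -l would show it."""
    return stat.filemode(stat.S_IFREG | int(mode_digits, 8))
```

So `chmod 755` means rwx for the owner (4+2+1) and r-x for group and others (4+1).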
**********************************************************
DW
**************************************************************
Data Warehouse
Data warehouse is an architecture for organizing information system. It is a process for building decision support systems and knowledge management environment that supports both day-to-day tactical decision making and long-term business strategies.

Bill Inmon
"Subject-oriented, integrated, time variant, non-volatile collection of data in support of management's decision making process."

Ralph Kimball
"a copy of transaction data specifically structured for query and analysis."

Operational Data Store(ODS)
An operational data store is an integrated, subject-oriented, volatile(including update/deletion), current valued structure designed to serve operational users as they do high performance integrated processing.

OLTP(Online Transaction Processing)
OLTP is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transaction processing. OLTP systems are optimized for data entry operations. e.g. Order Entry, Banking, CRM, ERP applications etc

Data Warehouse                      Operational/Transactional
Subject oriented                    Application oriented
Summarized, refined                 Detailed
Represents value over time          Accurate as of moment
Supports managerial needs           Supports day-to-day needs
Read only data                      Can be updated
Batch processing                    Real time transactions
Completely different life cycle     Software Development Life Cycle
Analysis driven                     Transaction driven
Dimensional model                   Entity Relational Diagram
Large amount of data                Small amount of data
Relaxed availability                High availability
Flexible structure                  Static structure

Top-Down
Practitioner: Bill Inmon
Emphasize: Data Warehouse
Design: Enterprise based normalized model; marts use a subject-oriented dimensional model
Architect: Multi-tier comprised of staging area and dependent data marts
Data set: DW atomic level data; marts summary data

Bottom-Up
Practitioner: Ralph Kimball
Emphasize: Data Marts
Design: Dimensional model of data mart, consists star schema
Architect: Staging area and data marts
Data set: Contains both atomic and summary data

Hybrid
Practitioner: Many practitioners
Emphasize: DW and data marts
Design: Start enterprise and local models; one or more star schemas
Architect: High-level normalized enterprise model; initial marts
Data set: Populates marts with atomic and summary data via a non-persistent staging area.

Federated
Practitioner: Doug Hackney
Emphasize: Integrate heterogeneous BI environments
Design: An architecture of architectures; share dimensions, facts, rules, definitions across organizations
Architect: Reality of change in organizations and systems
Data set: Use of whatever means possible to integrate business needs

Business Intelligence (BI)
Business Intelligence is a set of business processes for collecting and analyzing business information. BI functions include trend analysis, aggregation of data, drilling down to complex levels of detail, slice-dice, data rotation for comparative viewing.

OLAP(On-Line Analytical Processing) Querying and presenting data from data warehouse exemplifying as multiple dimensions.
ROLAP(Relational OLAP) Applications and set of user interfaces that retrieve data from RDBMS and present as dimensional model.
MOLAP(Multidimensional OLAP) Applications, set of user interfaces and database technologies that have dimensional model.
DOLAP(Desktop OLAP) Designed for low-end, single user. Data is stored/downloaded on the desktop.
HOLAP(Hybrid OLAP) is a combination of all the above OLAP methodologies.
*****************************************************************
E. F. Codd(father of the relational database)'s 12 rules for OLAP
1. Multidimensional conceptual view. This supports EIS "slice-and-dice" operations and is usually required in financial modeling.
2. Transparency. OLAP systems should be part of an open system that supports heterogeneous data sources. Furthermore, the end user should not have to be concerned about the details of data access or conversions.
3. Accessibility. The OLAP should present the user with a single logical schema of the data.

4. Consistent reporting performance. Performance should not degrade as the number of dimensions in the model increases.
5. Client/server architecture. Requirement for open, modular systems.
6. Generic dimensionality. Not limited to 3-D and not biased toward any particular dimension. A function applied to one dimension should also be able to be applied to another.
7. Dynamic sparse-matrix handling. Related both to the idea of nulls in relational databases and to the notion of compressing large files, a sparse matrix is one in which not every cell contains data. OLAP systems should accommodate varying storage and data-handling options.
8. Multiuser support. OLAP systems, like EISes, need to support multiple concurrent users, including their individual views or slices of a common database.
9. Unrestricted cross-dimensional operations. Similar to rule 6 (generic dimensionality); all dimensions are created equal, and operations across data dimensions do not restrict relationships between cells.
10. Intuitive data manipulation. Ideally, users shouldn't have to use menus or perform complex multiple-step operations when an intuitive drag-and-drop action will do.
11. Flexible reporting. Save a tree. Users should be able to print just what they need, and any changes to the underlying financial model should be automatically reflected in reports.
12. Unlimited dimensional and aggregation levels. A serious tool should support at least 15, and preferably 20, dimensions.
********************************************************************
Dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3rd normal form model.

Dimensional Data Model: Dimensional data model is commonly used in data warehousing systems. This section describes this modeling technique, and the two common schema types, star schema and snowflake schema.

Dimension: A category of information. For example, the time dimension. eg: demography, date, product, customer, HR etc

Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.

Hierarchy: The specification of levels that represents relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year -> Quarter -> Month -> Day.

Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A date column, a store column, and a sales amount column.

Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields, one for the unique ID that identifies the quarter, and one or more additional fields that specifies how that particular quarter is represented on a report (for example, first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1"). Attributes are the non-key columns in the lookup tables.
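The fact table example from these notes (sales amount by store by day) and the Time hierarchy (Year, Quarter, Month, Day) can be illustrated with a small roll-up in Python. This is a sketch with made-up rows and hypothetical function names; the "Q1 2001" label format is the one mentioned for the Quarter lookup table.

```python
from collections import defaultdict
from datetime import date


def quarter_label(d):
    """Format a date's quarter as in the lookup-table example, e.g. 'Q1 2001'."""
    return "Q%d %d" % ((d.month - 1) // 3 + 1, d.year)


# One key function per level of the Time hierarchy.
LEVEL_KEYS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: "%d-%02d" % (d.year, d.month),
    "quarter": quarter_label,
    "year": lambda d: str(d.year),
}


def roll_up(fact_rows, level):
    """Sum sales amount by (time level, store).

    fact_rows holds (date, store, sales_amount) tuples, i.e. the three
    columns of a fact table at 'by store by day' granularity.
    """
    key_of = LEVEL_KEYS[level]
    totals = defaultdict(float)
    for day, store, amount in fact_rows:
        totals[(key_of(day), store)] += amount
    return dict(totals)
```

Because the fact rows are stored at the finest grain (day), the same rows can be rolled up to any higher level of the hierarchy without storing separate totals.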

In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy. For example, the Time Dimension that consists of 2 different hierarchies:
1. Year -> Month -> Day
2. Week -> Day

STAR SCHEMA
All measures in the fact table have the same level of granularity. A complex star can have more than one fact table.
A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.
A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other.
The main advantages of star schemas are that they:
Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
Provide highly optimized performance for typical star queries.
Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data-warehouse schema contains dimension tables.

The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a location dimension table in a star schema might be normalized into a location table and city table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. Figure above presents a graphical representation of a snowflake schema.

Fact Constellation Schema
This Schema is used mainly for the aggregate fact tables, or where we want to split a fact table for better comprehension. The split of fact table is done only when we want to focus on aggregation over few facts & dimensions.
********************************************************************
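A star query as described here (one fact table joined to each dimension table on a primary key to foreign key join, with no joins between the dimension tables) can be tried with Python's built-in sqlite3 module. The table names, column names and sample rows below are invented for illustration and are not taken from any real warehouse.

```python
import sqlite3


def build_star(conn):
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE fact_sales (
            date_id      INTEGER REFERENCES dim_date(date_id),
            store_id     INTEGER REFERENCES dim_store(store_id),
            sales_amount REAL
        );
        INSERT INTO dim_date  VALUES (1, 2001, 'Q1'), (2, 2001, 'Q2');
        INSERT INTO dim_store VALUES (10, 'Leeds'), (20, 'York');
        INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 20, 40.0), (2, 10, 60.0);
    """)


def sales_by_quarter_and_city(conn):
    """Star query: the fact table joins to each dimension, never dimension to dimension."""
    return conn.execute("""
        SELECT d.quarter, s.city, SUM(f.sales_amount)
        FROM fact_sales f
        JOIN dim_date  d ON d.date_id  = f.date_id
        JOIN dim_store s ON s.store_id = f.store_id
        GROUP BY d.quarter, s.city
        ORDER BY d.quarter, s.city
    """).fetchall()
```

In a snowflake version of the same model, `dim_store` would itself reference a separate city table, adding one more join to the query.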