Welcome to the World of “Cache”

The Hidden Agenda

a) Basics of Cache
   1) Memory Cache
   2) Where the Cache Files are Created
   3) Naming Conventions
   4) Cache Calculations
b) Advanced Cache
   1) Lookup Cache
   2) Aggregator Cache
   3) Joiner Cache
   4) Rank Cache

Let’s get to the Basics:
Cache is a combination of:
1) Index Cache: the server stores key values or condition values, used to index data at a faster rate.
2) Data Cache: the server stores output values.

Caching Storage Overview:

For Index Caches:
a) Aggregators store group-by values from the group-by ports.
b) Rank transformations store group-by values.
c) Joiners store index values for the master source (join condition columns).
d) Lookups store lookup condition information.

For Data Caches:
a) Aggregators store aggregate data based on the group-by ports (variable ports, output ports, non-group-by ports).
b) Rank transformations store row data based on the group-by ports (output rows other than the ranked column).
c) Joiners store the master table rows (output columns not in the join condition).
d) Lookups store lookup data that is not stored in the index cache.


Memory Cache:
• The server creates a memory cache based on the size specified in the session properties; you can set the size manually based on the calculations covered later in this deck.
• If the Integration Service requires more memory than the configured cache size, it pages to disk.
• Since paging to disk can slow session performance, try to configure the index and data cache sizes to store all the data in memory.
• If the Integration Service cannot allocate the configured amount of cache memory, it cannot initialize the session and the session fails.
• By default, the Integration Service allocates 1,000,000 bytes to the index cache and 2,000,000 bytes to the data cache for each transformation instance.

Where are the Cache Files Created?
• The Integration Service creates the index and data cache files by default in the Integration Service variable directory, $PMCacheDir.
• If you do not define $PMCacheDir, the Integration Service saves the files in the PMCache directory specified in the UNIX configuration file, or in the cache directory in the Windows registry.
• If the UNIX Integration Service does not find a directory there, it creates the index and data files in the installation directory. If the Integration Service on Windows does not find a directory there, it creates the files in the system directory.
• If a cache file handles more than 2 GB of data, the Integration Service creates multiple index and data files. When creating these files, the Integration Service appends a number to the end of the filename, such as PMAGG*.idx1 and PMAGG*.idx2. The number of index and data files is limited only by the amount of disk space available in the cache directory.

Three instances when the cache files exist even after session completion:
a) The session performs incremental aggregation.
b) You configure the Lookup transformation to use a persistent cache.
c) The session does not complete successfully.

Naming convention followed by the Informatica Server:
• [<Name Prefix> | <Prefix><session ID>_<transformation ID>]_[partition index].<suffix>[overflow index]
• For example, PMLKUP8_4_2.idx:
  PMLKUP -> transformation type is Lookup
  8 -> the session ID
  4 -> the transformation ID
  2 -> the partition index
  idx -> the suffix, identifying an index file

Cache File Name Components:
• Name Prefix: Cache file name prefix configured in the Lookup transformation.
• Prefix: Describes the type of transformation. Aggregator transformation is PMAGG, Rank transformation is PMAGG, Joiner transformation is PMJNR, Lookup transformation is PMLKUP.
• Session ID: Session instance ID number.
• Transformation ID: Transformation instance ID number.
• Partition Index: If the session contains more than one partition, this identifies the partition number. The partition index is zero-based, so the first partition has no partition index; partition index 2 indicates a cache file created in the third partition.
• Suffix: Identifies the type of file. Index file is .idx, data file is .dat.
• Overflow Index: If a cache file handles more than 2 GB of data, the Integration Service creates multiple index and data files and appends an overflow index to the filename, such as PMAGG*.idx1 and PMAGG*.idx2. The number of index and data files is limited by the amount of disk space available in the cache directory.
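To make the convention concrete, here is a small Python sketch (the helper name and defaults are my own, not part of Informatica) that assembles a cache file name from its components:

```python
def cache_file_name(prefix, session_id, transformation_id,
                    suffix, partition_index=0, overflow_index=None):
    """Build an Informatica-style cache file name, e.g. PMLKUP8_4_2.idx.

    The partition index is zero-based and the first partition carries no
    index in the name; an overflow index is appended only when a cache
    file exceeds 2 GB and spills into additional files.
    """
    name = f"{prefix}{session_id}_{transformation_id}"
    if partition_index:                 # partition 0 is omitted from the name
        name += f"_{partition_index}"
    name += f".{suffix}"
    if overflow_index is not None:      # e.g. PMAGG*.idx1, PMAGG*.idx2
        name += str(overflow_index)
    return name

# Lookup cache, session 8, transformation 4, third partition
print(cache_file_name("PMLKUP", 8, 4, "idx", partition_index=2))  # PMLKUP8_4_2.idx
```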

Cache Calculations
• Aggregator:
  Index size: (sum of column sizes in group-by ports + 17) * number of groups
  Data size: (sum of column sizes of output ports + 7) * number of groups
• Rank:
  Index size: (sum of column sizes in group-by ports + 17) * number of groups
  Data size: ((sum of column sizes of output ports + 10) * number of ranks + 20) * number of groups
• Joiner:
  Index size: (sum of master column sizes in the join condition + 16) * number of rows in master table
  Data size: (sum of master column sizes NOT in the join condition but on output ports + 8) * number of rows in master table
• Lookup:
  Index size: # rows in lookup table * [(Σ column size) + 16] * 2
  Data size: # rows in lookup table * [(Σ column size) + 8]
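These formulas are easy to script. Below is a minimal Python sketch (function and parameter names are my own); each function takes the summed column sizes in bytes and returns an (index cache, data cache) estimate:

```python
def aggregator_cache(group_by_cols, output_cols, n_groups):
    """Aggregator: (group-by cols + 17) and (output cols + 7) per group."""
    return (group_by_cols + 17) * n_groups, (output_cols + 7) * n_groups

def rank_cache(group_by_cols, output_cols, n_groups, n_ranks):
    """Rank: index sized like the Aggregator; data scales with # ranks."""
    index = (group_by_cols + 17) * n_groups
    data = ((output_cols + 10) * n_ranks + 20) * n_groups
    return index, data

def joiner_cache(join_cols, output_cols, n_master_rows):
    """Joiner: sized by master rows; join-condition vs. other output columns."""
    return (join_cols + 16) * n_master_rows, (output_cols + 8) * n_master_rows

def lookup_cache(condition_cols, output_cols, n_lookup_rows):
    """Lookup: maximum index cache, plus the data cache."""
    index = n_lookup_rows * (condition_cols + 16) * 2
    data = n_lookup_rows * (output_cols + 8)
    return index, data
```

For example, `lookup_cache(16, 32, 60_000)` reproduces the LKP_PROMOS worked example later in the deck: a 3,840,000-byte maximum index cache and a 2,400,000-byte data cache.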

Column Sizes by Datatype (bytes):

Datatype | Aggregator, Rank | Joiner, Lookup
Binary | precision + 2 | precision + 8, rounded to the nearest multiple of 8
Date/Time | 18 | 24
Decimal, high precision off (all precision) | 10 | 16
Decimal, high precision on (precision <= 18) | 18 | 24
Decimal, high precision on (precision > 18, <= 28) | 22 | 32
Decimal, high precision on (precision > 28) | 10 | 16
Decimal, high precision on (negative scale) | 10 | 16
Double | 10 | 16
Real | 10 | 16
Integer | 6 | 16
Small integer | 6 | 16
String | ASCII mode: precision + 3 | ASCII mode: precision + 9
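The fixed-size rows of this table can be encoded as a small Python helper (a sketch; the dictionary keys and function names are my own, and Binary is omitted because its size is precision-based):

```python
# (Aggregator/Rank size, Joiner/Lookup size) in bytes, from the table above.
COLUMN_SIZES = {
    "date/time":            (18, 24),
    "decimal_hp_off":       (10, 16),
    "decimal_hp_le18":      (18, 24),
    "decimal_hp_19_to_28":  (22, 32),
    "decimal_hp_gt28":      (10, 16),
    "decimal_hp_neg_scale": (10, 16),
    "double":               (10, 16),
    "real":                 (10, 16),
    "integer":              (6, 16),
    "small_integer":        (6, 16),
}

def column_size(datatype, transformation):
    """Byte size of one cached column for the given transformation type."""
    agg_rank, join_lookup = COLUMN_SIZES[datatype]
    return agg_rank if transformation in ("aggregator", "rank") else join_lookup

def string_size(precision, transformation):
    """ASCII mode: precision + 3 for Aggregator/Rank, precision + 9 for Joiner/Lookup."""
    return precision + (3 if transformation in ("aggregator", "rank") else 9)
```

This reproduces the sizes used in the worked examples later in the deck: `column_size("integer", "lookup")` gives the 16 bytes used for ITEM_ID, and `string_size(21, "rank")` gives the 24 bytes used for PRODUCT_CATEGORY.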

Lookup Caches Overview
• The Integration Service builds a cache in memory when it processes the first row of data in a cached Lookup transformation.
• It allocates memory for the cache based on the amount you configure in the transformation or session properties.
• The Integration Service stores condition values in the index cache and output values in the data cache.
• The Integration Service queries the cache for each row that enters the transformation.
• The Integration Service also creates cache files by default in $PMCacheDir. If the data does not fit in the memory cache, the Integration Service stores the overflow values in the cache files.
• When the session completes, the Integration Service releases cache memory and deletes the cache files unless you configure the Lookup transformation to use a persistent cache.

Types of Lookup Cache
When configuring a lookup cache, you can specify any of the following options:
• Persistent cache. You can save the lookup cache files and reuse them the next time the Integration Service processes a Lookup transformation configured to use the cache.
• Recache from source. If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
• Static cache. You can configure a static, or read-only, cache for any lookup source. By default, the Integration Service creates a static cache. It caches the lookup file or table and looks up values in the cache for each row that comes into the transformation. When the lookup condition is true, the Integration Service returns a value from the lookup cache. The Integration Service does not update the cache while it processes the Lookup transformation.
• Dynamic cache. If you want to cache the target table and insert new rows or update existing rows in the cache and the target, you can create a Lookup transformation that uses a dynamic cache. The Integration Service dynamically inserts or updates data in the lookup cache and passes data to the target table. For example, suppose your lookup table is your target table: when the Lookup uses a dynamic cache, it looks up values and, if there is no match, inserts the row into both the target and the lookup cache (hence "dynamic": the cache builds up as you go along); if there is a match, it updates the row in the target. A static cache, on the other hand, is not updated when you do a lookup. You cannot use a dynamic cache with a flat file lookup.
• Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping, and a named cache between transformations in the same or different mappings.


Calculating the Lookup Index Cache
• The lookup index cache holds data for the columns used in the lookup condition.
• For best session performance, specify the maximum lookup index cache size.
• The formula for calculating the minimum lookup index cache size is different from the one for the maximum size: the minimum size is independent of the number of source rows.

Calculating the Minimum Lookup Index Cache:
  200 * [(Σ column size) + 16]
  Columns -> columns in the lookup condition.

Calculating the Maximum Lookup Index Cache:
  # rows in lookup table * [(Σ column size) + 16] * 2
  Columns -> columns in the lookup condition.

Difference between Static and Dynamic Cache

Static cache:
• You cannot insert or update rows in the cache.
• The Integration Service returns a value from the lookup table or cache when the condition is true. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.
• You can use a relational or flat file lookup.

Dynamic cache:
• You can insert rows into the cache as you pass rows to the target.
• The Integration Service inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or the target table. You can pass these rows to the target table.
• You can use a relational lookup only.


Example:
• The Lookup transformation, LKP_PROMOS, looks up values based on ITEM_ID, and the lookup table contains 60,000 rows.
• It uses the following lookup condition: ITEM_ID = IN_ITEM_ID1
• The lookup condition uses one column, ITEM_ID: Integer -> column size 16.
• Use the following calculation to determine the minimum index cache requirements:
  200 * (16 + 16) = 6,400
• Use the following calculation to determine the maximum index cache requirements:
  60,000 * (16 + 16) * 2 = 3,840,000
• Therefore, this Lookup transformation requires an index cache size between 6,400 and 3,840,000 bytes.
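As a quick sanity check, the same arithmetic in Python (the column size and row count come from the example above):

```python
# LKP_PROMOS: ITEM_ID is an Integer (16 bytes for a Lookup), 60,000 lookup rows
col_size = 16
rows = 60_000

min_index_cache = 200 * (col_size + 16)        # minimum, independent of row count
max_index_cache = rows * (col_size + 16) * 2   # maximum, scales with lookup rows
print(min_index_cache, max_index_cache)        # 6400 3840000
```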

Calculating the Lookup Data Cache
• In a connected transformation, the data cache contains data for the connected output ports, not including ports used in the lookup condition. In an unconnected transformation, the data cache contains data from the return port.
• In the example, the connected output ports not in the lookup condition are:
  1) PROMOTION_ID: Integer -> 16
  2) DISCOUNT: Decimal -> 16
  Total column size = 32
• The lookup table has 60,000 rows.
• Use the following calculation to determine the minimum data cache requirements:
  60,000 * (32 + 8) = 2,400,000
• This Lookup transformation requires a data cache size of 2,400,000 bytes.
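The data cache arithmetic, checked in Python (sizes and row count taken from the example):

```python
# LKP_PROMOS data cache: PROMOTION_ID (16) + DISCOUNT (16), 60,000 lookup rows
col_size = 16 + 16
rows = 60_000

data_cache = rows * (col_size + 8)
print(data_cache)   # 2400000
```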


Aggregator Cache
• When the Integration Service runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation.
• If you use incremental aggregation, the Integration Service saves the cache files in the cache file directory.
• Note: The Integration Service uses memory to process an Aggregator transformation with sorted ports, but it does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.

Configuring the Session for Incremental Aggregation
Use the following guidelines when you configure the session for incremental aggregation:
• Verify the location where you want to store the aggregate files.
• Verify the incremental aggregation settings in the session properties. You can configure the session for incremental aggregation in the Performance settings on the Properties tab.
• You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating that the Integration Service will overwrite the existing cache, and a reminder to clear this option after running the session.
• Configure the session to write file names in the session log. If you want the Integration Service to write the incremental aggregation cache file names in the session log, configure the session with Verbose Init tracing.


Calculating the Aggregator Index Cache
• The index cache holds group information from the group-by ports:
  # groups * [(Σ column size) + 17]
  Columns -> group-by columns.
• In the example:
  STORE_ID: Integer -> 6
  ITEM: String -> 18
  Total column size = 6 + 18 = 24
• Assuming there are 72,000 groups:
  Minimum index cache: 72,000 * (24 + 17) = 2,952,000
  Maximum index cache: 2,952,000 * 2 = 5,904,000
• Therefore, this Aggregator transformation requires an index cache size between 2,952,000 and 5,904,000 bytes.
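A two-line Python check of the figures above (the column sizes and group count come from the example):

```python
# Aggregator index cache: STORE_ID (6) + ITEM (18), 72,000 groups
col_size = 6 + 18
groups = 72_000

min_index_cache = groups * (col_size + 17)
max_index_cache = min_index_cache * 2
print(min_index_cache, max_index_cache)   # 2952000 5904000
```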


Calculating the Aggregator Data Cache
• The data cache holds row data for variable ports and connected output ports. As a result, the data cache is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations.
• Use the following information to calculate the minimum aggregate data cache size:
  # groups * [(Σ column size) + 7]
  Column size includes:
  a) Non-group-by input/output ports.
  b) Local variable ports.
  c) Ports containing aggregate functions (multiply by three).
• In the example:
  ORDER_ID: Integer -> 6
  SALES_PER_STORE_ITEMS: Decimal -> 30 (10 * 3, a port containing an aggregate function)
  Total column size = 36
• The total number of groups, as calculated for the index cache size, is 72,000.
• Use the following calculation to determine the minimum data cache requirements:
  72,000 * (36 + 7) = 3,096,000
• This Aggregator transformation requires a data cache size of 3,096,000 bytes.
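Checking the data cache arithmetic in Python (values from the example, including the tripled aggregate-function port):

```python
# Aggregator data cache: ORDER_ID (6) + SALES_PER_STORE_ITEMS (10 * 3 = 30)
col_size = 6 + 10 * 3
groups = 72_000

data_cache = groups * (col_size + 7)
print(data_cache)   # 3096000
```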


Joiner Cache
• When using a joiner cache, the Integration Service first reads the data from the master source and builds the index and data caches from the master rows. After building the cache, the Integration Service performs the join based on the detail source data and the cache data.
• Index cache. The server creates the index cache as it reads the master source into the data cache, and uses the index cache to test the join condition. When it finds a match, it retrieves row values from the data cache. The Integration Service caches all master rows with a unique key in the index cache.
• Data cache. The Integration Service caches the master rows in the data cache that correspond to the rows in the index cache. The number of rows it stores in the data cache depends on the data. For example, if every master row contains a unique key, the Integration Service caches 100 master rows with unique keys in the index cache and stores 100 rows in the data cache. However, if the master data contains multiple rows with the same key, the Integration Service stores more than 100 rows in the data cache.

Joiner Index Cache Calculation
• The index cache holds rows from the master source that are used in the join condition:
  # master rows * [(Σ column size) + 16]
  Column size -> master columns in the join condition.
• In the example, it joins the sources ORDERS and PRODUCTS on ITEM_NO:
  ITEM_NO: Decimal(10) -> 16
• PRODUCTS is the master source and has 90,000 rows.
• Use the following calculation to determine the minimum index cache requirements:
  90,000 * (16 + 16) = 2,880,000
• Double the size to determine the maximum index cache requirements:
  2,880,000 * 2 = 5,760,000
• Therefore, this Joiner transformation requires an index cache size between 2,880,000 and 5,760,000 bytes.
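The same numbers, checked in Python (column size and master row count from the example):

```python
# Joiner index cache: ITEM_NO, Decimal(10) -> 16 bytes; PRODUCTS master, 90,000 rows
col_size = 16
master_rows = 90_000

min_index_cache = master_rows * (col_size + 16)
max_index_cache = min_index_cache * 2
print(min_index_cache, max_index_cache)   # 2880000 5760000
```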


Joiner Data Cache Calculation
• The data cache holds rows from the master source until the Integration Service joins the data:
  # master rows * [(Σ column size) + 8]
  Columns -> master columns not in the join condition and used for output.
• In the example, the connected output ports for JNR_ORDERS_PRODUCTS are:
  ITEM_NAME: String -> 32
  PRODUCT_CATEGORY: Decimal -> 30
  Total column size = 62
• The master source has 90,000 rows.
• Use the following calculation to determine the minimum data cache requirements:
  90,000 * (62 + 8) = 6,300,000
• This Joiner transformation requires a data cache size of 6,300,000 bytes.
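And the data cache arithmetic in Python (values from the example):

```python
# Joiner data cache: ITEM_NAME (32) + PRODUCT_CATEGORY (30), 90,000 master rows
col_size = 32 + 30
master_rows = 90_000

data_cache = master_rows * (col_size + 8)
print(data_cache)   # 6300000
```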


Rank Caches
• When the Integration Service runs a session with a Rank transformation, it compares each input row with rows in the data cache. If the input row out-ranks a stored row, the Integration Service replaces the stored row with the input row.
• If the Rank transformation is configured to rank across multiple groups, the Integration Service ranks incrementally for each group it finds.
• For example, you configure a Rank transformation to find the top three sales. The Integration Service reads the following input data:
  SALES
  10,000
  12,210
  5,000
  2,455
  6,324
• The Integration Service caches the first three rows (10,000, 12,210, and 5,000). When it reads the next row (2,455), it compares it to the cached values. Since the row is lower in rank than the cached rows, it discards the row with 2,455. The next row (6,324), however, is higher in rank than one of the cached rows, so the Integration Service replaces the cached row (5,000) with the higher-ranked input row.
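The top-three behavior described above can be sketched with a bounded min-heap (a hypothetical illustration of the caching logic, not Informatica's actual implementation):

```python
import heapq

def top_n_sales(rows, n=3):
    """Keep only the n highest values seen so far, replacing the lowest
    cached value whenever an incoming row out-ranks it."""
    cache = []
    for value in rows:
        if len(cache) < n:
            heapq.heappush(cache, value)     # cache the first n rows
        elif value > cache[0]:
            heapq.heapreplace(cache, value)  # evict the lowest-ranked cached row
        # otherwise the row is discarded, like the 2,455 row above
    return sorted(cache, reverse=True)

print(top_n_sales([10_000, 12_210, 5_000, 2_455, 6_324]))  # [12210, 10000, 6324]
```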

Calculating the Rank Index Cache
• The index cache holds group information from the group-by ports.
• Use the following information to calculate the minimum rank index cache size:
  Rank index calculation: # groups * [(Σ column size) + 17]
  Columns -> group-by columns.
• PRODUCT_CATEGORY (String(21)) -> column size 24
• There are 10,000 product categories, so the total number of groups is 10,000.
• Use the following calculation to determine the minimum index cache requirements:
  10,000 * (24 + 17) = 410,000
• Double the size to determine the maximum index cache requirements:
  410,000 * 2 = 820,000
• Therefore, this Rank transformation requires an index cache size between 410,000 and 820,000 bytes.
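Checked in Python (the 24-byte group-by column and 10,000 groups come from the example):

```python
# Rank index cache: PRODUCT_CATEGORY, String(21) -> 21 + 3 = 24 bytes; 10,000 groups
col_size = 21 + 3
groups = 10_000

min_index_cache = groups * (col_size + 17)
max_index_cache = min_index_cache * 2
print(min_index_cache, max_index_cache)   # 410000 820000
```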


Calculating the Rank Data Cache
• The data cache size is proportional to the number of ranks. It holds row data until the Integration Service completes the ranking, and it is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations.
• Use the following information to calculate the minimum rank data cache size:
  # groups * [(# ranks * (Σ column size + 10)) + 20]
• In the example:
  ITEM_NO: Decimal(10) -> 10
  ITEM_NAME: String(23) -> 26
  PRICE: Decimal(14) -> 10
  Total column size = 46
• RNK_TOPTEN ranks by price. The number of groups is 10,000, and the number of ranks is 10.
• Use the following calculation to determine the minimum data cache requirements:
  10,000 * [(10 * (46 + 10)) + 20] = 5,800,000
• This Rank transformation requires a data cache size of 5,800,000 bytes.
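Finally, the rank data cache arithmetic in Python (column sizes, group count, and rank count from the example):

```python
# Rank data cache: ITEM_NO (10) + ITEM_NAME (26) + PRICE (10); top 10 in 10,000 groups
col_size = 10 + 26 + 10
groups, ranks = 10_000, 10

data_cache = groups * (ranks * (col_size + 10) + 20)
print(data_cache)   # 5800000
```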