
Supply Chain Intelligence
Performance Tuning Guide 2013

Copyright © 2013 Manhattan Associates, Inc. All rights reserved.

Copyright and Disclaimer


Copyright © 2000 - 2013 Manhattan Associates, Inc. All rights reserved.

This documentation, as well as the software described in it, is furnished under license and may be used or copied
only in accordance with the terms of such license. The information in this documentation is furnished for informational
use only, is subject to change without notice, and should not be construed as a commitment by Manhattan
Associates, Inc. (“Manhattan”). No third party patent liability is assumed with respect to the use of the information
contained herein. While every precaution has been taken in the preparation of this documentation, Manhattan
assumes no responsibility for errors or omissions.

EXCEPT WHERE EXPRESSLY PROVIDED OTHERWISE, ALL CONTENT, MATERIALS AND INFORMATION,
ARE PROVIDED ON AN "AS IS" AND "AS AVAILABLE" BASIS. MANHATTAN EXPRESSLY DISCLAIMS ALL
WARRANTIES OF ANY KIND, WHETHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-
INFRINGEMENT.

Except as permitted by license, no part of this documentation may be reproduced, stored in a retrieval system, or
transmitted, in any form by any means, electronic, mechanical, recording, or otherwise, without the prior written
permission of Manhattan Associates.

Manhattan Associates is a registered trademark of Manhattan Associates, Inc.

All other brands, products, or services are trademarks, registered trademarks, or service marks of their respective
companies or organizations.

Contact Address:

Manhattan Associates, Inc.


2300 Windy Ridge Parkway,
Atlanta, Georgia 30339

http://www.manh.com/

Region: Americas
Telephone: +1 877.756.7435 (U.S. and Canada only); +1 404.965.4025
Email: callcenter@manh.com

Region: Asia
Telephone: (00) 800.988.0885 (China only); +86 21 3311 3499
Email: asiacustomersupport@manh.com

Region: Europe, Middle East, and Africa
Telephone: +44 (0) 1344 318400 (UK); +31 (0)30 214 3400 (NL)
Email: emeacustomersupport@manh.com

Region: Australia
Telephone: +61 1300787050
Email: aucustomersupport2@manh.com

Unsure whom to contact? Call +1 404.965.4025 and you will be routed to the appropriate support group.




Table of Contents

Overview
Cognos BI Server Optimization
  Set the HTTP config to cache static items
  Optimize CPU hardware configuration to support Cognos reports running in parallel
Governors Settings
  Set Governors
  Governors Settings detail
ETL Optimization
  Extraction
  Transformation
  Loading
Cube Optimization
  Ulimit Settings
  Enabling Parallelized Cube Processing
  Set maximum parallel processes to be run at a time
  Set Auto Summarization
  Periodically Clean up Models
  Time based partitioning techniques using cube groups
  Correct Indexing strategy based on cube queries
  Other settings


Overview
The objective of this guide is to provide performance tuning guidelines for the following SCI components:

• Cognos BI Server Optimization
• ETL Optimization
• Cube Optimization

Cognos BI Server Optimization


Set the HTTP config to cache static items.

• Copy the content below to httpd.conf, taking a backup of the existing file first.

# This is for the mod_expires settings. We are applying them system wide.

# Turn on Expires and set the default expiry to 3 days
ExpiresActive On
ExpiresDefault "now plus 3 days"

# Set up caching on media files for 1 month
<FilesMatch "\.(ico|gif|jpg|jpeg|png|flv|swf|mov|mp3|wmv|ppt)$">
ExpiresDefault "now plus 1 month"
Header append Cache-Control "public"
</FilesMatch>

# Set up 2 hour caching on commonly updated files
<FilesMatch "\.(xml|txt|html|js|css)$">
ExpiresDefault "now plus 2 hours"
Header append Cache-Control "private, must-revalidate"
</FilesMatch>

# PDF files will be matched here, cache for 12 hours.
# Clients should revalidate with the server to make sure
# they have the latest copy. Clients behind HTTP proxies
# will cache at the proxy; the proxy revalidates on each serve.
<FilesMatch "\.(pdf)$">
ExpiresDefault "now plus 12 hours"
Header append Cache-Control "public, must-revalidate, proxy-revalidate"
</FilesMatch>

# Force no caching for dynamic files
<FilesMatch "\.(php|cgi|pl)$">
ExpiresDefault A0
Header set Cache-Control "no-store, no-cache, must-revalidate, max-age=0"
Header set Pragma "no-cache"
</FilesMatch>

• Edit httpd.conf and remove the "#" from these lines (to enable mod_expires and mod_headers):

#LoadModule expires_module modules/mod_expires.so
#LoadModule headers_module modules/mod_headers.so

• Restart the Cognos and IBM HTTP web server services


Optimize CPU hardware configuration to support Cognos reports running in parallel.

• Navigate to the Tuning folder: IBM Cognos Administration > System > Servers (select a server) > click the arrow next to the dispatcher for the selected server > select "Set properties" > select the "Settings" tab > select category "Tuning".

• Enter the following values based on the CPU configuration.

Report Service

Setting: Maximum number of processes for the Interactive report service
Default Value: 2
Suggested Starting Value: For each server: 2 * # of CPUs
Calculated Value for ABC: 8
Reason/Comments: This setting dictates the number of BIBus processes that will be spawned to handle interactive processing activity. It must be considered in combination with the Batch report service settings. Overlap in interactive and scheduled activity may warrant lowering this threshold to accommodate both processing activities.

Setting: Number of high affinity connections for the Interactive report service
Default Value: 1
Suggested Starting Value: For each BIBus process: 1
Calculated Value for ABC: 8
Reason/Comments: This setting indicates the number of threads available per interactive report server (BIBus process) to handle high affinity requests. It must be considered along with the low affinity connections setting.

Setting: Number of low affinity connections for the Interactive report service
Default Value: 4
Suggested Starting Value: For each BIBus process: 2
Calculated Value for ABC: 16
Reason/Comments: This setting indicates the number of threads available per interactive report server (BIBus process) to handle low affinity requests. It must be considered along with the high affinity connections setting.

Batch Report Service

Setting: Maximum number of processes for the batch report service
Default Value: 2
Suggested Starting Value: For each server: 1 * # of CPUs
Calculated Value for ABC: 4
Reason/Comments: This setting dictates the number of BIBus processes that will be spawned to handle scheduled processing activity. It must be considered in conjunction with the Report service settings. Overlap in interactive and scheduled activity may warrant lowering this threshold to accommodate both processing activities.

Setting: Number of high affinity connections for the batch report service
Default Value: 1
Suggested Starting Value: For each BIBus process: 1 (just leave the default)
Calculated Value for ABC: 1
Reason/Comments: This setting is not applicable to scheduled activity processing. Current scheduling functionality does not produce any high affinity requests.

Setting: Number of low affinity connections for the batch report service
Default Value: 2
Suggested Starting Value: For each BIBus process: 2
Calculated Value for ABC: 4
Reason/Comments: This setting indicates the number of threads available per batch report server (BIBus process) to handle low affinity requests. It must be considered along with the high affinity connections setting.


Governors Settings
Set Governors
Use governors to reduce system resource requirements and improve performance. You set governors
before you create packages to ensure the metadata in the package contains the specified limits. All
packages that are subsequently published use the new settings.
The governor settings that take precedence are the ones that apply to the model that is currently open
(whether it is a parent model or a child model).
In a new project the governors do not have values defined in the model. You must open
the Governors window and change the settings if necessary. When you save the values in
the Governors window by clicking OK, the values for the governors are set. You can also set
governors in Report Studio. The governor settings in Report Studio override the governor settings in the
model.

Based on our test results, we recommend the governor settings detailed in the following section.


Governors Settings detail


Maximum Number of Report Tables

You can control the number of tables that a user can retrieve in a query or report. When a table is
retrieved, it is counted each time it appears in the query or report. The limit is not the number of unique
tables. If the query or report exceeds the limit set for the number of tables, an error message appears and
the query or report is shown with no data.
A setting of zero (0) means no limit is set.
Note: This governor is not used in dynamic query mode.

Maximum Number of Retrieved Rows

You can set data retrieval limits by controlling the number of rows that are returned in a query or report.
Rows are counted as they are retrieved.
When you run a report and the data retrieval limit is exceeded, an error message appears and the query
or report is shown with no data.
You can also use this governor to set limits to the data retrieved in a query subject test or the report
design mode.
A setting of zero (0) means no limit is set.
If you externalize a query subject, this setting is ignored when you publish the model.
Note: This governor is not used in dynamic query mode.

Query Execution Time Limit

You can limit the time that a query can take. An error message appears when the preset number of
seconds is reached.
A setting of zero (0) means no limit is set.
Note: This governor is not used in dynamic query mode.

Large Text Item Limit

You can control the character length of BLOBs (binary large objects) that a user can retrieve in a query or
report. When the character length of the BLOB exceeds the set limit, an error message appears, and the
query or report is shown with no data.



A setting of zero (0) means no limit is set.

Outer Joins

You can control whether outer joins can be used in your query or report. An outer join retrieves all rows in
one table, even if there is no matching row in another table. This type of join can produce very large,
resource-intensive queries and reports.
Governors are set to deny outer joins by default. For example, outer joins are not automatically generated
when you test a query item in Framework Manager.


If you keep the setting as Deny, you are notified only if you create a relationship in the Diagram tab that
includes outer joins. You are not notified if you create a relationship in a data source query subject that
includes outer joins.
If you set the governor to Allow, dimension to fact relationships are changed from inner joins to outer
joins.
The outer joins governor does not apply in these circumstances:

• SQL that is generated by other means. If you set this governor to Deny, it does not apply to the permanent SQL found in a data source query subject, whether the SQL was generated on import, manually entered, or based on existing objects.

• Framework Manager needs to generate an outer join to create a stitched query. A stitched query is
a query that locally combines the results of two or more sub-queries by using a locally processed
outer join.

Note: This governor is not applicable for SAP BW data sources.

Note: This governor is not used in dynamic query mode.

Cross-Product Joins

You can control whether cross-product joins can be used in your query or report. A cross-product join
retrieves data from tables without joins. This type of join can take a long time to retrieve data.

The default value for this governor is Deny. Select Allow to allow cross-product joins.

Shortcut Processing

You can control how shortcuts are processed by IBM Cognos software.

When you open a model from a previous release, the Shortcut Processing governor is set to Automatic. With the Automatic setting, a shortcut that exists in the same folder as its target behaves as an alias, or independent instance, whereas a shortcut existing elsewhere in the model behaves as a reference to the original. When you create a new model, the Shortcut Processing governor is always set to Explicit.

If you set the governor to Explicit, the shortcut behavior is taken from the Treat As property. If the Shortcut Processing governor is set to Automatic, we recommend that you verify the model and, when repairing, change the governor to Explicit. This changes all shortcuts to the correct value from the Treat As property based on the rules followed by the Automatic setting.

The Shortcut Processing governor takes priority over the Treat As property. For example, if the governor is set to Automatic, the behavior of the shortcut is determined by the location of the shortcut relative to its target, regardless of what the Treat As property is set to.

SQL Join Syntax

You can control how SQL is generated for inner joins in a model by selecting one of the following settings:

• If the governor is set to Server determined, the CQEConfig.xml file is used to determine the governor value. If there is no active CQEConfig.xml file or no parameter entry for the governor in the CQEConfig.xml file, then the Implicit setting is used.

• The Implicit setting uses the where clause.

For example,

SELECT publishers.name, publishers.id, books.title
FROM publishers, books
WHERE publishers.id = books.publisher_id
ORDER BY publishers.name, books.title;

• The Explicit setting uses the from clause with the keywords inner join and an on predicate.

For example,

SELECT publishers.name, publishers.id, books.title
FROM publishers INNER JOIN books ON publishers.id = books.publisher_id
ORDER BY publishers.name, books.title;

You can set the join type on the query property in Report Studio to override the value of this
governor.

Regardless of the setting you use for this governor, the Explicit setting is used for left outer
joins, right outer joins, and full outer joins.

This governor has no impact on typed-in SQL.

Grouping of Measure Attributes (query items)

If the governor is set to Server determined, the CQEConfig.xml file is used to determine the governor
value. If there is no active CQEConfig.xml file or no parameter entry for the governor in the
CQEConfig.xml file, then the Disabled setting is used.

The Disabled setting prevents aggregation of the measure for the attributes. This is the default behavior.
For example,
select
  Product.Product_line_code as Product_line_code,
  Order_method.Order_method_code as Order_method_code, //measure attribute
  XSUM(Sales.Quantity for Product.Product_line_code) as Quantity //aggregated measure
from ...

The Enabled setting allows aggregation of the measure for the attributes.

Note: This is the default behavior for IBM Cognos Framework Manager versions prior to 8.3.

For example,

select
  Product.Product_line_code as Product_line_code,
  Order_method.Order_method_code as Order_method_code, //measure attribute
  XSUM(Sales.Quantity for Order_method.Order_method_code, Product.Product_line_code) as Quantity //aggregated measure
from ...

SQL Generation for Level Attributes

You can control the use of the minimum aggregate in SQL generated for attributes of a level (member
caption).

If the governor is set to Server determined, the CQEConfig.xml file is used to determine the governor
value. If there is no active CQEConfig.xml file or no parameter entry for the governor in the
CQEConfig.xml file, then the Minimum setting is used.

The Minimum setting generates the minimum aggregate for the attribute. This setting ensures data
integrity if there is a possibility of duplicate records. For example,
select
  XMIN(Product.Product_line for Product.Product_line_code) as Product_line, //level attribute
  Product.Product_line_code as Product_line_code
from
  (...) Product

The Group By setting adds the attributes of the level in the group by clause, with no aggregation for the attribute. The distinct clause indicates a group by on all items in the projection list. The Group By setting is recommended if the data has no duplicate records. It can enhance the use of materialized views and may result in improved performance. For example,

select distinct
  Product.Product_line as Product_line, //level attribute
  Product.Product_line_code as Product_line_code
from
  (...) Product

Note: This governor is not used in dynamic query mode.

SQL Generation for Determinant Attributes

You can control the use of the minimum aggregate in SQL generated for attributes of a determinant with
the group by property enabled.

If the governor is set to Server determined, the CQEConfig.xml file is used to determine the governor
value. If there is no active CQEConfig.xml file or no parameter entry for the governor in the
CQEConfig.xml file, then the Minimum setting is used.


The Minimum setting generates the minimum aggregate for the attribute. This setting ensures data
integrity if there is a possibility of duplicate records. For example,
select
  PRODUCT_LINE.PRODUCT_LINE_CODE as Product_line_code,
  XMIN(PRODUCT_LINE.PRODUCT_LINE_EN for PRODUCT_LINE.PRODUCT_LINE_CODE) as Product_line //attribute
from
  great_outdoors_sales..GOSALES.PRODUCT_LINE PRODUCT_LINE
group by
  PRODUCT_LINE.PRODUCT_LINE_CODE //key

The Group By setting adds the attributes of the determinants in the group by clause with no aggregation
for the attribute. This setting is recommended if the data has no duplicate records. It can enhance the use
of materialized views and may result in improved performance. For example,
select
  PRODUCT_LINE.PRODUCT_LINE_CODE as Product_line_code,
  PRODUCT_LINE.PRODUCT_LINE_EN as Product_line //attribute
from
  great_outdoors_sales..GOSALES.PRODUCT_LINE PRODUCT_LINE
group by
  PRODUCT_LINE.PRODUCT_LINE_CODE, //key
  PRODUCT_LINE.PRODUCT_LINE_EN //attribute

SQL Parameter Syntax

This governor specifies whether generated SQL uses parameter markers or literal values.

If the governor is set to Server determined, the CQEConfig.xml file is used to determine the governor
value. If there is no active CQEConfig.xml file or no parameter entry for the governor in the
CQEConfig.xml file, then the Marker setting is used.

You can override the value of this governor in Report Studio.



Dynamic SQL applications have the ability to prepare statements which include markers in the text which
denote that the value will be provided later. This is most efficient when the same query is used many
times with different values. The technique reduces the number of times a database has to hard parse an
SQL statement and it increases the re-use of cached statements. However, when queries navigate larger
amounts of data with more complex statements, they have a lower chance of matching other queries. In
this case, the use of literal values instead of markers may result in improved performance.

Allow Enhanced Model Portability at Run Time

This governor is selected upon initial upgrade of a Cognos ReportNet® 1.x model. It prevents rigid
enforcement of data types so that an IBM Cognos model can function as a ReportNet® 1.x model until
you update the data types in the metadata. After you have verified that the model has been upgraded
successfully, clear this governor.

Other than for initial upgrade, there are limited uses for this governor. For example, you have created a
model for use with a data source and you want to run it against a different data source. The new data
source must be structurally similar to the original data source, and the database schema must be the
same between the two data sources. If you select this governor, IBM Cognos BI retrieves metadata from
the data source and caches it instead of using the metadata already cached in the model. When you have
completed modifying and testing the model against the new data source, clear this governor.

If you do not use this governor, you must ensure that the following metadata is the same in the original
and new data sources:

• collation sequence name

• collation level

• character set

• nullability

• precision

• scale

• column length

• data type

Allow Usage of Local Cache

Select this governor to specify that all reports based on this model will use cached data. For a new model,
this governor is enabled by default.

This setting affects all reports that use the model. Use Report Studio if you want a report to use a different
setting than the model.

Allow Dynamic Generation of Dimension Information

This governor is selected only upon initial upgrade of a ReportNet® 1.x model. This governor allows
consistent behavior with ReportNet® 1.x by deriving a form of dimension information from the
relationships, key information, and index information in the data source.

Use With Clause When Generating SQL

You can choose to use the With clause with IBM Cognos SQL if your data source supports the With clause.

The With clause is turned on for models created in IBM Cognos BI. For upgraded models, it is turned off unless it was explicitly turned on in the Cognos ReportNet® model prior to upgrading.

Suppress Null Values for SAP BW Data Sources

You can control whether or not nulls are suppressed by any report or analysis that uses the published
package. The governor is also applied to test results during the current Framework Manager session. It is
supported for SAP BW data sources only.

Some queries can be very large because null values are not filtered out. Null suppression removes a row
or column for which all of the values in the row or column are null (empty). Null suppression is performed by SAP BW. This reduces the amount of data transferred to the IBM Cognos client products and improves performance.

By default, null values are suppressed. If you clear this governor, null values are not suppressed.

There is a property called Suppress in Report Studio that overrides this governor. If
the Suppress property is set to None, null values are included in the result set even if the governor is set
to suppress null values.

Note: This governor is not applied when creating CSV files; therefore, CSV files include null
values if they exist in the data.

Publish Entire Model When Processing

A published package includes the model objects selected when the package was created. In addition,
those model objects are analyzed in order to identify and include dependent objects in the package.

In a complex or very large model, the analysis can take considerable time. To shorten the publish time,
set this governor to skip this analysis step and have the entire model written to the content store. The
resulting package may be larger because the entire model is published instead of only required objects,
however the time required to publish should be reduced.

Maximum external data sources that can be merged with a model

To use external data, report users import their data into an existing package. This governor controls the
number of external data files that can be imported.

The default is 1.

For more information about external data sources, see the IBM Cognos Report Studio User Guide.

Maximum external data file size (KB)

To use external data, report users import their data into an existing package. This governor controls the
size of each external data file.

By default, the maximum file size that report users can import is 2560 KB.

For more information about external data sources, see the IBM Cognos Report Studio User Guide.

Maximum external data row count

To use external data, report users import their data into an existing package. This governor controls the number of rows that can exist in each external data file.

By default, the maximum number of rows that report users can import is 20000.

For more information about external data sources, see the IBM Cognos Report Studio User Guide.


ETL Optimization
This section describes how to optimize all three sub-processes of the ETL: Extraction, Transformation, and Loading.

Extraction
SCI uses the Import builds to perform extraction. Modifications to the Import builds can improve the
extraction processing times.

Queries
• Native queries are always preferred over Cognos SQL for speed, unless a database-specific function is used.
• Use an Oracle SQL hint for parallel record fetches. There are two versions of this:
1. Let Oracle decide the number of threads to be created. Use the following query:

SELECT /*+ PARALLEL */ * FROM <table_name>

OR
2. Assign the number of parallel threads to be created. Use the following query:

SELECT /*+ PARALLEL(<table_name>, 3) */ * FROM <table_name>

Merge Technique

• The Import builds can pull data from multiple sources.


• Use the Data Manager merge technique with certain optimizations to pull data more efficiently. For more detail, refer to the section on Data Merging below.

Data Build Dos and Don’ts


• Avoid derivations in the data stream.
• Create calculated columns in the native query itself, as sketched below.
• Use Oracle SQL Loader. Refer to the Oracle SQL Loader section below.
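
As an illustrative sketch of the second point (table and column names are hypothetical), a derived value is computed in the native source query rather than as a Data Manager derivation:

-- Compute the line amount in the source SQL instead of a data stream derivation
SELECT
  order_id,
  line_number,
  quantity,
  unit_price,
  quantity * unit_price AS line_amount -- calculated column pushed to the database
FROM stg_order_line;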

Transformation
Transformation, also referred to as Staging, is the ETL process of preparing data to be loaded into the
data warehouse.

This section is divided into three parts:

• General guidelines for data manager
• Dimension builds
• Fact builds

General guidelines for data manager


• In the job streams, replace SQL nodes with procedure nodes that invoke a stored procedure which runs the query. Stored procedures run faster in consecutive runs because they have a query plan. A minimal sketch follows this list.
• Don't add too many stage tables; try to achieve the logic using fewer builds which use grouping and aggregation.
• Follow the correct indexing strategy for queries and data movement in Data Manager. For more information, see the section Database guidelines for ETL below.
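
A minimal sketch of the first point, assuming a hypothetical stage table and purge rule; the procedure wraps the statement that previously lived in the SQL node so a Data Manager procedure node can invoke it:

-- Hypothetical stored procedure invoked from a Data Manager procedure node
CREATE OR REPLACE PROCEDURE prc_purge_stage_orders AS
BEGIN
  -- The statement that previously lived in the SQL node
  DELETE FROM stg_order_line
  WHERE load_dttm < SYSDATE - 30;
  COMMIT;
END prc_purge_stage_orders;
/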

Normal dimensions & large data volume

Normal dimension builds create records using the builds provided by data manager. Data manager is
designed to support source dimensional data from configuration tables. Generally, SCI dimension tables
contain data from the source system’s configuration tables, and not from its transactional tables (which
have a significantly larger volume of data).

Because it’s designed to support configuration data, data manager loads the entire data into memory.
After the data is loaded, it processes the records.

If the volume of data is too large, a dimension build can fail with memory errors. Memory errors can be managed in the following ways:

• Switch to a degenerate dimension. If the customer wants to see the measures based on this dimension, move it to the fact build and populate it as an attribute (degenerate dimension).
• Reduce the record counts.
• Build the dimension using a fact build.

Dependent dimensions & large data volume

A dependent dimension is a slowly changing dimension. The SCI ETL uses fact builds to implement
dimensions to accommodate the following scenarios:


• A subset of dimensions, like Pickticket Line and Outbound LPN Line, has a large volume of data
which can’t be handled with normal dimension builds
• To track slowly changing attributes, SCI creates dimensions using multi-stage fact builds.

Identify degenerate dimensions


When a source column generates a large volume of data, it should be created as a degenerate dimension. Review normal dimensions with substantially more volume than other dimensions and determine whether they should be degenerate dimensions.

If the data is coming from a transaction table with a large volume of data, the best approach is to convert
it into a FACT attribute, which is called a degenerate dimension.

There are two scenarios where there could be performance issues with dependent dimensions:

1. The lookup to the fact table runs out of memory. This can be addressed in the following ways:

• Dimension breaking (see the following subsection).
• Reduce the volume of data.
• Use post-build stored procedures to update the rows (see the sketch after this list):
o Mark the surrogate keys to be populated as "-1".
o Use a procedure to join the fact table and the dimension table on the business_id and then update the column (with value -1) with the actual value of the dimension primary key.

2. The multi-level fact builds which populate the dimensions are themselves slow:

o Indexes: Check the joins between the stage tables and the DW tables and add indexes per the indexing strategy in the section Database guidelines for ETL below.
o Merge Technique: If the join query is not working for a very large volume of data, use the Data Manager merge technique for data builds explained in the section Data Merging.
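
A minimal sketch of the post-build surrogate key update, assuming hypothetical fact and dimension tables joined on a business id:

-- Resolve surrogate keys that the build left as -1
UPDATE fact_order f
SET f.item_dim_key = (
  SELECT d.dim_key
  FROM dim_item d
  WHERE d.item_business_id = f.item_business_id
)
WHERE f.item_dim_key = -1;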

Dimension Breaking

One method to manage large data volume is dimension breaking. Dimension breaking processes data in
chunks by splitting the “sorted” data based on a particular column. This is possible as the dimension build
pulls the data into memory.



Prerequisite: Determine the Dimension to break on

Selecting the dimensions on which to break requires a mixture of analysis, experience, and the evaluation
of different options. A dimension is suitable for breaking if it meets the following conditions:

• It is not aggregated, or it is aggregated through few levels, with relatively few aggregated members
generated.

• It has a large domain that can be broken into smaller segments

• It is a balanced hierarchy, having an even distribution of members within parents


Process

1. Choose the grain of the dimension build. The grain is the column that uniquely identifies a dimension record. Sort the data based on this column in the query.
2. Double-click the Merge and Breaking tab on the data build transformation model.
3. The Build Properties window opens. Navigate to the "Breaks" tab.
a. Add the grain column to the "Break on:" list.
b. You can add multiple columns where there is a composite key.
c. Select the checkbox "Perform Break Processing Every:".
d. Enter 75% in the text field.
Note: This value can vary based on performance testing.


Stage Fact builds


The stage fact build occurs during a fact build. During the stage fact build, SCI performs the following processes:

• Runs the transform query against multiple transactional import tables.
• Populates the dimension keys.
• Adds calculated derivations from dimension lookups.
• Converts column types to formats uniform across the framework.


Specify Domain Size

A Domain Size defines internal transaction sizes when a fact build is executed. It applies for both
dimension elements and derived dimension elements. By default, Data Manager estimates the domain
size by referring to the reference structure associated with the dimension or derived dimension.

The system defined domain size can be inaccurate when the following conditions exist:

• Data includes unmatched members (data without an associated reference member)


• Reference domain is a large overestimate of the actual domain.

In these scenarios, you can provide a better custom domain size manually.

Note: To merge dimension elements that are not associated with a reference dimension, you must
set the domain size manually. Set the domain size to be greater than the maximum number of
distinct domain members for the dimension.

Process

1. Click the required dimension element or derived dimension element.
2. From the Edit menu, click Properties.
3. Click the Domain tab.
4. In the Domain size box, type the required domain size. This value should be greater than the maximum number of distinct values for the dimension.
5. In the Domain type box, click the type of domain to use. Remember, for delivery of an aggregation exception for the dimension, you must select Reference domain.
6. Click OK.

Note: This new domain size is applied when the build is next executed. If the domain size is too
small, the build process will generate a message to increase the size.

For more information on dynamic domains, refer to the IBM Cognos Data Manager documentation on domains.



Loading
During the loading process, SCI compares the stage fact build data to the fact table data in the Data Mart.
SCI uses the normal fact build to pull the data from the stage build to the final fact table. The load build
includes two activities: data loading and data merging.

Data Loading

If data is taking too long to load into the FACT table, there might be a significant amount of data
accumulated in the FACT Table. To decrease processing time:

• Check the indexes on the FACT table. Refer to the section Database guidelines for ETL below.
• Use Oracle SQL*Loader to bulk load the data into the fact table. Refer to the section on Oracle SQL Loader.

Data Merging

If the join between the stage table and the fact table is very slow and is returning data at a very low rate,
there might be a problem with the join. To decrease processing time:

• Check the joins between the tables. Refer to the section Database guidelines for ETL below.
• Use the data merging technique, described next.

Data merging is a better mechanism for handling data coming from multiple tables than join queries between the tables. It loads data into memory and processes it there, which is faster than joins. In addition, it merges data efficiently by handling duplicates. SCI can use the merge technique to merge the stage table's data with the Data Mart tables.

Process

The merge is implemented in four stages, described below.


Stage 1: adding data sources

1. In this stage, add two data sources: one for new records from the stage table and one for the existing rows in the FACT tables (these are the updated records present in both stage and FACT tables).

2. For the FACT_GRAIN, identify a column in the incoming data sources which uniquely identifies a row in the table. This is used for joins between the stage and the fact tables and to break the records while processing them in memory during transformation.

a. In the first data source, pull the records from the stage table with the query below. Add /*+ PARALLEL */ for a parallel record fetch.

SELECT /*+ PARALLEL */ <COLUMN_LIST> FROM STG_FACT T1

b. In the second data source, add a join between the stage table and the FACT table to pull the existing rows.

SELECT /*+ PARALLEL */ <COLUMN_LIST> FROM STG_FACT T1, FACT T2
WHERE T1.FACT_GRAIN = T2.FACT_GRAIN
ORDER BY T1.FACT_GRAIN

c. Add literals for each data source to mark the rows as NEW or EXISTING.

d. Merge the records in the data stream. Create a variable in the data stream items named "EXISTS" and merge the literals to this. A sketch of the two flagged source queries follows.

Stage 2: defining merge record settings

1. Define the merge record settings:
a. Go to the FACT build properties and navigate to the "Input" tab. Enter the merge record settings there.

2. Break the dimension:
a. Convert the FACT_GRAIN into a dimension.
b. Define dimension breaking on this column. Refer to the section "Dimension Breaking".


3. Define the memory settings:

a. Click the required fact build.
b. From the Edit menu, click Properties.
c. Click the Memory tab. In the Working memory limit (MB) box, type a suitable value to specify the maximum amount of memory to be used. For more information, see Maximum Memory Usage.
d. In the Max page size (bytes) box, type a suitable value to specify the maximum page size. For more information, see Maximum Page Size.
e. In the Initial page table size (slots) box, type a suitable value to specify the initial size of the page table. For more information, see Initial Size of the Page Table.
f. In the Initial hash table size (slots) box, type a suitable value to specify the initial size of the hash table. For more information, see Initial Size of the Hash Table.
g. Click OK.

Stage 3: defining the data delivery

In this stage, define the data delivery for inserting new rows and for updating the existing rows.

1. Update existing records:

a. Right-click the destination source marked as (3) in the fact diagram above and select Properties.
b. Navigate to the Filters tab and add an EXISTS=1 filter. This filter separates out the rows which were marked as "1" in the data stream.


c. Navigate to the "Module Properties" tab and enter the required values.
d. Navigate to the "Table Properties" tab and enter the required values.


2. Adding new records:

a. Right-click the destination source marked as (4) in the fact diagram above and select Properties.
b. Navigate to the Filters tab and add an EXISTS=0 filter. This filter separates out the rows which were marked as "0" in the data stream.
c. Navigate to the module delivery for this data source and implement Oracle SQL*Loader. Refer to the Oracle SQL Loader section.
d. Right-click the data source marked (4), go to the table properties, and set the required values.


Database guidelines for ETL

Indexing strategy in DM (Data Manager)

Indexing can make the Data Manager build faster. Below are some procedures for indexing on Data
Manager to improve performance.

1. Drop indexes on the tables before inserting. This is done using a procedure node from the DM job stream, which invokes a stored procedure to drop the indexes on the tables.

2. Add the indexes back using the same strategy. The index is created mainly on the grain of the fact, i.e., a key which uniquely identifies a row in the FACT table.

3. Compute statistics on the tables using procedure nodes again.

Database indexing guidelines


 Add indexes only for the columns being joined on, and not on other filter condition columns.
 Create indexes in the same order as the columns in the join.
 Avoid too many indexes, to avoid redundancy.
 Create indexes with "compute statistics", or run compute statistics after the indexes are created.
 An index has to be added to all the surrogate keys because Data Manager uses MAX(surrogate_key) + 1 logic to generate the next surrogate key.
 Remove Primary Keys from FACT tables and add indexes on the columns:
o Inserts are faster without PK constraints.
o Oracle SQL*Loader works better without PKs.

Note: Composite indexes can be created for cubes and reporting purposes.

 Indexes are to be added to business id columns in dimensions because Data Manager uses the columns selected as business id to update the dimensions.
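
A minimal sketch of the drop/recreate/statistics cycle (index, table, and column names are hypothetical):

-- Drop before the bulk insert, then recreate on the fact grain
DROP INDEX idx_fact_order_grain;

-- Recreate with statistics computed as part of index creation
CREATE INDEX idx_fact_order_grain
  ON fact_order (order_line_id)
  COMPUTE STATISTICS;

-- Alternatively, gather table statistics afterwards
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'FACT_ORDER');
END;
/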


Oracle SQL Loader

The import builds pull the data from the source system and append it to the import tables. Since SCI doesn't update any records on the destination table, it can use Oracle SQL*Loader, which appends an array of data.

Advantages of Oracle bulk load:

o There is no constraint check or referential integrity check, which makes the data inserts faster.
o There are no updates happening, which saves time on table scans.
o Bulk load is faster than traditional inserts.
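
For reference, a conventional SQL*Loader APPEND load is driven by a control file along these lines (file, table, and column names are hypothetical; Data Manager generates the equivalent internally):

-- Minimal SQL*Loader control file sketch
LOAD DATA
INFILE 'stg_order_line.dat'
APPEND
INTO TABLE IMP_ORDER_LINE
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(ORDER_ID,
 LINE_NUMBER,
 QUANTITY,
 CREATED_DTTM DATE "YYYY-MM-DD HH24:MI:SS")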

Steps to enable Oracle SQL*Loader

1. Open the Import build.
2. Right-click on the Destination Source.
3. Select Oracle SQL*Loader from the "Change to" dropdown.
4. Go to the table properties and update the entries as required.


Parallel Processing in Data manager

The number of parallel nodes to be processed for import, dimension and facts can be altered. Parallel
processing helps in running multiple builds in a shorter period of time.

If there are no other CPU expensive jobs in the SCI server other than the ETL, set the number of parallel
nodes to 4. If there are other big processes running on the same machine as the ETL, set the value to 2
or 1.

 To manage the parallel nodes for import, dimension, and facts, update the field CONFIG_TYPE_CONFIG_OPTION_VALUE in the table TBL_CONFIG_TYPE_CONFIG_OPTION for each corresponding entry. Query the view VW_CONFIG_TYPE_CONFIG_OPTION to find the correct setting, as sketched below.
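
A minimal sketch; the WHERE clause below is a placeholder, since the predicate identifying each node type's row depends on your SCI schema and must be matched to the row returned by the view:

-- Inspect the current parallel-node settings
SELECT * FROM VW_CONFIG_TYPE_CONFIG_OPTION;

-- Hypothetical update: set the located option to 4 parallel nodes
UPDATE TBL_CONFIG_TYPE_CONFIG_OPTION
SET CONFIG_TYPE_CONFIG_OPTION_VALUE = '4'
WHERE <predicate identifying the import/dimension/fact option row>;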


Cube Optimization
Ulimit Settings
The ulimit settings on the server for the user should be set to at least the following values:

time(seconds) unlimited

file(blocks) unlimited

data(kbytes) unlimited

stack(kbytes) 4194304

memory(kbytes) unlimited

coredump(blocks) unlimited

To process the cube faster with optimal utilization of space, make the following changes to the Cognos configuration file for Transformer.

1. Navigate to COG_ROOT/configuration.
2. Open file cogtr.xml.
3. Add the following lines:

<Section Name="Transformer">
  <!-- This line is for Windows -->
  <Preference Name="PowerPlayPath" Type="string" Value="..\..\cer5\bin\PwrPlay.exe"/>
  <Preference Name="MultiFileCubeThreshold" Type="int" Value="50000"></Preference>
  <Preference Name="WorkFileSortBufferSize" Type="int" Value="160000000"></Preference>
  <Preference Name="WorkFileMaxSize" Type="int" Value="1500000000"/>
  <Preference Name="MaxTransactionNum" Type="int" Value="1000000"/>
  <Preference Name="DataWorkDirectory" Type="string" Value="<SCI_HOME_DIR>/cubes/output/temp/mdc"></Preference>
  <Preference Name="ModelWorkDirectory" Type="string" Value="<SCI_HOME_DIR>/cubes/output/temp/mdl"></Preference>


</Section>

4. Replace <SCI_HOME_DIR> with the path where SCI is installed.


Enabling Parallelized Cube Processing

As per the latest code, SCI has not enabled the parallel processing of nodes for cubes.

You can change how Data Manager handles the parallelization of independent processes through a setting applied through the catalog. You can choose whether or not to run independent processes in parallel.

By default, a condition node, procedure node, SQL node, alert node, or email node is executed inline, that is, run within the Job Stream process. The implication of running inline is that the nodes are run in series even if the Job Stream design has parallel flows. However, procedure nodes and SQL nodes may take a long time to process, so parallel execution would be desirable. To facilitate this, you can specify that IBM Cognos Data Manager should create a separate process for a node (procedure or SQL) by selecting the "Run as separate process" check box for that node.

This will make the processes run in parallel. The number of parallel processes is further based on the parameter setting "-N" when using the Data Manager functions.

Note: Executing a node as a separate process uses more memory than executing inline.


Set maximum parallel processes to be run at a time

Access the catalog and open the job stream for ETL_Master_2. Locate the command that invokes the Data Manager functions; you can update it to control the number of parallel nodes processed.

The parameter is "-N?", where '?' is replaced with the number of parallel nodes to be processed (for example, -N4).


Set Auto Summarization

• The Auto Summarize feature on the General tab of the Data Source property sheet of each item helps reduce the number of rows that Transformer retrieves from the source data, further improving cube build performance.

• Auto Summarize depends upon the fact data being read. If the fact data is not consolidated, this option can be checked so that the group by happens at the database end and less data needs to be transferred from the database. Consider the effect this setting has upon the generated SQL of the data source, as it introduces summary functions.

• Ensure that the query has appropriate identifier and fact usage attributes set for this setting to be effective.

• Review the SQL again to ensure appropriate grouping and summary functions are being applied. An illustrative before/after sketch follows this list.

• If the Auto Summarize option is not available, ensure the fact query is consolidated, that is, it only brings in one row of data for a unique key combination.
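
A minimal sketch of the effect on generated SQL (table and column names are hypothetical):

-- Auto Summarize off: row-level retrieval, consolidation happens in Transformer
SELECT time_key, product_key, quantity
FROM fact_sales;

-- Auto Summarize on: the database consolidates before transfer
SELECT time_key, product_key, SUM(quantity) AS quantity
FROM fact_sales
GROUP BY time_key, product_key;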


Periodically Clean up Models


Periodically, we recommend that you revisit your cube design, adjusting your data, data sources, model,
or cube creation choices. Consider making the following improvements to speed up cube creation, reduce
cube size, and improve access times for report users:

• Delete or exclude records from the source data if they are no longer needed.

• To improve cube creation time, try changing the order of your structural data sources. Start with the
structural data sources that contain the hierarchical data for the dimensions. Then add transactional
data sources to supply measures for the model, using the minimum number of columns needed to
reference those dimensions.

• Minimize the number of categories in a cube; this is the default option for models. Transformer adds only categories that are referenced in the source data or specifically designated to be included.

• Ensure that your data does not have any uniqueness violations, if you are using level uniqueness.
Allocate extra time for data source processing to verify that all categories are unique within a level, or
eliminate this step if it is not necessary by clearing the Verify Category Uniqueness option on the
relevant data source property sheet.

Time based partitioning techniques using cube groups


Time-based partitioning is a useful means of optimizing cubes for OLAP reporting purposes. Time-based
partitioned cubes are a collection of child cubes, based on one level in the time dimension, that together
form one large cube. Each child cube is partitioned, or split, at the appropriate reporting level, such as
Quarter or Month.

Report users can view each cube independently, or access the entire collection of child cubes as a single,
time-based virtual cube. This means that reports can be viewed across the entire time dimension, or
across only one level in the time dimension, such as a specific month.

Create a Time-based Partitioned Cube Group

To more easily manage time-based partitioned cubes, you can gather them into like-structured groups.


Process

1. To set up the cube group, while executing the option "Insert Power Cube", on the Power Cube property sheet, click the Cube Group tab, select the Enable time-based partitioning check box, and click OK.

2. To ensure that the child cubes in your group cover a distinct level, for each one, specify the
appropriate level from your time dimension, such as Quarter or Month.

Note: You can open the .vcd file later, in any text editor, and manually include or exclude cubes.
For example, to improve performance, try adding entries in the .vcd file for cubes that are at a
higher level of the time hierarchy, such as the Quarter or Year level.


Correct Indexing strategy based on cube queries

Create correct composite indexes based on the SQL generated for the cube. This will help in fast access of the data read during the cube build. Take care to create the correct composite index and to make sure it is used for data reads during cube building, as in the sketch below.
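
A minimal sketch, assuming the cube's generated SQL joins and groups on the time, product, and location keys (all names hypothetical):

-- Composite index matching the column order of the cube query's joins
CREATE INDEX idx_fact_sales_cube
  ON fact_sales (time_key, product_key, location_key)
  COMPUTE STATISTICS;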

Other settings
Further optimize cube creation by specifying one of the following processing methods, accessed from the
Processing tab of the Power Cube property sheet:

• Auto-partition. Enables the Auto-Partition tab, where you can set the parameters for
Transformer to devise a partitioning scheme.

• Data Passes. Optimizes the number of passes through the temporary working files during the creation of a cube. This option is beneficial only if the more efficient alternative, auto-partitioning, is not used (that is, your model implements features not supported with auto-partitioning).

You might also like