
Expert Tips and New Techniques for Optimizing Data Load and Query Performance: Part 1

Gary Nolan
Melvan Consulting

© 2006 Wellesley Information Services. All rights reserved.
Overview

• Two back-to-back sessions on BW Performance
  - Session One (this session) focuses primarily on data load performance, loading tips, and procedures for monitoring loads into BW
  - Session Two focuses on query performance and analysis tools
• There is quite a bit of overlap … performance is a holistic process

Performance….

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Performance Tuning in SAP BW

• OLTP systems
  - Application development and performance tuning are separate
  - Performance tuning is done by Basis experts

[Diagram: in an OLTP system, performance tuning is a layer of its own, separate from the database and application layers]

Performance Tuning in SAP BW (cont.)

• OLTP systems
  - Application development and performance tuning are separate
  - Performance tuning is done by Basis experts
• SAP BW
  - Performance must be designed into the SAP BW solution!
  - Science vs. Art…

[Diagram: in SAP BW, performance tuning cuts across both the database and application layers]

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Your SAP BW Performance Checklist

• Have clear goals
  - If possible, create a service level agreement (SLA) to measure success
• Monitor performance
  - BW Statistics can help
• Consider a performance subteam
  - Participants from the application, database, and Basis teams

Your SAP BW Performance Checklist (cont.)

• Spend a great deal of time considering your data model
  - Your data model is one of the most important concerns
    - No amount of post-go-live tuning will effectively resolve a bad design
  - Correctly and effectively use the right objects for your design
• Maximize use of available resources via load balancing
• Optimize extraction/data load performance

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Improving Performance

1. Stay Current on Support Packages
2. Use Data Modeling Tools Strategically
3. Leverage Line-Item Dimensions
4. Database Table (Physical) Partitioning
5. MultiProvider (Logical) Partitioning
6. Use Time-Dependent Master Data Carefully
7. Implement Compression
8. Master Data Number Range Buffering
9. Change Run Improvements
10. InfoCube Load Performance
11. ODS Improvements
12. Flat-File Load Performance

Stay Current on Support Packages
service.sap.com/bi

Use Data Modeling Tools – Reporting Needs

[Diagram: a reporting spectrum running from real-time inquiry and operational reporting (ERP data source) to management info (lightly summarized) and more summarized, more ad hoc reporting (data warehousing)]

• Where is the dividing line?
  - The further the dividing line moves to the left, the more likely you are to have reporting and data extract performance challenges

ERP characteristics (OLTP – online transactional processing):
• Current, transaction-level data
• Subject-area-specific
• Used for operational support
• Short-term retention

DW characteristics (OLAP – online analytical processing):
• Historical, summarized data
• Enterprise or cross-application data
• Used for business analysis
• Longer retention

Use Data Modeling Tools – The BW Data Model

• Consider the data and its storage in your Data Warehouse (DW) model …
  - ODS objects and InfoCubes are specifically designed for certain uses:

Persistent Staging Area (PSA):
• Data staging
• Raw data
• Built with the PSA

Operational Data Store:
• Operational reporting
• Near-real-time/volatile
• Granular
• Built with ODS objects

Data Warehouse:
• Non-volatile
• Granular
• Integrated
• Historical foundation
• Built with ODS objects

Multidimensional Models:
• Multidimensional analysis
• Aggregated view
• Integrated
• Built with InfoCubes

Use Data Modeling Tools – InfoCube Design: Key Concepts

• The design of your data model is critical because it impacts:
  - Data load performance
  - Reporting performance
  - Hardware/database sizing
• The overall goals of InfoCube design:
  - Offer information to end users in a way that matches their normal understanding of the business
  - Deliver structured information, enabling easy navigation/drilldown
  - Produce a model that can be easily implemented
  - “As posted” vs. “restated”

Use Data Modeling Tools – The InfoCube Model

• SAP extended star schema
  - Master data is shared, not part of the InfoCube

[Diagram: fact tables at the center, surrounded by dimension tables that link out to shared master data]

• Dimension tables should be small in relation to the fact table (1:10 or better), or else be defined as a high-cardinality/line-item dimension
• For reporting: the InfoCube should contain only needed characteristics/key figures – examine alternative data models and types of key figures

Use Data Modeling Tools – Master Data

• Master data
  - Hierarchies
  - Attributes
  - Texts (multilingual support)

[Diagram: master data and SID tables link to the dimension tables, which link to the fact tables. Examples: customer attributes such as street and account; a plant hierarchy with Plant UK, Plant Germany, and Plant Australia]

• For reporting: use time-dependent master data sparingly, if possible, to ensure adequate performance

Use Data Modeling Tools – Consider the Level of Granularity

• Granularity greatly influences:
  - Reporting capabilities
  - Performance
  - Space needed
  - Load time
• Key questions to ask:
  - Does the data need to be stored in the cube?
    - Storing data in an ODS offers a lower level of granularity
  - Does the data need to be stored in the warehouse at all?
    - Can you meet users’ drill-down requirements by linking directly to R/3?
    - This would avoid having to load and store the data in SAP BW

Leverage Line-Item Dimensions

• How do you handle unbalanced star schemas?
  - e.g., fact table : dimension ratio < 10:1
• Line-item dimensions
  - Direct link from the InfoCube fact table to the master data surrogate ID (SID) table – without using an intermediate dimension table

[Diagram: without a line-item dimension, the SID table joins to a dimension table, which joins to the fact table; with a line-item dimension, the SID table joins to the fact table directly]

Leverage Line-Item Dimensions (cont.)

• Line-item dimensions
  - Advantages
    - Saves one table join (overhead) at query runtime
    - Saves the overhead of dimension ID determination
  - Disadvantages
    - Only one characteristic is possible in a line-item dimension
    - Can be defined only during InfoCube design
  - Recommended for large dimensions with many distinct values
    - e.g., invoice or order number, but also reasonable for large material or customer dimensions

Leverage Line-Item Dimensions (cont.)

• High-cardinality dimensions
  - Specify high cardinality if the dimension size exceeds 10 percent of the fact table size
    - Converts the index from bitmap to B*-tree for a performance improvement
    - InfoCubes can be converted at any time!

Leverage Line-Item Dimensions (cont.)

• Run program SAP_INFOCUBE_DESIGNS (via transaction SE38) to see the ratio of each dimension table to its fact table – make sure database statistics are up to date before running the program
• Any dimension over 20% of the fact table size should be examined for reshuffling of dimensions or conversion to a line-item dimension
  - e.g., a dimension of 2.5 million rows against a 10-million-row fact table (25%) is a candidate

Database Table (Physical) Partitioning

• Each InfoCube actually has two fact tables:
  - F-Fact table – optimized for data loading
  - E-Fact table – optimized for retrieving data
• To set up partitioning: transaction RSA1, open the InfoCube, and go to Extras -> Partitioning

Database Table Partitioning

• Partitioning splits the table into several smaller tables
  - This can speed up query performance significantly
  - The F-Fact table is already partitioned by request ID
  - The E-Fact table can be partitioned by:
    - Calendar year/month (0CALMONTH)
    - Fiscal year/period (0FISCPER)
    - e.g., partitioning by 0CALMONTH for 01.2004 through 12.2006 typically yields 36 monthly partitions, plus catch-all partitions for values outside the range
  - The InfoCube needs to be compressed in order to take advantage of the partitioning (compression can be scheduled)

Database Table Partitioning (cont.)

• Benefits
  - Parallel accesses to partitions
  - Reads of smaller sets of data

MultiProvider Partitioning

• MultiProvider (logical) partitioning
  - Can partition data by year, plan/actual, region, business area, etc.
  - Parallel sub-queries are started automatically against the basic InfoCubes
  - Use to divide large amounts of data into “chunks”

[Diagram: a MultiProvider provides a consolidated view on all data across basic InfoCubes for Europe, Asia, and USA; each basic InfoCube keeps its own F-Fact table, physically partitioned by request ID, and its own E-Fact table, partitioned by period from 001/2002 through 012/2005]

MultiProvider Partitioning (cont.)

• Benefits
  - Queries are split automatically and distributed to the InfoProviders (parallel execution where possible)
  - Single InfoProviders are smaller, less complex, and less sparsely filled than one big InfoProvider
  - No additional data storage needed
  - Individual InfoProviders can be tuned independently
  - Data can be loaded into individual InfoProviders in parallel
  - Transparent to query designers and end users for reporting
  - Local queries on each InfoProvider are possible
  - Archiving of a single basic InfoProvider is very easy

MultiProvider Partitioning (cont.)

• Disadvantages
  - Administration (with aggregates)
  - Additional I/O

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Use Time-Dependent Master Data Carefully

• Overuse or incorrect definition of time dependency for master data objects can adversely affect query performance
• Time-dependent master data is used to model a view of the data as it existed at a point in time
  - e.g., sales by salesperson reflecting the territory assignments in 2001
• Time dependency should be modeled only if requirements deem it a “must have,” not a “nice to have”
  - It limits tuning potential

Use Time-Dependent Master Data Carefully (cont.)

• Aggregates can be built for time-dependent master data, but they are fixed to a specific key date
  - This limits the effectiveness of aggregates
  - It could increase the number of aggregates
  - When the key date changes, a special aggregate change run is required
• Consider building time-dependent and time-independent versions of the same attribute

InfoCube Compression

• Compression is the procedure that moves data from the F-Fact table to the E-Fact table
  - During compression, the request ID is eliminated
    - This removes one dimension from the join when reading the E-Fact table
    - Therefore, query performance on compressed data is better
    - The request ID-keyed data in the F-Fact table is deleted (the partition is dropped)

InfoCube Compression (cont.)

• Compression can be executed during query execution and data loading
  - Transparent to end users, and data consistency is guaranteed

[Diagram: data loads fill the F table, keyed by request number, time, and material; compression moves the records into the E table, where the request number is eliminated]

InfoCube Compression (cont.)

• Compression benefit – reduction in data volume
  - Records with the same master data keys are combined during compression
    - Transaction records with the same keys in different requests are combined
  - “Zeroed” records can also be removed (e.g., offsetting + and – values)
    - Administrators can control this for each InfoCube
  - Typically, an overall 20% to 30% reduction in data has been observed

Compression and Non-Cumulative Key Figures

• Non-cumulative key figures
  - Cannot be cumulated meaningfully over time
    - e.g., inventory, number of employees
  - Storage: historical movements plus a reference point (in the E-Fact table)
  - The reference point is updated only when the InfoCube is compressed
    - If all requests are compressed, the reference point represents, e.g., the current stock
    - If not, all request IDs have to be analyzed to determine the correct value
  - Example:

    Month   Material  Plant  Material flow (F-Fact table)  Reference point (E-Fact table)
    Jan-02  4711      1000   10                            0  -> after compressing: 10
    Feb-02  4711      1000   20                            10
    Mar-02  4711      1000   5                             10 -> after compressing: 35

Note: The reference points are stored only in the compressed E table; they carry the time value “infinity,” e.g., day 12/31/9999.

Symptoms of Many Uncompressed InfoCube Requests

• Data staging:
  - Index builds after data loads become slower
  - DB statistics runs after data loads become slower
  - Aggregate builds encounter long runtimes or errors
  - Data availability of newly loaded requests for querying is delayed
• Query execution:
  - Queries become slower and slower with each data load
  - Queries on non-cumulative InfoCubes are very slow

Master Data Number Range Buffering

• SID generation/allocation can become a bottleneck when loading master data (or transaction data that generates master data)
  - The SID number range can be buffered instead of accessing the DB for each SID
  - If you discover massive accesses to DB table NRIV via SQL trace (ST05), increase the number range buffer in transaction SNRO (a quick way to inspect these ranges is sketched below)
  - If possible, reset the value to its original state after the load (to avoid unnecessary memory allocation)

[Diagram: instead of fetching each new SID from the database, the next SIDs (e.g., 4 through 8) are taken from an in-memory buffer as new master data records, such as material 1125, arrive]

• See the BW Expert article: “Increase Load Performance by Buffering Number Ranges”

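One quick way to gauge which SID number ranges are busiest is to look at their intervals and current levels in table NRIV directly. A minimal sketch, assuming BW master data SID number range objects follow the usual BIM* naming convention (verify the exact object names in transaction SNRO on your release):

```abap
REPORT z_check_sid_nriv.

* Structure holding only the NRIV fields we need (never SELECT *)
TYPES: BEGIN OF ty_nriv,
         object     TYPE nriv-object,      "number range object
         nrrangenr  TYPE nriv-nrrangenr,   "interval number
         fromnumber TYPE nriv-fromnumber,  "interval lower bound
         tonumber   TYPE nriv-tonumber,    "interval upper bound
         nrlevel    TYPE nriv-nrlevel,     "current level of the interval
       END OF ty_nriv.

DATA: lt_nriv TYPE STANDARD TABLE OF ty_nriv,
      ls_nriv TYPE ty_nriv.

* Assumption: master data SID ranges use BIM* object names
SELECT object nrrangenr fromnumber tonumber nrlevel
  FROM nriv
  INTO TABLE lt_nriv
  WHERE object LIKE 'BIM%'.

LOOP AT lt_nriv INTO ls_nriv.
  WRITE: / ls_nriv-object, ls_nriv-nrrangenr,
           ls_nriv-fromnumber, ls_nriv-tonumber, ls_nriv-nrlevel.
ENDLOOP.
```

Objects whose current level climbs quickly between loads are the ones that benefit most from a larger buffer in SNRO.
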
Change Run/Apply Job

• There are two steps in the change run …
• Step 1: Activate the new master data and hierarchy data
  - Master data is only available after the change run has completed successfully
• Step 2: Realign and recalculate all aggregates containing navigational attributes and/or hierarchies with the new master data
  - Affects all aggregates that contain the changed attribute and/or hierarchy
  - A percentage parameter in the IMG determines whether the aggregate is adjusted in place or dropped and rebuilt completely
  - The change run can be started for specific InfoObjects

Change Run/Apply Job (cont.)

• All aggregates containing navigational attributes and/or hierarchies are realigned and recalculated with the new master data (cont.)
  - Key figures set to exception aggregation MIN or MAX cause the aggregates to be completely rebuilt for each change run
  - ABAP program RSDDS_CHANGERUN_MONITOR can be used to monitor a change run in progress

Change Run Performance – Tips

• Use the threshold for delta and new build-up in customizing
• The change run can be parallelized across InfoCubes; see SAP Note 534630 for more details
• Check the aggregate hierarchy (see “Rollup” for more details)
• Try to build basis aggregates that are not affected by the change run, i.e., no navigational attributes or hierarchy levels
• The following slides show details on the process itself …

InfoCube Data Load Performance

• Administrator Workbench > Modeling > InfoCube > Manage > Performance tab
• Recommendation: drop secondary indexes prior to large InfoCube data loads
  - Can be done using process chains in BW 3.x
• Create Index button: set automatic index drop/rebuild
• Statistics Structure button: set an automatic DB statistics run after a data load

Load Performance – Index Management

• Indexes have a positive impact on read operations and a negative impact on write operations
• Since most database operations during a data load are writes:
  - Drop indexes during the data load
  - Exception: if a database lookup occurs during the data load
• Since most database operations during a query are reads:
  - Re-create indexes right after the data load
• Hint: use process chains

Load Performance – Load Packet Size

• The default packet size is set system-wide
• The packet size can be overridden in the InfoPackage
• The larger the packet size, the fewer the roundtrips between R/3 and BW
• The ONLY way to find the optimal packet size is through trial and error, but you can start with an educated guess
• See OSS collective Note 417307 – Extractor Packet Size

Load Sequencing

• Load master data before loading transactional data. If you load transactional data first:
  - While loading the transaction data, SIDs will be created with blank attributes
  - While loading the attributes later, the system looks up the proper SIDs in the SID tables
  - This lookup is very process-intensive
• When running two transactional data loads at the same time:
  - Make sure that the master data tables they both need have already been loaded

ODS Load/Activation Tuning Tips

• Non-reporting ODS objects
  - BEx flag: the computation of SIDs for the ODS can be switched off
  - The BEx flag must be switched on if BEx reporting on the ODS is executed
• Unique data records setting
  - Use it if only unique records are being loaded into the ODS
• Loads are faster because the master data SID tables do not have to be read and linked to the ODS data

Flat File Load Performance

• Use a predetermined record length (ASCII)
  - Significantly faster than delimited files (CSV) – see the sketch after this list
• Files should be read from the application server, not from the client PC
• Work with your network administrator to eliminate bottlenecks
• Avoid placing input load files on high-I/O disks
• Split files into multiple files and load them in parallel whenever possible
  - Loading four files of 250,000 records each in parallel typically performs significantly better than loading one file with one million records
  - Delete indexes before the load and rebuild them after the load
    - It is faster to rebuild an index at the end of the load process than to update it for every record

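Why fixed-length files are cheaper becomes clear from the per-record work each format requires. A minimal ABAP sketch for comparison (the file paths, field names, and lengths are hypothetical):

```abap
REPORT z_flatfile_compare.

TYPES: BEGIN OF ty_rec,
         matnr TYPE c LENGTH 18,
         werks TYPE c LENGTH 4,
         menge TYPE c LENGTH 17,
       END OF ty_rec.

CONSTANTS: lc_fixed TYPE string VALUE '/usr/sap/trans/data/sales_fixed.txt',
           lc_csv   TYPE string VALUE '/usr/sap/trans/data/sales.csv'.

DATA: lv_line TYPE c LENGTH 200,
      ls_rec  TYPE ty_rec,
      lt_recs TYPE STANDARD TABLE OF ty_rec.

* Fixed-length file: one positional move per record - the structure
* layout itself does the "parsing"
OPEN DATASET lc_fixed FOR INPUT IN TEXT MODE ENCODING DEFAULT.
DO.
  READ DATASET lc_fixed INTO lv_line.
  IF sy-subrc <> 0. EXIT. ENDIF.
  ls_rec = lv_line.                  "no field-by-field parsing
  APPEND ls_rec TO lt_recs.
ENDDO.
CLOSE DATASET lc_fixed.

* CSV file: a SPLIT plus individual field moves for every record -
* this extra work is the main reason CSV loads are slower
OPEN DATASET lc_csv FOR INPUT IN TEXT MODE ENCODING DEFAULT.
DO.
  READ DATASET lc_csv INTO lv_line.
  IF sy-subrc <> 0. EXIT. ENDIF.
  SPLIT lv_line AT ',' INTO ls_rec-matnr ls_rec-werks ls_rec-menge.
  APPEND ls_rec TO lt_recs.
ENDDO.
CLOSE DATASET lc_csv.
```
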
Other Performance Tuning Tips

• Filters
  - Build secondary indexes on filtered fields in the source system
• Convert old LIS extractors to the V3 collection method
  - More efficient loads
• Use delta processing whenever possible
• Run loads at off-peak times
• Ensure that the selection parameters of the extraction InfoPackage facilitate the use of indexes
  - You can use transaction RSRV to check indexes

Other Performance Tuning Tips (cont.)

• PSA partitioning – transaction RSCUSTV6 – see OSS Note 485878
• Data archiving
  - Remove obsolete data
• Keep statistics up to date
  - Use RSRV to check statistics

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Methods for Determining Bottlenecks

• Be very critical of ABAP in transfer rules, update rules, and start routines
  - Small bottlenecks become huge because of multiple iterations
• Avoid SELECT * on tables
• Avoid selecting all fields of a table when only a subset is needed
• Use internal tables to avoid multiple SELECTs
  - When reading an external table, consider selecting into an internal table once, rather than selecting on each update (see the sketch below)

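The internal-table technique looks like this inside a BW 3.x update or start routine. A minimal sketch: the lookup table Z_MATERIAL_GRP and all field names are hypothetical, and DATA_PACKAGE is supplied by the generated routine framework:

```abap
* Declarations local to the routine
TYPES: BEGIN OF ty_lookup,
         matnr  TYPE c LENGTH 18,
         matgrp TYPE c LENGTH 9,
       END OF ty_lookup.

DATA: lt_lookup TYPE STANDARD TABLE OF ty_lookup,
      ls_lookup TYPE ty_lookup.

FIELD-SYMBOLS <fs_data> LIKE LINE OF data_package.

* Slow variant (avoid): one SELECT SINGLE for EVERY record
* SELECT SINGLE matgrp FROM z_material_grp INTO lv_matgrp
*   WHERE matnr = <fs_data>-matnr.

* Fast variant: one array fetch of only the needed fields ...
SELECT matnr matgrp
  FROM z_material_grp
  INTO TABLE lt_lookup.
SORT lt_lookup BY matnr.

* ... then cheap in-memory binary searches per record
LOOP AT data_package ASSIGNING <fs_data>.
  READ TABLE lt_lookup INTO ls_lookup
       WITH KEY matnr = <fs_data>-matnr BINARY SEARCH.
  IF sy-subrc = 0.
    <fs_data>-matgrp = ls_lookup-matgrp.
  ENDIF.
ENDLOOP.
```

With a 50,000-record data package this turns 50,000 database round trips into a single array fetch – exactly the kind of small-bottleneck-times-many-iterations saving described above.
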
Methods for Determining Bottlenecks (cont.)

• Use ELSEIF rather than multiple separate IF statements to reduce decision-making whenever possible
• Start routines enable you to manipulate whole data packages instead of changing them record by record; sometimes this allows for faster transformations (see the sketch below)

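A minimal start routine sketch combining both tips (BW 3.x style; DATA_PACKAGE comes from the generated framework, and the field names RECORDMODE, SALESORG, ORDER_QTY, and SIZE_CLASS are hypothetical):

```abap
* Package-level filtering: one DELETE replaces an IF on every record
DELETE data_package WHERE recordmode = 'D' OR salesorg = space.

FIELD-SYMBOLS <fs> LIKE LINE OF data_package.

* Package-wide derivation using ELSEIF instead of separate IFs,
* so each record is classified with at most two comparisons
LOOP AT data_package ASSIGNING <fs>.
  IF <fs>-order_qty >= 1000.
    <fs>-size_class = 'L'.
  ELSEIF <fs>-order_qty >= 100.
    <fs>-size_class = 'M'.
  ELSE.
    <fs>-size_class = 'S'.
  ENDIF.
ENDLOOP.
```
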
Transaction SE30 – Runtime Analysis

• Transaction SE30 can be used for performance analysis
  - Start the extraction (or use RSA3), then run SE30 on the active extraction job
• A large percentage in the Net column denotes a very resource-intensive statement
• SE30 allows a direct jump to the code

Is Extraction Time Too Long? Check the SQL

• Use an ST05 trace with a filter on the extraction user (e.g., ALEREMOTE)
• Make sure that no concurrent extraction jobs run at the same time as this execution

Simulate Update

• Debugging and tuning update and transfer rules
  - Allows ABAP single-stepping with breakpoints to determine bottlenecks

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Resources

• service.sap.com/bi -> Performance (requires login credentials for the SAP Service Marketplace)

Resources (cont.)

• BW Expert articles:
  - “Better Star Schema Design Means Better Performance,” by Gary Nolan (Volume 2, Issue 8)
  - “Data Modeling with Time-Dependent Master Data,” by Gary Nolan (Volume 2, Issue 4)
  - “InfoProvider Compression: What It Is and When to Do It,” by Gary Nolan (Volume 3, Issue 5)
  - “24 BW Design and Data Modeling Tips for Optimal ETL,” by Catherine Roze and Joffy Mathew (Volume 1, Issue 9)
  - “Debug Routines in Update Rules in Just Seven Easy Steps,” by July Hartono (Volume 1, Issue 2)

7 Key Points to Take Home

• Data model for performance
• Stay current with support packages
• Implement partitioning – this can significantly reduce query processing time
• Be careful with time-dependent master data
• Implement and stay as current as possible with InfoCube compression
• There are various tools to troubleshoot performance – use them
• Performance is an ongoing task – always keep performance as a factor in decision-making

Your Turn!

“An acre of performance is worth a whole world of promise.”
– William Dean Howells

How to contact me:
Gary Nolan
gary.nolan@melvanconsulting.com