
Expert Tips and New Techniques for Optimizing Data Load and Query Performance: Part 1

Gary Nolan
Melvan Consulting

© 2006 Wellesley Information Services. All rights reserved.
Overview

• Two back-to-back sessions on BW Performance
  - Session One (this session) focuses primarily on data load performance, loading tips, and procedures for monitoring loads into BW
  - Session Two focuses on query performance and analysis tools
• There is quite a bit of overlap … performance is a holistic process

Performance….

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Performance Tuning in SAP BW

• OLTP systems
  - Application development and performance tuning are separate
  - Performance tuning is done by Basis experts

[Diagram: in an OLTP system, performance tuning is a layer of its own, separate from the database and application layers]

Performance Tuning in SAP BW (cont.)

• OLTP systems
  - Application development and performance tuning are separate
  - Performance tuning is done by Basis experts
• SAP BW
  - Performance must be designed into the SAP BW solution!
  - Science vs. Art…

[Diagram: in SAP BW, performance tuning cuts across both the database and application layers]

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Your SAP BW Performance Checklist

• Have clear goals
  - If possible, create a service level agreement (SLA) to measure success
• Monitor performance
  - BW Statistics can help
• Consider a performance subteam
  - Participants from the application, database, and Basis teams

Your SAP BW Performance Checklist (cont.)

• Spend a great deal of time considering your data model
  - Your data model is one of the most important concerns
    - No amount of post-go-live tuning will effectively resolve a bad design
  - Correctly and effectively use the right objects for your design
• Maximize use of available resources via load balancing
• Optimize extraction/data load performance

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Improving Performance

1. Stay Current on Support Packages
2. Use Data Modeling Tools Strategically
3. Leverage Line-Item Dimensions
4. Database Table (Physical) Partitioning
5. MultiProvider (Logical) Partitioning
6. Use Time-Dependent Master Data Carefully
7. Implement Compression
8. Master Data Number Range Buffering
9. Change Run Improvements
10. InfoCube Load Performance
11. ODS Improvements
12. Flat-File Load Performance

Stay Current on Support Packages
service.sap.com/bi

Use Data Modeling Tools – Reporting Needs

[Diagram: a reporting spectrum running from real-time inquiry and operational reporting (ERP data source) to management info (lightly summarized) and more summarized, more ad hoc reporting (data warehousing)]

• Where is the dividing line?
  - The further the dividing line moves to the left, the more likely you are to have reporting and data extract performance challenges

ERP characteristics (OLTP – online transactional processing):
• Current, transaction-level data
• Subject-area-specific
• Used for operational support
• Short-term retention

DW characteristics (OLAP – online analytical processing):
• Historical, summarized data
• Enterprise or cross-application data
• Used for business analysis
• Longer retention

Use Data Modeling Tools – The BW Data Model

• Consider the data and its storage in your Data Warehouse (DW) model …
  - ODS objects and InfoCubes are specifically designed for certain uses:

Persistent Staging Area (PSA):
• Data staging
• Raw data
• Built with the PSA

Operational Data Store:
• Operational reporting
• Near-real-time/volatile
• Granular
• Built with ODS objects

Data Warehouse:
• Non-volatile
• Granular
• Integrated
• Historical foundation
• Built with ODS objects

Multidimensional Models:
• Multidimensional analysis
• Aggregated view
• Integrated
• Built with InfoCubes

Use Data Modeling Tools – InfoCube Design: Key Concepts

• The design of your data model is critical because it impacts:
  - Data load performance
  - Reporting performance
  - Hardware/database sizing
• The overall goals of InfoCube design:
  - Offer information to end users in a way that matches their normal understanding of the business
  - Deliver structured information, enabling easy navigation/drilldown
  - Produce a model that can be easily implemented
  - “As posted” vs. “restated”

Use Data Modeling Tools – The InfoCube Model

• SAP extended star schema
  - Master data is shared, not part of the InfoCube

[Diagram: fact tables at the center, surrounded by dimension tables that link out to shared master data]

• Dimension tables should be small in relation to the fact table (1:10 or better), or else be defined as a high-cardinality/line-item dimension
• For reporting: the InfoCube should contain only needed characteristics/key figures – examine alternative data models and types of key figures

Use Data Modeling Tools – Master Data

• Master data
  - Hierarchies
  - Attributes
  - Texts (multilingual support)

[Diagram: master data and SID tables link to the dimension tables, which link to the fact tables. Examples: customer attributes such as street and account; a plant hierarchy with Plant UK, Plant Germany, and Plant Australia]

• For reporting: use time-dependent master data sparingly, if possible, to ensure adequate performance

Use Data Modeling Tools – Consider the Level of Granularity

• Granularity greatly influences:
  - Reporting capabilities
  - Performance
  - Space needed
  - Load time
• Key questions to ask:
  - Does the data need to be stored in the cube?
    - Storing data in an ODS offers a lower level of granularity
  - Does the data need to be stored in the warehouse at all?
    - Can you meet users’ drill-down requirements by linking directly to R/3?
    - This would avoid having to load and store the data in SAP BW

Leverage Line-Item Dimensions

• How do you handle unbalanced star schemas?
  - e.g., fact table : dimension ratio < 10:1
• Line-item dimensions
  - Direct link from the InfoCube fact table to the master data surrogate ID (SID) table – without using an intermediate dimension table

[Diagram: without a line-item dimension, the SID table joins to a dimension table, which joins to the fact table; with a line-item dimension, the SID table joins to the fact table directly]

Leverage Line-Item Dimensions (cont.)

• Line-item dimensions
  - Advantages
    - Saves one table join (overhead) at query runtime
    - Saves the overhead of dimension ID determination
  - Disadvantages
    - Only one characteristic is possible in a line-item dimension
    - Can be defined only during InfoCube design
  - Recommended for large dimensions with many distinct values
    - e.g., invoice or order number, but also reasonable for large material or customer dimensions

Leverage Line-Item Dimensions (cont.)

• High-cardinality dimensions
  - Specify high cardinality if the dimension size exceeds 10 percent of the fact table size
    - Converts the index from bitmap to B*-tree for a performance improvement
    - InfoCubes can be converted at any time!

Leverage Line-Item Dimensions (cont.)

• Run program SAP_INFOCUBE_DESIGNS (via transaction SE38) to see the ratio of each dimension table to its fact table – make sure database statistics are up to date before running the program
• Any dimension over 20% of the fact table size should be examined for reshuffling of dimensions or conversion to a line-item dimension
  - e.g., a dimension of 2.5 million rows against a 10-million-row fact table (25%) is a candidate

Database Table (Physical) Partitioning

• Each InfoCube actually has two fact tables:
  - F-Fact table – optimized for data loading
  - E-Fact table – optimized for retrieving data
• To set up partitioning: transaction RSA1, open the InfoCube, and go to Extras -> Partitioning

Database Table Partitioning

• Partitioning splits the table into several smaller tables
  - This can speed up query performance significantly
  - The F-Fact table is already partitioned by request ID
  - The E-Fact table can be partitioned by:
    - Calendar year/month (0CALMONTH)
    - Fiscal year/period (0FISCPER)
    - e.g., partitioning by 0CALMONTH for 01.2004 through 12.2006 typically yields 36 monthly partitions, plus catch-all partitions for values outside the range
  - The InfoCube needs to be compressed in order to take advantage of the partitioning (compression can be scheduled)

Database Table Partitioning (cont.)

• Benefits
  - Parallel accesses to partitions
  - Reads of smaller sets of data

MultiProvider Partitioning

• MultiProvider (logical) partitioning
  - Can partition data by year, plan/actual, region, business area, etc.
  - Parallel sub-queries are started automatically against the basic InfoCubes
  - Use to divide large amounts of data into “chunks”

[Diagram: a MultiProvider provides a consolidated view on all data across basic InfoCubes for Europe, Asia, and USA; each basic InfoCube keeps its own F-Fact table, physically partitioned by request ID, and its own E-Fact table, partitioned by period from 001/2002 through 012/2005]

MultiProvider Partitioning (cont.)

• Benefits
  - Queries are split automatically and distributed to the InfoProviders (parallel execution where possible)
  - Single InfoProviders are smaller, less complex, and less sparsely filled than one big InfoProvider
  - No additional data storage needed
  - Individual InfoProviders can be tuned independently
  - Data can be loaded into individual InfoProviders in parallel
  - Transparent to query designers and end users for reporting
  - Local queries on each InfoProvider are possible
  - Archiving of a single basic InfoProvider is very easy

MultiProvider Partitioning (cont.)

• Disadvantages
  - Administration (with aggregates)
  - Additional I/O

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Use Time-Dependent Master Data Carefully

• Overuse or incorrect definition of time dependency for master data objects can adversely affect query performance
• Time-dependent master data is used to model a view of the data as it existed at a point in time
  - e.g., sales by salesperson reflecting the territory assignments in 2001
• Time dependency should be modeled only if requirements deem it a “must have,” not a “nice to have”
  - It limits tuning potential

Use Time-Dependent Master Data Carefully (cont.)

• Aggregates can be built for time-dependent master data, but they are fixed to a specific key date
  - This limits the effectiveness of aggregates
  - It could increase the number of aggregates
  - When the key date changes, a special aggregate change run is required
• Consider building time-dependent and time-independent versions of the same attribute

InfoCube Compression

• Compression is the procedure that moves data from the F-Fact table to the E-Fact table
  - During compression, the request ID is eliminated
    - This removes one dimension from the join when reading the E-Fact table
    - Therefore, query performance on compressed data is better
    - The request ID-keyed data in the F-Fact table is deleted (the partition is dropped)

InfoCube Compression (cont.)

• Compression can be executed during query execution and data loading
  - Transparent to end users, and data consistency is guaranteed

[Diagram: data loads fill the F table, keyed by request number, time, and material; compression moves the records into the E table, where the request number is eliminated]

InfoCube Compression (cont.)

• Compression benefit – reduction in data volume
  - Records with the same master data keys are combined during compression
    - Transaction records with the same keys in different requests are combined
  - “Zeroed” records can also be removed (e.g., offsetting + and – values)
    - Administrators can control this for each InfoCube
  - Typically, an overall 20% to 30% reduction in data has been observed

Compression and Non-Cumulative Key Figures

• Non-cumulative key figures
  - Cannot be cumulated meaningfully over time
    - e.g., inventory, number of employees
  - Storage: historical movements plus a reference point (in the E-Fact table)
  - The reference point is updated only when the InfoCube is compressed
    - If all requests are compressed, the reference point represents, e.g., the current stock
    - If not, all request IDs have to be analyzed to determine the correct value
  - Example:

    Month   Material  Plant  Material flow (F-Fact table)  Reference point (E-Fact table)
    Jan-02  4711      1000   10                            0  -> after compressing: 10
    Feb-02  4711      1000   20                            10
    Mar-02  4711      1000   5                             10 -> after compressing: 35

Note: The reference points are stored only in the compressed E table; they carry the time value “infinity,” e.g., day 12/31/9999.

Symptoms of Many Uncompressed InfoCube Requests

• Data staging:
  - Index builds after data loads become slower
  - DB statistics runs after data loads become slower
  - Aggregate builds encounter long runtimes or errors
  - Data availability of newly loaded requests for querying is delayed
• Query execution:
  - Queries become slower and slower with each data load
  - Queries on non-cumulative InfoCubes are very slow

Master Data Number Range Buffering

• SID generation/allocation can become a bottleneck when loading master data (or transaction data that generates master data)
  - The SID number range can be buffered instead of accessing the DB for each SID
  - If you discover massive accesses to DB table NRIV via SQL trace (ST05), increase the number range buffer in transaction SNRO (a quick way to inspect these ranges is sketched below)
  - If possible, reset the value to its original state after the load (to avoid unnecessary memory allocation)

[Diagram: instead of fetching each new SID from the database, the next SIDs (e.g., 4 through 8) are taken from an in-memory buffer as new master data records, such as material 1125, arrive]

• See the BW Expert article: “Increase Load Performance by Buffering Number Ranges”

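One quick way to gauge which SID number ranges are busiest is to look at their intervals and current levels in table NRIV directly. A minimal sketch, assuming BW master data SID number range objects follow the usual BIM* naming convention (verify the exact object names in transaction SNRO on your release):

```abap
REPORT z_check_sid_nriv.

* Structure holding only the NRIV fields we need (never SELECT *)
TYPES: BEGIN OF ty_nriv,
         object     TYPE nriv-object,      "number range object
         nrrangenr  TYPE nriv-nrrangenr,   "interval number
         fromnumber TYPE nriv-fromnumber,  "interval lower bound
         tonumber   TYPE nriv-tonumber,    "interval upper bound
         nrlevel    TYPE nriv-nrlevel,     "current level of the interval
       END OF ty_nriv.

DATA: lt_nriv TYPE STANDARD TABLE OF ty_nriv,
      ls_nriv TYPE ty_nriv.

* Assumption: master data SID ranges use BIM* object names
SELECT object nrrangenr fromnumber tonumber nrlevel
  FROM nriv
  INTO TABLE lt_nriv
  WHERE object LIKE 'BIM%'.

LOOP AT lt_nriv INTO ls_nriv.
  WRITE: / ls_nriv-object, ls_nriv-nrrangenr,
           ls_nriv-fromnumber, ls_nriv-tonumber, ls_nriv-nrlevel.
ENDLOOP.
```

Objects whose current level climbs quickly between loads are the ones that benefit most from a larger buffer in SNRO.
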
Change Run/Apply Job

• There are two steps in the change run …
• Step 1: Activate the new master data and hierarchy data
  - Master data is only available after the change run has completed successfully
• Step 2: Realign and recalculate all aggregates containing navigational attributes and/or hierarchies with the new master data
  - Affects all aggregates that contain the changed attribute and/or hierarchy
  - A percentage parameter in the IMG determines whether the aggregate is adjusted in place or dropped and rebuilt completely
  - The change run can be started for specific InfoObjects

Change Run/Apply Job (cont.)

• All aggregates containing navigational attributes and/or hierarchies are realigned and recalculated with the new master data (cont.)
  - Key figures set to exception aggregation MIN or MAX cause the aggregates to be completely rebuilt for each change run
  - ABAP program RSDDS_CHANGERUN_MONITOR can be used to monitor a change run in progress

Change Run Performance – Tips

• Use the threshold for delta and new build-up in customizing
• The change run can be parallelized across InfoCubes; see SAP Note 534630 for more details
• Check the aggregate hierarchy (see “Rollup” for more details)
• Try to build basis aggregates that are not affected by the change run, i.e., no navigational attributes or hierarchy levels
• The following slides show details on the process itself …

InfoCube Data Load Performance

• Administrator Workbench > Modeling > InfoCube > Manage > Performance tab
• Recommendation: drop secondary indexes prior to large InfoCube data loads
  - Can be done using process chains in BW 3.x
• Create Index button: set automatic index drop/rebuild
• Statistics Structure button: set an automatic DB statistics run after a data load

Load Performance – Index Management

• Indexes have a positive impact on read operations and a negative impact on write operations
• Since most database operations during a data load are writes:
  - Drop indexes during the data load
  - Exception: if a database lookup occurs during the data load
• Since most database operations during a query are reads:
  - Re-create indexes right after the data load
• Hint: use process chains

Load Performance – Load Packet Size

• The default packet size is set system-wide
• The packet size can be overridden in the InfoPackage
• The larger the packet size, the fewer the roundtrips between R/3 and BW
• The ONLY way to find the optimal packet size is through trial and error, but you can start with an educated guess
• See OSS collective Note 417307 – Extractor Packet Size

Load Sequencing

• Load master data before loading transactional data. If you load transactional data first:
  - While loading the transaction data, SIDs will be created with blank attributes
  - While loading the attributes later, the system looks up the proper SIDs in the SID tables
  - This lookup is very process-intensive
• When running two transactional data loads at the same time:
  - Make sure that the master data tables they both need have already been loaded

ODS Load/Activation Tuning Tips

• Non-reporting ODS objects
  - BEx flag: the computation of SIDs for the ODS can be switched off
  - The BEx flag must be switched on if BEx reporting on the ODS is executed
• Unique data records setting
  - Use it if only unique records are being loaded into the ODS
• Loads are faster because the master data SID tables do not have to be read and linked to the ODS data

Flat File Load Performance

• Use a predetermined record length (ASCII)
  - Significantly faster than delimited files (CSV) – see the sketch after this list
• Files should be read from the application server, not from the client PC
• Work with your network administrator to eliminate bottlenecks
• Avoid placing input load files on high-I/O disks
• Split files into multiple files and load them in parallel whenever possible
  - Loading four files of 250,000 records each in parallel typically performs significantly better than loading one file with one million records
  - Delete indexes before the load and rebuild them after the load
    - It is faster to rebuild an index at the end of the load process than to update it for every record

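Why fixed-length files are cheaper becomes clear from the per-record work each format requires. A minimal ABAP sketch for comparison (the file paths, field names, and lengths are hypothetical):

```abap
REPORT z_flatfile_compare.

TYPES: BEGIN OF ty_rec,
         matnr TYPE c LENGTH 18,
         werks TYPE c LENGTH 4,
         menge TYPE c LENGTH 17,
       END OF ty_rec.

CONSTANTS: lc_fixed TYPE string VALUE '/usr/sap/trans/data/sales_fixed.txt',
           lc_csv   TYPE string VALUE '/usr/sap/trans/data/sales.csv'.

DATA: lv_line TYPE c LENGTH 200,
      ls_rec  TYPE ty_rec,
      lt_recs TYPE STANDARD TABLE OF ty_rec.

* Fixed-length file: one positional move per record - the structure
* layout itself does the "parsing"
OPEN DATASET lc_fixed FOR INPUT IN TEXT MODE ENCODING DEFAULT.
DO.
  READ DATASET lc_fixed INTO lv_line.
  IF sy-subrc <> 0. EXIT. ENDIF.
  ls_rec = lv_line.                  "no field-by-field parsing
  APPEND ls_rec TO lt_recs.
ENDDO.
CLOSE DATASET lc_fixed.

* CSV file: a SPLIT plus individual field moves for every record -
* this extra work is the main reason CSV loads are slower
OPEN DATASET lc_csv FOR INPUT IN TEXT MODE ENCODING DEFAULT.
DO.
  READ DATASET lc_csv INTO lv_line.
  IF sy-subrc <> 0. EXIT. ENDIF.
  SPLIT lv_line AT ',' INTO ls_rec-matnr ls_rec-werks ls_rec-menge.
  APPEND ls_rec TO lt_recs.
ENDDO.
CLOSE DATASET lc_csv.
```
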
Other Performance Tuning Tips

• Filters
  - Build secondary indexes on filtered fields in the source system
• Convert old LIS extractors to the V3 collection method
  - More efficient loads
• Use delta processing whenever possible
• Run loads at off-peak times
• Ensure that the selection parameters of the extraction InfoPackage facilitate the use of indexes
  - You can use transaction RSRV to check indexes

Other Performance Tuning Tips (cont.)

• PSA partitioning – transaction RSCUSTV6 – see OSS Note 485878
• Data archiving
  - Remove obsolete data
• Keep statistics up to date
  - Use RSRV to check statistics

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Methods for Determining Bottlenecks

• Be very critical of ABAP in transfer rules, update rules, and start routines
  - Small bottlenecks become huge because of multiple iterations
• Avoid SELECT * on tables
• Avoid selecting all fields of a table when only a subset is needed
• Use internal tables to avoid multiple SELECTs
  - When reading an external table, consider selecting into an internal table once, rather than selecting on each update (see the sketch below)

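The internal-table technique looks like this inside a BW 3.x update or start routine. A minimal sketch: the lookup table Z_MATERIAL_GRP and all field names are hypothetical, and DATA_PACKAGE is supplied by the generated routine framework:

```abap
* Declarations local to the routine
TYPES: BEGIN OF ty_lookup,
         matnr  TYPE c LENGTH 18,
         matgrp TYPE c LENGTH 9,
       END OF ty_lookup.

DATA: lt_lookup TYPE STANDARD TABLE OF ty_lookup,
      ls_lookup TYPE ty_lookup.

FIELD-SYMBOLS <fs_data> LIKE LINE OF data_package.

* Slow variant (avoid): one SELECT SINGLE for EVERY record
* SELECT SINGLE matgrp FROM z_material_grp INTO lv_matgrp
*   WHERE matnr = <fs_data>-matnr.

* Fast variant: one array fetch of only the needed fields ...
SELECT matnr matgrp
  FROM z_material_grp
  INTO TABLE lt_lookup.
SORT lt_lookup BY matnr.

* ... then cheap in-memory binary searches per record
LOOP AT data_package ASSIGNING <fs_data>.
  READ TABLE lt_lookup INTO ls_lookup
       WITH KEY matnr = <fs_data>-matnr BINARY SEARCH.
  IF sy-subrc = 0.
    <fs_data>-matgrp = ls_lookup-matgrp.
  ENDIF.
ENDLOOP.
```

With a 50,000-record data package this turns 50,000 database round trips into a single array fetch – exactly the kind of small-bottleneck-times-many-iterations saving described above.
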
Methods for Determining Bottlenecks (cont.)

• Use ELSEIF rather than multiple separate IF statements to reduce decision-making whenever possible
• Start routines enable you to manipulate whole data packages instead of changing them record by record; sometimes this allows for faster transformations (see the sketch below)

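A minimal start routine sketch combining both tips (BW 3.x style; DATA_PACKAGE comes from the generated framework, and the field names RECORDMODE, SALESORG, ORDER_QTY, and SIZE_CLASS are hypothetical):

```abap
* Package-level filtering: one DELETE replaces an IF on every record
DELETE data_package WHERE recordmode = 'D' OR salesorg = space.

FIELD-SYMBOLS <fs> LIKE LINE OF data_package.

* Package-wide derivation using ELSEIF instead of separate IFs,
* so each record is classified with at most two comparisons
LOOP AT data_package ASSIGNING <fs>.
  IF <fs>-order_qty >= 1000.
    <fs>-size_class = 'L'.
  ELSEIF <fs>-order_qty >= 100.
    <fs>-size_class = 'M'.
  ELSE.
    <fs>-size_class = 'S'.
  ENDIF.
ENDLOOP.
```
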
Transaction SE30 – Runtime Analysis

• Transaction SE30 can be used for performance analysis
  - Start the extraction (or use RSA3), then run SE30 on the active extraction job
• A large percentage in the Net column denotes a very resource-intensive statement
• SE30 allows a direct jump to the code

Is Extraction Time Too Long? Check the SQL

• Use an ST05 trace with a filter on the extraction user (e.g., ALEREMOTE)
• Make sure that no concurrent extraction jobs run at the same time as this execution

Simulate Update

• Debugging and tuning update and transfer rules
  - Allows ABAP single-stepping with breakpoints to determine bottlenecks

What We’ll Cover …

• Performance Overview – Loading Performance
• Performance Checklist
• Data Modeling Strategies for Performance
• Performance Tuning Tips
• Methods for Determining Bottlenecks
• Wrap-up

Resources

• service.sap.com/bi -> Performance (requires login credentials for the SAP Service Marketplace)

Resources (cont.)

• BW Expert articles:
  - “Better Star Schema Design Means Better Performance,” by Gary Nolan (Volume 2, Issue 8)
  - “Data Modeling with Time-Dependent Master Data,” by Gary Nolan (Volume 2, Issue 4)
  - “InfoProvider Compression: What It Is and When to Do It,” by Gary Nolan (Volume 3, Issue 5)
  - “24 BW Design and Data Modeling Tips for Optimal ETL,” by Catherine Roze and Joffy Mathew (Volume 1, Issue 9)
  - “Debug Routines in Update Rules in Just Seven Easy Steps,” by July Hartono (Volume 1, Issue 2)

7 Key Points to Take Home

• Data model for performance
• Stay current with support packages
• Implement partitioning – this can significantly reduce query processing time
• Be careful with time-dependent master data
• Implement and stay as current as possible with InfoCube compression
• There are various tools to troubleshoot performance – use them
• Performance is an ongoing task – always keep performance as a factor in decision-making

Your Turn!

“An acre of performance is worth a whole world of promise.”
– William Dean Howells

How to contact me:
Gary Nolan
gary.nolan@melvanconsulting.com