
IBM SAP International Competence Center

Copyright IBM Corp. 2013










Comparison of
SAP Application Performance
on Centralized versus Distributed
Server Topologies







This document can be found on the web, www.ibm.com/support/techdocs



Date: November 2013



Walter Orb
Matthias Köchl
Fabrice Moyen
Hans-Jürgen Reiss














IBM SAP International Competence Center
Walldorf, Germany



Table of contents
FIGURES
TABLES
1. INTRODUCTION
    ACKNOWLEDGEMENTS
2. TRADITIONAL SAP BUSINESS SUITE TOPOLOGIES
    COMMUNICATION BETWEEN APPLICATION SERVER AND DATABASE
3. DESCRIPTION OF TEST ENVIRONMENT
    HARDWARE AND NETWORK SETUP
    TESTED TOPOLOGIES
        SAP 2-tier
        Consolidated Para 3-tier / 3-tier-in-a-box
        SAP 3-tier
        3-tier Campus/Cloud Simulation
    SAP WORKLOAD SCENARIOS
        NIPING Network Round-Trip Time (RTT)
        SAP ERP Workload Simulation
        DB ROW-Select
        SAP Client Copy
4. RESULT ANALYSIS
    NIPING NETWORK ROUND-TRIP TIMES
    INTERACTIVE SAP WORKLOAD
    SAP BACKGROUND PROCESSING
        DB-Select Simulation [SINGLE_MULTI_READ] by SAP
        SAP Client Copy
    WIDE AREA NETWORK SIMULATION
        NIPING Network Round-Trip Times
        Interactive SAP Workload
        SAP Background Processing
        SAP Client Copy
        SAP - SD Queries [DB_READ_UPDATE]
5. POWER SYSTEMS (AIX) SPECIFIC OBSERVATIONS
    ADAPTER / OS / NETWORK SETTINGS
    VIO UTILIZATION
6. SUMMARY
7. TECHNICAL APPENDIX
    DETAILED TEST LANDSCAPE NETWORK LAYOUT AT IBM CLIENT CENTER IN MONTPELLIER
    ANUE NETWORK LATENCY SIMULATOR
8. ABOUT THE AUTHORS
9. TRADEMARKS AND SPECIAL NOTICES


Figures

FIGURE 1 SAP APPLICATION TIERS
FIGURE 2 SCHEMATICS OF TEST-ENVIRONMENT
FIGURE 3 TESTED TOPOLOGIES
FIGURE 4 NIPING ROUND-TRIP TIMES
FIGURE 5 LARGE PACKETS BANDWIDTH DB<->APP-SERVER
FIGURE 6 SAP ERP DB-DIALOG TIMES
FIGURE 7 INCREASE OF SAP ERP SD DB-REQUEST TIMES VS. 2-TIER
FIGURE 8 ROW SELECT TIME OVER QUERY RESULT VOLUME
FIGURE 9 SAP CLIENT COPY ELAPSED RUNTIME
FIGURE 10 INCREASE OF SAP CLIENT COPY PROCESSING TIME VS. 2-TIER
FIGURE 11 PARALLEL CLIENT COPY PROCESSES
FIGURE 12 ANUE DELAY VERIFICATION BY NIPING
FIGURE 13 ERP DB-WAN RESPONSE TIME
FIGURE 14 EXPONENTIAL INCREASES IN GUI RESPONSE TIME
FIGURE 15 DB-BACKGROUND PROCESSING SELECT TIMES (ZTEST-ABAP)
FIGURE 16 3-TIER CLIENT COPY RUNTIME OVER APP-SERVER. LATENCY
FIGURE 17 DB SELECT MIX REPORT
FIGURE 18 IMPACT OF ETHERNET ADAPTER TUNING ON ROUND-TRIP TIMES (IN MS)
FIGURE 19 NETWORK ROUND-TRIP TIMES (IN MS) USING DIFFERENT PROCESSOR MODES FOR VIO SERVERS

Tables

TABLE 1 CATEGORIZED RESULT SERIES
TABLE 2 LAN ROUND-TRIP TIMES
TABLE 3 NIPING ROUND-TRIP TIMES
TABLE 4 INCREMENTAL DB-REQUEST TIME VS. 2-TIER
TABLE 5 INCREMENTAL CLIENT COPY RUNTIME VS. 2-TIER
TABLE 6 PARALLELIZATION GAINS OF CLIENT COPY
TABLE 7 INCREASE OF DB-REQUEST TIME IN WAN VS. 2-TIER



1. Introduction

Today we see a continued trend towards 3-tier client/server implementations at SAP
customers. In combination with virtualized platforms, this provides a high degree of
flexibility and agility.
However, 3-tier topologies introduce a new level of network complexity and
performance impact (latency, bandwidth) compared to a centralized 2-tier SAP system
setup. This is even more important when we compare the speedup of CPUs and I/O
(SSD, Flash) with the evolution of network performance during the past years.

Furthermore, SAP customers are starting to adopt the cloud computing paradigm and
to use cloud-based application server resources for special purposes (e.g. peak
load processing, SAP upgrades). This introduces wide-area-network effects to their SAP
applications. In such setups, the influence of the network can become the dominating
factor for response and processing time.

The IBM SAP International Competence Center in Walldorf, together with the IBM Client
Center in Montpellier, conducted a comprehensive series of measurements to quantify
the impact of 2-tier versus 3-tier topologies for different SAP workload characteristics.
The results of this study are presented and discussed in this paper.
The SAP AG performance team contributed know-how and test reports.







Acknowledgements

Special Thanks To:

Thomas Glaser, IBM Power Systems Technical Sales Manager Europe, for his
overall project sponsorship.

Dr. Ulrich Marquard, Senior Vice President and Head of Performance, Data
Management & Scalability at SAP AG Germany, and his team for providing
valuable test cases, consulting, and contribution while performing the tests and
editing this document.

2. Traditional SAP Business Suite Topologies

The SAP Business Suite backend architecture comprises three layers: a database, an
application, and a presentation layer. These layers can be deployed either in 2-tier
mode (DB and application server processes run within a single OS image) or as a 3-tier
configuration, where the database and application layers run in separate OS images and
have to communicate via a logical or physical network connection. The first tier is the
presentation layer.


Figure 1 SAP Application Tiers

Each topology has different characteristics in regard to complexity, flexibility and
resiliency.

Communication between Application Server and Database

The introduction of an additional TCP/IP network stack between the database and
application servers in a 3-tier configuration has an impact on SAP transaction response
times and the runtime of background jobs. In 2-tier implementations (depending on the
database version used), the database client to server communication path can be optimized
to use an inter-process communication method. In a 3-tier configuration, database
accesses always have to pass through the TCP/IP software stack as well as through
some virtual (hypervisor) or physical (network interface adapter, LAN switches, WAN)
communication layer.
Each database access from the application server needs to pass through the complete
TCP/IP layer twice: first when issuing a DB request (select, update, etc.) and then when
receiving the results, either in the form of selected business data or a transaction commit.



3. Description of test environment

Hardware and Network Setup

The infrastructure was set up and operated at the Montpellier IBM Client Center. The
SAP related tests were executed remotely by members of the IBM SAP Competence
Center in Walldorf via VPN access.

The following components were used to build the test landscape:

• 2x IBM Power 750 servers (model 8408-E8D)
   o 32-core POWER7+ CPUs at 4GHz with 1TB of RAM
   o Dual Virtual I/O servers at level 2.2.2.2 (with efix IV37111m2a,
     IV38225s2a, IV39725m2a)
   o All logical partitions at AIX level 7.1 TL2 SP2
• IBM Storwize V7000 Storage Subsystems (model 2076-324)
   o FC-attached
• Dedicated network and SAN infrastructure
   o 1Gb and 10Gb Ethernet LAN adapters and switches
   o 8Gb fibre SAN adapters and switches
• ANUE Network Latency Simulator
   o 2 x 1Gb Ethernet ports


Figure 2 Schematics of Test-Environment

A symmetric EtherChannel connection between the two servers was chosen to resemble
realistic customer setups. Configuration details are shown in the Technical
Appendix section (Detailed Test Landscape Network Layout).

The main focus of the test scenarios was to evaluate the impact of different network
topologies on database request times and subsequently the transaction response and
runtimes of background jobs. To avoid any other load dependent influencing factors,
the workloads and the partitions were sized so that CPU and memory utilization did not
create any bottlenecks. The maximum CPU utilization on each partition was kept in the
30-40% range.


Tested Topologies
SAP 2-tier

SAP workload is processed within a single partition (beaci01).
No external network traffic between database and application
server processes is required for transaction processing.

For all test series in this paper the database and SAP Central
Services remained on this partition. Only the application
workload was executed and measured on other partitions.








Consolidated Para 3-tier / 3-tier-in-a-box

An additional SAP application server instance is hosted within a
second partition (beaas12) on the same physical server.
Communication via virtualized Ethernet adapters is provided by
the PowerVM hypervisor using the integrated virtual Ethernet
capabilities.







Figure 3 Tested Topologies


SAP 3-tier
The application server resides on another partition (beaas21) on a second physical
server. The partition setup was identical to beaas12. TCP/IP traffic had to pass
through the VIO server partitions and physical network segments.
For the dedicated adapter tests, the Ethernet adapters were assigned directly to
partitions beaci01 and beaas21. Thus any additional latency induced by the VIO
servers was eliminated for these test series.




3-tier Campus/Cloud Simulation

Same as the SAP 3-tier scenario, with a latency simulation device (ANUE) added to the
external communication path. Although a single physical device was used, the simulated
latency time was split 50:50 across the two network paths (inbound and outbound)
during our test series. The time delays on both network paths were set between 0 and
125 ms; the resulting round-trip delay experienced by the applications was two times
this time delay.
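
The relation between the delay configured per network path and the round-trip delay seen by the applications can be sketched as follows. This is a minimal illustration only: the function name and the sample delay values are hypothetical, and only the 0 to 125 ms range and the factor of two are taken from the setup described above.

```python
def round_trip_delay_ms(per_path_delay_ms: float) -> float:
    """Round-trip delay added by the simulator when the same delay
    is applied to the inbound and the outbound network path."""
    if not 0 <= per_path_delay_ms <= 125:
        raise ValueError("per-path delay outside the tested 0-125 ms range")
    return 2 * per_path_delay_ms


# Illustrative settings (assumed values, not the exact series used in the tests)
for delay in (0, 5, 25, 125):
    print(f"{delay:>4} ms per path -> {round_trip_delay_ms(delay):>4} ms round trip")
```

With the largest tested setting of 125 ms per path, the applications therefore experienced an additional round-trip delay of 250 ms.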


See the Technical Appendix for more details about the ANUE device.






SAP Workload Scenarios

A number of different workloads were used for simulating load on the test systems.
These workloads are taken from typical SAP applications and the measured results
(focussing on database request time) can be adapted to other SAP application load
scenarios.
NIPING Network Round-Trip Time (RTT)
SAP delivers the NIPING tool to test the SAP NI (Network Interface) layer. This layer is
used for inter application server, RFC, and GUI communication. As the name suggests,
the tool works similarly to ping and can be used to test the network connectivity. SAP
note 500235 (Network Diagnosis with NIPING) provides a detailed description of the
tool.
When using NIPING, at least two sessions must be started. The first one takes the
server role: it just receives and immediately sends back the data packages that are
sent by NIPING client sessions. The NIPING client can be controlled by command line
parameters to focus on different aspects of network connectivity (round-trip time
versus throughput).

In our test landscape, we used the following procedure:

1. Start NIPING server on beaci01 (database server)
# niping -s -I 15

This command starts the NIPING tool in a server role and automatically stops after an
idle time of 15 seconds (-I 15).

2. Start NIPING clients on all application server partitions:
# niping -c -H beaci01 -B 1 -L 10000
# niping -c -H beaci01 -B 100000 -L 1000

The first test uses a data-buffer size of one (-B 1) and sends the data packages 10000
times (-L 10000). This test measures the network round-trip time. The second test uses
a large data buffer (100000 bytes) to measure the network throughput.

The following is a sample output of the tool:
Thu Aug 29 09:29:54 2013
connect to server o.k.

Thu Aug 29 09:29:56 2013
send and receive 10000 messages (len 1)

------- times -----
avg 0.192 ms
max 9.754 ms
min 0.155 ms
tr 10.197 kB/s
excluding max and min:
av2 0.191 ms
tr2 10.248 kB/s

For the round-trip times we look at the av2 value, which is the average response time
for the number of messages sent excluding the maximum and minimum values. The tr2
value provides the network throughput figure.
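
When many NIPING runs have to be compared, the av2 and tr2 values can be extracted from the tool output automatically. The following is a minimal sketch, assuming the output has the layout of the sample shown above; the function name and the regular expression are illustrative and not part of the SAP tooling.

```python
import re

# Statistics block copied from the sample NIPING output shown above
SAMPLE_OUTPUT = """\
send and receive 10000 messages (len 1)

------- times -----
avg 0.192 ms
max 9.754 ms
min 0.155 ms
tr 10.197 kB/s
excluding max and min:
av2 0.191 ms
tr2 10.248 kB/s
"""

# One statistic per line: name, numeric value, unit
STAT_LINE = re.compile(
    r"^[ \t]*(avg|max|min|tr|av2|tr2)[ \t]+([0-9]+(?:\.[0-9]+)?)[ \t]+(ms|kB/s)[ \t]*$",
    re.MULTILINE,
)


def parse_niping_stats(output: str) -> dict[str, float]:
    """Return the NIPING statistics as a mapping from name to value."""
    return {name: float(value) for name, value, _unit in STAT_LINE.findall(output)}


stats = parse_niping_stats(SAMPLE_OUTPUT)
print(f"round-trip time (av2): {stats['av2']} ms")
print(f"throughput (tr2):      {stats['tr2']} kB/s")
```

Only lines that consist of a known statistic name, a number, and a unit are picked up, so timestamps and status messages in the output are ignored.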

Another tool to check network round-trip times is the standard ping utility. However,
the lowest time resolution of the AIX implementation of ping is in milliseconds. The
typical round-trip times in our test environment were in the microsecond range;
therefore, the standard ping command was not adequate for most of the test scenarios.

To validate the results of NIPING, we used another ping-like program called fping (see
http://fping.org). This tool uses the Internet Control Message Protocol (ICMP) to test
the network connectivity to a target host and provides the desired round-trip time
resolution in microseconds. It has to be downloaded and installed on each server. We
collected fping results in all our test scenarios. As the results matched closely and
consistently with the output of the NIPING tests, we will provide only the NIPING
results in this paper.

SAP NIPING and fping are basic test tools to verify the network (typically routing and
firewall settings), but they can also be used for basic performance tests that involve no
components other than the fundamental SAP network layer.

SAP ERP Workload Simulation

We chose the standard SAP ERP Sales and Distribution (SD) benchmark to check the
impact of network latency on online transactions. The SD benchmark reflects a typical
SAP application where the SAP application server is communicating with the database.
This benchmark is also used as a reference for all SAP sizing (SAPS) calculations (for
details please see: http://www.sap.com/sizing).

The transaction load driver was installed together with the database/central instance
partition (beaci01) and a fixed number of users were simulated on all three application
servers simultaneously to produce a steady work-load of online ERP transactions. All
users executed the same set of predefined transactions in a fixed number of loops. With
this setup a single test run produced the comparison data for all three network
topologies (2-tier, para-3-tier, and 3-tier).
The key metric that is impacted by the network latency from an application point of
view is the database request time for dialog and update work processes. We focused on
these statistics in the result analysis.

We modified the test procedure for the wide-area-network (WAN) runs. To produce a
reliable set of results, the SAP SD benchmark toolset requires that all tested application
servers start and finish the high-load interval (the period of time where all users are
logged on and generate a steady work-load) on each server at about the same time.

The WAN simulation has a heavy impact on end-user response times, so the 3-tier
application server connected via the delayed network path experiences a significantly
longer high-load interval.
To avoid this problem, we performed a baseline measurement simulating work-load
only on the central instance application server running on beaci01. For the subsequent
runs with increasing network round-trip delays, the work-load was then
executed on the 3-tier application server instance beaas21 only. This also means that,
compared to the earlier runs without simulated network delays, only one third of the
overall total workload was executed.


DB ROW-Select

This special load scenario executes a number of simple DB calls. This scenario is typical
for network I/O intensive background processing with many database accesses like
running analysis reports or end of year activities, etc.
We distinguish between two flavors:


1. [SINGLE_MULTI_READ] Single and multiple database reads within one
statement. This is typically found for analysis reports and mass data reads.

2. [DB_READ_UPDATE] Here database reads and writes are executed in a typical
SAP application manner. This scenario reflects heavy network I/O load SAP
applications, like mass data updates.



SAP Client Copy

We used local SAP client copies as another method to simulate the impact of network
latency on background jobs with a significant number of database accesses. This
workload shows both effects of high network I/O load combined with optimized data
access strategies.

A client copy has to read all client specific data of the specified source client from the
database and then insert the same data again using the new client number.
We selected the SAP_ALL client copy profile, which copies all client specific data
without change documents.
To ensure that we always copied the same amount of data, we first created a new client
100 as a copy of one of the SD benchmark clients. This client was not used for any
other tests, so after the initial copy, the client data was static.
Next we copied this client to a new target client 200. The first client copy to a new
client just inserts the data in the database. Any subsequent client copy to the same
client will delete existing data before inserting the copied data in the new client. So
after the initial copy to the target client 200, all following client copies used the same
number of read/delete/insert statements on the database.
In our test system a client copy had to copy 62,544 tables with about 4 GB of data.



As test result we noted the runtime in seconds as reported at the bottom of the detailed
file log output of the Client Copy/Transport Log Analysis tool (transaction SCC3):


Exit program USERBUF_RESET successfully executed 13:41:15
Selected tables : 62.544
Copied data in kBytes : 4.098.311
Deleted data in kBytes : 4.098.329
Program ran successfully
Runtime (seconds) : 2.017
End of processing: 13:41:15


For each test scenario, we ran the client copy on all three application servers. The client
copies were scheduled as background jobs using the desired application server as
background server without any parallel processing.
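
The figures at the end of the SCC3 log use a dot as thousands separator, so they have to be converted before they can be compared across runs. The sketch below, assuming the log layout shown above, extracts the numeric fields and derives an average copy rate; the function names and the derived rate are illustrative additions and not values reported by the Client Copy tool.

```python
import re

# Tail of the SCC3 log as shown above
LOG_TAIL = """\
Exit program USERBUF_RESET successfully executed 13:41:15
Selected tables : 62.544
Copied data in kBytes : 4.098.311
Deleted data in kBytes : 4.098.329
Program ran successfully
Runtime (seconds) : 2.017
End of processing: 13:41:15
"""

# "<label> : <number>" where the number uses '.' as thousands separator
FIELD_LINE = re.compile(
    r"^(?P<label>.+?)[ \t]+:[ \t]+(?P<value>\d{1,3}(?:\.\d{3})*)[ \t]*$",
    re.MULTILINE,
)


def parse_log_number(text: str) -> int:
    """Convert a value such as '4.098.311' to the integer 4098311."""
    return int(text.replace(".", ""))


def parse_log_tail(log: str) -> dict[str, int]:
    """Return all numeric '<label> : <number>' fields of the log tail."""
    return {
        match["label"].strip(): parse_log_number(match["value"])
        for match in FIELD_LINE.finditer(log)
    }


fields = parse_log_tail(LOG_TAIL)
# Derived figure for comparisons only; the copy also deletes the old target data
rate = fields["Copied data in kBytes"] / fields["Runtime (seconds)"]
print(fields)
print(f"average copy rate: {rate:.0f} kB/s")
```

Lines that carry only a time stamp, such as the end-of-processing line, do not match the pattern and are skipped.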


Parallel Process Client Copy

To check the effects of an increased network load on the client copies, we also ran a
number of tests with various degrees of parallel processing.
The Client Copy tool (transaction SCCL) contains a Parameters for Parallel Processes
pushbutton that allows configuring the maximum number of processes and an RFC
server group to be used for the processing. The client copy process will run as a single
process during the analysis and post-processing phase, but will distribute work
packages to parallel processes to handle the actual copy operations.
We created three RFC server groups, each group containing only a single application
server. The client copies were scheduled on all three application servers in the same
way as in the previous scenarios. However, during the background scheduling, the
parallel processes option was used to specify the desired degree of parallel processing
and the appropriate RFC group for the tested application server.
We tested three scenarios with three, six, and nine parallel processes on each
application server.


4. Result Analysis

For analysis, the test series were categorized according to the topologies and physical
network characteristics. The provided charts will contain the following measurement
series:

Topology / Series
2-tier
Para 3-tier
3-tier virtualized 1 Gbit Ethernet
3-tier virtualized 10 Gbit Ethernet
3-tier dedicated 1 Gbit Ethernet
3-tier dedicated 10 Gbit Ethernet
3-tier WAN virtualized 1 Gbit Ethernet
Table 1 Categorized result series

Where applicable, differences within a category caused by (intentional) changes of
network parameters are discussed in the chapter Adapter / OS / Network Settings of
this paper.
Results are often normalized to the 2-tier setup, which allows for a quick assessment of
the relative effects of different setups. Results are sorted in ascending order, so the
sequence of series can vary from test to test. Absolute times are specified to allow
quantified extrapolations (at one's own risk).


NIPING Network Round-Trip Times

Network latency in Ethernet networks is often measured and documented as one-way
latency (the amount of time it takes from the source sending the packet until the target
receives it). From an application point of view, the more important value is the
round-trip latency, which is the one-way latency from the source to the destination plus
the one-way latency from the destination back to the source. This round-trip network
latency does not include the amount of time that is spent in an application on the
destination system processing the packet. Any references to network latency in
this document will refer to round-trip latencies (unless explicitly stated
otherwise).

Before starting to analyse database request times on SAP application servers in 3-tier
configurations, we recommend performing a quick sanity check to verify the network
round-trip times between application servers and the database server. We recommend
using the SAP NIPING tool, as it is already available in the SAP instance binary
directory.¹

¹ Please note that according to the definition above, network round-trip latency does not include packet processing time on
the destination. Nevertheless, NIPING provides reasonably good approximations of this network round-trip latency, as the
server process just receives the packets and immediately sends them back again.

The following are some typical values for expected NIPING round-trip times with small
packages in current customer LAN networks:


Round-trip time in ms     Rating
less than 0.3             Very Good
between 0.3 and 0.7       Average
larger than 0.7           Network configuration should be included in the
                          performance analysis of database response time problems
Table 2 LAN round-trip times
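
The ratings in Table 2 can be applied to measured values with a small helper. This is a minimal sketch; the function name is hypothetical, and treating the boundary values 0.3 ms and 0.7 ms as "Average" is an assumption, since the table does not specify them.

```python
def rate_lan_round_trip(rtt_ms: float) -> str:
    """Rate a NIPING small-packet round-trip time (in ms) according to Table 2."""
    if rtt_ms < 0.3:
        return "Very Good"
    if rtt_ms <= 0.7:  # boundary handling is an assumption
        return "Average"
    return "Include network configuration in the performance analysis"


# Example values in ms (illustrative only)
for rtt in (0.191, 0.5, 1.2):
    print(f"{rtt:.3f} ms -> {rate_lan_round_trip(rtt)}")
```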

The chart below shows the NIPING round-trip times measured with the different
topologies. We measured round-trip times of about 30 microseconds for the 2-tier
configuration. From an application perspective this means that each database request
would have a network delay of at least 30 microseconds contributing to the response
time.²

Moving to the para-3-tier configuration, where both partitions reside on the same
physical server and communicate via the PowerVM hypervisor, we saw that the round-
trip time doubled to about 60 microseconds.

For the 3-tier configurations the round-trip times increased again to about 100
microseconds with dedicated Ethernet adapters.

In a fully virtualized 3-tier setup, all network packets have to pass through a VIO server
partition on both machines. Therefore the network round-trip times increased again to
about 200 microseconds.
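
Because every database request pays this round-trip delay at least once, the differences add up for workloads that issue many sequential requests. The following minimal sketch multiplies the approximate round-trip times quoted above by a request count. The count of one million requests is a hypothetical value chosen for illustration, and the model ignores payload size, database processing time, and any overlap between requests.

```python
# Approximate small-packet round-trip times quoted in the text, in microseconds
RTT_US = {
    "2-tier": 30,
    "para 3-tier": 60,
    "3-tier dedicated": 100,
    "3-tier virtualized": 200,
}


def network_delay_seconds(rtt_us: float, sequential_requests: int) -> float:
    """Lower bound for the accumulated network delay of sequential DB requests."""
    return rtt_us * sequential_requests / 1_000_000


REQUESTS = 1_000_000  # hypothetical number of sequential database requests

for topology, rtt in RTT_US.items():
    delay = network_delay_seconds(rtt, REQUESTS)
    print(f"{topology:<20} {delay:6.1f} s")
```

Under these assumptions the accumulated network delay grows linearly with both the round-trip time and the number of requests, which is why background jobs with many small database accesses are the most sensitive to the topology.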

For small network packets, there is little difference between the results for the 10 Gbit
and 1 Gbit Ethernet adapters. For network transfers with very small payloads, most of the
time is actually spent processing the TCP/IP protocol stack on the servers, and the
advantage in physical network speed does not play a significant role. However, when
looking at network transfers with larger payloads, the advantage of 10 Gbit networks
becomes apparent, as they provide significantly improved network round-trip times
leading to better application response times.


² Database clients in 2-tier configurations that can be configured to use an IPC connection for local communication might
achieve slightly better values for this communication delay.


Figure 4 NIPING round-trip times


niping round-trip time / ms

                     small packets              large packets
                     (10,000 x 1-byte msgs.)    (1,000 x 100,000-byte msgs.)
2-tier               0.03                       0.30
para 3-tier          0.06                       0.66
3-tier ded. 10Gb     0.10                       0.77
3-tier ded. 1Gb      0.11                       1.82
3-tier virt. 10Gb    0.19                       1.41
3-tier virt. 1Gb     0.21                       2.14
Table 3 niping round-trip times
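
As results in this paper are often normalized to the 2-tier setup, the values of Table 3 can be converted into factors relative to 2-tier as sketched below. The data is taken from Table 3; the function and variable names are illustrative.

```python
# Round-trip times in ms from Table 3: (small packets, large packets)
RTT_MS = {
    "2-tier": (0.03, 0.30),
    "para 3-tier": (0.06, 0.66),
    "3-tier ded. 10Gb": (0.10, 0.77),
    "3-tier ded. 1Gb": (0.11, 1.82),
    "3-tier virt. 10Gb": (0.19, 1.41),
    "3-tier virt. 1Gb": (0.21, 2.14),
}


def normalize_to_baseline(values, baseline="2-tier"):
    """Express each series as a factor of the baseline series."""
    base_small, base_large = values[baseline]
    return {
        name: (small / base_small, large / base_large)
        for name, (small, large) in values.items()
    }


factors = normalize_to_baseline(RTT_MS)
for name, (small, large) in factors.items():
    print(f"{name:<18} small x{small:4.1f}   large x{large:4.1f}")
```

Normalizing both columns separately makes it visible that the ranking of the series differs between small and large packets, for example for the dedicated 1 Gbit and the virtualized 10 Gbit setups.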

Although the main focus of this paper is on network round-trip delays, we have included
the following chart documenting the network throughput rates of the various scenarios
for completeness. The throughput rates become more relevant for administrative tasks
like database backups or for database queries that transfer a large amount of data per
request.
The test with small (one byte) data packets measures latency rather than network
throughput, and the results show that there is very little difference between the 1Gb
and 10Gb Ethernet numbers. The additional processing required on the VIO servers
actually led to lower throughput numbers for the 3-tier virtual 10 Gbit scenario compared
to the 3-tier dedicated 1 Gbit scenario.

The picture changes for the real throughput test with large packets. The faster speed of
the 10 Gbit setup provides a significant improvement in network throughput, and
even a fully virtualized 10 Gbit Ethernet scenario provides better throughput rates than
a scenario with dedicated 1 Gbit Ethernet adapters and switches.


Figure 5 large packets bandwidth DB<->App-Server


Interactive SAP Workload

The chart in Figure 6 shows the average database request times for the SAP SD
benchmark transactions for the dialog and update tasks.


Figure 6 SAP ERP DB-Dialog times

The next chart and Table 4 show the relative increase in database request times of the
various scenarios compared to the 2-tier reference measurement.


Figure 7 Increase of SAP ERP SD DB-Request times vs. 2-tier



Incremental DB-Request times vs. 2-tier

Scenario             Query    Update
para 3-tier          +16%     +18%
3-tier ded. 10Gb     +27%     +34%
3-tier ded. 1Gb      +57%     +64%
3-tier virt. 10Gb    +70%     +81%
3-tier virt. 1Gb     +78%     +88%
Table 4 Incremental DB-Request time vs. 2-tier

The SAP SD benchmark transactions and their database access patterns have been
highly optimized in the past. With the increase in processing power over recent
years, they have now become more or less lightweight transactions.

From an end-user perspective, the more interesting statistic is the database request
time for the task type dialog, as it is part of the transaction response time. The
difference between a dialog database request time of little more than 5 milliseconds in
the 2-tier setup and about 10 milliseconds for the 3-tier configuration does not sound
like much. An end-user would certainly not notice the difference for this particular set
of transactions. However, the relative increase in database request time in our test
scenarios was more than 70%.

In customer production systems, there are many business critical transactions that are
substantially more heavyweight and often have to fit in common service level
agreements (SLA) for end-user response times below one second. Let us assume a
transaction with a response time of just under one second, where the database request
time contributes 40% (or 400 milliseconds) of that time. If the database request time
increases by 70% on a 3-tier application server instance, the end-user response time
for this transaction would increase to 1.28 seconds and suddenly violate the SLA. An
end-user will probably complain about a nearly 30% longer dialog response time in this
case.
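
The arithmetic behind this SLA example can be reproduced with a short sketch. The function name and its parameters are hypothetical and serve only as an illustration; the input values (a response time of one second, a 40% database share, and a 70% increase in database request time) are the ones assumed in the paragraph above.

```python
def projected_response_time(base_s, db_share, db_increase):
    """Return the response time after the database part grows by db_increase.

    base_s      -- original end-user response time in seconds
    db_share    -- fraction of base_s spent in database requests (0..1)
    db_increase -- relative growth of the database request time (0.7 = +70%)
    """
    db_time = base_s * db_share          # part of the response time spent in the database
    other_time = base_s - db_time        # part that is not affected by the network
    return other_time + db_time * (1.0 + db_increase)


new_time = projected_response_time(base_s=1.0, db_share=0.4, db_increase=0.7)
print(f"projected response time: {new_time:.2f} s")
print(f"relative increase: {(new_time / 1.0 - 1.0) * 100:.0f}%")
```

The same function can be used to check other transactions: the larger the database share of a transaction, the more its response time grows for a given increase in database request time.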

In general the network part is not dominating the overall response time for SAP
standard transactions and they work well in 3-tier scenarios. Nevertheless, if a
customer has long-running business critical transactions with many database accesses,
one should carefully analyze the potential impact of additional network delays before
moving to a 3-tier configuration.



SAP Background Processing
DB-Select Simulation [SINGLE_MULTI_READ] by SAP

The chart below shows the differences in database response times for simple select
statements using the different topologies. The report was parameterized to perform
2000 select statements for each select variant (select single, select 2 rows, select 3
rows, ...). The database request time in this chart was normalized to an average time
per selected row to provide a better comparison of the results for the different variants.


Figure 8 Row select time over query result volume

As expected, an increase in the network delay has a much worse impact on database
response times for applications doing a lot of single-record reads than for optimized
select statements where more rows are retrieved with a single query. This is also
described in the SAP Performance Standard. The test series clearly shows the potential
improvement from optimizing the application.

The worst case scenarios obviously are application reports that perform millions of
database accesses retrieving one record only for each access. Running in a 2-tier setup,
each database request would take about 94 microseconds. Comparing this to a common
customer setup with a fully virtualized environment and a 10 Gigabit Ethernet
infrastructure, the database request time for 3-tier scenarios would increase to about
260 microseconds, which is a factor of 2.75 slower or a difference of 166 microseconds
per access.
Let us assume a hypothetical background job that performs ten million such database
accesses. The total database request time for this job would be about 15 minutes for
the 2-tier scenario and 43 minutes with the 3-tier configuration.
A significant portion of that time could be compensated by rewriting the application to
fetch more than one record with each database access. If an application rewrite is not
an option, then we clearly recommend scheduling such background jobs on a 2-tier
application server instance only.
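
The job runtimes quoted above follow directly from the per-access times. The following minimal sketch uses only the values from the text (94 and 260 microseconds per access, ten million accesses); the function name is illustrative.

```python
def total_db_time_minutes(accesses, time_per_access_us):
    """Total database request time in minutes for a number of single-row accesses."""
    total_seconds = accesses * time_per_access_us / 1_000_000
    return total_seconds / 60


accesses = 10_000_000
for label, per_access_us in (("2-tier", 94), ("3-tier virt. 10Gb", 260)):
    minutes = total_db_time_minutes(accesses, per_access_us)
    print(f"{label}: {minutes:.1f} min")
```

The results land between 15 and 16 minutes for the 2-tier case and at a little over 43 minutes for the 3-tier case, in line with the rounded figures given above.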


Please note that this 3-tier number was measured in an environment with very good
NIPING network round-trip times of less than 200 microseconds. Customer
environments with NIPING round-trip times of more than 500 microseconds are not
uncommon and this additional network latency would substantially increase the
difference in database request time for the 3-tier scenario.

SAP Client Copy

The SAP client copy process already exploits optimized database statements. It is a
typical example of how to minimize the impact of network delays by using optimal
database access strategies.

Despite this optimization, there is still a significant impact of the 3-tier scenarios on the
overall runtime. The absolute runtime increased from about 55 minutes for the 2-tier
scenario to 60 minutes for the para 3-tier configuration.

For the fully virtualized setups, the runtime was 72 minutes with the 10 Gbit Ethernet
and 75 minutes with 1 Gbit Ethernet network.


Figure 9 SAP Client Copy elapsed runtime

The following Figure 10 and Table 5 show the increase in runtimes compared to the
2-tier reference run.



Figure 10 Increase of SAP Client Copy processing time vs. 2-tier


Incremental SAP Client Copy time vs. 2-tier

Scenario             Increase
para 3-tier           9%
3-tier ded. 10Gb     14%
3-tier ded. 1Gb      21%
3-tier virt. 10Gb    31%
3-tier virt. 1Gb     36%
Table 5 Incremental Client Copy runtime vs. 2-tier

Optimized database access strategies help to reduce the impact of network round-trip
delays in 3-tier configurations, but even then the overall runtime of our client copy
scenarios increased by more than 30%.
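
The percentages in Table 5 can be cross-checked against the absolute runtimes quoted earlier in this section (about 55 minutes for 2-tier, 60 minutes for para 3-tier, 72 and 75 minutes for the fully virtualized setups). The sketch below is only a consistency check with illustrative names; the runtimes for the dedicated 3-tier setups are not given in the text and are therefore left out.

```python
def relative_increase_percent(reference, measured):
    """Relative increase of 'measured' over 'reference' in percent."""
    return (measured - reference) / reference * 100.0


reference_min = 55  # 2-tier client copy runtime in minutes (from the text)
runtimes_min = {
    "para 3-tier": 60,
    "3-tier virt. 10Gb": 72,
    "3-tier virt. 1Gb": 75,
}

for scenario, runtime in runtimes_min.items():
    increase = relative_increase_percent(reference_min, runtime)
    print(f"{scenario}: +{increase:.0f}%")
```

Rounded to whole percentages, the results agree with the 9%, 31%, and 36% listed in Table 5.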

Compared to the physical separation of servers, a virtualized 3-tier setup on a single
Power System showed only a relatively small increase in elapsed processing time.


Effects of Process Parallelization for Client Copy

One way to mitigate the problems with long running background jobs is to exploit
parallel processing. This is not always an option, but when it is available, it can be used
to reduce the processing time to acceptable levels at the expense of an increased CPU
usage.

We have compared the runtime of SAP client copies for all three tested scenarios with
various levels of parallel processing to the reference number of the single-process client
copy on the 2-tier application server instance.
The chart below shows that the speed-up with parallel processing was fairly constant
across all topologies.

The second purpose of this test scenario was to check whether introducing additional
network load (by running multiple processes in parallel) would have a noticeable
performance impact. The slope of the curves for the various scenarios is about the
same, which shows that the chosen workload was not high enough to reach any
network limitation.


Figure 11 Parallel Client Copy processes


This table shows the average speed-up for all three topologies:


Number of parallel processes    3      6      9
Runtime acceleration            33%    47%    53%
Table 6 Parallelization gains of Client Copy



Wide Area Network Simulation

We used the workloads described in section "SAP Workload Scenarios" to measure the
impact of wide area network latencies on SAP database request times.

The network latency simulator (ANUE) was used to add a delay to the communication
path between beaci01 and beaas21. Therefore, the test procedure was modified slightly
to run the longer running workloads (ERP benchmark, simulated background jobs, client
copies) only on the remote application server instance beaas21.

We measured the performance impact at various simulated network delays between 1
and 125 milliseconds. After the test sequence with 5 milliseconds network latency
completed, it was obvious that it would not make sense to run the ERP SD
benchmark with an even higher network delay, and an SAP client copy would have taken
days. Therefore we decided to reduce the test cases to NIPING and the simulated
background jobs for the remaining two latency tests (50 ms and 125 ms).

The measured round-trip time for the LAN tests with dedicated 10 Gb Ethernet adapters
(without the ANUE network delay) was about 0.1 ms (100 microseconds) and 0.2 ms in
a fully virtualized environment.

As the ANUE device gave us the opportunity to simulate LAN networks that do not
provide round-trip times as good as those of our test landscape, we also decided to
perform a few tests with network latency delays in the typical LAN range (latency delays
of 0.1, 0.25, and 0.5 ms, which correspond to round-trip delays of 0.2, 0.5, and 1 ms).

NIPING Network Round-Trip Times

The following chart shows the NIPING round-trip times with various simulated network
delays. The results are essentially the same as in the 3-tier measurements
before, plus the simulated round-trip delay added on top. This is no surprise, as the
NIPING processes basically do nothing other than send and receive network
packets.
We used these NIPING tests mainly to verify that the ANUE network delay was
configured as intended before the subsequent test scenario runs.

Figure 12 ANUE delay verification by niping

Interactive SAP Workload

While the end-user response time for the SAP SD benchmark transactions increased
only slightly for the 3-tier application server instance in a controlled LAN environment,
the picture changes completely when extending the tests into a WAN scenario.
The results show that the application response time becomes unacceptable very quickly.

The impact is biggest on update processing. As mentioned before, update processing
happens asynchronously and does not directly influence end-user response time.
However, the longer running update processing will block the SAP work-process that
executes the update task. Eventually, all available update work-processes will be busy
leading to additional queuing effects.

At a certain point, long running updates will affect dialog transactions too, as some
subsequent transactions expect that previously created documents are already stored in
the various business tables on the database.

What the charts do not show is that we had to change the work-process configuration a
number of times to compensate for the longer database processing. Otherwise,
transactions would have aborted (because update processing was too slow) and the dialog
and update response times would have been much higher, as incoming requests would have
had to wait for free work-processes.
We ran the LAN tests with a configuration of 12 dialog and 3 update work-processes. To
achieve a successful run for the 10 ms round-trip delay test, we had to double the
number of dialog work-processes to 24 and increase the number of update work-
processes to 18.


Figure 13 ERP DB-WAN Response time






Simulated round-trip    Increase DB dialog           Increase DB update
delay (ms)              request time vs. 2-tier      request time vs. 2-tier
0                       +98%                         +148%
0.2                     +168%                        +250%
0.5                     +295%                        +430%
1                       +535%                        +764%
2                       +1004%                       +1407%
6                       +2851%                       +4213%
10                      +4665%                       +8374%
Table 7 Increase of DB-Request time in WAN vs. 2-tier



Figure 14 Exponential increases in GUI response time

SAP Background Processing

The next chart shows the average database request time per row for different network
delays and should be compared with Figure 8 on page 19. The scale on the y-axis is
logarithmic to allow for the wide variance of simulated network round-trip delays.
As in the NIPING case, the results are virtually the same as in the 3-tier measurements
shown in Figure 8 plus the simulated round-trip delay added on top.

Once again, using optimized database access patterns helps to mitigate the impact of
the additional network delay.

The average select time per row in the 2-tier scenario was about 25 microseconds when
selecting 9 rows with each query. The respective number for the one millisecond round-
trip delay was about 160 microseconds.

Our hypothetical background job fetching 10 million rows with 9 rows per select would
run a little longer than 4 minutes on a 2-tier application server instance. The same
background job running with only one millisecond network delay would already need
more than 26 minutes.
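
One way to see why multi-row selects help is to treat the network round-trip delay as a fixed cost per query that is shared by all rows the query returns. The sketch below illustrates this simplified model; it is an assumption made for illustration and not the measurement method used in this paper, and it ignores database processing and data transfer time. The function name is hypothetical.

```python
def delay_per_row_us(round_trip_delay_us, rows_per_select):
    """Share of a fixed round-trip delay that each returned row has to carry."""
    if rows_per_select < 1:
        raise ValueError("rows_per_select must be at least 1")
    return round_trip_delay_us / rows_per_select


one_ms = 1000  # simulated round-trip delay of one millisecond, in microseconds
for rows in (1, 3, 9):
    per_row = delay_per_row_us(one_ms, rows)
    print(f"{rows} row(s) per select: {per_row:.0f} microseconds of delay per row")
```

For nine rows per select and a one-millisecond delay, the model gives roughly 111 microseconds per row. This is of the same order as the measured increase from about 25 to about 160 microseconds per row reported above, whereas a single-row select would carry the full delay on every row.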

This makes it obvious that applications performing a large number of database accesses
would not perform well in WAN environments.


Figure 15 DB-Background Processing Select times (ZTEST-ABAP)

SAP Client Copy

For this set of tests we scheduled the client copy with 9 parallel processes. The runtime
of the 2-tier reference measurement was about 28 minutes. This increased to 39
minutes for the 3-tier configuration.
The runtime already increased to 70 minutes with a 1 ms round-trip delay, which is
actually a delay one might experience in a somewhat problematic LAN configuration.
Moving to the higher delays, even with a network round-trip delay of only 10 ms, the
client copy runtime increased to more than 5 hours.
This clearly shows that it does not make sense to run background jobs with a
major database component in a WAN setup.



Figure 16 3-tier Client Copy runtime over App-Server. Latency

SAP - SD Queries [DB_READ_UPDATE]

In this test we used the load scenario [DB_READ_UPDATE], which simulates SAP
transactions with heavy network I/O. The test report executed several sets of database
selects in a sequence.
We measured the total runtime and the average runtime for selecting a single database
row in milliseconds. Increasing the network round-trip delay once again leads to a rise in
total response time, caused by the increased request time for fetching a single database
row. With higher network round-trip times, the total test response time rises
sharply.
Please note that the chart uses a logarithmic scale for the y-axis because of the huge
differences in average request time per row (from about 500 microseconds in the 2-tier
case up to almost 100 milliseconds for the test scenario with a simulated
network round-trip delay of 50 milliseconds).



Figure 17 DB Select Mix Report



5. Power Systems (AIX) Specific Observations
Adapter / OS / Network Settings

During the SAP tests, we tested various AIX system parameters, in particular at
network level, and analysed the impact on SAP results.

Network tuning (referred to as "tuned" in the tests):

Interrupt coalescing is used to avoid flooding the host with too many interrupts
during network traffic. Consider a typical situation for a 1-Gbps Ethernet: if the
average packet size is 1000 bytes, to achieve the full receiving bandwidth, there will
be 1250 packets in each processor tick (10 ms). Thus, if there is no interrupt coalescing,
there will be 1250 interrupts in each processor tick, wasting processor time with all the
interrupt processing.
Interrupt coalescing is aimed at reducing the interrupt overhead with minimum latency.
There are two typical types of interrupt coalescing in AIX network adapters.

Most 1-Gbps Ethernet adapters, except the Host Ethernet Adapter (HEA), use
the interrupt throttling rate method, which generates interrupts at fixed frequencies,
allowing the bunching of packets based on time. The interrupt rate is controlled
by the intr_rate parameter, which defaults to 10000 interrupts per second.

Most 10-Gb Ethernet adapters and HEA adapters use an advanced interrupt coalescing
feature. A timer starts when the first packet arrives, and then the interrupt is delayed
for n microseconds or until m packets arrive.
For the 10-Gb Ethernet adapter, the n value corresponds to intr_coalesce, which is 5
microseconds by default. The m value corresponds to receive_chain, which is 16
packets by default. Note the attribute name for earlier adapters might be different.
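
The packet-rate estimate for the 1-Gbps case and the effect of the default intr_rate setting can be worked through with a short sketch. The link speed, packet size, tick length, and intr_rate default are the values given in the text; the function names are illustrative, and the calculation ignores Ethernet framing overhead.

```python
def packets_per_second(link_bps, packet_bytes):
    """Packets per second needed to fill a link with packets of a given size."""
    return link_bps / (packet_bytes * 8)


def packets_per_tick(link_bps, packet_bytes, tick_s=0.010):
    """Packets arriving during one processor tick (10 ms by default)."""
    return packets_per_second(link_bps, packet_bytes) * tick_s


link_bps = 1_000_000_000      # 1-Gbps Ethernet
packet_bytes = 1000           # average packet size from the text
intr_rate = 10_000            # default interrupts per second for 1-Gbps adapters

rate = packets_per_second(link_bps, packet_bytes)
print(f"packets per second:         {rate:.0f}")
print(f"packets per 10 ms tick:     {packets_per_tick(link_bps, packet_bytes):.0f}")
print(f"packets per interrupt:      {rate / intr_rate:.1f}")
print(f"time between interrupts:    {1_000_000 / intr_rate:.0f} microseconds")
```

With throttling at 10000 interrupts per second, the adapter raises an interrupt at most every 100 microseconds and bundles about 12.5 packets per interrupt at full line rate, instead of interrupting for each of the 125000 packets per second.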

Today's POWER7+ processors are much faster than the processors that were dominant
when the 1G/10G interrupt coalescing AIX default parameters were chosen.
Additionally, the potential overhead of the processor is also linked to the type of
workload the processor needs to deal with. So we did several tests deactivating the
interrupt coalescing parameters on the 1G and 10G physical adapters.

1G adapters: Changing intr_rate parameter of the physical adapter from 10000 to 0.

10G adapters: Changing intr_coalesce parameter of the physical adapter from 5 to 0
and changing receive_chain of the physical adapter from 16 to 1.
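
The paper does not state how these attribute changes were applied. On AIX, device attributes are commonly changed with the chdev command, so the following sketch merely builds the corresponding command lines as strings without executing them. The device names ent0 and ent1 are hypothetical placeholders, and the helper function is an illustration only. Whether an adapter accepts a change while it is in use depends on the adapter and the AIX level, so the current values should be checked first (for example with lsattr -El on the device).

```python
def chdev_command(device, **attributes):
    """Build (but do not run) an AIX chdev command line for the given adapter."""
    attrs = " ".join(f"-a {name}={value}" for name, value in attributes.items())
    return f"chdev -l {device} {attrs}"


# Hypothetical device names; the attribute names and values are those listed above.
print(chdev_command("ent0", intr_rate=0))                       # 1G adapter
print(chdev_command("ent1", intr_coalesce=0, receive_chain=1))  # 10G adapter
```

Keeping the commands as plain strings makes it easy to review them before running them on the VIO server or the partition that owns the physical adapter.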

Test conclusion
1G: Setting intr_rate=0 definitely improves the network latency. A system
measurement tool as well as the SAP NIPING test show that the latency drops by
about 0.1 ms with this tuning (for example, from 0.28 to 0.2 ms for a system test
with VIO servers).

10G: Neither the system measurement tool nor the SAP tests show any noticeable
improvement with this tuning.



Figure 18 Impact of Ethernet adapter tuning on round-trip times (in ms)



Throughput tuning (referred to as "tuned_2" in the tests):

Based on the experience from other benchmarks done in Montpellier that focused on
network throughput, we ran some tests with specific network
throughput parameters:

LPAR virtual adapters:
mtu=65390 (previously 1500)
mtu_bypass=on (previously off)

VIO Physical adapters:
flow_ctrl=yes
jumbo_frames=yes
large_send=yes

VIO Etherchannel adapters:
use_jumbo_frame=yes
hash_mode=src_dst_port
mode=standard

VIO SEA adapters:
jumbo_frame=yes (previously no)
large_receive=yes (previously no)
largesend=1 (previously 0)

Nevertheless, the SAP tests, which were more latency-driven, did not show any
significant improvement using these specific throughput parameters.

VIO Utilization

The PowerVM virtualization platform offers several options to give processor capacities
to a logical partition:

Dedicated processors: simply assign real physical processors (cores) to the
logical partition.

Shared processors (micro-partitioning): create a processor shared pool
with physical processors (cores) and then assign a virtual fraction of this pool to
the logical partition.

Donating (shared dedicated) processors: a compromise between the
dedicated mode and the shared mode. Donating mode offers the simplicity of
the dedicated mode with performance close to it, and almost the flexibility of
the shared mode.

These options are available not only for general-purpose logical partitions but also for
specific logical partitions such as Virtual I/O servers.
When doing network tests, the best compromise regarding performance is achieved
with VIO servers in donating mode. This is also a recommendation from SAP specialists
and is the mode we used in all SAP test scenarios.

Figure 19 Network round-trip times (in ms) using different processor modes for VIO servers

Note:
The AIX nmon monitoring tool provides several reports about processor consumption,
such as LPAR and CPU_ALL (among others), but not all of them are fully accurate for
every partition processor mode:
LPAR is better suited to shared mode
CPU_ALL is better suited to dedicated mode and donating mode

For a dedicated partition, the nmon tool does not even provide the LPAR report,
as it is definitely not suited to this mode. But as donating mode is a dedicated mode
with some shared-mode behavior, the nmon tool provides both the LPAR and CPU_ALL
reports in this case, even though the LPAR report is definitely not accurate.


The following is an example from our case, using VIO servers in donating mode:



The LPAR report seems to show the VIO server is working harder before the test starts
and after the test stops!



The CPU_ALL report, by contrast, shows accurate data without this side effect before
and after the test.




6. Summary

The measurement results documented in this paper show that the network round-trip
time for database accesses can have a substantial impact on SAP background job
processing times and transaction response times. An increase in processing time of
more than 30% compared to a 2-tier scenario is quite common, even for applications
which make use of optimized database access patterns.

Less optimized applications that perform a large number of database requests (i.e.,
where each request returns only a few records) are especially problematic. Their
runtime can easily increase by 100% or more.

Before moving an existing 2-tier SAP landscape to a 3-tier configuration, one should
carefully examine the database times and access patterns of affected critical business
transactions and background jobs to ensure that the additional network latency does
not result in SLA violations.

Conversely, for existing 3-tier implementations, a good method to improve processing
times for problematic applications with a significant share of database request time
is to run such applications on a 2-tier application server instance (using background job
scheduling or special logon groups for interactive transactions).

The implementation of a "3-tier in a box" setup (multiple partitions within a single
physical server, referred to in this document as para 3-tier) using PowerVM
connectivity provides an alternative with relatively low (<10%) incremental network
delays while maintaining a high degree of resource and administration flexibility.

An often advertised use case for hybrid cloud scenarios is to buy incremental
application server capacity for certain periods of time only, for example for month-end
or year-end processing.
According to our test results this temporary increase of processing capacity is only
feasible for applications with a rather small database component and thus low network
traffic between DB-server and remote (cloud) application servers.
Most month-end and year-end processing jobs perform a large number of database
accesses. Running them in a cloud would most likely fail, unless a cloud provider can
guarantee network round-trip times equivalent to Gbit Ethernet LAN networks.
In many cases, PowerVM capabilities for non-disruptive resource adjustment will
provide a better option delivering constant processing times (given a flexible billing
model is established).




7. Technical Appendix
Detailed Test Landscape Network Layout at IBM Client Center in Montpellier


ANUE Network Latency Simulator



The ANUE was configured in the 1Gbit network, as it provides only two 1Gbit interfaces
for external connections.

One can modify the latency at blade1 level and at blade2 level. During our tests, we
kept both values symmetric.
Looking at server YO88WU, all outbound traffic is delayed by blade1 and passed
through by blade2. All inbound traffic is delayed by blade2 and passed through by
blade1. The overall round-trip time referred to in this document is defined by the
aggregate delay time of both ANUE blades plus the native network latencies in the LAN
segments described above.

Some screenshots from ANUE configuration GUI:









Note that the Latency per Packet output in this terminal window is actually the round-
trip time measured in the netlatency.ksh script. That means a network delay of 4 ms
configured on each blade results in a round-trip time of about 8.2 ms.
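
The relationship between the per-blade delay and the resulting round-trip time can be written down in a few lines. This is an illustrative sketch with hypothetical function names. The native LAN round-trip time of 0.2 ms is an assumption, chosen so that the result matches the roughly 8.2 ms quoted above.

```python
def added_round_trip_ms(blade1_delay_ms, blade2_delay_ms):
    """Round-trip delay added by the simulator: outbound delay plus inbound delay."""
    return blade1_delay_ms + blade2_delay_ms


def simulated_round_trip_ms(blade1_delay_ms, blade2_delay_ms, native_rtt_ms):
    """Overall round-trip time: both blade delays plus the native LAN round-trip time."""
    return added_round_trip_ms(blade1_delay_ms, blade2_delay_ms) + native_rtt_ms


# 4 ms on each blade, assumed native round-trip time of 0.2 ms
print(f"{simulated_round_trip_ms(4.0, 4.0, 0.2):.1f} ms")

# Symmetric LAN-range settings used in the tests: 0.1, 0.25, and 0.5 ms per blade
for delay in (0.1, 0.25, 0.5):
    print(f"{delay} ms per blade adds {added_round_trip_ms(delay, delay):.1f} ms round trip")
```

With symmetric settings, the added round-trip delay is simply twice the configured per-blade latency, which is why latency delays of 0.1, 0.25, and 0.5 ms correspond to round-trip delays of 0.2, 0.5, and 1 ms.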

ANUE Network statistics panel:





8. About the authors

Walter Orb - walter.orb@de.ibm.com
Walter Orb is a technical consultant working at the IBM SAP International
Competence Center in Walldorf. He has more than twenty years of experience with
SAP on AIX and Power Systems, with a major focus on system performance,
benchmarks, and large-scale system tests.

Matthias Köchl - koechl@de.ibm.com
Matthias Köchl is a member of the IBM SAP International Competence Center in
Walldorf. As a certified Senior Architect and CIM-certified Marketer, he is in charge of
field enablement, marketing, and education around SAP solutions running on the IBM
POWER and PureFlex platforms.

Fabrice Moyen - fabrice_moyen@fr.ibm.com
Fabrice Moyen is a benchmark manager at the EMEA IBM Client Center in
Montpellier, France, working in the IBM Power benchmark team, with more
specific competencies in PowerHA, IBM Systems Director, and PureFlex. He is also a
member of the new IBM Power Linux Center recently inaugurated in Montpellier (in
addition to three other IBM Power Centers in Austin, New York, and Beijing).

Hans-Jürgen Reiss - hans-juergen.reiss@sap.com
Hans-Jürgen Reiss is a member of the Performance & Scalability team at SAP in
Walldorf, Germany. He is a specialist in network sizing and analysis for SAP
applications. He works with SAP development on the architectural design and
optimization of SAP applications for SAP cloud solutions.









9. Trademarks and special notices
Copyright IBM Corporation 2013.
References in this document to IBM products or services do not imply that IBM intends to make them available
in every country.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked
on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S.
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at
www.ibm.com/legal/copytrade.shtml.
SAP and other SAP products and services mentioned herein, as well as their respective logos, are trademarks
or registered trademarks of SAP AG in Germany and in several other countries all over the world.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United
States, other countries, or both.
Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Information is provided "AS IS" without warranty of any kind.
All customer examples described are presented as illustrations of how those customers have used IBM
products and the results they may have achieved. Actual environmental costs and performance characteristics
may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published
announcement material, or other publicly available sources and does not constitute an endorsement of such
products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available
information, including vendor announcements and vendor worldwide homepages. IBM has not tested these
products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM
products. Questions on the capability of non-IBM products should be addressed to the supplier of those
products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and
represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of
the specific Statement of Direction.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive
statement of a commitment to specific levels of performance, function or delivery schedules with respect to any
future products. Such commitments are only made in IBM product announcements. The information is
presented here to communicate IBM's current investment and development activities as a good faith effort to
help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled
environment. The actual throughput or performance that any user will experience will vary depending upon
considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the
storage configuration, and the workload processed. Therefore, no assurance can be given that an individual
user will achieve throughput or performance improvements equivalent to the ratios stated here.
Photographs shown are of engineering prototypes. Changes may be incorporated in production models.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.