
HA215
SAP HANA 2.0 SPS05 - Using Monitoring and Performance Tools

PARTICIPANT HANDBOOK
INSTRUCTOR-LED TRAINING

Course Version: 17
Course Duration: 2 Day(s)
Material Number: 50155423
SAP Copyrights, Trademarks and Disclaimers

© 2022 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the
express permission of SAP SE or an SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are
trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other
countries. Please see https://www.sap.com/corporate/en/legal/copyright.html for additional
trademark information and notices.
Some software products marketed by SAP SE and its distributors contain proprietary software
components of other software vendors.
National product specifications may vary.
These materials may have been machine translated and may contain grammatical errors or
inaccuracies.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only,
without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable
for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate
company products and services are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein should be construed as constituting an
additional warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business
outlined in this document or any related presentation, or to develop or release any functionality
mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’
strategy and possible future developments, products, and/or platform directions and functionality are
all subject to change and may be changed by SAP SE or its affiliated companies at any time for any
reason without notice. The information in this document is not a commitment, promise, or legal
obligation to deliver any material, code, or functionality. All forward-looking statements are subject to
various risks and uncertainties that could cause actual results to differ materially from expectations.
Readers are cautioned not to place undue reliance on these forward-looking statements, which speak
only as of their dates, and they should not be relied upon in making purchasing decisions.

© Copyright. All rights reserved. iii


Typographic Conventions

American English is the standard used in this handbook.


The following typographic conventions are also used.

● This information is displayed in the instructor's presentation
● Demonstration
● Procedure
● Warning or Caution
● Hint
● Related or Additional Information
● Facilitated Discussion
● User interface control, shown as: Example text
● Window title, shown as: Example text



Contents

Course Overview

Unit 1: Emergency Analysis and Troubleshooting
● Lesson: Handling System Offline Situations
● Lesson: Handling System Hang but Reachable Situations
● Lesson: Analyzing a Suddenly Slow System

Unit 2: Structural System Performance Root Cause Analysis
● Lesson: Analyzing Memory Issues
● Lesson: Analyzing CPU Issues
● Lesson: Analyzing Expensive Statement Issues
● Lesson: Analyzing Disk and I/O Issues

Unit 3: Proactive Monitoring and Performance Safeguarding
● Lesson: Configuring the SAP HANA Alerting Framework
● Lesson: Setting up SAP HANA Workload Management
● Lesson: Using SAP HANA Capture and Replay

Course Overview

TARGET AUDIENCE
This course is intended for the following audiences:
● Support Consultant
● Developer
● IT Administrator
● IT Support
● System Administrator



UNIT 1 Emergency Analysis and
Troubleshooting

Lesson 1
Handling System Offline Situations

Lesson 2
Handling System Hang but Reachable Situations

Lesson 3
Analyzing a Suddenly Slow System

UNIT OBJECTIVES

● Handle system offline situations
● Handle system hanging but reachable situations
● Analyze a suddenly slow system



Unit 1
Lesson 1
Handling System Offline Situations

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Handle system offline situations

Important Monitoring Information: Using the Command Line


Like any other system, an SAP HANA database system can go offline for various reasons. This lesson looks at scenarios where the database unexpectedly goes offline, and at how the database administrator can handle this situation.

Figure 1: Course Content Overview

The following issues can cause an SAP HANA system to go offline (so that, from the end-user perspective, the SAP HANA system seems to hang):

Question: What can cause an SAP HANA system to go offline?

● ........................................
● ........................................
● ........................................


● ........................................
● ........................................

Answer: What can cause an SAP HANA system to go offline?

● Power failure in the data center


● Hardware failures at the server level (CPU/memory)
● Hardware failures at the storage level (disk)
● Hardware failures at the network level (switches/router)
● Software errors at the operating system level (Linux)
● Software errors at the storage system level (SAN/NAS)
● Software errors at the database level (SAP HANA)
● Human error causing downtime (Server, router, storage, Linux, and SAP HANA)

Usually, in a system-down scenario, the system cannot be accessed through SQL or any other connection method. This makes analyzing the root cause more difficult, but not impossible. Several small tests, performed in the right order, help you quickly exclude areas that are not causing the problem. Such a workflow should become your standard way of approaching a system that is down.
Because SAP HANA cockpit might only be able to partially connect to the SAP HANA system, use the following quick tests to roughly determine the problem area. As soon as you have found the problem area, investigate more deeply, but do not forget that getting the system up and running again has the highest priority.

Question: What tests can you perform to find the problem area?

● ........................................
● ........................................
● ........................................
● ........................................
● ........................................

Answer: What tests can you perform to find the problem area?

● Check the network. Ping some hosts in the data center.


● Check the hosts. Log on using SSH and verify that the OS is running without issues.
● Check the storage. Create, read, or delete a file to test the connection to the storage
system.
● Check SAP HANA. Use sapcontrol to test if all SAP HANA database services are running.
● Check SAP HANA. Use hdbsql to test if the SQL database is working for the application
user(s).


Caution:
The following checks help you to quickly identify which parts are broken and which are working. These tests are not meant for a deep root cause analysis; for that, other and probably better tools are available.

Check the Network

Figure 2: Check the Network

In today's world, where almost every device is connected to the network, it's extremely
important that the network is up and running correctly. In an SAP HANA database system, the
network is important as well. End users connect to the database to execute all kinds of queries.
This can be done directly using SQL or via a middleware application. The SAP HANA database
itself can be set up as a multi-host scale-out system that distributes the data over several
servers. Without a network, external end-user connections and internal server-to-server
connections would fail.
Because external and internal network connections are important for an SAP HANA system, you should test both by pinging SAP HANA and non-SAP HANA hosts in your network. If all the hosts can be reached, the network is available and can be excluded as the cause.
ping <SAP HANA host>
ping <internal host>
ping <external host>

A ping shows that the remote hosts are reachable, but the network packets might still be taking a suboptimal path due to a routing problem in the network. You can check the network path to a remote host with the following commands:
traceroute <SAP HANA host>
traceroute <internal host>
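The reachability checks above can also be scripted so the result is easy to read at a glance. The sketch below parses the statistics line that ping prints; the sample output is illustrative only (the host name hana01 is a placeholder), and on a live host you would capture the output of the real ping command instead:

```shell
# Sketch: summarize packet loss from ping statistics output.
# The sample output below is illustrative, not from a real host.
sample_ping_output='--- hana01 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms'

# Extract the packet-loss percentage from the comma-separated statistics line.
loss=$(printf '%s\n' "$sample_ping_output" | awk -F, '/packet loss/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /packet loss/) { gsub(/[^0-9]/, "", $i); print $i }
}')

if [ "$loss" -eq 0 ]; then
    echo "network OK (0% loss)"
else
    echo "network degraded (${loss}% loss)"
fi
```

Looping such a check over all SAP HANA and non-SAP HANA hosts gives you the quick include/exclude decision this lesson describes.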


Hint:
If the end users in your company connect to the network through a virtual desktop infrastructure (VDI) solution or from a dedicated network, test the network connections from within these infrastructures as well.

Check the SAP HANA Hosts on OS Level


As soon as you know that the network is up and running, you can start testing if the SAP
HANA hosts are functioning within normal parameters. Connect to the SAP HANA host(s)
using your preferred method (SSH, XRDP, VNC, and so on) and check the Linux system logs.

Figure 3: Check the Linux Host and Logs

As the SAP HANA hosts are normally up and running 24/7, check whether there were
unplanned and unexplained restarts. You can check this with the following command:
last | grep boot
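A quick follow-up is to count how many boot records appear in a given period. The sketch below works on sample last output (illustrative lines, not from a real host); on a live system you would pipe last itself:

```shell
# Sketch: count reboot records, as 'last | grep boot' would show them.
# The sample lines below are illustrative only.
sample_last='reboot   system boot  5.3.18-24.102  Mon Mar  7 09:12   still running
reboot   system boot  5.3.18-24.102  Sun Mar  6 02:41 - 09:10  (06:29)'

# 'last' prints one line per boot record, each starting with "reboot".
boot_count=$(printf '%s\n' "$sample_last" | grep -c '^reboot')
echo "boot records found: $boot_count"
```

A count that is higher than the number of planned restarts you know about points to unplanned reboots worth investigating.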

Looking at the Linux system log files to analyze the system is one of the most important tasks
when troubleshooting a system. Since the move from syslog to systemd, kernel messages
and messages of system services are handled by systemd.
Systemd was introduced in SLES 12 and RHEL 7 and replaces the traditional init scripts.
Systemd also introduced its own logging system, called the journal.
Systemd manages the journal as a system service under the name systemd-journald.service
and it is switched on by default. In a systemd-enabled Linux system, the systemd-journald
service collects all messages from the kernel, boot process, syslog, user processes, standard
input, and system service errors in a centralized location.
You can check the last 50 boot error messages in the journal with the following command:
journalctl -n 50 -p err -b

-n = number of messages to display
-p = message priority
-b = display boot messages


Hint:
You can check the last 50 kernel error messages in the journal with the following
command:
journalctl -n 50 -p err -k

-k = display kernel messages
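To get a first impression of which component is logging the most errors, you can group journal lines by their source field. The sketch below parses sample output in journalctl's default short format (the sample lines are illustrative, not captured from a real host); on a live system you would pipe the journalctl command from above:

```shell
# Sketch: count error messages per source in journalctl short-format output.
# In that format, field 5 is the logging source (e.g. kernel:, systemd[1]:).
# The sample lines below are illustrative only.
sample_journal='Mar 07 09:12:01 hana01 kernel: EXT4-fs error (device sda2)
Mar 07 09:12:05 hana01 systemd[1]: Failed to start Some Service.
Mar 07 09:12:09 hana01 kernel: EXT4-fs error (device sda2)'

err_by_source=$(printf '%s\n' "$sample_journal" \
    | awk '{ count[$5]++ } END { for (s in count) print s, count[s] }' \
    | sort)
echo "$err_by_source"
```

A source that dominates the error count (here the kernel) tells you where to focus the deeper analysis.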

Check the Storage


Storage problems can result in severe database problems, a database standstill, or, even worse, data loss.

Figure 4: Check the Storage

Every layer in the Linux software and hardware stack plays a part in avoiding storage problems. Modern hard drives can detect and correct minor errors in block reads. SAN and NAS systems have built-in error correction and redundancy to handle power and hardware failures. Modern Linux file systems are all journal-based and can correct errors caused by power failures. Last but not least, databases also support many different techniques to survive power failures and incorrect service shutdown situations.
If the SAP HANA database system stopped due to a power, hardware, or software failure, check that all file systems are available again after the server has restarted. Depending on the storage system used, you can investigate the storage problem more deeply.

Note:
Storage system problems are beyond the scope of this course. To investigate them, contact your storage vendor for the support information you need.

Check the SAP HANA Database System


Checking that the database is up and running sounds like a good plan, but even with all services running, the end user or middleware application might still be unable to connect. This can happen when, for example, the SAP HANA system is up and running, but a reconfigured firewall no longer allows SQL connections.


Figure 5: Check SAP HANA from the Command Line

To check if all the SAP HANA services and hosts are available on the Linux host, you can
execute the following commands:
As <sid>adm user:
sapcontrol -nr <instance number> -function GetProcessList
sapcontrol -nr <instance number> -function GetSystemInstanceList
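GetProcessList prints one line per service, with the dispstatus in the third comma-separated column. The following sketch flags services that are not GREEN; the sample output is illustrative (the service names and timestamps are assumptions, not captured from a real system), and on a live host you would pipe the sapcontrol output itself:

```shell
# Sketch: flag SAP HANA services that are not GREEN in
# 'sapcontrol -nr <instance number> -function GetProcessList' output.
# The sample below is illustrative only.
sample_processlist='name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2022 03 07 06:00:10, 3:15:12, 4242
hdbnameserver, HDB Nameserver, GREEN, Running, 2022 03 07 06:00:12, 3:15:10, 4311
hdbindexserver, HDB Indexserver-HDB, YELLOW, Initializing, 2022 03 07 06:00:20, 3:15:02, 4402'

# Skip the header row; report any service whose dispstatus is not GREEN.
not_green=$(printf '%s\n' "$sample_processlist" | awk -F', ' \
    'NR > 1 && $3 != "GREEN" { print $1 " (" $3 "/" $4 ")" }')

if [ -z "$not_green" ]; then
    echo "all services GREEN"
else
    echo "services needing attention:"
    echo "$not_green"
fi
```

This keeps the focus on the triage goal: a non-GREEN service tells you immediately which process to investigate next.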

You also need to check whether the system can be reached over the SQL interface. When you are already connected to the SAP HANA host via an SSH session, check the SQL interface with the following command:

Note:
The default port number range for tenant databases is 3<instance>40 -
3<instance>99.
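The port range in the note above can be derived from the two-digit instance number by simple string substitution. A minimal sketch, using instance 00 purely as an example:

```shell
# Sketch: build the default tenant SQL port range 3<instance>40 - 3<instance>99.
# <instance> must be the two-digit instance number; 00 is just an example.
instance=00
range_start="3${instance}40"
range_end="3${instance}99"
echo "tenant SQL ports for instance ${instance}: ${range_start}-${range_end}"
```

Knowing the range helps when you test network reachability of the tenant SQL ports from outside the host.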

As <sid>adm user:

hdbsql -n localhost -i <instance number> -d <Tenant name> -u <your database user>

Enter your password when requested. You are now in the HDBSQL terminal. From the
HDBSQL terminal you can get SAP HANA connection information by executing the command:
\s

Caution:
It's important to test all your tenants, because the tenants have different SQL
ports and can be stopped independently of a running SAP HANA database
system.


Checking the SQL connection only from the local host isn't sufficient, because SQL access to SAP HANA could still be blocked on the network. To rule this out, also perform an HDBSQL connection test over the network, from the end-user LAN and from the application server network.
From the SAP S/4HANA ABAP application server, as <sid>adm user:

hdbsql -n <SAP HANA host> -i <instance number> -d <Tenant name> -u <your database user>

Enter your password when requested. You are now in the HDBSQL terminal. From the
HDBSQL terminal you can get SAP HANA connection information by executing the command:

\s

If the issue is due to a hardware or a software failure, it is important to save log files on the
Linux operating system or at the storage system level for later analysis.
For further specific steps and guidance on proactive or reactive actions you can take, see SAP Note 1999020 - SAP HANA: Troubleshooting when the database is no longer reachable.

Important Monitoring Information: Using SAP HANA Cockpit


SAP HANA Cockpit 2.0 Architecture
The SAP HANA cockpit 2.0 provides a single point of access to a range of tools for
administration and detailed monitoring of SAP HANA databases. It also integrates
development capabilities required by administrators through the SAP HANA Database
Explorer.
The SAP HANA cockpit 2.0 is a web-based HTML5 user interface that you access through a
browser. It runs on SAP HANA extended application services, advanced model (XS
advanced). The cockpit handles single-container and multi-container systems as of SAP
HANA 1.0 SPS 12. You can use the cockpit to monitor and manage multiple SAP HANA
database systems and tenants.

Figure 6: SAP HANA Cockpit 2.0 Architecture


The cockpit can be installed separately on dedicated hardware, on shared hardware, or in an existing SAP HANA database tenant. This provides greater flexibility, because it allows you to manage more than one SAP HANA system from a single administration environment.

What Can SAP HANA Cockpit Do?


SAP HANA cockpit provides centralized system and database administration features, such
as database monitoring, user management, and data backup. Administrators can use the
cockpit to start and stop services, to monitor the system, to configure system settings, and to
manage users and authorizations. The cockpit provides applications that allow you to manage
SAP HANA options and capabilities (for example, SAP HANA dynamic tiering). These
applications are only available if the options or capabilities are installed.

SAP HANA Cockpit Main Tools


The cockpit consists of the following tools, each of which has a dedicated function:
● Cockpit Manager
● SAP HANA cockpit
● SAP HANA database explorer

The Cockpit Manager is used by the cockpit administrator to register databases and to create groups and cockpit users for accessing SAP HANA cockpit.
The first step for administering a tenant or the system database (SYSTEMDB) is to register it in the Cockpit Manager. As soon as a tenant is registered in the Cockpit Manager, the database administrator can start to use it in the SAP HANA cockpit.
The SAP HANA cockpit home screen shows a high-level, aggregated overview of all registered
systems. From this aggregated landscape overview level you can quickly drill down to a
detailed overview of an individual database. In the database overview screen you find cards
for all important parts of the SAP HANA database. On a card, you see a mini graph of an
important KPI that belongs to the monitoring area the card displays. On these cards you will
also find links that start cockpit applications to analyze the measured KPIs further. Through
this drill-down you can easily find the cause of the problem.
The SAP HANA database explorer tool is integrated into the SAP HANA cockpit. The database
explorer allows you to query information about the database using SQL statements, and to
view information about your database's catalog objects.

Opening SAP HANA Cockpit 2.0


To start the cockpit, open the following URL in our training landscape: https://
wdflbmt7288.wdf.sap.corp:51026. After logging on, you are presented with an overview of the
databases assigned to your user account.


Figure 7: The SAP HANA Cockpit - Home Screen

In the overview, you can select the Database Directory tile or a dedicated group tile, like Group01 in this screenshot, to quickly see the status of the SAP HANA systems.
From the Database Directory tile, you can navigate to your SAP HANA system overview page,
where a detailed status of the selected SAP HANA system is displayed.

Database Directory
The Database Directory gives an aggregated overview of each database for which you are
responsible. In the Database Directory, you can see that a system or tenant is in trouble when
the status Stopped, Running with issues, or Unknown is displayed. To investigate these
problems in more detail, you start the cockpit System Overview page for this system or tenant
by selecting the corresponding line.
When the Database Directory shows that the whole SAP HANA database system is having
problems, it is important to quickly investigate the root cause of the problem so that you can
get the system up and running again.
Even when the SAP HANA database system is down, the cockpit can be used to investigate
the root cause of the problem. This system-down analysis is done via the SAP start service
connection.


Figure 8: Database Directory

When an important system is down, you want to start it as soon as possible. This is a logical course of action, but it can make the root cause analysis more difficult, because during a restart, important low-level log or trace files can be overwritten. A best practice is therefore to save all the important log and trace files for later investigation before you start the system.
To support this, SAP HANA provides a full system information dump. This information dump
lets you control which logs to save, so you can use these saved logs to troubleshoot the issue
after you have restarted the SAP HANA database.
In the Database Directory, you can also specify the database user credentials required to drill
down to an individual database, which is necessary unless single sign-on is in effect for that
database.

Database Overview
The Database Directory shows a high-level status overview of all the databases belonging to
groups to which you have been granted access. For each database, you can drill down for
more information.
When you open the cockpit's Database Overview page for a system that shows the status
Stopped, No SQL access, or Unknown, it is very likely that you cannot connect to the SAP
HANA database using a SQL connection. The cockpit starts, but cannot retrieve the
monitoring data using SQL. This results in almost all cards showing the text Cannot load
data.
It is best that you start the Database Overview page of SYSTEMDB instead, because from this
Database Overview page you can retrieve some information via the SAP host agent
infrastructure. Via the SYSTEMDB, you can get information on the status of the SAP HANA
services.


Figure 9: SAP HANA System Not Available

This means that you cannot use the default monitoring cards to further investigate the problem. Depending on the error situation, SAP HANA cockpit presents the cards that can be useful during the investigation. In a system-down situation, the Manage full system information dumps application is available, but Troubleshoot unresponsive systems is not, because SAP HANA cockpit cannot connect to the SAP HANA index server.

Figure 10: System Down vs Unresponsive System


The Manage Full System Information Dumps Application


To collect the diagnosis information, choose the Manage full system information dumps
application. This application is found in the Alerts and Diagnostics card.

Figure 11: Manage Full System Dumps

To collect the diagnosis information, perform the following steps:

1. Search for the Alerts and Diagnostics card.

2. Choose the Manage full system information dumps link.

3. Choose Collect Diagnostics, and in the dropdown list choose Collect from Existing Files or
Create from Runtime Environment.

4. In the pop-up window choose the information items you want to collect. In the bottom-
right corner, choose Start Collecting.

5. When all the data is collected, the fullsysteminfodump_<SID>_<DBNAME>_<HOST>_<timestamp>.zip file is displayed in the collections table.


Figure 12: Collect Diagnosis Information

Collect from Existing Files


Choose this option if you want to collect diagnosis information for one or more file types for a
specific time period - by default the last seven days. If you also want information from system
views, select Include System Views.

Note:
If you are connected to the system database of a multiple-container system, only
information from the system views of the system database is collected.
Information from the system views of tenant databases is not collected,
regardless of this option setting.

Information from system views is collected through the execution of SQL statements, which
may impact performance. In addition, the database must be online, so this option is not
available in diagnosis mode.

Create from Runtime Environment


Choose this option if you want to restrict the information collected to one or more runtime
environment (RTE) dump files. You can configure the creation and collection of dump files by
specifying the following additional information:
● The number of sets to be collected (that is, the number of points in time at which RTE
dump files are collected). Possible values are 1-5.
● The interval (in minutes) at which RTE dump files are to be collected (possible values are 1,
5, 10, 15, and 30). The default value is 1.
● The host(s) from which RTE dump files are to be collected.
● The service(s) for each selected host from which RTE dump files are to be collected.
● The section(s) from each selected service from which RTE dump files are to be collected.

The system collects the relevant information and saves it to a ZIP file. This may take some
time and can be allowed to run in the background.


If you are connected to the system database of a multiple-container system, information from
all tenant databases is collected and saved to separate ZIP files.

Download the Collected Diagnosis Information


Once the collected diagnosis information ZIP files are available, you can download them by
choosing Download. The files are saved locally in your browser’s download directory.

Figure 13: Download the Diagnosis Information

The diagnosis information is collected by the Python support script fullSystemInfoDump.py, which gathers a range of information from your system for diagnosis purposes. It can be triggered from SAP HANA cockpit, or directly from the command line.

Collect Diagnosis Information from the Command Line


The fullSystemInfoDump.py script allows you to collect information from your system,
even when it is not accessible using SQL. You can then add this information to a support
message, or use this information to investigate the root cause on your personal computer.
The script is part of the SAP HANA server installation and can be executed directly from the
command line.


Figure 14: Command Line: fullSystemInfoDump.py

If you are logged on as the operating system user, <sid>adm, the fullSystemInfoDump.py
script is part of the server installation and can be run from the command line. It is located in
the directory $DIR_INSTANCE/exe/python_support.

Hint:
You can use the predefined shell alias cdpy to quickly navigate to the
python_support directory.

Start the script from its location with the command:


python fullSystemInfoDump.py

You can modify the command with several command line options. To see the available
options, specify the option --help. All options related to getting a system dump are fully
described in SAP Note 1732157.
If the system can be reached by SQL (and you have not specified the option --nosql), the
script starts collecting diagnosis information. If the system cannot be reached by SQL, the
script starts collecting support information, but does not export data from system views.
The script creates a ZIP file containing the collected information and saves it to the directory
$DIR_GLOBAL/sapcontrol/snapshots. $DIR_GLOBAL typically points to /usr/sap/
<SID>/SYS/global.
The name of the ZIP file is structured as follows:
fullsysteminfodump_<SID>_<DBNAME>_<HOST>_<timestamp>.zip.
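Because the dump file name follows the documented pattern, you can split it into its parts with plain shell parameter expansion, which is handy when sorting many collected dumps. The example file name below is made up for illustration, and the exact timestamp format is an assumption:

```shell
# Sketch: split a full system info dump file name into its documented parts.
# The example name is invented; this assumes SID, DBNAME, and HOST contain
# no underscores themselves.
dump_file='fullsysteminfodump_HDB_TENANTDB_hana01_2022_03_07_09_15_22.zip'

base=${dump_file%.zip}                # drop the .zip suffix
base=${base#fullsysteminfodump_}      # drop the fixed prefix
sid=${base%%_*};    base=${base#*_}   # first field: SID
dbname=${base%%_*}; base=${base#*_}   # second field: database name
host=${base%%_*}                      # third field: host
timestamp=${base#*_}                  # remainder: UTC timestamp

echo "SID=$sid DB=$dbname HOST=$host TS=$timestamp"
```

This makes it easy to group dump files per tenant or per host before attaching them to a support message.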


The time-stamp in the file name is Coordinated Universal Time (UTC). The HOST and SID are
taken from the sapprofile.ini file.
The output directory for the ZIP file is shown as console output when the script is running, but
you can look it up with the command hdbsrvutil -z | grep DIR_GLOBAL= .

Diagnosis Information Collected


The Python support script fullSystemInfoDump.py collects the following information from
your system for diagnosis purposes.

Note:
All of the following file types are collected unless the option --rtedump is
specified, in which case only runtime environment (RTE) dump files are created
and collected.

Log File
All information about what has been collected is shown as console output, and is written to a
file named log.txt, which is stored in the ZIP file.

Trace Files
Each of the following trace files is put into a file with the same name as the trace file. For
storage reasons, only the trace files from the last seven days are collected unabridged. Older
trace files are not collected. This behavior can be changed by using the option --days or with
the options --fromDate and --toDate.

● $DIR_INSTANCE/<SAPLOCALHOST>/trace/compileserver_alert_<SAPLOCALHOST>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/compileserver_<SAPLOCALHOST>.<...>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/daemon_<SAPLOCALHOST>.<...>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/indexserver_alert_<SAPLOCALHOST>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/indexserver_<SAPLOCALHOST>.<...>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/nameserver_alert_<SAPLOCALHOST>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/nameserver_history.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/nameserver_<SAPLOCALHOST>.<...>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/preprocessor_alert_<SAPLOCALHOST>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/preprocessor_<SAPLOCALHOST>.<...>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/statisticsserver_alert_<SAPLOCALHOST>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/statisticsserver_<SAPLOCALHOST>.<...>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/xsengine_alert_<SAPLOCALHOST>.trc
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/xsengine_<SAPLOCALHOST>.<...>.trc

Configuration Files
All configuration files are collected unabridged and stored in a file with the same name as
the .ini file:
● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/attributes.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/compileserver.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/daemon.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/executor.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/extensions.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/filter.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/global.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/indexserver.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/inifiles.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/localclient.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/mimetypemapping.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/nameserver.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/preprocessor.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/scriptserver.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/statisticsserver.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/validmimetypes.ini

● $DIR_INSTANCE/<SAPLOCALHOST>/exe/config/xsengine.ini

Database System Log Files


The following backup files are collected unabridged:
● $DIR_INSTANCE/<SAPLOCALHOST>/trace/backup.log

● $DIR_INSTANCE/<SAPLOCALHOST>/trace/backint.log

RTE Dump Files


For each index server, an RTE dump file containing information about threads, stack contexts,
and so on is created and stored in the file
indexserver_<SAPLOCALHOST>_<PORT>_runtimedump.trc. These files are stored
unabridged.

Crashdump Information
Crashdump files for services are collected unabridged.


Performance Trace Files


Performance trace files with the suffix *.tpt are collected unabridged.

Kerberos Files
The following Kerberos files are collected:
● /etc/krb5.conf

● /etc/krb5.keytab

System Views
If the collection of system views is not excluded (that is, the option --nosql is not specified), all rows of the following system views (with the exceptions mentioned) are exported into a CSV file with the name of the table:

Note:
If you are connected to the system database of a multiple-container system, only
information from the system views of the system database is collected.
Information from the system views of tenant databases is not collected,
regardless of this option setting.

Note:
If you trigger the collection of diagnosis information from the SAP HANA cockpit
for offline administration, information from system views cannot be collected
because it does not use an SQL connection.

● SYS.M_CE_CALCSCENARIOS WHERE SCENARIO_NAME LIKE '%_SYS_PLE%'

● SYS.M_CONNECTIONS with CONNECTION_ID > 0

● SYS.M_DATABASE_HISTORY

● SYS.M_DEV_ALL_LICENSES

● SYS.M_DEV_PLE_SESSIONS_

● SYS.M_DEV_PLE_RUNTIME_OBJECTS_

● SYS.M_EPM_SESSIONS

● SYS.M_INIFILE_CONTENTS

● SYS.M_LANDSCAPE_HOST_CONFIGURATION

● SYS.M_RECORD_LOCKS

● SYS.M_SERVICE_STATISTICS

● SYS.M_SERVICE_THREADS

● SYS.M_SYSTEM_OVERVIEW

● SYS.M_TABLE_LOCATIONS

● SYS.M_TABLE_LOCKS



Lesson: Handling System Offline Situations

● SYS.M_TABLE_TRANSACTIONS

● _SYS_EPM.VERSIONS

● _SYS_EPM.TEMPORARY_CONTAINERS

● _SYS_EPM.SAVED_CONTAINERS

● _SYS_STATISTICS.STATISTICS_ALERT_INFORMATION

● _SYS_STATISTICS.STATISTICS_ALERT_LAST_CHECK_INFORMATION

Note:
Only the first 2,000 rows are exported.

● _SYS_STATISTICS.STATISTICS_ALERTS

Note:
Only the first 2,000 rows are exported.

● _SYS_STATISTICS.STATISTICS_INTERVAL_INFORMATION

● _SYS_STATISTICS.STATISTICS_LASTVALUES

● _SYS_STATISTICS.STATISTICS_STATE

● _SYS_STATISTICS.STATISTICS_VERSION

The first 2,000 rows of all remaining tables in the schema _SYS_STATISTICS are exported,
ordered by the SNAPSHOT_ID column.

Additional Information Collected if SQL Connection is Not Available


All available topology information is exported to a file named topology.txt. It contains
information about the host topology in a tree-like structure.


Figure 15: Example: The Topology.txt File

Execute a System Restart


Now that all important diagnosis information is collected in the
fullsysteminfodump_<SID>_<DBNAME>_<HOST>_<timestamp>.zip file(s), you can restart
the SAP HANA database system. Before you restart, consider whether you need to restart
the whole SAP HANA database system, or just a tenant in the SAP HANA database system.
How to perform a database restart and how to restart an SAP HANA tenant are described
below.


Start a Tenant Database

Figure 16: Start a Tenant Database

In the SAP HANA cockpit - Home screen, select the Database Directory or your personal group
tile. In the Database Directory screen, choose the Manage Databases link. To start the tenant
database, in the Manage databases screen, choose the Start button. This will perform a
normal tenant database start.


Start the SAP HANA Database System

Figure 17: Start the SAP HANA Database System

In the SAP HANA cockpit - Home screen, choose the Database Directory tile and choose the
SYSTEMDB that is stopped. In the Database Overview screen, search for the Services card. To
start the database system, in the Services card choose the Start Database button. This will
perform a normal database start of the SAP HANA database.

Note:
In newer versions of the SAP HANA cockpit, you can choose the Start Database
button directly from the Database Overview screen.

Verify the Database System Status


The SAP HANA system restart sequence quickly restores the system to a fully operational
state.
When you restart an SAP HANA system, the following activities are executed by the restart
agent of the persistence layer:

1. The data volume of each service is accessed to read and load the restart record.

2. The list of open transactions is read into memory.

3. Row tables are loaded into memory.

4. Open transactions are processed using the redo log, as follows:


● Write transactions that were open when the database was stopped are rolled back.

● Changes to committed transactions that were not written to the data area are rolled
forward.
The first column tables start being reloaded into memory as they are accessed for roll
forward.

Note:
Because a regular or "soft" shutdown writes a savepoint, there are no replay
log entries to be processed in this case.

5. Aborted transactions are determined and rolled back.

6. A savepoint is performed with the restored consistent state of the database.

7. Column tables that are marked for preload, and their attributes, are asynchronously
loaded in the background (if they have not already been loaded as part of log replay).
The preload parameter is configured in the meta-data of the table. This feature is useful,
for example, to make certain tables and columns that are used by important business
processes available more quickly.

8. Column tables and their attributes that were loaded before restart, start reloading
asynchronously in the background (if they have not already been loaded as part of log
replay or because they are marked for preload).
During normal operation, the system tracks the tables that are currently in use. This list is
used as a basis for reloading tables after a restart.
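The rollback and roll-forward rules in steps 4 and 5 can be sketched as follows. This is a simplified illustration of the decision logic only, not SAP code; the transaction records and their fields are invented for the example.

```python
# Simplified sketch of the redo-log handling at restart (steps 4-5 above).
# The transaction records and their fields are invented for illustration.

def recover(transactions):
    """Classify transactions found at restart.

    Each value has:
      'committed' - True if a commit record exists in the redo log
      'persisted' - True if its changes reached the data area via a savepoint
    Returns (rolled_back, rolled_forward) lists of transaction IDs.
    """
    rolled_back, rolled_forward = [], []
    for tx_id, tx in sorted(transactions.items()):
        if not tx['committed']:
            rolled_back.append(tx_id)        # open writers are rolled back
        elif not tx['persisted']:
            rolled_forward.append(tx_id)     # committed, not yet in data area
        # committed and persisted: nothing to replay
    return rolled_back, rolled_forward

log = {
    'T1': {'committed': False, 'persisted': False},  # open at shutdown
    'T2': {'committed': True,  'persisted': False},  # commit after last savepoint
    'T3': {'committed': True,  'persisted': True},   # fully persisted
}
print(recover(log))  # -> (['T1'], ['T2'])
```

Note that after a regular ("soft") shutdown, a savepoint has persisted everything, so both lists come back empty, matching the note above.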

Reloading column tables, as described in steps 7 and 8, restores the database to a fully
operational state more quickly. However, it does create performance overhead and may not
be necessary in non-production systems. You can deactivate the reload feature in the
indexserver.ini file by setting the reload_tables parameter in the sql section to false. In
addition, you can configure the number of tables whose attributes are loaded in parallel using
the tables_preloaded_in_parallel parameter in the parallel section of the
indexserver.ini file. This parameter also determines the number of tables that are preloaded in
parallel.
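As an illustrative sketch, the two parameters described above would appear in indexserver.ini as follows. The section and parameter names are taken from the text; the value 5 is an example only, so check your system's current defaults before changing anything.

```ini
# indexserver.ini fragment (illustrative values)

[sql]
# disable reloading of previously loaded column tables after restart
reload_tables = false

[parallel]
# number of tables whose attributes are loaded/preloaded in parallel
tables_preloaded_in_parallel = 5
```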


Figure 18: Is the SAP HANA Database Running?

Now that the SAP HANA database system is up and running, you can continue to investigate
the failure. If you cannot find the root cause of the failure, open an SAP support message and
attach the diagnosis information collected in the SAP HANA cockpit - Full system information
dump application.

LESSON SUMMARY
You should now be able to:
● Handle system offline situations



Unit 1
Lesson 2
Handling System Hang but Reachable
Situations

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Handle system hanging but reachable situations

Troubleshoot Unresponsive Systems

Figure 19: Urgent Analysis and Troubleshooting

There are various reasons for a system to hang, or seem to be hanging from an end-user
perspective. The database is said to be hanging when it no longer responds to queries that are
executed against it.
The source of the system standstill might be related to any of the components involved, for
example, the storage, OS and hardware, network, SAP HANA database or the application
layer. For troubleshooting it is essential to collect information about the context of the active
threads in the SAP HANA database.

Question: How can a system get into a hanging state?

● ........................................


● ........................................
● ........................................
● ........................................
● ........................................

The following list of issues can cause a system hang state, or a state where the system seems
to hang from the end-user perspective:

Answer: How can a system get into a hanging state?

● Log volume full caused by either a full disk, a quota setting or failed log backups
● Savepoint lock conflict with long-running update
● Wrong configuration of transparent huge page or OS page cache
● The Translation Lookaside Buffer (TLB) shootdown
● High context switches caused by many SqlExecutor or JobExecutor threads
● Huge Multiversion Concurrency Control (MVCC) versions
● High system CPU usage caused by non-HANA applications
● Frequent Out of Memory (OOM) situations that lead to a performance drop

Note:
What does "Translation Lookaside Buffer (TLB) shootdown" mean?
A Translation Lookaside Buffer (TLB) is a cache of the translations from virtual
memory addresses to physical memory addresses. When a processor changes
the virtual-to-physical mapping of an address, it needs to tell the other processors
to invalidate that mapping in their caches.

As SQL statements cannot usually be executed for analysis, you should perform the following
steps if it is still possible to log on to the OS of the master host (for example, as the <sid>adm
user). Also see SAP Note 1999020: SAP HANA: Troubleshooting when database is no longer
reachable for further specific steps and guidance on proactive or reactive actions you can
take.

Question: What and where can you check?

● ........................................
● ........................................
● ........................................
● ........................................

Answer: What and where can you check?

● Use SAP HANASitter


● Check if the SAP HANA file systems still have free space


● Collect call stack and runtime information


● Analyze the current Operating System information
● Analyze an Unresponsive System in SAP HANA Cockpit

Use SAP HANASitter


SAP HANASitter is a tool developed by SAP Support that allows you to monitor SAP HANA
and to automatically create dump files in certain scenarios. For more information, see SAP
Note 2399979 - How-To: Configuring automatic SAP HANA Data Collection with SAP
HANASitter

Figure 20: Check System using SAP HANASitter

SAP HANASitter Features:

● Database online check


● CPU, ping, and critical feature check
● Recording mode for RTE dumps, stack calls, kernel profiler trace, and GStack
● Scale-out monitor
● Critical session killer

By default, SAP HANASitter checks once an hour whether SAP HANA is online and primary. If
so, it starts to track. Tracking includes regularly checking (by default, every minute) whether
SAP HANA is responsive. If it is not, it starts to record.
Recording can include writing call stacks of all active threads, recording runtime dumps,
index server gstacks, and/or kernel profiler traces. By default, nothing is recorded.
If SAP HANA is responsive, it checks many of the critical features of SAP HANA. By default,
the script checks whether there are more than 30 active threads. If there are, the script
starts to record.
When the script has finished recording, it exits. The script can be configured to restart using
the command line.


When the script has finished all the tests successfully, it sleeps for one hour, before it starts
all the checks again.
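The decision flow just described can be sketched as a single check cycle. This is a hypothetical illustration: the probe callbacks and return values are invented for the example and are not the real hanasitter.py API; only the intervals and the 30-active-thread threshold follow the text above.

```python
# Hypothetical sketch of one HANASitter-style check cycle. The probe
# callbacks and return values are invented; the 30-thread threshold
# follows the text above.

def check_once(is_online, is_responsive, active_threads, thread_limit=30):
    if not is_online():
        return 'skip'        # not online or not primary: try again next hour
    if not is_responsive():
        return 'record'      # tracking found a hang: write dumps and exit
    if active_threads() > thread_limit:
        return 'record'      # critical-feature check: too many active threads
    return 'ok'              # all tests passed: sleep one hour, check again

print(check_once(lambda: True, lambda: True, lambda: 12))  # -> ok
print(check_once(lambda: True, lambda: True, lambda: 45))  # -> record
```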

Setup Steps Overview


1. Create an SAP HANA user (for example, HANASITTER, but you can use a different name)
and assign the CATALOG READ privilege.

2. Create a user key (for example, SYSTEMKEY, but you can use a different name) in the
hdbuserstore.

3. Download the hanasitter.py script attached to SAP Note 2399979.

4. Store the script in, for example, the python_support directory.

5. As <sid>adm, change to the python_support directory with the command cdpy.

6. Execute the script with the command python hanasitter.py -ng 1.

Check if the SAP HANA File Systems Still Have Free Space
In a system hanging situation, the execution of SQL statements is probably no longer
possible. If you can still log on to the operating system, try to perform the following steps on
the OS of the master host.

Figure 21: Check an Unresponsive System

In cases where logs cannot be written, all DML statements fall into wait status. This can also
prevent new connections from being opened, because the system internally executes DML
statements while establishing a connection. Typically, a full log volume is the cause.
A "log volume full" situation is caused either by a full disk or by the log volume hitting its
quota setting. To investigate more deeply, perform the following steps:

1. Check for the Internal Disk-Full Event (Alert 30) in the indexserver trace.

2. Check if the system is running out of disk space using the command df -h on the OS ssh
shell.


3. Check if the system is running out of inodes (NFS) using the command df -i.

4. Check the quota setting in file system.

5. Check SAP Note 1679938 - Log Volume is full.

6. Check SAP Note 2083715 - Analyzing log volume full situations.
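The free-space check in step 2 can also be scripted with Python's standard library, as a minimal sketch. The path used here is an example only; point it at your actual SAP HANA data and log file systems.

```python
# Minimal free-space check, mirroring "df -h" from step 2.
# The path is an example; substitute your SAP HANA data/log file systems.
import shutil

def free_ratio(path):
    """Return the fraction of free space on the file system holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

print(f"free space on /: {free_ratio('/'):.0%}")
```

A result close to 0% would match the "disk full" scenario described above; note that a quota can still block writes even when the file system itself shows free space.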

Once you have resolved the issue (for example, freed up disk space), you may need to
manually mark the internal event as handled. You can do this on the Overview tab of the
Administration editor in the SAP HANA studio, or by executing the following SQL statements:
ALTER SYSTEM SET EVENT ACKNOWLEDGED '<host>:<port>' <id>
ALTER SYSTEM SET EVENT HANDLED '<host>:<port>' <id>

Log Backup Failure


The "log volume full" situation can also be caused by failing log backups. An SAP HANA log
segment is freed for reuse as soon as the segment is written by the backup process to the
backup location. If the log backups fail, the log segments are not reused and new log
segments are created. Eventually, this can fill up the log volume file system, resulting in the
"log volume full" situation. To investigate more deeply, perform the following steps:

1. Check the backup.log file (located at /usr/sap/<SID>/HDB<Instance#>/<Host>/trace) to
see whether it contains ERROR entries for the log backup. Also check the system views
M_BACKUP_CATALOG and M_LOG_SEGMENTS.

2. If the log backup uses backint, check the backint.log file (located at /usr/sap/<SID>/
HDB<Instance#>/<Host>/trace) to see whether it contains ERROR information, and
contact the backint vendor's support.
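Step 1 above amounts to scanning backup.log for ERROR lines; a minimal sketch of that scan follows. The sample log text is invented for illustration only.

```python
# Minimal sketch of the backup.log check in step 1: list ERROR lines.
# The sample log text below is invented for illustration.

def error_lines(log_text):
    return [line for line in log_text.splitlines() if 'ERROR' in line]

sample = (
    "2024-01-01 10:00:00 INFO  log backup finished successfully\n"
    "2024-01-01 10:15:00 ERROR log backup failed: destination not writable\n"
)
print(error_lines(sample))
# -> ['2024-01-01 10:15:00 ERROR log backup failed: destination not writable']
```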

Savepoint Lock Conflict with Long-running Update


With certain revision and conditions, the conflict between the savepoint lock and the DML lock
blocks subsequent statements when long-running update/insert statements exist. To
investigate more deeply, perform the following steps:

1. Use SAP Note 1813020: How to generate a runtime dump on SAP HANA to collect a
runtime dump. In the generated dump, look for the following combination of call stacks in
many threads.

DataAccess::SavepointLock::lockShared(…)
DataAccess::SavepointSPI::lockSavepoint(…)

And one of the following call stacks that is in the savepoint phase.

DataAccess::SavepointLock::lockExclusive()
DataAccess::SavepointImpl::enterCriticalPhase(…)

2. If you are running SAP HANA 1.0 (rev 97 or older) check whether the symptoms match the
description in SAP Note 2214279 - Blocking situation caused by waiting writer holding
consistent change lock. If so, apply the parameter cch_reopening_enabled as
described in the SAP Note.

Collect Call Stack and Runtime Information


For accurate root cause analysis, it is very helpful to have call-stack and runtime information
available from the time of the hang. For this, you can use the approaches outlined in SAP Note
2313619. It is useful to capture one or several runtime dumps (SAP Note 1813020) so that an
accurate root cause analysis can be done later.
If you suspect an SAP HANA internal deadlock, you can also run the deadlock detector
functionality of hdbcons (SAP Note 2222218):
hdbcons 'deadlockdetector wg -w -o <file_name>.dot'

Note:
The generated DOT file can be converted to a PDF or a GIF file using the following
commands:
● To generate a PDF file:
dot -Tpdf -o <pdffile> /usr/sap/<SID>/HDB00/work/HA215_<SID>_DeadlockCheck.dot

● To generate a GIF file:
dot -Tgif -o <giffile> /usr/sap/<SID>/HDB00/work/HA215_<SID>_DeadlockCheck.dot

Analyze the Current Operating System Information


Use OS commands like top to identify the amount of CPU consumption, the main CPU
consuming processes, and the main CPU utilization component. As SAP HANA is a
multithreading system it's more useful to have an overview of the consumed CPU per thread.
To display the CPU used per thread, use the top -H command. Its output shows the CPU
consumption per thread.
Check /proc/interrupts to understand if certain interrupts happen very often. Ensure that
the SAP HANA-related file systems contain sufficient free space (for example, using tools like
df or mmdf (for GPFS)) and that there is no quota defined that could result in a file system
becoming full. Check if the file systems are still reachable, for example, using df /usr/sap/
<sid>. Inspect OS log files, such as /var/log/messages, for suspicious error messages
that are issued at the time of the problem (but not at times of normal operation).

Check the OS Configuration


On Linux, you must ensure that Transparent Huge Pages (THPs) are disabled. If THPs are
disabled properly, the command cat /sys/kernel/mm/transparent_hugepage/enabled
returns always madvise [never], where the square brackets mark the active setting.
See further recommended OS, firmware, and hardware settings described in SAP
Note 2000003.
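The output of the THP check above can also be evaluated programmatically. The helper below is hypothetical (not an SAP tool); it simply extracts the bracketed active setting from the sysfs file content.

```python
# Hypothetical helper for the THP check above: extract the active setting,
# which the kernel marks with square brackets in the sysfs file.
import re

def active_thp(text):
    m = re.search(r'\[(\w+)\]', text)
    return m.group(1) if m else None

# On a correctly configured SAP HANA host the file reads "always madvise [never]":
print(active_thp("always madvise [never]"))  # -> never
```

A result other than "never" indicates THPs are not fully disabled and the OS configuration should be corrected per SAP Note 2000003.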

Use the OS Command "sar" for Historic Analysis


Use operating system commands like sar to identify further relevant operating system
information (for example, memory, network, interrupts, and further CPU and load details). For
more information, see SAP Note 1999670.

Execute Hardware Checks


On IBM hardware, you can use the hardware check tool provided in SAP Note 1661146 to
collect more information about the system environment.
For SAP HANA 1.0, use the SAP HANA hardware configuration check tool (SAP
Note 1943937) to determine bottlenecks on the infrastructure side.
For SAP HANA 2.0, use the SAP HANA Hardware and Cloud Measurement Tools (SAP Note
2493172) to determine bottlenecks on the infrastructure side.


See SAP Note 1999020: SAP HANA: Troubleshooting when the database is no longer
reachable for further specific steps and guidance on the proactive or reactive actions you can
take.

Analyze an Unresponsive System in SAP HANA Cockpit


The Troubleshoot Unresponsive Systems function uses the SAP host agent. It can collect the
most important diagnosis information from the SAP HANA database, even when the system
is stopped or cannot be reached by SQL due to performance problems.

Figure 22: Troubleshoot an Unresponsive System

To troubleshoot a system in a hang state, there is a function in SAP HANA cockpit called
Troubleshoot unresponsive systems. When using this function, information is collected
through the SAP host agent. The communication between the web browser and the SAP host
agent is always done over HTTPS, which requires that the SAP host agent has a Secure
Sockets Layer (SSL) certificate (PSE) in its security directory.
The information is collected into a file named emergency_info_<SID>.zip by the Python script
emergencyInfo.py. This script connects to the index server using the hdbcons interface. The
script tries to collect information about the open connections, running transactions, and
threads. It also shows blocked transactions. If the index server is unavailable, no information
is shown.


Figure 23: Troubleshoot Unresponsive Systems

The Troubleshoot Unresponsive System function organizes information about the system by
tab. You can diagnose the following:
● Connections
● Transactions
● Blocked transactions
● Threads


Connections Tab

Figure 24: Connections Tab

Analyzing the sessions connected to your SAP HANA database helps you identify which
applications, or which users, are currently connected to your system, as well as what they are
doing in terms of SQL execution.
On the CONNECTIONS tab, you can see information about the current connections to the SAP
HANA server. This information includes connection start time, ID, user name, and status. If
there are many connections open to the server, it can lead to congestion and may result in the
server becoming unresponsive.
On the CONNECTIONS tab, you can use the Cancel Connection button to stop a single
connection. To do this, select the connection that you want to cancel, and choose Cancel
Connection.
You can stop all the transactions that are currently running by choosing Cancel All
Transactions.


Transactions Tab

Figure 25: Transactions Tab

On the TRANSACTIONS tab, you can see information about the current transactions in the
SAP HANA system. This information includes connection and transaction ID, allocated
memory, and user name. The information shown in the transactions table gives you a good
insight into current activity on the system.
Using the connection ID, you can link each transaction to the corresponding connection on
the CONNECTIONS tab.
You can stop all the transactions that are currently running by choosing Cancel All
Transactions.

Blocked Transactions Tab

Figure 26: Blocked Transactions Tab

On the BLOCKED TRANSACTIONS tab, you can investigate if there are blocked transactions in
your system.


Blocked transactions are transactions that cannot be processed further because they need to
acquire transactional locks (record or table locks) that are currently held by another
transaction. Transactions can also be blocked while waiting for other resources, such as the
network or disk access (database or metadata locks).
The type of lock held by the blocking transaction (record, table, or metadata) is indicated in
the Lock Type column.
The lock mode is indicated in the Transactional Lock Type column.
● Exclusive row-level locks prevent concurrent write operations on the same record. They
are acquired implicitly by update and delete operations, or explicitly with the SELECT FOR
UPDATE statement.

● Table-level locks prevent operations on the content of a table from interfering with
changes to the table definition (such as DROP TABLE or ALTER TABLE). DML operations
on the table content require an intentional exclusive lock, while changes to the table
definition (DDL operations) require an exclusive table lock. There is also a LOCK TABLE
statement for explicitly locking a table. Intentional exclusive locks can be acquired if no
other transaction holds an exclusive lock on the same object. Exclusive locks require that
no other transaction holds a lock on the same object (neither intentional exclusive nor
exclusive).
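The table-lock compatibility rules above can be condensed into a short sketch. This is a simplified model using only two modes, intentional exclusive ('IX' for DML) and exclusive ('X' for DDL); it is an illustration, not SAP code.

```python
# Simplified model of the table-lock rules above: DML takes an intentional
# exclusive (IX) lock, DDL takes an exclusive (X) lock. Not SAP code.

def can_acquire(requested, held):
    """held: lock modes ('IX' or 'X') currently held by other transactions."""
    if requested == 'IX':
        return 'X' not in held       # IX locks coexist; only an X lock blocks
    if requested == 'X':
        return len(held) == 0        # X needs the table completely unlocked
    raise ValueError(requested)

print(can_acquire('IX', ['IX', 'IX']))  # -> True  (concurrent DML is fine)
print(can_acquire('X', ['IX']))         # -> False (DDL waits for DML)
```

This asymmetry is why a long-running uncommitted write transaction can block DDL operations such as the delta merge, one of the common causes of blocked transactions discussed below.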
For more detailed analysis of blocked transactions, information about low-level locks is
available in the columns Lock Wait Name, Lock Wait Component, and Thread ID of Low-Level
Lock Owner. Low-level locks are locks acquired at the thread level. They manage code-level
access to a range of resources (for example, internal data structures, network, or disk). Lock
wait components group low-level locks by engine component or resource.
By choosing Cancel All Transactions, you can stop all the currently running transactions.
Because the Delta Table Merge needs to lock tables to proceed, it is a common cause of
blocked transactions. Another job displayed by this monitor is the savepoint write, which
needs to pull a global database lock in its critical phase. A common issue is a flaw in the
application coding that does not commit a write transaction. Such a transaction will block any
other transaction that needs to access the same database object. To remedy the situation,
close the blocking transaction.
In the UI table, the blocked transactions are displayed directly beneath the blocking
transaction.
First, you must determine whether there is only one, or a few transactions, blocking many
other transactions. To do this, open the Blocked Transactions tab and check the number of
blocking transactions. If there are only a few blocking transactions, there is probably an issue
on the application side. To resolve the problem, use the following techniques:

1. If only one transaction is blocking others, contact the application user and the developer.
First, ask the user to close the application; second, check whether there is a general issue
with the application code.
If you are not able to contact the user, you can kill the transaction or kill the client process
that opened the session. The transaction is rolled back. The session cancellation may take
some time to succeed. If it takes longer than 30 seconds, consider this as a bug and
contact development support.
If the session cancellation takes too long or does not complete at all, you can kill the client
process that opened the session. This also terminates the blocking transaction. As a
prerequisite, you must have access to the client machine.


2. If a large number of transactions are blocked, you must find out whether a specific access
pattern is causing the issue. If multiple transactions are trying to access the same
database objects with write operations, they block each other. To check if this is
happening, open the Blocked Transaction Monitor and analyze the Waiting Schema Name,
Waiting Object Name, and Waiting Record ID columns. If you find a lot of transactions that
are blocking many other transactions, you must investigate whether you can do the
following:

a. Change the client application(s) to avoid the access pattern.

b. If a background job that issues many write transactions (for example, a data load job)
is running, reschedule it to a period with a low user load.

c. Partition tables that are accessed frequently to avoid clashes. See the SAP HANA
Administration Guide for more details on partitioning.

3. If you cannot identify specific transactions or specific database objects that lead to
transactions being blocked, you can assume that there is a problem with the database
itself or with its configuration, an issue with the delta merge (for example, mass write
operation on a column table), or a long savepoint duration.

Threads Tab

Figure 27: Threads Tab

In the THREADS tab, you can identify which statements or procedures are being executed and
at what stage they are, who else is connected to the system, and if there are any internal
processes running as well. The information shown includes thread type, ID, and thread status.
You can also find information about the system user and the waiting time. The information shown in the
table helps you to identify transactions with high average wait times. With the user name,
thread ID, and wait time columns, you can identify which thread is causing problems.
In this tab, you can identify long-running threads and threads that are blocked for an
inexplicably long period of time.
In the case of an emergency, choose Cancel All Transactions.


LESSON SUMMARY
You should now be able to:
● Handle system hanging but reachable situations





Unit 1
Lesson 3
Analyzing a Suddenly Slow System

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Analyze a suddenly slow system

Analyze a Suddenly Slow System


There are various reasons why an SAP HANA system suddenly becomes very slow. This
lesson tackles issues where the database performance unexpectedly becomes very slow.

Figure 28: System Suddenly Slow

The following issues can cause a system to suddenly become slow (that is, from the end-user
perspective the SAP HANA system seems to be performing slowly):
● Hardware failures at the server level (read errors on bad memory)
● Hardware failures at the storage level (read errors on disk)
● Hardware failures at the network level (package collisions on switches or a router)
● Software errors at the OS level (OS command on Linux is using 100% CPU/memory
swapping on disk)


● Software errors at the storage system level (software errors on SAN/NAS)


● Software errors at the database level (long running queries on SAP HANA or too many
sessions open)

Figure 29: Collect OS Logs

If the slow system is caused by something at the OS level, it is important to save log files on
the Linux OS or at storage system level for later analysis.
Usually, in a slow system scenario, the system can still be accessed through SQL, but it takes
longer to investigate. The normal SQL access should be used to further analyze the possible
root cause.

What Can and Should You Do?


As SQL statements can still be executed for analysis, you must perform the following steps
using the SAP HANA Cockpit 2.0:
● Look at the Overall Database Status card to see if all the services are running correctly.
● Look at the Memory Usage card to see if the SAP HANA database has enough memory
available.
● Look at the CPU Usage card to see who is using the CPU resources.
● Look at the Disk Usage card to see who is using the disk resources.
● Look at the Thread card to see which threads are using the resources.
● Look at the Sessions card to see which sessions are running.
● Look at the Monitor Statements card to see which SQL statements are running on the
system.

If the slow system is caused by something in the SAP HANA database, it is important to
investigate the SQL query and save log and trace files at the SAP HANA database level for
further analysis, before terminating sessions or threads.


Database Overview Application


The Database Overview for a system displays important metrics and available functions,
regardless of whether it is a single-host or multi-host system, or a tenant database.
In the SAP HANA Database Overview, you can view key health indicators for this specific
database, such as services status, alerts, and resource utilization. You also have access to
tools that allow you to perform database administration tasks, such as performance
analysis, and SQL statement execution. Different parts of a single card can link to different
views or applications. This way, you can see various components in a single view and decide
whether to further examine issues by drilling down.

Figure 30: Database Overview

Details also include the following:


● Information such as operational status, system usage type, whether the system has
multiple hosts, the number of hosts (if distributed), and database version
● The SAP HANA version history
● Information about the plug-ins that are installed
● The status of replication from your productive system to a secondary system
This information is only available and applicable if you are operating a secondary instance
of your database (for example, in a high availability (HA) scenario). If this is the case,
content from the primary or productive instance of your database is replicated to the
secondary instance.

To launch the overview, drill down to the name of the database from the Database Directory
or from a group. Unless your administrator has enabled single sign-on, you must connect to


the database with a database user that has the system privilege CATALOG READ and the
SELECT privilege on the schema _SYS_STATISTICS.

Overall Database Status Card


To identify problems early and avoid disruptions, you must monitor your SAP HANA database
continuously.
You can monitor the overall status and resource usage of the SAP HANA database at a glance
on the homepage of the SAP HANA cockpit. Then, drill down for more detailed monitoring and
analysis.
The overall database status can be running, running with issues, or stopped. Choosing this
status brings you to Manage Services, where you can stop or kill a service, and start or stop a
system. Also, in the Overall Database Status, you receive alerts related to services, and, in
order of priority, you see:
● The number of services not running
● The number of services running with issues (if there are no stopped services)
● The number of services running
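The same service status information shown on the card can also be read with SQL from the monitoring view M_SERVICES. A minimal example:

```sql
-- One row per service of the current database;
-- ACTIVE_STATUS indicates whether the service is running
SELECT host, port, service_name, active_status, coordinator_type
FROM m_services
ORDER BY host, port;
```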

Manage Services
Manage Services provides you with detailed information about database services for an
individual database.

Note:
Not all the columns listed in the following figure are visible by default. You can add
and remove columns in the table personalization dialog, which you open by
choosing the personalization icon in the table toolbar.


Figure 31: Manage Services

The following list gives an overview of the information available on each service:
● Host: The name of the host on which the service is running
● Status: The status of the service. The following statuses are possible:
- Running
- Running with Issues
- Stopped
- Not Running

To investigate why the service is not running, you can navigate to the crashdump file,
created when the service stopped.

Note:
The crashdump file opens in the Trace tool of the SAP HANA Web-based
Development Workbench. For this, you need the role
sap.hana.xs.ide.roles::TraceViewer or the parent role
sap.hana.xs.ide.roles::Developer.

● Service: The service name, for example, indexserver, nameserver, xsengine, and so on
● Role: The role of the service in a failover situation. Automatic failover happens when the
service or the host on which the service is running fails. The following values are possible:


- Master: The service is the active master worker.


- No entry: The service is a slave worker.
- Standby: The service is in standby mode. It does not contain any data and does not
receive any requests.
● Port: The port that the system uses for internal communication between services
● Start Time: The time at which the service started. The time is given in the timezone of the
SAP HANA server.
● CPU: A mini chart visualizing the CPU usage of the service. To open the Performance
Monitor for a more detailed breakdown of CPU usage, choose the mini chart.
● Memory: A mini chart visualizes the memory usage of the service as follows:
- Dark green shows the service's used memory.
- Light green shows the service's peak memory.
- The gray stroke represents the effective allocation limit.
- The light gray background represents the physical memory.

To open Memory Analysis for a more detailed breakdown of memory usage, choose the
mini chart. The following details are available:
- Used Memory (MB): The amount of memory currently used by the service. Choosing
the mini chart opens the Memory Analysis app for a more detailed breakdown of
memory usage.
- Peak Memory (MB): The highest amount of memory ever used by the service
- Effective Allocation Limit (MB): The effective maximum memory pool size that is
available to the process considering the current memory pool sizes of other processes
- Memory Physical on Host (MB): The total memory available on the host
- All Process Memory on Host (MB): The total used physical memory and swap memory
on the host
- Allocated Heap Memory (MB): The heap part of the allocated memory pool
- Allocated Shared Memory (MB): The shared memory part of the allocated memory pool
- Allocation Limit (MB): The maximum size of the allocated memory pool
- CPU Process (%): The CPU usage of the process
- CPU Host (%): The CPU usage on the host
- Memory Virtual on Host (MB): The virtual memory of the host
- Process Physical Memory (MB): The process physical memory used
- Process Virtual Memory (MB): The process virtual memory
- Shrinkable Size of Caches (MB): The memory that can be freed in the event of a
memory shortage


- Size of Caches (MB): The part of the allocated memory pool that can potentially be
freed in the event of a memory shortage
- Size of Shared Libraries (MB): The code size (including shared libraries)
- Size of Thread Stacks (MB): The size of the service thread call stacks
- Used Heap Memory (MB): The amount of the process heap memory used
- Used Shared Memory (MB): The amount of the process shared memory used
- SQL Port: The SQL port number
- Process ID: The process ID
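Many of these KPIs can also be read directly from the monitoring view M_SERVICE_MEMORY. A possible query, converting bytes to MB, might look like this:

```sql
-- Memory KPIs per service, similar to the Manage Services columns
SELECT host, port, service_name,
       ROUND(total_memory_used_size / 1024 / 1024)     AS used_mb,
       ROUND(effective_allocation_limit / 1024 / 1024) AS effective_limit_mb,
       ROUND(heap_memory_used_size / 1024 / 1024)      AS used_heap_mb,
       ROUND(shared_memory_used_size / 1024 / 1024)    AS used_shared_mb
FROM m_service_memory
ORDER BY total_memory_used_size DESC;
```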

Operations on Services
As an administrator, you may need to perform certain operations on all or selected services,
for example, start missing services, or stop or kill a service.
You can perform several operations on database services from Manage Services. You can
trigger these operations by selecting the service, and choosing the required option in the
footer toolbar.
Choose Start Missing Services to start inactive services. This can only be performed on a
tenant database if you drill down to Manage Services through the system database.
Choose Stop Service to stop the selected service normally. The service is then typically
restarted.
Choose Kill Service to stop the selected service immediately and, if the related option is
selected, create a crashdump file. The service is then typically restarted.
Choose Add Service to add the service you selected from the list. This can only be performed
on a tenant database if you drill down to Manage Services through the system database.
Services cannot be added to the system database itself. To add a service, you must have the
EXECUTE privilege on the stored procedure SYS.UPDATE_LANDSCAPE_CONFIGURATION.
Choose Remove Service to remove the selected service. This can only be performed on a
tenant database if you drill down to Manage Services through the system database.
You can only remove services that have their own persistence. If data is still stored in the
service's persistence, it is re-distributed to other services.
You cannot remove the following services:
● Name server
● Master index server
● Primary index server on a host

To remove a service, you must have the EXECUTE privilege on the stored procedure
SYS.UPDATE_LANDSCAPE_CONFIGURATION.
Choose Reset Memory Statistics to reset all memory statistics for all services. This can only
be performed on a tenant database if you drill down to the Manage Services application
through the system database.
Peak used memory is the highest recorded value for used memory since the last time the
memory statistics were reset. This value is useful for understanding the behavior of used
memory over time and under peak loads. Resetting peak used memory allows you, for


example, to establish the impact of a certain workload on memory usage. If you reset peak
used memory and run the workload, you can then examine the new peak used memory value.
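With SQL access, resettable memory statistics can also be reset directly; for example, the heap memory statistics are reset through the corresponding _RESET view (the exact set of resettable views depends on the release):

```sql
-- Resets the resettable counterpart of M_HEAP_MEMORY;
-- peak values are recorded again from this point onwards
ALTER SYSTEM RESET MONITORING VIEW SYS.M_HEAP_MEMORY_RESET;
```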
Choose Go To Alerts to display the alerts for this database.
The SAP HANA database provides several features in support of high availability, one of which
is service auto-restart. In the event of a failure, or an intentional intervention by an
administrator that disables one of the SAP HANA services, the service auto-restart function
automatically detects the failure and restarts the stopped service process.

Memory Statistics per Service


Analyzing the memory allocation of the SAP HANA database can help you to investigate such
situations as out-of-memory incidents, memory corruptions, and memory leaks.

Figure 32: Memory Analysis – Per Service

The Memory Analysis app enables you to visualize and explore the memory allocation of every
service of a selected host during a specified time period. If you notice an increase in overall
memory usage, you can investigate whether it is due to a particular component,
subcomponent, or table.
The upper chart provides the following data:
● Global Allocation Limit: This is the global_allocation_limit for the host (as set in the
global.ini configuration file).
● Allocated Memory: This is the pool of memory pre-allocated by the host for storing in-
memory table data, thread stacks, temporary results, and other system data structures.


● Total Used Memory: This is the total amount of memory used by SAP HANA, including
program code and stack, all data and system tables, and the memory required for
temporary computations.

Move the vertical selection bar in the upper chart to populate the data in the lower chart. The
vertical selection bar snaps to the closest time for which there is collected data for the
components. When you select the Components tab, the lower chart displays the Used
Memory by Component.
The Components tab provides the following detailed information:
● Used Memory by Component: For the specific time (chosen by the vertical selection bar in
the upper chart), the components of the selected service are listed in descending order of
the amount of used memory.
● Used Memory by Type: This donut chart displays a visual representation of the types of
used memory for the specified time.
● Components Used Memory History: If you select the checkbox of one or more
components, the used memory history chart is populated.

The Subcomponents tab displays more detailed memory use. You can filter by component
type. You can move through the collected data points by using the arrow buttons. The
following information is displayed:
● Used Memory by Subcomponent: Subcomponents of the selected component are listed in
descending order of used inclusive memory for the specific time (chosen by the vertical
selection bar in the upper chart). By choosing a subcomponent, you can expand the list.
● Filter by Component Name: To further refine the displayed subcomponent data, select the
filter icon to specify one or more component names.
● Subcomponents Used Memory History: Selecting the checkbox of one or more
subcomponents populates the used memory history chart.

The Tables tab shows detailed statistics on the memory used by data tables. The Tables tab
shows the following information:
● Top Ten Tables by Size: This displays the breakdown of memory usage of the 10 highest
consuming tables for the specific time (chosen by the vertical selection bar in the upper
chart).
● Top Ten Tables by Growth: This displays the memory usage of the 10 tables with the
largest change in consumption for the selected time period. By hovering over the data, you
can see the Previous Size memory usage value from the beginning of the time period and
the Growth during the time period (where the current size of the table is the sum of
Previous Size and Growth).

The following system views provide information from which the current and historical
memory allocation is calculated:
● HOST_RESOURCE_UTILIZATION_STATISTICS

● HOST_SERVICE_MEMORY

● HOST_SERVICE_COMPONENT_MEMORY

● HOST_HEAP_ALLOCATORS

● GLOBAL_ROWSTORE_TABLES_SIZE_BASE


● HOST_COLUMN_TABLES_PART_SIZE

All views are in the _SYS_STATISTICS schema. For more information about these views, see
the SAP HANA SQL and System Views Reference Guide.
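A query against one of these views might look as follows; the host and port values are placeholders, and the column names correspond to recent support package stacks:

```sql
-- Last 20 snapshots of used memory per component for one service
SELECT TOP 20 server_timestamp, host, port, component,
       ROUND(used_memory_size / 1024 / 1024) AS used_mb
FROM _sys_statistics.host_service_component_memory
WHERE host = 'hana01'      -- placeholder host name
  AND port = 30003         -- placeholder indexserver port
ORDER BY server_timestamp DESC;
```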

Usage and Performance Metrics


You can monitor key database metrics through the Memory Usage, CPU Usage, and Disk
Usage cards, as well as the Threads, Sessions, and Monitor Statements cards.
In a multi-host system, each host is represented by a selectable bar, with the selected host
displaying a time graph to the right of the bar chart. Hover over the bars to see details for the
selected host.
If a bar is highlighted, there is an associated high (red) or medium (yellow) alert. With single-
host databases, as there is only one host, no bar graphs are displayed. By viewing this high-
level information, you can decide whether to drill down to the Performance Monitor.

Performance Monitor
Analyzing the performance of the SAP HANA database over time can help you to pinpoint
bottlenecks, identify patterns, and forecast requirements.
Use the Performance Monitor to visually analyze historical performance data across a range
of key performance indicators related to memory, disk, and CPU usage.
Open the Performance Monitor by choosing the chart or the Show All link on the Memory
Usage, CPU Usage, or Disk Usage card on the homepage of the SAP HANA cockpit.

Figure 33: Performance Monitor

The Performance Monitor can be reached through the Memory Usage, CPU Usage, and Disk
Usage cards. All three cards point to the same Performance Monitor, but the displayed data


(memory, CPU, or disk) depends on the selected card. The general working of the
Performance Monitor is the same for all three cards.
The Performance Monitor opens displaying the load graph for the selected card
(memory, CPU, or disk). The load graph initially visualizes the resource usage of all hosts and
services listed on the left, according to the default key performance indicator (KPI) group of
the selected card.
You can customize the information displayed on the load graph in several ways, for example:
● Define the monitored time frame.
● Use the Add Chart button to create custom charts displaying the host and services
selection, as well as selected KPIs. For a list of all available KPIs, see Key Performance
Indicators.
● Update the displayed data at the selected refresh rate.
● Zoom into a specific time by changing the duration.
● In the Settings menu, customize your graphs by including hosts and services as well as
additional KPIs in the Charts tab. In the Alerts tab, configure alerts per category and
priority status.
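The same resource KPIs can be cross-checked with SQL in the monitoring view M_HOST_RESOURCE_UTILIZATION, for example:

```sql
-- Current memory figures per host, converted to MB
SELECT host,
       ROUND(instance_total_memory_used_size / 1024 / 1024) AS hana_used_mb,
       ROUND(used_physical_memory / 1024 / 1024)            AS host_used_phys_mb,
       ROUND(free_physical_memory / 1024 / 1024)            AS host_free_phys_mb
FROM m_host_resource_utilization;
```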

Monitoring and Analyzing Sessions


Analyzing the sessions connected to your SAP HANA database can help you identify which
applications or which users are currently connected to your system, as well as what they are
doing in terms of SQL execution.
Use the Sessions card to monitor all sessions in your landscape; to drill down to the Sessions page, choose the mini chart.

Figure 34: Monitoring Sessions Card


The Sessions card displays the number of active and total sessions.
Open the Sessions card. The Sessions page allows you to monitor all sessions in the current
landscape. You can see the following information:
● Active/inactive sessions and their relation to applications
● Whether a session is blocked and, if so, which session is blocking it
● The number of transactions that are blocked by a blocking session
● Statistics such as average query runtime and the number of DML and DDL statements in a
session
● The operator currently being processed by an active session

To support monitoring and analysis, you can perform the following actions from the Sessions
page:
● To cancel a session, choose Cancel Sessions.
● To save the data sets as a text or HTML file, choose Save As.
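With SQL access, comparable information and actions are available through monitoring views and ALTER SYSTEM statements. The connection ID below is a placeholder:

```sql
-- Overview of current sessions
SELECT connection_id, connection_status, client_host, user_name, transaction_id
FROM m_connections
WHERE connection_id > 0      -- negative IDs are internal connections
ORDER BY connection_id;

-- Blocked transactions and the lock owners
SELECT * FROM m_blocked_transactions;

-- Cancel the statement currently running in a session
ALTER SYSTEM CANCEL WORK IN SESSION '203642';
-- ...or disconnect the session entirely:
-- ALTER SYSTEM DISCONNECT SESSION '203642';
```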

Monitoring and Analyzing Threads


Analyzing the threads running in the SAP HANA database can be helpful when analyzing the
current system load.
This analysis can help you identify which statements or procedures are being executed and at
what stage they are, who else is connected to the system, and if there are any internal
processes running also.
Use the Threads card to monitor the longest-running threads active in your system. It may be
useful to see, for example, how long a thread has been running, or if a thread is blocked for an
inexplicably long time.

Figure 35: Monitoring Threads Card


The Threads card provides you with information about the number of currently active and
blocked threads in the database.
To open the Threads page, choose either the number of active threads or the number of blocked threads on the card.
The 1,000 longest-running threads currently active in the database are listed. By default,
threads are listed in order of longest runtime. For each statement, you can see the duration,
as well as the name of the service that is executing the thread. You can identify the host, the
port, and the thread type, and whether the statement is related to a blocking transaction.
If a thread is involved in a blocked transaction, or is using an excessive amount of memory,
cancel the operation executing the thread by choosing Cancel Operations in the footer
toolbar.
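The data behind the Threads page comes from the monitoring view M_SERVICE_THREADS; a comparable query might be:

```sql
-- The 20 longest-running active threads
SELECT TOP 20 host, port, service_name, thread_id, thread_type,
       thread_state, duration, caller, calling
FROM m_service_threads
WHERE is_active = 'TRUE'
ORDER BY duration DESC;
```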

Thread Details
The Threads card provides you with detailed information about the 1,000 longest-running
threads currently active in the database.

Note:
Not all of the columns listed below are visible by default. You can
add and remove columns in the table personalization dialog, which you open by
choosing the personalization icon in the table toolbar.

The following list gives the information available for each thread:
● Blocking Transaction: Blocking transaction
● Duration (ms): Duration (ms)
● Host: Host name
● Port: Internal port
● Service: Service name
● Hierarchy: Thread grouping information. Filled with Connection ID/Update Transaction
ID/Transaction ID, or left empty for inactive threads
● Connection ID: Connection ID
● Thread ID: Thread ID
● Calling: The thread or service that the thread calls
● Caller: The thread or service that called this thread
● Thread Type: Thread type
● Thread Method: Thread method
● Thread Detail: Thread detail
● User: User name
● Application User: Application user name
● CPU Time: CPU time of the thread
● Cumulative CPU Time: CPU time of the thread and its associated children


● Transaction ID: Transaction ID
● Update Transaction ID: Update transaction ID
● Thread Status: Thread state
● Connection Transaction ID: Transaction object ID
● Connection Start Time: Connected time
● Connection Idle Time (ms): Time that the connection is unused and idle
● Connection Status: Connection status: RUNNING or IDLE
● Client Host: Host name of the client machine
● Client IP: IP address of the client machine
● Client PID: Client process ID
● Connection Type: Connection type: Remote, Local, History (remote), History (local)
● Own Connection: TRUE if own connection, FALSE if not
● Memory Size per Connection: Allocated memory size per connection
● Auto Commit: Commit mode of the current transaction: TRUE if the current connection is
in auto-commit mode, FALSE otherwise
● Last Action: The last action done by the current connection: ExecuteGroup, CommitTrans,
AbortTrans, PrepareStatement, CloseStatement, ExecutePrepared, ExecuteStatement,
FetchCursor, CloseCursor, LobGetPiece, LobPutPiece, LobFind, Authenticate, Connect,
Disconnect, ExecQidItab, CursorFetchItab, InsertIncompleteItab, AbapStream, TxStartXA,
TxJoinXA
● Current Statement ID: Current statement ID
● Current Operator Name: Current operator name
● Fetched Record Count: Sum of the record count fetched by select statements
● Sent Message Size (Bytes): Total size of messages sent by the current connection
● Sent Message Count: Total message count sent by the current connection
● Received Message Size (Bytes): Total size of messages/transactions received by the
current connection
● Received Message Count: Total message/transaction count received by the current
connection


● Creator Thread ID: Thread ID that created the current connection
● Created By: Engine component that created the connection: Session, Planning,
Repository, CalcEngine, Authentication, Table Exporter, Loader, LLVM, JSVM, IMS Search
API, OLAP Engine, Mergedog, Ping Status, Name Server, Queue Server, SQL Stored
Procedure, Authorization, TrexViaDbsl from ABAP, HybridTable Reorganizer, Session
external
● Is Encrypted: TRUE if secure communication is enabled (SSL enabled), FALSE otherwise
● Connection End Time: The time when the connection was closed (for history connections)
● Blocked Update Transaction ID: Write transaction ID of the write transaction waiting for
the lock
● Blocking Transaction ID: Transaction object ID of the transaction holding the lock
● Thread ID of Lock Owner: Connection ID associated with the blocked write transaction
● Blocking Update Transaction ID: Write transaction ID of the write transaction holding the
lock
● Transactional Lock Type: Transactional lock type
● Transactional Lock Mode: Transactional lock mode
● Lock Wait Component: Component of the lock being waited for
● Lock Wait Name: ID of the lock being waited for
● Timestamp of Blocked Transaction: Timestamp of the blocked transaction
● Waiting Record ID: ID of the record on which the lock is currently placed
● Waiting Object Name: Name of the object on which the lock is currently placed
● Waiting Object Type: Type of the object on which the lock is currently placed
● Waiting Schema Name: Name of the schema on which the lock is currently placed
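The lock wait details above can also be queried from M_SERVICE_THREADS, for example, to list only threads currently waiting; the state filter is illustrative:

```sql
-- Threads waiting for locks, with the lock component and name
SELECT host, port, thread_id, thread_type, thread_state,
       lock_wait_component, lock_wait_name,
       transaction_id, update_transaction_id
FROM m_service_threads
WHERE thread_state LIKE '%Wait%';
```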

Monitoring and Analyzing with the Top SQL Statements


Analyzing the current most critical statements running in the SAP HANA database can help
you identify the root cause of poor performance, CPU bottlenecks, or out-of-memory
scenarios. Enabling memory tracking allows you to monitor the amount of memory used by
single statement executions.


Use the Top SQL Statements card to analyze the current most critical statements running in
the database.

Figure 36: Monitoring Statements Card

The Top SQL Statements card displays the number of long-running statements and the long-
running blocking situations currently active in the database. Statements are ranked based on
a combination of the following criteria:
● The runtime of the current statement execution.
● The lock wait time of the current statement execution.
● The cursor duration of the current statement execution.

Open the Top SQL Statements card to list the 100 most critical statements currently active in
the database. By default, statements are listed in order of the longest runtime. For each
statement, you can see the full statement string, as well as the ID of the session in which the
statement is running. You can identify the application, the application user, and the database
user running the statement and whether the statement is related to a blocking transaction.
Optionally, you can activate monitoring of the memory consumption of statements by
choosing Enable Memory Tracking in the footer toolbar. Detailed information about the
memory consumption of statement execution is collected and displayed.
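Memory tracking corresponds to parameters in the resource_tracking section of global.ini. Assuming SQL access, it could also be enabled as follows:

```sql
-- Enable resource and memory tracking for statement executions
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('resource_tracking', 'enable_tracking') = 'on',
      ('resource_tracking', 'memory_tracking') = 'on'
  WITH RECONFIGURE;
```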
If a statement is involved in a blocked transaction or using an excessive amount of memory,
cancel the session that the statement is running in (or the blocking session) by choosing
Cancel Session in the footer toolbar.


LESSON SUMMARY
You should now be able to:
● Analyze a suddenly slow system



Unit 1

Learning Assessment

1. What kind of query language statements can be used in the SAP HANA Database
Explorer?
Choose the correct answer.

X A SQL

X B MDX

X C XML

X D TCL

2. In a system-down scenario, SQL can be used to access the system to further analyze the
problem.
Determine whether this statement is true or false.

X True

X False

3. Which analysis steps can be performed when the SAP HANA database cannot be reached
using SQL?
Choose the correct answer.

X A Check the /var/log/messages for SAP HANA error messages.

X B Use the deadlock detector functionality of hdbcons.

X C Use the Health Monitor of SAP HANA Cockpit 2.0.

X D Check the hardware with the SAP HANA Hardware Configuration Check Tool.

4. The Troubleshoot Unresponsive Systems function collects its data through the
fullSystemInfoDump.py python script.
Determine whether this statement is true or false.

X True

X False


5. Which monitoring KPIs are provided in the Manage Services application?


Choose the correct answers.

X A CPU usage

X B Service status

X C Disk usage

X D Lock status

6. Because SAP HANA is running in-memory, a slow system situation is always caused by a
problem at OS level.
Determine whether this statement is true or false.

X True

X False



Unit 1

Learning Assessment - Answers

1. What kind of query language statements can be used in the SAP HANA Database
Explorer?
Choose the correct answer.

X A SQL

X B MDX

X C XML

X D TCL

Correct! The SAP HANA Database Explorer supports only SQL. Read more about this in
the lesson "Handling System Offline Situations" of the course HA215.

2. In a system-down scenario, SQL can be used to access the system to further analyze the
problem.
Determine whether this statement is true or false.

X True

X False

Correct! When the SAP HANA database is down, you should use the Troubleshoot
Unresponsive System application to analyze the problem. Read more about this in the
lesson "Handling System Offline Situations" of the course HA215.

3. Which analysis steps can be performed when the SAP HANA database cannot be reached
using SQL?
Choose the correct answer.

X A Check the /var/log/messages for SAP HANA error messages.

X B Use the deadlock detector functionality of hdbcons.

X C Use the Health Monitor of SAP HANA Cockpit 2.0.

X D Check the hardware with the SAP HANA Hardware Configuration Check Tool.

Correct! The deadlock detector functionality is part of the hdbcons tool, and can be used
for analyzing an unreachable system. Read more about this in the lesson "Handling
System Hang but Reachable Situations" of the course HA215.


4. The Troubleshoot Unresponsive Systems function collects its data through the
fullSystemInfoDump.py python script.
Determine whether this statement is true or false.

X True

X False

Correct! The Troubleshoot Unresponsive Systems function collects its data through the
SAP Host Agent. Read more on this in the lesson "Handling System Hanging but
Reachable Situations" of the course HA215.

5. Which monitoring KPIs are provided in the Manage Services application?


Choose the correct answers.

X A CPU usage

X B Service status

X C Disk usage

X D Lock status

Correct! The CPU usage and Service status KPIs are shown in the Manage Services
application. Read more on this in the lesson "Analyzing a Suddenly Slow System" of the
course HA215.

6. Because SAP HANA is running in-memory, a slow system situation is always caused by a
problem at OS level.
Determine whether this statement is true or false.

X True

X False

Correct! A slow system can be caused by OS, network, hardware, other software, or the
SAP HANA database. The problem could be with any of these components. Read more on
this in the lesson "Analyzing a Suddenly Slow System" of the course HA215.



UNIT 2 Structural System Performance
Root Cause Analysis

Lesson 1
Analyzing Memory Issues 65

Lesson 2
Analyzing CPU Issues 79

Lesson 3
Analyzing Expensive Statement Issues 87

Lesson 4
Analyzing Disk and I/O Issues 95

UNIT OBJECTIVES

● Analyze memory issues


● Analyze CPU issues
● Analyze expensive statement issues
● Analyze disk and I/O issues



Unit 2
Lesson 1
Analyzing Memory Issues

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Analyze memory issues

Memory Architecture and Allocation

Figure 37: Course Content Overview

Memory Usage in the SAP HANA Database Environment


Memory is a fundamental resource of the SAP HANA database. Understanding how the SAP
HANA database requests, uses, and manages this resource is crucial to understanding SAP
HANA.
SAP HANA provides a variety of memory usage indicators that allow for monitoring, tracking,
and alerting. The most important indicators are used memory and peak used memory. Since
SAP HANA contains its own memory manager and memory pool, external indicators such as
the size of resident memory at host level and the size of virtual and resident memory at
process level can be misleading when you are estimating the real memory requirements of an
SAP HANA deployment.


You can find detailed information about memory consumption of individual SAP HANA
components and executed operations on the SAP HANA Performance Monitor - Memory
application.

Note:
See SAP Note 1704499 - "System Measurement for License Audit" for more
information about memory consumption with regards to SAP HANA licenses.

Virtual, Physical, and Resident Memory


When part of the virtually allocated memory actually needs to be used, it is loaded or mapped
to the real, physical memory of the host and becomes resident. Physical memory is the DRAM
memory installed on the host. On most SAP HANA hosts, it ranges from 256 gigabytes (GB)
to 1 terabyte (TB). It is used to run the Linux operating system, SAP HANA, and all other
programs.
Resident memory is the physical memory actually in operational use by a process. Over time,
the operating system may swap out some of a process's resident memory according to a
least-recently-used algorithm to make room for other data or code. Thus, a process's resident
memory size may fluctuate independently of its virtual memory size. A properly-sized SAP
HANA appliance has enough physical memory, so that swapping is disabled and should not be
observed.
This can be illustrated as shown in the figure, SAP HANA Resident Memory.

Figure 38: SAP HANA Resident Memory

On a typical SAP HANA appliance, the resident memory part of the operating system and all
other running programs usually does not exceed 2 GB. The rest of the memory is therefore
dedicated to SAP HANA.


When memory is required for table growth or for temporary computations, the SAP HANA
code obtains it from the existing memory pool. When the pool cannot satisfy the request, the
SAP HANA memory manager will request and reserve more memory from the operating
system. At this point, the virtual memory size of SAP HANA processes grows.
Once a temporary computation completes or a table is dropped, the freed memory is
returned to the memory manager, which recycles it to its pool without informing the operating
system. Therefore, from SAP HANA's perspective, the amount of used memory shrinks, but
the process’s virtual and resident memory sizes are not affected. This creates a situation
where the used memory value may shrink to below the size of SAP HANA's resident memory.
This is normal.
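The relationship between these indicators can be inspected per service in the monitoring view M_SERVICE_MEMORY; as described above, used memory below resident memory is normal:

```sql
-- HANA-internal used memory vs. OS-level resident and virtual memory
SELECT host, port, service_name,
       ROUND(total_memory_used_size / 1024 / 1024) AS used_mb,     -- HANA used memory
       ROUND(physical_memory_size / 1024 / 1024)   AS resident_mb, -- OS resident part
       ROUND(virtual_memory_size / 1024 / 1024)    AS virtual_mb   -- OS virtual size
FROM m_service_memory;
```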

Note:
The memory manager may also choose to return memory back to the operating
system, for example, when the pool is close to the allocation limit and contains
large unused parts.

SAP HANA Memory Usage and the Operating System


Because of the way SAP HANA manages memory, the relationship between Linux memory
indicators and SAP HANA's own memory indicators may not correlate as expected.
From the perspective of the Linux operating system, SAP HANA is a collection of separate
processes. Linux programs reserve memory for their use from the Linux operating system.
The entire reserved memory footprint of a program is referred to as its virtual memory. Each
Linux process has its own virtual memory, which grows when the process requests more
memory from the operating system, and shrinks when the process relinquishes unused
memory. You can think of virtual memory size as the memory amount that the process has
requested (or allocated) from the operating system, including reservations for its code, stack,
data, and memory pools under program control. SAP HANA's virtual memory is shown in the
figure, SAP HANA Memory Usage and the Operating System.

Figure 39: SAP HANA Memory Usage and the Operating System



Unit 2: Structural System Performance Root Cause Analysis

Note:
SAP HANA really consists of several separate processes, so the figure shows all
SAP HANA processes combined.

SAP HANA Used Memory


The total amount of memory used by SAP HANA is referred to as used memory. It includes
program code and stack, all data and system tables, and the memory required for temporary
computations.
SAP HANA consists of a number of processes running in the Linux operating environment.
Under Linux, the operating system is responsible for reserving memory for all processes.
When SAP HANA starts up, the operating system reserves memory for the program code
(sometimes called the text), the program stack, and static data. It then dynamically reserves
additional data memory when requested by the SAP HANA memory manager. Dynamically
allocated memory consists of heap memory and shared memory.
The figure, SAP HANA Used Memory, shows used memory, consisting of code, stack, and
table data.

Figure 40: SAP HANA Used Memory

Because the code and program stack occupy only about 6 GB, almost all of the used memory
is used for table storage, computations, and database management.
Memory is a fundamental resource of the SAP HANA database. Understanding how the SAP
HANA database requests, uses, and manages this resource is crucial to understanding SAP
HANA.
SAP HANA provides various memory usage indicators that enable monitoring, tracking, and
alerting. The most important indicators are those for used memory and peak used memory.
SAP HANA contains its own memory manager and memory pool. Certain external indicators
can be misleading when estimating the real memory requirements of an SAP HANA
deployment. Examples of these indicators include the size of resident memory at the host
level, and the size of virtual and resident memory at the process level.


Service Used Memory


An SAP HANA system consists of multiple services that all consume memory, in particular the
indexserver service, the main database service. The index server holds all the data tables and
temporary results, and therefore dominates SAP HANA used memory.

Peak Used Memory


Ultimately, it is more important to understand the behavior of used memory over time and
under peak loads. For this purpose, SAP HANA has a special used memory indicator called
peak used memory. While the value for used memory is a current measurement, peak used
memory keeps track of the maximum value of used memory over time.
You can also reset peak used memory. This can be useful if you want to establish the impact
of a certain workload on memory usage. So for example, you can reset peak used memory,
run the workload, and then examine the new peak used memory value.
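This workflow can also be performed in SQL. The following sketch follows the approach documented for SAP HANA used memory analysis; treat the exact views as assumptions to verify against the SAP HANA SQL and System Views Reference:

```sql
-- Current used memory, host-wide total in GB
SELECT ROUND(SUM(TOTAL_MEMORY_USED_SIZE) / 1024 / 1024 / 1024, 2) AS "Used Memory (GB)"
FROM SYS.M_SERVICE_MEMORY;

-- Peak used memory since the last reset (root allocator category '/')
SELECT ROUND(SUM(INCLUSIVE_PEAK_ALLOCATION_SIZE) / 1024 / 1024 / 1024, 2) AS "Peak Used (GB)"
FROM SYS.M_HEAP_MEMORY_RESET
WHERE CATEGORY = '/';

-- Reset the peak indicator before running the workload you want to measure
ALTER SYSTEM RESET MONITORING VIEW SYS.M_HEAP_MEMORY_RESET;
```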

Memory Usage of Tables


The dominant part of the used memory in the SAP HANA database is the space used by data
tables. Separate measurements are available for column-store tables and row-store tables.

Note:
The SAP HANA database loads column-store tables into memory column by
column only upon use. This is sometimes called "lazy loading". This means that
columns that are never used will not be loaded and memory waste is avoided.
When the SAP HANA database runs out of allocatable memory, it will try to free
up some memory by unloading unimportant data (such as caches) and even
table columns that have not been used recently. Therefore, to measure precisely
the total, or worst-case, amount of memory used for a particular table, ensure
that the table is first fully loaded into memory. You can do this by explicitly
loading the table into memory.
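A table can be loaded explicitly with the LOAD statement and its footprint checked in M_CS_TABLES. The schema and table names below are hypothetical placeholders:

```sql
-- Fully load all columns of a table into memory (hypothetical names)
LOAD "MYSCHEMA"."MYTABLE" ALL;

-- Verify the load state and the total in-memory size of the table
SELECT TABLE_NAME, LOADED,
       ROUND(MEMORY_SIZE_IN_TOTAL / 1024 / 1024, 1) AS "Memory (MB)"
FROM M_CS_TABLES
WHERE SCHEMA_NAME = 'MYSCHEMA' AND TABLE_NAME = 'MYTABLE';
```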

Memory Usage of Expensive Statements


Every query and statement consumes memory for the evaluation of the statement plan, for
caching, and, mainly, for the calculation of intermediate and final results. While many
statement executions use only a moderate amount of memory, some queries, for instance
those using unfiltered cross joins, will tax even very large systems.
Expensive statements are individual SQL statements whose execution time exceeded a
configured threshold. The expensive statements trace records information about these
statements for further analysis. If in addition to activating the expensive statements trace,
you enable per-statement memory tracking, the expensive statements trace will also show
the peak memory size used to execute expensive statements.
It is possible to further protect an SAP HANA system against excessive memory usage due to
uncontrolled queries by limiting the amount of memory used by single statement executions
per host.
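Both settings are configured in the global.ini file. The following sketch assumes the documented parameter locations (resource_tracking section for tracking, memorymanager section for the per-statement limit in GB); verify them against your release before use:

```sql
-- Enable resource tracking and per-statement memory tracking
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('resource_tracking', 'enable_tracking') = 'on',
      ('resource_tracking', 'memory_tracking') = 'on' WITH RECONFIGURE;

-- Limit the memory a single statement execution may use per host (value in GB)
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('memorymanager', 'statement_memory_limit') = '50' WITH RECONFIGURE;
```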

Memory Allocator Statistics


Detailed information about memory consumption can be found by looking into allocator
statistics.
Allocator statistics track the memory consumption of individual components and operations
in SAP HANA and may help you to analyze issues related to memory consumption. Statistics


are saved in the system views M_HEAP_MEMORY (allocated memory by component) and
M_CONTEXT_MEMORY (allocated memory that can be associated with a connection, a
statement, or a user). Both views have a reset feature so that statistics can be captured for a
specific period of time. The embedded statistics service also includes a view which tracks
memory allocation per host, HOST_HEAP_ALLOCATORS.

Note:
For full details of these views, see the SAP HANA SQL and System Views
Reference.
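For example, the allocators currently holding the most memory can be listed from M_HEAP_MEMORY; this is a simple sketch using documented columns:

```sql
-- Top 10 heap allocators by memory currently in use
SELECT HOST, PORT, CATEGORY,
       ROUND(EXCLUSIVE_SIZE_IN_USE / 1024 / 1024, 1) AS "In Use (MB)"
FROM M_HEAP_MEMORY
ORDER BY EXCLUSIVE_SIZE_IN_USE DESC
LIMIT 10;
```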

Allocator statistics are saved automatically for each logical core. On systems with a large
number of logical cores, the statistics themselves can consume a significant amount of
memory. To save memory, statistics logging can be reduced so that statistics are saved only
per NUMA node, or only per statistics object. An example of using the lscpu command to
retrieve details of the physical and logical CPU architecture is given in the section Controlling
CPU Consumption.
You can configure this feature by setting values for the following two configuration
parameters in the global.ini file:
● The parameter pool_statistics_striping can reduce the amount of memory
consumed by the component-specific allocator statistics (rows in M_HEAP_MEMORY with
the category PoolAllocator).
● The parameter composite_statistics_striping can reduce the amount of memory
consumed by statement-specific allocator statistics (rows in M_CONTEXT_MEMORY).

The parameters can be set to one of the following values (the configuration can be changed
online, but the change will only affect newly created statistic objects):

Value and effect:
● auto (default value): The system decides the statistics strategy. By default, SAP HANA
tries to utilize as much memory as possible for maximum performance.
● core: The system allocates one stripe per logical core.
● numa: The system allocates only one stripe per NUMA node.
● none: The system creates a single stripe per statistics object.

Allocated Memory Pools and Allocation Limits


SAP HANA, across its different processes, reserves a pool of memory before actual use. This
pool of allocated memory is preallocated from the operating system over time, up to a
predefined global allocation limit, and is then efficiently used by SAP HANA as needed.
SAP HANA preallocates and manages its own memory pool, used for storing in-memory table
data, thread stacks, temporary results, and other system data structures. When more
memory is required for table growth or temporary computations, the SAP HANA memory
manager obtains it from the pool. When the pool cannot satisfy the request, the memory


manager increases the pool size by requesting more memory from the operating system, up
to a predefined allocation limit.

The global_allocation_limit Parameter


The global_allocation_limit parameter is used to limit the amount of memory that can
be used by the SAP HANA database. The unit for this parameter is MB.
The default value is 0, in which case the global allocation limit is calculated as follows:
● Memory rule 1: physical memory <= 10 GB – physical memory on the host minus 1 GB
● Memory rule 2: physical memory > 10 GB and <= 64 GB – 90% of the physical memory on
the host
● Memory rule 3: physical memory > 64 GB – the amount given by memory rule 2 for the
first 64 GB, plus 97% of each further GB
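For example, on a host with 128 GB of physical memory, the default limit works out to 90% of the first 64 GB plus 97% of the remaining 64 GB. The arithmetic can be checked directly against the DUMMY table:

```sql
-- Illustrative default global allocation limit for a 128 GB host:
-- 90% of the first 64 GB + 97% of each further GB
SELECT 0.90 * 64 + 0.97 * (128 - 64) AS "Default Limit (GB)" FROM DUMMY;
-- 57.6 + 62.08 = 119.68 GB
```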

Alternatively, you can define this limit as a flexible percentage of the available main memory
size. If you enter a percentage value, the precise value of the limit is calculated automatically
by the system. The percentage value is very useful when running SAP HANA in a virtual
environment: if you change the size of the VM container where the system runs, the allocation
limit automatically adjusts to the correct percentage of the new container size.

Note:
Changing this parameter does not require a restart.

Figure 41: Example: Usage of the global_allocation_limit Parameter


There is normally no reason to change the value of this parameter, although, for example, on
development systems with more than one SAP HANA system installed on a single host, you
could limit the size of the memory pool to avoid resource contention or conflicts.
A change may also be necessary to remain in compliance with the memory allowance of your
license, if you purchased a license for less than the total amount of physical memory available.
This is illustrated in the figure, Example: Usage of the global_allocation_limit Parameter.
To change the global memory allocation limit, you must do the following:

1. Ensure you have the system privilege INIFILE ADMIN.

2. Change the value of the global_allocation_limit parameter in the configuration file


global.ini → section: memorymanager using SAP HANA Cockpit or SQL.

Caution:
As of SPS04 and the introduction of the parameter configuration framework,
SAP recommends not to edit the INI files directly when SAP HANA is online.

If you only enter a value for the system, it is used for all hosts. For example, if you have five
hosts and you set the limit to 5 GB, the database can use up to 5 GB on each host (25 GB in
total). If you enter a value for a specific host, the specific value is used for that host and the
system value is used for all other hosts. This is only relevant for multiple-host (distributed)
systems.

Service Allocation Limit

Figure 42: Memory Architecture and Allocation – Example 2


In addition to the global allocation limit, each service running on the host has an allocation
limit: the service allocation limit. Given that collectively, all services cannot consume more
memory than the global allocation limit, each service has what is called an effective allocation
limit. The effective allocation limit of a service specifies how much physical memory a service
can consume in reality, considering the current memory consumption of other services.
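The service-level values can be inspected in the M_SERVICE_MEMORY view; a minimal sketch using documented columns:

```sql
-- Used memory and allocation limits per service (values in GB)
SELECT HOST, SERVICE_NAME,
       ROUND(TOTAL_MEMORY_USED_SIZE     / 1024 / 1024 / 1024, 1) AS "Used (GB)",
       ROUND(ALLOCATION_LIMIT           / 1024 / 1024 / 1024, 1) AS "Service Limit (GB)",
       ROUND(EFFECTIVE_ALLOCATION_LIMIT / 1024 / 1024 / 1024, 1) AS "Effective Limit (GB)"
FROM M_SERVICE_MEMORY;
```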

Figure 43: Change Tenant Memory Allocation Limit

What Happens When the Allocation Limit is Reached?


Memory is a finite resource. Once the allocation limit is reached and the pool is exhausted, the
memory manager can no longer allocate memory for internal operations without first giving
up something else. Buffers and caches are released, and column store tables are unloaded,
column-by-column, based on a least-recently-used order, up to a preset lower limit. When
tables are partitioned over several hosts, this is managed on a host-by-host basis, that is,
column partitions are unloaded only on hosts with an acute memory shortage.
Table (column or partition) unloading is generally not a good situation, since it leads to
performance degradation later, when the data must be reloaded for the queries that need it.
You can identify pool exhaustion by examining the M_CS_UNLOADS system view.
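A simple way to check for memory-driven unloads is to look at the most recent entries in that view; REASON = 'LOW MEMORY' indicates unloads forced by pool exhaustion:

```sql
-- Recent column unloads; REASON = 'LOW MEMORY' points to pool exhaustion
SELECT UNLOAD_TIME, HOST, TABLE_NAME, COLUMN_NAME, REASON
FROM M_CS_UNLOADS
ORDER BY UNLOAD_TIME DESC
LIMIT 20;
```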
However, it is still possible that the memory manager needs more memory than is available,
leading to an out-of-memory failure. This may happen, for example, when too many
concurrent transactions use up all memory, or when a particularly complex query performs a
cross join on very large tables and creates a huge intermediate result that exceeds the
available memory.


Memory Information in Performance Monitor

Figure 44: Memory Information in Performance Monitor

You can find the Performance Monitor, with all preselected memory indicators, in the Memory
Usage card of the SAP HANA cockpit.
The Performance Monitor provides an overview of the general memory situation, with time-
based statistics for the following indicators:
● Database resident memory
● Total resident memory
● Physical memory size
● Database used memory
● Database allocation limit

For all running services, it provides the following respective indicators:


● Memory used
● Memory allocation limit


Memory Information from SQL Commands

Figure 45: Memory Information from SQL Commands

SAP Note: 1969700 - SQL Statement Collection for SAP HANA contains several commands
that are useful to analyze memory-related issues. Based on your needs, you can configure
restrictions and parameters in the sections marked /* Modification section */.
The most important memory-related analysis queries are listed here. Note that some queries
have version-specific variations identified in the file names:
● HANA_Memory_Overview_1.00.vv
Provides an overview of current memory information.
● HANA_Memory_TopConsumers_1.00.vv
Displays the areas with the current highest memory requirements: columnstore and
rowstore tables, heap, code, and stack.
● HANA_Memory_SharedMemory*
Shows the currently used and allocated shared memory per host and service.
● HANA_Memory_TopConsumers_History_1.00.vv (+_ESS)
Displays the areas with the highest historical memory requirements: columnstore and
rowstore tables, heap, code, and stack. Optionally, it can include results for the Embedded
Statistics Server.


Memory Information from Logs and Traces

Figure 46: Memory Information from Logs and Traces

In the case of critical memory issues, you can often find more detailed information in logs and
trace files, as follows:
● Identify memory-related errors in the SAP HANA system alert trace files. Search for the
strings memory, allocat, or OOM.

Note:
The search is not case-sensitive.

● Check if an Out of Memory (OOM) trace file was created.


Figure 47: Check for OOM in SAP HANA Cockpit

In SAP HANA Cockpit, the number of Out of Memory (OOM) events is displayed on the
Memory Usage card. Use the Analyze Memory History application to investigate the root
cause of the OOM events. You can find the Analyze Memory History application by choosing
the More button on the Memory Usage card.
Select the Out of Memory Events tab to display, on the lower chart, the number of unique
out-of-memory events that occurred in the time range specified in the header. The number of
events shown depends on your selected time range, not the vertical selection bar. The list
shows the following information on the OOM events:
● Occurrences: The number of times a specific OOM event has been triggered
● Last Occurrence: The time and date of the most recent occurrence of the OOM event
● Last Reason: The parameter that triggered the most recent occurrence of the OOM event
● Statement: The SQL statement related to the OOM event
● Statement Hash: The unique identifier for the OOM event. To open the Workload Analyzer
and investigate the event, choose the OOM identifier.

Hint:
When investigating from the SYSTEMDB, if an event has a corresponding OOM
dump file, you can select View Trace to launch the Dump Viewer in the SAP
Database Explorer.


In the Memory Statistics charts, you can choose to display historical data for a time range
between 24 hours and six weeks. To display a date range longer than six weeks (42 days), you
can use SQL to update the RETENTION_DAYS_CURRENT value in the table
"_SYS_STATISTICS"."STATISTICS_SCHEDULE".
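A possible sketch of such an update follows. The column names are taken from the statistics service schema and the WHERE filter (all collectors currently at the 42-day default) is an assumption; inspect the table first and restrict the update to the collectors you actually need:

```sql
-- Inspect current retention settings of the statistics service collectors
SELECT ID, STATUS, RETENTION_DAYS_DEFAULT, RETENTION_DAYS_CURRENT
FROM "_SYS_STATISTICS"."STATISTICS_SCHEDULE";

-- Raise retention to 90 days for collectors still at the 42-day default
-- (hypothetical filter; adjust before running on a real system)
UPDATE "_SYS_STATISTICS"."STATISTICS_SCHEDULE"
SET RETENTION_DAYS_CURRENT = 90
WHERE RETENTION_DAYS_CURRENT = 42;
```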
If you need help from SAP Customer Support to perform an in-depth analysis, add the
following valuable information to the ticket:
● Diagnosis information (full system info dump)
● Performance trace, which provides detailed information on the system behavior, including
statement execution details

The trace output is written to the trace file perftrace.tpt, which must be sent to SAP Customer
Support.
If specific SAP HANA system components need deeper investigation, SAP Customer Support
may ask you to raise the corresponding trace levels to INFO or DEBUG. To do this, follow
these steps:

1. Launch the Database Trace wizard.

2. Select the Show All Components checkbox.

3. Enter the search string.

4. Select the found component in the indexserver.ini file.

5. Change the system trace level to the appropriate values.

Some trace components, for example join_eval = DEBUG, can create many megabytes of
trace information. They require an increase of the values maxfiles and maxfilesize in the
trace section of the global.ini file.
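The same changes can be made with SQL instead of the wizard. This is a sketch under the assumption that the component in question is join_eval (the example from the text) and that maxfilesize is given in bytes:

```sql
-- Raise the trace level of one component in indexserver.ini
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('trace', 'join_eval') = 'debug' WITH RECONFIGURE;

-- Allow more and larger trace files to hold the additional output
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('trace', 'maxfiles') = '20',
      ('trace', 'maxfilesize') = '50000000' WITH RECONFIGURE;
```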

LESSON SUMMARY
You should now be able to:
● Analyze memory issues



Unit 2
Lesson 2
Analyzing CPU Issues

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Analyze CPU issues

CPU-related Issues
This lesson covers the troubleshooting of high CPU consumption on the system.

Figure 48: Course Content Overview

A constantly high CPU consumption leads to a considerably slower system, where no more
requests can be processed. From an end-user perspective, the application behaves slowly, is
unresponsive, or can seem to hang.

Note:
Optimal CPU utilization is the desired behavior for SAP HANA. Therefore, high CPU
usage is nothing to worry about unless the CPU becomes a bottleneck. SAP
HANA is optimized to consume all the memory and CPU available. The software
parallelizes queries as much as possible, to ensure optimal performance. Therefore,
CPU usage near 100% during a query execution does not always mean that
there is an issue.


Indicators of CPU-related Issues


CPU-related issues are indicated by alerts, or on cards in the SAP HANA Cockpit 2.0.
The following alerts may indicate CPU resource problems:
● Host CPU Usage (Alert 5)
● Most Recent Savepoint Operation (Alert 28)
● Savepoint Duration (Alert 54)

The following sources alert you to high CPU consumption on your SAP HANA database:
● Alert 5 (Host CPU usage) for current or past CPU usage
● The displayed CPU usage on the overview screen

Figure 49: Indicators of CPU-related Issues

The load graph in the figure, Indicators of CPU-related Issues, shows current high CPU
consumption, or high consumption in the past.
Choose the CPU Usage card to see detailed CPU usage. In the detailed graph, several CPU-
related KPIs are shown.


Figure 50: KPI Details of CPU-related Issues

On the left side of the figure, KPI Details of CPU-related Issues, the legend shows which color
represents which KPI. By default all KPIs are shown, which can make the graph appear
cluttered. Use the checkboxes in the legend to show or hide KPIs.
You can display a specific time period to investigate by using the From and To fields.

Create Additional KPIs Using Add Chart


Sometimes the default KPIs are not sufficient, and you need to combine other KPIs by using
the Add Chart button. To create an additional chart, you need to give it a name and select the
services and KPIs that you want to display.


Figure 51: Adding Charts with more CPU-Related KPIs

You can rearrange the display order of the charts using the Settings button in the top-right
corner. You can also delete the charts, if needed.

Analysis of CPU-related Issues


When analyzing high CPU consumption, you must distinguish between the CPU resources
consumed by SAP HANA itself, and those consumed by other, non-SAP HANA processes on
the host. While the CPU consumption of SAP HANA is addressed here in detail, the CPU
consumption of other processes running on the same host is not covered. Such situations are
often caused by additional programs running concurrently on the SAP HANA appliance, such
as anti-virus and backup software. For more information, see SAP Note: 1730928.
A good starting point for the analysis is the Database Overview screen in the SAP HANA
cockpit. It displays the overall SAP HANA CPU usage, including all processes on the host, and
keeps track of the maximum CPU usage that occurred over the last two hours. If SAP HANA
CPU usage is low, then everything is fine. If the card shows high CPU usage, there is most
likely an issue on the SAP HANA database server. You should start an investigation to find the
cause of the problem.
To find out what is happening in more detail, open the Threads card from the Database
Overview page. To perform CPU time analysis, follow these steps:
● In the resource_tracking section of the global.ini configuration file, check that the
cpu_time_measurement_mode and enable_tracking parameters are set to on. This
ensures that resource tracking is switched on.
● Add the CPU Time column to the display, using the settings on the right of the Threads
screen.
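The first step can be sketched in SQL as follows, assuming the two parameters live in the resource_tracking section of global.ini as described above:

```sql
-- Switch on resource tracking and CPU time measurement
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('resource_tracking', 'enable_tracking') = 'on',
      ('resource_tracking', 'cpu_time_measurement_mode') = 'on' WITH RECONFIGURE;
```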


The Thread Monitor now shows, in milliseconds, the CPU time of each thread running on SAP
HANA. A high CPU time for related threads indicates that an operation is causing increased
CPU consumption.
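The same thread information is available in the M_SERVICE_THREADS view. This is a sketch that assumes resource tracking is already enabled; CPU_TIME_SELF is reported in microseconds:

```sql
-- Threads ranked by their own CPU time (microseconds)
SELECT HOST, THREAD_ID, THREAD_TYPE, THREAD_STATE, CPU_TIME_SELF
FROM M_SERVICE_THREADS
ORDER BY CPU_TIME_SELF DESC
LIMIT 10;
```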

Figure 52: Threads Details – CPU-related Issues

To identify the expensive statements causing the high resource consumption, turn on the
expensive statement trace. This trace is accessed using the Monitor Expensive Statements
link in the Monitoring section of the SAP HANA database Overview screen. Start the expensive
statements trace and specify a reasonable run time. If possible, add further restrictive criteria
such as database user, or application user, to narrow down the amount of information traced.

Note:
When resource tracking is activated, the CPU time for each statement is
shown in the CPU_TIME column.


Figure 53: Expensive Statements Trace – CPU Time

Resolving CPU-related Issues


When resolving CPU-related issues, the priority is returning the system to a normal
operating state. This prioritization may make it more difficult to identify the root cause of the
problem.
After resolving the situation, it may not be possible to find out the actual root cause.
Therefore, record the state of the system under high load for later analysis by collecting a full
system info dump (see the lesson “Handling System Offline Situations”, earlier in this course).
To stop the operation causing high CPU consumption, go to the Thread card. In the Client
Host, Client IP, Client PID, and Application User columns (use the Settings button to add the
columns) you can identify the user that triggered the operation. To resolve the situation,
contact the user and clarify the actions they are performing.


Figure 54: Expensive Statements Trace – Resolve CPU-related Issues

As soon as this is clarified and you agree on how to resolve the situation, two options are
available:
● On the client side, end the process calling the affected threads.
● Cancel the operation that is related to the affected threads.
To do this, select the identified thread in the Thread card and choose Cancel.

For further analysis on the root cause, open a ticket in SAP HANA Development Support and
attach the full system info dump, if available.

Retrospective Analysis of CPU-related Issues


There are several ways to analyze the root cause of an issue after it is resolved.
To perform a retrospective analysis of high CPU consumption, check the CPU Usage and
Alerts cards. The alert timestamp, combined with the data in the CPU Usage card, helps you
to determine the time frame of high CPU consumption. If you are unable to determine the
time frame, check the HOST_RESOURCE_UTILIZATION_STATISTICS statistics server table,
in the _SYS_STATISTICS schema. This table provides historical host resource information
for up to 30 days.
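The historical CPU figures can be read with a query such as the following. The column names mirror those of M_HOST_RESOURCE_UTILIZATION and are an assumption to verify against your release:

```sql
-- Host CPU history from the embedded statistics service (up to 30 days)
SELECT SERVER_TIMESTAMP, HOST,
       TOTAL_CPU_USER_TIME, TOTAL_CPU_SYSTEM_TIME, TOTAL_CPU_IDLE_TIME
FROM "_SYS_STATISTICS"."HOST_RESOURCE_UTILIZATION_STATISTICS"
ORDER BY SERVER_TIMESTAMP DESC;
```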
With this information, search through the trace files of the responsible process. Be careful to
choose the correct host when SAP HANA runs on a scale-out landscape. The information
contained in the trace files provides indications of the threads or queries that were running
during the affected time frame.
If the issue recurs, due to scheduled batch jobs or data loading processes, turn on the
Expensive Statements trace to record all involved statements. Also, check for background
jobs running concurrently, like backups and delta merge. These jobs may cause a resource


shortage when running in parallel. Historical information about background jobs can be
obtained from the following system views:
● M_BACKUP_CATALOG

● M_DELTA_MERGE_STATISTICS

A longer history can be found in the HOST_DELTA_MERGE_STATISTICS statistics server


table, in the _SYS_STATISTICS schema.
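To check whether background jobs overlapped with the affected time frame, both views can be filtered on a time window. The window below is a hypothetical example; substitute the period you determined earlier:

```sql
-- Backups that ran during the affected time frame (hypothetical window)
SELECT ENTRY_TYPE_NAME, SYS_START_TIME, SYS_END_TIME, STATE_NAME
FROM M_BACKUP_CATALOG
WHERE SYS_START_TIME BETWEEN '2022-01-01 08:00:00' AND '2022-01-01 10:00:00';

-- Delta merges in the same window
SELECT START_TIME, TABLE_NAME, EXECUTION_TIME
FROM M_DELTA_MERGE_STATISTICS
WHERE START_TIME BETWEEN '2022-01-01 08:00:00' AND '2022-01-01 10:00:00';
```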

LESSON SUMMARY
You should now be able to:
● Analyze CPU issues



Unit 2
Lesson 3
Analyzing Expensive Statement Issues

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Analyze expensive statement issues

Analyze Statements with the Expensive Statements Monitor

Figure 55: Course Content Overview

SQL statements that process a high amount of data, or that use inefficient processing
strategies, can be responsible for increased memory requirements.

How to Analyze SQL Statements


A key step in identifying the source of poor performance is understanding how much time
SAP HANA spends on query execution. By analyzing SQL statements and calculating their
response times, you can better understand how the statements affect application and system
performance.
You can analyze the response time of SQL statements using the following tools:
● Monitor Statements
● SQL Trace
From the trace file, you can analyze the response time of SQL statements.
● Expensive Statements Trace
On the Expensive Statements tab, you can view a list of all SQL statements that exceed a
specified response time.

In addition, you can analyze the SQL plan cache, which provides a statistical overview of the
statements that are executed in the system.

Figure 56: The Monitor Statements Application

Use the Monitor Statements page to analyze the current most critical statements running in
the database.
Analyzing the current most critical statements running in the SAP HANA database can help
you identify the root cause of poor performance, CPU bottlenecks, or Out of Memory
situations. Enabling memory tracking allows you to monitor the amount of memory used by
single statement executions.
The SQL Statements card displays the number of long-running statements and long-running
blocking situations currently active in the database. Statements are ranked based on a
combination of the following criteria:
● Runtime of the current statement execution
● Lock wait time of the current statement execution
● Cursor duration of the current statement execution

Open the Monitor Statements page by choosing either the long-running statements or long-
running blocked statements on the card. The Monitor Statements page allows you to analyze


the most current statements running in the database. Here you can see the following
information:
● The 100 most critical statements, listed in order of the longest runtime
● The full statement string and ID of the session in which the statement is running
● The application, the application user, and the database running the statement
● Whether a statement is related to a blocking transaction

To support monitoring, you can perform the following actions on the Monitor Statements
page:
● If a statement is in a blocked transaction or using an excessive amount of memory, you
can cancel the session that the statement is running in (or the blocked session) by
choosing Cancel Session in the footer toolbar.
● To access information about the memory consumption of statements, choose Enable
Memory Tracking in the footer toolbar.
● To set up or modify workload classes, choose a statement's Workload Class name. To
create a new workload class, choose New, or, to select a workload class from a list, choose
Existing, and fill out the fields.

Use and Analyze the Expensive Statements Trace


Use the Expensive Statements trace to analyze individual SQL queries whose execution time
is above a configured threshold. Analyzing expensive statements can help you to understand
why they exceed duration thresholds.
By default, the Expensive Statements trace is deactivated.
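Besides the cockpit, the trace can be configured with SQL. This sketch assumes the documented expensive_statement section of global.ini, with the duration threshold given in microseconds:

```sql
-- Activate the trace and set the duration threshold to 1 second (1,000,000 µs)
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('expensive_statement', 'enable') = 'true',
      ('expensive_statement', 'threshold_duration') = '1000000' WITH RECONFIGURE;
```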


Figure 57: Expensive Statements Trace (1 of 2)

You can find the Expensive Statements trace in the Monitor Statements application.


Figure 58: Expensive Statements Trace (2 of 2)

The Expensive Statements trace records information about the expensive statements for
further analysis and displays it on the Expensive Statements tab.
To support monitoring and analysis, you can perform the following actions on the Expensive
Statements Trace page:
● The expensive statements trace is deactivated by default. To activate and configure it, in
the footer bar, choose Configure Trace.
● Define the monitored date.
● Filter expensive statements, refresh the list, choose the sorting parameter, and filter by
parameter.
● To save the data sets as text or HTML files, choose Save As...
● To configure the threshold parameters, choose Configure Trace and enter information on
the Configure Expensive Trace page.
● To open an expensive statement with the SQL analyzer, next to the statement string,
choose More.
● Set up or modify workload classes by choosing a statement's workload class name. To
create a new workload class, choose New, or to select a workload class from a list, choose
Existing, and fill out the fields.

Use and Analyze the SQL Plan Cache


Use the SQL plan cache to get an insight into the workload of the SAP HANA database. It lists
all statements currently cached in the SAP HANA database.
Analyzing all statements currently cached in the SAP HANA database can help you to identify
statement hashes, as well as to determine whether a statement has been cached correctly.


Figure 59: SQL Plan Cache

You can find the SQL Plan Cache tab in the Monitor Statements application.
Technically, the plan cache stores compiled execution plans of SQL statements for reuse,
which gives a performance advantage over recompilation at each invocation. For monitoring
reasons, the plan cache keeps statistics about each plan, for example, the number of
executions, minimum/maximum/total/average runtime, and lock/wait statistics. Analyzing
the plan cache is very helpful as one of the first steps in performance analysis because it gives
an overview of the statements that are executed in the system.

Note:
Due to the nature of a cache, seldom-used entries are removed from the plan
cache.

The SQL plan cache is useful for observing overall SQL performance because it provides
statistics on compiled queries. You can get an insight into frequently executed queries and
slow queries with a view to finding potential candidates for optimization.
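Because these statistics are exposed through the monitoring view M_SQL_PLAN_CACHE, candidates for optimization can also be identified with a query such as the following sketch (times are recorded in microseconds; column names as documented for SPS05):

```sql
-- Statements with the highest accumulated runtime in the plan cache.
SELECT TOP 10
       statement_hash,
       execution_count,
       total_execution_time / 1000 AS total_exec_ms,
       avg_execution_time / 1000   AS avg_exec_ms,
       SUBSTRING(statement_string, 1, 80) AS statement_start
FROM   m_sql_plan_cache
ORDER  BY total_execution_time DESC;
```

Sorting by EXECUTION_COUNT instead highlights the most frequently executed statements, where even a small per-execution improvement pays off.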
To support monitoring and analysis, you can perform the following actions on the SQL Plan
Cache page:
● To open an SQL statement with the SQL analyzer, next to the statement string, choose
More.
● To save the data sets as a text or HTML file, choose Save As...
● The collection of SQL plan cache statistics is enabled by default. To disable it, choose
Configure.


LESSON SUMMARY
You should now be able to:
● Analyze expensive statement issues



Unit 2
Lesson 4
Analyzing Disk and I/O Issues

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Analyze disk and I/O issues

SAP HANA Storage Usage

SAP HANA Storage and I/O Usage


SAP HANA operates with all data in-memory, but also uses persistent storage to keep the
data safe in case of a system failure. Data changes are stored immediately (synchronously) in
the Log Volume, and the transaction data and undo data are stored (asynchronously) in the
Data Volume using savepoints.

Data Volume explained


● Contains SQL data and undo log information
● Stores additional SAP HANA information, such as modeling data
● Data kept in-memory to ensure maximum performance
● Write process is asynchronous

Log Volume explained


● Information about data changes (redo log)
● Directly saved to persistent storage when transaction is committed (synchronous)
● Cyclical overwrite (only after backup)


Figure 60: I/O Pattern per Operation

Although SAP HANA is an in-memory database, I/O still plays a critical role in system
performance. From an end-user perspective, if there are issues with I/O performance, an
application, or the system as a whole, runs sluggishly, is unresponsive, or can seem to hang.
In certain scenarios, data is read from or written to disk, for example during the COMMIT
transaction. Normally, this is done asynchronously, but at certain points, synchronous I/O is
performed. Even during asynchronous I/O, important data structures may be locked.
Here are some details for each of the scenarios:

Savepoint
A savepoint ensures that all changed, persistent data since the last savepoint is written to
disk.
By default, the SAP HANA database triggers savepoints at five-minute intervals. Data is
automatically saved from memory to the data volume located on disk. Depending on the
type of data, the block sizes vary between 4 KB and 16 MB.
Savepoints run asynchronously to SAP HANA update operations. Database update
transactions only wait at the critical phase of the savepoint, which usually takes
microseconds.
Write Transactions
All changes to persistent data are captured in the redo log. SAP HANA asynchronously
writes the redo log with I/O orders of 4 KB to 1 MB size into log segments. Transactions
writing a Commit into the Redo log wait until the buffer containing the commit has been
written to the log volume.
Delta Merge
The delta merge itself takes place in-memory. Updates to column store tables are stored
in the delta storage. During the delta merge, these changes are applied to the main
storage, where they are stored, read, optimized, and compressed. After the delta merge
is complete, the new main storage is persisted in the data volume, that is, written to disk.
The delta merge does not block parallel read and update transactions.
Data Backup
For a data backup, the current payload of the data volumes is read and copied to backup
storage. When writing a backup, it is essential that there are no collisions with other
transactional operations running against the database on the I/O connection.


Log Backup
Log backups store the content of a closed log segment. They are automatically and
asynchronously created by reading the payload from the log segments, and writing them
to the backup area.
Snapshot
SAP HANA database snapshots are used by certain operations, such as backup and
system copy. They are created by triggering a system-wide consistent savepoint. The
system keeps the blocks belonging to the snapshot at least until the drop of the
snapshot. Detailed information about snapshots can be found in the SAP HANA
Administration Guide.
Database Restart
At database startup, the services load their persistence, including catalog and row store
tables, into memory. This means that the persistence is read from the storage.
Additionally, the redo log entries written after the last savepoint are read from the log
volume and replayed in the data area in-memory. When this is finished, the database is
accessible. The bigger the row store, the longer it takes for the system to become
available for operation again.
Database Recovery
The restore of a data backup reads the backup content from the backup device and
writes it to the SAP HANA data volumes. The I/O write orders of the data recovery have
a size of 64 MB. The redo log can be replayed during a database recovery, that is, the log
backups are read from the backup device and the log entries get replayed.
Failover (Host Auto-FailOver)
On the standby host, the services run in idle mode. Upon failover, the data and log
volumes of the failed host are automatically assigned to the standby host. The standby
host then has read and write access to the files of the failed active host. Row and Column
Store tables (the latter on demand) must be loaded into memory. The log entries have to
be replayed.
Takeover (System Replication)
The secondary system is already running, services are active but cannot accept SQL, and
thus are not usable by the application. As in the database restart (described earlier) the
row store tables need to be loaded into memory from persistent storage. If table preload
is used, then most of the column store tables are already in-memory. During
takeover, the replicated redo logs, shipped since the last data transport from primary to
secondary, must be replayed.
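The I/O activity caused by the operations described above can be observed in the monitoring view M_VOLUME_IO_TOTAL_STATISTICS, which accumulates read and write figures per volume. A minimal sketch, assuming the documented SPS05 column names:

```sql
-- Accumulated I/O per data and log volume since the last reset.
-- Sizes are in bytes, times in microseconds.
SELECT host, port, type,
       total_read_size / 1024 / 1024  AS read_mb,
       total_write_size / 1024 / 1024 AS write_mb,
       total_read_time / 1000         AS read_ms,
       total_write_time / 1000        AS write_ms
FROM   m_volume_io_total_statistics
WHERE  type IN ('DATA', 'LOG')
ORDER  BY host, type;
```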


SAP HANA Disk-related Alerts

Figure 61: SAP HANA Disk-related Alerts

Alert 2 - Disk usage


Determines what percentage of each disk containing data, log, and trace files is used.
This includes space used by non-SAP HANA files.
Alert 28 - Most recent savepoint operation
Determines how long ago the last savepoint was defined, that is, when a complete,
consistent image of the database was persisted to disk.
Alert 30 - Check internal disk full event
Determines whether or not the disks to which data and log files are written are full. A
disk-full event causes your database to stop and must be resolved. This alert is issued
when it is not possible to write to one of the disk volumes used for data, log, backup, or
trace files. Besides running out of disk space, there are other possible causes; all of
them lead to this alert.
Issues that may prevent SAP HANA from writing to disk include the following:
● File system quota is exceeded.
● File system ran out of inodes.
● File system has errors (bugs).

In most cases, the solution is to free up disk space.


Alert 34 - Unavailable volumes
Determines whether or not all volumes are available.
Alert 50 - Number of diagnosis files
Determines the number of diagnosis files written by the system (excluding ZIP files). An
unusually large number of files can indicate a problem with the database (for example,
problems with trace file rotation or a high number of crashes).
Alert 51 - Size of diagnosis files


Identifies large diagnosis files. Unusually large files can indicate a problem with the
database.
Alert 52 - Crashdump files
Identifies new crashdump files that have been generated in the trace directory of the
system.
Alert 53 - Pagedump files
Identifies new pagedump files that have been generated in the trace directory of the
system.
Alert 54 - Savepoint duration
Identifies long-running savepoint operations.
Alert 60 - Sync/Async read ratio
Identifies a bad trigger asynchronous read ratio. This means that asynchronous reads are
blocking and behave almost like synchronous reads. This might have negative impact on
SAP HANA I/O performance in certain scenarios.
Alert 61 - Sync/Async write ratio
Identifies a bad trigger asynchronous write ratio. This means that asynchronous writes
are blocking, and behave almost like synchronous writes. This may have a negative
impact on SAP HANA I/O performance in certain scenarios.
Alert 77 - Database disk usage
Determines the total used disk space of the database. All data, logs, traces and backups
are considered.
Alert 89 - Missing volume files
Determines if there is any volume file missing.
Alert 113 - Host open file count
Determines what percentage of total open file handles are in use. All processes are
considered, including non-SAP HANA processes. Compare
M_HOST_RESOURCE_UTILIZATION.OPEN_FILE_COUNT with
M_HOST_INFORMATION.VALUE of M_HOST_INFORMATION.KEY open_file_limit.
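The comparison described for alert 113 can be expressed as a single query. This is a sketch based on the documented monitoring views; verify the key name open_file_limit on your revision:

```sql
-- Percentage of the host open file limit that is currently in use.
SELECT u.host,
       u.open_file_count,
       TO_BIGINT(i.value) AS open_file_limit,
       ROUND(100.0 * u.open_file_count / TO_BIGINT(i.value), 1) AS used_pct
FROM   m_host_resource_utilization u
  JOIN m_host_information i ON i.host = u.host
WHERE  i.key = 'open_file_limit';
```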

Note:
For more information about SAP HANA alerts, see SAP Note 2445867 - "How-To:
Interpreting and Resolving SAP HANA Alerts".

Monitoring the Persistence Storage


In the Database Directory screen, in the Disk column, the mini graph turns red and the disk
usage percentage is very high to signal a disk-full situation, but you cannot see which disk is
running out of space. To get more detailed information on the disk usage, you can choose the
disk space indicator in the Disk column. The Performance Monitor app opens and the Disk
Used and Disk Size KPIs are shown. You can add additional disk-related KPIs to get a better
insight into what is filling up the file system.
The Alerts column will show that there are alerts, but you can't see which alert was triggered.
To get more detailed information on the alerts, you can choose the alert shown in the Alerts
column. The Alerts app will open and the alert details are shown.
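To see which disk is actually running out of space, you can also query the monitoring view M_DISKS directly. The following sketch ranks the disks by fill level (sizes are in bytes; column names as documented for SPS05):

```sql
-- Fill level per disk used by the database, fullest first.
SELECT host, usage_type, path,
       ROUND(used_size / 1024 / 1024 / 1024, 2)  AS used_gb,
       ROUND(total_size / 1024 / 1024 / 1024, 2) AS total_gb,
       ROUND(100.0 * used_size / total_size, 1)  AS used_pct
FROM   m_disks
ORDER  BY used_pct DESC;
```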


Figure 62: SAP HANA Cockpit Storage Information

In SAP HANA cockpit, disk-related information is found via the Disk Usage card and the
Monitor Disk Volume application.

Figure 63: Disk Usage Information

The Monitor Disk Volume application provides information about the size of the data and log
volumes, the storage locations, the disk storage throughput, and the used page statistics in
the data volume.


The Disk Usage card opens the Performance Monitor showing the Disk Size and Disk Used
information over time. This information helps you to understand when the growth of the used
disk space started and when the system ran out of space. By adding additional KPIs like Data
Read/Write and Log Read/Write, you can determine if the disk-full event is caused by SAP
HANA writing huge amounts of data or log information.

Figure 64: More Details About Storage Usage

Note:
For more information about disk I/O analysis, see SAP Note 1999930 - "FAQ:
SAP HANA I/O Analysis".

LESSON SUMMARY
You should now be able to:
● Analyze disk and I/O issues



Unit 2

Learning Assessment

1. Which KPIs are shown, by default, in the Performance Monitor started from the Memory
Usage card in SAP HANA Cockpit 2.0?
Choose the correct answers.

X A Total resident memory

X B Physical memory size

X C Used resident memory

X D Database free memory

2. The statistics service is a central element of the internal monitoring infrastructure of SAP
HANA. It collects statistical and performance information using SQL.
Determine whether this statement is true or false.

X True

X False

3. Which SAP HANA information sources inform you about high CPU consumption on your
SAP HANA database?
Choose the correct answers.

X A The host CPU usage Alert number 5

X B The CPU graph on the CPU usage card

X C The Alerts card in the Overview screen

X D The Services card

4. In SAP HANA, performance issues are nothing to worry about because the whole database
is running in-memory.
Determine whether this statement is true or false.

X True

X False


5. Which tools can you use to monitor the Average Execution Time of individual SQL queries?
Choose the correct answers.

X A Monitor Statements - Overview tab

X B Monitor Statements - Active Statements tab

X C Monitor Statements - Expensive Statements tab

X D SQL plan cache tab

6. The SQL Statements card displays the number of blocked transactions currently on hold
in the database.
Determine whether this statement is true or false.

X True

X False

7. Which operations in the SAP HANA database result in disk I/O on the Redo Log Volume?
Choose the correct answers.

X A Database restart

X B Column table reload

X C Scale-out host failover

X D Database data backup

8. A disk-full event causes the SAP HANA database to stop and the problem must be
resolved before database operations can continue.
Determine whether this statement is true or false.

X True

X False



Unit 2

Learning Assessment - Answers

1. Which KPIs are shown, by default, in the Performance Monitor started from the Memory
Usage card in SAP HANA Cockpit 2.0?
Choose the correct answers.

X A Total resident memory

X B Physical memory size

X C Used resident memory

X D Database free memory

Correct! The total resident memory and the physical memory size are shown as a KPI in
the Performance Monitor. Read more on this in the lesson "Analyzing Memory Issues" of
the course HA215.

2. The statistics service is a central element of the internal monitoring infrastructure of SAP
HANA. It collects statistical and performance information using SQL.
Determine whether this statement is true or false.

X True

X False

Correct! The embedded statistics service is part of the indexserver. Read more on this in
the lesson "Analyzing Memory Issues" of the course HA215.

3. Which SAP HANA information sources inform you about high CPU consumption on your
SAP HANA database?
Choose the correct answers.

X A The host CPU usage Alert number 5

X B The CPU graph on the CPU usage card

X C The Alerts card in the Overview screen

X D The Services card

Correct! High CPU consumption is shown by the CPU graph, the Alerts card, and the host
CPU usage alert. Read more on this in the lesson "Analyzing CPU Issues" of the course
HA215.


4. In SAP HANA, performance issues are nothing to worry about because the whole database
is running in-memory.
Determine whether this statement is true or false.

X True

X False

Correct! SAP HANA is optimized to consume all memory and CPU available. Even if you
have enough memory and CPU resources available, you can experience performance
issues due to badly written queries. Read more on this in the lesson "Analyzing CPU
Issues" of the course HA215.

5. Which tools can you use to monitor the Average Execution Time of individual SQL queries?
Choose the correct answers.

X A Monitor Statements - Overview tab

X B Monitor Statements - Active Statements tab

X C Monitor Statements - Expensive Statements tab

X D SQL plan cache tab

Correct! The response time of individual SQL queries can be monitored using the Monitor
Statements - Active Statements and Monitor Statements - SQL plan cache applications.
Read more about this in the lesson "Analyzing Expensive Statements Issues" of the course
HA215.

6. The SQL Statements card displays the number of blocked transactions currently on hold
in the database.
Determine whether this statement is true or false.

X True

X False

Correct! The SQL Statements card displays the currently running statements. Read more
about this in the lesson "Analyzing Expensive Statements Issues" of the course HA215.


7. Which operations in the SAP HANA database result in disk I/O on the Redo Log Volume?
Choose the correct answers.

X A Database restart

X B Column table reload

X C Scale-out host failover

X D Database data backup

Correct! A database restart and a scale-out host failover are operations that result in disk
I/O on the Redo Log volume. Read more about this in the lesson "Analyzing Disk and I/O
Issues" of the course HA215.

8. A disk-full event causes the SAP HANA database to stop and the problem must be
resolved before database operations can continue.
Determine whether this statement is true or false.

X True

X False

Correct! The SAP HANA database cannot continue in a disk-full situation because it
cannot write its data to disk. This must be fixed before the SAP HANA database can
continue operation. Read more about this in the lesson "Analyzing Disk and I/O Issues" of
the course HA215.



UNIT 3 Proactive Monitoring and
Performance Safeguarding

Lesson 1
Configuring the SAP HANA Alerting Framework 111

Lesson 2
Setting up SAP HANA Workload Management 123

Lesson 3
Using SAP HANA Capture and Replay 151

UNIT OBJECTIVES

● Configure SAP HANA alerting framework


● Set up SAP HANA workload management
● Capture and replay a SAP HANA workload



Unit 3
Lesson 1
Configuring the SAP HANA Alerting
Framework

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Configure SAP HANA alerting framework

Alert Monitoring
Alert Monitoring
As an administrator, you actively monitor the status of the system and its services and the
consumption of system resources. However, you are also alerted to critical situations, for
example: a disk is becoming full, CPU usage is reaching a critical level, or a server has
stopped.
A summary of all alerts in the database is available on the homepage of the SAP HANA
cockpit. To get more information about these alerts, and to analyze the historical occurrence
of alerts, you can drill down into the Alerts application.
In addition, several configuration options are available so that you can tailor alerts in the SAP
HANA database to your needs. For example, you can change the alerting thresholds, set up
e-mail notification of alerts, and switch particular alerts on or off.
On the Alerts card, the alerts are counted and grouped by the ten most important alert
categories defined in SAP HANA. Use the View By control to switch between Alert Categories
and Alert KPA (Key Performance Area). You can refresh the displayed data by using the SAP
HANA Cockpit Refresh - Now button in the top right corner.


Figure 65: Alerts Count Overview

To open the Alerts application, choose the Alerts card.


The internal monitoring infrastructure of the SAP HANA database is continuously collecting
and evaluating information about status, performance, and resource usage from all
components of the SAP HANA database. It also performs regular checks on the data in the
system tables and views, and issues alerts to warn you of potential problems when
configurable threshold values are exceeded. The priority of the alert indicates the severity of
the problem, and depends on the nature of the check and configured threshold values. For
example, if 90% of available disk space is used, a low priority alert is issued; if 98% is used, a
high priority alert is issued.
The priority of an alert indicates the severity of the problem and how soon action needs to be
taken.

Priority Description
Information Action recommended to improve system performance or stability
Low Medium-term action required to mitigate the risk of downtime
Medium Short-term action required (few hours, days) to mitigate the risk of downtime
High Immediate action required to mitigate the risk of downtime, data loss, or data corruption
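Internally, each alert carries a numeric rating that corresponds to its priority. A quick overview of the currently open alerts per rating can be produced with a query like this sketch against the statistics service schema (higher ratings indicate higher priority):

```sql
-- Count the currently open alerts per rating.
SELECT alert_rating, COUNT(*) AS alert_count
FROM   _sys_statistics.statistics_current_alerts
GROUP  BY alert_rating
ORDER  BY alert_rating DESC;
```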


Alert Details
In addition, several configuration options are available so that you can tailor alerting in the
SAP HANA database to your needs (for example, changing alerting thresholds, switching
particular alerts off, and setting up e-mail notification of alerts).

Figure 66: Alert Details Application

To open the Alerts app, on the Overview page of the SAP HANA cockpit, choose the Alerts
card. All of the latest alerts are displayed in list format on the left.
Find and select the alert that you want to analyze using the options available for filtering,
searching, and sorting. Detailed information about the alert is shown on the right, including a
graph displaying how often the alert has been issued over a certain time frame.
Select the time frame that you want to analyze. By default, the number of occurrences per
hour over the last 24 hours is displayed.
When you select an alert, detailed information about the alert is displayed on the right. The
following detailed information about an alert is available:
● Category
Displays the category of the alert checker that issued the alert.
Alert checkers are grouped into categories, for example, those related to memory usage,
those related to transaction management, and so on.
● Next Scheduled Run
Displays when the related alert checker is next scheduled to run.


If the alert checker has been switched off (alert checker status Switched Off) or it failed the
last time it ran (alert checker status Failed), this field is empty because the alert checker is
no longer scheduled.
● Interval
Displays the frequency at which the related alert checker runs.
If the alert checker has been switched off (alert checker status Switched Off) or it failed the
last time it ran (alert checker status Failed), this field is empty because the alert checker is
no longer scheduled.
● Alerting Host and Port
Displays the name and port of the host that issued the alert.
In a system replication scenario, alerts issued by secondary system hosts can be identified
here. This allows you to ensure availability of secondary systems by addressing issues
before an actual failover.
● Alert Checker
Displays the name and description of the related alert checker.
● Proposed Solution
Displays the possible ways of resolving the problem identified in the alert, with a link to the
supporting app, if available.
● Past Occurrences of Alert
A configurable graphical display that indicates how often the alert occurred in the past.

How Does It Work?


As an SAP HANA database administrator, you need to monitor the status of the system and
its services and the consumption of system resources. When critical situations arise, you
need to be notified so that you can take appropriate action in a timely manner. For data center
operation and resource allocation planning, you must analyze historical monitoring data.
These requirements are met by SAP HANA's internal monitoring infrastructure.
The statistics service is a central element of SAP HANA's internal monitoring infrastructure. It
notifies you when critical situations arise in your systems and provides you with historical
monitoring data for analysis. It collects statistical and performance information using SQL.


Figure 67: Architecture

The statistics service collects and evaluates information about status, performance, and
resource consumption from all components belonging to the system. In addition, it performs
regular checks and when configurable threshold values are exceeded, issues alerts. For
example, if 90% of available disk space is used, a low priority alert is issued; if 98% is used, a
high priority alert is issued.
Monitoring and alert information are stored in database tables in a dedicated schema
(_SYS_STATISTICS). From there, the information can be accessed by administration tools,
such as SAP HANA cockpit, or SAP HANA studio. The data from system views is evaluated
against certain threshold values, which can then trigger configured follow-up actions, such as
an email notification.
The statistics service is implemented by a set of tables and SQLScript procedures in the
master index server and by the statistics scheduler thread that runs in the master name
server. The SQLScript procedures either collect data (data collectors) or evaluate alert
conditions (alert checkers). Procedures are invoked by the scheduler thread at regular
intervals, which are specified in the configuration of the data collector or alert checker. Data
collector procedures read system views and tables, process the data (for example, if the
persisted values need to be calculated from the read values) and store the processed data in
measurement tables for creating the measurement history.
This scheduler thread is part of the statistics server that runs in the nameserver service. Via
TREXNet, calls are sent to the indexserver to invoke the SQLScript procedures.
In the case of multi-database systems, note the following:
● SystemDB: All threads run in the nameserver.
● Tenant DBs: All threads run in the indexserver.

Alert checker procedures are scheduled independently of the data collector procedures. They
read current data from the original system tables and views, not from the measurement
history tables. After reading the data, the alert checker procedures evaluate the configured
alert conditions. If an alert condition is fulfilled, a corresponding alert is written to the alert
tables. From there, it can be accessed by monitoring tools that display the alert. It is also

© Copyright. All rights reserved. 115


Unit 3: Proactive Monitoring and Performance Safeguarding

possible to have e-mail notifications sent to administrators if an alert condition is fulfilled.


Depending on the severity level of the alert, summary emails are sent at the appropriate
frequency (hourly, every 6 hours, or daily). You can also trigger alert checker procedures
directly from monitoring tools (for example, SAP HANA cockpit).

Data Management in the Statistics Service


The following mechanisms exist to manage the volume of data collected and generated by the
statistics service:
● Configurable data retention period
The data collected by the data collectors of the statistics service is deleted after a default
number of days. The majority of collectors have a default retention period of 42 days. For a
list of those collectors that have a different default retention period, execute the following
statement:
SELECT o.name, s.retention_days_default
FROM _SYS_STATISTICS.STATISTICS_SCHEDULE s, _SYS_STATISTICS.STATISTICS_OBJECTS o
WHERE s.id = o.id AND o.type = 'Collector' AND s.retention_days_default != 42
ORDER BY 1;

You can change the retention period of individual data collectors with the following SQL
statement:
UPDATE _SYS_STATISTICS.STATISTICS_SCHEDULE set
RETENTION_DAYS_CURRENT=<retention_period_in_days> where
ID=<ID_of_data_collector>;

Note:
To determine the IDs of data collectors, execute the following statement:
SELECT * FROM _SYS_STATISTICS.STATISTICS_OBJECTS WHERE type = 'Collector';

Alert data in the _SYS_STATISTICS.STATISTICS_ALERTS table is also deleted by default
after 42 days. You can change this retention period with the following statement:
UPDATE _SYS_STATISTICS.STATISTICS_SCHEDULE set
RETENTION_DAYS_CURRENT=<retention_period_in_days> where ID=6002;
● Maximum number of alerts
By default, the number of alerts in the system (that is, rows in the table
_SYS_STATISTICS.STATISTICS_ALERTS_BASE) cannot exceed 1,000,000. If this number
is exceeded, the system starts deleting rows in increments of 10 percent, until the number
of alerts is below the maximum.
To change the maximum number of alerts permitted, add a row with the key
internal.alerts.maxrows and the new maximum value to the table
_SYS_STATISTICS.STATISTICS_PROPERTIES:
INSERT INTO _SYS_STATISTICS.STATISTICS_PROPERTIES VALUES
('internal.alerts.maxrows', 500000);

Statistics Service in Multitenant Database Containers


In multiple-container systems, the statistics service runs as an embedded process in the
(master) index server of every tenant database. Every database has its own
_SYS_STATISTICS schema.


Monitoring tools such as the SAP HANA cockpit allow administrators in the system database
to access certain alerts occurring in individual tenant databases. However, this access is
restricted to alerts that identify situations with a potentially system-wide impact, for example,
the physical memory on a host is running out. Alerts that expose data in the tenant database
(for example, table names) are not visible to the system administrator in the system
database.

Configuring Alerts
From the Alert Details or the Configure Alerts link, on the SAP HANA Cockpit – Overview page,
you can open the Configure Alerts app. In the Configure Alerts app, there are several
configuration options available so that you can tailor alerting in the SAP HANA database to
your needs.
You can configure the following:
● Change the threshold values that trigger alerts of different priorities.
● Set up email notifications so that specific people are informed when alerts are issued.

You can also perform the following actions on alert checkers:


● Run alert checkers on a one-off basis regardless of their configured schedule or status.
● Switch alert checkers on or off.
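Switching an alert checker on or off in the Configure Alerts app updates its STATUS in the scheduling table of the statistics service. The same change can be made in SQL, as in this sketch (alert checker 2, disk usage, is used as an example ID):

```sql
-- Deactivate alert checker 2 (disk usage).
UPDATE _sys_statistics.statistics_schedule
SET    status = 'Inactive'
WHERE  id = 2;

-- To switch it back on, reset the status to 'Idle'.
UPDATE _sys_statistics.statistics_schedule
SET    status = 'Idle'
WHERE  id = 2;
```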

Figure 68: Alerts Checker Configuration

Alert Checker Details


When you select an alert checker in the Alert Checker Configuration list, detailed information
about the alert checker and its configuration is displayed on the right.
You can view the following detailed information about an alert checker:


Detail Description
Header information The name of the alert checker, its status, and the last time it ran.
Description A description of what the alert checker does, for example, what
performance indicator it measures or what setting it verifies.
Alert Checker ID The unique ID of the alert checker.
Category The category of the alert checker. Alert checkers are grouped into
categories, for example, those related to memory usage, those related
to transaction management, and so on.
Threshold Values for The values that trigger high, medium, low, and information alerts
Prioritized Alerting issued by the alert checker. The threshold values and the unit depend
on what the alert checker does. For example, alert checker 2 measures
what percentage of disk space is currently used, so its thresholds are
percentage values.

Note:
Thresholds can be configured for any alert checker that measures
variable values that should stay within certain ranges, for example,
the percentage of physical memory used, or the age of the most
recent data backup. Many alert checkers verify only whether a
certain situation exists or not. Threshold values cannot be configured
for these alert checkers. For example, alert checker 4 detects service
restarts. If a service was restarted, an alert is issued.

Interval How often the alert checker runs.
Schedule Active Indicates whether the alert checker is running automatically at the
configured interval.
Proposed Solution Possible ways of resolving the problem identified by the alert
checker.

Alert Checker Statuses


The status of an alert checker indicates whether it is running on schedule, has failed and has
been disabled by the system, or has been switched off. The following status states are
possible:
● Active
The alert checker is running on schedule.
● Failed
The alert checker failed the last time it ran (for example, due to a shortage of system
resources), so the system disabled it.


The alert checker remains disabled for a specific length of time before it is automatically
re-enabled. The length of time is calculated based on the values in the following columns of
the table STATISTICS_SCHEDULE (_SYS_STATISTICS):
- INTERVALLENGTH
- SKIP_INTERVAL_ON_DISABLE

Once INTERVALLENGTH x SKIP_INTERVAL_ON_DISABLE has elapsed, the alert checker
is re-enabled. The default values for all alert checkers mean that failed checkers remain
disabled for one hour. Every 60 seconds, the system determines the status of every alert
checker and whether the time to re-enablement has elapsed.
You can also re-enable the alert checker manually by switching it back on in Alert
Configuration.
● Switched Off
You switched off the alert checker schedule.
If you want the alert checker to run again automatically, you must manually switch it back
on.
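The schedule values that determine the re-enablement window can also be inspected directly in SQL, for example with the following query (a sketch; the WHERE clause assumes that the STATUS column reports disabled checkers as 'Disabled'):

```sql
-- Inspect the re-enablement window for alert checkers:
-- INTERVALLENGTH (in seconds) x SKIP_INTERVAL_ON_DISABLE gives the
-- length of time a failed checker remains disabled
SELECT ID, INTERVALLENGTH, SKIP_INTERVAL_ON_DISABLE
FROM _SYS_STATISTICS.STATISTICS_SCHEDULE
WHERE STATUS = 'Disabled';
```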

Configure Alerting Thresholds


In many cases, you can configure the thresholds that trigger an alert. An alert checker can
have a low, medium, or high priority threshold.
Thresholds can be configured for any alert checker that measures variable values that should
stay within certain ranges, for example, the percentage of physical memory used, or the age
of the most recent data backup. Many alert checkers verify only whether a certain situation
exists or not. Threshold values cannot be configured for these alert checkers. For example,
alert checker 4 detects service restarts. If a service was restarted, an alert is issued.
Alerts are issued when the alert checker records values that exceed the configured
thresholds.

Switch Alerting On/Off


If you no longer want a particular alert to be issued, you can switch off the underlying alert
checker so that it no longer runs automatically according to the schedule. Alert checkers that
have been disabled by the system must be switched back on manually.
In some situations you may want to stop a particular alert from being issued, either because it
is unnecessary (for example, alerts that notify you when there are other alerts in the system)
or because it is not relevant in your system (for example, backup-related alerts in test
systems where no backups are performed).

Caution:
If you switch off alerts, you may not be warned about potentially critical
situations in your system.

You can switch an alert checker on again at any time. You may also want to switch on alert
checkers that the system has disabled, such as checkers with the status Failed. The system
automatically disables alert checkers when they fail to run, for example, due to a shortage of
system resources.
The system automatically switches failed alert checkers back on after a certain length of time.
For more information, see Alert Checker Statuses.


You can disable an alert for a particular table or schema. You can do this for the alerts
"Record count of non-partitioned column-store tables" (ID 17) and "Table growth of non-
partitioned column-store tables" (ID 20).
To exclude an alert from being issued for a particular table, use the following SQL statement:
INSERT INTO _sys_statistics.statistics_exclude_tables VALUES
(<alert_id>, '<schema_name>', '<table_name>')

To exclude an alert from being issued for all tables of a particular schema, use the following
SQL statement:
INSERT INTO _sys_statistics.statistics_exclude_tables VALUES
(<alert_id>, '<schema_name>', null)

To re-enable the alerts, delete the entries from the table
_sys_statistics.statistics_exclude_tables.
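For example, to suppress alert 17 for a single table and later re-enable it (the schema and table names are illustrative, and the column names in the DELETE are assumptions based on the INSERT signature shown above):

```sql
-- Suppress alert 17 (record count of non-partitioned column-store
-- tables) for one specific table
INSERT INTO _sys_statistics.statistics_exclude_tables
VALUES (17, 'MYSCHEMA', 'SALES');

-- Re-enable the alert by deleting the exclusion entry again
DELETE FROM _sys_statistics.statistics_exclude_tables
WHERE ALERT_ID = 17 AND SCHEMA_NAME = 'MYSCHEMA' AND TABLE_NAME = 'SALES';
```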
If you switched the alert checker off, its status changes to Switched Off and it is no longer
scheduled to run automatically.
If you switched the alert checker on, its status changes to Active and it starts running again
automatically, according to its configured schedule.

Set Up Email Notification


You can configure alert checkers so that you and other responsible administrators receive
push notifications by email when alerts are issued.
If you want to be notified by email about new alerts when they are issued, you can set this up
in Alerts Configuration. You can configure one or more default recipients to be notified when
any alert checker issues an alert. Also, if different people need to be notified about different
alerts, you can configure dedicated recipients for these alert checkers.
Note the following behavior:
● If you configure checker-specific recipients, default recipient(s) are not notified.
● If you delete all checker-specific recipients, default recipient(s) are notified again, if
configured.
● You can configure checker-specific recipients regardless of whether or not default
recipients are configured.

The configured recipients receive an email when an alert checker issues an alert. If the alert
checker issues the same alert the next time it runs, no further emails are sent. However, when
the alert checker runs and does not issue an alert, indicating that the issue is resolved or no
longer occurring, a final email is sent.

Check for Alerts Out of Schedule


In general, alert checkers run automatically according to a configured schedule. If necessary,
you can run an alert checker on a once-off basis outside of its schedule.
In some cases, you may want to check for a particular alert outside of the alert checker's
configured schedule. For example, to verify that the problem identified by a previous alert has
been resolved.
Running an alert checker in this specific way does not affect its configured schedule.


Note:
If you want to manually run an alert checker with the status Switched Off or Failed,
you must switch it back on first.

LESSON SUMMARY
You should now be able to:
● Configure SAP HANA alerting framework



Unit 3
Lesson 2
Setting up SAP HANA Workload Management

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Set up SAP HANA workload management

Introduction to SAP HANA Workload Management

Figure 69: SAP HANA Workload Management

Workload Management
The load on an SAP HANA system can be managed by selectively applying limitations and
priorities to how resources (such as the CPU, the number of active threads, and memory) are
used. Settings can be applied globally or at the level of individual user sessions by using
workload classes.
On an SAP HANA system, thanks to the capabilities of the platform, there are many different
types of workload, from simple or complex statements to potentially long-running data
loading jobs. These workload types must be balanced with the system resources that are
available to handle concurrent work. For simplicity, we classify workload queries as
transactional (OLTP) or analytic (OLAP). With a transactional query the typical response time
is measured in milliseconds and these queries are normally executed in a single thread.


However, analytic queries normally feature more complex operations using multiple threads
during execution: this can lead to higher CPU usage and memory consumption compared
with transactional queries.
To manage the workload of your system, aim to ensure that the database management
system is running in an optimal way given the available resources. The goal is to maximize the
overall system performance by balancing the demand for resources between the various
workloads: not just to optimize for one particular type of operation. If you achieve this,
requests will be carried out in a way that meets your performance expectations and you will
be able to adapt to changing workloads over time. Besides optimizing for performance, you
can also optimize for robustness so that statement response times are more predictable.

Workload in the Context of SAP HANA


Workload in the context of SAP HANA can be described as a set of requests with common
characteristics.
We can look at the details of a particular workload in a number of ways. We can look at the
source of requests and determine if particular applications or application users generate a
high workload for the system. We can examine what kinds of SQL statements are generated:
are they simple or complex? Is there a prioritization of work done based on business
importance, for example, does one part of the business need to have more access at peak
times? We can also look at what kind of service level objectives the business has in terms of
response times and throughput.

Figure 70: Types of Workload

The figure, Types of Workload, shows different types of workload, such as Extract, Transform,
and Load (ETL) operations (used in data warehouses to load new data in batches from source
systems), as well as analytic and transactional operations.
When we discuss workload management we are really talking about stressing the system in
terms of its resource utilization. The main resources we look at (shown in the previous figure)
are CPU, memory, disk I/O, and network. In the context of SAP HANA, disk I/O comes into
play for logging, for example, in an OLTP scenario many small transactions result in a high
level of logging compared to analytic workloads (although SAP HANA tries to minimize this).


With SAP HANA, network connections between nodes in a scale out system can also be
optimized, for example, statement routing is used to minimize network overheads.
However, when we try to influence workload in a system, the main focus is on the available
CPUs and memory being allocated and utilized. Mixed transactional and analytic workloads
can, for example, compete for resources and at times require more resources than are readily
available. If one request dominates, there may be a queuing effect, so the next request may
have to wait until the previous one is ready. Such situations need to be managed to minimize
the impact on overall performance.

Options for Managing Workload


Workload management can be configured at multiple levels: at the operating system-level, by
using global initialization settings, and at the session level.
There are a number of things you can do to influence how the workload is handled:
● Outside the SAP HANA system on the operating system level you can set the affinity of the
available cores.
● You can apply static settings using parameters to configure execution, memory
management, and peak load situations.
● You can influence workload dynamically at system runtime by defining workload classes.

Figure 71: Options for Managing Workload

The OS level settings are rather static, with a low granularity. The more dynamic and more
granular settings can be done at the SAP HANA system level, and even more so at the SAP
HANA session level by using SAP HANA workload classes.
All of these options have default settings which are applied during the SAP HANA installation.
These general-purpose settings may provide you with a perfectly acceptable performance, in
which case the workload management features described in this chapter may be
unnecessary. Before you begin workload management, ensure that the system generally is
well configured: that SQL statements are tuned, that in a distributed environment tables are
optimally distributed, and that indexes have been defined as needed.
If you have specific workload management requirements, the following table outlines a
process of looking at ever more fine-grained controls that can be applied with regard to CPU,
memory and execution priority.


● CPU: Settings related to affinity are available to bind server processes to specific CPU
cores. Processes must be restarted before these changes become effective.
● CPU Thread Pools: Global execution settings are available to manage CPU thread pools
and manage parallel execution (concurrency).
● Memory: Global memory manager settings are available to apply limits to the resources
allocated to expensive SQL statements.
● Admission Control: Global admission control settings can be used to apply system
capacity thresholds above which SQL statements can be either rejected or queued.
● Workload Class Mapping: A more targeted approach to workload management is possible
by setting up preconfigured classes which can be mapped to individual user sessions. You
can, for example, map an application name or an application user to a specific workload
class. Classes include the option to apply a workload priority value.
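The workload class approach from the last item can be sketched as follows (the class name, limit values, and application name are illustrative; the property names follow the SAP HANA SQL reference):

```sql
-- Create a workload class that caps per-statement parallelism and
-- memory, and lowers the execution priority (0 = lowest, 9 = highest)
CREATE WORKLOAD CLASS "REPORTING_LIMITED"
  SET 'STATEMENT THREAD LIMIT' = '8',
      'STATEMENT MEMORY LIMIT' = '50',  -- in GB
      'PRIORITY' = '3';

-- Map all sessions of a given application to that class
CREATE WORKLOAD MAPPING "REPORTING_MAPPING"
  WORKLOAD CLASS "REPORTING_LIMITED"
  SET 'APPLICATION NAME' = 'MyReportingApp';
```

Sessions whose APPLICATION NAME session variable matches the mapping then run under the limits and priority of the class.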

Understand your Workload


Managing workloads can be viewed as an iterative three-part process: analyze the current
system performance, understand the nature of your workload, and map your workload to the
system resources.
There is no one single workload management configuration that fits all scenarios. Because
workload management settings are highly workload dependent you must first understand
your workload. The following figure shows an iterative process that you can use to understand
and optimize how the system handles the workload.

1. First, look at how the system is currently performing in terms of CPU usage and memory
consumption. What kinds of workloads are running on the system? Are there complex,
long running queries that require lots of memory?

2. When you have a broad understanding of the activity in the system you can drill down to
the details, such as business importance. Are statements being run that are strategic or
analytic in nature, compared to standard reporting that may not be so time-critical? Can
those statements be optimized to run more efficiently?

3. When you have a deeper understanding of the system, you have a number of ways to
influence how it handles the workload. You can map the operations to available resources,
such as CPU and memory, and determine the priority that requests get by, for example,
using workload classes.

Analyzing System Performance


You can use system views to analyze how effectively your system is handling the current
workload.


This section lists some of the most useful views available which you can use to analyze your
workload and gives suggestions for how to improve performance. Refer also to the scenarios
section for more details of how these analysis results can help you to decide which workload
management options to apply.

Analyzing SQL Statements


Use these views to analyze the performance of SQL statements:
● M_ACTIVE_STATEMENTS
● M_PREPARED_STATEMENTS
● M_EXPENSIVE_STATEMENTS

If these views indicate problems with statements you can use workload classes to tune the
statements by limiting memory or parallelism.
Consider also the setting of any session variables (in M_SESSION_CONTEXT) which might
have a negative impact on these statements. The following references provide more detailed
information on this:
● SAP Note 2215929: Using Client Info to set Session Variables and Workload Class settings
describes how client applications set session variables for dispatching workload classes.
● SAP HANA Developer Guide (Setting Session-Specific Client Information)
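For example, once the expensive statements trace is enabled, the longest-running recorded statements can be listed with a query such as the following (a sketch; the selected columns are available in M_EXPENSIVE_STATEMENTS):

```sql
-- Top 10 recorded statements by runtime; requires the expensive
-- statements trace to be switched on
SELECT TOP 10 STATEMENT_STRING, DURATION_MICROSEC, CPU_TIME, MEMORY_SIZE
FROM M_EXPENSIVE_STATEMENTS
ORDER BY DURATION_MICROSEC DESC;
```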

Analyzing CPU Activity


Use these views to analyze CPU activity:
● M_SERVICE_THREADS
● M_SERVICE_THREAD_SAMPLES
● M_EXPENSIVE_STATEMENTS.CPU_TIME (column)
● M_SERVICE_THREAD_CALLBACKS (stack frame information for service threads)
● M_JOBEXECUTORS (job executor statistics)

These views provide detailed information on the threads that are active in the context of a
particular service and information about locks held by threads.
If these views show many threads for a single statement, and the general system load is high
you can adjust the settings for the set of 'execution' ini-parameters as described in the topic
Controlling Parallel Execution.
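As a starting point, the distribution of active threads across services and thread types can be summarized with a query such as the following (a sketch using columns of M_SERVICE_THREADS):

```sql
-- Count currently active threads per service and thread type
SELECT HOST, PORT, THREAD_TYPE, COUNT(*) AS THREAD_COUNT
FROM M_SERVICE_THREADS
WHERE IS_ACTIVE = 'TRUE'
GROUP BY HOST, PORT, THREAD_TYPE
ORDER BY THREAD_COUNT DESC;
```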

Controlling CPU Consumption


If the physical hardware on a host is shared between several processes, you can use CPU
affinity settings to assign a set of logical cores to a specific SAP HANA process. These
settings are coarse-grained and apply on the OS and process levels.
You can use the affinity configuration parameter to restrict CPU usage of SAP HANA server
processes to certain CPUs or ranges of CPUs.


Figure 72: SAP HANA CPU Affinity

Using the configuration option, we first analyze how the system CPUs are configured. Then,
based on the information returned, we apply affinity settings in the daemon.ini file to bind
specific processes to logical CPU cores. Processes must be restarted before the changes
become effective. This approach applies primarily to the use cases of SAP HANA tenant databases
and multiple SAP HANA instances on one server. You can use this, for example, to partition
the CPU resources of the system by tenant database.

Note:
As an alternative to applying CPU affinity settings you can achieve similar
performance gains by changing the parameter max_concurrency in the section
[execution] of the global.ini configuration file. This may be more convenient
and can be done while the system is online.
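The alternative described in the note can be applied online as follows (the value 32 is illustrative and should be derived from your own workload analysis):

```sql
-- Limit JobExecutor concurrency without restarting the system
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('execution', 'max_concurrency') = '32'
  WITH RECONFIGURE;
```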

To make the changes described here, you require access to the operating system of the SAP
HANA instance to run the Linux lscpu command and you require the privilege INIFILE
ADMIN.
Information about the SAP HANA system topology is also available from SAP HANA
monitoring views as described in a later subsection, SAP HANA Monitoring Views for CPU
Topology Details.

Hint:
For more information, see SAP Note 2470289: FAQ: SAP HANA Non-Uniform
Memory Access (NUMA).

For Xen and VMware, the users in the VM guest system see what is configured in the VM host.
So the quality of the reported information depends on the configuration of the VM guest.
Therefore, SAP cannot give any performance guarantees in this case.


Configuration Steps
To confirm the physical and logical details of your CPU architecture, analyze the system using
the lscpu command. This command returns a list of details of the system architecture.
The following table gives an overview of the most useful values, based on a sample system
with 2 physical chips (sockets), each containing 8 physical cores. These are hyper-threaded
to give a total of 32 logical cores.

Line Number Feature Example Value

1 Architecture x86_64

2 CPU op-mode(s) 32-bit, 64-bit

3 Byte Order Little Endian

4 CPU(s) 32

5 On-line CPU(s) list 0-31

6 Thread(s) per core 2

7 Core(s) per socket 8

8 Socket(s) 2

9 NUMA node(s) 2

21 NUMA node0 CPU(s) 0-7,16-23

22 NUMA node1 CPU(s) 8-15,24-31

● Item 4-5: This sample server has 32 logical cores, numbered 0 to 31.
● Item 6-8: Logical cores (threads) are assigned to physical cores. Assigning multiple
threads to a single physical core is referred to as hyper-threading.
In this example, there are two sockets, each socket contains eight physical cores (total 16).
Two logical cores are assigned to each physical core, thus, each core exposes two
execution contexts for the independent and concurrent execution of two threads.
● Item 9: In this example there are two Non-uniform Memory Access (NUMA) nodes, one for
each socket. Other systems may have multiple NUMA nodes per socket.
● Item 21-22: The system assigns each of the 32 logical cores to one of the two NUMA nodes.

Note:
Even on a system with 32 logical cores and two sockets, the assignment of logical
cores to physical CPUs and sockets can be different. It is important to collect the
assignment in advance before making changes.
You can perform a more detailed analysis by using the system commands
described in the next step. These commands provide detailed information for each
core, including how CPU cores are grouped as siblings.


In addition to the lscpu command, you can use the set of system commands in the /sys/
devices/system/cpu/ directory tree. For each logical core, there is a numbered
subdirectory beneath the node (/cpu12/ in the following examples).
The following examples show how to retrieve this information. The following table provides
details of some of the more useful commands:
cat /sys/devices/system/cpu/present
cat /sys/devices/system/cpu/cpu12/topology/thread_siblings_list

Command | Example Output | Commentary

present | 0-15 | The number of logical cores available for scheduling.
cpu12/topology/core_siblings_list | 4-7,12-15 | The cores on the same socket.
cpu12/topology/thread_siblings_list | 4,12 | The logical cores assigned to the same physical core (hyper-threading).
cpu12/topology/physical_package_id | 1 | The socket of the current core, in this case cpu12.

Other Linux commands which are relevant here are sched_setaffinity and numactl. The
command sched_setaffinity limits the set of CPU cores available (by applying a CPU
affinity mask) for execution of a specific process (this could be used, for example, to isolate
tenants) and numactl controls NUMA policy for processes or shared memory.
Based on the results returned you can use the affinity setting to restrict CPU usage of SAP
HANA server processes to certain CPUs or ranges of CPUs. You can do this for the following
servers: nameserver, indexserver, compileserver, preprocessor, and xsengine. Each server
has a section in the daemon.ini file.
The affinity setting is applied by the TrexDaemon when it starts the other SAP HANA
processes using the command sched_setaffinity. Changes to the affinity settings take
effect only after restarting the SAP HANA process.
The following examples show the syntax for the ALTER SYSTEM CONFIGURATION commands
required.

Example 1
To restrict the nameserver to two logical cores of the first CPU of socket 0 (see line 21 in the
example), use the following affinity setting:
ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET
('nameserver', 'affinity') = '0,16'

Example 2
To restrict the preprocessor and the compileserver to all remaining cores (that is, all except 0
and 16) on socket 0 (see line 21 in the example), use the following affinity settings:
ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET
('preprocessor', 'affinity') = '1-7,17-23'

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET
('compileserver', 'affinity') = '1-7,17-23'


Example 3
To restrict the indexserver to all cores on socket 1 (see line 22 in the example), use the
following affinity settings:
ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET
('indexserver', 'affinity') = '8-15,24-31'

Example 4
To set the affinity for two tenant databases, called DB1 and DB2 respectively, in a tenant
database setup, use the following affinity settings:
ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET
('indexserver.DB1', 'affinity') = '1-7,17-23';

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET


('indexserver.DB2', 'affinity') = '9-15,25-31';

CPU Affinity at Tenant Level


You can assign affinities to different tenants of a multi-tenant database on the same host, as
shown here. Run these SQL statements on the SYSTEMDB.

Figure 73: CPU Affinity at Tenant Level

Example 5
In this scenario, tenant NM1 already exists. Here, we add another tenant, NM2:
CREATE DATABASE NM2 ADD AT LOCATION 'host:30040' SYSTEM USER PASSWORD
Manager1;

Set the configuration parameter to bind CPUs to specific NUMA nodes on each tenant. You
can use the following notation with a dot to identify the specific tenant:

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini','SYSTEM') SET
('indexserver.NM1', 'affinity') = '0-7,16-23';

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini','SYSTEM') SET
('indexserver.NM2', 'affinity') = '8-15,24-31';


To assign affinities to multiple indexservers of the same tenant on the same host execute the
following SQL statements on the SYSTEMDB to apply the instance_affinity[port]
configuration parameter:

Example 6
In this scenario an indexserver is already running on tenant NM1 on port 30003. Here, we add
another indexserver on a different port:
ALTER DATABASE NM1 ADD 'indexserver' AT LOCATION 'host:30040';

Set the different instances of the instance_affinity[port] configuration parameter to
bind CPUs to specific NUMA nodes on each indexserver. The configuration parameter has a
1-2 digit suffix to identify the final significant digits of the port number: in this example, 30003
and 30040:
ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini','SYSTEM') SET
('indexserver.NM1', 'instance_affinity[3]')='0-7,16-23';
ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini','SYSTEM') SET
('indexserver.NM1', 'instance_affinity[40]')='8-15,24-31';

Restart the indexserver processes to make the affinity settings effective.
You can test the settings in SQL, or by using hdbcons. Run this query on the tenant or
SystemDB as shown here:
select * from M_NUMA_NODES;
Using hdbcons, the process ID of the indexserver process is required as a parameter:
hdbcons -p <PID> "jexec info"

SQL Statements to Apply NUMA Location Preferences


You can specify NUMA node location preferences for specific database objects in SQL using
either the CREATE TABLE or ALTER TABLE statements.

NUMA Node Preferences


To apply NUMA node location preferences in SQL for tables, columns, or partitions, you can
use the NUMA NODE clause followed by a list of one or more preferred node locations. Refer to
the previous subsection for how to use the lscpu command to understand the system
topology. For more information on this feature, see SAP HANA SQL and System Views
Reference.
You can specify either individual nodes or a range of nodes, as shown in the following
example:
CREATE COLUMN TABLE T1(A int, B varchar(10)) NUMA NODE ('1', '3 TO 5')

In this example, table T1 will be processed by NUMA node 1 if possible, and otherwise by any
of NUMA nodes 3-5. Preferences are saved in the system table NUMA_NODE_PREFERENCE_.
Use the following statement to remove any preferences for an object:
ALTER TABLE T1 NUMA NODE NULL

By default, preferences are only applied the next time the table is loaded. You can use the
ALTER TABLE statement with the keyword IMMEDIATE to apply the preference immediately
(the default value is DEFERRED):
ALTER TABLE T1 NUMA NODE ('3') IMMEDIATE


Granularity
NUMA node location preferences can be applied at any of the following levels:
● Table (column store only)
● Table partition (range partitioning only)
● Column

If multiple preferences for a column or partition have been defined, the column preference is
applied first, then the partition preference, then the table.
The following example shows the statement being used to apply a preference for column A in
table T1:
CREATE COLUMN TABLE T1(A int NUMA NODE ('2'), B varchar(10))

The following examples show statements to apply a preference for a partition of table T1:
CREATE COLUMN TABLE T1(A int, B varchar(10)) PARTITION BY RANGE(A)
(PARTITION VALUE = 2 NUMA NODE ('4'))
ALTER TABLE T1 ADD PARTITION (A) VALUE = 3 NUMA NODE ('1') IMMEDIATE

You can also identify a partition by its logical partition ID number and set a preference by
using ALTER TABLE as shown here:
ALTER TABLE T1 ALTER PARTITION 2 NUMA NODE ('3')

Transferring Preferences
Using the CREATE TABLE LIKE statement, the new table can be created with or without the
NUMA preference. In the following example, any preference which has been applied to T2 will
(if possible) apply to the new table T1. The system checks the topology of the target system to
confirm that it has the required number of nodes. If not, the preference is ignored:
CREATE TABLE T1 LIKE T2

The keyword WITHOUT can be used as shown in the following example to ignore any
preference which has been applied to T2 when creating the new table T1:
CREATE TABLE T1 LIKE T2 WITHOUT NUMA NODE

A similar approach is used with the IMPORT and EXPORT statements: any preferences are
saved in the exported table definition and applied, if possible, in the target environment when
the table is imported. In this case you can use the IGNORE keyword to import a table and
ignore any node preferences:
IMPORT SYSTEM."T14" FROM '/tmp/test/' WITH REPLACE THREADS 40 IGNORE
NUMA NODE

SAP HANA Monitoring Views for CPU Topology Details


A number of system views are available that you can use to retrieve details of the CPU
configuration.
You can get a general overview of the system topology by using the Linux lscpu command
described earlier. Information about the system topology is also available in the following
system views:
M_HOST_INFORMATION provides host information such as machine and operating system
configuration. Data in this view is stored in key-value pair format and the values are updated


once per minute. For most keys, you require the INIFILE ADMIN privilege to view the values.
Select one or more key names for a specific host to retrieve the corresponding values:
select * from SYS.M_HOST_INFORMATION where key in
('cpu_sockets','cpu_cores','cpu_threads');

M_NUMA_RESOURCES provides information on overall resource availability for the system:

select MAX_NUMA_NODE_COUNT, MAX_LOGICAL_CORE_COUNT from
SYS.M_NUMA_RESOURCES;

M_NUMA_NODES provides resource availability information on each NUMA node in the
hardware topology, including inter-node distances and neighbor information:

select HOST, NUMA_NODE_ID, NUMA_NODE_DISTANCES, MEMORY_SIZE from
SYS.M_NUMA_NODES;

Controlling Parallel Execution of SQL Statements


You can apply INI file settings to control the two thread pools, SqlExecutor and JobExecutor,
that control the parallelism of statement execution.

Caution:
The settings described here should only be modified when other tuning
techniques like remodeling, repartitioning, and query tuning have been applied.
Modifying the parallelism settings requires a thorough understanding of the
actual workload because they have an impact on the overall system behavior.
Modify the settings iteratively by testing each adjustment.

On systems with highly concurrent workloads, too much parallelism of single statements may
lead to poor performance. Note also that partitioning tables influences the degree of
parallelism for statement execution. In general, adding partitions tends to increase
parallelism. You can use the parameters described in this section to adjust the CPU utilization
in the system.



Lesson: Setting up SAP HANA Workload Management

Figure 74: Parallel Execution of SQL Statements

Two thread pools control the parallelism of the statement execution. Generally, target thread
numbers applied to these pools are soft limits, meaning that additional available threads can
be used if necessary and deleted when no longer required:
● SqlExecutor
This thread pool handles incoming client requests and executes simple statements. For
each statement execution, an SqlExecutor thread from a thread pool processes the
statement. For simple OLTP-like statements against column store, as well as for most
statements against row store, this is the only type of thread involved. By OLTP, we mean
short-running statements that consume relatively few resources. However, even
OLTP systems like SAP Business Suite may generate complex statements.
● JobExecutor
The JobExecutor is a job dispatching subsystem. Almost all remaining parallel tasks are
dispatched to the JobExecutor and its associated JobWorker threads.
In addition to OLAP workload, the JobExecutor also executes operations like table
updates, backups, memory garbage collection, and savepoint writes.

You can set a limit for both SqlExecutor and JobExecutor to define the maximum number of
threads. For example, on a system where OLAP workload would normally consume too many
CPU resources, you can apply a maximum value to the JobExecutor to reserve resources for
OLTP workload.

Caution:
Lowering the value of these parameters can have a negative effect on the parallel
processing of the servers, and reduce the performance of the overall system.
Adapt and test these values iteratively.
For more information, see Understand your Workload, and SAP Note 2222250:
FAQ SAP HANA Workload Management.


A further option to manage statement execution is to apply a limit to an individual user profile
for all statements in the current connection using the THREADLIMIT parameter.

Parameters for SQL Executor

Figure 75: Settings at the SAP HANA System Level

The following SqlExecutor parameters are in the sql section of the indexserver.ini file:
sql_executors sets a soft limit on the target number of logical cores for the SqlExecutor
pool.
● This parameter sets the target number of threads that are immediately available to accept
incoming requests. Additional threads will be created if needed, and deleted if no longer
needed.
● The parameter is initially not set (0); the default value is the number of logical cores in the
system. As each thread allocates a certain amount of main memory for its stack, reducing
the value of this parameter can help to reduce the memory footprint.

max_sql_executors sets a hard limit on the maximum number of logical cores that can be
used.
● In normal operation new threads are created to handle incoming requests. If a limit is
applied here, SAP HANA will reject new incoming requests with an error message if the
limit is exceeded.
● The parameter is initially not set (0) so no limit is applied.

Caution:
SAP HANA will not accept new incoming requests if the limit is exceeded. Use
this parameter with extreme care.
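As an illustration (the thread counts here are placeholders, not recommendations), both parameters can be changed online with ALTER SYSTEM ALTER CONFIGURATION:

```sql
-- Illustrative value only: set a soft target of 32 SqlExecutor threads
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('sql', 'sql_executors') = '32' WITH RECONFIGURE;

-- Restore the default for the hard limit (0 = no limit applied)
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('sql', 'max_sql_executors') = '0' WITH RECONFIGURE;
```

Because exceeding max_sql_executors causes new requests to be rejected, test any non-zero value carefully before applying it to a production system.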


Parameters for JobExecutor


The following JobExecutor parameters are in the execution section of global.ini or
indexserver.ini:
max_concurrency sets the target number of logical cores for the JobExecutor pool.

● This parameter sets the size of the thread pool used by the JobExecutor used to parallelize
execution of database operations. Additional threads will be created if needed and deleted
if no longer needed. You can use this to limit resources available for JobExecutor threads,
thereby saving capacity for SqlExecutors.
● The parameter is initially not set (0); the default value is the number of logical cores in the
system. Especially on systems with at least 8 sockets, consider setting this parameter to a
reasonable value between the number of logical cores per CPU and the overall number of
logical cores in the system. In a system that supports tenant databases, a reasonable value
is the number of cores divided by the number of tenant databases.

max_concurrency_hint limits the number of logical cores for job workers, even if more
active job workers are available.
● This parameter defines the number of jobs to create for an individual parallelized
operation. The JobExecutor proposes the number of jobs to create for parallel processing,
based on the recent load on the system. Multiple parallelization steps may result in far
more jobs being created for a statement (and hence higher concurrency) than this
parameter specifies.
● The default is 0 (no limit is applied but the hint value is never greater than the value for
max_concurrency). On large systems (systems with more than 4 sockets) setting this
parameter to the number of logical cores of one socket may result in better performance,
but testing is necessary to confirm this.

default_statement_concurrency_limit restricts the actual degree of parallel execution
per connection within a statement.
● This parameter controls the maximum overall parallelism for a single database request.
Set this to a reasonable value (a number of logical cores) between 1 and
max_concurrency, but greater than or equal to the value set for max_concurrency_hint.

● The default setting is 0 - no limit is applied.
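As a sketch, for a hypothetical host with 128 logical cores running 4 tenant databases, the three parameters could be combined as follows. The figures are placeholders chosen to show the documented relationships (limit between 1 and max_concurrency, and at least max_concurrency_hint), not tuning advice:

```sql
-- Hypothetical 128-core host with 4 tenants: 128 / 4 = 32 cores per tenant
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
    SET ('execution', 'max_concurrency')                     = '32',
        ('execution', 'max_concurrency_hint')                = '16',
        ('execution', 'default_statement_concurrency_limit') = '16'
    WITH RECONFIGURE;
```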

Setting a Memory Limit for SQL Statements

You can protect an SAP HANA system from uncontrolled queries that consume excessive
memory, by limiting the amount of memory used by single statement executions per host. By
default, there is no limit set on statement memory usage. However, if a limit is applied,
statement executions that require more memory will be aborted when they reach the limit. To
avoid canceling statements unnecessarily, you can also apply a percentage threshold value
which considers the current statement allocation as a proportion of the global memory
currently available. Using this parameter, statements which have exceeded the hard-coded
limit may still be executed if the memory allocated for the statement is within the percentage
threshold. The percentage threshold setting is also effective for workload classes where a
statement memory limit can also be defined.


You can also create exceptions to these limits for individual users (for example, to ensure an
administrator is not prevented from doing a backup) by setting a different statement memory
limit for each individual.
These limits only apply to single SQL statements, not the system as a whole. Tables which
require much more memory than the limit applied here may be loaded into memory. The
parameter global_allocation_limit limits the maximum memory allocation limit for the
system as a whole.
You can view the (peak) memory consumption of a statement in
M_EXPENSIVE_STATEMENTS.MEMORY_SIZE.
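For example, assuming expensive statement tracing is enabled, a query along the following lines returns the recorded statements with the highest peak memory consumption:

```sql
-- Top 10 recorded statements by peak memory consumption
SELECT TOP 10 STATEMENT_STRING, MEMORY_SIZE
FROM M_EXPENSIVE_STATEMENTS
ORDER BY MEMORY_SIZE DESC;
```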

Figure 76: Setting Memory Limits for SQL Statements

To be able to set memory limits for SQL statements, enable the following parameters:
● In the global.ini file, in the resource_tracking section:
- enable_tracking = on

- memory_tracking = on

● statement_memory_limit defines the maximum memory allocation per statement in GB.
The default value is 0 (no limit).
- In the global.ini file, expand the memorymanager section and locate the parameter. Set
an integer value in GB between 0 (no limit) and the value of the global allocation limit.
Values that are too small can block the system from performing critical tasks.
- When the statement memory limit is reached, a dump file is created with
'compositelimit_oom' in the name. The statement is aborted, but otherwise the system
is not affected. By default, only one dump file is written every 24 hours. If a second limit
is hit in that interval, no dump file is written. The interval can be configured in the
memorymanager section of the global.ini file, by using the oom_dump_time_delta
parameter, which sets the minimum time difference (in seconds) between two dumps
of the same kind (and the same process).


- The value defined for this parameter can be overridden by the corresponding workload
class property STATEMENT_MEMORY_LIMIT.

After setting this parameter, statements that exceed the limit you have set on a host are
aborted with an out-of-memory error.
● statement_memory_limit_threshold defines the maximum memory allocation per
statement as a percentage of the global allocation limit. The default value is 0%
(statement_memory_limit is always respected).

- In the global.ini file, expand the memorymanager section and set the parameter as a
percentage of the global allocation limit.
- This parameter provides a means of controlling when statement_memory_limit is
applied. If this parameter is set, when a statement is issued the system will determine if
the amount of memory it consumes exceeds the defined percentage value of the overall
global_allocation_limit parameter setting. The statement memory limit is only
applied if the current SAP HANA memory consumption exceeds this statement
memory limit threshold as a percentage of the global allocation limit.
- This is a way of determining if a particular statement consumes a large amount of
memory compared to the overall system memory available. In this case, to preserve
memory for other tasks, the statement memory limit is applied and the statement fails
with an exception.
- Note that the value defined for this parameter also applies to the workload class
property STATEMENT_MEMORY_LIMIT.

● total_statement_memory_limit limits the memory available to all statements running
on the system.
- This limit does not apply to users with the administrator role SESSION ADMIN or
WORKLOAD ADMIN who need unrestricted access to the system.
- The value defined for this parameter cannot be overridden by the corresponding
workload class property TOTAL_STATEMENT_MEMORY_LIMIT.

- There is a corresponding parameter for use with system replication on an Active/Active
(read-enabled) secondary server. This is required to ensure that enough memory is
always available for essential log shipping activity.
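Putting the settings above together, a minimal sketch could look like the following. The 50 GB limit and the user name ADMIN_USER are placeholders, not recommendations:

```sql
-- Enable the tracking prerequisites and set a per-statement limit (placeholder value: 50 GB)
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
    SET ('resource_tracking', 'enable_tracking') = 'on',
        ('resource_tracking', 'memory_tracking') = 'on',
        ('memorymanager', 'statement_memory_limit') = '50'
    WITH RECONFIGURE;

-- Per-user exception: ADMIN_USER (hypothetical name) runs without a statement limit
ALTER USER ADMIN_USER SET PARAMETER STATEMENT MEMORY LIMIT = '0';
```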

Managing Peak Load


SAP HANA Admission Control
Use the admission control feature to apply processing limits and to decide how to handle new
requests if the system is close to the point of saturation.
You can apply thresholds by using configuration parameters to define an acceptable limit of
activity in terms of the percentage of memory usage or percentage of CPU capacity.
Limits can be applied at two levels so that new requests will be queued first, until adequate
processing capacity is available or a timeout is reached. A higher threshold can then be
defined to determine the maximum workload level above which new requests will be rejected.
If requests have been queued, items in the queue are processed when the load on the system
reduces below the threshold levels. If the queue exceeds a specified size or if items are
queued for longer than a specified period of time, they are rejected.


In the case of rejected requests, an error message stating that the server is temporarily
overloaded is returned to the client: 1038,'ERR_SES_SERVER_BUSY','rejected as server is
temporarily overloaded'.
The load on the system is measured by background processes which gather a set of
performance statistics covering available capacity for memory and CPU usage. The statistics
are moderated by a configurable averaging factor (exponentially weighted moving average) to
minimize volatility, and the moderated value is used in comparison with the threshold
settings.
The admission control filtering process does not apply to all requests. In particular, requests
that release resources will always be executed, for example, commit, rollback, and
disconnect. The filtering also depends on user privileges: administration requests from
SESSION_ADMIN and WORKLOAD_ADMIN are always executed.
There are some situations where it is not recommended to enable admission control, for
example, during planned maintenance events such as an upgrade or the migration of an
application. In these cases it is expected that the load level is likely to be saturated for a long
time and admission control could therefore result in the failure of important query executions.

Figure 77: Dynamic Admission Control

Figure 78: Dynamic Admission Control Architecture

To monitor the admission control feature, you can use the SAP HANA cockpit or use the
following public monitoring views that are available:
● M_ADMISSION_CONTROL_STATISTICS
● M_ADMISSION_CONTROL_QUEUES
● Extended M_CONNECTIONS.CONNECTION_STATUS for queueing status


Workload Management Integration into the SAP HANA Cockpit

Figure 79: Admission Control in the SAP HANA Cockpit

In the Workload Admission Control Setting application, you can configure the threshold
values for admission control that determine when requests are queued or rejected. These
thresholds are defined as configuration parameters.
The admission control feature is enabled by default and the related threshold values and
configurable parameters are available in the indexserver.ini file. A pair of settings is available
for both memory and CPU that define firstly the queuing level (the default value is 90%) and
secondly, the rejection level (the default is not active). Two parameters are available to
manage the statistics collection process by defining how frequently statistics are collected
and setting the averaging factor that is used to moderate volatility. These parameters,
available in the Workload Admission Control Setting application and in the
admission_control section of the INI file, are summarized in the following table.

enable (default: true)
Enables or disables the admission control feature.

queue_cpu_threshold (default: 90)
The percentage of CPU usage above which requests are queued. Queue details are
available in the view M_ADMISSION_CONTROL_QUEUES.

queue_memory_threshold (default: 90)
The percentage of memory usage above which requests are queued.

reject_cpu_threshold (default: 0)
The percentage of CPU usage above which requests are rejected. The default value 0
means that no requests are rejected, but they may be queued.

reject_memory_threshold (default: 0)
The percentage of memory usage above which requests are rejected. The default value 0
means that no requests are rejected, but they may be queued.

averaging_factor (default: 70)
This percentage value gives a weighting to the statistics averaging process. A low value
has a strong moderating effect (but may not adequately reflect real CPU usage). A value
of 100% means that no averaging is performed, that is, only the current value for memory
and CPU consumption is considered.

statistics_collection_interval (default: 1000 ms)
The statistics collection interval is set by default to 1000 ms (1 second), which has a
negligible effect on performance. Values from 100 ms are supported. Statistics details are
visible in the view M_ADMISSION_CONTROL_STATISTICS.
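For example, the thresholds could be adjusted as follows. The percentages are illustrative only; as noted above, rejection is disabled by default:

```sql
-- Queue new requests above 80% load, reject above 95% (illustrative values)
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('admission_control', 'queue_cpu_threshold')     = '80',
        ('admission_control', 'queue_memory_threshold')  = '80',
        ('admission_control', 'reject_cpu_threshold')    = '95',
        ('admission_control', 'reject_memory_threshold') = '95'
    WITH RECONFIGURE;
```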

Events and Rejection Reasons


If statements are being rejected, you may need to investigate why this is happening. Events
related to admission control are logged and can be reviewed in the
M_ADMISSION_CONTROL_EVENTS view. The key information items here are the event type
(such as a statement was rejected, a statement was queued, or a configuration parameter
was changed) and the event reason, which gives explanatory text related to the type. Other
details in this view include the length of time the statement was queued and the measured
values for memory and CPU usage.
Two parameters are available to manage the event log in the admission_control_events
section of the INI file:

queue_wait_time_threshold (default: 100000)
The length of time (measured in microseconds) for which a request must be queued
before it is included in the event log (the default is one tenth of a second). If the
parameter is set to 0, events are not logged.

record_limit (default: 1000000)
The maximum record count permitted in the monitor of historical events.
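A simple query on the event log might look like the following; check the exact column list against the view definition in your release:

```sql
-- Admission control events with their type and explanatory reason
SELECT EVENT_TYPE, EVENT_REASON
FROM M_ADMISSION_CONTROL_EVENTS;
```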

Queue Management
If requests have been queued, items in the queue are processed when capacity becomes
available. A background job continues to evaluate the load on the system in comparison to the
thresholds. When the load is reduced enough, queued requests are submitted in batches on
an oldest-first basis.
The queue status of a request is visible in the M_CONNECTIONS view. The connection status
value is set to queuing in the column M_CONNECTIONS.CONNECTION_STATUS.
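For example, a query along these lines lists sessions whose requests are currently queued; the exact status literal should be verified against M_CONNECTIONS in your release:

```sql
-- Connections currently held in the admission control queue
SELECT CONNECTION_ID, CONNECTION_STATUS
FROM M_CONNECTIONS
WHERE CONNECTION_STATUS = 'QUEUING';
```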


There are several configuration parameters (in the admission_control section of the INI
file) to manage the queue and how the requests in the queue are released. You can apply a
maximum queue size or a queue timeout value. If either of these limits is exceeded, requests
that would otherwise be queued are rejected. An interval parameter is available to determine
how frequently to check the server load so that de-queueing can start, and a de-queue batch
size setting is also available.

max_queue_size (default: 10000)
The maximum number of requests that can be queued. Requests above this number are
rejected.

dequeue_interval (default: 1000 ms)
Use this parameter to set the frequency of the check to re-evaluate the load in
comparison to the thresholds. The default is 1000 ms (1 second). This value is
recommended to avoid overloading the system, though values above 100 ms are
supported.

dequeue_size (default: 50)
Use this parameter to set the de-queue batch size, that is, the number of queued items
that are released together once the load is sufficiently reduced. This value can be
between 1 and 100 queued requests.

queue_timeout (default: 600 s)
Use this parameter to set the maximum length of time for which items can be queued.
The default is 10 minutes. The minimum value that can be applied is 60 seconds and
there is no maximum limit. Requests queued for this length of time are rejected. Note
that the time-out value applies to all entries in the queue. Any changes made to this
configuration value are applied to all entries in the existing queue.

queue_timeout_check_interval (default: 10000 ms)
Use this parameter to determine how frequently to check whether items have exceeded
the queue time-out limit. The default is 10 seconds. The minimum value that can be
applied is 100 ms and there is no maximum limit.
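As an illustration (both values are placeholders), the queue could be tightened to 5,000 entries with a 5-minute timeout:

```sql
-- Smaller queue and shorter timeout than the defaults (illustrative values)
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('admission_control', 'max_queue_size') = '5000',
        ('admission_control', 'queue_timeout')  = '300'
    WITH RECONFIGURE;
```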

Note:
If Admission Control has been configured and is active, it takes precedence over
any other time-out value which may have been applied. This means that other
timeouts that apply to a query (such as a query timeout) will not be effective until
the query has been de-queued or rejected by the queue timeout.

SAP HANA Workload Classes


SAP HANA Workload Classes
You can manage workloads in SAP HANA by creating workload classes and workload class
mappings. Appropriate workload parameters are then dynamically applied to each client
session.
Workload class settings override other configuration settings (INI file values) which have been
applied. Workload class settings also override user parameter settings which have been


applied by the SQL command ALTER USER. However, workload class settings only apply for
the duration of the current session, whereas changes applied to the user persist. More
detailed examples of precedence are given in a separate section.
To apply workload class settings, client applications can submit client attribute values
(session variables) in the interface connect string as one or more property-value pairs. The
key values which can be used to work with workload classes are: database user, client,
application name, application user, and application type.

Figure 80: Settings at the SAP HANA Session Level

Based on this information the client is classified and mapped to a workload class. If it cannot
be mapped, it is assigned to the default workload class. The configuration parameters
associated with the workload class are read and this sets the resource variable in the session
or statement context.
The list of supported applications includes HANA WebIDE (XS Classic), HANA Studio, ABAP
applications, Lumira, and Crystal Reports. Full details of the session variables that can be
passed in the connect string for each supported client interface are given in SAP Note
2331857: SAP HANA workload class support for SAP client applications.

Caution:
Workload classes cannot be used on an Active/Active (read-only) secondary
node.

Required Privilege
Managing workload classes requires the WORKLOAD ADMIN privilege. Changes to workload
classes or mappings are only applied when a (connected) database client reconnects. In
terms of the privileges of the executing user (DEFINER or INVOKER), the workload mapping is
always determined on the basis of the invoking user, regardless of whether the user has
definer or invoker privileges.
The ABAP server sets the client context information automatically for all ABAP applications.


Users, classes, and mappings are interrelated: if you drop a user in the SAP HANA database,
all related workload classes are dropped, and if you drop a workload class, the related
mappings are also dropped.
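For example, dropping the workload class created later in this lesson would also remove any mappings that reference it:

```sql
-- Dropping the workload class implicitly drops its workload mappings
DROP WORKLOAD CLASS "MyWorkloadClass";
```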

Note:
In a scale-out environment, workload classes are created for the complete SAP
HANA database and do not have to be created for each single node. However,
restrictions defined in these workload classes are applied to each single node
and not to the complete SAP HANA database.

Create Workload Classes Using SAP HANA Cockpit


Several configuration options are available so that you can tailor workload classes in the SAP
HANA database to your needs.
You can manage workload in SAP HANA by creating workload classes and workload class
mappings. Workload classes and mappings are SQL objects for workload management in SAP
HANA. The goal of workload classes and mappings is to provide an easy way for
administrators to regulate applications based on predefined mapping rules, to avoid resource
shortages with regard to CPU and memory consumption. Appropriate workload parameters
are dynamically applied to each client session.

Figure 81: Create Workload Class

You can classify workloads based on user and application context information and apply
configured resource limitations (for example, a statement memory limit). Workload classes
allow SAP HANA to influence dynamic resource consumption on the session or statement
level. When a request from an application arrives in SAP HANA, the corresponding workload


class is determined based on the information given by the session context such as application
name, application user name and database user name. Once the corresponding workload
class is determined, the application request can have its resources limited according to the
workload class definition.
Statement memory limits will not apply if memory tracking is inactive in SAP HANA cockpit.
You can activate memory tracking in the Configuration settings.
You can use workload classes to set values for the properties listed here. Each property also
has a default value, which is applied if no class can be mapped or if no other value is defined.
For all of the following parameters, although you can enter values that include decimal
fractions (such as 1.5 GB), these numbers are rounded down and the whole number is the
effective value that is applied.

Workload Class Name
A name for the new workload class.

Execution Priority
To support better job scheduling, this property prioritizes statements in the current
execution. Priority values from 0 (lowest priority) to 9 (highest) are available. The default
value is 5.

Limit Type
Individual Statement Limit or Total Aggregate Statement Limit.

Statement Memory Limit
Displayed if Individual Statement Limit is the specified limit type. Maximum amount of
memory the statement may use, as either an absolute or relative value.

Total Memory Limit
Displayed if Total Aggregate Statement Limit is the specified limit type. Maximum amount
of memory all statements may use, as either an absolute or relative value.

Statement Thread Limit
Displayed if Individual Statement Limit is the specified limit type. Maximum number of
parallel threads the statement may execute, as either an absolute or relative value.

Total Thread Limit
Displayed if Total Aggregate Statement Limit is the specified limit type. Maximum number
of parallel threads all statements may execute, as either an absolute or relative value.

Query Timeout
The amount of time in seconds before the query times out. (Available for databases
running SAP HANA SPS 03 or higher.)

Note:
For thread and memory limits, workload classes can contain either the statement-
level properties or the aggregated total properties, but not both. For the
aggregated limits, the full set of three properties must be defined: TOTAL
STATEMENT THREAD LIMIT, TOTAL STATEMENT MEMORY LIMIT, and
PRIORITY.


Example
You can set values for one or more resource properties in a single SQL statement. The
following example creates a workload class called MyWorkloadClass with values for all three
properties:
CREATE WORKLOAD CLASS "MyWorkloadClass" SET 'PRIORITY' = '3',
'STATEMENT MEMORY LIMIT' = '2' , 'STATEMENT THREAD LIMIT' = '20'

Examples of Precedence for Query Timeout


If multiple values have been defined using the different timeout methods available, then
precedence rules apply. Firstly, if a valid matching workload class value has been defined, this
takes precedence over the INI file setting. Secondly, if a QueryTimeout value has been
applied, then the smallest (strictest) valid value that has been defined applies. The following
table shows some examples; in each case, the values marked by an asterisk are the ones
that apply.

QueryTimeout                            25    25        25        25*
statement_timeout (ini)                 10    10*       10*       10 (ignored)
STATEMENT TIMEOUT (workload class)      20*   no match  no value  0 (disabled)

Creating a Workload Mapping


Workload mappings link workload classes to client sessions based on the value of specific
client information properties. The class with the most specific match is mapped to the
database client.
The SAP HANA application sends client context information in the ClientInfo object. This is a
list of property-value pairs that an application can set in the client interface. You can change
the running session-context of a connected database client using the following SQL
command:
ALTER SYSTEM ALTER SESSION SET

For more information, see Setting Session-Specific Client Information in the SAP HANA
Developer Guide.
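As a sketch of the full syntax (the session ID 400155 and the property value are hypothetical), an administrator could set a client info property on a connected session like this:

```sql
-- Set the APPLICATION session variable for session 400155 (hypothetical ID)
ALTER SYSTEM ALTER SESSION 400155 SET 'APPLICATION' = 'MyReportingApp';
```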
The properties supported are listed in the following table in order of importance. The
workload class with the greatest number of properties matching the session variables passed
from the client is applied. If two workload classes have the same number of matching
properties, they are matched in the following order of importance.

OBJECT NAME
Object types PROCEDURE, PACKAGE, and AREA are supported. This property only
applies to procedures, including AFLLANG procedures, which are the standard execution
method for application functions. Example: if a workload class is matched to an object
with type AREA, the workload class definition is applied to all AFLLANG procedures that
call application functions in the given AFL AREA. Object type PACKAGE works in a similar
way. If more than one workload class is matched by the OBJECT NAME, the more specific
object type has the higher priority: PROCEDURE > PACKAGE > AREA.

SCHEMA NAME
Schema name of the object defined in the OBJECT NAME property.

XS APPLICATION USER NAME*
Name of the XS application user. For XSA applications, which use the session variable
XS_APPLICATIONUSER for the business user value.

APPLICATION USER NAME*
The name of the application user, usually the user logged into the application.

CLIENT*
The client number, usually applied by SAP ABAP applications such as SAP Business
Suite / Business Warehouse.

APPLICATION COMPONENT NAME*
The name of the application component. This value is used to identify sub-components
of an application, such as CRM inside the SAP Business Suite.

APPLICATION COMPONENT TYPE*
This value is used to provide coarse-grained properties of the workload generated by
application components. In the future, SAP may document well-defined application
component types to identify, for example, batch processing or interactive processing.

APPLICATION NAME*
The name of the application.

USER NAME
The name of the SAP HANA database user, that is, the CURRENT_USER of the database
session the application is connected to. Alternatively, you can use the name of a user
group; if both user name and user group are provided, a validation error is triggered. The
user name has a higher priority than the user group in cases where these properties are
required to determine the best match.

Properties marked with an * support the use of wildcard characters.

Example
This example creates a workload mapping called MyWorkloadMapping that applies the
values of the MyWorkloadClass class to all sessions where the application name value is
HDBStudio:
CREATE WORKLOAD MAPPING "MyWorkloadMapping" WORKLOAD CLASS
"MyWorkloadClass" SET 'APPLICATION NAME' = 'HDBStudio';
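For reference, a workload class with the limits referred to by such a mapping can be created as follows. This is a sketch only; the class name, limit values, and wildcard pattern are illustrative, not taken from this course's exercise systems:

```sql
-- Create a workload class with a priority, a thread limit,
-- and a memory limit (the memory limit is specified in GB).
CREATE WORKLOAD CLASS "MyWorkloadClass"
  SET 'PRIORITY' = '3',
      'STATEMENT THREAD LIMIT' = '10',
      'STATEMENT MEMORY LIMIT' = '20';

-- Map the class using a wildcard on a property that supports
-- wildcards, here all application names starting with HDB.
CREATE WORKLOAD MAPPING "MyWildcardMapping" WORKLOAD CLASS "MyWorkloadClass"
  SET 'APPLICATION NAME' = 'HDB*';
```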

Hints for Workload Classes


To give control over workload classes at run time, a WORKLOAD_CLASS hint is available. You
can use this to apply more restrictive properties than the ones otherwise defined. For
example, the currently applied workload class YOUR_WORKLOAD_CLASS applies the values PRIORITY 5,
THREAD 5, and MEMORY 50GB. This is then overridden by the values defined in a new class, passed
as a hint, that applies a higher priority value, a lower thread limit, and a lower memory threshold:
SELECT * FROM T1 WITH HINT( WORKLOAD_CLASS("MY_WORKLOAD_CLASS") );

This example applies more restrictive limits than those already defined, and by default
workload class hints can only be used in this way. The hint is ignored if any of the new values
weaken the restrictions or if any values are invalid. You can change this default behavior by
switching the allow_more_resources_by_hint configuration parameter in the
session_workload_management section of the indexserver.ini file. If this parameter is set
to true, any hint can be applied.
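The parameter change described above can be made from the SQL console. A sketch, assuming system-layer scope is intended:

```sql
-- Allow hints that weaken workload class restrictions (the default is false).
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('session_workload_management', 'allow_more_resources_by_hint') = 'true'
  WITH RECONFIGURE;
```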

When are Restrictions Applied?


Generally, limitations applied by configuration settings, user profile parameters, and workload
class settings are applied as queries are compiled. For workload classes, you can see this by
querying the M_EXPENSIVE_STATEMENTS monitoring view to see the operation value
(COMPILE, SELECT, FETCH, and so on) at the point when the workload class was applied to a
statement string. This is not true, however, if the workload class mapping was overridden by a
hint, and it also does not apply to the OBJECT NAME and SCHEMA NAME workload class
properties. In these cases, the statement can only be evaluated after compilation.

Monitoring Workload Classes


The following system views allow you to monitor workload classes and workload mappings:
● WORKLOAD_CLASSES
● WORKLOAD_MAPPINGS

In these system views the field WORKLOAD_CLASS_NAME shows the effective workload
class used for the last execution of that statement:
● M_ACTIVE_STATEMENTS
● M_PREPARED_STATEMENTS
● M_EXPENSIVE_STATEMENTS (enable_tracking and memory_tracking must first be
enabled in the global.ini file for this view)
● M_CONNECTIONS

If no workload class is applied, these views display the pseudo-workload class value
_SYS_DEFAULT.
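As an illustration, the effective workload class of currently running statements can be checked with a query such as the following. The WORKLOAD_CLASS_NAME field is described above; the other column names follow the documented layout of M_ACTIVE_STATEMENTS:

```sql
-- Show which workload class governed each active statement;
-- _SYS_DEFAULT indicates that no workload class was applied.
SELECT CONNECTION_ID,
       STATEMENT_STRING,
       WORKLOAD_CLASS_NAME
  FROM M_ACTIVE_STATEMENTS;
```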

LESSON SUMMARY
You should now be able to:
● Set up SAP HANA workload management



Unit 3
Lesson 3
Using SAP HANA Capture and Replay

LESSON OBJECTIVES
After completing this lesson, you will be able to:
● Capture and replay a SAP HANA workload

Overview of SAP HANA Capture and Replay


Business Example
You want to use the SAP HANA capture and replay functionality to help you find possible
performance or stability problems after changes in the hardware or software configuration.

What Is SAP HANA Capture and Replay?

Figure 82: SAP HANA Capture and Replay

The SAP HANA Capture and Replay performance management tool allows you to capture the
workload of a source system and to replay the captured workload on a target system, without
the original applications being involved.
Moreover, you can use the tool to analyze the captured workload and the reports generated
after replaying the workload. Comparing the performance between the source and target
systems can help you find the root cause of performance differences.


The following changes may require a check of the existing system, concerning both
performance and stability:
● Hardware change
● SAP HANA revision upgrade
● SAP HANA INI file change
● Table partitioning change
● Index change
● Landscape reorganization for SAP HANA scale-out systems
● Applying hints to queries

What Is a Workload?
A workload in the context of SAP HANA can be described as a set of requests with common
characteristics.
In the context of SAP HANA capture and replay, a workload can be any change to the
database via SQL statements that come from SAP HANA client interfaces such as JDBC,
ODBC, or DBSL. The workload can be created by applications or clients (for example, SAP
NetWeaver or analytics applications).
You can look at the details of a workload in several ways. Firstly, you can look at the source of
requests and determine if applications or application users generate a high workload for the
system. You can examine what kinds of SQL statements are generated. Are they simple or
complex statements? Is there a prioritization of work done based on business importance?
For example, does one part of the business need to have more access at peak times? You can
then look at what kind of service level objectives the business has in terms of response times
and throughput.

How Does SAP HANA Capture and Replay Work?


The following figure gives an overview of the main steps involved in the capture and replay
process.


Figure 83: How SAP HANA Capture and Replay Works

The main steps involved in the capture and replay process are:

1. Capture
In this step the tool automatically collects the execution context information together with
the incoming requests to the database. The captured workload file stores the start times
of the SQL statements.
A database backup is recommended after starting capturing, to ensure that the source
and target systems are in a consistent state.

2. Preprocess
In this step the tool reconstructs and optimizes the captured workload file so it can be
replayed on a target system. This process is a one-time operation and the stored
preprocessed workload file can be replayed multiple times.

3. Replay
The replayer is a service on operating system level that needs to be started before
replaying.
The tool replays the preprocessed file based on the SQL statement timestamp or on the
transactional order. Together with the collected execution context it allows you to
accurately simulate the database workload.

4. Analyze
For a final analysis, you can generate comparison reports displaying a capture-replay or a
replay-replay comparison. You can analyze the statements based on results or on
performance.


Note:
We recommend that you perform a full database backup after starting the capture
step. This backup needs to be restored on the target system before the replay
phase starts to ensure that the source and target systems are in a consistent
state.
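The recommended full database backup can also be triggered from the SQL console. A sketch, assuming a file-based backup is wanted; the backup prefix is illustrative only:

```sql
-- Full data backup, written to the backup location with the given prefix.
BACKUP DATA USING FILE ('BEFORE_REPLAY');
```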

What Are the Landscape Requirements for Using SAP HANA Capture and Replay?
You can use a two- or three-system setup as a SAP HANA Capture and Replay landscape. The
following figure shows both the two-tier setup and the three-tier setup.

Figure 84: Possible Capture and Replay System Landscape Setups

In a two-system setup, you need a source and a target system. The control system and the
target system share the same host.
In a three-system setup, a separate control system is added. This control system is the
system running the cockpit and is the system for storing intermediate preprocessed or replay
results.
The advantage of a three-system setup is that the replay results are stored in a separate
control system. This means the capture information is not lost when recovering the target
system.
Consider the following recommendations when using SAP HANA capture and replay:
● Check the disk performance to ensure that there is sufficient bandwidth for capturing and
preprocessing workloads without any performance bottlenecks. If disk performance is not
sufficient, the active capture can impact the source system.


● Check the available disk space in combination with the characteristics of the workload that
should be captured. The required disk space is highly dependent on the type of workload
being captured.
● Use the disk space that is dedicated to the database instance itself.
● One replayer service is sufficient to execute a replay successfully. For better scalability and
performance in large workload scenarios, multiple replayers can be used for all replaying
purposes. When using multiple replayers, distribute and divide all involved components
(for example, target instance, control instance, one or more replayers) on different hosts
and systems. Doing so will reflect the initial captured workload as realistically as possible.
This will also reduce the effect which the resource consumption of the components may
have on a replay.
● Use a separate control and target instance for replaying workloads. If a replayed statement
causes a crash, it is displayed in the replay report. If you use the same control and target
instance, the replay report entry for the statement causing the crash cannot be successfully
sent to the control instance.
● Use the secure store for saving passwords and authenticating users.
● The target system should meet the same privacy and security prerequisites as the source
system. Since the target system processes the same data as the source system, it should
meet an appropriate security level depending on data criticality.
Unnecessary network connections to the target system should not be allowed. Users
registered on the source system might be able to access the target system after a replay
has been completed.
● Regarding version dependencies, the following rules apply:
- Target system >= control system and replayers >= source system.
- The source system should be at least revision 122.14 for captures with transactional replay
enabled.
● To trigger replays, the control system and target system must be registered in the same
SAP HANA cockpit. The user in the SAP HANA cockpit must be able to access both
systems.
When registering the target system, the cockpit does not store the credentials.

System Privileges for SAP HANA Capture and Replay

● WORKLOAD CAPTURE ADMIN – Required for capturing workloads.


● WORKLOAD REPLAY ADMIN – Required for preprocessing and replaying workloads, and for
viewing the load chart in the replay report.
● WORKLOAD ANALYZE ADMIN – Required for viewing the load chart in the replay report.

Capturing a Workload on the Source System


You can capture the entire workload from a source system or only a part of this workload.
To capture the workload from a source system, use the Capture Workload card.
After capturing the workload from a source system, a captured workload file is available for
replay. This captured workload file contains multiple captured workload segment files. At a
conceptual level, we refer to the captured workload file using the shorter term
captured workload.


Figure 85: Source System – Capture a Workload

To capture a workload on the source system, you need a user with WORKLOAD CAPTURE
ADMIN system privilege. Additionally, you can add the optional privileges INIFILE ADMIN and
BACKUP OPERATOR. These two additional privileges let you change parameter values in the
optional filters on the capture configuration page, and start database backups while the
workload capture is running.

Steps to Capture a Workload


1. On the Overview page, choose the Capture Workload card. The Capture Management page
opens. If you have already captured workload with SAP HANA capture and replay, you see
the captured workload on the current system.

2. To change INI configuration parameters, choose Configure Capture.


In the Configure Capture screen, you can change the Capture Destination. By default, the
captured workload file is stored in the $SAP_RETRIEVAL_PATH/trace directory. Because
the default trace directory generally resides in the same storage area as data and log
volumes, capturing workloads may affect the performance across the entire system over
time. Enter a different destination for the captured workload file to have a better
distribution of the disk I/O between the data and log volumes, and the captured workload
file.

3. To start configuring the new capture, on the Capture Management page, choose New
Capture.
On the Configure New Capture page, it is mandatory to enter the name of the new capture.
You can customize other optional settings before you start the capture.

4. Choose Start Capture.


The Capture Monitor opens, displaying monitoring information such as duration, the
number of captured statements, or disk space. You can stop the capture or you can let it
run for as long as you wish. If you did not create a backup when starting the capture, you
can also start a full backup from the Capture Monitor page.

Note:
By default, the captured workload file is stored under the trace directory
$DIR_INSTANCE/<host name>/trace with a CPT file extension.

After the capture is complete, the new captured workload has the status Captured. By
choosing the new captured workload, the Capture Report opens, displaying information about
the captured workload. You can continue analyzing the captured workload. For more
information, see Analyze a Captured Workload.
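Capture status can also be checked from SQL. A sketch, assuming the M_WORKLOAD_CAPTURES monitoring view with its CAPTURE_NAME and STATUS columns (verify the column names against the SQL reference of your revision):

```sql
-- List the workload captures known to this system with their current status.
SELECT CAPTURE_NAME, STATUS
  FROM M_WORKLOAD_CAPTURES;
```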

Capture Configuration Settings


You can customize several optional settings before you start capturing a workload. The
following list provides an overview of the optional settings that can be customized.
● Description: Enter a description of the capture for future reference. You can use this
description to identify different scenarios in the same system.
● Schedule: Schedule the capture by specifying the start and end time.
● Overwrite Capture When Time Exceeds: Turn it on and enter a time to remove the
captured workload segment files that are older than the specified time you entered. Only
closed segments are deleted. The currently active captured workload segment file is not
affected.
● Overwrite Capture When Disk Usage Exceeds: Turn it on and select a ratio to remove the
old captured workload segment files when the disk usage exceeds the specified
percentage. Only closed segments are deleted. The currently active captured workload
segment file is not affected.
● Collect Explain Plan: Turn this on to collect the output of the EXPLAIN PLAN command for
the captured statements. You can use this information for analysis after the replay.
● Collect Workload Details: Turn this on to collect additional information for the
instrumentation-based workload analyzer such as application source, involved threads,
network statistics, or related objects.
If this option is disabled, the captured workload file can still be viewed using the
instrumentation-based workload analyzer, but less information is available for the review.
● Abstract SQL Plan: Turn this on to collect additional information on the SQL Plan Stability.
Information related to the physical execution plan is collected and stored as Abstract SQL
Plan (ASP) to the persistent storage for future reference.
● Optional Filter: Select additional filters to capture only desired aspects of the workload.
Filters can include different aspects such as Application Name, Database User Name, and
Statement Hash.
● Create Full Backup: Turn this on to automatically create a full database backup after
starting the capture.


Hint:
To ensure that the source system and the target system are in a consistent
state for capture and replay, we recommend performing a full database
backup after starting the capture. A full database backup is only required the
first time, because incremental backups can be used once the system has
been initialized. For more information, see SAP HANA Backup and Recovery.

Analyze a Captured Workload


You can use the workload analyzer, based on engine instrumentation, to analyze the captured
workload before starting the replay.

Figure 86: Analyze a Captured Workload

Analyze Workload provides information on the number of traced files, as well as information
on the number of loaded files. To analyze the traced workload data, the file must be loaded
into the database.

Prerequisites
You have the system privilege WORKLOAD ANALYZE ADMIN.

Steps to Analyze a Captured Workload


1. To analyze the captured workload from a source system, use the Capture Workload card,
then select the capture you want to analyze.

2. To load the workload into the SAP HANA database, choose Load Workload.


3. When the data is loaded, the Analyze Workload button appears.

4. On the Workload Analysis page, several analysis charts and information tables are
displayed.

Preparations Before Replaying a Workload


Before you can replay a captured workload, you need to perform some preparation steps at
the SAP HANA and operating system levels. The required steps are shown in the following
figure.

Figure 87: Preparations Before Replaying a Workload

Set up the Replayer User Account in the Target System


In the target system, you need a user account to control the replayer. This user account
requires the WORKLOAD REPLAY ADMIN system privilege. Store the logon credentials in the
secure store of the control system.
As <sid>adm on the control system, execute the following command:
hdbuserstore SET <User key> <Target system>@<Target tenant> <User name> <Password>

Starting the Replayer


The replayer is a service at the operating system level that must be running before starting
the replay.

Note:
The replayer is not part of the SAP HANA database services that are running as
daemon processes. You must start and stop the replayer yourself.


Prerequisites
● A user with the WORKLOAD REPLAY ADMIN system privilege to control the replayer.
● Store the logon credentials in the secure store.
● When using multiple replayers, distribute and divide all involved components (for example,
target instance, control instance, and one or more replayers) on different hosts and
systems.

As <sid>adm, on the control system, execute the following command:

hdbwlreplayer -controlhost <Target system> -controlinstnum <Target instance number>
-controladminkey <User key (for example, CNR-Admin)> -controldbname <Target tenant>
-port <listen port>

Preprocess a Captured Workload on the Target System


The captured workload must be preprocessed before you can replay it on the target system.
The preprocessing step is necessary to optimize the captured workload file before replaying
it.
While preprocessing the captured workload file, the captured statement segment files are
stored in a directory. After the preprocessing is complete, the output is a preprocessed
workload file containing the directory with multiple files.
At the conceptual level, we refer to the preprocessed workload file using the shorter term
preprocessed workload. The following figure shows the concepts and the terminology:

Figure 88: Preprocess a Captured Workload File

The replay process is performed by the replayer, which must be running before you start the
replay. The replayer is a service at operating system level that reads SQL commands from the
preprocessed workload file and executes them one-by-one in timestamp-based order. All
preprocessed workloads can be replayed as often as necessary.

Hint:
SAP recommends performing the preprocessing step in the target system or in a
separate control system, not in the production system. The preprocessing may
require significant computing power.

You can preprocess a captured workload and replay the preprocessed workload using the
Replay Workload card.

Prerequisites
● A user with WORKLOAD REPLAY ADMIN system privilege.


● You have captured workloads using the Capture Workload card.


● Copy the captured workload file from the production system to the target system in the
trace directory. If you use a control system that is different from the target system, copy
the captured workload file to the control system.

Figure 89: Preprocess a Captured Workload

Steps to Preprocess a Captured Workload


1. On the Overview page, choose the Replay Workload card. The Replay Management page
opens displaying an overview of the replay candidates located on the current system.

2. Check the status of the captured workload that you want to preprocess. The status should
be Not Preprocessed.

3. Optional: To change INI configuration parameters, choose Configure Replay.

4. Choose the Start link below the Not Preprocessed status.

5. The preprocessing starts. The runtime depends on the size of the captured workload. You
can manually refresh the screen by using the refresh button at the top right of the screen.

6. As soon as the preprocessing is done, the status changes to Preprocessed.

Preprocess Destination: After the preprocessing is complete, the preprocessed workload file
is stored by default in the $SAP_RETRIEVAL_PATH/trace directory. Because the default
trace directory generally resides in the same storage area with data and log volumes,
preprocessing workloads may affect the performance across the entire system. Enter a
different destination to have a better distribution of the disk I/O between the data and log
volumes, and the preprocessed files.


Note:
A runtime comparison threshold can be configured for the replay report. As an example, an
executed statement A has a runtime of 10 ms during capture and 12 ms during replay. If the
runtime difference in the target system is lower than the configured threshold value, the
statement is not listed in the replay report as slower or faster, but as comparable.

Replay a Preprocessed Workload on the Target System


You can replay the preprocessed workload based on the SQL statement timestamp or on the
transactional order.
Replaying a workload implies that the captured statements are executed again. You can
replay all captured workloads as often as necessary.
When running consecutive replays, we recommend restoring the target system back to a
consistent state after a replay and before running another replay. This is necessary because
after replaying a workload on a system, any changes applied during that replay remain active
in the system.

Hint:
Manually copy the captured workload files and the database backup from the
source system to the control or target system.

Prerequisites
● A user with WORKLOAD REPLAY ADMIN and WORKLOAD ANALYZE ADMIN system
privileges.
● The target system meets the same security and privacy prerequisites as the source
system. Since the target system processes the same data as the source system, it should
meet an appropriate security level depending on data criticality.
● You have preprocessed the captured workloads using the Replay Workload card.
● The replayer is running.

Caution:
Do not allow unnecessary network connections to the target system. Users
registered on the source system could access the target system after the replay
is complete.


Figure 90: Replay a Preprocessed Workload

Steps to Replay a Preprocessed Workload


1. On the Overview page, choose the Replay Workload card. The Replay Management page
opens displaying the captured workload.

2. Choose a replay candidate with the status Preprocessed to start configuring it for the
replay. The Replay Configuration page opens, allowing you to configure various mandatory
and optional settings.

3. To start the replay, choose Start Replay.

If a database backup is available, restore the database before starting the replay in the target
system. When running a replay on a target system that has been restored using a backup
taken automatically during the capture process, activate the Synchronize Replay with Backup
option.
If no database backup or only an outdated one is available, you can still restore the database
or manually export parts of the data before starting the replay in the target system. When
running a replay on a target system that was restored using old backups, or that contains only
smaller manual exports of data, deactivate the Synchronize Replay with Backup option.
The Replay Management screen opens, displaying the workloads that are being replayed in
the Replay List tab. To access the Replay Monitor, choose the running replay. The monitoring
view provides information such as duration, number of statements, size, and other details
about the replay in progress. You can navigate away from the monitoring view using the arrow
at the top right and return at any time.
If you have already replayed preprocessed workloads, you can generate comparison reports
for further analysis. For more information, see Generating Comparison Reports.
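Replay runs can also be inspected from SQL. A sketch, assuming the M_WORKLOAD_REPLAYS monitoring view, which accompanies the capture-and-replay views; check its exact column list in the SQL reference of your revision:

```sql
-- Inspect the replay runs known to this system, including their status.
SELECT * FROM M_WORKLOAD_REPLAYS;
```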


Replay Configuration Settings


The Replay Configuration allows you to set various parameters to best suit how the replay is
to be processed. The General Information page allows you to customize the following
mandatory and optional settings:
General Replay Information
● Replay Name: Enter a name for the replay. By default, this field has the same name as the
initial captured file.
● Description: Enter a description for the replay for your future reference. This information
can be used when changing settings for different replays.

Target System Information


● Host: Enter the target host name (for example, hana01) where the capture is replayed.

● Instance Number: Enter the target instance number (for example, 42) where the capture is
replayed.
● Database Mode: Choose between Single Container or Multiple Containers.

Replayer Options
● User Name: Enter a database user who has WORKLOAD REPLAY ADMIN privilege and is
used for the final preparation steps in the target instance.
● Password: Enter the database user password.
● Request Rate: Modify the rate at which the statements are replayed.
You can decrease the wait time between statements during the replay. For example,
statement B starts one second after statement A has been triggered. When setting the
request rate from 1x to 2x, this difference is only 0.5 seconds.
● Consistent with Backup: This option allows you to synchronize the replay with an existing
database backup.
The option is turned on by default allowing the replayer to compare each statement with
the database backup. This option makes it possible to check if there are no duplicate
inserts and if the backup and replay are aligned. A backup is required for this option to
work correctly.
If the option is turned off, the replayer replays statements, even if no backup is present.
This is important for scenarios in which you use only single tables, or smaller data exports,
which are not considered a complete backup.
● Collect Explain Plan: Collect the output of the EXPLAIN PLAN command for captured
statements. You can use this information for comparison after the replay.
● Transactional Replay: This option enables guaranteed transactional consistency during a
replay.

Caution:
Enabling this option may cause overhead to query runtime as transactional
consistency needs to be checked constantly.

Replayer Information


● Replayer List: Select a running replayer that is used to connect to the target system and
facilitates the replay.
● User Authentication: Enter the password or the secure store key for the database users
captured in the source system. For a realistic replay, all users that are part of the workload
which you have chosen to replay must be authenticated.
To reset the password for the database users captured in the source system, select Reset
Password, then choose the user. This can be helpful when you do not know the actual
password of each user. On the Reset Password window, set a new password for all selected
users, and choose Confirm. All selected user passwords in the defined target system are
changed as defined in this step.

Generating Comparison Reports


Comparison reports can be generated after successfully replaying a captured workload. You
can generate comparison reports displaying a capture-replay or a replay-replay comparison.

Capture-Replay Comparison Report


You can open a comparison report by choosing a replay from the Replay List. When opening a
comparison report directly from the Replay List, the report shown always compares values
from the original captured workload with values from the replay. This comparison report also
allows you to export the replay results to store them outside the database. Choose the arrow
at top right to export the replay results. Back in the Replay List, you can import a replay at the
top right.

Note:
Storing the replay results outside the database can be useful when the target and
control systems are the same. In such a setup, the previous replay results of the
control system could be overwritten after recovering the target system from the
database backup.

Generate a Replay-Replay Comparison Report


You can compare two or more replayed workloads with each other based on the same initial
captured workload. When using the Compare Replays button, the report shown always
compares different replays with each other based on the same initial captured workload.

Prerequisites
● A user with WORKLOAD REPLAY ADMIN and WORKLOAD ANALYZE ADMIN system
privileges.
● You have replayed preprocessed workloads using the Replay Workload card.

You can start the comparison of the replayed workloads from the Replay List in the Replay
Management.

Steps to Generate Comparison Reports


1. On the Replay List tab, choose Compare Replays. The Select Baseline Replay dialog opens,
allowing you to select the replayed workload that you want to compare. Use the Target
<SID> information to distinguish between the replays.

2. Select one entry from the displayed list and choose Close. The Select Target Replay dialog
opens, allowing you to select the replayed workload that you want to compare with the


previously selected workload. The list displays replayed workloads based on the same
initial captured workload.

3. Select one or more entries from the displayed list and choose Compare Replays. The
Comparison Report opens, displaying a comparison of the selected replayed workloads.

Analyzing Comparison Reports


You can use comparison reports to analyze the completed replay. The information is
displayed on four tabs.

Figure 91: Generating Comparison Reports

Overview Tab
The Overview tab displays an overall comparison of the SQL statements involved in the
capturing and replaying process in the following blocks:
● Result Comparison: In a result-based comparison, you get an overview of the statements
with identical or different results. Choose the block to open the Result Comparison tab
directly.
● Performance Comparison: In a performance-based comparison, you get an overview of the
statements based on a comparison of runtimes. Choose the block to open the Performance
Comparison tab directly.
● Different Statements: Displays the top SQL statements that have different results from the
selected baseline in descending order. You can choose each row to open the Execution
Detail page for the selected SQL statement. Use the drop-down arrow to filter the
statements by time or by the number of records that have different results.
● Slower Statements: Displays the top SQL statements that have a different performance
ordered by the difference in execution time. You can choose each row to open the
Execution Detail page for the selected SQL statement. To view KPI details for each
statement, choose the icon to the right.
● Verification Skipped: Displays the distribution of reasons for statements with skipped
result comparison.


● Replay Failed Statements: Displays the distribution of reasons for statements that failed
during replay. Use the drop-down arrow to filter the statements by time or error code.
● Capture Information: Displays information on the capture system, capture options, and
the properties of the capture file.
● Replay Information: Displays information on the replay system and the replay options. If
the comparison was made between two replays, the information is displayed in a Baseline
Replay Information block and in a Target Replay Information block.

Load Tab
The Load tab includes load charts comparing both the captured and the replayed workloads
after a capture-replay comparison, or the baseline and the target workloads after a replay-
replay comparison. The KPIs can be toggled independently for both the capture and replay
aspects, making it easier to compare them with each other. Additional KPIs can be added
using the Show More KPIs button at the top right of the load chart.

Result Comparison Tab


The result-based comparison provides an overview of statements with, for example, identical
or different results.
The result-based replay report also classifies statements as either deterministic or
non-deterministic. Deterministic statements should always deliver the same results during a
replay. Non-deterministic statements may deliver different results (for example, when they
do not specify an explicit sort order).
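As an illustration (the schema and table names below are hypothetical), whether a statement is deterministic often comes down to whether it fixes its own result set and order:

```sql
-- Non-deterministic: without ORDER BY, the row order is not guaranteed,
-- so two replays may return the same rows in a different order.
SELECT order_id, amount FROM "MYSCHEMA"."ORDERS";

-- Deterministic: an explicit sort on a unique key makes the result
-- reproducible across replays.
SELECT order_id, amount FROM "MYSCHEMA"."ORDERS" ORDER BY order_id;

-- Also non-deterministic: the result depends on the execution time.
SELECT CURRENT_TIMESTAMP FROM DUMMY;
```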
The list below the graphic displays a detailed view of the statements that were selected using
the categories at the top of the report. To open the execution details for a specific statement,
choose the statement from the list.

Performance Comparison Tab


The performance-based comparison provides an overview of statements compared by
runtime.
Based on the tolerance ratio, statements are classified as Comparable (when their runtimes
are similar within the defined tolerance ratio), Faster, Slower, or Failed. For more
information about the tolerance ratio, see Replay Report Elapsed Time Threshold in
Preprocess a Captured Workload.
The list below the graphic displays a detailed view of the statements that were selected using
the categories at the top of the report. To open the execution details for a specific statement,
choose the statement from the list.
You can use the detailed execution level of both report types to compare the EXPLAIN PLAN
results between the initial captured and replayed workloads, or between the baseline and
target workloads. Comparing the plans can provide guidance and pointers for further
statement-level investigation. This is only possible if the Collect Explain Plan setting was
activated for both the capture and the replay during the configuration steps. For more
information about this setting, see Capture Configuration Settings and Replay Configuration
Settings.
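As a sketch of what such a comparison builds on, an EXPLAIN PLAN result can also be generated manually in SQL (the statement name, schema, and table below are hypothetical examples):

```sql
-- Generate an execution plan for a statement and tag it with a name.
EXPLAIN PLAN SET STATEMENT_NAME = 'replay_check' FOR
  SELECT order_id, amount
    FROM "MYSCHEMA"."ORDERS"
   WHERE amount > 1000;

-- Read the plan back; comparing this output between the capture and
-- replay systems can reveal changed operators or access paths.
SELECT operator_name, operator_details, table_name
  FROM SYS.EXPLAIN_PLAN_TABLE
 WHERE statement_name = 'replay_check';
```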


Unit 3: Proactive Monitoring and Performance Safeguarding

Resume Monitoring and Performance Tuning

Figure 92: Resume Monitoring and Performance Tuning

In the first part of the course HA215, we looked at specific troubleshooting and system
analysis when the SAP HANA database system is offline, hanging, or slow. We discussed how
to investigate the system status and how to generate a full system dump of the trace files.
In the second part, we looked at how to perform a performance root cause analysis on issues
regarding high memory, CPU and disk utilization. We also looked at how to identify expensive
SQL statements.
In the final part, we looked at the tools alerting framework, workload management, and the
capture and replay functionality provided by SAP HANA.
Using the SAP HANA Alerting Framework allows you to be informed up-front about possible
problems. Alerts are shown in SAP HANA cockpit, but they can also be sent using email.
SAP HANA workload management lets you control what is executed on the SAP HANA
database when the system is under high load. You can set up rules that allow SAP HANA to
decide what can be executed when the system load is high.
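As an illustration of such a rule (the names and values below are hypothetical, and the exact properties available depend on your SAP HANA revision), workload classes and mappings can be defined in SQL:

```sql
-- Define a workload class that caps resources for low-priority work.
-- PRIORITY ranges from 0 (lowest) to 9; the memory limit is in GB.
CREATE WORKLOAD CLASS "REPORTING_LOW"
  SET 'PRIORITY' = '2',
      'STATEMENT MEMORY LIMIT' = '20',
      'STATEMENT THREAD LIMIT' = '10';

-- Map sessions of a given application to that class.
CREATE WORKLOAD MAPPING "REPORTING_LOW_MAP"
  WORKLOAD CLASS "REPORTING_LOW"
  SET 'APPLICATION NAME' = 'ReportingTool';
```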
SAP HANA Capture and Replay lets you capture the current workload on your SAP HANA
production system and then replay this workload on a test system. This allows you to
investigate regression and/or performance degradation problems after a change to the SAP
HANA database hardware and software configuration. The tool can also be used to test the
performance and stability of a new SAP HANA version or support package stack.

LESSON SUMMARY
You should now be able to:
● Capture and replay an SAP HANA workload



Unit 3

Learning Assessment

1. When configuring SAP HANA alerts, you can only enter one email recipient per alert.
Determine whether this statement is true or false.

X True

X False

2. Which protocol is used by the statistics service to collect statistical and performance
information?
Choose the correct answer.

X A JSON

X B MDX

X C SNMP

X D SQL

3. What are the workload characteristics of SAP HANA requests?


Choose the correct answers.

X A The target of a request

X B The type of a query

X C The business importance

X D The disk level objectives

4. CPU-binding is a resource pooling technique at the SAP HANA kernel level, and it can be
configured within the SAP HANA database.
Determine whether this statement is true or false.

X True

X False


5. What changes may influence a system’s performance and stability?


Choose the correct answers.

X A Index changes

X B Log Mode changes

X C Table distribution changes

X D Memory allocation limit

6. The SAP HANA Capture and Replay tool allows you to capture real system workload in a
productive environment, and preprocess and replay the captured workload on a different
target system.
Determine whether this statement is true or false.

X True

X False




Unit 3

Learning Assessment - Answers

1. When configuring SAP HANA alerts, you can only enter one email recipient per alert.
Determine whether this statement is true or false.

X True

X False

Correct! You can configure more than one email recipient per alert. Read more about this
in the lesson "Configuring SAP HANA Alerting Framework" of the course HA215.

2. Which protocol is used by the statistics service to collect statistical and performance
information?
Choose the correct answer.

X A JSON

X B MDX

X C SNMP

X D SQL

You are correct! The statistics service uses SQL to read the data from the SAP HANA
monitoring views. Read more about this in the lesson "Configuring SAP HANA Alerting
Framework" of the course HA215.

3. What are the workload characteristics of SAP HANA requests?


Choose the correct answers.

X A The target of a request

X B The type of a query

X C The business importance

X D The disk level objectives

Correct! The query type and business importance are characteristics of the workload.
Read more about this in the lesson "Setting up SAP HANA Workload Management" of the
course HA215.


4. CPU-binding is a resource pooling technique at the SAP HANA kernel level, and it can be
configured within the SAP HANA database.
Determine whether this statement is true or false.

X True

X False

Correct! CPU-binding is a resource pooling technique at the OS level, so it is outside the
control of the SAP HANA database. Read more about this in the lesson "Setting up SAP HANA
Workload Management" of the course HA215.

5. What changes may influence a system’s performance and stability?


Choose the correct answers.

X A Index changes

X B Log Mode changes

X C Table distribution changes

X D Memory allocation limit

You are correct! Changes to an index and/or the table distribution influence the way SAP
HANA accesses data, so this can have an impact on the performance. Read more about
this in the lesson "Using SAP HANA Capture and Replay" of the course HA215.

6. The SAP HANA Capture and Replay tool allows you to capture real system workload in a
productive environment, and preprocess and replay the captured workload on a different
target system.
Determine whether this statement is true or false.

X True

X False

Correct! The capture and replay does not need to be performed on the same server. Read
more about this in the lesson "Using SAP HANA Capture and Replay" of the course HA215.



