SAP Knowledge Base Article

 

1612283 - Hardware Configuration Standards and Guidance
Version: 6 | Validity: 11.10.2013 - active | Language: English

 

Symptom
Information is required on the correct and efficient specification and configuration of Intel/AMD x64 hardware running Windows for SAP ABAP and SAP Java application server environments. This note also discusses SAP on Windows in virtualized environments.

Cause
“Intel” x86 based hardware (based on either Intel or AMD) has evolved rapidly in recent years.  Many new technologies and features in Windows  and Intel H/W platforms (hereafter called “Intel”) directly impact the optimal configurations for SAP systems.  SAP ABAP and SAP Java application servers should be deployed after reviewing the recommendations in this note.  The configurations in this  SAP Note have been tested and proven by SAP, Microsoft and hardware vendors in lab tests, benchmarks and customer deployments.  More information on SAP standard benchmarks and the term “SAPS” can be found at http://www.sap.com/benchmark/ 

Resolution
Prior to purchasing new hardware, when installing and configuring SAP on Windows on physical or virtual environments, follow the deployment guidelines in the PDF file attached to this SAP note. 

General
SAP server throughput (as measured by SAPS) has increased significantly on Intel based server hardware in recent years. Intel/AMD and OEM hardware manufacturers achieved performance increases by introducing many new technologies and concepts. SAP applications require appropriate hardware configurations and parameterization to achieve the performance and throughput increases demonstrated in the SAP Standard Application benchmarks. Inappropriate Intel configurations could cause significant performance problems, unpredictable performance (sometimes slow) or significant underperformance relative to SAP Standard Application benchmarks. Provided the concepts and configurations documented in this note are followed, these problems should not occur.

1. Overview of Modern Intel Server Technologies

1.1. Clock Speed

All SAP work processes other than Message Server and Enqueue Server execute logic within a single thread. The performance of batch jobs in particular and other work process types in general is largely determined by the latency of database requests and by the time a SAP work process spends running on a single CPU Windows thread. SCU (Single Computing Unit, see note 1501701) is the SAP specific terminology for describing per “thread” throughput. SCU is very important in determining the performance of a SAP system. SAP Standard Application benchmarks have shown a strong correlation between clock speed (GHz) and SCU on the same processor architecture.

On some Intel servers disabling Hyperthreading may increase SCU, thereby improving the throughput of a single work process (for example, a batch job) but decreasing the total aggregate throughput of the entire server. Disabling Hyperthreading in the server BIOS may increase performance of a single transaction or report if the bottleneck is within the SAP application server. If the bottleneck is within the DBMS, this will not improve performance and may make performance worse. The exact performance increase per thread is dependent on factors beyond the scope of this note. Please contact Intel for further information on Hyperthreading and performance.

SAP benchmarks on Windows Intel systems have shown higher SCU on higher clock speed processors. 2 socket servers have significantly higher SCU than 4 or 8 socket servers. Benchmarks show 8 socket Intel servers have 55% lower SAPS/thread than 2 socket as at September 2013. Some SAP components and some specific SAP processes (see note 1501701) are particularly sensitive to SCU performance.
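As a minimal sketch of the per-thread arithmetic behind SCU, the snippet below divides total SAPS by the number of threads, using the same hypothetical figures as the Hyperthreading examples in this section. The function name `scu_saps` is an illustrative assumption and is not part of any SAP tool.

```python
def scu_saps(total_saps: float, threads: int) -> float:
    """Return SAPS per thread, i.e. the SCU figure used in this note."""
    if threads <= 0:
        raise ValueError("threads must be a positive integer")
    return total_saps / threads


# Hypothetical figures taken from the note's examples.
ht_on = scu_saps(32_000, 32)    # Hyperthreading ON: 16 cores / 32 threads
ht_off = scu_saps(22_000, 16)   # Hyperthreading OFF: 16 cores / 16 threads

print(f"Hyperthreading ON : {ht_on:.0f} SAPS per thread")
print(f"Hyperthreading OFF: {ht_off:.0f} SAPS per thread")
```

With these inputs the per-thread figure rises when Hyperthreading is off while the server total falls, which is the trade-off described above.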
Hypothetical examples below show how to calculate SCU performance (which corresponds to per thread performance on Windows):

Example - Intel Server with Hyperthreading ON:
SAPS = 32,000
H/W configuration = Intel E5 2 processors / 16 cores / 32 threads
SCU = 32,000 / 32 threads = 1,000 SCU SAPS

Same Intel server with Hyperthreading OFF:
SAPS = 22,000
H/W configuration = Intel E5 2 processors / 16 cores / 16 threads
SCU = 22,000 / 16 threads = 1,375 SCU SAPS

Windows Power Saving features can lower the clock speed when the CPU is idle. Hardware vendors and Microsoft can provide more information on the optimal energy/performance configuration.

Additional information about Hyperthreading and SCU on Virtualized systems can be found in note 1246467 - Hyper-V Configuration Guideline and note 1056052 - Windows: VMware vSphere configuration guidelines. Microsoft and VMware provide additional whitepapers and blogs on this topic.

1.2. Multi-core

When talking about performance we need to distinguish between two things. The first is aggregated performance expressed as throughput, such as Sales Orders per hour or payroll calculations per hour, which can be executed on a given hardware. Performance is often associated with the time it takes to run a specific business process, such as a payroll calculation. Aggregated throughput performance delivered by a single server can be read out of SAP benchmarks and the associated SAPS number.

On the other side, performance is often associated with the time it takes to calculate an elementary operation of one payroll calculation or, on the database side, the time it takes to execute, for example, a single lookup of a row in a table. Information about the time it takes to execute an elementary operation is expressed more by the SCU as introduced above.

When upgrading hardware it is important to evaluate both the total aggregate throughput (total SAPS) and the SAPS per thread. On the SAP application side, the scale-out provided by the SAP application layer allows high flexibility to leverage hardware which provides a high SCU. On the DBMS side, the focus in selecting a DBMS server is more often on the aggregate throughput performance and the ability to execute as many requests as possible in parallel.

1.3. NUMA

Non-Uniform Memory Access (NUMA) directly impacts the performance of SAP ABAP application servers. Local memory access times are very fast on NUMA based systems because the memory controller is directly connected to one processor. Remote memory access is many times slower than local.

The SAP Kernel for Windows is single threaded and does not contain NUMA handling logic to localize memory storage for a specific process to a specific NUMA node. Performance will therefore be maximized on high clock speed processors with the least number of NUMA nodes. These conditions are both met on 2 socket commodity Intel systems.

The calculation of local versus remote memory access for SAP application instances is a simple mathematical formula:
2 socket = 50% chance of a local NUMA node access
4 socket = 25% chance of a local NUMA node access
8 socket = 12.5% chance of a local NUMA node access

2 socket Intel commodity servers have a higher clock speed and better NUMA characteristics and are therefore suitable for SAP application servers. Excessive remote memory accesses on 8 socket or higher servers running SAP ABAP instances will adversely impact performance. This can occur with or without virtualization. Virtualization software does not prevent NUMA induced latencies nor change the physical structure of the processor/memory layout. Modern virtualization software may avoid remote memory communication if a Virtual Machine is equal to or smaller than the resources of one NUMA node.

RDBMS software from Microsoft, IBM, Oracle and SAP are all designed NUMA aware as of current releases. NUMA aware RDBMS software will attempt to keep memory structures local and avoid remote memory access. Modern DBMS software has demonstrated very good scalability on 8 socket or higher Intel servers.

1.4. Large Physical Memory

Windows Zero Memory Management is generally recommended and is documented in note 88416.

Increasing the number of work processes (beyond about 50) and users on a single instance may not linearly improve throughput. SAP generally recommends against huge ABAP or Java instances as documented in note 9942. For example, three ABAP instances each with 50 work processes have shown much better performance than one ABAP instance with 150 work processes.

A 2 socket server is very powerful and a single instance with around 50 work processes is unlikely to leverage the CPU power of the H/W. Performance will be somewhat less than the H/W capability. Installing multiple ABAP or Java instances on a single physical server will allow the H/W resources to be fully leveraged.

Solution: install multiple smaller ABAP instances per physical server, balance workload with SAP Logon Load Balancing and keep the instance configuration identical by setting most parameters in the default.pfl

In general use Windows Zero Administration Memory Management (ZAMM). Remove the profile parameters listed in note 88416 and set only the PHYS_MEMSIZE. ZAMM parameters will be automatically calculated correctly based on the value for PHYS_MEMSIZE.

Suggested Profile Parameters for ABAP instances sharing the same H/W and operating system:
PHYS_MEMSIZE: physical RAM / number of instances + small amount for operating system
em/max_size_MB: ZAMM default = 1.5 x PHYS_MEMSIZE*
abap/heap_area_dia: 2GB (2000000000) or slightly higher
abap/heap_area_total: ZAMM default = PHYS_MEMSIZE
abap/heap_area_nondia: 0 (up to max value of abap/heap_area_total)
*As of 720_EXT downwards compatible kernel patch 315 or higher

The attached PDF file contains sample configurations.

1.5. Processor Groups (K-Groups)

Windows 2008 R2 and higher introduced a concept called “Processor Groups”. Processor groups are required to address > 64 threads. Processor Groups are required on most 4 socket servers (4 socket * 10 core * hyperthreading = 80 threads). Applications and DBMS software must be Processor Group aware, otherwise the maximum number of threads the application or DBMS can address is limited to 64. See SAP Note 1635387 - Windows Processor Groups. See section 2 of this note for further information.

Current status (August 2012):
1. SAP Kernel = no automatic processor group handling, see note 1635387
2. SQL Server 2008 R2 and higher = processor group aware
3. Oracle 11g = processor group support planned with patch 11.2.0.2

4. Other DBMS = check with DBMS vendor for support status (MaxDB/Livecache, DB2, Sybase etc)

1.6. Performance Bottlenecks

1.6.1. Memory

SAP and DBMS performance testing and customer deployments have shown that RAM is a determining factor in scalability. A modern Intel or AMD system with insufficient memory will be unable to run efficiently or achieve peak throughput. SAP benchmarks provide an indication of the appropriate amount of RAM for a particular hardware configuration. Customers should use the H/W configurations published on the SAP benchmark website as guidance for how much RAM to specify. SAP Quicksizer also provides some guidance.

Most hardware vendors are benchmarking new 2 processor servers with 256GB RAM. Given the low cost of RAM it is recommended to follow this configuration. As of September 2013 the minimum RAM for a 2 socket Intel server should be around 256GB.

OS Limitations:
Windows 2012 R2 supports up to 640 threads* and 4TB RAM
Windows 2012 supports up to 640 threads* and 4TB RAM
Windows 2008 R2 supports up to 256 threads and 2TB RAM
Windows 2008 supports 64 threads and 2TB RAM
Hyper-V 3.0 (Windows 2012 and Windows Server 2012 R2) supports 64 vCPU & 1TB RAM per Virtual Machine
Hyper-V 2.0 (Windows 2008 R2) supports 4 vCPU & 64GB RAM per Virtual Machine
VMware vSphere 4.x supports 8 vCPU and 255GB RAM per Virtual Machine
VMware vSphere 5.0 supports 32 vCPU and 1TB RAM per Virtual Machine
VMware vSphere 5.1 supports 64 vCPU and 1TB RAM per Virtual Machine
VMware vSphere 5.5 supports 64 vCPU and 1TB RAM per Virtual Machine
*thread = Sockets x cores per processor x Hyperthreading. 4 socket x 10 core Intel server with Hyperthreading = 80 threads (e.g. HP DL580G7, Dell R910)

1.6.2. Network

SAP 3 tier configurations require a very high performance, low latency and 100% reliable network connection between the SAP application server(s), the message server and the database.

Large or busy systems strongly benefit from:
1. 10 Gigabit network
2. A separate network for SAP application servers to communicate with the RDBMS
3. TCPIP v4 and v6 offload and Receive Side Scaling have been tested by Microsoft, HP and other vendors. Contact Microsoft and/or H/W vendor for recommended NIC and drivers
4. TCPIP Loopback (127.0.0.1) communication is single threaded and is unable to be distributed over multiple threads with technologies such as RSS, Offload, SR-IOV, VM-FEX and parallelism features built into modern network cards and drivers. Some RDBMS and SAP instances may attempt to use loopback rather than shared memory by default
5. The attached PDF file contains links with additional information about network topologies and configuration for Intel systems

1.6.3. Insufficient IO Performance

IO can be a significant performance bottleneck. Common causes are insufficient LUNs, insufficient HBAs, incorrectly configured MPIO software, and one LUN presented to the Hypervisor partitioned into multiple drive letters. FusionIO Solid State Disks (SSD) and other forms of SSD are increasingly common on Windows Database servers. DBMS transaction logs benefit from very high performance SSD disks. SSD disks are of no benefit to SAP application servers. Microsoft and SAN vendors can provide additional information on optimal IO configurations.

1.7. Energy Consumption

Customers are encouraged to compare the energy consumption of different H/W configurations. In most cases it is observed that 8 socket systems use proportionately more energy than 2 socket systems.

2. Summary of Physical Hardware Configurations

SAP benchmarks show several clear trends:
1. Total SAPS on Intel servers has increased significantly in recent years
2. A substantial increase in SAPS per core on 2 socket Intel servers and a somewhat lesser increase on 4 socket and 8 socket Intel servers
3. A significant but more moderate increase in SAPS per CPU thread. Increase in SAPS per CPU thread (SCU) is most significant on 2 socket Intel servers
4. Total number of cores and threads has increased dramatically. Servers with 12 to 80 cores and 24 to 160 threads are available from most H/W vendors as at 2012.

Balanced configurations tested and deployed at customer sites:
1. 2 socket Intel or AMD = > 32,000-56,000 SAPS, 256-384GB RAM, 10G network and 2 x Dual Port HBA
2. 4 socket Intel or AMD = > 62,000-75,000 SAPS, 512GB-1TB RAM, 10G network and 2-4 Dual Port HBA
3. 8 socket Intel = 130,000-140,000 SAPS, 1TB RAM or more, 10G network and 4-8 Dual Port HBA

SAP ABAP & Java servers and DBMS software will perform well on configuration #1. #1 has generally demonstrated best performance with simple configuration and tuning for most SAP applications relative to #2 and #3. #2 may also be possible for ABAP & Java servers, though performance will not be as good as expected and additional configuration is required. Configuration #3 requires special expert configuration and tuning to run SAP application servers or DBMS together with SAP application servers (with or without virtualization). The SAP ASCS/SCS is not a full application server and can run without problems on configuration #1, #2 or #3.
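The socket arithmetic used in section 1 of this note can be sketched in a few lines. The snippet applies the footnote formula (threads = sockets x cores per processor x Hyperthreading), checks the result against the 64 thread limit that makes Processor Groups necessary, and computes the 1/sockets chance of a local NUMA access. The function names are hypothetical, and the factor of 2 threads per core with Hyperthreading enabled is an assumption that matches the note's 4 socket x 10 core = 80 threads example.

```python
# 64 is the thread limit stated in this note for software that is not
# Processor Group aware.
SINGLE_GROUP_THREAD_LIMIT = 64


def logical_threads(sockets: int, cores_per_socket: int, hyperthreading: bool) -> int:
    """threads = sockets x cores per processor x Hyperthreading (assumed 2 per core)."""
    return sockets * cores_per_socket * (2 if hyperthreading else 1)


def needs_processor_groups(threads: int) -> bool:
    """True when the thread count exceeds what one processor group can address."""
    return threads > SINGLE_GROUP_THREAD_LIMIT


def local_numa_access_chance(sockets: int) -> float:
    """Chance of a local NUMA node access for a non NUMA aware process: 1 / sockets."""
    return 1.0 / sockets


for sockets in (2, 4, 8):
    threads = logical_threads(sockets, 10, hyperthreading=True)
    print(
        f"{sockets} socket x 10 core: {threads} threads, "
        f"processor groups needed: {needs_processor_groups(threads)}, "
        f"local access chance: {local_numa_access_chance(sockets):.1%}"
    )
```

The 10 cores per socket in the loop is only an example value; substitute the core count of the server being evaluated.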

Configurations #1, #2 and #3 are suitable for modern DBMS software and will deliver nearly linear scalability with addition of CPU sockets.

2 socket servers have a high clock speed, high SAPS per thread/SCU, efficient energy consumption and demonstrate good NUMA characteristics. The SAP application server layer should be scaled out horizontally on 2 socket commodity servers or Virtual Machines. SAP application server installation, configuration and tuning on 2 socket servers is simple and largely automatic. SAP application servers typically consume 70%+ of the overall CPU resources of most customer systems.

Note: on average 10-30% of the total SAPS resources is consumed by the DBMS layer. If SAP sizing indicates additional capacity in excess of a 2 socket server (currently 32,000 - 56,000 SAPS) is required for the DBMS layer, if additional availability & reliability features are required or if many databases are consolidated onto a single server/cluster, then select 4 socket or higher servers.

8 socket or higher servers offer excellent performance, reliability and scalability for DBMS software (or other software that is NUMA aware). DBMS software running on 2, 4 and 8 socket servers with large amounts of memory will achieve very good scalability and performance without the need for complex configurations and tuning. Standard readily available documentation is sufficient to deploy DBMS software on large 8 socket or higher systems. Typically there would be no need to engage expert consulting to install, configure and tune DBMS software on 8 socket servers. Hardware vendors will often provide a deployment guide for each specific DBMS. It is generally recommended to obtain the latest “Best Practices” deployment guides from the relevant hardware vendor. Provided only DBMS and (A)SCS software is installed, the SAP Support procedures for 2, 4 or 8 socket servers are the same. The PDF file attached to this note demonstrates examples.

3. Configuration of SAP ABAP Server on 4 socket & 8 or higher socket servers

3.1. Guidelines for running SAP ABAP server on 4 socket systems

4 socket servers can be configured to run SAP application servers if 2 socket servers are unavailable. Additional configuration and tuning is required.

Recommended Configuration Steps:
1. If possible disable HyperThreading to reduce the total threads to below 64. All threads will be in one K-Group.
2. In the case of still having enabled Hyperthreading AND having more than 64 CPUs, implement Microsoft KB 2510206 as per note 1635387. This will force creation of evenly sized K-Groups/Processor Groups by the Windows OS.
3. Implement NUMA affinity as detailed in SAP note 1667863
4. Determine the amount of local memory per NUMA node(s) and size the SAP instance accordingly
5. If Virtualization is configured on 4 socket systems please consult the Hypervisor vendor for further information, guidance and best practices regarding the configuration of non-NUMA aware applications on VMs

3.2. Guidelines for running SAP ABAP server on 8+ socket systems

Complex configuration and tuning is required to achieve good, stable and predictable performance on SAP application servers or DBMS together with SAP application servers (with or without virtualization) on 8 socket or higher servers. Knowledge of modern Processor technologies, NUMA, K-Groups and SAP profile parameters is required.

SAP is unable to provide generalized documentation regarding 8 socket or higher configurations because:
1. Some hardware architectures only provide 4 QPI/HyperTransport links. H/W configurations with > 4 sockets require specialized Hubs/Node controllers. The implementation of > 4 socket servers differs significantly between the various hardware vendors
2. Disabling hyperthreading is often insufficient to reduce the number of threads below 64, therefore K-Group configuration is generally required
3. Placement of PCI HBA, NIC or SSD cards into an inappropriate PCI slot can have a dramatic impact on performance on some 8 socket systems
4. The impact of device drivers, some backup software and some Anti-Virus software that was not designed for K-Groups, 8 socket servers with OEM designed Hubs/Node controllers and NUMA architectures is likely to be pronounced and significant
5. Remote memory accesses are vastly more probable on 8 socket systems
6. Total physical memory will be (most often evenly) distributed over 8 sockets which may lead to very little local memory per NUMA node

Hardware vendors are responsible for the specification, implementation, configuration and performance support of SAP application servers on 8 socket servers. Configuration, tuning and performance support of SAP application servers on 8 socket servers requires a Consulting engagement. Poor SAP application server performance on 8 socket servers should be referred to the hardware vendor.

4. Virtualization

4.1. Virtual platforms supported for Windows

Windows Hyper-V and VMware vSphere are both supported for SAP and documented in note 1409608

4.2. Virtual CPU (vCPU)

Hyper-V and VMware vSphere both map each individual vCPU to one physical core/thread. Hypervisors will try to run all vCPU on the same physical processor. This is only possible if the number of vCPU is equal to or less than the number of cores on a physical processor. A server with 2 processors each with 8 cores would be able to run 8 vCPU on a single processor. If the number of vCPU was increased to 12, the Hypervisor will run the VM across both processors. Performance will be significantly reduced.

Hypervisors can automatically “relocate” a VM from a busy processor socket to another processor that is not so busy. The process of moving a VM from one NUMA node to another will eventually require copying the entire memory context across the QPI or Hyper-Transport (AMD) links. Frequent VM relocations are likely to impact overall system performance and impact the predictability of performance (sometimes a VM will run slowly then after a migration to another NUMA node run fast).

4.3. Virtual RAM + Virtual NUMA (vRAM)

Hypervisors allocate vRAM to physical RAM. SAP systems should not “overcommit”, meaning the vRAM should be equal to or less than physical RAM for Production systems.
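The sizing rule described in sections 4.2 and 4.3 can be expressed as a short check: a Virtual Machine stays on one NUMA node only if its vCPU count does not exceed the cores of one processor and its vRAM does not exceed the memory local to that processor. The sketch below is illustrative only. The `Host` class and `fits_one_numa_node` function are hypothetical names, and the even split of RAM across sockets is an assumption (the note says memory is most often evenly distributed).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Physical server layout; RAM is assumed to be spread evenly over the sockets."""
    sockets: int
    cores_per_socket: int
    total_ram_gb: float

    @property
    def local_ram_gb(self) -> float:
        """Memory directly attached to one processor (one NUMA node)."""
        return self.total_ram_gb / self.sockets

    @property
    def remote_ram_gb(self) -> float:
        """Memory attached to the other processors."""
        return self.total_ram_gb - self.local_ram_gb


def fits_one_numa_node(host: Host, vcpu: int, vram_gb: float) -> bool:
    """True if the VM can run without remote NUMA memory access."""
    return vcpu <= host.cores_per_socket and vram_gb <= host.local_ram_gb


for sockets in (2, 4, 8):
    host = Host(sockets=sockets, cores_per_socket=8, total_ram_gb=128)
    print(
        f"{sockets} socket, 128GB: {host.local_ram_gb:.0f}GB local + "
        f"{host.remote_ram_gb:.0f}GB remote per processor; "
        f"8 vCPU / 32GB VM fits one node: {fits_one_numa_node(host, 8, 32)}"
    )
```

The 8 vCPU / 32GB Virtual Machine in the loop is an example workload, not a recommendation; the same check applies to any planned VM size.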

If vRAM is larger than the physical RAM connected to one NUMA node or if the number of vCPU exceeds the number of cores on a single processor, the Virtual Machine will be performing Remote NUMA memory access (which is many times slower than Local access).

The amount of Local NUMA memory (therefore the maximum vRAM before remote access occurs) is a function of Total RAM and number of processors.
1. 2 socket with 8 cores each and 128GB RAM. Each processor has 8 cores and 64GB local memory directly connected to one processor + 64GB remote memory.
2. 4 socket with 8 cores each and 128GB RAM. Each processor has 8 cores and 32GB local memory directly connected to one processor + 96GB remote memory.
3. 8 socket with 8 cores each and 128GB RAM. Each processor has 8 cores and 16GB local memory directly connected to one processor + 112GB remote memory.

Hyper-V 2.0 and VMware vSphere 4.x did not provide NUMA information to the Virtual Machines. RDBMS software performance was therefore significantly decreased if vRAM > local NUMA memory or vCPU > cores on one single processor. Hyper-V 3.0 and VMware vSphere 5.0 do provide NUMA topology information to the Virtual Machine. Both Hyper-V 3.0 and VMware 5.0 greatly improve the alignment between VMs, the NUMA node, the vCPU and local memory.

Virtualization software does not prevent NUMA induced latencies nor change the physical structure of the processor/memory layout. Modern Virtualization software may avoid remote memory communication if a Virtual Machine is equal to or smaller than the resources of one NUMA node. Virtualization vendors provide additional documentation and recommendations on NUMA configurations and best practices.

2 socket servers with large amounts of physical memory (up to 768GB as of 2012) have shown consistent and predictable results running virtual workloads. The configuration and operation of 2 socket servers with large amounts of RAM is relatively simple. Partitioning large 4 socket or 8 socket servers into many Virtual Machines is unlikely to achieve good predictable and stable performance without expert knowledge and configuration.

Author: Cameron Gardiner, Microsoft Corporation
Contact Person for questions and comments on this article: cgardin@microsoft.com
Reviewer: Karl-Heinz Hochmuth, SAP AG; Bernd Lober, SAP AG; Erik Rieger, Kasim Hansia, Michael Webster, VMware Global Inc.; Peter Simon, SAP AG; Jürgen Thomas, Microsoft Corporation

Keywords
Intel, AMD, x64, 64 bit, Wintel, NUMA, QPI, Hyperthreading, single threaded, per thread performance, local memory, remote memory, Zero Memory Management, ZMM, PHYS_MEMSIZE, em/max_size_MB, abap/heap_area, vCPU, vRAM, Multi-SID, consolidation, sizing, energy

Header Data
Released On: 14.10.2013 13:44:17
Release Status: Released to Customer
Component: BC-OP-NT Windows
Priority: Normal
Category: How To
Operating System: WIN 2008 R2

Product
This document is not restricted to a product or product version

Attachments
File Name: HW deployment DIAGRAMS PDF FILE .version 6 4.pdf
File Size (KB): 276
Mime Type: application/pdf
