1612283 - Hardware Configuration Standards and Guidance | Version: 6 | Validity: 11.10.2013 - active | Language: English
Information is required on the correct and efficient specification and configuration of Intel/AMD x64 hardware running Windows for SAP ABAP and SAP Java application server environments. This note also discusses SAP on Windows in virtualized environments.
x86 based hardware (from either Intel or AMD) has evolved rapidly in recent years. Many new technologies and features in Windows and Intel H/W platforms (hereafter called "Intel") directly impact the optimal configurations for SAP systems. SAP ABAP and SAP Java application servers should be deployed after reviewing the recommendations in this note. The configurations in this SAP Note have been tested and proven by SAP, Microsoft and hardware vendors in lab tests, benchmarks and customer deployments. More information on SAP standard benchmarks and the term "SAPS" can be found at http://www.sap.com/benchmark/
Prior to purchasing new hardware, and when installing and configuring SAP on Windows in physical or virtual environments, follow the deployment guidelines in the PDF file attached to this SAP Note.
SAP server throughput (as measured by SAPS) has increased significantly on Intel based server hardware in recent years. Intel/AMD and OEM hardware manufacturers achieved these performance increases by introducing many new technologies and concepts. SAP applications require appropriate hardware configurations and parameterization to achieve the performance and throughput increases demonstrated in the SAP Standard Application benchmarks. Inappropriately configured Intel systems can suffer significant performance problems, unpredictable (sometimes slow) performance, or throughput well below the levels demonstrated in the SAP Standard Application benchmarks. Provided the concepts and configurations documented in this note are followed, these problems should not occur.
1. Overview of Modern Intel Server Technologies
1.1. Clock Speed

All SAP work processes other than the Message Server and Enqueue Server execute their logic within a single thread. The performance of batch jobs in particular, and of other work process types in general, is largely determined by the latency of database requests and by the time a SAP work process spends running on a single Windows CPU thread. SCU (Single Computing Unit – note 1501701) is the SAP specific terminology for describing per-thread throughput, and it is very important in determining the performance of a SAP system. SAP Standard Application benchmarks have shown a strong correlation between clock speed (GHz) and SCU on the same processor architecture.

On some Intel servers disabling Hyperthreading may increase SCU, thereby improving the throughput of a single work process (for example, a batch job) but decreasing the total aggregate throughput of the entire server. Disabling Hyperthreading in the server BIOS may increase the performance of a single transaction or report if the bottleneck is within the SAP application server; if the bottleneck is within the DBMS, this will not improve performance and may make it worse. The exact performance increase per thread depends on factors beyond the scope of this note. Please contact Intel for further information on Hyperthreading and performance.

SAP benchmarks on Windows Intel systems have shown higher SCU on higher clock speed processors. 2 socket servers have significantly higher SCU than 4 or 8 socket servers: benchmarks show 8 socket Intel servers have 55% lower SAPS/thread than 2 socket servers as at September 2013. Some SAP components and some specific SAP processes (see note 1501701) are particularly sensitive to SCU performance.
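The per-thread (SCU) arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not SAP tooling; the figures are the hypothetical benchmark numbers used as examples in this note:

```python
# Hedged illustration: SCU is derived from published benchmark figures
# as total SAPS divided by the number of hardware threads.
def scu(saps: int, threads: int) -> float:
    """Per-thread throughput (SCU SAPS) = total SAPS / CPU threads."""
    return saps / threads

# Hypothetical 2-processor Intel E5 server with 16 cores (from this note):
with_ht = scu(32_000, 32)      # Hyperthreading ON: 32 threads
without_ht = scu(22_000, 16)   # Hyperthreading OFF: 16 threads

print(with_ht)     # 1000.0 SCU SAPS per thread
print(without_ht)  # 1375.0 SCU SAPS per thread
# Disabling Hyperthreading raises per-thread SCU here,
# but lowers the server's total aggregate SAPS.
```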
Hypothetical examples below show how to calculate SCU performance (which corresponds to per thread performance on Windows):

Example – Intel server with Hyperthreading ON:
SAPS = 32,000
H/W configuration = Intel E5, 2 processors / 16 cores / 32 threads
SCU = 32,000 / 32 threads = 1,000 SCU SAPS

Same Intel server with Hyperthreading OFF:
SAPS = 22,000
H/W configuration = Intel E5, 2 processors / 16 cores / 16 threads
SCU = 22,000 / 16 threads = 1,375 SCU SAPS

Windows Power Saving features can lower the clock speed when the CPU is idle. Hardware vendors and Microsoft can provide more information on the optimal energy/performance configuration.

Additional information about Hyperthreading and SCU on virtualized systems can be found in note 1246467 - Hyper-V Configuration Guideline and note 1056052 - Windows: VMware vSphere configuration guidelines. Microsoft and VMware provide additional whitepapers and blogs on this topic.

1.2. Multi-core

When talking about performance we need to distinguish between aggregated performance, expressed as throughput (for example, Sales Orders per hour or payroll calculations per hour), and the time it takes to execute an elementary operation, such as one payroll calculation or, on the database side, a single lookup of a row in a table. Aggregated throughput performance delivered by a single server can be read from SAP benchmarks and the associated SAPS number. The time it takes to execute an elementary operation is expressed more by the SCU as introduced above.

On the SAP application side, the scale-out provided by the SAP application layer allows high flexibility to leverage hardware which provides a high SCU. On the DBMS side, the focus in selecting a server is often more on the aggregate throughput performance and the ability to execute as many requests as possible in parallel. When upgrading hardware it is important to evaluate both the total aggregate throughput (total SAPS) and the SAPS per thread.

A 2 socket server is very powerful, and a single instance with around 50 work processes is unlikely to leverage the full CPU power of the H/W. Increasing the number of work processes (beyond about 50) and users on a single instance may not linearly improve throughput, and SAP generally recommends against huge ABAP or Java instances, as documented in note 9942. Installing multiple ABAP or Java instances on a single physical server will allow the H/W resources to be fully leveraged: balance the workload with SAP Logon Load Balancing and keep the instance configurations identical by setting most parameters in the default.pfl. For example, three ABAP instances each with 50 work processes have shown much better performance than one ABAP instance with 150 work processes.

1.3. NUMA

Non-Uniform Memory Access (NUMA) directly impacts the performance of SAP ABAP application servers. Local memory access times are very fast on NUMA based systems because the memory controller is directly connected to one processor; remote memory access is many times slower than local. The SAP Kernel for Windows is single threaded and does not contain NUMA handling logic to localize memory storage for a specific process to a specific NUMA node.

The chance of local versus remote memory access for SAP application instances follows a simple mathematical formula:
2 socket = 50% chance of a local NUMA node access
4 socket = 25% chance of a local NUMA node access
8 socket = 12.5% chance of a local NUMA node access

Excessive remote memory accesses on 8 socket or higher servers running SAP ABAP instances will adversely impact performance. This can occur with or without virtualization: virtualization software does not prevent NUMA induced latencies nor change the physical structure of the processor/memory layout, although modern virtualization software may avoid remote memory communication if a Virtual Machine is equal to or smaller than the resources of one NUMA node.

Performance will therefore be maximized on high clock speed processors with the least number of NUMA nodes. These conditions are both met on 2 socket commodity Intel systems, which have a higher clock speed and better NUMA characteristics and are therefore well suited for SAP application servers.

NUMA aware RDBMS software will attempt to keep memory structures local and avoid remote memory access. RDBMS software from Microsoft, IBM, Oracle and SAP is NUMA aware as of current releases, and modern DBMS software has demonstrated very good scalability on 8 socket or higher Intel servers.

1.4. Processor Groups (K-Groups)

Windows 2008 R2 and higher introduced a concept called "Processor Groups" – see SAP Note 1635387 - Windows Processor Groups. Processor Groups are required to address more than 64 threads and are therefore required on most 4 socket servers (4 sockets * 10 cores * Hyperthreading = 80 threads). Applications and DBMS software must be Processor Group aware, otherwise the maximum number of threads the application or DBMS can address is limited to 64.

Current status (August 2012):
1. SAP Kernel = no automatic processor group handling – see note 1635387
2. SQL Server 2008 R2 and higher = processor group aware
3. Oracle 11g = processor group support planned in a future patch
4. Other DBMS = check with the DBMS vendor for support status (MaxDB/liveCache, DB2, Sybase etc.)

See section 2 of this note for further information.

1.5. Large Physical Memory

Windows Zero Administration Memory Management (ZAMM) is generally recommended and is documented in note 88416. Remove the profile parameters listed in note 88416 and set only PHYS_MEMSIZE; the ZAMM parameters will then be automatically calculated correctly based on the value of PHYS_MEMSIZE.

Suggested profile parameters for ABAP instances sharing the same H/W and operating system:
PHYS_MEMSIZE = physical RAM / number of instances + small amount for the operating system
em/max_size_MB = ZAMM default = 1.5 x PHYS_MEMSIZE*
abap/heap_area_dia = 2GB (2000000000) or slightly higher
abap/heap_area_total = ZAMM default = PHYS_MEMSIZE
abap/heap_area_nondia = 0 (up to max value of abap/heap_area_total)
*As of 720_EXT downwards compatible kernel patch 315 or higher

The attached PDF file contains sample configurations.
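The suggested per-instance profile parameters for servers running multiple ABAP instances can be derived mechanically from the physical RAM. The sketch below is an illustration only (not an SAP tool); the 1024 MB operating-system reserve and the example RAM size are assumptions chosen for demonstration:

```python
# Illustrative derivation of suggested ABAP profile parameters for
# N instances sharing one server. Sizes are in MB except where noted.
# The 1024 MB operating-system reserve is an assumption, not an SAP value.
def suggested_parameters(physical_ram_mb: int, instances: int,
                         os_reserve_mb: int = 1024) -> dict:
    phys_memsize = (physical_ram_mb - os_reserve_mb) // instances
    return {
        "PHYS_MEMSIZE": phys_memsize,
        # ZAMM default as of 720_EXT kernel patch 315 or higher
        "em/max_size_MB": int(phys_memsize * 1.5),
        "abap/heap_area_dia": 2_000_000_000,   # ~2 GB, in bytes, or slightly higher
        "abap/heap_area_total": phys_memsize,  # ZAMM default = PHYS_MEMSIZE
        "abap/heap_area_nondia": 0,            # up to abap/heap_area_total
    }

# Example: 3 ABAP instances on a 256 GB (262144 MB) server
params = suggested_parameters(262_144, 3)
print(params["PHYS_MEMSIZE"])  # 87040 MB per instance
```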
6.g. Large or busy systems strongly benefit from: 1. DBMS transaction logs benefit from very high performance SSD disks. Summary of Physical Hardware Configurations SAP benchmarks show several clear trends: 1.6. #2 or #3 . #1 has generally demonstrated best performance with simple configuration and tuning for most SAP applications relative to #2 and #3.000-75. SAPQuicksizer also provides some guidance.x supports 8 vCPU and 255GB RAM per Virtual Machine VMware vSphere 5. Servers with 12 to 80 core and 24 to 160 threads are available from most H/W vendors as at 2012. SR-IOV. 10 Gigabit network 2. Total number of cores and threads has increased dramatically. Customers should use the H/W configurations published on the SAP benchmark website as guidance for how much RAM to specify. The SAP ASCS/SCS is not a full application server and can run without problems on configuration #1. Common causes are insufficient LUNs.0 (Windows 2008 R2) supports 4 vCPU & 64GB RAM per Virtual Machine VMware vSphere4.000 SAPS 256-384GB RAM. As of September 2013 the minimum RAM for a 2 socket Intel server should be around 256GB.2. Sybase etc) 1.5 supports 64 vCPU and 1TB RAM per Virtual Machine *thread = Sockets x cores per processor x Hyperthreading. 4 socket x 10 core Intel server with Hyperthreading = 80 threads (e. Contact Microsoft and/or H/W vendor for recommended NIC and drivers 4. Microsoft and SAN vendors can provide additional information on optimal IO configurations. one LUN presented to Hyper-Visor partitioned into multiple drive letters.000-140. SAP benchmarks provide an indication of the appropriate amount of RAM for a particular hardware configuration. Total SAPS on Intel servers has increased significantly in recent years 2. Given the low cost of RAM it is recommended to follow this configuration. 2.0 supports 32 vCPU and 1TB RAM per Virtual Machine VMware vSphere 5. A separate network for SAP application servers to communicate with the RDBMS 3. DB2. 
A significant but more moderate increase in SAPS per CPU thread.000 SAPS 512GB-1TB RAM. Other DBMS = check with DBMS vendor for support status (MaxDB/Livecache. 2 socket Intel or AMD = > 32. Insufficient IO Performance IO can be a significant performance bottleneck. Some RDBMS and SAP instances may attempt to use loopback rather than shared memory by default 5.1. incorrectly configured MPIO software. Energy Consumption Customers are encouraged to compare the energy consumption of different H/W configurations. Most Hardware vendors are benchmarking new 2 processor servers with 256GB RAM. SSD disks are of no benefit to SAP application servers. 8 socket Intel = 130. A substantial increase in SAPS per core on 2 socket Intel server and somewhat lesser increase on 4 socket and 8 socket Intel servers 3.000 SAPS 1TB RAM or more. FusionIO Solid State Disks (SSD) and other forms of SSD are increasingly common on Windows Database servers.6. the message server and the database. Offload. 10G network and 2-4 Dual Port HBA 3.7. 10G network and 2 x Dual Port HBA 2.4. The attached PDF file contains links with additional information about network topologies and configuration for Intel systems 1. #2 may also be possible for ABAP & Java servers. VM-FEX and parallelism features built into modern network cards and drivers.3. 4 socket Intel or AMD = > 62. Memory SAP andDBMS performance testing and customer deployments have shown that RAM is a determining factor in scalability. Configuration #3 requires special expert configuration and tuning to run SAP application servers or DBMS together with SAP application servers (with or without virtualization). A modern Intel or AMD system with insufficient memory will be unable to run efficiently or achieve peak throughput. 1. Dell R910) Balanced configurations tested and deployed at customer sites: 1. Performance Bottlenecks 1. TCPIP v4 and v6 offload and Receive Side Scaling have been tested by Microsoft. 
HP DL580G7.1 supports 64 vCPU and 1TB RAM per Virtual Machine VMware vSphere 5.0. low latency and 100% reliable network connection between the SAP application server(s). OS Limitations: Windows 2012 R2 supports up to 640 threads* and 4TB RAM Windows 2012 supports up to 640 threads* and 4TB RAM Windows 2008 R2 supports up to 256 threads and 2TB RAM Windows 2008 supports 64 threads and 2TB RAM Hyper-V 3. insufficient HBAs.1) communication is single threaded and is unable to be distributed over multiple threads with technologies such as RSS. though performance will not be as good as expected and additional configuration is required.0. TCPIP Loopback (127. Increase in SAPS per CPU thread (SCU) is most significant on 2 socket Intel servers 4. Network SAP 3 tier configurations require a very high performance. HP and other vendors. 1.0 (Windows 2012 and Windows Server 2012 R2) supports 64 vCPU & 1TB RAM per Virtual Machine Hyper-V 2. In most cases it is observed that 8 socket systems use proportionately more energy than 2 socket systems.000-56. 10G network and 4-8 Dual Port HBA SAP ABAP & Java servers and DBMS software will perform well on configuration #1.6.
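The balanced configurations listed above can serve as a rough lookup table during sizing. The sketch below is an illustration only, not an official sizing method (real sizing should use the SAP Quick Sizer and vendor guidance); it simply selects the smallest listed configuration whose quoted SAPS range covers a requirement:

```python
# Hedged sizing helper based on the balanced configurations quoted in
# this note (upper bounds of the quoted SAPS ranges). Illustrative only.
BALANCED_CONFIGS = [
    ("2 socket, 256-384GB RAM, 2 x Dual Port HBA", 56_000),
    ("4 socket, 512GB-1TB RAM, 2-4 Dual Port HBA", 75_000),
    ("8 socket, 1TB+ RAM, 4-8 Dual Port HBA", 140_000),
]

def smallest_config(required_saps: int) -> str:
    """Return the smallest balanced configuration covering the requirement."""
    for name, max_saps in BALANCED_CONFIGS:
        if required_saps <= max_saps:
            return name
    return "scale out across multiple servers"

print(smallest_config(40_000))   # falls within the 2 socket tier
print(smallest_config(100_000))  # falls within the 8 socket tier
```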
3. Configuration of SAP ABAP Server on 4 socket & 8 or higher socket servers

2 socket servers have a high clock speed, high SAPS per thread/SCU, efficient energy consumption and good NUMA characteristics. SAP application server installation, configuration and tuning on 2 socket servers is simple and largely automatic. The SAP application server layer should therefore be scaled out horizontally on 2 socket commodity servers or Virtual Machines. SAP application servers typically consume 70%+ of the overall CPU resources of most customer systems, while on average 10-30% of the total SAPS resources is consumed by the DBMS layer.

3.1. Guidelines for running SAP ABAP server on 4 socket systems

4 socket servers can be configured to run SAP application servers if 2 socket servers are unavailable. Additional configuration and tuning is required, along with knowledge of modern processor technologies, NUMA, K-Groups and SAP profile parameters. Disabling Hyperthreading alone is often insufficient to reduce the number of threads below 64, therefore K-Group configuration is generally required.

Recommended configuration steps:
1. If possible, disable Hyperthreading to reduce the total threads to below 64. All threads will then be in one K-Group.
2. If Hyperthreading remains enabled AND there are more than 64 CPUs, implement Microsoft KB 2510206 as per note 1635387. This will force creation of evenly sized K-Groups/Processor Groups by the Windows OS.
3. Implement NUMA affinity as detailed in SAP note 1667863.
4. Determine the amount of local memory per NUMA node(s) and size the SAP instance accordingly.

If virtualization is configured on 4 socket systems, please consult the Hypervisor vendor for further information.

3.2. Guidelines for running SAP ABAP server on 8+ socket systems

Complex configuration and tuning is required to achieve good, stable and predictable performance for SAP application servers, or DBMS together with SAP application servers (with or without virtualization), on 8 socket or higher servers. SAP is unable to provide generalized documentation regarding 8 socket or higher configurations because:
1. The implementation of > 4 socket servers differs significantly between the various hardware vendors
2. H/W configurations with > 4 sockets require specialized Hubs/Node controllers, and some hardware architectures only provide 4 QPI/HyperTransport links
3. Total physical memory will be (most often evenly) distributed over 8 sockets, which may lead to very little local memory per NUMA node
4. Placement of PCI HBA, NIC or SSD cards into an inappropriate PCI slot can have a dramatic impact on performance on some 8 socket systems
5. The impact of device drivers, some backup software and some Anti-Virus software not designed for K-Groups is likely to be pronounced and significant on 8 socket servers with OEM designed Hubs/Node controllers and NUMA architectures
6. Remote memory accesses are vastly more probable on 8 socket systems

Hardware vendors are responsible for the specification, implementation, configuration and performance support of SAP application servers on 8 socket servers, and configuration, tuning and performance support of SAP application servers on 8 socket servers requires a consulting engagement. Poor SAP application server performance on 8 socket servers should be referred to the hardware vendor. It is generally recommended to obtain the latest "Best Practices" deployment guides from the relevant hardware vendor.

3.3. DBMS software on 4 and 8 or higher socket servers

8 socket or higher servers offer excellent performance, reliability and scalability for DBMS software (or other software that is NUMA aware). DBMS software running on 2, 4 and 8 socket servers with large amounts of memory will achieve very good scalability and performance without the need for complex configurations and tuning. Standard, readily available documentation is sufficient to deploy DBMS software on large 8 socket or higher systems, and hardware vendors will often provide a deployment guide for each specific DBMS. Typically there is no need to engage expert consulting to install, configure and tune DBMS software on 8 socket servers. Provided only DBMS and (A)SCS software is installed, the SAP Support procedures for 2, 4 or 8 socket servers are the same.

Select 4 socket or higher servers if SAP sizing indicates that capacity in excess of a 2 socket server (currently 32,000-56,000 SAPS) is required for the DBMS layer, if additional availability & reliability features are required, or if many databases are consolidated onto a single server/cluster. Configurations #1, #2 and #3 are all suitable for modern DBMS software and will deliver nearly linear scalability with the addition of CPU sockets.

4. Virtualization

4.1. Virtual platforms supported for Windows

Windows Hyper-V and VMware vSphere are both supported for SAP and documented in note 1409608. Virtualization vendors provide additional documentation, guidance and best practices regarding the configuration of non-NUMA aware applications on VMs.

4.2. Virtual CPU (vCPU)

Hyper-V and VMware vSphere both map each individual vCPU to one physical core/thread. Hypervisors will try to run all vCPU of a VM on the same physical processor; this is only possible if the number of vCPU is equal to or less than the number of cores on a physical processor. A server with 2 processors, each with 8 cores, would be able to run an 8 vCPU VM on a single processor. If the number of vCPU was increased to 12, the Hypervisor would run the VM across both processors and performance would be significantly reduced.

Hypervisors can also automatically "relocate" a VM from a busy processor socket to another processor that is not so busy. Moving a VM from one NUMA node to another will eventually require copying the entire memory context across the QPI or HyperTransport (AMD) links. Frequent VM relocations are likely to impact overall system performance and the predictability of performance (sometimes a VM will run slowly and then, after a migration to another NUMA node, run fast).

4.3. Virtual RAM + Virtual NUMA (vRAM)

Hypervisors allocate vRAM to physical RAM. SAP systems should not "overcommit", meaning the vRAM should be equal to or less than the physical RAM for Production systems. The PDF file attached to this note demonstrates examples.
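The amount of local memory per NUMA node, which this note recommends determining before sizing an instance or VM, is simple arithmetic when RAM is evenly distributed across sockets (the common case). A hedged sketch, not vendor tooling:

```python
# Illustration of local vs. remote memory per NUMA node, assuming the
# (common) case that total RAM is distributed evenly across all sockets.
def local_memory_gb(total_ram_gb: int, sockets: int) -> int:
    """RAM directly connected to one processor (local NUMA memory)."""
    return total_ram_gb // sockets

def remote_memory_gb(total_ram_gb: int, sockets: int) -> int:
    """RAM reachable only via QPI/HyperTransport (remote NUMA memory)."""
    return total_ram_gb - local_memory_gb(total_ram_gb, sockets)

# The 128 GB examples used in this note:
for sockets in (2, 4, 8):
    local = local_memory_gb(128, sockets)
    remote = remote_memory_gb(128, sockets)
    print(f"{sockets} socket: {local}GB local + {remote}GB remote per processor")
# 2 socket: 64GB local + 64GB remote per processor
# 4 socket: 32GB local + 96GB remote per processor
# 8 socket: 16GB local + 112GB remote per processor
```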
If vRAM is larger than the physical RAM connected to one NUMA node, or if the number of vCPU exceeds the number of cores on a single processor, the Virtual Machine will be performing remote NUMA memory access (which is many times slower than local access). The amount of local NUMA memory (and therefore the maximum vRAM before remote access occurs) is a function of the total RAM and the number of processors:

2 socket with 8 cores each and 128GB RAM: each processor has 8 cores and 64GB local memory directly connected to one processor + 64GB remote memory.
4 socket with 8 cores each and 128GB RAM: each processor has 8 cores and 32GB local memory directly connected to one processor + 96GB remote memory.
8 socket with 8 cores each and 128GB RAM: each processor has 8 cores and 16GB local memory directly connected to one processor + 112GB remote memory.

Hyper-V 2.0 and VMware vSphere 4.x did not provide NUMA topology information to the Virtual Machines; RDBMS software performance was therefore significantly decreased if vRAM > local NUMA memory or vCPU > cores on one single processor. Both Hyper-V 3.0 and VMware vSphere 5.0 do provide NUMA topology information to the Virtual Machine and greatly improve the alignment between VMs, the NUMA node, the vCPU and local memory. Virtualization software does not prevent NUMA induced latencies nor change the physical structure of the processor/memory layout, but modern virtualization software may avoid remote memory communication if a Virtual Machine is equal to or smaller than the resources of one NUMA node. Virtualization vendors provide additional documentation and recommendations on NUMA configurations and best practices.

2 socket servers with large amounts of physical memory (up to 768GB as of 2012) have shown consistent and predictable results running virtual workloads, and the configuration and operation of 2 socket servers with large amounts of RAM is relatively simple. Partitioning large 4 socket or 8 socket servers into many Virtual Machines is unlikely to achieve good, predictable and stable performance without expert knowledge and configuration.

Author: Cameron Gardiner, Microsoft Corporation
Contact person for questions and comments on this article: cgardin@microsoft.com
Reviewers: Karl-Heinz Hochmuth (SAP AG), Bernd Lober (SAP AG), Erik Rieger (SAP AG), Peter Simon, Jürgen Thomas (Microsoft Corporation), Kasim Hansia, Michael Webster (VMware Global Inc.)

Keywords: Intel, AMD, x64, 64 bit, Wintel, NUMA, QPI, Hyperthreading, single threaded, per thread performance, local memory, remote memory, vCPU, vRAM, virtualization, consolidation, Multi-SID, sizing, Zero Memory Management, ZMM, PHYS_MEMSIZE, em/max_size_MB, abap/heap_area, energy

Header Data:
Released On: 14.10.2013 13:44:17
Release Status: Released to Customer
Component: BC-OP-NT (Windows)
Priority: Normal
Category: How To
Operating System: WIN 2008 R2
Product: This document is not restricted to a product or product version
Attachment: HW deployment DIAGRAMS PDF FILE.pdf (276 KB, application/pdf)