RFD tNavigator
• tNavigator
– Developed by the research and product development teams of Rock Flow Dynamics
– Designed for running dynamic reservoir simulations on engineers’ laptops, servers, and HPC clusters.
– Written in C++ and designed from the ground up to run parallel acceleration algorithms on multicore and manycore
shared and distributed memory computing systems.
– Employs the Qt graphical libraries, making the system truly multiplatform.
– By taking advantage of the latest computing technologies such as NUMA, Hyper-Threading, and MPI/SMP hybrids, tNavigator's performance far exceeds that of industry-standard dynamic simulation tools.
– License pricing does not depend on the number of cores employed in the shared-memory computing system.
• A distinctive feature is interactive user control of the simulation run
– Users can monitor every step of the reservoir simulation at runtime
– They can also interrupt the run and change the simulation's configuration with a mouse click.
Objectives
Test Cluster Configuration
• Dell PowerEdge R730 32-node (1024-core) “Thor” cluster
– Dual-socket 16-core Intel Xeon E5-2697A v4 @ 2.60 GHz CPUs (BIOS Power Management set to Maximum Performance)
– Memory: 64GB DDR4 2133 MHz per node; BIOS Memory Snoop Mode set to Home Snoop; Turbo enabled
• Mellanox SwitchX-2 SX6036 36-port 56Gb/s FDR InfiniBand / VPI Ethernet Switch
PowerEdge R730
Massive flexibility for data-intensive operations
• Performance and efficiency
– Intelligent hardware-driven systems management
with extensive power management features
– Innovative tools including automation for
parts replacement and lifecycle manageability
– Broad choice of networking technologies from GigE to IB
– Built-in redundancy with hot-plug, swappable PSUs, HDDs, and fans
• Benefits
– Designed for performance workloads
• From big data analytics and distributed storage to distributed computing where local storage is key, in classic HPC and large-scale hosting environments
• High-performance scale-out compute and low-cost dense storage in one package
• Hardware Capabilities
– Flexible compute platform with dense storage capacity
• 2S/2U server, 6 PCIe slots
– Large memory footprint (Up to 768GB / 24 DIMMs)
– High I/O performance and optional storage configurations
• HDD options: 12 x 3.5” – or – 24 x 2.5” + 2 x 2.5” HDDs in rear of server
• Up to 26 HDDs, with 2 hot-plug drives in rear of server for boot or scratch
RFD tNavigator Performance – Ethernet vs InfiniBand
[Chart: InfiniBand delivers 63%, 35%, and 72% higher performance than Ethernet in the tested cases; higher is better]
RFD tNavigator Performance – Processes Per Node
• Each tNavigator process spawns multiple OpenMP worker threads onto CPU cores
– Typical case: launch 1 process per node (PPN=1), which then spawns threads to utilize all cores
– We compare against 2 processes per node (PPN=2), where each process spawns threads only within its own CPU socket
– Up to 20% performance gain was seen with PPN=2
[Chart: PPN=2 delivers 13% and 20% gains over PPN=1; higher is better]
RFD tNavigator Performance – CPU Processor
• The “Broadwell” CPU provides more cores per socket than the “Haswell” family
– Haswell: the E5-2697 v3 has 14 cores per CPU, typically running at 2.6GHz
– Broadwell: the E5-2697A v4 has 16 cores per CPU, typically running at 2.6GHz
– The 14% additional CPU cores translate into a 14% increase in performance
[Chart: 14% performance gain for Broadwell over Haswell; higher is better]
RFD tNavigator Performance – File system
[Chart: file system comparison shows 29% and 195% performance differences; higher is better]
RFD tNavigator Performance – I/O
[Chart: 8% performance difference from I/O]
RFD tNavigator Profiling – % MPI Communications
• The majority of data transfer messages are medium-sized, with these exceptions:
– MPI_Allreduce has a large concentration (70% of MPI time, 16% of wall time) of small messages (8, 4, and 128 bytes)
– MPI_Bcast is concentrated at 4 bytes (13% MPI, 3% wall)
– MPI_Waitall calls are 0-byte calls (8% MPI, 2% wall)
RFD tNavigator Summary
• tNavigator integrates the latest technologies, enabling higher performance
– tNavigator uses NUMA, Hyper-Threading, and MPI/SMP hybrid parallelism to achieve higher scaling
• tNavigator performs well with the right set of hardware components
– Network: InfiniBand delivers up to 72% higher performance than Ethernet
– CPU: a 14% increase in CPU cores translates directly into a 14% increase in performance
– Running an additional MPI process per node can improve performance by up to 20%
– File system: tNavigator benefits from a capable parallel file system
• A parallel file system such as Lustre, which supports RDMA transport, is a good alternative to NFS
• NFS performance degrades at scale
• Writing results can itself have an impact on performance
Thank You
HPC Advisory Council
All trademarks are property of their respective owners. All information is provided “As-Is” without any kind of warranty. The HPC Advisory Council makes no representation to the accuracy and
completeness of the information contained herein. HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein