
PARTNER SHOWCASE

NVIDIA is an MSC Software Performance partner with Quadro® and Professional Solution product lines that provide excellent
performance for Patran and MSC Nastran on Windows® and Linux® systems.

MSC Software: Partner Showcase - NVIDIA


GPU Computing Accelerates
Simulation Performance for MSC Nastran Users

Key Highlights:
Industry: High-Performance Computing
Challenge: Increase computing performance by developing a hybrid computing model
MSC Software Solutions: MSC Nastran 2012 with GPU computing support, including multiple GPUs for DMP runs
Benefits:
• Vastly reduce use of pinned host memory
• Handle arbitrarily large fronts, for very large models

The power wall (resulting from the increase in power consumption and heat dissipation that accompanies higher processor speeds) has introduced radical changes in computer architectures. Increasing core counts, and hence increasing parallelism, have replaced increasing clock speeds as the primary way of delivering greater hardware performance. A modern GPU (Graphics Processing Unit) consists of hundreds of simple processing cores; this degree of parallelism on a single processor is typically referred to as ‘many-core’, in contrast to ‘multi-core’, which refers to processors with at most a few dozen cores.

Many-core GPUs often demand a high degree of fine-grained parallelism: the application program should create many threads so that, while some threads are waiting for data to return from memory, other threads can be executing. Because GPUs are specialized for inherently parallel problems, they take this different approach to hiding memory latency.

With the ever-increasing demand for more computing performance, the HPC industry is moving towards a hybrid computing model, where GPUs and CPUs work together to perform general-purpose computing tasks. In this hybrid computing model, the GPU serves as a co-processor to the CPU. Co-processing refers to the use of an accelerator, such as a GPU, to offload work from the CPU and increase computational efficiency. To exploit this hybrid computing model and the massively parallel GPU architecture, application software needs to be redesigned. MSC Software and NVIDIA engineers have been working together over the last year on using GPUs to accelerate the sparse direct solver in MSC Nastran.
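The latency-hiding argument above can be made concrete with a back-of-envelope model. This is an illustration under stated assumptions, not an NVIDIA or MSC formula: it assumes each thread repeatedly issues one memory load, stalls for a fixed latency, then computes for a fixed number of cycles, and asks how many resident threads a core needs so that some thread always has work ready. The function names and cycle counts are hypothetical.

```python
import math


def threads_to_hide_latency(latency_cycles: int, compute_cycles: int) -> int:
    """Resident threads needed so a core never idles waiting on memory.

    Simplified model (an assumption): each thread loops over
    "issue one load, stall latency_cycles, compute compute_cycles".
    """
    if latency_cycles < 0 or compute_cycles <= 0:
        raise ValueError("need latency_cycles >= 0 and compute_cycles > 0")
    # One thread computes for compute_cycles out of every
    # (compute_cycles + latency_cycles) cycles, so this many threads
    # are needed to keep every cycle busy.
    return math.ceil((compute_cycles + latency_cycles) / compute_cycles)


def utilization(threads: int, latency_cycles: int, compute_cycles: int) -> float:
    """Fraction of cycles spent computing when `threads` threads are resident."""
    return min(1.0, threads * compute_cycles / (compute_cycles + latency_cycles))


# Hypothetical numbers: 400-cycle memory latency, 10 cycles of work per load.
print(threads_to_hide_latency(400, 10))   # 41
print(round(utilization(1, 400, 10), 3))  # 0.024: a single thread mostly waits
print(utilization(41, 400, 10))           # 1.0
```

In this simplified model one thread keeps the core busy only about 2% of the time, which is why GPU programs are written to expose far more threads than there are cores.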

Partner Showcase: NVIDIA | 1



Solver Acceleration in MSC Nastran 2012:
A sparse direct solver is arguably the most important component of a finite element structural analysis program. Such solvers typically implement a multi-frontal algorithm, with out-of-core capability for solving extremely large problems and BLAS level 3 kernels for the highest compute efficiency. Parallelism at both the elimination-tree level and the compute-kernel level, with dynamic scheduling, is used to ensure the best scalability. The BLAS level 3 compute kernels in a sparse direct solver are prime candidates for GPU computing because of their high floating-point density and favorable compute-to-communication ratio.

The proprietary symmetric MSCLDL and asymmetric MSCLU sparse direct solvers in MSC Nastran employ a super-element analysis concept instead of dynamic tree-level parallelism. In this super-element analysis, the structure (matrix) is first decomposed into large sub-structures (sub-domains) according to user input and load-balance heuristics. The out-of-core multi-frontal algorithm is then used to compute the boundary stiffness, or Schur complement, of each sub-structure, followed by the transformation of the load vector, or right-hand side, to the boundary. The global solution is found after the boundary stiffness matrices are assembled into the residual structure and the residual structure is factorized and solved. The GPU is a natural fit for each sub-structure boundary stiffness (Schur complement) calculation.

Today’s GPUs can provide memory bandwidth and floating-point performance several times higher than the latest CPUs. In MSC Nastran, the most time-consuming part of the solution is the BLAS level 3 operations in the multi-frontal factorization process. To date, only the trailing matrix updates of the front factorization are implemented as CUDA kernels; these update kernels are the subject of collaborative work between NVIDIA and MSC engineers.

GPU Computing Implementation and Target Analysis (Solution Sequences):
NVIDIA’s CUDA parallel programming architecture is used to implement the update kernels. CUDA is the hardware and software architecture that enables NVIDIA GPUs to execute programs written in C, C++, FORTRAN, OpenCL, and other languages.

Key strengths of the GPU implementation in MSC Nastran 2012 are vastly reduced use of pinned host memory and the ability to handle arbitrarily large fronts for very large models (greater than 15M DOF) on a single Tesla C2050 GPU. ‘Staging’ is the term used to describe how very large fronts are handled. If the trailing submatrix is too large to fit in GPU device memory, it is broken up into approximately equal-sized ‘stages’, and the stages are completed in order. Multiple streams are used within a stage. For example, an arbitrarily large submatrix of, say, 40GB might be processed in 10 stages of 4GB each. The actual stage sizes can be varied for performance tuning.

In addition, the MSC Nastran implementation supports multiple GPUs for DMP (Distributed Memory Parallel) runs. When DMP>1, multiple fronts are factorized concurrently on multiple GPUs. In the dmp=2 example below, the matrix is decomposed into two domains, and each domain is computed by an MPI process. A typical MSC Nastran job submission command with multiple GPUs is shown below:

    nastran2012 jid=myinput mem=48gb buffsize=65537 dmp=2 gpuid=0:1 gputhresh=12000 sys205=192 sys151=1 mode=i8 sdir=/local/skodiyal/tmp bat=no scr=yes

gpuid is the ID of a licensed GPU device to be used in the analysis. Multiple IDs may be assigned to MSC Nastran DMP runs (gpuid=0:1 in the example above). gputhresh is the minimum threshold for GPU computing in the multi-frontal sparse factorization: if the product of the rank size and the front size of a front is smaller than this value, the rank update of that front is processed on the CPU; otherwise, the GPU device is used for the rank update.

The GPUs supported with this implementation are the NVIDIA Tesla 20-series (shown in Figure 1) and Quadro GPUs based on the Fermi architecture (compute capability 2.0). Linux and Windows 64-bit platforms are supported.

Any ‘fat’ BLAS3 code path is a potential candidate for GPU computing. Solution sequences that are dominated by the sparse direct solver, namely SOL101 (linear statics), SOL108 (direct frequency), and SOL400 (nonlinear), fall into this category.
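The super-element procedure described above (condense each sub-structure onto its boundary, assemble the residual structure, solve it, then recover the interior unknowns) can be sketched in a few lines of NumPy. This is a minimal dense illustration, assuming a symmetric positive definite stiffness matrix whose sub-domain interiors couple only through shared boundary DOFs; it is not the MSCLDL/MSCLU implementation, which is sparse, out-of-core, and multi-frontal. The spring-chain model and all function names are hypothetical.

```python
import numpy as np


def chain_stiffness(n: int) -> np.ndarray:
    """Hypothetical test model: n unit springs in series, fixed at both ends."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def condense(K, f, interior, boundary):
    """Static condensation of one sub-structure onto the boundary DOFs.

    Returns K_bi K_ii^-1 K_ib (subtracted from K_bb to form the Schur
    complement) and K_bi K_ii^-1 f_i (subtracted from f_b to move the
    load to the boundary).
    """
    Kii = K[np.ix_(interior, interior)]
    Kib = K[np.ix_(interior, boundary)]
    Kbi = K[np.ix_(boundary, interior)]
    # One solve with the interior block handles both K_ib and f_i.
    X = np.linalg.solve(Kii, np.column_stack([Kib, f[interior]]))
    return Kbi @ X[:, :-1], Kbi @ X[:, -1]


def superelement_solve(K, f, domains, boundary):
    """Solve K u = f by condensing each interior domain onto the boundary."""
    # The global K_bb already sums every sub-structure's own boundary stiffness.
    S = K[np.ix_(boundary, boundary)].copy()
    g = f[boundary].copy()
    for interior in domains:  # independent per sub-structure (e.g. per DMP process)
        dS, dg = condense(K, f, interior, boundary)
        S -= dS
        g -= dg
    u = np.zeros_like(f)
    u[boundary] = np.linalg.solve(S, g)  # factor and solve the residual structure
    for interior in domains:             # recover interior unknowns
        Kii = K[np.ix_(interior, interior)]
        Kib = K[np.ix_(interior, boundary)]
        u[interior] = np.linalg.solve(Kii, f[interior] - Kib @ u[boundary])
    return u


K = chain_stiffness(9)
f = np.ones(9)
u = superelement_solve(K, f, domains=[[0, 1, 2, 3], [5, 6, 7, 8]], boundary=[4])
print(np.allclose(u, np.linalg.solve(K, f)))  # True
```

Because each call to `condense` touches only its own sub-structure, the loop iterations are independent, which is what makes per-sub-structure work a natural unit to hand to a GPU.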

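The trailing matrix update that is offloaded to the GPU, and the gputhresh rule that decides where it runs, can be illustrated on a single dense front. This is a hedged sketch, not MSC Nastran code: it eliminates the first k fully summed variables of a symmetric positive definite front using a Cholesky (LL^T) factorization for brevity (MSCLDL uses an LDL^T form), and it assumes that "rank size" means k and "front size" means the front dimension m, since the exact definitions are not given here. Both branches run the same NumPy expression; a real GPU branch would launch CUDA update kernels instead.

```python
import numpy as np

GPUTHRESH = 12000  # value taken from the example job submission command


def choose_device(rank: int, front_size: int, gputhresh: int = GPUTHRESH) -> str:
    """gputhresh rule as described in the text: small updates stay on the CPU."""
    return "cpu" if rank * front_size < gputhresh else "gpu"


def partial_factor_front(F: np.ndarray, k: int, gputhresh: int = GPUTHRESH):
    """Eliminate the first k variables of a dense SPD front F (m x m).

    Returns L11, L21, the trailing update F22 - L21 L21^T (the contribution
    passed to the parent front), and the device the update would be routed to.
    """
    F11, F21, F22 = F[:k, :k], F[k:, :k], F[k:, k:]
    L11 = np.linalg.cholesky(F11)        # factor the k x k pivot block
    L21 = np.linalg.solve(L11, F21.T).T  # L21 = F21 L11^-T (a TRSM-like step)
    device = choose_device(k, F.shape[0], gputhresh)
    # Rank-k BLAS3 update (SYRK/GEMM): the part implemented as CUDA kernels.
    update = F22 - L21 @ L21.T
    return L11, L21, update, device


rng = np.random.default_rng(0)
m, k = 200, 64
A = rng.standard_normal((m, m))
F = A @ A.T + m * np.eye(m)  # synthetic SPD front
L11, L21, update, device = partial_factor_front(F, k)
print(device)  # gpu, since 64 * 200 = 12800 >= 12000
```

A quick consistency check is that the Cholesky factor of `update` equals the trailing block of the Cholesky factor of the whole front, so the partial factorization and the trailing update agree with a full factorization.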
Figure 1: NVIDIA Tesla 20-series GPUs (workstation & server form factors)
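The staging scheme described earlier (splitting an oversized trailing submatrix into approximately equal-sized pieces that are processed in order) can be sketched as a column-block partition. This is a minimal illustration under assumptions: the per-stage memory budget, the column-block layout, and the helper names are hypothetical, and the sketch ignores the multiple CUDA streams used within each stage. The example mirrors the text's 40GB-in-10-stages-of-4GB case using an illustrative layout of 81,920 columns of 512 KiB each (40 GiB in total) and a 4 GiB stage budget.

```python
import math

import numpy as np


def plan_stages(n_cols: int, bytes_per_col: int, budget_bytes: int) -> list[tuple[int, int]]:
    """Split n_cols columns into the fewest nearly equal stages that fit the budget."""
    max_cols = budget_bytes // bytes_per_col
    if n_cols <= 0 or max_cols == 0:
        raise ValueError("need n_cols > 0 and room for at least one column per stage")
    n_stages = math.ceil(n_cols / max_cols)
    base, extra = divmod(n_cols, n_stages)  # stage widths differ by at most 1
    stages, start = [], 0
    for i in range(n_stages):
        width = base + (1 if i < extra else 0)
        stages.append((start, start + width))
        start += width
    return stages


def staged_trailing_update(F22: np.ndarray, L21: np.ndarray, stages) -> np.ndarray:
    """Apply F22 - L21 L21^T one column block (stage) at a time, in order."""
    out = F22.copy()
    for start, stop in stages:
        # Only this column block of the result, plus L21, needs to be resident.
        out[:, start:stop] -= L21 @ L21[start:stop].T
    return out


GiB = 2**30
stages = plan_stages(81_920, 512 * 1024, 4 * GiB)
print(len(stages), stages[0])  # 10 (0, 8192)
```

Splitting by columns works because column block [a:b] of L21 L21^T depends only on rows a:b of L21, so each stage can be computed without the rest of the result being in device memory.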


Figure 2: Automotive crank shaft (945K DOF) and engine (15.2M DOF) models

Figure 3: Performance speed-ups with single and multiple GPUs using MSC Nastran 2012 models

SOL108, however, requires a complex sparse direct solver, which the MSC Nastran 2012 implementation does not support; this feature is currently under development and testing for an upcoming point release. Likewise, conventional SOL111 (modal frequency) runs with large MPYADs (multiply-add) should also benefit from GPU computing in a later release.

Performance analysis with GPU Computing:
Linear and nonlinear structural stress analysis are the target applications for this first implementation of GPU computing in MSC Nastran 2012. Structural finite element models dominated by solid elements concentrate more of the computational work in the sparse matrix factorization, which is highly desirable for the GPU. A range of models with varying fidelity, from around 1M to 15M degrees of freedom (DOF), is considered (Figure 2). Performance is compared against a serial Nastran run, which is still widely used within the customer community, as well as against multi-core (2x quad-core Nehalem) CPUs.

The hardware configurations used for these benchmark runs were:
(1) AMAX server, Linux, 2x hex-core Westmere 2.67GHz, 32GB memory, 2x Tesla C2050 GPU, for the 945K and 1.3M DOF models
(2) Super Micro server, Linux, 2x quad-core Nehalem 2.27GHz, 96GB memory, 2.2 TB SATA 5-way striped RAID and 2x Tesla C2050 GPU, for all other models

Figure 3 shows the end-to-end (total) speed-up for single and multiple GPU runs. In general, across the benchmark models, we see speed-ups of 4-6X with a single GPU over a serial run and 1.4-2X with 2 GPUs over an 8-core DMP run.

Summary:
GPU computing is implemented in MSC Nastran 2012 to significantly lower simulation times for industry-standard analysis models. Vastly reduced use of pinned memory and the ability to handle arbitrarily large front sizes for very large models are among the strengths of this implementation. Further, multiple GPUs can be used with Nastran DMP analysis. The performance speed-ups enabled by GPU computing will allow MSC Nastran users to add more realism to their models, thus improving the quality of the simulations. A rapid CAE simulation capability from GPUs has the potential to transform current practices in engineering analysis and design optimization procedures.

This initial GPU computing implementation also identified certain issues. For one, the larger the model, the higher the DMP overhead in MSC Nastran, and this increased CPU-side overhead reduces the overall speed-up resulting from GPU computing. Future releases of MSC Nastran will address such issues as well as expand the GPU computing capability to include complex solver kernels for the NVH and dynamics markets.
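The effect of CPU-side overhead on the end-to-end speed-up follows Amdahl's law (a standard result, not a formula given in this showcase): if a fraction p of the baseline run time is accelerated by a factor s, the overall speed-up is 1/((1 - p) + p/s). The sketch below uses hypothetical fractions and kernel speed-ups, not values measured for Figure 3, to show how a growing non-accelerated share caps the gain.

```python
def end_to_end_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speed-up when only part of the run is accelerated.

    accelerated_fraction: share of baseline run time spent in the offloaded work
        (for example, the factorization's trailing updates).
    kernel_speedup: how much faster that work runs on the accelerator.
    """
    if not 0.0 <= accelerated_fraction <= 1.0 or kernel_speedup <= 0.0:
        raise ValueError("need 0 <= accelerated_fraction <= 1 and kernel_speedup > 0")
    serial_part = 1.0 - accelerated_fraction  # CPU-side work, including overhead
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


# Hypothetical: the offloaded work runs 10x faster on the GPU.
print(round(end_to_end_speedup(0.9, 10.0), 2))   # 5.26
# If overhead grows so the offloaded work is only 80% of the run time:
print(round(end_to_end_speedup(0.8, 10.0), 2))   # 3.57
# Upper bound as the kernel speed-up grows without limit: 1 / (1 - p).
print(round(end_to_end_speedup(0.8, 1e12), 2))   # 5.0
```

In this model, shrinking the accelerated share from 90% to 80% lowers the ceiling from 10X to 5X, regardless of how fast the GPU kernels become.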




About MSC Software

MSC Software is one of the ten original software companies and the worldwide leader in multidiscipline simulation. As a trusted partner, MSC Software helps companies improve quality, save time and reduce costs associated with the design and test of manufactured products. Academic institutions, researchers, and students employ MSC technology to expand individual knowledge as well as expand the horizon of simulation. MSC Software employs 1,000 professionals in 20 countries. For additional information about MSC Software’s products and services, please visit www.mscsoftware.com.

Please visit www.mscsoftware.com for more partner showcases.

About MSC Nastran

MSC Nastran Structural & Multidiscipline FEA
MSC Nastran is the world’s most widely used Finite Element Analysis (FEA) solver that helped MSC Software become recognized in 2011 as one of the “10 Original Software Companies”. When it comes to solving for stress/strain behavior, dynamic and vibration response and thermal gradients in real-world systems, MSC Nastran is recognized as the most trusted multidiscipline solver in the world.

MSC Nastran is built on work done by NASA scientists and researchers, and is trusted for the design of mission critical systems in every industry. Nearly every spacecraft, aircraft, and vehicle designed in the last 40 years has been analyzed using MSC Nastran. In recent years, several extensions to its capabilities have resulted in a single multidisciplinary solver providing users with a trusted solution to simulate everything from a single component to complex assemblies under diverse conditions.

MSC Nastran offers a complete set of linear static and dynamic analysis capabilities along with unparalleled support for superelements, enabling users to solve large, complex assemblies more efficiently. MSC Nastran also offers a complete set of implicit and explicit nonlinear analysis capabilities, thermal and interior/exterior acoustics, and coupling between various disciplines such as thermal, structural, and fluid interaction. New modular packaging that enables you to get only what you need makes it more affordable to own MSC Nastran than ever before.

Corporate
MSC Software Corporation
2 MacArthur Place
Santa Ana, California 92707
Telephone 714.540.8900
www.mscsoftware.com

Europe, Middle East, Africa
MSC Software GmbH
Am Moosfeld 13
81829 Munich, Germany
Telephone 49.89.431.98.70

Asia-Pacific
MSC Software Japan LTD.
Shinjuku First West 8F
23-7 Nishi Shinjuku
1-Chome, Shinjuku-Ku
Tokyo, Japan 160-0023
Telephone 81.3.6911.1200

Asia-Pacific
MSC Software (S) Pte. Ltd.
100 Beach Road
#16-05 Shaw Tower
Singapore 189702
Telephone 65.6272.0082

The MSC Software corporate logo, MSC, and the names of the MSC Software products and services referenced herein are trademarks or registered trademarks of the MSC Software Corporation in the United States and/or other countries. All other trademarks belong to their respective owners. © 2012 MSC Software Corporation. All rights reserved.

NVIDIA*2012MAY*PS
