• Introduction
• Numerical variations
• Evolution of crashworthiness modelling
• Evolution of CPU technology
• SMP and MPP
• Summary
Introduction - SMP vs MPP
Shared memory vs. clusters
For 1024 cores, max. speedup is 251 (2007)!
Model (car2car model):
• 2,500,000+ nodes and elements
• Fully integrated (type 16) shells
[Chart: vertical axis 0 to 450000; horizontal axis Number of CPU, 2 to 1024; series: HP CP3000+InfiniBand (2x1) Xeon 3.0GHz and HP CP3000+InfiniBand (2x2) Xeon 3.0GHz]
Numerical Variations - SMP
SMP ncpu=4, no special treatment
Based on the system load with 4 different runs
Example with 3 significant digits:
f1=10.1
f2=4.01
f3=5.05
f4=3.06
[Diagram labels: E2,f2 and E4,f4]
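The effect of summation order on the four values f1 to f4 can be reproduced with a short sketch. It assumes that every partial sum is rounded to 3 significant digits and that the two orders shown stand in for two runs under different system load; the function name `rounded_sum` and the choice of orders are illustrative and not taken from the slides.

```python
from decimal import Decimal, ROUND_HALF_EVEN, localcontext


def rounded_sum(values, digits=3):
    """Add values left to right, rounding each partial sum to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits                 # significant digits kept after every addition
        ctx.rounding = ROUND_HALF_EVEN    # explicit, so the result does not depend on defaults
        total = Decimal(0)
        for value in values:
            total = total + Decimal(value)
        return total


forces = ["10.1", "4.01", "5.05", "3.06"]        # f1..f4 from the example

forward = rounded_sum(forces)                    # f1 + f2 + f3 + f4
backward = rounded_sum(list(reversed(forces)))   # f4 + f3 + f2 + f1

print("f1+f2+f3+f4 =", forward)    # 22.3
print("f4+f3+f2+f1 =", backward)   # 22.2
```

The exact sum is 22.22, so the two orders differ in the last retained digit. The same mechanism applies to floating-point accumulation when the order of thread contributions is not fixed.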
• CrachFEM materials
• Detailed HAZ welds
• Multiple airbags
• Rolling wheels
• New powertrain models
• New chassis models
• Steering mechanics from wheel to steering wheel
Courtesy of: VOLVO CAR CORPORATION
Model size for Safety Analysis
[Chart: Model Size (x1000), 0 to 16000, by year, 1994 to 2016]
Chart annotations:
• BT shell
• SMP
• Transition from SMP to MPP
• More sophisticated mat model
• Fully integrated shell
• MPP
• Turn around time ~16h-48h
• Multi-Physics
• Solid spotweld cluster
• Smaller elements
• Solid elements!
• 35M in 3 yrs
Serial Serial+SMP SMP/MPP MPP
cores/job 1 4 8 24 48 96 192
MPP performance – topcrunch.org (2007)
Model (car2car model):
• 2,500,000+ nodes and elements
• Fully integrated (type 16) shells
[Chart: vertical axis 0 to 600000; horizontal axis Number of CPU, 2 to 1024; series: HP CP3000+InfiniBand (2x1) Xeon 3.0GHz and HP CP3000+InfiniBand (2x2) Xeon 3.0GHz; annotations: "120 times" and "251 times"]
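The quoted 251-times speedup on 1024 CPUs can be put in perspective with a quick calculation. This is a rough sketch under two assumptions that the slides do not state: the speedup is measured against a single-CPU run, and Amdahl's law is an adequate model of the scaling. The function names are illustrative.

```python
def parallel_efficiency(speedup, n_cpu):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / n_cpu


def amdahl_serial_fraction(speedup, n_cpu):
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / N)."""
    return (1.0 / speedup - 1.0 / n_cpu) / (1.0 - 1.0 / n_cpu)


speedup, n_cpu = 251, 1024   # values quoted for the car2car model (2007)

efficiency = parallel_efficiency(speedup, n_cpu)
serial_fraction = amdahl_serial_fraction(speedup, n_cpu)

print(f"parallel efficiency:     {efficiency:.1%}")
print(f"implied serial fraction: {serial_fraction:.2%}")
```

Under these assumptions, a non-parallel share of well under one percent of the work is already enough to hold the speedup at about a quarter of the ideal value on 1024 CPUs.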
MPP scaling
Scaling of contact
[Chart: vertical axis Performance, ticks 6 to 31; curves: ideal, Total, element, contact, rigidbody]
Courtesy of: Steve Kan, Dhafer Marzougui (CCSA, George Mason University)
Intel
Do you see clock rate increase?
Do you notice cores/socket increase?
Code name    Min. feature size    Release    Cores/socket    Clock rate    Instruction
Bloomfield   45nm                 2008       4               3.2 GHz       SSE
[Diagram: SMP threads inside each node; MPP ranks exchange messages between nodes]
• Pure MPP: 96 MPP ranks
• Hybrid: (8,16) MPP ranks with (12,6) SMP threads
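The 96-core layouts above follow one pattern: MPP ranks times SMP threads equals the total core count (8 x 12 = 16 x 6 = 96). A minimal sketch that lists such layouts, assuming every rank runs the same number of threads; the function name `hybrid_layouts` and the limit of 12 threads per rank are illustrative choices, not taken from the slides.

```python
def hybrid_layouts(total_cores, max_threads):
    """Return (mpp_ranks, smp_threads) pairs with mpp_ranks * smp_threads == total_cores."""
    layouts = []
    for threads in range(1, max_threads + 1):
        if total_cores % threads == 0:           # threads must divide the core count evenly
            layouts.append((total_cores // threads, threads))
    return layouts


layouts = hybrid_layouts(96, max_threads=12)

for ranks, threads in layouts:
    kind = "pure MPP" if threads == 1 else "hybrid"
    print(f"{ranks:3d} MPP ranks x {threads:2d} SMP threads  ({kind})")
```

Fewer ranks mean fewer messages between nodes, at the price of more threads sharing memory within each rank.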
Explicit: CPU bounded
Implicit: CPU bounded, Memory bounded, IO bounded
[Diagram: memory hierarchy by Speed and Capacities, fastest and smallest first: L1$, L2$, L3$, DRAM, NVMe Disk, SSD Disk, Disk]
Numerical consistency
SMP: ncpu=-# (~10% slower)
NCAC model
• 2.5M elements, 53 contacts, type 16 shell
• 35 mph frontal
• 100 ms
Explicit MPP/Hybrid Performance
Multi-core/Multi-socket clusters
Nodes to monitor
Tmpp ~ Tmpp + Tomp
Explicit MPP/Hybrid Consistency
Implicit Hybrid Performance
Model CYL0P5E6 – Intel 6core 2.95 GHz
16 x 4 x 1 2055 14417
16 x 4 x 2 985 13290
16 x 4 x 4 582 29135
Tmpp > Tmpp + Tomp
[Chart: vertical axis 0 to 20000; horizontal axis 1 to 7; legend: Contact, Element; annotations: "2 threads", "8M"]
How to choose your executable
Explicit analysis
• SMP - 1-2 cores
• MPP - 2-100 cores
• HYBRID
– More than 100 cores
– Run under different core counts but need consistent answers
Implicit analysis
• HYBRID
– Keep as much data as possible in memory for an in-core solution
– Reduce the IO bottleneck
– Reduce the amount of data transferred over the network
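The selection rules above can be written as a small helper. This is a sketch of the slide's guidance only: the function name `choose_executable` and its parameters are hypothetical, and because the slide lists 2 cores under both SMP and MPP, both are returned in that case.

```python
def choose_executable(analysis, cores, need_consistency=False):
    """Suggest executable types following the rules of thumb listed above."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if analysis == "implicit":
        return ["HYBRID"]                  # implicit analysis: hybrid
    if analysis != "explicit":
        raise ValueError("analysis must be 'explicit' or 'implicit'")
    if need_consistency or cores > 100:
        return ["HYBRID"]                  # >100 cores, or consistent answers across core counts
    options = []
    if cores <= 2:
        options.append("SMP")              # SMP: 1-2 cores
    if cores >= 2:
        options.append("MPP")              # MPP: 2-100 cores
    return options


print(choose_executable("explicit", 1))
print(choose_executable("explicit", 64))
print(choose_executable("explicit", 192))
print(choose_executable("explicit", 48, need_consistency=True))
print(choose_executable("implicit", 24))
```

The thresholds are rules of thumb from the slide rather than hard limits, so the returned list is a starting point, not a verdict.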
Summary
For better prediction and to reduce the cost of prototyping
• Models involve coupled multi-physics, HAZ, CPM airbag, etc.
• Meshes become finer and critical parts use solid elements
• More sophisticated contact (mortar) for better behavior
Hybrid
• Is already used in production (stable)
• Gives better scaling for explicit analysis with more than 100 cores
• Gives better performance in most implicit analyses
• More capabilities will become OpenMP enabled to get better speedup
Thank you!