Parallel Reservoir Simulation: User Course
Schlumberger Private
SIS Houston
October 2010
© 2010 Schlumberger. All rights reserved.
An asterisk is used throughout this presentation to denote a mark of Schlumberger.
Other company, product, and service names are the properties of their respective
owners.
Agenda
• Nomenclature of modern high-performance computing hardware
• ECLIPSE Parallel Technology
• Working with Compute Clusters
– Remote Job Submission Methods
– Cluster management software
– Useful Tips
Nomenclature of modern high-performance computing hardware
Nomenclature
• Core:
– Processing engine; it carries out the instructions of a computer program.
• CPU/chip:
– A CPU may contain 1, 2, 3, 4, 8, … cores, but also has caches, a memory controller, etc.
• Socket:
– Connector on the motherboard into which a CPU is plugged.
• Cluster Node:
– “Pizza box” with motherboard, power supply, fans, network cards, etc.
• Cluster:
– Assembly of nodes connected via high-speed interconnects (Gigabit, Infiniband)
Cluster
[Photo: a cluster, with a node and the interconnect (cables) labeled]
Blade Server
[Photo: a blade server, with the individual blades labeled]
Sockets
• Motherboard with 2 sockets
• CPUs will be plugged into the sockets
[Photo: motherboard with the two sockets labeled]
Compute Cores (INTEL)
ECLIPSE domains are processed on these cores!
[Diagram: two multi-core CPUs with cores C1, C2, C3, C4]
• Message-passing between cores (fast!)
• Message-passing between CPUs (slower!)
ECLIPSE Parallel Technology
Parallel Reservoir Simulation
The Idea!
• Each domain is sent to a different processor
ECLIPSE Parallel
Individual domains are processed on different cores
Licensing
Each domain needs one Parallel license.
Example: If a simulation model is divided into eg 8 domains, 8 Parallel licenses are required.
– This subdivision is independent of the # of CPUs/cores in the computer!
• The 8 domains can be processed in the following ways:
– Process 8 domains on 8 real CPUs/cores (recommended).
– Use fewer than 8 real CPUs/cores for 8 domains (not recommended, as some CPUs/cores will have to process more than one domain (overload!) and simulator performance will degrade dramatically).
• For this reason, also avoid hyper-threading!
– Use only one core of a multi-core CPU and process a single domain on it; the remaining domains will be sent to CPUs on other nodes.
• This gives very good performance but leaves a lot of hardware underutilized (not recommended except eg for benchmarking).
Parallelisation in ECLIPSE 100
• Black-oil formulation:
– Ca. 70% of the CPU time is spent in the linear solver.
• The model is divided into 1D slabs.
• The division is in the horizontal plane.
– In the vertical direction, the strong coupling is preserved.
• The slabs are in full communication during the solution of the linear equations (a special version of the nested-factorization solver is used!).
– The work per iteration increases a little (~1.3x); however, the number of iterations during parallel processing should be the same as for serial runs.
Parallelisation in ECLIPSE 100
• Each domain is processed on an individual processor.
• Using DOMAINS in RPTGRID outputs the partitions of the reservoir to the PRT file.
• Useful tip:
– Check the elapsed runtime at the end of a job (PRT file) to see how long it took for each domain to be processed. If the times are very different, change the number of cells in the domains to get a better load balance.
Parallelisation in ECLIPSE 100
Summary of Keywords:
GRID Section
• SOLVDIRS: Solver principal directions.
– Don’t change except in very rare cases!
• DOMAINS: Defines the partitioning of the grid.
• IHOST: Allocates LGRs to specific processes. Each LGR is sent to a different CPU (task farming).
• PARAOPTS: Permits some fine-tuning of the parallel processing parameters.
– Use with care!
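As a sketch, the keywords above might appear in a deck as follows. The domain sizes and item layout are invented for illustration only; consult the ECLIPSE Reference Manual for each keyword’s exact item list.

```
GRID

-- Illustrative only: request a partition into 4 domains
-- (invented sizes; exact item layout per the Reference Manual)
DOMAINS
 25 25 25 25 /

-- Output the resulting partitions to the PRT file
RPTGRID
 DOMAINS /
```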
Parallelisation in ECLIPSE 300
• The solver is less dominant:
– Only ca. 30-40% of the total CPU time.
• The parallel solver uses a simple domain decomposition and requires no extra work per linear iteration, at the cost of breaking the coupling across the reservoir at each linear iteration.
– Therefore the number of linear iterations increases with the number of domains.
• However, the reservoir is fully coupled at the level of each Newton iteration.
Parallelisation in ECLIPSE 300
Summary of Keywords:
RUNSPEC Section
• NPROCX: Defines the number of processors to be used in the x-direction.
• NPROCY: Defines the number of processors to be used in the y-direction.
• PARALLEL: Initializes the Parallel option.
• PSPLITX: Specifies the domain decomposition in the x-direction.
• PSPLITY: Specifies the domain decomposition in the y-direction.
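A schematic RUNSPEC fragment combining these keywords. The processor counts and split positions are invented for illustration; the exact item lists are in the ECLIPSE Reference Manual.

```
RUNSPEC

-- Illustrative only: run on 4 processors, decomposed 2 x 2
PARALLEL
 4 /

NPROCX
 2 /

NPROCY
 2 /

-- Optional explicit splits, eg copied from a NOSIM run's PRT file
-- (invented positions; schematic item layout)
PSPLITX
 1 25 /
PSPLITY
 1 30 /
```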
Parallelisation in ECLIPSE 300
Best Practice:
• Use RPTSOL in the SOLUTION section with the mnemonic PART:
– Outputs all possible domain partitions for the Parallel option, ie the PSPLITX and PSPLITY keywords.
• Check/optimize the (static) load balance!
• Run a NOSIM case and copy the values from the PRT file into the data set.
Parallelisation in ECLIPSE 300
• Example: Active cell load balance %:

Process  Cells
1        1 to 40
2        41 to 60
3        61 to 80
4        81 to 120

If the active cell load balance is 75% for a 4-processor job, then the maximum parallel speed-up that can be achieved is 3! (ignoring any super-linear speed-up that may be achieved by cache effects)
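The load-balance arithmetic on this slide can be sketched in a few lines of Python (a hypothetical helper, not part of any ECLIPSE tool):

```python
def load_balance(cells_per_domain):
    """Static load balance: the ideal (even) share divided by the largest domain."""
    total = sum(cells_per_domain)
    ideal = total / len(cells_per_domain)
    return ideal / max(cells_per_domain)

def max_speedup(cells_per_domain):
    """The largest domain bounds the speed-up (cache effects ignored)."""
    return sum(cells_per_domain) / max(cells_per_domain)

# The 4-process example above: domains of 40, 20, 20 and 40 active cells.
domains = [40, 20, 20, 40]
print(load_balance(domains))  # 0.75 -> 75% balance
print(max_speedup(domains))   # 3.0  -> at most 3x on 4 processors
```

Evening out the cells per domain pushes both numbers toward their ideal values of 100% and 4.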
Results Reproducibility
Serial vs. Parallel runs
Non-ideal Speedup due to Parallelisation Overhead
[Chart: runtime vs. problem size, split into latency, comms time, and work]
• Latency is a hardware constant related to the interconnect and the # of CPUs.
• Communication overhead increases with the # of CPUs.
• Only the work (eg the solution of the equations) can be reduced if the # of CPUs is increased.
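The latency/comms/work split above can be turned into a toy runtime model. All coefficients below are invented for illustration; real values depend on the interconnect and the model.

```python
def runtime(n_cpus, work=1000.0, latency=50.0, comms_per_cpu=5.0):
    """Elapsed time = fixed latency + communication cost growing with the
    CPU count + divisible work. Coefficients are illustrative only."""
    return latency + comms_per_cpu * n_cpus + work / n_cpus

def speedup(n_cpus):
    return runtime(1) / runtime(n_cpus)

# Speed-up rises, peaks, then falls once communication overhead dominates.
for n in (1, 2, 4, 8, 16, 32):
    print(n, round(speedup(n), 2))
```

The model reproduces the qualitative behaviour of the chart: adding CPUs shrinks only the work term, so beyond some CPU count the growing communication term wins.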
Typical Speed-up
[Chart: ideal vs. actual speed-up for a 100K-cell model, 0 to 32 processors; the actual curve falls increasingly below the ideal line]
Memory Scalability
Illustration! Not indicative of real memory scaling!
[Chart: memory (MB) vs. number of CPUs, 1 to 8]
• Jump at 6 CPUs due to
– Different # of cells in each domain

Recommended Number of Cells per Domain
Compositional: 20k-50k
• Depends on the number of components vs. model size
Thermal: 10k-30k
• Thermal models use very small time steps (ie lots of message passing)
• Real-world performance is highly dependent on the model physics (not so much on model size)
– Steam flooding, in-situ combustion, SAGD, THAI, etc
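A hypothetical sizing helper based on the per-domain ranges quoted above; the target figures and the simple rounding are assumptions, not an official SIS tool.

```python
def suggested_domains(active_cells, target_cells_per_domain):
    """Round the cell count to a whole number of domains, at least 1."""
    return max(1, round(active_cells / target_cells_per_domain))

# Illustrative: a 1.2mio-cell compositional model at ~35k cells/domain,
# and a 240k-cell thermal model at ~20k cells/domain.
print(suggested_domains(1_200_000, 35_000))  # 34
print(suggested_domains(240_000, 20_000))    # 12
```

Remember each domain also needs a Parallel license, so the result is a license count as well as a core count.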
Useful Tips
Numerical Stuff
[Chart: residual vs. time, with the engineering accuracy e, algorithmic accuracy, and machine accuracy levels marked]
• Numerical errors can increase during parallel runs when many CPUs are used (due to the finite precision of computer hardware). Usually these errors are way below engineering accuracy, e; however, sometimes they accumulate and result in additional linear/non-linear iterations.
TUNINGDP (use defaults)
• Use this keyword if the parallel run shows different convergence behavior to the serial run.
• It tightens several numerical tolerances and may help performance.
Useful Tips
FrontSim Multi-threaded
• FrontSim can run “in parallel”
– It uses multi-threading, not MPI-based message passing (like ECLIPSE)
• Works for shared-memory machines only
– eg one workstation or one compute node with multiple CPUs/cores
– Enabled by the THREADFS keyword
• Separate licenses necessary (“Parallel Multicore” feature)!
• FrontSim with multi-threading is not supported with any remote queuing system
– It can only be used on the local machine
Miscellaneous
Simulation output cleanup
• Remote jobs: All input and output files are transferred to a remote temporary directory, which is deleted at the end of the run (after the result files have been copied back to the local directory).
– There is an option not to delete the directory.
• Local jobs: Files are kept; if a new run has the same file names, the old files will be overwritten.
Working with Compute Clusters
Queuing System
A queuing system (eg LSF from Platform Computing or Microsoft’s queuing system) distributes serial and parallel runs on a compute cluster such that only one process runs on each compute core.
• If more than one process runs on a compute core, the performance of the simulation run will degrade dramatically (ie the runtime goes up).
Simulation Job Submission: Simulation Launcher
• Useful when only a simulation deck is available (ie no Petrel model).
• Installed on a frontend workstation which is connected to a cluster.
• The Launcher can submit jobs to a remote queue (using eclrun).
• If EnginFrame is installed on the cluster, continuous runtime monitoring is possible; otherwise, manual re-loading of the PRT file can be done (Linux only!).
• The Launcher is also used to launch other (interactive) simulation applications.
Petrel Simulation Job Submission (to Linux)
[Screenshot: runtime monitoring!]
Petrel Simulation Job Submission (to Windows)
Windows HPC works with “job templates” rather than queues!
However, Petrel maps these templates to (pseudo) queues for a consistent user experience.
Windows HPC Cluster Manager
Work Load Monitoring – Only available to Cluster Administrators
Windows HPC Cluster Manager
Job Management – Available to all users
Flexible Licensing
• License-aware scheduling
• Pay-per-use licensing (PPU)
Flexible Licensing
License-aware Scheduling (Windows)
• License-aware scheduling with LSF or the MS Scheduler.
• Use the LICENSES keyword in the ECLIPSE deck to reserve licenses for a simulation run.
• Set ECL_LSF_LICCHECK to TRUE in the ECLIPSE macros.
• Also requires the 2010.2 eclrun macros and Windows Server 2008 R2 HPC.
• The ECLIPSE job will be put in a queue if not all licenses are available at the beginning of a run (rather than stopping with an error!).
Flexible Software Licensing
Multiple Realization Licensing
Modern workflows require a large number of simulation runs:
• Assisted history matching
• Uncertainty workflows
• What-if scenario-based forecasting
Common to all these workflows is that N different realizations are created from one base model.
In these cases, MR licenses can be used instead of having N full sets of ECLIPSE licenses:
• 1 MR license costs 1/3 of an ECLIPSE license, and it includes all ECLIPSE options!
• The logic is currently implemented in Petrel, Cougar (IFP), EnABLE (Roxar), and MEPO (SPT Group)
Flexible Software Licensing
MR Licensing Examples
Price of 1 MR license = 1/3 of an ECLIPSE Blackoil license!
• = 1 MEPO + 20 ECLIPSE Blackoil + 20 LGR
• = 1 MEPO + 1 ECLIPSE Blackoil + 1 LGR + 8x20 = 160 ECLIPSE Parallel + 20 ECLIPSE MR
Flexible Software Licensing
MR Licensing Pricing Examples
[Chart: costs of 20 concurrent simulation runs, MR vs. non-MR licensing, for three model types: ECLIPSE Blackoil serial runs, ECLIPSE Blackoil parallel runs (8-way), and ECLIPSE Compositional (LGR, MWS, Res. Opt) serial runs]
• Save 66% for the “basic” model
• Save 36% for the ECLIPSE Parallel case
– Parallel is a “performance option”, not an MR option!
• Save 84% for the “complex” model
Flexible Software Licensing
Pay-Per-Use Licensing
Client needs:
• Eliminate application access denials
• Access infrequently-used applications or modules
• Track application usage for internal cost allocation
PPU benefits:
• Enable totally flexible access to selected SIS applications as and when needed
• Monitor application usage
• Improve the client’s internal cost allocation
• Observe and manage current billings for software usage
Flexible Software Licensing
Pay Per Use Program – How it works
• SIS applications, licenses, and the Pay Per Use logging software are installed at the client site.
• Users access licenses as needed, for as long as needed.
• Secure usage logs are sent to Schlumberger Program Administration nightly.
• Schlumberger generates billing information for the client.
Flexible Software Licensing
Pay Per Use (PPU)
PPU:
– Utilizes a tracking agent on the License Server
– Start Time captured on license checkout
– Stop Time captured when the license is checked in
– Usage time summed on a monthly basis
– Invoiced at an hourly rate for the time used
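The checkout/check-in accounting above can be sketched as follows. The session data, interval format, and rate are invented for illustration; this is not Schlumberger’s actual billing code.

```python
from datetime import datetime

def usage_hours(sessions):
    """Sum (check-in - checkout) over all sessions in a month, in hours."""
    return sum((stop - start).total_seconds() for start, stop in sessions) / 3600.0

def invoice(sessions, hourly_rate):
    """Monthly invoice = summed usage hours times the hourly rate."""
    return usage_hours(sessions) * hourly_rate

# Two illustrative sessions in one month: 3.5 h and 2.0 h.
sessions = [
    (datetime(2010, 10, 4, 9, 0), datetime(2010, 10, 4, 12, 30)),
    (datetime(2010, 10, 18, 14, 0), datetime(2010, 10, 18, 16, 0)),
]
print(invoice(sessions, hourly_rate=10.0))  # 55.0
```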
Apache’s Windows HPC System
What’s in it!
Apache Simulation Hardware
1x Blade server (HP BL460c in a BLc7000 enclosure)
• 10 compute nodes, 1x head node
– Per node:
• 2x INTEL Xeon 5670 CPUs
– 6 cores/CPU @ 2.93 GHz, 24 GB RAM (DDR3, 1333 MHz)
• 3 cores per CPU switched off for performance reasons!
• Operating System: Microsoft Windows Server 2008 R2 HPC Edition
– 60 compute cores in total
• Disk space: ~25 TB (usable) of NetApp storage attached to the Blade server
– Not currently installed
• Interconnect: Gb, Infiniband 4x QDR
Apache Queuing System
The cluster is hard-divided into two groups of Blades:
• 2 Blades (ie 12 cores in total) are reserved for serial runs
• 8 Blades (ie 48 cores in total) are reserved for parallel runs
Queues (set up as HPC templates and provided as “pseudo queues” in Petrel):
• SerialWork, ParallelWork: All users have access to them
• MR-SerialWork, MR-ParallelWork: Only certain (power) users have access
– Runs have lower priority and are not allowed to use the whole cluster
• Priority-Serial, Priority-Parallel: High-priority runs (certain users only)
• (External-Serial, External-Parallel: Lowest priority; for users from other offices)
Note: The ECLIPSE Launcher cannot save the queue commands! This will be fixed in 2011.1.
Apache Simulator Licenses
Installed simulator licenses:
Purchased ECLIPSE licenses:
• 4x Blackoil, 1x Solvent, 2x LGR, 1x Compositional, 1x Multisegment Wells, 1x Open ECLIPSE
• 10x Multiple Realization
ECLIPSE Parallel licenses:
• 24x purchased ECLIPSE Parallel licenses (on 3 Blades)
• 56x ECLIPSE Parallel licenses provided on a “pay-as-you-use” basis (on 7 Blades)
Additional Software Requirements
• Windows HPC Client Utilities must be installed on the workstations/servers where Petrel & ECLIPSE reside
• Also required: Service Pack 3 and Microsoft PowerShell
Apache “Queue” Settings
To submit jobs to the Windows cluster from Petrel 2010.1:
Set the following under "Tools, System settings, Queue definition":
Name: Cluster
Server: localhost
Remote Queue: houexap40
Options: --queueparameters="/jobtemplate:ParallelWork"
(Or another appropriate template name)
Apache “Queue” Settings
To submit jobs to the Windows cluster from the Simulation Launcher:
Set the following under "Configuration, Settings, Queues":
Name: Cluster
Server: localhost
Remote Queue: houexap40
(Unlike Linux, you will have to enter the Remote Queue name manually)
Set the following under "Summary, Advanced":
Edit Command Line Options: select “Enable” and add the following:
--queueparameters="/jobtemplate:ParallelWork"
(Or another appropriate template name)
Biggest Simulation Models
Biggest simulation models run on a daily basis
Real models:
Russia:
• 3-phase black-oil, 3.5mio/1.6mio cells, over 14,000 wells(!), 30 years of history
• 76mio/6mio cells, black-oil
Middle East:
• 7-component compositional, 3.5mio/1.2mio cells
– Gas re-injection for EOR (complex physics), very heterogeneous geology
Canada:
• 7mio-cell thermal model (6 SAGD pairs on the geological model)
Synthetic models:
• FrontSim: 56mio cells, synthetic 2-phase oil-water model
• ECLIPSE BO: synthetic 3-phase model, 160mio cells on 992 cores, 4TB RAM