
Parallel Reservoir Simulation

User Course

Schlumberger Private
SIS Houston
October 2010
© 2010 Schlumberger. All rights reserved.

An asterisk is used throughout this presentation to denote a mark of Schlumberger.
Other company, product, and service names are the properties of their respective
owners.

Agenda
• Nomenclature of modern high-performance computing
hardware
• ECLIPSE Parallel Technology

• Working with Compute Clusters
– Remote Job Submission Methods
– Cluster management software
– Useful Tips

• Advanced Licensing Schemes


• Apache’s Installation

Nomenclature of modern high-performance
computing hardware

Nomenclature
• Core:
– The processing engine; it carries out the instructions of a computer program.
• CPU/chip:
– A CPU may contain 1, 2, 3, 4, 8, … cores, but also has caches, a memory controller, etc.
• Socket:
– The connector on a motherboard for the CPU.
– A motherboard can have 1, 2, or 4 sockets that are connected to each other.
– Typically, 2-socket motherboards are used in simulation clusters.
• Cluster Node:
– A "pizza box" with motherboard, power supply, fans, network cards, etc.
• Cluster:
– An assembly of nodes connected via high-speed interconnects (Gigabit, InfiniBand).
• Blade Server:
– A blade server is a stripped-down cluster with a modular design optimized to minimize the use of physical space and energy.
– "Blades" are the equivalent of "cluster nodes".

Cluster

[Figure: a cluster, with one node and the interconnect cables labelled]
Blade Server

[Figure: a blade server enclosure with the individual blades labelled]
Sockets
• Motherboard with 2 sockets
• CPUs will be plugged into the sockets

[Figure: motherboard with the two CPU sockets labelled]
Compute Cores (Intel)
ECLIPSE domains are processed on these cores!

[Figure: two CPUs with cores C1–C4 each; message passing between cores on the same CPU is fast, message passing between CPUs is slower]
ECLIPSE Parallel Technology

Parallel Reservoir Simulation
The Idea!

• A large simulation model is split into n smaller domains

• Each domain is sent to a different processor

• All domains are then processed simultaneously on the n processors (ie, in parallel)
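The splitting step can be sketched in a few lines. This is an illustrative toy, not ECLIPSE's actual partitioning code: it divides a model's cells into n contiguous domains so each range could be handed to a different process.

```python
# Illustrative sketch only -- not ECLIPSE's actual partitioner.
# Split a model's cells into n near-equal contiguous domains; in a
# real parallel run, each range would go to a separate MPI process.

def split_into_domains(n_cells, n_domains):
    """Return (start, end) cell-index ranges for n contiguous domains."""
    base, extra = divmod(n_cells, n_domains)
    domains, start = [], 0
    for d in range(n_domains):
        size = base + (1 if d < extra else 0)  # spread the remainder
        domains.append((start, start + size))
        start += size
    return domains

# A 1,000,000-cell model split into 4 domains of 250,000 cells each.
print(split_into_domains(1_000_000, 4))
```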
ECLIPSE Parallel
Individual domains are processed on different cores

Licensing
Each domain needs one Parallel license.
Example: If a simulation model is divided into, eg, 8 domains, 8 Parallel licenses are required.
– This subdivision is independent of the # of CPUs/cores in the computer!

• The 8 domains can be processed in the following ways:
– Process the 8 domains on 8 real CPUs/cores (recommended)
– Use fewer than 8 real CPUs/cores for the 8 domains (not recommended: some CPUs/cores will have to process more than one domain (overload!) and simulator performance will degrade dramatically)
• For this reason, also avoid hyper-threading!
– Use only one core of each multi-core CPU and process a single domain on it; the remaining domains are sent to CPUs on other nodes. This gives very good performance but leaves a lot of hardware underutilized (not recommended except, eg, for benchmarking)
Parallelisation in ECLIPSE 100

• Black-oil formulation:
– Ca. 70% of the CPU time is spent in the linear solver
• The model is divided into 1D slabs
• The division is in the horizontal plane
– In the vertical direction, the strong coupling is preserved
• The slabs are in full communication during the solution of the linear equations (a special version of the nested-factorization solver is used!)
– The work per iteration increases a little (by a factor of ~1.3); however, the number of iterations during parallel processing should be the same as for serial runs.
Parallelisation in ECLIPSE 100

• The number of cells allocated to each domain can be modified using the DOMAINS keyword in the GRID section.
• Each domain is processed by an individual processor.
• Using DOMAINS in RPTGRID outputs the partitioning of the reservoir to the PRT file.
• Useful tip: Check the elapsed runtimes at the end of a job (PRT file) to see how long it took for each domain to be processed. If the times are very different, change the number of cells in the domains to get a better load balance.
Parallelisation in ECLIPSE 100

Summary of Keywords:
GRID Section
• SOLVDIRS: Solver principal directions.
– Don't change except in very rare cases!
• DOMAINS: Defines the partitioning of the grid.
• IHOST: Allocates LGRs to specific processes. Each LGR is sent to a different CPU (task farming).
• PARAOPTS: Permits some fine-tuning of the parallel processing parameters.
– Use with care!
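As a hypothetical illustration of how these keywords might appear in a deck (the item lists below are guessed from the descriptions above, not taken from a verified data set — check the ECLIPSE Reference Manual for the exact syntax):

```
-- Hypothetical GRID-section fragment; verify syntax against the
-- ECLIPSE Reference Manual before use.
DOMAINS
-- cells allocated to each of 4 domains
 30000 25000 25000 30000 /

RPTGRID
-- write the domain partitioning to the PRT file
 DOMAINS /
```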
Parallelisation in ECLIPSE 300
• The solver is less dominant
– Only ca. 30–40% of total CPU time
• ECLIPSE 300 allows a 2-dimensional decomposition and requires no extra work per linear iteration, at the cost of breaking the coupling across the reservoir at each linear iteration.
– Therefore the number of linear iterations increases with the number of domains.
• However, the reservoir is fully coupled at the level of each Newton iteration.
Parallelisation in ECLIPSE 300
Summary of Keywords:
RUNSPEC Section

• NPROCX: Number of processors to be used in the x-direction.
• NPROCY: Number of processors to be used in the y-direction.
• PARALLEL: Initializes the Parallel option.
• PSPLITX: Specifies the domain decomposition in the x-direction.
• PSPLITY: Specifies the domain decomposition in the y-direction.
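A hypothetical fragment for a 2x2 decomposition on 4 processors (illustrative only — the record contents are assumed from the keyword descriptions above; verify against the ECLIPSE Reference Manual):

```
-- Hypothetical fragment: 4 processors, 2 in x by 2 in y.
PARALLEL
 4 /

NPROCX
 2 /

NPROCY
 2 /
```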
Parallelisation in ECLIPSE 300
Best Practice:
• Use RPTSOL in the SOLUTION section with the mnemonic PART:
– Outputs all possible domain partitions for the Parallel option, ie the PSPLITX and PSPLITY keywords.
– Check/optimize the (static) load balance!
• Run a NOSIM case and copy the values from the PRT file into the data set.
Parallelisation in ECLIPSE 300
• Example:

PSPLITX
 40 60 80 120 /

Process   Cells
1         1 to 40
2         41 to 60
3         61 to 80
4         81 to 120

Active cell load balance % =
 (total active cells in reservoir) / (largest number of active cells in any partition x number of procs) x 100

• If the active cell load balance is 75% for a 4-processor job, then the maximum parallel speed-up that can be achieved is 3 (ignoring any super-linear speed-up that may be achieved by cache effects).
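The load-balance formula can be sketched as a quick check (an illustrative helper, not part of ECLIPSE; in practice the partition sizes come from the PRT file):

```python
# Sketch of the active-cell load-balance formula (illustrative).

def load_balance(partition_sizes):
    """Active cell load balance in percent."""
    total = sum(partition_sizes)
    worst = max(partition_sizes)
    return 100.0 * total / (worst * len(partition_sizes))

def max_speedup(partition_sizes):
    """Upper bound on parallel speed-up, ignoring cache effects."""
    return len(partition_sizes) * load_balance(partition_sizes) / 100.0

# PSPLITX 40 60 80 120 / gives partitions of 40, 20, 20, 40 cells
sizes = [40, 20, 20, 40]
print(load_balance(sizes))  # 75.0
print(max_speedup(sizes))   # 3.0
```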
Results Reproducibility
Serial vs. Parallel runs

Non-ideal Speedup due to Parallelisation Overhead

[Figure: total runtime vs problem size, split into latency, communication time, and work]

• Latency is a hardware constant related to the interconnect and does not depend on the # of CPUs.
• Communication overhead increases with the # of CPUs.
• Only the work (eg the solution of the equations) can be reduced if the # of CPUs is increased.
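These three components can be combined into a toy runtime model. It is purely illustrative — the constants below are invented for the sketch, not measured ECLIPSE behaviour:

```python
# Toy parallel-runtime model: constant latency, communication
# overhead that grows with the CPU count, and work that shrinks
# as 1/n. All constants are invented for illustration.

def runtime(n_cpus, problem_size,
            latency=5.0, comm_per_cpu=0.5, work_per_cell=1e-3):
    comms = comm_per_cpu * n_cpus      # grows with # of CPUs
    work = work_per_cell * problem_size / n_cpus  # shrinks with # of CPUs
    return latency + comms + work

def speedup(n_cpus, problem_size):
    return runtime(1, problem_size) / runtime(n_cpus, problem_size)
```

With these made-up constants, a 1,000,000-cell model gains far more from 8 CPUs than a 10,000-cell model does, and the speed-up always stays below the ideal factor of 8 — the qualitative behaviour the figure shows.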
Typical Speed-up

[Figure: speed-up vs number of processors (0–32) for a 100K-cell model; the actual speed-up curve falls increasingly below the ideal linear speed-up]
Memory Scalability
Illustration! Not indicative of real memory scaling!

[Figure: memory (MB) vs number of CPUs (1–8), increasing slightly with the CPU count]

• Memory requirements are not independent of the number of CPUs
• Memory demand increases slightly with the number of CPUs
• Reasons:
– Each process has to hold its own copy of, eg, well data, rel. perm. tables, or PVT data
– Useful tip: avoid over-dimensioning this data in the RUNSPEC section!
Load Imbalance

[Figure: per-node activity over time; the eight nodes finish their work at different times]

Reasons:
• Different types of CPUs in one machine
• Different work loads on different CPUs due to
– Different # of cells in each domain
– 1-phase aquifer domains vs 3-phase coning domains
– E300 flash: single- and multiphase cells
• Other users/processes if no queuing system is used
• (Network traffic)

• The overall model runtime is determined by the slowest domain!
Useful Tips
Minimum # of cells per node
• Black oil: 80k–100k
• Compositional: 20k–50k
– Depends on the number of components vs model size
• Thermal: 10k–30k
– Thermal models use very small time steps (ie lots of message passing)
– Real-world performance is highly dependent on the model physics (not so much on model size): steam flooding, in-situ combustion, SAGD, THAI, etc
Useful Tips
Numerical Stuff

[Figure: residual vs time, with the engineering accuracy e, the algorithmic accuracy, and the machine accuracy marked]

• Numerical errors can increase during parallel runs when many CPUs are used (due to the finite precision of computer hardware). Usually these errors are way below the engineering accuracy, e; however, sometimes they accumulate and result in additional linear/non-linear iterations.

TUNINGDP (use defaults)
• Use this keyword if the parallel run shows different convergence behavior to the serial run.
• It tightens several numerical tolerances and may help performance.
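Activating the keyword with its defaults is as simple as the following illustrative fragment (verify placement and syntax in the ECLIPSE Reference Manual):

```
-- Use TUNINGDP with its default tolerances (empty record).
TUNINGDP
/
```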
Useful Tips
FrontSim Multi-threaded
• FrontSim can run "in parallel": it uses multi-threading, not MPI-based message passing (like ECLIPSE)
• Works for shared-memory machines only
– eg one workstation or one compute node with multiple CPUs/cores
– Enabled by the THREADFS keyword
• Separate licenses are necessary (the "Parallel Multicore" feature)!
• FrontSim with multi-threading is not supported with any remote queuing system
– It can only be used on the local machine
Miscellaneous
Simulation output cleanup
• Remote jobs: All input and output files are transferred to a remote temporary directory, which is deleted at the end of the run (after the result files have been copied back to the local directory).
– There is an option not to delete the directory
• Local jobs: Files are kept; if a new run uses the same file names, the old files are overwritten.
Working with Compute Clusters

Queuing System
A queuing system (eg LSF from Platform Computing or Microsoft's queuing system) distributes serial and parallel runs on a compute cluster such that only one process runs on each compute core.
• If more than one process runs on a compute core, the performance of the simulation run degrades dramatically (ie the runtime goes up).

A queuing system can also be used to distribute jobs according to memory requirements, maximum runtime, a time schedule (eg only use certain computers at night but not during the day), etc.
Simulation Job Submission: Simulation Launcher
• Useful when only a simulation deck is available (ie no Petrel model)
• Installed on a frontend workstation that is connected to a cluster.
• The Launcher can submit jobs to a remote queue (using eclrun).
• If EnginFrame is installed on the cluster, continuous runtime monitoring is possible; otherwise, manual re-loading of the PRT file can be done (Linux only!).
• The Launcher is also used to launch other (interactive) simulation applications.
Petrel Simulation Job Submission (to Linux)

[Screenshot: Petrel job submission dialog with runtime monitoring]
Petrel Simulation Job Submission (to Windows)

Windows HPC works with “job templates”
rather than queues!
However, Petrel maps these templates
to (pseudo) queues for a consistent
user experience.

Windows HPC Cluster Manager
Work Load Monitoring – Only available to Cluster Administrators

Windows HPC Cluster Manager
Job Management – Available to all users
Flexible Licensing
Flexible Licensing
• License-aware scheduling
• Multiple Realization licensing (MR)
• Pay-per-use licensing (PPU)
Flexible Licensing
License-aware Scheduling (Windows)
• License-aware scheduling works with LSF or the MS Scheduler
• Use the LICENSES keyword in the ECLIPSE deck to reserve licenses for a simulation run.
• Set ECL_LSF_LICCHECK to TRUE in the ECLIPSE macros
• Also requires the 2010.2 eclrun macros and Windows Server 2008 R2 HPC
• The ECLIPSE job is put in a queue if not all licenses are available at the beginning of a run (rather than stopping with an error!).
Flexible Software Licensing
Multiple Realization Licensing
Modern workflows require a large number of simulation runs:
• Assisted history matching
• Uncertainty workflows
• What-if, scenario-based forecasting
Common to all of these workflows is that N different realizations are created from one base model.
In these cases, MR licenses can be used instead of having N full sets of ECLIPSE licenses:
• 1 MR license costs 1/3 of an ECLIPSE license, and it includes all ECLIPSE options!
• The logic is currently implemented in PETREL, Cougar (IFP), EnABLE (Roxar), and MEPO (SPT Group)
Flexible Software Licensing
MR Licensing Examples
Price of 1 MR license = 1/3 of an ECLIPSE Blackoil license!

20 concurrent black-oil runs with LGR:
• Without MR: 1 MEPO + 20 ECLIPSE Blackoil + 20 LGR
• With MR: 1 MEPO + 1 ECLIPSE Blackoil + 1 LGR + 20 ECLIPSE MR

20 concurrent compositional runs with LGR:
• Without MR: 1 MEPO + 20 ECLIPSE Blackoil + 20 LGR + 20 ECLIPSE Compositional
• With MR: 1 MEPO + 1 ECLIPSE Blackoil + 1 LGR + 1 ECLIPSE Compositional + 20 ECLIPSE MR

20 concurrent 8-way parallel black-oil runs with LGR:
• Without MR: 1 MEPO + 20 ECLIPSE Blackoil + 20 LGR + 8*20 = 160 ECLIPSE Parallel
• With MR: 1 MEPO + 1 ECLIPSE Blackoil + 1 LGR + 8*20 = 160 ECLIPSE Parallel + 20 ECLIPSE MR
Flexible Software Licensing
MR Licensing Pricing Examples

[Figure: MR vs non-MR costs for 20 concurrent simulation runs, for three model types: ECLIPSE Blackoil serial runs, ECLIPSE Blackoil parallel runs (8-way), and ECLIPSE Compositional (LGR, MWS, Res. Opt) serial runs]

• Save 66% for the "basic" model
• Save 36% for the ECLIPSE Parallel case
– Parallel is a "performance option", not an MR option!
• Save 84% for the "complex" model
Flexible Software Licensing
Pay-Per-Use Licensing

Client needs:
• Accommodate periods of peak SIS application usage
• Eliminate application access denials
• Access infrequently-used applications or modules
• Track application usage for internal cost allocation

The SIS Pay Per Use Program:
• Optimizes application usage fees
• Enables totally flexible access to selected SIS applications as and when needed
• Monitors application usage
• Improves the client's internal cost allocation
• Observes and manages current billings for software usage
Flexible Software Licensing
Pay Per Use Program – How it works
• SIS applications, licenses, and the Pay Per Use logging software are installed at the client site
• Users access licenses as needed, for as long as needed
• Secure usage logs are sent to Schlumberger Program Administration nightly
• Schlumberger generates billing information for the client
Flexible Software Licensing
Pay Per Use (PPU)
• Utilizes a tracking agent on the License Server
• Start time is captured on license checkout
• Stop time is captured when the license is checked in
• Usage time is summed on a monthly basis
• Invoiced at an hourly rate for the time used
Apache’s Windows HPC System

What’s in it!
Apache Simulation Hardware
1x blade server (HP BL460c blades in a BLc7000 enclosure)
• 10 compute nodes, 1x head node
– Per node:
• 2x Intel Xeon 5670 CPUs
– 6 cores/CPU @ 2.93GHz, 24GB RAM (DDR3, 1333MHz)
• 3 cores per CPU are switched off for performance reasons!
• Operating system: Microsoft Windows Server 2008 R2 HPC Edition
– 60 compute cores in total
• Disk space: ~25TB (usable) of NetApp storage attached to the blade server
– Not currently installed
• Interconnect: Gigabit Ethernet, InfiniBand 4x QDR
Apache Queuing System
The cluster is hard-divided into two groups of blades:
• 2 blades (ie 12 cores in total) are reserved for serial runs
• 8 blades (ie 48 cores in total) are reserved for parallel runs

Queues (set up as HPC templates and provided as "pseudo queues" in Petrel):
• SerialWork, ParallelWork: all users have access to them
• MR-SerialWork, MR-ParallelWork: only certain (power) users have access
– Runs have lower priority and are not allowed to use the whole cluster
• Priority-Serial, Priority-Parallel: high-priority runs (certain users only)
• (External-Serial, External-Parallel: lowest priority; for users from other offices)

Note: The ECLIPSE Launcher cannot save the queue commands! This will be fixed in 2011.1.
Apache Simulator Licenses
Installed simulator licenses:
Purchased ECLIPSE licenses:
• 4x Blackoil, 1x Solvent, 2x LGR, 1x Compositional, 1x Multisegment Wells, 1x Open ECLIPSE
ECLIPSE Parallel licenses:
• 24x purchased ECLIPSE Parallel licenses (on 3 blades)
• 56x ECLIPSE Parallel licenses provided on a "pay-as-you-use" basis (on 7 blades)

ECLIPSE options available only as PPU:
• 3 licenses each for
– Coalbed Methane, Reservoir Coupling, Open ECLIPSE, Multisegment Wells, LGRs, Gas Field Operations, Networks, EOR Foam/Surfactant/Solvent/Polymer, Blackoil, Compositional, FrontSim
• 10x Multiple Realization
Additional Software Requirements
• Windows HPC Client Utilities must be installed on the workstations/servers where Petrel & ECLIPSE reside
• Windows XP workstations also require the installation of Service Pack 3 and Microsoft PowerShell
Apache "Queue" Settings
To submit jobs to the Windows cluster from Petrel 2010.1, set the following under "Tools, System settings, Queue definition":

Name: Cluster
Server: localhost
Remote Queue: houexap40
Options: --queueparameters="/jobtemplate:ParallelWork"
(or another appropriate template name)
Apache "Queue" Settings
To submit jobs to the Windows cluster from the Simulation Launcher, set the following under "Configuration, Settings, Queues":

Name: Cluster
Server: localhost
Remote Queue: houexap40
(Unlike on Linux, you will have to enter the Remote Queue name manually)

Then set the following under "Summary, Advanced":
Edit Command Line Options: select "Enable" and add:
--queueparameters="/jobtemplate:ParallelWork"
(or another appropriate template name)
Biggest Simulation Models

Biggest simulation models run on a daily basis
Real models:
Russia:
• 3-phase black-oil, 3.5mio/1.6mio cells, over 14,000 wells(!), 30 years of history
• 76mio/6mio cells, black-oil
Middle East:
• 7-component compositional, 3.5mio/1.2mio cells
– Gas re-injection for EOR (complex physics), very heterogeneous geology
Canada:
• 7mio-cell thermal model (6 SAGD pairs on the geological model)

Synthetic models:
• FrontSim: 56mio cells, synthetic 2-phase oil-water model
• ECLIPSE BO: synthetic 3-phase model, 160mio cells on 992 cores, 4TB RAM
