This paper was prepared for presentation at the EAGE Annual Conference & Exhibition incorporating SPE Europec held in Copenhagen, Denmark, 4–7 June 2012.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract

High resolution reservoir modeling is necessary to analyze complex flow phenomena in reservoirs. As more powerful computing platforms become available, reservoir simulation engineers are building larger and higher resolution reservoir models to study giant fields. A large number of simulations is necessary to validate a model and to lower uncertainty in prediction results. It is a challenge to accurately model complex processes in the reservoir and also to efficiently solve high resolution giant reservoir models on rapidly evolving hardware and software platforms. There are many challenges to getting high performance in reservoir simulations on constantly evolving computing platforms, and there are constraints which limit the performance of large scale reservoir simulation. In this study, we review some of these constraints and show their effects on practical reservoir simulation. We review emerging computing platforms and highlight opportunities and challenges for reservoir simulation on those platforms. It is anticipated that management of data locality by the simulator will become very important on emerging computing platforms, and locality will need to be managed to achieve good performance. Heterogeneity in the computing platform will make it difficult to get good performance without adoption of a hybrid parallelization style in the simulator. In this study, we analyze many benchmark results to illustrate challenges in high performance computations of reservoir simulation on current and emerging computing platforms.

1. Introduction

Reservoir simulation is an important tool to gain insight into flow processes in the reservoir. Coats [1] gave a brief history of early reservoir simulation and discussed numerical errors in such computations. Seismic interpretation, core data and well logs are used to create geological or static models. Simulation models are built by up-scaling a static geological model and then integrating that model with historical well data. Because the up-scaling process introduces approximation errors, efforts are being made to build simulation models that avoid up-scaling [2]. These high resolution models provide detailed solutions, but pose a challenge for practical simulation. Modeling fluid flow in a well has become complex with the introduction of Maximum Reservoir Contact (MRC) wells with equalizers and other down-hole equipment. Unstructured gridding is needed to accurately model features in the reservoir, and it adds considerable complexity to the simulation. Fig. 1a shows details of a reservoir model which are needed to describe the reservoir and modern wells accurately. As one can easily realize, it is not an easy task to model and solve problems which have such complexities. There have been considerable advances in techniques for reservoir simulation, and many large high resolution models have been built to study reservoirs; however, many challenges still remain to be overcome. Branets et al. [3] gave an overview of modeling techniques for complex geometries and heterogeneous properties of the reservoir. One of the key challenges in reservoir modeling is accurate representation of reservoir geometry, including the structural framework. The structural framework defines major sections of the reservoir.
2. Challenges in HPC

Simulation engineers require rapid turnaround of hundreds of simulation jobs which can forecast reservoir performance. HPC is indispensable to address such needs. There are many hurdles to simulating very large models. The computational load to solve a reservoir model increases nonlinearly with the model size, and there is also a need for a large amount of fast memory on a processor to accommodate the data for a large simulation model. Complex algorithms are needed to manage data efficiently and to distribute the load evenly among processors. If all array variables in the simulation are not distributed across processors, the size of memory on a node will put a constraint on the maximum size of the model that can be simulated on the system. For example, a one billion element array (using double precision) requires 7.5 gigabytes of memory for storage. One can distribute arrays among processors instead of maintaining a global array. This approach has communication overhead, i.e., it increases data movement. As we will discuss later, the cost of data movement is a critical issue on emerging architectures. Therefore, an algorithm and its implementation on the underlying architecture need to be designed carefully to extract good performance from increasingly complex computing platforms. Saudi Aramco uses an intensive collaborative environment [9] in reservoir simulation studies to address uncertainty in reservoir characterizations and shorten decision making cycle time. The intense nature of these studies creates very high demand for computational resources. Such demand can only be met by ensuring high availability of HPC resources. To provide a cost effective platform for HPC in reservoir simulation, Saudi Aramco currently uses PC-Clusters [10] for reservoir simulations.

With the advent of modern computers, simulation has become a powerful tool along with theory and experiments. Numerical simulation may provide cost effective answers to many problems of interest to reservoir development and management engineers which are extremely difficult to obtain by theoretical analysis or extremely costly to obtain by running a series of experiments. Many physical problems, including processes inside the reservoir, are very complex. Analysis of such problems often requires fast solution of large systems of equations. There are many challenges to obtaining solutions of large models efficiently. We discuss some of the important issues in the next section.

2.1 Performance on a Single Processor

The processing element in the CPU needs to access data quickly to maintain high computational speed, which is commonly measured in floating point operations per second (flops). The processing element has access to very high speed memory, known as cache. There is a hierarchy of cache memory on a CPU with varying access times, or latencies. Performance of the CPU degrades if needed data are not available in a cache level nearer to the processor. Cache misses can occur because of limited cache capacity, conflicts in data access, etc. Many models have been built to analyze the performance of computing architectures. They are based on techniques such as statistical analysis, bottleneck analysis, etc. In this study, our focus is to examine HPC of reservoir simulation on current and emerging computing platforms.
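As a quick check of the memory estimate quoted above, the sketch below works out the storage needed for a one-billion-entry double precision array and the per-node share when such arrays are distributed rather than replicated. The node count and number of per-cell arrays are illustrative assumptions, not figures from this paper.

```python
# Back-of-the-envelope memory estimate for distributed simulation arrays.
# The node count and number of per-cell arrays below are illustrative only.
N_CELLS = 1_000_000_000      # entries in one global (per-cell) array
BYTES_PER_DOUBLE = 8

array_gib = N_CELLS * BYTES_PER_DOUBLE / 2**30
print(f"one global double precision array: {array_gib:.1f} GiB")   # ~7.5 GiB, as in the text

n_nodes = 512                # assumed cluster size
n_arrays = 50                # assumed number of per-cell arrays in a simulator
per_node_gib = n_arrays * array_gib / n_nodes
print(f"{n_arrays} arrays distributed over {n_nodes} nodes: {per_node_gib:.2f} GiB per node")
```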
The Roofline model combines peak floating point performance, operational intensity and memory performance in a two dimensional graph. The peak floating point performance can be found in the hardware specification or may also be measured by appropriate benchmarks. Similarly, the memory performance can also be determined by benchmarks. Fig. 3a shows a typical roofline model for a computer, where the x-axis is the operational intensity (Flops/DRAM byte accessed), varying from 0.125 to 256, and the y-axis is the attainable performance in GFlop/s.

[Fig. 3a: (a) model-1. Roofline plot with x-axis Operational Intensity (Flops/Byte) from 1/8 to 256 and y-axis Attainable GFlop/s, with kernel A and kernel B marked.]
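A minimal sketch of the Roofline relation described above: attainable performance is the smaller of the peak floating point rate and the product of peak memory bandwidth and operational intensity (Williams et al. [11]). The peak values below are illustrative placeholders, not measurements of any particular machine or of the kernels in Fig. 3a.

```python
# Roofline model: attainable GFlop/s at a given operational intensity.
def roofline_gflops(intensity, peak_gflops=2048.0, peak_bw_gb_s=512.0):
    """intensity is in Flops per DRAM byte accessed; peak values are illustrative."""
    return min(peak_gflops, peak_bw_gb_s * intensity)

# Sweep the intensity range shown in Fig. 3a (0.125 to 256 Flops/Byte).
for intensity in (0.125, 0.5, 2.0, 8.0, 64.0, 256.0):
    print(f"I = {intensity:7.3f} Flops/Byte -> {roofline_gflops(intensity):7.1f} GFlop/s")
```

With these placeholder peaks, a kernel below about 4 Flops/Byte would be memory bandwidth bound, and a kernel above it would be compute bound.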
To obtain an efficient static load balance (domain decomposition) for a model, we developed an empirical relation based on run statistics and some simulation parameters. This approach has been very useful in improving the overall performance of simulations and the utilization of our clusters. Load balance can be difficult to achieve if the simulation integrates multiple modules with disparate levels of performance (for example, the coupled facility and reservoir simulation study reported by Hayder et al. [17] couples a highly efficient reservoir simulator with a relatively much slower facility simulator; the reservoir simulator was idle most of the time because the facility simulation was slow. See Ref. [17] for details of the implementation).

The improvement of simulation performance on successive computing platforms in Saudi Aramco over recent years [7] is shown in Fig. 8a. The scalability (shown in Fig. 8b) remained almost the same. One may observe super scaling in large scale reservoir simulation (see, for example, Refs. [22], [23]) because of a memory effect (as the domain size decreases, data are stored in cache/memory closer to the processor, which has lower latency, resulting in more than linear/ideal speedup).
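The empirical load-balance relation mentioned above is not reproduced here. As a generic illustration only, the sketch below uses per-domain cell counts as a simple proxy for computational work and reports how far a static decomposition is from a perfect split; the cell counts are made up.

```python
# Generic static load-balance check: max-to-mean cost ratio of a decomposition.
# Cell counts serve as a simple proxy for per-domain work; 1.0 is ideal.
def load_imbalance(cells_per_domain):
    mean_cost = sum(cells_per_domain) / len(cells_per_domain)
    return max(cells_per_domain) / mean_cost

print(load_imbalance([260_000, 250_000, 245_000, 255_000]))  # ~1.03: well balanced
print(load_imbalance([400_000, 200_000, 200_000, 210_000]))  # ~1.58: slowest domain dominates
```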
Flux communication depends on the surface areas of neighboring domains. Therefore, such communication can be reduced by reducing the surface area of a domain relative to its computational task (which is related to the number of grid cells in the domain); the decomposition strategy should keep the surface to volume ratio of the domains low. Unfortunately, communications related to wells are irregular in nature and can far exceed the communication needed to calculate fluxes. The amount of well communication becomes significant in a high resolution model with a large number of wells (the number of well perforations, i.e., connections to the reservoir, increases as grid blocks become smaller; if a grid block with a well perforation is subdivided into two blocks, the number of well completions in the new model for that grid block may double to provide connections to two grid blocks instead of one in the original case). Habiballah and Hayder ([22], [23]) studied communication algorithms in the POWERS simulator to reduce overall communication overhead. As their results in Fig. 12a show, the well communication overhead of a 200,000 cell model with 400 wells is much higher than the flux communication overhead. This was even after implementation of an optimization algorithm to reduce irregular communication (Fig. 12b). As Killough [16] pointed out, well related computation is a challenge to HPC of reservoir models.

Communication overhead can also be reduced by changing MPI calls based on message sizes. Table 2 shows simulation timings for two reservoir models before and after this optimization.

Table 2: MPI collective operations timings

Model   Message Size (MB)   Orig. Time (min.)   New Time (min.)   Overhead Reduction
A       360                 74                  47                36.5%
B       90                  1309                1115              15.0%

3.5 Software Library

The simulation time of a reservoir model depends on both the hardware and the software library used during the simulation. As expected, hardware has a big impact on the simulation time. The same is also true for software libraries. In Fig. 13, we show normalized simulation times of a 270x410x27 model running on 192 cores with three different MPI libraries: Intel MPI (uses the MPI-2 standard), MVAPICH1 (uses the MPI-1 standard) and MVAPICH2 (uses the MPI-2 standard) [29]. This model has about 102,000 well completions (i.e., connections with reservoir grid blocks). Fig. 13 shows that Intel MPI and MVAPICH2 (both using the MPI-2 standard) have similar performance, while higher communication overhead causes the simulation using MVAPICH1 to be nearly 35% slower. The strength of a particular communication (e.g., MPI) library can be observed in models with high levels of communication. Numerical libraries may also be used in the simulator to achieve high performance during simulation.
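The message-size-aware choice of MPI calls mentioned before Table 2 can be sketched generically. The example below, written with mpi4py and a made-up 64 MB threshold, only illustrates the idea of switching call patterns by message size; it is not the optimization used in POWERS or behind the timings in Table 2.

```python
# Illustrative message-size-aware all-reduce: small buffers use one collective
# call, large buffers are reduced in fixed-size chunks. The threshold is arbitrary.
import numpy as np
from mpi4py import MPI

def adaptive_allreduce(comm, data, chunk_mb=64):
    """Sum a 1-D contiguous float64 array across all ranks."""
    result = np.empty_like(data)
    if data.nbytes <= chunk_mb * 2**20:
        comm.Allreduce(data, result, op=MPI.SUM)              # single collective
    else:
        step = int(chunk_mb * 2**20) // data.itemsize         # elements per chunk
        for start in range(0, data.size, step):
            stop = min(start + step, data.size)
            comm.Allreduce(data[start:stop], result[start:stop], op=MPI.SUM)
    return result

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    local = np.full(1_000_000, float(comm.Get_rank()))
    total = adaptive_allreduce(comm, local)   # every rank receives the same summed array
```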
4. Emerging Trends and Challenges
Careful use of memory will be needed when a large problem is solved. One technique which one may use is mixed precision in computation, i.e., judiciously storing some variables in single precision instead of double precision without compromising the accuracy of the solution. This should improve the performance of memory bound algorithms, because the bandwidth requirement will decrease with the use of single precision instead of double precision. An example of such computations can be found in Ref. [34]. Some grid blocks in a particular model may not be active in the simulation because some of their limiting physical values, for example cut-off porosity, are below a threshold value. Most of the computations associated with these grid blocks may be eliminated to reduce both memory and computational requirements during simulation. Recently, an international effort has been underway to build an integrated collection of software known as the extreme-scale/exascale software stack, or X-Stack [37]. Many HPC issues discussed in Ref. [37] should also be applicable to HPC of reservoir simulation on emerging computing platforms.

4.3 System Reliability

System reliability is an important issue in simulation, and several studies have examined failure rates in large HPC platforms. Schroeder and Gibson [38] collected and analyzed failure rate statistics of node outages and storage failures from a few large HPC systems. In terms of complete node outages, they identified that hardware was the single largest component responsible for these outages, with more than 50% of failures assigned to this category, while software was the second largest category with 20%. The remaining portion was related to reasons such as human error, environment, network outages, etc. They observed that the number of failures per year per processor varied between 0.1 and 0.65 on those systems. With 0.65 failures per year per processor, a large system they studied had close to 1100 failures per year. A typical application running on that system would be interrupted and forced into recovery more than two times a day with that high level of failure rate. In terms of storage and hard drive failures, Schroeder and Gibson [38] found that the average annual failure and replacement rate for hard drives was up to 5%. This means that in a cluster of 512 nodes, the average failure rate for hard drives is around 1-2 drives every two weeks, which is consistent with our observations of the HPC systems in Saudi Aramco. Since the failure rate of a system grows with the number of processor chips in the system, failure rates in future systems will likely increase ([37], [38]); as a result, a significant portion of system resources will not be available for applications to do computation [38]. Utilities should be used to identify any system problems and subsequently take preventive measures during resource allocation for a simulation job [39]. At present, we observe a processor failure rate on the Saudi Aramco HPC computing platform close to 0.1 per year per processor (or 0.2 per year per node), which translates into failure of about 1% of simulation jobs. As massive reservoir models are built, there will be a need to use more nodes for a simulation and, in addition, the simulation will take a longer time to finish. If one uses four times more nodes and the simulation takes three times as long, which is not an unreasonable estimate for larger jobs, then more than 10% of those jobs will likely fail with the current level of hardware reliability. It is important to have resiliency in the software [37] for recognition of and adaptation to errors, to mitigate the hardware reliability issue. It will be important to be able to restart computations from the output of a failed job.

5. Conclusions

It is now possible to build very large high resolution reservoir models to accurately simulate reservoir processes. We expect continued improvements in hardware and software technologies in coming years, which will make it possible to simulate even larger models. It is a challenge to adopt rapidly evolving hardware and software platforms and ensure high efficiency in reservoir simulations to solve high resolution giant reservoir models. This will remain a challenge as demand for more accurate simulation results grows. High speed networks are essential for simulation of big models on a large number of processors. A simulation grid approach may be used to solve very large problems, at least for some models on a limited basis. Communication overhead is likely to limit its usage on a routine basis, unless clever latency tolerant algorithms are used.

It is expected that high end systems with 100 million to a billion cores will be built in the near future. It will be difficult to efficiently use systems with such large numbers of cores. The level of power consumption, mainly coming from data movement, will be a big concern. The cost of data movement will be a critical factor to consider in designing and implementing algorithms on emerging systems. It is expected that there will be a need to use a hybrid programming model, with MPI likely to be a part of it, at least for the near future. Improvement of algorithms for computations on emerging computing platforms will advance opportunities for HPC in reservoir simulation.

Acknowledgements

The authors would like to thank the Saudi Arabian Oil Company's Management for permission to publish this paper. We thank our colleague Raed Al-Shaikh for his encouragement to examine the simulation grid concept for reservoir studies on HPC platforms at Saudi Aramco. We also thank our colleagues Raed Al-Shaikh, Gordon Tobert and Tofig Dhubaib for reviewing the paper and making many helpful suggestions.

References

[1] Coats, K. H., "Reservoir Simulation: State of the Art", SPE 10020, JPT, Aug 1982, pp. 1633-1642.
[2] Dogru, A. H., "Giga-Cell Simulation", Saudi Aramco Journal of Technology, Spring 2011, pp. 2-8.
[3] Branets, L. V., Ghai, S. S., Lyons, S. L. and Wu, X., 2009. "Challenges and Technologies in Reservoir Modeling", Communications in Computational Physics, 6(1), pp. 1-23.
[4] Sunaidi, H. A., 1998. "Advanced Reservoir Simulation Technology for Effective Management of Saudi Arabian Oil Fields", Presented at the 17th Congress of the World Energy Council, Sept. 1998, Houston, TX.
[5] Dogru, A. H., Li, K. G., Sunaidi, H. A., et al., 2002. "A Parallel Reservoir Simulator for Large Scale Reservoir Simulation", SPE Reservoir Evaluation & Engineering Journal, 6(1), pp. 11-23.
[6] Pavlas, E. J., 2002. "Fine-Scale Simulation of Complex Water Encroachment in a Large Carbonate Reservoir in Saudi Arabia", SPE 79718, SPE Reservoir Evaluation & Engineering, Oct 2002, pp. 346-354.
[7] Baddourah, M., Hayder, M. E., Habiballah, W., et al., 2011. "Application of High Performance Computing in Modeling Giant Fields of Saudi Arabia", SPE 149132, Presented at the 2011 SPE Saudi Arabia Section Technical Symposium and Exhibition, Al-Khobar, Saudi Arabia.
[8] Keyes, D. E., 2011. "Algorithms for Extreme Simulation in Science and Engineering", Presented at the 2011 Saudi Arabian HPC Symposium, Al-Khobar, Saudi Arabia, December 6-7, 2011.
[9] Elrafie, E., White, J. P. and Al-Awami, F. H., 2009. "The Event Solution: A New Approach for Fully Integrated Studies Covering Uncertainty Analysis and Risk Assessment", Saudi Aramco Journal of Technology, Spring 2009, pp. 53-62.
[10] Huwaidi, M. H., Tyraskis, P. T., Khan, M. S., et al., 2003. "PC-Clustering at Saudi Aramco: from Concept to Reality", Saudi Aramco Journal of Technology, Spring 2003, pp. 32-42.
[11] Williams, S., Waterman, A. and Patterson, D., 2009. "Roofline: An Insightful Visual Performance Model for Multicore Architectures", Communications of the ACM, 52(4), pp. 65-76.
[12] Mora, J., 2011. "Current R&D hands on work for pre-Exascale HPC systems", Presented at the 2011 Saudi Arabian HPC Symposium, Al-Khobar, Saudi Arabia, December 6-7, 2011.
[13] Amdahl, G. M., 1967. "Validity of the single-processor approach to achieving large scale computing capabilities", Proc. Am. Federation of Information Processing Societies Conf., AFIPS Press, pp. 483-485.
[14] Gustafson, J. L., 1988. "Reevaluating Amdahl's Law", Communications of the ACM, 31(5), pp. 532-533.
[15] Hill, M. and Marty, M., 2008. "Amdahl's Law in the Multicore Era", Computer, 41(7), pp. 33-38.
[16] Killough, J. E., 1993. "Will Parallel Computing Ever be Practical", SPE 25556, Presented at the Middle East Oil Show of SPE, Manama, Bahrain.
[17] Hayder, M. E., Munoz, A. and Al-Shammari, A., 2011. "Facilities Planning Using Coupled Surface and Reservoir Simulation Models", Saudi Aramco Journal of Technology, Fall 2011, pp. 66-71.
[18] Gropp, W. D., Kaushik, D. K., Keyes, D. E. and Smith, B. F., 2001. "Latency, bandwidth, and concurrent issue limitations in high-performance CFD", Computational Fluid and Solid Mechanics, pp. 839-842, Elsevier Science Ltd.
[19] Gropp, W. D., Kaushik, D. K., Keyes, D. E. and Smith, B. F., 2001. "High-Performance parallel implicit CFD", Parallel Computing, 27, pp. 337-362.
[20] Jayasimha, D. N., Hayder, M. E. and Pillay, S. K., 1997. "An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations", Journal of Supercomputing, 11, pp. 41-60.
[21] Sindi, M., Baddourah, M. and Hayder, M. E., 2011. "Studies of Massive Fields Modeling Using High Performance Computing in the Oil Industry", Saudi Aramco Internal Report, Unpublished.
[22] Habiballah, W. A. and Hayder, M. E., 2003. "Large Scale Parallel Reservoir Simulations on a Linux PC-Cluster", Proceedings of the ClusterWorld Conference and Expo: The HPC Revolution (4th LCI International Conference on Linux Clusters), San Jose, CA.
[23] Habiballah, W. A. and Hayder, M. E., 2004. "Parallel Reservoir Simulation Utilizing PC-Clusters", Saudi Aramco Journal of Technology, Spring 2004, pp. 18-30.
[24] Brazell, O., Medssenger, S., Abusalbi, N. and Fjerstad, P., 2010. "Multi-core evaluation and performance analysis of the ECLIPSE and INTERSECT reservoir simulation codes", Presented at the 2010 Oil and Gas High Performance Computing Workshop, Rice University, Houston, TX.
[25] Bova, S. W., Breshears, C. P., Cuicchi, C. E., et al., 2000. "Dual-level parallel analysis of harbor wave response using MPI and OpenMP", Int. J. High Performance Comput. Appl., 14, pp. 49-64.
[26] Hayder, M. E., Keyes, D. E. and Mehrotra, P., 1997. "A Comparison of PETSc-Library and HPF Implementations of an Archetypal PDE Computation", Advances in Engineering Software, 29(3-6), pp. 415-423.
[27] Hayder, M. E. and Jayasimha, D. N., 1996. "Navier-Stokes Simulations of Jet Flows on a Network of Workstations", AIAA Journal, 34(4), pp. 744-749.
[28] Brazell, O., et al., "Reservoir Simulation Made Simpler and More Efficient with 10 Gigabit Ethernet", ftp://download.intel.com/support/network/sb/reservoir_cs_2010.pdf
[29] http://mvapich.cse.ohio-state.edu/overview/
[30] Owens, J. D., Houston, M., Luebke, D., et al., 2008. "GPU Computing", Proceedings of the IEEE, 96(5), pp. 879-898.
[31] Appleyard, J. R., Appleyard, J. D., Wakefield, M. A. and Desitter, A. L., 2011. "Accelerating Reservoir Simulators using GPU Technology", SPE 141265, Proceedings of the 2011 SPE Reservoir Simulation Symposium, The Woodlands, TX.