
SPE 106075

Data Management in Reservoir Simulation


S.C. Gencer, ExxonMobil Upstream Research Co.; B.P. Ketcherside, ExxonMobil Global Services Co.; and
G.O. Morrell, E.L. Mulkay, and K.D. Wiegand, ExxonMobil Upstream Research Co.

Copyright 2007, Society of Petroleum Engineers


This paper was prepared for presentation at the 2007 SPE Reservoir Simulation Symposium
held in Houston, Texas, U.S.A., 26-28 February 2007.
This paper was selected for presentation by an SPE Program Committee following review of
information contained in an abstract submitted by the author(s). Contents of the paper, as
presented, have not been reviewed by the Society of Petroleum Engineers and are subject to
correction by the author(s). The material, as presented, does not necessarily reflect any
position of the Society of Petroleum Engineers, its officers, or members. Papers presented at
SPE meetings are subject to publication review by Editorial Committees of the Society of
Petroleum Engineers. Electronic reproduction, distribution, or storage of any part of this paper
for commercial purposes without the written consent of the Society of Petroleum Engineers is
prohibited. Permission to reproduce in print is restricted to an abstract of not more than
300 words; illustrations may not be copied. The abstract must contain conspicuous
acknowledgment of where and by whom the paper was presented. Write Librarian, SPE, P.O.
Box 833836, Richardson, Texas 75083-3836 U.S.A., fax 01-972-952-9435.

Abstract
Data from a wide variety of sources are required for reservoir
simulation. Simulation itself produces large quantities of data.
Yet, good data management practices for reservoir simulation
data are typically neither well-understood nor widely
investigated. This paper presents a specific architecture to
manage reservoir simulation data, discusses experiences from
six years of global use, explains adjustments to support
changing workflows and outlines challenges that lie ahead.
The architecture consists of a Database Management System
(DBMS) and files in managed file directories, called Reservoir
Input Output System (RIOS). All simulation input data and
results are maintained by a Data Management System (DMS).
The reservoir simulator reads input files written from the
DBMS to RIOS and writes results to files in RIOS. DBMS,
RIOS and integrated management tools (DMS) make up the
data management environment.
The environment has been in use inside ExxonMobil since late
2000 and now supports close to 500 users (85% of reservoir
engineers). There are over 30 individual databases containing
2TB of online data and about 6TB of online RIOS data. The
environment itself introduces some additional work. Support
staff is required for maintenance of databases, RIOS areas and
problem resolution. Direct user manipulation of data is not
permitted and additional tools are required to access and
interpret data.
The environment provides many benefits. While it ensures
data integrity, security and consistency, it also automatically
updates defaults, limits, associations, types, etc. This allows
running of older simulations and generation of aggregate
statistics and usage audit trails.

The architecture and experiences presented in this paper may
be unique in the industry. The DMS was designed, developed
and deployed over a ten-year period. It is a successful
software story and is viewed, along with the simulator, as a
key enabling technology for success with reservoir simulation
within ExxonMobil.
Introduction
Reservoir simulation is inherently a data-intensive process. It
starts with geological models and their properties, and
assignment of phase behavior or equation of state data,
relative permeability and capillary pressure information and
geomechanical data. It requires layout of the surface facility
network, subsurface configuration of wells, their attributes,
pressure and rate limits and other production and optimization
constraints. Very often, production history information,
hydraulics tables, completion tables and logic for runtime
management of wells and surface facilities are needed.
Finally, special cases like thermal and fractured reservoir
simulations require their own set of additional data.
During simulation, timestepping information, convergence
parameters and well performance data can be logged and
analyzed. Results, such as pressures and rates from wells and
surface facilities and pressures and saturations from the
simulation grid, can be monitored and recorded. The state of
the simulator can be recorded at specified intervals to enable
restart of a run at a later time.
This results in an abundance of data to analyze, visualize,
summarize, report and archive. Over the years, many authors
have tried to address one aspect or another of this data
management problem and many commercial and proprietary
simulators have made allowances to simplify users' work in
this area.1-3 However, in general, data management has not
been a widely investigated aspect of reservoir simulation.
Data management in reservoir simulation enables workflows
and collaboration, ensures data integrity, security and
consistency and expedites access to results. In today's
computing environment, data management is an enabler to
meet the growing need for reservoir simulation and to make
simulation available to a wider audience of professionals,
including many kinds of engineers and geoscientists.

With its EMpowerTM reservoir simulator,4-5 ExxonMobil spent
considerable time and effort in developing, deploying,
supporting and maintaining a data management environment
surrounding the reservoir simulator. These experiences - and
not the computational aspects of the reservoir simulator - are
the subject of this paper. (EMpower is a trademark owned by
ExxonMobil Upstream Research Company.)
Elements of the Data Management Environment
The data management environment encompasses all
simulation input, results and restart data and a collection of
software programs, tools and procedures for their management
(DMS).
Simulation Data
The top-down view of the simulation data starts with a
hierarchy of projects, models and cases (Figure 1). A project
usually encompasses a particular reservoir study. Models are
used to distinguish between different simulation approaches,
which may require fundamentally different discretizations or
fluid representations such as black-oil vs. compositional
simulation, fractured vs. non-fractured, etc. Cases within a
given model are generally expected to represent minor
changes in the input data or facility network representation,
with most of the data being shared among them. Currently,
approximately 1,000 projects with 5,000 models and 20,000
cases are managed worldwide.
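To make the hierarchy concrete, the following minimal Python sketch models it; the class and field names are illustrative assumptions, not the actual DMS schema.

from dataclasses import dataclass, field

@dataclass
class Case:
    name: str                      # e.g. "history_match" or "prediction_base"

@dataclass
class Model:
    name: str                      # e.g. a black-oil vs. compositional approach
    cases: list = field(default_factory=list)

@dataclass
class Project:
    name: str                      # usually one reservoir study
    models: list = field(default_factory=list)

# One study, two modeling approaches, a few cases each.
study = Project("example_study", models=[
    Model("black_oil", cases=[Case("history_match"), Case("prediction_base")]),
    Model("compositional", cases=[Case("prediction_base")]),
])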

Figure 1: Subset of data model showing project/model/case hierarchy and their relationships. Projects can contain one or more models, each of which can contain one or more cases.
All data needed for and produced by a simulation fall within
one of three broad categories: Arrays, Granules and Facility
Network Data. Simulation cell and interface data such as
pressure, mole fractions, fluxes, etc. fall into the first category.
Granules are collections of parameters that are intended to be
small in size while containing a variety of different data types.
For instance, black-oil fluid parameters for a given domain
comprise such a collection; solver parameters and timestep-control
parameters are further examples. A Facility Network
is a collection of physical facilities represented as nodes and
connections. Example facilities are wells, platforms,
separators, terminals and the pipelines that connect them. All
facilities have attributes and constraints that describe them and
their behavior. For example, all facilities have a name and
active state, and all wells have a rate or pressure limit.
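The sketch below illustrates the three categories in Python; all class and attribute names are hypothetical stand-ins for the concepts just described, not the proprietary data model.

from dataclasses import dataclass, field

@dataclass
class Array:
    # Per-cell or per-interface data, e.g. pressure or mole fractions.
    name: str
    values: list

@dataclass
class Granule:
    # A small, heterogeneous bag of parameters, e.g. solver controls.
    name: str
    params: dict

@dataclass
class Facility:
    # A node in the surface network: well, separator, terminal, etc.
    name: str
    kind: str
    attributes: dict = field(default_factory=dict)  # name, active state, limits

solver = Granule("solver", {"max_iterations": 25, "tolerance": 1e-6})
well = Facility("W-1", "well", {"active": True, "max_oil_rate": 5000.0})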
A key feature distinguishing ExxonMobil's current reservoir
simulation system from its predecessors is its use of an
extended surface facility network model that is fully integrated
with the reservoir. This key feature contributes greatly to the
complexity of the data model. All facilities in the network are
directly accessible and can be manipulated by the reservoir
engineer for maximum flexibility. In addition, users can add
their own attributes and procedures to a given facility type.
This capability is extremely important. Assume, for instance,
that the reservoir engineer wants to model submersible pumps
in a way that the current simulator version does not support.
The needed variables and functionality can be added to the
well facility type by the engineer and made a part of the
timestep calculation. This flexibility is very powerful and
allows rapid prototyping of new functionality.
Well Management Logic
Facilities are the most dynamic part of reservoir simulation.
In EMpower, they are managed at runtime with user defined
logic called Well Management Logic. This is part of the input
data but it is such a distinctive concept that it deserves a more
detailed description. The timeline of a reservoir simulation is
usually divided into two segments. The first is history
matching while the second is prediction. During history
matching, the goal is to design a model that will match
historical rates and pressures. During prediction, reservoir
engineers want to experiment with various scenarios in order
to approximate a good production profile for the field. For
instance, the engineer may want to test whether it is sufficient
to curtail high-GOR wells and increase production from
low-GOR wells in order to maintain a given oil-production
plateau while keeping the field's gas production in check, or
whether it is necessary to work over some wells. While it is
theoretically possible to
hard-code scenarios like this, it is impossible to pre-conceive
every possible strategy a reservoir engineer might want to try.
Allowing the engineer to define such strategies using a
programming environment greatly enhances the flexibility and
utility of a reservoir simulator while complicating the data
management environment.
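As a hedged illustration of what such user-defined logic might look like, the fragment below encodes the GOR strategy from the example above in Python; the rule, thresholds and well fields are hypothetical, and the actual Well Management Logic language is not described here.

def manage_wells(wells, gor_limit=2.0, cut=0.9, boost=1.05):
    # Called once per timestep: curtail high-GOR producers and ask
    # low-GOR producers to make up the oil-production plateau.
    for w in wells:
        gor = w["gas_rate"] / max(w["oil_rate"], 1e-12)
        if gor > gor_limit:
            w["oil_target"] *= cut      # choke back a high-GOR well
        else:
            w["oil_target"] *= boost    # ask more from a low-GOR well

wells = [
    {"name": "W-1", "oil_rate": 4000.0, "gas_rate": 12000.0, "oil_target": 4000.0},
    {"name": "W-2", "oil_rate": 3000.0, "gas_rate": 3000.0, "oil_target": 3000.0},
]
manage_wells(wells)   # W-1 is curtailed, W-2 is asked for more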
Data Management System
In EMpower, the DMS is the central work environment for the
simulation engineer. It is the single point of entry for
preparing, running and analyzing simulations and therefore, it
has several distinguishing characteristics and requirements.
First, it is data driven; all dialogs work from data definitions.
Some can display the three data types (arrays, granules and
facility network data) without knowledge of actual data
content. Second, user access is controlled by login and data
access is controlled by user, group and world permissions. It
is possible to completely hide projects, models and cases from
other users, and it is also possible to set up a project, model or
case for use by a specific group of users. Third, the DMS
ensures backward compatibility, interoperability and data
integrity with tools that validate and upgrade data and check the
integrity of arrays, granules and facility data. Finally, a set of
administrative tools is supplied to test components of the
data management environment, to support different access
models (administrator, manager, user, etc.) and to provide
functions like managing users, migration of data from one
version to another and reporting of project, model, case
statistics.
Simulation Workflow and Data Management
One of the great advantages of using a DMS is that it allows
the definition of dependencies between input data, results data
and simulation times. For instance, if a user changes input data
at time t0, the system is able to determine what data becomes
invalid at times t>=t0. Or assume that the user changes from
black-oil to compositional simulation. The DMS is able to
indicate what additional input data is needed and can provide
appropriate defaults. Data validation options such as checking
fluid property tables or timestep controls can prevent the user
from wasting time by supplying ill-conditioned parameters to
the simulator.
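A minimal sketch of the time-based invalidation idea follows; the record layout and dependency table are assumptions chosen for brevity, not the DMS implementation.

def invalidate(results, deps, changed_input, t0):
    # results: (result_name, time) records; deps: result_name -> set of
    # inputs it depends on. Editing changed_input at t0 makes every
    # dependent record at t >= t0 stale.
    valid, stale = [], []
    for name, t in results:
        if changed_input in deps.get(name, ()) and t >= t0:
            stale.append((name, t))
        else:
            valid.append((name, t))
    return valid, stale

deps = {"pressure": {"permeability", "solver"}}
results = [("pressure", 0.0), ("pressure", 100.0), ("pressure", 200.0)]
valid, stale = invalidate(results, deps, "permeability", t0=100.0)
# stale == [("pressure", 100.0), ("pressure", 200.0)]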

Variable Attribute Repository
Since the development of a reservoir simulator is an ongoing
process with new features being added on a regular basis, care
must be taken to avoid frequent changes in the data layout,
which are costly. Therefore, early in the development process,
it was decided to create a meta-layer between the data model
and the data layout.
This meta-layer is called the Variable Attribute Repository
(VAR) and describes data items of all three categories
mentioned earlier: arrays, granules and facility network data.
Assume a new array needs to be added to the system. From a
data layout perspective this is just another generic array that
can be linked to a case, time and domain. The VAR, however
(whose layout is fixed), will have an additional entry detailing
the purpose of the array, its description, default value, etc.
Facility data description is even more versatile: not only is it
possible to define any kind of attribute for a facility type, new
facility types can also be defined from a base set of facility
types. For example, a separator node is similar to network
nodes, but with some unique attributes of its own, such as
temperature.
The VAR is extended as users define new facility attributes
and arrays during their work. For example, new attributes can
be calculated and used in well management logic or new
arrays can be created to modify transmissibilities. The
definitions of these attributes and arrays are stored in the User
VAR at the model level and are available to all cases the model
contains, in the same manner as regular VAR definitions.
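The following sketch captures the VAR idea in a few lines; the entry fields and the register helper are illustrative assumptions, not the actual repository layout.

# Fixed-layout meta-table: one entry per data item, so a new array or
# facility attribute needs a new VAR entry, not a schema change.
VAR = {
    "pressure": {"category": "array", "unit": "psia", "default": 0.0,
                 "description": "cell pressure"},
    "max_oil_rate": {"category": "facility_attribute", "unit": "STB/d",
                     "default": None, "description": "well rate limit"},
}

def register(name, **attrs):
    # User VAR extension: engineers add new items at the model level.
    VAR[name] = attrs

# e.g. a user-defined attribute later referenced in well management logic
register("pump_efficiency", category="facility_attribute", unit="fraction",
         default=1.0, description="user-added submersible pump efficiency")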

Figure 2: Subset of data model showing relationships of a variable to a case, domain and time. The relationships are managed with the variable use class, and each case has a hashed list of variables which is managed by the case to variable use class.
Data Sharing
The project/model/case hierarchy implies that the majority of
the simulation input data is shared among cases within the
same model. For instance, the user may try different
permeability values during a history match or test different
solver parameters to achieve better performance, and there is
no need to create a complete new set of data that duplicates
the simulation grid, input arrays, granules and facility
network. However, as simple as the concept sounds, the data
sharing code within the DMS can be quite complicated, since
almost all input data can be time dependent. Mathematically,
data sharing is established via a unique relationship (data-item
to case, time and domain for variables; data-item to case, time
and facility for facility attributes and constraints), where
domain is a user-defined region inside the simulation model
(Figure 2). An identical copy of a case does not duplicate any
data, but triggers the creation of a second set of relationships.
Selected data can then be unshared to accommodate differences
between cases.
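In copy-on-write terms, a sketch of the sharing and unsharing behavior might look as follows; the class is purely illustrative and ignores time dependence for brevity.

import copy

class SharedCase:
    def __init__(self, data):
        self.data = dict(data)        # name -> reference to a shared object

    def clone(self):
        # An identical copy: new relationships, same underlying objects.
        return SharedCase(self.data)

    def unshare(self, name):
        # Replace one shared reference with a private copy.
        self.data[name] = copy.deepcopy(self.data[name])

base = SharedCase({"permeability": [100.0, 250.0], "solver": {"tol": 1e-6}})
variant = base.clone()
variant.unshare("permeability")
variant.data["permeability"][0] = 150.0   # the base case is unaffected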

Data Mining
Potentially the greatest benefit of managing reservoir
simulation data, though, is the capability for data mining. The
amount of data generated for and by simulation is significant.
It is not easy to analyze results just for one study, let alone
across many. However, with well-defined data management,
automated tools can scan and analyze data areas to generate
overall statistics and trends; this capability is known as data
mining. Data mining enables a quick overview of what kinds
of models are being worked on and provides insight into the
types of problems users encounter. This improves quality
control and opens the door to a self-learning system.
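A toy example of such a scan is shown below; the per-case record fields are hypothetical, standing in for whatever the automated tools actually extract.

from collections import Counter

cases = [
    {"model_type": "black_oil", "cells": 500_000, "failed_steps": 3},
    {"model_type": "compositional", "cells": 1_200_000, "failed_steps": 17},
    {"model_type": "black_oil", "cells": 80_000, "failed_steps": 0},
]

by_type = Counter(c["model_type"] for c in cases)   # what is being modeled
trouble = [c["model_type"] for c in cases if c["failed_steps"] > 10]
print(by_type, trouble)   # aggregate statistics and problem trends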
Architecture of the Data Management Environment
The simulation environment has been implemented as a
heterogeneous, distributed, three-tier, client-server
architecture. The DMS is the client software at the end user's
workstation. All reservoir simulation data are stored in the
second tier, consisting of a database and file directories in a
mass storage area called RIOS. The simulator, running on
different compute servers, is the third component and
represents the server side. Figure 3 summarizes this
architecture.

Figure 3: Diagram of the three-tier simulation environment architecture. The DMS is the client piece of the architecture and the simulator is the server piece. The database and RIOS comprise the second tier. Middleware (not detailed in this paper) manages communication between tiers.
This architecture explicitly decouples the simulator from the
DMS and database. The simulator has completely different
requirements that guide what platform it should run on. It has
access to RIOS areas, reads its input from files in RIOS and
writes all its output to files in RIOS.
Access control, security, data integrity and scalability features
discussed above are inherently addressed by commercial
databases. Databases are also ideal for managing and relating
large sets of data. Therefore, for the second tier, a database
was selected for managing primarily input data. When
running a case, the DMS writes input data from the database
to RIOS and launches the simulator. Results and restart data
and runtime log files written by the simulator to RIOS are
managed and used by the DMS as well.
Middleware
The three-tier architecture will not work without a middleware
component. The middleware keeps track of running
simulations, figures out where files are and enables
communication between the DMS and simulator. It consists
of one master network service per site, which manages the
services running on the different compute servers at that site.
Each compute server service is aware of RIOS directory
names and their mapping to simulation jobs and passes simple
commands and their return codes between the DMS and
simulator. The details of the middleware are not discussed in
this paper.
Database
There are several choices for the type of database, including
relational, which is highly pervasive in many industries.
However, for the needs of this project, which include
compatibility with the object-oriented paradigm of the DMS
and the simulator, the ability to handle large cardinalities of
relationships, and the implementation of the VAR concept and
generic storage of arrays, granules and facility data, an object
database was deemed the better choice.
Object Database
The object database provides many desired features, including
transaction-oriented, multi-user access with object locking and
rollback functionality. It manages the schema and object
relationships and enables definition of the granularity of
transactions based on user actions. Management of object
relationships is probably its biggest strength; this is difficult
and involved to implement with a relational or object-relational
database. There are more than 100 unique object
classes, and approximately 150 distinct object relationships,
some of which can have tens of millions of rows in a simple
relational table implementation.
The database schema is a logical decomposition of the user's
view into the data model. It stores parameters and works in
parallel with locking. For the development team, a guiding
principle was to minimize changes to the database schema
since each change requires migration of existing data, which is
cumbersome and time consuming. Therefore, the database
schema is kept relatively simple. The meta-schema or VAR
concept is built on top of the database schema and enables
definition of all granules, arrays, facility types and attributes
without any database schema modification.
Relationship Management
Executing lookups of array, granule or facility data is a key
performance issue. A quick response time is critical. The
number of domains, arrays and granules in a case is on the
order of hundreds of objects. Thousands of objects result
when many cases in the same model share the same arrays and
granules. Depending on the number of facilities and time
variant changes, the number of facility attribute and constraint
objects managed by a case can reach into tens of thousands.
When multiple cases share the same facility network, the
object count can reach hundreds of thousands and more. To
simplify searches, facility data is looked up from a facility
instead of a case, but still this can mean examining tens of
thousands of relationships per facility. In a highly interactive
environment, where hundreds to thousands of attributes and
constraints may be looked up during a user action, slowness of
this capability can be a major bottleneck. To maximize
performance, a hashing technique based on cryptographic
hashing keys was developed. With this technique, object use
lookups are reduced to an average of one or two searches into
hundreds of thousands of elements.
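The sketch below shows the flavor of such a scheme; the key layout and the choice of SHA-1 are assumptions for illustration, not the actual implementation.

import hashlib

index = {}   # digest -> attribute value (or object reference)

def key(case_id, facility, attribute, time):
    # A digest of the full lookup key replaces a relationship scan.
    s = f"{case_id}|{facility}|{attribute}|{time}"
    return hashlib.sha1(s.encode()).hexdigest()

def store(case_id, facility, attribute, time, value):
    index[key(case_id, facility, attribute, time)] = value

def lookup(case_id, facility, attribute, time):
    return index.get(key(case_id, facility, attribute, time))

store("case_7", "W-1", "max_oil_rate", 365.0, 5000.0)
assert lookup("case_7", "W-1", "max_oil_rate", 365.0) == 5000.0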
Although the database schema is relatively simple, the
quantity of relationships and the number of objects in each
make the database environment quite complex. Test programs
were developed to exercise and validate functionality at the
unit level and maintenance programs were written that correct
inconsistencies and problems with the database. Although the
VAR concept has minimized the need for schema changes,
programs had to be written to manage upgrades of data when
schema changes occur.


RIOS
The RIOS concept was developed to enable sharing of data
between the DMS and the simulator, as the simulator was
designed to be independent of the database. Every case has a
RIOS directory with a unique name. The database case
objects know about their RIOS directories. Every RIOS area
is associated with a specific business unit. Access can be
controlled in a fashion similar to database permissions with
system level owner, group and world permissions. It is
possible to completely hide a certain RIOS area by allowing
only a specific group ownership and access to it. RIOS areas
can be network accessible or local. When local, unless public
access is granted explicitly by the user, the RIOS is only
accessible by local simulation jobs.
A RIOS directory contains two types of files: (1) a collection
of files that contain input, restart and results data and (2) a set
of log files that store per timestep runtime information and
user requested output generated by well management logic.
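A hypothetical listing of a per-case RIOS directory is given below; the file names are invented for illustration, since the actual format is proprietary. The two file categories are those named in the text.

rios_case_dir = {
    "input_restart.dat": "input data plus simulator state appended at restart times",
    "results.dat": "arrays, granules and facility data at user-requested times",
    "timestep.log": "per-timestep runtime information",
    "wml.log": "user-requested output from well management logic",
}
for name, purpose in rios_case_dir.items():
    print(f"{name:18s} {purpose}")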
Input, Restart and Results Files
As explained earlier, before launching a simulation, the DMS
generates an input file for the simulator with all the input
granule, array and facility network data in the case RIOS
directory. As the simulator is running, it appends its complete
state information to this file at the restart times requested by
the user, or the user may request a set of restart data be written
on demand at any point during the run. The simulator also
writes specified arrays, granules and facility data to the results
file at user requested times. These results can be monitored
while the simulator is running.
The input/restart and results files are self-referencing files;
they can refer to data within the same file or file A can refer to
data in file B or vice versa. They can be ASCII or binary and
are completely portable. The format and structure of these
files have been developed in-house over many years and are
proprietary.


Log Files
The log files record timestepping information, convergence
parameters, well performance data and information on
problem nodes. The presentation of log file data is extremely
sophisticated, with a web-style interface that displays highly
detailed tables, charts and graphs. The power of this interface
is further enhanced by its ability to present user messages
written from well management logic in these formats as well.
A screenshot of this tool is presented in Figure 4.
Figure 4: The Log Browser tool provides an interactive, HTML interface to critical timestep information, like timestep cuts, timings, material balance, I/O, etc., with text information, tables and charts. Hyperlinks enable complete cross-referencing of this critical data.
When a case is run, the DMS first deletes any existing RIOS
files. When a case is restarted, all RIOS files are truncated to
the restart time. The DMS accesses results arrays, granules and
facility data from RIOS files directly.
DMS
The DMS is probably the most visible component of the data
management environment. Written completely in C++ on the
Windows operating system, it brings together many home-grown
applications, vendor applications, 2D and 3D
visualization tools and other 3rd-party packages to enable
engineers to do their work without worrying about system
details. The DMS is where all the data and their unique
characteristics and associations are exposed to the engineer as
intuitively as possible. Therefore, it is an area of constant
evolution. The front-end, the main entry point for users, is
shown in Figure 5 with sample viewers along with the
project/model/case tree and the data manager, which lists data
items for the current case.
Figure 5: The Front-End is the main entry point for users to the data management environment. It presents the filtered project/model/case hierarchy, the data items for the current case, and the ability to manage the data items for both input and results.

Archiving
One of the crucial tools available in DMS is the archiving
capability. Models, which are self-contained, can be archived
with or without their RIOS data to a selected destination, such
as a LAN drive, a DVD drive or a local disk. The archive file
is in XML format and is therefore completely device and
application independent. This data can eventually be migrated
to off-line storage. The model can be deleted from the
database, along with the associated RIOS data, to free up disk
space. This process is crucial because it guarantees problem-free
access to data even years later, regardless of the version of
the DMS, database schema or VAR, by applying all relevant
changes necessary to upgrade the data at the time of restore.
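A minimal sketch of such a device- and application-independent archive follows; all tags and the schema_version attribute are illustrative assumptions.

import xml.etree.ElementTree as ET

model = ET.Element("model", name="black_oil", schema_version="3.2")
case = ET.SubElement(model, "case", name="prediction_base")
granule = ET.SubElement(case, "granule", name="solver")
ET.SubElement(granule, "param", name="tolerance").text = "1e-6"

ET.ElementTree(model).write("model_archive.xml")
# On restore, a stored version marker would tell the DMS which upgrade
# steps to apply so that old archives load under the current schema/VAR.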
Deployment and Usage
The data management environment was first released in late
2000 along with the deployment of ExxonMobil's EMpower
reservoir simulator. It has been in use inside ExxonMobil
since then and now supports close to 500 users. There are 32
individual databases containing 2TB of online data and about
6TB of online RIOS data corresponding to about 5,000
models. The environment has been successfully deployed in
eight countries outside the United States, on five continents:
Europe, Asia, Australia, Africa and South America.
The system has gone through three major, five minor and three
patch releases. Not including the simulator, there are
approximately nine million lines of software code (with six
million lines of 3rd party vendor code and over half a million
lines of database related code) and 25,000 files (18,000 of
which are 3rd party vendor source code files).

Figure 6: Flow of a support issue through the organizational components making up the support structure for the simulator and its data management environment. Development and business validation are in the Research organization, while all other components except users are in the Information Technology organization.
Support Issues
There is a significant support infrastructure built around the
reservoir simulator and its components (Figure 6). The data
management environment adds additional complexity to this
work. The environment is developed and validated for
business use in the research organization. After this, the
Information Technology organization takes over the full
deployment and training tasks.
Besides users, system
administrators, database administrators and user support staff
must be trained. The deployment includes upgrade of existing
databases and is a phased process which takes several months.
Once deployed, the environment requires care and feeding by
on-site user support staff and database administrators, who in
turn can rely on central user support, application support and
database administration organizations. The long cycles from
development to deployment require maintenance of two or
three versions of the environment concurrently at any one
time. The issues faced in development and support of this data
management environment are similar to those faced by larger
software vendors: an increasing user base, multiple versions and
data compatibility needs require substantial development and
support time.
Data Complexity
Reservoir simulation data can cover a wide spectrum: the
simulation grid can be one cell or millions of cells; the facility
network can be a single well or tens of thousands of wells; the
time range of a simulation may cover milliseconds to millions
of years; the time dependence of data can range from none at
all to changes every second, minute or day; and results can be
reported so frequently, or a run can be so long, that the files
written by the simulator reach over 10GB in size. A good
simulator and its data management
system must be flexible enough to handle any of these
requirements.
Size
From the beginning of the project, this variability in quantity
and spectrum of data has continuously taxed the data
management environment. Schemes have been developed to
compress RIOS files and to handle files greater than 2GB in
size. The log files have been compacted from an initial
ASCII format to a compressed, delimited form. The restart
data has been made less persistent by the implementation of
disposable restarts, which keep track of the latest restart data
only. None of these, however, have been as taxing as the
work required on the database. There has always been a
feature or an action not performing well enough, or not
working at all, with a particular set of data or a particular use
of it. Either the implementation had not considered that kind
of use, or the database design had reached its limits in dealing
with the data.
Some of the actions most fragile in this respect include copying
cases, archiving models, creating restart cases and unsharing
facility network data. These issues have all been addressed
over time. The database was initially designed to be all-inclusive
and included loading of all facility results and
meta-data for results arrays into the database as well.
However, after several years of dealing with performance
problems with results loading and deletion, RIOS files are
now the single source of results and restart data. The database
still stores all input data and has a pointer to the RIOS
directory of each case to ensure consistency between RIOS and
the database.


Database Migration
The biggest data challenge came when business drivers
demanded a change of database vendor. This was a big
undertaking, especially since object databases are not
standardized like relational databases and the code was not
developed with a distinct, database independent data access
layer. However, the design proved adequate, and a distinct
separation of database transactions from other code was in place.
This huge effort was brought to a successful completion and
also ushered in the XML archiving scheme, which enabled
archive and restore of models independent of database vendor,
with backward compatibility. Upgrading older data schema
models to new data schema using archive/restore turned out to
be a natural extension.
Today, the database provides a secure and consistent
repository that works in a collaborative environment.
However, it is an ongoing project that requires continuous
improvement to support increasing data and flexibility
requirements.
User Perspective
Users of the reservoir simulator are as varied as the simulation
data. There are engineers who are new to the company and
who are trying to use reservoir simulation for the first time;
there are experienced engineers who use simulation only now
and then; there are geoscientists who do studies of different
dimensions; and then there are the experienced, hard-core,
day-to-day users who know the simulation process inside and out.
It is not possible to meet all the desires of this diverse set of
users when building a system. Less experienced users prefer
rigidly defined workflows that rely heavily on graphical user
interfaces, while more experienced users find the graphical
user interface limiting and want to do their own analysis using
tools like SAS, Excel and MATLAB. The data management
environment, initially designed to be all-encompassing and
self-contained, is now asked to be more open. Some progress
has been made to this end. Many functions of the DMS can
now be driven from scripts and a new API is being made
available for accessing input and results/restart data. Further
developments are in progress to connect the DMS with other
applications using Windows Workflows.
Database/RIOS as Integral Part of Work
One of the most visible results of the data management
environment is awareness of database and RIOS usage by both
users and administrators. Consolidation of all relevant
simulation data to these areas has clearly brought to light the
quantity and nature of data that has to be managed. Disk
space for both database and RIOS areas is cost allocated to
business units; therefore, there is a constant check for overuse.
Users must always be aware of their usage, and they are
reminded when they fill up the database or RIOS.
Help System
One of the least talked about but most appreciated components
of the DMS is the online help system based on Microsoft
HTMLHelp. All functionality within the DMS is clearly
documented and explained. For many users, this is their first
point of support. The help system is context sensitive and
incorporates many hyperlinks to interrelated information
(Figure 7). It contains over 1,100 HTML pages. Users would
like the help system to incorporate newer search capabilities,
similar to those used in internet search engines.

Figure 7: The HTML-based Help System provides detailed information on program functionality and underlying science. It is context sensitive, and hypertext links, index and search capabilities facilitate the finding of information.
Developer View
There are on average eight developers devoted to development
and maintenance of the data management environment. About
three versions of the software must be maintained
concurrently and the same resources must also deal with
nightly builds, regression tests and porting to new Windows
operating systems. Developers on average spend about 50% of
their time on maintenance and support issues.
Most of the developers are seconded from the Information
Technology organization; staff turnover is high, however, as
the IT career development process rotates staff
every two to three years. This puts an extra burden on the
project, as it takes at least four to six months for new staff to
become fully productive. Overall complexity of the system
and interrelationships of its components do not make it any
easier. However, the fact that the system has successfully
gone through many releases and has continuously increased its
user base is a positive aspect of the project, especially
considering many of the original developers have long ago
moved on to other assignments.
Management of reservoir simulation data requires long-term
commitment and continuous improvement. From a business
perspective, it makes no sense to start anew from version to
version. More and more changes must be made and new
features added while ensuring compatibility with existing data.
The Road Ahead
The trend in reservoir simulation is towards more: more users,
more models, more cells, more wells, more cases, more data
and more integration.

Bigger Grids and Facility Networks
Simulation engineers would like to be able to build reservoir
models with several million cells and manage several
models with several million cells and manage several
thousand wells with more flexibility. Currently, hardware and
software limit the DMS to models of a few million
unstructured grid cells and a few thousand wells for
comfortable operation.
As grids get larger and the number of facilities in the
facility networks reaches into the many thousands, both the
database and the DMS will be taxed even further. To be able
to handle this kind of load, they will continue to be the subject
of continuous improvement. The computing environment is
also changing to support this load: grid computing, high-end
compute servers and 64-bit desktops for clients.
Movement of input data, at least large granules and arrays
related to grid definition and properties, to the RIOS area is
also being considered. This would eliminate duplicate storage
of the same data in different forms, two separate sets of input
and output routines (one to the database and one to the RIOS)
and allow greater flexibility for external programs to supply
and/or modify simulator input data. A database would still be
used to manage data relationships, but would not be burdened
with having to manage huge arrays and millions of objects,
which have been major bottlenecks to database performance.
Automated History Matching and Optimization
With the requirement to be able to run many slight variations
of a base case, automated history matching and optimization
add another dimension to "more." Optimization and history
matching are two areas of increasing popularity and research
interest. Both need efficient management of tens to hundreds
of cases that have little variation. Users have to be able to
design experiments easily, many time dependencies must be
managed behind the scenes, and results must be presented in
new ways. Data sharing was a good start ten years ago, but
now scenario management becomes very important. This
requires substantial work on the architecture of the data
management environment and will involve extension of the
data sharing concept to the RIOS files.
Data Mining
More simulation runs produce more data. Data mining will
come into greater use to extract useful information from all
this data. With standardized files in RIOS directories and
consistent databases, finding the right interpretation will be the
key. This is a recent area of development for ExxonMobil.
Tools have been developed to go through databases and RIOS
files, extract information and generate statistics; however,
much more work is necessary in this area, and it may require
the use of another database to collect information for
analysis.
Integration and Open Environment
Finally, there is more demand for integration and easier access
to simulation input and results data. Efforts to integrate the
subsurface work environment call for visualization and
interpretation of simulation data with geoscience data, and for
better exchange of data such as hydraulics tables, PVT,
K-value and EOS properties and completion efficiency tables
with the source applications. More and more external
applications want to talk to the database to get or change data with
automated tools. Users want to be able to get to simulation
data directly and analyze it using their favorite tools.
To meet all of these challenges, the data management
environment must be able to bring in new components. It
must become more open and more easily communicate with
other applications. It must provide simple interfaces for users
to get to data quickly. This is an area of ongoing work.
Conclusions
The heterogeneous, distributed and multi-tier data
management environment that has been described allows
engineers to work on reservoir models, using logically
centralized, physically decentralized data sources where
integrity, security, consistency, etc. are managed. The
environment was designed, developed and deployed over a
ten-year period and has gone through several versions.
The environment increases development work and support
load and can have implementation issues that take time to
resolve. Nevertheless, it has proven its value: (1) it has
enabled penetration of reservoir simulation into a much wider
audience than ever before, (2) it has exposed the volume and
diversity of data in use and the need for good data
management of simulation information, and (3) it has opened
doors to new ways of analyzing simulation data, including
data mining.
The environment must continuously improve and adapt to
changing requirements and workflows. Initially designed as
an all-inclusive, self-sufficient solution, it must now open up to
enable integration, to handle ever-bigger models with
increasing numbers of facilities and to enable automated
history matching and optimization workflows.
The benefits of good data management are not obvious until
the system is in place. It requires the full backing and
commitment of company management for success. The
experiences discussed in this paper would not have been
possible without such technology leadership.

Acknowledgments
The authors wish to acknowledge B.L. Beckner, B.A. Boyett,
T.K. Eccles, J.D. Hindmon and C.J. Jett for their valuable
assistance to this paper. The authors also acknowledge the
management of ExxonMobil Upstream Research Company for
permission to publish this paper.
Windows is a registered trademark of Microsoft Corporation
in the United States and other countries.
References
1. Huang, A.Y. and Ziauddin, Z.: "Use of Computer Graphics in Large-Scale Reservoir Simulation," SPE 20343, presented at the 5th Petroleum Computer Conference of the Society of Petroleum Engineers, Denver, Colorado, June 25-28, 1990, 145-150.
2. Kreule, T., Good, P.A., Hoff, A.H.F.B. and Maunder, R.E.: "RISRES - A State-Of-The-Art Hydrocarbon Resource Database That Works!" SPE 35986, presented at the 1996 Petroleum Computer Conference of the Society of Petroleum Engineers, Dallas, Texas, June 2-5, 1996, 27-40.
3. Howell, A., Szatny, M. and Torrens, R.: "From Reservoir Through Process, From Today to Tomorrow - The Integrated Asset Model," SPE 99469, presented at the Intelligent Energy Conference and Exhibition of the Society of Petroleum Engineers, Amsterdam, Netherlands, April 11-13, 2006.
4. Beckner, B.L., Hutfilz, J.M., Ray, M.B. and Tomich, J.F.: "EMpower: New Reservoir Simulation System," SPE 68116, presented at the 2001 Middle East Oil Show of the Society of Petroleum Engineers, Bahrain, March 17-20, 2001.
5. Beckner, B.L., Usadi, A.K., Ray, M.B. and Diyankov, O.V.: "Next Generation Reservoir Simulation Using Russian Linear Solvers," SPE 103578, presented at the 2006 Russian Oil and Gas Technical Conference and Exhibition of the Society of Petroleum Engineers, Moscow, October 3-6, 2006.
