
While many other industries have experienced tremendous benefits over the last few decades, adoption of data-driven analytics is still young in the oil and gas sector. Benefits captured across industries include improved decision quality, better planning and forecasting, lower costs, and gains in operational efficiency. However, many challenges to full adoption exist in our industry. In addition to the challenges of outdated data management, key gaps exist in the understanding of basic principles concerning how and when to use different data-analytics tools.

Data-analytics benefits are being demonstrated through the efficient exploitation of data sources to derive insights and support decision making. Recent years have seen an exponential increase in the number of applications for enhancing data quality during and after acquisition by automatically removing noise and outliers; better assimilating new and high-frequency data into physics-based models; optimizing calendar-based inspections for preventive-maintenance tasks; increasing the equipment availability of well, surface, and drilling systems; optimizing reservoir recovery on the basis of injector-to-producer allocation factors; and many others.

Machine learning is a collection of techniques, both supervised and unsupervised, that gives computers the ability to learn and adapt without being explicitly programmed. This ability to learn provides capabilities for describing past and current operating conditions, predicting future behaviour, and prescribing actions.

Supervised learning includes regression and classification methods, in which a relationship is established between the inputs and a known output. Unsupervised learning includes clustering, which addresses problems with no prior knowledge of the output, automatically grouping a large number of data variables into a smaller set.
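The distinction between supervised and unsupervised learning can be sketched in a few lines. The data below are synthetic, and the one-dimensional two-means grouping is only an illustrative stand-in for real clustering algorithms:

```python
# Supervised: fit y = a*x + b from known (input, output) pairs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]          # known outputs (roughly 2*x)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

# Unsupervised: group values with no known output (1-D two-means).
data = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
c1, c2 = min(data), max(data)      # crude initial centres
for _ in range(10):                # alternate assignment / update steps
    g1 = [v for v in data if abs(v - c1) <= abs(v - c2)]
    g2 = [v for v in data if abs(v - c1) > abs(v - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
```

The supervised fit recovers the slope and intercept from labelled examples; the unsupervised pass separates the same kind of raw values into two groups without any labels at all.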

Most data-driven projects follow a similar approach during implementation. In the majority of these, large efforts go into collecting and preparing the data, which typically reside in scattered sources and in unstructured formats. Once the data are placed in proper tabular form and relationships are established, they are ready for analysis, which may include exploratory visualizations, model-order or dimensionality reduction, clustering, regression, classification, pattern recognition, cross validation, model validation, prediction, and optimization. Insights and syntheses are derived throughout the analysis process.
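The collect, prepare, and analyse flow described above can be illustrated with a toy example. The record format, the field names (well, rate, water_cut), and the values are invented for this sketch:

```python
# Raw records arrive in an inconsistent, semi-structured format.
raw = [
    "well=A; rate=120; water_cut=0.15",
    "well=B; rate=95",                      # a field is missing
    "well=C; rate=140; water_cut=0.22",
]

# Prepare: place records into a consistent tabular (dict) form.
rows = []
for line in raw:
    row = {"well": None, "rate": None, "water_cut": None}
    for part in line.split(";"):
        key, value = part.strip().split("=")
        row[key] = value if key == "well" else float(value)
    rows.append(row)

# Analyse: a simple exploratory summary over the complete rows only.
complete = [r for r in rows if r["water_cut"] is not None]
mean_rate = sum(r["rate"] for r in complete) / len(complete)
```

Even in this tiny case, most of the code is preparation rather than analysis, which mirrors where the effort goes in real projects.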
Data mining can be defined as the science of extracting useful information from large data sets or databases. It is used to build empirical models, which are based not on underlying theory about the process or mechanism that generated the data, but on the data themselves.

Data mining, as the name suggests, is data-driven, and it provides a description of the observed
data. Its fundamental objective is to provide insight and understanding about the structure of the
data and its important features and to discover and extract patterns contained in the data set.

Application of data mining in manufacturing processes

In many industrial plants, large amounts of process data are collected by data historians. Hundreds of variables are continuously measured, and their values are stored in voluminous databases.

These databases are a rich source of information, but without data-mining algorithms, extracting useful information and creating knowledge from a large number of process variables and different operating modes can be a very difficult task.

Data-mining techniques can be used, for example, to identify patterns in the data and correlations between process variables, and to classify or cluster variables in order to discover new relationships between the variables contained in the database, thus extracting previously unknown process knowledge.

In an industrial process context, data quality is a key issue, since historical process data include much invalid data, such as:

- Incomplete data: missing process variables, missing variable values
- Inconsistent data: impossible or out-of-range values
- Noisy data: errors, outliers, inaccurate values
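A minimal screen for the three kinds of invalid data listed above might look as follows. The valid range and the three-sigma outlier rule are assumptions for illustration, not prescriptions:

```python
import statistics

readings = [101.2, 99.8, None, 100.5, 250.0, 100.1, -5.0, 99.9]
low, high = 0.0, 200.0            # assumed physically valid range

# Incomplete data: positions where a value is simply missing.
missing = [i for i, v in enumerate(readings) if v is None]
present = [v for v in readings if v is not None]

# Inconsistent data: impossible or out-of-range values.
out_of_range = [v for v in present if not (low <= v <= high)]

# Noisy data: values far from the bulk of the valid measurements.
valid = [v for v in present if low <= v <= high]
mu, sigma = statistics.mean(valid), statistics.stdev(valid)
outliers = [v for v in valid if abs(v - mu) > 3 * sigma]
```

In practice each category is usually handled differently: missing values may be imputed, out-of-range values discarded, and outliers investigated before removal.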

This can be caused by various process- or equipment-related factors, such as:

- Drift and malfunction of measuring instruments
- Starts and stops of key unit operations
- Aberrant process behaviour
- Relatively infrequent laboratory analyses or product-quality sampling
- Dubious periods of operation, such as shutdowns, low-production periods, and equipment cleaning

Data Mining concepts and approaches

A descriptive model represents the main features of the data. It summarizes the information
contained in the data set and describes its main characteristics.

Techniques used for constructing descriptive models are:

- Neural networks
- Statistical factor analysis, such as principal component analysis
- Fuzzy logic, for discovering association rules and for clustering

A predictive model is a model of future behaviour, allowing the value of some variables to be predicted from known values of other variables.

Predictive models are further classified into two distinct categories, depending on whether the variable to be predicted is categorical or real-valued.

Classification Models:

Some of the main classification techniques are:

- Linear discriminants: linear classifiers based on searching for the linear combination of variables that best separates the classes
- Nearest-neighbour methods
- Neural networks
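A nearest-neighbour classifier of the kind listed above can be written in a few lines. The two-feature training points and the on-spec/off-spec labels are synthetic:

```python
import math

# Training data: (feature vector, known class label).
train = [((1.0, 1.0), "on-spec"), ((1.2, 0.9), "on-spec"),
         ((5.0, 4.8), "off-spec"), ((5.2, 5.1), "off-spec")]

def nearest_neighbour(point):
    # Assign the label of the closest training point (Euclidean distance).
    return min(train, key=lambda t: math.dist(point, t[0]))[1]
```

A new sample near (1.1, 1.1) is classified "on-spec", while one near (4.9, 5.0) comes out "off-spec"; real applications would use many more features and usually the k nearest neighbours rather than one.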

Predictive models for regression:

Some of the main methods used for regression models are:

- Linear models and least-squares fitting, where the output is predicted as a linear combination of inputs
- Neural networks
- Partial Least Squares (PLS, also known as Projection to Latent Structures)
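The first item, predicting the output as a linear combination of inputs, can be sketched by solving the least-squares normal equations directly. The data are synthetic and constructed so the fit is exact:

```python
# Model: y = w0 + w1*x1 + w2*x2, fitted to data generated by y = 1 + 2*x1 + 3*x2.
# Each row of X is (1, x1, x2); the leading 1 carries the intercept.
X = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
y = [1.0, 3.0, 4.0, 6.0]

# Normal equations: (X^T X) w = X^T y.
n = 3
XtX = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(n)]

# Solve the 3x3 system by Gaussian elimination with partial pivoting.
A = [row + [b] for row, b in zip(XtX, Xty)]
for col in range(n):
    piv = max(range(col, n), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    for r in range(col + 1, n):
        f = A[r][col] / A[col][col]
        for c in range(col, n + 1):
            A[r][c] -= f * A[col][c]
w = [0.0] * n
for i in range(n - 1, -1, -1):
    w[i] = (A[i][n] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
```

The recovered coefficients are [1, 2, 3], matching the generating equation; with noisy data the same procedure returns the best fit in the least-squares sense.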

Some of the data-mining techniques used for process monitoring and control, soft sensors, expert systems for decision support, and fault-detection and diagnosis systems are as follows:

- Regression techniques
- Classification techniques
- Descriptive modelling techniques such as PCA, used to identify correlations between
variables, to further process understanding and for statistical process control purposes.

Data mining techniques which are extensively used are:

- Factor based statistical models, such as Principal Component Analysis (PCA) and Projection
to latent structures (PLS)
- Neural network models
- Fuzzy logic concepts

Factor Based Statistical Models

Used to transform a number of related process variables into a smaller set of uncorrelated variables.

Often called data-reduction methodologies.
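A minimal PCA-style reduction along these lines can be sketched as follows, assuming synthetic two-variable data and using power iteration to find the dominant eigenvector of the covariance matrix:

```python
import math

# Two strongly correlated process variables (synthetic readings).
data = [(2.0, 1.9), (0.0, 0.2), (-1.0, -1.1), (3.0, 2.8), (-4.0, -3.8)]

# Centre the data and form the 2x2 sample covariance matrix.
mx = sum(p[0] for p in data) / len(data)
my = sum(p[1] for p in data) / len(data)
centred = [(x - mx, y - my) for x, y in data]
cxx = sum(x * x for x, _ in centred) / (len(data) - 1)
cyy = sum(y * y for _, y in centred) / (len(data) - 1)
cxy = sum(x * y for x, y in centred) / (len(data) - 1)

# Power iteration for the dominant eigenvector (first principal component).
v = (1.0, 0.0)
for _ in range(50):
    w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
    norm = math.hypot(*w)
    v = (w[0] / norm, w[1] / norm)

# Reduce each 2-D point to a single score along the principal component.
scores = [x * v[0] + y * v[1] for x, y in centred]
```

Because the two variables move almost together, one component captures nearly all the variation, which is exactly the data-reduction effect described above; real applications do this for hundreds of variables at once.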

Typical industrial data-mining applications include:

- Multivariate statistical process control charts to detect abnormal situations and help locate the cause of a problem
- Classifying products
- Predicting process parameter values

Neural Networks

Describe a nonlinear relationship between the inputs and outputs of a complex system using historical process data.

Typical industrial data-mining applications include:

- Detecting abnormal situations and helping locate the cause of a problem
- Classifying products
- Predicting process parameter values
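The nonlinear input-output mapping a neural network provides can be illustrated with a tiny fixed-weight network; the weights below are set by hand for the sketch, whereas in practice they would be trained from historical process data:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum passed through a nonlinear activation (tanh).
    return math.tanh(sum(i * w for i, w in zip(inputs, weights)) + bias)

def forward(x):
    # One hidden layer of two neurons feeding a single linear output.
    h1 = neuron(x, [1.0, -1.0], 0.0)
    h2 = neuron(x, [0.5, 0.5], -0.5)
    return 2.0 * h1 + 1.0 * h2

y = forward([0.3, 0.1])
```

The nonlinearity comes entirely from the activation functions in the hidden layer; stacking such layers is what lets a network approximate complex process behaviour that a linear model cannot capture.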

Fuzzy Logic

Can convert a set of user-supplied human-language rules into mathematical equivalents.

Fuzzy-logic modelling is based on degrees of membership rather than on probability, and it allows numerical data to be converted into corresponding qualitative attributes.
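Converting a numerical reading into qualitative attributes via fuzzy membership can be sketched as follows; the temperature ranges and the triangular membership shapes are invented for illustration:

```python
def membership(t):
    # Triangular membership functions for "low", "normal", "high".
    # A reading can belong to several sets at once, to different degrees.
    low = max(0.0, min(1.0, (70.0 - t) / 20.0))
    normal = max(0.0, 1.0 - abs(t - 70.0) / 20.0)
    high = max(0.0, min(1.0, (t - 70.0) / 20.0))
    return {"low": low, "normal": normal, "high": high}

m = membership(65.0)   # partly "low", mostly "normal", not "high"
```

Unlike a hard threshold, the degrees vary smoothly with the reading, which is what makes human-language rules such as "if temperature is high, reduce feed" usable in a numerical controller.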
