
Fuel 85 (2006) 553–558

www.fuelfirst.com

Gasoline quality prediction using gas chromatography and FTIR spectroscopy: An artificial intelligence approach
K. Brudzewski a,*, A. Kesik b, K. Kołodziejczyk b, U. Zborowska b, J. Ulaczyk c
a Department of Chemistry, Warsaw University of Technology, ul. Noakowskiego 3, 00-664 Warsaw, Poland
b Central Petroleum Laboratory, Al. Zwirki i Wigury 31, 02-091 Warsaw, Poland
c Department of Physics, Warsaw University of Technology, ul. Koszykowa 75, 00-668 Warsaw, Poland
Received 29 June 2004; received in revised form 21 July 2005; accepted 21 July 2005
Available online 1 September 2005

Abstract
This paper reports on the analysis of 45 gasoline samples of different quality, namely octane number and chemical composition. Data from gas chromatography and IR (FTIR) spectroscopy are used for gasoline quality prediction and classification. The data were processed using principal component analysis (PCA) and the fuzzy C means (FCM) algorithm. The data were then analyzed using neural network paradigms: a hybrid neural network and a support vector machine (SVM) classifier. The IR spectra were compressed and de-noised by discrete wavelet analysis. Using the hybrid neural network and the multilinear regression method (MLRM), excellent correlation between the chemical composition of the gasoline samples and the predicted value of the octane number was obtained. For six gasoline categories of different quality, 100% correct classification was achieved.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Gasoline classification; Octane number; Neural networks; Wavelet analysis; SVM classifier

1. Introduction

The antiknock performance of a gasoline is its ability to resist detonation, a form of abnormal combustion. Detonation occurs when the air–fuel mixture reaches a temperature and/or pressure at which it can no longer keep from self-igniting. Two types of abnormal combustion are common: the first is detonation, as previously mentioned, and the other is preignition.

Research octane number (RON) is determined in a standardized engine. This is a very expensive method but still the only accepted one. Very soon scientists began to look for a correlation between the tendency of hydrocarbon-based fuels to knock and the composition of these fuels. With the help of kinetic models, possible reaction mechanisms were established. Later on, octane numbers were calculated from IR and NIR spectral data. Octane number has also been correlated with carbon or hydrocarbon types [1,2] measured by gas chromatography, high performance liquid chromatography, or nuclear magnetic resonance [3]. The octane number predictions of these models were good and reproducible, but only for fuels with a very similar composition. Most of the published correlation models were developed with multiple linear or nonlinear regression techniques, which require the user to specify a priori a mathematical model of the empirical correlation. The neural network approach is an alternative way of solving the problem. Unlike multiple linear or nonlinear regression techniques, which require a predefined empirical model, the neural network can identify and learn the correlative patterns between the input and corresponding output values once a training set is provided.

In this paper, the application of gas chromatography and IR (FTIR) data in combination with different pattern-recognition engines (PCA, FCM, neural networks) to predict the octane number of gasoline is reported. The self-organizing hybrid network and the SVM network were used.

* Corresponding author. Tel./fax: +48 22 660 5358.
E-mail address: bruxz@ch.pw.edu.pl (K. Brudzewski).
0016-2361/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.fuel.2005.07.019

2. Method

Forty-five unleaded gasoline samples were selected for this study. These samples covered a wide range of gasoline properties. The following methods were applied to the chemical, physical and chemical structure analysis of the samples:

– Research octane number (RON). The test procedure was performed according to the PN-82/C-04112 standard. This parameter was measured by comparing the antiknock performance of the tested fuel with the antiknock performance of reference fuels under the conditions described in this standard. The octane number of the fuel samples was determined by burning the gasoline in an engine under controlled conditions (e.g. spark timing, compression, engine speed, and load) until a standard level of knock occurs.
– Gas chromatography. The determination of hydrocarbon composition was performed according to the test procedure based on ASTM D 5134. This test method provides the procedure to determine the total chemical group composition of the tested samples (e.g. n-paraffins, i-paraffins, naphthenes, olefins and aromatics). In addition, several hundred chemical compounds present in the tested samples were determined. The mass percents of the five hydrocarbon types and ethanol identified by GC were used as neural network inputs.
– FTIR spectroscopy. The determination of hydrocarbon composition was performed according to the test procedure developed in the Central Petroleum Laboratory, using a MAGNA-IR 750 NICOLET spectrometer. All the spectra of the samples were registered and analyzed by two compatible programs, OMNIC 1.2a and QUANTIR 1.20, at the following conditions: number of sample scans 32, sample cell KBr (25), resolution about 2 cm⁻¹ and spectral range 4000–400 cm⁻¹. The measurements were performed in a nitrogen atmosphere.

Each spectrum was recorded as a data vector with 1868 independent variables (individual spectral channels). The dimension of the spectrum vectors was reduced to 231 approximation coefficients by the wavelet technique, using 3-level decomposition and the 5-db wavelet function [4]. After feature extraction, these 231 approximation coefficients were reduced to six independent features. The feature extraction was performed using three different methods: the PCA technique, the test of the maximal variance of the peak absorption, and chemical interpretation of the optical transitions (peaks of the absorption). Finally, each spectrum was represented as a vector with six components. These six-component vectors were used as neural network inputs.

Repeatability and reproducibility of the analytical techniques used in this work were within the requirements of the standard ASTM methods for the specific analysis.

All network calculations in this study were performed using MATLAB 6.5, together with software originally developed for the self-organizing hybrid neural network [5] and for the SVM neural network [6]. The experimental data were processed using principal component analysis (PCA), fuzzy C means analysis (FCM), wavelet analysis and the neural network (SVM) classifier.

3. Hybrid neural network as a signal processor

To solve the complex estimation tasks, the neural network of the hybrid structure presented in Fig. 1 was applied. The first part of the network is the self-organizing Kohonen layer; the second is the feedforward network called the multilayer perceptron (MLP), which is trained in the supervised mode. The hybrid neural structure thus consists of the self-organizing layer, which performs recognition and classification, and a second, cascaded subnetwork in the form of a multilayer perceptron, which acts as the estimator. The perceptron subnetwork is fed with the signals generated by the self-organizing layer. The important point of the proposed organization of signal processing is the reliability and acceleration of the whole pattern recognition process. The procedure of learning the hybrid network is split into two separate phases: the self-organization of the Kohonen layer (the generalized Kohonen algorithm was used here) and afterwards the supervised learning of the MLP subnetwork. Thanks to the separation of the two phases, the complexity of the learning has been significantly reduced and the learning process accelerated.

This procedure is not straightforward, since qualitative and quantitative aspects are merged together with a degree of complexity generally dependent on the number of components composing the chemical pattern and on the degree of non-linearity of the problem.

Fig. 1. Structure of the hybrid neural network.
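The data flow through the two stages of the hybrid network can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. The layer sizes (6 inputs, 64 Kohonen neurons, 6 hidden neurons, 1 output) are the ones reported in the paper; everything else is an assumption made for illustration: the plain winner-take-all update stands in for the generalized Kohonen algorithm, the Gaussian form of the Kohonen-layer signals, the learning rate, the `width` parameter, the function names and the random data are all hypothetical, and the supervised training of the MLP is omitted.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_kohonen(samples, n_neurons=64, epochs=20, lr=0.5, seed=0):
    """Phase 1 (unsupervised): plain winner-take-all Kohonen updates.

    Assumption: the paper uses a generalized Kohonen algorithm; this
    simpler rule only illustrates the idea of self-organization.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # learning rate decays linearly
        for x in samples:
            winner = min(range(n_neurons), key=lambda j: sq_dist(x, weights[j]))
            weights[winner] = [w + rate * (xi - w)
                               for w, xi in zip(weights[winner], x)]
    return weights

def kohonen_signals(x, weights, width=0.5):
    """One signal per Kohonen neuron (assumed Gaussian of the distance)."""
    return [math.exp(-sq_dist(x, w) / (2.0 * width ** 2)) for w in weights]

def mlp_forward(signals, w_hidden, b_hidden, w_out, b_out):
    """Phase 2 model: one tanh hidden layer and a linear output neuron."""
    hidden = [math.tanh(sum(w * s for w, s in zip(row, signals)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical data: 35 training vectors with 6 components, scaled to [0, 1].
rng = random.Random(1)
samples = [[rng.random() for _ in range(6)] for _ in range(35)]

kohonen = train_kohonen(samples)                 # phase 1: self-organization
signals = kohonen_signals(samples[0], kohonen)   # 64 signals fed to the MLP

# Untrained MLP weights, only to show the 64-6-1 shapes; the supervised
# training of phase 2 (e.g. backpropagation) is left out of this sketch.
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(64)] for _ in range(6)]
b_hidden = [0.0] * 6
w_out = [rng.uniform(-0.1, 0.1) for _ in range(6)]
prediction = mlp_forward(signals, w_hidden, b_hidden, w_out, 0.0)
print(len(signals), prediction)
```

Because the Kohonen layer is frozen before the MLP is trained, the supervised phase only has to fit a small 64–6–1 network, which is the reduction in learning complexity that the two-phase procedure aims at.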

4. Neural SVM classifier

As the classifier, the artificial neural network of the SVM type was applied. The SVM solution of Vapnik [7] is known as a very good tool for classification problems, with excellent generalization ability. The SVM neural network structure is presented in Fig. 2. In contrast to classical neural networks, the SVM formulation of the learning problem leads to quadratic programming with linear constraints [8]. Basically, the SVM is a linear machine working in the high-dimensional feature space formed by the non-linear mapping of the n-dimensional input vector $x$ into a K-dimensional feature space ($K > n$) through the use of the function $\varphi(x)$. The equation of the hyperplane separating two different classes is given by

$$y(x) = w^{T}\varphi(x) = 0 \tag{1}$$

where $\varphi(x) = [\varphi_0(x), \varphi_1(x), \ldots, \varphi_K(x)]^{T}$ with $\varphi_0(x) = 1$, and $w$ is the weight vector of the network. A data vector $x$ satisfying the condition $y(x) > 0$ belongs to one class, and a vector with $y(x) < 0$ belongs to the opposite one. The most distinctive fact about SVM is that the learning task is simplified to quadratic programming by introducing so-called Lagrange multipliers. All operations in the learning and testing modes are done in SVM using so-called kernel functions. The kernel is defined as

$$K(x, x_i) = \varphi^{T}(x)\,\varphi(x_i) \tag{2}$$

A polynomial kernel function was used in the calculations:

$$K(x, x_i) = (x^{T}x_i + \gamma)^{p} \tag{3}$$

where $p = 5$ and $\gamma = 0.45$.

Although SVM separates the data into only two classes, the recognition of more classes is straightforward by applying either the 'one against one' or the 'one against all' method. The important advantage of the SVM approach is the transformation of the learning task into a quadratic programming problem. For this type of optimization there exist many very effective learning algorithms, leading in almost all cases to the global minimum of the cost function and to the best possible choice of the parameter values of the neural network.

Fig. 2. Structure of the SVM neural network.

5. The results and discussion

Forty-five unleaded gasoline samples were prepared in such a way that they covered a wide range of gasoline properties (see Table 1). Of these, 35 samples were included in the training dataset and 10 in the test dataset. To cover the whole range of the gasoline fuels in this study, the training dataset included the samples that contained the maximum or minimum values of the inputs and outputs. The remaining 10 gasoline samples were used as the test dataset.

Table 1
Main properties of gasolines used in the research

Property                             Unit     Minimum value   Maximum value
Density at 20 °C                     g/cm3    0.7173          0.8077
Content of n-paraffin hydrocarbons   %, m/m   4.627           7.867
Content of i-paraffin hydrocarbons   %, m/m   18.339          52.151
Content of naphthenes                %, m/m   1.938           26.275
Content of olefins                   %, m/m   0.029           22.713
Content of aromatic hydrocarbons     %, m/m   13.863          71.990
Content of EMTB                      %, m/m   0               0
Content of ethanol                   %, m/m   0               5.00
Content of ETBE                      %, m/m   0               0
Octane number                        –        81.4            99.7

Only the data from gas chromatography were used in the first experiment (see Table 1). The mass percents of the five hydrocarbon types and ethanol identified by GC were used as the neural network inputs and as the independent variables in the linear regression equations. Octane numbers were used as the outputs of the neural network and as the dependent variables of the linear regression correlations. The prediction of the octane numbers was done using the hybrid neural network.

What is particularly important in defining the hybrid network is the proper choice of the number of neurons in each layer. The size of the input layer is dictated by the number of input vector components. In the described case this number is equal to 6 (5 hydrocarbons + ethanol). The number of Kohonen neurons should reflect the complexity of the data distribution. After some experiments, 64 neurons in the Kohonen layer were found to be optimal. The input dimension of the MLP network is equal to the number of neurons in the Kohonen layer. The output dimension of the system is defined by the number of predicted parameters (here only one, the octane number). The number of hidden neurons was adjusted experimentally to obtain the best accuracy of generalization. Experiments with different MLP structures have shown that in this case a hidden layer of six neurons is optimal. So, the structure of the hybrid network (64–6–1) uses only one hidden layer of six neurons.

To estimate the qualitative and quantitative prediction of the octane number by the hybrid network, two kinds of errors have been calculated: the root mean square error (RMS) and the mean absolute error (MAE). The RMS error is defined as

$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{p}\left(\mathrm{ON}_i^{\mathrm{pred}} - \mathrm{ON}_i^{\mathrm{dest}}\right)^{2}}{p}} \tag{4}$$

where $\mathrm{ON}_i^{\mathrm{pred}}$ are the predicted octane numbers, $\mathrm{ON}_i^{\mathrm{dest}}$ are the real measured values of the octane number, and $p$ is the number of samples. The MAE errors express the linear relationship between the errors of different samples. The MAE error is defined as

$$\mathrm{MAE} = \frac{\sum_{i=1}^{p}\left|\mathrm{ON}_i^{\mathrm{pred}} - \mathrm{ON}_i^{\mathrm{dest}}\right|}{p} \tag{5}$$

The relationship between octane number and hydrocarbon composition can be determined by multilinear regression. In this work, the data from gas chromatography were entered in a multilinear regression model. The equation that relates the gas chromatography data to the octane number (ON) is

$$\mathrm{ON} = 2123.8 + 158.7x_1 - 47.5x_2 - 15.5x_3 - 7.5x_4 - 28.3x_5 - 19.9x_6 \tag{6}$$

where $x_1, x_2, x_3, x_4, x_5, x_6$ are the mass percents of the five hydrocarbon types (n-paraffins, i-paraffins, naphthenes, olefins, aromatics) and of ethanol. Table 2 compares the statistical results of the octane number correlation obtained from the hybrid neural network model and the multilinear regression model.

Table 2
Comparison of the statistical results of the octane number correlation obtained from the hybrid neural network model and the multilinear regression model

Error    Hybrid neural network (test data)   MLR method (test data)
RMS      0.3584                              0.2056
MAE      0.2747                              0.1781
MaxErr   0.7797                              0.3238

5.1. Data clustering

In the second experiment, the IR (FTIR) spectra obtained from the same 45 samples were used. A plot of the IR (FTIR) spectra is shown in Fig. 3a. The results of the test of maximal variance of peak absorption are shown in Fig. 3b.

Fig. 3. Plot of 45 gasolines FTIR spectra (a), variance of absorption (b).
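The paper does not spell out how the test of maximal variance of the peak absorption is carried out. One plausible reading, shown here as a hedged Python sketch, is to compute the variance of the absorbance in every spectral channel across all samples and to keep the channels where that variance is largest. The function names and the tiny 4 × 5 data set are hypothetical; real spectra would have one row per gasoline sample and one column per channel or wavelet coefficient.

```python
from statistics import pvariance

def channel_variances(spectra):
    """Variance of the absorbance in each channel across all samples."""
    return [pvariance(channel) for channel in zip(*spectra)]

def top_variance_channels(spectra, k):
    """Indices of the k channels with the largest variance, in channel order."""
    variances = channel_variances(spectra)
    ranked = sorted(range(len(variances)),
                    key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])

# Hypothetical absorbances: 4 samples (rows) by 5 spectral channels (columns).
spectra = [
    [0.10, 0.50, 0.20, 0.90, 0.30],
    [0.10, 0.70, 0.20, 0.10, 0.30],
    [0.10, 0.30, 0.21, 0.50, 0.30],
    [0.10, 0.50, 0.19, 0.50, 0.30],
]

selected = top_variance_channels(spectra, k=2)
print(selected)
```

Channels whose absorbance barely changes from sample to sample carry little information for telling the gasolines apart, which is the motivation for ranking by variance.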



After feature extraction, six absorbance peaks at the wave numbers 1029.88, 1216.99, 1460.92, 1495.64, 1605.57 and 3026.95 cm⁻¹ were chosen.

The gasoline samples were divided into six different categories of quality according to their octane number value (see Table 3).

Table 3
Division of gasoline into six different categories of quality according to its octane number value

Octane number value   Class number
Min: 78; Max: 81      1
Min: 82; Max: 86      2
Min: 87; Max: 91      3
Min: 92; Max: 94      4
Min: 95; Max: 97      5
Min: 98; Max: 100     6

The use of PCA and FCM cluster analysis to explore clustering within the datasets is now discussed. These cluster classification methods were applied to help us explore the existence of the expected clusters in the feature space. As a result, it was possible to verify that the categories or clusters identified by each of these methods were not arbitrary.

Fig. 4 presents the distribution of the pre-processed FTIR spectra (where one spectrum is represented as the six-component feature vector) mapped on the three most important principal components PC1, PC2 and PC3. Notice that PCA is used only to visualize the dataset in the feature space. PCA was used to investigate the clustering of the spectrum vectors in the multi-dimensional feature space. Three principal components were kept because they accounted for 99.98% of the variance in the dataset. Six distinct gasoline categories can be seen. It also became clearly evident that the separation of the classes is complicated when the linear method (PCA) is applied. Next, FCM (a non-linear method) [9] was used, which enabled us to understand better the nature of the investigated data presented in the feature space. With the FCM approach, a cluster center is found for each group. These cluster centers were plotted in the multi-feature space. By combining the 3D (PC1–PC2–PC3) plot and FCM, the cluster centers were properly located in the multi-feature space. Using these two data clustering algorithms simultaneously, a better 'representation' of the data in different clusters was achieved.

Fig. 4. PCA (3D) plot of the IR spectra (axes PC1, PC2, PC3) and cluster centers 'C' from the fuzzy C means method.

5.2. Neural network-classification

The datasets were analyzed using the SVM neural classifier. The aim of this study was to obtain a highly effective classifier, which can be trained with the best accuracy, to predict the 'quality of gasoline samples'. In the analysis, an SVM with a polynomial kernel function of degree five was applied. Moreover, the optimal regularization parameter C of the SVM was found experimentally by minimizing the leave-one-out error over the training set, which provides an estimate of the generalization performance of the final classifier. Some introductory experiments allowed us to find the optimal value of the penalty coefficient, C = 100, and the number of support vectors, 23. The results show the superiority of the SVM-based classifier for the complex dataset and the suitability of the leave-one-out training process for small training sets and for avoiding outliers in the learning of the functional input–output mapping. Table 4 shows the confusion matrix of the SVM classifier for the training dataset and Table 5 for the testing dataset. Rows indicate the true classes and columns the predicted ones. As can be noted in the tables, the accuracy of the classification was 100% in all cases (in terms of 'percentage correct classification').

Table 4
Confusion matrix for SVM classification results, true vs. predicted (rows vs. columns), for the training dataset (35 samples)

Octane number   Predicted class number
(and class)     1    2    3    4    5    6
78–81 (1)       3    0    0    0    0    0
82–86 (2)       0    7    0    0    0    0
87–91 (3)       0    0    9    0    0    0
92–94 (4)       0    0    0    10   0    0
95–97 (5)       0    0    0    0    4    0
98–100 (6)      0    0    0    0    0    2

Table 5
Confusion matrix for SVM classification results, true vs. predicted (rows vs. columns), for the test dataset (10 samples)

Octane number   Predicted class number
(and class)     1    2    3    4    5    6
78–81 (1)       0    0    0    0    0    0
82–86 (2)       0    1    0    0    0    0
87–91 (3)       0    0    4    0    0    0
92–94 (4)       0    0    0    2    0    0
95–97 (5)       0    0    0    0    1    0
98–100 (6)      0    0    0    0    0    2
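Two pieces of the classification set-up are simple enough to sketch in Python: the polynomial kernel of Eq. (3) with the reported values p = 5 and γ = 0.45, and the 'percentage correct classification' read off a confusion matrix such as Table 5. This is an illustration, not the authors' SVM code: the function names and the two six-component feature vectors are hypothetical, and the quadratic-programming training step is not shown.

```python
def polynomial_kernel(x, xi, gamma=0.45, p=5):
    """Eq. (3): K(x, xi) = (x^T xi + gamma)^p with p = 5, gamma = 0.45."""
    return (sum(a * b for a, b in zip(x, xi)) + gamma) ** p

def percent_correct(confusion):
    """Percentage of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total

# Table 5 (test dataset): rows are true classes, columns are predicted classes.
table5 = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 4, 0, 0, 0],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 2],
]

# Two hypothetical six-component feature vectors (not measured spectra).
x = [0.2, 0.1, 0.4, 0.3, 0.5, 0.1]
xi = [0.1, 0.3, 0.2, 0.2, 0.4, 0.6]

print(polynomial_kernel(x, xi))
print(percent_correct(table5))

# With the 'one against one' scheme, six classes need one binary SVM per pair.
n_classes = 6
n_pairwise = n_classes * (n_classes - 1) // 2
print(n_pairwise)
```

All ten test samples lie on the diagonal of Table 5, so the function returns 100%, in agreement with the accuracy stated in the text.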

6. Conclusions

In this paper, an attempt has been made to discriminate between different gasoline samples using an artificial intelligence approach. In the first experiment, the hybrid neural network was used to correlate and predict the octane number of gasoline samples from their chemical composition (gas chromatography data). A predictive equation for the octane number of gasoline from its chemical composition was also developed with a standard multiple linear regression model. The comparison of the neural network method with the multilinear regression method on this dataset indicated that the accuracy of the octane number prediction was similar in both cases. In conclusion, it is believed that the linear regression model is adequate to predict the octane number from the gas chromatography data.

In the second experiment, the neural network classifier (SVM) was used to classify the gasoline samples according to their IR spectra. Six different arbitrary categories of gasoline were introduced according to the octane number values. These six categories of gasoline were identified with the help of PCA and FCM analysis. As can be seen, all six classes can be distinguished; however, the distribution of the clusters forming individual classes varies and depends on the chemical composition of the samples. It is clearly evident that the separation of the classes is a highly complicated nonlinear problem. In conclusion, it is believed that the nonlinear approach (SVM classifier) is optimal in this case. With the SVM classifier, up to 100% accuracy of the classification was achieved. Nevertheless, further work is needed to assess the correlations between the IR spectra and the octane number value.

References

[1] Walsh RP, Mortimer JV. Hydrocarbon Process 1971;50:153–8.
[2] Le TT, Allen DT. Fuel 1985;1754–9.
[3] Meusinger R, Moros R. Fuel 2001;80:613–21.
[4] Matlab 6p5, Toolbox: wavelet.
[5] Brudzewski K, Osowski S. Sens Actuators B 1999;55:38–46.
[6] Brudzewski K, Osowski S, Markiewicz T. Sens Actuators B 2004;291–8.
[7] Vapnik V. Statistical learning theory. New York: Wiley; 1998.
[8] Burges C. A tutorial on support vector machines for pattern recognition. In: Fayyad U, editor. Knowledge discovery and data mining, Kluwer, 2000. p. 1–43.
[9] Jang JSR, Sun CT, Mizutani E. Neuro-fuzzy and soft computing: a computational approach to learning and machine intelligence. Upper Saddle River, NJ: Prentice-Hall; 1997. p. 423–33.
