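Or
minimize $R = \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i} (\xi_i + \xi_i^{*})$
Under the constraints,
$y_i - w^{T} x_i - b \le \varepsilon + \xi_i$
$w^{T} x_i + b - y_i \le \varepsilon + \xi_i^{*}$
$\xi_i \ge 0, \quad \xi_i^{*} \ge 0.$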
SVR General Steps
1. Define the problem as classification or regression.
2. Standardize the input data.
3. Check for outliers, i.e. the strange data points.
4. Select the kernel function in order to transform the
data to a higher dimensional feature space. One of
the common kernels considered is Radial Basis
Function (RBF) kernel.
5. Select the shape parameter, i.e. the smoothing parameter
of the kernel function: the polynomial degree for the
polynomial kernel, or the variance for the Gaussian
RBF kernel.
6. Choose the penalty parameter C and the desired accuracy
by defining the insensitivity zone ε.
7. Solve the quadratic programming problem in the 2 × L
variables for the corresponding regression task.
8. Train the model and validate it on a previously unseen
dataset. If the validation result is not satisfactory, repeat
steps 4 to 8.
9. Since searching individually for C, ε and the shape
parameter can be tedious and time consuming, an
alternative approach is cross-validation with a grid
search to find the best value of the cost parameter.
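As an illustration of step 2 only, the sketch below standardizes one input column to zero mean and unit standard deviation. The function name `standardize` and the sample grades are hypothetical, and the population standard deviation is an assumption; the slides do not say which scaling was used.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(column)
    sd = pstdev(column)  # population standard deviation (an assumption)
    if sd == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sd for v in column]


# Hypothetical silver grades, for illustration only.
silver = [0.2, 0.5, 0.9, 1.4, 3.0]
z_scores = standardize(silver)
print([round(v, 3) for v in z_scores])
```

The same scaling parameters (mean and standard deviation of the training data) would normally be reused for the calibration and validation subsets.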
University of Alaska Fairbanks
Approach
Three data division techniques were
investigated
Random Division
Genetic Algorithm
Kohonen Network
Appropriate model developed for Ore
Reserve Estimation.
Study Area and Data Characteristics
Greens Creek mine located in Southeast
Alaska
Polymetallic ore body (silver, zinc, gold, and
lead)
Data available in terms of easting (x) and
northing (y) co-ordinates (in m), gold, silver,
lead, zinc and copper content (in ppm).
Comparative Analysis in a Lode
Deposit
Data obtained from the Greens Creek mine.
432 exploratory borehole observations (x, y, gold,
silver, lead, zinc, copper).
Silver values were estimated.
Training= 216; Calibration= 108; Validation=108.
GA was used to obtain the model data subsets.
Modeling
A split sampling approach was carried out.
The NN requires three datasets and the SVM two.
Random data division resulted in dissimilar
datasets.
Genetic algorithms were therefore applied.
Data Modeling
Data divided into three statistically similar subsets employing genetic algorithms (GA).

Variable   Training            Calibration         Validation
           Mean      SD        Mean      SD        Mean      SD
X          5541.92   409.34    5558.50   416.80    5567.85   429.48
Y          3752.67   541.29    3707.91   494.98    3670.76   520.68
Gold       0.03      0.06      0.03      0.05      0.03      0.06
Lead       0.15      0.28      0.13      0.23      0.14      0.26
Zinc       3.41      7.38      3.74      5.02      3.03      4.03
Copper     2.89      3.91      2.75      3.39      2.69      3.53
Silver     0.96      1.43      0.92      1.18      0.89      1.20

Statistical Properties of the Greens Creek model datasets
Data Modeling
For NN modelling, the commercially available
software package Neuroshell was used
Several network architectures investigated
Final architecture consisted of 5 slabs (one input
slab, one output slab and three hidden slabs)
Histogram plot for the silver values
Snapshot of the semi-variogram
modeling on the variable silver
NN Modeling
For this modelling exercise, the network comprised
5 slabs: one input slab, 3 hidden slabs and 1
output slab (a slab is basically a group of neurons; a
particular layer may have multiple slabs). Each slab
in the hidden layer and the output layer used a
different activation function. The input slab has six
neurons, one for each of the input variables, while the
output slab has one neuron for the silver values as
the output variable. The slabs in the hidden layer
have 8, 6 and 8 neurons respectively.
NN Architecture
[Figure: Ward Net architecture for the NN modeling. Labels in the figure: Input, Output, Slab 1 to Slab 5, and the activations Linear, Tanh, Gaussian and Gaussian Complementary.]
SVR Modeling
For the SVM modelling, a grid-based approach with 10-fold
cross-validation on the training dataset was employed to select
the optimal model parameters C, σ and ε.
The following figure shows the model performance
(troughs and flat regions) for different combinations of the C
and σ values.
The cross-validation MSE was used as the criterion to select the
optimum parameter values of C, σ and ε.
The flat regions correspond to the various possible
combinations for the optimal values of C and σ. The optimal
estimates of C and σ were found to be 2.5 and 0.5 respectively.
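The grid search described above can be sketched as an exhaustive loop over candidate (C, σ) pairs that keeps the pair with the lowest cross-validation error. This is a minimal sketch: `toy_cv_error` is a hypothetical stand-in for the 10-fold cross-validation MSE of a trained SVR model, shaped so that its minimum sits at the values reported in the slides. It is not the real error surface, and the grid values are also assumptions.

```python
from itertools import product


def grid_search(cv_error, c_values, sigma_values):
    """Return (best_error, best_C, best_sigma) over all grid combinations."""
    best = None
    for c, sigma in product(c_values, sigma_values):
        err = cv_error(c, sigma)
        if best is None or err < best[0]:
            best = (err, c, sigma)
    return best


def toy_cv_error(c, sigma):
    """Hypothetical bowl-shaped error surface (NOT the real SVR error)."""
    return (c - 2.5) ** 2 + (sigma - 0.5) ** 2 + 0.2


# Hypothetical candidate grids for the cost and the kernel width.
c_grid = [0.5, 1.0, 2.5, 5.0, 10.0]
sigma_grid = [0.1, 0.25, 0.5, 1.0, 2.0]

best_err, best_c, best_sigma = grid_search(toy_cv_error, c_grid, sigma_grid)
print(best_c, best_sigma, best_err)
```

In a real study `cv_error` would fit the SVR on nine folds and score it on the tenth for every grid point, which is why the search is expensive and why flat regions of the error surface are convenient.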
Effect of the cost and kernel width on the error for the silver values
Variation of error with epsilon for the variable silver.
SVR Modeling
Once the optimum values of these parameters were
determined, the next step involved the selection of an
optimum value of ε. This was selected by fixing the
values of C and σ at their optimum values, while
varying the parameter ε. This exercise was also
carried out through a cross-validation study on the
training data set. The previous figure showed the
variation of the mean squared error with respect to
the parameter ε for the training dataset.
SVR Modeling
Following this exercise, the optimum model
parameter values of C, σ and ε for the silver
values were found to be 2.5, 0.5 and 0.05
respectively. The final step involved the
assessment of the model's generalization
ability through the examination of the
generalization error on the prediction data
set.
Data Modeling
- SVM modeling was performed using R.
- A grid-based approach with 10-fold cross-validation on the training dataset was employed to select the optimal model parameters C, σ and ε.
- The cross-validation MSE was used as the criterion to select the optimum parameter values.
- The optimum model parameter values of C, σ and ε for the silver values were found to be 2.5, 0.5 and 0.05.
Results
Silver values predicted for 108 observations
Model performance evaluated based on a
summary statistic, termed the skill value
skill value = abs(ME) + MAE + RMSE + (1 - RSQ)
where
ME = mean error,
MAE = mean absolute error,
RMSE = root mean squared error,
RSQ = coefficient of determination.
SKILL VALUE
This summary statistic, termed the skill value, is an
entirely subjective measurement. One can possibly
devise numerous skill measures; however, the one
proposed here is quite simple and weights the ME,
MAE, RMSE equally and applies a scaling to the
RSQ so that it is of the same order of magnitude as
the other components. It should be noted that the
lower the skill value, the better the method is.
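The skill value can be written as a short function. The sketch below is illustrative: `summary_stats` assumes that errors are taken as predicted minus true and that RSQ is computed as 1 - SS_res / SS_tot, neither of which is stated in the slides. The final lines apply `skill_value` to the SVM, NN and OK statistics reported for silver in the results.

```python
import math


def skill_value(me, mae, rmse, rsq):
    """skill value = abs(ME) + MAE + RMSE + (1 - RSQ); lower is better."""
    return abs(me) + mae + rmse + (1.0 - rsq)


def summary_stats(true, pred):
    """Return (ME, MAE, RMSE, RSQ) under the assumptions stated above."""
    n = len(true)
    errors = [p - t for t, p in zip(true, pred)]
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    rsq = 1.0 - ss_res / ss_tot
    return me, mae, rmse, rsq


# Reported (ME, MAE, RMSE, RSQ) for silver, per method.
reported = {
    "SVM": (0.02, 0.25, 0.48, 0.91),
    "NN": (0.08, 0.36, 0.72, 0.79),
    "OK": (0.25, 0.64, 1.04, 0.59),
}
for name, stats in reported.items():
    print(name, round(skill_value(*stats), 2))
```

Because the four components are simply added, a method with a small bias but a poor RSQ can still rank badly, which matches the equal weighting described above.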
Results

Statistics (Silver)        SVM     NN      OK
Mean Error                 0.02    0.08    0.25
Mean Absolute Error        0.25    0.36    0.64
Root Mean Squared Error    0.48    0.72    1.04
RSQ                        0.91    0.79    0.59

Generalization performance of the models for the variable Silver
Results

Statistics (Silver)    SVM     NN      OK
Skill value            0.84    1.37    2.34
Rank                   1       2       3

Model performances based on the skill values
Results
[Figure: True vs. Predicted (NN); x-axis: True, y-axis: Predicted; R² = 0.793]
[Figure: True vs. Predicted (SVM); x-axis: True, y-axis: Predicted; R² = 0.9034]
[Figure: True vs. Predicted (OK); x-axis: True, y-axis: Predicted; R² = 0.5927]
Results: Discussion
It can be seen from the plots that the
SVM method outperforms the
other two methods.
To further investigate the performance of
the models, the prediction error distribution
plots for the OK, NN and SVM methods
were analysed.
Results
[Figure: Error distribution for the Silver values (NN)]
[Figure: Error distribution for the Silver values (SVM)]
[Figure: Error distribution for the Silver values (OK)]
Results
It can be noted that the error distributions of the silver values
for the SVM model and the NN model approximate
a normal distribution, whereas for the OK method the
distribution is closer to a lognormal shape.
Normally distributed model errors are generally
preferred. Thus, the lognormal error distribution of the
OK method could be seen as a disadvantage. This is
particularly significant where uncertainty analysis is
conducted.
Conclusions
SVM produced better estimates for the silver values
in the lode deposit.
In general, MLAs can be used for the purpose of
predictive mapping such as ore reserve estimation if
data is used sensibly. These methods are
comparatively fast if the dataset is small.
Genetic Algorithm (GA) for Data Division
Optimization technique
based on the theory of
genetics and natural
selection.
Performs reproduction,
cross-over and mutation
operations on each solution
of the successive iterations
to generate the final
solution.
Principal stages of Genetic Algorithms:
Generate initial population -> Assess fitness value -> Reproduce population -> Crossover population -> Mutate population -> Final population
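The stages above can be illustrated with a deliberately small genetic algorithm that divides one variable into two statistically similar subsets. This is a simplified sketch, not the procedure used in the study: the study produced three subsets over several variables, while here each chromosome is a list of 0/1 subset labels, and the fitness function, population size, mutation rate and sample grades are all assumptions.

```python
import random
from statistics import mean, pstdev


def fitness(labels, values):
    """Lower is better: difference in mean and SD plus a size-imbalance penalty."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    if len(a) < 2 or len(b) < 2:
        return float("inf")
    size_penalty = abs(len(a) - len(b)) / len(values)
    return abs(mean(a) - mean(b)) + abs(pstdev(a) - pstdev(b)) + size_penalty


def ga_split(values, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    n = len(values)
    # Generate initial population of random label vectors.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Assess fitness and reproduce: keep the fitter half unchanged.
        pop.sort(key=lambda ind: fitness(ind, values))
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each label with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    # Final population: return its best member.
    return min(pop, key=lambda ind: fitness(ind, values))


# Hypothetical skewed grades, for illustration only.
grades = [0.1 * i * i for i in range(20)]
best_labels = ga_split(grades)
print(best_labels, round(fitness(best_labels, grades), 4))
```

Keeping the fitter half of each generation unchanged means the best fitness never gets worse from one iteration to the next.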
Kohonen Map for Data Division
Unsupervised Learning
technique.
Identifies the various features
existing in the data by grouping
the similar features into a
cluster.
Sampling done from these
clusters to ensure proper
representation in the subsets.
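Stages in Kohonen Mapping
Competition: the winning unit $c$ satisfies $\lVert x - w_c \rVert = \min_i \lVert x - w_i \rVert$
Cooperation: a neighbourhood $N_c(t)$ is defined around the winning unit
Weight Update:
$w_i(t+1) = w_i(t) + \alpha(t)\,(x(t) - w_i(t))$ if $i \in N_c(t)$
$w_i(t+1) = w_i(t)$ if $i \notin N_c(t)$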
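A single Kohonen training step (competition, cooperation, weight update) might look like the sketch below. It assumes a one-dimensional chain of units and a neighbourhood made of all units within `radius` positions of the winner. The function names, the chain topology and the sample numbers are illustrative assumptions, not taken from the slides.

```python
import math


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def winner(x, weights):
    """Competition: index of the unit whose weight vector is closest to x."""
    dists = [distance(x, w) for w in weights]
    return dists.index(min(dists))


def som_step(x, weights, alpha, radius):
    """Return (winner index, updated weights) after presenting one sample x."""
    c = winner(x, weights)
    updated = []
    for i, w in enumerate(weights):
        if abs(i - c) <= radius:
            # Cooperation + weight update: move units in N_c towards x.
            updated.append([wi + alpha * (xi - wi) for wi, xi in zip(w, x)])
        else:
            # Units outside the neighbourhood keep their weights.
            updated.append(list(w))
    return c, updated


weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
c, new_weights = som_step([1.0, 2.0], weights, alpha=0.5, radius=0)
print(c, new_weights)
```

In full training both the learning rate and the neighbourhood radius usually shrink over time, so that early steps order the map and later steps fine-tune it.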
Background (Contd.)
Split Sampling
Data split into at least two subsets:
a training subset and a prediction subset.
Similar datasets can be obtained.
Merits: good with large data.
Demerits: larger variance; the data division has to be proper
(otherwise the model is "trained in English and tested in French").
Standard linear regression equation: $y = w^{T} x + b$
The linear case is a special case of the
nonlinear regression equation $y = f(x)$
Support vector regression
Idea: we define a "tube" of radius ε around the regression (ε ≥ 0)
No error if y lies inside the "tube" or "band"
Support Vector Regression
Goal is to determine the functional dependency $y = f(x)$; in the linear case $y = w^{T} x + b$
- A novel loss function, termed Vapnik's linear loss function with
- ε-insensitivity zone, is introduced.
- An error tube of thickness ε is defined around the regression line (ε ≥ 0)
We therefore define an ε-insensitive loss function
$L_1$: $L^{1}_{\varepsilon}(x, y, f(x)) = |y - f(x)|_{\varepsilon} = \max(0, |y - f(x)| - \varepsilon)$
$L_2$: $L^{2}_{\varepsilon}(x, y, f(x)) = |y - f(x)|^{2}_{\varepsilon} = [\max(0, |y - f(x)| - \varepsilon)]^{2}$
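The two ε-insensitive losses translate directly into code. In this minimal sketch the function names are illustrative, and the sample targets and predictions are made up; only ε = 0.05 comes from the slides, where it is the tube width reported for silver.

```python
def eps_insensitive_l1(y, fx, eps):
    """Linear epsilon-insensitive loss: max(0, |y - f(x)| - eps)."""
    return max(0.0, abs(y - fx) - eps)


def eps_insensitive_l2(y, fx, eps):
    """Quadratic epsilon-insensitive loss: the square of the linear one."""
    return eps_insensitive_l1(y, fx, eps) ** 2


eps = 0.05
inside = eps_insensitive_l1(1.00, 1.03, eps)      # prediction inside the tube
outside_l1 = eps_insensitive_l1(1.00, 1.30, eps)  # prediction outside the tube
outside_l2 = eps_insensitive_l2(1.00, 1.30, eps)
print(inside, outside_l1, outside_l2)
```

Predictions within ε of the target cost nothing, so only points on or outside the tube influence the fitted function.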
Support Vector Regression (Contd..)
Slack variables $\xi_i$ are defined for each observation:
$\xi_i = \max(0, |y_i - f(x_i)| - \varepsilon) = L^{1}_{\varepsilon}(x_i, y_i, f(x_i)) = |y_i - f(x_i)|_{\varepsilon}$
Support vector regression
Kernels are used to linearize the problems in
conditions of non-linearity
SVR (contd..)
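Classic quadratic optimization problem
Can be solved by using Lagrange multipliers following the Karush-Kuhn-Tucker (KKT) conditions. New primal objective function is
$\min L_p(w, b, \xi_i, \xi_i^{*}, \alpha_i, \alpha_i^{*}, \eta_i, \eta_i^{*}) = \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_i (\xi_i + \xi_i^{*}) - \sum_i (\eta_i^{*} \xi_i^{*} + \eta_i \xi_i) - \sum_i \alpha_i^{*} (y_i - w^{T} x_i - b + \varepsilon + \xi_i^{*}) - \sum_i \alpha_i (w^{T} x_i + b - y_i + \varepsilon + \xi_i)$
- At the optimal point the first derivatives w.r.t. the primal variables vanish:
$\partial L_p / \partial w = w_o - \sum_i (\alpha_i - \alpha_i^{*}) x_i = 0$
$\partial L_p / \partial b = -\sum_i (\alpha_i - \alpha_i^{*}) = 0$
$\partial L_p / \partial \xi_i = C - \alpha_i - \eta_i = 0$
$\partial L_p / \partial \xi_i^{*} = C - \alpha_i^{*} - \eta_i^{*} = 0$
- The KKT complementarity conditions also hold:
$\alpha_i (w^{T} x_i + b - y_i + \varepsilon + \xi_i) = 0$
$\alpha_i^{*} (y_i - w^{T} x_i - b + \varepsilon + \xi_i^{*}) = 0$
$\eta_i \xi_i = (C - \alpha_i)\, \xi_i = 0$
$\eta_i^{*} \xi_i^{*} = (C - \alpha_i^{*})\, \xi_i^{*} = 0$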
SVR (contd..)
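It can be expressed in dual form $(\alpha_i, \alpha_i^{*})$ by
substituting the KKT conditions.
$\max L_d(\alpha_i, \alpha_i^{*}) = -\tfrac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, x_i^{T} x_j - \varepsilon \sum_i (\alpha_i + \alpha_i^{*}) + \sum_i (\alpha_i - \alpha_i^{*})\, y_i$
subject to
$\sum_i (\alpha_i - \alpha_i^{*}) = 0$
$0 \le \alpha_i \le C$
$0 \le \alpha_i^{*} \le C$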
SVR (contd..)
Optimization will yield L $(\alpha_i, \alpha_i^{*})$ pairs, one for each training
pattern.
Patterns with non-zero $\alpha_i$ or $\alpha_i^{*}$ are support vectors (SV).
Complexity proportional to number of SVs.
The best regression hyperplane is given by
$f(x, w) = w_o^{T} x + b = \sum_i (\alpha_i - \alpha_i^{*})\, x_i^{T} x + b$
bias: average of
$b = y_i - w_o^{T} x_i - \varepsilon$ for $0 < \alpha_i < C$
$b = y_i - w_o^{T} x_i + \varepsilon$ for $0 < \alpha_i^{*} < C$
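Once the multipliers are known, prediction only needs the support vectors. The sketch below evaluates the linear regression function as a sum over training patterns. The multiplier values, the bias and the function name `svr_predict` are hypothetical; they are chosen by hand so that the multiplier differences sum to zero, and do not come from solving the quadratic program.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svr_predict(x, train_x, alpha, alpha_star, b):
    """f(x) = sum_i (alpha_i - alpha_i*) * x_i^T x + b (linear SVR)."""
    total = 0.0
    for xi, a, a_star in zip(train_x, alpha, alpha_star):
        if a != a_star:  # patterns with equal (here zero) multipliers drop out
            total += (a - a_star) * dot(xi, x)
    return total + b


# Hypothetical solution: two support vectors, sum of (alpha - alpha*) is zero.
train_x = [[1.0, 0.0], [0.0, 1.0]]
alpha = [0.5, 0.0]
alpha_star = [0.0, 0.5]
b = 0.1

prediction = svr_predict([2.0, 1.0], train_x, alpha, alpha_star, b)
print(prediction)
```

Replacing `dot` with a kernel function gives the nonlinear version, and the cost of each prediction grows with the number of support vectors.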
SVR (contd..)
In nonlinear SVR, kernels are used (such as polynomial, RBF)
Same concept as the linear.
Parameters of the model are:
- C (penalty parameter), ε (error tube thickness), σ (RBF kernel width
when used).
- Optimal parameters can be selected by cross validation techniques.
Basic kernels for vectorial data:
Linear kernel: $K(x_i, x_j) = x_i' x_j$
(feature space is Q-dimensional if Q is the dim of $x$; map is
identity!)
RBF-kernel: $K(x_i, x_j) = \exp\left(-\dfrac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$
(feature space is infinite dimensional)
Polynomial kernel of degree two: $K(x_i, x_j) = (x_i' x_j)^{2}$
(feature space is d(d+1)/2 dimensional if d is the dim of $x$)
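The three basic kernels can be written as plain functions, as in this illustrative sketch. The function names and the sample vectors are assumptions; σ = 0.5 is the kernel width reported for silver.

```python
import math


def linear_kernel(xi, xj):
    """K(xi, xj) = xi' xj"""
    return sum(a * b for a, b in zip(xi, xj))


def rbf_kernel(xi, xj, sigma):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly2_kernel(xi, xj):
    """Homogeneous polynomial kernel of degree two: (xi' xj)^2"""
    return linear_kernel(xi, xj) ** 2


u = [1.0, 2.0]
v = [3.0, 4.0]
print(linear_kernel(u, v), poly2_kernel(u, v), rbf_kernel(u, v, sigma=0.5))
```

The RBF value is 1 for identical points and decays towards 0 with distance; a smaller σ makes that decay faster and the fitted function more local.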
K-fold Cross-validation
Learning typically involves training and testing the model. This can be done using two
approaches:
(1) Split sample approach
(2) K-fold cross-validation approach.
K-fold cross validation approach is typically the best method of developing a
learning model under conditions of sparse data (Hastie et al., 2001).
Background (Contd.)
K-fold Cross-validation
Data split into K roughly equal-sized parts; for example, with
K = 5 we have:
Train | Test | Train | Train | Train
The model is fitted using the other K - 1 parts and the error is estimated
on the k-th part. We do this for k = 1, 2, ..., K and combine the
prediction errors to estimate the model error.
Merits: can be good under conditions of data sparseness.
Demerits: more training time, imprecise way to measure the
accuracy, model data subsets may not be similar.
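The fold bookkeeping can be sketched as follows. This is a minimal illustration: folds are contiguous blocks of indices (in practice the data would usually be shuffled first), and the "model" in `cross_val_mse` is just the training mean, a hypothetical placeholder for a real learner such as the SVR.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k roughly equal contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_mse(y, k):
    """Combine the squared errors of all folds into one MSE estimate."""
    squared_errors = []
    for train, test in k_fold_indices(len(y), k):
        # Placeholder model: predict the mean of the training targets.
        prediction = sum(y[i] for i in train) / len(train)
        squared_errors.extend((y[i] - prediction) ** 2 for i in test)
    return sum(squared_errors) / len(squared_errors)


folds = list(k_fold_indices(10, 5))
print(folds[1])  # (train indices, test indices) for the second fold
print(cross_val_mse([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], 5))
```

Every observation is used for testing exactly once, which is what makes the approach attractive when data are sparse.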
Key Steps in Modeling
Using the learning curve, estimate the number of folds needed in
cross-validation
For the SVM, select a kernel and estimate the optimal values of sigma
(width of the kernel) and the cost parameter C using a grid search.
Using the optimal cost and sigma values, train the model and validate
using the k-fold cross validation approach.
Model Development
For NN modeling, a Ward Net architecture with
three slabs in the hidden layer was used.
The slabs in the hidden layer had 8, 6, 8
neurons.
For SVM modeling a grid based approach
with 10 fold cross validation was used.
Background (contd.)
Model evaluation is necessary.
Split sampling (or holdout method)
Cross validation (k-cross, leave-one-out)
Model learns based on training data subset.
Performance evaluation is done based on
generalization on validation data subset.
For reliable performance evaluation, validation
subset of the data should have similar statistical
properties as the training data.
NN Modeling
Various network architectures with different numbers of hidden
layers and neurons in each layer were investigated prior to the
selection of an architecture with 9 hidden neurons.
The purpose behind this modelling exercise was to ensure that
the model is neither over-fitted nor under-fitted.
Over-fitting of a NN model arises when
there are too many neurons in the hidden layer, so that
the network performs exceptionally well on the training
dataset but does not generalize well. Under-fitting, on the
other hand, arises when there are too few neurons, so that
the network shows both high training error and high
generalization error.
NN Modeling
The three slabs in the hidden layer use three
different activation functions viz. tanh, gaussian and
complementary gaussian whereas the output layer
slab uses a linear activation function. The concept
behind using different combinations of the activation
functions is to identify various patterns in the
dataset. A particular activation function may be more
suitable for a few typical patterns; however, it may
not work at all for other patterns. Thus, the use of
different activation functions ensures that at least
some of the underlying trends in the data are
captured.
NN Modeling
For example, a gaussian activation function in one
hidden slab may detect features in the mid-range of
the data while a gaussian complement activation
function in another hidden slab may detect features
from the upper and the lower extremes of the data.
Similarly, a tanh activation function will tend to group
together data at the low and the high ends of the
original data range. This may be helpful in reducing
the effects of outliers. Implementation of these
features in the output layer may result in better
predictions.
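The behaviour described above can be checked numerically with the three hidden-slab activation functions. The formulas below are common textbook forms and are an assumption here: exp(-x²) for the Gaussian and 1 - exp(-x²) for its complement. The slides do not give the exact expressions used by the software.

```python
import math


def tanh_act(x):
    """Hyperbolic tangent: saturates near -1 and +1 at the extremes."""
    return math.tanh(x)


def gaussian_act(x):
    """Gaussian (assumed form): responds most strongly near x = 0 (mid-range)."""
    return math.exp(-x * x)


def gaussian_complement_act(x):
    """Gaussian complement (assumed form): responds to the low and high extremes."""
    return 1.0 - math.exp(-x * x)


for x in (-3.0, 0.0, 3.0):
    print(x,
          round(tanh_act(x), 3),
          round(gaussian_act(x), 3),
          round(gaussian_complement_act(x), 3))
```

At x = 0 the Gaussian slab is fully active while its complement is silent, and at the extremes the roles are reversed, so the two slabs cover different parts of the input range.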
Results
Optimum model parameter values of C, σ and ε for
the silver values were found to be 2.5, 0.5 and 0.05.
Effect of the cost and kernel width on the error for the silver
values
Variation of error with epsilon for the variable silver.