
ALGORITHMS FOR NONLINEAR PROCESS

MONITORING AND CONTROLLER PERFORMANCE


RECOVERY WITH AN APPLICATION TO
SEMI-AUTOGENOUS GRINDING

by

Kenneth Scott McClure

B.A.Sc., The University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE


REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

The Faculty of Graduate Studies

(Chemical and Biological Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA


(Vancouver)

May 2013

© Kenneth Scott McClure, 2013
Abstract

Chemical and mineral processing industries commonly commission linear feedback


controllers to control unit processes over a narrow and linear operating region where
the economy of the process is maximized. However, most of these processes are non-
linear outside of this narrow operating region. In the event of a large unmeasured
disturbance, a process can shift away from nominal and into an abnormal operating
region. Owing to the nonlinearity of these processes, a linear controller tuned for
the nominal operating region will perform poorly and possibly lead to an unstable
closed-loop system in an abnormal operating region. Moreover, it is often difficult to
determine whether a process has shifted to an abnormal operating region if none of
the constraints on the measured process outputs are violated. In these events, it is the
operator who must detect and recover the process, and this manual response results
in a sub-optimal recovery.
This thesis develops and demonstrates a control strategy that monitors several
process variables simultaneously and provides an estimate of the process shift to a
nonlinear abnormal operating region where a linear recovery controller is implemented
to recover the process back to nominal. To monitor the process, principal component
analysis is proposed for process shifts that can be detected by linear variable trans-
formations. Alternatively, for nonlinear or high-dimensional processes, locally linear
embedding is proposed.
Once a process shift to an abnormal operating region is detected, the control strat-
egy uses the estimate of the process shift in a recovery controller to optimally recover
the process. In the event the linear recovery controller is unable to recover the pro-
cess, an expert system overrides the recovery controller to return the process to a
recoverable region.
A case study on a semi-autogenous grinding mill at Gibraltar Mine in British
Columbia presents the successful application of the control strategy to detect and
recover the mill from overloading. Portions of this control strategy have been imple-
mented at Gibraltar Mine and have provided the operators with a real-time estimate
on the magnitude of the mill overload. In addition, it has been shown to be robust to large
changes in process conditions.

Preface

This thesis was written in partial fulfillment of the degree of Master of Applied Science
in the Faculty of Graduate Studies in the Department of Chemical and Biological
Engineering at the University of British Columbia. The content herein describes a
general control strategy to recover a chemical or mineral process from an abnormal
operating region to the nominal range of operation.
Portions of this work have been submitted for publication in Industrial & Engi-
neering Chemistry Research in May 2013. In addition, the findings for locally linear
embedding will be presented at the American Control Conference in June 2014.
A case study for this work was conducted at Gibraltar Mine in British Columbia.
Portions of this work were implemented at Gibraltar Mine in the first half of 2013. The
case study of semi-autogenous grinding at Gibraltar Mine demonstrates how this work
can improve the stability and economy of comminution circuits in the mineral process-
ing field. However, the stability and economy of comminution can also be improved
indirectly by rejecting disturbances upstream at the dry-surge ore stockpile. The case
study in this work is complementary to the work on dry-surge control that has been
accepted for publication: McClure, K.S., Nagamune, R., Marshman, D., Chmelyk, T.,
and Gopaluni, R.B. Robust level control of a dry-surge ore pile. Canadian Journal of
Chemical Engineering (In Press).
I, Ken McClure, was the main contributor with respect to research for the imple-
mentation of the control strategy in this thesis at Gibraltar Mine, along with all other
content of this work.

Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Process monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Model-based methods . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Data-based methods . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Controller performance recovery . . . . . . . . . . . . . . . . . . . . . . 21
2.3 SAG milling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.1 Process overview . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.2 Overloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.1 Process monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Controller performance recovery . . . . . . . . . . . . . . . . . . . . . . 37

4 General Control Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . 39


4.1 Process monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1.1 Principal component analysis . . . . . . . . . . . . . . . . . . . 42
4.1.2 Locally linear embedding . . . . . . . . . . . . . . . . . . . . . . 50
4.1.3 Linear and quadratic classification . . . . . . . . . . . . . . . . . 61
4.2 Controller performance recovery . . . . . . . . . . . . . . . . . . . . . . 65
4.2.1 Modeling the abnormal operating region . . . . . . . . . . . . . 67
4.2.2 Override MPC design . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2.3 Modifications to existing MPC . . . . . . . . . . . . . . . . . . . 74

5 Case Study of SAG Mill Overload . . . . . . . . . . . . . . . . . . . . . 76


5.1 Description of the process . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.1.1 Control objective . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 Process monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2.1 Selection of measured and inferred process variables . . . . . . . 81
5.2.2 PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2.3 SLLEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2.4 Comparison between PCA-LDA and SLLEP-LDA . . . . . . . . 105
5.3 Modeling of the overloaded operating region . . . . . . . . . . . . . . . 109
5.4 Recovery from the SAG mill overload . . . . . . . . . . . . . . . . . . . 117

6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.1 Summary of contributions . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2 Future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

A Selection of Process Variables . . . . . . . . . . . . . . . . . . . . . . . . 139
A.1 Transformation of sound level . . . . . . . . . . . . . . . . . . . . . . . 141

B Locally Linear Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . 143


B.1 Minimization of reconstruction error . . . . . . . . . . . . . . . . . . . 143
B.2 Minimization of embedding cost . . . . . . . . . . . . . . . . . . . . . . 144
B.3 Least squares regression to map new samples . . . . . . . . . . . . . . . 144

C Cross Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

D Model Predictive Control . . . . . . . . . . . . . . . . . . . . . . . . . . 148

E Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

List of Tables

5.1 Input and output variables critical to the recovery control strategy . . . 79
5.2 Properties of the candidate process variables . . . . . . . . . . . . . . . 82
5.3 Classification results for PCA Method 1 . . . . . . . . . . . . . . . . . 96
5.4 PCA modeling results . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.5 PCA-LDA classification results for the 2D and 1D feature spaces . . . . 99
5.6 Summary of the variables retained in each candidate model . . . . . . . 101
5.7 SLLEP Classification results of four candidate models . . . . . . . . . . 105
5.8 Classification boundaries for SLLE and the classification error rate . . . 105
5.9 Comparison of the classification results between PCA-LDA and SLLEP-
LDA on the test set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.10 Modeling steps and fitting results . . . . . . . . . . . . . . . . . . . . . 112
5.11 Best fit models and their parameter variances . . . . . . . . . . . . . . 114
5.12 Key performance indices to measure the potential savings between Sce-
nario 1 and Scenario 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.13 Assumptions for calculating the key performance measures . . . . . . . 124

List of Figures

1.1 Example of a process operation with two states . . . . . . . . . . . . . 4

2.1 Types of faults and their development with time . . . . . . . . . . . . . 8


2.2 Residual generation by a parity relation for process monitoring . . . . . 11
2.3 Residual generation by observer for process monitoring . . . . . . . . . 12
2.4 Architecture of a neural network . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Bivariate normal sampled data showing the relative contributions of the
two principal components . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Side view of the ellipse from PCA on 3D data to emphasize dimension-
ality reduction and outlier detection . . . . . . . . . . . . . . . . . . . . 17
2.7 SVM trained with two classes . . . . . . . . . . . . . . . . . . . . . . . 19
2.8 Dimensionality reduction by LLE performed on a Swiss Roll . . . . . . 20
2.9 Schematic of a typical grate-discharge SAG mill . . . . . . . . . . . . . 28
2.10 Mill power curve for several process conditions . . . . . . . . . . . . . . 29
2.11 Cross-sectional view of the inside of a SAG mill showing load torques
and maximum potential energy of cataracting ore . . . . . . . . . . . . 30
2.12 Distribution of slurry within a mill . . . . . . . . . . . . . . . . . . . . 32
2.13 Cross-sectional view of the inside of a SAG mill showing load torques . 33
2.14 Cross-sectional view of the inside of a SAG mill showing load torques
and load distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.1 Flowchart of the general control strategy applied to a chemical process 40

4.2 Block diagram of the major steps to statistical pattern recognition for
process monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3 The three steps of the LLE algorithm . . . . . . . . . . . . . . . . . . . 52
4.4 Locally linear reconstruction of xi . . . . . . . . . . . . . . . . . . . . . 54
4.5 LLE mapping of an ‘S’ shaped curve to show the accuracy of LLE
projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6 3D manifold of process operation with two planes showing the linearized
regions the nominal and recovery controller are designed to operate . . 68
4.7 Block diagram for the input-output relationship of a modelled process . 69
4.8 MPC controller overrides the MV’s of the existing controller to recover
the process after a process shift . . . . . . . . . . . . . . . . . . . . . . 72
4.9 Expert rules override the MV’s if the process is operating in a severely
abnormal region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.10 Existing MPC controller is modified with a constraint variable to allow
the MPC to recover the process after a process shift . . . . . . . . . . . 75

5.1 Overview of existing SAG mill control at Gibraltar Mine . . . . . . . . 78


5.2 Candidate process variables of the training data-set for process monitoring 88
5.3 Pair-wise scatter plot of the selected process variables . . . . . . . . . . 91
5.4 Candidate process variables for a segment of the test data-set for process
monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.5 Candidate process variables of the training data-set for PCA Method 1 94
5.6 Q and T 2 results on the test data-set for PCA Method 1 . . . . . . . . 95
5.7 LDA classification results on the PC score space of the training data-set 98
5.8 LDA classification results on the PC score space of the test data-set . . 100
5.9 LDA and QDA classification on the training data-set mapped into 2D
space by SLLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.10 LDA and QDA classification on the test data-set mapped into 2D space
by SLLEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.11 Output from process monitoring on the test set for both PCA-LDA and
SLLEP-LDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.12 Model fitting results for the response of the process shift to changes in u1 111
5.13 Fitting P1D to a new training data-set in order to fit a model for u2 on
the fit remainder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.14 P1D fitting results on a test data-set to validate model . . . . . . . . . 115
5.15 Scenario 1: control strategy has not been incorporated to mitigate the
incipient disturbance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.16 Scenario 2: process shift constraint variable is exceeded causing the
MPC to correct for the incipient disturbance before severe overloading
results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.17 Scenario 3: abrupt disturbance leads to significant overloading that the
expert system mitigates before returning control to the MPC . . . . . . 122

A.1 Difference between measured sound level and its 0.4 day moving average 141

Notation

D number of dimensions in a high-dimensional data-set


F statistic for the F continuous probability distribution
Gp process transfer function
H∞ Hardy space with minimax restrictions in the frequency domain
J cost function
Kf number of folds in cross-validation
Kp process gain
K number of nearest neighbors to a data point
Qα upper control limit of the test statistic for principal component analysis
Q test statistic for principal component analysis
R0 Euclidean distance between vectors considering class information
R Euclidean distance between vectors before considering class information
Tα² upper control limit of the test statistic for principal component analysis
T² test statistic for principal component analysis
U (s) input signal in Laplace domain
W weights
Y (s) output signal in Laplace domain
∆ difference operator
Λ diagonal eigenvalue matrix
Φ embedding cost
ℝ real number
α percent confidence level

r̄ average model residual per sample
x̄1 mean vector of nominal operating data
x̄2 mean vector of abnormal operating data
δ binary class information between two vectors
η nearest neighbor
γ class information modifier
ŷ(t + k|t) predicted value of y(t + k) with available information at instant t
∈ element of
λL Lagrange multiplier
λm unknown parameter in cross-validation
λ eigenvalue
A mapping matrix in locally linear embedding projection
C correlation matrix
E residual matrix
G system dynamic matrix
I identity matrix
K controller matrix
Q penalty on move weight matrix
R penalty on error weight matrix
S1 covariance matrix of nominal operating data
S2 covariance matrix of abnormal operating data
S pooled pooled covariance matrix
S covariance matrix
T principal component score matrix
V eigenvector matrix

X observation data matrix
Y feature data matrix
Λ eigenvalue matrix
⊗ cross product of two vectors
π1 class label for nominal data
π2 class label for abnormal data
≻ matrix inequality
τc torque provided by charge
τn numerator time constant
τp torque provided by slurry pool
τ time constant
θ time delay
ε reconstruction cost
a basis mapping vector in locally linear embedding projection
f free response vector
t principal component score vector
u array of future control signals
x observation data vector
y feature vector
ᵀ transpose
df degrees of freedom
d number of dimensions of a low-dimensional mapped data-set
f arbitrary feature of a process
n1 number of observations of nominal operating data
n2 number of observations of abnormal operating data

nu number of controlled process inputs
ny number of controlled process outputs
n number of observations in a data-set
p number of retained principle components
s operator variable in the frequency domain
tf time of onset of a fault
t time
u process input variable
w process input disturbance variable
xc length of the charge torque arm
xp length of the slurry pool torque arm
x process state variable
y process output variable
|S| determinant of matrix S
‖v‖²Q equivalent to vᵀQv
1D one-dimensional
2D two-dimensional
3D three-dimensional

ANN artificial neural network


APER apparent error rate
AV constraint variable

BP bearing pressure

c.g. center of gravity
CPV cumulative percent variance
CV control variable

DCS digital control system


DE discharge-end
DPCA dynamic principal components analysis
DV disturbance variable

EKF extended Kalman filter

FE feed-end
FOPTD first-order plus time delay
FTC fault tolerant control

GPM gallons per minute

KF Kalman filter
KNN K-nearest neighbors
KPCA kernel principal components analysis

LDA linear discriminant analysis


LLE locally linear embedding
LP linear program
LTI linear time-invariant

MP motor power
MPC model predictive control
MV manipulated variable

PC principal component
PCA principal components analysis
PEmax maximum potential energy
PF particle filter
PID proportional-integral-derivative
PRESS prediction error sum of squares

QDA quadratic discriminant analysis


QP quadratic program

SAG semi-autogenous grinding


SLLE supervised locally linear embedding
SLLEP supervised locally linear embedding projection
SOPTD second-order plus time delay
SP set-point
STPH short tons per hour
SVD singular value decomposition
SVM support vector machines

UIO unknown input observer

UKF unscented Kalman filter
USD United States dollars

XV cross-validation

Acknowledgements

This thesis would not have been possible without the encouragement, support, and
insight provided from my academic supervisors Dr. Bhushan Gopaluni, Dr. Ryozo
Nagamune, and Dr. Sirish Shah, or from my industrial supervisors Terry Chmelyk,
Manny Sidhu, and Devin Marshman at Spartan Controls. To them, I owe a huge debt
of gratitude over these past two years.

I am grateful to the University of Alberta for allowing me to study at their school


under the Western Dean’s Agreement. Dr. Shah deserves a huge thank-you for ac-
cepting me as his student and for encouraging me to take a productive course load at
the university.

I am also grateful to NSERC and Spartan Controls whose financial support made this
work possible.

I thank Dr. Nagamune and Dr. Gopaluni for managing the control group at the Uni-
versity of British Columbia. This control group brought me together with colleagues
outside of the department, and it motivated me to design a multivariate model pre-
dictive controller that was useful in this work.

I sincerely thank my parents for their never-failing interest and support throughout
all the years of my study. They are an inspiration to work hard in life. I also thank
my uncle, Dan Ashman, for his inspiration to become an engineer and exposing me to
mineral processing.

I am thankful to Gibraltar Mine for allowing me to use their SAG mill as a case study
for my thesis. It is a rare opportunity to have been able to implement this work in
the industry and solve a problem in great need of a solution.

I would like to make a very special acknowledgement to my partner, Vanya Jongkind,


for her love, patience, and support throughout these years.

Chapter 1

Introduction

It is well-known that industrial processes are controlled over a narrow operating region
to maximize profitability. Moreover, processes are usually most profitable at the inter-
section of process constraints (Garcı́a et al., 1989). Since processes are intended to be
controlled over a narrow operating region, control engineers appropriately assume this
narrow operating region is ‘locally linear’ and describe the process dynamics as linear
time-invariant (LTI) (Ljung, 2001). The assumption of an LTI process allows control
engineers to design linear controllers for a narrow and desirable operating region, also
known as the nominal operating region, even though the process dynamics may be
nonlinear outside this region.
Controlling industrial processes with linear controllers is advantageous compared
to controlling these processes with nonlinear ones. Linear controllers require rela-
tively minor online computation, and they can be synthesized from many well-known
techniques such as proportional-integral-derivative (PID), model predictive, H∞, or
µ control methods (Kim et al., 1997). However, linear feedback control, by design,
is inherently limited to the nominal operating region where process economics are
maximized. This limitation exists because the process model is linearized around the
nominal operating region, and a controller is designed based on this linearized model.
Since industrial processes are typically nonlinear, a deviation in the state of the process
away from the nominal operating region where the controller is tuned, also known as
a process shift, results in a degradation of controller performance. If the nonlinearity
of the process is significant, the controller may not be able to recover the process back
to nominal. As a consequence, the feedback system may become unstable leading to
safety risks or damage to process equipment (Doyle et al., 1992).
Nimmo (1995) estimated that abnormal process situations cost U.S. industries over
$20 billion each year. For example, the recent BP oil spill in the Gulf of Mexico in
2010 resulted in billions of dollars in damages to the fishing and energy industries
and had enormous social and environmental impacts (Gill et al., 2011). This disaster
started when the cement barrier at the bottom of the well failed. Afterwards, a series
of equipment failures and poor monitoring and control practices resulted in the venting
and ignition of a hydrocarbon-rich vapour that ultimately sank the oil rig and caused
one of the worst oil spills in history (Bly, 2011; Liu et al., 2011). In this example,
process monitoring could have detected the failure of the cement barrier from the
measured increase of the drill pressure, and an automatic recovery control strategy
could have responded 40 minutes faster than the operators onboard the oil rig (Bly,
2011). As a result, the hydrocarbons could have been diverted to a safer atmosphere
with less risk of ignition or catastrophic damage to the oil rig (Bly, 2011).
In the chemical and mineral processing industries, a process is usually recovered
once an operator recognizes the degradation of the process performance and manually
manipulates the control elements to recover the process. However, industrial processes
are often very complex with many process variables; consequently, it takes too long
for an operator to recognize the process has shifted and intervene. In addition, the
actions of an operator are not optimal, and it often takes more time than necessary to
manually recover the process to its nominal operating region (van Drunick and Penny,
2005). As an example, in the case study for this thesis, semi-autogenous grinding
(SAG), operators usually over-correct a mill overload to the point where the mill
becomes underloaded—leading to poor throughput and excessive liner wear. On the
other hand, past attempts to design a method to automatically recover from a SAG
mill overload have been ad hoc and often lead to sub-optimal results (van Drunick and
Penny, 2005).
There are few, if any, generalized control strategies applicable to the chemical in-
dustry that solve both the problem of identifying a process shift and recovering the
process back to the nominal operating region with a linear control method. There have
been efforts in the control field to try to prevent a process from shifting away from
the nominal operating region. These methods focus mostly on disturbance rejection.
However, it is not always feasible to prevent a process shift for every possible scenario.
As a result, another approach has been developed that attempts to make the feed-
back system tolerant to the consequences of a process shift. This approach is called
fault-tolerant control (FTC), where reasonable system performance and stability are
maintained in the event of an abnormal event, known as a fault, by designing a feedback
system that is tolerant to such events, for example process equipment failure. These
two approaches are discussed briefly in the literature review in Chapter 2.
The approach of this work is to design a general control strategy applicable to
industrial processes that experience process shifts regardless of disturbance rejection
efforts. Furthermore, these process shifts are caused by disturbances to the inputs of
the process that are not a result of any equipment failure. Therefore, in most of these
scenarios, the process recovery is controllable (Venkatasubramanian et al., 2003a),
and we can design a control method to recover the process to the nominal operating
state. In this work, this control method is known as ‘controller performance recovery’
because the existing controller for the process is modified or overridden in the event
of a process shift in order to recover control performance and ultimately recover the
process back to nominal. In this thesis, a general control strategy is sought that,
first, monitors a process to detect a process shift for linear and nonlinear processes,
and afterwards utilizes information from the process monitoring tool to recover linear
feedback control performance and return the process to nominal operation.
Figure 1.1 illustrates the two main elements of the control strategy on an example
process with two states, x1 and x2 . The figure depicts two different types of undesired
operating regions. In this example, only the operating region with serious consequences
to process stability needs a recovery control strategy. In this region, the existing
linear controller, also known as the nominal controller, is unable to recover the process
due to the nonlinear nature of the process. However, with an appropriate control
strategy, the process shift can be detected by a process monitoring tool, and a controller
specifically tuned for this operating region can optimally recover the process to the
nominal operating region.

Figure 1.1: Example of a process operation with two states, x1 and x2, where 1. the
shift to an undesired operating region is detected, and 2. the process is recovered with
a linear controller

Once a control strategy is developed, it will be tested on a SAG mill at Gibraltar


Mine, located in central British Columbia, Canada. The SAG mill at Gibraltar Mine
sometimes shifts to an overloaded state due to disturbances in the ore hardness, particle
size, and composition. The overloaded state is difficult to detect because the process
is operated as close to an overloaded state as possible in order to maximize the mill’s
throughput, and changes in the state of the mill are only subtle at first. Once the mill
is severely overloaded, the grinding efficiency of the mill decreases to the point where
ore rapidly accumulates and the mill’s bearings are at risk of damage. To recover from
the mill overload, operators manually stop the feed ore conveyor for a long period of
time to let the mill grind the ore charge. However, these actions usually cause the
state of the mill to be overshot to an underloaded operating state.
The SAG mill at Gibraltar Mine is an excellent industrial application for the control
strategy in this thesis. The SAG mill can benefit from earlier detection of the process
shift to an overloaded operating state. Furthermore, the control system can automat-
ically recover the process to nominal operation in minimal time without underloading
the mill. This case study will showcase the adaptability of the general control strategy
to a real industrial process and demonstrate the return on investment possible when
implementing this technique.
The main contribution of this thesis to the chemical and mineral processing in-
dustry is the development of a control strategy that can be used in a digital control
system to detect and recover a process that has shifted to a nonlinear operating range
where the existing controller performs poorly. This control strategy allows industrial
processes to operate in process conditions that are close to constraints or undesired
operation regions to enhance the economic performance of the process. In addition,
a technique that allows locally linear embedding to be commissioned in an industrial
controller for use as a nonlinear process monitoring technique is also discussed in this
work. Finally, the control strategy developed in this thesis was applied to the SAG
mill at Gibraltar Mine, and it addresses the problem of detecting and correcting a
mill overload before controller performance degrades excessively. As a result, Gibral-
tar Mine can operate their process at a higher throughput with less risk that manual
intervention will be needed in the event the process is significantly disturbed.
The remainder of this thesis is organized as follows. In Chapter 2, a literature
review covers the process monitoring and control techniques most relevant to this work.
Chapter 3 develops the problem statement and scope for this work. In Chapter 4, a
general control strategy is developed that addresses the issues identified in the problem
statement. An industrial implementation of the control strategy on the SAG mill at
Gibraltar Mine is presented in Chapter 5. Finally, in Chapter 6, the conclusions and
future work are discussed.

Chapter 2

Literature Review

The literature review focuses on two topics: process monitoring and controller perfor-
mance recovery. Process monitoring tools are able to estimate whether the state of
a process is nominal or abnormal. Controller performance recovery tools are control
methods that recover a process from an abnormal operating situation to the nominal
operating region where process economy is maximized and the existing nominal con-
troller has the highest performance. Both topics are broad with numerous techniques
available. Emphasis is placed on contributions relevant to the work in this thesis.

2.1 Process monitoring

Process monitoring is an important problem in the chemical and mineral processing


industry because early detection of abnormal, also referred to as undesired, process
conditions can help prevent escalation of abnormal events and consequent production
loss. Traditionally, the industry has relied on human operators to monitor processes.
However, process monitoring is a non-trivial task because industrial processes are
usually complex multivariate systems. For example, a large process plant may have
more than 1500 variables that are measured every few seconds (Bailey, 1984). In
these systems, measurements may remain within their acceptable alarm limits when
the process is in an abnormal operating region, thus making it difficult for operators
to detect by monitoring the data directly (Serpas et al., 2012). However, a process
monitoring system would monitor correlations between multiple variables to determine
when the process deviates to an undesired state. The benefits of an effective process
monitoring method include improved process economics, safety, and environmental
impact.
Abnormal events can manifest in many ways such as process unit failures, instru-
mentation failures or drifting, and process degradation (Venkatasubramanian et al.,
2003a). Furthermore, unmeasured variables, such as feed characteristics, may also
introduce large disturbances into a process that shift the process to an undesired op-
erating region. Regardless of the root cause, any departure from an acceptable range of
an observed variable or calculated parameter is known as a fault (Himmelblau, 1978).
Isermann (2005) showed that the time dependency of these faults can be divided into
three categories as shown in Figure 2.1. The variable f represents the change of some
feature, or operating state, of a process due to a fault, and tf signifies the time of onset
of the fault. Abrupt faults cause an immediate and drastic change to some feature of
the process; for example, a process unit such as a pump has failed. Incipient faults
progressively alter the feature of a process; for example, a measurement slowly drifts
due to poor maintenance of instrument calibrations. Lastly, intermittent faults are
transient faults that exist for a short time—only to eventually reappear again.

Figure 2.1: Types of faults and their development with time

Fault detection methods can be divided into two categories: model-based methods
and data-based methods (Serpas et al., 2012), and both categories can be further
subdivided into quantitative and qualitative methods. Ohran et al. (2009)
discuss the main differences between data-based and model-based methods as follows:
model-based methods generate data that are compared to measured outputs to create
residuals relating to specific process conditions, while data-based methods estimate
the location and direction of the system trajectory in state space.
Desirable characteristics of a fault detection system are discussed by Venkatasub-
ramanian et al. (2003a) and the points relevant to this work are listed below:

1. Quick detection: there is a trade-off between quick fault detection and sensitivity
to high frequency influences such as noise.

2. Robustness: the performance should degrade gracefully rather than abruptly,


and the method should be robust to noise and uncertainties.

3. Novelty identifiability: the ability to distinguish known faults from the occurrence
of unknown, or novel, faults that may arise.

4. Estimate classification errors: provide a confidence level for the classification of


a fault condition.

5. Adaptability: the adaptability of the method to changing process conditions


such as disturbances, environmental conditions, production targets, etc.

6. Modeling requirements: modeling effort should be minimal.

7. Computational requirements: the method must operate at a satisfactory scan


rate in an industrial controller.

Although a process monitoring method may satisfy these desirable characteristics,


it will not be utilized unless the operators of the process accept the method. To
increase operator acceptance, the method should be intuitive and simple to operate
and maintain. Furthermore, the operators need confidence that the output of the
process monitoring tool is accurate and makes sense. Therefore, for new measurements
it is preferable to have a two, or at most, three-dimensional plot depicting the output
of the process monitoring tool and the classification boundary between the abnormal
and nominal operating conditions.

2.1.1 Model-based methods

Model-based process monitoring tools require a model of the process, often in the form
of quantitative methods such as parity relations, diagnostic observers, or optimal state
estimators. The output from these models is the expected behaviour of the process, and
it is compared to the actual behaviour to generate inconsistencies known as residuals
(Venkatasubramanian et al., 2003a). Residual signals are useful for fault detection
because they may reflect the occurrence and magnitude of process abnormalities.
The advantage of model-based methods over data-based methods, especially for
first-principles models, is that the models reflect the physical laws of the process rather
than coincidences that may only occur under certain conditions.
model-based methods can be reused on similar processes. Additionally, the assump-
tions and limitations of the models are typically more clear than data-based methods.
However, model-based methods are usually developed on some fundamental under-
standing of the physics of the process (Venkatasubramanian et al., 2003b), and require
very accurate process models to be effective for process monitoring (Ohran et al., 2009).
Many chemical processes are so complex and poorly understood that it may not be
possible to develop a sufficiently accurate model for model-based process monitoring
(Yoon and MacGregor, 2001).
There are many model-based techniques that can be used for process monitor-
ing. Each technique offers advantages and disadvantages in terms of computational
requirements, required model accuracy, and assumed a priori knowledge of the physics
of a process. Moreover, some techniques are better suited for nonlinear processes or
stochastic processes. Model-based techniques can be separated into qualitative and
quantitative methods. Qualitative methods, such as cause-effect models, are discussed
by Venkatasubramanian et al. (2003b). Their main advantage is in diagnosing the
root cause of process abnormalities, and the methods are less sensitive to process
drift. However, these methods may lead to spurious solutions, and a measurement
of the magnitude of process shift is not provided as an output from these methods.
Three model-based methods that can be used for process monitoring are discussed in
this section: parity relations, diagnostic observers, and stochastic filters.

Parity relations

A common quantitative model-based process monitoring technique is to develop parity


relations. Parity relations are most commonly used for analytical redundancy that may
indicate sensor or actuator failures. A parity relation is obtained by rearranging or
transforming the input-output process models. The general structure of the parity
approach is illustrated in Figure 2.2. This approach is useful when measurements are
not heavily corrupted with noise and sensors are calibrated frequently to minimize
drift.

Figure 2.2: Residual generation by a parity relation for process monitoring
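
To make the parity idea concrete, a minimal sketch is given below; it assumes a first-order
discrete-time model of the process has already been identified, and the model parameters,
noise level, and alarm threshold are illustrative assumptions rather than values from this
work.

```python
# A minimal sketch of parity-style residual generation, assuming an
# identified first-order discrete-time model y(k) = a*y(k-1) + b*u(k-1);
# the parameters a, b and the alarm threshold are illustrative.
import numpy as np

def parity_residuals(u, y, a=0.9, b=0.1):
    """Residual r(k) = measured output minus model-predicted output."""
    y_hat = np.zeros_like(y)
    y_hat[0] = y[0]
    for k in range(1, len(y)):
        y_hat[k] = a * y[k - 1] + b * u[k - 1]
    return y - y_hat

# Residual evaluation: flag a fault when the residual leaves the noise band.
rng = np.random.default_rng(0)
u = np.ones(200)
y = 1.0 + 0.01 * rng.standard_normal(200)  # steady state of the assumed model
r = parity_residuals(u, y)
fault_flags = np.abs(r) > 0.05             # assumed alarm threshold
```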

Diagnostic observers

Chou (2000) discusses the application of diagnostic observers, also known as unknown
input observers (UIO), as a model-based process monitoring technique. Often, a dy-
namic system can be modelled as having unknown inputs representing disturbances.
In a disturbance-free case, the UIO is designed so that its output tracks the output
of the process regardless of the unknown inputs. When the process is disturbed, the
residuals carry a ‘signature’ of the resultant process shift. The general structure of
the UIO approach is shown in Figure 2.3. The figure shows that the UIO approach
is different from the parity approach in that the observer provides an estimate of the
state of the process, and it is designed to track the real state asymptotically under
nominal operating conditions.

Figure 2.3: Residual generation by observer for process monitoring
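
As a sketch of this structure, the code below implements a plain Luenberger observer
for an assumed LTI state-space model; a true UIO would additionally decouple the
unknown-input directions, and all matrices, including the observer gain, are assumptions
made for illustration.

```python
# A minimal sketch of observer-based residual generation for an assumed
# LTI model x(k+1) = A x(k) + B u(k), y(k) = C x(k). A plain Luenberger
# observer is used to show the residual structure; a full unknown input
# observer would also decouple the unknown-input directions.
import numpy as np

A = np.array([[0.95, 0.10],
              [0.00, 0.90]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.2]])   # assumed stabilizing observer gain

def observer_residuals(u, y):
    x_hat = np.zeros((2, 1))                  # state estimate
    residuals = []
    for k in range(len(y)):
        r = y[k] - (C @ x_hat).item()         # residual carries the fault signature
        residuals.append(r)
        x_hat = A @ x_hat + B * u[k] + L * r  # correct the estimate with the residual
    return np.array(residuals)
```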

Stochastic filters

An extension of the observer approach is the optimal stochastic filter known as the
Kalman filter (KF) originally proposed by Kalman (1960), and applications of the KF
to process monitoring are discussed by Basseville (1988). The KF is used to estimate
the state of the process from a series of measurements corrupted by white Gaussian
noise (Frank et al., 2000). A KF uses a weighted average of the model prediction of a
system and a new measurement to produce a new state estimate with minimum mean
square error. A KF operates recursively and requires only the previous most trusted
estimate; therefore, a KF can be solved in real-time.
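
A minimal sketch of one recursive predict/update cycle is shown below for a linear
state-space model; the noise covariances Qn and Rn are assumed known, and the
innovation doubles as a monitoring residual.

```python
# A minimal sketch of one KF predict/update cycle for a linear model with
# white Gaussian process and measurement noise; the covariances Qn and Rn
# are assumed known.
import numpy as np

def kf_step(x_hat, P, u, y, A, B, C, Qn, Rn):
    # Predict: propagate the previous best estimate through the model.
    x_pred = A @ x_hat + B * u
    P_pred = A @ P @ A.T + Qn
    # Update: weight the new measurement against the model prediction.
    S = C @ P_pred @ C.T + Rn            # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
    innovation = y - C @ x_pred          # usable as a monitoring residual
    x_new = x_pred + K @ innovation      # minimum mean square error estimate
    P_new = (np.eye(P.shape[0]) - K @ C) @ P_pred
    return x_new, P_new, innovation
```
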
The KF has been adapted to nonlinear processes with white Gaussian noise. The
extended Kalman filter (EKF) (Maybeck, 1979) and the unscented Kalman filter
(UKF) (Wan and Van Der Merwe, 2000) are two such approaches. These methods are
more computationally complex than the KF; however, these techniques can be adapted
for online process monitoring. For example, Quine (2006) discusses a derivative-free
implementation of the EKF that avoids the need to calculate the Jacobian matrix.
Particle filters (PFs) can be employed for processes with nonlinear dynamics where
the noise is non-Gaussian (Manohar and Roy, 2006). Monte Carlo simulations are
performed to provide an asymptotically consistent distribution of the state of the
process; however, PFs are computationally expensive and require an exact process
model (Saha and Roy, 2011).

2.1.2 Data-based methods

In contrast to model-based methods where a priori knowledge about the process is


needed, data-based process monitoring requires only a sufficient database of historical
process data. Consequently, for systems that are complex or poorly understood, a
data-based method is advantageous because the process dynamics would naturally be
taken into account if enough historical data around various operating points is provided
(Venkatasubramanian et al., 2003c). Ohran et al. (2008) mention that the disadvantage
of these methods is that they usually require fault-specific historical data, which may
not always be available; however, statistical methods for unsupervised learning of the
models may be utilized in these cases.
In data-based process monitoring, features are extracted from a set of data that
represent some a priori knowledge of the system. Specifically, data-based methods are
pattern recognition problems (Venkatasubramanian et al., 2003c). Each data-based
method demonstrates some advantage or disadvantage in terms of the desirable char-
acteristics of a fault detection method as listed in Section 2.1. Furthermore, each
technique has a specific class of systems it is intended to be applied to. For example,
some methods work best for only linear or multivariate normal systems.
Data-based methods fall into two categories: qualitative methods and quantita-
tive methods. Two important qualitative methods for process monitoring are expert
systems and trend analysis (Venkatasubramanian et al., 2003c). These techniques are
best suited to diagnosing the root cause of a fault rather than detecting the onset
of an abnormal event. For example, expert systems match existing symptomatic in-
formation for a fault, such as alarm information or operator supplied information,
with a rule-set involving logical condition statements (Ramesh et al., 1988). Venkata-
subramanian et al. (2003a) suggest that the best fault detection method is likely a
hybrid of process monitoring methods in order to both detect and diagnose faults. In
this section, we discuss four different data-based tools that may be used for process
monitoring: artificial neural networks; principal component analysis; support vector
machines; and locally linear embedding.
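
Before turning to these quantitative tools, the rule-set matching used by expert systems
can be illustrated with a short sketch; the symptoms, variable names, and conclusions
below are hypothetical and chosen only to show the form of the logical condition
statements.

```python
# A minimal sketch of expert-system rule matching: symptomatic
# information is tested against logical condition statements. All
# symptoms and conclusions here are hypothetical.
def diagnose(high_bearing_pressure, falling_motor_power, low_sound_level):
    if high_bearing_pressure and falling_motor_power:
        return "suspect mill overload"
    if high_bearing_pressure and low_sound_level:
        return "suspect excessive slurry pooling"
    return "no known fault pattern matched"

print(diagnose(True, True, False))   # -> "suspect mill overload"
```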

Artificial neural networks

Artificial neural networks (ANNs) are proposed by Venkatasubramanian and Chan
(1989) and Patton et al. (1994) as an effective non-statistical quantitative fault
detection and diagnosis technique for nonlinear and dynamic systems in the chemical
industry. ANNs are capable of mapping multi-input and multi-output systems through
a network of highly interconnected layers of neuron-like processing elements as shown
in Figure 2.4. There are three layers to an ANN: the input layer where signals enter
the network, a hidden layer where input signals are routed to the output layer, and
the output layer which contains the response to a given set of inputs.

Figure 2.4: Architecture of a neural network, showing the input layer, hidden layer,
and output layer

ANNs are trained with historical process data to modify the weights of all intercon-
nections in the network and create an input-output relation useful for detecting faults.
However, Venkatasubramanian and Chan (1989) mention that ANNs are limited by the
comprehensive and excessive amounts of data, or very detailed simulation models,
required for training. In addition, Zhou et al. (2003) found that ANNs needed to be
constantly re-tuned due to the time-varying behaviour of some processes. Consequently,
industrial applications of ANNs for fault detection are rare (Koivo, 1994). As an
alternative, multivariate statistical methods can be used for fault detection in place of
ANNs and do not require as extensive a historian database.
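
For illustration, a forward pass through the three-layer architecture of Figure 2.4 can
be sketched as follows; the weights are random placeholders standing in for values that
would be learned from historical process data, and the training loop is omitted.

```python
# A minimal sketch of a single-hidden-layer feedforward network forward
# pass matching Figure 2.4; the weights are placeholders for values
# trained on historical data.
import numpy as np

rng = np.random.default_rng(0)
W_hidden = rng.standard_normal((8, 3))   # 3 inputs -> 8 hidden neurons
W_out = rng.standard_normal((1, 8))      # 8 hidden neurons -> 1 output

def ann_forward(x):
    h = np.tanh(W_hidden @ x)   # hidden layer with tanh activation
    return W_out @ h            # output layer response

response = ann_forward(np.array([0.2, -0.1, 0.5]))
```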

Principal component analysis

Multivariate statistical techniques have proven in the industry to be effective fault
detection techniques. The major benefit of these methods is that they require no
fundamental or causal model of the system; additionally, they handle large problems,
missing data, and noise efficiently (Yoon and MacGregor, 2001). One
technique that has seen success in the chemical and mineral processing industries is
principal component analysis (PCA). The main concept behind PCA is to transform
a number of related process variables to a smaller set of uncorrelated variables, called
principal components (PCs), containing the most relevant information (Cheng et al.,
2008). Mathematically, PCA utilizes singular value decomposition (SVD) on the co-
variance matrix of a data-set of the relevant process variables to produce its eigenvec-
tors and eigenvalues (García-Álvarez et al., 2012; Johnson and Wichern, 2007). Only
the eigenvectors that explain most of the variability in the data are selected, resulting
in the chosen PCs. Figure 2.5 shows a bivariate normal set of data for two arbitrary
process variables y 1 and y 2 with the magnitude and direction of the two PCs for the
data. The figure shows that PC1 explains significantly more variability than PC2 ; as
a result, PC2 may be discarded to reduce the dimensionality of the fault detection
model.

Figure 2.5: Bivariate normal sampled data showing the relative contributions of the
two principal components
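
A minimal sketch of this procedure is given below: the covariance matrix of a
mean-centered data-set is decomposed, and PCs are retained by a cumulative percent
variance rule; the 90% cutoff is an assumed choice, not a recommendation from the
cited works.

```python
# A minimal sketch of PCA model building via SVD of the covariance
# matrix; X is an n-by-D array of process data, and the 90% cumulative
# percent variance (CPV) cutoff is an assumed retention rule.
import numpy as np

def fit_pca(X, cpv_cutoff=0.90):
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    S = np.cov(Xc, rowvar=False)      # covariance matrix of the data-set
    U, eigvals, _ = np.linalg.svd(S)  # for symmetric S, U holds the eigenvectors
    cpv = np.cumsum(eigvals) / eigvals.sum()
    p = int(np.searchsorted(cpv, cpv_cutoff)) + 1   # number of retained PCs
    V = U[:, :p]                      # retained loadings (eigenvectors)
    T = Xc @ V                        # PC scores of the training data
    return x_mean, V, eigvals, p, T
```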

From the information-rich PCs, it is possible to perform fault detection through


statistical analysis of the PC scores of new sampled data to discover outliers. Two
commonly used methods to detect data outliers are the Q test, also known as squared
prediction error, and the Hotelling T² test (García-Álvarez et al., 2012). These meth-
ods are two different measures of distance that reveal if any PC scores are located
significantly far from the center of the ellipse of the PC model. Figure 2.6 portrays
data dimensionality reduction and outlier detection for three-dimensional data from
arbitrary process variables y 1 , y 2 , and y 3 . In this figure, the black line through the
data points depicts a side view of a 2D ellipse discovered by PCA. Data points along
the ellipse axis, but outside of the ellipse itself, are detected by the T² statistic and
represent abnormal observations whose variances can be explained by the PC model.
Data points that are far away from the axis of the ellipse are detected by the Q statis-
tic and represent the variances that cannot be explained by the PC model (Ko and
Shang, 2011).

Figure 2.6: Side view of the ellipse from PCA on 3D data to emphasize dimensionality
reduction and outlier detection
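
Using the PCA model from the previous sketch, both statistics can be computed for a
new sample as below; the control limits Qα and T²α follow from their usual
distributional approximations and are simply assumed to be available here.

```python
# A minimal sketch of the Q (squared prediction error) and Hotelling T²
# statistics for a new sample x, given the PCA model fitted above; the
# control limits Q_alpha and T2_alpha are assumed given.
import numpy as np

def pca_monitor(x, x_mean, V, eigvals, p):
    xc = x - x_mean
    t = V.T @ xc                              # scores of the new sample
    x_rec = V @ t                             # reconstruction within the PC plane
    Q = float((xc - x_rec) @ (xc - x_rec))    # variance the PC model cannot explain
    T2 = float(np.sum(t ** 2 / eigvals[:p]))  # scaled distance within the PC plane
    return Q, T2

def is_abnormal(Q, T2, Q_alpha, T2_alpha):
    return Q > Q_alpha or T2 > T2_alpha
```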

PCA in its basic form has limited application for fault detection in the chemical
industry because many processes are nonlinear and/or dynamic. Hence, PCA has been
adapted in many ways to address these shortcomings. To address serial correlations
between variables due to process dynamics, Ku et al. (1995) proposed a technique
called dynamic PCA (DPCA). DPCA accounts for serial correlations by augment-
ing observation vectors in the PCA method with observations at previous sampling
instances. However, a study by Russell et al. (2000) found that for many common
industrial faults, DPCA gave similar performance to PCA in terms of detection delay
and missed faults.
To extend PCA to nonlinear processes, Schölkopf et al. (1998) proposed a method
called kernel PCA. Kernel PCA received attention as a nonlinear fault detection tech-
nique because it does not require solving a nonlinear optimization problem, as had
been required of predecessors of nonlinear PCA based on neural networks (Choi
et al., 2005; Kramer, 1991). Kernel PCA uses integral operators and nonlinear kernel
functions to map an input space to a feature space via a nonlinear mapping and then
it computes the PCs in that space (Lee et al., 2004). Because kernel PCA can use a
variety of nonlinear kernel functions, it can handle a wide range of process nonlineari-
ties. However, Choi et al. (2005) note that the computational requirements for kernel
PCA may be unreasonable for large systems with many samples, and the feature space
is difficult to interpret compared to other methods such as linear PCA.
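
As a sketch, and assuming scikit-learn is available, kernel PCA with a radial basis
function kernel can be applied as follows; the synthetic data and the kernel width
gamma are illustrative assumptions.

```python
# A minimal sketch of kernel PCA with an RBF kernel, assuming
# scikit-learn; the data and the kernel width gamma are placeholders.
import numpy as np
from sklearn.decomposition import KernelPCA

X = np.random.default_rng(0).standard_normal((200, 5))  # placeholder process data
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)
scores = kpca.fit_transform(X)   # nonlinear feature-space scores for monitoring
```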

Support vector machines

An alternative to kernel PCA is support vector machines (SVMs). SVMs were origi-
nally proposed by Boser et al. (1992) as a machine learning technique to classify data
having two different pre-defined categories, and the technique has been evolving since.
SVMs have been shown to be suitable for fault detection in order to classify data as either
faulty or non-faulty. Figure 2.7 shows the principle of SVMs on a two-dimensional plot
of arbitrary process variables y 1 and y 2 . The technique discovers an optimal linear
classifier in the form of a hyperplane which maximizes the margin between data points
on either side of the hyperplane (Tong and Koller, 2002).
If a nonlinear classification boundary is required, a kernel transformation can be
used to map the data to a higher-dimensional feature space where a separating hy-
perplane in this space results in a nonlinear boundary when re-mapped to a lower-
dimensional space.

Figure 2.7: SVM trained with two classes where data samples on the margin are called
support vectors

SVMs can also be adapted with unsupervised learning so that they require
only nominal operating data in the learning algorithm, rather than requiring
data from both nominal and abnormal classes (Mahadevan and Shah, 2009). The
disadvantage to SVMs, however, is that the kernel mapping technique for nonlinear
boundaries is too computationally expensive to be used online in an industrial con-
troller.
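
A minimal sketch of training a two-class linear SVM, assuming scikit-learn, is given
below; the two Gaussian clusters are synthetic stand-ins for historical nominal and
faulty operating data.

```python
# A minimal sketch of a two-class linear SVM for fault detection,
# assuming scikit-learn; the clusters are synthetic stand-ins for
# nominal and faulty operating data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_nominal = rng.normal(0.0, 0.5, size=(100, 2))
X_fault = rng.normal(2.0, 0.5, size=(100, 2))
X = np.vstack([X_nominal, X_fault])
labels = np.array([0] * 100 + [1] * 100)   # 0 = nominal, 1 = fault

clf = SVC(kernel="linear")                 # maximum-margin separating hyperplane
clf.fit(X, labels)
support_vectors = clf.support_vectors_     # samples lying on the margin
new_class = clf.predict([[1.8, 2.1]])      # classify a new observation
```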

Locally linear embedding

One of the most promising feature extraction techniques with minimal computational
requirements for nonlinear processes is locally linear embedding (LLE). LLE was first
proposed by Saul and Roweis (2000) as an image analysis technique and has found
applications as a fault detection technique, such as for the machinery fault detection
simulated case study performed by Li and Zhang (2011). However, no application or
case studies of LLE for the chemical or mineral processing industries have been docu-
mented in the literature, nor has this technique been applied in an industrial environment.
LLE is a simple and elegant technique to reduce data dimensionality and extract
important features within data that are useful for detecting abnormal operation. For
example, Figure 2.8 demonstrates LLE reconstructing a three-dimensional (3D) ‘Swiss
Roll' shaped manifold into two-dimensional (2D) space with the shape's key features
intact. To accomplish this reconstruction, LLE assumes that each data point and
its neighbors lie on a locally linear portion of its reduced dimension manifold (Li
and Zhang, 2011). Accordingly, LLE often utilizes the K-nearest neighbors (KNN)
method to determine the neighbors of each data point (Johnson and Wichern, 2007).
The technique also requires the computation of two quadratic minimization problems
to determine the optimal weights and coordinate vector in the reduced dimension. If
the problem is well posed, the coordinate reconstruction can be minimized by solving
a sparse n × n eigenvector problem, where n is the number of observations in the data
matrix (Roweis and Saul, 2000).

Figure 2.8: Dimensionality reduction by LLE performed on A, a 3D manifold of a


Swiss Roll, where B is the sampling of the 3D manifold, and C is the reconstruction
by LLE of the Swiss Roll in 2D space (Roweis and Saul, 2000)
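
The Swiss Roll reconstruction of Figure 2.8 can be reproduced in a few lines, assuming
scikit-learn is available; the neighborhood size K = 12 is an assumed tuning choice.

```python
# A minimal sketch of LLE on a Swiss Roll, assuming scikit-learn; the
# number of nearest neighbors K and the target dimension d = 2 mirror
# the quantities discussed above (K = 12 is an assumed choice).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)       # sampled 3D manifold
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)   # 2D embedding preserving local geometry
```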

Li and Zhang (2011) observed that LLE neglects class information which would
allow the technique to be trained with supervised learning to enhance fault recognition
accuracy. Likewise, Li and Zhang (2011) found that LLE does not provide an explicit
mapping from high to low-dimensional space. Consequently, new data samples can-
not be mapped to the reduced dimension space resulting in the inability to use LLE
for fault detection in an online controller. However, Li and Zhang (2011) proposed
two techniques to overcome these drawbacks. The first is to artificially add a distance
between data samples that are from different classes. The second is to find a locally lin-
ear projection that best approximates the high to low-dimensional mapping performed
by LLE. Subsequently, a simple classification technique, such as linear discriminant
analysis (LDA), is used on the low-dimensional feature space discovered by LLE in
order to classify fault and non-fault conditions. The resulting method is named by the
authors as supervised locally linear embedding projection (SLLEP). The results from
Li and Zhang (2011) show that SLLEP offers significant improvements in accuracy
over PCA, and that the method, once trained, is relatively simple to implement and
is computationally inexpensive in a digital control system (DCS).
Since LLE is only a feature extraction technique, a simple classifier is used on the
feature space discovered by LLE. LDA, also known as Fisher’s discriminant analysis
and originally proposed by Fisher (1936), may be used as a linear classifier. LDA is
a computationally simple technique to train. In addition, LDA provides an intuitive
classification boundary between nominal and abnormal operating data. In this way,
LDA is more advantageous than SVM as a classifier. Furthermore, LDA can be ex-
tended to a multiclass linear classifier (Rao, 1948). This is especially useful if separate
control strategies are needed for multiple abnormal process operating regions. Since
LLE preserves the nonlinear features of process data, LDA may need to be extended
to its quadratic version, known as quadratic discriminant analysis (QDA), to capture
this nonlinearity. Both techniques assume a multivariate normal distribution of the
data; however, LDA assumes the covariances of the abnormal and nominal operating
data are the same. QDA is derived from one fundamental difference: the covariances
of the abnormal and nominal operating data are not assumed to be equal (McLachlan,
2004). Li and Zhang (2011) state that the mapping from SLLEP is more separable
than LLE and can be sufficiently classified with either LDA or QDA.
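
A minimal sketch of two-class LDA on a low-dimensional feature space (such as the
output of SLLEP) is given below, written in the pooled-covariance form consistent with
the notation of this thesis; equal priors and misclassification costs are assumed for the
threshold.

```python
# A minimal sketch of two-class LDA on mapped features, using the pooled
# covariance of the nominal (class pi_1) and abnormal (class pi_2) data;
# equal priors and misclassification costs are assumed.
import numpy as np

def fit_lda(y1, y2):
    """y1, y2: n1-by-d and n2-by-d arrays of nominal and abnormal features."""
    m1, m2 = y1.mean(axis=0), y2.mean(axis=0)
    n1, n2 = len(y1), len(y2)
    S1 = np.cov(y1, rowvar=False)
    S2 = np.cov(y2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    w = np.linalg.solve(S_pooled, m1 - m2)   # discriminant direction
    c = 0.5 * w @ (m1 + m2)                  # classification boundary at the midpoint
    return w, c

def classify(Y, w, c):
    # Assign to the nominal class when the discriminant score exceeds c.
    return np.where(Y @ w >= c, "nominal", "abnormal")
```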

2.2 Controller performance recovery

Controller performance recovery is the task of recovering a process from an abnormal,
nonlinear operating region back to nominal operation where the nominal
linear controller is tuned for its highest performance. Furthermore, a process shift to
an abnormal operating region is usually caused by disturbances to the process that
were unaccounted for in the nominal control strategy. Unfortunately, it is not always
feasible to prevent disturbances from causing a process shift. Chen (2004) and Liu and
Peng (2000) propose observers that model and identify external disturbances and then
compensate for the effects of these disturbances with proper feedback. However, this
is a model-based process monitoring method, hence very accurate process models are
required to design an effective observer. Krohling and Rey (2001) propose another
method where a PID controller can be designed to reject disturbances if the H∞ norm
of the disturbance can be assumed to be bounded. In both approaches, there are lim-
itations to their applicability; as a consequence, when these limitations are violated
the process may still deviate from its nominal operating envelope. We instead discuss
five techniques in this section that improve controller performance in the event of a
process shift: fault-tolerant; nonlinear; robust; gain scheduling; and override control.

Fault-tolerant control

Since abnormal situations may still arise despite disturbance rejection efforts, there
has been considerable interest in the literature in fault-tolerant control (FTC). FTC is a control
strategy that is tolerant to component malfunctions in order to maintain desirable
closed-loop performance and stability (Patton, 1997; Zhang and Jiang, 2008). In
the event an abnormal situation is detected, FTC attempts to reconfigure the control
system by selecting one of several predetermined control configurations whose stability
regions have also been predetermined. The selected control configuration is the one
whose closed-loop system state resides in the stability region and has the lowest
cost function value (Mhaskar et al., 2005).
The limitation of FTC is that it is not a control strategy designed to recover a
process in the event of a process shift. Rather, FTC is designed to avoid disaster in
the event of a malfunction of process equipment, sensors and actuators, controller,
or control loop. A process shift, however, is a result of acceptable system behaviour
that causes the process to deviate from the narrow operating range a linear controller
is designed for. In this event, a control method is needed that optimizes only the
recovery of the process to its nominal operating range. Nevertheless, FTC shares a
similar overall strategy to the objectives of this thesis, and FTC can offer valuable
insight into process monitoring and controller performance recovery. Other techniques
reviewed with potential application to this work are nonlinear, robust, gain scheduling,
and override control.

Nonlinear control

One strategy that addresses the problem of poor control when a process has shifted
to a new operating region is nonlinear control. Nonlinear controllers are built from a
nonlinear model of the process. As a result, these controllers can have a much wider
range of operation than a linear controller. Existing linear control methods are utilized
to design a nonlinear controller; however, feedback systems with nonlinear control are
usually not analyzable by standard LTI system theory (Nijmeijer and Van der Schaft,
1990). Moreover, Goodwin (2002) shows that it is difficult to guarantee global stability
for a nonlinear system, and nonlinear feedback design techniques are usually associated
with finding a system output that has stable zero dynamics at the desired operating
point. Furthermore, Primbs and Nevistic (1996) show that nonlinear control is not
always optimal since nonlinear control problems are usually non-convex, and there is
often a trade-off between optimality and global stability.
Although a nonlinear controller can offer greater performance and range of op-
erability than a linear controller, the mathematical analysis involved with nonlinear
feedback control is rigorous (Mariño, 1995). Consequently, personnel in the chem-
ical industry are not well-equipped with the knowledge or tools needed to maintain
nonlinear controllers. Besides, Ljung (2001) explain that linear feedback control is for-
giving and achieves good control based on an approximate model. Therefore, nonlinear
controllers are not essential in the chemical and mineral processing industries to main-
tain reasonable performance. Furthermore, linear control methods can be adapted to
recover a process from a nonlinear region.

Robust control

Robust control is a technique that can be formulated as a linear control method and be
used in place of the nominal controller in a process. A robust controller can be designed
to achieve robust performance and/or stability for a given set of modelled uncertainty
(Skogestad and Postlethwaite, 2005). Consequently, disturbances pertaining to certain
unknown parameters, such as ore hardness, can be modelled and a robust controller,
such as a µ-controller, can be designed to guarantee stability for a given range of
modelled uncertainty (Packard and Doyle, 1993). The challenge with robust control is
the performance–stability trade-off of the feedback system. To overcome this challenge,
multiple robust controllers may be designed for specific operating intervals. However,
there is generally not enough expertise in the chemical industry to maintain robust
controllers.

Gain scheduling control

Gain scheduling control is a promising linear technique that is a form of adaptive
control. Nonlinear plant dynamics are modelled as a linear parameter-varying sys-
tem, and a Jacobian linearization about a family of operating points is taken in a
method called linearization scheduling (Rugh and Shamma, 2000). A family of linear
controllers are developed for the linear parameter-dependent plant and the controller
gains are scheduled according to the value of the scheduling variable. Gain scheduling
techniques are successfully utilized by industry in many applications such as wind tur-
bine control (Bianchi et al., 2006). However, the main disadvantage of gain scheduling
is that stability can only be guaranteed under the assumption of a slow variation in the
scheduling signals (Leith and Leithead, 2000). Additionally, different gain scheduling
techniques need to be developed depending on whether the nominal controller to be
scheduled is PID (Rugh and Shamma, 2000), MPC (Chisci et al., 2003), or another
type.

Override control

Nonlinear, robust, or gain scheduling control strategies focus on replacing the nominal
controller in the process with a new controller that is tuned to control a process at
both the nominal and abnormal operating points. An entirely different approach is
to keep the nominal controller that is tuned for nominal operation, and implement a
secondary controller that is tuned specifically for the abnormal operating region re-
quiring recovery control. The secondary controller would then override the nominal
controller’s actions once the process shifts to the abnormal operating region. This con-
troller would be designed to optimally recover the process back to nominal operation.
The benefit of using an override controller is that the nominal controller in the pro-
cess is kept, and the secondary controller can be designed from linear control methods
that are easy to maintain. The secondary controller would be some form of an optimal
controller to return the process back to nominal operation, such as MPC (Coetzee,
2009). The disadvantage to this approach is that transient signals caused by switch-
ing between controllers may lead to instability. Hespanha and Morse (2002) propose a
method to avoid instability from controller switching by properly selecting controller
realizations where the closed-loop system is bounded for every possible switching sig-
nal. In addition, some modern DCS eliminate feedback on inactive loops to prevent
integral windup that would cause unstable oscillations upon switching.

2.3 SAG milling

The case study selected for this research is a SAG mill at Gibraltar Mine located
north of Williams Lake in British Columbia, Canada. Grinding is one of the most
fundamental unit operations in a mineral processing facility. Optimal operation of
grinding mills is an important research problem because they are the most energy
intensive processes in a mineral processing facility—constituting nearly half of all
energy consumed in a concentrator plant (Yang et al., 2010). Furthermore, the product
particle size distribution from grinding mills has a large effect on the recovery of
valuable minerals in downstream processing (Chen et al., 2008).
Many mineral processing facilities, such as Gibraltar Mine at the start of this work,
have only one SAG mill due to their large size and operating cost. An industrial survey
conducted by Wei and Craig (2009) revealed that SAG mills are used in approximately
38% of all concentrator plants, and approximately 35% of all concentrator plants have
one milling circuit. Therefore, SAG mills often bottleneck the total ore throughput of
these facilities, and their stable operation is critical to plant economics. Moreover, if
there is only one SAG mill, the entire plant can be shut down in the event the mill is
shut down.
Improving the control of a SAG mill can lead to a substantial increase in the
stability and performance of the mill and downstream processing. However, the input
conditions and internal state of a SAG mill are highly uncertain and disturbances can
easily lead to abnormal and unstable operation, even when the most advanced control
methods available to the industry are utilized. For example, Gibraltar Mine currently
utilizes MPC to control their SAG mill. Even so, the MPC at Gibraltar is often
set to manual mode so operators can manage the abnormal operating conditions of the
SAG mill resulting from input disturbances and variations in the gain of the system.
Previous work to address the stability and abnormal conditions of the SAG mill
includes robust control (Galán et al., 2002), adaptive control (Najim et al., 1995), and
disturbance-observer with MPC (Yang et al., 2010). However, the drawbacks are that
robust control sacrifices performance for stability (Skogestad and Postlethwaite, 2005),
and adaptive control will be sluggish when dealing with strong disturbances (Chen
et al., 2009). Disturbance-observer MPC is a promising method to handle disturbances
directly; however, if an unobserved or poorly rejected disturbance perturbs the system
to an abnormal operating state, then the controller will no longer effectively manage
the process.

2.3.1 Process overview

In a mineral processing facility, ore size must be reduced in order to liberate the
valuable minerals from the waste rock for downstream processing. Accordingly, a
SAG mill is often used in these facilities as a means to grind the ore. SAG mills are
unique among grinding processes because they utilize the ore itself to reduce material
size, and can typically grind ore from 200 mm down to 0.1 mm in one unit (Wills
and Napier-Munn, 2006). Semi-autogenous grinding implies that a steel ball charge,
usually between 4% to 15% of the total mill volume, is added to the ore charge to
enhance the grinding process. In these mills, the total charge volume usually occupies
between 30% and 35% of the mill interior (Gupta and Yan, 2006).
The main components of a typical grate-discharge trunnion supported SAG mill,
such as the one operated by Gibraltar Mine, are described below and shown in Fig-
ure 2.9.

1. Feed trunnion: provides a support for the mill and an entrance for ore and water
to enter the mill.

2. Mill shell: main chamber where ore is lifted by the lifting bars and tumbles down
to grind the ore as the mill rotates.

3. Grate: a screen allowing the finished product in the form of slurry to pass into
the discharge chamber.

4. Pulp lifter: lifts the ore slurry that passes through the grate to the discharge
trunnion.

5. Discharge trunnion: provides a support for the mill and an exit for ore slurry.

The primary objective of SAG mill control is to stabilize the circuit, and the
secondary objective is to maximize the economic performance of the mill (Coetzee,
2009). To maximize the economy of the mill, sub-objectives often include (Craig and
MacLeod, 1995):

Figure 2.9: Schematic of a typical grate-discharge SAG mill (Latchireddi and Morrell,
2003a)

1. maximize throughput.

2. minimize power consumption per tonne of ore processed.

3. minimize steel ball and liner consumption per tonne of ore processed.

4. minimize variations in output particle size.

2.3.2 Overloading

SAG mills are often operated as close to their upper throughput constraint as possible
in order to maximize the economic performance of the mill. As a result of high
feed rates, the operation of the mill can easily become unstable from overloading.
Preventing overloading is non-trivial because the properties of the ore that affect
grindability, such as ore size, hardness, and composition, are highly variable, and ore
hardness is not feasible to measure online. Likewise, the internal state of the mill
cannot be directly measured, and estimates from mill power and bearing pressures are
seldom accurate (Wei and Craig, 2009).
The principle variables affecting the internal state of the mill are changes in feed
rate and circulating load, size distribution, ore properties, steel ball charge and liner
wear, and the rate of feed water addition (Wills and Napier-Munn, 2006). Moreover,
slurry rheology is highly variable and can have significant effects on load distribu-
tion, breakage behaviour, and discharge efficiency (Moys, 1989; Shi and Napier-Munn,
2002). Figure 2.10 shows how a mill can be overloaded for several cases of these varying
process conditions. The figure shows the mill is always overloaded for a specific range
in charge volume; however, the maximum power draw, representative of mill overload-
ing, varies between operating conditions A through D. Unfortunately, it is currently
impossible to accurately measure charge volume or slurry pooling during operation.
Consequently, it is difficult for operators to detect mill overloading, and once it has
been detected, usually from increasing bearing pressures and constant or decreasing
mill power, the mill has been overloaded for a long period. Once detected, operators
must then make sudden and large process changes to decrease the feed ore rate to let
the mill slowly grind down the charge.

Figure 2.10: Mill power curve for several process conditions A, B, C, and D showing
that mill power is a poor indicator of overloading

Overloading occurs as a result of either, or a combination of, three cases: volumetric
overloading, slurry pooling, or the freewheel effect. Additionally, slurry pooling and
the freewheel effect eventually lead to volumetric overloading. These three modes
of overloading are discussed in detail in the remainder of this section. To the best
of our knowledge, this thesis is the first publication that discusses all three complex
phenomena within a SAG mill that directly lead to mill overloading.

Case 1: Volumetric overload

Volumetric overloading occurs when either the ore feed rate, particle size distribution,
or ore hardness is too high, thus leading to an accumulation of charge in the mill.
Additionally, any disturbance that causes the grinding efficiency in the mill to decrease
may also lead to volumetric overloading. As charge accumulates in the mill, the
volume occupied by the charge increases such that the total distance ore can fall
under gravity, known as cataracting, decreases and causes the grinding efficiency of
the mill to degrade (Morrell, 1993). Figure 2.11 illustrates the reduction of maximum
potential energy (PEmax ) of cataracting ore resulting from a loss in free-fall distance
when the mill is overloaded.

Figure 2.11: Cross-sectional view of the inside of a SAG mill showing load torques and
maximum potential energy of cataracting ore for (a) a non-overloaded mill, and (b)
an overloaded mill with reduced potential energy of the cataracting ore

Once a SAG mill has begun to overload, ore may rapidly accumulate until the
charge level reaches the center of the mill where slurry is allowed to freely discharge
and the feed chute plugs. However, depending on the charge density, the bearing
pressures may become too high such that the system alarms well before this event.
Depending on the charge density, it may be difficult to detect volumetric overloading
early, since this type of overloading is determined by charge volume rather than mass.
However, Figure 2.11 shows that the center of gravity (c.g.) shifts inside the mill, as
illustrated by the length of the charge torque arm (xc in Figure 2.11). Consequently,
it may be possible to measure the load distribution using the pressure measurements
from the bearing units supporting the mill, or by monitoring whether the motor power
decreases resulting from a reduced counter-torque provided by the charge. However,
the density, volume, and distribution of the charge in the mill are highly variable and
may not always lead to a significant change in the counter-torque provided by the
charge (τc in Figure 2.11) to affect mill power noticeably.

Case 2: Slurry pooling

Slurry pooling occurs when an excess of slurry builds up inside the mill from poor
pulp lifter design resulting in the formation of a slurry pool (Mokken et al., 1975).
The slurry pool drastically dampens the impact energy of the falling rock and reduces
grinding efficiency.
Figure 2.12 shows how slurry is distributed inside the mill for various slurry levels.
The figure shows that slurry is held-up within the solid charge as the material is lifted
up with the rotation of the mill (Latchireddi and Morrell, 2003a). As slurry content
increases, the slurry fills the interstitial space of the solids charge in the direction of
the toe, or bottom of the solids charge. When all the interstitial space is occupied by
slurry, a pool then forms at the toe of the charge (Cleary et al., 2006).
As the slurry pool forms inside the mill, grinding efficiency decreases and the mill
may become overloaded—leading to Case 1. At this point, operators may realize that
for a constant ore feed rate, the trunnion bearing pressures increase indicating an
increase in mill weight. However, mill power may not increase in proportion to the
bearing pressure as would normally be expected. This phenomenon is the result of a
shift in the center of gravity of the mill toward the center as the slurry pool builds.
Figure 2.13 shows the slurry pool provides a counter torque (illustrated by τp and the
length of the slurry torque arm xp ) to the torque provided from the charge, τc (Moys
et al., 1996). As a result, the center of gravity of the total charge inside the mill shifts
towards the mill center and results in a decrease in mill power for severely overloaded
mills.

Figure 2.12: Distribution of slurry within a mill for (a) low slurry content, (b) charge saturated with slurry, and (c) slurry pool formation due to excess slurry content (Latchireddi and Morrell, 2003a)

Figure 2.13: Cross-sectional view of the inside of a SAG mill showing load torques
for (a) a non-overloaded mill, and (b) an overloaded mill with additional torque from
slurry pool

Case 3: Freewheel effect

The third mode of mill overloading is when charge becomes trapped between the lifter
bars due to centripetal force and the viscosity and density of the charge (Dean, 1999).
This phenomenon is known as freewheeling because the inertia of the trapped charge
combines with the kinetic energy of the mill and effectively reduces the motor load.
In addition, motor load decreases significantly because the remaining charge in the
mill is no longer lifted, as the lifter bars are obstructed. Therefore, the remaining
charge becomes static and results in poor comminution and an accumulation of ore
in the mill—leading to Case 1. In this abnormal situation, increasing the feed water
or decreasing the ore feed rate is needed to decrease the density and viscosity of the
charge and recover from the freewheel effect. The freewheel effect is illustrated in
Fig. 2.14.

Figure 2.14: Cross-sectional view of the inside of a SAG mill showing load torques
and load distribution for (a) a non-overloaded mill, and (b) an overloaded mill with
freewheeling effect

Chapter 3

Problem Statement

The objective of this thesis is to develop a linear control strategy to recover a process
to its nominal operating region following a disturbance that has shifted the process to
an undesired operating region. To satisfy this objective, the problem is divided into
two parts:

1. apply a linear or nonlinear process monitoring method to detect when the linear
or nonlinear process has shifted to an undesirable operating region.

2. develop a control method to optimally recover the process back to its nominal
operating region.

It is also the objective of this thesis to illustrate the overall control strategy with an
industrial application on the SAG mill at Gibraltar Mine. Consequently, the control
strategy must also satisfy the following conditions:

• be easy to understand and maintain for process control engineers in industry.

• be computationally simple for it to operate in an industrial controller.

• be robust to changing process conditions.

There are many forms of disturbances to a process, and many root causes for these
disturbances. However, we are not interested in the nature of the disturbance. Rather,
this work is concerned with the nature of the process shift. The control strategy must
be robust to abrupt, incipient, or transient forms of a process shift. In addition,
this control strategy only targets process shifts to specific undesired operating regions
where the process is recoverable to the nominal operating region. Therefore, process
shifts resulting from equipment failure may not result in a controllable recovery and
are not within the scope of this thesis.
A process monitoring technique and a controller performance recovery technique, ap-
plied so that they work together, may be extremely useful for recovering
processes from specific undesired operating regions where nominal controller
performance is low. In particular, these techniques are essential to recover controller
performance from undesired operating regions that result in severe consequences to
the safety and/or productivity of a process. The following two sections elaborate on
the objectives of this thesis.

3.1 Process monitoring

As part of the control strategy to recover a process, the process shift must first be de-
tected. Therefore, the process monitoring method must have the ability to determine
whether the state of the system is nominal, or whether it has shifted to an abnormal
operating region. Furthermore, there may be many different abnormal operating re-
gions of a process. For example, a process such as a grinding mill may be overloaded or
underloaded. In either case, the throughput of the grinding mill will be depressed, and
the process equipment will be at a higher risk of damage. In this example, the nominal
controller can likely recover the process from the underloaded condition. However, an
overloaded mill can quickly become unstable due to the nonlinearity of the process. In
this case, process monitoring and the controller performance recovery techniques are
designed specifically for the mill overload.
Process monitoring methods have been thoroughly developed and reviewed in lit-
erature. However, for many process monitoring methods in literature, there has not
been an effort to develop a control strategy in which a controller utilizes the information
from process monitoring to recover the process. Consequently, existing
techniques will be adapted in this work in order to provide information to a controller
that links the process monitoring and controller performance recovery methods.

3.2 Controller performance recovery

The purpose of controller performance recovery is to recover a process from a nonlin-
ear, undesired operating region back to the nominal operating region where nominal
controller performance and process economy is maximized. Comparatively, it is not
the objective of this work to prevent the occurrence of a process shift that leads to a
loss in controller performance. There are plenty of techniques in literature that try to
address this issue; however, it is sometimes inevitable that a process may shift away
from its nominal operating region. Nevertheless, it is possible that a control strategy
that focuses on recovering a process in order to improve controller performance can
also prevent a process from shifting to an undesired operating region. For example,
if the process shift is incipient, a control strategy may halt the process shift at the
boundary between what is defined as the nominal and undesired operating region.
This is an unintended, yet beneficial, consequence of a control strategy that is robust
to a diverse nature of process shifts.
The controller performance recovery technique is intended only for a specific class
of systems. We are interested in systems that can be reasonably approximated as
time-invariant. Moreover, the specific abnormal operating region of a process that
the control strategy is designed for must be reasonably approximated as LTI. How-
ever, most chemical processes are nonlinear. Therefore, the linear control method can
only be effective for a specific range of the abnormal operating region that can be ap-
proximated as ‘locally linear’. Outside of this range, nonlinear control techniques not
covered within the scope of this work need to be applied in place of the linear control
method proposed here for optimal recovery. Alternatively, simple rules based on op-
erators’ best practices may be added to the controller performance recovery technique
to help recover a process outside of the locally linear region to the region where the
technique is designed. However, this alternative is not mathematically optimal and is
only to be used if the focus is to recover the process from severe conditions where the
linear control strategy designed in this work no longer suffices to recover the process.
Implementing a controller performance recovery technique where an existing nom-
inal controller exists for a process may have unintended consequences. For example,
the overall economic performance or stability of a process should not be compromised;
conversely, the economic performance and stability of the process should be improved.
In addition, the transition between the nominal and abnormal operating region should
not result in stability issues or transient behaviours.

Chapter 4

General Control Strategy

This chapter covers the establishment of a general control strategy to recover a chem-
ical process from an abnormal operating region to its nominal operating region, as
stated in Chapter 3. The control strategy is divided into two main tasks: the detec-
tion of a process shift through process monitoring, and the recovery of the process to
nominal operation using a linear controller. An application of this control strategy on
semi-autogenous grinding is presented in Chapter 5.
Figure 4.1 presents a flowchart showing the major steps of the general control
strategy presented in this chapter. The first step is process monitoring. Process
monitoring is the task of monitoring process outputs and determining if the process
is operating in an abnormal operating range that requires recovery control. In the
event the process requires recovery control, it is also the task of process monitoring
to estimate the magnitude of the process shift. Process monitoring is discussed in
Chapter 4.1.
In the event that the process monitoring method detects a process shift where it
is deemed control action is necessary, the next task is to recover the process from
the abnormal situation. The goal of a controller performance recovery technique is
to execute control actions that are tuned specifically to recover the process back to
nominal operation in the shortest time possible in order to recover nominal controller
performance and process economy. Controller performance recovery is discussed in
Chapter 4.2.

Figure 4.1: Flowchart of the general control strategy applied to a chemical process

4.1 Process monitoring

There are a multitude of techniques available for process monitoring. Data-based
methods are selected in preference to model-based methods because model-based meth-
ods usually require very accurate process models. Comparatively, chemical processes
are often poorly understood and are too complex to model accurately. In addition, the
computational effort of nonlinear model-based techniques, such as particle filtering, is
usually too large for an industrial controller.
The two main steps for a general data-based statistical process monitoring method
are shown in Figure 4.2. The first step is to extract the most important features from
a measured high-dimensional data-set known as the observation space. This feature
extraction step reduces data sparsity and the dimensionality of the data-set, thus al-
lowing for more accurate training of a classifier in the second step. The second and
final step of a general process monitoring procedure is to classify the low-dimensional
feature space into regions of nominal and abnormal operating conditions. These clas-
sified operating regions are known as the decision space. In this step, it is possible
to include both categorical class information and a measurement of the magnitude of
process shift in the decision space.

Figure 4.2: Block diagram of the major steps to statistical pattern recognition for process monitoring: the observation space is reduced by feature extraction (PCA, SLLEP) to the feature space, which classification (T_α^2, Q_α, LDA, QDA) maps to the decision space

There are two data-based process monitoring methods that sufficiently satisfy the
desirable characteristics of a process monitoring method that were outlined in Chap-
ter 2.1. PCA is a computationally efficient feature extraction method that can be
applied to monitor process shifts for linear processes. In addition, the method is for-
giving for process shifts that are mildly nonlinear because there will only be small
changes in the correlation structure between process variables. Also, PCA can detect
a process shift to an abnormal, nonlinear, operating region that has a significantly
different correlation structure than nominal operation by measuring the squared pre-
diction error. Classification of the feature space from PCA can be performed with Q
and T 2 statistics, or LDA.
For processes that are highly nonlinear, or so highly multivariate that PCA
cannot find a model of at most three dimensions to visualize and summarize key in-
formation regarding a process shift, LLE can be used instead. LLE has proven
in image analysis to map high-dimensional data to often three or fewer dimensions
and still retain the most important nonlinear features (Saul and Roweis, 2003; Wang,
2011). The resulting low-dimensional space can be classified into regions of nominal
and abnormal operation and plotted to visually verify the accuracy and credibility of
the technique. Moreover, LLE is also computationally efficient, and it is possible to
design a simple approach for it to operate in an industrial controller. Classification of
the feature space from LLE can be performed with LDA or QDA.

4.1.1 Principal component analysis

PCA is well-suited as a feature extraction method for process monitoring of linear
processes. The technique can easily fit in an industrial controller since there are only
O(D^2) online computations at each execution. Additionally, due to its simplicity,
operator acceptance of PCA is high with several applications (Neogi and Schlags,
1998; Kourti et al., 1996; Kosanovich et al., 1996). In this section, we discuss two
different methods to using PCA in process monitoring. In the first method, PCA is
trained on nominal operating data, and the PC scores of the model are classified with
Q and T 2 statistics. In the second method, we propose training PCA on nominal and
abnormal operating data, and the PC scores are classified using LDA. In Chapter 5,
we will demonstrate that the second method has better process monitoring results if
the changes in the correlation matrix during a process shift are too subtle for the first
method to detect.
As discussed in Chapter 2.1.2, PCA extracts features within data by finding or-
thogonal directions of maximum variance using SVD. The orthogonal directions of
maximum variance are characterized by the eigenvectors of the correlation matrix
from a training data-set, and the magnitude of these vectors is characterized by the
eigenvalues. The main underlying assumption of PCA is that the data are multivariate
normal; however, reasonable results can often be achieved even if this assumption is
invalid (Johnson and Wichern, 2007).
The first, and most challenging, step of PCA is to determine which process vari-
ables should be included in the model and how these variables should be transformed
to extract key information in the data. Traditionally, nearly all measured variables
related to a particular process are used to train the PC model. In fact, this is the
advantage to PCA: one does not need to know much about the process to use the
technique. However, this method of training the PCA model may incorporate biases
in the PC scores towards certain aspects of the process if there is a lot of redundant
information between variables. In addition, the PC model may be trained to find cor-
relations between certain variables in the training data-set that are in fact transient
conditions that may not be experienced with the same operating conditions. Ap-
pendix A discusses techniques helpful for selecting and transforming process variables
for PCA.
Consider a data matrix X ∈ ℝ^{n×D} consisting of n observations and D measured
process variables. Generally, n does not need to be very large to adequately train the
PC model if there is adequate excitation of the process in the data. Since PCA is
scale-dependent, the variables, or columns of X, must be scaled in some meaningful
way (Johnson and Wichern, 2007). The most common method is to standardize the
D variables by scaling them to zero mean and unit variance. Once X is scaled so that
the mean and variance of the variables are in relative proportion to each other, the
next step is to construct the covariance matrix, S, from X:

S = (1/(n − 1)) X^T X,   (4.1)

where S is a square symmetric matrix of dimensionality D. Next, SVD is performed on
S to find the eigenvector matrix, V ∈ ℝ^{D×D}, and the eigenvalue matrix, Λ ∈ ℝ^{D×D}:

S = V Λ V^T,   (4.2)

where the columns of V are the eigenvectors of S, and Λ is a diagonal matrix of the
eigenvalues sorted in descending magnitude. Now, it is possible to calculate the PC
scores, T , which provides a measurement of how an observation scores for each PC. T
is calculated using the following expression:

T = XV , (4.3)

where the columns of T ∈ ℝ^{n×D} are the scores corresponding to each PC. The next
step is to select only the PCs explaining most of the variance in the data-set. There
are several procedures to determine the number of PCs to be retained:

1. Cumulative percent variance (CPV): a measure of the total observed variance
captured by the first p PCs (Zumoffen and Basualdo, 2008). Select p such that
CPV is greater than some threshold.

2. Scree analysis: the sharpest bend in a graph where the diagonal elements of Λ
are plotted in descending order indicates the best choice of p (Jackson, 1991).

3. Cross-validation: partition the data into training and test sets and evaluate the
predictive accuracy of the PC model for different values of p. p is chosen such
that the squared prediction error no longer significantly increases if p is increased
(Ko and Shang, 2011).

The procedure for cross-validation is discussed in Appendix C. For process moni-
toring, it is often acceptable to simply use CPV to find p that satisfies CPV > 90%.
CPV is calculated from the following expression:

CPV(p) = ( Σ_{k=1}^{p} λ_k / trace(Λ) ) × 100%,   (4.4)

where λk is the eigenvalue in Λ corresponding to the kth PC. Once p PCs have been
selected, the columns of V corresponding to the first p PCs create the PC eigenvector
matrix P ∈ ℝ^{D×p}.
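As a rough numerical sketch of Equations (4.1) through (4.4), the following Python fragment trains a PC model on a standardized data matrix. It is illustrative only: the data are synthetic, the 90% CPV threshold is the rule of thumb mentioned above, and NumPy's symmetric eigendecomposition is substituted for SVD (the two coincide for the symmetric matrix S).

```python
# Sketch of the offline PCA computations, Equations (4.1)-(4.4),
# assuming X (n x D) has already been standardized column-wise.
import numpy as np

def train_pca(X, cpv_threshold=90.0):
    n, D = X.shape
    S = (X.T @ X) / (n - 1)               # Eq. (4.1)
    eigvals, V = np.linalg.eigh(S)        # eigendecomposition of symmetric S, Eq. (4.2)
    order = np.argsort(eigvals)[::-1]     # sort eigenvalues in descending order
    eigvals, V = eigvals[order], V[:, order]
    T = X @ V                             # PC scores, Eq. (4.3)
    cpv = 100.0 * np.cumsum(eigvals) / eigvals.sum()   # Eq. (4.4)
    p = int(np.searchsorted(cpv, cpv_threshold)) + 1   # smallest p with CPV > threshold
    return V[:, :p], eigvals, p           # P, all eigenvalues, number of PCs

# Hypothetical training data: 500 observations of 6 process variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
P, eigvals, p = train_pca(X)
```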
The most common usage of PCA as a process monitoring technique relies on the T 2 and
Q test statistics to classify process conditions. T 2 indicates how far observations
are from the center of the PC ellipse while remaining within the axes of the ellipse
(García-Álvarez et al., 2009). T 2 ∈ ℝ^{n×1} is calculated from the expression:

T^2 = Σ_{k=1}^{p} diag(t_k λ_k^{−1} t_k^T),   (4.5)

where tk is the score vector in T corresponding to the kth fault-specific PC. The
acceptable upper limit of any T 2 value is given by Tα2 :

T_α^2 = ( p(n − 1)/(n − p) ) F_{p, n−p, α},   (4.6)

where α is the percentage confidence level, and F is the F-distribution at the α
statistical significance level with p degrees of freedom in the numerator and n − p
degrees of freedom in the denominator. Values of T 2 larger than Tα2 are considered
abnormal observations. Furthermore, the magnitude of T 2 > Tα2 provides an estimate
of the magnitude of process shift.
The Q test statistic may also be used to classify process behaviour (Jackson and
Mudholkar, 1979). Q is different from T 2 in that it provides a measure of how far
observations are from the axis of the ellipse of the PC model. In other words, it provides
a measure of how poorly the PC model fits a particular observation. Q ∈ ℝ^{n×1} is
calculated as:
Q = diag(E^T E),   (4.7)

where E ∈ ℝ^{D×n} is the residual matrix for the prediction of X from the selected PCs,
and E can be calculated from the expression:

E = (I − P P^T) X^T.   (4.8)

Here, I is the D × D identity matrix. For a new observation, xnew ∈ ℝ^{D×1}, the
matrix X T in Equation (4.8) is simply replaced with the vector xnew . The acceptable
upper limit of the Q statistic is given by Qα :
Q_α = θ_1 [ h_0 c_α √(2θ_2)/θ_1 + 1 + θ_2 h_0 (h_0 − 1)/θ_1^2 ]^{1/h_0},   (4.9)

where cα is the one-tailed normal deviate corresponding to the upper (1−α) percentile,
and θi and h0 are calculated from the following two expressions:

θ_i = Σ_{j=p+1}^{D} λ_j^i,    h_0 = 1 − 2θ_1 θ_3/(3θ_2^2),   (4.10)

with λj raised to the ith power, and 1 ≤ i ≤ 3. Q functions similarly to T 2 , where
values of Q greater than Qα are classified as abnormal operating data. Moreover, the
magnitude of Q > Qα provides an estimate of the magnitude of the process shift that
can be used as a control signal.
The Q statistic is unique because it makes process monitoring with PCA applicable
to some nonlinear processes. Since Q provides a measure of how well the PC model fits
the observed data, Q will become larger as the process shifts to a nonlinear operating
region that is outside of the region where the linear PC model was trained. However,
the Q statistic estimates the magnitude of process shift into any nonlinear region.
Therefore, the Q statistic should be coupled with some sort of rule set to distinguish
which nonlinear operating region the process has shifted to. Taking a SAG mill as an
example, if Q > Qα and the bearing pressure of the mill is high, then the mill is likely
to be overloaded as opposed to being underloaded.
The traditional approach to using PCA as a process monitoring technique is divided
into the offline and online portions of the method and discussed in Method 1:

Method 1: Offline

1. Obtain a training data-set, X, from nominal process behaviour with key process
variables selected relevant to abnormal operation.

2. Obtain the mean and standard deviation vectors of X, then standardize the
columns of X to zero mean and unit variance.

3. Perform SVD on the correlation matrix of X to obtain the PC model.

4. Select the number of PCs explaining most of the variance in the data using CPV.

5. Determine the upper control limits for the T 2 and Q statistics.

Method 1: Online

1. Obtain a new test sample, xnew .

2. Scale xnew using the original mean and standard deviation vectors of X.

3. Evaluate the T 2 and Q statistics. If one of these values exceeds their upper
control limit, then it is considered an abnormal situation.
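A minimal sketch of these monitoring statistics, Equations (4.5) through (4.10), is given below. It assumes P, eigvals, p, n, and D come from an offline training step like the one sketched earlier, that x_new has already been scaled, and that SciPy is available for the F and normal quantiles; all names are illustrative.

```python
# Sketch of the Method 1 online statistics and their control limits.
import numpy as np
from scipy import stats

def t2_q_statistics(x_new, P, eigvals, p):
    t = P.T @ x_new                          # scores of the p retained PCs
    T2 = float(np.sum(t**2 / eigvals[:p]))   # Eq. (4.5) for one observation
    e = x_new - P @ (P.T @ x_new)            # residual, Eq. (4.8)
    Q = float(e @ e)                         # Eq. (4.7) for one observation
    return T2, Q

def control_limits(eigvals, p, n, D, alpha=0.99):
    T2_lim = p * (n - 1) / (n - p) * stats.f.ppf(alpha, p, n - p)   # Eq. (4.6)
    th1, th2, th3 = (np.sum(eigvals[p:D]**i) for i in (1, 2, 3))    # Eq. (4.10)
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2**2)
    c_alpha = stats.norm.ppf(alpha)
    Q_lim = th1 * (h0 * c_alpha * np.sqrt(2.0 * th2) / th1 + 1.0
                   + th2 * h0 * (h0 - 1.0) / th1**2) ** (1.0 / h0)  # Eq. (4.9)
    return T2_lim, Q_lim
```

The amount by which T2 exceeds T2_lim, or Q exceeds Q_lim, is the estimate of the magnitude of the process shift that this work proposes to use as a control signal.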

Traditionally, when Tα2 or Qα are violated, the response of the DCS is to send an
alarm to the operators so they can manually recover the process from the abnormal
situation. However, in this work, we are proposing using the magnitude that these
control limits are exceeded as an estimate of the magnitude of the process shift to an
abnormal operating region. Consequently, this signal can be controlled in the DCS to
reduce the magnitude of the process shift below some constraint value.
A new approach to using PCA for process monitoring is proposed here for cases
where it is found that the T 2 or Q statistics do not correctly identify the process
abnormality upon testing the PC model. This may occur if the abnormality causes
only subtle changes to the correlation structure of the data. In this event, there
may exist some region in the high-dimensional space of X that can be classified as
an abnormal operating region. Furthermore, PCA can be used to reduce variable
redundancy and lower the dimensionality of X to allow for a simple classification
method, such as LDA (see Chapter 4.1.3), to classify the nominal and abnormal operating regions within
the data. The procedure for this approach is outlined below in Method 2:

Method 2: Offline

1. Obtain a training data-set, X, from a relatively even distribution of nominal
and abnormal process behaviour with key process variables selected relevant to
detecting the abnormal operation.

2. Obtain the mean and standard deviation vectors of X, then standardize the
columns of X to zero mean and unit variance.

3. Perform SVD on the correlation matrix of X to obtain the PC model.

4. Perform LDA classification with 10-fold cross-validation, as discussed in Ap-
pendix C, on all 2D combinations of the PC scores in T .

5. Select only the p PCs that best allow the LDA classifier to separate the nominal
and abnormal operating data.

6. Find the linear equation of the LDA classifier from Equation (4.27) separating
the nominal and abnormal operating data in the pth dimension.

Method 2: Online

1. Obtain a new test sample, xnew .

2. Scale xnew using the original mean and standard deviation vectors of X.

3. Calculate the p PC scores of xnew , where each score is part of the score vector,
y, of xnew .

4. Calculate the LDA classifier output value and class label for y from the linear
equation discovered in Step 6 of Method 2: Offline.

In the offline portion of Method 2, it is most likely that a fault will be character-
ized by no more than three PCs. It is also not necessary that the fault-specific PCs
determined from this procedure be the PCs that explain most of the variance in the
data. If an abnormal situation is significantly characterized by more than three PCs,
then LLE should be used instead of PCA since LLE is usually capable of mapping
high-dimensional data to three or fewer dimensions. It is important to have three
or fewer dimensions characterizing a process shift to an abnormal operating region
because it allows operators to visually interpret the PC scores and the classification
boundary and gain trust in the process monitoring method.
In Method 2, abnormal process training data should be selected where the correla-
tion structure between the highly correlated variables determined from Method 1 does
not change drastically when the model is trained with the addition of the abnormal
process data. The objective of training the PC model with the inclusion of abnormal
process data is to develop the correlation structure of variables having poor correla-
tions in the nominal operating region, yet may express a clear correlation structure
in the abnormal operating range. The shift in center of gravity of a SAG mill is a
good example of a variable that only has a clear correlation structure once the mill
has begun overloading. These types of variables provide extra redundancy in the PC
model that is useful for detecting complex phenomena such as mill overloading.
For both Method 1 and Method 2, it may be beneficial to scale the variables in X
from 0% to 100% in proportion to the minimum and maximum values that may be
expected during process operation. A percentage scale is meaningful to operators and
more robust to time-varying changes (MacGregor and Kourti, 1995). The minimum
and maximum limits to scale the variables can usually be estimated from the operating
limits of the variables measured in the DCS, since industrial control often requires
process variables to be scaled from 0% to 100%. When using a percentage scale, it is
important to scale so that the variances of the variables are in relative proportion to
each other. However, the scaling limits may also be manipulated to bias the variances
of certain variables when training the PC model.
In Method 2, the signal to be controlled is the distance the PC score vector of xnew is
located away from the classification boundary and into the abnormal operating region.
Moreover, using the centroid of several consecutive score vectors reduces the likelihood
that stochastic signals will cause a single observation to be classified as abnormal when
the process is actually operating nominally.
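The sketch below illustrates this online calculation under Method 2. It assumes a retained eigenvector matrix P and a scikit-learn LDA classifier fitted offline on PC scores; the decision_function value is a signed quantity proportional to the distance from the classification boundary, and all data and names are hypothetical.

```python
# Sketch of the Method 2 online signal: distance past the LDA boundary.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Offline: fit LDA on hypothetical nominal (0) and abnormal (1) PC scores.
rng = np.random.default_rng(2)
scores = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                    rng.normal(3.0, 1.0, size=(100, 2))])
labels = np.array([0] * 100 + [1] * 100)
lda = LinearDiscriminantAnalysis().fit(scores, labels)

def shift_magnitude(x_new, P, lda):
    y = P.T @ x_new                        # p-dimensional PC score vector
    return float(lda.decision_function(y.reshape(1, -1))[0])

def centroid_signal(recent_scores, lda):
    # averaging several consecutive score vectors suppresses spurious
    # abnormal classifications caused by single stochastic observations
    c = np.mean(recent_scores, axis=0)
    return float(lda.decision_function(c.reshape(1, -1))[0])
```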
In Method 2, Fisher’s approach to LDA has the ability to perform feature extraction
instead of PCA (Johnson and Wichern, 2007). However, we do not use Fisher’s LDA
because we found that this technique is sensitive to changes in the mean values of
variables for nominal operation in preliminary experiments on the case study in
Chapter 5. This led to poor performance with Fisher's LDA classification when
the model was used on a new test set having different nominal operating conditions.
On the other hand, we found that PCA is less sensitive to changes in the mean values
of process variables, thus better process monitoring results are expected for monitoring
processes with varying operating conditions.
If either Method 1 or Method 2 is unable to distinguish the abnormal process
behaviour of concern from the nominal process behaviour, it may be because the
abnormal process behaviour is too nonlinear for PCA, hence a nonlinear process mon-
itoring technique such as LLE is needed. Likewise, if more than three PCs are needed
to classify the abnormal process behaviour, then LLE should be used since it has
demonstrated in image analysis to be capable of mapping high-dimensional data on
the order of thousands of dimensions to three or fewer dimensions (Saul and Roweis,
2003; Wang, 2011).

4.1.2 Locally linear embedding

If the process is so nonlinear or high-dimensional that PCA is found to be unsuitable
as a process monitoring method, then LLE can be used as an alternative to PCA.
LLE, however, is more computationally expensive than PCA both in training offline
and in online operation.
As discussed in Chapter 2.1.2, LLE is a feature extraction method that nonlinearly
maps locally linear patches of a high-dimensional data-set into a low-dimensional space
that preserves the critical information in the data-set. Furthermore, if LLE is trained
with supervised learning, classification of the low-dimensional mapped space can be
done with a simple classifier. As a result, LLE allows operators to visualize how
the process monitoring method determines whether new observations are classified as
nominal or abnormal.
Before training the LLE model, we must first determine which process variables
should be used in the model, and how they should be transformed. Appendix A gives
some suggestions to this process. Unlike PCA, LLE does not require as much pruning
when selecting process variables; in other words, LLE can tolerate high-dimensional
data-sets. On the other hand, training LLE requires substantially more process data
than PCA. LLE requires a lot of data because new data are mapped to the feature
space by performing a linear regression on the training data, and if a lot of training
data are available, then it is expected that a new data point and its neighbors in the
training data-set are located on a locally linear patch of the manifold.
To properly map a variety of process conditions, we found from the case study in
Chapter 5 that we needed to train LLE with nearly n = 2000 observations covering
a wide range of operation, sampled only as frequently as necessary, and including
coverage as extensive as possible of the abnormal operation requiring the control strategy.
If abnormal operating data are scarce, it should be sampled more frequently than
nominal operating data in order to provide enough observations to properly train the
technique. Correspondingly, n may be different depending upon the signal to noise
ratio of the data, variations in operating conditions, and information redundancy in
the training data. Lastly, failure to include enough data when training LLE may result
in erroneous results when mapping new observations to the feature space.
In the simplest formulation of LLE, discussed by Roweis and Saul (2000), there
are three fundamental steps performed offline of a DCS once X has been obtained:
the first step is to select the nearest neighbors of each data point in X; the second
step is to reconstruct each data point from its neighbors; and the final step is to map
the data to a low-dimensional feature space. Fig. 4.3 reviews the three steps of the
LLE procedure. In this form of LLE, there is only one free parameter to determine
during training: the number of neighbors, K. Selection of K is discussed at the end
of this section.


Figure 4.3: The three steps of the LLE algorithm (Roweis and Saul, 2000)

Step 1: Select neighbors.

The first step is to identify the K-nearest neighbors of the training data matrix,
X ∈ ℝ^{n×D}, where X has been standardized to zero mean and unit variance or scaled
from 0% to 100%. To find the neighbors of the ith observation in X, xi , the Euclidean
distance between xi and every observation needs to be determined. Here, the K data
points with the shortest distances are the neighbors, and a neighbor of xi is denoted
as xη .
The Euclidean distance between xi and another observation in X, xj is calculated
from the expression:

‖ x_i − x_j ‖_2,   (4.11)

where ‖ · ‖_2 refers to the L2 norm. The K-nearest neighbors characterize the geometry
of these local patches by linear coefficients used to reconstruct each data point from
its neighbors. In reality, these local patches are not exactly linear; consequently, there
are errors when using linear weights to reconstruct each data point from its neighbors
in Step 2.
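A minimal sketch of this neighbor search, assuming a scaled training matrix X, is:

```python
# Sketch of Step 1: K-nearest neighbors under the Euclidean distance
# of Equation (4.11), computed for every row of X at once.
import numpy as np

def nearest_neighbors(X, K):
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                     # a point is not its own neighbor
    return np.argsort(D2, axis=1)[:, :K]             # indices of the K nearest points
```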

Step 2: Reconstruct each data point from its neighbors.

The second step is to reconstruct each data point from its neighbors by minimizing
the cost function:

n
X n
X 2

minimize ε(W ) = x i − Wij x j
, (4.12)
W
i=1 j=1 2

subject to Wij = 0, if xj 6∈ {xη1 , xη2 , . . . , xηK }, (4.13)


n
X
Wij = 1. (4.14)
j=1

Here, the weights W ij are the magnitude that the jth observation contributes to
the ith reconstruction. W ij are constrained to zero in Equation (4.13) if they do
not belong to the set of nearest neighbors for the ith reconstruction. ε(W ) adds
up the squared distances between all data points and their reconstructions. The
optimal weights subject to the constraints in Equation (4.13) and Equation (4.14)
that minimizes the reconstruction error in Equation (4.12) can be found by solving a
least-squares problem as detailed in Appendix B.1.
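Appendix B.1 gives the derivation used in this thesis; as a sketch, the standard closed form from Saul and Roweis (2003) reduces each row of W to a small K × K linear system on a local Gram matrix, as below (neighbors is the index array from Step 1, and reg is an assumed regularization constant for the case K > D):

```python
# Sketch of Step 2: reconstruction weights satisfying Eqs. (4.12)-(4.14).
import numpy as np

def reconstruction_weights(X, neighbors, reg=1e-3):
    n = X.shape[0]
    K = neighbors.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]            # neighbors shifted to the origin
        G = Z @ Z.T                           # local Gram matrix (K x K)
        G += reg * np.trace(G) * np.eye(K)    # stabilize when G is singular
        w = np.linalg.solve(G, np.ones(K))    # solve G w = 1
        W[i, neighbors[i]] = w / w.sum()      # enforce the sum-to-one constraint
    return W
```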

An illustration of the locally linear reconstruction of xi is shown in Figure 4.4. This
figure shows that a nonlinear curve must be drawn between xi and its two neighbors.
In order to develop a linear reconstruction, the data point xi must be reconstructed
onto a line between the two neighbors xη1 and xη2 . The weights W ij are the degrees
of freedom to minimize the reconstruction errors.

Figure 4.4: Locally linear reconstruction of xi (Saul and Roweis, 2003)

Step 3: Map to embedded coordinates.

The third and final step is to map each high-dimensional observation, xi , to a low-
dimensional feature vector, y i . Each y i represents the d-dimensional global internal
coordinates on the low-dimensional manifold, where d represents a low dimension such
that d < D, and y i is found by minimizing the embedding cost function:

minimize_y   Φ(y) = Σ_{i=1}^{n} ‖ y_i − Σ_{j=1}^{n} W_{ij} y_j ‖_2^2,   (4.15)

subject to   Σ_{i=1}^{n} y_i = 0,   (4.16)

             (1/n) Σ_{i=1}^{n} y_i ⊗ y_i = I,   (4.17)

where I is the d × d identity matrix, and ⊗ denotes the outer product operation.
Equation (4.16) prevents y i from being translated by a constant displacement by forcing
y i to be centered on the origin, and Equation (4.17) avoids degenerate solutions by
constraining the embedding vectors to have unit covariance.
The embedding cost defined in Equation (4.15) is a quadratic program (QP); how-
ever, subject to the constraints in Equation (4.16) and Equation (4.17) that make the
problem well-posed, the QP can be reformulated to solve a sparse n × n eigenvalue
problem as detailed in Appendix B.2.
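As a sketch of this final step, the reformulated problem takes the bottom eigenvectors of M = (I − W)^T (I − W) and discards the constant eigenvector; a dense eigensolver is used here purely for clarity, whereas an implementation would exploit the sparsity noted above:

```python
# Sketch of Step 3: the embedding of Eqs. (4.15)-(4.17) from the
# bottom eigenvectors of M = (I - W)^T (I - W).
import numpy as np

def embed(W, d):
    n = W.shape[0]
    IW = np.eye(n) - W
    M = IW.T @ IW
    eigvals, vecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    Y = vecs[:, 1:d + 1]                  # discard the constant bottom eigenvector
    return Y * np.sqrt(n)                 # rescale to satisfy Eq. (4.17)
```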

Supervised LLE

An adaptation of LLE is discussed by De Ridder and Duin (2002) to improve the method
when used to extract features within a data-set for subsequent classification. The
technique is called supervised LLE (SLLE), and it includes class information in the
algorithm by artificially adding a distance between samples from different classes. In
process monitoring, there are usually two classes: data that is nominal, and data that
is abnormal.
To perform SLLE, only Step 1 of the original algorithm is changed: before se-
lecting the K-nearest neighbors of xi , add an additional distance, R′, between data
points if they belong to different classes. Here, the distance R′ that incorporates class
information is calculated as follows:

R′ = R + γ max(R) (1 − δ(x_i, x_j)),   (4.18)

where R = ‖x_i − x_j‖_2 is the Euclidean distance between two data points, and max(R) =
max_{ij} ‖x_i − x_j‖_2 is the distance between the two observations that are farthest apart in
X. For every combination of xi and xj that belong to separate classes, δ(xi , xj ) = 0,
otherwise δ(xi , xj ) = 1. Lastly, the parameter γ is used to control the amount of class
information incorporated into R′, where 0 ≤ γ ≤ 1. In practice, γ is often chosen to be
at most 0.2 because large γ values result in data from the same class to coalesce on the
same point, thus resulting in poor mapping of new data to the feature space with LLE
projection (De Ridder and Duin, 2002; Li and Zhang, 2011). Comparatively, γ can
be very small if there are outliers in the data that cause max(R) to be large. Owing
to the ability of SLLE to separate data belonging to different classes, we are able to
use a simple classifier, such as LDA or QDA, when classifying the feature space into

regions of nominal and abnormal process operation.
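A minimal sketch of Equation (4.18), assuming a scaled data matrix X and a label vector marking nominal and abnormal observations, is:

```python
# Sketch of the supervised distance of Eq. (4.18): inflate distances
# between points of different classes before the Step 1 neighbor search.
import numpy as np

def supervised_distances(X, labels, gamma=0.1):
    diff = X[:, None, :] - X[None, :, :]
    R = np.sqrt(np.sum(diff**2, axis=-1))              # Euclidean distances
    same_class = labels[:, None] == labels[None, :]    # delta(x_i, x_j)
    return R + gamma * R.max() * (~same_class)         # Eq. (4.18)
```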

LLE projection

Since LLE is a non-parametric method, there is no equation to map new observations
into the same feature space discovered by LLE during offline training. As a result, Li
and Zhang (2011) adapted LLE to project a new D-dimensional observation, xnew , to
the d-dimensional space as y new . This technique is called LLE projection, and there
are three steps computed at each controller execution to project new observations into
the feature space for subsequent classification: the first step is to select the nearest
neighbors of a new observation to the observations in X; the second step is to find
a mapping matrix; and the final step is to calculate the embedding coordinate of the
new observation.
LLE projection is performed online in a DCS at every scan. Moreover, this tech-
nique requires the entire training data-set, X, and the entire embedded coordinate
vector, Y , to be stored in the DCS. Here, X includes the class information from
SLLE, and Y is a matrix containing all embedded coordinates, y i , determined from
Equation (4.15).

Step 1: Select neighbors.

For a new high-dimensional sample, $\boldsymbol{x}_{new} \in \mathbb{R}^{D \times 1}$, the first step is to find its K
neighbors, $\boldsymbol{X}_\eta$, in the training data-set $\boldsymbol{X}$ by a Euclidean distance measurement,
where:
\[
\boldsymbol{X}_\eta = [\boldsymbol{x}_{\eta_1}, \boldsymbol{x}_{\eta_2}, \ldots, \boldsymbol{x}_{\eta_K}] \in \mathbb{R}^{D \times K}. \tag{4.19}
\]

Step 2: Find a mapping matrix.

The second step is to compute a mapping matrix, A, from the training data that
satisfies:
\[
\boldsymbol{Y}_\eta = \boldsymbol{A}\boldsymbol{X}_\eta, \tag{4.20}
\]

in the least squares sense, where:

\[
\boldsymbol{A} = [\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_d]^T \in \mathbb{R}^{d \times D}, \tag{4.21}
\]
\[
\boldsymbol{Y}_\eta = [\boldsymbol{y}_{\eta_1}, \boldsymbol{y}_{\eta_2}, \ldots, \boldsymbol{y}_{\eta_K}] \in \mathbb{R}^{d \times K}, \tag{4.22}
\]

and Y η is the neighborhood embedded coordinate matrix of the feature data corre-
sponding to the observations of X η . In other words, the coordinate vectors in Y η are
taken from the vectors in Y corresponding to the observations, X η . Each basis vector
of A, aj , is obtained by solving the linear least squares regression problem:

\[
\boldsymbol{a}_j = \underset{\boldsymbol{a}}{\operatorname{argmin}} \sum_{i=1}^{K} \left\| \boldsymbol{a}^T \boldsymbol{x}_{\eta_i} - y_{\eta_i}^{j} \right\|_2^2, \tag{4.23}
\]

where 1 ≤ j ≤ d, and $y_{\eta_i}^{j}$ is the jth element of $\boldsymbol{y}_{\eta_i}$. The analytical solution for the
case of X Tη having full column rank and the case where X Tη is column rank deficient
is given in Appendix B.3.

Step 3: Compute the embedding coordinate.

The final step is to calculate the embedding coordinate of the new observation,
$\boldsymbol{y}_{new} \in \mathbb{R}^{d \times 1}$, from $\boldsymbol{x}_{new}$ using the mapping matrix determined in Step 2:
\[
\boldsymbol{y}_{new} = \boldsymbol{A}\boldsymbol{x}_{new}. \tag{4.24}
\]
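Before turning to the DCS implementation, the three steps can be summarized in a minimal sketch (not the DCS implementation discussed next; it uses an ordinary least-squares solver for Equation (4.23), which reduces to the Moore-Penrose pseudo-inverse of Appendix B.3):

```python
import numpy as np

def lle_project(x_new, X, Y, K):
    """Project x_new into the feature space (Steps 1-3).

    x_new : (D,) new observation.
    X     : (n, D) training data stored in the DCS.
    Y     : (n, d) embedded coordinates of the training data.
    K     : number of neighbors.
    """
    # Step 1: K nearest neighbors of x_new by Euclidean distance.
    idx = np.argsort(np.linalg.norm(X - x_new, axis=1))[:K]
    X_eta, Y_eta = X[idx], Y[idx]          # (K, D) and (K, d)
    # Step 2: mapping matrix A solving Y_eta = A X_eta in the
    # least-squares sense of Equation (4.23); lstsq is based on the
    # pseudo-inverse and therefore also handles the rank-deficient
    # case of Appendix B.3.
    A_T, _, _, _ = np.linalg.lstsq(X_eta, Y_eta, rcond=None)
    # Step 3: embedding coordinate of the new observation (4.24).
    return A_T.T @ x_new
```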

In Step 2 of LLE projection, the solution for A is obtained by solving the QP in
Equation (4.23). Since the analytical solution described in Appendix B.3 requires the
computation of a matrix inverse, it is difficult to program and execute in an industrial
controller. In many industrial controllers, matrix inversion is not built-in and must be
programmed using elementary row operations—often using Gauss-Jordan elimination
(Friedberg et al., 2002).
A new method is proposed in this thesis to solve for A. Since MPC is a pre-existing
module found in most modern DCS, it can be used as a QP solver to solve Equa-
tion (4.23). Dynamic matrix control, a common unconstrained MPC formulation, is
discussed in detail in Appendix D.
To use any form of MPC to solve Equation (4.23), the manipulated variables (MVs)
are treated as the elements of A, and the controller will determine the values of these
elements in A by minimizing its cost function. Furthermore, the control variables
(CVs) are treated as elements of Y η whose target set-points (SPs) are the values of
the elements in Y η . Finally, the input-output model gains of the MPC are the values
of X η pertaining to the elements of the input-output relation in Equation (4.20), and
it is not necessary to assume a first-order model, or dynamics, in these models.
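The following minimal sketch is not vendor MPC code; it only illustrates the correspondence: with gain-only models, no dynamics, and a zero penalty-on-move, the unconstrained MPC cost for the jth basis vector reduces to the least-squares objective of Equation (4.23), so its minimizer is the same solution the MPC converges to:

```python
import numpy as np

# Sketch of the correspondence only. With gain-only models (the gain
# matrix is X_eta^T), no dynamics, and a zero penalty-on-move, the
# unconstrained MPC cost for basis vector a_j is
#   J(a) = || X_eta^T a - sp ||^2,
# where the set-points sp are the j-th elements of the neighborhood
# coordinates Y_eta. Its minimizer is the least-squares solution of
# Equation (4.23), which the MPC reaches in a single iteration.
def mpc_equivalent_basis_vector(X_eta, y_eta_j):
    """X_eta: (D, K) neighbor matrix; y_eta_j: (K,) set-points."""
    a_j, _, _, _ = np.linalg.lstsq(X_eta.T, y_eta_j, rcond=None)
    return a_j
```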

Example of LLE projection using MPC

MPC was used to find the projection of a data point in an ‘S’ shaped curve shown in
Figure 4.5. In this figure, n = 800 random samples, or observations, of a 3D ‘S’ shaped
manifold are shown in Figure 4.5a. In this plot, one observation is randomly selected,
and its K = 12 neighbors are identified. Next, LLE was performed to map the 3D data
to 2D space and the results are shown in Figure 4.5b. A black box depicts the inset
of this plot that is shown in Figure 4.5c. Finally, in Figure 4.5d, MPC was used to
perform linear regression on the neighbors of the observation, and the linear equation
was used to project the coordinates of the data point in 3D space to the mapped 2D
feature space. The plot shows that a discrepancy exists between the coordinates of the
data point mapped by LLE and its projection from the linear equation. This is due
to the nonlinearity of the ‘S’ curve, and this error decreases as the number of samples
the LLE model is trained with increases.
The results from MPC for the mapping shown in Figure 4.5 are comparable to
the analytical solution of Equation (4.23) described in Appendix B.3. In fact, MPC
discovers the exact same solution as the analytical solution from the Moore-Penrose
pseudo-inverse in one iteration. To have MPC converge on the solution immediately,

Figure 4.5: LLE mapping of an 'S' shaped curve to show the accuracy of LLE projection (panels a–d; legend: observation, neighbors, projection)

the models in MPC, as described by the coordinates of the K neighbors in 3D space,
were treated as gain-only models with no dynamics. In addition, the penalty-on-move
term of the MPC was set to zero for all six of the MVs describing the linear mapping
matrix A.
On a final note for LLE projection, we observe that X and Y need to be stored in
the DCS in order to determine X_η and Y_η in Equation (4.20). Since n needs to be
large for adequate LLE projection results, this storage requirement can be substantial.
We therefore recommend pruning the training data-set once the feature space has been
trained in order to reduce the size of n; De Ridder and Duin (2002) propose a method
to select such a subset of the training data.

SLLEP for classification

The combination of SLLE for training the feature space and LLE projection for map-
ping new data is collectively known as SLLEP. Once SLLE trains the information-rich
feature space, this low-dimensional space can be classified into regions of nominal
and abnormal process operation. If a control strategy is sought for only one type
of abnormal operation, there will likely be only one contiguous region in this space
corresponding to the abnormal operation. However, multiple regions can be classified
if the features in the low-dimensional space characterize more than just one type of
process abnormality.
Owing to the ability of SLLE to incorporate class information when training the
feature space, process data from separate classes is well-segregated and can be easily
classified with a linear or quadratic classifier. It is ideal to use a linear or quadratic
classifier because of the simplicity and robustness of training the classifier offline, and
because it is simple to determine a meaningful distance that an observation in the
feature space is located from the classification boundary.
LDA and QDA are suitable linear and quadratic classifiers, respectively, to classify
the feature space from SLLE into regions of nominal and abnormal operation. LDA
and QDA are discussed in Chapter 4.1.3. LDA should be used if it appears that
the abnormal operating data are linearly separable from the nominal operating data.
Otherwise, QDA is usually acceptable. New data projected to the feature space is
classified by the same classification rule as the training data to provide an estimate of
the class label and the distance to the classification boundary. If LDA or QDA poorly
classify the feature space into regions of nominal and abnormal operation, then the
tuning parameters K and γ should be adjusted when training the SLLE technique.

Selection of K and γ

The selection of K and γ significantly affects the ability of LDA or QDA to classify the
feature space discovered by SLLEP. K determines the number of neighbors around a

particular data point that are considered to be within a locally linear region of the
observation space. If K is too small, then SLLEP does not have enough information
about the global properties within the data-set; if K is too large, then the mapping
loses its nonlinear character and behaves like PCA.
γ determines how much class information is included in training SLLEP. A large
γ will excessively separate classes and lead to over-generalization of the feature space.
With a large γ, erroneous results may occur when projecting new observations into
the feature space. Furthermore, the classifier loses the ability to provide meaningful
information on the magnitude of process shift to a controller since data from the same
class converge on a single point in the feature space. On the other hand, a small γ
does not incorporate any class information when training SLLEP. Consequently, it is
less likely that LDA or QDA can classify the feature space due to the nonlinearity of
the features.
Since there are two free parameters to manipulate when training SLLEP, 10-fold
cross-validation should be performed using the procedure discussed in Appendix C
to optimize the selection of the parameters. Cross-validation finds the best selection
of K and γ by evaluating the unbiased prediction accuracy of LDA or QDA. Once
the parameters have been chosen and the SLLEP model has been trained, the model
should then be tested on a test set of data before commissioning in the industry.

4.1.3 Linear and quadratic classification

LDA and QDA are simple classification methods that find a linear or quadratic com-
bination of features that best separate two or more classes. In this thesis, we focus
on the classification of two classes, and leave the simple extension to three or more
classes to Johnson and Wichern (2007).
The main underlying assumption in LDA and QDA is that the data to be classified
are multivariate normal (Johnson and Wichern, 2007). Although this is not always
the case, LDA often remains an effective classifier even if this assumption is violated
(Johnson and Wichern, 2007). However, QDA may lead to erroneous results when
classifying a feature space greater than 2D if this assumption is violated (Johnson and
Wichern, 2007). Since PCA only performs well on multivariate normal data, LDA
should be a sufficient classifier when PCA is used for feature extraction. SLLEP does
not assume multivariate normality, thus γ can be tuned to sufficiently separate the
classes in the feature space, and LDA may be used in the absence of multivariate
normal data. However, QDA is a quadratic classifier, and it may be used to better
separate a 2D feature space than LDA without having to over-tune γ.
LDA and QDA classifiers are trained in a supervised manner from the known class
labels of the feature data to be classified. Therefore, we must divide our feature data
matrix into two groups, Y 1 and Y 2 . Here, Y 1 contains the nominal operating data
belonging to class π1 , and Y 2 contains the specific abnormal operating data belonging
to class π2 that represents the abnormal operating range that a control strategy is to
be designed for. The remaining steps between LDA and QDA differ from each other
and are discussed separately.

Linear discriminant analysis

To construct a linear classifier, it is assumed in LDA that the covariance matrices of
Y_1 and Y_2 are equivalent. However, since the true covariance matrices of these
populations are unknown, we take a pooled estimate of the covariance matrix, S_pooled,
from the sample covariances. S pooled is calculated from the expression:

\[
\boldsymbol{S}_{pooled} = \frac{n_1 - 1}{(n_1 - 1) + (n_2 - 1)}\,\boldsymbol{S}_1 + \frac{n_2 - 1}{(n_1 - 1) + (n_2 - 1)}\,\boldsymbol{S}_2, \tag{4.25}
\]

where S 1 and S 2 are the sample covariance matrices calculated from Y 1 and Y 2 ,
respectively, and the formula to find the covariance matrix is given in Equation (4.2).
Additionally, n1 is the number of observations in Y 1 , and n2 is the number of obser-

vations in Y 2 . The next step is to find the mean vectors of Y 1 and Y 2 :

\[
\bar{\boldsymbol{y}}_1 = \frac{1}{n_1}\,\boldsymbol{Y}_1^T \boldsymbol{1}, \qquad \bar{\boldsymbol{y}}_2 = \frac{1}{n_2}\,\boldsymbol{Y}_2^T \boldsymbol{1}, \tag{4.26}
\]

where 1 is a vector of ones with as many elements as there are observations in Y_1 or
Y_2, respectively. The allocation rule of LDA for a given observation in the feature
space, y_i, is:
\[
\text{class} = \begin{cases} \pi_1 & \hat{\boldsymbol{a}}^T \boldsymbol{y}_i - \hat{m} \geq \text{cost} \\ \pi_2 & \hat{\boldsymbol{a}}^T \boldsymbol{y}_i - \hat{m} < \text{cost}, \end{cases} \tag{4.27}
\]

where:

\[
\hat{\boldsymbol{a}} = (\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2)^T \boldsymbol{S}_{pooled}^{-1}, \tag{4.28}
\]
\[
\hat{m} = \frac{1}{2}\,(\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2)^T \boldsymbol{S}_{pooled}^{-1}\,(\bar{\boldsymbol{y}}_1 + \bar{\boldsymbol{y}}_2), \tag{4.29}
\]
\[
\text{cost} = \ln\!\left[\left(\frac{c(1|2)}{c(2|1)}\right)\!\left(\frac{p_2}{p_1}\right)\right]. \tag{4.30}
\]

Here, â is the sample coefficient vector, m̂ is the estimated midpoint between the
two classes, and the cost equation accounts for the probability of misclassification
and the cost of misclassification. In the cost equation, p1 is the prior probability
that an observation belongs to π1, and vice versa for p2; for example,
if the probability that a new observation will belong to π2 is small, then LDA should
classify a new observation as π1 unless the data overwhelmingly favors π2 . In addition,
c(1|2) is the cost associated with classifying an observation as π1 , when it actually
belongs to π2 , and vice versa for c(2|1). For example, if a SAG mill is overloaded and
we classify a new observation as belonging to the non-overloaded class, then the mill
overload is not corrected for and the consequences are more severe than if the mill was
not overloaded but was classified as overloaded.
Not only does the LDA allocation rule dictate the class of an observation, it also

gives the equation of the linear classification boundary for solutions to:

\[
\hat{\boldsymbol{a}}^T \boldsymbol{y}_i - \hat{m} - \text{cost} = 0. \tag{4.31}
\]

Equation (4.27) provides the classification rule to determine whether an observation
belongs to π1 or π2. In addition, the calculated output of â^T y_i − m̂ − cost yields a
value that is proportional to the distance of an observation from the linear classification
boundary: it is simply the shortest distance from the observation to the classification
line, scaled by an arbitrary amount that is consistent between all observations. In
other words, this output value can provide the estimate of the magnitude of process
shift that is needed for the controller performance recovery strategy.
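A minimal sketch of the LDA computations in Equations (4.25)–(4.31), assuming Y1 and Y2 are numpy arrays with one observation per row, is given below; the returned signed value plays the role of the process-shift estimate:

```python
import numpy as np

def train_lda(Y1, Y2, p1=0.5, p2=0.5, c12=1.0, c21=1.0):
    """LDA quantities of Equations (4.25)-(4.30).

    Y1, Y2 : (n1, d) nominal and (n2, d) abnormal feature data.
    p1, p2 : prior probabilities; c12, c21: misclassification costs.
    """
    n1, n2 = len(Y1), len(Y2)
    S1, S2 = np.cov(Y1, rowvar=False), np.cov(Y2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / ((n1 - 1) + (n2 - 1))  # (4.25)
    y1_bar, y2_bar = Y1.mean(axis=0), Y2.mean(axis=0)                   # (4.26)
    a_hat = np.linalg.solve(S_pooled, y1_bar - y2_bar)                  # (4.28)
    m_hat = 0.5 * a_hat @ (y1_bar + y2_bar)                             # (4.29)
    cost = np.log((c12 / c21) * (p2 / p1))                              # (4.30)
    return a_hat, m_hat, cost

def lda_shift(y, a_hat, m_hat, cost):
    """Signed output of Equation (4.31): >= 0 allocates to pi_1,
    < 0 to pi_2; the magnitude estimates the process shift."""
    return a_hat @ y - m_hat - cost
```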

Quadratic discriminant analysis

QDA is different from LDA in that there is no assumption that the covariance matrices
of Y 1 and Y 2 are equivalent. Without this assumption, the classification regions are
defined by quadratic functions of y i . The allocation rule of QDA is:

\[
f(\boldsymbol{y}_i) = -\frac{1}{2}\,\boldsymbol{y}_i^T(\boldsymbol{S}_1^{-1} - \boldsymbol{S}_2^{-1})\,\boldsymbol{y}_i + (\bar{\boldsymbol{y}}_1^T\boldsymbol{S}_1^{-1} - \bar{\boldsymbol{y}}_2^T\boldsymbol{S}_2^{-1})\,\boldsymbol{y}_i - k - \text{cost}, \tag{4.32}
\]
\[
\text{class} = \begin{cases} \pi_1 & f(\boldsymbol{y}_i) \geq 0 \\ \pi_2 & f(\boldsymbol{y}_i) < 0, \end{cases} \tag{4.33}
\]
where:
\[
k = \frac{1}{2}\ln\!\left(\frac{|\boldsymbol{S}_1|}{|\boldsymbol{S}_2|}\right) + \frac{1}{2}\left(\bar{\boldsymbol{y}}_1^T\boldsymbol{S}_1^{-1}\bar{\boldsymbol{y}}_1 - \bar{\boldsymbol{y}}_2^T\boldsymbol{S}_2^{-1}\bar{\boldsymbol{y}}_2\right). \tag{4.34}
\]
Here, |S| denotes the determinant of S. The QDA allocation rule not only dictates
the class of an observation, it also gives the equation of the quadratic classification
boundary for solutions to f (y i ) = 0.

Similar to LDA, the calculated value of f(y_i) provides an estimate of the distance
of an observation from the quadratic classification boundary. However, unlike LDA,
this value is not the shortest distance from an observation to the classification
boundary. Instead, the value of f(y_i) corresponds to how far the quadratic boundary
would need to be expanded or contracted for it to pass through the observation. In
other words, the output of f(y_i) is a meaningful estimate of the magnitude by which
an observation is displaced from the classification boundary that also accounts for the
curvature of the classifier. As a result, this output value can be used to estimate
the magnitude of process shift that is needed in the controller performance recovery
technique.
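A corresponding sketch of the QDA score of Equations (4.32)–(4.34), under the same assumptions as the LDA sketch above, is:

```python
import numpy as np

def qda_score(y, Y1, Y2, cost=0.0):
    """QDA allocation score f(y) of Equations (4.32)-(4.34):
    f(y) >= 0 allocates to pi_1, f(y) < 0 to pi_2."""
    S1, S2 = np.cov(Y1, rowvar=False), np.cov(Y2, rowvar=False)
    S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
    y1_bar, y2_bar = Y1.mean(axis=0), Y2.mean(axis=0)
    k = (0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
         + 0.5 * (y1_bar @ S1_inv @ y1_bar
                  - y2_bar @ S2_inv @ y2_bar))              # (4.34)
    return (-0.5 * y @ (S1_inv - S2_inv) @ y
            + (y1_bar @ S1_inv - y2_bar @ S2_inv) @ y
            - k - cost)                                      # (4.32)
```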

4.2 Controller performance recovery

Controller performance recovery is a control strategy where information from process
monitoring is used to improve controller performance in order to recover a process from
an abnormal operating range to its nominal operating range. This section discusses
how to model the process in an abnormal operating region to allow us to tune the
recovery controller. In addition, this section discusses two control strategies to recover
the process: the first is applicable to a process that utilizes any style of nominal
feedback controller, and the second is applicable to a process utilizing MPC as the
nominal feedback controller.
A general control objective of a process is given in the following linear optimization

problem:

\[
\begin{aligned}
\underset{\boldsymbol{u}}{\text{minimize}} \quad & \varepsilon(\boldsymbol{u}) = \operatorname{abs}(\boldsymbol{x} - \boldsymbol{x}_{sp}), && (4.35)\\
\text{subject to} \quad & \boldsymbol{x}_{lower} \leq \boldsymbol{x} \leq \boldsymbol{x}_{upper}, && (4.36)\\
& \boldsymbol{y}_{lower} \leq \boldsymbol{y} \leq \boldsymbol{y}_{upper}, && (4.37)\\
& \boldsymbol{u}_{lower} \leq \boldsymbol{u} \leq \boldsymbol{u}_{upper}, && (4.38)\\
& \begin{bmatrix} \dot{\boldsymbol{x}} \\ \boldsymbol{y} \end{bmatrix} = f(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{w}), && (4.39)
\end{aligned}
\]

where x is a vector of process states to control; y is a vector of measured process
outputs; u is a vector of manipulated process input variables; the subscript sp repre-
sents a set-point; and the subscripts upper and lower represent the upper and lower
constraint limits.
The process outputs are related to the inputs in Equation (4.39). Here, ẋ is the
derivative of x, and w is a vector of measured input disturbances to the process. The
process dynamics of f (x, u, w) are complex due to the nonlinearity of the process.
Therefore, it is not always possible for a linear nominal controller to maintain high
performance in the form of best minimizing Equation (4.35) in the event the process
shifts away from the nominal, locally linear, operating region. As a result, a control
strategy is needed to recover the process to the nominal operating region.
MPC is used in both controller performance recovery techniques as the recovery
controller. In the case where any style of nominal feedback controller is used, MPC
will externally override the existing controller; conversely, in the case the existing
controller is already MPC, then the process monitoring information is used as a new
constraint variable in the existing MPC. MPC is an ideal choice because it is a mul-
tivariate controller that can handle constraints, and it has achieved wide acceptance
in the industry. MPC also solves an optimization problem in order to choose the best
control actions that minimize prediction residuals over a finite prediction horizon. For

operators in the industry, MPC is reasonably easy to maintain and understand, and
tools exist in most modern DCS to easily implement MPC. The principles and de-
sign of an unconstrained MPC formulation are presented in Appendix D. Camacho
and Bordons (2007) discuss MPC in greater depth, including formulations involving
constraints.
In addition to using MPC as the recovery control in the event of a process shift
leading to poor nominal controller performance, expert rules are also proposed in this
section. Expert rules are only needed to recover a process that has shifted into a
severely abnormal operating region where even the MPC recovery controller has poor
performance. In this case, the objective of the expert rules is only to recover the
controller performance of the MPC recovery controller, thereby enabling the MPC
recovery controller to recover the process to the nominal operating region.

4.2.1 Modeling the abnormal operating region

When modeling the abnormal operating region, we use the output from the process
monitoring method as the CV used to monitor the process recovery. This CV contains
a measure of the magnitude of the process shift into a specific abnormal operating
region. Since this CV is a special transformation of measured process outputs, it is
appropriate to use in closed-loop feedback.
The main assumption to model the abnormal operating region is that this region
is assumed to be locally linear. For this assumption to be valid, the control boundary
for this region must be selected so that reasonably linear process behaviour can be
expected for observations from one point in the region to another. Figure 4.6 shows a 3D
nonlinear manifold of process operation depicted by three arbitrary feature variables.
This figure shows the locally linear regions of the nonlinear manifold depicted by planes
tangent to the manifold where the nominal and recovery controllers are linearized for
operation. At the edges of the abnormal operating region, we see that on one end
lies the nominal operating region, and on the other end is process operation that

Figure 4.6: 3D manifold of process operation (axes y1, y2, y3) with two planes showing the linearized regions in which the nominal and recovery controllers are designed to operate; the manifold spans the nominal, abnormal, and severely abnormal operating ranges

is considered to be severely abnormal. Process recovery from a severely abnormal
operating region is challenging and probably best achieved using expert rules that
mimic an operator’s best actions in this event. However, discrete control actions from
these rules are not mathematically optimal, thus the expert rules should focus only on
recovering the process to the abnormal operating region where the continuous recovery
controller is designed.
In Figure 4.6, we see the assumption of local linearity breaks down at the edges
of the planes due to the nonlinearity of the manifold, thus the size of these regions is
dictated by a predefined allowable tolerance for process model mismatch and controller
performance decay. Once the boundaries of the abnormal operating region have been
decided, then operating data that lies within the abnormal operating region is needed
to model the process in this region. However, it is unlikely that there will be abundant
operating data available within this region, because the process would not have been
allowed to operate abnormally for long. Thus, it may be hard to find enough operating
data to train an accurate process model for this region.
The strategy to develop the process model is to ideally find at least two sets of data

Figure 4.7: Block diagram for the input-output relationship of a modelled process (input U, transfer function Gp, output Y)

where the operators placed the control in open-loop to manually recover the process
from an abnormal operating region. At least one data-set is used to train a process
model for the abnormal operating region, and one data-set should be set aside as a
test set to validate the model. For each of the training data-sets, train a first-order
plus time delay (FOPTD) or second-order plus time delay (SOPTD) model
of the process response for each process input-output pair. There are many forms of
process models we can use; however, FOPTD or SOPTD are chosen in this work to
model the process because many chemical processes can be adequately modelled with
these forms, and most modern DCS allow for these forms of the process model in the
built-in MPC.
Figure 4.7 illustrates the block diagram for a process with output Y(s) and input
U(s) that is modelled using a transfer function Gp(s); here, Y(s) and U(s) are the
process output, y(t), and the process input, u(t), represented in the frequency domain.
The process transfer function, Gp (s), for a FOPTD process is of the form:

\[
G_p(s) = \frac{Y(s)}{U(s)} = \frac{K_p}{\tau_1 s + 1}\,\exp(-\theta s), \tag{4.40}
\]

where Kp is the gain of the response in Y (s) for a change in U (s), τ 1 is the time
constant, and θ is the time delay of the process response. The Laplace variable, s, is
known as the operator variable in the frequency domain. A seldom-used form of the
FOPTD model includes lead dynamics in the numerator of the transfer function. A
FOPTD model with lead dynamics is of the form:

\[
G_p(s) = \frac{Y(s)}{U(s)} = K_p\,\frac{\tau_n s + 1}{\tau_1 s + 1}\,\exp(-\theta s), \tag{4.41}
\]

where τn is the time constant of the first-order numerator dynamics. It is unlikely
this transfer function is needed for most chemical processes. However, if the process
response is fast, the numerator dynamics may help in modeling the transfer function.
A SOPTD model usually arises whenever two first-order processes are connected
in series. Gp (s) for a SOPTD process is of the form:

\[
G_p(s) = \frac{Y(s)}{U(s)} = \frac{K_p}{(\tau_1 s + 1)(\tau_2 s + 1)}\,\exp(-\theta s), \tag{4.42}
\]

where τ 1 and τ 2 are the time constants for the two first-order transfer functions making
up the SOPTD transfer function. For chemical processes, it may be necessary to
include dynamics in the numerator of the SOPTD model, particularly when the process
data shows an inverse response. The SOPTD model with numerator dynamics is:

\[
G_p(s) = \frac{Y(s)}{U(s)} = K_p\,\frac{\tau_n s + 1}{(\tau_1 s + 1)(\tau_2 s + 1)}\,\exp(-\theta s). \tag{4.43}
\]

To identify any of the process models in Equations (4.40)–(4.43), a continuous-time
model identification technique should be conducted on sampled data. Discrete-time
model identification techniques, such as output error modeling, are not considered
because these models need to be converted to continuous time, and it is not guaran-
teed that a discrete-time model can be converted to a continuous-time model with a
specific pre-determined form (Bequette, 1998; Ljung, 2007b). Garnier et al. (2003)
and Ljung (2009) discuss methods to identify a continuous-time model from discrete-time
data. One of the most common continuous-time model identification methods is the
prediction error method. This technique minimizes the prediction errors when fitting
a model and involves an iterative numerical search (Ljung and Wills, 2008). The
prediction error method for continuous-time model identification is available in the
MATLAB System Identification Toolbox (Ljung, 2007a).
The procedure to find the best process model of an abnormal operating region,
and its model parameters, is given below as a guideline and is to be repeated for each

input-output pair, u(t) and y(t).

Step 1: Identify a set of training data where the process is in the abnormal operating
region and y(t) has been sufficiently excited from changes to u(t) that are uncorrelated
from other process inputs.

Step 2: Determine the parameters for each of the four continuous-time models in
Equations (4.40)–(4.43) using the MATLAB System Identification Toolbox.

Step 3: Compute the residuals of each model, and evaluate whether increasing the
order of the model increases the prediction accuracy at a statistical significance level of
α. Choose the model of lowest order that gives the best fit. A more detailed discussion
on model selection is given in Appendix E.

Step 4: Evaluate the overall model on the test data-set to see how well it will perform
in practice. Once the process model is satisfactory, it can be used as a predictive model
in the MPC controller.
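As a hedged illustration of Step 2 outside of MATLAB, the following sketch fits a FOPTD model by a crude direct search over (Kp, τ1, θ), minimizing the squared simulation error of a zero-order-hold discretization of Equation (4.40) on deviation-variable data; scipy's Nelder-Mead routine stands in for the iterative numerical search of the prediction error method, and the delay is rounded to a whole number of samples:

```python
import numpy as np
from scipy.optimize import minimize

def simulate_foptd(u, dt, Kp, tau, theta):
    """Zero-order-hold simulation of the FOPTD model (4.40)."""
    a = np.exp(-dt / tau)
    delay = int(round(theta / dt))
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        u_del = u[k - 1 - delay] if k - 1 - delay >= 0 else 0.0
        y[k] = a * y[k - 1] + Kp * (1.0 - a) * u_del
    return y

def fit_foptd(u, y, dt, guess=(1.0, 10.0, 1.0)):
    """Fit (Kp, tau, theta) by minimizing the squared simulation
    error; invalid parameter values are rejected by returning inf."""
    def sse(p):
        Kp, tau, theta = p
        if tau <= 0.0 or theta < 0.0:
            return np.inf
        return float(np.sum((y - simulate_foptd(u, dt, Kp, tau, theta)) ** 2))
    return minimize(sse, guess, method="Nelder-Mead").x
```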

4.2.2 Override MPC design

An override MPC is used when the nominal controller is any type of feedback controller
except MPC. The override controller is tuned for a specific abnormal range of process
operation. When the measure of process shift output from the process monitoring
method, given by the output of the LDA or QDA classifier, indicates the process shift
to this region is high, the controller will try to recover the process shift to a SP of
zero by overriding the process inputs that are manipulated by the nominal controller.
Figure 4.8 illustrates the override control strategy.
Figure 4.8 shows that the override controller takes control of the MVs using high
(>) and low (<) signal selects. High selects are used when the optimal action to
recover the process for a particular MV is to increase the variable, and vice versa for

Figure 4.8: MPC controller overrides the MV's of the existing controller to recover the process after a process shift (the measure of process shift from process monitoring is compared to a process-shift set-point by the override MPC, whose outputs are combined with the nominal controller's MV's through high (>) and low (<) signal selects)

the low selects. The implementation of the override MPC controller is simple: the
only CV is the measure of process shift, there are as many MV’s as there are degrees
of freedom needed to recover the process, and there are no constraint variables.
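A minimal sketch of the signal-select logic for a single MV is shown below; which MVs use high versus low selects is a process-specific design choice, and the variable names in the usage example are hypothetical:

```python
def select_mv(mv_nominal, mv_override, high_select):
    """High (>) or low (<) signal select for one manipulated variable.

    high_select : True if the optimal recovery action for this MV is
                  to increase it; False if it is to decrease it.
    """
    return max(mv_nominal, mv_override) if high_select \
        else min(mv_nominal, mv_override)

# Hypothetical example: recovery calls for less ore feed, more water.
# feed_sp  = select_mv(feed_nominal, feed_override, high_select=False)
# water_sp = select_mv(water_nominal, water_override, high_select=True)
```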
In the MPC design, the penalty-on-move weights should be set as low as possible
because the main objective of the override control is fast recovery of the process.
This is because the fastest recovery of the process possible with MPC is when the cost
function only includes the prediction errors. When the cost associated with changes to
the MVs is included, the time for the process to recover becomes suboptimal. However,
some cost associated with the control moves is necessary to prevent excessive wear to
the control elements in a recovery situation.
With the override MPC configuration, there is the possibility that switching tran-
sients from the high and low selects may introduce instability to the system. In
particular, if the disturbance at the root cause of a process shift persists, the recovery
controller may keep the process at the boundary between the abnormal and nominal
operating regions. In this case, dwell-time switching logic may be used to prevent
rapid switching between control signals. In this logic, a pre-specified length of time,
known as the dwell time, must pass before the signal can switch from one controller
to another (Morse, 1993). However, dwell-time switching has some serious disadvantages. For

example, the controller performance could severely decay within the dwell-time. An
alternative to dwell-time switching is to use hysteresis based switching. Hysteresis
switching requires that the signal to be switched exceeds the switching threshold by
a predefined value before switching occurs (Hespanha et al., 2003). However, hystere-
sis switching may introduce destabilizing transients resulting from the signal to the
control elements abruptly changing.
Given that the nature of the switching logic for high and low signal selects inher-
ently results in reasonably smooth switching, it is unlikely that efforts are required to
mitigate instabilities resulting from switching control signals to the control elements
in the override MPC design. This continuity of switching exists because both signals
must be nearly identical in magnitude as switching occurs, with only the signal that
is slightly higher or lower than the competing signal being selected by the high or low
select functions, respectively. Nevertheless, a quick simulation should be performed us-
ing the process models from the nominal and override controllers to determine whether
any instabilities result from switching.
In the event that the magnitude of process shift is large and the process is consid-
ered to be in a severely abnormal and nonlinear operating region, expert rules that
mimic an operator’s best actions may be needed to help recover the process. The
expert system acts as the final override to the MVs and has the ability to override any
previous control decisions to the MVs. Figure 4.9 illustrates the expert system.
The objective of the expert system is only to recover the process from the severely
abnormal operating region into the less-severe abnormal operating region that the
override MPC is designed for. Likewise, the expert rules should be designed with
hysteresis to ensure that the process has been recovered far enough away from the
severely abnormal operating region that the override MPC is able to complete the
recovery of the process.
With the override MPC control strategy commissioned, it is unlikely that a process
will shift to a severely abnormal operating region. However, large abrupt disturbances

Figure 4.9: Expert rules override the MV's if the process is operating in a severely abnormal region

are possible in some processes, and these disturbances may rapidly shift a process into
severely abnormal operation. Therefore, an expert system provides an extra layer of
security in the process to maintain safe and stable operation following an extraordinary
event.

4.2.3 Modifications to existing MPC

If the existing controller in the process is already MPC, then the controller can be
modified to incorporate the measure of process shift as a constraint variable. Therefore,
when the constraint is violated, the MPC will switch to the models for the constraint
variable and execute the corrective actions to recover the process to nominal operation.
The structure of the existing MPC is shown in Figure 4.10.
When comparing Figure 4.10 to Figure 4.8, we see that it is more efficient and
intuitive to implement the control strategy when the nominal controller is MPC. With
this method, there is no need to license a new MPC. Furthermore, the MPC inherently
switches the models to the constraint model set after a process shift, and MPC in
modern DCS is designed to avoid the risk of creating process instability from internal
model switching.
Figure 4.10 also illustrates the expert override control strategy that is identical to

Figure 4.10: Existing MPC controller is modified with a constraint variable (AV) to allow the MPC to recover the process after a process shift; the measure of process shift from process monitoring enters as the constraint variable, and the expert override acts on the MVs

the override strategy depicted in Figure 4.9. In this figure, however, MV and CV can
represent any number of manipulated or controlled variables since MPC is a multiple-
input and multiple-output controller.

Chapter 5

Case Study of SAG Mill Overload

The objective of this thesis is to develop a control strategy to manage processes that
are operating outside of their nominal operating region. This was achieved in Chap-
ter 4. This chapter provides an industrial case study for the application of the control
strategy for the detection of a process shift to an abnormal operating region, and the
recovery of the process from the abnormal operating region to the nominal operating
region.
This case study is conducted on the SAG mill at Gibraltar Mine. The operators
at Gibraltar Mine operate the SAG mill conservatively because of their concern for
the mill to overload. They find the early detection of overloading to be extremely
difficult, and are therefore not able to react until the situation deteriorates signifi-
cantly. Furthermore, the operators’ corrective actions to recover the process result in
a sub-optimal recovery and a substantial opportunity loss in mill performance. In this
chapter, we show how we can address these issues using the control strategy proposed
in Chapter 4.

5.1 Description of the process

A detailed overview of the SAG mill process and the overload process abnormality is
discussed in Chapter 2.3. Figure 5.1 concisely shows the SAG mill circuit at Gibraltar
Mine with key process variables labelled that are utilized by the existing MPC nominal

control strategy. In the MPC, the CVs are the feed-end west bearing oil pressure
and calculated charge density. The feed ore and water flow rates are the MVs, and
the constraint variables, labelled AV in the figure, are the motor powers, remaining
three bearing oil pressure measurements, pebble recycle, and sound level. Finally, the
disturbance variable (DV) is a measurement from a visual size analyzer of the fraction
of ore particles in the feed that are greater than two inches. Here, a particle size of
two inches is of interest because the pebble ports on the discharge grate allow two inch
particles to pass, thus particle sizes greater than two inches have a longer residence
time in the mill and significantly affect the mill dynamics.
Figure 5.1 shows only the most relevant process variables to the operation of the
mill using MPC. There are numerous additional measurements monitoring the con-
dition of the mill such as bearing temperatures and vibration measurements. Ad-
ditionally, Gibraltar Mine lacks important measurements such as discharge flow to
the pump box; however, some missing measurements can be inferred from available
measurements. Based upon the nature of SAG mill overloading, we believe a process
monitoring method that utilizes available and inferred measurements can accurately
detect the onset of overloading and also provide an estimate of the severity of the
overload. The selection of process variables for process monitoring is discussed in
Chapter 5.2.1.

5.1.1 Control objective

The objective of the nominal control strategy at Gibraltar Mine is to control the feed-
end west bearing pressure and charge density to target values. The feed-end west
bearing pressure is the highest bearing pressure because the load is distributed mostly
towards the feed chute and, during mill rotation, is lifted up onto the west side of the
mill. By controlling this bearing pressure to a target value, it is expected that the
throughput of the mill is maximized. This is true only when the mill is not overloaded
to any extent. Charge density is used as the second CV to ensure the slurry water

Figure 5.1: Overview of existing SAG mill control at Gibraltar Mine. Controlled variables: CV1 feed-end west bearing oil pressure, CV2 SAG mill density. Manipulated variables: MV1 ore feed rate, MV2 feed water. Constraint variables: AV1 motor A power, AV2 motor B power, AV3 pebble recycle conveyor load, AV4 sound level, AV5 feed-end east bearing oil pressure, AV6 discharge-end east bearing oil pressure, AV7 discharge-end west bearing oil pressure. Disturbance variable: DV1 ore particle size > 2 inches.


content is sufficient to allow adequate discharge and to prevent excessive cohesion of
clay minerals that may lead to overloading. However, the density measurement is
seldom accurate because the calculation is rudimentary and based upon the feed and
water rate measured at a single sample instance in the past. Although not discussed
in this work, a more practical approach to SAG mill control would be to maximize the
throughput of the mill. To maximize throughput, the discharge flow to the pump box
would need to be measured for use as a CV in the MPC.
The input and output variables of the process shown in Figure 5.1 are listed in
Table 5.1 with their proper control notation: x is a state variable to be controlled; u is
an input manipulated variable; w is an input disturbance; and y is a vector of output
measurements. In the case of the SAG mill overload, the constraint variables will
not be violated until the overload becomes extremely severe. Instead, many of these
measurements are included in the process monitoring technique in order to estimate
the magnitude of the mill overload. PM1 denotes the process monitoring output
variable in the table, and it represents the estimated mill overload magnitude as a
single variable.

Table 5.1: Input and output variables critical to the recovery control strategy

Outputs    Variable        Inputs    Variable
CV1        x1              MV1       u1
CV2        x2              MV2       u2
PM1        x3              DV1       w1
AV1–7      y

The control objective of the SAG mill process is shown in the following optimization

problem:

\[
\begin{aligned}
\underset{u_1, u_2}{\text{minimize}} \quad & \varepsilon(u_1, u_2) = \operatorname{abs}(x_1 - x_{1,sp}) + \operatorname{abs}(x_2 - x_{2,sp}), && (5.1)\\
\text{subject to} \quad & x_{1,lower} \leq x_1 \leq x_{1,upper}, && (5.2)\\
& x_{2,lower} \leq x_2 \leq x_{2,upper}, && (5.3)\\
& \boldsymbol{y}_{lower} \leq \boldsymbol{y} \leq \boldsymbol{y}_{upper}, && (5.4)\\
& x_3 \leq x_{3,upper}, && (5.5)\\
& \begin{bmatrix} \dot{\boldsymbol{x}} \\ \boldsymbol{y} \end{bmatrix} = f(\boldsymbol{x}, \boldsymbol{u}, \boldsymbol{w}), && (5.6)
\end{aligned}
\]

Here, the control objective is formulated to minimize the absolute value of the error
between the control variables, x1 and x2 , and their SPs, x1,sp and x2,sp . In addition, the
process monitoring output variable, x3 , has an upper constraint limit. If the constraint
in Equation (5.5) is violated, then the process is considered to be overloaded and it is
desirable to recover controller performance. Otherwise, we will be unable to control
x1 and x2 to the target SPs. This will result in poor minimization of Equation (5.1)
and possible violations of the constraints in Equations (5.2) to (5.4).
The greatest challenge in satisfying Equations (5.1) to (5.6) is the equality con-
straint imposed in Equation (5.6). Because the dynamics of f (x, u, w) are nonlinear,
the nominal linear controller used to satisfy the SAG mill control objective will be
insufficient in the event of a mill overload. In particular, a process shift from the nom-
inal operating region to the overload operating region is nonlinear, and the nominal
controller is not tuned for the overload operating region. In Chapter 5.4, we introduce
Equation (5.5) as a constraint variable into the existing nominal MPC controller for
the SAG mill. When the upper limit of Equation (5.5) is violated, the MPC switches to
the constraint variable models which are tuned specifically for the overload operating
region in order to recover controller performance.

5.2 Process monitoring

The first task in the application of the overall control strategy on the SAG mill is to
train the process monitoring method to be sensitive to the overloaded operating region.
The SAG mill overload is a complex and nonlinear abnormal situation to detect. This
is because the onset of overloading can be the result of any of several phenomena each
affecting process outputs in different ways. In addition, the effects of overloading in
the early stages are subtle with only small deviations to the linear characteristic of the
nominal range of operation.
In this section, we show that traditional PCA using Q and T 2 , Method 1, is unable
to detect the onset of overloading. Next, we show that PCA with LDA classifica-
tion, Method 2, successfully detects the onset of overloading when monitoring only a
specific selection of process variables. Furthermore, PCA Method 2 was implemented
on site at Gibraltar Mine in January 2013. Finally, we also show that SLLEP with
LDA classification is able to successfully detect the onset of overloading regardless of
the selection of process variables, and the classification results are superior to PCA
Method 2. Unlike PCA, SLLEP was not implemented on site due to limitations of the
MPC tool available at Gibraltar Mine. The results from the implementation of PCA
demonstrate the first successful implementation of a process monitoring
method to both detect SAG mill overloading and estimate the magnitude of overload
at a mineral processing facility.

5.2.1 Selection of measured and inferred process variables

There are numerous input and output variables considered to train the feature extrac-
tion methods for process monitoring. There are nearly 100 measurements available
for the SAG mill. However, only a handful of these variables are needed to estimate
a process overload; other process variables are not sensitive to the changes within the
mill relating to overloading, or they provide redundant information that would bias

the feature extraction results.
The candidate process variables considered for process monitoring are listed in
Table 5.2. This table provides a summary of the original units, filtering time constant,
upper and lower ranges for scaling, and the standard deviation of the scaled variables.
All 10 candidate variables are scaled from 0 to 1, and the upper and lower ranges for
scaling are manually selected based upon the following heuristics: the variables are
scaled considering their upper and lower operational constraints at Gibraltar Mine, and
the scale range is adjusted to create relatively similar standard deviations between all
candidate variables.

Table 5.2: Properties of the candidate process variables


Process Variable         Units    Lower     Upper    Filter Time      Standard
                                  Range     Range    Constant (min)   Deviation
Bearing Pressure         psi      850       1200     1                0.1316
Motor Power              kW       4000      6200     1                0.0701
Sound Level              n/a      -40       40       1                0.1099
Center of Gravity        n/a      0.459     0.464    4                0.0907
Difference of Slopes     n/a      -0.0027   0.0027   11               0.1135
Recycle                  STPH     0         600      1                0.0599
Discharge to Pump Box    STPH     2200      4800     1                0.0870
Specific Energy          kW/ton   1         12       1                0.1291
Vibration                n/a      -0.15     0.15     5                0.1132
Size > 2 inch            %        0         100      4                0.1017

The filtering time constants listed in Table 5.2 are also determined heuristically.
The filter time constant needs to be as small as possible to minimize detection delay
of the overload in process monitoring. Conversely, the time constant needs to be large
enough to filter stochastic noise from the data in order to help minimize false alarms.
A first order filter is used to filter the data and has the difference equation:

y[k] = αf y[k − 1] + (1 − αf )x[k], (5.7)

where k is the sample instance, x is the filter input, y is the filter output, and αf is

given in the following expression:
\[
\alpha_f = \exp\!\left(\frac{-\Delta t}{\tau}\right). \tag{5.8}
\]

Here, τ is the filter time constant, and ∆t is the elapsed time in minutes between
samples. For the SAG mill case study, we believe it is ideal to detect overloading
within at most 20 minutes. Otherwise, the mill may severely overload and result in
excessive stress on the equipment and poor process economy. Although some variables
may have a large detection delay due to filtering, the advantage of a multivariate
process monitoring method is that all the variables complement each other and the
disadvantages of one variable, such as detection delay, can be offset by other variables
with low detection delay. Similarly, no single variable alone is suitable to monitor SAG
mill overloading, thus several variables are needed to provide sufficient evidence of an
overload.
It is important to note in Table 5.2 that there is only one input candidate variable,
feed rate with size greater than two inches, and the rest are process outputs. Time
plot studies of the data discovered that other process inputs, such as the total feed
rate and water flow, could be any arbitrary value during an overload. In other words,
there is a poor correlation between mill overloading and these inputs. This is expected
because the overload is the result of complex changes to the internal states of the mill
that needs to be indirectly measured from the process outputs.
Each of the 10 candidate variables and their relevance to a SAG mill overload are
briefly discussed in the following sections.

Bearing pressure, motor power, difference of slopes

Bearing pressure and motor power are two variables that are closely monitored by op-
erators when they are concerned about an overload. During an overload, they expect
to see high values for bearing pressure and motor power. Furthermore, they also ex-

pect bearing pressure to continually increase as a mill overloads, yet the motor power
may not increase or even decrease during severe overloading. Therefore, based upon
operator experience, we believe that the difference in the rates of change between bear-
ing pressure and motor power is also important to monitor. A variable transformation
of the scaled bearing pressure and scaled motor power is used to create the difference
of slopes variable:
\[
\text{Difference of Slopes} = \frac{\Delta \text{BP}}{\Delta t} - \frac{\Delta \text{MP}}{\Delta t}, \tag{5.9}
\]
where the slopes are determined from a first-order finite difference calculation given by:
\[
\frac{\Delta \text{BP}}{\Delta t} = \frac{\text{BP}[k] - \text{BP}[k-1]}{\Delta t}, \tag{5.10}
\]
\[
\frac{\Delta \text{MP}}{\Delta t} = \frac{\text{MP}[k] - \text{MP}[k-1]}{\Delta t}. \tag{5.11}
\]

Here, BP is the feed-end west bearing pressure, MP is the motor power for motor
A, k is the sample instance, and ∆t is the time between sample instances and is chosen
to be one minute—one minute is reasonable since the time constant of the process is
estimated to be around 10 minutes.
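A minimal sketch of Equations (5.9)–(5.11), assuming bp and mp hold the scaled and filtered samples with the newest value last, is:

```python
def difference_of_slopes(bp, mp, dt=1.0):
    """Difference of slopes of Equations (5.9)-(5.11) at the current
    sample, with bp[-1] and mp[-1] the newest scaled values.

    dt : time between samples (one minute in this work).
    """
    dbp_dt = (bp[-1] - bp[-2]) / dt      # (5.10)
    dmp_dt = (mp[-1] - mp[-2]) / dt      # (5.11)
    return dbp_dt - dmp_dt               # (5.9)
```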

Sound level

Another important variable to monitor is sound level. Experienced operators insist
that an overloaded mill is quieter than normal. Moreover, the review of SAG mill
overloading in Chapter 2.3 identified that mill acoustics should decrease during an
overload resulting from the reduced potential energy or dampening of the cataracting
charge.
A time plot of the sound level measurement of the SAG mill revealed that the sound
level has a moving average. The moving average is likely a result of changes in the
steel ball charge and ore type. The condition of the mill liner and auxiliary equipment
where the SAG mill is located may also affect the sound level. Consequently, the

sound level was transformed into the difference between a 0.4 day moving average of
the sound level and the raw measurement filtered at 40 seconds. A time plot of the
raw sound level and the moving average is shown in Figure A.1 in Appendix A.1, and
the sample instances when the mill was overloaded are highlighted to show the effect of
overloading on the raw sound level compared to the moving average.
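A minimal sketch of this transformation, assuming two-minute sampling and implementing both the trailing 0.4 day moving average and the 40 s first-order filter inline, is:

```python
import numpy as np

def sound_level_feature(raw, dt_min=2.0):
    """Sound-level transformation described above: a 0.4 day trailing
    moving average of the raw signal minus the raw signal filtered
    with a 40 s first-order filter (Equations (5.7)-(5.8)).

    raw    : raw sound level samples as a numpy array.
    dt_min : sample interval in minutes (two minutes assumed here).
    """
    window = max(1, int(round(0.4 * 24 * 60 / dt_min)))   # samples in 0.4 day
    moving_avg = np.array([raw[max(0, k - window + 1):k + 1].mean()
                           for k in range(len(raw))])
    alpha = np.exp(-dt_min / (40.0 / 60.0))               # 40 s time constant
    fast = np.empty(len(raw))
    fast[0] = raw[0]
    for k in range(1, len(raw)):
        fast[k] = alpha * fast[k - 1] + (1.0 - alpha) * raw[k]
    # Formulated so that a quiet (overloaded) mill gives a high value.
    return moving_avg - fast
```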

Center of gravity

The review in Chapter 2.3 also revealed that load distribution may be another key
variable for process monitoring. If the load distribution shifts to the east side of
the mill where the charge is not lifted, then it may indicate that ore or slurry is
accumulating on this side and reducing the grinding efficiency of the mill. To measure
load distribution, the change of total mass between the east and west side of the mill
can be determined from the bearing pressure measurements. The center of gravity of
the mill can be used to estimate the load distribution, and it is calculated from the
expression:

\[
\text{Center of Gravity} = \frac{\text{FE}_{East}\,\text{BP} + \text{DE}_{East}\,\text{BP}}{\text{FE}_{West}\,\text{BP} + \text{DE}_{West}\,\text{BP} + \text{FE}_{East}\,\text{BP} + \text{DE}_{East}\,\text{BP}}, \tag{5.12}
\]

where FE is the feed-end of the mill, DE is the discharge-end of the mill, and BP refers
to the bearing pressure at specific locations of the SAG mill. Since the total force on
the bearings is largely affected by the weight of the mill, it is expected that the load
distribution estimate will not be sensitive to the small changes in charge distribution
inside the mill. However, the estimate can be appropriately scaled to emphasize the
expected subtle changes in this measurement.
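A minimal sketch of Equation (5.12) from the four bearing pressure measurements is:

```python
def center_of_gravity(fe_east, fe_west, de_east, de_west):
    """Load-distribution estimate of Equation (5.12) from the four
    trunnion bearing oil pressures (FE = feed end, DE = discharge end)."""
    east = fe_east + de_east
    return east / (fe_west + de_west + east)
```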

Recycle rate, discharge to pump box

During an overload, it is also expected that the recycle flow and discharge to the
pump box will decrease even though bearing pressure increases and the feed rate may

be constant. The discharge flow to the pump box is inferred from the feed rate, water
flow addition, and change of tonnage in the mill and calculated from the mass balance
given in the expression:

\[
\frac{\Delta m_{charge}}{\Delta t} = (\text{Feed} + \text{Water} + \text{Recycle}) - (\text{Discharge to Pump Box} + \text{Recycle}), \tag{5.13}
\]

and rearranged to solve for the discharge to pump box:

\[
\text{Discharge to Pump Box} = \text{Feed} + \text{Water} - \frac{\Delta m_{charge}}{\Delta t}. \tag{5.14}
\]

Here, feed is given in short tons per hour (STPH) and water is also given in STPH
after being converted from gallons per minute (GPM) to STPH where 1 GPM is 0.25
STPH of water at 10◦ C. The discharge to pump box variable is also given in STPH,
and ∆mcharge is the estimated change in the mass of the charge over a time interval
of ∆t of one minute. At Gibraltar Mine, the mass of the total charge in the mill is an
available measurement inferred from the four bearing pressure measurements with a
scaling and bias determined from a calibration performed on the empty mill.
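A minimal sketch of Equations (5.13) and (5.14), where the GPM-to-STPH factor of 0.25 and the conversion of the charge-mass change to an hourly rate are taken directly from the unit conventions given above, is:

```python
def discharge_to_pump_box(feed_stph, water_gpm, dm_charge_tons, dt_min=1.0):
    """Inferred discharge flow of Equations (5.13) and (5.14), in STPH.

    feed_stph      : fresh ore feed rate (STPH).
    water_gpm      : feed water addition (GPM); 1 GPM of water at
                     10 C is taken as 0.25 STPH, as in the text.
    dm_charge_tons : change in inferred charge mass (short tons)
                     over the last dt_min minutes.
    """
    water_stph = 0.25 * water_gpm
    accumulation_stph = dm_charge_tons * 60.0 / dt_min   # tons/min -> STPH
    return feed_stph + water_stph - accumulation_stph
```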

Specific energy

Specific energy is a measure of the amount of energy consumed per ton of ore processed
in the mill. In the midst of an overload, the specific energy of the charge should
decrease because mass is accumulating but motor power does not increase. Specific
energy is calculated from the expression:

\[
\text{Specific Energy} = \frac{\text{Total Motor Power}}{\text{Total Bearing Pressure}}, \tag{5.15}
\]

where the total motor power is the combination of the motor power from both motors
A and B, and the total bearing pressure is the combination of all four bearing cells.

Bearing vibration, feed rate of size greater than two inches

The final two variables considered for monitoring are bearing vibration and the feed
rate with a size greater than two inches. The effects of overloading on bearing vibration
are poorly understood in the industry. However, we believe bearing vibrations may
initially increase upon overloading due to the increased overall inertia and total kinetic
energy of the charge as the mass of the charge increases. Eventually, vibrations may
decrease upon significant overloading as the impact energy of the cataracting charge
decreases.
Even though feed with a particle size greater than two inches is considered as a
candidate process variable, a time plot of this variable also shows that there appears
to be no easily observable correlation between an overload and the particle size distur-
bance. Time plots of each of the scaled process variables are provided in the following
section.

Visualization of the training data

A time plot of the training data-set for PCA and SLLEP feature extraction is given
in Figure 5.2 showing data sampled at two minutes from Gibraltar Mine on Febru-
ary 29th, 2012.
Figure 5.2 shows that the bearing pressure and motor power time trends are two
excellent indicators of the overload condition. Normally, bearing pressure and motor
power are correlated; however, during an overload, bearing pressure increases and
motor power holds steady. Upon severe overloading, shown just before sample instance
250, motor power and bearing pressure are inversely correlated. The figure also shows
that the sound level variable—formulated so that low mill acoustics results in a high
fault indicator—is high during an overload. This corresponds to a much lower sound
level than the moving average of the sound level. The figure also shows the center of
gravity increases implying that the load distribution has shifted slightly to the east
side of the mill. The third subplot of the figure shows the difference of slopes is

Figure 5.2: Candidate process variables of the training data-set for process monitoring
positive when the mill is overloaded which is apparent from the trend being above the
dashed black line at the center of the scale. In this subplot, it is also observable that
the recycle flow decreases once the mill is severely overloaded. The fourth subplot
shows the discharge to pump box is high during the mill overload; however, it does
not increase enough to offset the consistent rise in bearing pressure that indicates an
accumulation of ore in the mill. The specific energy is observed to decrease during the
overload. Finally, the vibration measurement appears to temporarily increase during
an overload, and the feed rate with size greater than two inches is unexpectedly less
than normal.
From the training data shown in Figure 5.2, the first five candidate process vari-
ables listed in Table 5.2 appear to be the most important indicators of the overload
situation. Specific energy provides somewhat redundant information to the bearing
pressure, recycle indicates the overload only during severe overloading, and vibration
only indicates an overload at the onset of overloading. It is unlikely that a linear
feature extraction method could utilize the nonlinear characteristics of variables such
as the vibration measurement since their correlation with the other variables is poor.
However, nonlinear feature extraction may appropriately utilize these nonlinear mea-
surements. Although it appears that five variables are the most important for process
monitoring, all 10 candidate process variables will be considered when comparing PCA
to SLLEP in Chapter 5.2.4.
Figure 5.3 shows a pair-wise scatter plot between the five most relevant candidate
process variables for the training data-set—the other five variables are not shown
for brevity. The scatter plots show that the blue data points, representing nominal
operating data, are scattered in an approximately elliptical fashion which is most
visible for plots of bearing pressure and other variables. However, when considering
both the nominal and overload operating data, the data no longer scatters elliptically
for the mill overload. This is a clear indication of the nonlinear nature of the SAG mill
operation, particularly near the boundary between nominal and overloaded operation.

The non-elliptical scattering of the data between nominal and overloading operation
shows that at the boundary between these operating states multivariate normality of
the data does not hold. This implies that linear feature extraction techniques such
as PCA may result in poor performance when used in process monitoring to detect
the onset of overloading. However, in PCA, the fit between the PC model and the
measured values can itself provide an indication of overloading when the fit is poor.

Visualization of the test data

Figure 5.4 shows the time trends of the 10 candidate process variables for the test
data-set. This data-set is developed from data sampled at two minutes from Gibraltar
Mine on May 27th, 2012, when three consecutive overloads occurred. This data-set is
well suited for validating the performance of the process monitoring technique because
the mill is in a double-fault situation. In this case, the engineers overestimated the steel
ball charge in the mill. As a result, the grinding efficiency was depressed and the mill
could only be operated at a much lower total charge mass than normal. The effects
of the lower charge mass can be observed from the unusually low bearing pressure.
Because the bearing pressure was so low, the operators attempted to increase the
bearing pressure in order to increase the mill throughput. However, the mill was
already nearly overloaded. Consequently, the mill easily overloaded, and it was later
discovered that the deficiency of the steel ball charge indirectly led to overloading.

Figure 5.4: Candidate process variables for a segment of the test data-set for process
monitoring

The test data shown in Figure 5.4 have mean and standard deviation values so different
from the training data that any attempt to train feature extraction methods on the
standardized training data failed during model validation on the test data. Only after
scaling the data between 0 and 1 was it possible to mitigate the drastic changes to the
mean and standard deviation of the variables and design a feature extraction method
effective for both the training and test data-sets. Scaling the variables rather than
standardizing is one of the features that makes the process monitoring models
developed in this chapter robust to changing process conditions.
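
As a minimal sketch of this scaling step (our illustration, written in Python with
NumPy; the thesis work itself was carried out in MATLAB, and the function names
here are hypothetical), the scaling parameters are fitted on the training data only and
then re-used unchanged on new data:

    import numpy as np

    def fit_minmax(X_train):
        # Learn the per-variable minimum and range from the training data only.
        lo = X_train.min(axis=0)
        rng = X_train.max(axis=0) - lo
        rng[rng == 0] = 1.0  # guard against constant variables
        return lo, rng

    def apply_minmax(X, lo, rng):
        # Scale each variable to approximately [0, 1]; new data may fall
        # slightly outside this range if it exceeds the training extremes.
        return (X - lo) / rng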

5.2.2 PCA

Two methods of training the PC model and classifying the feature space were inves-
tigated. The procedure for both of these methods was discussed in Chapter 4.1.1.
First we will show that PCA Method 1 is not satisfactory for process monitoring of a
complex process such as a SAG mill. Then, we will demonstrate that PCA Method 2
is instead more suitable for these processes if careful consideration is given to the
selection of process variables in the model.

PCA Method 1

To use PCA as a feature extraction technique, Method 1 was first evaluated. Since this
method requires that the training data are entirely nominal operating data, a training
data-set different from Figure 5.2 was used. The training data-set used to train PCA
Method 1 is shown in Figure 5.5. This data-set is taken at a two minute sampling
from Gibraltar Mine on February 29th, 2012. This training data includes some of the
training data in Figure 5.2 up until the point where the mill became overloaded. PCA
Method 1 was trained on this data using the first five variables in Table 5.2.
The test statistics Q and T 2 were calculated for the test set in Figure 5.4 to classify
abnormal and normal process behaviour. Figure 5.6 shows the results for the Q and T 2
statistical classifiers from Method 1 on the test data-set from May 27th, 2012 sampled
every two minutes from Gibraltar Mine. In these results, we see that neither Q nor
T² is sensitive enough to the mill overload to detect the early onset of overloading.
This is likely because the overload results in subtle changes to the correlation matrix
used to train the PC model. Moreover, these statistics appear to be more sensitive
to the highly nonlinear underloading abnormal situation. However, mill underloading
does not require a recovery control strategy like overloading does.
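
For reference, the two statistics can be computed from a trained PC model as follows
(a sketch assuming NumPy, with the data already scaled and with P holding the
retained loading vectors in its columns; the formal definitions are in Chapter 4.1.1):

    import numpy as np

    def q_and_t2(X, x_mean, P, pc_variances):
        # X: scaled observations (n x m); P: retained loadings (m x a);
        # pc_variances: eigenvalues of the a retained PCs.
        Xc = X - x_mean                              # centre on the training mean
        T = Xc @ P                                   # scores in the PC subspace
        residual = Xc - T @ P.T                      # part unexplained by the model
        Q = np.sum(residual ** 2, axis=1)            # squared prediction error
        T2 = np.sum(T ** 2 / pc_variances, axis=1)   # Hotelling T² statistic
        return Q, T2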

Figure 5.5: Candidate process variables of the training data-set for PCA Method 1

Figure 5.6: Q and T² results on the test data-set for PCA Method 1

Table 5.3 summarizes the classification results for PCA Method 1. Two PCs were
retained in the PC model and used to calculate the Q and T² results. The performance
of the Q and T² test statistics was evaluated using sensitivity and specificity, with
their average reported in a column labelled 'Metric'.
Sensitivity is a measure of the proportion of true overload operating data that are
correctly identified by the classifier. Sensitivity is defined by the following expression:

Sensitivity = (# of true overload) / (# of true overload + # of false nominal),    (5.16)
where a true overload corresponds to the data we labelled as overloaded before training
the classifier and was correctly classified, and a false nominal corresponds to data
classified as nominal by the classifier when instead we labelled that data as belonging
to the overloaded class. The low sensitivity of both Q and T 2 reveals that these

classifiers are poor indicators of the SAG mill overload.
The other metric used to evaluate the performance of the classifier is specificity.
Specificity is a measure of the proportion of true nominal operating data that are
correctly identified by the classifier. Specificity is defined by the following expression:

Specificity = (# of true nominal) / (# of true nominal + # of false overload),    (5.17)
where a true nominal corresponds to the data we defined as nominal before training the
classifier and was correctly classified, and a false overload corresponds to data classified
as overloaded by the classifier when instead we labelled that data as belonging to the
nominal class.
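
In code, Equations (5.16) and (5.17) reduce to simple counts over boolean label arrays
(a sketch assuming NumPy, with True denoting the overloaded class):

    import numpy as np

    def sensitivity_specificity(y_true, y_pred):
        # y_true: labels assigned before training; y_pred: classifier output.
        true_overload  = np.sum( y_true &  y_pred)
        false_nominal  = np.sum( y_true & ~y_pred)
        true_nominal   = np.sum(~y_true & ~y_pred)
        false_overload = np.sum(~y_true &  y_pred)
        sensitivity = true_overload / (true_overload + false_nominal)
        specificity = true_nominal / (true_nominal + false_overload)
        return sensitivity, specificity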

Table 5.3: Classification results for PCA Method 1

Dimension of
Feature Space  Data-set  Qα     Tα²   Q Sens.  Q Spec.  T² Sens.  T² Spec.  Metric
2D             Train     0.058  6.05  0%       97.7%    0%        89.4%     46.8%
               Test                   0%       74.7%    9.1%      93.1%     44.2%

Sensitivity and specificity are used instead of an overall classification error rate
because we are most interested in knowing if we incorrectly classified an overload as
being normal. Also, it is likely that more nominal data is available than overload
data, and this imbalance biases this type of estimate towards measuring the successful
classification of only the nominal data.
For complex processes, we do not expect to see satisfactory process monitoring
results from PCA Method 1. This case study is one example providing evidence that
Method 1 is not sensitive enough to subtle changes in the correlation structure of
data in the event of a process shift. However, Method 1 is included in this thesis to
demonstrate the advantages of PCA Method 2 or SLLEP proposed in this work. As a
general rule, if it is known that the process dynamics are complicated, we recommend
not to attempt to use Method 1 for process monitoring.

PCA Method 2

PCA Method 2 has more satisfactory feature extraction and classification results than
Method 1. In Method 2, the PC model was trained on the training data-set in Fig-
ure 5.2. This data-set has enough overload data included with the nominal data that
when the PC model was trained, correlations existed between the first five candidate
variables in Table 5.2. In fact, the inclusion of the overload data allowed variables
such as the center of gravity to develop a correlation structure that otherwise never
existed in the nominal operating region.
Figure 5.2 shows that the center of gravity only has a strong correlation with other
variables once the mill is overloaded. This is a key reason why overload data are needed
in training the PC model. The trade-off of incorporating overload training data occurs
when the overload is severe and the correlation structure differs significantly from
the nominal operation. However, as long as a limited amount of data in the severe
operating region is used, then the PC model will not significantly lose performance
as a feature extractor. The PC model parameter results from training are shown in
Table 5.4.
Table 5.4: PCA modeling results

PC1 PC2 PC3 PC4 PC5


Bearing Pressure 0.651 0.427 0.073 -0.339 -0.523
Motor Power 0.313 0.128 -0.232 -0.483 0.774
Sound Level 0.456 -0.186 -0.688 0.533 -0.027
Centre of Gravity 0.329 0.300 0.577 0.585 0.355
Difference of Slopes 0.403 -0.823 0.367 -0.158 -0.016
Eigenvalue 0.0360 0.0088 0.0071 0.0034 0.0001
% Variance Explained 64.90 16.00 12.80 6.10 0.30

Table 5.4 shows the five PCs and the coefficients of the five process variables used
in the model. From the table, we see that nearly all of the variance in the data-set can
be explained with one, two, or three PCs. Scores from each PC were plotted against
each other in 2D plots to inspect which PCs were sensitive to overloading. It was

found that PC1 and PC2 were most sensitive to the overload. The 2D plot of PC1 and
PC2 scores is shown in Figure 5.7a.
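
For reference, a PC model of this kind can be obtained from the scaled training data
by an eigendecomposition of the covariance matrix (a sketch assuming NumPy;
Chapter 4.1.1 gives the exact procedure used to produce Table 5.4):

    import numpy as np

    def train_pc_model(X, n_retain):
        # X: scaled training data (n x m), rows are observations.
        x_mean = X.mean(axis=0)
        S = np.cov(X - x_mean, rowvar=False)      # covariance matrix
        eigvals, eigvecs = np.linalg.eigh(S)      # ascending eigenvalues
        order = np.argsort(eigvals)[::-1]         # re-sort descending
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        P = eigvecs[:, :n_retain]                 # retained loading vectors
        pct_explained = 100.0 * eigvals / eigvals.sum()
        return x_mean, P, eigvals[:n_retain], pct_explained

Scores for new (scaled) observations are then simply T = (X − x_mean) P, which is
the projection plotted in Figure 5.7a.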

Figure 5.7: LDA classification results on the PC score space of the training data-set
for a) 2D LDA, and b) 1D LDA

Figure 5.7a shows that the nominal and overloaded operating data occupy their
own distinct regions of the PC score space. A linear classification boundary is drawn
in the figure to separate the data into binary classes of nominal and overload operating
data. The equation for the classification boundary is determined by 2D LDA and is
given in Table 5.5.
Table 5.5 summarizes the classification results for the PCA-LDA process moni-
toring method using sensitivity, specificity, a cumulative performance metric labelled

Table 5.5: PCA-LDA classification results for the 2D and 1D feature spaces

Dimension of
Feature Space  Classifier           Data-set  Sens.  Spec.  Metric  APER
2D             y = −1.464x + 1.641  Train     97.8%  99.4%  98.6%   31.2%
                                    Test      18.8%  100%   59.4%
1D             x = 1.195            Train     100%   95.5%  97.8%   13.0%
                                    Test      65.4%  100%   82.8%

‘Metric’, and the apparent error rate (APER). The column labelled ‘Metric’ in Ta-
ble 5.5 is simply the average of the sensitivity and specificity metrics, and is used to
provide a concise summary of the classification results. APER is the estimated overall
error rate of the classifier, and a low APER value is favourable. APER is determined
from the classification results on the test set, and it is calculated as follows:

APER = (# of false overload + # of false nominal) / (# of samples in the test set).    (5.18)

Figure 5.7a demonstrates the importance of being able to visually interpret high-
dimensional data into a low-dimensional feature space that is easily plotted. In this
case, we see the linear classifier has separated the nominal and overloaded data across
two-dimensions. However, it appears the data are almost entirely separable across one
dimension—the scores for PC1 . As a consequence, the 2D linear classifier over-fits the
training data, and the classifier will not generalize well to new data sampled online at
Gibraltar Mine. To mitigate over-fitting, the classifier is trained on PC1 score data
in 1D LDA, and the classification results are shown in Figure 5.7b and summarized
in Table 5.5. The figure and table show that the classification results between 2D
and 1D LDA on the feature space are nearly identical. However, when the 2D and 1D
classifiers are applied to the test data-set, the results indicate that the 1D classifier has
superior classification accuracy to the 2D classifier. The classification results for 2D
and 1D LDA on the feature space of the test set are given in Figure 5.8a, Figure 5.8b,
and Table 5.5.
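
Applying the fitted boundaries online is then trivial (a sketch; the coefficients are
those in Table 5.5, and which side of each boundary corresponds to the overloaded
class is our assumption based on Figure 5.7):

    def overload_2d(pc1, pc2):
        # 2D LDA boundary from Table 5.5 (overload side assumed).
        return pc2 > -1.464 * pc1 + 1.641

    def overload_1d(pc1):
        # 1D LDA boundary from Table 5.5; overloaded operation is
        # associated with high PC1 scores in Figure 5.7.
        return pc1 > 1.195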

Figure 5.8: LDA classification results on the PC score space of the test data-set for
a) 2D LDA, and b) 1D LDA

5.2.3 SLLEP

In this section, we show that SLLEP in combination with LDA can also successfully de-
tect the onset of SAG mill overloading. In addition, in this case study we demonstrate
that this technique achieves equivalent or better process monitoring performance than
PCA Method 2, and this technique is suitable for both low and high-dimensional pro-
cess monitoring. PCA Method 2, on the other hand, is limited to monitoring the few
scarce variables with a reasonably linear response during a process shift. The method
for training SLLEP as a feature extractor is discussed in Chapter 4.1.2.

In this case study, we found that we needed approximately 2000 data points from
both nominal and overloaded training data to properly train SLLEP. This training
data-set was obtained from Gibraltar Mine for February 25th, 2012 to February 29th,
2012. This data-set includes the data shown in Figure 5.2. All nominal operating
data were sampled every five minutes, and overload data for the overload shown in
Figure 5.2 and a partial overload on February 27th were sampled at 30 seconds. In
addition, nominal data from May 26th, 2012 were included in the training set with
a sampling of five minutes to include data from different operating regions. Finally,
all data where the SAG mill was not in operation were removed. This results in
a training data-set with 1919 data-points spanning over six days and includes two
identified overloads.
Four candidate models for several combinations of candidate process variables were
considered when training and testing SLLEP. The four candidate models are shown in
Table 5.6. Model 1 is the same model used in training the PC model in Chapter 5.2.2.
Three other models with increasing numbers of variables in the model were considered.

Table 5.6: Summary of the variables retained in each candidate model

Variables Model 1 Model 2 Model 3 Model 4


Bearing Pressure X X X X
Motor Power X X X X
Sound Level X X X X
Centre of Gravity X X X X
Difference of Slopes X X X X
Recycle X X
Discharge to Pump Box X X X
Specific Energy X
Vibration X X
Size > 2 inch X

The results for SLLE on the training data-set mapped to 2D space are shown in
Figure 5.9 for Model 4. The figure shows that the nominal operating data appears

to scatter elliptically, and the nominal and overload operating data are well-separated
with some misclassifications near the classification boundary. A 2D visual interpreta-
tion of all 10 process variables would not be possible without a method such as SLLE,
and the overload data are separated enough from the nominal operating data that it
is possible to use LDA to classify the 2D space into regions of nominal and overload
operation. Finally, the figure also shows that QDA over-fits the region and is unable
to classify some severely overloaded data within the quadratic classification boundary,
and classification of severely overloaded data needs to be perfect due to the potential
safety risks and equipment damage resulting from an overloaded mill.

Figure 5.9: LDA and QDA classification on the training data-set mapped into 2D
space by SLLE

In both LDA and QDA classifiers on the feature space in Figure 5.9, we see that
there are misclassified data points. However, these misclassified data points are ac-

ceptable because they likely tell us two things. First, since we manually assigned a
class label to data ranges that appeared faulty, there is subjectivity to this process.
Only the most obvious overload data were classified as being overloaded in order to
reduce bias towards classifying data as overloaded. Secondly, it is nearly impossible to
perfectly classify data as being nominal or overloaded when the process shift between
the two operating regions is continuous. In other words, there is a ‘fuzzy’ operating
region in the middle that can be considered both slightly overloaded and nominal at
the same time. This presents a challenge when assigning a binary class label to a non-
binary abnormal situation. Given these points, we expect to see misclassifications,
and the linear classification boundary in the figure shows a good compromise between
classifying data as nominal or overloaded.
Figure 5.10 shows the 2D SLLEP mapping of the test data-set using Model 4.
The results from this plot show that the linear classifier is superior to the quadratic
classifier because it misclassifies fewer observations. The figure also shows that there
are numerous data points where the overload data were classified incorrectly as nominal
by the linear classifier. However, an inspection of these data points revealed that these
data points are on the fringes where manual class labels were added to the overload
ranges in the test set; in other words, we most likely manually classified some data as
being overloaded when in reality the mill was not necessarily overloaded.
Table 5.7 summarizes the SLLEP classification results for the four candidate mod-
els. Both LDA and QDA were tested for each model. The performance metric to
compare the models is simply the average of the four training and test sensitivity
and specificity values. The training sensitivity and specificity were determined from
10-fold cross-validation of the training data-set, then the test data were mapped to
the feature space and the classification results were determined. This table shows
us that the linear classifier consistently scores better than the quadratic classifier for
these four models. This is because the quadratic classifier is more susceptible to
over-fitting the feature space than the linear classifier.
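
The 10-fold cross-validation used for the training figures can be sketched as follows
(assuming NumPy and the sensitivity_specificity helper sketched in Chapter 5.2.2;
train_and_classify stands in for the SLLEP-LDA training routine and is hypothetical):

    import numpy as np

    def crossval_sens_spec(X, y, train_and_classify, n_folds=10, seed=0):
        # Hold out each fold once, train on the remainder, and average
        # the sensitivity and specificity over the folds.
        idx = np.random.default_rng(seed).permutation(len(y))
        folds = np.array_split(idx, n_folds)
        sens, spec = [], []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            y_pred = train_and_classify(X[train], y[train], X[test])
            s, p = sensitivity_specificity(y[test], y_pred)
            sens.append(s)
            spec.append(p)
        return float(np.mean(sens)), float(np.mean(spec))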

Figure 5.10: LDA and QDA classification on the test data-set mapped into 2D space
by SLLEP

Table 5.7 also shows that the classification performance increases as the number
of variables used to train the model increases. This is not expected to always be the
case; however, these results demonstrate that SLLEP works well with high-dimensional
data-sets. In addition, SLLEP-LDA classification performance was not hindered by
the addition of the particle size data, even though this variable does not correlate well
with the overload in the test data-set.
The equations of the linear and quadratic classification boundaries, determined
from all solutions to f (yi ) = 0, for SLLEP on Model 4 are given in Table 5.8. The
APER is also given in the table to provide an estimate of the classification error rate
expected for the classifiers.

Table 5.7: SLLEP Classification results of four candidate models

Model  Classifier  K   γ       Train Sens.  Train Spec.  Test Sens.  Test Spec.  Metric
1      Quadratic   28  0.0016  91.3%        93.7%        66.2%       99.5%       87.7%
1      Linear      28  0.0016  91.1%        93.7%        70.0%       99.1%       88.5%
2      Quadratic   33  0.0022  91.5%        87.1%        81.5%       94.9%       88.8%
2      Linear      34  0.0020  97.8%        90.1%        85.4%       94.0%       91.8%
3      Quadratic   31  0.0004  96.2%        93.7%        70.0%       96.8%       89.2%
3      Linear      31  0.0004  96.2%        93.7%        80.0%       98.1%       92.0%
4      Quadratic   26  0.0012  99.1%        89.0%        86.9%       94.9%       92.5%
4      Linear      26  0.0012  98.9%        89.0%        90.8%       93.5%       93.1%

5.2.4 Comparison between PCA-LDA and SLLEP-LDA

In Chapters 5.2.2 and 5.2.3 we showed that PCA-LDA and SLLEP-LDA both suc-
cessfully classify the overload abnormal operating situation in the training and test
data-sets. The combined classification results of the two methods are summarized in
Table 5.9 for the four models identified in Table 5.6.
Table 5.9 shows that SLLEP-LDA is superior to PCA-LDA as a process monitor-
ing method for the test data-set. For each model, SLLEP-LDA is more sensitive than
PCA-LDA to detect an overload. On the other hand, PCA-LDA consistently has a
higher specificity than SLLEP-LDA. However, a high specificity alone is not indica-
tive of a good classifier. Therefore, a comprehensive performance metric for the two
methods is included that takes the average of the sensitivity and specificity for each
method. The best model chosen for PCA-LDA is Model 1, and the model chosen
for SLLEP-LDA is Model 4. Based upon the performance metrics for these models,
SLLEP-LDA outperforms PCA-LDA by 9.4%.

Table 5.8: Classification boundaries for SLLE and the classification error rate

Classifier  f(yi)                                           APER
Linear      2.8 + 2.42x − 4.95y                             8.7%
Quadratic   1.81 + 0.75x − 5.88y + 1.96x² + 3.3xy + 1.13y²  13.3%

Table 5.9: Comparison of the classification results between PCA-LDA and SLLEP-
LDA on the test set

Model  PCA Sens.  LLE Sens.  PCA Spec.  LLE Spec.  PCA Metric  LLE Metric
1      65.4%      70.0%      100%       99.1%      82.7%       84.5%
2      53.1%      85.4%      100%       94.0%      76.5%       89.7%
3      15.4%      80.0%      100%       98.1%      57.7%       89.1%
4      0.0%       90.8%      100%       93.5%      50.0%       92.1%


Figure 5.11 provides a visual comparison of the process monitoring output, also
defined as the measure of process shift, on the test data-set for Model 1 of PCA-LDA
and Model 4 of SLLEP-LDA. Bearing pressure and motor power are included in the
figure to help orient these results relative to the complete test data shown in Figure 5.4.
Figure 5.11 demonstrates that the output between the two methods is nearly identical.
In fact, SLLEP-LDA is slightly more sensitive to the overload than PCA-LDA, and
has resulted in a misclassification around sample 70. False alarms such as these are
reflected in the specificity measurement, and SLLEP-LDA could easily be tuned to
reduce these misclassifications by adjusting the classification boundary to increase
the cost of misclassifying nominal data as overloaded in Equation (4.30).
This is one example where, if the assumptions of PCA are valid and the dimension-
ality of the observation space is low, then PCA can almost match the performance of
SLLEP. However, because of the assumption of linearity and multivariate normality
in PCA that SLLEP does not require, SLLEP as a feature extractor will always be
equivalent to or better than PCA if trained properly (De Ridder and Duin, 2002).
Table 5.9 demonstrates the power of SLLEP-LDA as a superior process monitoring
tool for high-dimensional data-sets. PCA-LDA shows a deteriorating performance on
the test set as the number of variables in the model increased. This observation is
likely explained by the fact that the extra variables included in Models 2 to 4
are more nonlinear than the five variables carefully selected for Model 1. However,

Figure 5.11: Output from process monitoring on the test set for both PCA-LDA and
SLLEP-LDA (panels: bearing pressure and motor power; PCA output; SLLEP output)

the classification performance of SLLEP-LDA improves as the number of observation
variables increases because the method does not require linearity of the variables in the
model, thus more information about the process operation is mapped to the feature
space. In addition, since PC scores are mapped to the feature space using a linear
equation, the information of the fault in the PCs becomes masked by the new variables.
Consequently, a linear model is not a useful tool to extract features pertaining to the
overload for nonlinear variables. Moreover, PCA-LDA in Model 1 is only successful
as a process monitoring method in this case study because of the large effort invested
in selecting and transforming the variables to allow PCA to perform well. On the
other hand, SLLEP-LDA does not require as large an investment in the selection of

process variables because highly nonlinear variables do not need to be pre-screened.
SLLEP-LDA has proven to have equivalent or greater classification accuracy than
PCA-LDA in this case study. However, SLLEP-LDA has some disadvantages too.
Although less effort is required to select process variables, more effort is required to
perform a grid search on the tuning parameters K and γ in SLLEP that give the
best classification results in the feature space by LDA. However, if a fast computer
is available during the offline training of SLLEP-LDA to find K and γ, then the
grid search can be performed in a reasonable amount of time. Another disadvantage
to SLLEP-LDA is the online computational complexity. PCA-LDA requires only a
handful of calculations; however, SLLEP-LDA requires the real-time solution to an
unconstrained QP.
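
That online step can be sketched as follows (we follow the standard locally linear
embedding out-of-sample extension, which we assume matches the formulation of
Chapter 4.1.2; K and γ play the roles of the tuning parameters in Table 5.7):

    import numpy as np

    def map_new_observation(x_new, X_train, Y_train, K, gamma):
        # x_new: new scaled observation (m,); X_train: training data (n x m);
        # Y_train: stored low-dimensional embedding of the training data (n x d).
        d2 = np.sum((X_train - x_new) ** 2, axis=1)
        nbrs = np.argsort(d2)[:K]                # K nearest neighbours
        Z = X_train[nbrs] - x_new                # neighbourhood centred on x_new
        G = Z @ Z.T                              # local Gram matrix
        G += gamma * np.trace(G) * np.eye(K)     # regularize the unconstrained QP
        w = np.linalg.solve(G, np.ones(K))       # minimize reconstruction error
        w /= w.sum()                             # reconstruction weights sum to one
        return w @ Y_train[nbrs]                 # weighted sum of embedded neighbours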
In Chapter 4.1.2 we demonstrated a new approach that allows SLLEP-LDA to be
utilized in the industry for process monitoring. Consequently, if MPC is available in
the control software at a specific site, then it is possible to solve the unconstrained
QP in real-time with minimal controller loading and programming. In this case study,
Gibraltar Mine uses a modern DCS that is equipped with MPC. However, in their
version of the commercial DCS product, the MPC consists of an optimizer that solves
an LP to provide set-points to the MPC controller. Because the optimizer cannot be
bypassed in this version, the QP of the MPC controller cannot be directly accessed.
As a result, SLLEP-LDA is only solved with the LP and results in poor mapping of
new data to the feature space.
Because of the limitation of the control software available at Gibraltar Mine, we
were not able to implement the SLLEP-LDA process monitoring tool, and the model-
ing and simulation portion of this case study is conducted using the PCA-LDA process
monitoring method. Despite this, we have successfully demonstrated to a major con-
trol systems manufacturer the benefit of bypassing the optimizer to access the MPC
controller directly in order to use the QP solver. Moreover, we submitted a technology
request with the advanced control engineers at the major control systems manufac-

turer, and they have agreed to incorporate these changes into the software in a future
release. To put it another way, this work has helped to advance one of the indus-
try’s best commercial control products, allowing for some of the most forthcoming
techniques in literature to be implemented in the industry.
As a final note on the process monitoring method developed for the case study
on the SAG mill at Gibraltar Mine, this work provides a new technique that is su-
perior to the best practice. Ko and Shang (2011) demonstrate PCA with Q and T 2
classification on industrial operating data. However, their study does not validate the
process monitoring method on new test data; instead, they performed cross-validation
on the training data. Indeed, we also found with 10-fold cross-validation that PCA can
successfully classify the training data. By way of contrast, the real test is when the
PCA model is tested on new test data where it is likely that the process operating
conditions have changed. We found that careful selection of process variables, classifi-
cation of the PC score space, and scaling of the observation variables are the best way
to ensure PCA generalizes well to new data where process conditions have changed.
As a result, this work presents two process monitoring methods that are designed to
work in the industry given realistic operating conditions. Likewise, we have success-
fully implemented PCA-LDA at Gibraltar Mine in January 2013, and have found that
this tool gives the expected output results for their operating conditions. However,
no overloads have occurred in this first half of 2013 to fully evaluate the performance
of PCA-LDA because Gibraltar Mine has recently added an additional SAG mill to
de-bottleneck the original SAG mill.

5.3 Modeling of the overloaded operating region

To model the overload operating region of the SAG mill, we first considered a first-
principles model of the internal dynamics of the SAG mill at Gibraltar Mine based
on the work of Morrell (1993) and Apelt (2002), and the work of the Julius Kruttschnitt
Mineral Research Centre at the University of Queensland (Napier-Munn et al., 1996;

Valery, 1998). These models are based upon dynamic mass balances for solids, water,
ball charge, and shell liner wear. However, we instead chose an empirical modeling
approach because this approach has the advantage of significantly reducing the time
required for plant personnel to gather detailed information about the design of the
SAG mill. Moreover, many of the model parameters would have to be estimated from
data available at similar plants. Finally, the first-principles model does not account
for the rheological nature of the slurry, flow-back problems of the pulp-lifter discharge
system, slurry pooling, and freewheeling; as a consequence, the models do not provide
an accurate simulation of the onset of mill overloading required for this case study
(Katubilwa and Moys, 2011; Shi and Napier-Munn, 2002; Morrison and Cleary, 2008;
Latchireddi and Morrell, 2003a,b).
The modeling techniques described in Chapter 4.2.1 are used to model the overload
operating region of the SAG mill at Gibraltar Mine. In this case, there are two MVs:
feed ore flow, u1 , and water flow, u2 , and one CV: measure of process shift, y. The
measure of process shift is the output of the process monitoring method discussed in
Chapter 5.2. Since there are two MVs and one CV, we must train two process models
to find the response between each input-output pair. Considering we know that the
feed input will have the strongest effects on the mill behaviour, we first model the
response of y to changes in u1 .
Overload data from February 27th, 2012 were used to train the response between
u1 and y because u2 was not initially changed to correct the overload. Conse-
quently, we can use the data from this region to find the response in y that is directly
caused by changes in u1 and not u2 . Figure 5.12 shows the data sampled at 30 seconds
used to develop the model relation between u1 and y. In these data, we found 29 samples
where the mill is overloaded and u1 is reduced to zero to let the mill grind the ore
charge and recover the process to nominal operation.
Three different models were fitted to the data shown in Figure 5.12 using the MAT-
LAB system identification toolbox (Ljung, 2007a): P1D, P1DZ, and P2D. Here, P1D

Figure 5.12: Model fitting results for the response of the process shift to changes in u1
(panels: feed and water flows; PCA output with the P1D, P1DZ, and P2D model fits)

corresponds to a model with one pole and time delay as given in Equation (4.40).
P1DZ corresponds to a model with one pole, a zero, and time delay as given in Equa-
tion (4.41). Finally, P2D corresponds to a model with two poles and time delay as
given in Equation (4.42). A model with two poles, a zero, and time delay, as given
in Equation (4.43), was not considered because there is not enough data to fit a high
order model as seen with P2D in the figure. The best fitting model is P1D as shown
in the figure, and the model parameters for P1D and the two other candidate models
are shown in Table 5.10 for the data-set labelled Feb 27. In this table, r̄ represents
the average model residual per sample. r̄ is simply the total residuals divided by the
number of samples in the data-set, and it is used instead of the total residuals to pro-

vide a comparison of the residuals across models trained on data-sets with more or
fewer samples. In addition, 'Fit' provides a measure that compares how well the model
predicts the response compared to a line centered on the mean of the response. Fit is
calculated from the expression:
Fit = 100 (1 − ||y − ŷ|| / ||y − mean(y)||),    (5.19)

where ŷ is the array containing the predicted response from the model, and || · ||
corresponds to the L2 norm.
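
In code, Equation (5.19) is a one-liner (a sketch assuming NumPy arrays):

    import numpy as np

    def fit_percent(y, y_hat):
        # 100% is a perfect fit; 0% is no better than predicting mean(y).
        return 100.0 * (1.0 - np.linalg.norm(y - y_hat)
                              / np.linalg.norm(y - np.mean(y)))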

Table 5.10: Modeling steps and fitting results

Data-set  Purpose  Name  Model                                                     Fit (%)  r̄ (×10³)
Feb 27    Train    P1D   Y(s)/U1(s) = 0.638 e^(−116s) / (437s + 1)                 94.8     0.37
Feb 27    Train    P1DZ  Y(s)/U1(s) = 0.836 (−57.6s + 1) e^(−25.7s) / (731s + 1)   91.7     0.94
Feb 27    Train    P2D   Y(s)/U1(s) = 0.265 e^(−17.8s) / ((0.03s + 1)(0.03s + 1))  20.6     85.8
Feb 29    Fit      P1D   Y(s)/U1(s) = 0.638 e^(−116s) / (437s + 1)                 84.3     3.65
Feb 29    Train    n/a   Y(s)/U2(s) = 0.021 e^(−475s) / (3.57s + 1)                0.00     323.5
May 27    Test     P1D   Y(s)/U1(s) = 0.638 e^(−116s) / (437s + 1)                 0.00     109.4

From Table 5.10, we observe that the fit is highest and r̄ is the lowest with P1D. Ac-
cordingly, it is not necessary to perform an F -test on the model residuals to determine
which model has the best fit for the lowest model order as detailed in Appendix E.
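
For reference, the P1D response can be reproduced with a simple zero-order-hold
discretization (a sketch; the parameters are the Feb 27 values from Table 5.10, the
30 second sample time matches the training data, and u and y are deviation variables
about the operating point):

    import numpy as np

    def simulate_p1d(u, K=0.638, tau=437.0, theta=116.0, Ts=30.0):
        # FOPTD model Y(s)/U(s) = K e^(-theta s) / (tau s + 1), with the
        # time delay rounded to a whole number of samples.
        a = np.exp(-Ts / tau)
        d = int(round(theta / Ts))
        y = np.zeros(len(u))
        for k in range(1, len(u)):
            # Assume the input was held at its first value before the record.
            u_delayed = u[k - 1 - d] if k - 1 - d >= 0 else u[0]
            y[k] = a * y[k - 1] + K * (1.0 - a) * u_delayed
        return y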
The next step to model the overload operating region is to model the response
between u2 and y. In this case, data are available for the overload on February 29th,
2012 where both u1 and u2 were reduced to recover from the overload. To find the
response between u2 and y, we must estimate the response of y from u1 first, then

the remainder between the estimated response and the true response is used to fit the
model between u2 and y. The data from February 29th is shown in Figure 5.13. The
second subplot of the figure shows the estimated response calculated by P1D. The
third subplot shows the remainder of the response that is unexplained by P1D.

Figure 5.13: Fitting P1D to a new training data-set in order to fit a model for u2 on
the fit remainder

The model remainder shown in Figure 5.13 was used to identify the FOPTD model
response between u2 and y; however, the figure shows that the model fit is poor. One
explanation for the poor fit between u2 and y is that there is not enough data to

properly identify the model. Alternatively, there is possibly a poor correlation between
u2 and y, or the correlation is very nonlinear. However, there is not enough data in
the overload operating region needed to fit a nonlinear model, and the model fit from
P1D alone satisfactorily explains the response. In fact, Figure 5.13 can be thought of
as a test set for P1D, and Table 5.10 shows that the fit is reasonable at 84.3%. As a
final point, the engineers at Gibraltar Mine were also unable to fit a model between
u2 and motor load for the nominal operating range, and the effects of water on mill
dynamics are poorly understood in the industry and academia.
Table 5.11 summarizes the final model fitting results for u1 and y and for u2 and y.
The table shows that the parameter variances of P1D are only a fraction of a percent
of the magnitudes of the parameters. Although this is desirable, it is also unlikely
that the real parameter variances are so low because P1D was trained on a data-set
of limited size. Moreover, the parameter variances for the response between u2 and y
were negligible. However, the data-set used to train this model was not large enough to
appropriately estimate the model or the parameter variances. As a result, the overall
process model for the overload operating region is P1D, and the model between u2
and y is not included in the overall model.

Table 5.11: Best fit models and their parameter variances

Model                                         Parameter  Unit                Value  Variance
Y(s) = [0.638 e^(−116s) / (437s + 1)] U1(s)   K          Process Shift/STPH  0.638  0.001
                                              τ          seconds             437    1.7
                                              θ          seconds             116    0.03
Y(s) = [0.021 e^(−475s) / (3.57s + 1)] U2(s)  K          Process Shift/GPM   0.021  n/a
                                              τ          seconds             3.57   n/a
                                              θ          seconds             475    n/a

The final step, now that the overall model has been chosen, is to validate the overall
model on a test data-set. Overload data from May 27th, 2012 were used to validate the
overall process model, P1D. There are 290 data points sampled every 30 seconds in

this data-set. This data-set and the estimated model response is shown in Figure 5.14.

Figure 5.14: P1D fitting results on a test data-set to validate the model

Figure 5.14 shows the estimate of the process shift is accurate in certain regions,
and inaccurate in others. This is expected because the process is nonlinear, and P1D
is a linear model. Correspondingly, P1D was trained to estimate the response of y
only when the system is in an overload operating region. When the system is in
the nominal operating region then the model for y is not needed because the nominal
controller models will be used instead. In this figure, however, we see from the measure
of process shift that the system shifts three times from an overloaded to the nominal
operating region. As a result, P1D will have an excellent fit on the shift from overload

to nominal, yet will have a poor fit when the process has recovered to the nominal
operating region. Only when the process finally returns to the overload does P1D once
again show a strong fit for the recovery from overload to nominal. The fitting results
for the test set are also included in Table 5.10. However, these results are ambiguous
because the model fit will only be accurate for changes to the MVs in the overload
region.
The models determined in this chapter are used in the recovery MPC as the process
models for the response in process shift to changes in the feed rate MV. Moreover,
the process model between the process shift measurement and water addition is set
to a zero gain model to eliminate changes in water in the recovery response. Since
a recovery response will involve reducing the feed and maintaining water flow, it is
expected that the response will decrease the density of the charge in the mill sufficiently
to help offset mill overloading.

Expert rules

The final modeling procedure required is to determine the expert rules and a limit in
the magnitude of process shift where the expert rules take control of the MVs. From
the data in Figure 5.14, we see that the operators reduce the feed to zero until the
fault has been corrected, and then the operators return the feed to roughly the same level
prior to their intervention. However, there is no consistent action taken with the water
addition. Based on these actions, the expert rules should automatically reduce the feed
to zero until the recovery controller can finish the process recovery, then the feed will
be returned to a feed rate similar to the level before the expert system intervened. In
addition, since the action to take with water is uncertain from the data, the water will
be held constant because the ratio between feed and water will help reduce the density
and viscosity of the charge during an overload. With simulation and plant trials, these
rules may be adjusted. However, to start, based on our observations from the data of
the operators’ actions during an overload, we recommend using the following rules:

Rule 1 If the magnitude of process shift is greater than 1400, override control of the
feed and water MVs, and reduce the feed rate to zero.

Rule 2 If Rule 1 is active and the magnitude of process shift is less than 1300, increase
feed to 80% of the feed rate before Rule 1 was active and return control to the
override or nominal controller.
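
As a sketch, the two rules amount to the following override logic (the variable names
are hypothetical; the thresholds are those recommended above, and the water flow is
simply held constant while the override is active):

    def expert_override(shift, feed_sp, state):
        # state: dict with 'active' (override engaged) and 'feed_before'
        # (the feed set-point recorded when Rule 1 fired).
        if not state['active'] and shift > 1400:    # Rule 1
            state['active'] = True
            state['feed_before'] = feed_sp          # remember the prior feed
            return 0.0                              # reduce the feed rate to zero
        if state['active'] and shift < 1300:        # Rule 2
            state['active'] = False
            return 0.8 * state['feed_before']       # restore 80% of the prior feed
        return 0.0 if state['active'] else feed_sp  # otherwise hold course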

5.4 Recovery from the SAG mill overload

The last part of the case study is a simulation that studies the behaviour of the
control strategy in the event of an incipient or abrupt disturbance to the process.
Because Gibraltar Mine has recently added a second SAG mill to their process, they
no longer experience overloading with either SAG mill since comminution is
no longer the bottleneck at their facility. Consequently, we were unable to incorporate
the control strategy at Gibraltar Mine, and a simulation was performed instead.
The process models used in the simulation are the linear process models identified
by Spartan Controls during commissioning of the MPC. However, the time constant of
the models was reduced from roughly 10 minutes to approximately one minute. The
dead time was also reduced by a factor of 10 in order to speed up the simulation;
nonetheless, the sample rate is taken to simulate a two minute sampling of the real
process. Moreover, nonlinearity was added to the process outputs to account for the
nonlinear nature of the process when the mill has overloaded. The behaviour of the
process in this simulation is rudimentary and will not be nearly as accurate as the true
response of the SAG mill. However, the models in the simulation do not necessarily
need to be accurate because the purpose of this simulation is to demonstrate the
behaviour of the control strategy. A more accurate test performed at Gibraltar would
only allow us to demonstrate how well we modelled the abnormal operating region.
Nevertheless, this was demonstrated in Chapter 5.3 on a large test data-set.
In each of the three simulations examined in this section, MPC is used as the
nominal controller with two MVs (feed rate and water flow) and two CVs (feed-end
west bearing pressure and motor power A). In Scenario 1, an incipient disturbance
enters the system where no control strategy is commissioned to recover the process from
an overload. Scenario 2 demonstrates the effectiveness of the control strategy to recover
the process from an overload following the same input disturbance given in Scenario 1.
Finally, in Scenario 3, an abrupt disturbance leads to significant overloading that the
expert system mitigates before returning control to the MPC.
The effects of the disturbance in these simulations are most visible from the estimated
process shift output representing the simulated PCA-LDA output. Additionally, the
disturbance also affects the bearing pressure and motor load. Bearing pressure will
rise during an overload because the grinding efficiency within the mill is depressed and
causes ore to accumulate in the mill. Furthermore, motor power will initially increase
during an overload; however, when the overload becomes severe, the motor power will
decrease indicating the presence of freewheeling or slurry pooling.

Scenario 1: Incipient disturbance without recovery control.

Scenario 1 demonstrates the behaviour of the controller when no control strategy is
commissioned to recover the process from an overload. In this case, a positive feedback
develops: the overload causes ore to accumulate in the mill, decreasing the grinding
efficiency of the mill and worsening the overload. The results
of this simulation are shown in Figure 5.15.
In Figure 5.15 we see that the incipient disturbance that enters at sample 15 causes
the process shift to increase, indicating the process is becoming closer to an overload.
However, when the process is classified as overloaded around sample 44, the controller
is reacting slowly because it is only responding to the increase in bearing pressure from
the target SP of 950 psi. Even when the disturbance stops increasing at sample 55,
the positive feedback mechanism of the overload causes the process shift to gradually
increase. As the overload worsens, the bearing pressure increases and the motor power
decreases. However, the controller is tuned to decrease feed when the bearing pressure

Figure 5.15: Scenario 1: control strategy has not been incorporated to mitigate the
incipient disturbance (panels: process shift and disturbance; FEW bearing pressure
and motor power A; feed rate and water flow)

is high, and increase the feed when motor power is low. As a result, the nonlinearity of
the process results in an incorrect response between motor power and feed rate. This
conflicting response within the MPC causes the changes to the feed to be slow and
smaller in magnitude than required to prevent the overload from worsening. Only at
sample 134 do the operators recognize the process is overloaded and manually make
corrective actions before the bearing pressure exceeds its upper constraint of 1200
psi. However, the operators’ response arrives 90 samples after the process monitoring

method first detected that the mill was overloaded.

Scenario 2: Incipient disturbance with recovery control.

Scenario 2 demonstrates the same disturbance that enters the system as in Scenario 1;
however, Scenario 2 treats the measure of process shift as a constraint variable in the
MPC with a constraint limit of 1195 as determined from PCA-LDA. The constraint
variable allows the MPC to switch models from controlling the bearing pressure and
motor power to controlling the measure of process shift as a priority. If the constraint
variable exceeds the constraint limit, or if the MPC predicts the constraint limit will
be exceeded within a 120 sample prediction horizon, then the MPC will act to reduce
the feed to reduce the process shift to the constraint limit. Figure 5.16 shows the
results for this simulation.
In Figure 5.16, we observe that the overload was detected by the process monitoring
method around sample 44—the same as Scenario 1. This tells us that the MPC did not
predict the constraint variable would exceed the limit within its prediction horizon.
This result makes sense, because the prediction of the constraint variable is determined
from the models developed in Chapter 5.3 between the measure of process shift and
the feed and water MVs. The overall model response assumed water flow had zero
gain on the process shift, and feed had a positive gain. In other words, when the feed
decreases in the simulation, the MPC predicts that the measure of process shift should
decrease. However, the MPC does not observe the unmeasured disturbance entering
the system and cannot compensate until the constraint limit on the constraint variable
is violated.
Once the constraint limit for the constraint variable is violated, we see that the
MPC prevents the overload from worsening by reducing the feed rate. In this
simulation, the corrective action of the feed flow was faster than in Scenario 1,
and the overload stopped worsening by sample 70. Moreover, the MPC was able to
keep the process shift at the constraint limit of the constraint variable until the dis-
turbance subsided. We expect the process to have the optimal throughput at the

Figure 5.16: Scenario 2: process shift constraint variable is exceeded, causing the MPC
to correct for the incipient disturbance before severe overloading results

constraint variable limit because the process is maximally loaded without being over-
loaded. Therefore, the MPC maintains optimal operation of the process in the presence
of the large disturbance that entered the system.

Scenario 3: Abrupt disturbance with recovery control.

Scenario 3 demonstrates the response of the control strategy in the presence of a large
and abrupt disturbance that rapidly overloads the SAG mill. In this case, the MPC
is unable to compensate for the unmeasured disturbance alone. Figure 5.17 shows the
results for this simulation.
In Figure 5.17, we see that by sample 20, the process shift has exceeded the limit

Figure 5.17: Scenario 3: abrupt disturbance leads to significant overloading that the
expert system mitigates before returning control to the MPC

that was set at a process shift value of 1300 in this simulation where the expert system
is designed to act. The rule in the expert system is to decrease the feed to zero when
the process shift exceeds 1300. The rule of 1300 is for simulation purposes only, and a
different value may be used on-site. A limit of 1300 was used instead of 1400 because
the recovery controller was aggressive enough to make it difficult in the simulation to
bring the process shift above 1400.
The expert system will maintain control of the feed and keep the water constant
until the process shift returns to below 1230 in this simulation. When the process shift
is below 1230, the next rule in the expert system is to increase the feed to 80% of the
original feed rate prior to returning control of the MVs to the MPC.

Once the expert system activates, we see that the overload is completely corrected
after approximately 30 samples. The process recovery is smooth; however, there is
a slight overshoot of the process shift around sample 100. On the other hand, the
overshoot is small, and it is possible to correct it by adjusting the expert rules slightly.
For example, the expert system could return control of the MVs to the MPC earlier,
or the feed rate could be recovered closer to 100% of the original feed rate prior to
returning control of the MVs.

Key performance measures

We have selected three key performance indices to measure potential benefits of the
controller performance recovery algorithm. These three measures are: the time to
detect an overload, the time until the overload is fully corrected, and the amount of
extra ore, and its value, that Gibraltar Mine is able to process. We will look at the
benefits of two potential cases: the difference in responses between Scenario 1 and
Scenario 2, and the ability to increase the throughput of the SAG mill since the risks
of an overload are reduced.
The benefits of the control strategy for an event such as Scenario 3 are not consid-
ered because of the uncertainty of the costs of this event. Without a control strategy
in this scenario, the abrupt disturbance would cause the bearing pressure limit to be
exceeded and the mill would have been immediately stopped. Likewise, the damages
to the mill would need to be assessed, and it would be difficult to estimate potential
equipment damages and downtime of the mill.
Table 5.12 summarizes the three key performance measures for the difference in
response between Scenario 1 and Scenario 2. Dollar values are given in U.S. dollars
(USD) because these metals are usually traded in USD in the global market. The
assumptions for these estimates are provided in Table 5.13.
To determine the amount of additional ore processed between Scenario 1 and Sce-
nario 2, we assumed that the operators reduced the feed to zero for 30 samples, or one

Table 5.12: Key performance indices to measure the potential savings between Sce-
nario 1 and Scenario 2

Key performance measure Value Unit


Reduction in time to detect overload 3 hours
Reduction in time to fully recover from overload 1.5 hours
Additional ore processed 1816 tons
Profit from additional ore processed 25 257 USD

hour, after the overload was detected on sample 134 in Figure 5.15; reducing the feed
to zero for one hour was observed in historical data for a similar overload on February
29th, 2012. Then, we compare the total throughput of the mill over a period of 164
samples, or 5.4 hours, for both scenarios. The profit from the additional ore processed
is calculated using the equation:

Profit = quantity of ore × copper grade × copper price / 2
       + quantity of ore × molybdenum grade × molybdenum price / 2
       − quantity of ore × cost per ton,    (5.20)

where the grades, prices, and costs are given in Table 5.13, and the minerals in the
concentrate are assumed to sell for half their market value.
Table 5.13: Assumptions for calculating the key performance measures

Assumptions Value Unit


Copper grade of ore (1)             0.32   %
Molybdenum grade of ore (1)         0.01   %
2011 average copper price (2)       8811   USD/ton
2011 average molybdenum price (2)   15.49  USD/lb
Operating days in a year            300    days/year
Cost per ton of ore processed (3)   3.48   USD/ton

(1) Provided by Gibraltar Mine.
(2) London Metal Exchange.
(3) An estimate from Wills and Napier-Munn (2006) of operating expenses including
    labour, energy, and water usage for a typical copper concentrating facility.
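
As a worked sketch of Equation (5.20) under these assumptions (the molybdenum
unit conversion of 2000 lb per short ton is our assumption, and such conversion
choices account for the modest difference from the figure quoted in Table 5.12):

    ore_tons = 1816              # additional ore processed (tons)
    cu_grade = 0.0032            # 0.32% copper
    mo_grade = 0.0001            # 0.01% molybdenum
    cu_price = 8811.0            # USD/ton
    mo_price = 15.49 * 2000.0    # USD/lb converted to USD/ton
    cost_per_ton = 3.48          # USD/ton of ore processed

    # Concentrate assumed to sell for half the market value, per Equation (5.20).
    profit = (ore_tons * cu_grade * cu_price / 2.0
              + ore_tons * mo_grade * mo_price / 2.0
              - ore_tons * cost_per_ton)
    print(round(profit))         # on the order of the value in Table 5.12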

The second return-on-investment opportunity for Gibraltar Mine is the ability to
slightly increase the throughput of the SAG mill. As we mentioned at the beginning
of this chapter, Gibraltar Mine operates their process conservatively because of their
concern that the SAG mill will overload. However, based upon evidence from historical
data, we estimate that they could increase their average throughput by 10 STPH since
they would have the capacity to detect the early onset of an overload and automatically
recover from it. This leads to an estimated increase in profit by one million USD per
year based on an assumed 300 days of operation per year. Moreover, we believe
a 10 STPH throughput increase is a conservative estimate as it represents a 0.5%
increase in the average throughput of approximately 2000 STPH. The assumptions and
calculations of this estimate are also given in Table 5.13, and an additional assumption
is that the SAG mill is the bottleneck of their facility.

Chapter 6

Conclusion

Effective management of abnormal situations in a chemical or mineral processing facil-


ity is vital to maximizing process economy and safety. Traditionally, statistical process
monitoring has been utilized to alert operators when a process has shifted to an ab-
normal operating region. The operators then manually make the corrective control
action to recover the process to the nominal operating range. However, this thesis
bridges the gap between statistical process monitoring and the DCS. Consequently,
the DCS is provided with enough information about the process shift to the abnormal
operating range and the necessary control actions that must be made to recover the
process to nominal operation. It is in these specific areas that this work has made the
following contributions:

1. A general control strategy has been developed in Chapter 4 that proposes two
statistical process monitoring techniques that may be used for either linear or
nonlinear processes to detect a process shift into an abnormal operating sit-
uation. In addition, the control strategy includes two linear control techniques
designed to recover a process from the nonlinear abnormal operating region to
the nominal operating region.

2. A technique to allow SLLEP to be performed in real-time in an industrial DCS


has been developed and described in Chapter 4.1.2. Since most modern industrial
DCS include MPC, it may be used as a tool to solve a QP in real-time with

minimal programming. This allows us to map new data to the feature space for
classification in the SLLEP process monitoring algorithm.

3. The control strategy was demonstrated on a SAG mill at Gibraltar Mine that
detects a mill overload and uses the DCS to recover the process without operator
intervention. Moreover, the causes and mechanics of a SAG mill overload have
been thoroughly studied and comprehensively discussed in Chapter 2.3 for the
first time in literature to the best of our knowledge.

6.1 Summary of contributions

The general control strategy proposes using PCA or SLLEP as a feature extractor, and then Q, T2, LDA, or QDA as a classifier, in process monitoring to identify abnormal operating situations. Specifically, SLLEP is proposed as a new feature extractor for processes that behave nonlinearly. The output of the classifier is a continuous variable that provides an estimate of the magnitude of the process shift to an abnormal operating region. This measure of process shift can be used in two ways. The first is as a CV in an override MPC controller that overrides the MVs of the existing nominal controller. The second, when the nominal controller is itself an MPC, is to use the measure of process shift as a constraint variable in that MPC, with the constraint-variable models identified for the nonlinear abnormal operating region. In addition, if the measure of process shift indicates a severe overload, an expert system based on an operator's best recovery control actions is proposed to return the process from a severely nonlinear abnormal operating region to the less-severe abnormal operating region for which the linear recovery controller is designed.

We have developed a method that uses MPC to solve the QP in the SLLEP algorithm in real-time in a DCS. SLLEP is a new and advanced process monitoring technique that this work has shown to be effective for nonlinear processes, or for processes with many variables to be monitored. Through this work, we have demonstrated to one of the largest control software developers in the industry the value of SLLEP and the necessity of direct access to the QP of the MPC controller. As a result, the developers have agreed to incorporate a manual override of the LP optimizer in their DCS so that advanced users can gain direct access to the QP. In other words, we have helped advance the industry enough to allow the commissioning of an advanced and emerging technique such as SLLEP.

The control strategy in this work was demonstrated on a SAG mill at Gibraltar Mine. We successfully demonstrated that the process monitoring method is robust to changes in process operating conditions: it accurately detected the early onset of a mill overload even after significant changes to the process over the year in which this study was carried out. In addition, the PCA-LDA process monitoring technique has been installed at Gibraltar Mine and is providing the operators with information on the status of their SAG mill. Although Gibraltar Mine was unable to commission the full control strategy developed in this thesis due to staff limitations, we have demonstrated that the control strategy has the potential for significant savings in the operation of their SAG mill. The control strategy is capable of optimally recovering the process from an overload before it becomes severely overloaded; as a result, process stability and safety are significantly increased. Likewise, the control strategy allows the process to be operated closer to the maximum throughput of the SAG mill without the increased risk of overloading the mill.

6.2 Future research

The following is a list of potential research subjects based on the work described in
this thesis:

1. The improvement of the modeling of the abnormal operating region in cases where data availability is limited, or where the excitation of the process inputs is so large that the response in the data is nonlinear.

2. The improvement of locally linear embedding to map new observations to the feature space for classification. Specifically, the mapping errors need to be reduced, especially for regions of data sparsity in the training set.

3. The training of a classifier using fuzzy class data rather than binary class data.

4. The application of this control strategy to other industries.

Bibliography

Apelt, T. A. Inferential measurement models for semi-autogenous grinding mills. Ph.D. Thesis, University of Sydney, Australia, 2002.

Bailey, S. J. From desktop to plant floor, a CRT is the control operator's window on the process. Control Engineering, 31(6):86–90, 1984.

Basseville, M. Detecting changes in signals and systems–A survey. Automatica, 24(3):309–326, 1988.

Bequette, B. W. Process Dynamics: Modeling, Analysis, and Simulation. Prentice-Hall, 1998.

Bianchi, F., De Battista, H., and Mantz, R. Wind Turbine Control Systems: Principles, Modelling and Gain-scheduling Design. Springer, 2006.

Bly, M. Deepwater Horizon Accident Investigation Report. DIANE Publishing, 2011.

Boser, B. E., Guyon, I. M., and Vapnik, V. N. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.

Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.

Camacho, E. and Bordons, C. Model Predictive Control. Springer, 2nd edition, 2007.

Chen, W. H. Disturbance observer based control for nonlinear systems. Mechatronics, IEEE/ASME Transactions on, 9(4):706–710, 2004.

Chen, X. S., Li, Q., and Fei, S. Constrained model predictive control in ball mill grinding process. Powder Technology, 186:31–39, 2008.

Chen, X. S., Yang, J., Li, S. H., and Li, Q. Disturbance observer based multi-variable control of ball mill grinding circuits. Journal of Process Control, 19:1205–1213, 2009.

Cheng, H., Nikus, M., and Jämsä-Jounela, S.-L. Evaluation of PCA methods with improved fault isolation capabilities on a paper machine simulator. Chemometrics and Intelligent Laboratory Systems, 92(2):186–199, 2008.

Chisci, L., Falugi, P., and Zappa, G. Gain-scheduling MPC of nonlinear systems. International Journal of Robust and Nonlinear Control, 13(3-4):295–308, 2003.

Choi, S. W., Lee, C., Lee, J.-M., Park, J. H., and Lee, I.-B. Fault detection and identification of nonlinear processes based on kernel PCA. Chemometrics and Intelligent Laboratory Systems, 75(1):55–67, 2005.

Chou, H. H. Fault diagnosis of a heat exchanger system using unknown input observers. M.A.Sc. Thesis, University of Toronto, Canada, 2000.

Cleary, P. W., Sinnott, M., and Morrison, R. Prediction of slurry transport in SAG mills using SPH fluid flow in a dynamic DEM based porous media. Minerals Engineering, 19:1517–1527, 2006.

Coetzee, L. C. Robust nonlinear model predictive control of a closed run-of-mine ore milling circuit. Ph.D. Thesis, University of Pretoria, South Africa, 2009.

Craig, I. K. and MacLeod, I. M. Specification framework for robust control of a run-of-mine ore milling circuit. Control Engineering Practice, 3(5):621–630, 1995.

De Ridder, D. and Duin, R. P. W. Locally linear embedding for classification. Pattern Recognition Group, Dept. of Imaging Science & Technology, Delft University of Technology, Delft, The Netherlands, Tech. Rep. PH-2002-01, 2002.

Dean, D. R. Memo to Jim Dunstan, Rosebud Joint Venture, Nevada, January 1999.

Doyle, J. C., Francis, B. A., and Tannenbaum, A. Feedback control theory, volume 1. Macmillan Publishing Company, New York, 1992.

Draper, N. R., Smith, H., and Pownell, E. Applied Regression Analysis. John Wiley & Sons, 3rd edition, 1998.

Fisher, R. A. The use of multiple measurements in taxonomic problems. Annals of Human Genetics, 7(2):179–188, 1936.

Frank, P. M., Ding, S. X., and Marcu, T. Model-based fault diagnosis in technical processes. Transactions of the Institute of Measurement and Control, 22(1):57–101, 2000.

Friedberg, S. H., Insel, A. J., and Spence, L. E. Linear Algebra. Pearson, 4th edition, 2002.

Galán, O., Barton, G. W., and Romagnoli, J. A. Robust control of a SAG mill. Powder Technology, 124(3):264–271, 2002.

García, C. E., Prett, D. M., and Morari, M. Model predictive control: Theory and practice–A survey. Automatica, 25(3):335–348, 1989.

García-Álvarez, D., Fuente, M. J., Vega, P., and Sainz, G. Fault detection and diagnosis using multivariate statistical techniques in a wastewater treatment plant. In 7th IFAC International Symposium on Advanced Control of Chemical Processes, pages 952–957, 2009.

García-Álvarez, D., Fuente, M., and Sainz, G. Fault detection and isolation in transient states using principal component analysis. Journal of Process Control, 22(3):551–563, 2012.

Garnier, H., Mensler, M., and Richard, A. Continuous-time model identification from sampled data: Implementation issues and performance evaluation. International Journal of Control, 76(13):1337–1357, 2003.

Gill, D. A., Picou, J. S., and Ritchie, L. A. The Exxon Valdez and BP oil spills: A comparison of initial social and psychological impacts. American Behavioral Scientist, 56(1):3–23, 2011.

Goodwin, G. C. A brief overview of nonlinear control. Centre for Integrated Dynamics and Control, Department of Electrical and Computer Engineering, The University of Newcastle, Australia, 2002.

Gupta, A. and Yan, D. Mineral Processing Design and Operations. Elsevier, Amsterdam, The Netherlands, 2006.

Hespanha, J. P. and Morse, A. S. Switching between stabilizing controllers. Automatica, 38(11):1905–1917, 2002.

Hespanha, J. P., Liberzon, D., and Morse, A. S. Hysteresis-based switching algorithms for supervisory control of uncertain systems. Automatica, 39(2):263–272, 2003.

Himmelblau, D. M. Fault detection and diagnosis in chemical and petrochemical processes. Elsevier Scientific Publishing Company, 1978.

Horn, R. A. and Johnson, C. R. Matrix Analysis. Cambridge University Press, 1990.

Isermann, R. Model-based fault-detection and diagnosis–Status and applications. Annual Reviews in Control, 29(1):71–85, 2005.

Jackson, E. J. and Mudholkar, G. S. Control procedures for residuals associated with principal component analysis. Technometrics, 21(3):341–349, 1979.

Jackson, J. E. A user's guide to principal components, volume 244. Wiley-Interscience, 1991.

Johnson, R. A. and Wichern, D. W. Applied Multivariate Statistical Analysis. Pearson, 6th edition, 2007.

Kalman, R. E. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82:35–45, 1960.

Katubilwa, F. M. and Moys, M. H. Effects of filling degree and viscosity of slurry on mill load orientation. Minerals Engineering, 24:1502–1512, 2011.

Kim, B. S., Calise, A. J., and Kam, M. Nonlinear flight control using neural networks and feedback linearization. Journal of Guidance, Control, and Dynamics, 20(1):26–33, 1997.

Ko, Y.-D. and Shang, H. SAG mill system diagnosis using multivariate process variable analysis. The Canadian Journal of Chemical Engineering, 89(6):1492–1501, 2011.

Koivo, H. N. Artificial neural networks in fault diagnosis and control. Control Engineering Practice, 2(1):89–101, 1994.

Kosanovich, K. A., Dahl, K. S., and Piovoso, M. J. Improved process understanding using multiway principal component analysis. Industrial & Engineering Chemistry Research, 35(1):138–146, 1996.

Kourti, T., Lee, J., and MacGregor, J. F. Experiences with industrial applications of projection methods for multivariate statistical process control. Computers & Chemical Engineering, 20:S745–S750, 1996.

Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37(2):233–243, 1991.

Krohling, R. A. and Rey, J. P. Design of optimal disturbance rejection PID controllers using genetic algorithms. Evolutionary Computation, IEEE Transactions on, 5(1):78–82, 2001.

Ku, W., Storer, R. H., and Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 30(1):179–196, 1995.

Latchireddi, S. and Morrell, S. Slurry flow in mills: Grate-only discharge mechanism (Part-1). Minerals Engineering, 16:625–633, 2003a.

Latchireddi, S. and Morrell, S. Slurry flow in mills: Grate-pulp lifter discharge systems (Part-2). Minerals Engineering, 16:635–642, 2003b.

Lee, J.-M., Yoo, C., Choi, S. W., Vanrolleghem, P. A., and Lee, I.-B. Nonlinear process monitoring using kernel principal component analysis. Chemical Engineering Science, 59(1):223–234, January 2004.

Leith, D. J. and Leithead, W. E. Survey of gain-scheduling analysis and design. International Journal of Control, 73(11):1001–1025, 2000.

Li, B. and Zhang, Y. Supervised locally linear embedding projection (SLLEP) for machinery fault diagnosis. Mechanical Systems and Signal Processing, 25(8):3125–3134, November 2011.

Liu, C. S. and Peng, H. Disturbance observer based tracking control. Journal of Dynamic Systems and Control Division, 122:332–335, 2000.

Liu, Y., Weisberg, R. H., Hu, C., and Zheng, L. Tracking the Deepwater Horizon oil spill: A modeling perspective. 92(6):45, 2011.

Ljung, L. Estimating linear time-invariant models of nonlinear time-varying systems. European Journal of Control, 7(2-3):203–219, 2001.

Ljung, L. The System Identification Toolbox: The Manual. The MathWorks Inc., Natick, MA, USA, 7th edition, 2007a.

Ljung, L. and Wills, A. Issues in sampling and estimating continuous-time models with stochastic disturbances. In IFAC World Congress, Seoul, Korea, July 2008.

Ljung, L. System Identification. Prentice-Hall, 2nd edition, 2007b.

Ljung, L. Experiments with identification of continuous time models. In Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, pages 90–95, 2009.

MacGregor, J. F. and Kourti, T. Statistical process control of multivariate processes. Control Engineering Practice, 3(3):403–414, 1995.

Mahadevan, S. and Shah, S. L. Fault detection and diagnosis in process data using one-class support vector machines. Journal of Process Control, 19(10):1627–1639, 2009.

Manohar, C. S. and Roy, D. Monte Carlo filters for identification of nonlinear structural dynamical systems. Sādhanā, 31(4):399–427, 2006.

Mariño, R. Nonlinear Control Design: Geometric, Adaptive, & Robust. Prentice Hall, New York, 1995.

Maybeck, P. S. Stochastic Models, Estimation, and Control, volume 1. Academic Press, New York, 1979.

McLachlan, G. J. Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, New Jersey, 2nd edition, 2004.

Mhaskar, P., Gani, A., and Christofides, P. D. Fault-tolerant control of nonlinear processes: Performance-based reconfiguration and robustness. International Journal of Robust and Nonlinear Control, 16(3):91–111, 2005.

Mokken, A. H., Blendulf, G. K. I., and Young, G. J. C. A study of the arrangements for pulp discharge on pebble mills, and their influence on mill performance. Journal of the South African IMM, pages 257–289, 1975.

Morrell, S. The prediction of power draw in wet tumbling mills. Ph.D. Thesis, University of Queensland, Australia, 1993.

Morrison, R. D. and Cleary, P. W. Towards a virtual comminution machine. Minerals Engineering, 21:770–781, 2008.

Morse, A. S. Supervisory control of families of linear set point controllers. In Proceedings of the 32nd Conference on Decision and Control, Texas, 1993.

Moys, M. H. Slurry rheology–The key to a further advance in grinding mill control. Proceedings: Advances in Autogenous and Semi-autogenous Grinding Technology, 1989.

Moys, M. H., Van Nierop, M. A., and Smit, I. Progress in measuring and modelling load behaviour in pilot and industrial mills. Minerals Engineering, 9(12):1201–1214, 1996.

Najim, K., Hodouin, D., and Desbiens, A. Adaptive control: State of the art and an application to a grinding process. Powder Technology, 82(1):59–68, 1995.

Napier-Munn, T. J., Morrell, S., Morrison, R. D., and Kojovic, T. Mineral Comminution Circuits: Their Operation and Optimization. Julius Kruttschnitt Mineral Research Centre, Australia, 1996.

Neogi, D. and Schlags, C. E. Multivariate statistical analysis of an emulsion batch process. Industrial & Engineering Chemistry Research, 37(10):3971–3979, 1998.

Nijmeijer, H. and Van der Schaft, A. Nonlinear dynamical control systems. Springer, New York, 1990.

Nimmo, I. Adequately address abnormal operations. Chemical Engineering Progress, 91(9):36–45, 1995.

Ohran, B. J., de la Peña, D. M., Christofides, P. D., and Davis, J. F. Enhancing data-based fault isolation through nonlinear control: Application to a polyethylene reactor. In American Control Conference, 2008, pages 3299–3306, 2008.

Ohran, B. J., Liu, J., de la Peña, D. M., Christofides, P. D., and Davis, J. F. Data-based fault detection and isolation using feedback control: Output feedback and optimality. Chemical Engineering Science, 64(10):2370–2383, 2009.

Packard, A. and Doyle, J. The complex structured singular value. Automatica, 29:71–109, January 1993.

Patton, R. J. Fault-tolerant control systems: The 1997 situation. In IFAC Symposium on Fault Detection Supervision and Safety for Technical Processes, volume 3, pages 1033–1054, 1997.

Patton, R. J., Chen, J., and Siew, T. M. Fault diagnosis in nonlinear dynamic systems via neural networks. In International Conference on Control '94, volume 2, pages 1346–1351, 1994.

Primbs, J. A. and Nevistic, V. Optimality of Nonlinear Design Techniques: A Converse HJB Approach. Technical report CIT-CDS 96022, California Institute of Technology, Pasadena, CA 91125, 1996.

Quine, B. M. A derivative-free implementation of the extended Kalman filter. Automatica, 42(11):1927–1934, 2006.

Ramesh, T. S., Shum, S. K., and Davis, J. F. A structured framework for efficient problem solving in diagnostic expert systems. Computers & Chemical Engineering, 12(9):891–902, 1988.

Rao, R. C. The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society, 10(2):159–203, 1948.

Roweis, S. T. and Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

Rugh, W. J. and Shamma, J. S. Research on gain scheduling. Automatica, 36(10):1401–1425, 2000.

Russell, E. L., Chiang, L. H., and Braatz, R. D. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 51(1):81–93, 2000.

Saha, N. and Roy, D. Two-stage extended Kalman filters with derivative-free local linearizations. Journal of Engineering Mechanics, 137(8):537–546, 2011.

Saul, L. K. and Roweis, S. T. An introduction to locally linear embedding. Unpublished. Available at: http://www.cs.nyu.edu/~roweis/lle/publications.html, 2000.

Saul, L. K. and Roweis, S. T. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. The Journal of Machine Learning Research, 4:119–155, 2003.

Schölkopf, B., Smola, A., and Müller, K. R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.

Serpas, M., Chu, Y., and Hahn, J. Fault detection approach for systems involving soft sensors. Journal of Loss Prevention in the Process Industries, July 2012.

Shi, F. N. and Napier-Munn, T. J. Effects of slurry rheology on industrial grinding performance. International Journal of Mineral Processing, 65:125–140, 2002.

Skogestad, S. and Postlethwaite, I. Multivariable Feedback Control: Analysis and Design. John Wiley & Sons, 2nd edition, 2005.

Tong, S. and Koller, D. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2:45–66, 2002.

Valery, W. A model for dynamic and steady-state simulation of autogenous and semi-autogenous mills. Ph.D. Thesis, University of Queensland, Australia, 1998.

van Drunick, W. I. and Penny, B. Expert mill control at AngloGold Ashanti. 105(7):497, 2005.

Venkatasubramanian, V. and Chan, K. A neural network methodology for process fault diagnosis. AIChE Journal, 35(12):1993–2002, 1989.

Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Computers & Chemical Engineering, 27(3):293–311, 2003a.

Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies. Computers & Chemical Engineering, 27(3):313–326, 2003b.

Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., and Yin, K. A review of process fault detection and diagnosis: Part III: Process history based methods. Computers & Chemical Engineering, 27(3):327–346, 2003c.

Wan, E. A. and Van Der Merwe, R. The unscented Kalman filter for nonlinear estimation. In Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000, pages 153–158, 2000.

Wang, J. Locally linear embedding. In Geometric Structure of High-Dimensional Data and Dimensionality Reduction, pages 203–220. Springer Berlin Heidelberg, 2011.

Wei, D. and Craig, I. K. Grinding mill circuits–A survey of control and economic concerns. International Journal of Mineral Processing, 90(1):56–66, 2009.

Wills, B. A. and Napier-Munn, T. J. Mineral Processing Technology. Elsevier, Oxford, 7th edition, 2006.

Yang, J., Li, S., Chen, X. S., and Li, Q. Disturbance rejection of ball mill grinding circuits using DOB and MPC. Powder Technology, 198(2):219–228, 2010.

Yoon, S. and MacGregor, J. F. Fault diagnosis with multivariate statistical models, Part I: Using steady state fault signatures. Journal of Process Control, 11(4):387–400, 2001.

Zhang, Y. and Jiang, J. Bibliographical review on reconfigurable fault-tolerant control systems. Annual Reviews in Control, 32(2):229–252, 2008.

Zhou, Y., Hahn, J., and Mannan, M. S. Fault detection and classification in chemical processes based on neural networks with feature extraction. ISA Transactions, 42(4):651–664, 2003.

Zumoffen, D. and Basualdo, M. From large chemical plant data to fault diagnosis integrated to decentralized fault-tolerant control: Pulp mill process application. Industrial & Engineering Chemistry Research, 47(4):1201–1220, 2008.

Appendix A

Selection of Process Variables

This section acts as a guideline to help select process variables to be used in PCA. To select the most relevant process variables as inputs to a PC model, we need an understanding of the process from both a physics and an operating point of view. With a basic understanding of the process, we can strategically select process variables, or transformations of process variables, to make the PC model robust to extraneous information, noise, or drifting. Moreover, we can replace highly nonlinear process variables with transformations that behave more linearly. As a result, it is possible to use PCA as a statistical process monitoring technique for some nonlinear or time-varying processes. The disadvantage of a careful selection of process variables is the much larger time requirement to build a PC model when compared to the traditional method of building the model from all of the main process inputs and outputs.

From the physics perspective, we need to have a fundamental understanding of the dynamics inside the process unit. For example, it is important to know how mass and energy flow from the inputs to the outputs of the process boundary. Furthermore, we need to know how the accumulation of mass or energy in the process unit affects the outputs or the economic objective. It is also important to understand how the physics of the process change when the process shifts from a nominal to an abnormal operating region. By gaining a fundamental understanding of the process, we may be able to transform certain process variables, based on physical intuition, in such a way that the transformation more clearly indicates the process has shifted. Some examples of variable transformations, based on a physical understanding of the process, that can be used in PCA to detect a process shift are:

• transform bearing pressures to reflect the load distribution in the process unit (a hypothetical sketch of such a transformation follows this list).

• design a mass or energy balance to determine if mass or energy is accumulating inside a process.

• develop a model that compares the expected throughput or economics of the process to measured values.

• develop a model that compares the expected acoustic, light, or vibration intensity output of a process to measured values.
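As a hypothetical illustration of the first item in the list above, a load-distribution transformation could combine the feed-end and discharge-end bearing pressures into a single fraction; the variable names are invented for the example.

```python
def load_distribution(p_feed_end, p_discharge_end):
    """Hypothetical transformation: fraction of the bearing load at the feed end."""
    return p_feed_end / (p_feed_end + p_discharge_end)
```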

From the operational perspective, a lot of insight into the nominal and abnormal behaviour of the process is available from the operators and the process data. It is important to have a discussion with the operators to learn how they identify nominal and abnormal process operation. Specifically, we need to know which key variables they monitor and how these variables change following a process shift. Almost certainly, the variables the operators monitor to detect a process shift will also be included in some form in the PC model.
Operators can also provide valuable insight on which sensors are poorly calibrated or drift. Once such time-varying variables are identified, they can either be excluded from the PC model, if they are not critical to it, or transformed into the difference between the raw measurement and a moving average. However, as few variables as possible should be transformed in this way, since taking the difference from a moving average may desensitize the process monitoring method if the process remains stable near the undesired operating region for a long time.
Operating data can also provide insight into the selection of key process variables. For example, a correlation analysis, such as a correlation plot, between process variables may help to eliminate variables that provide redundant information during a process shift. In addition, plotting time series trends or frequency spectra of various process variables together may show which variables best indicate that a process shift has occurred. A time series plot may also reveal whether particular measurements are significantly corrupted with noise and need to be filtered. Moreover, a time series plot may reveal that the rate of change of a variable, when compared to other variables, offers another indication of the process shift. To incorporate the rate of change of process variables into a PC model, the derivative of each variable needs to be calculated and subsequently filtered.

A.1 Transformation of sound level

Figure A.1 shows sound level data sampled every two minutes at Gibraltar Mine from February 25th, 2012 to February 29th, 2012. Data points where the SAG mill was not operational have been removed from this data-set.

Figure A.1: Difference between measured sound level and its 0.4 day moving average (plot of sound level, no units, versus sample number; legend: measured sound level, moving average of sound level, overload region).

The measured sound level is filtered in the DCS at Gibraltar Mine with a first-order filter with a time constant of 40 seconds. The moving average of the sound level is calculated with a first-order filter with a time constant of 0.4 days. A moving average based on a first-order filter is desirable because it is computationally easy to implement and compute in real-time in a DCS. The deviation of the measured sound level from this estimated moving average is used to replace the measured sound level because it is more robust to changes in the mean of the sound level caused by the operation of the SAG mill and auxiliary equipment.
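A minimal sketch of these calculations is given below, assuming the two-minute sample interval of Figure A.1; the function and file names are invented for the example and do not reflect the Gibraltar DCS configuration.

```python
import numpy as np

def first_order_filter(x, tau, dt):
    """Discrete first-order (exponential) filter with time constant tau.

    x: array of raw samples; tau and dt must share the same time units.
    """
    alpha = dt / (tau + dt)  # filter gain per update
    y = np.empty(len(x))
    y[0] = x[0]              # initialize at the first measurement
    for k in range(1, len(x)):
        y[k] = y[k - 1] + alpha * (x[k] - y[k - 1])
    return y

dt = 2.0                                                       # sample interval, minutes
sound = np.loadtxt("sound_level.csv")                          # hypothetical data file
filtered = first_order_filter(sound, 40.0 / 60.0, dt)          # 40 s filter
moving_avg = first_order_filter(sound, 0.4 * 24.0 * 60.0, dt)  # 0.4 day moving average
deviation = filtered - moving_avg                              # replaces the raw sound level
```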
The red lines in the figure mark the region on February 29th, 2012 in which the engineers officially recognized the process as being overloaded. Within this region, the sound level is less than its moving average. This is expected because the energy of the falling ore in the mill is either reduced or dampened due to the presence of slurry pooling or a reduced distance over which the ore can cataract.

Appendix B

Locally Linear Embedding

B.1 Minimization of reconstruction error

The reconstruction error in Equation (4.12) is minimized in three steps:

1. Evaluate the inner products between each of the K neighbors of x_i to find the neighborhood correlation matrix C. Let the K-nearest neighbors of x_i be represented by X_η, where:

\[ \boldsymbol{X}_\eta = [\boldsymbol{x}_{\eta_1}, \boldsymbol{x}_{\eta_2}, \ldots, \boldsymbol{x}_{\eta_K}] \in \mathbb{R}^{D \times K}. \tag{B.1} \]

Then C is computed as:

\[ C_{jk} = \boldsymbol{x}_{\eta_j}^T \boldsymbol{x}_{\eta_k}, \tag{B.2} \]

where 1 ≤ j ≤ K and 1 ≤ k ≤ K.

2. Compute the Lagrange multiplier, λ_L = α_L/β_L, that enforces the sum-to-one constraint in Equation (4.14), where for each x_i the parameters α_L and β_L are:

\[ \alpha_L = 1 - \sum_{j=1}^{K} \sum_{k=1}^{K} C^{-1}_{jk} \left( \boldsymbol{x}_i^T \boldsymbol{x}_{\eta_k} \right), \tag{B.3} \]

\[ \beta_L = \sum_{j=1}^{K} \sum_{k=1}^{K} C^{-1}_{jk}. \tag{B.4} \]

3. Compute the reconstruction weights, W_j, corresponding to the ith reconstruction:

\[ W_j = \sum_{k=1}^{K} C^{-1}_{jk} \left( \boldsymbol{x}_i^T \boldsymbol{x}_{\eta_k} + \lambda_L \right). \tag{B.5} \]

If the correlation matrix C is nearly singular, it can be conditioned before inversion by adding a small multiple of the identity matrix; this is known as regularization and essentially penalizes large weights.
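To make these steps concrete, a minimal NumPy sketch of Equations (B.1)–(B.5) for a single sample is given below, including the regularization just described; it is an illustration, not the implementation used in this work.

```python
import numpy as np

def lle_weights(x_i, X_eta, reg=1e-3):
    """Reconstruction weights of one sample from its K nearest neighbors.

    x_i:   (D,) sample to reconstruct.
    X_eta: (D, K) matrix whose columns are the K nearest neighbors of x_i.
    reg:   relative regularization applied in case C is nearly singular.
    """
    K = X_eta.shape[1]
    C = X_eta.T @ X_eta                        # correlation matrix, Eq. (B.2)
    C = C + reg * np.trace(C) / K * np.eye(K)  # small multiple of the identity
    C_inv = np.linalg.inv(C)
    b = X_eta.T @ x_i                          # inner products x_i^T x_eta_k
    alpha_L = 1.0 - np.sum(C_inv @ b)          # Eq. (B.3)
    beta_L = np.sum(C_inv)                     # Eq. (B.4)
    lam_L = alpha_L / beta_L                   # Lagrange multiplier
    return C_inv @ (b + lam_L)                 # Eq. (B.5)
```

A quick check on the result is that the returned weights sum to one to machine precision, which is exactly the constraint the Lagrange multiplier enforces.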

B.2 Minimization of embedding cost

The embedding cost in Equation (4.15), as a function of the embedded coordinates and subject to the constraints in Equation (4.16) and Equation (4.17), can be rewritten in the form:

\[ \underset{\boldsymbol{Y}}{\text{minimize}}\ \Phi(\boldsymbol{Y}) = \sum_{i=1}^{n} \sum_{j=1}^{n} M_{ij}\, \boldsymbol{y}_i^T \boldsymbol{y}_j = \operatorname{tr}(\boldsymbol{Y} \boldsymbol{M} \boldsymbol{Y}^T), \tag{B.6} \]

where M is a sparse, symmetric, and positive semidefinite n × n matrix defined as:

\[ \boldsymbol{M} = (\boldsymbol{I} - \boldsymbol{W})^T (\boldsymbol{I} - \boldsymbol{W}). \tag{B.7} \]

By the Rayleigh-Ritz theorem, Equation (B.6) can be solved by finding the eigenvectors of M corresponding to the smallest d + 1 eigenvalues (Horn and Johnson, 1990). The eigenvector corresponding to the smallest eigenvalue is the unit vector with all components equal, and it is discarded: it corresponds to a free translation mode of eigenvalue zero, and discarding it enforces the constraint that the embeddings have zero mean.
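Given the full weight matrix W, a dense eigendecomposition yields the embedding directly; the NumPy sketch below illustrates Equations (B.6) and (B.7) and is practical only for modest n (for large n a sparse eigensolver would be used instead).

```python
import numpy as np

def lle_embedding(W, d):
    """Solve Eq. (B.6) for a d-dimensional embedding.

    W: (n, n) matrix; row i holds the reconstruction weights of sample i.
    """
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)               # Eq. (B.7)
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    # Discard the constant eigenvector (eigenvalue zero); keeping the next d
    # enforces the zero-mean constraint on the embedding.
    return eigvecs[:, 1:d + 1]            # (n, d) embedded coordinates
```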

B.3 Least squares regression to map new samples

If X_η^T has full column rank, then a_j can be solved as:

\[ \boldsymbol{a}_j = (\boldsymbol{X}_\eta^T)^{+} \boldsymbol{y}^j_\eta = (\boldsymbol{X}_\eta \boldsymbol{X}_\eta^T)^{-1} \boldsymbol{X}_\eta \boldsymbol{y}^j_\eta, \tag{B.8} \]

where (X_η^T)^+ is the Moore-Penrose pseudo-inverse of X_η^T, and y_η^j is defined as:

\[ \boldsymbol{y}^j_\eta = \left[ y^j_{\eta_1}, y^j_{\eta_2}, \ldots, y^j_{\eta_K} \right]. \tag{B.9} \]

If X_η^T is column rank deficient, such as in the case where K < D, then Equation (4.23) is ill-posed and there are infinitely many solutions for a_j. In this case, an effective method to condition the problem so that the solution for a_j is unique is to perform ridge regression (Draper et al., 1998), which minimizes a regularized least squares problem. With the introduction of a regularization parameter, β > 0, the regularized least squares problem is posed:

\[ \boldsymbol{a}_j = \underset{\boldsymbol{a}}{\operatorname{argmin}} \sum_{i=1}^{K} \left\| \boldsymbol{a}^T \boldsymbol{x}_{\eta_i} - y^j_{\eta_i} \right\|_2^2 + \beta \left\| \boldsymbol{a} \right\|_2^2, \tag{B.10} \]

where ||·||_2 corresponds to the L2 norm. Each basis vector a_j can be solved as:

\[ \boldsymbol{a}_j = (\boldsymbol{X}_\eta \boldsymbol{X}_\eta^T + \beta \boldsymbol{I})^{-1} \boldsymbol{X}_\eta \boldsymbol{y}^j_\eta, \tag{B.11} \]

where I is the D × D identity matrix.


The parameter β can be determined from the variance of the noise, σ², since the objective is to project the neighborhoods linearly into a d-dimensional space and discard the noise. Therefore, β can be estimated by:

\[ \beta = \sigma^2 = \frac{1}{K - d} \sum_{i=d+1}^{K} \lambda_i, \tag{B.12} \]

where λ_i are the non-zero eigenvalues of the matrix X_η X_η^T, sorted from largest to smallest.
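A sketch of this mapping, with β estimated from the discarded eigenvalues as in Equation (B.12), might look as follows; the names are illustrative, and the rank-deficient case is handled only approximately by restricting the sum to the non-zero eigenvalues.

```python
import numpy as np

def map_new_sample(x_new, X_eta, Y_eta, d):
    """Map a new observation into the d-dimensional feature space.

    X_eta: (D, K) nearest neighbors of x_new among the training inputs.
    Y_eta: (d, K) embedded coordinates of those same neighbors.
    """
    D = X_eta.shape[0]
    G = X_eta @ X_eta.T                          # X_eta X_eta^T in Eq. (B.11)
    lam = np.linalg.eigvalsh(G)[::-1]            # eigenvalues, largest first
    lam = lam[lam > 1e-12]                       # keep the non-zero eigenvalues
    beta = lam[d:].sum() / max(len(lam) - d, 1)  # Eq. (B.12), noise variance
    # Eq. (B.11): each column of A is one basis vector a_j.
    A = np.linalg.solve(G + beta * np.eye(D), X_eta @ Y_eta.T)
    return A.T @ x_new                           # embedded coordinates of x_new
```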

Appendix C

Cross Validation

The purpose of cross-validation is to train a predictive model that has one or more unknown parameters. Cross-validation optimizes the model parameters to find the model that best fits the training data-set, and it assesses how well the trained model will perform in practice.

Consider a model, f(x, λ_m), with one unknown model parameter, λ_m, that is to be trained on the training data-set, X, consisting of n observations, x. The procedure to find λ_m so that f(x, λ_m) fits X as well as possible while also generalizing well to new observations is given below and is called K_f-fold cross-validation.

Step 1 Randomly partition X into K_f mutually exclusive data-sets:

\[ \boldsymbol{X} = (\boldsymbol{X}_1, \boldsymbol{X}_2, \ldots, \boldsymbol{X}_{K_f}). \tag{C.1} \]

Step 2 For each k = 1, 2, ..., K_f, train the model on the entire training set excluding the kth fold, X_k.

Step 3 Fit the observations in X_k to the model trained in the previous step.

Step 4 Compute the prediction error sum of squares (PRESS) statistic for the kth fold:

\[ \mathrm{PRESS}_k = \sum_{i=1}^{n} \left\| \boldsymbol{x}_i - \hat{\boldsymbol{x}}_i \right\|_2^2, \tag{C.2} \]

where x̂_i is the estimate of x_i from the model trained in Step 2.

Step 5 Compute the overall cross-validation error, XV_error:

\[ \mathrm{XV}_{\mathrm{error}} = \frac{1}{K_f} \sum_{k=1}^{K_f} \mathrm{PRESS}_k. \tag{C.3} \]

Step 6 Repeat Steps 1-5 with several λ_m values and choose the best one, λ_m^*, that results in the lowest XV_error.

Step 7 Train f(x, λ_m^*) on the full training set X.

In Step 6, a grid search can be used to select the test λ_m values. First, use a coarse grid of roughly 10 equidistant λ_m values within a reasonable range. Then, over the range of λ_m values where XV_error is smallest, perform a fine grid search until satisfied that XV_error has been minimized to a desired tolerance. In the case where there are multiple parameters to optimize, perform a grid search that exhausts all combinations of the gridded model parameters.
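For illustration, a compact sketch of the whole procedure for a single parameter is given below; fit and predict stand in for whatever model family is being trained and are assumptions of the sketch, not routines defined in this thesis.

```python
import numpy as np

def cross_validation_error(X, lam_m, fit, predict, K_f=10, seed=0):
    """K_f-fold cross-validation error, Eq. (C.3), for one parameter value."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K_f)  # Step 1
    press = 0.0
    for k in range(K_f):
        train = np.hstack([folds[j] for j in range(K_f) if j != k])
        model = fit(X[train], lam_m)                 # Step 2: train without fold k
        X_hat = predict(model, X[folds[k]])          # Step 3: fit fold k
        press += np.sum((X[folds[k]] - X_hat) ** 2)  # Step 4: PRESS_k
    return press / K_f                               # Step 5: XV_error

# Step 6, coarse pass: evaluate a grid and keep the best value, e.g.
#   lam_grid = np.linspace(0.1, 10.0, 10)
#   errors = [cross_validation_error(X, lam, fit, predict) for lam in lam_grid]
#   lam_star = lam_grid[int(np.argmin(errors))]
```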
Once f (x, λ∗m ) has been trained on X, it should now be tested on a new data-set,
also known as a test set, to further validate the model performance before using in
practice.
In the cross-validation procedure presented in this section, K_f was chosen to be 10. A large K_f results in a better-trained classifier, since more training data are available in each fold; however, variances may exist between different partitions of the training set that are not captured if K_f is large. On the other hand, if K_f is small, the classifier may be poorly trained. Therefore, K_f = 10 strikes a balance between these competing objectives.

Appendix D

Model Predictive Control

The objective of dynamic matrix control (DMC), a form of MPC, is to reduce the error between measured process outputs and their SPs in a least squares sense, with the inclusion of a penalty term on the input moves to the process. Therefore, the manipulated variables are determined such that the cost function, J, is minimized:

\[ \underset{\Delta u}{\text{minimize}}\ J(\Delta u) = \sum_{j=1}^{n_y} \sum_{k=1}^{p} \left\| \hat{y}_j(t+k|t) - w_j(t+k) \right\|_{R}^{2} + \sum_{l=1}^{n_u} \sum_{q=1}^{m} \left\| \Delta u_l(t+q-1) \right\|_{Q}^{2}, \tag{D.1} \]
where ŷ_j(t+k|t) is the predicted output for the kth prediction of the jth output variable; Δu_l(t+q−1) is the future control signal for the qth prediction of the lth control variable; n_y is the number of control variables; n_u is the number of manipulated variables; p is the length of the prediction horizon; m is the length of the control horizon; R ∈ ℝ^{n_y·p × n_y·p} is the diagonal penalty-on-error weight matrix; and Q ∈ ℝ^{n_u·m × n_u·m} is the diagonal penalty-on-move weight matrix. Equation (D.1) can also be written in matrix form:

\[ \underset{\boldsymbol{u}}{\text{minimize}}\ J(\boldsymbol{u}) = \left\| \hat{\boldsymbol{y}} - \boldsymbol{w} \right\|_{R}^{2} + \left\| \boldsymbol{u} \right\|_{Q}^{2}. \tag{D.2} \]

Here, ŷ ∈ ℝ^{p·n_y×1} is the vector of predicted outputs and u ∈ ℝ^{m·n_u×1} is the vector of future control signals, where ŷ is defined as:

\[ \hat{\boldsymbol{y}} = [\hat{y}_1(t+1|t), \ldots, \hat{y}_1(t+p|t), \ldots, \hat{y}_{n_y}(t+1|t), \ldots, \hat{y}_{n_y}(t+p|t)]^T, \tag{D.3} \]

and u is:

\[ \boldsymbol{u} = [\Delta u_1(t), \ldots, \Delta u_1(t+m-1), \ldots, \Delta u_{n_u}(t), \ldots, \Delta u_{n_u}(t+m-1)]^T. \tag{D.4} \]

Future output predictions are a function of future control moves; in other words, ŷ is a function of u, and ŷ can be computed by the expression:

\[ \hat{\boldsymbol{y}} = \boldsymbol{G}\boldsymbol{u} + \boldsymbol{f}, \tag{D.5} \]

where f is the vector of the free response of the system, that is, the part of the system response that does not depend on future control moves, and is given by:

\[ \boldsymbol{f} = [f_1(t+1|t), \ldots, f_1(t+p|t), \ldots, f_{n_y}(t+1|t), \ldots, f_{n_y}(t+p|t)]^T. \tag{D.6} \]

Here, f_1, ..., f_{n_y} are the free responses corresponding to the n_y process outputs and are given by:

\[ f_j(t+k) = y_{m_j}(t) + \sum_{i=1}^{N_{j,1}} \left( g^{j,1}_{k+i} - g^{j,1}_{i} \right) \Delta u_1(t-i) + \cdots + \sum_{i=1}^{N_{j,n_u}} \left( g^{j,n_u}_{k+i} - g^{j,n_u}_{i} \right) \Delta u_{n_u}(t-i), \quad j = 1, \ldots, n_y, \tag{D.7} \]

where y_{m_j} corresponds to the jth measured output; N_{j,l} is the number of samples until steady state is reached in y_j following a step change in u_l, with l = 1, ..., n_u; k = 1, ..., p; and g^{j,l}_i is the ith coefficient of the response in y_j following a unit step change in u_l. Furthermore, the g^{j,l}_i coefficients are used to construct the system's dynamic matrix between y_j and u_l, G_{j,l} ∈ ℝ^{p×m}, given by:

\[ \boldsymbol{G}_{j,l} = \begin{bmatrix} g_1 & 0 & \cdots & 0 \\ g_2 & g_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_m & g_{m-1} & \cdots & g_1 \\ \vdots & \vdots & \ddots & \vdots \\ g_p & g_{p-1} & \cdots & g_{p-m+1} \end{bmatrix}; \tag{D.8} \]

furthermore, for multiple-input and multiple-output systems, the G_{j,l} are combined into the matrix G containing all the system dynamic matrices, where G is:

\[ \boldsymbol{G} = \begin{bmatrix} \boldsymbol{G}_{1,1} & \boldsymbol{G}_{1,2} & \cdots & \boldsymbol{G}_{1,n_u} \\ \boldsymbol{G}_{2,1} & \boldsymbol{G}_{2,2} & \cdots & \boldsymbol{G}_{2,n_u} \\ \vdots & \vdots & \ddots & \vdots \\ \boldsymbol{G}_{n_y,1} & \boldsymbol{G}_{n_y,2} & \cdots & \boldsymbol{G}_{n_y,n_u} \end{bmatrix}. \tag{D.9} \]

Since R ⪰ 0 and Q ≻ 0, Equation (D.2) is a convex quadratic program (Boyd and Vandenberghe, 2004), and an analytical solution for u is found by setting the derivative to zero. The solution for u computed at each time instant t is:

\[ \boldsymbol{u} = (\boldsymbol{G}^T \boldsymbol{R} \boldsymbol{G} + \boldsymbol{Q})^{-1} \boldsymbol{G}^T \boldsymbol{R} (\boldsymbol{w} - \boldsymbol{f}). \tag{D.10} \]

For each t, only (w − f) needs to be recalculated, since the system is assumed to be LTI. Therefore, we define K = (G^T R G + Q)^{-1} G^T R and calculate it offline prior to the implementation of MPC. Moreover, since only the first control move is executed at each t, K can be reduced to only the rows corresponding to the first calculated control move for each Δu_l. Equation (D.10) then simplifies to:

\[ \boldsymbol{u} = \boldsymbol{K}(\boldsymbol{w} - \boldsymbol{f}), \tag{D.11} \]

where each element of u contains the control move for each Δu_l to be executed.
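As an illustration, the offline computation of K and the online control-move calculation reduce to a few matrix operations; the sketch below assumes G, R, Q, w, and f have been constructed as described above, with the input ordering of Equation (D.4).

```python
import numpy as np

def dmc_gain(G, R, Q, n_u, m):
    """Offline gain of Eq. (D.11), keeping only the first move of each input."""
    K_full = np.linalg.solve(G.T @ R @ G + Q, G.T @ R)  # Eq. (D.10)
    first_moves = [l * m for l in range(n_u)]           # rows of Delta u_l(t)
    return K_full[first_moves, :]

def dmc_control_move(K, w, f):
    """Online calculation executed once per control interval, Eq. (D.11)."""
    return K @ (w - f)
```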

Appendix E

Model Selection

Model selection is important when modeling a system because there is a trade-off between prediction accuracy and model complexity. The greater the order of the model, the more likely it is that the model overfits the data and generalizes poorly to new data. To select the process model that best fits a training data-set, a method known as the extra sum of squares method can be used. Extra sum of squares is a simple technique that allows us to measure whether increasing the order of the model results in a statistically significant increase in prediction accuracy at a confidence level of α.
The first step in extra sum of squares is to find the sum of squares of the model residuals, Q_j, for the jth model, where j = 1, 2, ..., n_m and n_m is the number of models to be compared against each other. Moreover, the n_m models are sorted in ascending order of the number of fitted parameters, p_j, in the jth model. Q_j is computed from the expression:

\[ Q_j = \sum_{i=1}^{n} (y_i - \hat{y}_{i,j})^2, \tag{E.1} \]

where n is the number of observations in the training data-set, y_i is the ith observation of the output in the data-set, and ŷ_{i,j} is the ith estimate of y_i from the jth model. The next step is to compute the F-statistic between each pair of adjacent models, the jth and the (j+1)th, resulting in a total of n_m − 1 comparisons indexed by k = j:

\[ F_k = \left( \frac{Q_j - Q_{j+1}}{p_{j+1} - p_j} \right) \left( \frac{n - p_{j+1}}{Q_{j+1}} \right). \tag{E.2} \]

The next step is to find F_α. To do this, find the value in the F-distribution table for df_1 = p_{j+1} − p_j and df_2 = n − p_{j+1} at a confidence level of α, where df corresponds to the degrees of freedom of the F-distribution. The final step is to compare the F values to find the first case where F_k < F_α. Once this condition is satisfied, the jth model, with j = k, is chosen as the best candidate.
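The selection procedure can be scripted directly; the sketch below uses the F-distribution from scipy.stats and assumes the residual sums of squares and parameter counts are already sorted by increasing model complexity.

```python
from scipy.stats import f as f_dist

def extra_sum_of_squares(Q, p, n, alpha=0.95):
    """Select a model by the extra sum of squares method, Eqs. (E.1)-(E.2).

    Q: residual sums of squares per model, ordered by increasing complexity.
    p: number of fitted parameters per model.
    n: number of observations in the training data-set.
    """
    for j in range(len(Q) - 1):
        df1 = p[j + 1] - p[j]
        df2 = n - p[j + 1]
        F_k = (Q[j] - Q[j + 1]) / df1 * (df2 / Q[j + 1])  # Eq. (E.2)
        if F_k < f_dist.ppf(alpha, df1, df2):             # compare to F_alpha
            return j      # extra parameters not significant: keep the jth model
    return len(Q) - 1     # otherwise keep the most complex model
```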
With this method, it is possible that a comparison between two models whose parameter counts differ by more than one would yield a higher-order and better-fitting model. However, it is unlikely in this case that the increase in model order is significantly more beneficial to the predictive accuracy. If an unselected model appears to be more suitable for modeling the process, its predictive accuracy should be compared to that of the selected model on a test data-set to see whether the prediction accuracy substantially improves.
This section is meant to act as a guideline for model selection. There are numerous
other methods available for model selection. For example, a cost function incorporating
the cost of prediction errors and parameter costs could be minimized to find the best
model. The final prediction-error criterion discussed by Ljung (2007b) is one example
for model selection based upon the minimization of a cost function.
