Modelling Skills Web

Modelling Skills References
Introduction to Research
Arun K. Tangirala
Modelling Skills
Arun K. Tangirala, IIT Madras Introduction to Research 1
Objectives
To learn the following:
I What is a model?
I First-principles (mechanistic) vs. empirical models.
I Systematic procedure for building models from data.
I Few critical aspects of data-driven (empirical) modelling.

Objectives
To learn the following:
I What is a model?
I First-principles (mechanistic) vs. empirical models.
I Systematic procedure for building models from data.
I Few critical aspects of data-driven (empirical) modelling.
With two hands-on examples in Rr (a popular software for data analysis) . . .
What is a model?
A model is a mathematical (or descriptive) abstraction of a process that

emulates its behaviour or characteristics

What is a model?
A model is a mathematical (or descriptive) abstraction of a process that

emulates its behaviour or characteristics
For the purposes of
I Prediction (inferring the unknowns)

I Classification (pattern recognition)
I Fault detection
I Process simulation
I Design and optimization
I ...
Model vs. process
61&%$0,+"7.&
'8%97:+&%17 7*/'.89#%)%8.'.)/
.;.7%&2 :;*/,6%$9#)(#.)',./
!"#$%& 3$%#$%& !"#$%&%'()* 34'#4'/

'()*$&%+,-.
Process '4.+&$0.)5 +%),%-$./ Model 0#).5,6'.5
/+01+,-.&2 /+01+,-.&2 0).1).//()/2 +%),%-$./2
A model emulates the process given the operating conditions, parameters and physical properties.
However, in general, its architecture is different from the process!

Two broad approaches to modelling
First-principles Empirical
(Fundamentals) (Data-driven)
Write conservation Perform experiments

equations Postulate models
Constitutive laws Estimate parameters
... ...
Models
Two broad approaches to modelling
(Fundamentals) (Data-driven)
Write conservation Perform experiments

equations Postulate models
Constitutive laws Estimate parameters
... ...
Models
Test drive of a vehicle

Taking a vehicle for a test drive is a simple example of how experiments are a natural way of
understanding processes. As the vehicle is subjected to different test (road) conditions, its response is
used by the test-driver to develop a “mental model" of the vehicle.
First-principles vs. Empirical modelling
Causal, continuous, non-linear differential- Models are usually black-box and discrete-time
algebraic equations
Model structures are quite rigid - the structure is Model structures are extremely flexible - implies
decided by the equations describing fundamental they have to be assumed/known a priori.
laws.
Can be quite challenging and time-consuming to Relatively much easier and are less time-consuming
develop to develop
Require good numerical ODE and algebraic solvers. Require good estimators (plenty of them available)
Very effective and reliable models - can be used for Model quality is strongly dependent on data quality
wide range of operating conditions - usually have poor extrapolation capabilities
Transparent and can be easily related to physical Difficult to relate to physical parameters of the pro-
parameters/characteristics of the process cess (black-box) and the underlying phenomena
Example: Liquid level system

Fi
h
Fo


Fi Case 1: Steady-state model
between Fo and h:
p
Fo = C v h
h
(Cv is estimated from data)
Fo

Fi Case 1: Steady-state model Case 2: Dynamic, non-linear model
between Fo and h: of h(t) for changes in Fi (t).
p dh p
Fo = C v h Ac = Fi Cv h
dt
h
(Cv is estimated from data) ( requires numerical solvers)
Fo


p dh p
dt
h
Fo
Case 3: Approximate linear, dynamic

model about an operating point
dh̃ Cv
Ac = F̃i h̃, = p
dt 2 hs
F̃i = Fi Fs , h̃i = hi hs
Typically, hs , Fs are steady-state values.

p dh p
dt
h
Fo
Case 3: Approximate linear, dynamic Case 4: Approximate empirical,

model about an operating point linear, grey-box, dynamic,
discrete-time model,
dh̃ Cv
Ac = F̃i h̃, = p
dt 2 hs h[k] = a1 h[k 1] + bi Fi [k 1] + "[k]
F̃i = Fi Fs , h̃i = hi hs
(Parameters a1 , b1 estimated from
Typically, hs , Fs are steady-state values. experimental data)


p dh p
dt
h
Fo
Case 3: Approximate linear, dynamic Case 4: Approximate empirical, Case 5: Black-box, dynamic,
model about an operating point linear, grey-box, dynamic, discrete-time model
discrete-time model,
dh̃ Cv 1. Model structure based on ease
Ac = F̃i h̃, = p
dt 2 hs h[k] = a1 h[k 1] + bi Fi [k 1] + "[k] of estimation and end-use
F̃i = Fi Fs , h̃i = hi hs 2. Model may not be physically
(Parameters a1 , b1 estimated from
interpretable, but designed to
Typically, hs , Fs are steady-state values. experimental data)
give good predictions.
Building models from data: systematic procedure

Prior Process
Knowledge
Data Generation and Acquisition
Disturbances
Measurable
• Primarily one finds three stages, Inputs Actuators Process Sensors
which again contain sub-stages

• All models should be subjected to Data
a critical validation test

Visualization Optimization Select Candidate Non-parametric
• Final model quality largely depends Pre-processing Criteria Models analysis
on the quality of data Model

Estimation
Model Development
• Generating rich (informative) data

Residual Analysis
Model Quality
is therefore essential Assessment
Estimation Error Analysis
Cross-Validation
...
Satisfactory?
No
Yes
Accept it

Two broad classes of models
Time-series models
I Suited for modelling stochastic or random processes (e.g., stock market index, rainfall, sensor noise)
I Causes are either unknown or also random
I Usually dynamic models (in a few applications, steady-state as well)
I Challenges: choosing model structure, making the right assumptions on process characteristics,
non-linearities, etc.
Two broad classes of models . . . contd.

Input-output or causal models
I Suited for modelling relationship between a variable (or more) and other explanatory variables
(a.k.a. regressors or factors) (e.g., power and current in a wire, temperature and coolant flow)
I Regressors may be known accurately or with some error
I Models could be steady-state or dynamic.
I Challenges: sufficient variability in factors, selection of regressors, measurements and regressors
available at different sampling rates, choosing the order of dynamics, etc.

Two broad classes of models . . . contd.

Input-output or causal models
I Suited for modelling relationship between a variable (or more) and other explanatory variables
(a.k.a. regressors or factors) (e.g., power and current in a wire, temperature and coolant flow)
I Regressors may be known accurately or with some error
I Models could be steady-state or dynamic.
I Challenges: sufficient variability in factors, selection of regressors, measurements and regressors
available at different sampling rates, choosing the order of dynamics, etc.
Remember
In practice, all modelling exercises demand a careful handling of uncertainties
both at the experimental and modelling stages!
Excite the process sufficiently!

Example
Consider a steady-state model:
y[k] = b0 + b1 u[k] + b2 u2 [k]
I Three unknowns
I Therefore, data corresponding to three steady-states is required
I A more general statement: The regressor matrix
2 3
1 u[k1 ] u2 [k1 ]
6 7
U = 41 u[k2 ] u2 [k2 ]5 (1)
1 u[k3 ] u2 [k3 ]
should be non-singular, i.e., of full rank.

For dynamic systems

Example
Consider a (deterministic) process:
y[k] = b1 u[k 1] + b2 u[k 2] + b3 u[k 3]
Suppose that the process is excited with u[k] = sin(!0 k) (sine of single frequency).
For dynamic systems

Example
Consider a (deterministic) process:
y[k] = b1 u[k 1] + b2 u[k 2] + b3 u[k 3]
Suppose that the process is excited with u[k] = sin(!0 k) (sine of single frequency). With this sine wave input,
y[k] = b1 sin(!0 k ) + b2 sin(!0 k 2 ) + b3 sin(!0 3 )

✓ ◆ ✓ ◆
b2 b2
= b1 + sin(!0 k ) + b3 + sin(!0 3 )
2 cos !0 2 cos !0
✓ ◆ ✓ ◆
b2 b2
= b1 + u[k 1] + b3 + u[k 3]
2 cos !0 2 cos !0
Only two of the three parameters can be identified! Why?

Effects of randomness (noise) in data
The second challenging aspect of empirical modelling is the presence of random or stochastic effects.
Randomness (uncertainties) in the process influences identification in several ways:
I Accuracy of predictions
I Errors in parameter estimates
I Goodness of the deterministic model
Signal-to-Noise Ratio (SNR)
SNR
A key measure that quantifies the effects of noise is the Signal-to-Noise Ratio (SNR), which is defined
as the ratio of variance of signal to the variance of noise in a measurement.
2
signal
SNR = 2
noise
SNR can be interpreted as a measure of the degree of certainty (deterministic portion) to uncertainty

Example: Effect of SNR

Estimation of linear model from measured data
Process : x[k] = b1 u[k 1] + b0 ; b1 = 5; b0 = 2
Only y[k] = x[k] + v[k] (measurement) is available. Goal is to estimate b1 , b0 .

Output vs. Input (SNR = 100) Output vs. Input (SNR = 10)
8 8
7 y = 5.009*x + 1.989 7 y = 5.027*x + 1.965
6 6
5 5
Output
Output
4 4
Data
3 Best Fit 3
Data
Best Fit
2 2
1 1
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Input Input
b̂1 = 0.036,
b̂0 = 0.02 b̂1 = 0.114, b̂0 = 0.064

Output vs. Input (SNR = 100) Output vs. Input (SNR = 10)
8 8
7 y = 5.009*x + 1.989 7 y = 5.027*x + 1.965
6 6
5 5
Decrease in SNR increases the
Output
Output
4 4 error in parameter estimates

p
(proportional to 1/SNR)
Data
3 Best Fit 3
Data
Best Fit
2 2
1 1
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Input Input
b̂1 b̂0 = 0.036, = 0.02 b̂1 = 0.114, b̂0 = 0.064

Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]
Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
25
20
15
Output
10
0
0 0.5 1 1.5 2 2.5 3 3.5 4
Input
Input-output data

Comparison of predictions on test data
25 150
Actual
rd
3 order
20 100 4th order
5th order
15 50
Output
10 Output 0
5 −50
0 −100
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8
Input Input
Input-output data Cross-validation
25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order
Residual norm
15 50 20
Output
Output
10 0 15
5 −50 10
0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order
Input-output data Cross-validation Selecting the order

25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order
Residual norm
15 50 20
Output
10 Output 0 15
5 −50 10
0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order
Overfitting occurs whenever the local chance variations are treated as global characteristics
25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order
Residual norm
15 50 20
Output
Output
10 0 15
5 −50 10
0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order
Overfitting occurs whenever the local chance variations are treated as global characteristics
3rd order fit: ŷ[k] = 1.17 + 0.35 u[k] + 0.36 u2 [k] + 0.19 u3 [k]
(±0.05) (±0.1) (±0.06) (±0.01)

Questions for reflection
I What type of models are possible? Which one(s) to choose?

I How do we “fit" a model that “explains" the variations observed in experimental data?
I How to “correctly" account for the deterministic and stochastic effects?
I Will the experiment influence the model that we fit? If yes, in what way?
I How do we set up and solve the problem of estimating the unknown model parameters?
I What kind of experiments should we design to obtain a good quality model?
I How much data do we collect? (what should be the sample size?)
Bibliography I
Bendat, J. S. and A. G. Piersol (2010). Random Data: Analysis and Measurement Procedures. 4th edition. New
York, USA: John Wiley & Sons, Inc.
Johnson, R. A. (2011). Miller and Freund’s: Probability and Statistics for Engineers. Upper Saddle River, NJ,
USA: Prentice Hall.
Montgomery, D. C. and G. C. Runger (2011). Applied Statistics and Probability for Engineers. 5th edition. New
York, USA: John Wiley & Sons, Inc.
Ogunnaike, B. A. (2010). Random Phenomena: Fundamentals of Probability and Statistics for Engineers. Boca
Raton, FL, USA: CRC Press, Taylor & Francis Group.
Tangirala, A. K. (2014). Principles of System Identification: Theory and Practice. CRC Press, Taylor & Francis
Group.

Modelling Skills Web

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Modelling Skills Web

Uploaded by

Copyright:

Available Formats

Modelling Skills References

Arun K. Tangirala, IIT Madras Introduction to Research 1

Modelling Skills References

To learn the following:

Arun K. Tangirala, IIT Madras Introduction to Research 2

To learn the following:

With two hands-on examples in Rr (a popular software for data analysis) . . .

Arun K. Tangirala, IIT Madras Introduction to Research 3

Modelling Skills References

A model is a mathematical (or descriptive) abstraction of a process that

Arun K. Tangirala, IIT Madras Introduction to Research 4

A model is a mathematical (or descriptive) abstraction of a process that

For the purposes of

I Prediction (inferring the unknowns)

Arun K. Tangirala, IIT Madras Introduction to Research 5

Modelling Skills References

Model vs. process

!"#$%& 3$%#$%& !"#$%&%'()* 34'#4'/

Arun K. Tangirala, IIT Madras Introduction to Research 6

Two broad approaches to modelling

Write conservation Perform experiments

Arun K. Tangirala, IIT Madras Introduction to Research 7

Modelling Skills References

Two broad approaches to modelling

Write conservation Perform experiments

Test drive of a vehicle

First-principles vs. Empirical modelling

Arun K. Tangirala, IIT Madras Introduction to Research 9

Modelling Skills References

Example: Liquid level system

Arun K. Tangirala, IIT Madras Introduction to Research 10

Example: Liquid level system

Arun K. Tangirala, IIT Madras Introduction to Research 11

Modelling Skills References

Example: Liquid level system

Arun K. Tangirala, IIT Madras Introduction to Research 12

Example: Liquid level system

Case 3: Approximate linear, dynamic

Typically, hs , Fs are steady-state values.

Arun K. Tangirala, IIT Madras Introduction to Research 13

Modelling Skills References

Example: Liquid level system

Case 3: Approximate linear, dynamic Case 4: Approximate empirical,

Arun K. Tangirala, IIT Madras Introduction to Research 14

Example: Liquid level system

Modelling Skills References

Building models from data: systematic procedure

• Primarily one finds three stages, Inputs Actuators Process Sensors

which again contain sub-stages

a critical validation test

on the quality of data Model

• Generating rich (informative) data

Arun K. Tangirala, IIT Madras Introduction to Research 16

Two broad classes of models

I Causes are either unknown or also random

I Usually dynamic models (in a few applications, steady-state as well)

Arun K. Tangirala, IIT Madras Introduction to Research 17

Modelling Skills References

Two broad classes of models . . . contd.

Arun K. Tangirala, IIT Madras Introduction to Research 18

Two broad classes of models . . . contd.

Arun K. Tangirala, IIT Madras Introduction to Research 19

Modelling Skills References

Excite the process suﬃciently!

7 y = 5.009x + 1.989 7 y = 5.027x + 1.965

7 y = 5.009x + 1.989 7 y = 5.027x + 1.965