Professional Documents
Culture Documents
Introduction to Research
Arun K. Tangirala
Modelling Skills
Objectives
I What is a model?
I First-principles (mechanistic) vs. empirical models.
I Systematic procedure for building models from data.
I Few critical aspects of data-driven (empirical) modelling.
Objectives
I What is a model?
I First-principles (mechanistic) vs. empirical models.
I Systematic procedure for building models from data.
I Few critical aspects of data-driven (empirical) modelling.
What is a model?
What is a model?
61&%$0,+"7.&
'8%97:+&%17 7*/'.89#%)%8.'.)/
.;.7%&2 :;*/,6%$9#)(#.)',./
A model emulates the process given the operating conditions, parameters and physical properties.
However, in general, its architecture is different from the process!
First-principles Empirical
(Fundamentals) (Data-driven)
Models
First-principles Empirical
(Fundamentals) (Data-driven)
Models
First-principles Empirical
Causal, continuous, non-linear differential- Models are usually black-box and discrete-time
algebraic equations
Model structures are quite rigid - the structure is Model structures are extremely flexible - implies
decided by the equations describing fundamental they have to be assumed/known a priori.
laws.
Can be quite challenging and time-consuming to Relatively much easier and are less time-consuming
develop to develop
Require good numerical ODE and algebraic solvers. Require good estimators (plenty of them available)
Very effective and reliable models - can be used for Model quality is strongly dependent on data quality
wide range of operating conditions - usually have poor extrapolation capabilities
Transparent and can be easily related to physical Difficult to relate to physical parameters of the pro-
parameters/characteristics of the process cess (black-box) and the underlying phenomena
h
Fo
dh̃ Cv
Ac = F̃i h̃, = p
dt 2 hs
F̃i = Fi Fs , h̃i = hi hs
Case 3: Approximate linear, dynamic Case 4: Approximate empirical, Case 5: Black-box, dynamic,
model about an operating point linear, grey-box, dynamic, discrete-time model
discrete-time model,
dh̃ Cv 1. Model structure based on ease
Ac = F̃i h̃, = p
dt 2 hs h[k] = a1 h[k 1] + bi Fi [k 1] + "[k] of estimation and end-use
F̃i = Fi Fs , h̃i = hi hs 2. Model may not be physically
(Parameters a1 , b1 estimated from
interpretable, but designed to
Typically, hs , Fs are steady-state values. experimental data)
give good predictions.
Arun K. Tangirala, IIT Madras Introduction to Research 15
Satisfactory?
No
Yes
Accept it
Time-series models
I Suited for modelling stochastic or random processes (e.g., stock market index, rainfall, sensor noise)
I Challenges: choosing model structure, making the right assumptions on process characteristics,
non-linearities, etc.
Remember
In practice, all modelling exercises demand a careful handling of uncertainties
both at the experimental and modelling stages!
I Three unknowns
I Therefore, data corresponding to three steady-states is required
I A more general statement: The regressor matrix
2 3
1 u[k1 ] u2 [k1 ]
6 7
U = 41 u[k2 ] u2 [k2 ]5 (1)
1 u[k3 ] u2 [k3 ]
Suppose that the process is excited with u[k] = sin(!0 k) (sine of single frequency).
Suppose that the process is excited with u[k] = sin(!0 k) (sine of single frequency). With this sine wave input,
The second challenging aspect of empirical modelling is the presence of random or stochastic effects.
I Accuracy of predictions
I Errors in parameter estimates
I Goodness of the deterministic model
SNR
A key measure that quantifies the effects of noise is the Signal-to-Noise Ratio (SNR), which is defined
as the ratio of variance of signal to the variance of noise in a measurement.
2
signal
SNR = 2
noise
SNR can be interpreted as a measure of the degree of certainty (deterministic portion) to uncertainty
Output vs. Input (SNR = 100) Output vs. Input (SNR = 10)
8 8
6 6
5 5
Output
Output
4 4
Data
3 Best Fit 3
Data
Best Fit
2 2
1 1
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Input Input
b̂1 = 0.036,
b̂0 = 0.02 b̂1 = 0.114, b̂0 = 0.064
Arun K. Tangirala, IIT Madras Introduction to Research 26
Modelling Skills References
Output vs. Input (SNR = 100) Output vs. Input (SNR = 10)
8 8
6 6
5 5
Decrease in SNR increases the
Output
Output
1 1
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Input Input
Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]
Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
25
20
15
Output
10
0
0 0.5 1 1.5 2 2.5 3 3.5 4
Input
Input-output data
Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]
Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
Comparison of predictions on test data
25 150
Actual
rd
3 order
20 100 4th order
5th order
15 50
Output
10 Output 0
5 −50
0 −100
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8
Input Input
Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]
Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
Comparison of predictions on test data
25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order
Residual norm
15 50 20
Output
Output
10 0 15
5 −50 10
0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order
Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]
Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
Comparison of predictions on test data
25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order
Residual norm
15 50 20
Output
10 Output 0 15
5 −50 10
0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order
Overfitting occurs whenever the local chance variations are treated as global characteristics
Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]
Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
Comparison of predictions on test data
25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order
Residual norm
15 50 20
Output
Output
10 0 15
5 −50 10
0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order
Overfitting occurs whenever the local chance variations are treated as global characteristics
3rd order fit: ŷ[k] = 1.17 + 0.35 u[k] + 0.36 u2 [k] + 0.19 u3 [k]
(±0.05) (±0.1) (±0.06) (±0.01)
Bibliography I
Bendat, J. S. and A. G. Piersol (2010). Random Data: Analysis and Measurement Procedures. 4th edition. New
York, USA: John Wiley & Sons, Inc.
Johnson, R. A. (2011). Miller and Freund’s: Probability and Statistics for Engineers. Upper Saddle River, NJ,
USA: Prentice Hall.
Montgomery, D. C. and G. C. Runger (2011). Applied Statistics and Probability for Engineers. 5th edition. New
York, USA: John Wiley & Sons, Inc.
Ogunnaike, B. A. (2010). Random Phenomena: Fundamentals of Probability and Statistics for Engineers. Boca
Raton, FL, USA: CRC Press, Taylor & Francis Group.
Tangirala, A. K. (2014). Principles of System Identification: Theory and Practice. CRC Press, Taylor & Francis
Group.