You are on page 1of 17

Modelling Skills References

Introduction to Research

Arun K. Tangirala

Modelling Skills

Arun K. Tangirala, IIT Madras Introduction to Research 1

Modelling Skills References

Objectives

To learn the following:

I What is a model?
I First-principles (mechanistic) vs. empirical models.
I Systematic procedure for building models from data.
I Few critical aspects of data-driven (empirical) modelling.

Arun K. Tangirala, IIT Madras Introduction to Research 2


Modelling Skills References

Objectives

To learn the following:

I What is a model?
I First-principles (mechanistic) vs. empirical models.
I Systematic procedure for building models from data.
I Few critical aspects of data-driven (empirical) modelling.

With two hands-on examples in Rr (a popular software for data analysis) . . .

Arun K. Tangirala, IIT Madras Introduction to Research 3

Modelling Skills References

What is a model?

A model is a mathematical (or descriptive) abstraction of a process that


emulates its behaviour or characteristics

Arun K. Tangirala, IIT Madras Introduction to Research 4


Modelling Skills References

What is a model?

A model is a mathematical (or descriptive) abstraction of a process that


emulates its behaviour or characteristics

For the purposes of

I Prediction (inferring the unknowns)


I Classification (pattern recognition)
I Fault detection
I Process simulation
I Design and optimization
I ...

Arun K. Tangirala, IIT Madras Introduction to Research 5

Modelling Skills References

Model vs. process

61&%$0,+"7.&
'8%97:+&%17 7*/'.89#%)%8.'.)/
.;.7%&2 :;*/,6%$9#)(#.)',./

!"#$%& 3$%#$%& !"#$%&%'()* 34'#4'/


'()*$&%+,-.
Process '4.+&$0.)5 +%),%-$./ Model 0#).5,6'.5
/+01+,-.&2 /+01+,-.&2 0).1).//()/2 +%),%-$./2

A model emulates the process given the operating conditions, parameters and physical properties.
However, in general, its architecture is different from the process!

Arun K. Tangirala, IIT Madras Introduction to Research 6


Modelling Skills References

Two broad approaches to modelling

First-principles Empirical
(Fundamentals) (Data-driven)

Write conservation Perform experiments


equations Postulate models
Constitutive laws Estimate parameters
... ...

Models

Arun K. Tangirala, IIT Madras Introduction to Research 7

Modelling Skills References

Two broad approaches to modelling

First-principles Empirical
(Fundamentals) (Data-driven)

Write conservation Perform experiments


equations Postulate models
Constitutive laws Estimate parameters
... ...

Models

Test drive of a vehicle


Taking a vehicle for a test drive is a simple example of how experiments are a natural way of
understanding processes. As the vehicle is subjected to different test (road) conditions, its response is
used by the test-driver to develop a “mental model" of the vehicle.
Arun K. Tangirala, IIT Madras Introduction to Research 8
Modelling Skills References

First-principles vs. Empirical modelling

First-principles Empirical
Causal, continuous, non-linear differential- Models are usually black-box and discrete-time
algebraic equations
Model structures are quite rigid - the structure is Model structures are extremely flexible - implies
decided by the equations describing fundamental they have to be assumed/known a priori.
laws.
Can be quite challenging and time-consuming to Relatively much easier and are less time-consuming
develop to develop
Require good numerical ODE and algebraic solvers. Require good estimators (plenty of them available)
Very effective and reliable models - can be used for Model quality is strongly dependent on data quality
wide range of operating conditions - usually have poor extrapolation capabilities
Transparent and can be easily related to physical Difficult to relate to physical parameters of the pro-
parameters/characteristics of the process cess (black-box) and the underlying phenomena

Arun K. Tangirala, IIT Madras Introduction to Research 9

Modelling Skills References

Example: Liquid level system


Fi

h
Fo

Arun K. Tangirala, IIT Madras Introduction to Research 10


Modelling Skills References

Example: Liquid level system


Fi Case 1: Steady-state model
between Fo and h:
p
Fo = C v h
h
(Cv is estimated from data)
Fo

Arun K. Tangirala, IIT Madras Introduction to Research 11

Modelling Skills References

Example: Liquid level system


Fi Case 1: Steady-state model Case 2: Dynamic, non-linear model
between Fo and h: of h(t) for changes in Fi (t).
p dh p
Fo = C v h Ac = Fi Cv h
dt
h
(Cv is estimated from data) ( requires numerical solvers)
Fo

Arun K. Tangirala, IIT Madras Introduction to Research 12


Modelling Skills References

Example: Liquid level system


Fi Case 1: Steady-state model Case 2: Dynamic, non-linear model
between Fo and h: of h(t) for changes in Fi (t).
p dh p
Fo = C v h Ac = Fi Cv h
dt
h
(Cv is estimated from data) ( requires numerical solvers)
Fo

Case 3: Approximate linear, dynamic


model about an operating point

dh̃ Cv
Ac = F̃i h̃, = p
dt 2 hs
F̃i = Fi Fs , h̃i = hi hs

Typically, hs , Fs are steady-state values.

Arun K. Tangirala, IIT Madras Introduction to Research 13

Modelling Skills References

Example: Liquid level system


Fi Case 1: Steady-state model Case 2: Dynamic, non-linear model
between Fo and h: of h(t) for changes in Fi (t).
p dh p
Fo = C v h Ac = Fi Cv h
dt
h
(Cv is estimated from data) ( requires numerical solvers)
Fo

Case 3: Approximate linear, dynamic Case 4: Approximate empirical,


model about an operating point linear, grey-box, dynamic,
discrete-time model,
dh̃ Cv
Ac = F̃i h̃, = p
dt 2 hs h[k] = a1 h[k 1] + bi Fi [k 1] + "[k]
F̃i = Fi Fs , h̃i = hi hs
(Parameters a1 , b1 estimated from
Typically, hs , Fs are steady-state values. experimental data)

Arun K. Tangirala, IIT Madras Introduction to Research 14


Modelling Skills References

Example: Liquid level system


Fi Case 1: Steady-state model Case 2: Dynamic, non-linear model
between Fo and h: of h(t) for changes in Fi (t).
p dh p
Fo = C v h Ac = Fi Cv h
dt
h
(Cv is estimated from data) ( requires numerical solvers)
Fo

Case 3: Approximate linear, dynamic Case 4: Approximate empirical, Case 5: Black-box, dynamic,
model about an operating point linear, grey-box, dynamic, discrete-time model
discrete-time model,
dh̃ Cv 1. Model structure based on ease
Ac = F̃i h̃, = p
dt 2 hs h[k] = a1 h[k 1] + bi Fi [k 1] + "[k] of estimation and end-use
F̃i = Fi Fs , h̃i = hi hs 2. Model may not be physically
(Parameters a1 , b1 estimated from
interpretable, but designed to
Typically, hs , Fs are steady-state values. experimental data)
give good predictions.
Arun K. Tangirala, IIT Madras Introduction to Research 15

Modelling Skills References

Building models from data: systematic procedure


Prior Process
Knowledge
Data Generation and Acquisition
Disturbances
Measurable

• Primarily one finds three stages, Inputs Actuators Process Sensors

which again contain sub-stages


• All models should be subjected to Data

a critical validation test


Visualization Optimization Select Candidate Non-parametric
• Final model quality largely depends Pre-processing Criteria Models analysis

on the quality of data Model


Estimation
Model Development

• Generating rich (informative) data


Residual Analysis
Model Quality
is therefore essential Assessment
Estimation Error Analysis
Cross-Validation
...

Satisfactory?
No
Yes

Accept it

Arun K. Tangirala, IIT Madras Introduction to Research 16


Modelling Skills References

Two broad classes of models

Time-series models
I Suited for modelling stochastic or random processes (e.g., stock market index, rainfall, sensor noise)

I Causes are either unknown or also random

I Usually dynamic models (in a few applications, steady-state as well)

I Challenges: choosing model structure, making the right assumptions on process characteristics,
non-linearities, etc.

Arun K. Tangirala, IIT Madras Introduction to Research 17

Modelling Skills References

Two broad classes of models . . . contd.


Input-output or causal models
I Suited for modelling relationship between a variable (or more) and other explanatory variables
(a.k.a. regressors or factors) (e.g., power and current in a wire, temperature and coolant flow)
I Regressors may be known accurately or with some error
I Models could be steady-state or dynamic.
I Challenges: sufficient variability in factors, selection of regressors, measurements and regressors
available at different sampling rates, choosing the order of dynamics, etc.

Arun K. Tangirala, IIT Madras Introduction to Research 18


Modelling Skills References

Two broad classes of models . . . contd.


Input-output or causal models
I Suited for modelling relationship between a variable (or more) and other explanatory variables
(a.k.a. regressors or factors) (e.g., power and current in a wire, temperature and coolant flow)
I Regressors may be known accurately or with some error
I Models could be steady-state or dynamic.
I Challenges: sufficient variability in factors, selection of regressors, measurements and regressors
available at different sampling rates, choosing the order of dynamics, etc.

Remember
In practice, all modelling exercises demand a careful handling of uncertainties
both at the experimental and modelling stages!

Arun K. Tangirala, IIT Madras Introduction to Research 19

Modelling Skills References

Excite the process sufficiently!


Example
Consider a steady-state model:
y[k] = b0 + b1 u[k] + b2 u2 [k]

I Three unknowns
I Therefore, data corresponding to three steady-states is required
I A more general statement: The regressor matrix
2 3
1 u[k1 ] u2 [k1 ]
6 7
U = 41 u[k2 ] u2 [k2 ]5 (1)
1 u[k3 ] u2 [k3 ]

should be non-singular, i.e., of full rank.

Arun K. Tangirala, IIT Madras Introduction to Research 20


Modelling Skills References

For dynamic systems


Example
Consider a (deterministic) process:

y[k] = b1 u[k 1] + b2 u[k 2] + b3 u[k 3]

Suppose that the process is excited with u[k] = sin(!0 k) (sine of single frequency).

Arun K. Tangirala, IIT Madras Introduction to Research 21

Modelling Skills References

For dynamic systems


Example
Consider a (deterministic) process:

y[k] = b1 u[k 1] + b2 u[k 2] + b3 u[k 3]

Suppose that the process is excited with u[k] = sin(!0 k) (sine of single frequency). With this sine wave input,

y[k] = b1 sin(!0 k ) + b2 sin(!0 k 2 ) + b3 sin(!0 3 )


✓ ◆ ✓ ◆
b2 b2
= b1 + sin(!0 k ) + b3 + sin(!0 3 )
2 cos !0 2 cos !0
✓ ◆ ✓ ◆
b2 b2
= b1 + u[k 1] + b3 + u[k 3]
2 cos !0 2 cos !0

Only two of the three parameters can be identified! Why?

Arun K. Tangirala, IIT Madras Introduction to Research 22


Modelling Skills References

Effects of randomness (noise) in data

The second challenging aspect of empirical modelling is the presence of random or stochastic effects.

Randomness (uncertainties) in the process influences identification in several ways:

I Accuracy of predictions
I Errors in parameter estimates
I Goodness of the deterministic model

Arun K. Tangirala, IIT Madras Introduction to Research 23

Modelling Skills References

Signal-to-Noise Ratio (SNR)

SNR
A key measure that quantifies the effects of noise is the Signal-to-Noise Ratio (SNR), which is defined
as the ratio of variance of signal to the variance of noise in a measurement.
2
signal
SNR = 2
noise
SNR can be interpreted as a measure of the degree of certainty (deterministic portion) to uncertainty

Arun K. Tangirala, IIT Madras Introduction to Research 24


Modelling Skills References

Example: Effect of SNR


Estimation of linear model from measured data

Process : x[k] = b1 u[k 1] + b0 ; b1 = 5; b0 = 2

Only y[k] = x[k] + v[k] (measurement) is available. Goal is to estimate b1 , b0 .

Arun K. Tangirala, IIT Madras Introduction to Research 25

Modelling Skills References

Example: Effect of SNR


Estimation of linear model from measured data

Process : x[k] = b1 u[k 1] + b0 ; b1 = 5; b0 = 2

Only y[k] = x[k] + v[k] (measurement) is available. Goal is to estimate b1 , b0 .

Output vs. Input (SNR = 100) Output vs. Input (SNR = 10)
8 8

7 y = 5.009*x + 1.989 7 y = 5.027*x + 1.965

6 6

5 5
Output

Output

4 4

Data
3 Best Fit 3
Data
Best Fit
2 2

1 1
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Input Input

b̂1 = 0.036,
b̂0 = 0.02 b̂1 = 0.114, b̂0 = 0.064
Arun K. Tangirala, IIT Madras Introduction to Research 26
Modelling Skills References

Example: Effect of SNR


Estimation of linear model from measured data

Process : x[k] = b1 u[k 1] + b0 ; b1 = 5; b0 = 2

Only y[k] = x[k] + v[k] (measurement) is available. Goal is to estimate b1 , b0 .

Output vs. Input (SNR = 100) Output vs. Input (SNR = 10)
8 8

7 y = 5.009*x + 1.989 7 y = 5.027*x + 1.965

6 6

5 5
Decrease in SNR increases the
Output

Output

4 4 error in parameter estimates


p
(proportional to 1/SNR)
Data
3 Best Fit 3
Data
Best Fit
2 2

1 1
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
Input Input

b̂1 b̂0 = 0.036, = 0.02 b̂1 = 0.114, b̂0 = 0.064


Arun K. Tangirala, IIT Madras Introduction to Research 27

Modelling Skills References

Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]

Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.

25

20

15
Output

10

0
0 0.5 1 1.5 2 2.5 3 3.5 4
Input

Input-output data

Arun K. Tangirala, IIT Madras Introduction to Research 28


Modelling Skills References

Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]

Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
Comparison of predictions on test data
25 150
Actual
rd
3 order
20 100 4th order
5th order

15 50
Output

10 Output 0

5 −50

0 −100
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8
Input Input

Input-output data Cross-validation

Arun K. Tangirala, IIT Madras Introduction to Research 29

Modelling Skills References

Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]

Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
Comparison of predictions on test data
25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order
Residual norm

15 50 20
Output

Output

10 0 15

5 −50 10

0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order

Input-output data Cross-validation Selecting the order

Arun K. Tangirala, IIT Madras Introduction to Research 30


Modelling Skills References

Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]

Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
Comparison of predictions on test data
25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order

Residual norm
15 50 20
Output

10 Output 0 15

5 −50 10

0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order

Input-output data Cross-validation Selecting the order

Overfitting occurs whenever the local chance variations are treated as global characteristics

Arun K. Tangirala, IIT Madras Introduction to Research 31

Modelling Skills References

Example: Overfitting
Process : x[k] = 1.2 + 0.4u[k] + 0.3u2 [k] + 0.2u3 [k]

Only y[k] = x[k] + v[k] (measurement) is available. Goal is to fit a suitable model.
Comparison of predictions on test data
25 150 30
Actual
rd
3 order
20 100 4th order 25
5th order
Residual norm

15 50 20
Output

Output

10 0 15

5 −50 10

0 −100 5
0 0.5 1 1.5 2 2.5 3 3.5 4 4 4.5 5 5.5 6 6.5 7 7.5 8 1 2 3 4 5 6 7
Input Input Order

Input-output data Cross-validation Selecting the order

Overfitting occurs whenever the local chance variations are treated as global characteristics

3rd order fit: ŷ[k] = 1.17 + 0.35 u[k] + 0.36 u2 [k] + 0.19 u3 [k]
(±0.05) (±0.1) (±0.06) (±0.01)

Arun K. Tangirala, IIT Madras Introduction to Research 32


Modelling Skills References

Questions for reflection

I What type of models are possible? Which one(s) to choose?


I How do we “fit" a model that “explains" the variations observed in experimental data?
I How to “correctly" account for the deterministic and stochastic effects?
I Will the experiment influence the model that we fit? If yes, in what way?
I How do we set up and solve the problem of estimating the unknown model parameters?
I What kind of experiments should we design to obtain a good quality model?
I How much data do we collect? (what should be the sample size?)

Arun K. Tangirala, IIT Madras Introduction to Research 33

Modelling Skills References

Bibliography I

Bendat, J. S. and A. G. Piersol (2010). Random Data: Analysis and Measurement Procedures. 4th edition. New
York, USA: John Wiley & Sons, Inc.
Johnson, R. A. (2011). Miller and Freund’s: Probability and Statistics for Engineers. Upper Saddle River, NJ,
USA: Prentice Hall.
Montgomery, D. C. and G. C. Runger (2011). Applied Statistics and Probability for Engineers. 5th edition. New
York, USA: John Wiley & Sons, Inc.
Ogunnaike, B. A. (2010). Random Phenomena: Fundamentals of Probability and Statistics for Engineers. Boca
Raton, FL, USA: CRC Press, Taylor & Francis Group.
Tangirala, A. K. (2014). Principles of System Identification: Theory and Practice. CRC Press, Taylor & Francis
Group.

Arun K. Tangirala, IIT Madras Introduction to Research 34

You might also like