
Computational modelling techniques

Exercise set 2
Solutions

1. Determine whether the following data support a proportionality argument for y being proportional
to $z^{1/2}$:

y 3.5 5 6 7 8
z 3 6 9 12 15

y            3.5       5         6    7         8
z            3         6         9    12        15
z^{1/2}      1.732051  2.44949   3    3.464102  3.872983
y/z^{1/2}    2.020726  2.041241  2    2.020726  2.065591

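Yes: the ratio $y/z^{1/2}$ is nearly constant, so y is proportional to $z^{1/2}$ with a factor of
approximately k = 2.03 (the mean of the five ratios). This can also be checked visually by plotting y
together with $k z^{1/2}$. The x axis can be either z or $z^{1/2}$.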
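The ratio check above can be sketched in a few lines of Python. This is only an illustration: the variable names are ours, and taking k as the mean of the ratios is an assumption about how the value 2.03 was obtained.

```python
import math

y = [3.5, 5, 6, 7, 8]
z = [3, 6, 9, 12, 15]

# If y is proportional to z^(1/2), the ratio y / sqrt(z) is (nearly) constant.
ratios = [yi / math.sqrt(zi) for yi, zi in zip(y, z)]

k = sum(ratios) / len(ratios)               # mean ratio, used as the constant k
spread = (max(ratios) - min(ratios)) / k    # relative spread of the ratios

print([round(r, 6) for r in ratios])
print(f"k = {k:.2f}, relative spread = {spread:.1%}")
```

A small relative spread of the ratios is what supports the proportionality argument.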

2. Derive the equations that minimize the sum of the squared deviations between a set of data points
and the quadratic model $y = c_1 x^2 + c_2 x + c_3$. Use the equations to find estimates of $c_1$, $c_2$ and $c_3$ for
the following set of data:

x    0.1   0.2   0.3   0.4   0.5
y    0.06  0.12  0.36  0.65  0.95

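We need to minimize

$$S(c_1, c_2, c_3) = \sum_{i=1}^{n} \left( y_i - c_1 x_i^2 - c_2 x_i - c_3 \right)^2 .$$

The partial derivatives with respect to each of the constants should be equal to 0:

$$\frac{\partial S}{\partial c_1} = -2 \sum x_i^2 \left( y_i - c_1 x_i^2 - c_2 x_i - c_3 \right) = 0 \;\Rightarrow\; c_1 \sum x_i^4 + c_2 \sum x_i^3 + c_3 \sum x_i^2 = \sum x_i^2 y_i$$

$$\frac{\partial S}{\partial c_2} = -2 \sum x_i \left( y_i - c_1 x_i^2 - c_2 x_i - c_3 \right) = 0 \;\Rightarrow\; c_1 \sum x_i^3 + c_2 \sum x_i^2 + c_3 \sum x_i = \sum x_i y_i$$

$$\frac{\partial S}{\partial c_3} = -2 \sum \left( y_i - c_1 x_i^2 - c_2 x_i - c_3 \right) = 0 \;\Rightarrow\; c_1 \sum x_i^2 + c_2 \sum x_i + n c_3 = \sum y_i$$

With n = 5 and the sums computed from the data, we get the system of equations:

0.0979 c_1 + 0.225 c_2 + 0.55 c_3 = 0.3793
0.225 c_1 + 0.55 c_2 + 1.5 c_3 = 0.873
0.55 c_1 + 1.5 c_2 + 5 c_3 = 2.14

Its solution is $c_1 \approx 3.7857$, $c_2 \approx 0.0386$, $c_3 = 0$. The final model is:

$$y \approx 3.7857 x^2 + 0.0386 x$$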
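As a cross-check, the normal equations of the quadratic model can be assembled and solved numerically. The sketch below uses only the data of this exercise; the helper `solve` is a hypothetical small Gaussian-elimination routine written for illustration, not part of the original solution.

```python
x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [0.06, 0.12, 0.36, 0.65, 0.95]

# Power sums needed by the normal equations: sx[p] = sum x^p, sxy[p] = sum y x^p
sx = [sum(xi**p for xi in x) for p in range(5)]
sxy = [sum(yi * xi**p for xi, yi in zip(x, y)) for p in range(3)]

# Normal equations A c = b for the unknowns c = (c1, c2, c3)
A = [[sx[4], sx[3], sx[2]],
     [sx[3], sx[2], sx[1]],
     [sx[2], sx[1], sx[0]]]
b = [sxy[2], sxy[1], sxy[0]]

def solve(A, b):
    """Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        s = sum(M[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (M[r][n] - s) / M[r][r]
    return sol

c1, c2, c3 = solve(A, b)
print(f"c1 = {c1:.4f}, c2 = {c2:.4f}, c3 = {c3:.4f}")
```

For larger systems a library solver would normally be preferred; the hand-written elimination is used here only to keep the example free of dependencies.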

3.
a. In the following data, W represents the weight of a fish and l represents its length. Fit the
model $W = k l^3$ to the data using the least-squares criterion.

l    14.5   12.5   17.25   14.5   12.625   17.75   14.125   12.625
W    27     17     41      26     17       49      23       16

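We have to minimize

$$S(k) = \sum_{i=1}^{n} \left( W_i - k\, l_i^3 \right)^2 .$$

The partial derivative with respect to k should be 0:

$$\frac{dS}{dk} = -2 \sum l_i^3 \left( W_i - k\, l_i^3 \right) = 0 \;\Rightarrow\; k \sum l_i^6 = \sum W_i l_i^3$$

We obtain k ≈ 0.0084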

b. In the following data, g represents the girth of a fish. Fit the model $W = k l g^2$ to the data using
the least-squares criterion.

l    14.5   12.5    17.25   14.5   12.625   17.75   14.125   12.625
g    9.75   8.375   11.0    9.75   8.5      12.5    9.0      8.5
W    27     17      41      26     17       49      23       16

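We minimize

$$S(k) = \sum_{i=1}^{n} \left( W_i - k\, l_i g_i^2 \right)^2 .$$

The partial derivative with respect to k should be 0:

$$\frac{dS}{dk} = -2 \sum l_i g_i^2 \left( W_i - k\, l_i g_i^2 \right) = 0 \;\Rightarrow\; k \sum l_i^2 g_i^4 = \sum W_i l_i g_i^2$$

We obtain k = 0.018675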
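Both fits in parts (a) and (b) are one-parameter least-squares problems of the form W ≈ k·f, whose solution is k = Σ W f / Σ f². A minimal Python sketch (the function name `fit_k` is ours, introduced only for illustration):

```python
l = [14.5, 12.5, 17.25, 14.5, 12.625, 17.75, 14.125, 12.625]
g = [9.75, 8.375, 11.0, 9.75, 8.5, 12.5, 9.0, 8.5]
W = [27, 17, 41, 26, 17, 49, 23, 16]

def fit_k(features, targets):
    """Least-squares k for targets ~ k * features: k = sum(W f) / sum(f^2)."""
    num = sum(w * f for w, f in zip(targets, features))
    den = sum(f * f for f in features)
    return num / den

k_cubic = fit_k([li**3 for li in l], W)                       # model W = k l^3
k_girth = fit_k([li * gi**2 for li, gi in zip(l, g)], W)      # model W = k l g^2

print(f"k (l^3 model)   = {k_cubic:.6f}")
print(f"k (l g^2 model) = {k_girth:.6f}")
```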

c. Which of the two models fits the data better? Justify. Which model do you prefer? Why?
The two models give the following predictions:

l        14.5      12.5      17.25     14.5      12.625    17.75     14.125    12.625
g        9.75      8.375     11.0      9.75      8.5       12.5      9.0       8.5
W        27        17        41        26        17        49        23        16
kl^3     25.7205   16.4780   43.3055   25.7205   16.9773   47.1814   23.7761   16.9773
klg^2    25.7419   16.3735   38.9796   25.7419   17.0346   51.7942   21.3666   17.0346


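The quality of a model is measured by the root-mean-square error relative to the mean prediction
$\hat{W}$ (lower is better):

$$\text{quality} = \frac{\sqrt{\frac{1}{n} \sum_{i} \left( W_i - \hat{W}_i \right)^2}}{\frac{1}{n} \sum_{i} \hat{W}_i}$$

quality($k l^3$) = 4.565%
quality($k l g^2$) = 5.554%

The first model fits the data better (its error is lower), but the second model also accounts for
the girth of the fish.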
[Figure: measured weights W (Data) compared with the predictions of the models kl^3 and klg^2 for the eight fish.]
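The two quality values can be reproduced under one assumption, stated loudly: that "quality" means the root-mean-square error divided by the mean prediction. That definition is our reading of the reported numbers (it matches both of them), not something the text spells out. The helper names are hypothetical.

```python
import math

l = [14.5, 12.5, 17.25, 14.5, 12.625, 17.75, 14.125, 12.625]
g = [9.75, 8.375, 11.0, 9.75, 8.5, 12.5, 9.0, 8.5]
W = [27, 17, 41, 26, 17, 49, 23, 16]

def fit_k(features, targets):
    """Least-squares k for targets ~ k * features."""
    return (sum(w * f for w, f in zip(targets, features))
            / sum(f * f for f in features))

def quality(predictions, targets):
    """ASSUMED measure: RMSE divided by the mean prediction, in percent."""
    n = len(targets)
    rmse = math.sqrt(sum((w - p) ** 2 for w, p in zip(targets, predictions)) / n)
    return 100 * rmse / (sum(predictions) / n)

f_cubic = [li**3 for li in l]
f_girth = [li * gi**2 for li, gi in zip(l, g)]
pred_cubic = [fit_k(f_cubic, W) * f for f in f_cubic]
pred_girth = [fit_k(f_girth, W) * f for f in f_girth]

q_cubic = quality(pred_cubic, W)
q_girth = quality(pred_girth, W)
print(f"quality(kl^3)  = {q_cubic:.3f}%")
print(f"quality(klg^2) = {q_girth:.3f}%")
```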

4. Linearize the model $P = a e^{bt}$ and then fit it to the data below.

t 7 14 21 28 35 42
P 8 41 133 250 280 297
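Taking logarithms of both sides linearizes the model:

$\ln(P) = \ln(a) + bt$

Minimize the sum

$$S(p, q) = \sum_{i} \left( \ln P_i - p\, t_i - q \right)^2 ,$$

where the linear model is $\ln(P) = pt + q$.

The partial derivatives with respect to p and q are 0:

$$\frac{\partial S}{\partial p} = -2 \sum t_i \left( \ln P_i - p\, t_i - q \right) = 0 \;\Rightarrow\; p \sum t_i^2 + q \sum t_i = \sum t_i \ln P_i$$

$$\frac{\partial S}{\partial q} = -2 \sum \left( \ln P_i - p\, t_i - q \right) = 0 \;\Rightarrow\; p \sum t_i + n q = \sum \ln P_i$$

With n = 6 we get the linear system of equations:

4459 p + 147 q = 760.1987
147 p + 6 q = 27.5333

p = 0.099862 ⇒ b = 0.099862

q = 2.142269 ⇒ a = e^q = 8.518746

The model is $P \approx 8.52\, e^{0.1t}$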

[Figure: observed P and the model prediction plotted against t.]
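The linearized fit can be sketched as an ordinary straight-line regression of ln(P) on t. The helper `fit_line` is a hypothetical name; it implements the standard closed-form solution of the two normal equations.

```python
import math

t = [7, 14, 21, 28, 35, 42]
P = [8, 41, 133, 250, 280, 297]

def fit_line(xs, ys):
    """Least-squares slope and intercept for ys ~ slope * xs + intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Fit ln(P) = p t + q, then map back: b = p, a = exp(q)
p, q = fit_line(t, [math.log(v) for v in P])
a, b = math.exp(q), p

print(f"p = {p:.6f}, q = {q:.6f}, a = {a:.6f}")
for ti, Pi in zip(t, P):
    print(ti, Pi, round(a * math.exp(b * ti), 1))   # observed vs predicted
```

Printing the predictions next to the observations makes it easy to judge how well the exponential model follows the data at the later times.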

5. In 1976, Marc and Helen Bornstein studied the pace of life. To see if life becomes more hectic as the
size of the city becomes larger, they systematically observed the mean time required for pedestrians
to walk 50 feet on the main streets of their cities and towns. The table below shows some of the data
they collected.

      Location                    Population P    Mean velocity V (ft/sec)
(1)   Brno, Czechoslovakia             341,948    4.81
(2)   Prague, Czechoslovakia         1,092,759    5.88
(3)   Corte, Corsica                     5,491    3.31
(4)   Bastia, France                    49,375    4.90
(5)   Munich, Germany                1,340,000    5.62
(6)   Psychro, Crete                       365    2.76
(7)   Itea, Greece                       2,500    2.27
(8)   Iraklion, Greece                  78,200    3.85
(9)   Athens, Greece                   867,023    5.21
(10)  Safed, Israel                     14,000    3.70
(11)  Dimona, Israel                    23,700    3.27
(12)  Netanya, Israel                   70,700    4.31
(13)  Jerusalem, Israel                304,500    4.42
(14)  New Haven, USA                   138,000    4.39
(15)  Brooklyn, USA                  2,602,000    5.05

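(a) Fit the model $V = C P^{a}$ to the pace of life data using a log-log transformation.

$\ln(V) = \ln(C) + a \ln(P) = p \ln(P) + q$

Minimize

$$S(p, q) = \sum_{i} \left( \ln V_i - p \ln P_i - q \right)^2 .$$

The partial derivatives with respect to p and q are 0, which gives the system of equations (n = 15):

$$p \sum (\ln P_i)^2 + q \sum \ln P_i = \sum \ln P_i \ln V_i$$

$$p \sum \ln P_i + n q = \sum \ln V_i$$

p = 0.096013 ⇒ a = 0.096013

q = 0.33433 ⇒ C = e^q = 1.39701

The model is $V \approx 1.397\, P^{0.096}$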

(b) Plot the equation you found in part (a) superimposed on a scatter plot of the original data.

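(c) Calculate the mean absolute error

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| V_i - \hat{V}_i \right|$$

for your model, where $\hat{V}_i$ is the velocity predicted for city i. What do the results suggest about the merits of your model?

MAE1 = 0.33766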
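Parts (a) and (c) can be reproduced with a short script: regress ln(V) on ln(P), map the intercept back to C, and average the absolute errors in the original (untransformed) velocities. The helper `fit_line` and the variable names are ours; computing the MAE on V rather than on ln(V) is an assumption, chosen because it matches the reported value.

```python
import math

populations = [341948, 1092759, 5491, 49375, 1340000, 365, 2500, 78200,
               867023, 14000, 23700, 70700, 304500, 138000, 2602000]
velocities = [4.81, 5.88, 3.31, 4.90, 5.62, 2.76, 2.27, 3.85,
              5.21, 3.70, 3.27, 4.31, 4.42, 4.39, 5.05]

def fit_line(xs, ys):
    """Least-squares slope and intercept for ys ~ slope * xs + intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

log_p = [math.log(v) for v in populations]
log_v = [math.log(v) for v in velocities]

p, q = fit_line(log_p, log_v)     # ln(V) = p ln(P) + q
a, C = p, math.exp(q)             # back to V = C * P**a

predicted = [C * pop**a for pop in populations]
mae1 = sum(abs(v - vh) for v, vh in zip(velocities, predicted)) / len(velocities)

print(f"a = {a:.6f}, C = {C:.5f}, MAE1 = {mae1:.5f}")
```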

(d) Now try to fit the model V = m ln(P) + b to the data. Compare the two models.
Which is better and why?

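Minimize

$$S(m, b) = \sum_{i} \left( V_i - m \ln P_i - b \right)^2 .$$

The partial derivatives with respect to m and b are 0:

We get the system of equations (n = 15):

$$m \sum (\ln P_i)^2 + b \sum \ln P_i = \sum V_i \ln P_i$$

$$m \sum \ln P_i + n b = \sum V_i$$

with solution:

m ≈ 0.3739, b ≈ 0.041

MAE2 = 0.342628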

The mean absolute error of the first model is slightly smaller than that of the second model (0.33766 versus 0.342628), so the first model fits the data marginally better.
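The fit of part (d) needs no transformation of V: it is a straight-line regression of V on ln(P). The sketch below recomputes m, b and the mean absolute error; the helper name `fit_line` is hypothetical.

```python
import math

populations = [341948, 1092759, 5491, 49375, 1340000, 365, 2500, 78200,
               867023, 14000, 23700, 70700, 304500, 138000, 2602000]
velocities = [4.81, 5.88, 3.31, 4.90, 5.62, 2.76, 2.27, 3.85,
              5.21, 3.70, 3.27, 4.31, 4.42, 4.39, 5.05]

def fit_line(xs, ys):
    """Least-squares slope and intercept for ys ~ slope * xs + intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

log_p = [math.log(v) for v in populations]

m, b = fit_line(log_p, velocities)           # V = m ln(P) + b
predicted = [m * x + b for x in log_p]
mae2 = sum(abs(v - vh) for v, vh in zip(velocities, predicted)) / len(velocities)

print(f"m = {m:.4f}, b = {b:.4f}, MAE2 = {mae2:.6f}")
```

Because the two mean absolute errors differ only in the third decimal place, the comparison between the models is best read as "roughly equal fit" rather than a decisive win for either.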

6. In the following data, X is the Fahrenheit temperature and Y is the number of times a cricket chirps
in 1 minute. Make a scatter plot of the data and discuss the appropriateness of using a 5-degree
polynomial that passes through the data points as an empirical model. Fit a polynomial to the data
and plot the results.

X 46 51 54 57 59 61 63 66 68 72
Y 40 55 72 77 90 96 99 113 127 132

[Figure: scatter plot "Cricket" of Y against X with the linear trend line y = 3.703x - 130.97.]
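The linear trend line y = 3.703x - 130.97 reported for the cricket data can be reproduced with the closed-form least-squares formulas. A minimal sketch (variable names are ours):

```python
X = [46, 51, 54, 57, 59, 61, 63, 66, 68, 72]
Y = [40, 55, 72, 77, 90, 96, 99, 113, 127, 132]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Slope = S_xy / S_xx, intercept from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
s_xx = sum((x - mean_x) ** 2 for x in X)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"y = {slope:.3f}x {intercept:+.2f}")
```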

We can see that a linear model gives a very good approximation for the data, so a 5-degree
polynomial is not really appropriate. We can, however, fit a 5-degree polynomial to the data if we want to. We
show here two ways of doing this. The first method relies on the Lagrange form

$$P(x) = y_0 L_0(x) + y_1 L_1(x) + \dots + y_n L_n(x),$$

where

$$L_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}.$$
We choose six data points:

X 46 51 57 61 66 72
Y 40 55 77 96 113 132

Each denominator is the product $\prod_{j \ne i} (x_i - x_j)$ over the other five chosen points.

$L_0(x) = -(x^5 - 307x^4 + 37569x^3 - 2290725x^2 + 69591366x - 842657904)/429000$
$\approx -2.33 \cdot 10^{-6} x^5 + 7.16 \cdot 10^{-4} x^4 - 0.0876 x^3 + 5.34 x^2 - 162.2 x + 1964.2$

$L_1(x) = (x^5 - 302x^4 + 36289x^3 - 2168160x^2 + 64388556x - 760044384)/94500$
$\approx 1.06 \cdot 10^{-5} x^5 - 3.20 \cdot 10^{-3} x^4 + 0.384 x^3 - 22.94 x^2 + 681.4 x - 8042.8$

$L_2(x) = -(x^5 - 296x^4 + 34819x^3 - 2034216x^2 + 59014404x - 680039712)/35640$
$\approx -2.81 \cdot 10^{-5} x^5 + 8.31 \cdot 10^{-3} x^4 - 0.977 x^3 + 57.08 x^2 - 1655.8 x + 19080.8$

$L_3(x) = (x^5 - 292x^4 + 33879x^3 - 1952280x^2 + 55875636x - 635446944)/33000$
$\approx 3.03 \cdot 10^{-5} x^5 - 8.85 \cdot 10^{-3} x^4 + 1.027 x^3 - 59.16 x^2 + 1693.2 x - 19256.0$

$L_4(x) = -(x^5 - 287x^4 + 32749x^3 - 1857465x^2 + 52372026x - 587307024)/81000$
$\approx -1.23 \cdot 10^{-5} x^5 + 3.54 \cdot 10^{-3} x^4 - 0.404 x^3 + 22.93 x^2 - 646.6 x + 7250.7$

$L_5(x) = (x^5 - 281x^4 + 31459x^3 - 1753851x^2 + 48687444x - 538364772)/540540$
$\approx 1.85 \cdot 10^{-6} x^5 - 5.20 \cdot 10^{-4} x^4 + 0.0582 x^3 - 3.245 x^2 + 90.07 x - 995.98$


$P(x) = 9 \cdot 10^{-5} x^5 - 0.0253 x^4 + 2.9439 x^3 - 169.77 x^2 + 4860.4 x - 55275$

Note that the coefficients of this final result are rounded. If you try to plot the values of this polynomial
you will have a surprise: it misses the data points by a wide margin (at x = 46 it gives about 875
instead of 40). The exact polynomial is obtained by keeping the Lagrange form
$P(x) = \sum_i y_i L_i(x)$ with the $L_i$ as exact fractions.

Compare the exact coefficients to the rounded values. Although each approximation is reasonable
on its own, when used with a high-degree polynomial and large values of x (think of $50^5 \approx 3.1 \cdot 10^8$), they
can lead to totally different results.
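The effect of rounding can be seen by evaluating the interpolating polynomial in two ways: directly in the Lagrange form with exact rational arithmetic, and through the rounded coefficients quoted above. This is a sketch for illustration; the function names are ours, and `lagrange` expects an integer or `Fraction` argument.

```python
from fractions import Fraction

xs = [46, 51, 57, 61, 66, 72]
ys = [40, 55, 77, 96, 113, 132]

def lagrange(x, xs, ys):
    """Evaluate the interpolating polynomial at x (int or Fraction), exactly."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)   # factor (x - x_j)/(x_i - x_j)
        total += term
    return total

def rounded_poly(x):
    """Polynomial with the rounded coefficients quoted in the text."""
    return (9e-05 * x**5 - 0.0253 * x**4 + 2.9439 * x**3
            - 169.77 * x**2 + 4860.4 * x - 55275)

# Columns: x, data value, exact Lagrange value, rounded-coefficient value
for xi, yi in zip(xs, ys):
    print(xi, yi, float(lagrange(xi, xs, ys)), round(rounded_poly(xi), 1))
```

The exact form reproduces every chosen data point, while the rounded coefficients do not: near x = 50 each coefficient error is multiplied by powers of x as large as 50^5.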

The second method for fitting a 5-degree polynomial is to find the coefficients that minimize the
sum of squared deviations.
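One possible sketch of this second method builds the normal equations for a degree-5 polynomial on all ten data points and solves them in exact rational arithmetic. Exact fractions are our choice here, made because the sums involve powers up to x^10 and floating-point elimination would be badly conditioned; the function names are hypothetical.

```python
from fractions import Fraction

X = [46, 51, 54, 57, 59, 61, 63, 66, 68, 72]
Y = [40, 55, 72, 77, 90, 96, 99, 113, 127, 132]

def polyfit_exact(xs, ys, degree):
    """Solve the least-squares normal equations exactly.

    Returns coefficients c[0..degree] of c[0] + c[1] x + ... (lowest power first).
    """
    n = degree + 1
    # Row r of the normal equations: sum_c coef_c * sum(x^(r+c)) = sum(y * x^r)
    M = [[Fraction(sum(x ** (r + c) for x in xs)) for c in range(n)]
         + [Fraction(sum(y * x ** r for x, y in zip(xs, ys)))]
         for r in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def evaluate(coeffs, x):
    """Value of the polynomial with the given coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def rss(coeffs):
    """Residual sum of squares over the ten data points."""
    return sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(X, Y))

c5 = polyfit_exact(X, Y, 5)   # degree-5 least-squares polynomial
c1 = polyfit_exact(X, Y, 1)   # straight line, for comparison

print([float(c) for c in c5])
print("RSS linear:", float(rss(c1)), " RSS degree 5:", float(rss(c5)))
```

Comparing the two residual sums shows how much (or how little) the four extra coefficients buy over the straight line.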
