Exercise set 2
Solutions
1. Determine whether the following data support a proportionality argument for y being proportional to $z^{1/2}$:
y 3.5 5 6 7 8
z 3 6 9 12 15
y              3.5       5         6         7         8
z              3         6         9         12        15
$z^{1/2}$      1.732051  2.44949   3         3.464102  3.872983
$y/z^{1/2}$    2.020726  2.041241  2         2.020726  2.065591
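Yes. The ratio $y/z^{1/2}$ is nearly constant, so the data support $y \propto z^{1/2}$ with a proportionality factor of approximately $k = 2.03$. This can also be checked visually by plotting $y$ and $kz^{1/2}$ together; the horizontal axis can be either $z$ or $z^{1/2}$.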
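A minimal Python sketch of the ratio check above: it recomputes $y/z^{1/2}$ for each data point and uses the mean ratio as the proportionality factor. Taking the mean is an assumption made here for illustration; a least-squares estimate of $k$ would differ slightly.

```python
import math

# Data from Exercise 1
y = [3.5, 5, 6, 7, 8]
z = [3, 6, 9, 12, 15]

# If y is proportional to z^(1/2), the ratio y / z^(1/2) should be roughly constant
ratios = [yi / math.sqrt(zi) for yi, zi in zip(y, z)]

# Illustrative choice: use the mean ratio as the proportionality factor k
k = sum(ratios) / len(ratios)

for yi, zi, r in zip(y, z, ratios):
    print(f"z = {zi:2d}  y = {yi:4.1f}  y/sqrt(z) = {r:.6f}  k*sqrt(z) = {k * math.sqrt(zi):.3f}")
print(f"k = {k:.4f}")
```

The last column lets you compare $y$ with $kz^{1/2}$ directly, which is the same comparison the suggested plot makes visually.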
2. Derive the equations that minimize the sum of the squared deviations between a set of data points and the quadratic model $y = c_1x^2 + c_2x + c_3$. Use the equations to find estimates of $c_1$, $c_2$ and $c_3$ for the following set of data:
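We need to minimize
$$S = \sum_{i=1}^{m} d_i^2 = \sum_{i=1}^{m} \left[y_i - \left(c_1x_i^2 + c_2x_i + c_3\right)\right]^2,$$
where $m$ is the number of data points.
The partial derivatives with respect to each of the constants should be equal to 0:
$$\frac{\partial S}{\partial c_1} = -2\sum \left(y_i - c_1x_i^2 - c_2x_i - c_3\right)x_i^2 = 0 \;\Rightarrow\; c_1\sum x_i^4 + c_2\sum x_i^3 + c_3\sum x_i^2 = \sum x_i^2y_i$$
$$\frac{\partial S}{\partial c_2} = -2\sum \left(y_i - c_1x_i^2 - c_2x_i - c_3\right)x_i = 0 \;\Rightarrow\; c_1\sum x_i^3 + c_2\sum x_i^2 + c_3\sum x_i = \sum x_iy_i$$
$$\frac{\partial S}{\partial c_3} = -2\sum \left(y_i - c_1x_i^2 - c_2x_i - c_3\right) = 0 \;\Rightarrow\; c_1\sum x_i^2 + c_2\sum x_i + mc_3 = \sum y_i$$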
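The three normal equations above form a 3×3 linear system in $c_1$, $c_2$, $c_3$. The sketch below, assuming NumPy is available, builds that system from power sums and solves it. The helper name `quadratic_normal_equations` is illustrative, and the data are hypothetical (generated exactly from $y = 2x^2 - 3x + 1$, so the expected answer is known); they are not the exercise's table.

```python
import numpy as np

def quadratic_normal_equations(x, y):
    """Build and solve the normal equations for y = c1*x^2 + c2*x + c3."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    # Power sums that appear as coefficients of the normal equations
    s1, s2, s3, s4 = (np.sum(x**p) for p in (1, 2, 3, 4))
    A = np.array([[s4, s3, s2],
                  [s3, s2, s1],
                  [s2, s1, m]])
    rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])
    return np.linalg.solve(A, rhs)  # [c1, c2, c3]

# Hypothetical data generated from y = 2x^2 - 3x + 1
x = [0, 1, 2, 3, 4]
y = [1, 0, 3, 10, 21]
c1, c2, c3 = quadratic_normal_equations(x, y)
print(c1, c2, c3)
```

Because the data lie exactly on a parabola, the solution should recover the generating coefficients up to rounding; with real data the same system gives the least-squares estimates.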
We get the system of equations:
3.
a. In the following data, W represents the weight of a fish and l represents its length. Fit the model $W = kl^3$ to the data using the least-squares criterion.
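We have to minimize
$$S = \sum_{i=1}^{m} \left(W_i - kl_i^3\right)^2.$$
$$\frac{dS}{dk} = -2\sum \left(W_i - kl_i^3\right)l_i^3 = 0 \;\Rightarrow\; k = \frac{\sum W_il_i^3}{\sum l_i^6}$$
We obtain $k = 0.0084$.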
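Setting the derivative to zero gives a closed form for any one-parameter model $W = kf$ with no intercept: $k = \sum W_if_i / \sum f_i^2$. A minimal sketch with a hypothetical helper `fit_k` and made-up data that satisfy $W = 2l^3$ exactly (the fish data are not used here):

```python
def fit_k(W, f):
    """Least-squares estimate of k in the model W = k * f (no intercept)."""
    numerator = sum(wi * fi for wi, fi in zip(W, f))  # sum of W_i * f_i
    denominator = sum(fi * fi for fi in f)            # sum of f_i^2
    return numerator / denominator

# Hypothetical lengths and weights, chosen so that W = 2 * l^3 exactly
l = [1.0, 2.0, 3.0]
W = [2.0, 16.0, 54.0]

# Model (a): the regressor is f_i = l_i^3, so k = sum(W l^3) / sum(l^6)
k_a = fit_k(W, [li**3 for li in l])
print(k_a)
```

The same helper covers part (b) by passing $f_i = l_ig_i^2$ as the regressor.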
b. In the following data, g represents the girth of a fish. Fit the model $W = klg^2$ to the data using the least-squares criterion.
We minimize
$$S = \sum_{i=1}^{m} \left(W_i - kl_ig_i^2\right)^2.$$
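The derivative of the sum of squared deviations with respect to $k$ should be 0:
$$\frac{d}{dk}\sum \left(W_i - kl_ig_i^2\right)^2 = -2\sum \left(W_i - kl_ig_i^2\right)l_ig_i^2 = 0 \;\Rightarrow\; k = \frac{\sum W_il_ig_i^2}{\sum l_i^2g_i^4}$$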
We obtain k = 0.018675
c. Which of the two models fits the data better? Justify. Which model do you prefer? Why?
The two models give the following predictions:
The quality of each model is measured by an error expressed as a percentage, where smaller values indicate a better fit:
quality($kl^3$) = 4.565%
quality($klg^2$) = 5.554%
The first model, $W = kl^3$, fits the data better (smaller error), but the second model, $W = klg^2$, has the advantage of also accounting for the girth of the fish.
[Figure: observed weights (Data) and the predictions of the models $kl^3$ and $klg^2$, plotted against observation number]
4. Linearize the model $P = ae^{bt}$ and then fit it to the data below.
t 7 14 21 28 35 42
P 8 41 133 250 280 297
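Taking logarithms linearizes the model: $\ln P = \ln a + bt$. Writing $p = b$ and $q = \ln a$, we fit the line $\ln P = pt + q$ by minimizing
$$S = \sum_{i=1}^{m} \left(\ln P_i - pt_i - q\right)^2.$$
The partial derivatives with respect to $p$ and $q$ are 0:
$$\frac{\partial S}{\partial p} = -2\sum \left(\ln P_i - pt_i - q\right)t_i = 0 \;\Rightarrow\; p\sum t_i^2 + q\sum t_i = \sum t_i\ln P_i$$
$$\frac{\partial S}{\partial q} = -2\sum \left(\ln P_i - pt_i - q\right) = 0 \;\Rightarrow\; p\sum t_i + mq = \sum \ln P_i$$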
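The two normal equations form a 2×2 linear system in $p$ and $q$. A minimal sketch that solves it with Cramer's rule for the data in the table, then transforms back with $a = e^q$ and $b = p$:

```python
import math

t = [7, 14, 21, 28, 35, 42]
P = [8, 41, 133, 250, 280, 297]

# Linearize: Y = ln P = p*t + q, with p = b and q = ln a
Y = [math.log(Pi) for Pi in P]
m = len(t)
sum_t = sum(t)
sum_tt = sum(ti * ti for ti in t)
sum_Y = sum(Y)
sum_tY = sum(ti * yi for ti, yi in zip(t, Y))

# Normal equations:
#   p*sum_tt + q*sum_t = sum_tY
#   p*sum_t  + q*m     = sum_Y
det = sum_tt * m - sum_t * sum_t
p = (sum_tY * m - sum_t * sum_Y) / det
q = (sum_tt * sum_Y - sum_t * sum_tY) / det

# Back-transform to the original model P = a * exp(b*t)
a, b = math.exp(q), p
predictions = [a * math.exp(b * ti) for ti in t]
print(f"a = {a:.4f}, b = {b:.4f}")
for ti, Pi, pred in zip(t, P, predictions):
    print(f"t = {ti:2d}  P = {Pi:3d}  prediction = {pred:7.1f}")
```

Fitting the line to $\ln P$ minimizes the squared deviations of $\ln P$, not of $P$ itself, so the result generally differs from a direct nonlinear least-squares fit of $P = ae^{bt}$.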
[Figure: observed P and the model prediction plotted against t]
5. In 1976, Marc and Helen Bornstein studied the pace of life. To see if life becomes more hectic as the
size of the city becomes larger, they systematically observed the mean time required for pedestrians
to walk 50 feet on the main streets of their cities and towns. The table below shows some of the data
they collected.
(12) Netanya, Israel 70,700 4.31
(13) Jerusalem, Israel 304,500 4.42
(14) New Haven, USA 138,000 4.39
(15) Brooklyn, USA 2,602,000 5.05
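(a) Fit the model $V = CP^\alpha$ to the pace of life data using a log-log transformation.
Taking logarithms gives the linear model $\ln V = \ln C + \alpha\ln P$, so we minimize
$$S = \sum_{i} \left(\ln V_i - \ln C - \alpha\ln P_i\right)^2.$$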
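A minimal sketch of the log-log fit, assuming NumPy: `numpy.polyfit` with degree 1 returns the slope and intercept of the least-squares line, which here are $\alpha$ and $\ln C$. Only rows (12) to (15) of the table appear in this section, so the sketch uses just those four rows; its output will not match a fit to the full data set. The helper name `fit_power_law` is illustrative.

```python
import numpy as np

def fit_power_law(P, V):
    """Fit V = C * P**alpha by least squares on ln V = ln C + alpha * ln P."""
    # polyfit returns coefficients highest degree first: slope, then intercept
    alpha, lnC = np.polyfit(np.log(P), np.log(V), 1)
    return np.exp(lnC), alpha

# Rows (12)-(15) only; the full exercise uses every row of the table
P = np.array([70_700, 304_500, 138_000, 2_602_000], dtype=float)
V = np.array([4.31, 4.42, 4.39, 5.05])
C, alpha = fit_power_law(P, V)
print(f"V = {C:.4f} * P^{alpha:.4f}")
```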
(b) Plot the equation you found in part (a) superimposed on a scatter plot of the original data.
for your model. What do the results suggest about the merits of your model?
$\mathrm{MAE}_1 = 0.33766$ (mean absolute error of the model $V = CP^\alpha$)
(d) Now try to fit the model V = m ln(P) + b to the data. Compare the two models.
Which is better and why?
Minimize
$$S = \sum_{i} \left(V_i - m\ln P_i - b\right)^2.$$
The mean absolute error of the first model ($V = CP^\alpha$) is smaller than that of the second model ($V = m\ln P + b$), so the first model fits the data better.
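The comparison can be sketched the same way, again assuming NumPy and using only the four rows shown above: fit both models, compute each model's mean absolute error on the original scale of $V$, and compare. The helper names `mae` and `compare_models` are illustrative.

```python
import numpy as np

def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return float(np.mean(np.abs(np.asarray(observed) - np.asarray(predicted))))

def compare_models(P, V):
    """Return (MAE of V = C P^alpha, MAE of V = m ln P + b)."""
    P = np.asarray(P, dtype=float)
    V = np.asarray(V, dtype=float)
    lnP = np.log(P)
    # Model 1: power law fitted on the log-log scale
    alpha, lnC = np.polyfit(lnP, np.log(V), 1)
    V1 = np.exp(lnC) * P**alpha
    # Model 2: V linear in ln P, fitted by ordinary least squares
    m, b = np.polyfit(lnP, V, 1)
    V2 = m * lnP + b
    return mae(V, V1), mae(V, V2)

# Rows (12)-(15) only
P = [70_700, 304_500, 138_000, 2_602_000]
V = [4.31, 4.42, 4.39, 5.05]
mae1, mae2 = compare_models(P, V)
print(f"MAE (power law) = {mae1:.4f}, MAE (log-linear) = {mae2:.4f}")
```

Both errors are measured on $V$ itself, so the two models are compared on the same scale even though model 1 was fitted in log space.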
6. In the following data, X is the Fahrenheit temperature and Y is the number of times a cricket chirps
in 1 minute. Make a scatter plot of the data and discuss the appropriateness of using a 5-degree
polynomial that passes through the data points as an empirical model. Fit a polynomial to the data
and plot the results.
X 46 51 54 57 59 61 63 66 68 72
Y 40 55 72 77 90 96 99 113 127 132
[Figure: scatter plot of the cricket data (Y against X) with the linear trend line $y = 3.703x - 130.97$]
We can see that a linear model approximates the data very well, which makes a 5-degree polynomial unnecessary and not well suited as an empirical model. We can, however, fit a 5-degree polynomial to the data if we want to. We show two ways of doing this here. The first method relies on the Lagrange form.
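$$P(x) = \sum_{i=1}^{6} Y_iL_i(x),$$
where
$$L_i(x) = \prod_{j \ne i} \frac{x - X_j}{X_i - X_j},$$
and $(X_i, Y_i)$ are the following six data points: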
X 46 51 57 61 66 72
Y 40 55 77 96 113 132
$$P(x) = 9\times10^{-5}x^5 - 0.0253x^4 + 2.9439x^3 - 169.77x^2 + 4860.4x - 55275$$
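Note that these coefficients are rounded. Because $x^5$ is of the order of $10^8$ to $10^9$ over the range of the data and the leading coefficient is given to only one significant figure, plotting this polynomial produces values far from the data points. The exact polynomial obtained via Lagrange interpolation is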
Compare the coefficients of this polynomial with the rounded values. Although each rounded value looks reasonable, in a high-degree polynomial evaluated at large values of x (think of $50^5 \approx 3\times10^8$), small rounding errors can lead to completely different results.
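To see the effect of rounding, the exact interpolating polynomial can be computed with rational arithmetic. A minimal sketch using Python's `fractions` module: it expands the Lagrange form for the six chosen points into exact coefficients, then evaluates both the exact polynomial and the rounded one at those points. The helper names are illustrative.

```python
from fractions import Fraction

# The six interpolation points chosen above
X = [46, 51, 57, 61, 66, 72]
Y = [40, 55, 77, 96, 113, 132]

def mul_by_linear(coeffs, root):
    """Multiply a polynomial (coefficients lowest degree first) by (x - root)."""
    out = [Fraction(0)] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        out[i + 1] += c     # c * x^i * x
        out[i] -= root * c  # c * x^i * (-root)
    return out

def lagrange_coefficients(xs, ys):
    """Exact coefficients (lowest degree first) of the interpolating polynomial."""
    n = len(xs)
    total = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]  # numerator of L_i(x), built as a product
        denom = Fraction(1)    # denominator of L_i(x)
        for j in range(n):
            if j != i:
                basis = mul_by_linear(basis, xs[j])
                denom *= xs[i] - xs[j]
        for k in range(n):
            total[k] += ys[i] * basis[k] / denom
    return total

def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest degree first) at x with Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

coeffs = lagrange_coefficients(X, Y)
for power in range(5, -1, -1):
    print(f"x^{power}: {float(coeffs[power]):.6g}")

# Rounded coefficients of P(x) above, lowest degree first
rounded = [-55275, 4860.4, -169.77, 2.9439, -0.0253, 9e-5]
for x, y in zip(X, Y):
    print(f"x = {x}: data {y}, exact {float(evaluate(coeffs, x)):.6g}, "
          f"rounded {evaluate(rounded, x):.6g}")
```

Exact fractions guarantee that the computed polynomial reproduces the six data values exactly, so any mismatch in the last column comes from the rounding of the coefficients alone.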
The second method for fitting a 5-degree polynomial is to find the coefficients that minimize the
sum of squared deviations.
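A minimal sketch of this second method, assuming NumPy: `numpy.polynomial.Polynomial.fit` performs the least-squares fit on a rescaled domain, which avoids the ill-conditioning caused by raising values near 70 to the fifth power. It fits both a line and a 5-degree polynomial to all ten data points and compares their sums of squared deviations.

```python
import numpy as np
from numpy.polynomial import Polynomial

X = np.array([46, 51, 54, 57, 59, 61, 63, 66, 68, 72], dtype=float)
Y = np.array([40, 55, 72, 77, 90, 96, 99, 113, 127, 132], dtype=float)

def sse(poly, x, y):
    """Sum of squared deviations between the data and a fitted polynomial."""
    return float(np.sum((y - poly(x)) ** 2))

# Least-squares fits on all ten points
linear = Polynomial.fit(X, Y, 1)
quintic = Polynomial.fit(X, Y, 5)

# convert() expresses the coefficients in terms of x itself, lowest degree first
intercept, slope = linear.convert().coef
print(f"linear:  y = {slope:.4f}x + {intercept:.4f}, SSE = {sse(linear, X, Y):.2f}")
print("quintic coefficients (lowest degree first):", quintic.convert().coef)
print(f"quintic: SSE = {sse(quintic, X, Y):.2f}")
```

The 5-degree fit always has a sum of squared deviations no larger than the line's, since a line is a special case of a 5-degree polynomial; a smaller error alone does not make it a better empirical model.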