
Kristin Fridgeirsdottir Data Analytics for Leaders

Break-Out Session 3a

Non-Linear Regression Analysis

In some cases a linear model is not suitable for modelling the relationship between two
variables. Let us have a look at another example: General Public Electric (GPE). GPE
operates 11 thermal power stations of essentially the same design. We will investigate the
relationship between the cost efficiency (pence per kilowatt-hour) of these electricity
generating plants and their generating capacity (megawatts installed). The object
of the exercise is to model the “economy of scale” effect which allows larger plants to
generate electricity at lower marginal cost per unit. In practice, this analysis might be part of a
larger exercise in which “economy of scale” would be one of a variety of factors which would
be taken into account in deciding between alternative development plans. A more accurate
understanding of the relative efficiency of different size plants would make it easier to
balance this factor against capital investment costs, environmental factors, construction time,
etc. The data can be found in the Excel file “GPE.xlsx”.

The spreadsheet, scatter plot and regression results are shown below.

      A       B          C
 1    General Public Electric
 2
 3    Plant   Capacity   Cost per Unit
 4    1       525        1.80
 5    2       555        1.70
 6    3       600        1.60
 7    4       610        1.58
 8    5       700        1.35
 9    6       990        1.20
10    7       1100       1.13
11    8       1450       0.95
12    9       1950       0.85
13    10      1950       0.84
14    11      2400       0.75

[Figure: scatter plot of Cost per Unit (vertical axis, 0.00 to 2.00) against Capacity (horizontal axis, 0 to 3,000)]

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.94
R Square 0.88
Adjusted R Square 0.86
Standard Error 0.14
Observations 11

ANOVA
              df    SS      MS      F       Significance F
Regression    1     1.26    1.26    64.05   0.00
Residual      9     0.18    0.02
Total         10    1.43

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    1.87           0.09             21.26    0.00      1.67        2.06
Capacity     -0.00053       0.00007          -8.00    0.00      -0.00068    -0.00038

RESIDUAL OUTPUT

Observation   Predicted Cost per Unit   Residuals   Standard Residuals
1             1.59                      0.21        1.59
2             1.57                      0.13        0.96
3             1.55                      0.05        0.38
4             1.54                      0.04        0.27
5             1.50                      -0.15       -1.10
6             1.34                      -0.14       -1.08
7             1.29                      -0.16       -1.17
8             1.10                      -0.15       -1.13
9             0.84                      0.01        0.10
10            0.84                      0.00        0.03
11            0.60                      0.15        1.14

The model, namely

Cost = 1.87 – 0.00053 × (Capacity) + e

seems reasonable because of the large absolute t-statistic (-8.00) of the Capacity variable and the high
Adjusted R² (0.86). These imply that there is a significant relationship between capacity and cost,
and that we can explain much of the variability in costs purely by examining the
capacity. The result is also logical in the sense that we do observe an economies-of-scale
effect: cost decreases as capacity increases. For every additional megawatt of generating capacity, the
unit cost decreases by 0.00053 pence per kilowatt-hour.
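
For readers who prefer to check the fit outside Excel, the following is a minimal sketch in Python that recomputes the least-squares line from the spreadsheet data. The helper name fit_line is hypothetical (it is not part of the handout), and the sketch assumes the capacity and cost values exactly as listed in the table above. It should reproduce the intercept, slope and R Square of the summary output up to rounding.

```python
# Capacity (MW) and cost per unit (pence per kWh) from the GPE spreadsheet.
capacity = [525, 555, 600, 610, 700, 990, 1100, 1450, 1950, 1950, 2400]
cost = [1.80, 1.70, 1.60, 1.58, 1.35, 1.20, 1.13, 0.95, 0.85, 0.84, 0.75]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (hypothetical helper).

    Returns the intercept a, the slope b and R squared.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx
    a = mean_y - b * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


intercept, slope, r_squared = fit_line(capacity, cost)
print(f"Cost = {intercept:.2f} + ({slope:.5f}) x Capacity, R^2 = {r_squared:.2f}")
```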

However, if we examine the line fit plot and the residual plot (see below), we observe the
following:
• the line does not fit the data perfectly: it slightly underestimates the cost for small
capacity values, overestimates it for medium capacity values and again underestimates
it for high capacity values;
• the residual plot clearly exhibits a pattern: the errors are positive for small capacity
values, negative for medium capacity values and again positive for high capacity
values.
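
The sign pattern described in the two points above can also be checked numerically. The sketch below is only an illustration: it assumes the rounded coefficients 1.87 and -0.00053 from the summary output, so the residuals may differ from the Excel residual output in the last digit.

```python
# Capacity (MW) and cost per unit (pence per kWh) from the GPE spreadsheet.
capacity = [525, 555, 600, 610, 700, 990, 1100, 1450, 1950, 1950, 2400]
cost = [1.80, 1.70, 1.60, 1.58, 1.35, 1.20, 1.13, 0.95, 0.85, 0.84, 0.75]

# Rounded coefficients of the linear model (an assumption: the unrounded
# Excel values would give slightly different residuals).
intercept, slope = 1.87, -0.00053

# Residual = actual cost minus the cost predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(capacity, cost)]

for x, r in zip(capacity, residuals):
    print(f"{x:>5}  {r:+.2f}")
```

Read from top to bottom, the printed residuals start positive, turn negative for the medium capacities and become positive again for the largest plant, which is the pattern a straight line produces when the underlying relationship is curved.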

[Figure: Capacity Line Fit Plot. Cost per Unit and Predicted Cost per Unit (vertical axis) against Capacity (horizontal axis)]

[Figure: Capacity Residual Plot. Residuals (vertical axis, -0.2 to 0.25) against Capacity (horizontal axis)]

This indicates that the model is misspecified, and more specifically that we have been trying to
fit a line to data which exhibit a non-linear relationship. We should therefore look for a more
suitable specification of the model.

Let us try to regress cost on the reciprocal of the plant capacity, i.e.

Cost = a + b × (1/Capacity) + e

We therefore have to transform the capacity data. The proposed relationship (Cost as a
function of 1/Capacity) resembles the shape of the curve we observe in the scatter plot.
Sometimes, however, several candidate transformations exist.

In order to transform the capacity data in our model, we add another column. In cell D3, we
enter the title “1/Capacity”. In cell D4, we enter the formula “=1/B4”. The value 0.0019
should appear (= 1/525). We select cell D4 and drag the handle down to fill the entire column
with the transformed capacity data. Now, we can run another regression analysis, using the
new column as the explanatory variable.
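
The same transformation and regression can be sketched in Python. This is an illustration rather than part of the Excel exercise: the list inv_capacity plays the role of the new column D, and the variable names are assumptions introduced here.

```python
# Capacity (MW) and cost per unit (pence per kWh) from the GPE spreadsheet.
capacity = [525, 555, 600, 610, 700, 990, 1100, 1450, 1950, 1950, 2400]
cost = [1.80, 1.70, 1.60, 1.58, 1.35, 1.20, 1.13, 0.95, 0.85, 0.84, 0.75]

# Equivalent of filling column D with "=1/B4", "=1/B5", ...
inv_capacity = [1 / c for c in capacity]
print(f"First transformed value: {inv_capacity[0]:.4f}")

# Least-squares fit of Cost = a + b * (1/Capacity).
n = len(cost)
mean_u = sum(inv_capacity) / n
mean_y = sum(cost) / n
suu = sum((u - mean_u) ** 2 for u in inv_capacity)
suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(inv_capacity, cost))
b = suy / suu
a = mean_y - b * mean_u
print(f"Cost = {a:.2f} + {b:.2f} x (1/Capacity)")
```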

We obtain the following results (see the output on the following pages):

• R² increased from 0.88 to 0.99, indicating a near-perfect fit;
• the standard error fell from 0.14 to 0.04;
• the residual plot reveals no obvious pattern;
• the line fit plot indicates a near-perfect fit of line and data.
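
To put the two specifications side by side, the sketch below computes R² and the standard error of the regression for both explanatory variables. The helper fit_stats is hypothetical; the standard error is computed as the square root of the residual sum of squares divided by n - 2, which is what Excel reports as "Standard Error" for a regression with a single explanatory variable.

```python
from math import sqrt

# Capacity (MW) and cost per unit (pence per kWh) from the GPE spreadsheet.
capacity = [525, 555, 600, 610, 700, 990, 1100, 1450, 1950, 1950, 2400]
cost = [1.80, 1.70, 1.60, 1.58, 1.35, 1.20, 1.13, 0.95, 0.85, 0.84, 0.75]


def fit_stats(x, y):
    """Return (R squared, standard error) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sse = syy - sxy ** 2 / sxx        # residual sum of squares
    return 1 - sse / syy, sqrt(sse / (n - 2))


r2_linear, se_linear = fit_stats(capacity, cost)
r2_recip, se_recip = fit_stats([1 / c for c in capacity], cost)

print(f"Capacity:    R^2 = {r2_linear:.2f}, standard error = {se_linear:.2f}")
print(f"1/Capacity:  R^2 = {r2_recip:.2f}, standard error = {se_recip:.2f}")
```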

Below, you will also find a plot of the estimated costs as a function of Capacity, revealing the
non-linear nature of the estimated relationship. In order to draw such a graph, add another
column in your spreadsheet with the cost predictions, computed using the regression
coefficients and the explanatory variables data. Then draw a scatter plot of the predictions as
well as the actual electricity cost data versus capacity. Again, the predictions will be displayed
as points rather than a line, but this can be changed by double-clicking the points and
selecting a different format.
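
The prediction column described above could be computed as in the following sketch. It assumes the rounded coefficients 0.50 and 664.08 from the regression on 1/Capacity, so the predictions can differ from the Excel residual output by up to about 0.01; the variable names are illustrative only.

```python
# Capacity (MW) and cost per unit (pence per kWh) from the GPE spreadsheet.
capacity = [525, 555, 600, 610, 700, 990, 1100, 1450, 1950, 1950, 2400]
cost = [1.80, 1.70, 1.60, 1.58, 1.35, 1.20, 1.13, 0.95, 0.85, 0.84, 0.75]

# Rounded coefficients of Cost = a + b * (1/Capacity) (an assumption:
# Excel uses the unrounded values).
a, b = 0.50, 664.08

# Equivalent of the extra spreadsheet column with the cost predictions.
predicted = [a + b / c for c in capacity]

print("Capacity  Actual  Predicted")
for c, y, p in zip(capacity, cost, predicted):
    print(f"{c:>8}  {y:>6.2f}  {p:>9.2f}")
```

Plotting the predicted column against capacity gives a curve that falls steeply for small plants and flattens towards the intercept for large ones, which is the non-linear shape the straight-line model could not capture.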
SUMMARY OUTPUT

Regression Statistics
Multiple R 1.00
R Square 0.99
Adjusted R Square 0.99
Standard Error 0.04
Observations 11

ANOVA
              df    SS            MS            F             Significance F
Regression    1     1.418077648   1.418077648   957.9913963   1.87988E-10
Residual      9     0.013322352   0.001480261
Total         10    1.4314

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    0.50           0.03             18.37    0.00      0.43        0.56
1/Capacity   664.08         21.46            30.95    0.00      615.55      712.62

RESIDUAL OUTPUT

Observation   Predicted Cost per Unit   Residuals   Standard Residuals
1             1.76                      0.04        1.08
2             1.69                      0.01        0.21
3             1.60                      0.00        -0.07
4             1.58                      0.00        -0.12
5             1.44                      -0.09       -2.59
6             1.17                      0.03        0.91
7             1.10                      0.03        0.83
8             0.95                      0.00        -0.10
9             0.84                      0.01        0.37
10            0.84                      0.00        0.10
11            0.77                      -0.02       -0.62

[Figure: 1/Capacity Line Fit Plot. Cost per Unit and Predicted Cost per Unit (vertical axis) against 1/Capacity (horizontal axis, 0 to 0.002)]

[Figure: 1/Capacity Residual Plot. Residuals (vertical axis, -0.12 to 0.06) against 1/Capacity (horizontal axis)]

[Figure: Capacity Line Fit Plot. Cost per Unit and Prediction (vertical axis) against Capacity (horizontal axis, 0 to 3,000)]
