
Chapter 15. Understanding Experimental Data

15.2.1 Coefficient of Determination


When we fit a curve to a set of data, we are finding a function that relates an
independent variable (inches horizontally from the launch point in this example)
to a predicted value of a dependent variable (inches above the launch point in
this example). Asking about the goodness of a fit is equivalent to asking about
the accuracy of these predictions. Recall that the fits were found by minimizing
the mean square error. This suggests that one could evaluate the goodness of a
fit by looking at the mean square error. The problem with that approach is that
while there is a lower bound for the mean square error (zero), there is no upper
bound. This means that while the mean square error is useful for comparing
the relative goodness of two fits to the same data, it is not particularly useful for
getting a sense of the absolute goodness of a fit.
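A small sketch of why the raw mean square error is hard to interpret on its own.
The data and the helper name mean_square_error are hypothetical, chosen only for
illustration. Rescaling the same measurements from inches to feet divides the
error by 144, even though the fit is no better or worse.

```python
import numpy as np

def mean_square_error(measured, predicted):
    """Mean of the squared differences between two equal-length arrays."""
    return ((predicted - measured)**2).mean()

# Hypothetical heights in inches, and predictions from some fit.
measured_in = np.array([10.0, 20.0, 30.0, 40.0])
predicted_in = np.array([12.0, 18.0, 33.0, 39.0])

# Errors are 2, -2, 3, -1, so the mean square error is 18/4 = 4.5.
mse_inches = mean_square_error(measured_in, predicted_in)

# Same data and same predictions, expressed in feet: about 0.031.
mse_feet = mean_square_error(measured_in/12.0, predicted_in/12.0)
```

Neither number says by itself whether the fit is good; each is meaningful only
relative to another fit to the same data in the same units.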
We can calculate the absolute goodness of a fit using the coefficient of
determination, often written as $R^2$.$^{97}$ Let $y_i$ be the $i^{th}$ observed
value, $p_i$ be the corresponding value predicted by the model, and $\mu$ be the
mean of the observed values.

$$R^2 = 1 - \frac{\sum_i (y_i - p_i)^2}{\sum_i (y_i - \mu)^2}$$
By comparing the estimation errors (the numerator) with the variability of the
original values (the denominator), $R^2$ is intended to capture the proportion of
variability in a data set that is accounted for by the statistical model provided by
the fit. When the model being evaluated is produced by a linear regression, the
value of $R^2$ always lies between 0 and 1. If $R^2 = 1$, the model explains all of the
variability in the data. If $R^2 = 0$, there is no relationship between the values
predicted by the model and the actual data.
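As a small worked illustration with made-up numbers (not data from this chapter),
suppose the observed values are 2, 4, and 6, so their mean is 4, and a model
predicts 3, 4, and 5. The ratio of summed squared estimation errors to summed
squared deviations from the mean then gives:

$$R^2 = 1 - \frac{(2-3)^2 + (4-4)^2 + (6-5)^2}{(2-4)^2 + (4-4)^2 + (6-4)^2} = 1 - \frac{2}{8} = 0.75$$

Under this measure, the model would account for 75% of the variability in these
three values.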
The code in Figure 15.5 provides a straightforward implementation of this
statistical measure. Its compactness stems from the expressiveness of the
operations on arrays. The expression (predicted - measured)**2 subtracts the
elements of one array from the elements of another, and then squares each
element in the result. The expression (measured - meanOfMeasured)**2
subtracts the scalar value meanOfMeasured from each element of the array
measured, and then squares each element of the result.
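The element-wise behavior of these two expressions can be checked on small
arrays. This is a sketch assuming NumPy arrays (imported here as np); the values
are invented for illustration.

```python
import numpy as np

measured = np.array([1.0, 2.0, 3.0, 6.0])
predicted = np.array([1.5, 2.0, 2.5, 6.0])

# Array minus array: subtraction pairs up elements by position,
# then **2 squares each difference.
squared_errors = (predicted - measured)**2           # [0.25, 0.0, 0.25, 0.0]

# Array minus scalar: the scalar is subtracted from every element.
mean_of_measured = measured.sum()/len(measured)      # 3.0
squared_deviations = (measured - mean_of_measured)**2  # [4.0, 1.0, 0.0, 9.0]
```

Calling .sum() on either result collapses the array to the single number that
appears in the numerator or the denominator.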
def rSquared(measured, predicted):
    """Assumes measured a one-dimensional array of measured values
               predicted a one-dimensional array of predicted values
       Returns coefficient of determination"""
    # Sum of squared differences between predictions and observations
    estimateError = ((predicted - measured)**2).sum()
    meanOfMeasured = measured.sum()/float(len(measured))
    # Sum of squared deviations of the observations from their mean
    variability = ((measured - meanOfMeasured)**2).sum()
    return 1 - estimateError/variability

Figure 15.5 Computing $R^2$
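One way to exercise this measure is sketched below. The function r_squared
restates the computation of Figure 15.5 so that the snippet runs on its own. The
data, the variable names, and the choice of numpy.polyfit to produce the fit are
assumptions made for illustration, not taken from the text.

```python
import numpy as np

def r_squared(measured, predicted):
    """Restates the computation in Figure 15.5 so this snippet runs alone."""
    estimate_error = ((predicted - measured)**2).sum()
    variability = ((measured - measured.mean())**2).sum()
    return 1 - estimate_error/variability

# Hypothetical data: roughly linear, with a little noise.
x_vals = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_vals = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Least-squares line; polyfit returns coefficients, highest degree first.
slope, intercept = np.polyfit(x_vals, y_vals, 1)
predicted = slope*x_vals + intercept
r2_linear = r_squared(y_vals, predicted)

# Two boundary cases: perfect predictions, and always predicting the mean.
r2_perfect = r_squared(y_vals, y_vals)
r2_mean_only = r_squared(y_vals, np.full_like(y_vals, y_vals.mean()))

# For a straight-line least-squares fit, R^2 equals the squared
# Pearson correlation between x and y, which gives a cross-check.
r2_from_corr = np.corrcoef(x_vals, y_vals)[0, 1]**2
```

The agreement with the squared correlation holds for a straight-line
least-squares fit with an intercept; it should not be expected for arbitrary
predictions.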

97 There are several different definitions of the coefficient of determination. The definition
supplied here is used to evaluate the quality of a fit produced by a linear regression.