
Polynomial Regression Using Excel

There are times when a best-fit line (i.e., a first-order polynomial) is not
enough. Calibration data that is obviously curved can often be fitted
satisfactorily with a second- (or higher-) order polynomial.

Excel can perform polynomial regression two ways, depending on the
information you require.

Method 1: Trendline calculation from a plot

One way to perform polynomial regression is to fit the appropriate trendline
to the data (and there are a number of options in addition to polynomials).

Suppose the data below fit a third-order (third-degree) polynomial. You want
to determine the equation and plot the data. Use the following steps:

x y
1 7
2 24
3 71
4 160

1) Graph the data using an xy (scatter) plot (do not connect the dots).

2) Select the data points on the graph and select "Add trendline".

3) Pick "Polynomial" for the type of trendline and set the order to 3.

4) Under Options, add checks to display the equation and, if you wish, the
rather uninformative R-squared value (a measure of closeness of fit of the
data to the curve).

5) Use linear scales for the x-axis and y-axis.

6) Extra significant figures can be added to the coefficients in your equation
by selecting the equation on the graph and using the Number function to
increase precision (number of decimal places).
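
As a cross-check outside Excel, the same third-order fit can be reproduced in a few lines. This is only a minimal sketch, assuming Python with NumPy is available (neither is part of this handout). `np.polyfit` returns the coefficients ordered from the highest power down. Because four points determine a cubic exactly, the fitted curve passes through every point and the residuals are zero to within rounding.

```python
import numpy as np

# Data from the table above.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([7.0, 24.0, 71.0, 160.0])

# Least-squares fit of a third-order polynomial.
# Coefficients come back ordered from x^3 down to the constant term.
coeffs = np.polyfit(x, y, deg=3)

# Fitted y-values and residuals (observed minus fitted).
fitted = np.polyval(coeffs, x)
residuals = y - fitted

print("coefficients (x^3, x^2, x, constant):", np.round(coeffs, 6))
print("largest |residual|:", np.max(np.abs(residuals)))
```

For these data the coefficients work out, to within floating-point rounding, to 2, 3, -6 and 8, i.e. y = 2x^3 + 3x^2 - 6x + 8.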

The resultant plot:



Method 2: Using the Regression Function in the Data Analysis ToolPak

While Method 1 is useful in providing basic information about a polynomial
curve, some information is missing (such as the standard error in the
estimates or the standard deviation in the residuals*). It is possible to apply
the Data Analysis ToolPak add-in to obtain this information.

Suppose you want to analyze the Pb concentration in tap water using
graphite furnace AAS. The following data (next page) were collected.
Assuming the data fit a second-order polynomial, report the
concentration of lead in the tap water and its uncertainty.

* The residual (or error) represents unexplained variation after fitting a regression
model. It is the difference between the observed value of the variable and the value
suggested by the regression model.
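[Figure (resultant plot for Method 1): "Polynomial Regression: y versus x".
xy scatter plot of the four data points with the third-order trendline and
its displayed equation, y = 2x^3 + 3x^2 - 6x + 8. x-axis: 0 to 4.5;
y-axis: 0 to 180.]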


1) First the data must be entered into an Excel worksheet.

2) Since you want to generate a second-order polynomial, you must then
insert another column after the Pb concentration column whose cells
contain the squared concentration values. (For a third-order
polynomial, you would add a further column containing the cubes
of the concentration values, and so on.)

3) Select Regression from the Data Analysis ToolPak add-in. When choosing
the X range, highlight the block that contains both the concentrations and
their squared values. (You would highlight additional x^n columns for
higher-order polynomials.) If you include headings in your selection, be
sure to check the Labels box. Also select the Confidence Level (95%)
and the Residuals boxes.

4) The data, a Method 1-generated calibration curve, and the output of the
Regression add-in are posted on the course website (separate tabs in
the Method 2 Output Excel file). There is a lot of information that is
extraneous. Here are the four things researchers most commonly look
for; the last two will be of most interest to us:

R^2 and Adjusted R^2: the R^2 value gives the fraction of the variance in y
that is explained by the x variable(s). Adjusted R^2 corrects for the
number of terms in the model, so it is more conservative and therefore
more likely to be quoted. The higher these values (closer to 1), the
better the data fit the curve.

Regression Significance F: This is the probability of obtaining a fit at
least this good purely by chance if there were no real correlation between
the independent and dependent variables. The smaller the value, the
stronger the evidence that the fit reflects a real relationship.

Coefficients for the intercept, x, x^2, etc. (or their labels, if you had
used them) and their standard errors: These are the actual
coefficients for the equation of your polynomial curve. Each coefficient
is listed with a P-value; as with Significance F, the smaller the P-value,
the stronger the evidence that the term has not arisen by chance.

Residuals: These were defined qualitatively in the footnote above. They
should show no pattern and should be scattered randomly around 0. You
can do a quick scatter plot of these data to see if this is the case. If
the residuals follow a pattern, then there is another factor that is
affecting the correlation between your independent and dependent
variables, possibly a systematic error in the experiment or a physical
phenomenon whose influence on the data has not been taken into
consideration.

5) As noted above, you now have the equation of your curve and you can
use it to calculate the Pb concentration (and its uncertainty, using the
standard errors given for your error propagation) in the tap water
sample.
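
The regression in steps 1) to 4) can also be sketched outside Excel. The following is a minimal illustration, assuming Python with NumPy, and it uses a HYPOTHETICAL calibration data set invented for the example (it is not the handout's data, so the numbers will not match the posted output). It builds the squared-concentration column, fits the second-order polynomial by ordinary least squares, and computes the quantities discussed above: coefficients with their standard errors, R^2, adjusted R^2, and the residuals.

```python
import numpy as np

# HYPOTHETICAL calibration data (not the handout's data set):
# Pb standards in ppb and their measured absorbances.
conc = np.array([0.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0])
absorbance = np.array([0.002, 0.051, 0.098, 0.185, 0.262, 0.331, 0.390])

# Step 2: add a column of squared concentrations.
# Design matrix columns: intercept, x, x^2.
X = np.column_stack([np.ones_like(conc), conc, conc**2])

# Step 3: ordinary least-squares fit.
coef, _, _, _ = np.linalg.lstsq(X, absorbance, rcond=None)

# Residuals = observed minus fitted.
fitted = X @ coef
residuals = absorbance - fitted

n, p = X.shape            # number of observations, number of fitted parameters
dof = n - p               # residual degrees of freedom
ss_res = float(residuals @ residuals)
ss_tot = float(np.sum((absorbance - absorbance.mean()) ** 2))

# Standard deviation of the residuals (standard error of the regression).
s_y = np.sqrt(ss_res / dof)

# Covariance matrix of the coefficients and their standard errors.
cov = s_y**2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov))

r2 = 1.0 - ss_res / ss_tot
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof

for name, c, se in zip(["intercept", "x", "x^2"], coef, std_err):
    print(f"{name:9s} {c: .6g} +/- {se:.2g}")
print(f"R^2 = {r2:.5f}, adjusted R^2 = {adj_r2:.5f}, s_y = {s_y:.3g}")
print("residuals:", np.round(residuals, 4))
```

Plotting `residuals` against `conc` gives the quick scatter plot suggested for checking that the residuals show no pattern.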

Answer: Pb concentration is 44.3 ± 1.0 ppb, or 44 ± 1 ppb.
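
The kind of calculation behind such a result can be sketched as follows. All numbers here are HYPOTHETICAL placeholders (coefficients, standard errors, sample reading and calibrated range are invented for illustration, so the output will not reproduce the answer above). The sketch solves the second-order calibration equation for concentration and applies first-order error propagation using only the standard errors. It treats the coefficients as independent, which is a simplifying assumption: fitted polynomial coefficients are generally correlated, and a more careful treatment would include their covariances.

```python
import math

# HYPOTHETICAL regression output (not the handout's values):
# absorbance = a + b*C + c*C^2, with standard errors from the coefficient table.
a, se_a = 0.0020, 0.0010      # intercept
b, se_b = 5.0e-3, 5.0e-5      # coefficient of C
c, se_c = -1.1e-5, 5.0e-7     # coefficient of C^2

# HYPOTHETICAL sample absorbance and its standard deviation.
y0, se_y0 = 0.150, 0.002

# Solve c*C^2 + b*C + (a - y0) = 0 for C with the quadratic formula.
disc = b**2 - 4.0 * c * (a - y0)
if disc < 0:
    raise ValueError("sample reading does not intersect the fitted curve")
roots = [(-b + math.sqrt(disc)) / (2.0 * c),
         (-b - math.sqrt(disc)) / (2.0 * c)]

# Keep the root inside the calibrated range (assumed here to be 0 to 100 ppb).
conc = next(r for r in roots if 0.0 <= r <= 100.0)

# Slope of the calibration curve at the solution, dy/dC = b + 2*c*C.
slope = b + 2.0 * c * conc

# First-order propagation: each term is (partial derivative * std. error)^2.
variance = ((se_a / slope) ** 2
            + (conc * se_b / slope) ** 2
            + (conc**2 * se_c / slope) ** 2
            + (se_y0 / slope) ** 2)
u_conc = math.sqrt(variance)

print(f"Pb concentration = {conc:.1f} +/- {u_conc:.1f} ppb")
```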
