
Polynomial Regression Using Excel

There are times when a best-fit line (i.e., a first-order polynomial) is not enough. Calibration data that is obviously curved can often be fitted satisfactorily with a second- (or higher-) order polynomial. Excel can perform polynomial regression two ways, depending on the information you require.

Method 1: Trendline calculation from a plot

One way to perform polynomial regression is to fit the appropriate trendline to the data (and there are a number of options in addition to polynomials). Suppose the data below fit a third-order (or third-degree) polynomial. You want to determine the equation and plot the data. Use the following steps:
x    y
1    7
2    24
3    71
4    160

1) Graph the data using an xy (scatter) plot (do not connect the dots).
2) Select the data points on the graph and select "Add trendline".
3) Pick "Polynomial" for the type of trendline and set the order to 3.
4) Under Options, add checks to display the equation and, if you wish, the rather uninformative R-squared value (a measure of closeness of fit of the data to the curve).
5) Use linear scales for the x-axis and y-axis.
6) Extra significant figures can be added to the coefficients in your equation by selecting the equation on the graph and using the Number function to increase precision (number of decimal places).

The resultant plot:

[Figure: xy scatter plot of the data (x-axis from 0 to 4.5, y-axis from 0 to 180) with the third-order polynomial trendline and its displayed equation, y = 2x^3 + 3x^2 - 6x + 8.]
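The trendline fit in Method 1 can also be checked outside Excel. The following is a minimal sketch, assuming an ordinary least-squares fit solved through the normal equations; the function names `solve_linear` and `polyfit` are illustrative only, and this is not a description of Excel's internal algorithm. It uses only the four data points given above.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the input lists are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(xs, ys, order):
    """Least-squares coefficients [c0, c1, ...] for y = c0 + c1*x + c2*x^2 + ..."""
    p = order + 1
    # Normal equations: (X^T X) c = X^T y, where X has columns 1, x, x^2, ...
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p)]
    return solve_linear(xtx, xty)


xs = [1, 2, 3, 4]
ys = [7, 24, 71, 160]
coeffs = polyfit(xs, ys, 3)  # constant term first
print(", ".join(f"{c:.4f}" for c in coeffs))  # approximately 8, -6, 3, 2
```

Because four points determine a third-order polynomial exactly, the fit passes through every point, which is one reason the R-squared value is uninformative here.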

Method 2: Using the Regression Function in the Data Analysis ToolPak

While Method 1 is useful in providing basic information about a polynomial curve, there is some information missing (such as the standard error in the estimates or the standard deviation in the residuals*). It is possible to apply the Data Analysis ToolPak add-in to obtain this information.

Suppose you want to analyze the Pb concentration in tap water using graphite furnace AAS. The following data (next page) were collected. Assuming the data fit a second-order polynomial, report the concentration of lead in the tap water and its uncertainty.

* The residual (or error) represents unexplained variation after fitting a regression model. It is the difference between the observed value of the variable and the value suggested by the regression model.
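In symbols, the definition above can be written as follows. The notation is an assumption introduced here for illustration, not taken from the handout: y_i is the i-th observed value, \hat{y}_i is the value predicted by the fitted polynomial, n is the number of data points, and m is the order of the polynomial (so m + 1 coefficients are fitted).

```latex
e_i = y_i - \hat{y}_i ,
\qquad
s_e = \sqrt{\frac{\sum_{i=1}^{n} e_i^{2}}{n - (m + 1)}}
```

Here s_e is the standard deviation of the residuals, with n - (m + 1) degrees of freedom.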

1) First, the data must be entered into an Excel worksheet.
2) Since you want to generate a second-order polynomial, you must then insert another column after the Pb concentration column whose cells contain the squared concentration values. (If you had wanted a third-order polynomial, you would insert a further column containing the cube of the concentration values, and so on.)

3) From the Data Analysis ToolPak add-in, select Regression. When choosing the X range, highlight the block that contains both the concentrations and their squared values. (You would highlight additional x^n columns for higher-order polynomials.) If you include headings in your selection, be sure to check the Labels box. Also select the Confidence Level (95%) and the Residuals boxes.
4) The data, a Method 1-generated calibration curve, and the output of the Regression add-in are posted on the course website (separate tabs in the Method 2 Output Excel file). Much of the output is extraneous. Here are the four things researchers most commonly look for; the last two will be of most interest to us:
   - R^2 and Adjusted R^2: The R^2 value gives the fraction of the variance in y that is explained by the x value(s). Adjusted R^2 is more conservative, because it penalizes additional terms, and is therefore more likely to be quoted. The higher these values (closer to 1), the better the data fit the curve.
   - Regression Significance F: This is the P-value for the regression as a whole, that is, the probability of obtaining a fit at least this good by chance if there were no real relationship between the independent and dependent variables. The smaller the value, the less likely it is that the results have arisen by chance.
   - Coefficients for the intercept, x, x^2, etc. (or their labels, if you had used them) and their standard errors: These are the actual coefficients for the equation of your polynomial curve. Each coefficient is also reported with a P-value; as with Significance F, the smaller the P-value, the less likely it is that the result has arisen by chance.
   - Residuals: These were defined in the footnote above. You want to see no pattern in these values, and they should be distributed around 0. You can do a quick scatter plot of these data to see if this is the case. If the residuals follow a pattern, then there is another factor that is affecting the correlation between your independent and dependent variables, possibly a systematic error in the experiment or a physical phenomenon whose influence on the data has not been taken into consideration.

5) As noted above, you now have the equation of your curve, and you can use it to calculate the Pb concentration in the tap water sample (and its uncertainty, using the standard errors given for your error propagation).

Answer: Pb concentration is 44.3 ± 1.0 ppb, or 44 ± 1 ppb.
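The quantities discussed in steps 2 to 5 can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the Pb calibration data are not reproduced in this handout, so the sketch reuses the four Method 1 data points with a second-order fit, and it does not reproduce the 44.3 ± 1.0 ppb answer or the error propagation. The function names `invert`, `poly_regression`, and `solve_for_x` are hypothetical, and the calculation is ordinary least squares rather than a description of the ToolPak's internals.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def poly_regression(xs, ys, order):
    """Least-squares polynomial fit with the summary statistics from step 4."""
    n, p = len(xs), order + 1
    # Step 2: one column per power of x (1, x, x^2, ...).
    design = [[x ** k for k in range(p)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    xtx_inv = invert(xtx)
    coeffs = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(c * v for c, v in zip(coeffs, r)) for r in design]
    residuals = [y - f for y, f in zip(ys, fitted)]  # observed minus fitted
    sse = sum(e * e for e in residuals)
    mean_y = sum(ys) / n
    sst = sum((y - mean_y) ** 2 for y in ys)
    dof = n - p  # residual degrees of freedom
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / dof
    # Standard error of each coefficient: sqrt(s^2 * diagonal of (X^T X)^-1).
    std_errors = [math.sqrt(sse / dof * xtx_inv[i][i]) for i in range(p)]
    return {"coeffs": coeffs, "std_errors": std_errors, "fitted": fitted,
            "residuals": residuals, "r2": r2, "adj_r2": adj_r2}


def solve_for_x(coeffs, y, x_min, x_max):
    """Invert y = a + b*x + c*x^2, keeping the root inside the calibration range."""
    a, b, c = coeffs
    disc = b * b - 4 * c * (a - y)
    if disc < 0:
        raise ValueError("signal lies outside the range of the fitted curve")
    roots = [(-b + sign * math.sqrt(disc)) / (2 * c) for sign in (1, -1)]
    in_range = [r for r in roots if x_min <= r <= x_max]
    if not in_range:
        raise ValueError("no root inside the calibration range")
    return in_range[0]


xs = [1, 2, 3, 4]
ys = [7, 24, 71, 160]
fit = poly_regression(xs, ys, order=2)
print("coefficients (constant first):", [round(c, 4) for c in fit["coeffs"]])
print("standard errors:", [round(s, 4) for s in fit["std_errors"]])
print("R^2, adjusted R^2:", round(fit["r2"], 5), round(fit["adj_r2"], 5))
print("residuals:", [round(e, 4) for e in fit["residuals"]])

# Step 5 in miniature: recover x from a signal that lies on the fitted curve.
x_back = solve_for_x(fit["coeffs"], fit["fitted"][2], min(xs), max(xs))
```

For these data the residuals alternate in sign (negative, positive, negative, positive), which is the kind of pattern step 4 warns about: the data actually follow a third-order polynomial, so a second-order model leaves systematic structure unexplained.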
