Machine Problem No. 1

Objectives

1. To solve curve-fitting problems using MS Excel®
2. To determine the best model for a given set of data
3. To estimate parameters based on the developed model

Theoretical Discussion

Chemical engineering often requires expressing experimental data as an equation. The form of the equation must be
developed, and the parameters that give the best fit to the data must be determined. MS Excel® offers simple
methods for fitting a straight line to data, as well as methods for fitting a polynomial to data.

Consider a set of data

$$\{y(x_i)\}, \qquad i = 1, \ldots, N \tag{1}$$

Find an equation that models the data. In general form:

$$y(x;\, a_1, a_2, \ldots, a_M) \tag{2}$$

The equation depends on $x$ and on some unknown parameters, $\{a_1, a_2, \ldots, a_M\}$. The goal is to find the set
of parameters that gives the best fit. The best fit is usually defined by minimizing the sum of the squared residuals,
where a residual is the difference between a measured value and the value predicted by the model. Because the data
contain experimental error, an exact fit is not possible in most cases. The parameters are therefore chosen so that
the variance of the residuals is minimized:

$$\sigma^2 = \sum_{i=1}^{N} \left[ y_i - y(x_i) \right]^2, \qquad y(x_i) \equiv y(x_i;\, a_1, a_2, \ldots, a_M) \tag{3}$$
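
Equation 3 translates directly into code. The plain-Python sketch below is an illustration added here, not part of the Excel procedure; `sum_squared_residuals` and `line` are hypothetical helper names, and the data are those of Table 2.1. It evaluates σ² for any model passed in as a callable, so two candidate parameter sets can be compared: the least-squares parameters are the ones that make σ² smallest.

```python
from typing import Callable, Sequence


def sum_squared_residuals(
    model: Callable[[float, Sequence[float]], float],
    params: Sequence[float],
    x: Sequence[float],
    y: Sequence[float],
) -> float:
    """Equation 3: sum over i of [y_i - y(x_i; a_1, ..., a_M)]^2."""
    return sum((yi - model(xi, params)) ** 2 for xi, yi in zip(x, y))


def line(x: float, params: Sequence[float]) -> float:
    """Hypothetical straight-line model y = a + b*x, with params = (a, b)."""
    a, b = params
    return a + b * x


# Table 2.1 data
x = [110, 210, 299, 390, 480, 598, 657]
y = [97, 206, 310, 386, 521, 551, 742]

# The fitted line should give a smaller sigma^2 than an arbitrary guess such as y = x.
print(sum_squared_residuals(line, (-22.30, 1.08), x, y))
print(sum_squared_residuals(line, (0.0, 1.0), x, y))
```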

If the parameters enter the equation linearly, the minimization problem reduces to a set of linear equations, which
MS Excel® solves easily. The quality of the curve fit is often reported as the square of the linear correlation
coefficient, r². The linear correlation coefficient is defined as:

$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}} \tag{4}$$

Values of r near 1 indicate a strong positive linear correlation, values near −1 a strong negative linear
correlation, and values near zero no linear correlation.
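
Equation 4 can be checked numerically. The plain-Python sketch below implements it term by term and applies it to the Table 2.1 data; `correlation_coefficient` is a hypothetical helper name, not an Excel function. In a worksheet, the CORREL function computes the same quantity and RSQ returns its square.

```python
import math


def correlation_coefficient(x, y):
    """Linear correlation coefficient r from Equation 4."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Table 2.1 data
x = [110, 210, 299, 390, 480, 598, 657]
y = [97, 206, 310, 386, 521, 551, 742]

r = correlation_coefficient(x, y)
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
```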

Illustrative Examples

Straight line curve fit using MS Excel®

Table 2.1 lists seven paired measurements (x, y). The goal is to find the equation y = a + bx that best represents
the data.

Computer Applications in ChE Page 1

Table 2.1. Simulated Data for Two Measurements

x y
110 97
210 206
299 310
390 386
480 521
598 551
657 742

1. Prepare a worksheet containing the data given in Table 2.1. In addition, reserve cells for the slope, the
intercept, and the square of the correlation coefficient (see Figure 2.1).

Figure 2.1. Data Worksheet

2. Determine the slope using the SLOPE function. Inserting the SLOPE function opens the Function Arguments dialog
box shown in Figure 2.2.

Figure 2.2. Arguments Dialog Box for Slope Function

3. Determine the intercept and the square of the correlation coefficient using the INTERCEPT and RSQ functions.
Figures 2.3 and 2.4 show their respective dialog boxes.

Figure 2.3. Arguments Dialog Box for Intercept Function

Figure 2.4. Arguments Dialog Box for Rsq Function

The completed worksheet is shown in Figure 2.5.

Figure 2.5. Completed Worksheet

The curve fit is given by

$$y = -22.30 + 1.08x$$

Since the r² value is close to 1.0, the straight line fits the data well.
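
The SLOPE, INTERCEPT and RSQ functions return the ordinary least-squares straight line. As a hedged cross-check outside Excel, the plain-Python sketch below (hypothetical helper `linear_fit`) uses the standard closed-form formulas b = Sxy/Sxx and a = ȳ − b·x̄ on the Table 2.1 data. Its r² is computed as 1 − SSE/SST, which for a straight-line fit equals the square of r from Equation 4.

```python
def linear_fit(x, y):
    """Least-squares line y = a + b*x; returns (a, b, r2).

    b, a and r2 play the roles of Excel's SLOPE, INTERCEPT and RSQ.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                # slope
    a = y_bar - b * x_bar        # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst         # coefficient of determination
    return a, b, r2


# Table 2.1 data
x = [110, 210, 299, 390, 480, 598, 657]
y = [97, 206, 310, 386, 521, 551, 742]

a, b, r2 = linear_fit(x, y)
print(f"y = {a:.2f} + {b:.2f}x,  r^2 = {r2:.4f}")
```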

4. The same fit can also be obtained by plotting the data in MS Excel® as an XY (Scatter) chart. Select the data,
choose Insert/Chart, and pick the scatter plot without lines. Figure 2.6 shows the resulting chart.

Figure 2.6. MS Excel® Plot of Given Data

5. To add the trendline, right-click any data point and choose Add Trendline. Select the Linear trendline type, and
tick Display Equation on chart and Display R-squared value on chart. Figure 2.7 shows the Trendline Options, and
Figure 2.8 shows the data plot with the trendline, including the curve fit and the r² value.

Figure 2.7. Trendline Options

Figure 2.8. Data Plot with Trendline

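You need not be limited to a straight line when fitting data; the trendline options offer several regression
types. The curve fit whose r² value is nearest 1 is usually taken as the best fit. Note, however, that adding terms
(for example, raising the order of a polynomial trendline) never decreases r², so compare models with similar
numbers of parameters and check the residuals as well.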
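
To see how r² behaves across trendline types, the NumPy sketch below (an illustration added here, not part of the original exercise) fits polynomials of order 1 to 3 to the Table 2.1 data with `numpy.polyfit` and computes r² = 1 − SSE/SST for each. The first-order result should reproduce the straight-line r²; the higher orders can only match or exceed it, which is why a larger r² alone does not prove a better model.

```python
import numpy as np

# Table 2.1 data
x = np.array([110, 210, 299, 390, 480, 598, 657], dtype=float)
y = np.array([97, 206, 310, 386, 521, 551, 742], dtype=float)


def r_squared(y_obs, y_fit):
    """Coefficient of determination, 1 - SSE/SST."""
    ss_res = np.sum((y_obs - y_fit) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


results = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)  # coefficients, highest power first
    results[degree] = r_squared(y, np.polyval(coeffs, x))
    print(f"order {degree}: r^2 = {results[degree]:.4f}")
```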

Multiple regression using MS Excel®

When the functions are not simple powers of x, multiple regression is used instead of polynomial regression. To keep
the problem linear, however, the unknown parameters must appear only as coefficients of those functions; that is, the
functions themselves are completely specified, and multiple regression simply determines how much of each one is
needed. The form of the equation is

$$y(x) = \sum_{i=1}^{M} a_i f_i(x) \tag{5}$$


The goal is to find the best $M$ values of $\{a_i\}$, given the $M$ functions $f_i(x)$ and the data $y_i = y(x_i)$, $i = 1, \ldots, N$.
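
Setting the derivative of Equation 3 with respect to each coefficient to zero shows why Equation 5 leads to a linear problem. The worked derivation below is added for illustration; the shorthand $A_{ij} = f_j(x_i)$ is notation introduced here, not in the original. The resulting normal equations are linear in the unknowns $a_j$:

```latex
\frac{\partial \sigma^2}{\partial a_k}
  = -2 \sum_{i=1}^{N} f_k(x_i) \Big[ y_i - \sum_{j=1}^{M} a_j f_j(x_i) \Big] = 0
\quad\Longrightarrow\quad
\sum_{j=1}^{M} \Big[ \sum_{i=1}^{N} f_k(x_i)\, f_j(x_i) \Big] a_j = \sum_{i=1}^{N} f_k(x_i)\, y_i ,
\qquad k = 1, \ldots, M
```

In matrix form this is $A^{\mathsf T} A\, \mathbf{a} = A^{\mathsf T} \mathbf{y}$. For the straight line of Table 2.1 ($f_1 = 1$, $f_2 = x$, $a_1 = a$, $a_2 = b$) the equations reduce to $N a + b \sum x_i = \sum y_i$ and $a \sum x_i + b \sum x_i^2 = \sum x_i y_i$.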

As an example, determine the constants in a reaction rate formula. The expected expression is

$$\text{rate} = k\, p_A^{\,n}\, p_B^{\,m} \tag{6}$$

and the goal is to find the values of k, n and m that give the best fit of the rate for various partial pressures of
substances A and B. This form is not linear in the parameters, as multiple regression requires, but a transformation
makes it linear. In this case, take the natural logarithm of both sides:

$$\ln(\text{rate}) = \ln k + n \ln p_A + m \ln p_B \tag{7}$$

The equation has the form

$$y = a + b x_1 + c x_2 \tag{8}$$

with $y = \ln(\text{rate})$, $x_1 = \ln p_A$, $x_2 = \ln p_B$, $a = \ln k$, $b = n$ and $c = m$.

1. Prepare the worksheet containing the data given in Table 2.2.

Table 2.2. Reaction Rate Data as a Function of Partial Pressures

pA pB Rate
0.1044 0.1036 0.5051
0.1049 0.2871 0.6302
0.1030 0.5051 0.6342
0.2582 0.1507 1.3155
0.2608 0.3100 1.5663
0.2407 0.4669 1.5981
0.3501 0.0922 1.6217
0.3437 0.1944 1.8976
0.3494 0.5389 2.1780
0.4778 0.1017 2.1313
0.4880 0.2580 2.7227
0.5014 0.5037 3.1632

2. Transform the data to make the model linear by taking the natural logarithm (LN function) of the partial
pressures and of the rate. Figure 2.9 shows the resulting worksheet.

Figure 2.9. Worksheet on Reaction Rate Data

3. Use Data Analysis / Regression (part of the Analysis ToolPak add-in) to determine the best fit of ln(rate) as a
function of ln(pA) and ln(pB). Figure 2.10 shows the Regression dialog box, and Figure 2.11 shows the results of the
regression.

Figure 2.10. Regression Dialog Box

Figure 2.11. Regression Analysis Results

The Coefficients column gives the results of the regression analysis. The best fit is for

$$a = 1.9603, \qquad b = 0.9801, \qquad c = 0.1894, \qquad k = e^{a} = 7.101$$

The curve fit is then

$$\text{rate} = 7.101\, p_A^{0.9801}\, p_B^{0.1894} \tag{9}$$

The Standard Error column indicates how precisely each parameter is determined. If a standard error is a significant
fraction of its parameter, the data are probably too scattered for that parameter to be determined reliably.

Other useful options in the Regression dialog box are Residuals, Residual Plots, and Line Fit Plots, which help in
evaluating the results: the residuals should be both positive and negative, with no trend. The r² value of the
linearized fit is 0.9969, which indicates a good correlation.
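
For readers who want to check the Regression output outside Excel, here is a minimal NumPy sketch (an addition for illustration, not part of the original procedure) that fits Equation 8 to the log-transformed Table 2.2 data by linear least squares. The standard errors use the usual ordinary-least-squares formula, which should correspond to the Standard Error column of the Regression output; the residuals are printed so their signs and trend can be inspected.

```python
import numpy as np

# Table 2.2: partial pressures of A and B and the measured rate
pA = np.array([0.1044, 0.1049, 0.1030, 0.2582, 0.2608, 0.2407,
               0.3501, 0.3437, 0.3494, 0.4778, 0.4880, 0.5014])
pB = np.array([0.1036, 0.2871, 0.5051, 0.1507, 0.3100, 0.4669,
               0.0922, 0.1944, 0.5389, 0.1017, 0.2580, 0.5037])
rate = np.array([0.5051, 0.6302, 0.6342, 1.3155, 1.5663, 1.5981,
                 1.6217, 1.8976, 2.1780, 2.1313, 2.7227, 3.1632])

# Equation 8 with y = ln(rate), x1 = ln(pA), x2 = ln(pB); design columns 1, x1, x2
X = np.column_stack([np.ones_like(pA), np.log(pA), np.log(pB)])
y = np.log(rate)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b, c = coef
k = np.exp(a)  # since a = ln k

residuals = y - X @ coef
sse = float(residuals @ residuals)
r2 = 1.0 - sse / float(np.sum((y - y.mean()) ** 2))

# OLS standard errors: sqrt(diag(s^2 (X^T X)^-1)), with s^2 = SSE / (N - p)
n_obs, n_par = X.shape
s2 = sse / (n_obs - n_par)
std_err = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}, k = {k:.3f}, r^2 = {r2:.4f}")
print("standard errors of a, b, c:", np.round(std_err, 4))
print("residuals:", np.round(residuals, 4))
```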

Nonlinear regression using MS Excel®

Nonlinear regression is a curve fit in which the unknown parameters enter the model nonlinearly. Because nonlinear
regression is harder for the computer, this method does not always work: it relies on iterative techniques borrowed
from the field of optimization, and no method is guaranteed to succeed for every problem. The result can also depend
on the initial guesses of the parameters.

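In nonlinear regression, Equation 3 is minimized directly with respect to the unknown parameters, using the Solver
add-in. Minimizing the average of the squared residuals, as done below, gives the same parameters, since it differs
from Equation 3 only by the constant factor 1/N.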

1. Prepare a new worksheet for the reaction rate data in Table 2.2. In addition, assign cells for the parameters
k, n and m, and enter an initial value of 1 for each.

2. In another column, calculate the rate using the parameters, the partial pressure data and Equation 6.

3. In the next column, calculate the residual, the difference between the measured and calculated rates. Put its
square in the column after that.

4. Determine the average of the squares. Figure 2.11 shows the completed worksheet.

Figure 2.11. Completed Worksheet

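5. The goal is to minimize the average of the squared residuals by changing the parameters. Open Solver by selecting
Data/Analysis/Solver (the Solver add-in must be enabled first). Set the cell containing the average as the target
cell, choose Min, and select the cells for k, n and m as the cells to be changed. Figure 2.12 shows the Solver
Parameters dialog box.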

Figure 2.12. Solver Parameters Dialog Box.

6. Clicking Solve opens the Solver Results dialog box, where the needed reports can be selected. The Answer Report
worksheet generated by Excel® gives the optimum solution: the final value of the Target Cell is the minimum average
of the squared residuals, and the final values of the Adjustable Cells are the fitted parameters k, n and m.
Figure 2.13 shows the Answer Report.

Figure 2.13. Answer Report Worksheet

The best correlation, then, is

$$\text{rate} = 6.978\, p_A^{0.965}\, p_B^{0.196}$$
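
Solver's own algorithm is not reproduced here. As a hedged stand-in, the NumPy sketch below minimizes Equation 3 for Equation 6 with a Gauss-Newton iteration (a different method from Solver's), halving the step whenever the sum of squares would increase. It deliberately starts from the log-linear estimates of Equation 7 rather than k = n = m = 1, because plain Gauss-Newton is sensitive to the starting point; the helper names `model`, `sse` and `jacobian` are hypothetical. The result should land close to the Solver answer above, although different algorithms and stopping tolerances can change the last digits.

```python
import numpy as np

# Table 2.2 data
pA = np.array([0.1044, 0.1049, 0.1030, 0.2582, 0.2608, 0.2407,
               0.3501, 0.3437, 0.3494, 0.4778, 0.4880, 0.5014])
pB = np.array([0.1036, 0.2871, 0.5051, 0.1507, 0.3100, 0.4669,
               0.0922, 0.1944, 0.5389, 0.1017, 0.2580, 0.5037])
rate = np.array([0.5051, 0.6302, 0.6342, 1.3155, 1.5663, 1.5981,
                 1.6217, 1.8976, 2.1780, 2.1313, 2.7227, 3.1632])


def model(p):
    """Equation 6: rate = k * pA**n * pB**m, with p = (k, n, m)."""
    k, n, m = p
    return k * pA**n * pB**m


def sse(p):
    """Equation 3 evaluated on the original (non-log) rates."""
    return float(np.sum((rate - model(p)) ** 2))


def jacobian(p):
    """Partial derivatives of the model with respect to k, n and m."""
    k, n, m = p
    f = pA**n * pB**m
    return np.column_stack([f, k * f * np.log(pA), k * f * np.log(pB)])


# Starting point: the log-linear fit of Equation 7 (instead of k = n = m = 1).
X = np.column_stack([np.ones_like(pA), np.log(pA), np.log(pB)])
(a, b, c), *_ = np.linalg.lstsq(X, np.log(rate), rcond=None)
p_start = np.array([np.exp(a), b, c])

# Gauss-Newton iterations with step halving.
p = p_start.copy()
for _ in range(100):
    step, *_ = np.linalg.lstsq(jacobian(p), rate - model(p), rcond=None)
    t = 1.0
    while sse(p + t * step) > sse(p) and t > 1e-8:
        t /= 2.0  # shrink the step until sigma^2 no longer increases
    if sse(p + t * step) > sse(p):
        break  # no further decrease possible along this direction
    p = p + t * step
    if np.linalg.norm(t * step) < 1e-12 * (1.0 + np.linalg.norm(p)):
        break

k, n, m = p
print(f"rate = {k:.3f} pA^{n:.3f} pB^{m:.3f}")
print(f"average squared residual = {sse(p) / len(rate):.3e}")
```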

Answer the following problems using MS Excel®. Save the workbook on the mapped network drive using the filename
format MP1-Surname, and move the charts to separate sheets.

1. Ten data points were taken in an experiment in which the independent variable x is the mole percentage of a
reactant and the dependent variable Y is the yield. Fit a model to these data. Show all the models (iterations) tried.

x Y
20 73
20 78
30 85
40 90
40 91
50 87
50 86
50 91
60 75
70 65

2. Using the data from Problem No. 1, fit a quadratic model and use it to determine the value of x that maximizes
the yield.

3. The following experimental data for the equilibrium adsorption of pure methane gas on activated carbon at
296 K were obtained by Ritter and Yang.

q, cm³ (STP) of CH₄/g    P, psia
45.5    40
91.5    165
113    350
121    545
125    760
126    910
126    970

Determine which of the three most common isotherms (Linear, Freundlich, Langmuir) best describes the data.
Give the model, including its parameters.

Results and Discussion. Answer the following questions.

1. What model best describes the data in Problem #1? What makes it the best model? Explain in terms of

2. What does the r² value represent? Why is r² used instead of r?

3. Explain through equations how the most appropriate isotherm for Problem #3 was determined.

4. What is the physical significance of the isotherm chosen in #3?
