
Farmingdale State University of New York New York Institute of Technology

Probably the one “new” mathematical topic that is most responsible for

modernizing courses in college algebra and precalculus over the last few years is the idea

of fitting a function to a set of data in the sense of a least squares fit. Whether it be

simple linear regression or nonlinear regression, this topic opens the door to applying the

mathematical ideas about lines, as well as curves (from families of exponential, power,

logarithmic, polynomial, sinusoidal, and logistic functions) to situations from all walks of

life. From an instructor’s perspective, these applications provide additional ways to

reinforce the key properties of each family of functions. From the students’ point of view,

they see how the mathematics being learned has direct application in modeling a very

wide variety of situations. Moreover, these methods are precisely the same mathematical

ideas that the students encounter in all of their quantitative courses, so the mathematics

becomes much better linked to the other disciplines. Fortunately, these capabilities are

built into all graphing calculators and spreadsheets such as Excel, so it becomes very

simple to incorporate the methods into courses at this level.

However, many mathematicians naturally feel uncomfortable with simply

introducing these techniques into their courses just because the available technology has

the capability. In their minds, the techniques become equivalent to the students using the

technology blindly without any mathematical understanding of what is happening. The

standard derivations of the formulas for the coefficients a and b in the regression equation

y = ax + b use optimization methods from multivariable calculus and so are totally

inaccessible to the students at this level. In this article, we develop an algebra-based

derivation of the regression equations that simultaneously reinforces other concepts that

one would treat in college algebra and precalculus.

The “least squares” criterion used to create the regression line y = ax + b that fits

a set of data points (x1, y1), (x2, y2), ..., (xn, yn) is that the sum of the squares of the vertical

distances from the points to the line be minimum. See Figure 1. This means that we need

to find those values of a and b for which

\[ S = \sum_{i=1}^{n} \bigl[ y_i - (a x_i + b) \bigr]^2 \]

is minimum. To simplify the notation somewhat, we omit the indices in the summation.

Expanding the expression inside the summation, we get

\[ S = \sum \bigl[ y_i^2 - 2(a x_i + b) y_i + (a x_i + b)^2 \bigr] = \sum \bigl[ y_i^2 - 2a x_i y_i - 2b y_i + a^2 x_i^2 + 2ab x_i + b^2 \bigr]. \]

We now rewrite this expression by “distributing” the summation and taking the sum of

each of the terms inside the square bracket to get

\[ S = \sum y_i^2 - \sum 2a x_i y_i - \sum 2b y_i + \sum a^2 x_i^2 + \sum 2ab x_i + \sum b^2. \]

Since a and b do not depend on the index of summation i, we can factor them out of each

summation to get

\[ S = \sum y_i^2 - 2a \sum x_i y_i - 2b \sum y_i + a^2 \sum x_i^2 + 2ab \sum x_i + b^2 \sum 1. \tag{1} \]

Once the xi and yi values are given in the data points, all of the summations are just sums

of known numbers and so are constant.
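As a quick sanity check on the algebra, the following Python sketch compares the direct sum of squared vertical distances with the expanded form in Equation (1). The data values, the trial coefficients, and the function names are illustrative assumptions and are not taken from the article.

```python
def s_direct(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def s_expanded(xs, ys, a, b):
    """Equation (1): the same quantity written in terms of sums over the data."""
    n = len(xs)                                   # the sum of n ones
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_yy - 2 * a * sum_xy - 2 * b * sum_y
            + a * a * sum_xx + 2 * a * b * sum_x + b * b * n)


# Hypothetical data and trial coefficients, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 5.0, 6.0, 10.0]
a, b = 1.5, 0.5

# The two values should agree up to floating-point rounding.
print(s_direct(xs, ys, a, b), s_expanded(xs, ys, a, b))
```

Any other choice of a and b should give the same agreement, since the expansion is an identity and does not depend on the line being a good fit.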

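Notice that Equation (1) has a multiple of $a^2$, two multiples of a, and the

remaining terms do not involve a. That is, since the sum of n ones is n, we can rewrite Equation (1) as

\[
\begin{aligned}
S &= a^2 \sum x_i^2 - 2a \sum x_i y_i + 2ab \sum x_i + \sum y_i^2 - 2b \sum y_i + n b^2 \\
  &= \Bigl( \sum x_i^2 \Bigr) a^2 + \Bigl[ 2b \sum x_i - 2 \sum x_i y_i \Bigr] a + \Bigl[ \sum y_i^2 - 2b \sum y_i + n b^2 \Bigr].
\end{aligned}
\]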

Therefore, we can think of S as a quadratic function in a.

We now use the fact that the vertex of any parabola $Y = AX^2 + BX + C$ occurs at

$X = -B/(2A)$. (We use capital letters here because a, b, x, and y have different meanings in the

present context.) The coefficient of $a^2$ in the above expression for S is the sum of the

squares of the x’s, so it is a positive leading coefficient and so the corresponding parabola

with S as a quadratic function of a opens upward. It therefore achieves its minimum at

the vertex where

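\[ a = -\frac{2b \sum x_i - 2 \sum x_i y_i}{2 \sum x_i^2} = \frac{\sum x_i y_i - b \sum x_i}{\sum x_i^2}. \]

Cross-multiplying, we get

\[ a \sum x_i^2 = \sum x_i y_i - b \sum x_i \]

or equivalently,

\[ a \sum x_i^2 + b \sum x_i = \sum x_i y_i. \tag{2} \]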
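The vertex argument can be checked numerically. The sketch below holds b fixed, builds the coefficients of the quadratic in a, and compares S at the vertex with S slightly to either side. The data, the fixed value of b, and the function names are illustrative assumptions.

```python
def s_direct(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def quadratic_in_a(xs, ys, b):
    """Coefficients (A, B, C) of S = A*a**2 + B*a + C when b is held fixed."""
    n = len(xs)
    A = sum(x * x for x in xs)
    B = 2 * b * sum(xs) - 2 * sum(x * y for x, y in zip(xs, ys))
    C = sum(y * y for y in ys) - 2 * b * sum(ys) + n * b * b
    return A, B, C


# Hypothetical data and a hypothetical fixed intercept.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 5.0, 6.0, 10.0]
b = 0.5

A, B, C = quadratic_in_a(xs, ys, b)
a_vertex = -B / (2 * A)          # vertex of the upward-opening parabola in a

# S is smallest at the vertex; the last two columns should agree.
for a in (a_vertex - 0.1, a_vertex, a_vertex + 0.1):
    print(a, s_direct(xs, ys, a, b), A * a * a + B * a + C)
```

This only minimizes S over a for one chosen b. The regression line needs the vertex conditions for a and for b to hold at the same time.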

Now look again at Equation (1) for S. We can also think of it as a quadratic

function of b and perform a similar analysis. When we rearrange the terms, we get

\[ S = n b^2 + \Bigl[ 2a \sum x_i - 2 \sum y_i \Bigr] b + \Bigl[ \sum y_i^2 + a^2 \sum x_i^2 - 2a \sum x_i y_i \Bigr]. \]

The leading coefficient, n, in this quadratic function of b is positive, so this parabola also

opens upward and its minimum occurs at its vertex, where

\[ b = -\frac{2a \sum x_i - 2 \sum y_i}{2n} = \frac{\sum y_i - a \sum x_i}{n}. \]

Cross-multiplying, we get

\[ bn = \sum y_i - a \sum x_i, \]

which is equivalent to

\[ a \sum x_i + bn = \sum y_i. \tag{3} \]

This is another linear equation in a and b. The two linear equations (2) and (3) are

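sometimes called the normal equations in statistics texts. Their solution

\[ a = \frac{n \sum x_i y_i - \bigl( \sum x_i \bigr) \bigl( \sum y_i \bigr)}{n \sum x_i^2 - \bigl( \sum x_i \bigr)^2} \quad \text{and} \quad b = \frac{\bigl( \sum x_i^2 \bigr) \bigl( \sum y_i \bigr) - \bigl( \sum x_i y_i \bigr) \bigl( \sum x_i \bigr)}{n \sum x_i^2 - \bigl( \sum x_i \bigr)^2} \]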

gives the coefficients in the equation of the regression line and this is the routine that is

built into all calculators and software packages that produce the regression line for any

set of data.
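The closed-form solution translates directly into a few lines of code. The following is a minimal Python sketch under stated assumptions: the function name `regression_line` and the sample data are hypothetical, and exact fractions are used only so the result can be compared with hand arithmetic. It is not a description of how any particular calculator implements the routine.

```python
from fractions import Fraction


def regression_line(xs, ys):
    """Return (a, b) for the least squares line y = a*x + b as exact fractions."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Common denominator of the two formulas.
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x-values are equal, so a and b are not determined")
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_xx * sum_y - sum_xy * sum_x) / denom
    return a, b


# Hypothetical data, chosen so the arithmetic is easy to follow by hand:
# n = 3, sum x = 6, sum y = 10, sum x^2 = 14, sum xy = 23.
a, b = regression_line([1, 2, 3], [2, 3, 5])
print(a, b)   # prints: 3/2 1/3
```

Substituting a = 3/2 and b = 1/3 back into Equations (2) and (3) gives 14(3/2) + 6(1/3) = 23 and 6(3/2) + 3(1/3) = 10, as required.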

We next investigate the conditions under which the solutions for a and b are not

defined. For simplicity, consider n = 3 points at x = x1, x2, and x3. We can rewrite the

denominator in the equations for a and b (which is the same as the expression for the

determinant of the coefficient matrix of the system of linear equations (2) and (3)) as

\[ 3(x_1^2 + x_2^2 + x_3^2) - (x_1 + x_2 + x_3)^2. \]

There is no unique solution when this quantity is zero. We expand to obtain

\[ 3x_1^2 + 3x_2^2 + 3x_3^2 - x_1^2 - x_2^2 - x_3^2 - 2x_1 x_2 - 2x_1 x_3 - 2x_2 x_3 = 0 \]

or

\[ 2x_1^2 + 2x_2^2 + 2x_3^2 - 2x_1 x_2 - 2x_1 x_3 - 2x_2 x_3 = 0. \]

Collecting the terms, we have

\[ (x_1^2 - 2x_1 x_2 + x_2^2) + (x_1^2 - 2x_1 x_3 + x_3^2) + (x_2^2 - 2x_2 x_3 + x_3^2) = 0 \]

or

\[ (x_1 - x_2)^2 + (x_1 - x_3)^2 + (x_2 - x_3)^2 = 0. \]

However, the only way that the sum of these three non-negative terms can be zero is if

each one is zero, so that

x1 = x2, x1 = x3, and x2 = x3,

which means that all three must be equal: x1 = x2 = x3. That is, all three points must lie on

a vertical line. (In that case the two normal equations are multiples of one another, so they are

satisfied by infinitely many pairs a and b rather than by a single one.)

The same condition applies for any n ≥ 2, because the denominator $n \sum x_i^2 - (\sum x_i)^2$ equals the sum of $(x_i - x_j)^2$ over all pairs $i < j$.
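A short numerical check can make the general case concrete. The sketch below computes the denominator and the sum of squared pairwise differences for a few sets of x-values; the function names and the sample lists are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def denominator(xs):
    """n * (sum of x^2) - (sum of x)^2, the denominator shared by a and b."""
    n = len(xs)
    return n * sum(x * x for x in xs) - sum(xs) ** 2


def sum_of_squared_differences(xs):
    """Sum of (x_i - x_j)^2 over all pairs i < j."""
    return sum((p - q) ** 2 for p, q in combinations(xs, 2))


# The two quantities agree, and both vanish only when every x is the same.
for xs in ([1, 2, 4], [0, 1, 2, 3, 5], [4, 4, 4], [7, 7, 7, 7]):
    print(xs, denominator(xs), sum_of_squared_differences(xs))
```

Because the denominator is a sum of squares, it is never negative, and a single pair of distinct x-values is enough to make it positive.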

We next illustrate the use of the normal equations in the following example.

Example Find the equation of the regression line that fits the points (0, 4), (1, 7), (2, 9),

(3, 12), and (5, 18) by solving the normal equations.

Solution We begin with the scatterplot of the data, as shown in Figure 2. We calculate

the various sums needed in the above two equations by completing the entries in the

following table.

  x     y    x^2    xy
  0     4     0      0
  1     7     1      7
  2     9     4     18
  3    12     9     36
  5    18    25     90
 --------------------------
 11    50    39    151    (column sums: Σx, Σy, Σx^2, Σxy)

The coefficients a and b are the solutions of the normal equations (2) and (3):

\[ a \sum x_i^2 + b \sum x_i = \sum x_i y_i, \qquad a \sum x_i + bn = \sum y_i. \]

We substitute the values for the various sums and use the fact that n = 5 to get the system

of equations

39a + 11b = 151

11a + 5b = 50.

The solution is

a = 2.77027027 and b = 3.90540541

and so the equation of the regression line is y = 2.77027027x + 3.90540541. This line is

shown superimposed over the data points in Figure 2. You can check that the regression

features of your calculator or a software package such as Excel give the same results.
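The example can also be reproduced in a few lines of Python. This sketch builds the sums from the five data points, eliminates b from the normal equations, and back-substitutes. The variable names are assumptions, and exact fractions are used so that the decimals quoted above can be traced to their fractional form.

```python
from fractions import Fraction

# Data points from the example.
points = [(0, 4), (1, 7), (2, 9), (3, 12), (5, 18)]

n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xx = sum(x * x for x, _ in points)
sum_xy = sum(x * y for x, y in points)

# Normal equations:  sum_xx*a + sum_x*b = sum_xy   (2)
#                    sum_x*a  + n*b     = sum_y    (3)
# Eliminate b: multiply (2) by n and (3) by sum_x, then subtract.
a = Fraction(n * sum_xy - sum_x * sum_y, n * sum_xx - sum_x * sum_x)
# Back-substitute into (3) to recover b.
b = (sum_y - a * sum_x) / n

print(sum_x, sum_y, sum_xx, sum_xy)   # prints: 11 50 39 151
print(a, b)                           # prints: 205/74 289/74
print(float(a), float(b))             # decimal forms of the same coefficients
```

The fractions 205/74 and 289/74 round to the values 2.77027027 and 3.90540541 given in the solution.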

Finally, we note that the method used by calculators and spreadsheets to fit

exponential, logarithmic, and power functions is based on a modification of the above

ideas. In particular, fitting an exponential function to (x, y) data uses the principle that if

y is an exponential function of x, then log y is a linear function of x. Consequently, the

programs used will fit a linear function to the transformed (x, log y) data and then undo

the transformation with the inverse function. (The result is the best fit for log y versus x;

it is not necessarily the best fit for y versus x, which would be based on applying the least

squares criterion directly, as is done with some mathematical software packages such as

Derive, Maple, and Mathematica.) However, because of the transformation, the

programs will not accept any data in which y is zero or negative – the logarithm is not

defined and you get a domain error.
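The transformation approach for exponential functions can be sketched as follows. This is an illustration under stated assumptions, not the code used by any calculator or spreadsheet: the function names are hypothetical, the natural logarithm is chosen arbitrarily as the base, and a `ValueError` stands in for the domain error mentioned above.

```python
import math


def fit_line(xs, ys):
    """Least squares line y = a*x + b from the closed-form solution."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_xx * sum_y - sum_xy * sum_x) / denom
    return a, b


def fit_exponential(xs, ys):
    """Fit y = c * exp(k*x) by fitting a line to the (x, ln y) data."""
    if any(y <= 0 for y in ys):
        raise ValueError("every y-value must be positive to take its logarithm")
    k, ln_c = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_c), k          # undo the transformation


# Hypothetical data lying exactly on y = 3 * 2**x.
c, k = fit_exponential([0, 1, 2, 3], [3, 6, 12, 24])
print(c, math.exp(k))   # approximately 3 and 2, up to rounding
```

For data that do not lie exactly on an exponential curve, the result minimizes the squared errors in ln y rather than in y, which is the distinction drawn in the parenthetical remark above.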

In a similar way, fitting a logarithmic function to data is done on the calculators

and spreadsheets by transforming the (x, y) data into (log x, y) values, fitting a linear

function to the transformed data, and then undoing the transformation using the inverse

function. Both of these processes are called semi-log plots.

Fitting a power function to data on a calculator or spreadsheet involves

transforming the original (x, y) data into (log x, log y) data (known as a log-log plot),

fitting a linear function to the transformed data, and then undoing the transformation

using the inverse function. Clearly, this cannot be done if either x or y is non-positive.
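The log-log procedure for power functions can be sketched in the same spirit. The function names are hypothetical, the natural logarithm is an arbitrary choice of base, and the `ValueError` is an assumed stand-in for whatever error a given calculator reports.

```python
import math


def fit_line(xs, ys):
    """Least squares line y = a*x + b from the closed-form solution."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_xx * sum_y - sum_xy * sum_x) / denom
    return a, b


def fit_power(xs, ys):
    """Fit y = c * x**p by fitting a line to the (ln x, ln y) data."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("x and y must all be positive to take logarithms")
    log_xs = [math.log(x) for x in xs]
    log_ys = [math.log(y) for y in ys]
    p, ln_c = fit_line(log_xs, log_ys)    # slope is the exponent p
    return math.exp(ln_c), p              # intercept is ln c


# Hypothetical data lying exactly on y = 2 * x**2.
c, p = fit_power([1, 2, 3, 4], [2, 8, 18, 32])
print(c, p)   # approximately 2 and 2, up to rounding
```

The slope of the fitted line in the log-log data is the exponent of the power function, and the intercept is the logarithm of its coefficient.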

There are other complications that can easily arise, particularly with fitting power

functions to data, but we will not go into any of them here. A discussion of some of these

issues can be found in [1].

Reference

1. Gordon, Sheldon P., Florence S. Gordon, et al., Functioning in the Real World: A

Precalculus Experience, 2nd ed., Addison-Wesley, 2003.

Acknowledgement The work described in this article was supported by the Division of

Undergraduate Education of the National Science Foundation under grant DUE-0089400.

However, the views expressed are not necessarily those of either the Foundation or the

project.
