
MA 106 : Linear Algebra

Lecture 11

J. K. Verma
Department of Mathematics
Indian Institute of Technology Bombay

Least squares approximation

1 Suppose we have a large number of data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, collected from some experiment.
2 Sometimes we believe that these points lie on a straight line.
3 So a linear function $y(x) = s + tx$ may satisfy
\[
y(x_i) = y_i, \quad i = 1, \ldots, n.
\]
4 Due to uncertainty in the data and experimental error, in practice the points will deviate somewhat from a straight line, and so it is impossible to find a linear $y(x)$ that passes through all of them.
5 So we seek a line that fits the data well, in the sense that the errors are made as small as possible.

Least squares approximation

1 A natural question that arises now is: how do we define the error? Consider the following system of linear equations in the variables $s$ and $t$, with known coefficients $x_i, y_i$, $i = 1, \ldots, n$:
\[
\begin{aligned}
s + x_1 t &= y_1 \\
s + x_2 t &= y_2 \\
&\;\;\vdots \\
s + x_n t &= y_n
\end{aligned}
\]
2 Note that typically $n$ is much greater than 2. If we can find $s$ and $t$ that satisfy all these equations, then we have solved our problem.
3 However, for the reasons mentioned above, this is not always possible.

Least squares approximation

1 For given $s$ and $t$, the error in the $i$th equation is $|y_i - s - x_i t|$.
2 There are several ways of combining the errors in the individual equations to
get a measure of the total error.
3 The following are three examples:
\[
\sqrt{\sum_{i=1}^{n} (y_i - s - x_i t)^2}, \qquad \sum_{i=1}^{n} |y_i - s - x_i t|, \qquad \max_{1 \le i \le n} |y_i - s - x_i t|.
\]

4 Both analytically and computationally, a nice theory exists for the first of
these choices and this is what we shall study.
5 The problem of finding $s, t$ so as to minimize $\sqrt{\sum_{i=1}^{n} (y_i - s - x_i t)^2}$ is called a least squares problem.
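For concreteness, the three error measures can be computed directly. The following is a minimal NumPy sketch with made-up data points and a trial line $y = s + tx$; the data and the values of $s, t$ are purely illustrative.

```python
import numpy as np

# Illustrative data points (x_i, y_i) and a trial line y = s + t*x
x = np.array([-1.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 3.0])
s, t = 1.0, 0.5

r = y - s - t * x                      # residuals y_i - s - x_i t

l2_error  = np.sqrt(np.sum(r**2))      # least squares measure
l1_error  = np.sum(np.abs(r))          # sum of absolute errors
max_error = np.max(np.abs(r))          # largest individual error

print(l2_error, l1_error, max_error)
```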

Least squares approximation

1 Suppose that
\[
A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad
b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
x = \begin{pmatrix} s \\ t \end{pmatrix}, \quad \text{so} \quad
Ax = \begin{pmatrix} s + t x_1 \\ s + t x_2 \\ \vdots \\ s + t x_n \end{pmatrix}.
\]

2 The least squares problem is to find an $x$ such that $\|b - Ax\|$ is minimized, i.e., to find an $x$ such that $Ax$ is the best approximation to $b$ in the column space $C(A)$ of $A$.
3 This is precisely the problem of finding $x$ such that $b - Ax \in C(A)^{\perp}$.
4 Note that $b - Ax \in C(A)^{\perp} \iff A^t(b - Ax) = 0 \iff A^tAx = A^tb$. These are the normal equations for the least squares problem.
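The normal equations translate directly into a few lines of code. Here is a minimal NumPy sketch for the straight-line fit, assuming illustrative data arrays x and y: it forms $A$ and $b$ and solves $A^tAx = A^tb$.

```python
import numpy as np

# Illustrative data (x_i, y_i)
x = np.array([-1.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 3.0])

# A has a column of ones (coefficient of s) and a column of the x_i (coefficient of t)
A = np.column_stack([np.ones_like(x), x])
b = y

# Normal equations: (A^t A) [s, t]^t = A^t b
s, t = np.linalg.solve(A.T @ A, A.T @ b)
print(s, t)   # coefficients of the best-fitting line y = s + t x
```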

Least squares approximation
1 Example. Find $s, t$ such that the straight line $y = s + tx$ best fits the following data in the least squares sense:
\[
y = 1 \text{ at } x = -1, \quad y = 1 \text{ at } x = 1, \quad y = 3 \text{ at } x = 2.
\]

2 Project $b = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix}$ onto the column space of $A = \begin{pmatrix} 1 & -1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}$.
3 Now $A^tA = \begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix}$ and $A^tb = \begin{pmatrix} 5 \\ 6 \end{pmatrix}$. The normal equations are
\[
\begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} s \\ t \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix}.
\]

4 The solution is $s = 9/7$, $t = 4/7$.
5 Therefore the best line which fits the data is $y = \frac{9}{7} + \frac{4}{7}x$.
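The hand computation can be cross-checked numerically; the sketch below runs NumPy's least squares routine on the same three data points and should reproduce $s = 9/7$, $t = 4/7$.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
b = np.array([1.0, 1.0, 3.0])

# Least squares solution of Ax = b
(s, t), *_ = np.linalg.lstsq(A, b, rcond=None)
print(s, t)        # approximately 1.2857, 0.5714
print(9/7, 4/7)    # the exact values found above
```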
Least squares approximation

1 We can also try to fit an $m$th degree polynomial
\[
y(x) = s_0 + s_1 x + s_2 x^2 + \cdots + s_m x^m
\]
to the data points $(x_i, y_i)$, $i = 1, \ldots, n$, so as to minimize the error in the least squares sense.
2 In this case $s_0, s_1, \ldots, s_m$ are the variables and we have
\[
A = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^m \\
1 & x_2 & x_2^2 & \cdots & x_2^m \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^m
\end{pmatrix}, \quad
b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
x = \begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_m \end{pmatrix}.
\]
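The same normal-equations recipe works for any degree $m$: the design matrix has columns $1, x, x^2, \ldots, x^m$. A minimal NumPy sketch, with made-up data and $m = 2$ chosen only for illustration:

```python
import numpy as np

# Illustrative data points (x_i, y_i) and polynomial degree m
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([ 1.0, 0.5, 1.0, 3.0, 6.5])
m = 2

# Design matrix with columns 1, x, x^2, ..., x^m
A = np.vander(x, m + 1, increasing=True)
b = y

# Solve the normal equations A^t A s = A^t b for s_0, ..., s_m
coeffs = np.linalg.solve(A.T @ A, A.T @ b)
print(coeffs)   # [s_0, s_1, ..., s_m]
```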

Fitting data points by a parabola

1 The path of a projectile is a parabola. By observing the projectile at a few points, we can describe the parabolic path which best fits the data.
2 This is done by the least squares approximation, first described by C. F. Gauss, who used it to find the trajectory of Ceres, the largest asteroid in the Solar System.
3 Problem. Fit heights $b_1, b_2, \ldots, b_m$ at the times $t_1, t_2, \ldots, t_m$ by a parabola $c + dt + et^2$.
4 Solution. If each observation is correct then the equations are
\[
\begin{aligned}
c + dt_1 + et_1^2 &= b_1 \\
c + dt_2 + et_2^2 &= b_2 \\
&\;\;\vdots \\
c + dt_m + et_m^2 &= b_m
\end{aligned}
\]

Fitting by a parabola

1 The best parabola $c + dt + et^2$ which fits the $m$ data points is found by solving the normal equations $A^tAx = A^tb$ where $x = (c, d, e)^t$ and
\[
A = \begin{pmatrix}
1 & t_1 & t_1^2 \\
1 & t_2 & t_2^2 \\
\vdots & \vdots & \vdots \\
1 & t_m & t_m^2
\end{pmatrix}.
\]

2 Since $t_1, t_2, \ldots, t_m$ are distinct and $m \ge 3$, the rank of $A$ is three: by the Vandermonde determinant, the determinant of the submatrix formed by the first three rows (and all three columns) is nonzero.
3 Hence $\operatorname{rank} A^tA = \operatorname{rank} A = 3$. This means $A^tA$ is invertible. Therefore the only solution to $A^tAx = A^tb$ is given by
\[
x = (A^tA)^{-1}A^tb.
\]
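The closed form $x = (A^tA)^{-1}A^tb$ can be mirrored directly in code. The sketch below uses hypothetical observation times and heights; the explicit inverse is kept only to match the formula (in practice one would solve the linear system rather than invert $A^tA$).

```python
import numpy as np

# Hypothetical observation times and heights
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([0.0, 4.5, 6.0, 4.5])

# Columns 1, t, t^2 for the parabola c + d t + e t^2
A = np.column_stack([np.ones_like(t), t, t**2])

# x = (A^t A)^{-1} A^t b, exactly as in the formula above
c, d, e = np.linalg.inv(A.T @ A) @ (A.T @ b)
print(c, d, e)
```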

Review of orthogonal matrices
1 A real $n \times n$ matrix $Q$ is called orthogonal if $Q^tQ = I$.
2 A $2 \times 2$ orthogonal matrix has two possibilities:
\[
A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \quad \text{or} \quad
B = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.
\]
3 $T_A$ represents rotation of $\mathbb{R}^2$ by $\theta$ radians in the anticlockwise direction.
4 The matrix $B$ represents a reflection with respect to the line $y = \tan(\theta/2)\,x$.
5 Definition. A hyperplane in $\mathbb{R}^n$ is a subspace of dimension $n - 1$.
6 A linear map $T : \mathbb{R}^n \to \mathbb{R}^n$ is called a reflection with respect to a hyperplane $H$ if $Tu = -u$ for all $u \perp H$ and $Tv = v$ for all $v \in H$.
7 Definition. Let $u$ be a unit vector in $\mathbb{R}^n$. The Householder matrix of $u$, for reflection with respect to $L(u)^{\perp}$, is $H = I - 2uu^t$. Hence $Hu = u - 2u(u^tu) = -u$. If $w \perp u$ then $Hw = w - 2uu^tw = w$.
8 So $H$ induces reflection in the hyperplane perpendicular to the line $L(u)$.
9 Exercise. Show that $H$ is symmetric and orthogonal.
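The exercise can at least be checked numerically (a sanity check, not a proof): build $H = I - 2uu^t$ for a random unit vector $u$ and test the claimed properties.

```python
import numpy as np

n = 4                                   # illustrative dimension
u = np.random.rand(n)
u = u / np.linalg.norm(u)               # unit vector

H = np.eye(n) - 2.0 * np.outer(u, u)    # Householder matrix I - 2 u u^t

print(np.allclose(H, H.T))              # H is symmetric
print(np.allclose(H.T @ H, np.eye(n)))  # H is orthogonal
print(np.allclose(H @ u, -u))           # H reflects u to -u
```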
The QR decomposition of a matrix

1 Problem. Suppose $A = [u_1\ u_2\ \ldots\ u_n]$ is an $m \times n$ matrix whose column vectors $u_1, u_2, \ldots, u_n$ are linearly independent. Let $Q$ be the matrix whose column vectors are $q_1, q_2, \ldots, q_n$, which are obtained from $u_1, u_2, \ldots, u_n$ by applying the Gram-Schmidt orthonormalization process. How are $A$ and $Q$ related?
2 Let $C(A)$ denote the column space of $A$. Then $\{q_1, q_2, \ldots, q_n\}$ is an orthonormal basis of $C(A)$. Hence
\[
\begin{aligned}
u_1 &= \langle q_1, u_1\rangle q_1 + \langle q_2, u_1\rangle q_2 + \cdots + \langle q_n, u_1\rangle q_n \\
u_2 &= \langle q_1, u_2\rangle q_1 + \langle q_2, u_2\rangle q_2 + \cdots + \langle q_n, u_2\rangle q_n \\
&\;\;\vdots \\
u_n &= \langle q_1, u_n\rangle q_1 + \langle q_2, u_n\rangle q_2 + \cdots + \langle q_n, u_n\rangle q_n
\end{aligned}
\]

The QR decomposition of a matrix

1 These equations can be written in matrix form as
\[
[u_1\ u_2\ \ldots\ u_n] = [q_1\ q_2\ \ldots\ q_n]
\begin{pmatrix}
\langle q_1, u_1\rangle & \langle q_1, u_2\rangle & \ldots & \langle q_1, u_n\rangle \\
\langle q_2, u_1\rangle & \langle q_2, u_2\rangle & \ldots & \langle q_2, u_n\rangle \\
\vdots & \vdots & & \vdots \\
\langle q_n, u_1\rangle & \langle q_n, u_2\rangle & \ldots & \langle q_n, u_n\rangle
\end{pmatrix}
\]

2 By the Gram-Schmidt construction, $q_i \perp u_j$ for all $j \le i - 1$.
3 Hence the matrix $R = (\langle q_i, u_j\rangle)$ is upper triangular.
4 Moreover, none of the diagonal entries are zero. Hence $R$ is an invertible upper triangular matrix.
5 This gives us $A = QR$ where $Q = [q_1\ q_2\ \ldots\ q_n]$.
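The construction above translates into a short classical Gram-Schmidt routine that returns $Q$ and $R = (\langle q_i, u_j\rangle)$. This is only a sketch for small matrices with independent columns; in practice one would use a numerically stabler variant or a library routine.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: Q has orthonormal columns,
    R[i, j] = <q_i, u_j> is upper triangular, and A = Q R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float).copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # <q_i, u_j>
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)       # equals <q_j, u_j>
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(A, Q @ R))              # True: A = QR
```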

The QR decomposition and the normal equations

1 If $A$ is a square matrix then the column vectors of $Q$ form an orthonormal basis of $\mathbb{R}^n$ and $QQ^t = I$.
2 Matrices which satisfy this condition are called orthogonal matrices.
3 The normal equations for the least squares approximation are $A^tAx = A^tb$. These correspond to the system $Ax = b$. Let us use the QR decomposition of $A$ to solve the normal equations.
4 Theorem. Let $A$ be an $m \times n$ matrix with linearly independent column vectors. Let $A = QR$ be its QR-decomposition. Then for every $b \in \mathbb{R}^m$ the system $Ax = b$ has a unique least squares solution given by $x = R^{-1}Q^tb$.
5 Proof. Substitute $A = QR$ in the equations $A^tAx = A^tb$ to get
\[
(QR)^tQRx = (QR)^tb \implies R^t(Q^tQ)Rx = R^tQ^tb \implies R^tRx = R^tQ^tb \implies x = R^{-1}Q^tb,
\]
since $Q^tQ = I$ (the columns of $Q$ are orthonormal) and $R^t$, $R$ are invertible.
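Applied to the earlier straight-line example, the theorem gives a two-step recipe: factor $A = QR$, then solve the triangular system $Rx = Q^tb$. A minimal NumPy sketch follows; NumPy's reduced QR may differ from the Gram-Schmidt one by signs, but the least squares solution is the same.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
b = np.array([1.0, 1.0, 3.0])

Q, R = np.linalg.qr(A)            # reduced QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular

x = np.linalg.solve(R, Q.T @ b)   # x = R^{-1} Q^t b
print(x)                          # again s = 9/7, t = 4/7 (approximately)
```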

