LECTURE 4: Linear Least Squares Problem, Normal Equation and QR Factorization

MAT 517: COMPUTATIONAL LINEAR ALGEBRA
Assoc. Prof. Dr. Noor Atinah Ahmad
School of Mathematical Sciences
Universiti Sains Malaysia
nooratinah@usm.my

Linear least squares problem: Some examples
• Polynomial regression
• Linear prediction

Recall: Simple linear regression
What is linear regression?
• Fit a straight line to your data: $y(t) = \alpha_0 + \alpha_1 t$.
• Suppose your dataset is available for $t_1, t_2, \dots, t_M$, i.e.,
  Observation data: $b = [b(t_1), b(t_2), \dots, b(t_M)]^T$
  Model for your observation: $y = [\alpha_0 + \alpha_1 t_1, \; \alpha_0 + \alpha_1 t_2, \; \dots, \; \alpha_0 + \alpha_1 t_M]^T$
Polynomial regression
• Fit a polynomial to your data: $y(t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + \cdots + \alpha_N t^N$.
• Suppose your dataset is available for $t_1, t_2, \dots, t_M$, i.e.,
  Observation data: $b = [b(t_1), b(t_2), \dots, b(t_M)]^T$
  Model for your observation: $y = [\alpha_0 + \alpha_1 t_1 + \alpha_2 t_1^2 + \cdots + \alpha_N t_1^N, \; \dots, \; \alpha_0 + \alpha_1 t_M + \alpha_2 t_M^2 + \cdots + \alpha_N t_M^N]^T$
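The linear algebra of your problem
• Your model (polynomial of degree N):
$y = \begin{bmatrix} \alpha_0 + \alpha_1 t_1 + \alpha_2 t_1^2 + \cdots + \alpha_N t_1^N \\ \alpha_0 + \alpha_1 t_2 + \alpha_2 t_2^2 + \cdots + \alpha_N t_2^N \\ \vdots \\ \alpha_0 + \alpha_1 t_M + \alpha_2 t_M^2 + \cdots + \alpha_N t_M^N \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^N \\ 1 & t_2 & t_2^2 & \cdots & t_2^N \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_M & t_M^2 & \cdots & t_M^N \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix}}_{x} = Ax.$
• Hypothesis: $b \approx y$ (i.e. b is ‘almost’ an Nth degree polynomial), or $b = y + \varepsilon$.
• To find the ‘best fit’ for b, we find the optimal x by minimizing the distance
$\min_x \|b - y\|_2^2 = \min_x \|b - Ax\|_2^2.$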
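To make the construction concrete, here is a minimal sketch in plain Python. The lecture itself uses MATLAB; Python is used here only so the sketch runs anywhere. It builds the M × (N+1) design matrix A row by row and evaluates the objective $\|b - Ax\|_2^2$. The helper names `vandermonde` and `residual_sq` and the sample data are hypothetical and serve only as an illustration.

```python
def vandermonde(t, N):
    """Design matrix for a degree-N polynomial: row i is [1, t_i, t_i^2, ..., t_i^N]."""
    return [[float(ti) ** k for k in range(N + 1)] for ti in t]


def residual_sq(A, x, b):
    """Objective of the least squares problem: ||b - A x||_2^2."""
    total = 0.0
    for row, bi in zip(A, b):
        yi = sum(aij * xj for aij, xj in zip(row, x))  # model value y(t_i)
        total += (bi - yi) ** 2
    return total


# Hypothetical data generated exactly by y(t) = 1 + 2t + 3t^2 (N = 2).
t = [0.0, 1.0, 2.0, 3.0]
b = [1.0 + 2.0 * ti + 3.0 * ti ** 2 for ti in t]
A = vandermonde(t, 2)

print(A[2])                                 # [1.0, 2.0, 4.0]
print(residual_sq(A, [1.0, 2.0, 3.0], b))   # 0.0: here b lies in range(A)
```

With noisy observations, no choice of x makes the residual zero. The minimizing x then has to be found by the methods developed in the rest of this lecture.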
Linear prediction: Predicting the future
I will build a model to predict the stock market on the assumption that the KLSE index depends on the prior day’s
• Dow Jones index, p(t);
• exchange rate of the Malaysian Ringgit, r(t);
• happiness index of foreign investors (assuming such an index exists), f(t).
(No need to worry about the credibility of my model at this point. People have been trying to model the stock market for decades and there are hundreds of credible and not-so-credible models… so another one won’t hurt.)

My stock market model
I’m assuming a linear model of the form
$mklse(t) = x_1 p(t-1) + x_2 r(t-1) + x_3 f(t-1).$
(mklse(t): KLSE index on day t predicted by my model)
To complete the model, I need to figure out what $x = [x_1, x_2, x_3]^T$ is.
The data
Suppose I’m able to get access to historical data for 300 business days:
$a_1 = [p(t_1), p(t_2), \dots, p(t_{300})]^T,$
$a_2 = [r(t_1), r(t_2), \dots, r(t_{300})]^T,$
$a_3 = [f(t_1), f(t_2), \dots, f(t_{300})]^T,$
$b = [tklse(t_1 + 1), tklse(t_2 + 1), \dots, tklse(t_{300} + 1)]^T.$
(tklse(t): true KLSE index on day t)
I shall now tune x so that the model fits best.
How might I do that?
First, I’m going to put on my math thinking cap.
My mathematical intuition tells me…
I have actual data, so why don’t I just plug them into my model? This is what I have…
$mklse(t_1 + 1) = x_1 p(t_1) + x_2 r(t_1) + x_3 f(t_1)$
$mklse(t_2 + 1) = x_1 p(t_2) + x_2 r(t_2) + x_3 f(t_2)$
$\vdots$
$mklse(t_{300} + 1) = x_1 p(t_{300}) + x_2 r(t_{300}) + x_3 f(t_{300})$
(mklse(t): my model for the KLSE index on day t)

Not a bad idea… now I have 300 linear equations.
Throw in some matrices
I’m still nowhere near getting the value of $x = [x_1, x_2, x_3]^T$, but I can make a lot of progress if I rewrite my linear equations in matrix form, with the data vectors $a_1, a_2, a_3$ as columns:
$\hat{b} = \begin{bmatrix} mklse(t_1 + 1) \\ mklse(t_2 + 1) \\ \vdots \\ mklse(t_{300} + 1) \end{bmatrix} = \underbrace{\begin{bmatrix} p(t_1) & r(t_1) & f(t_1) \\ p(t_2) & r(t_2) & f(t_2) \\ \vdots & \vdots & \vdots \\ p(t_{300}) & r(t_{300}) & f(t_{300}) \end{bmatrix}}_{A = [a_1 \; a_2 \; a_3]} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = Ax.$
How to get the ‘best’ model?
OK, let’s give ‘best’ a precise meaning:
• We would like our model observation $\hat{b}$ to be as close as possible to the true observation b.
• Define the distance between $\hat{b}$ and b by $\|b - \hat{b}\|_2^2$.
• Solve the following minimization problem to get x:
$\min_x \|b - \hat{b}\|_2^2 = \min_x \|b - Ax\|_2^2.$
• Let $x^*$ be the solution to the minimization problem.
Then our ‘best’ model is
$mklse(t) = x_1^* p(t-1) + x_2^* r(t-1) + x_3^* f(t-1).$
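The bookkeeping above can be sketched in a few lines of Python: stack the data columns $a_1, a_2, a_3$ into A, then measure $\|b - \hat{b}\|_2^2$. The four "days" of data below are made up purely for illustration and stand in for the 300 days of real data. The weights x are an arbitrary trial guess, not the optimum $x^*$. The helper names are hypothetical.

```python
def columns_to_matrix(*cols):
    """Stack column vectors a_1, a_2, ... side by side; returns the matrix as a list of rows."""
    return [list(row) for row in zip(*cols)]


def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def dist_sq(b, b_hat):
    """Squared distance ||b - b_hat||_2^2 between true and model observations."""
    return sum((bi - bhi) ** 2 for bi, bhi in zip(b, b_hat))


# Made-up data for 4 days (NOT real market data).
a1 = [2.0, 1.0, 0.0, 3.0]    # p(t_i)
a2 = [1.0, 0.0, 1.0, 1.0]    # r(t_i)
a3 = [0.0, 1.0, 1.0, 2.0]    # f(t_i)
b = [3.0, 2.0, 1.0, 6.0]     # tklse(t_i + 1)

A = columns_to_matrix(a1, a2, a3)   # 4 x 3, row i = [p(t_i), r(t_i), f(t_i)]
x = [1.0, 1.0, 1.0]                 # a trial choice of weights
b_hat = matvec(A, x)                # model predictions mklse(t_i + 1)
print(b_hat, dist_sq(b, b_hat))     # [3.0, 2.0, 2.0, 6.0] 1.0
```

The least squares problem does the same thing, except that it minimizes `dist_sq` over all x instead of trying a single guess.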

Linear least squares problem: The normal equation
• Normal equation
• Pseudoinverse
The minimization problem
In general, the problem
$\min_x \|b - \hat{b}\|_2^2 = \min_x \|b - Ax\|_2^2,$
with $A \in \mathbb{R}^{M \times N}$ and $b, \hat{b} \in \mathbb{R}^M$, is called the linear least squares problem, i.e. the problem of finding the best linear model with respect to the $L_2$ norm.

Several things to note:
• $\hat{b} = Ax \in \operatorname{range}(A)$;
• If $b \in \operatorname{range}(A)$, then there exists $x \in \mathbb{R}^N$ such that $b = Ax$ and $\|b - \hat{b}\|_2^2 = 0$ (perfect model).
The ‘not so perfect’ world
• It is a common situation that some physical problem leads to a linear system Ax = b that should be consistent on theoretical grounds. However, due to “measurement errors” in the entries of A and b, the perturbed system fails to be consistent.
Original system: $Ax = b$
Perturbed system: $\tilde{A} x = \tilde{b}$
Errors: $\tilde{A} = A + E, \quad \tilde{b} = b + \delta b$
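From the perspective of linear transformation
[Figure: A maps $x \in \mathbb{R}^N$ to $Ax \in \operatorname{range}(A) \subset \mathbb{R}^M$; the vector $b \in \mathbb{R}^M$ is also shown]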
Measuring minimum distance
Simplify the problem to $\mathbb{R}^3$:
[Figure: the vector b and the plane range(A) in $\mathbb{R}^3$]
How do you determine the minimum distance from b to the plane range(A)?
Measuring minimum distance (cont’d)
Find an orthogonal projection from b to range(A):
[Figure: b, its orthogonal projection $Ax^*$ onto the plane range(A) through 0 in $\mathbb{R}^3$, and the residual r joining them]
• The distance is measured by the ‘length’ of r, the residual vector.
• The optimum solution is then $x^*$ (called the least squares solution of Ax = b).
The solution method
• What is r? $r = b - Ax$.
• r is orthogonal to range(A). Thus $r \in \operatorname{range}(A)^{\perp}$.
• $\operatorname{range}(A)^{\perp} = \operatorname{null}(A^T)$.
• which means…
$A^T r = 0$,
or… $A^T (b - Ax) = 0$,
or… $A^T A x = A^T b$.
THE NORMAL EQUATION
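As a check on the derivation, the sketch below fits a straight line to three hypothetical points and solves the 2 × 2 normal equation by Cramer's rule. It then confirms that the residual satisfies $A^T r = 0$. It is plain Python with exact rational arithmetic from the standard `fractions` module, and the data points are made up for illustration.

```python
from fractions import Fraction as F


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Fit y = x1 + x2*t to the made-up points (t, b) = (0, 1), (1, 2), (2, 2).
A = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)]]
b = [F(1), F(2), F(2)]

AtA = matmul(transpose(A), A)   # [[3, 3], [3, 5]]
Atb = matvec(transpose(A), b)   # [5, 6]

# Cramer's rule for the 2 x 2 normal equation A^T A x = A^T b.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det,
     (AtA[0][0] * Atb[1] - Atb[0] * AtA[1][0]) / det]

r = [bi - yi for bi, yi in zip(b, matvec(A, x))]        # residual r = b - A x
print([str(v) for v in x])                              # ['7/6', '1/2']
print([str(v) for v in matvec(transpose(A), r)])        # ['0', '0']: r is orthogonal to range(A)
```

Here the residual is nonzero, namely $r = [-1/6, 1/3, -1/6]^T$, because b is not in range(A). Even so, it is exactly orthogonal to both columns of A.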
What do we know about the normal equation
• If $A \in \mathbb{R}^{M \times N}$, then the normal equation is an N × N linear system.
• The normal system is consistent because it is satisfied by the least squares solution.
• The normal equation may have infinitely many solutions, in which case all of its solutions are least squares solutions of Ax = b.

When does the normal equation have a unique solution?
The normal equation has a unique solution when… These statements are equivalent:
• The columns of A are linearly independent.
• A is full rank (full column rank, rank(A) = N).
• $A^T A$ is nonsingular.
If $A^T A$ is nonsingular, $(A^T A)^{-1}$ exists. Thus, the normal equation solves as
$x = (A^T A)^{-1} A^T b.$
This solution is unique.

The pseudoinverse
• When A is full rank, we can define a pseudoinverse for A, which is
$A^{\dagger} = (A^T A)^{-1} A^T.$
• So, in terms of the pseudoinverse, the unique least squares solution can be written as
$x = A^{\dagger} b.$
• Condition number:
$\operatorname{cond}(A) = \|A\| \, \|A^{\dagger}\|.$
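For a full-rank A, the pseudoinverse can be formed directly from its definition. The sketch below uses plain Python with exact `fractions` arithmetic and a made-up 3 × 2 matrix. It checks two consequences of $A^\dagger = (A^T A)^{-1} A^T$. First, $A^\dagger A = I_N$. Second, $A^\dagger b$ reproduces the least squares solution. Forming $A^\dagger$ explicitly is for illustration only, since it inherits the drawbacks of the normal equation.

```python
from fractions import Fraction as F


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[F(1), F(0)], [F(1), F(1)], [F(1), F(2)]]                # full rank: M = 3, N = 2
A_pinv = matmul(inv2(matmul(transpose(A), A)), transpose(A))  # (A^T A)^{-1} A^T

print(matmul(A_pinv, A) == [[1, 0], [0, 1]])        # True: A^dagger A = I_N
b = [[F(1)], [F(2)], [F(2)]]
print(matmul(A_pinv, b) == [[F(7, 6)], [F(1, 2)]])  # True: x = A^dagger b
```

Note that $A^\dagger$ is only a left inverse. The product $A A^\dagger$ is the 3 × 3 orthogonal projector onto range(A), not $I_M$.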

Solving the normal equation
For full-rank A, the matrix $A^T A$ is symmetric positive definite (SHOW THIS!!!).
Therefore we can use Cholesky factorization to solve the normal equation
$A^T A x = A^T b.$

Cost:
• $N^2 M$ to compute $A^T A$;
• $N^3/6$ for the Cholesky factorization.

TOO COSTLY!!!
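The Cholesky route can be sketched in plain Python as follows. This is an illustration, not a tuned implementation. It factors $A^T A = LL^T$, solves $Ly = A^T b$ by forward substitution, and then solves $L^T x = y$ by back substitution. The test data are the normal equation of the 3 × 2 example later in this lecture, whose solution is $x = [1, 1]^T$.

```python
import math


def cholesky(S):
    """Return lower-triangular L with S = L L^T, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = S[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not (numerically) positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def solve_normal_equation(AtA, Atb):
    """Solve A^T A x = A^T b: L y = A^T b (forward), then L^T x = y (backward)."""
    L = cholesky(AtA)
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (Atb[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        # entries of L^T are read as (L^T)[i][k] = L[k][i]
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


x = solve_normal_equation([[14.0, 32.0], [32.0, 77.0]], [46.0, 109.0])
print(x)   # approximately [1.0, 1.0]
```

The sketch takes $A^T A$ as given. In practice, forming that product is the expensive and numerically harmful step.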
A catastrophic example
$A = \begin{bmatrix} 1 & 1 \\ 10^{-4} & 0 \\ 0 & 10^{-4} \end{bmatrix}$
WHAT DO YOU MAKE OF THIS MATRIX?
$A^T A = \begin{bmatrix} 1 + 10^{-8} & 1 \\ 1 & 1 + 10^{-8} \end{bmatrix}.$
ILL‐CONDITIONED!
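The effect can be reproduced in double precision with a few lines of plain Python. This is a sketch, and `gram` and `cond2_of_gram` are hypothetical helper names. The first matrix is $A = [1, 1; 10^{-4}, 0; 0, 10^{-4}]$. Its $A^T A = \begin{bmatrix} s & 1 \\ 1 & s \end{bmatrix}$ with $s = 1 + 10^{-8}$ has eigenvalues $s \pm 1$. Its 2-norm condition number is therefore $(s+1)/(s-1) \approx 2 \times 10^8$, which gives $\text{cond}_2(A) \approx 1.4 \times 10^4$. The second matrix is an illustrative variation that is not on the slide: it uses $10^{-8}$ in place of $10^{-4}$. There $1 + 10^{-16}$ rounds to 1, so the computed $A^T A$ is exactly singular, although A still has full rank.

```python
def gram(A):
    """Form A^T A for A stored as a list of rows (the step that loses information)."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]


def cond2_of_gram(S):
    """2-norm condition number of S = [[s, 1], [1, s]], s > 1: eigenvalues are s + 1 and s - 1."""
    s = S[0][0]
    return (s + 1.0) / (s - 1.0)


A = [[1.0, 1.0], [1e-4, 0.0], [0.0, 1e-4]]      # the matrix on the slide
S = gram(A)
print(cond2_of_gram(S))                          # roughly 2e8

A_bad = [[1.0, 1.0], [1e-8, 0.0], [0.0, 1e-8]]  # illustrative variation
print(gram(A_bad))                               # [[1.0, 1.0], [1.0, 1.0]]: singular
```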
A few facts
• The normal equation method costs too much because we need to form $A^T A$.
• It squares the condition number (compare $\operatorname{cond}_2(A)^2$ and $\operatorname{cond}_2(A^T A)$ in the case M = N; in the 2-norm the two are in fact equal for any full-rank A).
• It’s useful for small hand calculations.
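Example 1

N = 2, M = 3:
$A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}, \quad b = \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}, \quad A^T A = \begin{bmatrix} 14 & 32 \\ 32 & 77 \end{bmatrix}, \quad A^T b = \begin{bmatrix} 46 \\ 109 \end{bmatrix}.$
$\operatorname{cond}_\infty(A) = 16; \quad \operatorname{cond}_\infty(A^T A) \approx 220.$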

Normal equation for Example 1
$\begin{bmatrix} 14 & 32 \\ 32 & 77 \end{bmatrix} x = \begin{bmatrix} 46 \\ 109 \end{bmatrix}.$
Solution: $x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. That was easy…
But hang on a minute. Notice how fast the numbers grow.
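The numbers in Example 1 can be checked exactly with Python's `fractions` module. The values 16 and ≈ 220 are consistent with the ∞-norm (maximum absolute row sum) together with cond(A) = ‖A‖ ‖A†‖. This sketch assumes that choice of norm.

```python
from fractions import Fraction as F


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def norm_inf(A):
    """Infinity norm: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


A = [[F(1), F(4)], [F(2), F(5)], [F(3), F(6)]]
b = [[F(5)], [F(7)], [F(9)]]

AtA = matmul(transpose(A), A)        # [[14, 32], [32, 77]]
Atb = matmul(transpose(A), b)        # [[46], [109]]
(p, q), (_, s) = AtA
det = p * s - q * q                  # 54
AtA_inv = [[s / det, -q / det], [-q / det, p / det]]

x = matmul(AtA_inv, Atb)             # [[1], [1]]
A_pinv = matmul(AtA_inv, transpose(A))
print(norm_inf(A) * norm_inf(A_pinv))            # 16
print(float(norm_inf(AtA) * norm_inf(AtA_inv)))  # about 220.02
```

For comparison, the corresponding 2-norm values are about 12.3 and 151. In the 2-norm, $\operatorname{cond}_2(A^T A) = \operatorname{cond}_2(A)^2$ holds exactly. The ∞-norm values only roughly follow the squaring rule (16² = 256 versus 220).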
QR factorization

WE ALREADY HAVE LU FACTORIZATION… WHY DO WE NEED QR FACTORIZATION?
The LU factorization as a matrix transformation
The LU factorization is a linear transformation of A to U:
$MA = L^{-1} A = U.$

The transformation matrix M is a product of elementary matrices.
What if the transformation matrix is an orthogonal matrix?
$Q^T A = R \;\Rightarrow\; A = QR$
Why choose orthogonal matrices?
Multiplication by orthogonal matrices is stable. The $L_2$ norm does not grow:
$\|Qv\|_2 = \|v\|_2, \qquad \|QA\|_2 = \|A\|_2.$
PROVE THESE RESULTS!!!
So, $\|R\|_2 = \|Q^T A\|_2 = \|A\|_2$.
What does $\|Qv\|_2 = \|v\|_2$ tell us about orthogonal transformations?
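Here is a quick numerical illustration of $\|Qv\|_2 = \|v\|_2$. It is not a proof. It uses a 2 × 2 rotation matrix as the orthogonal Q, and the angle and the vector are arbitrary choices made for this sketch.

```python
import math


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def norm2(v):
    return math.sqrt(sum(x * x for x in v))


theta = 0.7                                   # arbitrary angle
Q = [[math.cos(theta), -math.sin(theta)],     # rotation: Q^T Q = I
     [math.sin(theta), math.cos(theta)]]
v = [3.0, 4.0]

print(norm2(v), norm2(matvec(Q, v)))          # both 5, up to rounding
```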
M N
QR factorization:
M N
The QR factorization of an matrix A is
A = QR,
where
M N
 Q is with orthonormal columns;
NN
 R is upper triangular with positive diagonals.

THIS DEFINES THE ECONOMY SIZE QR FACTORIZATION (OR
‘REDUCED QR’)
In MATLAB:
[Q,R]=qr(A,0);
The second argument 0 matters.

About Q and R
• Q has orthonormal columns, so $Q^T Q = I$, where I is the N × N identity matrix.

• R has positive diagonals, so it is nonsingular.
How QR helps solve the least squares problem
We will start from the normal equation
$A^T A x = A^T b.$
Let A be full rank and A = QR. Using the QR factorization we can write the normal equation as
$(QR)^T (QR) x = (QR)^T b,$
$R^T Q^T Q R x = R^T Q^T b.$
Since $Q^T Q = I$ and R (hence $R^T$) is nonsingular, the normal equation is reduced to
$Rx = Q^T b.$
Solving the reduced normal equation
• The reduced normal equation is upper triangular.
• It can be solved by back solving (backward substitution).
• In MATLAB, given A and b, the least squares solution is obtained as follows:

[Q,R] = qr(A,0);
c = Q'*b;
x = R\c;
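The back substitution step, which plays the role of `x = R\c` above, can be sketched in plain Python. The 3 × 3 upper-triangular system below is a made-up example.

```python
def back_substitution(R, c):
    """Solve R x = c for upper-triangular R with nonzero diagonal, last unknown first."""
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][k] * x[k] for k in range(i + 1, n))   # contribution of known unknowns
        x[i] = (c[i] - s) / R[i][i]
    return x


R = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
c = [3.0, 13.0, 8.0]
print(back_substitution(R, c))   # [1.0, 3.0, 2.0]
```

The inner sums use N(N-1)/2 multiplications in total, in line with the N²/2 cost quoted for solving $Rx = Q^T b$.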
Problem with pre‐QR factorization of A
• You need to store Q.
• Q can be rather big.
• Q can be very dense with no particular structure.
Cost:
• Compute the QR factorization (later): $N^2 M - N^3/3$
• Multiply b by $Q^T$: $O(MN)$
• Solve $Rx = Q^T b$: $N^2/2$
I think we can do better.
The classical Gram‐Schmidt algorithm

How can we improve the QR method?
I’m going to try and get some answers from the factorization itself.

Let’s write A, Q and R in column-wise representation:
$A = [\, a_1 \;\; a_2 \;\; \cdots \;\; a_N \,], \quad Q = [\, q_1 \;\; q_2 \;\; \cdots \;\; q_N \,], \quad R = [\, r_1 \;\; r_2 \;\; \cdots \;\; r_N \,].$
Several tips
Our analyses will require us to be quick with matrix algebra, so here are a few tips:
• $A b_i$: the ith column of AB ($b_i$ is the ith column of B)
• $a_i^T B$: the ith row of AB ($a_i^T$ is the ith row of A)
• $A e_i$: the ith column of A
• $e_i^T A$: the ith row of A
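These identities are easy to check numerically. The sketch below uses small made-up integer matrices in plain Python. Python indices start at 0, so `i = 1` picks the second column or row.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def column(M, i):
    return [row[i] for row in M]


def e(n, i):
    """Canonical unit vector e_i in R^n, as an n x 1 matrix."""
    return [[1.0 if k == i else 0.0] for k in range(n)]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]      # 3 x 2
B = [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]     # 2 x 3
AB = matmul(A, B)
i = 1                                         # second column / row

b_i = [[x] for x in column(B, i)]             # i-th column of B, as a 2 x 1 matrix
e3_row = [[r[0] for r in e(3, i)]]            # e_i^T in R^3, as a 1 x 3 matrix

print(column(matmul(A, b_i), 0) == column(AB, i))     # True: A b_i is column i of AB
print(matmul([A[i]], B)[0] == AB[i])                  # True: a_i^T B is row i of AB
print(column(matmul(A, e(2, i)), 0) == column(A, i))  # True: A e_i is column i of A
print(matmul(e3_row, A)[0] == A[i])                   # True: e_i^T A is row i of A
```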
Work with R first
R is upper triangular, so we can write its columns more specifically as:
$r_1 = \begin{bmatrix} r_{11} \\ 0 \\ \vdots \\ 0 \end{bmatrix} = r_{11} e_1, \quad r_2 = \begin{bmatrix} r_{12} \\ r_{22} \\ 0 \\ \vdots \\ 0 \end{bmatrix} = r_{12} e_1 + r_{22} e_2, \quad \dots, \quad r_N = \begin{bmatrix} r_{1N} \\ r_{2N} \\ \vdots \\ r_{NN} \end{bmatrix} = \sum_{i=1}^{N} r_{iN} e_i,$
where $e_1, e_2, \dots, e_N$ are the Euclidean unit vectors (canonical vectors) in $\mathbb{R}^N$.

Now we work on QR
Write the matrix product QR in terms of its columns:
$QR = [\, Qr_1 \;\; Qr_2 \;\; \cdots \;\; Qr_N \,].$
Now, the first column is
$Qr_1 = Q \begin{bmatrix} r_{11} \\ 0 \\ \vdots \\ 0 \end{bmatrix} = r_{11} Q e_1 = r_{11} q_1.$
The rest of the columns in QR
The second column:
$Qr_2 = Q \begin{bmatrix} r_{12} \\ r_{22} \\ 0 \\ \vdots \\ 0 \end{bmatrix} = r_{12} Q e_1 + r_{22} Q e_2 = r_{12} q_1 + r_{22} q_2,$
etc.
The Nth column:
$Qr_N = Q \begin{bmatrix} r_{1N} \\ r_{2N} \\ \vdots \\ r_{NN} \end{bmatrix} = \sum_{i=1}^{N} r_{iN} Q e_i = \sum_{i=1}^{N} r_{iN} q_i.$
Now compare A and QR
A = QR implies
$a_1 = r_{11} q_1,$
$a_2 = r_{12} q_1 + r_{22} q_2,$
$\vdots$
$a_N = r_{1N} q_1 + r_{2N} q_2 + \cdots + r_{NN} q_N = \sum_{i=1}^{N} r_{iN} q_i.$
Columns of A are linear combinations of columns of Q.
Which means…
Relationship between A and Q
• The columns of A span the same vector space as the columns of Q.
• The columns of Q span range(A), i.e. the column space of A.
• The columns of Q form an orthonormal basis for range(A).
We can form Q by orthonormalizing the column vectors of A.
QR from scratch
• The idea: build an orthonormal basis from scratch.

• Transform a set of linearly independent vectors (a basis)
$\{a_1, a_2, \dots, a_N\}$
into an orthonormal basis
$\{q_1, q_2, \dots, q_N\}$.

• Go through the vectors one by one and subtract off the part of each vector that is not orthogonal to the previous ones, normalizing each time to produce an orthonormal set.
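Step 1
Start with our full rank matrix A. The columns $\{a_1, a_2, \dots, a_N\}$ are linearly independent and form a basis of range(A).

Let $v_1 = a_1$. Turn it into a unit vector $q_1 = v_1 / \|v_1\|_2$.
Consider $q_1$ and $a_2$:
$v_2 = a_2 - \alpha_2^{(1)} q_1, \qquad v_2 \perp q_1,$
[Figure: $a_2$, the unit vector $q_1$, and the part $v_2$ of $a_2$ orthogonal to $q_1$]
$\operatorname{span}\{a_1, a_2\} = \operatorname{span}\{q_1, a_2\} = \operatorname{span}\{q_1, v_2\}.$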

Step 1 (orthogonality condition)
We’re still missing something… $\alpha_2^{(1)}$?

Here’s where we apply the orthogonality condition:
$(v_2, q_1) = 0.$
This is equivalent to
$(a_2 - \alpha_2^{(1)} q_1, q_1) = (a_2, q_1) - \alpha_2^{(1)} (q_1, q_1) = 0,$
which, since $(q_1, q_1) = 1$, means
$\alpha_2^{(1)} = (a_2, q_1).$
At the end of Step 1
• Our starting vector (normalized): $q_1 = a_1 / \|a_1\|_2$.
• A vector orthogonal to $q_1$: $v_2 = a_2 - \alpha_2^{(1)} q_1$.
• Normalize the intermediate vector $v_2$ to get a unit vector $q_2 = v_2 / \|v_2\|_2$.
We are already two steps towards a QR factorization:
$a_1 = \|a_1\|_2 \, q_1$
$a_2 = \alpha_2^{(1)} q_1 + \|v_2\|_2 \, q_2$
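Step 2
Now consider $q_1, q_2, a_3$.
[Figure: $a_3$, the orthonormal vectors $q_1, q_2$, and the part $v_3$ of $a_3$ orthogonal to both]
$v_3 \perp \operatorname{span}\{q_1, q_2\}, \qquad \operatorname{span}\{a_1, a_2, a_3\} = \operatorname{span}\{q_1, q_2, v_3\},$
$v_3 = a_3 - \alpha_3^{(1)} q_1 - \alpha_3^{(2)} q_2.$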

Step 2 (orthogonality condition)
Two things we need to determine… $\alpha_3^{(1)}, \alpha_3^{(2)}$.
Two unknowns mean we need two orthogonality conditions. Luckily, we have exactly TWO:
$(v_3, q_1) = 0 \quad \& \quad (v_3, q_2) = 0.$
The first condition is equivalent to
$(a_3 - \alpha_3^{(1)} q_1 - \alpha_3^{(2)} q_2, q_1) = (a_3, q_1) - \alpha_3^{(1)} (q_1, q_1) - \alpha_3^{(2)} (q_2, q_1) = 0,$
and since $(q_1, q_1) = 1$ and $(q_2, q_1) = 0$, this means
$\alpha_3^{(1)} = (a_3, q_1).$
Use the second orthogonality condition to find $\alpha_3^{(2)}$.
At the end of Step 2
• A vector orthogonal to $q_1$ and $q_2$: $v_3 = a_3 - \alpha_3^{(1)} q_1 - \alpha_3^{(2)} q_2$.
• Normalize the intermediate vector $v_3$ to get a unit vector $q_3 = v_3 / \|v_3\|_2$.
Can you see the emerging pattern?
$a_1 = \|a_1\|_2 \, q_1$
$a_2 = \alpha_2^{(1)} q_1 + \|v_2\|_2 \, q_2$
$a_3 = \alpha_3^{(1)} q_1 + \alpha_3^{(2)} q_2 + \|v_3\|_2 \, q_3$
Step k (1 ≤ k ≤ N-1)
Set $v_{k+1} = a_{k+1} - \alpha_{k+1}^{(1)} q_1 - \alpha_{k+1}^{(2)} q_2 - \cdots - \alpha_{k+1}^{(k)} q_k$.
Normalize: $q_{k+1} = v_{k+1} / \|v_{k+1}\|_2$.

$v_{k+1} \perp \operatorname{span}\{q_1, q_2, \dots, q_k\}$,
$\operatorname{span}\{a_1, a_2, \dots, a_{k+1}\} = \operatorname{span}\{q_1, q_2, \dots, q_k, v_{k+1}\}$,
where $\{a_1, \dots, a_{k+1}\}$ is a basis, and $\{q_1, \dots, q_k, v_{k+1}\}$ an orthogonal basis, of the same (k+1)-dimensional subspace.
Write the orthogonality conditions at the kth step.
Then, use these conditions to find $\alpha_{k+1}^{(1)}, \alpha_{k+1}^{(2)}, \dots, \alpha_{k+1}^{(k)}$.
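Written out, the k orthogonality conditions at the kth step follow the same pattern as Steps 1 and 2. Using $(q_i, q_j) = 0$ for $i \ne j$ and $(q_j, q_j) = 1$, each condition isolates exactly one coefficient:

```latex
\begin{aligned}
(v_{k+1}, q_j) &= (a_{k+1}, q_j) - \sum_{i=1}^{k} \alpha_{k+1}^{(i)} \, (q_i, q_j)
                = (a_{k+1}, q_j) - \alpha_{k+1}^{(j)} = 0, \qquad j = 1, \dots, k, \\
\Longrightarrow \quad \alpha_{k+1}^{(j)} &= (a_{k+1}, q_j), \qquad j = 1, \dots, k.
\end{aligned}
```

With k = 1 and k = 2 this reproduces $\alpha_2^{(1)} = (a_2, q_1)$ and $\alpha_3^{(1)} = (a_3, q_1)$ from Steps 1 and 2.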
Realizing the QR factorization
Putting it all together:
$a_1 = \|v_1\|_2 \, q_1$
$a_2 = \|v_2\|_2 \, q_2 + \alpha_2^{(1)} q_1$
$a_3 = \|v_3\|_2 \, q_3 + \alpha_3^{(2)} q_2 + \alpha_3^{(1)} q_1$
$\vdots$
$a_N = \|v_N\|_2 \, q_N + \alpha_N^{(N-1)} q_{N-1} + \cdots + \alpha_N^{(1)} q_1$
Set:
$r_{ii} = \|v_i\|_2, \; i = 1, 2, \dots, N; \qquad r_{ij} = \alpha_j^{(i)}, \; i < j; \qquad r_{ij} = 0, \; i > j.$
Convince yourself that these equations produce an equation of the form A = QR, where Q has orthonormal columns (it is an orthogonal matrix when M = N) and R is an upper triangular matrix.
Classical Gram‐Schmidt: The algorithm
Input A

For i = 1, …, n
    v_i = a_i
    For j = 1, …, i-1
        r_ji = a_i^T q_j
    End
    For j = 1, …, i-1
        v_i = v_i - r_ji q_j
    End
    r_ii = ||v_i||_2;  q_i = v_i / r_ii
End
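The following is a direct transcription of the pseudocode above into plain Python, offered as a sketch. Matrices are lists of rows, and `cgs` is a hypothetical name. Every coefficient $r_{ji}$ is computed from the original column $a_i$, which is what makes this the classical variant. The test matrix is the 4 × 4 exercise matrix as printed in these notes, so the sketch can be used to check a hand computation.

```python
import math


def cgs(A):
    """Classical Gram-Schmidt QR of a full-rank M x N matrix A (list of rows).
    Returns Q (M x N, orthonormal columns) and R (N x N, upper triangular)."""
    M, N = len(A), len(A[0])
    a = [[A[i][j] for i in range(M)] for j in range(N)]   # columns a_1..a_N
    q = []                                                 # columns q_1..q_N
    R = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i):                                 # r_ji = a_i^T q_j
            R[j][i] = sum(x * y for x, y in zip(a[i], q[j]))
        v = a[i][:]
        for j in range(i):                                 # v_i = v_i - r_ji q_j
            v = [x - R[j][i] * y for x, y in zip(v, q[j])]
        R[i][i] = math.sqrt(sum(x * x for x in v))         # r_ii = ||v_i||_2
        q.append([x / R[i][i] for x in v])                 # q_i = v_i / r_ii
    Q = [[q[j][i] for j in range(N)] for i in range(M)]    # columns back to rows
    return Q, R


A = [[0.0, 1.0, 1.0, 1.0],
     [2.0, 1.0, 2.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
Q, R = cgs(A)
print([round(R[i][i], 4) for i in range(4)])   # the (positive) diagonal of R
```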
Gram‐Schmidt: Cost
WE’LL DO THIS IN CLASS
Let’s work on an example: QR factorization
Find the QR factorization of A using the CGS method:
$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 2 & 1 & 2 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$
Let’s work on an example: Least squares problem
Find the least squares solution of Ax = b using the QR factorization you obtained earlier, given that
$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 2 & 1 & 2 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.$
Classical Gram‐Schmidt: Some MATLAB
function [q,r] = class_gs(A)
% class_gs computes the QR factorization via the
% Classical Gram-Schmidt orthogonalization procedure
[m,n] = size(A);
q = zeros(m,n); r = zeros(n,n);
for j = 1:n
    % VECTORIZATION: all coefficients r(1:j-1,j) from one matrix-vector product
    r(1:j-1,j) = q(:,1:j-1)'*A(:,j);
    v = A(:,j) - q(:,1:j-1)*r(1:j-1,j);
    r(j,j) = norm(v);
    q(:,j) = v/r(j,j);
end
end
Advantages of CGS

• Easy to vectorize

• MATLAB automatically makes use of your multi-core computer

• Parallelizable
IS THIS ENOUGH???

Stability of CGS
Consider this matrix:
$A = \begin{bmatrix} 1 & 1 & 1 \\ 10^{-7} & 10^{-7} & 0 \\ 10^{-7} & 0 & 10^{-7} \end{bmatrix}.$
First, let’s use MATLAB to get the QR factorization and look at $\|Q^T Q - I\|$ and $\|QR - A\|$:
>> [qm,rm] = qr(A);
>> [norm(qm'*qm - eye(3)), norm(qm*rm - A)]
ans =
   3.8459e-16   4.4409e-16
On the other hand…
Let’s do the same using CGS:

>> [qc,rc] = class_gs(A);
>> [norm(qc'*qc - eye(3)), norm(qc*rc - A)]
ans =
   7.9928e-04   0
THIS IS BAD!!!
WHAT HAPPENED???
What about Q?
>> qc
qc =
   1.0000e+00   9.9920e-08   9.9920e-08
   1.0000e-07   9.9262e-15  -1.0000e+00
   1.0000e-07  -1.0000e+00  -7.9928e-04
I DON’T THINK THIS IS AN ORTHOGONAL MATRIX…
Try it with a linear system problem
>> b = A*ones(3,1);
>> [rm\qm'*b rc\qc'*b]
ans =
1.0000e+00 1.0056e+00
1.0000e+00 9.9840e-01
1.0000e+00 9.9600e-01
>> [norm(rm\qm'*b - ones(3,1)) norm(rc\qc'*b - ones(3,1))]
ans =
1.0175e-15 7.0632e-03
DO YOU EXPECT THIS?

Let me try and explain what happened
Let’s look at $Q^T Q$:
>> qm'*qm

ans =

   1.0000e+00  -1.3235e-23  -1.0703e-23
  -1.3235e-23   1.0000e+00   2.2204e-16
  -1.0703e-23   2.2204e-16   1.0000e+00

>> qc'*qc

ans =

   1.0000e+00  -7.9928e-11  -1.5986e-10
  -7.9928e-11   1.0000e+00   7.9928e-04
  -1.5986e-10   7.9928e-04   1.0000e+00

The off-diagonal entries in positions (1,2), (1,3) and (2,3) are $q_1^T q_2$, $q_1^T q_3$ and $q_2^T q_3$.

Loss of orthogonality
• Instability in CGS is caused by loss of orthogonality.
• As CGS progresses, the columns of Q keep losing orthogonality.
• The problem is much more serious in CGS than in MATLAB’s QR.
• Errors in step k will be carried through to steps k+1, k+2, …
[Figure: a perturbed $v_k$ at step k gives a perturbed $q_k$, which in turn perturbs later coefficients such as $r_{k,k+1}$]
A somewhat more stable version of CGS: Modified Gram‐Schmidt
Input A

For i = 1, …, n
    v_i = a_i
    For j = 1, …, i-1
        r_ji = q_j^T v_i
        v_i = v_i - r_ji q_j
    End
    r_ii = ||v_i||_2;  q_i = v_i / r_ii
End
You should verify that MGS is exactly the same as CGS in exact arithmetic.
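To see the difference in floating point, the sketch below runs both variants on the 3 × 3 matrix from the stability example, with its $10^{-7}$ entries. It is plain Python, and `gram_schmidt` and `orth_loss` are hypothetical helpers. It reports $\max_{i,j} |q_i^T q_j - \delta_{ij}|$. The only difference between the two branches is where $r_{ji}$ comes from: the original $a_i$ (CGS) or the partially orthogonalized $v_i$ (MGS). Exact digits may differ slightly from the MATLAB session above, since the order of floating-point operations differs.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(A, modified):
    """Return the columns q_1..q_N from Gram-Schmidt on A (list of rows).
    modified=False: classical, r_ji = a_i^T q_j; modified=True: r_ji = q_j^T v_i."""
    M, N = len(A), len(A[0])
    a = [[A[i][j] for i in range(M)] for j in range(N)]
    q = []
    for i in range(N):
        v = a[i][:]
        for j in range(i):
            r = dot(q[j], v) if modified else dot(a[i], q[j])
            v = [x - r * y for x, y in zip(v, q[j])]
        norm = math.sqrt(dot(v, v))
        q.append([x / norm for x in v])
    return q


def orth_loss(q):
    """max |q_i^T q_j - delta_ij|: how far Q^T Q is from the identity."""
    n = len(q)
    return max(abs(dot(q[i], q[j]) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


A = [[1.0, 1.0, 1.0],
     [1e-7, 1e-7, 0.0],
     [1e-7, 0.0, 1e-7]]
print("CGS:", orth_loss(gram_schmidt(A, modified=False)))  # large (compare 7.9928e-04 above)
print("MGS:", orth_loss(gram_schmidt(A, modified=True)))   # many orders of magnitude smaller
```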
CGS vs MGS in exact arithmetic: Rework this exercise
Find the QR factorization of A using the MGS method (use the algorithm provided in the previous slide):
$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 2 & 1 & 2 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$
CGS vs MGS in exact arithmetic
IN EXACT ARITHMETIC, BOTH PROCEDURES GENERATE EXACTLY THE SAME OUTPUT!
MORE WILL BE IN …
END OF LECTURE 4
THANK YOU!
