517: COMPUTATIONAL LINEAR ALGEBRA
Assoc. Prof. Dr. Noor Atinah Ahmad
School of Mathematical Sciences
Universiti Sains Malaysia
nooratinah@usm.my
LECTURE 4: Linear Least Squares Problem, Normal Equation and QR Factorization
Linear least squares problem:
Some examples
Polynomial regression
Linear prediction
Recall: Simple linear regression
What is linear regression?
Fit a straight line to your data:
    y(t) = x_0 + x_1 t.
Suppose your observation data are available at t_1, t_2, ..., t_M, i.e.,
    b = [b_1, b_2, ..., b_M]^T    (observation data),
    y = [x_0 + x_1 t_1, x_0 + x_1 t_2, ..., x_0 + x_1 t_M]^T    (model for your observation).
Polynomial regression
Fit a polynomial to your data:
    y(t) = x_0 + x_1 t + x_2 t^2 + ... + x_N t^N.
Suppose your observation data b_1, b_2, ..., b_M are available at t_1, t_2, ..., t_M, exactly as in the straight-line case (observation data vs. model for your observation).
The linear algebra of your problem
Your model (polynomial of degree N):
    y = [ x_0 + x_1 t_1 + x_2 t_1^2 + ... + x_N t_1^N     [ 1  t_1  t_1^2 ... t_1^N   [ x_0
          x_0 + x_1 t_2 + x_2 t_2^2 + ... + x_N t_2^N   =   1  t_2  t_2^2 ... t_2^N     x_1
          ...                                               ...                         ...
          x_0 + x_1 t_M + x_2 t_M^2 + ... + x_N t_M^N ]     1  t_M  t_M^2 ... t_M^N ]   x_N ]
                                                                     A                    x
Hypothesis: b ≈ y (i.e. b is ‘almost’ an Nth degree polynomial), or b = y + ε.
To find the ‘best fit’ for b, we find the optimal x by minimizing the distance:
    min_x ||b − y||_2^2 = min_x ||b − Ax||_2^2.
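As a minimal MATLAB sketch of this setup (the vectors t and b and the chosen degree N are illustrative assumptions, not from the slides):

    % Hypothetical data: t (M-by-1 sample points), b (M-by-1 observations), degree N
    A = t(:).^(0:N);      % M-by-(N+1) matrix with columns 1, t, t.^2, ..., t.^N
    x = A\b(:);           % backslash solves min_x ||b - A*x||_2 for overdetermined A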
Linear prediction: Predicting the future
I will build a model to predict the stock market on the assumption that the KLSE index depends on the prior day’s:
  - Dow Jones index, p(t);
  - exchange rate of the Malaysian Ringgit, r(t);
  - happiness index of foreign investors (assuming such an index exists), f(t).
(No need to worry about the credibility of my model at this point. People have been trying to model the stock market for decades and there are hundreds of credible and not-so-credible models… so another one won’t hurt.)
My stock market model
I’m assuming a linear model of the form
    mklse(t) = x_1 p(t−1) + x_2 r(t−1) + x_3 f(t−1).
(mklse(t) – KLSE index on day t predicted by my model)
To complete the model, I need to figure out what
    x = [x_1, x_2, x_3]^T
is.
The data
Suppose I’m able to get access to historical data for 300 business days:
    a_1 = [p(t_1), p(t_2), ..., p(t_300)]^T,
    a_2 = [r(t_1), r(t_2), ..., r(t_300)]^T,
    a_3 = [f(t_1), f(t_2), ..., f(t_300)]^T,
    b = [tklse(t_1 + 1), tklse(t_2 + 1), ..., tklse(t_300 + 1)]^T.
(tklse(t) – true KLSE index on day t)
I shall now tune x so that the model looks the best.
How might I do that?
First, I’m going to put on my math thinking cap
My mathematical intuition tells me….
I have actual data, so why don’t I just plug them into my model? This is what I have…
    mklse(t_1 + 1) = x_1 p(t_1) + x_2 r(t_1) + x_3 f(t_1)
    mklse(t_2 + 1) = x_1 p(t_2) + x_2 r(t_2) + x_3 f(t_2)
    ...
    mklse(t_300 + 1) = x_1 p(t_300) + x_2 r(t_300) + x_3 f(t_300)
(mklse(t) – my model for the KLSE index on day t)
Not a bad idea…now I have 300 linear equations.
Throw in some matrices
I’m still nowhere near getting the value for x = [x_1, x_2, x_3]^T, but I can make a lot of progress if I rewrite my linear equations in matrix form (the columns of A are a_1, a_2, a_3):

    [ mklse(t_1 + 1)   ]   [ p(t_1)    r(t_1)    f(t_1)   ] [ x_1 ]
    [ mklse(t_2 + 1)   ] = [ p(t_2)    r(t_2)    f(t_2)   ] [ x_2 ]
    [       ...        ]   [   ...       ...       ...    ] [ x_3 ]
    [ mklse(t_300 + 1) ]   [ p(t_300)  r(t_300)  f(t_300) ]
            b̂                             A                    x
How to get the ‘best’ model?
Ok, let’s define some meaning to ‘best’:
We would like our model observation b̂ to be as close as possible to the true observation b.
Define the distance between b̂ and b by ||b − b̂||_2^2.
Solve the following minimization problem to get x:
    min_x ||b − b̂||_2^2 = min_x ||b − Ax||_2^2.
Let x* be the solution to the minimization problem. Then our ‘best’ model is
    mklse(t) = x*_1 p(t−1) + x*_2 r(t−1) + x*_3 f(t−1).
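A hedged MATLAB sketch of this tuning step, assuming the data vectors a1, a2, a3 and b from the previous slide are available as 300-by-1 arrays (the names are illustrative):

    A = [a1 a2 a3];      % 300-by-3 data matrix
    xstar = A\b;         % least squares solution x* = argmin_x ||b - A*x||_2
    mklse = A*xstar;     % the 'best' model's fitted KLSE values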
Linear least squares problem:
The normal equation
Normal equation
Pseudoinverse
The minimization problem
In general, the problem
    min_x ||b − b̂||_2^2 = min_x ||b − Ax||_2^2,
with A ∈ R^(M×N) and b, b̂ ∈ R^M, is called the linear least squares problem, i.e. the problem of finding the best linear model w.r.t. the L2 norm.
Several things to note:
  - b̂ = Ax ∈ range(A);
  - If b ∈ range(A), then there exists x ∈ R^N s.t. b = Ax and ||b − b̂||_2 = 0 (perfect model).
The ‘not so perfect’ world
• It is a common situation that some physical problem leads to a linear system Ax = b that should be consistent on theoretical grounds. However, due to “measurement errors” in the entries of A and b, the perturbed system fails to be consistent:
    Original system: Ax = b.
    Perturbed system: Ãx = b̃, with errors à = A + E and b̃ = b + δb.
From the perspective of linear transformation
[Diagram: the linear transformation x ↦ Ax maps R^N into range(A), a subspace of R^M; the observation b generally lies outside range(A).]
Measuring minimum distance
Simplify the problem to R^3: how do you determine the minimum distance from b to the plane range(A)?
Measuring minimum distance (cont’d)
Find an orthogonal projection from b to range(A):
[Diagram in R^3: b sits above the plane range(A) through 0; its orthogonal projection onto the plane is Ax*, and r = b − Ax* is perpendicular to the plane.]
The distance is measured by the ‘length’ of r, the residual vector.
The optimum solution is then x* (called the least squares solution of Ax = b).
The solution method
• What is r?  r = b − Ax.
• r is orthogonal to range(A). Thus r ∈ range(A)^⊥.
• range(A)^⊥ = null(A^T).
• Which means…
    A^T r = 0,
or…    A^T (b − Ax) = 0,
or…    A^T A x = A^T b    ← THE NORMAL EQUATION
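In MATLAB, a naive transcription of the normal equation is a one-liner (a sketch, not the recommended method, for reasons coming up shortly):

    x = (A'*A)\(A'*b);   % solve A^T A x = A^T b directly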
What do we know about the normal equation
• If A ∈ R^(M×N), then the normal equation is an N × N linear system.
• The normal system is consistent because it is satisfied by the least squares solution.
• The normal equation may have infinitely many solutions, in which case every solution is a least squares solution of Ax = b.
When does the normal equation have a unique solution?
The normal equation has a unique solution when… these statements are equivalent:
  - Columns of A are linearly independent.
  - A is full rank.
  - A^T A is nonsingular.
If A^T A is nonsingular, (A^T A)^(−1) exists. Thus, the normal equation solves as
    x = (A^T A)^(−1) A^T b.
This solution is unique.
The pseudoinverse
• When A is full rank, we can define a pseudoinverse for A, which is
    A† = (A^T A)^(−1) A^T.
• So, in terms of the pseudoinverse, the unique least squares solution can be written as
    x = A† b.
• Condition number:
    cond(A) = ||A|| ||A†||.
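A quick sanity check in MATLAB: pinv computes the pseudoinverse via the SVD, and for full-rank A it agrees with the normal-equation formula up to rounding:

    x = pinv(A)*b;   % equals (A'*A)\(A'*b) for full-rank A, up to rounding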
Solving the normal equation
The matrix A^T A is symmetric positive definite (SHOW THIS!!!). Therefore we can use Cholesky factorization to solve the normal equation
    A^T A x = A^T b.
Cost:
  ≈ MN^2/2 to compute A^T A;
  ≈ N^3/6 for the Cholesky factorization.
TOO COSTLY!!!
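Still, as a sketch, the Cholesky route looks like this in MATLAB (chol returns an upper triangular R with A'*A = R'*R when the matrix is positive definite):

    B = A'*A;  c = A'*b;  % form the normal equation
    R = chol(B);          % Cholesky factor: B = R'*R
    x = R\(R'\c);         % forward solve R'*y = c, then back solve R*x = y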
A catastrophic example

    A = [ 1        1
          10^(−4)  0
          0        10^(−4) ]

WHAT DO YOU MAKE OF THIS MATRIX?

    A^T A = [ 1 + 10^(−8)   1
              1             1 + 10^(−8) ].

ILL-CONDITIONED!
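You can check this numerically in MATLAB (a sketch; the printed values are approximate):

    A = [1 1; 1e-4 0; 0 1e-4];
    [cond(A), cond(A'*A)]   % roughly 1.4e4 vs 2.0e8: forming A'*A squares the condition number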
A few facts
• The normal equation method costs too much because we need to form A^T A.
• It squares the condition number (compare cond_2(A) and cond_2(A^T A) = (cond_2(A))^2 in the case M = N).
• It’s useful for small hand calculations.
Example
N = 2, M = 3:
    A = [1 4; 2 5; 3 6],   b = [5; 7; 9],   A^T A = [14 32; 32 77],   A^T b = [46; 109].
Normal equation:
    [14 32; 32 77] x = [46; 109].
Solution:
    x = [1; 1].
That was easy….. But hang on a minute. Notice how fast the numbers grow.
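The same example in MATLAB, as a quick check:

    A = [1 4; 2 5; 3 6];  b = [5; 7; 9];
    x = (A'*A)\(A'*b)     % returns [1; 1], matching the hand calculation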
QR factorization
WE ALREADY HAVE LU FACTORIZATION… WHY DO WE NEED QR FACTORIZATION?
The LU factorization as a matrix transformation
The LU factorization is a linear transformation of A to U:
    M A = L^(−1) A = U.
The transformation matrix M is a product of elementary matrices.
Suppose the transformation matrix is an orthogonal matrix?
    Q^T A = R   ⟺   A = QR.
Why choose orthogonal matrices?
Multiplication by orthogonal matrices is stable. The L2 norm does not grow:
    ||Qv||_2 = ||v||_2,    ||QA||_2 = ||A||_2.
So,
    ||R||_2 = ||Q^T A||_2 = ||A||_2.
PROVE THESE RESULTS!!!
What does ||Qv||_2 = ||v||_2 tell us about orthogonal transformations?
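A one-line proof sketch for the vector result (the matrix result then follows from the definition of the induced 2-norm):

    ||Qv||_2^2 = (Qv)^T (Qv) = v^T Q^T Q v = v^T v = ||v||_2^2.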
QR factorization: M ≥ N
The QR factorization of an M × N matrix A is
    A = QR,
where
  - Q is M × N with orthonormal columns;
  - R is N × N upper triangular with positive diagonals.
THIS DEFINES THE ECONOMY-SIZE QR FACTORIZATION (OR ‘REDUCED QR’).
In MATLAB:
    [Q,R] = qr(A,0);
The second argument 0 matters.
About Q and R
• Q has orthonormal columns, so Q^T Q = I, where I is the N × N identity matrix.
• R has positive diagonals, so it is nonsingular.
How QR helps solve the least squares problem
We will start from the normal equation
    A^T A x = A^T b.
Let A be full rank and A = QR. Using the QR factorization we can write the normal equation as
    (QR)^T (QR) x = (QR)^T b,
    R^T Q^T Q R x = R^T Q^T b.
Q^T Q = I and R (hence R^T) is nonsingular, thus the normal equation is reduced to
    R x = Q^T b.
Solving the reduced normal equation
• The reduced normal equation is upper triangular.
• It can be solved by back solving (backward substitution).
• In MATLAB, given A and b, the least squares solution is obtained as follows:
    [Q,R] = qr(A,0);
    c = Q'*b;
    x = R\c;
Problem with pre‐QR factorization of A
You need to store Q.
Q can be rather big.
Q can be very dense with no particular structure.
Cost:
  - Compute the QR factorization (later): ≈ MN^2 − N^3/3
  - Multiply b by Q^T: O(MN)
  - Solve Rx = Q^T b: ≈ N^2/2
I think we can do better.
The classical Gram‐Schmidt algorithm
How can we improve the QR method?
I’m going to try and get some answers from the factorization
itself.
Let’s write A, Q and R in column-wise representation:
    A = [a_1 a_2 ... a_N],   Q = [q_1 q_2 ... q_N],   R = [r_1 r_2 ... r_N].
Several tips
Our analyses will require us to be quick with matrix algebra, so here are a few tips:
  - ABe_i – the ith column of AB
  - e_i^T AB – the ith row of AB
  - Ae_i – the ith column of A
  - e_i^T A – the ith row of A
Work with R first
R is upper triangular, so we can write the columns more specifically as
    R = [ r_11  r_12  ...  r_1N
          0     r_22  ...  r_2N
          ...
          0     0     ...  r_NN ],
so
    r_1 = r_11 e_1,   r_2 = r_12 e_1 + r_22 e_2,   ...,   r_N = Σ_{i=1}^{N} r_iN e_i,
where e_1, e_2, ..., e_N are the Euclidean unit vectors (canonical vectors) in R^N.
Now we work on QR
Write the matrix product QR in terms of its columns:
    QR = [Qr_1  Qr_2  ...  Qr_N].
Now, the first column is
    Qr_1 = Q(r_11 e_1) = r_11 Qe_1 = r_11 q_1.
The rest of the columns in QR
The second column:
    Qr_2 = r_12 Qe_1 + r_22 Qe_2 = r_12 q_1 + r_22 q_2,
etc. The Nth column:
    Qr_N = Σ_{i=1}^{N} r_iN Qe_i = Σ_{i=1}^{N} r_iN q_i.
Now compare A and QR
A = QR implies
    a_1 = r_11 q_1,
    a_2 = r_12 q_1 + r_22 q_2,
    ...
    a_N = r_1N q_1 + r_2N q_2 + ... + r_NN q_N = Σ_{i=1}^{N} r_iN q_i.
Columns of A are linear combinations of columns of Q.
Which means…….
Relationship between A and Q
• Columns of A span the same vector space as columns of Q.
• Columns of Q span range(A), i.e. the column space of A.
• Columns of Q form an orthonormal basis for range(A).
We can form Q by orthonormalizing the column vectors of A.
QR from scratch
• The idea: Build an orthonormal basis from scratch.
• Transform a set of linearly independent vectors (a basis)
    a_1, a_2, ..., a_N
  into an orthonormal basis
    q_1, q_2, ..., q_N.
• Go through the vectors one by one and subtract off the part of each vector that is not orthogonal to the previous ones, normalizing each time to produce an orthonormal set.
Step 1
Start with our full rank matrix A. The columns a_1, a_2, ..., a_N are linearly independent and form a basis of range(A).
Let v_1 = a_1. Turn it into a unit vector: q_1 = v_1 / ||v_1||_2.
Now consider q_1 and a_2:
    v_2 = a_2 − α_12 q_1.
[Diagram: a_2 is split into its component α_12 q_1 along q_1 and the remainder v_2, which is perpendicular to q_1.]
We’re still missing something……. α_12?
Here’s when we apply the orthogonality condition:
    ⟨v_2, q_1⟩ = 0.
This is equivalent to
    ⟨a_2 − α_12 q_1, q_1⟩ = ⟨a_2, q_1⟩ − α_12 ⟨q_1, q_1⟩ = 0,
which means
    α_12 = ⟨a_2, q_1⟩.
At the end of Step 1
Our starting vector (normalized):
    q_1 = a_1 / ||a_1||_2.
A vector orthogonal to q_1:
    v_2 = a_2 − α_12 q_1.
Normalize the intermediate vector v_2 to get a unit vector:
    q_2 = v_2 / ||v_2||_2.
We are already two steps towards a QR factorization:
    a_1 = ||a_1||_2 q_1,
    a_2 = α_12 q_1 + ||v_2||_2 q_2.
Step 2
Now consider q_1, q_2 and a_3:
    v_3 = a_3 − α_13 q_1 − α_23 q_2.
[Diagram: v_3 is the part of a_3 perpendicular to span{q_1, q_2}.]
Two things we need to determine……. α_13, α_23.
Two unknowns mean we need two orthogonality conditions. Luckily, we have exactly TWO:
    ⟨v_3, q_1⟩ = 0   &   ⟨v_3, q_2⟩ = 0.
The first condition is equivalent to
    ⟨a_3 − α_23 q_2 − α_13 q_1, q_1⟩ = ⟨a_3, q_1⟩ − α_23 ⟨q_2, q_1⟩ − α_13 ⟨q_1, q_1⟩ = 0,
which means (since ⟨q_2, q_1⟩ = 0 and ⟨q_1, q_1⟩ = 1)
    α_13 = ⟨a_3, q_1⟩.
Use the second orthogonality condition to find α_23.
At the end of Step 2
A vector orthogonal to q_1 and q_2:
    v_3 = a_3 − α_13 q_1 − α_23 q_2.
Normalize the intermediate vector v_3 to get a unit vector:
    q_3 = v_3 / ||v_3||_2.
Can you see the emerging pattern?
    a_1 = ||a_1||_2 q_1
    a_2 = α_12 q_1 + ||v_2||_2 q_2
    a_3 = α_13 q_1 + α_23 q_2 + ||v_3||_2 q_3
Step k (1<k<N-1)
v_{k+1} is orthogonal to span{q_1, q_2, ..., q_k}:
  - {q_1, ..., q_k, v_{k+1}}: an orthogonal basis;
  - {a_1, ..., a_k, a_{k+1}}: a basis (both spanning the same (k+1)-dimensional subspace).
Write the orthogonality conditions at the kth step. Then, use these conditions to find α_{1,k+1}, α_{2,k+1}, ..., α_{k,k+1}.
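Spelling out what those conditions give (the same inner-product argument as in Steps 1 and 2):

    v_{k+1} = a_{k+1} − Σ_{j=1}^{k} α_{j,k+1} q_j,   α_{j,k+1} = ⟨a_{k+1}, q_j⟩,   q_{k+1} = v_{k+1} / ||v_{k+1}||_2.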
Realizing the QR factorization
Putting it all together:
    a_1 = ||v_1||_2 q_1
    a_2 = ||v_2||_2 q_2 + α_12 q_1
    a_3 = ||v_3||_2 q_3 + α_23 q_2 + α_13 q_1
    ...
    a_N = ||v_N||_2 q_N + α_{N−1,N} q_{N−1} + ... + α_{1,N} q_1
Set:
    r_ii = ||v_i||_2,  i = 1, 2, ..., N;
    r_ij = α_ij,  i < j;
    r_ij = 0,  i > j.
Convince yourself that these equations produce an equation of the form A = QR, where Q is an orthogonal matrix and R is an upper triangular matrix.
Classical Gram‐Schmidt: The algorithm
Input A
For i = 1, ..., N
    v_i = a_i
    For j = 1, ..., i−1
        r_ji = q_j^T a_i
    End
    For j = 1, ..., i−1
        v_i = v_i − r_ji q_j
    End
    r_ii = ||v_i||_2;   q_i = v_i / r_ii
End
Gram‐Schmidt: Cost
WE’LL DO
THIS IN CLASS
Let’s work on an example: QR factorization
Find the QR factorization of A using the CGS method:
    A = [ 0  1  1  1
          2  1  2  0
          1  0  0  0
          0  0  1  1 ]
Let’s work on an example: Least squares problem
Find the least squares solution of Ax = b using the QR factorization you obtained earlier, given that
    A = [ 0  1  1  1          [ 1
          2  1  2  0    , b =   1
          1  0  0  0            1
          0  0  1  1 ]          1 ].
Classical Gram‐Schmidt: Some MATLAB
[m,n] = size(A);
q = zeros(m,n); r = zeros(n,n);
for j = 1:n
    r(1:j-1,j) = q(:,1:j-1)'*A(:,j);      % VECTORIZATION
    v = A(:,j) - q(:,1:j-1)*r(1:j-1,j);
    r(j,j) = norm(v);
    q(:,j) = v/norm(v);
end
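The experiments later in this lecture call a function class_gs; a minimal function-file version of the same script (my assumption about the file used in class) would be:

    function [q,r] = class_gs(A)
    % Classical Gram-Schmidt QR: A (m-by-n, full rank) -> q with orthonormal
    % columns and upper triangular r such that A = q*r.
    [m,n] = size(A);
    q = zeros(m,n); r = zeros(n,n);
    for j = 1:n
        r(1:j-1,j) = q(:,1:j-1)'*A(:,j);     % coefficients against the ORIGINAL column
        v = A(:,j) - q(:,1:j-1)*r(1:j-1,j);  % subtract all projections at once
        r(j,j) = norm(v);
        q(:,j) = v/r(j,j);
    end
    end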
Advantages of CGS
• Easy to vectorize
• MATLAB automatically uses your multi‐core computer
correctly
• Parallelizable
IS THIS ENOUGH???
Stability of CGS
Consider this matrix:
    A = [ 1        1        1
          10^(−7)  10^(−7)  0
          10^(−7)  0        10^(−7) ].
First, let’s use MATLAB to get the QR factorization and look at ||Q^T Q − I|| and ||QR − A||:
>> [qm,rm] = qr(A);
>> [norm(qm'*qm - eye(3)), norm(qm*rm - A)]
ans =
    3.8459e-16   4.4409e-16
On the other hand…..
Let’s do the same using CGS:
>> [qc,rc] = class_gs(A);
>> [norm(qc'*qc - eye(3)), norm(qc*rc - A)]
ans =
    7.9928e-04   0
THIS IS BAD!!!
WHAT HAPPENED???
What about Q?
>> qc
qc =
    (3-by-3 output omitted)
I DON’T THINK THIS IS AN ORTHOGONAL MATRIX…
Try it with a linear system problem
>> b = A*ones(3,1);
>> [rm\qm'*b rc\qc'*b]
ans =
    1.0000e+00   1.0056e+00
    1.0000e+00   9.9840e-01
    1.0000e+00   9.9600e-01
The corresponding errors (distance of each computed solution from the true solution ones(3,1)):
ans =
    1.0175e-15   7.0632e-03
DO YOU EXPECT THIS?
Let me try and explain what happened
Let’s look at Q^T Q:
>> qm'*qm
ans =
    1.0000e+00   -1.3235e-23   -1.0703e-23
   -1.3235e-23    1.0000e+00    2.2204e-16
   -1.0703e-23    2.2204e-16    1.0000e+00
>> qc'*qc
ans =
    1.0000e+00   -7.9928e-11   -1.5986e-10
   -7.9928e-11    1.0000e+00    7.9928e-04
   -1.5986e-10    7.9928e-04    1.0000e+00
(The off-diagonal entries are q_1^T q_2, q_1^T q_3 and q_2^T q_3.)
Loss of orthogonality
• Instability in CGS is caused by loss of orthogonality.
• As the steps in CGS progress, the columns of Q keep losing orthogonality.
• The problem is much more serious in CGS than in MATLAB’s QR.
[Diagram: errors in v_k at step k are carried through to q_{k+1}, r_{k,k+1} and all subsequent steps k+1, k+2, …]
A somewhat more stable version of CGS: Modified Gram‐Schmidt
Input A
For i = 1, ..., N
    v_i = a_i
    For j = 1, ..., i−1
        r_ji = q_j^T v_i
        v_i = v_i − r_ji q_j
    End
    r_ii = ||v_i||_2;   q_i = v_i / r_ii
End
You should verify that MGS is exactly the same as CGS in exact arithmetic.
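A MATLAB sketch of MGS (the function name mod_gs is illustrative); note that each coefficient is computed against the updated v, which is the only difference from CGS:

    function [q,r] = mod_gs(A)
    % Modified Gram-Schmidt QR factorization of a full-rank m-by-n matrix A.
    [m,n] = size(A);
    q = zeros(m,n); r = zeros(n,n);
    for i = 1:n
        v = A(:,i);
        for j = 1:i-1
            r(j,i) = q(:,j)'*v;       % coefficient against the UPDATED v
            v = v - r(j,i)*q(:,j);    % remove the q_j component immediately
        end
        r(i,i) = norm(v);
        q(:,i) = v/r(i,i);
    end
    end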
CGS vs MGS in exact arithmetic: Rework this exercise
Find the QR factorization of A using the MGS method
(use the algorithm provided in the previous slide):
    A = [ 0  1  1  1
          2  1  2  0
          1  0  0  0
          0  0  1  1 ]
CGS vs MGS in exact arithmetic
IN EXACT ARITHMETIC, BOTH
PROCEDURES GENERATE EXACTLY THE
SAME OUTPUT!
MORE WILL FOLLOW AT THE END OF LECTURE 4
THANK YOU!