Least Squares: ORE 766 Numerical Methods in Ocean Engineering Instructor: Eva Marie Nosal

Least Squares
ORE 766 Numerical Methods in Ocean Engineering
Instructor: Eva‐Marie Nosal
Least squares
• An approach to solving problems that are over‐
determined: more equations than unknowns
• Problem cannot be solved exactly. Instead,
minimize the sum of squares of residuals (the
differences between the observations and the
model)
• Let t denote the independent variable and y(t)
denote the unknown function of t that we wish
to approximate.
• Given m observations (values of y measured at
specified values of t):
yi = y(ti), i = 1, 2, …, m
• Want to find a model f(t) that minimizes the
sum of squares of residuals, i.e. minimize:
$$\sum_{i=1}^{m} \left( y_i - f(t_i) \right)^2$$
[Figure from Wikipedia]
• Under appropriate probabilistic assumptions about the underlying error distributions,
least squares gives the maximum‐likelihood estimate of the parameters
• Even without these probabilistic assumptions, least squares gives useful results
• Computational techniques use orthogonal matrix factorizations
Linear least squares
• Linear least squares: fit a linear model to measurements
• Model y as a linear combination of n basis functions φj(t):
y(t) ≈ β1φ1(t) + β2φ2(t) + … + βnφn(t) = f(t)
• By ≈ we mean that the sum of squares of residuals is minimized
• The basis functions φj(t) can be nonlinear functions of t but the unknown
parameters βj appear linearly
• In matrix notation, y ≈ Xβ with $x_{i,j} = \phi_j(t_i)$:
$$\begin{bmatrix} \phi_1(t_1) & \phi_2(t_1) & \cdots & \phi_n(t_1) \\ \phi_1(t_2) & \phi_2(t_2) & \cdots & \phi_n(t_2) \\ \vdots & \vdots & & \vdots \\ \phi_1(t_m) & \phi_2(t_m) & \cdots & \phi_n(t_m) \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
• X is called the design matrix
• Linear least squares problems admit a closed‐form solution
• In MATLAB, compute the least squares solution to rectangular, over‐determined
(m > n) systems using the backslash operator (more on this later):
beta = X\y
• m = n → can fit data exactly → interpolation
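The steps above (build the design matrix, then solve with backslash) can be sketched for a straight-line model. The observations below are hypothetical values chosen for illustration; they are not data from the lecture.

```matlab
% Hypothetical observations (illustration only, not from the lecture)
t = [0; 1; 2; 3; 4];
y = [1.1; 2.9; 5.2; 7.1; 8.8];

% Design matrix for the line f(t) = beta(1)*t + beta(2):
% column 1 holds phi_1(t) = t, column 2 holds phi_2(t) = 1
X = [t, ones(size(t))];

% Least squares solution of the over-determined system X*beta ~ y
beta = X \ y;

% Residuals and their sum of squares (the quantity being minimized)
r = y - X*beta;
ssr = sum(r.^2);
```

At the least squares solution the residual is orthogonal to the columns of X, so `X'*r` should be zero up to roundoff. This is a convenient sanity check on any fit.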
Common linear models
• Line:
f (t) = β1 t + β2
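φ1(t) = t, φ2(t) = 1
• Polynomial:
$f(t) = \beta_1 t^{n-1} + \beta_2 t^{n-2} + \dots + \beta_{n-1} t + \beta_n$
$\phi_j(t) = t^{n-j}, \quad j = 1, 2, \dots, n$
(m = n → polynomial interpolation!)
If all t are large then the design matrix
will be badly scaled (recall question
3.18 from Assignment 2). Center and scale t
using:
s = (t − μ)/σ
where μ and σ are typically the mean and standard deviation of the t values.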
• Log‐linear:
ln(f(t)) = β1 t + ln(β2)
f(t) = β2 exp(β1 t)
[Figure from Wikipedia]
• Play with censusgui to explore
linear, polynomial, and log‐linear fits
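The effect of the scaling s = (t − μ)/σ on a polynomial design matrix can be seen in a small sketch. The data are hypothetical, and taking μ and σ to be the mean and standard deviation of t is an assumption made here.

```matlab
% Hypothetical data with large t values (illustration only)
t = (1900:10:2000)';
y = 75 + 2*(t - 1900) + 0.01*(t - 1900).^2;

% Quadratic design matrix built directly from t: badly scaled
Xt = [t.^2, t, ones(size(t))];

% Center and scale t (assumed: mu = mean, sigma = standard deviation)
mu = mean(t);
sigma = std(t);
s = (t - mu)/sigma;
Xs = [s.^2, s, ones(size(s))];

% Compare the condition numbers of the two design matrices
cond_t = cond(Xt);
cond_s = cond(Xs);

% Fit in the scaled variable and evaluate the model at the data points
beta = Xs \ y;
yfit = Xs*beta;
```

The coefficients in beta refer to powers of s, not t, so any later evaluation of the model must apply the same μ and σ to new values of t.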
Separable non‐linear least squares
• Model involves both linear (β) and nonlinear (α) parameters:
y(t) ≈ β1φ1(t,α) + β2φ2(t,α) + … + βnφn(t,α) = f(t)
• Elements of the design matrix depend on both t and α: $x_{i,j} = \phi_j(t_i, \alpha)$
• No closed‐form solutions
– Combine backslash with fminsearch
– Use the Curve Fitting Toolbox (cftool) for a graphical interface
– More on this later
• Common separable non‐linear models:
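– Rational functions:
$$f(t) = \frac{\beta_1 t^{n-1} + \dots + \beta_{n-1} t + \beta_n}{\alpha_1 t^{n-1} + \dots + \alpha_{n-1} t + \alpha_n}, \quad \phi_j(t) = \frac{t^{n-j}}{\alpha_1 t^{n-1} + \dots + \alpha_{n-1} t + \alpha_n}$$
– Exponentials:
$$f(t) = \beta_1 e^{-\lambda_1 t} + \dots + \beta_n e^{-\lambda_n t}, \quad \phi_j(t) = e^{-\lambda_j t}$$
– Gaussians:
$$f(t) = \beta_1 e^{-\left(\frac{t-\mu_1}{\sigma_1}\right)^2} + \dots + \beta_n e^{-\left(\frac{t-\mu_n}{\sigma_n}\right)^2}, \quad \phi_j(t) = e^{-\left(\frac{t-\mu_j}{\sigma_j}\right)^2}$$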
Other norms
• $l_p$ norm:
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$$
• Residuals are the differences between the observations and the model:
$$r_i = y_i - \sum_{j=1}^{n} \beta_j \phi_j(t_i, \alpha), \quad i = 1, \dots, m$$
• In matrix notation:
$$r = y - X(\alpha)\beta$$
• In least squares we minimize the sum of squares of the residuals, i.e. we
use the two‐norm:
$$\|r\|_2^2 = \sum_{i=1}^{m} r_i^2$$
• Can also use weighted least squares (e.g. when some observations are
more important than others):
$$\|r\|_w^2 = \sum_{i=1}^{m} w_i r_i^2$$
• Or the one‐norm (resulting parameters are less sensitive to outliers):
$$\|r\|_1 = \sum_{i=1}^{m} |r_i|$$
• Or the infinity‐norm (minimize the largest residual), a.k.a. Chebyshev fit:
$$\|r\|_\infty = \max_i |r_i|$$
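For a given residual vector, each of the measures above is a single line in MATLAB. The residuals and weights below are hypothetical values chosen so the results are easy to check by hand.

```matlab
% Hypothetical residual vector and weights (illustration only)
r = [0.5; -1.0; 0.25; 2.0];
w = [1; 1; 4; 0.5];

two_norm_sq = norm(r, 2)^2;      % sum of squares, same as sum(r.^2)
weighted_sq = sum(w .* r.^2);    % weighted sum of squares
one_norm    = norm(r, 1);        % sum of absolute values
inf_norm    = norm(r, Inf);      % largest absolute residual
```

Note how the single large residual (2.0) dominates the two-norm and the infinity-norm much more than the one-norm, which is why one-norm fits are less sensitive to outliers.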
5.5 Normal equations
• Recall: Linear least squares problem with m observations and n
basis functions, m > n:
$$X\beta \approx y$$
• X is the design matrix of dimensions m‐by‐n
• Overdetermined (m > n) → cannot be solved exactly → solve in a
least squares sense:
$$\min_{\beta} \|X\beta - y\|$$
• Theoretical approach:
– Multiply both sides by $X^T$ to reduce the system to a square n‐by‐n
system known as the normal equations:
$$X^T X \beta = X^T y$$
– If the basis functions are independent then $X^T X$ is nonsingular and
$$\beta = (X^T X)^{-1} X^T y$$
5.6 Pseudoinverse
• If X has full rank (linearly independent columns) then
$$X^+ = (X^T X)^{-1} X^T$$
is the Moore‐Penrose pseudoinverse (a.k.a. pseudoinverse, generalized inverse)
• $X^+$ has some properties of the ordinary inverse. It is a left inverse because
$$X^+ X = (X^T X)^{-1} X^T X = I$$
• However $X^+$ is not a right inverse:
$$X X^+ = X (X^T X)^{-1} X^T \neq I$$
• It is as close to a right inverse as possible in the sense that, of all
matrices Z that minimize $\|XZ - I\|_F$, $Z = X^+$ also minimizes $\|Z\|_F$,
where we’ve used the Frobenius norm:
$$\|Z\|_F = \left( \sum_i \sum_j z_{i,j}^2 \right)^{1/2}$$
• Actual computation involves singular value decomposition
• MATLAB code: pinv
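The left-inverse and right-inverse properties are easy to check numerically. The sketch below uses a small hypothetical matrix with full column rank and compares pinv with the full-rank formula.

```matlab
% Hypothetical 3-by-2 matrix with linearly independent columns
X = [1 0; 1 1; 1 2];

Xplus = pinv(X);                 % pseudoinverse (computed via the SVD)
Xplus_formula = (X'*X) \ X';     % (X'X)^(-1) X', valid only for full rank

left  = Xplus * X;               % 2-by-2, identity up to roundoff
right = X * Xplus;               % 3-by-3, not the identity
```

Here `right` is the orthogonal projector onto the column space of X, which has rank 2 and therefore cannot equal the 3-by-3 identity.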
Normal equations & pseudoinverses
$$X\beta \approx y$$
• Using the normal equations and pseudoinverses we get:
$$\beta = (X^T X)^{-1} X^T y = X^+ y$$
• There are several undesirable aspects to this
– Matrix inverses require more work and are less accurate than solving the
system by Gaussian elimination (recall the chapter on Linear Equations)
– The normal equations are worse conditioned than the original
system (whenever κ(X) > 1): the condition number is squared
$$\kappa(X^T X) = \kappa(X)^2$$
– With finite‐precision computation, the normal equations can actually become
singular, i.e. $(X^T X)^{-1}$ does not exist, even though the columns of X are
independent (Moler gives an example on p.8)
• MATLAB avoids the normal equations when using the backslash operator:
beta = X\y
• How? Using QR factorization. But first we need to learn about
Householder reflections…
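The loss of information in forming $X^T X$ can be demonstrated with a small matrix. The sketch below is in the spirit of the example the slide refers to; the specific matrix, the value of delta, and the right-hand side are assumptions made here and may differ from Moler's.

```matlab
% Assumed illustration: columns are independent, but only barely
delta = 1e-9;                    % chosen so that delta^2 is below eps
X = [1 1; delta 0; 0 delta];
y = [2; delta; delta];           % consistent system, exact beta = [1; 1]

rank_X = rank(X);                % columns of X are independent

% Forming the normal equations: 1 + delta^2 rounds to 1 in double precision,
% so X'*X becomes the singular matrix [1 1; 1 1]
A = X'*X;
rank_XtX = rank(A);

% Backslash works on X directly and still recovers the solution
beta = X \ y;
```

The squared condition number is what causes the trouble: κ(X) is about 1e9 here, so κ(X)^2 exceeds 1/eps and the normal equations cannot be solved in double precision.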
5.4 Householder reflections
• Basis for very powerful & flexible
numerical algorithms
• Householder reflection is a matrix of the
form:
$$H = I - \rho u u^T$$
where u is any nonzero vector and
$$\rho = 2 / \|u\|^2$$
• H is symmetric and orthogonal, i.e.
$$H^T = H \quad \text{and} \quad H^T H = H^2 = I$$
• In practice (and in Moler’s code), don’t
actually form H. Apply H to a vector x by:
$$\tau = \rho u^T x, \quad Hx = x - \tau u$$
• Geometrically H transforms a vector into
its mirror image in the line $u_\perp$
perpendicular to u
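Applying a reflection without forming H takes only one inner product and one vector update. A minimal sketch with hypothetical vectors u and x, followed by a check against the explicitly formed matrix:

```matlab
% Hypothetical vectors (illustration only)
u = [1; 2; 2];                   % defines the reflection, must be nonzero
x = [3; 0; 4];                   % vector to reflect

rho = 2 / (u'*u);
tau = rho * (u'*x);
Hx  = x - tau*u;                 % H applied to x without forming H

% Explicit H, formed here only to verify the result
H = eye(3) - rho*(u*u');
```

Because H is orthogonal, the reflected vector has the same length as x.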
Householder reflections
• For a given vector x we can find a Householder reflection such that Hx
is all zeros except the kth component
• This is given by:
$$\sigma = \pm\|x\|$$
$$u = x + \sigma e_k$$
$$\rho = 2/\|u\|^2 = 1/(\sigma u_k)$$
$$H = I - \rho u u^T$$
• where $e_k$ is the unit vector that is zero everywhere except for a 1 in the kth component
• Explore this using Householder.m (code on my ftp site)
Example:
x = [1 2 3]'
H = Householder(x,3)
Then: H*x = [0 0 -3.7417]'
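The formulas above can be followed step by step for this example. The sketch below is not the Householder.m file mentioned on the slide; it is a stand-alone illustration, and the sign rule for σ (same sign as x(k)) is an assumption that happens to reproduce the result quoted above. It also assumes x is nonzero.

```matlab
x = [1; 2; 3];                   % vector from the example
k = 3;                           % component that should remain nonzero

% Assumed sign choice: sigma takes the sign of x(k), which avoids
% cancellation when forming u(k)
if x(k) >= 0
    sigma = norm(x);
else
    sigma = -norm(x);
end

ek = zeros(size(x));
ek(k) = 1;

u   = x + sigma*ek;
rho = 1/(sigma*u(k));            % equals 2/norm(u)^2
H   = eye(numel(x)) - rho*(u*u');

Hx = H*x;                        % approximately [0; 0; -3.7417]
```

With this sign choice Hx = −σ e_k, so the surviving component is −‖x‖ = −sqrt(14).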
QR factorization
$$X\beta \approx y$$
• An orthogonalization algorithm that decomposes X
into an orthogonal matrix Q and a right (upper)
triangular matrix R: X = QR
$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \\ 0 & \cdots & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & \cdots & 0 \end{bmatrix}$$
• $H_i$ are Householder reflections
– $R = H_n \cdots H_2 H_1 X$
– $Q = (H_n \cdots H_2 H_1)^T$
– Check: $QR = (H_n \cdots H_2 H_1)^T H_n \cdots H_2 H_1 X = X$
• To solve the system apply the Householder reflections to both sides:
$$H_n \cdots H_2 H_1 X \beta \approx H_n \cdots H_2 H_1 y \;\Rightarrow\; R\beta \approx z$$
• Results:
– The first n equations in $R\beta \approx z$ define a small square (n‐by‐n) triangular
system that can be solved for β by back substitution
– The coefficients of the remaining m‐n equations are all zero → resulting
equations are independent of β and the corresponding components of z
constitute the transformed residual
– Householder QR is related to Gram‐Schmidt orthogonalization but is numerically more satisfactory
– QR is a good approach because Householder reflections have excellent numerical
properties and because the resulting triangular system is ready for back
substitution
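The solution procedure described above can be sketched with MATLAB's built-in qr. The data are hypothetical, and a straight-line model is used only to keep the example small.

```matlab
% Hypothetical observations (illustration only)
t = (0:5)';
y = [1.0; 1.8; 3.1; 3.9; 5.2; 5.8];

X = [t, ones(size(t))];          % design matrix for a line
[m, n] = size(X);

[Q, R] = qr(X);                  % full factorization: Q is m-by-m, R is m-by-n
z = Q'*y;                        % transformed right-hand side

% First n equations: square triangular system, solved by back substitution
beta = R(1:n, 1:n) \ z(1:n);

% Remaining m-n components of z carry the transformed residual
resnorm = norm(z(n+1:m));
```

Since Q is orthogonal, `resnorm` equals the norm of the ordinary residual `y - X*beta`, so the quality of the fit is available without computing the residual explicitly.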
QR factorization
• How do we find the appropriate householder reflections?
• Illustrate with an example (explore this using QRexample.m (code on my ftp site))
• Given (scaled) data:
• Want to fit a quadratic: $y(s) \approx \beta_1 s^2 + \beta_2 s + \beta_3$
• The design matrix is:
$$X = \begin{bmatrix} s_1^2 & s_1 & 1 \\ s_2^2 & s_2 & 1 \\ s_3^2 & s_3 & 1 \\ s_4^2 & s_4 & 1 \\ s_5^2 & s_5 & 1 \\ s_6^2 & s_6 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0.04 & 0.20 & 1 \\ 0.16 & 0.40 & 1 \\ 0.36 & 0.60 & 1 \\ 0.64 & 0.80 & 1 \\ 1.00 & 1.00 & 1 \end{bmatrix}$$
QR factorization
• We want $H_1$ to introduce zeros below the diagonal in
the first column of X
→ $H_1$ is the Householder matrix that transforms the
first column of X to all zeros except the 1st component
• Then
$$H_1 X = \begin{bmatrix} -1.2516 & -1.4382 & -1.7578 \\ 0 & 0.1540 & 0.9119 \\ 0 & 0.2161 & 0.6474 \\ 0 & 0.1863 & 0.2067 \\ 0 & 0.0646 & -0.4102 \\ 0 & -0.1491 & -1.2035 \end{bmatrix} \quad \text{and} \quad H_1 y = \begin{bmatrix} -449.3721 \\ 160.1450 \\ 126.5001 \\ 53.9032 \\ -57.2146 \\ -198.0273 \end{bmatrix}$$
QR factorization
• Want $H_2$ to introduce zeros below the diagonal in the
second column of $H_1 X$
→ $$H_2 = \begin{bmatrix} 1 & 0 \\ 0 & H_{*2} \end{bmatrix}$$
where $H_{*2}$ is the Householder matrix
that transforms the first column of the unreduced
submatrix of $H_1 X$ (rows 2 to 6, columns 2 to 3; shown in blue on the original slide)
to all zeros except the 1st component
• Then
$$H_2 H_1 X = \begin{bmatrix} -1.2516 & -1.4382 & -1.7578 \\ 0 & -0.3627 & -1.3010 \\ 0 & 0 & -0.2781 \\ 0 & 0 & -0.5911 \\ 0 & 0 & -0.6867 \\ 0 & 0 & -0.5649 \end{bmatrix} \quad \text{and} \quad H_2 H_1 y = \begin{bmatrix} -449.3721 \\ -242.3136 \\ -41.8344 \\ -91.2017 \\ -107.4922 \\ -81.8797 \end{bmatrix}$$
QR factorization
• Want $H_3$ to introduce zeros below the diagonal in the
third column of $H_2 H_1 X$
→ $$H_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & H_{*3} \end{bmatrix}$$
where $H_{*3}$ is the Householder matrix
that transforms the first column of the unreduced
submatrix of $H_2 H_1 X$ (rows 3 to 6, column 3; shown in blue on the original slide)
to all zeros except the 1st component
• Then
$$R = H_3 H_2 H_1 X = \begin{bmatrix} -1.2516 & -1.4382 & -1.7578 \\ 0 & -0.3627 & -1.3010 \\ 0 & 0 & 1.1034 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad z = H_3 H_2 H_1 y = \begin{bmatrix} -449.3721 \\ -242.3136 \\ 168.2243 \\ -1.3218 \\ -3.0801 \\ 4.0087 \end{bmatrix}$$
QR factorization
• The resulting system $R\beta \approx z$ is
$$\begin{bmatrix} -1.2516 & -1.4382 & -1.7578 \\ 0 & -0.3627 & -1.3010 \\ 0 & 0 & 1.1034 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} \approx \begin{bmatrix} -449.3721 \\ -242.3136 \\ 168.2243 \\ -1.3218 \\ -3.0801 \\ 4.0087 \end{bmatrix}$$
• The first 3 equations
$$\begin{bmatrix} -1.2516 & -1.4382 & -1.7578 \\ 0 & -0.3627 & -1.3010 \\ 0 & 0 & 1.1034 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = \begin{bmatrix} -449.3721 \\ -242.3136 \\ 168.2243 \end{bmatrix}$$
can be solved exactly to give
$$\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = \begin{bmatrix} 5.6790 \\ 121.1636 \\ 152.4663 \end{bmatrix}$$
• The last three equations cannot be satisfied so the last three
components of z represent the residual
• This is the same solution that we get using beta = R \ z or beta = X \ y
(since backslash uses QR decomposition)
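The three reflection steps of this example can be reproduced with a short loop. This is a sketch and not the QRexample.m or qrsteps code; it works on the design matrix only (the observations y are not listed above), and the sign rule for σ is an assumption that matches the signs of the R shown in the example. It also assumes no reduced column is entirely zero.

```matlab
% Design matrix from the example: quadratic in s = 0, 0.2, ..., 1
s = (0:0.2:1)';
X = [s.^2, s, ones(size(s))];
[m, n] = size(X);

R = X;
for k = 1:n
    x = R(k:m, k);               % first column of the unreduced submatrix

    % Assumed sign choice: sigma takes the sign of the leading entry
    if x(1) >= 0
        sigma = norm(x);
    else
        sigma = -norm(x);
    end

    u = x;
    u(1) = u(1) + sigma;         % u = x + sigma*e_1 within the submatrix
    rho = 1/(sigma*u(1));

    % Apply the reflection to the submatrix without forming H
    R(k:m, k:n) = R(k:m, k:n) - rho*u*(u'*R(k:m, k:n));
end
```

Applying the same three updates to y (with `n` replaced by a single column) would give the vector z. The entries of R below the diagonal are zero only up to roundoff; `triu(R)` removes them.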
QR factorization
• In this polynomial example, once you know β, you can
evaluate the best fit model at any s using polyval(beta,s)
• Again, be careful with extrapolations!
• MATLAB implementation: qr and backslash
• Moler implementation: qrsteps
$$X\beta \approx y$$
5.7 Rank deficiency
• The rank of a matrix is the number of linearly independent columns (or rows)
• The rank of an m‐by‐n matrix is at most min(m,n).
• A full rank matrix has rank min(m,n) (rank is as large as possible)
• In this case, beta = X\y and beta = pinv(X)*y give the same solution (pinv
requires more work)
• Rank deficient means the rank is less than min(m,n)
• Pseudoinverse for rank deficient X
– $(X^T X)^{-1}$ does not exist, so our definition of the pseudoinverse, $X^+ = (X^T X)^{-1} X^T$,
breaks down
– pinv uses the minimum norm solution: of all the vectors β that minimize
$\|X\beta - y\|$, beta = pinv(X)*y also minimizes $\|\beta\|$
– The minimum norm solution is unique
• Backslash for rank deficient X
– The solution (obtained via QR decomposition) is called a basic solution
– At most r of the components of beta = X\y are nonzero (where r=rank(X))
– Solution is not unique
Rank deficient example
1 2 3 16 
4 5 6 17 
   
X  7 8 9, y  18 
   
 10 11 12  19 
13 14 15 20
• X is rank deficient since the middle column is the average of the first and last columns
1
    2
•   is a null vector i.e. Xη=0
 1 
• For a particular solution β and any scalar c, β+cη is also a solution since
X  y  X   c   X  cX  y  0  y
• Beta = X\y gives a warning about rand deficiency. Solution: beta = [‐7.5 000 0 7.8333]T
• This is basic (only 2 non‐zero components) but the vectors beta = [ 0 ‐15.0000 15.333]T
and beta = [ ‐15.3333 15.6667 0]T are also basic solutions
• beta = pinv(X)*y doesn’t give a warning and produces beta = [‐7.5556 0.1111 7.7778]T
• As expected, norm(pinv(X)*y) = 10.8440 < norm(X\y) = 10.8449
• Remember: for rank deficient systems the computed solutions are not unique!
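The claims in this example can be checked directly. The sketch below avoids calling backslash on the rank deficient matrix (its output and warning depend on the environment) and instead uses the basic solution quoted above, written with exact fractions as an assumption (7.8333 taken as 7.5 + 1/3).

```matlab
X = reshape(1:15, 3, 5)';        % rows [1 2 3], [4 5 6], ..., [13 14 15]
y = (16:20)';
eta = [1; -2; 1];                % null vector from the example

r = rank(X);                     % rank 2, less than min(m,n) = 3
null_check = norm(X*eta);        % X*eta is exactly zero

% Minimum norm solution
beta_min = pinv(X)*y;

% Basic solution quoted in the text (assumed exact value 7.5 + 1/3)
beta_basic = [-7.5; 0; 7.5 + 1/3];

% Adding any multiple of eta gives another solution with the same residual
beta_other = beta_basic + 3*eta;

norms = [norm(beta_min), norm(beta_basic), norm(beta_other)];
```

All three vectors reproduce y equally well; only their lengths differ, with `beta_min` the shortest.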
5.8 Separable least squares
• Recall: The separable least squares model involves both linear (β) and
nonlinear (α) parameters:
y(t) ≈ β1φ1(t,α) + β2φ2(t,α) + … + βnφn(t,α) = f(t)
• Various direct and gradient methods
• MATLAB: Optimization toolbox, Curve Fitting Toolbox
• We’ll use fminsearch (uses the Nelder‐Mead algorithm)
• Could use fminsearch to search for all parameters (as we would
do for a fully non‐linear problem). Instead…
• Make use of the separable structure for a more efficient method
– use fminsearch to search for values of the nonlinear parameters that
minimize the norm of the residual
– At each step, use backslash to compute the values of the linear
parameters
• Two blocks of code:
– One sets up the problem and calls fminsearch
– The other is the objective function that is called by fminsearch
Separable least squares example
• fminsearch syntax:
[x, fval] = fminsearch(fun, x0, options, p1, p2,…)
where
x = vector of the parameters that minimize the function fun
fval = the value of the function at the minimum
x0 = a vector of the initial guesses for the parameters
options = structure with optimization parameters created with optimset
(use [] for default)
p1,p2,… = additional arguments
• Example: Use observations of radioactive decay
• Model: $y \approx \beta_1 e^{-\lambda_1 t} + \beta_2 e^{-\lambda_2 t}$
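The two-part structure (fminsearch over the nonlinear parameters, backslash for the linear ones) can be sketched for this two-exponential model. The data below are synthetic, generated from assumed parameter values rather than real decay observations, and anonymous functions stand in for the separate objective function file described above.

```matlab
% Synthetic, noise-free data from assumed parameters (illustration only)
t = (0:0.1:2)';
beta_true   = [3; 2];
lambda_true = [1; 4];
y = exp(-t*lambda_true')*beta_true;

% Design matrix for given decay rates: column j is exp(-lambda(j)*t)
designmat = @(lambda) exp(-t*lambda(:)');

% Objective: for given lambda, solve for beta with backslash and
% return the norm of the residual
resnorm = @(lambda) norm(designmat(lambda)*(designmat(lambda)\y) - y);

% Search over the nonlinear parameters only
lambda0 = [0.5; 3];              % initial guess (assumed)
[lambda, fval] = fminsearch(resnorm, lambda0);

% Recover the linear parameters at the optimum
beta = designmat(lambda) \ y;
```

fminsearch searches over two parameters instead of four, which is the efficiency gain of exploiting the separable structure. Fitting sums of exponentials is sensitive to the starting guess, so the recovered values should be checked against the data rather than trusted blindly.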
