ORE 766 Numerical Methods in Ocean Engineering
Instructor: Eva‐Marie Nosal
Least squares
• An approach to solving problems that are over‐
determined: more equations than unknowns
• Problem cannot be solved exactly. Instead,
minimize the sum of squares of residuals (the
differences between the observations and the
model)
• Let t denote the independent variable and y(t)
denote the unknown function of t that we wish
to approximate.
• Given m observations (values of y measured at
specified values of t):
yi = y(ti), i = 1, 2, …, m
• Want to find a model f(t) that minimizes the sum of squares of residuals, i.e. minimize:
Σi=1..m [ yi − f(ti) ]²
[Figure from Wikipedia]
• Appropriate probabilistic assumptions about the underlying error distributions
→ maximum‐likelihood estimate
• Even without these probabilistic assumptions, least squares gives useful results
• Computational techniques use orthogonal matrix factorizations
Linear least squares
• Linear least squares: fit a linear model to measurements
• Model y as a linear combination of n basis functions øj (t) :
y(t) ≈ β1ø1 (t) + β2ø2 (t) + … + βnøn (t) = f (t)
• By ≈ we mean that the sum of squares of residuals is minimized
• The basis functions øj (t) can be nonlinear functions of t but the unknown
parameters βj appear linearly
• In matrix notation:
y ≈ Xβ, where X is the m‐by‐n matrix with entries xi,j = øj(ti):
X = [ ø1(t1) ø2(t1) … øn(t1) ; ø1(t2) ø2(t2) … øn(t2) ; … ; ø1(tm) ø2(tm) … øn(tm) ]
β = [ β1 ; β2 ; … ; βn ],  y = [ y1 ; y2 ; … ; ym ]
• X is called the design matrix
• Linear least squares problems admit a closed‐form solution
• In MATLAB, compute the least squares solution to rectangular, over‐determined
(m > n) systems using the backslash operator (more on this later):
X\y
• m = n → can fit data exactly → interpolation
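• For example, a minimal sketch (with made‐up data) of a straight‐line fit via backslash:
    t = (0:9)';                      % independent variable, m = 10
    y = 2*t + 1 + 0.1*randn(10,1);   % made-up noisy measurements
    X = [t ones(10,1)];              % design matrix for f(t) = beta1*t + beta2
    beta = X \ y;                    % least squares solution of the overdetermined system
    sumsq = sum((y - X*beta).^2);    % the minimized sum of squares of residuals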
Common linear models
• Line:
f (t) = β1 t + β2
ø1 (t) = t, ø2 (t) = 1
• Polynomial:
f(t) = β1 t^(n−1) + β2 t^(n−2) + … + βn−1 t + βn
øj(t) = t^(n−j), j = 1, 2, …, n
(m=n → polynomial interpolation!)
If all t are large then the design matrix
will be badly scaled (recall question
3.18 from Assignment 2). Center and
scale t by:
s = (t − μ)/σ
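• A minimal sketch of this centering and scaling (census‐style years are an assumed illustration):
    t = (1900:10:2000)';             % large t values, e.g. census years
    s = (t - mean(t)) / std(t);      % s = (t - mu)/sigma
    X = [s.^2 s ones(size(s))];      % quadratic design matrix in s; well scaled
    % building X from t directly (t.^2 ~ 4e6) gives a badly scaled design matrix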
• Log‐linear:
ln( f(t) ) = β1 t + ln(β2)
f(t) = β2 exp(β1 t)
[Figure from Wikipedia]
• Play with censusgui to explore
linear, polynomial, and log‐linear fits
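• A minimal sketch of a log‐linear fit (made‐up data); note that it minimizes residuals of ln(y), not of y:
    t = (0:0.5:5)';
    y = 3*exp(0.4*t) .* (1 + 0.01*randn(size(t)));  % made-up noisy data
    X = [t ones(size(t))];       % design matrix for ln(y) = beta1*t + ln(beta2)
    c = X \ log(y);              % linear least squares on the logged data
    beta1 = c(1);
    beta2 = exp(c(2));           % recover beta2 from ln(beta2)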
Separable non‐linear least squares
• Model involves both linear (β) and nonlinear (α) parameters:
y(t) ≈ β1ø1 (t,α) + β2ø2 (t,α) + … + βnøn (t,α) = f (t)
• Elements of the design matrix depend on both t and α: xi.j=øj (ti ,α)
• No closed‐form solutions
– Combine backslash with fminsearch
– Use the Curve Fitting Toolbox (cftool) for a graphical interface
– More on this later
• Common separable non‐linear models:
– Rational functions:
f(t) = (β1 t^(n−1) + … + βn−1 t + βn) / (t^n + α1 t^(n−1) + … + αn−1 t + αn)
øj(t) = t^(n−j) / (t^n + α1 t^(n−1) + … + αn−1 t + αn)
– Exponentials:
f(t) = β1 e^(−α1 t) + … + βn e^(−αn t),  øj(t) = e^(−αj t)
– Gaussians:
f(t) = β1 e^(−(t−α1)²) + … + βn e^(−(t−αn)²),  øj(t) = e^(−(t−αj)²)
Other norms
• lp norm: ‖x‖p = ( Σi=1..n |xi|^p )^(1/p)
• Residuals are the differences between the observations and the model:
ri = yi − Σj=1..n βj øj(ti, α),  i = 1, …, m
• In matrix notation:
r = y − X(α) β
• In least squares we minimize the sum of squares of the residuals, i.e. we use the two‐norm:
‖r‖² = Σi=1..m ri²
• Can also use weighted least squares (e.g. when some observations are more important than others):
‖r‖w² = Σi=1..m wi ri²
• Or the one‐norm (resulting parameters are less sensitive to outliers):
‖r‖1 = Σi=1..m |ri|
• Or the infinity‐norm (minimize the largest residual) aka Chebyshev fit:
‖r‖∞ = maxi |ri|
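• A quick sketch of these norms in MATLAB for a residual vector r (the weights w are made up):
    r = [0.3; -1.2; 0.05; 2.0];
    norm(r)                      % two-norm: ordinary least squares
    norm(r, 1)                   % one-norm: less sensitive to outliers
    norm(r, Inf)                 % infinity-norm: largest residual (Chebyshev)
    w = [1; 1; 1; 0.1];          % down-weight the suspect last observation
    sqrt(sum(w .* r.^2))         % weighted two-norm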
5.5 Normal equations
• Recall: Linear least squares problem with m observations and n
basis functions, m > n
Xβ ≈ y
• X is the design matrix of dimensions m‐by‐n
• Overdetermined (m > n) → cannot be solved exactly → solve in a
least squares sense:
minβ ‖Xβ − y‖
• Theoretical approach:
– Multiply both sides by XT to reduce the system to a square n‐by‐n
system known as the normal equations:
XᵀX β = Xᵀ y
– If the basis functions are independent then XᵀX is nonsingular and
β = (XᵀX)⁻¹ Xᵀ y
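• A minimal sketch comparing the normal equations with backslash (made‐up data):
    t = (0:9)';  y = 2*t + 1 + 0.1*randn(10,1);
    X = [t ones(10,1)];
    beta_ne = (X'*X) \ (X'*y);   % normal equations: square n-by-n system
    beta_bs = X \ y;             % backslash (QR-based); preferred in practice
    % the two agree here, but can differ when X is badly conditioned (next slides)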
5.6 Pseudoinverse
• If X has full rank (linearly independent columns) then
X⁺ = (XᵀX)⁻¹ Xᵀ
is the Moore‐Penrose pseudoinverse (a.k.a. pseudoinverse, generalized inverse)
• X⁺ has some properties of the ordinary inverse. It is a left inverse because
X⁺X = (XᵀX)⁻¹ XᵀX = I
• It is as close to the right inverse as possible in the sense that, of all matrices Z, X⁺ minimizes ‖XZ − I‖F,
where we’ve used the Frobenius norm: ‖Z‖F = ( Σi Σj zi,j² )^(1/2)
• Actual computation involves singular value decomposition
• MATLAB code: pinv
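• A small sketch of the left‐inverse property (made‐up full‐rank X):
    X = [1 0; 1 1; 1 2];         % full rank: m = 3, n = 2
    Xp = pinv(X);                % Moore-Penrose pseudoinverse (computed via SVD)
    Xp * X                       % 2-by-2 identity, to roundoff: left inverse
    X * Xp                       % 3-by-3, but NOT the identity: only a projection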
Normal equations & pseudoinverses
Xβ ≈ y
• Using the normal equations and pseudoinverses we get:
β = (XᵀX)⁻¹ Xᵀ y = X⁺ y
• There are several undesirable aspects to this
– Matrix inverses require more work and are less accurate than solving the
system by Gaussian elimination (recall the chapter on Linear Equations)
– The normal equations are always more badly conditioned than the original
system: the condition number is squared (see the sketch below):
κ(XᵀX) = [κ(X)]²
– With finite‐precision computation, the normal equations can actually become
singular, i.e. (XᵀX)⁻¹ is nonexistent, even though the columns of X are
independent (Moler gives an example on p.8)
• MATLAB avoids the normal equations when using the backslash operator
X\y
• How? Using QR factorization. But first we need to learn about
Householder reflections…
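• A sketch illustrating the squared condition number (a Vandermonde matrix is an assumed example):
    X = vander(linspace(0, 1, 6));   % an ill-conditioned 6-by-6 design matrix
    cond(X)                          % condition number of X
    cond(X' * X)                     % roughly cond(X)^2: accuracy is lost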
5.4 Householder reflections
• Basis for very powerful & flexible
numerical algorithms
• A Householder reflection is a matrix of the form:
H = I − ρ u uᵀ
where u is any nonzero vector and ρ = 2/‖u‖²
• H is symmetric and orthogonal i.e.
Hᵀ = H and HᵀH = H² = I
• In practice (and in Moler’s code), don’t
actually form H. Apply H to a vector x by:
τ = ρ uᵀx,  Hx = x − τ u
• Geometrically, H transforms a vector into
its mirror image in the (hyper)plane
perpendicular to u
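• A minimal sketch of applying H without forming it (made‐up u and x):
    u = [3; 1; -2];                  % any nonzero vector
    rho = 2 / norm(u)^2;
    x = [1; 4; 5];
    tau = rho * (u' * x);
    Hx = x - tau * u;                % equals (eye(3) - rho*(u*u'))*x
    % cost is O(n) per vector instead of an O(n^2) matrix-vector product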
Householder reflections
• For a given vector x we can find a Householder reflection such that Hx
is all zeros except the kth component
• This is given by:
u = x + σ ek, where σ = ±‖x‖ (sign chosen to match xk to avoid cancellation)
ρ = 2/‖u‖² = 1/(σ uk)
H = I − ρ u uᵀ
• Where ek is the unit vector zero everywhere except the kth component
• Explore this using Householder.m (code on my ftp site)
Example:
x = [1 2 3]'
H = Householder(x,3)
Then: H*x = [0 0 ‐3.7417]'
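• Householder.m itself isn’t reproduced here; a minimal sketch of a function with this behavior (name and interface assumed from the example above) might be:
    function H = Householder(x, k)
    % Householder reflection H such that H*x is zero except its kth component
    x = x(:);  n = length(x);
    sigma = norm(x) * sign(x(k));    % sign matches x(k) to avoid cancellation
    if sigma == 0, sigma = norm(x); end
    u = x;  u(k) = u(k) + sigma;     % u = x + sigma*e_k
    rho = 1 / (sigma * u(k));        % = 2/norm(u)^2
    H = eye(n) - rho * (u * u');
    end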
QR factorization
• An orthogonalization algorithm that decomposes X into an orthogonal
matrix Q and a right (upper) triangular matrix R: X = QR
• R is m‐by‐n: rows 1…n hold the upper triangle (r11 r12 … r1n; r22 … r2n; …; rnn), rows n+1…m are all zero
• The Hi are Householder reflections:
– R = Hn…H2H1X
– Q = (Hn…H2H1)ᵀ
– Check: QR = (Hn…H2H1)ᵀ Hn…H2H1X = X
• To solve the system, apply the Householder reflections to both sides:
Hn…H2H1X β ≈ Hn…H2H1 y, i.e. Rβ ≈ z
• Results:
– The first n equations of Rβ ≈ z define a small square (n‐by‐n) triangular
system that can be solved for β by back substitution
– The coefficients of the remaining m − n equations are all zero → those
equations are independent of β, and the corresponding components of z
constitute the transformed residual
– QR is related to Gram‐Schmidt but is more numerically satisfactory
– QR is a good approach because Householder reflections have excellent numerical
properties and because the resulting triangular system is ready for back
substitution (sketched below)
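• A minimal sketch of this solve in MATLAB (made‐up data; Moler’s qrsteps shows the reflections one at a time):
    t = (0:5)';  y = [2; 3; 5; 8; 12; 18];       % made-up observations
    X = [t.^2 t ones(6,1)];                      % m = 6, n = 3
    [Q, R] = qr(X);                              % full QR factorization
    z = Q' * y;                                  % reflections applied to y
    beta = R(1:3, 1:3) \ z(1:3);                 % back substitution on top block
    resnorm = norm(z(4:end));                    % equals norm(y - X*beta)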
QR factorization
• How do we find the appropriate Householder reflections?
• Illustrate with an example (explore this using QRexample.m (code on my ftp site))
• Given (scaled) data:
• We want H1 to introduce zeros below the diagonal in
the first column of X
→ H1 is the Householder matrix that transforms the
first column of X to all zeros except the 1st component
• Then
    H1X = [ −1.2516  −1.4382  −1.7578
                  0   0.1540   0.9119
                  0   0.2161   0.6474
                  0   0.1863   0.2067
                  0   0.0646  −0.4102
                  0  −0.1491  −1.2035 ]
and
    H1y = [ −449.3721 ; 160.1450 ; 126.5001 ; 53.9032 ; −57.2146 ; −198.0273 ]
QR factorization
• Want H2 to introduce zeros below the diagonal in the
second column of H1X
→ H2 = [ 1 0 ; 0 H*2 ] (block form), where H*2 is the Householder
matrix that transforms the first column of the unreduced
submatrix of H1X to all zeros except the 1st component
• Then
    H2H1X = [ −1.2516  −1.4382  −1.7578
                    0  −0.3627  −1.3010
                    0        0  −0.2781
                    0        0  −0.5911
                    0        0  −0.6867
                    0        0  −0.5649 ]
and
    H2H1y = [ −449.3721 ; −242.3136 ; −41.8344 ; −91.2017 ; −107.4922 ; −81.8797 ]
QR factorization
• Want H3 to introduce zeros below the diagonal in the
third column of H2H1X
→ H3 = [ 1 0 0 ; 0 1 0 ; 0 0 H*3 ] (block form), where H*3 is the Householder
matrix that transforms the first column of the unreduced
submatrix of H2H1X to all zeros except the 1st component
• Then R = H3H2H1X and z = H3H2H1y, shown below
QR factorization
• The resulting system Rβ ≈ z is
    [ −1.2516  −1.4382  −1.7578 ]       [ −449.3721 ]
    [       0  −0.3627  −1.3010 ]       [ −242.3136 ]
    [       0        0   1.1034 ]  β ≈  [  168.2243 ]
    [       0        0        0 ]       [   −1.3218 ]
    [       0        0        0 ]       [   −3.0801 ]
    [       0        0        0 ]       [    4.0087 ]
• The first 3 equations
    [ −1.2516  −1.4382  −1.7578 ] [β1]   [ −449.3721 ]
    [       0  −0.3627  −1.3010 ] [β2] = [ −242.3136 ]
    [       0        0   1.1034 ] [β3]   [  168.2243 ]
can be solved exactly to give β1 = 5.6790, β2 = 121.1636, β3 = 152.4663
• The last three equations cannot be satisfied so the last three
components of z represent the residual
• This is the same solution that we get using beta = R \ z or beta = X \ y
(since backslash uses QR decomposition)
QR factorization
• In this polynomial example, once you know β, you can
evaluate the best fit model at any s using polyval(beta,s)
• Again, be careful with extrapolations!
• MATLAB implementation: qr and backslash
• Moler implementation: qrsteps
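• A sketch of evaluating a fit at new points, scaling exactly as the data were scaled (made‐up data):
    t = (0:10)';  y = 3*t.^2 - 5*t + 2 + 0.1*randn(11,1);
    s = (t - mean(t)) / std(t);
    beta = [s.^2 s ones(size(s))] \ y;           % quadratic fit in s
    tnew = 12;  snew = (tnew - mean(t)) / std(t);
    fnew = polyval(beta, snew);                  % extrapolation: use with care!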
5.7 Rank deficiency
• The rank of a matrix is the number of linearly independent columns (or rows)
• The rank of an m‐by‐n matrix is at most min(m,n).
• A full rank matrix has rank min(m,n) (rank is as large as possible)
• In this case, beta = X\y and beta = pinv(X)*y give the same solution (pinv
requires more work)
• Rank deficient means the rank is less than min(m,n)
• Pseudoinverse for rank deficient X
– (XᵀX)⁻¹ does not exist, so our definition of the pseudoinverse X⁺ = (XᵀX)⁻¹Xᵀ
breaks down
– pinv uses the minimum norm solution: of all the vectors β that minimize
‖Xβ − y‖, beta = pinv(X)*y also minimizes ‖β‖
– The minimum norm solution is unique
• Backslash for rank deficient X
– The solution (obtained via QR decomposition) is called a basic solution
– At most r of the components of beta = X\y are nonzero (where r=rank(X))
– Solution is not unique
Rank deficient example
    X = [  1   2   3          y = [ 16
           4   5   6                17
           7   8   9                18
          10  11  12                19
          13  14  15 ],             20 ]
• X is rank deficient since the middle column is the average of the first and last columns
• η = [ 1 ; −2 ; 1 ] is a null vector, i.e. Xη = 0
• For a particular solution β and any scalar c, β+cη is also a solution since
X(β + cη) = Xβ + cXη = Xβ + c·0 ≈ y
• beta = X\y gives a warning about rank deficiency. Solution: beta = [−7.5000 0 7.8333]ᵀ
• This is basic (only 2 non‐zero components), but the vectors beta = [0 −15.0000 15.3333]ᵀ
and beta = [−15.3333 15.6667 0]ᵀ are also basic solutions
• beta = pinv(X)*y doesn’t give a warning and produces beta = [−7.5556 0.1111 7.7778]ᵀ
• As expected, norm(pinv(X)*y) = 10.8440 < norm(X\y) = 10.8449
• Remember: for rank deficient systems the computed solutions are not unique!
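• The example above in MATLAB (a minimal sketch):
    X = [1 2 3; 4 5 6; 7 8 9; 10 11 12; 13 14 15];
    y = (16:20)';
    rank(X)                      % = 2 < min(m,n) = 3: rank deficient
    b1 = X \ y;                  % basic solution, with a rank-deficiency warning
    b2 = pinv(X) * y;            % minimum norm solution, no warning
    [norm(b1) norm(b2)]          % norm(b2) <= norm(b1)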
5.8 Separable least squares
• Recall: The separable least squares model involves both linear (β) and
nonlinear (α) parameters:
y(t) ≈ β1ø1 (t,α) + β2ø2 (t,α) + … + βnøn (t,α) = f (t)
• Various direct and gradient methods
• MATLAB: Optimization toolbox, Curve Fitting Toolbox
• We’ll use fminsearch (uses the Nelder‐Mead algorithm)
• Can possibly use fminsearch to search for all parameters (as we would
do for a fully non‐linear problem). Instead…
• Make use of the separable structure for a more efficient method
– use fminsearch to search for values of the nonlinear parameters that
minimize the norm of the residual
– At each step, use backslash to compute the values of the linear
parameters
• Two blocks of code:
– One sets up the problem and calls fminsearch
– The other is the objective function that is called by fminsearch
Separable least squares example
• fminsearch syntax:
[x, fval] = fminsearch(fun, x0, options, p1, p2,…)
where
x = vector of the parameters that minimize the function fun
fval = the value of the function at the minimum
x0 = a vector of the initial guesses for the parameters
options = structure with optimization parameters created with optimset
(use [] for default)
p1,p2,… = additional arguments
• Example: Use observations of radioactive decay
• Model: y ≈ β1 e^(−λ1 t) + β2 e^(−λ2 t)
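• A minimal sketch of the two code blocks for this model (made‐up data; decayres and lambda0 are assumed names, and an anonymous function replaces the older p1,p2,… argument passing):
    % Block 1: set up the problem and call fminsearch
    t = (0:0.5:8)';
    y = 4*exp(-0.5*t) + 2*exp(-2*t) + 0.01*randn(size(t));   % made-up decay data
    lambda0 = [1; 3];                            % initial guess for nonlinear parameters
    lambda = fminsearch(@(lam) decayres(lam, t, y), lambda0);
    X = [exp(-lambda(1)*t) exp(-lambda(2)*t)];   % design matrix at the best lambda
    beta = X \ y;                                % linear parameters via backslash

    % Block 2: objective function called by fminsearch (local function)
    function rnorm = decayres(lam, t, y)
    X = [exp(-lam(1)*t) exp(-lam(2)*t)];   % design matrix for this lambda
    beta = X \ y;                          % best linear parameters via backslash
    rnorm = norm(y - X*beta);              % fminsearch minimizes this residual norm
    end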