

EECS 16B Designing Information Devices and Systems II


Fall 2021 UC Berkeley Homework 14
This homework is due on Thursday, December 9, 2021, at 11:59PM. Self-grades and HW Resubmission are due on Saturday, December 11, 2021, at 11:59PM.

1. Reading Lecture Notes


Staying up to date with lectures is an important part of the learning process in this course. Here are links to the notes that you need to read for this week: Note 14, Note 15, Note 19, Note 2j.

(a) What is the least squares solution for the problem $A\vec{x} \approx \vec{b}$ when all matrices and vectors have complex entries?
Solution: Instead of $\widehat{\vec{x}} = (A^\top A)^{-1}A^\top\vec{b}$, we have $\widehat{\vec{x}} = (A^*A)^{-1}A^*\vec{b}$.
(b) Given a complex orthonormal matrix A, what is the projection of a vector ~b onto the columns of
A?
Solution: Using the previous answer, we know that the least squares solution, when pre-multiplied by $A$, yields the projection of $\vec{b}$ onto the columns of $A$. This means that $\mathrm{proj}_{\mathrm{Col}(A)}(\vec{b}) = A(A^*A)^{-1}A^*\vec{b} = A(I)A^*\vec{b} = AA^*\vec{b}$.
(c) What is the pseudoinverse of a matrix A in terms of the compact SVD of A? You may find looking
at a previous homework problem helpful.
Solution: If $A = U\Sigma V^\top$, where $A \in \mathbb{R}^{m\times n}$ and the rank of $A$ is $r$ (it has $r$ non-zero singular values), with $\Sigma \in \mathbb{R}^{r\times r}$, $U$ having $r$ columns in $\mathbb{R}^m$, and $V$ having $r$ columns in $\mathbb{R}^n$, then $A^\dagger = V\Sigma^{-1}U^\top$.
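As a quick numerical sanity check of these three answers (an illustrative sketch, not part of the original assignment; the matrix and vector here are randomly generated), numpy agrees with the formulas above:

import numpy as np

rng = np.random.default_rng(0)

# (a) Complex least squares: x_hat = (A* A)^{-1} A* b, using the conjugate transpose A*.
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
b = rng.standard_normal(5) + 1j * rng.standard_normal(5)
x_hat = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))      # True

# (b) Projection onto the columns of a matrix Q with orthonormal (complex) columns: Q Q* b.
Q, _ = np.linalg.qr(A)
proj = Q @ (Q.conj().T @ b)
print(np.allclose(proj, Q @ np.linalg.lstsq(Q, b, rcond=None)[0]))   # True

# (c) Pseudoinverse from the compact SVD; for a complex A, transposes become conjugate transposes.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_dagger = Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T
print(np.allclose(A_dagger, np.linalg.pinv(A)))                      # True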


2. Segway Tours (Optional)


A segway is a stand on two wheels, and can be thought of as an inverted pendulum. The segway works by applying a force (through the spinning wheels) to the base of the segway. This controls both the position of the segway and the angle of the stand. As the driver pushes on the stand, the segway tries to bring itself back to the upright position, and it can only do this by moving the base.
Recall that you analyzed a basic version of this segway question in Homework 0, problem 5. You were given a linear discrete time representation of the segway dynamics, and were guided through the steps to determine whether it is possible to make the segway reach some desired states, essentially laying the foundation of controllability. Now, we will see how to derive the linear discrete time system from the equations of motion, and then do some further refined analysis based on our improved knowledge of controllability.
The main question we wish to answer is: Is it possible for the segway to be brought upright and to a stop
from any initial configuration? There is only one input (force) used to control two outputs (position and
angle). Let’s model the segway as a cart-pole system and analyze.

A cart-pole system can be fully described by its position $p$, velocity $\frac{dp}{dt}$, angle $\theta$, and angular velocity $\frac{d\theta}{dt}$. We can write this as the continuous time state vector $\vec{x}$ as follows:
$$\vec{x} = \begin{bmatrix} p \\ \frac{dp}{dt} \\ \theta \\ \frac{d\theta}{dt} \end{bmatrix} \tag{1}$$

The input to this system is a scalar quantity u(t) at time t, which is the force F applied to the cart (or base
of the segway). Let the coefficient of friction be k.
The equations of motion for this system are as follows:
$$\begin{aligned}
\frac{d^2 p}{dt^2} &= \frac{1}{\frac{M}{m} + \sin^2\theta}\left(\frac{u}{m} + l\sin\theta\left(\frac{d\theta}{dt}\right)^2 - g\sin\theta\cos\theta - \frac{k}{m}\frac{dp}{dt}\right) \\
\frac{d^2\theta}{dt^2} &= \frac{1}{l\left(\frac{M}{m} + \sin^2\theta\right)}\left(-\frac{u}{m}\cos\theta - l\cos\theta\sin\theta\left(\frac{d\theta}{dt}\right)^2 + \frac{M+m}{m}g\sin\theta + \frac{k}{m}\cos\theta\frac{dp}{dt}\right)
\end{aligned} \tag{2}$$

The derivation of these equations is a mechanics problem and not in 16B scope, but interested students can
look up the details online.

(a) First let's linearize the system of equations in (2) about the upright position at rest, i.e. $\theta^* = 0$ and $\frac{d\theta}{dt}^* = 0$. Show that the linearized system of equations is given by the following state space form:
   
$$\frac{d\vec{x}(t)}{dt} = \underbrace{\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -\frac{k}{M} & -\frac{m}{M}g & 0 \\ 0 & 0 & 0 & 1 \\ 0 & \frac{k}{Ml} & \frac{M+m}{Ml}g & 0 \end{bmatrix}}_{A}\vec{x}(t) + \underbrace{\begin{bmatrix} 0 \\ \frac{1}{M} \\ 0 \\ -\frac{1}{Ml} \end{bmatrix}}_{\vec{b}} u(t) \tag{3}$$


(HINT: Since we are linearizing around $\theta^* = 0$ and $\frac{d\theta}{dt}^* = 0$, you can use the following approximations for small values of $\theta$:
$$\sin\theta \approx \theta, \qquad \sin^2\theta \approx 0, \qquad \cos\theta \approx 1, \qquad \left(\frac{d\theta}{dt}\right)^2 \approx 0.$$

You do not have to do the full linearization using Taylor series, you can just substitute the approxima-
tions above. You will get the same answer as doing the linear Taylor series approximation.)
Notice that for this particular choice of $\theta^*$ and $\frac{d\theta}{dt}^*$, the linearization does not depend on what $p$ or $\frac{dp}{dt}$ is. This is partially a stroke of luck and partially a consequence of the fact that the position $p$ doesn't appear in the dynamics equations.
Solution: Using the approximations from the hint, (2) is linearized as follows:

$$\begin{aligned}
\frac{d^2 p}{dt^2} &= -\frac{k}{M}\frac{dp}{dt} - \frac{m}{M}g\theta + \frac{1}{M}u \\
\frac{d^2\theta}{dt^2} &= \frac{k}{Ml}\frac{dp}{dt} + \frac{M+m}{Ml}g\theta - \frac{1}{Ml}u
\end{aligned} \tag{4}$$
Now (4) can be represented in state space form as
      
$$\frac{d}{dt}\begin{bmatrix} p \\ \frac{dp}{dt} \\ \theta \\ \frac{d\theta}{dt} \end{bmatrix} = \underbrace{\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -\frac{k}{M} & -\frac{m}{M}g & 0 \\ 0 & 0 & 0 & 1 \\ 0 & \frac{k}{Ml} & \frac{M+m}{Ml}g & 0 \end{bmatrix}}_{A}\begin{bmatrix} p \\ \frac{dp}{dt} \\ \theta \\ \frac{d\theta}{dt} \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ \frac{1}{M} \\ 0 \\ -\frac{1}{Ml} \end{bmatrix}}_{\vec{b}} u(t) \tag{5}$$

(b) For all subsequent parts, assume that m = 1, M = 10, g = 10, l = 1 and k = 0.1. Let’s consider the
discrete time representation of the state space (3) at time t = n∆. For simplicity, assume ∆ = 1. The
discrete time state ~xd follows the following linear model:

~xd [n + 1] = Ad ~xd [n] + ~bd ud [n] (6)

where $A_d \in \mathbb{R}^{4\times 4}$ and $\vec{b}_d \in \mathbb{R}^{4\times 1}$. Find $A_d$ and $\vec{b}_d$ in terms of the eigenvalues and eigenvectors of $A$, and $\Delta$. State numerical values for $A_d$ and $\vec{b}_d$. Use the Jupyter notebook segway.ipynb for all numerical calculations, and approximate the results to 2 or 3 significant figures.


(HINT: Recall that the continuous time scalar differential equation
$$\frac{dz(t)}{dt} = \lambda z(t) + c\,w(t)$$
can be represented in discrete time ($t = n\Delta$) as follows:
$$z_d[n+1] = \begin{cases} e^{\lambda\Delta}\, z_d[n] + \frac{e^{\lambda\Delta}-1}{\lambda}\, c\, w_d[n] & \text{if } \lambda \neq 0 \\ z_d[n] + \Delta\, c\, w_d[n] & \text{if } \lambda = 0 \end{cases}$$
Use the eigendecomposition $A = V\Lambda V^{-1}$ to do a change of variables, and you should finally reach
$$\vec{x}_d[n+1] = \underbrace{V\Lambda_d V^{-1}}_{A_d}\vec{x}_d[n] + \underbrace{V M_d V^{-1}\vec{b}}_{\vec{b}_d}\, u_d[n]$$
What are the elements of $\Lambda_d$ and $M_d$ in terms of the elements of $\Lambda$? You may find that later parts of the notebook have $A_d$ and $\vec{b}_d$, which can serve as a sanity check for your derivation and numerical calculations.)
Solution: Using the eigendecomposition $A = V\Lambda V^{-1}$ and a change of variables $\vec{y}(t) = V^{-1}\vec{x}(t)$, we can transform (3) into a system of decoupled equations
$$\frac{d\vec{y}(t)}{dt} = \Lambda\vec{y}(t) + V^{-1}\vec{b}\,u(t) \tag{7}$$
This can be represented in discrete time as follows:
$$\vec{y}_d[n+1] = \Lambda_d\vec{y}_d[n] + M_d V^{-1}\vec{b}\,u_d[n] \tag{8}$$

where $\Lambda_d$ is a diagonal matrix given by
$$\Lambda_{d,ii} = \begin{cases} e^{\Lambda_{ii}\Delta} & \text{if } \Lambda_{ii} \neq 0 \\ 1 & \text{if } \Lambda_{ii} = 0 \end{cases} \tag{9}$$
and $M_d$ is a diagonal matrix given by
$$M_{d,ii} = \begin{cases} \frac{e^{\Lambda_{ii}\Delta}-1}{\Lambda_{ii}} & \text{if } \Lambda_{ii} \neq 0 \\ \Delta & \text{if } \Lambda_{ii} = 0 \end{cases} \tag{10}$$

Changing variables back to ~xd [n] = V ~yd [n], we get

$$\vec{x}_d[n+1] = \underbrace{V\Lambda_d V^{-1}}_{A_d}\vec{x}_d[n] + \underbrace{V M_d V^{-1}\vec{b}}_{\vec{b}_d}\, u_d[n] \tag{11}$$

Plugging in the values of m = 1, M = 10, g = 10, l = 1, k = 0.1, ∆ = 1 in the Jupyter notebook,


we get
$$A_d \approx \begin{bmatrix} 1 & 0.994 & -1.161 & -0.286 \\ 0 & 0.987 & -4.138 & -1.161 \\ 0 & 0.012 & 13.797 & 4.150 \\ 0 & 0.041 & 45.634 & 13.797 \end{bmatrix}, \qquad \vec{b}_d \approx \begin{bmatrix} 0.056 \\ 0.128 \\ -0.116 \\ -0.414 \end{bmatrix} \tag{12}$$
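The following Python sketch mirrors the kind of computation done in segway.ipynb (the notebook's actual code may differ); it builds $A_d$ and $\vec{b}_d$ from the eigendecomposition of $A$ exactly as in (9)-(11), assuming $A$ is diagonalizable:

import numpy as np

m, M, g, l, k, dt = 1.0, 10.0, 10.0, 1.0, 0.1, 1.0

A = np.array([[0, 1,           0,                     0],
              [0, -k / M,      -m * g / M,            0],
              [0, 0,           0,                     1],
              [0, k / (M * l), (M + m) * g / (M * l), 0]])
b = np.array([0, 1 / M, 0, -1 / (M * l)])

# Eigendecomposition A = V Lambda V^{-1} (assumes A is diagonalizable, which it is here).
lam, V = np.linalg.eig(A)

# Lambda_d and M_d from (9) and (10); e^{0 * dt} = 1 handles the lambda = 0 case of (9) automatically.
Lam_d = np.diag(np.exp(lam * dt))
safe = np.where(np.isclose(lam, 0), 1.0, lam)                 # avoid dividing by zero
M_d = np.diag(np.where(np.isclose(lam, 0), dt, (np.exp(lam * dt) - 1) / safe))

Ad = np.real(V @ Lam_d @ np.linalg.inv(V))                    # (11)
bd = np.real(V @ M_d @ np.linalg.inv(V) @ b)
print(np.round(Ad, 3))
print(np.round(bd, 3))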

(c) Show that the linear-approximation discrete time system in (6) is controllable by using the ap-
propriate matrix in the Jupyter notebook.
(HINT: Is the controllability matrix full rank? You have to use numerical values of Ad and ~bd from the
previous part. Use the Jupyter notebook for all numerical calculations.)
Solution: Since $A_d \in \mathbb{R}^{4\times 4}$ and $\vec{b}_d \in \mathbb{R}^{4\times 1}$, the controllability matrix is given by
$$\mathcal{C} = \begin{bmatrix} \vec{b}_d & A_d\vec{b}_d & A_d^2\vec{b}_d & A_d^3\vec{b}_d \end{bmatrix}$$
Using the Jupyter notebook, we can see that $\mathcal{C}$ has rank 4, hence the discrete time system is controllable.
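Continuing the sketch from part (b) (illustrative, assuming the Ad and bd computed in the previous snippet), the rank check is a couple of lines:

import numpy as np

# Controllability matrix of the 4-dimensional discrete time system (Ad, bd from the previous sketch).
C = np.column_stack([np.linalg.matrix_power(Ad, i) @ bd for i in range(4)])
print(np.linalg.matrix_rank(C))   # 4, so the system is controllable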
(d) Since the linear-approximation discrete time system is controllable, it is possible to reach any final
state ~xd,f starting from any initial state ~xd,i using an appropriate sequence of inputs in exactly 4 steps,
provided that the deviations are small enough so that the linearization approximation is valid. Set up
a set of linear equations to solve for the $u_d[0], u_d[1], u_d[2], u_d[3]$ given the initial and final states.
Find the input sequence to reach the upright position $\vec{x}_{d,f} = \vec{x}_d[4] = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$ starting from an initial state $\vec{x}_{d,i} = \vec{x}_d[0] = \begin{bmatrix} -2 \\ 3.1 \\ 0.3 \\ -0.6 \end{bmatrix}$. Use the Jupyter notebook for all numerical calculations and simulation.
Explain qualitatively what you observe from the segway simulation.
(HINT: Use (6) and loop unrolling to express ~xd [4] as a linear combination of ~xd [0], ud [3], ud [2],
ud [1], ud [0].)
Solution: In 4 time steps, the discrete time system in (6) reaches

$$\vec{x}_d[4] = A_d^4\vec{x}_d[0] + \vec{b}_d u_d[3] + A_d\vec{b}_d u_d[2] + A_d^2\vec{b}_d u_d[1] + A_d^3\vec{b}_d u_d[0]$$
$$\implies \vec{x}_d[4] - A_d^4\vec{x}_d[0] = \begin{bmatrix}\vec{b}_d & A_d\vec{b}_d & A_d^2\vec{b}_d & A_d^3\vec{b}_d\end{bmatrix}\begin{bmatrix}u_d[3]\\ u_d[2]\\ u_d[1]\\ u_d[0]\end{bmatrix} = \mathcal{C}\begin{bmatrix}u_d[3]\\ u_d[2]\\ u_d[1]\\ u_d[0]\end{bmatrix}$$
$$\implies \begin{bmatrix}u_d[3]\\ u_d[2]\\ u_d[1]\\ u_d[0]\end{bmatrix} = \mathcal{C}^{-1}\left(\vec{x}_d[4] - A_d^4\vec{x}_d[0]\right)$$
Using the notebook, we can calculate $\begin{bmatrix}u_d[3]\\ u_d[2]\\ u_d[1]\\ u_d[0]\end{bmatrix} \approx \begin{bmatrix}-1.636\\ 48.650\\ -97.747\\ 17.433\end{bmatrix}$.
The simulation shows the inverted pendulum stabilizing to the upright position at rest from the initial
position.
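A sketch of the same computation in Python (using the Ad, bd, and controllability matrix C from the earlier snippets; the notebook's code may differ):

import numpy as np

x0 = np.array([-2.0, 3.1, 0.3, -0.6])
xf = np.zeros(4)

# Solve  C [u[3], u[2], u[1], u[0]]^T = xf - Ad^4 x0  for the input sequence.
u_rev = np.linalg.solve(C, xf - np.linalg.matrix_power(Ad, 4) @ x0)
print(np.round(u_rev, 3))          # should match the values reported above

# Sanity check: roll the linear model forward under u[0], ..., u[3] and land at the origin.
x = x0.copy()
for u_n in u_rev[::-1]:            # reverse to get time order u[0], u[1], u[2], u[3]
    x = Ad @ x + bd * u_n
print(np.round(x, 6))              # approximately the zero vector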
 
(e) Now suppose we try to use an initial state $\vec{x}_{d,i} = \vec{x}_d[0] = \begin{bmatrix} -2 \\ 3.1 \\ 3.3 \\ -0.6 \end{bmatrix}$ for which the approximation is
poor since θi = 3.3 is very far from the linearization point θ∗ = 0. Using the equations derived in
the previous part, use the Jupyter notebook to determine the input sequence to reach the same
final upright position. Explain qualitatively what you observe from the segway simulation. Use
the Jupyter notebook for all numerical calculations and simulation.
   
Solution: Using the notebook, we can calculate $\begin{bmatrix}u_d[3]\\ u_d[2]\\ u_d[1]\\ u_d[0]\end{bmatrix} \approx \begin{bmatrix}-15.049\\ 445.384\\ -851.258\\ 387.623\end{bmatrix}$.
The simulation shows that the inverted pendulum again stabilizes to the upright position at rest from
the initial position, but does some weird unexpected rotations before reaching there.

Compare the simulation results in parts (d) and (e). In both cases, the segway finally stabilizes to an upright
position at rest. However, in part (d) the behavior of the segway looks more realistic whereas in part (e) it is
doing some wild unexpected rotations.
This is because the linearization approximation is valid for the small initial values of $\theta$ and $\frac{d\theta}{dt}$ in part (d). So this discrete time linear model is a good representation of the original continuous time non-linear
system. Hence the trajectory taken by the segway from the initial to the final position is similar to what we
may expect from real life physics.
However in part (e), the linearization approximation is not really valid. The approximate model still con-
verges to the final upright position because (6) is controllable as we proved in part (c). However, since the
approximation is not valid anymore, this discrete time linear model is not a good representation of the orig-
inal continuous time non-linear system. Hence the predicted trajectory is extremely weird with the segway
undergoing a few full rotations, and does not match what we would expect from the real system.
We can still analyze the system in continuous time by directly solving the set of non-linear differential
equations in (2) (out of 16B scope) or in discrete time using a finely discretized (and still nonlinear) version
of (2). Note that there are two independent distinctions we are making, i.e. continuous vs discrete, and


linear vs non-linear. Part (e) failed because it’s beyond the scope of the linear model, not because we are
using a discrete time system. A non-linear discrete time analysis would also give the correct solution.

(f) Let’s analyze the behavior of the segway by comparing the continuous time linear model and contin-
uous time non-linear model. Deriving the control input to bring the segway to the upright position
at rest requires more care which is out of 16B scope, so we will just look at the simple case of the
segway freely settling to steady state in the absence of any control input, i.e. u(t) = 0. Toggle the
linearized flag between True and False in the Jupyter notebook, and qualitatively explain
the differences in the trajectory as the segway freely swings around.
Solution: We notice that the non-linear system is a much more accurate representation of what a
real segway looks like. The linearized system starts off looking realistic, but then behaves oddly as
the simulation continues for longer. This is because we have approximated a non-linear system with a
linear one, so there will be a discrepancy when we have a large deviation from the operating point.
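A minimal sketch of what the linearized flag toggles (the notebook's actual simulation is more polished): integrate the nonlinear dynamics (2) and the linearized dynamics (3) side by side with $u(t) = 0$ using a simple Euler step, and watch the two $\theta$ trajectories drift apart once the deviation from the operating point is no longer small.

import numpy as np

m, M, g, l, k = 1.0, 10.0, 10.0, 1.0, 0.1

def f_nonlinear(x, u=0.0):
    # The cart-pole equations of motion (2).
    p, dp, th, dth = x
    denom = M / m + np.sin(th) ** 2
    ddp = (u / m + l * np.sin(th) * dth ** 2 - g * np.sin(th) * np.cos(th)
           - (k / m) * dp) / denom
    ddth = (-(u / m) * np.cos(th) - l * np.cos(th) * np.sin(th) * dth ** 2
            + ((M + m) / m) * g * np.sin(th) + (k / m) * np.cos(th) * dp) / (l * denom)
    return np.array([dp, ddp, dth, ddth])

A = np.array([[0, 1, 0, 0],
              [0, -k / M, -m * g / M, 0],
              [0, 0, 0, 1],
              [0, k / (M * l), (M + m) * g / (M * l), 0]])

x_nl = np.array([0.0, 0.0, 0.5, 0.0])     # start tilted by 0.5 rad, zero input
x_lin = x_nl.copy()
dt = 1e-3
for _ in range(3000):                      # 3 seconds of simple Euler integration
    x_nl = x_nl + dt * f_nonlinear(x_nl)
    x_lin = x_lin + dt * (A @ x_lin)
print(x_nl[2], x_lin[2])                   # the two theta trajectories have drifted apart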

There’s actually much more that we could have you do with this problem with what is in scope in 16B. But
the semester is drawing to a close and you need to study for other courses too. So we will stop here.


3. Adapting Proofs to the Complex Case


At many points in the course, we have made assumptions that various matrices or eigenvalues are real while discussing various theorems. If you have noticed, this has always happened in contexts where we have invoked orthonormality during the proof or statement of the result. Now that you understand the idea of orthonormality for complex vectors, and how to adapt Gram-Schmidt to complex vectors, you can go back and remove those restrictions. This problem asks you to do exactly that.
Unlike many of the problems that we have given you in 16A and 16B, this problem has far less hand-
holding — there aren’t multiple parts breaking down each step for you. Fortunately, you have the existing
proofs in your notes to work based on. So this problem has a secondary function to help you solidify your
understanding of these earlier concepts ahead of the final exam.
There is one concept that you will need beyond the idea of what orthogonality means for complex vectors as well as the idea of conjugate-transposes of vectors and matrices. The analog of a real symmetric matrix $S$ that satisfies $S = S^\top$ is what is called a Hermitian matrix $H$ that satisfies $H = H^*$, where $H^* = \overline{H}^\top$ is the conjugate-transpose of $H$.

(a) The upper-triangularization theorem for all (potentially complex) square matrices A says that there
exists an orthonormal (possibly complex) basis V so that V ∗ AV is upper-triangular.
Adapt the proof from the real case with assumed real eigenvalues to prove this theorem.
Feel free to assume that any square matrix has a (potentially complex) eigenvalue/eigenvector pair.
You don’t need to prove this. But you can make no other assumptions.
(HINT: Use the exact same argument as before, just use conjugate-transposes instead of transposes.)
Solution:
The proof proceeds by induction over the size of matrices. The result is trivially true for 1-dimensional matrices since every 1-dimensional matrix is already diagonal, and hence upper-triangular.
We now assume (for purposes of induction) that the statement is true for all $(n-1)$-dimensional matrices. This means that given any $(n-1)$-dimensional matrix $M$, we can find an orthonormal basis $V_{n-1}$ with $n-1$ vectors so that $V_{n-1}^* M V_{n-1}$ is upper-triangular.
Given an $n$-dimensional matrix $A$, we find a possibly complex eigenvalue/eigenvector pair $\lambda, \vec{v}$ with $\|\vec{v}\| = 1$ so that $\lambda\vec{v} = A\vec{v}$. We take the eigenvector $\vec{v}$ and extend it to an orthonormal, potentially complex basis for all of $n$-dimensional complex space, $U = [\vec{v}, R]$ where $R = [\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_{n-1}]$. This can be done using Gram-Schmidt, which discussion showed works just as well for complex vectors. Then we change coordinates to this orthonormal basis by computing

$$\begin{aligned}
U^* A U &= [\vec{v}, R]^* A [\vec{v}, R] & (13)\\
&= \begin{bmatrix}\vec{v}^* \\ R^*\end{bmatrix}\begin{bmatrix}A\vec{v} & AR\end{bmatrix} & (14)\\
&= \begin{bmatrix}\vec{v}^* \\ R^*\end{bmatrix}\begin{bmatrix}\lambda\vec{v} & AR\end{bmatrix} & (15)\\
&= \begin{bmatrix}\lambda\vec{v}^*\vec{v} & \vec{v}^* A R \\ \lambda R^*\vec{v} & R^* A R\end{bmatrix} & (16)\\
&= \begin{bmatrix}\lambda & \vec{v}^* A R \\ \vec{0} & R^* A R\end{bmatrix} & (17)
\end{aligned}$$

where we simply use the orthogonality of R and ~v to get the structure of the first column.


It is clear we want to recurse into the $R^* A R$ block. So we use the inductive hypothesis on $M = R^* A R$ to get a basis $V_{n-1}$ such that $V_{n-1}^* M V_{n-1} = V_{n-1}^* R^* A R V_{n-1} = (R V_{n-1})^* A (R V_{n-1})$ is upper-triangular.
Now, define $V = [\vec{v}, R V_{n-1}]$ and compute:

$$\begin{aligned}
V^* A V &= [\vec{v}, RV_{n-1}]^* A [\vec{v}, RV_{n-1}] & (18)\\
&= \begin{bmatrix}\vec{v}^* \\ V_{n-1}^* R^*\end{bmatrix}\begin{bmatrix}A\vec{v} & ARV_{n-1}\end{bmatrix} & (19)\\
&= \begin{bmatrix}\vec{v}^* \\ V_{n-1}^* R^*\end{bmatrix}\begin{bmatrix}\lambda\vec{v} & ARV_{n-1}\end{bmatrix} & (20)\\
&= \begin{bmatrix}\lambda & \vec{v}^* A R V_{n-1} \\ \lambda V_{n-1}^* R^*\vec{v} & V_{n-1}^* R^* A R V_{n-1}\end{bmatrix} & (21)\\
&= \begin{bmatrix}\lambda & \vec{v}^* A R V_{n-1} \\ \vec{0} & V_{n-1}^* R^* A R V_{n-1}\end{bmatrix} & (22)\\
&= \begin{bmatrix}\lambda & \vec{v}^* A R V_{n-1} \\ \vec{0} & (RV_{n-1})^* A (RV_{n-1})\end{bmatrix} & (23)
\end{aligned}$$
which is upper-triangular since, by the inductive hypothesis, $(RV_{n-1})^* A (RV_{n-1})$ is upper-triangular.


All that remains is to show that $V$ is an orthonormal matrix, which can be seen by computing
$$\begin{aligned}
V^* V &= [\vec{v}, RV_{n-1}]^* [\vec{v}, RV_{n-1}] & (24)\\
&= \begin{bmatrix}\vec{v}^* \\ V_{n-1}^* R^*\end{bmatrix}\begin{bmatrix}\vec{v} & RV_{n-1}\end{bmatrix} & (25)\\
&= \begin{bmatrix}\vec{v}^*\vec{v} & \vec{v}^* R V_{n-1} \\ V_{n-1}^* R^*\vec{v} & V_{n-1}^* R^* R V_{n-1}\end{bmatrix} & (26)\\
&= \begin{bmatrix}1 & \vec{0}^\top V_{n-1} \\ V_{n-1}^*\vec{0} & V_{n-1}^* (R^* R) V_{n-1}\end{bmatrix} & (27)\\
&= \begin{bmatrix}1 & \vec{0}^\top \\ \vec{0} & V_{n-1}^* V_{n-1}\end{bmatrix} & (28)\\
&= \begin{bmatrix}1 & \vec{0}^\top \\ \vec{0} & I\end{bmatrix} & (29)\\
&= I & (30)
\end{aligned}$$

which completes the proof. By induction, all complex matrices can be upper-triangularized by an
orthonormal change of coordinates.
Congratulations, once you have completed this part you essentially can solve all systems of linear
differential equations based on what you know, and you can also complete the proof that having all the
eigenvalues being stable implies BIBO stability.
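If you want to see this theorem numerically (purely illustrative, not required), scipy's complex Schur decomposition computes exactly such an orthonormal (unitary) basis $V$ with $V^*AV$ upper-triangular:

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

T, V = schur(A, output='complex')                      # A = V T V*, V unitary
print(np.allclose(V.conj().T @ V, np.eye(4)))          # V has orthonormal columns
print(np.allclose(np.tril(T, -1), 0))                  # T = V* A V is upper-triangular
print(np.allclose(V @ T @ V.conj().T, A))              # and it reconstructs A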
(b) The spectral theorem for Hermitian matrices says that a Hermitian matrix has all real eigenvalues and
an orthonormal set of eigenvectors.


Adapt the proof from the real symmetric case to prove this theorem.
(HINT: As before, you should just leverage upper-triangularization and use the fact that (ABC)∗ =
C ∗ B ∗ A∗ . There is a reason that this part comes after the first part.)
Solution: By upper-triangularization, a Hermitian matrix $H$ can be upper-triangularized so that $U = V^* H V$ is upper-triangular, where $V$ is orthonormal. But $U^* = (V^* H V)^* = V^* H^* (V^*)^* = V^* H^* V = V^* H V = U$ since $H$ is Hermitian. This means that the off-diagonal elements of $U$ must be zero, since conjugate-transposing would make the upper-triangular elements become the conjugates of the lower-triangular elements, which are zero since $U$ is upper-triangular. Furthermore, along the diagonal, $U[i][i] = \overline{U[i][i]}$ implies that all these elements must be real. Consequently, $U$ must be a real diagonal matrix. Call $U = \Lambda$ to emphasize this fact.
Now we know that Λ = V ∗ HV which implies that H = V ΛV ∗ . Consider the i-th column vector
~vi of V . Computing H~vi = V ΛV ∗~vi = V Λ~ei = V λi~ei = λi~vi where we use ~ei to denote the i-th
standard basis vector — i.e. the i-th column of the identity matrix — and λi is the i-th diagonal entry
in Λ which we know from above must be a real number.
This means that all the columns of V are eigenvectors corresponding to real eigenvalues, establishing
the theorem we want to prove.
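A quick numerical illustration (not required by the problem): numpy's eigh routine is built for Hermitian matrices and returns exactly the real eigenvalues and orthonormal eigenvectors the theorem promises.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = B + B.conj().T                                     # H = H*, so H is Hermitian

lam, V = np.linalg.eigh(H)                             # eigh assumes a Hermitian input
print(lam)                                             # eigenvalues come back real
print(np.allclose(V.conj().T @ V, np.eye(4)))          # eigenvectors are orthonormal
print(np.allclose(V @ np.diag(lam) @ V.conj().T, H))   # H = V Lambda V*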
(c) The SVD for complex matrices says that any rectangular (potentially complex) matrix $A$ can be written as $A = \sum_i \sigma_i\vec{u}_i\vec{v}_i^*$ where the $\sigma_i$ are real positive numbers and the collection $\{\vec{u}_i\}$ is orthonormal (but potentially complex), as is $\{\vec{v}_i\}$.
Adapt the derivation of the SVD from the real case to prove this theorem.
Feel free to assume that A is wide. (Since you can just conjugate-transpose everything to get a tall
matrix to become wide.)
(HINT: Analogously to before, you’re going to have to show that the matrix A∗ A is Hermitian and that
it has non-negative eigenvalues. Use the previous part. There is a reason that this part comes after the
previous parts.)
Solution:
Consider $H = A^* A$. Clearly $H^* = (A^* A)^* = A^* (A^*)^* = A^* A = H$ and so $H$ is Hermitian. From the previous part, it can be written as $H = V\Lambda V^*$ where the columns of $V$ are orthonormal eigenvectors and the $\lambda_i$ are the real eigenvalues. Notice that $\vec{v}_i^* H\vec{v}_i = \vec{v}_i^*\lambda_i\vec{v}_i = \lambda_i$, but also that $\vec{v}_i^* H\vec{v}_i = \vec{v}_i^* A^* A\vec{v}_i = (A\vec{v}_i)^*(A\vec{v}_i) = \|A\vec{v}_i\|^2 \geq 0$. This means that $\lambda_i \geq 0$ and so we can just ask for them to be sorted in descending order, since we are free to rearrange eigenvalues to whatever order we want as long as we sort the corresponding eigenvectors to maintain the correspondence.

Let $\sigma_i = \sqrt{\lambda_i}$ for all the nonzero $\lambda_i$. Because $\lambda_i > 0$, this means that $\sigma_i > 0$ as well. Suppose that there are $r$ of them. Clearly, for all the others $\vec{v}_k$ with $k > r$, we know by the fact that we sorted the non-negative eigenvalues in descending order that $H\vec{v}_k = \vec{0}$. But this means that $\vec{v}_k^* H\vec{v}_k = 0$ and hence $\|A\vec{v}_k\|^2 = (A\vec{v}_k)^*(A\vec{v}_k) = 0$. This means that $A\vec{v}_k = \vec{0}$ as well, since the only vector with zero length is the zero vector. That means that all these $\vec{v}_k$ lie in the nullspace of $A$.
Now, we can define $\vec{u}_i = \frac{1}{\sigma_i}A\vec{v}_i$ for $i = 1, \ldots, r$. Computing the norm,
$$\|\vec{u}_i\| = \sqrt{\vec{u}_i^*\vec{u}_i} = \sqrt{\frac{\vec{v}_i^* A^* A\vec{v}_i}{\sigma_i^2}} = \sqrt{\frac{\vec{v}_i^* H\vec{v}_i}{\lambda_i}} = \sqrt{\frac{\lambda_i\vec{v}_i^*\vec{v}_i}{\lambda_i}} = 1$$
so they're normalized. Meanwhile, for $k \neq i$, the inner products
$$\vec{u}_k^*\vec{u}_i = \frac{\vec{v}_k^* H\vec{v}_i}{\sigma_k\sigma_i} = \frac{\lambda_i\vec{v}_k^*\vec{v}_i}{\sigma_k\sigma_i} = 0$$
so these are also orthogonal and hence orthonormal.
All that remains is to show that $A = \sum_{i=1}^{r}\sigma_i\vec{u}_i\vec{v}_i^*$. To do this, consider any arbitrary vector $\vec{x}$. Because $V$ is an orthonormal basis by the previous part, we know that there exist coefficients so that $\vec{x} = \sum_j \alpha_j\vec{v}_j$.


Now we just compute:


$$\begin{aligned}
A\vec{x} &= A\sum_j \alpha_j\vec{v}_j & (31)\\
&= \Big(\sum_{j=1}^{r}\alpha_j A\vec{v}_j\Big) + \Big(\sum_{j>r}\alpha_j A\vec{v}_j\Big) & (32)\\
&= \Big(\sum_{j=1}^{r}\alpha_j A\vec{v}_j\Big) + \vec{0} & (33)\\
&= \sum_{j=1}^{r}\alpha_j\sigma_j\vec{u}_j & (34)
\end{aligned}$$
where the last line follows from the definition of $\vec{u}_i = \frac{1}{\sigma_i}A\vec{v}_i$. Now, compare what happened above with
$$\begin{aligned}
\Big(\sum_{i=1}^{r}\sigma_i\vec{u}_i\vec{v}_i^*\Big)\vec{x} &= \Big(\sum_{i=1}^{r}\sigma_i\vec{u}_i\vec{v}_i^*\Big)\sum_j\alpha_j\vec{v}_j & (35)\\
&= \sum_{i=1}^{r}\sigma_i\vec{u}_i\sum_j\alpha_j\vec{v}_i^*\vec{v}_j & (36)\\
&= \sum_{i=1}^{r}\sigma_i\vec{u}_i\alpha_i = \sum_{i=1}^{r}\alpha_i\sigma_i\vec{u}_i & (37)
\end{aligned}$$
where the second sum over $j$ vanished because $\vec{v}_i^*\vec{v}_j = 0$ for $i \neq j$, since the columns of $V$ are orthonormal. This means that only the $j = i$ term survives in that sum.
Because the two are the same no matter what ~x is, the two matrices must be the same — since a matrix
is defined by what it does to vectors. If two matrices always do the same thing to every vector, they
must be the same matrix.
This establishes the SVD in the context of complex matrices.
In all of these proofs, the proof was identical to the real case except we had to use complex conjugate
transposes instead of transposes.
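As a numerical illustration of the complex SVD statement in part (c) (illustrative only, with a randomly generated matrix), the rank-one sum $\sum_i \sigma_i\vec{u}_i\vec{v}_i^*$ really does rebuild $A$:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))   # a wide complex matrix

U, s, Vh = np.linalg.svd(A, full_matrices=False)       # compact SVD: A = U diag(s) V*
print(np.isrealobj(s) and np.all(s >= 0))              # singular values are real and non-negative

# Rebuild A as the sum of rank-one pieces sigma_i u_i v_i*; rows of Vh are already v_i*.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
print(np.allclose(A_rebuilt, A))                       # True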


4. Minimum Norm Variants (Optional)


In lecture and HW, you saw how to solve minimum norm problems in which we have a wide matrix A and
solve A~x = ~y such that ~x is a minimum norm solution: k~xk ≤ k~zk for all ~z such that A~z = ~y .
We also saw in the HW how we can solve some variants in which we were interested in minimizing the norm
kC~xk instead. You have solved the case where C is invertible and square or a tall matrix. This question asks
you about the case when C is a wide matrix. The key issue is that wide matrices have nontrivial nullspaces
— that means that there are “free” directions in which we can vary ~x while not having to pay anything. How
do we best take advantage of these “free” directions?
Parts (a-b) are connected; parts (c-d) are another group that can be done independently of (a-b); and parts
(e-g) are another group that can be started independently of either (a-b) or (c-d). If you get stuck, try
another group. During debugging, many TAs found it easier to start with parts (c-g) and come back to (a-b) at the end.
In parts (a) and (b) you will reduce the problem of minimizing $\|C\vec{x}\|$ to a problem which you solved a variant of: minimizing only the norm of a vector $\|\tilde{\vec{x}}_c\|$ with some constraint involving some freely chosen $\tilde{\vec{x}}_f$, without requiring direct consideration of the matrix $C$. With (a) and (b), we wish to give you another example of converting problems that are new into problems you have solved before, or problems that are proximal to ones you have solved before. In parts (c) and (d) we emphasize the skill of solving a problem in a nice setting to gain some insight before relaxing the nice properties in parts (e) to (g) for more generality.

(a) Given a wide matrix A (with m columns and n rows) and a wide matrix C (with m columns and r
rows), we want to solve:

$$\min_{\vec{x} \text{ such that } A\vec{x} = \vec{y}} \|C\vec{x}\| \tag{38}$$

As mentioned above, the key new issue is to isolate the "free" directions in which we can vary $\vec{x}$ so that they might be properly exploited. Consider the full SVD of $C = U\Sigma_C V^\top = \sum_{i=1}^{\ell}\sigma_{c,i}\vec{u}_i\vec{v}_i^\top$. Here, we write:
$$V = \begin{bmatrix} V_C \;\big|\; V_F \end{bmatrix}, \qquad V_C = \begin{bmatrix} | & | & & | \\ \vec{v}_1 & \vec{v}_2 & \cdots & \vec{v}_\ell \\ | & | & & | \end{bmatrix}, \qquad V_F = \begin{bmatrix} | & & | \\ \vec{v}_{\ell+1} & \cdots & \vec{v}_m \\ | & & | \end{bmatrix} \tag{39}$$

so that the columns of $V_C$ all correspond to singular values $\sigma_{c,i} > 0$ of $C$, and the columns of $V_F$ form an orthonormal basis for the nullspace of $C$.
Change variables in the problem to be in terms of $\tilde{\vec{x}} = \begin{bmatrix}\tilde{\vec{x}}_c \\ \tilde{\vec{x}}_f\end{bmatrix}$, where the $\ell$-dimensional $\tilde{\vec{x}}_c$ has $i$-th entry $\tilde{x}_{c,i} = \alpha_i\vec{v}_i^\top\vec{x}$, and the $(m-\ell)$-dimensional $\tilde{\vec{x}}_f$ has $i$-th entry $\tilde{x}_{f,i} = \vec{v}_{\ell+i}^\top\vec{x}$. In vector/matrix form,


 
$\tilde{\vec{x}}_f = V_F^\top\vec{x}$ and $\tilde{\vec{x}}_c = \begin{bmatrix}\alpha_1 & 0 & \cdots & 0\\ 0 & \alpha_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \alpha_\ell\end{bmatrix} V_C^\top\vec{x}$. Or directly:
$$\tilde{\vec{x}} = \begin{bmatrix}\tilde{\vec{x}}_c \\ \tilde{\vec{x}}_f\end{bmatrix} = \begin{bmatrix}\begin{bmatrix}\alpha_1 & 0 & \cdots & 0\\ 0 & \alpha_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \alpha_\ell\end{bmatrix} V_C^\top \\ V_F^\top\end{bmatrix}\vec{x}, \qquad \begin{bmatrix}\alpha_1 & 0 & \cdots & 0\\ 0 & \alpha_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \alpha_\ell\end{bmatrix} V_C^\top \in \mathbb{R}^{\ell\times m}, \quad V_F^\top \in \mathbb{R}^{(m-\ell)\times m}. \tag{40}$$
Express $\vec{x}$ in terms of $\tilde{\vec{x}}_f$ and $\tilde{\vec{x}}_c$. Assume the $\alpha_i \neq 0$ so the relevant matrix is invertible.
What is $\|C\vec{x}\|$ in terms of $\tilde{\vec{x}}_f$ and $\tilde{\vec{x}}_c$? Simplify as much as you can for full credit.
(HINT: If you get stuck on how to express $\vec{x}$ in terms of the new variables, think about the special case when $\ell = 1$ and $\alpha_1 = \frac{1}{2}$. How is this different from when $\alpha_1 = 1$? The SVD of $C$ might be useful when looking at $\|C\vec{x}\|$. A fact you may use without proof is that a vector $\vec{x}$ may be decomposed uniquely as $\vec{x} = \vec{x}_V + \vec{x}_{V^\perp}$, where $\vec{x}_V$ is in the span of a matrix $V$'s columns, and $\vec{x}_{V^\perp}$ is in the subspace of vectors orthogonal to $V$'s columns.)
Solution: Let us give a name to the matrix containing the $\alpha_i$:
$$B = \begin{bmatrix}\alpha_1 & 0 & \cdots & 0\\ 0 & \alpha_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \alpha_\ell\end{bmatrix} \tag{41}$$
Note that we can write $\tilde{\vec{x}}_c = BV_C^\top\vec{x}$ and $\tilde{\vec{x}}_f = V_F^\top\vec{x}$. Since the columns of $V_C$ and $V_F$ together form a basis for the space, it follows that we can write $\vec{x} = V_C\vec{w}_1 + V_F\vec{w}_2$, where $\vec{x}$ is expressed as a linear combination of the $\vec{v}_i$ with unknown weights $\vec{w}_1$ and $\vec{w}_2$. Then, by premultiplying $\vec{x}$ by $V_C^\top$ and $V_F^\top$, we have by orthonormality of the columns of $V_C$ and $V_F$:
$$\begin{aligned}
V_C^\top\vec{x} &= V_C^\top(V_C\vec{w}_1 + V_F\vec{w}_2) & V_F^\top\vec{x} &= V_F^\top(V_C\vec{w}_1 + V_F\vec{w}_2)\\
B^{-1}\tilde{\vec{x}}_c &= I\vec{w}_1 + 0\vec{w}_2 & \tilde{\vec{x}}_f &= 0\vec{w}_1 + I\vec{w}_2\\
B^{-1}\tilde{\vec{x}}_c &= \vec{w}_1 & \tilde{\vec{x}}_f &= \vec{w}_2
\end{aligned} \tag{42}$$
We can thus write $\vec{x}$ in terms of $\tilde{\vec{x}}_c$ and $\tilde{\vec{x}}_f$:
$$\vec{x} = V_C B^{-1}\tilde{\vec{x}}_c + V_F\tilde{\vec{x}}_f = \begin{bmatrix}V_C B^{-1} & V_F\end{bmatrix}\begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} \tag{43}$$

Another approach can proceed from block matrix reasoning:
$$\tilde{\vec{x}} = \begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = \begin{bmatrix}BV_C^\top\\ V_F^\top\end{bmatrix}\vec{x} = \begin{bmatrix}B & 0\\ 0 & I\end{bmatrix}\begin{bmatrix}V_C^\top\\ V_F^\top\end{bmatrix}\vec{x} \tag{44}$$
$$\begin{bmatrix}B & 0\\ 0 & I\end{bmatrix}^{-1}\begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = V^\top\vec{x} \tag{45}$$
Since $B$ is diagonal, the block matrix containing $B$ in the previous line is also diagonal, so its inverse has the reciprocals of all the diagonal entries, which yields $B^{-1}$ as a submatrix. We can also premultiply by $V$ to cancel the $V^\top$:
$$\vec{x} = V\begin{bmatrix}B^{-1} & 0\\ 0 & I\end{bmatrix}\begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = V\begin{bmatrix}B^{-1}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = \begin{bmatrix}V_C & V_F\end{bmatrix}\begin{bmatrix}B^{-1}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = V_C B^{-1}\tilde{\vec{x}}_c + V_F\tilde{\vec{x}}_f \tag{46}$$

The above level of detail and both approaches were not required and are included for your understanding. Mark your approach appropriately for identifying the relationship between $\vec{x}$ and $\tilde{\vec{x}}$.
We can now compute $\|C\vec{x}\|^2$ in terms of $\tilde{\vec{x}}_c$ and $\tilde{\vec{x}}_f$. For brevity, define the following diagonal $\ell\times\ell$ matrix:
$$\Sigma_{C,\ell} = \begin{bmatrix}\sigma_{c,1} & & \\ & \ddots & \\ & & \sigma_{c,\ell}\end{bmatrix} \tag{47}$$
$$\begin{aligned}
\|C\vec{x}\|^2 &= \vec{x}^\top C^\top C\vec{x}\\
&= \vec{x}^\top V\Sigma_C^\top U^\top U\Sigma_C V^\top\vec{x}\\
&= (V_C B^{-1}\tilde{\vec{x}}_c + V_F\tilde{\vec{x}}_f)^\top V\Sigma_C^\top\Sigma_C V^\top(V_C B^{-1}\tilde{\vec{x}}_c + V_F\tilde{\vec{x}}_f)\\
&= \begin{bmatrix}B^{-1}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix}^\top V^\top V\begin{bmatrix}\Sigma_{C,\ell} & 0_{\ell\times(m-\ell)}\\ 0_{(n-\ell)\times\ell} & 0_{(n-\ell)\times(m-\ell)}\end{bmatrix}^\top\begin{bmatrix}\Sigma_{C,\ell} & 0_{\ell\times(m-\ell)}\\ 0_{(n-\ell)\times\ell} & 0_{(n-\ell)\times(m-\ell)}\end{bmatrix}V^\top V\begin{bmatrix}B^{-1}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix}\\
&= \begin{bmatrix}\tilde{\vec{x}}_c^\top B^{-1} & \tilde{\vec{x}}_f^\top\end{bmatrix}\begin{bmatrix}\Sigma_{C,\ell}^2 & 0_{\ell\times(m-\ell)}\\ 0_{(m-\ell)\times\ell} & 0_{(m-\ell)\times(m-\ell)}\end{bmatrix}\begin{bmatrix}B^{-1}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix}\\
&= \tilde{\vec{x}}_c^\top B^{-1}\begin{bmatrix}\sigma_{c,1}^2 & 0 & \cdots & 0\\ 0 & \sigma_{c,2}^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_{c,\ell}^2\end{bmatrix}B^{-1}\tilde{\vec{x}}_c + \tilde{\vec{x}}_f^\top\begin{bmatrix}0 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & 0\end{bmatrix}\tilde{\vec{x}}_f\\
&= \tilde{\vec{x}}_c^\top\begin{bmatrix}\frac{\sigma_{c,1}^2}{\alpha_1^2} & 0 & \cdots & 0\\ 0 & \frac{\sigma_{c,2}^2}{\alpha_2^2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \frac{\sigma_{c,\ell}^2}{\alpha_\ell^2}\end{bmatrix}\tilde{\vec{x}}_c\\
&= \sum_{i=1}^{\ell}(\tilde{x}_{c,i})^2\cdot\frac{\sigma_{c,i}^2}{\alpha_i^2}.
\end{aligned} \tag{48}$$
Above, we relied crucially on the fact that $V_C^\top V_F$ gives zeros, while $V_C^\top V_C$ gives an identity and so does $V_F^\top V_F$. This is what allows us to effectively break the huge $\Sigma_C^\top\Sigma_C$ matrix into two relevant diagonal pieces: one containing the diagonal square matrix with the squared nonzero singular values and the other filled with zeros. Hence, the norm $\|C\vec{x}\|$ is
$$\|C\vec{x}\| = \sqrt{\sum_{i=1}^{\ell}(\tilde{x}_{c,i})^2\cdot\frac{\sigma_{c,i}^2}{\alpha_i^2}} = \|B^{-1}\Sigma_{C,\ell}\tilde{\vec{x}}_c\| = \|\Sigma_{C,\ell}B^{-1}\tilde{\vec{x}}_c\|. \tag{49}$$

(b) Continuing the previous part, give appropriate values for the $\alpha_i$ so that the problem (38) becomes
$$\min_{\tilde{\vec{x}} = \begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} \text{ such that } \begin{bmatrix}A_C \,|\, A_F\end{bmatrix}\begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = \vec{y}} \|\tilde{\vec{x}}_c\| \tag{50}$$
Give explicit expressions for $A_C$ and $A_F$ in terms of the original $A$ and terms arising from the SVD of $C$. Because you have picked values for the $\alpha_i$, there should be no $\alpha_i$ in your final expressions for full credit.
(HINT: How do the singular values $\sigma_{c,i}$ interact with the $\alpha_i$? Then apply the appropriate substitution to (38) to get (50).)
Solution: From the equation we derived in the previous part's solution, we see that in order to have $\|C\vec{x}\| = \|\tilde{\vec{x}}_c\|$ we should choose $\alpha_1 = \sigma_{c,1}, \alpha_2 = \sigma_{c,2}, \ldots, \alpha_\ell = \sigma_{c,\ell}$. That is, $B = \Sigma_{C,\ell}$. Then, by substituting
$$\vec{x} = \begin{bmatrix}V_C B^{-1} & V_F\end{bmatrix}\begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} \tag{51}$$
which we derived in the previous part into $A\vec{x} = \vec{y}$, we get:
$$A\vec{x} = A\begin{bmatrix}V_C B^{-1} & V_F\end{bmatrix}\begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = \begin{bmatrix}AV_C B^{-1} & AV_F\end{bmatrix}\begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = \begin{bmatrix}A_C & A_F\end{bmatrix}\begin{bmatrix}\tilde{\vec{x}}_c\\ \tilde{\vec{x}}_f\end{bmatrix} = \vec{y}. \tag{52}$$
By matching quantities, $A_C = AV_C\Sigma_{C,\ell}^{-1} = AV_C\begin{bmatrix}\frac{1}{\sigma_{c,1}} & 0 & \cdots & 0\\ 0 & \frac{1}{\sigma_{c,2}} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \frac{1}{\sigma_{c,\ell}}\end{bmatrix}$ and $A_F = AV_F$.
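A small numerical sketch of the change of variables in parts (a) and (b) (illustrative only; $A$, $C$, and the test vector are randomly generated): with $\alpha_i = \sigma_{c,i}$ we indeed get $\|C\vec{x}\| = \|\tilde{\vec{x}}_c\|$ while the constraint is expressed through $A_C$ and $A_F$.

import numpy as np

rng = np.random.default_rng(0)
n, m, r = 3, 6, 2
A = rng.standard_normal((n, m))                        # wide constraint matrix
C = rng.standard_normal((r, m))                        # wide cost matrix, rank r < m
x = rng.standard_normal(m)                             # an arbitrary test vector

# Full SVD of C: V_C spans the costly directions, V_F spans the nullspace of C.
U, s, Vt = np.linalg.svd(C)
ell = np.sum(s > 1e-10)
V_C, V_F = Vt[:ell].T, Vt[ell:].T
Sig = np.diag(s[:ell])

A_C = A @ V_C @ np.linalg.inv(Sig)                     # A_C = A V_C Sigma^{-1}
A_F = A @ V_F                                          # A_F = A V_F

xc_t = Sig @ V_C.T @ x                                 # tilde x_c with alpha_i = sigma_{c,i}
xf_t = V_F.T @ x                                       # tilde x_f
print(np.isclose(np.linalg.norm(C @ x), np.linalg.norm(xc_t)))   # ||C x|| = ||tilde x_c||
print(np.allclose(A @ x, A_C @ xc_t + A_F @ xf_t))               # constraint preserved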
(c) Let us focus on a simple case. (You can do this even if you didn't get the previous parts.) Suppose that $A = \begin{bmatrix}A_C \,|\, A_F\end{bmatrix}$ where the columns of $A_F$ are orthonormal, as well as orthogonal to the columns of $A_C$. The columns of $A$ together span the entire $n$-dimensional space. We directly write $\vec{x} = \begin{bmatrix}\vec{x}_c\\ \vec{x}_f\end{bmatrix}$ so that $A\vec{x} = A_C\vec{x}_c + A_F\vec{x}_f$. Now suppose that we want to solve $A\vec{x} = \vec{y}$ and only care about minimizing $\|\vec{x}_c\|$. We don't care about the length of $\vec{x}_f$; it can be as big or small as necessary. In other words, we want to solve:
$$\min_{\vec{x} = \begin{bmatrix}\vec{x}_c\\ \vec{x}_f\end{bmatrix} \text{ such that } \begin{bmatrix}A_C \,|\, A_F\end{bmatrix}\begin{bmatrix}\vec{x}_c\\ \vec{x}_f\end{bmatrix} = \vec{y}} \|\vec{x}_c\| \tag{53}$$
Show that the optimal solution has $\vec{x}_f = A_F^\top\vec{y}$.
(HINT: Multiplying both sides of something by $A_F^\top$ might be helpful.)


Solution: The optimal solution of the minimization satisfies what all solutions, $\vec{x}_{\text{sol}}$, of the constraint equation must satisfy:
$$\vec{y} = A\vec{x}_{\text{sol}} = A_C\vec{x}_c + A_F\vec{x}_f$$
and hence we can multiply both sides by $A_F^\top$ on the left to get
$$A_F^\top\vec{y} = A_F^\top A_C\vec{x}_c + A_F^\top A_F\vec{x}_f = \vec{0} + I\cdot\vec{x}_f = \vec{x}_f.$$

(d) Continuing the previous part, compute the optimal $\vec{x}_c$. Show your work.
(HINT: What is the work that $\vec{x}_c$ needs to do? $\vec{y} - A_F A_F^\top\vec{y}$ might play a useful role, as will the SVD of $A_C = \sum_i\sigma_i\vec{t}_i\vec{w}_i^\top$.)
Solution: From the above question, we have
$$\vec{y} = A_C\vec{x}_c + A_F\vec{x}_f = A_C\vec{x}_c + A_F A_F^\top\vec{y}$$
which can be rewritten as
$$A_C\vec{x}_c = (I - A_F A_F^\top)\vec{y}$$
and since we want to minimize $\|\vec{x}_c\|$, the solution will be
$$\widehat{\vec{x}}_c = A_C^\dagger(I - A_F A_F^\top)\vec{y},$$
where recall that $A_C^\dagger$ is the Moore-Penrose pseudoinverse of $A_C$. To calculate the pseudoinverse, let
$$A_C = \sum_i\sigma_i\vec{t}_i\vec{w}_i^\top = T\Sigma W^\top.$$
Here we are using the compact SVD of $A_C$ where $\Sigma$ is the square matrix of nonzero singular values, with $T$ and $W$ as non-square matrices with the corresponding left and right singular vectors. Then, the pseudoinverse of $A_C$ is given by
$$A_C^\dagger = W\Sigma^{-1}T^\top.$$
Finally, the solution we want is
$$\widehat{\vec{x}}_c = W\Sigma^{-1}T^\top(I - A_F A_F^\top)\vec{y} = \sum_i\vec{w}_i\frac{1}{\sigma_i}\vec{t}_i^\top(I - A_F A_F^\top)\vec{y}. \tag{54}$$
It is interesting to ask whether this special case can be further simplified. And indeed, it can be. Notice that the columns $\vec{t}_i$ span the columns of $A_C$, and in this special case we have said that the columns of $A_C$ are orthogonal to the columns of $A_F$. This means that $\vec{t}_i^\top A_F = \vec{0}^\top$. Consequently,
$$\widehat{\vec{x}}_c = \sum_i\vec{w}_i\frac{1}{\sigma_i}\vec{t}_i^\top(I - A_F A_F^\top)\vec{y} = \sum_i\vec{w}_i\frac{1}{\sigma_i}\vec{t}_i^\top\vec{y} = W\Sigma^{-1}T^\top\vec{y} = A_C^\dagger\vec{y}$$
This is also a fully correct answer for this special case. The pseudoinverse for a matrix that doesn't have full column rank automatically contains an implicit projection-type aspect to it. This is something that you might have considered in the context of the MIMO HW problem's exploration of the connection between least-squares and minimum-norm solutions.
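A numerical sketch of parts (c) and (d) (illustrative only; the matrices are randomly generated so that the columns of $A_C$ are orthogonal to the orthonormal columns of $A_F$ and together span the whole space):

import numpy as np

rng = np.random.default_rng(0)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))       # an orthonormal basis of R^n
A_F = Q[:, :2]                                         # orthonormal columns
A_C = Q[:, 2:] @ rng.standard_normal((4, 5))           # columns orthogonal to A_F, spanning the rest
y = rng.standard_normal(n)

x_f = A_F.T @ y                                              # part (c)
x_c = np.linalg.pinv(A_C) @ (np.eye(n) - A_F @ A_F.T) @ y    # part (d), minimum-norm x_c
print(np.allclose(A_C @ x_c + A_F @ x_f, y))                 # the constraint A x = y holds
print(np.allclose(x_c, np.linalg.pinv(A_C) @ y))             # the simplification to A_C^dagger y also works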
(e) Now suppose that AC did not necessarily have its columns orthogonal to AF . Continue to assume that
AF has orthonormal columns. (You can do this part even if you didn’t get any of the previous parts.)


Write the matrix AC = AC⊥ + ACF where the columns of ACF are all in the column span of AF and
the columns of AC⊥ are all orthogonal to the columns of AF . Give an expression for ACF in terms
of AC and AF .
(HINT: What does this have to do with projection and least squares?)
Solution: We can see that $A_{CF}$ is the projection of $A_C$ onto $A_F$, and hence we can directly get $A_{CF} = A_F A_F^\top A_C$ since $A_F$ has orthonormal columns. This is a projection onto a subspace (done a whole matrix at a time), which we used when we did system identification using least squares.
To expand the argument, we can also do it the following way. Note that
$$A_F^\top A_C = A_F^\top A_{C\perp} + A_F^\top A_{CF} = A_F^\top A_{CF}.$$
Since $A_{CF}$ is in the span of $A_F$, we can write $A_{CF} = A_F W$ for some $W$. Then,
$$A_F^\top A_C = A_F^\top A_F W \implies W = (A_F^\top A_F)^{-1}A_F^\top A_C = A_F^\top A_C.$$
In this case, we didn't need to write out the SVD of $A_F$ to get this inverse term. This is because we assumed $A_F$ is orthonormal and hence $A_F^\top A_F = I$ is known.
Finally, recall that we defined $A_{CF} = A_F W$. Then, we conclude
$$A_{CF} = A_F A_F^\top A_C.$$

(f) Continuing the previous part, compute the optimal $\vec{x}_c$ that solves (53): (copied below)
$$\min_{\vec{x} = \begin{bmatrix}\vec{x}_c\\ \vec{x}_f\end{bmatrix} \text{ such that } \begin{bmatrix}A_C \,|\, A_F\end{bmatrix}\begin{bmatrix}\vec{x}_c\\ \vec{x}_f\end{bmatrix} = \vec{y}} \|\vec{x}_c\|$$

Show your work. Feel free to call the SVD as a black box as a part of your computation.
(HINT: What is the work that $\vec{x}_c$ needs to do? The SVD of $A_{C\perp}$ might be useful.)
Solution: Note that $\vec{y}$ can be written as $\vec{y}_{c\perp} + \vec{y}_f$ such that $\vec{y}_{c\perp}$ is in the span of $A_{C\perp}$, and $\vec{y}_f$ is in the span of $A_F$. By projection, we have $\vec{y}_f = A_F A_F^\top\vec{y}$ and $\vec{y}_{c\perp} = \vec{y} - A_F A_F^\top\vec{y} = (I - A_F A_F^\top)\vec{y}$. On the other hand, the amount $\vec{x}_c$ contributes to $\vec{y}_{c\perp}$ is $A_{C\perp}\vec{x}_c$. Hence, all solutions must satisfy:
$$A_{C\perp}\vec{x}_c = \left(I - A_F A_F^\top\right)\vec{y}.$$

Notice that we couldn't have just used $A_C$ here instead of $A_{C\perp}$, because we don't really know what $A_C\vec{x}_c$ must be: it is allowed to have some components in the subspace spanned by $A_F$. However, $A_{C\perp}\vec{x}_c$ cannot have any components in that direction by construction, since every column in $A_{C\perp}$ is orthogonal to the entire subspace spanned by $A_F$.
Anyway, because we want to minimize $\|\vec{x}_c\|$, we solve this as
$$\vec{x}_c = A_{C\perp}^\dagger\left(I - A_F A_F^\top\right)\vec{y}.$$
To calculate this pseudoinverse, let the compact SVD of $A_{C\perp}$ be:
$$A_{C\perp} = U_{C\perp}\Sigma_{C\perp}V_{C\perp}^\top.$$
Then, the pseudoinverse of $A_{C\perp}$ is given by
$$A_{C\perp}^\dagger = V_{C\perp}\Sigma_{C\perp}^{-1}U_{C\perp}^\top.$$
Hence, the solution we want is
$$\vec{x}_c = V_{C\perp}\Sigma_{C\perp}^{-1}U_{C\perp}^\top\left(I - A_F A_F^\top\right)\vec{y}.$$
The insight that carries over from the simplification made in parts (c) and (d) is the use of the pseudoinverse applied to the non-$A_F$ directions of $\vec{y}$ when computing the minimized vector $\vec{x}_c$. The only difference is that we project $A_C$ to $A_{C\perp} = A_C - A_{CF} = (I - A_F A_F^\top)A_C$ so that $\vec{x}_c$ only handles directions that interact with $A_{C\perp}$, ensuring non-redundancy with what $\vec{x}_f$ gives us.
Once again, we can notice that because the columns of $A_{C\perp}$ are orthogonal to the columns of $A_F$ by construction, this also carries over to the relevant columns of $U_{C\perp}$ (the ones that correspond to nonzero singular values). Consequently, the above can be simplified further to
$$\vec{x}_c = V_{C\perp}\Sigma_{C\perp}^{-1}U_{C\perp}^\top\vec{y}$$
by again implicitly leveraging the connection between minimum-norm solutions and least squares. We didn't expect any students to necessarily notice this fact that the pseudoinverse of $A_{C\perp}$ effectively can do all the work itself.
(g) Continuing the previous part, compute the optimal ~xf . Show your work.
You can use the optimal ~xc in your expression just assuming that you did the previous part correctly,
even if you didn’t. You can also assume a decomposition AC = AC⊥ + ACF from further above in
part (e) without having to write what these are, just assume that you did them correctly, even if you
didn’t do them at all.
(HINT: What is the work that $\vec{x}_f$ needs to do? How is $A_{CF}$ relevant here?)
Solution: We have
$$A_C\vec{x}_c + A_F\vec{x}_f = \vec{y}. \tag{55}$$
The simplest approach is just to collect the terms and solve for $\vec{x}_f$ by noticing $A_F\vec{x}_f = \vec{y} - A_C\vec{x}_c$ and so $\vec{x}_f = A_F^\top\vec{y} - A_F^\top A_C\vec{x}_c$. You could just stop here for full credit.
Replacing $A_C = A_{C\perp} + A_{CF}$ and then using $A_{CF} = A_F A_F^\top A_C$ as derived in part (e), we have
$$A_{C\perp}\vec{x}_c + A_F A_F^\top A_C\vec{x}_c + A_F\vec{x}_f = \vec{y}. \tag{56}$$
First, we multiply both sides by $A_F^\top$ on the left and, recalling $A_F^\top A_{C\perp} = 0$, we get
$$A_F^\top A_C\vec{x}_c + \vec{x}_f = A_F^\top\vec{y}.$$
This again implies that
$$\vec{x}_f = A_F^\top\vec{y} - A_F^\top A_C\vec{x}_c. \tag{57}$$
So doing this substitution doesn't really buy us anything more.


Although you have worked out the problem here for the case of AF having orthonormal columns, you
should hopefully see that you actually could use what you know to solve the fully general case.
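For completeness, a numerical sketch of parts (e) through (g) (illustrative only; $A_C$ is randomly generated and not orthogonal to $A_F$), checking both the formulas above and the simplification noted in part (f):

import numpy as np

rng = np.random.default_rng(0)
n = 6
A_F = np.linalg.qr(rng.standard_normal((n, 2)))[0]     # orthonormal columns
A_C = rng.standard_normal((n, 5))                      # generic, NOT orthogonal to A_F
y = rng.standard_normal(n)

P_F = A_F @ A_F.T                                      # projection onto col(A_F)
A_Cperp = (np.eye(n) - P_F) @ A_C                      # part (e): A_C = A_Cperp + A_CF
x_c = np.linalg.pinv(A_Cperp) @ (np.eye(n) - P_F) @ y  # part (f)
x_f = A_F.T @ y - A_F.T @ A_C @ x_c                    # part (g)
print(np.allclose(A_C @ x_c + A_F @ x_f, y))           # the constraint holds
print(np.allclose(x_c, np.linalg.pinv(A_Cperp) @ y))   # the further simplification also works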


5. Remote Final Accommodation


If you are a student who will be outside of the 150-mile radius of UC Berkeley, or if you are a student who has a University-approved accommodation to being in-person, such as having an exemption to the vaccine mandate, please fill out this Google form if you haven't already to request remote accommodations for the final exam, which will take place on Friday, December 17, from 8 to 11 am PT.

6. Study Group Feedback Survey


If you are a student who participated in the study group survey we gave in the early weeks of the semester, we would really appreciate your feedback on the group you were matched with. Though we had a section devoted to it in the mid-semester feedback form, that section was anonymized, so it cannot serve as good data for the research team. Please fill out this Google form to provide feedback about the study groups.

7. End of Semester Survey (Updated)


Congratulations on making it to the last week of the semester! We would love to get your feedback about
the class. Please fill out this Google form. We will be awarding 1 global extra credit point to every student
in the class if we hit 70% end-of-semester survey completion by each individual grade category in this class
by Wednesday, December 15, at 11:59 PM. In other words, we want at least 70% of freshmen, 70% of
sophomores, 70% of juniors, and 70% of seniors in this class to complete the survey in order for everyone
to get extra credit. To improve our ability to zoom in on helping students who are struggling, we are asking
for students to voluntarily disclose their identities while filling out the surveys (unlike the campus course
surveys, we do not store your identities with survey responses). Such disclosures are purely voluntary, and
our only purpose in asking for them is to be able to zoom in on the much smaller subpopulation that is
performing poorly in the course and see what might be different for them.


8. Write Your Own Question And Provide a Thorough Solution.


Writing your own problems is a very important way to really learn material. The famous “Bloom’s Tax-
onomy” that lists the levels of learning (from the bottom up) is: Remember, Understand, Apply, Analyze,
Evaluate, and Create. Using what you know to create is the top level. We rarely ask you any homework
questions about the lowest level of straight-up remembering, expecting you to be able to do that yourself
(e.g. making flashcards). But we don’t want the same to be true about the highest level. As a practical mat-
ter, having some practice at trying to create problems helps you study for exams much better than simply
counting on solving existing practice problems. This is because thinking about how to create an interesting
problem forces you to really look at the material from the perspective of those who are going to create the
exams. Besides, this is fun. If you want to make a boring problem, go ahead. That is your prerogative. But
it is more fun to really engage with the material, discover something interesting, and then come up with a
problem that walks others down a journey that lets them share your discovery. You don’t have to achieve
this every week. But unless you try every week, it probably won’t ever happen.
You need to write your own question and provide a thorough solution to it. The scope of your question
should roughly overlap with the scope of this entire problem set. This is because we want you to exercise
your understanding of this material, and not earlier material in the course. However, feel free to combine
material here with earlier material, and clearly, you don’t have to engage with everything all at once. A
problem that just hits one aspect is also fine.
Note: One of the easiest ways to make your own problem is to modify an existing one. Ordinarily, we
do not ask you to cite official course materials themselves as you solve problems. This is an exception.
Because the problem making process involves creative inputs, you should be citing those here. It is a part of
professionalism to give appropriate attribution.
Just FYI: Another easy way to make your own question is to create a Jupyter part for a problem that had no
Jupyter part given, or to add additional Jupyter parts to an existing problem with Jupyter parts. This often
helps you learn, especially in case you have a programming bent.


9. Homework Process and Study Group


Citing sources and collaborators are an important part of life, including being a student!
We also want to understand what resources you find helpful and how much time homework is taking, so we
can change things in the future if possible.

(a) What sources (if any) did you use as you worked through the homework?
(b) If you worked with someone on this homework, who did you work with?
List names and student ID’s. (In case of homework party, you can also just describe the group.)
(c) Roughly how many total hours did you work on this homework? Write it down here where you’ll
need to remember it for the self-grade form.

Contributors:

• Ayan Biswas.

• Daniel Abraham.

• Anant Sahai.

• Kuan-Yun Lee.

• Divija Hasteer.
