Stephen Boyd

**Introduction to Linear Dynamical Systems**

Autumn 2010-11

Copyright Stephen Boyd. Limited copying or use for educational purposes is fine, but please acknowledge source, e.g., “taken from Lecture Notes for EE263, Stephen Boyd, Stanford 2010.”

Contents

Lecture 1 – Overview
Lecture 2 – Linear functions and examples
Lecture 3 – Linear algebra review
Lecture 4 – Orthonormal sets of vectors and QR factorization
Lecture 5 – Least-squares
Lecture 6 – Least-squares applications
Lecture 7 – Regularized least-squares and Gauss-Newton method
Lecture 8 – Least-norm solutions of underdetermined equations
Lecture 9 – Autonomous linear dynamical systems
Lecture 10 – Solution via Laplace transform and matrix exponential
Lecture 11 – Eigenvectors and diagonalization
Lecture 12 – Jordan canonical form
Lecture 13 – Linear dynamical systems with inputs and outputs
Lecture 14 – Example: Aircraft dynamics
Lecture 15 – Symmetric matrices, quadratic forms, matrix norm, and SVD
Lecture 16 – SVD applications
Lecture 17 – Example: Quantum mechanics
Lecture 18 – Controllability and state transfer
Lecture 19 – Observability and state estimation
Lecture 20 – Some final comments

Basic notation
Matrix primer
Crimes against matrices
Least-squares and least-norm solutions using Matlab
Solving general linear equations using Matlab
Low rank approximation and extremal gain problems
Exercises

EE263 Autumn 2010-11

Stephen Boyd

Lecture 1 Overview

• course mechanics
• outline & topics
• what is a linear dynamical system?
• why study linear systems?
• some examples

1–1

Course mechanics

• all class info, lectures, homeworks, announcements on class web page: www.stanford.edu/class/ee263

course requirements:

• weekly homework
• takehome midterm exam (date TBD)
• takehome final exam (date TBD)

Overview

1–2

Prerequisites

• exposure to linear algebra (e.g., Math 103)
• exposure to Laplace transform, differential equations

not needed, but might increase appreciation:

• control systems
• circuits & systems
• dynamics


Major topics & outline

• linear algebra & applications
• autonomous linear dynamical systems
• linear dynamical systems with inputs & outputs
• basic quadratic control & estimation


**Linear dynamical system**

continuous-time linear dynamical system (CT LDS) has the form

dx/dt = A(t)x(t) + B(t)u(t),    y(t) = C(t)x(t) + D(t)u(t)

where:

• t ∈ R denotes time
• x(t) ∈ R^n is the state (vector)
• u(t) ∈ R^m is the input or control
• y(t) ∈ R^p is the output
• A(t) ∈ R^{n×n} is the dynamics matrix
• B(t) ∈ R^{n×m} is the input matrix
• C(t) ∈ R^{p×n} is the output or sensor matrix
• D(t) ∈ R^{p×m} is the feedthrough matrix

for lighter appearance, equations are often written

ẋ = Ax + Bu,    y = Cx + Du

• CT LDS is a first order vector differential equation
• also called state equations, or ‘m-input, n-state, p-output’ LDS
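As a quick numerical sketch (not part of the notes), the state equation can be integrated approximately by forward Euler; the 1-state system below is a made-up sanity check whose exact solution is known.

```python
import numpy as np

def simulate_ct_lds(A, B, u, x0, T, dt=1e-3):
    """Forward-Euler approximation of dx/dt = A x + B u(t)."""
    x = np.asarray(x0, dtype=float)
    for k in range(int(round(T / dt))):
        # each step: x(t+dt) ≈ x(t) + dt * (A x(t) + B u(t))
        x = x + dt * (A @ x + B @ u(k * dt))
    return x

# made-up 1-state autonomous system: dx/dt = -x, x(0) = 1,
# whose exact solution is x(t) = e^{-t}
A = np.array([[-1.0]])
B = np.zeros((1, 1))
x1 = simulate_ct_lds(A, B, lambda t: np.zeros(1), [1.0], T=1.0)
```

With dt = 1e-3 the result at T = 1 should land close to e^{-1} ≈ 0.368.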


Some LDS terminology

• most linear systems encountered are time-invariant: A, B, C, D are constant, i.e., don’t depend on t
• when there is no input u (hence, no B or D) system is called autonomous
• very often there is no feedthrough, i.e., D = 0
• when u(t) and y(t) are scalar, system is called single-input, single-output (SISO); when input & output signal dimensions are more than one, MIMO


Discrete-time linear dynamical system

discrete-time linear dynamical system (DT LDS) has the form

x(t + 1) = A(t)x(t) + B(t)u(t),    y(t) = C(t)x(t) + D(t)u(t)

where

• t ∈ Z = {0, ±1, ±2, . . .}
• (vector) signals x, u, y are sequences

DT LDS is a first order vector recursion
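Since the DT LDS is just a recursion, it can be run directly; the scalar system below is a made-up stand-in used only for illustration.

```python
import numpy as np

def simulate_dt_lds(A, B, C, D, u_seq, x0):
    """Run x(t+1) = A x(t) + B u(t), y(t) = C x(t) + D u(t)."""
    x = np.asarray(x0, dtype=float)
    ys = []
    for u in u_seq:
        ys.append(C @ x + D @ u)   # output at time t
        x = A @ x + B @ u          # state update to time t+1
    return np.array(ys), x

# toy SISO system: x(t+1) = 0.5 x(t) + u(t), y(t) = x(t), driven by u = 1
A = np.array([[0.5]]); B = np.array([[1.0]])
C = np.array([[1.0]]); D = np.array([[0.0]])
ys, x_final = simulate_dt_lds(A, B, C, D, [np.array([1.0])] * 3, [0.0])
```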


Why study linear systems?

applications arise in many areas, e.g.

• automatic control systems
• signal processing
• communications
• economics, finance
• circuit analysis, simulation, design
• mechanical and civil engineering
• aeronautics
• navigation, guidance


Usefulness of LDS

• depends on availability of computing power, which is large & increasing exponentially
• used for
  – analysis & design
  – implementation, embedded in real-time systems
• like DSP, was a specialized topic & technology 30 years ago


Origins and history

• parts of LDS theory can be traced to 19th century
• builds on classical circuits & systems (1920s on) but with more emphasis on linear algebra
• first engineering application: aerospace, 1960s
• transitioned from specialized topic to ubiquitous in 1980s (just like digital signal processing, information theory, . . . )

Nonlinear dynamical systems

many dynamical systems are nonlinear (a fascinating topic) so why study linear systems?

• most techniques for nonlinear systems are based on linear methods
• methods for linear systems often work unreasonably well, in practice, for nonlinear systems
• if you don’t understand linear dynamical systems you certainly can’t understand nonlinear dynamical systems

Examples (ideas only, no details)

• let’s consider a specific system ẋ = Ax, y = Cx with x(t) ∈ R^16, y(t) ∈ R (a ‘16-state single-output system’)
• model of a lightly damped mechanical system, but it doesn’t matter

[plots: typical output y(t) over 0 ≤ t ≤ 350 and over 0 ≤ t ≤ 1000]

• output waveform is very complicated, looks almost random and unpredictable
• we’ll see that such a solution can be decomposed into much simpler (modal) components

[plots: the individual modal components of the output] (idea probably familiar from ‘poles’)

Input design

add two inputs, two outputs to system: ẋ = Ax + Bu, y = Cx, x(0) = 0, where B ∈ R^{16×2}, C ∈ R^{2×16} (same A as before)

problem: find appropriate u : R+ → R^2 so that y(t) → ydes = (1, −2)

simple approach: consider static conditions (u, x, y constant): ẋ = 0 = Ax + Bustatic, ydes = Cx

solve for u to get: ustatic = (−CA^{−1}B)^{−1} ydes = (−0.63, 0.36)

let’s apply u = ustatic and just wait for things to settle:

[plots: u1, u2, y1, y2 versus t] . . . takes about 1500 sec for y(t) to converge to ydes

using very clever input waveforms (EE263) we can do much better:

[plots: u1, u2, y1, y2 versus t] . . . here y converges exactly in 50 sec

in fact by using larger inputs we do still better:

[plots: u1, u2, y1, y2 versus t] . . . here we have (exact) convergence in 20 sec

in this course we’ll study

• how to synthesize or design such inputs
• the tradeoff between size of u and convergence time

Estimation / filtering

u → H(s) → w → A/D → y

• signal u is piecewise constant (period 1 sec)
• filtered by 2nd-order system H(s), step response s(t)
• A/D runs at 10Hz, with 3-bit quantizer

[plots: u(t), step response s(t), filtered signal w(t), quantized output y(t)]

problem: estimate original signal u, given quantized, filtered signal y

simple approach:

• ignore quantization
• design equalizer G(s) for H(s) (i.e., GH ≈ 1)
• approximate u as G(s)y

. . . yields terrible results

formulate as estimation problem (EE263) . . .

[plot: u(t) (solid) and û(t) (dotted)] RMS error 0.03, well below quantization error (!)
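The static-input formula can be checked numerically. The 16-state system from the example is not reproduced in these notes, so the sketch below substitutes a small random stable stand-in; only the formula ustatic = (−CA⁻¹B)⁻¹ ydes is taken from the slide.

```python
import numpy as np

# stand-in system (NOT the notes' 16-state example)
rng = np.random.default_rng(0)
n, m = 6, 2
A = rng.standard_normal((n, n)) - 3.0 * np.eye(n)  # eigenvalues shifted left: (very likely) stable
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
y_des = np.array([1.0, -2.0])

# static conditions 0 = A x + B u, y_des = C x  give  u_static = (-C A^{-1} B)^{-1} y_des
u_static = np.linalg.solve(-C @ np.linalg.solve(A, B), y_des)
x_eq = -np.linalg.solve(A, B @ u_static)           # corresponding equilibrium state
```

At the equilibrium, A x_eq + B u_static = 0 and C x_eq = ydes, which is exactly the static condition the slide solves.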

Lecture 2 Linear functions and examples

• linear equations and functions
• engineering examples
• interpretations

Linear equations

consider system of linear equations

y1 = a11x1 + a12x2 + · · · + a1nxn
y2 = a21x1 + a22x2 + · · · + a2nxn
. . .
ym = am1x1 + am2x2 + · · · + amnxn

can be written in matrix form as y = Ax, where y = (y1, . . . , ym) ∈ R^m, x = (x1, . . . , xn) ∈ R^n, and A ∈ R^{m×n} is the matrix of coefficients aij
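A small numeric check (toy numbers, not from the notes) that the componentwise sums and the matrix form y = Ax agree:

```python
import numpy as np

# a 2x3 example: y_i = a_i1 x_1 + a_i2 x_2 + a_i3 x_3
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

# componentwise sums ...
y_by_sums = np.array([sum(A[i, j] * x[j] for j in range(3)) for i in range(2)])
# ... agree with the matrix-vector product
y = A @ x
```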

Linear functions

a function f : R^n → R^m is linear if

• f(x + y) = f(x) + f(y), ∀x, y ∈ R^n
• f(αx) = αf(x), ∀x ∈ R^n ∀α ∈ R

i.e., superposition holds

[figure: parallelogram showing x, y, x + y and f(x), f(y), f(x + y)]

Matrix multiplication function

• consider function f : R^n → R^m given by f(x) = Ax, where A ∈ R^{m×n}
• matrix multiplication function f is linear
• converse is true: any linear function f : R^n → R^m can be written as f(x) = Ax for some A ∈ R^{m×n}
• representation via matrix multiplication is unique: for any linear function f there is only one matrix A for which f(x) = Ax for all x
• y = Ax is a concrete representation of a generic linear function

Interpretations of y = Ax

• y is measurement or observation; x is unknown to be determined
• x is ‘input’ or ‘action’; y is ‘output’ or ‘result’
• y = Ax defines a function or transformation that maps x ∈ R^n into y ∈ R^m

Interpretation of aij

yi = Σ_{j=1}^{n} aij xj

aij is gain factor from jth input (xj) to ith output (yi)

thus, e.g.,

• ith row of A concerns ith output
• jth column of A concerns jth input
• a27 = 0 means 2nd output (y2) doesn’t depend on 7th input (x7)
• |a31| ≫ |a3j| for j ≠ 1 means y3 depends mainly on x1

• |a52| ≫ |ai2| for i ≠ 5 means x2 affects mainly y5
• A is lower triangular, i.e., aij = 0 for i < j, means yi only depends on x1, . . . , xi
• A is diagonal, i.e., aij = 0 for i ≠ j, means ith output depends only on ith input

more generally, sparsity pattern of A, i.e., list of zero/nonzero entries of A, shows which xj affect which yi

Linear elastic structure

• xj is external force applied at some node, in some fixed direction
• yi is (small) deflection of some node, in some fixed direction

[figure: elastic structure with forces x1, . . . , x4 applied at nodes]

(provided x, y are small) we have y ≈ Ax

• A is called the compliance matrix
• aij gives deflection i per unit force at j (in m/N)

Total force/torque on rigid body

[figure: forces x1, . . . , x4 applied to a rigid body with center of gravity CG]

• xj is external force/torque applied at some point/direction/axis
• y ∈ R^6 is resulting total force & torque on body (y1, y2, y3 are x-, y-, z-components of total force; y4, y5, y6 are x-, y-, z-components of total torque)
• we have y = Ax
• A depends on geometry (of applied forces and torques with respect to center of gravity CG)
• jth column gives resulting force & torque for unit force/torque j

Linear static circuit

interconnection of resistors, linear dependent (controlled) sources, and independent sources

[figure: circuit with independent source x1, controlled source βib, circuit variables y1, y2, y3]

• xj is value of independent source j
• yi is some circuit variable (voltage, current)
• we have y = Ax
• if xj are currents and yi are voltages, A is called the impedance or resistance matrix

Final position/velocity of mass due to applied forces

• unit mass, zero position/velocity at t = 0, subject to force f(t) for 0 ≤ t ≤ n
• f(t) = xj for j − 1 ≤ t < j, j = 1, . . . , n (x is the sequence of applied forces, constant in each interval)
• y1, y2 are final position and velocity (i.e., at t = n)
• we have y = Ax
• a1j gives influence of applied force during j − 1 ≤ t < j on final position
• a2j gives influence of applied force during j − 1 ≤ t < j on final velocity

Gravimeter prospecting

[figure: gravimeter readings gi above voxels of density ρj]

• xj = ρj − ρavg is (excess) mass density of earth in voxel j
• yi is measured gravity anomaly at location i, i.e., some component (typically vertical) of gi − gavg
• y = Ax
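For the force example the entries of A can be worked out in closed form. The derivation below is mine, not in the excerpt: force xj acting over [j − 1, j) adds xj to the final velocity (a_2j = 1) and moves the mass by xj(n − j + 1/2) by time n (a_1j = n − j + 1/2). A brute-force integration confirms it.

```python
import numpy as np

n = 4
# row 1: a_1j = n - j + 1/2 (velocity gained persists n - j sec, plus 1/2 within the interval)
# row 2: a_2j = 1           (force x_j acts for 1 sec, adding x_j to final velocity)
A = np.vstack([[n - j + 0.5 for j in range(1, n + 1)],
               np.ones(n)])

def brute_force(x, dt=1e-3):
    """Integrate the unit mass under the piecewise-constant force directly."""
    pos, vel = 0.0, 0.0
    for k in range(int(n / dt)):
        f = x[min(int(k * dt), n - 1)]   # f(t) = x_j for j-1 <= t < j
        pos += vel * dt + 0.5 * f * dt * dt
        vel += f * dt
    return np.array([pos, vel])

x = np.array([1.0, -0.5, 0.25, 0.0])
y = A @ x                                # final (position, velocity)
```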

• A comes from physics and geometry
• jth column of A shows sensor readings caused by unit density anomaly at voxel j
• ith row of A shows sensitivity pattern of sensor i

Thermal system

[figure: heating elements x1, . . . , x5 and a temperature-sensing location]

• xj is power of jth heating element or heat source
• yi is change in steady-state temperature at location i
• thermal transport via conduction
• y = Ax

• aij gives influence of heater j at location i (in ◦C/W)
• jth column of A gives pattern of steady-state temperature rise due to 1W at heater j
• ith row shows how heaters affect location i

Illumination with multiple lamps

[figure: lamp j with power xj at distance rij and angle θij from patch i with illumination yi]

• n lamps illuminating m (small, flat) patches, no shadows
• xj is power of jth lamp, yi is illumination level of patch i
• y = Ax, where aij = rij^{−2} max{cos θij, 0} (cos θij < 0 means patch i is shaded from lamp j)
• jth column of A shows illumination pattern from lamp j

Signal and interference power in wireless system

• n transmitter/receiver pairs
• transmitter j transmits to receiver j (and, inadvertently, to the other receivers)
• pj is power of jth transmitter
• si is received signal power of ith receiver
• zi is received interference power of ith receiver
• Gij is path gain from transmitter j to receiver i
• we have s = Ap, z = Bp, where

aij = Gii if i = j, 0 if i ≠ j;    bij = 0 if i = j, Gij if i ≠ j

• A is diagonal; B has zero diagonal (ideally, A is ‘large’, B is ‘small’)

Cost of production

production inputs (materials, parts, labor, . . . ) are combined to make a number of products

• xj is price per unit of production input j
• aij is units of production input j required to manufacture one unit of product i
• yi is production cost per unit of product i
• we have y = Ax
• ith row of A is bill of materials for unit of product i

production inputs needed

• qi is quantity of product i to be produced
• rj is total quantity of production input j needed
• we have r = A^T q

total production cost is r^T x = (A^T q)^T x = q^T Ax

Network traffic and flows

• n flows with rates f1, . . . , fn pass from their source nodes to their destination nodes over fixed routes in a network
• ti, traffic on link i, is sum of rates of flows passing through it
• flow routes given by flow-link incidence matrix

Aij = 1 if flow j goes over link i, 0 otherwise

• traffic and flow rates related by t = Af

link delays and flow latency

• let d1, . . . , dm be link delays, and l1, . . . , ln be latency (total travel time) of flows
• l = A^T d
• f^T l = f^T A^T d = (Af)^T d = t^T d, total # of packets in network

Linearization

• if f : R^n → R^m is differentiable at x0 ∈ R^n, then x near x0 =⇒ f(x) very near f(x0) + Df(x0)(x − x0), where

Df(x0)ij = ∂fi/∂xj evaluated at x0

is derivative (Jacobian) matrix

• with y = f(x), y0 = f(x0), define input deviation δx := x − x0, output deviation δy := y − y0
• then we have δy ≈ Df(x0)δx
• when deviations are small, they are (approximately) related by a linear function
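The approximation δy ≈ Df(x0)δx is easy to see numerically; the nonlinear map below is an arbitrary stand-in chosen only for illustration, and the Jacobian is estimated by central differences.

```python
import numpy as np

def f(x):
    # an arbitrary smooth nonlinear map R^2 -> R^2 (illustration only)
    return np.array([np.sin(x[0]) + x[1] ** 2, x[0] * x[1]])

def num_jacobian(f, x0, h=1e-6):
    """Derivative (Jacobian) matrix Df(x0) by central differences, column by column."""
    cols = [(f(x0 + h * e) - f(x0 - h * e)) / (2 * h) for e in np.eye(len(x0))]
    return np.column_stack(cols)

x0 = np.array([0.3, -0.2])
dx = np.array([1e-3, 2e-3])            # small input deviation
dy_true = f(x0 + dx) - f(x0)           # exact output deviation
dy_lin = num_jacobian(f, x0) @ dx      # linearized prediction dy ≈ Df(x0) dx
```

The mismatch between dy_true and dy_lin is second order in ‖δx‖, so for these small deviations it is tiny.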

Navigation by range measurement

• (x, y) unknown coordinates in plane
• (pi, qi) known coordinates of beacons for i = 1, 2, 3, 4
• ρi measured (known) distance or range from beacon i

[figure: unknown position (x, y) and four beacons (p1, q1), . . . , (p4, q4) with ranges ρ1, . . . , ρ4]

• ρ ∈ R^4 is a nonlinear function of (x, y) ∈ R^2:

ρi(x, y) = sqrt((x − pi)^2 + (y − qi)^2)

• linearize around (x0, y0): δρ ≈ A (δx, δy), where

ai1 = (x0 − pi) / sqrt((x0 − pi)^2 + (y0 − qi)^2),    ai2 = (y0 − qi) / sqrt((x0 − pi)^2 + (y0 − qi)^2)

• ith row of A shows (approximate) change in ith range measurement for (small) shift in (x, y) from (x0, y0)
• first column of A shows sensitivity of range measurements to (small) change in x from x0
• obvious application: (x0, y0) is last navigation fix; (x, y) is current position, a short time later

Broad categories of applications

linear model or function y = Ax

some broad categories of applications:

• estimation or inversion
• control or design
• mapping or transformation

(this list is not exclusive; can have combinations . . . )

Estimation or inversion

y = Ax

• yi is ith measurement or sensor reading (which we know)
• xj is jth parameter to be estimated or determined
• aij is sensitivity of ith sensor to jth parameter

sample problems:

• find x, given y
• find all x’s that result in y (i.e., all x’s consistent with measurements)
• if there is no x such that y = Ax, the sensor readings are inconsistent; find x s.t. y ≈ Ax (i.e., find x which is almost consistent)

Control or design

y = Ax

• x is vector of design parameters or inputs (which we can choose)
• y is vector of results, or outcomes
• A describes how input choices affect results

sample problems:

• find x so that y = ydes
• find all x’s that result in y = ydes (i.e., find all designs that meet specifications)
• among x’s that satisfy y = ydes, find a small one (i.e., find a small or efficient x that meets specifications)

Mapping or transformation

• x is mapped or transformed to y by linear function y = Ax

sample problems:

• determine if there is an x that maps to a given y
• (if possible) find an x that maps to y
• find all x’s that map to a given y
• if there is only one x that maps to y, find it (i.e., decode or undo the mapping)

Matrix multiplication as mixture of columns

write A ∈ R^{m×n} in terms of its columns: A = [a1 a2 · · · an], where aj ∈ R^m

then y = Ax can be written as

y = x1a1 + x2a2 + · · · + xnan

(xj’s are scalars, aj’s are m-vectors)

• y is a (linear) combination or mixture of the columns of A
• coefficients of x give coefficients of mixture

an important example: x = ej, the jth unit vector

e1 = (1, 0, . . . , 0),    e2 = (0, 1, . . . , 0),    . . . ,    en = (0, 0, . . . , 1)

then Aej = aj, the jth column of A (ej corresponds to a pure mixture, giving only column j)
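A quick check (toy numbers) that y = Ax is the mixture x1a1 + · · · + xnan, and that x = ej picks out column j:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
x = np.array([3.0, -1.0, 2.0])

# y = x1*a1 + x2*a2 + x3*a3: a weighted mixture of the columns of A
y_mix = sum(x[j] * A[:, j] for j in range(A.shape[1]))

# picking x = e_2 extracts the 2nd column
e2 = np.array([0.0, 1.0, 0.0])
col2 = A @ e2
```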

Matrix multiplication as inner product with rows

write A in terms of its rows: A = [ã1^T; ã2^T; . . . ; ãm^T], where ãi ∈ R^n

then y = Ax can be written as

y = (ã1^T x, ã2^T x, . . . , ãm^T x)

thus yi = ⟨ãi, x⟩, i.e., yi is inner product of ith row of A with x

geometric interpretation: yi = ãi^T x = α is a hyperplane in R^n (normal to ãi)

[figure: parallel hyperplanes ⟨ãi, x⟩ = 0, 1, 2, 3 with normal vector ãi]

Block diagram representation

y = Ax can be represented by a signal flow graph or block diagram

e.g., for m = n = 2, we represent

(y1, y2) = A (x1, x2),    A = [a11 a12; a21 a22]

as

[block diagram: paths from x1, x2 to y1, y2 with gains a11, a12, a21, a22]

• aij is the gain along the path from jth input to ith output
• (by not drawing paths with zero gain) block diagram shows sparsity structure of A

example: block upper triangular, i.e.,

A = [A11 A12; 0 A22]

where A11 ∈ R^{m1×n1}, A12 ∈ R^{m1×n2}, A21 ∈ R^{m2×n1}, A22 ∈ R^{m2×n2}

partition x and y conformably as x = (x1, x2), y = (y1, y2) with x1 ∈ R^{n1}, x2 ∈ R^{n2}, y1 ∈ R^{m1}, y2 ∈ R^{m2}

so y1 = A11x1 + A12x2, y2 = A22x2, i.e., y2 doesn’t depend on x1

block diagram:

[block diagram: x1 → A11 → y1, x2 → A12 → y1, x2 → A22 → y2]

. . . no path from x1 to y2, so y2 doesn’t depend on x1

Matrix multiplication as composition

for A ∈ R^{m×n} and B ∈ R^{n×p}, C = AB ∈ R^{m×p}, where

cij = Σ_{k=1}^{n} aik bkj

composition interpretation: y = Cz represents composition of y = Ax and x = Bz

z → B → x → A → y    ≡    z → AB → y

(note that B is on left in block diagram)

Column and row interpretations

can write product C = AB as

C = [c1 · · · cp] = AB = [Ab1 · · · Abp]

i.e., ith column of C is A acting on ith column of B

similarly we can write

C = [c̃1^T; . . . ; c̃m^T] = AB = [ã1^T B; . . . ; ãm^T B]

i.e., ith row of C is ith row of A acting (on left) on B

Inner product interpretation

inner product interpretation: cij = ãi^T bj = ⟨ãi, bj⟩

i.e., entries of C are inner products of rows of A and columns of B

• cij = 0 means ith row of A is orthogonal to jth column of B
• Gram matrix of vectors f1, . . . , fn defined as Gij = fi^T fj (gives inner product of each vector with the others)
• G = [f1 · · · fn]^T [f1 · · · fn]
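The Gram matrix construction G = [f1 · · · fn]^T [f1 · · · fn] in code, with random stand-in vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((5, 3))   # columns f1, f2, f3 in R^5

G = F.T @ F                       # Gram matrix: G_ij = f_i^T f_j
```

By construction G is symmetric, its diagonal holds the squared norms ‖fi‖^2, and its off-diagonal entries are the pairwise inner products.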

Matrix multiplication interpretation via paths

[block diagram: z1, z2 → gains bij → x1, x2 → gains aij → y1, y2; e.g., path gain a22b21 from z1 via x2 to y2]

• aik bkj is gain of path from input j to output i via k
• cij is sum of gains over all paths from input j to output i

Lecture 3 Linear algebra review

• vector space, subspaces
• independence, basis, dimension
• range, nullspace, rank
• change of coordinates
• norm, angle, inner product

Vector spaces

a vector space or linear space (over the reals) consists of

• a set V
• a vector sum + : V × V → V
• a scalar multiplication : R × V → V
• a distinguished element 0 ∈ V

which satisfy a list of properties

• x + y = y + x, ∀x, y ∈ V (+ is commutative)
• (x + y) + z = x + (y + z), ∀x, y, z ∈ V (+ is associative)
• 0 + x = x, ∀x ∈ V (0 is additive identity)
• ∀x ∈ V ∃(−x) ∈ V s.t. x + (−x) = 0 (existence of additive inverse)
• (αβ)x = α(βx), ∀α, β ∈ R ∀x ∈ V (scalar mult. is associative)
• α(x + y) = αx + αy, ∀α ∈ R ∀x, y ∈ V (right distributive rule)
• (α + β)x = αx + βx, ∀α, β ∈ R ∀x ∈ V (left distributive rule)
• 1x = x, ∀x ∈ V

Examples

• V1 = R^n, with standard (componentwise) vector addition and scalar multiplication
• V2 = {0} (where 0 ∈ R^n)
• V3 = span(v1, v2, . . . , vk), where

span(v1, v2, . . . , vk) = {α1v1 + · · · + αkvk | αi ∈ R}

and v1, . . . , vk ∈ R^n

Subspaces

• a subspace of a vector space is a subset of a vector space which is itself a vector space
• roughly speaking, a subspace is closed under vector addition and scalar multiplication
• examples V1, V2, V3 above are subspaces of R^n

Vector spaces of functions

• V4 = {x : R+ → R^n | x is differentiable}, where vector sum is sum of functions, (x + z)(t) = x(t) + z(t), and scalar multiplication is defined by (αx)(t) = αx(t) (a point in V4 is a trajectory in R^n)
• V5 = {x ∈ V4 | ẋ = Ax} (points in V5 are trajectories of the linear system ẋ = Ax)
• V5 is a subspace of V4

Independent set of vectors

a set of vectors {v1, v2, . . . , vk} is independent if

α1v1 + α2v2 + · · · + αkvk = 0 =⇒ α1 = α2 = · · · = 0

some equivalent conditions:

• coefficients of α1v1 + α2v2 + · · · + αkvk are uniquely determined, i.e.,

α1v1 + α2v2 + · · · + αkvk = β1v1 + β2v2 + · · · + βkvk

implies α1 = β1, α2 = β2, . . . , αk = βk

• no vector vi can be expressed as a linear combination of the other vectors v1, . . . , vi−1, vi+1, . . . , vk

Basis and dimension

set of vectors {v1, v2, . . . , vk} is a basis for a vector space V if

• v1, v2, . . . , vk span V, i.e., V = span(v1, v2, . . . , vk)
• {v1, v2, . . . , vk} is independent

equivalent: every v ∈ V can be uniquely expressed as v = α1v1 + · · · + αkvk

fact: for a given vector space V, the number of vectors in any basis is the same

number of vectors in any basis is called the dimension of V, denoted dimV

(we assign dim{0} = 0, and dimV = ∞ if there is no basis)

Nullspace of a matrix

the nullspace of A ∈ R^{m×n} is defined as

N(A) = { x ∈ R^n | Ax = 0 }

• N(A) is set of vectors mapped to zero by y = Ax
• N(A) is set of vectors orthogonal to all rows of A

N(A) gives ambiguity in x given y = Ax:

• if y = Ax and z ∈ N(A), then y = A(x + z)
• conversely, if y = Ax and y = Ax̃, then x̃ = x + z for some z ∈ N(A)

Zero nullspace

A is called one-to-one if 0 is the only element of its nullspace: N(A) = {0} ⇐⇒

• x can always be uniquely determined from y = Ax (i.e., the linear transformation y = Ax doesn’t ‘lose’ information)
• mapping from x to Ax is one-to-one: different x’s map to different y’s
• columns of A are independent (hence, a basis for their span)
• A has a left inverse, i.e., there is a matrix B ∈ R^{n×m} s.t. BA = I
• det(A^T A) ≠ 0

(we’ll establish these later)
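A nullspace basis is easy to compute numerically. The sketch below reads one off the SVD (a tool covered much later in the course, used here only because numpy exposes it); the matrix is a made-up rank-1 example.

```python
import numpy as np

def nullspace(A, tol=1e-10):
    """Orthonormal basis for N(A): rows of Vt belonging to (near-)zero singular values."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T               # n x (n - rank) matrix of basis vectors

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # rank 1, so dim N(A) = 2
Z = nullspace(A)
```

Every column z of Z satisfies Az = 0, so any x and x + z map to the same y.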

Interpretations of nullspace

suppose z ∈ N(A)

y = Ax represents measurement of x

• z is undetectable from sensors — get zero sensor readings
• x and x + z are indistinguishable from sensors: Ax = A(x + z)

N(A) characterizes ambiguity in x from measurement y = Ax

y = Ax represents output resulting from input x

• z is an input with no result
• x and x + z have same result

N(A) characterizes freedom of input choice for given result

Range of a matrix

the range of A ∈ R^{m×n} is defined as

R(A) = {Ax | x ∈ R^n} ⊆ R^m

R(A) can be interpreted as

• the set of vectors that can be ‘hit’ by linear mapping y = Ax
• the span of columns of A
• the set of vectors y for which Ax = y has a solution

Onto matrices

A is called onto if R(A) = R^m ⇐⇒

• Ax = y can be solved in x for any y
• columns of A span R^m
• A has a right inverse, i.e., there is a matrix B ∈ R^{n×m} s.t. AB = I
• rows of A are independent
• N(A^T) = {0}
• det(AA^T) ≠ 0

(some of these are not obvious; we’ll establish them later)

Interpretations of range

suppose v ∈ R(A), w ∉ R(A)

y = Ax represents measurement of x

• y = v is a possible or consistent sensor signal
• y = w is impossible or inconsistent; sensors have failed or model is wrong

y = Ax represents output resulting from input x

• v is a possible result or output
• w cannot be a result or output

R(A) characterizes the possible results or achievable outputs

Inverse

A ∈ R^{n×n} is invertible or nonsingular if det A ≠ 0

equivalent conditions:

• columns of A are a basis for R^n
• rows of A are a basis for R^n
• y = Ax has a unique solution x for every y ∈ R^n
• A has a (left and right) inverse denoted A^{−1} ∈ R^{n×n}, with AA^{−1} = A^{−1}A = I
• N(A) = {0}
• R(A) = R^n
• det A^T A = det AA^T ≠ 0

Interpretations of inverse

suppose A ∈ R^{n×n} has inverse B = A^{−1}

• mapping associated with B undoes mapping associated with A (applied either before or after!)
• x = By is a perfect (pre- or post-) equalizer for the channel y = Ax
• x = By is unique solution of Ax = y

Dual basis interpretation

• let ai be columns of A, and b̃i^T be rows of B = A^{−1}
• from y = x1a1 + · · · + xnan and xi = b̃i^T y, we get

y = Σ_{i=1}^{n} (b̃i^T y) ai

thus, inner product with rows of inverse matrix gives the coefficients in the expansion of a vector in the columns of the matrix

• b̃1, . . . , b̃n and a1, . . . , an are called dual bases

Rank of a matrix

we define the rank of A ∈ R^{m×n} as

rank(A) = dim R(A)

(nontrivial) facts:

• rank(A) = rank(A^T)
• rank(A) is maximum number of independent columns (or rows) of A, hence rank(A) ≤ min(m, n)
• rank(A) + dim N(A) = n

Conservation of dimension

interpretation of rank(A) + dim N(A) = n:

• rank(A) is dimension of set ‘hit’ by the mapping y = Ax
• dim N(A) is dimension of set of x ‘crushed’ to zero by y = Ax
• ‘conservation of dimension’: each dimension of input is either crushed to zero or ends up in output
• roughly speaking:
  – n is number of degrees of freedom in input x
  – dim N(A) is number of degrees of freedom lost in the mapping from x to y = Ax
  – rank(A) is number of degrees of freedom in output y

‘Coding’ interpretation of rank

• rank of product: rank(BC) ≤ min{rank(B), rank(C)}
• hence if A = BC with B ∈ R^{m×r}, C ∈ R^{r×n}, then rank(A) ≤ r
• conversely: if rank(A) = r then A ∈ R^{m×n} can be factored as A = BC with B ∈ R^{m×r}, C ∈ R^{r×n}

[diagram: y = Ax computed directly (n inputs, m outputs) versus y = B(Cx) through an intermediate of rank(A) lines]

• rank(A) = r is minimum size of vector needed to faithfully reconstruct y from x

Application: fast matrix-vector multiplication

• need to compute matrix-vector product y = Ax, A ∈ R^{m×n}
• A has known factorization A = BC, B ∈ R^{m×r}
• computing y = Ax directly: mn operations
• computing y = Ax as y = B(Cx) (compute z = Cx first, then y = Bz): rn + mr = (m + n)r operations
• savings can be considerable if r ≪ min{m, n}

Full rank matrices

for A ∈ R^{m×n} we always have rank(A) ≤ min(m, n)

we say A is full rank if rank(A) = min(m, n)

• for square matrices, full rank means nonsingular
• for skinny matrices (m ≥ n), full rank means columns are independent
• for fat matrices (m ≤ n), full rank means rows are independent
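The factored multiplication y = B(Cx) can be sanity-checked numerically: it returns the same vector as A x while doing (m + n)r instead of mn multiplies. Sizes below are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 200, 300, 5
B = rng.standard_normal((m, r))
C = rng.standard_normal((r, n))
A = B @ C                       # rank-r matrix, also kept in factored form
x = rng.standard_normal(n)

y_direct = A @ x                # mn = 60000 multiplies
y_fast = B @ (C @ x)            # (m + n)r = 2500 multiplies, same result
```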

Change of coordinates

‘standard’ basis vectors in R^n: (e1, e2, . . . , en), where

ei = (0, . . . , 0, 1, 0, . . . , 0)    (1 in ith component)

obviously we have

x = x1e1 + x2e2 + · · · + xnen

xi are called the coordinates of x (in the standard basis)

if (t1, t2, . . . , tn) is another basis for R^n, we have

x = x̃1t1 + x̃2t2 + · · · + x̃ntn

where x̃i are the coordinates of x in the basis (t1, t2, . . . , tn)

define T = [t1 t2 · · · tn], so x = T x̃, hence x̃ = T^{−1}x

(T is invertible since ti are a basis)

T^{−1} transforms (standard basis) coordinates of x into ti-coordinates

inner product of ith row of T^{−1} with x extracts ti-coordinate of x

consider linear transformation y = Ax, A ∈ R^{n×n}

express y and x in terms of t1, t2, . . . , tn: x = T x̃, y = T ỹ

so ỹ = (T^{−1}AT)x̃

• A → T^{−1}AT is called similarity transformation
• similarity transformation by T expresses linear transformation y = Ax in coordinates t1, t2, . . . , tn

(Euclidean) norm

for x ∈ R^n we define the (Euclidean) norm as

‖x‖ = sqrt(x1^2 + x2^2 + · · · + xn^2) = sqrt(x^T x)

‖x‖ measures length of vector (from origin)

important properties:

• ‖αx‖ = |α| ‖x‖ (homogeneity)
• ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)
• ‖x‖ ≥ 0 (nonnegativity)
• ‖x‖ = 0 ⇐⇒ x = 0 (definiteness)
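A numeric sketch of the coordinate change (random stand-in basis and map): x̃ = T⁻¹x and T⁻¹AT represent the same vector and the same linear map in ti-coordinates, so mapping back with T must recover x and y = Ax.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
T = rng.standard_normal((n, n))      # columns t1..tn: assumed independent, hence a basis
A = rng.standard_normal((n, n))

x = rng.standard_normal(n)
x_t = np.linalg.solve(T, x)          # ti-coordinates of x (i.e., T^{-1} x)

A_t = np.linalg.solve(T, A @ T)      # similarity transformation T^{-1} A T
y = A @ x
y_t = A_t @ x_t                      # y expressed in ti-coordinates
```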

RMS value and (Euclidean) distance

root-mean-square (RMS) value of vector x ∈ R^n:

rms(x) = ( (1/n) Σ_{i=1}^{n} xi^2 )^{1/2} = ‖x‖ / sqrt(n)

norm defines distance between vectors: dist(x, y) = ‖x − y‖

Inner product

⟨x, y⟩ := x1y1 + x2y2 + · · · + xnyn = x^T y

important properties:

• ⟨αx, y⟩ = α⟨x, y⟩
• ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
• ⟨x, y⟩ = ⟨y, x⟩
• ⟨x, x⟩ ≥ 0
• ⟨x, x⟩ = 0 ⇐⇒ x = 0

f(y) = ⟨x, y⟩ is linear function : R^n → R, with linear map defined by row vector x^T

Cauchy-Schwartz inequality and angle between vectors

• for any x, y ∈ R^n, |x^T y| ≤ ‖x‖ ‖y‖
• (unsigned) angle between vectors in R^n defined as

θ = ∠(x, y) = cos^{−1}( x^T y / (‖x‖ ‖y‖) )

thus x^T y = ‖x‖ ‖y‖ cos θ

special cases:

• x and y are aligned: θ = 0; x^T y = ‖x‖ ‖y‖; (if x ≠ 0) y = αx for some α ≥ 0
• x and y are opposed: θ = π; x^T y = −‖x‖ ‖y‖; (if x ≠ 0) y = −αx for some α ≥ 0
• x and y are orthogonal: θ = π/2 or −π/2; x^T y = 0; denoted x ⊥ y

interpretation of x^T y > 0 and x^T y < 0:

• x^T y > 0 means ∠(x, y) is acute
• x^T y < 0 means ∠(x, y) is obtuse

[figure: vector pairs with acute and obtuse angles]

{x | x^T y ≤ 0} defines a halfspace with outward normal vector y, and boundary passing through 0

[figure: halfspace {x | x^T y ≤ 0} with normal y]
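The angle formula θ = cos⁻¹(xᵀy / (‖x‖‖y‖)) in code; the clip is a numerical guard against rounding pushing the cosine just outside [−1, 1], not part of the math.

```python
import numpy as np

def angle(x, y):
    """Unsigned angle between nonzero vectors x and y."""
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

a = np.array([1.0, 0.0])
b = np.array([0.0, 2.0])   # orthogonal to a
c = np.array([-3.0, 0.0])  # opposed to a
```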

Lecture 4 Orthonormal sets of vectors and QR factorization

• orthonormal set of vectors
• Gram-Schmidt procedure, QR factorization
• orthogonal decomposition induced by a matrix

Orthonormal set of vectors

set of vectors {u1, . . . , uk} ⊆ R^n is

• normalized if ‖ui‖ = 1, i = 1, . . . , k (ui are called unit vectors or direction vectors)
• orthogonal if ui ⊥ uj for i ≠ j
• orthonormal if both

slang: we say ‘u1, . . . , uk are orthonormal vectors’ but orthonormality (like independence) is a property of a set of vectors, not vectors individually

in terms of U = [u1 · · · uk], orthonormal means U^T U = Ik

• an orthonormal set of vectors is independent (multiply α1u1 + α2u2 + · · · + αk uk = 0 by ui^T)
• hence {u1, ..., uk} is an orthonormal basis for span(u1, ..., uk) = R(U)
• warning: if k < n then U U^T ≠ I (since its rank is at most k) (more on this matrix later ...)

Orthonormal sets of vectors and QR factorization 4–3

Geometric properties

suppose columns of U = [u1 · · · uk] are orthonormal

if w = Uz, then ‖w‖ = ‖z‖

• multiplication by U does not change norm
• mapping w = Uz is isometric: it preserves distances
• simple derivation using matrices:

‖w‖^2 = ‖Uz‖^2 = (Uz)^T(Uz) = z^T U^T U z = z^T z = ‖z‖^2

Orthonormal sets of vectors and QR factorization 4–4
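A quick numerical check of these facts; building U by QR of a random matrix is an assumed (but standard) way to get orthonormal columns:

```python
import numpy as np

rng = np.random.default_rng(0)

# U in R^{5x3} with orthonormal columns, built by QR of a random matrix
U, _ = np.linalg.qr(rng.standard_normal((5, 3)))

assert np.allclose(U.T @ U, np.eye(3))       # U^T U = I_k
assert not np.allclose(U @ U.T, np.eye(5))   # but U U^T != I since k < n

z = rng.standard_normal(3)
w = U @ z
assert np.isclose(np.linalg.norm(w), np.linalg.norm(z))   # norm preserved
```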

not orthogonal) • it follows that U −1 = U T . .• inner products are also preserved: U z. z ˜ ˜ • if w = U z and w = U z then ˜ ˜ w..e. angles. . un is an orthonormal basis for Rn • then U = [u1 · · · un] is called orthogonal: it is square and satisﬁes UTU = I (you’d think such matrices would be called orthonormal. U z = (U z)T (U z ) = z T U T U z = z. z ˜ ˜ ˜ ˜ ˜ • norms and inner products preserved. multiplication by U preserves inner products. and distances Orthonormal sets of vectors and QR factorization 4–5 Orthonormal basis for Rn • suppose u1. and hence also U U T = I. . U z = z. . U z ) = (z. so angles are preserved: (U z. w = U z. i. n ui uT = I i i=1 Orthonormal sets of vectors and QR factorization 4–6 . z ) ˜ ˜ • thus.

. i. n x= i=1 (uT x)ui i • uT x is called the component of x in the direction ui i • a = U T x resolves x into the vector of its ui components • x = U a reconstitutes x from its ui components n • x = Ua = i=1 aiui is called the (ui-) expansion of x 4–7 Orthonormal sets of vectors and QR factorization the identity I = U U T = n T i=1 ui ui n is sometimes written (in physics) as |ui ui| I= i=1 since x= n |ui ui|x i=1 (but we won’t use this notation) Orthonormal sets of vectors and QR factorization 4–8 .Expansion in orthonormal basis suppose U is orthogonal.e. so x = U U T x.

i. Uθ = cos θ − sin θ sin θ cos θ since e1 → (cos θ. cos θ) reﬂection across line x2 = x1 tan(θ/2) is given by y = Rθ x.. then transformation w = U z • preserves norm of vectors.. sin θ). e2 → (− sin θ.e. Rθ = cos θ sin θ sin θ − cos θ since e1 → (cos θ. e2 → (sin θ. U z ) = (z.e. sin θ). U z = z ˜ ˜ • preserves angles between vectors. − cos θ) Orthonormal sets of vectors and QR factorization 4–10 . z ) examples: • rotations (about some axis) • reﬂections (through some plane) Orthonormal sets of vectors and QR factorization 4–9 Example: rotation by θ in R2 is given by y = Uθ x. (U z. i.Geometric interpretation if U is orthogonal.

[figure: rotation of e1, e2 by θ; reflection of e1, e2 across the line x2 = x1 tan(θ/2)]

can check that Uθ and Rθ are orthogonal

Orthonormal sets of vectors and QR factorization 4–11

Gram-Schmidt procedure

• given independent vectors a1, ..., ak ∈ R^n, G-S procedure finds orthonormal vectors q1, ..., qk s.t.

span(a1, ..., ar) = span(q1, ..., qr) for r ≤ k

• thus, q1, ..., qr is an orthonormal basis for span(a1, ..., ar)
• rough idea of method: first orthogonalize each vector w.r.t. previous ones, then normalize result to have norm one

Orthonormal sets of vectors and QR factorization 4–12

Gram-Schmidt procedure

• step 1a. q̃1 := a1
• step 1b. q1 := q̃1/‖q̃1‖ (normalize)
• step 2a. q̃2 := a2 − (q1^T a2)q1 (remove q1 component from a2)
• step 2b. q2 := q̃2/‖q̃2‖ (normalize)
• step 3a. q̃3 := a3 − (q1^T a3)q1 − (q2^T a3)q2 (remove q1, q2 components)
• step 3b. q3 := q̃3/‖q̃3‖ (normalize)
• etc.

Orthonormal sets of vectors and QR factorization 4–13

[figure: q̃1 = a1 and q̃2 = a2 − (q1^T a2)q1]

for i = 1, 2, ..., k we have

ai = (q1^T ai)q1 + (q2^T ai)q2 + · · · + (q_{i−1}^T ai)q_{i−1} + ‖q̃i‖qi
   = r_{1i}q1 + r_{2i}q2 + · · · + r_{ii}qi

(note that the rij’s come right out of the G-S procedure, and rii ≠ 0)

Orthonormal sets of vectors and QR factorization 4–14
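The steps above can be sketched as a short classical Gram-Schmidt routine (for independent inputs only; the 'general' variant that handles dependent vectors comes later):

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt on a list of independent vectors."""
    qs = []
    for a in vectors:
        qt = a - sum((q @ a) * q for q in qs)   # remove components along previous q's
        qs.append(qt / np.linalg.norm(qt))      # normalize
    return qs

a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])
q1, q2 = gram_schmidt([a1, a2])
assert np.isclose(np.linalg.norm(q1), 1.0) and np.isclose(np.linalg.norm(q2), 1.0)
assert np.isclose(q1 @ q2, 0.0)   # orthonormal
```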

QR decomposition

written in matrix form: A = QR, where A ∈ R^(n×k), Q ∈ R^(n×k), R ∈ R^(k×k):

[ a1 a2 · · · ak ] = [ q1 q2 · · · qk ] [ r11 r12 · · · r1k ]
        A                    Q         [  0  r22 · · · r2k ]
                                       [        ...        ]
                                       [  0   0  · · · rkk ]
                                                 R

• Q^T Q = I_k, and R is upper triangular & invertible
• called QR decomposition (or factorization) of A
• usually computed using a variation on Gram-Schmidt procedure which is less sensitive to numerical (rounding) errors
• columns of Q are orthonormal basis for R(A)

Orthonormal sets of vectors and QR factorization 4–15

General Gram-Schmidt procedure

• in basic G-S we assume a1, ..., ak ∈ R^n are independent
• if a1, ..., ak are dependent, we find q̃j = 0 for some j, which means aj is linearly dependent on a1, ..., a_{j−1}
• modified algorithm: when we encounter q̃j = 0, skip to next vector a_{j+1} and continue:

    r = 0;
    for i = 1, ..., k {
        ã = ai − Σ_{j=1}^r qj qj^T ai;
        if ã ≠ 0 { r = r + 1; qr = ã/‖ã‖; }
    }

Orthonormal sets of vectors and QR factorization 4–16

. qr is an orthonormal basis for R(A) (hence r = Rank(A)) • each ai is linear combination of previously generated qj ’s in matrix notation we have A = QR with QT Q = Ir and R ∈ Rr×k in upper staircase form: × × × possibly nonzero entries × × × zero entries × × × ‘corner’ entries (shown as ×) are nonzero Orthonormal sets of vectors and QR factorization 4–17 can permute columns with × to front of matrix: ˜ A = Q[R S]P where: • QT Q = Ir ˜ • R ∈ Rr×r is upper triangular and invertible • P ∈ Rk×k is a permutation matrix (which moves forward the columns of a which generated a new q) Orthonormal sets of vectors and QR factorization 4–18 . .on exit. . • q1. .

. . orthogonal to Q1 to ﬁnd Q2: ˜ ˜ ˜ • ﬁnd any matrix A s.e. A = I) ˜ • apply general Gram-Schmidt to [A A] • Q1 are orthonormal vectors obtained from columns of A ˜ • Q2 are orthonormal vectors obtained from extra columns (A) Orthonormal sets of vectors and QR factorization 4–20 .Applications • directly yields orthonormal basis for R(A) • yields factorization A = BC with B ∈ Rn×r .. [A A] is full rank (e. ak ): apply Gram-Schmidt to [a1 · · · ak b] • staircase pattern in R shows which columns of A are dependent on previous ones works incrementally: one G-S procedure yields QR factorizations of [a1 · · · ap] for p = 1. . write A= Q1 Q2 R1 0 where [Q1 Q2] is orthogonal.g. columns of Q2 ∈ Rn×(n−r) are orthonormal. . r = Rank(A) • to check if b ∈ span(a1. . i.t. k: [a1 · · · ap] = [q1 · · · qs]Rp where s = Rank([a1 · · · ap]) and Rp is leading s × p submatrix of R Orthonormal sets of vectors and QR factorization 4–19 ‘Full’ QR factorization with A = Q1R1 the QR factorization as above. . C ∈ Rr×k . . ..

e.i..e.e. but what is its orthogonal complement R(Q2)? Orthonormal sets of vectors and QR factorization 4–21 ⊥ Orthogonal decomposition induced by A from AT = T R1 0 QT 1 QT 2 we see that AT z = 0 ⇐⇒ QT z = 0 ⇐⇒ z ∈ R(Q2) 1 so R(Q2) = N (AT ) (in fact the columns of Q2 are an orthonormal basis for N (AT )) we conclude: R(A) and N (AT ) are complementary subspaces: • R(A) + N (AT ) = Rn (recall A ∈ Rn×k ) • R(A)⊥ = N (AT ) (and N (AT )⊥ = R(A)) • called orthogonal decomposition (of Rn) induced by A ∈ Rn×k Orthonormal sets of vectors and QR factorization 4–22 ⊥ .. every vector in Rn can be expressed as a sum of two vectors. one from each subspace) this is written • R(Q1) + R(Q2) = Rn • R(Q2) = R(Q1)⊥ (and R(Q1) = R(Q2)⊥) (each subspace is the orthogonal complement of the other) we know R(Q1) = R(A). every vector in the ﬁrst subspace is orthogonal to every vector in the second subspace) • their sum is Rn (i.. any set of orthonormal vectors can be extended to an orthonormal basis for Rn R(Q1) and R(Q2) are called complementary subspaces since • they are orthogonal (i.

. with z ∈ R(A). ) • can now prove most of the assertions from the linear algebra review lecture • switching A ∈ Rn×k to AT ∈ Rk×n gives decomposition of Rk : N (A) + R(AT ) = Rk ⊥ Orthonormal sets of vectors and QR factorization 4–23 . . w ∈ N (AT ) (we’ll soon see what the vector z is .• every y ∈ Rn can be written uniquely as y = z + w.

EE263 Autumn 2010-11

Stephen Boyd

Lecture 5
Least-squares

• least-squares (approximate) solution of overdetermined equations
• projection and orthogonality principle
• least-squares estimation
• BLUE property

5–1

Overdetermined linear equations

consider y = Ax where A ∈ R^(m×n) is (strictly) skinny, i.e., m > n

• called overdetermined set of linear equations (more equations than unknowns)
• for most y, cannot solve for x

one approach to approximately solve y = Ax:

• define residual or error r = Ax − y
• find x = xls that minimizes ‖r‖

xls called least-squares (approximate) solution of y = Ax

Least-squares 5–2

Geometric interpretation

Axls is point in R(A) closest to y (Axls is projection of y onto R(A))

[figure: y, residual r, and Axls in the plane R(A)]

Least-squares 5–3

Least-squares (approximate) solution

• assume A is full rank, skinny
• to find xls, we’ll minimize norm of residual squared,

‖r‖^2 = x^T A^T Ax − 2y^T Ax + y^T y

• set gradient w.r.t. x to zero:

∇x ‖r‖^2 = 2A^T Ax − 2A^T y = 0

• yields the normal equations: A^T Ax = A^T y
• assumptions imply A^T A invertible, so we have

xls = (A^T A)^{-1} A^T y

... a very famous formula

Least-squares 5–4
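The famous formula is two lines of numpy; a sketch (random data, names illustrative) that solves the normal equations rather than forming the inverse, and checks that the optimal residual is orthogonal to R(A):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))   # skinny, full rank (with probability 1)
y = rng.standard_normal(20)

# normal equations: solve A^T A x = A^T y (better than forming the inverse)
x_ne = np.linalg.solve(A.T @ A, A.T @ y)

# the library least-squares solver agrees
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_ne, x_ls)

# the optimal residual r = A x_ls - y satisfies A^T r = 0
r = A @ x_ls - y
assert np.allclose(A.T @ r, 0.0)

# equivalently via reduced QR: x_ls = R^{-1} Q^T y
Q, R = np.linalg.qr(A)
assert np.allclose(np.linalg.solve(R, Q.T @ y), x_ls)
```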

• xls is linear function of y
• xls = A^{-1}y if A is square
• xls solves y = Axls if y ∈ R(A)
• A† = (A^T A)^{-1}A^T is called the pseudo-inverse of A
• A† is a left inverse of (full rank, skinny) A:

A†A = (A^T A)^{-1}A^T A = I

Least-squares 5–5

Projection on R(A)

Axls is (by definition) the point in R(A) that is closest to y, i.e., it is the projection of y onto R(A):

Axls = P_R(A)(y)

• the projection function P_R(A) is linear, and given by

P_R(A)(y) = Axls = A(A^T A)^{-1}A^T y

• A(A^T A)^{-1}A^T is called the projection matrix (associated with R(A))

Least-squares 5–6

Ax − y > Axls − y Least-squares 5–8 .Orthogonality principle optimal residual r = Axls − y = (A(AT A)−1AT − I)y is orthogonal to R(A): r. we have Ax − y 2 = = Axls − y (Axls − y) + A(x − xls) 2 2 2 + A(x − xls) this shows that for x = xls. Az = y T (A(AT A)−1AT − I)T Az = 0 for all z ∈ Rn y r Axls R(A) Least-squares 5–7 Completion of squares since r = Axls − y ⊥ A(x − xls) for any x.

Least-squares via QR factorization

• A ∈ R^(m×n) skinny, full rank
• factor as A = QR with Q^T Q = I_n, R ∈ R^(n×n) upper triangular, invertible
• pseudo-inverse is

(A^T A)^{-1}A^T = (R^T Q^T QR)^{-1}R^T Q^T = R^{-1}Q^T

so xls = R^{-1}Q^T y

• projection on R(A) given by matrix

A(A^T A)^{-1}A^T = AR^{-1}Q^T = QQ^T

Least-squares 5–9

Least-squares via full QR factorization

• full QR factorization:

A = [Q1 Q2] [R1]
            [ 0]

with [Q1 Q2] ∈ R^(m×m) orthogonal, R1 ∈ R^(n×n) upper triangular, invertible

• multiplication by orthogonal matrix doesn’t change norm, so

‖Ax − y‖^2 = ‖ [Q1 Q2][R1; 0]x − y ‖^2
           = ‖ [Q1 Q2]^T [Q1 Q2][R1; 0]x − [Q1 Q2]^T y ‖^2

Least-squares 5–10

           = ‖ [R1x − Q1^T y; −Q2^T y] ‖^2
           = ‖R1x − Q1^T y‖^2 + ‖Q2^T y‖^2

• this is evidently minimized by choice xls = R1^{-1}Q1^T y (which makes first term zero)
• residual with optimal x is Axls − y = −Q2Q2^T y
• Q1Q1^T gives projection onto R(A)
• Q2Q2^T gives projection onto R(A)⊥

Least-squares 5–11

Least-squares estimation

many applications in inversion, estimation, and reconstruction problems have form

y = Ax + v

• x is what we want to estimate or reconstruct
• y is our sensor measurement(s)
• v is an unknown noise or measurement error (assumed small)
• ith row of A characterizes ith sensor

Least-squares 5–12

and • what we would observe if x = x. deviation between • what we actually observed (y).least-squares estimation: choose as estimate x that minimizes ˆ Aˆ − y x i. B is left inverse of A Least-squares 5–14 . no estimation error when there is no noise) same as BA = I.e... and there were no noise (v = 0) ˆ least-squares estimate is just x = (AT A)−1AT y ˆ Least-squares 5–13 BLUE property linear measurement with noise: y = Ax + v with A full rank..e. skinny consider a linear estimator of form x = By ˆ • called unbiased if x = x whenever v = 0 ˆ (i.e. i.

so linearization around x = 0 (say) nearly exact Least-squares 5–16 . least-squares provides the best linear unbiased estimator (BLUE) Least-squares 5–15 Navigation from range measurements navigation using range measurements from distant beacons beacons k4 x unknown position k3 k1 k2 beacons far from unknown position x ∈ R2. in the following sense: for any B with BA = I.. we’d like B ‘small’ (and BA = I) • fact: A† = (AT A)−1AT is the smallest left inverse of A.j i.• estimation error of unbiased linear estimator is x − x = x − B(Ax + v) = −Bv ˆ obviously. we have 2 Bij ≥ A†2 ij i.e.j i. then.

07) 0 −1. measurement is y = (−11.84. given y ∈ R4 (roughly speaking. 2. Gaussian.ranges y ∈ R4 measured. with standard deviation 2 (details not important) problem: estimate x ∈ R2.0 0 0 −1.95. with measurement noise v: T k1 kT y = − 2 x + v T k3 T k4 where ki is unit vector from 0 to beacon i measurement errors are independent.84 11. 10.12 0.81. a 2 : 1 measurement redundancy ratio) actual position is x = (5. −9.59. −2.9 Least-squares 5–18 .5 0 0 y= 2.58).81) Least-squares 5–17 Just enough measurements method y1 and y2 suﬃce to ﬁnd x (when v = 0) compute estimate x by inverting top (2 × 2) half of A: ˆ x = Bjey = ˆ (norm of error: 3.

47 −0.44 −0. 0 ≤ t ≤ 10: u(t) = xj . .1i).26 (norm of error: 0. i = 1. .48 0. .04 0.Least-squares method compute estimate x by least-squares: ˆ x = A† y = ˆ −0. j = 1. .51 −0.02 −0. . j − 1 ≤ t < j.23 −0. .95 10. period 1 sec. 10 • ﬁltered by system with impulse response h(t): t w(t) = 0 h(t − τ )u(τ ) dτ • sample at 10Hz: yi = w(0.18 y= 4. 100 ˜ Least-squares 5–20 .72) • Bje and A† are both left inverses of A • larger entries in B lead to larger estimation error Least-squares 5–19 Example from overview lecture u H(s) w A/D y • signal u is piecewise constant. . .

• 3-bit quantization: yi = Q(ỹi), i = 1, ..., 100, where Q is 3-bit quantizer characteristic Q(a) = (1/4)(round(4a + 1/2) − 1/2)
• problem: estimate x ∈ R^10 given y ∈ R^100

[figure: example signals u(t), s(t), w(t), and y(t) for 0 ≤ t ≤ 10]

Least-squares 5–21

we have y = Ax + v, where

• A ∈ R^(100×10) is given by Aij = ∫_{j−1}^{j} h(0.1i − τ) dτ
• v ∈ R^100 is quantization error: vi = Q(ỹi) − ỹi (so |vi| ≤ 0.125)

least-squares estimate: xls = (A^T A)^{-1}A^T y

[figure: u(t) (solid) & û(t) (dotted)]

Least-squares 5–22

RMS error is ‖x − xls‖/√10 = 0.03

better than if we had no filtering! (RMS error 0.07)

more on this later ...

Least-squares 5–23

some rows of Bls = (A^T A)^{-1}A^T:

[figure: rows 2, 5, and 8 of Bls plotted versus t]

• rows show how sampled measurements of y are used to form estimate of xi for i = 2, 5, 8
• to estimate x5, which is the original input signal for 4 ≤ t < 5, we mostly use y(t) for 3 ≤ t ≤ 7

Least-squares 5–24

. called regressors or basis functions • data or measurements (si. ﬁnd linear combination of functions that ﬁts data least-squares ﬁt: choose x to minimize total square ﬁtting error: m (x1f1(si) + · · · + xnfn(si) − gi) i=1 2 Least-squares applications 6–2 . .. m i.e. i = 1. . . . . .EE263 Autumn 2010-11 Stephen Boyd Lecture 6 Least-squares applications • least-squares data ﬁtting • growing sets of regressors • system identiﬁcation • growing sets of measurements and recursive least-squares 6–1 Least-squares data ﬁtting we are given: • functions f1. i = 1. where si ∈ S and (usually) m≫n problem: ﬁnd coeﬃcients x1. xn ∈ R so that x1f1(si) + · · · + xnfn(si) ≈ gi. . . . . . gi). . m. . . . fn : S → R.

• using matrix notation, total square fitting error is ‖Ax − g‖^2, where Aij = fj(si)
• hence, least-squares fit is given by x = (A^T A)^{-1}A^T g (assuming A is skinny, full rank)
• corresponding function is flsfit(s) = x1f1(s) + · · · + xnfn(s)
• applications:
  – interpolation, extrapolation, smoothing of data
  – developing simple, approximate model of data

Least-squares applications 6–3

Least-squares polynomial fitting

problem: fit polynomial of degree < n,

p(t) = a0 + a1t + · · · + a_{n−1}t^{n−1},

to data (ti, yi), i = 1, ..., m

• basis functions are fj(t) = t^{j−1}, j = 1, ..., n
• matrix A has form Aij = ti^{j−1}:

A = [ 1  t1  t1^2  · · ·  t1^{n−1} ]
    [ 1  t2  t2^2  · · ·  t2^{n−1} ]
    [              ...             ]
    [ 1  tm  tm^2  · · ·  tm^{n−1} ]

(called a Vandermonde matrix)

Least-squares applications 6–4
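A sketch of least-squares polynomial fitting via a Vandermonde matrix, using the same g(t) = 4t/(1 + 10t^2) as the example that follows (the error threshold below is illustrative, not a claim about the exact RMS values):

```python
import numpy as np

# data from a smooth function, m = 100 points in [0, 1]
t = np.linspace(0.0, 1.0, 100)
g = 4 * t / (1 + 10 * t**2)

n = 4                                   # fit polynomial of degree < n
A = np.vander(t, n, increasing=True)    # Vandermonde matrix, A_ij = t_i^{j-1}
a, *_ = np.linalg.lstsq(A, g, rcond=None)

rms_err = np.linalg.norm(A @ a - g) / np.sqrt(len(t))
assert rms_err < 0.05   # a low-degree fit is already quite good here
```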

assuming tk = tl for k = l and m ≥ n.005. .e. 3.135.. i.025. respectively Least-squares applications 6–6 . so p is identically zero. 2. A full rank Least-squares applications 6–5 Example • ﬁt g(t) = 4t/(1 + 10t2) with polynomial • m = 100 points between t = 0 & t = 1 • least-squares ﬁt for degrees 1. . 4 have RMS errors . A is full rank: • suppose Aa = 0 • corresponding polynomial p(t) = a0 + · · · + an−1tn−1 vanishes at m points t1. . tm • by fundamental theorem of algebra p can have no more than n − 1 zeros. and a = 0 • columns of A are independent. . . .076. .

5 0 0 0. . so optimal residual decreases Least-squares applications 6–8 p i=1 xi ai −y .8 0. . . .6 0. .9 1 p4(t) 0.4 0. .5 0 1 0 0.8 0.2 0. ap • as p increases. .8 0. . .1 0.2 0.3 0.5 0 1 0 0.1 0. .9 1 t Least-squares applications 6–7 Growing sets of regressors consider family of least-squares problems minimize for p = 1.4 0. . .5 0.3 0.9 1 p2(t) 0. .6 0.1 p1(t) 0.4 0.2 0. ap are called regressors) • approximate y by linear combination of a1.3 0. .6 0. .9 1 p3(t) 0. . . .5 0 1 0 0. ap} • regress y on a1.7 0. n (a1.7 0. .7 0.2 0.1 0.8 0. .5 0.7 0. ap • project y onto span{a1.1 0.5 0.3 0.6 0. get better ﬁt.5 0.4 0.

solution for each p ≤ n is given by −1 xls = (AT Ap)−1AT y = Rp QT y p p p (p) where • Ap = [a1 · · · ap] ∈ Rm×p is the ﬁrst p columns of A • Ap = QpRp is the QR factorization of Ap • Rp ∈ Rp×p is the leading p × p submatrix of R • Qp = [q1 · · · qp] is the ﬁrst p columns of Q Least-squares applications 6–9 Norm of optimal residual versus p plot of optimal residual versus p shows how well y can be matched by linear combination of a1.. as function of p residual y x1a1 − y minx1 minx1.. ..x7 7 i=1 xi ai −y p 0 1 2 3 4 5 6 7 Least-squares applications 6–10 . . . . ap..

Least-squares system identification

we measure input u(t) and output y(t) for t = 0, ..., N of unknown system

[block diagram: u(t) → unknown system → y(t)]

system identification problem: find reasonable model for system based on measured I/O data u, y

example with scalar u, y (vector u, y readily handled): fit I/O data with moving-average (MA) model with n delays

ŷ(t) = h0u(t) + h1u(t − 1) + · · · + hnu(t − n)

where h0, ..., hn ∈ R

Least-squares applications 6–11

we can write model or predicted output as

[ ŷ(n)   ]   [ u(n)    u(n−1)  · · ·  u(0)   ] [ h0 ]
[ ŷ(n+1) ] = [ u(n+1)  u(n)    · · ·  u(1)   ] [ h1 ]
[  ...   ]   [            ...                ] [ ...]
[ ŷ(N)   ]   [ u(N)    u(N−1)  · · ·  u(N−n) ] [ hn ]

model prediction error is e = (y(n) − ŷ(n), ..., y(N) − ŷ(N))

least-squares identification: choose model (i.e., h) that minimizes norm of model prediction error ‖e‖

... a least-squares problem (with variables h)

Least-squares applications 6–12
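A sketch of MA-model identification as a least-squares problem (function and variable names are illustrative; the sanity check uses data generated exactly by an MA model, so the fit recovers the true coefficients):

```python
import numpy as np

def fit_ma_model(u, y, n):
    """Fit MA model yhat(t) = h0 u(t) + ... + hn u(t-n) by least squares."""
    N = len(u) - 1
    # row for time t is [u(t), u(t-1), ..., u(t-n)]
    A = np.array([u[t - n : t + 1][::-1] for t in range(n, N + 1)])
    h, *_ = np.linalg.lstsq(A, y[n : N + 1], rcond=None)
    return h

# sanity check: data generated exactly by an MA model is recovered
rng = np.random.default_rng(6)
h_true = np.array([0.5, 0.2, -0.1])
u = rng.standard_normal(50)
y = np.zeros(50)
for t in range(2, 50):
    y[t] = h_true @ u[t - 2 : t + 1][::-1]

h = fit_ma_model(u, y, n=2)
assert np.allclose(h, h_true)
```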

487. .Example 4 2 u(t) 0 −2 −4 0 10 20 30 t 40 50 60 70 5 y(t) 0 −5 0 10 20 30 t 40 50 60 70 for n = 7 we obtain MA model with (h0. .418.354. .282.208. .441) with relative prediction error e / y = 0. h7) = (.024. . predicted from model ˆ 10 20 30 t 40 50 60 70 Least-squares applications 6–14 . . . . . .37 Least-squares applications 6–13 5 4 3 2 1 0 −1 −2 −3 −4 0 solid: y(t): actual output dashed: y (t). .243.

6 0.Model order selection question: how large should n be? • obviously the larger n.8 0.2 0. the smaller the prediction error on the data used to form the model • suggests using largest possible model order for smallest prediction error Least-squares applications 6–15 1 relative prediction error e / y 0.9 0.4 0.7 0.1 0 0 5 10 15 20 25 n 30 35 40 45 50 diﬃculty: for n too large the predictive ability of the model on other I/O data (from the same system) becomes worse Least-squares applications 6–16 .3 0.5 0.

8 0.Out of sample validation evaluate model predictive performance on another I/O data set not used to develop model model validation data set: 4 2 u(t) ¯ 0 −2 −4 0 10 20 30 t 40 50 60 70 5 y (t) ¯ 0 −5 0 10 20 30 t 40 50 60 70 Least-squares applications 6–17 now check prediction error of models (developed using modeling data) on validation data: 1 relative prediction error 0.2 modeling data 0 0 5 10 15 20 25 n 30 35 40 45 50 plot suggests n = 10 is a good choice Least-squares applications 6–18 .4 validation data 0.6 0.

for n = 50 the actual and predicted outputs on system identiﬁcation and model validation data are: 5 solid: y(t) dashed: predicted y(t) 0 −5 0 10 20 30 t 40 50 60 70 5 solid: y (t) ¯ dashed: predicted y (t) ¯ 0 −5 0 10 20 30 t 40 50 60 70 loss of predictive ability when n too large is called model overﬁt or overmodeling Least-squares applications 6–19 Growing sets of measurements least-squares problem in ‘row’ form: m minimize Ax − y 2 = i=1 (˜T x − yi)2 ai where aT are the rows of A (˜i ∈ Rn) ˜i a • x ∈ Rn is some vector to be estimated • each pair ai. i..e. yi corresponds to one measurement ˜ • solution is m −1 m xls = i=1 aiaT ˜ ˜i i=1 yiai ˜ • suppose that ai and yi become available sequentially. m increases ˜ with time Least-squares applications 6–20 .

Recursive least-squares

we can compute xls(m) = ( Σ_{i=1}^m ãi ãi^T )^{-1} Σ_{i=1}^m yi ãi recursively

• initialize P(0) = 0 ∈ R^(n×n), q(0) = 0 ∈ R^n
• for m = 0, 1, ...,

P(m + 1) = P(m) + ã_{m+1} ã_{m+1}^T,    q(m + 1) = q(m) + y_{m+1} ã_{m+1}

• if P(m) is invertible, we have xls(m) = P(m)^{-1} q(m)
• P(m) is invertible ⇐⇒ ã1, ..., ãm span R^n (so, once P(m) becomes invertible, it stays invertible)

Least-squares applications 6–21

Fast update for recursive least-squares

we can calculate

P(m + 1)^{-1} = ( P(m) + ã_{m+1} ã_{m+1}^T )^{-1}

efficiently from P(m)^{-1} using the rank one update formula

( P + ã ã^T )^{-1} = P^{-1} − (1/(1 + ã^T P^{-1} ã)) (P^{-1}ã)(P^{-1}ã)^T

valid when P = P^T, and P and P + ã ã^T are both invertible

• gives an O(n^2) method for computing P(m + 1)^{-1} from P(m)^{-1}
• standard methods for computing P(m + 1)^{-1} from P(m + 1) are O(n^3)

Least-squares applications 6–22
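The rank one update formula can be checked directly; a sketch (P = I for the check, since the formula requires P symmetric and invertible):

```python
import numpy as np

def rank_one_inverse_update(P_inv, a):
    """Return (P + a a^T)^{-1} given P^{-1}, in O(n^2) flops."""
    u = P_inv @ a
    return P_inv - np.outer(u, u) / (1.0 + a @ u)

rng = np.random.default_rng(3)
n = 4
P = np.eye(n)                      # symmetric and invertible, as required
a = rng.standard_normal(n)

fast = rank_one_inverse_update(np.linalg.inv(P), a)
slow = np.linalg.inv(P + np.outer(a, a))   # O(n^3) reference computation
assert np.allclose(fast, slow)
```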

Veriﬁcation of rank one update formula (P + aaT ) P −1 − ˜˜ 1 (P −1a)(P −1a)T ˜ ˜ T P −1 a 1+a ˜ ˜ 1 = I + aaT P −1 − ˜˜ P (P −1a)(P −1a)T ˜ ˜ T P −1 a 1+a ˜ ˜ 1 − aaT (P −1a)(P −1a)T ˜˜ ˜ ˜ T P −1 a 1+a ˜ ˜ aT P −1a ˜ ˜ 1 aaT P −1 − ˜˜ aaT P −1 ˜˜ = I + aaT P −1 − ˜˜ T P −1 a T P −1 a 1+a ˜ ˜ 1+a ˜ ˜ = I Least-squares applications 6–23 .

at the expense of making the other larger common example: F = I.EE263 Autumn 2010-11 Stephen Boyd Lecture 7 Regularized least-squares and Gauss-Newton method • multi-objective least-squares • regularized least-squares • nonlinear least-squares • Gauss-Newton method 7–1 Multi-objective least-squares in many problems we have two (or more) objectives • we want J1 = Ax − y • and also J2 = F x − g (x ∈ Rn is the variable) • usually the objectives are competing • we can make one smaller. with small x Regularized least-squares and Gauss-Newton method 7–2 2 small small 2 . g = 0. we want Ax − y small.

but this plot is in R2. J1(x(1)) Regularized least-squares and Gauss-Newton method 7–3 • shaded area shows (J2. J1) achieved by some x ∈ Rn • clear area shows (J2. point labeled x(1) is really J2(x(1)). F x − g 2) three example choices of x: x(1). J1) not achieved by any x ∈ Rn • boundary of region is called optimal trade-oﬀ curve • corresponding x are called Pareto optimal (for the two objectives Ax − y 2.Plot of achievable objective pairs plot (J2. x(3) • x(3) is worse than x(2) on both counts (J2 and J1) • x(1) is better than x(2) in J2. but worse in J1 Regularized least-squares and Gauss-Newton method 7–4 . x(2). J1) for every x: J1 x(1) x(2) x(3) J2 note that x ∈ Rn.

can sweep out entire optimal tradeoﬀ curve Regularized least-squares and Gauss-Newton method 7–6 . correspond to line with slope −µ on (J2. we minimize weighted-sum objective J1 + µJ2 = Ax − y 2 + µ Fx − g 2 • parameter µ ≥ 0 gives relative weight between J1 and J2 • points where weighted sum is constant. J1) plot Regularized least-squares and Gauss-Newton method 7–5 S J1 x(1) x(3) x(2) J1 + µJ2 = α J2 • x(2) minimizes weighted-sum objective for µ shown • by varying µ from 0 to +∞. J1 + µJ2 = α..e. i. x’s on optimal trade-oﬀ curve.Weighted-sum objective • to ﬁnd Pareto optimal points.

Minimizing weighted-sum objective

can express weighted-sum objective as ordinary least-squares objective:

‖Ax − y‖^2 + µ‖Fx − g‖^2 = ‖Ãx − ỹ‖^2

where

Ã = [ A    ]      ỹ = [ y    ]
    [ √µ F ]          [ √µ g ]

hence solution is (assuming Ã full rank)

x = (Ã^T Ã)^{-1} Ã^T ỹ
  = (A^T A + µF^T F)^{-1} (A^T y + µF^T g)

Regularized least-squares and Gauss-Newton method 7–7

Example

• unit mass at rest subject to forces xi for i − 1 < t ≤ i, i = 1, ..., 10
• y ∈ R is position at t = 10, i.e., y = a^T x where a ∈ R^10
• J1 = (y − 1)^2 (final position error squared)
• J2 = ‖x‖^2 (sum of squares of forces)

weighted-sum objective: (a^T x − 1)^2 + µ‖x‖^2

optimal x: x = (aa^T + µI)^{-1} a

Regularized least-squares and Gauss-Newton method 7–8
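The stacking trick is one line of numpy; a sketch with Tychonov regularization (F = I, g = 0), checked against the closed-form solution:

```python
import numpy as np

def weighted_sum_ls(A, y, F, g, mu):
    """Minimize ||Ax - y||^2 + mu ||Fx - g||^2 by stacking into one LS problem."""
    A_til = np.vstack([A, np.sqrt(mu) * F])
    y_til = np.concatenate([y, np.sqrt(mu) * g])
    x, *_ = np.linalg.lstsq(A_til, y_til, rcond=None)
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((15, 5))
y = rng.standard_normal(15)
F, g, mu = np.eye(5), np.zeros(5), 0.1     # Tychonov / ridge regularization

x = weighted_sum_ls(A, y, F, g, mu)
# agrees with the closed form (A^T A + mu F^T F)^{-1} (A^T y + mu F^T g)
x_cf = np.linalg.solve(A.T @ A + mu * F.T @ F, A.T @ y + mu * F.T @ g)
assert np.allclose(x, x_cf)
```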

i.3 0.2 0. is called regularized least-squares (approximate) solution of Ax ≈ y • also called Tychonov regularization • for µ > 0. ) Regularized least-squares and Gauss-Newton method 7–10 .. minimizer of weighted-sum objective. g = 0 the objectives are J1 = Ax − y 2. x = AT A + µI −1 J2 = x 2 AT y.4 0.5 1 1.6 0. rank .5 2 2.8 J1 = (y − 1)2 0. .9 0.7 0.e.5 −3 J2 = x 2 • upper left corner of optimal trade-oﬀ curve corresponds to x = 0 • bottom right corresponds to input that yields y = 1.5 3 x 10 3. J1 = 0 Regularized least-squares and Gauss-Newton method 7–9 Regularized least-squares when F = I. . works for any A (no restrictions on shape.optimal trade-oﬀ curve: 1 0.5 0.1 0 0 0.

• reduces to (linear) least-squares if r(x) = Ax − y Regularized least-squares and Gauss-Newton method 7–12 .estimation/inversion application: • Ax − y is sensor residual • prior information: x small • or. size of x Regularized least-squares and Gauss-Newton method 7–11 Nonlinear least-squares nonlinear least-squares (NLLS) problem: ﬁnd x ∈ Rn that minimizes m r(x) where r : Rn → Rm • r(x) is a vector of ‘residuals’ 2 = i=1 ri(x)2. model only accurate for x small • regularized solution trades oﬀ sensor ﬁt.

. . unknown but assumed small) • NLLS estimate: choose x to minimize ˆ m m ri(x) = i=1 i=1 2 (ρi − x − bi ) 2 Regularized least-squares and Gauss-Newton method 7–13 Gauss-Newton method for NLLS m NLLS: ﬁnd x ∈ R that minimizes r(x) r : Rn → Rm • in general. very hard to solve exactly n 2 = i=1 ri(x)2. where • many good heuristics to compute locally optimal solution Gauss-Newton method: given starting guess for x repeat linearize r near current guess new guess is linear LS solution. .Position estimation from ranges estimate position x ∈ R2 from approximate distances to beacons at locations b1. . using linearized r until convergence Regularized least-squares and Gauss-Newton method 7–14 . bm ∈ R2 without linearizing • we measure ρi = x − bi + vi (vi is range error.

Gauss-Newton method (more detail):

• linearize r near current iterate x(k):

r(x) ≈ r(x(k)) + Dr(x(k))(x − x(k))

where Dr is the Jacobian: (Dr)ij = ∂ri/∂xj

• write linearized approximation as

r(x(k)) + Dr(x(k))(x − x(k)) = A(k)x − b(k)

A(k) = Dr(x(k)),    b(k) = Dr(x(k))x(k) − r(x(k))

• at kth iteration, we approximate NLLS problem by linear LS problem:

‖r(x)‖^2 ≈ ‖A(k)x − b(k)‖^2

Regularized least-squares and Gauss-Newton method 7–15

• next iterate solves this linearized LS problem:

x(k+1) = ( A(k)^T A(k) )^{-1} A(k)^T b(k)

• repeat until convergence (which isn’t guaranteed)

Regularized least-squares and Gauss-Newton method 7–16
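A generic Gauss-Newton sketch, applied to range-only position estimation with noise-free measurements so convergence to the true position can be checked (the beacon positions and starting guess here are made up for the demo):

```python
import numpy as np

def gauss_newton(r, Dr, x0, iters=10):
    """Generic Gauss-Newton: repeatedly solve the linearized LS problem."""
    x = x0
    for _ in range(iters):
        A = Dr(x)                  # A^(k) = Dr(x^(k))
        b = A @ x - r(x)           # b^(k) = Dr(x^(k)) x^(k) - r(x^(k))
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# range-only position estimation, noise-free for the check
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [7.0, 8.0]])
x_true = np.array([2.0, 3.0])
rho = np.linalg.norm(x_true - beacons, axis=1)

r = lambda x: np.linalg.norm(x - beacons, axis=1) - rho
Dr = lambda x: (x - beacons) / np.linalg.norm(x - beacons, axis=1)[:, None]

x_hat = gauss_newton(r, Dr, np.array([1.0, 1.0]))
assert np.allclose(x_hat, x_true, atol=1e-6)
```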

objective would be nice quadratic ‘bowl’ • bumps in objective due to strong nonlinearity of r Regularized least-squares and Gauss-Newton method 7–18 .2).5 5 4 3 2 1 0 −1 −2 −3 −4 −5 −5 −4 −3 −2 −1 0 1 2 3 4 5 Regularized least-squares and Gauss-Newton method 7–17 NLLS objective r(x) 2 versus x: 16 14 12 10 8 6 4 2 5 0 5 0 −5 −5 0 • for a linear LS problem. −1. ♦ initial guess (1.6. 3.2.Gauss-Newton example • 10 beacons • + true position (−3.2) • range estimates accurate to ±0.

3) ˆ • estimation error is x − x = 0. 3.3.objective of Gauss-Newton iterates: 12 10 8 r(x) 2 6 4 2 0 1 2 3 4 5 6 7 8 9 10 iteration • x(k) converges to (in this case.31 ˆ (substantially smaller than range accuracy!) Regularized least-squares and Gauss-Newton method 7–20 . global) minimum of r(x) • convergence takes only ﬁve or so steps Regularized least-squares and Gauss-Newton method 7–19 2 • ﬁnal estimate is x = (−3.

convergence of Gauss-Newton iterates:

[figure: Gauss-Newton iterates overlaid on the beacon/position plot]

Regularized least-squares and Gauss-Newton method 7–21

useful variation on Gauss-Newton: add regularization term

‖A(k)x − b(k)‖^2 + µ‖x − x(k)‖^2

so that next iterate is not too far from previous one (hence, linearized model still pretty accurate)

Regularized least-squares and Gauss-Newton method 7–22

EE263 Autumn 2010-11

Stephen Boyd

Lecture 8
Least-norm solutions of underdetermined equations

• least-norm solution of underdetermined equations
• minimum norm solutions via QR factorization
• derivation via Lagrange multipliers
• relation to regularized least-squares
• general norm minimization with equality constraints

8–1

Underdetermined linear equations

we consider y = Ax where A ∈ R^(m×n) is fat (m < n), i.e.,

• there are more variables than equations
• x is underspecified, i.e., many choices of x lead to the same y

we’ll assume that A is full rank (m), so for each y ∈ R^m, there is a solution

set of all solutions has form

{ x | Ax = y } = { xp + z | z ∈ N(A) }

where xp is any (‘particular’) solution, i.e., Axp = y

Least-norm solutions of underdetermined equations 8–2

• z characterizes available choices in solution
• solution has dim N(A) = n − m ‘degrees of freedom’
• can choose z to satisfy other specs or optimize among solutions

Least-norm solutions of underdetermined equations 8–3

Least-norm solution

one particular solution is

xln = A^T (AA^T)^{-1} y

(AA^T is invertible since A full rank)

in fact, xln is the solution of y = Ax that minimizes ‖x‖

i.e., xln is solution of optimization problem

minimize ‖x‖
subject to Ax = y

(with variable x ∈ R^n)

Least-norm solutions of underdetermined equations 8–4
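A sketch of the least-norm formula, plus a check that adding any z ∈ N(A) can only increase the norm (random data, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 8))    # fat, full rank (with probability 1)
y = rng.standard_normal(3)

x_ln = A.T @ np.linalg.solve(A @ A.T, y)    # x_ln = A^T (A A^T)^{-1} y
assert np.allclose(A @ x_ln, y)             # it solves Ax = y

# any other solution x_ln + z with z in N(A) has norm >= ||x_ln||
z = rng.standard_normal(8)
z = z - A.T @ np.linalg.solve(A @ A.T, A @ z)   # project z onto N(A)
assert np.allclose(A @ z, 0.0)
assert np.linalg.norm(x_ln + z) >= np.linalg.norm(x_ln)
```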

.suppose Ax = y. so A(x − xln) = 0 and (x − xln)T xln = (x − xln)T AT (AAT )−1y = (A(x − xln)) (AAT )−1y = 0 i.e.. (x − xln) ⊥ xln.e. xln has smallest norm of any solution Least-norm solutions of undetermined equations 8–5 { x | Ax = y } N (A) = { x | Ax = 0 } xln • orthogonality condition: xln ⊥ N (A) • projection interpretation: xln is projection of 0 on solution set { x | Ax = y } Least-norm solutions of undetermined equations 8–6 . so x 2 T = xln + x − xln 2 = xln 2 + x − xln 2 ≥ xln 2 i.

i. with • Q ∈ Rn×m.. AT = QR. skinny matrix A: • A† = (AT A)−1AT • (AT A)−1AT is a left inverse of A • A(AT A)−1AT gives projection onto R(A) Least-norm solutions of undetermined equations 8–7 Least-norm solution via QR factorization ﬁnd QR factorization of AT . nonsingular then • xln = AT (AAT )−1y = QR−T y • xln = R−T y Least-norm solutions of undetermined equations 8–8 . analogous formulas for full rank. QT Q = Im • R ∈ Rm×m upper triangular. fat A • AT (AAT )−1 is a right inverse of A • I − AT (AAT )−1A gives projection onto N (A) cf.• A† = AT (AAT )−1 is called the pseudo-inverse of full rank.e.

0) Least-norm solutions of undetermined equations 8–10 .. • from ﬁrst condition. . y = (1. i = 1. λ) = xT x + λT (Ax − y) • optimality conditions are ∇xL = 2x + AT λ = 0. x = −AT λ/2 • substitute into second to get λ = −2(AAT )−1y • hence x = AT (AAT )−1y Least-norm solutions of undetermined equations 8–9 ∇λL = Ax − y = 0 Example: transferring mass unit distance f • unit mass at rest subject to forces xi for i − 1 < t ≤ i.Derivation via Lagrange multipliers • least-norm solution solves optimization problem minimize xT x subject to Ax = y • introduce Lagrange multipliers: L(x. . 10 • y1 is position at t = 10.e. y2 is velocity at t = 10 • y = Ax where A ∈ R2×10 (A is fat) • ﬁnd least norm force that transfers mass unit distance with zero ﬁnal velocity. . i. .
[plots: least-norm force x_ln, and the resulting position and velocity, versus t]

Relation to regularized least-squares

• suppose A ∈ R^{m×n} is fat, full rank
• define J_1 = ‖Ax − y‖², J_2 = ‖x‖²
• least-norm solution minimizes J_2 with J_1 = 0
• minimizer of weighted-sum objective J_1 + µJ_2 = ‖Ax − y‖² + µ‖x‖² is

    x_µ = (A^T A + µI)^{−1} A^T y

• fact: x_µ → x_ln as µ → 0, i.e., regularized solution converges to least-norm solution as µ → 0
• in matrix terms: as µ → 0,

    (A^T A + µI)^{−1} A^T → A^T (A A^T)^{−1}

(for full rank, fat A)

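The limit x_µ → x_ln can be observed numerically. A sketch (assuming NumPy; random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 7))
y = rng.standard_normal(3)

x_ln = A.T @ np.linalg.solve(A @ A.T, y)

def x_mu(mu):
    # regularized least-squares solution (A^T A + mu I)^{-1} A^T y
    return np.linalg.solve(A.T @ A + mu * np.eye(7), A.T @ y)

# the gap to the least-norm solution shrinks monotonically as mu -> 0
gaps = [np.linalg.norm(x_mu(mu) - x_ln) for mu in (1.0, 1e-3, 1e-6)]
assert gaps[0] > gaps[1] > gaps[2]
assert np.allclose(x_mu(1e-8), x_ln, atol=1e-4)
```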
General norm minimization with equality constraints

consider problem: minimize ‖Ax − b‖ subject to Cx = d, with variable x

• includes least-squares and least-norm problems as special cases
• equivalent to: minimize (1/2)‖Ax − b‖² subject to Cx = d
• Lagrangian is

    L(x, λ) = (1/2)‖Ax − b‖² + λ^T (Cx − d)
            = (1/2) x^T A^T A x − b^T A x + (1/2) b^T b + λ^T C x − λ^T d

• optimality conditions are ∇_x L = A^T A x − A^T b + C^T λ = 0, ∇_λ L = Cx − d = 0
• write in block matrix form as

    [ A^T A  C^T ] [ x ]   [ A^T b ]
    [ C      0   ] [ λ ] = [ d     ]

• if the block matrix is invertible, we have

    [ x ]   [ A^T A  C^T ]^{−1} [ A^T b ]
    [ λ ] = [ C      0   ]      [ d     ]
if A^T A is invertible, we can derive a more explicit (and complicated) formula for x

• from first block equation we get x = (A^T A)^{−1}(A^T b − C^T λ)
• substitute into Cx = d to get C(A^T A)^{−1}(A^T b − C^T λ) = d, so

    λ = ( C(A^T A)^{−1}C^T )^{−1} ( C(A^T A)^{−1}A^T b − d )

• recover x from equation above (not pretty):

    x = (A^T A)^{−1} ( A^T b − C^T ( C(A^T A)^{−1}C^T )^{−1} ( C(A^T A)^{−1}A^T b − d ) )

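The block (KKT) system and the explicit formula can be checked against each other. A sketch (NumPy, random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4)); b = rng.standard_normal(10)
C = rng.standard_normal((2, 4));  d = rng.standard_normal(2)

# block system [A^T A, C^T; C, 0] [x; lam] = [A^T b; d]
K = np.block([[A.T @ A, C.T], [C, np.zeros((2, 2))]])
rhs = np.concatenate([A.T @ b, d])
x, lam = np.split(np.linalg.solve(K, rhs), [4])

assert np.allclose(C @ x, d)     # solution is feasible

# compare with the explicit (and complicated) formula
AtAinv = np.linalg.inv(A.T @ A)
lam_expl = np.linalg.solve(C @ AtAinv @ C.T, C @ AtAinv @ A.T @ b - d)
x_expl = AtAinv @ (A.T @ b - C.T @ lam_expl)
assert np.allclose(x, x_expl)
```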
EE263 Autumn 2010-11  Stephen Boyd

Lecture 9
Autonomous linear dynamical systems

• autonomous linear dynamical systems
• examples
• higher order systems
• linearization near equilibrium point
• linearization along trajectory

Autonomous linear dynamical systems

continuous-time autonomous LDS has form

    ẋ = Ax

• x(t) ∈ R^n is called the state
• n is the state dimension or (informally) the number of states
• A is the dynamics matrix (system is time-invariant if A doesn't depend on t)
picture (phase plane): the vector field ẋ(t) = Ax(t) in the (x_1, x_2) plane, with a trajectory x(t)

example 1:

    ẋ = [ −1  0 ; 2  1 ] x

[phase plane plot]
example 2:

    ẋ = [ −0.5  1 ; −1  0.5 ] x

[phase plane plot]

Block diagram

block diagram representation of ẋ = Ax: an n-wide integrator block 1/s with output x(t), fed back through A to form ẋ(t)

• 1/s block represents n parallel scalar integrators
• coupling comes from dynamics matrix A
useful when A has structure, e.g., block upper triangular:

    ẋ = [ A11  A12 ; 0  A22 ] x

here x_1 doesn't affect x_2 at all

Linear circuit

capacitor voltages v_c ∈ R^p and inductor currents i_l ∈ R^r, connected by a linear static circuit; circuit equations are

    C dv_c/dt = i_c,   L di_l/dt = v_l,   [ i_c ; v_l ] = F [ v_c ; i_l ]

where C = diag(C_1, ..., C_p), L = diag(L_1, ..., L_r)
with state x = [ v_c ; i_l ], we have

    ẋ = [ C^{−1}  0 ; 0  L^{−1} ] F x

Chemical reactions

• reaction involving n chemicals; x_i is concentration of chemical i
• linear model of reaction kinetics:

    dx_i/dt = a_{i1} x_1 + · · · + a_{in} x_n

• good model for some reactions; A is usually sparse
Example: series reaction A →(k_1) B →(k_2) C with linear dynamics

    ẋ = [ −k_1  0  0 ; k_1  −k_2  0 ; 0  k_2  0 ] x

plot for k_1 = k_2 = 1, initial x(0) = (1, 0, 0)  [plot of x_1, x_2, x_3 versus t]

Finite-state discrete-time Markov chain

z(t) ∈ {1, ..., n} is a random sequence with

    Prob( z(t+1) = i | z(t) = j ) = P_{ij}

where P ∈ R^{n×n} is the matrix of transition probabilities

can represent probability distribution of z(t) as n-vector

    p(t) = [ Prob(z(t) = 1) ; · · · ; Prob(z(t) = n) ]

(so, e.g., Prob(z(t) = 1, 2, or 3) = [1 1 1 0 · · · 0] p(t))

then we have p(t + 1) = P p(t)
P is often sparse; Markov chain is depicted graphically

• nodes are states
• edges show transition probabilities

example:

• state 1 is ‘system OK’
• state 2 is ‘system down’
• state 3 is ‘system being repaired’

    p(t + 1) = [ 0.9  0.7  1.0 ; 0.1  0.1  0 ; 0  0.2  0 ] p(t)

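Propagating the distribution is just repeated multiplication by P. A sketch using the three-state transition matrix as reconstructed above (NumPy; in this convention P_{ij} = Prob(z(t+1)=i | z(t)=j), so columns sum to one):

```python
import numpy as np

P = np.array([[0.9, 0.7, 1.0],
              [0.1, 0.1, 0.0],
              [0.0, 0.2, 0.0]])
assert np.allclose(P.sum(axis=0), 1.0)   # each column is a probability distribution

p = np.array([1.0, 0.0, 0.0])            # start in state 1 ('system OK')
for _ in range(50):
    p = P @ p                            # p(t+1) = P p(t)

assert np.isclose(p.sum(), 1.0)          # remains a probability distribution
assert p[0] > p[1] > p[2]                # mostly OK, occasionally down or in repair
```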
Numerical integration of continuous system

compute approximate solution of ẋ = Ax, x(0) = x_0

suppose h is a small time step (x doesn't change much in h seconds)

simple (‘forward Euler’) approximation:

    x(t + h) ≈ x(t) + h ẋ(t) = (I + hA) x(t)

by carrying out this recursion (a discrete-time LDS), starting at x(0) = x_0, we get the approximation

    x(kh) ≈ (I + hA)^k x(0)

(forward Euler is never used in practice)

Higher order linear dynamical systems

    x^{(k)} = A_{k−1} x^{(k−1)} + · · · + A_1 x^{(1)} + A_0 x,    x(t) ∈ R^n

where x^{(m)} denotes the mth derivative

so define new variable z = [ x ; x^{(1)} ; · · · ; x^{(k−1)} ] ∈ R^{nk}; then

    ż = [ 0  I  0  · · ·  0 ; 0  0  I  · · ·  0 ; · · · ; A_0  A_1  A_2  · · ·  A_{k−1} ] z

a (first order) LDS (with bigger state)

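The first-order accuracy of forward Euler can be seen directly. A sketch (NumPy; the dynamics matrix is made up for illustration, and `expm` is a naive truncated power series, fine for these small matrices):

```python
import numpy as np

def expm(M, terms=40):
    # naive truncated power series for e^M; adequate for small ||M||
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-1.0, -0.3]])
x0 = np.array([1.0, 0.0])
t = 1.0
x_exact = expm(t * A) @ x0

errs = []
for h in (0.1, 0.01, 0.001):
    steps = int(round(t / h))
    x_euler = np.linalg.matrix_power(np.eye(2) + h * A, steps) @ x0
    errs.append(np.linalg.norm(x_euler - x_exact))

# error shrinks as the step size h shrinks (roughly linearly in h)
assert errs[0] > errs[1] > errs[2]
```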
block diagram: a chain of k integrators 1/s taking x^{(k)} down to x, with feedback gains A_{k−1}, ..., A_0

Mechanical systems

mechanical system with k degrees of freedom undergoing small motions:

    M q̈ + D q̇ + K q = 0

• q(t) ∈ R^k is the vector of generalized displacements
• M is the mass matrix
• K is the stiffness matrix
• D is the damping matrix

with state x = [ q ; q̇ ] we have

    ẋ = [ q̇ ; q̈ ] = [ 0  I ; −M^{−1}K  −M^{−1}D ] x
Linearization near equilibrium point

nonlinear, time-invariant differential equation (DE):

    ẋ = f(x)

where f : R^n → R^n

suppose x_e is an equilibrium point, i.e., f(x_e) = 0 (so x(t) = x_e satisfies the DE)

now suppose x(t) is near x_e, so

    ẋ(t) = f(x(t)) ≈ f(x_e) + Df(x_e)(x(t) − x_e)

with δx(t) = x(t) − x_e, rewrite as

    δẋ(t) ≈ Df(x_e) δx(t)

replacing ≈ with = yields the linearized approximation of the DE near x_e

we hope the solution of δẋ = Df(x_e) δx is a good approximation of x − x_e (more later)
example: pendulum (mass m on a massless rod of length l)

2nd order nonlinear DE:

    m l² θ̈ = −l m g sin θ

rewrite as first order DE with state x = [ θ ; θ̇ ]:

    ẋ = [ x_2 ; −(g/l) sin x_1 ]

equilibrium point (pendulum down): x_e = 0

linearized system near x_e = 0:

    δẋ = [ 0  1 ; −g/l  0 ] δx

Does linearization ‘work’?

the linearized system usually, but not always, gives a good idea of the system behavior near x_e

example 1: ẋ = −x³ near x_e = 0

for x(0) > 0 solutions have form x(t) = ( x(0)^{−2} + 2t )^{−1/2}

linearized system is δẋ = 0; solutions are constant

example 2: ż = z³ near z_e = 0

for z(0) > 0 solutions have form z(t) = ( z(0)^{−2} − 2t )^{−1/2} (finite escape time at t = z(0)^{−2}/2)

linearized system is δż = 0; solutions are constant

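For the pendulum, the linearized dynamics matrix has purely imaginary eigenvalues, so small swings oscillate at the familiar frequency sqrt(g/l). A quick check (NumPy; parameter values chosen for illustration):

```python
import numpy as np

g, l = 9.8, 1.0
A = np.array([[0.0, 1.0], [-g / l, 0.0]])   # linearized pendulum near theta = 0
lam = np.linalg.eigvals(A)

# eigenvalues are +/- j*sqrt(g/l): pure oscillation at sqrt(g/l) rad/s
assert np.allclose(lam.real, 0.0, atol=1e-9)
assert np.allclose(sorted(lam.imag), [-np.sqrt(g / l), np.sqrt(g / l)])
```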
[plot: x(t) and z(t) versus t, with x(t) decaying slowly, z(t) escaping in finite time, and δx(t) = δz(t) constant]

• systems with very different behavior have same linearized system
• linearized systems do not predict qualitative behavior of either system
Linearization along trajectory

• suppose x_traj : R_+ → R^n satisfies ẋ_traj(t) = f(x_traj(t), t)
• suppose x(t) is another trajectory, i.e., ẋ(t) = f(x(t), t), and is near x_traj(t)
• then

    d/dt (x − x_traj) = f(x, t) − f(x_traj, t) ≈ D_x f(x_traj, t)(x − x_traj)

• (time-varying) LDS

    δẋ = D_x f(x_traj, t) δx

is called linearized or variational system along trajectory x_traj

example: linearized oscillator

suppose x_traj(t) is a T-periodic solution of nonlinear DE:

    ẋ_traj(t) = f(x_traj(t)),    x_traj(t + T) = x_traj(t)

linearized system is δẋ = A(t) δx, where A(t) = Df(x_traj(t))

A(t) is T-periodic, so linearized system is called a T-periodic linear system; used to study:

• startup dynamics of clock and oscillator circuits
• effects of power supply and other disturbances on clock behavior
EE263 Autumn 2010-11  Stephen Boyd

Lecture 10
Solution via Laplace transform and matrix exponential

• Laplace transform
• solving ẋ = Ax via Laplace transform
• state transition matrix
• matrix exponential
• qualitative behavior and stability

Laplace transform of matrix valued function

suppose z : R_+ → R^{p×q}

Laplace transform: Z = L(z), where Z : D ⊆ C → C^{p×q} is defined by

    Z(s) = ∫_0^∞ e^{−st} z(t) dt

• integral of matrix is done term-by-term
• convention: upper case denotes Laplace transform
• D is the domain or region of convergence of Z
• D includes at least {s | ℜs > a}, where a satisfies |z_{ij}(t)| ≤ α e^{at} for t ≥ 0, i = 1, ..., p, j = 1, ..., q
Derivative property

    L(ż) = sZ(s) − z(0)

to derive, integrate by parts:

    L(ż)(s) = ∫_0^∞ e^{−st} ż(t) dt = [ e^{−st} z(t) ]_{t=0}^{t→∞} + s ∫_0^∞ e^{−st} z(t) dt = sZ(s) − z(0)

Laplace transform solution of ẋ = Ax

consider continuous-time time-invariant (TI) LDS ẋ = Ax for t ≥ 0, where x(t) ∈ R^n

• take Laplace transform: sX(s) − x(0) = AX(s)
• rewrite as (sI − A)X(s) = x(0)
• hence X(s) = (sI − A)^{−1} x(0)
• take inverse transform: x(t) = L^{−1}( (sI − A)^{−1} ) x(0)
Resolvent and state transition matrix

• (sI − A)^{−1} is called the resolvent of A
• resolvent defined for s ∈ C except eigenvalues of A, i.e., s such that det(sI − A) = 0
• Φ(t) = L^{−1}( (sI − A)^{−1} ) is called the state-transition matrix; it maps the initial state to the state at time t:

    x(t) = Φ(t) x(0)

(in particular, state x(t) is a linear function of initial state x(0))

Example 1: Harmonic oscillator

    ẋ = [ 0  1 ; −1  0 ] x

[phase plane plot: circular trajectories]
sI − A = [ s  −1 ; 1  s ], so resolvent is

    (sI − A)^{−1} = [ s/(s²+1)  1/(s²+1) ; −1/(s²+1)  s/(s²+1) ]

(eigenvalues are ±i)

state transition matrix is

    Φ(t) = L^{−1} [ s/(s²+1)  1/(s²+1) ; −1/(s²+1)  s/(s²+1) ] = [ cos t  sin t ; −sin t  cos t ]

a rotation matrix (−t radians), so we have

    x(t) = [ cos t  sin t ; −sin t  cos t ] x(0)

Example 2: Double integrator

    ẋ = [ 0  1 ; 0  0 ] x

[phase plane plot: straight-line trajectories]

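The rotation-matrix form of the harmonic oscillator's state-transition matrix can be verified against the power series for e^{tA}. A sketch (NumPy; `expm` is a naive truncated series, fine here):

```python
import numpy as np

def expm(M, terms=40):
    # naive truncated power series for e^M; adequate for small ||M||
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t = 0.7
Phi = expm(t * A)
R = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
assert np.allclose(Phi, R)   # e^{tA} is rotation by -t radians
```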
sI − A = [ s  −1 ; 0  s ], so resolvent is

    (sI − A)^{−1} = [ 1/s  1/s² ; 0  1/s ]

(eigenvalues are 0, 0)

state transition matrix is

    Φ(t) = L^{−1} [ 1/s  1/s² ; 0  1/s ] = [ 1  t ; 0  1 ]

so we have

    x(t) = [ 1  t ; 0  1 ] x(0)

Characteristic polynomial

X(s) = det(sI − A) is called the characteristic polynomial of A

• X(s) is a polynomial of degree n, with leading (i.e., s^n) coefficient one
• roots of X are the eigenvalues of A
• X has real coefficients, so eigenvalues are either real or occur in conjugate pairs
• there are n eigenvalues (if we count multiplicity as roots of X)
Eigenvalues of A and poles of resolvent

i, j entry of resolvent can be expressed via Cramer's rule as

    (−1)^{i+j} det Δ_{ij} / det(sI − A)

where Δ_{ij} is sI − A with jth row and ith column deleted

• det Δ_{ij} is a polynomial of degree less than n, so i, j entry of resolvent has form f_{ij}(s)/X(s) where f_{ij} is a polynomial with degree less than n
• poles of entries of resolvent must be eigenvalues of A
• but not all eigenvalues of A show up as poles of each entry (when there are cancellations between det Δ_{ij} and X(s))

Matrix exponential

    (I − C)^{−1} = I + C + C² + C³ + · · ·   (if series converges)

• series expansion of resolvent:

    (sI − A)^{−1} = (1/s)(I − A/s)^{−1} = I/s + A/s² + A²/s³ + · · ·

(valid for |s| large enough), so

    Φ(t) = L^{−1}( (sI − A)^{−1} ) = I + tA + (tA)²/2! + · · ·
• looks like ordinary power series

    e^{at} = 1 + ta + (ta)²/2! + · · ·

with square matrices instead of scalars . . .

• define matrix exponential as

    e^M = I + M + M²/2! + · · ·

for M ∈ R^{n×n} (which in fact converges for all M)

• with this definition, state-transition matrix is

    Φ(t) = L^{−1}( (sI − A)^{−1} ) = e^{tA}

Matrix exponential solution of autonomous LDS

solution of ẋ = Ax, with A ∈ R^{n×n} and constant, is

    x(t) = e^{tA} x(0)

generalizes scalar case: solution of ẋ = ax, with a ∈ R and constant, is x(t) = e^{ta} x(0)
• matrix exponential is meant to look like scalar exponential
• some things you'd guess hold for the matrix exponential (by analogy with the scalar exponential) do in fact hold
• but many things you'd guess are wrong

example: you might guess that e^{A+B} = e^A e^B, but it's false (in general)

    A = [ 0  1 ; −1  0 ],   B = [ 0  1 ; 0  0 ]

    e^A = [ 0.54  0.84 ; −0.84  0.54 ],   e^B = [ 1  1 ; 0  1 ]

    e^{A+B} = [ 0.16  1.40 ; −0.70  0.16 ],   e^A e^B = [ 0.54  1.38 ; −0.84  −0.30 ]

however, we do have e^{A+B} = e^A e^B if AB = BA, i.e., A and B commute

thus for t, s ∈ R, e^{tA + sA} = e^{tA} e^{sA}

with s = −t we get

    e^{tA} e^{−tA} = e^{tA − tA} = e^0 = I

so e^{tA} is nonsingular, with inverse ( e^{tA} )^{−1} = e^{−tA}

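Both claims (failure in general, equality for commuting matrices) are easy to check numerically. A sketch using the A and B from the example above (NumPy; `expm` is a naive truncated series):

```python
import numpy as np

def expm(M, terms=40):
    # naive truncated power series for e^M; adequate for small ||M||
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0, 1.0], [0.0, 0.0]])

assert not np.allclose(expm(A + B), expm(A) @ expm(B))   # A, B do not commute

B2 = 2 * A                                               # commutes with A
assert np.allclose(expm(A + B2), expm(A) @ expm(B2))

# e^{tA} is always invertible, with inverse e^{-tA}
assert np.allclose(expm(A) @ expm(-A), np.eye(2))
```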
example: let's find e^A, where A = [ 0  1 ; 0  0 ]

we already found

    e^{tA} = L^{−1}( (sI − A)^{−1} ) = [ 1  t ; 0  1 ]

so, plugging in t = 1, we get e^A = [ 1  1 ; 0  1 ]

let's check power series:

    e^A = I + A + A²/2! + · · · = I + A

since A² = A³ = · · · = 0

Time transfer property

for ẋ = Ax we know

    x(t) = Φ(t) x(0) = e^{tA} x(0)

interpretation: the matrix e^{tA} propagates initial condition into state at time t

more generally we have, for any t and τ,

    x(τ + t) = e^{tA} x(τ)

(to see this, apply result above to z(t) = x(t + τ))

interpretation: the matrix e^{tA} propagates state t seconds forward in time (backward if t < 0)
• recall first order (forward Euler) approximate state update, for small t:

    x(τ + t) ≈ x(τ) + t ẋ(τ) = (I + tA) x(τ)

• exact solution is

    x(τ + t) = e^{tA} x(τ) = (I + tA + (tA)²/2! + · · ·) x(τ)

• forward Euler is just first two terms in series

Sampling a continuous-time system

suppose ẋ = Ax

sample x at times t_1 ≤ t_2 ≤ · · ·: define z(k) = x(t_k)

then z(k + 1) = e^{(t_{k+1} − t_k)A} z(k)

for uniform sampling t_{k+1} − t_k = h, we have

    z(k + 1) = e^{hA} z(k)

a discrete-time LDS (called discretized version of continuous-time system)

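Discretization by sampling is exact: iterating z(k+1) = e^{hA} z(k) reproduces the continuous solution at the sample times. A sketch (NumPy; the dynamics matrix is made up for illustration):

```python
import numpy as np

def expm(M, terms=40):
    # naive truncated power series for e^M; adequate for small ||M||
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[-0.2, 1.0], [-1.0, -0.2]])
h = 0.5
Ad = expm(h * A)                 # discretized dynamics: z(k+1) = Ad z(k)

x0 = np.array([1.0, 0.0])
z = x0.copy()
ok = True
for k in range(1, 5):
    z = Ad @ z
    ok = ok and np.allclose(z, expm(k * h * A) @ x0)   # matches x(kh) exactly
assert ok
```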
Piecewise constant system

consider time-varying LDS ẋ = A(t)x, with

    A(t) = A_0 for 0 ≤ t < t_1,   A_1 for t_1 ≤ t < t_2,   . . .

where 0 < t_1 < t_2 < · · · (sometimes called a jump linear system)

for t ∈ [t_i, t_{i+1}] we have

    x(t) = e^{(t − t_i)A_i} · · · e^{(t_3 − t_2)A_2} e^{(t_2 − t_1)A_1} e^{t_1 A_0} x(0)

(matrix on righthand side is called state transition matrix for system, and denoted Φ(t))

Qualitative behavior of x(t)

suppose ẋ = Ax, x(t) ∈ R^n

then x(t) = e^{tA} x(0), X(s) = (sI − A)^{−1} x(0)

ith component X_i(s) has form

    X_i(s) = a_i(s) / X(s)

where a_i is a polynomial of degree < n, and X(s) = det(sI − A) is the characteristic polynomial

thus the poles of X_i are all eigenvalues of A (but not necessarily the other way around)
first assume eigenvalues λ_i are distinct, so X_i(s) cannot have repeated poles

then x_i(t) has form

    x_i(t) = Σ_{j=1}^n β_{ij} e^{λ_j t}

where β_{ij} depend on x(0) (linearly)

eigenvalues determine (possible) qualitative behavior of x:

• eigenvalues give exponents that can occur in exponentials
• real eigenvalue λ corresponds to an exponentially decaying or growing term e^{λt} in solution
• complex eigenvalue λ = σ + jω corresponds to decaying or growing sinusoidal term e^{σt} cos(ωt + φ) in solution

• ℜλ_j gives exponential growth rate (if > 0), or exponential decay rate (if < 0) of term
• ℑλ_j gives frequency of oscillatory term (if ≠ 0)

[picture: eigenvalues plotted in the complex s-plane]
now suppose A has repeated eigenvalues, so X_i can have repeated poles

express eigenvalues as λ_1, ..., λ_r (distinct) with multiplicities n_1, ..., n_r, respectively (n_1 + · · · + n_r = n)

then x_i(t) has form

    x_i(t) = Σ_{j=1}^r p_{ij}(t) e^{λ_j t}

where p_{ij}(t) is a polynomial of degree < n_j (that depends linearly on x(0))

Stability

we say system ẋ = Ax is stable if e^{tA} → 0 as t → ∞

meaning:

• state x(t) converges to 0, as t → ∞, no matter what x(0) is
• all trajectories of ẋ = Ax converge to 0 as t → ∞

fact: ẋ = Ax is stable if and only if all eigenvalues of A have negative real part:

    ℜλ_i < 0,   i = 1, ..., n
the ‘if’ part is clear since

    lim_{t→∞} p(t) e^{λt} = 0

for any polynomial p, if ℜλ < 0

we'll see the ‘only if’ part next lecture

more generally, max_i ℜλ_i determines the maximum asymptotic logarithmic growth rate of x(t) (or decay, if < 0)

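The eigenvalue test for stability can be exercised numerically: e^{tA} decays exactly when all eigenvalues have negative real part. A sketch (NumPy; the two matrices are made up for illustration):

```python
import numpy as np

def expm(M, terms=40):
    # naive truncated power series for e^M; adequate for small ||M||
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A_stable = np.array([[-1.0, 2.0], [0.0, -0.5]])    # eigenvalues -1, -0.5
A_unstable = np.array([[0.1, 0.0], [1.0, -3.0]])   # eigenvalue 0.1 > 0

assert max(np.linalg.eigvals(A_stable).real) < 0
assert max(np.linalg.eigvals(A_unstable).real) > 0

# propagate 10 seconds as (e^A)^10 to keep the series well-conditioned
E = expm(A_stable)
E_u = expm(A_unstable)
assert np.linalg.norm(np.linalg.matrix_power(E, 10)) < 0.1    # decays
assert np.linalg.norm(np.linalg.matrix_power(E_u, 10)) > 1.0  # grows
```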
EE263 Autumn 2010-11  Stephen Boyd

Lecture 11
Eigenvectors and diagonalization

• eigenvectors
• dynamic interpretation: invariant sets
• complex eigenvectors & invariant planes
• left eigenvectors
• diagonalization
• modal form
• discrete-time stability

Eigenvectors and eigenvalues

λ ∈ C is an eigenvalue of A ∈ C^{n×n} if

    X(λ) = det(λI − A) = 0

equivalent to:

• there exists nonzero v ∈ C^n s.t. (λI − A)v = 0, i.e., Av = λv; any such v is called an eigenvector of A (associated with eigenvalue λ)
• there exists nonzero w ∈ C^n s.t. w^T(λI − A) = 0, i.e., w^T A = λ w^T; any such w is called a left eigenvector of A
• if v is an eigenvector of A with eigenvalue λ, then so is αv, for any α ∈ C, α ≠ 0
• even when A is real, eigenvalue λ and eigenvector v can be complex
• when A and λ are real, we can always find a real eigenvector v associated with λ: if Av = λv, with A ∈ R^{n×n}, λ ∈ R, and v ∈ C^n, then

    A ℜv = λ ℜv,   A ℑv = λ ℑv

so ℜv and ℑv are real eigenvectors, if they are nonzero (and at least one is)
• conjugate symmetry: if A is real and v ∈ C^n is an eigenvector associated with λ ∈ C, then the conjugate of v is an eigenvector associated with the conjugate of λ (take conjugates of Av = λv)

we'll assume A is real from now on . . .

Scaling interpretation

(assume λ ∈ R for now; we'll consider λ ∈ C later)

if v is an eigenvector, effect of A on v is very simple: scaling by λ

[picture: a generic x and Ax, versus an eigenvector v and Av on the same line]

(what is λ here?)
• λ ∈ R, λ > 0: v and Av point in same direction
• λ ∈ R, λ < 0: v and Av point in opposite directions
• λ ∈ R, |λ| < 1: Av smaller than v
• λ ∈ R, |λ| > 1: Av larger than v

(we'll see later how this relates to stability of continuous- and discrete-time systems . . . )

Dynamic interpretation

suppose Av = λv, v ≠ 0

if ẋ = Ax and x(0) = v, then x(t) = e^{λt} v

several ways to see this, e.g.,

    x(t) = e^{tA} v = (I + tA + (tA)²/2! + · · ·) v
         = v + λt v + (λt)²/2! v + · · ·
         = e^{λt} v

(since (tA)^k v = (λt)^k v)
• for λ ∈ C, solution is complex (we'll interpret later); for now, assume λ ∈ R
• if initial state is an eigenvector v, resulting motion is very simple: always on the line spanned by v
• solution x(t) = e^{λt} v is called a mode of system ẋ = Ax (associated with eigenvalue λ)
• for λ ∈ R, λ < 0, mode contracts or shrinks as t ↑
• for λ ∈ R, λ > 0, mode expands or grows as t ↑

Invariant sets

a set S ⊆ R^n is invariant under ẋ = Ax if whenever x(t) ∈ S, then x(τ) ∈ S for all τ ≥ t

i.e.: once trajectory enters S, it stays in S

vector field interpretation: trajectories only cut into S, never out
suppose Av = λv, v ≠ 0, λ ∈ R

• line { tv | t ∈ R } is invariant (in fact, ray { tv | t > 0 } is invariant)
• if λ < 0, line segment { tv | 0 ≤ t ≤ a } is invariant

Complex eigenvectors

suppose Av = λv, v ≠ 0, λ is complex

for a ∈ C, (complex) trajectory a e^{λt} v satisfies ẋ = Ax

hence so does (real) trajectory

    x(t) = ℜ( a e^{λt} v ) = e^{σt} [ v_re  v_im ] [ cos ωt  sin ωt ; −sin ωt  cos ωt ] [ α ; −β ]

where v = v_re + j v_im, λ = σ + jω, a = α + jβ

• σ gives logarithmic growth/decay factor
• trajectory stays in invariant plane span{ v_re, v_im }
• ω gives angular velocity of rotation in plane
Dynamic interpretation: left eigenvectors

suppose w^T A = λ w^T, w ≠ 0

then

    d/dt (w^T x) = w^T ẋ = w^T A x = λ (w^T x)

i.e., w^T x satisfies the DE d(w^T x)/dt = λ(w^T x), hence

    w^T x(t) = e^{λt} w^T x(0)

• even if trajectory x is complicated, w^T x is simple
• for λ = σ + jω ∈ C, (ℜw)^T x and (ℑw)^T x both have form

    e^{σt} ( α cos(ωt) + β sin(ωt) )

• if, e.g., λ ∈ R, λ < 0, halfspace { z | w^T z ≤ a } is invariant (for a ≥ 0)

Summary

• right eigenvectors are initial conditions from which resulting motion is simple (i.e., remains on line or in plane)
• left eigenvectors give linear functions of state that are simple, for any initial condition
example 1:

    ẋ = [ −1  −10  −10 ; 1  0  0 ; 0  1  0 ] x

block diagram: a chain of three integrators 1/s with feedback gains −1, −10, −10

    X(s) = s³ + s² + 10s + 10 = (s + 1)(s² + 10)

eigenvalues are −1, ± j√10

trajectory with x(0) = (0, −1, 1): [plots of x_1, x_2, x_3 versus t]
left eigenvector associated with eigenvalue −1 is

    g = [ 0.1 ; 0 ; 1 ]

let's check g^T x(t) when x(0) = (0, −1, 1) (as above): [plot of g^T x versus t, a decaying exponential]

eigenvector associated with eigenvalue j√10 is

    v = [ −0.554 + j0.771 ; 0.244 + j0.175 ; 0.055 − j0.077 ]

so an invariant plane is spanned by

    v_re = [ −0.554 ; 0.244 ; 0.055 ],   v_im = [ 0.771 ; 0.175 ; −0.077 ]
for example, with x(0) = v_re we have [plots of x_1, x_2, x_3 versus t, each a sustained sinusoid]

Example 2: Markov chain

probability distribution satisfies p(t + 1) = P p(t)

p_i(t) = Prob( z(t) = i ), so Σ_{i=1}^n p_i(t) = 1

P_{ij} = Prob( z(t + 1) = i | z(t) = j ), so Σ_{i=1}^n P_{ij} = 1 (such matrices are called stochastic)

rewrite as:

    [1 1 · · · 1] P = [1 1 · · · 1]

i.e., [1 1 · · · 1] is a left eigenvector of P with eigenvalue 1

hence det(I − P) = 0, so there is a right eigenvector v ≠ 0 with Pv = v

it can be shown that v can be chosen so that v_i ≥ 0, hence we can normalize v so that Σ_{i=1}^n v_i = 1

interpretation: v is an equilibrium distribution, i.e., if p(0) = v then p(t) = v for all t ≥ 0

(if v is unique it is called the steady-state distribution of the Markov chain)

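Both facts are easy to check for the three-state machine-repair chain used earlier (the matrix is as reconstructed in that example). A sketch (NumPy):

```python
import numpy as np

P = np.array([[0.9, 0.7, 1.0],
              [0.1, 0.1, 0.0],
              [0.0, 0.2, 0.0]])

# [1 1 1] is a left eigenvector with eigenvalue 1
assert np.allclose(np.ones(3) @ P, np.ones(3))

# right eigenvector for eigenvalue 1 gives the equilibrium distribution
w, V = np.linalg.eig(P)
i = int(np.argmin(abs(w - 1.0)))
v = V[:, i].real
v = v / v.sum()                  # normalize so entries sum to 1

assert np.all(v >= -1e-12)       # a valid probability distribution
assert np.allclose(P @ v, v)     # P v = v: an equilibrium distribution
```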
Diagonalization

suppose v_1, ..., v_n is a linearly independent set of eigenvectors of A ∈ R^{n×n}:

    A v_i = λ_i v_i,   i = 1, ..., n

express as

    A [ v_1 · · · v_n ] = [ v_1 · · · v_n ] diag(λ_1, ..., λ_n)

define T = [ v_1 · · · v_n ] and Λ = diag(λ_1, ..., λ_n), so AT = TΛ, and finally

    T^{−1} A T = Λ

• T invertible since v_1, ..., v_n linearly independent
• similarity transformation by T diagonalizes A

conversely if there is a T = [ v_1 · · · v_n ] s.t. T^{−1}AT = Λ = diag(λ_1, ..., λ_n), then AT = TΛ, i.e., Av_i = λ_i v_i, i = 1, ..., n, so v_1, ..., v_n is a linearly independent set of n eigenvectors of A

we say A is diagonalizable if

• there exists T s.t. T^{−1}AT = Λ is diagonal
• A has a set of n linearly independent eigenvectors

(if A is not diagonalizable, it is sometimes called defective)
Not all matrices are diagonalizable

example: A = [ 0  1 ; 0  0 ]

characteristic polynomial is X(s) = s², so λ = 0 is the only eigenvalue

eigenvectors satisfy Av = 0v = 0, i.e.,

    [ 0  1 ; 0  0 ] [ v_1 ; v_2 ] = 0

so all eigenvectors have form v = [ v_1 ; 0 ] where v_1 ≠ 0

thus, A cannot have two independent eigenvectors

Distinct eigenvalues

fact: if A has distinct eigenvalues, i.e., λ_i ≠ λ_j for i ≠ j, then A is diagonalizable

(the converse is false: A can have repeated eigenvalues but still be diagonalizable)
Diagonalization and left eigenvectors

rewrite T^{−1}AT = Λ as T^{−1}A = ΛT^{−1}, or

    [ w_1^T ; · · · ; w_n^T ] A = Λ [ w_1^T ; · · · ; w_n^T ]

where w_1^T, ..., w_n^T are the rows of T^{−1}

thus

    w_i^T A = λ_i w_i^T

i.e., the rows of T^{−1} are (lin. indep.) left eigenvectors, normalized so that

    w_i^T v_j = δ_{ij}

(i.e., left & right eigenvectors chosen this way are dual bases)

Modal form

suppose A is diagonalizable by T

define new coordinates by x = T x̃, so

    T (d/dt)x̃ = A T x̃   ⇔   (d/dt)x̃ = T^{−1} A T x̃   ⇔   (d/dt)x̃ = Λ x̃
in new coordinate system, system is diagonal (decoupled): [block diagram: n parallel scalar integrators 1/s with gains λ_1, ..., λ_n]

trajectories consist of n independent modes, i.e.,

    x̃_i(t) = e^{λ_i t} x̃_i(0)

hence the name modal form

Real modal form

when eigenvalues (hence T) are complex, system can be put in real modal form:

    S^{−1} A S = diag( Λ_r, M_{r+1}, M_{r+3}, ..., M_{n−1} )

where Λ_r = diag(λ_1, ..., λ_r) are the real eigenvalues, and

    M_i = [ σ_i  ω_i ; −ω_i  σ_i ],   λ_i = σ_i + jω_i,   i = r + 1, r + 3, ..., n

where λ_i are the complex eigenvalues (one from each conjugate pair)
block diagram of ‘complex mode’: [two coupled integrators 1/s with gains σ and ±ω]

diagonalization simplifies many matrix expressions, e.g., resolvent:

    (sI − A)^{−1} = ( sTT^{−1} − TΛT^{−1} )^{−1}
                  = ( T(sI − Λ)T^{−1} )^{−1}
                  = T (sI − Λ)^{−1} T^{−1}
                  = T diag( 1/(s − λ_1), ..., 1/(s − λ_n) ) T^{−1}

powers (i.e., discrete-time solution):

    A^k = (TΛT^{−1})^k = (TΛT^{−1}) · · · (TΛT^{−1}) = TΛ^k T^{−1} = T diag(λ_1^k, ..., λ_n^k) T^{−1}

(for k < 0 only if A invertible, i.e., all λ_i ≠ 0)
exponential (i.e., continuous-time solution):

    e^A = I + A + A²/2! + · · ·
        = I + TΛT^{−1} + (TΛT^{−1})²/2! + · · ·
        = T (I + Λ + Λ²/2! + · · ·) T^{−1}
        = T e^Λ T^{−1}
        = T diag(e^{λ_1}, ..., e^{λ_n}) T^{−1}

Analytic function of a matrix

for any analytic function f : R → R, given by power series

    f(a) = β_0 + β_1 a + β_2 a² + β_3 a³ + · · ·

we can define f(A) for A ∈ R^{n×n} (i.e., overload f) as

    f(A) = β_0 I + β_1 A + β_2 A² + β_3 A³ + · · ·

substituting A = TΛT^{−1}, we have

    f(A) = β_0 I + β_1 A + β_2 A² + · · ·
         = β_0 TT^{−1} + β_1 TΛT^{−1} + β_2 (TΛT^{−1})² + · · ·
         = T (β_0 I + β_1 Λ + β_2 Λ² + · · ·) T^{−1}
         = T diag(f(λ_1), ..., f(λ_n)) T^{−1}
Solution via diagonalization

assume A is diagonalizable

consider LDS ẋ = Ax, with T^{−1}AT = Λ

then

    x(t) = e^{tA} x(0) = T e^{Λt} T^{−1} x(0) = Σ_{i=1}^n e^{λ_i t} (w_i^T x(0)) v_i

thus: any trajectory can be expressed as linear combination of modes

interpretation:

• (left eigenvectors) decompose initial state x(0) into modal components w_i^T x(0)
• e^{λ_i t} term propagates ith mode forward t seconds
• reconstruct state as linear combination of (right) eigenvectors
application: for what x(0) do we have x(t) → 0 as t → ∞?

divide eigenvalues into those with negative real parts,

    ℜλ_1 < 0, ..., ℜλ_s < 0,

and the others,

    ℜλ_{s+1} ≥ 0, ..., ℜλ_n ≥ 0

from

    x(t) = Σ_{i=1}^n e^{λ_i t} (w_i^T x(0)) v_i

condition for x(t) → 0 is:

    x(0) ∈ span{ v_1, ..., v_s },

or equivalently, w_i^T x(0) = 0, i = s + 1, ..., n

(can you prove this?)

Stability of discrete-time systems

suppose A diagonalizable

consider discrete-time LDS x(t + 1) = Ax(t)

if A = TΛT^{−1}, then A^k = TΛ^k T^{−1}, so

    x(t) = A^t x(0) = Σ_{i=1}^n λ_i^t (w_i^T x(0)) v_i → 0   as t → ∞

for all x(0) if and only if |λ_i| < 1, i = 1, ..., n

so we have the fact: x(t + 1) = Ax(t) is stable if and only if all eigenvalues of A have magnitude less than one

(we will see later that this is true even when A is not diagonalizable)

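The discrete-time condition |λ_i| < 1 can be exercised directly: powers of A die out exactly when the spectrum lies inside the unit circle. A sketch (NumPy; matrices made up for illustration):

```python
import numpy as np

A = np.array([[0.5, 1.0], [0.0, 0.9]])     # eigenvalues 0.5, 0.9: inside unit circle
assert max(abs(np.linalg.eigvals(A))) < 1
assert np.linalg.norm(np.linalg.matrix_power(A, 200)) < 1e-6   # A^k -> 0

B = np.array([[1.1, 0.0], [0.3, 0.2]])     # eigenvalue 1.1: outside unit circle
assert max(abs(np.linalg.eigvals(B))) > 1
assert np.linalg.norm(np.linalg.matrix_power(B, 200)) > 1e6    # B^k blows up
```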
EE263 Autumn 2010-11  Stephen Boyd

Lecture 12
Jordan canonical form

• Jordan canonical form
• generalized modes
• Cayley-Hamilton theorem

Jordan canonical form

what if A cannot be diagonalized?

any matrix A ∈ R^{n×n} can be put in Jordan canonical form by a similarity transformation, i.e.,

    T^{−1} A T = J = diag(J_1, ..., J_q)

where J_i ∈ C^{n_i×n_i} is the upper bidiagonal matrix with λ_i on the diagonal and ones on the first superdiagonal; J_i is called a Jordan block of size n_i with eigenvalue λ_i (so n = Σ_{i=1}^q n_i)
• J is upper bidiagonal
• J diagonal is the special case of n Jordan blocks of size n_i = 1
• Jordan form is unique (up to permutations of the blocks)
• can have multiple blocks with same eigenvalue

note: JCF is a conceptual tool, never used in numerical computations!

    X(s) = det(sI − A) = (s − λ_1)^{n_1} · · · (s − λ_q)^{n_q}

hence distinct eigenvalues ⇒ n_i = 1 ⇒ A diagonalizable

dim N(λI − A) is the number of Jordan blocks with eigenvalue λ

more generally,

    dim N(λI − A)^k = Σ_{λ_i = λ} min{ k, n_i }

so from dim N(λI − A)^k for k = 1, 2, ..., we can determine the sizes of the Jordan blocks associated with λ
• factor out T and T^{−1}: λI − A = T(λI − J)T^{−1}
• for, say, a block of size 3:

    λ_i I − J_i = [ 0  −1  0 ; 0  0  −1 ; 0  0  0 ],
    (λ_i I − J_i)² = [ 0  0  1 ; 0  0  0 ; 0  0  0 ],
    (λ_i I − J_i)³ = 0

• for other blocks (say, size 3, for k ≥ 2):

    (λ_i I − J_j)^k = [ (λ_i − λ_j)^k   −k(λ_i − λ_j)^{k−1}   (k(k−1)/2)(λ_i − λ_j)^{k−2} ;
                        0   (λ_i − λ_j)^k   −k(λ_i − λ_j)^{k−1} ;
                        0   0   (λ_i − λ_j)^k ]

Generalized eigenvectors

suppose T^{−1}AT = J = diag(J_1, ..., J_q)

express T as T = [ T_1  T_2  · · ·  T_q ], where T_i ∈ C^{n×n_i} are the columns of T associated with ith Jordan block J_i

we have A T_i = T_i J_i

let T_i = [ v_{i1}  v_{i2}  · · ·  v_{i n_i} ]; then we have:

    A v_{i1} = λ_i v_{i1},

i.e., the first column of each T_i is an eigenvector associated with eigenvalue λ_i, and for j = 2, ..., n_i,

    A v_{ij} = v_{i,j−1} + λ_i v_{ij}

the vectors v_{i1}, ..., v_{i n_i} are sometimes called generalized eigenvectors
Jordan form LDS

consider LDS ẋ = Ax

by change of coordinates x = T x̃, can put into form (d/dt)x̃ = J x̃

system is decomposed into independent ‘Jordan block systems’ (d/dt)x̃_i = J_i x̃_i

[block diagram: a chain of integrators 1/s, each with feedback λ, with x̃_{n_i} feeding x̃_{n_i−1}, and so on down to x̃_1]

Jordan blocks are sometimes called Jordan chains (block diagram shows why)

Resolvent, exponential of Jordan block

resolvent of k × k Jordan block with eigenvalue λ:

    (sI − J_λ)^{−1} = [ (s−λ)^{−1}  (s−λ)^{−2}  · · ·  (s−λ)^{−k} ;
                        (s−λ)^{−1}  · · ·  (s−λ)^{−k+1} ;
                        . . . ;
                        (s−λ)^{−1} ]
                    = (s−λ)^{−1} I + (s−λ)^{−2} F_1 + · · · + (s−λ)^{−k} F_{k−1}

where F_i is the matrix with ones on the ith upper diagonal
by inverse Laplace transform, exponential is:

    e^{tJ_λ} = e^{tλ} ( I + tF_1 + · · · + (t^{k−1}/(k−1)!) F_{k−1} )
             = e^{tλ} [ 1  t  · · ·  t^{k−1}/(k−1)! ;  1  · · ·  t^{k−2}/(k−2)! ;  . . . ;  1 ]

Jordan blocks yield:

• repeated poles in resolvent
• terms of form t^p e^{tλ} in e^{tA}

Generalized modes

consider ẋ = Ax, with

    x(0) = a_1 v_{i1} + · · · + a_{n_i} v_{i n_i} = T_i a

then

    x(t) = T e^{Jt} x̃(0) = T_i e^{J_i t} a

• trajectory stays in span of generalized eigenvectors
• coefficients have form p(t) e^{λt}, where p is polynomial
• such solutions are called generalized modes of the system
with general x(0) we can write

    x(t) = e^{tA} x(0) = T e^{tJ} T^{−1} x(0) = Σ_{i=1}^q T_i e^{tJ_i} (S_i^T x(0))

where

    T^{−1} = [ S_1^T ; · · · ; S_q^T ]

hence: all solutions of ẋ = Ax are linear combinations of (generalized) modes

Cayley-Hamilton theorem

if p(s) = a_0 + a_1 s + · · · + a_k s^k is a polynomial and A ∈ R^{n×n}, we define

    p(A) = a_0 I + a_1 A + · · · + a_k A^k

Cayley-Hamilton theorem: for any A ∈ R^{n×n} we have X(A) = 0, where X(s) = det(sI − A)

example: with A = [ 1  2 ; 3  4 ] we have X(s) = s² − 5s − 2, so

    X(A) = A² − 5A − 2I = [ 7  10 ; 15  22 ] − 5 [ 1  2 ; 3  4 ] − 2I = 0

corollary: for every p ∈ Z+, we have

A^p ∈ span{ I, A, A², . . . , A^{n−1} }

(and if A is invertible, also for p ∈ Z), i.e., every power of A can be expressed as a linear combination of I, A, . . . , A^{n−1}

proof: divide X(s) into s^p to get s^p = q(s)X(s) + r(s), where

r = α0 + α1s + · · · + αn−1 s^{n−1}

is the remainder polynomial; then

A^p = q(A)X(A) + r(A) = r(A) = α0I + α1A + · · · + αn−1 A^{n−1}

for p = −1: rewrite the C-H theorem

X(A) = A^n + an−1 A^{n−1} + · · · + a0 I = 0

as

I = A( −(a1/a0)I − (a2/a0)A − · · · − (1/a0)A^{n−1} )

(A is invertible ⇔ a0 ≠ 0), so

A^{−1} = −(a1/a0)I − (a2/a0)A − · · · − (1/a0)A^{n−1}

i.e., the inverse is a linear combination of A^k, k = 0, . . . , n − 1

Proof of C-H theorem

first assume A is diagonalizable: T^{−1}AT = Λ,

X(s) = (s − λ1) · · · (s − λn)

since X(A) = X(TΛT^{−1}) = T X(Λ) T^{−1}, it suffices to show X(Λ) = 0:

X(Λ) = (Λ − λ1I) · · · (Λ − λnI)

= diag(0, λ2 − λ1, . . . , λn − λ1) · · · diag(λ1 − λn, . . . , λn−1 − λn, 0) = 0

now let's do the general case: T^{−1}AT = J,

X(s) = (s − λ1)^{n1} · · · (s − λq)^{nq}

it suffices to show X(Ji) = 0; but X(Ji) contains the factor (Ji − λiI)^{ni}, the nith power of an ni × ni nilpotent block, which is zero, so

X(Ji) = (Ji − λ1I)^{n1} · · · (Ji − λiI)^{ni} · · · (Ji − λqI)^{nq} = 0
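The corollary above — the inverse as a polynomial in A — can also be checked on the same 2 × 2 example; a quick sketch (numpy, not from the original notes):

```python
import numpy as np

# for A = [[1, 2], [3, 4]], X(s) = s^2 - 5s - 2, i.e. a0 = -2, a1 = -5;
# Cayley-Hamilton then gives A^{-1} = -(a1/a0) I - (1/a0) A
A = np.array([[1.0, 2.0], [3.0, 4.0]])
a0, a1 = -2.0, -5.0

A_inv = -(a1 / a0) * np.eye(2) - (1.0 / a0) * A

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```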

EE263 Autumn 2010-11    Stephen Boyd

Lecture 13
Linear dynamical systems with inputs & outputs

• inputs & outputs: interpretations
• transfer function
• impulse and step responses
• examples

Inputs & outputs

recall the continuous-time time-invariant LDS has form

ẋ = Ax + Bu,    y = Cx + Du

• Ax is called the drift term (of ẋ)
• Bu is called the input term (of ẋ)

picture, with B ∈ R^{2×1}: [vector-field sketch showing ẋ(t) = Ax(t) + Bu(t) for u(t) = 1 and for u(t) = −1.5]

Interpretations

write ẋ = Ax + b1u1 + · · · + bmum, where B = [b1 · · · bm]

• state derivative is sum of autonomous term (Ax) and one term per input (biui)
• each input ui gives another degree of freedom for ẋ (assuming columns of B independent)

write ẋ = Ax + Bu as ẋi = ãi^T x + b̃i^T u, where ãi^T, b̃i^T are the rows of A, B

• ith state derivative is a linear function of state x and input u

Block diagram

[diagram: u(t) → B → sum → 1/s → x(t) → C → sum → y(t), with feedback through A and feedthrough through D]

• Aij is gain factor from state xj into integrator i
• Bij is gain factor from input uj into integrator i
• Cij is gain factor from state xj into output yi
• Dij is gain factor from input uj into output yi

interesting when there is structure, e.g., with x1 ∈ R^{n1}, x2 ∈ R^{n2}:

d/dt [ x1; x2 ] = [ A11 A12; 0 A22 ][ x1; x2 ] + [ B1; 0 ] u,    y = [ C1 C2 ][ x1; x2 ]

• x2 is not affected by input u, i.e., x2 propagates autonomously
• x2 affects y directly and through x1

Transfer function

take the Laplace transform of ẋ = Ax + Bu:

sX(s) − x(0) = AX(s) + BU(s)

hence X(s) = (sI − A)^{−1}x(0) + (sI − A)^{−1}BU(s), so

x(t) = e^{tA}x(0) + ∫_0^t e^{(t−τ)A}Bu(τ) dτ

• e^{tA}x(0) is the unforced or autonomous response
• e^{tA}B is called the input-to-state impulse response or impulse matrix
• (sI − A)^{−1}B is called the input-to-state transfer function or transfer matrix

with y = Cx + Du we have:

Y(s) = C(sI − A)^{−1}x(0) + (C(sI − A)^{−1}B + D)U(s)

so

y(t) = Ce^{tA}x(0) + ∫_0^t Ce^{(t−τ)A}Bu(τ) dτ + Du(t)

• output term Ce^{tA}x(0) is due to the initial condition
• H(s) = C(sI − A)^{−1}B + D is called the transfer function or transfer matrix
• h(t) = Ce^{tA}B + Dδ(t) is called the impulse response or impulse matrix (δ is the Dirac delta function)

with zero initial condition we have:

Y(s) = H(s)U(s),    y = h ∗ u

where ∗ is convolution (of matrix valued functions)

interpretation: Hij is the transfer function from input uj to output yi

Impulse response

impulse response

h(t) = Ce^{tA}B + Dδ(t)

with x(0) = 0, y = h ∗ u, i.e.,

yi(t) = Σ_{j=1}^m ∫_0^t hij(t − τ) uj(τ) dτ

interpretations:
• hij(t) is the impulse response from jth input to ith output
• hij(t) gives yi when u(t) = ejδ
• hij(τ) shows how dependent output i is on what input j was, τ seconds ago
• i indexes output, j indexes input, τ indexes time lag

Step response

the step response or step matrix is given by

s(t) = ∫_0^t h(τ) dτ

interpretations:
• sij(t) is the step response from jth input to ith output
• sij(t) gives yi when u = ej for t ≥ 0

for invertible A, we have

s(t) = CA^{−1}( e^{tA} − I )B + D
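The closed-form step matrix for invertible A lends itself to a short numerical sketch (a toy stable system with made-up values, assuming scipy is available; none of this is from the original notes):

```python
import numpy as np
from scipy.linalg import expm

# illustrative stable system (values are not from the notes)
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.0]])

def step_response(t):
    # s(t) = C A^{-1} (e^{tA} - I) B + D, valid when A is invertible
    return C @ np.linalg.solve(A, expm(t * A) - np.eye(2)) @ B + D

print(step_response(0.0))   # starts at 0
print(step_response(50.0))  # converges to the DC gain -C A^{-1} B + D = 1.5
```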

Example 1

[diagram: three unit masses in a line, coupled by springs and dampers, with tension inputs u1 and u2]

• unit masses, springs, dampers
• u1 is tension between 1st & 2nd masses
• u2 is tension between 2nd & 3rd masses
• y ∈ R³ is the displacement of masses 1, 2, 3
• x = [ y; ẏ ]

system is:

ẋ = [ 0 0 0 1 0 0; 0 0 0 0 1 0; 0 0 0 0 0 1; −2 1 0 −2 1 0; 1 −2 1 1 −2 1; 0 1 −2 0 1 −2 ] x + [ 0 0; 0 0; 0 0; 1 0; −1 1; 0 −1 ] [ u1; u2 ]

eigenvalues of A are

−1.71 ± j0.71,    −1.00 ± j1.00,    −0.29 ± j0.71

impulse response: [plots of h11, h21, h31 (from u1) and h12, h22, h32 (from u2) over 0 ≤ t ≤ 15]

roughly speaking:
• impulse at u1 affects third mass less than other two
• impulse at u2 affects first mass later than other two

Example 2

interconnect circuit: [schematic: drive voltage u feeding a resistive ladder with capacitors C1, . . . , C4 to ground]

• u(t) ∈ R is input (drive) voltage
• xi is voltage across Ci
• output is state: y = x
• unit resistors, unit capacitors
• step response matrix shows delay to each node

system is

ẋ = [ −3 1 1 0; 1 −1 0 0; 1 0 −2 1; 0 0 1 −1 ] x + [ 1; 0; 0; 0 ] u,    y = x

eigenvalues of A are −0.17, −0.66, −2.21, −3.96

step response matrix s(t) ∈ R^{4×1}: [plot of s1, s2, s3, s4 over 0 ≤ t ≤ 15]

• shortest delay to x1, longest delay to x4
• delays ≈ 10, consistent with slowest (i.e., dominant) eigenvalue −0.17

DC or static gain matrix

• transfer function at s = 0 is H(0) = −CA^{−1}B + D ∈ R^{p×m}
• DC transfer function describes system under static conditions, i.e., x, u, y constant:

0 = ẋ = Ax + Bu,    y = Cx + Du

eliminate x to get y = H(0)u

• if system is stable,

H(0) = ∫_0^∞ h(t) dt = lim_{t→∞} s(t)

(recall: H(s) = ∫_0^∞ e^{−st}h(t) dt,  s(t) = ∫_0^t h(τ) dτ)

• if u(t) → u∞ ∈ R^m, then y(t) → y∞ ∈ R^p, where y∞ = H(0)u∞

DC gain matrix for example 1 (springs):

H(0) = [ 1/4 1/4; −1/2 1/2; −1/4 −1/4 ]

DC gain matrix for example 2 (RC circuit):

H(0) = [ 1; 1; 1; 1 ]

(do these make sense?)
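The RC-circuit DC gain above can be sanity-checked directly from the state-space data printed in these notes (y = x, so C = I and D = 0); a numpy sketch:

```python
import numpy as np

# RC-circuit example from the notes: y = x, so C = I, D = 0
A = np.array([[-3.0, 1.0, 1.0, 0.0],
              [ 1.0,-1.0, 0.0, 0.0],
              [ 1.0, 0.0,-2.0, 1.0],
              [ 0.0, 0.0, 1.0,-1.0]])
B = np.array([[1.0], [0.0], [0.0], [0.0]])

H0 = -np.linalg.solve(A, B)   # H(0) = -C A^{-1} B + D with C = I, D = 0
print(H0.ravel())             # -> [1. 1. 1. 1.]
```

In steady state every node floats up to the drive voltage, which is why all four DC gains are 1.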


**Discretization with piecewise constant inputs
**

linear system ẋ = Ax + Bu, y = Cx + Du

suppose ud : Z+ → R^m is a sequence, and

u(t) = ud(k) for kh ≤ t < (k + 1)h, k = 0, 1, . . .

define sequences xd(k) = x(kh), yd(k) = y(kh), k = 0, 1, . . .

• h > 0 is called the sample interval (for x and y) or update interval (for u)
• u is piecewise constant (called zero-order-hold)
• xd, yd are sampled versions of x, y


xd(k + 1) = x((k + 1)h)

= e^{hA} x(kh) + ∫_0^h e^{τA} B u((k + 1)h − τ) dτ

= e^{hA} xd(k) + ( ∫_0^h e^{τA} dτ ) B ud(k)
xd, ud, and yd satisfy the discrete-time LDS equations

xd(k + 1) = Ad xd(k) + Bd ud(k),    yd(k) = Cd xd(k) + Dd ud(k)

where

Ad = e^{hA},    Bd = ( ∫_0^h e^{τA} dτ ) B,    Cd = C,    Dd = D


called the discretized system

if A is invertible, we can express the integral as

∫_0^h e^{τA} dτ = A^{−1}( e^{hA} − I )

stability: if the eigenvalues of A are λ1, . . . , λn, then the eigenvalues of Ad are e^{hλ1}, . . . , e^{hλn}

discretization preserves stability properties, since for h > 0

ℜλi < 0  ⇔  |e^{hλi}| < 1
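Ad and Bd are easy to compute numerically. One standard trick (not from these notes) is to exponentiate an augmented block matrix, which avoids assuming A is invertible; a sketch with illustrative values, assuming scipy is available:

```python
import numpy as np
from scipy.linalg import expm

# discretize xdot = A x + B u under a zero-order hold with sample interval h;
# exp([[A, B], [0, 0]] * h) has top blocks [e^{hA}, (int_0^h e^{tA} dt) B]
A = np.array([[0.0, 1.0], [-1.0, -0.3]])   # illustrative values, not from the notes
B = np.array([[0.0], [1.0]])
h = 0.1

n, m = A.shape[0], B.shape[1]
M = np.zeros((n + m, n + m))
M[:n, :n] = A * h
M[:n, n:] = B * h
E = expm(M)
Ad, Bd = E[:n, :n], E[:n, n:]
```

Since this A is stable (ℜλ < 0), the eigenvalues of Ad land inside the unit circle, as the equivalence above predicts.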

extensions/variations:

• offsets: updates for u and sampling of x, y are offset in time
• multirate: ui updated, yi sampled at different intervals (usually integer multiples of a common interval h)

both are very common in practice


Dual system

the dual system associated with the system

ẋ = Ax + Bu,    y = Cx + Du

is given by

ż = A^T z + C^T v,    w = B^T z + D^T v

• all matrices are transposed
• roles of B and C are swapped

transfer function of dual system:

(B^T)(sI − A^T)^{−1}(C^T) + D^T = H(s)^T

where H(s) = C(sI − A)^{−1}B + D

(for the SISO case, the TF of the dual is the same as the original)

eigenvalues (hence stability properties) are the same
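The transpose identity is easy to confirm numerically at a test frequency; a sketch with random illustrative matrices (not from the notes):

```python
import numpy as np

# check H_dual(s) = B^T (sI - A^T)^{-1} C^T + D^T equals H(s)^T at one point
rng = np.random.default_rng(0)
n, m, p = 3, 2, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))
s = 2.0 + 1.0j

H = C @ np.linalg.solve(s * np.eye(n) - A, B) + D
H_dual = B.T @ np.linalg.solve(s * np.eye(n) - A.T, C.T) + D.T

print(np.allclose(H_dual, H.T))  # True
```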


Dual via block diagram

in terms of block diagrams, dual is formed by:

• transpose all matrices
• swap inputs and outputs on all boxes
• reverse directions of signal flow arrows
• swap solder joints and summing junctions


original system: [block diagram: u(t) → B → sum → 1/s → x(t) → C → sum → y(t), with feedback A and feedthrough D]

dual system: [the same diagram with all matrices transposed and signal flow reversed: v(t) → C^T → sum → 1/s → z(t) → B^T → sum → w(t), with feedback A^T and feedthrough D^T]


Causality

interpretation of

x(t) = e^{tA} x(0) + ∫_0^t e^{(t−τ)A} B u(τ) dτ

y(t) = C e^{tA} x(0) + ∫_0^t C e^{(t−τ)A} B u(τ) dτ + D u(t)

for t ≥ 0: current state (x(t)) and output (y(t)) depend on past input (u(τ ) for τ ≤ t) i.e., mapping from input to state and output is causal (with ﬁxed initial state)


now consider a fixed final state x(T): for t ≤ T,

x(t) = e^{(t−T)A} x(T) + ∫_T^t e^{(t−τ)A} B u(τ) dτ

i.e., current state (and output) depend on future input!

so for ﬁxed ﬁnal condition, same system is anti-causal


Idea of state

x(t) is called the state of the system at time t since:

• future output depends only on current state and future input
• future output depends on past input only through current state
• state summarizes effect of past inputs on future output
• state is bridge between past inputs and future outputs


Change of coordinates

start with LDS ẋ = Ax + Bu, y = Cx + Du

change coordinates in R^n to x̃, with x = T x̃; then

x̃˙ = T^{−1} ẋ = T^{−1}(Ax + Bu) = (T^{−1}AT) x̃ + (T^{−1}B) u

hence the LDS can be expressed as

x̃˙ = Ã x̃ + B̃ u,    y = C̃ x̃ + D̃ u

where

Ã = T^{−1}AT,    B̃ = T^{−1}B,    C̃ = CT,    D̃ = D

TF is the same (since u, y aren't affected):

C̃(sI − Ã)^{−1}B̃ + D̃ = C(sI − A)^{−1}B + D
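The invariance of the transfer function under a change of coordinates can be checked numerically; a sketch with random illustrative matrices (not from the notes):

```python
import numpy as np

# verify that x = T x~ leaves C(sI - A)^{-1} B + D unchanged at a test point
rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
D = rng.standard_normal((1, 1))
T = rng.standard_normal((n, n))      # invertible with probability 1

At = np.linalg.solve(T, A @ T)       # T^{-1} A T
Bt = np.linalg.solve(T, B)           # T^{-1} B
Ct = C @ T
s = 1.5 + 0.5j

H    = C  @ np.linalg.solve(s * np.eye(n) - A,  B)  + D
Htil = Ct @ np.linalg.solve(s * np.eye(n) - At, Bt) + D

print(np.allclose(H, Htil))  # True
```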


Standard forms for LDS

can change coordinates to put A in various forms (diagonal, real modal, Jordan . . . )

e.g., to put the LDS in diagonal form, find T s.t.

T^{−1}AT = diag(λ1, . . . , λn)

write

T^{−1}B = [ b̃1^T; . . . ; b̃n^T ],    CT = [ c̃1 · · · c̃n ]

so

x̃˙i = λi x̃i + b̃i^T u,    y = Σ_{i=1}^n c̃i x̃i


[block diagram of the diagonal-form system: u feeds each branch b̃i^T → 1/s (with feedback λi) → x̃i → c̃i, and the branch outputs sum to y]

(here we assume D = 0)


Discrete-time systems

discrete-time LDS:

x(t + 1) = Ax(t) + Bu(t),    y(t) = Cx(t) + Du(t)

[block diagram: same as the continuous-time one, with the integrator 1/s replaced by a unit delay 1/z]

• only difference with continuous time: z instead of s
• interpretation of z^{−1} block:
  – unit delayor (shifts sequence back in time one epoch)
  – latch (plus small delay to avoid race condition)

we have:

x(1) = Ax(0) + Bu(0)

x(2) = Ax(1) + Bu(1) = A²x(0) + ABu(0) + Bu(1)

and in general, for t ∈ Z+,

x(t) = A^t x(0) + Σ_{τ=0}^{t−1} A^{t−1−τ} B u(τ)

hence y(t) = CA^t x(0) + (h ∗ u)(t)

where ∗ is discrete-time convolution and

h(0) = D,    h(t) = CA^{t−1}B for t > 0

is the impulse response
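The discrete-time convolution formula can be checked against a direct simulation of the recursion; a numpy sketch with illustrative matrices (not from the notes):

```python
import numpy as np

# simulate x(t+1) = A x(t) + B u(t), y = C x + D u with x(0) = 0, and
# compare against y = h * u with h(0) = D, h(t) = C A^{t-1} B
A = np.array([[0.5, 1.0], [0.0, 0.3]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

T = 10
rng = np.random.default_rng(2)
u = rng.standard_normal(T)

x = np.zeros((2, 1))
y_sim = []
for t in range(T):
    y_sim.append((C @ x + D * u[t]).item())
    x = A @ x + B * u[t]

h = [D.item()] + [(C @ np.linalg.matrix_power(A, t - 1) @ B).item()
                  for t in range(1, T)]
y_conv = [sum(h[k] * u[t - k] for k in range(t + 1)) for t in range(T)]

print(np.allclose(y_sim, y_conv))  # True
```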

Z-transform

suppose w : Z+ → R^{p×q} is a sequence (discrete-time signal)

recall the Z-transform W = Z(w):

W(z) = Σ_{t=0}^∞ z^{−t} w(t)

where W : D ⊆ C → C^{p×q} (D is the domain of W)

time-advanced or shifted signal v:

v(t) = w(t + 1),    t = 0, 1, . . .

Z-transform of the time-advanced signal:

V(z) = Σ_{t=0}^∞ z^{−t} w(t + 1) = z Σ_{t=1}^∞ z^{−t} w(t) = zW(z) − zw(0)

Discrete-time transfer function

take the Z-transform of the system equations

x(t + 1) = Ax(t) + Bu(t),    y(t) = Cx(t) + Du(t)

which yields

zX(z) − zx(0) = AX(z) + BU(z),    Y(z) = CX(z) + DU(z)

solve for X(z) to get

X(z) = (zI − A)^{−1} z x(0) + (zI − A)^{−1} B U(z)

(note the extra z in the first term!)

hence

Y(z) = H(z)U(z) + C(zI − A)^{−1} z x(0)

where H(z) = C(zI − A)^{−1}B + D is the discrete-time transfer function

note the power series expansion of the resolvent:

(zI − A)^{−1} = z^{−1}I + z^{−2}A + z^{−3}A² + · · ·

EE263 Autumn 2010-11    Stephen Boyd

Lecture 14
Example: Aircraft dynamics

• longitudinal aircraft dynamics
• wind gust & control inputs
• linearized dynamics
• steady-state analysis
• eigenvalues & modes
• impulse matrices

Longitudinal aircraft dynamics

[figure: aircraft with body axis at angle θ above horizontal]

variables are (small) deviations from operating point or trim conditions

state (components):
• u: velocity of aircraft along body axis
• v: velocity of aircraft perpendicular to body axis (down is positive)
• θ: angle between body axis and horizontal (up is positive)
• q = θ̇: angular velocity of aircraft (pitch rate)

Inputs

disturbance inputs:
• uw: velocity of wind along body axis
• vw: velocity of wind perpendicular to body axis

control or actuator inputs:
• δe: elevator angle (δe > 0 is down)
• δt: thrust

Linearized dynamics

for a 747, at 40000 ft, 774 ft/sec, level flight:

d/dt [ u; v; q; θ ] = [ −.003 .039 0 −.322; −.065 −.319 7.74 0; .020 −.101 −.429 0; 0 0 1 0 ][ u − uw; v − vw; q; θ ] + [ .01 1; −.18 −.04; −1.16 .598; 0 0 ][ δe; δt ]

• units: ft, sec, crad (= 0.01 rad ≈ 0.57°)
• matrix coefficients are called stability derivatives

outputs of interest:

• aircraft speed u (deviation from trim)
• climb rate ḣ = −v + 7.74θ

Steady-state analysis

DC gain from (uw, vw, δe, δt) to (u, ḣ):

H(0) = −CA^{−1}B + D = [ 1 0 27.2 −15.0; 0 −1 −1.34 24.9 ]

gives steady-state change in speed & climb rate due to wind, elevator & thrust changes

solve for control variables in terms of wind velocities and desired speed & climb rate:

[ δe; δt ] = [ .0379 .0229; .0020 .0413 ][ u − uw; ḣ + vw ]
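The 2 × 2 coefficient block in the control solution is just the inverse of the (δe, δt) portion of the DC gain; a quick numpy check of the values printed in the notes (small rounding differences are expected, since the slides carry more digits internally):

```python
import numpy as np

# gain from (delta_e, delta_t) to (u - uw, hdot + vw), read off H(0) above
G = np.array([[27.2, -15.0], [-1.34, 24.9]])

Ginv = np.linalg.inv(G)
print(np.round(Ginv, 4))   # close to [[.0379, .0229], [.0020, .0413]]
```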

• level flight: an increase in speed is obtained mostly by increasing elevator (i.e., downwards) (thrust on 747 gives strong pitch-up torque)
• constant speed: an increase in climb rate is obtained by increasing thrust and increasing elevator (i.e., downwards)

Eigenvalues and modes

eigenvalues are

−0.3750 ± 0.8818j,    −0.0005 ± 0.0674j

• two complex modes, called short-period and phugoid, respectively
• system is stable (but lightly damped)
• hence step responses converge (eventually) to DC gain matrix

eigenvectors: xshort (short-period mode) and xphug (phugoid mode) [numeric column vectors]

Short-period mode

y(t) = Ce^{tA}(ℜ xshort) (pure short-period mode motion)

[plots of u(t) and ḣ(t) over 0 ≤ t ≤ 20]

• only small effect on speed u
• period ≈ 7 sec, decays in ≈ 10 sec

Phugoid mode

y(t) = Ce^{tA}(ℜ xphug) (pure phugoid mode motion)

[plots of u(t) and ḣ(t) over 0 ≤ t ≤ 2000]

• affects both speed and climb rate
• period ≈ 100 sec, decays in ≈ 5000 sec

Dynamic response to wind gusts

impulse response matrix from (uw, vw) to (u, ḣ) (gives response to short wind bursts) over time period [0, 20]:

[2 × 2 grid of plots: h11, h12, h21, h22]

over time period [0, 600]:

[2 × 2 grid of plots: h11, h12, h21, h22]

Dynamic response to actuators

impulse response matrix from (δe, δt) to (u, ḣ) over time period [0, 20]:

[2 × 2 grid of plots: h11, h12, h21, h22]

over time period [0, 600]:

[2 × 2 grid of plots: h11, h12, h21, h22]

EE263 Autumn 2010-11    Stephen Boyd

Lecture 15
Symmetric matrices, quadratic forms, matrix norm, and SVD

• eigenvectors of symmetric matrices
• quadratic forms
• inequalities for quadratic forms
• positive semidefinite matrices
• norm of a matrix
• singular value decomposition

Eigenvalues of symmetric matrices

suppose A ∈ R^{n×n} is symmetric, i.e., A = A^T

fact: the eigenvalues of A are real

to see this, suppose Av = λv, v ≠ 0, v ∈ C^n

then

v̄^T Av = v̄^T(Av) = λ v̄^T v = λ Σ_{i=1}^n |vi|²

but also

v̄^T Av = (Av̄)^T v = (λ̄v̄)^T v = λ̄ Σ_{i=1}^n |vi|²

so we have λ = λ̄, i.e., λ ∈ R (hence, can assume v ∈ R^n)

matrix norm.. . . and SVD 15–4 . and SVD 15–3 ac A = QΛQT x Interpretations Q T QT x Λ ΛQT x Q Ax linear mapping y = Ax can be decomposed as • resolve into qi coordinates • scale coordinates by λi • reconstitute with basis qi Symmetric matrices.t. qi are both left and right eigenvectors Symmetric matrices.e. q1. T Aqi = λiqi. quadratic forms.t. matrix norm. . Q−1AQ = QT AQ = Λ hence we can express A as n A = QΛQ = i=1 T T λi q i q i in particular. quadratic forms.Eigenvectors of symmetric matrices fact: there is a set of orthonormal eigenvectors of A. i. qn s. . qi qj = δij in matrix form: there is an orthogonal Q s.

quadratic forms. matrix norm. quadratic forms. geometrically. matrix norm. and SVD 15–6 .or. and SVD 15–5 example: A = = −1/2 3/2 3/2 −1/2 1 √ 2 1 1 1 −1 1 0 0 −2 1 √ 2 1 1 1 −1 T T q2 q2 x q1 x T q1 q1 x T λ1 q 1 q 1 x q2 T λ2 q 2 q 2 x Ax Symmetric matrices. • rotate by QT • diagonal real scale (‘dilation’) by Λ • rotate back by Q decomposition A= n T λi q i q i i=1 expresses A as linear combination of 1-dimensional projections Symmetric matrices.

hence vi vj = 0 • in this case we can say: eigenvectors are orthogonal • in general case (λi not distinct) we must say: eigenvectors can be chosen to be orthogonal Symmetric matrices. a set of linearly independent eigenvectors of A: Avi = λivi. can ﬁnd v1. vi = 1 then we have T T T vi (Avj ) = λj vi vj = (Avi)T vj = λivi vj T so (λi − λj )vi vj = 0 T for i = j. . ˙ i = Gv G = GT ∈ Rn×n is conductance matrix of resistive circuit thus v = −C −1Gv where C = diag(c1. λi = λj . . . . cn) ˙ note −C −1G is not symmetric Symmetric matrices. matrix norm. . . and SVD 15–8 . . and SVD 15–7 Example: RC circuit i1 v1 c1 resistive circuit in vn cn ck vk = −ik . quadratic forms. quadratic forms. vn. matrix norm. .proof (case of λi distinct) since λi distinct.

matrix norm. .use state xi = √ civi. cn) we conclude: • eigenvalues λ1. sT Csi = δij i Symmetric matrices. . . so x = C 1/2v = −C −1/2GC −1/2x ˙ ˙ √ √ where C 1/2 = diag( c1. quadratic forms. then A=B Symmetric matrices. and SVD 15–10 . . . . . and SVD 15–9 Quadratic forms a function f : Rn → R of the form n f (x) = x Ax = i. B = B T .j=1 T Aij xixj is called a quadratic form in a quadratic form we may as well assume A = AT since xT Ax = xT ((A + AT )/2)x ((A + AT )/2 is called the symmetric part of A) uniqueness: if xT Ax = xT Bx for all x ∈ Rn and A = AT . −C −1G) are real • eigenvectors qi (in xi coordinates) can be chosen orthogonal • eigenvectors in voltage coordinates. λn of −C −1/2GC −1/2 (hence. . si = C −1/2qi. matrix norm. quadratic forms. satisfy −C −1Gsi = λisi.

e.. matrix norm. we have xT Ax ≤ λ1xT x Symmetric matrices. quadratic forms.Examples • Bx • 2 = xT B T Bx − xi)2 2 n−1 i=1 (xi+1 2 • Fx − Gx sets deﬁned by quadratic forms: • { x | f (x) = a } is called a quadratic surface • { x | f (x) ≤ a } is called a quadratic region Symmetric matrices. and SVD 15–12 . A = QΛQT with eigenvalues sorted so λ1 ≥ · · · ≥ λn xT Ax = xT QΛQT x = (QT x)T Λ(QT x) n = i=1 T λi(qi x)2 n ≤ λ1 T (qi x)2 i=1 2 = λ1 x i. quadratic forms. matrix norm. and SVD 15–11 Inequalities for quadratic forms suppose A = AT .

e.e. i. quadratic forms.. all eigenvalues are nonnegative ..similar argument shows xT Ax ≥ λn x 2. i. so the inequalities are tight Symmetric matrices. and SVD 15–14 0) • A ≥ 0 if and only if λmin(A) ≥ 0. so we have λnxT x ≤ xT Ax ≤ λ1xT x sometimes λ1 is called λmax. matrix norm. and SVD 15–13 Positive semideﬁnite and positive deﬁnite matrices suppose A = AT ∈ Rn×n we say A is positive semideﬁnite if xT Ax ≥ 0 for all x • denoted A ≥ 0 (and sometimes A • not the same as Aij ≥ 0 for all i. quadratic forms. matrix norm. λn is called λmin note also that T q1 Aq1 = λ1 q1 2. all eigenvalues are positive Symmetric matrices. j we say A is positive deﬁnite if xT Ax > 0 for all x = 0 • denoted A > 0 • A > 0 if and only if λmin(A) > 0. T qn Aqn = λn qn 2.

for example: • A ≥ 0 means A is positive semideﬁnite • A > B means xT Ax > xT Bx for all x = 0 Symmetric matrices. e. etc. then A + C ≥ B + D • if B ≤ 0 then A + B ≤ A • A2 ≥ 0 • if A ≥ 0 and α ≥ 0. then A−1 > 0 matrix inequality is only a partial order : we can have A ≥ B. A < B if B − A > 0.g. we say A is indeﬁnite matrix inequality: if B = B T ∈ Rn we say A ≥ B if A − B ≥ 0. matrix norm. • if A ≥ B and C ≥ D.. (such matrices are called incomparable) B≥A Symmetric matrices. quadratic forms. quadratic forms. and SVD 15–15 many properties that you’d guess hold actually do. and SVD 15–16 . matrix norm. then αA ≥ 0 • if A > 0.Matrix inequalities • we say A is negative semideﬁnite if −A ≥ 0 • we say A is negative deﬁnite if −A > 0 • otherwise.

e. centered at 0 s1 s2 E Symmetric matrices. quadratic forms. xT Ax is large. quadratic forms. and SVD 15–18 . matrix norm. hence ellipsoid is thin in direction q1 • in direction qn. where B > 0. xT Ax is small.Ellipsoids if A = AT > 0. i.: • eigenvectors determine directions of semiaxes • eigenvalues determine lengths of semiaxes note: • in direction q1. the set E = { x | xT Ax ≤ 1 } is an ellipsoid in Rn. and SVD 15–17 semi-axes are given by si = λi −1/2 qi. matrix norm. then E ⊆ E ⇐⇒ A ≥ B Symmetric matrices. hence ellipsoid is fat in direction qn • λmax/λmin gives maximum eccentricity ˜ ˜ if E = { x | xT Bx ≤ 1 }.

Gain of a matrix in a direction suppose A ∈ Rm×n (not necessarily square or symmetric) for x ∈ Rn. quadratic forms. matrix norm. and SVD 15–19 Matrix norm the maximum gain Ax x=0 x is called the matrix norm or spectral norm of A and is denoted A max xT AT Ax Ax 2 = max = λmax(AT A) max 2 2 x=0 x=0 x x so we have A = λmax(AT A) similarly the minimum gain is given by min Ax / x = x=0 λmin(AT A) Symmetric matrices. Ax / x gives the ampliﬁcation factor or gain of A in the direction x obviously. quadratic forms. matrix norm. and SVD 15–20 . gain varies with direction of input x questions: • what is maximum gain of A (and corresponding maximum gain direction)? • what is minimum gain of A (and corresponding minimum gain direction)? • how does gain of A vary with direction? Symmetric matrices.
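The identity ‖A‖ = √λmax(AᵀA) is easy to confirm numerically; a numpy sketch using the 3 × 2 example A = [1 2; 3 4; 5 6] from these notes:

```python
import numpy as np

# spectral norm ||A|| = sqrt(lambda_max(A^T A)); compare with numpy's 2-norm
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

lam = np.linalg.eigvalsh(A.T @ A)    # eigenvalues of A^T A, ascending
norm_from_eig = np.sqrt(lam[-1])

print(np.isclose(norm_from_eig, np.linalg.norm(A, 2)))  # True
print(round(float(norm_from_eig), 2))                   # 9.53
```

The minimum gain √λmin(AᵀA) ≈ 0.51 comes out of the same eigendecomposition, as `np.sqrt(lam[0])`.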

620 90.99 = 9.620 0. eigenvector of AT A associated with λmax • ‘min gain’ input direction is x = qn.785 2. quadratic forms.785 −0.18 = 4.785 = 1.620 T then A = λmax(AT A) = 9.785 0. quadratic forms.620 0.785 0. eigenvector of AT A associated with λmin Symmetric matrices. matrix norm.620 0.785 −0. λmax ≥ 0 • ‘max gain’ input direction is x = q1. and SVD 15–21 1 2 example: A = 3 4 5 6 AT A = = 35 44 44 56 0.78 15–22 0. matrix norm. A Symmetric matrices.265 0.620 0. and SVD .53 7.53: 0.note that • AT A ∈ Rn×n is symmetric and AT A ≥ 0 so λmin.7 0 0 0.

18 0.514 −0.785 −0. and SVD 15–23 Properties of matrix norm • consistent with vector norm: matrix norm of a ∈ Rn×1 is √ λmax(aT a) = aT a • for any x.min gain is λmin(AT A) = 0. matrix norm.53 x Symmetric matrices.620 = 1. matrix norm. Ax ≤ A • scaling: aA = |a| A • triangle inequality: A + B ≤ A + B • deﬁniteness: A = 0 ⇔ A = 0 • norm of product: AB ≤ A B x Symmetric matrices. we have 0.14 = 0.785 −0.46 = 0. quadratic forms. and SVD 15–24 . quadratic forms. A for all x = 0.620 0.514: 0.514 ≤ Ax ≤ 9.

V = [v1 · · · vr ]. quadratic forms. σr ). and SVD 15–26 . where σ1 ≥ · · · ≥ σr > 0 Symmetric matrices. and SVD 15–25 with U = [u1 · · · ur ]. V T V = I • Σ = diag(σ1. . r A = U ΣV T = i=1 T σiuivi • σi are the (nonzero) singular values of A • vi are the right or input singular vectors of A • ui are the left or output singular vectors of A Symmetric matrices. quadratic forms. U T U = I • V ∈ Rn×r . matrix norm. Rank(A) = r • U ∈ Rm×r . . matrix norm. . .Singular value decomposition more complete picture of gain properties of A given by singular value decomposition (SVD) of A: A = U ΣV T where • A ∈ Rm×n.

. . vr are orthonormal basis for N (A)⊥ Symmetric matrices. . . matrix norm. AAT = (U ΣV T )(U ΣV T )T = U Σ2U T hence: • ui are eigenvectors of AAT (corresponding to nonzero eigenvalues) • σi = λi(AAT ) (and λi(AAT ) = 0 for i > r) • u1. ur are orthonormal basis for range(A) • v1. . and SVD 15–28 . quadratic forms. quadratic forms. and SVD 15–27 similarly. matrix norm.AT A = (U ΣV T )T (U ΣV T ) = V Σ2V T hence: • vi are eigenvectors of AT A (corresponding to nonzero eigenvalues) • σi = λi(AT A) (and λi(AT A) = 0 for i > r) • A = σ1 Symmetric matrices. .

matrix norm. vr • scale coeﬃcients by σi • reconstitute along output directions u1. . . . . quadratic forms. ur diﬀerence with eigenvalue decomposition for symmetric A: input and output directions are diﬀerent Symmetric matrices. quadratic forms. and SVD 15–30 . and SVD 15–29 • v1 is most sensitive (highest gain) input direction • u1 is highest gain output direction • Av1 = σ1u1 Symmetric matrices. . . matrix norm. .Interpretations r A = U ΣV T = i=1 T σiuivi x VT V Tx Σ ΣV T x U Ax linear mapping y = Ax can be decomposed as • compute coeﬃcients of x along input directions v1. .

matrix norm. 7..6)(0. quadratic forms. σ2 = 0.6v2 T T • now form Ax = (v1 x)σ1u1 + (v2 x)σ2u2 = (0.05) • input components along directions v1 and v2 are ampliﬁed (by about 10) and come out mostly along plane spanned by u1. u2 • input components along directions v3 and v4 are attenuated (by about 10) • Ax / x can range between 10 and 0. quadratic forms. v2 x = 0. and SVD 15–32 .5 T T • resolve x along v1.05 • A is nonsingular • for some applications you might say A is eﬀectively rank 2 Symmetric matrices. with σ1 = 1. 0.6. x = 0. v2: v1 x = 0.1.5v1 + 0. 0. matrix norm. and SVD 15–31 example: A ∈ R2×2.e. i.5)(1)u1 + (0.5.5)u2 v2 u1 x Ax v1 u2 Symmetric matrices.SVD gives clearer picture of gain as function of input/output directions example: consider A ∈ R4×4 with Σ = diag(10.

EE263 Autumn 2010-11    Stephen Boyd

Lecture 16
SVD Applications

• general pseudo-inverse
• full SVD
• image of unit ball under linear transformation
• SVD in estimation/inversion
• sensitivity of linear equations to data error
• low rank approximation via SVD

General pseudo-inverse

if A ≠ 0 has SVD A = UΣV^T,

A† = VΣ^{−1}U^T

is the pseudo-inverse or Moore-Penrose inverse of A

if A is skinny and full rank,

A† = (A^T A)^{−1}A^T

gives the least-squares approximate solution xls = A†y

if A is fat and full rank,

A† = A^T(AA^T)^{−1}

gives the least-norm solution xln = A†y
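The two full-rank special cases can be checked against numpy's SVD-based pseudo-inverse; a sketch with random illustrative matrices (not from the notes):

```python
import numpy as np

# full-rank skinny and fat cases of the pseudo-inverse, checked against
# numpy's SVD-based pinv
rng = np.random.default_rng(3)

A_skinny = rng.standard_normal((5, 2))   # full rank with probability 1
A_fat = rng.standard_normal((2, 5))

pinv_skinny = np.linalg.inv(A_skinny.T @ A_skinny) @ A_skinny.T
pinv_fat = A_fat.T @ np.linalg.inv(A_fat @ A_fat.T)

print(np.allclose(pinv_skinny, np.linalg.pinv(A_skinny)))  # True
print(np.allclose(pinv_fat, np.linalg.pinv(A_fat)))        # True
```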

least-squares approximate solution SVD Applications 16–3 Pseudo-inverse via regularization for µ > 0. xpinv is the minimum-norm.e. i. we have lim AT A + µI µ→0 −1 A T = A† (check this!) SVD Applications 16–4 .in general case: Xls = { z | Az − y = min Aw − y } w is set of least-squares approximate solutions xpinv = A†y ∈ Xls has minimum norm on Xls. AT A + µI > 0 and so is invertible then we have lim xµ = A†y µ→0 2 +µ x 2 −1 AT y in fact. xµ = AT A + µI here.e. let xµ be (unique) minimizer of Ax − y i...

. T v1 .e.Full SVD SVD of A ∈ Rm×n with Rank(A) = r: σ1 . U = [U1 U2] ∈ Rm×m and V = [V1 V2] ∈ Rn×n are orthogonal • add zero rows/cols to Σ1 to form Σ ∈ Rm×n: Σ= Σ1 0(m − r)×r 0r×(n − r) 0(m − r)×(n − r) 16–5 SVD Applications then we have A = U1Σ1V1T = i. V2 ∈ Rn×(n−r) s. T σr vr A = U1Σ1V1T = u1 · · · u r • ﬁnd U2 ∈ Rm×(m−r). ..t.: A = U ΣV T U1 U2 Σ1 0(m − r)×r 0r×(n − r) 0(m − r)×(n − r) V1T V2T called full SVD of A (SVD with positive singular values only called compact SVD) SVD Applications 16–6 .

0.Image of unit ball under linear transformation full SVD: A = U ΣV T gives intepretation of y = Ax: • rotate (by V T ) • stretch along axes by σi (σi = 0 for i > r) • zero-pad (if m > n) or truncate (if m < n) to get m-vector • rotate (by U ) SVD Applications 16–7 Image of unit ball under A 1 1 rotate by V T 1 1 stretch. Σ = diag(2. SVD Applications 16–8 .5) u2 u1 rotate by U 0.5 2 {Ax | x ≤ 1} is ellipsoid with principal axes σiui.

unbiased) ˆ • estimation or inversion error is x = x − x = Bv ˜ ˆ • set of possible estimation errors is ellipsoid x ∈ Eunc = { Bv | v ≤ α } ˜ • x = x − x ∈ x − Eunc = x + Eunc. where • y ∈ Rm is measurement • x ∈ Rn is vector to be estimated • v is a measurement noise or error ‘norm-bound’ model of noise: we assume v ≤ α but otherwise know nothing about v (α gives max norm of noise) SVD Applications 16–9 • consider estimator x = By.. of course) SVD Applications 16–10 . centered at estimate x ˆ • ‘good’ estimator has ‘small’ Eunc (with BA = I.e.e. i.: ˆ ˜ ˆ ˆ true x lies in uncertainty ellipsoid Eunc.SVD in estimation/inversion suppose y = Ax + v. with BA = I (i.

e. i. maximum norm of error is α B . x − x ≤ α B ˆ optimality of least-squares: suppose BA = I is any estimator. the least-squares estimator gives the smallest uncertainty ellipsoid SVD Applications 16–11 Example: navigation using range measurements (lect..e.. 4) we have T k1 kT y = − 2 x T k3 T k4 where ki ∈ R2 using ﬁrst two measurements and inverting: x=− ˆ T k1 T k2 −1 02×2 y using all four measurements and least-squares: x = A† y ˆ SVD Applications 16–12 .semiaxes of Eunc are ασiui (singular values & vectors of B) e. and Bls = A† is the least-squares estimator then: T • BlsBls ≤ BB T • Els ⊆ E • in particular Bls ≤ B i..g.

so B = Bls + Z then ZA = ZU ΣV T = 0. with V orthogonal Bls = A† = V Σ−1U T . and B satisﬁes BA = I deﬁne Z = B − Bls. is full rank SVD: A = U ΣV T .uncertainty regions (with α = 1): uncertainty region for x using inversion 20 15 x2 10 5 0 −10 −8 x1 uncertainty region for x using least-squares −6 −4 −2 0 2 4 6 8 10 20 15 x2 10 5 0 −10 −8 −6 −4 −2 x1 0 2 4 6 8 10 SVD Applications 16–13 Proof of optimality property suppose A ∈ Rm×n. so ZU = 0 (multiply by V Σ−1 on right) therefore BB T = (Bls + Z)(Bls + Z)T T T = BlsBls + BlsZ T + ZBls + ZZ T T = BlsBls + ZZ T T ≥ BlsBls T using ZBls = (ZU )Σ−1V T = 0 SVD Applications 16–14 . m > n.

A ∈ Rn×n invertible.Sensitivity of linear equations to data error consider y = Ax. we also have y ≤ A δx ≤ A x x . i. • small errors in y can lead to large errors in x • can’t solve for x given y (with small errors) • hence. y becomes y + δy then x becomes x + δx with δx = A−1δy hence we have δx = A−1δy ≤ A−1 if A−1 is large.e. of course x = A−1y suppose we have an error or noise in y. A can be considered singular in practice SVD Applications 16–15 δy a more reﬁned analysis uses relative instead of absolute errors in x and y since y = Ax. in terms of # bits of guaranteed accuracy: # bits accuracy in solution ≈ # bits accuracy in data − log2 κ SVD Applications 16–16 .. hence A−1 δy y κ(A) = A A−1 = σmax(A)/σmin(A) is called the condition number of A we have: relative error in solution x ≤ condition number · relative error in data y or.

take p to get rank p approximant SVD Applications 16–18 . Rank(A) ≤ p < r. with SVD A = U ΣV T = i=1 T σiuivi ˆ ˆ ˆ we seek matrix A. A ≈ A in the sense that ˆ A − A is minimized solution: optimal rank p approximator is p ˆ A= i=1 T σiuivi ˆ • hence A − A = r T i=p+1 σi ui vi = σp+1 T • interpretation: SVD dyads uivi are ranked in order of ‘importance’. Rank(A) = r. s.t.we say • A is well conditioned if κ is small • A is poorly conditioned if κ is large (deﬁnition of ‘small’ and ‘large’ depend on application) same analysis holds for least-squares approximate solutions with A nonsquare. κ = σmax(A)/σmin(A) SVD Applications 16–17 Low rank approximations r suppose A ∈ R m×n .

. dim span{v1. vp+1} = p + 1 hence. . .proof: suppose Rank(B) ≤ p then dim N (B) ≥ n − p also. small σmin means A is near to a singular matrix SVD Applications 16–20 . z ∈ span{v1. i.e. there is a unit vector z ∈ Rn s. . vp+1} p+1 T σiuivi z i=1 (A − B)z = Az = p+1 (A − B)z 2 = i=1 2 T 2 σi (vi z)2 ≥ σp+1 z 2 ˆ hence A − B ≥ σp+1 = A − A SVD Applications 16–19 Distance to singularity another interpretation of σi: σi = min{ A − B | Rank(B) ≤ i − 1 } i. .t. . the two subspaces intersect.. Bz = 0.e. . . the distance (measured by matrix norm) to the nearest rank i − 1 matrix for example. σn = σmin is distance to nearest singular matrix hence. if A ∈ Rn×n. .

30. . . . . 0. where • A ∈ R100×30 has SVs 10.1 T then the terms σiuivi x. for i = 5.5. 0. are substantially smaller than the noise term v simpliﬁed model: 4 y= i=1 T σiuivi x + v SVD Applications 16–21 . . . 7. 0. 2.0001 • x is on the order of 1 • unknown error or noise v has norm on the order of 0. .01. .application: model simpliﬁcation suppose y = Ax + v.

1]. t)|2 is probability density of particle at position x. i = Example: Quantum mechanics . t)|2 dx = 1 for all t) evolution of Ψ governed by Schrodinger equation: i¯ Ψ = h˙ V − h ¯2 2 ∇ Ψ = HΨ 2m x √ −1 17–2 where H is Hamiltonian operator. mass m • potential V : [0.EE263 Autumn 2010-11 Stephen Boyd Lecture 17 Example: Quantum mechanics • wave function and Schrodinger equation • discretization • preservation of probability • eigenvalues & eigenstates • example 17–1 Quantum mechanics • single particle in interval [0. 1] × R+ → C is (complex-valued) wave function interpretation: |Ψ(x. 1] → R Ψ : [0. time t 1 (so 0 |Ψ(x.

.. (which approximates w = ∂ 2v/∂x2) Example: Quantum mechanics 17–3 discretized Schrodinger equation is (complex) linear dynamical system ˙ Ψ = (−i/¯ )(V − (¯ /2m)∇2)Ψ = (−i/¯ )HΨ h h h where V is a diagonal matrix with Vkk = V (k/N ) hence we analyze using linear dynamical system theory (with complex vectors & matrices): ˙ Ψ = (−i/¯ )HΨ h h solution of Shrodinger equation: Ψ(t) = e(−i/¯ )tH Ψ(0) h matrix e(−i/¯ )tH propogates wave function forward in time t seconds (backward if t < 0) Example: Quantum mechanics 17–4 . N wave function is approximated as vector Ψ(t) ∈ CN ∇2 operator is approximated as matrix x −2 1 1 −2 1 2 2 1 −2 ∇ =N . 1 −2 1 . k = 1.Discretization let’s discretize position x into N discrete points... . k/N ... . . . so w = ∇2v means wk = (vk+1 − vk )/(1/N ) − (vk − vk−1)(1/N ) 1/N .

our discretization preserves probability exactly Example: Quantum mechanics 17–5 h U = e−(i/¯ )tH is unitary. meaning U ∗U = I unitary is extension of orthogonal for complex matrix: if U ∈ CN ×N is unitary and z ∈ CN . then Uz 2 = (U z)∗(U z) = z ∗U ∗U z = z ∗z = z 2 Example: Quantum mechanics 17–6 . Ψ(t) 2 is constant.Preservation of probability d Ψ dt 2 d ∗ Ψ Ψ dt ˙ ˙ = Ψ∗Ψ + Ψ∗Ψ = = ((−i/¯ )HΨ)∗Ψ + Ψ∗((−i/¯ )HΨ) h h = (i/¯ )Ψ∗HΨ + (−i/¯ )Ψ∗HΨ h h = 0 (using H = H T ∈ RN ×N ) hence.

probability density h |Ψm(t)|2 = e(−i/¯ )λk tvk 2 = |vmk |2 doesn’t change with time (vmk is mth entry of vk ) Example: Quantum mechanics 17–8 . (−i/¯ )λN (which are pure h h h imaginary) Example: Quantum mechanics 17–7 • eigenvectors vk are called eigenstates of system • eigenvalue λk is energy of eigenstate vk h • for mode Ψ(t) = e(−i/¯ )λk tvk .e.. vN h • eigenvalues of (−i/¯ )H are (−i/¯ )λ1. λN are real (λ1 ≤ · · · ≤ λN ) • its eigenvectors v1. . vN can be chosen to be orthogonal (and real) from Hv = λv ⇔ (−i/¯ )Hv = (−i/¯ )λv we see: h h • eigenvectors of (−i/¯ )H are same as eigenvectors of H. . . v1. i. . . .Eigenvalues & eigenstates H is symmetric. . . . . . . . so • its eigenvalues λ1. . . .

8 0.3 0.2 0 −0.6 0.8 0.3 0.7 0.2 0. ) h Example: Quantum mechanics 17–9 lowest energy eigenfunctions 0.1 0.5 0. .2 0.e.6 0.6 0.9 1 0.5 0.1 0.6 0. we set ¯ = 1.2 0 0.3 0.8 0.2 0.4 0.4 0.9 1 x • potential bump in middle of inﬁnite potential well • (for this example.3 0.Example Potential Function V (x) 1000 900 800 700 600 V 500 400 300 200 100 0 0 0.5 0.4 0.9 1 0.2 0 −0.7 0.9 1 0.2 0 0. .8 0.2 0.2 0 0.6 0.8 0.7 0.2 0 −0. v3.7 0.5 0.4 0.7 0. v4) Example: Quantum mechanics 17–10 .1 0.1 0. v1.5 0.4 0.2 0 0. v2.1 0. m = 1 .2 0 −0.3 0..2 0.9 1 x • potential V shown as dotted line (scaled to ﬁt plot) • four eigenstates with lowest energy shown (i.

8 0.3 0.06 0.2 0..05 0 0 10 20 30 40 50 60 70 80 90 100 eigenstate ∗ • bottom plot shows |vk Ψ(0)|2.7 0.6 0.1 0.4 0.9 1 0.15 0.2 • with momentum to right (can’t see in plot of |Ψ|2) • (expected) kinetic energy half potential bump height Example: Quantum mechanics 17–11 0.e.04 0.5 x 0.08 0.1 0. resolution of Ψ(0) into eigenstates Example: Quantum mechanics 17–12 • top plot shows initial probability density |Ψ(0)|2 . with initial wave function Ψ(0) • particle near x = 0.02 0 0 0.now let’s look at a trajectory of Ψ. i.

. 40. 80. . then rolls back down • reverses velocity when it hits the wall at left (perfectly elastic collision) • then repeats Example: Quantum mechanics 17–14 . stops. classical solution: • particle rolls half way up potential bump. 320: |Ψ(t)|2 Example: Quantum mechanics 17–13 cf. . for t = 0.time evolution. .

7 0.8 x 10 2 4 t N/2 plot shows probability that particle is in left half of well.8 1 1.2 0.4 1.8 0.1 0.6 0.6 0. i.3 0..9 0.1 0 0 0. versus time t Example: Quantum mechanics k=1 |Ψk (t)|2.2 1.4 0. 17–15 .e.5 0.2 0.6 1.4 0.

tf ] we say input u : [ti. tf ] → Rm steers or transfers state from x(ti) to x(tf ) (over time interval [ti.EE263 Autumn 2010-11 Stephen Boyd Lecture 18 Controllability and state transfer • state transfer • reachable set. tf ]) (subscripts stand for initial and ﬁnal) questions: • where can x(ti) be transfered to at t = tf ? • how quickly can x(ti) be transfered to some xtarget? • how do we ﬁnd a u that transfers x(ti) to x(tf )? • how do we ﬁnd a ‘small’ or ‘eﬃcient’ u that transfers x(ti) to x(tf )? Controllability and state transfer 18–2 . controllability matrix • minimum norm inputs • inﬁnite-horizon minimum norm transfer 18–1 State transfer consider x = Ax + Bu (or x(t + 1) = Ax(t) + Bu(t)) over time interval ˙ [ti.

.. t−1 Rt = τ =0 At−1−τ Bu(τ ) u(0). .e.Reachability consider state transfer from x(0) = 0 to x(t) we say x(t) is reachable (in t seconds or epochs) we deﬁne Rt ⊆ Rn as the set of points reachable in t seconds or epochs for CT system x = Ax + Bu. . t] → Rm and for DT system x(t + 1) = Ax(t) + Bu(t). can reach more points given more time) we deﬁne the reachable set R as the set of points reachable for some t: R= t≥0 Rt Controllability and state transfer 18–4 . u(t − 1) ∈ Rm Controllability and state transfer 18–3 • Rt is a subspace of Rn • Rt ⊆ Rs if t ≤ s (i. . ˙ t Rt = 0 e(t−τ )ABu(τ ) dτ u : [0.

. . . range(Ct) = range(Cn) Controllability and state transfer 18–5 thus we have range(Ct) t < n range(C) t ≥ n where C = Cn is called the controllability matrix Rt = • any state that can be reached can be reached by t = n • the reachable set is R = range(C) Controllability and state transfer 18–6 . . x(t) ∈ Rn u(t − 1) .Reachability for discrete-time LDS DT system x(t + 1) = Ax(t) + Bu(t). x(t) = Ct u(0) where Ct = B AB · · · At−1B so reachable set at t is Rt = range(Ct) by C-H theorem. An−1 hence for t ≥ n. we can express each Ak for k ≥ n as linear combination of A0. .

e.Controllable system system is called reachable or controllable if all states are reachable (i. R = Rn) system is reachable if and only if Rank(C) = n example: x(t + 1) = 0 1 1 0 x(t) + 1 1 1 1 1 1 u(t) controllability matrix is C = hence system is not controllable. x(tf ) = Atf −ti x(ti) + Ctf −ti u(ti) hence can transfer x(ti) to x(tf ) = xdes ⇔ xdes − Atf −ti x(ti) ∈ Rtf −ti • general state transfer reduces to reachability problem • if system is controllable any state transfer can be achieved in ≤ n steps • important special case: driving state to zero (sometimes called regulating or controlling state) Controllability and state transfer 18–8 . . u(tf − 1) . reachable set is R = range(C) = { x | x1 = x2 } Controllability and state transfer 18–7 General state transfer with tf > ti..

= Ct u(0) xdes among all u that steer x(0) = 0 to x(t) = xdes. u(t − 1) must satisfy u(t − 1) . . t − 1 Controllability and state transfer 18–10 .Least-norm input for reachability assume system is reachable. for τ = 0. . . . . = Ct (CtCt )−1xdes . Rank(Ct) = n to steer x(0) = 0 to x(t) = xdes. . inputs u(0). uln(0) uln is called least-norm or minimum energy input that eﬀects state transfer can express as t−1 −1 uln(τ ) = B T (AT )(t−1−τ ) s=0 AsBB T (AT )s xdes. . . the one that minimizes t−1 u(τ ) τ =0 Controllability and state transfer 2 18–9 is given by uln(t − 1) T T . .

. and directions that can be reached only with large inputs) Controllability and state transfer 18–12 . t) gives measure of how hard it is to reach x(t) = xdes from x(0) = 0 (i. t) gives practical measure of controllability/reachability (as function of xdes. is sometimes called minimum energy required to reach x(t) = xdes t−1 Emin = τ =0 uln(τ ) 2 = T T Ct (CtCt )−1xdes T T T Ct (CtCt )−1xdes T = xT (CtCt )−1xdes des t−1 −1 = xT des τ =0 Aτ BB T (AT )τ xdes Controllability and state transfer 18–11 • Emin(xdes.e. how large a u is required) • Emin(xdes. the minimum value of τ =0 u(τ ) 2 required to reach x(t) = xdes. t) • ellipsoid { z | Emin(z. t) ≤ 1 } shows points in state space reachable at t with one unit of energy (shows directions that can be reached with small inputs.t−1 Emin.

t) for z = [1 1]T : 10 9 8 7 6 Emin 5 4 3 2 1 0 0 5 10 15 t 20 25 30 35 Controllability and state transfer 18–14 .e.: takes less energy to get somewhere more leisurely Controllability and state transfer 18–13 example: x(t + 1) = 1.95 0 x(t) + 1 0 u(t) Emin(z. t) ≤ Emin(xdes.75 0.8 −0.Emin as function of t: if t ≥ s then t−1 s−1 A BB (A ) ≥ τ =0 τ =0 τ T T τ Aτ BB T (AT )τ hence t−1 −1 s−1 −1 Aτ BB T (AT )τ τ =0 ≤ τ =0 Aτ BB T (AT )τ so Emin(xdes. s) i.

then P can have nonzero nullspace Controllability and state transfer 18–16 . 3) ≤ 1 5 x2 0 −5 −10 −10 −8 −6 −4 −2 0 x1 2 4 6 8 10 10 Emin (x.e. x(t) = xdes = xT P xdes des if A is stable. and gives the minimum energy required to reach a point xdes (with no limit on t): t−1 min τ =0 u(τ ) 2 x(0) = 0. 10) ≤ 1 5 x2 0 −5 −10 −10 −8 −6 −4 −2 0 x1 2 4 6 8 10 Controllability and state transfer 18–15 Minimum energy over inﬁnite horizon the matrix t−1 −1 P = lim t→∞ A BB (A ) τ =0 τ T T τ always exists..ellipsoids Emin ≤ 1 for t = 3 and t = 10: 10 Emin (x. P > 0 (i. can’t get anywhere for free) if A is not stable.

Rt = R = range(C). the instability carries it out to z eﬃciently) • basis of highly maneuverable. B) • same R as discrete-time system • for continuous-time system. z = 0 means can get to z using u’s with energy as small as you like (u just gives a little kick to the state.• P z = 0. any reachable point can be reached as fast as you like (with large enough u) Controllability and state transfer 18–18 . unstable aircraft Controllability and state transfer 18–17 Continuous-time reachability consider now x = Ax + Bu with x(t) ∈ Rn ˙ reachable set at time t is t Rt = 0 e(t−τ )ABu(τ ) dτ u : [0. where C= B AB · · · An−1B is the controllability matrix of (A. t] → Rm fact: for t > 0.

. An+1. . An−1 and collect powers of A: etA = α0(t)I + α1(t)A + · · · + αn−1(t)An−1 therefore t x(t) = 0 t eτ ABu(t − τ ) dτ n−1 = 0 Controllability and state transfer αi(τ )Ai Bu(t − τ ) dτ i=0 18–19 n−1 t = i=0 Ai B 0 αi(τ )u(t − τ ) dτ = Cz t where zi = 0 αi(τ )u(t − τ ) dτ hence. in terms of A0. . express An. . x(t) is always in range(C) need to show converse: every point in range(C) can be reached Controllability and state transfer 18–20 . . . .ﬁrst let’s show for any u (and x(0) = 0) we have x(t) ∈ range(C) write etA as power series: etA = I + t2 t A + A2 + · · · 1! 2! by C-H.

where δ (k) denotes kth derivative of δ and f ∈ Rm then U (s) = sk f . fn−1 x(0+) = Bf0 + · · · + An−1Bfn−1 = C hence we can reach any point in range(C) (at least. so X(s) = (sI − A)−1Bsk f = s−1I + s−2A + · · · Bsk f = ( sk−1 + · · · + sAk−2 + Ak−1 +s−1Ak + · · · )Bf impulsive terms hence x(t) = impulsive terms + A Bf + A in particular. x(0+) = Ak Bf Controllability and state transfer 18–21 k k+1 t t2 k+2 Bf + A Bf + · · · 1! 2! thus. input u = δ (k)f transfers state from x(0−) = 0 to x(0+) = Ak Bf now consider input of form u(t) = δ(t)f0 + · · · + δ (n−1)(t)fn−1 where fi ∈ Rm by linearity we have f0 . . using impulse inputs) Controllability and state transfer 18–22 .Impulsive inputs suppose x(0−) = 0 and we apply input u(t) = δ (k)(t)f .

. starting from x(0) = 0? • if not. dampers • input is tension between masses • state is x = [y T y T ]T ˙ u(t) u(t) system is 0 0 0 1 0 0 0 0 0 1 x+ x= ˙ 1 −1 1 −1 1 1 −1 1 −1 −1 u • can we maneuver state anywhere. then x(t) ∈ R for all t (no matter what u is) to show this. need to show etAx(0) ∈ R if x(0) ∈ R .can also be shown that any point in range(C) can be reached for any t > 0 using nonimpulsive inputs fact: if x(0) ∈ R. where can we maneuver state? Controllability and state transfer 18–24 . connected by unit springs. y2. Controllability and state transfer 18–23 Example • unit masses at y1. .

precisely the ˙ ˙ diﬀerential motions it’s obvious — internal force does not aﬀect center of mass position or total momentum! Controllability and state transfer 18–25 Least-norm input for reachability (also called minimum energy input) assume that x = Ax + Bu is reachable ˙ we seek u that steers x(0) = 0 to x(t) = xdes and minimizes t u(τ ) 0 2 dτ let’s discretize system with interval h = t/N (we’ll let N → ∞ later) thus u is piecewise constant: u(τ ) = ud(k) for kh ≤ τ < (k + 1)h.. . R = span 0 0 we can reach states with y1 = −y2. . i. .e. y1 = −y2. N − 1 18–26 . Controllability and state transfer k = 0. .controllability matrix is 0 1 −2 2 0 −1 2 −2 = 1 −2 2 0 −1 2 −2 0 0 0 1 −1 C= B AB A2B A3B hence reachable set is 1 −1 .

so this is approximately (t/N )B T e(t−τ )A T similarly N −1 T Ai BdBd (AT )i d d i=0 N −1 T e(ti/N )ABdBd e(ti/N )A i=0 t T = ≈ (t/N ) 0 ¯ etABB T etA dt ¯ ¯ T for large N Controllability and state transfer 18–28 .so x(t) = where Ad = ehA. . Bd ≈ (t/N )B. ud(0) eτ A dτ B least-norm ud that yields x(t) = xdes is N −1 −1 T Ai BdBd (AT )i d d i=0 udln(k) = T Bd (AT )(N −1−k) d xdes let’s express in terms of A: T T Bd (AT )(N −1−k) = Bd e(t−τ )A d Controllability and state transfer T 18–27 where τ = t(k + 1)/N for N large. Bd = 0 Bd AdBd · · · AN −1Bd d h ud(N − 1) .

but get larger u • cf. B) controllable ⇔ Q(t) > 0 for all t > 0 ⇔ Q(s) > 0 for some s > 0 in fact. this is the least-norm continuous input • can make t small. 0≤τ ≤t hence. DT solution: sum becomes integral Controllability and state transfer 18–29 min energy is 0 t uln(τ ) where Q(t) = 2 dτ = xT Q(t)−1xdes des t eτ ABB T eτ A dτ 0 T can show (A.hence least-norm discretized input is approximately uln(τ ) = B e for large N T (t−τ )AT 0 t −1 e BB e ¯ tA ¯ T tAT ¯ dt xdes. range(Q(t)) = R for any t > 0 Controllability and state transfer 18–30 .

Minimum energy over inﬁnite horizon the matrix P = lim t→∞ 0 t −1 e τA BB e T τ AT dτ always exists.e. then P can have nonzero nullspace • P z = 0. tf > ti since x(tf ) = e(tf −ti)Ax(ti) + u steers x(ti) to x(tf ) = xdes ⇔ u (shifted by ti) steers x(0) = 0 to x(tf − ti) = xdes − e(tf −ti)Ax(ti) • general state transfer reduces to reachability problem • if system is controllable. x(t) = xdes = xT P xdes des • if A is stable.. and gives minimum energy required to reach a point xdes (with no limit on t): t min 0 u(τ ) 2 dτ x(0) = 0. z = 0 means can get to z using u’s with energy as small as you like (u just gives a little kick to the state. the instability carries it out to z eﬃciently) Controllability and state transfer 18–31 General state transfer consider state transfer from x(ti) to x(tf ) = xdes. P > 0 (i. any state transfer can be eﬀected – in ‘zero’ time with impulsive inputs – in any positive time with non-impulsive inputs tf ti e(tf −τ )ABu(τ ) dτ Controllability and state transfer 18–32 . can’t get anywhere for free) • if A is not stable.

Example u1 u1 u2 u2 • unit masses.3 • x= y y ˙ 18–33 Controllability and state transfer system is: 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 −2 1 0 −2 1 0 1 −2 1 1 −2 1 0 1 −2 0 1 −2 0 0 0 0 0 0 1 0 −1 1 0 −1 x= ˙ x + u1 u2 Controllability and state transfer 18–34 .2. dampers • u1 is force between 1st & 2nd masses • u2 is force between 2nd & 3rd masses • y ∈ R3 is displacement of masses 1. springs.

5 3 3.5 1 1.e. control initial state e1 to zero at t = tf tf Emin = 0 uln(τ ) 2 dτ vs.5 4 4. u = uln is: 5 u1(t) 0 −5 0 0.5 2 2.5 t 5 u2(t) 0 −5 0 0.5 1 1..5 2 2.5 3 3. tf : 50 45 40 35 30 Emin 25 20 15 10 5 0 0 2 4 6 8 10 12 tf Controllability and state transfer 18–35 for tf = 3.5 4 4.5 t Controllability and state transfer 18–36 .steer state from x(0) = e1 to x(tf ) = 0 i.

5 3 3.4 0.2 −0.and for tf = 4: 5 u1(t) 0 −5 0 0.5 2 2.5 4 4.5 3 3.8 0.2 0 −0.5 1 1.6 y1(t) 0.5 t Controllability and state transfer 18–37 output y1 for u = 0: 1 0.5 2 2.5 1 1.5 4 4.5 t 5 u2(t) 0 −5 0 0.4 0 2 4 6 8 10 12 14 t Controllability and state transfer 18–38 .

2 −0.4 0 0.5 1 1.5 1 1.4 0 0.2 −0.4 0.5 2 2.output y1 for u = uln with tf = 3: 1 0.5 4 4.2 0 −0.6 y1(t) 0.5 4 4.8 0.6 y1(t) 0.5 t Controllability and state transfer 18–39 output y1 for u = uln with tf = 4: 1 0.8 0.5 t Controllability and state transfer 18–40 .2 0 −0.5 3 3.5 3 3.4 0.5 2 2.

• w is state disturbance or noise • v is sensor noise or error • A.EE263 Autumn 2010-11 Stephen Boyd Lecture 19 Observability and state estimation • state estimation • discrete-time observability • observability – controllability duality • observers for noiseless case • least-squares observers • example • continuous-time observability 19–1 State estimation set up we consider the discrete-time system x(t + 1) = Ax(t) + Bu(t) + w(t). C.. or assumed small (e. and D are known • u and y are observed over time interval [0. B. t − 1] • w and v are not known.g. but can be described statistically. in RMS value) y(t) = Cx(t) + Du(t) + v(t) Observability and state estimation 19–2 .

predict) next state an algorithm or system that yields an estimate x(s) is called an observer or ˆ state estimator x(s) is denoted x(s|t − 1) to show what information estimate is based on ˆ ˆ (read.. . . . . with no state or measurement noise: x(t + 1) = Ax(t) + Bu(t). . . u(t − 1). with x(t) ∈ Rn. . u(t) ∈ Rm.e. . y(0). y(t) ∈ Rp y(t) = Cx(t) + Du(t) then we have y(0) u(0) . y(t − 1) u(t − 1) Observability and state estimation 19–4 .State estimation problem state estimation problem: estimate x(s) from u(0). = Otx(0) + Tt . . y(t − 1) • s = 0: estimate initial state • s = t − 1: estimate current state • s = t: estimate (i. . “ˆ(s) given t − 1”) x Observability and state estimation 19–3 Noiseless case let’s look at ﬁnding x(0).

. t−2 CA B 0 D CAt−3B ··· 0 ··· D · · · CB • Ot maps initials state into resulting output over [0. output is zero over interval [0. t−1 CA D CB Tt = . . . . − Tt . x(0) is to be determined Observability and state estimation 19–5 hence: • can uniquely determine x(0) if and only if N (Ot) = {0} • N (Ot) gives ambiguity in determining x(0) • if x(0) ∈ N (Ot) and u = 0. Otx(0) = y(t − 1) u(t − 1) RHS is known.where C CA . t − 1] • input u does not aﬀect ability to determine x(0). Ot = . t − 1] • Tt maps input to output over [0. t − 1] hence we have y(0) u(0) . its eﬀect can be subtracted out Observability and state estimation 19–6 .

. . Rank(O) = n Observability and state estimation 19–7 Observability – controllability duality ˜ ˜ ˜ ˜ let (A. D). D) be dual of system (A. . B. . n−1 CA is called the observability matrix if x(0) can be deduced from u and y over [0. t − 1] for any t. each Ak is linear combination of A0. ˜ A = AT .Observability matrix by C-H theorem. ˜ C = BT . C. n − 1] N (O) is called unobservable subspace.e. . . i. then x(0) can be deduced from u and y over [0. i. N (Ot) = N (O) where C CA O = On = .e. ˜ D = DT controllability matrix of dual system is ˜ ˜ ˜˜ ˜ ˜ C = [B AB · · · An−1B] = OT .. B. describes ambiguity in determining state from input and output system is called observable if N (O) = {0}. ˜ B = CT . = [C T AT C T · · · (AT )n−1C T ] transpose of observability matrix ˜ similarly we have O = C T Observability and state estimation 19–8 . An−1 hence for t ≥ n. C.

x(τ − t + 1) = F y(τ ) u(τ ) Observability and state estimation 19–10 .e. . system is observable (controllable) if and only if dual system is controllable (observable) in fact.. y over [0.thus. − Tt . F Ot = I then we have the observer y(0) u(0) . − Tt . . t − 1] in fact we have y(τ − t + 1) u(τ − t + 1) . unobservable subspace is orthogonal complement of controllable subspace of dual ˜ N (O) = range(OT )⊥ = range(C)⊥ Observability and state estimation 19–9 Observers for noiseless case suppose Rank(Ot) = n (i. . i.. . x(0) = F y(t − 1) u(t − 1) which deduces x(0) (exactly) from u.e.. system is observable) and let F be any left inverse of Ot. i.e.

. multi-output) ﬁnite impulse response (FIR) ﬁlter. . given past t − 1 inputs & outputs observer is (multi-input.e. . with inputs u and y.. . n − 2. n − 1 evidently CAk (Az) = 0 for k = 0. and output x ˆ Observability and state estimation 19–11 Invariance of unobservable set fact: the unobservable subspace N (O) is invariant. i. .e. CAk z = 0 for k = 0. if z ∈ N (O).e. . n−1 CA n−1 (Az) = CA z = − n αiCAiz = 0 i=0 (by C-H) where det(sI − A) = sn + αn−1sn−1 + · · · + α0 Observability and state estimation 19–12 . . i. . our observer estimates what state was t − 1 epochs ago.. then Az ∈ N (O) proof: suppose z ∈ N (O)..i.

n−2 CA B 0 D = Ox + T u u ˙ . u(n−1) D ··· 0 ··· CAn−3B · · · CB (same matrices we encountered in discrete-time case!) Observability and state estimation 19–14 . y (n−1) where O is the observability matrix and D CB T = .Continuous-time observability continuous-time system with no sensor or state noise: x = Ax + Bu. ˙ y = Cx + Du can we deduce state x from u and y? let’s look at derivatives of y: y y ˙ = Cx + Du = C x + Du = CAx + CBu + Du ˙ ˙ ˙ y = CA2x + CABu + CB u + D¨ ¨ ˙ u and so on Observability and state estimation 19–13 hence we have y y ˙ . . . .

y (n−1)(t) • derivative-based state reconstruction is dual of state transfer using impulsive inputs Observability and state estimation 19–16 . . u(n−1) hence if N (O) = {0} we can deduce x(t) from derivatives of u(t). y (n−1) −T u u ˙ . . . y (n−1) −T u u ˙ . . .rewrite as RHS is known. . x is to be determined Ox = y y ˙ . . u(n−1) 19–15 • reconstructs x(t) (exactly and instantaneously) from u(t). y(t). . u(n−1)(t). . . . . y(t) up to order n − 1 in this case we say system is observable can construct an observer using any left inverse F of O: x = F Observability and state estimation y y ˙ .

. with x. y consistent with both state trajectories x. system is observable) least-squares observer uses pseudo-inverse: y(0) u(0) † . .e. and u is any input.e. ˜ x y = C x + Du ˜ i. no signal processing of any kind applied to u and y can deduce x unobservable subspace N (O) gives fundamental ambiguity in deducing x from u. x(0) = Ot ˆ y(t − 1) u(t − 1) † T where Ot = Ot Ot Observability and state estimation −1 T Ot 19–18 . .A converse suppose z ∈ N (O) (the unobservable subspace). ˙ y = Cx + Du then state trajectory x = x + etAz satisﬁes ˜ ˙ x = A˜ + Bu. y the corresponding state and output. i. x ˜ hence if system is unobservable.. x = Ax + Bu. − Tt . input/output signals u. with sensor noise: x(t + 1) = Ax(t) + Bu(t). y(t) = Cx(t) + Du(t) + v(t) we assume Rank(Ot) = n (hence. y Observability and state estimation 19–17 Least-squares observers discrete-time system.

. t−1 measured as τ =0 y (τ ) − y(τ ) ˆ 2 can express least-squares initial state estimate as t−1 −1 t−1 xls(0) = ˆ (A ) C CA τ =0 T τ T τ τ =0 (AT )τ C T y (τ ) ˜ where y is observed output with portion due to input subtracted: ˜ y = y − h ∗ u where h is impulse response ˜ Observability and state estimation 19–19 Least-squares observer uncertainty ellipsoid † since Ot Ot = I.e. we have v(0) † . and • output y that was observed. observer recovers exact state in noiseless case) now assume sensor noise is unknown. xls(0) = x(0) if sensor noise is zero ˆ (i. . 1 v(τ ) t τ =0 t−1 2 ≤ α2 Observability and state estimation 19–20 .interpretation: xls(0) minimizes discrepancy between ˆ • output y that would be observed. x(0) = xls(0) − x(0) = Ot ˜ ˆ v(t − 1) where x(0) is the estimation error of the initial state ˜ in particular. with input u and initial state x(0) ˆ (and no sensor noise). but has RMS value ≤ α.

. can’t estimate initial state perfectly even with inﬁnite number of measurements u(t). .. .e. initial state estimation error gets arbitrarily small (at least in some directions) as more and more of signals u and y are observed Observability and state estimation 19–22 . . Ot = v(t − 1) 1 v(τ ) t τ =0 t−1 2 x(0) ∈ Eunc ˜ ≤ α2 Eunc is ‘uncertainty ellipsoid’ for x(0) (least-square gives best Eunc) shape of uncertainty ellipsoid determined by matrix T Ot Ot −1 t−1 −1 = τ =0 (AT )τ C T CAτ maximum norm of error is √ † xls(0) − x(0) ≤ α t Ot ˆ Observability and state estimation 19–21 Inﬁnite horizon uncertainty ellipsoid the matrix t−1 −1 P = lim t→∞ (AT )τ C T CAτ τ =0 always exists. P > 0 i.e. ) • if A is not stable. t = 0. y over longer and longer periods: • if A is stable.set of possible estimation errors is ellipsoid v(0) † . and gives the limiting uncertainty in estimating x(0) from u. y(t).. . . (since memory of x(0) fades . then P can have nonzero nullspace i.

once per second • range noises IID N (0.04 0. can assume RMS value of v is not much more than 2 • no assumptions about initial position & velocity particle range sensors problem: estimate initial position & velocity from range measurements Observability and state estimation 19–23 express as linear system 1 0 x(t + 1) = 0 0 0 1 0 0 1 0 1 0 0 1 x(t). 0 1 T k1 .03) Observability and state estimation 19–24 . y(t) = .Example • particle in R2 moves with uniform velocity • (linear. noisy) range measurements from directions −15◦. 30◦. x4(t)) is velocity of particle • can assume RMS value of v is around 2 • ki is unit vector from sensor i to origin true initial position & velocities: x(0) = (1 − 3 − 0. x(t) + v(t) T k4 • (x1(t). 1). 0◦. x2(t)) is position of particle • (x3(t). 20◦.

range measurements (& noiseless versions):

5

measurements from sensors 1 − 4

4

3

2

1

range

0

−1

−2

−3

−4

−5

0

20

40

60

80

100

120

t

Observability and state estimation

19–25

• estimate based on (y(0), . . . , y(t)) is x(0|t) ˆ

• actual RMS position error is

(ˆ1(0|t) − x1(0))2 + (ˆ2(0|t) − x2(0))2 x x

**(similarly for actual RMS velocity error)
**

Observability and state estimation 19–26

RMS position error

1.5

1

0.5

0

10

20

30

40

50

60

70

80

90

100

110

120

RMS velocity error

1 0.8 0.6 0.4 0.2 0

10

20

30

40

50

60

70

80

90

100

110

120

t

Observability and state estimation

19–27

**Continuous-time least-squares state estimation
**

assume x = Ax + Bu, y = Cx + Du + v is observable ˙ least-squares estimate of initial state x(0), given u(τ ), y(τ ), 0 ≤ τ ≤ t: choose xls(0) to minimize integral square residual ˆ

t

J=

0

y (τ ) − Ceτ Ax(0) ˜

2

dτ

**where y = y − h ∗ u is observed output minus part due to input ˜ let’s expand as J = x(0)T Qx(0) + 2rT x(0) + s,
**

t t

Q=

0

e

τ AT

C Ce

T

τA

dτ,

t

r=

0

˜ eτ A C T y (τ ) dτ,

T

s=

0

Observability and state estimation

y (τ )T y (τ ) dτ ˜ ˜

19–28

**setting ∇x(0)J to zero, we obtain the least-squares observer
**

t −1

xls(0) = Q ˆ

−1

r=

0

e

τ AT

t 0

C Ce

T

τA

dτ

eA τ C T y (τ ) dτ ˜

T

estimation error is

t −1

x(0) = xls(0) − x(0) = ˜ ˆ

e

0

τ AT

t 0

C Ce

T

τA

dτ

eτ A C T v(τ ) dτ

T

therefore if v = 0 then xls(0) = x(0) ˆ

Observability and state estimation

19–29

EE263 Autumn 2010-11

Stephen Boyd

Lecture 20 Some parting thoughts . . .

• linear algebra • levels of understanding • what’s next?

20–1

Linear algebra

• comes up in many practical contexts (EE, ME, CE, AA, OR, Econ, . . . ) • nowadays is readily done cf. 10 yrs ago (when it was mostly talked about)

• Matlab or equiv for fooling around • real codes (e.g., LAPACK) widely available • current level of linear algebra technology: – 500 – 1000 vbles: easy with general purpose codes – much more possible with special structure, special codes (e.g., sparse, convolution, banded, . . . )

Some parting thoughts . . . 20–2

Levels of understanding

Simple, intuitive view: • 17 vbles, 17 eqns: usually has unique solution • 80 vbles, 60 eqns: 20 extra degrees of freedom Platonic view: • singular, rank, range, nullspace, Jordan form, controllability • everything is precise & unambiguous • gives insight & deeper understanding • sometimes misleading in practice

Some parting thoughts . . . 20–3

Quantitative view: • based on ideas like least-squares, SVD • gives numerical measures for ideas like singularity, rank, etc. • interpretation depends on (practical) context • very useful in practice

Some parting thoughts . . .

20–4

• must have understanding at one level before moving to next • never forget which level you are operating in

Some parting thoughts . . .

20–5

What’s next?

• EE364a — convex optimization I (Win 10-11) • EE364b — convex optimization II (Spr 10-11) (plus lots of other EE, CS, ICME, MS&E, Stat, ME, AA courses on signal processing, control, graphics & vision, machine learning, computational geometry, numerical linear algebra, . . . )

Some parting thoughts . . .

20–6

EE263

Prof. S. Boyd

Basic Notation

Basic set notation the set with elements a1 , . . . , ar . a is in the set S. the sets S and T are equal, i.e., every element of S is in T and every element of T is in S. S⊆T the set S is a subset of the set T , i.e., every element of S is also an element of T . ∃a ∈ S P(a) there exists an a in S for which the property P holds. ∀x ∈ S P(a) property P holds for every element in S. {a ∈ S | P(a)} the set of all a in S for which P holds (the set S is sometimes omitted if it can be determined from context). A∪B union of sets, A ∪ B = {x | x ∈ A or x ∈ B}. A∩B intersection of sets, A ∩ B = {x | x ∈ A and x ∈ B}. A×B Cartesian product of two sets, A × B = {(a, b) | a ∈ A, b ∈ B}. Some speciﬁc sets R Rn R1×n Rm×n j i the set of real numbers. the set of real n-vectors (n × 1 matrices). the set of real n-row-vectors (1 × n matrices). the set of real m × n matrices. √ can mean √ −1, in the company of electrical engineers. can mean −1, for normal people; i is the polite term in mixed company (i.e., when non-electrical engineers are present). C, Cn , Cm×n the set of complex numbers, complex n-vectors, complex m × n matrices. Z the set of integers: Z = {. . . , −1, 0, 1, . . .}. R+ the nonnegative real numbers, i.e., R+ = {x ∈ R | x ≥ 0}. [a, b], (a, b], [a, b), (a, b) the real intervals {x | a ≤ x ≤ b}, {x | a < x ≤ b}, {x | a ≤ x < b}, and {x | a < x < b}, respectively. {a1 , . . . , ar } a∈S S=T

1

Vectors and matrices We use square brackets [ and ] to construct matrices and vectors, with white space delineating the entries in a row, and a new line indicating a new row. For example [1 2] is a row vector 1 2 3 is matrix in R2×3 . [1 2]T denotes a column vector, i.e., an element in R1×2 , and 4 5 6 of R2×1 , which we abbreviate as R2 . We use curved brackets ( and ) surrounding lists of entries, delineated by commas, as an alternative method to construct (column) vectors. Thus, we have three ways to denote a column vector: 1 (1, 2) = [1 2]T = . 2 Note that in our notation scheme (which is fairly standard), [1, 2, 3] and (1 2 3) aren’t used. We also use square and curved brackets to construct block matrices and vectors. For example if x, y, z ∈ Rn , we have x y z ∈ Rn×3 , a matrix with columns x, y, and z. We can construct a block vector as x (x, y, z) = y ∈ R3n . z Functions The notation f : A → B means that f is a function on the set A into the set B. The notation b = f (a) means b is the value of the function f at the point a, where a ∈ A and b ∈ B. The set A is called the domain of the function f ; it can thought of as the set of legal parameter values that can be passed to the function f . The set B is called the codomain (or sometimes range) of the function f ; it can thought of as a set that contains all possible returned values of the function f . There are several ways to think of a function. The formal deﬁnition is that f is a subset of A × B, with the property that for every a ∈ A, there is exactly one b ∈ B such that (a, b) ∈ f . We denote this as b = f (a). Perhaps a better way to think of a function is as a black box or (software) function or subroutine. The domain is the set of all legal values (or data types or structures) that can be passed to f . The codomain of f gives the data type or data structure of the values returned by f . Thus f (a) is meaningless if a ∈ A. If a ∈ A, then b = f (a) is an element of B. 
Also note that the function is denoted f ; it is wrong to say 'the function f (a)' (since f (a) is an element of B, not a function). Having said that, we do sometimes use sloppy notation such as 'the function f (t) = t3 '. To say this more clearly you could say 'the function f : R → R defined by f (t) = t3 for t ∈ R'.

Examples

• −0.1 ∈ R, √2 ∈ R+ , 1 − 2j ∈ C (with j = √−1).

• The matrix

  A = [ 0.3 6.1 −0.12 ]
      [ 7.2 0    0.01 ]

  is an element of R2×3 . We can define a function f : R3 → R2 as f (x) = Ax for any x ∈ R3 . If x ∈ R3 , then f (x) is a particular vector in R2 . We can say 'the function f is linear'. To say 'the function f (x) is linear' is technically wrong, since f (x) is a vector, not a function. Similarly, we can't say 'A is linear'; it is just a matrix.

• We can define a function f : {a ∈ R | a ≠ 0} × Rn → Rn by f (a, x) = (1/a)x, for any a ∈ R, a ≠ 0, and any x ∈ Rn . The function f could be informally described as division of a vector by a nonzero scalar.

• Consider the set A = {0, −1, 3.2}. The elements of A are 0, −1 and 3.2. Therefore, for example, −1 ∈ A and {0, 3.2} ⊆ A. Also, we can say that ∀x ∈ A, −1 ≤ x ≤ 4, or ∃x ∈ A, x > 3.

• Suppose A = {1, −1}. Another representation for A is A = {x ∈ R | x2 = 1}.

• Suppose A = {1, −2, 0} and B = {3, −2}. Then A ∪ B = {1, −2, 0, 3} and A ∩ B = {−2}.

• Suppose A = {1, −2, 0} and B = {1, 3}. Then A × B = {(1, 1), (1, 3), (−2, 1), (−2, 3), (0, 1), (0, 3)}.

• f : R → R with f (x) = x2 − x defines a function from R to R, while u : R+ → R2 with

  u(t) = [ t cos t ]
         [ 2e−t    ]

  defines a function from R+ to R2 .
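The linear-function example above is easy to try numerically; here is a NumPy sketch (NumPy is not part of these notes; the matrix is the one from the example):

```python
import numpy as np

A = np.array([[0.3, 6.1, -0.12],
              [7.2, 0.0,  0.01]])   # A in R^{2x3}

def f(x):
    """The linear function f : R^3 -> R^2 given by f(x) = Ax."""
    return A @ x

x = np.array([1.0, 0.0, 0.0])
y = f(x)    # a particular vector in R^2 (here, the first column of A)
```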


A primer on matrices

Stephen Boyd September 27, 2011

These notes describe the notation of matrices, the mechanics of matrix manipulation, and how to use matrices to formulate and solve sets of simultaneous linear equations. We won’t cover • linear algebra, i.e., the underlying mathematics of matrices • numerical linear algebra, i.e., the algorithms used to manipulate matrices and solve linear equations • software for forming and manipulating matrices, e.g., Matlab, Mathematica, or Octave • how to represent and manipulate matrices, or solve linear equations, in computer languages such as C/C++ or Java • applications, for example in statistics, mechanics, economics, circuit analysis, or graph theory

**1 Matrix terminology and notation
**

Matrices

A matrix is a rectangular array of numbers (also called scalars), written between square brackets, as in

A = [ 0    1  −2.3 0.1 ]
    [ 1.3  4  −0.1 0   ]
    [ 4.1 −1   0   1.7 ]

An important attribute of a matrix is its size or dimensions, i.e., the numbers of rows and columns. The matrix A above, for example, has 3 rows and 4 columns, so its size is 3 × 4. (Size is always given as rows × columns.) A matrix with m rows and n columns is called an m × n matrix. An m × n matrix is called square if m = n, i.e., if it has an equal number of rows and columns. Some authors refer to an m × n matrix as fat if m < n (fewer rows than columns), or skinny if m > n (more rows than columns). The matrix A above is fat.

The entries or coefficients of a matrix are the values in the array. The i, j entry is the value in the ith row and jth column, denoted by double subscripts: the i, j entry of a matrix C is denoted Cij (which is a number). The positive integers i and j are called the (row and column, respectively) indices. For our example above, A13 = −2.3 and A32 = −1. The row index of the bottom left entry (which has value 4.1) is 3; its column index is 1.

Two matrices are equal if they are the same size and all the corresponding entries (which are numbers) are equal.

Vectors and scalars

A matrix with only one column, i.e., with size n × 1, is called a column vector or just a vector. Sometimes the size is specified by calling it an n-vector. The entries of a vector are denoted with just one subscript (since the other is 1), as in a3 . The entries are sometimes called the components of the vector, and the number of rows of a vector is sometimes called its dimension. As an example,

v = (1, −2, 3.3, 0.3)

is a 4-vector (or 4 × 1 matrix, or vector of dimension 4); its third component is v3 = 3.3. Similarly, a matrix with only one row, i.e., with size 1 × n, is called a row vector. As an example,

w = [ −2.1 −3 0 ]

is a row vector (or 1 × 3 matrix); its third component is w3 = 0.

Sometimes a 1 × 1 matrix is considered to be the same as an ordinary number. In the context of matrices and scalars, ordinary numbers are often called scalars.

Notational conventions for matrices, vectors, and scalars

Some authors try to use notation that helps the reader distinguish between matrices, vectors, and scalars (numbers). For example, Greek letters (α, β, . . . ) might be used for numbers, lower-case letters (a, x, f , . . . ) for vectors, and capital letters (A, F , . . . ) for matrices. Other notational conventions include matrices given in bold font (G), or vectors written with arrows above them. Unfortunately, there are about as many notational conventions as authors, so you should be prepared to figure out what things are (i.e., scalars, vectors, matrices) despite the author's notational scheme (if any exists).
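As an aside, note that entry indexing in this notation is one-based, while most matrix software (e.g., NumPy, shown here purely for illustration) is zero-based — a common source of off-by-one errors:

```python
import numpy as np

A = np.array([[0,    1, -2.3, 0.1],
              [1.3,  4, -0.1, 0  ],
              [4.1, -1,  0,   1.7]])   # the example matrix above

m, n = A.shape     # size is rows x columns: 3 x 4
a13 = A[0, 2]      # the 1,3 entry A_{13} = -2.3 lives at zero-based [0, 2]
a32 = A[2, 1]      # the 3,2 entry A_{32} = -1 lives at [2, 1]
```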
Zero and identity matrices

The zero matrix (of size m × n) is the matrix with all entries equal to zero. Sometimes the zero matrix is written as 0m×n , where the subscript denotes the size. But often, a zero matrix is denoted just 0, the same symbol used to denote the number 0. In this case you'll have to figure out the size of the zero matrix from the context. (More on this later.) When a zero matrix is a (row or column) vector, we call it a zero (row or column) vector.

Note that zero matrices of different sizes are different matrices, even though we use the same symbol to denote them (i.e., 0). In programming we call this overloading: we say the symbol 0 is overloaded because it can mean different things depending on its context (i.e., the equation it appears in).

An identity matrix is another common matrix. It is always square, i.e., has the same number of rows as columns. Its diagonal entries, i.e., those with equal row and column index, are all equal to one, and its off-diagonal entries (those with unequal row and column indices) are zero. Identity matrices are denoted by the letter I. Formally, the identity matrix of size n is defined by

Iij = 1 if i = j,   0 if i ≠ j.

Perhaps more illuminating are the examples

[ 1 0 ]      [ 1 0 0 0 ]
[ 0 1 ]      [ 0 1 0 0 ]
             [ 0 0 1 0 ]
             [ 0 0 0 1 ]

which are the 2 × 2 and 4 × 4 identity matrices. (Remember that both are denoted with the same symbol, I.) The importance of the identity matrix will become clear later. Sometimes a subscript denotes the size, as in I4 or I2×2 . But more often the size must be determined from context (just like zero matrices). Note that the n columns of the n × n identity matrix are the n unit n-vectors, defined next.

Unit and ones vectors

A vector with one component one and all others zero is called a unit vector. The ith unit vector, whose ith component is 1 and all others are zero, is usually denoted ei . The three unit 3-vectors are

e1 = (1, 0, 0),   e2 = (0, 1, 0),   e3 = (0, 0, 1).

Another term for ei is ith standard basis vector. As with zero or identity matrices, you usually have to figure out the dimension of a unit vector from context. Also, you should watch out, because some authors use the term 'unit vector' to mean a vector of length one. (We'll explain that later.)

Another common vector is the one with all components one, sometimes called the ones vector, and denoted 1 (by some authors) or e (by others). For example, the 4-dimensional ones vector is 1 = (1, 1, 1, 1).

**2 Matrix operations
**

Matrices can be combined using various operations to form other matrices.

Matrix addition

Two matrices of the same size can be added together, by adding the corresponding entries (which are numbers), to form another matrix (of the same size). Matrix addition is denoted by the symbol +. (Thus the symbol + is overloaded to mean scalar addition when scalars appear on its left and right hand sides, and matrix addition when matrices appear on its left and right hand sides.) For example,

[ 0 4 ]   [ 1 2 ]   [ 1 6 ]
[ 7 0 ] + [ 2 3 ] = [ 9 3 ]
[ 3 1 ]   [ 0 4 ]   [ 3 5 ]

A pair of row or column vectors of the same size can be added, but you cannot add together a row vector and a column vector (except when they are both scalars!). Matrix subtraction is similar. As an example,

[ 1 6 ]       [ 0 6 ]
[ 9 3 ] − I = [ 9 2 ]

Note that this gives an example where you have to figure out what size the identity matrix is: since you can only add (or subtract) matrices of the same size, we conclude that I must refer to a 2 × 2 identity matrix.

Matrix addition is commutative, i.e., if A and B are matrices of the same size, then A + B = B + A. It's also associative, i.e., (A + B) + C = A + (B + C), so we write both as A + B + C. We always have A + 0 = 0 + A = A, i.e., adding the zero matrix to a matrix has no effect. (This is another example where you have to figure out the exact dimensions of the zero matrix from context: the zero matrix must have the same dimensions as A, otherwise they could not be added.)

Matrix transpose

If A is an m × n matrix, its transpose, denoted AT (or sometimes A′ ), is the n × m matrix given by (AT )ij = Aji . In words, the rows and columns of A are transposed in AT . For example,

[ 0 4 ]T
[ 7 0 ]   = [ 0 7 3 ]
[ 3 1 ]     [ 4 0 1 ]

Transposition converts row vectors into column vectors, and vice versa. If we transpose a matrix twice, we get back the original matrix: (AT )T = A.

Scalar multiplication

Another operation is scalar multiplication: multiplying a matrix by a scalar (i.e., number), which is done by multiplying every entry of the matrix by the scalar. Scalar multiplication is usually denoted by juxtaposition, with the scalar on the left, as in

     [ 1 6 ]   [ −2  −12 ]
(−2) [ 9 3 ] = [ −18 −6  ]
     [ 6 0 ]   [ −12  0  ]

Sometimes you see scalar multiplication with the scalar on the right, or even scalar division with the scalar shown in the denominator (which just means scalar multiplication by one over the scalar), as in

[ 1 6 ]       [ 2  12 ]      [ 9 6 ]       [ 3 2 ]
[ 9 3 ] · 2 = [ 18 6  ],     [ 6 0 ] / 3 = [ 2 0 ]
[ 6 0 ]       [ 12 0  ]      [ 3 3 ]       [ 1 1 ]

but I think these look pretty ugly.

Scalar multiplication obeys several laws you can figure out for yourself, e.g., if A is any matrix and α, β are any scalars, then (α + β)A = αA + βA. It's a useful exercise to identify the symbols appearing in this formula: the + symbol on the left is addition of scalars, while the + symbol on the right denotes matrix addition. Another simple property is (αβ)A = (α)(βA): on the left hand side we see scalar-scalar multiplication (αβ) and scalar-matrix multiplication; on the right we see two cases of scalar-matrix multiplication. Note that 0 · A = 0 (where the lefthand zero is the scalar zero, and the righthand zero is a matrix zero of the same size as A).
e. as in −2 −12 1 6 (−2) 9 3 = −18 −6 . . β are any scalars. To ﬁnd the i. Note that 0 · A = 0 (where the lefthand zero is the scalar zero. j entry of the product C = AB. and the righthand zero is a matrix zero of the same size as A).e. The + symbol on the left is addition of scalars. . 12 0 6 0 9 6 9 6 0 3 3 = 3 2 3 2 0 1 . which is done by multiplying every entry of the matrix by the scalar. i. −12 0 6 0 Sometimes you see scalar multiplication with the scalar on the right. or even scalar division with the scalar shown in the denominator (which just means scalar multiplication by one over the scalar). The 5 . which means the number of columns of A (i. as in 2 12 1 6 9 3 · 2 = 18 6 . e. i = 1.e. . Suppose A and B are compatible. . but I think these look pretty ugly. number). where α and β are scalars and A is a matrix. . Scalar multiplication is usually denoted by juxtaposition. you need to know the ith row of A and the jth column of B. m. on the right we see two cases of scalar-matrix multiplication.. You can multiply two matrices A and B provided their dimensions are compatible.e. if A is any matrix and α. . ... . which has size m × n. while the + symbol on the right denotes matrix addition. then (α + β)A = αA + βA. its width) equals the number of rows of B (i.Scalar multiplication Another operation is scalar multiplication: multiplying a matrix by a scalar (i. n. j = 1.. its height).g. Scalar multiplication obeys several laws you can ﬁgure out for yourself. but there are several ways to remember it. Matrix multiplication It’s also possible to multiply two matrices using matrix multiplication. Another simple property is (αβ)A = (α)(βA). A has size m × p and B has size p × n.. This rule looks complicated. It’s a useful exercise to identify the symbols appearing in this formula. with the scalar on the left. On the left hand side we see scalar-scalar multiplication (αβ) and scalar-matrix multiplication. is deﬁned by p Cij = k=1 aik bkj = ai1 b1j + · · · + aip bpj . 
Matrix multiplication

It's also possible to multiply two matrices using matrix multiplication. You can multiply two matrices A and B provided their dimensions are compatible, which means the number of columns of A (i.e., its width) equals the number of rows of B (i.e., its height). Suppose A and B are compatible, i.e., A has size m × p and B has size p × n. The product matrix C = AB, which has size m × n, is defined by

Cij = ai1 b1j + · · · + aip bpj ,   i = 1, . . . , m,   j = 1, . . . , n,

where the sum runs over k = 1, . . . , p in the terms aik bkj . This rule looks complicated, but there are several ways to remember it. To find the i, j entry of the product C = AB, you need to know the ith row of A and the jth column of B. The summation above can be interpreted as 'moving left to right along the ith row of A' while moving 'top to bottom' down the jth column of B. As you go, you keep a running sum of the product of the corresponding entries from A and B.

As an example, let's find the product C = AB, where

A = [ 1  2 3 ]      B = [  0 −3 ]
    [ −1 0 4 ],         [  2  1 ]
                        [ −1  0 ]

First, we check that they are compatible: A has three columns, and B has three rows, so they're compatible. The product matrix C will have two rows (the number of rows of A) and two columns (the number of columns of B). Now let's find the entries of the product C. To find the 1, 1 entry, we move across the first row of A and down the first column of B, summing the products of corresponding entries:

C11 = (1)(0) + (2)(2) + (3)(−1) = 1.

In each product term here, the lefthand number comes from the first row of A, and the righthand number comes from the first column of B. To find the 1, 2 entry, we move across the first row of A and down the second column of B:

C12 = (1)(−3) + (2)(1) + (3)(0) = −1.

Two more similar calculations give us the remaining entries C21 and C22 :

[ 1  2 3 ] [  0 −3 ]   [  1 −1 ]
[ −1 0 4 ] [  2  1 ] = [ −4  3 ]
           [ −1  0 ]

At this point, matrix multiplication probably looks very complicated to you. It is, but once you see all the uses for it, you'll get used to it.

Some properties of matrix multiplication

Now we can explain why the identity has its name: if A is any m × n matrix, then AI = A and IA = A, i.e., when you multiply a matrix by an identity matrix, it has no effect. (The identity matrices in the formulas AI = A and IA = A have different sizes — what are they?)

One very important fact about matrix multiplication is that it is (in general) not commutative: we don't (in general) have AB = BA. In fact, BA may not even make sense, or, if it makes sense, may be a different size than AB (so that equality in AB = BA is meaningless). For example, if A is 2 × 3 and B is 3 × 4, then AB makes sense (the dimensions are compatible) but BA doesn't even make sense (much less equal AB). Even when AB and BA both make sense and are the same size, i.e., when A and B are square, we don't (in general) have AB = BA. As a simple example, consider:

[ 1 6 ] [  0 −1 ]   [ −6 11 ]      [  0 −1 ] [ 1 6 ]   [ −9 −3 ]
[ 9 3 ] [ −1  2 ] = [ −3 −3 ],     [ −1  2 ] [ 9 3 ] = [ 17  0 ]

Matrix multiplication is associative, i.e., (AB)C = A(BC) (provided the products make sense). Therefore we write the product simply as ABC. Matrix multiplication is also associative with scalar multiplication, i.e., α(AB) = (αA)B, where α is a scalar and A and B are matrices (that can be multiplied). Matrix multiplication distributes across matrix addition: A(B + C) = AB + AC and (A + B)C = AC + BC.

Matrix-vector product

A very important and common case of matrix multiplication is y = Ax, where A is an m × n matrix, x is an n-vector, and y is an m-vector. The formula is

yi = Ai1 x1 + · · · + Ain xn ,   i = 1, . . . , m.

We can think of matrix-vector multiplication (with an m × n matrix) as a function that transforms n-vectors into m-vectors.

Inner product

Another important special case of matrix multiplication occurs when v is a row n-vector and w is a column n-vector. Then the product vw makes sense, and has size 1 × 1, i.e., is a scalar: vw = v1 w1 + · · · + vn wn . This occurs often in the form xT y, where x and y are both n-vectors; in this case the product (which is a number) is called the inner product or dot product of the vectors x and y. Other notation for the inner product is ⟨x, y⟩ or x · y. If x and y are n-vectors, then their inner product is

⟨x, y⟩ = xT y = x1 y1 + · · · + xn yn .

But remember that the matrix product xy doesn't make sense (unless they are both scalars).

Matrix powers

When a matrix A is square, then it makes sense to multiply A by itself, i.e., to form A · A. We refer to this matrix as A2 . Similarly, k copies of A multiplied together is denoted Ak . (Non-integer powers, such as A1/2 (the matrix squareroot), are pretty tricky — they might not make sense, or be ambiguous, unless certain conditions on A hold. This is an advanced topic in linear algebra.) By convention we set A0 = I (usually only when A is invertible — see below).
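You can check the worked example above numerically; in NumPy (illustration only), with the same A and B:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [-1, 0, 4]])     # 2x3
B = np.array([[0, -3],
              [2, 1],
              [-1, 0]])        # 3x2

C = A @ B                      # 2x2, as computed entry by entry above

# Matrix multiplication is not commutative; here B @ A is 3x3,
# a different size from A @ B, so AB = BA is meaningless.
BA = B @ A
```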

Matrix inverse

If A is square, and there is a matrix F such that F A = I, then we say that A is invertible or nonsingular. We call F the inverse of A, and denote it A−1 . We can then also define A−k = (A−1 )k . If a matrix is not invertible, we say it is singular or noninvertible.

It's important to understand that not all square matrices are invertible. (For example, a zero matrix never has an inverse.) As a less obvious example, you might try to show that the matrix

[  1 −1 ]
[ −2  2 ]

does not have an inverse.

A basic result of linear algebra is that AA−1 = I. In other words, if you multiply a matrix by its inverse on the right (as well as the left), you get the identity. When a matrix is invertible, the inverse of the inverse is the original matrix, i.e., (A−1 )−1 = A.

As an example of the matrix inverse, we have

[ 1 −1 ]−1    1 [  2 1 ]
[ 1  2 ]    = — [ −1 1 ]
              3

(you should check this!). It's very useful to know the general formula for a 2 × 2 matrix inverse:

[ a b ]−1      1      [  d −b ]
[ c d ]    = ——————— [ −c  a ]
             ad − bc

provided ad − bc ≠ 0. (If ad − bc = 0, the matrix is not invertible.) There are similar, but much more complicated, formulas for the inverse of larger (invertible) square matrices, but they are not used in practice. The importance of the matrix inverse will become clear when we study linear equations.

Useful identities

We've already mentioned a handful of matrix identities that you could figure out yourself, e.g., A + 0 = A. Here we list a few others, that are not hard to derive, and quite useful. (We're making no claims that our list is complete!)

• transpose of product: (AB)T = B T AT
• transpose of sum: (A + B)T = AT + B T
• inverse of product: (AB)−1 = B −1 A−1 provided A and B are square (of the same size) and invertible
• products of powers: Ak Al = Ak+l (for k, l ≥ 1 in general, and for all k, l if A is invertible)
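The 2 × 2 inverse formula is easy to test against a numerical inverse; a NumPy sketch (illustration only, using the example matrix above):

```python
import numpy as np

def inv2(M):
    """Inverse of a 2x2 matrix via the formula above; requires ad - bc != 0."""
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    return (1.0 / det) * np.array([[d, -b], [-c, a]])

M = np.array([[1.0, -1.0],
              [1.0,  2.0]])
Minv = inv2(M)          # should equal (1/3) [[2, 1], [-1, 1]]
```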

Block matrices and submatrices

In some applications it's useful to form matrices whose entries are themselves matrices, as in

[ F I ],     [ A B C ]
             [ 0 G   ]

where A, B, C, F , and G are matrices (as are 0 and I). Such matrices are called block matrices; the entries A, B, etc., are called 'blocks' and are sometimes named by indices. Thus, F is called the 1, 1 block of the second matrix.

Of course the block matrices must have the right dimensions to be able to fit together: matrices in the same (block) row must have the same number of rows (i.e., the same 'height'), and matrices in the same (block) column must have the same number of columns (i.e., the same 'width'). Thus in the examples above, A, B and C must have the same number of rows (e.g., they could be 2 × 3, 2 × 2, and 2 × 1). The second example is more interesting. Suppose that F is m × n. Then the identity matrix in the 1, 2 block must have size m × m (since it must have the same number of rows as F ). We also see that G must have m columns, say, dimensions p × m. That fixes the dimensions of the 0 matrix in the 2, 1 block — it must be p × n.

You can also divide a larger matrix (or vector) into 'blocks'. In this context the blocks are sometimes called submatrices of the big matrix. For example, it's often useful to write an m × n matrix as a 1 × n block matrix of m-vectors (which are just its columns), or as an m × 1 block matrix of n-row-vectors (which are its rows).

Block matrices can be added and multiplied as if the entries were numbers, provided the corresponding entries have the right sizes (i.e., 'conform') and you're careful about the order of multiplication. Thus we have

[ A B ] [ X ]   [ AX + BY ]
[ C D ] [ Y ] = [ CX + DY ]

provided the products AX, BY , CX, and DY make sense. As a specific example, suppose that

C = [ 2 2 ]     D = [ 0 2 3 ]
    [ 1 3 ],        [ 5 4 7 ]

Then we have

[ D C ] = [ 0 2 3 2 2 ]
          [ 5 4 7 1 3 ]

Continuing this example, the expression

[ C ]
[ D ]

doesn't make sense, because the top block has two columns and the bottom block has three. But the block expression

[ C  ]
[ DT ]

does make sense, because now the bottom block has two columns, just like the top block.
Linear equations Any set of m linear equations in (scalar) variables x1 . . x3 = x2 − 2.. . i. z = By where B is p × m.. Let’s start with a simple example of two equations in three variables: 1 + x2 = x3 − 2x1 .. . i = 1. f (x) = y where n yi = j=1 Aij xj = Ai1 x1 + · · · + Ain xn . . functions deﬁned by matrix-vector multiplication are linear. . . (Conversely. . CX. BY . . xn can be represented by the compact matrix equation Ax = b. i. The ﬁrst thing to do is to rewrite the equations with the variables lined up in columns. which we can express in the simple form z = By = (BA)x.e. f (αx) = αf (x) • superposition: for any n-vectors u and v.) We can also write out the linear function in explicit form. m This gives a simple interpretation of Aij : it gives the coeﬃcient by which yi depends on xj . A is an m × n matrix and b is an m-vector.e.e. and the constants on the righthand side: 2x1 +x2 −x3 = −1 0x1 −x2 +x3 = −2 Now it’s easy to rewrite the equations as a single matrix equation: 2 1 −1 0 −1 1 x1 x2 = x3 10 −1 −2 . where x is a vector made from the variables. We say f is linear if it satisﬁes two properties: • scaling: for any n-vector x and any scalar α. Suppose also that a p-vector z is a linear function of y. linear functions of linear functions of some variables). 3 Linear equations and matrices Linear functions Suppose that f is a function that takes as argument (input) n-vectors and returns (as output) m-vectors. So matrix multiplication corresponds to composition of linear functions (i. . Suppose an m-vector y is a linear function of the n-vector x.provided the products AX. i. Then z is a linear function of x. and DY makes sense.e. f (u + v) = f (u) + f (v) It’s not hard to show that such a function can always be represented as matrix-vector multiplication: there is an m × n matrix A such that f (x) = Ax for all n-vectors x.. y = Ax where A is m × n.

(A 1000 × 1000 matrix requires storage for 106 doubles. . When these pathologies occur. the matrix A is singular. and the equations can be solved as x = A−1 b. to get the solution x = A−1 b (although that procedure would. when a matrix A is singular. Solving linear equations in practice When we solve linear equations in practice. written as the compact matrix equation Ax = b. you can’t always solve n linear equations for n variables. of its entries equal to zero. xn . Here we should make a comment about matrix notation. Now you can see the importance of the matrix inverse: it can be used to solve simultaneous linear equations.. (These facts are studied in linear algebra. A−1 ) can express a lot. Suppose that A is invertible..) From a practical point of view... then. 5000. this means each 11 . so solving say a set of 1000 equations on a small PC class computer takes only a second or so. Of course. will take much (125×) longer (on a PC class computer). One or more of the equations might be redundant (i. and then multiply it on the right by the vector b. for example. A−1 exists.. i. The power of matrix notation is that just a few symbols (e. A is singular means that the equations in Ax = b are redundant or inconsistent — a sign that you have set up the wrong equations (or don’t have enough of them). x1 = 2. the inverse A−1 exists. x1 x = x2 . solving a system of n simultaneous linear equations) require on the order of n3 basic arithmetic operations. where A is an n × n matrix. noninvertible. The most common methods for computing x = A−1 b (i. around 10MB. The lefthand side simpliﬁes to A−1 Ax = Ix = x. . Conversely. Practical methods compute the solution x = A−1 b directly.e. i. (A 5000 × 5000 matrix requires around 250MB to store. so we have actually solved the simultaneous linear equations: x = A−1 b. Otherwise. and b is an n-vector.) In many applications the matrix A has many.. work). Another (perhaps more pessimistic) way to put this is. 
can be obtained from the others). a lot of work can be hidden behind just a few symbols (e.) But solving larger sets of equations.e. or almost all.e.g. it turns out the simultaneous linear equations Ax = b are redundant or inconsistent. A−1 ). or the equations could be inconsistent as in x1 = 1.e.. . x3 b= −1 −2 Solving linear equations Now suppose we have n linear equations in n variables x1 . in which case we say it is sparse.g. But modern computers are very fast. In terms of the associated linear equations. of course.e. Let’s multiply both sides of the matrix equation Ax = b on the left by A−1 : A−1 (Ax) = A−1 b.so we can express the two equations in the three variables as Ax = b where A= 2 1 −1 0 −1 1 . (i. by computer) we do not ﬁrst compute the matrix A−1 . .

solving a system of 10000 simultaneous sparse linear equations is feasible.equation involves only some (often just a few) of the variables. 12 . using sparse matrix techniques. and might take only a few seconds (but it depends on how sparse the equations are). with hundreds of thousands of (sparse) equations. It’s not uncommon to solve for hundreds of thousands of variables. Even on a PC class computer. It turns out that such sparse equations can be solved by computer very eﬃciently.
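The practical advice above — solve Ax = b directly rather than forming A−1 — is how numerical libraries work too; a NumPy sketch (illustration only):

```python
import numpy as np

# A small square, invertible system.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)   # computes A^{-1} b directly, without forming A^{-1}

# A singular matrix signals redundant or inconsistent equations;
# here the second row is -2 times the first.
S = np.array([[ 1.0, -1.0],
              [-2.0,  2.0]])
singular = (np.linalg.matrix_rank(S) < 2)
```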

EE263                                                            Prof. S. Boyd

Crimes Against Matrices

In this note we list some matrix crimes that we have, sadly, witnessed too often. Be very careful to avoid committing any of these crimes; in EE263 we have a zero-tolerance policy for crimes against matrices, at least on things you hand in to us. (What you do with matrices in your spare time, or on scratch paper, is of course your own business. But we recommend you avoid these crimes at all times, in order to not build bad habits.) Check your work — don't become just another sad statistic!

Syntax crimes

In a syntax crime, the perpetrator attempts to combine matrices (or other mathematical objects) in ways that violate basic syntax rules. These are serious crimes of negligence, since it is so easy to check your work for potential violations. We list some typical examples below.

• Adding, subtracting, or equating, matrices (or vectors) of different dimensions. Example: writing A + B, when A ∈ R2×3 and B ∈ R3×3 .

• Violating the rules of constructing block matrices (e.g., the submatrices in any row of a block matrix must have the same number of rows). Example: forming the block matrix [A B], when A ∈ R2×3 and B ∈ R3×3 .

• Multiplying matrices with incompatible dimensions (i.e., forming AB, when the number of columns of A does not equal the number of rows of B). Example: forming AT B, when A ∈ R2×3 and B ∈ R3×3 .

• Taking the inverse, determinant, trace, or powers of a nonsquare matrix. Example: forming A−1 , when A ∈ R2×3 .

Semantic crimes

In a semantic crime, the perpetrator forms an expression or makes an assertion that does not break any syntax rules, but is wrong because of the meaning. These crimes are a bit harder to detect than syntax crimes, so you need to be more vigilant to avoid committing them.

• Taking the inverse of a square, but singular matrix. (Taking the inverse of a nonsquare matrix is a syntax crime — see above.) Example: forming (wwT )−1 , when w ∈ R2 . Note: writing (wwT )−1 = (wT )−1 w−1 , where w ∈ R2 , involves both a syntax and semantic crime.

• Cancelling matrices on the left or right in inappropriate circumstances, e.g., concluding that B = C from AB = AC, when A is not known to be one-to-one (i.e., have independent columns). Example: concluding x = y from aT x = aT y, when a, x, y ∈ R4 .

• Dimension crimes. Alleging that a set of m vectors in Rn is independent, when m > n. Alleging that a set of m vectors in Rn span Rn , when m < n.

Miscellaneous crimes

Some crimes are hard to classify, or involve both syntax and semantic elements. Incorrect use of a matrix identity often falls in this class.

• Using (AB)T = AT B T (instead of the correct formula (AB)T = B T AT ). Note: this also violates syntax rules, if AT B T is not a valid product.

• Using (AB)−1 = A−1 B −1 (instead of the correct formula (AB)−1 = B −1 A−1 ). Note: (AB)−1 = A−1 B −1 violates syntax rules, if A or B is not square; it violates semantic rules if A or B is not invertible.

• Using (A + B)2 = A2 + 2AB + B 2 . This (false) identity relies on the very useful, but unfortunately false, identity AB = BA.

• Referring to a left inverse of a strictly fat matrix or a right inverse of a strictly skinny matrix. Example: writing QQT = I, when Q ∈ R5×3 .

An example

Let's consider the expression (AT B)−1 , where A ∈ Rm×n and B ∈ Rk×p . Here's how you might check for various crimes you might commit in forming this expression.

• We multiply AT , which is n × m, and B, which is k × p, so we better have m = k to avoid a syntax violation. (Note: if A is a scalar, then AT B might be a strange thing to write, but can be argued to not violate syntax, even though m ≠ k. In a similar way, you can write AT B when B is scalar, and argue that syntax is not violated.)

• The product AT B is n × p, so we better have n = p in order to (attempt to) invert it. At this point, we know that the dimensions of A and B must be the same (ignoring the case where one or the other is interpreted as a scalar).

• If AT B is a strictly skinny–strictly fat product (i.e., A and B are strictly fat), then AT B cannot possibly be invertible, so we have a semantic violation. To avoid this, we must have A and B square or skinny, i.e., m ≥ n.

Summary: to write (AT B)−1 (assuming neither A nor B is interpreted as a scalar), A and B must have the same dimensions, and be skinny or square.

Of course, even if A and B have the same dimensions, and are skinny or square, the matrix AT B can be singular, in which case (AT B)−1 is meaningless. The point of our analysis above is that if A and B don't have the same dimension, or if A and B are strictly fat, then (AT B)−1 is guaranteed to be meaningless, no matter what values A and B might have in your application or argument.
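Many syntax crimes can be caught mechanically by checking dimensions before combining matrices. A hypothetical helper, sketched in Python (the function names are invented for illustration):

```python
def can_add(shape_a, shape_b):
    """Addition (or subtraction, or equating) requires identical dimensions."""
    return shape_a == shape_b

def can_multiply(shape_a, shape_b):
    """AB requires: number of columns of A equals number of rows of B."""
    return shape_a[1] == shape_b[0]

def can_invert(shape_a):
    """Inversion (or det, trace, powers) requires a square matrix.
    Being square is necessary, not sufficient: a square matrix can
    still be singular, which is a semantic crime, not a syntax crime."""
    return shape_a[0] == shape_a[1]

# The examples from the text: A in R^{2x3}, B in R^{3x3}.
ok_AB = can_multiply((2, 3), (3, 3))     # AB is a legal product
ok_AtB = can_multiply((3, 2), (3, 3))    # A^T B is a syntax crime
ok_sum = can_add((2, 3), (3, 3))         # A + B is a syntax crime
```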

EE263                                                            Prof. S. Boyd

Least squares and least norm in Matlab

Least squares approximate solution

Suppose A ∈ Rm×n is skinny (or square), i.e., m ≥ n, and full rank, which means that Rank(A) = n. The least-squares approximate solution of Ax = y is given by

xls = (AT A)−1 AT y.

This is the unique x ∈ Rn that minimizes ‖Ax − y‖.

There are several ways to compute xls in Matlab. The simplest method is to use the backslash operator:

xls=A\y;

If A is square (and invertible), the backslash operator just solves the linear equations, i.e., it computes A−1 y. If A is not full rank, then A\y will generate an error message, and then a least-squares solution will be returned.

You can also use the formula to compute the least squares approximate solution:

xls=inv(A'*A)*A'*y;

Here you'll get an error if A is not full rank, or fat. Compared with the backslash operator, this method is a bit less efficient, and a bit more sensitive to roundoff errors, but you won't notice it unless m and n are a thousand or so. We encourage you to verify that you get the same answer (except possibly for some small difference due to roundoff error) as you do using the backslash operator.

You can also use the pseudo-inverse function pinv(), which computes the pseudo-inverse, which is

A† = (AT A)−1 AT

when A is full rank and skinny (or square). (The pseudo-inverse is also defined when A is not full rank, but it's not given by the formula above.) To find the least-squares approximate solution using the pseudo-inverse, you can use xls=pinv(A)*y.

Yet another method is via a QR decomposition. In Matlab, [Q,R]=qr(A) returns the 'full' QR decomposition, with square, orthogonal Q ∈ Rm×m , and R ∈ Rm×n upper triangular (with zeros in its bottom part). The 'economy' QR decomposition, in which Q ∈ Rm×n (with orthonormal columns) and R ∈ Rn×n is invertible, is obtained using [Q,R]=qr(A,0). You can compute the least-squares approximate solution using the economy QR decomposition using, for example,

[Q,R]=qr(A,0); % compute economy QR decomposition
xls=R\(Q'*y);

This will fail if A is not full rank, since then R will not be invertible. You better be sure here that A is skinny (or square) and full rank; otherwise you'll compute something (with no warning messages) that isn't the least-squares approximate solution.

To be sure you've really computed the least-squares approximate solution, we encourage you to check that the residual is orthogonal to the columns of A, for example with the commands

r=A*x-y; % compute residual
A'*r     % compute inner product of columns of A and r

and checking that the result is very small.

Least norm solution

Now suppose A ∈ Rm×n is fat (or square), i.e., m ≤ n, and full rank, which means that Rank(A) = m. The least-norm solution of Ax = y is given by

xln = AT (AAT )−1 y.

Among all solutions of Ax = y, xln has the smallest norm.

We can compute xln in Matlab in several ways. You can use the formula to compute xln :

xln=A'*inv(A*A')*y;

You'll get an error here if A is not full rank, or if A is skinny. You can also use the pseudo-inverse function pinv(), which computes the pseudo-inverse, which is

A† = AT (AAT )−1

when A is full rank and fat or square. (The pseudo-inverse is also defined when A is not full rank, but it's not given by the formula above.)

xln=pinv(A)*y;

You can find the least-norm solution via QR as well:

[Q,R]=qr(A',0);
xln=Q*(R'\y);

You better be sure here that A is fat (or square) and full rank; otherwise you'll compute something (with no warning messages) that isn't the least-norm solution.

Warning! We should also mention a method that doesn't work: the backslash operator. If A is fat and full rank, then A\y gives a solution of Ax = y, not necessarily the least-norm solution.
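If you work in Python rather than Matlab, the same computations look like this in NumPy (an analogue, not part of the course materials):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 4))    # skinny; full rank with probability 1
y = rng.standard_normal(10)

x1 = np.linalg.lstsq(A, y, rcond=None)[0]    # analogue of A\y
x2 = np.linalg.solve(A.T @ A, A.T @ y)       # the formula (A'A)^{-1} A'y
x3 = np.linalg.pinv(A) @ y                   # pseudo-inverse
Q, R = np.linalg.qr(A)                       # economy QR by default
x4 = np.linalg.solve(R, Q.T @ y)             # QR method

r = A @ x1 - y    # the residual should be orthogonal to the columns of A
```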

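The least-norm computations also translate directly to NumPy; a sketch with made-up fat, full-rank data (names and data are ours):

```python
import numpy as np

# Hypothetical fat, full-rank data (illustration only).
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))
y = rng.standard_normal(3)

# Formula: x_ln = A^T (A A^T)^{-1} y.
x_formula = A.T @ np.linalg.solve(A @ A.T, y)

# Pseudo-inverse: A† = A^T (A A^T)^{-1} for fat full-rank A.
x_pinv = np.linalg.pinv(A) @ y

# Via QR of A^T: with A^T = QR, x_ln = Q R^{-T} y  (the analogue of Q*(R'\y)).
Q, R = np.linalg.qr(A.T)
x_qr = Q @ np.linalg.solve(R.T, y)

# Any other solution differs by a nullspace component, so it has larger norm.
null_step = (np.eye(6) - np.linalg.pinv(A) @ A) @ rng.standard_normal(6)
x_other = x_formula + 0.1 * null_step
```

The last two lines illustrate the defining property: x_other still solves Ax = y, but its norm exceeds that of the least-norm solution.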
EE263 Prof. S. Boyd

Solving general linear equations using Matlab

In this note we consider the following problem: determine whether there is a solution x ∈ R^n of the (set of) m linear equations Ax = b, and if so, find one. We consider here the general case, with A ∈ R^{m×n} and Rank(A) = r; in particular, we do not assume that A is full rank.

Existence of solution via rank

To check existence of a solution is the same as checking if b ∈ R(A). A simple way to check if b ∈ R(A) is to check the rank of [A b], which is either r (i.e., the rank of A) if b ∈ R(A), or r + 1 if it is not. If the two ranks above are equal, then Ax = b has a solution. This can be checked in Matlab using rank([A b]) == rank(A) (or evaluating the two ranks separately and comparing them). This method has a hidden catch: Matlab uses a numerical tolerance to decide on the rank of a matrix, and this tolerance might not be appropriate for your particular application. But this method does not give us a solution, when one exists.

Using the backslash and pseudo-inverse operator

In Matlab, the easiest way to determine whether Ax = b has a solution, and to find such a solution when it does, is to use the backslash operator. Exactly what A\b returns is a bit complicated to describe in the most general case, but if there is a solution to Ax = b, then A\b returns one. A couple of warnings: First, A\b returns a result in many cases when there is no solution to Ax = b. For example, when A is skinny and full rank (i.e., m > n = r), A\b returns the least-squares approximate solution, which in general is not a solution of Ax = b (unless we happen to have b ∈ R(A)). Second, A\b sometimes causes a warning to be issued, even when it returns a solution of Ax = b. This means that you can't just use the backslash operator: you have to check that what it returns is a solution. (In any case, it's just good common sense to check numerical computations as you do them.) In Matlab this can be done as follows:

x = A\b;       % possibly a solution to Ax=b
norm(A*x-b)    % if this is zero or very small, we have a solution

If the second line yields a result that is not very small, we conclude that Ax = b does not have a solution. Note that executing the first line might cause a warning to be issued. In contrast to the rank method described above, here you decide on the numerical tolerance you'll accept (i.e., how small ‖Ax − b‖ has to be before you accept x as a solution of Ax = b). A common test that works well in many applications is ‖Ax − b‖ ≤ 10^{−5} ‖b‖.
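Both checks carry over to NumPy; a sketch on a made-up rank-deficient example (the matrix and right-hand sides are ours):

```python
import numpy as np

# Hypothetical 4x3 matrix with rank 2 (third column = first + second).
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])
b_in = A @ np.array([1., 2., 0.])     # in range(A) by construction
b_out = np.array([1., 0., 0., 0.])    # not in range(A)

def has_solution(A, b):
    # rank([A b]) == rank(A) exactly when b is in range(A)
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

# lstsq plays the role of A\b here: it always returns something,
# so we must check the residual ourselves.
x, *_ = np.linalg.lstsq(A, b_in, rcond=None)
residual = np.linalg.norm(A @ x - b_in)
```

As in the note, the residual test (here against 1e-5 times the norm of b) is what certifies that the returned x really solves the equations.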
Second. Existence of solution via rank A simple way to check if b ∈ R(A) is to check the rank of [A b]. ﬁnd one. Boyd Solving general linear equations using Matlab In this note we consider the following problem: Determine whether there is a solution x ∈ Rn of the (set of) m linear equations Ax = b. A\b sometimes causes a warning to be issued. how small Ax−b has to be before you accept x as a solution of Ax = b). we do not assume that A is full rank. A\b returns a result in many cases when there is no solution to Ax = b.

You can also use the pseudo-inverse operator: x=pinv(A)*b is also guaranteed to solve Ax = b, when one exists. As with the backslash operator, you have to check that the result satisfies Ax = b, since in general, it doesn't have to.

Using the QR factorization

While the backslash operator is a convenient way to check if Ax = b has a solution, and to find a solution when one exists, it's a bit opaque. Here we describe a method that is transparent, and can be fully explained and understood using material we've seen in the course. (Actually, the construction outlined below is pretty much what A\b does.)

We start with the full QR factorization of A with column permutations:

AP = QR = [Q1 Q2] [R1 R2; 0 0].

Here Q ∈ R^{m×m} is orthogonal, R ∈ R^{m×n} is upper triangular, and P ∈ R^{n×n} is a permutation matrix. The submatrices have the following dimensions: Q1 ∈ R^{m×r}, Q2 ∈ R^{m×(m−r)}, R1 ∈ R^{r×r} is upper triangular with nonzero elements along its main diagonal, and R2 ∈ R^{r×(n−r)}. The zero submatrices in the bottom (block) row of R have m − r rows.

Using A = QRP^T we can write Ax = b as QRP^T x = QRz = b, where z = P^T x. Multiplying both sides of this equation by Q^T gives the equivalent set of m equations Rz = Q^T b. Expanding this into subcomponents gives

Rz = [R1 R2; 0 0] z = [Q1^T b; Q2^T b].

We see immediately that there is no solution of Ax = b, unless we have Q2^T b = 0, because the bottom component of Rz is always zero.

Now let's assume that we do have Q2^T b = 0. Writing z = [z1; z2], the equations reduce to R1 z1 + R2 z2 = Q1^T b, a set of r linear equations in n variables. We can find a solution of these equations by setting z2 = 0. With this form for z, the equation above becomes R1 z1 = Q1^T b, from which we get z1 = R1^{−1} Q1^T b. Now we have a z that satisfies Rz = Q^T b: z = [R1^{−1} Q1^T b; 0]. We get the corresponding x from x = Pz:

x = P [R1^{−1} Q1^T b; 0].

This x satisfies Ax = b, provided we have Q2^T b = 0. Whew.

In Matlab, we can carry out this construction as follows:

[m,n]=size(A);
[Q,R,P]=qr(A);   % full QR factorization
r=rank(A);       % could also get rank directly from QR factorization
% construct the submatrices
Q1=Q(:,1:r);
Q2=Q(:,r+1:m);
R1=R(1:r,1:r);
% check if b is in range(A)
norm(Q2'*b)      % if this is zero or very small, b is in range(A)
% construct a solution, if b is in range(A)
x=P*[R1\(Q1'*b); zeros(n-r,1)];   % satisfies Ax=b, if b is in range(A)
% check alleged solution (just to be sure)
norm(A*x-b)

EE263 Prof. S. Boyd

Low Rank Approximation and Extremal Gain Problems

These notes pull together some similar results that depend on partial or truncated SVD or eigenvector expansions.

1 Low rank approximation

In lecture 15 we considered the following problem. We are given a matrix A ∈ R^{m×n} with rank r, and we want to find the nearest matrix Â ∈ R^{m×n} with rank p (with p ≤ r), where 'nearest' is measured by the matrix norm, i.e., we want to minimize ‖A − Â‖. We found that a solution is

Â = Σ_{i=1}^p σ_i u_i v_i^T, where A = Σ_{i=1}^r σ_i u_i v_i^T

is the SVD of A. The matrix Â need not be the only rank p matrix that is closest to A, however; there can be other matrices Ã, also of rank p, that satisfy ‖A − Ã‖ = ‖A − Â‖ = σ_{p+1}.

It turns out that the same matrix Â is also the nearest rank p matrix to A, as measured in the Frobenius norm,

‖A − Â‖_F = ( Tr (A − Â)^T (A − Â) )^{1/2} = ( Σ_{i=1}^m Σ_{j=1}^n (A_{ij} − Â_{ij})^2 )^{1/2}.

(The Frobenius norm is just the Euclidean norm of the matrix, written out as a long column vector.) In this case, Â is the unique rank p matrix closest to A.

2 Nearest positive semidefinite matrix

Suppose that A = A^T ∈ R^{n×n}, with eigenvalue decomposition

A = Σ_{i=1}^n λ_i q_i q_i^T,

where {q_1, ..., q_n} are orthonormal, and λ_1 ≥ ··· ≥ λ_n. Consider the problem of finding a nearest positive semidefinite matrix, i.e., a matrix Â = Â^T ⪰ 0 that minimizes ‖A − Â‖.
As you might guess, the solution to this problem is

Â = Σ_{i=1}^n max{λ_i, 0} q_i q_i^T.

Thus, to get a nearest positive semidefinite matrix, you simply remove the terms in the eigenvector expansion that correspond to negative eigenvalues. The matrix Â is sometimes called the positive semidefinite part of A. As you might guess, the matrix Â is also the nearest positive semidefinite matrix to A, as measured in the Frobenius norm.

3 Extremal gain problems

Suppose A ∈ R^{m×n} has SVD

A = Σ_{i=1}^r σ_i u_i v_i^T.

You already know that v = v_1 maximizes ‖Ax‖ over all x with norm one; in other words, v_1 defines a direction of maximum gain for A.

We can also find a direction of minimum gain. If r = n (i.e., A has nullspace {0}), then the vector v_n minimizes ‖Ax‖ among all vectors of norm one. If r < n, then any unit vector x in N(A) minimizes ‖Ax‖, and the minimum gain is zero.

These results can be extended to finding subspaces on which A has large or small gain. Let V be a subspace of R^n. We define the minimum gain of A ∈ R^{m×n} on V as

min{ ‖Ax‖ | x ∈ V, ‖x‖ = 1 }.

We can then pose the question: find a subspace of dimension p, on which A has the largest possible minimum gain. The solution is what you'd guess, provided p ≤ r: V = span{v_1, ..., v_p}, the span of the right singular vectors associated with the p largest singular values. The minimum gain of A on this subspace is σ_p. If p > r, then any subspace of dimension p intersects the nullspace of A, and therefore has minimum gain zero. So when p > r you can take V as any subspace of dimension p; they all have the same minimum gain, namely, zero.

We can also find a subspace V of dimension p that has the smallest maximum gain of A, where the maximum gain of A on V is defined as max{ ‖Ax‖ | x ∈ V, ‖x‖ = 1 }. One such subspace is V = span{v_{r−p+1}, ..., v_r}, the span of the right singular vectors associated with the p smallest singular values.

We can state these results in a more concrete form using matrices. To define a subspace of dimension p we use an orthonormal basis: V = span{q_1, ..., q_p}, where {q_1, ..., q_p} are orthonormal. Defining Q = [q_1 ··· q_p], we have Q^T Q = I_p, where I_p is the p × p identity matrix. We can express the minimum gain of A on V as σ_min(AQ).
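A NumPy sketch of the two approximation results (truncated SVD for the rank-p approximation, eigenvalue clipping for the positive semidefinite part), on made-up matrices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Rank-p approximation via truncated SVD (hypothetical 8x6 matrix).
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
p = 2
Ahat = U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]   # sum of the first p dyads
err_spectral = np.linalg.norm(A - Ahat, 2)      # approximation error, sigma_{p+1}

# Positive semidefinite part of a symmetric matrix: clip negative eigenvalues.
B = rng.standard_normal((5, 5))
B = (B + B.T) / 2
lam, Q = np.linalg.eigh(B)
Bpsd = Q @ np.diag(np.maximum(lam, 0)) @ Q.T
```

The spectral-norm error of the truncated SVD matches σ_{p+1}, and the clipped matrix has no negative eigenvalues, as the notes claim.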

The problem of finding a subspace of dimension p that maximizes the minimum gain of A can be stated as

maximize σ_min(AQ) subject to Q^T Q = I_p,

with variable Q ∈ R^{n×p}. One solution of this problem is Q = [v_1 ··· v_p].

4 Extremal trace problems

Let A ∈ R^{n×n} be symmetric, with eigenvalue decomposition A = Σ_{i=1}^n λ_i q_i q_i^T, with λ_1 ≥ ··· ≥ λ_n and {q_1, ..., q_n} orthonormal. You know that a solution of the problem

minimize x^T A x subject to x^T x = 1,

where the variable is x ∈ R^n, is x = q_n. The related maximization problem is

maximize x^T A x subject to x^T x = 1,

where the variable is x ∈ R^n. A solution to this problem is x = q_1.

Now consider the following generalization of the first problem:

minimize Tr(X^T A X) subject to X^T X = I_k,

where the variable is X ∈ R^{n×k}, I_k denotes the k × k identity matrix, and we assume k ≤ n. (The constraint means that the columns of X are orthonormal.) A solution of this problem is X = [q_{n−k+1} ··· q_n]. The related maximization problem is

maximize Tr(X^T A X) subject to X^T X = I_k,

with variable X ∈ R^{n×k}. A solution of this problem is X = [q_1 ··· q_k]. Note that when k = 1, this reduces to the first problem above.
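Both extremal results are easy to check numerically; a NumPy sketch on made-up matrices (names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)

# Extremal gain: on V = span{v1,...,vp}, the minimum gain of A is sigma_p.
A = rng.standard_normal((7, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
p = 2
Q = Vt[:p, :].T                                       # orthonormal, Q^T Q = I_p
min_gain = np.linalg.svd(A @ Q, compute_uv=False)[-1]  # sigma_min(AQ)

# Extremal trace: Tr(X^T B X) over X^T X = I_k is minimized by the
# eigenvectors of the k smallest eigenvalues.
B = rng.standard_normal((6, 6))
B = (B + B.T) / 2
lam, V = np.linalg.eigh(B)       # eigh returns eigenvalues in ascending order
k = 2
X = V[:, :k]                     # bottom-k eigenvectors
min_trace = np.trace(X.T @ B @ X)
```

The minimum gain comes out as σ_p, and the minimal trace is the sum of the k smallest eigenvalues, matching the solutions stated in the notes.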


EE263 Autumn 2011-12 Prof. S. Boyd

EE263 homework problems

Lecture 2 – Linear functions and examples

2.1 A simple power control algorithm for a wireless network. First some background. We consider a network of n transmitter/receiver pairs. Transmitter i transmits at power level pi (which is positive). The path gain from transmitter j to receiver i is Gij (which are all nonnegative, and Gii are positive). The signal power at receiver i is given by si = Gii pi. The noise plus interference power at receiver i is given by

qi = σ + Σ_{j≠i} Gij pj,

where σ > 0 is the self-noise power of the receivers (assumed to be the same for all receivers). The signal to interference plus noise ratio (SINR) at receiver i is defined as Si = si/qi. For signal reception to occur, the SINR must exceed some threshold value γ (which is often in the range 3 – 10). Various power control algorithms are used to adjust the powers pi to ensure that Si ≥ γ (so that each receiver can receive the signal transmitted by its associated transmitter). In this problem, we consider a simple power control update algorithm. The powers are all updated synchronously at a fixed time interval, denoted by t = 0, 1, 2, .... Thus the quantities p, q, and S are discrete-time signals, so for example p3(5) denotes the transmit power of transmitter 3 at time epoch t = 5. What we'd like is Si(t) = si(t)/qi(t) = αγ, where α > 1 is an SINR safety margin (of, for example, one or two dB). Note that increasing pi(t) (power of the ith transmitter) increases Si but decreases all other Sj. A very simple power update algorithm is given by

pi(t + 1) = pi(t)(αγ/Si(t)).     (1)

This scales the power at the next time step to be the power that would achieve Si = αγ, if the interference plus noise term were to stay the same. But unfortunately, changing the transmit powers also changes the interference powers, so it's not that simple! Finally, we get to the problem.

(a) Show that the power control algorithm (1) can be expressed as a linear dynamical system with constant input, i.e., in the form p(t + 1) = Ap(t) + b, where A ∈ R^{n×n} and b ∈ R^n are constant. Describe A and b explicitly in terms of σ, γ, α and the components of G.

(b) matlab simulation. Use matlab to simulate the power control algorithm (1), starting from various initial (positive) power levels. Use the problem data

G = [1 .2 .1; .1 2 .1; .3 .1 3],  γ = 3,  α = 1.2,  σ = 0.01.

Plot Si and p as a function of t, and compare it to the target value αγ. Repeat for γ = 5. Comment briefly on what you observe. Comment: You'll soon understand what you see.

2.2 State equations for a linear mechanical system. The equations of motion of a lumped mechanical system undergoing small motions can be expressed as

M q̈ + D q̇ + K q = f
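A sketch of the simulation in part (b), written in NumPy rather than Matlab. The matrix G below is our reading of the (garbled) problem data, so treat it as an assumption; the iteration itself is exactly update (1):

```python
import numpy as np

# Assumed problem data (reconstructed from the garbled statement).
G = np.array([[1.0, 0.2, 0.1],
              [0.1, 2.0, 0.1],
              [0.3, 0.1, 3.0]])
sigma, gamma, alpha = 0.01, 3.0, 1.2
target = alpha * gamma                    # desired SINR, alpha*gamma

p = np.array([0.1, 0.2, 0.3])             # arbitrary positive initial powers
for t in range(200):
    q = sigma + G @ p - np.diag(G) * p    # noise plus interference at each receiver
    S = np.diag(G) * p / q                # SINR
    p = p * (target / S)                  # update (1)

q = sigma + G @ p - np.diag(G) * p
S_final = np.diag(G) * p / q
```

With γ = 3 (and this G) the powers settle and every SINR approaches αγ; rerunning with γ = 5 makes the iteration diverge, which is the behavior part (b) asks you to comment on.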

since the output at time k is a weighted average of the previous r inputs. Another model is given by y(k) = u(k) + b1 y(k − 1) + · · · + bp y(k − p). and output y = q. Suppose that f : Rn −→ Rm is linear. Suppose that u and y are scalar-valued discrete-time signals (i.5 Some linear functions associated with a convolution system. This model is called an autoregressive (AR) model. respectively). hj = 0 when j < 0. sequences) related via convolution: y(k) = j hj u(k − j). (There are many possible choices for the state here. use state u(k − 1) .e. We recommend you choose a state for the ARMA model that makes it easy for you to derive the state equations. f (x) = Ax.. the problem: Express each of these models as a linear dynamical system with input u and output y. The relation (or time series model) y(k) = a0 u(k) + a1 u(k − 1) + · · · + ar u(k − r) q q ˙ . Assuming M is invertible.where q(t) ∈ Rk is the vector of deﬂections.e.
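As an illustration of how such time-series models become linear dynamical systems, here is a NumPy sketch of an AR model with made-up coefficients, checked against the direct recursion (the state choice follows the hint in the problem):

```python
import numpy as np

# AR model y(k) = u(k) + b1*y(k-1) + b2*y(k-2), with made-up coefficients,
# as a linear dynamical system with state x(k) = (y(k-1), y(k-2)).
b1, b2 = 0.5, -0.3
A = np.array([[b1, b2],
              [1.0, 0.0]])
B = np.array([1.0, 0.0])
C = np.array([b1, b2])
D = 1.0

rng = np.random.default_rng(4)
u = rng.standard_normal(50)

# Direct recursion (taking y(k) = 0 for k < 0).
y_direct = np.zeros(50)
for k in range(50):
    y_direct[k] = u[k] + b1 * (y_direct[k - 1] if k >= 1 else 0.0) \
                       + b2 * (y_direct[k - 2] if k >= 2 else 0.0)

# State-space simulation: y = Cx + Du, x(k+1) = Ax(k) + Bu(k).
x = np.zeros(2)
y_ss = np.zeros(50)
for k in range(50):
    y_ss[k] = C @ x + D * u[k]
    x = A @ x + B * u[k]
```

The two simulations produce identical outputs, confirming that the chosen state realizes the AR recursion.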

(a) Find A ∈ R2×2 such that y = Ax in the system below: 3 = G . Give the structure a reasonable.8 Some sparsity patterns. 2. . . . . .e. 2. . . a1 . T is called the input/output or Toeplitz matrix (of size N + 1) associated with the convolution system. . . . The matrix T describes the linear mapping from (a chunk of) the input to (a chunk of) the output. Dp = dp/dx. H is called the Hankel matrix (of size N + 1) associated with the convolution system. Y = . . . Now assume that u(k) = 0 for k > 0 or k < −N and let u(0) y(0) u(−1) y(1) U = . as the vector (a0 . Similarly.7 Consider the (discrete-time) linear dynamical system x(t + 1) = A(t)x(t) + B(t)u(t). Describe the sparsity structure of A. . then the coeﬃcients of dp/dx are given by Da). p(x) = an−1 xn−1 + an−2 xn−2 + · · · + a1 x + a0 . . . yi depends only on xj for j odd. u(N ) The matrix G shows how the output at t = 0. and Y gives (a chunk of) the resulting future output. an−1 ) ∈ Rn . for i even. u(N ). Assume that u(k) = 0 for k < 0. . 2. . suggestive name. . Consider the linear transformation D that diﬀerentiates polynomials.(a) The input/output (Toeplitz) matrix. respectively. . Find the matrix T such that Y = T U .6 Matrix representation of polynomial diﬀerentiation. . if the coeﬃcients of p are given by a.9 Matrices and signal ﬂow graphs. (a) A matrix A ∈ Rn×n is tridiagonal if Aij = 0 for |i − j| > 1. (b) The Hankel matrix. u(−N ) y(N ) Here U gives the past input to the system. . Find a matrix G such that y(0) y(1) . Y = . We can represent a polynomial of degree less than n. Find the matrix D that represents D (i. Draw a block diagram of y = Ax for A tridiagonal. . . x(0) u(0) . 2. (b) Consider a certain linear mapping y = Ax with A ∈ Rm×n . N depends on the initial state x(0) and the sequence of inputs u(0).. y(N ) y(t) = C(t)x(t) + D(t)u(t). For i odd. u(N ) y(N ) Thus U and Y are vectors that give the ﬁrst N + 1 values of the input and output signals. Find the matrix H such that Y = HU . 
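For exercise 2.6, the differentiation map on polynomial coefficients can be written down and cross-checked numerically; a sketch with a made-up polynomial (ordering coefficients as (a0, a1, ..., a_{n-1})):

```python
import numpy as np

# Differentiation matrix: (Da)_i picks up (i+1) * a_{i+1}.
n = 5
D = np.zeros((n - 1, n))
for i in range(n - 1):
    D[i, i + 1] = i + 1

a = np.array([3., 1., 4., 1., 5.])   # p(x) = 3 + x + 4x^2 + x^3 + 5x^4
da = D @ a                            # coefficients of dp/dx

# Cross-check against numpy's polynomial derivative (same ascending order).
da_np = np.polynomial.polynomial.polyder(a)
```

Here D is (n−1)×n; padding it with a zero row gives an equivalent square representation if one is required.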
and deﬁne u(0) y(0) u(1) y(1) U = . . .e. i. yi depends only on xj for j even. . .. . .
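The Toeplitz matrix of exercise 2.5(a) is easy to form and verify against a library convolution; a NumPy sketch with a made-up impulse response:

```python
import numpy as np

# Causal convolution y(k) = sum_j h_j u(k-j), with u(k) = 0 for k < 0.
h = np.array([1.0, 0.5, 0.25])   # made-up impulse response, h_j = 0 for j > 2
N = 5
T = np.zeros((N + 1, N + 1))      # lower-triangular Toeplitz: T[i, j] = h_{i-j}
for i in range(N + 1):
    for j in range(N + 1):
        if 0 <= i - j < len(h):
            T[i, j] = h[i - j]

u = np.array([1., 2., 0., -1., 3., 1.])
y = T @ u

# The first N+1 samples of the full convolution agree.
y_conv = np.convolve(h, u)[: N + 1]
```

The constant-along-diagonals structure is exactly what "Toeplitz" means here: each output is the same weighted average of inputs, slid along in time.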

.e. by directly evaluating all possible paths from each xj to each zi .. 5.5 . and no self loops (i. Let A ∈ Rn×n be the node adjacency matrix. n. 4. . To describe the allowed symbol transitions we can deﬁne a matrix A ∈ Rn×n by Aij = 1 if symbol i is allowed to follow symbol j .5 x2 + + + + z2 Do this two ways: ﬁrst. symbol 2 must be followed by 3. . .x1 2 y1 0. km where ki ∈ {1. and the following transition rules: • 1 must be followed by 2 or 3 • 2 must be followed by 2 or 5 4 . k1 . ﬁnd a speciﬁc input force sequence x that moves the mass to ﬁnal position 1 and ﬁnal velocity zero.5 . . For each symbol we give a set of symbols which are allowed to follow the symbol.11 Counting paths in an undirected graph. 3. where k ∈ Z..e. A sentence is a ﬁnite sequence of symbols.e. A language or code consists of a set of sequences. Consider an undirected graph with n nodes. . n}. Symbol 1 can be followed by 1 or 3. 0 if symbol i is not allowed to follow symbol j (a) Let B = Ak .5 x2 (b) Find B ∈ R2×2 such that z = Bx in the system below: x1 . which we will call the allowable sequences. all branches connect two diﬀerent nodes). 2. 2. As a simple example. by expressing the matrix B in terms of A from the previous part (explaining why they are related as you claim). and Aii = 0 since there are no self loops. 2. the sentence 1132312 is not allowable (i. k ≥ 1. . 2. We consider a language or code with an alphabet of n symbols 1. Give a simple interpretation of Bij in terms of the original graph. . For n = 4. . in the language). We can intrepret Aij (which is either zero or one) as the number of branches that connect node i to node j. consider a Markov language with three symbols 1.10 Mass/force example. . and symbol 3 can be followed by 1 or 2. Give an interpretation of Bij in terms of the language. (You might need to use the concept of a path of length m from node p to node q. .. Let B = Ak . 3. The sentence 1132313 is allowable (i. 2.5 2 2 + y2 2 2 z1 . not in the language).) 2. 
. Find the matrix A for the mass/force example in the lecture notes.12 Counting sequences in a language or code. and second. A language is called Markov if the allowed sequences can be described by giving the allowable transitions between consecutive symbols. deﬁned as 1 if there is a branch from node i to node j Aij = 0 if there is no branch from node i to node j Note that A = AT . (b) Consider the Markov language with ﬁve symbols 1.

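The counting idea in exercise 2.12 (powers of the transition matrix count allowed sentences) can be demonstrated on the three-symbol example; a NumPy sketch, with a brute-force cross-check:

```python
import numpy as np
from itertools import product

# A[i][j] = 1 if symbol i+1 may follow symbol j+1, for the example language:
# 1 -> {1, 3},  2 -> {3},  3 -> {1, 2}.
A = np.array([[1, 0, 1],
              [0, 0, 1],
              [1, 1, 0]])

def count_sentences(A, length):
    # Number of allowed sentences of the given length = 1^T A^{length-1} 1.
    n = A.shape[0]
    ones = np.ones(n)
    return int(round(ones @ np.linalg.matrix_power(A, length - 1) @ ones))

n10 = count_sentences(A, 10)

# Brute-force cross-check over all 3^10 candidate sentences.
brute = sum(
    1 for seq in product((1, 2, 3), repeat=10)
    if all(A[seq[k + 1] - 1][seq[k] - 1] == 1 for k in range(9))
)
```

The same matrix-power trick, applied to the five-symbol language with the per-symbol counts read off column by column, is the intended "smarter approach" for exercises 2.12(b) and 2.13.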
we mention some reasons why the possible transmissions are assigned to time-slots. 2. 2. To make sure the terminology is clear. .14 Communication over a wireless network with time-slots. K}. or if they would violate some limit (such as on the total power available at a node) if the transmissions occurred simultaneously. you must explain how you solve the problem. where m is an integer. . with n = 4 nodes. . the edges assigned to time-slot 1 are again active. Compare this number to the simple code that consists of all sequences from the alphabet (i. and k ∈ {1. . and counting how many of these have symbol i as the seventh symbol.Find the total number of allowed sentences of length 10. At time period t = K + 1. . the pattern repeats. (We’re interested in the symbol for which this count is largest. which are labeled 1. and the following symbol transition rules: • 1 must be followed by 2 or 3 • 2 must be followed by 2 or 5 • 3 must be followed by 1 • 4 must be followed by 4 or 2 or 5 • 5 must be followed by 1 or 3 Among all allowed sequences of length 10. Consider the Markov language from exercise 12. all symbol transitions are allowed). .. K. transmissions can occur only over edges assigned to time-slot k. If a message is sent from node j to node i in period t. . Explain clearly how you solve the problem. A message or packet can be sent from one node to another by a sequence of transmissions from node to node. labeled 1. 2. . At time period t.) But we’d like you to use a smarter approach. we consider the very simple example shown below. 5. presumably for transmission during a later period. A directed graph shows which nodes can send messages (directly) to which other nodes. We consider a network with n nodes. .e. at t = K + 2. at time period t = 2. 3. 5. In addition to giving the answer. then in period t + 1 the message is at node i. . and K = 3 time-slots. At time period t = 1. . .13 Most common symbol in a given position. After time period t = K. n. 4. 
Two possible transmissions are assigned to diﬀerent time-slots if they would interfere with each other. the message can be sent across any edge that is active at period t. or transmitted across any edge emanating from node i and active at time period t + 1. Each edge is assigned to one of K time-slots. as well as giving the speciﬁc answer. . etc. for i = 1. Although it doesn’t matter for the problem. and so on. only the edges assigned to time-slot 2 can transmit a message. Do not hesitate to use matlab. only the edges assigned to time-slot 1 can transmit a message. It is also possible to store a message at a node during any time period. ﬁnd the most common value for the seventh symbol. an edge from node j to node i means that node j can transmit a message directly to node i. speciﬁcally. . and can be stored there. the edges assigned to time-slot 2 are active. This cycle repeats indeﬁnitely: when t = mK + k. 1 k=2 2 • 3 must be followed by 1 • 4 must be followed by 4 or 2 or 5 • 5 must be followed by 1 or 3 k=1 k=3 k=1 4 k=2 5 3 . . with ﬁve symbols 1. In principle you could solve this problem by writing down all allowed sequences of length 10.

every node that has the message forwards it to all nodes it is able to transmit to. the message is available at the node to be transmitted along any active edge emanating from that node. Find the fastest way to get a message that starts at node 5. What is the minimum time it takes before all nodes have a message that starts at node 7? For both parts of the problem.m. you can store a message at any node in any period. we allow multi-cast: if during a time period there are multiple active edges emanating from a node that has (a copy of) the message. a copy is kept there. Thus. give the transmission (as in ‘transmit from node 7 to node 9’) or state that the message is to be stored (as in ‘store at node 13’). = 2). to node 18. At each time period. as well as a description of your approach and method. 6 . (b) Minimum time ﬂooding.m. = 3). We consider a speciﬁc network with n = 20 nodes. even when the message is transmitted to another node. = 2). to all others. You only need to give one prescription for getting the message from node 5 to node 18 in minimum time. Note that we set Aii = 0 for convenience. and we attach no cost to storage or transmission. then transmission can occur during that time period across all (or any subset) of the active edges.. and K = 3 time-slots. the problem. This choice has no signiﬁcance. The sequence of transmissions (and storing) described above gets the message from node 1 to node 3 in 5 periods. Moreover. consider the simple example described above. the transmission used is active. You can check that at each period.In this example. The labeled graph that speciﬁes the possible transmissions and the associated time-slot assignments are given in a matrix A ∈ Rn×n . so there is no harm is assuming that at each time period. you must give the speciﬁc solution. and assigned to time-slot k 0 if transmission from node j to node i is never allowed Aij = 0 i = j. 
we can send a message that starts in node 1 to node 3 as follows: • • • • • During During During During During period period period period period t=1 t=2 t=3 t=4 t=5 (time-slot (time-slot (time-slot (time-slot (time-slot k k k k k = 1). i. we are interested in getting a message that starts at a particular node. Finally. Be sure that transmissions only occur during the associated time-slots. To illustrate this encoding of the graph. assigned to the associated time-slot. store it at node 1. we assume that once the message reaches a node. For this example. Give your solution as a prescription ordered in time from t = 1 to t = T (the last transmission). In this part of the problem. (a) Minimum-time point-to-point routing. we have 0 0 0 1 2 0 1 0 Aexample = 0 0 0 2 . with edges and time-slot assignments given in ts_data. and not the simple example given above. transmit it to node 3. = 1). store it at node 4. 0 3 0 0 The problems below concern the network described in the mﬁle ts_data.e. transmit it to node 4. transmit it to node 2. as in the example above. In this part of the problem. as follows: k if transmission from node j to node i is allowed. at any future period.
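The minimum transmission time on the small example can be computed by a breadth-first sweep over time periods; a Python sketch (node indices are 0-based in the code, and storing a message is always allowed, so the set of nodes holding the message only grows):

```python
# Aex[i][j] = k means node j+1 may transmit to node i+1 during time-slot k;
# 0 means transmission is never allowed.  This is the example matrix above.
Aex = [[0, 0, 0, 1],
       [2, 0, 1, 0],
       [0, 0, 0, 2],
       [0, 3, 0, 0]]
n, K = 4, 3

def min_time(src, dst):
    # have[i] is True once node i holds (a copy of) the message.
    have = [False] * n
    have[src] = True
    t = 0
    while not have[dst]:
        t += 1
        slot = (t - 1) % K + 1       # slot active during period t
        new = list(have)
        for i in range(n):
            for j in range(n):
                if Aex[i][j] == slot and have[j]:
                    new[i] = True    # transmit j -> i this period
        have = new
    return t

periods = min_time(0, 2)             # message from node 1 to node 3
```

This confirms the five-period schedule worked out above is in fact optimal for the example; the same sweep, run until all nodes hold the message, answers the flooding question in part (b).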

e. Roughly speaking. We consider x ∈ Rn as a signal. . .) (a) Suppose that A ∈ Rm×n and b ∈ Rm . A simple example illustrating this notation is shown below. near x. (c) 2× down-sampling with averaging. We consider a directed graph with n nodes. (Example: y = mx+b is described as ‘linear’ in US high schools. We assume here that n is even. a linear function plus a constant. the Taylor approximation ˆ ftay is very near f .) 2. (b) Now the converse: Show that any aﬃne function f can be represented as f (x) = Ax + b. for some A ∈ Rm×n and b ∈ Rm . Recall that the gradient of a diﬀerentiable function f : Rn → R. (c) f (x) = xT Ax.18 Paths and cycles in a directed graph. (b) f (x) = xT Ax. (Without the restriction α + β = 1. β ∈ R with α + β = 1. (a) 2× up-conversion with linear interpolation. yi = (xi/2 + xi/2+1 )/2.. We take y ∈ R2n−1 .17 Aﬃne functions. (Yes. Aii = 0 for all i. and take y ∈ Rn/2 .) Hint. we have f (αx + βy) = αf (x) + βf (y). For simplicity we do not allow self-loops. b ∈ R. aﬃne functions are (mistakenly. The ﬁrst order Taylor approximation of f . Find the gradient of the following functions. with yi = x2i . For each one. A function f : Rn → Rm is called aﬃne if for any x. for i = 1. Below we describe several transformations of the signal x. this is a special case of the previous one. We assume here that n is even. For z near x. For i odd. Show that the function f (x) = Ax + b is aﬃne. is deﬁned as the vector ∂f ∂x1 where the partial derivatives are evaluated at the point x. .. ∇f (x) = . 7 . this operation doubles the sample rate. In some contexts. and take y ∈ Rn/2 . i.) 2. 1 ≤ i ≤ n. for A ∈ Rn×n . that produce a new signal y (whose dimension varies). i.16 Some matrices from signal processing. is given by ˆ ftay (z) = f (x) + ∇f (x)T (z − x). with xi the (scalar) value of the signal at (discrete) time period i. Note that the edges are oriented. with yi = (x2i−1 + x2i )/2. i. 
Show that the function g(x) = f (x) − f (0) is linear.15 Gradient of some common functions. ﬁnd a matrix A for which y = Ax. at a point x ∈ Rn . where a ∈ Rn . deﬁned as Aij = 1 if there is an edge from node j to node i 0 otherwise. 2. even though in general they are not. . n. . (This representation is unique: for a given aﬃne function f there is only one A and one b for which f (x) = Ax + b for all x. Express the gradients using matrix notation. inserting new samples in between the original ones using linear interpolation. For i even.e. This function is aﬃne.. . (b) 2× down-sampling. y ∈ Rn and any α. ∂f ∂xn .e. The graph is speciﬁed by its node adjacency matrix A ∈ Rn×n . plus an oﬀset. You can think of an aﬃne function as a linear function. this would be the deﬁnition of linearity. yi = x(i+1)/2 . where A = AT ∈ Rn×n . (a) f (x) = aT x + b. A34 = 1 means there is an edge from node 4 to node 3.2. or informally) called linear.
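The signal transformations in exercise 2.16 are all matrix multiplications; as one instance, here is a NumPy sketch of the 2× down-sampling-with-averaging matrix (part (c)), on a made-up signal:

```python
import numpy as np

# y_i = (x_{2i-1} + x_{2i}) / 2 in the problem's 1-based indexing,
# i.e., each output averages one adjacent pair of inputs.
n = 8
A = np.zeros((n // 2, n))
for i in range(n // 2):
    A[i, 2 * i] = 0.5
    A[i, 2 * i + 1] = 0.5

x = np.arange(1.0, n + 1)   # x = (1, 2, ..., 8)
y = A @ x
```

Each row of A has exactly two entries of 1/2, which makes both the halved sample rate and the averaging visible in the matrix itself.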

given in the ﬁle directed_graph. ﬁnd the most common ending node. For each of the following questions. s1 . As0 . or predict.20 Quadratic extrapolation of a time series. ˆ 8 si = sj for i = j. . l − 1. We are given a series z up to time t. 3. . (g) Among all paths of length 10. that does not pass through node 3? (d) What is the length of a shortest path from node 13 to node 17. and z(t − 2). . in the graph shown above. ﬁnd i. More precisely. and For example.. .. Aij ≥ 0. . In other words. with As1 . .. sl = i of nodes. . j = 0. with the same starting and ending node. n. For example. . . 1. you can give the answer as ‘inﬁnity’. . we want to extrapolate. 2.sk = 1 for k = 0. z(t − 1). Suppose a matrix A ∈ Rn×n .. 3. z(t).e. Using a quadratic model. have all their elements nonnegative. . nodes 2 and 3 are connected in both directions. 2. A cycle of length l is a path of length l. l − 1. you will ˆ ﬁnd z (t + 1) as follows. 2. with Ask+1 .m on the course web site. 1. there is an edge from 2 to 3 and also an edge from 3 to 2. . i. Your solution (which includes what you can say about A and B. that does pass through node 9? (e) Among all paths of length 10 that start at node 5. z(t + 1) based on the three previous elements of the series. (a) What is the length of a shortest cycle? (Shortest means minimum length. 2 is a path of length 3. . 0 1 A= 0 0 0 0 1 0 0 1 0 1 1 0 . in the graph shown above. In other words. sl−1 .e. 1. s0 . What can you say must be true of A and B? Please give your answer ﬁrst. You must also explain clearly how you arrived at your answer. for i.s1 = 1.) (b) What is the length of a shortest path from node 13 to node 17? (If there are no paths from node 13 to node 17. with no repeated nodes other than the endpoints. The rest of this problem concerns a speciﬁc graph. you must give the answer explicitly (for example. as well as your justiﬁcation) must be short.s0 = 1. 2. . .) (c) What is the length of a shortest path from node 13 to node 17. . 
and then the justiﬁcation. j which maximize the number of paths of length 10 from i to j. .1 2 4 The node adjacency matrix for this example is 3 In this example. j = 1. s1 . enclosed in a box). A path of length l > 0 from node j to node i is a sequence s0 = j. . and its inverse B. 4.sl−1 = 1. . ﬁnd the most common starting node. We’ll denote the predicted value of z(t + 1) by z (t + 1). i. . As2 . a cycle is a sequence of nodes of the form s0 . Bij ≥ 0. 0 0 . i.19 Element-wise nonnegative matrix and inverse. . (f) Among all paths of length 10 that end at node 8. ﬁnd the most common pair of starting and ending nodes. 1 is a cycle of length 4.

). one is good enough for us. . ˆ z(t − 2) where c ∈ R1×3 . (c) Each column of P makes an acute angle with each column of Q. You can assume that all matrices mentioned have appropriate dimensions. . .1*sin(2*t . t = 1:1000. Here is an example: “Every column of C is a linear combination of the columns of B” can be expressed as “C = BF for some matrix F ”. f (t − 1) = z(t − 1). and does not depend on t. the quadratic extrapolator is a linear function.e. In other words. Show that ˆ z(t) z (t + 1) = c z(t − 1) .(a) Find the quadratic function f (τ ) = a2 τ 2 + a1 τ + a0 which satisﬁes f (t) = z(t). row i of Z is a linear combination of rows i.21 Express the following statements in matrix language. (d) Each column of P makes an acute angle with the corresponding column of Q. which is given by (1/997) 1000 2 z j=4 (ˆ(j) − z(j)) 1000 (1/997) j=4 z(j)2 1/2 (b) Use the following matlab code to generate a time series z: . Use the quadratic extrapolation method from part (a) to ﬁnd z (t) for t = 4. (b) W is obtained from V by permuting adjacent odd and even columns (i.1*sin(t) + 0. . 1 and 2. 2.5). (a) For each i. z = 5*sin(t/10 + 2) + 0. . . . 2..22 Show that a + b ≥ a − b . Then the extrapolated value is given by z (t + 1) = f (t + 1). 9 . Find c explicitly. Find the ˆ relative root-mean-square (RMS) error. 1000. . and f (t − 2) = z(t − 2). . n of Y . . 3 and 4. (e) The ﬁrst k columns of A are orthogonal to the remaining columns of A. There can be several answers. .

Lecture 3 – Linear algebra review

3.1 Price elasticity of demand. The demand for n different goods is a function of their prices: q = f(p), where p is the price vector, q is the demand vector, and f : Rⁿ → Rⁿ is the demand function. The current price and demand are denoted p∗ and q∗, respectively. Now suppose there is a small price change δp, so p = p∗ + δp. This induces a change in demand, to q ≈ q∗ + δq, where δq ≈ Df(p∗)δp. Here Df is the derivative or Jacobian of f, with entries

Df(p∗)ij = ∂fi/∂pj (p∗).

This is usually rewritten in terms of the elasticity matrix E, with entries

Eij = (∂fi/∂pj (p∗)) · (1/q∗i) / (1/p∗j),

so Eij gives the relative change in demand for good i per relative change in price j. Defining the vector y of relative demand changes, yi = δqi/q∗i, and the vector x of relative price changes, xj = δpj/p∗j, we have the linear model y = Ex. Here are the questions:

(a) What is a reasonable assumption about the diagonal elements Eii of the elasticity matrix?
(b) Goods i and j are called substitutes if they provide a similar service or other satisfaction (e.g., train tickets and bus tickets, cake and pie, etc.). They are called complements if they tend to be used together (e.g., automobiles and gasoline, left and right shoes, etc.). For each of these two generic situations, what can you say about Eij and Eji?
(c) Suppose the price elasticity of demand matrix for two goods is

E = [ −1  −1
      −1  −1 ].

Describe the nullspace of E, and give an interpretation (in one or two sentences). What kind of goods could have such an elasticity matrix?

3.2 Color perception. Human color perception is based on the responses of three different types of color light receptors, called cones. The three types of cones have different spectral response characteristics and are called L, M, S because they respond mainly to long, medium, and short wavelengths, respectively. In this problem we will divide the visible spectrum into 20 bands, and model the cones' response as follows:

Lcone = Σ_{i=1}^{20} li pi,   Mcone = Σ_{i=1}^{20} mi pi,   Scone = Σ_{i=1}^{20} si pi,

where pi is the incident power in the ith wavelength band, and li, mi and si are nonnegative constants that describe the spectral response of the different cones. The perceived color is a complex function of the three cone responses, i.e., the vector (Lcone, Mcone, Scone), with different cone response vectors perceived as different colors. (Actual color perception is a bit more complicated than this, but the basic idea is right.)

(a) Metamers. When are two light spectra, p and p̃, visually indistinguishable? (Visually identical lights with different spectral power compositions are called metamers.)
(b) Visual color matching. In a color matching problem, an observer is shown a test light and is asked to change the intensities of three primary lights until the sum of the primary lights looks like the test light. In other words, the observer is asked to find a spectrum of the form

pmatch = a1 u + a2 v + a3 w,

where u, v, w are the spectra of the primary lights, and ai are the intensities to be found, that is visually indistinguishable from a given test light spectrum ptest. Can this always be done? Discuss briefly.
(c) Visual matching with phosphors. A computer monitor has three phosphors, R, G, and B. It is desired to adjust the phosphor intensities to create a color that looks like a reference test light. Find weights that achieve the match or explain why no such weights exist. The data for this problem is in color_perception.m. Running color_perception will define and plot the vectors wavelength, B_phosphor, G_phosphor, R_phosphor, L_coefficients, M_coefficients, S_coefficients, and test_light.
(d) Effects of illumination. An object's surface can be characterized by its reflectance (i.e., the fraction of light it reflects) for each band of wavelengths. If the object is illuminated with a light spectrum characterized by Ii, and the reflectance of the object is ri (which is between 0 and 1), then the reflected light spectrum is given by Ii ri, where i = 1, . . . , 20 denotes the wavelength band. Now consider two objects illuminated (at different times) by two different light sources, say an incandescent bulb and sunlight. Sally argues that if the two objects look identical when illuminated by a tungsten bulb, they will look identical when illuminated by sunlight. Beth disagrees: she says that two objects can appear identical when illuminated by a tungsten bulb, but look different when lit by sunlight. Who is right? If Sally is right, explain why. If Beth is right, give an example of two objects that appear identical under one light source and different under another. You can use the vectors sunlight and tungsten defined in color_perception.m as the light sources.

Remark. Spectra, intensities, and reflectances are all nonnegative quantities, which the material of EE263 doesn't address. These issues can be handled using the material of EE364a. So just ignore this while doing this problem.

3.3 Halfspace. Suppose a, b ∈ Rⁿ are two given points. Show that the set of points in Rⁿ that are closer to a than b is a halfspace, i.e.:

{x | ‖x − a‖ ≤ ‖x − b‖} = {x | cᵀx ≤ d}

for appropriate c ∈ Rⁿ and d ∈ R. Give c and d explicitly, and draw a picture showing a, b, and the halfspace.

3.4 Some properties of the product of two matrices. For each of the following statements, either show that it is true, or give a (specific) counterexample.

• If A and B are full rank, then AB is full rank.
• If A and B have zero nullspace, then so does AB.
• If A and B are onto, then so is AB.
• If AB is full rank, then A and B are full rank.

You can assume that A ∈ R^{m×n} and B ∈ R^{n×p}. As a trivial example, all of the statements above are true when A and B are scalars, i.e., n = m = p = 1. Some of the false statements above become true under certain assumptions on the dimensions of A and B. For each of the statements above, find conditions on n, m, and p that make them true. Try to find the most general conditions you can. You can give your conditions as inequalities involving n, m, and p, or you can use more informal language such as "A and B are both skinny."

3.5 Rank of a product. Suppose that A ∈ R^{7×5} has rank 4, and B ∈ R^{5×7} has rank 3. What values can Rank(AB) possibly have? For each value r that is possible, give an example, i.e., a specific A and B with the dimensions and ranks given above, for which Rank(AB) = r. Please try to give simple examples, that make it easy for you to justify that the ranks of A, B, and AB are what you claim they are. You can use matlab to verify the ranks, but we don't recommend it: numerical roundoff errors in matlab's calculations can sometimes give errors in computing the rank. (Matlab may still be useful; you just have to double check that the ranks it finds are correct.) Explain briefly why the rank of AB must be one of the values you give.

3.6 Linearizing range measurements. Consider a single (scalar) measurement y of the distance or range of x ∈ Rⁿ to a fixed point or beacon at a, i.e., y = ‖x − a‖.

(a) Show that the linearized model near x0 can be expressed as δy = kᵀδx, where k is the unit vector (i.e., with length one) pointing from a to x0. Derive this analytically, and also draw a picture (for n = 2) to demonstrate it.
(b) Consider the error e of the linearized approximation, i.e.,

e = ‖x0 + δx − a‖ − ‖x0 − a‖ − kᵀδx.

The relative error of the approximation is given by η = e/‖x0 − a‖. We know, of course, that the absolute value of the relative error is very small provided δx is small. In many specific applications, it is possible and useful to make a stronger statement, for example, to derive a bound on how large the error can be. You will do that here. In fact you will prove that

0 ≤ η ≤ α²/2,

where α = ‖δx‖/‖x0 − a‖ is the relative size of δx. For example, for a relative displacement of α = 1%, we have η ≤ 0.00005, i.e., the linearized model is accurate to about 0.005%. To prove this bound you can proceed as follows:

• Show that η = −1 + √(1 + α² + 2β) − β, where β = kᵀδx/‖x0 − a‖.
• Verify that |β| ≤ α.
• Consider the function g(β) = −1 + √(1 + α² + 2β) − β with |β| ≤ α. By maximizing and minimizing g over the interval −α ≤ β ≤ α, show that 0 ≤ η ≤ α²/2.

3.7 Orthogonal complement of a subspace. If V is a subspace of Rⁿ we define V⊥ as the set of vectors orthogonal to every element in V, i.e.,

V⊥ = {x | ⟨x, y⟩ = 0, ∀y ∈ V}.

(a) Verify that V⊥ is a subspace of Rⁿ.
(b) Suppose V is described as the span of some vectors v1, v2, . . . , vr. Express V and V⊥ in terms of the matrix V = [v1 v2 · · · vr] ∈ R^{n×r} using common terms (range, nullspace, transpose, etc.).
(c) Show that every x ∈ Rⁿ can be expressed uniquely as x = v + v⊥, where v ∈ V and v⊥ ∈ V⊥. Hint: let v be the projection of x on V.
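The decomposition in problem 3.7(c) can be checked numerically: with V = range(Vmat), the projection v = Vmat (Vmatᵀ Vmat)⁻¹ Vmatᵀ x lies in V, and the remainder is orthogonal to every element of V. A sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)
Vmat = rng.standard_normal((5, 2))   # hypothetical: independent columns spanning V
x = rng.standard_normal(5)

# Projection of x onto range(Vmat), following the hint in part (c):
P = Vmat @ np.linalg.inv(Vmat.T @ Vmat) @ Vmat.T
v = P @ x
v_perp = x - v

# x = v + v_perp, with v in V and v_perp orthogonal to every column of Vmat:
print(np.allclose(x, v + v_perp))
print(Vmat.T @ v_perp)               # numerically zero
```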

(d) Show that dim V ⊥ + dim V = n. range. with Boolean addition and multiplication (i. Either design a one-bit error correcting linear block ˆ code with the smallest possible m. (c) Apply (a) to the quadratic resulting when the expression in (b) is expanded. then the decoding is perfect. 1 + 1 = 0). 3.10 Proof of Cauchy-Schwarz inequality. v = 0. i. Consider n = 3. Suppose x ∈ Zn is a Boolean vector we wish to transmit over an 2 m×n is the coding unreliable channel. the ﬁeld Z2 is ﬁnite.e. which consists of the two numbers 0 and 1. where A ∈ Z2 Boolean matrix. in Z2 . x and Ax always point in the same direction. i. c ≥ 0. x = x. This holds if and ˆ only if H is a left inverse of G. Linear block codes.. Find the conditions under which A has full rank. √ (a) Suppose a ≥ 0. i. for example Z2 . In this problem you will explore one simple example: block codes. What can you say about the matrix A? Be very speciﬁc.8 Consider the linearized navigation equations from the lecture notes. in terms of the relative positions of the unknown coordinates and the beacons). the vector y = Gx is formed. has only two elements.. x) = 0 for all x ∈ Rn . where H ∈ Zn×m . we deﬁne a function f : Zn → Zm to be linear (over Z2 ) if 2 2 2 f (x + y) = f (x) + f (y) and f (αx) = αf (x) for every x. 3.e. i. the ith bit is transmitted correctly. and sometimes the complex numbers.e. HG = In .e. In this course the scalar ﬁeld. Note that the vector y is ‘redundant’. when vi = 1. i. Much of the linear algebra for Rn 2 and Cn carries over to Zn . For example. ˆ where v is a noise vector (which usually is zero). and all the operations in the matrix multiplication are Boolean. we still have x = x. independence and rank are all deﬁned in the obvious way for vector spaces over Z2 .e. will usually be the real numbers. and for all λ ∈ R. roughly speaking we have coded an n-bit vector as a (larger) m-bit vector. to get the CauchySchwarz inequality: √ √ |v T w| ≤ v T v wT w. Concepts like nullspace. 
Unlike R or C.e. i.. One reasonable requirement is that ˆ ˆ 2 if the transmission is perfect. (d) When does equality hold? 3.) 13 .e. i. Describe the conditions geometrically (i.. The coded vector y is transmitted over the channel.9 Suppose that (Ax. A vector in Zn is called a Boolean vector. the ith bit is changed during transmission. Although we won’t consider them in this course.. 3.. the received signal is multiplied by another matrix: x = H y . In a linear decoder. or explain why it cannot be done.. It is easy to show that 2 m×n is a every linear function can be expressed as matrix multiplication. where G ∈ Z2 matrix. Show that |b| ≤ ac. indeed. It is also possible to consider vector spaces over other ﬁelds. the components of vectors. y ∈ Zn and α ∈ Z2 .11 Vector spaces over the Boolean ﬁeld. there are many important applications of vector spaces and linear dynamical systems over Z2 . the received signal y is given by ˆ y = y + v. (a) What is the practical signiﬁcance of R(G)? (b) What is the practical signiﬁcance of N (H)? (c) A one-bit error correcting code has the property that for any noise v with one component equal to one. In a linear block code. f (x) = Ax. This means that when vi = 0. You will prove the Cauchy-Schwarz inequality.e. and m > n. m) code.e. give G and H explicitly and verify that they have the required properties. w ∈ Rn explain why (v + λw)T (v + λw) ≥ 0 for all λ ∈ R. a + 2bλ + cλ2 ≥ 0. which we assume to be the case. (b) Given v. (By design we mean. (e) Show that V ⊆ U implies U ⊥ ⊆ V ⊥ . This is called an (n...
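Arithmetic over Z2 is just integer arithmetic reduced mod 2, which makes the block-code setup easy to experiment with. A sketch using a tiny hypothetical (1, 3) repetition code (chosen for illustration, not as an answer to the design question); it also shows the x̂ = x + Hv effect of a one-bit error on a linear decoder:

```python
import numpy as np

# Hypothetical (n, m) = (1, 3) repetition code over Z_2.
G = np.array([[1], [1], [1]])           # coding matrix, y = Gx
H = np.array([[1, 1, 1]])               # HG = 3 = 1 (mod 2), so H is a left inverse of G

x = np.array([1])
y = (G @ x) % 2                         # transmitted codeword
assert np.array_equal((H @ y) % 2, x)   # perfect channel: decoding is exact

# One-bit error: the linear decoder returns x + Hv (mod 2), so it errs
# whenever Hv != 0.
v = np.array([1, 0, 0])
x_hat = (H @ ((y + v) % 2)) % 2
print(x_hat)                            # not equal to x here, since Hv = 1 (mod 2)
```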

and a nonlinear unbiased estimator. and v = 0. so there exists at least one right inverse. but you don’t need them. Explain in the general case why a linear unbiased estimator exists whenever there is a nonlinear one. (a) The second row of B is zero. In other words: it’s possible to have a nonlinear unbiased estimator. i. You may not assume anything about the dimensions of A. 3.Remark: linear decoders are never used in practice.e. which opens the possibility that we can seek right inverses that in addition have other properties. f is not linear). and v ∈ Rm is the vector of measurement errors and noise. give a speciﬁc example of a matrix A. given y).e.. If the function f : Rm → Rn satisﬁes f (Ax) = x for all x ∈ Rn . then we say that f is an unbiased estimator (of x. its rank. Nonlinear unbiased estimators do exist. i. B. (This statement is taken to be true if there are no unbiased estimators for a particular A. Bij = 0 for i < j. (We’ll be very angry if we have to type in your 5 × 3 matrix into matlab to check it.12 Right inverses. Here. 14 . (b) The nullspace of B has dimension one. and an unbiased nonlinear estimator. If you believe this statement. and justify it completely. We consider the standard measurement setup: y = Ax + v. but no linear unbiased estimator exists. In EE263 we have studied linear unbiased estimators. though.13 Nonlinear unbiased estimators. or explain why there is no such B. In cases where there is a right inverse B with the required property. then give a speciﬁc example of a matrix A.) If you believe this statement. If you believe this statement. there are many right inverses of A. y ∈ Rm is the vector of measurements we take.. 3.. if f is any unbiased estimator. its rank is 3). One of the following statements is true. then f returns the true parameter value x. nullspace. A. (d) The second and third rows of B are the same. In other words. But whenever there is a nonlinear unbiased estimator. 
This problem concerns the speciﬁc −1 0 A= 0 1 1 0 matrix 0 −1 1 0 0 1 1 0 . x ∈ Rn is the vector of parameters we wish to estimate. and also explain why no linear unbiased estimator exists. What this means is that if f is applied to our measurement vector. explain why. there are far better nonlinear ones. etc. where A ∈ Rm×n . then f must be a linear function. there is also a linear unbiased estimator. either ﬁnd a speciﬁc matrix B ∈ R5×3 that satisﬁes AB = I and the given property. There is no such thing as a nonlinear unbiased estimator. (e) B is upper triangular. brieﬂy explain why there is no such B. 0 This matrix is full rank (i. Pick the statement that is true. You can quote any result given in the lecture notes.e. C. you must brieﬂy explain how you found your B. we allow the possibility that f is nonlinear (which we take to mean. You must also attach a printout of some matlab scripts that show the veriﬁcation that AB = I. In fact. Bij = 0 for i > j. For each of the cases below.) When there is no right inverse with the given property. (c) The third column of B is zero. which are unbiased estimators that are also linear functions. (f) B is lower triangular. There are cases for which nonlinear unbiased estimators exist.
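Right inverses of a fat, full-rank matrix form an affine family, which is what makes it possible to look for a right inverse with extra properties. A sketch with a hypothetical 3 × 5 matrix (not the A of the problem): B = Aᵀ(AAᵀ)⁻¹ is one right inverse, and adding columns from null(A) gives others.

```python
import numpy as np

# Hypothetical 3x5 full-rank matrix, for illustration only.
A = np.array([[1.0, 0.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0, 1.0]])

B = A.T @ np.linalg.inv(A @ A.T)
assert np.allclose(A @ B, np.eye(3))          # AB = I

# Perturb B by null-space columns: A Z = 0, so A (B + Z W) = I as well.
_, _, Vt = np.linalg.svd(A)
Z = Vt[3:].T                                   # 5x2, columns span null(A)
W = np.ones((2, 3))                            # arbitrary coefficients
assert np.allclose(A @ (B + Z @ W), np.eye(3))
```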

y) = (x. We have exact measurements of the (spherical) distances between each beacon and the unknown point x. which means x = −y. (Thus. . As another example. be the same). because no matter what the dimensions of A and B (which must. . Determine if the following statements are true or false. k. The disturbance v is known to be a linear combination of some (known) disturbance patterns. . measured along the sphere.3. . disturbances d1 . whose columns are the individual disturbance patterns. We consider linear equalizers for the channel. d3 . pk under which we can unambiguously determine x. which is deﬁned as the set of vectors with norm one: S n = {x ∈ Rn | x = 1}.. For the problem data given in cedr_data. where B ∈ Rn×m . 3. the spherical distance between them is sphdist(x. . d1 . and A ∈ Rm×n describes the (known) channel.) We say the equalizer B rejects the disturbance pattern di if x = x. 3. the exact position of x. based on this information. when the disturbance v is any linear combination of d1 . pk ∈ S n are the (known) positions of some beacons on the unit sphere. measured in radians: If x. where we take the angle as lying between 0 and π. y). .14 Channel equalizer with disturbance rejection.e.g.e.. and let x ∈ S n be an unknown point on the unit sphere. or you can give a geometric condition (involving the vectors pi ). We would like to determine.’ (We only need one set of disturbances of the maximum size. . and d6 . d3 . which occurs only when the two points x. Find the conditions on p1 . nullspace. Show the matlab veriﬁcation that your B does indeed reconstruct x. and rejects the disturbance patterns you claim it does. We deﬁne the spherical distance between two vectors on the unit sphere as the distance between them. Show any other calculations needed to verify that your equalizer rejects the maximum number of patterns possible. What we mean by “true” is that the statement is true for all values of the matrices and vectors given. 
Here is the problem.15 Identifying a point on the unit sphere from spherical distances. If the equalizer rejects a set of disturbance patterns. e. as in ‘My equalizer rejects three disturbance patterns: d2 .16 Some true/false questions. using any of the concepts used in class (e. d3 . and d7 (say).. the statement “A + B = B + A” is true. ˆ when v = di . because there are (square) matrices for which this doesn’t hold. you might say that Bij are the equalizer coeﬃcients. i. . . as the angle between the vectors. dk ∈ Rm . ﬁnd an equalizer B that rejects as many disturbance patterns as possible. the maximum distance between two points in S n is π. we are given the numbers ρi = sphdist(x. however. . rank). given the distances ρi . which have the form x = By. A communication channel is described by y = Ax + v where x ∈ Rn is the (unknown) transmitted signal. pi ). You can give your solution algebraically. without any ambiguity. (The disturbance patterns are given as an m × k matrix D. . You can’t assume anything about the dimensions of the matrices (unless it’s explicitly stated). You must justify your answer. an identity matrix. For example. . the statement holds. . But that doesn’t make the statement true. but you can assume that the dimensions are such that all expressions make sense. for example. . In this problem we consider the unit sphere in Rn . y ∈ Rm is the (known) received signal. No justiﬁcation or discussion is needed for your answers.. and d7 . then it can reconstruct the transmitted signal exactly.m. i. i = 1. more precisely. .) Give the speciﬁc set of disturbance patterns that your equalizer rejects. no matter what x is. the statement A2 = A is false.) Explain how you know that there is no equalizer that rejects more disturbance patterns than yours does. (There are also matrices for which it does hold.) Now suppose p1 . range.) 15 . ˆ (We’ll refer to the matrix B as the equalizer. v ∈ Rm is the (unknown) disturbance signal.g. y are antipodal. y ∈ S n . . 
for any x ∈ S n . and no matter what values A and B have.
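The spherical distance of problem 3.15 is just the angle between unit vectors. A minimal sketch (the clip guards against floating-point values slightly outside [−1, 1]):

```python
import numpy as np

def sphdist(x, y):
    """Spherical distance between unit vectors: the angle between them, in [0, pi]."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

x = np.array([1.0, 0.0, 0.0])
print(sphdist(x, x))                          # 0: identical points
print(sphdist(x, -x))                         # pi: antipodal points
print(sphdist(x, np.array([0.0, 1.0, 0.0]))) # pi/2
```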

are aﬃne functions of the power dissipated by three processor cores. and i=1 αi ui = 0. is onto. These temperatures. . What we mean by “true” is that the statement is true for all values of the matrices and vectors given.g. is onto.) You can’t assume anything about the dimensions of the matrices (unless it’s explicitly stated). . and no matter what values A and B have. If A is full rank and skinny. however. If the matrix A B A C 0 B A B . We make 4 measurements. then so is the matrix e. As another example. the statement “A + B = B + A” is true. But that doesn’t make the statement true. denoted P = (P1 . then so is the matrix d.18 Temperatures in a multi-core processor. . then αi = 0 for i = 1.17 Some true/false questions. . . Determine if the following statements are true or false. We are concerned with the temperature of a processor at two critical locations. all cores are idling. ABC (e) If (f) If A 0 0 B is full rank. uk ∈ Rn are nonzero vectors such that uT uj ≥ 0 for all i.a. the statement holds. then A is full rank. entries) of the matrix A are positive. denoted T = (T1 . 100W. If all coeﬃcients (i. k. then A is full rank.. (i) Suppose u1 . because no matter what the dimensions of A and B (which must. (j) Suppose A ∈ Rn×k and B ∈ Rn×m are skinny. T2 ) (in degrees C). 16 . and the other two are idling. then so is the matrix 3. then A + B is onto. c. then so are the matrices A and B. entries) of the matrices A and B are nonnegative. which means if α1 . (You can assume the entries of the matrices and vectors are all real. f. full rank matrices that satisfy AT B = 0. and both A and B are onto. but you can assume that the dimensions are such that all expressions make sense. A+B (b) N A+B+C A (c) N AB = N (A) ∩ N (B) ∩ N (C). . P2 . the statement A2 = A is false. and dissipate 10W. one of the processors is set to full power. (h) If AT A is onto. If A and B are onto. . be the same). . an identity matrix. P3 ) (in W). 3... 
In each experiment we measure and note the temperatures at the two critical locations. In the ﬁrst. A 2 0 (g) If A is onto. In the next three measurements. then so are the matrices A and B. If A and B are onto. then A + B must be onto. (d) N (B T AT AB + B T B) = N (B). then A is onto. then A is onto. because there are (square) matrices for which this doesn’t hold.e. Then the vectors are i k nonnegative independent. . If A and B are onto. A = N (A) ∩ N (B) ∩ N (C). .e. (There are also matrices for which it does hold. A B . αk ∈ R are nonnegative scalars. e. For example. j. .) (a) If all coeﬃcients (i. b. . . Then [A B] is skinny and full rank.

. . . deﬁnes A and y (as A and ytilde). 10%). according to y = A1 x1 + · · · + Ak xk . . . . (You can also assume that ni > 0 and Ai = 0. or even whether any has failed). Ak ..e. in general.21 Vector space multiple access (VSMA). . you don’t have to give a formal argument. . xk . m ≥ ni for i = 1. a This is often expressed as a percentage.P1 10W 100W 10W 10W P2 10W 10W 100W 10W P3 10W 10W 10W 100W T1 27◦ 45◦ 41◦ 35◦ T2 29◦ 37◦ 49◦ 55◦ Suppose we operate all cores at the same power. one (say. ˜ Determine which sensor has failed (or if no sensors have failed). for i = 1. Here are four statements about decodability: (a) Each of the signals x1 . . . For bounding (a. without interference from the other transmitters. and submit your code. 3. If the kth sensor fails. k. . ˜ ˜ The ﬁle one_bad_sensor. you can use the matlab code rank([F g])==rank(F) to check if g ∈ R(F ). We have y = Ax. . . . b) be? Explain your reasoning. . where Ai ∈ Rm×ni . we have yi = yi for all i = k. while rejecting any interference from the other signals x1 .. k. How large can we make p. .19 Relative deviation between vectors. . 3. Unfortunately. 17 . of course. which includes the signals of all transmitters. . we call this vector space multiple access.e. . Roughly speaking. For this exercise. available on the course web site. xk is decodable. (We will see later a much better way to check if g ∈ R(F ). . where y is the same as y ˜ ˜ in all entries except. If all sensors are operating correctly. up to one sensor may have failed (but you don’t know which one has failed. and x ∈ Rn is to be found. this means that receiver i can process the received signal so as to perfectly recover the ith transmitted signal. possibly. Ak . without T1 or T2 exceeding 70◦ ? You must fully explain your reasoning and method. xi−1 . . . . no matter what values x1 . Each receiver knows the received signal y. divided by the norm of a. Suppose ηab = 0.20 Single sensor failure detection and identiﬁcation. ηab = a−b . 
ηab = ηba . b). i. We consider a system of k transmitter-receiver pairs that share a common medium.1 (i. . . and the matrices A1 . xk have. . The goal is for transmitter i to transmit a vector signal xi ∈ Rni to the ith receiver. p. We say that the ith signal is decodable if the ith receiver can determine the value of xi . . . The relative deviation of b from a is deﬁned as the distance between a and b.) Since the k transmitters all share the same m-dimensional vector space. where A ∈ Rm×n is known. . . The relative deviation is not a symmetric function of a and b. xi+1 . in addition to providing the numerical solution.) 3. How big and how small can be ηba be? How big and how small can (a. . Suppose a and b are nonzero vectors of the same size. You are given y and not y.m. . the kth entry). All receivers have access to the same signal y ∈ Rm . we have y = y . you can just draw some pictures. on the matrices A1 . Whether or not the ith signal is decodable depends. You can assume that the matrices Ai are skinny. You must explain your method.
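One way to organize the computation for the multi-core problem (3.18): stack the four measurements, solve for the affine model T = AP + b, then maximize the common power p subject to both temperature limits. A sketch (the bound is the smaller of the two per-location limits):

```python
import numpy as np

# Four (P, T) measurements from the table (powers in W, temperatures in C).
P = np.array([[10, 10, 10],
              [100, 10, 10],
              [10, 100, 10],
              [10, 10, 100]], dtype=float)
T = np.array([[27, 29],
              [45, 37],
              [41, 49],
              [35, 55]], dtype=float)

# Affine model T = A P + b; unknowns per location: 3 gains + 1 offset.
M = np.hstack([P, np.ones((4, 1))])   # 4x4, invertible for this data
X = np.linalg.solve(M, T)             # column j holds the parameters for T_j
A, b = X[:3].T, X[3]

# With all cores at p watts, T = A.sum(axis=1) * p + b; keep both <= 70:
p_max = np.min((70 - b) / A.sum(axis=1))
print(p_max)
```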

we will accept any correct one. choose ak that maximizes aT ai . These signals.2. recd aT a3 = 1.e. choose ak that minimizes arecd − ai . must have a very speciﬁc form: it must consist of a conjunction of one or more of the following properties: I. You can just assume that ties never occur. we want only your answers.(b) The signal x1 is decodable. . . there may be more than one correct answer.” This response will receive partial credit. .22 Minimum distance and maximum correlation decoding. you are to give the exact (i. Ak and n1 . . . if we have N = 3 and arecd − a1 = 2.. however. We do not want. . For example. You can also give the response “My attorney has advised me not to respond to this question at this time. For (just) this problem. . which collectively are called the signal constellation. or any other type of answers. in which a sender transmits one of N possible signals to a receiver. (c) The signals x2 . in terms of A1 . For example. For both methods. based on the received signal arecd . . then the minimum distance decoder would guess that the signal a2 was sent. When the signal ak is sent. II. We will denote the possible signals as a1 . possible answers (for each statement) could be “I” or “I and II”. For some statements. aN ) under which these two decoding methods are the same. There are many ways to do this. • Maximum correlation decoding. . . rank(A1 ) < n1 . arecd − a2 = 0. . the receiver has to estimate or guess which of the N signals was sent. IV.. nk . the noise v is described by a statistical model. . where v ∈ Rn is (channel or transmission) noise. one of the signals is always closest to.3. aN ∈ Rn . but in this problem we explore two methods. You must show how 18 . . i. . but x1 isn’t. Choose as the estimate of the decoded signal the one in the constellation that has the largest inner product with the received signal. 3. let’s not worry about breaking ties. recd then the maximum correlation decoder would guess that the signal a3 was sent. . 
xk are decodable when x1 is 0. and will not read.1. arecd − a3 = 1. . are known to both the transmitter and receiver.. We will represent the signals by vectors in Rn . As examples. We consider a simple communication system. (d) The signals x2 . the set of vectors a1 . Give some general conditions on the constellation (i.0. which receives a version of the signal sent that is corrupted by noise. or has maximum inner product with. necessary and suﬃcient) conditions under which the statement holds. the received signal is arecd = ak + v. Each answer. By ‘same’ we mean this: for any received signal arecd .e. the received signal. rank([A1 · · · Ak ]) = n1 + rank([A2 · · · Ak ]). .e. recd aT a2 = 0. Give the simplest condition you can. i. . if we have N = 3 and recd aT a1 = −1. rank([A1 · · · Ak ]) = rank(A1 ) + rank([A2 · · · Ak ]). The receiver must make a guess or estimate as to which of the signals was sent. but here we’ll just assume that it is ‘small’ (and in any case.1. Based on the corrupted received signal. any further explanation or elaboration. the decoded signal for the two methods is the same. rank([A2 · · · Ak ]) = n2 + · · · + nk . . . . or “I and II and IV”.2. . it does not matter for the problem). Choose as the estimate of the decoded signal the one in the constellation that is closest to what is received. For each of these statements. III. xk are decodable.e. • Minimum distance decoding. In a communications course. . ..

In other words, when your conditions hold, the two decoding schemes always give the same answer; when your conditions don't hold, the two decoding schemes differ for some received signal. (We are not asking you to show that when your conditions don't hold, the two decoding schemes differ for some received signal; for that part, give a specific counterexample for which your conditions don't hold, and the methods differ.) You might want to check simple cases like n = 1 (scalar signals), N = 2 (only two messages in the constellation), or draw some pictures. But then again, you might not.

2.23 Reverse engineering a smoothing filter. A smoothing filter takes an input vector u ∈ R^n and produces an output vector y ∈ R^n. (We will assume that n ≥ 3.) The output y is obtained as the minimizer of the objective

J = J^track + λ J^norm + µ J^cont + κ J^smooth,

where λ, µ, and κ are positive constants (weights), and

J^track = Σ_{i=1}^n (u_i − y_i)^2,  J^norm = Σ_{i=1}^n y_i^2

are the tracking error and norm-squared of y, respectively, and

J^cont = Σ_{i=2}^n (y_i − y_{i−1})^2,  J^smooth = Σ_{i=2}^{n−1} (y_{i+1} − 2y_i + y_{i−1})^2

are measures of the continuity and smoothness of y, respectively.

Here is the problem: you will reverse engineer the smoothing filter, working from one input-output pair, i.e., an input u and the associated output y. Your goal is to find the weights λ, µ, and κ. (You do not need to worry about ensuring that these are positive; you can assume this will occur automatically.)

(a) Explain how to find λ, µ, and κ. You can use any concepts from the class, e.g., least-squares, range, nullspace.
(b) Carry out your method on the data found in rev_eng_smooth_data.m. Give the values of the weights.

2.24 Flux balance analysis in systems biology. Flux balance analysis is based on a very simple model of the reactions going on in a cell, keeping track only of the gross conservation of various chemical species (metabolites) within the cell. We focus on m metabolites in a cell, labeled M1, . . . , Mm. There are n (reversible) reactions going on, labeled R1, . . . , Rn, with reaction rates v1, . . . , vn ∈ R. A positive value of vi means the reaction proceeds in the given direction, while a negative value of vi means the reaction proceeds in the reverse direction. Each reaction has a (known) stoichiometry, which tells us the rate of consumption and production of the metabolites per unit of reaction rate. The stoichiometry data is given by the stoichiometry matrix S ∈ R^{m×n}, defined as follows: Sij is the rate of production of Mi due to unit reaction rate vj = 1. Here we consider consumption of a metabolite as negative production; so Sij = −2, for example, means that reaction Rj causes metabolite Mi to be consumed at a rate 2vj. (If vj is negative, then metabolite Mi is produced at the rate 2|vj|.)

As an example, suppose reaction R1 has the form M1 → M2 + 2M3. The consumption rate of M1, due to this reaction, is v1; the production rate of M2 is v1; and the production rate of M3 is 2v1. (The reaction R1 has no effect on metabolites M4, . . . , Mm.) This corresponds to a first column of S of the form (−1, 1, 2, 0, . . . , 0).

Reactions are also used to model flow of metabolites into and out of the cell. For example, suppose that reaction R2 corresponds to the flow of metabolite M1 into the cell, with v2 giving the flow rate. (When v2 < 0, it means that |v2| is the flow rate of the metabolite out of the cell.) This corresponds to a second column of S of the form (1, 0, . . . , 0).

The last reaction, Rn, corresponds to biomass creation, or cell growth, so the reaction rate vn is the cell growth rate. The last column of S gives the amounts of metabolites used (when the entry is negative) or created (when positive) per unit of cell growth rate. Since our reactions include metabolites entering or leaving the cell, as well as those converted to biomass within the cell, we have conservation of the metabolites, which can be expressed as the flux balance equation Sv = 0.

Now we consider the effect of knocking out a gene. For simplicity, we'll assume that reactions 1, . . . , n − 1 are each controlled by an associated gene, i.e., gene Gk controls reaction Rk. Knocking out Gk has the effect of setting the associated reaction rate to zero: vk = 0.

Finally, we get to the point of all this. Suppose there is no v ∈ R^n that satisfies Sv = 0, vk = 0, vn > 0. This means there are no reaction rates consistent with cell growth, flux balance, and the gene knockout. In this case, we predict that knocking out gene Gk will kill the cell, and we call gene Gk an essential gene.

(a) Explain how to find all essential genes, given the stoichiometry matrix S. You can use any concepts from the course, e.g., range, nullspace, least-squares.
(b) Carry out your method for the problem data given in flux_balance_bio_data.m. List all essential genes.

Remark. This is a very simple version of the problem. In EE364a, you'll see more sophisticated versions of the same problem, that incorporate lower and upper limits on reaction rates and other realistic constraints.

2.25 Memory of a linear time-invariant system. An input signal (sequence) u_t ∈ R, t ∈ Z (i.e., t = . . . , −1, 0, 1, . . .) and output signal y_t ∈ R, t ∈ Z, are related by a convolution operator

y_t = Σ_{τ=1}^M h_τ u_{t−τ},  t ∈ Z,

where h = (h1, . . . , hM) are the impulse response coefficients of the convolution system. (Convolution systems are also called linear time-invariant systems.) When hM ≠ 0, the integer M is called the memory of the system.

You are given the input and output signal values over a time interval t = 1, . . . , T: (u1, . . . , uT), (y1, . . . , yT). Note that you do not know uτ or yτ for τ ≤ 0 or τ > T, and of course, you do not know h. The goal is to find the smallest memory M consistent with this data. You may assume that T > 2M.

(a) Explain how to solve this problem, using any concepts from the course.
(b) Use your method from part (a) on the data found in mem_lti_data.m. Give the value of M found. Hint: You may find the matlab functions toeplitz and fliplr useful.
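One way to approach part (a) of the memory problem is to test successive candidate memories M: for each M, the equations for t = M + 1, . . . , T involve only known inputs, are linear in h, and are consistent (tiny least-squares residual) exactly when the candidate memory is large enough. The following is a NumPy sketch of that idea (the course files are Matlab; the function name `smallest_memory` and the synthetic data below are ours, not from the course data files):

```python
import numpy as np

def smallest_memory(u, y, tol=1e-8):
    # For each candidate M, the equations y_t = sum_{tau=1}^M h_tau u_{t-tau}
    # for t = M+1, ..., T use only known u's; they are linear in h.
    T = len(u)
    for M in range(1, T // 2):
        # Column for lag tau holds u_{t-tau} for t = M+1..T (0-indexed slices).
        A = np.column_stack([u[M - tau: T - tau] for tau in range(1, M + 1)])
        b = y[M:T]
        h, *_ = np.linalg.lstsq(A, b, rcond=None)
        if np.linalg.norm(A @ h - b) < tol * max(1.0, np.linalg.norm(b)):
            return M, h
    return None

# Synthetic check: data generated by a system with memory 3.
rng = np.random.default_rng(0)
h_true = np.array([1.0, -0.5, 0.25])
u = rng.standard_normal(200)
# y_t = sum_{tau>=1} h_tau u_{t-tau}; the kernel's leading 0 implements tau >= 1.
y = np.convolve(u, np.r_[0.0, h_true])[:200]
M, h = smallest_memory(u, y)
```

Only rows with t ≥ M + 1 are used, so the unknown inputs uτ, τ ≤ 0, never enter the test, matching the problem statement.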

2.26 Layered medium. In this problem we consider a generic model for (incoherent) transmission in a layered medium. The medium is modeled as a set of n layers, separated by n dividing interfaces, shown as shaded rectangles in the figure below.

[Figure: layers with right-traveling wave amplitudes x1, . . . , xn and left-traveling wave amplitudes y1, . . . , yn, and interfaces i = 1, 2, 3, . . . , n − 1, n.]

We let xi ∈ R denote the right-traveling wave amplitude in layer i, and we let yi ∈ R denote the left-traveling wave amplitude in layer i. The right- and left-traveling waves on each side of an interface are related by transmission and reflection. The right-traveling wave of amplitude xi contributes the amplitude ti xi to x_{i+1}, where ti ∈ [0, 1] is the transmission coefficient of the ith interface. It also contributes the amplitude ri xi to yi, where ri ∈ [0, 1] is the reflection coefficient of the ith interface. We will assume that the interfaces are symmetric, so the left-traveling wave with amplitude y_{i+1} contributes the wave amplitude ti y_{i+1} to yi (via transmission) and wave amplitude ri y_{i+1} to x_{i+1} (via reflection). Thus we have

x_{i+1} = ti xi + ri y_{i+1},  yi = ri xi + ti y_{i+1},  i = 1, . . . , n − 1.

We model the last interface as totally reflective, which means that yn = xn. The right-traveling wave in the first layer is called the incident wave, and the left-traveling wave in the first layer is called the reflected wave. The scattering coefficient for the medium is defined as the ratio S = y1/x1 (assuming x1 ≠ 0).

(a) Explain how to find the scattering coefficient S, given the transmission and reflection coefficients for the first n − 1 interfaces. Hint: You may find the matlab function diag(x,k) useful.

(b) Carry out your method for a medium with n = 20 layers, with ti = 0.96, ri = 0.02 for i = 1, . . . , n − 1. Plot the left- and right-traveling wave amplitudes xi, yi versus i, and report the value of S you find.

(c) Fault location. A fault in interface k results in a reversal: tk = 0.02, rk = 0.96. You measure the scattering coefficient S = S^fault with the fault (but you don't have access to the left- or right-traveling waves with the faulted interface). Explain how to find which interface is faulted. Carry out your method with S^fault = 0.70, with all other interfaces having their nominal values ti = 0.96, ri = 0.02. Be sure to give the value of k that is most consistent with the measurement. You may assume that the last (fully reflective) interface is not faulty.

2.27 Digital circuit gate sizing. A digital circuit consists of a set of n (logic) gates, interconnected by wires. Each gate has one or more inputs (typically between one and four), and one output, which is connected via the wires to other gate inputs and possibly to some external circuitry. When the output of gate i is connected to an input of gate j, we say that gate i drives gate j, or that gate j is in the fan-out of gate i. We describe the topology of the circuit by the fan-out list for each gate, which tells us which other gates the output of a gate connects to. We denote the fan-out list of gate i as FO(i) ⊆ {1, . . . , n}. We can have FO(i) = ∅, which means that the output of gate i does not connect to the inputs of any of the gates 1, . . . , n (presumably the output of gate i connects to some external circuitry). It's common to order the gates in such a way that each gate only drives gates with higher indices, i.e., we have FO(i) ⊆ {i + 1, . . . , n}. We'll assume that's the case here. (This means that the gate interconnections form a directed acyclic graph.)

To illustrate the notation, a simple digital circuit with n = 4 gates, each with 2 inputs, is shown below. For this circuit we have FO(1) = {3, 4}, FO(2) = {3}, FO(3) = ∅, FO(4) = ∅.

[Figure: a small circuit with gates 1, 2, 3, 4.]

The 3 input signals arriving from the left are called primary inputs, and the 3 output signals emerging from the right are called primary outputs of the circuit. (You don't need to know this, however, to solve this problem.)

Each gate has a (real) scale factor or size xi. These scale factors are the design variables in the gate sizing problem. They must satisfy 1 ≤ xi ≤ xmax, where xmax is a given maximum allowed gate scale factor (typically on the order of 100). The total area of the circuit has the form

A = Σ_{i=1}^n ai xi,

where ai are positive constants.

Each gate has an input capacitance Ci^in, which depends on the scale factor xi as

Ci^in = αi xi,

where αi are positive constants.

Each gate has a delay di, which is given by

di = βi + γi Ci^load / xi,

where βi and γi are positive constants, and Ci^load is the load capacitance of gate i. Note that the gate delay di is always larger than βi, which can be interpreted as the minimum possible delay of gate i, achieved only in the limit as the gate scale factor becomes large.

The load capacitance of gate i is given by

Ci^load = Ci^ext + Σ_{j∈FO(i)} Cj^in,

where Ci^ext is a positive constant that accounts for the capacitance of the interconnect wires and external circuitry.

We will follow a simple design method, which assigns an equal delay T to all gates in the circuit, i.e., di = T for i = 1, . . . , n, where T > 0 is given. For a given value of T, there may or may not exist a feasible design (i.e., a choice of the xi, with 1 ≤ xi ≤ xmax) that yields di = T for i = 1, . . . , n. We can assume, of course, that T > maxi βi, i.e., T is larger than the largest minimum delay of the gates.

Finally, we get to the problem.

(a) Explain how to find a design x⋆ ∈ R^n that minimizes T, subject to a given area constraint A ≤ Amax. You can assume the fan-out lists, and all constants in the problem description, are known; your job is to find the scale factors xi, with 1 ≤ xi ≤ xmax, that give di = T for i = 1, . . . , n, and A ≤ Amax, with T as small as possible. Be sure to explain how you determine if the design problem is feasible, i.e., whether or not there is an x, with 1 ≤ xi ≤ xmax and A ≤ Amax, that gives di = T for i = 1, . . . , n. Your method can involve any of the methods or concepts we have seen so far in the course. It can also involve a simple search procedure, e.g., trying (many) different values of T over a range.
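For a fixed T, the equal-delay conditions are linear in x: multiplying di = T through by xi gives (T − βi) xi − γi Σ_{j∈FO(i)} αj xj = γi Ci^ext, and the gate ordering makes this system triangular. A NumPy sketch of part (a), using a small instance whose constants are entirely invented for illustration (the real data live in gate_sizing_data.m):

```python
import numpy as np

# Hypothetical 4-gate instance (all numbers invented for illustration).
n = 4
FO = {0: [2, 3], 1: [2], 2: [], 3: []}     # fan-out lists, 0-indexed
alpha = np.array([1.0, 1.0, 1.0, 1.0])     # C_in_i = alpha_i * x_i
beta  = np.array([1.0, 1.2, 0.8, 1.1])     # minimum gate delays
gamma = np.array([1.0, 1.0, 1.0, 1.0])
Cext  = np.array([1.0, 1.0, 2.0, 2.0])
a = np.ones(n)                             # area coefficients
xmax, Amax = 100.0, 50.0

def design_for_T(T):
    # d_i = T  <=>  (T - beta_i) x_i - gamma_i * sum_{j in FO(i)} alpha_j x_j
    #            =  gamma_i * Cext_i, a linear (upper-triangular) system in x.
    M = np.diag(T - beta)
    for i in range(n):
        for j in FO[i]:
            M[i, j] -= gamma[i] * alpha[j]
    x = np.linalg.solve(M, gamma * Cext)
    feasible = np.all(x >= 1.0) and np.all(x <= xmax) and a @ x <= Amax
    return x, feasible

# Simple search over T, as the problem statement suggests.
best = None
for T in np.linspace(beta.max() + 1e-3, 10.0, 5000):
    x, ok = design_for_T(T)
    if ok:
        best = (T, x)
        break
T_star, x_star = best
```

The grid search returns the smallest sampled T whose unique equal-delay design satisfies the box and area constraints; a bisection refinement could sharpen it, under the assumption that feasibility is an interval in T.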

(b) Carry out your method on the particular circuit with data given in the file gate_sizing_data.m. The fan-out lists are given as an n × n matrix F, with i, j entry one if j ∈ FO(i), and zero otherwise. In other words, the ith row of F gives the fan-out of gate i: the jth entry in the ith row is 1 if gate j is in the fan-out of gate i, and 0 otherwise.

Comment. You do not need to know anything about digital circuits to solve this problem; everything you need to know is stated above.

2.28 Interpolation with rational functions. In this problem we consider a function f : R → R of the form

f(x) = (a0 + a1 x + · · · + am x^m) / (1 + b1 x + · · · + bm x^m),

where a0, . . . , am, and b1, . . . , bm are parameters, with either am ≠ 0 or bm ≠ 0. Such a function is called a rational function of degree m. We are given data points x1, . . . , xN ∈ R and y1, . . . , yN ∈ R, where yi = f(xi). The problem is to find a rational function of smallest degree that is consistent with this data. In other words, you are to find m, which should be as small as possible, and a0, . . . , am, b1, . . . , bm, which satisfy f(xi) = yi.

Explain how you will solve this problem, and then carry out your method on the problem data given in ri_data.m. (This contains two vectors, x and y, that give the values x1, . . . , xN, and y1, . . . , yN, respectively.) Give the value of m you find, and the coefficients a0, . . . , am, b1, . . . , bm. Please show us your verification that yi = f(xi) holds (possibly with some small numerical errors).
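The key observation is that multiplying the interpolation condition through by the denominator makes it linear in the coefficients: a0 + a1 xi + · · · + am xi^m − yi (b1 xi + · · · + bm xi^m) = yi. A NumPy sketch that increases m until this linear system becomes consistent, tried on a synthetic data set of our own (not the ri_data.m data):

```python
import numpy as np

def fit_rational(x, y, m):
    # Linearized conditions: a0 + a1 x_i + ... + am x_i^m
    #                        - y_i (b1 x_i + ... + bm x_i^m) = y_i.
    V = np.vander(x, m + 1, increasing=True)        # columns 1, x, ..., x^m
    A = np.hstack([V, -(y[:, None]) * V[:, 1:]])    # unknowns: a (m+1), b (m)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = np.linalg.norm(A @ coef - y)
    return coef[: m + 1], coef[m + 1:], resid

# Synthetic data from a degree-2 rational function (our own test case).
a_true = np.array([1.0, 2.0, 0.5])
b_true = np.array([0.3, 0.1])
x = np.linspace(-1, 1, 40)
y = (1 + 2*x + 0.5*x**2) / (1 + 0.3*x + 0.1*x**2)

for m in range(0, 6):
    a, b, r = fit_rational(x, y, m)
    if r < 1e-8 * np.linalg.norm(y):
        break
```

Note the caveat that a zero least-squares residual certifies consistency of the linearized equations; for the course data you would report the smallest such m and then verify yi = f(xi) directly.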

Lecture 4 – Orthonormal sets of vectors and QR factorization

4.1 Bessel's inequality. Suppose the columns of U ∈ R^{n×k} are orthonormal. Show that ‖U^T x‖ ≤ ‖x‖. When do we have ‖U^T x‖ = ‖x‖?

4.2 Orthogonal matrices.
(a) Show that if U and V are orthogonal, then so is UV.
(b) Show that if U is orthogonal, then so is U^{−1}.
(c) Suppose that U ∈ R^{2×2} is orthogonal. Show that U is either a rotation or a reflection. Make clear how you decide whether a given orthogonal U is a rotation or a reflection.

4.3 Projection matrices. A matrix P ∈ R^{n×n} is called a projection matrix if P = P^T and P^2 = P.
(a) Show that if P is a projection matrix then so is I − P.
(b) Suppose that the columns of U ∈ R^{n×k} are orthonormal. Show that UU^T is a projection matrix. (Later we will show that the converse is true: every projection matrix can be expressed as UU^T for some U with orthonormal columns.)
(c) Suppose A ∈ R^{n×k} is full rank, with k ≤ n. Show that A(A^T A)^{−1}A^T is a projection matrix.
(d) If S ⊆ R^n and x ∈ R^n, the point y in S closest to x is called the projection of x on S. Show that if P is a projection matrix, then y = Px is the projection of x on R(P). (Which is why such matrices are called projection matrices.)

4.4 Reflection through a hyperplane. Find the matrix R ∈ R^{n×n} such that reflection of x through the hyperplane {z | a^T z = 0} (with a ≠ 0) is given by Rx. (To reflect x through the hyperplane means the following: find the point z on the hyperplane closest to x; starting from x, go in the direction z − x through the hyperplane to a point on the opposite side, which has the same distance to z as x does.) Verify that the matrix R is orthogonal.
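For exercise 4.4, a natural candidate (stated here as a sketch, not as the required derivation) is R = I − 2aa^T/(a^T a): it subtracts the component of x along the normal a twice. A quick NumPy check of the required properties, with a random a of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)

# Candidate reflection: remove the component along the normal a twice.
R = np.eye(n) - 2 * np.outer(a, a) / (a @ a)

x = rng.standard_normal(n)
```

The checks below confirm R is orthogonal, flips the normal, and fixes every point of the hyperplane (in particular the projection z of x onto it).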

4.5 Sensor integrity monitor. A suite of m sensors yields measurement y ∈ R^m of some vector of parameters x ∈ R^n, where m > n. When the system is operating normally (which we hope is almost always the case) we have y = Ax. If the system or sensors fail, or become faulty, then we no longer have the relation y = Ax.

We can exploit the redundancy in our measurements to help us identify whether such a fault has occurred. We'll call a measurement y consistent if it has the form Ax for some x ∈ R^n. If the system is operating normally then our measurement will, of course, be consistent. If the system becomes faulty, we hope that the resulting measurement y will become inconsistent, i.e., not consistent. (If we are really unlucky, the system will fail in such a way that y is still consistent. Then we're out of luck.)

A matrix B ∈ R^{k×m} is called an integrity monitor if the following holds:
• By = 0 for any y which is consistent;
• By ≠ 0 for any y which is inconsistent.

Note that the first requirement says that every consistent y does not trip the alarm; the second requirement states that every inconsistent y does trip the alarm. If we find such a matrix B, we can quickly check whether y is consistent, and send an alarm if By ≠ 0.

Finally, the problem. Find an integrity monitor B for the matrix

A =
 1  2  1
 1 −1 −2
−2  1  3
 1 −1 −2
 1  1  0

Your B should have the smallest k (i.e., number of rows) as possible. As usual, you have to explain what you're doing, as well as giving us your explicit matrix B. You must also verify that the matrix you choose satisfies the requirements. Hints:
• You might find one or more of the matlab commands orth, null, or qr useful.
• Be very careful typing in the matrix A. It's not just a random matrix.
• When checking that your B works, don't expect to have By exactly zero for a consistent y; because of roundoff errors in computer arithmetic, it will be really, really small. That's OK.

4.6 Householder reflections. A Householder matrix is defined as Q = I − 2uu^T, where u ∈ R^n is normalized, i.e., u^T u = 1.
(a) Show that Q is orthogonal.
(b) Show that Qu = −u, and Qv = v for any v such that u^T v = 0. Thus, multiplication by Q gives reflection through the plane with normal vector u.
(c) Show that det Q = −1.
(d) Given a vector x ∈ R^n, find a unit-length vector u for which Qx lies on the line through e1. Hint: Try a u of the form u = v/‖v‖, with v = x + αe1 (find the appropriate α and show that such a u works . . . ). Compute such a u for x = (3, 2, 4, 1, 5). Apply the corresponding Householder reflection to x to find Qx.

Note: Multiplication by an orthogonal matrix has very good numerical properties, in the sense that it does not accumulate much roundoff error. For this reason, Householder reflections are used as building blocks for fast, numerically sound algorithms.

4.7 Finding a basis for the intersection of ranges.
(a) Suppose you are given two matrices, A ∈ R^{n×p} and B ∈ R^{n×q}. Explain how you can find a matrix C ∈ R^{n×r}, with independent columns, for which R(C) = R(A) ∩ R(B). This means that the columns of C are a basis for R(A) ∩ R(B).
(b) Carry out the method described in part (a) for the particular matrices A and B defined in intersect_range_data.m. Be sure to give us your matrix C, as well as the matlab (or other) code that generated it. Verify that R(C) ⊆ R(A) and R(C) ⊆ R(B), by showing that each column of C is in the range of A, and also in the range of B. Hints: You might find one or more of the matlab commands orth, null, or qr useful.

Please carefully separate your answers to part (a) (the general case) and part (b) (the specific case).
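One route to the intersection of ranges: any vector in R(A) ∩ R(B) can be written Ax = By, so the nullspace of the stacked matrix [A −B] parameterizes the intersection. A NumPy sketch (the helper name and the tiny test matrices are ours; the course data are in intersect_range_data.m):

```python
import numpy as np

def range_intersection_basis(A, B, tol=1e-10):
    # z = (x; y) in N([A -B])  <=>  A x = B y, a vector in R(A) ∩ R(B).
    p = A.shape[1]
    M = np.hstack([A, -B])
    U, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    ns = Vt[rank:].T                 # columns form a basis of N(M)
    if ns.shape[1] == 0:
        return np.zeros((A.shape[0], 0))
    C = A @ ns[:p]                   # spans R(A) ∩ R(B), maybe dependently
    Uc, sc, _ = np.linalg.svd(C)     # orthonormalize, dropping dependent cols
    r = int(np.sum(sc > tol))
    return Uc[:, :r]

# Tiny check: R(A) = span{e1, e2}, R(B) = span{e2, e3} in R^4.
A = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])
B = np.array([[0., 0.], [1., 0.], [0., 1.], [0., 0.]])
C = range_intersection_basis(A, B)
```

Here the intersection is span{e2}, so C should be a single (unit) column proportional to e2.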

4.8 Groups of equivalent statements. In the list below there are 11 statements about two square matrices A and B in R^{n×n}.

(a) R(B) ⊆ R(A).
(b) there exists a matrix Y ∈ R^{n×n} such that B = YA.
(c) AB = 0.
(d) BA = 0.
(e) rank([ A B ]) = rank(A).
(f) R(A) ⊥ N(B^T).
(g) rank([ A; B ]) = rank(A) (A stacked on top of B).
(h) R(A) ⊆ N(B).
(i) there exists a matrix Z ∈ R^{n×n} such that B = AZ.
(j) rank([ A B ]) = rank(B).
(k) N(A) ⊆ N(B).

Two statements are equivalent if each one implies the other. For example, the statement 'A is onto' is equivalent to 'N(A) = {0}' (when A is square, which we assume here), because every square matrix that is onto has zero nullspace, and vice versa. Two statements are not equivalent if there exist (real) square matrices A and B for which one holds, but the other does not. A group of statements is equivalent if any pair of statements in the group is equivalent.

Your job is to collect the 11 statements above into (the largest possible) groups of equivalent statements. Put your answer in the following specific form. List each group of equivalent statements on a line, in (alphabetic) order. Each new line should start with the first letter not listed above. For example, you might give your answer as

a, c, d, g, h
b, i
e
f, j, k

This means you believe that statements a, c, d, g, and h are equivalent; statements b and i are equivalent; and statements f, j, and k are equivalent. You also believe that the first group of statements is not equivalent to the second, or the third, and so on. We want just your answer; we do not need any justification.

4.9 Determinant of an orthogonal matrix. Suppose Q ∈ R^{n×n} is orthogonal. What can you say about its determinant?

4.10 Tellegen's theorem. An electrical circuit has n nodes and b branches, with topology described by a directed graph. (The direction of each edge is the reference flow direction in the branch: current flowing in this direction is considered positive, while current flow in the opposite direction is considered negative.) The directed graph is given by the incidence matrix A ∈ R^{n×b}, defined as

Aik = +1 if edge k leaves node i, −1 if edge k enters node i, 0 otherwise.

Each node in the circuit has a potential; each branch has a voltage and current. We let e ∈ R^n denote the vector of node potentials, v ∈ R^b the vector of branch voltages, and j ∈ R^b the vector of branch currents.

(a) Kirchhoff's current law (KCL) states that, for each node, the sum of the currents on branches entering the node equals the sum of the currents on branches leaving the node. Show that this can be expressed Aj = 0, i.e., j ∈ N(A).

(b) Kirchhoff's voltage law (KVL) states that the voltage across any branch is the difference between the potential at the node the branch leaves and the potential at the node the branch enters. Show that this can be expressed v = A^T e, i.e., v ∈ R(A^T).
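The incidence-matrix forms of KCL and KVL in parts (a) and (b) are easy to check numerically on a small example. A NumPy sketch with a hypothetical 3-node, 3-branch loop of our own (edges 1→2, 2→3, 1→3):

```python
import numpy as np

# A[i,k] = +1 if edge k leaves node i, -1 if it enters node i, else 0.
A = np.array([[ 1,  0,  1],
              [-1,  1,  0],
              [ 0, -1, -1]], dtype=float)

e = np.array([5.0, 3.0, 1.0])   # arbitrary node potentials
v = A.T @ e                      # KVL: branch voltage = potential difference

# A KCL-satisfying current: any j in N(A), found here from the SVD
# (a connected 3-node graph gives rank 2, so one nullspace direction).
U, s, Vt = np.linalg.svd(A)
j = Vt[-1]
```

Each branch voltage comes out as the potential drop along its edge, and the nullspace current circulates around the loop; the orthogonality v^T j = 0 that appears here is exactly what Tellegen's theorem asserts.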

(c) Tellegen's theorem. Tellegen's theorem states that for any circuit, we have v^T j = 0. Explain how this follows from parts (a) and (b) above. The product vk jk is the power entering (or dissipated by) branch k (when vk jk < 0, |vk jk| is the power supplied by branch k). We can interpret Tellegen's theorem as saying that the total power supplied by branches that supply power is equal to the total power dissipated by branches that dissipate power. In other words, Tellegen's theorem is a power conservation law.

4.11 Norm preserving implies orthonormal columns. In lecture we saw that if A ∈ R^{m×n} has orthonormal columns, i.e., A^T A = I, then for any vector x ∈ R^n we have ‖Ax‖ = ‖x‖. In other words, multiplication by such a matrix preserves norm. Show that the converse holds: if A ∈ R^{m×n} satisfies ‖Ax‖ = ‖x‖ for all x ∈ R^n, then A has orthonormal columns (and in particular, m ≥ n). Hint: Start with ‖Ax‖^2 = ‖x‖^2, and try x = ei, and also x = ei + ej, for all i ≠ j.

4.12 Solving linear equations via QR factorization. Consider the problem of solving the linear equations Ax = y, with A ∈ R^{n×n} nonsingular, and y given. We can use the Gram-Schmidt procedure to compute the QR factorization of A, and then express x as x = A^{−1}y = R^{−1}(Q^T y) = R^{−1}z, where z = Q^T y. In this exercise, you'll develop a method for computing x = R^{−1}z, i.e., solving Rx = z, when R is upper triangular and nonsingular (which means its diagonal entries are all nonzero). The trick is to first find xn; then find x_{n−1} (remembering that now you know xn); then find x_{n−2} (remembering that now you know xn and x_{n−1}); and so on. The algorithm you will discover is called back substitution, because you are substituting known or computed values of xi into the equations to compute the next xi (in reverse order). Be sure to explain why the algorithm you describe cannot fail.
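The back substitution described above can be sketched directly (a NumPy illustration of the idea, with a random test system of our own; numpy's built-in solvers would of course do the same job):

```python
import numpy as np

def back_substitution(R, z):
    # Solve R x = z for upper-triangular nonsingular R. The last equation
    # determines x[n-1]; each earlier equation then contains exactly one
    # new unknown. Nonzero diagonal entries mean no division can fail.
    n = len(z)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
y = rng.standard_normal(4)
Q, R = np.linalg.qr(A)
x = back_substitution(R, Q.T @ y)   # solves A x = y via the QR factorization
```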

Lecture 5 – Least-squares

5.1 Least-squares residuals. Suppose A is skinny and full rank. Let xls be the least-squares approximate solution of Ax = y, and let yls = Axls. Show that the residual vector r = y − yls satisfies

‖r‖^2 = ‖y‖^2 − ‖yls‖^2.

Also, give a brief geometric interpretation of this equality (just a couple of sentences, and maybe a conceptual drawing).

5.2 Complex linear algebra and least-squares. Most of the linear algebra you have seen is unchanged when the scalars, matrices, and vectors are complex, i.e., have complex entries. For example, we say a set of complex vectors {v1, . . . , vn} is dependent if there exist complex scalars α1, . . . , αn, not all zero, such that α1 v1 + · · · + αn vn = 0.

There are some slight differences when it comes to the inner product and other expressions that, in the real case, involve the transpose operator. For complex matrices (or vectors) we define the Hermitian conjugate as the complex conjugate of the transpose. We denote this as A∗, which is equal to (Ā)^T; i.e., the ij entry of the matrix A∗ is the complex conjugate of Aji. The Hermitian conjugate of a matrix is sometimes called its conjugate transpose (which is a nice, explanatory name). Note that for a real matrix or vector, the Hermitian conjugate is the same as the transpose.

We define the inner product of two complex vectors u, v ∈ C^n as ⟨u, v⟩ = u∗v, which, in general, is a complex number. The norm of a complex vector is defined as

‖u‖ = ⟨u, u⟩^{1/2} = ( |u1|^2 + · · · + |un|^2 )^{1/2}.

Note that these two expressions agree with the definitions you already know when the vectors are real.

The complex least-squares problem is to find the x ∈ C^n that minimizes ‖Ax − y‖^2, where A ∈ C^{m×n} and y ∈ C^m are given. Assuming A is full rank and skinny, the solution is xls = A†y, where A† is the (complex) pseudo-inverse of A, given by

A† = (A∗A)^{−1} A∗.

(Which also reduces to the pseudo-inverse you've already seen when A is real.)

There are two general approaches to dealing with complex linear algebra problems. In the first, you simply generalize all the results to work for complex matrices, vectors, and scalars. Another approach is to represent complex matrices and vectors using real matrices and vectors of twice the dimensions, and then apply what you already know about real linear algebra. We'll explore that idea in this problem.

We associate with a complex vector u ∈ C^n a real vector ũ ∈ R^{2n}, given by ũ = [ ℜu; ℑu ]. We associate with a complex matrix A ∈ C^{m×n} the real matrix Ã ∈ R^{2m×2n} given by

Ã = [ ℜA  −ℑA; ℑA  ℜA ].

(a) What is the relation between ⟨u, v⟩ and ⟨ũ, ṽ⟩? Note that the first inner product involves complex vectors and the second involves real vectors.
(b) What is the relation between ‖u‖ and ‖ũ‖?
(c) What is the relation between Au (complex matrix-vector multiplication) and Ãũ (real matrix-vector multiplication)?
(d) What is the relation between (Ã)^T and the real matrix associated with A∗?
(e) Using the results above, verify that A†y solves the complex least-squares problem of minimizing ‖Ax − y‖ (where A, x, y are complex). Express A†y in terms of the real and imaginary parts of A and y. (You don't need to simplify your expression; you can leave block matrices in it.)

Lecture 6 – Least-squares applications

6.1 AR system identification. In this problem you will use least-squares to develop and validate autoregressive (AR) models of a system from some input/output (I/O) records. You are given I/O records

u(1), . . . , u(N),  y(1), . . . , y(N),

which are the measured input and output of an unknown system. You will use least-squares to find approximate models of the form

y(t) = a0 u(t) + b1 y(t − 1) + · · · + bn y(t − n).

Specifically you will choose coefficients a0, b1, . . . , bn that minimize

Σ_{t=n+1}^N ( y(t) − a0 u(t) − b1 y(t − 1) − · · · − bn y(t − n) )^2,

where u, y are the given data record. The squareroot of this quantity is the residual norm (on the model data). Dividing by the squareroot of Σ_{t=n+1}^N y(t)^2 yields the relative error. You'll plot this as a function of n, for n = 1, . . . , 35.

To validate or evaluate your models, you can try them on validation data records

ũ(1), . . . , ũ(N),  ỹ(1), . . . , ỹ(N).

To find the predictive ability of an AR model with coefficients a0, b1, . . . , bn, you can form the signal

ŷ(t) = a0 ũ(t) + b1 ỹ(t − 1) + · · · + bn ỹ(t − n)

for t = n + 1, . . . , N, and compare it to the actual output signal, ỹ. You will plot the squareroot of the sum of squares of the difference, divided by the squareroot of the sum of squares of ỹ, for n = 1, . . . , 35. Compare this to the plot above. Briefly discuss the results.

The file IOdata.m contains the data for this problem and is available on the class web page. To develop the models for different values of n, you can use inefficient code that just loops over n; you do not have to use an efficient method based on one QR factorization. The toeplitz() command may be helpful.

6.2 The middle inverse. In this problem we consider the matrix equation

AXB = I,

where A ∈ R^{n×p}, B ∈ R^{q×n}, and X ∈ R^{p×q}. The matrices A and B are given, and the goal is to find a matrix X that satisfies the equation, or to determine that no such matrix exists. (When such an X exists, we call it a middle inverse of the pair A, B. It generalizes the notions of left-inverse and right-inverse: when A = I, X is a left-inverse of B, and when B = I, X is a right-inverse of A.)

You will solve a specific instance of this problem, with data (i.e., the matrices A and B) given in the mfile axb_data.m. If you think there is no X that satisfies AXB = I, explain why this is the case; your explanation should be as concrete as possible. If you succeed in finding an X that satisfies AXB = I, please give it. You must explain how you found it, and you must submit the code that you used to generate your solution. You must also submit the matlab code and output showing that you checked that AXB = I holds (up to very small numerical errors). You can do this by typing norm(A*X*B-eye(n)) in matlab, and submitting a printout of what matlab prints out. (We haven't yet studied the matrix norm, but it doesn't matter; like the norm of a vector, it measures size, and here you are using it only to check that AXB − I is small.)
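One way to search for a middle inverse is to notice that AXB = I is linear in the entries of X, so it can be vectorized with a Kronecker product and handed to a least-squares solver; a zero residual certifies success. A NumPy sketch with random matrices of our own (not the axb_data.m instance):

```python
import numpy as np

def middle_inverse(A, B, tol=1e-8):
    # vec(A X B) = kron(B^T, A) vec(X), with column-major vec (as in matlab).
    n = A.shape[0]
    K = np.kron(B.T, A)
    v, *_ = np.linalg.lstsq(K, np.eye(n).flatten(order="F"), rcond=None)
    X = v.reshape((A.shape[1], B.shape[0]), order="F")
    ok = np.linalg.norm(A @ X @ B - np.eye(n)) < tol
    return X, ok

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # n = 3, p = 5 (fat, onto)
B = rng.standard_normal((4, 3))   # q = 4 (tall, one-to-one)
X, ok = middle_inverse(A, B)
```

For this random instance a middle inverse exists (A is onto and B is one-to-one, so X can be built from a right inverse of A and a left inverse of B); when none exists, the least-squares residual stays bounded away from zero and `ok` comes back False.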

The following interpretation is not needed to solve the problem; we give it just to mention a concrete situation where a problem like this one might arise. One situation where this problem comes up is a nonstandard filtering or equalization problem. A vector signal x ∈ R^n is first processed by one channel, represented by B. At this point the signal is available for some filtering or processing by us, which is represented by the matrix X. After this processing, it is acted on by another channel, given by A. Our goal is to do the intermediate processing in such a way that it undoes the effect of the first and last channels.

6.3 Approximate inductance formula. The figure below shows a planar spiral inductor, implemented in CMOS, for use in RF circuits.

[Figure: a planar spiral inductor, with wire width w, inner diameter d, and outer diameter D.]

The inductor is characterized by four key parameters:
• n, the number of turns (which is a multiple of 1/4, but that needn't concern us)
• w, the width of the wire
• d, the inner diameter
• D, the outer diameter

The inductance L of such an inductor is a complicated function of the parameters n, w, d, and D. It can be found by solving Maxwell's equations, which takes considerable computer time, or by fabricating the inductor and measuring the inductance. In this problem you will develop a simple approximate inductance model of the form

L̂ = α n^β1 w^β2 d^β3 D^β4,

where α, β1, β2, β3, β4 ∈ R are constants that characterize the approximate model. (Since L is positive, we have α > 0, but the constants β1, . . . , β4 can be negative.) This simple approximate model, if accurate enough, can be used for design of planar spiral inductors.

The file inductor_data.m on the course web site contains data for 50 inductors. (The data is real, not that it would affect how you solve the problem . . . ) For inductor i, we give parameters ni, wi, di, and Di (all in µm), and also the inductance Li (in nH) obtained from measurements. (The data are organized as vectors of length 50. Thus, for example, w13 gives the wire width of inductor 13.)

Your task, i.e., the problem, is to find α, β1, . . . , β4 so that

L̂i = α ni^β1 wi^β2 di^β3 Di^β4 ≈ Li for i = 1, . . . , 50.

Your solution must include a clear description of how you found your parameters, as well as their actual numerical values. Note that we have not specified the criterion that you use to judge the approximate model (i.e., the fit between L̂i and Li); we leave that to your engineering judgment. But be sure to tell us what criterion you use. We define the percentage error between L̂i and Li as

ei = 100 |L̂i − Li| / Li.
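The hint about log L points at one natural method: taking logs turns the power-law model into a model that is linear in (log α, β1, . . . , β4), which is an ordinary least-squares fit. A NumPy sketch on synthetic "measurements" of our own invention (the real values are in inductor_data.m):

```python
import numpy as np

# Synthetic inductors: invented parameters plus a small multiplicative error.
rng = np.random.default_rng(0)
n = rng.uniform(1, 10, 50)
w = rng.uniform(1, 30, 50)
d = rng.uniform(10, 100, 50)
D = rng.uniform(100, 500, 50)
L = 2.0 * n**1.5 * w**-0.4 * d**0.2 * D**0.3 * np.exp(0.01 * rng.standard_normal(50))

# log L ~ log(alpha) + b1 log n + b2 log w + b3 log d + b4 log D: linear LS.
M = np.column_stack([np.ones(50), np.log(n), np.log(w), np.log(d), np.log(D)])
theta, *_ = np.linalg.lstsq(M, np.log(L), rcond=None)
alpha, beta = np.exp(theta[0]), theta[1:]

Lhat = alpha * n**beta[0] * w**beta[1] * d**beta[2] * D**beta[3]
avg_pct_err = 100 * np.mean(np.abs(Lhat - L) / L)
```

Note the hedge: this minimizes the sum of squared log errors, which is one reasonable criterion; the exercise deliberately leaves the choice of criterion to you.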

Find the average percentage error for your model, i.e., (e1 + · · · + e50)/50. (We are only asking you to find the average percentage error for your model; we do not require that your model minimize the average percentage error.) Hint: you might find it easier to work with log L.

6.4 Quadratic extrapolation of a time series. We are given a series z up to time t. We extrapolate, or predict, z(t + 1) based on a least-squares fit of a quadratic function to the previous ten elements of the series, z(t), z(t − 1), . . . , z(t − 9). We'll denote the predicted value of z(t + 1) by ẑ(t + 1). More precisely, to find ẑ(t + 1), we find the quadratic function f(τ) = a2 τ^2 + a1 τ + a0 for which

Σ_{τ=t−9}^{t} ( z(τ) − f(τ) )^2

is minimized. The extrapolated value is then given by ẑ(t + 1) = f(t + 1).

(a) Show that

ẑ(t + 1) = c [ z(t) z(t − 1) · · · z(t − 9) ]^T,

where c ∈ R^{1×10} does not depend on t. Find c explicitly.

(b) Use the following matlab code to generate a time series z:

t = 1:1000;
z = 5*sin(t/10 + 2) + 0.1*sin(t) + 0.1*sin(2*t - 5);

Use the quadratic extrapolation method from part (a) to find ẑls(t) for t = 11, . . . , 1000. Find the relative root-mean-square (RMS) error, which is given by

( (1/990) Σ_{j=11}^{1000} (ẑ(j) − z(j))^2 / (1/990) Σ_{j=11}^{1000} z(j)^2 )^{1/2}.

(c) In a previous problem you developed a similar predictor for the same time series z. In that case you obtained the quadratic extrapolator by interpolating the last three samples; now you are obtaining it as the least squares fit to the last ten samples. Compare the RMS error for these methods and plot z (the true values), ẑls (the estimated values using least-squares), and ẑint (the estimated values using interpolation), on the same plot. Restrict your plot to t = 1, . . . , 100.
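Because the quadratic least-squares fit is shift-invariant, the predictor coefficient vector c can be computed once from the pseudo-inverse of a small Vandermonde-style matrix. A NumPy sketch of parts (a) and (b) (our own implementation, with the matlab series reproduced in Python):

```python
import numpy as np

# Fit f(tau) = a2 tau^2 + a1 tau + a0 on relative times tau = -9..0
# (i.e., t shifted to 0), then evaluate at tau = +1.
tau = np.arange(-9, 1)
F = np.column_stack([tau**2, tau, np.ones(10)])
c_rev = np.array([1.0, 1.0, 1.0]) @ np.linalg.pinv(F)   # window z(t-9)..z(t)
c = c_rev[::-1]                                          # window z(t)..z(t-9)

t = np.arange(1, 1001)
z = 5*np.sin(t/10 + 2) + 0.1*np.sin(t) + 0.1*np.sin(2*t - 5)

# Predict z(11)..z(1000): window z[k-10:k] reversed is z(t), ..., z(t-9).
zhat = np.array([c @ z[k-10:k][::-1] for k in range(10, 1000)])
rel_rms = np.sqrt(np.mean((zhat - z[10:1000])**2) / np.mean(z[10:1000]**2))
```

A useful sanity check on c: the predictor must be exact on any series that is itself quadratic in time (so in particular on constant and linear series).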

In this example, x ∈ R^(n²) is a vector that describes the density across a rectangular array of pixels. We are interested in some physical property such as density (say) which varies over the region. To simplify things, we'll assume that the density is constant inside each pixel, and we denote by x_i the density in pixel i, i = 1, …, n². The pixels are indexed column first, by a single index i ranging from 1 to n², as shown above.

Each sensor measurement is a line integral of the density over a line L. In addition, each measurement is corrupted by a (small) noise term. In other words, the sensor measurement for line L is given by

Σ_{i=1}^{n²} l_i x_i + v,

where l_i is the length of the intersection of line L with pixel i (or zero if they don't intersect), and v is a (small) measurement noise. This is illustrated below for a problem with n = 3; in that example, we have l_1 = l_6 = l_8 = l_9 = 0.

[Figure: a 3 × 3 array of pixels x_1, …, x_9, crossed by a line L with intersection lengths l_2, l_3, l_4, l_5, l_7.]

Now suppose we have N line integral measurements, associated with lines L_1, …, L_N. The lines are characterized by the intersection lengths l_ij, i = 1, …, n², j = 1, …, N, where l_ij gives the length of the intersection of line L_j with pixel i. Thus, the whole set of measurements forms a vector y ∈ R^N whose elements are given by

y_j = Σ_{i=1}^{n²} l_ij x_i + v_j,   j = 1, …, N,

where v_j is a (small) measurement noise. From these measurements, we want to estimate the vector of densities x.

And now the problem: you will reconstruct the pixel densities x from the line integral measurements y. The class webpage contains the M-file tomo_data.m, which you should download and run in matlab. It creates the following variables:

• N, the number of measurements (N),
• n_pixels, the side length in pixels of the square region (n),
• y, a vector with the line integrals y_j, j = 1, …, N,
• lines_d, a vector containing the displacement d_j (distance from the center of the region in pixel lengths) of each line, j = 1, …, N, and
• lines_theta, a vector containing the angles θ_j of each line, j = 1, …, N.

The file tmeasure.m, on the same webpage, shows how the measurements were computed, in case you're curious. You should take a look, but you don't need to understand it to solve the problem. We also provide the function line_pixel_length.m on the webpage, which you do need to use in order to solve the problem. This function computes the pixel intersection lengths for a given line. That is, given d_j and θ_j (and the side length n), line_pixel_length.m returns an n × n matrix, whose i, j element corresponds to the intersection length for pixel i, j on the image.

Use this information to find x, and display it as an image (of n by n pixels). You'll know you have it right.

Matlab hints: Here are a few functions that you'll find useful to display an image:

• A=reshape(v,n,m), converts the vector v (which must have n*m elements) into an n × m matrix (the first column of A is the first n elements of v, etc.),
• imagesc(A), displays the matrix A as an image, scaled so that its lowest value is black and its highest value is white,
• colormap gray, changes matlab's image display mode to grayscale (you'll want to do this to view the pixel patch),
• axis image, redefines the axes of a plot to make the pixels square.

Note: While irrelevant to your solution, this is actually a simple version of tomography, best known for its application in medical imaging as the CAT scan. If an x-ray gets attenuated at rate x_i in pixel i (a little piece of a cross-section of your body), the j-th measurement is

z_j = Π_{i=1}^{n²} e^(−x_i l_ij),

with the l_ij as before. Now define y_j = −log z_j, and we get

y_j = Σ_{i=1}^{n²} x_i l_ij.

6.6 Least-squares model fitting. In this problem you will use least-squares to fit several different types of models to a given set of input/output data. The data consist of a scalar input sequence u, and a scalar output sequence y, for t = 1, …, N. You will develop several different models that relate the signals u and y.

• Memoryless models. In a memoryless model, the output at time t, i.e., y(t), depends only on the input at time t, i.e., u(t). Another common term for such a model is static.

  constant model: y(t) = c_0
  static linear: y(t) = c_1 u(t)
  static affine: y(t) = c_0 + c_1 u(t)
  static quadratic: y(t) = c_0 + c_1 u(t) + c_2 u(t)²

• Dynamic models. In a dynamic model, y(t) depends on u(s) for some s ≠ t. We consider some simple time-series models (see problem 2 in the reader), which are linear dynamic models:

  moving average (MA): y(t) = a_0 u(t) + a_1 u(t−1) + a_2 u(t−2)
  autoregressive (AR): y(t) = a_0 u(t) + b_1 y(t−1) + b_2 y(t−2)
  autoregressive moving average (ARMA): y(t) = a_0 u(t) + a_1 u(t−1) + b_1 y(t−1)
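Each of these models is linear in its parameters, so fitting any of them is an ordinary least-squares problem. The course works in matlab; the following is a hedged Python/NumPy sketch for the ARMA model, on synthetic data (not the uy_data.m set — the coefficients 0.5, −0.2, 0.3 below are made up for illustration):

```python
import numpy as np

# Synthetic input/output data generated by an ARMA(1,1)-type recursion,
# y(t) = a0 u(t) + a1 u(t-1) + b1 y(t-1) + small noise.
rng = np.random.default_rng(6)
N = 300
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.5 * u[t] - 0.2 * u[t - 1] + 0.3 * y[t - 1] + 0.01 * rng.standard_normal()

# Least-squares fit: each row of the regressor matrix is
# (u(t), u(t-1), y(t-1)), for t = 2, ..., N (0-indexed: t = 1, ..., N-1).
Phi = np.column_stack([u[1:], u[:-1], y[:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
a0, a1, b1 = theta
```

The memoryless models are fit the same way, with regressor rows (1), (u(t)), (1, u(t)), or (1, u(t), u(t)²) respectively.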

Note that in the AR and ARMA models, y(t) depends indirectly on all previous inputs, i.e., u(s) for s < t, due to the recursive dependence on y(t−1). For this reason, the AR and ARMA models are said to have infinite memory. The MA model, on the other hand, has a finite memory: y(t) depends only on the current and two previous inputs. (Another term for this MA model is 3-tap system, where taps refer to taps on a delay line.)

Each of these models is specified by its parameters, i.e., the scalars c_i, a_i, b_i. For each of these models, find the least-squares fit to the given data, i.e., find parameter values that minimize the sum-of-squares of the residuals. For example, for the ARMA model, pick a_0, a_1, and b_1 that minimize

Σ_{t=2}^{N} (y(t) − a_0 u(t) − a_1 u(t−1) − b_1 y(t−1))².

(Note that we start the sum at t = 2, which ensures that u(t−1) and y(t−1) are defined.) For each model, give the root-mean-square (RMS) residual, i.e., the squareroot of the mean of the optimal residual squared. Plot the output ŷ predicted by your model, and plot the residual (which is y − ŷ). The data for this problem are available from the class web page, in an m-file named uy_data.m, by following the link to matlab datasets. Copy this file to your working directory and type uy_data from within matlab. This will create the vectors u and y and the scalar N (the length of the vectors). Now you can plot u, y, etc.

Note: the dataset u, y is not generated by any of the models above. It is generated by a nonlinear recursion, which has infinite memory.

6.7 Least-squares deconvolution. A communications channel is modeled by a finite-impulse-response (FIR) filter:

y(t) = Σ_{τ=0}^{n−1} u(t−τ) h(τ),

where u : Z → R is the channel input sequence, y : Z → R is the channel output, and h(0), …, h(n−1) is the impulse response of the channel. In terms of discrete-time convolution we write this as y = h ∗ u. You will design a deconvolution filter or equalizer which also has FIR form:

z(t) = Σ_{τ=0}^{m−1} y(t−τ) g(τ),

where z : Z → R is the filter output, y is the channel output, and g(0), …, g(m−1) is the impulse response of the filter, which we are to design. This is shown in the block diagram below.

u → ∗h → y → ∗g → z

The goal is to choose g = (g(0), …, g(m−1)) so that the filter output is approximately the channel input delayed by D samples, i.e., z(t) ≈ u(t−D). Since z = g ∗ h ∗ u (discrete-time convolution), this means that we'd like

(g ∗ h)(t) ≈ 1 for t = D,  (g ∗ h)(t) ≈ 0 for t ≠ D.

We will refer to g ∗ h as the equalized impulse response; the goal is to make it as close as possible to a D-sample delay. Specifically, we want the least-squares equalizer g that minimizes the sum-of-squares error

Σ_{t≠D} (g ∗ h)(t)²,

subject to the constraint (g ∗ h)(D) = 1.
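One way to set this up (a Python/NumPy sketch with a made-up channel h, not the deconv_data.m one): write g ∗ h = Hg with H a convolution matrix, delete the row indexed by D to get A, keep it as c, and minimize ||Ag||² subject to cᵀg = 1. The Lagrange conditions give g proportional to (AᵀA)⁻¹c, assuming AᵀA is invertible:

```python
import numpy as np

# Toy channel impulse response (stand-in for the h in deconv_data.m).
h = np.array([1.0, 0.7, -0.3, 0.1])
n = len(h)
m = 20          # equalizer length, as in part (a)
D = 12          # desired delay

# Convolution matrix: H @ g equals g*h, a vector of length n + m - 1.
H = np.zeros((n + m - 1, m))
for j in range(m):
    H[j:j + n, j] = h

# Minimize sum over t != D of (g*h)(t)^2 subject to (g*h)(D) = 1:
# A is H with row D removed, c is row D of H; the solution is
# (A^T A)^{-1} c rescaled so that c^T g = 1 (assuming invertibility).
A = np.delete(H, D, axis=0)
c = H[D]
g_unscaled = np.linalg.solve(A.T @ A, c)
g = g_unscaled / (c @ g_unscaled)

eq = np.convolve(g, h)          # equalized impulse response g*h
```

By construction eq[D] equals 1, and the remaining entries of eq are the (small) residual the equalizer could not remove.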

To solve the problem below you'll need to get the file deconv_data.m from the class web page in the matlab files section. It will define the channel impulse response h as a matlab vector h. (Indices in matlab run from 1 to n, while the argument of the channel impulse response runs from t = 0 to t = n−1, so h(3) in matlab corresponds to h(2).)

(a) Find the least-squares equalizer g, of length m = 20, with delay D = 12. Plot the impulse responses of the channel (h) and the equalizer (g). Plot the equalized impulse response (g ∗ h).

(b) The vector y (also defined in deconv_data.m) contains the channel output corresponding to a signal u passed through the channel (i.e., y = h ∗ u). The signal u is binary, i.e., u(t) ∈ {−1, 1}, and starts at t = 0 (i.e., u(t) = 0 for t < 0). Pass y through the least-squares equalizer found in part a, to form the signal z. Give a histogram plot of the amplitude distribution of both y and z. (You can remove the first and last D samples of z before making the histogram plot.) Comment on what you find.

Matlab hints: The command conv convolves two vectors; the command hist plots a histogram (of the amplitude distribution).

6.8 Estimation with sensor offset and drift. We consider the usual estimation setup:

y_i = a_i^T x + v_i,   i = 1, …, m,

where

• y_i is the ith (scalar) measurement,
• x ∈ R^n is the vector of parameters we wish to estimate from the measurements,
• v_i is the sensor or measurement error of the ith measurement.

In this problem we assume the measurements y_i are taken at times evenly spaced, T seconds apart, starting at time t = T. Thus, y_i, the ith measurement, is taken at time t = iT. (This isn't really material; it just makes the interpretation simpler.) Usually we assume (often implicitly) that the measurement errors v_i are random, small, unpredictable, and centered around zero. In such cases, least-squares estimation of x works well. In some cases, however, the measurement error includes some predictable terms. For example, each sensor measurement might include a (common) offset or bias, as well as a term that grows linearly with time (called a drift). We model this situation as

v_i = α + βiT + w_i,

where α is the sensor bias (which is unknown but the same for all sensor measurements), β is the drift term (again the same for all measurements), and w_i is the part of the sensor error that is unpredictable, small, and centered around 0. If we knew the offset α and the drift term β we could just subtract the predictable part of the sensor signal, i.e., α + βiT, from the sensor signal. But we're interested in the case where we don't know the offset α or the drift coefficient β.

Show how to use least-squares to simultaneously estimate the parameter vector x ∈ R^n, the offset α ∈ R, and the drift coefficient β ∈ R. Clearly explain your method. If your method always works, say so. Otherwise describe the conditions (on the matrix A) that must hold for your method to work, and give a simple example where the conditions don't hold. You can assume that m ≥ n and the measurement matrix

A = [a_1^T; a_2^T; …; a_m^T]

is full rank (i.e., has rank n).
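One natural approach (sketched below in Python/NumPy, on made-up data — the dimensions, noise level, and true α, β are all assumptions for illustration) is to augment A with a column of ones and a column of the measurement times, and estimate (x, α, β) jointly by least-squares:

```python
import numpy as np

# Made-up instance of the offset-and-drift model
#   y_i = a_i^T x + alpha + beta*(i*T) + w_i.
rng = np.random.default_rng(1)
m, n, T = 100, 3, 0.5
A = rng.standard_normal((m, n))
t = T * np.arange(1, m + 1)                  # measurement times iT
x_true, alpha, beta = rng.standard_normal(n), 2.0, -0.1
y = A @ x_true + alpha + beta * t + 1e-3 * rng.standard_normal(m)

# Augmented least-squares: stack [A, 1, t] and solve for (x, alpha, beta).
# This works when the augmented matrix has full rank n + 2.
A_aug = np.column_stack([A, np.ones(m), t])
z, *_ = np.linalg.lstsq(A_aug, y, rcond=None)
x_hat, alpha_hat, beta_hat = z[:n], z[n], z[n + 1]
```

The rank condition on the augmented matrix is exactly the kind of condition the problem asks you to identify: the method fails when the columns of ones and times lie in (or nearly in) the range of A.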

6.9 Estimating emissions from spot measurements. There are n sources of a pollutant, at known locations s_1, …, s_n ∈ R². Each source emits the pollutant at some emission rate; we let x_j denote the emission rate for source j. (These are positive, but to simplify things we won't concern ourselves with that.) The emission rates are to be determined, or estimated. We measure the total pollutant level at m spots, located at t_1, …, t_m ∈ R², which are known. The contribution from source j to measurement i is given by αx_j / ||s_j − t_i||², where α is a known (positive) constant. In other words, the pollutant concentration from a source follows an inverse square law, and is proportional to the emission rate. The total pollutant measured at spot i is the sum of the contributions from the n sources, possibly corrupted by noise, so we have y = Ax + v, where v is a noise. We assume that measurement spots do not coincide with the source locations, i.e., we do not have s_j = t_i for any i or j. We also assume that none of the spot locations is repeated (i.e., we have t_i ≠ t_j for i ≠ j) and that none of the source locations is repeated (i.e., we have s_i ≠ s_j for i ≠ j).

(a) Give a specific example of source and spot measurement locations, with 4 sensors and 3 sources, for which it is impossible to find the emission rates given the spot measurements. In this part, we ignore the issue of noise or sensor errors; we assume the spot measurements are exactly as described above. To show that your configuration is a valid example, give two specific different sets of emission rates that yield identical spot measurements. You are free to (briefly) explain your example using concepts such as range, rank, nullspace, and so on; but remember, we want a specific numerical example, such as s_1 = [0 1]^T, …, s_3 = [1 2]^T, t_1 = [1 1]^T, …, t_4 = [3 2]^T. (And similarly for the two emission rates that give the same spot measurements.)

(b) Get the data from the file emissions_data.m that is available on the class web site. This file defines three source locations (given as a 2 × 3 matrix; the columns give the locations), and ten spot measurement locations (given as a 2 × 10 matrix). It also gives two sets of spot measurements: one for part (b), and one for part (c). The spot measurements are not perfect (as we assumed in part (a)); they contain small noise and errors. Estimate the pollutant emission rates. Explain your method, and give your estimate for the emission rates of the three sources.

(c) Now we suppose that one of the spot measurements is faulty, i.e., its associated noise or error is far larger than the errors of the other spot measurements. Explain how you would identify or guess which one is malfunctioning, and then estimate the source emission rates. Carry out your method on the data given in the matlab file. Be sure to tell us which spot measurement you believe to be faulty, and what your guess of the emission rates is. (The emission rates are not the same as in part (b), but the source and spot measurement locations are. Be careful to use the right set of measurements for each problem!)

6.10 Identifying a system from input/output data. We consider the standard setup y = Ax + v, where A ∈ R^(m×n), x ∈ R^n is the input vector, y ∈ R^m is the output vector, and v ∈ R^m is the noise or disturbance. We consider here the problem of estimating the matrix A, given some input/output data. Specifically, we are given the following:

x^(1), …, x^(N) ∈ R^n,   y^(1), …, y^(N) ∈ R^m.

These represent N samples or observations of the input and output, respectively, possibly corrupted by noise. In other words, we have y^(k) = Ax^(k) + v^(k), k = 1, …, N, where v^(k) are assumed to be small. The problem is to estimate the (coefficients of the) matrix A, based on the given input/output data. You will use a least-squares criterion to form an estimate Â of the matrix; specifically, you will choose as your estimate Â the matrix that minimizes the quantity

J = Σ_{k=1}^{N} ||Ax^(k) − y^(k)||²

over A.
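Minimizing J over A is itself a least-squares problem (one per row of A). A hedged Python/NumPy sketch, on synthetic data with made-up dimensions — the closed form used here assumes the input data matrix X has full rank n:

```python
import numpy as np

# Synthetic input/output data: y(k) = A_true x(k) + small noise.
rng = np.random.default_rng(2)
n, m, N = 3, 2, 40
A_true = rng.standard_normal((m, n))
X = rng.standard_normal((n, N))        # columns are x(1), ..., x(N)
Y = A_true @ X + 1e-4 * rng.standard_normal((m, N))

# Each row a_i of A minimizes ||X^T a_i - (row i of Y)^T||^2, so the
# stacked least-squares estimate is A_hat = Y X^T (X X^T)^{-1},
# valid when X X^T is invertible (rank(X) = n).
A_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
```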

(a) Explain how to do this. You may want to use the matrices X ∈ R^(n×N) and Y ∈ R^(m×N) given by

X = [x^(1) ··· x^(N)],   Y = [y^(1) ··· y^(N)]

in your solution. If you need to make an assumption about the input/output data to make your method work, state it clearly. You can assume that all of the matrices are full rank.

(b) On the course web site you will find some input/output data for an instance of this problem in the mfile sysid_data.m. Executing this mfile will assign values to m, n, and N, and create two matrices that contain the input and output data, respectively. The n × N matrix variable X contains the input data x^(1), …, x^(N) (i.e., the first column of X contains x^(1), etc.). Similarly, the m × N matrix Y contains the output data y^(1), …, y^(N). You must give your final estimate Â, your source code, and also give an explanation of what you did.

6.11 Robust least-squares estimation methods. We consider a standard measurement setup, with y = Ax + v, where x ∈ R^n is a vector we'd like to estimate, y ∈ R^m is the vector of measurements, v ∈ R^m is the vector of measurement errors, and A ∈ R^(m×n). We assume that m > n, i.e., there are more measurements than parameters to be estimated. The measurement error v is not known, but is assumed to be small. The goal is to estimate x, given y.

Here is the twist: we do not know the matrix A exactly. In fact we calibrated our sensor system on k > 1 different days, and found the values A^(1), …, A^(k) for the matrix A, on the different days. These matrices are close to each other, but not exactly the same. There is no pattern to the (small) variations between the matrices; for example, there is no discernable drift; the variations appear to be small and random. Now suppose we have a measurement y taken on a day when we did not calibrate the sensor system. We want to form an estimate x̂, based on this measurement. We don't know A exactly, but we can assume that it is close to the known values A^(1), …, A^(k) found during calibration. A method for guessing x in this situation is called a robust estimation method, since it attempts to take into account the uncertainty in the matrix A. Three very reasonable proposals for robust estimation are described below.

• The average then estimate method. First, we form the average of the calibration values,

A_avg = (1/k) Σ_{j=1}^{k} A^(j),

which is supposed to represent the most typical value of A. We then choose our estimate x̂ to minimize the least-squares residual using A_avg, i.e., to minimize ||A_avg x̂ − y||. (You can assume that A_avg is full rank.) We refer to this value of x̂ as x̂_ae, where the subscript stands for 'average (then) estimate'.

• The estimate then average method. First, we find the least-squares estimate of x for each of the calibration values, i.e., we find x̂^(j) that minimizes ||A^(j) x − y|| over x, for j = 1, …, k. Since the matrices A^(j) are close but not equal, we find that the estimates x̂^(j) are also close but not equal. We find our final estimate of x as the average of these estimates:

x̂_ea = (1/k) Σ_{j=1}^{k} x̂^(j).

(Here the subscript 'ea' stands for 'estimate (then) average'.)

• Minimum RMS residuals method. If we make the guess x̂, then the residual, using the jth calibrated value of A, is given by

r^(j) = A^(j) x̂ − y.

The RMS value of the collection of residuals is given by

( (1/k) Σ_{j=1}^{k} ||r^(j)||² )^(1/2).

In the minimum RMS residual method, we choose x̂ to minimize this quantity. We denote this estimate of x as x̂_rms.

Here is the problem:

(a) For each of these three methods, say whether the estimate x̂ is a linear function of y. If it is a linear function, give a formula for the matrix that gives x̂ in terms of y. For example, if you believe that x̂_ea is a linear function of y, then you should give a formula for B_ea (in terms of A^(1), …, A^(k)), where x̂_ea = B_ea y.

(b) Are the three methods described above different? If any two are the same (for all possible values of the data A^(1), …, A^(k) and y), explain why. If they are different, give a specific example in which the estimates differ.

6.12 Estimating a signal with interference. This problem concerns three proposed methods for estimating a signal, based on a measurement that is corrupted by a small noise and also by an interference, that need not be small. We have

y = Ax + Bv + w,

where A ∈ R^(m×n) and B ∈ R^(m×p) are known. Here y ∈ R^m is the measurement (which is known), x ∈ R^n is the signal that we want to estimate, v ∈ R^p is the interference, and w is a noise. The noise is unknown, and can be assumed to be small. The interference is unknown, but cannot be assumed to be small. You can assume that the matrices A and B are skinny and full rank (i.e., m > n, m > p), and that the ranges of A and B intersect only at 0. (If this last condition does not hold, then there is no hope of finding x, even when w = 0, since a nonzero interference can masquerade as a signal.)

Each of the EE263 TAs proposes a method for estimating x. These methods, along with some informal justification from their proposers, are given below.

Nikola proposes the ignore and estimate method. He describes it as follows: We don't know the interference, so we might as well treat it as noise, and just ignore it during the estimation process. We can use the usual least-squares method, for the model y = Ax + z (with z a noise), to estimate x. (Here z = Bv + w, but that doesn't matter.)

Almir proposes the estimate and ignore method. He describes it as follows: We should simultaneously estimate both the signal x and the interference v, using a standard least-squares method to estimate [x^T v^T]^T given y. Once we've estimated x and v, we simply ignore our estimate of v, and use our estimate of x.

Miki proposes the estimate and cancel method. He describes it as follows: Almir's method makes sense to me, but I can improve it. We should simultaneously estimate both the signal x and the interference v, exactly as in Almir's method. In Almir's method, we then throw away v̂, our estimate of the interference, but I think we should use it. We can form the "pseudo-measurement" ỹ = y − Bv̂, which is our measurement, with the effect of the estimated interference subtracted off, or cancelled. Then, we use standard least-squares to estimate x from ỹ, from the simple model ỹ = Ax + z. (This is exactly as in Nikola's method, but here we have subtracted off or cancelled the effect of the estimated interference.)
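The three proposals are straightforward to try out numerically. A Python/NumPy sketch on made-up data (the dimensions and random instance below are assumptions; this is a way to experiment, not a substitute for the formulas the problem asks for):

```python
import numpy as np

# Made-up instance of y = A x + B v + w with small noise w.
rng = np.random.default_rng(3)
m, n, p = 30, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
x_true = rng.standard_normal(n)
v_true = rng.standard_normal(p)
y = A @ x_true + B @ v_true + 1e-4 * rng.standard_normal(m)

# Nikola (ignore and estimate): ordinary least-squares on y = Ax + z.
x_nik, *_ = np.linalg.lstsq(A, y, rcond=None)

# Almir (estimate and ignore): joint least-squares for (x, v), keep x.
z, *_ = np.linalg.lstsq(np.hstack([A, B]), y, rcond=None)
x_alm, v_hat = z[:n], z[n:]

# Miki (estimate and cancel): cancel B v_hat, then re-estimate x.
y_tilde = y - B @ v_hat
x_mik, *_ = np.linalg.lstsq(A, y_tilde, rcond=None)
```

Comparing the three estimates on instances like this one is a useful sanity check for your answers to parts (b) and (c) below.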

These descriptions are a little vague; part of the problem is to translate their descriptions into more precise algorithms.

(a) Give an explicit formula for each of the three estimates. (That is, for each method give a formula for the estimate x̂ in terms of A, B, y, and the dimensions n, m, p.)

(b) Are the methods really different? Identify any pairs of the methods that coincide (i.e., always give exactly the same results). If they are all three the same, or all three different, say so. Justify your answer. To show two methods are the same, show that the formulas given in part (a) are equal (even if they don't appear to be at first). To show two methods are different, give a specific numerical example in which the estimates differ.

(c) Which method or methods do you think work best? Give a very brief explanation. (If your answer to part (b) is "The methods are all the same" then you can simply repeat here, "The methods are all the same".)

6.13 Vector time-series modeling. This problem concerns a vector time-series, y(1), …, y(T) ∈ R^n. The n components of y(t) might represent measurements of different quantities, or prices of different assets, at time period t. Our goal is to develop a model that allows us to predict the next element in the time series, i.e., to predict y(t+1), given y(1), …, y(t).

A consultant proposes the following model for the time-series:

y(t) = Ay(t−1) + v(t),   t = 2, …, T,

where the matrix A ∈ R^(n×n) is the parameter of the model, and v(t) ∈ R^n is a signal that is small and unpredictable. (We keep the definition of the terms 'small' and 'unpredictable' vague, since the exact meaning won't matter.) This type of model has several names. It is called an VAR(1) model, which is short for vector auto-regressive, with one time lag. It is also called a Gauss-Markov model, which is a fancy name for a linear system driven by a noise.

Once we have a model of this form, we can predict the next time-series sample using the formula

ŷ(t+1) = Ay(t),   t = 1, …, T.

The idea here is that v(t) is unpredictable, so we simply replace it with zero when we estimate the next time-series sample. The prediction error is given by

e(t) = ŷ(t) − y(t),   t = 2, …, T.

The prediction error depends on the time-series data, and also A, the parameter in our model.

There is one more twist. It is known that y_1(t+1), the first component of the next time-series sample, does not depend on y_2(t), …, y_n(t). The second component, y_2(t+1), does not depend on y_3(t), …, y_n(t). In general, the ith component, y_i(t+1), does not depend on y_{i+1}(t), …, y_n(t). Roughly speaking, this means that the current time-series component y_i(t) only affects the next time-series components y_1(t+1), …, y_i(t+1). This means that the matrix A is lower triangular, i.e., A_ij = 0 for i < j.

To find the parameter A that defines our model, you will use a least-squares criterion. You will pick A that minimizes the mean-square prediction error,

(1/(T−1)) Σ_{t=2}^{T} ||e(t)||²,

over all lower-triangular matrices. Carry out this method, using the data found in the vts_data.m, which contains an n × T matrix Y, whose columns are the vector time-series samples at t = 1, …, T. Explain your method, and submit the code that you use to solve the problem. Give your final estimated model parameter A, and the resulting mean-square error. Compare your mean-square prediction error to the mean-square value of y,

(1/T) Σ_{t=1}^{T} ||y(t)||².

Finally, predict what you think y(T+1) is, based on the data given.
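Because row i of a lower-triangular A involves only the first i components of y(t−1), the constrained fit decomposes into n small least-squares problems, one per row. A Python/NumPy sketch on synthetic data (the stable lower-triangular A_true and noise level below are made up; the course data lives in vts_data.m):

```python
import numpy as np

# Synthetic VAR(1) data y(t) = A_true y(t-1) + v(t), A_true lower triangular.
rng = np.random.default_rng(4)
n, T = 3, 500
A_true = np.tril(rng.uniform(-0.5, 0.5, (n, n)))
Y = np.zeros((n, T))
for t in range(1, T):
    Y[:, t] = A_true @ Y[:, t - 1] + 0.1 * rng.standard_normal(n)

# Row-by-row least-squares: row i of A uses only components 1..i of y(t-1).
P, Q = Y[:, :-1], Y[:, 1:]            # predictors y(t-1), targets y(t)
A_hat = np.zeros((n, n))
for i in range(n):
    coef, *_ = np.linalg.lstsq(P[: i + 1].T, Q[i], rcond=None)
    A_hat[i, : i + 1] = coef

mse_hat = np.mean(np.sum((Q - A_hat @ P) ** 2, axis=0))   # mean-square error
```

Since A_true is itself lower triangular, it is feasible for every row's problem, so the fitted model's training error can never exceed that of the true parameter.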

6.14 Fitting a rational transfer function to frequency response data. This problem concerns a rational function H : C → C of the form

H(s) = A(s)/B(s),

where A and B are the polynomials

A(s) = a_0 + a_1 s + ··· + a_m s^m,   B(s) = 1 + b_1 s + ··· + b_m s^m.

Here a_0, …, a_m ∈ R and b_1, …, b_m ∈ R are real parameters, and s ∈ C is the complex independent variable. We define a = (a_0, …, a_m) ∈ R^(m+1) and b = (b_1, …, b_m) ∈ R^m; a and b are vectors containing the coefficients of A and B (not including the constant coefficient of B, which is fixed at one). Interpretation: you can think of H as a rational transfer function, with s the complex frequency variable. (Not needed to solve the problem.)

Now we get to the problem. We are given noisy measurements of H at some points on the imaginary axis, i.e., some data

s_1 = jω_1, …, s_N = jω_N ∈ C,   h_1, …, h_N ∈ C.

Here ω_1, …, ω_N are real and nonnegative, and h_1, …, h_N are complex. The data is a set of frequency response measurements, with some measurement errors. The goal is to find a rational transfer function that fits the measured frequency response, i.e., to choose a and b so that we have H(s_i) ≈ h_i, i = 1, …, N. Please bear in mind that a_0, …, a_m and b_1, …, b_m are real, whereas many other variables and data in this problem are complex.

To judge the quality of fit we use the mean-square error,

J = (1/N) Σ_{i=1}^{N} |H(s_i) − h_i|².

This problem explores a famous heuristic method, based on solving a sequence of (linear) least-squares problems, for finding coefficients a, b that approximately minimize J. We start by expressing J in the following (strange) way:

J = (1/N) Σ_{i=1}^{N} |(A(s_i) − h_i B(s_i))/z_i|²,   z_i = B(s_i).

The method works by choosing a and b that minimize the lefthand expression (with z_i fixed), then updating the numbers z_i using the righthand formula, and then repeating. More precisely, let k denote the iteration number, with a^(k), b^(k), and z_i^(k) denoting the values of these parameters at iteration k, and A^(k), B^(k) denoting the associated polynomials. To update these parameters from iteration k to iteration k+1, we proceed as follows. First, we set z_i^(k+1) = B^(k)(s_i), i = 1, …, N. Then we choose a^(k+1) and b^(k+1) that minimize

(1/N) Σ_{i=1}^{N} |(A^(k+1)(s_i) − h_i B^(k+1)(s_i))/z_i^(k+1)|².

(This can be done using ordinary linear least-squares.) We can start the iteration with z_i^(1) = 1, i = 1, …, N (which is what would happen if we set B^(0)(s) = 1). The iteration is stopped when (or more accurately, if) successive iterates are very close, i.e., we have a^(k+1) ≈ a^(k) and b^(k+1) ≈ b^(k).

Several pathologies can arise in this algorithm. For example, we can end up with z_i^(k) = 0, or a certain matrix can be less than full rank, which complicates solving the least-squares problem to find a^(k) and b^(k). You can ignore these pathologies, however.

(a) Explain how to obtain a^(k+1) and b^(k+1), given z^(k+1). You must explain the math; you may not refer to any matlab notation or operators (and especially, backslash) in your explanation.
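The inner least-squares step is linear in (a, b) once the z_i are fixed: stacking the real and imaginary parts of (A(s_i) − h_i B(s_i))/z_i gives a real least-squares problem. A Python/NumPy sketch of the whole iteration, on noise-free toy data sampled from a made-up rational H (so the iteration should recover the true coefficients; this is an illustration of the mechanics, not the rat_data.m instance):

```python
import numpy as np

m = 2
a_true = np.array([1.0, 0.5, 0.2])          # a_0, ..., a_m
b_true = np.array([0.3, 0.1])               # b_1, ..., b_m
omega = np.linspace(0.1, 5.0, 30)
s = 1j * omega
# Evaluate H = A/B; polyval wants highest-degree coefficients first.
h = np.polyval(a_true[::-1], s) / (1 + np.polyval(np.r_[0, b_true][::-1], s))

a, b = np.zeros(m + 1), np.zeros(m)
z = np.ones(len(s), dtype=complex)          # z_i = 1 for the first iteration
for _ in range(10):
    # residual (A(s_i) - h_i B(s_i))/z_i is linear in (a, b):
    # columns for a_k are s^k/z, columns for b_k are -h s^k/z,
    # and the constant term of B contributes the right-hand side h/z.
    Ca = np.column_stack([s ** k / z for k in range(m + 1)])
    Cb = np.column_stack([-h * s ** k / z for k in range(1, m + 1)])
    M = np.hstack([Ca, Cb])
    rhs = h / z
    Mr = np.vstack([M.real, M.imag])        # stack real and imaginary parts
    rr = np.concatenate([rhs.real, rhs.imag])
    x, *_ = np.linalg.lstsq(Mr, rr, rcond=None)
    a, b = x[: m + 1], x[m + 1:]
    z = 1 + np.polyval(np.r_[0, b][::-1], s)   # z_i = B(s_i) for next pass
```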

(b) Implement the method, and apply it to the data given in the file rat_data.m. This file contains the data ω_1, …, ω_N, h_1, …, h_N, as well as m and N. Give the final coefficient vectors a, b, and the associated final value of J. Terminate the algorithm when

||(a^(k+1) − a^(k), b^(k+1) − b^(k))|| ≤ 10^(−6).

Plot J versus iteration k, with J on a logarithmic scale and k on a linear scale, using the command semilogy. Plot |H(jω)| on a logarithmic scale versus ω on a linear scale (using semilogy), for the first iteration, the last iteration, and the problem data.

To evaluate a polynomial in matlab, you can either write your own (short) code, or use the matlab command polyval. This is a bit tricky, since polyval expects the polynomial coefficients to be listed in the reverse order than we use here. To evaluate A(s) in matlab you can use the command polyval(a(m+1:-1:1),s). To evaluate B(s) you can use polyval([b(m:-1:1);1],s).

Note: no credit will be given for implementing any algorithm other than the one described in this problem.

6.15 Quadratic placement. We consider an integrated circuit (IC) that contains N cells or modules that are connected by K wires. We model a cell as a single point in R² (which gives its location on the IC) and ignore the requirement that the cells must not overlap. The positions of the cells are (x_1, y_1), (x_2, y_2), …, (x_N, y_N), i.e., x_i gives the x-coordinate of cell i, and y_i gives the y-coordinate of cell i. We have two types of cells: fixed cells, whose positions are fixed and given, and free cells, whose positions are to be determined. We will take the first n cells, at positions (x_1, y_1), …, (x_n, y_n), to be the free ones, and the remaining N − n cells, at positions (x_{n+1}, y_{n+1}), …, (x_N, y_N), to be the fixed ones. (The fixed cells correspond to cells that are already placed, or external pins on the IC.) The task of finding good positions for the free cells is called placement.

There are K wires that connect pairs of the cells; wire k goes from cell I(k) to cell J(k). Here I and J are functions that map wire number (i.e., k) into the origination cell number (i.e., I(k)), and the destination cell number (i.e., J(k)). (We will assign an orientation to each wire, even though wires are physically symmetric.)

The goal in placing the free cells is to use the smallest amount of interconnect wire, assuming that the wires are run as straight lines between the cells. (In fact, the wires in an IC are not run on straight lines directly between the cells, but that's another story. Pretending that the wires do run on straight lines seems to give good placements.) One common method, called quadratic placement, is to place the free cells in order to minimize the total square wire length, given by

J = Σ_{k=1}^{K} ((x_{I(k)} − x_{J(k)})² + (y_{I(k)} − y_{J(k)})²).

To describe the wire/cell topology and the functions I and J, we'll use the node incidence matrix A for the associated directed graph. The node incidence matrix A ∈ R^(K×N) is defined as

A_kj = 1 if wire k goes to cell j, i.e., j = I(k);
A_kj = −1 if wire k goes from cell j, i.e., j = J(k);
A_kj = 0 otherwise.

Note that the kth row of A is associated with the kth wire, and the jth column of A is associated with the jth cell.
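With the incidence matrix, the total square wire length is ||Ax||² + ||Ay||², and splitting A into its free and fixed columns turns the placement into a pair of least-squares problems. A Python/NumPy sketch on a tiny made-up topology (2 free cells, 2 fixed cells, 4 wires — all data here is hypothetical, not the qplace_data.m instance):

```python
import numpy as np

# Node incidence matrix: columns 0,1 are free cells, columns 2,3 fixed.
A = np.array([[ 1.0, -1.0,  0.0,  0.0],    # wire: free 1 <- free 2
              [ 1.0,  0.0, -1.0,  0.0],    # wire: free 1 <- fixed 1
              [ 0.0,  1.0,  0.0, -1.0],    # wire: free 2 <- fixed 2
              [ 1.0,  0.0,  0.0, -1.0]])   # wire: free 1 <- fixed 2
n = 2                                       # number of free cells
Af, Ff = A[:, :n], A[:, n:]                 # free and fixed columns
xfixed = np.array([0.0, 4.0])
yfixed = np.array([0.0, 2.0])

# J = ||Af xfree + Ff xfixed||^2 + (same in y): two least-squares solves.
xfree, *_ = np.linalg.lstsq(Af, -Ff @ xfixed, rcond=None)
yfree, *_ = np.linalg.lstsq(Af, -Ff @ yfixed, rcond=None)

J = np.sum((Af @ xfree + Ff @ xfixed) ** 2) + np.sum((Af @ yfree + Ff @ yfixed) ** 2)
```

The solution is unique when Af is full rank, which is the kind of rank assumption part (a) invites you to make.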

. .16 Least-squares state tracking. We give you two more pieces of information: the system order n is less than 20.1 0 . . . 0 1 B = 0 . . y(220) are. y(200). y1 ).yfree. N . to predict what y(p + 1). (1/p) t=1 v(t)2 . The mﬁle also deﬁnes the node incidence matrix A. yn ). we deﬁne the mean-square tracking error as N 1 x(t) − xdes (t) 2 . i.. and K. . . 0. . The mﬁle qplace_data. Consider the system x(t + 1) = Ax(t) + Bu(t) ∈ Rn . .8 0. .yfixed. . (We have p = 200 in this speciﬁc problem. y(t) ∈ R is the measured output signal. . p: y(1). In this problem.001).e. You will also ﬁnd an mﬁle that plots a proposed placement in a nice way: view_layout(xfree. Suppose xdes (t) ∈ Rn is given for t = 1. where x(t) ∈ Rn .) Then. . .. and v(t) ∈ R represents an output noise signal. (b) True or false: If the system is controllable. and the RMS value of 1/2 p the noise. .and y-coordinates of the free and ﬁxed cells. Plot your optimal placement using view_layout. Your solution can involve a (general) pseudo-inverse. i. . Here is the problem: get the time series data from the class web site in the ﬁle timeseriesdata. 6. . or even the system order n. Be sure to explain how you solve this problem.A). which gives y(1). You may make an assumption about the rank of one or more matrices that arise. (c) Find E(uopt ) for the speciﬁc system with 0. We do not assume it is controllable. y(p). and to explain the matlab source code that solves it (which you must submit). For a given input u. This mﬁle takes as argument the x. which is K ×N . . .e. .17 Time series prediction. with x(0) = 0.and y-coordinates of the ﬁxed cells. . The goal is to predict y(t) for the next q time steps. We consider an autonomous discrete-time linear system of the form x(t + 1) = Ax(t). (b) In this part you will determine the optimal quadratic placement for a speciﬁc set of cells and interconnect topology.1 0 A= 1 0 1 xdes (t) = [t 0 0]T .xfixed. and N = 10. the state x(t) (including the initial state x(0)). 
y(t) = Cx(t) + v(t). it deﬁnes the dimensions n. as well as the node incidence matrix that describes the wires. predict what y(201).m deﬁnes an instance of the quadratic placement problem. i. . N (and is meant to be the desired or target state trajectory). Brieﬂy justify your answer. (x1 . you do not know the matrices A ∈ Rn×n or C ∈ R1×n . Plot your estimates 42 . .m. and N − n vectors xfixed and yfixed. is small (on the order of 0.. . such as placing all free cells at the origin. . You do know the measured output signal for t = 1. which give the x. .(a) Explain how to ﬁnd the positions of the free cells. (xn . there is a unique uopt that minimizes E(u). Check your placement against various others. Speciﬁcally. . 0 6. It plots the proposed placement. y(p + q) will be.e. . that minimize the total square wire length. E(u) = N t=1 (a) Explain how to ﬁnd uopt that minimizes E (in the general case). Give the optimal locations of the free cells.

Plot your estimates ŷ(201), ..., ŷ(220), along with the given data, from t = 1 to t = 220. (You may also want to plot the whole set of data, just to make sure your prediction satisfies the 'eyeball test'.) It is extremely important that you explain very clearly how you come up with your prediction. What is your method? If you make choices during your procedure, how do you make them? Be sure to explain how you solve this problem, and to explain the matlab source code that solves it (which you must submit).

6.18 Reconstructing values from sums over subsets. There are real numbers u1, ..., up that we do not know, but want to find. We do have information about sums of some subsets of the numbers. Specifically, we know v1, ..., vq, where

    vi = sum_{j ∈ Si} uj.

Here, Si denotes the subset of {1, ..., p} that defines the partial sum used to form vi. (We know both vi and Si, for i = 1, ..., q.) We call the collection of subsets S1, ..., Sq informative if we can determine, or reconstruct, u1, ..., up without ambiguity, from v1, ..., vq. If the set of subsets is not informative, we say it is uninformative. As an example with p = 3 and q = 4,

    v1 = u2 + u3,   v2 = u1 + u2 + u3,   v3 = u1 + u3,   v4 = u1 + u2.

This corresponds to the subsets S1 = {2, 3}, S2 = {1, 2, 3}, S3 = {1, 3}, S4 = {1, 2}. This collection of subsets is informative. To see this, we show how to reconstruct u1, u2, u3. First we note that u1 = v2 − v1. Now that we know u1 we can find u2 from u2 = v4 − u1 = v4 − v2 + v1. In the same way we can get u3 = v3 − u1 = v3 − v2 + v1. (Note: this is only an example to illustrate the notation.)

(a) This subproblem concerns the following specific case, with p = 6 numbers and q = 11 subsets. The subsets are S1 = {1, 2, 3}, S2 = {1, 2}, S3 = {1, 3, 5}, S4 = {1, 4}, S5 = {1, 5, 6}, S6 = {2, 3, 6}, S7 = {2, 4, 5}, S8 = {3, 4, 5}, S9 = {3, 5}, S10 = {4, 5, 6}, S11 = {1, 2, 6}. The associated sums are

    v1 = −2, v2 = 14, v3 = 6, v4 = 4, v5 = 20, v6 = −5, v7 = 11, v8 = 9, v9 = 1, v10 = 17, v11 = 15.

Choose one of the following:
• The collection of subsets S1, ..., S11 is informative. If you believe this is the case, reconstruct u1, ..., u6.
• The collection of subsets S1, ..., S11 is uninformative. If you believe this is the case, give two different sets of values u1, ..., u6, and ũ1, ..., ũ6, whose given subset sums agree with the given v1, ..., v11.

(b) This subproblem concerns a general method for reconstructing u = (u1, ..., up) given v = (v1, ..., vq) (and of course, the subsets S1, ..., Sq). For each i, we define fi as the sum of all vj, over subsets that contain i:

    fi = sum_{j : i ∈ Sj} vj.

We define the subset count matrix Z ∈ R^{p×p} as follows: Zij is the number of subsets containing both i and j. (Thus, Zii is the number of subsets that contain i.) Then we reconstruct u as u = Z^{-1} f. (Of course, this requires that Z is invertible.)
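The subset-count construction in part (b) of problem 6.18 is easy to experiment with numerically. Below is a hedged numpy sketch (the helper name and the 1-based subset convention are our own); it builds the 0-1 membership matrix M with v = Mu, from which Z and f can be formed directly:

```python
import numpy as np

def reconstruct(subsets, v, p):
    # subsets: list of lists of 1-based indices; v: the measured subset sums
    q = len(subsets)
    M = np.zeros((q, p))                  # membership matrix: v = M u
    for j, S in enumerate(subsets):
        for i in S:
            M[j, i - 1] = 1.0
    Z = M.T @ M        # Z[i, j] = number of subsets containing both i and j
    f = M.T @ np.asarray(v, float)        # f[i] = sum of v_j over subsets with i
    return np.linalg.solve(Z, f)          # requires Z to be invertible
```

Trying this on the worked p = 3, q = 4 example from the problem statement is a quick sanity check of the construction.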

To justify your answer, choose one of the following:
• The method works, whenever the collection of subsets is informative. By 'works' we mean that Z is invertible, and that Z^{-1} f is the unique u with subset sums v. Justify why you believe this is the case.
• The method can fail, even when the collection of subsets is informative. By 'fail' we mean that either Z is singular, or Z^{-1} f does not have the required subset sums. If you believe this is the case, give a specific example, where the collection of subsets is informative, but the method above fails. (Please give us the simplest example you can think of.)

6.19 Signal estimation using least-squares. This problem concerns discrete-time signals defined for t = 1, ..., 500. We'll represent these signals by vectors in R^500, with the index corresponding to the time. We are given a noisy measurement ymeas(1), ..., ymeas(500), of a signal y(1), ..., y(500) that is thought to be, at least approximately, a linear combination of the 22 signals

    fk(t) = e^{−(t−50k)^2 / 25^2},    gk(t) = ((t − 50k)/10) e^{−(t−50k)^2 / 25^2},

where t = 1, ..., 500 and k = 0, ..., 10. Plots of f4 and g7 (as examples) are shown below.

(Figure: plots of f4(t) and g7(t) versus t, for t = 1, ..., 500.)

As our estimate of the original signal, we will use the signal ŷ = (ŷ(1), ..., ŷ(500)) in the span of f0, ..., f10, g0, ..., g10, that is closest to ymeas = (ymeas(1), ..., ymeas(500)) in the RMS (root-mean-square) sense. Explain how to find ŷ, and carry out your method on the signal ymeas given in sig_est_data.m on the course web site. Plot ymeas and ŷ on the same graph. Plot the residual (the difference between these two signals) on a different graph, and give its RMS value.
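For problem 6.19, the estimate ŷ is the least-squares projection of ymeas onto the span of the 22 given signals. A small numpy illustration (our code, not the course's sig_est_data.m; the basis matrix A below is built directly from the formulas for fk and gk):

```python
import numpy as np

t = np.arange(1, 501)
cols = []                      # 22 basis signals: f_k and g_k, k = 0..10
for k in range(11):
    e = np.exp(-((t - 50 * k) ** 2) / 25 ** 2)
    cols.append(e)             # f_k
    cols.append((t - 50 * k) / 10 * e)   # g_k
A = np.column_stack(cols)      # 500 x 22

def estimate(ymeas):
    # coefficients minimizing ||A c - ymeas||, then the fitted signal
    coef, *_ = np.linalg.lstsq(A, ymeas, rcond=None)
    return A @ coef
```

Minimizing the RMS error is the same as minimizing the 2-norm of the residual, so ordinary least squares applies directly.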
6.20 Point of closest convergence of a set of lines. We have m lines in R^n, described as

    Li = { pi + t vi | t ∈ R },    i = 1, ..., m,

where pi ∈ R^n, and vi ∈ R^n, with ||vi|| = 1. We define the distance of a point z ∈ R^n to a line L as

    dist(z, L) = min{ ||z − u|| : u ∈ L }.

(In other words, dist(z, L) gives the closest distance between the point z and the line L.) We seek a point z* ∈ R^n that minimizes the sum of the squares of the distances to the lines,

    sum_{i=1}^m dist(z, Li)^2.

The point z* that minimizes this quantity is called the point of closest convergence.

(a) Explain how to find the point of closest convergence, given the lines (i.e., given p1, ..., pm and v1, ..., vm). If your method works provided some condition holds (such as some matrix being full rank), say so. If you can relate this condition to a simple one involving the lines, please do so.

(b) Find the point z* of closest convergence for the lines with data given in the matlab file line_conv_data.m. This file contains n × m matrices P and V whose columns are the vectors p1, ..., pm, and v1, ..., vm, respectively. The file also contains commands to plot the lines and the point of closest convergence (once you have found it). Please include this plot with your solution.

6.21 Estimating direction and amplitude of a light beam. A light beam with (nonnegative) amplitude a comes from a direction d ∈ R^3, which we assume has norm one. (This means the beam travels in the direction −d.) The beam falls on m ≥ 3 photodetectors, each of which generates a scalar signal that depends on the beam amplitude and direction, and the direction in which the photodetector is pointed. Specifically, photodetector i generates an output signal pi, with

    pi = a α cos θi + vi,

where θi is the angle between the beam direction d and the outward normal vector qi of the surface of the ith photodetector, and α is the photodetector sensitivity. You can interpret qi ∈ R^3, which we assume has norm one, as the direction the ith photodetector is pointed. We assume that |θi| < 90°, i.e., the beam illuminates the top of the photodetectors. The numbers vi are small measurement errors. You are given the photodetector direction vectors q1, ..., qm ∈ R^3, the photodetector sensitivity α, and the noisy photodetector outputs, p1, ..., pm ∈ R. Your job is to estimate the beam direction d ∈ R^3 (which is a unit vector), and a, the beam amplitude.

(a) Explain how to do this, using a method or methods from this class. The simpler the method the better. If some matrix (or matrices) needs to be full rank for your method to work, say so.

(b) Carry out your method on the data given in beam_estim_data.m. This mfile defines p, the vector of photodetector outputs, a vector det_az, which gives the azimuth angles of the photodetector directions, and a vector det_el, which gives the elevation angles of the photodetector directions. It also defines α, the photodetector sensitivity. Note that the angles are given in degrees, not radians. To describe unit vectors q1, ..., qm and d in R^3 we will use azimuth and elevation, defined as follows:

    q = [ cos φ cos θ ]
        [ cos φ sin θ ]
        [ sin φ       ].

Here φ is the elevation (which will be between 0° and 90°, since all unit vectors in this problem have positive 3rd component, i.e., point upward). The azimuth angle θ, which varies from 0° to 360°, gives the direction in the plane spanned by the first and second coordinates. If q = e3 (i.e., the direction is directly up), the azimuth is undefined. Give your final estimate of the beam amplitude a and beam direction d (in azimuth and elevation, in degrees).
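For problem 6.20, each squared distance can be written using the projector I − vi vi^T onto the subspace orthogonal to vi, which leads to a set of linear normal equations in z. An illustrative numpy sketch (the function name is ours; we assume the resulting coefficient matrix is invertible, which fails when all the lines are parallel):

```python
import numpy as np

def closest_convergence(P, V):
    # columns of P and V are the points p_i and unit directions v_i
    n, m = P.shape
    G = np.zeros((n, n))
    h = np.zeros(n)
    for i in range(m):
        Pi = np.eye(n) - np.outer(V[:, i], V[:, i])  # projector onto v_i-perp
        G += Pi                       # normal equations: G z = h
        h += Pi @ P[:, i]
    return np.linalg.solve(G, h)
```

Setting the gradient of sum_i ||(I − vi vi^T)(z − pi)||^2 to zero gives exactly G z = h, since each projector is symmetric and idempotent.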

6.22 Smooth interpolation on a 2D grid. This problem concerns arrays of real numbers on an m × n grid. Such an array can represent an image, or a sampled description of a function defined on a rectangle. We can describe such an array by a matrix U ∈ R^{m×n}, where Uij gives the real number at location i, j, for i = 1, ..., m and j = 1, ..., n. We will think of the index i as associated with the y axis, and the index j as associated with the x axis. It will also be convenient to describe such an array by a vector u = vec(U) ∈ R^{mn}. Here vec is the function that stacks the columns of a matrix on top of each other:

    vec(U) = [ u1; ...; un ],    where U = [ u1 · · · un ].

To go back to the array representation, from the vector, we have U = vec^{-1}(u). (vec^{-1} just arranges the elements in a vector into an array.) We will need two linear functions that operate on m × n arrays. These are simple approximations of partial differentiation with respect to the x and y axes, respectively. The first function takes as argument an m × n array U and returns an m × (n−1) array V of forward (rightward) differences:

    Vij = U_{i,j+1} − Uij,    i = 1, ..., m,   j = 1, ..., n−1.

We can represent this linear mapping as multiplication by a matrix Dx ∈ R^{m(n−1)×mn}, which satisfies vec(V) = Dx vec(U). (This looks scarier than it is: each row of the matrix Dx has exactly one +1 and one −1 entry in it.) The other linear function, which is a simple approximation of partial differentiation with respect to the y axis, maps an m × n array U into an (m−1) × n array W, defined as

    Wij = U_{i+1,j} − Uij,    i = 1, ..., m−1,   j = 1, ..., n.

We define the matrix Dy ∈ R^{(m−1)n×mn}, which satisfies vec(W) = Dy vec(U). We define the roughness of an array U as

    R = || Dx vec(U) ||^2 + || Dy vec(U) ||^2.

The roughness measure R is the sum of the squares of the differences of each element in the array and its neighbors. Small R corresponds to smooth, or smoothly varying, U. The roughness measure R is zero precisely for constant arrays, i.e., when Uij are all equal. Now we get to the problem, which is to interpolate some unknown values in an array in the smoothest possible way, given the known values in the array. In other words, the goal is to fill in or interpolate missing data in a 2D array (an image, say), given the known values in the array. To define this precisely, we partition the set of indices {1, ..., mn} into two sets: Iknown and Iunknown. We let k ≥ 1 denote the number of known values (i.e., the number of elements in Iknown), and mn − k the number of unknown values (the number of elements in Iunknown). We are given the values ui for i ∈ Iknown; the goal is to guess (or estimate or assign) values for ui for i ∈ Iunknown. We'll choose the values for ui, with i ∈ Iunknown, so that the resulting U is as smooth as possible, i.e., so it minimizes R. We give the k known values in a vector wknown ∈ R^k, and the mn − k unknown values in a vector wunknown ∈ R^{mn−k}. The complete array is obtained by putting the entries of wknown and wunknown into the correct positions of the array. We describe these operations using two matrices Zknown ∈ R^{mn×k} and Zunknown ∈ R^{mn×(mn−k)}, that satisfy

    vec(U) = Zknown wknown + Zunknown wunknown.

(This looks complicated, but isn't: each row of these matrices is a unit vector, so multiplication with either matrix just stuffs the entries of the w vectors into particular locations in vec(U). In fact, the matrix [Zknown Zunknown] is an mn × mn permutation matrix.)

In summary, you are given the problem data wknown (which gives the known array values), Zknown (which gives the locations of the known values), and Zunknown (which gives the locations of the unknown array values, in some specific order). Your job is to find wunknown that minimizes R.

(a) Explain how to solve this problem. You are welcome to use any of the operations, matrices, and vectors defined above in your solution (e.g., vec, vec^{-1}, Dx, Dy, Zknown, Zunknown, wknown). If your solution is valid provided some matrix is (or some matrices are) full rank, say so.

(b) Carry out your method using the data created by smooth_interpolation.m. The file gives m, n, wknown, Zknown, and Zunknown. This file also creates the matrices Dx and Dy, which you are welcome to use. (This was very nice of us, by the way.) You are welcome to look at the code that generates these matrices, but you do not need to understand it. For this problem instance, around 50% of the array elements are known, and around 50% are unknown. The mfile also includes the original array Uorig from which we removed elements to create the problem. This is just so you can see how well your smooth reconstruction method does in reconstructing the original array. Of course, you cannot use Uorig to create your interpolated array U. Compare Uorig (the original array) and U (the interpolated array found by your method). Be sure to give the value of roughness R of U. To visualize the arrays use the matlab command imagesc(), with matrix argument. The mfile that gives the problem data will plot the original image Uorig, using imagesc(), as well as an image containing the known values, with zeros substituted for the unknown locations. This will allow you to see the pattern of known and unknown array values. If you prefer a grayscale image, or don't have a color printer, you can issue the command colormap gray. Hand in complete source code, as well as the plots.

Hints:
• In matlab, vec(U) can be computed as U(:).
• vec^{-1}(u) can be computed as reshape(u,m,n).

6.23 Designing a nonlinear equalizer from I/O data. This problem concerns the discrete-time system shown below, which consists of a memoryless nonlinearity φ, followed by a convolution filter with finite impulse response h. The scalar signal u is the input, and the scalar signal z is the output.

    u → φ → v → ∗h → z

(This looks complicated, but isn't.) Here φ : R → R, with the specific form

    φ(a) = a,              −1 ≤ a ≤ 1,
           1 − s + sa,     a > 1,
           −1 + s + sa,    a < −1,

where s > 0 is a parameter. This function is shown below. What this means is

    z(t) = sum_{τ=0}^{M−1} h(τ) v(t − τ),    v(t) = φ(u(t)),    t ∈ Z.

(Note that these signals are defined for all integer times, not just nonnegative times.)
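Problem 6.22's Dx and Dy have convenient Kronecker-product expressions when vec stacks columns (Matlab-style ordering, order='F' in numpy), via the identity vec(AXB) = (B^T ⊗ A) vec(X). The sketch below is illustrative only: the helper names are ours, and the minimization over wunknown is posed as one least-squares problem, which is one reasonable formulation for part (a).

```python
import numpy as np

def diff_mats(m, n):
    # 1-D forward-difference matrices, then lift with Kronecker products
    Dn = -np.eye(n - 1, n) + np.eye(n - 1, n, 1)
    Dm = -np.eye(m - 1, m) + np.eye(m - 1, m, 1)
    Dx = np.kron(Dn, np.eye(m))     # m(n-1) x mn, acts on vec(U)
    Dy = np.kron(np.eye(n), Dm)     # (m-1)n x mn
    return Dx, Dy

def interpolate(m, n, Zk, Zu, wk):
    # minimize ||D (Zk wk + Zu wu)||^2 over wu, D stacking Dx and Dy
    Dx, Dy = diff_mats(m, n)
    D = np.vstack([Dx, Dy])
    wu, *_ = np.linalg.lstsq(D @ Zu, -D @ (Zk @ wk), rcond=None)
    u = Zk @ wk + Zu @ wu
    return u.reshape((m, n), order='F')
```

A quick sanity check: if the known entries are all equal, the smoothest completion is the constant array, which has roughness zero.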


(b) Apply your method to the data given in the file nleq_data.m. Give the values of the parameters ŝ and g(0), ..., g(M − 1) found. Plot g using the matlab command stem.

(c) Using the values of ŝ and g(0), ..., g(M − 1) found in part (b), find the equalized signal û(t), for t = 1, ..., N. For the purposes of finding û(t) you can assume that z(t) = 0 for t ≤ 0. As a result, we can expect a large equalization error (i.e., û(t) − u(t)) for t = 1, ..., M − 1. Plot the input signal u(t), the output signal z(t), the equalized signal û(t), and the equalization error û(t) − u(t), for t = 1, ..., N, as well as J.

6.24 Simple fitting. You are given some data x1, ..., xN ∈ R and y1, ..., yN ∈ R. These data are available in simplefitdata.m on the course web site.

(a) Find the best affine fit, i.e., yi ≈ axi + b, where 'best' means minimizing sum_{i=1}^N (yi − (axi + b))^2. (This is often called the 'best linear fit'.) Set this up and solve it as a least-squares problem. Plot the data and the fit in the same figure. Give us a and b, and submit the code you used to find a and b.

(b) Repeat for the best least-squares cubic fit, i.e., yi ≈ a xi^3 + b xi^2 + c xi + d.

6.25 Estimating parameters from noisy measurements. In this problem you will compare a least-squares estimate of a parameter vector (which uses all available measurements) with a method that uses just enough measurements. Carry out the following steps. (You're free to use any other software system instead.)

(a) First we generate some problem data in matlab. Generate a measurement vector x of length 20 using x=randn(20,1). (This chooses the entries from a normal distribution, but this doesn't really matter for us.) Generate a 50 × 20 matrix A using A=randn(50,20). Generate a noise vector v of length 50 using v=0.1*randn(50,1). Finally, generate a measurement vector y = Ax + v.

(b) Find the least-squares approximate solution of y = Ax, and call it xls. Find the relative error ||xls − x|| / ||x||.

(c) Now form a 20-long truncated measurement vector y^trunc which consists of the first 20 entries of y. Form an estimate of x from y^trunc. Call this estimate xjem ('Just Enough Measurements'). Find the relative error of xjem.

(d) Run your script (i.e., (a)–(c)) several times. You'll generate different data each time, and you'll get different numerical results in parts (b) and (c). Give a one sentence comment about what you observe. (Since you are generating the data randomly, it is remotely possible that the second method will work better than the first, at least for one run. If this happens to you, quickly run your script again. Do not mention the incident to anyone.)

6.26 Signal reconstruction for a bandlimited signal. In this problem we refer to signals, which are just vectors, with index interpreted as (discrete) time. It is common to write the index for a signal as an argument, rather than as a subscript; for example, if y ∈ R^N is a signal, we use y(t) to denote yt, with t ∈ {1, ..., N}. Another notation you'll sometimes see in signal processing texts is y[t] for yt, but this doesn't really matter for us. The discrete cosine transformation (DCT) of the signal y ∈ R^N is another signal, typically denoted using the corresponding upper case symbol Y ∈ R^N. It is defined as

    Y(k) = sum_{t=1}^N y(t) w(k) cos( π(2t − 1)(k − 1) / (2N) ),    k = 1, ..., N,

where w(k) are weights, with w(1) = (1/N)^{1/2}, and w(k) = (2/N)^{1/2} for k = 2, ..., N.
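Problem 6.24's best affine fit is a two-column least-squares problem. A minimal numpy version is below (the helper name is ours; the course expects Matlab and the data file simplefitdata.m):

```python
import numpy as np

def affine_fit(x, y):
    # minimize sum (y_i - (a x_i + b))^2 via least squares on [x, 1]
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b
```

The cubic fit in part (b) is the same idea with columns x^3, x^2, x, 1.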

The number Y(k) is referred to as the kth DCT coefficient of y. A signal y can be reconstructed from its DCT Y, via the inverse DCT transform,

    y(t) = sum_{k=1}^N Y(k) w(k) cos( π(2t − 1)(k − 1) / (2N) ),    t = 1, ..., N,

where w(k) are the same weights as those used above in the DCT. The DCT bandwidth of the signal y is the smallest K for which Y(K) ≠ 0, and Y(k) = 0 for k = K + 1, ..., N. When K < N, the signal is called DCT bandlimited. (The term is typically used to refer to the case when the DCT bandwidth, K, is significantly smaller than N.) Now for the problem. You are given noise-corrupted values of a DCT bandlimited signal y, at some (integer) times t1, ..., tM, where 1 ≤ t1 < t2 < · · · < tM ≤ N:

    yi^samp = y(ti) + vi,    i = 1, ..., M.

Here, vi are small noises or errors. You don't know v, but you do know that its RMS value is approximately σ, a known constant. (In signal processing, y^samp would be called a non-uniformly sampled, noise corrupted version of y.) Your job is to:
• Determine the smallest DCT bandwidth (i.e., the smallest K) that y could have.
• Find an estimate of y, ŷ, which has this bandwidth.

Your estimate ŷ must be consistent with the sample measurements y^samp. While it need not match exactly (you were told there was a small amount of noise in y^samp), you should ensure that the vector of differences,

    ( y1^samp − ŷ(t1), ..., yM^samp − ŷ(tM) ),

has a small RMS value, on the order of σ (and certainly no more than 3σ).

(a) Clearly describe how to solve this problem. You can use any concepts we have used, to date, in EE263. You cannot use (and do not need) any concepts from outside the class. This includes the Fourier transform and other signal processing methods you might know. You might find the matlab functions dct and idct useful: dct(eye(N)) and idct(eye(N)) will return matrices whose columns are the DCT, and inverse DCT transforms, respectively, of the unit vectors. Note, however, that you can solve the problem without using these functions.

(b) Carry out your method on the data in bandlimit.m. Running this script will define N, M, tsamp, ysamp, and sigma. Give K, your estimate of the DCT bandwidth of y. Show ŷ on the same plot as the original sampled signal. (We have added the command to do this in bandlimit.m, but commented it out. It will also plot the sampled signal.) Also, give us ŷ(129), to four significant figures.

6.27 Fitting a model for hourly temperature. You are given a set of temperature measurements (in degrees C), yt ∈ R, t = 1, ..., N, taken hourly over one week (so N = 168). An expert says that over this week, an appropriate model for the hourly temperature is a trend (i.e., a linear function of t) plus a diurnal component (i.e., a 24-periodic component):

    ŷt = at + pt,

where a ∈ R and p ∈ R^N satisfies p_{t+24} = p_t, for t = 1, ..., N − 24. We can interpret a (which has units of degrees C per hour) as the warming or cooling trend (for a > 0 or a < 0, respectively) over the week.
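One way to attack problem 6.26 numerically is to build the orthonormal DCT matrix from the definition above and increase K until the residual at the sample times drops to the order of σ (the 3σ threshold below comes from the problem's consistency requirement). This is our own illustrative numpy sketch, not the course's bandlimit.m:

```python
import numpy as np

def dct_basis(N):
    # columns are the orthonormal DCT basis vectors from the definition above
    t = np.arange(1, N + 1)[:, None]
    k = np.arange(1, N + 1)[None, :]
    w = np.where(k == 1, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    return w * np.cos(np.pi * (2 * t - 1) * (k - 1) / (2 * N))

def bandlimited_estimate(tsamp, ysamp, N, sigma):
    C = dct_basis(N)
    for K in range(1, N + 1):              # smallest K consistent with the data
        Ak = C[np.array(tsamp) - 1, :K]    # basis rows at the sample times
        Ycoef, *_ = np.linalg.lstsq(Ak, ysamp, rcond=None)
        r = Ak @ Ycoef - ysamp
        if np.sqrt(np.mean(r ** 2)) <= 3 * sigma:
            return K, C[:, :K] @ Ycoef
```

The matrix returned by dct_basis is orthogonal, matching the w(k) weights given in the problem.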

(a) Explain how to find a ∈ R and p ∈ R^N (which is 24-periodic) that minimize the RMS value of y − ŷ.

(b) Carry out the procedure described in part (a) on the data set found in tempdata.m. Give the value of the trend parameter a that you find. Plot the model ŷ and the measured temperatures y on the same plot. (The matlab code to do this is in the file tempdata.m, but commented out.)

(c) Temperature prediction. Use the model found in part (b) to predict the temperature for the next 24-hour period (i.e., from t = 169 to t = 192). The file tempdata.m also contains a 24 long vector ytom with tomorrow's temperatures. Plot tomorrow's temperature and your prediction of it, based on the model found in part (b), on the same plot. What is the RMS value of your prediction error for tomorrow's temperatures?

6.28 Empirical algorithm complexity. The runtime T of an algorithm depends on its input data, which is characterized by three key parameters: k, m, and n. (These are typically integers that give the dimensions of the problem data.) A simple and standard model that shows how T scales with k, m, and n has the form

    T̂ = α k^β m^γ n^δ,

where α, β, γ, δ ∈ R are constants that characterize the approximate runtime model. (In general, the exponents β, γ, and δ need not be integers, or close to integers.) If, for example, δ ≈ 3, we say that the algorithm has (approximately) cubic complexity in n. Now suppose you are given measured runtimes for N executions of the algorithm, with different sets of input data. For each data record, you are given Ti (the runtime), and the parameters ki, mi, and ni. It's possible (and often occurs) that two data records have identical values of k, m, and n, but different values of T. (This means the algorithm was run on two different data sets that had the same dimensions; the corresponding runtimes can be (and often are) a little different.) We wish to find values of α, β, γ, and δ for which our model (approximately) fits our measurements. We define the fitting cost as

    J = (1/N) sum_{i=1}^N ( log(Ti / T̂i) )^2,

where T̂i = α ki^β mi^γ ni^δ is the runtime predicted by our model, using the given parameter values. This fitting cost can be (loosely) interpreted in terms of relative or percentage fit: if (log(Ti / T̂i))^2 ≤ ε, then T̂i lies between Ti / exp(√ε) and Ti exp(√ε). Your task is to find constants α, β, γ, δ that minimize J.

(a) Explain how to do this. If your method always finds the values that give the true global minimum value of J, say so. If your algorithm cannot guarantee finding the true global minimum, say so. If your method requires some matrix (or matrices) to be full rank, say so.

(b) Carry out your method on the data found in empac_data.m. Give the values of α, β, γ, and δ you find, and the corresponding value of J.

6.29 State trajectory estimation. We consider a discrete-time linear dynamical system

    x(t + 1) = Ax(t) + Bu(t) + w(t),    y(t) = Cx(t) + v(t),

with state x(t) ∈ R^n, input u(t) ∈ R^m and output y(t) ∈ R^p. The signal w(t) is called the process noise, and the signal v(t) is the measurement noise. You know the matrices A, B, C, the inputs u(t), t = 1, ..., T − 1, and the outputs y(t), t = 1, ..., T. You do not know x(t), w(t), or v(t). Your job is to estimate the state trajectory x(t), t = 1, ..., T. We will denote your estimate of the state trajectory as x̂(t), t = 1, ..., T. When we guess the state trajectory x̂(t), we have two sets of residuals,

    x̂(t + 1) − (A x̂(t) + Bu(t)),   t = 1, ..., T − 1,    and    y(t) − C x̂(t),   t = 1, ..., T,
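Taking logs in problem 6.28 turns the model into log T ≈ log α + β log k + γ log m + δ log n, a linear least-squares problem in (log α, β, γ, δ); this is why the global minimum of J can be found exactly. A numpy sketch (the function name is ours):

```python
import numpy as np

def fit_complexity(T, k, m, n):
    # least squares on: log T = log(alpha) + beta log k + gamma log m + delta log n
    A = np.column_stack([np.ones(len(T)), np.log(k), np.log(m), np.log(n)])
    x, *_ = np.linalg.lstsq(A, np.log(T), rcond=None)
    return np.exp(x[0]), x[1], x[2], x[3]   # alpha, beta, gamma, delta
```

The solution is the unique global minimizer of J provided the stacked matrix A has full column rank, i.e., the logs of k, m, n (together with the all-ones column) are linearly independent across the data records.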

which correspond to (implicit) estimates of w(t) and v(t). You will choose x̂(t) so as to minimize the overall objective

    J = sum_{t=1}^{T−1} || x̂(t + 1) − (A x̂(t) + Bu(t)) ||^2 + ρ sum_{t=1}^{T} || y(t) − C x̂(t) ||^2,

where ρ > 0 is a given parameter (related to our guess of the relative sizes of w(t) and v(t)). The objective J is a weighted sum of norm squares of our two residuals.

(a) Explain how to find the state trajectory estimate x̂(t), t = 1, ..., T, using any concepts from the course. If one or more matrices must satisfy a rank condition for your method to work, say so.

(b) Carry out your method from part (a) using state_traj_estim_data.m, which gives A, B, C, the dimensions n, m, p, the parameter ρ, and the time horizon T. The input and output trajectories are given as m × T and p × T matrices, respectively. (The tth column gives the vector at the tth period.) The mfile includes the true value of the state trajectory (of course you may not use it in forming your estimate x̂(t)). Plot x1(t) (the true first state component) and x̂1(t) (the estimated first state component) on the same plot. Give the value of J corresponding to your estimate.

Matlab hints:
• The matlab command x = X(:), where X is an n by m matrix, stacks the columns of X into a vector of dimension nm. You may then recover X with the command X = reshape(x,n,m).
• You might find the matlab function blkdiag useful.

6.30 Fleet modeling. In this problem, we will consider model estimation for vehicles in a fleet. We collect input and output data at multiple time instances, for each vehicle in a fleet of vehicles:

    x^(k)(t) ∈ R^n,   y^(k)(t) ∈ R,   t = 1, ..., T,   k = 1, ..., K.

Here k denotes the vehicle number, t denotes the time, x^(k)(t) ∈ R^n the input, and y^(k)(t) ∈ R the output. (In the general case the output would also be a vector, but for simplicity here we consider the scalar output case.) While it does not affect the problem, we describe a more specific application, where the vehicles are airplanes. The components of the inputs might be, for example, the deflections of various control surfaces and the thrust of the engines; the output might be vertical acceleration. Airlines are required to collect this data, called FOQA data, for every commercial flight. (This description is not needed to solve the problem.) We will fit a model of the form

    y^(k)(t) ≈ a^T x^(k)(t) + b^(k),

where a ∈ R^n is the (common) linear model parameter, and b^(k) ∈ R is the (individual) offset for the kth vehicle. We will choose these to minimize the mean square error

    E = (1/(TK)) sum_{t=1}^T sum_{k=1}^K ( y^(k)(t) − a^T x^(k)(t) − b^(k) )^2.

(a) Explain how to find the model parameters a and b^(1), ..., b^(K).
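Problem 6.29's objective J is a least-squares function of the stacked trajectory (x̂(1), ..., x̂(T)), so it can be minimized with one big solve. The numpy sketch below (our names; the √ρ weighting implements the ρ-weighted sum of squares) builds the stacked system explicitly:

```python
import numpy as np

def estimate_traj(A, B, C, U, Y, rho):
    # stack xhat(1..T) (column-major) and solve one least-squares problem
    n = A.shape[0]; T = Y.shape[1]
    F = np.zeros(((T - 1) * n, T * n))     # dynamics residuals
    r1 = np.zeros((T - 1) * n)
    for t in range(T - 1):
        F[t*n:(t+1)*n, t*n:(t+1)*n] = -A
        F[t*n:(t+1)*n, (t+1)*n:(t+2)*n] = np.eye(n)
        r1[t*n:(t+1)*n] = B @ U[:, t]
    G = np.kron(np.eye(T), C)              # output residuals
    r2 = Y.flatten(order='F')
    M = np.vstack([F, np.sqrt(rho) * G])
    r = np.concatenate([r1, np.sqrt(rho) * r2])
    X, *_ = np.linalg.lstsq(M, r, rcond=None)
    return X.reshape((n, T), order='F')
```

On noiseless data with C invertible the true trajectory makes both residuals zero, so the estimate recovers it exactly.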

(b) Carry out your method on the data given in fleet_mod_data.m. The data is given using cell arrays X and y. The columns of the n × T matrix X{k} are x^(k)(1), ..., x^(k)(T), and the 1 × T row vector y{k} contains y^(k)(1), ..., y^(k)(T). Give the model parameters a and b^(1), ..., b^(K), and report the associated mean square error E. Compare E to the (minimum) mean square error E^com obtained using a common offset b = b^(1) = · · · = b^(K) for all vehicles. By examining the offsets for the different vehicles, suggest a vehicle you might want to have a maintenance crew check out.

6.31 Regulation using ternary inputs. Consider a discrete-time linear dynamical system

    x(t + 1) = Ax(t) + b u(t),    t = 1, 2, ...,

with x(t) ∈ R^n, u(t) ∈ {−1, 0, 1}, where A ∈ R^{n×n} and b ∈ R^n. (The problem title comes from the restriction that the input can only take three possible values.) Our goal is to regulate the system, i.e., choose the inputs u(t) so as to drive the state x(t) towards zero. We will adopt a greedy strategy: at each time t, we will choose u(t) so as to minimize ||x(t + 1)||.

(a) Show that u(t) has the form u(t) = round(k x(t)), where k ∈ R^{1×n}, and round(a) rounds the number a to the closest of {−1, 0, 1}:

    round(a) =  1,    a > 1/2,
                0,    |a| ≤ 1/2,
               −1,    a < −1/2.

(We don't care about what happens when there are ties; we have arbitrarily broken ties in favor of a = 0.) Give an explicit expression for k. (This is a simple, straightforward question; we don't want to hear a long explanation or discussion.)

(b) Consider the specific problem with data

    A = [ 0.9   0    −0.2   0.1 ]       b = [ 1    ]       x(1) = [ −4 ]
        [ 0.1   0.2   0    −0.1 ]           [ 0    ]              [  0 ]
        [ 0    −0.2   0.15  0.1 ]           [ −0.1 ]              [  0 ]
        [ 1     0    −0.15  0.2 ],          [ 0.1  ],             [ −4 ].

Give k, and plot x(t) and u(t) for t = 1, ..., 100. Use the matlab function stairs to plot u(t).

(a) Work out the details of the Gauss-Newton method for this ﬁtting problem. We are given a set of data.. The problem is to estimate the cell phone 54 . (This is possible because the base stations are synchronized using the Global Positioning System. A cell phone at location x ∈ R2 (we assume that the elevation is zero for simplicity) transmits an emergency signal at time τ .) (b) Get the data t. p = [a µ σ]T . µ. n. p(k+1) = p(k) + ∆p(k) . i. . t1 . . where c is the speed of light. µ ∈ R. Each base station can measure the time of arrival of the emergency signal. within a few tens of nanoseconds. i. . Explicitly describe the Gauss-Newton steps. Here t ∈ R is the independent variable. . You can assume that vi is on the order of a few tens of nanseconds. y1 . . yN . and vi is the noise or error in the measured time of arrival. You’ll use the Gauss-Newton method. This problem concerns a simpliﬁed version of an E-911 system that uses time of arrival information at a number of base stations to estimate the cell phone location. and our goal is to ﬁt a Gaussian function to the data. say so. (You should plot enough iterations to convince yourself that the algorithm has nearly converged. Note that E is a function of the parameters a. The parameter a is called the amplitude of the Gaussian. You’ll need an initial guess for the parameters.e. σ. A Gaussian function has the form f (t) = ae−(t−µ) 2 /σ 2 . 7. but diﬀerent initial guess for the parameters. not a good guess for the parameters. . Repeat for another set of parameters that is not reasonable. or estimate them by any other method (but you must explain your method). µ is called its center. (It’s possible. You can use the notation ∆p(k) = [∆a(k) ∆µ(k) ∆σ (k) ]T to denote the update to the parameters.2 E-911. This signal is received at n base stations. (Here k denotes the kth iteration. and σ is called the spread or width. . including the matrices and vectors that come up.1 Fitting a Gaussian function to data. 
given by E= 1 N N i=1 1/2 (f (ti ) − yi ) 2 . The federal government has mandated that cellular network operators must have the ability to locate a cell phone from which an emergency call is made.. of course. . For convenience we deﬁne p ∈ R3 as the vector of the parameters. . . Repeat for another reasonable. . Your job is to choose these parameters to minimize E.Lecture 7 – Regularized least-squares and Gauss-Newton method 7. available on the class website. We will measure the quality of the ﬁt by the root-mean-square (RMS) ﬁtting error.e. c i = 1. Implement the Gauss-Newton (as outlined in part (a) above). i. You can visually estimate them (giving a short justiﬁcation).) Plot the ﬁnal Gaussian function obtained along with the data on the same plot. that the Gauss-Newton algorithm doesn’t converge. and σ ∈ R are parameters that aﬀect its shape.m.. p.) Brieﬂy comment on the results you obtain in the three cases.. and a ∈ R. sn ∈ R2 . if this occurs. . or fails at some step. y (and N ) from the ﬁle gauss_fit_data. . .) The measured times of arrival are ti = 1 si − x + τ + vi . . We can always take σ > 0. located at locations s1 . i. . Plot the RMS error E as a function of the iteration number.e.e. tN .
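The Gauss-Newton iteration asked for in problem 7.1 can be sketched as follows in Python (the course uses Matlab; the function name and the synthetic data below are mine, since the data file gauss_fit_data.m is not reproduced here). Each step linearizes the residual and solves a least-squares problem for the parameter update.

```python
import numpy as np

def gauss_newton_gaussian(t, y, p0, iters=30):
    """Fit f(t) = a*exp(-(t-mu)^2/sigma^2) to data by Gauss-Newton: at each
    iteration, linearize the residual r(p) = f(t; p) - y at the current p and
    take the least-squares step p <- p + dp, where J dp ~ -r."""
    a, mu, sigma = p0
    for _ in range(iters):
        e = np.exp(-((t - mu) ** 2) / sigma ** 2)
        f = a * e
        r = f - y
        J = np.column_stack([
            e,                                   # df/da
            f * 2 * (t - mu) / sigma ** 2,       # df/dmu
            f * 2 * (t - mu) ** 2 / sigma ** 3,  # df/dsigma
        ])
        dp, *_ = np.linalg.lstsq(J, -r, rcond=None)
        a, mu, sigma = a + dp[0], mu + dp[1], sigma + dp[2]
    return a, mu, sigma

# synthetic noiseless data from known parameters (a, mu, sigma) = (2, 1, 0.5)
t = np.linspace(-2.0, 4.0, 200)
y = 2.0 * np.exp(-((t - 1.0) ** 2) / 0.5 ** 2)
a, mu, sigma = gauss_newton_gaussian(t, y, p0=(1.5, 0.8, 0.7))
```

Starting from a reasonable initial guess, the parameters are recovered; as the problem warns, a poor initial guess can make the iteration diverge.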

which makes D = 0.e. We want both D and C to be small. 1] → R... . The problem is to identify the optimal trade-oﬀ curve between C and D. including the results of any veriﬁcation you do. To reduce the problem to a ﬁnite-dimensional one. we can choose G to be an aﬃne function (i. . to have G′′ (t) = 0 for all t ∈ [0. whose columns give the positions of the 9 basestations. it deﬁnes a 2 × 9 matrix S.e.m. Our goal is to ﬁnd another function G : [0. |x2 | ≤ 3000. At one extreme. deﬁned as C= 0 G′′ (t)2 dt. gi+1 − 2gi + gi−1 1/n2 gives a simple approximation of G′′ (i/n). the vectors f and g and the objectives c and d. and the constant c. and that |τ | ≤ 5000 (although all that really matters are the time diﬀerences). deﬁnes the data for this problem. 7. a 1 × 9 vector t that contains the measured times of arrival. • Mean-square curvature. deﬁned as 1 d= n n i=1 (fi − gi )2 . . You can assume that n is chosen large enough to represent the functions well. You can assume that the position x is somewhere in the box |x1 | ≤ 3000. times in nanoseconds.3 Curve-smoothing. gi = G(i/n). including how you will check that your estimate is reasonable. • The matlab source code you use to solve the problem. Your solution must contain the following: • An explanation of your approach to solving this problem. available on the course web site. i. 1 • Mean-square curvature. The mﬁle e911_data. Speciﬁcally. as well as the time of transmission τ . we can choose G = F . where fi = F (i/n). deﬁned as D= 0 1 (F (t) − G(t))2 dt. which is a smoothed version of F . we will represent the functions F and G (approximately) by vectors f. Using this representation we will use the following objectives. g ∈ Rn . at the other extreme. Distances are given in meters. tn . We’ll judge the smoothed version G of F in two ways: • Mean-square deviation from F . • The numerical results obtained by your method.position x ∈ R2 . so we have a problem with two objectives. 
which approximate the ones deﬁned for the functions above: • Mean-square deviation. and explain how to ﬁnd smoothed functions G on the optimal trade-oﬀ curve. In general there will be a trade-oﬀ between the two objectives. in which case C = 0. You will only work with this approximate version of the problem. and the speed of light in meters/nanosecond. and check the results. 55 . We are given a function F : [0. based on the time of arrival measurements t1 . . which is the speed of light. deﬁned as c= In our deﬁnition of c. note that 1 n−2 n−1 i=2 gi+1 − 2gi + gi−1 1/n2 2 . 1] → R (whose graph gives a curve in R2 ). 1]).

independence of some vectors. say with dotted line type. Plot the optimal g for the two extreme cases µ = 0 and µ → ∞.m from the course web site. on rank of some matrix. On your plots of g.e. This ﬁle deﬁnes a speciﬁc vector f that you will use. etc.. be sure to include also a plot of f . any intersection of the curve with an axis). for reference. state them clearly. Does your method always work? If there are some assumptions you need to make (say. which corresponds to minimizing d without regard for c. Be sure to identify any critical points (such as.). Find and plot the optimal trade-oﬀ curve between d and c. and also the solution obtained as µ → ∞ (i. Submit your matlab code. where µ ≥ 0 is a parameter that gives the relative weighting of sum-square curvature compared to sum-square deviation. as we put more and more weight on minimizing curvature). Explain how to obtain the two extreme cases: µ = 0.(a) Explain how to ﬁnd g that minimizes d + µc. for example. 56 . (b) Get the ﬁle curve smoothing. and for three values of µ in between (chosen to show the trade-oﬀ nicely).
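For part (a) of problem 7.3, minimizing d + µc is an unconstrained quadratic in g, so g is found from one set of linear (normal) equations. A Python sketch, using an illustrative f in place of the one in curve_smoothing.m:

```python
import numpy as np

def smooth(f, mu):
    """Minimize d + mu*c over g, where d = (1/n)||g - f||^2 and
    c = (1/(n-2)) * sum over interior i of (n^2 (g[i+1] - 2 g[i] + g[i-1]))^2.
    Setting the gradient to zero gives the linear equations
      ((1/n) I + mu (n^4/(n-2)) D^T D) g = (1/n) f,
    where D is the (n-2) x n second-difference matrix."""
    n = len(f)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    lhs = np.eye(n) / n + mu * (n ** 4 / (n - 2)) * D.T @ D
    return np.linalg.solve(lhs, f / n), D

n = 100
f = np.sin(4 * np.pi * np.arange(1, n + 1) / n)   # an illustrative "curve" f
g0, D = smooth(f, 0.0)       # mu = 0: curvature is free, so g = f and d = 0
ginf, _ = smooth(f, 1e6)     # mu large: g approaches an affine function, c -> 0
```

Sweeping µ between these extremes traces out the trade-off curve between d and c.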

x(1) = 0. The system starts from the zero state.. you should observe that y(t) appears to be converging to ydes . along with comments explaining it all. .e. logspace in matlab. Give a short intuitive ˙ explanation for what you see. while keeping J2 = t=0 10 f (t)2 dt small. subplot(412). 57 .. . 2. using. You don’t need more than 50 or so points on the trade-oﬀ curve. . the matlab source code you devise to ﬁnd the numerical answers.g. . (We start from initial time t = 1 rather than the more conventional t = 0 since matlab indexes vectors starting from 1. figure. Check that the end points make sense to you.:)).1 Optimal control of unit mass. and the resulting p and p. Our goal is to bring the ˙ mass near or to the origin at t = 10. so it usually works better in practice to give it a logarithmic spacing. and the ﬁnal plots produced by matlab. (a) Assume the mass has zero initial position and velocity: p(0) = p(0) = 0. . t = 1. . with u(t) = uss . even better. where f (t) = xi for i − 1 < t ≤ i. Hint: the parameter µ has to cover a very large range. where Ad ∈ R16×16 . subplot(414). 8. and gives formulas symbolically.. Find uss and xss . . p(10) = 0. (i. subplot(411). The goal is to ﬁnd an input u that results in y(t) → ydes = (1. plot(u(2. we want J1 = p(10)2 + p(10)2 . We start with the discrete-time model of the system used in lecture 1: x(t + 1) = Ad x(t) + Bd u(t). . not 0. 10. Consider a unit mass at position p(t) with velocity p(t). Cd ∈ R2×16 . exact convergence after T steps). . −2) as t → ∞ (i. subjected ˙ to force f (t).e. If you’ve done everything right.:)). .:)). at or near rest.. Make sure the speciﬁcations are satisﬁed. an input u that results in y(t) = ydes for t = T + 1. for i = 1. e.e. (b) Simple simulation.2 Smallest input that drives a system to a desired steady-state output. Find x that minimizes ˙ 10 f (t)2 dt t=0 subject to the following speciﬁcations: p(10) = 1. ˙ small. (a) Steady-state analysis for desired constant output. 
Suppose that the system is in steady-state. plot(y(1. (b) Assume the mass has initial position p(0) = 0 and velocity p(0) = 1. plot(u(1. asymptotic convergence to a desired output) or. and p(5) = 0.. Your solution to this problem should consist of a clear written narrative that explains what you are doing. Plot the optimal force ˙ f . .Lecture 8 – Least-norm solutions of underdetermined equations 8. Bd ∈ R16×2 . Find y(t).. . i. x(t) = xss . You can use the following matlab code to obtain plots that look like the ones in lecture 1. . Plot u and y versus t. y(t) = Cd x(t). u(t) = uss and y(t) = ydes are constant (do not depend on t).e.m. . 20000.e. i. plot(y(2. Plot the optimal trade-oﬀ curve between J2 and J1 .) The data for this problem can be found in ss_small_input_data. or at least not too large. for t = 1.:)). subplot(413). In this problem you will use matlab to solve an optimal control problem for a force acting on a unit mass. i. with initial state x(1) = 0. .
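The steady-state analysis in part (a) of problem 8.2 reduces to one square set of linear equations: xss = Axss + Buss together with ydes = Cxss. A Python sketch with small random stand-in matrices (the problem's Ad, Bd, Cd have n = 16 states and 2 inputs/outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2                              # illustrative sizes, not the problem's
A = 0.3 * rng.standard_normal((n, n))    # a (likely) stable example system
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
ydes = np.array([1.0, -2.0])

# Steady state: xss = A xss + B uss and ydes = C xss, i.e. the square system
#   [ A - I  B ] [xss]   [  0  ]
#   [   C    0 ] [uss] = [ydes]
M = np.block([[A - np.eye(n), B], [C, np.zeros((m, m))]])
sol = np.linalg.solve(M, np.concatenate([np.zeros(n), ydes]))
xss, uss = sol[:n], sol[n:]
```

If the block matrix is invertible (which it is for generic data), this determines uss and xss uniquely.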

For the three cases T = 100. From this inequality it follows that whenever v = 0. then y(t) = ydes for t ≥ T + 1. x ∞ = max |xi |. Therefore if you can ﬁnd xmf ∈ Rn (mf stands for minimum fuel) and λ ∈ Rm such that Axmf = y and λT y xmf 1 = . since we can’t just diﬀerentiate to verify that we have the minimizer. Two common examples are the 1-norm and ∞-norm. For each of these three cases. plot u and y versus t.. . There is no simple formula for the least 1-norm or ∞-norm solution of Ax = y.Here we assume that u and y are 2 × 20000 matrices. (Explain why. (That’s one of the topics of EE364. which is like the Cauchy-Schwarz inequality (but even easier to prove): for any v. the following inequality holds: wT v ≤ v ∞ w 1 . Note that if u(t) = u⋆ (t) for t = 1. .e. . The plot is probably better viewed on a log-log scale. The mass starts at position p(0) = 0 with zero velocity and is required 58 .. v ∞ Now let z be any solution of Az = y. like there is for the least (Euclidean) norm solution.3 Minimum fuel and minimum peak input solutions. T = 200. for example. so there are many x’s that satisfy Ax = y. First verify the following inequality. you’ll use these ideas to verify a statement made during lecture.. (c) Smallest input. . however. how would you know that a solution of Ax = y has minimum 1-norm? In this problem you will explore this idea. and T = 500. Now consider the problem from the lecture notes of a unit mass acted on by forces x1 . if you like). In other words. . i=1. .. AT λ ∞ then xmf is a minimum fuel solution. . . There will be two diﬀerences between these plots and those in lecture 1: These plots start from t = 1. 8. we have exact convergence to the desired output in T steps. Let u⋆ (t). In many applications we want to minimize another norm of x (i. .1. . .. This solution has the minimum (Euclidean) norm among all solutions of Ax = y. (d) Plot the RMS value of u⋆ versus T for T between 100 and 1000 (for multiples of 10. . 
be the input with minimum RMS value 1 T T 1/2 u(t) t=1 2 that yields x(T + 1) = xss (the value found in part (a)). measure of size of x) subject to Ax = y. and then u(t) = uss for t = T + 1. which are deﬁned as n x 1 = i=1 |xi |. x10 for one second each. . for t = 1. In the rest of this problem. is often a good measure of fuel use. any solution of Az = y must have 1-norm at least as big as the righthand side expression. T . and the plots in lecture 1 scale t by a factor of 0. . z 1≥ AT λ ∞ Thus. Explain why we must have λT y . For example. and let λ ∈ Rm be such that AT λ = 0. ﬁnd u⋆ and its associated RMS value. w 1 ≥ wT v .) Methods for computing xmf and the mysterious vector λ are described in EE364. w ∈ Rp . Suppose A ∈ Rm×n is fat and full rank. T . the ∞-norm is the peak of the vector or signal x. In lecture we encountered the least-norm solution given by xln = AT (AAT )−1 y. T + 2.) The analysis is a bit trickier as well. which can be done using the command loglog instead of the command plot.n The 1-norm. .. They can be computed very easily.

Here x is some variable we wish to estimate or ﬁnd.1) and norm(z. the (Euclidean) least-norm solution. many force vectors that satisfy these requirements. −1/9). an accelerating force at the beginning.e. for fun: what do you think is the minimum peak force vector xmp ? How would you verify that a vector xmp (mp for minimum peak) is a minimum ∞-norm solution of Ax = y? This input. norm(z. . Such a reconstruction matrix has the nice property that it recovers x perfectly from either set of measurements (y or y ). It is (basically) the input used in a disk drive to move the head from one track to another. or explain why there is no such H. 1 1 −1 −3 −1 2 1 2 8. We consider the phased-array antenna system shown below.4 Simultaneous left inverse of two matrices. by the way. . ˜ where G ∈ Rm×n . of course. −5). ˙ In the lecture notes.g. impinges on the array. and y gives the measurements with some alternate set ˜ ˜ of (linear) sensors. In class I stated that the minimum fuel solution is given by xmf = (1/9. Consider a system where y = Gx. ˜ y = Gx ˜ Either ﬁnd an explicit reconstruction matrix H.and ∞-norm of a vector z. There are.e. x = Hy = H y . with spacing d between elements. A sinusoidal plane wave. p(10) = 0. Hint: try λ = (1. Verify that the 1-norm of xmf is less than the 1norm of xln .5 Phased-array antenna weight design. Consider the speciﬁc case ˜ ˜ 2 3 −3 −1 1 0 −1 0 ˜ G = 0 4 . while respecting a maximum possible current in the disk drive motor coil.. One last question. the output of element 1 does not depend on the 59 . and a (braking) force at the end to decelerate the mass to zero velocity at t = 10. Hints: • The input is called bang-bang.to satisfy p(10) = 1. There are several convenient ways to ﬁnd the 1. i.inf) or sum(abs(z)) and max(abs(z)). y gives the measurements with some set of (linear) sensors. you can see a plot of the least (Euclidean) norm force proﬁle. G = 2 −3 . Prove this. G ∈ Rm×n . . e. i. Feel free to use matlab. 
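The claims in problem 8.3 can be checked numerically. Below is a Python sketch (the exercises use Matlab) assuming the standard unit-mass matrix from lecture, whose rows map the ten one-second forces to final position and final velocity; it verifies the certificate ||xmf||₁ = λᵀy / ||Aᵀλ||∞ with λ = (1, −5), and that xln has strictly larger 1-norm.

```python
import numpy as np

# Unit mass acted on by forces x1, ..., x10, each applied for one second.
# Final position and velocity are linear in x, y = A x, with
# position row (19/2, 17/2, ..., 1/2) and velocity row all ones.
A = np.vstack([9.5 - np.arange(10), np.ones(10)])
y = np.array([1.0, 0.0])                  # p(10) = 1, p'(10) = 0

# claimed minimum fuel solution and certificate vector from the problem
xmf = np.zeros(10)
xmf[0], xmf[-1] = 1 / 9, -1 / 9
lam = np.array([1.0, -5.0])

# certificate: A xmf = y and ||xmf||_1 = lam^T y / ||A^T lam||_inf,
# which by the inequality derived in the problem proves xmf has minimum 1-norm
feasible = np.allclose(A @ xmf, y)
bound = (lam @ y) / np.abs(A.T @ lam).max()

# the least Euclidean norm solution, for comparison of 1-norms
xln = A.T @ np.linalg.solve(A @ A.T, y)
```

Here λᵀy / ||Aᵀλ||∞ = 1/4.5 = 2/9, which equals ||xmf||₁, confirming optimality.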
with wavelength λ and angle of arrival θ. 8.. i.. 0. is very widely used in practice. We want to ﬁnd a reconstruction matrix H ∈ Rn×m such that HG = H G = I. .e. (We’ve chosen the phase center as element 1. 8 seconds of coasting. 0. which yields the output e2πj(k−1)(d/λ) cos θ (which is a complex number) from the kth element. θ d The array consists of n individual antennas (called antenna elements) spaced on a line. • Some people drive this way..

.. and x2 and x3 are the currents ﬂowing out of the node. The vectors ai and the constants bi are known. versus k (which. the real and imaginary parts of the antenna weights).) To solve this problem..e. will achieve the desired goal of being relatively insensitive to plane waves arriving at angles more than 10◦ from θ = 70◦ ). is called the antenna array gain pattern. . ‘jammers’ or multipath reﬂections) coming from other directions.g.e. 60 . you might ﬁnd it fun to also plot your antenna gain pattern on a polar plot. In this problem. A’ gives the conjugate transpose. . this is done using the polar command. then we get a uniform or omnidirectional gain pattern. the measurement y won’t satisfy the conservation laws (i. As a simple example. we want a gain pattern that is one in a given (‘target’) direction θtarget .4λ (which is a typical value). w2 = · · · = wn = 0.6 Modifying measurements to satisfy known conservation laws.) A (complex) linear combination of these outputs is formed. you must explain how you solve the problem. . design. but small at other angles. . aT x = bi . The combined array output depends on the angle of arrival of the wave. and also a plot of the antenna array response. . . i. . In matlab.e. you will ﬁrst discretize the angles between 0◦ and 180◦ in 1◦ increments. y(πk/180). and we want |y(θ)| small for 0◦ ≤ θ ≤ 60◦ and 80◦ ≤ θ ≤ 180◦ . m. i i = 1. the prime is actually the Hermitian conjugate operator.. and attenuate signals (e. You will design the weights for an array with n = 20 elements. if A is a complex matrix or vector. Here’s the problem. 180. which are the coeﬃcients of the linear combination.. i. if we choose the weights as w1 = 1. . n y(θ) = k=1 wk e2πj(k−1)(d/λ) cos θ .e. You are to choose w ∈ C20 that minimizes 60 180 k=1 |yk |2 + k=80 |yk |2 subject to the constraint y70 = 1. and a spacing d = 0. . . i. wn . am are independent. charge .. if x1 is the current in a circuit ﬂowing into a node. 
which allows you to easily visualize the pattern. and called the combined array output. . Thus y ∈ C180 will be a (complex) vector. or Hermitian conjugate. or balance equations (mass. hopefully.incidence angle θ. In other words. The measurements are good.. From physical principles it is known that the quantities x must satisfy some linear equations. we want the antenna array to be relatively insensitive to plane waves arriving from angles more than 10◦ away from the target direction.e. so we have y ≈ x. Give the weights you ﬁnd. for k = 1. wn intelligently. . and real matrices. . ). Such a pattern would receive a signal coming from the direction θtarget . energy. we want a beamwidth of 20◦ around a target direction of 70◦ . the linear equations might come from various conservation laws. |yk |. . with yk equal to y(k ◦ ). A vector y ∈ Rn contains n measurements of some physical quantities x ∈ Rn . but not perfect. . of A. Due to measurement errors. where m < n. . . . As usual. |y(θ)| = 1 for all θ. As a simple example.e. We can choose. Hints: • You’ll probably want to rewrite the problem as one involving real variables (i. then we must have x1 = x2 + x3 . for 0◦ ≤ θ ≤ 180◦ . The complex numbers w1 . • Very important: in matlab. 8. . heat. (In the language of antenna arrays. i. . the weights. • Although we don’t require you to. and we assume that a1 .. The function |y(θ)|. we can shape the gain pattern to satisfy some speciﬁcations. More generally. By choosing the weights w1 . are called the antenna weights. We want y(70◦ ) = 1. In other words. . You can then rewrite your solution in a more compact formula that uses complex matrices and vectors (if you like).
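The weight design in problem 8.5 is a quadratic minimization with one (complex) linear equality constraint, so it has a closed-form solution. A Python sketch (numpy handles the complex arithmetic directly, avoiding the real/imaginary rewriting suggested for Matlab):

```python
import numpy as np

# Minimize the stopband energy sum |y_k|^2 subject to y(70 deg) = 1, where
# y(theta) = sum_j w_j exp(2*pi*1j*(j-1)*(d/lam)*cos(theta)).
n, d_over_lam = 20, 0.4
theta = np.deg2rad(np.arange(1, 181))
# response matrix: row k gives y(theta_k) as a linear function of w
R = np.exp(2j * np.pi * d_over_lam * np.outer(np.cos(theta), np.arange(n)))

stop = np.r_[0:60, 79:180]        # indices for angles 1..60 and 80..180 degrees
B = R[stop, :]                    # stopband responses, to be made small
a = R[69, :]                      # y(70 deg) = a @ w, constrained to equal 1

# minimize ||B w||^2 subject to a @ w = 1: with G = B^H B (invertible here),
# the optimality conditions give w = G^{-1} conj(a) / (a @ G^{-1} conj(a))
G = B.conj().T @ B
z = np.linalg.solve(G, a.conj())
w = z / (a @ z)
```

Any other feasible weight vector, such as the matched filter conj(a)/||a||², has at least as much stopband energy, which gives a quick sanity check on the solution.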

We consider the usual measurement setup: y = Ax + v.7 Estimator insensitive to certain measurement errors. fk ∈ Rm are given. In this problem we consider the stronger condition that the estimator predicts x perfectly. Using these reference directions. known vectors. and any v ∈ V. . explain why the matrix to be inverted is nonsingular. Each communication link connects two (distinct) nodes and is bidirectional. An engineer proposes to adjust i the measurements y by adding a correction term c ∈ Rn . which deﬁnes the direction of information ﬂow that we will call positive. for any x ∈ Rn . in matlab) that it satisﬁes the required properties. in terms of y. an unbiased estimator predicts x perfectly when there is no measurement error. given by yadj = y + c.e. and n communication links. we say ˆ that the estimator is insensitive to measurement errors in V. aT yadj = bi .e. Verify that the resulting adjusted measurement satisﬁes the conservation laws. you must explain how you found it. . f2 = . With each communication link we associate a directed arc. 1 2 −1 1 0 3 3 2 2 1 (b) Now we consider a speciﬁc example. and A is full rank. although we would expect aT y ≈ bi . 8. . We consider a communications network with m nodes.. In other words. She proposes to ﬁnd the smallest possible correction term (measured by c ) such that the adjusted measurements yadj satisfy the known conservation laws. i.8 Optimal ﬂow on a data collection network. (Note that this condition is a stronger condition than the estimator being unbiased. ai . . then there is no estimator insensitive to measurement errors in V. i 8.e. for any measurement error in V. to get an adjusted estimate of x. Give an explicit formula for the correction term. (The units are bits per second. where • x ∈ Rn is the vector of parameters we wish to estimate • A ∈ Rm×n is the coeﬃcient matrix relating the parameters to the measurements You can assume that m > n. the ﬂow in each communication link) is given by a 61 .. 
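The engineer's proposal in problem 8.6 is a least-norm problem: the constraint A(y + c) = b means Ac = b − Ay, and the smallest such c is c = Aᵀ(AAᵀ)⁻¹(b − Ay), with AAᵀ invertible because the rows a₁, ..., aₘ are independent. A Python sketch with illustrative random data:

```python
import numpy as np

# Smallest correction c (in Euclidean norm) making y_adj = y + c satisfy
# the conservation laws A y_adj = b.  The constraint on c is A c = b - A y,
# whose least-norm solution is c = A^T (A A^T)^{-1} (b - A y).
rng = np.random.default_rng(1)
m, n = 3, 8                       # illustrative sizes
A = rng.standard_normal((m, n))   # rows independent (almost surely)
b = rng.standard_normal(m)
y = rng.standard_normal(n)        # noisy measurements

c = A.T @ np.linalg.solve(A @ A.T, b - A @ y)
y_adj = y + c
```

Since c lies in the row space of A, adding any null-space perturbation to it yields a longer feasible correction, which confirms minimality.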
If this condition holds. .linear equations above) exactly. we have x = x. In other ˆ words. and verify (say. If you ﬁnd such a B. . If any matrix inverses appear in your formula. information can ﬂow in either direction. . there is a path. but that won’t matter to us. the ﬂow or traﬃc on link j is denoted fj . fk }..) (a) Show that if R(A) ∩ V = {0}. from every node (including the special destination node) to every other node. . i. (We’ll be really annoyed if you just give a matrix and leave the veriﬁcation to us!) f1 = . plus a special destination node. or explain in detail why none exists.) The traﬃc on the network (i. or sequence of links. bi . In this problem we assume that the measurement errors lie in the subspace V = span{f1 . we have x = x. We will assume that the network is connected. with 1 0 1 1 1 −1 . for any x ∈ Rn . Now consider a linear estimator of the form x = By. i. ˆ Recall that the estimator is called unbiased if whenever v = 0. • v ∈ Rm is the vector of measurement errors • y ∈ Rm is the vector of measurements where f1 .. A= 2 1 −1 2 Either construct a speciﬁc B ∈ R2×5 for which the linear estimator x = By is insensitive to ˆ measurement errors in V.e.

vector f ∈ Rn , where the ﬂow or traﬃc on link j is denoted fj . The sign of fj gives the direction of ﬂow with respect to the arc. For example, nodes 1 and 3 in the network of part (b) are connected by communication link 4, and the associated arc points from node 1 to node 3. Thus f4 = 12 means the ﬂow on that link is 12 (bits per second), from node 1 to node 3; similarly, f4 = −3 means the ﬂow on link 4 is 3 (bits per second), from node 3 to node 1. At node i, an external information ﬂow si (which is nonnegative) enters. The vector s ∈ Rm of external ﬂows is sometimes called the source vector. External information enters each of the m regular nodes and ﬂows across links to the special destination node. (That explains why we call it a data collection network.)

Information ﬂow is conserved. This means that at each node (except the special destination node), the sum of all ﬂows entering the node from communication links connected to that node, plus the external ﬂow, equals the sum of the ﬂows leaving that node on communication links. As an example, consider node 3 in the network of part (b). Links 4 and 5 enter this node, and link 6 leaves the node, so ﬂow conservation at node 3 is given by f4 + f5 + s3 = f6 . The ﬁrst two terms on the left give the ﬂow entering node 3 on links 4 and 5; the last term on the left gives the external ﬂow entering node 3; the term on the righthand side gives the ﬂow leaving over link 6. Note that this equation correctly expresses ﬂow conservation regardless of the signs of f4 , f5 , and f6 .

The node incidence matrix is deﬁned as

Aij = 1 if arc j enters (or points into) node i; −1 if arc j leaves (or points out of) node i; 0 otherwise.

Note that each row of A is associated with a node on the network (not including the destination node), and each column is associated with an arc or link.

Finally, here is the problem.

(a) The vector of external ﬂows, s ∈ Rm , and the node incidence matrix A ∈ Rm×n that describes the network topology, are given. You must ﬁnd the ﬂow f that satisﬁes the conservation equations and minimizes the mean-square traﬃc on the network,

(1/n) Σj=1..n fj2 .

Your answer should be in terms of the external ﬂow s and the node incidence matrix A.

(b) Now consider the speciﬁc (and very small) network shown below. The nodes are shown as circles, and the special destination node is shown as a square. The external ﬂows are

s = (1, 4, 10, 10).

One simple feasible ﬂow is obtained by routing all the external ﬂow entering each node along a shortest path to the destination. In this example, all the external ﬂow entering node 2 goes to node 1, then to the destination node; for node 3, which has two shortest paths to the destination, we arbitrarily choose the path through node 4. This simple routing scheme results in the feasible ﬂow

fsimple = (5, 4, 0, 0, 0, 10, 20).
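With the sign convention above, the conservation equations in matrix form read Af + s = 0, and the mean-square optimal flow of part (a) is the least-norm solution f = −Aᵀ(AAᵀ)⁻¹s (AAᵀ is invertible because the network is connected and the destination node is excluded from A). A Python sketch on a small illustrative network, not the one in the figure:

```python
import numpy as np

# Flow conservation at the regular nodes reads A f + s = 0, so the
# mean-square optimal flow is the least-norm solution f = -A^T (A A^T)^{-1} s.
# Illustrative network: nodes 1, 2, 3 plus destination D;
# arcs 1->2, 2->3, 3->D, 1->3.
A = np.array([
    [-1,  0,  0, -1],   # node 1: arcs 1 and 4 leave
    [ 1, -1,  0,  0],   # node 2: arc 1 enters, arc 2 leaves
    [ 0,  1, -1,  1],   # node 3: arcs 2 and 4 enter, arc 3 leaves
], dtype=float)
s = np.array([1.0, 2.0, 3.0])

f = -A.T @ np.linalg.solve(A @ A.T, s)

# a simple feasible flow (route node 1's external flow entirely on arc 4),
# for comparison of mean-square traffic
f_simple = np.array([0.0, 2.0, 6.0, 1.0])
```

The optimal flow splits traffic between parallel paths, and its mean-square value is no larger than that of any other feasible flow.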

A sinusoidal plane wave.e. wn intelligently. wn . and attenuate signals (e. .. Compare the mean square ﬂow of the optimal ﬂow with the mean square ﬂow of fsimple . The complex numbers w1 . but small at other angles. The combined array output depends on the angle of arrival of the wave. We want r(70◦ ) = 1. with the coordinates of the kth element being xk and yk . which yields the output ej λ (xk cos θ+yk sin θ) (which is a complex number) from the kth element. which are the coeﬃcients of the linear combination. . yk ) The array consists of n individual antennas (called antenna elements) randomly placed on 2d space. design. Design the weights for the antenna array. We can choose. In this problem. We consider the phased-array antenna system shown below. The function |r(θ)|. s2 f2 s1 1 f3 f1 4 f7 D 8. A (complex) linear combination of these outputs is formed. By choosing the weights w1 .Find the mean square optimal ﬂow for this problem (as in part 1). . for 0◦ ≤ θ < 360◦ . Here’s the problem.9 Random geometry antenna weight design. and we want |r(θ)| small for 0◦ ≤ θ ≤ 60◦ and 80◦ ≤ θ < 360◦ . i. (In the language of antenna arrays. . .m. n r(θ) = k=1 wk ej 2π λ (xk cos θ+yk sin θ) . . we want a gain pattern that is one in a given (‘target’) direction θtarget . f6 2 f4 f5 3 s3 s4 θ (xk .g. In other words. are called the antenna weights. the weights. impinges on the array.. we want the antenna array to be relatively insensitive to plane waves arriving from angles more that 10◦ away from the target direction. is called the antenna array gain pattern. whose elements have coordinates given in the ﬁle antenna_geom. we can shape the gain pattern to satisfy some speciﬁcations. with wavelength λ 2π and angle of arrival θ. . ‘jammers’ or multipath reﬂections) coming from other directions. . and called the combined array output. Such a pattern would receive a signal coming from the direction θtarget . 63 .

(In other words.) This might occur in a communications system. • Very important: in matlab.e. the vector we are estimating is known ahead of time to have norm one. where the transmitted signal power is known to be equal to one. z(0) = z0 . which allows you to easily visualize the pattern. taking into account the prior information x = 1. As usual. given A and y. we add one more piece of prior information: we know that x = 1. . Hints: • You’ll probably want to rewrite the problem as one involving real variables (i.e. we assume that smaller values of v are more plausible than larger values. In this problem. A’ gives the conjugate transpose. which gives the matrix A and the observed vector y. As usual. hopefully. this is done using the polar command. • Although we don’t require you to. . will achieve the desired goal of being relatively insensitive to plane waves arriving at angles more than 10◦ from θ = 70◦ ). Give the matlab source you use to compute x. 360.. The dynamics of two vehicles. . In matlab. . In other words. You can then rewrite your solution in a more compact formula that uses complex matrices and vectors (if you like).we want a beamwidth of 20◦ around a target direction of 70◦ . z(t + 1) = F z(t) + gv(t) where • x(t) ∈ Rn is the state of vehicle 1 • z(t) ∈ Rn is the state of vehicle 2 • u(t) ∈ R is the (scalar) input to vehicle 1 • v(t) ∈ R is the (scalar) input to vehicle 2 The initial states of the two vehicles are ﬁxed and given: x(0) = x0 . Thus r ∈ C360 will be a (complex) vector. are given by x(t + 1) = Ax(t) + bu(t).) You are told that λ = 1. To solve this problem. x ∈ Rn is the vector we wish to estimate. Give the weights you ﬁnd. you must explain how you solve the problem.. for k = 1. and verify that it satisﬁes x = 1. and real matrices. Carry out the estimation procedure you developed in part (a).. if A is a complex matrix or vector. |rk |. with rk equal to r(k ◦ ). and also a plot of the antenna array response. or Hermitian conjugate. 
We consider a standard estimation setup: y = Ax + v. 8. the prime is actually the Hermitian conjugate operator. Give your estimate x.. i. i. i. . skinny matrix.. v ∈ Rm is an unknown noise vector. 64 . you will ﬁnd the ﬁle mtprob4. 2. . versus k (which. the real and imaginary parts of the antenna weights). (AT A)−1 AT y > 1. 1.e. r(πk/180).11 Minimum energy rendezvous. . at sampling times t = 0. you will ﬁrst discretize the angles between 1◦ and 360◦ in 1◦ increments. ˆ ˆ ˆ 8. and y ∈ Rm is the measurement vector. You are to choose w ∈ Cn that minimizes 60 360 k=1 |rk |2 + k=80 |rk |2 subject to the constraint r70 = 1. Is your estimate ˆ x a linear function of y? ˆ (b) On the EE263 webpage.10 Estimation with known input norm. you might ﬁnd it fun to also plot your antenna gain pattern on a polar plot.) (a) Explain clearly how would you ﬁnd the best estimate of x. (You may assume that the norm of the least-squares approximate solution exceeds one.e. Explain how you would compute your estimate x.m. where A ∈ Rm×n is a full rank. of A.
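One natural approach to problem 8.10: the regularized estimate x̂(µ) = (AᵀA + µI)⁻¹Aᵀy has norm that decreases continuously in µ, so since the least-squares estimate has norm exceeding one, we can bisect on µ until ||x̂(µ)|| = 1. A Python sketch with illustrative consistent data (not the data from mtprob4.m):

```python
import numpy as np

# x(mu) = (A^T A + mu I)^{-1} A^T y minimizes ||A x - y||^2 + mu ||x||^2.
# ||x(mu)|| decreases continuously as mu grows, so when the least-squares
# estimate has norm > 1 we can bisect on mu until ||x(mu)|| = 1.
rng = np.random.default_rng(2)
m, n = 30, 8
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
y = A @ (2.0 * x0 / np.linalg.norm(x0))   # consistent data; ||x_ls|| = 2 > 1

def xhat(mu):
    return np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ y)

assert np.linalg.norm(xhat(0.0)) > 1      # the problem's standing assumption

hi = 1.0
while np.linalg.norm(xhat(hi)) > 1:       # grow hi until the norm drops below 1
    hi *= 2.0
lo = 0.0
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if np.linalg.norm(xhat(mid)) > 1 else (lo, mid)
x_est = xhat((lo + hi) / 2)
```

Note that x̂ is not a linear function of y: µ itself depends on y through the norm condition.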

where m < n (i. N − 1 so that they rendezvous at state w ∈ Rn at time t = N . . and w that satisfy the rendezvous condition. and the equation residual. i. . . i. u(N − 1). . linear plus a constant). Use the notation x(k) to denote the kth iteration of your method. v(1). g. v.) Your method should have the property that f (x(k) ) converges to y as k increases.1x1 x2 − 0. x0 .. v. we want the one that minimizes the total input energy deﬁned as E= N −1 t=0 u(t)2 + N −1 t=0 v(t)2 . Explain clearly how you obtain x(k+1) from x(k) . When the function f is linear or aﬃne (i.e. In general. . (a) Suggest an iterative method for (approximately) solving the nonlinear least-norm problem. If you need to make any assumptions about rank of some matrix.. Suppose f : Rn → Rm is a function. etc. i. Among choices of u.) Suggest a name for the method you invent. u(1). You do not have to prove that the method converges. which we’ll call x(0) .. There are. that’s ﬁne.3x3 x4 .5x2 x5 . as well as the rendezvous point w ∈ Rn .. some good heuristic iterative methods that work well when the function f is not too far from aﬃne. (You don’t have to worry about what happens if the matrix is not full rank. If you need to assume that one or more matrices that arise in your solution are invertible. Make sure to turn in your matlab code as well as to identify the least-norm x you ﬁnd.) As initial guess. it converges to a least-norm solution. u(0). Use the method you invented in part a to ﬁnd the least-norm solution of f (x) = y = 1 1 . and we know how to ﬁnd the least-norm solution for such problems. z(N ) = w. . .e.. its norm. v(0). We say that x ∈ Rn is a least-norm solution of f (x) = y if for any z ∈ Rn that satisﬁes f (z) = y.12 Least-norm solution of nonlinear equations. and z0 . and w in terms of the problem data. (The point w ∈ Rn is called the rendezvous point. .e. and y ∈ Rm is a vector. i.We are interested in ﬁnding inputs for the two vehicles over the time interval t = 0. however. 
x has larger dimension than y). and also a quadratic part. 2x1 − 3x3 + x5 + 0. x(N ) = w. This guess doesn’t necessarily satisfy the equations f (x) = y. we don’t need to have the iterates satsﬁy the nonlinear equations exactly. All you have to do is suggest a sensible. it is an extremely diﬃcult problem to compute a least-norm solution to a set of nonlinear equations. 65 . its nonlinear terms are small compared to its linear and constant part. starting from the initial guess x(0) .) You can select the inputs to the two vehicles.e. v(N − 1).6x1 x4 + 0. Give explicit formulas for the optimal u. do so. (b) Now we consider a speciﬁc example. you can use the least-norm solution of the linear equations resulting if you ignore the quadratic terms in f . Your method should not be complicated or require a long explanation. full rank. the equations f (x) = y are linear. we have z ≥ x . or that when it converges. 1. .. but be sure to make very clear what you’re assuming. A. F . Note that each component of f consists of a linear part. however. f (x) − y (which should be very small).. . (In particular. and the starting guess x(0) is good. (We repeat that you do not have to prove that the solution you found is really the least-norm one. b.e. You may assume that you have a starting guess. 8. simple method that ought to work well when f is not too nonlinear.e. . . with the function f : R5 → R2 given by f1 (x) f2 (x) = = −x2 + x3 − x4 + x5 − 0.
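One sensible iteration for problem 8.12 (a Gauss-Newton-style least-norm method): linearize f about the current iterate and take the least-norm solution of the linearized equations. A Python sketch with an illustrative mildly nonlinear f, not the specific f from the problem statement:

```python
import numpy as np

def f(x):
    # illustrative mildly nonlinear map R^5 -> R^2: a linear part plus small
    # quadratic terms (NOT the specific f given in the problem)
    return np.array([
        -x[1] + x[2] - x[3] + x[4] - 0.1 * x[0] * x[1],
        2 * x[0] - 3 * x[2] + x[4] + 0.2 * x[3] * x[4],
    ])

def Df(x):
    return np.array([
        [-0.1 * x[1], -1 - 0.1 * x[0], 1.0, -1.0, 1.0],
        [2.0, 0.0, -3.0, 0.2 * x[4], 1.0 + 0.2 * x[3]],
    ])

y = np.array([1.0, 1.0])

# iteration: linearize f about x^(k), then x^(k+1) is the least-norm solution of
#   Df(x^(k)) x = y - f(x^(k)) + Df(x^(k)) x^(k)
x = np.zeros(5)   # linearizing about 0 makes the first iterate the least-norm
for _ in range(50):   # solution of the equations with quadratic terms ignored
    J = Df(x)
    rhs = y - f(x) + J @ x
    x = J.T @ np.linalg.solve(J @ J.T, rhs)
```

At a fixed point x we have f(x) = y exactly, and x lies in the range of Df(x)ᵀ, which is the first-order optimality condition for the least-norm problem.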

8.13 The smoothest input that takes the state to zero. We consider the discrete-time linear dynamical system x(t+1) = Ax(t) + Bu(t). The goal is to choose an input sequence u(0), u(1), ..., u(19) that yields x(20) = 0. Among the input sequences that yield x(20) = 0, we want the one that is smoothest, i.e., the one that minimizes

Jsmooth = ( (1/20) sum_{t=0}^{19} (u(t) − u(t−1))^2 )^{1/2},

where we take u(−1) = 0 in this formula. Explain how to solve this problem. Carry out your method on the specific problem instance with data

A = [ 0.5 0.5 0 ; 0.0 1.0 1.0 ; 0.25 −0.25 0 ],   B = [ 0 ; 0 ; 1 ],   x(0) = [ 25 ; 1 ; 0 ].

Plot the smoothest input usmooth, and give the associated value of Jsmooth.

8.14 Minimum energy input with way-point constraints. We consider a vehicle that moves in R^2 due to an applied force input. We will use a discrete-time model, with time index k = 1, ..., K. (We take k = 1 as the initial time, to simplify indexing; time index k corresponds to time t = kh, where h > 0 is the sample interval.) The position at time index k is denoted by p(k) ∈ R^2, and the velocity by v(k) ∈ R^2, for k = 1, ..., K+1. These are related by the equations

p(k+1) = p(k) + h v(k),   v(k+1) = (1 − α) v(k) + (h/m) f(k),   k = 1, ..., K,

where f(k) ∈ R^2 is the force applied to the vehicle at time index k, m > 0 is the vehicle mass, and α ∈ (0, 1) models drag on the vehicle: in the absence of any other force, the vehicle velocity decreases by the factor 1 − α in each time index. (These formulas are approximations of more accurate formulas that we will see soon, but for the purposes of this problem, we consider them exact.) The vehicle starts at the origin, at rest, i.e., we have p(1) = 0, v(1) = 0.

The problem is to find forces f(1), ..., f(K) ∈ R^2 that minimize the cost function

J = sum_{k=1}^{K} ||f(k)||^2,

subject to the way-point constraints

p(k_i) = w_i,   i = 1, ..., M,

where k_i are integers between 1 and K. (These state that at the time t_i = h k_i, the vehicle must pass through the location w_i ∈ R^2.) Note that there is no requirement on the vehicle velocity at the way-points.

(a) Explain how to solve this problem, given all the problem data (i.e., h, α, m, K, the way-points w_1, ..., w_M, and the way-point indices k_1, ..., k_M).

(b) Carry out your method on the specific problem instance with data h = 0.1, α = 0.1, m = 1, K = 100, and the M = 4 way-points

w_1 = (2, 2),   w_2 = (−2, 3),   w_3 = (4, −3),   w_4 = (−4, −2),

with way-point indices k_1 = 10, k_2 = 30, k_3 = 40, and k_4 = 80. Give the optimal value of J. Plot f_1(k) and f_2(k) versus k, using

subplot(211); plot(f(1,:));
subplot(212); plot(f(2,:));

where f is a 2 × K matrix with columns f(1), ..., f(K). Plot the vehicle trajectory, using plot(p(1,:), p(2,:)), where p is a 2 × (K+1) matrix with columns p(1), ..., p(K+1).
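The way-point problem in 8.14 is a least-norm problem: each position p(k_i) is a linear function of the stacked force sequence, so the constraints form a fat linear system and the minimum-energy forces are its least-norm solution. The course works in Matlab; the sketch below is an illustrative NumPy version under the stated model (all names here are my own, not from the data files):

```python
import numpy as np

# Data in the spirit of problem 8.14 (h, alpha, m, K, way-points).
h, alpha, m, K = 0.1, 0.1, 1.0, 100
wp_idx = [10, 30, 40, 80]                                 # way-point indices k_i
W = np.array([[2, 2], [-2, 3], [4, -3], [-4, -2]], dtype=float)

# p(k) = (h^2/m) * sum_{j=1}^{k-2} [ sum_{l=j+1}^{k-1} (1-alpha)^(l-1-j) ] f(j),
# identically in each coordinate, so one coefficient matrix serves both.
def coeff_row(k):
    row = np.zeros(K)
    for j in range(1, k - 1):                             # f(j), j = 1, ..., k-2
        row[j - 1] = (h * h / m) * sum((1 - alpha) ** (l - 1 - j)
                                       for l in range(j + 1, k))
    return row

G = np.vstack([coeff_row(k) for k in wp_idx])             # 4 x K, fat, full rank
# lstsq returns the minimum-norm solution of G f = w for this fat system,
# which is exactly the minimum-energy force profile (per coordinate).
F = np.linalg.lstsq(G, W, rcond=None)[0]                  # K x 2
J = float(np.sum(F ** 2))

# sanity check: simulate the dynamics and record positions
p = np.zeros(2); v = np.zeros(2); traj = {1: p.copy()}
for k in range(1, K + 1):
    p = p + h * v                                         # p(k+1) uses v(k)
    v = (1 - alpha) * v + (h / m) * F[k - 1]
    traj[k + 1] = p.copy()
print(J)
```

The simulation at the end confirms that the least-norm forces hit every way-point exactly.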

8.15 In this problem you will show that, for any matrix A, and any positive number µ, the matrices A^T A + µI and A A^T + µI are both invertible, and

(A^T A + µI)^{-1} A^T = A^T (A A^T + µI)^{-1}.

You will fill in all details of the argument below.

(a) Let's first show that A^T A + µI is invertible, assuming µ > 0. Suppose that (A^T A + µI)z = 0. Multiply on the left by z^T, and argue that z = 0. (The same argument, with A^T substituted for A, will show that A A^T + µI is invertible.)

(b) Now let's establish the identity above. First, explain why

A^T (A A^T + µI) = (A^T A + µI) A^T

holds. Then, multiply on the left by (A^T A + µI)^{-1}, and on the right by (A A^T + µI)^{-1}. (These inverses exist, by part (a).) This is what we needed to show.

(c) Now assume that A is fat and full rank. Show that as µ tends to zero from above (i.e., µ is positive) we have

(A^T A + µI)^{-1} A^T → A^T (A A^T)^{-1}.

8.16 Singularity of the KKT matrix. This problem concerns the general norm minimization with equality constraints problem (described in the lecture notes on pages 8-13),

minimize ||Ax − b||
subject to Cx = d,

where the variable is x ∈ R^n, A ∈ R^{m×n}, and C ∈ R^{k×n}. We assume that C is fat (k ≤ n), i.e., the number of equality constraints is no more than the number of variables. Using Lagrange multipliers, we found that the solution can be obtained by solving the linear equations

[ A^T A  C^T ; C  0 ] [ x ; λ ] = [ A^T b ; d ]

for x and λ. (The vector x gives the solution of the norm minimization problem above.) The matrix above, which we will call K ∈ R^{(n+k)×(n+k)}, is called the KKT matrix for the problem. (KKT are the initials of some of the people who came up with the optimality conditions for a more general type of problem.) One question that arises is: when is the KKT matrix K nonsingular? The answer is:

K is nonsingular if and only if C is full rank and N(A) ∩ N(C) = {0}.

(This is asserted, but not shown, in the lecture notes on page 8-12.) You are to fill in all details of the argument below.

(a) Suppose C is not full rank. Show that K is singular.

(b) Suppose that there is a nonzero u ∈ N(A) ∩ N(C). Use this u to show that K is singular.
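Aside: the identity in problem 8.15, and the limit in its part (c), are easy to sanity-check numerically; any matrix A and µ > 0 should work. A small NumPy sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))        # a fat, full-rank A
mu = 0.1
m, n = A.shape

# the push-through identity: (A^T A + mu I)^{-1} A^T = A^T (A A^T + mu I)^{-1}
lhs = np.linalg.inv(A.T @ A + mu * np.eye(n)) @ A.T
rhs = A.T @ np.linalg.inv(A @ A.T + mu * np.eye(m))
gap = np.abs(lhs - rhs).max()

# part (c): as mu -> 0+, both sides tend to A^T (A A^T)^{-1}
limit = A.T @ np.linalg.inv(A @ A.T)
gap_small_mu = np.abs(A.T @ np.linalg.inv(A @ A.T + 1e-9 * np.eye(m)) - limit).max()
print(gap, gap_small_mu)
```

Both gaps are at round-off level, as the algebra predicts.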

(c) Suppose that K is singular. Then there exists a nonzero vector [u^T v^T]^T for which

[ A^T A  C^T ; C  0 ] [ u ; v ] = 0.

Write this out as two block equations, A^T A u + C^T v = 0 and Cu = 0. Conclude that u ∈ N(C). Multiply A^T A u + C^T v = 0 on the left by u^T, and use Cu = 0 to conclude that Au = 0, which implies u ∈ N(A). Finish the argument that leads to the conclusion that either C is not full rank, or N(A) ∩ N(C) ≠ {0}.

8.17 Minimum energy roundtrip. We consider the linear dynamical system x(t+1) = Ax(t) + Bu(t), with u(t) ∈ R and x(t) ∈ R^n, and x(0) = 0. We must choose u(0), ..., u(T−1) so that x(tdest) = xdest (i.e., at time tdest the state is equal to xdest), and x(T) = 0 (i.e., after T steps we are back at the zero state). Here xdest ∈ R^n is a given destination state. The time tdest is not given; it can be any integer between 1 and T−1. Note that you have to find tdest, the time when the state hits the desired state, as well as the input trajectory u(0), u(1), ..., u(T−1). The goal is to minimize the total input energy, defined as

E = sum_{t=0}^{T−1} u(t)^2.

(a) Explain how to do this. If you need some matrix or matrices that arise in your analysis to be full rank, you can just assume they are, but you must state this clearly. For this problem, you may assume that n ≤ tdest ≤ T − n.

(b) Carry out your method on the particular problem instance with data

A = [ 1 1 1 ; 1 1 0 ; −1 0 0 ],   B = [ 0 ; 0 ; 1 ],   xdest = [ 1 ; 1 ; 1 ],   T = 30.

Give the optimal value of tdest and the associated value of E, and plot the optimal input trajectory u.
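One way to organize problem 8.17: for each candidate tdest, the two requirements x(tdest) = xdest and x(T) = 0 are linear equations in the stacked input u = (u(0), ..., u(T−1)), so the minimum energy for that tdest is attained by the least-norm solution; scanning tdest then gives the optimum. An illustrative NumPy sketch on small stand-in data (not necessarily the problem's instance):

```python
import numpy as np

A = np.array([[1., 1., 1.], [1., 1., 0.], [-1., 0., 0.]])
B = np.array([0., 0., 1.])
xdest = np.array([1., 1., 1.])
T, n = 30, 3

def input_to_state(t):
    # matrix mapping u to x(t) (zero initial state): columns A^(t-1-s) B
    M = np.zeros((n, T))
    for s in range(t):
        M[:, s] = np.linalg.matrix_power(A, t - 1 - s) @ B
    return M

MT = input_to_state(T)
best = None
for tdest in range(n, T - n + 1):
    C = np.vstack([input_to_state(tdest), MT])            # 2n x T constraints
    rhs = np.concatenate([xdest, np.zeros(n)])
    u = np.linalg.lstsq(C, rhs, rcond=None)[0]            # least-norm input
    E = float(u @ u)
    if best is None or E < best[0]:
        best = (E, tdest, u)
E_opt, tdest_opt, u_opt = best
print(tdest_opt, E_opt)
```

The full-rank assumption in part (a) is what guarantees each 2n × T constraint matrix is onto, so every candidate tdest in the allowed range is feasible.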

8.18 Optimal dynamic purchasing. You are to complete a large order to buy a certain number, B̄, of shares in some company. You are to do this over T time periods. (Depending on the circumstances, a single time period could be between tens of milliseconds and minutes.) We will let bt denote the number of shares bought in time period t, for t = 1, ..., T, so we have b1 + ··· + bT = B̄. (The quantities B̄, b1, ..., bT can all be any real number; bt < 0, for example, means we sold shares in the period t. We also don't require the bt to be integers.) We let pt denote the price per share in period t, so the total cost of purchasing the B̄ shares is

C = p1 b1 + ··· + pT bT.

The amounts we purchase are large enough to have a noticeable effect on the price of the shares. The prices change according to the following equations:

p1 = p̄ + α b1,   pt = θ p_{t−1} + (1 − θ) p̄ + α bt,   t = 2, ..., T.

Here p̄ is the base price of the shares, and α and θ are parameters that determine how purchases affect the prices. The parameter α, which is positive, tells us how much the price goes up in the current period when we buy one share. The parameter θ, which lies between 0 and 1, measures the memory: if θ = 0 the share price has no memory, and the purchase made in period t only affects the price in that period; if θ = 1, the price has perfect memory and the price change will persist for all future periods. If θ is 0.5 (say), the effect a purchase has on the price decays by a factor of two between periods.

If purchases didn't increase the price, the cost of purchasing the shares would always be p̄ B̄. The difference between the total cost and this cost, C − p̄ B̄, is called the transaction cost.

Find the purchase quantities b1, ..., bT that minimize the transaction cost C − p̄ B̄, for the particular problem instance with data B̄ = 10000, T = 10, p̄ = 10, α = 0.00015, θ = 0.5. Give the optimal transaction cost. Also give the transaction cost if all the shares were purchased in the first period, and the transaction cost if the purchases were evenly spread over the periods (i.e., if 1000 shares were purchased in each period). Compare these three quantities. You must explain your method clearly, using any concepts from this class, e.g., least-squares, pseudo-inverses, eigenvalues, singular values. If your method requires that some rank or other conditions hold, say so, and check, in your matlab code, that these conditions are satisfied for the given problem instance.

8.19 Least-squares classification. For each of N documents we are given a feature vector x(i) ∈ R^n, and a label yi ∈ {−1, 1}, with +1 meaning the document is interesting or useful, and −1 meaning the document is not (for example, spam). (This is called a binary label.) Each component of the feature vector could be, for example, the number of occurrences of a certain term in the document; the label could be decided by a person working with the documents. From this data set we construct w ∈ R^n and v ∈ R that minimize

sum_{i=1}^{N} (w^T x(i) + v − yi)^2.

We can now use w and v to predict the label for other documents, i.e., to guess whether an as-yet-unread document is interesting, by forming ŷ = sign(w^T x + v). For scalar a, we define sign(a) = +1 for a ≥ 0 and sign(a) = −1 for a < 0.

Remark. There are better methods for binary classification, which you can learn about in a modern statistics or machine learning course, or in EE364a. But least-squares classification can sometimes work well.

(a) Explain (briefly) how to find w and v. If you need to make an assumption about the rank of a matrix, say so.

(b) Find w and v for the data in ls_classify_data.m, which defines X, whose columns are the x(i), and y. This M-file will also define a second data set, Xtest and ytest, of the same size (i.e., n and N). Use the w and v you found to make predictions about whether the documents in the test set are interesting. Give the number of correct predictions (for which ŷi = yi), false positives (ŷi = +1 while yi = −1), and false negatives (ŷi = −1 while yi = +1) for the test set. You may find the matlab function sign() useful; for vector arguments, sign() is taken elementwise. To count false positives, for example, you can use sum((yhat == 1) & (y == -1)).

8.20 Minimum time control. We consider a discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t), with x(t) ∈ R^n, u(t) ∈ R^m, and x(0) = xinit.

(a) You are given A and B. Explain how to find an input sequence u(0), u(1), ..., u(N−1), so that x(N) = 0, with N as small as possible. You must give us the (minimum) value of N, and a sequence of inputs u(0), ..., u(N−1) that results in x(N) = 0. Your answer can involve any of the concepts used in the course so far, e.g., range, rank, nullspace, least-squares, QR factorization, pseudo-inverses, etc. If your method requires some rank or other conditions to hold, say so.
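A minimal NumPy sketch of the least-squares classifier in problem 8.19, run on synthetic data (the course data file ls_classify_data.m is not available here, so the features and labels below are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 200
w_true = rng.standard_normal(n)
X = rng.standard_normal((n, N))                # columns are feature vectors x(i)
y = np.where(w_true @ X + 0.3 * rng.standard_normal(N) >= 0, 1, -1)

# minimize sum_i (w^T x_i + v - y_i)^2  <=>  least squares with [X^T 1]
Amat = np.hstack([X.T, np.ones((N, 1))])
sol = np.linalg.lstsq(Amat, y.astype(float), rcond=None)[0]
w, v = sol[:n], sol[n]

yhat = np.where(X.T @ w + v >= 0, 1, -1)
correct = int(np.sum(yhat == y))
false_pos = int(np.sum((yhat == 1) & (y == -1)))
false_neg = int(np.sum((yhat == -1) & (y == 1)))
print(correct, false_pos, false_neg)
```

The rank assumption in part (a) is that the stacked matrix [X^T 1] is full rank, so the least-squares solution is unique.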

(b) Apply the method described in part (a) to the specific problem instance with data

A = [ 0 0 1 1 ; 0 1 1 1 ; 1 0 1 0 ; 1 1 1 0 ],   B = [ 0 ; 1 ; 1 ; 1 ],   xinit = [ 1 ; 1 ; 1 ; 0 ]

(where n = 4 and m = 1).

8.21 Interference cancelling equalizers. Two vector signals x ∈ R^p and y ∈ R^q are to be transmitted to two receivers. The transmitter broadcasts the signal z = Ax + By ∈ R^n to each receiver. (The matrices A and B are called the coding matrices, and are known.) Receiver 1 forms an estimate of the signal x using the linear decoder x̂ = Fz; receiver 2 forms an estimate of the signal y using the linear decoder ŷ = Gz. (The matrices F ∈ R^{p×n} and G ∈ R^{q×n} are called the decoding matrices.) The goal is to find F and G so that x̂ = x and ŷ = y, no matter what values x and y take. This means that both decoders are perfect: each reconstructs the exact desired signal, while completely rejecting the other (undesired) signal. For this reason we call decoding matrices with this property perfect.

(a) When is it possible to find perfect decoding matrices F and G? (The conditions, of course, depend on A and B.) Your answer can involve any of the concepts we've seen so far in EE263.

(b) Suppose that A and B satisfy the conditions in part (a). How would you find, among all perfect decoding matrices, ones that minimize

sum_{i=1}^{p} sum_{j=1}^{n} F_ij^2 + sum_{i=1}^{q} sum_{j=1}^{n} G_ij^2 ?

We call such decoding matrices minimum norm perfect decoding matrices.

(c) Find minimum norm perfect decoding matrices for the data (i.e., A and B) given in the M-file mn_perf_dec_data.m.
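One way to see the structure of problem 8.21: stacking the decoders as [F; G], the perfect decoding conditions F(Ax + By) = x and G(Ax + By) = y for all x, y read [F; G][A B] = I. When [A B] is skinny and full rank, a left inverse exists, and (one can check row by row) the pseudoinverse gives the minimum-norm one. An illustrative NumPy sketch on random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q, n = 2, 3, 8
Amat = rng.standard_normal((n, p))             # coding matrices (made up)
Bmat = rng.standard_normal((n, q))
M = np.hstack([Amat, Bmat])                    # n x (p+q), skinny, full rank here

FG = np.linalg.pinv(M)                         # minimum-norm left inverse of M
F, G = FG[:p, :], FG[p:, :]

# check perfect decoding on a random transmitted pair
x = rng.standard_normal(p); y = rng.standard_normal(q)
z = Amat @ x + Bmat @ y
print(np.allclose(F @ z, x), np.allclose(G @ z, y))
```

Each receiver recovers its own signal exactly while the other signal is rejected, since FB = 0 and GA = 0 are built into [F; G][A B] = I.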

8.22 Piecewise affine fitting. In this problem we refer to vectors in R^N as signals. We say that a signal z is piecewise-affine (PWA), with kink points i_1, ..., i_{K+1}, which are integers satisfying i_1 = 1 < i_2 < ··· < i_{K+1} = N + 1, if

z_j = α_k j + β_k   for i_k ≤ j < i_{k+1},   k = 1, ..., K.

Thus, the signal value is an affine function of the index j (which we might interpret as time in this problem) over the (integer) intervals 1, ..., i_2 − 1; i_2, ..., i_3 − 1; and so on. We call α_k and β_k the slope and offset, respectively, in the kth interval. (It is very common to refer to such a signal as piecewise-linear, since 'linear' is sometimes used to mean 'affine'.)

We can also add a continuity requirement,

α_k i_{k+1} + β_k = α_{k+1} i_{k+1} + β_{k+1},   k = 1, ..., K − 1.

This means that if each piecewise affine segment were extrapolated to the next index, the value would agree with the starting value for the next segment. When a PWA signal satisfies this condition, we say that it is continuous. (Of course, it doesn't make sense to refer to a discrete signal as continuous; this is just special notation for PWA signals that refers to the condition above.)

Finally, we get to the problem.

(a) You are given a signal y ∈ R^N, and some kink points i_1, ..., i_{K+1}. How would you find the best PWA approximation ŷ^pwa of y, with approximation error measured in the RMS sense,

( (1/N) sum_{j=1}^{N} (ŷ^pwa_j − y_j)^2 )^{1/2} ?

(You can make any needed rank assumptions about matrices that come up, but please state them explicitly.)

(b) Repeat part (a), but this time, you are to find the continuous PWA approximation ŷ^pwac that minimizes the RMS deviation from y.

(c) Carry out your methods from parts (a) and (b) on the data given in pwa_data.m. Running this data file will define y and the kink points. Give us the RMS approximation error in both cases. The data file also includes a commented out code template for plotting your results. Using this template, plot the original signal along with the PWA and continuous PWA approximations.

8.23 Robust input design. We are given a system, which we know follows y = Ax, with A ∈ R^{m×n}. Our goal is to choose the input x ∈ R^n so that y ≈ y^des, where y^des ∈ R^m is a given target outcome. We'll assume that m ≤ n, i.e., we have more degrees of freedom in our choice of input than specifications for the outcome. If we knew A, we could use standard EE263 methods to choose x. The catch here, though, is that we don't know A exactly; it varies a bit, day to day. But we do have some possible values of A,

A^(1), ..., A^(K),

which might, for example, be obtained by measurements of A taken on different days. We now define y^(i) = A^(i) x, for i = 1, ..., K. Our goal is to choose x so that y^(i) ≈ y^des, for i = 1, ..., K. We will consider two different methods to choose x.

• Least norm method. Define Ā = (1/K) sum_{i=1}^{K} A^(i). Choose x^ln to be the least-norm solution of Ā x = y^des.

• Mean-square error minimization method. Choose x^mmse to minimize the mean-square error

(1/K) sum_{i=1}^{K} ||y^(i) − y^des||^2.

(a) Give formulas for x^ln and x^mmse, in terms of y^des and A^(1), ..., A^(K). You can make any needed rank assumptions about matrices that come up, but please state them explicitly.

(b) Find x^ln and x^mmse for the problem with data given in rob_inp_des_data.m. Running this M-file will define ydes and the matrices A^(i) (given as a 3 dimensional array; for example, A(:,:,13) is A^(13)). Write down the values of x^ln and x^mmse you found. Produce and submit scatter plots of the y^(i) for x^ln and for x^mmse. Also included in the data file (commented out) is code to produce the scatter plots; use it as a template for your plots.
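A NumPy sketch of the two choices in problem 8.23, on random stand-in data (the file rob_inp_des_data.m is not available here): x^ln solves the averaged system with least norm, while x^mmse solves the normal equations of the stacked least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, K = 3, 6, 10
ydes = rng.standard_normal(m)
As = [rng.standard_normal((m, n)) + np.eye(m, n) for _ in range(K)]

Abar = sum(As) / K
# least-norm solution of Abar x = ydes (Abar fat and full rank assumed)
x_ln = Abar.T @ np.linalg.solve(Abar @ Abar.T, ydes)

# x_mmse: minimize (1/K) sum_i ||A^(i) x - ydes||^2 via normal equations
H = sum(A.T @ A for A in As)                   # assumed invertible
g = sum(A.T for A in As) @ ydes
x_mmse = np.linalg.solve(H, g)

mse = lambda x: np.mean([np.sum((A @ x - ydes) ** 2) for A in As])
print(mse(x_ln), mse(x_mmse))
```

By construction x^mmse can never have larger mean-square error than x^ln, which is the comparison the scatter plots in part (b) make visible.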

8.24 Unbiased estimation with deadlines. We consider a standard measurement set up, with y = Ax + v, where y ∈ R^m is the vector of measurements, x ∈ R^n is the vector of parameters to be estimated, v ∈ R^m is the vector of measurement noises, and A ∈ R^{m×n} characterizes the measurement system. (You can assume that A is full rank.) In this problem, you should think of the index for y as denoting a time period, and you should imagine the measurements (i.e., the components of y) as arriving sequentially: in the first time period, y_1 becomes available; in the next time period y_2 becomes available; and so on, so that all m measurements are available in the mth time period.

You are to design a linear estimator, given by a matrix B ∈ R^{n×m}, with the estimate of x given by x̂ = By. We require that the estimator be unbiased, i.e., that x̂ = x when v = 0. In addition, we have deadline constraints, which we now explain. We require that x̂_i, our estimate of the ith parameter to be estimated, can be computed after k_i time periods, i.e., x̂_i must be a function of y_1, ..., y_{k_i} only. We say that k_i is the deadline for computing x̂_i. You are given increasing deadlines,

0 < k_1 < k_2 < ··· < k_n = m.

Thus, x̂_1 may only be computed from y_1, ..., y_{k_1}, while x̂_n may be computed from all of the measurements y_1, ..., y_m.

The data in this problem are the measurement matrix A and the deadlines k_1, ..., k_n.

(a) How would you determine whether or not an unbiased linear estimator, which respects the given deadlines, exists? Your answer does not have to be a single condition, such as 'A is skinny and full rank'; it can involve a sequence of tests.

(b) Assume that it is possible to find an unbiased linear estimator that respects the deadlines. Explain how to find the smallest such estimator matrix, in terms of the size of the estimator matrix, i.e., the B that minimizes

J = sum_{i=1}^{n} sum_{j=1}^{m} B_ij^2.

If your method requires some matrix or matrices to be full rank, you can just assume they are, but you must state this clearly.

(c) Carry out the method described in part (b) on the data found in the M-file unbdl_data.m. Compare the value of J found for your estimator with the value of J for B = A†. The increase in J can be thought of as the cost of imposing the deadlines.

8.25 Portfolio selection with sector neutrality constraints. We consider the problem of selecting a portfolio composed of n assets. We let x_i ∈ R denote the investment (say, in dollars) in asset i, with x_i < 0 meaning that we hold a short position in asset i. We normalize our total portfolio as 1^T x = 1, where 1 is the vector with all entries 1. (With this normalization, the x_i are sometimes called portfolio weights.) The portfolio (mean) return is given by r = µ^T x, where µ ∈ R^n is a vector of asset (mean) returns. We want to choose x so that r is large, while avoiding risk exposure, which we explain next.

First we explain the idea of sector exposure. We have a list of k economic sectors (such as manufacturing, energy, transportation, defense, ...). A matrix F ∈ R^{k×n}, called the factor loading matrix, relates the portfolio x to the factor exposures, given as R^fact = Fx ∈ R^k. The number R^fact_i is the portfolio risk exposure to the ith economic sector. If R^fact_i is large (in magnitude) our portfolio is exposed to risk from changes in that sector; if it is small, we are less exposed to risk from that sector. If R^fact_i = 0, we say that the portfolio is neutral with respect to sector i.

Another type of risk exposure is due to fluctuations in the returns of the individual assets. The idiosyncratic risk is given by

R^id = sum_{i=1}^{n} σ_i^2 x_i^2,

where σ_i > 0 are the standard deviations of the asset returns. (You can take the formula above as a definition; you do not need to understand the statistical interpretation.)

We will choose the portfolio weights x so as to maximize r − λ R^id, which is called the risk-adjusted return, subject to neutrality with respect to all sectors, i.e., R^fact = 0. Of course we also have the normalization constraint 1^T x = 1. The parameter λ, which is positive, is called the risk aversion parameter. The (known) data in this problem are µ ∈ R^n, F ∈ R^{k×n}, σ = (σ_1, ..., σ_n) ∈ R^n, and λ ∈ R.
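The maximization just described is an equality-constrained quadratic problem, so it can be solved through its KKT system, much as in problem 8.16. An illustrative NumPy sketch on made-up data (the file sector_neutral_portfolio_data.m is not available here):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
mu = rng.standard_normal(n)
F = rng.standard_normal((k, n))                  # factor loading matrix (made up)
sigma = 0.5 + rng.random(n)
lam = 1.0

# maximize mu^T x - lam * sum sigma_i^2 x_i^2  s.t.  Fx = 0, 1^T x = 1
D = np.diag(sigma ** 2)
C = np.vstack([F, np.ones((1, n))])              # k+1 equality constraints
d = np.concatenate([np.zeros(k), [1.0]])

# stationarity: 2*lam*D x + C^T nu = mu, together with C x = d
KKT = np.block([[2 * lam * D, C.T],
                [C, np.zeros((k + 1, k + 1))]])
sol = np.linalg.solve(KKT, np.concatenate([mu, d]))
x = sol[:n]
r = mu @ x                                       # portfolio return
Rid = np.sum(sigma ** 2 * x ** 2)                # idiosyncratic risk
print(r, Rid)
```

Since the objective is strictly concave on the feasible set (λ > 0, σ_i > 0) and C is full rank here, the KKT matrix is nonsingular and x is the unique optimum.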

(a) Explain how to find x, using methods from the course. You are welcome (even encouraged) to express your solution in terms of block matrices, formed from the given data.

(b) Using the data given in sector_neutral_portfolio_data.m, find the optimal portfolio. Report the optimal cost function value, and the associated values of r (the return) and R^id (the idiosyncratic risk). Verify that 1^T x = 1 (or very close) and R^fact = 0 (or very small).

8.26 Minimum energy control with delayed destination knowledge. We consider a vehicle moving in R^2, with dynamics

x(t+1) = Ax(t) + Bu(t),   p(t) = Cx(t),   x(1) = 0,

where x(t) ∈ R^n is the state, u(t) ∈ R^m is the input, and p(t) ∈ R^2 is the position of the vehicle, at time t. The matrices A ∈ R^{n×n}, B ∈ R^{n×m} and C ∈ R^{2×n}, and the initial state x(1), are given. The vehicle must reach a destination d ∈ R^2 at time t = M + N + 1, where M > 0 and N > 0; i.e., we must have p(M + N + 1) = d.

The subtlety here is that we are not told what d is at t = 1; we simply know that it is one of K possible destinations d^(1), ..., d^(K) (which we are given). At time t = M + 1, the destination (which is one of d^(1), ..., d^(K)) will be revealed to you. Thus, you must choose the inputs up to time M, u(1), ..., u(M), independent of the actual final destination, but you can choose u(M+1), ..., u(M+N) depending on the final destination. We will denote the choice of these inputs, in the case when the final destination is d^(k), as u^(k)(M+1), ..., u^(k)(M+N), for k = 1, ..., K.

We will choose the inputs to minimize the cost function

sum_{t=1}^{M} ||u(t)||^2 + (1/K) sum_{k=1}^{K} sum_{t=M+1}^{M+N} ||u^(k)(t)||^2,

which is the sum of squared-norm costs, averaged over all destinations.

(a) Explain how to find u(1), ..., u(M) and u^(k)(M+1), ..., u^(k)(M+N), for k = 1, ..., K, using methods from the course.

(b) Carry out your method on the data given in delayed_dest_data.m, and for each possible destination plot the position of the vehicle p^(k)(1), ..., p^(k)(M+N+1). (The data file contains commented-out code for producing your plots.) Comment briefly on the following statement: "Since we do not know where we are supposed to go until t = M + 1, there's no point using the input (for which we are charged) until then."

Lecture 9 – Autonomous linear dynamical systems

9.1 A simple population model. We consider a certain population of fish (say) each (yearly) season. x(t) ∈ R^3 will describe the population of fish at year t ∈ Z, as follows:

• x_1(t) denotes the number of fish less than one year old
• x_2(t) denotes the number of fish between one and two years old
• x_3(t) denotes the number of fish between two and three years old

(We will ignore the fact that these numbers are integers.) The population evolves from year t to year t+1 as follows.

• The number of fish less than one year old in the next year (t+1) is equal to the total number of offspring born during the current year. Fish that are less than one year old in the current year (t) bear no offspring. Fish that are between one and two years old in the current year (t) bear an average of 2 offspring each. Fish that are between two and three years old in the current year (t) bear an average of 1 offspring each.
• 40% of the fish less than one year old in the current year (t) die; the remaining 60% live on to be between one and two years old in the next year (t+1).
• 30% of the one-to-two year old fish in the current year die, and 70% live on to be two-to-three year old fish in the next year.
• All of the two-to-three year old fish in the current year die.

Express the population dynamics as an autonomous linear system with state x(t), i.e., in the form x(t+1) = Ax(t). Remark: this example is silly, but more sophisticated population dynamics models are very useful and widely used.

9.2 Tridiagonal systems. A square matrix A is called tridiagonal if A_ij = 0 whenever |i − j| > 1. Tridiagonal matrices arise in many applications.

(a) Draw a pretty block diagram of ẋ = Ax, where A ∈ R^{4×4} is tridiagonal.

(b) Consider a Markov chain with four states labeled 1, 2, 3, 4. Let z(k) denote the state at time k. The state transition probabilities are described as follows: when z is not 4, it increases by one with probability 0.3; when z is not 1, it decreases by one with probability 0.2. (If z neither increases nor decreases, it stays the same, i.e., z(k+1) = z(k).) Draw a graph of this Markov chain as in the lecture notes. Give the discrete time linear system equations that govern the evolution of the state distribution.

(c) Find the linear dynamical system description for the circuit shown below. Use state x = [v_1 v_2 v_3 v_4]^T, where v_i is the voltage across the capacitor C_i.

(Figure: RC circuit with resistors R1–R7 and capacitors C1–C4.)
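The rules in problem 9.1 translate directly into a matrix: row 1 collects the offspring (2 per one-to-two year old fish, 1 per two-to-three year old), and rows 2 and 3 are the 60% and 70% survival rates. A quick NumPy check (the initial population is an arbitrary choice for illustration):

```python
import numpy as np

# x(t+1) = A x(t) for the fish population model
A = np.array([[0.0, 2.0, 1.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.7, 0.0]])

x = np.array([100.0, 100.0, 100.0])   # arbitrary initial population
for _ in range(5):
    x = A @ x
print(x)
```

For instance, from (100, 100, 100) the next year is (2·100 + 1·100, 0.6·100, 0.7·100) = (300, 60, 70).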

Each route has a source rate (in, say, bits per second). We denote the source rate for route j at time t as x_j(t), for j = 1, ..., p. (We assume the system operates in discrete time.) The total traffic on a link is the sum of the source rates for the routes that pass through it. We use T_i(t) to denote the total traffic on link i at time t, for i = 1, ..., l. Each link has a target traffic level, which we denote T_i^target, for i = 1, ..., l. We define the congestion on link i as T_i(t) − T_i^target, for i = 1, ..., l. The congestion is positive if the traffic exceeds the target rate, and negative if it is below the target rate. The goal in congestion control is to adjust the source rates in such a way that the traffic levels converge to the target levels if possible, or close to the target levels otherwise.

In this problem we consider a very simple congestion control protocol. Each route monitors the congestion for the links along its route, and adjusts its source rate proportional to the sum of the congestion along its route. This can be expressed as

x_j(t+1) = x_j(t) − α (sum of congestion along route j),   j = 1, ..., p,

where α is a positive scalar that determines how aggressively the source rates react to congestion. Note that this congestion control method is distributed: each source only needs to know the congestion along its own route, and does not directly coordinate its adjustments with the other routes. In real congestion control, the rates and traffic are nonnegative, and the traffic on each link must be below a maximum allowed level, called the link capacity. In this problem, however, we ignore these effects, and allow the source rates and total traffic levels to become negative.

Before we get to the questions, we define a matrix that may be useful. The route-link matrix R ∈ R^{l×p} is defined as

R_ij = 1 if route j utilizes link i, and R_ij = 0 otherwise.

(a) Show that x(t), the vector of source rates, can be expressed as a linear dynamical system with constant input, i.e., we have x(t+1) = Ax(t) + b. Be as explicit as you can about what A and b are. Try to use the simplest notation you can. Hint: use the matrix R.

(b) Simulate the congestion control algorithm for the network shown in figure 1, from two different initial source rates, using algorithm parameter α = 0.1, and all target traffic levels equal to one. Plot the traffic level T_i(t) for each link (on the same plot) versus t, for each of the initial source rates. (You are welcome to simulate the system from more than two initial source rates; we only ask you to hand in the plots for two.) Make a brief comment on the results.

(c) Now we come back to the general case (and not just the specific example from part (b)). Assume the congestion control update (i.e., the linear dynamical system found in part (a)) has a unique equilibrium point x̄, and that the rate x(t) converges to it as t → ∞. What can you say about x̄? Does the rate x̄ always correspond to zero congestion on every link? Is it optimal in any way? Limit yourself to a few sentences.

9.4 Sequence transmission with constraints. A communication system is based on 3 symbols: 1, 0, and −1. For this communication system, a valid sequence of symbols x_1, x_2, ..., x_k must satisfy several constraints:

• Transition constraint: Consecutive symbols cannot be more than one apart from each other: we must have |x_{i+1} − x_i| ≤ 1, for i = 1, ..., k−1. Thus, for example, a 0 or a 1 can follow a 1, but a −1 cannot follow a 1.

• Power constraint: The sum of the squares of any three consecutive symbols cannot exceed 2:

x_i^2 + x_{i+1}^2 + x_{i+2}^2 ≤ 2,   for i = 1, ..., k−2.

• Average constraint: The sum of any three consecutive symbols must not exceed one in absolute value:

|x_i + x_{i+1} + x_{i+2}| ≤ 1,   for i = 1, ..., k−2.

So, for example, a sequence that contains 1100 would not be valid, because the sum of the first three consecutive symbols is 2.

How many different (valid) sequences of length 20 are there?
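All three constraints in problem 9.4 look only at windows of at most three consecutive symbols, so valid sequences can be counted by a transfer-matrix style dynamic program whose state is the last two symbols. A sketch (the brute-force enumerator is there only to cross-check the recursion on short lengths):

```python
from itertools import product

SYM = (-1, 0, 1)

def ok_pair(a, b):
    return abs(b - a) <= 1                      # transition constraint

def ok_triple(a, b, c):
    # power constraint and average constraint on a window of three
    return a*a + b*b + c*c <= 2 and abs(a + b + c) <= 1

def count_valid(k):
    if k == 1:
        return len(SYM)
    dp = {(a, b): 1 for a in SYM for b in SYM if ok_pair(a, b)}
    for _ in range(k - 2):
        new = {}
        for (a, b), cnt in dp.items():
            for c in SYM:
                if ok_pair(b, c) and ok_triple(a, b, c):
                    new[(b, c)] = new.get((b, c), 0) + cnt
        dp = new
    return sum(dp.values())

def brute(k):                                   # direct enumeration, small k only
    total = 0
    for s in product(SYM, repeat=k):
        if all(ok_pair(s[i], s[i+1]) for i in range(k-1)) and \
           all(ok_triple(s[i], s[i+1], s[i+2]) for i in range(k-2)):
            total += 1
    return total

print(count_valid(20))
```

Equivalently, the recursion is multiplication by a fixed 0-1 transfer matrix on pair-states, which connects this counting question to the autonomous linear systems of this lecture.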

(Figure 1: Data network for part (b) of problem 9.3, with links shown darker. All traffic and routes flow counterclockwise (although this doesn't matter). Route 1 is (1, 2, 3), route 2 is (1, 2, 5), route 3 is (3, 4), and route 4 is (4, 3, 1), where routes are defined as sequences of links.)

9.5 Consider the mechanical system shown below:

(Figure: two masses; mass m_1 is connected to a wall by spring k_1 and to mass m_2 by spring k_2; q_1 and q_2 are the displacements of the masses.)

Here the q_i give the displacements of the masses, the m_i are the values of the masses, and the k_i are the spring stiffnesses, respectively. The dynamics of this system are

ẋ = [ 0 0 1 0 ; 0 0 0 1 ; −(k_1+k_2)/m_1 k_2/m_1 0 0 ; k_2/m_2 −k_2/m_2 0 0 ] x,

where the state is given by

x = [ q_1 ; q_2 ; q̇_1 ; q̇_2 ].
In many applications we need to solve a set of linear equations Ax = b. which results in initial condition 0 0 x(0) = α1 /m1 . 9. These methods rely on another matrix ˆ ˆ ˆ A. Note that r(t) is the residual after the tth iteration. α2 that satisfy 2 2 the speciﬁcation and minimize α1 + α2 . Obviously computing ˆ A−1 z is fast. (For example. q2 (10) = 2 when the parameters have the values m1 = 1. or give an explicit counterexample.. which is supposed to be ‘close’ to A. .e. the matrix A might be the diagonal part of the matrix A (which.. As a simple example.e. (a) Find the exact conditions on A for which the unit square S is invariant under x = Ax. x ∈ R100000 ). m2 = 1.(i. sometimes called relaxation.g. If the speciﬁcation is feasible.) This problem concerns selection of the impulsive forces α1 and α2 . and also.3. − 1 ≤ x2 ≤ 1 }. Consider the linear dynamical system x = Ax with A ∈ R2×2 . has relatively small oﬀ-diagonal elements).e. where A is nonsingular (square) and x is very large (e. but the standard methods for computing x = A−1 b (e. x(2) . . LU decomposition) are not feasible. k1 = k2 = 1). k1 = k2 = 1. is to set x(0) equal to some approximation of x (e. . ˆ r(t) = Aˆ(t) − b. ˆ ˆ x(0) = A−1 b) and repeat. q2 (10) = 2. ˆ ˆ Immediately before t = 0. α2 that satisfy q1 (10) = 1. Determine whether each of these speciﬁcations is feasible or not (i. to satisfying the speciﬁcation. m1 = m2 = 1. you are able to apply a strong impulsive force αi to mass i. The unit ˙ square in R2 is deﬁned by S = { x | − 1 ≤ x1 ≤ 1. α2 that come closest. . q1 (10) = 0. after t iterations. A simple iterative method. A has the property that A−1 z is easily or ˆ cheaply computed for any given z. x ˆ x(t + 1) = x(t) − A−1 r(t).7 Iterative solution of linear equations. A common approach is to use an iterative method.6 Invariance of the unit square. There are many. . presumably. for t = 0. q2 (10) = 0 ˙ ˙ (d) q2 (10) = 2 when the parameters have the values used above (i. 
q2 (10) = 2 (c) q1 (10) = 1. More importantly.g. If the speciﬁcation is infeasible. ﬁnd the particular α1 . We assume that Az can be computed at reasonable cost. Either show that this is true. if you cannot ﬁnd α1 . whether there exist α1 . .. (b) Consider the following statement: if the eigenvalues of A are real and negative. For parts a–c below. for any z. the parameter values are m1 = m2 = 1. many other examples. ﬁnd the particular α1 . α2 /m2 (The hat reminds us that x(t) is an approximation.. each mass starts with zero position and a velocity determined by the impulsive forces.) ˆ ˆ This iteration uses only ‘cheap’ calculations: multiplication by A and A−1 . it’s just scaling the entries of z.g. of the true solution x = A−1 b. Consider the following speciﬁcations: (a) q2 (10) = 2 (b) q1 (10) = 1. 77 . in a least-squares sense. Give the ˙ conditions as explicitly as possible.) Be sure to be very clear about which alternative holds for each speciﬁcation. 1. that converges to the solution x = A−1 b.. q2 (10) = 2. then S is invariant under x = Ax. which computes a sequence x(1). then ﬁnd αi that minimize (q1 (10) − 1)2 + (q2 (10) − 2)2 . α2 ∈ R that make the speciﬁcation hold). ˙ 9.. k1 = k2 = 1.

(a) Let β = ∥Â−1 (A − Â)∥ (which is a measure of how close Â and A are). Show that if we choose x(0) = Â−1 b, then ∥x(t) − x∥ ≤ β^(t+1) ∥x∥. Thus if β < 1, the iterative method works, i.e., we have x(t) → x as t → ∞. (And the smaller β is, the faster the convergence.)

(b) Find the exact conditions on A and Â such that the method works for any starting approximation x(0) and any b. Your condition can involve norms, singular values, condition number, eigenvalues of A and Â, or some combination, etc. Your condition should be as explicit as possible; for example, it should not include any limits. Try to avoid the following two errors:

• Your condition guarantees convergence but is too restrictive. (For example: β = ∥Â−1 (A − Â)∥ < 0.8.)
• Your condition doesn’t guarantee convergence.

9.8 Periodic solution of periodic linear dynamical system. Consider the linear dynamical system ẋ = A(t)x where

A(t) = A1 for 2k ≤ t < 2k + 1,  A(t) = A2 for 2k + 1 ≤ t < 2k + 2,  k = 0, 1, 2, . . .

In other words, A(t) switches between the two values A1 ∈ Rn×n and A2 ∈ Rn×n every second. The matrix A(t) is periodic with period 2, i.e., A(t + 2) = A(t) for all t ≥ 0.

(a) Existence of a periodic trajectory. What are the conditions on A1 and A2 under which the system has a nonzero periodic trajectory, with period 2? By this we mean: there exists x : R+ → Rn, x not identically zero, with x(t + 2) = x(t) and ẋ = A(t)x.

(b) All trajectories are asymptotically periodic. What are the conditions on A1 and A2 under which all trajectories of the system are asymptotically 2-periodic? By this we mean: for every x : R+ → Rn with ẋ = A(t)x, we have lim_{t→∞} ∥x(t + 2) − x(t)∥ = 0. (Note that this holds when x converges to zero.)

Please note:

• Your conditions should be as explicit as possible. You can refer to the matrices A1 and A2, their eigenvalues and eigenvectors or Jordan forms, singular values and singular vectors, or any matrices derived from them using standard matrix operations.
• We do not want you to give us a condition under which the property described holds. We want you to give us the most general conditions under which the property holds.

9.9 Analysis of a power control algorithm. In this problem we consider again the power control method described in homework problem 2.1. Please refer to this problem for the setup and background. In that problem, you expressed the power control method as a discrete-time linear dynamical system, and simulated it for a specific set of parameters, with several values of initial power levels, and two target SINRs. You found that for the target SINR value γ = 3, the powers converged to values for which each SINR exceeded γ, no matter what the initial power was, whereas for the larger target SINR value γ = 5, the powers appeared to diverge, and the SINRs did not appear to converge. You are going to analyze this, now that you know a lot more about linear systems.

(a) Explain the simulations. Explain your simulation results from problem 2.1(b), for the given values of G, α, and σ, and the two SINR threshold levels γ = 3 and γ = 5.

(b) Critical SINR threshold level. Let us consider fixed values of G, α, and σ. It turns out that the power control algorithm works provided the SINR threshold γ is less than some critical value γcrit (which might depend on G, α, and σ), and doesn’t work for γ > γcrit. (‘Works’ means that no matter what the initial powers are, they converge to values for which each SINR exceeds γ.) Find an expression for γcrit in terms of G ∈ Rn×n, α, and σ. Give the simplest expression you can. Of course you must explain how you came up with your expression.
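The iteration discussed above can be sketched numerically. The following Python/numpy code (numpy standing in for the Matlab used in the course) uses hypothetical data: a diagonally dominant A, with Â taken to be its diagonal part, so that applying Â⁻¹ to a vector just scales its entries.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Hypothetical data: a diagonally dominant A; Ahat is its diagonal part,
# so applying Ahat^{-1} to a vector just rescales its entries.
A = np.diag(rng.uniform(2.0, 3.0, n)) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
Ahat_inv = np.diag(1.0 / np.diag(A))

x_true = np.linalg.solve(A, b)
# beta = ||Ahat^{-1}(A - Ahat)||, a measure of how close Ahat and A are.
beta = np.linalg.norm(Ahat_inv @ (A - np.diag(np.diag(A))), 2)

x = Ahat_inv @ b                      # x(0) = Ahat^{-1} b
errs = [np.linalg.norm(x - x_true)]
for t in range(60):
    x = x + Ahat_inv @ (b - A @ x)    # cheap: one multiply by A, one by Ahat^{-1}
    errs.append(np.linalg.norm(x - x_true))
```

With β < 1 the error shrinks geometrically, consistent with the bound ∥x(t) − x∥ ≤ β^(t+1)∥x∥.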

9.10 Stability of a time-varying system. We consider a discrete-time linear dynamical system x(t + 1) = A(t)x(t), where A(t) ∈ {A1, A2, A3, A4}. These 4 matrices, which are 4 × 4, are given in tv_data.m. Show that this system is stable. (This means that for any x(0), and for any sequence A(0), A(1), A(2), . . ., we have x(t) → 0 as t → ∞.) You may use any methods or concepts used in the class, e.g., eigenvalues, singular values, least-squares, exp(tA), controllability, and so on. Your argument of course will require certain conditions (that you will find) to hold for the given data A1, . . ., A4. Your proof will consist of two parts:

• An explanation of how you are going to show that any trajectory converges to zero.
• The numerical calculations that verify the conditions hold for the given data. You must provide the source code for these calculations, and show the results as well.

9.11 Linear dynamical system with constant input. We consider the system ẋ = Ax + b, with x(t) ∈ Rn. A vector xe is an equilibrium point if 0 = Axe + b. (This means that the constant trajectory x(t) = xe is a solution of ẋ = Ax + b.)

(a) When is there an equilibrium point?
(b) When are there multiple equilibrium points?
(c) When is there a unique equilibrium point?
(d) Now suppose that xe is an equilibrium point. Define z(t) = x(t) − xe. Show that ż = Az. From this, give a general formula for x(t) (involving xe, exp(tA), and x(0)).
(e) Show that if all eigenvalues of A have negative real part, then there is exactly one equilibrium point xe, and, for any trajectory x, we have x(t) → xe as t → ∞.

9.12 Optimal choice of initial temperature profile. We consider a thermal system described by an n-element finite-element model. The elements are arranged in a line, with the temperature of element i at time t denoted Ti(t). Temperature is measured in degrees Celsius above ambient; negative Ti(t) corresponds to a temperature below ambient. The dynamics of the system are described by

c1 Ṫ1 = −a1 T1 − b1 (T1 − T2),

ci Ṫi = −ai Ti − bi (Ti − Ti+1) − bi−1 (Ti − Ti−1), i = 2, . . ., n − 1,

and

cn Ṫn = −an Tn − bn−1 (Tn − Tn−1),

where c ∈ Rn, a ∈ Rn, and b ∈ Rn−1 are given and are all positive.

We can interpret this model as follows. The parameter ci is the heat capacity of element i, so ci Ṫi is the net heat flow into element i. The parameter ai gives the thermal conductance between element i and the environment, so ai Ti is the heat flow from element i to the environment (i.e., the direct heat loss from element i). The parameter bi gives the thermal conductance between element i and element i + 1, so bi (Ti − Ti+1) is the heat flow from element i to element i + 1, and bi−1 (Ti − Ti−1) is the heat flow from element i to element i − 1.

The goal of this problem is to choose the initial temperature profile, T(0) ∈ Rn, so that T(tdes) ≈ T des. Here, tdes ∈ R is a specific time when we want the temperature profile to closely match T des ∈ Rn. We also wish to satisfy a constraint that T(0) should not be too large. To formalize these requirements, we use the objective (1/√n)∥T(tdes) − T des∥ and the constraint (1/√n)∥T(0)∥ ≤ T max. The first expression is the RMS temperature deviation, at t = tdes, from the desired value, and the second is the RMS temperature deviation from ambient at t = 0. T max is the (given) maximum initial RMS temperature value.

(a) Explain how to find T(0) that minimizes the objective while satisfying the constraint.

(b) Solve the problem instance with the values of n, a, b, c, tdes, T des and T max defined in the file temp_prof_data.m. Give the RMS temperature error (1/√n)∥T(tdes) − T des∥, and the RMS value of initial temperature (1/√n)∥T(0)∥. Plot, on one graph, your T(0), T(tdes) and T des.
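A numerical sketch of this problem in Python/numpy (the course data file temp_prof_data.m is not reproduced here, so the parameters below are hypothetical stand-ins): build the tridiagonal dynamics matrix, propagate with the matrix exponential, and use a truncated SVD to obtain a small-norm T(0), since the exact solution F⁻¹ T_des is enormous for a diffusive system.

```python
import numpy as np

def expm(M, terms=40, squarings=20):
    # Taylor series with scaling and squaring (adequate for this small example).
    M = M / 2.0**squarings
    S, T = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        T = T @ M / k
        S = S + T
    for _ in range(squarings):
        S = S @ S
    return S

# Hypothetical parameters (not the course data).
n = 10
a = np.full(n, 0.1); b = np.full(n - 1, 1.0); c = np.full(n, 1.0)

# Build the tridiagonal dynamics matrix from c_i Tdot_i = -a_i T_i - heat flows.
A = np.zeros((n, n))
for i in range(n):
    A[i, i] -= a[i]
    if i < n - 1:
        A[i, i] -= b[i]; A[i, i + 1] += b[i]
        A[i + 1, i + 1] -= b[i]; A[i + 1, i] += b[i]
A = A / c[:, None]

tdes = 5.0
F = expm(tdes * A)                       # T(tdes) = F T(0)
Tdes = np.sin(np.linspace(0, np.pi, n))  # hypothetical desired profile

# A truncated-SVD solution trades residual for a small T(0), which is the
# tradeoff the norm constraint formalizes; k is a design choice.
U, s, Vt = np.linalg.svd(F)
k = 4
T0 = Vt[:k].T @ ((U[:, :k].T @ Tdes) / s[:k])
err = np.linalg.norm(F @ T0 - Tdes) / np.sqrt(n)
```

The heat-flow matrix is stable (all eigenvalues have negative real part), which is why F = e^(tdes A) has rapidly decaying singular values.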

4 0. (a) Find the eigenvalues. 2. . since we are given conditions on the state at two time points instead of the usual single initial point. you replace every integrator in the original system with a leaky integrator.4 Two-point boundary value problem. . x(t) is constant. You can assume that A is diagonalizable. Justify your answer. Find x(2). y2 (t) = sgn(x2 (t)). λn . In this problem we consider the speciﬁc system x = Ax = ˙ 0.1 Suppose x = Ax and z = σz + Az = (A + σI)z where σ ∈ R. . −1 1 −1 1 . where the function sgn : R → R is deﬁned by sgn(a) = 1 a>0 0 a=0 −1 a < 0 81 . (a) Suppose the eigenvalues of A ∈ Rn×n are λ1 . We have a detector or sensor that gives us the sign of each component of the state x = [x1 x2 ]T each second: y1 (t) = sgn(x1 (t)). . i. How are z(t) and x(t) ˙ ˙ related? Find the simplest possible expression for z(t) in terms of x(t).Lecture 10 – Solution via Laplace transform and matrix exponential 10. (c) The state trajectories describe circular orbits. although it is true in the general case. where A = ˙ (a) Find eA . Another way to ˙ ˙ think of the damped system is in terms of leaky integrators..6 Linear system with a quadrant detector. . (This is called a two-point boundary value problem.. (d) You may remember that circular motion (in a plane) is characterized by the velocity vector being orthogonal to the position vector. . When σ < 0. resolvent. . 10. AB = BA. . and x(0) = z(0). and Tr Y is the sum of the eigenvalues of Y . Verify that this holds for any trajectory of the harmonic oscillator.7 1. Hint: det X is the product of the eigenvalues of X. 10. and state transition matrix for the harmonic oscillator. (b) Sketch the vector ﬁeld of the harmonic oscillator.2 Harmonic oscillator. 10.e. 1. some people refer to the system z = σz + Az as a damped version of x = Ax. t = 0.5 −0. (b) Show that det eA = eTr A . Express x(t) in terms of x(0). Verify this fact using the solution from part (a). i. 
do not use the explicit solution you found in part (a). 10. (b) Carefully show that d At dt e = AeAt = eAt A. . (a) Show that eA+B = eA eB if A and B commute.e. .5 Determinant of matrix exponential. to ˙ get the damped system. eλn . . Use only the diﬀerential equation. The system x = ˙ 0 −ω ω 0 x is called a harmonic oscillator. Show that the eigenvalues of eA are eλ1 . (b) Suppose x1 (0) = 1 and x2 (1) = 2.) 10.5 x.3 Properties of the matrix exponential. Consider the system described by x = Ax. A leaky integrator satisﬁes y − σy = u.
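Two of the matrix-exponential facts above are easy to check numerically: e^(A+B) = e^A e^B when A and B commute, and det e^A = e^(Tr A) for any square A. A Python/numpy sketch, with a hypothetical random A and B chosen as a polynomial in A so that AB = BA:

```python
import numpy as np

def expm(M, terms=30, squarings=20):
    # Taylor series with scaling and squaring.
    M = M / 2.0**squarings
    S, T = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        T = T @ M / k
        S = S + T
    for _ in range(squarings):
        S = S @ S
    return S

rng = np.random.default_rng(1)
A = 0.3 * rng.standard_normal((3, 3))
B = 2.0 * A + 3.0 * np.eye(3)   # B is a polynomial in A, so AB = BA

lhs = expm(A + B)
rhs = expm(A) @ expm(B)         # equal because A and B commute

detA = np.linalg.det(expm(A))   # should equal exp(Tr A)
```

For generic (non-commuting) A and B, the product formula e^(A+B) = e^A e^B fails.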

(c) If the eigenvalues of A are λ1 . Rougly speaking. Or.7)? Your response might be.7) = +1. and y(3. the problem can be stated as follows. . then the eigenvalues of A−1 are 1/λ1 . or 1)”. the problem. You can think of y(t) = [y1 (t) y2 (t)]T as determining which quadrant the state is in at time t (thus the name quadrant detector ). −1. Hint: you’ll need to use the facts that det A = det(AT ). and. and x(1) is also in quadrant IV. x(0) is in quadrant IV.8). You observe the sensor measurements y(0) = 1 .1 . (d) The eigenvalues of A and T −1 AT are the same.8) = +1. you can think of y(t) as a one-bit quantized measurement of the state at time t. Consider the characteristic polynomial X (s) = det(sI − A) of the matrix A ∈ Rn×n . .e. Finally. 1/λn .4) = +1.3) = −1. 82 .8) is deﬁnitely −1. if A is invertible. y(1. .2) = −1. 0. y(1. what values could y(2) possibly take on? In terms of the quadrants. we also have y(0. you must completely justify and explain your answer.7 Linear system with one-bit quantized output. (a) Show that X is monic. 10. The question is: which quadrant(s) can x(2) possibly be in? You do not know the initial state x(0).1 −1 y(t) = sign (cx(t)) 1 0. which means that its leading coeﬃcient is one: X (s) = sn + · · ·. 10. Show the following: (a) The eigenvalues of A and AT are the same. (What we mean by “y(0. (b) A is invertible if and only if A does not have a zero eigenvalue.2) = −1. . Of course.7).9 Characteristic polynomial. λn and A is invertible.. but y(3.7) is deﬁnitely +1. det A−1 = 1/ det A. ˙ where A= and the sign function is deﬁned as +1 if a > 0 −1 if a < 0 sign(a) = 0 if a = 0 y(1. . c= 1 −1 . det(AB) = det A det B.3) = −1. . for example: “y(0. the output of this autonomous linear system is quantized to one-bit precision. The following outputs are observed: y(0.4) = +1. We consider the system x = Ax. and y(1.7) can be anything (i.) 10.8) = +1 What can you say (if anything) about the following: y(0. −1 Based on these measurements. y(3. 
and y(3. y(2. −0. Of course you must fully explain how you arrive at your conclusions.7) is deﬁnitely +1” is: for any trajectory of the system for which y(0. .8 Some basic properties of eigenvalues.There are several ways to think of these sensor measurements. −1 y(1) = 1 . . y(2.
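The characteristic-polynomial coefficient identities in this problem (monic leading coefficient, s^(n−1) coefficient equal to −Tr A, constant coefficient equal to det(−A)) can be verified numerically; a Python/numpy sketch on a hypothetical random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))

# np.poly of a matrix returns the coefficients of det(sI - A),
# leading coefficient first.
coeffs = np.poly(A)
```

Here coeffs[0] should be 1 (monic), coeffs[1] should be −Tr A, and coeffs[-1] (the constant term, i.e., the value at s = 0) should be det(−A).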

. ˙ (a) How are the state-transition matrices of the system and the adjoint system related? (b) Show that z(0)T x(t) = z(t)T x(0).. Let P = [p1 · · · pn ] and Q = P −1 .9859 −3. s − λk Note that this is a partial fraction expansion of (sI − A)−1 . . You might ﬁnd the matlab command null useful. What is Ri ? (c) Show that (sI − A)−1 = n k=1 Rk . i T Let qi be the ith row of Q.4575 3. Consider the continuous-time system x = Ax with A given ˙ by −0. By equating coeﬃcients show that an−1 = − n i=1 λi and a0 = n i=1 (−λi ). 83 .1444 5.0939 2. 10.0428 4.11 Spectral resolution of the identity.0510 −5.9229 −4. it computes an orthonormal basis of the null space of a matrix. (b) Find q3 .10 ± j5. .9709 −0. Evidently the adjoint system and the system have the same eigenvalues. n.1387 1. The adjoint system associated with the linear dynamical system x = Ax is ˙ z = AT z. What is the range of Rk ? What is the rank of Rk ? Can you describe the null space of Rk ? 2 (b) Show that Ri Rj = 0 for i = j. λ3.2 = −0.4599 −1. q2 ) for the invariant plane associated with λ1 and λ2 . .15 ± j7. X (s) = sn + an−1 sn−1 + · · · + a1 s + a0 = (s − λ1 )(s − λ2 ) · · · (s − λn ). .1164 You can verify that the eigenvalues of A are λ1. . and then do a partial fraction expansion of (sI − A)−1 to ﬁnd the R’s).0753 −1.10 The adjoint system.(b) Show that the sn−1 coeﬃcient of X is given by − Tr A.e. .4 = −0. pT pi = 1. . For this reason the residue matrices are said to constitute a resolution of the identity.1005 1. for 0 ≤ t ≤ 40. (c) Plot the individual states constituting the trajectory x(t) of the system starting from an initial point in the invariant plane. For this reason the Ri ’s are called the residue matrices of A. 10. T (a) Let Rk = pk qk . i = 1. . pn . ﬁnd P and Q and then calculate the R’s.8847 −0. 10. q4 ∈ R4 such that Q = [q1 q2 q3 q4 ] is orthogonal.0880 −0. Suppose A ∈ Rn×n has n linearly independent eigenvectors p1 . (Tr X is the trace of a matrix: Tr X = n i=1 Xii . . 
λn denote the eigenvalues of A. (a) Find an orthonormal basis (q1 . so that (c) Show that the constant coeﬃcient of X is given by det(−A). (d) Show that R1 + · · · + Rn = I.) (d) Let λ1 . with associated eigenvalues λi .12 Using matlab to ﬁnd an invariant plane.0481 A= −2. . say x(0) = q1 . (e) Find the residue matrices for A= 1 0 1 −2 both ways described above (i. .
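The residue matrices and the resolution of the identity can be computed directly from an eigendecomposition; a Python/numpy sketch on a hypothetical (generically diagonalizable) random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))      # generically diagonalizable

lam, P = np.linalg.eig(A)            # columns of P are eigenvectors p_i
Q = np.linalg.inv(P)                 # rows of Q are the left eigenvectors q_i^T
R = [np.outer(P[:, k], Q[k, :]) for k in range(n)]

# Resolution of the identity: R_1 + ... + R_n = I, and R_i R_j = 0 for i != j.
Rsum = sum(R)

# Partial fraction expansion of the resolvent, checked at a sample point s0.
s0 = 2.7 + 0.9j
resolvent = np.linalg.inv(s0 * np.eye(n) - A)
pf = sum(R[k] / (s0 - lam[k]) for k in range(n))
```

The residues are complex in general (complex eigenvalues come in conjugate pairs), but their sum is the real identity matrix.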

(b) Plot a few trajectories of x(t).m and is equal to −0. .0428 4.. vol S(t) = vol S for all t? Can the ﬂow x = Ax be stable? ˙ Hint: if F ∈ Rn×n then vol(F S) = | det F | vol S.14 Some matlab exercises. if the state starts inside (or enters) the positive (i. Note: The A matrix is available on the class web site in the ﬁle inv plane matrix. . we have x(t) → 0 as t → ∞ for any x(0). A= −2. A(t) switches between the two values A1 and A2 every second.e. then the state remains indeﬁnitely in the positive quadrant. . then the eigenvalues of A are real.m. i. 2. (a) Find the precise conditions on A under which the system x = Ax is PQI. x1 (t).. i. Your conditions should be as explicit as possible. What are the conditions on A so that the ﬂow preserves volume. Suppose we have a set S ⊆ Rn and a linear dynamical system x = Ax.1444 5. Z is the ‘15 seconds forward predictor matrix’.9709 −0.0481 . S(t) is the image of the set S under the linear transformation etA . We say the system is positive quadrant invariant (PQI) if whenever x1 (T ) ≥ 0 and x2 (T ) ≥ 0..1164 (a) What are the eigenvalues of A? Is the system stable? You can use the command eig in matlab. 10.e. Thus.8847 −0. Consider the linear dynamical system x = A(t)x where ˙ A(t) = A1 A2 2n ≤ t < 2n + 1. 1. n = 0. In other words. x2 (t).9229 −4.e. We ˙ can propagate S along the ‘ﬂow’ induced by the linear dynamical system by considering S(t) = eAt S = { eAt s | s ∈ S }.0880 −0. n = 0.(d) If x(t) is in the invariant plane what can you say about the components of the vector QT x(t)? (e) Using the result of part (12d) verify that the trajectory you found in part (12c) is in the invariant plane. We say that this (timevarying) linear dynamical system is stable if every trajectory converges to zero.0939 2.1387 1. Try to express the ˙ conditions in the simplest form. (e) Brieﬂy comment on the size of the elements of the matrices Y and Z. . we have x1 (t) ≥ 0 and x2 (t) ≥ 0 for all t ≥ T . In other words. 2. . 
(c) Find the matrix Z such that Zx(t) gives x(t + 15). i. (f) Find x(0) such that x(10) = [1 1 1 1]T .16 Stability of a periodic system. Thus Y reconstructs what the state was 20 seconds ago. 10. 84 .9859 −3. Thus. for a few initial conditions. Find the conditions on A1 and A2 under which the periodic system is stable. We consider a system x = Ax with x(t) ∈ R2 (although the results ˙ of this problem can be generalized to systems of higher dimension).e. where F S = { F s | s ∈ S }.13 Positive quadrant invariance. 2n + 1 ≤ t < 2n + 2.1005 1.0753 −1.4575 3. (b) True or False: if x = Ax is PQI.0510 −5. To do this you can use the matrix exponential command in matlab expm (not exp which gives the element-by-element exponential of a matrix). ˙ 10.4599 −1.15 Volume preserving ﬂows. Consider the continuous-time system x = Ax where A can be found in ˙ sys_dynamics_matA. ﬁrst) quadrant. Verify that the qualitative behavior of the system is consistent with the eigenvalues you found in part (14a). 1. x3 (t) and x4 (t). 10.. (d) Find the matrix Y such that Y x(t) gives x(t − 20). .

10.17 Computing trajectories of a continuous-time LDS. We have seen in class that if x(t) is the solution to the continuous-time, time-invariant, linear dynamical system ẋ = Ax, x(0) = x0, then the Laplace transform of x(t) is given by X(s) = (sI − A)−1 x0. Hence, we can obtain x(t) from the inverse Laplace transform of the resolvent of A: x(t) = L−1 ((sI − A)−1) x0.

(a) Assuming that A ∈ Rn×n has n independent eigenvectors, write x(t) in terms of the residue matrices Ri and associated eigenvalues λi, i = 1, . . ., n. (The residue matrices are defined in the previous problem.)

(b) Consider once again the matrix

A = [ 1 3; 0 −1 ].

Write the solution x(t) for this dynamics matrix, with the initial condition x0 = [2 − 1]T. Compute x1(2), i.e., the value of the first entry of x(t) at t = 2.

(c) Forward Euler approximation. With this same A and x0, compute an approximation to the trajectory x(t) by Euler approximation, with N steps, with different step-sizes h = 2/N; i.e., you’ll obtain the sequence resulting from the discrete-time LDS y(k + 1) = (I + hA)y(k), k = 0, . . ., N − 1, with y(0) = x0. Run your simulation from t = 0 to t = 2. For the number of steps N, use the values 10, 100, 1000, and 10000 (with the corresponding step-size h = 2/N). On the same graph, plot the first entry, y1(k), of each of the four sequences you obtain (with hk on the horizontal axis).

(d) Error in Euler approximation. For each of the four runs, compute the final error in x1, given by ǫ = y1(N) − x1(2). Plot ǫ as a function of N on a logarithmic scale (hint: use the matlab function loglog). How many steps do you estimate you would need to achieve a precision of 10−6?

(e) Matrix exponential. The matrix exponential is defined by the series

eA = I + Σ_{k=1}^{∞} (1/k!) Ak.

With A as above and h = 0.5, compute an approximation of the matrix exponential of hA by adding the first ten terms of the series: B = I + Σ_{k=1}^{10} (1/k!) (hA)k. Compute 4 iterates of the discrete-time LDS z(k + 1) = Bz(k), k = 0, . . ., 3, with z(0) = x0. Add z1(k) to the plot of the y1(k). What is the final error ǫ = z1(4) − x1(2)? Note: The matlab function expm uses a much more efficient algorithm to compute the matrix exponential. For this example, expm requires about the same computational effort as is needed to add the first ten terms of the series, but the result is much more accurate. (If you’re curious, go ahead and compute the corresponding final error ǫ.)
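The forward Euler part of this exercise can be sketched in Python/numpy with the A and x0 given in the problem; the exact value of x1(2) follows from the eigendecomposition of A (eigenvalues 1 and −1, giving x(t) = 0.5 e^t [1, 0]^T + 0.5 e^(−t) [3, −2]^T for this x0):

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [0.0, -1.0]])
x0 = np.array([2.0, -1.0])

# Exact x1(2) from x(t) = 0.5 e^t [1,0] + 0.5 e^{-t} [3,-2].
x1_exact = 0.5 * np.exp(2.0) + 1.5 * np.exp(-2.0)

errs = {}
for N in [10, 100, 1000, 10000]:
    h = 2.0 / N
    y = x0.copy()
    for _ in range(N):
        y = y + h * (A @ y)      # forward Euler: y(k+1) = (I + hA) y(k)
    errs[N] = abs(y[0] - x1_exact)
```

The final error should shrink roughly like 1/N (first-order accuracy), which is what the loglog plot requested in part (d) shows.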

over all possible initial conditions. . . followed by vehicle 2 to its right. .e. given the problem data A. the minimum possible value of the output at time t. 86 i = 1. . . you are not required to ﬁnd a single initial condition. we deﬁne the minimum output or lower output envelope as y(t) = min{y(t) | x(0) − x0 ≤ r}. ˙ vi = u i − vi . and all stationary.19 Output response envelope for linear system with uncertain initial condition. We call x0 the nominal initial condition. We deﬁne the spacing between vehicle i and i + 1 as si (t) = yi+1 (t) − yi (t). (b) Find A. . . (When the vehicles are aligned. versus t.) We will investigate three control schemes for aligning the ﬂeet of vehicles.e. n − 1. • Right looking control is based on the spacing to the vehicle to the right. . . . We use the control law ui (t) = si (t) − 1.18 Suppose x = Ax with A ∈ Rn×n . Two one-second experiments are performed. 10. plot ynom . (b) Carry out your method on the problem data in uie_data. which move along a line with (scalar) positions y1 .10. . over the range 0 ≤ t ≤ 10. . We consider the autonomous linear dynamical system x = Ax. x0 . . i. (Here we take each vehicle mass to be one. the vehicles start out with vehicle 1 in the leftmost position. n. yn . and y. and include a damping term in the equations. vi = 0. On the same axes. What conditions must be satisﬁed for your procedure to work? 10. . . these spacings are all one. i. by ﬁrst computing the matrix exponential. and the resulting output. We let v1 .. The vehicle motions are governed by the equations y i = vi . we only know that it lies in a ball of radius r centered at the point x0 : x(0) − x0 ≤ r. un the net forces applied to the vehicles.) We assume that y1 (0) < · · · < yn (0). n. . i = 1. In the second. x(0) = [1 1]T ˙ and x(1) = [4 − 2]T .e. . x(0) = [1 2]T and x(1) = [5 − 2]T . describe a procedure for ﬁnding A using experiments ˙ with diﬀerent initial values. for i = 1. and u1 . y(t) = Cx(t). C. . 
We call this conﬁguration aligned. n− 1. . . and the goal is to drive the vehicles to this conﬁguration. . . . The goal is for the vehicles to converge to the conﬁguration yi = i.. to align the vehicles. the maximum possible value of the output at time t. labeled 1. . with vehicle n in the rightmost position. and so on.e.m. and r. i.. . the nominal output. In the ﬁrst. i. where x(t) ∈ Rn and y(t) ∈ R. i. (a) Explain how to ﬁnd y(t) and y(t). (Here you can choose a diﬀerent initial condition for each t. . We consider a ﬂeet of vehicles.. given x(0) = [3 − 1]T . over all possible initial conditions.20 Alignment of a ﬂeet of vehicles.. vn denote the velocities of the vehicles. We do not ˙ know the initial condition exactly. y. with unit spacing between adjacent vehicles. .5) or explain why you cannot (x(0) = [3 − 1]T ). (a) Find x(1) and x(2). for x = Ax with A ∈ Rn×n . (d) More generally. ﬁrst vehicle at position 1. ynom (t) = CetA x0 . . We deﬁne the maximum output or upper output envelope as y(t) = max{y(t) | x(0) − x0 ≤ r}.) In a similar way. .e. ˙ (c) Either ﬁnd x(1.
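Since y(t) = (e^(tA))^T c · x(0) is linear in x(0), the extremes over the ball ∥x(0) − x0∥ ≤ r follow from the Cauchy–Schwarz inequality: y(t) = ynom(t) ± r∥(e^(tA))^T c∥. A Python/numpy sketch with hypothetical data (the file uie_data.m is not reproduced here):

```python
import numpy as np

def expm(M, terms=30, squarings=30):
    # Taylor series with scaling and squaring.
    M = M / 2.0**squarings
    S, T = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        T = T @ M / k
        S = S + T
    for _ in range(squarings):
        S = S @ S
    return S

# Hypothetical problem data.
A = np.array([[-0.2, 1.0],
              [-1.0, -0.2]])
c = np.array([1.0, 0.5])
x0 = np.array([1.0, 0.0])
r = 0.1

ts = np.linspace(0.0, 10.0, 101)
ynom, ylo, yhi = [], [], []
for t in ts:
    g = expm(t * A).T @ c          # y(t) = g^T x(0)
    ynom.append(g @ x0)
    # Over the ball ||x(0) - x0|| <= r, y(t) ranges over ynom(t) +/- r||g||.
    ylo.append(g @ x0 - r * np.linalg.norm(g))
    yhi.append(g @ x0 + r * np.linalg.norm(g))
```

At t = 0 the envelope width is exactly 2r∥c∥, and the nominal output always lies between the two envelopes.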

In this problem we analyze vehicle collisions. we apply a force on vehicle i proportional to its spacing error with respect to the vehicle to the right (i. 2. . ﬁnd the point of closest approach. i. In other words. for t ≥ 0. (Give the time. ﬁnd the earliest collision.for vehicles i = 1.e. say so. un (t) = −(yn (t) − n).. which has no vehicle to its left. . 0. we consider the speciﬁc case with n = 5 vehicles.. 2 2 i = 2.7) = 0 means that vehicles 3 and 4 collide at t = 5. uses a right looking control scheme. say so. and the ﬁrst vehicle. i. the others only need a measurement of the distance to their righthand neighbor. Be sure to give us times of collisions or closest approach with an absolute precision of at least 0.e. . For each of the three schemes above (whether or not they work). giving the time and the vehicles involved. . . If the vehicles do not collide. • Independent alignment is based on each vehicle independently adjusting its position with respect to its required position: ui (t) = −(yi (t) − i). v = (0.1. which occur when any spacing between vehicles is equal to zero. no matter what the initial positions and velocities are. 0. and the minimum spacing. i. Among the schemes that do work. • Left and right looking control adjusts the input force based on the spacing errors to the vehicle to the left and the vehicle to the right: ui (t) = si (t) − 1 si−1 (t) − 1 − . 0). n. determine if a collision occurs. which corresponds to the vehicles with zero initial velocity.e. . (For example. i = 1. . but not in the aligned positions. i.e. 3. which one gives the fastest asymptotic convergence to alignment? (If there is a tie between two or three schemes. If a collision does occur.e. two or more pairs of vehicles have the same distance of closest approach.. the vehicles involved. vehicle i + 1). which applies a force proportional to its position error. two pairs of vehicles collide at the same time. This scheme requires all vehicles to have absolute position sensors. 
n − 1. The rightmost vehicle uses the same absolute error method as in right looking control. 0. but the other vehicles only need to measure the distance to their neighbors. between any pair of vehicles. This scheme requires vehicle n to have an absolute position sensor. s3 (5. This control law has the advantage that only the rightmost vehicle needs an absolute measurement sensor. in the opposite direction.e. (a) Which of the three schemes work? By ‘work’ we mean that the vehicles converge to the alignment conﬁguration. 87 ... say so. .7. In the questions below.) We take the particular starting conﬁguration y = (0. . 7). . i.) In this part of the problem you can ignore the issue of vehicle collisions. (For example.) If there is a tie. the minimum spacing that occurs.7. The rightmost vehicle uses the control law un (t) = −(yn (t) − n). (b) Collisions.’) If there is a tie. . u1 (t) = s1 (t) − 1. ‘Vehicles 3 and 4 collide at t = 7. . n − 1. 5.. spacings that pass through zero.
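The right looking control scheme can be set up as a linear dynamical system in error coordinates e = y − (1, 2, . . ., n): vehicle i < n applies u_i = e_{i+1} − e_i and the rightmost applies u_n = −e_n, so the position coupling matrix L is upper triangular with −1 on its diagonal. One can check that det(sI − Acl) = (s² + s + 1)^n, so every closed-loop eigenvalue has real part −1/2. A Python/numpy sketch for n = 5:

```python
import numpy as np

n = 5
# Right looking control in error coordinates: L[i,i] = -1, L[i,i+1] = +1,
# and the rightmost vehicle uses absolute error, L[n-1,n-1] = -1.
L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i] = -1.0
    L[i, i + 1] = 1.0
L[n - 1, n - 1] = -1.0

# Closed loop for state (e, v): edot = v, vdot = L e - v (unit masses, damping).
Acl = np.block([[np.zeros((n, n)), np.eye(n)],
                [L, -np.eye(n)]])

eigs = np.linalg.eigvals(Acl)
```

The eigenvalues are repeated roots of s² + s + 1, which makes them numerically sensitive (the exact matrix is defective), so the assertions below allow a loose tolerance; the scheme is nevertheless clearly stable.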

10.0 0 0 0. 10.3 A= 0. xn+1 (t) are the temperatures of the elements in the cup. We now give the thermal model used. with all components 20.3 0. You can assume that A is such that. and the associated (optimal) terminal value cT x(T ). is spent on each initial state component (i. B/n. All of these are in degrees C. where B is a given positive budget. . Give us the terminal value obtained when the initial state has equal mass in each component.) The initial temperature distribution is 100 20 x(0) = . so the total value (or cost) of the components at time t is cT x(t). with xi (t) representing the total mass of species or component i at time t.e. represents the ambient temperature. Also give us the terminal value obtained when the same amount. . with t in seconds. ˙ is given by t x(t) = exp 0 a(τ ) dτ x(0).e. for the cup we use an n-state ﬁnite element model. is poured into an espresso cup. x(0)i = B/(nci )). and show that it satisﬁes x(t) = a(t)x(t). and B. . if and only if the oﬀ-diagonal entries of A are nonnegative.. the water is poured out.. (We ignore any extra cost that would be incurred in separating the components. with n > 1. Component i has (positive) value (or cost) ci . and espresso.1 Give the optimal x(0).23 Optimal espresso cup pre-heating. The vector x(t) ∈ Rn+1 gives the temperature distribution at time t: x1 (t) is the liquid (water or espresso) temperature at time t. . We take the temperature of the liquid in the cup (water or espresso) as one state. B = 1. . x(0) = α1. x(t) will also have all components nonnegative. Compare this with the optimal terminal value.) Your job is to choose the initial state. The problem data (i.. dt where A ∈ R(n+1)×(n+1) . . (b) Carry out your method on the speciﬁc instance with data 3. 10. and x2 (t). The dynamics of a bioreactor are given by x(t) = Ax(t). (You can assume this operation occurs instantaneously. 
The problem is to choose the pre-heating time P so as to maximize the temperature of the espresso when it is consumed. (This occurs.3 0 0. things you know) are A. that maximizes the total value at time T . At time t = 0 boiling water.e. for any t ≥ 0. at 100◦ C.1 0.) The espresso is then consumed exactly 15 seconds later (yes. (The vector 20 · 1. is poured in.1 0. T = 10. after P seconds (the ‘pre-heating time’). 20 88 .6 0 0.) (a) Explain how to solve this problem.1 0.1 2. for any x(0) with nonnegative components. T .2 0. More speciﬁcally. where x(t) ∈ R. Compare this with the optimal terminal value. (You can just diﬀerentiate this expression. ˙ where x(t) ∈ Rn is the state. that satisﬁes cT x(0) ≤ B. instantaneously).21 Scalar time-varying linear dynamical system.5 0.22 Optimal initial conditions for a bioreactor. you are to choose x(0).1 0 . c = 1. Show that the solution of x(t) = a(t)x(t). under a budget constraint.) Find a speciﬁc ˙ example showing that the analogous formula does not hold when x(t) ∈ Rn . The dynamics are d (x(t) − 20 · 1) = A(x(t) − 20 · 1). . with α adjusted so that the total initial cost is B. with initial temperature 95◦ C. with all entries nonnegative.2 0. i.4 0. by the way. . c.
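For the bioreactor problem, the terminal value c^T x(T) = (e^(TA))^T c · x(0) is linear in x(0), so under the budget constraint c^T x(0) ≤ B, x(0) ≥ 0, it is maximized by spending the entire budget on the single component with the best value-per-cost ratio. A Python/numpy sketch with a hypothetical Metzler matrix (nonnegative off-diagonal entries, the property that keeps x(t) nonnegative; the course data matrix is not reproduced here), using the given T = 10, B = 1, c = 1:

```python
import numpy as np

def expm(M, terms=40, squarings=30):
    # Taylor series with scaling and squaring.
    M = M / 2.0**squarings
    S, T = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        T = T @ M / k
        S = S + T
    for _ in range(squarings):
        S = S @ S
    return S

# Hypothetical dynamics with nonnegative off-diagonal entries.
A = np.array([[-0.3, 0.1, 0.0, 0.0],
              [ 0.2, -0.2, 0.1, 0.0],
              [ 0.0, 0.4, -0.1, 0.1],
              [ 0.0, 0.0, 0.2, -0.2]])
n, T, B = 4, 10.0, 1.0
c = np.ones(n)

F = expm(T * A)
w = F.T @ c                      # terminal value of unit mass in component i

# Linear objective w^T x(0): spend the whole budget on the best ratio w_i / c_i.
istar = int(np.argmax(w / c))
x0 = np.zeros(n); x0[istar] = B / c[istar]
opt_val = c @ (F @ x0)

# Compare with spreading the budget equally: x(0)_i = B / (n c_i).
x0_uniform = B / (n * c)
uniform_val = c @ (F @ x0_uniform)
```

With c = 1 and B = 1 the optimal terminal value is simply max_i w_i, and it beats the equal-spending allocation whenever the w_i are not all equal.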

Explain your method.5 s. You will ﬁnd it in espressodata. which gives an espresso temperature at consumption of 62.’ (This is not the correct answer. to 95. the other states do not change. In addition to A. the ambient. We have very generously derived the matrix A for you. and Tl (100). Te (95). submit your code.) 89 . Note that the dynamics of the system are the same before and after pre-heating (because we assume that water and espresso behave in the same way.m. and give ﬁnal answers. Give both to an accuracy of one decimal place. the liquid temperature changes instantly from whatever value it has. the ﬁle also deﬁnes n. thermally speaking). which must include the optimal value of P and the resulting optimal espresso temperature when it is consumed.3◦ C. respectively. espresso and preheat water temperatures Ta (which is 20). as in ‘P = 23. of course.At t = P . and.

node 2 does not know any of the absolute clock oﬀsets x1 . n. while xi < 0 means the ith clock is running behind the standard clock. the nodes exchange communications messages. The shift or oﬀset of clock i. ea = limk→∞ (1 + a/k)k . Thus we have xi (t + 1) = xi (t) + ai (t). n. Suppose w is a left eigenvector of A ∈ Rn×n with real negative eigenvalue λ. Speciﬁcally.1 Left eigenvector properties. . . . For example. 11. The set { z | α ≤ wT z ≤ β } is referred to as a slab. We refer to one node as a neighbor of another if they are connected by a link.2 Consider the linear dynamical system x = Ax where A ∈ Rn×n is diagonalizable with eigenvalues ˙ λi . (a) Find a simple expression for wT eAt . . 6. Thus xi > 0 means the clock at node i is running in advance of the standard clock. (c) Show that the slab { z | 0 ≤ wT z ≤ β } is invariant for x = Ax. (But remember. . . At discrete intervals. The clocks run at the same speed. NIST’s atomic clocks or the clock for the GPS system) will be denoted xi . 2 . with respect to some absolute clock (e. (b) Let α < β. .Lecture 11 – Eigenvectors and diagonalization 11. eigenvectors vi . . and left eigenvectors wi for i = 1.3 Another formula for the matrix exponential. with communication links shown as lines between the nodes. Describe the trajectories qualitatively. At each interval.4 Synchronizing a communication network. but are not (initially) synchronized. we introduce the numbers xi only so we can analyze the system. . 11. Brieﬂy explain this terminology.. Then it computes 2 90 . node 2 is able to ﬁnd out the diﬀerences x1 − x2 and x5 − x2 . . . Assume that λ1 > 0 and ℜλi < 0 for i = 2. An engineer suggests the following scheme of adjusting the clock oﬀsets. 1 4 3 5 6 Each node has a clock. Through this exchange each node is able to ﬁnd out the relative time oﬀset of its own clock compared to the clocks of its neighboring nodes. or x5 . you can assume A is diagonalizable. You will establish the matrix analog: for any A ∈ Rn×n . 
which we denote t = 0. Hint: diagonalize. You might remember that for any complex number a ∈ C. what happens to x(t) as t → ∞? Give the answer geometrically. which are labeled 1. Draw a picture in R2 . The new oﬀset takes eﬀect at the next interval.g. where ai (t) is the adjustment made by the ith node to its clock in the tth interval. ˙ 11. The nodes do not know their own clock oﬀsets (or the oﬀsets of any of the other clocks). it is able to adjust it by adding a delay or advance to it. eA = lim (I + A/k)k . k→∞ To simplify things. x2 . . each node determines its relative oﬀset with each of its neighboring nodes. 1. . in terms of x(0). The graph below shows a communication network. ..) While node i does not know its absolute oﬀset xi .
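The clock-adjustment schemes described here can be simulated directly. The sketch below, in Python/numpy, uses a hypothetical connected 6-node topology consistent with node 2 having neighbors 1 and 5 (the exact edge list of the figure is not reproduced here), and runs the modified scheme in which each node adjusts by half the average offset with its neighbors:

```python
import numpy as np

# Hypothetical edge list for a connected 6-node network; node 2's neighbors
# include nodes 1 and 5, as in the problem statement.
edges = [(1, 2), (2, 5), (1, 4), (3, 4), (4, 5), (5, 6), (3, 6)]
n = 6
nbrs = {i: [] for i in range(1, n + 1)}
for i, j in edges:
    nbrs[i].append(j); nbrs[j].append(i)

def step(x, gain):
    # Each node adds gain * (average relative offset of its neighbors).
    a = np.array([gain * np.mean([x[j - 1] - x[i - 1] for j in nbrs[i]])
                  for i in range(1, n + 1)])
    return x + a

rng = np.random.default_rng(4)
x = rng.standard_normal(n)     # unknown initial clock offsets
for _ in range(500):
    x = step(x, 0.5)           # modified scheme: adjust by half the average
spread = x.max() - x.min()
```

With gain 1/2 the update is x(t+1) = (I/2 + P/2)x(t), where P is the neighbor-averaging matrix; on a connected graph all its non-consensus eigenvalues have magnitude less than one, so the offsets synchronize with each other (the spread goes to zero), though not necessarily with the standard clock.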

. e. Thus. so 0 < sk < 1. but you must explain what you are doing and why.e. Another engineer suggests a modiﬁcation of the scheme described above.2 bothers you. . The birth rate depends only on age. all people who make it to age n − 1 die during the year.) The death rate coeﬃcients satisfy 0 < di < 1.) 11. To avoid this. . . The birth rate coeﬃcients satisfy bi ≥ 0. this means: a2 (t) = 1 (x1 (t) − x2 (t)) + (x5 (t) − x2 (t)) . do all xi (t) − xj (t) converge to zero as t → ∞)? Does the system become synchronized no matter what the initial oﬀsets are.. n. . xi (t) is the number of people at year t. . For example. and not on time t. but we won’t make that explicit assumption. and x1 (t) (the number of 0 year-olds) denotes the number of people born since the last census. so synchronization does not occur.) The total population at year t is given by 1T x(t). n − 1. of age i − 1. • Birth rate. . where 1 ∈ Rn is the vector with all components 1. for node 2 we would have the adjustment a2 (t) = Finally. Speciﬁcally. . (a) What happens? (b) Why? We are interested in questions such as: do all the clocks become synchronized with the standard clock (i. We’ll assume that at least one of the bk ’s is positive. . Thus we have. or only for some initial oﬀsets? You are welcome to use matlab to do some relevant numerical computations. you can imagine the units as millions. (If x3 (4) = 1. • Death rate. . . i. 2 2 (x1 (t) − x2 (t)) + (x5 (t) − x2 (t)) . xk+1 (t + 1) = (1 − dk )xk (t).the average of these relative oﬀsets. the question. for node 2. . say. .. age below 11 and over 60. she proposes to adjust each node’s clock by only half the average oﬀset with its neighbors. In this problem we will study how some population distribution (say. of people) evolves over time. and not on time t. We assume n is large enough that no one lives to age n. We will not accept simulations of the network as an explanation. n − 1. 
2 (c) Would you say this scheme is better or worse than the original one described above? If one is better than the other. The vector x(t) ∈ Rn will give the population distribution at year t (on some ﬁxed census date. at year 3. (Of course you’d expect that bi would be zero for non-fertile ages. k = 1. The node then adjusts its oﬀset by this average. .. denote time in years (since the beginning of the study). . i = 1. . x(t) → 0 as t → ∞)? Do the clocks become synchronized with each other (i.. 1. The death rate depends only on age. for t = 0. . and treat them as real numbers. n − 1. using a discrete-time linear dynamical system model. k = 1. e. (As mentioned above. Let t = 0. .) 91 ..e. . We deﬁne the survival rate coeﬃcients as sk = 1 − dk . Thus x5 (3) denotes the number of people of age 4. etc. how is it better? (For example. She notes that if the scheme above were applied to a simple network consisting of two connected nodes.e. we assume that dn = 1. 1. We’ll also ignore the fact that xi are integers. The coeﬃcient bi is the fraction of people of age i − 1 who will have a child during the year (taking into account multiple births).g. ..g. The coeﬃcient di is the fraction of people of age i − 1 who will die during the year. January 1).5 Population dynamics. . then the two nodes would just trade their oﬀsets each time interval. does it achieve synchronization from a bigger set of initial oﬀsets. . does it achieve synchronization faster. Thus the total births during a year is given by x1 (t + 1) = b1 x1 (t) + · · · + bn xn (t). i = 1.
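The two adjustment schemes are easy to compare numerically. A sketch in Python/NumPy (rather than matlab), on the two-node network mentioned in the discussion; the initial offsets are made up:

```python
import numpy as np

# Full averaging: each node adds the average of its relative offsets with
# its neighbors.  On a two-node network this just swaps the offsets forever.
A_full = np.array([[0.0, 1.0],
                   [1.0, 0.0]])   # x(t+1) = A_full x(t)
# Half averaging (the modified scheme): add half that average; here the two
# nodes agree after one step (on their mean offset, not on the standard clock).
A_half = np.array([[0.5, 0.5],
                   [0.5, 0.5]])

x0 = np.array([1.0, -1.0])        # made-up initial clock offsets
x_full = np.linalg.matrix_power(A_full, 11) @ x0   # odd number of steps: swapped
x_half = A_half @ x0                                # offsets equalized
```

For larger networks the same comparison amounts to looking at the eigenvalues of the two update matrices.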

s1 · · · sk make it to age k. with ﬁve symbols 1.) (f) All the eigenvalues of A are real. x(t + 1) = Ax(t) and z(t + 1) = Az(t). (b) Draw a block diagram of the system found in part (a). If 1T x(0) > 1T z(0). ﬁnd ˜ ˜ a matrix A such y(t + 1) = Ay(t). and assume that all components of x(0) and z(0) are positive. . For each person born. i. then 1T x(t) > 1T z(t) for t = 1. 5. Therefore we don’t have to worry about negative xi (t). Find the rate of this code. and by ‘false’ we mean false for some choice of coeﬃcients consistent with our assumptions. The number R = lim log2 KN N (g) If dk ≥ bk for k = 1. so long as our initial population distribution x(0) has all positive components. the birth and survival rate coeﬃcients). (In words: we consider two populations that satisfy the same dynamics. Consider the Markov language described in exercise 12.) (a) Express the population dynamics model described above as a discrete-time linear dynamical system. .e. n. i. That is. . n. the ‘average’ birth rate is less than the ‘average’ death rate. n. then xi (t) > 0 for i = 1. (Of course by ‘true’ we mean true for any values of the coeﬃcients consistent with our assumptions. . 92 .. . (c) Find the characteristic polynomial of the system explicitly in terms of the birth and death rate coeﬃcients (or. if you prefer. Then the population that is initially larger will always be larger. ﬁnd a matrix A such that x(t + 1) = Ax(t). .) (e) Let x and z both satisfy our population dynamics model. .. 3. . (To use fancy language we’d say the system is positive orthant invariant. Then 1T x(t) → 0 as t → ∞. Compare it to the rate of the code which consists of all sequences from an alphabet of 5 symbols (i. 11. . . s1 s2 make it to age 2. . Let KN denote the number of allowed sequences of length N . with no restrictions on which symbols can follow which symbols). That is. 4.. 2. Determine whether each of the next four statements is true or false. 
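The population model can be assembled as a discrete-time LDS as follows; a minimal Python/NumPy sketch with made-up birth and death rates for a 4-age-group model (a real census model would use many more age groups):

```python
import numpy as np

# Made-up rate data: b_i are birth rates, d_i death rates, with d_n = 1
# (no one lives past age n).
b = np.array([0.0, 1.2, 0.8, 0.0])
d = np.array([0.1, 0.2, 0.3, 1.0])
n = len(b)
A = np.zeros((n, n))
A[0, :] = b                          # births: x_1(t+1) = b_1 x_1(t) + ... + b_n x_n(t)
for k in range(n - 1):
    A[k + 1, k] = 1.0 - d[k]         # survival: x_{k+1}(t+1) = (1 - d_k) x_k(t)

x = np.ones(n)                       # positive initial distribution
for _ in range(50):
    x = A @ x                        # stays positive, as claimed in the problem
rho = max(abs(np.linalg.eigvals(A))) # dominant eigenvalue governs growth/extinction
```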
and the following symbol transition rules: • 1 must be followed by 2 or 3 • 2 must be followed by 2 or 5 • 3 must be followed by 1 • 4 must be followed by 4 or 2 or 5 • 5 must be followed by 1 or 3 (a) The rate of the code.e. i. . and in general. then 1T x(t) → 0 as t → ∞. (d) Survival normalized variables. N →∞ (if it exists) is called the rate of the code. in bits per symbol. s1 make it to age 1. .e.The assumptions imply the following important property of our model: if xi (0) > 0 for i = 1. . (h) Suppose that (b1 + · · · + bn )/n ≤ (d1 + · · · dn )/n. 2. .e. the population goes extinct. We deﬁne yk (t) = xk (t) s1 · · · sk−1 (with y1 (t) = x1 (t)) as new population variables that are normalized to the survival rate.6 Rate of a Markov code... Express the population dynamics as a linear dynamical system using the variable y(t) ∈ Rn .
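The counting behind the rate can be organized with the symbol-transition matrix: K_N is a sum of entries of its (N − 1)st power, and the rate is log2 of its dominant eigenvalue. A Python/NumPy sketch encoding the rules above:

```python
import numpy as np

# T[i, j] = 1 if symbol j+1 may follow symbol i+1, per the listed rules.
T = np.array([[0, 1, 1, 0, 0],    # 1 -> 2 or 3
              [0, 1, 0, 0, 1],    # 2 -> 2 or 5
              [1, 0, 0, 0, 0],    # 3 -> 1
              [0, 1, 0, 1, 1],    # 4 -> 4, 2 or 5
              [1, 0, 1, 0, 0]],   # 5 -> 1 or 3
             dtype=np.int64)
ones = np.ones(5, dtype=np.int64)

def K(N):
    # number of allowed length-N sequences = number of length-(N-1) paths
    return ones @ np.linalg.matrix_power(T, N - 1) @ ones

rate = np.log2(max(abs(np.linalg.eigvals(T))))   # bits per symbol
# for comparison, unrestricted 5-symbol sequences have rate log2(5) ~ 2.32
```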

(b) Asymptotic fraction of sequences with a given starting or ending symbol. Let FN,i denote the number of allowed sequences of length N that start with symbol i, and let GN,i denote the number of allowed sequences of length N that end with symbol i. Thus we have FN,1 + · · · + FN,5 = GN,1 + · · · + GN,5 = KN. Find the asymptotic fractions fi = limN→∞ FN,i/KN, gi = limN→∞ GN,i/KN. Please don’t find your answers by simple simulation or relatively mindless computation; we want to see (and understand) your method.

11.7 Companion matrices. A matrix A of the form

A = [ −a1  −a2  · · ·  −an−1  −an ]
    [  1    0   · · ·    0     0  ]
    [  0    1   · · ·    0     0  ]
    [  ·    ·    · ·     ·     ·  ]
    [  0    0   · · ·    1     0  ]

is said to be a (top) companion matrix. There can be four forms of companion matrices depending on whether the ai’s occur in the first or last row, or first or last column. These are referred to as top-, bottom-, left-, or right-companion matrices. Let ẋ = Ax where A is top-companion.

(a) Draw a block diagram for the system ẋ = Ax.
(b) Find the characteristic polynomial of the system using the block diagram and show that A is nonsingular if and only if an ≠ 0.
(c) Show that if A is nonsingular, then A−1 is a bottom-companion matrix with last row −[1 a1 · · · an−1]/an.
(d) Find the eigenvector of A associated with the eigenvalue λ.
(e) Suppose that A has distinct eigenvalues λ1, . . . , λn. Find T such that T −1 AT is diagonal.

11.8 Squareroot and logarithm of a (diagonalizable) matrix. Suppose that A ∈ Rn×n is diagonalizable, i.e., an invertible matrix T ∈ Cn×n and diagonal matrix Λ ∈ Cn×n exist such that A = T ΛT −1. Let Λ = diag(λ1, . . . , λn).

(a) We say B ∈ Rn×n is a squareroot of A if B^2 = A. Let µi satisfy µi^2 = λi. Show that B = T diag(µ1, . . . , µn)T −1 is a squareroot of A. A squareroot is sometimes denoted A1/2 (but note that there are in general many squareroots of a matrix). When the λi are real and nonnegative, it is conventional to take µi = √λi (i.e., the nonnegative squareroot), so in this case A1/2 is unambiguous.
(b) We say B is a logarithm of A if eB = A, and we write B = log A. Following the idea of part (a), find an expression for a logarithm of A (which you can assume is invertible). Is the logarithm unique? What if we insist on B being real?

11.9 Separating hyperplane for a linear dynamical system. A hyperplane (passing through 0) in Rn is described by the equation cT x = 0, where c ∈ Rn is nonzero. (Note that if β ≠ 0, the vector c̃ = βc defines the same hyperplane.) Now consider the autonomous linear dynamical system ẋ = Ax, where A ∈ Rn×n and x(t) ∈ Rn. We say that the hyperplane defined by c is a separating hyperplane for this system if no trajectory of the system ever crosses the hyperplane. This means it is impossible to have cT x(t) > 0 for some t, and cT x(t̃) < 0 for some other t̃, for any trajectory x of the system. Explain how to find all separating hyperplanes for the system ẋ = Ax. In particular, give the conditions on A under which there is no separating hyperplane. (If you think there is always a separating hyperplane for a linear system, say so.) You can assume that A has distinct eigenvalues (and therefore is diagonalizable).
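A quick numerical check of the squareroot/logarithm construction of exercise 11.8, in Python/NumPy with a made-up 2 × 2 diagonalizable matrix (eigenvalues 6 and 1):

```python
import numpy as np

# For diagonalizable A = T Lambda T^{-1} with positive eigenvalues,
# B = T diag(sqrt(lam_i)) T^{-1} is a squareroot and
# L = T diag(log(lam_i)) T^{-1} is a (real) logarithm.
A = np.array([[5.0, 4.0],
              [1.0, 2.0]])          # hypothetical example, eigenvalues 6 and 1
lam, T = np.linalg.eig(A)
Tinv = np.linalg.inv(T)
B = T @ np.diag(np.sqrt(lam)) @ Tinv   # one of several squareroots
L = T @ np.diag(np.log(lam)) @ Tinv    # one logarithm
```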

and the angle between any pair of the vectors is θ. that satisﬁes the total energy constraint. . independent eigenvectors v1 . suppose that x1 . .10 Equi-angle sets. It’s easy to ﬁnd such sets: just take x1 = 1 0 . .g. we do know real. i.e. for any other input sequence u(0).11.) 94 . 11. and that it has a single dominant eigenvalue (which here.g. We say that they form a (normalized) equi-angle set. . In R2 . we know the answer: any value of θ between 0 and π is possible. You can use any of the ideas from the class. Attach matlab output to verify that your four vectors are unit vectors. In other words. u(T − 1) that maximize the norm of the state for large t. i = 1. xn is an equi-angle set in Rn . . with n > 2. and (xi . you should not use limits in your solution. Unless you have to. you might ﬁnd a clever way to ﬁnd all the angles at once. and then say. what is the maximum possible value θ can have? (b) Construct a speciﬁc equi-angle set in R4 for angle θ = 100◦ = 5π/9. SVD. Also. Then we have x2 = −x1 (since (x1 . . T − 1. e1 . . ˜ e. x3 ) = π). . Let x1 . Here the matrix E is measurement error. The input is subject to a total energy constraint: u(0) 2 + · · · + u(T − 1) 2 ≤ 1. where x(t) ∈ Rn . . Ameas = A + E. . but we do have a noisy measurement of it.11 Optimal control for maximum asymptotic growth. however. n. with angle θ. eigenvector decomposition. and.. . etc. satisﬁes ˜ ˜ x(t) ≥ x(t) for t large enough. you only have to check 6 angles. . We consider the controllable linear system x(t + 1) = Ax(t) + Bu(t). You can assume that A is diagonalizable.. . . . u(T − 1) is applied over time period 0. we’re searching for u(0). The question then arises. In Rn . are not known. so (x2 . means that there is one eigenvalue with largest magnitude). i. there are equi-angle sets for every value of θ.. with θ = π/2. . xn ∈ Rn . . . n. e. (Since (u. .) 11. j = 1. . but also x3 = −x1 (since (x1 . u). if xi = 1. . . however. . pseudo-inverse. . xj ) = θ. 
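One way to organize part (b) is through the Gram matrix: the desired inner products are Gii = 1 and Gij = cos θ, and any factorization G = XT X yields the vectors. A Python/NumPy sketch for n = 4, θ = 100°; it also exhibits the condition 1 + (n − 1) cos θ ≥ 0 needed for G to be positive semidefinite:

```python
import numpy as np

# Target Gram matrix of an equi-angle set: ones on the diagonal,
# cos(theta) off the diagonal.  Its eigenvalues are 1 - c (multiplicity
# n-1) and 1 + (n-1)c, so it is positive definite here, and a Cholesky
# factorization gives unit vectors x_1,...,x_n with pairwise angle theta.
n = 4
theta = np.deg2rad(100)
c = np.cos(theta)
G = (1 - c) * np.eye(n) + c * np.ones((n, n))
X = np.linalg.cholesky(G).T        # columns are the x_i
gram = X.T @ X                      # should reproduce G
```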
The goal is to choose the inputs u(0). x(0) = 0. . . (a) For general n. . An orthonormal set is a familiar example of an equi-angle set. (Its eigenvalues λ1 . we have u(t) = 0. . . u(T − 1) that satisﬁes the total energy constraint. each of the vectors has unit norm. . you can’t have an equi-angle set with angle θ = π. . . x3 ) = 0. . . en . v) = (v. describe the values of θ for which there is an equi-angle set with angle θ. 1. To see this. This problem is about estimating a matrix A ∈ Rn×n . for every value of θ there is an equi-angle set with angle θ. We’ll take θ to be between 0 and π.12 Estimating a matrix with known eigenvectors. . x2 = cos θ sin θ . . with n > 2. To be more precise. . λn . x2 ) = π). For example you cannot explain how to make x(t) as large as possible (for a speciﬁc value of t). In particular. Explain how to do this. and so does θ = π/2 (just choose any orthonormal basis. . “Take the limit as t → ∞” or “Now take t to be really large”. for what values of θ (between 0 and π) can you have an equi-angle set on Rn ? The angle θ = 0 always has an equi-angle set (just choose any unit vector u and set x1 = · · · = xn = u). The matrix A is not known. While A is not known. An input u(0). u(T − 1). for t ≥ T . . . . . Be sure to summarize your ﬁnal description of how to solve the problem. . But what other angles are possible? For n = 2. . vn of A. i = j. . . . . which is assumed to be small. and that the angle between any two of them is 100◦ . u(t) ∈ Rm .
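A sketch of one approach to the input-design idea: since x(T) = CT u (with the inputs stacked and subject to the unit energy constraint), the choice that maximizes the coefficient of the dominant eigenvector in x(T) is u aligned with CT^T w1, where w1 is the corresponding left eigenvector. Python/NumPy with a made-up system; this illustrates the alignment step only, not a full solution:

```python
import numpy as np

# Hypothetical 2-state system with dominant eigenvalue 1.2.
A = np.array([[1.2, 1.0],
              [0.0, 0.5]])
B = np.array([[0.0],
              [1.0]])
T = 5
# x(T) = CT u with u = (u(T-1), ..., u(0)) stacked.
CT = np.hstack([np.linalg.matrix_power(A, t) @ B for t in range(T)])
lam, Vr = np.linalg.eig(A)
i = int(np.argmax(abs(lam)))
w1 = np.linalg.inv(Vr)[i].real        # left eigenvector of the dominant eigenvalue
u = CT.T @ w1
u /= np.linalg.norm(u)                # optimal stacked input, ||u|| = 1
```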

. We conclude that if λ is an eigenvalue of A.) ˆ (a) Explain how you would ﬁnd A.0 Ameas = 0. then reshape(a. (You can assume that we only consider values of u for which this series converges. eigenvalue decomposition. (The entries of A will be drawn from a unit normal distribution. we choose A J= 1 n2 n i. generate a random 3 × 3 matrix.) In matlab. If your method is iterative.6 .e.. give the value of J for A. with elements taken from a (column by column). least-norm.j=1 ˆ (Ameas − Aij )2 ij ˆ among all matrices which have eigenvectors v1 . (b) Carry out your method 2.e. Hint. You can use any of the methods (least-squares.6 v3 = 0. Also. Gauss-Newton.n) is an m × n matrix..9 1.) Find the eigenvalues of A.5 . then ﬁnding its eigenvalues. we deﬁne f (A) as f (A) = a0 I + a1 A + a2 A2 + · · · (again. A is the matrix closest to our measurement. computes S −1 AS to verify it has the required form). and also by using the spectral mapping theorem.7 0.) or decompositions (QR.ˆ We will combine our measurement of A with our prior knowledge to ﬁnd an estimate A of A. For example. where v = 0. and λ ∈ C. Show that f (A)v = f (λ)v (ignoring the issue of convergence of series). then f (λ) is an eigenvalue of f (A). 0. i. written out column by column. Suppose that Av = λv. This is called the spectral mapping theorem.3 11. low rank approximation. . writing a vector out as a matrix with some given dimensions. 0.2 −1. To illustrate this with an example.0 2. the source code that you use to ﬁnd S. we’ll just assume that this converges). please generate a new instance of A. (You should get very close agreement. 0.6 . Find the real modal form of A. . Your solution should include a clear explanation of how you will ﬁnd S. You might ﬁnd the following useful (but then again. or an approximate solution. and some code that checks the results (i. a matrix S such that S −1 AS has the real modal form given in lecture 11.e.7 v1 = 0 .. 
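For exercise 11.12, the matrices with prescribed eigenvectors v1, . . . , vn form a subspace spanned by the rank-one matrices vi wi^T (where wi^T are the rows of V −1), so minimizing J is an ordinary least-squares problem in the eigenvalues. A Python/NumPy sketch with random stand-in data (not the v1, v2, v3 and Ameas given above):

```python
import numpy as np

# Any matrix with eigenvectors v_1..v_n has the form sum_i lam_i v_i w_i^T,
# so the Frobenius-norm fit to Ameas is least-squares in lam.
rng = np.random.default_rng(2)
n = 3
V = rng.standard_normal((n, n))       # columns: the known eigenvectors (stand-in)
W = np.linalg.inv(V)
Ameas = rng.standard_normal((n, n))   # stand-in for the noisy measurement
basis = np.stack([np.outer(V[:, i], W[i]).ravel() for i in range(n)], axis=1)
lam, *_ = np.linalg.lstsq(basis, Ameas.ravel(), rcond=None)
Ahat = V @ np.diag(lam) @ W           # exact minimizer of J (up to roundoff)
```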
Therefore norm(A(:)) gives the squareroot of the sum of the squares of entries of the matrix A. 0. Find the eigenvalues of (I + A)(I − A)−1 by ﬁrst computing this matrix.4 −0.3 v2 = 0. v2 . any diﬀerence is due to numerical round-oﬀ errors in the various compuations. that is consistent with the known eigenvectors. then A(:) is a (column) vector consisting of all the entries of A. for example using A=randn(3). i.14 Spectral mapping theorem. 11. of course. say whether you can guarantee convergence. 0.m. in the mean-square sense. its Frobenius norm. v3 as eigenvectors. SVD. Suppose f : R → R is analytic. etc.0 −0.) For A ∈ Rn×n .0 ˆ Be sure to check that A does indeed have v1 . Be sure to say whether your method ﬁnds the exact minimizer of J (except.e. vn .5 with the data 1. is done using the function reshape. . for numerical error due to roundoﬀ).) 95 . The inverse operation.13 Real modal form. given by a power series expansion f (u) = a0 + a1 u + a2 u2 + · · · (where ai = f (i) (0)/(i!)). by (numerically) ﬁnding its ˆ eigenvectors and eigenvalues. To do ˆ as the matrix that minimizes this. i.e. i. etc. you might not. .) from the course.. If by chance they are all real. if a is an mn vector. Generate a matrix A in R10×10 using A=randn(10).7 0. if A is a matrix. (Thus.
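A numerical illustration of the spectral mapping theorem for f(u) = (1 + u)/(1 − u), in Python/NumPy with a random A scaled so that 1 is not an eigenvalue:

```python
import numpy as np

# The eigenvalues of (I + A)(I - A)^{-1} should be (1 + lam_i)/(1 - lam_i)
# for the eigenvalues lam_i of A; any difference is numerical roundoff.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) / 10
lam = np.linalg.eigvals(A)
M = (np.eye(n) + A) @ np.linalg.inv(np.eye(n) - A)
direct = np.sort_complex(np.linalg.eigvals(M))   # computed directly
mapped = np.sort_complex((1 + lam) / (1 - lam))  # via the spectral mapping theorem
```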

across n sectors. Show that if λ ∈ C is a nonzero eigenvalue of AB.2 0. This total revenue is then spent in the sectors proportionally. encouraged) to simulate the economic activity to double-check your answer. at the beginning of year t.2 Find the growth rate. but we want the value using the expression found in part (a). Express the growth rate G in terms of the problem data r. which accounts for the government taxes and spending. ˜ i = 1. n. The total government revenue is then R(t) = rT x(t). including the eﬀects of government taxes and spending. We let S(t) = n i=1 x(t)i denote the total economic activity in year t. 11.1 0. s.15 Eigenvalues of AB and BA.2 0.2 0. Economic activity propagates from year to year as x(t + 1) = E x(t). r = 0 (in which case s doesn’t matter). .e.. which will imply that all entries of x(t) and x(t) are positive. These rates all satisfy 0 ≤ ri < 1. across n sectors.25 r= 0. .3 s= 0. The post-tax economic activity in sector i.and post-tax activity levels are related as follows. with ri the tax rate for sector i.4 0. which we assume (for simplicity) takes place at the beginning of each year. where E ∈ Rn×n is the input˜ output matrix of the economy. .15 96 . ˜ The pre. Hint: Suppose that ABv = λv. We will assume that all entries of x(0) are positive. These spending proportions n satisfy si ≥ 0 and i=1 si = 1. We let x(t) ∈ Rn denote the post-tax economic ˜ activity. the spending in sector i is si R(t). You may assume that a matrix that arises in your analysis is diagonalizable and has a single dominant eigenvalue. using ideas from the course. with x(t)i being the pre-tax activity level in sector i.) (b) Consider the problem instance with data 0. .e. 1. which we rule out as essentially impossible). for all t ≥ 0. .3 0. .6 0. and E. . Conclude that the nonzero eigenvalues of AB and BA are the same.3 0. The government taxes the sector activities at rates given by r ∈ Rn . an eigenvalue λ1 that satisﬁes |λ1 | > |λi | for i = 2.7 0. 
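The claim in exercise 11.15 is easy to check numerically (recall the hint: if ABv = λv with λ ≠ 0, then w = Bv satisfies BAw = λw). A Python/NumPy sketch with random non-square factors:

```python
import numpy as np

# AB is 3x3 and BA is 5x5; their nonzero eigenvalues coincide, and BA
# picks up two extra zero eigenvalues.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))
eAB = np.linalg.eigvals(A @ B)
eBA = np.linalg.eigvals(B @ A)
```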
Let x(t) ∈ Rn represent the pre-tax economic activity at the beginning of year t. You are welcome (even. (These assumptions aren’t actually needed—they’re just to simplify the problem for you. . . . . 0. In this problem we explore a dynamic model of an economy.2 E= 0.1 0. You can assume that all entries of E are positive. .16 Tax policies.. where v = 0.45 0. is then given by x(t)i = x(t)i − ri x(t)i + si R(t). with s ∈ Rn giving the spending proportions in the sectors.3 0.11. n. λ = 0. t = 0. i. i. 0. 0.4 . Construct a w = 0 for which BAw = λw. then λ is also an eigenvalue of BA. and we let G = lim S(t + 1) S(t) t→∞ denote the (asymptotic) growth rate (assuming it exceeds one) of the economy. Now ﬁnd the growth rate with the tax rate set to zero.15 0.1 0. (a) Explain why the growth rate does not depend on x(0) (unless it exactly satisﬁes a single linear equation. .4 0.1 0. Suppose that A ∈ Rm×n and B ∈ Rn×m .1 .2 0.
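For part (a), combining the tax, spending, and propagation steps gives x(t + 1) = M x(t) with M = E(I − diag(r) + s rT), and the growth rate is M’s dominant (Perron) eigenvalue, independent of x(0). The numeric data above is garbled by extraction, so the Python/NumPy sketch below uses made-up positive data:

```python
import numpy as np

# One combined year of the economy: tax r_i x_i is collected, redistributed
# as s_i (r^T x), and E then propagates the post-tax activity.
rng = np.random.default_rng(1)
n = 4
E = rng.uniform(0.1, 0.5, (n, n))     # made-up positive input-output matrix
r = rng.uniform(0.05, 0.3, n)         # made-up tax rates, 0 <= r_i < 1
s = rng.uniform(0.1, 1.0, n)
s /= s.sum()                           # spending proportions sum to 1
M = E @ (np.eye(n) - np.diag(r) + np.outer(s, r))
G = max(np.linalg.eigvals(M).real)     # Perron eigenvalue of the positive matrix M
# cross-check against a simulation of S(t+1)/S(t)
x = np.ones(n)
for _ in range(200):
    x = M @ x
    x /= np.linalg.norm(x)             # normalize to avoid over/underflow
ratio = (np.ones(n) @ (M @ x)) / (np.ones(n) @ x)
```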

Lecture 12 – Jordan canonical form

12.1 Some true/false questions. Determine if the following statements are true or false. No justification or discussion is needed for your answers. What we mean by “true” is that the statement is true for all values of the matrices and vectors that appear in the statement. You can’t assume anything about the dimensions of the matrices (unless it’s explicitly stated), but you can assume that the dimensions are such that all expressions make sense. For example, the statement “A + B = B + A” is true, because no matter what the dimensions of A and B are (they must, however, be the same), and no matter what values A and B have, the statement holds. As another example, the statement A2 = A is false, because there are (square) matrices for which this doesn’t hold. (There are also matrices for which it does hold, e.g., an identity matrix. But that doesn’t make the statement true.) “False” means the statement isn’t true, in other words, it can fail to hold for some values of the matrices and vectors that appear in it. Mark your answer clearly, to isolate it from any (brief) discussion or explanation.

(a) If A ∈ Rm×n and B ∈ Rn×p are both full rank, and AB = 0, then n ≥ m + p.
(b) If A ∈ R3×3 satisfies A + AT = 0, then A is singular.
(c) If Ak = 0 for some integer k ≥ 1, then I − A is nonsingular.
(d) If A, B ∈ Rn×n are both diagonalizable, then AB is diagonalizable.
(e) If A, B ∈ Rn×n, then every eigenvalue of AB is an eigenvalue of BA.
(f) If A, B ∈ Rn×n, then every eigenvector of AB is an eigenvector of BA.
(g) If A is nonsingular and A2 is diagonalizable, then A is diagonalizable.

12.2 Consider the discrete-time system x(t + 1) = Ax(t), where x(t) ∈ Rn and A ∈ Rn×n.

(a) Find x(t) in terms of x(0).
(b) Suppose that det(zI − A) = z^n. What are the eigenvalues of A? What (if anything) can you say about x(k) for k < n and k ≥ n, without knowing x(0)?

12.3 Asymptotically periodic trajectories. We say that x : R+ → Rn is asymptotically T -periodic if ∥x(t + T ) − x(t)∥ converges to 0 as t → ∞. (We assume T > 0 is fixed.) Now consider the (time-invariant) linear dynamical system ẋ = Ax. Describe the precise conditions on A under which all trajectories of ẋ = Ax are asymptotically T -periodic. Give your answer in terms of the Jordan form of A. (The period T can appear in your answer.) Make sure your answer works for ‘silly’ cases like A = 0 (for which all trajectories are constant, hence asymptotically T -periodic), or stable systems (for which all trajectories converge to 0, hence are asymptotically T -periodic). You do not need to formally prove your answer; a brief explanation will suffice.

12.4 Jordan form of a block matrix. We consider the block 2 × 2 matrix

C = [ A  I ]
    [ 0  A ].

Here A ∈ Rn×n, with real, distinct eigenvalues λ1, . . . , λn. We’ll let v1, . . . , vn denote (independent) eigenvectors of A associated with λ1, . . . , λn.

(a) Find the Jordan form J of C. Be sure to explicitly describe its block sizes.
(b) Find a matrix T such that J = T −1 CT.
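A numerical note on 12.2(b): det(zI − A) = z^n forces all eigenvalues to zero, so by Cayley–Hamilton A^n = 0 and every trajectory of x(t + 1) = Ax(t) reaches zero by step n, while for k < n nothing can be said in general. A Python/NumPy sketch with a single Jordan block:

```python
import numpy as np

# Jordan block with eigenvalue 0: characteristic polynomial z^n, A^n = 0.
# Starting from x(0) = e_n, the state stays nonzero until exactly step n.
n = 4
Aj = np.diag(np.ones(n - 1), 1)     # ones on the superdiagonal
x = np.zeros(n)
x[-1] = 1.0                          # x(0) = e_n
traj = [x.copy()]
for _ in range(n):
    x = Aj @ x
    traj.append(x.copy())
```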

state. t = 0.9 −0. Consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t).3 . You can assume all the matrices are the correct (i.1 Interconnection of linear systems.0 −0. w2 = Cx + D1 u + D2 w1 . B = A= 0 0. . 13. . 2.7 0. The initial state is x(0) = 0.7 0. respectively. ˙ and subsystem T is characterized by z = F z + G1 v + G2 w2 .1 0.4 −0. (a) Find the matrix CT such that u(T − 1) . and feedthrough matrices of the overall system.8 −0. The dynamics and . Note that the subscripts in the matrices above. as in B1 and B2 . Subsystem S is characterized by x = Ax + B1 u + B2 w1 . compatible) dimensions. y = H2 z + Jw2 . which are themselves linear systems. go ahead. x z .1 0 0. and output given by u v . .Lecture 13 – Linear dynamical systems with inputs and outputs 13. 1.2 Minimum energy control. there are two subsystems S and T .5 98 n = 4. Express the overall system as a single linear dynamical system with input. We don’t specify the dimensions of the signals (which can be vectors) or matrices here. ˙ w1 = H1 z. . If you need to make any assumptions about the rank or invertibility of any matrix you encounter in your derivations.. . Be sure to explicitly give the input. In this problem you consider the speciﬁc interconnection shown below: w1 w2 u S v T y Here. u(1) u(0) (b) For the remainder of this problem. Often a linear system is described in terms of a block diagram showing the interconnections between components or subsystems. But be sure to let us know what assumptions you are making. .5 0. where x(t) ∈ Rn . and the input u(t) is a scalar (hence. y. refer to diﬀerent matrices.e. output. x(T ) = CT .4 −0. we consider a speciﬁc system with input matrices are 1 0.1 0. dynamics.7 −0. Now the problem. A ∈ Rn×n and B ∈ Rn×1 ).6 0.5 1 0.

ﬁnd the minimum energy inputs that achieve x(T ) = xdes . u(T − 1) is E(T ) = T −1 t=0 (u(t)) . We’ll cover these ideas in more detail in ee363.3 Output feedback for maximum damping. Note: There is a direct way of computing the assymptotic limit of the minimum energy as T → ∞. (c) Suppose the energy expended in applying inputs u(0). such that x(T ) = xdes for any xdes ∈ R4 ? We’ll denote this T by Tmin .9 99 For a given T (greater than Tmin ) and xdes . ¯ x(t + 1) = Ax(t). Show that this is the case in general (i.Suppose we want the state to be xdes at time T .2 0.5 1. (You should include in your solution a description of how you computed the minimum-energy inputs.1 . The resulting state trajectory is identical to that of an autonomous system. and the plot of the minimum energy as a function of T . But you don’t need to list the actual inputs you computed!) (d) You should observe that Emin (T ) is non-increasing in T . Consider the desired state 0. u(t) = Ky(t). single-output system with 1 0. where K ∈ Rm×p is the feedback gain matrix. In output feedback control we use an input which is a linear function of the output. (b) Consider the single-input. . and xdes ).8 2.5 −0. ¯ (a) Write A explicitly in terms of A. 0 0.3 xdes = −0.0 0. Consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t). 1 C= 0 1 0 . C ∈ Rp×n .0 0. . with A ∈ Rn×n . . such that x(T ) = xdes ? What are the corresponding inputs that achieve xdes at this minimum time? What is the smallest T for which we can ﬁnd inputs u(0).7 .1 A = −0. how can you compute the inputs which achieve x(T ) = xdes with the minimum expense of energy? Consider now the desired state −1 1 xdes = 0 . −0. . . u(T − 1). evaluate the corresponding input energy. . . y(t) = Cx(t).3 What is the smallest T for which we can ﬁnd inputs u(0). . C. B = 0 . For each T .1 0. and K. u(T − 1). that is. 13. Plot Emin (T ) as a function of T . B. which we denote by Emin (T ).. .e. . . . B. 
2 For each T ranging from Tmin to 30. . for any A. B ∈ Rn×m .
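The minimum-energy input for a given T is the least-norm solution of CT u = xdes, computable with the pseudo-inverse. A Python/NumPy sketch on a made-up 2-state system (not the 4-state data above); the test also checks that Emin(T) is non-increasing in T:

```python
import numpy as np

# x(T) = CT [u(T-1); ...; u(0)] with CT = [B AB ... A^(T-1)B]; the
# minimum-energy input achieving x(T) = xdes is the least-norm solution.
A = np.array([[0.9, 0.5],
              [0.0, 0.8]])           # hypothetical system
B = np.array([[0.0],
              [1.0]])
xdes = np.array([1.0, -1.0])
T = 6
CT = np.hstack([np.linalg.matrix_power(A, t) @ B for t in range(T)])
u = np.linalg.pinv(CT) @ xdes        # u[i] is the input applied at time T-1-i
Emin = u @ u                          # minimum energy for this horizon
x = np.zeros(2)                       # verify by running the system
for t in range(T):
    x = A @ x + B[:, 0] * u[T - 1 - t]
```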

“turned around.) Hint: You are only required to give your answer Kopt up to a precision of ±0. y = Cx + Du.e.) (c) Find the dual of the LDS found in (a). In the ﬁrst experiment.6 Cascade connection of systems. (a) Two linear systems (A1 .. 2]. you can assume D1 = 0. y = Cx + Du. input.7 Inverse of a linear system. We can generalize linear dynamical systems to aﬃne dynamical systems.” with all arrows reversed. and y are the state. Find state equations for the cascade system: u H1 (s) H2 (s) y Use the state x = [xT xT ]T . You will ﬁnd a linear system with transfer function H(s)−1 . not two scalar components of one vector). In the second. B1 . where D is square and invertible. u(t) = e−t and the resulting output is y(t) = e−3t + e−2t . C. D2 = 0. Show that x. D1 ) and (A2 . By maximally damped. which have the form x = Ax + Bu + f. do so. B. (b) Can you determine A. Assuming A is invertible. the feedback gain matrix K is a scalar (which we call simply the feedback gain. ﬁnd two linear systems consistent with all the data given which have diﬀerent transfer functions. a simple shift of coordinates converts it to a linear dynamical system. respectively.) Remark: quite generally. (To simplify. y = Cx + Du + g.01. we mean that the state goes to zero with the fastest asymptotic decay rate (measured for an initial state x(0) with non-zero coeﬃcient in the slowest mode. Draw a block diagram of the dual system as a cascade connection of two systems. you can assume D1 = 0. If not. have transfer functions H1 (s) and H2 (s). the block diagram corresponding to the dual system is the original block diagram. B2 . deﬁne x = x + A−1 f ˜ and y = y − g + CA−1 f . u.4 Aﬃne dynamical systems. and you can assume that Kopt ∈ [−2. (a) Start with x = Ax + Bu. of the form f (x) = Ax + b. Interpret the result as a linear system with state ˙ x. Your answer ˙ ˙ will have the form: x = Ex + F y. 13. D2 = 0. input y. or D? 13.In this case. 
u(t) = e−3t and the resulting output is y(t) = 3e−3t − e−2t . 100 . 13.) The question is: ﬁnd the feedback gain Kopt such that the feedback system is maximally damped. Aﬃne functions are more general than linear functions. C2 . C1 . i.5 Two separate experiments are performed for t ≥ 0 on the single-input single-output (SISO) linear system x = Ax + Bu. ˙ Fortunately we don’t need a whole new theory for (or course on) aﬃne systems. x(0) = [1 2 − 1 ]T ˙ (the initial condition is the same in each experiment). and output u. D2 ) with states x1 and x2 (these are two column vectors. 13. 1 2 (b) Use the state equations above to verify that the cascade system has transfer function H2 (s)H1 (s). u = Gx + Hy. and solve for x and u in terms of x and y. A function f : Rn → Rm is called aﬃne if it is a linear function plus a constant. which result when b = 0. (To simplify. Suppose H(s) = C(sI − A)−1 B + D. (a) Can you determine the transfer function C(sI − A)−1 B + D from this information? If it is possible. and output of a linear dynamical ˜ ˜ ˜ system.
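For exercise 13.7, solving y = Cx + Du for u (possible since D is square and invertible) leads to a realization of the inverse, which can be checked numerically at a test frequency. A Python/NumPy sketch with a random system; the realization below is one standard derivation, not quoted from the notes:

```python
import numpy as np

# From y = Cx + Du:  u = -D^{-1}C x + D^{-1} y, and substituting into
# xdot = Ax + Bu gives the inverse system
#   E = A - B D^{-1} C,  F = B D^{-1},  G = -D^{-1} C,  H = D^{-1}.
rng = np.random.default_rng(3)
n, m = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = np.eye(m) + 0.1 * rng.standard_normal((m, m))   # invertible feedthrough
Di = np.linalg.inv(D)
E, F, G, H = A - B @ Di @ C, B @ Di, -Di @ C, Di

s = 2.0 + 1.0j                                       # arbitrary test frequency
Hs = C @ np.linalg.inv(s * np.eye(n) - A) @ B + D
Hinv = G @ np.linalg.inv(s * np.eye(n) - E) @ F + H  # should invert Hs
```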

.9 Static decoupling. . b2 = 2 to a platform.e. Forces u1 and u2 are applied to the ﬁrst two masses. The input u is given by u(t) = ud (k) for kh + δ ≤ t < (k + 1)h + δ. are synchronized. y2 . 1. 101 . k = 0. and y3 .e. the state and output are sampled every h seconds). Give the matrices that describe this LDS. Consider the mechanical system shown below. The platform is attached to the ground by a spring/damper suspension with stiﬀness k3 = 3 and damping b3 = 3. Hint: use the following “resolvent identity:” (sI − X)−1 − (sI − Y )−1 = (sI − X)−1 (X − Y )(sI − Y )−1 which can be veriﬁed by multiplying by sI − X on the left and sI − Y on the right. where h > 0 (i. 1. The displacements of the masses (with respect to ground) are denoted y1 . where δ is a delay or oﬀset in the input update.. yd (k) = y(kh).. y = Cx + Du. 13. Find a discrete-time LDS with ud as input and yd as output.8 Oﬀset or skewed discretization. 13. In the lecture notes we considered sampling a continuous-time system in which the input update and output sampling occur at the same time. . y1 u1 u2 y2 m1 m2 k1 y3 b1 b2 k2 m3 k3 b3 Two masses with values m1 = 1 and m2 = 2 are attached via spring/damper suspensions with stiﬀnesses k1 = 1. k2 = 2 and damping b1 = 1. . . with 0 ≤ δ < h. We deﬁne ˙ the sequences xd and yd as xd (k) = x(kh). . i. which is another mass of value m3 = 3. k = 0. Consider the continuous-time LDS x = Ax + Bu. In this problem we consider what happens when there is a constant time oﬀset or skew between them (which often happens in practice).(b) Verify that (G(sI − E)−1 F + H)(C(sI − A)−1 B + D) = I.

two-. y2 and y3 ). Determine whether each of these systems is stable. i. y2 and y3 .e.. choose u(t) that minimizes x(t + 1) . We assume that B1 (0) + B2 (0) + B3 (0) = 1. In this problem you will analyze this scheme...(a) Find matrices A ∈ R6×6 and B ∈ R6×2 such that the dynamics of the mechanical system is given by x = Ax + Bu where ˙ x = [y1 y2 y3 y1 y2 y3 ]T .. In order to make the steady-state deﬂections of masses 1 and 2 independent of each other..e. • 0. • B2 (t) denotes the amount of two-year CDs bought at period t.) an investor buys certain amounts of one-. A. and assume they can be bought in any amount. (You can take Bj (t) to be zero for t < 0. i.e. and 7%.e.06B2 (t − 2). • 1. a total of 1 is to be invested at t = 0. . where A ∈ Rn×n and B ∈ Rn×k . i. i.e. u = [u1 u2 ]T .11 Analysis of investment allocation strategies.10 A method for rapidly driving the state to zero. p(t). and three-year certiﬁcates of deposit (CDs) with interest rates 5%. Compare the behavior of x(t + 1) = Ax(t) (i. (a) Find an explicit expression for the proposed input u(t) in terms of x(t). (d) Design of an asymptotic decoupler. since the norm of the state is made as small as possible at every step. the orginal system with u(t) = 0) and x(t + 1) = F x(t) (i.. We consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t). 13..07B3 (t − 3). the original system with u(t) chosen by the scheme described above) for a few initial conditions. B= 1 1 . • 1. the step responses from inputs u1 and u2 to outputs y1 . where ycmd : R+ → R2 . i. . The goal is to choose an input u that causes x(t) to converge to zero as t → ∞.e.. The engineer argues that this scheme will work well. principle plus 6% interest on the amount of two-year CDs bought two years ago. 102 . Brieﬂy interpret and explain your plots. principle plus 5% interest on the amount of one-year CDs bought one year ago.) • B1 (t) denotes the amount of one-year CDs bought at period t. 
(c) Now consider a speciﬁc case: A= 0 3 0 0 . Show that x satisﬁes an autonomous linear dynamical system equation x(t + 1) = F x(t). at period t is a sum of six terms: • 1. Plot the step responses from ycmd to y1 and y2 . . and B. we let u = H(0)−1 ycmd . and compare with the original ones found in part b.. respectively.e. (We ignore minimum purchase requirements. 6% interest on the amount of two-year CDs bought one year ago. (c) Find the DC gain matrix H(0) from inputs u1 and u2 to outputs y1 and y2 .) The total payout to the investor. (b) Now consider the linear dynamical system x(t+1) = Ax(t)+Bu(t) with u(t) given by the proposed scheme (i. is full rank.06B2 (t − 1).e. i. (b) Plot the step responses matrix.e. • B3 (t) denotes the amount of three-year CDs bought at period t. principle plus 7% interest on the amount of three-year CDs bought three years ago.05B1 (t − 1). as found in (10a)). 6%. k < n. An engineer proposes the following simple method: at time t. 13. ˙ ˙ ˙ Ignore the eﬀect of gravity (or you can assume the eﬀect of gravity has already been taken into account in the deﬁnition of y1 . 1. Express the matrix F explicitly in terms of A and B. Each year or period (denoted t = 0.

• 0. As feature sizes shrink to well below a micron (i. 103 . and B3 (0) that satisfy B1 (0)+B2 (0)+B3 (0) = 1.35. would the asymptotic growth rate be larger?) How much better is your choice of initial investment allocations? Hint for part d: think very carefully about this one.. 20% in two-year CDs. matrices. Two re-investment allocation strategies are suggested. determine the asymptotic growth rate. and B3 (0) = 0. (a) Describe the investments over time as a linear dynamical system x(t + 1) = Ax(t). i. 7% interest on the amount of three-year CDs bought one year ago.20. choose some other nonnegative values for B1 (0). and how would it be better than the (0. 35% in two-year CDs.60.• 0. or nearly nondiagonalizable.. 0. must be taken into account. Hint for whole problem: watch out for nondiagonalizable.35.07B3 (t − 2). You will have two such linear systems: one for the 35-35-30 strategy and one for the 60-20-20 strategy.35. (You can. What allocation would you pick.12 Analysis of cross-coupling in interconnect wiring. of course. deﬁned as limt→∞ w(t + 1)/w(t).) (c) Asymptotic liquidity. as well as to neighboring wires. simulate the strategies to check your answer..30) initial allocation? (For example. In integrated circuits. The total payout is re-invested 35% in one-year CDs. A simple lumped model of three nets is shown below.. u3 . Don’t just blindly type in matlab commands.e. and what the matrices A and C are. • The 60-20-20 strategy. wires which connect the output of one gate to the inputs of one (or more) other gates are called nets. • The 35-35-30 strategy. = B3 (t)/w(t). ‘deep submicron’) the capacitance of a wire to the substrate (which in a simple analysis can be approximated as ground).. The initial investment allocation is the same: B1 (0) = 0. 7% interest on the amount of three-year CDs bought two years ago.e. y(t) = Cx(t) with y(t) equal to the total wealth at time t. 
The total wealth at time t can be divided into three parts: • B1 (t) + B2 (t − 1) + B3 (t − 2) is the amount that matures in one year (i.e. 0. plotting w(t + 1)/w(t) versus t to guess its limit) is not acceptable.) Note: simple numerical simulation of the strategies (e. is least liquid) L1 (t) L2 (t) L3 (t) = (B1 (t) + B2 (t − 1) + B3 (t − 2))/w(t)..g. For each of the two strategies described above. The total wealth held by the investor at period t is given by w(t) = B1 (t) + B2 (t) + B2 (t − 1) + B3 (t) + B3 (t − 1) + B3 (t − 2). do the liquidity ratios converge as t → ∞? If so. and 20% in three-year CDs.e. i. B2 (0). For the two strategies above. (d) Suppose you could change the initial investment allocation for the 35-35-30 strategy. the amount of principle we could get back next year) • B2 (t) + B3 (t − 1) is the amount that matures in two years • B3 (t) is the amount that matures in three years (i. Be very clear about what the state x(t) is. i.07B3 (t − 1). and 30% in three-year CDs. and B3 (0) = 0. We deﬁne liquidity ratios as the ratio of these amounts to the total wealth: = (B2 (t) + B3 (t − 1))/w(t). (If this limit doesn’t exist.e. check to make sure you’re computing what you think you’re computing. 13. B2 (0) = 0. The initial investment allocation is B1 (0) = 0. say so.e.35.. (b) Asymptotic wealth growth rate. to what values? Note: as above. The total payout is re-invested 60% in one-year CDs. simple numerical simulation alone is not acceptable. u2 . B2 (0) = 0.20.30. The inputs are the voltage sources u1 .
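The asymptotic growth rate asked for here is the spectral radius of the strategy's update matrix A. The following is a sketch in Python/NumPy (rather than the course's Matlab), with one particular assumed state ordering x(t) = (B1(t), B2(t), B2(t-1), B3(t), B3(t-1), B3(t-2)); verify the ordering and coefficients against your own formulation before relying on it.

```python
import numpy as np

# Assumed state ordering: x(t) = (B1(t), B2(t), B2(t-1), B3(t), B3(t-1), B3(t-2)).
# Payout at t+1 collects principal and interest from each slot, per the problem:
c = np.array([1.05, 0.06, 1.06, 0.07, 0.07, 1.07])

def strategy_matrix(f1, f2, f3):
    """Update matrix when fractions f1, f2, f3 of the payout are re-invested
    in one-, two-, and three-year CDs respectively."""
    A = np.zeros((6, 6))
    A[0] = f1 * c        # new one-year CDs bought from the payout
    A[1] = f2 * c        # new two-year CDs
    A[2, 1] = 1.0        # B2(t) shifts to the "bought one year ago" slot
    A[3] = f3 * c        # new three-year CDs
    A[4, 3] = 1.0
    A[5, 4] = 1.0
    return A

for name, f in [("35-35-30", (0.35, 0.35, 0.30)), ("60-20-20", (0.60, 0.20, 0.20))]:
    A = strategy_matrix(*f)
    growth = max(abs(np.linalg.eigvals(A)))   # asymptotic growth = spectral radius
    print(name, growth)
```

As the problem insists, the eigenvalue computation (not simulation) is what justifies the limit; the printed spectral radius is the asymptotic growth rate.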

R3 + R4 . The resistances R1 . C6 are capacitances from the interconnect wires to the substrate. (The diﬀerent locations of the these cross-coupling capacitances models the wire 1 crossing over wire 2 near the driving gate. In this problem we will only consider inputs that switch (change value from 0 to 1 or 1 to 0) at most once. To save you the trouble of typing these in. . . which deﬁnes these matrices. .m on the course web page.. ˙ where 2 0 −1 0 0 0 0 1 0 0 0 0 −1 0 0 0 2 0 0 2 0 0 0 −1 1 0 0 F = 0 0 0 y = Kv. respectively) connecting the inputs to the outputs. and wire 2 crossing over wire 3 near the end of the wire. i.) The equations are C v + Gv = F u. . ui (t) ∈ {0. R1 R2 y1 (t) C2 R4 y2 (t) C4 R6 y3 (t) u3 (t) C5 C6 u1 (t) R3 C1 C7 u2 (t) R5 C3 C8 To simplify the problem we’ll assume that all resistors have value 1 and all capacitors have value 1. . R6 represent the resistance of the wire segments. 1} for all t. then shame on you.e. and wires 2 and 3. We recognize that some of you don’t know how to write the equations that govern this circuit. 0 0 0 0 0 1 . G = 0 0 0 0 −1 0 0 1 0 0 0 0 2 0 0 0 0 0 1 0 0 1 0 .and the outputs are the three voltages labeled y1 . (If you’re an EE student in this category. the capacitances C7 and C8 are capacitances between wires 1 and 2. so we’ve done it for you. . . ) In static conditions. . . . . The inputs (which represent the gates that drive the three nets) are Boolean valued. K = 0 0 0 1 0 0 0 0 0 0 0 1 0 0 . 104 2 −1 0 0 −1 1 0 0 0 0 0 . . and R5 + R6 . but you don’t need to know this to do the problem . C6 . . . we’ve put an mﬁle interconn. y3 . The capacitances C1 . 0 0 0 0 2 −1 −1 1 0 0 0 0 0 0 0 0 0 0 0 0 2 −1 −1 1 C= and v ∈ R6 is the vector of voltages at capacitors C1 . respectively. y2 . respectively. the circuit reduces to three wires (with resistance R1 + R2 .
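Once C, G, F, and K are loaded from interconn.m, the circuit equations C v' + G v = F u, y = K v can be stepped numerically. A minimal forward-Euler sketch in Python/NumPy (the one-segment matrices at the bottom are placeholders, not the exercise's data):

```python
import numpy as np

def simulate_net(Cmat, G, F, K, u_of_t, v0, dt=0.01, n_steps=2000):
    """Forward-Euler simulation of C v' + G v = F u(t), y = K v.
    Cmat, G, F, K should come from interconn.m; placeholders are used below."""
    Cinv = np.linalg.inv(Cmat)
    v = np.array(v0, dtype=float)
    ys = []
    for k in range(n_steps):
        v = v + dt * (Cinv @ (F @ u_of_t(k * dt) - G @ v))
        ys.append(K @ v)
    return np.array(ys)

# Placeholder sanity check: a single RC segment (all values 1) driven by a
# unit step settles at y = 1, matching the static (DC) analysis.
y = simulate_net(np.eye(1), np.eye(1), np.eye(1), np.eye(1),
                 lambda t: np.ones(1), np.zeros(1))
```

The same routine, fed the real matrices and step inputs, produces the transition responses needed for the delay and bounce questions.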

. The input has the speciﬁc form u(t) = 1 kT ≤ t < (k + θ)T. 2. things can get very.. the input switches to the Boolean vector g. We deﬁne the 50%-threshold delay of the transition as smallest T such that |yi (t) − gi | ≤ 0. Give a formula for x(0) in terms of the problem data.. and the transitions f1 . worst) 50%-threshold delay. f3 . g1 . u1 (t) = f1 g1 for t < T1 . and the system is stable. where f1 . (If the following gate thresholds were set at 0. f1 . explain why the matrix to be inverted is nonsingular. and u(t) ∈ R.) Among the 64 possible transitions. i. but also the times t.5 for t ≥ T . . For t < 0. g3 ? Be sure to give not only the maximum value. and when θ = 1. Since the DC gain matrix of this system is I. in contrast. the transition with f = (0. but inputs 1 and 3 undergo transitions at times t = T1 and t = T3 . that is applied over a fraction θ of each cycle. explain why the formula you found in part (a) always makes sense and is valid. the input is u(t) = 1 for all t. very ugly. for t ≥ 0.e. y2 (t) is large enough to trigger the following gate. g3 . Now suppose that input 2 remains zero. g1 . where gi ∈ {0. the mean-square norm of the state.e.5. B. (b) Maximum bounce due to cross-coupling. 0)). x(t + T ) = x(t) for all t ≥ 0.. 0. the system is in static condition. k = 0. 0.g. T3 . f3 . We consider the stable linear dynamical system x = Ax+Bu.e. and also describe which transition gives the largest delay (e. 3. the input is u(t) = 0 for all t. and for i = 1.) To be more precise (and also so nobody can say we weren’t clear). and T3 . over all possible t. 2. (In part 1. then this would be ﬁrst time after which the outputs would be guaranteed correct. ˙ where x(t) ∈ Rn . g3 ∈ {0. J= 1 T T 8 2 . What is the maximum possible bounce? In other words. (For example. 2. f3 . The transitions in inputs 1 and 3 induce a nonzero response in output 2. 0 (k + θ)T ≤ t < (k + 1)T.e. g1 . T1 . . 1. . 
(a) Explain how to ﬁnd an initial state x(0) for which the resulting state trajectory is T -periodic.) This phenomenon of y2 deviating from zero (which is what it would be if there were no cross-coupling capacitance) is called bounce (induced by the cross-coupling between the nets). since u2 = 0. where fi ∈ {0. u(t) = g.) (c) We now consider the speciﬁc system with 0 1 0 0 1 . A. the output converges to the input value: y(t) → g as t → ∞. ﬁnd the largest (i. In addition.. 1}. if your formula involves a matrix inverse. (b) Explain why there is always exactly one value of x(0) that results in x(t) being T -periodic. but it’s not hard to do so. . 1. . and the inputs have values u(t) = f for t < 0. and θ ∈ [0. x(t) 0 dt. respectively. Note: in this problem we don’t consider multiple transitions. Here T > 0 is the period. 1}. i. (But y2 does converge back to zero. You can think of u as a constant input value one. 1}.(a) 50%-threshold delay. for t ≥ T3 u2 (t) = 0 for all t.13 Periodic solution with intermittent input. 1) to g = (1. If for any t. which maximize y(t).. 105 . B= −14 2 T = 5. all transitions occured at t = 0. Try to give the simplest possible formula. 13. and θ. A= 0 −1 −2 −1 Plot J. i. At t = 0. 1] is called the duty cycle of the input. T . Note that when θ = 0. what is the maximum possible value of y2 (t). Give the largest delay. for t ≥ T1 u3 (t) = f3 g3 for t < T3 . T1 . which lasts T seconds. k = 0.
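Propagating the state across one on/off cycle and imposing x(T) = x(0) gives a linear equation for the periodic initial state. A hedged sketch in Python/SciPy, using a hypothetical stable A and B in place of the exercise's data (substitute your own):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical stable system; replace with the exercise's A, B, T, theta.
A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])
T, theta = 5.0, 0.5
n = A.shape[0]

def int_expm_B(h):
    """(integral_0^h e^{As} ds) B, via the augmented-matrix exponential trick."""
    M = np.zeros((n + 1, n + 1))
    M[:n, :n], M[:n, n:] = A, B
    return expm(M * h)[:n, n:]

# Over one period: x(T) = e^{AT} x(0) + e^{A(1-theta)T} (int_0^{theta T} e^{As} ds) B.
# Setting x(T) = x(0) gives (I - e^{AT}) x(0) = b; since A is stable, every
# eigenvalue of e^{AT} has magnitude < 1, so I - e^{AT} is invertible.
b = expm(A * (1 - theta) * T) @ int_expm_B(theta * T)
x0 = np.linalg.solve(np.eye(n) - expm(A * T), b)
```

The invertibility remark in the comment is exactly the argument part (b) asks for.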

z(N ) ∈ Rp . x(2). We deﬁne the normalized residual. we are not told which of the scalar signals are input components and which are state components. u(N ) ∈ Rm . In system identiﬁcation. equivalently. If we have ρ = 0. roughly speaking. 13. That’s part of what we have to decide.versus θ. so we don’t expect to ﬁnd matrices A and B for which the linear dynamical system equations hold exactly. and give the associated value of the normalized residual. B as R= 1 N −1 N −1 t=1 1/2 x(t + 1) − Ax(t) − Bu(t) 2 . .m on the course web site. i. and we are asked to ﬁnd matrices A ∈ Rn×n and B ∈ Rn×m such that we have x(t + 1) ≈ Ax(t) + Bu(t). . x(N ) ∈ Rn . Let’s give a quantitative measure of how well the linear dynamical system model (2) holds. .05. denoted ρ. . the previous one on system identiﬁcation. u(2). Here too we need to ﬁt a linear dynamical system model to some given signal data. . To complicate things. or rather extends. Given the signal data. over the same period. as S= 1 N −1 N −1 t=1 1/2 x(t + 1) 2 . .. . . 13. . 106 . for example. (a) Explain how to do this. This problem continues. to within 5%.14 System identiﬁcation of a linear dynamical system. problem 14. x(1). We are given the time series data.15 System identiﬁcation with selection of inputs & states. as ρ = R/S. . You may approximate J as N −1 1 x(iT /N ) 2 . (b) Carry out this procedure on the data in lds sysid. J≈ N i=0 for N large enough (say 1000). a vector signal. Estimate the value of θ that maximizes J. Of course you must show your matlab source code and the output it produces. where x(0) is the periodic initial condition that you found in part (a). z(1). We deﬁne the RMS value of x. . though. . . it means that the state equation (2) holds.e. we are given some time series values for a discrete-time input vector signal. t = 1. z(2). (2) We use the symbol ≈ since there may be small measurement errors in the given signal data. . specify them. . . N − 1. the normalized residual ρ). 
We deﬁne the RMS (root-mean-square) value of the residuals associated with our signal data and a candidate pair of matrices A. for 0 ≤ θ ≤ 1. we will choose the matrices A and B to minimize the RMS residual R (or. for a particular choice of matrices A and B. u(1). Does the method always work? If some conditions have to hold. and also a discrete-time state vector signal. Give the matrices A and B.
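Minimizing the RMS residual over A and B is a linear least-squares problem: stack the regressors [x(t); u(t)] column-wise and solve for [A B] in one call. A Python/NumPy sketch, checked on synthetic noiseless data:

```python
import numpy as np

def fit_lds(X, U):
    """Least-squares fit of x(t+1) ~ A x(t) + B u(t) from column-stacked data.
    X is n x N, U is m x N; returns (A, B) minimizing the RMS residual R."""
    n = X.shape[0]
    Z = np.vstack([X[:, :-1], U[:, :-1]])            # regressors [x(t); u(t)]
    AB, *_ = np.linalg.lstsq(Z.T, X[:, 1:].T, rcond=None)
    return AB.T[:, :n], AB.T[:, n:]

# Sanity check on data generated by a known system (recovery should be exact).
rng = np.random.default_rng(0)
n, m, N = 3, 2, 60
A_true = 0.5 * rng.standard_normal((n, n))
B_true = rng.standard_normal((n, m))
U = rng.standard_normal((m, N))
X = np.zeros((n, N))
X[:, 0] = rng.standard_normal(n)
for t in range(N - 1):
    X[:, t + 1] = A_true @ X[:, t] + B_true @ U[:, t]
A_hat, B_hat = fit_lds(X, U)
```

The method needs the stacked regressor matrix to have full row rank (n + m independent rows), which is the condition to state in part (a).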

. Our goal is to choose an input u : R+ → Rm .e. Your solution should consist of the following: • Your approach.. for which the open-loop system x = Ax is stable. and be sure to show us veriﬁcation that the open-loop system is stable and that the closed-loop system is not.. As an extreme case. Please order the components in x and u in the same order as in z. and attach code and associated output.m on the class web server. without regard for the eﬀects such an input might have in the future. we will choose u(t). in the sense that the data are explained by a linear dynamical system driven by only a few inputs. when does this control scheme result in the norm squared of the state always decreasing?) (c) Find an example of a system (i. • The matlab code used to solve the problem. (In other words. the one with the smallest dimension of u. for ˙ each t. (We will not check this for you. if all components of z are assigned to x. Get the data given in lds sysid2. One measure of the complexity of the model is the number of components assigned to the input u. u(t) is chosen to minimize the compositive objective above.e. Explain how you solved the problem.e. that is not too big. Consider a cascade of 100 one-sample delays: 107 . here is the problem. i. . Give an explicit formula for K. to minimize the quantity d x(t) 2 + ρ u(t) 2 . and drives the state x : R+ → Rn of the system x = Ax + Bu to zero quickly. 13.e. If the dimension of u is small. which contains a vector z(t) ∈ R8 for t = 1. the larger the dimension of u. ˙ but the closed-loop system x = Ax + Bu (with u as above) is unstable. Give a clear description of what x and u are. ﬁnd matrices A and B) for your choice of x and u. • Your assignments to state and input. using the scheme described above? (In other words. . for which the normalized RMS residuals is smaller than around 5%. if z has four components we might assign its ﬁrst and third to be the state.) 13. dt where ρ > 0 is a given parameter. 
u(t) = x(t) = z4 (t) z3 (t) You can assume that we always assign at least one component to the state. You must explain how to check this. A and B). then we have a compact model.e.16 A greedy control scheme. and ρ under which we have (d/dt) x(t) 2 < 0 whenever x(t) = 0.17 FIR ﬁlter with small feedback. one with no inputs at all.. We seek the simplest model. and its second and fourth to be the input. The ﬁrst term gives the rate of decrease (if it is negative) of the norm-squared of the state vector. so the dimension of the state is always at least one.. i. 100. the control scheme has the form of a constant linear state feedback. or a state component. then we have an autonomous linear dynamical system model for the data. . we then proceed as in problem (14): we ﬁnd matrices A and B that minimize the RMS residuals as deﬁned in problem (14). Once we assign components of z to either x or u. (a) Show that u(t) can be expressed as u(t) = Kx(t). . the more complex the model. • Your matrices A and B. z2 (t) z1 (t) . For example. when ρ = 1. • The relative RMS residuals obtained by your matrices. and develop a linear dynamical system model (i. Assign the components of z to either state or input. To do this. Finally.) (b) What are the conditions on A. where K ∈ Rm×n .. This scheme is greedy because at each instant t. the second term is a penalty for using a large input. B. and its output.We will assign each component of z as either an input component. i. Try to ﬁnd ˙ the simplest example you can.
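For the greedy control scheme, the composite objective is a convex quadratic in u, so whatever closed-form K you derive can be checked numerically against the stationarity condition. A sketch (the formula for K below is our assumption, to be verified against your own derivation for part (a)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, rho = 4, 2, 1.0
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
x = rng.standard_normal(n)

# The composite objective at state x, as a function of u:
#   d/dt ||x||^2 + rho ||u||^2 = 2 x'Ax + 2 x'Bu + rho u'u.
def obj(u):
    return 2 * x @ A @ x + 2 * x @ B @ u + rho * (u @ u)

# Candidate greedy input (assumed form, from setting the gradient in u to zero):
K = -(1.0 / rho) * B.T
u_star = K @ x
```

Because the objective is strictly convex in u, any perturbation of u_star should only increase it; that is the check the test below performs.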

when the inductor current is delivered to the load. Many electronic systems include DC-DC converters or power supplies. Many of these are built from electronic switches. (b) What are the eigenvalues of A? (c) Now we add simple feedback. 108 . ˙ vL = Cx. The switch is operated periodically as follows: charge position: kT ≤ t < (k + 1/2)T. deliver position: (k + 1/2)T ≤ t < (k + 1)T. where Vs > 0 is the (constant) source voltage and L > 0 is the inductance. to the system: u z −1 α Express this as a linear dynamical system x(t + 1) = Af x(t) + Bf u(t). with gain α = 10−5 .18 Analysis of a switching power supply. where x(t) ∈ Rn is the internal state of the load.u z −1 z −1 y (a) Express this as a linear dynamical system x(t + 1) = Ax(t) + Bu(t). and to the load. B ∈ Rn . and C ∈ R1×n . shown in schematic diagram below. The load is described by the linear dynamical system x = Ax + BiL . Here A ∈ Rn×n . and capacitors. When the switch is in the deliver position.) Don’t worry—you don’t need to know anything about schematic diagrams or circuits to solve this problem! iL deliver i charge vL load y(t) = Cf x(t) + Df u(t) z −1 y y(t) = Cx(t) + Du(t) Vs The switch alternately connects to ground. the inductor current satisﬁes di/dt = Vs /L. When the switch is in the charge position. and the load current is iL = 0. we have di/dt = (Vs − vL )/L. In this problem we consider a standard boost converter. inductors. during which time the inductor is charged. which convert one voltage to another. (d) What are the eigenvalues of Af ? (e) How diﬀerent is the impulse response of the system with feedback (α = 10−5 ) and without feedback (α = 0)? 13. (It’s called a boost converter because the load voltage can be larger than the supply voltage. and iL = i.

where you will ﬁnd the 2 × 2 impulse response matrices h0. no matter what the initial inductor current i(0) and load initial state x(0) are. To eliminate them. converges to its ﬁnal value by t = 3. r decoupler u process y compensated system 109 . and the output of the decoupler is u. vL (kT ) ¯ ¯ converges to a constant value V as k → ∞. which is another 2-input. and that u2 has a substantial eﬀect on y1 . . 2. 1. u2 as a reactant ﬂow rate.m. The decoupler is used as a preﬁlter for the process: the input r (which is called the reference or command input) is applied as the input to the decoupler. but g(t) = 0 for t ≥ 4. 2. 2 1 2 1 s11 −1 −2 0 2 4 6 s12 0 0 −1 −2 0 2 4 6 t 2 1 2 1 t s21 −1 −2 0 2 4 6 s22 0 0 −1 −2 0 2 4 6 t t The plots show that u1 has a substantial eﬀect on y2 .for k = 0. h3. .m. The decoupler is also FIR of length 4: g(0). Give the value of V . h1. 1. 3. The impulse response matrix of the system (for t = 0. We will not accept simulations for some values of iL (0) and x(0) as an answer. If you want to think of this system in concrete terms. with u1 a heater input. h(t) = 0 for t ≥ 4. 13. This is shown below. 1. and y2 as the reactor pressure. 3) can be obtained from the class web page in dynamic_dec_h. g(2). 2. . neither of which we want.. you will explore the design of a dynamic decoupler for this system. deﬁned as s(t) = τ =0 h(τ ). the input to the industrial process. g(1). The step response matrix of the system is shown below. This means that its step response matrix. Show that. y1 as the reactor temperature.19 Dynamic decoupling. g(3) ∈ R2×2 can be nonzero. We will consider the speciﬁc switching power supply with problem data deﬁned in the ﬁle boost_data. An industrial process is described by a 2-input 2-output discrete-time LDS with ﬁnite impulse response of length 4. h2. Here T > 0 is the period of the switching power supply. 2-output LDS with impulse matrix g. you can imagine it as a chemical process reactor. 
which means that its impulse response h is nonzero only for t t = 0.

This means if the reference input is constant. from input r to output y.. Let s denote the step response matrix of ˜ the compensated system. say so. The goal is to design the decoupler (i.e. g(3) ∈ R2×2 ) so that the compensated system satisﬁes the following speciﬁcations.We refer to this cascaded system as the compensated system. • limt→∞ s(t) = I. and r2 has no eﬀect on y1 . and explain why. the process output converges to ˜ the reference input. This means the compensated system is ˜ decoupled : r1 has no eﬀect on y2 . g(1). Find such a decoupler. and plot the compensated system step response matrix. 110 .e. the problem speciﬁcations are not feasible). If there is no such decoupler (i. choose g(0). and do something sensible with any extra degrees of freedom you may have. g(2).. If there are many decouplers that satisfy the given speciﬁcations. • The oﬀ-diagonal entries of s(t) are all zero (for all t). say so.

Of those. and vents are at distance rv . and the eﬀect of vent j on temperature ti is given by 1 Aij = 2 rij where rij is the distance between vent j and measuring point i. i = 1. and one is located in the centre.Lecture 15 – Symmetric matrices. and a preferences vector y. and the corresponding RMS error in temperature. The ﬁle temp_control. but is not positive semideﬁnite. n. . . .e.) The temperatures may be hotter (xj > 0) or colder (xj < 0) than the ambient air temperature. matrix norm. Vent 1 lies exactly on the horizontal. and |Aij | ≤ (Aii Ajj )1/2 .. . . which in turn is proportional to x2 . n. Each vent j blows a stream of air at temperature xj . The temperature in each x2 rv x3 t4 t5 t6 t7 x4 t3 t2 t1 rt x6 t9 t10 x1 t8 x5 cubicle (measured at its center as shown in the ﬁgure) is ti .1 Simpliﬁed temperature control. ﬁnd the optimal vent temperatures x. 5 have vents evenly distributed around the perimeter of the room. with minimal power consumption? (b) Temperature measurement points are at distance rt from the center. “cost” means the total power spent on the temperature-control systems. obtain exactly the least possible sum of squares error in temperature). A circular room consists of 10 identical cubicles around a circular shaft. and SVD 15. quadratic forms. . measured relative to the surrounding air (ambient air temperature. with minimal cost. i. But this was not meant to be an entirely realistic problem to start with! 15. the inhabitant of cubicle i wants the temperature to be yi hotter than the surrounding air (which is colder if yi < 0!) The objective is then to choose the xj to best match these preferences (i. There are 6 temperature-control systems in the room. r_v. 111 . . in general. . cooling requires more power (per unit of temperature diﬀerence) than heating . j = 1. So the system can be described by t = Ax (where A is tall. as well as the power usage. 
It also provides code for computing the distances from each vent to each desired temperature location.m on the course webpage deﬁnes r_t. which is the sum of the power consumed by each heater/cooler. . j (a) How would you choose the xj to best match the given preferences yi . Using these data.2 Find a symmetric matrix A ∈ Rn×n that satisﬁes Aii ≥ 0.) The temperature preferences diﬀer among the inhabitants of the 10 cubicles. Here. Comment: In this problem we ignore the fact that. More precisely. .

.7 Gram matrices.6 A Pythagorean inequality for the matrix norm.. 2 − Gx 2 . Suppose that A ∈ Rm×n and B ∈ Rp×n . Gij = a fi (t)fj (t) dt. then the 15. . . Show that T T AT ≥ 0 if and only if A ≥ 0. b] → R. (c) If P > 0 then P −1 > 0. n. A ≥ 0) if and only if it can be expressed as f (x) = F x 2 for some matrix F ∈ Rk×n . (To form a symmetric submatrix. (a) Show that if A and B are PSD and α ∈ R. .15. 15. (b) Show that G is singular if and only if the functions f1 . (a) Let Z ∈ Rn×p be any matrix.e.3 Norm expressions for quadratic forms. n−1 i=1 (xi+1 Hint: you might ﬁnd it useful for part (d) to prove Z ≥ I implies Z −1 ≤ I. This problem shows that congruences preserve positive semideﬁniteness. then |Aij | ≤ entire ith row and column of A are zero. Show that Z T AZ ≥ 0 if A ≥ 0. . . Explain how to ﬁnd such an F (when A ≥ 0). In this problem P and Q are symmetric matrices. T AT T is called a congruence of A and T AT T and A are said to be congruent. .5 Positive semideﬁnite (PSD) matrices. then so are αA and A + B.4 Congruences and quadratic forms. how small can k be)? (b) Show that f can be expressed as a diﬀerence of squared norms. (e) If P ≥ Q then P 2 ≥ Q2 . Given functions fi : [a. . . in the form f (x) = F x for some appropriate matrices F and G. Is P ≥ 0? P > 0? 112 . (b) Show that any (symmetric) submatrix of a PSD matrix is PSD. For each statement below. if Aii = 0.) (d) Show that if A is (symmetric) PSD. 15. In particular. Suppose A = AT ∈ Rn×n .9 Express − xi )2 in the form xT P x with P = P T . (a) If P ≥ 0 then P + Q ≥ Q. How small can the sizes of F and G be? 15. . the Gram matrix G ∈ Rn×n associated with them is deﬁned by b ≤ A 2 + B 2. (a) Show that G = GT ≥ 0. α ≥ 0. (b) If P ≥ Q then −P ≤ −Q. choose any subset of {1. Show that A B Under what conditions do we have equality? 15. (d) If P ≥ Q > 0 then P −1 ≤ Q−1 . When T is invertible. (c) Show that if A ≥ 0. (a) Show that f is positive semideﬁnite (i. fn are linearly dependent. i = 1. 
What is the size of the smallest such F (i. 15.8 Properties of symmetric matrices. Aii ≥ 0. n} and then throw away all other columns and rows. . Aii Ajj .e. (b) Suppose that T ∈ Rn×n is invertible. .. Let f (x) = xT Ax (with A = AT ∈ Rn×n ) be a quadratic form. either give a proof or a speciﬁc counterexample.
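One concrete construction for the difference-of-squared-norms decomposition uses the eigendecomposition of A: collect the positive-eigenvalue terms into F and the negative ones into G. A Python/NumPy sketch:

```python
import numpy as np

def split_quadratic(A):
    """Write x'Ax = ||Fx||^2 - ||Gx||^2 using the eigendecomposition of A = A'.
    Row i of F is sqrt(lam_i) q_i' when lam_i > 0; G handles lam_i < 0."""
    lam, Q = np.linalg.eigh(A)
    F = np.sqrt(np.maximum(lam, 0.0))[:, None] * Q.T
    G = np.sqrt(np.maximum(-lam, 0.0))[:, None] * Q.T
    return F, G
```

Dropping the all-zero rows of F and G shows the sizes can be taken as the number of positive and negative eigenvalues, respectively.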

. . n. i. . . Interpretation: every matrix whose distance to the identity is less than one is invertible.. (d) If the eigenvalues of A and B are the same. σn be the singular values of a matrix A ∈ Rn×n . As usual.11 A power method for computing A . (The singular values are based on the full SVD: If Rank(A) < n. j = 1. then A ≥ B. (e) If there is an orthogonal matrix Q such that A = QT BQ..13 A bound on maximum eigenvalue for a matrix with entries smaller than one. then some of the singular values are zero. B for which the statement does not hold.. Show that it ‘usually’ works. assume that A or B is positive semideﬁnite. i. How are the eigenvalues and singular values related? 15. i. Z ∈ Rn×n . . λn be the eigenvalues. Let λ1 . i.) (b) If {x|xT Ax ≤ 1} ⊆ {x|xT Bx ≤ 1}. w(t)/ w(t) ≈ u1 and z(t)/ z(t) ≈ v1 . λ1 ≥ · · · ≥ λn and σ1 ≥ · · · ≥ σn . . Let z(0) = a ∈ Rn be nonzero. .15 Eigenvalues and singular values of a symmetric matrix. (g) If A ≥ B then Aij ≥ Bij for i. Suppose that A ∈ Rn×n . . Show that A = B.) 15. eAt ≥ eBt . sorted so λ1 (X) ≥ λ2 (X) ≥ · · · ≥ λn (X). . . for t = 1. . . Analyze this algorithm. The following method can be used to compute the largest singular value (σ1 ). Y.) You can assume the eigenvalues (and of course singular values) are sorted. Be very explicit about when it fails. (c) If A ≤ B. j = 1. 15. . A ≤ B means B − A is positive semideﬁnite). xT Ax = xT Bx for all x. . 113 . λi (A) = λi (B) for i = 1. n. j = 1. (‘True’ means the statement holds for all A and B. then A ≥ B. then there is an orthogonal matrix Q such that A = QT BQ. Rn×n . For X = X T ∈ Rn×n . ‘false’ means there is at least one pair A. .16 More facts about singular values of matrices. For large t.10 Suppose A and B are symmetric matrices that yield the same quadratic form. . 15. prove it if it is true.e. .e. then the eigenvalues of A and B are the same. then try x = ei + ej . . the symbol ≤ between symmetric matrices denotes matrix inequality (e. 
(c) σmax (XY ) ≤ σmax (X)σmax (Y ). In the following problems you can assume that A = AT ∈ Rn×n and B = B T ∈ Rn×n . . . n. (f) If A ≥ B then for all t ≥ 0. (In practice it always works. then {x|xT Ax ≤ 1} ⊆ {x|xT Bx ≤ 1}. . How large can λmax (A) be? Suppose A = AT ∈ 15. We do not. λi (X) will denote its ith eigenvalue. . with |Aij | ≤ 1. For each of the following statements. . σ1 > σ2 . 15.e. (a) σmax (X) ≥ max1≤i≤n (b) σmin (X) ≥ min1≤i≤n 1≤j≤n 1≤j≤n |Xij |2 . . . .g. .e. i. which satisﬁes A = AT .e. . λi (A) = λi (B) for i = 1. Show that A < 1 implies I − A is invertible. however. . z(t + 1) = AT w(t). . n. . Here X. i. (h) If Aij ≥ Bij for i. and also the corresponding left and right singular vectors (u1 and v1 ) of A ∈ Rm×n . .. otherwise give a speciﬁc counterexample. . and then repeat the iteration w(t) = Az(t).. 2. . (a) A ≥ B if λi (A) ≥ λi (B) for i = 1.. You can assume (to simplify) that the largest singular value of A is isolated. |Xij |2 . . n..12 An invertibility criterion. and let σ1 .15.14 Some problems involving matrix inequalities. Decide whether each of the following statements is true or false. n. Hint: ﬁrst try x = ei (the ith unit vector) to conclude that the entries of A and B on the diagonal are the same.
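The power method for the largest singular value can be sketched as follows in Python/NumPy. Note that the composite step z(t+1) = A'A z(t) is a power iteration on A'A, so a rescaling step is added to keep the iterates bounded (the stated iteration grows like sigma_1^2 per step):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

z = rng.standard_normal(3)          # z(0) = a, (almost surely) nonzero
for _ in range(500):
    w = A @ z                       # w(t) = A z(t)
    z = A.T @ w                     # z(t+1) = A' w(t): power iteration on A'A
    z /= np.linalg.norm(z)          # rescale so the iterates do not blow up
sigma1 = np.linalg.norm(A @ z)      # approx sigma_1, since z -> v1 (generically)
u1 = A @ z / sigma1                 # approx the corresponding left singular vector
```

Convergence requires sigma_1 > sigma_2 (the isolated-largest-singular-value assumption) and that the starting vector a is not orthogonal to v1; those are exactly the failure modes to flag in the analysis.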

Ax ≤ A F x for all x. We consider the problem of drawing (in two dimensions) a graph with n vertices (or nodes) and m undirected edges (or links). a large σmax ? Suppose. where i ∼ j means that nodes i and j are connected by an edge. Thus the Frobenius norm 2 2 (c) Show that A F = √σ1 + · · · + σr . This just means assigning an x..j |Aij |2 1/2 ..) (a) Show that A F F = Thus the Frobenius norm is simply the Euclidean norm of the matrix when it is considered as an 2 element of Rn . the sum (or mean value) of the x. the objective J is not aﬀected if we shift all the coordinates by some ﬁxed amount (since J only depends on diﬀerences of the x. (Recall Tr is the trace of a matrix.17 A matrix can have all entries large and yet have small gain in some directions. and y ∈ Rn be the vector of y. . We let x ∈ Rn be the vector of x.19 Drawing a graph. First of all. i. for example. 106 106 A= 106 106 has “large” entries while A[1 − 1]T = 0. What can you say about σmax (A)? 15. is to make the drawn graph look good. One goal is that neighboring nodes on the graph (i.e. yi ) ∈ R2 . (b) Show that if U and V are orthogonal. For example. Note also that it is much easier to compute the Frobenius norm of a matrix than the (spectral) norm (i. F.18 √ Frobenius norm of a matrix.coordinate to each node. i∼j (xi − xj )2 + (yi − yj )2 . Another problem is that we can minimize J by putting all the nodes at xi = 0. . Then show that σmax (A) ≤ A F ≤ rσmax (A). i i=1 xi yi = 0. The Frobenius norm of a matrix A ∈ Rn×n is deﬁned as A Tr AT A.coordinates of the nodes.coordinates of the nodes. These two equations ‘center’ our drawn graph. maximum singular value). we impose the constraints n n n 2 yi = 1. 15.and y-coordinates so as to minimize the objective J= i<j.or post. In particular.e. that is. Can a matrix have all entries small and yet have a large gain in some direction. it can have a small σmin . To take this into account. 114 . i=1 i=1 yi = 0.. 15. i. 
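The three equivalent expressions for the Frobenius norm (entrywise Euclidean norm, sqrt(Tr A'A), and the root-sum-of-squares of the singular values) are easy to confirm numerically. A Python/NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)

fro_entries = np.sqrt(np.sum(A ** 2))      # Euclidean norm of the entries
fro_trace = np.sqrt(np.trace(A.T @ A))     # sqrt(Tr A'A)
fro_sing = np.sqrt(np.sum(s ** 2))         # sqrt(sigma_1^2 + ... + sigma_r^2)
# All three coincide, and sigma_max <= ||A||_F <= sqrt(r) * sigma_max.
```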
ones connected by an edge) should not be too far apart as drawn. j ≤ n. = i..and y-coordinates). then U A F = AV F = A is not changed by a pre. σr are the singular values of A. . that |Aij | ≤ ǫ for 1 ≤ i. yi = 0. we will choose the x. When we draw the graph.and y-coordinates is zero. So we can assume that n n xi = 0.(d) σmin (XY ) ≥ σmin (X)σmin (Y ). i=1 i=1 x2 = 1. where σ1 . The problem. which results in J = 0.e. the sum of the diagonal entries. that is. The objective J is precisely the sum of the squares of the lengths (in R2 ) of the drawn edges of the graph.e. of course. To force the nodes to spread out. We have to introduce some other constraints into our problem to get a sensible solution. we draw node i at the location (xi .orthogonal transformation.and a y. . (e) σmin (X + Y ) ≥ σmin (X) − σmax (Y ).

plot the value of J versus iteration. and Aij = 0 otherwise. without proving them. then by rotating or reﬂecting them. x1 . You are welcome to use the results described below. We’ll just live with this ambiguity. In describing your method. how to ﬁnd x and y that minimize J subject to the centering and spreading constraints. where the variable is x ∈ Rn . is x = qn . This means that if you have a proposed set of coordinates for the nodes. If your method is iterative.m. You can use any method or ideas we’ve encountered in the course. This mﬁle contains the node adjacency matrix of the graph. and Q ∈ R2×2 is any orthogonal matrix. and have the same value of J. . These coordinates are obtained using a standard technique for drawing a graph. the last says that the x. The mﬁle dg_data.scales to be equal. Give the value of J achieved by your choice of x and y. if x and y are any set of coordinates. so Aii = 0.The ﬁrst two say that the variance of the x.. these are just names for the equations given above. . . are orthonormal. Let A ∈ Rn×n n T be symmetric. [x_circ y_circ].e. You know that a solution of the problem minimize xT Ax subject to xT x = 1. so A is symmetric. and deﬁned as Aij = 1 if nodes i and j are connected by an edge.y. and Ik denotes the k × k identity matrix. . Be clear as to whether your approach solves the problem exactly (i. with λ1 ≥ · · · ≥ λn . ﬁnds a set of coordinates with J as small as it can possibly be). .and ycoordinates are uncorrelated. For example.e. . at least approximately. i. [x y]. and we assume k ≤ n. you get another set of coordinates that is just as good. Even with these constraints. You can draw the given graph this way using gplot(A. but perhaps not the absolute best). A solution of this problem is i 115 . say. you may not refer to any matlab commands or operators. for i = 1.) The three equations above enforce ‘spreading’ of the drawn graph. . according to our objective. . (b) Implement your method. Also. given the graph topology. 
and carry it out for the graph given in dg_data. or whether it’s just a good heuristic (i. and {q1 . qn } orthonormal. denoted A. a choice of coordinates that achieves a reasonably small value of J. axis(’square’). . . (You don’t have to know what variance or uncorrelated mean. then the coordinates given by xi ˜ xi =Q . which plots the graph and axis(’square’).. (The graph is undirected. The related maximization problem is maximize xT Ax subject to xT x = 1. A solution to this problem is x = q1 .e. Here’s the question: (a) Explain how to solve this problem.. . with eigenvalue decomposition A = i=1 λi qi qi . where the variable is X ∈ Rn×k . The constraint means that the columns of X.’o-’). .and y. xk .) Draw your ﬁnal result using the commands gplot(A. i = 1. . Now consider the following generalization of the ﬁrst problem: minimize Tr(X T AX) subject to X T X = Ik . which sets the x. your description must be entirely in mathematical terms.’o-’).. n. . the coordinates that minimize J are not unique. . by placing the nodes in order on a circle.coordinates is one.m also contains the vectors x_circ and y_circ. . with variable x ∈ Rn . the objective can be k written in terms of the columns of X as Tr(X T AX) = i=1 xT Axi . The radius of the circle is chosen so that x_circ and y_circ satisfy the centering and spread constraints. Verify that your x and y satisfy the centering and spreading conditions. we do not have self-loops. Hint. n yi ˜ yi satisfy the centering and spreading constraints.
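A sketch of the spectral-layout computation in Python/NumPy, following the hint: with the graph Laplacian L = diag(d) - A, the objective is J = x'Lx + y'Ly, so under the centering and spreading constraints the optimal coordinates are the eigenvectors of L for the two smallest nonzero eigenvalues (assuming the graph is connected, so the zero eigenvalue is simple with a constant eigenvector):

```python
import numpy as np

def draw_graph(Adj):
    """Exact minimizer of J subject to the centering/spreading constraints,
    via the eigendecomposition of the graph Laplacian (connected graph assumed)."""
    d = Adj.sum(axis=1)
    Lap = np.diag(d) - Adj
    lam, Q = np.linalg.eigh(Lap)     # ascending; lam[0] = 0 with constant q0
    # Eigenvectors 1 and 2 are unit norm, mutually orthogonal, and orthogonal
    # to the constant vector, so all five constraints hold automatically.
    return Q[:, 1], Q[:, 2], lam[1] + lam[2]

# Example: a 6-cycle, which this layout places on a circle.
n = 6
Adj = np.zeros((n, n))
for i in range(n):
    Adj[i, (i + 1) % n] = Adj[(i + 1) % n, i] = 1.0
x, y, J = draw_graph(Adj)
```

The returned J = lambda_2 + lambda_3 is the smallest achievable value, so this approach solves the problem exactly rather than heuristically.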

The related maximization problem is: maximize Tr(X^T AX) subject to X^T X = I_k, with variable X ∈ R^{n×k}. A solution of this problem is X = [q_1 · · · q_k]; a solution of the corresponding minimization problem is X = [q_{n−k+1} · · · q_n]. Note that when k = 1, this reduces to the first problem above.

15.20 Approximate left inverse with norm constraints. Suppose A ∈ R^{m×n} is full rank with m ≥ n, and α > 0 is given. We seek a matrix F ∈ R^{n×m} that minimizes ‖I − F A‖ subject to the constraint ‖F‖ ≤ α. Note that ‖I − F A‖ gives a measure of how much F fails to be a left inverse of A. Give an explicit description of an optimal F. Your description can involve standard matrix operations and decompositions (eigenvector/eigenvalue, QR, SVD, pseudo-inverse, . . . ).

15.21 Finding worst-case inputs. The single-input, single-output system

x(t + 1) = Ax(t) + Bu(t), y(t) = Cx(t), x(0) = 0,

where

A = [ 0.9 −0.5 ; 0.5 0.7 ], B = [ 1 ; −1 ], C = [ 1 2 ],

is a very simple (discretized and lumped) dynamical model of a building. The input u is ground displacement (during an earthquake), and y gives the displacement of the top of the building. The input u is known to satisfy Σ_{t=0}^{49} u(t)² ≤ 1 and u(t) = 0 for t ≥ 50, i.e., the earthquake has energy less than one, and only lasts 50 samples.

(a) How large can Σ_{t=0}^{99} y(t)² be? Plot an input u that maximizes Σ_{t=0}^{99} y(t)², along with the resulting output y.

(b) How large can |y(100)| be? Plot an input u that maximizes |y(100)|, along with the resulting output.

15.22 Worst and best direction of excitation for a suspension system. A suspension system is connected at one end to a base (that can move or vibrate) and at the other to the load (which it is supposed to isolate from vibration of the base). Suitably discretized, the system is described by

x(t + 1) = Ax(t) + Bu(t), y(t) = Cx(t), x(0) = 0,

where u(t) ∈ R³ represents the (x-, y-, and z-coordinates of the) displacement of the base, and y(t) ∈ R³ represents the (x-, y-, and z-coordinates of the) displacement of the load. The input u has the form u(t) = q v(t), where q ∈ R³ is a (constant) vector with ‖q‖ = 1, and v(t) ∈ R gives the displacement amplitude versus time. In other words, the driving displacement u is always in the direction q, with amplitude given by the (scalar) signal v. The response of the system is judged by the RMS deviation of the load over a 100 sample interval,

D = ( (1/100) Σ_{t=1}^{100} ‖y(t)‖² )^{1/2}.

The data A, B, C, and v(0), . . . , v(99) are known (and available in the mfile worst_susp_data.m on the course web site). The problem is to find the direction q_max ∈ R³ that maximizes D, and the direction q_min ∈ R³ that minimizes D. Give the directions and the associated values of D (D_max and D_min, respectively). As usual, you must explain how you solve the problem, as well as give explicit numerical answers and plots.
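The worst-case input questions above all reduce to a largest-singular-value computation: stacking the output samples as y = Hu, where H is the lower-triangular (Toeplitz) matrix built from the impulse response, the unit-energy input maximizing the output energy is the top right singular vector of H. The course uses Matlab; below is a Python/NumPy sketch of the same steps, with a small 2×2 system used purely as illustrative stand-in data.

```python
import numpy as np

# Illustrative 2x2 system (stand-in data for the building model).
A = np.array([[0.9, -0.5],
              [0.5,  0.7]])
B = np.array([[1.0], [-1.0]])
C = np.array([[1.0, 2.0]])

T_in, T_out = 50, 100   # u(t) = 0 for t >= 50; outputs y(1), ..., y(100)

# Build the map y = H u, with H[t-1, tau] = C A^(t-1-tau) B for tau < t.
Apow = [np.eye(2)]
for _ in range(T_out):
    Apow.append(A @ Apow[-1])
H = np.zeros((T_out, T_in))
for t in range(1, T_out + 1):
    for tau in range(min(t, T_in)):
        H[t - 1, tau] = (C @ Apow[t - 1 - tau] @ B).item()

# Worst-case unit-energy input and the maximum output energy.
U, s, Vt = np.linalg.svd(H, full_matrices=False)
u_worst = Vt[0]                  # unit-norm worst-case input
max_energy = s[0] ** 2           # max of sum_t y(t)^2 over ||u||^2 <= 1
```

For part (b) (maximizing a single output sample), the same idea applies with H replaced by its single row mapping u to y(100); the optimal input is then that row, normalized.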

oscillate)? Explain how you arrive at your conclusion and show any calculations (by hand or matlab) that you need to do.. We are looking for an answer that is accurate to within ˆ ±0.e. the result you found. x2 (3. t = 10. explain how to ﬁnd an A so that E1 = E2 . (The estimation method won’t make any use of the information that xi (0) > 0.1 . based on measurements of the total biomass taken at t = 0. 3. You can assume that xi (0) > 0 for i = 1. explain how to ﬁnd an S so that E1 = E2 . Find ˆ the optimal value for T (of course..1 0. we saw two diﬀerent ways of representing an ellipsoid. with S T = S > 0. In 100 words or less. You can assume that v1 + v2 + v3 ≤ 0. The ﬁrst uses a quadratic form: E1 = x xT Sx ≤ 1 .012 . explain how to ﬁnd all S that yield E1 = E2 . i. In the lectures.3 0 0 −0. and strain 3. (c) Selection of optimal assay time. Given A. where ˙ −0. For example. Be sure to say what the optimal T is. Posterior intuitive explanation. or not converge at all (for example. the sum of the squares of the measurement errors is not more than 0. centered at 0. (a) Given S.1. what the maximum value x(0) − x(0) is. x ≤ 1 } . (a) Give a very brief interpretation of the entries of the matrix A. for the T you choose). over all measurement errors that satisfy v1 + v2 + v3 ≤ 0.4 hours. You may use phrases such as ‘the presence of strain i enhances (or inhibits) growth of strain j’. We’ll judge accuracy by the maximum value ˆ 2 2 2 that x(0) − x(0) can have. what is the signiﬁcance of a13 = 0? What is the signiﬁcance of the sign of a11 ? Limit yourself to 100 words. where 1 ∈ R3 denotes the vector with all components one.01 . 15. You can also assume that a good method for computing the estimate of x(0).1 0 −0.24 Determining initial bacteria populations. between 0 and 10).1 You can assume that we always have xi (t) > 0. strain 2. (But you don’t need to know any biology to answer this question!) The population dynamics is given by x = Ax. ˆ 117 . grow without bound)..e. 
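For part (a) of problem 15.23, one workable choice is A = S^{−1/2}: then (Az)^T S (Az) = z^T z, so the image of the unit ball under A is exactly { x | x^T S x ≤ 1 }. Multiplying A on the right by any orthogonal matrix gives another valid A, which is the non-uniqueness part (c) asks about. A NumPy sketch (the course uses Matlab; the S below is randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A sample S = S^T > 0 (hypothetical data).
M = rng.standard_normal((3, 3))
S = M @ M.T + 3 * np.eye(3)

# Part (a): A = S^(-1/2) gives E1 = E2, since (Az)^T S (Az) = z^T z.
w, Q = np.linalg.eigh(S)
A = Q @ np.diag(w ** -0.5) @ Q.T

# Unit-sphere points z map to boundary points x = Az with x^T S x = 1.
Z = rng.standard_normal((3, 200))
Z /= np.linalg.norm(Z, axis=0)
X = A @ Z
boundary_vals = np.sum(X * (S @ X), axis=0)

# Part (c): A is not unique -- replacing A by A U with U orthogonal
# describes the same ellipsoid.
U_, _ = np.linalg.qr(rng.standard_normal((3, 3)))
X2 = (A @ U_) @ Z
boundary_vals2 = np.sum(X2 * (S @ X2), axis=0)
```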
the biomass of each strain changes through several mechanisms including cell division. is as accurate as possible. with non-zero volume.) The problem here is to choose the time T of the intermediate assay in such a way that the estimate of x(0). does the total biomass converge to ∞ (i. given the measurements.. and t = T . 2 i. cell death. The three measurements of total biomass (which are called assays) will include a small additive error.. will be used. (b) Given A. Of course you must explain exactly what you are doing.4) denotes the amount of strain 2 (in grams) in the sample at time t = 3. where T satisﬁes 0 < T < 10.15. intuitively. denoted x(0). measured in hours. i. A= 0. explain how to ﬁnd all A that yield E1 = E2 . For example. and what the optimal accuracy is (i. and submit your matlab code as well the output it produces. The second is as the image of a unit ball under a linear mapping: E2 = { y | y = Ax..23 Two representations of an ellipsoid. The total biomass at time t is given by 1T x(t) = x1 (t) + x2 (t) + x3 (t). with det(A) = 0. or biomass (in grams) of the strains present in the population at time t.e. We consider a population that consists of three strains of a bacterium. (c) What about uniqueness? Given S.e. denoted v1 (for the assay at t = 0). the vector x(0) ∈ R3 . converge to zero.2 0. the biomass of each strain is always positive. and mutation. the value of T that minimizes the maximum value x(0) − x(0) can have. called strain 1.e.e. Over time. give a plausible story that explains. 2. v2 2 2 2 (for the assay at t = T and v3 (for the assay at t = 10). i. A biologist wishes to estimate the original biomass of each of the three strains.012 . (b) As t → ∞. The vector x(t) ∈ R3 will denote the amount.

15.25 A measure of connectedness in a graph. We consider an undirected graph with n nodes, connected by links, described by its adjacency matrix A ∈ R^{n×n}, defined by A_ij = 1 if there is a link connecting nodes i and j, and A_ij = 0 otherwise. Note that A = A^T, and that A_ii = 0, i.e., the graph has no self-loops. We assume the graph has at least one link, so A ≠ 0. A path from node i to node j, of length m > 0, is an m + 1-long sequence of nodes, connected by links, that starts at i and ends at j. More precisely, it is a sequence i = k_1, k_2, . . . , k_{m+1} = j, with the property that A_{k1,k2} = · · · = A_{km,km+1} = 1. Note that a path can include loops; there is no requirement that each node be visited only once. For example, if node 3 and node 5 are connected by a link (i.e., A_35 = 1), then the sequence 3, 5, 3, 5 is a path between node 3 and node 5 of length 3. We say that each node is connected to itself by a path of length zero. Let P_m(i, j) denote the total number of paths of length m from node i to node j. We define

C_ij = lim_{m→∞} P_m(i, j) / Σ_{i,j=1}^{n} P_m(i, j),

when the limits exist; when the limits don't, we say that C_ij isn't defined. In the fraction in this equation, the numerator is the number of paths of length m between nodes i and j, and the denominator is the total number of paths of length m, so the ratio gives the fraction of all paths of length m that go between nodes i and j. When C_ij exists, it gives the asymptotic fraction of all (long) paths that go from node i to node j, so it is a good measure of how "connected" nodes i and j are in the graph. You can make one of the following assumptions: (a) A is full rank. (b) A has distinct eigenvalues. (c) A has distinct singular values. (d) A is diagonalizable. (e) A has a dominant eigenvalue, i.e., |λ_1| > |λ_i| for i = 2, . . . , n, where λ_1, . . . , λ_n are the eigenvalues of A. (Be very clear about which one you choose.) Using your assumption, explain why C_ij exists, and derive an expression for C_ij. You can use any of the concepts from the class, such as singular values and singular vectors, eigenvalues and eigenvectors, etc., but you cannot leave a limit in your expression. You must explain why your expression is, in fact, equal to C_ij.

15.26 Recovering an ellipsoid from boundary points. You are given a set of vectors x^(1), . . . , x^(N) ∈ R^n that are thought to lie on or near the surface of an ellipsoid centered at the origin, which we represent as E = { x ∈ R^n | x^T Ax = 1 }, where A = A^T ∈ R^{n×n}. Your job is to recover, at least approximately, the matrix A, given the observed data x^(1), . . . , x^(N). Explain your approach to this problem, and then carry it out on the data given in the mfile ellip_bdry_data.m. Be sure to explain how you check that the ellipsoid you find is reasonably consistent with the given data, and also that the matrix A you find does, in fact, correspond to an ellipsoid. To simplify the explanation, you can give it for the case n = 4 (which is the dimension of the given data); but it should be clear from your discussion how it works in general.

15.27 Predicting zero crossings. We consider a linear system of the form ẋ = Ax, y = Cx,
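Under assumption (e) of problem 15.25, A^m ≈ λ_1^m v v^T for large m, where v is a unit eigenvector for the dominant eigenvalue λ_1; this suggests C_ij = v_i v_j / (1^T v)². A NumPy sketch checking that expression against the definition on a small hypothetical graph:

```python
import numpy as np

# Small connected, non-bipartite test graph (hypothetical): a triangle
# with one pendant node, so the dominant-eigenvalue assumption holds.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Dominant eigenvector v (entrywise positive after a sign flip).
lam, V = np.linalg.eigh(A)
v = V[:, np.argmax(np.abs(lam))]
if v.sum() < 0:
    v = -v

# A^m ~ lam1^m v v^T, so Pm(i,j) / sum Pm -> v_i v_j / (1^T v)^2.
C = np.outer(v, v) / v.sum() ** 2

# Sanity check directly from the definition, with a large path length m.
m = 60
Am = np.linalg.matrix_power(A, m)   # (A^m)_{ij} = Pm(i, j)
C_direct = Am / Am.sum()
```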

. but we do not know x(0). i = 2.. . . t2 = 1. explain in detail why.e.) You will answer this question for the speciﬁc system 0 1 0 0 0 0 1 0 . we are given the zero-crossing times t1 . tp at which y(ti ) = 0. We cannot directly observe the output. C. if that’s part of your procedure. 0 0 0 1 −18 −11 −12 −2 and zero-crossing times t1 = 0. it is enough to just have the value zero. We know A and C. (Note that this deﬁnition of zero-crossing times doesn’t require the output signal to cross the value zero. but have diﬀerent 5th zero-crossings. .) Be sure to make it clear which one of these options you choose. and the zero-crossing times t1 . Our goal is to design the (ﬁnite) impulse response of an equalizer. . A= C= 1 1 1 1 .e.. t4 as the ﬁrst 4 zero-crossings.) We are interested in the following question: given A. i. . We are given the (ﬁnite) impulse response of a communications channel. . . Hint: Be careful estimating rank or determining singularity. ∗c ∗w 119 .000. i. .e. i. 2n. and y(tp+1 ) = 0. t = tp . . 0 ≤ t1 < · · · < tp . . the real numbers c1 . . and ﬁnd two trajectories of the system which have t1 .143. you must do one of the following: • If you think that you can determine the next zero-crossing time t5 .. . but we do know the times at which the output is zero. 15. where we take wi and ci to be zero for i ≤ 0 or i > n.. can we predict the next zero-crossing time tp+1 ? (This means. tp . and that y(t) = 0 for 0 ≤ t < tp and t = t1 . i. w2 . . . wn . . . i−1 hi = j=1 wj ci−j .e. explain in detail how to do it.. . . remember that the zero-crossing times are only given to three signiﬁcant ﬁgures. . c2 . the real numbers w1 . that y(t) = 0 for tp < t < tp+1 . .where x(t) ∈ Rn and y(t) ∈ R. .28 Optimal time compression equalizer.) The equalized channel response h is given by the convolution of w and c. and ﬁnd the next time t5 (to at least two signiﬁcant ﬁgures). Speciﬁcally. . the equalizer has the same length as the channel. (So here we have p = 4.) 
Note that the zero-crossing times are given to three signiﬁcant digits. t4 = 3. . . . t3 = 2. of course. • If you think that you cannot determine the next zero-crossing time t5 . (To make things simple. and not just in the last signiﬁcant digit. i.e.000. cn . . This is shown below. . (The zero-crossings should diﬀer substantially. You can assume these times are given in increasing order.000.
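The core of problem 15.27 is that each zero-crossing gives one linear equation on the initial state: y(t_i) = C e^{A t_i} x(0) = 0. Stacking the four equations gives a 4×4 matrix G; if G has a one-dimensional nullspace, x(0) is pinned down up to scale, and the next crossing is determined. The NumPy/SciPy sketch below illustrates this on a simulated trajectory with a randomly chosen x(0); it does not reproduce the answer for the specific crossing times given in the problem.

```python
import numpy as np
from scipy.linalg import expm

# System from the problem statement.
A = np.array([[0., 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [-18, -11, -12, -2]])
C = np.array([1., 1, 1, 1])

# Diagonalize once so y(t) = C exp(At) x0 is cheap to evaluate.
lam, V = np.linalg.eig(A)
cV, Vinv = C @ V, np.linalg.inv(V)

def y(t, x0):
    return np.real((np.exp(lam * t) * cV) @ (Vinv @ x0))

def crossings(x0, n, t_max=20.0):
    """First n zero-crossing times of y, by grid search plus bisection."""
    ts = np.linspace(1e-6, t_max, 40000)
    ys = np.real((np.exp(np.outer(ts, lam)) * cV) @ (Vinv @ x0))
    out = []
    for i in np.nonzero(np.sign(ys[:-1]) != np.sign(ys[1:]))[0]:
        a, b = ts[i], ts[i + 1]
        for _ in range(60):
            mid = 0.5 * (a + b)
            a, b = (a, mid) if np.sign(y(a, x0)) != np.sign(y(mid, x0)) else (mid, b)
        out.append(0.5 * (a + b))
        if len(out) == n:
            break
    return out

# Simulated trajectory with a (hypothetical) random initial state.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(4)
t4 = crossings(x0, 4)

# Each crossing gives one equation C exp(A t_i) x(0) = 0; stack them.
G = np.array([C @ expm(A * t) for t in t4])
_, s, Vt = np.linalg.svd(G)
x0_hat = Vt[-1]   # if the nullity of G is 1, this is x(0) up to scale
```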

(b) Suppose the safe operating region is the unit cube C = { x | |xi | ≤ 1. Thus number is clearly between 0 and 1. The input u is beyond our control. (Emin can also be justiﬁed as a system security measure on statistical grounds. Repeat part (a) for Emin . The safe operating region for the state is the unit ball B = { x | x ≤ 1 }. Please note: You do not need to know anything about equalizers. or other matrices derived from them such as the controllability matrix C. and so on. we have Etot > 0.The goal is to choose w so that most of the energy of the equalized impulse response h is concentrated within k samples of t = n + 1. least-squares.) (a) Find Emin explicitly. we can be fairly conﬁdent that the state will not leave the safe operating region. i = 1.) If Emin is much larger than the energy of the u’s we can expect. (Note that we do not specify t.g. n } instead of the cube unit ball B. . Plot your solution w. where x(t) ∈ Rn and u(t) ∈ Rm . and its −1 inverse P = Wc . singular values and singular vectors. the time at which the state is outside the safe operating region. we deﬁne the desired to total energy ratio. e. To deﬁne this formally. or DTE. We consider the stable controllable system x = Ax + Bu. i and the energy in the desired time interval as n+1+k Edes = i=n+1−k h2 . and explain why it works. least-norm. the controllability Gramian Wc . . Make sure you give the simplest possible expression for Emin . (b) Carry out your method for time compression length k = 1 on the data found in time_comp_data. and give the DTE for your w. we ﬁrst deﬁne the total energy of the equalized response as 2n Etot = i=2 h2 . i For any w for which Etot > 0. eigenvalues and eigenvectors. Let Emin denote the minimum energy required to drive the state outside the unit cube cube C.m. The hope is that input u will not drive the state outside the safe operating region. . 
One measure of system security that is used is the minimum energy Emin that is required to drive the state outside the safe operating region: t Emin = min 0 u(τ ) 2 dτ x(t) ∈ B . . as DTE = Edes /Etot . t = n + 1 + k. B.. it tells us what fraction of the energy in h is contained in the time interval t = n + 1 − k. but we ˙ have some idea of how large its total energy ∞ 0 u(τ ) 2 dτ is likely to be. or even convolution. You can appeal to any concepts used in the class. 120 . . Your description and justiﬁcation must be very clear. . x(0) = 0. Your solution should be in terms of the matrices A. 15. everything you need to solve this problem is clearly deﬁned in the problem statement.29 Minimum energy required to leave safe operating region. but we won’t go into that here. the equalized response h. communications channels. where k < n − 1 is given. matrix exponential. . You can assume that h is such that for any w = 0. (a) How do you ﬁnd a w = 0 that maximizes DTE? You must give a very clear description of your method. .
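For part (a) of problem 15.29, the minimum energy needed to reach a state z (letting t grow) is z^T W_c^{−1} z, where W_c is the controllability Gramian, so the minimum energy to exit the unit ball is min over ‖z‖ = 1 of z^T W_c^{−1} z, which equals 1/λ_max(W_c). A SciPy sketch on a hypothetical stable, controllable system (the course itself uses Matlab):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical stable, controllable system.
A = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])
B = np.array([[1.0], [1.0]])

# Controllability Gramian: A Wc + Wc A^T + B B^T = 0.
Wc = solve_continuous_lyapunov(A, -B @ B.T)
Wc = 0.5 * (Wc + Wc.T)               # symmetrize against round-off

# Min energy to reach z is z^T Wc^{-1} z, so the minimum over the unit
# sphere is attained at the top eigenvector of Wc.
w, Q = np.linalg.eigh(Wc)
Emin = 1.0 / w.max()
z_star = Q[:, np.argmax(w)]          # cheapest boundary state to reach
```

For part (b) (the unit cube), the minimum over the ball is replaced by a minimum over the 2n faces, each of which is a constrained version of the same quadratic problem.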

C. . y(N ). and u(t). G. B. . u(T − 1) that reaches position f ∈ R3 at time T (where T ≥ n). Still. F . . v(0) 2 + · · · + v(T − 1) 2 ≤ 1? The input v is maximally evasive. y(t) ∈ R.) We deﬁne the input energy as Ein = and similarly. and minimizes the input ‘energy’ u(0) 2 + · · · + u(T − 1) 2 . y(2N − 1). A vehicle is governed by the following discrete-time linear dynamical system equations: x(t + 1) = Ax(t) + Bu(t). Express your answer in terms of the data A. . y(t) ∈ R3 is the vehicle position. i.. The initial state is zero. i. . B. . and are interested in the output over the next N samples. (Of course E will depend on the data A.e. w(t) = Hz(t). . .30 Energy storage eﬃciency in a linear dynamical system. . u(N − 1) to maximize the ratio of output energy to input energy. The input u is the (energy) optimal input for the vehicle to arrive at the position f at time T .. the equation above is the result of a suitable sampling. u(N − 1). i. the time of intercept (T ) is known to the second vehicle. . This vehicle is to be overtaken (intercepted) by the ﬁrst vehicle at time T . . . where T ≥ n. given the limit on input energy the second vehicle is allowed to use.. y(T ) = f . the output energy is deﬁned as Eout = 2N −1 t=N N −1 t=0 u(t)2 . 121 . and standard matrix functions (inverse.31 Energy-optimal evasion. We consider the discrete-time linear dynamic system x(t + 1) = Ax(t) + Bu(t). for several reasons: both vehicles start from the zero initial state. We apply an input sequence u(0). and f . and u(t) ∈ Rm is the input signal. w(t) ∈ R3 is the vehicle position. . C. Here x(t) ∈ Rn is the vehicle state.) The system is controllable. y(t) = Cx(t). where x(t) ∈ Rn . and v(t) ∈ Rm is the input signal. SVD. x(0) = 0.e. . How would you choose the (nonzero) input sequence u(0).) (b) Energy-optimal evasion. it’s possible to extend the results of this problem to handle a realistic model of a pursuit/evasion. . . Find the input u(0).e. Give an expression for E. 
subject to a limit on input energy.15. in the sense that is requires the ﬁrst vehicle to expend the largest amount of input energy to overtake it. y(t)2 . x(0) = 0. and the place of intercept (w(T )) is known ahead of time to the ﬁrst vehicle. . (a) Minimum energy to reach a position. to maximize Eout /Ein ? What is the maximum value the ratio Eout /Ein can have? 15. This means that w(T ) = y(T ). v(T − 1) that maximizes the minimum energy the ﬁrst vehicle must expend to intercept the second vehicle at time T . How would you ﬁnd v(0). . ). Now consider a second vehicle governed by z(t + 1) = F z(t) + Gv(t). (The vehicle dynamics are really continuous. Remark: This problem is obviously not a very realistic model of a real pursuit-evasion situation.e. the energy of the minimum energy input. z(0) = 0 where z(t) ∈ Rn is the state of the vehicle. . . (We take u(t) = 0 for t ≥ N . . . y(t) = Cx(t).. transpose. H. . i.

33 Worst time for control system failure. over all possible impact directions and magnitudes no more than one (i. You must also give your source code. We consider a (time-invariant) linear dynamical system x = Ax + Bu. We are interested in the deviation D between x(T ) and xnom (T ). In other words. we would like to know the maximum possible state deviation. At time t = Tf . and the results. and the resulting value of x(10) . As usual.m. In other words. B. you must explain your approach clearly and completely. and over all possible impact times between 0 and T− . and is very commonly used in automatic control systems.. In this problem the input u represents an impact on the system. and does aﬀect x(T ). T ].01 for Tf is ﬁne. and T ) say so. which means that input u is meant to drive the state to zero as t → ∞. at time t = T . and the input signal becomes zero. 9]) and the initial condition x(0) (with x(0) ≤ 1) that maximizes x(10) . D measures how far the impact has shifted the state at time T . for t < Tf and t > Tf + Tc . A.01). if needed. 122 . xnom (0) = xinit .m on the class web site. and carry out the procedure described in part (a).e. You can use any of the concepts we have encountered in the class. (a) Explain how to ﬁnd the worst-case impact time. xinit . and K. due to an impact of magnitude no more than one... If either the worst-case impact time or the worst-case impact vector do not depend on some of the problem data (i. given the problem data A..e. however. and the corresponding value of D.e.) Here the application is a regulator.e. (b) Get the data from wc_impact_data. and Timp is the time of the impact. with u = 0). An accuracy of 0. B. B. We assume that 0 ≤ Timp ≤ T− . respectively. In essence. the worst failure time Tf . for Tf ≤ t ≤ Tf + Tc . we have 0 ≤ Tf ≤ 9. of the linear system xnom = Axnom . ˙ u(t) = Kx(t). but you can assume that x(0) ≤ 1. (Timp = T− means that the impact occurs right at the end of the period of interest. 
Your explanation should be as short and clear as possible. where g ∈ Rm is a vector that gives the direction and magnitude of the impact. the worse possible x(0).15. We don’t know what x(0) is. The fault is cleared (i. g ≤ 1). some time in the time interval [0. in the mﬁle fault_ctrl_sys. 15. which deﬁnes A. 9]. Thus. the worst-case impact vector. You’ll ﬁnd ˙ the speciﬁc matrices A. and the worst-case impact vector. Your approach can include a simple numerical search (such as plotting a function of one variable to ﬁnd its maximum). Be sure to give us the worst-case impact time (with absolute precision of 0. had the impact not occured (i.32 Worst-case analysis of impact. The problem is to ﬁnd the time of failure Tf (in [0. The vector xnom (T ) is what the ﬁnal state x(T ) ˙ of the system above would have been at time t = T . as measured by the norm: D = x(T ) − xnom (T ) . we have x(t) = Ax(t). We would like to know how large D can be. a fault occurs. and T . Here’s the problem: suppose the system fails for one second. so it has the form u(t) = gδ(t − Timp ). and input u(t) ∈ Rm . We’ll call the choices of Timp and g that maximize D the worst-case impact time and worst-case impact vector. B. and Tc = 1. with state x(t) ∈ Rn . xinit . We are interested in the state trajectory over the time interval [0.e. In this problem we consider a system that under normal circumstances is governed by the equations x(t) = Ax(t) + Bu(t). xinit . ˙ x(0) = xinit . no matter what x(0) is. i. at time T ..) We let xnom (T ) denote the state. (3) (This is called state feedback. you are carrying out a worst-case analysis of the eﬀects of a one second control system failure. and T . corrected) Tc seconds after it occurs. We also don’t know the time of failure Tf . the system is governed by the equations (3).

Ak ≥ B k . (Our main interest is in the case when N is much larger than n. which might represent a collection of measurements or other data. for example. For example. C and D have the same number of rows. . two points for the right answer (i. the statement “A + B = B + A” is true. B } ≤ A B ≤ A 2 + B 2. because no matter what the dimensions of A and B (which must. . Then A is singular.e.e. the ratio of the largest to the smallest singular value. In other words. . What we mean by “true” is that the statement is true for all values of the matrices and vectors given. . . and the goal is to ﬁnd m. . . i. . and x1 . This problem is about uncovering. we have max{ A . A and C have the same number of columns. (d) For any A. Then we have κ(A) ≤ κ([ A B ]). true or false). 15. In this case we have 500 vectors. the right answer and a valid proof or counterexample). each vector yi consists of 30 scalar measurements or data points. n = 30. A = 2 (which is a matrix in R1×1 ). . C. as usual. Determine if the following statements are true or false. or discovering. (This means A is obtained from A by removing some rows and columns from A. but you can assume that the dimensions are such that all expressions make sense.) Then we say that y = Ax + b is a linear explanation of the data y. and B and D have the same number of columns. A C B D ≤ A C B D . as an extreme case. provide a speciﬁc counterexample. For example. If the statement is true. the statement A2 = A is false. and m is smaller than n. xi ∈ Rm .. the statement holds. b. . provide a speciﬁc (numerical) counterexample. any element of A is a (1 × 1) submatrix of ˜ A. yN ∈ Rn .35 Uncovering a hidden linear explanation. To judge the accuracy of a proposed explanation. Then. (b) Suppose A. and zero points otherwise. and m = 5. i = 1. and Akk = 0 for some k (between 1 and n).. . however. As another example.. for all k ≥ 1.e. In such a case. (e) For any A and B with the same number of columns. 
and no matter what values A and B have. the condition number of the matrix Z. (f) Suppose the fat (including. with A > B . . suppose N = 500. You get ﬁve points for the correct solution (i. where κ(Z) denotes. B ∈ Rn×n . with m < n. ˜ ˜ (c) Suppose A is a submatrix of a matrix A ∈ Rm×n . if you think it is false. because there are (square) matrices for which this doesn’t hold. a linear explanation of a set of data. we’ll use the RMS explanation error. D with compatible dimensions (see below). A. xN so that yi ≈ Axi + b. B. Suppose we have yi ≈ Axi + b. . . (a) Suppose A = AT ∈ Rn×n satisﬁes A ≥ 0.e. and b ∈ Rn . J= 1 N N i=1 1/2 yi − Axi − b 123 2 . yN . be the same). . . possibly.. Consider a set of vectors y1 .15. We refer to x as the vector of factors or underlying causes of the data y.) You can’t assume anything about the dimensions of the matrices (unless it’s explicitly stated). Compatible dimensions means: A and B have the same number of rows.34 Some proof or counterexample questions.) Then A ≤ A . we are given y1 . given only the data. . i. where A ∈ Rn×m . But these 30-dimensional data points can be ‘explained’ by a much smaller set of 5 ‘factors’ (the components of xi ). (You can assume the entries of the matrices and vectors are all real. prove it. N. square) and full rank matrices A and B have the same number of rows.

we have yi = Axi + b. (This is just an example. value of J. Suppose A.000. yN . Be sure to say how you ﬁnd m. i=1 1 N N xi xT = I. . . Even if we ﬁx the number of factors as m. for i = 1. 3. i. how large (or small) can the speciﬁed quantity be. with singular values 2. we’ll accept a nonzero. ˜ = b − AF −1 g. . yi − Axi − b .. b. we want m. To standardize or normalize the linear explanation. with Σ = diag(σ1 . but hopefully small.36 Some bounds on singular values. of course. .) Generally. . i i=1 In other words. and x1 . 5. and generate another equally good explanation of the original data by appropriately adjusting A and b. with singular values 7. the dimension of the underlying causes. (We allow the possibility that some of these singular values are zero. b. the number of factors in the explanation. . for any A and B with the given sizes and given singular values.420.. . the matrix A. let F ∈ Rm×m be invertible.. . .m available on the course web site. the data explains itself! In this case.. Give your answers with 3 digits after the decimal place. . xN have the required properties.e. . .000. the problem. Suppose A ∈ R6×3 . a linear explanation of a set of data is not unique. . (This gives a good picture of the distribution of explanation errors. it is usually assumed that 1 N N xi = 0.e. Thus. 1. N . and b = 0. and unit sample covariance. Let C = [A B] ∈ R6×6 . . (You don’t need to know what covariance is — it’s just a deﬁnition here. and B ∈ R6×3 . so the RMS explanation error is zero. σ6 ). we can apply any aﬃne (i. xN is a linear explanation of our data. xN . (d) 0. have the required properties. Then we can multiply the matrix A by two (say). . . linear plus constant) mapping to the underlying factors xi . In other words. . . In other words.. subject to the constraint that J is not too large. The ﬁle gives the matrix Y = [y1 · · · yN ]. . (b) 10. Then we have yi ≈ Axi + b = (AF −1 )(F xi + g) + (b − AF −1 g). . and g ∈ Rm . 
we want to have m substantially smaller than n. . and the vectors x1 . with xi ∈ Rm . xN . 15. xN obtained. To be a useful explanation. . . the underlying factors have an average value zero. b x1 = F x1 + g. 124 . and divide each vector xi by two. with full SVD C = U ΣV T .) By explicit computation verify that the vectors x1 . xN = F xN + g ˜ is another equally good linear explanation of the data. ˜ . ﬁnd A and B that achieve the values you give. (b) Carry out your method on the data in the ﬁle linexp_data. .) (a) How large can σ1 be? (b) How small can σ1 be? (c) How large can σ6 be? (d) How small can σ6 be? What we mean is. . But this is not what we’re after. .552. Give your ﬁnal A. More generally. substantially fewer factors than the dimension of the original data (and for this smaller dimension. . and verify that yi ≈ Axi + b by calculating the norm of the error vector.) Finally. 2. as in (a) 12. Sort these norms in descending order and plot them. ˜ A = AF −1 . . . (a) Explain clearly how you would ﬁnd a hidden linear explanation for a set of data y1 . . A = I. (c) 0. the vector b. to be as small as possible. and x1 . Explain clearly why the vectors x1 . .) Brieﬂy justify your answers. and we have another linear explanation of the original data. .One rather simple linear explanation of the data is obtained with xi = yi .

) (b) Find symmetric matrices A and B for which neither A ≥ B nor B ≥ A holds.3 −0. (f) Rank(eA − I) ≤ Rank(A).38 Some true-false questions.4 −0. A correlation matrix C ∈ Rn×n is one that has unit diagonal entries. but there is a vector x for which xT Ax < 0. i.) Moral: You cannot use positivity of the eigenvalues of A as a test for whether xT Ax ≥ 0 holds for all x.) Whether or not you give an analytical description. (d) κ(eA ) ≤ e2 A (c) κ(eA ) ≤ eκ(A) . (In particular.e. .39 Possible values for correlation coeﬃcients.15. Of course.8 What are all possible values of C23 ? Justify your answer. If you can give an analytical solution in terms of any concepts from the class (eigenvalues. In the following statements. that maximize y T Ax. we’d like the simplest examples in each case. Of course. for i = 1.1 0. How would you ﬁnd vectors y and x. you must explain how you ﬁnd the possible values that C23 can take on. A . The only answers we will read are ‘True’. and κ refers to the condition number. (Give A and x. 15. You may use the fact that C > 0 when C23 = 0. (b) σmin (e ) ≥ eσmin (A) . . 15. . etc. A ∈ Rn×n . singular values. and ‘My attorney has advised me to not answer this question at this time’.40 Suppose that A ∈ Rm×n . (This last choice will receive partial credit. . Suppose that a correlation matrix has the form below: 1 0.) do so. Cii = 1.1 0. or provide any counter-examples. ‘False’. give a numerical description (and explain your method.) If you write anything else. and is symmetric and positive semideﬁnite. 15. do not write justiﬁcation for any answer.4 1 C23 C= −0.. if it diﬀers from the analytical method you gave). and the eigenvalues of A.2 0. C23 = 0 is a possible value. ‘True’ means that the statement holds for any matrix A ∈ Rn×n . x = 1? What is the resulting value of y T Ax? 0. (a) eA ≤ e A A . matrix exponential. you will receive no credit for that statement. In particular. Correlation matrices come up in probability and statistics. 
σmin refers to σn (the nth largest singular value).37 Some simple matrix inequality counter-examples.2 C23 1 0. but you don’t need to know anything from these ﬁelds to solve this problem. ‘False’ means that the statement is not true. subject to y = 1.3 −0. n. (a) Find a (square) matrix A. (e) Rank(e ) ≥ Rank(A).8 1 125 . pseudoinverse. Tell us whether each statement is true or false. for any n. which has all eigenvalues real and positive. What is the correct way to check whether xT Ax ≥ 0 holds for all x? (You are allowed to ﬁnd eigenvalues in this process.
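For problem 15.39, the possible values of C_23 are exactly those that keep C positive semidefinite; since the set of PSD matrices is convex, the feasible values form an interval. A NumPy sketch of a numeric scan, using stand-in entries (hypothetical, chosen only to show the method; the exercise's own matrix should be substituted):

```python
import numpy as np

# Stand-in 4x4 correlation matrix with unknown entry c in the (2,3)
# position (hypothetical entries, for illustration only).
def corr(c):
    return np.array([[ 1.0,  0.2, -0.1,  0.1],
                     [ 0.2,  1.0,   c,  -0.1],
                     [-0.1,   c,  1.0,   0.6],
                     [ 0.1, -0.1,  0.6,  1.0]])

# Feasible set {c : corr(c) is PSD}; find it by scanning the min eigenvalue.
cs = np.linspace(-1, 1, 4001)
feas = np.array([np.linalg.eigvalsh(corr(c))[0] >= -1e-9 for c in cs])
lo, hi = cs[feas][0], cs[feas][-1]   # endpoints of the feasible interval
```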

Lecture 16 – SVD applications

16.1 Blind signal detection. A binary signal s_1, . . . , s_T, with s_t ∈ {−1, 1}, is transmitted to a receiver, which receives the (vector) signal y_t = a s_t + v_t ∈ R^n, t = 1, . . . , T, where a ∈ R^n and v_t ∈ R^n is a noise signal. We'll assume that a ≠ 0, and that the noise signal is centered around zero, but is otherwise unknown. (This last statement is vague, but it will not matter.) The receiver will form an approximation of the transmitted signal as ŝ_t = w^T y_t, t = 1, . . . , T, where w ∈ R^n is a weight vector. Your job is to choose the weight vector w so that ŝ_t ≈ s_t. Here's the catch: you don't know the vector a. Estimating the transmitted signal, given the received vector signal, when you don't know the mapping from transmitted to received signal (in this case, the vector a) is called blind signal estimation or blind signal detection.

Here is one approach. If you knew the vector a, then a reasonable choice for w would be w = a† = a/‖a‖². This choice is the smallest (in norm) vector w for which w^T a = 1. Ignoring the noise signal, and assuming that we have chosen w so that w^T y_t ≈ s_t, we would have (1/T) Σ_{t=1}^{T} (w^T y_t)² ≈ 1. Since w^T v_t gives the noise contribution to ŝ_t, we want w to be as small as possible. This leads us to choose w to minimize ‖w‖ subject to (1/T) Σ_{t=1}^{T} (w^T y_t)² = 1. This doesn't determine w uniquely: we can multiply it by −1 and it still minimizes ‖w‖ subject to (1/T) Σ_{t=1}^{T} (w^T y_t)² = 1. So we can only hope to recover either an approximation of s_t, or of −s_t. (In practice we'd use other methods to determine whether we have recovered s_t or −s_t; if we don't know a, we really can't do any better.)

(a) Explain how to find w, using concepts from the class.

(b) Apply the method to the signal in the file bs_det_data.m, which contains a matrix Y, whose columns are y_t, and the original signal, as a row vector s. Give the weight vector w that you find. Plot a histogram of the values of w^T y_t using hist(w'*Y,50). You'll know you're doing well if the result has two peaks, one negative and one positive. Once you've chosen w, a reasonable guess of s_t (or, possibly, its negative −s_t) is given by s̃_t = sign(w^T y_t), where sign(u) is +1 for u ≥ 0 and −1 for u < 0. Give your error rate, the fraction of times for which s̃_t ≠ s_t. (If this is more than 50%, you are welcome to flip the sign on w.)

16.2 Alternating projections for low rank matrix completion. In the low rank matrix completion problem, you are given some of the entries of a matrix, along with an upper bound on its rank; you are to guess or estimate the remaining entries. This arises in several applications, one of which is described at the end of this problem. This question investigates a heuristic method for the low rank matrix completion problem. You are told that A ∈ R^{m×n} has rank ≤ r, and that A_ij = Aknown_ij for (i, j) ∈ K, where K ⊆ {1, . . . , m} × {1, . . . , n} is the set of indices of the known entries. (You are given Aknown_ij for (i, j) ∈ K.) We let p = |K| denote the number of known entries. You are to estimate or guess the entries A_ij for (i, j) ∉ K. You will use an alternating projection method to find an estimate Â of A. After choosing an initial Â^(0) that has the known correct entries, i.e., Â^(0)_ij = Aknown_ij for (i, j) ∈ K, you will alternate between two projections. For k = 0, 1, . . . , you carry out the following steps:
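For part (a) of problem 16.1: with Y = [y_1 · · · y_T] and SVD Y = UΣV^T, we have (1/T)‖Y^T w‖² = (1/T) Σ σ_i² (u_i^T w)², so the minimum-norm w meeting the constraint puts all its weight on the top left singular vector: w = (√T/σ_1) u_1. A NumPy sketch on synthetic data (the actual exercise uses the data in bs_det_data.m):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data: unknown direction a, +/-1 signal, small noise.
n, T = 10, 500
a = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=T)
Y = np.outer(a, s) + 0.1 * rng.standard_normal((n, T))

# min ||w|| s.t. (1/T) ||Y^T w||^2 = 1  =>  w = (sqrt(T)/sigma1) u1,
# where u1 is the top left singular vector of Y.
U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
w = np.sqrt(T) / sig[0] * U[:, 0]

s_hat = np.sign(w @ Y)
if np.mean(s_hat == s) < 0.5:      # resolve the +/- sign ambiguity
    s_hat = -s_hat
error_rate = np.mean(s_hat != s)
```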

but overly complicated methods. (a) Clearly explain how to perform each of these projections. that minimizes that is closest to A ˜ ˆ A(k) − A(k) ˜ subject to Rank(A(k) ) ≤ r. None of this is needed to solve the problem. Algorithms like this can be used for problems like the Netﬂix challenge. that minimizes known entries that is closest to A ˆ ˜ A(k+1) − A(k) m n F ˆ(k+1) = Aknown for (i. This ﬁle deﬁnes the rank upper bound r.m. or it can converge to diﬀerent limit points. m n F = i=1 j=1 ˜(k) ˆ(k) (Aij − Aij )2 1/2 ˆ • Project to the closest matrix with the known entries. for example.. We will subtract points for technically correct. the entries in the kth right singular vector tell us how much of factor k (positive or negative) is in each movie. Let µ denote the mean of all the known entries. it is only for your amusement and interest. Make a very brief comment about how well the algorithm worked on this data set. .3 Least-squares matching of supply and demand on a network. Set A(k) to be the matrix of rank ≤ r ˆ(k) in Frobenius norm. 16. j) ∈ K. and want to predict other ratings before they are given.e. The known matrix indices are given as a p × 2 matrix K. with each row giving (i. i. A network consists of a directed graph with n nodes connected by m directed links. for k = 1. the mﬁle also gives the actual matrix A.) It is reasonable to assume (and is conﬁrmed with real data) that ratings matrices like A have (approximately) low rank. . j) ∈ K. You can allude to the fact that you are given only around one sixth of the entries of A. j) ∈ K. i. (b) Use 300 steps of the alternating projections algorithm to ﬁnd a low rank matrix completion estimate for the problem deﬁned in lrmc. The graph is speciﬁed by its node incidence matrix A ∈ Rn×m . Aij = 0 otherwise 127 . (Of course in ˆ applications. where +1 edge j enters node i −1 edge j leaves node i . 
The entries in the kth left singular vector tell us how much the user ratings are inﬂuenced (positively or negatively) by factor k. Set A(k+1) to be the matrix with the given ˜(k) in Frobenius norm.. 300. The p-vector Aknown gives the corresponding known values. Do not use any matlab notation in your answer. Set Aij = ˆ(0) Aknown for (i. and the known matrix entries. and A = µ for (i. (This would allow us to make recommendations. ˆ(0) Initialize your method as follows. .e. . We have access to some of the ratings. Here Aij represents the rating user i gives (or would give) to movie j. This can be interpreted as meaning that a user’s rating is (mostly) determined by a relatively small number of factors. subject to Aij ij = ˆ (Aij i=1 j=1 (k+1) ˜(k) − Aij )2 1/2 This is a heuristic method: It can fail to converge at all. Remark. depending on the starting point.˜ • Project to the closest matrix satisfying the rank constraint. j) for one known entry. ij ij To judge the performance of the algorithm. But it often works well. the dimensions m and n. you would not have access to the matrix A!) Plot A(k) − A F .

sj < 0 just means that we ship the amount |sj | in the direction opposite to the edge orientation. k − 1. ri (which give the rows sums) are given. as are cj (which give the column sums).. 32 14 4 34 12 . as well as a demand di for the commodity. 16.4 Smallest matrix with given row and column sums.e. If your method ˜ j involves matrix inversion (and we’re not saying it must). with m = 6 and n = 8. n. Hint: First characterize N (AT ). . i.e. Here. . The data r ∈ Rm and c ∈ Rn must satisfy 1T r = 1T c. . . the total quantity available equals the total demand. ˜ (a) Ability to match supply and demand. . do so.. . i=1 Aij = cj . . qi ) plus any amount shipped in from neighboring nodes. i. carry out your method for the speciﬁc data 24 20 30 18 16 26 8 . perfect matching of supply and demand at each node. AT 1 = c. j=1 i = 1. You can refer ˜ to any concepts and results from the class. Explain how to ﬁnd a shipment vector s that m achieves q = d.. m.) We assume that i=1 qi = i=1 di . between any nodes there is a path. Explain your method in the general case. . . You can assume m n that i=1 ri = j=1 cj . . If you can give a nice formula for the optimal A. there is no A that can satisfy the constraints. You can use any concepts from the class. (These n n are typically nonnegative. ik = ˜ with an edge between ip and ip+1 . . minus any amount shipped out from node i. ij subject to the constraints n m Aij = ri . if this doesn’t hold. (b) Least-squares matching of supply and demand. Explain how to ﬁnd the matrix A ∈ Rm×n that minimizes m n J= i=1 j=1 A2 . ignoring edge orientation. This can be positive or negative: sj > 0 means that we ship the amount |sj | in the direction of the edge orientation. .) 128 where 1 denotes a vector (of appropriate size in each case) with all components one. you’ll need to justify that the inverses exist. i. (This means that i. After shipment. . the objective can be written as J = Tr(AT A). of nodes i = i1 . . j = 1. 
but this won’t matter here.e.We assume the graph is connected: For any pair of distinct nodes i and ˜ with i = ˜ there is a sequence i. r= c= 22 28 . In addition. the quantity of commodity at node i is equal to the original amount there (i. Using matrix notation. . and minimizes j=1 s2 . We denote the post-shipment quantity at node i as qi . (Entries in A do not have to be integers. . and the constraints are A1 = r. We will ship an amount sj along each edge j. and you must limit your argument to one page. for p = 1.) Each node i has a quantity qi of some commodity. Explain why there always exists a shipment vector s ∈ Rm which results in q = d.
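The two projections described in problem 16.2 can be prototyped in a few lines. The sketch below (Python/numpy standing in for matlab, on made-up synthetic data rather than the lrmc.m data) uses a truncated SVD as one standard way to get the nearest rank-≤ r matrix in Frobenius norm, and simply overwrites the known entries for the other projection:

```python
import numpy as np

def alternating_projections(known_idx, known_vals, shape, r, iters=300):
    """Heuristic low rank matrix completion via alternating projections."""
    m, n = shape
    A_hat = np.full((m, n), known_vals.mean())  # unknowns start at the mean mu
    A_hat[known_idx] = known_vals               # known entries are kept
    for _ in range(iters):
        # project onto {rank <= r}: truncate the SVD
        U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
        A_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]
        # project onto {matrices with the known entries}: overwrite them
        A_hat[known_idx] = known_vals
    return A_hat

# small synthetic instance: a rank-2 matrix with about half the entries known
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
mask = rng.random(A.shape) < 0.5
idx = np.nonzero(mask)
A_est = alternating_projections(idx, A[idx], A.shape, r=2)
rel_err = np.linalg.norm(A_est - A) / np.linalg.norm(A)
print(rel_err)
```

As the problem warns, this is a heuristic; on this small well-conditioned instance it typically drives the error down quickly, but convergence is not guaranteed in general.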

16.5 Condition number. Show that κ(A) = 1 if and only if A is a multiple of an orthogonal matrix. Thus the best conditioned matrices are precisely (scaled) orthogonal matrices.

16.6 Tightness of the condition number sensitivity bound. Suppose A is invertible, Ax = y, and A(x + δx) = y + δy. In the lecture notes we showed that ‖δx‖/‖x‖ ≤ κ(A) ‖δy‖/‖y‖. Show that this bound is not conservative, i.e., there are x, y, δx, and δy such that equality holds. Conclusion: the bound on relative error can be taken on, if the data x is in a particularly unlucky direction and the data error δx is in (another) unlucky direction.

16.7 Sensitivity to errors in A. This problem concerns the relative error incurred in solving a set of linear equations when there are errors in the matrix A (as opposed to error in the data vector b). Suppose A is invertible, Ax = b, and (A + δA)(x + δx) = b. Show that ‖δx‖/‖x + δx‖ ≤ κ(A) ‖δA‖/‖A‖.

16.8 Minimum and maximum RMS gain of an FIR filter. Consider an FIR filter with impulse response

h1 = 1, h2 = 0.6, h3 = 0.2, h4 = −0.2, h5 = −0.1.

We'll consider inputs of length 50 (i.e., input signals that are zero for t > 50), so the output will have length (up to) 54, since the FIR filter has length 5. Let u ∈ R50 denote the input signal, and y ∈ R54 denote the resulting output signal. The RMS gain of the filter for a signal u is defined as

g = ( (1/√54) ‖y‖ ) / ( (1/√50) ‖u‖ ),

which is the ratio of the RMS value of the output to the RMS value of the input. Obviously, the gain g depends on the input signal.

(a) Find the maximum RMS gain gmax of the FIR filter, i.e., the largest value g can have. Plot the corresponding input signal whose RMS gain is gmax.

(b) Find the minimum RMS gain gmin of the FIR filter, i.e., the smallest value g can have. Plot the corresponding input signal whose RMS gain is gmin.

(c) Plot the magnitude of the transfer function of the FIR filter, i.e., |H(e^{jΩ})|, where

H(e^{jΩ}) = Σ_{k=1}^{5} hk e^{−jkΩ}.

Find the maximum and minimum absolute values of the transfer function, and the frequencies at which they are attained. Compare to the results from parts a and b. Hint: To plot the magnitude of the transfer function, you may want to use the freqz matlab command. Make sure you understand what freqz does (using help freqz, for example).

(d) (This part is for fun.) Make a conjecture about the maximum and minimum singular values of a Toeplitz matrix, and the associated left and right singular vectors.

16.9 Detecting linear relations. Suppose we have N measurements y1, . . . , yN of a vector signal x1, . . . , xN ∈ Rn:

yi = xi + di, i = 1, . . . , N.

Here di is some small measurement or sensor noise. We hypothesize that there is a linear relation among the components of the vector signal x, i.e., there is a nonzero vector q such that qᵀxi = 0, i = 1, . . . , N. The geometric interpretation is that all of the vectors xi lie in the hyperplane qᵀx = 0. We will assume that ‖q‖ = 1, which does not affect the linear relation. Even if the xi's do lie in a hyperplane qᵀx = 0, our measurements yi will not; we will have qᵀyi = qᵀdi. These numbers are small, assuming the measurement noise is small. So the problem of determining whether or not there is a linear relation among the components of the vectors xi comes down to finding out whether or not there is a unit-norm vector q such that qᵀyi, i = 1, . . . , N, are all small.

We can view this problem geometrically as well. Assuming that the xi's all lie in the hyperplane qᵀx = 0, and the di's are small, the yi's will all lie close to the hyperplane. Thus a scatter plot of the yi's will reveal a sort of flat cloud, concentrated near the hyperplane qᵀx = 0. Indeed, for any z and ‖q‖ = 1, |qᵀz| is the distance from the vector z to the hyperplane qᵀx = 0. So we seek a vector q, ‖q‖ = 1, such that all the measurements y1, . . . , yN lie close to the hyperplane qᵀx = 0 (that is, qᵀyi are all small). How can we determine if there is such a vector, and what is its value? We define the following normalized measure:

ρ = ( (1/N) Σ_{i=1}^{N} (qᵀyi)² )^{1/2} / ( (1/N) Σ_{i=1}^{N} ‖yi‖² )^{1/2}.

This measure is simply the ratio between the root mean square distance of the vectors to the hyperplane qᵀx = 0 and the root mean square length of the vectors. If ρ is small, it means that the measurements lie close to the hyperplane qᵀx = 0. Obviously, ρ depends on q. Here is the problem: explain how to find the minimum value of ρ over all unit-norm vectors q, and the unit-norm vector q that achieves this minimum, given the data set y1, . . . , yN.

16.10 Stability conditions for the distributed congestion control scheme. We consider the congestion control scheme in problem 3, and will use the notation from that problem. In this problem, we study the dynamics and convergence properties of the rate adjustment scheme. To simplify things, we'll assume that the route matrix R is skinny and full rank. You can also assume that α > 0. Let

x̄ls = (RᵀR)⁻¹Rᵀ T^target

denote the least-squares approximate solution of the (over-determined) equations Rx ≈ T^target. (The source rates given by x̄ls minimize the sum of the squares of the congestion measures on all paths.)

(a) Find the conditions on the update parameter α under which the rate adjustment scheme converges to x̄ls, no matter what the initial source rate is.

(b) Find the value of α that results in the fastest possible asymptotic convergence of the rate adjustment algorithm. Find the associated asymptotic convergence rate. We define the convergence rate as the smallest number c for which ‖x(t) − x̄ls‖ ≤ a cᵗ holds for all trajectories and all t (the constant a can depend on the trajectory).

You can give your solutions in terms of any of the concepts we have studied, e.g., matrix exponential, eigenvalues, singular values, condition number, and so on. Your answers can, of course, depend on R, T^target, and x̄ls. If your answer doesn't depend on some of these (or even all of them) be sure to point this out. We'll take points off if you give a solution that is correct, but not as simple as it can be.
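For problem 16.9's measure ρ, note that if the yi are stacked as the columns of Y ∈ Rn×N, then ρ(q)² = qᵀYYᵀq / ‖Y‖F². One way to explore the measure numerically is sketched below, on synthetic near-planar data (all data here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 200
q_true = np.array([1.0, -1.0, 2.0, 0.5])
q_true /= np.linalg.norm(q_true)

# points lying in the hyperplane q_true^T x = 0, plus small noise
X = rng.standard_normal((n, N))
X -= np.outer(q_true, q_true @ X)           # project onto the hyperplane
Y = X + 0.01 * rng.standard_normal((n, N))  # noisy measurements y_i (columns)

# rho(q)^2 = q^T Y Y^T q / ||Y||_F^2, so over unit-norm q it is minimized
# by the left singular vector of Y associated with the smallest singular value
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
q = U[:, -1]
rho_min = s[-1] / np.linalg.norm(Y, 'fro')
print(rho_min)
```

With noise this small, ρ_min comes out tiny and the recovered q aligns (up to sign) with the hyperplane normal used to generate the data.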
16.11 Consider the system ẋ = Ax with

A = [  0.3132   0.3566   0.2545   0.2579   0.2063
      −0.0897   0.2913   0.1888   0.4392   0.1470
       0.0845   0.2433  −0.5888  −0.0407   0.1744
       0.2478  −0.1875   0.2233   0.3126  −0.6711
       0.1744   0.2315  −0.1004  −0.2111   0.0428 ].

(a) Find the initial state x(0) ∈ R5 satisfying ‖x(0)‖ = 1 such that ‖x(3)‖ is maximum. In other words, find an initial condition of unit norm that produces the largest state at t = 3.

(b) Find the initial state x(0) ∈ R5 satisfying ‖x(0)‖ = 1 such that ‖x(3)‖ is minimum.

To save you the trouble of typing in the matrix A, you can find it on the web page in the file max_min_init_state.m.
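Since x(3) = e^{3A} x(0), questions like 16.11 come down to the SVD of e^{3A}: over unit-norm initial conditions, ‖x(3)‖ ranges between the smallest and largest singular values of e^{3A}, attained at the corresponding right singular vectors. A numpy sketch with a random stand-in matrix (the actual A is in max_min_init_state.m):

```python
import numpy as np

rng = np.random.default_rng(2)
A = 0.3 * rng.standard_normal((5, 5))  # stand-in for the matrix in the problem

# matrix exponential M = e^{3A} via its power series (adequate for this small example)
M = np.eye(5)
term = np.eye(5)
for k in range(1, 60):
    term = term @ (3 * A) / k
    M = M + term

U, s, Vt = np.linalg.svd(M)
x0_max, x0_min = Vt[0], Vt[-1]         # unit-norm initial conditions
print(np.linalg.norm(M @ x0_max))      # the largest achievable ||x(3)||
print(np.linalg.norm(M @ x0_min))      # the smallest achievable ||x(3)||
```

The two printed norms match the largest and smallest singular values of e^{3A}, which is exactly the gain interpretation of the SVD.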

16.12 Regularization and SVD. Let A ∈ Rn×n be full rank, with SVD

A = Σ_{i=1}^{n} σi ui viᵀ.

(We consider the square, full rank case just for simplicity; it's not too hard to consider the general nonsquare, non-full rank case.) Recall that the regularized approximate solution of Ax = y is defined as the vector xreg ∈ Rn that minimizes the function

‖Ax − y‖² + µ‖x‖²,

where µ > 0 is the regularization parameter. The regularized solution is a linear function of y, so it can be expressed as xreg = By where B ∈ Rn×n.

(a) Express the SVD of B in terms of the SVD of A. To be more specific, let

B = Σ_{i=1}^{n} σ̃i ũi ṽiᵀ

denote the SVD of B. Express σ̃i, ũi, ṽi, for i = 1, . . . , n, in terms of σi, ui, vi, i = 1, . . . , n (and, possibly, µ). Recall the convention that σ̃1 ≥ · · · ≥ σ̃n.

(b) Find the norm of B. Give your answer in terms of the SVD of A (and µ).

(c) Find the worst-case relative inversion error, defined as

max_{y ≠ 0} ‖ABy − y‖ / ‖y‖.

Give your answer in terms of the SVD of A (and µ).

16.13 Optimal binary signalling. We consider a communication system given by

y(t) = Au(t) + v(t), t = 0, 1, . . . .

Here

• u(t) ∈ Rn is the transmitted (vector) signal at time t
• y(t) ∈ Rm is the received (vector) signal at time t
• v(t) ∈ Rm is noise at time t
• t = 0, 1, . . . is the (discrete) time

Note that the system has no memory: y(t) depends only on u(t). For the noise, we assume that ‖v(t)‖ < Vmax. Other than this maximum value for the norm, we know nothing about the noise (for example, we do not assume it is random). We consider binary signalling, which means that at each time t, the transmitter sends one of two signals, i.e., we have either u(t) = s1 ∈ Rn or u(t) = s2 ∈ Rn. The receiver then guesses which of the two signals was sent, based on y(t). The process of guessing which signal was sent, based on the received signal y(t), is called decoding. In this problem we are only interested in the case when the communication is completely reliable, which means that the receiver's estimate of which signal was sent is always correct, no matter what v(t) is (provided ‖v(t)‖ < Vmax, of course). Intuition suggests that this is possible when Vmax is small enough.

(a) Your job is to design the signal constellation, i.e., the vectors s1 ∈ Rn and s2 ∈ Rn, and the associated (reliable) decoding algorithm used by the receiver. Your signal constellation should minimize the maximum transmitter power, i.e., Pmax = max{‖s1‖, ‖s2‖}. You must describe:

• your analysis of this problem,
• how you come up with s1 and s2,
• the exact decoding algorithm used,
• how you know that the decoding algorithm is reliable, i.e., the receiver's guess of which signal was sent is always correct.

(b) The file opt_bin_data.m contains the matrix A and the scalar Vmax. Using your findings from part 1, determine the optimal signal constellation.

16.14 Some input optimization problems. In this problem we consider the system x(t + 1) = Ax(t) + Bu(t), with

A = [ 1 1 1 1
      0 0 1 0
      0 0 0 1
      1 1 0 0 ],   B = [ 0 1
                         0 0
                         1 0
                         0 1 ],   x(0) = ( 0, −1, 0, 0 ).

(a) Least-norm input to steer state to zero in minimum time. Find the minimum T, Tmin, such that x(T) = 0 is possible. Among all (u(0), u(1), . . . , u(Tmin − 1)) that steer x(0) to x(Tmin) = 0, find the one of minimum norm, i.e., the one that minimizes

JTmin = ‖u(0)‖² + · · · + ‖u(Tmin − 1)‖².

Give the minimum value of JTmin achieved.

(b) Least-norm input to achieve ‖x(10)‖ ≤ 0.1. In lecture we worked out the least-norm input that drives the state exactly to zero at t = 10. Suppose instead we only require the state to be small at t = 10, for example, ‖x(10)‖ ≤ 0.1. Find u(0), u(1), . . . , u(9) that minimize

J9 = ‖u(0)‖² + · · · + ‖u(9)‖²

subject to the condition ‖x(10)‖ ≤ 0.1. Give the value of J9 achieved by your input.

16.15 Determining the number of signal sources. The signal transmitted by n sources is measured at m receivers. The signal transmitted by each of the sources at sampling period k, for k = 1, . . . , p, is denoted by an n-vector x(k) ∈ Rn. The gain from the j-th source to the i-th receiver is denoted by aij ∈ R. The signal measured at the receivers is then

y(k) = A x(k) + v(k), k = 1, . . . , p,

where v(k) ∈ Rm is a vector of sensor noises, and A ∈ Rm×n is the matrix of source to receiver gains. However, we do not know the gains aij, nor the transmitted signal x(k), nor even the number of sources present n. We only have the following additional a priori information:

• We expect the number of sources to be less than the number of receivers (i.e., n < m, so that A is skinny);
• A is full-rank and well-conditioned;
• All sources have roughly the same average power, the signal x(k) is unpredictable, and the source signals are unrelated to each other; hence, given enough samples (i.e., p large) the vectors x(k) will 'point in all directions';
• The sensor noise v(k) is small relative to the received signal A x(k).

Here's the question:

(a) You are given a large number of vectors of sensor measurements y(k) ∈ Rm, k = 1, . . . , p. How would you estimate the number of sources, n? Be sure to clearly describe your proposed method for determining n, and to explain when and why it works.

(b) Try your method on the signals given in the file nsources.m. Running this script will define the variables:

• m, the number of receivers;
• p, the number of signal samples;
• Y, the receiver sensor measurements, an array of size m by p (the k-th column of Y is y(k).)

What can you say about the number of signal sources present?
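The reasoning behind problem 16.15 can be checked on synthetic data: with n < m, small noise, and source signals that 'point in all directions', Y has n singular values much larger than the remaining m − n, which sit near the noise floor. A sketch (the sizes and noise level below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 8, 3, 1000
A = rng.standard_normal((m, n))         # unknown source-to-receiver gains
X = rng.standard_normal((n, p))         # unrelated, roughly unit-power sources
V = 0.01 * rng.standard_normal((m, p))  # small sensor noise
Y = A @ X + V

s = np.linalg.svd(Y, compute_uv=False)
# estimate n as the number of singular values well above the noise floor
n_est = int(np.sum(s > 10 * s[-1]))
print(s / s[0])
print(n_est)
```

The threshold 10·s[-1] is an arbitrary illustration of "well above the noise floor"; in practice one looks for a clear gap in the singular value plot.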

Note: Our problem description and assumptions are not precise. An important part of this problem is to explain your method, and clarify the assumptions.

16.16 The EE263 search engine. In this problem we examine how linear algebra and low-rank approximations can be used to find matches to a search query in a set of documents. Let's assume we have four documents: A, B, C, and D. We want to search these documents for three terms: piano, violin, and drum. We know that: in A, the word piano appears 4 times, violin 3 times, and drum 1 time; in B, the word piano appears 6 times, violin 1 time, and drum 0 times; in C, the word piano appears 7 times, violin 4 times, and drum 39 times; and in D, the word piano appears 0 times, violin 0 times, and drum 5 times. We can tabulate this as follows:

          A   B   C   D
piano     4   6   7   0
violin    3   1   4   0
drum      1   0  39   5

This information is used to form a term-by-document matrix A, where Aij specifies the frequency of the ith term in the jth document, i.e.,

A = [ 4  6   7  0
      3  1   4  0
      1  0  39  5 ].

Now let q be a query vector, with a non-zero entry for each term. The query vector expresses a criterion by which to select a document. Typically, q will have 1 in the entries corresponding to the words we want to search for, and 0 in all other entries (but other weighting schemes are possible.) A simple measure of how relevant document j is to the query is given by the inner product of the jth column of A with q: ajᵀq. However, this criterion is biased towards large documents. For instance, a query for piano (q = [1 0 0]ᵀ) by this criterion would return document C as most relevant, even though document B (and even A) is probably much more relevant. For this reason, we use the inner product normalized by the norm of the vectors,

ajᵀq / ( ‖aj‖ ‖q‖ ).

Note that our criterion for measuring how well a document matches the query is now the cosine of the angle between the document and query vectors. Since all entries are non-negative, the cosine is in [0, 1] (and the angle is in [0, π/2].) Define Ã and q̃ as normalized versions of A and q (A is normalized column-wise, i.e., each column is divided by its norm.) Then,

c = Ãᵀ q̃

is a column vector that gives a measure of the relevance of each document to the query.

And now, the question. In the file term_by_doc.m you are given m search terms, n documents, and the corresponding term-by-document matrix A ∈ Rm×n. (They were obtained randomly from Stanford's Portfolio collection of internal documents.) The variables term and document are lists of strings. The string term{i} contains the ith word. Each document is specified by its URL, i.e., if you point your web browser to the URL specified by the string document{j} you can inspect the contents of the jth document. The matrix entry A(i,j) specifies how many times term i appears in document j.

(a) Compute Ã, the normalized term-by-document matrix. Compute and plot the singular values of Ã.

(b) Perform a query for the word students (i = 53) on Ã. What are the 5 top results?

(c) We will now consider low-rank approximations of Ã, that is

Âr = argmin_{Â, rank(Â) ≤ r} ‖Ã − Â‖.

Compute Â32, Â16, Â8, and Â4. Perform a query for the word students on these matrices. Comment on the results.

(d) Are there advantages of using low-rank approximations over using the full-rank matrix? (You can assume that a very large number of searches will be performed before the term-by-document matrix is updated.)

Note: Variations and extensions of this idea are actually used in commercial search engines (although the details are closely guarded secrets . . . ) Issues in real search engines include the fact that m and n are enormous and change with time. These methods are very interesting because they can recover documents that don't include the term searched for. For example, a search for automobile could retrieve a document with no mention of automobile, but many references to cars (can you give a justification for this?) For this reason, this approach is sometimes called latent semantic indexing.

Matlab hints: You may find the command sort useful. It sorts a vector in ascending order, and can also return a vector with the original indexes of the sorted elements. Here's a sample code that sorts the vector c in descending order, and prints the URL of the top document and its corresponding cj.

[c,j]=sort(-c); c=-c;
disp(document{j(1)})
disp(c(1))

16.17 Condition number and angles between columns. Suppose A ∈ Rn×n has columns a1, . . . , an ∈ Rn, each of which has unit norm:

A = [a1 a2 · · · an],   ‖ai‖ = 1, i = 1, . . . , n.

Suppose that two of the columns have an angle less than 10° between them, i.e., akᵀal ≥ cos 10°. Show that κ(A) ≥ 10, where κ denotes the condition number. (If A is singular, we take κ(A) = ∞, and so κ(A) ≥ 10 holds.) Interpretation: If the columns were orthogonal, i.e., ∠(ai, aj) = 90° for i ≠ j, i, j = 1, . . . , n, then A would be an orthogonal matrix, and therefore its condition number would be one (which is the smallest a condition number can be). At the other extreme, if two columns are the same (i.e., have zero angle between them), the matrix is singular, and the condition number is infinite. Intuition suggests that if some pair of columns has a small angle, such as 10°, then the condition number must be large. (Although in many applications, a condition number of 10 is not considered large.)

16.18 Analysis and optimization of a communication network. A communication network is modeled as a set of m directed links connecting nodes. There are n routes in the network. A route is a path, along one or more links in the network, from a source node to a destination node. In this problem, the routes are fixed, and are described by an m × n route-link matrix A, defined as

Aij = 1 if route j passes through link i, and Aij = 0 otherwise.

Over each route we have a nonnegative flow, measured in (say) bits per second. We denote the flow along route j as fj, and we call f ∈ Rn the flow vector. The traffic on a link i, denoted ti, is the sum of the flows on all routes passing through link i. The vector t ∈ Rm is called the traffic vector.

Each link has an associated nonnegative delay, measured in (say) seconds. We denote the delay for link i as di, and refer to d ∈ Rm as the link delay vector. The latency on a route j, denoted lj, is the sum of the delays along each link constituting the route, i.e., the time it takes for bits entering the source to emerge at the destination. The vector l ∈ Rn is the route latency vector. The total number of bits in the network at an instant in time is given by B = fᵀl = tᵀd.

(a) Worst-case flows and delays. Suppose the flows and link delays satisfy

(1/n) Σ_{j=1}^{n} fj² ≤ F²,    (1/m) Σ_{i=1}^{m} di² ≤ D²,

where F and D are given. What is the maximum possible number of bits in the network? What values of f and d achieve this maximum value? (For this problem you can ignore the constraint that the ﬂows and delays must be nonnegative. It turns out, however, that the worst-case ﬂows and delays can always be chosen to be nonnegative.) (b) Utility maximization. For a ﬂow fj , the network operator derives income at a rate pj fj , where pj is the price per unit ﬂow on route j. The network operator’s total rate of income is thus n j=1 pj fj . (The route prices are known and positive.) The network operator is charged at a rate ci ti for having traﬃc ti on link i, where ci is the cost m per unit of traﬃc on link i. The total charge rate for link traﬃc is i=1 ti ci . (The link costs are known and positive.) The net income rate (or utility) to the network operator is therefore

Unet = Σ_{j=1}^{n} pj fj − Σ_{i=1}^{m} ci ti.

Find the flow vector f that maximizes the operator's net income rate, subject to the constraint that each fj is between 0 and Fmax, where Fmax is a given positive maximum flow value.

16.19 A heuristic for MAXCUT. Consider a graph with n nodes and m edges, with the nodes labeled 1, . . . , n and the edges labeled 1, . . . , m. We partition the nodes into two groups, B and C, i.e., B ∩ C = ∅, B ∪ C = {1, . . . , n}. We define the number of cuts associated with this partition as the number of edges between pairs of nodes when one of the nodes is in B and the other is in C. A famous problem, called the MAXCUT problem, involves choosing a partition (i.e., B and C) that maximizes the number of cuts for a given graph. For any partition, the number of cuts can be no more than m. If the number of cuts is m, nodes in group B connect only to nodes in group C and the graph is bipartite.

The MAXCUT problem has many applications. We describe one here, although you do not need it to solve this problem. Suppose we have a communication system that operates with a two-phase clock. During periods t = 0, 2, 4, . . ., each node in group B transmits data to nodes in group C that it is connected to; during periods t = 1, 3, 5, . . ., each node in group C transmits to the nodes in group B that it is connected to. The number of cuts, then, is exactly the number of successful transmissions that can occur in a two-period cycle. The MAXCUT problem is to assign nodes to the two groups so as to maximize the overall efficiency of communication.

It turns out that the MAXCUT problem is hard to solve exactly, at least if we don't want to resort to an exhaustive search over all, or most of, the 2^{n−1} possible partitions. In this problem we explore a sophisticated heuristic method for finding a good (if not the best) partition in a way that scales to large graphs.

We will encode the partition as a vector x ∈ Rn, with xi ∈ {−1, 1}. The associated partition has xi = 1 for i ∈ B and xi = −1 for i ∈ C. We describe the graph by its node adjacency matrix A ∈ Rn×n, with Aij = 1 if there is an edge between node i and node j, and Aij = 0 otherwise. Note that A is symmetric and Aii = 0 (since we do not have self-loops in our graph).

(a) Find a symmetric matrix P, with Pii = 0 for i = 1, . . . , n, and a constant d, for which xᵀPx + d is the number of cuts encoded by any partitioning vector x. Explain how to calculate P and d from A. Of course, P and d cannot depend on x. The MAXCUT problem can now be stated as the optimization problem

maximize   xᵀPx + d
subject to xi² = 1, i = 1, . . . , n,

with variable x ∈ Rn.

(b) A famous heuristic for approximately solving MAXCUT is to replace the n constraints xi² = 1, i = 1, . . . , n, with a single constraint Σ_{i=1}^{n} xi² = n, creating the so-called relaxed problem

maximize   xᵀPx + d
subject to Σ_{i=1}^{n} xi² = n,

with variable x ∈ Rn. Explain how to solve this relaxed problem (even if you could not solve part (a)). Let x⋆ be a solution to the relaxed problem. We generate our candidate partition with x̂i = sign(xi⋆). (This means that x̂i = 1 if xi⋆ ≥ 0, and x̂i = −1 if xi⋆ < 0.)

Remark: We can give a geometric interpretation of the relaxed problem, which will also explain why it's called relaxed. The constraints in the problem in part (a), that xi² = 1, require x to lie on the vertices of the unit hypercube. In the relaxed problem, the constraint set is the sphere of radius √n. Because this constraint set is larger than the original constraint set (i.e., it includes it), we say the constraints have been relaxed.

(c) Run the MAXCUT heuristic described in part (b) on the data given in mc_data.m. How many cuts does your partition yield?

A simple alternative to MAXCUT is to generate a large number of random partitions, using the random partition that maximizes the number of cuts as an approximate solution. Carry out this method with 1000 random partitions generated by x = sign(rand(n,1)-0.5). What is the largest number of cuts obtained by these random partitions?

Note: There are many other heuristics for approximately solving the MAXCUT problem. However, we are not interested in them. In particular, please do not submit any other method for approximately solving MAXCUT.

16.20 Simultaneously estimating student ability and exercise difficulty. Each of n students takes an exam that contains m questions. Student j receives (nonnegative) grade Gij on question i. One simple model for predicting the grades is to estimate Gij ≈ Ĝij = aj/di, where aj is a (nonnegative) number that gives the ability of student j, and di is a (positive) number that gives the difficulty of exam question i.

Given a particular model, we judge it by the RMS error

J = ( (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} (Gij − Ĝij)² )^{1/2}.

This can be compared to the RMS value of the grades,

( (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} Gij² )^{1/2}.

Note that we could simultaneously scale the student abilities and the exam difficulties by any positive number, without affecting Ĝij. Thus, to ensure a unique model, we will normalize the exam question difficulties di, so that the mean exam question difficulty across the m questions is 1.

In this problem, you are given a complete set of grades (i.e., the matrix G ∈ Rm×n). Your task is to find a set of nonnegative student abilities, and a set of positive, normalized question difficulties, so that Gij ≈ Ĝij. In particular, choose your model to minimize the RMS error, J.

(a) Explain how to solve this problem, using any concepts from EE263, e.g., least-squares, QR factorization, norm, SVD, Jordan form, pseudo-inverse, etc. If your method is approximate, or not guaranteed to find the global minimum value of J, say so. If carrying out your method requires some rank or other conditions to hold, say so. You can just assume this works out, or is easily corrected.

(b) Carry out your method on the data found in grademodeldata.m. Give the optimal value of J, and also express it as a fraction of the RMS value of the grades. Give the difficulties of the 7 problems on the exam.

16.21 Angle between two subspaces. The angle between two nonzero vectors v and w in Rn is defined as

∠(v, w) = cos⁻¹( vᵀw / (‖v‖ ‖w‖) ),

where we take cos⁻¹(a) as being between 0 and π. We define the angle between a nonzero vector v ∈ Rn and a (nonzero) subspace W ⊆ Rn as

∠(v, W) = min_{w ∈ W, w ≠ 0} ∠(v, w).

Thus, ∠(v, W) = 10° means that the smallest angle between v and any vector in W is 10°. If v ∈ W, we have ∠(v, W) = 0.

Finally, we define the angle between two nonzero subspaces V and W as

∠(V, W) = max{ max_{v ∈ V, v ≠ 0} ∠(v, W),  max_{w ∈ W, w ≠ 0} ∠(w, V) }.

This angle is zero if and only if the two subspaces are equal. Very roughly, if ∠(V, W) = 10°, it means that either there is a vector in V whose minimum angle to any vector of W is 10°, or there is a vector in W whose minimum angle to any vector of V is 10°.

(a) Suppose you are given two matrices A ∈ Rn×r, B ∈ Rn×r, each of rank r. Let V = range(A) and W = range(B). Explain how you could find or compute ∠(V, W). You can use any of the concepts in the class, e.g., least-squares, QR factorization, norm, SVD, Jordan form, pseudo-inverse, etc.

(b) Carry out your method for the matrices found in angsubdata.m. Give the numerical value for ∠(range(A), range(B)).

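A standard way to carry out the computation in exercise 16.21, when the two subspaces have equal dimension, is via principal angles: with orthonormal bases Qa and Qb, the singular values of QaTQb are the cosines of the principal angles, and the largest principal angle matches the definition above. The sketch below is in Python/NumPy rather than matlab, and the matrices are illustrative stand-ins for the angsubdata.m data.

```python
import numpy as np

def subspace_angle(A, B):
    """Largest principal angle (radians) between range(A) and range(B),
    assuming A and B have full column rank and the same number of columns."""
    Qa, _ = np.linalg.qr(A)          # orthonormal basis for range(A)
    Qb, _ = np.linalg.qr(B)          # orthonormal basis for range(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)        # guard against roundoff outside [-1, 1]
    return np.arccos(s.min())        # smallest sigma gives the largest angle

# Illustrative data (not angsubdata.m): V = span{e1, e2},
# W = span{e1, (e2 + e3)/sqrt(2)}; the angle between them is 45 degrees.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(np.degrees(subspace_angle(A, B)))   # -> 45.0 (up to roundoff)
```

For the equal-dimension case the max over both directions in the definition reduces to arccos of the smallest singular value; for subspaces of unequal dimension the two directions have to be examined separately.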
16.22 Extracting the faintest signal. An n-vector valued signal, x(t) ∈ Rn, is defined for t = 1, . . . , T. We'll refer to its ith component, xi(t), t = 1, . . . , T, as the ith scalar signal. The scalar signals x1, . . . , xn−1 have an RMS value substantially larger than xn. In other words, xn is the faintest scalar signal. It is also the signal of interest for this problem. We will assume that the scalar signals x1, . . . , xn are unrelated to each other, and so are nearly uncorrelated (i.e., nearly orthogonal).

We aren't given the vector signal x(t), but we are given a linear transformation of it,

    y(t) = Ax(t),    t = 1, . . . , T,

where A ∈ Rn×n is invertible. If we knew A, we could easily recover the original signal (and therefore also the faintest scalar signal xn(t)), using x(t) = A−1y(t), t = 1, . . . , T. But, sadly, we don't know A.

Here is a heuristic method for guessing xn(t). We will form our estimate as

    x̂n(t) = wTy(t),    t = 1, . . . , T,

where w ∈ Rn is a vector of weights. Note that if w were chosen so that wTA = αenT, with α ≠ 0 a constant, then we would have x̂n(t) = αxn(t), t = 1, . . . , T, i.e., a perfect reconstruction except for the scale factor α. Now, the important part of our heuristic: we choose w to minimize the RMS value of x̂n, subject to ‖w‖ = 1.

The following is not needed to solve the problem. Very roughly, one idea behind the heuristic is that, in general, wTy is a linear combination of the scalar signals x1, . . . , xn. If the linear combination has a small norm, that's because the linear combination is 'rich in xn', and has only a small amount of energy contributed by x1, . . . , xn−1. That, in fact, is exactly what we want. In any case, you don't need to worry about why the heuristic works (or doesn't work)—it's the method you are going to use in this problem.

(a) Explain how to find a w that minimizes the RMS value of x̂n, subject to ‖w‖ = 1.

(b) Carry out your method on the problem instance with n = 4, T = 26000, described in the matlab file faintestdata.m. This file will define an n × T matrix Y, where the tth column of Y is the vector y(t). The file will also define n and T. Submit your code, and give us the optimal weight vector w ∈ R4 you find, along with the associated RMS value of x̂n.

The signals are actually audio tracks, each 3.25 seconds long and sampled at 8 kHz. The matlab file faintestaudio.m contains commands to generate wave files of the linear combinations y1, . . . , y4, and a wave file of your estimate x̂n. You are welcome to generate and listen to these files.

16.23 One of these vectors doesn't fit. The file one_of_these_data.m contains an n × m matrix X, whose columns we denote as x(1), . . . , x(m) ∈ Rn. The columns are (vector) data collected in some application. The ordering of the vectors isn't relevant; in other words, permuting the columns would make no difference. One of the vectors doesn't fit with the others. Find the index of the vector that doesn't fit.

Carefully explain your method, using concepts from the class (e.g., range, rank, QR factorization, least-squares, eigenvalues, singular values), and especially, in what way the vector you've chosen doesn't fit with the others. Your explanation can be algebraic, or geometric (or both), but it should be simple to state, and involve ideas and methods from this course. Since the question is vague, clarity in your explanation of your method and approach is very important. In particular, we want a nice, short explanation. We will not read a long, complicated, or rambling explanation.

16.24 Extracting RC values from delay data. We consider a CMOS digital gate that drives a load consisting of interconnect wires that connect to the inputs of other gates. To find the delay of the gate plus its load, we have to solve a complex, nonlinear ordinary differential equation that takes into account

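The minimization in exercise 16.22(a) is of ‖YTw‖ over unit-norm w, which is attained at a left singular vector of Y associated with its smallest singular value. A sketch in Python/NumPy on synthetic data (not the faintestdata.m instance):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 2000

# Synthetic stand-in for faintestdata.m: x1..x3 are loud, x4 is faint.
X = rng.standard_normal((n, T))
X[3] *= 0.01                        # the faintest scalar signal
A = rng.standard_normal((n, n))     # unknown invertible mixing matrix
Y = A @ X                           # what we actually observe

# Heuristic: the unit-norm w minimizing the RMS of w^T y(t) minimizes
# ||Y^T w||, so it is the left singular vector of Y for sigma_min.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
w = U[:, -1]
xhat = w @ Y

rms = np.sqrt(np.mean(xhat**2))
print(rms, s[-1] / np.sqrt(T))      # the two agree: RMS = sigma_min / sqrt(T)
```

The identity in the last comment follows since ‖YTw‖ equals the smallest singular value at the minimizer, and the RMS value is that norm divided by √T.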
circuit nonlinearities, parasitic capacitances, and so on. This can be done using a circuit simulator such as SPICE. A very simple approximation of the delay can be obtained by modeling the gate as a simple resistor with resistance R, and the load as a simple capacitor with capacitance C. In this simple model, the delay of the gate has the form ηRC, where η is a constant that depends on the precise definition of delay used. (One common choice is η = 0.69, which is based on the time it takes the voltage of a simple RC circuit to decay to 1/2 its original value. For simplicity we'll take η = 1 in our delay model.) This simple RC delay model can be used for design, or approximate (but very fast) analysis.

We address the question of determining a value of R for each of a set of gates, and a value of C for each of a set of loads, given accurate delay data (obtained by simulation) for each combination of a gate driving a load. We have n digital gates labeled 1, . . . , n, and m loads labeled 1, . . . , m. By simulation, we obtain the (accurate) delay Dij for gate j driving load i. (D is given as an m × n matrix.) The goal is to find positive numbers R1, . . . , Rn and C1, . . . , Cm so that Dij ≈ RjCi. (In this problem, we are extracting the gate resistances Rj and the load capacitances Ci.) Finding good values of parameters for a simple model, given accurate data, is sometimes called parameter extraction.

If we scale all resistances by a constant α > 0, and scale all capacitances by 1/α, the approximate delays RjCi don't change. To remove this ambiguity, we will fix C1 = 1, i.e., we will take the first load as our 'unit load'. Finally we get to the problem.

(a) Minimum mean-square absolute error. Explain how to find R1^msa, . . . , Rn^msa and C1^msa, . . . , Cm^msa (positive, with C1^msa = 1) that minimize the mean-square absolute error,

    Emsa = (1/(mn)) Σ_{i=1..m} Σ_{j=1..n} (Dij − Rj^msa Ci^msa)².

(b) Minimum mean-square logarithmic error. Explain how to find R1^msl, . . . , Rn^msl and C1^msl, . . . , Cm^msl (positive, with C1^msl = 1) that minimize the mean-square logarithmic error,

    Emsl = (1/(mn)) Σ_{i=1..m} Σ_{j=1..n} (log Dij − log(Rj^msl Ci^msl))².

(The logarithm here is base e, but it doesn't really matter.)

(c) Find R1^msa, . . . , Rn^msa and C1^msa, . . . , Cm^msa, as well as R1^msl, . . . , Rn^msl and C1^msl, . . . , Cm^msl, for the particular delay data given in rcextract_data.m from the course web site. Submit the matlab code used to calculate these values, as well as the values themselves. Also write down your minimum Emsa and Emsl values.

Please note the following:

• You do not need to know anything about digital circuits to solve this problem.

• In this problem we are more interested in your approach and method than the final numerical results. We will take points off if your method is not the best possible one, even if your answer is numerically close to the correct one.

• The two criteria (absolute and logarithmic) are clearly close, so we expect the extracted Rs and Cs to be similar in the two cases. But they are not the same.

16.25 Some attributes of a stable system. This problem concerns the autonomous linear dynamical system ẋ = Ax, with x(t) ∈ Rn, which we assume is stable (i.e., all trajectories x(t) converge to zero as t → ∞).

• Peaking factor. We define the peaking factor of the system as the largest possible value of ‖x(t + τ)‖/‖x(t)‖, for any nonzero trajectory x, any t, and any τ ≥ 0.

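For the logarithmic criterion of exercise 16.24(b), taking logs turns the model into a linear one, log Dij ≈ log Rj + log Ci, with log C1 = 0 fixed, so Emsl is minimized by ordinary least-squares in the log variables. A Python/NumPy sketch on made-up exact data (not rcextract_data.m); note that the absolute criterion of part (a) does not linearize this way.

```python
import numpy as np

# Illustrative exact data (not rcextract_data.m): Dij = Rj * Ci.
R_true = np.array([2.0, 0.5, 3.0])        # n = 3 gates
C_true = np.array([1.0, 4.0])             # m = 2 loads, C1 fixed to 1
D = np.outer(C_true, R_true)              # D[i, j] = R_true[j] * C_true[i]

m, n = D.shape
# Unknowns: log R_1..R_n and log C_2..C_m (log C_1 = 0 removes the
# scaling ambiguity). Each (i, j) pair gives one linear equation.
rows, rhs = [], []
for i in range(m):
    for j in range(n):
        a = np.zeros(n + m - 1)
        a[j] = 1.0                         # coefficient of log R_j
        if i > 0:
            a[n + i - 1] = 1.0             # coefficient of log C_i
        rows.append(a)
        rhs.append(np.log(D[i, j]))
z, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)

R = np.exp(z[:n])
C = np.concatenate(([1.0], np.exp(z[n:])))
print(R, C)    # recovers R_true and C_true exactly for exact data
```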
• Halving time. We define the halving time of the system as the smallest τ ≥ 0 for which ‖x(t + τ)‖ ≤ ‖x(t)‖/2 always holds, for all trajectories.

• Minimum decorrelation time. We define the minimum decorrelation time as the smallest possible τ ≥ 0 for which x(t + τ) ⊥ x(t) can hold for some (nonzero) trajectory x. This is the smallest possible time the state can rotate 90◦. (If x(t + τ) ⊥ x(t) never occurs for τ ≥ 0, then the minimum decorrelation time is +∞.) We allow +∞ as an answer here.

(a) Explain how to find each of these quantities. Your method can involve some numerical simulation, such as a search over a (fine) grid of values of τ. You can assume that you do not need to search over τ greater than τmax, where τmax is known. We'd like all quantities to an accuracy of around 0.01.

(b) Carry out your method for the specific case with

        −1 −5 0 5 0 0
    A = 0.2 −2 1 0 0
        −0.4 −1 −0.9 −4
        6 1 0 6

with τmax = 10.

16.26 System with level alarms. A linear dynamical system evolves according to

    ẋ(t) = Ax(t),    y(t) = Cx(t),

where x(t) ∈ Rn is the state and y(t) ∈ Rp is the output at time t. The output is monitored using level alarms with thresholds. These tell us when yi(t) ≥ li, where li is the threshold level for output component i. (The threshold levels li are known.) You know A and C, but not x(t) or y(t), except as described below.

You have alarm data over the time interval [0, T], of the following format. For each output component i = 1, . . . , p, you are given the (possibly empty) set of the intervals in [0, T] over which yi(t) ≥ li. The problem is to find an upper bound on how large ‖x(T)‖ can be, while being consistent with the given alarm data. Of course, you must explain your method. (We will deduct points for solutions that give bounds that are correct, but higher than they need to be.) We allow +∞ as an answer here; this means that there are trajectories with arbitrarily large values of ‖x(T)‖ that are consistent with the given alarm data.

We now consider the specific problem with

    A =  0 1 0        C =  1 0 −1
         0 0 −0.           0 1 1
        −6 0

l1 = l2 = 1, T = 10, and alarm intervals given below:

    y1 : [0.0195, 0.0863], [3.0288, 3.9402]
    y2 : [0.4176, 0.9210], [6, 6.9723]

Give your bound on ‖x(T)‖. If it is +∞, explain.

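The quantities in exercise 16.25 can all be expressed through the induced norm of the state-transition matrix: for example the halving time is the smallest τ with ‖eAτ‖ ≤ 1/2, since the largest possible value of ‖x(t + τ)‖/‖x(t)‖ is ‖eAτ‖. A grid-search sketch in Python/NumPy; the matrix exponential here is computed by eigendecomposition, which assumes A is diagonalizable, and A = −I is an illustrative stand-in for the matrix of part (b).

```python
import numpy as np

def expm(A, t):
    """Matrix exponential e^{At} via eigendecomposition (assumes A diagonalizable)."""
    lam, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)).real

def halving_time(A, tau_max=10.0, step=1e-3):
    """Smallest tau with ||e^{A tau}|| <= 1/2 (spectral norm), by grid search."""
    for tau in np.arange(0.0, tau_max, step):
        if np.linalg.svd(expm(A, tau), compute_uv=False)[0] <= 0.5:
            return tau
    return np.inf

# Illustrative stable A (not the matrix of part (b)): A = -I, for which
# ||e^{A tau} x|| = e^{-tau} ||x||, so the halving time is log 2 ~ 0.693.
A = -np.eye(2)
print(halving_time(A))
```

The peaking factor is found the same way, as the maximum of ‖eAτ‖ over the τ grid.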
Lecture 18 – Controllability and state transfer

18.1 This problem has two parts that are mostly independent.

(a) Actuator placement (10 masses).

    [Figure: ten masses m connected in series by springs k and light dampers d; the actuator force u(t) is shown acting on the second mass; the mass positions are y1(t), y2(t), y3(t), . . . , y10(t).]

Ten masses are connected in series by springs and light dampers. The mass positions (deviation from rest) are denoted by y1, . . . , y10. The masses, spring constants, and damping constants are all identical and given by m = 1, k = 1, d = 0.01. An actuator is used to apply a force u(t) to one of the masses. In the figure, the actuator is shown located on the second mass from the left, but it could also have been placed in any of the other nine masses. Use state x = [yT ẏT]T.

i. For which of the ten possible actuator placements is the system controllable?

ii. You are given the following design specification: any state should be reachable without the need for very large actuation forces. Where would you place the actuator? (Since the design specification is somewhat vague, you should clearly explain and justify your decision.)

Note: To avoid error propagation in solutions, use the matlab script spring_series.m, available at the course web site, which constructs the dynamics and input matrices A and B.

(b) Optimal control (4 masses). Consider now a system with the same characteristics, but with only four masses: four unit masses are connected in series by springs and light dampers (with k = 1 and d = 0.01, as before). A force actuator is placed on the third mass from the left. As before, use state x = [yT ẏT]T.

i. Is the system controllable?

ii. You are given the initial state of the system, x(0) = e8 = [0 · · · 0 1]T, i.e., a velocity disturbance in the fourth mass, and asked to drive the state to as close to zero as possible at time tf = 20 (i.e., the velocity disturbance in the fourth mass is to be attenuated as much as possible in 20 seconds). In other words, you are to choose an input u(t), t ∈ [0, tf], that minimizes ‖x(tf)‖². Furthermore, from among all inputs that achieve the minimum ‖x(tf)‖², we want the smallest one, i.e., the one for which the energy

    Eu = ∫_0^{tf} u(t)² dt

is minimized. Your answer should include (i) a plot of the minimizing input uopt(t), (ii) the corresponding energy Eu,min, and (iii) the resulting ‖x(tf)‖². You must explain and justify how you obtained your solution.

Notes:

• We will be happy with an approximate solution, by using, say, an input that is piece-wise constant in small intervals, in which case we suggest you use 100 discretization intervals (or more). You may want to discretize the system.

• You may (or may not) want to use the result

    A ∫_0^{h} eAt B dt = (eAh − I) B.

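For computations like those in 18.1(b), one common route is to discretize and treat the target condition as a linear equation in the stacked input: x(N) = [AN−1B, . . . , AB, B](u(0), . . . , u(N−1)), whose least-norm solution is the minimum-energy input. A Python/NumPy sketch on a hypothetical discretized double integrator, not the spring-mass system of the exercise:

```python
import numpy as np

# Minimum-energy transfer for x(t+1) = A x(t) + B u(t), x(0) = 0:
# x(N) = [A^{N-1}B, ..., AB, B] [u(0); ...; u(N-1)], so the least-energy
# input sequence is the least-norm solution of this linear equation.
# Illustrative discretized double integrator (not the spring-mass system).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
N = 10
x_des = np.array([1.0, 0.0])

Ctrb = np.hstack([np.linalg.matrix_power(A, N - 1 - t) @ B for t in range(N)])
u = np.linalg.pinv(Ctrb) @ x_des          # least-norm input sequence

# Verify by simulation that the input actually steers x(0) = 0 to x_des.
x = np.zeros(2)
for t in range(N):
    x = A @ x + B @ u[t:t+1]
print(x, np.sum(u**2))                    # final state and input energy
```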
18.2 Horizon selection. Consider the (scalar input) system

    x(t + 1) = [ 0.8 0 0 ]        [ 1 ]
               [ 1   0 0 ] x(t) + [ 0 ] u(t),    x(0) = 0.
               [ 0   1 0 ]        [ 0 ]

For N ≥ 3 let EN(z) denote the minimum input energy,

    u(0)² + · · · + u(N − 1)²,

required to reach x(N) = z, i.e., the minimum value of u(0)² + · · · + u(N − 1)² over all inputs for which x(N) = z. Let E∞(z) denote the minimum energy required to reach the state x(N) = z, without fixing the final time N, i.e., E∞(z) = limN→∞ EN(z). Find the minimum value of N such that EN(z) ≤ 1.1E∞(z) for all z. (This is the shortest horizon that requires no more than 10% more control energy than infinite horizon control.) Hint: the matlab command P=dlyap(A,W) computes the solution of the Lyapunov equation APAT + W = P.

18.3 Minimum energy required to steer the state to zero. Consider a controllable discrete-time system

    x(t + 1) = Ax(t) + Bu(t),    x(0) = x0.

Let E(x0) denote the minimum energy required to drive the state to zero, i.e.,

    E(x0) = min { Σ_{τ=0}^{t−1} ‖u(τ)‖²  |  x(t) = 0 }.

An engineer argues as follows: This problem is like the minimum energy reachability problem, but 'turned backwards in time', since here we steer the state from a given state to zero, and in the reachability problem we steer the state from zero to a given state. The system z(t + 1) = A−1z(t) − A−1Bv(t) is the same as the given one, except time is running backwards. Therefore E(x0) is the same as the minimum energy required for z to reach x0 (a formula for which can be found in the lecture notes). Either justify or refute the engineer's statement. You can assume that A is invertible.

18.4 Minimum energy inputs with coasting. We consider the controllable system ẋ = Ax + Bu, x(0) = 0, where A ∈ Rn×n and B ∈ Rn×m. You are to determine an input u that results in x(tf) = xdes, where tf and xdes are given. You are also given ta, where 0 < ta ≤ tf. Roughly speaking, you are allowed to apply a (nonzero) input u during the 'controlled portion' of the trajectory, from t = 0 until t = ta; from t = ta until the final time tf, the system 'coasts' or 'drifts' with u(t) = 0, i.e., we have the constraint that u(t) = 0 for t > ta. Among all u that satisfy these specifications, uln will denote the one that minimizes the 'energy'

    ∫_0^{ta} ‖u(t)‖² dt.

(a) Give an explicit formula for uln(t).

(b) Now suppose that ta is increased (but still less than tf). An engineer asserts that the minimum energy required will decrease. Another engineer disagrees, pointing out that the final time has not changed. Who is right? Justify your answer. (It is possible that neither is right.)

(c) Matlab exercise. Consider the mechanical system on page 11-9 of the notes. Let xdes = [1 0 −1 0 0 0]T and tf = 6. Plot the minimum energy required as a function of ta, for 0 < ta < tf. You can use a simple method to numerically approximate any integrals you encounter. You must explain what you are doing; just submitting some code and a plot is not enough.

18.5 Some True/False questions. By 'True', we mean that the statement holds for all values of the matrices, vectors, dimensions, etc., mentioned in the statement. 'False' means that the statement fails to hold in at least one case.

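Quantities like EN(z) in exercise 18.2 can be computed from the finite controllability Gramian: when WN = Σ_{t<N} AtBBT(AT)t is invertible, the minimum energy to reach x(N) = z from x(0) = 0 is zTWN−1z. A Python/NumPy sketch with a hypothetical stable pair (A, B), not the system of the exercise:

```python
import numpy as np

def min_energy(A, B, z, N):
    """Minimum input energy to reach x(N) = z from x(0) = 0 for
    x(t+1) = A x(t) + B u(t): z^T W_N^{-1} z, where the finite Gramian
    W_N = sum_{t<N} A^t B B^T (A^T)^t is assumed invertible."""
    n = A.shape[0]
    W = np.zeros((n, n))
    M = np.eye(n)                    # M = A^t, updated in the loop
    for _ in range(N):
        W += M @ B @ B.T @ M.T
        M = A @ M
    return float(z @ np.linalg.solve(W, z))

# Hypothetical controllable pair (not the system of exercise 18.2).
A = np.array([[0.5, 1.0], [0.0, 0.5]])
B = np.array([[0.0], [1.0]])
z = np.array([1.0, 1.0])

E = [min_energy(A, B, z, N) for N in range(2, 30)]
print(E[0], E[-1])   # E_N is nonincreasing in N and levels off at E_infinity
```

Since the Gramians are nested (WN+1 ⪰ WN), the sequence EN(z) is nonincreasing, which is the fact exercise 18.2 exploits when comparing EN with E∞.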
(a) Suppose A ∈ Rn×n and p(s) = sn + a1sn−1 + · · · + an is a polynomial of degree n, with leading coefficient one, that satisfies p(A) = 0. Then p is the characteristic polynomial of A.

(b) Suppose x : R+ → Rn is a trajectory of the linear dynamical system ẋ = Ax, which is stable. Then for any t ≥ 0, we have ‖x(t)‖ ≤ ‖x(0)‖.

(c) Let A ∈ Rp×q and let ai ∈ Rp denote the ith column of A. Then we have

    ‖A‖ ≥ max_{i=1,...,q} ‖ai‖.

(d) Suppose the two linear dynamical systems ẋ = Fx and ż = Gz, where F, G ∈ Rn×n, are both stable. Then the linear dynamical system ẇ = (F + G)w is stable.

(e) Suppose P and Q are symmetric n × n matrices, and let {v1, v2, . . . , vn} be a basis for Rn. Then if we have viTPvi ≥ viTQvi for i = 1, . . . , n, we must have P ≥ Q.

(f) Let A ∈ Rn×n, and suppose v ∈ Rn, v ≠ 0, satisfies vTA = λvT, where λ ∈ R. Let x : R+ → Rn be any trajectory of the linear dynamical system ẋ = Ax. Then at least one of the following statements hold:

    • vTx(t) ≥ vTx(0) for all t ≥ 0
    • vTx(t) ≤ vTx(0) for all t ≥ 0

(g) Suppose A ∈ Rp×q is fat (i.e., p ≤ q) and full rank, and B ∈ Rq×r is skinny (i.e., q ≥ r) and full rank. Then AB is full rank.

(h) Suppose A ∈ Rn×n has all eigenvalues equal to zero, and the nullspace of A is the same as the nullspace of A². Then A = 0.

(i) Consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t), where A ∈ Rn×n. Suppose there is an input that steers the state from a particular initial state xinit at time t = 0 to a particular final state xfinal at time t = T, where T > n. Then there is an input that steers the state from xinit at time t = 0 to xfinal at time t = n.

18.6 Alternating input reachability. We consider a linear dynamical system with n states and 2 inputs,

    x(t + 1) = Ax(t) + Bu(t),

where A ∈ Rn×n and B = [b1 b2] ∈ Rn×2. Here x(t) ∈ Rn is the state, and u(t) = (u1(t), u2(t)) ∈ R2 is the input, at time t. We assume that x(0) = 0.

In this problem, we'll refer to an input sequence as a standard input sequence if both inputs can be nonzero at each time t. In contrast, we say that an input sequence u(0), u(1), . . . is an alternating input sequence if u2(t) = 0 for t = 0, 2, 4, . . . , and u1(t) = 0 for t = 1, 3, 5, . . . , i.e.,

    u(0) = [ u1(0) ; 0 ],   u(1) = [ 0 ; u2(1) ],   u(2) = [ u1(2) ; 0 ],   u(3) = [ 0 ; u2(3) ],   . . .

We are given a target state xdes ∈ Rn, and a time horizon N ≥ n.

(a) Suppose we can find an alternating input sequence so that x(2N) = xdes. Can we always find a standard input sequence so that x(N) = xdes? In other words, if we can drive the state to xdes in 2N steps with an alternating input sequence, can we always find an input sequence that uses both inputs at each time step, and drives the state to xdes in N steps?

(b) Is the converse true? Suppose we can find a standard input sequence so that x(N) = xdes. Can we always find an alternating input sequence so that x(2N) = xdes?

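For experimenting with exercise 18.6, note that with an alternating input sequence x(2N) is a linear function of the nonzero input components, so the set of states reachable in 2N alternating steps is the range of a matrix whose columns interleave powers of A applied to b1 and b2. Comparing ranks on examples is a way to build intuition; a rank experiment is evidence, not a proof. A Python/NumPy sketch with a hypothetical system:

```python
import numpy as np

def alt_reach(A, b1, b2, N):
    """Matrix whose range is the set of states reachable in 2N steps
    with an alternating input sequence (u1 acts at even t, u2 at odd t)."""
    cols = []
    for t in range(2 * N):
        b = b1 if t % 2 == 0 else b2
        cols.append(np.linalg.matrix_power(A, 2 * N - 1 - t) @ b)
    return np.column_stack(cols)

def std_reach(A, B, N):
    """Standard N-step controllability matrix [A^{N-1}B, ..., AB, B]."""
    return np.hstack([np.linalg.matrix_power(A, N - 1 - t) @ B for t in range(N)])

# Hypothetical example (not from the exercise): a rotation by 90 degrees.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
b1 = np.array([1.0, 0.0])
b2 = np.array([0.0, 1.0])
N = 2
print(np.linalg.matrix_rank(alt_reach(A, b1, b2, N)),
      np.linalg.matrix_rank(std_reach(A, np.column_stack([b1, b2]), N)))
```

For this particular rotation the alternating reachability matrix has rank 1 while the standard one has rank 2, so the alternating scheme reaches strictly fewer states here.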
By 'always', we mean for any A, b1, b2, xdes, and N ≥ n. So, for example, if your answer is 'Yes' for part (a), you are saying that for any A, b1, b2, xdes, and N ≥ n, if we can find an alternating input sequence so that x(2N) = xdes, then we can also find a standard input sequence so that x(N) = xdes.

In your solution for parts (a) and (b) you should first state your answer, which must be either 'Yes' or 'No'. If your answer is 'Yes', you must provide a justification, and if your answer is 'No', you must provide a counterexample (and you must explain clearly why it is a counterexample). You may use any of the concepts from the class (e.g., eigenvalues, singular values, controllability, pseudo-inverse, etc.). Your solution must be short; we won't read more than one page.

Lecture 19 – Observability and state estimation

19.1 Sensor selection and observer design. Consider the system ẋ = Ax, y = Cx, with

    A = 0 1 1 0 0 0 0 1 0 0 0
        1 0 0 0 1 1 0 1 1 0 0
    C = 0 1 1
        0 0

(This problem concerns observer design, so we've simplified things by not even including an input. The matrix A is the same as in problem 14, just to save you typing; there is no other connection between the problems.)

We consider observers that (exactly and instantaneously) reconstruct the state from the output and its derivatives. Such observers have the form

    x(t) = F0 y(t) + F1 (dy/dt)(t) + · · · + Fk (dky/dtk)(t),

where F0, . . . , Fk are matrices that specify the observer. (Of course we require this formula to hold for any trajectory of the system and any t, i.e., the observer has to work!)

Consider an observer defined by F0, . . . , Fk. We say the degree of the observer is the largest j such that Fj ≠ 0. The degree gives the highest derivative of y used to reconstruct the state. If the ith columns of F0, . . . , Fk are all zero, then the observer doesn't use the ith sensor signal yi(t) to reconstruct the state. We say the observer uses or requires the sensor i if at least one of the ith columns of F0, . . . , Fk is nonzero.

(a) What is the minimum number of sensors required for such an observer? List all combinations (i.e., sets) of sensors, of this minimum number, for which there is an observer using only these sensors.

(b) What is the minimum degree observer? List all combinations of sensors for which an observer of this minimum degree can be found.

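A way to check sensor subsets in exercise 19.1: stacking y and its first k derivatives gives [y; ẏ; . . . ; y(k)] = [C; CA; . . . ; CAk]x, so an observer of degree k using only a sensor subset S exists exactly when the stacked matrix built from the rows of C indexed by S has rank n, and [F0 · · · Fk] can then be taken as a left inverse of that stacked matrix. A Python/NumPy sketch with a hypothetical system, not the matrices of the exercise:

```python
import numpy as np

def observer_exists(A, C, sensors, k):
    """True if an observer of degree k exists using only the given sensors:
    the matrix [Cs; Cs A; ...; Cs A^k], with Cs the selected rows of C,
    must have rank n."""
    Cs = C[list(sensors), :]
    O = np.vstack([Cs @ np.linalg.matrix_power(A, j) for j in range(k + 1)])
    return np.linalg.matrix_rank(O) == A.shape[1]

# Hypothetical system (not the matrices of the exercise): a double
# integrator with a single position sensor.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[1.0, 0.0]])
print(observer_exists(A, C, [0], 0),   # y alone: rank 1, not enough
      observer_exists(A, C, [0], 1))   # y and dy/dt: rank 2, enough
```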