
Algebraic Methods in Data Science: Lesson 2

Faculty of Industrial Engineering and Management


Technion - Israel Institute of Technology

Dan Garber
https://dangar.net.technion.ac.il/

Winter Semester 2020-2021


Recap
Definition (Norm)
A function $\|\cdot\| : \mathcal{X} \to \mathbb{R}$ is a norm if
1. $\forall x \in \mathcal{X}$: $\|x\| \geq 0$, and $\|x\| = 0$ if and only if $x = 0$ (positivity)
2. $\forall x, y \in \mathcal{X}$: $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality)
3. $\forall \alpha \in \mathbb{R}, x \in \mathcal{X}$: $\|\alpha x\| = |\alpha| \cdot \|x\|$ (homogeneity)

Definition (Inner product)
An inner product on a (real) vector space $\mathcal{X}$ is a function which maps any pair $x, y \in \mathcal{X}$ into a real scalar, denoted by $\langle x, y \rangle$, and which satisfies the following axioms for any $x, y, z \in \mathcal{X}$ and scalar $\alpha \in \mathbb{R}$:
1. $\langle x, x \rangle \geq 0$, and $\langle x, x \rangle = 0$ if and only if $x = 0$ (positivity)
2. $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$ (additivity)
3. $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ (homogeneity)
4. $\langle x, y \rangle = \langle y, x \rangle$ (symmetry)
Recap

Theorem
Let $\mathcal{X}$ be an inner product space. Then the function $\|\cdot\| : \mathcal{X} \to \mathbb{R}$ given by $\|x\| := \sqrt{\langle x, x \rangle}$ is a norm.

Example: for $\mathcal{X} = \mathbb{R}^n$, the function $\langle x, y \rangle = x^\top y$, $x, y \in \mathbb{R}^n$, is the standard inner product, and the induced norm is simply the Euclidean norm $\|x\|_2 = \sqrt{x^\top x} = \sqrt{\sum_{i=1}^n x_i^2}$.
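For instance, the standard inner product and its induced norm can be checked numerically; the following is a small NumPy sketch (the vectors are arbitrary illustrative values):

```python
import numpy as np

x = np.array([3.0, -1.0, 2.0])
y = np.array([1.0, 4.0, 2.0])

ip = x @ y               # standard inner product <x, y> = x^T y
norm_x = np.sqrt(x @ x)  # induced norm ||x||_2 = sqrt(<x, x>)

print(ip)                                                    # 3.0
print(np.isclose(norm_x, np.linalg.norm(x)))                 # True: matches NumPy's Euclidean norm
print(norm_x + np.linalg.norm(y) >= np.linalg.norm(x + y))   # True: triangle inequality
```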


Recap

Definition (Orthogonal vectors)
Given an inner product space $\mathcal{X}$ and vectors $x, y \in \mathcal{X}$, we say that $x, y$ are orthogonal if $\langle x, y \rangle = 0$, and we write $x \perp y$.

Definition
Given an inner product space $\mathcal{X}$ and non-zero vectors $x^{(1)}, \ldots, x^{(n)}$ in $\mathcal{X}$, we say that $x^{(1)}, \ldots, x^{(n)}$ are mutually orthogonal if and only if $\langle x^{(i)}, x^{(j)} \rangle = 0$ for all $i \neq j$.

Theorem
Given an inner product space $\mathcal{X}$, any mutually orthogonal vectors $x^{(1)}, \ldots, x^{(n)}$ are linearly independent.
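As a quick numerical illustration (a NumPy sketch with arbitrarily chosen vectors), mutual orthogonality can be verified via pairwise inner products, and linear independence via the rank of the matrix whose columns are the vectors:

```python
import numpy as np

# Three mutually orthogonal, non-zero vectors in R^3, stacked as columns of V.
V = np.column_stack([[1.0, 1.0, 1.0],
                     [1.0, -1.0, 0.0],
                     [1.0, 1.0, -2.0]])

G = V.T @ V                                     # G[i, j] = <x^(i), x^(j)>
print(np.allclose(G - np.diag(np.diag(G)), 0))  # True: off-diagonal inner products vanish
print(np.linalg.matrix_rank(V) == V.shape[1])   # True: the vectors are linearly independent
```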



Orthonormal Vectors and Matrices

Definition (Orthonormal vectors)


A set of vectors $S = \{x^{(1)}, \ldots, x^{(n)}\}$ in an inner product space $\mathcal{X}$ is said to be orthonormal if, for all $i, j = 1, \ldots, n$,
$$\langle x^{(i)}, x^{(j)} \rangle = \begin{cases} 0 & \text{if } i \neq j; \\ 1 & \text{if } i = j. \end{cases}$$

Since, by the last theorem, orthonormal vectors are linearly independent, a set of orthonormal vectors $S$ forms an orthonormal basis for the linear span of $S$.

Example: for $\mathcal{X} = \mathbb{R}^n$ with the standard inner product, the set of vectors $\{e_1, \ldots, e_n\}$, where $e_i(i) = 1$ and $e_i(j) = 0$ for all $j \neq i$, forms an orthonormal basis of $\mathbb{R}^n$.
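Numerically, orthonormality of a set of vectors amounts to its Gram matrix being the identity; a brief NumPy sketch (using the standard basis and, as a second example, the orthonormal columns produced by a QR factorization):

```python
import numpy as np

E = np.eye(4)                            # the standard basis e_1, ..., e_4 as columns
print(np.allclose(E.T @ E, np.eye(4)))   # True: <e_i, e_j> = 1 if i = j, 0 otherwise

Q, _ = np.linalg.qr(np.random.randn(4, 3))   # another orthonormal set (columns of Q)
print(np.allclose(Q.T @ Q, np.eye(3)))       # True
```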


Orthonormal Vectors and Matrices


Definition (Orthonormal matrix)
A matrix $X \in \mathbb{R}^{n \times m}$, $m \leq n$, is said to be orthonormal if its columns are orthonormal vectors.

Corollary
A square $n \times n$ orthonormal matrix $X$ is invertible and $X^{-1} = X^\top$.

Proof: From the theorem on orthogonal vectors, the columns of $X$ are linearly independent, i.e., $X$ has full rank and is therefore invertible.
Let $X_i$ denote the $i$th column of $X$. Now, from the fact that the columns of $X$ are orthonormal:
$$\forall i \neq j: \ [X^\top X]_{i,j} = X_i^\top X_j = 0, \qquad [X^\top X]_{i,i} = X_i^\top X_i = 1.$$
That is, $X^\top X = I$. Since $X^{-1}$ exists, using the above we can deduce:
$$X X^{-1} = I \implies X^\top X X^{-1} = X^\top \implies X^{-1} = X^\top.$$
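This corollary is easy to check numerically; the sketch below builds a random square orthonormal matrix via a QR factorization and verifies $X^\top X = I$ and $X^{-1} = X^\top$:

```python
import numpy as np

X, _ = np.linalg.qr(np.random.randn(5, 5))   # a random 5x5 orthonormal matrix

print(np.allclose(X.T @ X, np.eye(5)))       # True: columns are orthonormal
print(np.allclose(np.linalg.inv(X), X.T))    # True: X^{-1} = X^T
```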
Orthogonal Decomposition of Linear Spaces

Definition (orthogonal complement)


A vector $x$ is said to be orthogonal to a subset $S$ of an inner product space $\mathcal{X}$ if $x \perp y$ for all $y \in S$. The set of vectors in $\mathcal{X}$ that are orthogonal to $S$ is called the orthogonal complement of $S$, and it is denoted by $S^\perp$.

Observation
The orthogonal complement $S^\perp$ is always a subspace.

Example: Let $\mathcal{X} = \mathbb{R}^n$ with the standard inner product and consider the subspace $S := \{x \in \mathbb{R}^n \mid \forall i > 1: x_i = 0\}$.
It is not hard to see that the orthogonal complement is given by $S^\perp = \{x \in \mathbb{R}^n \mid x_1 = 0\}$.


Orthogonal Decomposition of Linear Spaces

Definition (orthogonal complement)
A vector $x$ is said to be orthogonal to a subset $S$ of an inner product space $\mathcal{X}$ if $x \perp y$ for all $y \in S$. The set of vectors in $\mathcal{X}$ that are orthogonal to $S$ is called the orthogonal complement of $S$, and it is denoted by $S^\perp$.

Definition (direct sum)
A vector space $\mathcal{X}$ is said to be the direct sum of two subspaces $A, B$ if any element $x \in \mathcal{X}$ can be written in a unique way as $x = a + b$ with $a \in A$ and $b \in B$, and we write $\mathcal{X} = A \oplus B$.

Theorem (Orthogonal decomposition of the space)
If $S$ is a subspace of an inner product space $\mathcal{X}$, then any vector $x \in \mathcal{X}$ can be written in a unique way as the sum of one element in $S$ and one in the orthogonal complement $S^\perp$. That is, $\mathcal{X} = S \oplus S^\perp$ for any subspace $S \subseteq \mathcal{X}$.
Orthogonal Decomposition of Linear Spaces
Theorem (Orthogonal decomposition of the space)
If $S$ is a subspace of an inner product space $\mathcal{X}$, then any vector $x \in \mathcal{X}$ can be written in a unique way as the sum of one element in $S$ and one in the orthogonal complement $S^\perp$. That is, $\mathcal{X} = S \oplus S^\perp$ for any subspace $S \subseteq \mathcal{X}$.

Proof: First note that $S \cap S^\perp = \{0\}$, since if $v \in S \cap S^\perp$, then by definition $\|v\|^2 = \langle v, v \rangle = 0$, and hence $v = 0$.
Denote $\mathcal{W} = S + S^\perp$, that is, $\mathcal{W} = \{a + b \mid a \in S, b \in S^\perp\}$. We will first prove that $\mathcal{W} = \mathcal{X}$ and then prove uniqueness.
Clearly $\mathcal{W} \subseteq \mathcal{X}$, so it remains to show that $\mathcal{X} \subseteq \mathcal{W}$.
Assume by contradiction that $\mathcal{X} \not\subseteq \mathcal{W}$. Consider an orthonormal basis of $\mathcal{W}$ and extend it (by adding additional elements) to an orthonormal basis of $\mathcal{X}$ (later on we show that for every subspace it is possible to construct an orthonormal basis via the Gram-Schmidt procedure); denote this basis by $B$.

Orthogonal Decomposition of Linear Spaces (proof cont.)


Denote $\mathcal{W} = S + S^\perp$, that is, $\mathcal{W} = \{a + b \mid a \in S, b \in S^\perp\}$. We need to show $\mathcal{W} = \mathcal{X}$. We know $\mathcal{W} \subseteq \mathcal{X}$. Assume by contradiction that $\mathcal{X} \not\subseteq \mathcal{W}$. Consider an orthonormal basis of $\mathcal{W}$ and extend it (by adding additional elements) to an orthonormal basis of $\mathcal{X}$; denote this basis by $B$.

Since $\mathcal{X} \not\subseteq \mathcal{W}$, there must be a basis element $z \in B$ such that $z \notin \mathcal{W}$. In particular, $z$ is orthogonal to $\mathcal{W}$.
Since $S \subseteq \mathcal{W}$, $z$ is orthogonal to $S$ as well $\implies z \in S^\perp$.
However, $S^\perp \subseteq \mathcal{W}$, and hence $z \in \mathcal{W}$, which results in a contradiction.
Hence we have proved that $\mathcal{W} = S + S^\perp = \mathcal{X}$ (each element $x \in \mathcal{X}$ can be written as the sum of one element in $S$ and one element in $S^\perp$).
Orthogonal Decomposition of Linear Spaces (proof cont.)

Theorem (Orthogonal decomposition)
If $S$ is a subspace of an inner product space $\mathcal{X}$, then any vector $x \in \mathcal{X}$ can be written in a unique way as the sum of one element in $S$ and one in the orthogonal complement $S^\perp$. That is, $\mathcal{X} = S \oplus S^\perp$ for any subspace $S \subseteq \mathcal{X}$.

It remains to prove uniqueness. Suppose for the purpose of contradiction that uniqueness does not hold. Then there exist $x_1, x_2 \in S$ and $y_1, y_2 \in S^\perp$ with $(x_1, y_1) \neq (x_2, y_2)$ such that
$$x_1 + y_1 = x \quad \text{and} \quad x_2 + y_2 = x.$$
However, taking the difference of the two equations gives
$$x_1 - x_2 = y_2 - y_1.$$
Since $(x_1 - x_2) \in S$ and $(y_2 - y_1) \in S^\perp$, it follows that $x_1 - x_2 = y_2 - y_1 = 0$ (recall $S \cap S^\perp = \{0\}$). However, this means that $x_1 = x_2$ and $y_1 = y_2$, and thus we arrive at a contradiction.

Projections onto Subspaces


Projection is the problem of finding a point in a given set that is closest (in norm) to a given point.
Formally, given a vector $x$ in an inner product space $\mathcal{X}$ and a closed set $S \subseteq \mathcal{X}$ (i.e., $S$ contains its boundary), the projection of $x$ onto $S$, denoted by $\Pi_S(x)$, is defined as the point in $S$ at minimal distance from $x$:
$$\Pi_S(x) = \arg\min_{y \in S} \|y - x\|,$$
where the norm used is the one induced by the inner product.



Warmup: Projection onto a One-Dimensional Subspace
Given a non-zero vector $v \in \mathcal{X}$, where $\mathcal{X}$ is an inner product space, let $S_v$ denote the subspace spanned by $v$, i.e., $S_v = \{\lambda v \mid \lambda \in \mathbb{R}\}$.
Given a vector $x \in \mathcal{X}$, we seek $\Pi_{S_v}(x) = \arg\min_{y \in S_v} \|y - x\|$.
We will show the projection is characterized by the fact that the difference $(x - \Pi_{S_v}(x))$ is orthogonal to $v$.
Let $x_v$ be a point in $S_v$ such that $(x - x_v) \perp v$ (note that by the subspace decomposition theorem we can always write $x = x_v + z$ for some $z \perp S_v$).
Consider an arbitrary vector $y \in S_v$. Note that $(y - x_v) \perp (x - x_v)$, since $(y - x_v) \in S_v$ while $(x - x_v) \perp v$ and hence $(x - x_v) \perp S_v$.
Thus, by the Pythagoras theorem we have
$$\|y - x\|^2 = \|(y - x_v) - (x - x_v)\|^2 = \|y - x_v\|^2 + \|x - x_v\|^2.$$
Since the first term is always non-negative, the minimum over $y$ is obtained by taking $y = x_v$, which proves that $x_v$ is indeed the projection we sought. Note in particular that the projection is unique.

Warmup: Projection onto a One-Dimensional Subspace

Given a non-zero vector $v \in \mathcal{X}$, where $\mathcal{X}$ is an inner product space, let $S_v$ denote the subspace spanned by $v$, i.e., $S_v = \{\lambda v \mid \lambda \in \mathbb{R}\}$. Given a vector $x \in \mathcal{X}$, we seek $\Pi_{S_v}(x) = \arg\min_{y \in S_v} \|y - x\|$.
We showed that the projection $x_v = \Pi_{S_v}(x)$ is characterized by the orthogonality condition $\langle x - x_v, v \rangle = 0$.
Let's now find an explicit expression for $x_v$.
Since $x_v \in S_v$, it follows that $x_v = \lambda v$ for some $\lambda \in \mathbb{R}$. We have that
$$0 = \langle x - x_v, v \rangle = \langle x - \lambda v, v \rangle = \langle x, v \rangle - \lambda \langle v, v \rangle = \langle x, v \rangle - \lambda \|v\|^2.$$
Hence, we obtain $\lambda = \frac{\langle x, v \rangle}{\|v\|^2}$, which results in the projection
$$x_v = \Pi_{S_v}(x) = \frac{\langle x, v \rangle}{\|v\|^2}\, v.$$
$x_v$ is usually called the component of $x$ along the direction $v$.
In particular, if $\|v\| = 1$ then $x_v = \langle x, v \rangle v$.
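A minimal NumPy sketch of this formula (the vectors and the helper name project_onto_line are illustrative choices, not part of the lecture):

```python
import numpy as np

def project_onto_line(x, v):
    """Projection of x onto span{v}: (<x, v> / ||v||^2) * v."""
    return (x @ v) / (v @ v) * v

x = np.array([2.0, 3.0, 1.0])
v = np.array([1.0, 1.0, 0.0])

xv = project_onto_line(x, v)
print(xv)                            # [2.5 2.5 0. ]
print(np.isclose((x - xv) @ v, 0))   # True: the residual is orthogonal to v
```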
Projection onto Arbitrary Subspaces
We now extend the previous result to the case where $S$ is not necessarily one-dimensional. In this case too, orthogonality plays a key role.

Theorem (projection theorem)
Let $\mathcal{X}$ be an inner product space, let $x$ be a given element in $\mathcal{X}$, and let $S$ be a subspace of $\mathcal{X}$. Then there exists a unique vector $x^* \in S$ which is the solution to the problem $\min_{y \in S} \|y - x\|$.
Moreover, a necessary and sufficient condition for $x^*$ being the optimal solution to this problem is that $x^* \in S$ and $(x - x^*) \perp S$.


Projection onto Arbitrary Subspaces

Theorem (projection theorem)
Let $\mathcal{X}$ be an inner product space, let $x$ be a given element in $\mathcal{X}$, and let $S$ be a subspace of $\mathcal{X}$. Then there exists a unique vector $x^* \in S$ which is the solution to the problem $\min_{y \in S} \|y - x\|$.
Moreover, a necessary and sufficient condition for $x^*$ being the optimal solution to this problem is that $x^* \in S$ and $(x - x^*) \perp S$.

Proof: By the subspace decomposition theorem, $x$ can be written in a unique way as $x = u + z$, $u \in S$, $z \in S^\perp$. Hence, for any $y \in S$, since $(y - u) \in S$ and $z \in S^\perp$, using the Pythagoras theorem we have:
$$\|y - x\|^2 = \|(y - u) - z\|^2 = \|y - u\|^2 + \|z\|^2.$$
It follows that the unique minimizer is $x^* = u$. Indeed, with this choice we have $(x - x^*) = z \perp S$.



Projection onto Vector Span
Suppose now we have a basis for a subspace $S \subseteq \mathcal{X}$, that is, $S = \mathrm{span}(x_1, \ldots, x_m)$. Given $x \in \mathcal{X}$, the projection theorem tells us that the unique projection $x^*$ of $x$ onto $S$ is characterized by the orthogonality condition $(x - x^*) \perp S$.
Since $x^* \in S$, we can write $x^*$ as some (unknown) linear combination of the basis elements $x_1, \ldots, x_m$, that is, $x^* = \sum_{i=1}^m \alpha_i x_i$, where the scalar coefficients $\alpha_1, \ldots, \alpha_m$ are unknown. Note that
$$(x - x^*) \perp S \iff \forall i \in \{1, \ldots, m\}: \langle x - x^*, x_i \rangle = 0 \iff \forall i \in \{1, \ldots, m\}: \langle x^*, x_i \rangle = \langle x, x_i \rangle.$$
Plugging in $x^* = \sum_{i=1}^m \alpha_i x_i$, we arrive at the following system of $m$ linear equations in $m$ unknowns (the scalars $\alpha_1, \ldots, \alpha_m$):
$$\sum_{i=1}^m \alpha_i \langle x_i, x_k \rangle = \langle x, x_k \rangle, \qquad k = 1, \ldots, m.$$
The solution provides the coefficients $\alpha_1, \ldots, \alpha_m$, and hence the projection $x^*$.
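In $\mathbb{R}^n$ with the standard inner product this linear system is simply $G\alpha = b$ with $G_{ik} = x_i^\top x_k$ and $b_k = x^\top x_k$; a short NumPy sketch (with an arbitrary non-orthonormal basis) is:

```python
import numpy as np

# A (non-orthonormal) basis of a 2-dimensional subspace of R^3, as columns of A.
A = np.column_stack([[1.0, 0.0, 1.0],
                     [1.0, 2.0, 0.0]])
x = np.array([3.0, 1.0, 2.0])

G = A.T @ A                    # G[i, k] = <x_i, x_k>
b = A.T @ x                    # b[k]    = <x, x_k>
alpha = np.linalg.solve(G, b)  # the coefficients alpha_1, ..., alpha_m

x_star = A @ alpha             # the projection of x onto span(x_1, ..., x_m)
print(np.allclose(A.T @ (x - x_star), 0))   # True: (x - x*) is orthogonal to the subspace
```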

Projection onto Span of Orthonormal Vectors


Recall the projection is given by $x^* = \sum_{i=1}^m \alpha_i x_i$, where $\alpha_1, \ldots, \alpha_m$ are the result of solving the linear system:
$$\sum_{i=1}^m \alpha_i \langle x_i, x_k \rangle = \langle x, x_k \rangle, \qquad k = 1, \ldots, m.$$

Consider the previous case with the difference that now the vectors $x_1, \ldots, x_m$ are orthonormal (recall we can always construct an orthonormal basis via the Gram-Schmidt procedure, to be detailed later).
Now, we get
$$\alpha_k = \sum_{i=1}^m \alpha_i \langle x_i, x_k \rangle = \langle x, x_k \rangle, \qquad k = 1, \ldots, m,$$
which gives the closed-form solution to the projection:
$$x^* = \sum_{i=1}^m \langle x, x_i \rangle x_i.$$



Projection onto Span of Orthonormal Vectors
In case $x_1, \ldots, x_m$ are orthonormal, the projection of $x$ onto $S = \mathrm{span}(x_1, \ldots, x_m)$ is given by
$$x^* = \sum_{i=1}^m \langle x, x_i \rangle x_i.$$

In the special case in which the linear space is $\mathcal{X} = \mathbb{R}^n$ with the standard inner product, i.e., $x_1, \ldots, x_m \in \mathbb{R}^n$, the above can be written in the following matrix form:
$$x^* = \sum_{i=1}^m (x^\top x_i)\, x_i = \sum_{i=1}^m x_i x_i^\top x = \left(\sum_{i=1}^m x_i x_i^\top\right) x = P P^\top x,$$
where $P$ is the $n \times m$ matrix whose columns are exactly the basis vectors $x_1, \ldots, x_m$.
$P P^\top$ is called the projection matrix onto $\mathrm{span}(x_1, \ldots, x_m)$.
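A NumPy sketch comparing the sum form and the matrix form $P P^\top x$ (with an orthonormal basis obtained from a QR factorization):

```python
import numpy as np

P, _ = np.linalg.qr(np.random.randn(4, 2))   # orthonormal basis of a 2-dim subspace of R^4
x = np.random.randn(4)

x_star_sum = sum((x @ P[:, i]) * P[:, i] for i in range(P.shape[1]))   # sum_i <x, x_i> x_i
x_star_mat = P @ P.T @ x                                               # P P^T x

print(np.allclose(x_star_sum, x_star_mat))            # True: the two forms agree
print(np.allclose((P @ P.T) @ (P @ P.T), P @ P.T))    # True: P P^T is idempotent
```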

The Gram-Schmidt procedure (briefly)

We now complement the above results by showing that, given any basis $(x_1, \ldots, x_m)$ for a subspace $S \subseteq \mathcal{X}$, one can construct an orthonormal basis for $S$. This can be done via the Gram-Schmidt procedure.

The process takes any basis $(x_1, \ldots, x_m)$ and generates an orthonormal basis $(y_1, \ldots, y_m)$ such that $\mathrm{span}(y_1, \ldots, y_m) = \mathrm{span}(x_1, \ldots, x_m)$.

The actual proof is by induction, showing that for all $i \in \{1, \ldots, m\}$, $\mathrm{span}(y_1, \ldots, y_i) = \mathrm{span}(x_1, \ldots, x_i)$.

The process works in $m$ steps, each generating a new basis member $y_i$.



The Gram-Schmidt procedure (briefly)
For the first step we simply take $y_1 = \frac{x_1}{\|x_1\|}$. Clearly the induction holds for $i = 1$, since $\mathrm{span}(y_1) = \mathrm{span}(x_1)$.
Suppose the induction holds for some $i \geq 1$ and denote $S_i = \mathrm{span}(y_1, \ldots, y_i) = \mathrm{span}(x_1, \ldots, x_i)$.
Let $z_{i+1} = x_{i+1} - \Pi_{S_i}(x_{i+1})$ and $y_{i+1} = \frac{z_{i+1}}{\|z_{i+1}\|}$ (recall $\Pi_{S_i}(x_{i+1})$ is the projection of $x_{i+1}$ onto $S_i = \mathrm{span}(y_1, \ldots, y_i)$).
From the projection theorem: $x_{i+1} - \Pi_{S_i}(x_{i+1}) \perp \mathrm{span}(y_1, \ldots, y_i)$.
Thus, $y_{i+1} \perp \mathrm{span}(y_1, \ldots, y_i)$ and $\|y_{i+1}\| = 1$.
Note that since $(x_1, \ldots, x_m)$ is a basis, we have that $x_{i+1} \notin S_i$ and hence $z_{i+1} \neq 0$, and therefore $y_{i+1}$ is well defined.
Moreover, it clearly holds that $x_{i+1} \in \mathrm{span}(y_1, \ldots, y_{i+1})$, and hence the induction holds for $i + 1$ as well, i.e., $S_{i+1} = \mathrm{span}(y_1, \ldots, y_{i+1}) = \mathrm{span}(x_1, \ldots, x_{i+1})$.
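A compact NumPy sketch of the procedure (an illustrative implementation, with the projection onto $S_i$ computed via the orthonormal vectors found so far):

```python
import numpy as np

def gram_schmidt(X):
    """Columns of X: a basis x_1, ..., x_m. Returns Y whose columns y_1, ..., y_m are an
    orthonormal basis with span(y_1, ..., y_i) = span(x_1, ..., x_i) for all i."""
    n, m = X.shape
    Y = np.zeros((n, m))
    for i in range(m):
        # z_{i+1} = x_{i+1} - Pi_{S_i}(x_{i+1}), the component orthogonal to S_i
        z = X[:, i] - Y[:, :i] @ (Y[:, :i].T @ X[:, i])
        Y[:, i] = z / np.linalg.norm(z)   # z != 0 because x_1, ..., x_m are a basis
    return Y

X = np.random.randn(5, 3)                 # a random basis of a 3-dim subspace of R^5
Y = gram_schmidt(X)
print(np.allclose(Y.T @ Y, np.eye(3)))                        # True: the y_i are orthonormal
print(np.linalg.matrix_rank(np.column_stack([X, Y])) == 3)    # True: same span
```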

Did you notice the circular logic bug in today’s lecture?



The Gram-Schmidt procedure (briefly)

Did you notice the circular logic bug in today’s lecture?

We have used Gram-Schmidt to prove the Orthogonal Decomposition of Linear Spaces theorem (to construct an orthonormal basis).
But we have proved Gram-Schmidt using the Projection theorem, whose proof uses the Orthogonal Decomposition theorem.

Not a real problem: in the tutorial you will see a more detailed version and proof of Gram-Schmidt which DOES NOT rely on the Projection theorem.

