
The Chinese University of Hong Kong, Shenzhen

School of Data Science • Shi PU and Junfeng WU

Linear Algebra Cheat Sheet


Note:

• The purpose of this handout is to give a brief review of some of the basic concepts in linear
algebra. If you are unfamiliar with the material and/or would like to do some further reading,
you may consult, e.g., the books [1,2,3].

1 Vector Spaces
We denote the set of real numbers (also referred to as scalars) by R. A vector space that we may
be familiar with is R2 . We can think of it as a plane, and any point in R2 can be represented by
an ordered list of its coordinates, that is,
$$\mathbb{R}^2 = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} : x, y \in \mathbb{R} \right\}.$$

To define n-dimensional analogues of R2 , where n is a positive integer, we will replace R with F (F


is a field1 , which could be the set R of real numbers or the set C of complex numbers) and replace
2 with n. Therefore, an n-dimensional vector will be represented as an element of
$$\mathbb{F}^n = \left\{ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} : x_i \in \mathbb{F} \text{ for } i = 1, \dots, n \right\}.$$

We can easily define algebraic manipulations on Fn . For example, the addition of two elements in Fn is defined by adding corresponding coordinates:
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix},$$

and the scalar multiplication on an element of Fn is defined by performing multiplication in each coordinate:
$$a \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a x_1 \\ \vdots \\ a x_n \end{pmatrix},$$
where a ∈ F. The motivation for the definition of a vector space comes from the properties possessed by the addition and scalar multiplication operations on Fn . Formally, we say that a set V is a vector space over a field F if there is a binary operation “+” on V, called addition, and a map F × V → V, called scalar multiplication, such that the following properties hold:
1 See https://en.wikipedia.org/wiki/Field_(mathematics) for more details.

• Commutativity. x + y = y + x for all x, y ∈ V;

• Associativity. (x + y) + z = x + (y + z) and (ab)x = a(bx) for all x, y, z ∈ V and all a, b ∈ F;

• Additive Identity. There exists an element 0 ∈ V such that x + 0 = x for all x ∈ V;

• Additive Inverse. For any x ∈ V, there exists an element y ∈ V such that x + y = 0;

• Multiplicative Identity. 1x = x for all x ∈ V;

• Distributive Properties. a(x + y) = ax + ay and (a + b)x = ax + bx for all a, b ∈ F and x, y ∈ V.

An element of a vector space is called a vector. A subset U of V is called a subspace of V if U itself is also a vector space under the same addition and scalar multiplication.
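The componentwise operations above, and a few of the vector-space axioms, can be checked numerically. A minimal sketch in NumPy (the vectors and scalar are arbitrary examples, not from the handout):

```python
import numpy as np

# Two vectors in R^3 and a scalar in R (arbitrary examples)
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
a = 2.0

# Componentwise addition and scalar multiplication on R^n
s = x + y   # (x_1 + y_1, ..., x_n + y_n)
m = a * x   # (a x_1, ..., a x_n)

# Spot-check a few axioms on these particular vectors
assert np.allclose(x + y, y + x)                # commutativity
assert np.allclose(x + np.zeros(3), x)          # additive identity
assert np.allclose(a * (x + y), a * x + a * y)  # distributivity
```

A finite numerical check is only an illustration, of course; the axioms hold for all vectors by the definitions above.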
In what follows, we are mainly interested in vector spaces over the field R.

2 Linear Independence and Bases


We say that a finite collection C = {x1 , x2 , . . . , xm } of vectors in Rn is linearly dependent if there exist scalars a1 , . . . , am ∈ R, not all of them zero, such that $\sum_{i=1}^{m} a_i x^i = 0$. The collection C = {x1 , x2 , . . . , xm } is said to be linearly independent if it is not linearly dependent.
A linear combination of a collection {x1 , x2 , . . . , xm } of vectors in Rn is a vector of the form

a1 x1 + · · · + am xm ,

where a1 , . . . , am ∈ R. The set of all linear combinations of {x1 , x2 , . . . , xm } is called the span of {x1 , x2 , . . . , xm }. A basic fact about spans is that the span of a collection of vectors in Rn is a subspace of Rn .
A basis of Rn is a collection of vectors in Rn that is linearly independent and spans Rn . For
example,
$$\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3 \\ 4 \end{pmatrix} \right\}$$
is a basis of R2 , and
$$\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$$
is the standard basis of R2 .
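Numerically, linear independence of a finite collection can be tested by stacking the vectors as columns of a matrix and comparing the rank to the number of vectors. A sketch in NumPy, using the two example bases above:

```python
import numpy as np

# Columns are the basis vectors (1, 2)^T and (3, 4)^T from the example
B = np.array([[1.0, 3.0],
              [2.0, 4.0]])
# Columns are the standard basis vectors of R^2
E = np.eye(2)

# Rank equal to the number of columns means the columns are linearly
# independent; two independent vectors in R^2 also span it, so each
# collection is a basis.
rank_B = int(np.linalg.matrix_rank(B))
rank_E = int(np.linalg.matrix_rank(E))
```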

3 Linear Maps and Matrices


A linear map from Rn to Rm is a function T : Rn → Rm that satisfies the following properties:

• Additivity. For all x, y ∈ Rn , T (x + y) = T x + T y;

• Homogeneity. For all x ∈ Rn and a ∈ R, T (ax) = a(T x).

We denote the set of all linear maps from Rn to Rm as L(Rn , Rm ). For T ∈ L(Rn , Rm ), the null
space (also referred to as kernel) of T , denoted as null(T ), is the subset of Rn that consists of
vectors mapped to 0 by T :
null(T ) = {x ∈ Rn : T x = 0}.
The range of T , denoted as range(T ), is the subset of Rm consisting of vectors that are of the form T x for some x ∈ Rn :
range(T ) = {T x ∈ Rm : x ∈ Rn }.
A basic fact about null spaces and ranges is that the null space (resp. range) of T is a subspace of Rn (resp. Rm ).
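As a concrete illustration, take T x = Ax with the rank-1 matrix A below (an arbitrary example): the vector (2, −1)⊤ lies in null(T ), and every vector in range(T ) is a multiple of (1, 2)⊤. A NumPy sketch:

```python
import numpy as np

# T(x) = A @ x, where A is rank 1 (second row = 2 * first row)
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# A vector in null(T): A @ v = (1*2 + 2*(-1), 2*2 + 4*(-1)) = (0, 0)
v = np.array([2.0, -1.0])
Tv = A @ v

# Any image T x lies in range(T) = span{(1, 2)^T}
x = np.array([3.0, 5.0])
Tx = A @ x
assert Tx[1] == 2 * Tx[0]  # Tx is a multiple of (1, 2)^T
```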
We use Rm×n to denote the set of m × n arrays whose components are from R. We can write
an element A ∈ Rm×n as:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad (1)$$
where aij ∈ R for i = 1, . . . , m, j = 1, . . . , n. For a linear map T ∈ L(Rn , Rm ), suppose that
{y 1 , . . . , y n } is a basis of Rn and {x1 , . . . , xm } a basis of Rm . For each k = 1, . . . , n, suppose that
we can write T y k (uniquely) as a linear combination of x1 , . . . , xm as follows:

T y k = a1k x1 + · · · + amk xm .

Then the scalars ajk ’s completely determine the linear map T . We call the m × n array in (1) the
matrix of T with respect to {y 1 , . . . , y n } and {x1 , . . . , xm }. A row vector is a matrix with m = 1,
and a column vector is a matrix with n = 1.
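In particular, with the standard bases of Rn and Rm , the k-th column of the matrix of T is just T applied to the k-th standard basis vector. A NumPy sketch with an arbitrary example matrix:

```python
import numpy as np

# A linear map T : R^3 -> R^2 represented by a matrix (arbitrary example)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

n = A.shape[1]
I = np.eye(n)
# Applying T to the k-th standard basis vector recovers column k of A
images = [A @ I[:, k] for k in range(n)]
```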
Now, given an m × n matrix A of the form (1), its transpose2 A⊤ is defined as the following
n × m matrix:
$$A^\top = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}.$$
An m × m real matrix A is said to be symmetric if A = A⊤ .
We use x ≥ 0 to indicate that all the components of x are non–negative, and x ≥ y to mean
that x − y ≥ 0.
The matrix-matrix product between A ∈ Rm×n and B ∈ Rn×p is defined as
$$C = AB, \quad \text{where } c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$

The matrix-vector product can be viewed as a special case of the matrix-matrix product (with p = 1).
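The entrywise definition of the product can be checked against NumPy's built-in matrix multiplication. A sketch with arbitrary example matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # shape (3, 2), so m = 3, n = 2
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])  # shape (2, 3), so p = 3

# c_ij = sum_k a_ik * b_kj, computed with explicit loops
m, n = A.shape
p = B.shape[1]
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        for k in range(n):
            C[i, j] += A[i, k] * B[k, j]

assert np.allclose(C, A @ B)  # agrees with the built-in product
```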


2 For more details, see https://en.wikipedia.org/wiki/Transpose.

4 Rank and Matrix Inversion
The rank of a matrix A ∈ Rm×n , denoted by rank(A), is defined as the number of elements of a
maximal linearly independent subset of its columns or rows. Some facts about the rank of a matrix:

• rank(A) = rank(A⊤ );

• rank(A + B) ≤ rank(A) + rank(B);

• rank(AB) ≤ min{rank(A), rank(B)}.
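These facts can be spot-checked numerically (on arbitrary example matrices; the inequalities of course hold in general):

```python
import numpy as np

rank = np.linalg.matrix_rank

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # rank 1 (rows are proportional)
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])  # rank 2

rA, rB = int(rank(A)), int(rank(B))
fact_transpose = (int(rank(A.T)) == rA)       # rank(A) = rank(A^T)
fact_sum = (int(rank(A + B)) <= rA + rB)      # subadditivity
fact_product = (int(rank(A @ B)) <= min(rA, rB))  # product bound
```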

An n × n square matrix A is said to be invertible if its columns are linearly independent, i.e., rank(A) = n. The inverse
of the matrix A is denoted as A−1 , and we have

AA−1 = A−1 A = I.

Some facts:

• (A−1 )−1 = A.

• (AB)−1 = B −1 A−1 , where A, B are square and invertible.

• (A⊤ )−1 = (A−1 )⊤ .
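A quick numerical check of these identities on arbitrary invertible 2 × 2 example matrices:

```python
import numpy as np

inv = np.linalg.inv

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])  # det = 1, so invertible
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])  # det = 1, so invertible

# A A^{-1} = A^{-1} A = I
identity_check = (np.allclose(A @ inv(A), np.eye(2))
                  and np.allclose(inv(A) @ A, np.eye(2)))
# (AB)^{-1} = B^{-1} A^{-1}
product_rule = np.allclose(inv(A @ B), inv(B) @ inv(A))
# (A^T)^{-1} = (A^{-1})^T
transpose_rule = np.allclose(inv(A.T), inv(A).T)
```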

5 Inner Product and Vector Norms


Given two vectors x, y ∈ Rn , their inner product is defined as
$$x^\top y = \sum_{i=1}^{n} x_i y_i.$$

We say that x, y ∈ Rn are orthogonal if x⊤ y = 0.
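For instance (an arbitrary example), the vectors below are orthogonal, as a direct NumPy computation of the inner product confirms:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, -1.0, 0.0])

# Inner product x^T y = sum_i x_i y_i; here 1*2 + 2*(-1) + 2*0 = 0
ip = float(x @ y)
orthogonal = (ip == 0.0)
```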


A norm ∥ · ∥ in Rn is a function Rn → R that satisfies

• ∥x∥ ≥ 0 for all x ∈ Rn , with ∥x∥ = 0 if and only if x = 0;

• ∥ax∥ = |a|∥x∥ for all x ∈ Rn and a ∈ R;

• ∥x + y∥ ≤ ∥x∥ + ∥y∥ for all x, y ∈ Rn .

We now introduce a common family of norms in Rn , the Hölder p-norms for 1 ≤ p ≤ ∞, which are defined for x = (x1 , . . . , xn )⊤ as follows:
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$$
for 1 ≤ p < ∞, and
$$\|x\|_\infty = \max_{1 \le i \le n} |x_i|.$$

When p = 2, the 2-norm of x, which is given by $\|x\|_2 = \sqrt{x^\top x} = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$, is also referred to as the Euclidean norm of x. A fundamental inequality that relates the inner product of two vectors and their respective Euclidean norms is the Cauchy-Schwarz inequality:
$$|x^\top y| \le \|x\|_2 \|y\|_2 .$$
The equality holds if and only if the vectors x and y are linearly dependent; i.e., x = αy or y = αx for some α ∈ R.
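The p-norms and the Cauchy-Schwarz inequality can be illustrated numerically (arbitrary example vectors):

```python
import numpy as np

x = np.array([3.0, -4.0])
y = np.array([1.0, 2.0])

# Hölder p-norms of x for p = 1, p = 2, and p = infinity
norm_1 = float(np.sum(np.abs(x)))    # |3| + |-4| = 7
norm_2 = float(np.sqrt(x @ x))       # sqrt(9 + 16) = 5, the Euclidean norm
norm_inf = float(np.max(np.abs(x)))  # max(|3|, |-4|) = 4

# Cauchy-Schwarz: |x^T y| <= ||x||_2 ||y||_2
cs_holds = abs(float(x @ y)) <= norm_2 * float(np.sqrt(y @ y))
```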

References
[1] Strang, G. (1993). Introduction to Linear Algebra (Vol. 3). Wellesley, MA: Wellesley-Cambridge Press.

[2] Horn, R. A., & Johnson, C. R. (2012). Matrix Analysis. Cambridge University Press.

[3] Axler, S. J. (1997). Linear Algebra Done Right (Vol. 2). New York: Springer.
