Linear Algebra I

Peter Philip∗

Lecture Notes
Originally Created for the Class of Winter Semester 2018/2019 at LMU Munich
Includes Subsequent Corrections and Revisions†

September 19, 2021

∗ E-Mail: philip@math.lmu.de
† Resources used in the preparation of this text include [Kun80, Str08].

Contents

1 Foundations: Mathematical Logic and Set Theory
  1.1 Introductory Remarks
  1.2 Propositional Calculus
    1.2.1 Statements
    1.2.2 Logical Operators
    1.2.3 Rules
  1.3 Set Theory
  1.4 Predicate Calculus

2 Functions and Relations
  2.1 Functions
  2.2 Relations
    2.2.1 Definition and Properties
    2.2.2 Order Relations
    2.2.3 Equivalence Relations

3 Natural Numbers, Induction, and the Size of Sets
  3.1 Induction and Recursion
  3.2 Cardinality: The Size of Sets
    3.2.1 Definition and General Properties
    3.2.2 Finite Sets
    3.2.3 Countable Sets

4 Basic Algebraic Structures
  4.1 Magmas and Groups
  4.2 Rings and Fields

5 Vector Spaces
  5.1 Vector Spaces and Subspaces
  5.2 Linear Independence, Basis, Dimension

6 Linear Maps
  6.1 Basic Properties and Examples
  6.2 Quotient Spaces
  6.3 Vector Spaces of Linear Maps

7 Matrices
  7.1 Definition and Arithmetic
  7.2 Matrices as Representations of Linear Maps
  7.3 Rank and Transpose
  7.4 Special Types of Matrices
  7.5 Blockwise Matrix Multiplication

8 Linear Systems
  8.1 General Setting
  8.2 Abstract Solution Theory
  8.3 Finding Solutions
    8.3.1 Echelon Form, Back Substitution
    8.3.2 Elementary Row Operations, Variable Substitution
    8.3.3 Gaussian Elimination
  8.4 LU Decomposition
    8.4.1 Definition and Motivation
    8.4.2 Elementary Row Operations Via Matrix Multiplication
    8.4.3 Existence
    8.4.4 Algorithm

A Axiomatic Set Theory
  A.1 Motivation, Russell’s Antinomy
  A.2 Set-Theoretic Formulas
  A.3 The Axioms of Zermelo-Fraenkel Set Theory
    A.3.1 Existence, Extensionality, Comprehension
    A.3.2 Classes
    A.3.3 Pairing, Union, Replacement
    A.3.4 Infinity, Ordinals, Natural Numbers
    A.3.5 Power Set
    A.3.6 Foundation
  A.4 The Axiom of Choice
  A.5 Cardinality

B Associativity and Commutativity
  B.1 Associativity
  B.2 Commutativity

C Groups

D Number Theory

E Vector Spaces
  E.1 Cardinality and Dimension
  E.2 Cartesian Products

References

1 Foundations: Mathematical Logic and Set Theory

1.1 Introductory Remarks


The task of mathematics is to establish the truth or falsehood of (formalizable) state-
ments using rigorous logic, and to provide methods for the solution of classes of (e.g.
applied) problems, ideally including rigorous logical proofs verifying the validity of the
methods (proofs that the method under consideration will, indeed, provide a correct
solution).
The topic of this class is linear algebra, a subfield of the field of algebra. Algebra
in the sense of this class is also called abstract algebra and constitutes the study of
mathematical objects and rules for combining them (one sometimes says that these
rules form a structure on the underlying set of objects). An important task of algebra
is the solution of equations, where the equations are formulated in some set of objects
with a given structure. In linear algebra, the sets of objects are so-called vector spaces
(the objects being called vectors) and the structure consists of an addition, assigning two
vectors v, w their sum vector v + w, together with a scalar multiplication, assigning a
scalar (from a so-called scalar field, more about this later) λ and a vector v the product
vector λv. In linear algebra, one is especially interested in solving linear equations, i.e.
equations of the form A(x) = b, where A is a linear function, i.e. a function, satisfying

A(λv + µw) = λA(v) + µA(w)

for all vectors v, w and all scalars λ, µ. Before we can properly define and study vector
spaces and linear equations, some preparatory work is still needed. In modern mathe-
matics, the objects under investigation are almost always so-called sets. So one aims at
deriving (i.e. proving) true (and interesting and useful) statements about sets from other
statements about sets known or assumed to be true. Such a derivation or proof means
applying logical rules that guarantee the truth of the derived (i.e. proved) statement.
However, unfortunately, a proper definition of the notion of set is not easy, and neither
is an appropriate treatment of logic and proof theory. Here, we will only be able to
briefly touch on the bare necessities from logic and set theory needed to proceed to the
core matter of this class. We begin with logic in Sec. 1.2, followed by set theory in
Sec. 1.3, combining both in Sec. 1.4. The interested student can find an introductory
presentation of axiomatic set theory in Appendix A and he/she should consider taking
a separate class on set theory, logic, and proof theory at a later time.

1.2 Propositional Calculus


1.2.1 Statements

Mathematical logic is a large field in its own right and, as indicated above, a thorough in-
troduction is beyond the scope of this class – the interested reader may refer to [EFT07],
[Kun12], and references therein. Here, we will just introduce some basic concepts using
common English (rather than formal symbolic languages – a concept touched on in Sec.
A.2 of the Appendix and more thoroughly explained in books like [EFT07]).
As mentioned before, mathematics establishes the truth or falsehood of statements. By
a statement or proposition we mean any sentence (any sequence of symbols) that can
reasonably be assigned a truth value, i.e. a value of either true, abbreviated T, or false,
abbreviated F. The following example illustrates the difference between statements and
sentences that are not statements:

Example 1.1. (a) Sentences that are statements:

Every dog is an animal. (T)


Every animal is a dog. (F)
The number 4 is odd. (F)
2 + 3 = 5. (T)

√2 < 0. (F)
x + 1 > 0 holds for each natural number x. (T)

(b) Sentences that are not statements:

Let’s study calculus!


Who are you?
3 · 5 + 7.
x + 1 > 0.
All natural numbers are green.

The fourth sentence in Ex. 1.1(b) is not a statement, as it cannot be said to be either
true or false without any further knowledge of x. The fifth sentence in Ex. 1.1(b) is
not a statement, as it lacks any meaning and can, hence, be neither true nor false. It
would become a statement if given a definition of what it means for a natural number
to be green.

1.2.2 Logical Operators

The next step now is to combine statements into new statements using logical operators,
where the truth value of the combined statements depends on the truth values of the
original statements and on the type of logical operator facilitating the combination.
The simplest logical operator is negation, denoted ¬. It is actually a so-called unary
operator, i.e. it does not combine statements, but is merely applied to one statement.
For example, if A stands for the statement “Every dog is an animal.”, then ¬A stands
for the statement “Not every dog is an animal.”; and if B stands for the statement “The
number 4 is odd.”, then ¬B stands for the statement “The number 4 is not odd.”, which
can also be expressed as “The number 4 is even.”
To completely understand the action of a logical operator, one usually writes what is
known as a truth table. For negation, the truth table is

A ¬A
T F (1.1)
F T

This means: if the input statement A is true, then the output statement ¬A is false; if
the input statement A is false, then the output statement ¬A is true.
We now proceed to discuss binary logical operators, i.e. logical operators combining
precisely two statements. The following four operators are essential for mathematical
reasoning:
Conjunction: A and B, usually denoted A ∧ B.
Disjunction: A or B, usually denoted A ∨ B.
Implication: A implies B, usually denoted A ⇒ B.
Equivalence: A is equivalent to B, usually denoted A ⇔ B.
Here is the corresponding truth table:

A B A∧B A∨B A⇒B A⇔B


T T T T T T
T F F T F F (1.2)
F T F T T F
F F F F T T

When first seen, some of the assignments of truth values in (1.2) might not be completely
intuitive, due to the fact that logical operators are often used somewhat differently in
common English. Let us consider each of the four logical operators of (1.2) in sequence:

For use in subsequent examples, let A1 , . . . , A6 denote the six statements from Ex.
1.1(a).
Conjunction: Most likely the easiest of the four, basically identical to common language
use: A ∧ B is true if, and only if, both A and B are true. For example, using Ex. 1.1(a),
A1 ∧ A4 is the statement “Every dog is an animal and 2 + 3 = 5.”, which is true since
both A1 and A4 are true. On the other hand, A1 ∧ A3 is the statement “Every dog is
an animal and the number 4 is odd.”, which is false, since A3 is false.
Disjunction: The disjunction A ∨ B is true if, and only if, at least one of the statements
A, B is true. Here one already has to be a bit careful – A ∨ B defines the inclusive or,
whereas “or” in common English is often understood to mean the exclusive or (which is
false if both input statements are true). For example, using Ex. 1.1(a), A1 ∨ A4 is the
statement “Every dog is an animal or 2 + 3 = 5.”, which is true since both A1 and A4
are true. The statement A1 ∨ A3 , i.e. “Every dog is an animal or the number 4 is odd.”
is also true, since A1 is true. However, the statement A2 ∨ A5 , i.e. “Every animal is a
dog or √2 < 0.”, is false, as both A2 and A5 are false.
As you will have noted in the above examples, logical operators can be applied to
combine statements that have no obvious content relation. While this might seem
strange, introducing content-related restrictions is unnecessary as well as undesirable,
since it is often not clear which seemingly unrelated statements might suddenly appear
in a common context in the future. The same occurs when considering implications and
equivalences, where it might seem even more obscure at first.
Implication: Instead of A implies B, one also says if A then B, B is a consequence
of A, B is concluded or inferred from A, A is sufficient for B, or B is necessary for
A. The implication A ⇒ B is always true, except if A is true and B is false. At first
glance, it might be surprising that A ⇒ B is defined to be true for A false and B true,
however, there are many examples of incorrect statements implying correct statements.
For instance, squaring the (false) equality of integers −1 = 1 yields the (true) equality
of integers 1 = 1. However, as with conjunction and disjunction, it is perfectly valid
to combine statements without any obvious context relation: For example, using Ex.
1.1(a), the statement A1 ⇒ A6 , i.e. “Every dog is an animal implies x + 1 > 0 holds for
each natural number x.” is true, since A6 is true, whereas the statement A4 ⇒ A2 , i.e.
“2 + 3 = 5 implies every animal is a dog.” is false, as A4 is true and A2 is false.
Of course, the implication A ⇒ B is not really useful in situations where the truth
values of both A and B are already known. Rather, in a typical application, one tries
to establish the truth of A to prove the truth of B (a strategy that will fail if A happens
to be false).
Example 1.2. Suppose we know Sasha to be a member of a group of students, taking
a class in Analysis. Then the statement A “Sasha has taken a class in Analysis before.”

implies the statement B “There is at least one student in the group, who has taken the
class before”. A priori, we might not know if Sasha has taken the Analysis class before,
but if we can establish that Sasha has, indeed, taken the class before, then we also know
B to be true. If we find Sasha to be taking the class for the first time, then we do not
know whether B is true or false.

Equivalence: A ⇔ B means A is true if, and only if, B is true. Once again, using
input statements from Ex. 1.1(a), we see that A1 ⇔ A4 , i.e. “Every dog is an animal
is equivalent to 2 + 3 = 5.”, is true as well as A2 ⇔ A3 , i.e. “Every animal is a dog is
equivalent to the number 4 is odd.”. On the other hand, A4 ⇔ A5 , i.e. “2 + 3 = 5 is
equivalent to √2 < 0.”, is false.
Analogous to the situation of implications, A ⇔ B is not really useful if the truth values
of both A and B are known a priori, but can be a powerful tool to prove B to be true
or false by establishing the truth value of A. It is obviously more powerful than the
implication as illustrated by the following example (compare with Ex. 1.2):

Example 1.3. Suppose we know Sasha has obtained the highest score among the stu-
dents registered for the Analysis class. Then the statement A “Sasha has taken the
Analysis class before.” is equivalent to the statement B “The student with the highest
score has taken the class before.” As in Ex. 1.2, if we can establish Sasha to have taken
the class before, then we also know B to be true. However, in contrast to Ex. 1.2, if we
find Sasha to have taken the class for the first time, then we know B to be false.

Remark 1.4. In computer science, the truth value T is often coded as 1 and the truth
value F is often coded as 0.
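Since T and F may be coded as 1 and 0 (i.e. as booleans), the truth table (1.2) can be reproduced mechanically. The following minimal Python sketch does just that (the function names are our own choices, made only for illustration; implication is encoded as ¬A ∨ B, anticipating Th. 1.11(a) below):

# Reproduce the truth table (1.2), coding T as True and F as False.
def conj(a, b): return a and b           # A ∧ B
def disj(a, b): return a or b            # A ∨ B
def impl(a, b): return (not a) or b      # A ⇒ B, cf. Th. 1.11(a) below
def equiv(a, b): return a == b           # A ⇔ B: same truth value

for a in (True, False):
    for b in (True, False):
        print(a, b, conj(a, b), disj(a, b), impl(a, b), equiv(a, b))

The four printed rows coincide with the four rows of (1.2).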

1.2.3 Rules

Note that the expressions in the first row of the truth table (1.2) (e.g. A ∧ B) are not
statements in the sense of Sec. 1.2.1, as they contain the statement variables (also known
as propositional variables) A or B. However, the expressions become statements if all
statement variables are substituted with actual statements. We will call expressions of
this form propositional formulas. Moreover, if a truth value is assigned to each statement
variable of a propositional formula, then this uniquely determines the truth value of the
formula. In other words, the truth value of the propositional formula can be calculated
from the respective truth values of its statement variables – a first justification for the
name propositional calculus.

Example 1.5. (a) Consider the propositional formula (A ∧ B) ∨ (¬B). Suppose A is
true and B is false. The truth value of the formula is obtained according to the
following truth table:
following truth table:
A B A ∧ B ¬B (A ∧ B) ∨ (¬B)
(1.3)
T F F T T

(b) The propositional formula A ∨ (¬A), also known as the law of the excluded middle,
has the remarkable property that its truth value is T for every possible choice of
truth values for A:
A ¬A A ∨ (¬A)
T F T (1.4)
F T T
Formulas with this property are of particular importance.
Definition 1.6. A propositional formula is called a tautology or universally true if,
and only if, its truth value is T for all possible assignments of truth values to all the
statement variables it contains.
Notation 1.7. We write φ(A1 , . . . , An ) if, and only if, the propositional formula φ
contains precisely the n statement variables A1 , . . . , An .
Definition 1.8. The propositional formulas φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) are called
equivalent if, and only if, φ(A1 , . . . , An ) ⇔ ψ(A1 , . . . , An ) is a tautology.
Lemma 1.9. The propositional formulas φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) are equiva-
lent if, and only if, they have the same truth value for all possible assignments of truth
values to A1 , . . . , An .

Proof. If φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) are equivalent and Ai is assigned the truth
value ti , i = 1, . . . , n, then φ(A1 , . . . , An ) ⇔ ψ(A1 , . . . , An ) being a tautology implies it
has truth value T. From (1.2) we see that either φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) both
have truth value T or they both have truth value F.
If, on the other hand, we know φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) have the same truth
value for all possible assignments of truth values to A1 , . . . , An , then, given such an
assignment, either φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) both have truth value T or both
have truth value F, i.e. φ(A1 , . . . , An ) ⇔ ψ(A1 , . . . , An ) has truth value T in each case,
showing it is a tautology. □

For all logical purposes, two equivalent formulas are exactly the same – it does not
matter if one uses one or the other. The following theorem provides some important
equivalences of propositional formulas. As too many parentheses tend to make formulas
less readable, we first introduce some precedence conventions for logical operators:

Convention 1.10. ¬ takes precedence over ∧, ∨, which take precedence over ⇒, ⇔.


So, for example,
(A ∨ ¬B ⇒ ¬B ∧ ¬A) ⇔ ¬C ∧ (A ∨ ¬D)
is the same as

((A ∨ (¬B)) ⇒ ((¬B) ∧ (¬A))) ⇔ ((¬C) ∧ (A ∨ (¬D))).

Theorem 1.11. (a) (A ⇒ B) ⇔ ¬A ∨ B. This means one can actually define impli-
cation via negation and disjunction.

(b) (A ⇔ B) ⇔ ((A ⇒ B) ∧ (B ⇒ A)), i.e. A and B are equivalent if, and only if, A
is both necessary and sufficient for B. One also calls the implication B ⇒ A the
converse of the implication A ⇒ B. Thus, A and B are equivalent if, and only if,
both A ⇒ B and its converse hold true.

(c) Commutativity of Conjunction: A ∧ B ⇔ B ∧ A.

(d) Commutativity of Disjunction: A ∨ B ⇔ B ∨ A.

(e) Associativity of Conjunction: (A ∧ B) ∧ C ⇔ A ∧ (B ∧ C).

(f ) Associativity of Disjunction: (A ∨ B) ∨ C ⇔ A ∨ (B ∨ C).

(g) Distributivity I: A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C).

(h) Distributivity II: A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C).

(i) De Morgan’s Law I: ¬(A ∧ B) ⇔ ¬A ∨ ¬B.

(j) De Morgan’s Law II: ¬(A ∨ B) ⇔ ¬A ∧ ¬B.

(k) Double Negative: ¬¬A ⇔ A.

(l) Contraposition: (A ⇒ B) ⇔ (¬B ⇒ ¬A).

Proof. Each equivalence is proved by providing a truth table and using Lem. 1.9.
(a):
A B ¬A A ⇒ B ¬A ∨ B
T T F T T
T F F F F
F T T T T
F F T T T

(b) – (h): Exercise.


(i):
A B ¬A ¬B A ∧ B ¬(A ∧ B) ¬A ∨ ¬B
T T F F T F F
T F F T F T T
F T T F F T T
F F T T F T T

(j): Exercise.
(k):
A ¬A ¬¬A
T F T
F T F

(l):
A B ¬A ¬B A ⇒ B ¬B ⇒ ¬A
T T F F T T
T F F T F F
F T T F T T
F F T T T T
Having checked all the rules completes the proof of the theorem. □

The importance of the rules provided by Th. 1.11 lies in the proof techniques they provide,
i.e. methods for establishing the truth of statements from statements known or assumed
to be true. The rules of Th. 1.11 will be used frequently in proofs throughout this class.
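In particular, whether a propositional formula is a tautology can be decided mechanically by checking all 2^n assignments of truth values, which is a convenient way to double-check the parts of Th. 1.11 left as an exercise. A small Python sketch (the helper tautology is our own, not a standard library function):

from itertools import product

def tautology(formula, n):
    # formula: a function of n booleans; test all 2**n assignments
    return all(formula(*vals) for vals in product((True, False), repeat=n))

impl = lambda a, b: (not a) or b
# Th. 1.11(i), De Morgan's Law I:
print(tautology(lambda a, b: (not (a and b)) == ((not a) or (not b)), 2))  # True
# Th. 1.11(l), Contraposition:
print(tautology(lambda a, b: impl(a, b) == impl(not b, not a), 2))         # True
# Th. 1.14(a) below, Transitivity of Implication:
print(tautology(lambda a, b, c: impl(impl(a, b) and impl(b, c), impl(a, c)), 3))  # True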

Remark 1.12. Another important proof technique is the so-called proof by contradic-
tion, also called indirect proof. It is based on the observation, called the principle of
contradiction, that A ∧ ¬A is always false:

A ¬A A ∧ ¬A
T F F (1.5)
F T F

Thus, one possibility of proving a statement B to be true is to show ¬B ⇒ A ∧ ¬A for
some arbitrary statement A. Since the right-hand side of the implication is false, the
left-hand side must also be false, proving B is true.

Two more rules we will use regularly in subsequent proofs are the so-called transitivity
of implication and the transitivity of equivalence (we will encounter equivalence again
in the context of relations in Sec. 2.2.3 below). In preparation for the transitivity rules,
we generalize implication to propositional formulas:

Definition 1.13. In generalization of the implication operator defined in (1.2), we say
the propositional formula φ(A1 , . . . , An ) implies the propositional formula ψ(A1 , . . . , An )
(denoted φ(A1 , . . . , An ) ⇒ ψ(A1 , . . . , An )) if, and only if, each assignment of truth values
to the A1 , . . . , An that makes φ(A1 , . . . , An ) true, makes ψ(A1 , . . . , An ) true as well.

Theorem 1.14. (a) Transitivity of Implication: (A ⇒ B) ∧ (B ⇒ C) ⇒ (A ⇒ C).

(b) Transitivity of Equivalence: (A ⇔ B) ∧ (B ⇔ C) ⇒ (A ⇔ C).

Proof. According to Def. 1.13, the rules can be verified by providing truth tables that
show that, for all possible assignments of truth values to the propositional formulas on
the left-hand side of the implications, either the left-hand side is false or both sides are
true. (a):

A B C A ⇒ B B ⇒ C (A ⇒ B) ∧ (B ⇒ C) A ⇒ C
T T T T T T T
T F T F T F T
F T T T T T T
F F T T T T T
T T F T F F F
T F F F T F F
F T F T F F T
F F F T T T T

(b):
A B C A ⇔ B B ⇔ C (A ⇔ B) ∧ (B ⇔ C) A ⇔ C
T T T T T T T
T F T F F F T
F T T F T F F
F F T T F F F
T T F T F F F
T F F F T F F
F T F F F F T
F F F T T T T
Having checked both rules, the proof is complete. □

Definition and Remark 1.15. A proof of the statement B is a finite sequence of
statements A1 , A2 , . . . , An such that A1 is true; for 1 ≤ i < n, Ai implies Ai+1 , and An
implies B. If there exists a proof for B, then Th. 1.14(a) guarantees that B is true.
Remark 1.16. Principle of Duality: In Th. 1.11, there are several pairs of rules that
have an analogous form: (c) and (d), (e) and (f), (g) and (h), (i) and (j). These
analogies are due to the general law called the principle of duality: If φ(A1 , . . . , An ) ⇒
ψ(A1 , . . . , An ) and only the operators ∧, ∨, ¬ occur in φ and ψ, then the reverse im-
plication Φ(A1 , . . . , An ) ⇐ Ψ(A1 , . . . , An ) holds, where one obtains Φ from φ and Ψ
from ψ by replacing each ∧ with ∨ and each ∨ with ∧. In particular, if, instead of an
implication, we start with an equivalence (as in the examples from Th. 1.11), then we
obtain another equivalence.

1.3 Set Theory


In the previous section, we have had a first glance at statements and corresponding truth
values. In the present section, we will move our focus to the objects such statements
are about. Reviewing Example 1.1(a), and recalling that this is a mathematics class
rather than one in zoology, the first two statements of Example 1.1(a) are less relevant
for us than statements 3–6. As in these examples, we will nearly always be interested in
statements involving numbers or collections of numbers or collections of such collections
etc.
In modern mathematics, the term one usually uses instead of “collection” is “set”. In
1895, Georg Cantor defined a set as “any collection into a whole M of definite and
separate objects m of our intuition or our thought”. The objects m are called the ele-
ments of the set M . As explained in Appendix A, without restrictions and refinements,
Cantor’s set theory is not free of contradictions and, thus, not viable to be used in the
foundation of mathematics. Axiomatic set theory provides these necessary restrictions
and refinements and an introductory treatment can also be found in Appendix A. How-
ever, it is possible to follow and understand the rest of this class, without having studied
Appendix A.
Notation 1.17. We write m ∈ M for the statement “m is an element of the set M ”.
Definition 1.18. The sets M and N are equal, denoted M = N , if, and only if, M and
N have precisely the same elements.

Definition 1.18 means we know everything about a set M if, and only if, we know all its
elements.

Definition 1.19. The set with no elements is called the empty set; it is denoted by the
symbol ∅.

Example 1.20. For finite sets, we can simply write down all their elements, for example,
A := {0}, B := {0, 17.5}, C := {5, 1, 5, 3}, D := {3, 5, 1}, E := {√2, 2, −2}, where the
symbolism “:=” is to be read as “is defined to be equal to”.
Note C = D, since both sets contain precisely the same elements. In particular, the
order in which the elements are written down plays no role and a set does not change if
an element is written down more than once.
If a set has many elements, instead of writing down all its elements, one might use
abbreviations such as F := {−4, −2, . . . , 20, 22, 24}, where one has to make sure the
meaning of the dots is clear from the context.

Definition 1.21. The set A is called a subset of the set B (denoted A ⊆ B and also
referred to as the inclusion of A in B) if, and only if, every element of A is also an
element of B (one sometimes also calls B a superset of A and writes B ⊇ A). Please
note that A = B is allowed in the above definition of a subset. If A ⊆ B and A ≠ B,
then A is called a strict subset of B, denoted A ⊊ B.
If B is a set and P (x) is a statement about an element x of B (i.e., for each x ∈ B,
P (x) is either true or false), then we can define a subset A of B by writing

A := {x ∈ B : P (x)}. (1.6)

This notation is supposed to mean that the set A consists precisely of those elements of
B such that P (x) is true (has the truth value T in the language of Sec. 1.2).

Example 1.22. (a) For each set A, one has A ⊆ A and ∅ ⊆ A.

(b) If A ⊆ B, then A = {x ∈ B : x ∈ A}.

(c) We have {3} ⊆ {6.7, 3, 0}. Letting A := {−10, −8, . . . , 8, 10}, we have {−2, 0, 2} =
{x ∈ A : x³ ∈ A}, ∅ = {x ∈ A : x + 21 ∈ A}.

Remark 1.23. As a consequence of Def. 1.18, the sets A and B are equal if, and only
if, one has both inclusions, namely A ⊆ B and B ⊆ A. Thus, when proving the equality
of sets, one often divides the proof into two parts, first proving one inclusion, then the
other.

Definition 1.24. (a) The intersection of the sets A and B, denoted A ∩ B, consists of
all elements that are in A and in B. The sets A, B are said to be disjoint if, and
only if, A ∩ B = ∅.

(b) The union of the sets A and B, denoted A ∪ B, consists of all elements that are in
A or in B (as in the logical disjunction in (1.2), the or is meant nonexclusively). If
A and B are disjoint, one sometimes writes A ∪˙ B and speaks of the disjoint union
of A and B.

(c) The difference of the sets A and B, denoted A\B (read “A minus B” or “A without
B”), consists of all elements of A that are not elements of B, i.e. A \ B := {x ∈
A : x ∉ B}. If B is a subset of a given set A (sometimes called the universe in
this context), then A \ B is also called the complement of B with respect to A.
In that case, one also writes B^c := A \ B (note that this notation suppresses
dependence on A).
Example 1.25. (a) Examples of Intersections:

{1, 2, 3} ∩ {3, 4, 5} = {3}, (1.7a)
{√2} ∩ {1, 2, . . . , 10} = ∅, (1.7b)
{−1, 2, −3, 4, 5} ∩ {−10, −9, . . . , −1} ∩ {−1, 7, −3} = {−1, −3}. (1.7c)

(b) Examples of Unions:

{1, 2, 3} ∪ {3, 4, 5} = {1, 2, 3, 4, 5}, (1.8a)
{1, 2, 3} ∪˙ {4, 5} = {1, 2, 3, 4, 5}, (1.8b)
{−1, 2, −3, 4, 5} ∪ {−99, −98, . . . , −1} ∪ {−1, 7, −3}
= {−99, −98, . . . , −2, −1, 2, 4, 5, 7}. (1.8c)

(c) Examples of Differences:

{1, 2, 3} \ {3, 4, 5} = {1, 2}, (1.9a)



{1, 2, 3} \ {3, 2, 1, 5} = ∅, (1.9b)
{−10, −9, . . . , 9, 10} \ {0} = {−10, −9, . . . , −1} ∪ {1, 2, . . . , 9, 10}. (1.9c)

With respect to the universe {1, 2, 3, 4, 5}, we have

{1, 2, 3}^c = {4, 5}; (1.9d)

with respect to the universe {0, 1, . . . , 20}, we have

{1, 2, 3}^c = {0} ∪ {4, 5, . . . , 20}. (1.9e)
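Python's built-in sets implement precisely the operations of Def. 1.24, so the computations of Example 1.25 can be replayed directly; a minimal sketch (& is ∩, | is ∪, - is \):

A, B = {1, 2, 3}, {3, 4, 5}
print(A & B)         # intersection, cf. (1.7a): {3}
print(A | B)         # union, cf. (1.8a): {1, 2, 3, 4, 5}
print(A - B)         # difference, cf. (1.9a): {1, 2}
U = {1, 2, 3, 4, 5}  # the universe from (1.9d)
print(U - A)         # complement of A with respect to U: {4, 5}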

As mentioned earlier, it will often be unavoidable to consider sets of sets. Here are first
examples: {∅, {0}, {0, 1}}, {{0, 1}, {1, 2}}.

Definition 1.26. Given a set A, the set of all subsets of A is called the power set of A,
denoted P(A) (for reasons explained later (cf. Prop. 2.18), the power set is sometimes
also denoted as 2A ).

Example 1.27. Examples of Power Sets:

P(∅) = {∅}, (1.10a)
P({0}) = {∅, {0}}, (1.10b)
P(P({0})) = P({∅, {0}}) = {∅, {∅}, {{0}}, P({0})}. (1.10c)
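For a finite set, the power set of Def. 1.26 can be enumerated by collecting the subsets of each size; a minimal Python sketch (power_set is our own helper; frozenset is used since Python sets cannot contain ordinary sets as elements):

from itertools import combinations

def power_set(s):
    s = list(s)
    return {frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)}

print(power_set(set()))           # {frozenset()}, i.e. P(∅) = {∅}, cf. (1.10a)
print(power_set({0}))             # {∅, {0}}, cf. (1.10b)
print(len(power_set({1, 2, 3})))  # 8 = 2**3 subsets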

So far, we have restricted our set-theoretic examples to finite sets. However, not sur-
prisingly, many sets of interest to us will be infinite (we will have to postpone a mathe-
matically precise definition of finite and infinite to Sec. 3.2). We will now introduce the
simplest infinite set.

Definition 1.28. The set N := {1, 2, 3, . . . } is called the set of natural numbers (for a
more rigorous construction of N, based on the axioms of axiomatic set theory, see Sec.
A.3.4 of the Appendix, where Th. A.46 shows N to be, indeed, infinite). Moreover, we
define N0 := {0} ∪ N.

The following theorem compiles important set-theoretic rules:

Theorem 1.29. Let A, B, C, U be sets.

(a) Commutativity of Intersections: A ∩ B = B ∩ A.

(b) Commutativity of Unions: A ∪ B = B ∪ A.

(c) Associativity of Intersections: (A ∩ B) ∩ C = A ∩ (B ∩ C).

(d) Associativity of Unions: (A ∪ B) ∪ C = A ∪ (B ∪ C).

(e) Distributivity I: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(f ) Distributivity II: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

(g) De Morgan’s Law I: U \ (A ∩ B) = (U \ A) ∪ (U \ B).

(h) De Morgan’s Law II: U \ (A ∪ B) = (U \ A) ∩ (U \ B).



(i) Double Complement: If A ⊆ U , then U \ (U \ A) = A.

Proof. In each case, the proof results from the corresponding rule of Th. 1.11:
(a): x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B ⇔ x ∈ B ∧ x ∈ A ⇔ x ∈ B ∩ A, where the middle
equivalence holds by Th. 1.11(c).
(g): Under the general assumption of x ∈ U , we have the following equivalences:

x ∈ U \ (A ∩ B) ⇔ ¬(x ∈ A ∩ B) ⇔ ¬(x ∈ A ∧ x ∈ B) ⇔ ¬(x ∈ A) ∨ ¬(x ∈ B)
⇔ x ∈ U \ A ∨ x ∈ U \ B ⇔ x ∈ (U \ A) ∪ (U \ B),

where the third equivalence holds by Th. 1.11(i).
The proofs of the remaining rules are left as an exercise. □


Remark 1.30. The correspondence between Th. 1.11 and Th. 1.29 is no coincidence.
One can actually prove that, starting with an equivalence of propositional formulas
φ(A1 , . . . , An ) ⇔ ψ(A1 , . . . , An ), where both formulas contain only the operators ∧, ∨, ¬,
one obtains a set-theoretic rule (stating an equality of sets) by reinterpreting all state-
ment variables A1 , . . . , An as variables for sets, all subsets of a universe U , and replacing
∧ by ∩, ∨ by ∪, and ¬ by U \ (if there are no multiple negations, then we do not need
the hypothesis that A1 , . . . , An are subsets of U ). The procedure also works in the op-
posite direction – one can start with a set-theoretic formula for an equality of sets and
translate it into two equivalent propositional formulas.
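The correspondence of Rem. 1.30 can be illustrated numerically: take De Morgan's Law I of Th. 1.11(i) and its set-theoretic counterpart Th. 1.29(g), and test the latter for one concrete choice of U, A, B. This is only a sketch of the translation, not a proof (checking examples proves nothing, cf. Rem. 1.37(a) below):

U = set(range(10))
A, B = {1, 2, 3}, {3, 4, 5}
lhs = U - (A & B)        # U \ (A ∩ B)
rhs = (U - A) | (U - B)  # (U \ A) ∪ (U \ B)
print(lhs == rhs)        # True for this example, in accordance with Th. 1.29(g)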

1.4 Predicate Calculus


Now that we have introduced sets in the previous section, we have to return to the
subject of mathematical logic once more. As it turns out, propositional calculus, which
we discussed in Sec. 1.2, does not quite suffice to develop the theory of calculus (nor
most other mathematical theories). The reason is that we need to consider statements
such as

x + 1 > 0 holds for each natural number x. (T) (1.11a)


All real numbers are positive. (F) (1.11b)
There exists a natural number bigger than 10. (T) (1.11c)
There exists a real number x such that x2 = −1. (F) (1.11d)
For all natural numbers n, there exists a natural number bigger than n. (T) (1.11e)

That means we are interested in statements involving universal quantification via the
quantifier “for all” (one also often uses “for each” or “for every” instead), existential

quantification via the quantifier “there exists”, or both. The quantifier of universal
quantification is denoted by ∀ and the quantifier of existential quantification is denoted
by ∃. Using these symbols as well as N and R to denote the sets of natural and real
numbers, respectively, we can restate (1.11) as

∀_{x∈N} x + 1 > 0. (T) (1.12a)
∀_{x∈R} x > 0. (F) (1.12b)
∃_{n∈N} n > 10. (T) (1.12c)
∃_{x∈R} x² = −1. (F) (1.12d)
∀_{n∈N} ∃_{m∈N} m > n. (T) (1.12e)

Definition 1.31. A universal statement has the form

∀_{x∈A} P (x), (1.13a)

whereas an existential statement has the form

∃_{x∈A} P (x). (1.13b)

In (1.13), A denotes a set and P (x) is a sentence involving the variable x, a so-called
predicate of x, that becomes a statement (i.e. becomes either true or false) if x is substi-
tuted with any concrete element of the set A (in particular, P (x) is allowed to contain
further quantifiers, but it must not contain any other quantifier involving x – one says
x must be a free variable in P (x), not bound by any quantifier in P (x)).
The universal statement (1.13a) has the truth value T if, and only if, P (x) has the truth
value T for all elements x ∈ A; the existential statement (1.13b) has the truth value T
if, and only if, P (x) has the truth value T for at least one element x ∈ A.
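For a finite set A, the truth values of (1.13a) and (1.13b) can be computed by exhaustion; in Python, all plays the role of ∀ and any the role of ∃. A minimal sketch (note that N is infinite, so we use a finite stand-in, which is an assumption of this illustration):

A = range(1, 101)                  # finite stand-in for N
print(all(x + 1 > 0 for x in A))   # ∀: cf. (1.12a), True
print(any(n > 10 for n in A))      # ∃: cf. (1.12c), True
print(any(x**2 == -1 for x in A))  # ∃: cf. (1.12d) restricted to A, False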
Remark 1.32. Some people prefer to write ⋀_{x∈A} instead of ∀_{x∈A} and ⋁_{x∈A} instead
of ∃_{x∈A}. Even though this notation has the advantage of emphasizing that the universal statement
can be interpreted as a big logical conjunction and the existential statement can be
interpreted as a big logical disjunction, it is significantly less common. So we will stick
to ∀ and ∃ in this class.

Remark 1.33. According to Def. 1.31, the existential statement (1.13b) is true if, and
only if, P (x) is true for at least one x ∈ A. So if there is precisely one such x, then
(1.13b) is true; and if there are several different x ∈ A such that P (x) is true, then

(1.13b) is still true. Uniqueness statements are often of particular importance, and one
sometimes writes

∃!_{x∈A} P (x) (1.14)

for the statement “there exists a unique x ∈ A such that P (x) is true”. This notation
can be defined as an abbreviation for

∃_{x∈A} (P (x) ∧ ∀_{y∈A} (P (y) ⇒ x = y)). (1.15)

Example 1.34. Here are some examples of uniqueness statements:

∃!_{n∈N} n > 10. (F) (1.16a)
∃!_{n∈N} 12 > n > 10. (T) (1.16b)
∃!_{n∈N} 11 > n > 10. (F) (1.16c)
∃!_{x∈R} x² = −1. (F) (1.16d)
∃!_{x∈R} x² = 1. (F) (1.16e)
∃!_{x∈R} x² = 0. (T) (1.16f)
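In view of (1.15), on a finite set the statement ∃! amounts to counting witnesses: there must be exactly one. A sketch (exists_unique is our own helper):

def exists_unique(predicate, xs):
    # ∃! x ∈ xs : predicate(x), cf. (1.15)
    return sum(1 for x in xs if predicate(x)) == 1

N100 = range(1, 101)                               # finite stand-in for N
print(exists_unique(lambda n: n > 10, N100))       # cf. (1.16a): False
print(exists_unique(lambda n: 10 < n < 12, N100))  # cf. (1.16b): True
print(exists_unique(lambda n: 10 < n < 11, N100))  # cf. (1.16c): False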

Remark 1.35. As for propositional calculus, we also have some important rules for
predicate calculus:

(a) Consider the negation of a universal statement, ¬ ∀_{x∈A} P (x), which is true if, and
only if, P (x) does not hold for each x ∈ A, i.e. if, and only if, there exists at least
one x ∈ A such that P (x) is false (such that ¬P (x) is true). We have just proved
the rule

¬ ∀_{x∈A} P (x) ⇔ ∃_{x∈A} ¬P (x). (1.17a)

Similarly, consider the negation of an existential statement. We claim the corresponding rule is

¬ ∃_{x∈A} P (x) ⇔ ∀_{x∈A} ¬P (x). (1.17b)

Indeed, we can prove (1.17b) from (1.17a):

¬ ∃_{x∈A} P (x) ⇔ ¬ ∃_{x∈A} ¬¬P (x) ⇔ ¬¬ ∀_{x∈A} ¬P (x) ⇔ ∀_{x∈A} ¬P (x), (1.18)

where the first and third equivalences hold by Th. 1.11(k) and the second by (1.17a).
One can interpret (1.17) as a generalization of the De Morgan’s laws Th. 1.11(i),(j).
One can actually generalize (1.17) even a bit more: If a statement starts with several
quantifiers, then one negates the statement by replacing each ∀ with ∃ and vice versa
plus negating the predicate after the quantifiers (see the example in (1.21e) below).

(b) If A, B are sets and P (x, y) denotes a predicate of both x and y, then ∀_{x∈A} ∀_{y∈B} P (x, y)
and ∀_{y∈B} ∀_{x∈A} P (x, y) both hold true if, and only if, P (x, y) holds true for each x ∈ A
and each y ∈ B, i.e. the order of two consecutive universal quantifiers does not
matter:

∀_{x∈A} ∀_{y∈B} P (x, y) ⇔ ∀_{y∈B} ∀_{x∈A} P (x, y). (1.19a)

In the same way, we obtain the following rule:

∃_{x∈A} ∃_{y∈B} P (x, y) ⇔ ∃_{y∈B} ∃_{x∈A} P (x, y). (1.19b)

If A = B, one also uses abbreviations of the form

∀_{x,y∈A} P (x, y) for ∀_{x∈A} ∀_{y∈A} P (x, y), (1.20a)
∃_{x,y∈A} P (x, y) for ∃_{x∈A} ∃_{y∈A} P (x, y). (1.20b)

Generalizing rules (1.19), we can always commute identical quantifiers. Caveat:
Quantifiers that are not identical must not be commuted (see Ex. 1.36(d) below).

Example 1.36. (a) Negation of universal and existential statements:

Negation of (1.12a): ∃_{x∈N} x + 1 ≤ 0. (F) (1.21a)
Negation of (1.12b): ∃_{x∈R} x ≤ 0. (T) (1.21b)
Negation of (1.12c): ∀_{n∈N} n ≤ 10. (F) (1.21c)
Negation of (1.12d): ∀_{x∈R} x² ≠ −1. (T) (1.21d)
Negation of (1.12e): ∃_{n∈N} ∀_{m∈N} m ≤ n. (F) (1.21e)

(In each case, the new predicate is the negation of the old one, e.g. x + 1 ≤ 0 ⇔ ¬(x + 1 > 0).)

(b) As a more complicated example, consider the negation of the uniqueness statement
(1.14), i.e. of (1.15):

¬ ∃!_{x∈A} P (x) ⇔ ¬ ∃_{x∈A} (P (x) ∧ ∀_{y∈A} (P (y) ⇒ x = y))
⇔ ∀_{x∈A} ¬(P (x) ∧ ∀_{y∈A} (¬P (y) ∨ x = y)) (by (1.17b), Th. 1.11(a))
⇔ ∀_{x∈A} (¬P (x) ∨ ¬ ∀_{y∈A} (¬P (y) ∨ x = y)) (by Th. 1.11(i))
⇔ ∀_{x∈A} (¬P (x) ∨ ∃_{y∈A} ¬(¬P (y) ∨ x = y)) (by (1.17a))
⇔ ∀_{x∈A} (¬P (x) ∨ ∃_{y∈A} (P (y) ∧ x ≠ y)) (by Th. 1.11(j),(k))
⇔ ∀_{x∈A} (P (x) ⇒ ∃_{y∈A} (P (y) ∧ x ≠ y)). (by Th. 1.11(a)) (1.22)

So how do we decode the expression we have obtained at the end? It states that if
P (x) holds for some x ∈ A, then there must be at least a second, different, element
y ∈ A such that P (y) is true. This is, indeed, precisely the negation of ∃!_{x∈A} P (x).

(c) Identical quantifiers commute:

∀_{x∈R} ∀_{n∈N} x^{2n} ≥ 0 ⇔ ∀_{n∈N} ∀_{x∈R} x^{2n} ≥ 0, (1.23a)
∀_{x∈R} ∃_{y∈R} ∃_{n∈N} ny > x² ⇔ ∀_{x∈R} ∃_{n∈N} ∃_{y∈R} ny > x². (1.23b)

(d) The following example shows that different quantifiers do, in general, not commute
(i.e. do not yield equivalent statements when commuted): While the statement

∀_{x∈R} ∃_{y∈R} y > x (1.24a)

is true (for each real number x, there is a bigger real number y, e.g. y := x + 1 will
do the job), the statement

∃_{y∈R} ∀_{x∈R} y > x (1.24b)

is false (for example, since y > y is false). In particular, (1.24a) and (1.24b) are not
equivalent.

(e) Even though (1.14) provides useful notation, it is better not to think of ∃! as a
quantifier. It is really just an abbreviation for (1.15), and it behaves very differently
from ∃ and ∀: The following examples show that, in general, ∃! commutes neither
with ∃, nor with itself:

∃_{n∈N} ∃!_{m∈N} m < n ⇎ ∃!_{m∈N} ∃_{n∈N} m < n

(the statement on the left is true, as one can choose n = 2, but the statement on
the right is false, as ∃_{n∈N} m < n holds for every m ∈ N). Similarly,

∃!_{n∈N} ∃!_{m∈N} m < n ⇎ ∃!_{m∈N} ∃!_{n∈N} m < n

(the statement on the left is still true and the statement on the right is still false
(there is no m ∈ N such that ∃!_{n∈N} m < n)).

Remark 1.37. One can make the following observations regarding the strategy for
proving universal and existential statements:

(a) To prove that ∀_{x∈A} P (x) is true, one must check the truth of P (x) for every element
x ∈ A – examples are not enough!

(b) To prove that ∀_{x∈A} P (x) is false, it suffices to find one x ∈ A such that P (x) is
false – such an x is then called a counterexample and one counterexample is always
enough to prove ∀_{x∈A} P (x) is false!

(c) To prove that ∃_{x∈A} P (x) is true, it suffices to find one x ∈ A such that P (x) is true
– such an x is then called an example and one example is always enough to prove
∃_{x∈A} P (x) is true!
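On a finite domain, the strategy of (b) can be automated by searching for a counterexample; if the search fails, the universal statement holds on that domain. A minimal sketch (find_counterexample is our own helper):

def find_counterexample(predicate, xs):
    # Return some x with predicate(x) false, or None if predicate holds on all of xs.
    for x in xs:
        if not predicate(x):
            return x
    return None

print(find_counterexample(lambda n: n + 1 > 0, range(1, 1001)))  # None: no counterexample
print(find_counterexample(lambda n: n < 500, range(1, 1001)))    # 500: a counterexample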

The subfield of mathematical logic dealing with quantified statements is called predicate
calculus. In general, one does not restrict the quantified variables to range only over
elements of sets (as we have done above). Again, we refer to [EFT07] for a deeper
treatment of the subject.
As an application of quantified statements, let us generalize the notion of union and
intersection:

Definition 1.38. Let I ≠ ∅ be a nonempty set, usually called an index set in the present
context. For each i ∈ I, let Ai denote a set (some or all of the Ai can be identical).

(a) The intersection

⋂_{i∈I} Ai := {x : ∀_{i∈I} x ∈ Ai} (1.25a)

consists of all elements x that belong to every Ai .

(b) The union

⋃_{i∈I} Ai := {x : ∃_{i∈I} x ∈ Ai} (1.25b)

consists of all elements x that belong to at least one Ai . The union is called disjoint
if, and only if, for each i, j ∈ I, i ≠ j implies Ai ∩ Aj = ∅.

Proposition 1.39. Let I ≠ ∅ be an index set, let M denote a set, and, for each i ∈ I,
let Ai denote a set. The following set-theoretic rules hold:

(a) (⋂_{i∈I} Ai) ∩ M = ⋂_{i∈I} (Ai ∩ M ).

(b) (⋃_{i∈I} Ai) ∪ M = ⋃_{i∈I} (Ai ∪ M ).

(c) (⋂_{i∈I} Ai) ∪ M = ⋂_{i∈I} (Ai ∪ M ).

(d) (⋃_{i∈I} Ai) ∩ M = ⋃_{i∈I} (Ai ∩ M ).

(e) M \ ⋂_{i∈I} Ai = ⋃_{i∈I} (M \ Ai).

(f ) M \ ⋃_{i∈I} Ai = ⋂_{i∈I} (M \ Ai).

Proof. We prove (c) and (e) and leave the remaining proofs as an exercise.
(c):

x ∈ (⋂_{i∈I} Ai) ∪ M ⇔ x ∈ M ∨ ∀_{i∈I} x ∈ Ai ⇔(∗) ∀_{i∈I} (x ∈ Ai ∨ x ∈ M ) ⇔ x ∈ ⋂_{i∈I} (Ai ∪ M ).

To justify the equivalence at (∗), we make use of Th. 1.11(b) and verify ⇒ and ⇐. For
⇒ note that the truth of x ∈ M implies x ∈ Ai ∨ x ∈ M is true for each i ∈ I. If x ∈ Ai
is true for each i ∈ I, then x ∈ Ai ∨ x ∈ M is still true for each i ∈ I. To verify ⇐, note
that the existence of i ∈ I such that x ∈ M implies the truth of x ∈ M ∨ ∀_{i∈I} x ∈ Ai .
If x ∈ M is false for each i ∈ I, then x ∈ Ai must be true for each i ∈ I, showing
x ∈ M ∨ ∀_{i∈I} x ∈ Ai is true also in this case.

(e):

x ∈ M \ ⋂_{i∈I} Ai ⇔ x ∈ M ∧ ¬ ∀_{i∈I} x ∈ Ai ⇔ x ∈ M ∧ ∃_{i∈I} x ∉ Ai
⇔ ∃_{i∈I} x ∈ M \ Ai ⇔ x ∈ ⋃_{i∈I} (M \ Ai),

completing the proof. □

Example 1.40. We have the following identities of sets:

⋂_{x∈R} N = N, (1.26a)
⋂_{n∈N} {1, 2, . . . , n} = {1}, (1.26b)
⋃_{x∈R} N = N, (1.26c)
⋃_{n∈N} {1, 2, . . . , n} = N, (1.26d)
N \ ⋃_{n∈N} {2n} = {1, 3, 5, . . . } = ⋂_{n∈N} (N \ {2n}). (1.26e)

Comparing with the notation of Def. 1.38, in (1.26a), for example, we have I = R and
Ai = N for each i ∈ I (where, in (1.26a), we have written x instead of i). Similarly, in
(1.26b), we have I = N and An = {1, 2, . . . , n} for each n ∈ I.
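For a finite index set, the intersections and unions of Def. 1.38 can be computed by folding the binary operations over the family; a sketch reproducing (1.26b) and (1.26d) with the finite index set I = {1, . . . , 10} (an assumption of the illustration, since the full index set N is infinite):

from functools import reduce

family = [set(range(1, n + 1)) for n in range(1, 11)]  # A_n = {1, ..., n}
intersection = reduce(lambda a, b: a & b, family)
union = reduce(lambda a, b: a | b, family)
print(intersection)                # {1}, cf. (1.26b)
print(union == set(range(1, 11)))  # True, cf. (1.26d) restricted to n ≤ 10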

2 Functions and Relations

2.1 Functions
Definition 2.1. Let A, B be sets. Given x ∈ A, y ∈ B, the set

(x, y) := {{x}, {x, y}} (2.1)

is called the ordered pair (often shortened to just pair) consisting of x and y. The set of
all such pairs is called the Cartesian product A × B, i.e.

A × B := {(x, y) : x ∈ A ∧ y ∈ B}. (2.2)



Example 2.2. Let A be a set.

A × ∅ = ∅ × A = ∅, (2.3a)
{1, 2} × {1, 2, 3} = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)} (2.3b)
≠ {1, 2, 3} × {1, 2} = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}. (2.3c)

Also note that, for x ≠ y,

(x, y) = {{x}, {x, y}} ≠ {{y}, {x, y}} = (y, x). (2.4)

Definition 2.3. Given sets A, B, a function or map f is an assignment rule that assigns
to each x ∈ A a unique y ∈ B. One then also writes f (x) for the element y. The set A
is called the domain of f , denoted dom(f ), and B is called the codomain of f , denoted
codom(f ). The information about a map f can be concisely summarized by the notation

f : A −→ B, x ↦ f (x), (2.5)

where x ↦ f (x) is called the assignment rule for f , f (x) is called the image of x, and
x is called a preimage of f (x) (the image must be unique, but there might be several
preimages). The set

graph(f ) := {(x, y) ∈ A × B : y = f (x)} (2.6)

is called the graph of f (not to be confused with pictures visualizing the function f ,
which are also called graph of f ). If one wants to be completely precise, then one
identifies the function f with the ordered triple (A, B, graph(f )).
The set of all functions with domain A and codomain B is denoted by F(A, B) or B^A ,
i.e.

F(A, B) := B^A := {(f : A −→ B) : A = dom(f ) ∧ B = codom(f )}. (2.7)

Caveat: Some authors reserve the word map for continuous functions, but we use func-
tion and map synonymously.

Definition 2.4. Let A, B be sets and f : A −→ B a function.

(a) If T is a subset of A, then

f (T ) := {f (x) ∈ B : x ∈ T } (2.8)

is called the image of T under f .



(b) If U is a subset of B, then

f −1 (U ) := {x ∈ A : f (x) ∈ U } (2.9)

is called the preimage or inverse image of U under f .

(c) f is called injective or one-to-one if, and only if, every y ∈ B has at most one
preimage, i.e. if, and only if, the preimage of {y} has at most one element:

f injective ⇔ ∀_{y∈B} (f −1 {y} = ∅ ∨ ∃!_{x∈A} f (x) = y)
⇔ ∀_{x1,x2∈A} (x1 ≠ x2 ⇒ f (x1 ) ≠ f (x2 )). (2.10)

(d) f is called surjective or onto if, and only if, every element of the codomain of f has
a preimage:

f surjective ⇔ ∀_{y∈B} ∃_{x∈A} y = f (x) ⇔ ∀_{y∈B} f −1 {y} ≠ ∅. (2.11)

(e) f is called bijective if, and only if, f is injective and surjective.

Example 2.5. Examples of Functions:

f : {1, 2, 3, 4, 5} −→ {1, 2, 3, 4, 5}, f (x) := −x + 6, (2.12a)
g : N −→ N, g(n) := 2n, (2.12b)
h : N −→ {2, 4, 6, . . . }, h(n) := 2n, (2.12c)
h̃ : N −→ {2, 4, 6, . . . }, h̃(n) := n for n even, h̃(n) := n + 1 for n odd, (2.12d)
G : N −→ R, G(n) := n/(n + 1), (2.12e)
F : P(N) −→ P(P(N)), F (A) := P(A). (2.12f)

Instead of f (x) := −x + 6 in (2.12a), one can also write x ↦ −x + 6 and analogously
in the other cases. Also note that, in the strict sense, the functions g and h are different,
since their codomains are different (however, using Def. 2.4(a), they have
the same image in the sense that g(N) = h(N)). Furthermore,

f ({1, 2}) = {5, 4} = f −1 ({1, 2}), h̃−1 ({2, 4, 6}) = {1, 2, 3, 4, 5, 6}, (2.13)

f is bijective; g is injective, but not surjective; h is bijective; h̃ is surjective, but not
injective. Can you figure out if G and F are injective and/or surjective?
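For maps between finite sets, injectivity and surjectivity in the sense of Def. 2.4 can be decided by inspecting the image; a sketch, applied to the map f of (2.12a) stored as a dictionary (the helper names are ours):

A = B = {1, 2, 3, 4, 5}
f = {x: -x + 6 for x in A}  # the map f of (2.12a)

def is_injective(f):
    return len(set(f.values())) == len(f)  # no two arguments share an image

def is_surjective(f, codomain):
    return set(f.values()) == set(codomain)

print(is_injective(f), is_surjective(f, B))  # True True, i.e. f is bijective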

Example 2.6. (a) For each nonempty set A, the map Id : A −→ A, Id(x) := x, is
called the identity on A. If one needs to emphasize that Id operates on A, then one
also writes IdA instead of Id. The identity is clearly bijective.

(b) Let A, B be nonempty sets. A map f : A −→ B is called constant if, and only if,
there exists c ∈ B such that f (x) = c for each x ∈ A. In that case, one also writes
f ≡ c, which can be read as “f is identically equal to c”. If f ≡ c, ∅ ≠ T ⊆ A, and
U ⊆ B, then

f (T ) = {c}, f −1 (U ) = A for c ∈ U , f −1 (U ) = ∅ for c ∉ U. (2.14)

Such f is injective if, and only if, A consists of precisely one element; f is surjective if, and only if, B = {c}.

(c) Given A ⊆ X, the map


ι : A −→ X, ι(x) := x, (2.15)
is called inclusion (also embedding or imbedding). An inclusion is always injective;
it is surjective if, and only if, A = X, i.e. if, and only if, it is the identity on A.

(d) Given A ⊆ X and a map f : X −→ B, the map g : A −→ B, g(x) = f (x), is called


the restriction of f to A; f is called the extension of g to X. In this situation, one
also uses the notation f ↾A for g (some authors prefer the notation f |A or f |A).

Theorem 2.7. Let f : A −→ B be a map, let ∅ ≠ I be an index set, and assume S, T, Si ,
i ∈ I, are subsets of A, whereas U, V, Ui , i ∈ I, are subsets of B. Then we have the
following rules concerning functions and set-theoretic operations:

f (S ∩ T ) ⊆ f (S) ∩ f (T ), (2.16a)
f (⋂_{i∈I} Si) ⊆ ⋂_{i∈I} f (Si ), (2.16b)
f (S ∪ T ) = f (S) ∪ f (T ), (2.16c)
f (⋃_{i∈I} Si) = ⋃_{i∈I} f (Si ), (2.16d)
f −1 (U ∩ V ) = f −1 (U ) ∩ f −1 (V ), (2.16e)
f −1 (⋂_{i∈I} Ui) = ⋂_{i∈I} f −1 (Ui ), (2.16f)
f −1 (U ∪ V ) = f −1 (U ) ∪ f −1 (V ), (2.16g)
f −1 (⋃_{i∈I} Ui) = ⋃_{i∈I} f −1 (Ui ), (2.16h)
f (f −1 (U )) ⊆ U, f −1 (f (S)) ⊇ S, (2.16i)
f −1 (U \ V ) = f −1 (U ) \ f −1 (V ). (2.16j)

Proof. We prove (2.16b) (which includes (2.16a) as a special case) and the second part
of (2.16i), and leave the remaining cases as exercises.
For (2.16b), one argues

y ∈ f (⋂_{i∈I} Si) ⇔ ∃_{x∈A} (∀_{i∈I} x ∈ Si ∧ y = f (x)) ⇒ ∀_{i∈I} y ∈ f (Si ) ⇔ y ∈ ⋂_{i∈I} f (Si ).

The observation

x ∈ S ⇒ f (x) ∈ f (S) ⇔ x ∈ f −1 (f (S))

establishes the second part of (2.16i). □

It is an exercise to find counterexamples that show one can not, in general, replace the
four subset symbols in (2.16) by equalities (it is possible to find examples with sets that
have at most 2 elements).

Definition 2.8. The composition of maps f and g with f : A −→ B, g : C −→ D, and
f (A) ⊆ C is defined to be the map

g ◦ f : A −→ D, (g ◦ f )(x) := g(f (x)). (2.17)

The expression g ◦ f is read as “g after f ” or “g composed with f ”.

Example 2.9. Consider the maps

f : N −→ R, n ↦ n², (2.18a)
g : N −→ R, n ↦ 2n. (2.18b)

We obtain f (N) = {1, 4, 9, . . . } ⊆ dom(g), g(N) = {2, 4, 6, . . . } ⊆ dom(f ), and the
compositions

(g ◦ f ) : N −→ R, (g ◦ f )(n) = g(n²) = 2n², (2.19a)
(f ◦ g) : N −→ R, (f ◦ g)(n) = f (2n) = 4n², (2.19b)

showing that composing functions is, in general, not commutative, even if the involved
functions have the same domain and the same codomain.
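Composition as in Def. 2.8 is immediate to express in code; a sketch replaying Example 2.9 and confirming g ◦ f ≠ f ◦ g (compose is our own helper):

def compose(g, f):
    # g ◦ f, i.e. x ↦ g(f(x)), cf. (2.17)
    return lambda x: g(f(x))

f = lambda n: n ** 2  # cf. (2.18a)
g = lambda n: 2 * n   # cf. (2.18b)
print([compose(g, f)(n) for n in range(1, 5)])  # 2n²: [2, 8, 18, 32], cf. (2.19a)
print([compose(f, g)(n) for n in range(1, 5)])  # 4n²: [4, 16, 36, 64], cf. (2.19b)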

Proposition 2.10. Consider maps f : A −→ B, g : C −→ D, h : E −→ F , satisfying
f (A) ⊆ C and g(C) ⊆ E.

(a) Associativity of Compositions:

h ◦ (g ◦ f ) = (h ◦ g) ◦ f. (2.20)

(b) One has the following law for forming preimages:

∀_{W ∈P(D)} (g ◦ f )−1 (W ) = f −1 (g −1 (W )). (2.21)

Proof. (a): Both h ◦ (g ◦ f ) and (h ◦ g) ◦ f map A into F . So it just remains to prove
(h ◦ (g ◦ f ))(x) = ((h ◦ g) ◦ f )(x) for each x ∈ A. One computes, for each x ∈ A,

(h ◦ (g ◦ f ))(x) = h((g ◦ f )(x)) = h(g(f (x))) = (h ◦ g)(f (x)) = ((h ◦ g) ◦ f )(x), (2.22)

establishing the case.
(b): Exercise. □

Definition 2.11. A function g : B −→ A is called a right inverse (resp. left inverse)


of a function f : A −→ B if, and only if, f ◦ g = IdB (resp. g ◦ f = IdA ). Moreover,
g is called an inverse of f if, and only if, it is both a right and a left inverse. If g is
an inverse of f , then one also writes f −1 instead of g. The map f is called (right, left)
invertible if, and only if, there exists a (right, left) inverse for f .

Example 2.12. (a) Consider the map

f : N −→ N, f (n) := 2n. (2.23a)

The maps

g1 : N −→ N, g1 (n) := n/2 if n even, g1 (n) := 1 if n odd, (2.23b)
g2 : N −→ N, g2 (n) := n/2 if n even, g2 (n) := 2 if n odd, (2.23c)

both constitute left inverses of f . It follows from Th. 2.13(c) below that f does not
have a right inverse.

(b) Consider the map

f : N −→ N, f (n) := n/2 for n even, f (n) := (n + 1)/2 for n odd. (2.24a)

The maps

g1 : N −→ N, g1 (n) := 2n, (2.24b)
g2 : N −→ N, g2 (n) := 2n − 1, (2.24c)

both constitute right inverses of f . It follows from Th. 2.13(c) below that f does
not have a left inverse.

(c) The map

f : N −→ N, f (n) := n − 1 for n even, f (n) := n + 1 for n odd, (2.25a)

is its own inverse, i.e. f −1 = f . For the map

g : N −→ N, g(n) := 2 for n = 1, g(n) := 3 for n = 2, g(n) := 1 for n = 3,
g(n) := n for n ∉ {1, 2, 3}, (2.25b)

the inverse is

g −1 : N −→ N, g −1 (n) := 3 for n = 1, g −1 (n) := 1 for n = 2, g −1 (n) := 2 for n = 3,
g −1 (n) := n for n ∉ {1, 2, 3}. (2.25c)

While Examples 2.12(a),(b) show that left and right inverses are usually not unique,
they are unique provided f is bijective (see Th. 2.13(c)).

Theorem 2.13. Let A, B be nonempty sets.

(a) f : A −→ B is right invertible if, and only if, f is surjective (where the implication
“⇐” makes use of the axiom of choice (AC), see Appendix A.4).

(b) f : A −→ B is left invertible if, and only if, f is injective.



(c) f : A −→ B is invertible if, and only if, f is bijective. In this case, the right inverse
and the left inverse are unique and both identical to the inverse.

Proof. (a): If f is surjective, then, for each y ∈ B, there exists xy ∈ f −1 {y} such that
f (xy ) = y. By AC, we can define the choice function

g : B −→ A, g(y) := xy . (2.26)

Then, for each y ∈ B, f (g(y)) = y, showing g is a right inverse of f . Conversely, if
g : B −→ A is a right inverse of f , then, for each y ∈ B, it is y = f (g(y)), showing that
g(y) ∈ A is a preimage of y, i.e. f is surjective.
(b): Fix a ∈ A. If f is injective, then, for each y ∈ B with f −1 {y} ≠ ∅, let xy denote
the unique element in A satisfying f (xy ) = y. Define

g : B −→ A, g(y) := xy for f −1 {y} ≠ ∅, g(y) := a otherwise. (2.27)

Then, for each x ∈ A, g(f (x)) = x, showing g is a left inverse of f . Conversely, if
g : B −→ A is a left inverse of f and x1 , x2 ∈ A with f (x1 ) = f (x2 ) = y, then
x1 = (g ◦ f )(x1 ) = g(f (x1 )) = g(f (x2 )) = (g ◦ f )(x2 ) = x2 , showing y has precisely one
preimage and f is injective.
The first part of (c) follows immediately by combining (a) and (b) (and, actually, without
using AC, since, if f is both injective and surjective, then, for each y ∈ B, the element
xy ∈ f −1 {y} is unique, and (2.26) can be defined without AC). It merely remains to
verify the uniqueness of right and left inverse for bijective maps. So let g be a left inverse
of f , let h be a right inverse of f , and let f −1 be an inverse of f . Then, for each y ∈ B,

g(y) = (g ◦ (f ◦ f −1 ))(y) = ((g ◦ f ) ◦ f −1 )(y) = f −1 (y), (2.28a)
h(y) = ((f −1 ◦ f ) ◦ h)(y) = (f −1 ◦ (f ◦ h))(y) = f −1 (y), (2.28b)

thereby proving the uniqueness of left and right inverse for bijective maps. □
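For maps between finite sets, the constructions (2.26) and (2.27) can be carried out explicitly (the sets being finite, no appeal to AC is needed); a minimal sketch with maps stored as dictionaries (helper names are ours):

def right_inverse(f, B):
    # f assumed surjective onto B: choose, for each y, some preimage, cf. (2.26)
    return {y: next(x for x in f if f[x] == y) for y in B}

def left_inverse(f, B, a):
    # f assumed injective: send f(x) back to x and every other y to the fixed a, cf. (2.27)
    g = {y: a for y in B}
    g.update({f[x]: x for x in f})
    return g

f = {1: 2, 2: 4, 3: 6}  # injective, but not surjective onto B
B = {1, 2, 3, 4, 5, 6}
g = left_inverse(f, B, a=1)
print(all(g[f[x]] == x for x in f))  # True: g ◦ f = Id, so g is a left inverse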
Theorem 2.14. Consider maps f : A −→ B, g : B −→ C. If f and g are both injective
(resp. both surjective, both bijective), then so is g ◦ f . Moreover, in the bijective case,
one has
(g ◦ f )−1 = f −1 ◦ g −1 . (2.29)

Proof. Exercise. □
Definition 2.15. (a) Given an index set I and a set A, a map f : I −→ A is sometimes
called a family (of elements in A), and is denoted in the form f = (ai )i∈I with
ai := f (i). When using this representation, one often does not even specify f and
A, especially if the ai are themselves sets.

(b) A sequence in a set A is a family of elements in A, where the index set is the set of
natural numbers N. In this case, one writes (an )n∈N or (a1 , a2 , . . . ). More generally,
a family is called a sequence, given a bijective map between the index set I and a
subset of N.

(c) Given a family of sets (Ai )i∈I , we define the Cartesian product of the Ai to be the
set of functions

∏_{i∈I} Ai := {(f : I −→ ⋃_{j∈I} Aj) : ∀_{i∈I} f (i) ∈ Ai}. (2.30)

If I has precisely n elements with n ∈ N, then the elements of the Cartesian product
∏_{i∈I} Ai are called (ordered) n-tuples, (ordered) triples for n = 3.

Example 2.16. (a) Using the notion of family, we can now say that the intersection
⋂_{i∈I} Ai and union ⋃_{i∈I} Ai as defined in Def. 1.38 are the intersection and union of
the family of sets (Ai )i∈I , respectively. As a concrete example, let us revisit (1.26b),
where we have

(An )n∈N , An := {1, 2, . . . , n}, ⋂_{n∈N} An = {1}. (2.31)

(b) Examples of Sequences:

Sequence in {0, 1} : (1, 0, 1, 0, 1, 0, . . . ), (2.32a)
Sequence in N : (n²)n∈N = (1, 4, 9, 16, 25, . . . ), (2.32b)
Sequence in R : ((−1)ⁿ √n)n∈N = (−1, √2, −√3, . . . ), (2.32c)
Sequence in R : (1/n)n∈N = (1, 1/2, 1/3, . . . ), (2.32d)
Finite Sequence in P(N) : ({3, 2, 1}, {2, 1}, {1}, ∅). (2.32e)
(c) The Cartesian product ∏_{i∈I} A, where all sets Ai = A, is the same as A^I , the set
of all functions from I into A. So, for example, ∏_{n∈N} R = R^N is the set of all
sequences in R. If I = {1, 2, . . . , n} with n ∈ N, then

∏_{i∈I} A = A^{{1, 2, . . . , n}} =: ∏_{i=1}^{n} A =: A^n (2.33)

is the set of all n-tuples with entries from A.




In the following, we explain the common notation 2^A for the power set P(A) of a set A. It is related to a natural identification between subsets and their corresponding characteristic functions.

Definition 2.17. Let A be a set and let B ⊆ A be a subset of A. Then
χB : A −→ {0, 1}, χB(x) := 1 if x ∈ B, χB(x) := 0 if x ∉ B, (2.34)
is called the characteristic function of the set B (with respect to the universe A). One also finds the notations 1_B and 𝟙_B instead of χB (note that all the notations suppress the dependence of the characteristic function on the universe A).

Proposition 2.18. Let A be a set. Then the map

χ : P(A) −→ {0, 1}^A, χ(B) := χB, (2.35)
is bijective (recall that P(A) denotes the power set of A and {0, 1}^A denotes the set of all functions from A into {0, 1}).

Proof. χ is injective: Let B, C ∈ P(A) with B ≠ C. By possibly switching the names of B and C, we may assume there exists x ∈ B such that x ∉ C. Then χB(x) = 1, whereas χC(x) = 0, showing χ(B) ≠ χ(C), proving χ is injective.
χ is surjective: Let f : A −→ {0, 1} be an arbitrary function and define B := {x ∈ A : f(x) = 1}. Then χ(B) = χB = f, proving χ is surjective.

Proposition 2.18 allows one to identify the sets P(A) and {0, 1}^A via the bijective map χ. This fact, together with the common practice of set theory to identify the number 2 with the set {0, 1}, explains the notation 2^A for P(A).
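The identification of Prop. 2.18 is easy to make concrete on a small finite universe. The following Python sketch (illustrative only; the helper names chi_of and subset_of are ad hoc) builds χ for A = {a, b, c} and verifies bijectivity by enumerating both P(A) and the images:

from itertools import product

A = ["a", "b", "c"]

def chi_of(B):
    # characteristic function of B, as a dict A -> {0, 1}, cf. (2.34)
    return {x: (1 if x in B else 0) for x in A}

def subset_of(f):
    # inverse direction: recover B = {x in A : f(x) = 1}
    return frozenset(x for x in A if f[x] == 1)

# enumerate P(A) and check chi : P(A) -> {0,1}^A is bijective
power_set = [frozenset(x for x, bit in zip(A, bits) if bit)
             for bits in product((0, 1), repeat=len(A))]
images = [tuple(sorted(chi_of(B).items())) for B in power_set]
assert len(set(images)) == len(power_set) == 2 ** len(A)   # chi is injective
assert all(subset_of(chi_of(B)) == B for B in power_set)   # chi is inverted, hence surjective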

2.2 Relations
2.2.1 Definition and Properties

Definition 2.19. Given sets A and B, a relation is a subset R of A × B (if one wants
to be completely precise, a relation is an ordered triple (A, B, R), where R ⊆ A × B).
If A = B, then we call R a relation on A. One says that a ∈ A and b ∈ B are related
according to the relation R if, and only if, (a, b) ∈ R. In this context, one usually writes
a R b instead of (a, b) ∈ R.

Example 2.20. (a) The relations we are probably most familiar with are = and ≤.
The relation R of equality, usually denoted =, makes sense on every nonempty set
A:
R := ∆(A) := {(x, x) ∈ A × A : x ∈ A}. (2.36)
The set ∆(A) is called the diagonal of the Cartesian product, i.e., as a subset of
A × A, the relation of equality is identical to the diagonal:

x = y ⇔ x R y ⇔ (x, y) ∈ R = ∆(A). (2.37)

Similarly, the relation ≤ on R is identical to the set

R≤ := {(x, y) ∈ R2 : x ≤ y}. (2.38)

(b) Every function f : A −→ B is a relation, namely the relation

Rf = {(x, y) ∈ A × B : y = f (x)} = graph(f ). (2.39)

Conversely, if B ≠ ∅, then every relation R ⊆ A × B uniquely corresponds to the function
fR : A −→ P(B), fR(x) = {y ∈ B : x R y}. (2.40)

Definition 2.21. Let R be a relation on the set A.

(a) R is called reflexive if, and only if,

∀ x ∈ A : x R x, (2.41)

i.e. if, and only if, every element is related to itself.

(b) R is called symmetric if, and only if,
∀ x, y ∈ A : (x R y ⇒ y R x), (2.42)

i.e. if, and only if, each x is related to y if, and only if, y is related to x.

(c) R is called antisymmetric if, and only if,
∀ x, y ∈ A : ((x R y ∧ y R x) ⇒ x = y), (2.43)

i.e. if, and only if, the only possibility for x to be related to y at the same time that
y is related to x is in the case x = y.

(d) R is called transitive if, and only if,
∀ x, y, z ∈ A : ((x R y ∧ y R z) ⇒ x R z), (2.44)

i.e. if, and only if, the relatedness of x and y together with the relatedness of y and
z implies the relatedness of x and z.

Example 2.22. The relations = and ≤ on R (or N) are reflexive, antisymmetric, and transitive; = is also symmetric, whereas ≤ is not; < is antisymmetric (since x < y ∧ y < x is always false) and transitive, but neither reflexive nor symmetric. The relation
R := {(x, y) ∈ N² : (x, y are both even) ∨ (x, y are both odd)} (2.45)
on N is not antisymmetric, but reflexive, symmetric, and transitive. The relation
S := {(x, y) ∈ N² : y = x²} (2.46)
is not transitive (for example, 2 S 4 and 4 S 16, but not 2 S 16), not reflexive, not symmetric; it is only antisymmetric.
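For finite relations, the four properties of Def. 2.21 can be tested mechanically. The following illustrative Python check (not part of the original text) runs the relations R of (2.45) and S of (2.46) on the finite segment {1, . . . , 16}, which is large enough to exhibit exactly the failures noted above:

N = range(1, 17)   # finite truncation of the natural numbers

def reflexive(rel):     return all((x, x) in rel for x in N)
def symmetric(rel):     return all((y, x) in rel for (x, y) in rel)
def antisymmetric(rel): return all(x == y for (x, y) in rel if (y, x) in rel)
def transitive(rel):
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)

R = {(x, y) for x in N for y in N if x % 2 == y % 2}   # same parity, cf. (2.45)
S = {(x, y) for x in N for y in N if y == x * x}       # y = x^2, cf. (2.46)

print(reflexive(R), symmetric(R), antisymmetric(R), transitive(R))
# True True False True
print(reflexive(S), symmetric(S), antisymmetric(S), transitive(S))
# False False True False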

2.2.2 Order Relations

Definition 2.23. A relation R on a set A is called a partial order if, and only if, R is
reflexive, antisymmetric, and transitive. If R is a partial order, then one usually writes
x ≤ y instead of x R y. A partial order ≤ is called a total or linear order if, and only if,
for each x, y ∈ A, one has x ≤ y or y ≤ x.

Notation 2.24. Given a (partial or total) order ≤ on A ≠ ∅, we write x < y if, and only if, x ≤ y and x ≠ y, calling < the strict order corresponding to ≤ (note that the strict order is never a partial order).

Definition 2.25. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A.

(a) x ∈ A is called lower (resp. upper) bound for B if, and only if, x ≤ b (resp. b ≤ x)
for each b ∈ B. Moreover, B is called bounded from below (resp. from above) if, and
only if, there exists a lower (resp. upper) bound for B; B is called bounded if, and
only if, it is bounded from above and from below.

(b) x ∈ B is called minimum or just min (resp. maximum or max) of B if, and only if,
x is a lower (resp. upper) bound for B. One writes x = min B if x is minimum and
x = max B if x is maximum.

(c) A maximum of the set of lower bounds of B (i.e. a largest lower bound) is called
infimum of B, denoted inf B; a minimum of the set of upper bounds of B (i.e. a
smallest upper bound) is called supremum of B, denoted sup B.
Example 2.26. (a) For each A ⊆ R, the usual relation ≤ defines a total order on A. For A = R, we see that N has 0 and 1 as lower bounds with 1 = min N = inf N. On
the other hand, N is unbounded from above. The set M := {1, 2, 3} is bounded
with min M = 1, max M = 3. The positive real numbers R+ := {x ∈ R : x > 0}
have inf R+ = 0, but they do not have a minimum (if x > 0, then 0 < x/2 < x).
(b) Consider A := N × N. Then

(m1 , m2 ) ≤ (n1 , n2 ) ⇔ m1 ≤ n1 ∧ m2 ≤ n2 , (2.47)

defines a partial order on A that is not a total order (for example, neither (1, 2) ≤
(2, 1) nor (2, 1) ≤ (1, 2)). For the set

B := {(1, 1), (2, 1), (1, 2)}, (2.48)

we have inf B = min B = (1, 1), B does not have a max, but sup B = (2, 2) (if
(m, n) ∈ A is an upper bound for B, then (2, 1) ≤ (m, n) implies 2 ≤ m and
(1, 2) ≤ (m, n) implies 2 ≤ n, i.e. (2, 2) ≤ (m, n); since (2, 2) is clearly an upper
bound for B, we have proved sup B = (2, 2)).
A different order on A is the so-called lexicographic order defined by

(m1 , m2 ) ≤ (n1 , n2 ) ⇔ m1 < n1 ∨ (m1 = n1 ∧ m2 ≤ n2 ). (2.49)

In contrast to the order from (2.47), the lexicographic order does define a total
order on A.
Lemma 2.27. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. Then the relation ≥,
defined by
x ≥ y ⇔ y ≤ x, (2.50)
is also a partial order on A. Moreover, using obvious notation, we have, for each x ∈ A,

x ≤-lower bound for B ⇔ x ≥-upper bound for B, (2.51a)


x ≤-upper bound for B ⇔ x ≥-lower bound for B, (2.51b)
x = min≤ B ⇔ x = max≥ B, (2.51c)
x = max≤ B ⇔ x = min≥ B, (2.51d)
x = inf ≤ B ⇔ x = sup≥ B, (2.51e)
x = sup≤ B ⇔ x = inf ≥ B. (2.51f)

Proof. Reflexivity, antisymmetry, and transitivity of ≤ clearly imply the same properties
for ≥, respectively. Moreover

x ≤-lower bound for B ⇔ (∀ b ∈ B : x ≤ b) ⇔ (∀ b ∈ B : b ≥ x) ⇔ x ≥-upper bound for B,

proving (2.51a). Analogously, we obtain (2.51b). Next, (2.51c) and (2.51d) are implied
by (2.51a) and (2.51b), respectively. Finally, (2.51e) is proved by

x = inf ≤ B ⇔ x = max≤ {y ∈ A : y ≤-lower bound for B}


⇔ x = min≥ {y ∈ A : y ≥-upper bound for B} ⇔ x = sup≥ B,

and (2.51f) follows analogously. 

Proposition 2.28. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. The elements max B, min B, sup B, inf B are all unique, provided they exist.

Proof. Exercise. 

Definition 2.29. Let A, B be nonempty sets with partial orders, both denoted by ≤ (even though they might be different). A function f : A −→ B is called (strictly) isotone, order-preserving, or increasing if, and only if,
∀ x, y ∈ A : (x < y ⇒ f(x) ≤ f(y) (resp. f(x) < f(y))); (2.52a)
f is called (strictly) antitone, order-reversing, or decreasing if, and only if,
∀ x, y ∈ A : (x < y ⇒ f(x) ≥ f(y) (resp. f(x) > f(y))). (2.52b)

Functions that are (strictly) isotone or antitone are called (strictly) monotone.

Proposition 2.30. Let A, B be nonempty sets with partial orders, both denoted by ≤.

(a) A (strictly) isotone function f : A −→ B becomes a (strictly) antitone function


and vice versa if precisely one of the relations ≤ is replaced by ≥.

(b) If the order ≤ on A is total and f : A −→ B is strictly isotone or strictly antitone,


then f is one-to-one.

(c) If the order ≤ on A is total and f : A −→ B is invertible and strictly isotone (resp.
antitone), then f −1 is also strictly isotone (resp. antitone).

Proof. (a) is immediate from (2.52).


(b): Due to (a), it suffices to consider the case that f is strictly isotone. If f is strictly isotone and x ≠ y, then x < y or y < x since the order on A is total. Thus, f(x) < f(y) or f(y) < f(x), i.e. f(x) ≠ f(y) in every case, showing f is one-to-one.
(c): Again, due to (a), it suffices to consider the isotone case. If u, v ∈ B such that u < v,
then u = f (f −1 (u)), v = f (f −1 (v)), and the isotonicity of f imply f −1 (u) < f −1 (v) (we
are using that the order on A is total – otherwise, f −1 (u) and f −1 (v) need not be
comparable). 
Example 2.31. (a) f : N −→ N, f (n) := 2n, is strictly increasing, every constant map
on N is both increasing and decreasing, but not strictly increasing or decreasing.
All maps occurring in (2.25) are neither increasing nor decreasing.
(b) The map f : R −→ R, f (x) := −2x, is invertible and strictly decreasing, and so is
f −1 : R −→ R, f −1 (x) := −x/2.
(c) The following counterexamples show that the assertions of Prop. 2.30(b),(c) are no
longer correct if one does not assume the order on A is total. Let A be the set from
(2.48) (where it had been called B) with the (nontotal) order from (2.47). The map

f : A −→ N, f(1, 1) := 1, f(1, 2) := 2, f(2, 1) := 2,
is strictly isotone, but not one-to-one. The map
f : A −→ {1, 2, 3}, f(1, 1) := 1, f(1, 2) := 2, f(2, 1) := 3,
is strictly isotone and invertible; however, f⁻¹ is not isotone (since 2 < 3, but f⁻¹(2) = (1, 2) and f⁻¹(3) = (2, 1) are not comparable, i.e. f⁻¹(2) ≤ f⁻¹(3) is not true).

2.2.3 Equivalence Relations

Definition 2.32. Let R be a relation on a set A.

(a) R is called an equivalence relation if, and only if, R is reflexive, symmetric, and transitive. If R is an equivalence relation, then one often writes x ∼ y instead of x R y.

(b) Let ∼ := R be an equivalence relation on A. For each x ∈ A, define

[x] := {y ∈ A : x ∼ y} (2.53)

and call [x] the equivalence class of x. Moreover, each y ∈ [x] is called a representative of [x]. The set of all equivalence classes A/∼ := {[x] : x ∈ A} is called the quotient set of A by ∼, and the map
π : A −→ A/∼, x ↦ [x], (2.54)
is called the corresponding quotient map, canonical map, or canonical projection.

The following Th. 2.33 shows that the equivalence classes of an equivalence relation
on a nonempty set A decompose A into disjoint sets and, conversely, that a given
decomposition of a nonempty set A into disjoint nonempty sets Ai , i ∈ I, gives rise
to a unique equivalence relation ∼ on A such that the Ai are precisely the equivalence
classes corresponding to ∼:

Theorem 2.33. Let A be a nonempty set.



(a) Given a disjoint union A = ⋃̇_{i∈I} Ai with every Ai ≠ ∅ (a so-called decomposition of A), an equivalence relation on A is defined by
x ∼ y :⇔ ∃ i ∈ I : (x ∈ Ai ∧ y ∈ Ai). (2.55)
Moreover, for the equivalence classes given by ∼, one then has
∀ x ∈ A ∀ i ∈ I : (x ∈ Ai ⇔ Ai = [x]). (2.56)

(b) Given an equivalence relation ∼ on a nonempty set A, the equivalence classes given by ∼ form a decomposition of A: One has
∀ x, y ∈ A : (([x] = [y] ⇔ x ∼ y) ∧ ([x] ∩ [y] = ∅ ⇔ ¬(x ∼ y))) (2.57)
and
A = ⋃̇_{i∈I} Ai, (2.58)
where I := A/∼ is the quotient set of Def. 2.32(b) and Ai := i for each i ∈ I.



Proof. (a): That ∼ is symmetric is immediate from (2.55). If x ∈ A, then, as A is the union of the Ai, there exists i ∈ I with x ∈ Ai, showing x ∼ x, i.e. ∼ is reflexive. If x, y, z ∈ A with x ∼ y and y ∼ z, then there exist i, j ∈ I with x, y ∈ Ai and y, z ∈ Aj. Then y ∈ Ai ∩ Aj, implying i = j (as the union is disjoint) and x ∼ z, showing ∼ to be transitive as well. Thus, we have shown ∼ to be an equivalence relation. Now consider x ∈ A and i ∈ I. If Ai = [x], then x ∼ x implies x ∈ [x] = Ai. Conversely, assume x ∈ Ai. Then, using (2.55) for the first equivalence,
y ∈ Ai ⇔ x ∼ y ⇔ y ∈ [x],
proving Ai = [x]. Hence, we have verified (2.56) and (a).
(b): Let x, y ∈ A. If [x] = [y], then (as y ∼ y) y ∈ [y] = [x], implying x ∼ y. Conversely,
assume x ∼ y. Then z ∈ [y] implies y ∼ z, implying x ∼ z (since ∼ is transitive) and,
thus, z ∈ [x] and [y] ⊆ [x]. From this, and the symmetry of ∼, we also obtain x ∼ y
implies y ∼ x, which implies [x] ⊆ [y]. Altogether, we have [x] = [y]. Thus, we have
established the first equivalence of (2.57). In consequence, we also have

[x] ≠ [y] ⇔ ¬(x ∼ y).
To prove the second equivalence of (2.57), we now show [x] ≠ [y] ⇔ [x] ∩ [y] = ∅: If [x] ∩ [y] = ∅, then [x] = [y] could only hold for [x] = [y] = ∅. However, x ∈ [x] and y ∈ [y], showing [x] ≠ [y]. For the converse, we argue via contraposition and assume
z ∈ [x] ∩ [y]. Then x ∼ z and y ∼ z and, by symmetry and transitivity of ∼, x ∼ y and
[x] = [y], proving the second equivalence of (2.57). It remains to verify (2.58). From
(2.57), we know the elements of A/ ∼ to be disjoint. On the other hand, if x ∈ A, then
x ∈ [x] ∈ A/ ∼, showing A to be the union of the Ai , thereby proving (2.58) and the
theorem. 

Example 2.34. (a) The equality relation = is an equivalence relation on each A ≠ ∅, where, for each x ∈ A, one has [x] = {x}.

(b) The relation R defined in (2.45) is an equivalence relation on N. Here, R yields pre-
cisely two equivalence classes, one consisting of all even numbers and one consisting
of all odd numbers.
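The passage from an equivalence relation to its quotient set can likewise be illustrated by a short computation. The sketch below (Python; the names equiv and class_of are ad hoc) computes the quotient set for the parity relation of (b), restricted to A = {1, . . . , 10}, and checks that the classes decompose A as in Th. 2.33(b):

A = range(1, 11)

def equiv(x, y):
    return x % 2 == y % 2                           # the relation R of (2.45)

def class_of(x):
    return frozenset(y for y in A if equiv(x, y))   # [x] as in (2.53)

quotient = {class_of(x) for x in A}                 # A/~ = {[x] : x in A}
print(quotient)                                     # the even class and the odd class

# the classes are pairwise disjoint and their union is A, cf. (2.58)
assert sum(len(c) for c in quotient) == len(A)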

Remark 2.35. If ∼ is an equivalence relation on a nonempty set A, then, clearly, the quotient map π : A −→ A/∼, x ↦ [x], is always surjective. It is injective (and, thus, bijective) if, and only if, every equivalence class has precisely one element, i.e. if, and only if, ∼ is the equality relation =.

Example 2.36. (a) An important application of equivalence relations and quotient


sets is the construction of the set Z = {. . . , −2, −1, 0, 1, 2, . . . } of integers from the
set N0 of natural numbers (including 0) of Def. 1.28: One actually defines Z as the

quotient set with respect to the following equivalence relation ∼ on N0 × N0, where the relation ∼ on N0 × N0 is defined¹ by
(a, b) ∼ (c, d) :⇔ a + d = b + c. (2.59)
¹In (2.59), we employ the usual addition on N0. Its mathematically precise definition is actually somewhat tedious and, at this stage, we do not even have all the necessary prerequisites in place, yet. Still, it should be possible to follow the present example, using one's intuitive, informal understanding of the addition on N0. The precise definition can be found in [Phi16, Sec. D.1] and interested readers might want to study it there once we have introduced the notions of induction and recursion.

We verify that (2.59) does, indeed, define an equivalence relation on N0 × N0 : If


a, b ∈ N0 , then a + b = b + a shows (a, b) ∼ (a, b), proving ∼ to be reflexive. If
a, b, c, d ∈ N0 , then

(a, b) ∼ (c, d) ⇒ a+d=b+c ⇒ c+b=d+a ⇒ (c, d) ∼ (a, b),

proving ∼ to be symmetric. If a, b, c, d, e, f ∈ N0 , then

(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f ) ⇒ a + d = b + c ∧ c + f = d + e


⇒ a + d + c + f = b + c + d + e ⇒ a + f = b + e ⇒ (a, b) ∼ (e, f ),

proving ∼ to be transitive and an equivalence relation. Thus, we can, indeed, define
Z := (N0 × N0)/∼ = {[(a, b)] : (a, b) ∈ N0 × N0}. (2.60)

To simplify notation, in the following, we will write

[a, b] := [(a, b)] (2.61)

for the equivalence class of (a, b) with respect to ∼. The map

ι : N0 −→ Z, ι(n) := [n, 0], (2.62)

is injective (since ι(m) = [m, 0] = ι(n) = [n, 0] implies m + 0 = 0 + n, i.e. m = n). It


is customary to identify N0 with ι(N0 ), as it usually does not cause any confusion.
One then just writes n instead of [n, 0] and −n instead of [0, n] = −[n, 0] (we will
come back to the addition on Z later and then this equation will make more sense,
cf. Th. 4.15).
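The construction of Z just described can be mimicked in a few lines of code. The following Python sketch (the class name Int is ad hoc and the canonical-representative printing is purely illustrative) encodes an integer as a pair (a, b) of naturals and tests equality via (2.59):

class Int:
    """An integer as an equivalence class [a, b] of pairs of naturals."""
    def __init__(self, a, b):
        self.a, self.b = a, b                   # represents [a, b]

    def __eq__(self, other):
        # (a, b) ~ (c, d) :<=> a + d = b + c, cf. (2.59)
        return self.a + other.b == self.b + other.a

    def __repr__(self):
        n = self.a - self.b                     # canonical representative
        return f"[{n}, 0]" if n >= 0 else f"[0, {-n}]"

print(Int(2, 5) == Int(0, 3))   # True: both classes represent the integer -3
print(Int(7, 4))                # [3, 0], i.e. the natural number 3 under iota, cf. (2.62)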

(b) Having constructed the set of integers Z in (a), in a next step, one can perform a similar construction to obtain the set of rational numbers Q. One defines Q as the

quotient set with respect to the following equivalence relation ∼ on Z × (Z \ {0}), where the relation ∼ on Z × (Z \ {0}) is defined² by
(a, b) ∼ (c, d) :⇔ a · d = b · c. (2.63)
²In (2.63), we employ the usual multiplication on Z, as we used addition on N0 in (2.59) above, this time appealing to the reader's intuitive, informal understanding of multiplication on Z. We will actually provide a mathematically precise definition of multiplication on Z later in this class in Ex. 4.33, making use of the definition of Z given in (a) (readers who do not want to wait can, e.g., consult [Phi16, Def. D.16]).

Noting that (2.63) is precisely the same as (2.59) if + is replaced by ·, the proof from (a) also shows that (2.63) does, indeed, define an equivalence relation on Z × (Z \ {0}): One merely replaces each + with · and each N0 with Z or Z \ {0}, respectively. The only modification needed occurs for 0 ∈ {a, c, e} in the proof of transitivity (in this case, the proof of (a) yields adcf = 0 = bcde, which does not imply af = be), where one now argues, for a = 0,
(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ (ad = 0 = bc ∧ cf = de) ⇒ c = 0 (as b ≠ 0) ⇒ e = 0 (as d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f),
for c = 0,
(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ (ad = 0 = bc ∧ cf = 0 = de) ⇒ a = e = 0 (as d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f),
and, for e = 0,
(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ (ad = bc ∧ cf = 0 = de) ⇒ c = 0 (as f ≠ 0) ⇒ a = 0 (as d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f).

Thus, we can, indeed, define
Q := (Z × (Z \ {0}))/∼ = {[(a, b)] : (a, b) ∈ Z × (Z \ {0})}. (2.64)
As is common, we will write
a/b := [(a, b)] (2.65)
for the equivalence class of (a, b) with respect to ∼. The map
ι : Z −→ Q, ι(k) := k/1, (2.66)

is injective (since ι(k) = k/1 = ι(l) = l/1 implies k · 1 = l · 1, i.e. k = l). It is customary to identify Z with ι(Z), as it usually does not cause any confusion. One then just writes k instead of k/1.
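The analogous sketch for Q tests equality of pairs via the cross-multiplication rule (2.63) and prints a reduced canonical representative; again the class name Rat is ad hoc and the code is merely illustrative:

from math import gcd

class Rat:
    """A rational as an equivalence class [(a, b)] with b != 0."""
    def __init__(self, a, b):
        assert b != 0
        self.a, self.b = a, b

    def __eq__(self, other):
        # (a, b) ~ (c, d) :<=> a * d = b * c, cf. (2.63)
        return self.a * other.b == self.b * other.a

    def __repr__(self):
        g = gcd(self.a, self.b) * (1 if self.b > 0 else -1)
        return f"{self.a // g}/{self.b // g}"   # reduced representative

print(Rat(2, 4) == Rat(3, 6))   # True: both are the class 1/2
print(Rat(6, -4))               # -3/2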

While the set of real numbers R can now also be constructed from the set Q, making
use of the notions of equivalence relation and quotient set, the construction is more
complicated and it also makes use of additional notions from the field of Analysis.
Thus, this construction is not within the scope of this class and the interested reader is
referred to [Phi16, Sec. D.5].

3 Natural Numbers, Induction, and the Size of Sets

3.1 Induction and Recursion


One of the most useful proof techniques is the method of induction – it is used in situations where one needs to verify the truth of statements φ(n) for each n ∈ N, i.e. the truth of the statement
∀ n ∈ N : φ(n). (3.1)

Induction is based on the fact that N satisfies the so-called Peano axioms:

P1: N contains a special element called one, denoted 1.

P2: There exists an injective map S : N −→ N \ {1}, called the successor function (for
each n ∈ N, S(n) is called the successor of n).

P3: If a subset A of N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A, then
A is equal to N. Written as a formula, the third axiom is:

∀ A ∈ P(N) : ((1 ∈ A ∧ S(A) ⊆ A) ⇒ A = N).

Remark 3.1. In Def. 1.28, we had introduced the natural numbers N := {1, 2, 3, . . . }.
The successor function is S(n) = n + 1. In axiomatic set theory, one starts with the
Peano axioms and shows that the axioms of set theory allow the construction of a
set N which satisfies the Peano axioms. One then defines 2 := S(1), 3 := S(2), . . . ,
n + 1 := S(n). The interested reader can find more details in [Phi16, Sec. D.1].
Theorem 3.2 (Principle of Induction). Suppose, for each n ∈ N, φ(n) is a statement
(i.e. a predicate of n in the language of Def. 1.31). If (a) and (b) both hold, where

(a) φ(1) is true,



(b) ∀ n ∈ N : (φ(n) ⇒ φ(n + 1)),

then (3.1) is true, i.e. φ(n) is true for every n ∈ N.

Proof. Let A := {n ∈ N : φ(n)}. We have to show A = N. Since 1 ∈ A by (a), and
n ∈ A ⇒ φ(n) ⇒ φ(n + 1) ⇒ S(n) = n + 1 ∈ A, (3.2)
(where the second implication holds due to (b)), i.e. S(A) ⊆ A, the Peano axiom P3 implies A = N.

Remark 3.3. To prove some φ(n) for each n ∈ N by induction according to Th. 3.2
consists of the following two steps:

(a) Prove φ(1), the so-called base case.

(b) Perform the inductive step, i.e. prove that φ(n) (the induction hypothesis) implies
φ(n + 1).

Example 3.4. We use induction to prove the statement
∀ n ∈ N : 1 + 2 + · · · + n = n(n + 1)/2, (3.3)
where φ(n) denotes the statement 1 + 2 + · · · + n = n(n + 1)/2:
Base Case (n = 1): 1 = (1 · 2)/2, i.e. φ(1) is true.
Induction Hypothesis: Assume φ(n), i.e. 1 + 2 + · · · + n = n(n + 1)/2 holds.
Induction Step: One computes
1 + 2 + · · · + n + (n + 1) = n(n + 1)/2 + n + 1 = (n(n + 1) + 2n + 2)/2 = (n² + 3n + 2)/2 = (n + 1)(n + 2)/2, (3.4)
where the first equality uses φ(n),
i.e. φ(n + 1) holds and the induction is complete.

Corollary 3.5. Theorem 3.2 remains true if (b) is replaced by


  
∀ n ∈ N : ((∀ m ∈ {1, . . . , n} : φ(m)) ⇒ φ(n + 1)). (3.5)

Proof. If, for each n ∈ N, we use ψ(n) to denote ∀ m ∈ {1, . . . , n} : φ(m), then (3.5) is equivalent to ∀ n ∈ N : (ψ(n) ⇒ ψ(n + 1)), i.e. to Th. 3.2(b) with φ replaced by ψ. Thus, Th. 3.2 implies ψ(n) holds true for each n ∈ N, i.e. φ(n) holds true for each n ∈ N.

Corollary 3.6. Let I be an index set. Suppose, for each i ∈ I, φ(i) is a statement. If
there is a bijective map f : N −→ I and (a) and (b) both hold, where

(a) φ(f(1)) is true,
(b) ∀ n ∈ N : (φ(f(n)) ⇒ φ(f(n + 1))),

then φ(i) is true for every i ∈ I.


Finite Induction: The above assertion remains true if f : {1, . . . , m} −→ I is bijective
for some m ∈ N and N in (b) is replaced by {1, . . . , m − 1}.

Proof. If, for each n ∈ N, we use ψ(n) to denote φ(f(n)), then Th. 3.2 shows ψ(n) is true for every n ∈ N. Given i ∈ I, we have n := f⁻¹(i) ∈ N with f(n) = i, showing that φ(i) = φ(f(n)) = ψ(n) is true.
For the finite induction, let ψ(n) denote (n ≤ m ∧ φ(f(n))) ∨ n > m. Then, for 1 ≤ n < m, we have ψ(n) ⇒ ψ(n + 1) due to (b). For n ≥ m, we also have ψ(n) ⇒ ψ(n + 1) due to n ≥ m ⇒ n + 1 > m. Thus, Th. 3.2 shows ψ(n) is true for every n ∈ N. Given i ∈ I, we have n := f⁻¹(i) ∈ {1, . . . , m} with f(n) = i. Since n ≤ m ∧ ψ(n) ⇒ φ(f(n)), we obtain that φ(i) is true.

Apart from providing a widely employable proof technique, the most important appli-
cation of Th. 3.2 is the possibility to define sequences (i.e. functions with domain N, cf.
Def. 2.15(b)) inductively, using so-called recursion:

Theorem 3.7 (Recursion Theorem). Let A be a nonempty set and x ∈ A. Given a


sequence of functions (fn )n∈N , where fn : An −→ A, there exists a unique sequence
(xn )n∈N in A satisfying the following two conditions:

(i) x1 = x.

(ii) ∀ n ∈ N : x_{n+1} = fn(x1, . . . , xn).

The same holds if N is replaced by an index set I as in Cor. 3.6.



Proof. To prove uniqueness, let (xn )n∈N and (yn )n∈N be sequences in A, both satisfying
(i) and (ii), i.e.

x1 = y1 = x and (3.6a)
∀ n ∈ N : (x_{n+1} = fn(x1, . . . , xn) ∧ y_{n+1} = fn(y1, . . . , yn)). (3.6b)

We prove by induction (in the form of Cor. 3.5) that (xn )n∈N = (yn )n∈N , i.e.

∀ n ∈ N : x_n = y_n, (3.7)
where φ(n) denotes the statement x_n = y_n:

Base Case (n = 1): φ(1) is true according to (3.6a).


Induction Hypothesis: Assume φ(m) for each m ∈ {1, . . . , n}, i.e. xm = ym holds for
each m ∈ {1, . . . , n}.
Induction Step: One computes

x_{n+1} = fn(x1, . . . , xn) = fn(y1, . . . , yn) = y_{n+1}, (3.8)
using (3.6b) for the first and last equalities and φ(1), . . . , φ(n) for the middle one,

i.e. φ(n + 1) holds and the induction is complete.


To prove existence, we have to show that there is a function F : N −→ A such that the
following two conditions hold:

F(1) = x, (3.9a)
∀ n ∈ N : F(n + 1) = fn(F(1), . . . , F(n)). (3.9b)

To this end, let
ℱ := { B ⊆ N × A : (1, x) ∈ B ∧ ∀ n ∈ N ∀ a1, . . . , an ∈ A : ((1, a1), . . . , (n, an) ∈ B ⇒ (n + 1, fn(a1, . . . , an)) ∈ B) } (3.10)

and
G := ⋂_{B∈ℱ} B. (3.11)
Note that G is well-defined, as N × A ∈ ℱ. Also, clearly, G ∈ ℱ. We would like to define F such that G = graph(F). For this to be possible, we will show, by induction,

∀ n ∈ N ∃! x_n ∈ A : (n, x_n) ∈ G, (3.12)
where φ(n) denotes this statement:

Base Case (n = 1): From the definition of G, we know (1, x) ∈ G. If (1, a) ∈ G with a ≠ x, then H := G \ {(1, a)} ∈ ℱ, implying G ⊆ H in contradiction to (1, a) ∉ H. This shows a = x and proves φ(1).
Induction Hypothesis: Assume φ(m) for each m ∈ {1, . . . , n}.
Induction Step: From the induction hypothesis, we know

∃! (x1, . . . , xn) ∈ Aⁿ : (1, x1), . . . , (n, xn) ∈ G.

Thus, if we let x_{n+1} := fn(x1, . . . , xn), then (n + 1, x_{n+1}) ∈ G by the definition of G. If (n + 1, a) ∈ G with a ≠ x_{n+1}, then H := G \ {(n + 1, a)} ∈ ℱ (using the uniqueness of the (1, x1), . . . , (n, xn) ∈ G), implying G ⊆ H in contradiction to (n + 1, a) ∉ H. This shows a = x_{n+1}, proves φ(n + 1), and completes the induction.
Due to (3.12), we can now define F : N −→ A, F (n) := xn , and the definition of G then
guarantees the validity of (3.9). 
Example 3.8. In many applications of Th. 3.7, one has functions gn : A −→ A and
uses
∀ n ∈ N : (fn : Aⁿ −→ A, fn(x1, . . . , xn) := gn(x_n)). (3.13)

Here are some important concrete examples:

(a) The factorial function F : N0 −→ N, n ↦ n!, is defined recursively by
0! := 1, 1! := 1, ∀ n ∈ N : (n + 1)! := (n + 1) · n!, (3.14a)

i.e. we have A = N and gn (x) := (n + 1) · x. So we obtain

(n!)n∈N0 = (1, 1, 2, 6, 24, 120, . . . ). (3.14b)
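Transcribed directly into code, the recursion (3.14a) reads as follows (an illustrative Python sketch; the Recursion Th. 3.7 is what guarantees that such a definition determines a unique function):

def factorial(n):
    # 0! := 1; (n+1)! := (n+1) * n!, cf. (3.14a)
    if n == 0:
        return 1
    return n * factorial(n - 1)

print([factorial(n) for n in range(6)])   # [1, 1, 2, 6, 24, 120], cf. (3.14b)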

(b) Summation Symbol: On A = R (or, more generally, on every set A, where an


addition + : A × A −→ A is defined), define recursively, for each given (possibly
finite) sequence (a1 , a2 , . . . ) in A:
∑_{i=1}^{1} ai := a1, ∑_{i=1}^{n+1} ai := a_{n+1} + ∑_{i=1}^{n} ai for n ≥ 1, (3.15a)

i.e. we have
gn : A −→ A, gn (x) := x + an+1 . (3.15b)
In (3.15a), one can also use other symbols for i, except a and n; for a finite sequence,
n needs to be less than the maximal index of the finite sequence.

More generally, if I is an index set and φ : {1, . . . , n} −→ I a bijective map, then


define
∑_{i∈I} ai := ∑_{i=1}^{n} a_{φ(i)}. (3.15c)

The commutativity of addition implies that the definition in (3.15c) is actually


independent of the chosen bijective map φ (cf. Th. B.4 in the Appendix). Also
define
∑_{i∈∅} ai := 0 (3.15d)

(for a general A, 0 is meant to be an element such that a + 0 = 0 + a = a for each a ∈ A and we can even define this if 0 ∉ A).

(c) Product Symbol: On A = R (or, more generally, on every set A, where a multiplica-
tion · : A × A −→ A is defined), define recursively, for each given (possibly finite)
sequence (a1 , a2 , . . . ) in A:
∏_{i=1}^{1} ai := a1, ∏_{i=1}^{n+1} ai := a_{n+1} · ∏_{i=1}^{n} ai for n ≥ 1, (3.16a)

i.e. we have
gn : A −→ A, gn (x) := an+1 · x. (3.16b)
In (3.16a), one can also use other symbols for i, except a and n; for a finite sequence,
n needs to be less than the maximal index of the finite sequence.
More generally, if I is an index set and φ : {1, . . . , n} −→ I a bijective map, then
define
∏_{i∈I} ai := ∏_{i=1}^{n} a_{φ(i)}. (3.16c)

The commutativity of multiplication implies that the definition in (3.16c) is actually


independent of the chosen bijective map φ (cf. Th. B.4 in the Appendix); however,
we will see later that, for a general multiplication on a set A, commutativity will
not always hold (an important example will be matrix multiplication), and, in that
case, the definition in (3.16c) does, in general, depend on the chosen bijective map
φ. Also define
∏_{i∈∅} ai := 1 (3.16d)

(for a general A, 1 is meant to be an element such that a · 1 = 1 · a = a for each a ∈ A and we can even define this if 1 ∉ A).

Example 3.9. As an (academic) example, where, in each step, the recursive definition does depend on all previously computed values, consider the sequence (x_n)_{n∈N}, defined by
x1 := 1, ∀ n ∈ N : x_{n+1} := (1/n) ∏_{i=1}^{n} x_i,
i.e. by setting A := Q and
fn : Aⁿ −→ A, fn(x1, . . . , xn) := (1/n) ∏_{i=1}^{n} x_i.
One obtains
x1 = 1, x2 = f1(1) = 1, x3 = f2(1, 1) = 1/2, x4 = f3(1, 1, 1/2) = 1/6,
x5 = f4(1, 1, 1/2, 1/6) = 1/48, x6 = f5(1, 1, 1/2, 1/6, 1/48) = 1/2880, . . .
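Since each step of this recursion uses all previously computed values, a direct transcription must keep the whole list of values. The following Python sketch uses exact rational arithmetic, so its output can be compared with the values listed above:

from fractions import Fraction

xs = [Fraction(1)]                  # x_1 := 1
for n in range(1, 6):               # compute x_2, ..., x_6
    prod = Fraction(1)
    for x in xs:                    # f_n depends on all previous values
        prod *= x
    xs.append(prod / n)             # x_{n+1} := (1/n) * (x_1 * ... * x_n)

print([str(x) for x in xs])         # ['1', '1', '1/2', '1/6', '1/48', '1/2880']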

In the above recursive definitions, we have always explicitly specified A and the gn or
fn . However, in the literature as well as in the rest of this class, most of the time, the
gn or fn are not provided explicitly.

3.2 Cardinality: The Size of Sets


Cardinality measures the size of sets. For a finite set A, it is precisely the number of
elements in A. For an infinite set, it classifies the set’s degree or level of infinity (it turns
out that not all infinite sets have the same size).

3.2.1 Definition and General Properties

Definition 3.10. (a) The sets A, B are defined to have the same cardinality or the
same size if, and only if, there exists a bijective map ϕ : A −→ B. According to
Th. 3.11 below, this defines an equivalence relation on every set of sets.

(b) The cardinality of a set A is n ∈ N (denoted #A = n) if, and only if, there exists
a bijective map ϕ : A −→ {1, . . . , n}. The cardinality of ∅ is defined as 0, i.e.
#∅ := 0. A set A is called finite if, and only if, there exists n ∈ N0 such that
#A = n; A is called infinite if, and only if, A is not finite, denoted #A = ∞ (in the

strict sense, this is an abuse of notation, since ∞ is not a cardinality – for example
#N = ∞ and #P(N) = ∞, but N and P(N) do not have the same cardinality, since
the power set P(A) is always strictly bigger than A (see Th. 3.16 below) – #A = ∞
is merely an abbreviation for the statement “A is infinite”). The interested student
finds additional material regarding characterizations of infinite sets in Th. A.54 of
the Appendix.

(c) The set A is called countable if, and only if, A is finite or A has the same cardinality
as N. Otherwise, A is called uncountable.

Theorem 3.11. Let M be a set of sets. Then the relation ∼ on M, defined by

A ∼ B :⇔ A and B have the same cardinality,

constitutes an equivalence relation on M.

Proof. According to Def. 2.32, we have to prove that ∼ is reflexive, symmetric, and
transitive. According to Def. 3.10(a), A ∼ B holds for A, B ∈ M if, and only if, there
exists a bijective map f : A −→ B. Thus, since the identity Id : A −→ A is bijective,
A ∼ A, showing ∼ is reflexive. If A ∼ B, then there exists a bijective map f : A −→ B,
and f −1 is a bijective map f −1 : B −→ A, showing B ∼ A and that ∼ is symmetric.
If A ∼ B and B ∼ C, then there are bijective maps f : A −→ B and g : B −→ C.
Then, according to Th. 2.14, the composition (g ◦ f ) : A −→ C is also bijective, proving
A ∼ C and that ∼ is transitive. 

Theorem 3.12 (Schröder-Bernstein). Let A, B be sets. The following statements are


equivalent (even without assuming the axiom of choice):

(i) The sets A and B have the same cardinality (i.e. there exists a bijective map
φ : A −→ B).

(ii) There exist an injective map f : A −→ B and an injective map g : B −→ A.

We will base the proof of the Schröder-Bernstein theorem on the following lemma (for
an alternative proof, see [Phi16, Th. A.56]):

Lemma 3.13. Let A be a set. Consider P(A) to be endowed with the partial order
given by set inclusion, i.e., for each X, Y ∈ P(A), X ≤ Y if, and only if, X ⊆ Y . If
F : P(A) −→ P(A) is isotone with respect to that order, then F has a fixed point, i.e.
F (X0 ) = X0 for some X0 ∈ P(A).

Proof. Define
𝒜 := {X ∈ P(A) : F(X) ⊆ X}, X0 := ⋂_{X∈𝒜} X
(X0 is well-defined, since F(A) ⊆ A). Suppose X ∈ 𝒜, i.e. F(X) ⊆ X and X0 ⊆ X. Then F(X0) ⊆ F(X) ⊆ X due to the isotonicity of F. Thus, F(X0) ⊆ X for every X ∈ 𝒜, i.e. F(X0) ⊆ X0. Using the isotonicity of F again shows F(F(X0)) ⊆ F(X0), implying F(X0) ∈ 𝒜 and X0 ⊆ F(X0), i.e. F(X0) = X0 as desired.

Proof of Th. 3.12. (i) trivially implies (ii), as one can simply set f := φ and g := φ⁻¹. It remains to show (ii) implies (i). Thus, let f : A −→ B and g : B −→ A be injective. To apply Lem. 3.13, define
F : P(A) −→ P(A), F(X) := A \ g(B \ f(X)),
and note
X ⊆ Y ⊆ A ⇒ f(X) ⊆ f(Y) ⇒ B \ f(Y) ⊆ B \ f(X) ⇒ g(B \ f(Y)) ⊆ g(B \ f(X)) ⇒ F(X) ⊆ F(Y).
Thus, by Lem. 3.13, F has a fixed point X0. We claim that a bijection is obtained via setting
φ : A −→ B, φ(x) := f(x) for x ∈ X0, φ(x) := g⁻¹(x) for x ∉ X0.
First, φ is well-defined, since x ∉ X0 = F(X0) implies x ∈ g(B \ f(X0)). To verify that φ is injective, let x, y ∈ A, x ≠ y. If x, y ∈ X0, then φ(x) ≠ φ(y), as f is injective. If x, y ∈ A \ X0, then φ(x) ≠ φ(y), as g⁻¹ is well-defined. If x ∈ X0 and y ∉ X0, then φ(x) ∈ f(X0) and φ(y) ∈ B \ f(X0), once again, implying φ(x) ≠ φ(y). It remains to prove surjectivity. If b ∈ f(X0), then φ(f⁻¹(b)) = b. If b ∈ B \ f(X0), then g(b) ∉ X0 = F(X0), i.e. φ(g(b)) = b, showing φ to be surjective.
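On finite sets, injections f : A −→ B and g : B −→ A already force #A = #B, so the theorem is only interesting for infinite sets; still, the construction from the proof can be executed mechanically. The Python sketch below (illustrative; the data A, B, f, g are arbitrary choices) finds a fixed point X0 of F by iterating from ∅, which stabilizes on finite sets by isotonicity, and assembles the bijection φ:

A = {0, 1, 2, 3}
B = {"p", "q", "r", "s"}
f = {0: "p", 1: "q", 2: "r", 3: "s"}       # injective A -> B
g = {"p": 1, "q": 2, "r": 3, "s": 0}       # injective B -> A

def F(X):
    # F(X) = A \ g(B \ f(X)), as in the proof
    return A - {g[b] for b in B - {f[x] for x in X}}

X0 = set()
while F(X0) != X0:                         # increasing iteration to a fixed point
    X0 = F(X0)

g_inv = {v: k for k, v in g.items()}
phi = {x: (f[x] if x in X0 else g_inv[x]) for x in A}
print(X0, phi)                             # here X0 may well be empty, and phi = g_inv
assert set(phi.values()) == B              # surjective, hence bijective on finite sets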
Theorem 3.14. Let A, B be nonempty sets. Then the following statements are equiva-
lent (where the implication “(ii) ⇒ (i)” makes use of the axiom of choice (AC)).

(i) There exists an injective map f : A −→ B.


(ii) There exists a surjective map g : B −→ A.

Proof. According to Th. 2.13(b), (i) is equivalent to f having a left inverse g : B −→ A


(i.e. g ◦ f = IdA ), which is equivalent to g having a right inverse, which, according to
Th. 2.13(a), is equivalent to (ii) (AC is used in the proof of Th. 2.13(a) to show each
surjective map has a right inverse). 

Corollary 3.15. Let A, B be nonempty sets. Using AC, we can expand the two equiv-
alent statements of Th. 3.12 to the following list of equivalent statements:

(i) The sets A and B have the same cardinality (i.e. there exists a bijective map
φ : A −→ B).

(ii) There exist an injective map f : A −→ B and an injective map g : B −→ A.

(iii) There exist a surjective map f : A −→ B and a surjective map g : B −→ A.

(iv) There exist an injective map f1 : A −→ B and a surjective map f2 : A −→ B.

(v) There exist an injective map g1 : B −→ A and a surjective map g2 : B −→ A.

Proof. The equivalences are an immediate consequence of combining Th. 3.12 with Th.
3.14. 
Theorem 3.16. Let A be a set. There can never exist a surjective map from A onto
P(A) (in this sense, the size of P(A) is always strictly bigger than the size of A; in
particular, A and P(A) can never have the same size).

Proof. If A = ∅, then there is nothing to prove. For nonempty A, the idea is to


conduct a proof by contradiction. To this end, assume there does exist a surjective map
f : A −→ P(A) and define

B := {x ∈ A : x ∉ f(x)}.
Now B is a subset of A, i.e. B ∈ P(A) and the assumption that f is surjective implies the existence of a ∈ A such that f(a) = B. If a ∈ B, then a ∉ f(a) = B, i.e. a ∈ B implies a ∈ B ∧ ¬(a ∈ B), so that the principle of contradiction tells us a ∉ B must be true. However, a ∉ B implies a ∈ f(a) = B, i.e., this time, the principle of contradiction tells us a ∈ B must be true. In conclusion, we have shown our original assumption that there exists a surjective map f : A −→ P(A) implies a ∈ B ∧ ¬(a ∈ B), i.e., according to the principle of contradiction, no surjective map from A into P(A) can exist.
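The diagonal set B from the proof is easy to compute for any concrete attempt at a surjection. In the following illustrative Python fragment, f is an arbitrary map from a three-element set into its power set, and B is verified to be missed by f:

A = {1, 2, 3}
f = {1: set(), 2: {1, 2}, 3: {2, 3}}   # an arbitrary attempt at f : A -> P(A)

B = {x for x in A if x not in f[x]}    # the diagonal set from the proof
print(B)                               # {1}
assert all(f[a] != B for a in A)       # B has no preimage under f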

3.2.2 Finite Sets

While many results concerning cardinalities of finite sets seem intuitively clear, as a mathematician, one still has to provide a rigorous proof. Giving such proofs in the present section also provides us with the opportunity to see more examples of induction proofs. We begin by showing finite cardinalities to be uniquely determined. The key is the following theorem:

Theorem 3.17. If m, n ∈ N and the map f : {1, . . . , m} −→ {1, . . . , n} is bijective,


then m = n.

Proof. We conduct the proof via induction on m.


Base Case (m = 1): If m = 1, then the surjectivity of f implies n = 1.
Induction Step: Let m > 1. From the bijective map f, we define the map
g : {1, . . . , m} −→ {1, . . . , n}, g(x) := n for x = m, g(x) := f(m) for x = f⁻¹(n), g(x) := f(x) otherwise.

Then g is bijective, since it is the composition g = h ◦ f of the bijective map f with the
bijective map

h : {f (m), n} −→ {f (m), n}, h(f (m)) := n, h(n) := f (m).

Thus, the restriction g↾{1,...,m−1} : {1, . . . , m−1} −→ {1, . . . , n−1} must also be bijective,
such that the induction hypothesis yields m − 1 = n − 1, which, in turn, implies m = n
as desired. 

Corollary 3.18. Let m, n ∈ N and let A be a set. If #A = m and #A = n, then


m = n.

Proof. If #A = m, then, according to Def. 3.10(b), there exists a bijective map f :


A −→ {1, . . . , m}. Analogously, if #A = n, then there exists a bijective map g :
A −→ {1, . . . , n}. In consequence, we have the bijective map (g ◦ f −1 ) : {1, . . . , m} −→
{1, . . . , n}, such that Th. 3.17 yields m = n. 

Theorem 3.19. Let A 6= ∅ be a finite set.

(a) If B ⊆ A with A ≠ B, then B is finite with #B < #A.
(b) If a ∈ A, then #(A \ {a}) = #A − 1.

Proof. For #A = n ∈ N, we use induction to prove (a) and (b) simultaneously, i.e. we show
∀ n ∈ N : (#A = n ⇒ ((∀ B ∈ P(A) \ {A} : #B ∈ {0, . . . , n − 1}) ∧ (∀ a ∈ A : #(A \ {a}) = n − 1))),
calling this statement φ(n).

Base Case (n = 1): In this case, A has precisely one element, i.e. B = A \ {a} = ∅, and
#∅ = 0 = n − 1 proves φ(1).
Induction Step: For the induction hypothesis, we assume φ(n) to be true, i.e. we assume
(a) and (b) hold for each A with #A = n. We have to prove φ(n + 1), i.e., we consider
A with #A = n + 1. From #A = n + 1, we conclude the existence of a bijective map ϕ :
A −→ {1, . . . , n + 1}. We have to construct a bijective map ψ : A \ {a} −→ {1, . . . , n}.
To this end, set k := ϕ(a) and define the auxiliary function
f : {1, . . . , n + 1} −→ {1, . . . , n + 1}, f(x) := n + 1 for x = k, f(x) := k for x = n + 1, f(x) := x for x ∉ {k, n + 1}.
Then f ◦ ϕ : A −→ {1, . . . , n + 1} is bijective by Th. 2.14, and
(f ◦ ϕ)(a) = f(ϕ(a)) = f(k) = n + 1.
Thus, the restriction ψ := (f ◦ ϕ)↾A\{a} is the desired bijective map ψ : A \ {a} −→ {1, . . . , n}, proving #(A \ {a}) = n. It remains to consider the strict subset B of A. Since B is a strict subset of A, there exists a ∈ A \ B. Thus, B ⊆ A \ {a} and, as we have already shown #(A \ {a}) = n, the induction hypothesis applies and yields B is finite with #B ≤ #(A \ {a}) = n, i.e. #B ∈ {0, . . . , n}, proving φ(n + 1), thereby completing the induction.
Theorem 3.20. For #A = #B = n ∈ N and f : A −→ B, the following statements
are equivalent:

(i) f is injective.
(ii) f is surjective.
(iii) f is bijective.

Proof. Exercise. 
Lemma 3.21. For each finite set A (i.e. #A = n ∈ N0 ) and each B ⊆ A, one has
#(A \ B) = #A − #B.

Proof. For B = ∅, the assertion is true since #(A \ B) = #A = #A − 0 = #A − #B.


For B ≠ ∅, the proof is conducted over the size of B, i.e. as a finite induction (cf. Cor.
3.6) over the set {1, . . . , n}, showing

∀ m ∈ {1, . . . , n} : (#B = m ⇒ #(A \ B) = #A − #B),
calling this statement φ(m).

Base Case (m = 1): φ(1) is precisely the statement provided by Th. 3.19(b).
Induction Step: For the induction hypothesis, we assume φ(m) with 1 ≤ m < n. To
prove φ(m + 1), consider B ⊆ A with #B = m + 1. Fix an element b ∈ B and set
B1 := B \ {b}. Then #B1 = m by Th. 3.19(b), A \ B = (A \ B1) \ {b}, and we compute
#(A \ B) = #((A \ B1) \ {b}) = #(A \ B1) − 1 = #A − #B1 − 1 = #A − #B,
using Th. 3.19(b) for the second equality and φ(m) for the third,

proving φ(m + 1) and completing the induction. 

Theorem 3.22. If A, B are finite sets, then #(A ∪ B) = #A + #B − #(A ∩ B).

Proof. The assertion is clearly true if A or B is empty. If A and B are nonempty, then
there exist m, n ∈ N such that #A = m and #B = n, i.e. there are bijective maps
f : A −→ {1, . . . , m} and g : B −→ {1, . . . , n}.
We first consider the case A ∩ B = ∅. We need to construct a bijective map h : A ∪ B −→ {1, . . . , m + n}. To this end, we define
h : A ∪ B −→ {1, . . . , m + n}, h(x) := f(x) for x ∈ A, h(x) := g(x) + m for x ∈ B.

The bijectivity of f and g clearly implies the bijectivity of h, proving #(A ∪ B) =


m + n = #A + #B.
Finally, we consider the case of arbitrary A, B. Since A ∪ B = A ∪̇ (B \ A) and B \ A = B \ (A ∩ B), we can compute
#(A ∪ B) = #(A ∪̇ (B \ A)) = #A + #(B \ A) = #A + #(B \ (A ∩ B)) = #A + #B − #(A ∩ B),
using Lem. 3.21 for the last equality, thereby establishing the claim.

Theorem 3.23. If (A1, . . . , An), n ∈ N, is a finite sequence of finite sets, then
#∏_{i=1}^{n} Ai = #(A1 × · · · × An) = ∏_{i=1}^{n} #Ai. (3.17)

Proof. If at least one Ai is empty, then (3.17) is true, since both sides are 0.

The case where all Ai are nonempty is proved by induction over n, i.e. we know ki := #Ai ∈ N for each i ∈ {1, . . . , n} and show by induction
∀ n ∈ N : #∏_{i=1}^{n} Ai = ∏_{i=1}^{n} ki,
calling this statement φ(n).

Base Case (n = 1): #∏_{i=1}^{1} Ai = #A1 = k1 = ∏_{i=1}^{1} ki, i.e. φ(1) holds.
Induction Step: From the induction hypothesis φ(n), we obtain a bijective map ϕ : A −→ {1, . . . , N}, where A := ∏_{i=1}^{n} Ai and N := ∏_{i=1}^{n} ki. To prove φ(n + 1), we need to construct a bijective map h : A × A_{n+1} −→ {1, . . . , N · k_{n+1}}. Since #A_{n+1} = k_{n+1}, there exists a bijective map f : A_{n+1} −→ {1, . . . , k_{n+1}}. We define
h : A × A_{n+1} −→ {1, . . . , N · k_{n+1}}, h(a1, . . . , an, a_{n+1}) := (f(a_{n+1}) − 1) · N + ϕ(a1, . . . , an).

Since ϕ and f are bijective, and since every m ∈ {1, . . . , N · kn+1 } has a unique repre-
sentation in the form m = a · N + r with a ∈ {0, . . . , kn+1 − 1} and r ∈ {1, . . . , N } (see
Th. D.1 in the Appendix), h is also bijective. This proves φ(n + 1) and completes the
induction. 

Theorem 3.24. For each finite set A with #A = n ∈ N0, one has #P(A) = 2ⁿ.

Proof. The proof is conducted by induction, showing
∀ n ∈ N0 : (#A = n ⇒ #P(A) = 2ⁿ),
calling this statement φ(n).

Base Case (n = 0): For n = 0, we have A = ∅, i.e. P(A) = {∅}. Thus, #P(A) = 1 = 2⁰, proving φ(0).
Induction Step: Assume φ(n) and consider A with #A = n + 1. Then A contains at least one element a. For B := A \ {a}, we then know #B = n from Th. 3.19(b). Moreover, setting M := {C ∪ {a} : C ∈ P(B)}, we have the disjoint decomposition P(A) = P(B) ∪̇ M. As the map ϕ : P(B) −→ M, ϕ(C) := C ∪ {a}, is clearly bijective, P(B) and M have the same cardinality. Thus,
#P(A) = #P(B) + #M = #P(B) + #P(B) = 2 · 2ⁿ = 2^{n+1},
using Th. 3.22 for the first equality and φ(n) for the third, thereby proving φ(n + 1) and completing the induction.



3.2.3 Countable Sets

In this section, we present a number of important results regarding the natural numbers
and countability.
Theorem 3.25. (a) Every nonempty finite subset of a totally ordered set has a mini-
mum and a maximum.
(b) Every nonempty subset of N has a minimum.

Proof. Exercise. 
Proposition 3.26. Every subset A of N is countable.

Proof. Since ∅ is countable, we may assume A ≠ ∅. From Th. 3.25(b), we know that every nonempty subset of N has a min. We recursively define a sequence in A by
a1 := min A, a_{n+1} := min A_n if A_n := A \ {a_i : 1 ≤ i ≤ n} ≠ ∅, and a_{n+1} := a_n if A_n = ∅.

This sequence is the same as the function f : N −→ A, f(n) = a_n. An easy induction shows that, for each n ∈ N, a_n ≠ a_{n+1} implies the restriction f↾{1,...,n+1} is injective. Thus, if there exists n ∈ N such that a_n = a_{n+1}, then f↾{1,...,k} : {1, . . . , k} −→ A is bijective, where k := min{n ∈ N : a_n = a_{n+1}}, showing A is finite, i.e. countable. If there does not exist n ∈ N with a_n = a_{n+1}, then f is injective. Another easy induction shows that, for each n ∈ N, f({1, . . . , n}) ⊇ {k ∈ A : k ≤ n}, showing f is also surjective, proving A is countable.
Proposition 3.27. For each set A 6= ∅, the following three statements are equivalent:

(i) A is countable.
(ii) There exists an injective map f : A −→ N.
(iii) There exists a surjective map g : N −→ A.

Proof. Directly from the definition of countable in Def. 3.10(c), one obtains (i)⇒(ii) and
(i)⇒(iii). To prove (ii)⇒(i), let f : A −→ N be injective. Then f : A −→ f (A) is
bijective, and, since f (A) ⊆ N, f (A) is countable by Prop. 3.26, proving A is countable
as well. To prove (iii)⇒(i), let g : N −→ A be surjective. Then g has a right inverse
f : A −→ N. One can obtain this from Th. 2.13(a), but, here, we can actually construct
f without the axiom of choice: For a ∈ A, let f (a) := min g −1 ({a}) (recall Th. 3.25(b)).
Then, clearly, g ◦ f = IdA . But this means g is a left inverse for f , showing f is injective
according to Th. 2.13(b). Then A is countable by an application of (ii). 

Theorem 3.28. If (A1, . . . , An), n ∈ N, is a finite family of countable sets, then ∏_{i=1}^{n} Ai is countable.

Proof. We first consider the special case n = 2 with A1 = A2 = N and show the map
ϕ : N × N −→ N, ϕ(m, n) := 2^m · 3^n,
is injective: If ϕ(m, n) = ϕ(p, q), then 2^m · 3^n = 2^p · 3^q. Moreover m ≤ p or p ≤ m. If m ≤ p, then 3^n = 2^{p−m} · 3^q. Since 3^n is odd, 2^{p−m} · 3^q must also be odd, implying p − m = 0, i.e. m = p. Moreover, we now have 3^n = 3^q, implying n = q, showing (m, n) = (p, q), i.e. ϕ is injective.
We now come back to the general case stated in the theorem. If at least one of the Ai is empty, then the product is empty and, thus, countable. So it remains to consider the case where all Ai are nonempty. The proof is conducted by induction, showing
∀ n ∈ N : ∏_{i=1}^{n} Ai is countable,
calling this statement φ(n).

Base Case (n = 1): φ(1) is merely the hypothesis that A1 is countable.


Induction Step: Assuming φ(n), Prop. 3.27(ii) provides injective maps f1 : ∏_{i=1}^{n} Ai −→ N and f2 : A_{n+1} −→ N. To prove φ(n + 1), we provide an injective map h : ∏_{i=1}^{n+1} Ai −→ N: Define
h : ∏_{i=1}^{n+1} Ai −→ N, h(a1, . . . , an, a_{n+1}) := ϕ(f1(a1, . . . , an), f2(a_{n+1})).

The injectivity of f1 , f2 , and ϕ clearly implies the injectivity of h, thereby proving


φ(n + 1) and completing the induction. 
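The injection ϕ(m, n) = 2^m · 3^n from the proof can also be exercised numerically; since 2 and 3 are distinct primes, the exponents can even be recovered by dividing out prime factors. The sketch below (Python, with an ad hoc decoder unphi) checks injectivity exhaustively on a finite range:

def phi(m, n):
    return 2 ** m * 3 ** n             # the injection from the proof

def unphi(k):
    # recover (m, n) by dividing out the prime factors 2 and 3
    m = n = 0
    while k % 2 == 0:
        k //= 2; m += 1
    while k % 3 == 0:
        k //= 3; n += 1
    return (m, n)

pairs = [(m, n) for m in range(1, 20) for n in range(1, 20)]
assert len({phi(m, n) for (m, n) in pairs}) == len(pairs)    # no collisions
assert all(unphi(phi(m, n)) == (m, n) for (m, n) in pairs)   # phi is inverted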

Theorem 3.29. If (Ai)_{i∈I} is a countable family of countable sets (i.e. ∅ ≠ I is countable and each Ai, i ∈ I, is countable), then the union A := ⋃_{i∈I} Ai is also countable (this result makes use of AC, cf. Rem. 3.30 below).

Proof. It suffices to consider the case that all Ai are nonempty. Moreover, according to
Prop. 3.27(iii), it suffices to construct a surjective map ϕ : N −→ A. Also according
to Prop. 3.27(iii), the countability of I and the Ai provides us with surjective maps
f : N −→ I and gi : N −→ Ai (here AC is used to select each gi from the set of all
surjective maps from N onto Ai ). Define

F : N × N −→ A, F(m, n) := g_{f(m)}(n).



Then F is surjective: Given x ∈ A, there exists i ∈ I such that x ∈ Ai . Since f is


surjective, there is m ∈ N satisfying f (m) = i. Moreover, since gi is surjective, there
exists n ∈ N with gi (n) = x. Then F (m, n) = gi (n) = x, verifying that F is surjective.
As N × N is countable by Th. 3.28, there exists a surjective map h : N −→ N × N. Thus,
F ◦ h is the desired surjective map from N onto A. 

Remark 3.30. The axiom of choice is, indeed, essential for the proof of Th. 3.29. It
is shown in [Jec73, Th. 10.6] that it is consistent with the axioms of ZF (i.e. with the
axioms of Sec. A.3 of the Appendix) that, e.g., the uncountable sets P(N) and R (cf.
[Phi16, Th. F.2]) are countable unions of countable sets.

4 Basic Algebraic Structures


We are now in a position to conduct some actual algebra, that is, we will start to study sets A that come with a composition law ◦ : A × A −→ A, (a, b) ↦ a ◦ b, assumed to satisfy a number of rules, so-called axioms. One goal is to prove further rules as a consequence of the axioms. A main benefit of this method is the fact that these proved rules are then known to hold in every structure satisfying the axioms. Perhaps the simplest structure that still occurs in a vast number of interesting places throughout mathematics is that of a group (see Def. 4.5(b) below), examples including the set of real numbers R with the usual addition as well as R \ {0} with the usual multiplication. So every rule we will prove as a consequence of the group axioms will hold in these two groups as well as in countless other groups. We note that vector spaces, the structures of central interest to linear algebra, are, in particular, always groups, so that everything we prove about groups in the following is, in particular, true in every vector space.

4.1 Magmas and Groups


Definition 4.1. Let A be a nonempty set. A map
◦ : A × A −→ A, (x, y) ↦ x ◦ y, (4.1)
is called a composition on A; the pair (A, ◦) is then called a magma. It is also common to call a composition a multiplication and to write x · y or just xy instead of x ◦ y. In this situation, the product symbol ∏ is defined as in Ex. 3.8(c). If the composition is called an addition and x + y is written instead of x ◦ y, then it is assumed that it is commutative, that means it satisfies the law of Def. 4.2(a) below (a multiplication might or might not satisfy commutativity). For an addition, the summation symbol ∑ is defined as in Ex. 3.8(b).

Definition 4.2. Let (A, ◦) be a magma.

(a) ◦ is called commutative or abelian if, and only if, x ◦ y = y ◦ x holds for all x, y ∈ A.
(b) ◦ is called associative if, and only if, x ◦ (y ◦ z) = (x ◦ y) ◦ z holds for all x, y, z ∈ A.
(c) An element e ∈ A is called left neutral (resp. right neutral) if, and only if,

∀ x ∈ A : e ◦ x = x (resp. x ◦ e = x). (4.2)

e ∈ A is called neutral if, and only if, it is both left neutral and right neutral.

If associativity holds for each triple of elements, then it also holds for each finite tuple
of elements; if the composition is associative and commutativity holds for each pair of
elements, then it also holds for each finite tuple of elements (see Th. B.3 and Th. B.4
in the Appendix).
Lemma 4.3. Let (A, ◦) be a magma with l, r ∈ A. If l is left neutral and r is right
neutral, then l = r (in particular, A contains at most one neutral element).

Proof. As l is left neutral and r is right neutral, we obtain r = l ◦ r = l. 


Notation 4.4. If a composition on a set A is written as a multiplication ·, then it
is common to write a neutral element as 1 and to also call it a unit or one. If a
(commutative) composition on a set A is written as an addition +, then it is common
to write a neutral element as 0 and to also call it zero.
Definition 4.5. Let (A, ◦) be a magma.

(a) (A, ◦) is called a semigroup if, and only if, ◦ is associative.


(b) (A, ◦) is called a group if, and only if, it is a semigroup with the additional property that A contains a right neutral³ element e ∈ A and, for each x ∈ A, there exists an inverse element x̄ ∈ A, i.e. an element x̄ ∈ A such that
x ◦ x̄ = e. (4.3)
A group is called commutative or abelian if, and only if, ◦ is commutative.


³We will see in Th. 4.6(a) that the remaining properties of a group then force this right neutral element to be a neutral element as well. However, not requiring e to be neutral here has the advantage that we know right away that we only need to prove the existence of a right neutral element to show some structure is a group.

Before considering examples of magmas (A, ◦) in Ex. 4.9 below, we prove some first
rules and introduce some more notation.

Theorem 4.6. Let (G, ◦) be a group with right neutral element e ∈ G. Then the
following statements and rules are valid in G:

(a) e is a neutral element (thus, in particular, it is uniquely determined according to


Lem. 4.3).

(b) If x, a ∈ G, then x ◦ a = e if, and only if, a ◦ x = e (i.e. an element is right inverse
if, and only if, it is left inverse). Moreover, inverse elements are unique (for each
x ∈ G, the unique inverse is then denoted by x−1 ).

(c) (x−1 )−1 = x holds for each x ∈ G.

(d) y −1 ◦ x−1 = (x ◦ y)−1 holds for each x, y ∈ G.

(e) Cancellation Laws: For each x, y, a ∈ G, one has

x ◦ a = y ◦ a ⇒ x = y, (4.4a)
a ◦ x = a ◦ y ⇒ x = y. (4.4b)

Proof. (a): Let x ∈ G. By Def. 4.5(b), there exists y ∈ G such that x ◦ y = e and, in turn, z ∈ G such that y ◦ z = e. Thus, as e is right neutral,
e ◦ z = (x ◦ y) ◦ z = x ◦ (y ◦ z) = x ◦ e = x,
implying
x = e ◦ z = (e ◦ e) ◦ z = e ◦ (e ◦ z) = e ◦ x,
showing e to be left neutral as well.
(b): Given a, choose b ∈ G such that a ◦ b = e. If x ◦ a = e, then

e = a ◦ b = (a ◦ e) ◦ b = (a ◦ (x ◦ a)) ◦ b = a ◦ ((x ◦ a) ◦ b) = a ◦ (x ◦ (a ◦ b))


= a ◦ (x ◦ e) = a ◦ x.

Interchanging the roles of x and a now proves the converse. To prove uniqueness, let
a, b be inverses to x. Then a = a ◦ e = a ◦ x ◦ b = e ◦ b = b.
(c): x−1 ◦ x = e holds according to (b) and shows that x is the inverse to x−1 . Thus,
(x−1 )−1 = x as claimed.

(d) is due to y −1 ◦ x−1 ◦ x ◦ y = y −1 ◦ e ◦ y = e.


(e): If x ◦ a = y ◦ a, then x = x ◦ a ◦ a−1 = y ◦ a ◦ a−1 = y as claimed; if a ◦ x = a ◦ y,
then x = a−1 ◦ a ◦ x = a−1 ◦ a ◦ y = y as well. 

Notation 4.7. Exponentiation with Integer Exponents: Let (A, ·) be a magma with neutral element 1 ∈ A. Define recursively for each x ∈ A and each n ∈ N0:
x⁰ := 1, ∀ n ∈ N0 : x^{n+1} := x · xⁿ. (4.5a)
Moreover, if (A, ·) constitutes a group, then also define for each x ∈ A and each n ∈ N:
x^{−n} := (x⁻¹)ⁿ. (4.5b)

Theorem 4.8. Exponentiation Rules: Let (G, ·) be a semigroup with neutral element 1 ∈ G. Let x, y ∈ G. Then the following rules hold for each m, n ∈ N0. If (G, ·) is a group, then the rules even hold for every m, n ∈ Z.
(a) x^{m+n} = x^m · x^n.
(b) (x^m)^n = x^{mn}.
(c) If the composition is also commutative, then x^n y^n = (xy)^n.

Proof. (a): First, we fix n ∈ N0 and prove the statement for each m ∈ N0 by induction: The base case (m = 0) is x^n = x^n, which is true. For the induction step, we compute
x^{m+1+n} = x · x^{m+n} = x · x^m · x^n = x^{m+1} x^n,
using (4.5a), the induction hypothesis, and (4.5a) again, completing the induction step. Now assume G to be a group. Consider m ≥ 0 and n < 0. If m + n ≥ 0, then, using (4.5b) and what we have already shown,
x^m x^n = x^m (x⁻¹)^{−n} = x^{m+n} x^{−n} (x⁻¹)^{−n} = x^{m+n}.
Similarly, if m + n < 0, then
x^m x^n = x^m (x⁻¹)^{−n} = x^m (x⁻¹)^m (x⁻¹)^{−n−m} = x^{m+n}.
The case m < 0, n ≥ 0 is treated completely analogously. It just remains to consider m < 0 and n < 0. In this case,
x^{m+n} = x^{−(−m−n)} = (x⁻¹)^{−m−n} = (x⁻¹)^{−m} · (x⁻¹)^{−n} = x^m · x^n.

(b): First, we prove the statement for each n ∈ N0 by induction (for m < 0, we assume G to be a group): The base case (n = 0) is (x^m)⁰ = 1 = x⁰, which is true. For the induction step, we compute
(x^m)^{n+1} = x^m · (x^m)^n = x^m · x^{mn} = x^{mn+m} = x^{m(n+1)},
using (4.5a), the induction hypothesis, and (a), completing the induction step. Now, let G be a group and n < 0. We already know (x^m)⁻¹ = x^{−m}. Thus, using what we have already shown,
(x^m)^n = ((x^m)⁻¹)^{−n} = (x^{−m})^{−n} = x^{(−m)(−n)} = x^{mn}.
(c): For n ∈ N0, the statement is proved by induction: The base case (n = 0) is x⁰ y⁰ = 1 = (xy)⁰, which is true. For the induction step, we compute
x^{n+1} y^{n+1} = x · x^n · y · y^n = xy · (xy)^n = (xy)^{n+1},
using (4.5a), the induction hypothesis, and commutativity, completing the induction step. If G is a group and n < 0, then, using (4.5b), Th. 4.6(d), and what we have already shown,
x^n y^n = (x⁻¹)^{−n} (y⁻¹)^{−n} = (x⁻¹ y⁻¹)^{−n} = ((xy)⁻¹)^{−n} = (xy)^n,
which completes the proof.


Example 4.9. (a) Let M be a set. Then (P(M ), ∩) and (P(M ), ∪) are magmas with
neutral elements, where M is neutral for ∩ and ∅ is neutral for ∪. From Th. 1.29,
we also know ∩ and ∪ to be associative and commutative, showing (P(M ), ∩) and
(P(M), ∪) to form abelian semigroups. However, if M ≠ ∅, then neither structure forms a group, due to the lack of inverse elements: If N ⊊ M, then there does not exist B ⊆ M such that N ∩ B = M; if ∅ ≠ N ⊆ M, then there does not exist
B ⊆ M such that N ∪ B = ∅. If M = ∅, then P(M ) = {∅} and (P(M ), ∩) and
(P(M ), ∪) both constitute the group with one element (∅ being inverse to itself in
both cases).

(b) Let M be a set and define



SM := {(π : M −→ M) : π is bijective}, (4.6)

where the elements of SM are also called permutations on M . Then (SM , ◦), where
◦ is given by composition of maps according to Def. 2.8, forms a group, the so-called
symmetric group or permutation group on M . We verify (SM , ◦) to be a group: ◦
is well-defined on SM , since the composition of bijective maps is bijective by Th.
2.14. Moreover, ◦ is associative by Th. 2.10(a). The neutral element is given by the

identity map Id : M −→ M , Id(a) := a, and, if π ∈ SM , then its inverse map π^{−1}


is also its inverse element in the group SM . We claim that ◦ is commutative on SM
if, and only if, #M ≤ 2: While commutativity is clear for #M ≤ 2, if {a, b, c} ⊆ M
with #{a, b, c} = 3 and we define f, g ∈ SM by letting

f (x) := { a for x = a;  c for x = b;  b for x = c;  x for x ∉ {a, b, c} },
g(x) := { b for x = a;  a for x = b;  c for x = c;  x for x ∉ {a, b, c} }.   (4.7)

Then (f ◦ g)(a) = c ≠ b = (g ◦ f )(a), showing ◦ is not commutative (see also the computational sketch after this example).


The case M = {1, . . . , n}, n ∈ N, is often of particular interest and, in this case,
one also writes Sn for SM .

(c) (N0 , +) and (N0 , ·), where + and · denote the usual addition and multiplication on
N0 , respectively (see [Phi16, Def. D.1] for a definition via recursion), form commu-
tative semigroups (see [Phi16, Th. D.2]), but no groups due to the lack of inverse
elements. From N0 , one can then construct the sets of integers Z (see Ex. 2.36(a)),
rational numbers Q (see Ex. 2.36(b)), real numbers R (see [Phi16, Def. D.32]), and
complex numbers C (see [Phi16, Def. 5.1]). One can extend + and · to each of these
sets (see Th. 4.15, Ex. 4.33, and Ex. 4.39 below, [Phi16, Def. D.32], and [Phi16,
Def. 5.1]) and show that (Z, +), (Q, +), (R, +), (C, +) are commutative groups (see
Th. 4.15, Ex. 4.33(b), and Ex. 4.39 below, [Phi16, Th. D.35], and [Phi16, Th. 5.2])
as well as (Q \ {0}, ·), (R \ {0}, ·), (C \ {0}, ·) (see Ex. 4.39 below, [Phi16, Th. D.35],
and [Phi16, Th. 5.2]).

(d) Let (A, ◦) be a magma. We define another magma (P(A), ◦) by letting

∀ B, C ∈ P(A) :   B ◦ C := {b ◦ c : b ∈ B ∧ c ∈ C} ⊆ A.   (4.8)

The case where B or C is a singleton set is of particular interest, and one then uses
the following simplified notation:
 
∀ B ∈ P(A)  ∀ a ∈ A :   (a ◦ B := {a} ◦ B,   B ◦ a := B ◦ {a}).   (4.9)

Sets of the form a ◦ B are called left cosets, sets of the form B ◦ a are called right
cosets. If (A, ◦) is commutative, then B ◦ C = C ◦ B for each B, C ∈ P(A), since
a = b ◦ c with b ∈ B, c ∈ C if, and only if, a = c ◦ b. In the same manner, one sees
that, if ◦ is associative on A, then it is also associative on P(A). If e ∈ A is left
(resp. right) neutral, then, clearly, {e} is left (resp. right) neutral for ◦ on P(A). In

general, one cannot expect (P(A), ◦) to be a group, even if (A, ◦) is a group, due
to the lack of inverse elements: While (Z, +) is a group, (P(Z), +) is
not: For example, B := {0, 1} has no inverse: If C were inverse to B, then −1 ∈ C,
implying −1 = 0 + (−1) ∈ B + C, i.e. B + C ≠ {0}, in contradiction to C being
inverse to B.
(e) Let (A, ·) be a magma and let M be a set. We consider the set F(M, A) = A^M of
functions mapping M into A. We make A^M into a magma by defining · pointwise
on A^M : Define

∀ f, g ∈ A^M :   f · g : M −→ A,   (f · g)(x) := f (x) · g(x).   (4.10)

We then have

(A, ·) commutative  ⇒  (A^M , ·) commutative,   (4.11a)
(A, ·) associative  ⇒  (A^M , ·) associative,   (4.11b)
e ∈ A left/right neutral  ⇒  f ∈ A^M , f ≡ e, left/right neutral,   (4.11c)
(A, ·) group  ⇒  (A^M , ·) group,   (4.11d)

where (4.11a) – (4.11c) are immediate from (4.10). To verify (4.11d), let (A, ·) be
a group with e ∈ A being neutral. If f ∈ A^M , then let

g : M −→ A,   g(x) := (f (x))^{−1}.

Then
∀ x ∈ M :   (f · g)(x) = f (x) · (f (x))^{−1} = e,

showing g to be inverse to f in A^M , i.e. (A^M , ·) forms a group.
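As announced in (b), the non-commutativity of SM for #M ≥ 3 can also be checked mechanically. The following Python sketch is not part of the original notes; permutations of M = {a, b, c} are modelled as dictionaries and composed per Def. 2.8, (f ◦ g)(x) = f (g(x)).

    def compose(f, g):
        return {x: f[g[x]] for x in g}

    f = {'a': 'a', 'b': 'c', 'c': 'b'}   # the map f of (4.7)
    g = {'a': 'b', 'b': 'a', 'c': 'c'}   # the map g of (4.7)

    print(compose(f, g)['a'], compose(g, f)['a'])   # prints: c b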


The, perhaps, most important notion when studying structures is that of the composi-
tion-respecting map. The technical term for such a map is homomorphism, which we
define next:
Definition 4.10. Let (A, ◦) and (B, ◦) be magmas (caveat: to simplify notation, we
denote both compositions by ◦; however, they might even denote different compositions
on the same set A = B). A map φ : A −→ B is called homomorphism if, and only if,

∀ x, y ∈ A :   φ(x ◦ y) = φ(x) ◦ φ(y).   (4.12)

If φ is a homomorphism, then it is called monomorphism if, and only if, it is injective;


epimorphism if, and only if, it is surjective; isomorphism if, and only if, it is bijective;

endomorphism if, and only if, (A, ◦) = (B, ◦); automorphism if, and only if, it is both
endomorphism and isomorphism. Moreover, (A, ◦) and (B, ◦) are called isomorphic
(denoted (A, ◦) ≅ (B, ◦)) if, and only if, there exists an isomorphism φ : A −→ B. If
A and B have neutral elements e ∈ A, e′ ∈ B, then a homomorphism φ : A −→ B is
called unital if, and only if, φ(e) = e′ .

If (A, ◦) and (B, ◦) are isomorphic, then, from an algebraic perspective, the two struc-
tures are the same, except that the elements of the underlying set have been “renamed”
(where the isomorphism provides the assignment rule for obtaining the new names).

Proposition 4.11. Let (A, ◦), (B, ◦), (C, ◦) be magmas.

(a) If φ : A −→ B and ψ : B −→ C are homomorphisms, then so is ψ ◦ φ. If φ, ψ are


unital, so is ψ ◦ φ.

(b) If φ : A −→ B is an isomorphism, then φ^{−1} is an isomorphism as well.

(c) Let φ : A −→ B be an epimorphism. Then the following implications hold:

(A, ◦) commutative ⇒ (B, ◦) commutative,


(A, ◦) semigroup ⇒ (B, ◦) semigroup,
(A, ◦) group ⇒ (B, ◦) group.

If φ is even an isomorphism, then the above implications become equivalences.

(d) Let φ : A −→ B be an epimorphism. If e ∈ A is neutral, then B contains a neutral


element and φ is unital.

(e) If (A, ◦), (B, ◦) are groups and φ : A −→ B is a homomorphism, then φ is unital
and φ(a^{−1}) = (φ(a))^{−1} for each a ∈ A.

Proof. (a): If φ and ψ are homomorphisms and x, y ∈ A, then



(ψ ◦ φ)(x ◦ y) = ψ(φ(x) ◦ φ(y)) = ψ(φ(x)) ◦ ψ(φ(y)),

showing ψ ◦ φ to be a homomorphism. If e ∈ A, e′ ∈ B, e′′ ∈ C are neutral and φ and


ψ are unital, then
(ψ ◦ φ)(e) = ψ(e′ ) = e′′ ,
showing ψ ◦ φ to be unital.

(b): If φ is an isomorphism and x, y ∈ B, then


    
φ^{−1}(x ◦ y) = φ^{−1}(φ(φ^{−1}(x)) ◦ φ(φ^{−1}(y))) = φ^{−1}(φ(φ^{−1}(x) ◦ φ^{−1}(y)))
= φ^{−1}(x) ◦ φ^{−1}(y),

showing φ^{−1} to be an isomorphism.


(c): Exercise.
(d): It suffices to show that e′ := φ(e) is neutral. Let b ∈ B and a ∈ A such that φ(a) = b.
Then

b ◦ e′ = φ(a) ◦ φ(e) = φ(a ◦ e) = φ(a) = b,


e′ ◦ b = φ(e) ◦ φ(a) = φ(e ◦ a) = φ(a) = b,

proving e′ to be neutral as desired.


(e): Let (A, ◦), (B, ◦) be groups with neutral elements e ∈ A, e′ ∈ B, and let φ : A −→ B
be a homomorphism. Then

φ(e) ◦ e′ = φ(e) = φ(e ◦ e) = φ(e) ◦ φ(e).

Applying (φ(e))^{−1} to both sides of the above equality proves φ(e) = e′. Moreover, for
each a ∈ A,
φ(a^{−1}) ◦ φ(a) = φ(a^{−1} ◦ a) = φ(e) = e′,
proving (e). 
Example 4.12. (a) If (A, ◦) is a magma, then it is immediate that the identity Id :
A −→ A is an automorphism.

(b) If (A, ◦), (B, ◦) are magmas, where e ∈ B is neutral, then the constant map φ ≡ e
is a homomorphism: Indeed,

∀ x, y ∈ A :   φ(x ◦ y) = e = e ◦ e = φ(x) ◦ φ(y).

(c) If (A, ·) is a semigroup with neutral element 1 ∈ A, then, for each fixed a ∈ A,

φa : N0 −→ A,   φa(n) := a^n,   (4.13)

is a homomorphism from (N0 , +) into (A, ·); if (A, ·) is a group, we may extend φa
to Z and it becomes a homomorphism from (Z, +) into (A, ·): These statements are
immediate from Th. 4.8(a). Note that, if A = N0 , Z, Q, R, C and (A, ·) = (A, +),
then φa (n) = na.

(d) If (G, ·) is an abelian group, then for each fixed k ∈ Z,

φk : G −→ G,   φk(a) := a^k,   (4.14)

is an endomorphism due to Th. 4.8(c) (the case k = −1, where a ↦ a^{−1}, is of
particular interest). Note that, if G = Z, Q, R, C and (G, ·) = (G, +), then
φk(a) = ka.

Notation 4.13. A composition ◦ on a nonempty finite set A can be defined by means


of a composition table (also known as a multiplication table or Cayley table): For each
element a ∈ A, the table has a row labeled by a and it has a column labeled by a; if
a, b ∈ A, then the entry at the intersection of the row labeled a and the column labeled
b is a ◦ b (see Ex. 4.14 below for examples).

Example 4.14. (a) The most trivial groups are the group with one element and the
group with two elements, which can be defined by the following Cayley tables (where
e, a are distinct elements):

◦ e          ◦ e a
e e          e e a
             a a e

Both structures are abelian groups, which is immediate for the group with one
element and is easily verified for the group with two elements. We will see later
that they are isomorphic to the groups we will denote Z1 and Z2 (which, in light of
Prop. 4.11(c), shows again that the structure with two elements is a group).

(b) The following examples show that, in non-groups, left and right neutral elements
need not be unique: For a ≠ b, let

◦ a b        ◦ a b
a a b        a a a
b a b        b b b

One easily verifies these structures to be associative, but they are not groups: The
magma on the left has no right neutral element, the magma on the right does not
have inverses to both a and b.
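A Cayley table as in Notation 4.13 also lends itself to brute-force verification of the group axioms. The Python sketch below is not from the notes (the names table, op, etc. are ad hoc); it encodes the two-element table of (a) and confirms associativity, the neutral element, and inverses.

    table = {('e', 'e'): 'e', ('e', 'a'): 'a', ('a', 'e'): 'a', ('a', 'a'): 'e'}
    elems = {'e', 'a'}
    op = lambda x, y: table[(x, y)]

    assoc = all(op(op(x, y), z) == op(x, op(y, z))
                for x in elems for y in elems for z in elems)
    neutral = [e for e in elems if all(op(e, x) == x == op(x, e) for x in elems)]
    has_inv = all(any(op(x, y) == neutral[0] == op(y, x) for y in elems)
                  for x in elems)
    print(assoc, neutral, has_inv)   # prints: True ['e'] True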

Theorem 4.15. Let (A, +) be an abelian semigroup with neutral element 0 ∈ A and
assume (A, +) to satisfy the cancellation laws (4.4) (with ◦ replaced by +). Consider
the quotient set

G := (A × A)/∼ = {[(a, b)] : (a, b) ∈ A × A}   (4.15)

with respect to the equivalence relation defined by

(a, b) ∼ (c, d) :⇔ a + d = b + c. (4.16)

Introducing the simplified notation [a, b] := [(a, b)], we define addition and subtraction
on G by

+ : G × G −→ G,   ([a, b], [c, d]) ↦ [a, b] + [c, d] := [a + c, b + d],   (4.17a)
− : G × G −→ G,   ([a, b], [c, d]) ↦ [a, b] − [c, d] := [a, b] + [d, c].   (4.17b)

Then addition and subtraction on G are well-defined, (G, +) forms an abelian group,
where [0, 0] is the neutral element, [b, a] is the inverse element of [a, b] for each a, b ∈ A,
and, denoting the inverse element of [a, b] by −[a, b] in the usual way, [a, b] − [c, d] =
[a, b] + (−[c, d]) for each a, b, c, d ∈ A. Moreover, the map

ι : A −→ G, ι(a) := [a, 0], (4.18)

is a monomorphism, where it is customary to identify A with ι(A). One then just writes
a instead of [a, 0] and −a instead of [0, a] = −[a, 0]. The most important application is
the case A = N0 , which yields that addition on Z = G forms an abelian group (cf. Ex.
2.36(a), [Phi16, Th. D.2], [Phi16, Th. D.7(b)]).

Proof. If one reexamines Ex. 2.36(a), replacing N0 by A and Z by G, then one sees it
shows ∼ to constitute an equivalence relation, where the proof makes use of commuta-
tivity, associativity, and the cancellation law. To verify + and − are well-defined on G,
we need to check the above definitions do not depend on the chosen representatives, i.e.
 
∀ a, b, c, d, ã, b̃, c̃, d̃ ∈ A :
  ([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃])  ⇒  [a + c, b + d] = [ã + c̃, b̃ + d̃]   (4.19a)

and

∀ a, b, c, d, ã, b̃, c̃, d̃ ∈ A :
  ([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃])  ⇒  [a, b] − [c, d] = [ã, b̃] − [c̃, d̃].   (4.19b)

Indeed, [a, b] = [ã, b̃] means a + b̃ = b + ã, [c, d] = [c̃, d̃] means c + d̃ = d + c̃, implying
a + c + b̃ + d̃ = b + d + ã + c̃, i.e. [a + c, b + d] = [ã + c̃, b̃ + d̃], proving (4.19a). Now (4.19b)
is just (4.17b) combined with (4.19a). To verify commutativity and associativity on G,
let a, b, c, d, e, f ∈ A. Then

[a, b] + [c, d] = [a + c, b + d] = [c + a, d + b] = [c, d] + [a, b],



proving commutativity, and



[a, b] + ([c, d] + [e, f ]) = [a, b] + [c + e, d + f ] = [a + (c + e), b + (d + f )]
= [(a + c) + e, (b + d) + f ] = [a + c, b + d] + [e, f ] = ([a, b] + [c, d]) + [e, f ],

proving associativity. For every a, b ∈ A, one obtains [a, b] + [0, 0] = [a + 0, b + 0] = [a, b],
proving neutrality of [0, 0], whereas [a, b] + [b, a] = [a + b, b + a] = [a + b, a + b] = [0, 0]
(since (a + b, a + b) ∼ (0, 0)) shows [b, a] = −[a, b]. Now [a, b] − [c, d] = [a, b] + (−[c, d])
is immediate from (4.17b). The map ι is injective, as, for each a, b ∈ A, ι(a) = [a, 0] =
ι(b) = [b, 0] implies a + 0 = 0 + b, i.e. a = b. The map ι is a homomorphism, as

∀ a, b ∈ A :   ι(a + b) = [a + b, 0] = [a, 0] + [b, 0] = ι(a) + ι(b),

completing the proof of the theorem. 

Definition 4.16. Let (G, ◦) be a group, ∅ ≠ U ⊆ G. We call U a subgroup of G


(denoted U ≤ G) if, and only if, (U, ◦) forms a group, where the composition on U is
the restriction of the composition on G, i.e. ◦↾U ×U .

Theorem 4.17. Let (G, ◦) be a group, ∅ ≠ U ⊆ G. Then U ≤ G if, and only if, (i)
and (ii) hold, where

(i) For each u, v ∈ U , one has u ◦ v ∈ U .

(ii) For each u ∈ U , one has u^{−1} ∈ U .

If U is finite, then U ≤ G is already equivalent to (i).

Proof. Exercise. 
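Although the proof is an exercise, the finite case of Th. 4.17 is easy to probe computationally. A Python sketch (not from the notes; is_subgroup is an ad hoc name), using addition modulo 6 on {0, . . . , 5} (cf. Ex. 4.28(a) below):

    def is_subgroup(U, G, op):
        """Finite test per Th. 4.17: nonempty, U a subset of G, closed under op."""
        return bool(U) and U <= G and all(op(u, v) in U for u in U for v in U)

    G6 = set(range(6))
    add6 = lambda r, s: (r + s) % 6
    print(is_subgroup({0, 2, 4}, G6, add6))   # True
    print(is_subgroup({0, 2, 3}, G6, add6))   # False: 2 + 3 = 5 is not in U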

Example 4.18. (a) If (G, ◦) is an arbitrary group with neutral element e ∈ G, then it
is immediate from Th. 4.17, that {e} and G are always subgroups.

(b) We use Th. 4.17 to verify that, for each k ∈ Z, kZ = {kl : l ∈ Z} is a subgroup of
(Z, +): As 0 ∈ kZ, kZ ≠ ∅. If l1 , l2 ∈ Z, then kl1 + kl2 = k(l1 + l2 ) ∈ kZ. If l ∈ Z,
then −kl = k(−l) ∈ kZ. Thus, Th. 4.17 applies, showing kZ ≤ Z.

(c) Since N is not a subgroup of Z, even though Th. 4.17(i) holds for N, we see that, in
general, Th. 4.17(ii) cannot be omitted for infinite subsets.

(d) If (G, ◦) is a group with neutral element e ∈ G, I ≠ ∅ is an index set, and (Ui )i∈I
is a family of subgroups, then U := ∩_{i∈I} Ui is a subgroup as well: Indeed, e ∈ U ,
since e ∈ Ui for each i ∈ I. If a, b ∈ U , then a, b ∈ Ui for each i ∈ I, implying
a ◦ b ∈ Ui for each i ∈ I, i.e. a ◦ b ∈ U . Moreover a^{−1} ∈ Ui for each i ∈ I as well,
showing a^{−1} ∈ U . Thus, U is a subgroup by Th. 4.17.

(e) In contrast to intersections, unions of subgroups are not necessarily subgroups (how-
ever, cf. (f) below): We know from (b) that 2Z and 3Z are subgroups of (Z, +).
However, 2 ∈ 2Z, 3 ∈ 3Z, but 5 = 2 + 3 ∉ (2Z) ∪ (3Z), showing (2Z) ∪ (3Z) is not
a subgroup of (Z, +).

(f ) While (e) shows that unions of subgroups are not necessarily subgroups, if (G, ◦) is
a group with neutral element e ∈ G, I ≠ ∅ is an index set, partially ordered by ≤
in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally
ordered by ≤, then one can use k := max{i, j}), and (Ui )i∈I is an increasing family
of subgroups (i.e., for each i, j ∈ I with i ≤ j, one has Ui ⊆ Uj ), then U := ∪_{i∈I} Ui
is a subgroup as well: Indeed, if i ∈ I, then e ∈ Ui ⊆ U . If a, b ∈ U , then there exist
i, j ∈ I such that a ∈ Ui and b ∈ Uj . If k ∈ I with i, j ≤ k, then a ◦ b ∈ Uk ⊆ U .
Moreover, a^{−1} ∈ Ui ⊆ U . Thus, U is a subgroup by Th. 4.17.

(g) One can show that every group G is isomorphic to a subgroup of the permutation
group SG (see Sec. C of the Appendix, in particular, Cayley’s Th. C.2).
Definition 4.19. Let G and H be groups and let φ : G −→ H be a homomorphism.
Let e′ ∈ H be neutral. Define the sets

ker φ := φ^{−1}({e′}) = {a ∈ G : φ(a) = e′},   (4.20a)


Im φ := φ(G) = {φ(a) : a ∈ G}, (4.20b)

where ker φ is called the kernel of φ and Im φ is called the image of φ.


Theorem 4.20. Let (G, ·) and (H, ·) be groups and let φ : G −→ H be a homomorphism.
Let e ∈ G and e′ ∈ H be the respective neutral elements. Also assume U ≤ G, V ≤ H.
Then the following statements hold true:

(a) φ^{−1}(V ) ≤ G.

(b) ker φ ≤ G.

(c) Im φ ≤ H.

(d) φ(U ) ≤ H.

(e) φ is a monomorphism (i.e. injective) if, and only if, ker φ = {e}.

(f ) It holds true that



∀ a ∈ G :   φ^{−1}({φ(a)}) = a(ker φ) = (ker φ)a,

i.e. the nonempty preimages of elements of H are cosets of the kernel of φ (cf. Ex.
4.9(d)).

Proof. (a): Since V ≤ H, we have e′ ∈ V . From Prop. 4.11(e), we know φ(e) = e′ , i.e.
e ∈ φ^{−1}(V ) and φ^{−1}(V ) ≠ ∅. If a, b ∈ φ^{−1}(V ), then φ(a), φ(b) ∈ V and, since V ≤ H,

φ(ab) = φ(a)φ(b) ∈ V,
φ(a^{−1}) = (φ(a))^{−1} ∈ V   (by Prop. 4.11(e)),

showing ab ∈ φ^{−1}(V ) and a^{−1} ∈ φ^{−1}(V ). Thus, φ^{−1}(V ) ≤ G by Th. 4.17.


(b) is now immediate by applying (a) with V := {e′ } ≤ H.
(c): As φ(e) = e′, we have e′ ∈ Im φ and Im φ ≠ ∅. If x, y ∈ Im φ, then there exist
a, b ∈ G such that φ(a) = x, φ(b) = y. Thus,

φ(ab) = φ(a)φ(b) = xy,


φ(a^{−1}) = (φ(a))^{−1} = x^{−1},

showing xy ∈ Im φ and x^{−1} ∈ Im φ. Once again, Im φ ≤ H now follows from Th. 4.17.
(d) is now immediate by applying (c) with φ replaced by φ↾U (then φ(U ) = Im φ↾U ≤ H).
(e): If φ is injective, then e is the unique preimage of e′ , showing ker φ = {e}. Conversely,
assume ker φ = {e}. If a, b ∈ G with φ(a) = φ(b), then

φ(a^{−1}b) = (φ(a))^{−1} φ(b) = e′,

showing a^{−1}b ∈ ker φ, i.e. a^{−1}b = e, i.e. b = a, and φ is injective.



(f): Fix a ∈ G. Suppose b ∈ φ^{−1}({φ(a)}). Then φ(a) = φ(b), implying φ(a^{−1}b) =
(φ(a))^{−1} φ(b) = e′ and a^{−1}b ∈ ker φ. Thus, b ∈ a(ker φ) and φ^{−1}({φ(a)}) ⊆ a(ker φ).
Similarly, φ(ba^{−1}) = φ(b)(φ(a))^{−1} = e′, showing b ∈ (ker φ)a and φ^{−1}({φ(a)}) ⊆
(ker φ)a. Conversely, suppose b ∈ a(ker φ) (resp. b ∈ (ker φ)a). Then a^{−1}b ∈ ker φ
(resp. ba^{−1} ∈ ker φ), implying

e′ = φ(a^{−1}b) = φ(a^{−1})φ(b)   (resp. e′ = φ(ba^{−1}) = φ(b)φ(a^{−1})).

In both cases, we conclude φ(a) = φ(b), i.e. b ∈ φ^{−1}({φ(a)}). Thus a(ker φ) ⊆
φ^{−1}({φ(a)}) as well as (ker φ)a ⊆ φ^{−1}({φ(a)}), thereby establishing the case. □

Example 4.21. According to Ex. 4.12(c), if k ∈ Z, then the map φk : Z −→ Z,


φk (l) := kl, is a homomorphism of (Z, +) into itself. Clearly, Im φk = kZ, showing, once
again, kZ ≤ Z (cf. Ex. 4.18(b)). For another example, let G := {e, a} be the group
with two elements of Ex. 4.14(a). According to Ex. 4.12(c), the map φ : Z −→ G,
φ(n) := a^n, is a homomorphism. Clearly, φ is an epimorphism with ker φ = 2Z (i.e. the
set of all even numbers).
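Kernels and images of concrete homomorphisms can likewise be computed by enumeration. A Python sketch (not part of the notes): for φ on {0, . . . , 11} with φ(r) := 4r, a homomorphism with respect to addition modulo 12 (cf. Ex. 4.28(a) below), one recovers the subgroups predicted by Th. 4.20(b),(c).

    n = 12
    phi = lambda r: (4 * r) % n
    ker = sorted(r for r in range(n) if phi(r) == 0)
    img = sorted({phi(r) for r in range(n)})
    print(ker)   # [0, 3, 6, 9] -- the multiples of 3
    print(img)   # [0, 4, 8]    -- the multiples of 4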

In Ex. 4.9(d), we have defined cosets for a given subset of a magma. For a group G,
cosets of subgroups are of particular interest: We will see in Prop. 4.22 that, if U ≤ G,
then the cosets of U form a partition of G, giving rise to an equivalence relation on G
according to Th. 2.33(a). Thus, the cosets form the quotient set with respect to this
equivalence relation and, if the subgroup is sufficiently benign (namely what we will call
a normal subgroup in Def. 4.23 below), then we can make this quotient set itself into
another group, the so-called quotient group or factor group G/U (cf. Def. 4.26 and Th.
4.27(a) below).
Proposition 4.22. Let (G, ·) be a group, U ≤ G. Then
G = ∪_{a∈G} aU,   ∀ a, b ∈ G : (aU = bU ∨ aU ∩ bU = ∅),   (4.21a)
G = ∪_{a∈G} U a,   ∀ a, b ∈ G : (U a = U b ∨ U a ∩ U b = ∅),   (4.21b)

i.e. both the left cosets and the right cosets of U form decompositions of G.

Proof. We conduct the proof for left cosets – the proof for right cosets is completely
analogous. For each a ∈ G, we have a ∈ aU (since e ∈ U ), already showing G =
∪_{a∈G} aU . Now let a, b, x ∈ G with x ∈ aU ∩ bU . We need to prove aU = bU . From
x ∈ aU ∩ bU , we obtain
∃ u1 , u2 ∈ U :   x = au1 = bu2 .   (4.22)

Now let au ∈ aU (i.e. u ∈ U ). As (4.22) implies a = bu2 u1^{−1}, we obtain au =
bu2 u1^{−1} u ∈ bU , using u2 u1^{−1} u ∈ U , due to U being a subgroup. This shows aU ⊆ bU .
Analogously, if bu ∈ bU , then bu = au1 u2^{−1} u ∈ aU , where we used b = au1 u2^{−1} due to
(4.22) and u1 u2^{−1} u ∈ U . This shows bU ⊆ aU , completing the proof. □
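Prop. 4.22 can be observed directly in a small example. The following Python sketch (not from the notes) lists the left cosets of the subgroup U = {0, 3} of ({0, . . . , 5}, + mod 6) and exhibits the resulting partition:

    n, U = 6, {0, 3}
    cosets = {frozenset((a + u) % n for u in U) for a in range(n)}
    print(sorted(sorted(c) for c in cosets))   # [[0, 3], [1, 4], [2, 5]]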
Definition 4.23. Let (G, ·) be a group with subgroup U ≤ G. Then U is called normal
if, and only if,
∀ a ∈ G :   aU = U a,   (4.23)

i.e. if, and only if, left and right cosets with respect to U are the same.

Theorem 4.24. Let (G, ·) and (H, ·) be groups and let φ : G −→ H be a homomorphism.

(a) If V is a normal subgroup of H, then U := φ−1 (V ) is a normal subgroup of G.

(b) ker φ is a normal subgroup of G.

Proof. (a): As we already know from Th. 4.20(a) that U is a subgroup of G, it merely
remains to show U is normal. If a ∈ G and u ∈ U , then φ(u) ∈ V . Since V is normal,
we have φ(a)V = V φ(a). Thus, there exists v ∈ V such that φ(a)φ(u) = vφ(a), namely

v = φ(a)φ(u)(φ(a))^{−1} = φ(a)φ(u)φ(a^{−1}) = φ(aua^{−1}),

showing u′ := aua^{−1} ∈ U . Thus, au = aua^{−1}a = u′a, proving aU = U a, as desired.


(b) is now immediate by applying (a) with V := {e}, where e denotes the neutral
element of H. Alternatively, one also obtains ker φ to be a normal subgroup of G as a
direct consequence of Th. 4.20(f). 
Example 4.25. (a) If G is an abelian group, then every subgroup of G is normal.

(b) In Ex. 4.9(b), we defined the symmetric groups S2 and S3 . Clearly, the map
φ : S2 −→ S3 ,   f ↦ φ(f ),   φ(f )(n) := { f (n) for n ∈ {1, 2};  3 for n = 3 },

constitutes a monomorphism. By identifying S2 with φ(S2 ), we may consider S2 as


a subgroup of S3 . We claim that S2 is not a normal subgroup of S3 . Indeed, one
checks that

(23)S2 = (23){Id, (12)} = {(23), (132)} ≠ S2 (23) = {(23), (123)},

where we made use of a notation, commonly used for permutations (we will need to
study it more thoroughly later): (12) is the map that permutes 1 and 2 and leaves
3 fixed, (23) permutes 2 and 3 and leaves 1 fixed, (132) maps 1 into 3, 3 into 2, 2
into 1, (123) maps 1 into 2, 2 into 3, 3 into 1.
Definition 4.26. Let (G, ·) be a group and let N ≤ G be a normal subgroup. According
to Prop. 4.22, the cosets of N form a partition of G. According to Th. 2.33(a), the cosets
are precisely the equivalence classes of the equivalence relation ∼ on G, defined by

a ∼ b :⇔ aN = bN.

Define
G/N := G/ ∼,

i.e. G/N is the set of all cosets of N . We define a composition on G/N by letting

∀ a, b ∈ G :   (aN ) · (bN ) := abN   (4.24)

(note that we already know the forming of cosets to be associative by Ex. 4.9(d)). We
call (G/N, ·) the quotient group of G by N or the factor group of G with respect to N
(cf. Th. 4.27(a) below).
Theorem 4.27. (a) Let (G, ·) be a group and let N ≤ G be a normal subgroup. Then
(4.24) well-defines a composition on G/N that makes G/N into a group with neutral
element N . Moreover, the map

φN : G −→ G/N, φN (a) := aN, (4.25)

is an epimorphism, called the canonical or natural homomorphism from G onto


G/N (comparing with Def. 2.32(b), we see that φN is precisely the quotient map
with respect to the equivalence relation ∼ of Def. 4.26).

(b) Isomorphism Theorem: If (G, ·) and (H, ·) are groups and φ : G −→ H is a


homomorphism, then
G/ ker φ ≅ Im φ   (4.26)
(recall from Th. 4.24(b) that ker φ is always a normal subgroup of G). More pre-
cisely, the map
f : G/ ker φ −→ Im φ, f (a ker φ) := φ(a), (4.27)
is well-defined and constitutes an isomorphism. If fe : G −→ G/ ker φ denotes the
natural epimorphism according to (a) and ι : Im φ −→ H, ι(a) := a, denotes the
embedding, then fm : G/ ker φ −→ H, fm := ι ◦ f , is a monomorphism such that

φ = fm ◦ fe . (4.28)

Proof. (a): To see that (4.24) well-defines a composition on G/N , we need to show
that the definition is independent of the chosen coset representatives a, b: To this end,
suppose a, b, a′ , b′ ∈ G are such that aN = a′ N and bN = b′ N . We need to show
abN = a′ b′ N . There exist na , nb ∈ N such that a′ = ana , b′ = bnb . Since N is a normal
subgroup of G, we have bN = N b and there exists n ∈ N such that na b = bn. Then, as
nnb N = N , we obtain

a′ b′ N = ana bnb N = abnnb N = abN,

as needed. That φN is surjective is immediate from (4.25). Moreover, the computation

∀ a, b ∈ G :   φN (ab) = abN = (aN ) · (bN ) = φN (a) · φN (b)

verifies φN to be a homomorphism. That (G/N, ·) forms a group is now an immediate


consequence of Prop. 4.11(c),(d).
(b): We conduct the proof by showing f (a ker φ) does not depend on the chosen coset
representative a ∈ G: Let e′ ∈ H be neutral and suppose a′ ∈ G is such that a′ ker φ =
a ker φ. Then there exists x ∈ ker φ such that a′ = ax, implying

f (a′ ker φ) = φ(a′ ) = φ(ax) = φ(a)φ(x) = φ(a) · e′ = φ(a) = f (a ker φ),

as desired. To prove f to be a homomorphism, let a, b ∈ G. Then



f ((a ker φ) · (b ker φ)) = f (ab ker φ) = φ(ab) = φ(a)φ(b) = f (a ker φ)f (b ker φ),

i.e. f is a homomorphism. If x ∈ Im φ, then there exists a ∈ G with x = φ(a).


Thus, f (a ker φ) = φ(a) = x, showing f to be surjective. Now suppose a ∈ G is such
that f (a ker φ) = e′ . Then φ(a) = e′ , i.e. a ∈ ker φ. Thus, a ker φ = ker φ, showing
ker f = {ker φ}, i.e. f is injective, completing the proof that f is an isomorphism. Since
f and ι are monomorphisms, so is fm = ι ◦ f . Moreover, if a ∈ G, then

(fm ◦ fe )(a) = fm (a ker φ) = f (a ker φ) = φ(a),

thereby proving (4.28). 

Example 4.28. (a) Consider (Z, +) and let n ∈ N. We know from Ex. 4.18(b) that
nZ ≤ Z. As (Z, +) is also commutative, nZ is a normal subgroup of Z and we can
form the quotient group
Zn := Z/nZ. (4.29)
The elements of Zn are the cosets r + nZ, r ∈ Z. For r ∈ Z, we have

r̄ := r + nZ = {r + mn : m ∈ Z}.

Note that r + mn + nZ = r + nZ. For integers k, l ∈ Z, one says k is congruent to l


modulo n if, and only if, there exists m ∈ Z such that k − l = mn and, in this case,
one writes k ≡ l (mod n). If k = r + mn with m ∈ Z, then k − r = mn, implying

r + nZ = {k ∈ Z : k ≡ r (mod n)}.

For this reason, one also calls the elements of Zn congruence classes. If k = r + mn
as above, one also says that r is the residue (or remainder) when dividing k by n
and, thus, one also calls the elements of Zn residue classes. Here, the canonical
homomorphism of Th. 4.27(a) takes the form

φ : Z −→ Zn , φ(r) := r̄ = r + nZ.

We also note that


Z = ⋃_{r=0}^{n−1} r̄.   (4.30)
Now consider G := {0, . . . , n − 1} and define an addition ⊕ on G by letting

∀ r, s ∈ G :   r ⊕ s := { r + s for r + s ≤ n − 1;  r + s − n for n ≤ r + s ≤ 2n − 2 },
which is commutative due to (Z, +) being commutative. We claim (G, ⊕) ≅ (Zn , +)
due to the isomorphism f : G −→ Zn , f (r) := r̄. If r, s ∈ G, then

f (r ⊕ s) = f (r + s) = (r + s) + nZ = r̄ + s̄ = f (r) + f (s)   for r + s ≤ n − 1,
f (r ⊕ s) = f (r + s − n) = (r + s − n) + nZ = (r + s) + nZ = r̄ + s̄ = f (r) + f (s)   for n ≤ r + s ≤ 2n − 2,

showing f to be a homomorphism. Moreover, if r ≠ s, then r̄ ≠ s̄, i.e. f is injective,
while (4.30) shows f to be surjective, completing the proof that f is an isomorphism.
In particular, (G, ⊕) is a group, due to Prop. 4.11(c). Even though it constitutes a
slight abuse of notation, we will tend to use the isomorphism f to identify G with
Zn , where we write simply + instead of ⊕ and write r instead of r̄, identifying r̄
with its standard representative r ∈ r̄. The Cayley tables for (Z1 , +) and (Z2 , +)
are
+ 0          + 0 1
0 0          0 0 1
             1 1 0
respectively. Comparing with Ex. 4.14(a), we see that φ1 : Z1 −→ {e}, φ1 (0) := e,
and φ2 : Z2 −→ {e, a}, φ2 (0) := e, φ2 (1) := a, are isomorphisms.
(b) Let m, n ∈ N. As example for an application of the isomorphism theorem of Th.
4.27(b), we show
mZ/mnZ ≅ Zn :
Consider the map
φ : mZ −→ Zn , φ(mr) := r + nZ.
Then φ is a homomorphism, since
∀ r, s ∈ Z :  φ(mr + ms) = φ(m(r + s)) = r + s + nZ = (r + nZ) + (s + nZ) = φ(mr) + φ(ms).

Clearly, φ is surjective, i.e. Im φ = Zn . Moreover,


mr ∈ ker φ  ⇔  r ∈ nZ  ⇔  ∃ k ∈ Z : r = kn  ⇔  mr = mkn ∈ mnZ,

showing ker φ = mnZ. In consequence, mZ/mnZ ≅ Zn holds due to Th. 4.27(b).
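The arithmetic of Ex. 4.28(a) is straightforward to mirror in code. In the Python sketch below (not part of the notes; names ad hoc), a coset r + nZ is stored via its standard representative, and the sum defined by (4.24) is seen to be independent of the chosen representatives.

    n = 5
    rep = lambda k: k % n        # standard representative of the coset k + nZ

    def add_cosets(k, l):
        """(k + nZ) + (l + nZ) := (k + l) + nZ, returned as a representative."""
        return rep(k + l)

    # different representatives of the same cosets give the same sum:
    print(add_cosets(2, 4), add_cosets(2 + 3 * n, 4 - 7 * n))   # prints: 1 1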

4.2 Rings and Fields


While groups are already sufficiently rich to give rise to the vast algebraic discipline
called group theory, before we can define vector spaces, we still need to consider struc-
tures called rings and fields (where fields are rings that are especially benign). Rings
are richer and, in general, more complicated than groups, as they always come with two
compositions instead of just one.

Definition 4.29. Let R be a nonempty set with two composition maps

+ : R × R −→ R, (x, y) 7→ x + y,
(4.31)
· : R × R −→ R, (x, y) 7→ x · y

(+ is called addition and · is called multiplication; as before, one often writes xy instead
of x · y). Then (R, +, ·) (or just R, if + and · are understood) is called a ring if, and
only if, the following three conditions are satisfied:

(i) (R, +) is a commutative group (its neutral element is denoted by 0).

(ii) Multiplication is associative.

(iii) Distributivity:

∀ x, y, z ∈ R :   x · (y + z) = x · y + x · z,   (4.32a)
∀ x, y, z ∈ R :   (y + z) · x = y · x + z · x.   (4.32b)

A ring R is called commutative if, and only if, its multiplication is commutative. More-
over, a ring is called a ring with unity if, and only if, R contains a neutral element with
respect to multiplication (i.e. there is 1 ∈ R such that 1 · x = x · 1 = x for each x ∈ R)
– some authors always require a ring to have a neutral element with respect to multi-
plication. Finally, (R, +, ·) is called a field if, and only if, it is a ring and, in addition,
(R \ {0}, ·) constitutes a commutative group (its neutral element is denoted by 1). For
each x, y in a ring R, define y − x := y + (−x), where −x is the additive inverse of x. If
R is a field, then, for x ≠ 0, also define the fractions y/x := yx^{−1} with numerator
y and denominator x, where x^{−1} denotes the multiplicative inverse of x.

Like group theory, ring theory and field theory are vast important subdisciplines of
algebra. Here we will merely give a brief introduction to some elementary notions and
results, before proceeding to vector spaces, which are defined, building on the notion of

a field. Before we come to examples of rings and fields, we will prove a number of basic
rules. However, it might be useful to already know that, with the usual addition and
multiplication, Z is a ring (but not a field), and Q, R, C all are fields.
Theorem 4.30. The following statements and rules are valid in every ring (R, +, ·) (let
0 denote the additive neutral element and let x, y, z ∈ R):

(a) x · 0 = 0 = 0 · x.
(b) x(−y) = −(xy) = (−x)y.
(c) (−x)(−y) = xy.
(d) x(y − z) = xy − xz, (y − z)x = yx − zx.

Proof. (a): One computes


x · 0 + x · 0 = x · (0 + 0) = x · 0 = 0 + x · 0   (using (4.32a)),

i.e. x · 0 = 0 follows since (R, +) is a group. The second equality follows analogously
using (4.32b).
(b): xy +x(−y) = x(y −y) = x·0 = 0, where we used (4.32a) and (a). This shows x(−y)
is the additive inverse to xy. The second equality follows analogously using (4.32b).
(c): xy = −(−(xy)) = −(x(−y)) = (−x)(−y), where (b) was used twice.
(d): x(y − z) = x(y + (−z)) = xy + x(−z) = xy − xz and (y − z)x = (y + (−z))x =
yx + (−z)x = yx − zx. 
Theorem 4.31. The following statement and rules are valid in every field (F, +, ·):

(a) xy = 0 ⇒ x = 0 ∨ y = 0.
(b) Rules for Fractions:
a/c + b/d = (ad + bc)/(cd),   (a/c) · (b/d) = (ab)/(cd),   (a/c)/(b/d) = (ad)/(bc),
where all denominators are assumed ≠ 0.

Proof. (a): If xy = 0 and x 6= 0, then y = 1 · y = x−1 xy = x−1 · 0 = 0.


(b): One computes
a/c + b/d = ac^{−1} + bd^{−1} = add^{−1}c^{−1} + bcc^{−1}d^{−1} = (ad + bc)(cd)^{−1} = (ad + bc)/(cd)

and

(a/c) · (b/d) = ac^{−1}bd^{−1} = ab(cd)^{−1} = (ab)/(cd)

and

(a/c)/(b/d) = ac^{−1}(bd^{−1})^{−1} = ac^{−1}b^{−1}d = ad(bc)^{−1} = (ad)/(bc),
completing the proof. 

Definition and Remark 4.32. Let (R, +, ·) be a ring and a ∈ R. Then a is called a left
(resp. right) zero divisor if, and only if, there exists x ∈ R \ {0} such that ax = 0 (resp.
xa = 0). If a ≠ 0 is a zero divisor, then it is called a nonzero or nontrivial zero divisor.
According to Th. 4.30(a), 0 is always a zero divisor, except for R = {0}. According
to Th. 4.31(a), in a field, there do not exist any nontrivial zero divisors. However, in
general, rings can have nontrivial zero divisors (see, e.g. Ex. 4.38 and Ex. 4.42(a) below).

Example 4.33 (Ring of Integers Z). Even though we have occasionally already used
multiplication on Z in examples, so far, we have not provided a mathematically precise
definition. The present example will remedy this situation. Recall from Ex. 2.36(a) and
Th. 4.15 the definition of Z as a set of equivalence classes of elements of N0 × N0 with
addition (and subtraction) on Z defined according to (4.17). To obtain the expected
laws of arithmetic, multiplication on Z needs to be defined such that (a − b) · (c − d) =
(ac + bd) − (ad + bc), which leads to the following definition: Multiplication on Z is
defined by

· : Z × Z −→ Z,   ([a, b], [c, d]) ↦ [a, b] · [c, d] := [ac + bd, ad + bc].   (4.33)

(a) It is an exercise to show the definition in (4.33) does not depend on the chosen
representatives, i.e.
 
∀ a, b, c, d, ã, b̃, c̃, d̃ ∈ N0 :
  ([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃])  ⇒  [ac + bd, ad + bc] = [ãc̃ + b̃d̃, ãd̃ + b̃c̃].   (4.34)

(b) It is an exercise to show (Z, +, ·) is a commutative ring with unity, where [1, 0] is
the neutral element of multiplication, and there are no nontrivial zero divisors, i.e.
 
∀ a, b, c, d ∈ N0 :
  ([a, b] · [c, d] = [ac + bd, ad + bc] = [0, 0]  ⇒  [a, b] = [0, 0] ∨ [c, d] = [0, 0]).   (4.35)

(c) (Z, +, ·) is not a field, since, e.g., there is no multiplicative inverse for 2 ∈ Z: While
0 · 2 = 0, one has n · 2 ≥ 2 for each n ∈ N and −n · 2 ≤ −2 for each n ∈ N, showing
k · 2 ≠ 1 for each k ∈ Z.

Definition 4.34. (a) Let (R, +, ·) be a ring (resp. a field), ∅ ≠ U ⊆ R. We call U a


subring (resp. a subfield) of R if, and only if, (U, +, ·) forms a ring (resp. a field),
where the compositions on U are the respective restrictions of the compositions on
R, i.e. +↾U ×U and ·↾U ×U .

(b) Let (R, +, ·) and (S, +, ·) be rings (caveat: to simplify notation, we use the same
symbols to denote the compositions on R and S, respectively; however, they might
even denote different compositions on the same set R = S). A map φ : R −→ S
is called ring homomorphism if, and only if, φ is a homomorphism in the sense of
Def. 4.10 with respect to both + and · (caveat: Ex. 4.36(a) below shows that a
ring homomorphism is not necessarily unital with respect to multiplication). The
notions ring monomorphism, epimorphism, isomorphism, endomorphism, automor-
phism are then defined as in Def. 4.10. Moreover, (R, +, ·) and (S, +, ·) are called
isomorphic (denoted (R, +, ·) ≅ (S, +, ·)) if, and only if, there exists a ring isomor-
phism φ : R −→ S.

Theorem 4.35. (a) Let (R, +, ·) be a ring, ∅ ≠ U ⊆ R. Then U is a subring of R if,


and only if, (i) and (ii) hold, where

(i) (U, +) is a subgroup of (R, +).


(ii) For each a, b ∈ U , one has ab ∈ U .

(b) Let (F, +, ·) be a field, ∅ ≠ U ⊆ F . Then U is a subfield of F if, and only if, (i)
and (ii) hold, where

(i) (U, +) is a subgroup of (F, +).


(ii) (U \ {0}, ·) is a subgroup of (F \ {0}, ·).

Proof. In both cases, it is clear that (i) and (ii) are necessary. That (i) and (ii) are
also sufficient in both cases is due to the fact that, if associativity (resp. commutativity,
resp. distributivity) hold on R (resp. F ), then it is immediate that the same property
holds on U . 

Example 4.36. (a) Clearly, the trivial ring ({0}, +, ·) (≅ (Z1 , +, ·), see Ex. 4.38 below)
is a subring of every ring (it is not a field, since {0} \ {0} = ∅ is not a group).
If R and S are arbitrary rings, then, clearly, the constant map φ0 : R −→ S,
φ0 ≡ 0, is always a ring homomorphism (this shows that a ring homomorphism is
not necessarily unital with respect to multiplication). Also note that Th. 4.30(a)
implies that any ring that has 0 = 1 (in the sense that the neutral elements for
addition and multiplication are the same) has necessarily precisely one element, i.e.
it is (isomorphic to) ({0}, +, ·).

(b) Let n ∈ N. It is clear from Th. 4.35(a) that nZ is a subring of Z (note that, for
n > 1, nZ is not a ring with unity). Moreover, for n > 1, φ : Z −→ nZ, φ(k) := nk,
is not a ring homomorphism, since, e.g., φ(1 · 1) = n ≠ n^2 = φ(1)φ(1).

(c) If R, S are arbitrary rings and φ : R −→ S is a ring homomorphism, then it is


clear from Th. 4.35(a) that Im φ is a subring of S. Moreover, ker φ := φ^{−1}({0}) is
a subring of R: This is also due to Th. 4.35(a), since (ker φ, +) ≤ (R, +) and, if
a, b ∈ ker φ, then φ(ab) = φ(a)φ(b) = 0, showing ab ∈ ker φ. However, if R and S
are rings with unity, and φ(1) = 1, then ker φ is a ring with unity if, and only if,
S = {0}: Indeed, if 1 ∈ ker φ, then 0 = φ(1) = 1 in S.

(d) Let (R, +, ·) be a ring (resp. a ring with unity, resp. a field), let I ≠ ∅ be an index
set, and let (Ui )i∈I be a family of subrings (resp. of subrings with unity, resp. of
subfields). It is then an exercise to show U := ∩_{i∈I} Ui is a subring (resp. a subring
with unity, resp. a subfield).

(e) In contrast to intersections, unions of rings are not necessarily rings (however, cf.
(f) below): We know from (b) that 2Z and 3Z are subrings of Z. But we already
noted in Ex. 4.18(e) that (2Z) ∪ (3Z) is not even a subgroup of (Z, +). To show
that the union of two fields is not necessarily a field (not even a ring, actually),
needs slightly more work: We use that Q is a subfield of R (cf. Rem. 4.40 below)
and show that, for each x ∈ R with x^2 ∈ Q, Q(x) := {r + sx : r, s ∈ Q} is a subfield
of R (clearly, Q(x) = Q if, and only if, x ∈ Q): Setting r = s = 0, shows 0 ∈ Q(x).
Setting r = 1, s = 0, shows 1 ∈ Q(x). If r1 , s1 , r2 , s2 ∈ Q, then

r1 + s1 x + r2 + s2 x = r1 + r2 + (s1 + s2 )x ∈ Q(x),
(r1 + s1 x)(r2 + s2 x) = r1 r2 + s1 s2 x^2 + (s1 r2 + r1 s2 )x ∈ Q(x),
−(r1 + s1 x) = −r1 − s1 x ∈ Q(x),
(r1 + s1 x)^{−1} = (r1 − s1 x)/(r1^2 − s1^2 x^2) ∈ Q(x),

showing Q(x) to be a subfield of R by Th. 4.35(b). However, e.g., U := Q(√2) ∪
Q(√3) is not even a ring, since α := √2 + √3 ∉ U : α ∉ Q(√2) since, otherwise,
there are r, s ∈ Q with α = r + s√2, i.e. √3 = r + (s − 1)√2, and 3 = r^2 + 2r(s −
1)√2 + 2(s − 1)^2, implying the false statement

√3 = 0  ∨  √3 ∈ Q  ∨  √(3/2) ∈ Q  ∨  √2 ∈ Q.

Switching the roles of √2 and √3 shows α ∉ Q(√3), i.e. α ∉ U .

(f ) While (e) shows that unions of subrings (or even subfields) are not necessarily
subrings, if (R, +, ·) is a ring (resp. a ring with unity, resp. a field), I ≠ ∅ is an
index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists
k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}),
and (Ui )i∈I is an increasing family of subrings (resp. of subrings with unity, resp. of
subfields) (i.e., for each i, j ∈ I with i ≤ j, one has Ui ⊆ Uj ), then U := ∪_{i∈I} Ui is a
subring (resp. a subring with unity, resp. a subfield) as well: Indeed, according to
Ex. 4.18(f), (U, +) ≤ (R, +). If a, b ∈ U , then there exist i, j ∈ I such that a ∈ Ui
and b ∈ Uj . If k ∈ I with i, j ≤ k, then ab ∈ Uk ⊆ U , i.e. U is a ring by Th. 4.35(a).
If 1 ∈ Ui , then 1 ∈ U . If each (Ui \ {0}, ·) is a group, then (U \ {0}, ·) is a group by
Ex. 4.18(f), showing U to be a subfield by Th. 4.35(b).

We can extend Prop. 4.11(c) to rings and fields:


Proposition 4.37. Let A be a nonempty set with compositions + and ·, and let B be
a nonempty set with compositions + and · (caveat: to simplify notation, we use the
same symbols to denote the compositions on A and B, respectively; however, they might
even denote different compositions on the same set A = B). If φ : A −→ B is an
epimorphism with respect to both + and ·, then

(A, +, ·) satisfies (4.32a) ⇒ (B, +, ·) satisfies (4.32a),


(A, +, ·) satisfies (4.32b) ⇒ (B, +, ·) satisfies (4.32b),
(A, +, ·) ring ⇒ (B, +, ·) ring,
(A, +, ·) ring with unity ⇒ (B, +, ·) ring with unity,
(A, +, ·) field ⇒ (B, +, ·) field if B 6= {0}.

If φ is even an isomorphism with respect to both + and ·, then the above implications
become equivalences.

Proof. Let b1 , b2 , b3 ∈ B with preimages a1 , a2 , a3 ∈ A, respectively. If (A, +, ·) satisfies


(4.32a), then
 
b1 · (b2 + b3 ) = φ(a1 ) · (φ(a2 ) + φ(a3 )) = φ(a1 · (a2 + a3 )) = φ(a1 · a2 + a1 · a3 )
= φ(a1 ) · φ(a2 ) + φ(a1 ) · φ(a3 ) = b1 · b2 + b1 · b3 ,

showing (B, +, ·) satisfies (4.32a) as well. If (A, +, ·) satisfies (4.32b), then


 
(b2 + b3 ) · b1 = (φ(a2 ) + φ(a3 )) · φ(a1 ) = φ((a2 + a3 ) · a1 ) = φ(a2 · a1 + a3 · a1 )
= φ(a2 ) · φ(a1 ) + φ(a3 ) · φ(a1 ) = b2 · b1 + b3 · b1 ,

showing (B, +, ·) satisfies (4.32b) as well. If (A, +, ·) is a ring, then we know from Prop.
4.11(c) that (B, +) is a commutative group and that (B, ·) is associative, showing that

(B, +, ·) is a ring as well. If (A, +, ·) is a ring with unity, then the above and Prop.
4.11(d) imply (B, +, ·) to be a ring with unity as well. If (A, +, ·) is a field, then, as
B \ {0} ≠ ∅, (B \ {0}, ·) must also be a group, i.e. (B, +, ·) is a field. Finally, if φ
is an isomorphism, then the implications become equivalences, as both φ and φ^{−1} are
epimorphisms by Prop. 4.11(b) (and since φ(0) = 0). 
Example 4.38. Let n ∈ N. In Ex. 4.28(a), we considered the group (Zn , +), where Zn
was defined as the quotient group Zn := Z/nZ. We now want to define a multiplication
on Zn that makes Zn into a ring with unity (and into a field if, and only if, n is prime,
cf. Def. D.2(b)). This is accomplished by letting

∀ r, s ∈ Z :   r̄ · s̄ = (r + nZ) · (s + nZ) := rs + nZ :   (4.36)

First, we check that the definition does not depend on the chosen representatives: Let
r1 , r2 , s1 , s2 ∈ Z. If r̄1 = r̄2 and s̄1 = s̄2 , then there exist mr , ms ∈ Z such that
r1 − r2 = mr n and s1 − s2 = ms n, implying

r̄1 · s̄1 = r1 s1 + nZ = (r2 + mr n)(s2 + ms n) + nZ


= r2 s2 + (mr s2 + ms r2 + mr ms n)n + nZ = r2 s2 + nZ = r̄2 · s̄2 ,

as desired. We already know the canonical homomorphism

φ : Z −→ Zn , φ(r) = r̄ = r + nZ

to be a homomorphism with respect to addition. Since

∀ r, s ∈ Z :   φ(rs) = rs + nZ = r̄ · s̄ = φ(r) · φ(s),

it is also a homomorphism with respect to the above-defined multiplication. Since φ is


also surjective, (Zn , +, ·) is a ring with unity by Prop. 4.37. Now suppose that n is not
prime. If n = 1, then #Z1 = 1, i.e. it is (isomorphic to) the trivial ring of Ex. 4.36(a).
If n = ab with a, b ∈ N, 1 < a, b < n, then

ā · b̄ = n̄ = 0̄,

showing ā and b̄ to be nontrivial divisors of 0 (in Zn ). In particular, (Zn , +, ·) is not a


field. Now consider the case that n is prime. If r ∈ Z and r̄ ≠ 0̄, then gcd(r, n) = 1
(cf. Def. D.2(c)) and, by (D.6) of the Appendix, there exist x, y ∈ Z with xn + yr = 1.
Then
ȳ · r̄ = yr + nZ = (1 − xn) + nZ = 1̄,
showing ȳ to be the multiplicative inverse to r̄. Thus, (Zn \{0}, ·) is a group and (Zn , +, ·)
is a field. For the rest of the example, we, once again, allow an arbitrary n ∈ N. In

Ex. 4.28(a), we showed that, letting G := {0, . . . , n − 1}, we had (G, +) ≅ (Zn , +).
We now want to extend this isomorphism to a ring isomorphism. To this end, define a
multiplication ⊗ on G by letting
∀ r, s ∈ G :   r ⊗ s := p, where rs = qn + p with q, p ∈ N0 and 0 ≤ p < n (cf. Th. D.1),

which is commutative due to (Z, ·) being commutative. We claim the additive isomor-
phism f : G −→ Zn , f (r) := r̄, of Ex. 4.28(a) to be a multiplicative isomorphism as
well: If r, s ∈ G, then, using q, p as above,
f (r ⊗ s) = f (p) = p̄ = (rs − qn) + nZ = rs + nZ = r̄ · s̄ = f (r)f (s),
showing f to be a multiplicative homomorphism. As we know f to be bijective, (G, +, ⊗)
is a ring by Prop. 4.37, yielding (G, +, ⊗) ≅ (Zn , +, ·). As stated in Ex. 4.28(a) in the
context of (Zn , +), one tends to use the isomorphism f to identify G with Zn . One does
the same in the context of (Zn , +, ·), also simply writing · for the multiplication on G.
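The dichotomy of Ex. 4.38 (Zn is a field if, and only if, n is prime) can be probed numerically. The following Python sketch is not part of the notes (inverse_mod is an ad hoc name); it finds the multiplicative inverse of r̄ via the extended Euclidean algorithm, i.e. via a representation xn + yr = 1 as used above.

    def inverse_mod(r, n):
        """Return y with y*r = 1 (mod n), or None if gcd(r, n) != 1."""
        old_r, cur_r, old_y, cur_y = n, r % n, 0, 1
        while cur_r != 0:
            q = old_r // cur_r
            old_r, cur_r = cur_r, old_r - q * cur_r
            old_y, cur_y = cur_y, old_y - q * cur_y
        return old_y % n if old_r == 1 else None   # old_r = gcd(r, n)

    print(inverse_mod(3, 7))   # 5, since 3 * 5 = 15 = 1 (mod 7)
    print(inverse_mod(2, 6))   # None: 2 is a nontrivial zero divisor, 2 * 3 = 0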
Example 4.39 (Field of Rational Numbers Q). In Ex. 2.36(b), we defined the set of
rational numbers Q as the quotient set of Z × (Z \ {0}) with respect to the equivalence
relation defined by
(a, b) ∼ (c, d) :⇔ a · d = b · c.
We now want to define “the usual” addition and multiplication on Q and then verify
that these make Q into a field (recall the notation a/b := [(a, b)]):
Addition on Q is defined by
+ : Q × Q −→ Q,   (a/b, c/d) ↦ a/b + c/d := (ad + bc)/(bd).   (4.37)
Multiplication on Q is defined by
· : Q × Q −→ Q,   (a/b, c/d) ↦ (a/b) · (c/d) := (ac)/(bd).   (4.38)
We will now show that (Q, +, ·) forms a field, where 0/1 and 1/1 are the neutral elements
with respect to addition and multiplication, respectively, (−a/b) is the additive inverse
to a/b, whereas b/a is the multiplicative inverse to a/b with a ≠ 0. We already know
from Ex. 2.36(b) that the map
ι : Z −→ Q,   ι(k) := k/1,

is injective. We will now also show it is a unital ring monomorphism, i.e. it satisfies
ι(1) = 1/1,
∀ k, l ∈ Z :   ι(k + l) = ι(k) + ι(l),   (4.39a)
∀ k, l ∈ Z :   ι(kl) = ι(k) · ι(l).   (4.39b)

We begin by showing that the definitions in (4.37) and (4.38) do not depend on the
chosen representatives, i.e.
∀ a, c, ã, c̃ ∈ Z  ∀ b, d, b̃, d̃ ∈ Z \ {0} :
  (a/b = ã/b̃ ∧ c/d = c̃/d̃)  ⇒  (ad + bc)/(bd) = (ãd̃ + b̃c̃)/(b̃d̃)   (4.40)

and

∀ a, c, ã, c̃ ∈ Z  ∀ b, d, b̃, d̃ ∈ Z \ {0} :
  (a/b = ã/b̃ ∧ c/d = c̃/d̃)  ⇒  (ac)/(bd) = (ãc̃)/(b̃d̃).   (4.41)
Furthermore, the results of both addition and multiplication are always elements of Q.
(4.40) and (4.41): a/b = ã/b̃ means ab̃ = ãb, c/d = c̃/d̃ means cd̃ = c̃d, implying

(ad + bc)b̃d̃ = bd(ãd̃ + b̃c̃),   i.e.   (ad + bc)/(bd) = (ãd̃ + b̃c̃)/(b̃d̃),

and

acb̃d̃ = bdãc̃,   i.e.   (ac)/(bd) = (ãc̃)/(b̃d̃).
That the results of both addition and multiplication are always elements of Q follows
from (4.35), i.e. from the fact that Z has no nontrivial zero divisors. In particular, if
b, d ≠ 0, then bd ≠ 0, showing (ad + bc)/(bd) ∈ Q and (ac)/(bd) ∈ Q.
Next, we verify + and · to be commutative and associative on Q: Let a, c, e ∈ Z and
b, d, f ∈ Z \ {0}. Then, using commutativity on Z, we compute

c/d + a/b = (cb + da)/(db) = (ad + bc)/(bd) = a/b + c/d,
(c/d) · (a/b) = (ca)/(db) = (ac)/(bd) = (a/b) · (c/d),

showing commutativity on Q. Using associativity and distributivity on Z, we compute

a/b + (c/d + e/f ) = a/b + (cf + de)/(df ) = (adf + b(cf + de))/(bdf )
  = ((ad + bc)f + bde)/(bdf ) = (ad + bc)/(bd) + e/f = (a/b + c/d) + e/f,
(a/b) · ((c/d) · (e/f )) = (a(ce))/(b(df )) = ((ac)e)/((bd)f ) = ((a/b) · (c/d)) · (e/f ),

showing associativity on Q. We proceed to checking distributivity on Q: Using commu-


tativity, associativity, and distributivity on Z, we compute
 
a c e a(cf + de) acf + dae acbf + bdae ac ae a c a e
· + = = = = + = · + · ,
b d f bdf bdf bdbf bd bf b d b f

proving distributivity on Q. We now check the claims regarding neutral and inverse
elements:
a/b + 0/1 = (a · 1 + b · 0)/(b · 1) = a/b,
a/b + (−a)/b = (ab + b(−a))/b^2 = ((a − a)b)/b^2 = 0/b^2 = 0/1,
(a/b) · (1/1) = (a · 1)/(b · 1) = a/b,
(a/b) · (b/a) = (ab)/(ba) = 1/1.
Thus, (Q, +, ·) is a ring and (Q \ {0}, ·) is a group, implying Q to be a field. Finally,

ι(k) + ι(l) = k/1 + l/1 = (k · 1 + 1 · l)/(1 · 1) = (k + l)/1 = ι(k + l),
ι(k) · ι(l) = (k/1) · (l/1) = (kl)/(1 · 1) = ι(kl),
proving (4.39).
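The construction of Ex. 4.39 can be imitated directly on pairs of integers. A minimal Python sketch (not from the notes; all names hypothetical), working with representatives (a, b), b ≠ 0, and the relation (a, b) ∼ (c, d) :⇔ ad = bc:

    def equiv(p, q):
        (a, b), (c, d) = p, q
        return a * d == b * c

    def add(p, q):
        (a, b), (c, d) = p, q
        return (a * d + b * c, b * d)   # (4.37)

    def mul(p, q):
        (a, b), (c, d) = p, q
        return (a * c, b * d)           # (4.38)

    # well-definedness on representatives: (1, 2) ~ (2, 4), and the sums agree
    print(equiv(add((1, 2), (1, 3)), add((2, 4), (1, 3))))   # True
    print(equiv(mul((1, 2), (2, 3)), (1, 3)))                # True: 1/2 * 2/3 = 1/3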
Remark 4.40. With the usual addition and multiplication, (R, +, ·) and (C, +, ·) are
fields (see [Phi16, Th. D.35] and [Phi16, Th. 5.2], respectively). In particular, R is
a subfield of C and Q is a subfield of both R and C.
Definition and Remark 4.41. Let (R, +, ·) be a ring with unity. We call x ∈ R
invertible if, and only if, there exists x̄ ∈ R such that xx̄ = x̄x = 1. We call (R∗, ·),
where R∗ := {x ∈ R : x invertible}, the group of invertible elements of R. We verify
(R∗, ·) to be a group: If x, y ∈ R∗, then xy ȳx̄ = 1 and ȳx̄ xy = 1, showing xy ∈ R∗.
Moreover, R∗ inherits associativity from R, 1 ∈ R∗, and each x ∈ R∗ has a multiplicative
inverse by definition of R∗, proving (R∗, ·) to be a group. We also note that, if R ≠ {0},
then 0 ∉ R∗ and (R∗, +) is not a group (in particular, R∗ is then not a subring of R).
Example 4.42. (a) Let (R, +, ·) be a ring and let M be a set. As in Ex. 4.9(e), we
can define + and · pointwise on R^M , i.e., for each f, g ∈ R^M ,

f + g : M −→ R,   (f + g)(x) := f (x) + g(x),
f · g : M −→ R,   (f · g)(x) := f (x) · g(x).

As with commutativity and associativity, it is immediate that R^M inherits distribu-
tivity from R. Thus, (R^M , +, ·) is a ring as well; if (R, +, ·) is a ring with unity,
then so is (R^M , +, ·), where f ≡ 1 is neutral for · in R^M , and, using the notation of
Def. and Rem. 4.41, (R^M )∗ = (R∗)^M . We point out that, if M and R both have at

least two distinct elements, then R^M always has nontrivial divisors of 0, even if R
has none: Let x, y ∈ M with x ≠ y, and 0 ≠ a ∈ R. Define

fx , fy : M −→ R,   fx (z) := { a if z = x;  0 if z ≠ x },   fy (z) := { a if z = y;  0 if z ≠ y }.
Then fx ≢ 0, fy ≢ 0, but fx · fy ≡ 0.
(b) Let (G, +) be an abelian group and let End(G) denote the set of (group) endomor-
phisms on G. Define addition and multiplication on End(G) by letting, for each
f, g ∈ End(G),
(f + g) : G −→ G, (f + g)(a) := f (a) + g(a), (4.42a)
(f · g) : G −→ G, (f · g)(a) := f (g(a)). (4.42b)
We claim that (End(G), +, ·) forms a ring with unity (the so-called ring of endo-
morphisms on G): First, we check + to be well-defined (which is clear for ·): For
each a, b ∈ G, we compute, using commutativity of + on G,
(f +g)(a+b) = f (a+b)+g(a+b) = f (a)+f (b)+g(a)+g(b) = (f +g)(a)+(f +g)(b),
showing f + g ∈ End(G). As we then know (End(G), +) ≤ (F(G, G), +), we know
· to be associative, and Id is clearly neutral for ·, it remains to check distributivity:
To this end, let f, g, h ∈ End(G). Then, for each a ∈ G,
 
(f · (g + h))(a) = f (g(a) + h(a)) = (f g)(a) + (f h)(a) = (f g + f h)(a),
((g + h) · f )(a) = (g + h)(f (a)) = (gf )(a) + (hf )(a) = (gf + hf )(a),
proving distributivity and that (End(G), +, ·) forms a ring with unity. Moreover, the
set Aut(G) := End(G) ∩ SG of automorphisms on G is a subgroup of the symmetric
group SG and, comparing with Def. and Rem. 4.41, we also have Aut(G) = End(G)∗ .
Definition and Remark 4.43. Let (F, +, ·) be a field. Recall the homomorphism from
Ex. 4.12(c), which we now apply to the group (F, +) with a := 1. Then
φ1 : N0 −→ F,   φ1(n) = ∑_{i=1}^{n} 1 =: n · 1.   (4.43)

We say that the field F has characteristic 0 (denoted char F = 0) if, and only if, n · 1 ≠ 0
for each n ∈ N. Otherwise, we define the field’s characteristic to be the number
char F := min{n ∈ N : n · 1 = 0}.
With this definition, we have char Q = char R = char C = 0 and, for each prime number
p, char Zp = p.


The examples above show that 0 and each prime number occur as characteristic of fields.
It is not difficult to prove that no other numbers can occur:
Theorem 4.44. Let F be a field. If char F ≠ 0, then p := char F is a prime number.

Proof. Suppose p = ab with a, b ∈ N. For clarity of notation, we use the homomorphisms


φa of Ex. 4.12(c) (in particular φ1 as in Def. and Rem. 4.43) and use · only for the
multiplication in F (to not get confused when reading the following computation, one
should bear in mind that φa and exponentiation are used for the additive group (F, +)).
We obtain
b
X b
X
Th. 4.8(b) 
0 = φ1 (p) = φ1 (ab) = = φφ1 (a) (b) = φ1 (a) = φ1 (a) · 1
i=1 i=1
b
X
= φ1 (a) · 1 = φ1 (a) · φ1 (b).
i=1

As the field F has no nontrivial zero divisors, we must have φ1 (a) = 0 or φ1 (b) = 0. But
p = min{n ∈ N : φ1 (n) = 0}, implying a = p or b = p, showing p to be prime. 
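For F = Zp, the characteristic of Def. and Rem. 4.43 can be computed by literally adding 1 to itself. A small Python sketch (not from the notes; the function name is ad hoc):

    def additive_order_of_one(n):
        """Least k in N with k * 1 = 0 in Z_n; for prime p this is char Z_p = p."""
        acc, k = 1 % n, 1
        while acc != 0:
            acc = (acc + 1) % n
            k += 1
        return k

    print(additive_order_of_one(7), additive_order_of_one(11))   # prints: 7 11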

5 Vector Spaces

5.1 Vector Spaces and Subspaces


Definition 5.1. Let F be a field and let V be a nonempty set with two maps
+ : V × V −→ V, (x, y) 7→ x + y,
(5.1)
· : F × V −→ V, (λ, x) 7→ λ · x
(+ is called (vector) addition and · is called scalar multiplication; as with other mul-
tiplications before, often one writes λx instead of λ · x – take care not to confuse the
vector addition on V with the addition on F and, likewise, not to confuse the scalar
multiplication with the multiplication on F , the symbol + is used for both additions
and · is used for both multiplications, but you can always determine from the context
which addition or multiplication is meant). Then V is called a vector space or a linear
space over F (sometimes also an F -vector space) if, and only if, the following conditions
are satisfied:

(i) (V, +) is a commutative group. The neutral element with respect to + is denoted
0 (do not confuse 0 ∈ V with 0 ∈ F – once again, the same symbol is used for
different objects (both objects only coincide for F = V )).

(ii) Distributivity:

∀ λ ∈ F  ∀ x, y ∈ V :   λ(x + y) = λx + λy,   (5.2a)
∀ λ, µ ∈ F  ∀ x ∈ V :   (λ + µ)x = λx + µx.   (5.2b)

(iii) Compatibility between Multiplication on F and Scalar Multiplication:

∀ λ, µ ∈ F  ∀ x ∈ V :   (λµ)x = λ(µx).   (5.3)

(iv) The neutral element with respect to the multiplication on F is also neutral with
respect to the scalar multiplication:

∀ x ∈ V :   1x = x.   (5.4)

If V is a vector space over F , then one calls the elements of V vectors and the elements
of F scalars.

Example 5.2. (a) Every field F is a vector space over itself if one uses the field ad-
dition in F as the vector addition and the field multiplication in F as the scalar
multiplication (as important special cases, we obtain that R is a vector space over
R and C is a vector space over C): All the vector space laws are immediate conse-
quences of the corresponding field laws: Def. 5.1(i) holds as every field is a commu-
tative group with respect to addition; Def. 5.1(ii) follows from field distributivity
and multiplicative commutativity on F ; Def. 5.1(iii) is merely the multiplicative
associativity on F ; and Def. 5.1(iv) holds, since scalar multiplication coincides with
field multiplication on F .

(b) The reasoning in (a) actually shows that every field F is a vector space over every
subfield E ⊆ F . In particular, R is a vector space over Q.

(c) If A is any nonempty set, F is a field, and Y is a vector space over the field F ,
then we can make V := F(A, Y ) = Y^A (the set of functions from A into Y ) into a
vector space over F by defining for each f, g : A −→ Y :

(f + g) : A −→ Y, (f + g)(x) := f (x) + g(x), (5.5a)


(λ · f ) : A −→ Y, (λ · f )(x) := λ · f (x) for each λ ∈ F : (5.5b)

To verify that (V, +, ·) is, indeed, a vector space, we begin by checking Def. 5.1(i),
i.e. by showing (V, +) is a commutative group: Since (5.5a) is precisely the pointwise

definition of Ex. 4.9(e), we already know (V, +) to be a commutative group due to


(4.11d).
To verify Def. 5.1(ii), one computes, for each λ ∈ F , f, g ∈ V , and x ∈ A,

(λ(f + g))(x) = λ(f (x) + g(x)) = λf (x) + λg(x) = (λf + λg)(x),

where distributivity in the vector space Y was used for the second equality, proving
λ(f + g) = λf + λg and, thus, (5.2a). Similarly, for each λ, µ ∈ F , f ∈ V , and x ∈ A,

((λ + µ)f )(x) = (λ + µ)f (x) = λf (x) + µf (x) = (λf + µf )(x),

where, once more, distributivity in the vector space Y was used for the second equality,
proving (λ + µ)f = λf + µf and, thus, (5.2b).
The proof of Def. 5.1(iii) is given by: for each λ, µ ∈ F , f ∈ V , and x ∈ A,

((λµ)f )(x) = (λµ)f (x) = λ(µf (x)) = (λ(µf ))(x),

where the validity of Def. 5.1(iii) in the vector space Y was used for the second equality.
Finally, to prove Def. 5.1(iv), one computes, for each f ∈ V and x ∈ A,

(1 · f )(x) = 1 · f (x) = f (x),

where the validity of Def. 5.1(iv) in the vector space Y was used for the second equality.
(d) Let F be a field (e.g. F = R or F = C) and n ∈ N. For x = (x1 , . . . , xn ) ∈ F^n ,
y = (y1 , . . . , yn ) ∈ F^n , and λ ∈ F , define componentwise addition
x + y := (x1 + y1 , . . . , xn + yn ), (5.6a)

and componentwise scalar multiplication

λx := (λx1 , . . . , λxn ). (5.6b)


Then (F^n , +, ·) constitutes a vector space over F . The validity of Def. 5.1(i) – Def.
5.1(iv) can easily be verified directly, but (F^n , +, ·) can also be seen as a special
case of (c) with A = {1, . . . , n} and Y = F : Recall that, according to Ex. 2.16(c),
F^n = F^{{1,...,n}} = F({1, . . . , n}, F ) is the set of functions from {1, . . . , n} into F .
Then x = (x1 , . . . , xn ) ∈ F^n is the same as the function f : {1, . . . , n} −→ F ,
f (j) = xj . Thus, (5.6a) is, indeed, the same as (5.5a), and (5.6b) is, indeed, the
same as (5.5b).
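The componentwise operations (5.6) are a one-liner each in code. A Python sketch (not part of the notes), with F = Q modelled by the standard library's Fraction type:

    from fractions import Fraction as F

    def vadd(x, y):
        return tuple(a + b for a, b in zip(x, y))   # (5.6a)

    def smul(lam, x):
        return tuple(lam * a for a in x)            # (5.6b)

    x, y, lam = (F(1), F(2)), (F(3), F(4)), F(1, 2)
    print(vadd(x, y))     # (Fraction(4, 1), Fraction(6, 1))
    print(smul(lam, x))   # (Fraction(1, 2), Fraction(1, 1))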

Proposition 5.3. The following rules hold in each vector space V over the field F
(here, even though we will usually use the same symbol for both objects, we write 0 for
the additive neutral element in F and ~0 for the additive neutral element in V ):

(a) λ · ~0 = ~0 for each λ ∈ F .

(b) 0 · v = ~0 for each v ∈ V .

(c) λ · (−v) = (−λ) · v = −(λ · v) for each λ ∈ F and each v ∈ V .

(d) If λ · v = ~0 with λ ∈ F and v ∈ V , then λ = 0 or v = ~0.

Proof. Exercise. 

Definition 5.4. Let (V, +, ·) be a vector space over the field F , ∅ ≠ U ⊆ V . We call U
a subspace of V if, and only if, (U, +, ·) forms a vector space over F with respect to the
operations + and · it inherits from V , i.e. with respect to +↾U ×U and ·↾F ×U .

Theorem 5.5. Let (V, +, ·) be a vector space over the field F , ∅ ≠ U ⊆ V . Then the
following statements are equivalent:

(i) U is a subspace of V .

(ii) One has

∀ x, y ∈ U :   x + y ∈ U,   (5.7a)
∧  ∀ λ ∈ F  ∀ x ∈ U :   λx ∈ U.   (5.7b)

(iii) One has


∀ λ, µ ∈ F  ∀ x, y ∈ U :   λx + µy ∈ U.   (5.8)

Proof. From the definition of a vector space, it is immediate that (i) implies (ii) and
(iii).
“(iii)⇒(ii)”: One obtains (5.7a) from (5.8) by setting λ := µ := 1, and one obtains
(5.7b) from (5.8) by setting y := 0 (and µ arbitrary, e.g., µ := 1).
“(ii)⇒(i)”: Clearly, U inherits commutativity and distributivity as well as the validity
of (5.3) and (5.4) from V . Moreover, if x ∈ U , then, setting λ := −1 in (5.7b), shows
−x ∈ U . This, together with (5.7a), proves (U, +) to be a subgroup of (V, +), thereby
completing the proof of (i). 

Example 5.6. (a) As a consequence of Th. 5.5, if V is a vector space over the field F ,
then {0} ⊆ V is always a subspace of V (sometimes called the trivial or the zero
vector space over F ).
(b) Q is not a subspace of R if R is considered as a vector space over R (for example,
√2 · 2 ∉ Q). However, Q is a subspace of R if R is considered as a vector space
over Q.
(c) Consider the vector space V := R3 over R and let
U := {(x, y, z) ∈ V : 3x − y + 7z = 0}.
We use Th. 5.5 to show U is a subspace of V : Since (0, 0, 0) ∈ U , we have U ≠ ∅.
Moreover, if u1 = (x1 , y1 , z1 ) ∈ U , u2 = (x2 , y2 , z2 ) ∈ U , and λ, µ ∈ R, then
λu1 + µu2 = (λx1 + µx2 , λy1 + µy2 , λz1 + µz2 )
and
3(λx1 + µx2) − (λy1 + µy2) + 7(λz1 + µz2) = λ(3x1 − y1 + 7z1) + µ(3x2 − y2 + 7z2) = λ · 0 + µ · 0 = 0,

where u1, u2 ∈ U was used at the second equality, showing λu1 + µu2 ∈ U and proving U to be a subspace of V.
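The closure property (5.8) just verified can also be sampled numerically. The following Python sketch is an illustration only (function names and tolerance are ad hoc): it draws random elements of U and checks that their linear combinations satisfy the defining equation.

    import random

    def in_U(v, tol=1e-9):
        # membership test for U = {(x, y, z) : 3x - y + 7z = 0}, up to rounding
        x, y, z = v
        return abs(3 * x - y + 7 * z) < tol

    def random_point_in_U():
        # choose x, z freely and solve 3x - y + 7z = 0 for y
        x, z = random.random(), random.random()
        return (x, 3 * x + 7 * z, z)

    random.seed(0)
    for _ in range(5):
        u1, u2 = random_point_in_U(), random_point_in_U()
        lam, mu = random.random(), random.random()
        comb = tuple(lam * a + mu * b for a, b in zip(u1, u2))
        assert in_U(comb)  # (5.8) holds for the sampled combinations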
(d) It is customary to write K if K may stand for R or C, and, from now on, we will
occasionally use this notation. From Ex. 5.2(c), we know that, for each ∅ ≠ A,
F(A, K) constitutes a vector space over K. Thus, as a consequence of Th. 5.5, a
subset of F(A, K) is a vector space over K if, and only if, it is closed under addition
and scalar multiplication. By using results from Calculus, we obtain the following
examples:
(i) The set P(K) of all polynomials mapping from K into K is a vector space over
K by [Phi16, Rem. 6.4]; for each n ∈ N0 , the set Pn (K) of all such polynomials
of degree at most n is also a vector space over K by [Phi16, Rem. (6.4a),(6.4b)].
(ii) If ∅ ≠ M ⊆ C, then the set of continuous functions from M into K, i.e.
C(M, K), is a vector space over K by [Phi16, Th. 7.38].
(iii) If a, b ∈ R ∪ {−∞, ∞} and a < b, then the set of differentiable functions from
I :=]a, b[ into K is a vector space over K by [Phi16, Th. 9.7(a),(b)]. Moreover,
[Phi16, Th. 9.7(a),(b)] also implies that, for each k ∈ N, the set of k times
differentiable functions from I into K is a vector space over K, and so is each
set C^k(I, K) of k times continuously differentiable functions ([Phi16, Th. 7.38] is also needed for the last conclusion). The set C^∞(I, K) := ∩_{k∈N} C^k(I, K) is also a vector space over K by Th. 5.7(a) below.
Theorem 5.7. Let V be a vector space over the field F .

(a) Let I ≠ ∅ be an index set and (Ui)_{i∈I} a family of subspaces of V. Then the intersection U := ∩_{i∈I} Ui is also a subspace of V.

(b) In contrast to intersections, unions of subspaces are almost never subspaces (how-
ever, cf. (c) below). More precisely, if U1 and U2 are subspaces of V , then
 
U1 ∪ U2 is a subspace of V  ⇔  U1 ⊆ U2 ∨ U2 ⊆ U1.  (5.9)

(c) While (b) shows that unions of subspaces are not necessarily subspaces, if I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (Ui)_{i∈I} is an increasing family of subspaces (i.e., for each i, j ∈ I with i ≤ j, one has Ui ⊆ Uj), then U := ∪_{i∈I} Ui is a subspace as well.

Proof. (a): Since 0 ∈ U, we have U ≠ ∅. If x, y ∈ U and λ, µ ∈ F, then λx + µy ∈ Ui for each i ∈ I, implying λx + µy ∈ U. Thus, U is a subspace of V by Th. 5.5.
(b): Exercise.
(c): Since 0 ∈ Ui ⊆ U, we have U ≠ ∅. If x, y ∈ U and λ, µ ∈ F, then there exist i, j ∈ I such that x ∈ Ui and y ∈ Uj. If k ∈ I with i, j ≤ k, then λx + µy ∈ Uk ⊆ U. Thus, U is a subspace of V by Th. 5.5. □

While the union of two subspaces U1 , U2 is typically not a subspace, Th. 5.7(a) implies
there is always a smallest subspace containing U1 ∪ U2 and we will see below that this
subspace is always the so-called sum U1 + U2 of the two subspaces (this is a special case
of Prop. 5.11). As it turns out, the sum of vector spaces is closely connected with two
other essential notions of vector space theory, namely the notions linear combination
and span, which we define first:

Definition 5.8. Let V be a vector space over the field F .

(a) Let n ∈ N and v1, . . . , vn ∈ V. A vector v ∈ V is called a linear combination of v1, . . . , vn if, and only if, there exist λ1, . . . , λn ∈ F (often called coefficients in this context) such that

v = ∑_{i=1}^{n} λi vi.  (5.10)
(b) Let A ⊆ V, and

U := {U ∈ P(V) : A ⊆ U ∧ U is a subspace of V}.  (5.11)

Then the set

⟨A⟩ := span A := ∩_{U∈U} U  (5.12)

is called the span of A. Moreover, A is called a spanning set or a generating set of V if, and only if, ⟨A⟩ = V.

Proposition 5.9. Let V be a vector space over the field F and A ⊆ V.

(a) ⟨A⟩ is a subspace of V, namely the smallest subspace of V containing A.

(b) If A = ∅, then ⟨A⟩ = {0}; if A ≠ ∅, then ⟨A⟩ is the set of all linear combinations of elements from A, i.e.

⟨A⟩ = { ∑_{i=1}^{n} λi ai : n ∈ N ∧ λ1, . . . , λn ∈ F ∧ a1, . . . , an ∈ A }.  (5.13)

(c) If A ⊆ B ⊆ V, then ⟨A⟩ ⊆ ⟨B⟩.

(d) A = ⟨A⟩ if, and only if, A is a subspace of V.

(e) ⟨⟨A⟩⟩ = ⟨A⟩.

Proof. (a) is immediate from Th. 5.7(a).

(b): For the case A = ∅, note that {0} is a subspace of V, and that {0} is contained in every subspace of V. For A ≠ ∅, let W denote the right-hand side of (5.13), and recall from (5.12) that ⟨A⟩ is the intersection of all subspaces U of V that contain A. If U is a subspace of V and A ⊆ U, then W ⊆ U, since U is closed under vector addition and scalar multiplication, showing W ⊆ ⟨A⟩. On the other hand, W is clearly a subspace of V that contains A, showing ⟨A⟩ ⊆ W, completing the proof of ⟨A⟩ = W.

(c) is immediate from (b).

(d): If A = ⟨A⟩, then A is a subspace by (a). For the converse, while it is clear that A ⊆ ⟨A⟩ always holds, if A is a subspace, then A ∈ U, where U is as in (5.11), implying ⟨A⟩ ⊆ A.

(e) now follows by combining (d) with (a). □
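For finitely many vectors in R^n, Prop. 5.9(b) yields a practical membership test: v ∈ ⟨{a1, . . . , ak}⟩ if, and only if, appending v as an additional column does not increase the rank of the matrix with columns a1, . . . , ak (the rank is treated systematically in Sec. 7.3; here we simply call a library routine). A minimal Python sketch, assuming the third-party numpy package:

    import numpy as np

    def in_span(v, vectors):
        # v lies in the span iff adding it as a column leaves the rank unchanged
        A = np.column_stack(vectors)
        Av = np.column_stack(vectors + [v])
        return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(Av)

    a1 = np.array([1.0, 0.0, 1.0])
    a2 = np.array([0.0, 1.0, 1.0])
    print(in_span(np.array([2.0, 3.0, 5.0]), [a1, a2]))  # True: 2*a1 + 3*a2
    print(in_span(np.array([0.0, 0.0, 1.0]), [a1, a2]))  # False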
Definition 5.10. Let V be a vector space over the field F, let I be an index set and let (Ui)_{i∈I} be a family of subspaces of V. For I = ∅, define

∑_{i∈I} Ui := {0}.  (5.14a)

If #I = n ∈ N, then let φ : {1, . . . , n} −→ I be a bijection and define

∑_{i∈I} Ui := ∑_{k=1}^{n} U_{φ(k)} := { ∑_{k=1}^{n} u_k : u_k ∈ U_{φ(k)} for each k ∈ {1, . . . , n} },  (5.14b)

where the commutativity of addition guarantees the definition is independent of the chosen bijection (note that this definition is consistent with the definition in Ex. 4.9(d)). For a general, not necessarily finite I, let J := {J ⊆ I : J is finite} and define

∑_{i∈I} Ui := ∪_{J∈J} ∑_{j∈J} Uj.  (5.14c)

In each case, we call ∑_{i∈I} Ui the sum of the subspaces Ui, i ∈ I.
Proposition 5.11. Let V be a vector space over the field F, let I be an index set and let (Ui)_{i∈I} be a family of subspaces of V. Then

∑_{i∈I} Ui = ⟨ ∪_{i∈I} Ui ⟩  (5.15)

(in particular, ∑_{i∈I} Ui is always a subspace of V).

Proof. Let A := ∑_{i∈I} Ui, B := ⟨∪_{i∈I} Ui⟩. If I = ∅, then both sides of (5.15) are {0}, i.e. the statement holds in this case. Now assume I ≠ ∅. If v ∈ A, then there are i1, . . . , in ∈ I, n ∈ N, such that v = ∑_{k=1}^{n} u_{i_k}, u_{i_k} ∈ U_{i_k}, showing v ∈ B, since B is a subspace of V. Thus, A ⊆ B. To prove B ⊆ A, it suffices to show A is a subspace of V that contains each Ui, i ∈ I. If i ∈ I and v ∈ Ui, then v ∈ A is clear from (5.14c) (set J := {i}). Let v, w ∈ A, where v is as above and there exist j1, . . . , jm ∈ I, m ∈ N, such that w = ∑_{k=1}^{m} w_{j_k}, w_{j_k} ∈ U_{j_k}. Then v + w = ∑_{k=1}^{n} u_{i_k} + ∑_{k=1}^{m} w_{j_k} ∈ A and, for each λ ∈ F, λv = ∑_{k=1}^{n} (λu_{i_k}) ∈ A, proving A to be a subspace of V. Thus, B ⊆ A, thereby concluding the proof. □

5.2 Linear Independence, Basis, Dimension


The notions introduced in the present section are of central importance to the theory
of vector spaces. In particular, we will see that the structure of many interesting vector
spaces (e.g. K^n) is particularly simple, as every vector can be written as a so-called linear combination of a fixed finite set of linearly independent vectors (we will see this is the case if, and only if, the vector space is finite-dimensional, cf. Th. 5.23 below).
Definition 5.12. Let V be a vector space over the field F .

(a) A vector v ∈ V is called linearly dependent on a subset U of V (or on the vectors in U) if, and only if, v = 0 or there exist n ∈ N and u1, . . . , un ∈ U such that v is a linear combination of u1, . . . , un. Otherwise, v is called linearly independent of U.
(b) A subset U of V is called linearly independent if, and only if, whenever 0 ∈ V is
written as a linear combination of distinct elements of U , then all coefficients must
be 0 ∈ F , i.e. if, and only if,
( n ∈ N ∧ W ⊆ U ∧ #W = n ∧ ∑_{u∈W} λu u = 0 ∧ ∀ u ∈ W : λu ∈ F )

⇒  ∀ u ∈ W : λu = 0.  (5.16a)

Occasionally, one also wants to have the notion available for families of vectors
instead of sets, and one calls a family (ui )i∈I of vectors in V linearly independent
if, and only if,
( n ∈ N ∧ J ⊆ I ∧ #J = n ∧ ∑_{j∈J} λj uj = 0 ∧ ∀ j ∈ J : λj ∈ F )

⇒  ∀ j ∈ J : λj = 0.  (5.16b)

Sets and families that are not linearly independent are called linearly dependent.
Example 5.13. Let V be a vector space over the field F .

(a) ∅ is linearly independent: Indeed, if U = ∅, then the left side of the implication in
(5.16a) is always false (since W ⊆ U means #W = 0), i.e. the implication is true.
Moreover, by Def. 5.12(a), v ∈ V is linearly dependent on ∅ if, and only if, v = 0 (this is also consistent with ∑_{u∈∅} λu u = 0, cf. (3.15d)).
(b) If 0 ∈ U ⊆ V , then U is linearly dependent (in particular, {0} is linearly dependent),
due to 1 · 0 = 0.
(c) If 0 ≠ v ∈ V and λ ∈ F with λv = 0, then λ = 0 by Prop. 5.3(d), showing {v} to be
linearly independent. However, the family (v, v) is always linearly dependent, since
1v + (−1)v = 0 (also for v = 0). Moreover, 1v = v also shows that every v ∈ V is
linearly dependent on itself.
(d) The following example is of some general importance: Let I be a set and V = F^I (cf. Ex. 5.2(c),(d)). Define

∀ i ∈ I :  ei : I −→ F,  ei(j) := δij := 1 if i = j,  0 if i ≠ j.  (5.17)

Then δ is known as the Kronecker delta (which can be seen as a function δ : I × I −→ {0, 1}). If n ∈ N and V = F^n, i.e. I = {1, . . . , n}, then one often writes the ei in the form e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en−1 = (0, . . . , 0, 1, 0), en = (0, . . . , 0, 1); for example in R^3 (or, more generally, in F^3), e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). Returning to the case of a general I, we check that the set

B := {ei : i ∈ I}  (5.18)

is linearly independent: Assume

∑_{j∈J} λj ej = 0 ∈ V,

where J ⊆ I, #J = n ∈ N, and λj ∈ F for each j ∈ J. Recalling that 0 ∈ V is the function that is constantly equal to 0 ∈ F, we obtain

∀ k ∈ J :  0 = ( ∑_{j∈J} λj ej )(k) = ∑_{j∈J} λj ej(k) = ∑_{j∈J} λj δjk = λk,

thereby proving the linear independence of B.

(e) For a Calculus application, we let

∀ k ∈ N0 :  fk : R −→ K,  fk(x) := e^{kx},

and show that

U := {fk : k ∈ N0}
is linearly independent as a subset of the vector space V := C(R, K) over K of
differentiable functions from R into K (cf. Ex. 5.6(d)(iii)), by showing each set
Un := {fk : k ∈ {0, . . . , n}}, n ∈ N0 , is linearly independent, using an induction
on n: Since, for n = 0, fn ≡ 1, the base case holds. Now let n ≥ 1 and assume
λ0 , . . . , λn ∈ K are such that
∀ x ∈ R :  ∑_{k=0}^{n} λk e^{kx} = 0.  (5.19)
Multiplying (5.19) by n yields

∀ x ∈ R :  ∑_{k=0}^{n} n λk e^{kx} = 0,  (5.20)

whereas differentiating (5.19) with respect to x yields

∀ x ∈ R :  ∑_{k=0}^{n} k λk e^{kx} = 0.  (5.21)

By subtracting (5.21) from (5.20), we then obtain

∀ x ∈ R :  ∑_{k=0}^{n−1} (n − k) λk e^{kx} = 0,

implying λ0 = · · · = λn−1 = 0 due to the induction hypothesis. Using this in (5.19) then yields

∀ x ∈ R :  λn e^{nx} = 0

and, thus, λn = 0 as well, completing the induction proof.
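The induction above is a complete proof; still, one can illustrate the linear independence of f0, . . . , fn numerically: evaluating at n + 1 distinct sample points x0, . . . , xn yields the matrix (e^{k x_m}), a Vandermonde matrix in the distinct nodes e^{x_m}, and full rank means no nontrivial coefficients λk can produce the zero function at these points. A Python sketch (numpy assumed; the sample points are an arbitrary choice):

    import numpy as np

    n = 4
    xs = np.linspace(0.0, 1.0, n + 1)           # n + 1 distinct sample points
    M = np.exp(np.outer(xs, np.arange(n + 1)))  # M[m, k] = e^(k * x_m)
    print(np.linalg.matrix_rank(M))             # 5, i.e. full rank: only the trivial kernel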

Proposition 5.14. Let V be a vector space over the field F and U ⊆ V .

(a) U is linearly dependent if, and only if, there exists u0 ∈ U such that u0 is linearly
dependent on U \ {u0 }.

(b) If U is linearly dependent and U ⊆ M ⊆ V , then M is linearly dependent as well.

(c) If U is linearly independent and M ⊆ U , then M is linearly independent as well.

(d) Let U be linearly independent. If U1 , U2 ⊆ U and V1 := hU1 i, V2 := hU2 i, then


V1 ∩ V2 = hU1 ∩ U2 i.

Proof. (a): Suppose U is linearly dependent. Then there exists W ⊆ U, #W = n ∈ N, such that ∑_{u∈W} λu u = 0 with λu ∈ F and there exists u0 ∈ W with λ_{u0} ≠ 0. Then

u0 = −λ_{u0}^{−1} ∑_{u∈W\{u0}} λu u = ∑_{u∈W\{u0}} (−λ_{u0}^{−1} λu) u,

showing u0 to be linearly dependent on U \ {u0}. Conversely, if u0 ∈ U is linearly dependent on U \ {u0}, then u0 = 0, in which case U is linearly dependent according to
Ex. 5.13(b), or u0 ≠ 0, in which case, there exist n ∈ N, distinct u1, . . . , un ∈ U \ {u0}, and λ1, . . . , λn ∈ F such that

u0 = ∑_{i=1}^{n} λi ui  ⇒  −u0 + ∑_{i=1}^{n} λi ui = 0,

showing U to be linearly dependent, since the coefficient of u0 is −1 ≠ 0.


(b) and (c) are now both immediate from (a).
(d): Since V1 ∩ V2 is a vector space with U1 ∩ U2 ⊆ V1 ∩ V2, we already have ⟨U1 ∩ U2⟩ ⊆ V1 ∩ V2. Now let u ∈ V1 ∩ V2. Then there exist distinct z1, . . . , zk ∈ U1 ∩ U2, distinct x_{k+1}, . . . , x_{k+n} ∈ U1 \ U2, and distinct y_{k+1}, . . . , y_{k+m} ∈ U2 \ U1 with k ∈ N0, m, n ∈ N, as well as λ1, . . . , λ_{k+n}, µ1, . . . , µ_{k+m} ∈ F such that

u = ∑_{i=1}^{k} λi zi + ∑_{i=k+1}^{k+n} λi xi = ∑_{i=1}^{k} µi zi + ∑_{i=k+1}^{k+m} µi yi,

implying

0 = ∑_{i=1}^{k} (λi − µi) zi + ∑_{i=k+1}^{k+n} λi xi − ∑_{i=k+1}^{k+m} µi yi.

The linear independence of U then yields λ1 = µ1, . . . , λk = µk as well as λ_{k+1} = · · · = λ_{k+n} = µ_{k+1} = · · · = µ_{k+m} = 0. Thus, in particular, u ∈ ⟨U1 ∩ U2⟩, proving V1 ∩ V2 ⊆ ⟨U1 ∩ U2⟩ as desired. □
Definition 5.15. Let V be a vector space over the field F and B ⊆ V . Then B is
called a basis of V if, and only if, B is a generating set for V (i.e. V = ⟨B⟩) that is also
linearly independent.
Example 5.16. (a) Due to Ex. 5.13(a), in each vector space V over a field F , we have
that ∅ is linearly independent with ⟨∅⟩ = {0}, i.e. ∅ is a basis of the trivial space
{0}.
(b) If one considers a field F as a vector space over itself, then, clearly, every {λ} with
0 ≠ λ ∈ F is a basis.
(c) We continue Ex. 5.13(d), where we showed that, given a field F and a set I, B = {ei : i ∈ I} with the ei defined by (5.17) constitutes a linearly independent subset of the vector space F^I over F. We will now show that ⟨B⟩ = F^I_fin (i.e. B is a basis of F^I_fin), where F^I_fin denotes the set of functions f : I −→ F such that there exists a finite set If ⊆ I, satisfying

f(i) = 0 for each i ∈ I \ If,  (5.22a)
f(i) ≠ 0 for each i ∈ If  (5.22b)

(then F^I_fin = F^I if, and only if, I is finite (for example F^n_fin = F^n for n ∈ N); but, in general, F^I_fin is a strict subset of F^I). We first show that F^I_fin is always a subspace of F^I: Indeed, if f, g ∈ F^I_fin and λ ∈ F, then I_{λf} = I_f for λ ≠ 0, I_{λf} = ∅ for λ = 0, and I_{f+g} ⊆ I_f ∪ I_g, i.e. λf ∈ F^I_fin and f + g ∈ F^I_fin. We also note B ⊆ F^I_fin, since I_{ei} = {i} for each i ∈ I. To see ⟨B⟩ = F^I_fin, we merely note that

∀ f ∈ F^I_fin :  f = ∑_{i∈If} f(i) ei.  (5.23)

Thus, B is a basis of F^I_fin as claimed – B is often referred to as the standard basis of F^I_fin. In particular, we have shown that, for each n ∈ N, the set {ej : j = 1, . . . , n}, where e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1), forms a basis of F^n (of R^n if F = R and of C^n if F = C). If I = N, then F^I_fin is the space of all sequences in F that have only finitely many nonzero entries. We will see later that, actually, every vector space is isomorphic to some suitable F^I_fin (cf. Rem. 5.25 and Th. 6.9).

(d) In Ex. 5.6(d)(i), we noted that the set P(K) of polynomials with coefficients in K is a vector space over K. We show the set B := {ej : j ∈ N0} of monomials ej : K −→ K, ej(x) := x^j, to be a basis of P(K): While ⟨B⟩ = P(K) is immediate from the definition of P(K), linear independence of B can be shown similarly to the linear independence of the exponential functions in Ex. 5.13(e): As in that example, we use the differentiability of the considered functions (here, the ej) to show each set Un := {ej : j ∈ {0, . . . , n}}, n ∈ N0, is linearly independent, using an induction on n: Since, for n = 0, en ≡ 1, the base case holds. Now let n ≥ 1 and assume λ0, . . . , λn ∈ K are such that

∀ x ∈ R :  ∑_{j=0}^{n} λj x^j = 0.  (5.24)

Differentiating (5.24) with respect to x yields

∀ x ∈ R :  ∑_{j=1}^{n} j λj x^{j−1} = 0,

implying λ1 = · · · = λn = 0 due to the induction hypothesis. Using this in (5.24) then yields λ0 = λ0 x^0 = 0, completing the induction proof. We will see later that P(K) is, actually, isomorphic to K^{N0}_fin.

Theorem 5.17. Let V be a vector space over the field F and B ⊆ V . Then the following
statements (i) – (iii) are equivalent:
(i) B is a basis of V .
(ii) B is a maximal linearly independent subset of V, i.e. B is linearly independent and each set A ⊆ V with B ⊊ A is linearly dependent.

(iii) B is a minimal generating set for V, i.e. ⟨B⟩ = V and ⟨A⟩ ⊊ V for each A ⊊ B.

Proof. “(i)⇒(ii)”: Let B ⊆ A ⊆ V and a ∈ A \ B. Since ⟨B⟩ = V, there exist λ1, . . . , λn ∈ F, b1, . . . , bn ∈ B, n ∈ N, such that

a = ∑_{i=1}^{n} λi bi  ⇒  (−1)a + ∑_{i=1}^{n} λi bi = 0,

showing A to be linearly dependent.


“(ii)⇒(i)”: Suppose B to be a maximal linearly independent subset of V. We need to show ⟨B⟩ = V. Since B ⊆ ⟨B⟩, let v ∈ V \ B. Then A := {v} ∪ B is linearly dependent, i.e.

∃ W ⊆ A :  ( #W = n ∈ N ∧ ∑_{u∈W} λu u = 0 ∧ ∀ u ∈ W : λu ∈ F ∧ ∃ u ∈ W : λu ≠ 0 ),  (5.25)

where the linear independence of B implies v ∈ W and λv ≠ 0. Thus,

v = −λ_v^{−1} ∑_{u∈W\{v}} λu u,

showing v ∈ ⟨B⟩, since W \ {v} ⊆ B.


“(i)⇒(iii)”: Let A ⊊ B and b ∈ B \ A. Since B is linearly independent, Prop. 5.14(a) implies b not to be a linear combination of elements in A, showing ⟨A⟩ ⊊ V.
“(iii)⇒(i)”: We need to show that a minimal generating set for V is linearly independent.
Arguing by contraposition, we let B be a generating set for V that is linearly dependent
and show B is not minimal: Since B is linearly dependent, (5.25) holds with A replaced by B and there exists b0 ∈ W with λ_{b0} ≠ 0, yielding

b0 = −λ_{b0}^{−1} ∑_{u∈W\{b0}} λu u.  (5.26)

We show A := B \ {b0} to be a generating set for V: If v ∈ V, since ⟨B⟩ = V, there exist λ0, λ1, . . . , λn ∈ F and b1, . . . , bn ∈ A, n ∈ N, such that

v = λ0 b0 + ∑_{i=1}^{n} λi bi =(5.26) −∑_{u∈W\{b0}} λ0 λ_{b0}^{−1} λu u + ∑_{i=1}^{n} λi bi.
Since u ∈ W \ {b0} ⊆ A and b1, . . . , bn ∈ A, this proves A to be a generating set for V, i.e. B is not minimal. □
Proposition 5.18. Let V be a vector space over the field F and U ⊆ V a finite subset, #U = n ∈ N, u0 ∈ U, v0 := ∑_{u∈U} λu u with λu ∈ F for each u ∈ U and λ_{u0} ≠ 0, Ũ := {v0} ∪ (U \ {u0}).

(a) If ⟨U⟩ = V, then ⟨Ũ⟩ = V.


(b) If U is linearly independent, then so is Ũ .
(c) If U is a basis of V , then so is Ũ .

Proof. Under the hypothesis, we have

u0 = λ_{u0}^{−1} v0 − λ_{u0}^{−1} ∑_{u∈U\{u0}} λu u.  (5.27)

(a): If v ∈ V, then there exist µu ∈ F, u ∈ U, such that

v = ∑_{u∈U} µu u = µ_{u0} u0 + ∑_{u∈U\{u0}} µu u =(5.27) µ_{u0} λ_{u0}^{−1} v0 + ∑_{u∈U\{u0}} (µu − µ_{u0} λ_{u0}^{−1} λu) u,

showing ⟨Ũ⟩ = V.
(b): Suppose µ0 ∈ F and µu ∈ F, u ∈ U \ {u0}, are such that

0 = µ0 v0 + ∑_{u∈U\{u0}} µu u =(5.27) µ0 λ_{u0} u0 + ∑_{u∈U\{u0}} (µ0 λu + µu) u.

Since U is linearly independent, we obtain µ0 λu0 = 0 = µ0 λu + µu for each u ∈ U \ {u0 }.


Since λ_{u0} ≠ 0, we have µ0 = 0, then also implying µu = 0 for each u ∈ U \ {u0}, showing
Ũ to be linearly independent.
(c) is now immediate from combining (a) and (b). 
Theorem 5.19 (Coordinates). Let V be a vector space over the field F and assume
B ⊆ V is a basis of V . Then each vector v ∈ V has unique coordinates with respect
to the basis B, i.e., for each v ∈ V , there exists a unique finite subset Bv of B and a
unique map c : Bv −→ F \ {0} such that
v = ∑_{b∈Bv} c(b) b.  (5.28)

Note that, for v = 0, one has Bv = ∅, c is the empty map, and (5.28) becomes 0 = ∑_{b∈∅} c(b) b.
Proof. The existence of Bv and the map c follows from the fact that the basis B is a generating set, ⟨B⟩ = V. For the uniqueness proof, consider finite sets Bv, B̃v ⊆ B and maps c : Bv −→ F \ {0}, c̃ : B̃v −→ F \ {0} such that

v = ∑_{b∈Bv} c(b) b = ∑_{b∈B̃v} c̃(b) b.  (5.29)

Extend both c and c̃ to A := Bv ∪ B̃v by letting c(b) := 0 for b ∈ B̃v \ Bv and c̃(b) := 0 for b ∈ Bv \ B̃v. Then

0 = ∑_{b∈A} (c(b) − c̃(b)) b,  (5.30)

such that the linear independence of A implies c(b) = c̃(b) for each b ∈ A, which, in
turn, implies Bv = B̃v and c = c̃. 
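For V = R^n and a basis b1, . . . , bn, the coordinates of Th. 5.19 are computed by solving the linear system Bλ = v, where B is the matrix with columns b1, . . . , bn (solution methods are the subject of Sec. 8; here a library routine is used). A Python sketch with an arbitrarily chosen basis, assuming numpy:

    import numpy as np

    # columns of B are the basis vectors b1 = (1,0,0), b2 = (1,1,0), b3 = (0,1,1)
    B = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    v = np.array([2.0, 3.0, 1.0])
    lam = np.linalg.solve(B, v)  # unique by Th. 5.19, since B is invertible
    print(lam)                   # [0. 2. 1.]: v = 0*b1 + 2*b2 + 1*b3
    assert np.allclose(B @ lam, v)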

Remark 5.20. If the basis B of V has finitely many elements, then one often enumerates
the elements B = {b1 , . . . , bn }, n = #B ∈ N, and writes λi := c(bi ) for bi ∈ Bv , λi := 0
for bi ∉ Bv, such that (5.28) takes the, perhaps more familiar looking, form

v = ∑_{i=1}^{n} λi bi.  (5.31)

In Ex. 5.16(c), we have seen that the vector space F^n, n ∈ N, over F has the finite basis {e1, . . . , en} and, more generally, that, for each set I, B = {ei : i ∈ I} is a basis for the vector space F^I_fin over F. Since, clearly, B and I have the same cardinality (via
the bijective map i 7→ ei ), this shows that, for each set I, there exists a vector space,
whose basis has the same cardinality as I. In Th. 5.23 below, we will show one of the
fundamental results of vector space theory, namely that every vector space has a basis
and that two bases of the same vector space always have the same cardinality (which is
referred to as the dimension of the vector space, cf. Def. 5.24).
As with some previous results of this class, the proof of Th. 5.23 makes use of the axiom
of choice (AC) of Appendix A.4, which postulates, for each nonempty set M, whose
elements are all nonempty sets, the existence of a choice function, that means a function
that assigns, to each M ∈ M, an element m ∈ M. Somewhat surprisingly, AC cannot
be proved (or disproved) from ZF, i.e. from the remaining standard axioms of set theory
(see Appendix A.4 for a reference). Previously, when AC was used, choice functions were
employed directly. However, in the proof of Th. 5.23, AC enters in the form of Zorn’s
lemma. While Zorn’s lemma turns out to be equivalent to AC, the equivalence is not
obvious (see Th. A.52(iii)). However, Zorn’s lemma provides an important technique
for conducting existence proofs that is encountered frequently throughout mathematics,


and the proof of existence of a basis in a vector space is the standard place to first
encounter and learn this technique (as it turns out, the existence of bases in vector
spaces is, actually, also equivalent to AC – see the end of the proof of Th. A.52 for a
reference).
The following Def. 5.21 prepares the statement of Zorn’s lemma:

Definition 5.21 (Same as Part of Appendix Def. A.50). Let X be a set and let ≤ be
a partial order on X.

(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists
no x ∈ X such that m < x (note that a maximal element does not have to be a
max and that a maximal element is not necessarily unique).

(b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by
≤.

Theorem 5.22 (Zorn’s Lemma). Let X be a nonempty partially ordered set. If every
chain C ⊆ X (i.e. every nonempty totally ordered subset of X) has an upper bound in
X (such chains with upper bounds are sometimes called inductive), then X contains a
maximal element (cf. Def. 5.21(a)).

Proof. According to Th. A.52(iii), Zorn’s lemma is equivalent to the axiom of choice. 

Theorem 5.23 (Bases). Let V be a vector space over the field F .

(a) If U ⊆ V is linearly independent, then there exists a basis of V that contains U .

(b) V has a basis B ⊆ V .

(c) Bases of V have a unique cardinality, i.e. if B ⊆ V and B̃ ⊆ V are both bases of
V , then there exists a bijective map φ : B −→ B̃. In particular, if #B = n ∈ N0 ,
then #B̃ = n.

(d) If B is a basis of V and U ⊆ V is linearly independent, then there exists C ⊆ B such


that B̃ := U ∪˙ C is a basis of V . In particular, if #B = n ∈ N, B = {b1 , . . . , bn },
then #U = m ∈ N with m ≤ n, and, if U = {u1 , . . . , um }, then there exist distinct
um+1 , . . . , un ∈ B such that B̃ = {u1 , . . . , un } is a basis of V .

Proof. If V is generated by finitely many vectors, then the proof does not need Zorn’s
lemma and, thus, we treat this case first: If V = {0}, then (a) – (d) clearly hold. Thus,
we also assume V 6= {0}. To prove (d), assume B = {b1 , . . . , bn } is a basis of V with
5 VECTOR SPACES 107

#B = n ∈ N. We first show, by induction on m ∈ {1, . . . , n} that, if U is linearly


independent, #U = m, U = {u1, . . . , um}, then there exist distinct u_{m+1}, . . . , un ∈ B such that B̃ = {u1, . . . , un} is a basis of V: If m = 1, we write u1 = ∑_{i=1}^{n} λi bi with
λi ∈ F . Since u1 6= 0, there exists i0 ∈ {1, . . . , n} such that λi0 6= 0. Then B̃ :=
{u1 } ∪ (B \ {bi0 }) is a basis of V by Prop. 5.18(c), thereby establishing the base case. If
1 < m ≤ n, then, by induction hypothesis, there exist distinct cm , cm+1 , . . . , cn ∈ B such
that {u1 , . . . , um−1 } ∪ {cm , cm+1 , . . . , cn } forms a basis of V . Then there exist µi ∈ F
such that

um = ∑_{i=1}^{m−1} µi ui + ∑_{i=m}^{n} µi ci.
There then must exist i0 ∈ {m, . . . , n} such that µ_{i0} ≠ 0, since, otherwise, U would be
linearly dependent. Thus, again, by Prop. 5.18(c), B̃ := U ∪ ({cm , cm+1 , . . . , cn } \ {ci0 })
forms a basis of V , completing the induction. Now suppose m > n. Then the above
shows {u1 , . . . , un } to form a basis of V , in contradiction to the linear independence of
U. This then also implies that U cannot be infinite. Thus, we always have #U = m ≤ n
and #(B \ {um+1 , . . . , un }) = m, completing the proof of (d). Now if U is not merely
linearly independent, but itself a basis of V , then we can switch the roles of B and U ,
obtaining m = n, proving (c). Since we assume V to be generated by finitely many
vectors, there exists C ⊆ V with ⟨C⟩ = V and #C = k ∈ N. Then C is either a basis or
it is not minimal, in which case there exists c ∈ C such that C̃ := C \ {c} still generates
V , #C̃ = k − 1. Proceeding inductively, we can remove vectors until we obtain a basis
of V , proving (b). Combining (b) with (d), now proves (a).
General Case: Here, we prove (a) first, making use of Zorn’s lemma: Given a linearly
independent set U ⊆ V , define
M := {M ⊆ V : U ⊆ M and M is linearly independent} (5.32)
and note that set inclusion ⊆ constitutes a partial order on M. Moreover, U ∈ M, i.e.
M 6= ∅. Let C ⊆ M be a chain, i.e. assume C 6= ∅ to be totally ordered by ⊆. Define
C0 := ∪_{C∈C} C.  (5.33)

It is then immediate that C ∈ C implies C ⊆ C0 , i.e. C0 is an upper bound for C.


Moreover, C0 ∈ M: Indeed, if C ∈ C, then U ⊆ C ⊆ C0 . To verify C0 is linearly
independent, let c1 , . . . , cn ∈ C0 be distinct and λ1 , . . . , λn ∈ F , n ∈ N, with
0 = ∑_{i=1}^{n} λi ci.

Then, for each i ∈ {1, . . . , n}, there exists Ci ∈ C with ci ∈ Ci . Since


D := {C1 , . . . , Cn } ⊆ C
is totally ordered by ⊆, by Th. 3.25(a), there exists i0 ∈ {1, . . . , n} such that C_{i0} = max D,
i.e. Ci ⊆ Ci0 for each i ∈ {1, . . . , n}, implying ci ∈ Ci0 for each i ∈ {1, . . . , n} as well.
As Ci0 is linearly independent, λ1 = · · · = λn = 0, showing C0 is linearly independent,
as desired. Having, thus, verified all hypotheses of Zorn’s lemma, Th. 5.22 provides a
maximal element B ∈ M. By Th. 5.17(ii), B is a basis of V . As U ⊆ B also holds, this
proves (a) and, in particular, (b).
To prove (c), let B1 , B2 be bases of V . As we have shown above that (c) holds, provided
B1 or B2 is finite, we now assume B1 and B2 both to be infinite. We show the existence
of an injective map φ1 : B2 −→ B1 : To this end, for each v ∈ B1 , let B2,v be the finite
subset of B2 given by Th. 5.19, consisting of all b ∈ B2 such that the coordinate of v
with respect to b is nonzero. Let ψv : B2,v −→ {1, . . . , #B2,v } be bijective. Also define
E := ∪_{v∈B1} B_{2,v}.

Then, since V = ⟨B1⟩, we also have V = ⟨E⟩. Thus, as E ⊆ B2 and the basis B2 is
a minimal generating set, we obtain E = B2 . In particular, for each b ∈ B2 , we may
choose some v(b) ∈ B1 such that b ∈ B2,v(b) . The map

f : B2 −→ B1 × N,  f(b) := (v(b), ψ_{v(b)}(b)),

is then, clearly, well-defined and injective. Moreover, Th. A.57 of the Appendix provides
us with a bijective map φB1 : B1 × N −→ B1 . In consequence, φ1 : B2 −→ B1 ,
φ1 := φB1 ◦ f is injective as well. Since B1 , B2 were arbitrary bases, we can interchange
the roles of B1 and B2 to also obtain an injective map φ2 : B1 −→ B2 . According to
the Schröder-Bernstein Th. 3.12, there then also exists a bijective map φ : B1 −→ B2 ,
completing the proof of (c).
To prove (d), let B be a basis of V and let U ⊆ V be linearly independent. Analogously
to the proof of (a), apply Zorn’s lemma, but this time with the set

M := {M ⊆ V : M = U ∪˙ C, C ⊆ B, M is linearly independent}, (5.34)

such that the maximal element B̃ ∈ M, obtained from Th. 5.22, has the form B̃ = U ∪˙ C
with C ⊆ B. We claim B̃ to be a basis of V : Linear independence is clear, since B̃ ∈ M,
and it only remains to check ⟨B̃⟩ = V. Due to the maximality of B̃, if b ∈ B, there exist a finite set B̃b ⊆ B̃ and λ^b_u ∈ F, u ∈ B̃b, such that

b = ∑_{u∈B̃b} λ^b_u u.  (5.35)

Now let v ∈ V be arbitrary. As B is a basis of V , there exist b1 , . . . , bn ∈ B and


λ1 , . . . , λn ∈ F , n ∈ N, such that
v = ∑_{i=1}^{n} λi bi =(5.35) ∑_{i=1}^{n} ∑_{u∈B̃_{bi}} λi λ^{bi}_u u,

proving ⟨B̃⟩ = V as desired. □

Definition 5.24. According to Th. 5.23, for each vector space V over a field F , the
cardinality of its bases is unique. This unique cardinality is called the dimension of V and
is denoted dim V . If dim V < ∞ (i.e. dim V ∈ N0 ), then V is called finite-dimensional,
otherwise infinite-dimensional.

Remark 5.25. Let F be a field. In Ex. 5.16(c), we saw that B = {ei : i ∈ I} with

∀ i ∈ I :  ei : I −→ F,  ei(j) = δij := 1 if i = j,  0 if i ≠ j,

is a basis of F^I_fin. Since #B = #I, we have shown

dim F^I_fin = #I,  (5.36)

and, in particular, for I = {1, . . . , n}, n ∈ N,

dim F^n = dim R^n = dim C^n = n.  (5.37)

We will see in Th. 6.9 below that, in a certain sense, F^I_fin is the only vector space of dimension #I over F. In particular, for n ∈ N, one can think of F^n as the standard model of an n-dimensional vector space over F.

Remark 5.26. Bases of vector spaces are especially useful if they are much smaller than
the vector space itself, as in the case of finite-dimensional vector spaces over infinite
fields, Rn and Cn being among the most important examples. For infinite-dimensional
vector spaces V , one often has dim V = #V (cf. Th. E.1 and Cor. E.2 in the Appendix)
such that vector space bases are not as helpful in such situations. On the other hand,
infinite-dimensional vector spaces might come with some additional structure, providing
a more useful notion of basis (e.g., in infinite-dimensional Hilbert spaces, one typically
uses so-called orthonormal bases rather than vector space bases).

In the rest of the section, we investigate relations between bases and subspaces; and we
introduce the related notions direct complement and direct sum.
Theorem 5.27. Let V be a vector space over the field F and let U be a subspace of V .

(a) If BU is a basis of U , then there exists a basis B of V with BU ⊆ B.

(b) dim U ≤ dim V , i.e. if BU is a basis of U and B is a basis of V , then there exists
an injective map φ : BU −→ B.

(c) There exists a subspace U ′ of V such that V = U + U ′ and U ∩ U ′ = {0}.

(d) If dim V = n ∈ N0 , then

dim U = dim V ⇔ U = V.

Proof. (a): Let BU be a basis of U . Since BU is a linearly independent subset of V , we


can employ Th. 5.23(a) to obtain C ⊆ V such that B := BU ∪˙ C is a basis of V .
(b): Let BU and B be bases of U and V , respectively. As in the proof of (a), choose
C ⊆ V such that B̃ := BU ∪˙ C is a basis of V . According to Th. 5.23(c), there exists a
bijective map φV : B̃ −→ B. Then the map φ := φV ↾BU is still injective.
(c): Once again, let BU be a basis of U and choose C ⊆ V such that B := BU ∪˙ C is a
basis of V. Define U′ := ⟨C⟩. Since B = BU ∪˙ C is a basis of V, it is immediate that
V = U + U ′ . Now suppose u ∈ U ∩ U ′ . Then there exist λ1 , . . . , λm , µ1 , . . . , µn ∈ F with
m, n ∈ N, as well as distinct b1 , . . . , bm ∈ BU and distinct c1 , . . . , cn ∈ C such that
u = ∑_{i=1}^{m} λi bi = ∑_{i=1}^{n} µi ci  ⇒  0 = ∑_{i=1}^{m} λi bi − ∑_{i=1}^{n} µi ci.

Since {b1 , . . . , bm , c1 , . . . , cn } ⊆ B is linearly independent, this implies 0 = λ1 = · · · =


λm = µ1 = · · · = µn and u = 0.
(d): Let dim U = dim V = n ∈ N0 . If BU is a basis of U , then #BU = n and, by (a),
BU must also be a basis of V, implying U = ⟨BU⟩ = V. □

Definition 5.28. Let V be a vector space over the field F and let U, W be subspaces
of V . Then W is called a direct complement of U if, and only if, V = U + W and
U ∩ W = {0}. In that case, we also write V = U ⊕ W and we say that V is the direct
sum of U and W .

Caveat: While set-theoretic complements A \ B are uniquely determined by A and B,


nontrivial direct complements of vector subspaces are never unique, as shown in the
following proposition:
Proposition 5.29. Let V be a vector space over the field F and let U, W be subspaces
of V such that V = U ⊕ W. If there exist 0 ≠ u ∈ U and 0 ≠ w ∈ W, then there exists a subspace W̃ ≠ W of V with V = U ⊕ W̃.

Proof. According to Th. 5.23(d), there exist bases BU of U and BW of W with u ∈ BU


and w ∈ BW. Then v := u + w ∉ W, since, otherwise, u = (u + w) − w ∈ W, in contradiction to U ∩ W = {0}. Let B̃ := {v} ∪ (BW \ {w}) and W̃ := ⟨B̃⟩. Suppose x ∈ U ∩ W̃. Then there exist distinct w1, . . . , wn ∈ BW \ {w} and λ0, λ1, . . . , λn ∈ F,
n ∈ N, such that
x = λ0(u + w) + ∑_{i=1}^{n} λi wi.

Since x ∈ U, we also obtain x − λ0 u ∈ U. Also x − λ0 u = λ0 w + ∑_{i=1}^{n} λi wi ∈ W, implying

0 = x − λ0 u = λ0 w + ∑_{i=1}^{n} λi wi.

However, w, w1 , . . . , wn ∈ BW are linearly independent, yielding 0 = λ0 = · · · = λn


and, thus, x = 0, showing U ∩ W̃ = {0}. To obtain V = U ⊕ W̃ , it remains to show
w ∈ U + W̃ (since V = U ⊕ W , U ⊆ U + W̃ , and BW \ {w} ⊆ W̃ ⊆ U + W̃ ). Since
w = −u + (u + w) ∈ U + W̃ , the proof is complete. 
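Prop. 5.29 is easy to visualize in V = R^2 with U = ⟨{(1, 0)}⟩: both W = ⟨{(0, 1)}⟩ and W̃ = ⟨{(1, 1)}⟩ are direct complements of U. A small numerical check (Python with numpy; by (5.38) below, rank 2 of the combined matrix gives U + W = R^2 and then dim(U ∩ W) = 1 + 1 − 2 = 0):

    import numpy as np

    u = np.array([1.0, 0.0])             # U = span{u}
    for w in (np.array([0.0, 1.0]),      # W  = span{(0, 1)}
              np.array([1.0, 1.0])):     # W~ = span{(1, 1)}
        rank = np.linalg.matrix_rank(np.column_stack([u, w]))
        print(rank)                      # 2 both times: each w spans a complement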
Theorem 5.30. Let V be a vector space over the field F and let U, W be subspaces of
V . Moreover, let BU , BW , B+ , B∩ be bases of U, W, U + W, U ∩ W , respectively.

(a) U + W = ⟨BU ∪ BW⟩.
(b) If U ∩ W = {0}, then BU ∪˙ BW is a basis of U ⊕ W .
(c) It holds that
dim(U + W ) + dim(U ∩ W ) = dim U + dim W, (5.38)
by which we mean that there exists a bijective map

φ : (BU × {0}) ∪˙ (BW × {1}) −→ (B+ × {0}) ∪˙ (B∩ × {1}).

(d) Suppose B := BU ∪˙ BW is a basis of V . Then V = U ⊕ W .

Proof. (a): Let B := BU ∪BW . If v = u+w with u ∈ U , w ∈ W , then there exist distinct
u1 , . . . , un ∈ BU , distinct w1 , . . . , wm ∈ BW (m, n ∈ N), and λ1 , . . . , λn , µ1 , . . . , µm ∈ F
such that

v = u + w = ∑_{i=1}^{n} λi ui + ∑_{i=1}^{m} µi wi,  (5.39)
showing U + W = ⟨B⟩.
(b): Let B := BU ∪ BW. We know U ⊕ W = ⟨B⟩ from (a). Now consider (5.39)
with u + w = 0. Then u = −w ∈ U ∩ W = {0}, i.e. u = 0. Then w = 0 as well and
0 = λ1 = · · · = λn = µ1 = · · · = µm , due to the linear independence of the u1 , . . . , un and
the linear independence of the w1 , . . . , wm . In consequence, B is linearly independent
as well and, thus, a basis of U ⊕ W .
(c): According to Th. 5.27(a), we may choose BU and BW such that B∩ ⊆ BU and
B∩ ⊆ BW. Let B0 := BU \ B∩, U0 := ⟨B0⟩, B1 := BW \ B∩, U1 := ⟨B1⟩. Then
B := BU ∪ BW = B0 ∪˙ B1 ∪˙ B∩
is linearly independent: Consider, once again, the situation of the proof of (b), with
v = 0 and, this time, with u1, . . . , un ∈ B0. Again, we obtain u = −w ∈ W, i.e.

u ∈ U0 ∩ (U ∩ W) = ⟨B0⟩ ∩ ⟨B∩⟩ = ⟨B0 ∩ B∩⟩ = {0},

where Prop. 5.14(d) was used at the second equality. Thus, λ1 = · · · = λn = 0, due to the linear independence of B0. This then implies
w = 0 and 0 = µ1 = · · · = µm due to the linear independence of BW , yielding the
claimed linear independence of B. According to (a), we also have U + W = ⟨B⟩, i.e. B
is a basis of U + W . Now we are in a position to define
φ : (BU × {0}) ∪˙ (BW × {1}) −→ (B × {0}) ∪˙ (B∩ × {1}),

φ(b, α) := (b, 0) for b ∈ BU and α = 0,
           (b, 0) for b ∈ B1 and α = 1,
           (b, 1) for b ∈ B∩ and α = 1,
which is, clearly, bijective. Bearing in mind Th. 5.23(c), this proves (c).
(d): As B = BU ∪˙ BW is a basis of V , we can write each v ∈ V as v = u + w with u ∈ U
and w ∈ W, showing V = U + W. On the other hand,

U ∩ W = ⟨BU⟩ ∩ ⟨BW⟩ = ⟨BU ∩ BW⟩ = {0},

where Prop. 5.14(d) was used at the second equality, proving V = U ⊕ W. □
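For subspaces of R^n given by spanning vectors, (5.38) can be checked by rank computations: dim U, dim W, and dim(U + W) are the ranks of the corresponding matrices, and dim(U ∩ W) then follows from (5.38). A Python sketch with arbitrarily chosen data (numpy assumed):

    import numpy as np

    U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # columns span U = <e1, e2>
    W = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # columns span W = <e2, e3>
    dim_U = np.linalg.matrix_rank(U)                    # 2
    dim_W = np.linalg.matrix_rank(W)                    # 2
    dim_sum = np.linalg.matrix_rank(np.hstack([U, W]))  # dim(U + W) = 3
    dim_cap = dim_U + dim_W - dim_sum                   # dim(U ∩ W) = 1, by (5.38)
    print(dim_U, dim_W, dim_sum, dim_cap)               # 2 2 3 1; U ∩ W = <e2>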

6 Linear Maps

6.1 Basic Properties and Examples


In previous sections, we have studied structure-preserving maps, i.e. homomorphisms,
in the context of magmas (in particular, groups) and rings (in particular, fields), cf. Def.
4.10 and Def. 4.34(b). In the context of vector spaces, the homomorphisms are so-called
linear maps, which we proceed to define and study next.

Definition 6.1. Let V and W be vector spaces over the field F . A map A : V −→ W
is called vector space homomorphism or F -linear (or merely linear if the field F is
understood) if, and only if,

∀ A(v1 + v2 ) = A(v1 ) + A(v2 ), (6.1a)


v1 ,v2 ∈V

∧ ∀ ∀ A(λv) = λA(v) (6.1b)


λ∈F v∈V

or, equivalently, if, and only if,

∀ λ, µ ∈ F ∀ v1, v2 ∈ V :  A(λv1 + µv2) = λ A(v1) + µ A(v2)  (6.2)

(note that, in general, vector addition and scalar multiplication will be different on the
left-hand sides and right-hand sides of the above equations). Thus, A is linear if, and
only if, it is a group homomorphism with respect to addition and also satisfies (6.1b).
In particular, if A is linear, then ker A and Im A are defined by Def. 4.19, i.e.

ker A = A^{−1}({0}) = {v ∈ V : A(v) = 0},  (6.3a)
Im A = A(V) = {A(v) : v ∈ V}.  (6.3b)

We denote the set of all F -linear maps from V into W by L(V, W ). The notions
vector space (or linear) monomorphism, epimorphism, isomorphism, endomorphism,
automorphism are then defined as in Def. 4.10. Moreover, V and W are called isomorphic (denoted V ≅ W) if, and only if, there exists a vector space isomorphism A : V −→ W.

Before providing examples of linear maps in Ex. 6.7 below, we first provide and study
some basic properties of such maps.

Notation 6.2. For the composition of linear maps A and B, we often write BA instead
of B ◦ A. For the application of a linear map, we also often write Av instead of A(v).

Proposition 6.3. Let V, W, X be vector spaces over the field F .

(a) If A : V −→ W and B : W −→ X are linear, then so is BA.

(b) If A : V −→ W is a linear isomorphism, then A−1 is a linear isomorphism as well


(i.e. A−1 is not only bijective, but also linear).
(c) Let A ∈ L(V, W ). If UW is a subspace of W and UV is a subspace of V , then


A−1 (UW ) is a subspace of V and A(UV ) is a subspace of W . The most important
special cases are that ker A = A−1 ({0}) is a subspace of V and Im A = A(V ) is a
subspace of W .

(d) A ∈ L(V, W ) is injective if, and only if, ker A = {0}.

Proof. (a): We note BA to be a homomorphism with respect to addition by Prop.


4.11(a). Moreover, if v ∈ V and λ ∈ F , then

(BA)(λv) = B(A(λv)) = B(λ(Av)) = λ(B(Av)) = λ((BA)(v)),

proving the linearity of BA.


(b): A−1 is a homomorphism with respect to addition by Prop. 4.11(b). Moreover, if
w ∈ W and λ ∈ F , then
   
A^{−1}(λw) = A^{−1}(λ A(A^{−1}w)) = A^{−1}(A(λ(A^{−1}w))) = λ(A^{−1}w),

proving the linearity of A−1 .


(c): According to Th. 4.20(a),(d), A−1 (UW ) is a subgroup of V and A(UV ) is a subgroup
of W . Since 0 ∈ UW , 0 ∈ A−1 (UW ). Moreover, if λ ∈ F and v ∈ A−1 (UW ), then
Av ∈ UW and, thus
A(λv) = λ(Av) ∈ UW ,
since UW is a subspace of W . This shows λv ∈ A−1 (UW ) and that A−1 (UW ) is a subspace
of V . If w ∈ Im A, then there exists v ∈ V with Av = w. Thus, if λ ∈ F , then

A(λv) = λ(Av) = λw,

showing λw ∈ Im A and that Im A is a subspace of W . In particular A(UV ) = Im(A↾UV )


is a subspace of W .
(d) is merely a restatement of Th. 4.20(e) for the current situation. 
Proposition 6.4. Let V be a vector space over the field F and let W be a set with
compositions + : W × W −→ W and · : F × W −→ W . If A : V −→ W is an
epimorphism with respect to + that also satisfies (6.1b), then W (with + and ·), is a
vector space over F as well.

Proof. Exercise. 
Proposition 6.5. Let V and W be vector spaces over the field F , and let A : V −→ W
be linear.
(a) A is injective if, and only if, for each linearly independent subset S of V , A(S) is
a linearly independent subset of W .

(b) A is surjective if, and only if, for each generating subset S of V , A(S) is a generating
subset of W .

(c) A is bijective if, and only if, for each basis B of V , A(B) is a basis of W .

Proof. Exercise. 
Theorem 6.6. Let V and W be vector spaces over the field F . Then each linear map
A : V −→ W is uniquely determined by its values on a basis of V . More precisely, if B is
a basis of V , (wb )b∈B is a family in W , and, for each v ∈ V , Bv and cv : Bv −→ F \ {0}
are as in Th. 5.19 (we now write cv instead of c to underline the dependence of c on v),
then the map
A : V −→ W,  A(v) = A( ∑_{b∈Bv} cv(b) b ) := ∑_{b∈Bv} cv(b) wb,  (6.4)

is linear, and à ∈ L(V, W ) with

∀ b ∈ B :  Ã(b) = wb,  (6.5)

implies A = Ã.

Proof. We first verify A is linear. Let v ∈ V and λ ∈ F . If λ = 0, then A(λv) = A(0) =


0 = λA(v). If λ ≠ 0, then B_{λv} = Bv, c_{λv} = λcv, and

A(λv) = A( ∑_{b∈B_{λv}} c_{λv}(b) b ) = λ ∑_{b∈Bv} cv(b) wb = λ A( ∑_{b∈Bv} cv(b) b ) = λ A(v).  (6.6a)

Now let u, v ∈ V . If u = 0, then A(u + v) = A(v) = 0 + A(v) = A(u) + A(v),


and analogously if v = 0. So assume u, v ≠ 0. If u + v = 0, then v = −u and A(u + v) = A(0) = 0 = A(u) + A(−u) = A(u) + A(v). If u + v ≠ 0, then B_{u+v} ⊆ Bu ∪ Bv and

A(u + v) = A( ∑_{b∈B_{u+v}} c_{u+v}(b) b ) = ∑_{b∈B_{u+v}} c_{u+v}(b) wb
         = ∑_{b∈Bu} cu(b) wb + ∑_{b∈Bv} cv(b) wb = A(u) + A(v).  (6.6b)
If v ∈ V and Bv and cv are as before, then the linearity of à and (6.5) imply

Ã(v) = Ã( ∑_{b∈Bv} cv(b) b ) = ∑_{b∈Bv} cv(b) Ã(b) = ∑_{b∈Bv} cv(b) wb = A(v).  (6.7)

Since (6.7) establishes à = A, the proof is complete. □


Example 6.7. (a) Let V and W be finite-dimensional vector spaces over the field F ,
dim V = n, dim W = m, where m, n ∈ N0 . From Th. 6.6, we obtain a very useful
characterization of the linear maps between V and W : If V = {0} or W = {0},
then A ≡ 0 is the only element of L(V, W ). Now let b1 , . . . , bn ∈ V form a basis of
V and let c1 , . . . , cm ∈ W form a basis of W . Given A ∈ L(V, W ), define
∀ i ∈ {1, . . . , n} :  wi := A(bi).

Then, by Th. 5.19, there exists a unique family (aji)_{(j,i)∈{1,...,m}×{1,...,n}} in F such that

∀ i ∈ {1, . . . , n} :  wi = A(bi) = ∑_{j=1}^{m} aji cj.  (6.8)

Thus, given v = ∑_{i=1}^{n} λi bi ∈ V, λ1, . . . , λn ∈ F, we find

A(v) = A( ∑_{i=1}^{n} λi bi ) = ∑_{i=1}^{n} λi wi = ∑_{i=1}^{n} ∑_{j=1}^{m} λi aji cj = ∑_{j=1}^{m} ( ∑_{i=1}^{n} aji λi ) cj
     = ∑_{j=1}^{m} µj cj,  where µj := ∑_{i=1}^{n} aji λi.  (6.9)

Thus, from Th. 6.6, we conclude that the map, assigning to each vector v = ∑_{i=1}^{n} λi bi ∈ V the vector ∑_{j=1}^{m} µj cj with µj := ∑_{i=1}^{n} aji λi, is precisely the unique linear map A : V −→ W satisfying (6.8).
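In the language of Sec. 7, the family (aji) is the matrix representing A with respect to the chosen bases, and (6.9) is exactly matrix-vector multiplication: the coordinate column (µ1, . . . , µm) arises from (λ1, . . . , λn) via µ = aλ. A minimal Python illustration with arbitrary numbers (numpy assumed):

    import numpy as np

    a = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])  # a[j, i] = a_ji; here m = 2, n = 3
    lam = np.array([1.0, 1.0, 2.0])  # coordinates of v w.r.t. b1, b2, b3
    mu = a @ lam                     # mu_j = sum_i a_ji * lam_i, as in (6.9)
    print(mu)                        # [3. 7.]: coordinates of A(v) w.r.t. c1, c2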
(b) Let V be a vector space over the field F , let B be a basis of V . Given v ∈ V , let
Bv and cv : Bv −→ F \ {0} be as in Th. 5.19. For each c ∈ B, define
πc : V −→ F,  πc(v) := cv(c) for c ∈ Bv,  0 for c ∉ Bv,

calling πc the projection onto the coordinate with respect to c. Comparing with (6.4), we see that πc is precisely the linear map A : V −→ W := F, determined by the family (wb)_{b∈B} in F with

∀ b ∈ B :  wb := δbc := 1 for b = c,  0 for b ≠ c,  (6.10)

as this yields

∀ v ∈ V :  A(v) = A( ∑_{b∈Bv} cv(b) b ) = ∑_{b∈Bv} cv(b) wb = ∑_{b∈Bv} cv(b) δbc = cv(c) for c ∈ Bv,  0 for c ∉ Bv.

In particular, πc is linear. Moreover, for each c ∈ B, we have

Im πc = F,  ker πc = ⟨B \ {c}⟩,  V = ⟨c⟩ ⊕ ker πc.  (6.11)

(c) Let I be a nonempty set and let Y be a vector space over the field F . We then
know from Ex. 5.2(c) that V := F(I, Y ) = Y I is a vector space over F . For each
i ∈ I, define
πi : V −→ Y, πi (f ) := f (i),
calling πi the projection onto the ith coordinate (note that, for I = {1, . . . , n}, n ∈ N, Y = F, one has V = F^n, πi(v1, . . . , vn) = vi). We verify πi to be linear: Let
λ ∈ F and f, g ∈ V . Then
πi (λf ) = (λf )(i) = λ(f (i)) = λπi (f ),
πi (f + g) = (f + g)(i) = f (i) + g(i) = πi (f ) + πi (g),
proving πi to be linear. Moreover, for each i ∈ I, we have
Im πi = Y,  ker πi = {(f : I −→ Y) : f(i) = 0} ≅ Y^{I\{i}},  V = Y ⊕ ker πi,  (6.12)

where, for the last equality, we identified

Y ≅ {(f : I −→ Y) : f(j) = 0 for each j ≠ i}.
A generalization of the present example to general Cartesian products of vector
spaces can be found in Ex. E.4 of the Appendix. To investigate the relation between
the present projections and the projections of (b), let Y = F . We know from Ex.
5.16(c) that B = {ei : i ∈ I} with

∀ i ∈ I :  ei : I −→ F,  ei(j) = δij := 1 if i = j,  0 if i ≠ j,
is a basis of F^I_fin, which is a subspace of F^I, such that the πi are defined on F^I_fin. Now, if f = ∑_{j=1}^{n} λ_{i_j} e_{i_j} with λ_{i_1}, . . . , λ_{i_n} ∈ F; i1, . . . , in ∈ I; n ∈ N, and π_{e_{i_k}} is defined as in (b) (k ∈ {1, . . . , n}), then

π_{i_k}(f) = f(i_k) = ∑_{j=1}^{n} λ_{i_j} e_{i_j}(i_k) = ∑_{j=1}^{n} λ_{i_j} δ_{jk} = λ_{i_k} = π_{e_{i_k}}(f),  (6.13)

showing π_{i_k}↾_{F^I_fin} = π_{e_{i_k}}.
(d) We consider the set of complex numbers C as vector space over R with basis {1, i}.
Then the map of complex conjugation
A : C −→ C, A(z) = A(x + iy) = z̄ = x − iy,
is R-linear, since, for each z = x + iy, w = u + iv ∈ C,

\overline{z + w} = x + u − iy − iv = z̄ + w̄  ∧  \overline{zw} = xu − yv − (xv + yu)i = (x − iy)(u − iv) = z̄ w̄

and z̄ = z for z ∈ R. As A is bijective with A = A^{−1}, A is even an automorphism. However, complex conjugation is not C-linear (now considering C as a vector space over itself), since, e.g., \overline{i · 1} = −i ≠ i · 1̄ = i.
(e) By using results from Calculus, we obtain the following examples:
(i) Let V be the set of convergent sequences in K. According to [Phi16, Th.
7.13(a)], scalar multiples of convergent sequences are convergent and sums of
convergent sequences are convergent, showing V to be a subspace of KN , the
vector space over K of all sequences in K. If (zn )n∈N and (wn )n∈N are elements
of V and λ ∈ K, then, using [Phi16, Th. 7.13(a)] once again, we know
lim_{n→∞}(λzn) = λ lim_{n→∞} zn,  lim_{n→∞}(zn + wn) = lim_{n→∞} zn + lim_{n→∞} wn,

showing the map


A : V −→ K,  A((zn)_{n∈N}) := lim_{n→∞} zn,  (6.14)

to be linear. Moreover, we have


Im A = K,  ker A = {(zn)_{n∈N} ∈ V : lim_{n→∞} zn = 0},  V = ⟨(1)_{n∈N}⟩ ⊕ ker A,  (6.15)
where the last equality is due to the fact that, if (zn)_{n∈N} ∈ V with lim_{n→∞} zn = λ, then lim_{n→∞}(zn − λ · 1) = 0, i.e. (zn)_{n∈N} − λ(1)_{n∈N} ∈ ker A.
(ii) Let a, b ∈ R, a < b, I := ]a, b[, and let V := {(f : I −→ K) : f is differentiable}.
According to [Phi16, Th. 9.7], scalar multiples of differentiable functions are
differentiable and sums of differentiable functions are differentiable, showing
V to be a subspace of KI , the vector space over K of all functions from I into
K. Using [Phi16, Th. 9.7] again, we know the map
A : V −→ K^I,  A(f) := f′,  (6.16)
to be linear. While ker A = {f ∈ V : f constant} (e.g. due to the fundamental
theorem of calculus in the form [Phi16, Th. 10.20(b)]), Im A is not so simple
to characterize (cf. [Phi17, Th. 2.11]).
(iii) Let a, b ∈ R, a ≤ b, I := [a, b], and let W := R(I, K) be the set of all K-valued
Riemann integrable functions on I. According to [Phi16, Th. 10.11(a)], scalar
multiples of Riemann integrable functions are Riemann integrable and sums
of Riemann integrable functions are Riemann integrable, showing W to be
another subspace of K^I. Using [Phi16, Th. 10.11(a)] again, we know the map

B : W −→ K,  B(f) := ∫_I f,  (6.17)

to be linear. Moreover, for a < b, we have

Im B = K,  ker B = { f ∈ W : ∫_I f = 0 },  W = ⟨1⟩ ⊕ ker B,  (6.18)
where the last equality is due to the fact that, if f ∈ W with ∫_I f = λ, then ∫_I (f − λ/(b−a) · 1) = 0, i.e. f − λ/(b−a) · 1 ∈ ker B.

For some of the linear maps investigated in Ex. 6.7 above, we have provided the re-
spective kernels, images, and direct complements of the kernels. The examples suggest
relations between the dimensions of domain, image, and kernel of a linear map, which
are stated and proved in the following theorem:
Theorem 6.8 (Dimension Formulas). Let V and W be vector spaces over the field F ,
and let A : V −→ W be linear. Moreover, let BV be a basis of V , let Bker be a basis of
ker A, let BIm be a basis of Im A, and let BW be a basis of W .

(a) If U is a direct complement of ker A (i.e. if V = U ⊕ ker A), then A↾U : U −→ Im A


is bijective. In consequence, dim V = dim ker A + dim Im A, i.e. there exists a
bijective map φ : BV −→ (Bker × {0}) ∪˙ (BIm × {1}).

(b) dim Im A ≤ dim V , i.e. there exists an injective map from BIm into BV .
(c) dim Im A ≤ dim W , i.e. there exists an injective map from BIm into BW .

Proof. (a): If u ∈ ker A ↾U , then u ∈ U ∩ ker A, i.e. u = 0, i.e. A ↾U is injective.


If w ∈ Im A and Av = w, then v = v0 + u with v0 ∈ ker A and u ∈ U . Then
Au = A(v) − A(v0 ) = w − 0 = w, showing A↾U is surjective. According to Th. 5.30(b),
if BU is a basis of U , then B := Bker ∪˙ BU is a basis of V . Moreover, by Prop. 6.5(c),
A(BU ) is a basis of Im A. Thus, by Th. 5.23(c), it suffices to show there exists a bijective
map ψ : B −→ (Bker × {0}) ∪˙ (A(BU) × {1}). If we define

ψ : B −→ (Bker × {0}) ∪˙ (A(BU) × {1}),  ψ(b) := (b, 0) for b ∈ Bker,  (A(b), 1) for b ∈ BU,
then ψ is bijective due to the bijectivity of A↾U .


(b) is immediate from (a).
(c) is due to Th. 5.27(b), since Im A is a subspace of W . 
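For a linear map R^n −→ R^m given by a matrix, Th. 6.8(a) is the familiar rank–nullity formula: dim Im A is the rank of the matrix and dim ker A = n − rank (cf. Sec. 7.3). A quick numerical check on an arbitrary example (Python, numpy assumed):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])    # maps R^3 into R^2; second row = 2 * first
    n = A.shape[1]
    dim_im = np.linalg.matrix_rank(A)  # dim Im A = 1
    dim_ker = n - dim_im               # dim ker A = 2
    print(dim_im + dim_ker == n)       # True, as asserted by Th. 6.8(a)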

The following theorem is one of the central results of Linear Algebra: It says that vector
spaces are essentially determined by the size of their bases:
Theorem 6.9. Let V and W be vector spaces over the field F. Then V ≅ W (i.e. V and W are isomorphic) if, and only if, dim V = dim W.

Proof. Suppose dim V = dim W . If BV is a basis of V and BW is a basis of W , then


there exists a bijective map φ : BV −→ BW . According to Th. 6.6, φ defines a unique
linear map A : V −→ W with A(b) = φ(b) for each b ∈ BV . More precisely, letting,
once again, for each v ∈ V , Bv and cv : Bv −→ F \ {0} be as in Th. 5.19 (writing cv
instead of c to underline the dependence of c on v),
∀ v ∈ V :  A(v) = A( ∑_{b∈Bv} cv(b) b ) = ∑_{b∈Bv} cv(b) φ(b).  (6.19)

It remains to show A is bijective. If v ≠ 0, then Bv ≠ ∅ and A(v) = ∑_{b∈Bv} cv(b) φ(b) ≠ 0, since cv(b) ≠ 0 and {φ(b) : b ∈ Bv} ⊆ BW is linearly independent, showing A is injective by Prop. 6.3(d). If w ∈ W, then there exists a finite set B̃w ⊆ BW and c̃w : B̃w −→ F \ {0} such that w = ∑_{b̃∈B̃w} c̃w(b̃) b̃. Then
 
A( ∑_{b̃∈B̃w} c̃w(b̃) φ^{−1}(b̃) ) = ∑_{b̃∈B̃w} c̃w(b̃) A(φ^{−1}(b̃)) = ∑_{b̃∈B̃w} c̃w(b̃) φ(φ^{−1}(b̃)) = ∑_{b̃∈B̃w} c̃w(b̃) b̃ = w,  (6.20)

where A ∈ L(V, W) was used at the first equality and φ^{−1}(b̃) ∈ BV at the second,

showing Im A = W , completing the proof that A is bijective.


If A : V −→ W is a linear isomorphism and B is a basis for V , then, by Prop. 6.5(c),
A(B) is a basis for W . As A is bijective, so is A↾B , showing dim V = #B = #A(B) =
dim W as claimed. 
Theorem 6.10. Let V and W be finite-dimensional vector spaces over the field F ,
dim V = dim W = n ∈ N0 . Then, given A ∈ L(V, W ), the following three statements
are equivalent:

(i) A is an isomorphism.
(ii) A is an epimorphism.

(iii) A is a monomorphism.

Proof. It suffices to prove the equivalence between (ii) and (iii).


“(ii)⇒(iii)”: If A is an epimorphism, then W = Im A, implying
dim V < ∞, Th. 6.8(a)
dim ker A = dim V − dim Im A = dim V − dim W = n − n = 0,

showing ker A = {0}, i.e. A is a monomorphism.


“(iii)⇒(ii)”: If A is a monomorphism, then ker A = {0}. Thus, by Th. 6.8(a), n =
dim W = dim V = dim ker A + dim Im A = 0 + dim Im A = dim Im A. From Th. 5.27(d),
we then obtain W = Im A, i.e. A is an epimorphism. 

Example 6.11. The present example shows that the analogue of Th. 6.10 does not hold
for infinite-dimensional vector spaces: Let F be a field and V := F^N (the vector space of sequences in F – the example actually still works in precisely the same way over the simpler space F^N_fin). Define the maps

R : V −→ V,  R(λ1, λ2, . . . ) := (0, λ1, λ2, . . . ),
L : V −→ V,  L(λ1, λ2, . . . ) := (λ2, λ3, . . . )

(R is called the right shift operator, L is called the left shift operator). Clearly, R and L
are linear and
L ◦ R = Id,
i.e. L is a left inverse for R, R is a right inverse for L. Thus, according to Th. 2.13,
R is injective and L is surjective (which is also easily seen directly from the respective
definitions of R and L). However,

Im R = {(f : N −→ F) : f(1) = 0} ≠ V  (e.g. (1, 0, 0, . . . ) ∉ Im R),
ker L = {(f : N −→ F) : f(k) = 0 for k ≠ 1} ≠ {0}  (e.g. 0 ≠ (1, 0, 0, . . . ) ∈ ker L),

showing R is not surjective and L is not injective.
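The shift operators can be experimented with on truncated sequences, a finite list standing in for an element of F^N (an illustration only):

    def R(seq):
        # right shift: (l1, l2, ...) -> (0, l1, l2, ...)
        return [0] + seq

    def L(seq):
        # left shift: (l1, l2, ...) -> (l2, l3, ...)
        return seq[1:]

    s = [1, 2, 3, 4]
    print(L(R(s)) == s)  # True: L o R = Id
    print(R(L(s)) == s)  # False: R(L(s)) = [0, 2, 3, 4], the first entry is lost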

6.2 Quotient Spaces


In Def. 4.26, we defined the quotient group G/N of a group G with respect to a normal
subgroup N . Now, if V is a vector space over a field F , then, in particular, (V, +) is a
commutative group. Thus, every subgroup of V is normal and, if U is a subspace of V, then we can always form the quotient group V/U. We will see below that we can
even make V /U into a vector space over F , called the quotient space or factor space of
V with respect to U . As we write (V, +) as an additive group, we write the respective
cosets (i.e. the elements of V /U ) as v + U , v ∈ V .
Theorem 6.12. Let V be a vector space over the field F and let U be a subspace of V .

(a) The compositions

+ : V /U × V /U −→ V /U, (v + U ) + (w + U ) := v + w + U, (6.21a)
· : F × V /U −→ V /U, λ · (v + U ) := λv + U, (6.21b)

are well-defined, i.e. the results do not depend on the chosen representatives of the
respective cosets.
(b) The natural (group) epimorphism of Th. 4.27(a),

φU : V −→ V /U, φU (v) := v + U, (6.22)

satisfies

∀ v, w ∈ V :  φU(v + w) = φU(v) + φU(w),  (6.23a)

∀ λ ∈ F ∀ v ∈ V :  φU(λv) = λ φU(v).  (6.23b)

(c) V /U with the compositions of (a) forms a vector space over F and φU of (b)
constitutes a linear epimorphism.

Proof. (a): The composition + is well-defined by Th. 4.27(a). To verify that · is well-
defined as well, let v, w ∈ V , λ ∈ F , and assume v + U = w + U . We need to show
λv + U = λw + U . If x ∈ λv + U , then there exists u1 ∈ U such that x = λv + u1 . Since
v + U = w + U , there then exists u2 ∈ U such that v + u1 = w + u2 and v = w + u2 − u1 .
Thus, x = λv+u1 = λw+λ(u2 −u1 )+u1 . Since U is a subspace of V , λ(u2 −u1 )+u1 ∈ U ,
showing x ∈ λw + U and λv + U ⊆ λw + U . As we can switch the roles of v and w in
the above argument, we also have λw + U ⊆ λv + U and λv + U = λw + U , as desired.
(b): (6.23a) holds, as φU is a homomorphism with respect to + by Th. 4.27(a). To
verify (6.23b), let λ ∈ F , v ∈ V . Then
φU(λv) = λv + U =(6.21b) λ(v + U) = λ φU(v),

establishing the case.


(c) is now immediate from (b) and Prop. 6.4. 
One can sometimes obtain information about a vector space V by studying a subspace
U and the corresponding quotient space V /U , both of which are “smaller” spaces in the
sense of the following Cor. 6.13:
Corollary 6.13. Let V be a vector space over the field F and let U be a subspace of V .
Let φ : V −→ V /U , φ(v) := v + U , denote the natural epimorphism.

(a) If B is a basis of V and BU ⊆ B is a basis of U , then B/ := {b + U : b ∈ B \ BU }


is a basis of V /U .

(b) Let BU be a basis of U and let W be another subspace of V with basis BW . If


φ↾W : W −→ V /U is bijective, then BU ∪˙ BW forms a basis of V .

(c) dim V = dim U + dim V /U (cf. Th. 6.8(a)).

Proof. (a): Letting W := ⟨B \ BU⟩, we have V = U ⊕ W by Th. 5.30(d). Since U = ker φ,


we have V = ker φ ⊕ W and φ ↾W : W −→ V /U is bijective by Th. 6.8(a) and, thus,
B/ = φ↾W (B \ BU ) is a basis of V /U .
(b): Let v ∈ V . Then there exists w ∈ W with v + U = φ(v) = φ(w) = w + U , implying
v − w ∈ U . Thus, v = w + v − w ∈ W + U , showing V = W + U . On the other hand,
if v ∈ U ∩ W , then φ(v) = v + U = U = φ(0) and, thus, v = 0, showing V = W ⊕ U .
Then BU ∪˙ BW forms a basis of V by Th. 5.30(b).
(c): Since U = ker φ and V /U = Im φ, (c) is immediate from Th. 6.8(a). 

From Def. 4.26 and Th. 4.27(a), we already know that the elements of V /U are precisely
the equivalence classes of the equivalence relation defined by
 
∀ v, w ∈ V :  v ∼ w  :⇔  v + U = w + U  ( ⇔ v − w ∈ U ).  (6.24)

Here, in the context of vector spaces, it turns out that the equivalence relations on V that define quotient spaces are precisely the so-called linear equivalence relations:
Definition 6.14. Let V be a vector space over the field F and let R ⊆ V × V be a
relation on V . Then R is called linear if, and only if, it has the following two properties:

(i) If v, w, x, y ∈ V with vRw and xRy, then (v + x)R(w + y).

(ii) If v, w ∈ V and λ ∈ F with vRw, then (λv)R(λw).


Proposition 6.15. Let V be a vector space over the field F .

(a) If U is a subspace of V , then the equivalence relation defined by (6.24) is linear.


(b) Let ∼ be a linear equivalence relation on V . Then U := {v ∈ V : v ∼ 0} is a


subspace of V and ∼ satisfies (6.24).

Proof. (a): The linearity of ∼ defined by (6.24) is precisely the assertion that the
compositions + and · of Th. 6.12(a) are well-defined.
(b): We have 0 ∈ U , as ∼ is reflexive. If v, w ∈ U and λ ∈ F , then the linearity of ∼
implies v + w ∼ 0 + 0 = 0 and λv ∼ λ0 = 0, i.e. v + w ∈ U and λv ∈ U , showing U to be
a subspace. Now suppose v, w ∈ V with v ∼ w. Then the linearity of ∼ yields v −w ∼ 0,
i.e. v − w ∈ U . If x = v + u with u ∈ U , then x = w + (x − w) = w + (v + u − w) ∈ w + U ,
showing v + U ⊆ w + U . As ∼ is symmetric, we then have w + U ⊆ v + U as well.
Conversely, if v, w ∈ V with v + U = w + U , then v − w ∈ U , i.e. v − w ∼ 0 and the
linearity of ∼ implies v ∼ w, proving ∼ satisfies (6.24). 

In Th. 4.27(b), we proved the isomorphism theorem for groups: If G and H are groups
and φ : G −→ H is a homomorphism, then G/ ker φ ≅ Im φ. Since vector spaces V and
W are, in particular, groups, if A : V −→ W is linear, then V / ker A ≅ Im A as groups,
and it is natural to ask whether V / ker A and Im A are isomorphic as vector spaces. In
Th. 6.16(a) below we see this, indeed, to be the case.
Theorem 6.16 (Isomorphism Theorem). Let V and X be vector spaces over the field
F , let U, W be subspaces of V .

(a) If A : V −→ X is linear, then

        V / ker A ≅ Im A.                                                  (6.25)

More precisely, the map

f : V / ker A −→ Im A, f (v + ker A) := A(v), (6.26)

    is well-defined and constitutes a linear isomorphism. If fe : V −→ V / ker A
    denotes the natural epimorphism and ι : Im A −→ X, ι(v) := v, denotes the
    embedding, then fm : V / ker A −→ X, fm := ι ◦ f , is a linear monomorphism such
    that
        A = fm ◦ fe .                                                      (6.27)

(b) One has

        (U + W )/W ≅ U/(U ∩ W ).                                           (6.28)

(c) If U is a subspace of W , then

        (V /U )/(W/U ) ≅ V /W.                                             (6.29)

Proof. (a): All assertions, except the linearity assertions, were already proved in Th.
4.27(b). Moreover, fe is linear by Th. 6.12(b). Thus, it merely remains to show

        f (λ(v + ker A)) = λ f (v + ker A)

for each λ ∈ F and each v ∈ V (then fm is linear, as both f and ι are linear). Indeed,
if λ ∈ F and v ∈ V , then

        f (λ(v + ker A)) = f (λv + ker A) = A(λv) = λ A(v) = λ f (v + ker A),
as desired.
(b): According to (a), it suffices to show that

        A : U + W −→ U/(U ∩ W ),   A(u + w) := u + (U ∩ W ) for u ∈ U , w ∈ W ,        (6.30)

well-defines a linear map with ker A = W . To verify that A is well-defined, let u, u1 ∈ U
and w, w1 ∈ W with u + w = u1 + w1 . Then u − u1 = w1 − w ∈ U ∩ W and, thus,

        A(u + w) = u + (U ∩ W ) = u1 + w1 − w + (U ∩ W ) = u1 + (U ∩ W ) = A(u1 + w1 ),

proving A to be well-defined by (6.30). To prove linearity, we no longer assume u + w =
u1 + w1 and calculate

        A((u + w) + (u1 + w1 )) = A((u + u1 ) + (w + w1 )) = u + u1 + (U ∩ W )
                                = (u + (U ∩ W )) + (u1 + (U ∩ W ))
                                = A(u + w) + A(u1 + w1 ),

as well as, for each λ ∈ F ,

        A(λ(u + w)) = A(λu + λw) = λu + (U ∩ W ) = λ(u + (U ∩ W )) = λ A(u + w),
as needed. If w ∈ W , then A(w) = A(0+w) = U ∩W , showing w ∈ ker A and W ⊆ ker A.
Conversely, let u + w ∈ ker A, u ∈ U , w ∈ W . Then A(u + w) = u + (U ∩ W ) = U ∩ W ,
i.e. u ∈ U ∩ W and u + w ∈ W , showing ker A ⊆ W .
(c): Exercise. 

6.3 Vector Spaces of Linear Maps


Definition 6.17. Let V and W be vector spaces over the field F . We define an addition
and a scalar multiplication on L(V, W ) by
(A + B) : V −→ W, (A + B)(x) := A(x) + B(x), (6.31a)
(λ · A) : V −→ W, (λ · A)(x) := λ · A(x) for each λ ∈ F . (6.31b)

Theorem 6.18. Let V and W be vector spaces over the field F . The addition and
scalar multiplication on L(V, W ) given by (6.31) are well-defined in the sense that, if
A, B ∈ L(V, W ) and λ ∈ F , then A + B ∈ L(V, W ) and λA ∈ L(V, W ). Moreover, with
the operations defined in (6.31), L(V, W ) forms a vector space over F .

Proof. Exercise. 
Theorem 6.19. Let V and W be vector spaces over the field F , let BV and BW be bases
of V and W , respectively. Using Th. 6.6, define maps Avw ∈ L(V, W ) by letting
        ∀ (w, v, ṽ) ∈ BW × BV × BV :   Avw (ṽ) := { w for v = ṽ;  0 for v ≠ ṽ }.        (6.32)

Define
        B := { Avw : (v, w) ∈ BV × BW }.                                   (6.33)

(a) B is linearly independent.

(b) If V is finite-dimensional, dim V = n ∈ N, BV = {v1 , . . . , vn }, then B constitutes a
    basis for L(V, W ). If, in addition, dim W = m ∈ N, BW = {w1 , . . . , wm }, then we
    can write
        dim L(V, W ) = dim V · dim W = n · m.                              (6.34)

(c) If dim V = ∞ and dim W ≥ 1, then ⟨B⟩ ⊊ L(V, W ) and, in particular, B is not a
    basis of L(V, W ).

Proof. (a): Let N, M ∈ N. Let v1 , . . . , vN ∈ BV be distinct and let w1 , . . . , wM ∈ BW
be distinct as well. If λji ∈ F , (j, i) ∈ {1, . . . , M } × {1, . . . , N }, are such that

        A := ∑_{j=1}^{M} ∑_{i=1}^{N} λji Avi wj = 0,

then, for each k ∈ {1, . . . , N },

        0 = A(vk ) = ∑_{j=1}^{M} ∑_{i=1}^{N} λji Avi wj (vk ) = ∑_{j=1}^{M} ∑_{i=1}^{N} λji δik wj = ∑_{j=1}^{M} λjk wj

implies λ1k = · · · = λM k = 0 due to the linear independence of the wj . As this holds for
each k ∈ {1, . . . , N }, we have established the linear independence of B.
(b): According to (a), it remains to show ⟨B⟩ = L(V, W ). Let A ∈ L(V, W ) and
i ∈ {1, . . . , n}. Then there exists a finite set Bi ⊆ BW such that Avi = ∑_{w∈Bi} λw w with
λw ∈ F . Now let w1 , . . . , wM , M ∈ N, be an enumeration of the finite set B1 ∪ · · · ∪ Bn .
Then there exist aji ∈ F , (j, i) ∈ {1, . . . , M } × {1, . . . , n}, such that

        ∀ i ∈ {1, . . . , n} :   A(vi ) = ∑_{j=1}^{M} aji wj .

Letting L := ∑_{j=1}^{M} ∑_{i=1}^{n} aji Avi wj , we claim A = L. Indeed,

        ∀ k ∈ {1, . . . , n} :   L(vk ) = ∑_{j=1}^{M} ∑_{i=1}^{n} aji Avi wj (vk ) = ∑_{j=1}^{M} ∑_{i=1}^{n} aji δik wj = ∑_{j=1}^{M} ajk wj = A(vk ),

proving L = A by Th. 6.6. Since L ∈ ⟨B⟩, the proof of (b) is complete.

(c): As dim W ≥ 1, there exists w ∈ BW . If A ∈ ⟨B⟩, then {v ∈ BV : Av ≠ 0} is finite.
Thus, if BV is infinite, then the map A ∈ L(V, W ) with A(v) := w for each v ∈ BV is
not in ⟨B⟩, proving (c).  □
Lemma 6.20. Let V, W, X be vector spaces over the field F . Recalling Not. 6.2, we
have:

(a) If A ∈ L(W, X) and B, C ∈ L(V, W ), then

A(B + C) = AB + AC. (6.35)

(b) If A, B ∈ L(W, X) and C ∈ L(V, W ), then

(A + B)C = AC + BC. (6.36)

Proof. (a): For each v ∈ V , we calculate

        (A(B + C))v = A(Bv + Cv) = A(Bv) + A(Cv) = (AB)v + (AC)v = (AB + AC)v,

proving (6.35).

(b): For each v ∈ V , we calculate

        ((A + B)C)v = (A + B)(Cv) = A(Cv) + B(Cv) = (AC)v + (BC)v = (AC + BC)v,

proving (6.36). 
Theorem 6.21. Let V be a vector space over the field F . Then R := L(V, V ) con-
stitutes a ring with unity with respect to pointwise addition and map composition as
multiplication. If dim V > 1, then R is not commutative and it has nontrivial divisors
of 0.

Proof. Since (V, +) is a commutative group, R is a subring (with unity Id ∈ R) of the


ring with unity of group endomorphisms End(V ) of Ex. 4.42(b) (that R constitutes a
ring can, alternatively, also be obtained from Lem. 6.20 with V = W = X). Now let
v1 , v2 ∈ V be linearly independent and B a basis of V with v1 , v2 ∈ B. Define A1 , A2 ∈ R
by letting, for v ∈ B,

        A1 v := { v2 for v = v1 ;  0 otherwise },        A2 v := { v1 for v = v2 ;  0 otherwise }.        (6.37)

Then A1 A2 v1 = 0, but A2 A1 v1 = A2 v2 = v1 , showing A1 A2 ≠ A2 A1 . Moreover,
A1² v1 = A1 v2 = 0 and A2² v2 = A2 v1 = 0, showing A1² ≡ 0 and A2² ≡ 0. In particular,
both A1 and A2 are nontrivial divisors of 0.  □

Definition 6.22. Let V be a vector space over the field F .

(a) A linear endomorphism A ∈ L(V, V ) is called regular if, and only if, it is an au-
tomorphism (i.e. if, and only if, it is bijective/invertible). Otherwise, A is called
singular.

(b) The set GL(V ) := {A ∈ L(V, V ) : A is regular} = L(V, V )∗ (cf. Def. and Rem.
4.41) is called the general linear group of V (cf. Cor. 6.23 below).

Corollary 6.23. Let V be a vector space over the field F . Then the general linear
group GL(V ) is, indeed, a group with respect to map composition. For V ≠ {0}, it is
not a group with respect to addition (note 0 ∉ GL(V ) in this case). If dim V ≥ 3, or if
F ≠ {0, 1} and dim V ≥ 2, then the group GL(V ) is not commutative.

Proof. Since L(V, V ) is a ring with unity, we know GL(V ) = L(V, V )∗ to be a group
by Def. and Rem. 4.41 – since the composition of linear maps is linear, GL(V ) is also a
subgroup of the symmetric group SV (cf. Ex. 4.9(b)). If dim V ≥ 3, then the noncom-
mutativity of GL(V ) follows as in Ex. 4.9(b), where the elements a, b, c of (4.7) are now
taken as distinct elements of a basis of V . Now let v1 , v2 ∈ V be linearly independent,
let B be a basis of V with v1 , v2 ∈ B, and let λ ∈ F \ {0, 1} (such λ exists, as F ≠ {0, 1};
choosing a general λ instead of −1 also covers fields with char F = 2). Define
A1 , A2 ∈ L(V, V ) by letting, for v ∈ B,

        A1 v := { v2 for v = v1 ;  v1 for v = v2 ;  v otherwise },
        A2 v := { λv1 for v = v1 ;  v2 for v = v2 ;  v otherwise }.        (6.38)

Clearly, A1 , A2 ∈ GL(V ) with A1⁻¹ = A1 and A2⁻¹ given by v1 ↦ λ⁻¹ v1 and v ↦ v for
v ∈ B \ {v1 }. Moreover, A1 A2 v1 = λ A1 v1 = λv2 , while A2 A1 v1 = A2 v2 = v2 . Since
λ ≠ 1 and v2 ≠ 0, we have λv2 ≠ v2 , showing A1 A2 ≠ A2 A1 .  □

7 Matrices

7.1 Definition and Arithmetic


Matrices provide a convenient representation for linear maps A between finite-dimen-
sional vector spaces V and W . Recall the basis {Avi wj : (j, i) ∈ {1, . . . , m} × {1, . . . , n}}
of L(V, W ) that, in Th. 6.19, was shown to arise from bases {v1 , . . . , vn } and {w1 , . . . , wm }
of V and W , respectively; m, n ∈ N. Thus, each A ∈ L(V, W ) can be written in the
form

        A = ∑_{j=1}^{m} ∑_{i=1}^{n} aji Avi wj ,                           (7.1)

with coordinates (aji )(j,i)∈{1,...,m}×{1,...,n} in F (also cf. Ex. 6.7(a)). This motivates the
following definition of matrices, where, however, instead of F , we allow an arbitrary
nonempty set S, as it sometimes turns out to be useful to have matrices with entries
that are not necessarily elements of a field.

Definition 7.1. Let S be a nonempty set and m, n ∈ N. A family (aji )(j,i)∈{1,...,m}×{1,...,n}


in S is called an m-by-n or an m × n matrix over S, where m × n is called the size,
dimension or type of the matrix. The aji are called the entries or elements of the matrix.
One also writes just (aji ) instead of (aji )(j,i)∈{1,...,m}×{1,...,n} if the size of the matrix is
understood. One usually thinks of the m × n matrix (aji ) as the rectangular array

                 ⎛ a11 · · · a1n ⎞
        (aji ) = ⎜  ⋮    ⋱    ⋮  ⎟                                         (7.2)
                 ⎝ am1 · · · amn ⎠

with m rows and n columns. One therefore also calls 1 × n matrices row vectors and
m × 1 matrices column vectors, and one calls n × n matrices quadratic. One calls the
elements akk the (main) diagonal elements of (aji ) and one also says that these elements
lie on the (main) diagonal of the matrix and that they form the (main) diagonal of the
matrix. The set of all m × n matrices over S is denoted by M(m, n, S), and for the set
of all quadratic n × n matrices, one uses the abbreviation M(n, S) := M(n, n, S).

Definition 7.2 (Matrix Arithmetic). Let F be a field and m, n, l ∈ N. Let S be a ring


or a vector space over F (e.g. S = F ).

(a) Matrix Addition: For m × n matrices (aji ) and (bji ) over S, define the sum

(aji ) + (bji ) := (aji + bji ) ∈ M(m, n, S). (7.3)



(b) Scalar Multiplication: Let (aji ) be an m × n matrix over S. If S is a ring, then let
λ ∈ S; if S is a vector space over F , then let λ ∈ F . Define

λ (aji ) := (λ aji ) ∈ M(m, n, S). (7.4)

(c) Matrix Multiplication: Let S be a ring. For each m × n matrix (aji ) and each n × l
    matrix (bji ) over S, define the product

        (aji )(bji ) := ( ∑_{k=1}^{n} ajk bki )_{(j,i)∈{1,...,m}×{1,...,l}} ∈ M(m, l, S),        (7.5)

i.e. the product of an m × n matrix and an n × l matrix is an m × l matrix (cf. Th.


7.13 below).

Example 7.3. As an example of matrix multiplication, we compute

                        ⎛ −3  0 ⎞
        ⎛ 1 −1  0 ⎞     ⎜  1  1 ⎟   ⎛ 1·(−3) + (−1)·1 + 0·0    1·0 + (−1)·1 + 0·(−1) ⎞
        ⎝ 2  0 −2 ⎠  ·  ⎝  0 −1 ⎠ = ⎝ 2·(−3) + 0·1 + (−2)·0    2·0 + 0·1 + (−2)·(−1) ⎠

                                  = ⎛ −4 −1 ⎞
                                    ⎝ −6  2 ⎠ .
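In code, (7.5) amounts to nested sums. The following Python sketch (not part of the
original text; the function names are merely illustrative) implements the operations of
Def. 7.2 for matrices stored as lists of rows and reproduces the computation above:

    # Minimal sketch of Def. 7.2 for matrices represented as lists of rows.
    # Works over any "ring" whose elements support + and * (here: Python ints).

    def mat_add(A, B):
        # (aji) + (bji) := (aji + bji), both m x n
        return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    def scalar_mul(lam, A):
        # lam * (aji) := (lam * aji)
        return [[lam * a for a in row] for row in A]

    def mat_mul(A, B):
        # product of an m x n and an n x l matrix per (7.5):
        # entry (j, i) is sum_{k=1}^{n} a_{jk} b_{ki}
        m, n, l = len(A), len(B), len(B[0])
        return [[sum(A[j][k] * B[k][i] for k in range(n)) for i in range(l)]
                for j in range(m)]

    # The computation of Ex. 7.3:
    A = [[1, -1, 0], [2, 0, -2]]
    B = [[-3, 0], [1, 1], [0, -1]]
    print(mat_mul(A, B))  # [[-4, -1], [-6, 2]]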

Remark 7.4. Let S be a nonempty set and m, n ∈ N.

(a) An m × n matrix A := (aji )(j,i)∈{1,...,m}×{1,...,n} is defined as a family in S, i.e.,


    recalling Def. 2.15(a), A is defined as the function A : {1, . . . , m} × {1, . . . , n} −→ S,
    A(j, i) = aji ; and M(m, n, S) = F({1, . . . , m} × {1, . . . , n}, S) (so we notice that
    matrices are nothing new in terms of objects, but just a new way of thinking about
matrices are nothing new in terms of objects, but just a new way of thinking about
functions from {1, . . . , m} × {1, . . . , n} into S, that turns out to be convenient in
certain contexts). Thus, if F is a field and S = Y is a vector space over F , then the
operations defined in Def. 7.2(a),(b) are precisely the same operations that were
defined in (5.5) and M(m, n, Y ) is a vector space according to Ex. 5.2(c). Clearly,
the map
        I : M(m, n, Y ) −→ Y m·n ,   (aji ) ↦ (y1 , . . . , ym·n ),
        where yk = aji if, and only if, k = (j − 1) · n + i,                (7.6)
constitutes a linear isomorphism. For Y = F , other important linear isomorphisms
between M(m, n, F ) and vector spaces of linear maps will be provided in Th. 7.10(a)
below.

(b) Let A := (aji )(j,i)∈{1,...,m}×{1,...,n} be an m × n matrix over S. Then, for each i ∈


{1, . . . , n}, the ith column of A,

                        ⎛ a1i ⎞
        ci := c^A_i :=  ⎜  ⋮  ⎟ ,
                        ⎝ ami ⎠
can be considered both as an m × 1 matrix and as an element of S m . In particular,
if S = F is a field, then one calls ci the ith column vector of A. Analogously, for
each j ∈ {1, . . . , m}, the jth row of A,

        rj := r^A_j := ( aj1 · · · ajn ),
can be considered both as a 1 × n matrix and as an element of S n . In particular, if
S = F is a field, then one calls rj the jth row vector of A.

It can sometimes be useful to think of matrix multiplication in terms of the matrix


columns and rows of Rem. 7.4(b) in a number of different ways, as compiled in the
following lemma:
Lemma 7.5. Let S be a ring, let l, m, n ∈ N, let A := (aji ) be an m × n matrix over
S, and let B := (bji ) be an n × l matrix over S. Moreover, let c^A_1 , . . . , c^A_n and
r^A_1 , . . . , r^A_m denote the columns and rows of A, respectively, and let c^B_1 , . . . , c^B_l
and r^B_1 , . . . , r^B_n denote the columns and rows of B (cf. Rem. 7.4(b)), i.e.

                                              ⎛ r^A_1 ⎞                                      ⎛ r^B_1 ⎞
        A = (aji ) = ( c^A_1 · · · c^A_n ) =  ⎜   ⋮   ⎟ ,   B = (bji ) = ( c^B_1 · · · c^B_l ) = ⎜   ⋮   ⎟ .
                                              ⎝ r^A_m ⎠                                      ⎝ r^B_n ⎠

(a) Consider

        v := (v1 , . . . , vn )t ∈ M(n, 1, S) ≅ S n ,        w := ( w1 · · · wn ) ∈ M(1, n, S) ≅ S n .

    Then

        Av = ∑_{k=1}^{n} c^A_k vk ∈ M(m, 1, S),        wB = ∑_{k=1}^{n} wk r^B_k ∈ M(1, l, S),

    where we wrote c^A_k vk , as we do not assume S to be a commutative ring – if S is
    commutative (e.g. a field), then the more familiar form vk c^A_k is also admissible.

(b) Columnwise Matrix Multiplication: One has

        ∀ i ∈ {1, . . . , l} :   c^{AB}_i = A c^B_i .

(c) Rowwise Matrix Multiplication: One has

        ∀ j ∈ {1, . . . , m} :   r^{AB}_j = r^A_j B.

Proof. (a): Let x := Av, y := wB. We then have

        x = (x1 , . . . , xm )t ,   c^A_k = (a1k , . . . , amk )t ,   y = ( y1 · · · yl ),   r^B_k = ( bk1 · · · bkl ),

yielding

        ∀ j ∈ {1, . . . , m} :   xj = ∑_{k=1}^{n} ajk vk = ( ∑_{k=1}^{n} c^A_k vk )_j ,
        ∀ i ∈ {1, . . . , l} :   yi = ∑_{k=1}^{n} wk bki = ( ∑_{k=1}^{n} wk r^B_k )_i ,

thereby proving (a).

(b): For each i ∈ {1, . . . , l}, j ∈ {1, . . . , m}, one computes

        (c^{AB}_i )_j = (AB)ji = ∑_{k=1}^{n} ajk bki = (A c^B_i )_j ,

proving (b).

(c): For each j ∈ {1, . . . , m}, i ∈ {1, . . . , l}, one computes

        (r^{AB}_j )_i = (AB)ji = ∑_{k=1}^{n} ajk bki = (r^A_j B)_i ,

proving (c).  □
Example 7.6. Using the matrices from Ex. 7.3, we have

                                ⎛ −3 ⎞
        ⎛ −4 ⎞   ⎛ 1 −1  0 ⎞    ⎜  1 ⎟   ⎛ 1 ⎞          ⎛ −1 ⎞       ⎛  0 ⎞
        ⎝ −6 ⎠ = ⎝ 2  0 −2 ⎠ ·  ⎝  0 ⎠ = ⎝ 2 ⎠ · (−3) + ⎝  0 ⎠ · 1 + ⎝ −2 ⎠ · 0

and

                                 ⎛ −3  0 ⎞
        ( −4 −1 ) = ( 1 −1 0 ) · ⎜  1  1 ⎟ = 1 · ( −3 0 ) + (−1) · ( 1 1 ) + 0 · ( 0 −1 ).
                                 ⎝  0 −1 ⎠
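The columnwise and rowwise views of Lem. 7.5(b),(c) are easy to check numerically; a
small Python sketch (illustrative names; mat_mul as in the sketch after Ex. 7.3):

    # Columnwise and rowwise matrix multiplication (Lem. 7.5(b),(c)).
    def column(M, i):
        # i-th column of M as an m x 1 matrix
        return [[row[i]] for row in M]

    def row(M, j):
        # j-th row of M as a 1 x n matrix
        return [M[j][:]]

    A = [[1, -1, 0], [2, 0, -2]]
    B = [[-3, 0], [1, 1], [0, -1]]
    AB = mat_mul(A, B)

    # c^{AB}_i = A c^B_i for every column index i:
    assert all(column(AB, i) == mat_mul(A, column(B, i)) for i in range(2))
    # r^{AB}_j = r^A_j B for every row index j:
    assert all(row(AB, j) == mat_mul(row(A, j), B) for j in range(2))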
Theorem 7.7. Let S be a ring.

(a) Matrix multiplication of matrices over S is associative whenever all relevant multi-
plications are defined. More precisely, if A = (aji ) is an m × n matrix, B = (bji ) is
an n × l matrix, and C = (cji ) is an l × p matrix, then

(AB)C = A(BC). (7.7)

(b) Matrix multiplication of matrices over S is distributive whenever all relevant mul-
tiplications are defined. More precisely, if A = (aji ) is an m × n matrix, B = (bji )
and C = (cji ) are n × l matrices, and D = (dji ) is an l × p matrix, then

A(B + C) = AB + AC, (7.8a)


(B + C)D = BD + CD. (7.8b)

(c) For each n ∈ N, M(n, S) is a ring. If S is a ring with unity, then M(n, S)
is a ring with unity as well, where the additive neutral element is the zero matrix
0 ∈ M(n, S) (all entries are zero) and the multiplicative neutral element of M(n, S)
is the so-called identity matrix Idn := (δji ),

                 ⎧ 1 for i = j,               ⎛ 1       ⎞
        δji :=   ⎨                     Idn =  ⎜    ⋱    ⎟ ,                 (7.9)
                 ⎩ 0 for i ≠ j,               ⎝       1 ⎠
where the usual meaning of such notation is that all omitted entries are 0. We call
GLn (S) := M(n, S)∗ , i.e. the group of invertible matrices over S, the general linear
group (we will see in Th. 7.10(a) below that, if S = F is a field and V is a vector
space of dimension n over F , then GLn (F ) is isomorphic to the general linear group
GL(V ) of Def. 6.22(b)). Analogous to Def. 6.22(a), we call the matrices in GLn (S)
regular and the matrices in M(n, S) \ GLn (S) singular.

Proof. (a): One has m × p matrices (AB)C = (dji ) and A(BC) = (eji ), where, using
associativity and distributivity in S, one computes

        dji = ∑_{α=1}^{l} ( ∑_{k=1}^{n} ajk bkα ) cαi = ∑_{α=1}^{l} ∑_{k=1}^{n} ajk bkα cαi
            = ∑_{k=1}^{n} ajk ( ∑_{α=1}^{l} bkα cαi ) = eji ,

thereby proving (7.7).


(b): Exercise.
(c): While M(n, S) is a ring due to (a), (b), and Ex. 4.9(e) (which says that (M(n, S), +)
is a commutative group), we check the neutrality of Idn : For each A = (aji ) ∈ M(n, S),
we obtain

        Idn A = ( ∑_{k=1}^{n} δjk aki ) = (aji ) = A = ( ∑_{k=1}^{n} ajk δki ) = A Idn ,
thereby establishing the case. 
Caveat 7.8. Matrix multiplication of matrices over a ring S is, in general, not com-
mutative, even if S is commutative: If A is an m × n matrix and B is an n × l matrix
with m ≠ l, then BA is not even defined. If m = l, but m ≠ n, then AB has dimension
m × m, but BA has different dimension, namely n × n. And even if m = n = l > 1,
then commutativity is, in general, not true – for example, if S is a ring with unity and
0 ≠ 1 ∈ S, then

        ⎛ 1 1 · · · 1 ⎞ ⎛ 1 0 · · · 0 ⎞   ⎛ λ 0 · · · 0 ⎞
        ⎜ 0 0 · · · 0 ⎟ ⎜ 1 0 · · · 0 ⎟   ⎜ 0 0 · · · 0 ⎟
        ⎜ ⋮ ⋮   ⋱   ⋮ ⎟ ⎜ ⋮ ⋮   ⋱   ⋮ ⎟ = ⎜ ⋮ ⋮   ⋱   ⋮ ⎟ ,                 (7.10a)
        ⎝ 0 0 · · · 0 ⎠ ⎝ 1 0 · · · 0 ⎠   ⎝ 0 0 · · · 0 ⎠

        ⎛ 1 0 · · · 0 ⎞ ⎛ 1 1 · · · 1 ⎞   ⎛ 1 1 · · · 1 ⎞
        ⎜ 1 0 · · · 0 ⎟ ⎜ 0 0 · · · 0 ⎟   ⎜ 1 1 · · · 1 ⎟
        ⎜ ⋮ ⋮   ⋱   ⋮ ⎟ ⎜ ⋮ ⋮   ⋱   ⋮ ⎟ = ⎜ ⋮ ⋮   ⋱   ⋮ ⎟ .                 (7.10b)
        ⎝ 1 0 · · · 0 ⎠ ⎝ 0 0 · · · 0 ⎠   ⎝ 1 1 · · · 1 ⎠

Note that λ = m for S = R or S = Z, but, in general, λ will depend on S, e.g. for
S = {0, 1}, one obtains λ = m mod 2.
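A quick numerical illustration of (7.10) for m = n = l = 3 over S = Z (again only a
sketch, reusing mat_mul from the sketch after Ex. 7.3):

    # Noncommutativity, illustrating (7.10) for m = 3:
    E = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]   # first row all ones
    F = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]   # first column all ones
    print(mat_mul(E, F))  # [[3, 0, 0], [0, 0, 0], [0, 0, 0]] -> lambda = m = 3
    print(mat_mul(F, E))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]] -> all ones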

7.2 Matrices as Representations of Linear Maps


Remark 7.9. Coming back to the situation discussed at the beginning of the previous
section above, resulting in (7.1), let v = ∑_{i=1}^{n} λi vi ∈ V with λ1 , . . . , λn ∈ F . Then
(also cf. the calculation in (6.9))

        A(v) = ∑_{i=1}^{n} λi A(vi ) = ∑_{i=1}^{n} λi ∑_{j=1}^{m} ∑_{k=1}^{n} ajk Avk wj (vi )
             = ∑_{i=1}^{n} ∑_{j=1}^{m} λi aji wj   (using (6.32))
             = ∑_{j=1}^{m} ( ∑_{i=1}^{n} aji λi ) wj .                      (7.11)

Thus, if we represent v by a column vector ṽ (an n × 1 matrix) containing its coordinates
λ1 , . . . , λn with respect to the basis {v1 , . . . , vn } and A(v) by a column vector w̃ (an m × 1
matrix) containing its coordinates with respect to the basis {w1 , . . . , wm }, then (7.11)
shows

        w̃ = M ṽ = M (λ1 , . . . , λn )t ,   where M := (aji ) ∈ M(m, n, F ).        (7.12)
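Computationally, (7.12) is a single matrix-vector product. The following sketch
(illustrative only; the bases enter solely through the coordinate columns) anticipates
the numbers of Ex. 7.12 below:

    # w~ = M v~ per (7.12): coordinates of A(v) from the coordinates of v.
    def mat_vec(M, v):
        # v is the coordinate column (lambda_1, ..., lambda_n)
        return [sum(M[j][i] * v[i] for i in range(len(v))) for j in range(len(M))]

    M = [[1, 2, 3, 4], [0, -1, 0, -2]]   # the matrix of Ex. 7.12 below
    print(mat_vec(M, [-1, 3, -2, 1]))    # [3, -5], i.e. A(v) = 3 w1 - 5 w2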

For finite-dimensional vector spaces, the precise relationship between linear maps, bases,
and matrices is provided by the following theorem:

Theorem 7.10. Let V and W be finite-dimensional vector spaces over the field F ,
let BV := {v1 , . . . , vn } and BW := {w1 , . . . , wm } be bases of V and W , respectively;
m, n ∈ N, n = dim V , m = dim W .

(a) The map I := I(BV , BW ),

        I : L(V, W ) −→ M(m, n, F ),   A ↦ (aji ),                         (7.13)

    where the aji are given by (7.1), i.e. by

        A = ∑_{j=1}^{m} ∑_{i=1}^{n} aji Avi wj ,

    constitutes a linear isomorphism. Moreover, in the special case V = W , vi = wi
    for i ∈ {1, . . . , n}, the map I also constitutes a ring isomorphism I : L(V, V ) ≅
    M(n, F ); its restriction to GL(V ) then constitutes a group isomorphism
    I : GL(V ) ≅ GLn (F ).

(b) Let aji ∈ F , j ∈ {1, . . . , m}, i ∈ {1, . . . , n}. Moreover, let A ∈ L(V, W ). Then
    (7.1) holds if, and only if,

        ∀ i ∈ {1, . . . , n} :   Avi = ∑_{j=1}^{m} aji wj .                 (7.14)


Proof. (a): According to Th. 6.19, {Avi wj : (j, i) ∈ {1, . . . , m} × {1, . . . , n}} forms
a basis of L(V, W ). Thus, to every family of coordinates (aji : (j, i) ∈ {1, . . . , m} ×
{1, . . . , n}) in F , (7.1) defines a unique element of L(V, W ), i.e. I is bijective. It remains
to verify that I is linear. To this end, let λ, µ ∈ F and A, B ∈ L(V, W ) with

        A = ∑_{j=1}^{m} ∑_{i=1}^{n} aji Avi wj ,   (aji ) = I(A) ∈ M(m, n, F ),
        B = ∑_{j=1}^{m} ∑_{i=1}^{n} bji Avi wj ,   (bji ) = I(B) ∈ M(m, n, F ).

Then

        λA + µB = λ ∑_{j=1}^{m} ∑_{i=1}^{n} aji Avi wj + µ ∑_{j=1}^{m} ∑_{i=1}^{n} bji Avi wj
                = ∑_{j=1}^{m} ∑_{i=1}^{n} (λaji + µbji )Avi wj ,

showing

        I(λA + µB) = (λaji + µbji ) = λ(aji ) + µ(bji ) = λI(A) + µI(B),

proving the linearity of I. Now let V = W and vi = wi for each i ∈ {1, . . . , n}. We
claim

        BA = ∑_{j=1}^{n} ∑_{i=1}^{n} cji Avi vj ,   where cji = ∑_{k=1}^{n} bjk aki :        (7.15)

Indeed, for each l ∈ {1, . . . , n}, one calculates

        (BA)vl = B( ∑_{k=1}^{n} ∑_{i=1}^{n} aki Avi vk (vl ) ) = B( ∑_{k=1}^{n} akl vk )
               = ( ∑_{j=1}^{n} ∑_{i=1}^{n} bji Avi vj )( ∑_{k=1}^{n} akl vk )
               = ∑_{j=1}^{n} ∑_{k=1}^{n} bjk akl vj = ∑_{j=1}^{n} cjl vj = ∑_{j=1}^{n} ∑_{i=1}^{n} cji Avi vj (vl ),

thereby proving (7.15). Thus,

        I(BA) = ( ∑_{k=1}^{n} bjk aki ) = (bji )(aji ) = I(B)I(A),

proving I to be a ring isomorphism. Since, clearly, I(GL(V )) = GLn (F ), the proof of
(a) is complete.

(b): If (7.1) holds, then the calculation (7.11) (with λk := δik ) proves (7.14). Conversely,
assume (7.14). It was then shown in the proof of Th. 6.19(b) that (7.1) must hold.  □

Definition and Remark 7.11. In the situation of Th. 7.10, for each A ∈ L(V, W ), one
calls the matrix I(A) = (aji ) ∈ M(m, n, F ) the (transformation) matrix corresponding
to A with respect to the basis {v1 , . . . , vn } of V and the basis {w1 , . . . , wm } of W . If the
bases are understood, then one often tends to identify the map with its corresponding
matrix.
However, as I(A) depends on the bases, identifying A and I(A) is only admissible as
long as one keeps the bases of V and W fixed! Moreover, if one represents matrices as
rectangular arrays as in (7.2) (which one usually does), then one actually considers the
basis vectors of {v1 , . . . , vn } and {w1 , . . . , wm } as ordered from 1 to n (resp. m), i.e. I(A)
actually depends on the so-called ordered bases (v1 , . . . , vn ) and (w1 , . . . , wm ) (ordered
bases are tuples rather than sets and the matrix corresponding to A changes if the order
of the basis vectors changes).
Similarly, we had seen in (7.12) that it can be useful to identify a vector v = ∑_{i=1}^{n} λi vi
with its coordinates (λ1 , . . . , λn ), typically represented as an n × 1 matrix (a column
vector, as in (7.12)) or a 1 × n matrix (a row vector). Obviously, this identification is
also only admissible as long as the basis {v1 , . . . , vn } and its order is kept fixed.

Example 7.12. Let V := Q4 with ordered basis BV := (v1 , v2 , v3 , v4 ) and let W := Q2


with ordered basis BW := (w1 , w2 ). Moreover, assume A ∈ L(V, W ) satisfies

Av1 = w1
Av2 = 2w1 − w2
Av3 = 3w1
Av4 = 4w1 − 2w2 .

According to Th. 7.10(a),(b), the coefficients on the right-hand side of the above equa-
tions provide the columns of the matrix representing A: The matrix I(A) of A with
respect to BV and BW (I according to Th. 7.10(a)) is

        I(A) = ⎛ 1  2  3  4 ⎞
               ⎝ 0 −1  0 −2 ⎠ .

If we switch v1 and v4 in BV to obtain BV′ := (v4 , v2 , v3 , v1 ), and we switch w1 and w2 in
BW to obtain BW′ := (w2 , w1 ), then the matrix I ′ (A) of A with respect to BV′ and BW′
is obtained from I(A) by switching columns 1 and 4 as well as rows 1 and 2, resulting
in
        I ′ (A) = ⎛ −2 −1  0  0 ⎞
                  ⎝  4  2  3  1 ⎠ .
Suppose we want to determine Av for the vector v := −v1 + 3v2 − 2v3 + v4 . Then,
according to (7.12), we can do that via the matrix multiplication

               ⎛ −1 ⎞                    ⎛ −1 ⎞
               ⎜  3 ⎟   ⎛ 1  2  3  4 ⎞   ⎜  3 ⎟   ⎛  3 ⎞
        I(A) · ⎜ −2 ⎟ = ⎝ 0 −1  0 −2 ⎠ · ⎜ −2 ⎟ = ⎝ −5 ⎠ ,
               ⎝  1 ⎠                    ⎝  1 ⎠

obtaining Av = 3w1 − 5w2 .


The following Th. 7.13 is the justification for defining matrix multiplication according
to Def. 7.2(c).

Theorem 7.13. Let F be a field, let n, m, l ∈ N, and let V, W, X be finite-dimensional


vector spaces over F , dim V = n, dim W = m, dim X = l, with ordered bases, BV :=
(v1 , . . . , vn ), BW := (w1 , . . . , wm ), and BX := (x1 , . . . , xl ), respectively. If A ∈ L(V, W ),
B ∈ L(W, X), M = (aji ) ∈ M(m, n, F ) is the matrix corresponding to A with respect to
BV and BW , and N = (bji ) ∈ M(l, m, F ) is the matrix corresponding to B with respect
to BW and BX , then N M = ( ∑_{k=1}^{m} bjk aki ) ∈ M(l, n, F ) is the matrix corresponding to
BA with respect to BV and BX .

Proof. For each i ∈ {1, . . . , n}, one computes

        (BA)(vi ) = B( A(vi ) ) = B( ∑_{k=1}^{m} aki wk ) = ∑_{k=1}^{m} aki B(wk ) = ∑_{k=1}^{m} aki ∑_{j=1}^{l} bjk xj
                  = ∑_{j=1}^{l} ∑_{k=1}^{m} bjk aki xj = ∑_{j=1}^{l} ( ∑_{k=1}^{m} bjk aki ) xj ,        (7.16)

proving N M = ( ∑_{k=1}^{m} bjk aki ) is the matrix corresponding to BA with respect to the
bases {v1 , . . . , vn } and {x1 , . . . , xl }.  □

As we have seen, given A ∈ L(V, W ), the matrix I(A) representing A depends on the
chosen (ordered) bases of V and W . It will be one goal of Linear Algebra II to find
methods to determine bases of V and W such that the form of I(A) becomes particularly
simple. In the following Th. 7.14, we show how I(A) changes for given basis transitions
for V and W , respectively.

Theorem 7.14. Let V and W be finite-dimensional vector spaces over the field F ,
m, n ∈ N, dim V = n, dim W = m. Let BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ) be
ordered bases of V and W , respectively. Moreover, let BV′ := (v1′ , . . . , vn′ ) and
BW′ := (w1′ , . . . , wm′ ) also be ordered bases of V and W , respectively, and let
cji , dji , fji ∈ F be such that

        ∀ i ∈ {1, . . . , n} :   vi′ = ∑_{j=1}^{n} cji vj ,                                 (7.17a)
        ∀ i ∈ {1, . . . , m} :   wi′ = ∑_{j=1}^{m} fji wj ,   wi = ∑_{j=1}^{m} dji wj′ ,     (7.17b)

where (dji ) = (fji )⁻¹ ∈ GLm (F ) (we then call (cji ) ∈ GLn (F ) and (fji ) ∈ GLm (F ) the
transition matrices corresponding to the basis transitions from BV to BV′ and from BW
to BW′ , respectively). If A ∈ L(V, W ) has I(BV , BW )(A) = (aji ) ∈ M(m, n, F ) with
I(BV , BW ) as in Th. 7.10(a), then

        I(BV′ , BW′ )(A) = (eji ),   where (eji ) = (dji )(aji )(cji ) = (fji )⁻¹ (aji )(cji ).        (7.18)

In particular, in the special case, where V = W , vi = wi , vi′ = wi′ for each i ∈ {1, . . . , n},
one has

        I(BV′ , BV′ )(A) = (eji ),   where (eji ) = (cji )⁻¹ (aji )(cji ).                  (7.19)

Proof. For each i ∈ {1, . . . , n}, we compute

        Avi′ = A( ∑_{l=1}^{n} cli vl ) = ∑_{l=1}^{n} cli Avl
             = ∑_{l=1}^{n} cli ∑_{k=1}^{m} akl wk   (using (7.14))
             = ∑_{l=1}^{n} ∑_{k=1}^{m} cli akl ∑_{j=1}^{m} djk wj′
             = ∑_{j=1}^{m} ( ∑_{l=1}^{n} ∑_{k=1}^{m} djk akl cli ) wj′ ,

i.e. (7.18) holds in consequence of Th. 7.10(b),(a).  □

Example 7.15. We explicitly write the transformations and matrices of Th. 7.14 for the
situation of Ex. 7.12: There we had V := Q4 with ordered bases BV := (v1 , v2 , v3 , v4 ),
BV′ := (v4 , v2 , v3 , v1 ), and W := Q2 with ordered bases BW := (w1 , w2 ), BW′ := (w2 , w1 ).
Thus,
        v1′ = v4                  ⎛ 0 0 0 1 ⎞
        v2′ = v2        (cji ) =  ⎜ 0 1 0 0 ⎟
        v3′ = v3                  ⎜ 0 0 1 0 ⎟
        v4′ = v1 ,                ⎝ 1 0 0 0 ⎠

and
        w1′ = w2        (fji ) = ⎛ 0 1 ⎞
        w2′ = w1 ,               ⎝ 1 0 ⎠

and
        w1 = w2′        (dji ) = ⎛ 0 1 ⎞
        w2 = w1′ ,               ⎝ 1 0 ⎠ .

Moreover, A ∈ L(V, W ) in Ex. 7.12 was such that

        (aji ) = I(BV , BW )(A) = ⎛ 1  2  3  4 ⎞
                                  ⎝ 0 −1  0 −2 ⎠ .

Thus, according to (7.18),

                                                         ⎛ 0 0 0 1 ⎞
        I(BV′ , BW′ )(A) = ⎛ 0 1 ⎞ ⎛ 1  2  3  4 ⎞        ⎜ 0 1 0 0 ⎟
                           ⎝ 1 0 ⎠ ⎝ 0 −1  0 −2 ⎠        ⎜ 0 0 1 0 ⎟
                                                         ⎝ 1 0 0 0 ⎠

                         = ⎛ 0 1 ⎞ ⎛  4  2  3  1 ⎞   ⎛ −2 −1  0  0 ⎞
                           ⎝ 1 0 ⎠ ⎝ −2 −1  0  0 ⎠ = ⎝  4  2  3  1 ⎠ ,

which is precisely I ′ (A) of Ex. 7.12.
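The transition formula (7.18) is easy to confirm numerically; a sketch for the data of
this example (mat_mul as in the sketch after Ex. 7.3):

    # (eji) = (dji)(aji)(cji) for Ex. 7.15, with (dji) = (fji)^{-1}:
    c = [[0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]
    d = [[0, 1], [1, 0]]                 # inverse of (fji) = [[0,1],[1,0]]
    a = [[1, 2, 3, 4], [0, -1, 0, -2]]
    print(mat_mul(d, mat_mul(a, c)))     # [[-2, -1, 0, 0], [4, 2, 3, 1]]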

7.3 Rank and Transpose


Definition 7.16. Let F be a field and m, n ∈ N.

(a) Let V and W be vector spaces over F and A ∈ L(V, W ). Then rk(A) := dim Im A
is called the rank of A.

(b) Let (aji )(j,i)∈{1,...,m}×{1,...,n} ∈ M(m, n, F ) and consider the column vectors and row
    vectors of (aji ) as elements of the vector spaces F m and F n , respectively (cf. Rem.
    7.4(b)). Then

        rk(aji ) := dim ⟨ { (a1i , . . . , ami )t : i ∈ {1, . . . , n} } ⟩            (7.20a)

    is called the column rank or just the rank of (aji ). Analogously,

        rrk(aji ) := dim ⟨ { ( aj1 · · · ajn ) : j ∈ {1, . . . , m} } ⟩               (7.20b)

    is called the row rank of (aji ).



A main goal of the present section is to show that, if V and W are finite-dimensional
vector spaces, A ∈ L(V, W ), and (aji ) is the matrix representing A with respect to
chosen ordered bases of V and W , respectively, then rk(A) = rk(aji ) = rrk(aji ) (cf. Th.
7.23 below).
Theorem 7.17. Let F be a field, m, n ∈ N.

(a) Let V, W be finite-dimensional vector spaces over F , dim V = n, dim W = m, where


BV := (v1 , . . . , vn ) is an ordered basis of V and BW := (w1 , . . . , wm ) is an ordered
basis of W . If A ∈ L(V, W ) and (aji ) ∈ M(m, n, F ) is the matrix corresponding to
A with respect to BV and BW , then
rk(A) = rk(aji ).

(b) If the matrices (cji ) ∈ M(n, n, F ) and (dji ) ∈ M(m, m, F ) are regular and (aji ) ∈
    M(m, n, F ), then
        rk(aji ) = rk( (dji )(aji )(cji ) ).

Proof. (a) is basically due to Lem. 7.5(a) and Th. 6.9: Let B : W −→ F m be the linear
isomorphism with B(wj ) = ej , where {e1 , . . . , em } is the standard basis of F m (cf. Ex.
5.16(c)). Then

        rk(A) = dim Im A = dim B(Im A) = dim Im(BA) = rk(BA).              (7.21)

Since

        ∀ i ∈ {1, . . . , n} :   (BA)vi = B( ∑_{j=1}^{m} aji wj ) = (a1i , . . . , ami )t = c^A_i ,

we have Im(BA) = ⟨{c^A_1 , . . . , c^A_n }⟩, which, together with (7.21), proves (a).

(b): Let BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ) be ordered bases of V := F n and
W := F m , respectively. Moreover, let A ∈ L(V, W ) be defined by (7.14), i.e. by

        ∀ i ∈ {1, . . . , n} :   Avi = ∑_{j=1}^{m} aji wj .

Then (aji ) is the matrix representing A with respect to BV and BW and rk(A) = rk(aji )
by (a). Now let BV′ := (v1′ , . . . , vn′ ) and BW′ := (w1′ , . . . , wm′ ) also be ordered bases of V
and W , respectively, such that (7.17) holds, i.e.

        ∀ i ∈ {1, . . . , n} :   vi′ = ∑_{j=1}^{n} cji vj ,        ∀ i ∈ {1, . . . , m} :   wi = ∑_{j=1}^{m} dji wj′

(such bases exist, since (cji ) and (dji ) are regular), then, according to Th. 7.14, the
matrix (eji ) := (dji )(aji )(cji ) represents A with respect to BV′ and BW′ . Thus, (a) yields
rk(eji ) = rk(A) = rk(aji ), as desired.  □

Theorem 7.18. Let V, W be finite-dimensional vector spaces over the field F , dim V =
n, dim W = m, (m, n) ∈ N², A ∈ L(V, W ) with r := rk(A). Then there exist ordered
bases BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ) of V and W , respectively, such that
the matrix (aji ) ∈ M(m, n, F ) corresponding to A with respect to BV and BW satisfies

        aji = { 1 for j = i, 1 ≤ j ≤ r;  0 otherwise },   i.e.   (aji ) = ⎛ Idr  0 ⎞
                                                                          ⎝  0   0 ⎠ .

Proof. According to Th. 6.8(a), we know

        n = dim V = dim ker A + dim Im A = dim ker A + r.

Thus, we can choose an ordered basis BV = (v1 , . . . , vn ) of V , such that (vr+1 , . . . , vn )
is a basis of ker A. For each i ∈ {1, . . . , r}, let wi := Avi . Then (w1 , . . . , wr ) is a basis
of Im A and there exist wr+1 , . . . , wm ∈ W such that BW = (w1 , . . . , wm ) constitutes a
basis of W . Since

        ∀ i ∈ {1, . . . , n} :   Avi = { wi for 1 ≤ i ≤ r;  0 for i > r },

(aji ) has the desired form.  □

As a tool for showing rk(aji ) = rrk(aji ), we will now introduce the transpose of a
matrix, which is also of general interest. While, here, we are mostly interested in using
the transpose of a matrix for a matrix representing a linear map, the concept makes
sense in the more general situation of Def. 7.1:

Definition 7.19. Let S be a nonempty set, A := (aji )(j,i)∈{1,...,m}×{1,...,n} ∈ M(m, n, S),
and m, n ∈ N. Then we define the transpose of A, denoted At , by

        At := (atji )(j,i)∈{1,...,n}×{1,...,m} ∈ M(n, m, S),
        where   ∀ (j, i) ∈ {1, . . . , n} × {1, . . . , m} :   atji := aij .        (7.22)

Thus, if A is an m × n matrix, then its transpose is an n × m matrix, where one obtains
At from A by switching rows and columns: A is the map f : {1, . . . , m} × {1, . . . , n} −→ S,
f (j, i) = aji , and its transpose At is the map f t : {1, . . . , n} × {1, . . . , m} −→ S,
f t (j, i) = atji = f (i, j) = aij .

Example 7.20. As examples of forming transposes, consider

                                                          ⎛ 1  0 ⎞
        ⎛ 1 2 ⎞t   ⎛ 1 3 ⎞        ⎛ 1  2  3  4 ⎞t         ⎜ 2 −1 ⎟
        ⎝ 3 4 ⎠  = ⎝ 2 4 ⎠ ,      ⎝ 0 −1  0 −2 ⎠   =      ⎜ 3  0 ⎟ .
                                                          ⎝ 4 −2 ⎠
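In code, transposition is pure index swapping; a one-line sketch:

    # Transpose per Def. 7.19: (A^t)_{ji} = a_{ij}; an m x n matrix becomes n x m.
    def transpose(A):
        return [list(col) for col in zip(*A)]

    print(transpose([[1, 2], [3, 4]]))               # [[1, 3], [2, 4]]
    print(transpose([[1, 2, 3, 4], [0, -1, 0, -2]]))
    # [[1, 0], [2, -1], [3, 0], [4, -2]]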

Remark 7.21. Let S be a nonempty set, m, n ∈ N. It is immediate from Def. 7.19 that
the map A ↦ At is bijective from M(m, n, S) onto M(n, m, S), where

        ∀ A ∈ M(m, n, S) :   (At )t = A.                                   (7.23)

Theorem 7.22. Let m, n, l ∈ N.

(a) Let S be a commutative ring. If A ∈ M(m, n, S) and B ∈ M(n, l, S), then

(AB)t = B t At . (7.24)

If A ∈ GLn (S), then At ∈ GLn (S) and

(At )−1 = (A−1 )t . (7.25)

(b) Let F be a field. Then the map

        I : M(m, n, F ) −→ M(n, m, F ),   A ↦ At ,                         (7.26)

constitutes a linear isomorphism.

Proof. Exercise. 
Theorem 7.23. Let V, W be finite-dimensional vector spaces over the field F , dim V =
n, dim W = m, with ordered bases BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ), respec-
tively. If A ∈ L(V, W ) and (aji ) ∈ M(m, n, F ) is the matrix corresponding to A with
respect to BV and BW , then

rk(A) = rk(aji ) = rrk(aji ).

Proof. As the first equality was already shown in Th. 7.17(a), it merely remains to show
rk(aji ) = rrk(aji ). Let r := rk(A). According to Th. 7.18 and Th. 7.14, there exist
regular matrices (xji ) ∈ GLm (F ), (yji ) ∈ GLn (F ) such that

        (xji )(aji )(yji ) = ⎛ Idr  0 ⎞     and thus, by (7.24),     (yji )t (aji )t (xji )t = ⎛ Idr  0 ⎞ .
                             ⎝  0   0 ⎠                                                       ⎝  0   0 ⎠

Since (xji )t and (yji )t are regular by Th. 7.22(a), we may use Th. 7.17(b) to obtain

        rrk(aji ) = rk( (aji )t ) = rk( (yji )t (aji )t (xji )t ) = rk ⎛ Idr  0 ⎞ = r,
                                                                       ⎝  0   0 ⎠
as desired.  □
Remark 7.24. Let V, W be finite-dimensional vector spaces over the field F , dim V = n,
dim W = m, where m, n ∈ N. Moreover, let BV = (v1 , . . . , vn ) and BW = (w1 , . . . , wm )
be ordered bases of V and W , respectively, let A ∈ L(V, W ), and let (aji ) ∈ M(m, n, F )
be the matrix corresponding to A with respect to BV and BW . Using the transpose
matrix (aji )t ∈ M(n, m, F ), one can now also define a transpose At of the map A.
However, a subtlety arises related to basis transitions and, given λ1 , . . . , λm ∈ F , rep-
resenting coordinates of w ∈ W with respect to BW , one cannot simply define At by
applying (aji )t to the column vector containing λ1 , . . . , λm , as the result would, in gen-
eral, depend on BV and BW . Instead, one obtains At by left-multiplying (aji ) with the
row vector containing λ1 , . . . , λm :

        At w := ( λ1 · · · λm ) (aji ).

Even though the resulting coordinate values are the same that one obtains from com-
puting (aji )t (λ1 . . . λm )t , the difference appears when considering basis transitions. In
consequence, algebraically, one obtains the transpose At to be a map from the dual W ′
of W into the dual V ′ of V . Here, we will not go into further details, as we do not want
to explain the concept of dual spaces at this time.

7.4 Special Types of Matrices


In the present section, we investigate types of matrices that possess particularly simple
structures, often, but not always, due to many entries being 0. For many of the sets
that we introduce in the following, there does not seem to be a standard notation, at
least not within Linear Algebra. However, as these sets do appear very frequently, it
seems inconvenient and cumbersome not to introduce suitable notation.
Definition 7.25. Let n ∈ N and let S be a set containing elements denoted 0 and 1
(usually, S will be a ring with unity or even a field, but we want to emphasize that the
present definition makes sense without any structure on the set S).

(a) We call a matrix A = (aji ) ∈ M(n, S) diagonal if, and only if, aji = 0 for j ≠ i,
    i.e. if, and only if, all nondiagonal entries of A are 0. We define

        Dn (S) := {A ∈ M(n, S) : A is diagonal}.

    If (s1 , . . . , sn ) ∈ S n , then we define

        diag(s1 , . . . , sn ) := D = (dji ) ∈ Dn (S),   dji := { sj for j = i;  0 for j ≠ i }.

(b) A matrix A = (aji ) ∈ M(n, S) is called upper triangular or right triangular (resp.
    lower triangular or left triangular) if, and only if, aji = 0 for each i, j ∈ {1, . . . , n}
    such that j > i (resp. j < i), i.e. if, and only if, all nonzero entries of A are
    above/right (resp. below/left) of the diagonal. A triangular matrix A is called
    strict if, and only if, aii = 0 for each i ∈ {1, . . . , n}; it is called unipotent if, and
    only if, aii = 1 for each i ∈ {1, . . . , n}. We define⁴

        BUn (S)  := {A ∈ M(n, S) : A is upper triangular},
        BLn (S)  := {A ∈ M(n, S) : A is lower triangular},
        BU0n (S) := {A ∈ M(n, S) : A is strict upper triangular},
        BL0n (S) := {A ∈ M(n, S) : A is strict lower triangular},
        BU1n (S) := {A ∈ BUn (S) : A is unipotent},
        BL1n (S) := {A ∈ BLn (S) : A is unipotent}.

    ⁴ The ‘B’ in the notation comes from the theory of so-called Lie algebras, where
    one finds, at least for suitable S, that BUn (S) and BLn (S) form so-called Borel
    subalgebras of the algebra M(n, S).

(c) A matrix A = (aji ) ∈ M(n, S) is called symmetric if, and only if, At = A, i.e. if
    aji = aij for each i, j ∈ {1, . . . , n}. We define

        Symn (S) := {A ∈ M(n, S) : A is symmetric}.
Proposition 7.26. Let S be a ring and m, n ∈ N. Let s1 , . . . , smax{m,n} ∈ S.

(a) Let D1 := diag(s1 , . . . , sm ) ∈ Dm (S), D2 := diag(s1 , . . . , sn ) ∈ Dn (S), A = (aji ) ∈
    M(m, n, S). Then left multiplication of A by D1 multiplies the j-th row of A by sj ;
    right multiplication of A by D2 multiplies the i-th column of A by si :

                                       ⎛ r^A_1 ⎞   ⎛ s1 r^A_1 ⎞
        D1 A = diag(s1 , . . . , sm )  ⎜   ⋮   ⎟ = ⎜    ⋮     ⎟ ,                        (7.27a)
                                       ⎝ r^A_m ⎠   ⎝ sm r^A_m ⎠

        AD2 = ( c^A_1 · · · c^A_n ) diag(s1 , . . . , sn ) = ( c^A_1 s1 · · · c^A_n sn ).    (7.27b)

(b) Dn (S) is a subring of M(n, S) (subring with unity if S is a ring with unity). If S
is a field, then Dn (S) is a vector subspace of M(n, S).

(c) If S is a ring with unity, then D := diag(s1 , . . . , sn ) ∈ GLn (S) if, and only if, each
    si is invertible, i.e. if, and only if, si ∈ S ∗ for each i ∈ {1, . . . , n}. In that case, one
    has D⁻¹ = diag(s1⁻¹ , . . . , sn⁻¹ ) ∈ GLn (S). Moreover, Dn (S ∗ ) = Dn (S) ∩ GLn (S) is
    a subgroup of GLn (S).

Proof. (a): If (dji ) := D1 , (eji ) := D2 , (xji ) := D1 A and (yji ) := AD2 , then, for each
(j, i) ∈ {1, . . . , m} × {1, . . . , n},

        xji = ∑_{k=1}^{m} djk aki = sj aji ,        yji = ∑_{k=1}^{n} ajk eki = aji si .

(b): As a consequence of (a), we have, for a1 , . . . , an , b1 , . . . , bn ∈ S,


diag(a1 , . . . , an ) diag(b1 , . . . , bn ) = diag(a1 b1 , . . . , an bn ) ∈ Dn (S). (7.28)
As A, B ∈ Dn (S), clearly, implies A + B ∈ Dn (S) and −A ∈ Dn (S), we have shown
Dn (S) to be a subring of M(n, S), where, if 1 ∈ S, then Idn ∈ Dn (S). If λ ∈ S,
A ∈ Dn (S), then, clearly, λA ∈ Dn (S), showing that, for S being a field, Dn (S) is a
vector subspace of M(n, S).
(c) follows from (7.28) and the fact that (S ∗ , ·) is a group according to Def. and Rem.
4.41. 
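A sketch illustrating (7.27) (diag and the concrete matrices are only illustrative;
mat_mul as in the sketch after Ex. 7.3):

    # Left/right multiplication by diagonal matrices scales rows/columns (7.27).
    def diag(*s):
        n = len(s)
        return [[s[j] if j == i else 0 for i in range(n)] for j in range(n)]

    A = [[1, 2, 3], [4, 5, 6]]
    print(mat_mul(diag(10, 100), A))   # rows scaled: [[10,20,30],[400,500,600]]
    print(mat_mul(A, diag(1, 0, -1)))  # columns scaled: [[1,0,-3],[4,0,-6]]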
Proposition 7.27. Let S be a ring and n ∈ N.

(a) Let A := (aji ) ∈ M(n, S) and B := (bji ) ∈ M(n, S). If A, B ∈ BUn (S) or
A, B ∈ BLn (S), and C := (cji ) = AB, then cii = aii bii for each i ∈ {1, . . . , n}.
(b) BUn (S), BLn (S), BU0n (S), BL0n (S) are subrings of M(n, S) ( BUn (S), BLn (S) are
subrings with unity if S is a ring with unity). If S is a field, then BUn (S), BLn (S),
BU0n (S), BL0n (S) are vector subspaces of M(n, S).
(c) Let S be a ring with unity. If B := (bji ) ∈ BUn (S) ∩ GLn (S) and A := (aji ) = B⁻¹ ,
    then A ∈ BUn (S) with

                 ⎧ 0                                      for i < j,
        aji =    ⎨ bii⁻¹                                  for i = j,                 (7.29a)
                 ⎩ −( ∑_{k=1}^{i−1} ajk bki ) bii⁻¹       recursively for i > j.

    If B := (bji ) ∈ BLn (S) ∩ GLn (S) and A := (aji ) = B⁻¹ , then A ∈ BLn (S) with

                 ⎧ 0                                      for j < i,
        aji =    ⎨ bii⁻¹                                  for j = i,                 (7.29b)
                 ⎩ −( ∑_{k=i+1}^{j} ajk bki ) bii⁻¹       recursively for j > i.

(d) If S is a ring with unity and A := (aji ) ∈ BUn (S) ∪ BLn (S), then A ∈ GLn (S) if,
and only if, each aii is invertible. Moreover, BU1n (S), BL1n (S) as well as BUn (S) ∩
GLn (S), BLn (S) ∩ GLn (S) are subgroups of GLn (S).

Proof. (a): For A, B ∈ BUn (S), we compute

        cii = ∑_{k=1}^{i−1} aik bki + aii bii + ∑_{k=i+1}^{n} aik bki = aii bii ,

where the first sum vanishes, as aik = 0 for k < i, and the second sum vanishes, as
bki = 0 for k > i; for A, B ∈ BLn (S), the same formula holds, where now the first sum
vanishes, as bki = 0 for k < i, and the second sum vanishes, as aik = 0 for k > i, again
yielding cii = aii bii .

(b): Let A, B ∈ BUn (S) and C := (cji ) = AB. We obtain C ∈ BUn (S), since, if j > i,
then

        cji = ∑_{k=1}^{i} ajk bki + ∑_{k=i+1}^{n} ajk bki = 0

(the first sum equals 0 since k ≤ i < j and A is upper triangular, the second sum equals
0 since k > i and B is upper triangular). Now let A, B ∈ BLn (S) and C := (cji ) := AB.
We obtain C ∈ BLn (S), since, if j < i, then

        cji = ∑_{k=1}^{i−1} ajk bki + ∑_{k=i}^{n} ajk bki = 0

(the first sum equals 0 since k < i and B is lower triangular, the second sum equals 0 since
j < i ≤ k and A is lower triangular). Now let M ∈ {BUn (S), BLn (S), BU0n (S), BL0n (S)}.
Then the above and (a) show AB ∈ M for A, B ∈ M. As A, B ∈ M, clearly, implies
A + B ∈ M and −A ∈ M, we have shown M to be a subring of M(n, S), where, if
1 ∈ S, then Idn ∈ BUn (S) ∩ BLn (S). If λ ∈ S, A ∈ M, then, clearly, λA ∈ M, showing
that, for S being a field, M is a vector subspace of M(n, S).
(c): If B ∈ BUn (S) ∩ GLn (S), then each bii is invertible by (a) and (b). Now let
A := (aji ) be defined by (7.29a), C := (cji ) := AB. We already know C ∈ BUn (S)
according to (b). Moreover, cii = 1 follows from (a) and, for j < i,

        cji = ∑_{k=j}^{i} ajk bki = aji bii + ∑_{k=j}^{i−1} ajk bki = 0

by the recursive part of (7.29a), completing the proof of C = Idn . If B ∈ BLn (S) ∩
GLn (S), then each bii is invertible by (a) and (b). Now let A := (aji ) be defined by
(7.29b), C := (cji ) := AB. We already know C ∈ BLn (S) according to (b). Moreover,
cii = 1 follows from (a) and, for j > i,

        cji = ∑_{k=i}^{j} ajk bki = aji bii + ∑_{k=i+1}^{j} ajk bki = 0

by the recursive part of (7.29b), completing the proof of C = Idn .

(d) is now merely a corollary of suitable parts of (a),(b),(c).  □
Example 7.28. Let

             ⎛ 1 0 0 ⎞
        B := ⎜ 4 2 0 ⎟ .
             ⎝ 6 5 3 ⎠

We use (7.29b) to compute A := (aji ) = B⁻¹ :

        a11 = 1,   a22 = 1/2,   a33 = 1/3,
        a32 = −a33 b32 /b22 = −(5/3) · (1/2) = −5/6,
        a21 = −a22 b21 /b11 = −4/2 = −2,
        a31 = −(a32 b21 + a33 b31 )/b11 = −( −(5·4)/6 + 6/3 ) = 4/3.

Thus,

             ⎛  1    0    0  ⎞ ⎛ 1 0 0 ⎞   ⎛       1               0             0 ⎞
        AB = ⎜ −2   1/2   0  ⎟ ⎜ 4 2 0 ⎟ = ⎜   −2 + 4/2            1             0 ⎟ = Id3 .
             ⎝ 4/3 −5/6  1/3 ⎠ ⎝ 6 5 3 ⎠   ⎝ 4/3 − 20/6 + 2   −10/6 + 5/3        1 ⎠
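The recursion (7.29b) translates directly into code. The following sketch (illustrative
names; exact rational arithmetic via Python's fractions module) reproduces Ex. 7.28:

    # Inverse of a lower triangular matrix via the recursion (7.29b).
    from fractions import Fraction

    def inv_lower(B):
        n = len(B)
        A = [[Fraction(0)] * n for _ in range(n)]
        for i in range(n):
            A[i][i] = 1 / Fraction(B[i][i])        # a_ii = b_ii^{-1}
        for j in range(n):                         # a_ji for j > i, recursively,
            for i in range(j - 1, -1, -1):         # descending i uses a_jk, k > i
                s = sum(A[j][k] * B[k][i] for k in range(i + 1, j + 1))
                A[j][i] = -s / Fraction(B[i][i])
        return A

    B = [[1, 0, 0], [4, 2, 0], [6, 5, 3]]
    print(inv_lower(B))
    # rows (1, 0, 0), (-2, 1/2, 0), (4/3, -5/6, 1/3) as Fractions,
    # matching A = B^{-1} from Ex. 7.28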
Proposition 7.29. Let F be a field and n ∈ N.

(a) Symn (F ) is a vector subspace of M(n, F ).


(b) If A ∈ Symn (F ) ∩ GLn (F ), then A−1 ∈ Symn (F ) ∩ GLn (F ).

Proof. (a): If A, B ∈ Symn (F ) and λ ∈ F , then (A + B)t = At + B t = A + B and


(λA)t = λAt = λA, showing A + B ∈ Symn (F ) and λA ∈ Symn (F ), i.e. Symn (F ) is a
vector subspace of M(n, F ).
(b): If A ∈ Symn (F ) ∩ GLn (F ), then (A−1 )t = (At )−1 = A−1 , proving A−1 ∈ Symn (F ) ∩
GLn (F ). 

Definition 7.30. Let S be a group, n ∈ N, A ∈ M(n, S).

(a) A is called skew-symmetric if, and only if, At = −A. Define

        Skewn (S) := {A ∈ M(n, S) : A is skew-symmetric}.

(b) Assume F := S is a field with char F ≠ 2 (i.e. 2 := 1 + 1 ≠ 0 in F ). We then call
    Asym := (1/2)(A + At ) the symmetric part of A and Askew := (1/2)(A − At ) the
    skew-symmetric part of A.

Proposition 7.31. Let F be a field, n ∈ N.

(a) Skewn (F ) is a vector subspace of M(n, F ).

(b) If char F ≠ 2, then Skewn (F ) ∩ Symn (F ) = {0}.

(c) If A ∈ Skewn (F ) ∩ GLn (F ), then A⁻¹ ∈ Skewn (F ) ∩ GLn (F ).⁵

Proof. (a): If A, B ∈ Skewn (F ) and λ ∈ F , then (A + B)t = At + B t = −A − B =


−(A + B) and (λA)t = λAt = λ(−A) = −(λA), showing A + B ∈ Skewn (F ) and
λA ∈ Skewn (F ), i.e. Skewn (F ) is a vector subspace of M(n, F ).
(b): Let A ∈ Skewn (F ) ∩ Symn (F ). Then −A = At = A, i.e. 2A = 0, implying 2 = 0 or
A = 0. Thus, char F = 2 or A = 0.
(c): If A ∈ Skewn (F ) ∩ GLn (F ), then (A⁻¹ )t = (At )⁻¹ = −A⁻¹ , proving A⁻¹ ∈
Skewn (F ) ∩ GLn (F ).  □

Proposition 7.32. Let F be a field with char F ≠ 2, n ∈ N, A ∈ M(n, F ).

(a) The symmetric part Asym = (1/2)(A + At ) of A is symmetric, the skew-symmetric
    part Askew = (1/2)(A − At ) of A is skew-symmetric.

(b) A can be uniquely decomposed into its symmetric and skew-symmetric parts, i.e.
A = Asym + Askew and, if A = S + B with S ∈ Symn (F ) and B ∈ Skewn (F ), then
S = Asym and B = Askew .

(c) A is symmetric if, and only if, A = Asym ; A is skew-symmetric if, and only if,
A = Askew .
⁵ Using the determinant det A and results from Linear Algebra II, we can show that A⁻¹ ∈
Skewn (F ) ∩ GLn (F ) can only exist for char F = 2 or n even: If A ∈ Skewn (F ), then det A = det(At ) =
det(−A) = (−1)ⁿ det A, where we have used [Phi19, Cor. 4.24(b),(d)]. Thus, for n odd, 2 det A = 0,
implying 2 = 0 (i.e. char F = 2) or det A = 0 (i.e. A ∉ GLn (F ) by [Phi19, Cor. 4.24(h)]).

Proof. (a): Asym is symmetric due to

        (Asym )t = (1/2)(A + At )t = (1/2)(At + A) = Asym .

Askew is skew-symmetric due to

        (Askew )t = (1/2)(A − At )t = (1/2)(At − A) = −Askew .

(b): While Asym + Askew = (1/2)A + (1/2)At + (1/2)A − (1/2)At = A is immediate, if
A = S + B with S ∈ Symn (F ) and B ∈ Skewn (F ), then

        Asym + Askew = A = S + B   ⇒   Asym − S = B − Askew ,

where Asym − S ∈ Symn (F ) by Prop. 7.29(a) and B − Askew ∈ Skewn (F ) by Prop.
7.31(a), showing Asym = S and Askew = B by Prop. 7.31(b).
(c): If A = Asym , then A is symmetric by (a); if A = Askew , then A is skew-symmetric by
(b). If A is symmetric, then Askew = A − Asym ∈ Skewn (F ) ∩ Symn (F ) = {0}, showing
A = Asym . If A is skew-symmetric, then Asym = A−Askew ∈ Skewn (F )∩Symn (F ) = {0},
showing A = Askew . 
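In coordinates, the decomposition of Prop. 7.32 reads as follows (a sketch with
illustrative names; Fraction keeps the arithmetic exact, as required over a field with
char F ≠ 2):

    # A = A_sym + A_skew with A_sym = (A + A^t)/2, A_skew = (A - A^t)/2.
    from fractions import Fraction

    def sym_skew(A):
        n = len(A)
        half = Fraction(1, 2)
        Asym = [[half * (A[j][i] + A[i][j]) for i in range(n)] for j in range(n)]
        Askew = [[half * (A[j][i] - A[i][j]) for i in range(n)] for j in range(n)]
        return Asym, Askew

    S, K = sym_skew([[1, 2], [4, 3]])
    print(S)  # [[1, 3], [3, 3]] (as Fractions), symmetric
    print(K)  # [[0, -1], [1, 0]] (as Fractions), skew-symmetric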

7.5 Blockwise Matrix Multiplication


It is sometimes useful that one can carry out matrix multiplication in a blockwise fashion,
i.e. by partitioning the entries of a matrix into submatrices and then performing a
matrix multiplication for new matrices that have the submatrices as their entries, see
Th. 7.34 below. To formulate and prove Th. 7.34, we need to define matrices and their
multiplication, where the entries are allowed to be indexed by more general index sets:
Definition 7.33. Let S be a ring. Let J, I be finite index sets, #J, #I ∈ N. We then
call each family (aji )(j,i)∈J×I in S a J × I matrix over S, denoting the set of all such
matrices by M(J, I, S). If K is another index set with #K ∈ N, then, for each J × K
matrix (aji ) and each K × I matrix (bji ) over S, define the product
        (aji )(bji ) := ( ∑_{k∈K} ajk bki )_{(j,i)∈J×I} .                   (7.30)

Theorem 7.34. Let S be a ring. Let J, I, K be finite index sets, #J, #I, #K ∈ N.
Now assume we have disjoint partitions of J, I, K into nonempty sets:

        J = ∪̇_{α∈{1,...,M}} Jα ,   I = ∪̇_{β∈{1,...,N}} Iβ ,   K = ∪̇_{γ∈{1,...,L}} Kγ ,        (7.31)

where M, N, L ∈ N,

        ∀ α ∈ {1, . . . , M } :  mα := #Jα ∈ N,   ∀ β ∈ {1, . . . , N } :  nβ := #Iβ ∈ N,
        ∀ γ ∈ {1, . . . , L} :   lγ := #Kγ ∈ N.

Let A := (aji ) be a J × K matrix over S and let B := (bji ) be a K × I matrix over S.
Define the following submatrices of A and B, respectively:

        ∀ (α, γ) ∈ {1, . . . , M } × {1, . . . , L} :   Aαγ := (aji )(j,i)∈Jα ×Kγ ,        (7.32a)
        ∀ (γ, β) ∈ {1, . . . , L} × {1, . . . , N } :   Bγβ := (bji )(j,i)∈Kγ ×Iβ .        (7.32b)

Then, for each (α, γ, β) ∈ {1, . . . , M } × {1, . . . , L} × {1, . . . , N }, Aαγ is a Jα × Kγ matrix
and Bγβ is a Kγ × Iβ matrix. Thus, we can define

        (Cαβ ) := (Aαγ )(Bγβ ) := ( ∑_{γ=1}^{L} Aαγ Bγβ )_{(α,β)∈{1,...,M }×{1,...,N }} .    (7.32c)

We claim that

        C := (cji ) := AB = (Cαβ )                                         (7.33)

in the sense that

        ∀ (α, β) ∈ {1, . . . , M } × {1, . . . , N } :  ∀ (j, i) ∈ Jα × Iβ :   cji = (Cαβ )ji .        (7.34)

Proof. Let (α, β) ∈ {1, . . . , M } × {1, . . . , N } and (j, i) ∈ Jα × Iβ . Then, according to
(7.32) and since K is the disjoint union of the Kγ , we have

        (Cαβ )ji = ∑_{γ=1}^{L} ∑_{k∈Kγ} ajk bki = ∑_{k∈K} ajk bki = cji ,

proving (7.34) and the theorem.  □

Example 7.35. Let S be a ring with unity and A, B, C, D ∈ GL2 (S). Then we can use
Th. 7.34 to perform the following multiplication of a 4 × 6 matrix by a 6 × 4 matrix:

                      ⎛ A⁻¹    0  ⎞
        ⎛ A B 0 ⎞     ⎜ B⁻¹   C⁻¹ ⎟   ⎛ 2 Id2   BC⁻¹ ⎞
        ⎝ 0 C D ⎠  ·  ⎝  0    D⁻¹ ⎠ = ⎝ CB⁻¹   2 Id2 ⎠ .
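Th. 7.34 can likewise be checked numerically. The following sketch partitions 4 × 4
matrices into a 2 × 2 block structure (all names illustrative; mat_mul and mat_add as
in the sketch after Ex. 7.3):

    # Blockwise matrix multiplication (Th. 7.34):
    # C_{alpha,beta} = sum_gamma A_{alpha,gamma} B_{gamma,beta}.
    def block(M, rows, cols):
        return [[M[j][i] for i in cols] for j in rows]

    def glue(C11, C12, C21, C22):
        top = [r1 + r2 for r1, r2 in zip(C11, C12)]
        bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
        return top + bot

    A = [[1, 2, 0, 1], [0, 1, 1, 0], [2, 0, 1, 1], [1, 1, 0, 2]]
    B = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
    J1, J2 = [0, 1], [2, 3]
    blocks = lambda M: {(a, g): block(M, [J1, J2][a], [J1, J2][g])
                        for a in (0, 1) for g in (0, 1)}
    As, Bs = blocks(A), blocks(B)
    C = {(a, b): mat_add(mat_mul(As[a, 0], Bs[0, b]), mat_mul(As[a, 1], Bs[1, b]))
         for a in (0, 1) for b in (0, 1)}
    assert glue(C[0, 0], C[0, 1], C[1, 0], C[1, 1]) == mat_mul(A, B)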

8 Linear Systems

8.1 General Setting


Let F be a field. A linear equation over F has the form

        ∑_{k=1}^{n} ak xk = b,

where n ∈ N; b ∈ F and the a1 , . . . , an ∈ F are given; and one is interested in determin-
ing the “unknowns” x1 , . . . , xn ∈ F such that the equation holds. The equation is called
linear, as b is desired to be a linear combination of the a1 , . . . , an . A linear system is now
a finite set of linear equations that the x1 , . . . , xn ∈ F need to satisfy simultaneously:

        ∀ j ∈ {1, . . . , m} :   ∑_{k=1}^{n} ajk xk = bj ,                  (8.1)

where m, n ∈ N; b1 , . . . , bm ∈ F and the aji ∈ F , j ∈ {1, . . . , m}, i ∈ {1, . . . , n},
are given. One observes that (8.1) can be concisely written as Ax = b, using matrix
multiplication, giving rise to the following definition:

Definition 8.1. Let F be a field. Given a matrix A ∈ M(m, n, F ), m, n ∈ N, and
b ∈ M(m, 1, F ) ≅ F m , the equation

        Ax = b                                                             (8.2)

is called a linear system for the unknown x ∈ M(n, 1, F ) ≅ F n . The matrix one obtains
by adding b as the (n + 1)th column to A is called the augmented matrix of the linear
system. It is denoted by (A|b). The linear system (8.2) is called homogeneous for b = 0
and inhomogeneous for b ≠ 0. By

        L(A|b) := {x ∈ F n : Ax = b},                                      (8.3)

we denote the set of solutions to (8.2).

Example 8.2. While linear systems arise from many applications, we merely provide
a few examples, illustrating the importance of linear systems for problems from inside
the subject of linear algebra.

(a) Let v1 := (1, 2, 0, 3), v2 := (0, 3, 2, 1), v3 := (1, 1, 1, 1). The question if the set
    M := {v1 , v2 , v3 } is a linearly dependent subset of R4 is equivalent to asking if there
    exist x1 , x2 , x3 ∈ R, not all 0, such that x1 v1 + x2 v2 + x3 v3 = 0, which is equivalent
    to the question if the linear system

        x1 + x3 = 0
        2x1 + 3x2 + x3 = 0
        2x2 + x3 = 0
        3x1 + x2 + x3 = 0

    has a solution x = (x1 , x2 , x3 ) ≠ (0, 0, 0). The linear system can be written as

        ⎛ 1 0 1 ⎞ ⎛ x1 ⎞   ⎛ 0 ⎞
        ⎜ 2 3 1 ⎟ ⎜ x2 ⎟   ⎜ 0 ⎟
        ⎜ 0 2 1 ⎟ ⎝ x3 ⎠ = ⎜ 0 ⎟ .
        ⎝ 3 1 1 ⎠          ⎝ 0 ⎠

    The augmented matrix of the linear system is

                 ⎛ 1 0 1 | 0 ⎞
        (A|b) := ⎜ 2 3 1 | 0 ⎟
                 ⎜ 0 2 1 | 0 ⎟ .
                 ⎝ 3 1 1 | 0 ⎠

(b) Let b1 := (1, 2, 2, 1). The question if b1 ∈ ⟨{v1 , v2 , v3 }⟩ with v1 , v2 , v3 as in (a) is
    equivalent to the question whether the linear system

             ⎛ 1 ⎞          ⎛ 1 0 1 ⎞
        Ax = ⎜ 2 ⎟ ,   A := ⎜ 2 3 1 ⎟
             ⎜ 2 ⎟          ⎜ 0 2 1 ⎟
             ⎝ 1 ⎠          ⎝ 3 1 1 ⎠

    has a solution.

(c) Let n ∈ N. The problem of finding an inverse to an n × n matrix A ∈ M(n, n, F )


is equivalent to solving the n linear systems

Av1 = e1 , . . . , Avn = en , (8.4)

where e1 , . . . , en are the standard (column) basis vectors. Then the vk are obviously
the column vectors of A−1 .
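A sketch of (c) in code, solving the n systems (8.4) with numpy (numpy.linalg.solve
is a standard library routine; the concrete matrix is merely an illustration):

    # Inverting A by solving A v_k = e_k for k = 1, ..., n (cf. (8.4)).
    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 1.0]])
    n = A.shape[0]
    cols = [np.linalg.solve(A, e) for e in np.eye(n)]  # v_1, ..., v_n
    A_inv = np.column_stack(cols)                      # columns of A^{-1}
    print(A_inv)                                       # [[ 1. -1.], [-1.  2.]]
    assert np.allclose(A @ A_inv, np.eye(n))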

8.2 Abstract Solution Theory


Before describing in Sec. 8.3 below, how one can systematically determine the set of
solutions to a linear system, we apply some of our general results from the theory
of vector spaces, linear maps, and matrices to obtain criteria for the existence and
uniqueness of solutions to linear systems.

Remark 8.3. Let F be a field and m, n ∈ N. If A ∈ M(m, n, F ), then A can be


interpreted as the linear map

LA : F n −→ F m , LA (x) := Ax, (8.5)

where x ∈ F n is identified with a column vector in M(n, 1, F ) and the column vector
Ax ∈ M(m, 1, F ) is identified with an element of F m . In view of this fact, given an
n-dimensional vector space V over F , an m-dimensional vector space W over F , a linear
map L ∈ L(V, W ), and b ∈ W , we call

Lx = b (8.6)

a linear system or merely a linear equation for the unknown vector x ∈ V and we write
L(L|b) for its set of solutions. If V = F n , W = F m , and the matrix A ∈ M(m, n, F )
corresponds to L with respect to the respective standard bases of F n and F m (cf. Ex.
5.16(c)), then (8.6) is identical to (8.2) and

        L(A|b) = L(L|b) = L⁻¹ ({b}),                                       (8.7)

i.e. the set of solutions is precisely the preimage of the set {b} under L. In consequence,
we then have

        L(A|0) = L(L|0) = ker L,                                           (8.8)

        dim ker L = dim F n − dim Im L = n − rk(A)    (by Th. 6.8(a)),     (8.9)

and, more generally,

        b ∉ Im L  ⇒  L(A|b) = L(L|b) = ∅,                                  (8.10a)
        b ∈ Im L  ⇒  ∀ x0 ∈ L(L|b) :  L(A|b) = L(L|b) = x0 + ker L = x0 + L(L|0)
                     (by Th. 4.20(f)).                                     (8.10b)

One can also express (8.10b) by saying one obtains all solutions to the inhomogeneous
system Lx = b by adding a particular solution of the inhomogeneous system to the set
of all solutions to the homogeneous system Lx = 0.

Theorem 8.4 (Existence of Solutions). Let F be a field and m, n ∈ N, let L ∈


L(F n , F m ), and let A ∈ M(m, n, F ) correspond to L with respect to the respective
standard bases of F n and F m .

(a) Given b ∈ F m , the following statements are equivalent:

    (i) L(A|b) = L(L|b) ≠ ∅.

    (ii) b ∈ Im L.

    (iii) b ∈ ⟨{c^A_1 , . . . , c^A_n }⟩, where the c^A_i denote the column vectors of A.

    (iv) rk(A) = rk(A|b).
(iv) rk(A) = rk(A|b).

(b) The following statements are equivalent:

    (i) L(A|b) = L(L|b) ≠ ∅ holds for each b ∈ F m .


(ii) L is surjective.
(iii) rk(A) = m.

Proof. (a): (i) implies (ii) by (8.7). Since A corresponds to L with respect to the
respective standard bases of F n and F m , (ii) implies (iii) by Lem. 7.5(a). (iii) implies
(iv), since

        rk(A|b) = dim⟨{c^A_1 , . . . , c^A_n , b}⟩ = dim⟨{c^A_1 , . . . , c^A_n }⟩ = rk(A),

using (iii) for the second equality. (iv) implies (i): If rk(A) = rk(A|b), then b is linearly
dependent on c^A_1 , . . . , c^A_n , which, by Lem. 7.5(a), yields the existence of x ∈ F n with
Ax = ∑_{k=1}^{n} xk c^A_k = b.
(b): The equivalence between (i) and (ii) is merely the definition of L being surjective.
Moreover, the equivalences

rk(A) = m ⇔ dim Im L = m ⇔ Im L = F m ⇔ L surjective

prove the equivalence between (ii) and (iii). 


Theorem 8.5 (Uniqueness of Solutions). Consider the situation of Th. 8.4. Given
b ∈ Im L, the following statements are equivalent:

(i) #L(A|b) = #L(L|b) = 1, i.e. Lx = b has a unique solution.

(ii) L(A|0) = L(L|0) = {0}, i.e. the homogeneous system Lx = 0 has only the so-called
trivial solution x = 0.

(iii) rk(A) = n.

Proof. Since b ∈ Im L, there exists x0 ∈ F n with L(x0 ) = b, i.e. x0 ∈ L(L|b).


“(i)⇔(ii)”: Since L(L|b) = x0 + ker L by (8.10b), (i) is equivalent to L(A|0) = ker L =
{0}.
“(ii)⇔(iii)”: According to Th. 6.8(a), we know
rk(A) = dim Im L = dim F n − dim ker L = n − dim ker L.
Thus, L(A|0) = ker L = {0} is equivalent to rk(A) = n. 
Corollary 8.6. Consider the situation of Th. 8.4 with m = n. Then the following
statements are equivalent:

(i) There exists b ∈ F n such that Lx = b has a unique solution.


(ii) L(A|b) = L(L|b) ≠ ∅ holds for each b ∈ F n .
(iii) The homogeneous system Lx = 0 has only the so-called trivial solution x = 0.
(iv) rk(A) = n.
(v) A and L are invertible.

Proof. “(iii)⇔(iv)”: The equivalence is obtained by using b := 0 in Th. 8.5.


“(iii)⇔(v)”: Since m = n, L is bijective if, and only if, it is injective by Th. 6.10.
“(ii)⇔(v)”: Since m = n, L is bijective if, and only if, it is surjective by Th. 6.10.
“(i)⇔(iii)” is another consequence of Th. 8.5. 

8.3 Finding Solutions


While the results of Sec. 8.2 provide some valuable information regarding the solutions
of linear systems, in general, they do not help much to solve concrete systems, especially,
if the systems consist of many equations with many variables. To remedy this situation,
in the present section, we will investigate methods to systematically solve linear systems.

8.3.1 Echelon Form, Back Substitution

We recall from (8.1) and Def. 8.1 that, given a field F and m, n ∈ N, we are considering
linear systems

        ∀ j ∈ {1, . . . , m} :   ∑_{k=1}^{n} ajk xk = bj ,

which we can write in the form

        Ax = b

with A = (aji ) ∈ M(m, n, F ) and b ∈ M(m, 1, F ) ≅ F m , where x and b are interpreted
as column vectors, and (A|b) ∈ M(m, n + 1, F ), where b is added as a last column to
A, is called the augmented matrix of the linear system. The goal is to determine the
set of solutions L(A|b) = {x ∈ F n : Ax = b}. We first investigate a situation, where
determining L(A|b) is particularly easy, namely where (A|b) has so-called echelon form:
Definition 8.7. Let S be a set with 0 ∈ S and A = (aji ) ∈ M(m, n, S), m, n ∈ N. For
each row, i.e. for each j ∈ {1, . . . , m}, let ν(j) ∈ {1, . . . , n} be the smallest index k such
that ajk ≠ 0 and ν(j) := n + 1 if the jth row consists entirely of zeros. Then A is said to
be in (row) echelon form if, and only if, for each j ∈ {2, . . . , m}, one has ν(j) > ν(j − 1)
or ν(j) = n + 1. Thus, A is in echelon form if, and only if, it looks as follows:

            ⎛ 0 . . . 0 □ ∗ ∗ ∗ ∗ . . .           ∗ ∗ ∗ ∗ ⎞
        A = ⎜ 0 . . . 0 0 . . . 0 □ ∗ ∗ ∗ . . .       ∗ ∗ ⎟ .               (8.11)
            ⎜ 0 . . . 0 0 . . . 0 0 . . . 0 □ ∗ . . .   ∗ ⎟
            ⎝   ⋮                                       ⋮ ⎠

The first nonzero elements in each row are called pivot elements (in (8.11), the positions
of the pivot elements are marked by squares □). The columns c^A_k containing a pivot
element are called pivot columns and k ∈ {1, . . . , n} is then called a pivot index; an index
k ∈ {1, . . . , n} that is not a pivot index is called a free index. Let I^A_p , I^A_f ⊆ {1, . . . , n}
denote the sets of pivot indices and free indices, respectively. If A represents a linear
system, then the variables corresponding to pivot columns are called pivot variables. All
remaining variables are called free variables.
Example 8.8. The following matrix over R is in echelon form:

      ( 0 0 3 3 0 −1  0 3 )
 A := ( 0 0 0 4 0  2 −3 2 ) .
      ( 0 0 0 0 0  0  1 1 )

It remains in echelon form if one adds zero rows at the bottom. For the linear system Ax = b, the variables x_3, x_4, x_7 are pivot variables, whereas x_1, x_2, x_5, x_6, x_8 are free variables.
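As an aside (not part of the original notes), the computation of the ν(j) and of I_p^A, I_f^A from Def. 8.7 is easily mechanized. The following Python sketch (all function names are ad hoc) checks the echelon property and reproduces the pivot and free indices of the matrix above:

import sys

def nu(row):
    """1-based index of the first nonzero entry; len(row)+1 for a zero row (Def. 8.7)."""
    for k, entry in enumerate(row, start=1):
        if entry != 0:
            return k
    return len(row) + 1

def pivot_and_free_indices(A):
    """Return (I_p, I_f) for a matrix A given as a list of rows; A must be in echelon form."""
    n = len(A[0])
    nus = [nu(r) for r in A]
    for j in range(1, len(A)):
        if not (nus[j] > nus[j - 1] or nus[j] == n + 1):
            sys.exit("matrix is not in echelon form")
    I_p = {k for k in nus if k <= n}
    return I_p, set(range(1, n + 1)) - I_p

# The matrix of Ex. 8.8:
A = [[0, 0, 3, 3, 0, -1, 0, 3],
     [0, 0, 0, 4, 0, 2, -3, 2],
     [0, 0, 0, 0, 0, 0, 1, 1]]
print(pivot_and_free_indices(A))   # ({3, 4, 7}, {1, 2, 5, 6, 8})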
Theorem 8.9. Let F be a field and m, n ∈ N. Let A ∈ M(m, n, F), b ∈ M(m, 1, F) ≅ F^m, and consider the linear system Ax = b. Assume the augmented matrix (A|b) to be in echelon form.

(a) Then the following statements are equivalent:



(i) Ax = b has at least one solution, i.e. L(A|b) 6= ∅.


(ii) rk(A) = rk(A|b).
(iii) The final column b in the augmented matrix (A|b) contains no pivot elements (i.e. if there is a zero row in A, then the corresponding entry of b also vanishes).

(b) Let L_A : F^n −→ F^m be the linear map associated with A according to Rem. 8.3. Noting A to be in echelon form as well and using the notation from Def. 8.7, one has

 dim ker L_A = dim L(A|0) = #I_f^A ,   dim Im L_A = rk A = #I_p^A ,

i.e. the dimension of the kernel of A is given by the number of free variables, whereas the dimension of the image of A is given by the number of pivot variables.

Proof. (a): The equivalence between (i) and (ii) was already shown in Th. 8.4.
“(ii)⇔(iii)”: As both A and (A|b) are in echelon form, their respective ranks are, clearly,
given by their respective number of nonzero rows. However, A and (A|b) have the same
number of nonzero rows if, and only if, b contains no pivot elements.
(b): As A is in echelon form, its number of nonzero rows (i.e. its rank) is precisely the number of pivot indices, proving dim Im L_A = rk A = #I_p^A. Since #I_p^A + #I_f^A = n = dim ker L_A + dim Im L_A, the claimed dim ker L_A = #I_f^A now also follows. □

If the augmented matrix (A|b) of the linear system Ax = b is in echelon form and such
that its final column b contains no pivot element, then the linear system can be solved,
using so-called back substitution: One obtains a parameterized representation of L(A|b)
as follows: Starting at the bottom with the first nonzero row, one solves each row for the
corresponding pivot variable, in each step substituting the expressions for pivot variables
that were obtained in previous steps. The free variables are treated as parameters. A
particular element of L(A|b) can be obtained by setting all free variables to 0. We now
provide a precise formulation of this algorithm:

Algorithm 8.10 (Back Substitution). Let F be a field and m, n ∈ N. Let 0 ≠ A = (a_{ji}) ∈ M(m, n, F) and b ∈ M(m, 1, F) ≅ F^m and consider the linear system Ax = b. Assume the augmented matrix (A|b) to be in echelon form and assume its final column b to contain no pivot element. As before, let I_p^A, I_f^A ⊆ {1, . . . , n} denote the sets of pivot indices and free indices, respectively. Also note I_p^A ≠ ∅, whereas I_f^A might be empty. Let i_1 < · · · < i_N be an enumeration of the elements of I_p^A, where N ∈ {1, . . . , n}. As A has echelon form, for each k ∈ {1, . . . , N}, a_{k,i_k} is the pivot element occurring in pivot column c_{i_k}^A (i.e. the pivot element is in the k-th row and this also implies N ≤ m).

The algorithm of back substitution defines a family (α_{kl})_{(k,l)∈J×I} in F, J := I_p^A, I := {0} ∪ I_f^A, such that

 ∀ k ∈ {1, . . . , N}, 0 < l < i_k :   α_{i_k,l} = 0,     (8.12a)

 ∀ k ∈ {1, . . . , N} :   ( x_{i_k} = α_{i_k,0} + Σ_{l∈I: l>i_k} α_{i_k,l} x_l )  ∧  ( α_{i_k,0} = a_{k,i_k}^{-1} b_k ),     (8.12b)

where, for l > i_k, the α_{i_k,l} are defined recursively over k = N, . . . , 1 as follows: Assuming the α_{i_j,l} have already been defined for each j > k such that (8.12b) holds, solve the k-th equation of Ax = b for x_{i_k}:

 Σ_{i=i_k}^{n} a_{k,i} x_i = b_k   ⇒   x_{i_k} = a_{k,i_k}^{-1} b_k − a_{k,i_k}^{-1} Σ_{i=i_k+1}^{n} a_{k,i} x_i .     (8.13)

All variables x_i in the last equation have indices i > i_k and, inductively, we use (8.12b) for i_j > i_k to replace all pivot variables in this equation for x_{i_k}. The resulting expression for x_{i_k} has the form (8.12b) with α_{i_k,l} ∈ F, as desired.
From (α_{kl}), we now define column vectors s = (s_1, . . . , s_n)^t and, for each i ∈ I_f^A, v_i = (v_{1i}, . . . , v_{ni})^t, where

 ∀ j ∈ {1, . . . , n}:   s_j := α_{i_k,0} for j = i_k ∈ I_p^A,   s_j := 0 for j ∈ I_f^A,

 ∀ (j, i) ∈ {1, . . . , n} × I_f^A:   v_{ji} := α_{i_k,i} for j = i_k ∈ I_p^A,   v_{ji} := 1 for j = i,   v_{ji} := 0 for j ∈ I_f^A \ {i}.
Theorem 8.11. Consider the situation of Alg. 8.10 and let L_A : F^n −→ F^m be the linear map associated with A according to Rem. 8.3.

(a) Let x = (x_1, . . . , x_n)^t ∈ F^n be a column vector. Then x ∈ L(A|b) if, and only if, the x_{i_k}, k ∈ {1, . . . , N}, satisfy the recursion over k = N, . . . , 1, given by (8.13) or, equivalently, by (8.12b). In particular, x ∈ ker L_A = L(A|0) if, and only if, the x_{i_k} satisfy the respective recursion with b_1 = · · · = b_N = 0.

(b) The set

 B := {v_i : i ∈ I_f^A}     (8.14)

forms a basis of ker L_A = L(A|0) and

 L(A|b) = s + ker L_A .     (8.15)

Proof. (a) holds, as the implication in (8.13) is, clearly, an equivalence.

(b): As we already know dim ker L_A = #I_f^A from Th. 8.9(b), to verify B is a basis of ker L_A, it suffices to show B is linearly independent and B ⊆ ker L_A. If I_f^A = ∅, then there is nothing to prove. Otherwise, if (λ_i)_{i∈I_f^A} is a family in F such that

 0 = Σ_{i∈I_f^A} λ_i v_i ,

then

 ∀ i ∈ I_f^A :   0 = Σ_{j∈I_f^A} λ_j v_{ji} = Σ_{j∈I_f^A} λ_j δ_{ji} = λ_i ,

showing the linear independence of B. To show B ⊆ ker L_A, fix i ∈ I_f^A. To prove v_i ∈ ker L_A, according to (a), we need to show

 ∀ k ∈ {1, . . . , N} :   v_{i_k,i} = Σ_{l∈I_f^A: l>i_k} α_{i_k,l} v_{li} .

Indeed, we have

 v_{i_k,i} = α_{i_k,i} = Σ_{l∈I_f^A: l>i_k} α_{i_k,l} v_{li} = Σ_{l∈I_f^A: l>i_k} α_{i_k,l} δ_{li} ,

which holds, since, if I_k := {l ∈ I_f^A : l > i_k} = ∅, then the sum is empty and, thus, 0, with v_{i_k,i} = 0 as well, due to i < i_k, and, if I_k ≠ ∅, then, due to the Kronecker δ, the sum evaluates to α_{i_k,i} as well. To prove (8.15), according to (8.10b), it suffices to show s ∈ L(A|b) and, by (a), this is equivalent to

 ∀ k ∈ {1, . . . , N} :   s_{i_k} = α_{i_k,0} + Σ_{l∈I_f^A: l>i_k} α_{i_k,l} s_l .

However, the validity of these equations is immediate from the definition of s, which, in particular, implies the sums to vanish (s_l = 0 for each l ∈ I_f^A). □
Example 8.12. Consider the linear system Ax = b, where

      ( 1 2 −1 3  0 1 )
 A := ( 0 0  1 1 −1 1 ) ∈ M(4, 6, R),   b := (1, 2, 3, 0)^t ∈ R^4.
      ( 0 0  0 0  1 1 )
      ( 0 0  0 0  0 0 )

Since (A|b) is in echelon form and b does not have any pivot elements, the set of solutions L(A|b) is nonempty, and it can be obtained using back substitution according to Alg. 8.10: Using the same notation as in Alg. 8.10, we have

 m = 4,  n = 6,  I_p^A = {1, 3, 5},  N = 3,  I_f^A = {2, 4, 6}.

In particular, we have I_p^A = {i_1, i_2, i_3} with i_1 = 1, i_2 = 3, i_3 = 5, and the recursion for the x_{i_k}, α_{i_k,l} is

 x_5 = 3 − x_6,
 x_3 = 2 − x_6 + x_5 − x_4 = 5 − 2x_6 − x_4,
 x_1 = 1 − x_6 − 3x_4 + x_3 − 2x_2 = 6 − 3x_6 − 4x_4 − 2x_2.

Thus, the α_{i_k,l} are given by

 α_{50} = 3,  α_{52} = 0,  α_{54} = 0,  α_{56} = −1,
 α_{30} = 5,  α_{32} = 0,  α_{34} = −1,  α_{36} = −2,
 α_{10} = 6,  α_{12} = −2,  α_{14} = −4,  α_{16} = −3,

and, from Th. 8.11(b), we obtain

 L(A|b) = s + ker L_A
        = { (6, 0, 5, 0, 3, 0)^t + x_2 (−2, 1, 0, 0, 0, 0)^t + x_4 (−4, 0, −1, 1, 0, 0)^t + x_6 (−3, 0, −2, 0, −1, 1)^t : x_2, x_4, x_6 ∈ R }.
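The back substitution of Alg. 8.10 is straightforward to mechanize. The following Python sketch (an illustration with ad hoc names, not the notes' formal algorithm; exact rational arithmetic via the standard fractions module stands in for a general field F) computes, from an echelon-form (A|b) and prescribed values for the free variables, the corresponding solution; applied to Ex. 8.12, it reproduces s and the kernel basis vectors v_2, v_4, v_6:

from fractions import Fraction

def back_substitute(A, b, free_values):
    """Solve Ax = b for A in echelon form (no pivot in b), with the free
    variables fixed to the 0-based-indexed values in free_values."""
    m, n = len(A), len(A[0])
    nus = [next((k for k in range(n) if A[j][k] != 0), n) for j in range(m)]
    pivots = [(j, nus[j]) for j in range(m) if nus[j] < n]
    x = [Fraction(0)] * n
    for k, val in free_values.items():
        x[k] = Fraction(val)
    for j, p in reversed(pivots):            # bottom-most pivot row first
        rest = sum(A[j][i] * x[i] for i in range(p + 1, n))
        x[p] = (Fraction(b[j]) - rest) / A[j][p]
    return x

A = [[1, 2, -1, 3, 0, 1],
     [0, 0, 1, 1, -1, 1],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 0, 0, 0]]
b = [1, 2, 3, 0]
free = [1, 3, 5]                             # x_2, x_4, x_6 in 1-based notation
s = back_substitute(A, b, {k: 0 for k in free})
print([str(v) for v in s])                   # ['6', '0', '5', '0', '3', '0']
for k in free:                               # the kernel basis vectors of Ex. 8.12
    v = back_substitute(A, [0, 0, 0, 0], {i: (1 if i == k else 0) for i in free})
    print([str(e) for e in v])               # v_2, v_4, v_6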

8.3.2 Elementary Row Operations, Variable Substitution

As seen in the previous section, if the augmented matrix (A|b) of the linear system
Ax = b is in echelon form, then the set of solutions can be obtained via back substitution.
Thus, it is desirable to transform the augmented matrix of a linear system into echelon
form without changing the set of solutions. This can be accomplished in finitely many
steps by the so-called Gaussian elimination algorithm of Alg. 8.17 below (cf. Th. 8.19),
using elementary row operations and variable substitutions, where variable substitutions
actually turn out to constitute particular elementary row operations.

Definition 8.13. Let F be a field, m, n ∈ N, and A ∈ M(m, n, F). Let r_1, . . . , r_m denote the rows of A. The following three operations, which transform A into another m × n matrix A_er over F, are known as elementary row operations:

(a) Row Switching: Switching two rows ri and rj , where i, j ∈ {1, . . . , m}.

(b) Row Multiplication: Replacing a row ri by some nonzero multiple λri of that row,
i ∈ {1, . . . , m}, λ ∈ F \ {0}.

(c) Row Addition: Replacing a row r_i by the sum r_i + λr_j of that row and a multiple of another row, (i, j) ∈ {1, . . . , m}^2, i ≠ j, λ ∈ F.
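For illustration (not part of the original notes), the three operations of Def. 8.13 can be sketched in Python as in-place manipulations of a matrix stored as a list of rows; names and representation are ad hoc:

def row_switch(A, i, j):
    """Def. 8.13(a): switch rows r_i and r_j."""
    A[i], A[j] = A[j], A[i]

def row_multiply(A, i, lam):
    """Def. 8.13(b): replace r_i by lam * r_i, lam a nonzero field element."""
    assert lam != 0
    A[i] = [lam * a for a in A[i]]

def row_add(A, i, j, lam):
    """Def. 8.13(c): replace r_i by r_i + lam * r_j, i != j."""
    assert i != j
    A[i] = [a + lam * b for a, b in zip(A[i], A[j])]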
Definition and Remark 8.14. Let F be a field. Consider a linear system

 ∀ j ∈ {1, . . . , m}:   Σ_{k=1}^{n} a_{jk} x_k = b_j ,

where m, n ∈ N; b_1, . . . , b_m ∈ F; and a_{ji} ∈ F, j ∈ {1, . . . , m}, i ∈ {1, . . . , n}. For a_{ji} ≠ 0, one has

 x_i = a_{ji}^{-1} b_j − a_{ji}^{-1} Σ_{k=1, k≠i}^{n} a_{jk} x_k ,     (8.16)

and a variable substitution means replacing the l-th equation Σ_{k=1}^{n} a_{lk} x_k = b_l of the system by the equation, where the variable x_i has been substituted using (8.16) with some j ≠ l, i.e. by the equation

 Σ_{k=1}^{i−1} a_{lk} x_k + a_{li} ( a_{ji}^{-1} b_j − a_{ji}^{-1} Σ_{k=1, k≠i}^{n} a_{jk} x_k ) + Σ_{k=i+1}^{n} a_{lk} x_k = b_l ,

which, after combining the coefficients of the same variables, reads

 Σ_{k=1}^{i−1} (a_{lk} − a_{li} a_{ji}^{-1} a_{jk}) x_k + 0 · x_i + Σ_{k=i+1}^{n} (a_{lk} − a_{li} a_{ji}^{-1} a_{jk}) x_k = b_l − a_{li} a_{ji}^{-1} b_j .     (8.17)

Comparing with Def. 8.13(c), we see that variable substitution is a particular case of row addition: One obtains (8.17) by replacing row r_l of (A|b) by the sum r_l − a_{li} a_{ji}^{-1} r_j .

Theorem 8.15. Let F be a field and m, n ∈ N. Let A = (a_{ji}) ∈ M(m, n, F) and b ∈ M(m, 1, F) ≅ F^m and consider the linear system Ax = b. Applying elementary row operations (and, in particular, variable substitutions as in Def. and Rem. 8.14) to the augmented matrix (A|b) of the linear system Ax = b does not change the system's set of solutions, i.e. if (A_er|b_er) is the new matrix obtained from applying an elementary row operation according to Def. 8.13 to (A|b), then L(A|b) = L(A_er|b_er).

Proof. For each of the three elementary row operations, it is immediate from the arithmetic laws holding in the field F that, for each x ∈ F^n, one has x ∈ L(A|b) if, and only if, x ∈ L(A_er|b_er) (where the inverse operation of row switching is merely switching rows r_i and r_j again, the inverse operation of row multiplication by λ ∈ F \ {0} is row multiplication by λ^{-1}, and the inverse operation of row addition r_i ↦ r_i + λr_j, i ≠ j, is r_i ↦ r_i − λr_j). □

Corollary 8.16. Let F be a field, A = (a_{ji}) ∈ M(m, n, F), m, n ∈ N. Applying elementary row operations to A does not change the rank of A, i.e. if A_er is the new matrix obtained from applying an elementary row operation according to Def. 8.13 to A, then rk A = rk A_er.

Proof. Let L_A : F^n −→ F^m and L_{A_er} : F^n −→ F^m be the linear maps associated with A and A_er, respectively, according to Rem. 8.3. According to Th. 8.15, we have ker L_A = L(A|0) = L(A_er|0) = ker L_{A_er}. Thus, according to (8.9), we have rk A = n − dim ker L_A = n − dim ker L_{A_er} = rk A_er. □

8.3.3 Gaussian Elimination

Algorithm 8.17 (Gaussian Elimination). Let F be a field and m, n ∈ N. Let A = (a_{ji}) ∈ M(m, n, F). The Gaussian elimination algorithm is the following procedure that, starting with A, recursively applies elementary row operations of Def. 8.13:

Let A^{(1)} := A, r(1) := 1. For each k ≥ 1, as long as r(k) < m and k ≤ n, the Gaussian elimination algorithm transforms A^{(k)} = (a^{(k)}_{ji}) ∈ M(m, n, F) into A^{(k+1)} = (a^{(k+1)}_{ji}) ∈ M(m, n, F) and r(k) ∈ N into r(k+1) ∈ N by performing precisely one of the following actions:

(a) If a^{(k)}_{r(k),k} ≠ 0, then, for each i ∈ {r(k)+1, . . . , m}, replace the ith row by the ith row plus −a^{(k)}_{ik}/a^{(k)}_{r(k),k} times the r(k)th row⁶. Set r(k+1) := r(k) + 1.

(b) If a^{(k)}_{r(k),k} = 0, and there exists i ∈ {r(k)+1, . . . , m} such that a^{(k)}_{ik} ≠ 0, then one chooses such an i ∈ {r(k)+1, . . . , m} and switches the ith with the r(k)th row. One then proceeds as in (a).

(c) If a^{(k)}_{ik} = 0 for each i ∈ {r(k), . . . , m}, then set A^{(k+1)} := A^{(k)} and r(k+1) := r(k).

⁶ Comparing with Def. and Rem. 8.14, we observe this to be a variable substitution (in the case where the matrix A represents the augmented matrix of a linear system), substituting the variable x_k in the ith row, using the r(k)th row to replace x_k.
Remark 8.18. Note that the Gaussian elimination algorithm of Alg. 8.17 stops after at most n steps. Moreover, in its kth step, it can only manipulate elements that have row number at least r(k) and column number at least k (the claim with respect to the column numbers follows since a^{(k)}_{ij} = 0 for each (i, j) ∈ {r(k), . . . , m} × {1, . . . , k−1}, cf. the proof of the following Th. 8.19).
Theorem 8.19. Let F be a field and m, n ∈ N. Let A = (a_{ji}) ∈ M(m, n, F). The Gaussian elimination algorithm of Alg. 8.17 yields a matrix Ã ∈ M(m, n, F) in echelon form. More precisely, if r(k) = m or k = n + 1, then Ã := A^{(k)} is in echelon form. Moreover, rk Ã = rk A and, if A = (B|b) and Ã = (B̃|b̃) represent the augmented matrices of the linear systems Bx = b and B̃x = b̃, respectively, then L(B|b) = L(B̃|b̃).

Proof. Let N ∈ {1, . . . , n+1} be the maximal k occurring during the Gaussian elimination algorithm, i.e. Ã := A^{(N)}. To prove that Ã is in echelon form, we show by induction on k ∈ {1, . . . , N} that the first k−1 columns of A^{(k)} are in echelon form as well as the first r(k) rows of A^{(k)}, with a^{(k)}_{ij} = 0 for each (i, j) ∈ {r(k), . . . , m} × {1, . . . , k−1}. For k = 1, the assertion is trivially true. By induction, we assume the assertion for k < N and prove it for k+1 (for k = 1, we do not assume anything). As already stated in Rem. 8.18 above, the kth step of Alg. 8.17 does not change the first r(k)−1 rows of A^{(k)} and it does not change the first k−1 columns of A^{(k)}, since (by induction hypothesis) a^{(k)}_{ij} = 0 for each (i, j) ∈ {r(k), . . . , m} × {1, . . . , k−1}. Moreover, if we are in the case of Alg. 8.17(a), then

 ∀ i ∈ {r(k)+1, . . . , m} :   a^{(k+1)}_{ik} = a^{(k)}_{ik} − ( a^{(k)}_{ik} / a^{(k)}_{r(k),k} ) a^{(k)}_{r(k),k} = 0.

In each case, after the application of (a), (b), or (c) of Alg. 8.17, a^{(k+1)}_{ij} = 0 for each (i, j) ∈ {r(k+1), . . . , m} × {1, . . . , k}. We have to show that the first r(k+1) rows of A^{(k+1)} are in echelon form. For Alg. 8.17(c), it is r(k+1) = r(k) and there is nothing to prove. For Alg. 8.17(a),(b), we know a^{(k+1)}_{r(k+1),j} = 0 for each j ∈ {1, . . . , k}, while a^{(k+1)}_{r(k),k} ≠ 0, showing that the first r(k+1) rows of A^{(k+1)} are in echelon form in each case. Thus, we know that the first r(k+1) rows of A^{(k+1)} are in echelon form, and all elements in the first k columns of A^{(k+1)} below row r(k+1) are zero, showing that the first k columns of A^{(k+1)} are also in echelon form. As, at the end of the Gaussian elimination algorithm, r(k) = m or k = n+1, we have shown Ã to be in echelon form.

That rk Ã = rk A is an immediate consequence of Cor. 8.16 combined with a simple induction. If A = (B|b) and Ã = (B̃|b̃) represent the augmented matrices of the linear systems Bx = b and B̃x = b̃, respectively, then L(B|b) = L(B̃|b̃) is an immediate consequence of Th. 8.15 combined with a simple induction. □

Remark 8.20. To avoid mistakes, especially when performing the Gaussian elimination algorithm manually, it is advisable to check the row sums after each step. It obviously suffices to consider how row sums are changed if (a) of Alg. 8.17 is carried out. Moreover, let i ∈ {r(k)+1, . . . , m}, as only rows with row numbers i ∈ {r(k)+1, . . . , m} can be changed by (a) in the kth step. If s^{(k)}_i = Σ_{j=1}^{n} a^{(k)}_{ij} is the sum of the ith row before (a) has been carried out in the kth step, then

 s^{(k+1)}_i = Σ_{j=1}^{n} a^{(k+1)}_{ij} = Σ_{j=1}^{n} ( a^{(k)}_{ij} − ( a^{(k)}_{ik} / a^{(k)}_{r(k),k} ) a^{(k)}_{r(k),j} ) = s^{(k)}_i − ( a^{(k)}_{ik} / a^{(k)}_{r(k),k} ) s^{(k)}_{r(k)}     (8.18)

must be the sum of the ith row after (a) has been carried out in the kth step.
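The following Python sketch (ad hoc names; exact arithmetic over Q via the fractions module as a stand-in for a general field) combines Alg. 8.17 with the row-sum check of Rem. 8.20; the assert encodes precisely (8.18):

from fractions import Fraction

def gaussian_eliminate(A):
    """Transform a copy of A (a list of rows) into echelon form via Alg. 8.17."""
    A = [[Fraction(a) for a in row] for row in A]
    m, n = len(A), len(A[0])
    r = 0                                  # r(k) of Alg. 8.17, shifted to 0-based
    for k in range(n):
        if r >= m - 1:                     # Alg. 8.17 runs while r(k) < m
            break
        if A[r][k] == 0:                   # case (b): look for a row to switch
            i = next((i for i in range(r + 1, m) if A[i][k] != 0), None)
            if i is None:
                continue                   # case (c): column already clear
            A[r], A[i] = A[i], A[r]
        for i in range(r + 1, m):          # case (a): eliminate below the pivot
            factor = A[i][k] / A[r][k]
            s_old, s_piv = sum(A[i]), sum(A[r])
            A[i] = [a - factor * p for a, p in zip(A[i], A[r])]
            assert sum(A[i]) == s_old - factor * s_piv   # row-sum check (8.18)
        r += 1
    return A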

Remark 8.21. Theorem 8.19 shows that the Gaussian elimination algorithm of Alg. 8.17 provides a reliable systematic way of solving linear systems (when combined with the back substitution Alg. 8.10). It has a few simple rules, which make it easy to implement as computer code. However, the following Ex. 8.22 illustrates that Alg. 8.17 is not always the most efficient way to obtain a solution and, especially when solving small linear systems manually, it often makes sense to deviate from it. In the presence of roundoff errors, there can also be good reasons to deviate from Alg. 8.17, which are typically addressed in classes on Numerical Analysis (see, e.g., [Phi21, Sec. 5]).

Example 8.22. Over the field R, consider the linear system

      ( 6  0  1 ) ( x_1 )   ( 4 )
 Ax = ( 0  2  0 ) ( x_2 ) = ( 6 ) =: b,     (8.19)
      ( 1 −1 −1 ) ( x_3 )   ( 0 )

which we can also write in the form

 6x_1        + x_3 = 4
       2x_2        = 6
  x_1 − x_2 − x_3 = 0.

One can quickly solve this linear system without making use of Alg. 8.17 and Alg. 8.10: The second equation yields x_2 = 3, which, plugged into the third equation, yields x_1 − x_3 = 3. Adding this to the first equation yields 7x_1 = 7, i.e. x_1 = 1. Thus, the first equation then provides x_3 = 4 − 6 = −2 and we obtain

 L(A|b) = { (1, 3, −2)^t }.     (8.20)

To illustrate Alg. 8.17 and Alg. 8.10, we solve (8.19) once again, first using Alg. 8.17 to transform the system to echelon form, then using Alg. 8.10 to obtain the solution: For each k ∈ {1, 2, 3}, we provide (A^{(k)}|b^{(k)}), r(k), and the row sums s^{(k)}_i, i ∈ {1, 2, 3}:

                              ( 6  0  1 | 4 )
 (A^{(1)}|b^{(1)}) := (A|b) = ( 0  2  0 | 6 ) ,   r(1) := 1,
                              ( 1 −1 −1 | 0 )

 s^{(1)}_1 = 11,  s^{(1)}_2 = 8,  s^{(1)}_3 = −1.

For k = 2, we apply Alg. 8.17(a), where we replace the third row of (A^{(1)}|b^{(1)}) by the third row plus (−1/6) times the first row to obtain

                      ( 6  0    1  |   4  )
 (A^{(2)}|b^{(2)}) := ( 0  2    0  |   6  ) ,   r(2) := r(1) + 1 = 2,
                      ( 0 −1  −7/6 | −2/3 )

 s^{(2)}_1 = 11,  s^{(2)}_2 = 8,  s^{(2)}_3 = s^{(1)}_3 − (1/6) s^{(1)}_1 = −1 − 11/6 = −17/6.

For k = 3, we apply Alg. 8.17(a) again, where we replace the third row of (A^{(2)}|b^{(2)}) by the third row plus 1/2 times the second row to obtain

                      ( 6  0    1  |  4  )
 (A^{(3)}|b^{(3)}) := ( 0  2    0  |  6  ) ,   r(3) := r(2) + 1 = 3,
                      ( 0  0  −7/6 | 7/3 )

 s^{(3)}_1 = 11,  s^{(3)}_2 = 8,  s^{(3)}_3 = s^{(2)}_3 + (1/2) s^{(2)}_2 = −17/6 + 4 = 7/6.

Since r(3) = 3, Alg. 8.17 is done and (Ã|b̃) := (A^{(3)}|b^{(3)}) is in echelon form. In particular, we see 3 = rk Ã = rk(Ã|b̃) = rk(A|b), i.e. Ax = b has a unique solution, which we now determine via the back substitution Alg. 8.10: We obtain

 x_3 = (7/3) · (−6/7) = −2,   x_2 = 6/2 = 3,   x_1 = (1/6)(4 − x_3) = 1,

recovering (8.20).
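Assuming the gaussian_eliminate sketch from after Rem. 8.20 above, the echelon-form computation of this example can be replayed mechanically by applying it to the augmented matrix of (8.19):

Ab = [[6, 0, 1, 4],
      [0, 2, 0, 6],
      [1, -1, -1, 0]]
for row in gaussian_eliminate(Ab):
    print([str(a) for a in row])
# ['6', '0', '1', '4']
# ['0', '2', '0', '6']
# ['0', '0', '-7/6', '7/3']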
Example 8.23. (a) In Ex. 8.2(b), we saw that, in R^4, the question if b ∈ ⟨v_1, v_2, v_3⟩ with b := (1, 2, 2, 1) and v_1 := (1, 2, 0, 3), v_2 := (0, 3, 2, 1), v_3 := (1, 1, 1, 1) is equivalent to determining if the linear system Ax = b with augmented matrix

          ( 1 0 1 | 1 )
 (A|b) := ( 2 3 1 | 2 )
          ( 0 2 1 | 2 )
          ( 3 1 1 | 1 )

has a solution. Let us answer this question by transforming the system to echelon form via elementary row operations: Replacing the 2nd row by the 2nd row plus (−2) times the 1st row, and replacing the 4th row by the 4th row plus (−3) times the 1st row yields

 ( 1 0  1 |  1 )
 ( 0 3 −1 |  0 )
 ( 0 2  1 |  2 )
 ( 0 1 −2 | −2 )

Switching rows 2 and 4 yields

 ( 1 0  1 |  1 )
 ( 0 1 −2 | −2 )
 ( 0 2  1 |  2 )
 ( 0 3 −1 |  0 )

Replacing the 3rd row by the 3rd row plus (−2) times the 2nd row, and replacing the 4th row by the 4th row plus (−3) times the 2nd row yields

 ( 1 0  1 |  1 )
 ( 0 1 −2 | −2 )
 ( 0 0  5 |  6 )
 ( 0 0  5 |  6 )

Replacing the 4th row by the 4th row plus (−1) times the 3rd row yields

           ( 1 0  1 |  1 )
 (Ã|b̃) := ( 0 1 −2 | −2 )
           ( 0 0  5 |  6 )
           ( 0 0  0 |  0 )

Thus, 3 = rk A = rk Ã = rk(Ã|b̃) = rk(A|b), showing Ax = b to have a unique solution and b ∈ ⟨v_1, v_2, v_3⟩.

(b) In Ex. 8.2(a), we saw that the question, if the set M := {v_1, v_2, v_3} with v_1, v_2, v_3 as in (a) is a linearly dependent subset of R^4, can be equivalently posed as the question, if the linear system Ax = 0, with the same matrix A as in (a), has a solution x = (x_1, x_2, x_3) ≠ (0, 0, 0). From (a), we know rk A = rk Ã = 3, implying L(A|0) = {0} and M is linearly independent.

8.4 LU Decomposition
8.4.1 Definition and Motivation

If one needs to solve linear systems Ax = b with the same matrix A, but varying b (as,
e.g., in the problem of finding an inverse to an n × n matrix A, cf. Ex. 8.2(c)), then it is
not efficient to separately transform each (A|b) into echelon form. A more efficient way
aims at decomposing A into simpler matrices L and U , i.e. A = LU , then facilitating
the simpler solution of Ax = LU x = b when varying b (i.e. without having to transform
(A|b) into echelon form for each new b), cf. Rem. 8.25 below.
Definition 8.24. Let S be a ring with unity and A ∈ M(m, n, S) with m, n ∈ N. A decomposition

 A = LÃ     (8.21a)

is called an LU decomposition of A if, and only if, L ∈ BL^1_m(S) (i.e. L is unipotent lower triangular) and Ã ∈ M(m, n, S) is in echelon form. If A is an m × m matrix, then Ã ∈ BU_m(S) (i.e. Ã is upper triangular) and (8.21a) is an LU decomposition in the strict sense, which is emphasized by writing U := Ã:

 A = LU.     (8.21b)

Remark 8.25. Let F be a field. Suppose the goal is to solve linear systems Ax = b with fixed matrix A ∈ M(m, n, F) and varying vector b ∈ F^m. Moreover, suppose one knows an LU decomposition A = LÃ. Then

 x ∈ L(A|b)  ⇔  Ax = LÃx = b  ⇔  Ãx = L^{-1}b  ⇔  x ∈ L(Ã|L^{-1}b).     (8.22)

Thus, solving Ax = LÃx = b is equivalent to first solving Lz = b for z and then solving Ãx = z = L^{-1}b for x. Since Ã is in echelon form, Ãx = z = L^{-1}b can be solved via the back substitution Alg. 8.10; and, since L is lower triangular, Lz = b can be solved via an analogous version of Alg. 8.10, called forward substitution. To obtain the entire set L(A|b), one uses L(A|0) = L(Ã|0) and L(A|b) = x_0 + L(A|0) for each x_0 ∈ L(A|b), where one finds x_0 as described above and L(Ã|0) via the back substitution Alg. 8.10. Even though the described strategy works fine in many situations, it can fail in the presence of roundoff errors (we will come back to this issue in Rem. 8.36 below).
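A minimal Python sketch of the forward substitution just mentioned (assuming a unipotent lower triangular L, as produced by the LU decompositions below; names are ad hoc):

from fractions import Fraction

def forward_substitute(L, b):
    """Solve L z = b top-down for unipotent lower triangular L (l_jj = 1)."""
    n = len(L)
    z = [Fraction(0)] * n
    for j in range(n):
        # since l_jj = 1, we have z_j = b_j - sum_{i<j} l_ji z_i
        z[j] = Fraction(b[j]) - sum(L[j][i] * z[i] for i in range(j))
    return z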
Example 8.26. Unfortunately, not every matrix has an LU decomposition: Consider, for example,

 A = (a_{ji}) := ( 0 1 ) ∈ M(2, 2, R).
                 ( 1 1 )

Clearly, rk A = 2, i.e. A is regular. Suppose A has an LU decomposition, i.e. A = LU with L = (l_{ji}) unipotent lower triangular and U = (u_{ji}) upper triangular. But then 0 = a_{11} = l_{11} u_{11} = u_{11}, showing the first column of U to be 0, i.e. U is singular, in contradiction to A being regular.

Even though the previous example shows not every matrix has an LU decomposition, we
will see in Th. 8.34 below that every matrix does have an LU decomposition, provided
one is first allowed to permute its rows in a suitable manner.

8.4.2 Elementary Row Operations Via Matrix Multiplication

The LU decomposition of Th. 8.34 below can be obtained, with virtually no extra
effort, by performing the Gaussian elimination Alg. 8.17. The proof will be founded
on the observation that each elementary row operation can be accomplished via left
multiplication with a suitable matrix: For row multiplication, one multiplies from the
left with a diagonal matrix, for row addition, one multiplies from the left with a so-
called Frobenius matrix or with the transpose of such a matrix (cf. Cor. 8.29 below);
for row switching, one multiplies from the left with a so-called permutation matrix (cf.
Cor. 8.33(a) below) – where, for Alg. 8.17 and the proof of Th. 8.34, we only need row
switching and row addition by means of a Frobenius matrix (not its transpose).

Definition 8.27. Let n ∈ N and let S be a ring with unity. We call a unipotent lower triangular matrix A = (a_{ji}) ∈ BL^1_n(S) a Frobenius matrix of index k, k ∈ {1, . . . , n}, if, and only if, a_{ji} = 0 for each i ∉ {j, k}, i.e. if, and only if, it has the following form:

     ( 1                                )
     (    ⋱                             )
     (       1                          )
 A = (       a_{k+1,k}  1               )     (8.23)
     (       a_{k+2,k}     1            )
     (           ⋮              ⋱       )
     (       a_{n,k}               1    )

We define

 Fro^k_n(S) := { A ∈ M(n, S) : A is Frobenius of index k }.

Proposition 8.28. Let S be a ring with unity, n ∈ N.

(a) If m ∈ N, A = (a_{ji}) ∈ M(m, n, S) and B = (b_{ji}) ∈ Fro^k_m(S), k ∈ {1, . . . , m}, i.e. B is the Frobenius matrix of index k with entries b_{k+1,k}, . . . , b_{m,k} below the diagonal in column k, and C = (c_{ji}) = BA ∈ M(m, n, S), then

 c_{ji} = Σ_{α=1}^{m} b_{jα} a_{αi} = a_{ji}                   for each j ∈ {1, . . . , k},     (8.24)
 c_{ji} = Σ_{α=1}^{m} b_{jα} a_{αi} = b_{jk} a_{ki} + a_{ji}   for each j ∈ {k+1, . . . , m}.

(b) Let F := S be a field and A = (a_{ji}) ∈ M(m, n, F), m ∈ N. Moreover, let A^{(k)} ↦ A^{(k+1)} be the matrix transformation occurring in the kth step of the Gaussian elimination Alg. 8.17 and assume the kth step is being performed by Alg. 8.17(a), i.e. by, for each i ∈ {r(k)+1, . . . , m}, replacing the ith row by the ith row plus −a^{(k)}_{ik}/a^{(k)}_{r(k),k} times the r(k)th row. Then

 A^{(k+1)} = L_k A^{(k)},   L_k ∈ Fro^{r(k)}_m(F),     (8.25a)

where L_k is the Frobenius matrix of index r(k) whose entries below the diagonal in column r(k) are

 −l_{r(k)+1,r(k)}, . . . , −l_{m,r(k)},   with   l_{i,r(k)} := a^{(k)}_{ik} / a^{(k)}_{r(k),k}.     (8.25b)

(c) Let k ∈ {1, . . . , n}. If L = (l_{ji}) ∈ Fro^k_n(S), then L^{-1} ∈ Fro^k_n(S). More precisely, if L is the Frobenius matrix of index k with column entries l_{k+1,k}, . . . , l_{n,k}, then L^{-1} is the Frobenius matrix of index k with column entries −l_{k+1,k}, . . . , −l_{n,k}.

(d) If

              ( 1                    )
 L = (l_{ji}) = ( l_{21}  1            ) ∈ BL^1_n(S)
              (   ⋮        ⋱         )
              ( l_{n1}  l_{n2}  · · ·  1 )

is an arbitrary unipotent lower triangular matrix, then it is the product of the following n−1 Frobenius matrices:

 L = L̃_1 · · · L̃_{n−1},

where, for each k ∈ {1, . . . , n−1}, L̃_k ∈ Fro^k_n(S) is the Frobenius matrix of index k with column entries l_{k+1,k}, . . . , l_{n,k}.

Proof. (a): Formula (8.24) is immediate from the definition of matrix multiplication.

(b) follows by applying (a) with B := L_k and A := A^{(k)}.

(c): Applying (a) with A := L and B := L^{-1}, where L^{-1} is defined as in the statement of (c), we can use (8.24) to compute C = BA:

 c_{ji} = a_{ji} = 0                                          for each j ∈ {1, . . . , k}, i ≠ j,
 c_{ji} = a_{ji} = 1                                          for each j ∈ {1, . . . , k}, i = j,
 c_{ji} = b_{jk} a_{ki} + a_{ji} = −l_{jk} · 0 + 0 = 0        for each j ∈ {k+1, . . . , n}, i ∉ {j, k},
 c_{ji} = b_{jk} a_{ki} + a_{ji} = −l_{jk} · 1 + l_{jk} = 0   for each j ∈ {k+1, . . . , n}, i = k,
 c_{ji} = b_{jk} a_{ki} + a_{ji} = −l_{jk} · 0 + 1 = 1        for each j ∈ {k+1, . . . , n}, i = j,

showing C = Id, thereby establishing the case.



(d): We prove by induction on k = n−1, . . . , 1 that

 Π_{α=k}^{n−1} L̃_α = A_k ,

where A_k ∈ BL^1_n(S) denotes the unipotent lower triangular matrix that has zeros below the diagonal in its columns 1, . . . , k−1 and agrees with L below the diagonal in its columns k, . . . , n−1 (in particular, A_1 = L). For k = n−1, there is nothing to prove, since L̃_{n−1} = A_{n−1} by definition. So let 1 ≤ k < n−1. We have to show that C := L̃_k A_{k+1} = A_k. Letting A := A_{k+1} and letting B := L̃_k, we can use (8.24) to compute C = BA:

 c_{ji} = a_{ji} = 0                                         for each j ∈ {1, . . . , k}, j ≠ i,
 c_{ji} = a_{ji} = 1                                         for each j ∈ {1, . . . , k}, j = i,
 c_{ji} = b_{jk} a_{ki} + a_{ji} = l_{jk} · 0 + a_{ji} = a_{ji}   for each j ∈ {k+1, . . . , n}, i ≠ k,
 c_{ji} = b_{jk} a_{ki} + a_{ji} = l_{jk} · 1 + 0 = l_{jk}        for each j ∈ {k+1, . . . , n}, i = k,

showing C = A_k, thereby establishing the case. □
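The inversion rule of Prop. 8.28(c) is easy to check experimentally. The following Python sketch (ad hoc helper names, not from the notes) builds Frobenius matrices as in (8.23) and verifies that negating the below-diagonal column entries yields the inverse:

def frobenius(n, k, entries):
    """n x n Frobenius matrix of index k (1-based), with the given entries
    a_{k+1,k}, ..., a_{n,k} below the diagonal in column k, cf. (8.23)."""
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, a in enumerate(entries, start=k):   # 0-based rows k, ..., n-1
        A[i][k - 1] = a
    return A

def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

L = frobenius(4, 2, [5, 7])        # entries l_32 = 5, l_42 = 7
L_inv = frobenius(4, 2, [-5, -7])  # Prop. 8.28(c): negate the column entries
assert matmul(L, L_inv) == [[1 if i == j else 0 for j in range(4)] for i in range(4)]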

Corollary 8.29. Consider the situation of Def. 8.13, i.e. let F be a field, m, n ∈ N, and A ∈ M(m, n, F) with rows r_1, . . . , r_m.

(a) The elementary row operation of row multiplication, i.e. of replacing row r_i by some nonzero multiple λr_i of that row, i ∈ {1, . . . , m}, λ ∈ F \ {0}, can be accomplished by left-multiplying A by the diagonal matrix D_λ := diag(d_1, . . . , d_m) ∈ D_m(F), where

 ∀ j ∈ {1, . . . , m}:   d_j := λ for j = i,   d_j := 1 for j ≠ i.

(b) The elementary row operation of row addition, i.e. of replacing row r_i by the sum r_i + λr_k of that row and a multiple of another row, (i, k) ∈ {1, . . . , m}^2, i ≠ k, λ ∈ F, can be accomplished, for i > k, by left-multiplying A by the Frobenius matrix F_λ ∈ Fro^k_m(F) of index k, whose below-diagonal entries in column k are

 ∀ j ∈ {k+1, . . . , m}:   f_{j,k} := λ for j = i,   f_{j,k} := 0 for j ≠ i,

and, for i < k, by left-multiplying A by the transpose of the Frobenius matrix F_λ ∈ Fro^i_m(F), where F_λ is as above, except with i and k switched.

Proof. (a) is immediate from (7.27a).

(b): For i > k, we apply Prop. 8.28(a) with B := F_λ, obtaining, for (c_{ji}) := F_λ A,

 c_{jl} = a_{jl}                                           for each j ∈ {1, . . . , k},
 c_{jl} = f_{jk} a_{kl} + a_{jl} = λ a_{kl} + a_{jl}       for j = i,
 c_{jl} = f_{jk} a_{kl} + a_{jl} = a_{jl}                  for each j ∈ {k+1, . . . , m}, j ≠ i,

thereby establishing the case. For i < k, we obtain, for (b_{ji}) := (F_λ)^t, (c_{ji}) := (F_λ)^t A,

 c_{jl} = Σ_{α=1}^{m} b_{jα} a_{αl} = a_{jl}                                 for each j ∈ {1, . . . , m}, j ≠ i,
 c_{jl} = Σ_{α=1}^{m} b_{jα} a_{αl} = a_{jl} + b_{jk} a_{kl} = a_{jl} + λ a_{kl}   for j = i,

once again, establishing the case. □

Definition 8.30. Let n ∈ N and recall the definition of the symmetric group S_n from Ex. 4.9(b). Moreover, let F be a field (actually, the following definition still makes sense as long as F is a set containing elements 0 and 1). For each permutation π ∈ S_n, we define an n × n matrix

 P_π := ( e^t_{π(1)}  e^t_{π(2)}  · · ·  e^t_{π(n)} ) ∈ M(n, F),     (8.26)

where the e_i denote the standard unit (row) vectors of F^n. The matrix P_π is called the permutation matrix (in M(n, F)) associated with π. We define

 Per_n(F) := { P_π ∈ M(n, F) : π ∈ S_n }.

Proposition 8.31. Let n ∈ N and let F be a field.

(a) If π ∈ S_n and the columns of P_π ∈ Per_n(F) are given according to (8.26), then the rows of P_π are given according to the inverse permutation π^{-1}:

       ( e_{π^{-1}(1)} )
 P_π = ( e_{π^{-1}(2)} ) ,   (P_π)^t = P_{π^{-1}}.     (8.27)
       (       ⋮       )
       ( e_{π^{-1}(n)} )

(b) P := (pji ) ∈ M(n, F ) is a permutation matrix if, and only if, each row and each
column of P have precisely one entry 1 and all other entries of P are 0.

(c) Left multiplication of a matrix A ∈ M(n, m, F), m ∈ N, by a permutation matrix P_π ∈ Per_n(F), permutes the rows of A according to π (the row r_i of A becomes row π(i) of P_π A):

     ( — r_1 — )   ( — r_{π^{-1}(1)} — )
 P_π ( — r_2 — ) = ( — r_{π^{-1}(2)} — ) ,
     (    ⋮    )   (         ⋮        )
     ( — r_n — )   ( — r_{π^{-1}(n)} — )

which follows from the special case

 P_π e_i^t = ( e_{π^{-1}(1)} · e_i^t ; e_{π^{-1}(2)} · e_i^t ; . . . ; e_{π^{-1}(n)} · e_i^t ) = e^t_{π(i)} ,

that holds for each i ∈ {1, . . . , n}.

(d) Right multiplication of a matrix A ∈ M(m, n, F), m ∈ N, by a permutation matrix P_π ∈ Per_n(F) permutes the columns of A according to π^{-1} (the column c_i of A becomes column π^{-1}(i) of A P_π):

 ( c_1  c_2  · · ·  c_n ) P_π = ( c_{π(1)}  c_{π(2)}  · · ·  c_{π(n)} ),

which follows from the special case

 e_i P_π = e_i ( e^t_{π(1)}  e^t_{π(2)}  · · ·  e^t_{π(n)} ) = ( e_i · e^t_{π(1)}  e_i · e^t_{π(2)}  · · ·  e_i · e^t_{π(n)} ) = e_{π^{-1}(i)} ,

that holds for each i ∈ {1, . . . , n}.



(e) For each π, σ ∈ S_n, one has

 P_π P_σ = P_{π∘σ},     (8.28)

such that

 I : S_n −→ Per_n(F),   I(π) := P_π,

constitutes a group isomorphism. In particular, Per_n(F) is a subgroup of GL_n(F) and

 ∀ P ∈ Per_n(F):   P^{-1} = P^t.     (8.29)

Proof. (a): Fix j and consider the jth row of (p_{ji}) := P_π. For its entries, p_{ji} = 1 if, and only if, j = π(i) (since p_{ji} also belongs to the ith column, cf. (8.26)). Thus, p_{ji} = 1 if, and only if, i = π^{-1}(j), which means that the jth row is e_{π^{-1}(j)}.

(b): Suppose P = P_π, π ∈ S_n, n ∈ N, is a permutation matrix. Then (8.26) implies that each column of P has precisely one 1 and all other elements are 0; (8.27) implies that each row of P has precisely one 1 and all other elements are 0.

Conversely, suppose that P ∈ M(n, F) is such that each row and each column of P have precisely one entry 1 and all other entries of P are 0. Define a map π : {1, . . . , n} −→ {1, . . . , n} by letting π(i) be such that p_{π(i),i} = 1. Then

 P = ( e^t_{π(1)}  e^t_{π(2)}  · · ·  e^t_{π(n)} ),

and it just remains to show that π is a permutation. It suffices to show that π is surjective. To that end, let k ∈ {1, . . . , n}. Then there is a unique i ∈ {1, . . . , n} such that p_{ki} = 1. But then, according to the definition of π, π(i) = k, showing that π is surjective and P = P_π.
(c): Note that, for A = (a_{ji}) ∈ M(n, m, F),

 A = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ji} ( 0 · · · e_j^t · · · 0 ),

where, in each summand, e_j^t occupies the ith column. Thus,

 P_π A = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ji} P_π ( 0 · · · e_j^t · · · 0 )
       = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ji} ( 0 · · · P_π e_j^t · · · 0 )
       = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ji} ( 0 · · · e^t_{π(j)} · · · 0 ),

which establishes the case, as each entry a_{ji} is thereby moved from row j to row π(j).


(d): We have

 ( (c_1 c_2 · · · c_n) P_π )^t = (P_π)^t (c_1 c_2 · · · c_n)^t  =(8.27)=  P_{π^{-1}} ( — c_1 — ; — c_2 — ; . . . ; — c_n — )  =(c)=  ( — c_{π(1)} — ; — c_{π(2)} — ; . . . ; — c_{π(n)} — ),

proving (d) after transposing back (in the application of (c), note (π^{-1})^{-1} = π).
(e): For π, σ ∈ S_n, we compute

 P_π P_σ = ( e_{π^{-1}(1)} ; . . . ; e_{π^{-1}(n)} ) ( e^t_{σ(1)}  e^t_{σ(2)}  · · ·  e^t_{σ(n)} )  =(∗)=  ( e^t_{(π∘σ)(1)}  e^t_{(π∘σ)(2)}  · · ·  e^t_{(π∘σ)(n)} ) = P_{π∘σ},

showing (8.28), where, at “(∗)”, we used that

 ∀ i, j ∈ {1, . . . , n}:   π^{-1}(i) = σ(j)  ⇔  i = (π∘σ)(j).

Both surjectivity and injectivity of I are clear from Def. 8.30 and, then, Per_n(F) must be a group by Prop. 4.11(c). Combining (8.28) with (b) proves (8.29) and, thus, also shows Per_n(F) to be a subgroup of GL_n(F). □
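Definition 8.30 and (8.29) can be illustrated by a small Python sketch (0-based indices instead of the notes' 1-based ones; names are ad hoc):

def permutation_matrix(pi):
    """P_pi per (8.26): the jth column is the standard basis vector e_{pi(j)};
    pi is a permutation of {0, ..., n-1}, given as a list."""
    n = len(pi)
    return [[1 if pi[j] == i else 0 for j in range(n)] for i in range(n)]

pi = [2, 0, 1]                         # a 3-cycle
P = permutation_matrix(pi)
Pt = [list(col) for col in zip(*P)]    # the transpose of P
# P * P^t is the identity, in accordance with (8.29):
I3 = [[sum(P[i][k] * Pt[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
assert I3 == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]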
Definition and Remark 8.32. Let n ∈ N and let F be a field. A permutation matrix P_τ ∈ Per_n(F), corresponding to a transposition τ = (j i) ∈ S_n (τ permutes i and j and leaves all other elements fixed), is called a transposition matrix and is denoted by P_{ji} := P_τ. The transposition matrix P_{ji} agrees with the identity matrix, except that its (i, i) and (j, j) entries are 0, while its (i, j) and (j, i) entries are 1:

         ( 1                       )
         (    ⋱                    )
         (       0  · · ·  1       )
 P_{ji} = (       ⋮   ⋱     ⋮       ) .     (8.30)
         (       1  · · ·  0       )
         (                    ⋱    )
         (                       1 )
Since every permutation is the composition of a finite number of transpositions (which we
will prove in Linear Algebra II), it is implied by Prop. 8.31(e) that every permutation
matrix is the product of a finite number of transposition matrices. It is immediate
from Prop. 8.31(c),(d) that left multiplication of a matrix A ∈ M(n, m), m ∈ N,
by Pji ∈ Pern (F ) switches the ith and jth row of A, whereas right multiplication of
A ∈ M(m, n) by Pji ∈ Pern (F ) switches the ith and jth column of A.
Corollary 8.33. Consider the situation of Def. 8.13, i.e. let F be a field, m, n ∈ N, and A ∈ M(m, n, F) with rows r_1, . . . , r_m.

(a) The elementary row operation of row switching, i.e. of switching rows r_i and r_j, where i, j ∈ {1, . . . , m}, can be accomplished by left-multiplying A by the transposition matrix P_{ij} ∈ Per_m(F).

(b) Let A^{(k)} ↦ A^{(k+1)} be the matrix transformation occurring in the kth step of the Gaussian elimination Alg. 8.17 and assume the kth step is being performed by Alg. 8.17(b), i.e. by first switching the ith row with the r(k)th row and then, for each i ∈ {r(k)+1, . . . , m}, replacing the (new) ith row by the (new) ith row plus −a^{(k)}_{ik}/a^{(k)}_{r(k),k} times the (new) r(k)th row. Then

 A^{(k+1)} = L_k P_{i,r(k)} A^{(k)}     (8.31)

with transposition matrix P_{i,r(k)} ∈ Per_m(F) and with L_k ∈ Fro^{r(k)}_m(F) being the Frobenius matrix of index r(k) defined in (8.25b).

Proof. (a) is immediate from Prop. 8.31(c) (cf. Def. and Rem. 8.32).

(b) is due to (a) combined with Prop. 8.28(b). □

8.4.3 Existence

Theorem 8.34. Let F be a field and A ∈ M(m, n, F) with m, n ∈ N. Then there exists a permutation matrix P ∈ Per_m(F) such that P A has an LU decomposition in the sense of Def. 8.24. More precisely, if Ã ∈ M(m, n, F) is the matrix in echelon form resulting at the end of the Gaussian elimination Alg. 8.17 applied to A, then there exist a permutation matrix P ∈ Per_m(F) and a unipotent lower triangular matrix L ∈ BL^1_m(F) such that

 P A = LÃ.     (8.32)

It is an LU decomposition in the strict sense (i.e. with Ã ∈ BU_m(F)) for A being quadratic (i.e. m × m). If no row switches occurred during the application of Alg. 8.17, then P = Id_m and A itself has an LU decomposition. If A ∈ GL_n(F) has an LU decomposition, then its LU decomposition is unique.

Proof. Let N ∈ {0, . . . , n} be the final number of steps that is needed to perform the Gaussian elimination Alg. 8.17. If N = 0, then A consists of precisely one row and there is nothing to prove (set L := P := (1)). If N ≥ 1, then let A^{(1)} := A and, recursively, for each k ∈ {1, . . . , N}, let A^{(k+1)} be the matrix obtained in the kth step of the Gaussian elimination algorithm. If Alg. 8.17(b) is used in the kth step, then, according to Cor. 8.33(b),

 A^{(k+1)} = L_k P_k A^{(k)},     (8.33)

where P_k := P_{i,r(k)} ∈ Per_m(F) is the transposition matrix that switches rows r(k) and i, while L_k ∈ Fro^{r(k)}_m(F) is the Frobenius matrix of index r(k) given by (8.25b). If (a) or (c) of Alg. 8.17 is used in the kth step, then (8.33) also holds, namely with P_k := Id_m for (a) and with L_k := P_k := Id_m for (c). By induction, (8.33) implies

 Ã = A^{(N+1)} = L_N P_N · · · L_1 P_1 A.     (8.34)

To show that we can transform (8.34) into (8.32), we first rewrite the right-hand side of (8.34), taking into account P_k^{-1} = P_k for each of the transposition matrices P_k (each inserted product P_k P_k equals Id_m):

 Ã = L_N (P_N L_{N−1} P_N)(P_N P_{N−1} L_{N−2} P_{N−1} P_N)(P_N P_{N−1} P_{N−2} · · · L_1 P_2 · · · P_N) P_N P_{N−1} · · · P_2 P_1 A.     (8.35)

Defining

 L'_N := L_N,   L'_k := P_N P_{N−1} · · · P_{k+1} L_k P_{k+1} · · · P_{N−1} P_N   for each k ∈ {1, . . . , N−1},     (8.36)

(8.35) takes the form

 Ã = L'_N · · · L'_1 P_N P_{N−1} · · · P_2 P_1 A.     (8.37)
We now observe that the L'_k are still Frobenius matrices of index r(k), except that the entries l_{j,r(k)} of L_k with j > r(k) have been permuted according to P_N P_{N−1} · · · P_{k+1}: This follows since, for r(k) < i < j, conjugating a Frobenius matrix of index r(k) with the transposition matrix P_{ji} merely switches the entries l_{i,r(k)} and l_{j,r(k)} in column r(k):

 P_{ji} · ( Frobenius matrix of index r(k) with column entries . . . , l_{i,r(k)}, . . . , l_{j,r(k)}, . . . ) · P_{ji}
   = ( the same Frobenius matrix with l_{i,r(k)} and l_{j,r(k)} switched ),

as left multiplication by P_{ji} switches the ith and jth row, switching l_{i,r(k)} and l_{j,r(k)} and moving the corresponding 1's off the diagonal, whereas right multiplication by P_{ji} switches the ith and jth column, moving both 1's back onto the diagonal while leaving the r(k)th column unchanged.
Finally, using that each (L'_k)^{-1}, according to Prop. 8.28(c), exists and is also a Frobenius matrix, (8.37) becomes

 LÃ = P A     (8.38)

with

 P := P_N P_{N−1} · · · P_2 P_1 ,   L := (L'_1)^{-1} · · · (L'_N)^{-1} .     (8.39)

Comparing with Prop. 8.28(d) shows that L = (L'_1)^{-1} · · · (L'_N)^{-1} is a unipotent lower triangular matrix, thereby establishing (8.32). Note: From Prop. 8.28(d) and the definition of the L_k, we actually know all entries of L explicitly: Every nonzero entry of the r(k)th column is given by (8.25b). This will be used when formulating the LU decomposition algorithm in Sec. 8.4.4 below.

If A is invertible, then so is U := Ã = L^{-1}A. If A is invertible, and we have LU decompositions

 L_1 U_1 = A = L_2 U_2 ,

then we obtain U_1 U_2^{-1} = L_1^{-1} L_2 =: E. As both upper triangular and unipotent lower triangular matrices are closed under matrix multiplication, the matrix E is unipotent and both upper and lower triangular, showing E = Id_m. This, in turn, yields U_1 = U_2 as well as L_1 = L_2, i.e. the claimed uniqueness of the LU decomposition for invertible A. □

Remark 8.35. Note that, even for invertible A ∈ GL_n(F), the triple (P, L, Ã) of (8.32) is, in general, not unique. For example,

 ( 1 0 ) ( 1 1 )   ( 1 0 ) ( 1  1 )
 ( 0 1 ) ( 1 0 ) = ( 1 1 ) ( 0 −1 ) ,
   P_1     A         L_1     Ã_1

 ( 0 1 ) ( 1 1 )   ( 1 0 ) ( 1 0 )
 ( 1 0 ) ( 1 0 ) = ( 1 1 ) ( 0 1 ) .
   P_2     A         L_2     Ã_2

Remark 8.36. Let F be a field. Suppose the goal is to solve linear systems Ax = b with fixed matrix A ∈ M(m, n, F) and varying vector b ∈ F^m. Moreover, suppose one knows an LU decomposition P A = LÃ with permutation matrix P ∈ Per_m(F). One can then proceed as in Rem. 8.25 above, except with A replaced by P A and b replaced by P b. As already indicated in Rem. 8.25, in the presence of roundoff errors, such errors might be magnified by the use of an LU decomposition. If one works over the fields R or C, then there are other decompositions (e.g. the so-called QR decomposition) that tend to behave more benignly in the presence of roundoff errors (cf., e.g., [Phi21, Rem. 5.15] and [Phi21, Sec. 5.4.3]).

8.4.4 Algorithm

Let us crystallize the proof of Th. 8.34 into an algorithm to actually compute the matrices L, P, and Ã occurring in the decomposition (8.32) of a given A ∈ M(m, n, F). As in the proof of Th. 8.19, let N ∈ {1, . . . , n+1} be the maximal k occurring during the Gaussian elimination Alg. 8.17 applied to A.

(a) Algorithm for Ã: Starting with A^{(1)} := A, the final step of Alg. 8.17 yields the matrix Ã := A^{(N)} ∈ M(m, n, F) in echelon form.

(b) Algorithm for P: Starting with P^{(1)} := Id_m, in the kth step of Alg. 8.17, define P^{(k+1)} := P_k P^{(k)}, where P_k := P_{i,r(k)} if rows r(k) and i are switched according to Alg. 8.17(b), and P_k := Id_m, otherwise. In the last step, one obtains P := P^{(N)}.

(c) Algorithm for L: Starting with L^{(1)} := 0 (zero matrix), we obtain L^{(k+1)} from L^{(k)} in the kth step of Alg. 8.17 as follows: For Alg. 8.17(c), set L^{(k+1)} := L^{(k)}. If rows r(k) and i are switched according to Alg. 8.17(b), then we first switch rows r(k) and i in L^{(k)} to obtain some L̃^{(k)} (this conforms to the definition of the L'_k in (8.36); we will come back to this point below). For Alg. 8.17(a) and for the elimination step of Alg. 8.17(b), we first copy all elements of L^{(k)} (resp. L̃^{(k)}) to L^{(k+1)}, but then change the elements of the r(k)th column according to (8.25b): Set l^{(k+1)}_{i,r(k)} := a^{(k)}_{ik}/a^{(k)}_{r(k),k} for each i ∈ {r(k)+1, . . . , m}. In the last step, we obtain L from L^{(N)} by setting all elements on the diagonal to 1 (postponing this step to the end avoids worrying about the diagonal elements when switching rows earlier).

To see that this procedure does, indeed, provide the correct L, we go back to the proof of Th. 8.34: From (8.38), (8.39), and (8.36), it follows that the r(k)th column of L is precisely the r(k)th column of L_k^{-1}, permuted according to P_N · · · P_{k+1}. This is precisely what is described in (c) above: We start with the r(k)th column of L_k^{-1} by setting l^{(k+1)}_{i,r(k)} := a^{(k)}_{ik}/a^{(k)}_{r(k),k} and then apply P_{k+1}, . . . , P_N during the remaining steps.

Remark 8.37. Note that in the kth step of the algorithm for Ã, we eliminate all
elements of the kth column below the row with number r(k), while in the kth step of
the algorithm for L, we populate elements of the r(k)th column below the diagonal for
the first time. Thus, when implementing the algorithm in practice, one can save memory
capacity by storing the new elements for L at the locations of the previously eliminated
elements of A. This strategy is sometimes called compact storage.
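The three sub-algorithms (a)–(c) combine naturally into a single loop. The following Python sketch (ad hoc names; exact arithmetic over Q via fractions; P stored compactly as a list p, with row i of P·A being row p[i] of A) implements the procedure, including the row switches in L required by (c):

from fractions import Fraction

def lu_decompose(A):
    """Return (p, L, A_tilde) with P*A = L*A_tilde as in (8.32),
    where the permutation matrix P is encoded by the list p."""
    A = [[Fraction(a) for a in row] for row in A]
    m, n = len(A), len(A[0])
    L = [[Fraction(0)] * m for _ in range(m)]      # L^(1) := 0
    p = list(range(m))                             # encodes P^(1) := Id_m
    r = 0
    for k in range(n):
        if r >= m - 1:
            break
        if A[r][k] == 0:
            i = next((i for i in range(r + 1, m) if A[i][k] != 0), None)
            if i is None:
                continue                           # Alg. 8.17(c)
            A[r], A[i] = A[i], A[r]                # row switch in A ...
            L[r], L[i] = L[i], L[r]                # ... and in L, cf. (c) above
            p[r], p[i] = p[i], p[r]
        for i in range(r + 1, m):
            L[i][r] = A[i][k] / A[r][k]            # store the multiplier (8.25b)
            A[i] = [a - L[i][r] * q for a, q in zip(A[i], A[r])]
        r += 1
    for j in range(m):
        L[j][j] = Fraction(1)                      # unit diagonal, set at the end
    return p, L, A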

Example 8.38. Let us determine the LU decomposition (8.32) (i.e. the matrices P, L, and Ã) for the matrix

      ( 1 4 2 3 )
 A := ( 0 0 1 4 ) ∈ M(4, 4, R).
      ( 2 6 3 1 )
      ( 1 2 1 0 )

According to the algorithm described above, we start by initializing

 A^{(1)} := A,   P^{(1)} := Id_4,   L^{(1)} := 0,   r(1) := 1

(where the function r is the one introduced in the Gaussian elimination Alg. 8.17). We use Alg. 8.17(a) to eliminate in the first column, obtaining

                             ( 0 0 0 0 )
 P^{(2)} = P^{(1)} = Id_4,   L^{(2)} = ( 0 0 0 0 ) ,
                             ( 2 0 0 0 )
                             ( 1 0 0 0 )

           ( 1  4  2  3 )
 A^{(2)} = ( 0  0  1  4 ) ,   r(2) = r(1) + 1 = 2.
           ( 0 −2 −1 −5 )
           ( 0 −2 −1 −3 )
We now need to apply Alg. 8.17(b) and we switch rows 2 and 3 using P_23:

                          ( 1 0 0 0 )
 P^{(3)} = P_23 P^{(2)} = ( 0 0 1 0 ) ,
                          ( 0 1 0 0 )
                          ( 0 0 0 1 )

                ( 0 0 0 0 )                  ( 1  4  2  3 )
 P_23 L^{(2)} = ( 2 0 0 0 ) ,   P_23 A^{(2)} = ( 0 −2 −1 −5 ) .
                ( 0 0 0 0 )                  ( 0  0  1  4 )
                ( 1 0 0 0 )                  ( 0 −2 −1 −3 )
Eliminating the second column yields

           ( 0 0 0 0 )             ( 1  4  2  3 )
 L^{(3)} = ( 2 0 0 0 ) ,   A^{(3)} = ( 0 −2 −1 −5 ) ,   r(3) = r(2) + 1 = 3.
           ( 0 0 0 0 )             ( 0  0  1  4 )
           ( 1 1 0 0 )             ( 0  0  0  2 )
Conveniently, elimination of the third column does not require any additional work and we have

                          ( 1 0 0 0 )                          ( 1 0 0 0 )
 P = P^{(4)} = P^{(3)} = ( 0 0 1 0 ) ,   L^{(4)} = L^{(3)},   L = ( 2 1 0 0 ) ,
                          ( 0 1 0 0 )                          ( 0 0 1 0 )
                          ( 0 0 0 1 )                          ( 1 1 0 1 )

                          ( 1  4  2  3 )
 Ã = A^{(4)} = A^{(3)} = ( 0 −2 −1 −5 ) ,   r(4) = r(3) + 1 = 4.
                          ( 0  0  1  4 )
                          ( 0  0  0  2 )

Recall that L is obtained from L^{(4)} by setting the diagonal values to 1. One checks that, indeed, P A = LÃ.
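Running the lu_decompose sketch from Sec. 8.4.4 on this matrix reproduces P (as the row permutation p), L, and Ã, and the identity P A = LÃ can also be confirmed mechanically:

from fractions import Fraction

A = [[1, 4, 2, 3],
     [0, 0, 1, 4],
     [2, 6, 3, 1],
     [1, 2, 1, 0]]
p, L, At = lu_decompose(A)                # sketch from Sec. 8.4.4
PA = [[Fraction(a) for a in A[i]] for i in p]   # P*A, with P encoded by p
LAt = [[sum(L[i][k] * At[k][j] for k in range(4)) for j in range(4)]
       for i in range(4)]
assert PA == LAt                          # confirms P A = L A~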

Example 8.39. We remain in the situation of the previous Ex. 8.38. We see that rk A = rk Ã = 4, showing A to be invertible. In Ex. 8.2(c), we observed that we obtain the columns v_k of A^{-1} by solving the linear systems

 Av_1 = e_1, . . . , Av_n = e_n,

where n = 4 in the present case and where e_1, . . . , e_n denote the standard (column) basis vectors. We now determine the v_k using the LU decomposition of Ex. 8.38 together with the strategies described in Rem. 8.36 and Rem. 8.25: First, with

     ( 1 0 0 0 )
 L = ( 2 1 0 0 )
     ( 0 0 1 0 )
     ( 1 1 0 1 )

we solve

 Lz_1 = P e_1 = e_1,   Lz_2 = P e_2 = e_3,   Lz_3 = P e_3 = e_2,   Lz_4 = P e_4 = e_4,

using forward substitution:

 z_{11} = 1,  z_{12} = −2z_{11} = −2,  z_{13} = 0,  z_{14} = −z_{11} − z_{12} = −1 + 2 = 1,
 z_{21} = 0,  z_{22} = 0,  z_{23} = 1,  z_{24} = 0,
 z_{31} = 0,  z_{32} = 1 − 2z_{31} = 1,  z_{33} = 0,  z_{34} = −z_{31} − z_{32} = −1,
 z_{41} = 0,  z_{42} = 0,  z_{43} = 0,  z_{44} = 1,

obtaining the column vectors

 z_1 = (1, −2, 0, 1)^t,   z_2 = (0, 0, 1, 0)^t,   z_3 = (0, 1, 0, −1)^t,   z_4 = (0, 0, 0, 1)^t.

Next, with

      ( 1  4  2  3 )
 Ã = ( 0 −2 −1 −5 )
      ( 0  0  1  4 )
      ( 0  0  0  2 )

we solve

 Ãv_1 = z_1,   Ãv_2 = z_2,   Ãv_3 = z_3,   Ãv_4 = z_4,

using backward substitution:

 v_{14} = 1/2,   v_{13} = 0 − 4v_{14} = −2,   v_{12} = −(1/2)(−2 + v_{13} + 5v_{14}) = 3/4,   v_{11} = 1 − 4v_{12} − 2v_{13} − 3v_{14} = 1/2,
 v_{24} = 0,     v_{23} = 1 − 4v_{24} = 1,    v_{22} = −(1/2)(0 + v_{23} + 5v_{24}) = −1/2,   v_{21} = 0 − 4v_{22} − 2v_{23} − 3v_{24} = 0,
 v_{34} = −1/2,  v_{33} = 0 − 4v_{34} = 2,    v_{32} = −(1/2)(1 + v_{33} + 5v_{34}) = −1/4,   v_{31} = 0 − 4v_{32} − 2v_{33} − 3v_{34} = −3/2,
 v_{44} = 1/2,   v_{43} = 0 − 4v_{44} = −2,   v_{42} = −(1/2)(0 + v_{43} + 5v_{44}) = −1/4,   v_{41} = 0 − 4v_{42} − 2v_{43} − 3v_{44} = 7/2,

obtaining the column vectors

 v_1 = (1/2, 3/4, −2, 1/2)^t,   v_2 = (0, −1/2, 1, 0)^t,   v_3 = (−3/2, −1/4, 2, −1/2)^t,   v_4 = (7/2, −1/4, −2, 1/2)^t.

Thus,

                  (  2  0 −6 14 )
 A^{-1} = (1/4) · (  3 −2 −1 −1 ) .
                  ( −8  4  8 −8 )
                  (  2  0 −2  2 )

One verifies

          ( 1 4 2 3 )           (  2  0 −6 14 )   ( 1 0 0 0 )
 AA^{-1} = ( 0 0 1 4 ) · (1/4) · (  3 −2 −1 −1 ) = ( 0 1 0 0 ) .
          ( 2 6 3 1 )           ( −8  4  8 −8 )   ( 0 0 1 0 )
          ( 1 2 1 0 )           (  2  0 −2  2 )   ( 0 0 0 1 )
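The entire inversion procedure of this example can be replayed with the sketches from Rem. 8.25 and Sec. 8.4.4 above (lu_decompose, forward_substitute), plus a square back substitution; all helper names are ad hoc:

from fractions import Fraction

def back_substitute_square(U, z):
    """Solve U x = z for invertible upper triangular U (a sketch)."""
    n = len(U)
    x = [Fraction(0)] * n
    for j in reversed(range(n)):
        x[j] = (Fraction(z[j]) - sum(U[j][i] * x[i] for i in range(j + 1, n))) / U[j][j]
    return x

A = [[1, 4, 2, 3], [0, 0, 1, 4], [2, 6, 3, 1], [1, 2, 1, 0]]
p, L, U = lu_decompose(A)                    # sketch from Sec. 8.4.4
cols = []
for k in range(4):
    e = [Fraction(i == k) for i in range(4)] # kth standard basis vector
    z = forward_substitute(L, [e[i] for i in p])   # L z = P e_k
    cols.append(back_substitute_square(U, z))      # U v_k = z
A_inv = [[cols[j][i] for j in range(4)] for i in range(4)]  # columns v_k
for row in A_inv:
    print([str(a) for a in row])   # (1/4) * [[2,0,-6,14],[3,-2,-1,-1],[-8,4,8,-8],[2,0,-2,2]]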

A Axiomatic Set Theory

A.1 Motivation, Russell’s Antinomy


As it turns out, naive set theory, founded on the definition of a set according to Cantor
(as stated at the beginning of Sec. 1.3) is not suitable to be used in the foundation of
mathematics. The problem lies in the possibility of obtaining contradictions such as Russell's antinomy, named after Bertrand Russell, who described it in 1901.
Russell's antinomy is obtained when considering the set X of all sets that do not contain themselves as an element: When asking the question if X ∈ X, one obtains the contradiction that X ∈ X ⇔ X ∉ X:

Suppose X ∈ X. Then X is a set that contains itself. But X was defined to contain only sets that do not contain themselves, i.e. X ∉ X.

So suppose X ∉ X. Then X is a set that does not contain itself. Thus, by the definition of X, X ∈ X.
Perhaps you think Russell’s construction is rather academic, but it is easily translated
into a practical situation. Consider a library. The catalog C of the library should contain
all the library’s books. Since the catalog itself is a book of the library, it should occur
as an entry in the catalog. So there can be catalogs such as C that have themselves as
an entry and there can be other catalogs that do not have themselves as an entry. Now
one might want to have a catalog X of all catalogs that do not have themselves as an
entry. As in Russell’s antinomy, one is led to the contradiction that the catalog X must
have itself as an entry if, and only if, it does not have itself as an entry.
One can construct arbitrarily many versions, which we will not do. Just one more:
Consider a small town with a barber who, each day, shaves precisely those inhabitants who do not shave themselves. The poor barber now faces a terrible dilemma: He will have to shave himself if, and only if, he does not shave himself.
To avoid contradictions such as Russell’s antinomy, axiomatic set theory restricts the
construction of sets via so-called axioms, as we will see below.

A.2 Set-Theoretic Formulas


The contradiction of Russell’s antinomy is related to Cantor’s sets not being hierarchical.
Another source of contradictions in naive set theory is the imprecise nature of informal
languages such as English. In (1.6), we said that
A := {x ∈ B : P (x)}
defines a subset of B if P (x) is a statement about an element x of B. Now take
B := N := {1, 2, . . . } to be the set of the natural numbers and let
P (x) := “The number x can be defined by fifty English words or less”. (A.1)
Then A is a finite subset of N, since there are only finitely many English words (if you
think there might be infinitely many English words, just restrict yourself to the words
contained in some concrete dictionary). Then there is a smallest natural number n that is not in A. But then n is the smallest natural number that can not be defined by fifty English words or less, which, actually, defines n by less than fifty English words, in contradiction to n ∉ A.
To avoid contradictions of this type, we require P (x) to be a so-called set-theoretic
formula.

Definition A.1. (a) The language of set theory consists precisely of the following sym-
bols: ∧, ¬, ∃, (, ), ∈, =, vj , where j = 1, 2, . . . .

(b) A set-theoretic formula is a finite string of symbols from the above language of set
theory that can be built using the following recursive rules:

(i) vi ∈ vj is a set-theoretic formula for i, j = 1, 2, . . . .


(ii) vi = vj is a set-theoretic formula for i, j = 1, 2, . . . .
(iii) If φ and ψ are set-theoretic formulas, then (φ) ∧ (ψ) is a set-theoretic formula.
(iv) If φ is a set-theoretic formula, then ¬(φ) is a set-theoretic formula.

(v) If φ is a set-theoretic formula, then ∃v_j (φ) is a set-theoretic formula for j = 1, 2, . . . .

Example A.2. Examples of set-theoretic formulas are (v3 ∈ v5 ) ∧ (¬(v2 = v3 )),


∃v1 (¬(v1 = v1 )); examples of symbol strings that are not set-theoretic formulas are
v1 ∈ v2 ∈ v3 , ∃∃¬, and ∈ v3 ∃.

Remark A.3. It is noted that, for a given finite string of symbols, a computer can, in
principle, check in finitely many steps, if the string constitutes a set-theoretic formula
or not. The symbols that can occur in a set-theoretic formula are to be interpreted
as follows: The variables v1 , v2 , . . . are variables for sets. The symbols ∧ and ¬ are
to be interpreted as the logical operators of conjunction and negation as described in
Sec. 1.2.2. Similarly, ∃ stands for an existential quantifier as in Sec. 1.4: The statement
∃vj (φ) means “there exists a set vj that has the property φ”. Parentheses ( and ) are
used to make clear the scope of the logical symbols ∃, ∧, ¬. Where the symbol ∈ occurs,
it is interpreted to mean that the set to the left of ∈ is contained as an element in the
set to the right of ∈. Similarly, = is interpreted to mean that the sets occurring to the
left and to the right of = are equal.
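To illustrate the point about mechanical checkability (this sketch is mine, not part of the notes), here is a Python recognizer for the grammar of Def. A.1(b); formulas are encoded as strings, with variables written v1, v2, . . . :

import re

def is_formula(s):
    """Decide whether s is a set-theoretic formula per Def. A.1(b)."""
    s = s.replace(" ", "")
    VAR = r"v[1-9][0-9]*"
    if re.fullmatch(VAR + r"[∈=]" + VAR, s):      # rules (i), (ii)
        return True
    if s.startswith("¬(") and s.endswith(")"):    # rule (iv)
        return is_formula(s[2:-1])
    if re.match(r"∃" + VAR + r"\(.*\)$", s):      # rule (v)
        return is_formula(s[s.index("(") + 1:-1])
    if s.startswith("("):                         # rule (iii): (F)∧(F)
        depth = 0
        for i, c in enumerate(s):
            depth += (c == "(") - (c == ")")
            if depth == 0:
                return (i + 2 < len(s) and s[i + 1] == "∧" and s[i + 2] == "("
                        and s.endswith(")") and is_formula(s[1:i])
                        and is_formula(s[i + 3:-1]))
    return False

print(is_formula("(v3∈v5)∧(¬(v2=v3))"))   # True (cf. Ex. A.2)
print(is_formula("∃v1(¬(v1=v1))"))        # True
print(is_formula("v1∈v2∈v3"))             # False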

Remark A.4. A disadvantage of set-theoretic formulas as defined in Def. A.1 is that


they quickly become lengthy and unreadable (at least to the human eye). To make
formulas more readable and concise, one introduces additional symbols and notation.
Formally, additional symbols and notation are always to be interpreted as abbreviations
or transcriptions of actual set-theoretic formulas. For example, we use the rules of Th.
1.11 to define the additional logical symbols ∨, ⇒, ⇔ as abbreviations:

(φ) ∨ (ψ) is short for ¬((¬(φ)) ∧ (¬(ψ))) (cf. Th. 1.11(j)), (A.2a)
(φ) ⇒ (ψ) is short for (¬(φ)) ∨ (ψ) (cf. Th. 1.11(a)), (A.2b)
(φ) ⇔ (ψ) is short for ((φ) ⇒ (ψ)) ∧ ((ψ) ⇒ (φ)) (cf. Th. 1.11(b)). (A.2c)

Similarly, we use (1.17a) to define the universal quantifier:

∀vj (φ) is short for ¬(∃vj (¬(φ))). (A.2d)

Further abbreviations and transcriptions are obtained from omitting parentheses if it is clear from the context and/or from Convention 1.10 where to put them in, by writing variables bound by quantifiers under the respective quantifiers (as in Sec. 1.4), and by using other symbols than v_j for set variables. For example,

 ∀_x (φ ⇒ ψ)   transcribes   ¬(∃v_1(¬((¬(φ)) ∨ (ψ)))),

where x stands for v_1. Moreover,

 v_i ≠ v_j is short for ¬(v_i = v_j);   v_i ∉ v_j is short for ¬(v_i ∈ v_j).     (A.2e)

Remark A.5. Even though axiomatic set theory requires the use of set-theoretic for-
mulas as described above, the systematic study of formal symbolic languages is the
subject of the field of mathematical logic and is beyond the scope of this class (see, e.g.,
[EFT07]). In Def. and Rem. 1.15, we defined a proof of statement B from statement A1
as a finite sequence of statements A1 , A2 , . . . , An such that, for 1 ≤ i < n, Ai implies
Ai+1 , and An implies B. In the field of proof theory, also beyond the scope of this class,
such proofs are formalized via a finite set of rules that can be applied to (set-theoretic)
formulas (see, e.g., [EFT07, Sec. IV], [Kun12, Sec. II]). Once proofs have been formal-
ized in this way, one can, in principle, mechanically check if a given sequence of symbols
does, indeed, constitute a valid proof (without even having to understand the actual
meaning of the statements). Indeed, several different computer programs have been
devised that can be used for automatic proof checking, for example Coq [Wik15a], HOL
Light [Wik15b], and Isabelle [Wik15c] to name just a few.

A.3 The Axioms of Zermelo-Fraenkel Set Theory


Axiomatic set theory seems to provide a solid and consistent foundation for conducting
mathematics, and most mathematicians have accepted it as the basis of their everyday
work. However, there do remain some deep, difficult, and subtle philosophical issues
regarding the foundation of logic and mathematics (see, e.g., [Kun12, Sec. 0, Sec. III]).

Definition and Remark A.6. An axiom is a statement that is assumed to be true


without any formal logical justification. The most basic axioms (for example, the stan-
dard axioms of set theory) are taken to be justified by common sense or some underlying
philosophy. However, on a less fundamental (and less philosophical) level, it is a common
mathematical strategy to state a number of axioms (for example, the axioms defining

the mathematical structure called a group), and then to study the logical consequences
of these axioms (for example, group theory studies the statements that are true for all
groups as a consequence of the group axioms). For a given system of axioms, the ques-
tion if there exists an object satisfying all the axioms in the system (i.e. if the system
of axioms is consistent, i.e. free of contradictions) can be extremely difficult to answer.

We are now in a position to formulate and discuss the axioms of axiomatic set the-
ory. More precisely, we will present the axioms of Zermelo-Fraenkel set theory, usually
abbreviated as ZF, which are Axiom 0 – Axiom 8 below. While there exist various
set theories in the literature, each set theory defined by some collection of axioms, the
axioms of ZFC, consisting of the axioms of ZF plus the axiom of choice (Axiom 9, see
Sec. A.4 below), are used as the foundation of mathematics currently accepted by most
mathematicians.

A.3.1 Existence, Extensionality, Comprehension

Axiom 0 Existence:

 ∃_X (X = X).

Recall that this is just meant to be a more readable transcription of the set-theoretic formula ∃v_1(v_1 = v_1). The axiom of existence states that there exists (at least one) set X.

In Def. 1.18 two sets are defined to be equal if, and only if, they contain precisely the
same elements. In axiomatic set theory, this is guaranteed by the axiom of extensionality:

Axiom 1 Extensionality:

 ∀_X ∀_Y ( ∀_z (z ∈ X ⇔ z ∈ Y) ⇒ X = Y ).

Following [Kun12], we assume that the substitution property of equality is part of the
underlying logic, i.e. if X = Y , then X can be substituted for Y and vice versa without
changing the truth value of a (set-theoretic) formula. In particular, this yields the
converse to extensionality:
 
 ∀_X ∀_Y ( X = Y ⇒ ∀_z (z ∈ X ⇔ z ∈ Y) ).

Before we discuss further consequences of extensionality, we would like to have the


existence of the empty set. However, Axioms 0 and 1 do not suffice to prove the exis-
tence of an empty set (see [Kun12, I.6.3]). This, rather, needs the additional axiom of

comprehension. More precisely, in the case of comprehension, we do not have a single


axiom, but a scheme of infinitely many axioms, one for each set-theoretic formula. Its
formulation makes use of the following definition:
Definition A.7. One obtains the universal closure of a set-theoretic formula φ by writing ∀_{v_j} in front of φ for each variable v_j that occurs as a free variable in φ (recall from Def. 1.31 that v_j is free in φ if, and only if, it is not bound by a quantifier in φ).

Axiom 2 Comprehension Scheme: For each set-theoretic formula φ, not containing Y
as a free variable, the universal closure of

∃_Y ∀_x (x ∈ Y ⇔ (x ∈ X ∧ φ))

is an axiom. Thus, the comprehension scheme states that, given the set X,
there exists (at least one) set Y , containing precisely the elements of X that
have the property φ.

Remark A.8. Comprehension does not provide uniqueness. However, if both Y and
Y ′ are sets containing precisely the elements of X that have the property φ, then
∀_x (x ∈ Y ⇔ (x ∈ X ∧ φ) ⇔ x ∈ Y′),

and, then, extensionality implies Y = Y ′ . Thus, due to extensionality, the set Y given
by comprehension is unique, justifying the notation

{x : x ∈ X ∧ φ} := {x ∈ X : φ} := Y (A.3)

(this is the axiomatic justification for (1.6)).
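
For finite sets, restricted comprehension has a direct computational analogue: Python's
set comprehension syntax builds exactly the set {x ∈ X : φ}. The following sketch is
merely an illustration on concrete finite sets (the choice of X, Z, and φ here is a
hypothetical example, not from the text):

    # restricted comprehension {x in X : phi(x)} over finite sets
    X = {0, 1, 2, 3, 4, 5}
    Z = {1, 3, 5, 7}

    # Y := {x in X : phi} with phi the formula "x is even"
    Y = {x for x in X if x % 2 == 0}
    assert Y == {0, 2, 4}

    # with phi the formula "x in Z" (a free set parameter, cf. Example A.11(a)
    # below), comprehension yields the intersection of X and Z
    assert {x for x in X if x in Z} == X & Z == {1, 3, 5}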


Theorem A.9. There exists a unique empty set (which we denote by ∅ or by 0 – it is
common to identify the empty set with the number zero in axiomatic set theory).

Proof. Axiom 0 provides the existence of a set X. Then comprehension allows us to
define the empty set by

0 := ∅ := {x ∈ X : x ≠ x},

where, as explained in Rem. A.8, extensionality guarantees uniqueness. □
Remark A.10. In Rem. A.4 we said that every formula with additional symbols and
notation is to be regarded as an abbreviation or transcription of a set-theoretic formula
as defined in Def. A.1(b). Thus, formulas containing symbols for defined sets (e.g. 0
or ∅ for the empty set) are to be regarded as abbreviations for formulas without such
symbols. Some logical subtleties arise from the fact that there is some ambiguity in the
way such abbreviations can be resolved: For example, 0 ∈ X can abbreviate either

ψ : ∃_y (φ(y) ∧ y ∈ X)   or   χ : ∀_y (φ(y) ⇒ y ∈ X),   where φ(y) stands for ∀_v (v ∉ y).

Then ψ and χ are equivalent if ∃!_y φ(y) is true (e.g., if Axioms 0 – 2 hold), but they can
be nonequivalent, otherwise (see discussion between Lem. 2.9 and Lem. 2.10 in [Kun80]).

At first glance, the role played by the free variables in φ, which are allowed to occur
in Axiom 2, might seem a bit obscure. So let us consider examples to illustrate that
allowing free variables (i.e. set parameters) in comprehension is quite natural:
Example A.11. (a) Suppose φ in comprehension is the formula x ∈ Z (having Z as a
free variable), then the set given by the resulting axiom is merely the intersection
of X and Z:
X ∩ Z := {x ∈ X : φ} = {x ∈ X : x ∈ Z}.

(b) Note that it is even allowed for φ in comprehension to have X as a free variable, so
one can let φ be the formula ∃_u (x ∈ u ∧ u ∈ X) to define the set

X∗ := {x ∈ X : ∃_u (x ∈ u ∧ u ∈ X)}.

Then, if 0 := ∅, 1 := {0}, 2 := {0, 1}, we obtain


2∗ = {0} = 1.

It is a consequence of extensionality that the mathematical universe consists of sets
and only of sets: Suppose there were other objects in the mathematical universe, for
example a cow C and a monkey M (or any other object without elements, other than
the empty set) – this would be equivalent to allowing a cow or a monkey (or any other
object without elements, other than the empty set) to be considered a set, which would
mean that our set-theoretic variables vj were allowed to be a cow or a monkey as well.
However, extensionality then implies the false statement C = M = ∅, thereby excluding
cows and monkeys from the mathematical universe.
Similarly, {C} and {M } (or any other object that contains a non-set), can not be inside
the mathematical universe. Indeed, otherwise we had
∀_x (x ∈ {C} ⇔ x ∈ {M})
(as C and M are non-sets) and, by extensionality, {C} = {M } were true, in contradic-
tion to a set with a cow inside not being the same as a set with a monkey inside. Thus,
we see that all objects of the mathematical universe must be so-called hereditary sets,
i.e. sets all of whose elements (thinking of the elements as being the children of the sets)
are also sets.

A.3.2 Classes

As we need to avoid contradictions such as Russell’s antinomy, we must not require the
existence of a set {x : φ} for each set-theoretic formula φ. However, it can still be
useful to think of a “collection” of all sets having the property φ. Such collections are
commonly called classes:

Definition A.12. (a) If φ is a set-theoretic formula, then we call {x : φ} a class,
namely the class of all sets that have the property φ (typically, φ will have x as a
free variable).

(b) If φ is a set-theoretic formula, then we say the class {x : φ} exists (as a set) if, and
only if,

∃_X ∀_x (x ∈ X ⇔ φ)    (A.4)

is true. Then X is actually unique by extensionality and we identify X with the
class {x : φ}. If (A.4) is false, then {x : φ} is called a proper class (and the usual
interpretation is that the class is in some sense “too large” to be a set).

Example A.13. (a) Due to Russell’s antinomy of Sec. A.1, we know that R := {x : x ∉ x}
forms a proper class.

(b) The universal class of all sets, V := {x : x = x}, is a proper class. Once again,
this is related to Russell’s antinomy: If V were a set, then

R = {x : x ∉ x} = {x : x = x ∧ x ∉ x} = {x : x ∈ V ∧ x ∉ x}

would also be a set by comprehension. However, this is in contradiction to R being
a proper class by (a).

Remark A.14. From the perspective of formal logic, statements involving proper
classes are to be regarded as abbreviations for statements without proper classes. For
example, it turns out that the class G of all sets forming a group is a proper class. But
we might write G ∈ G as an abbreviation for the statement “The set G is a group.”
A.3.3 Pairing, Union, Replacement

Axioms 0 – 2 are still consistent with the empty set being the only set in existence (see
[Kun12, I.6.13]). The next axiom provides the existence of nonempty sets:

Axiom 3 Pairing:
∀_x ∀_y ∃_Z (x ∈ Z ∧ y ∈ Z).

Thus, the pairing axiom states that, for all sets x and y, there exists a set Z
that contains x and y as elements.

In consequence of the pairing axiom, the sets

0 := ∅, (A.5a)
1 := {0}, (A.5b)
2 := {0, 1} (A.5c)

all exist. More generally, we may define:


Definition A.15. If x, y are sets and Z is given by the pairing axiom, then we call

(a) {x, y} := {u ∈ Z : u = x ∨ u = y} the unordered pair given by x and y,

(b) {x} := {x, x} the singleton set given by x,

(c) (x, y) := {{x}, {x, y}} the ordered pair given by x and y (cf. Def. 2.1).

We can now show that ordered pairs behave as expected:


Lemma A.16. The following holds true:
∀_{x,y,x′,y′} ((x, y) = (x′, y′) ⇔ ((x = x′) ∧ (y = y′))).

Proof. “⇐” is merely

(x, y) = {{x}, {x, y}} = {{x′}, {x′, y′}} = (x′, y′)   (using x = x′ and y = y′).

“⇒” is done by distinguishing two cases: If x = y, then

{{x}} = (x, y) = (x′, y′) = {{x′}, {x′, y′}}.

Next, by extensionality, we first get {x} = {x′} = {x′, y′}, followed by x = x′ = y′,
establishing the case. If x ≠ y, then

{{x}, {x, y}} = (x, y) = (x′, y′) = {{x′}, {x′, y′}},

where, by extensionality, {x} ≠ {x, y} ≠ {x′}. Thus, using extensionality again, {x} =
{x′} and x = x′. Next, we conclude

{x, y} = {x′, y′} = {x, y′}

and a last application of extensionality yields y = y′. □
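
Since hereditarily finite sets can be modeled faithfully by nested frozensets, the
characteristic property of Lemma A.16 can be verified mechanically on small examples.
A minimal sketch (the frozenset encoding and the three-element test universe are
illustrative assumptions):

    from itertools import product

    def pair(x, y):
        # Kuratowski ordered pair (x, y) := {{x}, {x, y}} of Def. A.15(c)
        return frozenset({frozenset({x}), frozenset({x, y})})

    # check Lemma A.16 exhaustively on a small test universe
    universe = ["a", "b", "c"]
    for x, y, x2, y2 in product(universe, repeat=4):
        assert (pair(x, y) == pair(x2, y2)) == (x == x2 and y == y2)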

While we now have the existence of the infinitely many different sets 0, {0}, {{0}}, . . . ,
we are not, yet, able to form sets containing more than two elements. This is remedied
by the following axiom:

Axiom 4 Union:

∀_M ∃_Y ∀_x ∀_X ((x ∈ X ∧ X ∈ M) ⇒ x ∈ Y).
Thus, the union axiom states that, for each set of sets M, there exists a set
Y containing all elements of elements of M.
Definition A.17. (a) If M is a set and Y is given by the union axiom, then define

⋃M := ⋃_{X∈M} X := {x ∈ Y : ∃_{X∈M} (x ∈ X)}.

(b) If X and Y are sets, then define

X ∪ Y := ⋃{X, Y}.

(c) If x, y, z are sets, then define


{x, y, z} := {x, y} ∪ {z}.
Remark A.18. (a) The definition of set-theoretic unions as

⋃_{i∈I} Ai := {x : ∃_{i∈I} (x ∈ Ai)}

in (1.25b) will be equivalent to the definition in Def. A.17(a) if we are allowed to
form the set
M := {Ai : i ∈ I}.
If I is a set and Ai is a set for each i ∈ I, then M as above will be a set by Axiom
5 below (the axiom of replacement).
(b) In contrast to unions, intersections can be obtained directly from comprehension
without the introduction of an additional axiom: For example,

X ∩ Y := {x ∈ X : x ∈ Y},
⋂_{i∈I} Ai := {x ∈ Ai0 : ∀_{i∈I} (x ∈ Ai)},

where i0 ∈ I ≠ ∅ is an arbitrary fixed element of I.

(c) The union

⋃∅ = ⋃_{X∈∅} X = ⋃_{i∈∅} Ai = ∅

is the empty set – in particular, a set. However,

⋂∅ = {x : ∀_{X∈∅} (x ∈ X)} = V = {x : ∀_{i∈∅} (x ∈ Ai)} = ⋂_{i∈∅} Ai,

i.e. the intersection over the empty set is the class of all sets – in particular, a proper
class and not a set.

Definition A.19. We define the successor function

x ↦ S(x) := x ∪ {x} (for each set x).

Thus, recalling (A.5), we have 1 = S(0), 2 = S(1); and we can define 3 := S(2), . . . In
general, we call the set S(x) the successor of the set x.
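
The first few von Neumann numerals can be built literally from Def. A.19, modeling
sets as frozensets. A small illustrative sketch (the names ZERO, successor, etc. are
hypothetical, used only for this demonstration):

    ZERO = frozenset()                     # 0 := ∅

    def successor(x):
        # S(x) := x ∪ {x} of Def. A.19 (x must itself be a frozenset)
        return frozenset(x | {x})

    ONE = successor(ZERO)                  # 1 = {0}
    TWO = successor(ONE)                   # 2 = {0, 1}
    THREE = successor(TWO)                 # 3 = {0, 1, 2}

    assert ONE == frozenset({ZERO})
    assert TWO == frozenset({ZERO, ONE})
    assert len(THREE) == 3                 # the numeral n has exactly n elements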

In Def. 2.3 and Def. 2.19, respectively, we define functions and relations in the usual
manner, making use of the Cartesian product A × B of two sets A and B, which,
according to (2.2) consists of all ordered pairs (x, y), where x ∈ A and y ∈ B. However,
Axioms 0 – 4 are not sufficient to justify the existence of Cartesian products. To obtain
Cartesian products, we employ the axiom of replacement. Analogous to the axiom
of comprehension, the following axiom of replacement actually consists of a scheme of
infinitely many axioms, one for each set-theoretic formula:

Axiom 5 Replacement Scheme: For each set-theoretic formula φ, not containing Y as a
free variable, the universal closure of

(∀_{x∈X} ∃!_y φ) ⇒ ∃_Y ∀_{x∈X} ∃_{y∈Y} φ
is an axiom. Thus, the replacement scheme states that if, for each x ∈ X,
there exists a unique y having the property φ (where, in general, φ will depend
on x), then there exists a set Y that, for each x ∈ X, contains this y with
property φ. One can view this as obtaining Y by replacing each x ∈ X by
the corresponding y = y(x).

Theorem A.20. If A and B are sets, then the Cartesian product of A and B, i.e. the
class

A × B := {x : ∃_{a∈A} ∃_{b∈B} (x = (a, b))}

exists as a set.

Proof. For each a ∈ A, we can use replacement with X := B and φ := φa being the
formula y = (a, x) to obtain the existence of the set

{a} × B := {(a, x) : x ∈ B} (A.6a)

(in the usual way, comprehension and extensionality were used as well). Analogously,
using replacement again with X := A and φ being the formula y = {x} × B, we obtain
the existence of the set
M := {{x} × B : x ∈ A}. (A.6b)
In a final step, the union axiom now shows
⋃M = ⋃_{a∈A} ({a} × B) = A × B    (A.6c)

to be a set as well. □
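
For finite A and B, the construction in the proof can be retraced step by step:
"replacement" produces the slices {a} × B of (A.6a) and collects them into the set M
of (A.6b), and the union merges them as in (A.6c). A sketch under these finite,
illustrative assumptions:

    A = {"a", "b"}
    B = {0, 1, 2}

    # (A.6a)/(A.6b): for each a, replacement yields the slice {a} x B;
    # collecting the slices gives M = {{x} x B : x in A}
    M = {frozenset({(a, x) for x in B}) for a in A}

    # (A.6c): the union axiom merges the slices into A x B
    AxB = {p for S in M for p in S}
    assert AxB == {(a, b) for a in A for b in B}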

A.3.4 Infinity, Ordinals, Natural Numbers

The following axiom of infinity guarantees the existence of infinite sets (e.g., it will allow
us to define the set of natural numbers N, which is infinite by Th. A.46 below).

Axiom 6 Infinity:

∃_X (0 ∈ X ∧ ∀_{x∈X} (x ∪ {x} ∈ X)).

Thus, the infinity axiom states the existence of a set X containing ∅ (iden-
tified with the number 0), and, for each of its elements x, its successor
S(x) = x ∪ {x}.
In preparation for our official definition of N in Def. A.41 below, we will study so-called
ordinals, which are special sets also of further interest to the field of set theory (the
natural numbers will turn out to be precisely the finite ordinals). We also need some
notions from the theory of relations, in particular, order relations (cf. Def. 2.19 and Def.
2.23).

Definition A.21. Let R be a relation on a set X.

(a) R is called asymmetric if, and only if,

∀_{x,y∈X} (xRy ⇒ ¬(yRx)),    (A.7)

i.e. if x is related to y only if y is not related to x.

(b) R is called a strict partial order if, and only if, R is asymmetric and transitive. It is
noted that this is consistent with Not. 2.24, since, recalling the notation ∆(X) :=
{(x, x) : x ∈ X}, R is a partial order on X if, and only if, R \ ∆(X) is a strict
partial order on X. We extend the notions lower/upper bound, min, max, inf, sup
of Def. 2.25 to strict partial orders R by applying them to R ∪∆(X): We call x ∈ X
a lower bound of Y ⊆ X with respect to R if, and only if, x is a lower bound of Y
with respect to R ∪ ∆(X), and analogous for the other notions.

(c) A strict partial order R is called a strict total order or a strict linear order if, and
only if, for each x, y ∈ X, one has x = y or xRy or yRx.

(d) R is called a (strict) well-order if, and only if, R is a (strict) total order and every
nonempty subset of X has a min with respect to R (for example, the usual ≤
constitutes a well-order on N (see, e.g., [Phi16, Th. D.5]), but not on R (e.g., R+
does not have a min)).

(e) If Y ⊆ X, then the relation on Y defined by

xSy :⇔ xRy

is called the restriction of R to Y , denoted S = R↾Y (usually, one still writes R for
the restriction).

Lemma A.22. Let R be a relation on a set X and Y ⊆ X.

(a) If R is transitive, then R↾Y is transitive.

(b) If R is reflexive, then R↾Y is reflexive.


(c) If R is antisymmetric, then R↾Y is antisymmetric.

(d) If R is asymmetric, then R↾Y is asymmetric.

(e) If R is a (strict) partial order, then R↾Y is a (strict) partial order.

(f ) If R is a (strict) total order, then R↾Y is a (strict) total order.

(g) If R is a (strict) well-order, then R↾Y is a (strict) well-order.

Proof. (a): If a, b, c ∈ Y with aRb and bRc, then aRc, since a, b, c ∈ X and R is transitive
on X.
(b): If a ∈ Y , then a ∈ X and aRa, since R is reflexive on X.
(c): If a, b ∈ Y with aRb and bRa, then a = b, since a, b ∈ X and R is antisymmetric
on X.
(d): If a, b ∈ Y with aRb, then ¬bRa, since a, b ∈ X and R is asymmetric on X.
(e) follows by combining (a) – (d).
(f): If a, b ∈ Y with a ≠ b and ¬aRb, then bRa, since a, b ∈ X and R is total on X.
Combining this with (e) yields (f).
(g): Due to (f), it merely remains to show that every nonempty subset Z ⊆ Y has a
min. However, since Z ⊆ X and R is a well-order on X, there is m ∈ Z such that m is
a min for R on X, implying m to be a min for R on Y as well. □

Remark A.23. Since the universal class V is not a set, ∈ is not a relation in the sense
of Def. 2.19. It can be considered as a “class relation”, i.e. a subclass of V × V, but
it is a proper class. However, ∈ does constitute a relation in the sense of Def. 2.19 on
each set X (recalling that each element of X must be a set as well). More precisely, if
X is a set, then so is
R∈ := {(x, y) ∈ X × X : x ∈ y}. (A.8a)
Then
∀_{x,y∈X} ((x, y) ∈ R∈ ⇔ x ∈ y).    (A.8b)

Definition A.24. A set X is called transitive if, and only if, every element of X is also
a subset of X:
∀_{x∈X} (x ⊆ X).    (A.9a)

Clearly, (A.9a) is equivalent to

∀_{x,y} ((x ∈ y ∧ y ∈ X) ⇒ x ∈ X).    (A.9b)
Lemma A.25. If X, Y are transitive sets, then X ∩ Y is a transitive set.

Proof. If x ∈ X ∩ Y and y ∈ x, then y ∈ X (since X is transitive) and y ∈ Y (since Y
is transitive). Thus y ∈ X ∩ Y, showing X ∩ Y is transitive. □
Definition A.26. (a) A set α is called an ordinal number or just an ordinal if, and only
if, α is transitive and ∈ constitutes a strict well-order on α. An ordinal α is called a
successor ordinal if, and only if, there exists an ordinal β such that α = S(β), where
S is the successor function of Def. A.19. An ordinal α ≠ 0 is called a limit ordinal
if, and only if, it is not a successor ordinal. We denote the class of all ordinals by
ON (it is a proper class by Cor. A.33 below).
(b) We define

∀_{α,β∈ON} (α < β :⇔ α ∈ β),    (A.10a)
∀_{α,β∈ON} (α ≤ β :⇔ α < β ∨ α = β).    (A.10b)

Example A.27. Using (A.5), 0 = ∅ is an ordinal, and 1 = S(0), 2 = S(1) are both
successor ordinals (in Prop. A.43, we will identify N0 as the smallest limit ordinal). Even
though X := {1} and Y := {0, 2} are well-ordered by ∈, they are not ordinals, since
they are not transitive sets: 1 ∈ X, but 1 ⊈ X (since 0 ∈ 1, but 0 ∉ X); similarly,
1 ∈ 2 ∈ Y, but 1 ∉ Y.
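
Whether a concrete hereditarily finite set is transitive is a mechanical check of (A.9a);
the following sketch confirms the claims of Example A.27 for 2 = {0, 1}, X = {1}, and
Y = {0, 2} (again using frozensets as an illustrative encoding):

    def is_transitive(X):
        # X is transitive iff every element of X is also a subset of X, (A.9a)
        return all(x <= X for x in X)

    ZERO = frozenset()
    ONE = frozenset({ZERO})
    TWO = frozenset({ZERO, ONE})

    assert is_transitive(TWO)                        # the ordinal 2 = {0, 1}
    assert not is_transitive(frozenset({ONE}))       # X = {1}: 0 ∈ 1, but 0 ∉ X
    assert not is_transitive(frozenset({ZERO, TWO})) # Y = {0, 2}: 1 ∈ 2, but 1 ∉ Y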
Lemma A.28. No ordinal contains itself, i.e.

∀_{α∈ON} (α ∉ α).

Proof. If α is an ordinal, then ∈ is a strict order on α. Due to asymmetry of strict
orders, x ∈ x can not be true for any element of α, implying that α ∈ α can not be
true. □
Proposition A.29. Every element of an ordinal is an ordinal, i.e.
∀_{α∈ON} (X ∈ α ⇒ X ∈ ON)

(in other words, ON is a transitive class).

Proof. Let α ∈ ON and X ∈ α. Since α is transitive, we have X ⊆ α. As ∈ is a
strict well-order on α, it must also be a strict well-order on X by Lem. A.22(g). In
consequence, it only remains to prove that X is transitive as well. To this end, let
x ∈ X. Then x ∈ α, as α is transitive. If y ∈ x, then, using transitivity of α again,
y ∈ α. Now y ∈ X, as ∈ is transitive on α, proving x ⊆ X, i.e. X is transitive. □
Proposition A.30. If α, β ∈ ON, then X := α ∩ β ∈ ON (we will see in Th. A.35(a)
below that, actually, α ∩ β = min{α, β}).

Proof. X is transitive by Lem. A.25, and, since X ⊆ α, ∈ is a strict well-order on X by
Lem. A.22(g). □

Proposition A.31. On the class ON, the relation ≤ (as defined in (A.10)) is the same
as the relation ⊆, i.e.
∀_{α,β∈ON} (α ≤ β ⇔ α ⊆ β ⇔ (α ∈ β ∨ α = β)).    (A.11)

Proof. Let α, β ∈ ON.


Assume α ≤ β. If α = β, then α ⊆ β. If α ∈ β, then α ⊆ β, since β is transitive.
Conversely, assume α ⊆ β and α ≠ β. We have to show α ∈ β. To this end, we set
X := β \ α. Then X ≠ ∅ and, as ∈ well-orders β, we can let m := min X. We will show
m = α (note that this will complete the proof, due to α = m ∈ X ⊆ β). If µ ∈ m, then
µ ∈ β (since m ∈ β and β is transitive) and µ ∉ X (since m = min X), implying µ ∈ α
(since X = β \ α) and, thus, m ⊆ α. Seeking a contradiction, assume m ≠ α. Then
there must be some γ ∈ α \ m ⊆ α ⊆ β. In consequence, γ, m ∈ β. As γ ∉ m and ∈ is
a total order on β, we must have either m = γ or m ∈ γ. However, m ≠ γ, since γ ∈ α
and m ∉ α (as m ∈ X). So it must be m ∈ γ ∈ α, implying m ∈ α, as α is transitive.
This contradiction proves m = α and establishes the proposition. □

Theorem A.32. The class ON is well-ordered by ∈, i.e.

(i) ∈ is transitive on ON:

∀_{α,β,γ∈ON} ((α < β ∧ β < γ) ⇒ α < γ).

(ii) ∈ is asymmetric on ON:

∀_{α,β∈ON} (α < β ⇒ ¬(β < α)).

(iii) Ordinals are always comparable:

∀_{α,β∈ON} (α < β ∨ β < α ∨ α = β).

(iv) Every nonempty set of ordinals has a min.


Proof. (i) is clear, as γ is a transitive set.


(ii): If α, β ∈ ON, then α ∈ β ∈ α implies α ∈ α by (i), which is a contradiction to
Lem. A.28.
(iii): Let γ := α ∩ β. Then γ ∈ ON by Prop. A.30. Thus, by Prop. A.31,

γ ⊆ α ∧ γ ⊆ β ⇒ (γ ∈ α ∨ γ = α) ∧ (γ ∈ β ∨ γ = β).    (A.12)

If γ ∈ α and γ ∈ β, then γ ∈ α ∩ β = γ, in contradiction to Lem. A.28. Thus, by (A.12),
γ = α or γ = β. If γ = α, then α ⊆ β. If γ = β, then β ⊆ α, completing the proof of
(iii).
(iv): Let X be a nonempty set of ordinals and consider α ∈ X. If α = min X, then
we are already done. Otherwise, Y := α ∩ X = {β ∈ X : β ∈ α} ≠ ∅. Since α is
well-ordered by ∈, there is m := min Y. If β ∈ X, then either β < α or α ≤ β by (iii).
If β < α, then β ∈ Y and m ≤ β. If α ≤ β, then m < α ≤ β. Thus, m = min X,
proving (iv). □
Corollary A.33. ON is a proper class (i.e. there is no set containing all the ordinals).

Proof. If there is a set X containing all ordinals, then, by comprehension, β := ON =
{α ∈ X : α is an ordinal} must be a set as well. But then Prop. A.29 says that the set
β is transitive and Th. A.32 yields that the set β is well-ordered by ∈, implying β to be
an ordinal, i.e. β ∈ β in contradiction to Lem. A.28. □
Corollary A.34. For each set X of ordinals, we have:

(a) X is well-ordered by ∈.

(b) X is an ordinal if, and only if, X is transitive. Note: A transitive set of ordinals
X is sometimes called an initial segment of ON, since, here, transitivity can be
restated in the form
∀_{α∈ON} ∀_{β∈X} (α < β ⇒ α ∈ X).    (A.13)

Proof. (a) is a simple consequence of Th. A.32(i)-(iv).


(b) is immediate from (a). □
Theorem A.35. Let X be a nonempty set of ordinals.
(a) Then γ := ⋂X is an ordinal, namely γ = min X. In particular, if α, β ∈ ON,
then min{α, β} = α ∩ β.

(b) Then δ := ⋃X is an ordinal, namely δ = sup X. In particular, if α, β ∈ ON, then
max{α, β} = α ∪ β.

Proof. (a): Let m := min X. Then γ ⊆ m, since m ∈ X. Conversely, if α ∈ X, then
m ≤ α implies m ⊆ α by Prop. A.31, i.e. m ⊆ γ. Thus, m = γ, proving (a).
(b): To show δ ∈ ON, we need to show δ is transitive (then δ is an ordinal by Cor.
A.34(b)). If α ∈ δ, then there is β ∈ X such that α ∈ β. Thus, if γ ∈ α, then γ ∈ β,
since β is transitive. As γ ∈ β implies γ ∈ δ, we see that δ is transitive, as needed. It
remains to show δ = sup X. If α ∈ X, then α ⊆ δ, i.e. α ≤ δ, showing δ to be an upper
bound for X. Now let u ∈ ON be an arbitrary upper bound for X, i.e.

∀_{α∈X} (α ⊆ u).

Thus, δ ⊆ u, i.e. δ ≤ u, proving δ = sup X. □

Next, we obtain some results regarding the successor function of Def. A.19 in the context
of ordinals.

Lemma A.36. We have

∀_{α∈ON} ((x, y ∈ S(α) ∧ x ∈ y) ⇒ x ≠ α).

Proof. Seeking a contradiction, we reason as follows:

x = α  ⇒(α ∉ α)  y ≠ α  ⇒(y ∈ S(α))  y ∈ α  ⇒(α transitive)  y ⊆ α  ⇒(x ∈ y)  α ∈ α.

This contradiction to α ∉ α yields x ≠ α, concluding the proof. □

Proposition A.37. For each α ∈ ON, the following holds:

(a) S(α) ∈ ON.

(b) α < S(α).

(c) For each ordinal β, β < S(α) holds if, and only if, β ≤ α.

(d) For each ordinal β, if β < α, then S(β) < S(α).

(e) For each ordinal β, if S(β) < S(α), then β < α.


Proof. (a): Due to Prop. A.29, S(α) is a set of ordinals. Thus, by Cor. A.34(b), it
merely remains to prove that S(α) is transitive. Let x ∈ S(α). If x = α, then x =
α ⊆ α ∪ {α} = S(α). If x 6= α, then x ∈ α and, since α is transitive, this implies
x ⊆ α ⊆ S(α), showing S(α) to be transitive, thereby completing the proof of (a).
(b) holds, as α ∈ S(α) holds by the definition of S(α).
(c) is clear, since, for each ordinal β,

β < S(α) ⇔ β ∈ S(α) ⇔ β ∈ α ∨ β = α ⇔ β ≤ α.

(d): If β < α, then S(β) = β ∪ {β} ⊆ α, i.e. S(β) ≤ α < S(α).


(e) follows from (d) using contraposition: If ¬(β < α), then β = α or α < β, implying
S(β) = S(α) or S(α) < S(β), i.e. ¬(S(β) < S(α)). □

We now proceed to define the natural numbers:

Definition A.38. An ordinal n is called a natural number if, and only if,
 
n ≠ 0 ∧ ∀_{m∈ON} (m ≤ n ⇒ (m = 0 ∨ m is a successor ordinal)).

Proposition A.39. If n = 0 or n is a natural number, then S(n) is a natural number
and every element of n is a natural number or 0.

Proof. Suppose n is 0 or a natural number. If m ∈ n, then m is an ordinal by Prop.
A.29. Suppose m ≠ 0 and k ∈ m. Then k ∈ n, since n is transitive. Since n is a natural
number, k = 0 or k is a successor ordinal. Thus, m is a natural number. It remains
to show that S(n) is a natural number. By definition, S(n) = n ∪ {n} ≠ 0. Moreover,
S(n) ∈ ON by Prop. A.37(a), and, thus, S(n) is a successor ordinal. If m ∈ S(n), then
m ≤ n, implying m = 0 or m is a successor ordinal, completing the proof that S(n) is
a natural number. □

Theorem A.40 (Principle of Induction). If X is a set satisfying

0 ∈ X ∧ ∀_{x∈X} (S(x) ∈ X),    (A.14)

then X contains 0 and all natural numbers.

Proof. Let X be a set satisfying (A.14). Then 0 ∈ X is immediate. Let n be a natural
number and, seeking a contradiction, assume n ∉ X. Consider N := S(n) \ X. According
to Prop. A.39, S(n) is a natural number and all nonzero elements of S(n) are natural
numbers. Since N ⊆ S(n) and 0 ∈ X, 0 ∉ N and all elements of N must be natural
numbers. As n ∈ N, N ≠ 0. Since S(n) is well-ordered by ∈ and 0 ≠ N ⊆ S(n), N
must have a min m ∈ N, 0 ≠ m ≤ n. Since m is a natural number, there must be k
such that m = S(k). Then k < m, implying k ∉ N. On the other hand,

k < m ∧ m ≤ n ⇒ k ≤ n ⇒ k ∈ S(n).

Thus, k ∈ X, implying m = S(k) ∈ X, in contradiction to m ∈ N. This contradiction
proves n ∈ X, thereby establishing the case. □

Definition A.41. If the set X is given by the axiom of infinity, then we use compre-
hension to define the set

N0 := {n ∈ X : n = 0 ∨ n is a natural number}

and note N0 to be unique by extensionality. We also denote N := N0 \ {0}. In set theory,
it is also very common to use the symbol ω for the set N0 .

Corollary A.42. N0 is the set of all natural numbers and 0, i.e.


 
∀_n (n ∈ N0 ⇔ (n = 0 ∨ n is a natural number)).

Proof. “⇒” is clear from Def. A.41 and “⇐” is due to Th. A.40. □

Proposition A.43. ω = N0 is the smallest limit ordinal.

Proof. Since ω is a set of ordinals and ω is transitive by Prop. A.39, ω is an ordinal
by Cor. A.34(b). Moreover, ω ≠ 0, since 0 ∈ ω; and ω is not a successor ordinal (if
ω = S(n) = n ∪ {n}, then n ∈ ω and S(n) ∈ ω by Prop. A.39, in contradiction to
ω = S(n)), implying it is a limit ordinal. To see that ω is the smallest limit ordinal, let
α ∈ ON, α < ω. Then α ∈ ω, that means α = 0 or α is a natural number (in particular,
a successor ordinal), so α is not a limit ordinal. □

In the following Th. A.44, we will prove that N satisfies the Peano axioms P1 – P3 of
Sec. 3.1 (if one prefers, one can show the same for N0 , where 0 takes over the role of 1).

Theorem A.44. The set of natural numbers N satisfies the Peano axioms P1 – P3 of
Sec. 3.1.

Proof. For P1 and P2, we have to show that, for each n ∈ N, one has S(n) ∈ N \ {1}
and that S(m) 6= S(n) for each m, n ∈ N, m 6= n. Let n ∈ N. Then S(n) ∈ N by Prop.
A.39. If S(n) = 1, then n < S(n) = 1 by Prop. A.37(b), i.e. n = 0, in contradiction to
n ∈ N. If m, n ∈ N with m ≠ n, then S(m) ≠ S(n) is due to Prop. A.37(d). To prove
P3, suppose A ⊆ N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A. We need
to show A = N (i.e. N ⊆ A, as A ⊆ N is assumed). Let X := A ∪ {0}. Then X satisfies
(A.14) and Th. A.40 yields N0 ⊆ X. Thus, if n ∈ N, then n ∈ X \ {0} = A, showing
N ⊆ A. □
Notation A.45. For each n ∈ N0 , we introduce the notation n + 1 := S(n) (more
generally, one also defines α + 1 := S(α) for each ordinal α).
Theorem A.46. Let n ∈ N0 . Then A := N0 \ n is infinite (see Def. 3.10(b)). In
particular, N0 and N = N0 \ {0} = N0 \ 1 are infinite.

Proof. Since n ∉ n, we have n ∈ A ≠ ∅. Thus, if A were finite, then there were a
bijection f : A −→ Am := {1, . . . , m} = {k ∈ N : k ≤ m} for some m ∈ N. However,
we will show by induction on m ∈ N that there is no injective map f : A −→ Am. Since
S(n) ∉ n, we have S(n) ∈ A. Thus, if f : A −→ A1 = {1}, then f(n) = f(S(n)),
showing that f is not injective and proving the case m = 1. For the induction step, we
proceed by contraposition and show that the existence of an injective map f : A −→
Am+1 , m ∈ N, (cf. Not. A.45) implies the existence of an injective map g : A −→ Am .
To this end, let m ∈ N and f : A −→ Am+1 be injective. If m + 1 ∉ f(A), then f itself
is an injective map into Am. If m + 1 ∈ f(A), then there is a unique a ∈ A such that
f(a) = m + 1. Define

g : A −→ Am,  g(k) := { f(k) for k < a;  f(k + 1) for a ≤ k }.    (A.15)

Then g is well-defined: If k ∈ A and a ≤ k, then k + 1 ∈ A \ {a}, and, since f is injective,
g does, indeed, map into Am. We verify g to be injective: If k, l ∈ A, k < l, then also
k < l + 1 and k + 1 ≠ l + 1 (by Peano axiom P2 – k + 1 < l + 1 then also follows, but we
do not make use of that here). In each case, g(k) ≠ g(l), proving g to be injective. □

For more basic information regarding ordinals see, e.g., [Kun12, Sec. I.8].

A.3.5 Power Set

There is one more basic construction principle for sets that is not covered by Axioms 0
– 6, namely the formation of power sets. This needs another axiom:

Axiom 7 Power Set:

∀_X ∃_M ∀_Y (Y ⊆ X ⇒ Y ∈ M).
Thus, the power set axiom states that, for each set X, there exists a set M
that contains all subsets Y of X as elements.
Definition A.47. If X is a set and M is given by the power set axiom, then we call

P(X) := {Y ∈ M : Y ⊆ X}

the power set of X. Another common notation for P(X) is 2^X (cf. Prop. 2.18).
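
For a finite set X, the power set P(X) can be enumerated directly, e.g. by collecting
all k-element combinations; the count 2^{#X} matches the notation 2^X. A small sketch
(the helper power_set is a hypothetical illustration):

    from itertools import chain, combinations

    def power_set(X):
        # P(X) for finite X, as the set of all k-element subsets, k = 0..#X
        xs = list(X)
        subsets = chain.from_iterable(
            combinations(xs, k) for k in range(len(xs) + 1))
        return {frozenset(c) for c in subsets}

    P = power_set({0, 1, 2})
    assert len(P) == 2 ** 3        # #P(X) = 2^(#X), matching the notation 2^X
    assert frozenset() in P and frozenset({0, 1, 2}) in P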

A.3.6 Foundation

Foundation is, perhaps, the least important of the axioms in ZF. It basically cleanses
the mathematical universe of unnecessary “clutter”, i.e. of certain pathological sets that
are of no importance to standard mathematics anyway.

Axiom 8 Foundation:
∀_X (∃_x (x ∈ X) ⇒ ∃_{x∈X} ¬∃_z (z ∈ x ∧ z ∈ X)).

Thus, the foundation axiom states that every nonempty set X contains an
element x that is disjoint to X.

Theorem A.48. Due to the foundation axiom, the ∈ relation can have no cycles, i.e.
there do not exist sets x1 , x2 , . . . , xn , n ∈ N, such that

x1 ∈ x2 ∈ · · · ∈ xn ∈ x1 . (A.16a)

In particular, sets can not be members of themselves:

¬∃_x (x ∈ x).    (A.16b)

Proof. If there were sets x1 , x2 , . . . , xn , n ∈ N, such that (A.16a) were true, then, by
using the pairing axiom and the union axiom, we could form the set

X := {x1 , . . . , xn }.

Then, in contradiction to the foundation axiom, X ∩ xi ≠ ∅ for each i = 1, . . . , n:
Indeed, xn ∈ X ∩ x1, and xi−1 ∈ X ∩ xi for each i = 2, . . . , n. □

For a detailed explanation of why “sets” forbidden by foundation do not occur in standard
mathematics anyway, see, e.g., [Kun12, Sec. I.14].
A.4 The Axiom of Choice


In addition to the axioms of ZF discussed in the previous section, there is one more
axiom, namely the axiom of choice (AC), that, together with ZF, makes up ZFC, the
axiom system at the basis of current standard mathematics. Even though AC is used
and accepted by most mathematicians, it does have the reputation of being somewhat
less “natural”. Thus, many mathematicians try to avoid the use of AC where possible,
and it is often pointed out explicitly if a result depends on the use of AC (but this
practice is by no means consistent, neither in the literature nor in this class, and one
might sometimes be surprised by which seemingly harmless result does actually depend
on AC in some subtle, nonobvious way). We will now state the axiom:

Axiom 9 Axiom of Choice (AC):


∀_M (∅ ∉ M ⇒ ∃_{f : M −→ ⋃_{N∈M} N} ∀_{M∈M} (f(M) ∈ M)).

Thus, the axiom of choice postulates, for each nonempty set M, whose ele-
ments are all nonempty sets, the existence of a choice function, that means
a function that assigns, to each M ∈ M, an element m ∈ M .
Example A.49. For example, the axiom of choice postulates, for each nonempty set
A, the existence of a choice function on P(A) \ {∅} that assigns each subset of A one of
its elements.
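
For finite data, a choice function can simply be written down, so no axiom is needed;
AC is only required to guarantee its existence in general. The sketch below (an
illustrative assumption, with sets of integers) picks the minimum of each member – the
same device used in the proof of “(iv) ⇒ AC” in Th. A.52 below, where the min comes
from a well-order:

    def choice(M):
        # a choice function on a finite set M of nonempty sets of integers:
        # f(S) := min S, an element of S
        return {S: min(S) for S in M}

    M = {frozenset({3, 1, 4}), frozenset({1, 5}), frozenset({9})}
    f = choice(M)
    assert all(f[S] in S for S in M)   # f(M) ∈ M for every M ∈ M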

The axiom of choice is remarkable since, at first glance, it seems so natural that one
can hardly believe it is not provable from the axioms in ZF. However, one can actually
show that it is neither provable nor disprovable from ZF (see, e.g., [Jec73, Th. 3.5, Th.
5.16] – such a result is called an independence proof, see [Kun80] for further material). If
you want to convince yourself that the existence of choice functions is, indeed, a tricky
matter, try to define a choice function on P(R) \ {∅} without AC (but do not spend too
much time on it – one can show this is actually impossible to accomplish).
Theorem A.52 below provides several important equivalences of AC. Its statement and
proof needs some preparation. We start by introducing some more relevant notions from
the theory of partial orders:
Definition A.50. Let X be a set and let ≤ be a partial order on X.

(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists
no x ∈ X such that m < x (note that a maximal element does not have to be a
max and that a maximal element is not necessarily unique).
(b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by
≤. Moreover, a chain C is called maximal if, and only if, no strict superset Y of C
(i.e. no Y ⊆ X such that C ⊊ Y) is a chain.
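
Both notions can be tested mechanically on a small example; the sketch below (a
hypothetical three-element poset under divisibility) also illustrates the parenthetical
remark in (a): the maximal elements 3 and 4 are not unique, and neither is a max of X.

    from itertools import combinations

    X = {2, 3, 4}                     # partially ordered by divisibility
    le = lambda a, b: b % a == 0      # a <= b  :<=>  a divides b

    # maximal elements (Def. A.50(a)): no x in X with m < x
    maximal = {m for m in X if not any(le(m, x) and m != x for x in X)}
    assert maximal == {3, 4}

    def is_chain(C):
        # C is a chain iff C is nonempty and totally ordered (Def. A.50(b))
        return len(C) > 0 and all(le(a, b) or le(b, a)
                                  for a, b in combinations(C, 2))

    assert is_chain({2, 4}) and not is_chain({3, 4})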

The following lemma is a bit technical and will be used to prove the implication AC
⇒ (ii) in Th. A.52 (other proofs in the literature often make use of so-called transfinite
recursion, but that would mean further developing the theory of ordinals, and we will
not pursue this route in this class).
Lemma A.51. Let X be a set and let ∅ ≠ M ⊆ P(X) be a nonempty set of subsets of
X. We let M be partially ordered by inclusion, i.e. setting A ≤ B :⇔ A ⊆ B for each
A, B ∈ M. Moreover, define

∀_{S⊆M}  ⋃S := ⋃_{S∈S} S    (A.17)

and assume

∀_{C⊆M} (C is a chain ⇒ ⋃C ∈ M).    (A.18)

If the function g : M −→ M has the property that

∀_{M∈M} (M ⊆ g(M) ∧ #(g(M) \ M) ≤ 1),    (A.19)

then g has a fixed point, i.e.


∃_{M∈M} (g(M) = M).    (A.20)

Proof. Fix some arbitrary M0 ∈ M. We call T ⊆ M an M0 -tower if, and only if, T
satisfies the following three properties

(i) M0 ∈ T .
(ii) If C ⊆ T is a chain, then ⋃C ∈ T.

(iii) If M ∈ T , then g(M ) ∈ T .

Let T := {T ⊆ M : T is an M0-tower}. If T1 := {M ∈ M : M0 ⊆ M}, then, clearly,
T1 is an M0-tower and, in particular, T ≠ ∅. Next, we note that the intersection of all
M0-towers, i.e. T0 := ⋂_{T∈T} T, is also an M0-tower. Clearly, no strict subset of T0 can
be an M0-tower and

M ∈ T0 ⇒ M ∈ T1 ⇒ M0 ⊆ M.    (A.21)
The main work of the rest of the proof consists of showing that T0 is a chain. To show
T0 to be a chain, define
Γ := {M ∈ T0 : ∀_{N∈T0} (M ⊆ N ∨ N ⊆ M)}.    (A.22)

We intend to show that Γ = T0 by verifying that Γ is an M0-tower. As an intermediate
step, we define

∀_{M∈Γ}  Φ(M) := {N ∈ T0 : N ⊆ M ∨ g(M) ⊆ N}

and also show each Φ(M) to be an M0-tower. Actually, Γ and each Φ(M) satisfy (i)
due to (A.21). To verify Γ satisfies (ii), let C ⊆ Γ be a chain and U := ⋃C. Then
U ∈ T0, since T0 satisfies (ii). If N ∈ T0, and C ⊆ N for each C ∈ C, then U ⊆ N. If
N ∈ T0, and there is C ∈ C such that C ⊈ N, then N ⊆ C (since C ∈ Γ), i.e. N ⊆ U,
showing U ∈ Γ and Γ satisfying (ii). Now, let M ∈ Γ. To verify Φ(M) satisfies (ii), let
C ⊆ Φ(M) be a chain and U := ⋃C. Then U ∈ T0, since T0 satisfies (ii). If U ⊆ M, then
U ∈ Φ(M) as desired. If U ⊈ M, then there is x ∈ U such that x ∉ M. Thus, there
is C ∈ C such that x ∈ C and g(M) ⊆ C (since C ∈ Φ(M)), i.e. g(M) ⊆ U, showing
U ∈ Φ(M) also in this case, and Φ(M) satisfies (ii). We will verify that Φ(M) satisfies
(iii) next. For this purpose, fix N ∈ Φ(M ). We need to show g(N ) ∈ Φ(M ). We already
know g(N ) ∈ T0 , as T0 satisfies (iii). As N ∈ Φ(M ), we can now distinguish three cases.
Case 1: N ⊊ M. In this case, we cannot have M ⊊ g(N) (otherwise, #(g(N) \ N) ≥ 2
in contradiction to (A.19)). Thus, g(N ) ⊆ M (since M ∈ Γ), showing g(N ) ∈ Φ(M ).
Case 2: N = M . Then g(N ) = g(M ) ∈ Φ(M ) (since g(M ) ∈ T0 and g(M ) ⊆ g(M )).
Case 3: g(M ) ⊆ N . Then g(M ) ⊆ g(N ) by (A.19), again showing g(N ) ∈ Φ(M ). Thus,
we have verified that Φ(M ) satisfies (iii) and, therefore, is an M0 -tower. Then, by the
definition of T0 , we have T0 ⊆ Φ(M ). As we also have Φ(M ) ⊆ T0 (from the definition
of Φ(M )), we have shown
∀_{M∈Γ}  Φ(M) = T0.

As a consequence, if N ∈ T0 and M ∈ Γ, then N ∈ Φ(M ) and this means N ⊆ M ⊆
g(M ) or g(M ) ⊆ N , i.e. each N ∈ T0 is comparable to g(M ), showing g(M ) ∈ Γ and Γ
satisfying (iii), completing the proof that Γ is an M0 -tower. As with the Φ(M ) above,
we conclude Γ = T0 , as desired. To conclude the proof of the lemma, we note Γ = T0
implies T0 is a chain. We claim that
[
M := T0

satisfies (A.20): Indeed, M ∈ T0, since T0 satisfies (ii). Then g(M) ∈ T0, since T0
satisfies (iii). We then conclude g(M) ⊆ M from the definition of M. As we always
have M ⊆ g(M) by (A.19), we have established g(M) = M and proved the lemma. □
Theorem A.52 (Equivalences to the Axiom of Choice). The following statements (i)
– (v) are equivalent to the axiom of choice (as stated as Axiom 9 above).
(i) Every Cartesian product ∏_{i∈I} Ai of nonempty sets Ai, where I is a nonempty
index set, is nonempty (cf. Def. 2.15(c)).

(ii) Hausdorff’s Maximality Principle: Every nonempty partially ordered set X con-
tains a maximal chain (i.e. a maximal totally ordered subset).

(iii) Zorn’s Lemma: Let X be a nonempty partially ordered set. If every chain C ⊆ X
(i.e. every nonempty totally ordered subset of X) has an upper bound in X (such
chains with upper bounds are sometimes called inductive), then X contains a
maximal element (cf. Def. A.50(a)).

(iv) Zermelo’s Well-Ordering Theorem: Every set can be well-ordered (recall the defi-
nition of a well-order from Def. A.21(d)).

(v) Every vector space V over a field F has a basis B ⊆ V .

Proof. “(i) ⇔ AC”: Assume (i). Given a nonempty set of nonempty sets M, let I := M
and, for each M ∈ M, let AM := M. If f ∈ ∏_{M∈I} AM, then, according to Def. 2.15(c),
for each M ∈ I = M, one has f(M) ∈ AM = M, proving AC holds. Conversely, assume
AC. Consider a family (Ai)i∈I such that I ≠ ∅ and each Ai ≠ ∅. Let M := {Ai : i ∈ I}.
Then, by AC, there is a map g : M −→ ⋃_{N∈M} N = ⋃_{j∈I} Aj such that g(M) ∈ M for
each M ∈ M. Then we can define

f : I −→ ⋃_{j∈I} Aj,  f(i) := g(Ai) ∈ Ai,

to prove (i).
Next, we will show AC ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ AC.
“AC ⇒ (ii)”: Assume AC and let X be a nonempty partially ordered set. Let M be the
set of all chains in X (i.e. the set of all nonempty totally ordered subsets of X). Then
∅ ∉ M and M ≠ ∅ (since X ≠ ∅ and {x} ∈ M for each x ∈ X). Moreover, M satisfies
the hypothesis of Lem. A.51, since, if C ⊆ M is a chain of totally ordered subsets of X,
then ⋃C is a totally ordered subset of X, i.e. in M (here we have used the notation
of (A.17); also note that we are dealing with two different types of chains here, namely
those with respect to the order on X and those with respect to the order given by ⊆ on
M). Let f : P(X) \ {∅} −→ X be a choice function given by AC, i.e. such that

∀_{Y∈P(X)\{∅}} (f(Y) ∈ Y).
As an auxiliary notation, we set



∀_{M∈M}  M∗ := {x ∈ X \ M : M ∪ {x} ∈ M}.

With the intention of applying Lem. A.51, we define


g : M −→ M,  g(M) := { M ∪ {f(M∗)} if M∗ ≠ ∅;  M if M∗ = ∅ }.

Since g clearly satisfies (A.19), Lem. A.51 applies, providing an M ∈ M such that
g(M ) = M . Thus, M ∗ = ∅, i.e. M is a maximal chain, proving (ii).
“(ii) ⇒ (iii)”: Assume (ii). To prove Zorn’s lemma, let X be a nonempty set, partially
ordered by ≤, such that every chain C ⊆ X has an upper bound. Due to Hausdorff’s
maximality principle, we can assume C ⊆ X to be a maximal chain. Let m ∈ X be an
upper bound for the maximal chain C. We claim that m is a maximal element: Indeed,
if there were x ∈ X such that m < x, then x ∉ C (since m is an upper bound for C) and
C ∪ {x} would constitute a strict superset of C that is also a chain, contradicting the
maximality of C.
“(iii) ⇒ (iv)”: Assume (iii) and let X be a nonempty set. We need to construct a
well-order on X. Let W be the set of all well-orders on subsets of X, i.e.

W := {(Y, W) : Y ⊆ X ∧ W ⊆ Y × Y ⊆ X × X is a well-order on Y}.

We define a partial order ≤ on W by setting



∀_{(Y,W),(Y′,W′)∈W}  ((Y, W) ≤ (Y′, W′) :⇔ Y ⊆ Y′ ∧ W = W′↾Y
                                        ∧ (y ∈ Y, y′ ∈ Y′, y′W′y ⇒ y′ ∈ Y))

(recall the definition of the restriction of a relation from Def. A.21(e)). To apply Zorn’s
lemma to (W, ≤), we need to check that every chain C ⊆ W has an upper bound. To
this end, if C ⊆ W is a chain, let
UC := (YC, WC),  where YC := ⋃_{(Y,W)∈C} Y,  WC := ⋃_{(Y,W)∈C} W.

We need to verify UC ∈ W: If aWCb, then there is (Y, W) ∈ C such that aWb. In
particular, (a, b) ∈ Y × Y ⊆ YC × YC, showing WC to be a relation on YC. Clearly, WC
is a total order on YC (one just uses that, if a, b ∈ YC , then, as C is a chain, there is
(Y, W ) ∈ C such that a, b ∈ Y and W = WC ↾Y is a total order on Y ). To see that WC
is a well-order on YC , let ∅ 6= A ⊆ YC . If a ∈ A, then there is (Y, W ) ∈ C such that
a ∈ Y. Since W = WC↾Y is a well-order on Y, we can let m := min(Y ∩ A). We claim
that m = min A as well: Let b ∈ A. Then there is (B, U ) ∈ C such that b ∈ B. If
B ⊆ Y , then b ∈ Y ∩ A and mW b. If Y ⊆ B, then m, b ∈ B. If mU b, then we are
done. If bU m, then b ∈ Y (since (Y, W ) ≤ (B, U )), i.e., again, b ∈ Y ∩ A and mW b
(actually m = b in this case), proving m = min A. This completes the proof that WC is
a well-order on YC and, thus, shows UC ∈ W. Next, we check UC to be an upper bound
for C: If (Y, W ) ∈ C, then Y ⊆ YC and W = WC ↾Y are immediate. If y ∈ Y , y ′ ∈ YC ,
and y ′ WC y, then y ′ ∈ Y (otherwise, y ′ ∈ A with (A, U ) ∈ C, (Y, W ) ≤ (A, U ), y ′ U y, in
contradiction to y′ ∉ Y). Thus, (Y, W) ≤ UC, showing UC to be an upper bound for C.
By Zorn’s lemma, we conclude that W contains a maximal element (M, WM ). But then
M = X and WM is the desired well-order on X: Indeed, if there is x ∈ X \ M , then we
can let Y := M ∪ {x} and,
∀_{a,b∈Y}  (aWb :⇔ (a, b ∈ M ∧ aWMb) ∨ b = x).

Then (Y, W ) ∈ W with (M, WM ) < (Y, W ) in contradiction to the maximality of
(M, WM ).
“(iv) ⇒ AC”: Assume (iv). Given a nonempty set of nonempty sets M, let X :=
⋃_{M∈M} M. By (iv), there exists a well-order R on X. Then every nonempty Y ⊆ X
has a unique min. As every M ∈ M is a nonempty subset of X, we can define a choice
function
f : M −→ X, f (M ) := min M ∈ M,
proving AC.
“(v) ⇔ AC”: That every vector space has a basis is proved in Th. 5.23 by use of Zorn’s
lemma. That, conversely, (v) implies AC was first shown in [Bla84], but the proof
needs more algebraic tools than we have available in this class. There are also some
set-theoretic subtleties that distinguish this equivalence from the ones provided in (i) –
(iv), cf. Rem. A.53 below. □
Remark A.53. What is actually proved in [Bla84] is that Th. A.52(v) implies the
axiom of multiple choice (AMC), which states
∀_M (∅ ∉ M ⇒ ∃_{f : M −→ P(⋃_{N∈M} N)} ∀_{M∈M} (f(M) ⊆ M ∧ 0 < #f(M) < ∞)),

i.e., for each nonempty set M, whose elements are all nonempty sets, there exists a
function that assigns, to each M ∈ M, a finite subset of M . Neither the proof that (i)
– (iv) of Th. A.52 are each equivalent to AC nor the proof in [Bla84] that Th. A.52(v)
implies AMC makes use of the axiom of foundation (Axiom 8 above). However, the
proof that AMC implies AC does require the axiom of foundation (cf., e.g., [Hal17, Ch.
6]).

A.5 Cardinality
The following theorem provides two interesting, and sometimes useful, characterizations
of infinite sets:

Theorem A.54. Let A be a set. Using the axiom of choice (AC) of Sec. A.4, the
following statements (i) – (iii) are equivalent. More precisely, (ii) and (iii) are equivalent
even without AC (a set A is sometimes called Dedekind-infinite if, and only if, it satisfies
(iii)), (iii) implies (i) without AC, but AC is needed to show (i) implies (ii), (iii).

(i) A is infinite.

(ii) There exists M ⊆ A and a bijective map f : M −→ N.

(iii) There exists a strict subset B ( A and a bijective map g : A −→ B.

One sometimes expresses the equivalence between (i) and (ii) by saying that a set is
infinite if, and only if, it contains a copy of the natural numbers. The property stated in
(iii) might seem strange at first, but infinite sets are, indeed, precisely those identical in
size to some of their strict subsets (as an example think of the natural bijection n ↦ 2n
between all natural numbers and the even numbers).

Proof. We first prove, without AC, the equivalence between (ii) and (iii).
“(ii) ⇒ (iii)”: Let E denote the even numbers. Then E ⊊ N and h : N −→ E,
h(n) := 2n, is a bijection, showing that (iii) holds for the natural numbers. According to
(ii), there exists M ⊆ A and a bijective map f : M −→ N. Define B := (A \ M) ∪˙ f⁻¹(E)
and

h : A −→ B,  h(x) := { x for x ∈ A \ M;  (f⁻¹ ◦ h ◦ f)(x) for x ∈ M }.    (A.23)
Then B ⊊ A since B does not contain the elements of M that are mapped to odd
numbers under f. Still, h is bijective, since h↾A\M = Id_{A\M} and h↾M = f⁻¹ ◦ h ◦ f is the
composition of the bijective maps f, h, and f⁻¹↾E : E −→ f⁻¹(E).
“(iii) ⇒ (ii)”: As (iii) is assumed, there exist B ⊆ A, a ∈ A \ B, and a bijective map
g : A −→ B. Set
M := {an := g^n(a) : n ∈ N}.
We show that an ≠ am for each m, n ∈ N with m ≠ n: Indeed, suppose m, n ∈ N with
n > m and an = am. Then, since g is bijective, we can apply g⁻¹ m times to an = am
to obtain

a = (g⁻¹)^m(am) = (g⁻¹)^m(an) = g^{n−m}(a).

Since l := n − m ≥ 1, we have a = g(g^{l−1}(a)), in contradiction to a ∈ A \ B. Thus, all
the an ∈ M are distinct and we can define f : M −→ N, f (an ) := n, which is clearly
bijective, proving (ii).
“(iii) ⇒ (i)”: The proof is conducted by contraposition, i.e. we assume A to be finite and
prove that (iii) does not hold. If A = ∅, then there is nothing to prove. If ∅ ≠ A is finite,
then, by Def. 3.10(b), there exists n ∈ N and a bijective map f : A −→ {1, . . . , n}. If
B ⊊ A, then, according to Th. 3.19(a), there exists m ∈ N0, m < n, and a bijective
map h : B −→ {1, . . . , m}. If there were a bijective map g : A −→ B, then h ◦ g ◦ f −1
were a bijective map from {1, . . . , n} onto {1, . . . , m} with m < n in contradiction to
Th. 3.17.
“(i) ⇒ (ii)”: Inductively, we construct a strictly increasing sequence M1 ⊆ M2 ⊆ . . . of
subsets Mn of A, n ∈ N, and a sequence of functions fn : Mn −→ {1, . . . , n} satisfying

∀_{n∈N}  fn is bijective,    (A.24a)
∀_{m,n∈N}  (m ≤ n ⇒ fn↾Mm = fm):    (A.24b)

Since A ≠ ∅, there exists m1 ∈ A. Set M1 := {m1} and f1 : M1 −→ {1}, f1(m1) := 1.
Then M1 ⊆ A and f1 bijective are trivially clear. Now let n ∈ N and suppose M1, . . . , Mn
and f1 , . . . , fn satisfying (A.24) have already been constructed. Since A is infinite, there
must be mn+1 ∈ A\Mn (otherwise Mn = A and the bijectivity of fn : Mn −→ {1, . . . , n}
shows A is finite with #A = n; AC is used to select the mn+1 ∈ A \ Mn ). Set Mn+1 :=
Mn ∪ {mn+1 } and
fn+1 : Mn+1 −→ {1, . . . , n + 1},  fn+1(x) := { fn(x) for x ∈ Mn;  n + 1 for x = mn+1 }.    (A.25)
Then the bijectivity of fn implies the bijectivity of fn+1 , and, since fn+1 ↾Mn = fn holds
by definition of fn+1 , the implication
m≤n+1 ⇒ fn+1 ↾Mm = fm
holds true as well. An induction also shows Mn = {m1 , . . . , mn } and fn (mn ) = n for
each n ∈ N. We now define
M := ⋃_{n∈N} Mn = {mn : n ∈ N},  f : M −→ N,  f(mn) := fn(mn) = n.    (A.26)

Clearly, M ⊆ A, and f is bijective with f⁻¹ : N −→ M, f⁻¹(n) = mn. □


We now proceed to prove some rules regarding cardinality:

Theorem A.55. Let A, B be sets. Then

#A ≤ #B ∨ #B ≤ #A, (A.27)

i.e. there exists a bijective map φA : A −→ N with N ⊆ B or a bijective map φB :
M −→ B, M ⊆ A (this result makes use of AC).

Proof. To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the set
M := {(M, N, f) : M ⊆ A, N ⊆ B, f : M −→ N is bijective}    (A.28)

by letting
(M, N, f ) ≤ (U, V, g) :⇔ M ⊆ U, g↾M = f. (A.29)
Then M contains the empty map, (∅, ∅, ∅) ∈ M, i.e. M ≠ ∅. Every chain C ⊆ M
has an upper bound, namely (MC, NC, fC) with MC := ⋃_{(M,N,f)∈C} M and fC(x) := f(x),
where (M, N, f) ∈ C is chosen such that x ∈ M (since C is a chain, the value of fC(x)
does not actually depend on the choice of (M, N, f) ∈ C and is, thus, well-defined).
Clearly, MC ⊆ A and NC := ⋃_{(M,N,f)∈C} N ⊆ B. We need to verify fC : MC −→ NC
to be bijective. If x, y ∈ MC , then, since C is a chain, there exists (M, N, f ) ∈ C with
x, y ∈ M. As f is injective, we have, for x ≠ y, that fC(x) = f(x) ≠ f(y) = fC(y),
showing fC to be injective as well. If z ∈ NC, then z ∈ N for some (M, N, f) ∈ C. Since
f : M −→ N is surjective, there exists x ∈ M ⊆ MC with fC (x) = f (x) = z, showing fC
to be surjective as well, proving (MC , NC , fC ) ∈ M. To see (MC , NC , fC ) to be an upper
bound for C, note that the definition of (MC , NC , fC ) immediately implies M ⊆ MC for
each (M, N, f ) ∈ C and fC ↾M = f for each (M, N, f ) ∈ C. Thus, Zorn’s lemma applies,
yielding a maximal element (Mmax , Nmax , fmax ) ∈ M. If a ∈ A \ Mmax and b ∈ B \ Nmax ,
then

f : {a} ∪˙ Mmax −→ {b} ∪˙ Nmax,  f(x) := { b for x = a;  fmax(x) for x ∈ Mmax }
is a bijective extension of fmax . Thus, the maximality of (Mmax , Nmax , fmax ) implies
Mmax = A or Nmax = B, completing the proof. □

Theorem A.56. Let A, B be sets, let A be infinite and assume there exists an injective
map φB : B −→ A. Then
#(A ∪ B) = #A, (A.30)
i.e. there exists a bijective map, mapping A onto A ∪ B (this result makes use of AC).
Proof. Since the map a ↦ a always maps A injectively into A ∪ B, it remains to show
the existence of an injective map φ : A ∪ B −→ A (then (A.30) holds due to the
Schröder-Bernstein Th. 3.12). Possibly replacing B with B \ A, we may also assume A
and B to be disjoint, without loss of generality. Thus, let M := A ∪˙ B. We apply Zorn’s
lemma of Th. A.52(iii) to
M := {A ⊆ P(A) : ∀_{X∈A} (#X = #N) ∧ ∀_{X,Y∈A} (X ≠ Y ⇒ X ∩ Y = ∅)},    (A.31)

partially ordered by set inclusion ⊆ (each element of M constitutes a partition of some
subset of A into infinite, countable subsets). Then M ≠ ∅ by Th. A.54(ii). In the
usual way, if C ⊆ M is a chain, then the union of all elements of C is an element of M
that constitutes an upper bound for C. Thus, Zorn’s lemma applies, yielding a maximal
element Amax ∈ M. The maximality of Amax then implies
F := A \ ⋃_{C∈Amax} C

to be finite. Replacing some fixed C ∈ Amax with C ∪ F , we may assume Amax to be a
partition of A. For each X ∈ Amax , we have X ⊆ A and a bijective map φX : N −→ X.
Thus, the map
φ0 : N × Amax −→ A, φ0 (n, X) := φX (n),
is bijective as well. Letting N0 := {n ∈ N : n even} and N1 := {n ∈ N : n odd}, we
obtain
N × Amax = (N0 × Amax) ∪˙ (N1 × Amax).
Moreover, there exist bijective maps ψ0 : N×Amax −→ N0 ×Amax and ψ1 : N×Amax −→
N1 × Amax . Thus, we can define
φ : M := A ∪˙ B −→ A,  φ(x) := { (φ0 ◦ ψ0 ◦ φ0⁻¹)(x) for x ∈ A;  (φ0 ◦ ψ1 ◦ φ0⁻¹ ◦ φB)(x) for x ∈ B },

which is, clearly, injective. □

Theorem A.57. Let A, B be nonempty sets, let A be infinite and assume there exists
an injective map φB : B −→ A. Then

#(A × B) = #A, (A.32)

i.e. there exists a bijective map, mapping A onto A × B (this result makes use of AC).
Proof. Since the map (a, b) ↦ (a, φB(b)) is injective from A × B into A × A, and the
map a ↦ (a, a) is injective from A into A × A, it suffices to show the existence of an
injective map from A × A into A (then (A.32) holds due to the Schröder-Bernstein Th.
3.12). However, we will actually directly prove the existence of a bijective map from A
onto A × A: To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the
set

M := {(D, f) : D ⊆ A, f : D −→ D × D is bijective}    (A.33)
by letting
(D, f ) ≤ (E, g) :⇔ D ⊆ E, g↾D = f. (A.34)
Then M ≠ ∅, since there exists a bijective map f : N −→ N × N by Th. 3.28 and
a bijective map between some D ⊆ A and N by Th. A.54(ii). Every chain C ⊆ M
has an upper bound, namely (DC, fC) with DC := ⋃_{(D,f)∈C} D and fC(x) := f(x), where
(D, f ) ∈ C is chosen such that x ∈ D (since C is a chain, the value of fC (x) does not
actually depend on the choice of (D, f ) ∈ C and is, thus, well-defined). Clearly, DC ⊆ A,
and we merely need to verify fC : DC −→ DC × DC to be bijective. If x, y ∈ DC , then,
since C is a chain, there exists (D, f ) ∈ C with x, y ∈ D. As f is injective, we have
fC(x) = f(x) ≠ f(y) = fC(y) for x ≠ y, showing fC to be injective as well. If (x, y) ∈ DC × DC
and (D, f ) ∈ C with x, y ∈ D as before, then the surjectivity of f : D −→ D × D yields
z ∈ D ⊆ DC with fC (z) = f (z) = (x, y), showing fC to be surjective as well, proving
(DC , fC ) ∈ M. To see (DC , fC ) to be an upper bound for C, note that the definition
of (DC , fC ) immediately implies D ⊆ DC for each (D, f ) ∈ C and fC ↾D = f for each
(D, f ) ∈ C. Thus, Zorn’s lemma applies, yielding a maximal element (Dmax , fmax ) ∈ M.
We would like to prove Dmax = A. However, in general, this can not be expected to
hold (for example, if A = N, and N \ D is finite, then (D, f ) ∈ M is always maximal
– since (E × E) \ (D × D) is infinite for each D ⊊ E ⊆ A, f does not have a bijective
extension g : E −→ E × E). What we can prove, though, is the existence of a bijective
map φ : Dmax −→ A, which suffices, since the map
φ̃ : A −→ A × A,  φ̃ := (φ, φ) ◦ fmax ◦ φ⁻¹,
is then also bijective. It remains to establish the existence of the bijective map φ :
Dmax −→ A. Arguing via contraposition, we assume the nonexistence of a bijective
φ and show (Dmax , fmax ) is not maximal: Let E := A \ Dmax . Then A = Dmax ∪˙ E.
According to Th. A.56, there does not exist an injective map from E into Dmax . Then,
according to Th. A.55, there exists a bijective map φD : Dmax −→ N , N ⊆ E. Define
P := (N × N) ∪˙ (N × Dmax) ∪˙ (Dmax × N).
Since φD : Dmax −→ N is bijective and fmax : Dmax −→ Dmax × Dmax is bijective, we
conclude
#(N × N ) = #(N × Dmax ) = #(Dmax × N ) = #Dmax = #N.
Thus, #P = #Dmax = #N by Th. A.56 and, in particular, there exists a bijection fN :
N −→ P. In consequence, we can combine the bijections fmax : Dmax −→ Dmax × Dmax
and fN : N −→ P to obtain a bijection

f : N ∪˙ Dmax −→ P ∪˙ (Dmax × Dmax) = (Dmax ∪˙ N) × (Dmax ∪˙ N),

which is a bijective extension of fmax , i.e. (N ∪˙ Dmax , f ) ∈ M and (Dmax , fmax ) is not
maximal. Thus, the maximality of (Dmax , fmax ) implies the existence of a bijective
φ : Dmax −→ A as desired. □
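
The proof starts from the bijection f : N −→ N × N provided by Th. 3.28. One
concrete map of this kind is the classical Cantor pairing; the sketch below is an
illustration (not necessarily the explicit map used in Th. 3.28), verifying bijectivity
on an initial segment:

    def cantor_pair(x, y):
        # the classical Cantor pairing, a bijection N0 x N0 -> N0:
        # (x, y) |-> (x + y)(x + y + 1)/2 + y
        return (x + y) * (x + y + 1) // 2 + y

    # on the first n diagonals {(x, y) : x + y < n}, the values obtained are
    # exactly 0, 1, ..., n(n+1)/2 - 1, each exactly once
    n = 50
    values = sorted(cantor_pair(s - y, y) for s in range(n) for y in range(s + 1))
    assert values == list(range(n * (n + 1) // 2))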

Corollary A.58. Let n ∈ N and let A be an infinite set. Then

#A^n = #A,    (A.35)

i.e. there exists a bijective map, mapping A onto A^n (this result makes use of AC).

Proof. The claimed rule (A.35) follows via a simple induction from Th. A.57, since, for
each n ∈ N with n ≥ 2, the map

φ : A^n −→ A^{n−1} × A,  φ(a1, . . . , an) := ((a1, . . . , an−1), an),

is, clearly, bijective. □

Theorem A.59. Let A be a set and let Pfin (A) denote the set of finite subsets of A.

(a) One has


#Pfin(A) = #2^A_fin,    (A.36)
where 2^A_fin := {0, 1}^A_fin is defined as in Ex. 5.16(c) and a bijection between Pfin(A)
and {0, 1}^A_fin is given by restricting the map

χ : P(A) −→ {0, 1}^A,  χ(B) := χB,

of Prop. 2.18 to Pfin (A).

(b) If A is infinite, then

∀_{n∈N}  #Pn(A) = #Pfin(A) = #A,    (A.37)

i.e. there exists a bijective map, mapping A onto Pfin (A) as well as a bijective map,
mapping A onto Pn (A), where Pn (A) denotes the set of subsets of A with precisely
n elements (both results of (b) make use of AC).
Proof. (a): χ↾Pfin(A) is injective, since χ is injective. If f ∈ {0, 1}^A_fin, then

Bf := {a ∈ A : f(a) ≠ 0} ∈ Pfin(A)

with f = χBf, showing χ↾Pfin(A) to be surjective onto {0, 1}^A_fin.

(b): Let n ∈ N and let a1 , . . . , an+1 ∈ A be n + 1 distinct elements of A. Define


ιn : A −→ Pn(A),  ιn(a) := { {a} ∪ {a1, . . . , an−1} for a ∉ {a1, . . . , an−1};  {a1, . . . , an+1} \ {a} for a ∈ {a1, . . . , an−1} }.

Then, clearly, each ιn is injective as a map into Pn (A) and, due to Pn (A) ⊆ Pfin (A),
also as a map into Pfin (A). Thus, it remains to show the existence of injective maps
φn : Pn (A) −→ A and φ : Pfin (A) −→ A (then (A.37) holds due to the Schröder-
Bernstein Th. 3.12). Fix n ∈ N. Due to Cor. A.58, it suffices to define an injective map
φ̃n : Pn (A) −→ An . For each B ∈ Pn (A), let φB : {1, . . . , n} −→ B be bijective and
define
φ̃n : Pn (A) −→ An , φ̃n (B) := φB .
If B, C ∈ Pn(A) with x ∈ B \ C, then x ∈ φB({1, . . . , n}), but x ∉ φC({1, . . . , n}),
showing φB 6= φC , i.e. φ̃n is injective. To obtain φ, it suffices to define an injective map
φ̃ : Pfin (A) −→ A × N, as there exists a bijective map from A × N onto A by Th. A.57.
Since
Pfin(A) = ⋃˙_{n∈N} Pn(A),    (A.38)

we may define

φ̃ : Pfin(A) −→ A × N,  φ̃(B) := (φn(B), n) for B ∈ Pn(A),

which is, clearly, injective, due to (A.38) and the injectivity of φn. □

Theorem A.60. Let A, B be sets. If 2 ≤ #A (i.e. there exists an injective map ι2 :
{0, 1} −→ A), #A ≤ #B (i.e. there exists an injective map ιA : A −→ B), and B is
infinite, then

#A^B = #2^B = #P(B),    (A.39a)
#A^B_fin = #2^B_fin = #Pfin(B) = #B (the last equality by (A.37)),    (A.39b)

i.e. there exist bijective maps φ1 : A^B −→ {0, 1}^B and φ2 : {0, 1}^B −→ P(B), as well as
bijective maps φ1,f : A^B_fin −→ {0, 1}^B_fin and φ2,f : {0, 1}^B_fin −→ Pfin(B) (this result makes
use of AC). For the purposes of (A.39b), we introduce the notation 0 := ι2(0) ∈ A, so
that both A^B_fin and 2^B_fin make sense according to the definition in Ex. 5.16(c).
Proof. The existence of φ2 was already established in Prop. 2.18, the existence of φ2,f in
Th. A.59(a). Thus, according to the Schröder-Bernstein Th. 3.12, for the existence of φ1 ,
it suffices to show the existence of injective maps f1 : {0, 1}^B −→ A^B, f2 : A^B −→ B^B,
f3 : B^B −→ P(B × B), as well as a bijective map g : P(B × B) −→ P(B), which can
be stated concisely as

#2^B ≤ #A^B ≤ #B^B ≤ #P(B × B) = #P(B).    (A.40a)

Analogously, for the existence of φ1,f, it suffices to show the existence of injective maps
f1,f : {0, 1}^B_fin −→ A^B_fin, f2,f : A^B_fin −→ B^B_fin, f3,f : B^B_fin −→ Pfin(B × B), as well as a
bijective map gf : Pfin(B × B) −→ Pfin(B), which can be stated concisely as

#2^B_fin ≤ #A^B_fin ≤ #B^B_fin ≤ #Pfin(B × B) = #Pfin(B)    (A.40b)

(for B^B_fin to make sense according to the definition in Ex. 5.16(c), we now also introduce
the notation 0 := ιA(ι2(0)) ∈ B). Clearly, the maps

f1 : {0, 1}B −→ AB , f1 (α)(b) := ι2 (α(b)),


f2 : AB −→ B B , f2 (α)(b) := ιA (α(b)),
B
f3 : B −→ P(B × B), f3 (α) := {(x, y) ∈ B × B : y = α(x)},

are, indeed, injective. Then the restrictions

f1,f : {0, 1}B B


fin −→ Afin , f1,f := f1 ↾{0,1}Bfin , f2,f : AB B
fin −→ Bfin , f2,f := f2 ↾ABfin ,

are well-defined and injective as well. While the restriction of f3 does not work, as it
does not map into Pfin (B × B), we can use the injective map
B
f3,f : Bfin −→ Pfin (B × B), f3,f (α) := {(x, y) ∈ B × B : y = α(x) 6= 0}.

From Th. A.57, we know the existence of a bijective map ψ : B × B −→ B, implying

g : P(B × B) −→ P(B), g(C) := ψ(C) = {ψ(x, y) : (x, y) ∈ C},

to be bijective as well, thereby proving (A.40a) and (A.39a). Then the restriction

gf : Pfin (B × B) −→ Pfin (B), gf := g↾Pfin (B×B) ,

is well-defined and also bijective, proving (A.40b) and (A.39b). 



B General Forms of the Laws of Associativity and Commutativity

B.1 Associativity
In the literature, the general law of associativity is often stated in the form that a_1 a_2 ⋯ a_n gives the same result “for every admissible way of inserting parentheses into a_1 a_2 ⋯ a_n”, but a completely precise formulation of what that actually means seems to be rare. As a warm-up, we first prove a special case of the general law:

Proposition B.1. Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). Then

    ∀ n∈N, n≥2  ∀ a_1,…,a_n ∈ A  ∀ k∈{2,…,n} :  (∏_{i=k}^n a_i) · (∏_{i=1}^{k−1} a_i) = ∏_{i=1}^n a_i,    (B.1)

where the product symbol is defined according to (3.16a).

Proof. The assumed associativity means the validity of

    ∀ a,b,c∈A :  (ab)c = a(bc).    (B.2)

If k = n, then (B.1) is immediate from (3.16a). For 2 ≤ k < n, we prove (B.1) by induction on n: For the base case, n = 2, there is nothing to prove. For n > 2, one computes

    (∏_{i=k}^n a_i) · (∏_{i=1}^{k−1} a_i)
      = (a_n · ∏_{i=k}^{n−1} a_i) · (∏_{i=1}^{k−1} a_i)       (by (3.16a))
      = a_n · ((∏_{i=k}^{n−1} a_i) · (∏_{i=1}^{k−1} a_i))     (by (B.2))
      = a_n · ∏_{i=1}^{n−1} a_i                               (by ind. hyp.)
      = ∏_{i=1}^n a_i,                                        (by (3.16a))    (B.3)

completing the induction and the proof of the proposition. □
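Since (3.16a) builds the product from right to left, a machine check of (B.1) must respect that convention. A minimal Python sketch, using string concatenation as a noncommutative semigroup (all names ad hoc, not part of the text):

```python
from functools import reduce

def prod_f(fam):
    """Right-to-left product per (3.16a): prod_f([a_1, ..., a_n]) = a_n ... a_1."""
    return reduce(lambda acc, a: a + acc, fam)

a = ["a1", "a2", "a3", "a4", "a5"]        # a_1, ..., a_n with n = 5
n = len(a)
for k in range(2, n + 1):                 # k in {2, ..., n}
    # (prod_{i=k}^n a_i) * (prod_{i=1}^{k-1} a_i), the composition being +:
    left = prod_f(a[k - 1:]) + prod_f(a[:k - 1])
    assert left == prod_f(a)              # equals prod_{i=1}^n a_i, cf. (B.1)
```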

The difficulty in stating the general form of the law of associativity lies in giving a
precise definition of what one means by “an admissible way of inserting parentheses into
a1 a2 · · · an ”. So how does one actually proceed to calculate the value of a1 a2 · · · an , given
that parentheses have been inserted in an admissible way? The answer is that one does
it in n − 1 steps, where, in each step, one combines two juxtaposed elements, consistent

with the inserted parentheses. There can still be some ambiguity: For example, for
(a1 a2 )(a3 (a4 a5 )), one has the freedom of first combining a1 , a2 , or of first combining
a4 , a5 . In consequence, our general law of associativity will show that, for each admissible
sequence of n − 1 directives for combining two juxtaposed elements, the final result is
the same (under the hypothesis that (B.2) holds). This still needs some preparatory
work.
In the following, one might see it as a slight notational inconvenience that we have defined ∏_{i=1}^n a_i as a_n ⋯ a_1 rather than a_1 ⋯ a_n. For this reason, we will enumerate the elements to be combined by composition from right to left rather than from left to right.
Definition and Remark B.2. Let A be a nonempty set with a composition · : A × A −→ A, let n ∈ N, n ≥ 2, and let I be a totally ordered index set, #I = n, I = {i_1, . . . , i_n} with i_1 < · · · < i_n. Moreover, let F := (a_{i_n}, . . . , a_{i_1}) be a family of n elements of A.

(a) An admissible composition directive (for combining two juxtaposed elements of the family) is an index i_k ∈ I with 1 ≤ k ≤ n − 1. It transforms the family F into the family G := (a_{i_n}, . . . , a_{i_{k+1}} a_{i_k}, . . . , a_{i_1}). In other words, G = (b_j)_{j∈J}, where J := I \ {i_{k+1}}, b_j = a_j for each j ∈ J \ {i_k}, and b_{i_k} = a_{i_{k+1}} a_{i_k}. We can write this transformation as two maps

    F ↦ δ^{(1)}_{i_k}(F) := G = (a_{i_n}, . . . , a_{i_{k+1}} a_{i_k}, . . . , a_{i_1}) = (b_j)_{j∈J},    (B.4a)
    I ↦ δ^{(2)}_{i_k}(I) := J = I \ {i_{k+1}}.    (B.4b)

Thus, an application of an admissible composition directive reduces the length of the family and the number of indices by one.

(b) Recursively, we define (finite) sequences of families, index sets, and indices as follows:

    F_n := F,   I_n := I,    (B.5a)
    ∀ α∈{2,…,n} :  F_{α−1} := δ^{(1)}_{j_α}(F_α),   I_{α−1} := δ^{(2)}_{j_α}(I_α),   where j_α ∈ I_α \ {max I_α}.    (B.5b)

The corresponding sequence of indices D := (j_n, . . . , j_2) in I is called an admissible evaluation directive. Clearly,

    ∀ α∈{1,…,n} :  #I_α = α, i.e. F_α has length α.    (B.6)

In particular, I_1 = {j_2} = {i_1} (where the second equality follows from (B.4b)), F_1 = (a), and we call

    D(F) := a    (B.7)

the result of the admissible evaluation directive D applied to F.


Theorem B.3 (General Law of Associativity). Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). Let n ∈ N, n ≥ 2, and let I be a totally ordered index set, #I = n, I = {i_1, . . . , i_n} with i_1 < · · · < i_n. Moreover, let F := (a_{i_n}, . . . , a_{i_1}) be a family of n elements of A. Then, for each admissible evaluation directive as defined in Def. and Rem. B.2(b), the result is the same, namely

    D(F) = ∏_{k=1}^n a_{i_k}.    (B.8)

Proof. We conduct the proof via induction on n. For n = 2, there is only one admissible evaluation directive and (B.8) is immediate from (3.16a); for n = 3, there are only two possible directives and (B.2) guarantees that they yield the same result. For the induction step, let n > 3. As in Def. and Rem. B.2(b), we write D = (j_n, . . . , j_2) and obtain some I_2 = {i_1, i_m}, 1 < m ≤ n, as the corresponding penultimate index set. Depending on i_m, we partition (j_n, . . . , j_3) as follows: Set

    J_1 := {k ∈ {3, . . . , n} : j_k < i_m},   J_2 := {k ∈ {3, . . . , n} : j_k ≥ i_m}.    (B.9)

Then, for k ∈ J_1, j_k is a composition directive to combine two elements to the right of a_{i_m} and, for k ∈ J_2, j_k is a composition directive to combine two elements to the left of a_{i_m}. Moreover, J_1 and J_2 might or might not be the empty set: If J_1 = ∅, then j_k ≠ i_1 for each k ∈ {3, . . . , n}, implying i_m = i_2; if J_2 = ∅, then, in each of the n − 2 steps to obtain I_2, an i_k with k < m was removed from I, implying i_m = i_n (in particular, as n ≠ 2, J_1 and J_2 cannot both be empty). If J_1 ≠ ∅, then D_1 := (j_k)_{k∈J_1} is an admissible evaluation directive for (a_{i_{m−1}}, . . . , a_{i_1}) – this follows from

    j_k ∈ K ⊆ {i_1, . . . , i_{m−1}}  ⇒  δ^{(2)}_{j_k}(K) ⊆ K ⊆ {i_1, . . . , i_{m−1}}.    (B.10)

Since m − 1 < n, the induction hypothesis applies and yields

    D_1(a_{i_{m−1}}, . . . , a_{i_1}) = ∏_{k=1}^{m−1} a_{i_k}.    (B.11)

Analogously, if J_2 ≠ ∅, then D_2 := (j_k)_{k∈J_2} is an admissible evaluation directive for (a_{i_n}, . . . , a_{i_m}) – this follows from

    j_k ∈ K ⊆ {i_m, . . . , i_n}  ⇒  δ^{(2)}_{j_k}(K) ⊆ K ⊆ {i_m, . . . , i_n}.    (B.12)

Since m > 1, the induction hypothesis applies and yields

    D_2(a_{i_n}, . . . , a_{i_m}) = ∏_{k=m}^n a_{i_k}.    (B.13)

Thus, if J_1 ≠ ∅ and J_2 ≠ ∅, then we obtain (using j_2 = i_1 for the first equality and Prop. B.1 for the last)

    D(F) = D_2(a_{i_n}, . . . , a_{i_m}) · D_1(a_{i_{m−1}}, . . . , a_{i_1}) = (∏_{k=m}^n a_{i_k}) · (∏_{k=1}^{m−1} a_{i_k}) = ∏_{k=1}^n a_{i_k}    (B.14)

as desired. If J_1 = ∅, then, as explained above, i_m = i_2. Thus, in this case (again using j_2 = i_1 and Prop. B.1),

    D(F) = D_2(a_{i_n}, . . . , a_{i_2}) · a_{i_1} = (∏_{k=2}^n a_{i_k}) · a_{i_1} = ∏_{k=1}^n a_{i_k}    (B.15)

as needed. Finally, if J_2 = ∅, then, as explained above, i_m = i_n. Thus, in this case (using j_2 = i_1, (B.11) with m = n, and (3.16a)),

    D(F) = a_{i_n} · D_1(a_{i_{n−1}}, . . . , a_{i_1}) = ∏_{k=1}^n a_{i_k},    (B.16)

again, as desired, and completing the induction. □
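For small n, the directive machinery of Def. and Rem. B.2 can be simulated exhaustively, confirming the theorem by brute force. A Python sketch (names ad hoc, not part of the text; composition is again string concatenation, and families are listed with ascending index):

```python
def apply_directive(fam, k):
    """One admissible composition directive (B.4a): combine the juxtaposed
    elements a_{i_{k+1}} and a_{i_k} (fam is listed with ascending index,
    composition is string concatenation)."""
    return fam[:k] + [fam[k + 1] + fam[k]] + fam[k + 2:]

def all_results(fam):
    """Evaluate fam under every admissible evaluation directive."""
    if len(fam) == 1:
        return {fam[0]}
    results = set()
    for k in range(len(fam) - 1):       # every admissible directive at this step
        results |= all_results(apply_directive(fam, k))
    return results

fam = ["a1", "a2", "a3", "a4", "a5"]    # (a_{i_5}, ..., a_{i_1}), ascending index
assert all_results(fam) == {"a5a4a3a2a1"}   # a single value, as Th. B.3 asserts
```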

B.2 Commutativity

In the present section, we will generalize the law of commutativity ab = ba to a finite number of factors, provided the composition is also associative.

Theorem B.4 (General Law of Commutativity). Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). If the composition is commutative, i.e. if

    ∀ a,b∈A :  ab = ba,    (B.17)

then

    ∀ n∈N  ∀ π∈S_n  ∀ a_1,…,a_n ∈ A :  ∏_{i=1}^n a_i = ∏_{i=1}^n a_{π(i)},    (B.18)

where S_n is the set of bijective maps on {1, . . . , n} (cf. Ex. 4.9(b)).

Proof. We conduct the proof via induction on n: For n = 1, there is nothing to prove and, for n = 2, (B.18) is the same as (B.17). So let n > 2. As π is bijective, we may define k := π^{−1}(n). Then

    ∏_{i=1}^n a_{π(i)} = (∏_{i=k+1}^n a_{π(i)}) · a_{π(k)} · (∏_{i=1}^{k−1} a_{π(i)})       (by Th. B.3)
                       = a_{π(k)} · (∏_{i=k+1}^n a_{π(i)}) · (∏_{i=1}^{k−1} a_{π(i)}).      (by (B.17))    (B.19)

Define the bijective map

    ϕ : {1, . . . , n − 1} −→ {1, . . . , n − 1},   ϕ(j) := { π(j)       for 1 ≤ j ≤ k − 1,
                                                              π(j + 1)   for k ≤ j ≤ n − 1    (B.20)

(where the bijectivity of π implies the bijectivity of ϕ). Then, as a_{π(k)} = a_n,

    ∏_{i=1}^n a_{π(i)} = a_n · (∏_{i=k}^{n−1} a_{ϕ(i)}) · (∏_{i=1}^{k−1} a_{ϕ(i)})    (by (B.19))
                       = a_n · ∏_{i=1}^{n−1} a_{ϕ(i)}                                 (by Th. B.3)
                       = a_n · ∏_{i=1}^{n−1} a_i                                      (by ind. hyp.)
                       = ∏_{i=1}^n a_i,                                               (by (3.16a))    (B.21)

completing the induction proof. □

The following example shows that, if the composition is not associative, then, in general, (B.17) does not imply (B.18):

Example B.5. Let A := {a, b, c} with #A = 3 (i.e. the elements a, b, c are all distinct). Let the composition · on A be defined according to the composition table

    · | a  b  c
    --+---------
    a | b  b  b
    b | b  b  a
    c | b  a  a

Then, clearly, · is commutative. However, · is not associative, since, e.g.,

    (cb)a = aa = b ≠ a = cb = c(ba),    (B.22)

and (B.18) does not hold, since, e.g.,

    a(bc) = aa = b ≠ a = cb = c(ba).    (B.23)
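The example is small enough to check by machine; a Python sketch encoding the composition table as a nested dict (names ad hoc):

```python
op = {  # composition table of Example B.5
    "a": {"a": "b", "b": "b", "c": "b"},
    "b": {"a": "b", "b": "b", "c": "a"},
    "c": {"a": "b", "b": "a", "c": "a"},
}
A = ["a", "b", "c"]

# commutativity (B.17) holds:
assert all(op[x][y] == op[y][x] for x in A for y in A)

# associativity (B.2) fails, e.g. for (c, b, a) as in (B.22):
assert op[op["c"]["b"]]["a"] != op["c"][op["b"]["a"]]
```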

C Groups
In Ex. 4.9(b), we defined the symmetric group S_M of bijective maps from M into M (i.e. of so-called permutations). It is a remarkable result that every group G is isomorphic to a subgroup of the permutation group S_G, and the proof is surprisingly simple (cf. Th. C.2 below). On the other hand, this result seems to have surprisingly few useful applications, partly due to the fact that S_G is usually much bigger (and, thus, usually more difficult to study) than G (cf. Prop. C.4 below).
We start with a preparatory lemma:

Lemma C.1. Let M, N be sets and let ψ : M −→ N be bijective. Then the symmetric groups S_M and S_N are isomorphic.

Proof. Exercise. 

Theorem C.2 (Cayley). Let (G, ·) be a group. Then G is isomorphic to a subgroup of the symmetric group (S_G, ◦) of permutations on G. In particular, every finite group with #G = n is isomorphic to a subgroup of S_n.

Proof. Exercise. Hint: Show that

    φ : G −→ S_G,   a ↦ f_a,   f_a(x) := ax,    (C.1)

defines a monomorphism and use Lem. C.1. □
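For a small group, the monomorphism (C.1) can be tabulated explicitly. A Python sketch for (Z_4, +) (names ad hoc, not part of the text; each f_a is stored as a tuple of values, visibly a permutation of G):

```python
G = range(4)                                  # Z_4 with addition mod 4

def f(a):
    """Left translation f_a(x) = a + x (the group operation), cf. (C.1)."""
    return tuple((a + x) % 4 for x in G)

# phi: a -> f_a is injective (a monomorphism into S_G):
assert len({f(a) for a in G}) == 4
# each f_a is a bijection of G, i.e. an element of S_G:
assert all(sorted(f(a)) == list(G) for a in G)
# homomorphism property: f_{a*b} = f_a composed with f_b
assert all(f((a + b) % 4) == tuple(f(a)[x] for x in f(b)) for a in G for b in G)
```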

Notation C.3. Let M, N be sets. Define

    S(M, N) := {(f : M −→ N) : f bijective}.

Proposition C.4. Let M, N be sets with #M = #N = n ∈ N_0, S := S(M, N) (cf. Not. C.3). Then #S = n!; in particular, #S_M = n!.

Proof. We conduct the proof via induction: If n = 0, then S contains precisely the empty map (i.e. the empty set) and #S = 1 = 0! is true. If n = 1 and M = {a}, N = {b}, then S contains precisely the map f : M −→ N, f(a) = b, and #S = 1 = 1! is true. For the induction step, fix n ∈ N and assume #M = #N = n + 1. Let a ∈ M and

    A := ⋃_{b∈N} S(M \ {a}, N \ {b}).    (C.2)

Since the union in (C.2) is finite and disjoint, one has

    #A = ∑_{b∈N} #S(M \ {a}, N \ {b}) = ∑_{b∈N} n! = (n + 1) · n! = (n + 1)!,    (C.3)

where the induction hypothesis was used for the second equality. Thus, it suffices to show

    φ : S −→ A,   φ(f) : M \ {a} −→ N \ {f(a)},   φ(f) := f↾_{M\{a}},    (C.4)

is well-defined and bijective. If f : M −→ N is bijective, then f↾_{M\{a}} : M \ {a} −→ N \ {f(a)} is bijective as well, i.e. φ is well-defined. Suppose f, g ∈ S with f ≠ g. If f(a) ≠ g(a), then φ(f) ≠ φ(g), as they have different codomains. If f(a) = g(a), then there exists x ∈ M \ {a} with f(x) ≠ g(x), implying φ(f)(x) = f(x) ≠ g(x) = φ(g)(x), i.e., once again, φ(f) ≠ φ(g). Thus, φ is injective. Now let h ∈ S(M \ {a}, N \ {b}) for some b ∈ N. Letting

    f : M −→ N,   f(x) := { b      for x = a,
                            h(x)   for x ≠ a,

we have φ(f) = h, showing φ to be surjective as well. □
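For small n, Prop. C.4 is easy to confirm by brute force; a Python sketch using the standard library (names ad hoc):

```python
from itertools import permutations
from math import factorial

M, N = ["a", "b", "c", "d"], [1, 2, 3, 4]
# each permutation of N yields exactly one bijection f: M -> N and vice versa
bijections = [dict(zip(M, p)) for p in permutations(N)]
assert len(bijections) == factorial(len(M))   # #S(M, N) = n! = 24
```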

D Number Theory
The algebraic discipline called number theory studies properties of the ring of integers (and similar algebraic structures).

Theorem D.1 (Remainder Theorem). For each pair of numbers (a, b) ∈ N^2, there exists a unique pair of numbers (q, r) ∈ N_0^2 satisfying the two conditions a = qb + r and 0 ≤ r < b.

Proof. Existence: Define

    q := max{n ∈ N_0 : nb ≤ a},    (D.1a)
    r := a − qb.    (D.1b)

Then q ∈ N_0 by definition and (D.1b) immediately yields a = qb + r as well as r ∈ Z. Moreover, from (D.1a), qb ≤ a = qb + r, i.e. 0 ≤ r, in particular, r ∈ N_0. Since (D.1a) also implies (q + 1)b > a = qb + r, we also have b > r as required.

Uniqueness: Suppose (q_1, r_1) ∈ N_0^2 satisfies the two conditions a = q_1 b + r_1 and 0 ≤ r_1 < b. Then q_1 b = a − r_1 ≤ a and (q_1 + 1)b = a − r_1 + b > a, showing q_1 = max{n ∈ N_0 : nb ≤ a} = q. This, in turn, implies r_1 = a − q_1 b = a − qb = r, thereby establishing uniqueness. □
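The existence proof is constructive: q is the largest n with nb ≤ a. A Python sketch comparing this definition with the built-in divmod (names ad hoc):

```python
def remainder_pair(a, b):
    """(q, r) per (D.1a)/(D.1b): q = max{n in N_0 : n*b <= a}, r = a - q*b."""
    q = 0
    while (q + 1) * b <= a:
        q += 1
    return q, a - q * b

a, b = 23, 5
assert remainder_pair(a, b) == divmod(a, b) == (4, 3)   # 23 = 4*5 + 3, 0 <= 3 < 5
```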

Definition D.2. (a) Let a, k ∈ Z. We define a to be a divisor of k (and also say that a divides k, denoted a | k) if, and only if, there exists b ∈ Z such that k = ab. If a is not a divisor of k, then we write a ∤ k.

(b) A number p ∈ N, p ≥ 2, is called prime if, and only if, 1 and p are its only divisors.

(c) Let M be a nonempty set of integers. If M ≠ {0}, then the number

    gcd(M) := max{n ∈ N : ∀ k∈M : n | k}    (D.2)

is called the greatest common divisor of the numbers in M. If M = {a, b}, then one also writes gcd(a, b) := gcd(M). The numbers in M are called relatively prime or mutually prime if, and only if, gcd(M) = 1. If M is finite and 0 ∉ M, then the number

    lcm(M) := min{n ∈ N : ∀ k∈M : k | n}    (D.3)

is called the least common multiple of the numbers in M. If M = {a, b}, then one also writes lcm(a, b) := lcm(M).

Remark D.3. Let M be a nonempty set of integers. If M ≠ {0}, then gcd(M) is well-defined, since, for each k ∈ Z, 1 | k, and, if 0 ≠ k ∈ M, then gcd(M) ≤ |k|. If M is finite and 0 ∉ M, then lcm(M) is well-defined, since max{|k| : k ∈ M} ≤ lcm(M) ≤ ∏_{k∈M} |k|. Some examples are

    gcd(Z) = 1,   gcd(3Z) = 3,   gcd({8, 12, 20}) = 4,
    lcm(2, 3) = 6,   lcm(4, 8) = 8,   lcm({8, 12, 20}) = 120.
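For finite M, (D.2) and (D.3) can be evaluated literally, the search bound for lcm(M) coming from the estimate above; a Python sketch reproducing the examples (names ad hoc):

```python
from math import prod

def gcd_set(M):
    """(D.2): the largest n in N dividing every k in M (M finite, M != {0})."""
    return max(n for n in range(1, max(abs(k) for k in M if k != 0) + 1)
               if all(k % n == 0 for k in M))

def lcm_set(M):
    """(D.3): the smallest n in N divisible by every k in M (0 not in M)."""
    return min(n for n in range(1, prod(abs(k) for k in M) + 1)
               if all(n % k == 0 for k in M))

assert gcd_set({8, 12, 20}) == 4
assert lcm_set({8, 12, 20}) == 120 and lcm_set({2, 3}) == 6
```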

Theorem D.4 (Bézout’s Lemma). If a, b ∈ Z with {a, b} ≠ {0} and d := gcd(a, b), then

    {xa + yb : x, y ∈ Z} = d Z.    (D.4)

In particular,

    ∃ x,y∈Z :  xa + yb = d,    (D.5)

which is known as Bézout’s identity. An important special case is

    gcd(a, b) = 1  ⇒  ∃ x,y∈Z :  xa + yb = 1.    (D.6)

Proof. Clearly, it suffices to prove (D.4). To prove (D.4), we show

    S := {xa + yb : x, y ∈ Z} ∩ N = d N.    (D.7)

By choosing (x, y) := (±1, 0) or (x, y) := (0, ±1), we see that S contains |a| or |b|, at least one of which is positive, i.e. S ≠ ∅. Set m := min S. Then there exist s, t ∈ Z with sa + tb = m. We first show m = gcd(a, b) = d: We apply Th. D.1 to obtain (q, r) ∈ N_0^2 with

    |a| = qm + r ∧ 0 ≤ r < m.

Then, letting s̃ := s for a = |a| and s̃ := −s for a = −|a|, we have

    r = |a| − qm = |a| − q(|a|s̃ + bt) = |a|(1 − qs̃) − bqt ∈ S ∪ {0}.

Since m = min S and r < m, this yields r = 0 and m | a. Using b instead of a in the above reasoning yields m | b as well, showing m to be a common divisor of a and b. Now let c ∈ N be an arbitrary common divisor of a and b. Then there exist u, v ∈ Z such that a = cu and b = cv, implying

    m = as + bt = cus + cvt = c(us + vt).

In consequence, c | m, implying c ≤ m, showing m = gcd(a, b) = d, as desired. Finally, as d | a and d | b, every element of S is a positive multiple of d and, conversely, kd = k(sa + tb) ∈ S for each k ∈ N, proving (D.7). □
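The proof above obtains the coefficients of (D.5) non-constructively (as a minimum); the standard constructive alternative is the extended Euclidean algorithm, sketched here in Python (names ad hoc, not part of the text):

```python
def extended_gcd(a, b):
    """Return (d, x, y) with x*a + y*b = d = gcd(a, b) (extended Euclid)."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    d, x, y = extended_gcd(b, a % b)
    # a % b = a - (a // b) * b, so substitute back into x*b + y*(a % b) = d:
    return d, y, x - (a // b) * y

d, x, y = extended_gcd(240, 46)
assert d == 2 and x * 240 + y * 46 == 2   # Bezout's identity (D.5)
```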

Theorem D.5 (Euclid’s Lemma). Let a, b ∈ Z and n ∈ N.

(a) If n and a are relatively prime (i.e. gcd(n, a) = 1), then

    n | ab  ⇒  n | b.

(b) If n is prime, then

    n | ab  ⇒  (n | a ∨ n | b).

Proof. (a): As gcd(n, a) = 1, from (D.6), we obtain x, y ∈ Z such that xn + ya = 1. Multiplying this equation by b yields

    xnb + yab = b.

Moreover, by hypothesis, there exists c ∈ Z with ab = nc, implying n(xb + yc) = b, i.e. n | b as claimed.

(b): Suppose n is prime, n | ab, and n ∤ a. As n is prime, the only divisors of n are 1 and n. As n ∤ a, we know gcd(n, a) = 1, implying n | b by (a). □

Theorem D.6 (Fundamental Theorem of Arithmetic). Every n ∈ N, n ≥ 2, has a unique factorization into prime numbers (unique, except for the order of the primes). More precisely, given n as above, there exists a unique function k_n : P_n −→ N such that P_n ⊆ {p ∈ N : p prime} and

    n = ∏_{p∈P_n} p^{k_n(p)}    (D.8)

(then, clearly, P_n is necessarily finite).

Proof. Existence: This is a simple induction proof: The base case is n = 2. As 2 is clearly prime, we can let P_2 := {2} and k_2(2) := 1. For the induction step, let n > 2. If n is prime, then, as in the base case, let P_n := {n} and k_n(n) := 1. If n is not prime, then there exist a, b ∈ N with 1 < a ≤ b < n and n = ab. By induction hypothesis, there exist functions k_a : P_a −→ N, k_b : P_b −→ N, such that P_a, P_b ⊆ {p ∈ N : p prime} and

    a = ∏_{p∈P_a} p^{k_a(p)},   b = ∏_{p∈P_b} p^{k_b(p)}.

Letting P_n := P_a ∪ P_b and

    k_n : P_n −→ N,   k_n(p) := { k_a(p)            for p ∈ P_a \ P_b,
                                  k_b(p)            for p ∈ P_b \ P_a,
                                  k_a(p) + k_b(p)   for p ∈ P_a ∩ P_b,

we obtain

    n = ab = ∏_{p∈P_a} p^{k_a(p)} ∏_{p∈P_b} p^{k_b(p)} = ∏_{p∈P_a\P_b} p^{k_a(p)} ∏_{p∈P_b\P_a} p^{k_b(p)} ∏_{p∈P_a∩P_b} p^{k_a(p)+k_b(p)} = ∏_{p∈P_n} p^{k_n(p)},

thereby completing the induction proof.

Uniqueness: Here we will make use of Th. D.5(b). As for existence, the proof is via induction on n. Since n = 2 has precisely the divisors 1 and 2, uniqueness for the base case is clear. In the same way, uniqueness is clear for every prime n > 2. Thus, for the induction step, it only remains to consider the case where n > 2 is not prime. Suppose we have P, Q ⊆ {p ∈ N : p prime} and k : P −→ N, l : Q −→ N, such that

    n = ∏_{p∈P} p^{k(p)} = ∏_{q∈Q} q^{l(q)}.

Let p_0 ∈ P. Since p_0 is prime and p_0 | n = ∏_{q∈Q} q^{l(q)}, Th. D.5(b) implies p_0 | q_0 for some q_0 ∈ Q. As p_0 and q_0 are both prime, this implies p_0 = q_0 and

    m := n/p_0 = p_0^{k(p_0)−1} ∏_{p∈P\{p_0}} p^{k(p)} = p_0^{l(p_0)−1} ∏_{q∈Q\{p_0}} q^{l(q)}.

Since n is not prime, we know m ≥ 2. As m < n, we may use the induction hypothesis to conclude P = Q, k(p) = l(p) for each p ∈ P \ {p_0}, and k(p_0) − 1 = l(p_0) − 1, i.e. k = l as desired. □
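The existence part of the proof implicitly describes an algorithm: split off factors until only primes remain. Trial division does this directly; a Python sketch returning the exponent function k_n as a dict (names ad hoc):

```python
def factorize(n):
    """Return {p: k_n(p)} with n = prod of p**k_n(p) over the keys, per (D.8)."""
    assert n >= 2
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:              # split off the prime factor p
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                          # the remaining factor is prime
        factors[n] = factors.get(n, 0) + 1
    return factors

assert factorize(360) == {2: 3, 3: 2, 5: 1}   # 360 = 2^3 * 3^2 * 5
```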
Theorem D.7 (Euclid’s Theorem). The set of prime numbers is infinite.

Proof. We show that no finite set of primes contains every prime: Let ∅ ≠ P be a finite set of prime numbers. Define p := ∏_{q∈P} q and p̃ := p + 1. Then p̃ ∉ P, since p̃ > q for each q ∈ P. Thus, if p̃ is prime, we are already done. If p̃ is not prime, then there exists a prime n such that n | p̃ (this is immediate from Th. D.6, but also follows easily without Th. D.6, since p̃ can only have finitely many factors a with 1 < a < p̃). If n ∈ P, then there is a ∈ N such that na = p. As there is also b ∈ N such that nb = p̃, we have 1 = p̃ − p = nb − na = n(b − a), implying n = b − a = 1, in contradiction to 1 ∉ P. Thus, n ∉ P, completing the proof. □
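The construction of the proof can be executed literally: ∏_{q∈P} q + 1 always leads to a prime outside P, either by being prime itself or via its smallest nontrivial factor. A Python sketch (names ad hoc, not part of the text):

```python
from math import prod

def new_prime(P):
    """Given a finite set P of primes, return a prime not in P,
    following the proof of Th. D.7."""
    p_tilde = prod(P) + 1
    n = 2
    while p_tilde % n != 0:    # the smallest factor > 1 of p_tilde is prime
        n += 1
    return n                   # n not in P, since p_tilde % q == 1 for q in P

P = {2, 3, 5, 7}
assert new_prime(P) not in P   # here: 2*3*5*7 + 1 = 211, itself prime
```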

E Vector Spaces

E.1 Cardinality and Dimension


Theorem E.1. Let V be a vector space over the field F and let B be a basis of V.

(a) #V = #F^B_fin, i.e. there exists a bijective map φ : V −→ F^B_fin, where F^B_fin is defined as in Ex. 5.16(c).

(b) If B is infinite and #F ≤ #B (i.e. there exists an injective map φ_F : F −→ B), then

    #V = #F^B_fin = #2^B_fin = #P_fin(B) = #B,    (E.1)

i.e. there exist bijective functions

    φ_1 : V −→ F^B_fin,   φ_2 : F^B_fin −→ {0, 1}^B_fin,
    φ_3 : {0, 1}^B_fin −→ P_fin(B),   φ_4 : P_fin(B) −→ B.

Proof. (a): Given v ∈ V and b ∈ B, let c_v(b) denote the coordinate of v with respect to b according to Th. 5.19. Then

    φ : V −→ F^B_fin,   φ(v)(b) := c_v(b),

is well-defined and bijective by Th. 5.19.

(b): The map φ_1 exists according to (a), whereas φ_2, φ_3, φ_4 exist according to Th. A.60. □
Corollary E.2. Let F be a field and let M be an infinite set. If #F ≤ #M (i.e. there exists an injective map φ_F : F −→ M), then

    dim F^M = #2^M (as vector space over F),    (E.2)

i.e. if B is a basis of F^M, then there exists a bijective map φ : B −→ P(M). In particular, we have

    dim R^R = #2^R (as vector space over R),    (E.3a)
    dim (Z_2)^N = #2^N = #R (as vector space over Z_2),    (E.3b)

where the last equality of (E.3b) holds by [Phi16, Th. F.2].

Proof. Let B be a basis of F^M. Since F^M_fin is a subspace of F^M and dim F^M_fin = #M by (5.36), there exists an injective map φ_M : M −→ B by Th. 5.27(b). In consequence, the map (φ_M ◦ φ_F) : F −→ B is injective as well. Thus, Th. E.1(b) applies and proves (E.2). For (E.3a), we apply (E.2) with F = M = R; for (E.3b), we apply (E.2) with F = Z_2 and M = N. □

E.2 Cartesian Products


Example E.3. In generalization of Ex. 5.2(c), if I is a nonempty index set, F is a field, and (Y_i)_{i∈I} is a family of vector spaces over F, then we make the Cartesian product V := ∏_{i∈I} Y_i into a vector space over F by defining for each a := (a_i)_{i∈I}, b := (b_i)_{i∈I} ∈ V:

    (a + b)_i := a_i + b_i,    (E.4a)
    (λa)_i := λa_i for each λ ∈ F.    (E.4b)

To verify that (V, +, ·) is, indeed, a vector space, we begin by checking Def. 5.1(i), i.e. by showing (V, +) is a commutative group: Let n = (n_i)_{i∈I} ∈ V, n_i := 0 ∈ Y_i. Then

    ∀ a=(a_i)_{i∈I} ∈ V  ∀ i∈I :  (a + n)_i = a_i + 0 = a_i,

i.e. a + n = a, showing n is the additive neutral element of V. Next, given a = (a_i)_{i∈I} ∈ V, define ā = (ā_i)_{i∈I}, ā_i := −a_i. Then

    ∀ i∈I :  (a + ā)_i = a_i + (−a_i) = 0,

i.e. a + ā = n = 0, showing ā to be the additive inverse element of a (as usual, one then writes −a instead of ā). Associativity is verified since

    ∀ (a_i)_{i∈I}, (b_i)_{i∈I}, (c_i)_{i∈I} ∈ V  ∀ i∈I :  ((a + b) + c)_i = (a_i + b_i) + c_i (∗)= a_i + (b_i + c_i) = (a + (b + c))_i,

where the associativity of addition in the vector space Y_i was used at (∗). Likewise, commutativity is verified since

    ∀ (a_i)_{i∈I}, (b_i)_{i∈I} ∈ V  ∀ i∈I :  (a + b)_i = a_i + b_i (∗)= b_i + a_i = (b + a)_i,

where the commutativity of addition in the vector space Y_i was used at (∗). This completes the proof that (V, +) is a commutative group.

To verify Def. 5.1(ii), one computes

    ∀ λ∈F  ∀ (a_i)_{i∈I}, (b_i)_{i∈I} ∈ V  ∀ i∈I :  (λ(a + b))_i = λ(a_i + b_i) (∗)= λa_i + λb_i = (λa + λb)_i,

where distributivity in the vector space Y_i was used at (∗), proving λ(a + b) = λa + λb. Similarly,

    ∀ λ,µ∈F  ∀ (a_i)_{i∈I} ∈ V  ∀ i∈I :  ((λ + µ)a)_i = (λ + µ)a_i (∗)= λa_i + µa_i = (λa + µa)_i,

where, once more, distributivity in the vector space Y_i was used at (∗), proving (λ + µ)a = λa + µa.

The proof of Def. 5.1(iii) is given by

    ∀ λ,µ∈F  ∀ (a_i)_{i∈I} ∈ V  ∀ i∈I :  ((λµ)a)_i = (λµ)a_i (∗)= λ(µa_i) = (λ(µa))_i,

where the validity of Def. 5.1(iii) in the vector space Y_i was used at (∗).

Finally, to prove Def. 5.1(iv), one computes

    ∀ (a_i)_{i∈I} ∈ V  ∀ i∈I :  (1 · a)_i = 1 · a_i (∗)= a_i,

where the validity of Def. 5.1(iv) in the vector space Y_i was used at (∗).
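The componentwise operations (E.4a)/(E.4b) are straightforward to model, e.g. with the index set I as dict keys and each Y_i realized as R; a minimal Python sketch (all names ad hoc, not part of the text):

```python
I = ["i1", "i2", "i3"]                             # index set, Y_i = R for each i

def add(a, b):
    """(E.4a): (a + b)_i = a_i + b_i, componentwise in each Y_i."""
    return {i: a[i] + b[i] for i in I}

def scale(lam, a):
    """(E.4b): (lam a)_i = lam a_i."""
    return {i: lam * a[i] for i in I}

a = {"i1": 1.0, "i2": 2.0, "i3": 3.0}
n = {i: 0.0 for i in I}                            # the neutral element n
assert add(a, n) == a                              # a + n = a
assert add(a, scale(-1.0, a)) == n                 # a + (-a) = n
```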
Example E.4. In generalization of Ex. 6.7(c), we consider general projections defined on a general Cartesian product of vector spaces, as defined in Ex. E.3 above: Let I be a nonempty index set, let (Y_i)_{i∈I} be a family of vector spaces over the field F, and let V := ∏_{i∈I} Y_i be the Cartesian product vector space as defined in Ex. E.3. For each ∅ ≠ J ⊆ I, define

    π_J : V −→ V_J := ∏_{i∈J} Y_i,   π_J((a_i)_{i∈I}) := (a_i)_{i∈J},

calling π_J the projection onto the coordinates in J. We verify π_J to be linear: Let λ ∈ F and a := (a_i)_{i∈I}, b := (b_i)_{i∈I} ∈ V. Then

    π_J(λa) = (λa_i)_{i∈J} = λ(a_i)_{i∈J} = λπ_J(a),
    π_J(a + b) = (a_i + b_i)_{i∈J} = (a_i)_{i∈J} + (b_i)_{i∈J} = π_J(a) + π_J(b),

proving π_J to be linear. Moreover, for each ∅ ≠ J ⊆ I, we have

    Im π_J = V_J,   ker π_J = {(a_i)_{i∈I} : a_i = 0 for each i ∈ J} ≅ ∏_{i∈I\J} Y_i,   V = V_J ⊕ ker π_J,    (E.5)

where, for the last equality, we identified

    V_J ≅ {(a_i)_{i∈I} : a_i = 0 for each i ∉ J}.
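The projection π_J then simply restricts the key set to J; a self-contained Python sketch in the same spirit (names ad hoc):

```python
I = ["i1", "i2", "i3"]                 # index set, Y_i = R for each i
J = ["i1", "i3"]                       # coordinates kept by pi_J

def pi_J(a):
    """pi_J((a_i)_{i in I}) = (a_i)_{i in J}: restrict the key set to J."""
    return {i: a[i] for i in J}

a = {"i1": 1.0, "i2": 2.0, "i3": 3.0}
b = {"i1": 4.0, "i2": 5.0, "i3": 6.0}
a_plus_b = {i: a[i] + b[i] for i in I}                        # (E.4a)
# linearity: pi_J(a + b) = pi_J(a) + pi_J(b)
assert pi_J(a_plus_b) == {i: pi_J(a)[i] + pi_J(b)[i] for i in J}
# an element of ker pi_J vanishes on J:
assert pi_J({"i1": 0.0, "i2": 7.0, "i3": 0.0}) == {"i1": 0.0, "i3": 0.0}
```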

References
[Bla84] A. Blass. Existence of Bases Implies the Axiom of Choice. Contemporary Mathematics 31 (1984), 31–33.

[EFT07] H.-D. Ebbinghaus, J. Flum, and W. Thomas. Einführung in die mathematische Logik, 5th ed. Spektrum Akademischer Verlag, Heidelberg, 2007 (German).

[Hal17] Lorenz J. Halbeisen. Combinatorial Set Theory, 2nd ed. Springer Monographs in Mathematics, Springer, Cham, Switzerland, 2017.

[Jec73] T. Jech. The Axiom of Choice. North-Holland, Amsterdam, 1973.

[Kun80] Kenneth Kunen. Set Theory. Studies in Logic and the Foundations of Mathematics, Vol. 102, North-Holland, Amsterdam, 1980.

[Kun12] Kenneth Kunen. The Foundations of Mathematics. Studies in Logic, Vol. 19, College Publications, London, 2012.

[Phi16] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, LMU Munich, 2015/2016, AMS Open Math Notes Ref. # OMN:202109.111306, available in PDF format at https://www.ams.org/open-math-notes/omn-view-listing?listingId=111306.

[Phi17] P. Philip. Functional Analysis. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_FunctionalAnalysis.pdf.

[Phi19] P. Philip. Linear Algebra II. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2019, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_LinearAlgebra2.pdf.

[Phi21] P. Philip. Numerical Mathematics I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2020/2021, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_NumericalMathematics1.pdf.

[Str08] Gernot Stroth. Lineare Algebra, 2nd ed. Berliner Studienreihe zur Mathematik, Vol. 7, Heldermann Verlag, Lemgo, Germany, 2008 (German).

[Wik15a] Wikipedia. Coq — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/Coq. Online; accessed Sep-01-2015.

[Wik15b] Wikipedia. HOL Light — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/HOL_Light. Online; accessed Sep-01-2015.

[Wik15c] Wikipedia. Isabelle (proof assistant) — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/Isabelle_(proof_assistant). Online; accessed Sep-01-2015.
