Peter Philip∗
Lecture Notes
Originally Created for the Class of Winter Semester 2018/2019 at LMU Munich
Includes Subsequent Corrections and Revisions†
Contents
1 Foundations: Mathematical Logic and Set Theory 5
1.1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Propositional Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Predicate Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5 Vector Spaces 90
5.1 Vector Spaces and Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 Linear Independence, Basis, Dimension . . . . . . . . . . . . . . . . . . . 97
7 Matrices 129
7.1 Definition and Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.2 Matrices as Representations of Linear Maps . . . . . . . . . . . . . . . . 134
7.3 Rank and Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.4 Special Types of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.5 Blockwise Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . 150
C Groups 224
References 232
1 FOUNDATIONS: MATHEMATICAL LOGIC AND SET THEORY 5
for all vectors v, w and all scalars λ, µ. Before we can properly define and study vector
spaces and linear equations, some preparatory work is still needed. In modern mathematics,
the objects under investigation are almost always so-called sets. So one aims at
deriving (i.e. proving) true (and interesting and useful) statements about sets from other
statements about sets known or assumed to be true. Such a derivation or proof means
applying logical rules that guarantee the truth of the derived (i.e. proved) statement.
However, unfortunately, a proper definition of the notion of set is not easy, and neither
is an appropriate treatment of logic and proof theory. Here, we will only be able to
briefly touch on the bare necessities from logic and set theory needed to proceed to the
core matter of this class. We begin with logic in Sec. 1.2, followed by set theory in
Sec. 1.3, combining both in Sec. 1.4. The interested student can find an introductory
presentation of axiomatic set theory in Appendix A and should consider taking a
separate class on set theory, logic, and proof theory at a later time.
Mathematical logic is a large field in its own right and, as indicated above, a thorough in-
troduction is beyond the scope of this class – the interested reader may refer to [EFT07],
[Kun12], and references therein. Here, we will just introduce some basic concepts using
common English (rather than formal symbolic languages – a concept touched on in Sec.
A.2 of the Appendix and more thoroughly explained in books like [EFT07]).
As mentioned before, mathematics establishes the truth or falsehood of statements. By
a statement or proposition we mean any sentence (any sequence of symbols) that can
reasonably be assigned a truth value, i.e. a value of either true, abbreviated T, or false,
abbreviated F. The following example illustrates the difference between statements and
sentences that are not statements:
The fourth sentence in Ex. 1.1(b) is not a statement, as it cannot be said to be either
true or false without further knowledge of x. The fifth sentence in Ex. 1.1(b) is
not a statement, as it lacks any meaning and can, hence, be neither true nor false. It
would become a statement if given a definition of what it means for a natural number
to be green.
The next step now is to combine statements into new statements using logical operators,
where the truth value of the combined statements depends on the truth values of the
original statements and on the type of logical operator facilitating the combination.
The simplest logical operator is negation, denoted ¬. It is actually a so-called unary
operator, i.e. it does not combine statements, but is merely applied to one statement.
For example, if A stands for the statement “Every dog is an animal.”, then ¬A stands
for the statement “Not every dog is an animal.”; and if B stands for the statement “The
number 4 is odd.”, then ¬B stands for the statement “The number 4 is not odd.”, which
can also be expressed as “The number 4 is even.”
To completely understand the action of a logical operator, one usually writes what is
known as a truth table. For negation, the truth table is
A ¬A
T F (1.1)
F T
that means if the input statement A is true, then the output statement ¬A is false; if
the input statement A is false, then the output statement ¬A is true.
We now proceed to discuss binary logical operators, i.e. logical operators combining
precisely two statements. The following four operators are essential for mathematical
reasoning:
Conjunction: A and B, usually denoted A ∧ B.
Disjunction: A or B, usually denoted A ∨ B.
Implication: A implies B, usually denoted A ⇒ B.
Equivalence: A is equivalent to B, usually denoted A ⇔ B.
Here is the corresponding truth table:
A B   A ∧ B   A ∨ B   A ⇒ B   A ⇔ B
T T     T       T       T       T
T F     F       T       F       F          (1.2)
F T     F       T       T       F
F F     F       F       T       T
When first seen, some of the assignments of truth values in (1.2) might not be completely
intuitive, due to the fact that logical operators are often used somewhat differently in
common English. Let us consider each of the four logical operators of (1.2) in sequence:
For the use in subsequent examples, let A1 , . . . , A6 denote the six statements from Ex.
1.1(a).
Conjunction: Most likely the easiest of the four, basically identical to common language
use: A ∧ B is true if, and only if, both A and B are true. For example, using Ex. 1.1(a),
A1 ∧ A4 is the statement “Every dog is an animal and 2 + 3 = 5.”, which is true since
both A1 and A4 are true. On the other hand, A1 ∧ A3 is the statement “Every dog is
an animal and the number 4 is odd.”, which is false, since A3 is false.
Disjunction: The disjunction A ∨ B is true if, and only if, at least one of the statements
A, B is true. Here one already has to be a bit careful – A ∨ B defines the inclusive or,
whereas “or” in common English is often understood to mean the exclusive or (which is
false if both input statements are true). For example, using Ex. 1.1(a), A1 ∨ A4 is the
statement “Every dog is an animal or 2 + 3 = 5.”, which is true since both A1 and A4
are true. The statement A1 ∨ A3 , i.e. “Every dog is an animal or the number 4 is odd.”
is also true, since A1 is true. However, the statement A2 ∨ A5 , i.e. “Every animal is a
dog or √2 < 0.” is false, as both A2 and A5 are false.
As you will have noted in the above examples, logical operators can be applied to
combine statements that have no obvious relation in content. While this might seem
strange, introducing content-related restrictions is unnecessary as well as undesirable,
since it is often not clear which seemingly unrelated statements might suddenly appear
in a common context in the future. The same occurs when considering implications and
equivalences, where it might seem even more obscure at first.
Implication: Instead of A implies B, one also says if A then B, B is a consequence
of A, B is concluded or inferred from A, A is sufficient for B, or B is necessary for
A. The implication A ⇒ B is always true, except if A is true and B is false. At first
glance, it might be surprising that A ⇒ B is defined to be true for A false and B true,
however, there are many examples of incorrect statements implying correct statements.
For instance, squaring the (false) equality of integers −1 = 1 yields the (true) equality
of integers 1 = 1. However, as with conjunction and disjunction, it is perfectly valid
to combine statements without any obvious context relation: For example, using Ex.
1.1(a), the statement A1 ⇒ A6 , i.e. “Every dog is an animal implies x + 1 > 0 holds for
each natural number x.” is true, since A6 is true, whereas the statement A4 ⇒ A2 , i.e.
“2 + 3 = 5 implies every animal is a dog.” is false, as A4 is true and A2 is false.
Of course, the implication A ⇒ B is not really useful in situations where the truth
values of both A and B are already known. Rather, in a typical application, one tries
to establish the truth of A to prove the truth of B (a strategy that will fail if A happens
to be false).
Example 1.2. Suppose we know Sasha to be a member of a group of students taking
a class in Analysis. Then the statement A “Sasha has taken a class in Analysis before.”
implies the statement B “There is at least one student in the group who has taken the
class before.” A priori, we might not know if Sasha has taken the Analysis class before,
but if we can establish that Sasha has, indeed, taken the class before, then we also know
B to be true. If we find Sasha to be taking the class for the first time, then we do not
know whether B is true or false.
—
Equivalence: A ⇔ B means A is true if, and only if, B is true. Once again, using
input statements from Ex. 1.1(a), we see that A1 ⇔ A4 , i.e. “Every dog is an animal
is equivalent to 2 + 3 = 5.”, is true as well as A2 ⇔ A3 , i.e. “Every animal is a dog is
equivalent to the number 4 is odd.”. On the other hand, A4 ⇔ A5 , i.e. “2 + 3 = 5 is
equivalent to √2 < 0.”, is false.
Analogous to the situation of implications, A ⇔ B is not really useful if the truth values
of both A and B are known a priori, but can be a powerful tool to prove B to be true
or false by establishing the truth value of A. It is obviously more powerful than the
implication as illustrated by the following example (compare with Ex. 1.2):
Example 1.3. Suppose we know Sasha has obtained the highest score among the stu-
dents registered for the Analysis class. Then the statement A “Sasha has taken the
Analysis class before.” is equivalent to the statement B “The student with the highest
score has taken the class before.” As in Ex. 1.2, if we can establish Sasha to have taken
the class before, then we also know B to be true. However, in contrast to Ex. 1.2, if we
find Sasha to have taken the class for the first time, then we know B to be false.
Remark 1.4. In computer science, the truth value T is often coded as 1 and the truth
value F is often coded as 0.
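Using this coding of T as 1 and F as 0, the truth tables of Sec. 1.2.2 can be tabulated by a short program. The following sketch (an illustration of mine, not part of the notes; the helper names implies and equiv are my own) prints one row per assignment:

```python
# Truth tables for the binary operators of Sec. 1.2.2,
# coding T as 1 and F as 0, cf. Remark 1.4.

def implies(a, b):
    # A => B is false only in the case A true, B false
    return (not a) or b

def equiv(a, b):
    # A <=> B is true iff A and B have the same truth value
    return a == b

print("A B  A∧B A∨B A⇒B A⇔B")
for a in (True, False):
    for b in (True, False):
        print(int(a), int(b), " ",
              int(a and b),        # conjunction
              int(a or b),         # disjunction (inclusive or)
              int(implies(a, b)),  # implication
              int(equiv(a, b)))    # equivalence
```

Running it reproduces the four rows of the operator truth table row by row.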
1.2.3 Rules
Note that the expressions in the first row of the truth table (1.2) (e.g. A ∧ B) are not
statements in the sense of Sec. 1.2.1, as they contain the statement variables (also known
as propositional variables) A or B. However, the expressions become statements if all
statement variables are substituted with actual statements. We will call expressions of
this form propositional formulas. Moreover, if a truth value is assigned to each statement
variable of a propositional formula, then this uniquely determines the truth value of the
formula. In other words, the truth value of the propositional formula can be calculated
from the respective truth values of its statement variables – a first justification for the
name propositional calculus.
(b) The propositional formula A ∨ (¬A), also known as the law of the excluded middle,
has the remarkable property that its truth value is T for every possible choice of
truth values for A:
A ¬A A ∨ (¬A)
T F T (1.4)
F T T
Formulas with this property are of particular importance.
Definition 1.6. A propositional formula is called a tautology or universally true if,
and only if, its truth value is T for all possible assignments of truth values to all the
statement variables it contains.
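Since a formula in n statement variables has only 2^n possible assignments, being a tautology is mechanically checkable. A sketch (my illustration; the name is_tautology is hypothetical), verifying the law of the excluded middle:

```python
from itertools import product

def is_tautology(formula, n):
    """Check whether a propositional formula, given as a Python
    function of n boolean arguments, is true under all 2**n
    assignments of truth values to its statement variables."""
    return all(formula(*vals) for vals in product((True, False), repeat=n))

# Law of the excluded middle, A ∨ (¬A), cf. (1.4):
print(is_tautology(lambda a: a or (not a), 1))   # True

# By contrast, A ∧ (¬A) is false under every assignment:
print(is_tautology(lambda a: a and (not a), 1))  # False
```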
Notation 1.7. We write φ(A1 , . . . , An ) if, and only if, the propositional formula φ
contains precisely the n statement variables A1 , . . . , An .
Definition 1.8. The propositional formulas φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) are called
equivalent if, and only if, φ(A1 , . . . , An ) ⇔ ψ(A1 , . . . , An ) is a tautology.
Lemma 1.9. The propositional formulas φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) are equiva-
lent if, and only if, they have the same truth value for all possible assignments of truth
values to A1 , . . . , An .
Proof. If φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) are equivalent and Ai is assigned the truth
value ti , i = 1, . . . , n, then φ(A1 , . . . , An ) ⇔ ψ(A1 , . . . , An ) being a tautology implies it
has truth value T. From (1.2) we see that either φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) both
have truth value T or they both have truth value F.
If, on the other hand, we know φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) have the same truth
value for all possible assignments of truth values to A1 , . . . , An , then, given such an
assignment, either φ(A1 , . . . , An ) and ψ(A1 , . . . , An ) both have truth value T or both
have truth value F, i.e. φ(A1 , . . . , An ) ⇔ ψ(A1 , . . . , An ) has truth value T in each case,
showing it is a tautology.
For all logical purposes, two equivalent formulas are exactly the same – it does not
matter if one uses one or the other. The following theorem provides some important
equivalences of propositional formulas. As too many parentheses tend to make formulas
less readable, we first introduce some precedence conventions for logical operators:
Theorem 1.11. (a) (A ⇒ B) ⇔ ¬A ∨ B. This means one can actually define impli-
cation via negation and disjunction.
(b) (A ⇔ B) ⇔ ((A ⇒ B) ∧ (B ⇒ A)), i.e. A and B are equivalent if, and only if, A
is both necessary and sufficient for B. One also calls the implication B ⇒ A the
converse of the implication A ⇒ B. Thus, A and B are equivalent if, and only if,
both A ⇒ B and its converse hold true.
Proof. Each equivalence is proved by providing a truth table and using Lem. 1.9.
(a):
A B ¬A A ⇒ B ¬A ∨ B
T T F T T
T F F F F
F T T T T
F F T T T
(j): Exercise.
(k):
A ¬A ¬¬A
T F T
F T F
(l):
A B ¬A ¬B A ⇒ B ¬B ⇒ ¬A
T T F F T T
T F F T F F
F T T F T T
F F T T T T
Having checked all the rules completes the proof of the theorem.
The importance of the rules provided by Th. 1.11 lies in the proof techniques they provide,
i.e. methods for establishing the truth of statements from statements known or assumed
to be true. The rules of Th. 1.11 will be used frequently in proofs throughout this class.
Remark 1.12. Another important proof technique is the so-called proof by contradic-
tion, also called indirect proof. It is based on the observation, called the principle of
contradiction, that A ∧ ¬A is always false:
A ¬A A ∧ ¬A
T F F (1.5)
F T F
Two more rules we will use regularly in subsequent proofs are the so-called transitivity
of implication and the transitivity of equivalence (we will encounter equivalence again
in the context of relations in Sec. 1.3 below). In preparation for the transitivity rules,
we generalize implication to propositional formulas:
Proof. According to Def. 1.13, the rules can be verified by providing truth tables that
show that, for all possible assignments of truth values to the propositional formulas on
the left-hand side of the implications, either the left-hand side is false or both sides are
true. (a):
A B C A ⇒ B B ⇒ C (A ⇒ B) ∧ (B ⇒ C) A ⇒ C
T T T T T T T
T F T F T F T
F T T T T T T
F F T T T T T
T T F T F F F
T F F F T F F
F T F T F F T
F F F T T T T
(b):
A B C A ⇔ B B ⇔ C (A ⇔ B) ∧ (B ⇔ C) A ⇔ C
T T T T T T T
T F T F F F T
F T T F T F F
F F T T F F F
T T F T F F F
T F F F T F F
F T F F F F T
F F F T T T T
Having checked both rules, the proof is complete.
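The eight rows of each table above can also be exhausted programmatically; the following sketch (my illustration, not part of the notes) confirms both transitivity rules:

```python
from itertools import product

def implies(a, b):
    # A => B is false only for A true, B false
    return (not a) or b

triples = list(product((True, False), repeat=3))  # all 8 assignments

# Transitivity of implication: (A ⇒ B) ∧ (B ⇒ C) implies A ⇒ C.
assert all(implies(implies(a, b) and implies(b, c), implies(a, c))
           for a, b, c in triples)

# Transitivity of equivalence: (A ⇔ B) ∧ (B ⇔ C) implies A ⇔ C.
assert all(implies((a == b) and (b == c), a == c)
           for a, b, c in triples)

print("transitivity verified for all 8 assignments")
```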
Definition 1.18 means we know everything about a set M if, and only if, we know all its
elements.
Definition 1.19. The set with no elements is called the empty set; it is denoted by the
symbol ∅.
Example 1.20. For finite sets, we can simply write down all their elements, for example,
A := {0}, B := {0, 17.5}, C := {5, 1, 5, 3}, D := {3, 5, 1}, E := {√2, 2, −2}, where the
symbolism “:=” is to be read as “is defined to be equal to”.
Note C = D, since both sets contain precisely the same elements. In particular, the
order in which the elements are written down plays no role and a set does not change if
an element is written down more than once.
If a set has many elements, instead of writing down all its elements, one might use
abbreviations such as F := {−4, −2, . . . , 20, 22, 24}, where one has to make sure the
meaning of the dots is clear from the context.
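These conventions match the behavior of finite sets in most programming languages; a quick illustrative sketch (using the names C, D, F as defined above):

```python
# Order and repetition of elements are irrelevant, cf. C = D above:
C = {5, 1, 5, 3}
D = {3, 5, 1}
print(C == D)   # True
print(len(C))   # 3: the repeated 5 counts only once

# The abbreviated set F = {-4, -2, ..., 20, 22, 24} written out explicitly:
F = set(range(-4, 25, 2))
print(len(F))   # 15
```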
Definition 1.21. The set A is called a subset of the set B (denoted A ⊆ B and also
referred to as the inclusion of A in B) if, and only if, every element of A is also an
element of B (one sometimes also calls B a superset of A and writes B ⊇ A). Please
note that A = B is allowed in the above definition of a subset. If A ⊆ B and A ≠ B,
then A is called a strict subset of B, denoted A ⊊ B.
If B is a set and P (x) is a statement about an element x of B (i.e., for each x ∈ B,
P (x) is either true or false), then we can define a subset A of B by writing
A := {x ∈ B : P (x)}. (1.6)
This notation is supposed to mean that the set A consists precisely of those elements of
B such that P (x) is true (has the truth value T in the language of Sec. 1.2).
(c) We have {3} ⊆ {6.7, 3, 0}. Letting A := {−10, −8, . . . , 8, 10}, we have {−2, 0, 2} =
{x ∈ A : x³ ∈ A}, ∅ = {x ∈ A : x + 21 ∈ A}.
Remark 1.23. As a consequence of Def. 1.18, the sets A and B are equal if, and only
if, one has both inclusions, namely A ⊆ B and B ⊆ A. Thus, when proving the equality
of sets, one often divides the proof into two parts, first proving one inclusion, then the
other.
Definition 1.24. (a) The intersection of the sets A and B, denoted A ∩ B, consists of
all elements that are in A and in B. The sets A, B are said to be disjoint if, and
only if, A ∩ B = ∅.
(b) The union of the sets A and B, denoted A ∪ B, consists of all elements that are in
A or in B (as in the logical disjunction in (1.2), the or is meant nonexclusively). If
A and B are disjoint, one sometimes writes A ∪˙ B and speaks of the disjoint union
of A and B.
(c) The difference of the sets A and B, denoted A \ B (read “A minus B” or “A without
B”), consists of all elements of A that are not elements of B, i.e. A \ B := {x ∈
A : x ∉ B}. If B is a subset of a given set A (sometimes called the universe in
this context), then A \ B is also called the complement of B with respect to A.
In that case, one also writes Bᶜ := A \ B (note that this notation suppresses the
dependence on A).
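Intersection, union, difference, and complement all have direct counterparts for finite sets in Python; an illustrative sketch with small hypothetical sets A, B, U:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

print(sorted(A & B))   # [3, 4]          intersection A ∩ B
print(sorted(A | B))   # [1, 2, 3, 4, 5] union A ∪ B
print(sorted(A - B))   # [1, 2]          difference A \ B

# Complement of B with respect to the universe U (requires B ⊆ U):
U = {1, 2, 3, 4, 5, 6}
Bc = U - B
print(sorted(Bc))      # [1, 2, 6]
```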
Example 1.25. (a) Examples of Intersections:
As mentioned earlier, it will often be unavoidable to consider sets of sets. Here are first
examples: {∅, {0}, {0, 1}}, {{0, 1}, {1, 2}}.
Definition 1.26. Given a set A, the set of all subsets of A is called the power set of A,
denoted P(A) (for reasons explained later (cf. Prop. 2.18), the power set is sometimes
also denoted as 2^A).
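For a finite set A with n elements, P(A) has 2^n elements, which is the reason behind the notation 2^A. A sketch computing power sets (the helper name power_set is mine; Python requires the inner sets to be hashable, hence frozenset):

```python
from itertools import combinations

def power_set(A):
    """Return the set of all subsets of the finite set A,
    each subset represented as a frozenset."""
    elems = list(A)
    return {frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

P = power_set({0, 1})
print(len(P))                      # 4 == 2**2
print(frozenset() in P)            # True: the empty set is a subset
print(len(power_set({1, 2, 3})))   # 8 == 2**3
```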
So far, we have restricted our set-theoretic examples to finite sets. However, not
surprisingly, many sets of interest to us will be infinite (we will have to postpone a
mathematically precise definition of finite and infinite to Sec. 3.2). We will now introduce
the simplest infinite set.
Definition 1.28. The set N := {1, 2, 3, . . . } is called the set of natural numbers (for a
more rigorous construction of N, based on the axioms of axiomatic set theory, see Sec.
A.3.4 of the Appendix, where Th. A.46 shows N to be, indeed, infinite). Moreover, we
define N0 := {0} ∪ N.
—
Proof. In each case, the proof results from the corresponding rule of Th. 1.11:
(a):
x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B ⇔ x ∈ B ∧ x ∈ A ⇔ x ∈ B ∩ A,
where the middle equivalence holds by Th. 1.11(c).
That means we are interested in statements involving universal quantification via the
quantifier “for all” (one also often uses “for each” or “for every” instead), existential
quantification via the quantifier “there exists”, or both. The quantifier of universal
quantification is denoted by ∀ and the quantifier of existential quantification is denoted
by ∃. Using these symbols as well as N and R to denote the sets of natural and real
numbers, respectively, we can restate (1.11) as

∀x∈A P(x),   (1.13a)

∃x∈A P(x).   (1.13b)
In (1.13), A denotes a set and P (x) is a sentence involving the variable x, a so-called
predicate of x, that becomes a statement (i.e. becomes either true or false) if x is substi-
tuted with any concrete element of the set A (in particular, P (x) is allowed to contain
further quantifiers, but it must not contain any other quantifier involving x – one says
x must be a free variable in P (x), not bound by any quantifier in P (x)).
The universal statement (1.13a) has the truth value T if, and only if, P (x) has the truth
value T for all elements x ∈ A; the existential statement (1.13b) has the truth value T
if, and only if, P (x) has the truth value T for at least one element x ∈ A.
Remark 1.32. Some people prefer to write ⋀x∈A instead of ∀x∈A and ⋁x∈A instead of ∃x∈A.
Even though this notation has the advantage of emphasizing that the universal statement
can be interpreted as a big logical conjunction and the existential statement can be
interpreted as a big logical disjunction, it is significantly less common. So we will stick
to ∀ and ∃ in this class.
Remark 1.33. According to Def. 1.31, the existential statement (1.13b) is true if, and
only if, P (x) is true for at least one x ∈ A. So if there is precisely one such x, then
(1.13b) is true; and if there are several different x ∈ A such that P (x) is true, then
(1.13b) is still true. Uniqueness statements are often of particular importance, and one
sometimes writes

∃!x∈A P(x)   (1.14)

for the statement “there exists a unique x ∈ A such that P(x) is true”. This notation
can be defined as an abbreviation for

∃x∈A (P(x) ∧ ∀y∈A (P(y) ⇒ x = y)).   (1.15)
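Over a finite set, the expansion (1.15) reduces to counting the elements satisfying P; a sketch (my illustration, with the hypothetical helper name exists_unique):

```python
def exists_unique(A, P):
    """True iff there is precisely one x in the finite set A
    with P(x) true, following the expansion (1.15)."""
    return sum(1 for x in A if P(x)) == 1

A = {1, 2, 3, 4}
print(exists_unique(A, lambda x: x > 3))   # True: only x = 4
print(exists_unique(A, lambda x: x > 2))   # False: both 3 and 4 qualify
print(exists_unique(A, lambda x: x > 4))   # False: no such x
```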
Remark 1.35. As for propositional calculus, we also have some important rules for
predicate calculus:
(a) Consider the negation of a universal statement, ¬∀x∈A P(x), which is true if, and
only if, P(x) does not hold for each x ∈ A, i.e. if, and only if, there exists at least
one x ∈ A such that P(x) is false (such that ¬P(x) is true). We have just proved
the rule

¬∀x∈A P(x) ⇔ ∃x∈A ¬P(x).   (1.17a)

Similarly, consider the negation of an existential statement. We claim the corresponding
rule is

¬∃x∈A P(x) ⇔ ∀x∈A ¬P(x).   (1.17b)
(b) If A, B are sets and P(x, y) denotes a predicate of both x and y, then ∀x∈A ∀y∈B P(x, y)
and ∀y∈B ∀x∈A P(x, y) both hold true if, and only if, P(x, y) holds true for each x ∈ A
and each y ∈ B, i.e. the order of two consecutive universal quantifiers does not
matter:

∀x∈A ∀y∈B P(x, y) ⇔ ∀y∈B ∀x∈A P(x, y)   (1.19a)
(b) As a more complicated example, consider the negation of the uniqueness statement
So how do we decode the expression we have obtained at the end? It states that if
P(x) holds for some x ∈ A, then there must be at least a second, different, element
y ∈ A such that P(y) is true. This is, indeed, precisely the negation of ∃!x∈A P(x).
(d) The following example shows that different quantifiers do, in general, not commute
(i.e. do not yield equivalent statements when commuted): While the statement

∀x∈R ∃y∈R y > x   (1.24a)

is true (for each real number x, there is a bigger real number y, e.g. y := x + 1 will
do the job), the statement

∃y∈R ∀x∈R y > x   (1.24b)

is false (for example, since y > y is false). In particular, (1.24a) and (1.24b) are not
equivalent.
(e) Even though (1.14) provides useful notation, it is better not to think of ∃! as a
quantifier. It is really just an abbreviation for (1.15), and it behaves very differently
from ∃ and ∀: The following examples show that, in general, ∃! commutes neither
with ∃, nor with itself:
∃n∈N ∃!m∈N m < n  ⇎  ∃!m∈N ∃n∈N m < n

(the statement on the left is true, as one can choose n = 2, but the statement on
the right is false, as ∃n∈N m < n holds for every m ∈ N). Similarly,

∃!n∈N ∃!m∈N m < n  ⇎  ∃!m∈N ∃!n∈N m < n

(the statement on the left is still true and the statement on the right is still false
(there is no m ∈ N such that ∃!n∈N m < n)).
Remark 1.37. One can make the following observations regarding the strategy for
proving universal and existential statements:
(a) To prove that ∀x∈A P(x) is true, one must check the truth of P(x) for every element
x ∈ A – examples are not enough!

(b) To prove that ∀x∈A P(x) is false, it suffices to find one x ∈ A such that P(x) is
false – such an x is then called a counterexample and one counterexample is always
enough to prove ∀x∈A P(x) is false!

(c) To prove that ∃x∈A P(x) is true, it suffices to find one x ∈ A such that P(x) is true
– such an x is then called an example and one example is always enough to prove
∃x∈A P(x) is true!
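Over finite sets, these strategies become Python's all and any; a sketch (my illustration, over the hypothetical domain A = {1, . . . , 100}), including a counterexample search as in (b):

```python
A = range(1, 101)

# (a) Universal statement: P(x) must be checked for every x in A.
print(all(x + 1 > 0 for x in A))      # True

# (b) To refute a universal statement, one counterexample suffices:
print(all(x % 2 == 0 for x in A))     # False
counterexample = next(x for x in A if x % 2 != 0)
print(counterexample)                 # 1

# (c) To prove an existential statement, one example suffices:
print(any(x * x == 49 for x in A))    # True, witnessed by x = 7
```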
The subfield of mathematical logic dealing with quantified statements is called predicate
calculus. In general, one does not restrict the quantified variables to range only over
elements of sets (as we have done above). Again, we refer to [EFT07] for a deeper
treatment of the subject.
As an application of quantified statements, let us generalize the notion of union and
intersection:
Definition 1.38. Let I ≠ ∅ be a nonempty set, usually called an index set in the present
context. For each i ∈ I, let Ai denote a set (some or all of the Ai can be identical).
The union ⋃i∈I Ai consists of all elements x that belong to at least one Ai. The union
is called disjoint if, and only if, for each i, j ∈ I, i ≠ j implies Ai ∩ Aj = ∅.
Proposition 1.39. Let I ≠ ∅ be an index set, let M denote a set, and, for each i ∈ I,
let Ai denote a set. The following set-theoretic rules hold:

(a) (⋂i∈I Ai) ∩ M = ⋂i∈I (Ai ∩ M).

(b) (⋃i∈I Ai) ∪ M = ⋃i∈I (Ai ∪ M).

(c) (⋂i∈I Ai) ∪ M = ⋂i∈I (Ai ∪ M).

(d) (⋃i∈I Ai) ∩ M = ⋃i∈I (Ai ∩ M).

(e) M \ ⋂i∈I Ai = ⋃i∈I (M \ Ai).

(f) M \ ⋃i∈I Ai = ⋂i∈I (M \ Ai).
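The six rules can be spot-checked on concrete finite families; the sketch below (my illustration, with a small hypothetical family Ai and set M) tests the De Morgan-type rules (e) and (f):

```python
from functools import reduce

M = {1, 2, 3, 4, 5, 6}
family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]   # the A_i for I = {0, 1, 2}

big_cap = reduce(set.intersection, family)   # ⋂ A_i = {3}
big_cup = reduce(set.union, family)          # ⋃ A_i = {1, 2, 3, 4, 5}

# (e): M \ ⋂ A_i = ⋃ (M \ A_i)
print(M - big_cap == reduce(set.union, [M - A for A in family]))         # True

# (f): M \ ⋃ A_i = ⋂ (M \ A_i)
print(M - big_cup == reduce(set.intersection, [M - A for A in family]))  # True
```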
Proof. We prove (c) and (e) and leave the remaining proofs as an exercise.
(c):

x ∈ (⋂i∈I Ai) ∪ M ⇔ x ∈ M ∨ ∀i∈I x ∈ Ai ⇔(∗) ∀i∈I (x ∈ Ai ∨ x ∈ M) ⇔ x ∈ ⋂i∈I (Ai ∪ M).

To justify the equivalence at (∗), we make use of Th. 1.11(b) and verify ⇒ and ⇐. For
⇒ note that the truth of x ∈ M implies x ∈ Ai ∨ x ∈ M is true for each i ∈ I. If x ∈ Ai
is true for each i ∈ I, then x ∈ Ai ∨ x ∈ M is still true for each i ∈ I. To verify ⇐, note
that the truth of x ∈ M implies the truth of x ∈ M ∨ ∀i∈I x ∈ Ai. If x ∈ M is false,
then x ∈ Ai must be true for each i ∈ I, showing x ∈ M ∨ ∀i∈I x ∈ Ai is true also in
this case.
2 FUNCTIONS AND RELATIONS 25
(e):

x ∈ M \ ⋂i∈I Ai ⇔ x ∈ M ∧ ¬∀i∈I x ∈ Ai ⇔ x ∈ M ∧ ∃i∈I x ∉ Ai
⇔ ∃i∈I x ∈ M \ Ai ⇔ x ∈ ⋃i∈I (M \ Ai).
Comparing with the notation of Def. 1.38, in (1.26a), for example, we have I = R and
Ai = N for each i ∈ I (where, in (1.26a), we have written x instead of i). Similarly, in
(1.26b), we have I = N and An = {1, 2, . . . , n} for each n ∈ I.
2.1 Functions
Definition 2.1. Let A, B be sets. Given x ∈ A, y ∈ B, the set

(x, y) := {{x}, {x, y}}   (2.1)

is called the ordered pair (often shortened to just pair) consisting of x and y. The set of
all such pairs is called the Cartesian product A × B, i.e.
A × ∅ = ∅ × A = ∅,   (2.3a)
{1, 2} × {1, 2, 3} = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)}   (2.3b)
≠ {1, 2, 3} × {1, 2} = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}.   (2.3c)
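A sketch reproducing (2.3b) and (2.3c) with itertools.product (my illustration), showing in particular that the Cartesian product is not commutative:

```python
from itertools import product

AxB = set(product({1, 2}, {1, 2, 3}))
BxA = set(product({1, 2, 3}, {1, 2}))

print(len(AxB))                       # 6, the pairs of (2.3b)
print(AxB == BxA)                     # False, cf. (2.3c)

# A × ∅ = ∅ × A = ∅, cf. (2.3a):
print(set(product({1, 2}, set())))    # set()
```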
Definition 2.3. Given sets A, B, a function or map f is an assignment rule that assigns
to each x ∈ A a unique y ∈ B. One then also writes f (x) for the element y. The set A
is called the domain of f , denoted dom(f ), and B is called the codomain of f , denoted
codom(f ). The information about a map f can be concisely summarized by the notation
f : A −→ B,  x ↦ f(x),   (2.5)

where x ↦ f(x) is called the assignment rule for f , f(x) is called the image of x, and
x is called a preimage of f(x) (the image must be unique, but there might be several
preimages). The set

graph(f) := {(x, y) ∈ A × B : y = f(x)}   (2.6)
is called the graph of f (not to be confused with pictures visualizing the function f ,
which are also called graph of f ). If one wants to be completely precise, then one
identifies the function f with the ordered triple (A, B, graph(f )).
The set of all functions with domain A and codomain B is denoted by F(A, B) or B^A,
i.e.

F(A, B) := B^A := {(f : A −→ B) : A = dom(f) ∧ B = codom(f)}.   (2.7)
Caveat: Some authors reserve the word map for continuous functions, but we use func-
tion and map synonymously.
f (T ) := {f (x) ∈ B : x ∈ T } (2.8)
f −1 (U ) := {x ∈ A : f (x) ∈ U } (2.9)
(c) f is called injective or one-to-one if, and only if, every y ∈ B has at most one
preimage, i.e. if, and only if, the preimage of {y} has at most one element:

f injective ⇔ ∀y∈B (f⁻¹{y} = ∅ ∨ ∃!x∈A f(x) = y)
            ⇔ ∀x₁,x₂∈A (x₁ ≠ x₂ ⇒ f(x₁) ≠ f(x₂)).   (2.10)
(d) f is called surjective or onto if, and only if, every element of the codomain of f has
a preimage:
(e) f is called bijective if, and only if, f is injective and surjective.
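For a function between finite sets, represented as a Python dict of x ↦ f(x) assignments, these properties are directly checkable; an illustrative sketch (the helper names are my own):

```python
def is_injective(f):
    """f given as dict {x: f(x)}: injective iff no value
    occurs twice, i.e. no y has two preimages."""
    values = list(f.values())
    return len(values) == len(set(values))

def is_surjective(f, codomain):
    """Surjective iff every element of the codomain has a preimage."""
    return set(f.values()) == set(codomain)

f = {1: 'a', 2: 'b', 3: 'c'}
print(is_injective(f))                          # True
print(is_surjective(f, {'a', 'b', 'c'}))        # True, hence f is bijective
print(is_surjective(f, {'a', 'b', 'c', 'd'}))   # False: 'd' has no preimage
```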
f({1, 2}) = {5, 4} = f⁻¹({1, 2}),   h̃⁻¹({2, 4, 6}) = {1, 2, 3, 4, 5, 6},   (2.13)
Example 2.6. (a) For each nonempty set A, the map Id : A −→ A, Id(x) := x, is
called the identity on A. If one needs to emphasize that Id operates on A, then one
also writes IdA instead of Id. The identity is clearly bijective.
(b) Let A, B be nonempty sets. A map f : A −→ B is called constant if, and only if,
there exists c ∈ B such that f (x) = c for each x ∈ A. In that case, one also writes
f ≡ c, which can be read as “f is identically equal to c”. If f ≡ c, ∅ 6= T ⊆ A, and
U ⊆ B, then (
A for c ∈ U ,
f (T ) = {c}, f −1 (U ) = (2.14)
∅ for c ∈/ U.
f is injective if, and only if, A = {x}; f is surjective if, and only if, B = {c}.
f(S ∩ T) ⊆ f(S) ∩ f(T),   (2.16a)

f(⋂i∈I Si) ⊆ ⋂i∈I f(Si),   (2.16b)

f(S ∪ T) = f(S) ∪ f(T),   (2.16c)

f(⋃i∈I Si) = ⋃i∈I f(Si),   (2.16d)

f⁻¹(U ∩ V) = f⁻¹(U) ∩ f⁻¹(V),   (2.16e)

f⁻¹(⋂i∈I Ui) = ⋂i∈I f⁻¹(Ui),   (2.16f)

f⁻¹(U ∪ V) = f⁻¹(U) ∪ f⁻¹(V),   (2.16g)

f⁻¹(⋃i∈I Ui) = ⋃i∈I f⁻¹(Ui),   (2.16h)

f(f⁻¹(U)) ⊆ U,   f⁻¹(f(S)) ⊇ S,   (2.16i)

f⁻¹(U \ V) = f⁻¹(U) \ f⁻¹(V).   (2.16j)
Proof. We prove (2.16b) (which includes (2.16a) as a special case) and the second part
of (2.16i), and leave the remaining cases as exercises.
For (2.16b), one argues

y ∈ f(⋂i∈I Si) ⇔ ∃x∈A (∀i∈I x ∈ Si ∧ y = f(x)) ⇒ ∀i∈I y ∈ f(Si) ⇔ y ∈ ⋂i∈I f(Si).
The observation
It is an exercise to find counterexamples that show one can not, in general, replace the
four subset symbols in (2.16) by equalities (it is possible to find examples with sets that
have at most 2 elements).
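As a hint toward such counterexamples, the sketch below (my illustration) exhibits a non-injective f for which f(S ∩ T) is a strict subset of f(S) ∩ f(T), using sets with at most 2 elements:

```python
# f: {1, 2} -> {0} collapses both points; take S = {1}, T = {2}.
f = {1: 0, 2: 0}
S, T = {1}, {2}

def image(f, X):
    """Image f(X) = {f(x) : x in X} of X under f (f given as dict)."""
    return {f[x] for x in X}

print(image(f, S & T))             # set(): S ∩ T = ∅, so f(S ∩ T) = ∅
print(image(f, S) & image(f, T))   # {0}: strictly larger
```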
f : N −→ R,  n ↦ n²,   (2.18a)
g : N −→ R,  n ↦ 2n.   (2.18b)
showing that composing functions is, in general, not commutative, even if the involved
functions have the same domain and the same codomain.
h ◦ (g ◦ f ) = (h ◦ g) ◦ f. (2.20)
The maps

g1 : N −→ N,  g1(n) := n/2 if n is even,  g1(n) := 1 if n is odd,   (2.23b)

g2 : N −→ N,  g2(n) := n/2 if n is even,  g2(n) := 2 if n is odd,   (2.23c)
both constitute left inverses of f . It follows from Th. 2.13(c) below that f does not
have a right inverse.
The maps
both constitute right inverses of f . It follows from Th. 2.13(c) below that f does
not have a left inverse.
the inverse is

g⁻¹ : N −→ N,  g⁻¹(n) := 3 for n = 1,  1 for n = 2,  2 for n = 3,  n for n ∉ {1, 2, 3}.   (2.25c)
While Examples 2.12(a),(b) show that left and right inverses are usually not unique,
they are unique provided f is bijective (see Th. 2.13(c)).
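The one-sided inverses of Example 2.12(a) can be replayed in Python. The concrete f below (the doubling map n ↦ 2n) is an assumption chosen so that (2.23b) and (2.23c) are indeed left inverses; the sketch is an illustration, not part of the formal development:

```python
f = lambda n: 2 * n                          # assumed doubling map
g1 = lambda n: n // 2 if n % 2 == 0 else 1   # (2.23b)
g2 = lambda n: n // 2 if n % 2 == 0 else 2   # (2.23c)

# Both are left inverses of f:
print(all(g1(f(n)) == n == g2(f(n)) for n in range(1, 100)))  # True
# Neither is a right inverse: f misses the odd numbers, e.g. f(g1(3)) = 2.
print(f(g1(3)))
```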
(a) f : A −→ B is right invertible if, and only if, f is surjective (where the implication
“⇐” makes use of the axiom of choice (AC), see Appendix A.4).
(c) f : A −→ B is invertible if, and only if, f is bijective. In this case, the right inverse
and the left inverse are unique and both identical to the inverse.
Proof. (a): If f is surjective, then, for each y ∈ B, there exists xy ∈ f −1 {y} such that
f (xy ) = y. By AC, we can define the choice function
g : B −→ A, g(y) := xy . (2.26)
Then, for each y ∈ B, f (g(y)) = y, showing g is a right inverse of f . Conversely, if
g : B −→ A is a right inverse of f, then, for each y ∈ B, one has y = f(g(y)), showing that
g(y) ∈ A is a preimage of y, i.e. f is surjective.
(b): Fix a ∈ A. If f is injective, then, for each y ∈ B with f⁻¹{y} ≠ ∅, let xy denote
the unique element in A satisfying f(xy) = y. Define
g : B −→ A, g(y) := xy for f⁻¹{y} ≠ ∅, a otherwise. (2.27)
Then, for each x ∈ A, g(f (x)) = x, showing g is a left inverse of f . Conversely, if
g : B −→ A is a left inverse of f and x1 , x2 ∈ A with f (x1 ) = f (x2 ) = y, then
x1 = (g ◦ f )(x1 ) = g(f (x1 )) = g(f (x2 )) = (g ◦ f )(x2 ) = x2 , showing y has precisely one
preimage and f is injective.
The first part of (c) follows immediately by combining (a) and (b) (and, actually, without
using AC, since, if f is both injective and surjective, then, for each y ∈ B, the element
xy ∈ f −1 {y} is unique, and (2.26) can be defined without AC). It merely remains to
verify the uniqueness of right and left inverse for bijective maps. So let g be a left inverse
of f , let h be a right inverse of f , and let f −1 be an inverse of f . Then, for each y ∈ B,
g(y) = (g ◦ (f ◦ f⁻¹))(y) = ((g ◦ f) ◦ f⁻¹)(y) = f⁻¹(y), (2.28a)
h(y) = ((f⁻¹ ◦ f) ◦ h)(y) = (f⁻¹ ◦ (f ◦ h))(y) = f⁻¹(y), (2.28b)
thereby proving the uniqueness of left and right inverse for bijective maps.
Theorem 2.14. Consider maps f : A −→ B, g : B −→ C. If f and g are both injective
(resp. both surjective, both bijective), then so is g ◦ f . Moreover, in the bijective case,
one has
(g ◦ f )−1 = f −1 ◦ g −1 . (2.29)
Proof. Exercise.
Definition 2.15. (a) Given an index set I and a set A, a map f : I −→ A is sometimes
called a family (of elements in A), and is denoted in the form f = (ai )i∈I with
ai := f (i). When using this representation, one often does not even specify f and
A, especially if the ai are themselves sets.
(b) A sequence in a set A is a family of elements in A, where the index set is the set of
natural numbers N. In this case, one writes (an )n∈N or (a1 , a2 , . . . ). More generally,
a family is called a sequence, given a bijective map between the index set I and a
subset of N.
(c) Given a family of sets (Ai )i∈I , we define the Cartesian product of the Ai to be the
set of functions
∏_{i∈I} Ai := { f : I −→ ⋃_{j∈I} Aj : ∀_{i∈I} f(i) ∈ Ai }. (2.30)
If I has precisely n elements with n ∈ N, then the elements of the Cartesian product
∏_{i∈I} Ai are called (ordered) n-tuples, (ordered) triples for n = 3.
Example 2.16. (a) Using the notion of family, we can now say that the intersection
⋂_{i∈I} Ai and union ⋃_{i∈I} Ai as defined in Def. 1.38 are the intersection and union of
the family of sets (Ai)_{i∈I}, respectively. As a concrete example, let us revisit (1.26b),
where we have
(An)_{n∈N}, An := {1, 2, . . . , n}, ⋂_{n∈N} An = {1}. (2.31)
In the following, we explain the common notation 2^A for the power set P(A) of a set
A. It is related to a natural identification between subsets and their corresponding
characteristic functions.
Given B ⊆ A, the function χB : A −→ {0, 1} with χB(x) := 1 for x ∈ B and χB(x) := 0 for x ∉ B
is called the characteristic function of the set B (with respect to the universe A). One
also finds the notations 1B and 𝟙B instead of χB (note that all the notations suppress
the dependence of the characteristic function on the universe A).
Proposition 2.18. The map χ : P(A) −→ {0, 1}^A, B ↦ χB,
is bijective (recall that P(A) denotes the power set of A and {0, 1}^A denotes the set of
all functions from A into {0, 1}).
Proposition 2.18 allows one to identify the sets P(A) and {0, 1}^A via the bijective map
χ. This fact together with the common practice of set theory to identify the number 2
with the set {0, 1} explains the notation 2^A for P(A).
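The identification P(A) ≅ {0, 1}^A of Prop. 2.18 can be made concrete for a small universe; the following Python sketch (an illustration only) enumerates both sides and observes that B ↦ χB is one-to-one and onto:

```python
from itertools import product

A = ["a", "b", "c"]   # a small universe with #A = 3

def chi(B):
    """chi_B as the tuple of values (chi_B(x)) for x in A, each in {0,1}."""
    return tuple(1 if x in B else 0 for x in A)

# One subset of A per 0-1 tuple, i.e. per function A -> {0,1}:
power_set = [{x for x, bit in zip(A, bits) if bit}
             for bits in product((0, 1), repeat=len(A))]
chis = [chi(B) for B in power_set]

print(len(power_set), len(set(chis)))  # both equal 2^3 = 8
```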
2.2 Relations
2.2.1 Definition and Properties
Definition 2.19. Given sets A and B, a relation is a subset R of A × B (if one wants
to be completely precise, a relation is an ordered triple (A, B, R), where R ⊆ A × B).
If A = B, then we call R a relation on A. One says that a ∈ A and b ∈ B are related
according to the relation R if, and only if, (a, b) ∈ R. In this context, one usually writes
a R b instead of (a, b) ∈ R.
Example 2.20. (a) The relations we are probably most familiar with are = and ≤.
The relation R of equality, usually denoted =, makes sense on every nonempty set
A:
R := ∆(A) := {(x, x) ∈ A × A : x ∈ A}. (2.36)
The set ∆(A) is called the diagonal of the Cartesian product A × A; as a subset of
A × A, the relation of equality is identical to the diagonal.
A relation R on A is called reflexive if, and only if,
∀_{x∈A} x R x. (2.41)
R is called symmetric if, and only if, x R y implies y R x for each x, y ∈ A,
i.e. if, and only if, each x is related to y if, and only if, y is related to x.
R is called antisymmetric if, and only if, x R y and y R x imply x = y,
i.e. if, and only if, the only possibility for x to be related to y at the same time that
y is related to x is in the case x = y.
R is called transitive if, and only if, x R y and y R z imply x R z,
i.e. if, and only if, the relatedness of x and y together with the relatedness of y and
z implies the relatedness of x and z.
Example 2.22. The relations = and ≤ on R (or N) are reflexive, antisymmetric, and
transitive; = is also symmetric, whereas ≤ is not; < is antisymmetric (since x < y ∧ y < x
is always false) and transitive, but neither reflexive nor symmetric. The relation
R := {(x, y) ∈ N² : (x, y are both even) ∨ (x, y are both odd)} (2.45)
is reflexive, symmetric, and transitive, but not antisymmetric (e.g. 2 R 4 and 4 R 2, but 2 ≠ 4). The relation
S := {(x, y) ∈ N² : y = x²} (2.46)
is not transitive (for example, 2 S 4 and 4 S 16, but not 2 S 16), not reflexive, not symmetric; it is only antisymmetric.
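The defining properties above are mechanical enough to check by brute force on finite restrictions of the relations; a small Python sketch (restricting (2.45) and (2.46) to {1, . . . , 16}, an arbitrary choice, illustration only):

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)

def is_symmetric(R, A):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R, A):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R, A):
    return all((x, z) in R for (x, y) in R for (y2, z) in R if y == y2)

A = range(1, 17)  # a finite initial segment of N
R = {(x, y) for x in A for y in A if x % 2 == y % 2}  # cf. (2.45)
S = {(x, y) for x in A for y in A if y == x * x}      # cf. (2.46)

print(is_reflexive(R, A), is_symmetric(R, A), is_transitive(R, A), is_antisymmetric(R, A))
# True True True False
print(is_reflexive(S, A), is_symmetric(S, A), is_transitive(S, A), is_antisymmetric(S, A))
# False False False True
```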
Definition 2.23. A relation R on a set A is called a partial order if, and only if, R is
reflexive, antisymmetric, and transitive. If R is a partial order, then one usually writes
x ≤ y instead of x R y. A partial order ≤ is called a total or linear order if, and only if,
for each x, y ∈ A, one has x ≤ y or y ≤ x.
Notation 2.24. Given a (partial or total) order ≤ on A ≠ ∅, we write x < y if, and
only if, x ≤ y and x ≠ y, calling < the strict order corresponding to ≤ (note that the
strict order is never a partial order).
(a) x ∈ A is called lower (resp. upper) bound for B if, and only if, x ≤ b (resp. b ≤ x)
for each b ∈ B. Moreover, B is called bounded from below (resp. from above) if, and
only if, there exists a lower (resp. upper) bound for B; B is called bounded if, and
only if, it is bounded from above and from below.
(b) x ∈ B is called minimum or just min (resp. maximum or max) of B if, and only if,
x is a lower (resp. upper) bound for B. One writes x = min B if x is minimum and
x = max B if x is maximum.
(c) A maximum of the set of lower bounds of B (i.e. a largest lower bound) is called
infimum of B, denoted inf B; a minimum of the set of upper bounds of B (i.e. a
smallest upper bound) is called supremum of B, denoted sup B.
Example 2.26. (a) For each A ⊆ R, the usual relation ≤ defines a total order on A.
For A = R, we see that N has 0 and 1 as lower bounds with 1 = min N = inf N. On
the other hand, N is unbounded from above. The set M := {1, 2, 3} is bounded
with min M = 1, max M = 3. The positive real numbers R+ := {x ∈ R : x > 0}
have inf R+ = 0, but they do not have a minimum (if x > 0, then 0 < x/2 < x).
(b) Consider A := N × N. Then the componentwise comparison
(m, n) ≤ (m′, n′) :⇔ m ≤ m′ ∧ n ≤ n′ (2.47)
defines a partial order on A that is not a total order (for example, neither (1, 2) ≤
(2, 1) nor (2, 1) ≤ (1, 2)). For the set
B := {(1, 1), (2, 1), (1, 2)}, (2.48)
we have inf B = min B = (1, 1), B does not have a max, but sup B = (2, 2) (if
(m, n) ∈ A is an upper bound for B, then (2, 1) ≤ (m, n) implies 2 ≤ m and
(1, 2) ≤ (m, n) implies 2 ≤ n, i.e. (2, 2) ≤ (m, n); since (2, 2) is clearly an upper
bound for B, we have proved sup B = (2, 2)).
A different order on A is the so-called lexicographic order defined by
(m, n) ≤ (m′, n′) :⇔ m < m′ ∨ (m = m′ ∧ n ≤ n′). (2.49)
In contrast to the order from (2.47), the lexicographic order does define a total
order on A.
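Both orders of this example can be compared experimentally; the following Python sketch (an illustration, with the search restricted to a finite grid) also recovers sup B = (2, 2):

```python
def leq_comp(p, q):   # componentwise order, a partial order on N x N
    return p[0] <= q[0] and p[1] <= q[1]

def leq_lex(p, q):    # lexicographic order, a total order on N x N
    return p[0] < q[0] or (p[0] == q[0] and p[1] <= q[1])

# (1,2) and (2,1) are incomparable componentwise, comparable lexicographically:
print(leq_comp((1, 2), (2, 1)), leq_comp((2, 1), (1, 2)), leq_lex((1, 2), (2, 1)))

# sup B = (2,2): the smallest upper bound of B, searched on a finite grid.
B = [(1, 1), (2, 1), (1, 2)]
grid = [(m, n) for m in range(1, 6) for n in range(1, 6)]
ubs = [u for u in grid if all(leq_comp(b, u) for b in B)]
sup = [u for u in ubs if all(leq_comp(u, v) for v in ubs)]
print(sup)  # [(2, 2)]
```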
Lemma 2.27. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. Then the relation ≥,
defined by
x ≥ y ⇔ y ≤ x, (2.50)
is also a partial order on A. Moreover, using obvious notation, we have, for each x ∈ A,
Proof. Reflexivity, antisymmetry, and transitivity of ≤ clearly imply the same properties
for ≥, respectively. Moreover
proving (2.51a). Analogously, we obtain (2.51b). Next, (2.51c) and (2.51d) are implied
by (2.51a) and (2.51b), respectively. Finally, (2.51e) is proved by
Proof. Exercise.
Definition 2.29. Let A, B be nonempty sets with partial orders, both denoted by ≤
(even though they might be different). A function f : A −→ B is called (strictly)
isotone, order-preserving, or increasing if, and only if,
∀_{x,y∈A} (x < y ⇒ f(x) ≤ f(y) (resp. f(x) < f(y))); (2.52a)
f is called (strictly) antitone, order-reversing, or decreasing if, and only if,
∀_{x,y∈A} (x < y ⇒ f(x) ≥ f(y) (resp. f(x) > f(y))). (2.52b)
Functions that are (strictly) isotone or antitone are called (strictly) monotone.
Proposition 2.30. Let A, B be nonempty sets with partial orders, both denoted by ≤.
(c) If the order ≤ on A is total and f : A −→ B is invertible and strictly isotone (resp.
antitone), then f −1 is also strictly isotone (resp. antitone).
is strictly isotone and invertible; however, f⁻¹ is not isotone (since 2 < 3, but
f⁻¹(2) = (1, 2) and f⁻¹(3) = (2, 1) are not comparable, i.e. f⁻¹(2) ≤ f⁻¹(3) is not
true).
(a) R is called an equivalence relation if, and only if, R is reflexive, symmetric, and
transitive. If R is an equivalence relation, then one often writes x ∼ y instead of
x R y.
[x] := {y ∈ A : x ∼ y} (2.53)
and call [x] the equivalence class of x. Moreover, each y ∈ [x] is called a represen-
tative of [x]. The set of all equivalence classes A/ ∼:= {[x] : x ∈ A} is called the
quotient set of A by ∼, and the map
π : A −→ A/ ∼, x 7→ [x], (2.54)
The following Th. 2.33 shows that the equivalence classes of an equivalence relation
on a nonempty set A decompose A into disjoint sets and, conversely, that a given
decomposition of a nonempty set A into disjoint nonempty sets Ai , i ∈ I, gives rise
to a unique equivalence relation ∼ on A such that the Ai are precisely the equivalence
classes corresponding to ∼:
(b) Given an equivalence relation ∼ on a nonempty set A, the equivalence classes given
by ∼ form a decomposition of A: One has
∀_{x,y∈A} (([x] = [y] ⇔ x ∼ y) ∧ ([x] ∩ [y] = ∅ ⇔ ¬(x ∼ y))) (2.57)
and
A = ⋃̇_{i∈I} Ai, (2.58)
To prove the second equivalence of (2.57), we now show [x] ≠ [y] ⇔ [x] ∩ [y] = ∅: If
[x] ∩ [y] = ∅, then [x] = [y] could only hold for [x] = [y] = ∅. However, x ∈ [x] and
y ∈ [y], showing [x] ≠ [y]. For the converse, we argue via contraposition and assume
z ∈ [x] ∩ [y]. Then x ∼ z and y ∼ z and, by symmetry and transitivity of ∼, x ∼ y and
[x] = [y], proving the second equivalence of (2.57). It remains to verify (2.58). From
(2.57), we know the elements of A/ ∼ to be disjoint. On the other hand, if x ∈ A, then
x ∈ [x] ∈ A/ ∼, showing A to be the union of the Ai , thereby proving (2.58) and the
theorem.
(b) The relation R defined in (2.45) is an equivalence relation on N. Here, R yields pre-
cisely two equivalence classes, one consisting of all even numbers and one consisting
of all odd numbers.
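The decomposition of Th. 2.33 can be made tangible by computing the classes of (2.45) on an initial segment of N; a small Python sketch (illustration only):

```python
def equivalence_classes(A, related):
    """Collect the distinct classes [x] = {y in A : x ~ y}."""
    classes = []
    for x in A:
        cls = frozenset(y for y in A if related(x, y))
        if cls not in classes:
            classes.append(cls)
    return classes

A = range(1, 11)
same_parity = lambda x, y: x % 2 == y % 2   # the relation R from (2.45)

classes = equivalence_classes(A, same_parity)
print(len(classes))                          # 2: the even and the odd numbers
print(sorted(sorted(c) for c in classes))    # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
```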
(b) Having constructed the set of integers Z in (a), in a next step, one can perform a
similar construction to obtain the set of rational numbers Q. One defines Q as the
¹In (2.59), we employ the usual addition on N. Its mathematically precise definition is actually
somewhat tedious and, at this stage, we do not even have all the necessary prerequisites in place, yet.
Still, it should be possible to follow the present example, using one’s intuitive, informal understanding
of the addition on N0. The precise definition can be found in [Phi16, Sec. D.1] and interested readers
might want to study the definition in [Phi16, Sec. D.1] once we have introduced the notions of induction
and recursion.
Noting that (2.63) is precisely the same as (2.59) if + is replaced by ·, the proof
from (a) also shows that (2.63) does, indeed, define an equivalence relation on
Z × (Z \ {0}): One merely replaces each + with · and each N0 with Z or Z \ {0},
respectively. The only modification needed occurs for 0 ∈ {a, c, e} in the proof of
transitivity (in this case, the proof of (a) yields adcf = 0 = bcde, which does not
imply af = be), where one now argues, for a = 0,
for c = 0,
and, for e = 0,
While the set of real numbers R can now also be constructed from the set Q, making
use of the notions of equivalence relation and quotient set, the construction is more
complicated and it also makes use of additional notions from the field of Analysis.
Thus, this construction is not within the scope of this class and the interested reader is
referred to [Phi16, Sec. D.5].
Induction is based on the fact that N satisfies the so-called Peano axioms:
P2: There exists an injective map S : N −→ N \ {1}, called the successor function (for
each n ∈ N, S(n) is called the successor of n).
P3: If a subset A of N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A, then
∀_{A∈P(N)} (1 ∈ A ∧ S(A) ⊆ A ⇒ A = N).
Remark 3.1. In Def. 1.28, we had introduced the natural numbers N := {1, 2, 3, . . . }.
The successor function is S(n) = n + 1. In axiomatic set theory, one starts with the
Peano axioms and shows that the axioms of set theory allow the construction of a
set N which satisfies the Peano axioms. One then defines 2 := S(1), 3 := S(2), . . . ,
n + 1 := S(n). The interested reader can find more details in [Phi16, Sec. D.1].
Theorem 3.2 (Principle of Induction). Suppose, for each n ∈ N, φ(n) is a statement
(i.e. a predicate of n in the language of Def. 1.31). If (a) and (b) both hold, where
(a) φ(1) is true,
(b) ∀_{n∈N} (φ(n) ⇒ φ(n + 1)),
then φ(n) is true for every n ∈ N.
3 NATURAL NUMBERS, INDUCTION, AND THE SIZE OF SETS 45
Remark 3.3. To prove some φ(n) for each n ∈ N by induction according to Th. 3.2
consists of the following two steps:
(a) Prove the base case, i.e. prove that φ(1) is true.
(b) Perform the inductive step, i.e. prove that φ(n) (the induction hypothesis) implies
φ(n + 1).
Base Case (n = 1): 1 = (1 · 2)/2, i.e. φ(1) is true.
Induction Hypothesis: Assume φ(n), i.e. 1 + 2 + · · · + n = n(n + 1)/2 holds.
Induction Step: One computes
1 + 2 + · · · + n + (n + 1) = n(n + 1)/2 + n + 1 = (n(n + 1) + 2n + 2)/2 = (n² + 3n + 2)/2 = (n + 1)(n + 2)/2, (3.4)
where φ(n) was used in the first equality, i.e. φ(n + 1) holds and the induction is complete.
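The formula proved by induction above can, of course, also be sampled numerically; a one-line Python check (an illustration, the induction proof covers every n at once):

```python
# Confirm 1 + 2 + ... + n = n(n+1)/2 for the first 100 values of n.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("1 + 2 + ... + n = n(n+1)/2 holds for n = 1, ..., 100")
```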
Proof. If, for each n ∈ N, we use ψ(n) to denote ∀_{1≤m≤n} φ(m), then (3.5) is equivalent to
∀_{n∈N} (ψ(n) ⇒ ψ(n + 1)), i.e. to Th. 3.2(b) with φ replaced by ψ. Thus, Th. 3.2 implies
ψ(n) holds true for each n ∈ N, i.e. φ(n) holds true for each n ∈ N.
Corollary 3.6. Let I be an index set. Suppose, for each i ∈ I, φ(i) is a statement. If
there is a bijective map f : N −→ I and (a) and (b) both hold, where
(a) φ(f(1)) is true,
(b) ∀_{n∈N} (φ(f(n)) ⇒ φ(f(n + 1))),
then φ(i) is true for every i ∈ I.
Apart from providing a widely employable proof technique, the most important appli-
cation of Th. 3.2 is the possibility to define sequences (i.e. functions with domain N, cf.
Def. 2.15(b)) inductively, using so-called recursion:
(i) x1 = x.
(ii) ∀_{n∈N} xn+1 = fn(x1, . . . , xn).
Proof. To prove uniqueness, let (xn)_{n∈N} and (yn)_{n∈N} be sequences in A, both satisfying
(i) and (ii), i.e.
x1 = y1 = x and (3.6a)
∀_{n∈N} (xn+1 = fn(x1, . . . , xn) ∧ yn+1 = fn(y1, . . . , yn)). (3.6b)
We prove by induction (in the form of Cor. 3.5) that (xn)_{n∈N} = (yn)_{n∈N}, i.e.
∀_{n∈N} xn = yn, (3.7)
where the statement xn = yn plays the role of φ(n):
F(1) = x, (3.9a)
∀_{n∈N} F(n + 1) = fn(F(1), . . . , F(n)). (3.9b)
and
G := ⋂_{B∈F} B. (3.11)
∀_{n∈N} ∃!_{xn∈A} (n, xn) ∈ G, (3.12)
where ∃!_{xn∈A} (n, xn) ∈ G plays the role of φ(n).
Base Case (n = 1): From the definition of G, we know (1, x) ∈ G. If (1, a) ∈ G with
a ≠ x, then H := G \ {(1, a)} ∈ F, implying G ⊆ H in contradiction to (1, a) ∉ H.
This shows a = x and proves φ(1).
Induction Hypothesis: Assume φ(m) for each m ∈ {1, . . . , n}.
Induction Step: From the induction hypothesis, we know
∃!_{(x1,...,xn)∈A^n} (1, x1), . . . , (n, xn) ∈ G.
i.e. we have
gn : A −→ A, gn (x) := x + an+1 . (3.15b)
In (3.15a), one can also use other symbols for i, except a and n; for a finite sequence,
n needs to be less than the maximal index of the finite sequence.
(c) Product Symbol: On A = R (or, more generally, on every set A, where a multiplica-
tion · : A × A −→ A is defined), define recursively, for each given (possibly finite)
sequence (a1 , a2 , . . . ) in A:
∏_{i=1}^{1} ai := a1, ∏_{i=1}^{n+1} ai := an+1 · ∏_{i=1}^{n} ai for n ≥ 1, (3.16a)
i.e. we have
gn : A −→ A, gn (x) := an+1 · x. (3.16b)
In (3.16a), one can also use other symbols for i, except a and n; for a finite sequence,
n needs to be less than the maximal index of the finite sequence.
More generally, if I is an index set and φ : {1, . . . , n} −→ I a bijective map, then
define
∏_{i∈I} ai := ∏_{i=1}^{n} aφ(i). (3.16c)
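The recursive definitions (3.15a) and (3.16a) translate directly into loops; a Python sketch (an illustration, not part of the formal development):

```python
def rec_sum(a):
    """sum of a_1, ..., a_n via (3.15a): start with a_1, then s -> s + a_{n+1}."""
    s = a[0]
    for x in a[1:]:
        s = s + x          # one recursion step g_n(s) = s + a_{n+1}
    return s

def rec_prod(a):
    """product of a_1, ..., a_n via (3.16a): start with a_1, then p -> a_{n+1} * p."""
    p = a[0]
    for x in a[1:]:
        p = x * p          # matches the order a_{n+1} * (previous product)
    return p

print(rec_sum([1, 2, 3, 4]), rec_prod([1, 2, 3, 4]))  # 10 24
```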
Example 3.9. As an (academic) example, where, in each step, the recursive definition
does depend on all previously computed values, consider the sequence (xn)_{n∈N}, defined
by
x1 := 1, ∀_{n∈N} xn+1 := (1/n) ∏_{i=1}^{n} xi,
i.e. by setting A := Q (the values xn are rational) and
fn : A^n −→ A, fn(x1, . . . , xn) := (1/n) ∏_{i=1}^{n} xi.
One obtains
x1 = 1, x2 = f1(1) = 1, x3 = f2(1, 1) = 1/2, x4 = f3(1, 1, 1/2) = 1/6,
x5 = f4(1, 1, 1/2, 1/6) = 1/48, x6 = f5(1, 1, 1/2, 1/6, 1/48) = 1/2880, . . .
In the above recursive definitions, we have always explicitly specified A and the gn or
fn . However, in the literature as well as in the rest of this class, most of the time, the
gn or fn are not provided explicitly.
Definition 3.10. (a) The sets A, B are defined to have the same cardinality or the
same size if, and only if, there exists a bijective map ϕ : A −→ B. According to
Th. 3.11 below, this defines an equivalence relation on every set of sets.
(b) The cardinality of a set A is n ∈ N (denoted #A = n) if, and only if, there exists
a bijective map ϕ : A −→ {1, . . . , n}. The cardinality of ∅ is defined as 0, i.e.
#∅ := 0. A set A is called finite if, and only if, there exists n ∈ N0 such that
#A = n; A is called infinite if, and only if, A is not finite, denoted #A = ∞ (in the
strict sense, this is an abuse of notation, since ∞ is not a cardinality – for example
#N = ∞ and #P(N) = ∞, but N and P(N) do not have the same cardinality, since
the power set P(A) is always strictly bigger than A (see Th. 3.16 below) – #A = ∞
is merely an abbreviation for the statement “A is infinite”). The interested student
finds additional material regarding characterizations of infinite sets in Th. A.54 of
the Appendix.
(c) The set A is called countable if, and only if, A is finite or A has the same cardinality
as N. Otherwise, A is called uncountable.
Proof. According to Def. 2.32, we have to prove that ∼ is reflexive, symmetric, and
transitive. According to Def. 3.10(a), A ∼ B holds for A, B ∈ M if, and only if, there
exists a bijective map f : A −→ B. Thus, since the identity Id : A −→ A is bijective,
A ∼ A, showing ∼ is reflexive. If A ∼ B, then there exists a bijective map f : A −→ B,
and f −1 is a bijective map f −1 : B −→ A, showing B ∼ A and that ∼ is symmetric.
If A ∼ B and B ∼ C, then there are bijective maps f : A −→ B and g : B −→ C.
Then, according to Th. 2.14, the composition (g ◦ f ) : A −→ C is also bijective, proving
A ∼ C and that ∼ is transitive.
(i) The sets A and B have the same cardinality (i.e. there exists a bijective map
φ : A −→ B).
(ii) There exist injective maps f : A −→ B and g : B −→ A.
We will base the proof of the Schröder-Bernstein theorem on the following lemma (for
an alternative proof, see [Phi16, Th. A.56]):
Lemma 3.13. Let A be a set. Consider P(A) to be endowed with the partial order
given by set inclusion, i.e., for each X, Y ∈ P(A), X ≤ Y if, and only if, X ⊆ Y . If
F : P(A) −→ P(A) is isotone with respect to that order, then F has a fixed point, i.e.
F (X0 ) = X0 for some X0 ∈ P(A).
Proof. Define
𝒜 := {X ∈ P(A) : F(X) ⊆ X}, X0 := ⋂_{X∈𝒜} X.
Proof of Th. 3.12. (i) trivially implies (ii), as one can simply set f := φ and g := φ⁻¹.
It remains to show (ii) implies (i). Thus, let f : A −→ B and g : B −→ A be injective.
To apply Lem. 3.13, define
F : P(A) −→ P(A), F(X) := A \ g(B \ f(X)),
and note
X ⊆ Y ⊆ A ⇒ f(X) ⊆ f(Y) ⇒ B \ f(Y) ⊆ B \ f(X) ⇒ g(B \ f(Y)) ⊆ g(B \ f(X)) ⇒ F(X) ⊆ F(Y).
Thus, by Lem. 3.13, F has a fixed point X0. We claim that a bijection is obtained via
setting
φ : A −→ B, φ(x) := f(x) for x ∈ X0, φ(x) := g⁻¹(x) for x ∉ X0.
First, φ is well-defined, since x ∉ X0 = F(X0) implies x ∈ g(B \ f(X0)). To verify
that φ is injective, let x, y ∈ A, x ≠ y. If x, y ∈ X0, then φ(x) ≠ φ(y), as f is
injective. If x, y ∈ A \ X0, then φ(x) ≠ φ(y), as g⁻¹ is well-defined. If x ∈ X0 and
y ∉ X0, then φ(x) ∈ f(X0) and φ(y) ∈ B \ f(X0), once again, implying φ(x) ≠ φ(y). It
remains to prove surjectivity. If b ∈ f(X0), then φ(f⁻¹(b)) = b. If b ∈ B \ f(X0), then
g(b) ∉ X0 = F(X0), i.e. φ(g(b)) = b, showing φ to be surjective.
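On finite sets, the construction of the proof can actually be executed: iterate F starting from ∅ until a fixed point X0 is reached, then glue f on X0 with g⁻¹ off X0. The following Python sketch does exactly that; the concrete injections f and g are arbitrary choices for illustration (the interesting applications of the theorem are, of course, the infinite ones):

```python
A = {0, 1, 2, 3}
B = {"a", "b", "c", "d"}
f = {0: "a", 1: "b", 2: "c", 3: "d"}   # an injective map A -> B
g = {"a": 1, "b": 2, "c": 3, "d": 0}   # an injective map B -> A

def F(X):
    """F(X) = A \\ g(B \\ f(X)) as in the proof of Th. 3.12."""
    fX = {f[x] for x in X}
    return A - {g[b] for b in B - fX}

X0 = set()
while F(X0) != X0:       # iterate the isotone F to a fixed point
    X0 = F(X0)

g_inv = {v: k for k, v in g.items()}
phi = {x: (f[x] if x in X0 else g_inv[x]) for x in A}
print(sorted(phi.values()) == sorted(B), len(set(phi.values())) == len(A))  # True True
```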
Theorem 3.14. Let A, B be nonempty sets. Then the following statements are equiva-
lent (where the implication “(ii) ⇒ (i)” makes use of the axiom of choice (AC)).
Corollary 3.15. Let A, B be nonempty sets. Using AC, we can expand the two equiv-
alent statements of Th. 3.12 to the following list of equivalent statements:
(i) The sets A and B have the same cardinality (i.e. there exists a bijective map
φ : A −→ B).
Proof. The equivalences are an immediate consequence of combining Th. 3.12 with Th.
3.14.
Theorem 3.16. Let A be a set. There can never exist a surjective map from A onto
P(A) (in this sense, the size of P(A) is always strictly bigger than the size of A; in
particular, A and P(A) can never have the same size).
Proof. Given an arbitrary map f : A −→ P(A), define
B := {x ∈ A : x ∉ f(x)}.
Now B is a subset of A, i.e. B ∈ P(A), and the assumption that f is surjective implies
the existence of a ∈ A such that f(a) = B. If a ∈ B, then a ∉ f(a) = B, i.e. a ∈ B
implies a ∈ B ∧ ¬(a ∈ B), so that the principle of contradiction tells us a ∉ B must be
true. However, a ∉ B implies a ∈ f(a) = B, i.e., this time, the principle of contradiction
tells us a ∈ B must be true. In conclusion, we have shown our original assumption that
there exists a surjective map f : A −→ P(A) implies a ∈ B ∧ ¬(a ∈ B), i.e., according
to the principle of contradiction, no surjective map from A into P(A) can exist.
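Cantor's diagonal set is easy to form concretely: for any attempted map f : A −→ P(A), the set B is missed by f. A Python sketch with an arbitrarily chosen f on a three-element set (illustration only):

```python
A = {0, 1, 2}

# An arbitrary attempt at a surjection f: A -> P(A), given by a dict of sets.
f = {0: {1, 2}, 1: set(), 2: {0, 1, 2}}

# Cantor's diagonal set B = {x in A : x not in f(x)} ...
B = {x for x in A if x not in f[x]}
print(B)                          # {0, 1}
# ... is never a value of f:
print(any(f[x] == B for x in A))  # False
```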
While many results concerning cardinalities of finite sets seem intuitively clear, as a
mathematician, one still has to provide a rigorous proof. Providing such proofs in the
present section also provides us with the opportunity to see more examples of induction
proofs. We begin by showing finite cardinalities to be uniquely determined. The key is
the following theorem:
Then g is bijective, since it is the composition g = h ◦ f of the bijective map f with the
bijective map
Thus, the restriction g↾{1,...,m−1} : {1, . . . , m−1} −→ {1, . . . , n−1} must also be bijective,
such that the induction hypothesis yields m − 1 = n − 1, which, in turn, implies m = n
as desired.
Proof. For #A = n ∈ N, we use induction to prove (a) and (b) simultaneously, i.e. we
show
∀_{n∈N} (#A = n ⇒ ∀_{B∈P(A)\{A}} ∀_{a∈A} (#B ∈ {0, . . . , n − 1} ∧ #(A \ {a}) = n − 1)),
where the implication plays the role of φ(n).
Base Case (n = 1): In this case, A has precisely one element, i.e. B = A \ {a} = ∅, and
#∅ = 0 = n − 1 proves φ(1).
Induction Step: For the induction hypothesis, we assume φ(n) to be true, i.e. we assume
(a) and (b) hold for each A with #A = n. We have to prove φ(n + 1), i.e., we consider
A with #A = n + 1. From #A = n + 1, we conclude the existence of a bijective map ϕ :
A −→ {1, . . . , n + 1}. We have to construct a bijective map ψ : A \ {a} −→ {1, . . . , n}.
To this end, set k := ϕ(a) and define the auxiliary function
f : {1, . . . , n + 1} −→ {1, . . . , n + 1}, f(x) := n + 1 for x = k, k for x = n + 1, x for x ∉ {k, n + 1}.
Then f ◦ ϕ : A −→ {1, . . . , n + 1} is bijective by Th. 2.14, and
(f ◦ ϕ)(a) = f(ϕ(a)) = f(k) = n + 1.
Thus, the restriction ψ := (f ◦ ϕ)↾_{A\{a}} is the desired bijective map ψ : A \ {a} −→
{1, . . . , n}, proving #(A \ {a}) = n. It remains to consider the strict subset B of A.
Since B is a strict subset of A, there exists a ∈ A \ B. Thus, B ⊆ A \ {a} and, as
we have already shown #(A \ {a}) = n, the induction hypothesis applies and yields B
is finite with #B ≤ #(A \ {a}) = n, i.e. #B ∈ {0, . . . , n}, proving φ(n + 1), thereby
completing the induction.
completing the induction.
Theorem 3.20. For #A = #B = n ∈ N and f : A −→ B, the following statements
are equivalent:
(i) f is injective.
(ii) f is surjective.
(iii) f is bijective.
Proof. Exercise.
Lemma 3.21. For each finite set A (i.e. #A = n ∈ N0 ) and each B ⊆ A, one has
#(A \ B) = #A − #B.
Base Case (m = 1): φ(1) is precisely the statement provided by Th. 3.19(b).
Induction Step: For the induction hypothesis, we assume φ(m) with 1 ≤ m < n. To
prove φ(m + 1), consider B ⊆ A with #B = m + 1. Fix an element b ∈ B and set
B1 := B \ {b}. Then #B1 = m by Th. 3.19(b), A \ B = (A \ B1) \ {b}, and we compute,
using Th. 3.19(b) in the second equality and φ(m) in the third,
#(A \ B) = #((A \ B1) \ {b}) = #(A \ B1) − 1 = #A − #B1 − 1 = #A − #B,
thereby completing the induction.
Proof. The assertion is clearly true if A or B is empty. If A and B are nonempty, then
there exist m, n ∈ N such that #A = m and #B = n, i.e. there are bijective maps
f : A −→ {1, . . . , m} and g : B −→ {1, . . . , n}.
We first consider the case A∩B = ∅. We need to construct a bijective map h : A∪B −→
{1, . . . , m + n}. To this end, we define
h : A ∪ B −→ {1, . . . , m + n}, h(x) := f(x) for x ∈ A, g(x) + m for x ∈ B.
Proof. If at least one Ai is empty, then (3.17) is true, since both sides are 0.
The case where all Ai are nonempty is proved by induction over n, i.e. we know ki :=
#Ai ∈ N for each i ∈ {1, . . . , n} and show by induction
∀_{n∈N} #∏_{i=1}^{n} Ai = ∏_{i=1}^{n} ki,
where the equality plays the role of φ(n).
Base Case (n = 1): #∏_{i=1}^{1} Ai = #A1 = k1 = ∏_{i=1}^{1} ki, i.e. φ(1) holds.
Induction Step: From the induction hypothesis φ(n), we obtain a bijective map ϕ :
A −→ {1, . . . , N}, where A := ∏_{i=1}^{n} Ai and N := ∏_{i=1}^{n} ki. To prove φ(n + 1), we need
to construct a bijective map h : A × An+1 −→ {1, . . . , N · kn+1}. Since #An+1 = kn+1,
there exists a bijective map f : An+1 −→ {1, . . . , kn+1}. We define
h : A × An+1 −→ {1, . . . , N · kn+1}, h(x, y) := (f(y) − 1) · N + ϕ(x).
Since ϕ and f are bijective, and since every m ∈ {1, . . . , N · kn+1 } has a unique repre-
sentation in the form m = a · N + r with a ∈ {0, . . . , kn+1 − 1} and r ∈ {1, . . . , N } (see
Th. D.1 in the Appendix), h is also bijective. This proves φ(n + 1) and completes the
induction.
Theorem 3.24. For each finite set A with #A = n ∈ N0, one has #P(A) = 2^n.
Base Case (n = 0): For n = 0, we have A = ∅, i.e. P(A) = {∅}. Thus, #P(A) = 1 = 2^0,
proving φ(0).
Induction Step: Assume φ(n) and consider A with #A = n + 1. Then A contains
at least one element a. For B := A \ {a}, we then know #B = n from Th. 3.19(b).
Moreover, setting M := {C ∪ {a} : C ∈ P(B)}, we have the disjoint decomposition
P(A) = P(B) ∪̇ M. As the map ϕ : P(B) −→ M, ϕ(C) := C ∪ {a}, is clearly bijective,
P(B) and M have the same cardinality. Thus,
#P(A) = #P(B) + #M = #P(B) + #P(B) = 2 · 2^n = 2^{n+1},
where the first equality uses Th. 3.22 and the third uses φ(n), completing the induction.
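The count #P(A) = 2^n is quickly confirmed for small n; a Python sketch using the standard library (illustration only):

```python
from itertools import combinations

def powerset(A):
    """All subsets of A, enumerated by subset size."""
    A = list(A)
    return [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]

print([len(powerset(range(n))) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```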
In this section, we present a number of important results regarding the natural numbers
and countability.
Theorem 3.25. (a) Every nonempty finite subset of a totally ordered set has a mini-
mum and a maximum.
(b) Every nonempty subset of N has a minimum.
Proof. Exercise.
Proposition 3.26. Every subset A of N is countable.
Proof. Since ∅ is countable, we may assume A ≠ ∅. From Th. 3.25(b), we know that
every nonempty subset of N has a min. We recursively define a sequence in A by
a1 := min A, an+1 := min An if An := A \ {ai : 1 ≤ i ≤ n} ≠ ∅, an+1 := an if An = ∅.
(i) A is countable.
(ii) There exists an injective map f : A −→ N.
(iii) There exists a surjective map g : N −→ A.
Proof. Directly from the definition of countable in Def. 3.10(c), one obtains (i)⇒(ii) and
(i)⇒(iii). To prove (ii)⇒(i), let f : A −→ N be injective. Then f : A −→ f (A) is
bijective, and, since f (A) ⊆ N, f (A) is countable by Prop. 3.26, proving A is countable
as well. To prove (iii)⇒(i), let g : N −→ A be surjective. Then g has a right inverse
f : A −→ N. One can obtain this from Th. 2.13(a), but, here, we can actually construct
f without the axiom of choice: For a ∈ A, let f (a) := min g −1 ({a}) (recall Th. 3.25(b)).
Then, clearly, g ◦ f = IdA . But this means g is a left inverse for f , showing f is injective
according to Th. 2.13(b). Then A is countable by an application of (ii).
Theorem 3.28. If (A1, . . . , An), n ∈ N, is a finite family of countable sets, then ∏_{i=1}^{n} Ai
is countable.
Proof. We first consider the special case n = 2 with A1 = A2 = N and show that the map
ϕ : N × N −→ N, ϕ(m, n) := 2^m · 3^n,
is injective, which is immediate from the uniqueness of prime factorizations.
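The injectivity of ϕ can also be sampled by brute force on a finite grid; a Python sketch (a finite check only, the actual argument is the uniqueness of prime factorizations):

```python
# phi(m, n) = 2^m * 3^n: no two pairs on a 20 x 20 grid collide.
phi = lambda m, n: 2 ** m * 3 ** n
values = [phi(m, n) for m in range(1, 21) for n in range(1, 21)]
print(len(values), len(set(values)))  # 400 400
```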
Proof. It suffices to consider the case that all Ai are nonempty. Moreover, according to
Prop. 3.27(iii), it suffices to construct a surjective map ϕ : N −→ A. Also according
to Prop. 3.27(iii), the countability of I and the Ai provides us with surjective maps
f : N −→ I and gi : N −→ Ai (here AC is used to select each gi from the set of all
surjective maps from N onto Ai ). Define
Remark 3.30. The axiom of choice is, indeed, essential for the proof of Th. 3.29. It
is shown in [Jec73, Th. 10.6] that it is consistent with the axioms of ZF (i.e. with the
axioms of Sec. A.3 of the Appendix) that, e.g., the uncountable sets P(N) and R (cf.
[Phi16, Th. F.2]) are countable unions of countable sets.
◦ : A × A −→ A, (x, y) ↦ x ◦ y, (4.1)
is called a composition on A; the pair (A, ◦) is then called a magma. It is also common
to call a composition a multiplication and to write x · y or just xy instead of x ◦ y. In
this situation, the product symbol ∏ is defined as in Ex. 3.8(c). If the composition is
called an addition and x + y is written instead of x ◦ y, then it is assumed that it is
commutative, that means it satisfies the law of Def. 4.2(a) below (a multiplication might
or might not satisfy commutativity). For an addition, the summation symbol ∑ is
defined as in Ex. 3.8(b).
4 BASIC ALGEBRAIC STRUCTURES 61
(a) ◦ is called commutative or abelian if, and only if, x ◦ y = y ◦ x holds for all x, y ∈ A.
(b) ◦ is called associative if, and only if, x ◦ (y ◦ z) = (x ◦ y) ◦ z holds for all x, y, z ∈ A.
(c) An element e ∈ A is called left neutral (resp. right neutral) if, and only if,
e ◦ x = x (resp. x ◦ e = x) holds for each x ∈ A.
e ∈ A is called neutral if, and only if, it is both left neutral and right neutral.
If associativity holds for each triple of elements, then it also holds for each finite tuple
of elements; if the composition is associative and commutativity holds for each pair of
elements, then it also holds for each finite tuple of elements (see Th. B.3 and Th. B.4
in the Appendix).
Lemma 4.3. Let (A, ◦) be a magma with l, r ∈ A. If l is left neutral and r is right
neutral, then l = r (in particular, A contains at most one neutral element).
x ◦ x = e. (4.3)
Before considering examples of magmas (A, ◦) in Ex. 4.9 below, we prove some first
rules and introduce some more notation.
Theorem 4.6. Let (G, ◦) be a group with right neutral element e ∈ G. Then the
following statements and rules are valid in G:
(b) If x, a ∈ G, then x ◦ a = e if, and only if, a ◦ x = e (i.e. an element is right inverse
if, and only if, it is left inverse). Moreover, inverse elements are unique (for each
x ∈ G, the unique inverse is then denoted by x−1 ).
x ◦ a = y ◦ a ⇒ x = y, (4.4a)
a ◦ x = a ◦ y ⇒ x = y. (4.4b)
Proof. (a): Let x ∈ G. By Def. 4.5(b), there exists y ∈ G such that x ◦ y = e and, in
turn, z ∈ G such that y ◦ z = e. Thus, as e is right neutral,
e ◦ z = (x ◦ y) ◦ z = x ◦ (y ◦ z) = x ◦ e = x,
implying
x = e ◦ z = (e ◦ e) ◦ z = e ◦ (e ◦ z) = e ◦ x,
showing e to be left neutral as well.
(b): Given a ∈ G, choose b ∈ G such that a ◦ b = e. If x ◦ a = e, then
x = x ◦ e = x ◦ (a ◦ b) = (x ◦ a) ◦ b = e ◦ b = b,
implying a ◦ x = a ◦ b = e.
Interchanging the roles of x and a now proves the converse. To prove uniqueness, let
a, b be inverses to x. Then a = a ◦ e = a ◦ x ◦ b = e ◦ b = b.
(c): x−1 ◦ x = e holds according to (b) and shows that x is the inverse to x−1 . Thus,
(x−1 )−1 = x as claimed.
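For small finite structures, the group axioms of Th. 4.6 and Lemma 4.3 can be verified mechanically. The following illustrative Python sketch (the function name `is_group` and the examples are ours, not part of the text) brute-forces associativity, the existence of a neutral element, and the existence of inverses over a finite carrier set:

```python
def is_group(elems, op):
    """Brute-force check of the group axioms on a finite carrier set."""
    # Associativity: x ∘ (y ∘ z) == (x ∘ y) ∘ z for all triples.
    assoc = all(op(x, op(y, z)) == op(op(x, y), z)
                for x in elems for y in elems for z in elems)
    # Neutral elements: e with e ∘ x == x ∘ e == x for all x.
    neutrals = [e for e in elems
                if all(op(e, x) == x and op(x, e) == x for x in elems)]
    if not (assoc and neutrals):
        return False
    e = neutrals[0]
    # Inverses: every x has some y with x ∘ y == e.
    return all(any(op(x, y) == e for y in elems) for x in elems)

Z3 = [0, 1, 2]
add_mod3 = lambda x, y: (x + y) % 3
print(is_group(Z3, add_mod3))                 # True
print(is_group(Z3, lambda x, y: min(x, y)))   # False: 2 is neutral, but 0 has no inverse
```

Note that `is_group` only searches for a two-sided neutral element: by Lemma 4.3 a structure can have at most one, so taking `neutrals[0]` is safe.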
Notation 4.7. Exponentiation with Integer Exponents: Let (A, ·) be a magma with neutral element 1 ∈ A. Define recursively for each x ∈ A and each n ∈ N0:
x^0 := 1, x^{n+1} := x · x^n. (4.5a)
Moreover, if (A, ·) constitutes a group, then also define for each x ∈ A and each n ∈ N:
x^{−n} := (x^{−1})^n. (4.5b)
Theorem 4.8. Exponentiation Rules: Let (G, ·) be a semigroup with neutral element 1 ∈ G. Let x, y ∈ G. Then the following rules hold for each m, n ∈ N0. If (G, ·) is a group, then the rules even hold for every m, n ∈ Z.
(a) x^{m+n} = x^m · x^n.
(b) (x^m)^n = x^{mn}.
(c) If x · y = y · x, then (xy)^n = x^n y^n.
Proof. (a): First, we fix n ∈ N0 and prove the statement for each m ∈ N0 by induction: The base case (m = 0) is x^n = x^n, which is true. For the induction step, we compute, using (4.5a) and the induction hypothesis,
x^{m+1+n} = x · x^{m+n} = x · x^m · x^n = x^{m+1} · x^n,
completing the induction.
(b): First, we prove the statement for each n ∈ N0 by induction (for m < 0, we assume G to be a group): The base case (n = 0) is (x^m)^0 = 1 = x^0, which is true. For the induction step, we compute, using (4.5a), the induction hypothesis, and (a),
(x^m)^{n+1} = x^m · (x^m)^n = x^m · x^{mn} = x^{mn+m} = x^{m(n+1)},
completing the induction step. Now, let G be a group and n < 0. We already know (x^m)^{−1} = x^{−m}. Thus, using (4.5b) and what we have already shown,
(x^m)^n = ((x^m)^{−1})^{−n} = (x^{−m})^{−n} = x^{(−m)(−n)} = x^{mn}.
(c): For each n ∈ N0, the statement follows by an induction analogous to the ones in (a) and (b), using x · y = y · x in the induction step; this completes the induction. If G is a group and n < 0, then, using (4.5b), Th. 4.6(d) (together with commutativity), and what we have already shown,
x^n y^n = (x^{−1})^{−n} (y^{−1})^{−n} = (x^{−1} y^{−1})^{−n} = ((xy)^{−1})^{−n} = (xy)^n.
(b) Let M be a set and let SM denote the set of all bijective maps f : M −→ M, where the elements of SM are also called permutations on M. Then (SM, ◦), where ◦ is given by composition of maps according to Def. 2.8, forms a group, the so-called symmetric group or permutation group on M. We verify (SM, ◦) to be a group: ◦ is well-defined on SM, since the composition of bijective maps is bijective by Th. 2.14. Moreover, ◦ is associative by Th. 2.10(a). The neutral element is given by the identity map Id_M, and the inverse element of each f ∈ SM is given by the inverse map f^{−1} (which exists, as f is bijective).
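For finite M, all defining properties of the symmetric group can be checked exhaustively. An illustrative Python sketch for M = {0, 1, 2}, encoding each permutation as a tuple (our encoding, not from the text):

```python
from itertools import permutations

def compose(f, g):
    """(f ∘ g)(x) = f(g(x)) for permutations given as tuples on {0, ..., k-1}."""
    return tuple(f[g[x]] for x in range(len(g)))

S3 = list(permutations(range(3)))
identity = (0, 1, 2)

# Closure: the composition of bijections is a bijection (cf. Th. 2.14).
assert all(compose(f, g) in S3 for f in S3 for g in S3)
# Associativity (cf. Th. 2.10(a)).
assert all(compose(f, compose(g, h)) == compose(compose(f, g), h)
           for f in S3 for g in S3 for h in S3)
# Identity is neutral and every permutation has an inverse.
assert all(compose(f, identity) == f == compose(identity, f) for f in S3)
assert all(any(compose(f, g) == identity for g in S3) for f in S3)
print(len(S3))  # 6
```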
(c) (N0 , +) and (N0 , ·), where + and · denote the usual addition and multiplication on
N0 , respectively (see [Phi16, Def. D.1] for a definition via recursion), form commu-
tative semigroups (see [Phi16, Th. D.2]), but not groups, due to the lack of inverse
elements. From N0 , one can then construct the sets of integers Z (see Ex. 2.36(a)),
rational numbers Q (see Ex. 2.36(b)), real numbers R (see [Phi16, Def. D.32]), and
complex numbers C (see [Phi16, Def. 5.1]). One can extend + and · to each of these
sets (see Th. 4.15, Ex. 4.33, and Ex. 4.39 below, [Phi16, Def. D.32], and [Phi16,
Def. 5.1]) and show that (Z, +), (Q, +), (R, +), (C, +) are commutative groups (see
Th. 4.15, Ex. 4.33(b), and Ex. 4.39 below, [Phi16, Th. D.35], and [Phi16, Th. 5.2])
as well as (Q \ {0}, ·), (R \ {0}, ·), (C \ {0}, ·) (see Ex. 4.39 below, [Phi16, Th. D.35],
and [Phi16, Th. 5.2]).
(d) Let (A, ◦) be a magma. Then ◦ can be lifted to the power set P(A) of A by defining
B ◦ C := {b ◦ c : b ∈ B ∧ c ∈ C} ⊆ A for each B, C ∈ P(A). (4.8)
The case where B or C is a singleton set is of particular interest, and one then uses the following simplified notation:
a ◦ B := {a} ◦ B, B ◦ a := B ◦ {a} for each B ∈ P(A) and each a ∈ A. (4.9)
Sets of the form a ◦ B are called left cosets, sets of the form B ◦ a are called right
cosets. If (A, ◦) is commutative, then B ◦ C = C ◦ B for each B, C ∈ P(A), since
a = b ◦ c with b ∈ B, c ∈ C if, and only if, a = c ◦ b. In the same manner, one sees
that, if ◦ is associative on A, then it is also associative on P(A). If e ∈ A is left
(resp. right) neutral, then, clearly, {e} is left (resp. right) neutral for ◦ on P(A). In
general, one cannot expect (P(A), ◦) to be a group, even if (A, ◦) is a group, due to the lack of inverse elements: For example, while (Z, +) is a group, (P(Z), +) is not, since B := {0, 1} has no inverse: If C were inverse to B, then −1 ∈ C, implying −1 = 0 + (−1) ∈ B + C, i.e. B + C ≠ {0}, in contradiction to C being
inverse to B.
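The subset composition (4.8) and the failure of inverses in (P(Z), +) can be observed directly; an illustrative Python sketch using finite subsets of Z (the name `set_add` is ours):

```python
def set_add(B, C):
    """B + C = {b + c : b in B, c in C}, the lifted composition of (4.8)."""
    return {b + c for b in B for c in C}

B = {0, 1}
# {0} is the neutral element of (P(Z), +); an inverse C of B would need
# B + C == {0}.  But whatever C contains, B + C contains two distinct sums:
print(set_add(B, {0}))      # {0, 1} — not {0}
print(set_add(B, {-1, 0}))  # {-1, 0, 1}
```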
(e) Let (A, ·) be a magma and let M be a set. We consider the set F(M, A) = A^M of functions mapping M into A. We make A^M into a magma by defining · pointwise on A^M: Define
(f · g)(x) := f(x) · g(x) for each f, g ∈ A^M and each x ∈ M. (4.10)
We then have:
(A, ·) associative ⇒ (A^M, ·) associative, (4.11a)
(A, ·) commutative ⇒ (A^M, ·) commutative, (4.11b)
e neutral for (A, ·) ⇒ the constant map f ≡ e neutral for (A^M, ·), (4.11c)
(A, ·) group ⇒ (A^M, ·) group, (4.11d)
where (4.11a) – (4.11c) are immediate from (4.10). To verify (4.11d), let (A, ·) be
a group with e ∈ A being neutral. If f ∈ AM , then let
g : M −→ A, g(x) := (f (x))−1 .
Then
(f · g)(x) = f(x) · (f(x))^{−1} = e for each x ∈ M,
showing f · g to equal the constant map with value e, i.e. g is inverse to f, proving (4.11d).
Perhaps the most important notion when studying structures is that of the composition-respecting map. The technical term for such a map is homomorphism, which we define next:
Definition 4.10. Let (A, ◦) and (B, ◦) be magmas (caveat: to simplify notation, we denote both compositions by ◦; however, they might even denote different compositions on the same set A = B). A map φ : A −→ B is called homomorphism if, and only if,
φ(x ◦ y) = φ(x) ◦ φ(y) for each x, y ∈ A.
A homomorphism is called monomorphism if, and only if, it is injective; epimorphism if, and only if, it is surjective; isomorphism if, and only if, it is bijective; endomorphism if, and only if, (A, ◦) = (B, ◦); automorphism if, and only if, it is both endomorphism and isomorphism. Moreover, (A, ◦) and (B, ◦) are called isomorphic (denoted (A, ◦) ≅ (B, ◦)) if, and only if, there exists an isomorphism φ : A −→ B. If A and B have neutral elements e ∈ A, e′ ∈ B, then a homomorphism φ : A −→ B is called unital if, and only if, φ(e) = e′.
—
If (A, ◦) and (B, ◦) are isomorphic, then, from an algebraic perspective, the two struc-
tures are the same, except that the elements of the underlying set have been “renamed”
(where the isomorphism provides the assignment rule for obtaining the new names).
(e) If (A, ◦), (B, ◦) are groups and φ : A −→ B is a homomorphism, then φ is unital
and φ(a−1 ) = (φ(a))−1 for each a ∈ A.
Since φ(e) ◦ φ(e) = φ(e ◦ e) = φ(e) = φ(e) ◦ e′, applying (φ(e))^{−1} to both sides of this equality proves φ(e) = e′. Moreover, for each a ∈ A,
φ(a−1 ) ◦ φ(a) = φ(a−1 ◦ a) = φ(e) = e′ ,
proving (e).
Example 4.12. (a) If (A, ◦) is a magma, then it is immediate that the identity Id :
A −→ A is an automorphism.
(b) If (A, ◦), (B, ◦) are magmas, where e ∈ B is neutral, then the constant map φ ≡ e is a homomorphism: Indeed, for each x, y ∈ A, φ(x ◦ y) = e = e ◦ e = φ(x) ◦ φ(y).
(c) If (A, ·) is a semigroup with neutral element 1 ∈ A, then, for each fixed a ∈ A,
φa : N0 −→ A, φa(n) := a^n, (4.13)
is a homomorphism from (N0 , +) into (A, ·); if (A, ·) is a group, we may extend φa
to Z and it becomes a homomorphism from (Z, +) into (A, ·): These statements are
immediate from Th. 4.8(a). Note that, if A = N0 , Z, Q, R, C and (A, ·) = (A, +),
then φa (n) = na.
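The homomorphism property of φa from Th. 4.8(a) can be spot-checked numerically; an illustrative Python sketch using the multiplicative group of nonzero rationals (the helper name `phi` is ours):

```python
from fractions import Fraction

def phi(a, n):
    """a^n for integer n, following Notation 4.7: recursion for n >= 0,
    inverse powers via (4.5b) for n < 0."""
    if n >= 0:
        result = Fraction(1)
        for _ in range(n):
            result *= a
        return result
    return 1 / phi(a, -n)

a = Fraction(2, 3)
# Homomorphism property φ_a(m + n) = φ_a(m) · φ_a(n) from (Z, +) to (Q \ {0}, ·):
assert all(phi(a, m + n) == phi(a, m) * phi(a, n)
           for m in range(-4, 5) for n in range(-4, 5))
print(phi(a, -2))  # 9/4
```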
For a commutative group (G, ·) and each k ∈ Z, the map
φk : G −→ G, φk(a) := a^k, (4.14)
is an endomorphism of G (immediate from Th. 4.8(c)).
Example 4.14. (a) The most trivial groups are the group with one element and the
group with two elements, which can be defined by the following Cayley tables (where
e, a are distinct elements):
◦ e                ◦ e a
e e        ,       e e a
                   a a e
Both structures are abelian groups, which is immediate for the group with one
element and is easily verified for the group with two elements. We will see later
that they are isomorphic to the groups we will denote Z1 and Z2 (which, in light of
Prop. 4.11(c), shows again that the structure with two elements is a group).
(b) The following examples show that, in non-groups, left and right neutral elements
need not be unique: For a ≠ b, let

◦ a b              ◦ a b
a a b       ,      a a a
b a b              b b b
One easily verifies these structures to be associative, but they are not groups: The
magma on the left has no right neutral element, the magma on the right does not
have inverses to both a and b.
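The two magmas above can be encoded as Cayley tables and their left and right neutral elements enumerated by brute force; an illustrative Python sketch with a = 0, b = 1 (names are ours):

```python
def left_neutrals(elems, op):
    """All e with e ∘ x == x for every x."""
    return [e for e in elems if all(op[(e, x)] == x for x in elems)]

def right_neutrals(elems, op):
    """All e with x ∘ e == x for every x."""
    return [e for e in elems if all(op[(x, e)] == x for x in elems)]

E = [0, 1]
table_left  = {(x, y): y for x in E for y in E}  # the magma on the left: x ∘ y = y
table_right = {(x, y): x for x in E for y in E}  # the magma on the right: x ∘ y = x

print(left_neutrals(E, table_left),  right_neutrals(E, table_left))   # [0, 1] []
print(left_neutrals(E, table_right), right_neutrals(E, table_right))  # [] [0, 1]
```

Each magma has two distinct one-sided neutral elements and no two-sided one, so Lemma 4.3 does not apply and neither structure is a group.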
Theorem 4.15. Let (A, +) be an abelian semigroup with neutral element 0 ∈ A and assume (A, +) to satisfy the cancellation laws (4.4) (with ◦ replaced by +). Consider the quotient set
G := (A × A)/∼ = {[(a, b)] : (a, b) ∈ A × A}, (4.15)
where ∼ is the equivalence relation on A × A defined by
(a, b) ∼ (c, d) :⇔ a + d = b + c (4.16)
(cf. Ex. 2.36(a)). Introducing the simplified notation [a, b] := [(a, b)], we define addition and subtraction on G by
+ : G × G −→ G, ([a, b], [c, d]) ↦ [a, b] + [c, d] := [a + c, b + d], (4.17a)
− : G × G −→ G, ([a, b], [c, d]) ↦ [a, b] − [c, d] := [a, b] + [d, c]. (4.17b)
Then addition and subtraction on G are well-defined, (G, +) forms an abelian group,
where [0, 0] is the neutral element, [b, a] is the inverse element of [a, b] for each a, b ∈ A,
and, denoting the inverse element of [a, b] by −[a, b] in the usual way, [a, b] − [c, d] =
[a, b] + (−[c, d]) for each a, b, c, d ∈ A. Moreover, the map
ι : A −→ G, ι(a) := [a, 0],
is a monomorphism, where it is customary to identify A with ι(A). One then just writes
a instead of [a, 0] and −a instead of [0, a] = −[a, 0]. The most important application is
the case A = N0 , which yields that addition on Z = G forms an abelian group (cf. Ex.
2.36(a), [Phi16, Th. D.2], [Phi16, Th. D.7(b)]).
Proof. If one reexamines Ex. 2.36(a), replacing N0 by A and Z by G, then one sees it
shows ∼ to constitute an equivalence relation, where the proof makes use of commuta-
tivity, associativity, and the cancellation law. To verify + and − are well-defined on G,
we need to check the above definitions do not depend on the chosen representatives, i.e.
For each a, b, c, d, ã, b̃, c̃, d̃ ∈ A:
([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃]) ⇒ [a + c, b + d] = [ã + c̃, b̃ + d̃] (4.19a)
and
([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃]) ⇒ [a, b] − [c, d] = [ã, b̃] − [c̃, d̃]. (4.19b)
Indeed, [a, b] = [ã, b̃] means a + b̃ = b + ã, and [c, d] = [c̃, d̃] means c + d̃ = d + c̃, implying
a + c + b̃ + d̃ = b + d + ã + c̃, i.e. [a + c, b + d] = [ã + c̃, b̃ + d̃],
proving (4.19a). Now (4.19b)
is just (4.17b) combined with (4.19a). To verify commutativity and associativity on G, let a, b, c, d, e, f ∈ A. Then
[a, b] + [c, d] = [a + c, b + d] = [c + a, d + b] = [c, d] + [a, b],
([a, b] + [c, d]) + [e, f] = [(a + c) + e, (b + d) + f] = [a + (c + e), b + (d + f)] = [a, b] + ([c, d] + [e, f]),
proving commutativity and associativity. For every a, b ∈ A, one obtains [a, b] + [0, 0] = [a + 0, b + 0] = [a, b],
proving neutrality of [0, 0], whereas [a, b] + [b, a] = [a + b, b + a] = [a + b, a + b] = [0, 0]
(since (a + b, a + b) ∼ (0, 0)) shows [b, a] = −[a, b]. Now [a, b] − [c, d] = [a, b] + (−[c, d])
is immediate from (4.17b). The map ι is injective, as, for each a, b ∈ A, ι(a) = [a, 0] = ι(b) = [b, 0] implies a + 0 = 0 + b, i.e. a = b. The map ι is a homomorphism, as ι(a + b) = [a + b, 0] = [a, 0] + [b, 0] = ι(a) + ι(b) for each a, b ∈ A.
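The construction of Th. 4.15 can be made concrete for A = N0, where the class [a, b] plays the role of the integer a − b. An illustrative Python sketch, representing each class by a canonical representative with at least one component 0 (helper names are ours):

```python
def canon(a, b):
    """Canonical representative of the class [a, b]: one component is 0."""
    return (a - b, 0) if a >= b else (0, b - a)

def add(p, q):
    """[a, b] + [c, d] = [a + c, b + d], cf. (4.17a)."""
    (a, b), (c, d) = p, q
    return canon(a + c, b + d)

def neg(p):
    """−[a, b] = [b, a] (the inverse element from Th. 4.15)."""
    a, b = p
    return canon(b, a)

three, five = canon(3, 0), canon(5, 0)   # ι(3) and ι(5)
print(add(three, neg(five)))  # (0, 2), i.e. the integer 3 - 5 = -2
```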
Theorem 4.17. Let (G, ◦) be a group, ∅ ≠ U ⊆ G. Then U ≤ G if, and only if, (i) and (ii) hold, where
(i) a ◦ b ∈ U for each a, b ∈ U,
(ii) a^{−1} ∈ U for each a ∈ U.
Proof. Exercise.
Example 4.18. (a) If (G, ◦) is an arbitrary group with neutral element e ∈ G, then it
is immediate from Th. 4.17, that {e} and G are always subgroups.
(b) We use Th. 4.17 to verify that, for each k ∈ Z, kZ = {kl : l ∈ Z} is a subgroup of
(Z, +): As 0 ∈ kZ, kZ 6= ∅. If l1 , l2 ∈ Z, then kl1 + kl2 = k(l1 + l2 ) ∈ kZ. If l ∈ Z,
then −kl = k(−l) ∈ kZ. Thus, Th. 4.17 applies, showing kZ ≤ Z.
(c) As N is not a subgroup of (Z, +), even though Th. 4.17(i) holds for N, we see that, in general, Th. 4.17(ii) cannot be omitted for infinite subsets.
(e) In contrast to intersections, unions of subgroups are not necessarily subgroups (how-
ever, cf. (f) below): We know from (b) that 2Z and 3Z are subgroups of (Z, +).
However, 2 ∈ 2Z, 3 ∈ 3Z, but 5 = 2 + 3 ∉ (2Z) ∪ (3Z), showing (2Z) ∪ (3Z) is not
a subgroup of (Z, +).
(f) While (e) shows that unions of subgroups are not necessarily subgroups, if (G, ◦) is a group with neutral element e ∈ G, I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (Ui)i∈I is an increasing family of subgroups (i.e., for each i, j ∈ I with i ≤ j, one has Ui ⊆ Uj), then U := ⋃_{i∈I} Ui is a subgroup as well: Indeed, if i ∈ I, then e ∈ Ui ⊆ U. If a, b ∈ U, then there exist i, j ∈ I such that a ∈ Ui and b ∈ Uj. If k ∈ I with i, j ≤ k, then a ◦ b ∈ Uk ⊆ U. Moreover, a^{−1} ∈ Ui ⊆ U. Thus, U is a subgroup by Th. 4.17.
(g) One can show that every group G is isomorphic to a subgroup of the permutation
group SG (see Sec. C of the Appendix, in particular, Cayley’s Th. C.2).
Definition 4.19. Let G and H be groups and let φ : G −→ H be a homomorphism. Let e′ ∈ H be neutral. Define the sets
ker φ := φ^{−1}({e′}) (the kernel of φ), Im φ := φ(G) (the image of φ).
Theorem 4.20. Let G and H be groups with respective neutral elements e ∈ G and e′ ∈ H, let φ : G −→ H be a homomorphism, and let U ≤ G, V ≤ H. Then the following statements hold:
(a) φ^{−1}(V) ≤ G.
(b) ker φ ≤ G.
(c) Im φ ≤ H.
(d) φ(U) ≤ H.
(e) φ is a monomorphism (i.e. injective) if, and only if, ker φ = {e}.
(f) For each a ∈ G, one has φ^{−1}({φ(a)}) = a ker φ, i.e. the nonempty preimages of elements of H are cosets of the kernel of φ (cf. Ex. 4.9(d)).
Proof. (a): Since V ≤ H, we have e′ ∈ V. From Prop. 4.11(e), we know φ(e) = e′, i.e. e ∈ φ^{−1}(V) and φ^{−1}(V) ≠ ∅. If a, b ∈ φ^{−1}(V), then φ(a), φ(b) ∈ V and, since V ≤ H,
φ(ab) = φ(a)φ(b) ∈ V,
φ(a^{−1}) = (φ(a))^{−1} ∈ V (using Prop. 4.11(e)),
showing ab ∈ φ^{−1}(V) and a^{−1} ∈ φ^{−1}(V); thus, φ^{−1}(V) ≤ G by Th. 4.17.
(b): As {e′} ≤ H and ker φ = φ^{−1}({e′}), (b) follows from (a). (c): If x, y ∈ Im φ, then there exist a, b ∈ G with x = φ(a) and y = φ(b), implying xy = φ(a)φ(b) = φ(ab) and x^{−1} = (φ(a))^{−1} = φ(a^{−1}), showing xy ∈ Im φ and x^{−1} ∈ Im φ. Once again, Im φ ≤ H now follows from Th. 4.17.
(d) is now immediate by applying (c) with φ replaced by φ↾U (then φ(U ) = Im φ↾U ≤ H).
(e): If φ is injective, then e is the unique preimage of e′, showing ker φ = {e}. Conversely, assume ker φ = {e}. If a, b ∈ G with φ(a) = φ(b), then
φ(ab^{−1}) = φ(a)(φ(b))^{−1} = e′,
i.e. ab^{−1} ∈ ker φ = {e}, implying ab^{−1} = e, i.e. a = b, showing φ to be injective.
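Kernel and image of a concrete homomorphism are directly computable; the following illustrative Python sketch examines the homomorphism a ↦ 3a of (Z12, +), with classes represented by 0, …, 11 (names are ours):

```python
n = 12
phi = lambda a: (3 * a) % n          # a homomorphism of (Z_12, +) into itself
kernel = [a for a in range(n) if phi(a) == 0]
image = sorted({phi(a) for a in range(n)})
print(kernel)  # [0, 4, 8]
print(image)   # [0, 3, 6, 9]
# ker φ ≠ {0}, so φ is not injective by Th. 4.20(e); indeed φ(0) == φ(4).
```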
In Ex. 4.9(d), we have defined cosets for a given subset of a magma. For a group G,
cosets of subgroups are of particular interest: We will see in Prop. 4.22 that, if U ≤ G,
then the cosets of U form a partition of G, giving rise to an equivalence relation on G
according to Th. 2.33(a). Thus, the cosets form the quotient set with respect to this
equivalence relation and, if the subgroup is sufficiently benign (namely what we will call
a normal subgroup in Def. 4.23 below), then we can make this quotient set itself into
another group, the so-called quotient group or factor group G/U (cf. Def. 4.26 and Th.
4.27(a) below).
Proposition 4.22. Let (G, ·) be a group, U ≤ G. Then
G = ⋃_{a∈G} aU, and, for each a, b ∈ G: aU = bU ∨ aU ∩ bU = ∅, (4.21a)
G = ⋃_{a∈G} Ua, and, for each a, b ∈ G: Ua = Ub ∨ Ua ∩ Ub = ∅, (4.21b)
i.e. both the left cosets and the right cosets of U form decompositions of G.
Proof. We conduct the proof for left cosets – the proof for right cosets is completely analogous. For each a ∈ G, we have a ∈ aU (since e ∈ U), already showing G = ⋃_{a∈G} aU. Now let a, b, x ∈ G with x ∈ aU ∩ bU. We need to prove aU = bU. From x ∈ aU ∩ bU, we obtain
x = a u1 = b u2 for suitable u1, u2 ∈ U. (4.22)
Definition 4.23. Let (G, ·) be a group. A subgroup U ≤ G is called normal if, and only if, aU = Ua holds for each a ∈ G, i.e. if, and only if, left and right cosets with respect to U are the same.
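The coset partition of Prop. 4.22 can be observed concretely for the subgroup {0̄, 3̄} of Z6; an illustrative Python sketch with classes represented by 0, …, 5 (names are ours):

```python
n = 6
U = {0, 3}                             # the subgroup {0̄, 3̄} ≤ Z_6
# All left cosets a + U; duplicates collapse because a set of frozensets is used.
cosets = {frozenset((a + u) % n for u in U) for a in range(n)}
# Pairwise distinct cosets cover the whole group, cf. (4.21a):
assert set().union(*cosets) == set(range(n))
print(sorted(sorted(c) for c in cosets))  # [[0, 3], [1, 4], [2, 5]]
```

Since (Z6, +) is commutative, left and right cosets coincide here, so U is normal in the sense of the definition above.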
Theorem 4.24. Let (G, ·) and (H, ·) be groups and let φ : G −→ H be a homomorphism.
(a) If V is a normal subgroup of H, then U := φ^{−1}(V) is a normal subgroup of G.
Proof. (a): As we already know from Th. 4.20(a) that U is a subgroup of G, it merely
remains to show U is normal. If a ∈ G and u ∈ U , then φ(u) ∈ V . Since V is normal,
we have φ(a)V = V φ(a). Thus, there exists v ∈ V such that φ(a)φ(u) = vφ(a), namely v := φ(a)φ(u)(φ(a))^{−1} ∈ V.
(b) In Ex. 4.9(b), we defined the symmetric groups S2 and S3. Clearly, the map
φ : S2 −→ S3, f ↦ φ(f), φ(f)(n) := { f(n) for n ∈ {1, 2}; 3 for n = 3 },
where we made use of a notation, commonly used for permutations (we will need to
study it more thoroughly later): (12) is the map that permutes 1 and 2 and leaves
3 fixed, (23) permutes 2 and 3 and leaves 1 fixed, (132) maps 1 into 3, 3 into 2, 2
into 1, (123) maps 1 into 2, 2 into 3, 3 into 1.
Definition 4.26. Let (G, ·) be a group and let N ≤ G be a normal subgroup. According
to Prop. 4.22, the cosets of N form a partition of G. According to Th. 2.33(a), the cosets
are precisely the equivalence classes of the equivalence relation ∼ on G, defined by
a ∼ b :⇔ aN = bN.
Define
G/N := G/ ∼,
i.e. G/N is the set of all cosets of N. We define a composition on G/N by letting
(aN) · (bN) := (ab)N for each a, b ∈ G (4.24)
(note that we already know the forming of cosets to be associative by Ex. 4.9(d)).
call (G/N, ·) the quotient group of G by N or the factor group of G with respect to N
(cf. Th. 4.27(a) below).
Theorem 4.27. (a) Let (G, ·) be a group and let N ≤ G be a normal subgroup. Then
(4.24) well-defines a composition on G/N that makes G/N into a group with neutral
element N. Moreover, the map
φ : G −→ G/N, φ(a) := aN,
is an epimorphism, the so-called canonical (or natural) homomorphism.
φ = fm ◦ fe . (4.28)
Proof. (a): To see that (4.24) well-defines a composition on G/N , we need to show
that the definition is independent of the chosen coset representatives a, b: To this end,
suppose a, b, a′ , b′ ∈ G are such that aN = a′ N and bN = b′ N . We need to show
abN = a′ b′ N . There exist na , nb ∈ N such that a′ = ana , b′ = bnb . Since N is a normal
subgroup of G, we have bN = Nb and there exists n ∈ N such that na b = bn. Then, as n nb N = N, we obtain
a′b′N = a na b nb N = a b n nb N = abN,
as needed.
Example 4.28. (a) Consider (Z, +) and let n ∈ N. We know from Ex. 4.18(b) that
nZ ≤ Z. As (Z, +) is also commutative, nZ is a normal subgroup of Z and we can
form the quotient group
Zn := Z/nZ. (4.29)
The elements of Zn are the cosets r + nZ, r ∈ Z. For r ∈ Z, we have
r̄ := r + nZ = {r + mn : m ∈ Z}.
Thus, k ∈ r̄ holds if, and only if, k − r is an integer multiple of n, i.e., in congruence notation,
r + nZ = {k ∈ Z : k ≡ r (mod n)}.
For this reason, one also calls the elements of Zn congruence classes. If k = r + mn
as above, one also says that r is the residue (or remainder) when dividing k by n
and, thus, one also calls the elements of Zn residue classes. Here, the canonical
homomorphism of Th. 4.27(a) takes the form
φ : Z −→ Zn , φ(r) := r̄ = r + nZ.
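Identifying each class r̄ with its representative in {0, …, n−1} makes the canonical homomorphism directly computable as the remainder map; an illustrative Python sketch (names are ours):

```python
n = 5
phi = lambda r: r % n   # r ↦ r̄, with r̄ represented by the remainder r mod n
# φ respects addition: φ(r + s) equals the sum of the classes in Z_n.
assert all(phi(r + s) == (phi(r) + phi(s)) % n
           for r in range(-20, 20) for s in range(-20, 20))
print(phi(-3), phi(12))  # 2 2 — both integers lie in the same coset 2 + 5Z
```

(Python's `%` always returns a remainder in {0, …, n−1} for positive n, which is exactly the residue-class representative used above.)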
Definition 4.29. Let R be a set with two compositions
+ : R × R −→ R, (x, y) ↦ x + y,
· : R × R −→ R, (x, y) ↦ x · y (4.31)
(+ is called addition and · is called multiplication; as before, one often writes xy instead of x · y). Then (R, +, ·) (or just R, if + and · are understood) is called a ring if, and only if, the following three conditions are satisfied:
(i) (R, +) is a commutative group (its neutral element is denoted 0 and the additive inverse of x ∈ R is denoted −x).
(ii) Multiplication is associative, i.e. (R, ·) forms a semigroup.
(iii) Distributivity:
x · (y + z) = x · y + x · z for each x, y, z ∈ R, (4.32a)
(y + z) · x = y · x + z · x for each x, y, z ∈ R. (4.32b)
A ring R is called commutative if, and only if, its multiplication is commutative. More-
over, a ring is called a ring with unity if, and only if, R contains a neutral element with
respect to multiplication (i.e. there is 1 ∈ R such that 1 · x = x · 1 = x for each x ∈ R)
– some authors always require a ring to have a neutral element with respect to multi-
plication. Finally, (R, +, ·) is called a field if, and only if, it is a ring and, in addition,
(R \ {0}, ·) constitutes a commutative group (its neutral element is denoted by 1). For
each x, y in a ring R, define y − x := y + (−x), where −x is the additive inverse of x. If R is a field, then, for x ≠ 0, also define the fractions y/x := y x^{−1} with numerator y and denominator x, where x^{−1} denotes the multiplicative inverse of x.
—
Like group theory, ring theory and field theory are vast and important subdisciplines of
algebra. Here we will merely give a brief introduction to some elementary notions and
results, before proceeding to vector spaces, which are defined, building on the notion of
a field. Before we come to examples of rings and fields, we will prove a number of basic
rules. However, it might be useful to already know that, with the usual addition and
multiplication, Z is a ring (but not a field), and Q, R, C all are fields.
Theorem 4.30. The following statements and rules are valid in every ring (R, +, ·) (let
0 denote the additive neutral element and let x, y, z ∈ R):
(a) x · 0 = 0 = 0 · x.
(b) x(−y) = −(xy) = (−x)y.
(c) (−x)(−y) = xy.
(d) x(y − z) = xy − xz, (y − z)x = yx − zx.
Proof. (a): Using (4.32a), one computes x · 0 = x · (0 + 0) = x · 0 + x · 0, i.e. x · 0 = 0 follows since (R, +) is a group. The second equality follows analogously using (4.32b).
(b): xy +x(−y) = x(y −y) = x·0 = 0, where we used (4.32a) and (a). This shows x(−y)
is the additive inverse to xy. The second equality follows analogously using (4.32b).
(c): xy = −(−(xy)) = −(x(−y)) = (−x)(−y), where (b) was used twice.
(d): x(y − z) = x(y + (−z)) = xy + x(−z) = xy − xz and (y − z)x = (y + (−z))x =
yx + (−z)x = yx − zx.
Theorem 4.31. The following statements and rules are valid in every field (F, +, ·):
(a) xy = 0 ⇒ x = 0 ∨ y = 0.
(b) Rules for Fractions:
a/c + b/d = (ad + bc)/(cd),   (a/c) · (b/d) = (ab)/(cd),   (a/c)/(b/d) = (ad)/(bc),
where all denominators are assumed ≠ 0.
Proof. (a) holds since (F \ {0}, ·) is a group and, thus, closed under multiplication: if x ≠ 0 and y ≠ 0, then xy ≠ 0. For the rules for fractions, one computes
a/c + b/d = a c^{−1} + b d^{−1} = (ad + bc)(cd)^{−1} = (ad + bc)/(cd)
and
(a/c) · (b/d) = a c^{−1} b d^{−1} = ab(cd)^{−1} = (ab)/(cd)
and
(a/c)/(b/d) = a c^{−1} (b d^{−1})^{−1} = a c^{−1} b^{−1} d = ad(bc)^{−1} = (ad)/(bc),
completing the proof.
Definition and Remark 4.32. Let (R, +, ·) be a ring and a ∈ R. Then a is called a left
(resp. right) zero divisor if, and only if, there exists x ∈ R \ {0} such that ax = 0 (resp.
xa = 0). If a ≠ 0 is a zero divisor, then it is called a nonzero or nontrivial zero divisor.
According to Th. 4.30(a), 0 is always a zero divisor, except for R = {0}. According
to Th. 4.31(a), in a field, there do not exist any nontrivial zero divisors. However, in
general, rings can have nontrivial zero divisors (see, e.g. Ex. 4.38 and Ex. 4.42(a) below).
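Nontrivial zero divisors are easy to exhibit computationally, e.g. in Z6, where 2̄ · 3̄ = 0̄ (cf. Ex. 4.38 below); an illustrative Python sketch (names are ours):

```python
n = 6
# Nonzero classes a with some nonzero b such that a·b ≡ 0 (mod n):
zero_divisors = sorted({a for a in range(1, n) for b in range(1, n)
                        if (a * b) % n == 0})
print(zero_divisors)  # [2, 3, 4]
```

For prime n no such classes exist, which is the computational shadow of Th. 4.31(a) applied to the field Zp.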
Example 4.33 (Ring of Integers Z). Even though we have occasionally already used
multiplication on Z in examples, so far, we have not provided a mathematically precise
definition. The present example will remedy this situation. Recall from Ex. 2.36(a) and
Th. 4.15 the definition of Z as a set of equivalence classes of elements of N0 × N0 with
addition (and subtraction) on Z defined according to (4.17). To obtain the expected
laws of arithmetic, multiplication on Z needs to be defined such that (a − b) · (c − d) =
(ac + bd) − (ad + bc), which leads to the following definition: Multiplication on Z is
defined by
· : Z × Z −→ Z, ([a, b], [c, d]) ↦ [a, b] · [c, d] := [ac + bd, ad + bc]. (4.33)
(a) It is an exercise to show the definition in (4.33) does not depend on the chosen
representatives, i.e.
For each a, b, c, d, ã, b̃, c̃, d̃ ∈ N0:
([a, b] = [ã, b̃] ∧ [c, d] = [c̃, d̃]) ⇒ [ac + bd, ad + bc] = [ãc̃ + b̃d̃, ãd̃ + b̃c̃]. (4.34)
(b) It is an exercise to show (Z, +, ·) is a commutative ring with unity, where [1, 0] is the neutral element of multiplication, and there are no nontrivial zero divisors, i.e., for each a, b, c, d ∈ N0:
[a, b] · [c, d] = [ac + bd, ad + bc] = [0, 0] ⇒ ([a, b] = [0, 0] ∨ [c, d] = [0, 0]). (4.35)
(c) (Z, +, ·) is not a field, since, e.g., there is no multiplicative inverse for 2 ∈ Z: While
0 · 2 = 0, one has n · 2 ≥ 2 for each n ∈ N and −n · 2 ≤ −2 for each n ∈ N, showing
k · 2 ≠ 1 for each k ∈ Z.
(b) Let (R, +, ·) and (S, +, ·) be rings (caveat: to simplify notation, we use the same
symbols to denote the compositions on R and S, respectively; however, they might
even denote different compositions on the same set R = S). A map φ : R −→ S
is called ring homomorphism if, and only if, φ is a homomorphism in the sense of
Def. 4.10 with respect to both + and · (caveat: Ex. 4.36(a) below shows that a
ring homomorphism is not necessarily unital with respect to multiplication). The
notions ring monomorphism, epimorphism, isomorphism, endomorphism, automor-
phism are then defined as in Def. 4.10. Moreover, (R, +, ·) and (S, +, ·) are called
isomorphic (denoted (R, +, ·) ≅ (S, +, ·)) if, and only if, there exists a ring isomorphism φ : R −→ S.
(b) Let (F, +, ·) be a field, ∅ ≠ U ⊆ F. Then U is a subfield of F if, and only if, (i)
and (ii) hold, where
Proof. In both cases, it is clear that (i) and (ii) are necessary. That (i) and (ii) are
also sufficient in both cases is due to the fact that, if associativity (resp. commutativity,
resp. distributivity) hold on R (resp. F ), then it is immediate that the same property
holds on U .
(b) Let n ∈ N. It is clear from Th. 4.35(a) that nZ is a subring of Z (note that, for
n > 1, nZ is not a ring with unity). Moreover, for n > 1, φ : Z −→ nZ, φ(k) := nk,
is not a ring homomorphism, since, e.g. φ(1 · 1) = n 6= n2 = φ(1)φ(1).
(d) Let (R, +, ·) be a ring (resp. a ring with unity, resp. a field), let I ≠ ∅ be an index set, and let (Ui)i∈I be a family of subrings (resp. of subrings with unity, resp. of subfields). It is then an exercise to show U := ⋂_{i∈I} Ui is a subring (resp. a subring with unity, resp. a subfield).
(e) In contrast to intersections, unions of rings are not necessarily rings (however, cf.
(f) below): We know from (b) that 2Z and 3Z are subrings of Z. But we already
noted in Ex. 4.18(e) that (2Z) ∪ (3Z) is not even a subgroup of (Z, +). To show
that the union of two fields is not necessarily a field (not even a ring, actually),
needs slightly more work: We use that Q is a subfield of R (cf. Rem. 4.40 below)
and show that, for each x ∈ R with x2 ∈ Q, Q(x) := {r + sx : r, s ∈ Q} is a subfield
of R (clearly, Q(x) = Q if, and only if, x ∈ Q): Setting r = s = 0, shows 0 ∈ Q(x).
Setting r = 1, s = 0, shows 1 ∈ Q(x). If r1 , s1 , r2 , s2 ∈ Q, then
r1 + s1 x + r2 + s2 x = r1 + r2 + (s1 + s2 )x ∈ Q(x),
(r1 + s1 x)(r2 + s2 x) = r1 r2 + s1 s2 x2 + (s1 r2 + r1 s2 )x ∈ Q(x),
−(r1 + s1 x) = −r1 − s1 x ∈ Q(x),
(r1 + s1 x)^{−1} = (r1 − s1 x)/(r1^2 − s1^2 x^2) ∈ Q(x),
showing Q(x) to be a subfield of R by Th. 4.35(b). However, e.g., U := Q(√2) ∪ Q(√3) is not even a ring, since α := √2 + √3 ∉ U: α ∉ Q(√2) since, otherwise, there are r, s ∈ Q with α = r + s√2, i.e. √3 = r + (s − 1)√2, and 3 = r^2 + 2r(s − 1)√2 + 2(s − 1)^2, implying the false statement
√3 = 0 ∨ √3 ∈ Q ∨ √(3/2) ∈ Q ∨ √2 ∈ Q.
Switching the roles of √2 and √3 shows α ∉ Q(√3), i.e. α ∉ U.
(f) While (e) shows that unions of subrings (or even subfields) are not necessarily subrings, if (R, +, ·) is a ring (resp. a ring with unity, resp. a field), I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (Ui)i∈I is an increasing family of subrings (resp. of subrings with unity, resp. of subfields) (i.e., for each i, j ∈ I with i ≤ j, one has Ui ⊆ Uj), then U := ⋃_{i∈I} Ui is a subring (resp. a subring with unity, resp. a subfield) as well: Indeed, according to Ex. 4.18(f), (U, +) ≤ (R, +). If a, b ∈ U, then there exist i, j ∈ I such that a ∈ Ui and b ∈ Uj. If k ∈ I with i, j ≤ k, then ab ∈ Uk ⊆ U, i.e. U is a ring by Th. 4.35(a). If each Ui contains the unity 1, then 1 ∈ U. If each (Ui \ {0}, ·) is a group, then (U \ {0}, ·) is a group by Ex. 4.18(f), showing U to be a subfield by Th. 4.35(b).
—
If φ is even an isomorphism with respect to both + and ·, then the above implications
become equivalences.
showing (B, +, ·) satisfies (4.32b) as well. If (A, +, ·) is a ring, then we know from Prop.
4.11(c) that (B, +) is a commutative group and that (B, ·) is associative, showing that
(B, +, ·) is a ring as well. If (A, +, ·) is a ring with unity, then the above and Prop.
4.11(d) imply (B, +, ·) to be a ring with unity as well. If (A, +, ·) is a field, then, as
B \ {0} 6= ∅, (B \ {0}, ·) must also be a group, i.e. (B, +, ·) is a field. Finally, if φ
is an isomorphism, then the implications become equivalences, as both φ and φ−1 are
epimorphisms by Prop. 4.11(b) (and since φ(0) = 0).
Example 4.38. Let n ∈ N. In Ex. 4.28(a), we considered the group (Zn , +), where Zn
was defined as the quotient group Zn := Z/nZ. We now want to define a multiplication
on Zn that makes Zn into a ring with unity (and into a field if, and only if, n is prime, cf. Def. D.2(b)). This is accomplished by letting
· : Zn × Zn −→ Zn, (r̄, s̄) ↦ r̄ · s̄ := rs + nZ. (4.36)
First, we check that the definition does not depend on the chosen representatives: Let
r1 , r2 , s1 , s2 ∈ Z. If r̄1 = r̄2 and s̄1 = s̄2 , then there exist mr , ms ∈ Z such that
r1 − r2 = mr n and s1 − s2 = ms n, implying
r1 s1 − r2 s2 = (r2 + mr n)(s2 + ms n) − r2 s2 = (r2 ms + mr s2 + mr ms n) n ∈ nZ,
i.e. r̄1 · s̄1 = r1 s1 + nZ = r2 s2 + nZ = r̄2 · s̄2.
One then verifies that (Zn, +, ·) is a commutative ring with unity 1̄ and that the canonical homomorphism
φ : Z −→ Zn, φ(r) = r̄ = r + nZ,
is a unital ring epimorphism. If n is not prime, i.e. n = ab with a, b ∈ {2, . . . , n − 1}, then
ā · b̄ = n̄ = 0̄,
showing that, in this case, Zn has nontrivial zero divisors (in particular, Zn is then not a field).
On G := {0, 1, . . . , n − 1}, let + be the addition of Ex. 4.28(a), which is commutative due to (Z, +) being commutative, and define a multiplication ⊗ by letting, for r, s ∈ G, r ⊗ s := p, where rs = qn + p with q ∈ Z and p ∈ G (division with remainder). We claim the additive isomorphism f : G −→ Zn, f(r) := r̄, of Ex. 4.28(a) to be a multiplicative isomorphism as well: If r, s ∈ G, then, using q, p as above,
f(r ⊗ s) = f(p) = p̄ = (rs − qn) + nZ = rs + nZ = r̄ · s̄ = f(r)f(s),
showing f to be a multiplicative homomorphism. As we know f to be bijective, (G, +, ⊗) is a ring by Prop. 4.37, yielding (G, +, ⊗) ≅ (Zn, +, ·). As stated in Ex. 4.28(a) in the
context of (Zn , +), one tends to use the isomorphism f to identify G with Zn . One does
the same in the context of (Zn , +, ·), also simply writing · for the multiplication on G.
Example 4.39 (Field of Rational Numbers Q). In Ex. 2.36(b), we defined the set of
rational numbers Q as the quotient set of Z × (Z \ {0}) with respect to the equivalence
relation defined by
(a, b) ∼ (c, d) :⇔ a · d = b · c.
We now want to define “the usual” addition and multiplication on Q and then verify that these make Q into a field (recall the notation a/b := [(a, b)]):
Addition on Q is defined by
+ : Q × Q −→ Q, (a/b, c/d) ↦ a/b + c/d := (ad + bc)/(bd). (4.37)
Multiplication on Q is defined by
· : Q × Q −→ Q, (a/b, c/d) ↦ (a/b) · (c/d) := (ac)/(bd). (4.38)
We will now show that (Q, +, ·) forms a field, where 0/1 and 1/1 are the neutral elements
with respect to addition and multiplication, respectively, (−a/b) is the additive inverse
to a/b, whereas b/a is the multiplicative inverse to a/b with a 6= 0. We already know
from Ex. 2.36(b) that the map
ι : Z −→ Q, ι(k) := k/1,
is injective. We will now also show it is a unital ring monomorphism, i.e. it satisfies ι(1) = 1/1 as well as
ι(k + l) = ι(k) + ι(l) for each k, l ∈ Z, (4.39a)
ι(k · l) = ι(k) · ι(l) for each k, l ∈ Z. (4.39b)
We begin by showing that the definitions in (4.37) and (4.38) do not depend on the
chosen representatives, i.e.
For each a, c, ã, c̃ ∈ Z and each b, d, b̃, d̃ ∈ Z \ {0}:
(a/b = ã/b̃ ∧ c/d = c̃/d̃) ⇒ (ad + bc)/(bd) = (ãd̃ + b̃c̃)/(b̃d̃) (4.40)
and
(a/b = ã/b̃ ∧ c/d = c̃/d̃) ⇒ (ac)/(bd) = (ãc̃)/(b̃d̃). (4.41)
Furthermore, the results of both addition and multiplication are always elements of Q.
(4.40) and (4.41): a/b = ã/b̃ means ab̃ = ãb, c/d = c̃/d̃ means cd̃ = c̃d, implying
(ad + bc)b̃d̃ = bd(ãd̃ + b̃c̃), i.e. (ad + bc)/(bd) = (ãd̃ + b̃c̃)/(b̃d̃),
and
acb̃d̃ = bdãc̃, i.e. (ac)/(bd) = (ãc̃)/(b̃d̃).
That the results of both addition and multiplication are always elements of Q follows
from (4.35), i.e. from the fact that Z has no nontrivial zero divisors. In particular, if
b, d 6= 0, then bd 6= 0, showing (ad + bc)/(bd) ∈ Q and (ac)/(bd) ∈ Q.
Next, we verify + and · to be commutative and associative on Q: Let a, c, e ∈ Z and
b, d, f ∈ Z \ {0}. Then, using commutativity on Z, we compute
c/d + a/b = (cb + da)/(db) = (ad + bc)/(bd) = a/b + c/d,   (c/d) · (a/b) = (ca)/(db) = (ac)/(bd) = (a/b) · (c/d),
showing commutativity on Q. Using associativity and distributivity on Z, we compute
a/b + (c/d + e/f) = a/b + (cf + de)/(df) = (adf + b(cf + de))/(bdf)
= ((ad + bc)f + bde)/(bdf) = (ad + bc)/(bd) + e/f = (a/b + c/d) + e/f,
(a/b) · ((c/d) · (e/f)) = (a(ce))/(b(df)) = ((ac)e)/((bd)f) = ((a/b) · (c/d)) · (e/f),
proving associativity on Q; distributivity on Q follows from an analogous computation. We now check the claims regarding neutral and inverse elements:
a/b + 0/1 = (a · 1 + b · 0)/(b · 1) = a/b,
a/b + (−a)/b = (ab + b(−a))/b^2 = ((a − a)b)/b^2 = 0/b^2 = 0/1,
(a/b) · (1/1) = (a · 1)/(b · 1) = a/b,
(a/b) · (b/a) = (ab)/(ba) = 1/1 (for a ≠ 0).
Thus, (Q, +, ·) is a ring and (Q \ {0}, ·) is a group, implying Q to be a field. Finally
ι(k) + ι(l) = k/1 + l/1 = (k · 1 + 1 · l)/(1 · 1) = (k + l)/1 = ι(k + l),
ι(k) · ι(l) = (k/1) · (l/1) = (kl)/(1 · 1) = (kl)/1 = ι(kl),
proving (4.39).
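The verification above can be mirrored computationally by working directly with pairs (a, b) ∈ Z × (Z \ {0}); an illustrative Python sketch (the helper names are ours):

```python
def eq(p, q):
    """(a, b) ~ (c, d) iff a·d = b·c, the equivalence relation defining Q."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def add(p, q):
    """Addition as in (4.37): a/b + c/d = (ad + bc)/(bd)."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def mul(p, q):
    """Multiplication as in (4.38): (a/b)·(c/d) = (ac)/(bd)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

half, third = (1, 2), (2, 6)       # 1/2, and 2/6 as a representative of 1/3
assert eq(third, (1, 3))           # well-definedness: same equivalence class
s = add(half, third)
assert eq(s, (5, 6))               # 1/2 + 1/3 = 5/6
assert eq(mul(half, third), (1, 6))
print(s)  # (10, 12) — the class of 10/12, equivalent to 5/6
```

Note that `add` and `mul` return representatives rather than reduced fractions; by (4.40) and (4.41) the resulting equivalence class does not depend on the chosen representatives.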
Remark 4.40. With the usual addition and multiplication, (R, +, ·) and (C, +, ·) are fields (see [Phi16, Th. D.35] and [Phi16, Th. 5.2], respectively). In particular, R is a subfield of C and Q is a subfield of both R and C.
Definition and Remark 4.41. Let (R, +, ·) be a ring with unity. We call x ∈ R invertible if, and only if, there exists x̄ ∈ R such that x̄x = xx̄ = 1. We call (R∗, ·), where R∗ := {x ∈ R : x invertible}, the group of invertible elements of R. We verify (R∗, ·) to be a group: If x, y ∈ R∗, then xy ȳx̄ = 1 and ȳx̄ xy = 1, showing xy ∈ R∗. Moreover, R∗ inherits associativity from R, 1 ∈ R∗, and each x ∈ R∗ has a multiplicative inverse by definition of R∗, proving (R∗, ·) to be a group. We also note that, if R ≠ {0}, then 0 ∉ R∗ and (R∗, +) is not a group (in particular, R∗ is then not a subring of R).
Example 4.42. (a) Let (R, +, ·) be a ring and let M be a set. As in Ex. 4.9(e), we can define + and · pointwise on R^M, i.e.
(f + g)(x) := f(x) + g(x), (f · g)(x) := f(x) · g(x) for each f, g ∈ R^M and each x ∈ M,
making R^M into a ring. If R ≠ {0} and M contains at least two distinct elements, then R^M always has nontrivial divisors of 0, even if R has none: Let x, y ∈ M with x ≠ y, and 0 ≠ a ∈ R. Define
fx, fy : M −→ R, fx(z) := { a if z = x; 0 if z ≠ x },  fy(z) := { a if z = y; 0 if z ≠ y }.
Then fx ≢ 0, fy ≢ 0, but fx · fy ≡ 0.
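For R = Z and a two-point set M, the zero divisors fx, fy can be written down explicitly; an illustrative Python sketch encoding functions as dicts (names are ours):

```python
M, a = [0, 1], 5                      # M = {x, y} with x = 0, y = 1; a ≠ 0
fx = {z: a if z == 0 else 0 for z in M}   # supported only at x
fy = {z: a if z == 1 else 0 for z in M}   # supported only at y
prod = {z: fx[z] * fy[z] for z in M}      # the pointwise product fx · fy
print(prod)  # {0: 0, 1: 0} — identically zero, although fx and fy are not
```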
(b) Let (G, +) be an abelian group and let End(G) denote the set of (group) endomor-
phisms on G. Define addition and multiplication on End(G) by letting, for each
f, g ∈ End(G),
(f + g) : G −→ G, (f + g)(a) := f (a) + g(a), (4.42a)
(f · g) : G −→ G, (f · g)(a) := f (g(a)). (4.42b)
We claim that (End(G), +, ·) forms a ring with unity (the so-called ring of endo-
morphisms on G): First, we check + to be well-defined (which is clear for ·): For
each a, b ∈ G, we compute, using commutativity of + on G,
(f +g)(a+b) = f (a+b)+g(a+b) = f (a)+f (b)+g(a)+g(b) = (f +g)(a)+(f +g)(b),
showing f + g ∈ End(G). As we then know (End(G), +) ≤ (F(G, G), +), we know
· to be associative, and Id is clearly neutral for ·, it remains to check distributivity:
To this end, let f, g, h ∈ End(G). Then, for each a ∈ G,
(f · (g + h))(a) = f(g(a) + h(a)) = f(g(a)) + f(h(a)) = (fg)(a) + (fh)(a) = (fg + fh)(a),
((g + h) · f)(a) = (g + h)(f(a)) = g(f(a)) + h(f(a)) = (gf)(a) + (hf)(a) = (gf + hf)(a),
where the second equality of the first line uses f ∈ End(G),
proving distributivity and that (End(G), +, ·) forms a ring with unity. Moreover, the
set Aut(G) := End(G) ∩ SG of automorphisms on G is a subgroup of the symmetric
group SG and, comparing with Def. and Rem. 4.41, we also have Aut(G) = End(G)∗ .
Definition and Remark 4.43. Let (F, +, ·) be a field. Recall the homomorphism from
Ex. 4.12(c), which we now apply to the group (F, +) with a := 1. Then
φ1 : N0 −→ F, φ1(n) = Σ_{i=1}^n 1 =: n · 1. (4.43)
We say that the field F has characteristic 0 (denoted char F = 0) if, and only if, n · 1 ≠ 0 for each n ∈ N. Otherwise, we define the field's characteristic to be the number
char F := min{n ∈ N : n · 1 = 0}.
With this definition, we have char Q = char R = char C = 0 and, for each prime number
p, char Zp = p.
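The characteristic of Z_p can be computed directly from the definition; a minimal sketch (function name ours):

```python
# Sketch: compute char(Z_p) by brute force as min{n in N : n·1 = 0 in Z_p},
# matching the definition of the characteristic above.
def char_Zp(p: int) -> int:
    n, acc = 1, 1 % p          # acc holds n·1 computed in Z_p
    while acc != 0:
        n += 1
        acc = (acc + 1) % p
    return n

assert char_Zp(2) == 2
assert char_Zp(5) == 5
assert char_Zp(13) == 13       # for each prime p, char Z_p = p
```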
—
5 VECTOR SPACES 90
The examples above show that 0 and each prime number occur as characteristic of fields.
It is not difficult to prove that no other numbers can occur:
Theorem 4.44. Let F be a field. If char F ≠ 0, then p := char F is a prime number.
Proof. Suppose p = ab with a, b ∈ N. Then 0 = φ1(p) = φ1(ab) = φ1(a) · φ1(b). As the field F has no nontrivial zero divisors, we must have φ1(a) = 0 or φ1(b) = 0. But p = min{n ∈ N : φ1(n) = 0} and 1 ≤ a, b ≤ p, implying a = p or b = p, showing p to be prime.
5 Vector Spaces
(i) (V, +) is a commutative group. The neutral element with respect to + is denoted
0 (do not confuse 0 ∈ V with 0 ∈ F – once again, the same symbol is used for
different objects (both objects only coincide for F = V )).
(ii) Distributivity:
∀ λ∈F ∀ x,y∈V : λ(x + y) = λx + λy,
∀ λ,µ∈F ∀ x∈V : (λ + µ)x = λx + µx.
(iii) Compatibility of scalar multiplication with the multiplication on F:
∀ λ,µ∈F ∀ x∈V : (λµ)x = λ(µx). (5.3)
(iv) The neutral element with respect to the multiplication on F is also neutral with respect to the scalar multiplication:
∀ x∈V : 1x = x. (5.4)
If V is a vector space over F , then one calls the elements of V vectors and the elements
of F scalars.
Example 5.2. (a) Every field F is a vector space over itself if one uses the field ad-
dition in F as the vector addition and the field multiplication in F as the scalar
multiplication (as important special cases, we obtain that R is a vector space over
R and C is a vector space over C): All the vector space laws are immediate conse-
quences of the corresponding field laws: Def. 5.1(i) holds as every field is a commu-
tative group with respect to addition; Def. 5.1(ii) follows from field distributivity
and multiplicative commutativity on F ; Def. 5.1(iii) is merely the multiplicative
associativity on F ; and Def. 5.1(iv) holds, since scalar multiplication coincides with
field multiplication on F .
(b) The reasoning in (a) actually shows that every field F is a vector space over every
subfield E ⊆ F . In particular, R is a vector space over Q.
(c) If A is any nonempty set, F is a field, and Y is a vector space over the field F ,
then we can make V := F(A, Y ) = Y A (the set of functions from A into Y ) into a
vector space over F by defining for each f, g : A −→ Y :
To verify that (V, +, ·) is, indeed, a vector space, we begin by checking Def. 5.1(i),
i.e. by showing (V, +) is a commutative group: Since (5.5a) is precisely the pointwise
where the validity of Def. 5.1(iii) in the vector space Y was used at (∗).
Finally, to prove Def. 5.1(iv), one computes
∀ f∈V ∀ x∈A : (1 · f)(x) = 1 · f(x) (∗)= f(x),
where the validity of Def. 5.1(iv) in the vector space Y was used at (∗).
(d) Let F be a field (e.g. F = R or F = C) and n ∈ N. For x = (x1, . . . , xn) ∈ F^n, y = (y1, . . . , yn) ∈ F^n, and λ ∈ F, define componentwise addition and scalar multiplication
x + y := (x1 + y1, . . . , xn + yn), (5.6a)
λ · x := (λx1, . . . , λxn). (5.6b)
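The componentwise operations and the laws of Def. 5.1 can be spot-checked on concrete tuples; a minimal sketch with F = Q via the standard-library Fraction type (function names ours):

```python
from fractions import Fraction

# Sketch: componentwise addition and scalar multiplication on F^n (F = Q),
# spot-checking several vector space laws of Def. 5.1.
def vadd(x, y):
    return tuple(xi + yi for xi, yi in zip(x, y))

def smul(lam, x):
    return tuple(lam * xi for xi in x)

x = (Fraction(1), Fraction(2), Fraction(3))
y = (Fraction(-1), Fraction(0), Fraction(5))
lam, mu = Fraction(2, 3), Fraction(-7)

assert vadd(x, y) == vadd(y, x)                                   # commutativity
assert smul(lam, vadd(x, y)) == vadd(smul(lam, x), smul(lam, y))  # distributivity
assert smul(lam * mu, x) == smul(lam, smul(mu, x))                # Def. 5.1(iii)
assert smul(Fraction(1), x) == x                                  # Def. 5.1(iv)
```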
Proposition 5.3. The following rules hold in each vector space V over the field F
(here, even though we will usually use the same symbol for both objects, we write 0 for
the additive neutral element in F and ~0 for the additive neutral element in V ):
Proof. Exercise.
Definition 5.4. Let (V, +, ·) be a vector space over the field F, ∅ ≠ U ⊆ V. We call U a subspace of V if, and only if, (U, +, ·) forms a vector space over F with respect to the operations + and · it inherits from V, i.e. with respect to +↾U×U and ·↾F×U.
Theorem 5.5. Let (V, +, ·) be a vector space over the field F, ∅ ≠ U ⊆ V. Then the following statements are equivalent:
(i) U is a subspace of V.
(ii) U is closed under vector addition and scalar multiplication:
∀ x,y∈U : x + y ∈ U, (5.7a)
∧ ∀ λ∈F ∀ x∈U : λx ∈ U. (5.7b)
(iii) ∀ λ,µ∈F ∀ x,y∈U : λx + µy ∈ U. (5.8)
Proof. From the definition of a vector space, it is immediate that (i) implies (ii) and
(iii).
“(iii)⇒(ii)”: One obtains (5.7a) from (5.8) by setting λ := µ := 1, and one obtains
(5.7b) from (5.8) by setting y := 0 (and µ arbitrary, e.g., µ := 1).
“(ii)⇒(i)”: Clearly, U inherits commutativity and distributivity as well as the validity of (5.3) and (5.4) from V. Moreover, if x ∈ U, then setting λ := −1 in (5.7b) shows −x ∈ U. This, together with (5.7a), proves (U, +) to be a subgroup of (V, +), thereby
completing the proof of (i).
Example 5.6. (a) As a consequence of Th. 5.5, if V is a vector space over the field F ,
then {0} ⊆ V is always a subspace of V (sometimes called the trivial or the zero
vector space over F ).
(b) Q is not a subspace of R if R is considered as a vector space over R (for example, √2 · 2 ∉ Q). However, Q is a subspace of R if R is considered as a vector space over Q.
(c) Consider the vector space V := R3 over R and let
U := {(x, y, z) ∈ V : 3x − y + 7z = 0}.
We use Th. 5.5 to show U is a subspace of V: Since (0, 0, 0) ∈ U, we have U ≠ ∅.
Moreover, if u1 = (x1 , y1 , z1 ) ∈ U , u2 = (x2 , y2 , z2 ) ∈ U , and λ, µ ∈ R, then
λu1 + µu2 = (λx1 + µx2 , λy1 + µy2 , λz1 + µz2 )
and
3(λx1 + µx2) − (λy1 + µy2) + 7(λz1 + µz2) = λ(3x1 − y1 + 7z1) + µ(3x2 − y2 + 7z2) = λ · 0 + µ · 0 = 0,
where u1, u2 ∈ U was used for the second equality, showing λu1 + µu2 ∈ U and proving U to be a subspace of V.
(d) It is customary to write K if K may stand for R or C, and, from now on, we will
occasionally use this notation. From Ex. 5.2(c), we know that, for each ∅ ≠ A,
F(A, K) constitutes a vector space over K. Thus, as a consequence of Th. 5.5, a
subset of F(A, K) is a vector space over K if, and only if, it is closed under addition
and scalar multiplication. By using results from Calculus, we obtain the following
examples:
(i) The set P(K) of all polynomials mapping from K into K is a vector space over
K by [Phi16, Rem. 6.4]; for each n ∈ N0 , the set Pn (K) of all such polynomials
of degree at most n is also a vector space over K by [Phi16, Rem. (6.4a),(6.4b)].
(ii) If ∅ ≠ M ⊆ C, then the set of continuous functions from M into K, i.e.
C(M, K), is a vector space over K by [Phi16, Th. 7.38].
(iii) If a, b ∈ R ∪ {−∞, ∞} and a < b, then the set of differentiable functions from
I :=]a, b[ into K is a vector space over K by [Phi16, Th. 9.7(a),(b)]. Moreover,
[Phi16, Th. 9.7(a),(b)] also implies that, for each k ∈ N, the set of k times differentiable functions from I into K is a vector space over K, and so is each set C^k(I, K) of k times continuously differentiable functions ([Phi16, Th. 7.38] is also needed for the last conclusion). The set C^∞(I, K) := ⋂_{k∈N} C^k(I, K) is also a vector space over K by Th. 5.7(a) below.
Theorem 5.7. Let V be a vector space over the field F.
(a) Let I ≠ ∅ be an index set and (Ui)i∈I a family of subspaces of V. Then the intersection U := ⋂_{i∈I} Ui is also a subspace of V.
(b) In contrast to intersections, unions of subspaces are almost never subspaces (however, cf. (c) below). More precisely, if U1 and U2 are subspaces of V, then
U1 ∪ U2 is a subspace of V ⇔ U1 ⊆ U2 ∨ U2 ⊆ U1. (5.9)
(c) While (b) shows that unions of subspaces are not necessarily subspaces, if I ≠ ∅ is an index set, partially ordered by ≤ in a way such that, for each i, j ∈ I, there exists k ∈ I with i, j ≤ k (if I is totally ordered by ≤, then one can use k := max{i, j}), and (Ui)i∈I is an increasing family of subspaces (i.e., for each i, j ∈ I with i ≤ j, one has Ui ⊆ Uj), then U := ⋃_{i∈I} Ui is a subspace as well.
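The failure described in (5.9) is already visible for the two coordinate axes in R^2; a minimal sketch (names ours):

```python
# Sketch: in V = R^2, the union of the axes U1 = R×{0} and U2 = {0}×R is
# not a subspace, illustrating (5.9): neither axis contains the other.
def in_U1(v): return v[1] == 0   # x-axis
def in_U2(v): return v[0] == 0   # y-axis

u1, u2 = (1.0, 0.0), (0.0, 1.0)
s = (u1[0] + u2[0], u1[1] + u2[1])   # u1 + u2 = (1, 1)

assert in_U1(u1) and in_U2(u2)
# u1 + u2 lies in neither axis, so U1 ∪ U2 is not closed under +
assert not (in_U1(s) or in_U2(s))
```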
While the union of two subspaces U1 , U2 is typically not a subspace, Th. 5.7(a) implies
there is always a smallest subspace containing U1 ∪ U2 and we will see below that this
subspace is always the so-called sum U1 + U2 of the two subspaces (this is a special case
of Prop. 5.11). As it turns out, the sum of vector spaces is closely connected with two
other essential notions of vector space theory, namely the notions linear combination
and span, which we define first:
(b) If A = ∅, then hAi = {0}; if A ≠ ∅, then hAi is the set of all linear combinations of elements from A, i.e.
hAi = { Σ_{i=1}^n λi ai : n ∈ N ∧ λ1, . . . , λn ∈ F ∧ a1, . . . , an ∈ A }. (5.13)
Definition 5.10. Let V be a vector space over the field F , let I be an index set and
let (Ui)i∈I be a family of subspaces of V. For I = ∅, define
Σ_{i∈I} Ui := {0}. (5.14a)
(n ∈ N ∧ W ⊆ U ∧ #W = n ∧ Σ_{u∈W} λu u = 0 ∧ ∀ u∈W : λu ∈ F) ⇒ ∀ u∈W : λu = 0. (5.16a)
Occasionally, one also wants to have the notion available for families of vectors
instead of sets, and one calls a family (ui )i∈I of vectors in V linearly independent
if, and only if,
(n ∈ N ∧ J ⊆ I ∧ #J = n ∧ Σ_{j∈J} λj uj = 0 ∧ ∀ j∈J : λj ∈ F) ⇒ ∀ j∈J : λj = 0. (5.16b)
Sets and families that are not linearly independent are called linearly dependent.
Example 5.13. Let V be a vector space over the field F .
(a) ∅ is linearly independent: Indeed, if U = ∅, then the left side of the implication in (5.16a) is always false (since W ⊆ U means #W = 0), i.e. the implication is true. Moreover, by Def. 5.12(a), v ∈ V is linearly dependent on ∅ if, and only if, v = 0 (this is also consistent with Σ_{u∈∅} λu u = 0, cf. (3.15d)).
(b) If 0 ∈ U ⊆ V , then U is linearly dependent (in particular, {0} is linearly dependent),
due to 1 · 0 = 0.
(c) If 0 ≠ v ∈ V and λ ∈ F with λv = 0, then λ = 0 by Prop. 5.3(d), showing {v} to be
linearly independent. However, the family (v, v) is always linearly dependent, since
1v + (−1)v = 0 (also for v = 0). Moreover, 1v = v also shows that every v ∈ V is
linearly dependent on itself.
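For finitely many vectors in F^n with F = Q, linear independence can be tested computationally: the vectors are linearly independent if, and only if, the matrix having them as rows has full row rank. A minimal sketch using exact Gaussian elimination (function names ours):

```python
from fractions import Fraction

# Sketch: row rank over Q via Gaussian elimination with exact arithmetic.
def rank(rows):
    rows = [list(map(Fraction, r)) for r in rows]
    rk, col, n = 0, 0, len(rows[0]) if rows else 0
    while rk < len(rows) and col < n:
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            col += 1
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk, col = rk + 1, col + 1
    return rk

def linearly_independent(vectors):
    return rank(vectors) == len(vectors)

assert linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
assert not linearly_independent([(1, 2), (2, 4)])   # (2,4) = 2·(1,2)
assert not linearly_independent([(1, 1), (1, 1)])   # the family (v, v)
```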
(d) The following example is of some general importance: Let I be a set and V = F^I (cf. Ex. 5.2(c),(d)). Define
∀ i∈I : ei : I −→ F, ei(j) := δij := { 1 if i = j, 0 if i ≠ j }. (5.17)
∀ k∈N0 : fk : R −→ K, fk(x) := e^{kx},
(a) U is linearly dependent if, and only if, there exists u0 ∈ U such that u0 is linearly
dependent on U \ {u0 }.
implying
0 = Σ_{i=1}^k (λi − µi) zi + Σ_{i=k+1}^{k+n} λi xi − Σ_{i=k+1}^{k+m} µi yi.
The linear independence of U then yields λ1 = µ1 , . . . , λk = µk as well as λk+1 =
· · · = λk+n = µk+1 = · · · = µk+m = 0. Thus, in particular, u ∈ hU1 ∩ U2 i, proving
V1 ∩ V2 ⊆ hU1 ∩ U2 i as desired.
Definition 5.15. Let V be a vector space over the field F and B ⊆ V . Then B is
called a basis of V if, and only if, B is a generating set for V (i.e. V = hBi) that is also
linearly independent.
Example 5.16. (a) Due to Ex. 5.13(a), in each vector space V over a field F , we have
that ∅ is linearly independent with h∅i = {0}, i.e. ∅ is a basis of the trivial space
{0}.
(b) If one considers a field F as a vector space over itself, then, clearly, every {λ} with
0 ≠ λ ∈ F is a basis.
(c) We continue Ex. 5.13(d), where we showed that, given a field F and a set I, B =
{ei : i ∈ I} with the ei defined by (5.17) constitutes a linearly independent subset
of the vector space F^I over F. We will now show that hBi = F^I_fin (i.e. B is a basis of F^I_fin), where F^I_fin denotes the set of functions f : I −→ F such that there exists a finite set If ⊆ I, satisfying
f(i) = 0 for each i ∈ I \ If, (5.22a)
f(i) ≠ 0 for each i ∈ If (5.22b)
(then F^I_fin = F^I if, and only if, I is finite (for example F^n_fin = F^n for n ∈ N); but, in general, F^I_fin is a strict subset of F^I). We first show that F^I_fin is always a subspace of F^I: Indeed, if f, g ∈ F^I_fin and λ ∈ F, then I_{λf} = I_f for λ ≠ 0, I_{λf} = ∅ for λ = 0, and I_{f+g} ⊆ I_f ∪ I_g, i.e. λf ∈ F^I_fin and f + g ∈ F^I_fin. We also note B ⊆ F^I_fin, since I_{ei} = {i} for each i ∈ I. To see hBi = F^I_fin, we merely note that
∀ f∈F^I_fin : f = Σ_{i∈If} f(i) ei. (5.23)
Thus, B is a basis of F^I_fin as claimed – B is often referred to as the standard basis of F^I_fin. In particular, we have shown that, for each n ∈ N, the set {ej : j = 1, . . . , n}, where e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1), forms a basis of F^n (of R^n if F = R and of C^n if F = C). If I = N, then F^I_fin is the space of all sequences in F that have only finitely many nonzero entries. We will see later that, actually, every vector space is isomorphic to some suitable F^I_fin (cf. Rem. 5.25 and Th. 6.9).
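In the finite case I = {1, . . . , n}, the standard basis and the expansion of (5.23) can be sketched directly (names ours, F = float for brevity):

```python
# Sketch: standard basis vectors e_1, ..., e_n of F^n as in (5.17); every
# x in F^n equals sum_i x_i e_i, i.e. (5.23) for finite I.
def e(i: int, n: int):
    return tuple(1.0 if j == i else 0.0 for j in range(n))

n = 4
basis = [e(i, n) for i in range(n)]
x = (3.0, -1.0, 0.0, 2.5)

# reconstruct x as the combination sum_i x_i * e_i
recon = tuple(sum(x[i] * basis[i][j] for i in range(n)) for j in range(n))
assert recon == x
assert e(0, 4) == (1.0, 0.0, 0.0, 0.0)
```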
(d) In Ex. 5.6(d)(i), we noted that the set P(K) of polynomials with coefficients in
K is a vector space over K. We show the set B := {ej : j ∈ N0} of monomials ej : K −→ K, ej(x) := x^j, to be a basis of P(K): While hBi = P(K) is immediate
from the definition of P(K), linear independence of B can be shown similarly to the
linear independence of the exponential functions in Ex. 5.13(e): As in that example,
we use the differentiability of the considered functions (here, the ej ) to show each
set Un := {ej : j ∈ {0, . . . , n}}, n ∈ N0 , is linearly independent, using an induction
on n: Since, for n = 0, e0 ≡ 1, the base case holds. Now let n ≥ 1 and assume λ0, . . . , λn ∈ K are such that
∀ x∈R : Σ_{j=0}^n λj x^j = 0. (5.24)
Theorem 5.17. Let V be a vector space over the field F and B ⊆ V . Then the following
statements (i) – (iii) are equivalent:
(i) B is a basis of V .
(ii) B is a maximal linearly independent subset of V, i.e. B is linearly independent and each set A ⊆ V with B ⊊ A is linearly dependent.
(iii) B is a minimal generating set for V, i.e. hBi = V and hAi ⊊ V for each A ⊊ B.
showing hŨ i = V .
(b): Suppose µ0 ∈ F and µu ∈ F , u ∈ U \ {u0 }, are such that
0 = µ0 v0 + Σ_{u∈U\{u0}} µu u = µ0 λu0 u0 + Σ_{u∈U\{u0}} (µ0 λu + µu) u, where (5.27) was used for the second equality.
Note that, for v = 0, one has Bv = ∅, c is the empty map, and (5.28) becomes 0 = Σ_{b∈∅} c(b) b.
Proof. The existence of Bv and the map c follows from the fact that the basis B is a
generating set, hBi = V . For the uniqueness proof, consider finite sets Bv , B̃v ⊆ B and
maps c : Bv −→ F \ {0}, c̃ : B̃v −→ F \ {0} such that
v = Σ_{b∈Bv} c(b) b = Σ_{b∈B̃v} c̃(b) b. (5.29)
Extend both c and c̃ to A := Bv ∪ B̃v by letting c(b) := 0 for b ∈ B̃v \ Bv and c̃(b) := 0 for b ∈ Bv \ B̃v. Then
0 = Σ_{b∈A} (c(b) − c̃(b)) b, (5.30)
such that the linear independence of A implies c(b) = c̃(b) for each b ∈ A, which, in
turn, implies Bv = B̃v and c = c̃.
Remark 5.20. If the basis B of V has finitely many elements, then one often enumerates
the elements B = {b1 , . . . , bn }, n = #B ∈ N, and writes λi := c(bi ) for bi ∈ Bv , λi := 0
for bi ∉ Bv, such that (5.28) takes the, perhaps more familiar looking, form
v = Σ_{i=1}^n λi bi. (5.31)
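For a concrete basis, the coordinates in this expansion can be computed and their uniqueness observed; a minimal sketch for the (assumed) basis b1 = (1, 1), b2 = (1, −1) of Q^2, using the explicit 2×2 solution formula (names ours):

```python
from fractions import Fraction

# Sketch: coordinates with respect to a basis are unique (Th. 5.19); solve
# v = λ1·b1 + λ2·b2 for the basis b1 = (1,1), b2 = (1,-1) of Q^2 by Cramer's rule.
b1, b2 = (Fraction(1), Fraction(1)), (Fraction(1), Fraction(-1))

def coordinates(v):
    det = b1[0] * b2[1] - b2[0] * b1[1]     # det = -2 != 0, so b1, b2 is a basis
    lam1 = (v[0] * b2[1] - b2[0] * v[1]) / det
    lam2 = (b1[0] * v[1] - v[0] * b1[1]) / det
    return lam1, lam2

v = (Fraction(5), Fraction(1))
lam1, lam2 = coordinates(v)
# check the expansion v = λ1·b1 + λ2·b2 of (5.31)
assert (lam1 * b1[0] + lam2 * b2[0], lam1 * b1[1] + lam2 * b2[1]) == v
assert (lam1, lam2) == (Fraction(3), Fraction(2))
```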
In Ex. 5.16(c), we have seen that the vector space F n , n ∈ N, over F has the finite
basis {e1, . . . , en} and, more generally, that, for each set I, B = {ei : i ∈ I} is a basis for the vector space F^I_fin over F. Since, clearly, B and I have the same cardinality (via
the bijective map i 7→ ei ), this shows that, for each set I, there exists a vector space,
whose basis has the same cardinality as I. In Th. 5.23 below, we will show one of the
fundamental results of vector space theory, namely that every vector space has a basis
and that two bases of the same vector space always have the same cardinality (which is
referred to as the dimension of the vector space, cf. Def. 5.24).
As with some previous results of this class, the proof of Th. 5.23 makes use of the axiom of choice (AC) of Appendix A.4, which postulates, for each nonempty set M, whose elements are all nonempty sets, the existence of a choice function, that means a function that assigns, to each M ∈ M, an element m ∈ M. Somewhat surprisingly, AC cannot
be proved (or disproved) from ZF, i.e. from the remaining standard axioms of set theory
(see Appendix A.4 for a reference). Previously, when AC was used, choice functions were
employed directly. However, in the proof of Th. 5.23, AC enters in the form of Zorn’s
lemma. While Zorn’s lemma turns out to be equivalent to AC, the equivalence is not
obvious (see Th. A.52(iii)). However, Zorn’s lemma provides an important technique
Definition 5.21 (Same as Part of Appendix Def. A.50). Let X be a set and let ≤ be
a partial order on X.
(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists no x ∈ X such that m < x (note that a maximal element does not have to be a maximum and that a maximal element is not necessarily unique).
(b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by
≤.
Theorem 5.22 (Zorn’s Lemma). Let X be a nonempty partially ordered set. If every
chain C ⊆ X (i.e. every nonempty totally ordered subset of X) has an upper bound in
X (such chains with upper bounds are sometimes called inductive), then X contains a
maximal element (cf. Def. 5.21(a)).
Proof. According to Th. A.52(iii), Zorn’s lemma is equivalent to the axiom of choice.
(c) Bases of V have a unique cardinality, i.e. if B ⊆ V and B̃ ⊆ V are both bases of
V , then there exists a bijective map φ : B −→ B̃. In particular, if #B = n ∈ N0 ,
then #B̃ = n.
Proof. If V is generated by finitely many vectors, then the proof does not need Zorn’s
lemma and, thus, we treat this case first: If V = {0}, then (a) – (d) clearly hold. Thus,
we also assume V 6= {0}. To prove (d), assume B = {b1 , . . . , bn } is a basis of V with
is totally ordered by ⊆, by Th. 3.25(a), there exists i0 ∈ {1, . . . , n} such that Ci0 = max D,
i.e. Ci ⊆ Ci0 for each i ∈ {1, . . . , n}, implying ci ∈ Ci0 for each i ∈ {1, . . . , n} as well.
As Ci0 is linearly independent, λ1 = · · · = λn = 0, showing C0 is linearly independent,
as desired. Having, thus, verified all hypotheses of Zorn’s lemma, Th. 5.22 provides a
maximal element B ∈ M. By Th. 5.17(ii), B is a basis of V . As U ⊆ B also holds, this
proves (a) and, in particular, (b).
To prove (c), let B1 , B2 be bases of V . As we have shown above that (c) holds, provided
B1 or B2 is finite, we now assume B1 and B2 both to be infinite. We show the existence
of an injective map φ1 : B2 −→ B1 : To this end, for each v ∈ B1 , let B2,v be the finite
subset of B2 given by Th. 5.19, consisting of all b ∈ B2 such that the coordinate of v
with respect to b is nonzero. Let ψv : B2,v −→ {1, . . . , #B2,v } be bijective. Also define
E := ⋃_{v∈B1} B2,v.
Then, since V = hB1 i, we also have V = hEi. Thus, as E ⊆ B2 and the basis B2 is
a minimal generating set, we obtain E = B2 . In particular, for each b ∈ B2 , we may
choose some v(b) ∈ B1 such that b ∈ B2,v(b) . The map
f : B2 −→ B1 × N, f(b) := (v(b), ψv(b)(b)),
is then, clearly, well-defined and injective. Moreover, Th. A.57 of the Appendix provides
us with a bijective map φB1 : B1 × N −→ B1 . In consequence, φ1 : B2 −→ B1 ,
φ1 := φB1 ◦ f is injective as well. Since B1 , B2 were arbitrary bases, we can interchange
the roles of B1 and B2 to also obtain an injective map φ2 : B1 −→ B2 . According to
the Schröder-Bernstein Th. 3.12, there then also exists a bijective map φ : B1 −→ B2 ,
completing the proof of (c).
To prove (d), let B be a basis of V and let U ⊆ V be linearly independent. Analogously
to the proof of (a), apply Zorn’s lemma, but this time with the set
such that the maximal element B̃ ∈ M, obtained from Th. 5.22, has the form B̃ = U ∪˙ C
with C ⊆ B. We claim B̃ to be a basis of V : Linear independence is clear, since B̃ ∈ M,
and it only remains to check hB̃i = V . Due to the maximality of B̃, if b ∈ B, there exist
a finite set B̃b ⊆ B̃ and λbu ∈ F , u ∈ B̃b , such that
b = Σ_{u∈B̃b} λbu u. (5.35)
λ1 , . . . , λn ∈ F , n ∈ N, such that
v = Σ_{i=1}^n λi bi = Σ_{i=1}^n Σ_{u∈B̃bi} λi λbi u u, where (5.35) was used for the second equality,
Definition 5.24. According to Th. 5.23, for each vector space V over a field F , the
cardinality of its bases is unique. This unique cardinality is called the dimension of V and
is denoted dim V . If dim V < ∞ (i.e. dim V ∈ N0 ), then V is called finite-dimensional,
otherwise infinite-dimensional.
Remark 5.25. Let F be a field. In Ex. 5.16(c), we saw that B = {ei : i ∈ I} with
∀ i∈I : ei : I −→ F, ei(j) = δij := { 1 if i = j, 0 if i ≠ j },
is a basis of F^I_fin. Since #B = #I, we have shown
dim F^I_fin = #I, (5.36)
Remark 5.26. Bases of vector spaces are especially useful if they are much smaller than
the vector space itself, as in the case of finite-dimensional vector spaces over infinite
fields, Rn and Cn being among the most important examples. For infinite-dimensional
vector spaces V , one often has dim V = #V (cf. Th. E.1 and Cor. E.2 in the Appendix)
such that vector space bases are not as helpful in such situations. On the other hand,
infinite-dimensional vector spaces might come with some additional structure, providing
a more useful notion of basis (e.g., in infinite-dimensional Hilbert spaces, one typically
uses so-called orthonormal bases rather than vector space bases).
—
In the rest of the section, we investigate relations between bases and subspaces; and we
introduce the related notions direct complement and direct sum.
Theorem 5.27. Let V be a vector space over the field F and let U be a subspace of V .
(b) dim U ≤ dim V , i.e. if BU is a basis of U and B is a basis of V , then there exists
an injective map φ : BU −→ B.
(c) If dim V < ∞, then: dim U = dim V ⇔ U = V.
Definition 5.28. Let V be a vector space over the field F and let U, W be subspaces
of V . Then W is called a direct complement of U if, and only if, V = U + W and
U ∩ W = {0}. In that case, we also write V = U ⊕ W and we say that V is the direct
sum of U and W .
—
Proposition 5.29. Let V be a vector space over the field F and let U, W be subspaces of V such that V = U ⊕ W. If there exist 0 ≠ u ∈ U and 0 ≠ w ∈ W, then there exists a subspace W̃ ≠ W of V with V = U ⊕ W̃.
(a) U + W = hBU ∪ BW i.
(b) If U ∩ W = {0}, then BU ∪˙ BW is a basis of U ⊕ W .
(c) It holds that
dim(U + W ) + dim(U ∩ W ) = dim U + dim W, (5.38)
by which we mean that there exists a bijective map
φ : (BU × {0}) ∪˙ (BW × {1}) −→ (B+ × {0}) ∪˙ (B∩ × {1}).
Proof. (a): Let B := BU ∪BW . If v = u+w with u ∈ U , w ∈ W , then there exist distinct
u1 , . . . , un ∈ BU , distinct w1 , . . . , wm ∈ BW (m, n ∈ N), and λ1 , . . . , λn , µ1 , . . . , µm ∈ F
such that
v = u + w = Σ_{i=1}^n λi ui + Σ_{i=1}^m µi wi, (5.39)
6 LINEAR MAPS 112
showing U + W = hBi.
(b): Let B := BU ∪ BW . We know U ⊕ W = hBi from (a). Now consider (5.39)
with u + w = 0. Then u = −w ∈ U ∩ W = {0}, i.e. u = 0. Then w = 0 as well and
0 = λ1 = · · · = λn = µ1 = · · · = µm , due to the linear independence of the u1 , . . . , un and
the linear independence of the w1 , . . . , wm . In consequence, B is linearly independent
as well and, thus, a basis of U ⊕ W .
(c): According to Th. 5.27(a), we may choose BU and BW such that B∩ ⊆ BU and
B∩ ⊆ BW . Let B0 := BU \ B∩ , U0 := hB0 i, B1 := BW \ B∩ , U1 := hB1 i. Then
B := BU ∪ BW = B0 ∪˙ B1 ∪˙ B∩
is linearly independent: Consider, once again, the situation of the proof of (b), with
v = 0 and, this time, with u1, . . . , un ∈ B0. Again, we obtain u = −w ∈ W, i.e.
u ∈ U0 ∩ (U ∩ W) = hB0i ∩ hB∩i = hB0 ∩ B∩i = {0},
where Prop. 5.14(d) was used for the second equality.
Thus, λ1 = · · · = λn = 0, due to the linear independence of B0. This then implies
w = 0 and 0 = µ1 = · · · = µm due to the linear independence of BW , yielding the
claimed linear independence of B. According to (a), we also have U + W = hBi, i.e. B
is a basis of U + W . Now we are in a position to define
φ : (BU × {0}) ∪˙ (BW × {1}) −→ (B × {0}) ∪˙ (B∩ × {1}),
φ(b, α) := (b, 0) for b ∈ BU and α = 0,
φ(b, α) := (b, 0) for b ∈ B1 and α = 1,
φ(b, α) := (b, 1) for b ∈ B∩ and α = 1,
(d): As B = BU ∪˙ BW is a basis of V , we can write each v ∈ V as v = u + w with u ∈ U
and w ∈ W , showing V = U + W . On the other hand,
U ∩ W = hBUi ∩ hBWi = hBU ∩ BWi = {0},
where Prop. 5.14(d) was used for the second equality,
proving V = U ⊕ W .
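The dimension formula (5.38) can be checked on a concrete example in Q^3 where the intersection is known; a minimal sketch computing dimensions as row ranks via exact Gaussian elimination (names ours):

```python
from fractions import Fraction

# Sketch: verify (5.38) for U = span{(1,0,0),(0,1,0)}, W = span{(0,1,0),(0,0,1)}
# in Q^3; for this example U ∩ W is spanned by (0,1,0), which we take as known.
def rank(rows):
    rows = [list(map(Fraction, r)) for r in rows]
    rk, col, n = 0, 0, len(rows[0])
    while rk < len(rows) and col < n:
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            col += 1
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk, col = rk + 1, col + 1
    return rk

B_U = [(1, 0, 0), (0, 1, 0)]
B_W = [(0, 1, 0), (0, 0, 1)]
B_cap = [(0, 1, 0)]               # basis of U ∩ W for this example

dim_sum = rank(B_U + B_W)         # dim(U + W), rank of the stacked spans
assert dim_sum + rank(B_cap) == rank(B_U) + rank(B_W)   # (5.38): 3 + 1 = 2 + 2
```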
6 Linear Maps
4.10 and Def. 4.34(b). In the context of vector spaces, the homomorphisms are so-called
linear maps, which we proceed to define and study next.
Definition 6.1. Let V and W be vector spaces over the field F . A map A : V −→ W
is called vector space homomorphism or F -linear (or merely linear if the field F is
understood) if, and only if,
(note that, in general, vector addition and scalar multiplication will be different on the
left-hand sides and right-hand sides of the above equations). Thus, A is linear if, and
only if, it is a group homomorphism with respect to addition and also satisfies (6.1b).
In particular, if A is linear, then ker A and Im A are defined by Def. 4.19, i.e.
We denote the set of all F -linear maps from V into W by L(V, W ). The notions
vector space (or linear) monomorphism, epimorphism, isomorphism, endomorphism,
automorphism are then defined as in Def. 4.10. Moreover, V and W are called isomorphic
(denoted V ≅ W) if, and only if, there exists a vector space isomorphism A : V −→ W.
—
Before providing examples of linear maps in Ex. 6.7 below, we first provide and study
some basic properties of such maps.
Notation 6.2. For the composition of linear maps A and B, we often write BA instead
of B ◦ A. For the application of a linear map, we also often write Av instead of A(v).
Proof. Exercise.
Proposition 6.5. Let V and W be vector spaces over the field F , and let A : V −→ W
be linear.
(a) A is injective if, and only if, for each linearly independent subset S of V , A(S) is
a linearly independent subset of W .
(b) A is surjective if, and only if, for each generating subset S of V , A(S) is a generating
subset of W .
(c) A is bijective if, and only if, for each basis B of V , A(B) is a basis of W .
Proof. Exercise.
Theorem 6.6. Let V and W be vector spaces over the field F . Then each linear map
A : V −→ W is uniquely determined by its values on a basis of V . More precisely, if B is
a basis of V , (wb )b∈B is a family in W , and, for each v ∈ V , Bv and cv : Bv −→ F \ {0}
are as in Th. 5.19 (we now write cv instead of c to underline the dependence of c on v),
then the map
A : V −→ W, A(v) = A(Σ_{b∈Bv} cv(b) b) := Σ_{b∈Bv} cv(b) wb, (6.4)
∀ b∈B : Ã(b) = wb (6.5)
implies A = Ã.
If v ∈ V and Bv and cv are as before, then the linearity of à and (6.5) imply
Ã(v) = Ã(Σ_{b∈Bv} cv(b) b) = Σ_{b∈Bv} cv(b) Ã(b) = Σ_{b∈Bv} cv(b) wb = A(v), (6.7)
where à ∈ L(V, W) was used for the second equality.
Then, by Th. 5.19, there exists a unique family (aji)(j,i)∈{1,...,m}×{1,...,n} in F such that
∀ i∈{1,...,n} : wi = A(bi) = Σ_{j=1}^m aji cj. (6.8)
Thus, given v = Σ_{i=1}^n λi bi ∈ V, λ1, . . . , λn ∈ F, we find
A(v) = A(Σ_{i=1}^n λi bi) = Σ_{i=1}^n λi wi = Σ_{i=1}^n λi Σ_{j=1}^m aji cj = Σ_{j=1}^m (Σ_{i=1}^n aji λi) cj
= Σ_{j=1}^m µj cj, where µj := Σ_{i=1}^n aji λi. (6.9)
Thus, from Th. 6.6, we conclude that the map, assigning to each vector v = Σ_{i=1}^n λi bi ∈ V the vector Σ_{j=1}^m µj cj with µj := Σ_{i=1}^n aji λi, is precisely the unique linear map A : V −→ W, satisfying (6.8).
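The coordinate formula µj = Σ_i aji λi is exactly a matrix-vector product; a minimal sketch with a hypothetical 2×3 coefficient family (aji) over Q (names ours):

```python
from fractions import Fraction

# Sketch: (6.9) in coordinates -- applying the m x n matrix (a_ji) to the
# coordinate vector (λ_1, ..., λ_n) yields the coordinates (µ_1, ..., µ_m).
def apply_matrix(a, lam):
    m, n = len(a), len(a[0])
    return [sum(a[j][i] * lam[i] for i in range(n)) for j in range(m)]

# hypothetical coefficients a_ji for illustration only
a = [[Fraction(1), Fraction(0), Fraction(2)],
     [Fraction(3), Fraction(-1), Fraction(0)]]
lam = [Fraction(1), Fraction(2), Fraction(-1)]

mu = apply_matrix(a, lam)
assert mu == [Fraction(-1), Fraction(1)]   # µ1 = 1-2, µ2 = 3-2
```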
(b) Let V be a vector space over the field F , let B be a basis of V . Given v ∈ V , let
Bv and cv : Bv −→ F \ {0} be as in Th. 5.19. For each c ∈ B, define
πc : V −→ F, πc(v) := { cv(c) for c ∈ Bv, 0 for c ∉ Bv },
calling πc the projection onto the coordinate with respect to c. Comparing with
(6.4), we see that πc is precisely the linear map A : V −→ W := F , determined by
the family (wb )b∈B in F with
∀ b∈B : wb := δbc := { 1 for b = c, 0 for b ≠ c }, (6.10)
as this yields
∀ v∈V : A(v) = A(Σ_{b∈Bv} cv(b) b) = Σ_{b∈Bv} cv(b) wb = Σ_{b∈Bv} cv(b) δbc = { cv(c) for c ∈ Bv, 0 for c ∉ Bv }.
(c) Let I be a nonempty set and let Y be a vector space over the field F . We then
know from Ex. 5.2(c) that V := F(I, Y ) = Y I is a vector space over F . For each
i ∈ I, define
πi : V −→ Y, πi (f ) := f (i),
calling πi the projection onto the ith coordinate (note that, for I = {1, . . . , n},
n ∈ N, Y = F , one has V = F n , πi (v1 , . . . , vn ) = vi ). We verify πi to be linear: Let
λ ∈ F and f, g ∈ V . Then
πi (λf ) = (λf )(i) = λ(f (i)) = λπi (f ),
πi (f + g) = (f + g)(i) = f (i) + g(i) = πi (f ) + πi (g),
proving πi to be linear. Moreover, for each i ∈ I, we have
Im πi = Y, ker πi = {(f : I −→ Y) : f(i) = 0} ≅ Y^{I\{i}}, V = Y ⊕ ker πi, (6.12)
where, for the last equality, we identified
Y ≅ {(f : I −→ Y) : f(j) = 0 for each j ≠ i}.
A generalization of the present example to general Cartesian products of vector
spaces can be found in Ex. E.4 of the Appendix. To investigate the relation between
the present projections and the projections of (b), let Y = F . We know from Ex.
5.16(c) that B = {ei : i ∈ I} with
∀ i∈I : ei : I −→ F, ei(j) = δij := { 1 if i = j, 0 if i ≠ j },
is a basis of F^I_fin, which is a subspace of F^I, such that the πi are defined on F^I_fin. Now, if f = Σ_{j=1}^n λj e_{ij} with λ1, . . . , λn ∈ F; i1, . . . , in ∈ I; n ∈ N, and πe_{ik} is defined as in (b) (k ∈ {1, . . . , n}), then
π_{ik}(f) = f(ik) = Σ_{j=1}^n λj e_{ij}(ik) = Σ_{j=1}^n λj δjk = λk = πe_{ik}(f), (6.13)
(d) We consider the set of complex numbers C as vector space over R with basis {1, i}.
Then the map of complex conjugation
A : C −→ C, A(z) = A(x + iy) = z̄ = x − iy,
is R-linear, since, for each z = x + iy, w = u + iv ∈ C,
A(z + w) = x + u − i(y + v) = z̄ + w̄ ∧ A(zw) = xu − yv − (xv + yu)i = (x − iy)(u − iv) = z̄ w̄,
and z̄ = z for z ∈ R (so A(λz) = λ̄ z̄ = λ z̄ for real λ). As A is bijective with A = A−1, A is even an automorphism. However, complex conjugation is not C-linear (now considering C as a vector space over itself), since, e.g., A(i · 1) = −i ≠ i · A(1) = i · 1̄ = i.
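These properties of conjugation can be spot-checked with Python's built-in complex type (function name ours):

```python
# Sketch: complex conjugation is additive and R-homogeneous (R-linear),
# but fails C-homogeneity: conj(i·z) ≠ i·conj(z) in general.
def conj(z: complex) -> complex:
    return z.conjugate()

z, w = 3 + 4j, -1 + 2j
lam = 2.5  # a real scalar

assert conj(z + w) == conj(z) + conj(w)    # additive
assert conj(lam * z) == lam * conj(z)      # homogeneous over R
assert conj(1j * 1) != 1j * conj(1)        # not C-linear: -i ≠ i
```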
(e) By using results from Calculus, we obtain the following examples:
(i) Let V be the set of convergent sequences in K. According to [Phi16, Th.
7.13(a)], scalar multiples of convergent sequences are convergent and sums of
convergent sequences are convergent, showing V to be a subspace of KN , the
vector space over K of all sequences in K. If (zn )n∈N and (wn )n∈N are elements
of V and λ ∈ K, then, using [Phi16, Th. 7.13(a)] once again, we know
lim_{n→∞}(λzn) = λ lim_{n→∞} zn, lim_{n→∞}(zn + wn) = lim_{n→∞} zn + lim_{n→∞} wn,
(iii) Let a, b ∈ R, a ≤ b, I := [a, b], and let W := R(I, K) be the set of all K-valued
Riemann integrable functions on I. According to [Phi16, Th. 10.11(a)], scalar
multiples of Riemann integrable functions are Riemann integrable and sums
of Riemann integrable functions are Riemann integrable, showing W to be
another subspace of KI . Using [Phi16, Th. 10.11(a)] again, we know the map
B : W −→ K, B(f) := ∫_I f, (6.17)
For some of the linear maps investigated in Ex. 6.7 above, we have provided the re-
spective kernels, images, and direct complements of the kernels. The examples suggest
relations between the dimensions of domain, image, and kernel of a linear map, which
are stated and proved in the following theorem:
Theorem 6.8 (Dimension Formulas). Let V and W be vector spaces over the field F ,
and let A : V −→ W be linear. Moreover, let BV be a basis of V , let Bker be a basis of
ker A, let BIm be a basis of Im A, and let BW be a basis of W .
(b) dim Im A ≤ dim V , i.e. there exists an injective map from BIm into BV .
(c) dim Im A ≤ dim W , i.e. there exists an injective map from BIm into BW .
The following theorem is one of the central results of Linear Algebra: It says that vector
spaces are essentially determined by the size of their bases:
Theorem 6.9. Let V and W be vector spaces over the field F. Then V ≅ W (i.e. V and W are isomorphic) if, and only if, dim V = dim W.
(i) A is an isomorphism.
(ii) A is an epimorphism.
(iii) A is a monomorphism.
Example 6.11. The present example shows that the analogue of Th. 6.10 does not hold
for infinite-dimensional vector spaces: Let F be a field and V := F^N (the vector space of sequences in F – the example actually still works in precisely the same way over the simpler space F^N_fin). Define the maps
R : V −→ V, R(λ1, λ2, . . . ) := (0, λ1, λ2, . . . ),
L : V −→ V, L(λ1, λ2, . . . ) := (λ2, λ3, . . . )
(R is called the right shift operator, L is called the left shift operator). Clearly, R and L
are linear and
L ◦ R = Id,
i.e. L is a left inverse for R, R is a right inverse for L. Thus, according to Th. 2.13,
R is injective and L is surjective (which is also easily seen directly from the respective
definitions of R and L). However,
Im R = {f : N −→ F : f (1) = 0} ≠ V (e.g. (1, 0, 0, . . . ) ∉ Im R),
ker L = {f : N −→ F : f (k) = 0 for k ≠ 1} ≠ {0} (e.g. 0 ≠ (1, 0, 0, . . . ) ∈ ker L),
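The behavior of the two shift operators can be tried out concretely. The sketch below (not from the notes) models a sequence by the finite tuple of its first terms, with 0 standing for the zero of F; this suffices to watch L ∘ R = Id succeed while R ∘ L loses the first term.

```python
# Minimal model (assumption: a tuple of leading terms stands in for a sequence).

def R(seq):
    """Right shift: (l1, l2, ...) -> (0, l1, l2, ...)."""
    return (0,) + seq

def L(seq):
    """Left shift: (l1, l2, ...) -> (l2, l3, ...)."""
    return seq[1:]

s = (1, 2, 3)
# L o R = Id: prepending 0 and then dropping it recovers the sequence.
assert L(R(s)) == s
# R o L != Id: the first term is lost and replaced by 0.
assert R(L(s)) == (0, 2, 3) != s
```

This mirrors the text: R is injective but not surjective, L is surjective but not injective, which cannot happen in finite dimension by Th. 6.10.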
V , then we can always form the quotient group V /U . We will see below that we can
even make V /U into a vector space over F , called the quotient space or factor space of
V with respect to U . As we write (V, +) as an additive group, we write the respective
cosets (i.e. the elements of V /U ) as v + U , v ∈ V .
Theorem 6.12. Let V be a vector space over the field F and let U be a subspace of V .
+ : V /U × V /U −→ V /U, (v + U ) + (w + U ) := v + w + U, (6.21a)
· : F × V /U −→ V /U, λ · (v + U ) := λv + U, (6.21b)
are well-defined, i.e. the results do not depend on the chosen representatives of the
respective cosets.
(b) The natural (group) epimorphism of Th. 4.27(a),
satisfies
(c) V /U with the compositions of (a) forms a vector space over F and φU of (b)
constitutes a linear epimorphism.
Proof. (a): The composition + is well-defined by Th. 4.27(a). To verify that · is well-
defined as well, let v, w ∈ V , λ ∈ F , and assume v + U = w + U . We need to show
λv + U = λw + U . If x ∈ λv + U , then there exists u1 ∈ U such that x = λv + u1 . Since
v + U = w + U , there then exists u2 ∈ U such that v + u1 = w + u2 and v = w + u2 − u1 .
Thus, x = λv+u1 = λw+λ(u2 −u1 )+u1 . Since U is a subspace of V , λ(u2 −u1 )+u1 ∈ U ,
showing x ∈ λw + U and λv + U ⊆ λw + U . As we can switch the roles of v and w in
the above argument, we also have λw + U ⊆ λv + U and λv + U = λw + U , as desired.
(b): (6.23a) holds, as φU is a homomorphism with respect to + by Th. 4.27(a). To
verify (6.23b), let λ ∈ F , v ∈ V . Then

φU (λv) = λv + U = λ(v + U ) = λ φU (v),

where the second equality holds by (6.21b).
One can sometimes obtain information about a vector space V by studying a subspace
U and the corresponding quotient space V /U , both of which are “smaller” spaces in the
sense of the following Cor. 6.13:
Corollary 6.13. Let V be a vector space over the field F and let U be a subspace of V .
Let φ : V −→ V /U , φ(v) := v + U , denote the natural epimorphism.
From Def. 4.26 and Th. 4.27(a), we already know that the elements of V /U are precisely
the equivalence classes of the equivalence relation defined by
∀ v, w ∈ V :   v ∼ w :⇔ v + U = w + U ⇔ v − w ∈ U . (6.24)
Here, in the context of vector spaces, it turns out that the equivalence relations on V
that define quotient spaces are precisely the so-called linear equivalence relations:
Definition 6.14. Let V be a vector space over the field F and let R ⊆ V × V be a
relation on V . Then R is called linear if, and only if, it has the following two properties:
Proof. (a): The linearity of ∼ defined by (6.24) is precisely the assertion that the
compositions + and · of Th. 6.12(a) are well-defined.
(b): We have 0 ∈ U , as ∼ is reflexive. If v, w ∈ U and λ ∈ F , then the linearity of ∼
implies v + w ∼ 0 + 0 = 0 and λv ∼ λ0 = 0, i.e. v + w ∈ U and λv ∈ U , showing U to be
a subspace. Now suppose v, w ∈ V with v ∼ w. Then the linearity of ∼ yields v −w ∼ 0,
i.e. v − w ∈ U . If x = v + u with u ∈ U , then x = w + (x − w) = w + (v + u − w) ∈ w + U ,
showing v + U ⊆ w + U . As ∼ is symmetric, we then have w + U ⊆ v + U as well.
Conversely, if v, w ∈ V with v + U = w + U , then v − w ∈ U , i.e. v − w ∼ 0 and the
linearity of ∼ implies v ∼ w, proving ∼ satisfies (6.24).
In Th. 4.27(b), we proved the isomorphism theorem for groups: If G and H are groups
and φ : G −→ H is a homomorphism, then G/ ker φ ∼= Im φ. Since vector spaces V and
W are, in particular, groups, if A : V −→ W is linear, then V / ker A ∼= Im A as groups,
and it is natural to ask whether V / ker A and Im A are isomorphic as vector spaces. In
Th. 6.16(a) below, we see this, indeed, to be the case.
Theorem 6.16 (Isomorphism Theorem). Let V and X be vector spaces over the field
F , let U, W be subspaces of V .
V / ker A ∼= Im A. (6.25)

(V /U )/(W/U ) ∼= V /W. (6.29)
Proof. (a): All assertions, except the linearity assertions, were already proved in Th.
4.27(b). Moreover, f̃ is linear by Th. 6.12(b). Thus, it merely remains to show

f (λ(v + ker A)) = λ f (v + ker A)

for each λ ∈ F and each v ∈ V (then fm is linear, as both f and ι are linear). Indeed,
if λ ∈ F and v ∈ V , then

f (λ(v + ker A)) = f (λv + ker A) = A(λv) = λ A(v) = λ f (v + ker A),
as desired.
(b): According to (a), it suffices to show that
A : U + W −→ U/(U ∩ W ), A(u + w) := u + (U ∩ W ) for u ∈ U , w ∈ W (6.30)
well-defines a linear map with ker A = W . To verify that A is well-defined, let u, u1 ∈ U
and w, w1 ∈ W with u + w = u1 + w1 . Then u − u1 = w1 − w ∈ U ∩ W and, thus,
A(u + w) = u + (U ∩ W ) = u1 + w1 − w + (U ∩ W ) = u1 + (U ∩ W ) = A(u1 + w1 ),
proving A to be well-defined by (6.30). To prove linearity, we no longer assume u + w =
u1 + w1 and calculate
A((u + w) + (u1 + w1 )) = A((u + u1 ) + (w + w1 )) = u + u1 + (U ∩ W )
                        = (u + (U ∩ W )) + (u1 + (U ∩ W ))
                        = A(u + w) + A(u1 + w1 ),
as well as, for each λ ∈ F ,
A(λ(u + w)) = A(λu + λw) = λu + (U ∩ W ) = λ(u + (U ∩ W )) = λ A(u + w),
as needed. If w ∈ W , then A(w) = A(0+w) = U ∩W , showing w ∈ ker A and W ⊆ ker A.
Conversely, let u + w ∈ ker A, u ∈ U , w ∈ W . Then A(u + w) = u + (U ∩ W ) = U ∩ W ,
i.e. u ∈ U ∩ W and u + w ∈ W , showing ker A ⊆ W .
(c): Exercise.
Theorem 6.18. Let V and W be vector spaces over the field F . The addition and
scalar multiplication on L(V, W ) given by (6.31) are well-defined in the sense that, if
A, B ∈ L(V, W ) and λ ∈ F , then A + B ∈ L(V, W ) and λA ∈ L(V, W ). Moreover, with
the operations defined in (6.31), L(V, W ) forms a vector space over F .
Proof. Exercise.
Theorem 6.19. Let V and W be vector spaces over the field F , let BV and BW be bases
of V and W , respectively. Using Th. 6.6, define maps Avw ∈ L(V, W ) by letting
∀ (w, v, ṽ) ∈ BW × BV × BV :   Avw (ṽ) := { w for v = ṽ,  0 for v ≠ ṽ }. (6.32)
Define
B := {Avw : (v, w) ∈ BV × BW }. (6.33)
(c) If dim V = ∞ and dim W ≥ 1, then hBi ⊊ L(V, W ) and, in particular, B is not a
basis of L(V, W ).
implies λ1k = · · · = λM k = 0 due to the linear independence of the wj . As this holds for
each k ∈ {1, . . . , N }, we have established the linear independence of B.
(b): According to (a), it remains to show hBi = L(V, W ). Let A ∈ L(V, W ) and
i ∈ {1, . . . , n}. Then there exists a finite set Bi ⊆ BW such that Avi = Σ_{w∈Bi} λw w with
Letting L := Σ_{j=1}^{M} Σ_{i=1}^{n} aji Avi wj , we claim A = L. Indeed,

∀ k ∈ {1, . . . , n} :   L(vk ) = Σ_{j=1}^{M} Σ_{i=1}^{n} aji Avi wj (vk ) = Σ_{j=1}^{M} Σ_{i=1}^{n} aji δik wj = Σ_{j=1}^{M} ajk wj = A(vk ),
proving (6.35).
(b): For each v ∈ V , we calculate
((A + B)C)v = (A + B)(Cv) = A(Cv) + B(Cv) = (AC)v + (BC)v = (AC + BC)v,
proving (6.36).
Theorem 6.21. Let V be a vector space over the field F . Then R := L(V, V ) con-
stitutes a ring with unity with respect to pointwise addition and map composition as
multiplication. If dim V > 1, then R is not commutative and it has nontrivial divisors
of 0.
(a) A linear endomorphism A ∈ L(V, V ) is called regular if, and only if, it is an au-
tomorphism (i.e. if, and only if, it is bijective/invertible). Otherwise, A is called
singular.
(b) The set GL(V ) := {A ∈ L(V, V ) : A is regular} = L(V, V )∗ (cf. Def. and Rem.
4.41) is called the general linear group of V (cf. Cor. 6.23 below).
Corollary 6.23. Let V be a vector space over the field F . Then the general linear
group GL(V ) is, indeed, a group with respect to map composition. For V ≠ {0}, it is
not a group with respect to addition (note 0 ∉ GL(V ) in this case). If dim V ≥ 3, or
F ≠ {0, 1} and dim V ≥ 2, then the group GL(V ) is not commutative.
Proof. Since L(V, V ) is a ring with unity, we know GL(V ) = L(V, V )∗ to be a group
by Def. and Rem. 4.41 – since the composition of linear maps is linear, GL(V ) is also a
subgroup of the symmetric group SV (cf. Ex. 4.9(b)). If dim V ≥ 3, then the noncom-
mutativity of GL(V ) follows as in Ex. 4.9(b), where the elements a, b, c of (4.7) are now
taken as distinct elements of a basis of V . Now let v1 , v2 ∈ V be linearly independent
and B a basis of V with v1 , v2 ∈ B. Define A1 , A2 ∈ L(V, V ) by letting, for v ∈ B,
A1 v := { v2 for v = v1 ,  v1 for v = v2 ,  v otherwise },
A2 v := { −v1 for v = v1 ,  v2 for v = v2 ,  v otherwise }. (6.38)
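Restricted to the plane spanned by v1 , v2 (and assuming a field where −v1 ≠ v1 , e.g. F = Q, matching the condition F ≠ {0, 1}), the maps A1 , A2 of (6.38) become 2 × 2 matrices, and their noncommutativity can be checked directly; a sketch:

```python
# A1, A2 of (6.38) as matrices with respect to the basis (v1, v2); column i
# holds the coordinates of the image of the i-th basis vector.

def matmul(A, B):
    return [[sum(A[j][k] * B[k][i] for k in range(len(B)))
             for i in range(len(B[0]))] for j in range(len(A))]

A1 = [[0, 1],    # A1 swaps v1 and v2
      [1, 0]]
A2 = [[-1, 0],   # A2 maps v1 to -v1 and fixes v2
      [0, 1]]

# A1 A2 sends v1 to -v2, while A2 A1 sends v1 to v2: GL(V) is not commutative.
assert matmul(A1, A2) != matmul(A2, A1)
```

Since A1² = Id and A2² = Id, both maps are bijective, hence indeed elements of GL(V ).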
7 Matrices
with coordinates (aji )(j,i)∈{1,...,m}×{1,...,n} in F (also cf. Ex. 6.7(a)). This motivates the
following definition of matrices, where, however, instead of F , we allow an arbitrary
nonempty set S, as it sometimes turns out to be useful to have matrices with entries
that are not necessarily elements of a field.
with m rows and n columns. One therefore also calls 1 × n matrices row vectors and
m × 1 matrices column vectors, and one calls n × n matrices quadratic. One calls the
elements akk the (main) diagonal elements of (aji ) and one also says that these elements
lie on the (main) diagonal of the matrix and that they form the (main) diagonal of the
matrix. The set of all m × n matrices over S is denoted by M(m, n, S), and for the set
of all quadratic n × n matrices, one uses the abbreviation M(n, S) := M(n, n, S).
(a) Matrix Addition: For m × n matrices (aji ) and (bji ) over S, define the sum
(b) Scalar Multiplication: Let (aji ) be an m × n matrix over S. If S is a ring, then let
λ ∈ S; if S is a vector space over F , then let λ ∈ F . Define
(c) Matrix Multiplication: Let S be a ring. For each m × n matrix (aji ) and each n × l
matrix (bji ) over S, define the product
(aji )(bji ) := ( Σ_{k=1}^{n} ajk bki )_{(j,i)∈{1,...,m}×{1,...,l}} ∈ M(m, l, S), (7.5)
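Definition (7.5) translates directly into code. The sketch below (helper names are ours) takes S to be the ring of Python integers and reuses the two matrices that appear in Ex. 7.6:

```python
# Entry (j, i) of the product is the sum over k of a_jk * b_ki, per (7.5).

def mat_mul(A, B):
    m, n, l = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    return [[sum(A[j][k] * B[k][i] for k in range(n))
             for i in range(l)] for j in range(m)]

A = [[1, -1, 0],
     [2, 0, -2]]       # 2 x 3 matrix
B = [[-3, 0],
     [1, 1],
     [0, -1]]          # 3 x 2 matrix
assert mat_mul(A, B) == [[-4, -1], [-6, 2]]   # a 2 x 2 product
```

Note that the product is only defined when the inner dimensions agree, and that mat_mul(B, A) is a different, 3 × 3 matrix: matrix multiplication is not commutative.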
(a) Consider

v := (v1 , . . . , vn )^t ∈ M(n, 1, S) ∼= S^n ,   w := (w1 . . . wn ) ∈ M(1, n, S) ∼= S^n .

Then

Av = Σ_{k=1}^{n} c^A_k vk ∈ M(m, 1, S),   wB = Σ_{k=1}^{n} wk r^B_k ∈ M(1, l, S),

where we wrote c^A_k vk , as we do not assume S to be a commutative ring – if S is
commutative (e.g. a field), then the more familiar form vk c^A_k is also admissible.
∀ i ∈ {1, . . . , l} :   c^{AB}_i = A c^B_i .

∀ j ∈ {1, . . . , m} :   r^{AB}_j = r^A_j B.
yielding

∀ j ∈ {1, . . . , m} :   xj = Σ_{k=1}^{n} ajk vk = ( Σ_{k=1}^{n} c^A_k vk )_j ,

∀ i ∈ {1, . . . , l} :   yi = Σ_{k=1}^{n} wk bki = ( Σ_{k=1}^{n} wk r^B_k )_i ,

proving (b).
(c): For each j ∈ {1, . . . , m}, i ∈ {1, . . . , l}, one computes
(r^{AB}_j )_i = (AB)ji = Σ_{k=1}^{n} ajk bki = (r^A_j B)_i ,
proving (c).
Example 7.6. Using the matrices from Ex. 7.3, we have

( −4 )   ( 1 −1  0 ) ( −3 )   ( 1 )          ( −1 )       (  0 )
( −6 ) = ( 2  0 −2 ) (  1 ) = ( 2 ) · (−3) + (  0 ) · 1 + ( −2 ) · 0
                     (  0 )

and

                       ( −3  0 )
( −4 −1 ) = ( 1 −1 0 ) (  1  1 ) = 1 · ( −3 0 ) + (−1) · ( 1 1 ) + 0 · ( 0 −1 ).
                       (  0 −1 )
Theorem 7.7. Let S be a ring.
(a) Matrix multiplication of matrices over S is associative whenever all relevant multi-
plications are defined. More precisely, if A = (aji ) is an m × n matrix, B = (bji ) is
an n × l matrix, and C = (cji ) is an l × p matrix, then
(b) Matrix multiplication of matrices over S is distributive whenever all relevant mul-
tiplications are defined. More precisely, if A = (aji ) is an m × n matrix, B = (bji )
and C = (cji ) are n × l matrices, and D = (dji ) is an l × p matrix, then
(c) For each n ∈ N, M(n, S) is a ring. If S is a ring with unity, then M(n, S)
is a ring with unity as well, where the additive neutral element is the zero matrix
0 ∈ M(n, S) (all entries are zero) and the multiplicative neutral element of M(n, S)
is the so-called identity matrix Idn := (δji ),
        ( 1        )
Idn  =  (   . . .  ) ,   δji := { 1 for i = j,  0 for i ≠ j }, (7.9)
        (        1 )
where the usual meaning of such notation is that all omitted entries are 0. We call
GLn (S) := M(n, S)∗ , i.e. the group of invertible matrices over S, general linear
group (we will see in Th. 7.10(a) below that, if S = F is a field and V is a vector
space of dimension n over F , then GLn (F ) is isomorphic to the general linear group
GL(V ) of Def. 6.22(b)). Analogous to Def. 6.22(a), we call the matrices in GLn (S)
regular and the matrices in M(n, S) \ GLn (S) singular.
Proof. (a): One has m × p matrices (AB)C = (dji ) and A(BC) = (eji ), where, using
associativity and distributivity in S, one computes

dji = Σ_{α=1}^{l} ( Σ_{k=1}^{n} ajk bkα ) cαi = Σ_{α=1}^{l} Σ_{k=1}^{n} ajk bkα cαi = Σ_{k=1}^{n} ajk ( Σ_{α=1}^{l} bkα cαi ) = eji ,
Thus, if we represent v by a column vector ṽ (an n×1 matrix) containing its coordinates
λ1 , . . . , λn with respect to the basis {v1 , . . . , vn } and A(v) by a column vector w̃ (an m×1
matrix) containing its coordinates with respect to the basis {w1 , . . . , wm }, then (7.11)
shows
w̃ = M ṽ = M (λ1 , . . . , λn )^t ,   where M := (aji ) ∈ M(m, n, F ). (7.12)
—
For finite-dimensional vector spaces, the precise relationship between linear maps, bases,
and matrices is provided by the following theorem:
Theorem 7.10. Let V and W be finite-dimensional vector spaces over the field F ,
let BV := {v1 , . . . , vn } and BW := {w1 , . . . , wm } be bases of V and W , respectively;
m, n ∈ N, n = dim V , m = dim W .
(b) Let aji ∈ F , j ∈ {1, . . . , m}, i ∈ {1, . . . , n}. Moreover, let A ∈ L(V, W ). Then
(7.1) holds if, and only if,
∀ i ∈ {1, . . . , n} :   Avi = Σ_{j=1}^{m} aji wj . (7.14)
Proof. (a): According to Th. 6.19, {Avi wj : (j, i) ∈ {1, . . . , m} × {1, . . . , n}} forms
a basis of L(V, W ). Thus, to every family of coordinates (aji : (j, i) ∈ {1, . . . , m} ×
{1, . . . , n}) in F , (7.1) defines a unique element of L(V, W ), i.e. I is bijective. It remains
to verify that I is linear. To this end, let λ, µ ∈ F and A, B ∈ L(V, W ) with
A = Σ_{j=1}^{m} Σ_{i=1}^{n} aji Avi wj ,   (aji ) = I(A) ∈ M(m, n, F ),
B = Σ_{j=1}^{m} Σ_{i=1}^{n} bji Avi wj ,   (bji ) = I(B) ∈ M(m, n, F ).

Then

λA + µB = λ Σ_{j=1}^{m} Σ_{i=1}^{n} aji Avi wj + µ Σ_{j=1}^{m} Σ_{i=1}^{n} bji Avi wj = Σ_{j=1}^{m} Σ_{i=1}^{n} (λaji + µbji )Avi wj ,
showing
proving the linearity of I. Now let V = W and vi = wi for each i ∈ {1, . . . , n}. We
claim

BA = Σ_{j=1}^{n} Σ_{i=1}^{n} cji Avi vj ,   where cji = Σ_{k=1}^{n} bjk aki : (7.15)
Definition and Remark 7.11. In the situation of Th. 7.10, for each A ∈ L(V, W ), one
calls the matrix I(A) = (aji ) ∈ M(m, n, F ) the (transformation) matrix corresponding
to A with respect to the basis {v1 , . . . , vn } of V and the basis {w1 , . . . , wm } of W . If the
bases are understood, then one often tends to identify the map with its corresponding
matrix.
However, as I(A) depends on the bases, identifying A and I(A) is only admissible as
long as one keeps the bases of V and W fixed! Moreover, if one represents matrices as
rectangular arrays as in (7.2) (which one usually does), then one actually considers the
basis vectors of {v1 , . . . , vn } and {w1 , . . . , wm } as ordered from 1 to n (resp. m), i.e. I(A)
actually depends on the so-called ordered bases (v1 , . . . , vn ) and (w1 , . . . , wm ) (ordered
bases are tuples rather than sets and the matrix corresponding to A changes if the order
of the basis vectors changes).
Similarly, we had seen in (7.12) that it can be useful to identify a vector v = Σ_{i=1}^{n} λi vi
with its coordinates (λ1 , . . . , λn ), typically represented as an n × 1 matrix (a column
vector, as in (7.12)) or a 1 × n matrix (a row vector). Obviously, this identification is
also only admissible as long as the basis {v1 , . . . , vn } and its order is kept fixed.
Av1 = w1
Av2 = 2w1 − w2
Av3 = 3w1
Av4 = 4w1 − 2w2 .
According to Th. 7.10(a),(b), the coefficients on the right-hand side of the above equa-
tions provide the columns of the matrix representing A: The matrix I(A) of A with
respect to BV and BW (I according to Th. 7.10(a)) is
I(A) = ( 1  2  3  4 )
       ( 0 −1  0 −2 ) .
is obtained from I(A) by switching columns 1 and 4 as well as rows 1 and 2, resulting
in
I ′ (A) = ( −2 −1 0 0 )
          (  4  2 3 1 ) .
Suppose we want to determine Av for the vector v := −v1 + 3v2 − 2v3 + v4 .
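Per (7.12), the coordinates of Av with respect to BW are obtained by applying M = I(A) to the coordinate column of v; a minimal sketch of this calculation over Q:

```python
# Coordinates of Av = M @ (coordinates of v), with M = I(A) from Ex. 7.12.

def mat_vec(M, v):
    return [sum(M[j][k] * v[k] for k in range(len(v))) for j in range(len(M))]

M = [[1, 2, 3, 4],
     [0, -1, 0, -2]]            # I(A) with respect to BV and BW
v_coords = [-1, 3, -2, 1]       # v = -v1 + 3*v2 - 2*v3 + v4
assert mat_vec(M, v_coords) == [3, -5]   # i.e. Av = 3*w1 - 5*w2
```

Applying M to a standard coordinate vector recovers a column of M, consistent with Lem. 7.5(a): the i-th column holds the coordinates of Avi .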
The following Th. 7.13 is the justification for defining matrix multiplication according
to Def. 7.2(c).
As we have seen, given A ∈ L(V, W ), the matrix I(A) representing A depends on the
chosen (ordered) bases of V and W . It will be one goal of Linear Algebra II to find
methods to determine bases of V and W such that the form of I(A) becomes particularly
simple. In the following Th. 7.14, we show how I(A) changes for given basis transitions
for V and W , respectively.
Theorem 7.14. Let V and W be finite-dimensional vector spaces over the field F ,
m, n ∈ N, dim V = n, dim W = m. Let BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ) be
ordered bases of V and W , respectively. Moreover, let BV′ := (v1′ , . . . , vn′ ) and
BW′ := (w1′ , . . . , wm′ ) also be ordered bases of V and W , respectively, and let
cji , dji , fji ∈ F be such that
such that
∀ i ∈ {1, . . . , n} :   vi′ = Σ_{j=1}^{n} cji vj , (7.17a)

∀ i ∈ {1, . . . , m} :   wi′ = Σ_{j=1}^{m} fji wj ,   wi = Σ_{j=1}^{m} dji wj′ , (7.17b)
where (dji ) = (fji )−1 ∈ GLm (F ) (we then call (cji ) ∈ GLn (F ) and (fji ) ∈ GLm (F ) the
transition matrices corresponding to the basis transitions from BV to BV′ and from BW
to BW′ , respectively). If A ∈ L(V, W ) has I(BV , BW )(A) = (aji ) ∈ M(m, n, F ) with
I(BV , BW ) as in Th. 7.10(a), then

I(BV′ , BW′ )(A) = (eji ),   where (eji ) = (dji )(aji )(cji ) = (fji )−1 (aji )(cji ). (7.18)
In particular, in the special case, where V = W , vi = wi , vi′ = wi′ for each i ∈ {1, . . . , n},
one has
I(BV′ , BV′ )(A) = (eji ), where (eji ) = (cji )−1 (aji )(cji ). (7.19)
Example 7.15. We explicitly write the transformations and matrices of Th. 7.14 for the
situation of Ex. 7.12: There we had V := Q4 with ordered bases BV := (v1 , v2 , v3 , v4 ),
BV′ := (v4 , v2 , v3 , v1 ), and W := Q2 with ordered bases BW := (w1 , w2 ), BW′ := (w2 , w1 ).
Thus,

v1′ = v4 ,  v2′ = v2 ,  v3′ = v3 ,  v4′ = v1 ,      (cji ) = ( 0 0 0 1 )
                                                            ( 0 1 0 0 )
                                                            ( 0 0 1 0 )
                                                            ( 1 0 0 0 ) ,

and

w1′ = w2 ,  w2′ = w1 ,      (fji ) = ( 0 1 )
                                     ( 1 0 ) ,

and

w1 = w2′ ,  w2 = w1′ ,      (dji ) = ( 0 1 )
                                     ( 1 0 ) .

Moreover, A ∈ L(V, W ) in Ex. 7.12 was such that

(aji ) = I(BV , BW )(A) = ( 1  2  3  4 )
                          ( 0 −1  0 −2 ) .
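The transformation formula (7.18) can be checked numerically for these matrices: (dji )(aji )(cji ) should reproduce the matrix I ′ (A) of Ex. 7.12, which was obtained by swapping columns 1, 4 and rows 1, 2. A sketch (all entries are integers, so plain Python arithmetic suffices):

```python
# Check (e_ji) = (d_ji)(a_ji)(c_ji) for the data of Ex. 7.15.

def mat_mul(A, B):
    return [[sum(A[j][k] * B[k][i] for k in range(len(B)))
             for i in range(len(B[0]))] for j in range(len(A))]

a = [[1, 2, 3, 4], [0, -1, 0, -2]]
c = [[0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]
d = [[0, 1], [1, 0]]   # inverse of (f_ji); here f is its own inverse

e = mat_mul(mat_mul(d, a), c)
assert e == [[-2, -1, 0, 0], [4, 2, 3, 1]]   # equals I'(A) of Ex. 7.12
```

Left-multiplication by d permutes the rows (basis change in W ), right-multiplication by c permutes the columns (basis change in V ), exactly as described in Ex. 7.12.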
(a) Let V and W be vector spaces over F and A ∈ L(V, W ). Then rk(A) := dim Im A
is called the rank of A.
(b) Let (aji )(j,i)∈{1,...,m}×{1,...,n} ∈ M(m, n, F ) and consider the column vectors and row
vectors of (aji ) as elements of the vector spaces F m and F n , respectively (cf. Rem.
7.4(b)). Then
rk(aji ) := dim h{ (a1i , . . . , ami )^t : i ∈ {1, . . . , n} }i (7.20a)
—
A main goal of the present section is to show that, if V and W are finite-dimensional
vector spaces, A ∈ L(V, W ), and (aji ) is the matrix representing A with respect to
chosen ordered bases of V and W , respectively, then rk(A) = rk(aji ) = rrk(aji ) (cf. Th.
7.23 below).
Theorem 7.17. Let F be a field, m, n ∈ N.
(b) If the matrices (cji ) ∈ M(n, n, F ) and (dji ) ∈ M(m, m, F ) are regular and (aji ) ∈
M(m, n, F ), then

rk(aji ) = rk((dji )(aji )(cji )).
Proof. (a) is basically due to Lem. 7.5(a) and Th. 6.9: Let B : W −→ F m be the linear
isomorphism with B(wj ) = ej , where {e1 , . . . , em } is the standard basis of F m (cf. Ex.
5.16(c)). Then
rk(A) = dim Im A = dim B(Im A) = dim Im(BA) = rk(BA). (7.21)
Since

∀ i ∈ {1, . . . , n} :   (BA)vi = B( Σ_{j=1}^{m} aji wj ) = (a1i , . . . , ami )^t = c^A_i ,

we have Im(BA) = h{c^A_1 , . . . , c^A_n }i, which, together with (7.21), proves (a).
Then (aji ) is the matrix representing A with respect to BV and BW and rk(A) = rk(aji )
by (a). Now let BV′ := (v1′ , . . . , vn′ ) and BW′ := (w1′ , . . . , wm′ ) also be ordered bases of V
and W , respectively, such that (7.17) holds, i.e.

∀ i ∈ {1, . . . , n} :   vi′ = Σ_{j=1}^{n} cji vj ,   ∀ i ∈ {1, . . . , m} :   wi = Σ_{j=1}^{m} dji wj′
(such bases exist, since (cji ) and (dji ) are regular), then, according to Th. 7.14, the
matrix (eji ) := (dji )(aji )(cji ) represents A with respect to BV′ and BW′ . Thus, (a) yields
rk(eji ) = rk(A) = rk(aji ), as desired.
Theorem 7.18. Let V, W be finite-dimensional vector spaces over the field F , dim V =
n, dim W = m, (m, n) ∈ N2 , A ∈ L(V, W ) with r := rk(A). Then there exist ordered
bases BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ) of V and W , respectively, such that
the matrix (aji ) ∈ M(m, n, F ) corresponding to A with respect to BV and BW satisfies

aji = { 1 for j = i, 1 ≤ j ≤ r,  0 otherwise },   i.e.   (aji ) = ( Idr 0 )
                                                                  ( 0   0 ) .
As a tool for showing rk(aji ) = rrk(aji ), we will now introduce the transpose of a
matrix, which is also of general interest. While, here, we are mostly interested in using
the transpose of a matrix for a matrix representing a linear map, the concept makes
sense in the more general situation of Def. 7.1:
Remark 7.21. Let S be a nonempty set, m, n ∈ N. It is immediate from Def. 7.19 that
the map A 7→ At is bijective from M(m, n, S) onto M(n, m, S), where

∀ A ∈ M(m, n, S) :   (At )t = A. (7.23)
(AB)t = B t At . (7.24)
Proof. Exercise.
Theorem 7.23. Let V, W be finite-dimensional vector spaces over the field F , dim V =
n, dim W = m, with ordered bases BV := (v1 , . . . , vn ) and BW := (w1 , . . . , wm ), respec-
tively. If A ∈ L(V, W ) and (aji ) ∈ M(m, n, F ) is the matrix corresponding to A with
respect to BV and BW , then
Proof. As the first equality was already shown in Th. 7.17(a), it merely remains to show
rk(aji ) = rrk(aji ). Let r := rk(A). According to Th. 7.18 and Th. 7.14, there exist
regular matrices (xji ) ∈ GLm (F ), (yji ) ∈ GLn (F ) such that

(xji )(aji )(yji ) = ( Idr 0 )     ⇒ (by (7.24))     (yji )t (aji )t (xji )t = ( Idr 0 )
                     ( 0   0 )                                                ( 0   0 ) .

Since (xji )t and (yji )t are regular by Th. 7.22(a), we may use Th. 7.17(b) to obtain

rrk(aji ) = rk((aji )t ) = rk((yji )t (aji )t (xji )t ) = rk ( Idr 0 ) = r,
                                                            ( 0   0 )
as desired.
Remark 7.24. Let V, W be finite-dimensional vector spaces over the field F , dim V = n,
dim W = m, where m, n ∈ N. Moreover, let BV = (v1 , . . . , vn ) and BW = (w1 , . . . , wm )
be ordered bases of V and W , respectively, let A ∈ L(V, W ), and let (aji ) ∈ M(m, n, F )
be the matrix corresponding to A with respect to BV and BW . Using the transpose
matrix (aji )t ∈ M(n, m, F ), one can now also define a transpose At of the map A.
However, a subtlety arises related to basis transitions and, given λ1 , . . . , λm ∈ F ,
representing coordinates of w ∈ W with respect to BW , one cannot simply define At by
applying (aji )t to the column vector containing λ1 , . . . , λm , as the result would, in
general, depend on BV and BW . Instead, one obtains At by left-multiplying (aji ) with the
row vector containing λ1 , . . . , λm :

At w := (λ1 . . . λm ) (aji ).
Even though the resulting coordinate values are the same that one obtains from com-
puting (aji )t (λ1 . . . λm )t , the difference appears when considering basis transitions. In
consequence, algebraically, one obtains the transpose At to be a map from the dual W ′
of W into the dual V ′ of V . Here, we will not go into further details, as we do not want
to explain the concept of dual spaces at this time.
(a) We call a matrix A = (aji ) ∈ M(n, S) diagonal if, and only if, aji = 0 for j 6= i,
i.e. if, and only if, all nondiagonal entries of A are 0. We define
Dn (S) := {A ∈ M(n, S) : A is diagonal}.
(b) A matrix A = (aji ) ∈ M(n, S) is called upper triangular or right triangular (resp.
lower triangular or left triangular) if, and only if, aji = 0 for each i, j ∈ {1, . . . , n}
such that j > i (resp. j < i), i.e. if, and only if, all nonzero entries of A are
above/right (resp. below/left) of the diagonal. A triangular matrix A is called
strict if, and only if, aii = 0 for each i ∈ {1, . . . , n}; it is called unipotent if, and
only if, aii = 1 for each i ∈ {1, . . . , n}. We define4
BUn (S) := {A ∈ M(n, S) : A is upper triangular},
BLn (S) := {A ∈ M(n, S) : A is lower triangular},
BU0n (S) := {A ∈ M(n, S) : A is strict upper triangular},
BL0n (S) := {A ∈ M(n, S) : A is strict lower triangular},
BU1n (S) := {A ∈ BUn (S) : A is unipotent},
BL1n (S) := {A ∈ BLn (S) : A is unipotent}.
(c) A matrix A = (aji ) ∈ M(n, S) is called symmetric if, and only if, At = A, i.e. if
aji = aij for each i, j ∈ {1, . . . , n}. We define

Symn (S) := {A ∈ M(n, S) : A is symmetric}.
Proposition 7.26. Let S be a ring and m, n ∈ N. Let s1 , . . . , smax{m,n} ∈ S.
(b) Dn (S) is a subring of M(n, S) (subring with unity if S is a ring with unity). If S
is a field, then Dn (S) is a vector subspace of M(n, S).
4
The ‘B’ in the notation comes from the theory of so-called Lie algebras, where one finds, at least
for suitable S, that BUn (S) and BLn (S) form so-called Borel subalgebras of the algebra M(n, S).
7 MATRICES 146
(c) If S is a ring with unity, then D := diag(s1 , . . . , sn ) ∈ GLn (S) if, and only if, each
si is invertible, i.e. if, and only if, si ∈ S ∗ for each i ∈ {1, . . . , n}. In that case, one
has D−1 = diag(s1−1 , . . . , sn−1 ) ∈ GLn (S). Moreover, Dn (S ∗ ) = Dn (S) ∩ GLn (S) is
a subgroup of GLn (S).
Proof. (a): If (dji ) := D1 , (eji ) := D2 , (xji ) := D1 A and (yji ) := AD2 , then, for each
(j, i) ∈ {1, . . . , m} × {1, . . . , n},

xji = Σ_{k=1}^{m} djk aki = sj aji ,   yji = Σ_{k=1}^{n} ajk eki = aji si .
(a) Let A := (aji ) ∈ M(n, S) and B := (bji ) ∈ M(n, S). If A, B ∈ BUn (S) or
A, B ∈ BLn (S), and C := (cji ) = AB, then cii = aii bii for each i ∈ {1, . . . , n}.
(b) BUn (S), BLn (S), BU0n (S), BL0n (S) are subrings of M(n, S) (BUn (S), BLn (S) are
subrings with unity if S is a ring with unity). If S is a field, then BUn (S), BLn (S),
BU0n (S), BL0n (S) are vector subspaces of M(n, S).
(c) Let S be a ring with unity. If B := (bji ) ∈ BUn (S) ∩ GLn (S) and A := (aji ) = B −1 ,
then A ∈ BUn (S) with

aji = 0 for i < j,   aji = bii−1 for i = j,
aji = −( Σ_{k=1}^{i−1} ajk bki ) bii−1 recursively for i > j. (7.29a)

If B := (bji ) ∈ BLn (S) ∩ GLn (S) and A := (aji ) = B −1 , then A ∈ BLn (S) with

aji = 0 for j < i,   aji = bii−1 for j = i,
aji = −( Σ_{k=i+1}^{j} ajk bki ) bii−1 recursively for j > i. (7.29b)
(d) If S is a ring with unity and A := (aji ) ∈ BUn (S) ∪ BLn (S), then A ∈ GLn (S) if,
and only if, each aii is invertible. Moreover, BU1n (S), BL1n (S) as well as BUn (S) ∩
GLn (S), BLn (S) ∩ GLn (S) are subgroups of GLn (S).
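The recursion (7.29a) can be implemented directly. The following sketch works over F = Q via Python's exact fractions (the sample matrix B is our own choice) and verifies AB = Idn :

```python
# Invert a regular upper triangular matrix via the recursion of (7.29a).

from fractions import Fraction as Fr

def inv_upper(B):
    n = len(B)
    A = [[Fr(0)] * n for _ in range(n)]
    for j in range(n):
        A[j][j] = 1 / Fr(B[j][j])                 # a_jj = b_jj^(-1)
        for i in range(j + 1, n):                 # recursively for i > j
            A[j][i] = -sum(A[j][k] * B[k][i] for k in range(j, i)) / Fr(B[i][i])
    return A

def mat_mul(A, B):
    return [[sum(A[j][k] * B[k][i] for k in range(len(B)))
             for i in range(len(B[0]))] for j in range(len(A))]

B = [[2, 1, 3],
     [0, 1, -1],
     [0, 0, 4]]
Id = [[Fr(int(i == j)) for i in range(3)] for j in range(3)]
assert mat_mul(inv_upper(B), B) == Id
```

The inner sum only runs over k from j to i − 1, which is the same as the sum from 1 to i − 1 in (7.29a), since ajk = 0 for k < j anyway; this makes the recursion an O(n³) back-substitution, and the computed inverse is again upper triangular, as (c) asserts.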
If A, B ∈ BUn (S), then

cii = Σ_{k=1}^{i−1} aik bki + aii bii + Σ_{k=i+1}^{n} aik bki = aii bii ,

where the first sum vanishes, as aik = 0 for k < i, and the second sum vanishes, as
bki = 0 for k > i; if A, B ∈ BLn (S), then, analogously,

cii = Σ_{k=1}^{i−1} aik bki + aii bii + Σ_{k=i+1}^{n} aik bki = aii bii .
(b): Let A, B ∈ BUn (S) and C := (cji ) = AB. We obtain C ∈ BUn (S), since, if j > i,
then
cji = Σ_{k=1}^{i} ajk bki + Σ_{k=i+1}^{n} ajk bki = 0
(the first sum equals 0 since k ≤ i < j and A is upper triangular, the second sum equals
0 since k > i and B is upper triangular). Now let A, B ∈ BLn (S) and C := (cji ) := AB.
We obtain C ∈ BLn (S), since, if j < i, then
cji = Σ_{k=1}^{i−1} ajk bki + Σ_{k=i}^{n} ajk bki = 0
(the first sum equals 0 since k < i and B is lower triangular, the second sum equals 0 since
j < i ≤ k and A is lower triangular). Now let M ∈ {BUn (S), BLn (S), BU0n (S), BL0n (S)}.
Then the above and (a) show AB ∈ M for A, B ∈ M. As A, B ∈ M, clearly, implies
A + B ∈ M and −A ∈ M, we have shown M to be a subring of M(n, S), where, if
1 ∈ S, then Idn ∈ BUn (S) ∩ BLn (S). If λ ∈ S, A ∈ M, then, clearly, λA ∈ M, showing
that, for S being a field, M is a vector subspace of M(n, S).
(c): If B ∈ BUn (S) ∩ GLn (S), then each bii is invertible by (a) and (b). Now let
A := (aji ) be defined by (7.29a), C := (cji ) := AB. We already know C ∈ BUn (S)
according to (b). Moreover, cii = 1 follows from (a) and, for j < i,
cji = Σ_{k=j}^{i} ajk bki = aji bii + Σ_{k=j}^{i−1} ajk bki = 0
by the recursive part of (7.29a), completing the proof of C = Idn . If B ∈ BLn (S) ∩
GLn (S), then each bii is invertible by (a) and (b). Now let A := (aji ) be defined by
(7.29b), C := (cji ) := AB. We already know C ∈ BLn (S) according to (b). Moreover,
cii = 1 follows from (a) and, for j > i,
cji = Σ_{k=i}^{j} ajk bki = aji bii + Σ_{k=i+1}^{j} ajk bki = 0
(b) A can be uniquely decomposed into its symmetric and skew-symmetric parts, i.e.
A = Asym + Askew and, if A = S + B with S ∈ Symn (F ) and B ∈ Skewn (F ), then
S = Asym and B = Askew .
(c) A is symmetric if, and only if, A = Asym ; A is skew-symmetric if, and only if,
A = Askew .
5
Using the determinant det A and results from Linear Algebra II, we can show that A−1 ∈
Skewn (F ) ∩ GLn (F ) can only exist for char F = 2 or n even: If A ∈ Skewn (F ), then det A = det(At ) =
det(−A) = (−1)n det A, where we have used [Phi19, Cor. 4.24(b),(d)]. Thus, for n odd, 2 det A = 0,
implying 2 = 0 (i.e. char F = 2) or det A = 0 (i.e. A ∉ GLn (F ) by [Phi19, Cor. 4.24(h)]).
Theorem 7.34. Let S be a ring. Let J, I, K be finite index sets, #J, #I, #K ∈ N.
Now assume we have disjoint partitions of J, I, K into nonempty sets:

J = ∪̇_{α∈{1,...,M }} Jα ,   I = ∪̇_{β∈{1,...,N }} Iβ ,   K = ∪̇_{γ∈{1,...,L}} Kγ , (7.31)

where M, N, L ∈ N,
Then, for each (α, γ, β) ∈ {1, . . . , M }×{1, . . . , L}×{1, . . . , N }, Aαγ is a Jα ×Kγ matrix
and Bγβ is a Kγ × Iβ matrix. Thus, we can define
(Cαβ ) := (Aαγ )(Bγβ ) := ( Σ_{γ=1}^{L} Aαγ Bγβ )_{(α,β)∈{1,...,M }×{1,...,N }} . (7.32c)
We claim that
C := (cji ) := AB = (Cαβ ) (7.33)
in the sense that
Example 7.35. Let S be a ring with unity and A, B, C, D ∈ GL2 (S). Then we can use
Th. 7.34 to perform the following multiplication of a 4 × 6 matrix by a 6 × 4 matrix:

               ( A−1    0   )
( A  B  0 )    ( B −1   C −1 )     ( 2 Id2   BC −1 )
( 0  C  D )    (  0     D−1 )  =  ( CB −1   2 Id2 ) .
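The blockwise identity can be checked with concrete regular 2 × 2 blocks over F = Q (the particular A, B, C, D below are our own illustrative choices); a sketch:

```python
# Assemble the 4 x 6 and 6 x 4 matrices of Ex. 7.35 from 2 x 2 blocks and
# verify the product equals [[2*Id, B C^-1], [C B^-1, 2*Id]].

from fractions import Fraction as Fr

def mat_mul(X, Y):
    return [[sum(X[j][k] * Y[k][i] for k in range(len(Y)))
             for i in range(len(Y[0]))] for j in range(len(X))]

def inv2(M):                       # inverse of a regular 2 x 2 matrix over Q
    (a, b), (c, d) = M
    det = Fr(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def hstack(*Ms):                   # glue equally tall blocks side by side
    return [sum((M[j] for M in Ms), []) for j in range(len(Ms[0]))]

A, B, C, D = [[1, 2], [0, 1]], [[2, 0], [1, 1]], [[1, 1], [1, 2]], [[3, 0], [0, 1]]
Z = [[0, 0], [0, 0]]

left = hstack(A, B, Z) + hstack(Z, C, D)                         # 4 x 6
right = hstack(inv2(A), Z) + hstack(inv2(B), inv2(C)) + hstack(Z, inv2(D))

expected = hstack([[2, 0], [0, 2]], mat_mul(B, inv2(C))) \
         + hstack(mat_mul(C, inv2(B)), [[2, 0], [0, 2]])
assert mat_mul(left, right) == expected
```

Block by block, the (1,1) entry is A A−1 + B B −1 + 0 = 2 Id2 , and similarly for the other three blocks, exactly as (7.32c) prescribes.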
8 Linear Systems
Ax = b (8.2)
Example 8.2. While linear systems arise from many applications, we merely provide
a few examples, illustrating the importance of linear systems for problems from inside
the subject of linear algebra.
(a) Let v1 := (1, 2, 0, 3), v2 := (0, 3, 2, 1), v3 := (1, 1, 1, 1). The question if the set
M := {v1 , v2 , v3 } is a linearly dependent subset of R4 is equivalent to asking if the
linear system
x1 + x3 = 0
2x1 + 3x2 + x3 = 0
2x2 + x3 = 0
3x1 + x2 + x3 = 0
has a solution x = (x1 , x2 , x3 ) ≠ (0, 0, 0). The linear system can be written as

( 1 0 1 )          ( 0 )
( 2 3 1 ) ( x1 )   ( 0 )
( 0 2 1 ) ( x2 ) = ( 0 )
( 3 1 1 ) ( x3 )   ( 0 ) .
has a solution.
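Whether the homogeneous system of (a) has a nontrivial solution can be decided by row reduction: the rank of the coefficient matrix equals the number of unknowns precisely when only the trivial solution exists. A sketch over Q (helper names are ours):

```python
# Gaussian elimination over Q; the rank is the number of pivots found.

from fractions import Fraction as Fr

def rank(rows):
    rows = [[Fr(x) for x in r] for r in rows]
    rk, n = 0, len(rows[0])
    for col in range(n):
        piv = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if piv is None:
            continue                              # no pivot in this column
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][col] / rows[rk][col]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

# Coefficient matrix of the homogeneous system in (a).
A = [[1, 0, 1], [2, 3, 1], [0, 2, 1], [3, 1, 1]]
assert rank(A) == 3   # rank = number of unknowns: only the trivial solution
```

Since the rank is 3, the system has only x = (0, 0, 0) as a solution, so {v1 , v2 , v3 } is linearly independent.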
where e1 , . . . , en are the standard (column) basis vectors. Then the vk are obviously
the column vectors of A−1 .
where x ∈ F n is identified with a column vector in M(n, 1, F ) and the column vector
Ax ∈ M(m, 1, F ) is identified with an element of F m . In view of this fact, given an
n-dimensional vector space V over F , an m-dimensional vector space W over F , a linear
map L ∈ L(V, W ), and b ∈ W , we call
Lx = b (8.6)
a linear system or merely a linear equation for the unknown vector x ∈ V and we write
L(L|b) for its set of solutions. If V = F n , W = F m , and the matrix A ∈ M(m, n, F )
corresponds to L with respect to the respective standard bases of F n and F m (cf. Ex.
5.16(c)), then (8.6) is identical to (8.2) and
i.e. the set of solutions is precisely the preimage of the set {b} under L. In consequence,
we then have
b∈
/ Im L ⇒ L(A|b) = L(L|b) = ∅, (8.10a)
Th. 4.20(f)
b ∈ Im L ⇒ ∀ L(A|b) = L(L|b) = x0 + ker L = x0 + L(L|0). (8.10b)
x0 ∈L(L|b)
One can also express (8.10b) by saying one obtains all solutions to the inhomogeneous
system Lx = b by adding a particular solution of the inhomogeneous system to the set
of all solutions to the homogeneous system Lx = 0.
Proof. (a): (i) implies (ii) by (8.7). Since A corresponds to L with respect to the
respective standard bases of F n and F m , (ii) implies (iii) by Lem. 7.5(a). (iii) implies
(iv), since

rk(A|b) = dim h{c^A_1 , . . . , c^A_n , b}i = dim h{c^A_1 , . . . , c^A_n }i = rk(A),

where the middle equality holds by (iii).
(ii) L(A|0) = L(L|0) = {0}, i.e. the homogeneous system Lx = 0 has only the so-called
trivial solution x = 0.
(iii) rk(A) = n.
We recall from (8.1) and Def. 8.1 that, given a field F and m, n ∈ N, we are considering
linear systems
∀ j ∈ {1, . . . , m} :   Σ_{k=1}^{n} ajk xk = bj ,
The first nonzero elements in each row are called pivot elements (in (8.11), the positions
of the pivot elements are marked by squares). The columns c^A_k containing a pivot
element are called pivot columns and k ∈ {1, . . . , n} is then called a pivot index; an index
k ∈ {1, . . . , n} that is not a pivot index is called a free index. Let I^A_p , I^A_f ⊆ {1, . . . , n}
denote the sets of pivot indices and free indices, respectively. If A represents a linear
system, then the variables corresponding to pivot columns are called pivot variables. All
remaining variables are called free variables.
Example 8.8. The following matrix over R is in echelon form:
0 0 3 3 0 −1 0 3
A := 0 0 0 4 0 2 −3 2 .
0 0 0 0 0 0 1 1
It remains in echelon form if one adds zero rows at the bottom. For the linear system
Ax = b, the variables x3 , x4 , x7 are pivot variables, whereas x1 , x2 , x5 , x6 , x8 are free
variables.
Theorem 8.9. Let F be a field and m, n ∈ N. Let A ∈ M(m, n, F ), b ∈ M(m, 1, F ) ∼=
F m , and consider the linear system Ax = b. Assume the augmented matrix (A|b) to be
in echelon form.
(b) Let LA : F n −→ F m be the linear map associated with A according to Rem. 8.3.
Noting A to be in echelon form as well and using the notation from Def. 8.7, one
has
dim ker LA = dim L(A|0) = #IfA , dim Im LA = rk A = #IpA ,
i.e. the dimension of the kernel of A is given by the number of free variables, whereas
the dimension of the image of A is given by the number of pivot variables.
Proof. (a): The equivalence between (i) and (ii) was already shown in Th. 8.4.
“(ii)⇔(iii)”: As both A and (A|b) are in echelon form, their respective ranks are, clearly,
given by their respective number of nonzero rows. However, A and (A|b) have the same
number of nonzero rows if, and only if, b contains no pivot elements.
(b): As A is in echelon form, its number of nonzero rows (i.e. its rank) is precisely the
number of pivot indices, proving dim Im LA = rk A = #IpA . Since #IpA + #IfA = n =
dim ker LA + dim Im LA , the claimed dim ker LA = #IfA now also follows.
If the augmented matrix (A|b) of the linear system Ax = b is in echelon form and such
that its final column b contains no pivot element, then the linear system can be solved,
using so-called back substitution: One obtains a parameterized representation of L(A|b)
as follows: Starting at the bottom with the first nonzero row, one solves each row for the
corresponding pivot variable, in each step substituting the expressions for pivot variables
that were obtained in previous steps. The free variables are treated as parameters. A
particular element of L(A|b) can be obtained by setting all free variables to 0. We now
provide a precise formulation of this algorithm:
where, for l > ik , the αik l are defined recursively over k = N, . . . , 1 as follows: Assuming
αij l have already been defined for each j > k such that (8.12b) holds, solve the k-th
equation of Ax = b for xik :
    Σ_{i=i_k}^{n} a_{k i} x_i = b_k    ⇒    x_{i_k} = a_{k i_k}^{−1} b_k − a_{k i_k}^{−1} Σ_{i=i_k+1}^{n} a_{k i} x_i .    (8.13)
All variables xi in the last equation have indices i > ik and, inductively, we use (8.12b)
for ij > ik to replace all pivot variables in this equation for xik . The resulting expression
for xik has the form (8.12b) with αik l ∈ F , as desired.
From (αkl ), we now define column vectors
    s := (s_1 , . . . , s_n )^t ,        v_i := (v_{1i} , . . . , v_{ni} )^t   for each i ∈ I_f^A ,

where

    ∀ j ∈ {1, . . . , n} :   s_j := { α_{i_k 0}   for j = i_k ∈ I_p^A ,
                                    { 0           for j ∈ I_f^A ,

    ∀ (j, i) ∈ {1, . . . , n} × I_f^A :   v_{ji} := { α_{i_k i}   for j = i_k ∈ I_p^A ,
                                                    { 1           for j = i ,
                                                    { 0           for j ∈ I_f^A \ {i} .
Theorem 8.11. Consider the situation of Alg. 8.10 and let LA : F n −→ F m be the
linear map associated with A according to Rem. 8.3.
(a) Let x = (x1 , . . . , xn )t ∈ F n be a column vector. Then x ∈ L(A|b) if, and only if,
the xik , k ∈ {1, . . . , N }, satisfy the recursion over k = N, . . . , 1, given by (8.13) or,
equivalently, by (8.12b). In particular, x ∈ ker LA = L(A|0) if, and only if, the xik
satisfy the respective recursion with b1 = · · · = bN = 0.
(b) The set
B := {vi : i ∈ IfA } (8.14)
forms a basis of ker LA = L(A|0) and
L(A|b) = s + ker LA . (8.15)
then

    ∀ i ∈ I_f^A :   0 = Σ_{j∈I_f^A} λ_j v_{ji} = Σ_{j∈I_f^A} λ_j δ_{ji} = λ_i ,
Indeed, we have

    v_{i_k i} = α_{i_k i} = Σ_{l∈I_f^A : l>i_k} α_{i_k l} v_{li} = Σ_{l∈I_f^A : l>i_k} α_{i_k l} δ_{li} ,

which holds, since, if I_k := {l ∈ I_f^A : l > i_k } = ∅, then the sum is empty and, thus, 0,
with v_{i_k i} = 0 as well, due to i < i_k , and, if I_k ≠ ∅, then, due to the Kronecker δ, the
sum evaluates to α_{i_k i} as well. To prove (8.15), according to (8.10b), it suffices to show
s ∈ L(A|b) and, by (a), this is equivalent to
    ∀ k ∈ {1, . . . , N } :   s_{i_k} = α_{i_k 0} + Σ_{l∈I_f^A : l>i_k} α_{i_k l} s_l .
However, the validity of these equations is immediate from the definition of s, which, in
particular, implies the sums to vanish (sl = 0 for each l ∈ IfA ).
Example 8.12. Consider the linear system Ax = b, where
          1  2 −1  3  0  1                        1
          0  0  1  1 −1  1                        2
    A :=  0  0  0  0  1  1  ∈ M(4, 6, R),  b :=   3  ∈ R^4 .
          0  0  0  0  0  0                        0
Since (A|b) is in echelon form and b does not have any pivot elements, the set of solutions
L(A|b) is nonempty, and it can be obtained using back substitution according to Alg.
8.10: Using the same notation as in Alg. 8.10, we have
m = 4, n = 6, IpA = {1, 3, 5}, N = 3, IfA = {2, 4, 6}.
x5 = 3 − x6 ,
x3 = 2 − x6 + x5 − x4 = 5 − 2x6 − x4 ,
x1 = 1 − x6 − 3x4 + x3 − 2x2 = 6 − 3x6 − 4x4 − 2x2 .
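The back substitution above can be traced in code. The following sketch is not from the notes: it uses exact Fraction arithmetic, the helper name back_substitute is our own, and it reproduces the particular solution s of Alg. 8.10 obtained by setting the free variables x2, x4, x6 to 0:

```python
from fractions import Fraction

def back_substitute(A, b, pivots):
    """Solve an echelon system Ax = b for the pivot variables, walking the
    nonzero rows bottom-up; the free variables are left at 0, so the result
    is the particular solution s."""
    n = len(A[0])
    x = [Fraction(0)] * n
    for k in reversed(range(len(pivots))):
        j = pivots[k]                 # pivot index of the kth nonzero row
        x[j] = (b[k] - sum(A[k][i] * x[i] for i in range(j + 1, n))) / A[k][j]
    return x

# matrix and right-hand side of Example 8.12 (0-based indices)
A = [[1, 2, -1, 3, 0, 1],
     [0, 0, 1, 1, -1, 1],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 0, 0, 0]]
b = [Fraction(1), Fraction(2), Fraction(3), Fraction(0)]
x = back_substitute(A, b, pivots=[0, 2, 4])
# pivot variables come out as x5 = 3, x3 = 5, x1 = 6; x2, x4, x6 stay 0
```

With all free parameters set to 0, this agrees with the three formulas above.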
As seen in the previous section, if the augmented matrix (A|b) of the linear system
Ax = b is in echelon form, then the set of solutions can be obtained via back substitution.
Thus, it is desirable to transform the augmented matrix of a linear system into echelon
form without changing the set of solutions. This can be accomplished in finitely many
steps by the so-called Gaussian elimination algorithm of Alg. 8.17 below (cf. Th. 8.19),
using elementary row operations and variable substitutions, where variable substitutions
actually turn out to constitute particular elementary row operations.
(a) Row Switching: Switching two rows ri and rj , where i, j ∈ {1, . . . , m}.
(b) Row Multiplication: Replacing a row ri by some nonzero multiple λri of that row,
i ∈ {1, . . . , m}, λ ∈ F \ {0}.
(c) Row Addition: Replacing a row ri by the sum ri + λrj of that row and a multiple
of another row, (i, j) ∈ {1, . . . , m}2 , i 6= j, λ ∈ F .
Definition and Remark 8.14. Let F be a field. Consider a linear system
    ∀ j ∈ {1, . . . , m} :   Σ_{k=1}^{n} a_{jk} x_k = b_j ,
Comparing with Def. 8.13(c), we see that variable substitution is a particular case of
row addition: One obtains (8.17) by replacing row r_l of (A|b) by the sum r_l − a_{li} a_{ji}^{−1} r_j .
Proof. For each of the three elementary row operations, it is immediate from the arith-
metic laws holding in the field F that, for each x ∈ F n , one has x ∈ L(A|b) if, and
only if, x ∈ L(Aer |ber ) (where the inverse operation of row switching is merely switching
rows ri and rj again, the inverse operation of row multiplication by λ ∈ F \ {0} is row
multiplication by λ−1 , and the inverse operation of row addition ri 7→ ri + λrj , i 6= j, is
ri 7→ ri − λrj ).
Proof. Let N ∈ {1, . . . , n + 1} be the maximal k occurring during the Gaussian elimina-
tion algorithm, i.e. à := A(N ) . To prove that à is in echelon form, we show by induction
on k ∈ {1, . . . , N } that the first k − 1 columns of A(k) are in echelon form as well as the
first r(k) rows of A^(k) , with a_{ij}^{(k)} = 0 for each (i, j) ∈ {r(k), . . . , m} × {1, . . . , k − 1}. For
k = 1, the assertion is trivially true. By induction, we assume the assertion for k < N
and prove it for k + 1 (for k = 1, we do not assume anything). As already stated in
Rem. 8.18 above, the kth step of Alg. 8.17 does not change the first r(k) − 1 rows of A(k)
and it does not change the first k − 1 columns of A(k) , since (by induction hypothesis)
a_{ij}^{(k)} = 0 for each (i, j) ∈ {r(k), . . . , m} × {1, . . . , k − 1}. Moreover, if we are in the case
of Alg. 8.17(a), then
    ∀ i ∈ {r(k)+1, . . . , m} :   a_{ik}^{(k+1)} = a_{ik}^{(k)} − (a_{ik}^{(k)} / a_{r(k),k}^{(k)}) · a_{r(k),k}^{(k)} = 0 .
In each case, after the application of (a), (b), or (c) of Alg. 8.17, a_{ij}^{(k+1)} = 0 for each
(i, j) ∈ {r(k + 1), . . . , m} × {1, . . . , k}. We have to show that the first r(k + 1) rows of
A(k+1) are in echelon form. For Alg. 8.17(c), it is r(k + 1) = r(k) and there is nothing
to prove. For Alg. 8.17(a),(b), we know a_{r(k+1),j}^{(k+1)} = 0 for each j ∈ {1, . . . , k}, while
a_{r(k),k}^{(k+1)} ≠ 0, showing that the first r(k + 1) rows of A^(k+1) are in echelon form in each
case. Thus, we know that the first r(k + 1) rows of A(k+1) are in echelon form, and
all elements in the first k columns of A(k+1) below row r(k + 1) are zero, showing that
the first k columns of A(k+1) are also in echelon form. As, at the end of the Gaussian
elimination algorithm, r(k) = m or k = n + 1, we have shown à to be in echelon form.
That rk à = rk A is an immediate consequence of Cor. 8.16 combined with a simple
induction. If A = (B|b) and à = (B̃|b̃) represent the augmented matrices of the linear
systems Bx = b and B̃x = b̃, respectively, then L(B|b) = L(B̃|b̃) is an immediate
consequence of Th. 8.15 combined with a simple induction.
Remark 8.20. To avoid mistakes, especially when performing the Gaussian elimination
algorithm manually, it is advisable to check the row sums after each step. It obviously
suffices to consider how row sums are changed if (a) of Alg. 8.17 is carried out. Moreover,
let i ∈ {r(k) + 1, . . . , m}, as only rows with row numbers i ∈ {r(k) + 1, . . . , m} can be
changed by (a) in the kth step. If s_i^{(k)} = Σ_{j=1}^{n} a_{ij}^{(k)} is the sum of the ith row before (a)
has been carried out in the kth step, then

    s_i^{(k+1)} = s_i^{(k)} − (a_{ik}^{(k)} / a_{r(k),k}^{(k)}) · s_{r(k)}^{(k)}

must be the sum of the ith row after (a) has been carried out in the kth step.
Remark 8.21. Theorem 8.19 shows that the Gaussian elimination algorithm of Alg.
8.17 provides a reliable systematic way of solving linear systems (when combined with
the back substitution Alg. 8.10). It consists of a few simple rules, which makes it easy to implement
as computer code. However, the following Ex. 8.22 illustrates that Alg. 8.17 is not
always the most efficient way to obtain a solution and, especially when solving small
linear systems manually, it often makes sense to deviate from it. In the presence of
roundoff errors, there can also be good reasons to deviate from Alg. 8.17, which are
typically addressed in classes on Numerical Analysis (see, e.g. [Phi21, Sec. 5]).
One can quickly solve this linear system without making use of Alg. 8.17 and Alg.
8.10: The second equation yields x2 = 3, which, plugged into the third equation, yields
x1 − x3 = 3. Adding this to the first equation yields 7x1 = 7, i.e. x1 = 1. Thus, the first
equation then provides x3 = 4 − 6 = −2 and we obtain
    L(A|b) = { (1, 3, −2)^t } .    (8.20)
To illustrate Alg. 8.17 and Alg. 8.10, we solve (8.19) once again, first using Alg. 8.17 to
transform the system to echelon form, then using Alg. 8.10 to obtain the solution: For
each k ∈ {1, 2, 3}, we provide (A^(k) |b^(k) ), r(k), and the row sums s_i^{(k)} , i ∈ {1, 2, 3}:
                                 6  0  1 |  4
    (A^(1) |b^(1) ) := (A|b) =   0  2  0 |  6  ,    r(1) := 1,
                                 1 −1 −1 |  0

    s_1^{(1)} = 11,   s_2^{(1)} = 8,   s_3^{(1)} = −1.
For k = 2, we apply Alg. 8.17(a), where we replace the third row of (A^(1) |b^(1) ) by the
third row plus (− 1/6) times the first row to obtain
                         6  0    1  |    4
    (A^(2) |b^(2) ) :=   0  2    0  |    6  ,    r(2) := r(1) + 1 = 2,
                         0 −1  −7/6 | −2/3

    s_1^{(2)} = 11,   s_2^{(2)} = 8,   s_3^{(2)} = −17/6.
has a solution. Let us answer this question by transforming the system to echelon
form via elementary row operations: Replacing the 2nd row by the 2nd row plus
(−2) times the 1st row, and replacing the 4th row by the 4th row plus (−3) times
the 1st row yields
1 0 1 | 0
0 3 −1 | 0
0 2 1 | 2 .
0 1 −2 | −2
Switching rows 2 and 4 yields
    1  0  1 |  0
    0  1 −2 | −2
    0  2  1 |  2  .
    0  3 −1 |  0
Replacing the 3rd row by the 3rd row plus (−2) times the 2nd row, and replacing
the 4th row by the 4th row plus (−3) times the 2nd row yields
1 0 1 | 0
0 1 −2 | −2
0 0 5 | 6 .
0 0 5 | 6
Replacing the 4th row by the 4th row plus (−1) times the 3rd row yields
                 1  0  1 |  0
                 0  1 −2 | −2
    (Ã|b̃) :=    0  0  5 |  6  .
                 0  0  0 |  0
(b) In Ex. 8.2(a), we saw that the question whether the set M := {v1 , v2 , v3 }, with v1 , v2 , v3
as in (a), is a linearly dependent subset of R^4 can be equivalently posed as the
question whether the linear system Ax = 0, with the same matrix A as in (a), has a
solution x = (x1 , x2 , x3 ) ≠ (0, 0, 0). From (a), we know rk A = rk Ã = 3, implying
L(A|0) = {0}, so that M is linearly independent.
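The rank argument can be double-checked numerically; the following numpy sketch (numpy is an assumption, not part of the notes) recomputes the ranks of Ã and (Ã|b̃) from the echelon form obtained in (a):

```python
import numpy as np

# the echelon form (Ã|b̃) computed in (a); Ã is the left 4×3 block
Ab = np.array([[1, 0,  1,  0],
               [0, 1, -2, -2],
               [0, 0,  5,  6],
               [0, 0,  0,  0]], dtype=float)
A_t = Ab[:, :3]
rk_A = np.linalg.matrix_rank(A_t)    # rk Ã = number of nonzero rows = 3
rk_Ab = np.linalg.matrix_rank(Ab)    # b̃ contains no pivot, so the rank is unchanged
# rk Ã = 3 = n forces L(A|0) = {0}: the homogeneous system has only the
# trivial solution, hence {v1, v2, v3} is linearly independent
```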
8.4 LU Decomposition
8.4.1 Definition and Motivation
If one needs to solve linear systems Ax = b with the same matrix A, but varying b (as,
e.g., in the problem of finding an inverse to an n × n matrix A, cf. Ex. 8.2(c)), then it is
not efficient to separately transform each (A|b) into echelon form. A more efficient way
aims at decomposing A into simpler matrices L and U , i.e. A = LU , then facilitating
the simpler solution of Ax = LU x = b when varying b (i.e. without having to transform
(A|b) into echelon form for each new b), cf. Rem. 8.25 below.
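The payoff is that, once A = LU is known, each new b costs only one forward and one backward substitution on triangular systems. A minimal sketch (plain Python; the function names are our own) on a toy 2 × 2 example:

```python
def forward_sub(L, b):
    """Solve Lz = b for unipotent lower-triangular L (top row first)."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = b[i] - sum(L[i][j] * z[j] for j in range(i))
    return z

def backward_sub(U, z):
    """Solve Ux = z for invertible upper-triangular U (bottom row first)."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# toy decomposition A = LU with A = [[3, 1], [6, 6]]
L = [[1, 0], [2, 1]]
U = [[3, 1], [0, 4]]
b = [5, 18]
x = backward_sub(U, forward_sub(L, b))   # solves Ax = LUx = b
```

Solving for another right-hand side b reuses L and U unchanged; only the two substitutions are repeated.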
Definition 8.24. Let S be a ring with unity and A ∈ M(m, n, S) with m, n ∈ N. A
decomposition
A = LÃ (8.21a)
is called an LU decomposition of A if, and only if, L ∈ BL1m (S) (i.e. L is unipotent lower
triangular) and à ∈ M(m, n, S) is in echelon form. If A is an m × m matrix, then
à ∈ BUm (S) (i.e. à is upper triangular) and (8.21a) is an LU decomposition in the
strict sense, which is emphasized by writing U := Ã:
A = LU. (8.21b)
Remark 8.25. Let F be a field. Suppose the goal is to solve linear systems Ax = b
with fixed matrix A ∈ M(m, n, F ) and varying vector b ∈ F^m . Moreover, suppose one
knows an LU decomposition A = LÃ. Then
0 = a11 = l11 u11 = u11 , showing the first column of U to be 0, i.e. U is singular, in
contradiction to A being regular.
—
Even though the previous example shows not every matrix has an LU decomposition, we
will see in Th. 8.34 below that every matrix does have an LU decomposition, provided
one is first allowed to permute its rows in a suitable manner.
The LU decomposition of Th. 8.34 below can be obtained, with virtually no extra
effort, by performing the Gaussian elimination Alg. 8.17. The proof will be founded
on the observation that each elementary row operation can be accomplished via left
multiplication with a suitable matrix: For row multiplication, one multiplies from the
left with a diagonal matrix, for row addition, one multiplies from the left with a so-
called Frobenius matrix or with the transpose of such a matrix (cf. Cor. 8.29 below);
for row switching, one multiplies from the left with a so-called permutation matrix (cf.
Cor. 8.33(a) below) – where, for Alg. 8.17 and the proof of Th. 8.34, we only need row
switching and row addition by means of a Frobenius matrix (not its transpose).
Definition 8.27. Let n ∈ N and let S be a ring with unity. We call a unipotent lower
triangular matrix A = (aji ) ∈ BL1n (S) a Frobenius matrix of index k, k ∈ {1, . . . , n}, if,
and only if, a_{ji} = 0 for each i ∉ {j, k}, i.e. if, and only if, it has the following form:
A agrees with the identity matrix Id_n , except for the (arbitrary) entries
a_{k+1,k} , . . . , a_{n,k} below the diagonal in its kth column.    (8.23)
We define
    Fro_n^k(S) := { A ∈ M(n, S) : A is Frobenius of index k } .
and C = (cji ) = BA ∈ M(m, n, S), then
    c_{ji} = Σ_{α=1}^{m} b_{jα} a_{αi} = { a_{ji}                        for each j ∈ {1, . . . , k},        (8.24)
                                          { b_{jk} a_{ki} + a_{ji}        for each j ∈ {k + 1, . . . , m}.
(c) Let k ∈ {1, . . . , n}. If L = (lji ) ∈ Frokn (S), then L−1 ∈ Frokn (S). More precisely,
if L ∈ Fro_n^k(S) has the entries l_{k+1,k} , . . . , l_{n,k} below the diagonal in its kth
column, then L^{−1} ∈ Fro_n^k(S) is the Frobenius matrix of index k with the entries
−l_{k+1,k} , . . . , −l_{n,k} below the diagonal in its kth column.
(d) If
L = (l_{ji}) ∈ BL_n^1(S), with entries l_{ji} (for j > i) below the unit diagonal,
is an arbitrary unipotent lower triangular matrix, then it is the product of the fol-
lowing n − 1 Frobenius matrices:
L = L̃1 · · · L̃n−1 ,
where, for each k ∈ {1, . . . , n − 1}, L̃_k ∈ Fro_n^k(S) is the Frobenius matrix of index k
with the entries l_{k+1,k} , . . . , l_{n,k} below the diagonal in its kth column.
Proof. (a): Formula (8.24) is immediate from the definition of matrix multiplication.
(b) follows by applying (a) with B := Lk and A := A(k) .
(c): Applying (a) with A := L and B := L−1 , where L−1 is defined as in the statement
of (c), we can use (8.24) to compute C = BA:
    c_{ji} = a_{ji} = { 0   for each j ∈ {1, . . . , k}, i ≠ j,
                      { 1   for each j ∈ {1, . . . , k}, i = j,

and, for j ∈ {k + 1, . . . , n}, (8.24) yields

    c_{ji} = b_{jk} a_{ki} + a_{ji} = { −l_{jk} · 0 + 0 = 0          for i ≠ j, k,
                                       { −l_{jk} · 1 + l_{jk} = 0    for i = k,
                                       { −l_{jk} · 0 + 1 = 1         for i = j,

showing C = Id_n .
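Part (c) is easy to confirm numerically; the following numpy check (an illustration, not part of the notes) negates the subdiagonal entries of a 4 × 4 Frobenius matrix of index 2 and verifies that this yields the inverse:

```python
import numpy as np

# a 4×4 Frobenius matrix of index 2 (nonzero entries below the diagonal
# occur only in column 2, using 1-based indexing as in the text)
Lf = np.array([[1,  0, 0, 0],
               [0,  1, 0, 0],
               [0,  5, 1, 0],
               [0, -3, 0, 1]], dtype=float)
# Prop. (c): the inverse simply negates those subdiagonal entries
Lf_inv = Lf.copy()
Lf_inv[2, 1] = -5.0
Lf_inv[3, 1] = 3.0
ok = np.allclose(Lf @ Lf_inv, np.eye(4)) and np.allclose(Lf_inv @ Lf, np.eye(4))
```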
Corollary 8.29. Consider the situation of Def. 8.13, i.e. let F be a field, m, n ∈ N,
and A ∈ M(m, n, F ) with rows r1 , . . . , rm .
(a) The elementary row operation of row multiplication, i.e. of replacing row ri by some
nonzero multiple λri of that row, i ∈ {1, . . . , m}, λ ∈ F \{0}, can be accomplished by
left-multiplying A by the diagonal matrix Dλ := diag(d1 , . . . , dm ) ∈ Dm (F ), where
    ∀ j ∈ {1, . . . , m} :   d_j := { λ   for j = i,
                                    { 1   for j ≠ i.
(b) The elementary row operation of row addition, i.e. of replacing row ri by the sum
ri + λrk of that row and a multiple of another row, (i, k) ∈ {1, . . . , m}2 , i 6= k,
λ ∈ F , can be accomplished, for i > k, by left-multiplying A by the Frobenius
matrix F_λ ∈ Fro_m^k(F), namely the Frobenius matrix of index k whose entries below
the diagonal in the kth column are given by

    ∀ j ∈ {k + 1, . . . , m} :   f_{j,k} := { λ   for j = i,
                                             { 0   for j ≠ i,
thereby establishing the case. For i < k, we obtain, for (bji ) := (Fλ )t , (cji ) := (Fλ )t A,
    c_{jl} = Σ_{α=1}^{m} b_{jα} a_{αl} = { a_{jl}                                      for each j ∈ {1, . . . , m}, j ≠ i,
                                          { a_{jl} + b_{jk} a_{kl} = a_{jl} + λ a_{kl}   for j = i,
Definition 8.30. Let n ∈ N and recall the definition of the symmetric group Sn from
Ex. 4.9(b). Moreover, let F be a field (actually, the following definition still makes sense
as long as F is a set containing elements 0 and 1). For each permutation π ∈ Sn , we
define an n × n matrix
    P_π := (e_{π(1)}^t  e_{π(2)}^t  · · ·  e_{π(n)}^t) ∈ M(n, F ),    (8.26)
where the ei denote the standard unit (row) vectors of F n . The matrix Pπ is called the
permutation matrix (in M(n, F )) associated with π. We define
    Per_n(F ) := { P_π ∈ M(n, F ) : π ∈ S_n } .
(a) If π ∈ Sn and the columns of Pπ ∈ Pern (F ) are given according to (8.26), then the
rows of Pπ are given according to the inverse permutation π −1 ,
           e_{π^{−1}(1)}
           e_{π^{−1}(2)}
    P_π =       ⋮         ,        (P_π)^t = P_{π^{−1}} .    (8.27)
           e_{π^{−1}(n)}
(b) P := (pji ) ∈ M(n, F ) is a permutation matrix if, and only if, each row and each
column of P have precisely one entry 1 and all other entries of P are 0.
Proof. (a): Consider the ith row of (pji ) := Pπ . Then pji = 1 if, and only if, j = π(i)
(since pji also belongs to the ith column). Thus, pji = 1 if, and only if, i = π −1 (j),
which means that the jth row is eπ−1 (j) .
(b): Suppose P = Pπ , π ∈ Sn , n ∈ N, is a permutation matrix. Then (8.26) implies
that each column of P has precisely one 1 and all other elements are 0; (8.27) implies
that each row of P has precisely one 1 and all other elements are 0.
Conversely, suppose that P ∈ M(n, F ) is such that each row and each column of P have
precisely one entry 1 and all other entries of P are 0. Define a map π : {1, . . . , n} −→
{1, . . . , n} by letting π(i) be such that pπ(i),i = 1. Then
    P = (e_{π(1)}^t  e_{π(2)}^t  · · ·  e_{π(n)}^t) ,
Thus,

    P_π A = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ji} P_π (0 . . . e_j^t . . . 0)              (e_j^t in the ith column)

          = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ji} (0 . . . (e_{π^{−1}(1)} · e_j , . . . , e_{π^{−1}(n)} · e_j )^t . . . 0)

          = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ji} (0 . . . e_{π(j)}^t . . . 0) ,           (e_{π(j)}^t in the ith column)

where the last equality holds since e_{π^{−1}(l)} · e_j = 1 if, and only if, l = π(j).
Both surjectivity and injectivity of I are clear from Def. 8.30 and, then, Pern (F ) must
be a group by Prop. 4.11(c). Combining (8.28) with (b) proves (8.29) and, thus, also
shows Pern (F ) to be a subgroup of GLn (F ).
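The identity (P_π)^t = P_{π^{−1}} from (8.27) can be illustrated with a small numpy sketch (0-based indices; the helper name perm_matrix is our own):

```python
import numpy as np

def perm_matrix(pi):
    """Permutation matrix with columns e_{π(0)}, ..., e_{π(n-1)} (0-based)."""
    n = len(pi)
    P = np.zeros((n, n))
    for i, j in enumerate(pi):
        P[j, i] = 1.0          # column i is the standard basis vector e_{π(i)}
    return P

pi = [2, 0, 3, 1]              # a permutation of {0, 1, 2, 3}
pi_inv = [pi.index(k) for k in range(4)]
P = perm_matrix(pi)
# (Pπ)^t = P_{π^{-1}}; in particular, Pπ is invertible with (Pπ)^{-1} = (Pπ)^t
transpose_is_inverse_perm = np.array_equal(P.T, perm_matrix(pi_inv))
orthogonal = np.array_equal(P.T @ P, np.eye(4))
```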
Definition and Remark 8.32. Let n ∈ N and let F be a field. A permutation matrix
Pτ ∈ Pern (F ), corresponding to a transposition τ = (ji) ∈ Sn (τ permutes i and j
and leaves all other elements fixed), is called a transposition matrix and is denoted by
P_{ji} := P_τ . The transposition matrix P_{ji} agrees with the identity matrix Id_n , except
that its entries at the positions (i, i) and (j, j) are 0, while its entries at the positions
(i, j) and (j, i) are 1.    (8.30)
Since every permutation is the composition of a finite number of transpositions (which we
will prove in Linear Algebra II), it is implied by Prop. 8.31(e) that every permutation
matrix is the product of a finite number of transposition matrices. It is immediate
from Prop. 8.31(c),(d) that left multiplication of a matrix A ∈ M(n, m), m ∈ N,
by Pji ∈ Pern (F ) switches the ith and jth row of A, whereas right multiplication of
A ∈ M(m, n) by Pji ∈ Pern (F ) switches the ith and jth column of A.
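The row/column switching effect of transposition matrices can be checked directly (a numpy sketch; indices are 1-based in the text, 0-based in the code):

```python
import numpy as np

# transposition matrix P_{13} in M(3, R): identity with rows 1 and 3 swapped
P13 = np.array([[0, 0, 1],
                [0, 1, 0],
                [1, 0, 0]], dtype=float)
A = np.array([[1, 2],
              [3, 4],
              [5, 6]], dtype=float)
left = P13 @ A          # switches rows 1 and 3 of A
B = A.T                 # a 2×3 matrix
right = B @ P13         # switches columns 1 and 3 of B
```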
Corollary 8.33. Consider the situation of Def. 8.13, i.e. let F be a field, m, n ∈ N,
and A ∈ M(m, n, F ) with rows r1 , . . . , rm .
(a) The elementary row operation of row switching, i.e. of switching rows ri and rj ,
where i, j ∈ {1, . . . , m}, can be accomplished by left-multiplying A by the transposi-
tion matrix Pij ∈ Perm (F ).
(b) Let A(k) 7→ A(k+1) be the matrix transformation occurring in the kth step of the
Gaussian elimination Alg. 8.17 and assume the kth step is being performed by Alg.
8.17(b), i.e. by first switching the ith row with the r(k)th row and then, for each i ∈
{r(k)+1, . . . , m}, replacing the (new) ith row by the (new) ith row plus −a_{ik}^{(k)}/a_{r(k),k}^{(k)}
times the (new) r(k)th row, then
A(k+1) = Lk Pi,r(k) A(k) (8.31)
with transposition matrix P_{i,r(k)} ∈ Per_m(F ) and with L_k ∈ Fro_m^{r(k)}(F ) being the
Frobenius matrix of index r(k) defined in (8.25b).
Proof. (a) is immediate from Prop. 8.31(c) (cf. Def. and Rem. 8.32).
(b) is due to (a) combined with Prop. 8.28(b).
8.4.3 Existence
Proof. Let N ∈ {0, . . . , n} be the final number of steps that is needed to perform the
Gaussian elimination Alg. 8.17. If N = 0, then A consists of precisely one row and there
is nothing to prove (set L := P := (1)). If N ≥ 1, then let A(1) := A and, recursively, for
each k ∈ {1, . . . , N }, let A(k+1) be the matrix obtained in the kth step of the Gaussian
elimination algorithm. If Alg. 8.17(b) is used in the kth step, then, according to Cor.
8.33(b),
A(k+1) = Lk Pk A(k) , (8.33)
where P_k := P_{i,r(k)} ∈ Per_m(F ) is the transposition matrix that switches rows r(k) and i,
while L_k ∈ Fro_m^{r(k)}(F ) is the Frobenius matrix of index r(k) given by (8.25b). If (a) or
(c) of Alg. 8.17 is used in the kth step, then (8.33) also holds, namely with Pk := Idm
for (a) and with Lk := Pk := Idm for (c). By induction, (8.33) implies
Defining
and both upper and lower triangular, showing E = Idm . This, in turn, yields U1 = U2
as well as L1 = L2 , i.e. the claimed uniqueness of the LU decomposition for invertible
A.
Remark 8.35. Note that, even for invertible A ∈ GLn (F ), the triple (P, L, Ã) of (8.32)
is, in general, not unique. For example
     1 0     1 1     1 0     1  1
     0 1  ·  1 0  =  1 1  ·  0 −1  ,
     P_1     A       L_1     Ã_1

     0 1     1 1     1 0     1 0
     1 0  ·  1 0  =  1 1  ·  0 1  .
     P_2     A       L_2     Ã_2
Remark 8.36. Let F be a field. Suppose the goal is to solve linear systems Ax = b
with fixed matrix A ∈ M(m, n, F ) and varying vector b ∈ F^m . Moreover, suppose one
knows an LU decomposition P A = LÃ with permutation matrix P ∈ Perm (F ). One
can then proceed as in Rem. 8.25 above, except with A replaced by P A and b replaced
by P b. As already indicated in Rem. 8.25, in the presence of roundoff errors, such errors
might be magnified by the use of an LU decomposition. If one works over the fields R
or C, then there are other decompositions (e.g. the so-called QR decomposition) that
tend to behave more benignly in the presence of roundoff errors (cf., e.g., [Phi21, Rem.
5.15] and [Phi21, Sec. 5.4.3]).
8.4.4 Algorithm
Let us crystallize the proof of Th. 8.34 into an algorithm to actually compute the
matrices L, P , and Ã occurring in the decomposition (8.32) of a given A ∈ M(m, n, F ). As
in the proof of Th. 8.19, let N ∈ {1, . . . , n + 1} be the maximal k occurring during the
Gaussian elimination Alg. 8.17 applied to A.
(a) Algorithm for Ã: Starting with A(1) := A, the final step of Alg. 8.17 yields the
matrix à := A(N ) ∈ M(m, n, F ) in echelon form.
(b) Algorithm for P : Starting with P (1) := Idm , in the kth step of Alg. 8.17, define
P (k+1) := Pk P (k) , where Pk := Pi,r(k) if rows r(k) and i are switched according to
Alg. 8.17(b), and Pk := Idm , otherwise. In the last step, one obtains P := P (N ) .
(c) Algorithm for L: Starting with L(1) := 0 (zero matrix), we obtain L(k+1) from L(k)
in the kth step of Alg. 8.17 as follows: For Alg. 8.17(c), set L(k+1) := L(k) . If rows
r(k) and i are switched according to Alg. 8.17(b), then we first switch rows r(k) and
i in L^(k) to obtain some L̃^(k) (this conforms to the definition of the L′_k in (8.36); we
will come back to this point below). For Alg. 8.17(a) and for the elimination step of
Alg. 8.17(b), we first copy all elements of L^(k) (resp. L̃^(k) ) to L^(k+1) , but then change
the elements of the r(k)th column according to (8.25b): Set l_{i,r(k)}^{(k+1)} := a_{ik}^{(k)}/a_{r(k),k}^{(k)}
for each i ∈ {r(k) + 1, . . . , m}. In the last step, we obtain L from L(N ) by setting
all elements on the diagonal to 1 (postponing this step to the end avoids worrying
about the diagonal elements when switching rows earlier).
To see that this procedure does, indeed, provide the correct L, we go back to the
proof of Th. 8.34: From (8.38), (8.39), and (8.36), it follows that the r(k)th column
of L is precisely the r(k)th column of L_k^{−1} permuted according to P_N · · · P_{k+1} . This
is precisely what is described in (c) above: We start with the r(k)th column of L_k^{−1}
by setting l_{i,r(k)}^{(k+1)} := a_{ik}^{(k)}/a_{r(k),k}^{(k)} and then apply P_{k+1} , . . . , P_N during the remaining
steps.
Remark 8.37. Note that in the kth step of the algorithm for Ã, we eliminate all
elements of the kth column below the row with number r(k), while in the kth step of
the algorithm for L, we populate elements of the r(k)th column below the diagonal for
the first time. Thus, when implementing the algorithm in practice, one can save memory
capacity by storing the new elements for L at the locations of the previously eliminated
elements of A. This strategy is sometimes called compact storage.
Example 8.38. Let us determine the LU decomposition (8.32) (i.e. the matrices P , L,
and Ã) for the matrix
          1 4 2 3
          0 0 1 4
    A :=  2 6 3 1  ∈ M(4, 4, R).
          1 2 1 0
(where the function r is the one introduced in the Gaussian elimination Alg. 8.17). We
                  0 0 0 0                      1  4  2  3
                  2 0 0 0                      0 −2 −1 −5
    P_23 L^(2) =   0 0 0 0  ,    P_23 A^(2) =   0  0  1  4  .
                  1 0 0 0                      0 −2 −1 −3
Eliminating the second column yields

              0 0 0 0              1  4  2  3
              2 0 0 0              0 −2 −1 −5
    L^(3) =    0 0 0 0  ,  A^(3) =  0  0  1  4  ,    r(3) = r(2) + 1 = 3.
              1 1 0 0              0  0  0  2
Coincidentally, elimination of the third column does not require any additional work
and we have
                              1 0 0 0                               1 0 0 0
                              0 0 1 0                               2 1 0 0
    P = P^(4) = P^(3) =        0 1 0 0  ,    L^(4) = L^(3) ,   L =   0 0 1 0  ,
                              0 0 0 1                               1 1 0 1

                             1  4  2  3
                             0 −2 −1 −5
    Ã = A^(4) = A^(3) =       0  0  1  4  ,    r(4) = r(3) + 1 = 4.
                             0  0  0  2
Recall that L is obtained from L(4) by setting the diagonal values to 1. One checks that,
indeed, P A = LÃ.
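The final check P A = LÃ of Ex. 8.38 can also be done by machine (a numpy sketch):

```python
import numpy as np

A = np.array([[1, 4, 2, 3],
              [0, 0, 1, 4],
              [2, 6, 3, 1],
              [1, 2, 1, 0]], dtype=float)
P = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)
L = np.array([[1, 0, 0, 0],
              [2, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 1, 0, 1]], dtype=float)
At = np.array([[1,  4,  2,  3],
               [0, -2, -1, -5],
               [0,  0,  1,  4],
               [0,  0,  0,  2]], dtype=float)
decomposition_holds = np.allclose(P @ A, L @ At)   # P A = L Ã
```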
Example 8.39. We remain in the situation of the previous Ex. 8.38. We see that
rk A = rk à = 4, showing A to be invertible. In Ex. 8.2(c), we observed that we obtain
the columns vk of A−1 by solving the linear systems
Av1 = e1 , . . . , Avn = en ,
where n = 4 in the present case and where e1 , . . . , en denote the standard (column) basis
vectors. We now determine the vk using the LU decomposition of Ex. 8.38 together with
the strategies described in Rem. 8.36 and Rem. 8.25: First, with
1 0 0 0
2 1 0 0
L= 0 0 1 0
1 1 0 1
we solve

    Lz_1 = P e_1 ,   Lz_2 = P e_2 ,   Lz_3 = P e_3 ,   Lz_4 = P e_4 ,

obtaining

                                 1  0  0  0
    (z_1 , z_2 , z_3 , z_4 ) =  −2  0  1  0  .
                                 0  1  0  0
                                 1  0 −1  1
Next, with
1 4 2 3
0 −2 −1 −5
à =
0 0
1 4
0 0 0 2
we solve
Ãv1 = z1 , Ãv2 = z2 , Ãv3 = z3 , Ãv4 = z4 ,
A AXIOMATIC SET THEORY 184
Thus,

                      2  0 −6  14
    A^{−1} = (1/4) ·   3 −2 −1  −1  .
                     −8  4  8  −8
                      2  0 −2   2
One verifies
               1 4 2 3              2  0 −6  14     1 0 0 0
    AA^{−1} =  0 0 1 4  · (1/4) ·   3 −2 −1  −1  =  0 1 0 0  .
               2 6 3 1             −8  4  8  −8     0 0 1 0
               1 2 1 0              2  0 −2   2     0 0 0 1
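The inverse computed in Ex. 8.39 can be verified numerically as well (a numpy sketch):

```python
import numpy as np

A = np.array([[1, 4, 2, 3],
              [0, 0, 1, 4],
              [2, 6, 3, 1],
              [1, 2, 1, 0]], dtype=float)
A_inv = np.array([[ 2,  0, -6, 14],
                  [ 3, -2, -1, -1],
                  [-8,  4,  8, -8],
                  [ 2,  0, -2,  2]], dtype=float) / 4
# both products must give the identity, confirming A_inv = A^{-1}
left_ok = np.allclose(A @ A_inv, np.eye(4))
right_ok = np.allclose(A_inv @ A, np.eye(4))
```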
Suppose X ∈ X. Then X is a set that contains itself. But X was defined to contain
only sets that do not contain themselves, i.e. X ∉ X.
So suppose X ∉ X. Then X is a set that does not contain itself. Thus, by the definition
of X, X ∈ X.
Perhaps you think Russell’s construction is rather academic, but it is easily translated
into a practical situation. Consider a library. The catalog C of the library should contain
all the library’s books. Since the catalog itself is a book of the library, it should occur
as an entry in the catalog. So there can be catalogs such as C that have themselves as
an entry and there can be other catalogs that do not have themselves as an entry. Now
one might want to have a catalog X of all catalogs that do not have themselves as an
entry. As in Russell’s antinomy, one is led to the contradiction that the catalog X must
have itself as an entry if, and only if, it does not have itself as an entry.
One can construct arbitrarily many versions of this antinomy; we give just one more:
Consider a small town with a barber who, each day, shaves precisely those inhabitants
who do not shave themselves. The poor barber now faces a terrible dilemma: He will
have to shave himself if, and only if, he does not shave himself.
To avoid contradictions such as Russell’s antinomy, axiomatic set theory restricts the
construction of sets via so-called axioms, as we will see below.
Definition A.1. (a) The language of set theory consists precisely of the following sym-
bols: ∧, ¬, ∃, (, ), ∈, =, vj , where j = 1, 2, . . . .
(b) A set-theoretic formula is a finite string of symbols from the above language of set
theory that can be built using the following recursive rules:
Remark A.3. It is noted that, for a given finite string of symbols, a computer can, in
principle, check in finitely many steps whether the string constitutes a set-theoretic formula
or not. The symbols that can occur in a set-theoretic formula are to be interpreted
as follows: The variables v1 , v2 , . . . are variables for sets. The symbols ∧ and ¬ are
to be interpreted as the logical operators of conjunction and negation as described in
Sec. 1.2.2. Similarly, ∃ stands for an existential quantifier as in Sec. 1.4: The statement
∃vj (φ) means “there exists a set vj that has the property φ”. Parentheses ( and ) are
used to make clear the scope of the logical symbols ∃, ∧, ¬. Where the symbol ∈ occurs,
it is interpreted to mean that the set to the left of ∈ is contained as an element in the
set to the right of ∈. Similarly, = is interpreted to mean that the sets occurring to the
left and to the right of = are equal.
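The claim of Rem. A.3, that such a check can be mechanized, can be illustrated by a small recursive-descent checker. The recursive rules of Def. A.1(b) are not reproduced in this excerpt, so the grammar coded below (atomic formulas vi ∈ vj and vi = vj; composite formulas ¬(φ), (φ)∧(ψ), ∃vi (φ)) is the standard one and should be read as an assumption:

```python
def is_var(t):
    """A variable token is v1, v2, ... ."""
    return t.startswith("v") and t[1:].isdigit()

def parse(ts):
    """Try to parse one formula from the front of ts; return (success, remainder)."""
    if len(ts) >= 3 and is_var(ts[0]) and ts[1] in ("∈", "=") and is_var(ts[2]):
        return True, ts[3:]                       # atomic: vi ∈ vj or vi = vj
    if ts and ts[0] == "¬":                       # negation: ¬(φ)
        return parse_parenthesized(ts[1:])
    if ts and ts[0] == "∃" and len(ts) > 1 and is_var(ts[1]):
        return parse_parenthesized(ts[2:])        # quantifier: ∃vi (φ)
    if ts and ts[0] == "(":                       # conjunction: (φ)∧(ψ)
        ok, rest = parse_parenthesized(ts)
        if ok and rest and rest[0] == "∧":
            return parse_parenthesized(rest[1:])
    return False, ts

def parse_parenthesized(ts):
    """Parse '(' formula ')' from the front of ts."""
    if not ts or ts[0] != "(":
        return False, ts
    ok, rest = parse(ts[1:])
    if ok and rest and rest[0] == ")":
        return True, rest[1:]
    return False, ts

def is_formula(tokens):
    """Decide in finitely many steps whether the token list is a formula."""
    ok, rest = parse(tokens)
    return ok and rest == []

# ∃v1 (v1 = v1), the formula of the axiom of existence, checks out:
# is_formula(["∃", "v1", "(", "v1", "=", "v1", ")"]) is True
```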
(φ) ∨ (ψ) is short for ¬((¬(φ)) ∧ (¬(ψ))) (cf. Th. 1.11(j)), (A.2a)
(φ) ⇒ (ψ) is short for (¬(φ)) ∨ (ψ) (cf. Th. 1.11(a)), (A.2b)
(φ) ⇔ (ψ) is short for ((φ) ⇒ (ψ)) ∧ ((ψ) ⇒ (φ)) (cf. Th. 1.11(b)). (A.2c)
Moreover,
Remark A.5. Even though axiomatic set theory requires the use of set-theoretic for-
mulas as described above, the systematic study of formal symbolic languages is the
subject of the field of mathematical logic and is beyond the scope of this class (see, e.g.,
[EFT07]). In Def. and Rem. 1.15, we defined a proof of statement B from statement A1
as a finite sequence of statements A1 , A2 , . . . , An such that, for 1 ≤ i < n, Ai implies
Ai+1 , and An implies B. In the field of proof theory, also beyond the scope of this class,
such proofs are formalized via a finite set of rules that can be applied to (set-theoretic)
formulas (see, e.g., [EFT07, Sec. IV], [Kun12, Sec. II]). Once proofs have been formal-
ized in this way, one can, in principle, mechanically check if a given sequence of symbols
does, indeed, constitute a valid proof (without even having to understand the actual
meaning of the statements). Indeed, several different computer programs have been
devised that can be used for automatic proof checking, for example Coq [Wik15a], HOL
Light [Wik15b], and Isabelle [Wik15c] to name just a few.
the mathematical structure called a group), and then to study the logical consequences
of these axioms (for example, group theory studies the statements that are true for all
groups as a consequence of the group axioms). For a given system of axioms, the ques-
tion if there exists an object satisfying all the axioms in the system (i.e. if the system
of axioms is consistent, i.e. free of contradictions) can be extremely difficult to answer.
—
We are now in a position to formulate and discuss the axioms of axiomatic set the-
ory. More precisely, we will present the axioms of Zermelo-Fraenkel set theory, usually
abbreviated as ZF, which are Axiom 0 – Axiom 8 below. While there exist various
set theories in the literature, each set theory defined by some collection of axioms, the
axioms of ZFC, consisting of the axioms of ZF plus the axiom of choice (Axiom 9, see
Sec. A.4 below), are used as the foundation of mathematics currently accepted by most
mathematicians.
Axiom 0 Existence:
∃X (X = X).
Recall that this is just meant to be a more readable transcription of the
set-theoretic formula ∃v1 (v1 = v1 ). The axiom of existence states that there
exists (at least one) set X.
In Def. 1.18 two sets are defined to be equal if, and only if, they contain precisely the
same elements. In axiomatic set theory, this is guaranteed by the axiom of extensionality:
Axiom 1 Extensionality:
∀X ∀Y ((∀z (z ∈ X ⇔ z ∈ Y )) ⇒ X = Y ).
Following [Kun12], we assume that the substitution property of equality is part of the
underlying logic, i.e. if X = Y , then X can be substituted for Y and vice versa without
changing the truth value of a (set-theoretic) formula. In particular, this yields the
converse to extensionality:
∀X ∀Y (X = Y ⇒ ∀z (z ∈ X ⇔ z ∈ Y )).
is an axiom. Thus, the comprehension scheme states that, given the set X,
there exists (at least one) set Y , containing precisely the elements of X that
have the property φ.
Remark A.8. Comprehension does not provide uniqueness. However, if both Y and
Y ′ are sets containing precisely the elements of X that have the property φ, then
∀x (x ∈ Y ⇔ (x ∈ X ∧ φ) ⇔ x ∈ Y ′),
and, then, extensionality implies Y = Y ′ . Thus, due to extensionality, the set Y given
by comprehension is unique, justifying the notation
{x : x ∈ X ∧ φ} := {x ∈ X : φ} := Y (A.3)
symbols. Some logical subtleties arise from the fact that there is some ambiguity in the
way such abbreviations can be resolved: For example, 0 ∈ X can abbreviate either
ψ : ∃y (φ(y) ∧ y ∈ X) or χ : ∀y (φ(y) ⇒ y ∈ X), where φ(y) stands for ∀v (v ∉ y).
Then ψ and χ are equivalent if ∃!y φ(y) is true (e.g., if Axioms 0 – 2 hold), but they can be nonequivalent, otherwise (see discussion between Lem. 2.9 and Lem. 2.10 in [Kun80]).
—
At first glance, the role played by the free variables in φ, which are allowed to occur
in Axiom 2, might seem a bit obscure. So let us consider examples to illustrate that
allowing free variables (i.e. set parameters) in comprehension is quite natural:
Example A.11. (a) Suppose φ in comprehension is the formula x ∈ Z (having Z as a
free variable), then the set given by the resulting axiom is merely the intersection
of X and Z:
X ∩ Z := {x ∈ X : φ} = {x ∈ X : x ∈ Z}.
(b) Note that it is even allowed for φ in comprehension to have X as a free variable, so one can let φ be the formula ∃u (x ∈ u ∧ u ∈ X) to define the set
X∗ := {x ∈ X : ∃u (x ∈ u ∧ u ∈ X)}.
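To make the role of set parameters concrete, here is a small illustrative sketch (not part of the notes; all names are hypothetical), modeling comprehension with Python set comprehensions, where the predicate φ may refer to outer ("free") variables:

```python
# Sketch: comprehension {x ∈ X : φ}, where φ may contain free set
# variables, behaves like a set comprehension whose predicate refers
# to outer variables. All names here are illustrative.

X = {0, 1, 2, 3}
Z = {2, 3, 4}
# (a) φ is "x ∈ Z", with Z free: the comprehension yields X ∩ Z
intersection = {x for x in X if x in Z}

# (b) φ may have X itself as a free variable, as in Example A.11(b):
ZERO = frozenset()          # the hereditary set 0 = ∅
ONE = frozenset({ZERO})     # 1 = {0}
H = {ZERO, ONE}             # a set of hereditary sets
# φ: ∃u (x ∈ u ∧ u ∈ H); then H* := {x ∈ H : φ} contains only 0,
# since 0 ∈ 1 ∈ H, while 1 is an element of no element of H
H_star = {x for x in H if any(x in u for u in H)}
```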
(as C and M are non-sets) and, by extensionality, {C} = {M } were true, in contradic-
tion to a set with a cow inside not being the same as a set with a monkey inside. Thus,
we see that all objects of the mathematical universe must be so-called hereditary sets,
i.e. sets all of whose elements (thinking of the elements as being the children of the sets)
are also sets.
A.3.2 Classes
As we need to avoid contradictions such as Russell’s antinomy, we must not require the
existence of a set {x : φ} for each set-theoretic formula φ. However, it can still be
useful to think of a “collection” of all sets having the property φ. Such collections are
commonly called classes:
(b) If φ is a set-theoretic formula, then we say the class {x : φ} exists (as a set) if, and
only if
∃X ∀x (x ∈ X ⇔ φ) (A.4)
Example A.13. (a) Due to Russell’s antinomy of Sec. A.1, we know that R := {x : x ∉ x} forms a proper class.
(b) The universal class of all sets, V := {x : x = x}, is a proper class. Once again, this is related to Russell’s antinomy: If V were a set, then
R = {x : x ∉ x} = {x : x = x ∧ x ∉ x} = {x : x ∈ V ∧ x ∉ x}
Remark A.14. From the perspective of formal logic, statements involving proper
classes are to be regarded as abbreviations for statements without proper classes. For
example, it turns out that the class G of all sets forming a group is a proper class. But
we might write G ∈ G as an abbreviation for the statement “The set G is a group.”
Axioms 0 – 2 are still consistent with the empty set being the only set in existence (see
[Kun12, I.6.13]). The next axiom provides the existence of nonempty sets:
Axiom 3 Pairing:
∀x ∀y ∃Z (x ∈ Z ∧ y ∈ Z).
Thus, the pairing axiom states that, for all sets x and y, there exists a set Z
that contains x and y as elements.
0 := ∅, (A.5a)
1 := {0}, (A.5b)
2 := {0, 1} (A.5c)
(c) (x, y) := {{x}, {x, y}} the ordered pair given by x and y (cf. Def. 2.1).
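As an aside, these definitions can be mimicked directly with immutable sets in code; the following sketch (illustrative only, with hypothetical names) models 0, 1, 2 and the Kuratowski pair with Python frozensets:

```python
# Hereditary sets modeled as frozensets (illustrative sketch).

ZERO = frozenset()            # 0 := ∅
ONE = frozenset({ZERO})       # 1 := {0}
TWO = frozenset({ZERO, ONE})  # 2 := {0, 1}

def kuratowski_pair(x, y):
    """(x, y) := {{x}, {x, y}} (cf. Def. 2.1)."""
    return frozenset({frozenset({x}), frozenset({x, y})})
```

The characteristic property of ordered pairs, (x, y) = (x′, y′) if and only if x = x′ and y = y′, can then be checked on examples.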
While we now have the existence of the infinitely many different sets 0, {0}, {{0}}, . . . ,
we are not, yet, able to form sets containing more than two elements. This is remedied
by the following axiom:
Axiom 4 Union:
∀M ∃Y ∀x ∀X ((x ∈ X ∧ X ∈ M) ⇒ x ∈ Y ).
Thus, the union axiom states that, for each set of sets M, there exists a set
Y containing all elements of elements of M.
Definition A.17. (a) If M is a set and Y is given by the union axiom, then define
⋃ M := ⋃_{X∈M} X := {x ∈ Y : ∃_{X∈M} (x ∈ X)}.
X ∩ Y := {x ∈ X : x ∈ Y },
⋂_{i∈I} Ai := {x ∈ Ai0 : ∀_{i∈I} (x ∈ Ai)},
i.e. the intersection over the empty set is the class of all sets – in particular, a proper
class and not a set.
Thus, recalling (A.5), we have 1 = S(0), 2 = S(1); and we can define 3 := S(2), . . . In
general, we call the set S(x) the successor of the set x.
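The successor operation is also easy to model on the frozenset sketch from before (illustrative names, not part of the notes):

```python
# Sketch: the successor S(x) := x ∪ {x}, modeled on frozensets.

def successor(x):
    return frozenset(x | {x})

zero = frozenset()
one = successor(zero)    # 1 = {0}
two = successor(one)     # 2 = {0, 1}
three = successor(two)   # 3 = {0, 1, 2}
```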
—
In Def. 2.3 and Def. 2.19, respectively, we define functions and relations in the usual
manner, making use of the Cartesian product A × B of two sets A and B, which,
according to (2.2) consists of all ordered pairs (x, y), where x ∈ A and y ∈ B. However,
Axioms 0 – 4 are not sufficient to justify the existence of Cartesian products. To obtain
Cartesian products, we employ the axiom of replacement. Analogous to the axiom
of comprehension, the following axiom of replacement actually consists of a scheme of
infinitely many axioms, one for each set-theoretic formula:
is an axiom. Thus, the replacement scheme states that if, for each x ∈ X,
there exists a unique y having the property φ (where, in general, φ will depend
on x), then there exists a set Y that, for each x ∈ X, contains this y with
property φ. One can view this as obtaining Y by replacing each x ∈ X by
the corresponding y = y(x).
Theorem A.20. If A and B are sets, then the Cartesian product of A and B, i.e. the
class
A × B := {x : ∃_{a∈A} ∃_{b∈B} x = (a, b)}
exists as a set.
Proof. For each a ∈ A, we can use replacement with X := B and φ := φa being the
formula y = (a, x) to obtain the existence of the set
{a} × B := {y : ∃_{x∈B} y = (a, x)} (A.6a)
(in the usual way, comprehension and extensionality were used as well). Analogously,
using replacement again with X := A and φ being the formula y = {x} × B, we obtain
the existence of the set
M := {{x} × B : x ∈ A}. (A.6b)
In a final step, the union axiom now shows
⋃ M = ⋃_{a∈A} ({a} × B) = A × B (A.6c)
to be a set as well.
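The two-step construction in this proof — replacement producing the slices {a} × B, then union collecting them — can be sketched computationally on finite sets (illustrative names, not the notes' own notation):

```python
# Sketch of the proof of Th. A.20 on finite sets: build A × B by
# "replacement" (each a ∈ A is replaced by the slice {a} × B),
# followed by union over the resulting family M.

A = {1, 2}
B = {"x", "y"}

# replacement with φ_a : y = (a, x) yields, for each a ∈ A, {a} × B
M = {frozenset({(a, b) for b in B}) for a in A}   # M = {{a} × B : a ∈ A}

# the union axiom then collects all the pairs: ⋃M = A × B
product = {pair for slice_ab in M for pair in slice_ab}
```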
The following axiom of infinity guarantees the existence of infinite sets (e.g., it will allow
us to define the set of natural numbers N, which is infinite by Th. A.46 below).
Axiom 6 Infinity:
∃X (0 ∈ X ∧ ∀_{x∈X} (x ∪ {x} ∈ X)).
Thus, the infinity axiom states the existence of a set X containing ∅ (iden-
tified with the number 0), and, for each of its elements x, its successor
S(x) = x ∪ {x}.
In preparation for our official definition of N in Def. A.41 below, we will study so-called
ordinals, which are special sets also of further interest to the field of set theory (the
natural numbers will turn out to be precisely the finite ordinals). We also need some
notions from the theory of relations, in particular, order relations (cf. Def. 2.19 and Def.
2.23).
(b) R is called a strict partial order if, and only if, R is asymmetric and transitive. It is
noted that this is consistent with Not. 2.24, since, recalling the notation ∆(X) :=
{(x, x) : x ∈ X}, R is a partial order on X if, and only if, R \ ∆(X) is a strict
partial order on X. We extend the notions lower/upper bound, min, max, inf, sup
of Def. 2.25 to strict partial orders R by applying them to R ∪∆(X): We call x ∈ X
a lower bound of Y ⊆ X with respect to R if, and only if, x is a lower bound of Y
with respect to R ∪ ∆(X), and analogously for the other notions.
(c) A strict partial order R is called a strict total order or a strict linear order if, and
only if, for each x, y ∈ X, one has x = y or xRy or yRx.
(d) R is called a (strict) well-order if, and only if, R is a (strict) total order and every
nonempty subset of X has a min with respect to R (for example, the usual ≤
constitutes a well-order on N (see, e.g., [Phi16, Th. D.5]), but not on R (e.g., R+
does not have a min)).
xSy :⇔ xRy
is called the restriction of R to Y , denoted S = R↾Y (usually, one still writes R for
the restriction).
Proof. (a): If a, b, c ∈ Y with aRb and bRc, then aRc, since a, b, c ∈ X and R is transitive
on X.
(b): If a ∈ Y , then a ∈ X and aRa, since R is reflexive on X.
(c): If a, b ∈ Y with aRb and bRa, then a = b, since a, b ∈ X and R is antisymmetric
on X.
(d): If a, b ∈ Y with aRb, then ¬bRa, since a, b ∈ X and R is asymmetric on X.
(e) follows by combining (a) – (d).
(f): If a, b ∈ Y with a = b and ¬aRb, then bRa, since a, b ∈ X and R is total on X.
Combining this with (e) yields (f).
(g): Due to (f), it merely remains to show that every nonempty subset Z ⊆ Y has a
min. However, since Z ⊆ X and R is a well-order on X, there is m ∈ Z such that m is
a min for R on X, implying m to be a min for R on Y as well.
Remark A.23. Since the universal class V is not a set, ∈ is not a relation in the sense
of Def. 2.19. It can be considered as a “class relation”, i.e. a subclass of V × V, but
it is a proper class. However, ∈ does constitute a relation in the sense of Def. 2.19 on
each set X (recalling that each element of X must be a set as well). More precisely, if
X is a set, then so is
R∈ := {(x, y) ∈ X × X : x ∈ y}. (A.8a)
Then
∀_{x,y∈X} ((x, y) ∈ R∈ ⇔ x ∈ y). (A.8b)
Definition A.24. A set X is called transitive if, and only if, every element of X is also
a subset of X:
∀_{x∈X} x ⊆ X. (A.9a)
Example A.27. Using (A.5), 0 = ∅ is an ordinal, and 1 = S(0), 2 = S(1) are both successor ordinals (in Prop. A.43, we will identify N0 as the smallest limit ordinal). Even though X := {1} and Y := {0, 2} are well-ordered by ∈, they are not ordinals, since they are not transitive sets: 1 ∈ X, but 1 ⊄ X (since 0 ∈ 1, but 0 ∉ X); similarly, 1 ∈ 2 ∈ Y , but 1 ∉ Y .
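The transitivity condition (A.9a) can be tested mechanically on small hereditary sets; the following sketch (illustrative) reproduces the observations of Example A.27:

```python
# Sketch: a set X (of frozensets) is transitive iff every element of X
# is also a subset of X (cf. (A.9a)).

def is_transitive(X):
    return all(x <= X for x in X)   # x ⊆ X, via frozenset's <= operator

ZERO = frozenset()
ONE = frozenset({ZERO})
TWO = frozenset({ZERO, ONE})

# 0, 1, 2 are transitive; X := {1} is not, since 0 ∈ 1 but 0 ∉ X
```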
Lemma A.28. No ordinal contains itself, i.e.
∀_{α∈ON} α ∉ α.
Proposition A.31. On the class ON, the relation ≤ (as defined in (A.10)) is the same
as the relation ⊆, i.e.
∀_{α,β∈ON} (α ≤ β ⇔ α ⊆ β ⇔ (α ∈ β ∨ α = β)). (A.11)
(a) X is well-ordered by ∈.
(b) X is an ordinal if, and only if, X is transitive. Note: A transitive set of ordinals
X is sometimes called an initial segment of ON, since, here, transitivity can be
restated in the form
∀_{α∈ON} ∀_{β∈X} (α < β ⇒ α ∈ X). (A.13)
(b) Then δ := ⋃ X is an ordinal, namely δ = sup X. In particular, if α, β ∈ ON, then max{α, β} = α ∪ β.
∀_{α∈X} α ⊆ u.
Next, we obtain some results regarding the successor function of Def. A.19 in the context
of ordinals.
This contradiction to α ∉ α yields x ≠ α, concluding the proof.
(c) For each ordinal β, β < S(α) holds if, and only if, β ≤ α.
Proof. (a): Due to Prop. A.29, S(α) is a set of ordinals. Thus, by Cor. A.34(b), it
merely remains to prove that S(α) is transitive. Let x ∈ S(α). If x = α, then x = α ⊆ α ∪ {α} = S(α). If x ≠ α, then x ∈ α and, since α is transitive, this implies x ⊆ α ⊆ S(α), showing S(α) to be transitive, thereby completing the proof of (a).
(b) holds, as α ∈ S(α) holds by the definition of S(α).
(c) is clear, since, for each ordinal β,
Definition A.38. An ordinal n is called a natural number if, and only if,
n ≠ 0 ∧ ∀_{m∈ON} (m ≤ n ⇒ (m = 0 ∨ m is a successor ordinal)).
k < m ∧ m ≤ n ⇒ k ≤ n ⇒ k ∈ S(n).
Definition A.41. If the set X is given by the axiom of infinity, then we use compre-
hension to define the set
N0 := {n ∈ X : n = 0 ∨ n is a natural number}
Proof. “⇒” is clear from Def. A.41 and “⇐” is due to Th. A.40.
In the following Th. A.44, we will prove that N satisfies the Peano axioms P1 – P3 of
Sec. 3.1 (if one prefers, one can show the same for N0 , where 0 takes over the role of 1).
Theorem A.44. The set of natural numbers N satisfies the Peano axioms P1 – P3 of
Sec. 3.1.
Proof. For P1 and P2, we have to show that, for each n ∈ N, one has S(n) ∈ N \ {1} and that S(m) ≠ S(n) for each m, n ∈ N, m ≠ n. Let n ∈ N. Then S(n) ∈ N by Prop. A.39. If S(n) = 1, then n < S(n) = 1 by Prop. A.37(b), i.e. n = 0, in contradiction to n ∈ N. If m, n ∈ N with m ≠ n, then S(m) ≠ S(n) is due to Prop. A.37(d). To prove
P3, suppose A ⊆ N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A. We need
to show A = N (i.e. N ⊆ A, as A ⊆ N is assumed). Let X := A ∪ {0}. Then X satisfies
(A.14) and Th. A.40 yields N0 ⊆ X. Thus, if n ∈ N, then n ∈ X \ {0} = A, showing
N ⊆ A.
Notation A.45. For each n ∈ N0 , we introduce the notation n + 1 := S(n) (more
generally, one also defines α + 1 := S(α) for each ordinal α).
Theorem A.46. Let n ∈ N0 . Then A := N0 \ n is infinite (see Def. 3.10(b)). In
particular, N0 and N = N0 \ {0} = N0 \ 1 are infinite.
For more basic information regarding ordinals see, e.g., [Kun12, Sec. I.8].
There is one more basic construction principle for sets that is not covered by Axioms 0
– 6, namely the formation of power sets. This needs another axiom:
Definition A.47. If X is a set and M is given by the power set axiom, then we call
P(X) := {Y ∈ M : Y ⊆ X}
the power set of X. Another common notation for P(X) is 2X (cf. Prop. 2.18).
A.3.6 Foundation
Foundation is, perhaps, the least important of the axioms in ZF. It basically cleanses
the mathematical universe of unnecessary “clutter”, i.e. of certain pathological sets that
are of no importance to standard mathematics anyway.
Axiom 8 Foundation:
∀X (∃x (x ∈ X) ⇒ ∃_{x∈X} ¬∃z (z ∈ x ∧ z ∈ X)).
Thus, the foundation axiom states that every nonempty set X contains an element x that is disjoint from X.
Theorem A.48. Due to the foundation axiom, the ∈ relation can have no cycles, i.e.
there do not exist sets x1 , x2 , . . . , xn , n ∈ N, such that
x1 ∈ x2 ∈ · · · ∈ xn ∈ x1 . (A.16a)
¬∃x (x ∈ x). (A.16b)
Proof. If there were sets x1 , x2 , . . . , xn , n ∈ N, such that (A.16a) were true, then, by
using the pairing axiom and the union axiom, we could form the set
X := {x1 , . . . , xn }.
For a detailed explanation of why “sets” forbidden by foundation do not occur in standard mathematics anyway, see, e.g., [Kun12, Sec. I.14].
Thus, the axiom of choice postulates, for each nonempty set M, whose ele-
ments are all nonempty sets, the existence of a choice function, that means
a function that assigns, to each M ∈ M, an element m ∈ M .
Example A.49. For example, the axiom of choice postulates, for each nonempty set
A, the existence of a choice function on P(A) \ {∅} that assigns each subset of A one of
its elements.
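For finite (or otherwise well-ordered) sets, such a choice function can be written down explicitly, without AC; the following sketch (illustrative names) does this for subsets of a finite set of integers by choosing the minimum. The point of AC is precisely that no analogous explicit rule is available, e.g., on P(R) \ {∅}:

```python
# Sketch: an explicit choice function on the nonempty subsets of a
# finite set of integers (no AC needed here, since min gives a rule).

def choice(Y):
    """Assign to each nonempty set Y one of its own elements (here: min Y)."""
    assert Y, "a choice function is only defined on nonempty sets"
    return min(Y)

A = {3, 1, 4, 5}
# choice assigns to each nonempty subset of A one of its elements
```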
—
The axiom of choice is remarkable since, at first glance, it seems so natural that one
can hardly believe it is not provable from the axioms in ZF. However, one can actually
show that it is neither provable nor disprovable from ZF (see, e.g., [Jec73, Th. 3.5, Th.
5.16] – such a result is called an independence proof, see [Kun80] for further material). If
you want to convince yourself that the existence of choice functions is, indeed, a tricky
matter, try to define a choice function on P(R) \ {∅} without AC (but do not spend too
much time on it – one can show this is actually impossible to accomplish).
Theorem A.52 below provides several important equivalences of AC. Its statement and proof need some preparation. We start by introducing some more relevant notions from the theory of partial orders:
Definition A.50. Let X be a set and let ≤ be a partial order on X.
(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists
no x ∈ X such that m < x (note that a maximal element does not have to be a
max and that a maximal element is not necessarily unique).
(b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by
≤. Moreover, a chain C is called maximal if, and only if, no strict superset Y of C (i.e. no Y ⊆ X such that C ⊊ Y ) is a chain.
The following lemma is a bit technical and will be used to prove the implication AC
⇒ (ii) in Th. A.52 (other proofs in the literature often make use of so-called transfinite
recursion, but that would mean further developing the theory of ordinals, and we will
not pursue this route in this class).
Lemma A.51. Let X be a set and let ∅ 6= M ⊆ P(X) be a nonempty set of subsets of
X. We let M be partially ordered by inclusion, i.e. setting A ≤ B :⇔ A ⊆ B for each
A, B ∈ M. Moreover, define
∀_{S⊆M} ⋃ S := ⋃_{S∈S} S (A.17)
and assume
∀_{C⊆M} (C is a chain ⇒ ⋃ C ∈ M). (A.18)
Proof. Fix some arbitrary M0 ∈ M. We call T ⊆ M an M0 -tower if, and only if, T
satisfies the following three properties
(i) M0 ∈ T .
(ii) If C ⊆ T is a chain, then ⋃ C ∈ T .
The main work of the rest of the proof consists of showing that T0 is a chain. To show
T0 to be a chain, define
Γ := {M ∈ T0 : ∀_{N∈T0} (M ⊆ N ∨ N ⊆ M )}. (A.22)
∀_{M∈Γ} Φ(M ) := {N ∈ T0 : N ⊆ M ∨ g(M ) ⊆ N }
and also show each Φ(M ) to be an M0 -tower. Actually, Γ and each Φ(M ) satisfy (i) due to (A.21). To verify Γ satisfies (ii), let C ⊆ Γ be a chain and U := ⋃ C. Then U ∈ T0 , since T0 satisfies (ii). If N ∈ T0 , and C ⊆ N for each C ∈ C, then U ⊆ N . If N ∈ T0 , and there is C ∈ C such that C ⊄ N , then N ⊆ C (since C ∈ Γ), i.e. N ⊆ U , showing U ∈ Γ and Γ satisfying (ii). Now, let M ∈ Γ. To verify Φ(M ) satisfies (ii), let C ⊆ Φ(M ) be a chain and U := ⋃ C. Then U ∈ T0 , since T0 satisfies (ii). If U ⊆ M , then U ∈ Φ(M ) as desired. If U ⊄ M , then there is x ∈ U such that x ∉ M . Thus, there is C ∈ C such that x ∈ C and g(M ) ⊆ C (since C ∈ Φ(M )), i.e. g(M ) ⊆ U , showing U ∈ Φ(M ) also in this case, and Φ(M ) satisfies (ii). We will verify that Φ(M ) satisfies
(iii) next. For this purpose, fix N ∈ Φ(M ). We need to show g(N ) ∈ Φ(M ). We already
know g(N ) ∈ T0 , as T0 satisfies (iii). As N ∈ Φ(M ), we can now distinguish three cases.
Case 1: N ⊊ M . In this case, we cannot have M ⊊ g(N ) (otherwise, #(g(N ) \ N ) ≥ 2 in contradiction to (A.19)). Thus, g(N ) ⊆ M (since M ∈ Γ), showing g(N ) ∈ Φ(M ). Case 2: N = M . Then g(N ) = g(M ) ∈ Φ(M ) (since g(M ) ∈ T0 and g(M ) ⊆ g(M )).
Case 3: g(M ) ⊆ N . Then g(M ) ⊆ g(N ) by (A.19), again showing g(N ) ∈ Φ(M ). Thus,
we have verified that Φ(M ) satisfies (iii) and, therefore, is an M0 -tower. Then, by the
definition of T0 , we have T0 ⊆ Φ(M ). As we also have Φ(M ) ⊆ T0 (from the definition
of Φ(M )), we have shown
∀_{M∈Γ} Φ(M ) = T0 .
Theorem A.52 (Equivalences to the Axiom of Choice). The following statements (i)
– (v) are equivalent to the axiom of choice (as stated as Axiom 9 above).
(i) Every Cartesian product ∏_{i∈I} Ai of nonempty sets Ai , where I is a nonempty index set, is nonempty (cf. Def. 2.15(c)).
(ii) Hausdorff’s Maximality Principle: Every nonempty partially ordered set X con-
tains a maximal chain (i.e. a maximal totally ordered subset).
(iii) Zorn’s Lemma: Let X be a nonempty partially ordered set. If every chain C ⊆ X
(i.e. every nonempty totally ordered subset of X) has an upper bound in X (such
chains with upper bounds are sometimes called inductive), then X contains a
maximal element (cf. Def. A.50(a)).
(iv) Zermelo’s Well-Ordering Theorem: Every set can be well-ordered (recall the defi-
nition of a well-order from Def. A.21(d)).
to prove (i).
Next, we will show AC ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ AC.
“AC ⇒ (ii)”: Assume AC and let X be a nonempty partially ordered set. Let M be the
set of all chains in X (i.e. the set of all nonempty totally ordered subsets of X). Then
∅ ∉ M and M ≠ ∅ (since X ≠ ∅ and {x} ∈ M for each x ∈ X). Moreover, M satisfies the hypothesis of Lem. A.51, since, if C ⊆ M is a chain of totally ordered subsets of X, then ⋃ C is a totally ordered subset of X, i.e. in M (here we have used the notation of (A.17); also note that we are dealing with two different types of chains here, namely
of (A.17); also note that we are dealing with two different types of chains here, namely
those with respect to the order on X and those with respect to the order given by ⊆ on
M). Let f : P(X) \ {∅} −→ X be a choice function given by AC, i.e. such that
∀_{Y∈P(X)\{∅}} f (Y ) ∈ Y.
Since g clearly satisfies (A.19), Lem. A.51 applies, providing an M ∈ M such that
g(M ) = M . Thus, M ∗ = ∅, i.e. M is a maximal chain, proving (ii).
“(ii) ⇒ (iii)”: Assume (ii). To prove Zorn’s lemma, let X be a nonempty set, partially
ordered by ≤, such that every chain C ⊆ X has an upper bound. Hausdorff’s maximality principle yields a maximal chain C ⊆ X. Let m ∈ X be an upper bound for the maximal chain C. We claim that m is a maximal element: Indeed, if there were x ∈ X such that m < x, then x ∉ C (since m is an upper bound for C) and C ∪ {x} would constitute a strict superset of C that is also a chain, contradicting the
maximality of C.
“(iii) ⇒ (iv)”: Assume (iii) and let X be a nonempty set. We need to construct a
well-order on X. Let W be the set of all well-orders on subsets of X, i.e.
W := {(Y, W ) : Y ⊆ X ∧ W ⊆ Y × Y ⊆ X × X is a well-order on Y }.
(recall the definition of the restriction of a relation from Def. A.21(e)). To apply Zorn’s
lemma to (W, ≤), we need to check that every chain C ⊆ W has an upper bound. To
this end, if C ⊆ W is a chain, let
UC := (YC , WC ), where YC := ⋃_{(Y,W)∈C} Y, WC := ⋃_{(Y,W)∈C} W.
i.e., for each nonempty set M, whose elements are all nonempty sets, there exists a
function that assigns, to each M ∈ M, a finite subset of M . Neither the proof that (i)
– (iv) of Th. A.52 are each equivalent to AC nor the proof in [Bla84] that Th. A.52(v)
implies AMC makes use of the axiom of foundation (Axiom 8 above). However, the
proof that AMC implies AC does require the axiom of foundation (cf., e.g., [Hal17, Ch.
6]).
A.5 Cardinality
The following theorem provides two interesting, and sometimes useful, characterizations
of infinite sets:
Theorem A.54. Let A be a set. Using the axiom of choice (AC) of Sec. A.4, the
following statements (i) – (iii) are equivalent. More precisely, (ii) and (iii) are equivalent
even without AC (a set A is sometimes called Dedekind-infinite if, and only if, it satisfies
(iii)), (iii) implies (i) without AC, but AC is needed to show (i) implies (ii), (iii).
(i) A is infinite.
One sometimes expresses the equivalence between (i) and (ii) by saying that a set is
infinite if, and only if, it contains a copy of the natural numbers. The property stated in
(iii) might seem strange at first, but infinite sets are, indeed, precisely those identical in
size to some of their strict subsets (as an example think of the natural bijection n 7→ 2n
between all natural numbers and the even numbers).
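The bijection n ↦ 2n mentioned above can be made concrete; the following minimal sketch (a finite spot-check of the infinite statement, with illustrative names) exhibits the map and its inverse:

```python
# Sketch: N is in bijection with its proper subset of even numbers
# via h(n) = 2n; this is the defining property of a Dedekind-infinite
# set, checked here on an initial segment.

def h(n):
    return 2 * n

def h_inverse(m):
    assert m % 2 == 0, "h_inverse is only defined on even numbers"
    return m // 2
```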
Proof. We first prove, without AC, the equivalence between (ii) and (iii).
“(ii) ⇒ (iii)”: Let E denote the even numbers. Then E ⊊ N and d : N −→ E, d(n) := 2n, is a bijection, showing that (iii) holds for the natural numbers. According to (ii), there exists M ⊆ A and a bijective map f : M −→ N. Define B := (A \ M ) ∪̇ f −1 (E) and
h : A −→ B, h(x) := x for x ∈ A \ M , h(x) := (f −1 ◦ d ◦ f )(x) for x ∈ M . (A.23)
Then B ⊊ A since B does not contain the elements of M that are mapped to odd numbers under f . Still, h is bijective, since h↾A\M = IdA\M and h↾M = f −1 ◦ d ◦ f is the composition of the bijective maps f , d, and f −1 ↾E : E −→ f −1 (E).
“(iii) ⇒ (ii)”: As (iii) is assumed, there exist B ⊆ A, a ∈ A \ B, and a bijective map
g : A −→ B. Set
M := {an := g n (a) : n ∈ N}.
#A ≤ #B ∨ #B ≤ #A, (A.27)
Proof. To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the set
M := {(M, N, f ) : M ⊆ A, N ⊆ B, f : M −→ N is bijective} (A.28)
by letting
(M, N, f ) ≤ (U, V, g) :⇔ M ⊆ U, g↾M = f. (A.29)
Then M contains the empty map, (∅, ∅, ∅) ∈ M, i.e. M ≠ ∅. Every chain C ⊆ M has an upper bound, namely (MC , NC , fC ) with MC := ⋃_{(M,N,f)∈C} M and fC (x) := f (x), where (M, N, f ) ∈ C is chosen such that x ∈ M (since C is a chain, the value of fC (x) does not actually depend on the choice of (M, N, f ) ∈ C and is, thus, well-defined). Clearly, MC ⊆ A and NC := ⋃_{(M,N,f)∈C} N ⊆ B. We need to verify fC : MC −→ NC
to be bijective. If x, y ∈ MC , then, since C is a chain, there exists (M, N, f ) ∈ C with
x, y ∈ M . As f is injective, we have, for x ≠ y, that fC (x) = f (x) ≠ f (y) = fC (y), showing fC to be injective as well. If z ∈ NC , then z ∈ N for some (M, N, f ) ∈ C. Since
f : M −→ N is surjective, there exists x ∈ M ⊆ MC with fC (x) = f (x) = z, showing fC
to be surjective as well, proving (MC , NC , fC ) ∈ M. To see (MC , NC , fC ) to be an upper
bound for C, note that the definition of (MC , NC , fC ) immediately implies M ⊆ MC for
each (M, N, f ) ∈ C and fC ↾M = f for each (M, N, f ) ∈ C. Thus, Zorn’s lemma applies,
yielding a maximal element (Mmax , Nmax , fmax ) ∈ M. If a ∈ A \ Mmax and b ∈ B \ Nmax ,
then (
b for x = a,
f : {a} ∪˙ Mmax −→ {b} ∪˙ Nmax , f (x) :=
fmax (x) for x ∈ Mmax
is a bijective extension of fmax . Thus, the maximality of (Mmax , Nmax , fmax ) implies
Mmax = A or Nmax = B, completing the proof.
Theorem A.56. Let A, B be sets, let A be infinite and assume there exists an injective
map φB : B −→ A. Then
#(A ∪ B) = #A, (A.30)
i.e. there exists a bijective map, mapping A onto A ∪ B (this result makes use of AC).
Proof. Since the map a 7→ a always maps A injectively into A ∪ B, it remains to show
the existence of an injective map φ : A ∪ B −→ A (then (A.30) holds due to the
Schröder-Bernstein Th. 3.12). Possibly replacing B with A \ B, we may also assume A
and B to be disjoint, without loss of generality. Thus, let M := A ∪˙ B. We apply Zorn’s
lemma of Th. A.52(iii) to
M := {A ⊆ P(A) : ∀_{X∈A} #X = #N ∧ ∀_{X,Y∈A} (X ≠ Y ⇒ X ∩ Y = ∅)}, (A.31)
Theorem A.57. Let A, B be nonempty sets, let A be infinite and assume there exists an injective map φB : B −→ A. Then #(A × B) = #A, i.e. there exists a bijective map, mapping A onto A × B (this result makes use of AC).
Proof. Since the map (a, b) 7→ (a, φB (b)) is injective from A × B into A × A, and the
map a 7→ (a, a) is injective from A into A × A, it suffices to show the existence of an
injective map from A × A into A (then (A.30) holds due to the Schröder-Bernstein Th.
3.12). However, we will actually directly prove the existence of a bijective map from A
onto A × A: To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the
set
M := {(D, f ) : D ⊆ A, f : D −→ D × D is bijective} (A.33)
by letting
(D, f ) ≤ (E, g) :⇔ D ⊆ E, g↾D = f. (A.34)
Then M ≠ ∅, since there exists a bijective map f : N −→ N × N by Th. 3.28 and a bijective map between some D ⊆ A and N by Th. A.54(ii). Every chain C ⊆ M has an upper bound, namely (DC , fC ) with DC := ⋃_{(D,f)∈C} D and fC (x) := f (x), where (D, f ) ∈ C is chosen such that x ∈ D (since C is a chain, the value of fC (x) does not actually depend on the choice of (D, f ) ∈ C and is, thus, well-defined). Clearly, DC ⊆ A,
and we merely need to verify fC : DC −→ DC × DC to be bijective. If x, y ∈ DC , then,
since C is a chain, there exists (D, f ) ∈ C with x, y ∈ D. As f is injective, we have, for x ≠ y, that fC (x) = f (x) ≠ f (y) = fC (y), showing fC to be injective as well. If (x, y) ∈ DC × DC
and (D, f ) ∈ C with x, y ∈ D as before, then the surjectivity of f : D −→ D × D yields
z ∈ D ⊆ DC with fC (z) = f (z) = (x, y), showing fC to be surjective as well, proving
(DC , fC ) ∈ M. To see (DC , fC ) to be an upper bound for C, note that the definition
of (DC , fC ) immediately implies D ⊆ DC for each (D, f ) ∈ C and fC ↾D = f for each
(D, f ) ∈ C. Thus, Zorn’s lemma applies, yielding a maximal element (Dmax , fmax ) ∈ M.
We would like to prove Dmax = A. However, in general, this cannot be expected to hold (for example, if A = N and N \ D is finite, then (D, f ) ∈ M is always maximal – since (E × E) \ (D × D) is infinite for each D ⊊ E ⊆ A, f does not have a bijective
extension g : E −→ E × E). What we can prove, though, is the existence of a bijective
map φ : Dmax −→ A, which suffices, since the map
φ̃ : A −→ A × A, φ̃ := (φ, φ) ◦ fmax ◦ φ−1 ,
is then also bijective. It remains to establish the existence of the bijective map φ :
Dmax −→ A. Arguing via contraposition, we assume the nonexistence of a bijective
φ and show (Dmax , fmax ) is not maximal: Let E := A \ Dmax . Then A = Dmax ∪˙ E.
According to Th. A.56, there does not exist an injective map from E into Dmax . Then,
according to Th. A.55, there exists a bijective map φD : Dmax −→ N , N ⊆ E. Define
P := (N × N ) ∪̇ (N × Dmax ) ∪̇ (Dmax × N ).
Since φD : Dmax −→ N is bijective and fmax : Dmax −→ Dmax × Dmax is bijective, we
conclude
#(N × N ) = #(N × Dmax ) = #(Dmax × N ) = #Dmax = #N.
f : N ∪̇ Dmax −→ P ∪̇ (Dmax × Dmax ) = (Dmax ∪̇ N ) × (Dmax ∪̇ N ),
which is a bijective extension of fmax , i.e. (N ∪˙ Dmax , f ) ∈ M and (Dmax , fmax ) is not
maximal. Thus, the maximality of (Dmax , fmax ) implies the existence of a bijective
φ : Dmax −→ A as desired.
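The bijection between N and N × N invoked above via Th. 3.28 can be realized explicitly; the Cantor pairing function below is one standard choice (a sketch, not necessarily the construction used in the notes' Th. 3.28):

```python
# Sketch: the Cantor pairing function, a standard bijection
# N0 × N0 −→ N0, witnessing #(N × N) = #N.

def cantor_pair(x, y):
    """Enumerate N0 × N0 along diagonals x + y = const."""
    return (x + y) * (x + y + 1) // 2 + y

def cantor_unpair(z):
    """Inverse: locate the diagonal w = x + y containing z."""
    w = 0
    while (w + 1) * (w + 2) // 2 <= z:
        w += 1
    y = z - w * (w + 1) // 2
    return (w - y, y)
```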
i.e. there exists a bijective map, mapping A onto An (this result makes use of AC).
Proof. The claimed rule (A.35) follows via a simple induction from Th. A.57, since, for
each n ∈ N with n ≥ 2, the map
φ : An −→ An−1 × A, φ(a1 , . . . , an ) := ((a1 , . . . , an−1 ), an ),
Theorem A.59. Let A be a set and let Pfin (A) denote the set of finite subsets of A.
i.e. there exists a bijective map, mapping A onto Pfin (A) as well as a bijective map,
mapping A onto Pn (A), where Pn (A) denotes the set of subsets of A with precisely
n elements (both results of (b) make use of AC).
Then, clearly, each ιn is injective as a map into Pn (A) and, due to Pn (A) ⊆ Pfin (A),
also as a map into Pfin (A). Thus, it remains to show the existence of injective maps
φn : Pn (A) −→ A and φ : Pfin (A) −→ A (then (A.37) holds due to the Schröder-
Bernstein Th. 3.12). Fix n ∈ N. Due to Cor. A.58, it suffices to define an injective map
φ̃n : Pn (A) −→ An . For each B ∈ Pn (A), let φB : {1, . . . , n} −→ B be bijective and
define
φ̃n : Pn (A) −→ An , φ̃n (B) := φB .
If B, C ∈ Pn (A) with x ∈ B \ C, then x ∈ φB ({1, . . . , n}), but x ∉ φC ({1, . . . , n}), showing φB ≠ φC , i.e. φ̃n is injective. To obtain φ, it suffices to define an injective map
φ̃ : Pfin (A) −→ A × N, as there exists a bijective map from A × N onto A by Th. A.57.
Since
Pfin (A) = ⋃̇_{n∈N} Pn (A), (A.38)
we may define
φ̃ : Pfin (A) −→ A × N, φ̃(B) := (φn (B), n) for B ∈ Pn (A),
i.e. there exist bijective maps φ1 : A^B −→ {0, 1}^B and φ2 : {0, 1}^B −→ P(B), as well as bijective maps φ1,f : A^B_fin −→ {0, 1}^B_fin and φ2,f : {0, 1}^B_fin −→ Pfin (B) (this result makes use of AC). For the purposes of (A.39b), we introduce the notation 0 := ι2 (0) ∈ A, so that both A^B_fin and 2^B_fin make sense according to the definition in Ex. 5.16(c).
Proof. The existence of φ2 was already established in Prop. 2.18, the existence of φ2,f in
Th. A.59(a). Thus, according to the Schröder-Bernstein Th. 3.12, for the existence of φ1 , it suffices to show the existence of injective maps f1 : {0, 1}^B −→ A^B, f2 : A^B −→ B^B, f3 : B^B −→ P(B × B), as well as a bijective map g : P(B × B) −→ P(B), which can be stated concisely as
#2^B ≤ #A^B ≤ #B^B ≤ #P(B × B) = #P(B). (A.40a)
Analogously, for the existence of φ1,f , it suffices to show the existence of injective maps f1,f : {0, 1}^B_fin −→ A^B_fin, f2,f : A^B_fin −→ B^B_fin, f3,f : B^B_fin −→ Pfin (B × B), as well as a bijective map gf : Pfin (B × B) −→ Pfin (B), which can be stated concisely as
#2^B_fin ≤ #A^B_fin ≤ #B^B_fin ≤ #Pfin (B × B) = #Pfin (B) (A.40b)
B
(for Bfin to make sense according to the definition in Ex. 5.16(c), we now also introduce
the notation 0 := ιA (ι2 (0)) ∈ B). Clearly, the maps
are well-defined and injective as well. While the restriction of f3 does not work, as it
does not map into Pfin (B × B), we can use the injective map
    f3,f : B^B_fin −→ Pfin (B × B),   f3,f (α) := {(x, y) ∈ B × B : y = α(x) ≠ 0}.
to be bijective as well, thereby proving (A.40a) and (A.39a). Then the restriction
B.1 Associativity
In the literature, the general law of associativity is often stated in the form that
a1 a2 · · · an gives the same result “for every admissible way of inserting parentheses into
a1 a2 · · · an ”, but a completely precise formulation of what that actually means seems to
be rare. As a warm-up, we first prove a special case of the general law:
    =^{ind.hyp.} a_n · ∏_{i=1}^{n−1} a_i  =^{(3.16a)}  ∏_{i=1}^{n} a_i,   (B.3)
The difficulty in stating the general form of the law of associativity lies in giving a
precise definition of what one means by “an admissible way of inserting parentheses into
a1 a2 · · · an ”. So how does one actually proceed to calculate the value of a1 a2 · · · an , given
that parentheses have been inserted in an admissible way? The answer is that one does
it in n − 1 steps, where, in each step, one combines two juxtaposed elements, consistent
B ASSOCIATIVITY AND COMMUTATIVITY 221
with the inserted parentheses. There can still be some ambiguity: For example, for
(a1 a2 )(a3 (a4 a5 )), one has the freedom of first combining a1 , a2 , or of first combining
a4 , a5 . In consequence, our general law of associativity will show that, for each admissible
sequence of n − 1 directives for combining two juxtaposed elements, the final result is
the same (under the hypothesis that (B.2) holds). This still needs some preparatory
work.
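Before the formal preparation, the phenomenon can be illustrated computationally (the function name all_values is my own, hypothetical choice): enumerating every way of parenthesizing a₁ · · · aₙ shows that an associative composition yields a single value, while a non-associative one need not.

```python
# Sketch: enumerate every admissible parenthesization of elems[0] ... elems[-1]
# and collect the resulting values. For an associative op the set is a singleton.

def all_values(op, elems):
    """Set of values over all full parenthesizations of the given sequence."""
    if len(elems) == 1:
        return {elems[0]}
    vals = set()
    for k in range(1, len(elems)):            # outermost split point
        for left in all_values(op, elems[:k]):
            for right in all_values(op, elems[k:]):
                vals.add(op(left, right))
    return vals

# String concatenation is associative: one value, however we parenthesize.
assert all_values(lambda x, y: x + y, ["a", "b", "c", "d"]) == {"abcd"}
# Subtraction is not associative: several values appear.
assert len(all_values(lambda x, y: x - y, [1, 2, 3, 4])) > 1
```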
In the following, one might see it as a slight notational inconvenience that we have
defined ∏_{i=1}^{n} a_i as a_n · · · a_1 rather than a_1 · · · a_n. For this reason, we will enumerate the
elements to be combined by composition from right to left rather than from left to right.
Definition and Remark B.2. Let A be a nonempty set with a composition · : A ×
A −→ A, let n ∈ N, n ≥ 2, and let I be a totally ordered index set, #I = n, I =
{i_1, . . . , i_n} with i_1 < · · · < i_n. Moreover, let F := (a_{i_n}, . . . , a_{i_1}) be a family of n
elements of A.

(a) An admissible composition directive (for combining two juxtaposed elements of the
family) is an index i_k ∈ I with 1 ≤ k ≤ n − 1. It transforms the family F into
the family G := (a_{i_n}, . . . , a_{i_{k+1}} a_{i_k}, . . . , a_{i_1}). In other words, G = (b_j)_{j∈J}, where
J := I \ {i_{k+1}}, b_j = a_j for each j ∈ J \ {i_k}, and b_{i_k} = a_{i_{k+1}} a_{i_k}. We can write this
transformation as two maps

    F ↦ δ^{(1)}_{i_k}(F) := G = (a_{i_n}, . . . , a_{i_{k+1}} a_{i_k}, . . . , a_{i_1}) = (b_j)_{j∈J},   (B.4a)
    I ↦ δ^{(2)}_{i_k}(I) := J = I \ {i_{k+1}}.                                                           (B.4b)
Thus, an application of an admissible composition directive reduces the length of
the family and the number of indices by one.
(b) Recursively, we define (finite) sequences of families, index sets, and indices as
follows:

    F_n := F,   I_n := I,                                                                    (B.5a)
    ∀_{α∈{2,...,n}}:  F_{α−1} := δ^{(1)}_{j_α}(F_α),   I_{α−1} := δ^{(2)}_{j_α}(I_α),   where j_α ∈ I_α \ {max I_α}.   (B.5b)

The corresponding sequence of indices D := (j_n, . . . , j_2) in I is called an admissible
evaluation directive. Clearly,

    ∀_{α∈{1,...,n}}:  #I_α = α,   i.e. F_α has length α.                                     (B.6)
In particular, I_1 = {j_2} = {i_1} (where the second equality follows from (B.4b)),
F_1 = (a), and we call

    D(F) := a                                                                                (B.7)
Proof. We conduct the proof via induction on n. For n = 3, there are only two possible
directives and (B.2) guarantees that they yield the same result. For the induction step,
let n > 3. As in Def. and Rem. B.2(b), we write D = (j_n, . . . , j_2) and obtain some
I_2 = {i_1, i_m}, 1 < m ≤ n, as the corresponding penultimate index set. Depending on
i_m, we partition (j_n, . . . , j_3) as follows: Set

    J_1 := {k ∈ {3, . . . , n} : j_k < i_m},   J_2 := {k ∈ {3, . . . , n} : j_k ≥ i_m}.      (B.9)
B.2 Commutativity
In the present section, we will generalize the law of commutativity ab = ba to a finite
number of factors, provided the composition is also associative.
Theorem B.4 (General Law of Commutativity). Let (A, ·) be a semigroup (i.e. · :
A × A −→ A is an associative composition on A). If the composition is commutative,
i.e. if
    ∀_{a,b∈A}:  ab = ba,                                                 (B.17)

then

    ∀_{n∈N} ∀_{π∈S_n} ∀_{a_1,...,a_n∈A}:  ∏_{i=1}^{n} a_i = ∏_{i=1}^{n} a_{π(i)},   (B.18)
Proof. We conduct the proof via induction on n: For n = 1, there is nothing to prove
and, for n = 2, (B.18) is the same as (B.17). So let n > 2. As π is bijective, we may
define k := π^{−1}(n). Then
    ∏_{i=1}^{n} a_{π(i)}  =^{Th. B.3}  (∏_{i=k+1}^{n} a_{π(i)}) · a_{π(k)} · (∏_{i=1}^{k−1} a_{π(i)})
                          =^{(B.17)}  a_{π(k)} · (∏_{i=k+1}^{n} a_{π(i)}) · (∏_{i=1}^{k−1} a_{π(i)}).   (B.19)
C GROUPS 224
The following example shows that, if the composition is not associative, then, in general,
(B.17) does not imply (B.18):
Example B.5. Let A := {a, b, c} with #A = 3 (i.e. the elements a, b, c are all distinct).
Let the composition · on A be defined according to the composition table
    ·   a   b   c
    a   b   b   b
    b   b   b   a
    c   b   a   a
Then, clearly, · is commutative. However, · is not associative, since, e.g.,

    (cb)a = aa = b ≠ a = cb = c(ba),                                     (B.22)

and (B.18) does not hold, since, e.g.,

    a(bc) = aa = b ≠ a = cb = c(ba).                                     (B.23)
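The claims about this composition table can be checked mechanically; the following brute-force sketch (not from the text) verifies commutativity and the failure of associativity:

```python
# Sketch: encode the composition table of Example B.5 and check its properties.

table = {
    ("a", "a"): "b", ("a", "b"): "b", ("a", "c"): "b",
    ("b", "a"): "b", ("b", "b"): "b", ("b", "c"): "a",
    ("c", "a"): "b", ("c", "b"): "a", ("c", "c"): "a",
}
op = lambda x, y: table[(x, y)]
elems = "abc"

# Commutative: op(x, y) == op(y, x) for all pairs.
assert all(op(x, y) == op(y, x) for x in elems for y in elems)
# Not associative: (cb)a = b, but c(ba) = a.
assert op(op("c", "b"), "a") != op("c", op("b", "a"))
```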
C Groups
In Ex. 4.9(b), we defined the symmetric group SM of bijective maps from M into M (i.e.
of so-called permutations). It is a remarkable result that every group G is isomorphic
to a subgroup of the permutation group SG and the proof is surprisingly simple (cf.
Th. C.2 below). On the other hand, this result seems to have surprisingly few useful
applications, partly due to the fact that SG is usually much bigger (and, thus, usually
more difficult to study) than G (cf. Prop. C.4 below).
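As an informal illustration of the embedding behind Th. C.2 (for the concrete group Z/5 under addition, my own choice of example), each group element can be sent to the left-translation permutation it induces:

```python
# Sketch: the Cayley-style embedding g -> (x -> g + x) for G = Z/5 under
# addition; each image is a permutation of {0, ..., 4}, stored as a tuple.

n = 5

def left_translation(g):
    return tuple((g + x) % n for x in range(n))

phi = {g: left_translation(g) for g in range(n)}

# phi is injective, and it turns addition into composition of permutations:
assert len(set(phi.values())) == n
for g in range(n):
    for h in range(n):
        composed = tuple(phi[g][phi[h][x]] for x in range(n))
        assert composed == phi[(g + h) % n]
```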
We start with a preparatory lemma:
Lemma C.1. Let M, N be sets and let ψ : M −→ N be bijective. Then the symmetric
groups SM and SN are isomorphic.
Proof. Exercise.
Proof. We conduct the proof via induction: If n = 0, then S contains precisely the
empty map (i.e. the empty set) and #S = 1 = 0! is true. If n = 1 and M = {a},
N = {b}, then S contains precisely the map f : M −→ N , f (a) = b, and #S = 1 = 1!
is true. For the induction step, fix n ∈ N and assume #M = #N = n + 1. Let a ∈ M
and

    A := ⋃_{b∈N} S(M \ {a}, N \ {b}).                                    (C.2)
i.e., once again, φ(f) ≠ φ(g). Thus, φ is injective. Now let h ∈ S(M \ {a}, N \ {b}) for
some b ∈ N. Letting
    f : M −→ N,   f(x) :=  b      for x = a,
                           h(x)   for x ≠ a,
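The counting behind this induction (that there are n! bijections between two n-element sets) can be checked by brute force for small n; the helper bijection_count below is my own, hypothetical name:

```python
# Sketch: count the bijections M -> N for #M = #N = n by enumerating all
# orderings of N; each ordering pairs off with the fixed ordering of M.

from itertools import permutations
from math import factorial

def bijection_count(M, N):
    """Number of bijective maps from M onto N."""
    assert len(M) == len(N)
    return sum(1 for _ in permutations(N))

for n in range(5):
    M, N = list(range(n)), [f"b{i}" for i in range(n)]
    assert bijection_count(M, N) == factorial(n)
```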
D Number Theory
The algebraic discipline called number theory studies properties of the ring of integers
(and of similar algebraic structures).
Theorem D.1 (Remainder Theorem). For each pair of numbers (a, b) ∈ N^2, there
exists a unique pair of numbers (q, r) ∈ N_0^2 satisfying the two conditions a = qb + r and
0 ≤ r < b.
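For positive integers, the pair (q, r) of Th. D.1 is exactly what division with remainder computes; a minimal sketch (the name remainder_pair is my own):

```python
# Sketch: the unique (q, r) of the Remainder Theorem via Python's divmod.

def remainder_pair(a, b):
    """Return the unique (q, r) with a = q*b + r and 0 <= r < b."""
    q, r = divmod(a, b)
    return q, r

q, r = remainder_pair(17, 5)
assert (q, r) == (3, 2)
assert 17 == q * 5 + r and 0 <= r < 5
```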
Definition D.2. (a) Let a, k ∈ Z. We define a to be a divisor of k (and also say that
a divides k, denoted a | k) if, and only if, there exists b ∈ Z such that k = ab. If a
is not a divisor of k, then we write a ∤ k.
(b) A number p ∈ N, p ≥ 2, is called prime if, and only if, 1 and p are its only divisors.
is called the greatest common divisor of the numbers in M. If M = {a, b}, then
one also writes gcd(a, b) := gcd(M). The numbers in M are called relatively prime
or mutually prime if, and only if, gcd(M) = 1. If M is finite and 0 ∉ M, then the
number

    lcm(M) := min{n ∈ N : ∀_{k∈M}  k | n}                                (D.3)

is called the least common multiple of the numbers in M. If M = {a, b}, then one
also writes lcm(a, b) := lcm(M).
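A computational sketch of these definitions for finite M (the helpers gcd_of and lcm_of are mine, built on math.gcd; the identity lcm(a, b) = ab/gcd(a, b) is used):

```python
# Sketch: gcd and lcm of a finite set M of positive integers by folding
# pairwise operations; both are independent of the iteration order.

from functools import reduce
from math import gcd

def gcd_of(M):
    return reduce(gcd, M)

def lcm_of(M):
    return reduce(lambda a, b: a * b // gcd(a, b), M)

assert gcd_of({12, 18, 30}) == 6
assert lcm_of({4, 6, 10}) == 60
assert gcd_of({9, 16}) == 1   # 9 and 16 are relatively prime
```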
Theorem D.4 (Bézout’s Lemma). If a, b ∈ Z with {a, b} 6= {0} and d := gcd(a, b),
then
{xa + yb : x, y ∈ Z} = d Z. (D.4)
In particular,
∃ xa + yb = d, (D.5)
x,y∈Z
gcd(a, b) = 1 ⇒ ∃ xa + yb = 1. (D.6)
x,y∈Z
S := {xa + yb : x, y ∈ Z} ∩ N = d N. (D.7)
By setting y := 0 and x := ±1, we see that S contains a or −a, i.e. S ≠ ∅. Let d := min S.
Then there exist s, t ∈ Z with sa + tb = d. It remains to show d = gcd(a, b). We apply
Th. D.1 to obtain (q, r) ∈ N_0^2 with

    |a| = qd + r  ∧  0 ≤ r < d.

Since d = min S and r < d, this yields r = 0 and d | a. Using b instead of a in the above
reasoning yields d | b as well, showing d to be a common divisor of a and b. Now let
c ∈ N be an arbitrary common divisor of a and b. Then there exist u, v ∈ Z such that
a = cu and b = cv, implying
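The coefficients x, y of (D.5) can be computed by the extended Euclidean algorithm, which the text does not spell out; a sketch for nonnegative inputs:

```python
# Sketch: extended Euclidean algorithm returning (d, x, y) with
# d = gcd(a, b) = x*a + y*b, via the recursion gcd(a, b) = gcd(b, a mod b).

def extended_gcd(a, b):
    if b == 0:
        return a, 1, 0
    d, x, y = extended_gcd(b, a % b)
    # If d = x*b + y*(a mod b), then d = y*a + (x - (a // b)*y)*b.
    return d, y, x - (a // b) * y

d, x, y = extended_gcd(240, 46)
assert d == 2 and x * 240 + y * 46 == 2
```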
    n | ab  ⇒  n | b.

    xnb + yab = b.

Moreover, by hypothesis, there exists c ∈ Z with ab = nc, implying n(xb + yc) = b, i.e.
n | b as claimed.
(b): Suppose n is prime, n | ab, and n ∤ a. As n is prime, the only divisors of n are 1
and n. As n ∤ a, we know gcd(n, a) = 1, implying n | b by (a).
then there exist a, b ∈ N with 1 < a ≤ b < n and n = ab. By induction hypothesis, there
exist functions k_a : P_a −→ N, k_b : P_b −→ N, such that P_a, P_b ⊆ {p ∈ N : p prime} and

    a = ∏_{p∈P_a} p^{k_a(p)},   b = ∏_{p∈P_b} p^{k_b(p)}.

Letting P_n := P_a ∪ P_b and

    k_n : P_n −→ N,   k_n(p) :=  k_a(p)            for p ∈ P_a \ P_b,
                                 k_b(p)            for p ∈ P_b \ P_a,
                                 k_a(p) + k_b(p)   for p ∈ P_a ∩ P_b,

we obtain

    n = ab = ∏_{p∈P_a} p^{k_a(p)} · ∏_{p∈P_b} p^{k_b(p)}
           = ∏_{p∈P_a\P_b} p^{k_a(p)} · ∏_{p∈P_b\P_a} p^{k_b(p)} · ∏_{p∈P_a∩P_b} p^{k_a(p)+k_b(p)}
           = ∏_{p∈P_n} p^{k_n(p)},
Since n is not prime, we know m ≥ 2. As we know m < n, we may use the induction
hypothesis to conclude P = Q, k(p) = l(p) for each p ∈ P \ {p_0}, and k(p_0) − 1 = l(p_0) − 1,
i.e. k = l as desired.
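The exponent function k_n of the factorization can be computed by trial division; the following sketch (factor is a hypothetical helper of mine) also illustrates that exponents add under multiplication, as used in the proof:

```python
# Sketch: prime factorization n = product of p**k(p), returned as a dict
# {p: k(p)}, computed by trial division.

def factor(n):
    assert n >= 2
    exps, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                     # leftover factor is prime
        exps[n] = exps.get(n, 0) + 1
    return exps

assert factor(360) == {2: 3, 3: 2, 5: 1}
# As in the proof: the exponents of a*b are the sums of those of a and b.
fa, fb = factor(12), factor(18)
assert factor(12 * 18) == {p: fa.get(p, 0) + fb.get(p, 0) for p in {*fa, *fb}}
```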
Theorem D.7 (Euclid’s Theorem). The set of prime numbers is infinite.
Proof. We show that no finite set of primes contains every prime: Let ∅ ≠ P be a finite
set of prime numbers. Define p := ∏_{q∈P} q, p̃ := p + 1. Then p̃ ∉ P, since p̃ > q for each
q ∈ P. Thus, if p̃ is prime, we are already done. If p̃ is not prime, then there exists a
prime n such that n | p̃ (this is immediate from Th. D.6, but also follows easily without
Th. D.6, since p̃ can only have finitely many factors a with 1 < a < p̃). If n ∈ P, then
there is a ∈ N such that na = p. As there is also b ∈ N such that nb = p̃, we have
1 = p̃ − p = nb − na = n(b − a), implying n = b − a = 1, in contradiction to 1 ∉ P.
Thus, n ∉ P, completing the proof.
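Euclid's construction from this proof can be traced numerically; new_prime_factor below is a hypothetical helper of mine returning the smallest prime factor of p̃ = ∏_{q∈P} q + 1:

```python
# Sketch: for a finite set P of primes, every prime factor of prod(P) + 1
# lies outside P, since it would otherwise divide 1.

from math import prod

def new_prime_factor(P):
    """Return the smallest factor > 1 of prod(P) + 1; it is prime."""
    p_tilde = prod(P) + 1
    n = 2
    while p_tilde % n != 0:
        n += 1
    return n

P = {2, 3, 5, 7}
q = new_prime_factor(P)
assert q not in P and (prod(P) + 1) % q == 0
```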
E VECTOR SPACES 230
E Vector Spaces
Proof. (a): Given v ∈ V and b ∈ B, let c_v(b) denote the coordinate of v with respect to
b according to Th. 5.19. Then

    φ : V −→ F^B_fin,   φ(v)(b) := c_v(b),

is well-defined and bijective by Th. 5.19.
(b): The map φ1 exists according to (a), whereas φ2 and φ3 exist according to Th.
A.60.
Corollary E.2. Let F be a field and let M be an infinite set. If #F ≤ #M (i.e. there
exists an injective map φF : F −→ M), then

    dim F^M = #2^M   (as vector space over F),                           (E.2)

i.e. if B is a basis of F^M, then there exists a bijective map φ : B −→ P(M). In
particular, we have

    dim R^R = #2^R   (as vector space over R),                           (E.3a)
    dim(Z_2)^N = #2^N =^{[Phi16, Th. F.2]} #R   (as vector space over Z_2).   (E.3b)
i.e. a + n = a, showing n is the additive neutral element of V. Next, given a = (a_i)_{i∈I} ∈
V, define ā = (ā_i)_{i∈I}, ā_i := −a_i. Then

    ∀_{i∈I}:  (a + ā)_i = a_i + (−a_i) = 0,

i.e. a + ā = n = 0, showing ā to be the additive inverse element of a (as usual, one then
writes −a instead of ā). Associativity is verified since

    ∀_{(a_i)_{i∈I},(b_i)_{i∈I},(c_i)_{i∈I}∈V} ∀_{i∈I}:  ((a + b) + c)_i = (a_i + b_i) + c_i =^{(∗)} a_i + (b_i + c_i) = (a + (b + c))_i,

where the associativity of addition in the vector space Y_i was used at (∗). Likewise,
commutativity is verified since

    ∀_{(a_i)_{i∈I},(b_i)_{i∈I}∈V} ∀_{i∈I}:  (a + b)_i = a_i + b_i =^{(∗)} b_i + a_i = (b + a)_i,

where the commutativity of addition in the vector space Y_i was used at (∗). This
completes the proof that (V, +) is a commutative group.
To verify Def. 5.1(ii), one computes

    ∀_{λ∈F} ∀_{(a_i)_{i∈I},(b_i)_{i∈I}∈V} ∀_{i∈I}:  (λ(a + b))_i = λ(a_i + b_i) =^{(∗)} λa_i + λb_i = (λa + λb)_i,

where distributivity in the vector space Y_i was used at (∗), proving λ(a + b) = λa + λb.
Similarly,

    ∀_{λ,μ∈F} ∀_{(a_i)_{i∈I}∈V} ∀_{i∈I}:  ((λ + μ)a)_i = (λ + μ)a_i =^{(∗)} λa_i + μa_i = (λa + μa)_i,
where, once more, distributivity in the vector space Y_i was used at (∗), proving (λ + μ)a =
λa + μa.
The proof of Def. 5.1(iii) is given by

    ∀_{λ,μ∈F} ∀_{(a_i)_{i∈I}∈V} ∀_{i∈I}:  ((λμ)a)_i = (λμ)a_i =^{(∗)} λ(μa_i) = (λ(μa))_i,

where the validity of Def. 5.1(iii) in the vector space Y_i was used at (∗).
Finally, to prove Def. 5.1(iv), one computes

    ∀_{(a_i)_{i∈I}∈V} ∀_{i∈I}:  (1 · a)_i = 1 · a_i =^{(∗)} a_i,

where the validity of Def. 5.1(iv) in the vector space Y_i was used at (∗).
Example E.4. In generalization of Ex. 6.7(c), we consider general projections defined
on a general Cartesian product of vector spaces, as defined in Ex. E.3 above: Let I be
a nonempty index set, let (Y_i)_{i∈I} be a family of vector spaces over the field F, and let
V := ∏_{i∈I} Y_i be the Cartesian product vector space as defined in Ex. E.3. For each
∅ ≠ J ⊆ I, define

    π_J : V −→ V_J := ∏_{i∈J} Y_i,   π_J((a_i)_{i∈I}) := (a_i)_{i∈J},
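As an illustration (not part of the text), families (a_i)_{i∈I} can be modeled as dicts and π_J as dict restriction; the sketch below also checks that π_J respects addition:

```python
# Sketch: a family (a_i) over index set I as a dict; the projection pi_J
# keeps only the coordinates with index in J, and commutes with the
# componentwise addition of the product space.

def pi(J, a):
    """Project the family a = {i: a_i} onto the coordinates in J."""
    return {i: a[i] for i in J}

a = {"x": 1.0, "y": 2.0, "z": 3.0}
b = {"x": 4.0, "y": 5.0, "z": 6.0}
J = {"x", "z"}

add = lambda u, v: {i: u[i] + v[i] for i in u}
assert pi(J, add(a, b)) == add(pi(J, a), pi(J, b))   # pi_J is additive
assert pi(J, a) == {"x": 1.0, "z": 3.0}
```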
References
[Bla84] A. Blass. Existence of Bases Implies the Axiom of Choice. Contemporary
Mathematics 31 (1984), 31–33.
[Hal17] Lorenz J. Halbeisen. Combinatorial Set Theory, 2nd ed. Springer Monographs
in Mathematics, Springer, Cham, Switzerland, 2017.
[Kun80] Kenneth Kunen. Set Theory. Studies in Logic and the Foundations of Mathematics,
Vol. 102, North-Holland, Amsterdam, 1980.
[Phi16] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, LMU
Munich, 2015/2016, AMS Open Math Notes Ref. # OMN:202109.111306,
available in PDF format at
https://www.ams.org/open-math-notes/omn-view-listing?listingId=111306.
[Str08] Gernot Stroth. Lineare Algebra, 2nd ed. Berliner Studienreihe zur Mathematik,
Vol. 7, Heldermann Verlag, Lemgo, Germany, 2008 (German).
[Wik15b] Wikipedia. HOL Light — Wikipedia, The Free Encyclopedia. 2015,
https://en.wikipedia.org/wiki/HOL_Light. Online; accessed Sep-01-2015.