
Free Encyclopedia of Mathematics 0.0.1
by the PlanetMath authors Aatu, ack, akrowne, alek thiery, alinabi, almann,
alozano, antizeus, antonio, aparna, ariels, armbrusterb, AxelBoldt, basseykay,
bbukh, benjaminfjones, bhaire, brianbirgen, bs, bshanks, bwebste, cryo, danielm,
Daume, debosberg, deiudi, digitalis, djao, Dr Absentius, draisma, drini, drummond, dublisk, Evandar, fibonaci, flynnheiss, gabor sz, GaloisRadical, gantsich,
gaurminirick, gholmes74, giri, greg, grouprly, gumau, Gunnar, Henry, iddo, igor,
imran, jamika chris, jarino, jay, jgade, jihemme, Johan, karteef, karthik, kemyers3, Kevin O'Bryant, kidburla2003, KimJ, Koro, lha, lieven, livetoad, liyang, Logan, Luci, m759, mathcam, mathwizard, matte, mclase, mhale, mike, mikestaflogan, mps, msihl, muqabala, n3o, nerdy2, nobody, npolys, Oblomov, ottocolori,
paolini, patrickwonders, pbruin, petervr, PhysBrain, quadrate, quincynoodles,
ratboy, RevBobo, Riemann, rmilson, ruiyang, Sabean, saforres, saki, say 10,
scanez, scineram, seonyoung, slash, sleske, slider142, sprocketboy, sucrose, superhiggs, tensorking, thedagit, Thomas Heye, thouis, Timmy, tobix, tromp, tz26, unlord, uriw, urz, vampyr, vernondalhart, vitriol, vladm, volator, vypertd, wberry,
Wkbj79, wombat, x bas, xiaoyanggu, XJamRastafire, xriso, yark et al.
edited by Joe Corneli & Aaron Krowne

Copyright © 2004 PlanetMath.org authors. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.

Introduction
Welcome to the PlanetMath One Big Book compilation, the Free Encyclopedia of Mathematics. This book gathers into a single document the best work of the hundreds of authors and
thousands of other contributors to the PlanetMath.org web site, as of January 4, 2004.
The purpose of this compilation is to help the efforts of these people reach a wider audience
and to make the benefits of their work accessible in a greater breadth of situations.
We want to emphasize that the Free Encyclopedia of Mathematics will always be a work
in progress. Producing a book-format encyclopedia from the amorphous web of interlinked
and multidimensionally-organized entries on PlanetMath is not easy. The print medium
demands a linear presentation, and boiling the web site down into this format is a difficult,
and in some ways lossy, transformation. A major part of our editorial effort goes into
making this transformation. We hope the organization we've chosen for now is useful to
readers, and you can expect continuing improvements in future editions.
The linearization of PlanetMath.org is not the only editorial task we must perform.
Throughout the millennia, readers have come to expect a strict standard of consistency and
correctness from print books, and we must strive to meet this standard in the PlanetMath
Book as closely as possible. This means applying more editorial control to the book form
of PlanetMath than is applied to the web site. We hope you will agree that there is significant value to be gained from unifying style, correcting errors, and filtering out not-yet-ready
content, so we will continue to do these things.
For more details on planned improvements to this book, see the TODO file that came with
this archive. Remember that you can help us to improve this work by joining PlanetMath.org
and filing corrections, adding entries, or just participating in the community. We are also
looking for volunteers to help edit this book, or help with programming related to its production, or to help work on Noosphere, the PlanetMath software. To send us comments
about the book, use the e-mail address pmbook@planetmath.org. For general comments
and queries, use feedback@planetmath.org.
Happy mathing,
Joe Corneli
Aaron Krowne
Tuesday, January 27, 2004

Top-level Math Subject Classifications
00 General
01 History and biography
03 Mathematical logic and foundations
05 Combinatorics
06 Order, lattices, ordered algebraic structures
08 General algebraic systems
11 Number theory
12 Field theory and polynomials
13 Commutative rings and algebras
14 Algebraic geometry
15 Linear and multilinear algebra; matrix theory
16 Associative rings and algebras
17 Nonassociative rings and algebras
18 Category theory; homological algebra
19 $K$-theory
20 Group theory and generalizations
22 Topological groups, Lie groups
26 Real functions
28 Measure and integration
30 Functions of a complex variable
31 Potential theory
32 Several complex variables and analytic spaces
33 Special functions
34 Ordinary differential equations
35 Partial differential equations
37 Dynamical systems and ergodic theory
39 Difference and functional equations
40 Sequences, series, summability
41 Approximations and expansions
42 Fourier analysis
43 Abstract harmonic analysis
44 Integral transforms, operational calculus

45 Integral equations
46 Functional analysis
47 Operator theory
49 Calculus of variations and optimal control; optimization
51 Geometry
52 Convex and discrete geometry
53 Differential geometry
54 General topology
55 Algebraic topology
57 Manifolds and cell complexes
58 Global analysis, analysis on manifolds
60 Probability theory and stochastic processes
62 Statistics
65 Numerical analysis
68 Computer science
70 Mechanics of particles and systems
74 Mechanics of deformable solids
76 Fluid mechanics
78 Optics, electromagnetic theory
80 Classical thermodynamics, heat transfer
81 Quantum theory
82 Statistical mechanics, structure of matter
83 Relativity and gravitational theory
85 Astronomy and astrophysics
86 Geophysics
90 Operations research, mathematical programming
91 Game theory, economics, social and behavioral sciences
92 Biology and other natural sciences
93 Systems theory; control
94 Information and communication, circuits
97 Mathematics education


Table of Contents

Introduction i
Top-level Math Subject Classifications ii
Table of Contents iv
GNU Free Documentation License lii
UNCLA Unclassified 1
Golomb ruler 1
Hesse configuration 1
Jordan's Inequality 2
Lagrange's theorem 2
Laurent series 3
Lebesgue measure 3
Leray spectral sequence 4
Möbius transformation 4
Mordell-Weil theorem 4
Plateau's Problem 5
Poisson random variable 5
Shannon's theorem 6
Shapiro inequality 9
Sylow p-subgroups 9
Tchirnhaus transformations 9
Wallis formulae 10
ascending chain condition 10
bounded 10
bounded operator 11
complex projective line 12
converges uniformly 12
descending chain condition 13
diamond theorem 13
equivalently oriented bases 13
finitely generated R-module 14
fraction 14
group of covering transformations 15
idempotent 15
isolated 17
isolated singularity 17
isomorphic groups 17
joint continuous density function 18
joint cumulative distribution function 18
joint discrete density function 19
left function notation 20
lift of a submanifold 20
limit of a real function exists at a point 20
Lipschitz function 21
lognormal random variable 21
lowest upper bound 22
marginal distribution 22
measurable space 23
measure zero 23
minimum spanning tree 23
minimum weighted path length 24
mod 2 intersection number 25
moment generating function 27
monoid 27
monotonic operator 27
multidimensional Gaussian integral 28
multiindex 29
near operators 30
negative binomial random variable 36
normal random variable 37
normalizer of a subset of a group 38
nth root 38
null tree 40
open ball 40
opposite ring 40
orbit-stabilizer theorem 41
orthogonal 41
permutation group on a set 41
prime element 42
product measure 43
projective line 43
projective plane 43
proof of calculus theorem used in the Lagrange method 44
proof of orbit-stabilizer theorem 45
proof of power rule 45
proof of primitive element theorem 47
proof of product rule 47
proof of sum rule 48
proof that countable unions are countable 48
quadrature 48
quotient module 49
regular expression 49
regular language 50
right function notation 51
ring homomorphism 51
scalar 51
Schrödinger operator 51

selection sort 52
semiring 53
simple function 54
simple path 54
solutions of an equation 54
spanning tree 54
square root 55
stable sorting algorithm 56
standard deviation 56
stochastic independence 56
substring 57
successor 57
sum rule 58
superset 58
symmetric polynomial 59
the argument principle 59
torsion-free module 59
total order 60
tree traversals 60
trie 63
unit vector 64
unstable fixed point 65
weak* convergence in normed linear space 65
well-ordering principle for natural numbers 65
00-01 Instructional exposition (textbooks,
tutorial papers, etc.) 66
dimension 66
toy theorem 67
00-XX General 68
method of exhaustion 68
00A05 General mathematics 69
Conway's chained arrow notation 69
Knuth's up arrow notation 70
arithmetic progression 70
arity 71
introducing 0th power 71
lemma 71
property 72
saddle point approximation 72
singleton 73
subsequence 73
surreal number 73
00A07 Problem books 76
Nesbitt's inequality 76
proof of Nesbitt's inequality 76

00A20 Dictionaries and other general reference works 78
completing the square 78
00A99 Miscellaneous topics 80
QED 80
TFAE 80
WLOG 81
order of operations 81
01A20 Greek, Roman 84
Roman numerals 84
01A55 19th century 85
Poincaré, Jules Henri 85
01A60 20th century 90
Bourbaki, Nicolas 90
Erdős Number 97
03-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 98
Burali-Forti paradox 98
Cantor's paradox 98
Russell's paradox 99
biconditional 99
bijection 100
cartesian product 100
chain 100
characteristic function 101
concentric circles 101
conjunction 102
disjoint 102
empty set 102
even number 103
fixed point 103
infinite 103
injective function 104
integer 104
inverse function 105
linearly ordered 106
operator 106
ordered pair 106
ordering relation 106
partition 107
pullback 107
set closed under an operation 108
signature of a permutation 109
subset 109
surjective 110

transposition 110
truth table 111
03-XX Mathematical logic and foundations 112
standard enumeration 112
03B05 Classical propositional logic 113
CNF 113
Proof that contrapositive statement is true using logical equivalence 113
contrapositive 114
disjunction 114
equivalent 114
implication 115
propositional logic 115
theory 116
transitive 116
truth function 117
03B10 Classical first-order logic 118
Δ1 bootstrapping 118
Boolean 119
Gödel numbering 120
Gödel's incompleteness theorems 120
Lindenbaum algebra 127
Lindström's theorem 128
Presburger arithmetic 129
R-minimal element 129
Skolemization 129
arithmetical hierarchy 129
arithmetical hierarchy is a proper hierarchy 130
atomic formula 131
creating an infinite model 131
criterion for consistency of sets of formulas 132
deductions are Δ1 132
example of Gödel numbering 134
example of well-founded induction 135
first order language 136
first order logic 137
first order theories 138
free and bound variables 138
generalized quantifier 139
logic 140
proof of compactness theorem for first order logic 141
proof of principle of transfinite induction 141
proof of the well-founded induction principle 141
quantifier 141
quantifier free 144
subformula 144
syntactic compactness theorem for first order logic 144
transfinite induction 144
universal relation 145
universal relations exist for each level of the arithmetical hierarchy 145
well-founded induction 146
well-founded induction on formulas 147
03B15 Higher-order logic and type theory 143
Härtig's quantifier 143
Russell's theory of types 143
analytic hierarchy 145
game-theoretical quantifier 146
logical language 147
second order logic 148
03B40 Combinatory logic and lambda-calculus 150
Church integer 150
combinatory logic 150
lambda calculus 151
03B48 Probability and inductive logic 154
conditional probability 154
03B99 Miscellaneous 155
Beth property 155
Hofstadter's MIU system 155
IF-logic 157
Tarski's result on the undefinability of Truth 160
axiom 161
compactness 164
consistent 164
interpolation property 164
sentence 165
03Bxx General logic 166
Banach-Tarski paradox 166
03C05 Equational classes, universal algebra 168
congruence 168
every congruence is the kernel of a homomorphism 168
homomorphic image of a σ-structure is a σ-structure 169
kernel 169
kernel of a homomorphism is a congruence 169
quotient structure 170
03C07 Basic properties of first-order languages and structures 171
Models constructed from constants 171
Stone space 172
alphabet 173
axiomatizable theory 174
definable 174
definable type 175
downward Löwenheim-Skolem theorem 176
example of definable type 176
example of strongly minimal 177
first isomorphism theorem 177
language 178
length of a string 179
proof of homomorphic image of a σ-structure is a σ-structure 179
satisfaction relation 180
signature 181
strongly minimal 181
structure preserving mappings 181
structures 182
substructure 183
type 183
upward Löwenheim-Skolem theorem 183
03C15 Denumerable structures 185
random graph (infinite) 185
03C35 Categoricity and completeness of
theories 187
κ-categorical 187
Vaught's test 187
proof of Vaught's test 187
03C50 Models with special properties
(saturated, rigid, etc.) 189
example of universal structure 189
homogeneous 191
universal structure 191
03C52 Properties of classes of models
192
amalgamation property 192
03C64 Model theory of ordered structures; o-minimality 193

infinitesimal 193
o-minimality 194
real closed fields 194
03C68 Other classical first-order model
theory 196
imaginaries 196
03C90 Nonclassical models (Boolean-valued,
sheaf, etc.) 198
Boolean valued model 198
03C99 Miscellaneous 199
axiom of foundation 199
elementarily equivalent 199
elementary embedding 200
model 200
proof equivalence of formulation of foundation
201
03D10 Turing machines and related notions 203
Turing machine 203
03D20 Recursive functions and relations,
subrecursive hierarchies 206
primitive recursive 206
03D25 Recursively (computably) enumerable sets and degrees 207
recursively enumerable 207
03D75 Abstract and axiomatic computability and recursion theory 208
Ackermann function 208
halting problem 209
03E04 Ordered sets and their cofinalities; pcf theory 211
another definition of cofinality 211
cofinality 211
maximal element 212
partitions less than cofinality 213
well ordered set 213
pigeonhole principle 213
proof of pigeonhole principle 213
tree (set theoretic) 214
κ-complete 215
Cantor's diagonal argument 215
Fodor's lemma 216
Schroeder-Bernstein theorem 216
Veblen function 216
additively indecomposable, 217


cardinal number 217


cardinal successor 217
cardinality 218
cardinality of a countable union 218
cardinality of the rationals 219
classes of ordinals and enumerating functions 219
club 219
club filter 220
countable 220
countably infinite 221
finite 221
fixed points of normal functions 221
height of an algebraic number 221
if A is infinite and B is a finite subset of A, then
A \ B is infinite 222
limit cardinal 222
natural number 223
ordinal arithmetic 224
ordinal number 225
power set 225
proof of Fodor's lemma 225
proof of Schroeder-Bernstein theorem 225
proof of fixed points of normal functions 226
proof of the existence of transcendental numbers
226
proof of theorems in additively indecomposable 227
proof that the rationals are countable 228
stationary set 228
successor cardinal 229
uncountable 229
von Neumann integer 229
von Neumann ordinal 230
weakly compact cardinal 231
weakly compact cardinals and the tree property
231
Cantor's theorem 232
proof of Cantor's theorem 232
additive 232
antisymmetric 233
constant function 233
direct image 234
domain 234
Dynkin system 234
equivalence class 235

fibre 235
filtration 236
finite character 236
fix (transformation actions) 236
function 237
functional 237
generalized cartesian product 238
graph 238
identity map 238
inclusion mapping 239
inductive set 239
invariant 240
inverse function theorem 240
inverse image 241
mapping 242
mapping of period n is a bijection 242
partial function 242
partial mapping 243
period of mapping 243
pi-system 244
proof of inverse function theorem 244
proper subset 246
range 246
reflexive 246
relation 246
restriction of a mapping 247
set difference 247
symmetric 247
symmetric difference 248
the inverse image commutes with set operations
248
transformation 249
transitive 250
transitive 250
transitive closure 250
Hausdorff's maximum principle 250
Kuratowski's lemma 251
Tukey's lemma 251
Zermelo's postulate 251
Zermelo's well-ordering theorem 251
Zorn's lemma 252
axiom of choice 252
equivalence of Hausdorff's maximum principle, Zorn's lemma and the well-ordering theorem 252
equivalence of Zorn's lemma and the axiom of


choice 253
maximality principle 254
principle of finite induction 254
principle of finite induction proven from well-ordering principle 255
proof of Tukey's lemma 255
proof of Zermelo's well-ordering theorem 255
axiom of extensionality 256
axiom of infinity 256
axiom of pairing 257
axiom of power set 258
axiom of union 258
axiom schema of separation 259
de Morgan's laws 260
de Morgan's laws for sets (proof) 261
set theory 261
union 264
universe 264
von Neumann-Bernays-Gödel set theory 265
FS iterated forcing preserves chain condition 267
chain condition 268
composition of forcing notions 268
composition preserves chain condition 268
equivalence of forcing notions 269
forcing relation 270
forcings are equivalent if one is dense in the other 270
iterated forcing 272
iterated forcing and composition 273
name 273
partial order with chain condition does not collapse cardinals 274
proof of partial order with chain condition does not collapse cardinals 274
proof that forcing notions are equivalent to their composition 275
complete partial orders do not add small subsets 280
proof of complete partial orders do not add small subsets 280
3 is equivalent to and continuum hypothesis 281
Levy collapse 281
proof of 3 is equivalent to and continuum hypothesis 282
Martin's axiom 283
Martin's axiom and the continuum hypothesis 283
Martin's axiom is consistent 284
a shorter proof: Martin's axiom and the continuum hypothesis 287
continuum hypothesis 288
forcing 288
generalized continuum hypothesis 289
inaccessible cardinals 290
3 290
290
Dedekind infinite 291
Zermelo-Fraenkel axioms 291
class 291
complement 293
delta system 293
delta system lemma 293
diagonal intersection 293
intersection 294
multiset 294
proof of delta system lemma 294
rational number 295
saturated (set) 295
separation and doubletons axiom 295
set 296
03Exx Set theory 299
intersection 299
03F03 Proof theory, general 300
NJp 300
NKp 300
natural deduction 301
sequent 301
sound, complete 302
03F07 Structure of proofs 303
induction 303
03F30 First-order arithmetic and fragments 307
Elementary Functional Arithmetic 307
PA 308
Peano arithmetic 308
03F35 Second- and higher-order arithmetic and fragments 310
ACA0 310


RCA0 310
Z2 310
comprehension axiom 311
induction axiom 311
03G05 Boolean algebras 313
Boolean algebra 313
M. H. Stone's representation theorem 313
03G10 Lattices and related structures 314
Boolean lattice 314
complete lattice 314
lattice 315
03G99 Miscellaneous 316
Chu space 316
Chu transform 316
biextensional collapse 317
example of Chu space 317
property of a Chu space 318
05-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 319
example of pigeonhole principle 319
multi-index derivative of a power 319
multi-index notation 320
05A10 Factorials, binomial coefficients, combinatorial functions 322
Catalan numbers 322
Levi-Civita permutation symbol 323
Pascal's rule (bit string proof) 325
Pascal's rule proof 326
Pascal's triangle 326
Upper and lower bounds to binomial coefficient 328
binomial coefficient 328
double factorial 329
factorial 329
falling factorial 330
inductive proof of binomial theorem 331
multinomial theorem 332
multinomial theorem (proof) 333
proof of upper and lower bounds to binomial coefficient 334
05A15 Exact enumeration problems, generating functions 336
Stirling numbers of the first kind 336
Stirling numbers of the second kind 338
05A19 Combinatorial identities 342
Pascal's rule 342
05A99 Miscellaneous 343
principle of inclusion-exclusion 343
principle of inclusion-exclusion proof 344
05B15 Orthogonal arrays, Latin squares, Room squares 346
example of Latin squares 346
graeco-latin squares 346
latin square 347
magic square 347
05B35 Matroids, geometric lattices 348
matroid 348
polymatroid 353
05C05 Trees 354
AVL tree 354
Aronszajn tree 354
Suslin tree 354
antichain 355
balanced tree 355
binary tree 355
branch 356
child node (of a tree) 356
complete binary tree 357
digital search tree 357
digital tree 358
example of Aronszajn tree 358
example of tree (set theoretic) 359
extended binary tree 359
external path length 360
internal node (of a tree) 360
leaf node (of a tree) 361
parent node (in a tree) 361
proof that has the tree property 362
root (of a tree) 362
tree 363
weight-balanced binary trees are ultrametric 364
weighted path length 366
05C10 Topological graph theory, imbedding 367
Heawood number 367
Kuratowski's theorem 368
Szemerédi-Trotter theorem 368
crossing lemma 369
crossing number 369

graph topology 369
planar graph 370
proof of crossing lemma 370
05C12 Distance in graphs 372
Hamming distance 372
05C15 Coloring of graphs and hypergraphs 373
bipartite graph 373
chromatic number 374
chromatic number and girth 375
chromatic polynomial 375
colouring problem 376
complete bipartite graph 377
complete k-partite graph 378
four-color conjecture 378
k-partite graph 379
property B 380
05C20 Directed graphs (digraphs), tournaments 381
cut 381
de Bruijn digraph 381
directed graph 382
flow 383
maximum flow/minimum cut theorem 384
tournament 385
05C25 Graphs and groups 387
Cayley graph 387
05C38 Paths and cycles 388
Euler path 388
Veblen's theorem 388
acyclic graph 389
bridges of Königsberg 389
cycle 390
girth 391
path 391
proof of Veblen's theorem 392
05C40 Connectivity 393
k-connected graph 393
Thomassen's theorem on 3-connected graphs 393
Tutte's wheel theorem 394
connected graph 394
cutvertex 395
05C45 Eulerian and Hamiltonian graphs 396
Bondy and Chvátal theorem 396
Dirac theorem 396
Euler circuit 397
Fleury's algorithm 397
Hamiltonian cycle 398
Hamiltonian graph 398
Hamiltonian path 398
Ore's theorem 398
Petersen graph 399
hypohamiltonian 399
traceable 399
05C60 Isomorphism problems (reconstruction conjecture, etc.) 400
graph isomorphism 400
05C65 Hypergraphs 402
Steiner system 402
finite plane 402
hypergraph 403
linear space 404
05C69 Dominating sets, independent sets, cliques 405
Mantel's theorem 405
clique 405
proof of Mantel's theorem 405
05C70 Factorization, matching, covering and packing 407
Petersen theorem 407
Tutte theorem 407
bipartite matching 407
edge covering 409
matching 409
maximal bipartite matching algorithm 410
maximal matching/minimal edge covering theorem 411
05C75 Structural characterization of types of graphs 413
multigraph 413
pseudograph 413
05C80 Random graphs 414
examples of probabilistic proofs 414
probabilistic method 415
05C90 Applications 417
Hasse diagram 417
05C99 Miscellaneous 419
Euler's polyhedron theorem 419
Poincaré formula 419


Turán's theorem 419
Wagner's theorem 420
block 420
bridge 420
complete graph 420
degree (of a vertex) 421
distance (in a graph) 421
edge-contraction 421
graph 422
graph minor theorem 422
graph theory 423
homeomorphism 424
loop 424
minor (of a graph) 424
neighborhood (of a vertex) 425
null graph 425
order (of a graph) 425
proof of Euler's polyhedron theorem 426
proof of Turán's theorem 427
realization 427
size (of a graph) 428
subdivision 428
subgraph 429
wheel graph 429
05D05 Extremal set theory 431
LYM inequality 431
Sperner's theorem 432
05D10 Ramsey theory 433
Erdős-Rado theorem 433
Ramsey's theorem 433
Ramsey's theorem 434
arrows 435
coloring 436
proof of Ramsey's theorem 437
05D15 Transversal (matching) theory 438
Hall's marriage theorem 438
proof of Hall's marriage theorem 438
saturate 440
system of distinct representatives 440
05E05 Symmetric functions 441
elementary symmetric polynomial 441
reduction algorithm for symmetric polynomials 441
06-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 443
equivalence relation 443
06-XX Order, lattices, ordered algebraic structures 445
join 445
meet 445
06A06 Partial order, general 446
directed set 446
infimum 446
sets that do not have an infimum 447
supremum 447
upper bound 448
06A99 Miscellaneous 449
dense (in a poset) 449
partial order 449
poset 450
quasi-order 450
well quasi ordering 450
06B10 Ideals, congruence relations 452
order in an algebra 452
06C05 Modular lattices, Desarguesian lattices 453
modular lattice 453
06D99 Miscellaneous 454
distributive 454
distributive lattice 454
06E99 Miscellaneous 455
Boolean ring 455
08A40 Operations, polynomials, primal algebras 456
coefficients of a polynomial 456
08A99 Miscellaneous 457
binary operation 457
filtered algebra 457
11-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 459
Euler phi-function 459
Euler-Fermat theorem 460
Fermat's little theorem 460
Fermat's theorem proof 460
Goldbach's conjecture 460
Jordan's totient function 461
Legendre symbol 461
Pythagorean triplet 462
Wilson's theorem 462
arithmetic mean 462


ceiling 463
computation of powers using Fermat's little theorem 463
congruences 464
coprime 464
cube root 464
floor 465
geometric mean 465
googol 466
googolplex 467
greatest common divisor 467
group theoretic proof of Wilson's theorem 467
harmonic mean 467
mean 468
number field 468
pi 468
proof of Wilson's theorem 470
proof of fundamental theorem of arithmetic 471
root of unity 471
11-01 Instructional exposition (textbooks, tutorial papers, etc.) 472
base 472
11-XX Number theory 474
Lehmer's Conjecture 474
Sierpinski conjecture 474
prime triples conjecture 475
11A05 Multiplicative structure; Euclidean algorithm; greatest common divisors 476
Bezout's lemma (number theory) 476
Euclid's algorithm 476
Euclid's lemma 478
Euclid's lemma proof 478
fundamental theorem of arithmetic 479
perfect number 479
smooth number 480
11A07 Congruences; primitive roots; residue systems 481
Anton's congruence 481
Fermat's Little Theorem proof (Inductive) 482
Jacobi symbol 483
Shanks-Tonelli algorithm 483
Wieferich prime 483
Wilson's theorem for prime powers 484
factorial modulo prime powers 485
proof of Euler-Fermat theorem 485
proof of Lucas's theorem 486
11A15 Power residues, reciprocity 487
Euler's criterion 487
Gauss lemma 487
Zolotarev's lemma 489
cubic reciprocity law 491
proof of Euler's criterion 493
proof of quadratic reciprocity rule 494
quadratic character of 2 495
quadratic reciprocity for polynomials 496
quadratic reciprocity rule 497
quadratic residue 497
11A25 Arithmetic functions; related numbers; inversion formulas 498
Dirichlet character 498
Liouville function 498
Mangoldt function 499
Mertens first theorem 499
Moebius function 499
Moebius inversion 500
arithmetic function 502
multiplicative function 503
non-multiplicative function 505
totient 507
unit 507
11A41 Primes 508
Chebyshev functions 508
Euclid's proof of the infinitude of primes 509
Mangoldt summatory function 509
Mersenne numbers 510
Thue's lemma 510
composite number 511
prime 511
prime counting function 511
prime difference function 512
prime number theorem 512
prime number theorem result 513
proof of Thue's Lemma 514
semiprime 515
sieve of Eratosthenes 516
test for primality of Mersenne numbers 516
11A51 Factorization; primality 517
Fermat Numbers 517
Fermat compositeness test 517
Zsigmondy's theorem 518


divisibility 518
division algorithm for integers 519
proof of division algorithm for integers 519
square-free number 520
squarefull number 520
the prime power dividing a factorial 521
11A55 Continued fractions 523
Stern-Brocot tree 523
continued fraction 524
11A63 Radix representation; digital problems 527
Kummer's theorem 527
corollary of Kummer's theorem 528
11A67 Other representations 529
Sierpinski Erdős egyptian fraction conjecture 529
adjacent fraction 529
any rational number is a sum of unit fractions
530
conjecture on fractions with odd denominators
532
unit fraction 532
11A99 Miscellaneous 533
ABC conjecture 533
Surányi theorem 533
irrational to an irrational power can be rational
534
triangular numbers 534
11B05 Density, gaps, topology 536
Cauchy-Davenport theorem 536
Mann's theorem 536
Schnirelmann density 537
Sidon set 537
asymptotic density 538
discrete space 538
essential component 539
normal order 539
11B13 Additive bases 541
Erdős-Turán conjecture 541
additive basis 542
asymptotic basis 542
base conversion 542
sumset 546
11B25 Arithmetic progressions 547
Behrend's construction 547
Freiman's theorem 548

Szemerédi's theorem 548


multidimensional arithmetic progression 549
11B34 Representation functions 550
Erdős-Fuchs theorem 550
11B37 Recurrences 551
Collatz problem 551
recurrence relation 551
11B39 Fibonacci and Lucas numbers and
polynomials and generalizations 553
Fibonacci sequence 553
Hogatt's theorem 554
Lucas numbers 554
golden ratio 554
11B50 Sequences (mod m) 556
Erdős-Ginzburg-Ziv theorem 556
11B57 Farey sequences; the sequences ? 557
Farey sequence 557
11B65 Binomial coefficients; factorials;
q-identities 559
Lucas's Theorem 559
binomial theorem 559
11B68 Bernoulli and Euler numbers and
polynomials 561
Bernoulli number 561
Bernoulli periodic function 561
Bernoulli polynomial 562
generalized Bernoulli number 562
11B75 Other combinatorial number theory 563
Erdős-Heilbronn conjecture 563
Freiman isomorphism 563
sum-free 564
11B83 Special sequences and polynomials 565
Beatty sequence 565
Beatty's theorem 566
Fraenkel's partition theorem 566
Sierpinski numbers 567
palindrome 567
proof of Beatty's theorem 568
square-free sequence 569
superincreasing sequence 569
11B99 Miscellaneous 570
Lychrel number 570


closed form 571
11C08 Polynomials 573
content of a polynomial 573
cyclotomic polynomial 573
height of a polynomial 574
length of a polynomial 574
proof of Eisenstein criterion 574
proof that the cyclotomic polynomial is irreducible 575
11D09 Quadratic and bilinear equations 577
Pell's equation and simple continued fractions 577
11D41 Higher degree equations; Fermat's equation 578
Beal conjecture 578
Euler quartic conjecture 579
Fermat's last theorem 580
11D79 Congruences in many variables 582
Chinese remainder theorem 582
Chinese remainder theorem proof 583
11D85 Representation problems 586
polygonal number 586
11D99 Miscellaneous 588
Diophantine equation 588
11E39 Bilinear and Hermitian forms 590
Hermitian form 590
non-degenerate bilinear form 590
positive definite form 591
symmetric bilinear form 591
Clifford algebra 591
11Exx Forms and linear algebraic groups 593
quadratic function associated with a linear functional 593
11F06 Structure of modular groups and generalizations; arithmetic groups 594
Taniyama-Shimura theorem 594
11F30 Fourier coefficients of automorphic forms 597
Fourier coefficients 597
11F67 Special values of automorphic L-series, periods of modular forms, cohomology, modular symbols 598
Schanuel's conjecture 598
period 598
11G05 Elliptic curves over global fields 600
complex multiplication 600
11H06 Lattices and convex bodies 602
Minkowski's theorem 602
lattice in Rn 602
11H46 Products of linear forms 604
triple scalar product 604
11J04 Homogeneous approximation to one number 605
Dirichlet's approximation theorem 605
11J68 Approximation to algebraic numbers 606
Davenport-Schmidt theorem 606
Liouville approximation theorem 606
proof of Liouville approximation theorem 607
11J72 Irrationality; linear independence over a field 609
nth root of 2 is irrational for n ≥ 3 (proof using Fermat's last theorem) 609
e is irrational (proof) 610
irrational 610
square root of 2 is irrational 611
11J81 Transcendence (general theory) 612
Fundamental Theorem of Transcendence 612
Gelfond's theorem 612
four exponentials conjecture 612
six exponentials theorem 613
transcendental number 614
11K16 Normal numbers, radix expansions, etc. 615
absolutely normal 615
11K45 Pseudo-random numbers; Monte Carlo methods 617
pseudorandom numbers 617
quasirandom numbers 618
random numbers 619
truly random numbers 619
11L03 Trigonometric and exponential sums, general 620
Ramanujan sum 620
11L05 Gauss and Kloosterman sums; generalizations 622
Gauss sum 622
Kloosterman sum 623
Landsberg-Schaar relation 623
derivation of Gauss sum up to a sign 624
11L40 Estimates on character sums 625
Pólya-Vinogradov inequality 625
11N32 Primes represented by polynomials; other multiplicative structure of polynomial values 644
Euler four-square identity 644
11N56 Rate of growth of arithmetic functions 645
highly composite number 645
11N99 Miscellaneous 646
Chinese remainder theorem 646
proof of chinese remainder theorem 646
11P05 Waring's problem and variants 648
Lagrange's four-square theorem 648
Waring's problem 648
proof of Lagrange's four-square theorem 649
11P81 Elementary theory of partitions 651
pentagonal number theorem 651
11R04 Algebraic numbers; rings of algebraic integers 653
Dedekind domain 653
Dirichlet's unit theorem 653
Eisenstein integers 654
Galois representation 654
Gaussian integer 658
algebraic conjugates 659
algebraic integer 659
algebraic number 659
algebraic number field 659
calculating the splitting of primes 660
characterization in terms of prime ideals 661
ideal classes form an abelian group 661
integral basis 661
integrally closed 662
transcendental root theorem 662
11R06 PV-numbers and generalizations; other special algebraic numbers 663
Salem number 663
11R11 Quadratic extensions 664
prime ideal decomposition in quadratic extensions of Q 664
11R18 Cyclotomic extensions 666
Kronecker-Weber theorem 666
examples of regular primes 667
prime ideal decomposition in cyclotomic extensions of Q 668
regular prime 669
11R27 Units and factorization 670
regulator 670
11R29 Class numbers, class groups, discriminants 672
Existence of Hilbert Class Field 672
class number formula 673
discriminant 673
ideal class 674
11M06 (s) and L(s, ) 627
Aperys constant 627
Dedekind zeta function 627
Dirichlet L-series 628
Riemann -function 629
Riemann Xi function 630
Riemann omega function 630
functional equation for the Riemann Xi function
630
functional equation for the Riemann theta function 631
generalized Riemann hypothesis 631
proof of functional equation for the Riemann theta
function 631
11M99 Miscellaneous 633
Riemann zeta function 633
formulae for zeta in the critical strip 636
functional equation of the Riemann zeta function
638
value of the Riemann zeta function at s = 2 638
11N05 Distribution of primes 640
Bertrands conjecture 640
Bruns constant 640
proof of Bertrands conjecture 640
twin prime conjecture 642
11N13 Primes in progressions 643
primes in progressions 648

xvi

ray class group 675


11R32 Galois theory 676
Galois criterion for solvability of a polynomial by
radicals 676
11R34 Galois cohomology 677
Hilbert Theorem 90 677
11R37 Class field theory 678
Artin map 678
Tchebotarev density theorem 679
modulus 679
multiplicative congruence 680
ray class field 680
11R56 Adèle rings and groups 682
adèle 682
idèle 682
restricted direct product 683
11R99 Miscellaneous 684
Henselian field 684
valuation 685
11S15 Ramification and extension theory 686
decomposition group 686
examples of prime ideal decomposition in number fields 688
inertial degree 691
ramification index 692
unramified action 697
11S31 Class field theory; p-adic formal
groups 699
Hilbert symbol 699
11S99 Miscellaneous 700
p-adic integers 700
local field 701
11Y05 Factorization 703
Pollard's rho method 703
quadratic sieve 706
11Y55 Calculation of integer sequences
709
Kolakoski sequence 709
11Z05 Miscellaneous applications of number theory 711
τ function 711
arithmetic derivative 711
example of arithmetic derivative 712
proof that τ(n) is the number of positive divisors of n 712
12-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 714
monomial 714
order and degree of polynomial 715
12-XX Field theory and polynomials 716
homogeneous polynomial 716
subfield 716
12D05 Polynomials: factorization 717
factor theorem 717
proof of factor theorem 717
proof of rational root theorem 718
rational root theorem 719
sextic equation 719
12D10 Polynomials: location of zeros
(algebraic theorems) 720
Cardano's derivation of the cubic formula 720
Ferrari-Cardano derivation of the quartic formula
721
Galois-theoretic derivation of the cubic formula
722
Galois-theoretic derivation of the quartic formula
724
cubic formula 728
derivation of quadratic formula 728
quadratic formula 729
quartic formula 730
reciprocal polynomial 730
root 731
variant of Cardano's derivation 732
12D99 Miscellaneous 733
Archimedean property 733
complex 734
complex conjugate 735
complex number 737
examples of totally real fields 738
fundamental theorem of algebra 739
imaginary 739
imaginary unit 739
indeterminate form 739
inequalities for real numbers 740
interval 742
modulus of complex number 743
proof of fundamental theorem of algebra 744
proof of the fundamental theorem of algebra 744


real and complex embeddings 744


real number 746
totally real and imaginary fields 747
12E05 Polynomials (irreducibility, etc.)
748
Gauss's Lemma I 748
Gauss's Lemma II 749
discriminant 749
polynomial ring 751
resolvent 751
de Moivre identity 754
monic 754
Wedderburn's Theorem 754
proof of Wedderburn's theorem 755
second proof of Wedderburn's theorem 756
finite field 757
Frobenius automorphism 760
characteristic 761
characterization of field 761
example of an infinite field of finite characteristic
762
examples of fields 762
field 764
field homomorphism 764
prime subfield 765
12F05 Algebraic extensions 766
a finite extension of fields is an algebraic extension 766
algebraic closure 767
algebraic extension 767
algebraically closed 767
algebraically dependent 768
existence of the minimal polynomial 768
finite extension 769
minimal polynomial 769
norm 770
primitive element theorem 770
splitting field 770
the field extension R/Q is not finite 771
trace 771
12F10 Separable extensions, Galois theory 772
Abelian extension 772
Fundamental Theorem of Galois Theory 772
Galois closure 773

Galois conjugate 773


Galois extension 773
Galois group 773
absolute Galois group 774
cyclic extension 774
example of nonperfect field 774
fixed field 774
infinite Galois theory 774
normal closure 776
normal extension 776
perfect field 777
radical extension 777
separable 777
separable closure 778
12F20 Transcendental extensions 779
transcendence degree 779
12F99 Miscellaneous 780
composite field 780
extension field 780
12J15 Ordered fields 782
ordered field 782
13-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 783
absolute value 783
associates 784
cancellation ring 784
comaximal 784
every prime ideal is radical 784
module 785
radical of an ideal 786
ring 786
subring 787
tensor product 787
13-XX Commutative rings and algebras
789
commutative ring 789
13A02 Graded rings 790
graded ring 790
13A05 Divisibility 791
Eisenstein criterion 791
13A10 Radical theory 792
Hilbert's Nullstellensatz 792
nilradical 792
radical of an integer 793
13A15 Ideals; multiplicative ideal theory 794
contracted ideal 794
existence of maximal ideals 794
extended ideal 795
fractional ideal 796
homogeneous ideal 797
ideal 797
maximal ideal 797
principal ideal 798
the set of prime ideals of a commutative ring
with identity 798
13A50 Actions of groups on commutative rings; invariant theory 799
Schwarz (1975) theorem 799
invariant polynomial 800
13A99 Miscellaneous 801
Lagrange's identity 801
characteristic 802
cyclic ring 802
proof of Euler four-square identity 803
proof that every subring of a cyclic ring is a cyclic
ring 804
proof that every subring of a cyclic ring is an
ideal 804
zero ring 805
13B02 Extension theory 806
algebraic 806
module-finite 806
13B05 Galois theory 807
algebraic 807
13B21 Integral dependence 808
integral 808
13B22 Integral closure of rings and ideals ; integrally closed rings, related rings
(Japanese, etc.) 809
integral closure 809
13B30 Quotients and localization 810
fraction field 810
localization 810
multiplicative set 811
13C10 Projective and free modules and
ideals 812
example of free module 812
13C12 Torsion modules and ideals 813
torsion element 813

13C15 Dimension theory, depth, related rings (catenary, etc.) 814
Krull's principal ideal theorem 814
13C99 Miscellaneous 815
Artin-Rees theorem 815
Nakayama's lemma 815
prime ideal 815
proof of Nakayama's lemma 816
proof of Nakayama's lemma 817
support 817
13E05 Noetherian rings and modules 818
Hilbert basis theorem 818
Noetherian module 818
proof of Hilbert basis theorem 819
finitely generated modules over a principal ideal
domain 819
13F07 Euclidean rings and generalizations 821
Euclidean domain 821
Euclidean valuation 821
proof of Bezout's Theorem 822
proof that a Euclidean domain is a PID 822
13F10 Principal ideal rings 823
Smith normal form 823
13F25 Formal power series rings 825
formal power series 825
13F30 Valuation rings 831
discrete valuation 831
discrete valuation ring 831
13G05 Integral domains 833
Dedekind-Hasse valuation 833
PID 834
UFD 834
a finite integral domain is a field 835
an artinian integral domain is a field 835
example of PID 835
field of quotients 836
integral domain 836
irreducible 837
motivation for Euclidean domains 837
zero divisor 838
13H05 Regular local rings 839
regular local ring 839
13H99 Miscellaneous 840
local ring 840


semi-local ring 841
13J10 Complete rings, completion 842
completion 842
13J25 Ordered rings 844
ordered ring 844
13J99 Miscellaneous 845
topological ring 845
13N15 Derivations 846
derivation 846
13P10 Polynomial ideals, Gröbner bases 847
Gröbner basis 847
14-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 849
Picard group 849
affine space 849
affine variety 849
dual isogeny 850
finite morphism 850
isogeny 851
line bundle 851
nonsingular variety 852
projective space 852
projective variety 854
quasi-finite morphism 854
14A10 Varieties and morphisms 855
Zariski topology 855
algebraic map 856
algebraic sets and polynomial ideals 856
noetherian topological space 857
regular map 857
structure sheaf 858
14A15 Schemes and morphisms 859
closed immersion 859
coherent sheaf 859
fibre product 860
prime spectrum 860
scheme 863
separated scheme 864
singular set 864
14A99 Miscellaneous 865
Cartier divisor 865
General position 865
Serre's twisting theorem 866
ample 866
height of a prime ideal 866
invertible sheaf 866
locally free 867
normal irreducible varieties are nonsingular in codimension 1 867
sheaf of meromorphic functions 867
very ample 867
14C20 Divisors, linear systems, invertible sheaves 869
divisor 869
Rational and birational maps 870
general type 870
14F05 Vector bundles, sheaves, related constructions 871
direct image (functor) 871
14F20 Étale and other Grothendieck topologies and cohomologies 872
site 872
14F25 Classical real and complex cohomology 873
Serre duality 873
sheaf cohomology 874
14G05 Rational points 875
Hasse principle 875
14H37 Automorphisms 876
Frobenius morphism 876
14H45 Special curves and curves of low genus 878
Fermat's spiral 878
archimedean spiral 878
folium of Descartes 879
spiral 879
14H50 Plane and space curves 880
torsion (space curve) 880
14H52 Elliptic curves 881
Birch and Swinnerton-Dyer conjecture 881
Hasse's bound for elliptic curves over finite fields 882
L-series of an elliptic curve 882
Mazur's theorem on torsion of elliptic curves 884
Mordell curve 884
Nagell-Lutz theorem 885
Selmer group 886
bad reduction 887
conductor of an elliptic curve 890


elliptic curve 890
height function 894
j-invariant 895
rank of an elliptic curve 896
supersingular 897
the torsion subgroup of an elliptic curve injects in the reduction of the curve 897
14H99 Miscellaneous 900
Riemann-Roch theorem 900
genus 900
projective curve 901
proof of Riemann-Roch theorem 901
14L17 Affine algebraic groups, hyperalgebra constructions 902
affine algebraic group 902
algebraic torus 902
14M05 Varieties defined by ring conditions (factorial, Cohen-Macaulay, seminormal) 903
normal 903
14M15 Grassmannians, Schubert varieties, flag manifolds 904
Borel-Bott-Weil theorem 904
flag variety 905
14R15 Jacobian problem 906
Jacobian conjecture 906
15-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 907
Cholesky decomposition 907
Hadamard matrix 908
Hessenberg matrix 909
If A ∈ Mn(k) and A is supertriangular then An = 0 910
Jacobi determinant 910
Jacobi's Theorem 912
Kronecker product 912
LU decomposition 913
Peetre's inequality 914
Schur decomposition 915
antipodal 916
conjugate transpose 916
corollary of Schur decomposition 917
covector 918
diagonal matrix 918
diagonalization 920
diagonally dominant matrix 920
eigenvalue (of a matrix) 921
eigenvalue problem 922
eigenvalues of orthogonal matrices 924
eigenvector 925
exactly determined 926
free vector space over a set 926
in a vector space, λv = 0 if and only if λ = 0 or v is the zero vector 928
invariant subspace 929
least squares 929
linear algebra 930
linear least squares 932
linear manifold 934
matrix exponential 934
matrix operations 935
nilpotent matrix 938
nilpotent transformation 938
non-zero vector 939
off-diagonal entry 940
orthogonal matrices 940
orthogonal vectors 941
overdetermined 941
partitioned matrix 941
pentadiagonal matrix 942
proof of Cayley-Hamilton theorem 942
proof of Schur decomposition 943
singular value decomposition 944
skew-symmetric matrix 945
square matrix 946
strictly upper triangular matrix 946
symmetric matrix 947
theorem for normal triangular matrices 947
triangular matrix 948
tridiagonal matrix 949
under determined 950
unit triangular matrix 950
unitary 951
vector space 952
vector subspace 953
zero map 954
zero vector in a vector space is unique 955
zero vector space 955
15-01 Instructional exposition (textbooks, tutorial papers, etc.) 956


circulant matrix 956


matrix 957
15-XX Linear and multilinear algebra;
matrix theory 960
linearly dependent functions 960
15A03 Vector spaces, linear dependence, rank 961
Sylvester's law 961
basis 961
complementary subspace 962
dimension 963
every vector space has a basis 964
flag 964
frame 965
linear combination 968
linear independence 968
list vector 968
nullity 969
orthonormal basis 970
physical vector 970
proof of rank-nullity theorem 972
rank 973
rank-nullity theorem 973
similar matrix 974
span 975
theorem for the direct sum of finite dimensional
vector spaces 976
vector 976
15A04 Linear transformations, semilinear transformations 980
admissibility 980
conductor of a vector 980
cyclic decomposition theorem 981
cyclic subspace 981
dimension theorem for symplectic complement
(proof) 981
dual homomorphism 982
dual homomorphism of the derivative 983
image of a linear transformation 984
invertible linear transformation 984
kernel of a linear transformation 985
linear transformation 985
minimal polynomial (endomorphism) 986
symplectic complement 987
trace 988

15A06 Linear equations 989


Gaussian elimination 989
finite-dimensional linear problem 991
homogeneous linear problem 992
linear problem 993
reduced row echelon form 993
row echelon form 994
under-determined polynomial interpolation 994
15A09 Matrix inversion, generalized inverses 996
matrix adjoint 996
matrix inverse 997
15A12 Conditioning of matrices 1000
singular 1000
15A15 Determinants, permanents, other
special matrix functions 1001
Cayley-Hamilton theorem 1001
Cramer's rule 1001
cofactor expansion 1002
determinant 1003
determinant as a multilinear mapping 1005
determinants of some matrices of special form
1006
example of Cramer's rule 1006
proof of Cramer's rule 1008
proof of cofactor expansion 1008
resolvent matrix 1009
15A18 Eigenvalues, singular values, and
eigenvectors 1010
Jordan canonical form theorem 1010
Lagrange multiplier method 1011
Perron-Frobenius theorem 1011
characteristic equation 1012
eigenvalue 1012
eigenvalue 1013
15A21 Canonical forms, reductions, classification 1015
companion matrix 1015
eigenvalues of an involution 1015
linear involution 1016
normal matrix 1017
projection 1018
quadratic form 1019
15A23 Factorization of matrices 1021
QR decomposition 1021


15A30 Algebraic systems of matrices 1023
ideals in matrix algebras 1023
15A36 Matrices of integers 1025
permutation matrix 1025
15A39 Linear inequalities 1026
Farkas lemma 1026
15A42 Inequalities involving eigenvalues and eigenvectors 1027
Gershgorin's circle theorem 1027
Gershgorin's circle theorem result 1027
Schur's inequality 1028
15A48 Positive matrices and their generalizations; cones of matrices 1029
negative definite 1029
negative semidefinite 1029
positive definite 1030
positive semidefinite 1030
primitive matrix 1031
reducible matrix 1031
15A51 Stochastic matrices 1032
Birkhoff-von Neumann theorem 1032
proof of Birkhoff-von Neumann theorem 1032
15A57 Other types of matrices (Hermitian, skew-Hermitian, etc.) 1035
Hermitian matrix 1035
direct sum of Hermitian and skew-Hermitian matrices 1036
identity matrix 1037
skew-Hermitian matrix 1037
transpose 1038
15A60 Norms of matrices, numerical range, applications of functional analysis to matrix theory 1041
Frobenius matrix norm 1041
matrix p-norm 1042
self consistent matrix norm 1043
15A63 Quadratic and bilinear forms, inner products 1044
Cauchy-Schwarz inequality 1044
adjoint endomorphism 1045
anti-symmetric 1046
bilinear map 1046
dot product 1049
every orthonormal set is linearly independent 1050
inner product 1051
inner product space 1051
proof of Cauchy-Schwarz inequality 1052
self-dual 1052
skew-symmetric bilinear form 1053
spectral theorem 1053
15A66 Clifford algebras, spinors 1056
geometric algebra 1056
15A69 Multilinear algebra, tensor products 1058
Einstein summation convention 1058
basic tensor 1059
multi-linear 1061
outer multiplication 1061
tensor 1062
tensor algebra 1065
tensor array 1065
tensor product (vector spaces) 1067
tensor transformations 1069
15A72 Vector and tensor algebra, theory of invariants 1072
bac-cab rule 1072
cross product 1072
euclidean vector 1073
rotational invariance of cross product 1074
15A75 Exterior algebra, Grassmann algebras 1076
contraction 1076
exterior algebra 1077
15A99 Miscellaneous topics 1081
Kronecker delta 1081
dual space 1081
example of trace of a matrix 1083
generalized Kronecker delta symbol 1083
linear functional 1084
modules are a generalization of vector spaces 1084
proof of properties of trace of a matrix 1085
quasipositive matrix 1086
trace of a matrix 1086

Volume 2

16-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1088
direct product of modules 1088
direct sum 1089
exact sequence 1089


quotient ring 1090
16D10 General module theory 1091
annihilator 1091
annihilator is an ideal 1091
artinian 1092
composition series 1092
conjugate module 1093
modular law 1093
module 1093
proof of modular law 1094
zero module 1094
16D20 Bimodules 1095
bimodule 1095
16D25 Ideals 1096
associated prime 1096
nilpotent ideal 1096
primitive ideal 1096
product of ideals 1097
proper ideal 1097
semiprime ideal 1097
zero ideal 1098
16D40 Free, projective, and flat modules
and ideals 1099
finitely generated projective module 1099
flat module 1099
free module 1100
free module 1100
projective cover 1100
projective module 1101
16D50 Injective modules, self-injective
rings 1102
injective hull 1102
injective module 1102
16D60 Simple and semisimple modules,
primitive rings and ideals 1104
central simple algebra 1104
completely reducible 1104
simple ring 1105
16D80 Other classes of modules and ideals 1106
essential submodule 1106
faithful module 1106
minimal prime ideal 1107
module of finite rank 1107

simple module 1107


superfluous submodule 1107
uniform module 1108
16E05 Syzygies, resolutions, complexes
1109
n-chain 1109
chain complex 1109
flat resolution 1110
free resolution 1110
injective resolution 1110
projective resolution 1110
short exact sequence 1111
split short exact sequence 1111
von Neumann regular 1111
16K20 Finite-dimensional 1112
quaternion algebra 1112
16K50 Brauer groups 1113
Brauer group 1113
16K99 Miscellaneous 1114
division ring 1114
16N20 Jacobson radical, quasimultiplication 1115
Jacobson radical 1115
a ring modulo its Jacobson radical is semiprimitive 1116
examples of semiprimitive rings 1116
proof of Characterizations of the Jacobson radical 1117
properties of the Jacobson radical 1118
quasi-regularity 1119
semiprimitive ring 1120
16N40 Nil and nilpotent radicals, sets,
ideals, rings 1121
Koethe conjecture 1121
nil and nilpotent ideals 1121
16N60 Prime and semiprime rings 1123
prime ring 1123
16N80 General radicals and rings 1124
prime radical 1124
radical theory 1124
16P40 Noetherian rings and modules 1126
Noetherian ring 1126
noetherian 1126
16P60 Chain conditions on annihilators and summands: Goldie-type conditions, Krull dimension 1128
Goldie ring 1128
uniform dimension 1128
16S10 Rings determined by universal properties (free algebras, coproducts, adjunction of inverses, etc.) 1130
Ore domain 1130
16S34 Group rings, Laurent polynomial rings 1131
support 1131
16S36 Ordinary and skew polynomial rings and semigroup rings 1132
Gaussian polynomials 1132
q skew derivation 1133
q skew polynomial ring 1133
sigma derivation 1133
sigma, delta constant 1133
skew derivation 1133
skew polynomial ring 1134
16S99 Miscellaneous 1135
algebra 1135
algebra (module) 1135
16U10 Integral domains 1137
Prüfer domain 1137
valuation domain 1137
16U20 Ore rings, multiplicative sets, Ore localization 1139
Goldie's Theorem 1139
Ore condition 1139
Ore's theorem 1140
classical ring of quotients 1140
saturated 1141
16U70 Center, normalizer (invariant elements) 1142
center (rings) 1142
16U99 Miscellaneous 1143
anti-idempotent 1143
16W20 Automorphisms and endomorphisms 1144
ring of endomorphisms 1144
16W30 Coalgebras, bialgebras, Hopf algebras; rings, modules, etc. on which these act 1146
Hopf algebra 1146
almost cocommutative bialgebra 1147
bialgebra 1148
coalgebra 1148
coinvariant 1149
comodule 1149
comodule algebra 1149
comodule coalgebra 1150
module algebra 1150
module coalgebra 1150
16W50 Graded rings and modules 1151
graded algebra 1151
graded module 1151
supercommutative 1151
16W55 Super (or skew) structure 1153
super tensor product 1153
superalgebra 1153
supernumber 1154
16W99 Miscellaneous 1155
Hamiltonian quaternions 1155
16Y30 Near-rings 1158
near-ring 1158
17A01 General theory 1159
commutator bracket 1159
17B05 Structure theory 1161
Killing form 1161
Levi's theorem 1161
nilradical 1161
radical 1162
17B10 Representations, algebraic theory (weights) 1163
Ado's theorem 1163
Lie algebra representation 1163
adjoint representation 1164
examples of non-matrix Lie groups 1165
isotropy representation 1165
17B15 Representations, analytic theory 1166
invariant form (Lie algebras) 1166
17B20 Simple, semisimple, reductive (super)algebras (roots) 1167
Borel subalgebra 1167
Borel subgroup 1167
Cartan matrix 1168
Cartan subalgebra 1168
Cartan's criterion 1168
Casimir operator 1168
Dynkin diagram 1169
Verma module 1169
Weyl chamber 1170
Weyl group 1170
Weyl's theorem 1170
classification of finite-dimensional representations of semi-simple Lie algebras 1171
cohomology of semi-simple Lie algebras 1171
nilpotent cone 1171
parabolic subgroup 1172
pictures of Dynkin diagrams 1172
positive root 1175
rank 1175
root lattice 1175
root system 1176
simple and semi-simple Lie algebras 1177
simple root 1178
weight (Lie algebras) 1178
weight lattice 1178
17B30 Solvable, nilpotent (super)algebras 1179
Engel's theorem 1179
Lie's theorem 1182
solvable Lie algebra 1183
17B35 Universal enveloping (super)algebras 1184
Poincaré-Birkhoff-Witt theorem 1184
universal enveloping algebra 1185
17B56 Cohomology of Lie (super)algebras 1187
Lie algebra cohomology 1187
17B67 Kac-Moody (super)algebras (structure and representation theory) 1188
Kac-Moody algebra 1188
generalized Cartan matrix 1188
17B99 Miscellaneous 1190
Jacobi identity interpretations 1190
Lie algebra 1190
real form 1192
18-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1193
Grothendieck spectral sequence 1193
category of sets 1194
functor 1194
monic 1194
natural equivalence 1195
representable functor 1195
supplemental axioms for an Abelian category 1195
18A05 Definitions, generalizations 1197
autofunctor 1197
automorphism 1197
category 1198
category example (arrow category) 1199
commutative diagram 1199
double dual embedding 1200
dual category 1201
duality principle 1201
endofunctor 1202
examples of initial objects, terminal objects and zero objects 1202
forgetful functor 1204
isomorphism 1205
natural transformation 1205
types of homomorphisms 1205
zero object 1206
18A22 Special properties of functors (faithful, full, etc.) 1208
exact functor 1208
18A25 Functor categories, comma categories 1210
Yoneda embedding 1210
18A30 Limits and colimits (products, sums, directed limits, pushouts, fiber products, equalizers, kernels, ends and coends, etc.) 1211
categorical direct product 1211
categorical direct sum 1211
kernel 1212
18A40 Adjoint functors (universal constructions, reflective subcategories, Kan extensions, etc.) 1213
adjoint functor 1213
equivalence of categories 1214
18B40 Groupoids, semigroupoids, semigroups, groups (viewed as categories) 1215
groupoid (category theoretic) 1215
18E10 Exact categories, abelian categories 1216
abelian category 1216
exact sequence 1217
derived category 1218
enough injectives 1218
18F20 Presheaves and sheaves 1219
locally ringed space 1219
presheaf 1220
sheaf 1220
sheafification 1225
stalk 1226
18F30 Grothendieck groups 1228
Grothendieck group 1228
18G10 Resolutions; derived functors 1229
derived functor 1229
18G15 Ext and Tor, generalizations, Künneth formula 1231
Ext 1231
18G30 Simplicial sets, simplicial objects (in a category) 1232
nerve 1232
simplicial category 1232
simplicial object 1233
18G35 Chain complexes 1235
5-lemma 1235
9-lemma 1236
Snake lemma 1236
chain homotopy 1237
chain map 1237
homology (chain complex) 1237
18G40 Spectral sequences, hypercohomology 1238
spectral sequence 1238
19-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1239
Algebraic K-theory 1239
K-theory 1240
examples of algebraic K-theory groups 1241
19K33 EXT and K-homology 1242
Fredholm module 1242
K-homology 1243
19K99 Miscellaneous 1244
examples of K-theory groups 1244
20-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1245
alternating group is a normal subgroup of the symmetric group 1245
associative 1245
canonical projection 1246
centralizer 1246
commutative 1247
examples of groups 1247
group 1250
quotient group 1250
20-02 Research exposition (monographs, survey articles) 1252
length function 1252
20-XX Group theory and generalizations 1253
free product with amalgamated subgroup 1253
nonabelian group 1254
20A05 Axiomatics and elementary properties 1255
Feit-Thompson theorem 1255
Proof: The orbit of any element of a group is a subgroup 1255
center 1256
characteristic subgroup 1256
class function 1257
conjugacy class 1258
conjugacy class formula 1258
conjugate stabilizer subgroups 1258
coset 1259
cyclic group 1259
derived subgroup 1260
equivariant 1260
examples of finite simple groups 1261
finitely generated group 1262
first isomorphism theorem 1262
fourth isomorphism theorem 1262
generator 1263
group actions and homomorphisms 1263
group homomorphism 1265
homogeneous space 1265
identity element 1268
inner automorphism 1268
kernel 1269
maximal 1269
normal subgroup 1269
normality of subgroups is not transitive 1269
normalizer 1270
order (of a group) 1271

presentation of a group 1271
proof of first isomorphism theorem 1272
proof of second isomorphism theorem 1273
proof that all cyclic groups are abelian 1274
proof that all cyclic groups of the same order are isomorphic to each other 1274
proof that all subgroups of a cyclic group are cyclic 1274
regular group action 1275
second isomorphism theorem 1275
simple group 1276
solvable group 1276
subgroup 1276
third isomorphism theorem 1277
20A99 Miscellaneous 1279
Cayley table 1279
proper subgroup 1280
quaternion group 1280
20B05 General theory for finite groups 1282
cycle notation 1282
permutation group 1283
20B15 Primitive groups 1284
primitive transitive permutation group 1284
20B20 Multiply transitive finite groups 1286
Jordan's theorem (multiply transitive groups) 1286
multiply transitive 1286
sharply multiply transitive 1287
20B25 Finite automorphism groups of algebraic, geometric, or combinatorial structures 1288
diamond theory 1288
20B30 Symmetric groups 1289
symmetric group 1289
symmetric group 1289
20B35 Subgroups of symmetric groups 1290
Cayley's theorem 1290
20B99 Miscellaneous 1291
(p, q) shuffle 1291
Frobenius group 1291
permutation 1292
proof of Cayley's theorem 1292
20C05 Group rings of finite groups and their modules 1294
group ring 1294
20C15 Ordinary representations and characters 1295
Maschke's theorem 1295
a representation which is not completely reducible 1295
orthogonality relations 1296
20C30 Representations of finite symmetric groups 1299
example of immanent 1299
immanent 1299
permanent 1299
20C99 Miscellaneous 1301
Frobenius reciprocity 1301
Schur's lemma 1301
character 1302
group representation 1303
induced representation 1303
regular representation 1304
restriction representation 1304
20D05 Classification of simple and nonsolvable groups 1305
Burnside p q theorem 1305
classification of semisimple groups 1305
semisimple group 1305
20D08 Simple groups: sporadic groups 1307
Janko groups 1307
20D10 Solvable groups, theory of formations, Schunck classes, Fitting classes, π-length, ranks 1308
Cuhinin's Theorem 1308
separable 1308
supersolvable group 1309
20D15 Nilpotent groups, p-groups 1310
Burnside basis theorem 1310
20D20 Sylow subgroups, Sylow properties, π-groups, π-structure 1311
π-groups and π′-groups 1311
p-subgroup 1311
Burnside normal complement theorem 1312
Frattini argument 1312
Sylow p-subgroup 1312
Sylow theorems 1312
Sylow's first theorem 1313


Sylow's third theorem 1313
application of Sylow's theorems to groups of order pq 1313
p-primary component 1314
proof of Frattini argument 1314
proof of Sylow theorems 1314
subgroups containing the normalizers of Sylow
subgroups normalize themselves 1316
20D25 Special subgroups (Frattini, Fitting, etc.) 1317
Fitting's theorem 1317
characteristically simple group 1317
the Frattini subgroup is nilpotent 1317
20D30 Series and lattices of subgroups
1319
maximal condition 1319
minimal condition 1319
subnormal series 1320
20D35 Subnormal subgroups 1321
subnormal subgroup 1321
20D99 Miscellaneous 1322
Cauchy's theorem 1322
Lagrange's theorem 1322
exponent 1322
fully invariant subgroup 1323
proof of Cauchy's theorem 1323
proof of Lagrange's theorem 1324
proof of the converse of Lagrange's theorem for finite cyclic groups 1324
proof that exp G divides |G| 1324
proof that |g| divides exp G 1325
proof that every group of prime order is cyclic
1325
20E05 Free nonabelian groups 1326
Nielsen-Schreier theorem 1326
Schreier index formula 1326
free group 1326
proof of Nielsen-Schreier theorem and Schreier
index formula 1327
Jordan-Hölder decomposition 1328
profinite group 1328
extension 1329
holomorph 1329
proof of the Jordan-Hölder decomposition theorem 1329
semidirect product of groups 1330
wreath product 1333
Jordan-Hölder decomposition theorem 1334
simplicity of the alternating groups 1334
abelian groups of order 120 1337
fundamental theorem of finitely generated abelian
groups 1337
conjugacy class 1338
Frattini subgroup 1338
non-generator 1338
20Exx Structure and classification of infinite or finite groups 1339
faithful group action 1339
20F18 Nilpotent groups 1340
classification of finite nilpotent groups 1340
nilpotent group 1340
20F22 Other classes of groups defined by
subgroup chains 1342
inverse limit 1342
20F28 Automorphism groups of groups
1344
outer automorphism group 1344
20F36 Braid groups; Artin groups 1345
braid group 1345
20F55 Reflection and Coxeter groups 1347
cycle 1347
dihedral group 1348
20F65 Geometric group theory 1349
groups that act freely on trees are free 1349
20F99 Miscellaneous 1350
perfect group 1350
20G15 Linear algebraic groups over arbitrary fields 1351
Nagao's theorem 1351
computation of the order of GL(n, Fq) 1351
general linear group 1352
order of the general linear group over a finite field 1352
special linear group 1352
20G20 Linear algebraic groups over the reals, the complexes, the quaternions 1353
orthogonal group 1353
20G25 Linear algebraic groups over local fields and their integers 1354
Ihara's theorem 1354
20G40 Linear algebraic groups over finite fields 1355
SL2(F3) 1355
20J06 Cohomology of groups 1356
group cohomology 1356
stronger Hilbert theorem 90 1357
20J15 Category of groups 1359
variety of groups 1359
20K01 Finite abelian groups 1360
Schinzel's theorem 1360
20K10 Torsion groups, primary groups and generalized primary groups 1361
torsion 1361
20K25 Direct sums, direct products, etc. 1362
direct product of groups 1362
20K99 Miscellaneous 1363
Klein 4-group 1363
divisible group 1364
example of divisible group 1364
locally cyclic group 1364
20Kxx Abelian groups 1366
abelian group 1366
20M10 General structure theory 1367
existence of maximal semilattice decomposition 1367
semilattice decomposition of a semigroup 1368
simple semigroup 1368
20M12 Ideal theory 1370
Rees factor 1370
ideal 1370
20M14 Commutative semigroups 1372
Archimedean semigroup 1372
commutative semigroup 1372
20M20 Semigroups of transformations, etc. 1373
semigroup of transformations 1373
20M30 Representation of semigroups; actions of semigroups on sets 1375
counting theorem 1375
example of group action 1375
group action 1376
orbit 1377
proof of counting theorem 1377
stabilizer 1378
20M99 Miscellaneous 1379
a semilattice is a commutative band 1379
adjoining an identity to a semigroup 1379
band 1380
bicyclic semigroup 1380
congruence 1381
cyclic semigroup 1381
idempotent 1382
null semigroup 1383
semigroup 1383
semilattice 1383
subsemigroup, submonoid, and subgroup 1384
zero elements 1384
20N02 Sets with a single binary operation (groupoids) 1386
groupoid 1386
idempotency 1386
left identity and right identity 1387
20N05 Loops, quasigroups 1388
Moufang loop 1388
loop and quasigroup 1389
22-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1390
fixed-point subspace 1390
22-XX Topological groups, Lie groups 1391
Cantor space 1391
22A05 Structure of general topological groups 1392
topological group 1392
22C05 Compact groups 1393
n-torus 1393
reductive 1393
22D05 General properties and structure of locally compact groups 1394
-simple 1394
22D15 Group algebras of locally compact groups 1395
group C*-algebra 1395
22E10 General properties and structure of complex Lie groups 1396
existence and uniqueness of compact real form 1396
maximal torus 1397

Lie group 1397
complexification 1399
Hilbert-Weyl theorem 1400
the connection between Lie groups and Lie algebras 1401
26-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1402
derivative notation 1402
fundamental theorems of calculus 1403
logarithm 1404
proof of the first fundamental theorem of calculus 1405
proof of the second fundamental theorem of calculus 1405
root-mean-square 1406
square 1406
26-XX Real functions 1408
abelian function 1408
full-width at half maximum 1408
26A03 Foundations: limits and generalizations, elementary topology of the line 1410
Cauchy sequence 1410
Dedekind cuts 1410
binomial proof of positive integer power rule 1413
exponential 1414
interleave sequence 1415
limit inferior 1415
limit superior 1416
power rule 1417
properties of the exponential 1417
squeeze rule 1418
26A06 One-variable calculus 1420
Darboux's theorem (analysis) 1420
Fermat's Theorem (stationary points) 1420
Heaviside step function 1421
Leibniz rule 1421
Rolle's theorem 1422
binomial formula 1422
chain rule 1422
complex Rolle's theorem 1423
complex mean-value theorem 1423
definite integral 1424
derivative of even/odd function (proof) 1425
direct sum of even/odd functions (example) 1425
even/odd function 1426
example of chain rule 1427
example of increasing/decreasing/monotone function 1428
extended mean-value theorem 1428
increasing/decreasing/monotone function 1428
intermediate value theorem 1429
limit 1429
mean value theorem 1430
mean-value theorem 1430
monotonicity criterion 1431
nabla 1431
one-sided limit 1432
product rule 1432
proof of Darboux's theorem 1433
proof of Fermat's Theorem (stationary points) 1434
proof of Rolle's theorem 1434
proof of Taylor's Theorem 1435
proof of binomial formula 1436
proof of chain rule 1436
proof of extended mean-value theorem 1437
proof of intermediate value theorem 1437
proof of mean value theorem 1438
proof of monotonicity criterion 1439
proof of quotient rule 1439
quotient rule 1440
signum function 1440
26A09 Elementary functions 1443
definitions in trigonometry 1443
hyperbolic functions 1444
26A12 Rate of growth of functions, orders of infinity, slowly varying functions 1446
Landau notation 1446
26A15 Continuity and related questions (modulus of continuity, semicontinuity, discontinuities, etc.) 1448
Dirichlet's function 1448
semi-continuous 1448
semicontinuous 1449
uniformly continuous 1450
26A16 Lipschitz (Hölder) classes 1451
Lipschitz condition 1451
Lipschitz condition and differentiability 1452

Lipschitz condition and differentiability result 1453
26A18 Iteration 1454
iteration 1454
periodic point 1454
26A24 Differentiation (functions of one variable): general theory, generalized derivatives, mean-value theorems 1455
Leibniz notation 1455
derivative 1456
l'Hôpital's rule 1460
proof of De l'Hôpital's rule 1461
related rates 1462
26A27 Nondifferentiability (nondifferentiable functions, points of nondifferentiability), discontinuous derivatives 1464
Weierstrass function 1464
26A36 Antidifferentiation 1465
antiderivative 1465
integration by parts 1465
integration by parts for the Lebesgue integral 1466
26A42 Integrals of Riemann, Stieltjes and Lebesgue type 1468
Riemann sum 1468
Riemann-Stieltjes integral 1469
continuous functions are Riemann integrable 1469
generalized Riemann integral 1469
proof of Continuous functions are Riemann integrable 1470
26A51 Convexity, generalizations 1471
concave function 1471
26Axx Functions of one variable 1472
function centroid 1472
26B05 Continuity and differentiation questions 1473
C0∞(U) is not empty 1473
Rademacher's Theorem 1474
smooth functions with compact support 1475
26B10 Implicit function theorems, Jacobians, transformations with several variables 1477
Jacobian matrix 1477
directional derivative 1477
gradient 1478
implicit differentiation 1481
implicit function theorem 1481
proof of implicit function theorem 1482
26B12 Calculus of vector functions 1484
Clairaut's theorem 1484
Fubini's Theorem 1484
Generalised N-dimensional Riemann Sum 1485
Generalized N-dimensional Riemann Integral 1485
Helmholtz equation 1486
Hessian matrix 1487
Jordan Content of an N-cell 1487
Laplace equation 1487
chain rule (several variables) 1488
divergence 1489
extremum 1490
irrotational field 1490
partial derivative 1491
plateau 1492
proof of Green's theorem 1492
relations between Hessian matrix and local extrema 1493
solenoidal field 1494
26B15 Integration: length, area, volume 1495
arc length 1495
26B20 Integral formulas (Stokes, Gauss, Green, etc.) 1497
Green's theorem 1497
26B25 Convexity, generalizations 1499
convex function 1499
extremal value of convex/concave functions 1500
26B30 Absolutely continuous functions, functions of bounded variation 1502
absolutely continuous function 1502
total variation 1503
26B99 Miscellaneous 1505
derivation of zeroth weighted power mean 1505
weighted power mean 1506
26C15 Rational functions 1507
rational function 1507
26C99 Miscellaneous 1508
Laguerre Polynomial 1508
26D05 Inequalities for trigonometric functions and polynomials 1509
Weierstrass product inequality 1509
proof of Jordan's Inequality 1509

26D10 Inequalities involving derivatives and differential and integral operators 1511
Gronwall's lemma 1511
proof of Gronwall's lemma 1511
26D15 Inequalities for sums, series and integrals 1513
Carleman's inequality 1513
Chebyshev's inequality 1513
MacLaurin's Inequality 1514
Minkowski inequality 1514
Muirhead's theorem 1515
Schur's inequality 1515
Young's inequality 1515
arithmetic-geometric-harmonic means inequality 1516
general means inequality 1516
power mean 1517
proof of Chebyshev's inequality 1517
proof of Minkowski inequality 1518
proof of arithmetic-geometric-harmonic means inequality 1519
proof of general means inequality 1521
proof of rearrangement inequality 1522
rearrangement inequality 1523
26D99 Miscellaneous 1524
Bernoulli's inequality 1524
proof of Bernoulli's inequality 1524
26E35 Nonstandard analysis 1526
hyperreal 1526
e is not a quadratic irrational 1527
zero of a function 1528
28-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1530
extended real numbers 1530
28-XX Measure and integration 1532
Riemann integral 1532
martingale 1532
28A05 Classes of sets (Borel fields, σ-rings, etc.), measurable sets, Suslin sets, analytic sets 1534
Borel σ-algebra 1534
28A10 Real- or complex-valued set functions 1535
σ-finite 1535
Argand diagram 1535
Hahn-Kolmogorov theorem 1536
measure 1536
outer measure 1536
properties for measure 1538
28A12 Contents, measures, outer measures, capacities 1540
Hahn decomposition theorem 1540
Jordan decomposition 1540
Lebesgue decomposition theorem 1541
Lebesgue outer measure 1541
absolutely continuous 1542
counting measure 1543
measurable set 1543
outer regular 1543
signed measure 1543
singular measure 1544
28A15 Abstract differentiation theory, differentiation of set functions 1545
Hardy-Littlewood maximal theorem 1545
Lebesgue differentiation theorem 1545
Radon-Nikodym theorem 1546
integral depending on a parameter 1547
28A20 Measurable and nonmeasurable functions, sequences of measurable functions, modes of convergence 1549
Egorov's theorem 1549
Fatou's lemma 1549
Fatou-Lebesgue theorem 1550
dominated convergence theorem 1550
measurable function 1550
monotone convergence theorem 1551
proof of Egorov's theorem 1551
proof of Fatou's lemma 1552
proof of Fatou-Lebesgue theorem 1552
proof of dominated convergence theorem 1553
proof of monotone convergence theorem 1553
28A25 Integration with respect to measures and other set functions 1555
L∞(X, dμ) 1555
Hardy-Littlewood maximal operator 1555
Lebesgue integral 1556
28A60 Measures on Boolean rings, measure algebras 1558
σ-algebra 1558
σ-algebra 1558

algebra 1559
measurable set (for outer measure) 1559
28A75 Length, area, volume, other geometric measure theory 1561
Lebesgue density theorem 1561
28A80 Fractals 1562
Cantor set 1562
Hausdorff dimension 1565
Koch curve 1566
Sierpinski gasket 1567
fractal 1567
28Axx Classical measure theory 1569
Vitali's Theorem 1569
proof of Vitali's Theorem 1569
28B15 Set functions, measures and integrals with values in ordered spaces 1571
Lp-space 1571
locally integrable function 1572
28C05 Integration theory via linear functionals (Radon measures, Daniell integrals, etc.), representing set functions and measures 1573
Haar integral 1573
28C10 Set functions and measures on topological groups, Haar measures, invariant measures 1575
Haar measure 1575
28C20 Set functions and measures and integrals in infinite-dimensional spaces (Wiener measure, Gaussian measure, etc.) 1577
essential supremum 1577
28D05 Measure-preserving transformations 1578
measure-preserving 1578
30-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1579
domain 1579
region 1579
regular region 1580
topology of the complex plane 1580
30-XX Functions of a complex variable 1581
z0 is a pole of f 1581
30A99 Miscellaneous 1582
Riemann mapping theorem 1582
Runge's theorem 1582
Weierstrass M-test 1583
annulus 1583
conformally equivalent 1583
contour integral 1584
orientation 1585
proof of Weierstrass M-test 1585
unit disk 1586
upper half plane 1586
winding number and fundamental group 1586
30B10 Power series (including lacunary series) 1587
Euler relation 1587
analytic 1588
existence of power series 1588
infinitely-differentiable function that is not analytic 1590
power series 1591
proof of radius of convergence 1592
radius of convergence 1593
30B50 Dirichlet series and other series expansions, exponential series 1594
Dirichlet series 1594
30C15 Zeros of polynomials, rational functions, and other analytic functions (e.g. zeros of functions with bounded Dirichlet integral) 1596
Mason-Stothers theorem 1596
zeroes of analytic functions are isolated 1596
30C20 Conformal mappings of special domains 1598
automorphisms of unit disk 1598
unit disk upper half plane conformal equivalence theorem 1598
30C35 General theory of conformal mappings 1599
proof of conformal mapping theorem 1599
30C80 Maximum principle; Schwarz's lemma, Lindelöf principle, analogues and generalizations; subordination 1601
Schwarz lemma 1601
maximum principle 1601
proof of Schwarz lemma 1602
30D20 Entire functions, general theory 1603

Liouville's theorem 1603
Morera's theorem 1603
entire 1604
holomorphic 1604
proof of Liouville's theorem 1604
30D30 Meromorphic functions, general theory 1606
Casorati-Weierstrass theorem 1606
Mittag-Leffler's theorem 1606
Riemann's removable singularity theorem 1607
essential singularity 1607
meromorphic 1607
pole 1607
proof of Casorati-Weierstrass theorem 1608
proof of Riemann's removable singularity theorem 1608
residue 1609
simple pole 1610
30E20 Integration, integrals of Cauchy type, integral representations of analytic functions 1611
Cauchy integral formula 1611
Cauchy integral theorem 1612
Cauchy residue theorem 1613
Gauss mean value theorem 1614
Möbius circle transformation theorem 1614
Möbius transformation cross-ratio preservation theorem 1614
Rouché's theorem 1614
absolute convergence implies convergence for an infinite product 1615
absolute convergence of infinite product 1615
closed curve theorem 1615
conformal Möbius circle map theorem 1615
conformal mapping 1616
conformal mapping theorem 1616
convergence/divergence for an infinite product 1616
example of conformal mapping 1616
examples of infinite products 1617
link between infinite products and sums 1617
proof of Cauchy integral formula 1618
proof of Cauchy residue theorem 1619
proof of Gauss mean value theorem 1620
proof of Goursat's theorem 1620
proof of Möbius circle transformation theorem 1622
proof of Simultaneous converging or diverging of product and sum theorem 1623
proof of absolute convergence implies convergence for an infinite product 1624
proof of closed curve theorem 1624
proof of conformal Möbius circle map theorem 1624
simultaneous converging or diverging of product and sum theorem 1625
Cauchy-Riemann equations 1625
Cauchy-Riemann equations (polar coordinates) 1626
proof of the Cauchy-Riemann equations 1626
removable singularity 1627
30F40 Kleinian groups 1629
Klein 4-group 1629
31A05 Harmonic, subharmonic, superharmonic functions 1630
a harmonic function on a graph which is bounded below and nonconstant 1630
example of harmonic functions on graphs 1630
examples of harmonic functions on Rn 1631
harmonic function 1632
31B05 Harmonic, subharmonic, superharmonic functions 1633
Laplacian 1633
32A05 Power series, series of functions 1634
exponential function 1634
32C15 Complex spaces 1637
Riemann sphere 1637
32F99 Miscellaneous 1638
star-shaped region 1638
32H02 Holomorphic mappings, (holomorphic) embeddings and related questions 1639
Bloch's theorem 1639
Hartogs theorem 1639
32H25 Picard-type theorems and generalizations 1640
Picard's theorem 1640
little Picard theorem 1640
33-XX Special functions 1641
beta function 1641

33B10 Exponential and trigonometric functions 1642
natural logarithm 1642
33B15 Gamma, beta and polygamma functions 1643
Bohr-Mollerup theorem 1643
gamma function 1643
proof of Bohr-Mollerup theorem 1645
33B30 Higher logarithm functions 1647
Lambert W function 1647
33B99 Miscellaneous 1648
natural log base 1648
33D45 Basic orthogonal polynomials and functions (Askey-Wilson polynomials, etc.) 1649
orthogonal polynomials 1649
33E05 Elliptic functions and integrals 1651
Weierstrass sigma function 1651
elliptic function 1652
elliptic integrals and Jacobi elliptic functions 1652
examples of elliptic functions 1654
modular discriminant 1654
34-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1656
Liapunov function 1656
Lorenz equation 1657
Wronskian determinant 1659
dependence on initial conditions of solutions of ordinary differential equations 1660
differential equation 1661
existence and uniqueness of solution of ordinary differential equations 1662
maximal interval of existence of ordinary differential equations 1663
method of undetermined coefficients 1663
natural symmetry of the Lorenz equation 1664
symmetry of a solution of an ordinary differential equation 1665
symmetry of an ordinary differential equation 1665
34-01 Instructional exposition (textbooks, tutorial papers, etc.) 1667
second order linear differential equation with constant coefficients 1667
34A05 Explicit solutions and reductions 1669
separation of variables 1669
variation of parameters 1670
34A12 Initial value problems, existence, uniqueness, continuous dependence and continuation of solutions 1672
initial value problem 1672
34A30 Linear equations and systems, general 1674
Chebyshev equation 1674
34A99 Miscellaneous 1676
autonomous system 1676
34B24 Sturm-Liouville theory 1677
eigenfunction 1677
34C05 Location of integral curves, singular points, limit cycles 1678
Hopf bifurcation theorem 1678
Poincaré-Bendixson theorem 1679
omega limit set 1679
34C07 Theory of limit cycles of polynomial and analytic vector fields (existence, uniqueness, bounds, Hilbert's 16th problem and ramif 1680
Hilbert's 16th problem for quadratic vector fields 1680
34C23 Bifurcation 1682
equivariant branching lemma 1682
34C25 Periodic solutions 1683
Bendixson's negative criterion 1683
Dulac's criteria 1683
proof of Bendixson's negative criterion 1684
34C99 Miscellaneous 1685
Hartman-Grobman theorem 1685
equilibrium point 1685
stable manifold theorem 1686
34D20 Lyapunov stability 1687
Lyapunov stable 1687
neutrally stable fixed point 1687
stable fixed point 1687
34L05 General spectral theory 1688
Gelfand spectral radius theorem 1688
34L15 Estimation of eigenvalues, upper and lower bounds 1689
Rayleigh quotient 1689

34L40 Particular operators (Dirac, one-dimensional Schrödinger, etc.) 1690
Dirac delta function 1690
construction of Dirac delta function 1691
35-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1692
differential operator 1692
35J05 Laplace equation, reduced wave equation (Helmholtz), Poisson equation 1694
Poisson's equation 1694
35L05 Wave equation 1695
wave equation 1695
35Q53 KdV-like equations (Korteweg-de Vries, Burgers, sine-Gordon, sinh-Gordon, etc.) 1697
Korteweg - de Vries equation 1697
35Q99 Miscellaneous 1698
heat equation 1698
37-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1699
37A30 Ergodic theorems, spectral theory, Markov operators 1700
ergodic 1700
fundamental theorem of demography 1700
proof of fundamental theorem of demography 1701
37B05 Transformations and group actions with special properties (minimality, distality, proximality, etc.) 1703
discontinuous action 1703
37B20 Notions of recurrence 1704
nonwandering set 1704
37B99 Miscellaneous 1705
-limit set 1705
asymptotically stable 1706
expansive 1706
the only compact metric spaces that admit a positively expansive homeomorphism are discrete spaces 1707
topological conjugation 1708
topologically transitive 1709
uniform expansivity 1709
37C10 Vector fields, flows, ordinary differential equations 1710
flow 1710
globally attracting fixed point 1711
37C20 Generic properties, structural stability 1712
Kupka-Smale theorem 1712
Pugh's general density theorem 1712
structural stability 1713
37C25 Fixed points, periodic points, fixed-point index theory 1714
hyperbolic fixed point 1714
37C29 Homoclinic and heteroclinic orbits 1715
heteroclinic 1715
homoclinic 1715
37C75 Stability theory 1716
attracting fixed point 1716
stable manifold 1716
37C80 Symmetries, equivariant dynamical systems 1718
Γ-equivariant 1718
37D05 Hyperbolic orbits and sets 1719
hyperbolic isomorphism 1719
37D20 Uniformly hyperbolic systems (expanding, Anosov, Axiom A, etc.) 1720
Anosov diffeomorphism 1720
Axiom A 1721
hyperbolic set 1721
37D99 Miscellaneous 1722
Kupka-Smale 1722
37E05 Maps of the interval (piecewise continuous, continuous, smooth) 1723
Sharkovskii's theorem 1723
37G15 Bifurcations of limit cycles and periodic orbits 1724
Feigenbaum constant 1724
Feigenbaum fractal 1725
equivariant Hopf theorem 1726
37G40 Symmetries, equivariant bifurcation theory 1728
Poenaru (1976) theorem 1728
bifurcation problem with symmetry group 1728
trace formula 1729
37G99 Miscellaneous 1730
chaotic dynamical system 1730
37H20 Bifurcation theory 1732
bifurcation 1732
39B05 General 1733

functional equation 1733
39B62 Functional inequalities, including subadditivity, convexity, etc. 1734
Jensen's inequality 1734
proof of Jensen's inequality 1735
proof of arithmetic-geometric-harmonic means inequality 1735
subadditivity 1736
superadditivity 1736
40-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1738
Cauchy product 1738
Cesàro mean 1739
alternating series 1739
alternating series test 1739
monotonic 1740
monotonically decreasing 1740
monotonically increasing 1741
monotonically nondecreasing 1741
monotonically nonincreasing 1741
sequence 1742
series 1742
40A05 Convergence and divergence of series and sequences 1743
Abel's lemma 1743
Abel's test for convergence 1744
Baroni's Theorem 1744
Bolzano-Weierstrass theorem 1744
Cauchy criterion for convergence 1744
Cauchy's root test 1745
Dirichlet's convergence test 1745
Proof of Baroni's Theorem 1746
Proof of Stolz-Cesaro theorem 1747
Stolz-Cesaro theorem 1748
absolute convergence theorem 1748
comparison test 1748
convergent sequence 1749
convergent series 1749
determining series convergence 1749
example of integral test 1750
geometric series 1750
harmonic number 1751
harmonic series 1752
integral test 1753
proof of Abel's lemma (by induction) 1754
proof of Abel's test for convergence 1754
proof of Bolzano-Weierstrass Theorem 1754
proof of Cauchy's root test 1756
proof of Leibniz's theorem (using Dirichlet's convergence test) 1756
proof of absolute convergence theorem 1756
proof of alternating series test 1757
proof of comparison test 1757
proof of integral test 1758
proof of ratio test 1759
ratio test 1759
40A10 Convergence and divergence of integrals 1760
improper integral 1760
40A25 Approximation to limiting values (summation of series, etc.) 1761
Euler's constant 1761
40A30 Convergence and divergence of series and sequences of functions 1763
Abel's limit theorem 1763
Löwner partial ordering 1763
Löwner's theorem 1764
matrix monotone 1764
operator monotone 1764
pointwise convergence 1764
uniform convergence 1765
40G05 Cesàro, Euler, Nørlund and Hausdorff methods 1766
Cesàro summability 1766
40G10 Abel, Borel and power series methods 1768
Abel summability 1768
proof of Abel's convergence theorem 1769
proof of Tauber's convergence theorem 1770
41A05 Interpolation 1772
Lagrange Interpolation formula 1772
Simpson's 3/8 rule 1772
trapezoidal rule 1773
41A25 Rate of convergence, degree of approximation 1775
superconvergence 1775
41A58 Series expansions (e.g. Taylor, Lidstone series, but not Fourier series) 1776
Taylor series 1776
Taylor's Theorem 1778

41A60 Asymptotic approximations, asymptotic expansions (steepest descent, etc.) 1779
Stirling's approximation 1779
42-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1781
countable basis 1781
discrete cosine transform 1782
42-01 Instructional exposition (textbooks, tutorial papers, etc.) 1784
Laplace transform 1784
42A05 Trigonometric polynomials, inequalities, extremal problems 1785
Chebyshev polynomial 1785
42A16 Fourier coefficients, Fourier series of functions with special properties, special Fourier series 1787
Riemann-Lebesgue lemma 1787
example of Fourier series 1788
42A20 Convergence and absolute convergence of Fourier and trigonometric series 1789
Dirichlet conditions 1789
42A38 Fourier and Fourier-Stieltjes transforms and other transforms of Fourier type 1790
Fourier transform 1790
42A99 Miscellaneous 1792
Poisson summation formula 1792
42B05 Fourier series and coefficients 1793
Parseval equality 1793
Wirtinger's inequality 1793
43A07 Means on groups, semigroups, etc.; amenable groups 1795
amenable group 1795
44A35 Convolution 1796
convolution 1796
46-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1799
balanced set 1799
bounded function 1800
bounded set (in a topological vector space) 1801
cone 1802
locally convex topological vector space 1803
sequential characterization of boundedness 1803
symmetric set 1803
46A30 Open mapping and closed graph theorems; completeness (including B-, Br-completeness) 1805
closed graph theorem 1805
open mapping theorem 1805
46A99 Miscellaneous 1806
Heine-Cantor theorem 1806
proof of Heine-Cantor theorem 1806
topological vector space 1807
46B20 Geometry and structure of normed linear spaces 1808
lim_{p→∞} ‖x‖_p = ‖x‖_∞ 1808
Hahn-Banach theorem 1809
proof of Hahn-Banach theorem 1810
seminorm 1811
vector norm 1813
46B50 Compactness in Banach (or normed) spaces 1815
Schauder fixed point theorem 1815
proof of Schauder fixed point theorem 1815
46B99 Miscellaneous 1817
ℓp 1817
Banach space 1818
an inner product defines a norm 1818
continuous linear mapping 1818
equivalent norms 1819
normed vector space 1820
46Bxx Normed linear spaces and Banach spaces; Banach lattices 1821
vector p-norm 1821
46C05 Hilbert and pre-Hilbert spaces: geometry and topology (including spaces with semidefinite inner product) 1822
Bessel inequality 1822
Hilbert module 1822
Hilbert space 1823
proof of Bessel inequality 1823
46C15 Characterizations of Hilbert spaces 1825
classification of separable Hilbert spaces 1825
46E15 Banach spaces of continuous, differentiable or analytic functions 1826
Ascoli-Arzelà theorem 1826
Stone-Weierstrass theorem 1826

proof of Ascoli-Arzelà theorem 1827
Hölder inequality 1827
Young Inequality 1828
conjugate index 1828
proof of Hölder inequality 1828
proof of Young Inequality 1829
vector field 1829
46F05 Topological linear spaces of test functions, distributions and ultradistributions 1830
Tf is a distribution of zeroth order 1830
p.v.(1/x) is a distribution of first order 1831
Cauchy principal part integral 1832
delta distribution 1833
distribution 1833
equivalence of conditions 1835
every locally integrable function is a distribution 1836
localization for distributions 1836
operations on distributions 1837
smooth distribution 1839
space of rapidly decreasing functions 1840
support of distribution 1841
46H05 General theory of topological algebras 1843
Banach algebra 1843
46L05 General theory of C*-algebras 1844
C*-algebra 1844
Gelfand-Naimark representation theorem 1844
state 1844
46L85 Noncommutative topology 1846
Gelfand-Naimark theorem 1846
Serre-Swan theorem 1846
46T12 Measure (Gaussian, cylindrical, etc.) and integrals (Feynman, path, Fresnel, etc.) on manifolds 1847
path integral 1847
47A05 General (adjoints, conjugates, products, inverses, domains, ranges, etc.) 1849
Baker-Campbell-Hausdorff formula(e) 1849
adjoint 1850
closed operator 1850
properties of the adjoint operator 1851
47A35 Ergodic theory 1852
ergodic theorem 1852
47A53 (Semi-) Fredholm operators; index theories 1853
Fredholm index 1853
Fredholm operator 1853
47A56 Functions whose values are linear operators (operator and matrix valued functions, etc., including analytic and meromorphic ones) 1855
Taylor's formula for matrix functions 1855
47A60 Functional calculus 1856
Beltrami identity 1856
Euler-Lagrange differential equation 1857
calculus of variations 1857
47B15 Hermitian and normal operators (spectral measures, functional calculus, etc.) 1862
self-adjoint operator 1862
47G30 Pseudodifferential operators 1863
Dini derivative 1863
47H10 Fixed-point theorems 1864
Brouwer fixed point in one dimension 1864
Brouwer fixed point theorem 1865
any topological space with the fixed point property is connected 1865
fixed point property 1866
proof of Brouwer fixed point theorem 1867
47L07 Convex sets and cones of operators 1868
convex hull of S is open if S is open 1868
47L25 Operator spaces (= matricially normed spaces) 1869
operator norm 1869
47S99 Miscellaneous 1870
Drazin inverse 1870
49K10 Free problems in two or more independent variables 1871
Kantorovitch's theorem 1871
49M15 Methods of Newton-Raphson, Galerkin and Ritz types 1873
Newton's method 1873
51-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1877
Apollonius theorem 1877
Apollonius circle 1877
Brahmagupta's formula 1878

Brianchon theorem 1878
Brocard theorem 1878
Carnot circles 1879
Erdős-Anning Theorem 1879
Euler Line 1879
Gergonne point 1879
Gergonne triangle 1880
Heron's formula 1880
Lemoine circle 1880
Lemoine point 1880
Miquel point 1881
Mollweide's equations 1881
Morley's theorem 1881
Newton's line 1882
Newton-Gauss line 1882
Pascal's mystic hexagram 1882
Ptolemy's theorem 1882
Pythagorean theorem 1883
Schooten theorem 1883
Simson's line 1884
Stewart's theorem 1884
Thales' theorem 1884
alternate proof of parallelogram law 1885
alternative proof of the sines law 1885
angle bisector 1887
angle sum identity 1888
annulus 1889
butterfly theorem 1889
centroid 1889
chord 1890
circle 1890
collinear 1893
complete quadrilateral 1893
concurrent 1893
cosines law 1894
cyclic quadrilateral 1894
derivation of cosines law 1894
diameter 1895
double angle identity 1896
equilateral triangle 1896
fundamental theorem on isogonal lines 1897
height 1897
hexagon 1897
hypotenuse 1898
isogonal conjugate 1898
isosceles triangle 1899
legs 1899
medial triangle 1899
median 1900
midpoint 1900
nine-point circle 1900
orthic triangle 1901
orthocenter 1901
parallelogram 1902
parallelogram law 1902
pedal triangle 1902
pentagon 1903
polygon 1903
proof of Apollonius theorem 1904
proof of Apollonius theorem 1904
proof of Brahmagupta's formula 1905
proof of Erdős-Anning Theorem 1906
proof of Heron's formula 1906
proof of Mollweide's equations 1907
proof of Ptolemy's inequality 1908
proof of Ptolemy's theorem 1909
proof of Pythagorean theorem 1910
proof of Pythagorean theorem 1910
proof of Simson's line 1911
proof of Stewart's theorem 1912
proof of Thales' theorem 1913
proof of butterfly theorem 1913
proof of double angle identity 1914
proof of parallelogram law 1915
proof of tangents law 1915
quadrilateral 1916
radius 1916
rectangle 1916
regular polygon 1917
regular polyhedron 1917
rhombus 1918
right triangle 1919
sector of a circle 1919
sines law 1919
sines law proof 1920
some proofs for triangle theorems 1920
square 1921
tangents law 1921
triangle 1921
triangle center 1922

51-01 Instructional exposition (textbooks, tutorial papers, etc.) 1924
geometry 1924
51-XX Geometry 1927
non-Euclidean geometry 1927
parallel postulate 1927
51A05 General theory and projective geometries 1928
Ceva's theorem 1928
Menelaus' theorem 1928
Pappus's theorem 1929
proof of Ceva's theorem 1929
proof of Menelaus' theorem 1930
proof of Pappus's theorem 1931
proof of Pascal's mystic hexagram 1932
51A30 Desarguesian and Pappian geometries 1934
Desargues theorem 1934
proof of Desargues theorem 1934
51A99 Miscellaneous 1936
Pick's theorem 1936
proof of Pick's theorem 1936
51F99 Miscellaneous 1939
Weizenböck's Inequality 1939
51M04 Elementary problems in Euclidean geometries 1940
Napoleon's theorem 1940
corollary of Morley's theorem 1941
pivot theorem 1941
proof of Morley's theorem 1941
proof of pivot theorem 1943
51M05 Euclidean geometries (general)
and generalizations 1944
area of the n-sphere 1944
geometry of the sphere 1945
sphere 1945
spherical coordinates 1947
volume of the n-sphere 1947
51M10 Hyperbolic and elliptic geometries (general) and generalizations 1949
Lobachevsky's formula 1949
51M16 Inequalities and extremum problems 1950
Brunn-Minkowski inequality 1950
Hadwiger-Finsler inequality 1950
isoperimetric inequality 1951
proof of Hadwiger-Finsler inequality 1951
51M20 Polyhedra and polytopes; regular figures, division of spaces 1953
polyhedron 1953
51M99 Miscellaneous 1954
Euler line proof 1954
SSA 1954
cevian 1955
congruence 1955
incenter 1956
incircle 1956
symmedian 1957
51N05 Descriptive geometry 1958
curve 1958
piecewise smooth 1960
rectifiable 1960
51N20 Euclidean analytic geometry 1961
Steiner's theorem 1961
Van Aubel theorem 1961
conic section 1961
proof of Steiner's theorem 1963
proof of Van Aubel theorem 1964
proof of Van Aubel's Theorem 1965
three theorems on parabolas 1966
52A01 Axiomatic and generalized convexity 1969
convex combination 1969
52A07 Convex sets in topological vector
spaces 1970
Fréchet space 1970
52A20 Convex sets in n dimensions (including convex hypersurfaces) 1973
Carathéodory's theorem 1973
52A35 Helly-type theorems and geometric transversal theory 1974
Helly's theorem 1974
52A99 Miscellaneous 1975
convex set 1975
52C07 Lattices and convex bodies in n
dimensions 1976
Radon's lemma 1976
52C35 Arrangements of points, flats, hyperplanes 1978
Sylvester's theorem 1978
53-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 1979
Lie derivative 1979
closed differential forms on a simply connected domain 1979
exact (differential form) 1980
manifold 1980
metric tensor 1983
proof of closed differential forms on a simply connected domain 1983
pullback of a k-form 1985
tangent space 1985
53-01 Instructional exposition (textbooks, tutorial papers, etc.) 1988
curl 1988
53A04 Curves in Euclidean space 1990
Frenet frame 1990
Serret-Frenet equations 1991
curvature (space curve) 1992
fundamental theorem of space curves 1993
helix 1993
space curve 1994
53A45 Vector and tensor analysis 1996
closed (differential form) 1996
53B05 Linear and affine connections 1997
Levi-Civita connection 1997
connection 1997
vector field along a curve 2001
53B21 Methods of Riemannian geometry 2002
Hodge star operator 2002
Riemannian manifold 2002
53B99 Miscellaneous 2004
germ of smooth functions 2004
53C17 Sub-Riemannian geometry 2005
Sub-Riemannian manifold 2005
53D05 Symplectic manifolds, general 2006
Darboux's Theorem (symplectic geometry) 2006
Moser's theorem 2006
almost complex structure 2007
coadjoint orbit 2007
examples of symplectic manifolds 2007
hamiltonian vector field 2008
isotropic submanifold 2008
lagrangian submanifold 2009
symplectic manifold 2009
symplectic matrix 2009
symplectic vector field 2010
symplectic vector space 2010
53D10 Contact manifolds, general 2011
contact manifold 2011
53D20 Momentum maps; symplectic reduction 2012
momentum map 2012
54-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 2013
Krull dimension 2013
Niemytzki plane 2013
Sorgenfrey line 2014
boundary (in topology) 2014
closed set 2014
coarser 2015
compact-open topology 2015
completely normal 2015
continuous proper map 2016
derived set 2016
diameter 2016
every second countable space is separable 2016
first axiom of countability 2017
homotopy groups 2017
indiscrete topology 2018
interior 2018
invariant forms on representations of compact groups 2018
ladder connected 2019
local base 2020
loop 2020
loop space 2020
metrizable 2020
neighborhood system 2021
paracompact topological space 2021
pointed topological space 2021
proper map 2021
quasi-compact 2022
regularly open 2022
separated 2022
support of function 2023
topological invariant 2024
topological space 2024
topology 2025
triangle inequality 2025
universal covering space 2026
54A05 Topological spaces and generalizations (closure spaces, etc.) 2027
characterization of connected compact metric spaces 2027
closure axioms 2027
neighborhood 2028
open set 2028
54A20 Convergence in general topology (sequences, filters, limits, convergence spaces, etc.) 2030
Banach fixed point theorem 2030
Dini's theorem 2031
another proof of Dini's theorem 2031
continuous convergence 2032
contractive maps are uniformly continuous 2033
net 2033
proof of Banach fixed point theorem 2034
proof of Dini's theorem 2035
theorem about continuous convergence 2035
ultrafilter 2035
ultranet 2036
54A99 Miscellaneous 2037
basis 2037
box topology 2037
closure 2038
cover 2038
dense 2039
examples of filters 2039
filter 2039
limit point 2040
nowhere dense 2040
perfect set 2041
properties of the closure operator 2041
subbasis 2041
54B05 Subspaces 2042
irreducible 2042
irreducible component 2042
subspace topology 2042
54B10 Product spaces 2043
product topology 2043
product topology preserves the Hausdorff property 2044
54B15 Quotient spaces, decompositions 2045
Klein bottle 2045
Möbius strip 2046
cell attachment 2047
quotient space 2047
torus 2048
54B17 Adjunction spaces and similar constructions 2049
adjunction space 2049
54B40 Presheaves and sheaves 2050
direct image 2050
54B99 Miscellaneous 2051
cofinite and cocountable topology 2051
cone 2051
join 2052
order topology 2052
suspension 2053
54C05 Continuous maps 2054
Inverse Function Theorem (topological spaces) 2054
continuity of composition of functions 2054
continuous 2055
discontinuous 2055
homeomorphism 2057
proof of Inverse Function Theorem (topological spaces) 2057
restriction of a continuous mapping is continuous 2057
54C10 Special maps on topological spaces (open, closed, perfect, etc.) 2059
densely defined 2059
open mapping 2059
54C15 Retraction 2060
retract 2060
54C70 Entropy 2061
differential entropy 2061
54C99 Miscellaneous 2062
Borsuk-Ulam theorem 2062
ham sandwich theorem 2062
proof of Borsuk-Ulam theorem 2062
54D05 Connected and locally connected spaces (general aspects) 2064
Jordan curve theorem 2064
clopen subset 2064
connected component 2065
connected set 2065
connected set in a topological space 2066
connected space 2066
connectedness is preserved under a continuous
map 2066
cut-point 2067
example of a connected space that is not path-connected 2067
example of a semilocally simply connected space
which is not locally simply connected 2068
example of a space that is not semilocally simply
connected 2068
locally connected 2069
locally simply connected 2069
path component 2069
path connected 2070
products of connected spaces 2070
proof that a path connected space is connected
2070
quasicomponent 2070
semilocally simply connected 2071
54D10 Lower separation axioms (T0-T3, etc.) 2072
T0 space 2072
T1 space 2072
T2 space 2072
T3 space 2073
a compact set in a Hausdorff space is closed 2073
proof of A compact set in a Hausdorff space is
closed 2074
regular 2074
regular space 2074
separation axioms 2075
topological space is T1 if and only if every singleton is closed. 2076
54D15 Higher separation axioms (completely regular, normal, perfectly or collectionwise normal, etc.) 2077
Tietze extension theorem 2077
Tychonoff 2077
Urysohn's lemma 2078
normal 2078
proof of Urysohn's lemma 2078
54D20 Noncompact covering properties (paracompact, Lindelöf, etc.) 2081
Lindelöf 2081
countably compact 2081
locally finite 2081
54D30 Compactness 2082
Y is compact if and only if every open cover of
Y has a finite subcover 2082
Heine-Borel theorem 2083
Tychonoff's theorem 2083
a space is compact if and only if the space has
the finite intersection property 2083
closed set in a compact space is compact 2084
closed subsets of a compact set are compact 2084
compact 2085
compactness is preserved under a continuous map
2085
examples of compact spaces 2086
finite intersection property 2088
limit point compact 2088
point and a compact set in a Hausdorff space
have disjoint open neighborhoods. 2088
proof of Heine-Borel theorem 2089
properties of compact spaces 2091
relatively compact 2092
sequentially compact 2092
two disjoint compact sets in a Hausdorff space
have disjoint open neighborhoods. 2092
54D35 Extensions of spaces (compactifications, supercompactifications, completions, etc.) 2094
Alexandrov one-point compactification 2094
compactification 2094
54D45 Local compactness, σ-compactness 2095
σ-compact 2095
examples of locally compact and not locally compact spaces 2095
locally compact 2096
54D65 Separability 2097
separable 2097
54D70 Base properties 2098
second countable 2098
54D99 Miscellaneous 2099
Lindelöf theorem 2099
first countable 2099
proof of Lindelöf theorem 2099
totally disconnected space 2100
54E15 Uniform structures and generalizations 2101
topology induced by uniform structure 2101
uniform space 2101
uniform structure of a metric space 2102
uniform structure of a topological group 2102
ε-net 2103
Euclidean distance 2103
Hausdorff metric 2104
Urysohn metrization theorem 2104
ball 2104
bounded 2105
city-block metric 2105
completely metrizable 2105
distance to a set 2106
equibounded 2106
isometry 2106
metric space 2107
non-reversible metric 2107
open ball 2108
some structures on Rn 2108
totally bounded 2110
ultrametric 2110
Lebesgue number lemma 2111
proof of Lebesgue number lemma 2111
complete 2111
completeness principle 2112
uniformly equicontinuous 2112
Baire category theorem 2112
Baire space 2113
equivalent statement of Baire category theorem
2113
generic 2114
meager 2114
proof for one equivalent statement of Baire category theorem 2114
proof of Baire category theorem 2115
residual 2115
six consequences of Baire category theorem 2116
Hahn-Mazurkiewicz theorem 2116
Vitali covering 2116
compactly generated 2116
54G05 Extremally disconnected spaces, F-spaces, etc. 2117
extremally disconnected 2117
54G20 Counterexamples 2118
Sierpinski space 2118
long line 2118
55-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 2120
Universal Coefficient Theorem 2120
invariance of dimension 2121
55M05 Duality 2122
Poincaré duality 2122
55M20 Fixed points and coincidences 2123
Sperner's lemma 2123
55M25 Degree, winding number 2125
degree (map of spheres) 2125
winding number 2126
55M99 Miscellaneous 2127
genus of topological surface 2127
55N10 Singular theory 2128
Betti number 2128
Mayer-Vietoris sequence 2128
cellular homology 2128
homology (topological space) 2129
homology of RP^3 2131
long exact sequence (of homology groups) 2132
relative homology groups 2133
55N99 Miscellaneous 2134
suspension isomorphism 2134
55P05 Homotopy extension properties,
cofibrations 2135
cofibration 2135
homotopy extension property 2135
55P10 Homotopy equivalences 2136
Whitehead theorem 2136
weak homotopy equivalence 2136
55P15 Classification of homotopy type
2137
simply connected 2137
55P20 Eilenberg-Mac Lane spaces 2138
Eilenberg-Mac Lane space 2138
55P99 Miscellaneous 2139
fundamental groupoid 2139
55Pxx Homotopy theory 2141
nullhomotopic map 2141
55Q05 Homotopy groups, general; sets
of homotopy classes 2142
Van Kampen's theorem 2142
category of pointed topological spaces 2143
deformation retraction 2143
fundamental group 2144
homotopy of maps 2144
homotopy of paths 2145
long exact sequence (locally trivial bundle) 2145
55Q52 Homotopy groups of special spaces 2146
contractible 2146
55R05 Fiber spaces 2147
classification of covering spaces 2147
covering space 2148
deck transformation 2148
lifting of maps 2150
lifting theorem 2151
monodromy 2151
properly discontinuous action 2153
regular covering 2153
55R10 Fiber bundles 2155
associated bundle construction 2155
bundle map 2156
fiber bundle 2156
locally trivial bundle 2157
principal bundle 2157
pullback bundle 2158
reduction of structure group 2158
section of a fiber bundle 2160
some examples of universal bundles 2161
universal bundle 2161
55R25 Sphere bundles and vector bundles 2163
Hopf bundle 2163
vector bundle 2163
55U10 Simplicial sets and complexes 2164
simplicial complex 2164
57-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 2167
connected sum 2167
57-XX Manifolds and cell complexes 2168
CW complex 2168
57M25 Knots and links in S^3 2170
connected sum 2170
knot theory 2170
unknot 2173
57M99 Miscellaneous 2174
Dehn surgery 2174
57N16 Geometric structures on manifolds 2175
self-intersections of a curve 2175
57N70 Cobordism and concordance 2176
h-cobordism 2176
Smale's h-cobordism theorem 2176
cobordism 2176
57N99 Miscellaneous 2178
orientation 2178
57R22 Topology of vector bundles and fiber bundles 2180
hairy ball theorem 2180
57R35 Differentiable mappings 2182
Sard's theorem 2182
differentiable function 2182
57R42 Immersions 2184
immersion 2184
57R60 Homotopy spheres, Poincaré conjecture 2185
Poincaré conjecture 2185
The Poincaré dodecahedral space 2185
homology sphere 2186
57R99 Miscellaneous 2187
transversality 2187
57S25 Groups acting on specific manifolds 2189
Isomorphism of the group PSL2(C) with the group of Möbius transformations 2189
58A05 Differentiable manifolds, foundations 2190
partition of unity 2190
58A10 Differential forms 2191
differential form 2191
58A32 Natural bundles 2194
conormal bundle 2194
cotangent bundle 2194
normal bundle 2195
tangent bundle 2195
58C35 Integration on manifolds; measures on manifolds 2196
general Stokes theorem 2196
proof of general Stokes theorem 2196
58C40 Spectral theory; eigenvalue problems 2199
spectral radius 2199
58E05 Abstract critical point theory (Morse theory, Ljusternik-Schnirelman (Lyusternik-Shnirelman) theory, etc.) 2200
Morse complex 2200
Morse function 2200
Morse lemma 2201
centralizer 2201
60-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 2202
Bayes' theorem 2202
Bernoulli random variable 2202
Gamma random variable 2203
beta random variable 2204
chi-squared random variable 2205
continuous density function 2205
expected value 2206
geometric random variable 2207
proof of Bayes' Theorem 2207
random variable 2208
uniform (continuous) random variable 2208
uniform (discrete) random variable 2209
60A05 Axioms; other general questions 2210
example of pairwise independent events that are not totally independent 2210
independent 2210
random event 2211
60A10 Probabilistic measure theory 2212
Cauchy random variable 2212
almost surely 2212
60A99 Miscellaneous 2214
Borel-Cantelli lemma 2214
Chebyshev's inequality 2214
Markov's inequality 2215
cumulative distribution function 2215
limit superior of sets 2215
proof of Chebyshev's inequality 2216
proof of Markov's inequality 2216
60E05 Distributions: general theory 2217
Cramér-Wold theorem 2217
Helly-Bray theorem 2217
Scheffé's theorem 2218
Zipf's law 2218
binomial distribution 2219
convergence in distribution 2220
density function 2221
distribution function 2221
geometric distribution 2222
relative entropy 2223
Paul Lévy continuity theorem 2224
characteristic function 2225
Kolmogorov's inequality 2226
discrete density function 2226
probability distribution function 2227
60F05 Central limit and other weak theorems 2229
Lindeberg's central limit theorem 2229
60F15 Strong theorems 2231
Kolmogorov's strong law of large numbers 2231
strong law of large numbers 2231
60G05 Foundations of stochastic processes 2233
stochastic process 2233
60G99 Miscellaneous 2234
stochastic matrix 2234
60J10 Markov chains with discrete parameter 2235
Markov chain 2235
62-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 2236
covariance 2236
moment 2237
variance 2237
62E15 Exact distribution theory 2239
Pareto random variable 2239
exponential random variable 2240
hypergeometric random variable 2240
negative hypergeometric random variable 2241
negative hypergeometric random variable, example of 2242
proof of expected value of the hypergeometric distribution 2243
proof of variance of the hypergeometric distribution 2243
proof that normal distribution is a distribution 2245
65-00 General reference works (handbooks, dictionaries, bibliographies, etc.) 2246
normal equations 2246
principal components analysis 2247
pseudoinverse 2248
65-01 Instructional exposition (textbooks,
tutorial papers, etc.) 2250
cubic spline interpolation 2250
65B15 Euler-Maclaurin formula 2252
Euler-Maclaurin summation formula 2252
proof of Euler-Maclaurin summation formula 2252
65C05 Monte Carlo methods 2254
Monte Carlo methods 2254
65D32 Quadrature and cubature formulas 2256
Simpson's rule 2256
65F25 Orthogonalization 2257
Givens rotation 2257
Gram-Schmidt orthogonalization 2258
Householder transformation 2259
orthonormal 2261
65F35 Matrix norms, conditioning, scaling 2262
Hilbert matrix 2262
Pascal matrix 2262
Toeplitz matrix 2263
matrix condition number 2264
matrix norm 2264
pivoting 2265
65R10 Integral transforms 2266
integral transform 2266
65T50 Discrete and fast Fourier transforms 2267
Vandermonde matrix 2267
discrete Fourier transform 2268
68M20 Performance evaluation; queueing; scheduling 2270
Amdahl's Law 2270
efficiency 2270
proof of Amdahl's Law 2271
68P05 Data structures 2272
heap insertion algorithm 2272
heap removal algorithm 2273
68P10 Searching and sorting 2275
binary search 2275
bubblesort 2276
heap 2277

heapsort 2278
in-place sorting algorithm 2279
insertion sort 2279
lower bound for sorting 2281
quicksort 2282
sorting problem 2283
68P20 Information storage and retrieval
2285
Browsing service 2285
Digital Library Index 2285
Digital Library Scenario 2285
Digital Library Space 2286
Digital Library Searching Service 2286
Service, activity, task, or procedure 2286
StructuredStream 2286
collection 2286
digital library stream 2287
digital object 2287
good hash table primes 2287
hashing 2289
metadata format 2293
system state 2293
transition event 2294
68P30 Coding and information theory
(compaction, compression, models of communication, encoding schemes, etc.) 2295
Huffman coding 2295
Huffman's algorithm 2297
arithmetic encoding 2299
binary Gray code 2300
entropy encoding 2301
68Q01 General 2302
currying 2302
higher-order function 2303
68Q05 Models of computation (Turing
machines, etc.) 2304
Cook reduction 2304
Levin reduction 2305
Turing computable 2305
computable number 2305
deterministic finite automaton 2306
non-deterministic Turing machine 2307
non-deterministic finite automaton 2307
non-deterministic pushdown automaton 2309
oracle 2310
self-reducible 2311
universal Turing machine 2311
68Q10 Modes of computation (nondeterministic, parallel, interactive, probabilistic, etc.) 2312
deterministic Turing machine 2312
random Turing machine 2313
68Q15 Complexity classes (hierarchies, relations among complexity classes, etc.) 2315
NP-complete 2315
complexity class 2315
constructible 2317
counting complexity class 2317
polynomial hierarchy 2317
polynomial hierarchy is a hierarchy 2318
time complexity 2318
68Q25 Analysis of algorithms and problem complexity 2320
counting problem 2320
decision problem 2320
promise problem 2321
range problem 2321
search problem 2321
68Q30 Algorithmic information theory (Kolmogorov complexity, etc.) 2323
Kolmogorov complexity 2323
Kolmogorov complexity function 2323
Kolmogorov complexity upper bounds 2324
computationally indistinguishable 2324
distribution ensemble 2325
hard core 2325
invariance theorem 2325
natural numbers identified with binary strings 2326
one-way function 2326
pseudorandom 2327
pseudorandom generator 2327
support 2327
68Q45 Formal languages and automata 2328
automaton 2328
context-free language 2329
68Q70 Algebraic theory of languages and automata 2331
Kleene algebra 2331
Kleene star 2331
monad 2332
68R05 Combinatorics 2333
switching lemma 2333
68R10 Graph theory 2334
Floyd's algorithm 2334
digital library structural metadata specification 2334
digital library structure 2335
digital library substructure 2335
68T10 Pattern recognition, speech recognition 2336
Hough transform 2336
68U10 Image processing 2340
aliasing 2340
68W01 General 2341
Horner's rule 2341
68W30 Symbolic computation and algebraic computation 2343
algebraic computation 2343
68W40 Analysis of algorithms 2344
speedup 2344
74A05 Kinematics of deformation 2345
body 2345
deformation 2345
76D05 Navier-Stokes equations 2346
Navier-Stokes equations 2346
81S40 Path integrals 2347
Feynman path integral 2347
90C05 Linear programming 2349
linear programming 2349
simplex algorithm 2350
91A05 2-person games 2351
examples of normal form games 2351
normal form game 2352
91A10 Noncooperative games 2353
dominant strategy 2353
91A18 Games in extensive form 2354
extensive form game 2354
91A99 Miscellaneous 2355
Nash equilibrium 2355
Pareto dominant 2355
common knowledge 2356
complete information 2356
example of Nash equilibrium 2357
game 2357
game theory 2358
strategy 2358
utility 2359
92B05 General biology and biomathematics 2360
Lotka-Volterra system 2360
93A10 General systems 2362
transfer function 2362
93B99 Miscellaneous 2363
passivity 2363
93D99 Miscellaneous 2365
Hurwitz matrix 2365
94A12 Signal theory (characterization,
reconstruction, etc.) 2366
rms error 2366
94A17 Measures of information, entropy
2367
conditional entropy 2367
gaussian maximizes entropy for given covariance
2368
mutual information 2368
proof of gaussian maximizes entropy for given
covariance 2369
94A20 Sampling theory 2371
sampling theorem 2371
94A60 Cryptography 2372
Diffie-Hellman key exchange 2372
elliptic curve discrete logarithm problem 2373
94A99 Miscellaneous 2374
Heaps' law 2374
History 2375
GNU Free Documentation License


Version 1.1, March 2000
Copyright © 2000 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but
changing it is not allowed.

Preamble
The purpose of this License is to make a manual, textbook, or other written document free
in the sense of freedom: to assure everyone the effective freedom to copy and redistribute
it, with or without modifying it, either commercially or noncommercially. Secondarily, this
License preserves for the author and publisher a way to get credit for their work, while not
being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must
themselves be free in the same sense. It complements the GNU General Public License, which
is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free
software needs free documentation: a free program should come with manuals providing the
same freedoms that the software does. But this License is not limited to software manuals; it
can be used for any textual work, regardless of subject matter or whether it is published as a
printed book. We recommend this License principally for works whose purpose is instruction
or reference.

1. Applicability and Definitions

This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is
addressed as "you".
A "Modified Version" of the Document means any work containing the Document or a
portion of it, either copied verbatim, or with modifications and/or translated into another
language.
A "Secondary Section" is a named appendix or a front-matter section of the Document
that deals exclusively with the relationship of the publishers or authors of the Document to
the Document's overall subject (or to related matters) and contains nothing that could fall
directly within that overall subject. (For example, if the Document is in part a textbook
of mathematics, a Secondary Section may not explain any mathematics.) The relationship
could be a matter of historical connection with the subject or with related matters, or of
legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being
those of Invariant Sections, in the notice that says that the Document is released under this
License.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or
Back-Cover Texts, in the notice that says that the Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy, represented in a
format whose specification is available to the general public, whose contents can be viewed
and edited directly and straightforwardly with generic text editors or (for images composed
of pixels) generic paint programs or (for drawings) some widely available drawing editor,
and that is suitable for input to text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an otherwise Transparent file
format whose markup has been designed to thwart or discourage subsequent modification
by readers is not Transparent. A copy that is not Transparent is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup,
Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD,
and standard-conforming simple HTML designed for human modification. Opaque formats
include PostScript, PDF, proprietary formats that can be read and edited only by proprietary
word processors, SGML or XML for which the DTD and/or processing tools are not generally
available, and the machine-generated HTML produced by some word processors for output
purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages
as are needed to hold, legibly, the material this License requires to appear in the title page.
For works in formats which do not have any title page as such, "Title Page" means the text
near the most prominent appearance of the work's title, preceding the beginning of the body
of the text.


2. Verbatim Copying
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying
this License applies to the Document are reproduced in all copies, and that you add no
other conditions whatsoever to those of this License. You may not use technical measures
to obstruct or control the reading or further copying of the copies you make or distribute.
However, you may accept compensation in exchange for copies. If you distribute a large
enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly
display copies.

3. Copying in Quantity
If you publish printed copies of the Document numbering more than 100, and the Document's
license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly
and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover
Texts on the back cover. Both covers must also clearly and legibly identify you as the
publisher of these copies. The front cover must present the full title with all words of the
title equally prominent and visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve the title of the Document
and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the
first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto
adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you
must either include a machine-readable Transparent copy along with each Opaque copy, or
state in or with each Opaque copy a publicly-accessible computer-network location containing
a complete Transparent copy of the Document, free of added material, which the general
network-using public has access to download anonymously at no charge using public-standard
network protocols. If you use the latter option, you must take reasonably prudent steps, when
you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy
will remain thus accessible at the stated location until at least one year after the last time
you distribute an Opaque copy (directly or through your agents or retailers) of that edition
to the public.
It is requested, but not required, that you contact the authors of the Document well before
redistributing any large number of copies, to give them a chance to provide you with an
updated version of the Document.


4. Modifications
You may copy and distribute a Modified Version of the Document under the conditions of
sections 2 and 3 above, provided that you release the Modified Version under precisely this
License, with the Modified Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy of it. In addition, you
must do these things in the Modified Version:
Use in the Title Page (and on the covers, if any) a title distinct from that of the
Document, and from those of previous versions (which should, if there were any, be
listed in the History section of the Document). You may use the same title as a previous
version if the original publisher of that version gives permission.
List on the Title Page, as authors, one or more persons or entities responsible for
authorship of the modifications in the Modified Version, together with at least five of
the principal authors of the Document (all of its principal authors, if it has less than
five).
State on the Title page the name of the publisher of the Modified Version, as the
publisher.
Preserve all the copyright notices of the Document.
Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
Include, immediately after the copyright notices, a license notice giving the public
permission to use the Modified Version under the terms of this License, in the form
shown in the Addendum below.
Preserve in that license notice the full lists of Invariant Sections and required Cover
Texts given in the Document's license notice.
Include an unaltered copy of this License.
Preserve the section entitled "History", and its title, and add to it an item stating
at least the title, year, new authors, and publisher of the Modified Version as given
on the Title Page. If there is no section entitled "History" in the Document, create
one stating the title, year, authors, and publisher of the Document as given on its
Title Page, then add an item describing the Modified Version as stated in the previous
sentence.
Preserve the network location, if any, given in the Document for public access to
a Transparent copy of the Document, and likewise the network locations given in the
Document for previous versions it was based on. These may be placed in the "History"
section. You may omit a network location for a work that was published at least four
years before the Document itself, or if the original publisher of the version it refers to
gives permission.

In any section entitled "Acknowledgements" or "Dedications", preserve the section's
title, and preserve in the section all the substance and tone of each of the contributor
acknowledgements and/or dedications given therein.
Preserve all the Invariant Sections of the Document, unaltered in their text and in
their titles. Section numbers or the equivalent are not considered part of the section
titles.
Delete any section entitled "Endorsements". Such a section may not be included in
the Modified Version.
Do not retitle any existing section as "Endorsements" or to conflict in title with any
Invariant Section.
If the Modified Version includes new front-matter sections or appendices that qualify as
Secondary Sections and contain no material copied from the Document, you may at your
option designate some or all of these sections as invariant. To do this, add their titles to
the list of Invariant Sections in the Modified Version's license notice. These titles must be
distinct from any other section titles.
You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties, for example, statements of peer review
or that the text has been approved by an organization as the authoritative definition of a
standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25
words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version.
Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already includes a cover
text for the same cover, previously added by you or by arrangement made by the same entity
you are acting on behalf of, you may not add another; but you may replace the old one, on
explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to
use their names for publicity for or to assert or imply endorsement of any Modified Version.

Combining Documents
You may combine the Document with other documents released under this License, under
the terms defined in section 4 above for modified versions, provided that you include in the
combination all of the Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its license notice.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections
with the same name but different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original author or publisher of that
section if known, or else a unique number. Make the same adjustment to the section titles
in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled "History" in the various original
documents, forming one section entitled "History"; likewise combine any sections entitled
"Acknowledgements", and any sections entitled "Dedications". You must delete all sections
entitled "Endorsements".

Collections of Documents
You may make a collection consisting of the Document and other documents released under
this License, and replace the individual copies of this License in the various documents with
a single copy that is included in the collection, provided that you follow the rules of this
License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually
under this License, provided you insert a copy of this License into the extracted document,
and follow this License in all other respects regarding verbatim copying of that document.

Aggregation With Independent Works


A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole
count as a Modified Version of the Document, provided no compilation copyright is claimed
for the compilation. Such a compilation is called an aggregate, and this License does not
apply to the other self-contained works thus compiled with the Document, on account of
their being thus compiled, if they are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then
if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts
may be placed on covers that surround only the Document within the aggregate. Otherwise
they must appear on covers around the whole aggregate.

Translation
Translation is considered a kind of modification, so you may distribute translations of the
Document under the terms of section 4. Replacing Invariant Sections with translations
requires special permission from their copyright holders, but you may include translations of
some or all Invariant Sections in addition to the original versions of these Invariant Sections.
You may include a translation of this License provided that you also include the original
English version of this License. In case of a disagreement between the translation and the
original English version of this License, the original English version will prevail.

Termination
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the
Document is void, and will automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this License will not have their
licenses terminated so long as such parties remain in full compliance.

Future Revisions of This License


The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to
the present version, but may differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document
specifies that a particular numbered version of this License "or any later version" applies
to it, you have the option of following the terms and conditions either of that specified
version or of any later version that has been published (not as a draft) by the Free Software
Foundation. If the Document does not specify a version number of this License, you may
choose any version ever published (not as a draft) by the Free Software Foundation.

ADDENDUM: How to use this License for your documents


To use this License in a document you have written, include a copy of the License in the
document and put the following copyright and license notices just after the title page:
Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute
and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software
Foundation; with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. A
copy of the license is included in the section entitled "GNU Free Documentation
License".
If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which
ones are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead
of "Front-Cover Texts being LIST"; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we recommend releasing
these examples in parallel under your choice of free software license, such as the GNU General
Public License, to permit their use in free software.


Chapter 1
UNCLA Unclassified
1.1 Golomb ruler

A Golomb ruler of length n is a ruler with only a subset {0, a_2, . . . , n} of the integer
markings {0, 1, 2, . . . , n} that appear on a regular ruler. The defining criterion of this
subset is that there exists an m such that any positive integer k ≤ m can be expressed
uniquely as a difference k = a_i − a_j for some i, j. This is referred to as an m-Golomb ruler.
A 4-Golomb ruler of length 7 is given by {0, 1, 3, 7}. To verify this, we need to show that
each of the numbers 1, 2, 3, 4 can be expressed uniquely as a difference of two numbers in
the above set:

1 = 1 − 0
2 = 3 − 1
3 = 3 − 0
4 = 7 − 3
An optimal Golomb ruler is one where for a fixed value of n the value of a_n is minimized.
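The verification above can be automated. The sketch below is an added illustration (the helper name `golomb_m` is ours, not from the entry); it counts all pairwise differences and finds the largest m for which every k ≤ m occurs exactly once:

```python
from itertools import combinations

def golomb_m(marks):
    """Largest m such that every k <= m is expressible uniquely
    as a difference of two marks (i.e., the set is an m-Golomb ruler)."""
    diffs = {}
    for lo, hi in combinations(sorted(marks), 2):
        diffs[hi - lo] = diffs.get(hi - lo, 0) + 1
    m = 0
    while diffs.get(m + 1) == 1:  # each k must occur exactly once
        m += 1
    return m

print(golomb_m({0, 1, 3, 7}))  # prints 4: differences 1,2,3,4 are unique, 5 is missing
```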
Version: 2 Owner: mathcam Author(s): mathcam, imran

1.2 Hesse configuration
A Hesse configuration is a set P of nine non-collinear points in the projective plane over
a field K such that any line through two points of P contains exactly three points of P .

Then there are 12 such lines through P . A Hesse configuration exists if and only if the field
K contains a primitive third root of unity. For such K the projective automorphism group
PGL(3, K) acts transitively on all possible Hesse configurations.
The configuration P with its intersection structure of 12 lines is isomorphic to the affine space
A = F² where F is a field with three elements.
The group Γ ⊆ PGL(3, K) of all symmetries that map P onto itself has order 216 and it
is isomorphic to the group of affine transformations of A that have determinant 1. The
stabilizer in Γ of any of the 12 lines through P is a cyclic subgroup of order three, and Γ is
generated by these subgroups.
The symmetry group is isomorphic to G(K)/Z(K) where G(K) ⊆ GL(3, K) is a group
of order 648 generated by reflections of order three and Z(K) is its cyclic center of order
three. The reflection group G(C) is called the Hesse group, which appears as G_25 in the
classification of finite complex reflection groups by Shephard and Todd.
If K is algebraically closed and the characteristic of K is not 2 or 3 then the nine inflection
points of an elliptic curve E over K form a Hesse configuration.
Version: 3 Owner: debosberg Author(s): debosberg

1.3 Jordan's Inequality

Jordan's Inequality states that

(2/π) x ≤ sin(x) ≤ x,  for all x ∈ [0, π/2]
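A grid check of the inequality is easy to run; this is an added numerical illustration, not part of the original entry (the tolerance absorbs floating-point rounding at the endpoints):

```python
import math

# Check (2/pi) * x <= sin(x) <= x on a grid over [0, pi/2].
for i in range(1001):
    x = (math.pi / 2) * i / 1000
    assert 2 * x / math.pi <= math.sin(x) + 1e-12
    assert math.sin(x) <= x + 1e-12
print("Jordan's inequality holds on the sampled grid")
```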


Version: 3 Owner: unlord Author(s): unlord

1.4 Lagrange's theorem
Lagrange's theorem
1: G group
2: H ≤ G
3: [G : H] index of H in G
4: |G| = |H| [G : H]
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 2 Owner: bwebste Author(s): akrowne, apmxi

1.5 Laurent series

A Laurent series centered about a is a series of the form

Σ_{k=−∞}^{∞} c_k (z − a)^k

where c_k, a, z ∈ C.

One can prove that the above series converges everywhere inside the set

D := {z ∈ C | R_1 < |z − a| < R_2}

where

R_1 := limsup_{k→∞} |c_{−k}|^{1/k}

and

R_2 := 1 / limsup_{k→∞} |c_k|^{1/k}.

(This set may be empty.)

Every Laurent series has an associated function, given by

f(z) := Σ_{k=−∞}^{∞} c_k (z − a)^k,

whose domain is the set of points in C on which the series converges. This function is analytic
inside the annulus D, and conversely, every analytic function on an annulus is equal to some
(unique) Laurent series.
Version: 3 Owner: djao Author(s): djao

1.6 Lebesgue measure

Let S ⊆ R, and let S′ be the complement of S with respect to R. We define S to be
measurable if, for any A ⊆ R,

m*(A) = m*(A ∩ S) + m*(A ∩ S′)

where m*(S) is the Lebesgue outer measure of S. If S is measurable, then we define the
Lebesgue measure of S to be m(S) = m*(S).
Lebesgue measure on Rn is the n-fold product measure of Lebesgue measure on R.
Version: 2 Owner: vampyr Author(s): vampyr

1.7 Leray spectral sequence

The Leray spectral sequence is a special case of the Grothendieck spectral sequence regarding
composition of functors.
If f : X → Y is a continuous map of topological spaces, and if F is a sheaf of abelian groups
on X, then there is a spectral sequence

E_2^{pq} = H^p(Y, R^q f_* F) ⇒ H^{p+q}(X, F)

where f_* is the direct image functor.
Version: 1 Owner: bwebste Author(s): nerdy2

1.8 Möbius transformation

A Möbius transformation is a bijection on the extended complex plane C ∪ {∞} given by

f(z) = (az + b)/(cz + d) for z ≠ −d/c, ∞;  f(∞) = a/c;  f(−d/c) = ∞,

where a, b, c, d ∈ C and ad − bc ≠ 0.

It can be shown that the inverse and the composition of two Möbius transformations are similarly
defined, and so the Möbius transformations form a group under composition.
The geometric interpretation of the Möbius group is that it is the group of automorphisms
of the Riemann sphere.
Any Möbius map can be composed from the elementary transformations: dilations, translations and inversions. If we define a line to be a circle passing through ∞, then it can be
shown that a Möbius transformation maps circles to circles, by looking at each elementary
transformation.
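The group structure can be made concrete by representing f by the matrix of its coefficients, since composition of Möbius maps corresponds to matrix multiplication. The sketch below is an added illustration under that standard identification (helper names are ours):

```python
def mobius(mat, z):
    """Apply the Möbius transformation (az + b)/(cz + d), given as a 2x2 matrix."""
    (a, b), (c, d) = mat
    return (a * z + b) / (c * z + d)

def matmul(m, n):
    """Multiply two 2x2 matrices stored as nested tuples."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

f = ((1, 2), (0, 1))  # z -> z + 2
g = ((2, 0), (0, 1))  # z -> 2z
z = 3 + 4j
# The composition f∘g agrees with the Möbius map of the matrix product:
assert mobius(f, mobius(g, z)) == mobius(matmul(f, g), z)
```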
Version: 9 Owner: vitriol Author(s): vitriol

1.9 Mordell-Weil theorem
If E is an elliptic curve defined over a number field K, then the group of points with
coordinates in K is a finitely generated abelian group.
Version: 1 Owner: nerdy2 Author(s): nerdy2

1.10 Plateau's Problem

Plateau's Problem is the problem of finding the surface with minimal area among all
surfaces which have the same prescribed boundary.
This problem is named after the Belgian physicist Joseph Plateau (1801-1883), who experimented with soap films. As a matter of fact, if you take a wire (which represents a closed curve
in three-dimensional space) and dip it in a solution of soapy water, you obtain a soapy surface which has the wire as boundary. It turns out that this surface has the minimal area
among all surfaces with the same boundary, so the soap film is a solution to Plateau's
Problem.
Jesse Douglas (1897-1965) solved the problem by proving the existence of such minimal
surfaces. The solution to the problem is achieved by finding a harmonic and conformal
parameterization of the surface.
The extension of the problem to higher dimensions (i.e. for k-dimensional surfaces in n-dimensional space) turns out to be much more difficult to study. Moreover, while the solutions
to the original problem are always regular, it turns out that the solutions to the extended
problem may have singularities if n ≥ 8. To solve the extended problem, the theory of
currents (Federer and Fleming) has been developed.
Version: 4 Owner: paolini Author(s): paolini

1.11 Poisson random variable

X is a Poisson random variable with parameter λ if

f_X(x) = e^{−λ} λ^x / x!,  x ∈ {0, 1, 2, ...}

Parameters:

- λ > 0

syntax:
X ∼ Poisson(λ)

Notes:

1. X is often used to describe the occurrence of rare events. It's a very commonly used
distribution in all fields of statistics.
2. E[X] = λ
3. Var[X] = λ
4. M_X(t) = e^{λ(e^t − 1)}
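The moment identities can be sanity-checked numerically by truncating the support; this is an added sketch, not part of the original entry (the truncation at 100 terms leaves a negligible tail for moderate λ):

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 4.0
support = range(100)  # truncation; the tail beyond is negligible for lam = 4
mean = sum(x * poisson_pmf(x, lam) for x in support)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)
assert abs(sum(poisson_pmf(x, lam) for x in support) - 1) < 1e-9
assert abs(mean - lam) < 1e-9  # E[X] = lambda
assert abs(var - lam) < 1e-9   # Var[X] = lambda
```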
Version: 2 Owner: Riemann Author(s): Riemann

1.12 Shannon's theorem

Definition (Discrete) Let (Ω, F, μ) be a discrete probability space, and let X be a discrete
random variable on Ω.
The entropy H[X] is defined as the functional

H[X] = − Σ_x μ(X = x) log μ(X = x).   (1.12.1)

Definition (Continuous) Entropy in the continuous case is called differential entropy.

Discussion: Discrete Entropy. Entropy was first introduced by Shannon in 1948 in his
landmark paper "A Mathematical Theory of Communication". A modified and expanded
version of his argument is presented here.
Suppose we have a set of possible events whose probabilities of occurrence are p1 , p2 , . . . , pn .
These probabilities are known but that is all we know concerning which event will occur.
Can we find a measure of how much choice is involved in the selection of the event or of
how uncertain we are of the outcome? If there is such a measure, say H(p1, p2 , . . . , pn ), it is
reasonable to require of it the following properties:
1. H should be continuous in the pi .
2. If all the p_i are equal, p_i = 1/n, then H should be a monotonic increasing function of n.
With equally likely events there is more choice, or uncertainty, when there are more
possible events.

3. If a choice be broken down into two successive choices, the original H should be the
weighted sum of the individual values of H.
As an example of this last property, consider losing your luggage down a chute which feeds
three carousels, A, B and C. Assume that the baggage handling system is constructed such
that the probability of your luggage ending up on carousel A is 1/2, on B is 1/3, and on C is
1/6. These probabilities specify the p_i. There are two ways to think about your uncertainty
about where your luggage will end up.
First, you could consider your uncertainty to be H(P_A, P_B, P_C) = H(1/2, 1/3, 1/6). On the other
hand, you reason, no matter how byzantine the baggage handling system is, half the time
your luggage will end up on carousel A and half the time it will end up on carousels B
or C (with uncertainty H(P_A, P_{B∪C}) = H(1/2, 1/2)). If it doesn't go into A (and half the
time it won't), then two-thirds of the time it shows up on B and one-third of the time
it winds up on carousel C (and your uncertainty about this second event, in isolation, is
H(P_B, P_C) = H(2/3, 1/3)). But remember this second event only happens half the time (P_{B∪C}
of the time), so you must weight this second uncertainty appropriately, that is, by 1/2. The
uncertainties computed using each of these chains of reasoning must be equal. That is,

H(P_A, P_B, P_C) = H(P_A, P_{B∪C}) + P_{B∪C} H(P_B, P_C)

H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3)
If you're not as lost as your luggage, then you may be interested in the following. . .

Theorem The only H satisfying the three above assumptions is of the form:

H = −k Σ_{i=1}^{n} p_i log p_i

k is a constant, essentially a choice of unit of measure. The measure of uncertainty, H, is
called entropy, not to be confused (though it often is) with Boltzmann's thermodynamic
entropy. The logarithm may be taken to the base 2, in which case H is measured in bits,
or to the base e, in which case H is measured in nats.
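The consistency identity from the luggage example can be checked numerically; the short sketch below is an added illustration (the function name H is ours) using base-2 logarithms, so the answer comes out in bits:

```python
import math

def H(*p):
    """Shannon entropy in bits of a probability vector."""
    assert abs(sum(p) - 1) < 1e-12
    return -sum(q * math.log2(q) for q in p if q > 0)

# The luggage example: both chains of reasoning give the same uncertainty.
lhs = H(1/2, 1/3, 1/6)
rhs = H(1/2, 1/2) + (1/2) * H(2/3, 1/3)
assert abs(lhs - rhs) < 1e-9
print(lhs)  # about 1.459 bits
```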

Discussion: Continuous Entropy. Despite its seductively analogous form, continuous
entropy cannot be obtained as a limiting case of discrete entropy.
We wish to obtain a generally finite measure as the bin size goes to zero. In the discrete
case, the bin size is the (implicit) width of each of the n (finite or infinite) bins/buckets/states
whose probabilities are the pn . As we generalize to the continuous domain, we must make
this width explicit.

To do this, start with a continuous function f discretized into bins of width Δ (Figure 1.1:
Discretizing the function f into bins of width Δ). As the figure indicates, by the mean-value
theorem there exists a value x_i in each bin such that

f(x_i) Δ = ∫_{iΔ}^{(i+1)Δ} f(x) dx   (1.12.2)

and thus the integral of the function f can be approximated (in the Riemannian sense) by

∫_{−∞}^{∞} f(x) dx = lim_{Δ→0} Σ_{i=−∞}^{∞} f(x_i) Δ   (1.12.3)

where this limit and "bin size goes to zero" are equivalent.

We will denote

H^Δ := − Σ_{i=−∞}^{∞} Δ f(x_i) log(Δ f(x_i))   (1.12.4)

and expanding the log we have

H^Δ = − Σ_{i=−∞}^{∞} Δ f(x_i) log f(x_i) − Σ_{i=−∞}^{∞} Δ f(x_i) log Δ.   (1.12.5, 1.12.6)

As Δ → 0, we have

Σ_{i=−∞}^{∞} Δ f(x_i) → ∫ f(x) dx = 1   (1.12.7)

and

Σ_{i=−∞}^{∞} Δ f(x_i) log f(x_i) → ∫ f(x) log f(x) dx.   (1.12.8)

This leads us to our definition of the differential entropy (continuous entropy):

h[f] = lim_{Δ→0} [H^Δ + log Δ] = − ∫ f(x) log f(x) dx.   (1.12.9)
Version: 13 Owner: gaurminirick Author(s): drummond


1.13 Shapiro inequality

Let n ≥ 3 and let x_1, x_2, . . . , x_n be positive reals. The inequality

x_1/(x_2 + x_3) + x_2/(x_3 + x_4) + · · · + x_n/(x_1 + x_2) ≥ n/2,

with x_i + x_{i+1} > 0 (indices taken cyclically), is true for any even integer n ≤ 12 and any
odd integer n ≤ 23.
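A brute-force numerical check over random positive vectors is an easy way to exercise the statement; this added sketch can only support, not prove, the inequality (the helper name is ours):

```python
import random

def shapiro_sum(xs):
    """Left-hand side: sum of x_i / (x_{i+1} + x_{i+2}), indices cyclic."""
    n = len(xs)
    return sum(xs[i] / (xs[(i + 1) % n] + xs[(i + 2) % n]) for i in range(n))

random.seed(1)
for n in (4, 6, 12, 23):  # cases where the inequality is known to hold
    for _ in range(1000):
        xs = [random.uniform(0.01, 1.0) for _ in range(n)]
        assert shapiro_sum(xs) >= n / 2 - 1e-9
```

Equality holds when all the x_i are equal, e.g. shapiro_sum([1, 1, 1, 1]) gives exactly 4/2 = 2.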
Version: 1 Owner: alek thiery Author(s): alek thiery

1.14 Sylow p-subgroups

Let G be a finite group and p be a prime that divides |G|. We can then write |G| = p^k m for
some positive integer k so that p does not divide m.
Any subgroup of G whose order is p^k is called a Sylow p-subgroup or simply a Sylow subgroup.
The first Sylow theorem states that any group of order p^k m has a Sylow p-subgroup.
Version: 3 Owner: drini Author(s): drini, apmxi

1.15 Tschirnhaus transformations

A polynomial transformation which transforms a polynomial to another with certain zero coefficients is called a Tschirnhaus transformation. It is thus an invertible transformation of the form x ↦ g(x)/h(x) where g, h are polynomials over the base field K (or some
subfield of the splitting field of the polynomial being transformed). If gcd(D(x), f(x)) = 1
then the Tschirnhaus transformation becomes a polynomial transformation mod f.
Specifically, it concerns a substitution that reduces finding the roots of the polynomial

p = T^n + a_1 T^{n−1} + ... + a_n = ∏_{i=1}^{n} (T − r_i) ∈ k[T]

to finding the roots of another q ∈ k[T], with fewer parameters, and solving an auxiliary
polynomial equation s, with deg(s) < deg(p q).
Historically, the transformation was applied to reduce the general quintic equation to simpler
resolvents. Examples due to Hermite and Klein are, respectively, the principal resolvent

K(X) := X^5 + a_0 X^2 + a_1 X + a_3

and the Bring-Jerrard form

K(X) := X^5 + a_1 X + a_2
Tschirnhaus transformations are also used when computing Galois groups to remove repeated
roots in resolvent polynomials. Almost any transformation will work but it is extremely hard
to find an efficient algorithm that can be proved to work.
Version: 5 Owner: bwebste Author(s): bwebste, ottem

1.16 Wallis formulae

∫_0^{π/2} sin^{2n} x dx = (1 · 3 · · · (2n − 1))/(2 · 4 · · · 2n) · π/2

∫_0^{π/2} sin^{2n+1} x dx = (2 · 4 · · · 2n)/(3 · 5 · · · (2n + 1))

π/2 = ∏_{n=1}^{∞} 4n²/(4n² − 1) = (2 · 2 · 4 · 4 · · ·)/(1 · 3 · 3 · 5 · · ·)
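The infinite product converges slowly but visibly; the partial products can be computed directly (an added numerical illustration, not part of the original entry):

```python
import math

def wallis_partial(N):
    """Partial product of the Wallis formula: prod_{n=1}^{N} 4n^2 / (4n^2 - 1)."""
    prod = 1.0
    for n in range(1, N + 1):
        prod *= 4 * n * n / (4 * n * n - 1)
    return prod

# The error of the partial product is roughly (pi/2) / (4N).
assert abs(wallis_partial(100_000) - math.pi / 2) < 1e-4
```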

Version: 2 Owner: vypertd Author(s): vypertd

1.17 ascending chain condition

A collection S of subsets of a set X (that is, a subset of the power set of X) satisfies
the ascending chain condition or ACC if there does not exist an infinite ascending chain
s_1 ⊂ s_2 ⊂ · · · of subsets from S.
See also the descending chain condition (DCC).
Version: 2 Owner: antizeus Author(s): antizeus

1.18 bounded

Let X be a subset of R. We say that X is bounded when there exists a real number M such
that |x| < M for all x ∈ X. When X is an interval, we speak of a bounded interval.
This can be generalized first to R^n. We say that X ⊆ R^n is bounded if there is a real number
M such that ‖x‖ < M for all x ∈ X, where ‖·‖ is the Euclidean norm. When we consider
balls, we speak of bounded balls.
This condition is equivalent to the statement: there is a real number T such that ‖x − y‖ < T
for all x, y ∈ X.
A further generalization to any metric space V says that X ⊆ V is bounded when there is a
real number M such that d(x, y) < M for all x, y ∈ X, where d represents the metric (distance
function) on V.
Version: 2 Owner: drini Author(s): drini, apmxi

1.19 bounded operator

Definition [1]
1. Suppose X and Y are normed vector spaces with norms ‖·‖_X and ‖·‖_Y. Further,
suppose T is a linear map T : X → Y. If there is a C ≥ 0 such that

‖Tx‖_Y ≤ C ‖x‖_X

for all x ∈ X, then T is a bounded operator.


2. Let X and Y be as above, and let T : X → Y be a bounded operator. Then the norm
of T is defined as the real number

‖T‖ = sup{ ‖Tx‖_Y / ‖x‖_X | x ∈ X \ {0} }.

In the special case when X is the zero vector space, any linear map T : X → Y is the
zero map since T(0) = 0 · T(0) = 0. In this case, we define ‖T‖ = 0.
TODO:
1. The defined norm for mappings is a norm
2. Examples: identity operator, zero operator: see [1].
3. Give alternative expressions for norm of T . (supremum taken over unit ball)
4. Discuss boundedness and continuity
Theorem [1, 2] Suppose T : X → Y is a linear map between normed vector spaces X and Y. If X
is finite dimensional, then T is bounded.
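The supremum in the definition of ‖T‖ can be estimated numerically for a concrete finite-dimensional map; the sketch below (helper names are ours, stdlib only) samples random directions in R² for a diagonal map whose exact operator norm is 3:

```python
import math
import random

def op_norm_estimate(T, dim, trials=20000):
    """Estimate ||T|| = sup ||Tx|| / ||x|| by sampling random directions."""
    random.seed(0)
    best = 0.0
    for _ in range(trials):
        x = [random.gauss(0, 1) for _ in range(dim)]
        nx = math.sqrt(sum(c * c for c in x))
        Tx = T(x)
        best = max(best, math.sqrt(sum(c * c for c in Tx)) / nx)
    return best

# Diagonal map (x, y) -> (3x, 2y): the exact operator norm is 3.
T = lambda v: [3 * v[0], 2 * v[1]]
est = op_norm_estimate(T, 2)
assert 2.9 < est <= 3.0 + 1e-9
```

The estimate approaches the true norm from below, since sampling can only ever find directions that are near, but not exactly at, the maximizing unit vector.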


REFERENCES
1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons,
1978.
2. G. Bachman, L. Narici, Functional analysis, Academic Press, 1966.

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 2 Owner: bwebste Author(s): matte, apmxi

1.20 complex projective line

complex projective line

1: (z_1, z_2) complex numbers
2: (z_1, z_2) ≠ (0, 0)
3: ∀λ ∈ C \ {0} : (z_1, z_2) ∼ (λz_1, λz_2)
4: {(z_1, z_2) ∈ C² \ {0}}/∼

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.21 converges uniformly

Let X be a set, (Y, ρ) a metric space, {f_n} a sequence of functions from X to Y, and
f : X → Y another function.
If for any ε > 0 there exists an integer N such that

ρ(f_n(x), f(x)) < ε

for all x ∈ X and all n > N, we say that f_n converges uniformly to f.
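For instance (an added illustration, not from the entry), f_n(x) = x/n converges uniformly to the zero function on [0, 1], since the supremum distance is 1/n and therefore falls below any ε once n > 1/ε:

```python
def sup_dist(n, samples=1001):
    """Approximate sup over [0,1] of |f_n(x) - f(x)| for f_n(x) = x/n, f = 0."""
    return max(abs((i / (samples - 1)) / n) for i in range(samples))

eps = 0.01
N = int(1 / eps)  # for n > N the sup distance falls below eps
assert sup_dist(N + 1) < eps
assert sup_dist(2) == 0.5  # the sup is attained at x = 1
```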
Version: 2 Owner: drini Author(s): drini, apmxi


1.22 descending chain condition

A collection S of subsets of a set X (that is, a subset of the power set of X) satisfies
the descending chain condition or DCC if there does not exist an infinite descending chain
s_1 ⊃ s_2 ⊃ · · · of subsets from S.
See also the ascending chain condition (ACC).
Version: 1 Owner: antizeus Author(s): antizeus

1.23 diamond theorem
In the simplest case, the result states that every image of a two-colored Diamond figure (like
the figure in Plato's Meno dialogue) under the action of the symmetric group of degree 4 has
some ordinary or color-interchange symmetry. The theorem generalizes to graphic designs
on 2x2x2, 4x4, and 4x4x4 arrays. It is of interest because it relates classical (Euclidean)
symmetries to underlying group actions that come from finite rather than from classical
geometry. The group actions in the 4x4 case of the theorem throw some light on the R. T.
Curtis miracle octad generator approach to the large Mathieu group.
Version: 2 Owner: m759 Author(s): m759

1.24 equivalently oriented bases

equivalently oriented bases

1: V finite-dimensional vector space
2: (v_1, . . . , v_n) ordered basis for V
3: (w_1, . . . , w_n) ordered basis for V
4: A : V → V
5: ∀i ∈ {1, . . . , n} : Av_i = w_i
6: det(A) > 0

fact: there is a unique linear isomorphism taking a given basis to another given basis

1: V finite-dimensional vector space
2: (v_1, . . . , v_n) ordered basis for V
3: (w_1, . . . , w_n) ordered basis for V
4: ∃!A : V → V linear isomorphism : ∀i ∈ {1, . . . , n} : Av_i = w_i
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.25 finitely generated R-module
finitely generated R-module

1: X module over R
2: Y ⊆ X
3: X generated by Y
4: Y finite
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: Thomas Heye Author(s): apmxi

1.26 fraction

A fraction is a rational number expressed in the form n/d, where n is designated the
numerator and d the denominator. The slash between them is known as a solidus when
the fraction is expressed as n/d.
The fraction n/d has value n ÷ d. For instance, 3/2 = 3 ÷ 2 = 1.5.
If n/d < 1, then n/d is known as a proper fraction. Otherwise, it is an improper
fraction. If n and d are relatively prime, then n/d is said to be in lowest terms. To
get a fraction in lowest terms, simply divide the numerator and the denominator by their
greatest common divisor:

60/84 = (60 ÷ 12)/(84 ÷ 12) = 5/7.

The rules for manipulating fractions are

a/b = (ka)/(kb)
a/b + c/d = (ad + bc)/(bd)
a/b − c/d = (ad − bc)/(bd)
(a/b) · (c/d) = (ac)/(bd)
(a/b) ÷ (c/d) = (ad)/(bc).
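These rules are exactly what Python's standard fractions.Fraction type implements, which makes a convenient check (an added illustration, not part of the original entry); note that Fraction also reduces to lowest terms automatically:

```python
from fractions import Fraction

assert Fraction(60, 84) == Fraction(5, 7)                            # lowest terms
assert Fraction(1, 2) + Fraction(1, 3) == Fraction(1*3 + 2*1, 2*3)   # (ad + bc)/bd
assert Fraction(1, 2) / Fraction(3, 4) == Fraction(1*4, 2*3)         # ad/bc
print(Fraction(60, 84))  # prints 5/7
```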

Version: 3 Owner: bwebste Author(s): digitalis

1.27 group of covering transformations

group of covering transformations

1: ({h : X → X | h covering transformation}, ∘)

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.28 idempotent

idempotent
1: R ring
2: r ∈ R
3: r² = r

The following facts hold in commutative rings.

fact: if r is idempotent, then 1 − r is idempotent

1: R ring
2: r ∈ R
3: r idempotent
4: 1 − r idempotent

fact: if r is idempotent, then rR is a ring

1: R ring
2: r ∈ R
3: r idempotent
4: rR is a ring

fact: if r is idempotent, then rR has identity r

1: R ring
2: r ∈ R
3: r idempotent
4: ∀s ∈ rR : rs = sr = s

fact: if r is idempotent, then R ≅ rR ⊕ (1 − r)R

1: R ring
2: r ∈ R
3: r idempotent
4: R ≅ rR ⊕ (1 − r)R

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 3 Owner: bwebste Author(s): apmxi

1.29 isolated

Let X be a topological space, let S ⊆ X, and let x ∈ S. The point x is said to be an isolated
point of S if there exists an open set U ⊆ X such that U ∩ S = {x}.
The set S is isolated if every point in S is an isolated point.

Version: 1 Owner: djao Author(s): djao

1.30 isolated singularity

isolated singularity
1: f : U ⊆ C → C
2: z_0 ∈ U
3: f analytic on U \ {z_0}
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.31 isomorphic groups

isomorphic groups
1: (X_1, ∘_1), (X_2, ∘_2) groups
2: f : X_1 → X_2 isomorphism
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: Thomas Heye Author(s): apmxi


1.32 joint continuous density function

Let X_1, X_2, ..., X_n be n random variables all defined on the same probability space. The joint
continuous density function of X_1, X_2, ..., X_n, denoted by f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n), is the
function

f_{X_1,X_2,...,X_n} : R^n → R

such that

∫_{−∞}^{x_1} · · · ∫_{−∞}^{x_n} f_{X_1,X_2,...,X_n}(u_1, u_2, ..., u_n) du_1 du_2 · · · du_n = F_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n)

As in the case where n = 1, this function satisfies:

1. f_{X_1,X_2,...,X_n}(x_1, ..., x_n) ≥ 0 for all (x_1, ..., x_n)
2. ∫_{x_1,...,x_n} f_{X_1,X_2,...,X_n}(u_1, u_2, ..., u_n) du_1 du_2 · · · du_n = 1

As in the single variable case, f_{X_1,X_2,...,X_n} does not represent the probability that each of the
random variables takes on each of the values.
Version: 4 Owner: Riemann Author(s): Riemann

1.33 joint cumulative distribution function

Let X_1, X_2, ..., X_n be n random variables all defined on the same probability space. The joint
cumulative distribution function of X_1, X_2, ..., X_n, denoted by F_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n),
is the following function:

F_{X_1,X_2,...,X_n} : R^n → R
F_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) = P[X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n]

As in the unidimensional case, this function satisfies:

1. lim_{(x_1,...,x_n)→(−∞,...,−∞)} F_{X_1,X_2,...,X_n}(x_1, ..., x_n) = 0 and
lim_{(x_1,...,x_n)→(∞,...,∞)} F_{X_1,X_2,...,X_n}(x_1, ..., x_n) = 1
2. FX1 ,X2 ,...,Xn (x1 , ..., xn ) is a monotone, nondecreasing function.

3. FX1 ,X2 ,...,Xn (x1 , ..., xn ) is continuous from the right in each variable.
The way to evaluate F_{X_1,X_2,...,X_n}(x_1, ..., x_n) is the following:

F_{X_1,X_2,...,X_n}(x_1, ..., x_n) = ∫_{−∞}^{x_1} ∫_{−∞}^{x_2} · · · ∫_{−∞}^{x_n} f_{X_1,X_2,...,X_n}(u_1, ..., u_n) du_1 du_2 · · · du_n

(if F is continuous) or

F_{X_1,X_2,...,X_n}(x_1, ..., x_n) = Σ_{i_1 ≤ x_1, ..., i_n ≤ x_n} f_{X_1,X_2,...,X_n}(i_1, ..., i_n)

(if F is discrete),
where f_{X_1,X_2,...,X_n} is the joint density function of X_1, ..., X_n.
Version: 3 Owner: Riemann Author(s): Riemann

1.34 joint discrete density function

Let X_1, X_2, ..., X_n be n random variables all defined on the same probability space. The
joint discrete density function of X_1, X_2, ..., X_n, denoted by f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n), is
the following function:

f_{X_1,X_2,...,X_n} : R^n → R
f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) = P[X_1 = x_1, X_2 = x_2, ..., X_n = x_n]

As in the single variable case, sometimes it's expressed as p_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) to mark
the difference between this function and the continuous joint density function.
Also, as in the case where n = 1, this function satisfies:

1. f_{X_1,X_2,...,X_n}(x_1, ..., x_n) ≥ 0 for all (x_1, ..., x_n)
2. Σ_{x_1,...,x_n} f_{X_1,X_2,...,X_n}(x_1, ..., x_n) = 1
In this case, fX1 ,X2 ,...,Xn (x1 , ..., xn ) = P [X1 = x1 , X2 = x2 , ..., Xn = xn ].

Version: 3 Owner: Riemann Author(s): Riemann

1.35 left function notation

We are said to be using left function notation if we write functions to the left of their
arguments. That is, if φ : X → Y is a function and x ∈ X, then φx is the image of x under
φ.
Furthermore, if we have a function ψ : Y → Z, then we write the composition of the two
functions as ψφ : X → Z, and the image of x under the composition as ψφx = (ψφ)x =
ψ(φx).
Compare this to right function notation.
Version: 1 Owner: antizeus Author(s): antizeus

1.36 lift of a submanifold

lift of a submanifold
1: X, Y topological manifolds
2: Z ⊆ Y submanifold
3: g : Z → Y inclusion
4: g̃ lift of g
5: i(g̃)
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.37 limit of a real function exists at a point

Let X ⊆ R be an open set of real numbers and f : X → R a function.

If x_0 ∈ X, we say that f is continuous at x_0 if for any ε > 0 there exists a positive δ such that

|f(x) − f(x_0)| < ε

whenever

|x − x_0| < δ.
Based on apm

Version: 2 Owner: drini Author(s): drini, apmxi

1.38 lipschitz function

lipschitz function
1: f : R → C
2: ∃M ∈ R : ∀x, y ∈ R : |f(x) − f(y)| ≤ M|x − y|
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.39

lognormal random variable

X is a lognormal random variable with parameters μ and σ² if

fX(x) = (1/(xσ√(2π))) e^{−(ln x − μ)²/(2σ²)},  x > 0

Parameters:

• μ ∈ ℝ
• σ² > 0

syntax:

X ~ LogN(μ, σ²)

Notes:

1. X is a random variable such that ln(X) is a normal random variable with mean μ and
variance σ².
2. E[X] = e^{μ + σ²/2}
3. Var[X] = e^{2μ + σ²}(e^{σ²} − 1)
4. MX(t) not useful


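The mean formula in note 2 can be checked numerically against the density. A minimal sketch (μ = 0 and σ = 0.5 are arbitrary choices for illustration), integrating by the midpoint rule:

```python
import math

mu, sigma = 0.0, 0.5

def f(x):
    # Lognormal density as given above.
    return (1.0 / (x * sigma * math.sqrt(2 * math.pi))
            * math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)))

h = 1e-3
xs = [h * (k + 0.5) for k in range(int(30 / h))]   # midpoint rule on (0, 30)
mass = sum(f(x) for x in xs) * h                    # should be ~1
mean = sum(x * f(x) for x in xs) * h                # should be ~exp(mu + sigma^2/2)

print(mass, mean, math.exp(mu + sigma ** 2 / 2))
```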
Version: 2 Owner: Riemann Author(s): Riemann

1.40

lowest upper bound

Let S be a set with an ordering relation ≤, and let T be a subset of S. A lowest upper bound
of T is an upper bound x of T with the property that x ≤ y for every upper bound y of T .
A lowest upper bound of T , when it exists, is unique.
Greatest lower bound is defined similarly: a greatest lower bound of T is a lower bound x of
T with the property that x ≥ y for every lower bound y of T .
Version: 3 Owner: djao Author(s): djao

1.41

marginal distribution

Given random variables X1 , X2 , ..., Xn and a subset I ⊆ {1, 2, ..., n}, the marginal distribution of the random variables Xi : i ∈ I is the following:

f_{Xi : i∈I}(x) = Σ_{xi : i∉I} f_{X1,...,Xn}(x1, ..., xn)

or

f_{Xi : i∈I}(x) = ∫ f_{X1,...,Xn}(u1, ..., un) Π_{ui : i∉I} dui,

summing if the variables are discrete and integrating if the variables are continuous.
That is, the marginal distribution of a set of random variables X1, ..., Xn can be obtained by
summing (or integrating) the joint distribution over all values of the other variables.


The most common marginal distribution is the individual marginal distribution (i.e., the
marginal distribution of ONE random variable).
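For discrete variables the summation above is a simple table operation. A small sketch (the joint pmf values are invented):

```python
# A toy joint pmf f(x1, x2) stored as a table.
joint = {
    (0, 0): 0.125, (0, 1): 0.125,
    (1, 0): 0.25,  (1, 1): 0.5,
}

# Marginal of X1: sum the joint pmf over all values of the other variable.
marginal_x1 = {}
for (x1, x2), p in joint.items():
    marginal_x1[x1] = marginal_x1.get(x1, 0.0) + p

print(marginal_x1)  # {0: 0.25, 1: 0.75}
```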
Version: 4 Owner: Riemann Author(s): Riemann

1.42

measurable space

A measurable space is a set E together with a collection B(E) of subsets of E which is a
sigma algebra.
The elements of B(E) are called measurable sets.
Version: 3 Owner: djao Author(s): djao

1.43

measure zero

measure zero
1: (X, M, μ) measure space
2: A ∈ M
3: μ(A) = 0
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.44

minimum spanning tree

Given a graph G with weighted edges, a minimum spanning tree is a spanning tree with
minimum weight, where the weight of a spanning tree is the sum of the weights of its edges.
There may be more than one minimum spanning tree for a graph, since it is the weight of
the spanning tree that must be minimum.
For example, here is a graph G of weighted edges and a minimum spanning tree T for that
graph. The edges of T are drawn as solid lines, while edges in G but not in T are drawn as
dotted lines.

[Figure: a weighted graph G with edge weights 2, 3, 4, 4, 5, 5, 7, 8; the edges of the
minimum spanning tree T are drawn as solid lines, the remaining edges of G as dotted lines.]
Prim's algorithm or Kruskal's algorithm can compute the minimum spanning tree of a graph.
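A minimal sketch of Kruskal's algorithm follows; the edge list and weights below are invented for illustration (they are not the graph from the figure):

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v); returns (total weight, tree edges)."""
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0, []
    for w, u, v in sorted(edges):     # consider edges in order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # adding (u, v) creates no cycle
            parent[ru] = rv
            total += w
            tree.append((u, v, w))
    return total, tree

edges = [(4, 0, 1), (8, 0, 2), (5, 1, 2), (7, 1, 3), (2, 2, 3), (3, 3, 4)]
print(kruskal(5, edges))
```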
Version: 3 Owner: Logan Author(s): Logan

1.45

minimum weighted path length

Given a list of weights, W := {w1 , w2 , . . . , wn }, the minimum weighted path length is the
minimum of the weighted path length of all extended binary trees that have n external nodes
with weights taken from W . There may be multiple possible trees that give this minimum
path length, and quite often finding this tree is more important than determining the path
length.

Example
Let W := {1, 2, 3, 3, 4}. The minimum weighted path length is 29. A tree that gives this
weighted path length is shown below.

Applications
Constructing a tree of minimum weighted path length for a given set of weights has several
applications, particularly in optimization problems. A simple and elegant algorithm for constructing such a tree is Huffman's algorithm. Such a tree gives an optimal algorithm for merging n sorted sequences (optimal merge). It can also provide a
means of compressing data (Huffman coding), as well as lead to optimal searches.
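The Huffman construction mentioned above can be sketched directly: repeatedly merge the two smallest weights, and note that each merge's sum contributes exactly that node's weight times one extra level of depth, so the sum of all merge sums is the weighted path length. Running it on the example W = {1, 2, 3, 3, 4} reproduces the value 29:

```python
import heapq

def min_weighted_path_length(weights):
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)       # two smallest weights
        b = heapq.heappop(heap)
        total += a + b                # cost of this merge
        heapq.heappush(heap, a + b)
    return total

print(min_weighted_path_length([1, 2, 3, 3, 4]))  # 29
```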
Version: 2 Owner: Logan Author(s): Logan

1.46

mod 2 intersection number

mod 2 intersection number


case: transversal map
1: X smooth manifold
2: X compact
3: Y smooth manifold
4: Z ⊆ Y closed submanifold
5: f : X → Y smooth
6: Z and X have complementary dimension
7: f transversal to Z
8: |f⁻¹(Z)| (mod 2)

case: nontransversal map


1: X smooth manifold
2: X compact
3: Y smooth manifold
4: Z ⊆ Y closed submanifold
5: f : X → Y smooth
6: dim(X) + dim(Z) = dim(Y )
7: g homotopic to f
8: g transversal to Z
9: |g⁻¹(Z)| (mod 2)


fact: a homotopic transversal map exists


1: X smooth manifold
2: X compact
3: Y smooth manifold
4: Z ⊆ Y closed submanifold
5: f : X → Y smooth
6: dim(X) + dim(Z) = dim(Y )
7: ∃g homotopic to f : g transversal to Z

fact: two homotopic transversal maps have the same mod 2 intersection number
1: X smooth manifold
2: X compact
3: Y smooth manifold
4: Z ⊆ Y closed submanifold
5: f1 , f2 : X → Y smooth
6: f1 homotopic to f2
7: I2 (f1 , Z) = I2 (f2 , Z)

fact: boundary theorem


1: X manifold with boundary
2: Y manifold
3: Z ⊆ Y submanifold
4: Z and ∂X have complementary dimension
5: g : ∂X → Y
6: g can be extended to X
7: I2 (g, Z) = 0
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.47

moment generating function

Given a random variable X, the moment generating function of X is the following


function:
MX(t) = E[e^{tX}] for t ∈ ℝ (if the expectation converges).
It can be shown that if the moment generating function of X is defined on an interval around
the origin, then

E[X^k] = M_X^{(k)}(t)|_{t=0}.

In other words, the kth derivative of the moment generating function evaluated at zero is
the kth moment of X.
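This can be illustrated numerically with a Bernoulli(p) variable, whose MGF is the standard M(t) = 1 − p + p e^t (for Bernoulli, every moment E[X^k] equals p). A sketch using finite-difference derivatives at t = 0:

```python
import math

p = 0.3
M = lambda t: 1 - p + p * math.exp(t)   # MGF of Bernoulli(p)

h = 1e-5
first_deriv = (M(h) - M(-h)) / (2 * h)             # ~ E[X]   = p
second_deriv = (M(h) - 2 * M(0) + M(-h)) / h ** 2  # ~ E[X^2] = p

print(first_deriv, second_deriv)
```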
Version: 1 Owner: Riemann Author(s): Riemann

1.48

monoid

A monoid is a semigroup G which contains an identity element; that is, there exists an
element e ∈ G such that e ∗ a = a ∗ e = a for all a ∈ G.
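For instance, strings under concatenation form a monoid with the empty string as identity; a quick check of the axioms in Python:

```python
# Strings under concatenation: "" plays the role of the identity e.
e = ""
samples = ["", "a", "ab", "xyz"]

for a in samples:
    assert e + a == a + e == a                      # identity law
for a in samples:
    for b in samples:
        for c in samples:
            assert (a + b) + c == a + (b + c)       # associativity (semigroup law)
print("monoid laws hold for string concatenation")
```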
Version: 1 Owner: djao Author(s): djao

1.49

monotonic operator

For a poset X, an operator T is a monotonic operator if for all x, y ∈ X, x ≤ y implies
T (x) ≤ T (y).
Version: 1 Owner: Logan Author(s): Logan
27

1.50

multidimensional Gaussian integral

Let N(0, K) be an unnormalized multidimensional Gaussian with mean 0 and covariance
matrix K, Kij = cov(xi , xj ). K is symmetric by the identity cov(xj , xi ) = cov(xi , xj ). Let
x = [x1 x2 . . . xn]^T and dⁿx ≡ Π_{i=1}^n dxi.
It is easy to see that N(0, K) = exp(−½ x^T K⁻¹ x). How can we normalize N(0, K)?
We can show that

∫ e^{−½ x^T K⁻¹ x} dⁿx = ((2π)ⁿ |K|)^{1/2},    (1.50.1)

where |K| = det K.
K⁻¹ is real and symmetric (since (K⁻¹)^T = (K^T)⁻¹ = K⁻¹). For convenience, let A = K⁻¹.
We can decompose A into A = TΛT⁻¹, where T is an orthonormal (T^T T = I) matrix of
the eigenvectors of A and Λ is a diagonal matrix of the eigenvalues of A. Then

∫ e^{−½ x^T A x} dⁿx = ∫ e^{−½ x^T TΛT⁻¹ x} dⁿx.    (1.50.2)

Because T is orthonormal, we have T⁻¹ = T^T. Now define a new vector variable y ≡ T^T x,
and substitute:

∫ e^{−½ x^T TΛT⁻¹ x} dⁿx = ∫ e^{−½ x^T TΛT^T x} dⁿx    (1.50.3)
                         = ∫ e^{−½ y^T Λ y} |J| dⁿy,    (1.50.4)

where |J| is the determinant of the Jacobian matrix Jmn = ∂xm/∂yn. In this case, J = T and
thus |J| = 1.
Now we're in business, because Λ is diagonal and thus the integral may be separated into
the product of n independent Gaussians, each of which we can integrate separately using the
well-known formula

∫ e^{−½ a t²} dt = (2π/a)^{1/2}.    (1.50.6)

Carrying out this program, we get

∫ e^{−½ y^T Λ y} dⁿy = Π_{k=1}^n ∫ e^{−½ λk yk²} dyk    (1.50.7)
                     = Π_{k=1}^n (2π/λk)^{1/2}    (1.50.8)
                     = ((2π)ⁿ / Π_{k=1}^n λk)^{1/2}    (1.50.9)
                     = ((2π)ⁿ / |Λ|)^{1/2}.    (1.50.10)

Now, we have |A| = |TΛT⁻¹| = |T||Λ||T⁻¹| = |Λ||T||T|⁻¹ = |Λ|, so this becomes

∫ e^{−½ x^T A x} dⁿx = ((2π)ⁿ / |A|)^{1/2}.    (1.50.12)

Substituting back in for K⁻¹, we get

∫ e^{−½ x^T K⁻¹ x} dⁿx = ((2π)ⁿ / |K⁻¹|)^{1/2} = ((2π)ⁿ |K|)^{1/2},    (1.50.13)

as promised.
Version: 4 Owner: drini Author(s): drini, drummond

1.51

multiindex

multiindex
Let n ∈ ℕ. Then an element α ∈ ℕⁿ is called a multiindex.
Version: 2 Owner: mike Author(s): mike, apmxi


1.52

near operators

1.52.1

Perturbations and small perturbations: definitions and some


results

We start our discussion on the Campanato theory of near operators with some preliminary
tools.
Let X, Y be two sets and let a metric d be defined on Y . If F : X → Y is an injective map,
we can define a metric on X by putting:

dF (x′ , x″ ) = d(F (x′ ), F (x″ )).

Indeed, dF is zero if and only if x′ = x″ (since F is injective); dF is obviously symmetric and
the triangle inequality follows from the triangle inequality of d.
If moreover F (X) is a complete subspace of Y , then X is complete with respect to the metric dF .
Indeed, let (un ) be a Cauchy sequence in X. By definition of dF , (F (un )) is then a Cauchy
sequence in Y , and in particular in F (X), which is complete. Thus, there exists y0 =
F (x0 ) ∈ F (X) which is the limit of the sequence (F (un )); x0 is then the limit of (un ) in (X, dF ),
which completes the proof.
A particular case of the previous statement is when F is onto (and thus a bijection) and
(Y, d) is complete.
Similarly, if F (X) is compact in Y , then X is compact with the metric dF .
Definition 1. Let X be a set and Y be a metric space. Let F, G be two maps from X to
Y . We say that G is a perturbation of F if there exists a constant k > 0 such that for each
x′ , x″ ∈ X one has:

d(G(x′ ), G(x″ )) ≤ k d(F (x′ ), F (x″ ))

remark 1. In particular, if F is injective then G is a perturbation of F if G is Lipschitz continuous
with respect to the metric induced on X by F .
Definition 2. In the same hypothesis as in the previous definition, we say that G is a small
perturbation of F if it is a perturbation of constant k < 1.
We can now prove this generalization of the Banach-Caccioppoli fixed point theorem:
Theorem 1. Let X be a set and (Y, d) be a complete metric space. Let F, G be two mappings
from X to Y such that:
1. F is bijective;

2. G is a small perturbation of F .
Then, there exists a unique u X such that G(u) = F (u)
The hypothesis (1) ensures that the metric space (X, dF ) is complete. If we now consider
the function T : X → X defined by

T (x) = F ⁻¹ (G(x))

we note that, by (2), we have

d(G(x′ ), G(x″ )) ≤ k d(F (x′ ), F (x″ ))

where k ∈ (0, 1) is the constant of the small perturbation; note that, by the definition of dF
and applying F ∘ F ⁻¹ to the first side, the last equation can be rewritten as

dF (T (x′ ), T (x″ )) ≤ k dF (x′ , x″ );

in other words, since k < 1, T is a contraction in the complete metric space (X, dF ); therefore
(by the classical Banach-Caccioppoli fixed point theorem) T has a unique fixed point: there
exists u ∈ X such that T (u) = u; by definition of T this is equivalent to G(u) = F (u), and
the proof is hence complete.
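The proof is constructive: iterating T = F⁻¹ ∘ G converges to the solution. A numerical sketch with X = Y = ℝ and d the absolute value; the maps F and G below are invented examples (F(x) = 2x is a bijection of ℝ, and |G(x′) − G(x″)| ≤ |x′ − x″| = ½|F(x′) − F(x″)|, so G is a small perturbation of F with k = ½):

```python
import math

F = lambda x: 2.0 * x
F_inv = lambda y: y / 2.0
G = lambda x: math.sin(x) + 3.0

# Iterate T = F^{-1} . G, a contraction in the metric d_F.
x = 0.0
for _ in range(100):
    x = F_inv(G(x))

# At the fixed point u of T we have G(u) = F(u).
print(abs(G(x) - F(x)))  # ~0
```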
remark 2. The hypotheses of the theorem can be relaxed as follows: let X be a set and
Y a metric space (not necessarily complete); let F, G be two mappings from X to Y such
that F is injective, F (X) is complete and G(X) ⊆ F (X); then there exists u ∈ X such that
G(u) = F (u).
(Apply the theorem using F (X) instead of Y as target space.)
remark 3. The Banach-Caccioppoli fixed point theorem is obtained when X = Y and F is
the identity.
We can use theorem 1 to prove a result that applies to perturbations which are not necessarily
small (i.e. for which the constant k can be greater than one). To prove it, we must assume
some supplemental structure on the metric of Y : in particular, we have to assume that the
metric d is invariant by dilations, that is, that d(λy′ , λy″ ) = λ d(y′ , y″ ) for each λ > 0 and y′ , y″ ∈ Y .
The most common case of such a metric is when the metric is deduced from a norm (i.e. when
Y is a normed space, and in particular a Banach space). The result follows immediately:
Corollary 1. Let X be a set and (Y, d) be a complete metric space with a metric d invariant
by dilations. Let F, G be two mappings from X to Y such that F is bijective and G is a
perturbation of F , with constant K > 0.
Then, for each M > K there exists a unique uM ∈ X such that G(uM ) = M F (uM ).

The proof is an immediate consequence of theorem 1, given that the map G̃(u) = G(u)/M
is a small perturbation of F (a property which is ensured by the dilation invariance of the
metric d).

We also have the following


Corollary 2. Let X be a set and (Y, d) be a complete, compact metric space with a metric
d invariant by dilations. Let F, G be two mappings from X to Y such that F is bijective and
G is a perturbation of F , with constant K > 0.
Then there exists at least one uK ∈ X such that G(uK ) = K F (uK ).
Let (an ) be a decreasing sequence of real numbers greater than one, converging to one
(an → 1), and let Mn = an K for each n ∈ ℕ. We can apply corollary 1 to each Mn , obtaining
a sequence (un ) of elements of X for which one has

G(un ) = Mn F (un ).

(1.52.1)

Since (X, dF ) is compact, there exists a subsequence of (un ) which converges to some uK ; by
continuity of G and F we can pass to the limit in (1.52.1), obtaining

G(uK ) = K F (uK ),

which completes the proof.
remark 4. For corollary 2 we cannot ensure uniqueness of uK , since in general the sequence
(un ) may change with the choice of (an ), and the limit might be different. So the corollary can
only be applied as an existence theorem.

1.52.2

Near operators

We can now introduce the concept of near operators and discuss some of their properties.
A historical remark: Campanato initially introduced the concept in Hilbert spaces; subsequently, it was remarked that most of the theory could more generally be applied to Banach
spaces; indeed, it was also proven that the basic definition can be generalized to make part
of the theory available in the more general environment of metric vector spaces.
We will here discuss the theory in the case of Banach spaces, with only a couple of exceptions:
to see some of the extra properties that are available in Hilbert spaces and to discuss a
generalization of the Lax-Milgram theorem to metric vector spaces.

1.52.3

Basic definitions and properties

Definition 3. Let X be a set and Y a Banach space. Let A, B be two operators from X
to Y . We say that A is near B if and only if there exist two constants α > 0 and k ∈ (0, 1)
such that, for each x′ , x″ ∈ X one has

‖B(x′ ) − B(x″ ) − α(A(x′ ) − A(x″ ))‖ ≤ k ‖B(x′ ) − B(x″ )‖

In other words, A is near B if B − αA is a small perturbation of B for an appropriate value
of α.
Observe that in general the property is not symmetric: if A is near B, it is not necessarily
true that B is near A; as we will briefly see, this can only be proven if k < 1/2, or in the
case that Y is a Hilbert space, by using an equivalent condition that will be discussed later
on. Yet it is possible to define a topology with some interesting properties on the space of
operators, by using the concept of nearness to form a base.
The core point of the nearness between operators is that it allows us to transfer many
important properties from B to A; in other words, if B satisfies certain properties, and A is
near B, then A satisfies the same properties. To prove this, and to enumerate some of these
nearness-invariant properties, we will first need a few important facts.
In what follows, unless differently specified, we will always assume that X is a set, Y is a
Banach space and A, B are two operators from X to Y .
Lemma 1. If A is near B then there exist two positive constants M1 , M2 such that

‖B(x′ ) − B(x″ )‖ ≤ M1 ‖A(x′ ) − A(x″ )‖
‖A(x′ ) − A(x″ )‖ ≤ M2 ‖B(x′ ) − B(x″ )‖

We have:

‖B(x′ ) − B(x″ )‖ ≤
≤ ‖B(x′ ) − B(x″ ) − α(A(x′ ) − A(x″ ))‖ + α ‖A(x′ ) − A(x″ )‖ ≤
≤ k ‖B(x′ ) − B(x″ )‖ + α ‖A(x′ ) − A(x″ )‖

and hence

‖B(x′ ) − B(x″ )‖ ≤ (α/(1 − k)) ‖A(x′ ) − A(x″ )‖,

which is the first inequality with M1 = α/(1 − k) (which is positive since k < 1).
But also

α ‖A(x′ ) − A(x″ )‖ ≤
≤ ‖B(x′ ) − B(x″ ) − α(A(x′ ) − A(x″ ))‖ + ‖B(x′ ) − B(x″ )‖ ≤
≤ k ‖B(x′ ) − B(x″ )‖ + ‖B(x′ ) − B(x″ )‖

and hence

‖A(x′ ) − A(x″ )‖ ≤ ((1 + k)/α) ‖B(x′ ) − B(x″ )‖,

which is the second inequality with M2 = (1 + k)/α.

The most important corollary of the previous lemma is the following

Corollary 3. If A is near B then two points of X have the same image under A if and only
if they have the same image under B.
We can express the previous concept in the following formal way: for each y in B(X) there
exists z in Y such that A(B ⁻¹ (y)) = {z}, and conversely. In yet other words: each fiber of A
is a fiber (for a different point) of B, and conversely.
It is therefore possible to define a map TA : B(X) → Y by putting TA (y) = z; the range of
TA is A(X). Conversely, it is possible to define TB : A(X) → Y , by putting TB (z) = y; the
range of TB is B(X). Both maps are injective and, if restricted to their respective ranges,
one is the inverse of the other.
Also observe that TB and TA are continuous. This follows from the fact that for each x ∈ X
one has

TA (B(x)) = A(x),    TB (A(x)) = B(x)

and that the lemma ensures that given a sequence (xn ) in X, the sequence (B(xn )) converges
to B(x0 ) if and only if (A(xn )) converges to A(x0 ).
We can now list some invariant properties of operators with respect to nearness. The properties are given in the form "if and only if" because each operator is near itself (which gives
the "only if" part).

1. a map is injective iff it is near an injective operator;
2. a map is surjective iff it is near a surjective operator;
3. a map is open iff it is near an open map;
4. a map has dense range iff it is near a map with dense range.

To prove (2) it is necessary to use theorem 1.
Another important property that follows from the lemma is that if there exists y ∈ Y such that
A⁻¹ (y) ∩ B ⁻¹ (y) ≠ ∅, then A⁻¹ (y) = B ⁻¹ (y): intersecting fibers are equal. (Campanato
only stated this property for the case y = 0 and called it the "kernel property"; I prefer to
call it the "fiber persistence" property.)

A topology based on nearness


In this section we will show that the concept of nearness between operator can indeed be
connected to a topological understanding of the set of maps from X to Y .


Let M be the set of maps between X and Y . For each F ∈ M and for each k ∈ (0, 1) we let
Uk (F ) be the set of all maps G ∈ M such that F − G is a small perturbation of F with constant
k. In other words, G ∈ Uk (F ) iff G is near F with constants 1, k.
The set U(F ) = {Uk (F ) | 0 < k < 1} satisfies the axioms of a set of fundamental
neighbourhoods. Indeed:

1. F belongs to each Uk (F );
2. Uk (F ) ⊆ Uh (F ) iff k < h, and thus the intersection property of neighbourhoods is
trivial;
3. for each Uk (F ) there exists Uh (F ) such that for each G ∈ Uh (F ) there exists Uj (G) ⊆
Uk (F ).
This last property (permanence of neighbourhoods) is somewhat less trivial, so we shall now
prove it.
Let Uk (F ) be given.
Let Uh (F ) be another arbitrary neighbourhood of F and let G be an arbitrary element in it.


We then have:

‖F (x′ ) − F (x″ ) − (G(x′ ) − G(x″ ))‖ ≤ h ‖F (x′ ) − F (x″ )‖ ,    (1.52.2)

but also (lemma 1)

‖G(x′ ) − G(x″ )‖ ≤ (1 + h) ‖F (x′ ) − F (x″ )‖ .    (1.52.3)

Let also Uj (G) be an arbitrary neighbourhood of G and H an arbitrary element in it. We
then have:

‖G(x′ ) − G(x″ ) − (H(x′ ) − H(x″ ))‖ ≤ j ‖G(x′ ) − G(x″ )‖ .    (1.52.4)

The nearness between F and H is estimated as follows:

‖F (x′ ) − F (x″ ) − (H(x′ ) − H(x″ ))‖
  ≤ ‖F (x′ ) − F (x″ ) − (G(x′ ) − G(x″ ))‖ + ‖G(x′ ) − G(x″ ) − (H(x′ ) − H(x″ ))‖
  ≤ h ‖F (x′ ) − F (x″ )‖ + j ‖G(x′ ) − G(x″ )‖ ≤ (h + j(1 + h)) ‖F (x′ ) − F (x″ )‖ .    (1.52.5)

We then want h + j(1 + h) ≤ k, that is, j ≤ (k − h)/(1 + h); the condition 0 < j < 1 is always
satisfied on the right side, and the left side gives us h < k.
It is important to observe that the topology generated this way is not a Hausdorff topology:
indeed, it is not possible to separate F and F + y (where F ∈ M and y is a constant element
of Y ). On the other hand, the subset of all maps with a fixed value at a fixed point
(F (x0 ) = y0 ) is a Hausdorff subspace.

Another important characteristic of the topology is that the set H of invertible operators
from X to Y is open in M (because a map is invertible iff it is near an invertible map). This
is not true in the topology of uniform convergence, as is easily seen by choosing X = Y = ℝ
and the sequence with generic element Fn (x) = x³ − x/n: the sequence converges (in the
uniform convergence topology) to F (x) = x³ , which is invertible, but none of the Fn is
invertible. Hence F is an element of H which is not an interior point of H, and H is not open.

1.52.4

Some applications

As we mentioned in the introduction, the Campanato theory of near operators allows us


to generalize some important theorems; we will now present some generalizations of the
Lax-Milgram theorem, and a generalization of the Riesz representation theorem.
[TODO]
Version: 5 Owner: Oblomov Author(s): Oblomov

1.53

negative binomial random variable

X is a Negative binomial random variable with parameters r and p if


fX(x) = C(r + x − 1, x) p^r (1 − p)^x ,  x ∈ {0, 1, ...}

Parameters:

• r > 0
• p ∈ [0, 1]

syntax:

X ~ NegBin(r, p)
Notes:

1. If r ∈ ℕ, X represents the number of failed Bernoulli trials before the rth success.
Note that if r = 1 the variable is a geometric random variable.
2. E[X] = r(1 − p)/p
3. Var[X] = r(1 − p)/p²
4. MX(t) = (p/(1 − (1 − p)e^t))^r

Version: 2 Owner: Riemann Author(s): Riemann

1.54

normal random variable

X is a normal random variable with parameters μ and σ² if

fX(x) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)} ,  x ∈ ℝ

Parameters:

• μ ∈ ℝ
• σ² > 0

syntax:

X ~ N(μ, σ²)
Notes:

1. Probably the most frequently used distribution. fX(x) is a bell-shaped
function, hence the synonym "bell distribution".
2. When μ = 0 and σ² = 1 the distribution is called standard normal.
3. The cumulative distribution function of X is often denoted Φ(x).
4. E[X] = μ
5. Var[X] = σ²
6. MX(t) = e^{μt + σ²t²/2}

Version: 4 Owner: Riemann Author(s): Riemann



1.55

normalizer of a subset of a group

normalizer of a subset of a group


1: X group
2: Y ⊆ X subset
3: {x ∈ X | xY x⁻¹ = Y }

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: drini Author(s): apmxi

1.56

nth root

There are two often-used definitions of the nth root. The first deals with real numbers
only; the second deals with complex numbers.
The nth root of a non-negative real number x, written as ⁿ√x, can be defined as the non-negative real
number y such that yⁿ = x. This notation is normally, but not always, used when n is a
natural number. This definition can also be written as ⁿ√(xⁿ) = x for all x ≥ 0.
Example: ⁴√81 = 3 because 3⁴ = 3 · 3 · 3 · 3 = 81.
Example: ⁵√(x⁵ + 5x⁴ + 10x³ + 10x² + 5x + 1) = x + 1 because (x + 1)⁵ = (x² + 2x + 1)²(x + 1) =
x⁵ + 5x⁴ + 10x³ + 10x² + 5x + 1. (See the binomial theorem and Pascal's Triangle.)
The nth root operation is distributive for multiplication and division, but not for addition
and subtraction. That is, ⁿ√(xy) = ⁿ√x · ⁿ√y, and ⁿ√(x/y) = ⁿ√x / ⁿ√y. However, except in special
cases, ⁿ√(x + y) ≠ ⁿ√x + ⁿ√y and ⁿ√(x − y) ≠ ⁿ√x − ⁿ√y.
Example: ⁴√(81/625) = 3/5 because (3/5)⁴ = 3⁴/5⁴ = 81/625.

The nth root notation is actually an alternative to exponentiation. That is, ⁿ√x ≡ x^{1/n}. As
such, the nth root operation is associative with exponentiation. That is, ⁿ√(x³) = x^{3/n} = (ⁿ√x)³.

In this definition, ⁿ√x is undefined when x < 0 and n is even. When n is odd and x < 0,
ⁿ√x < 0. Examples: ³√(−1) = −1, but ⁴√(−1) is undefined for this definition.
A more generalized definition: The nth roots of a complex number t = x + yi = (x, yi) = (r, θ)
are all the complex numbers z1 , z2 , . . . , zn ∈ ℂ that satisfy the condition zkⁿ = t. n such
complex numbers always exist.
One of the more popular methods of finding these roots is through geometry and trigonometry. The complex numbers are treated as a plane using Cartesian coordinates with an x axis
and a yi axis. (Remember, in the context of complex numbers, i ≡ √(−1).) The
rectangular coordinates (x, yi) are then translated to polar coordinates (r, θ), where r = √(x² + y²),
θ = π/2 if x = 0, and θ = arctan(y/x) if x ≠ 0. (See the Pythagorean theorem.)
Then the nth roots of t are the vertices of a regular polygon having n sides, centered at
(0, 0i), and having (ⁿ√r, θ/n) (with ⁿ√r as in the previous definition) as one of its vertices.

Example: Consider ³√8. 8 can also be written as 8 + 0i, or in polar form as (8, 0). By our method, we
now have an equilateral triangle centered at (0, 0) and having one vertex at (2, 0). Knowing
that a complete circle consists of 2π radians, and knowing that all angles are equal in an
equilateral triangle, we can deduce that the other two vertices lie at polar coordinates (2, 2π/3)
and (2, 4π/3). Translating back into rectangular coordinates, we have:

³√8 = 2(cos(2π/3) + i sin(2π/3)) = 2(−1/2 + i√3/2) = −1 + i√3
³√8 = 2(cos(4π/3) + i sin(4π/3)) = 2(−1/2 − i√3/2) = −1 − i√3

Example: Consider ⁴√(−16). We can rewrite this as ⁴√(−1) · ⁴√16 = 2 ⁴√(−1) = 2√i.
We can find √i by using a formula for multiplying complex numbers in polar coordinates:
(r1 , θ1 ) · (r2 , θ2 ) = (r1 r2 , θ1 + θ2 ). So 0 + i = (r² , 2θ). Therefore r² = √(0² + 1²) = 1 and
2θ = π/2. So √i = (1, π/4), and doubling the radius we get (2, π/4).
Now we have a square centered at polar coordinates (0, 0) with one corner at (2, π/4). Adding
π/2 to the angle repeatedly gives us the remainder of the corners: (2, 3π/4), (2, 5π/4), (2, 7π/4).
Translating these to rectangular coordinates works as in the previous example.
So the four solutions to ⁴√(−16) are √2 + i√2, −√2 + i√2, −√2 − i√2, and √2 − i√2.

Example: Consider ³√(1 + i). As in the previous examples, our first step is to convert 1 + 1i
into polar coordinates. We get r = √(1² + 1²) = √2 and θ = arctan 1 = π/4, giving a polar
coordinate of (√2, π/4). Now we take the cube root of this complex number: ³√(√2, π/4) =
(⁶√2, π/12). This point is one vertex of an equilateral triangle centered at (0, 0). The other
two vertices of the triangle are derived from adding 2π/3 to the angle. We know this
because lines from the center of an equilateral triangle to each of the corners will form three
equal angles of width 2π/3 about the center, and because all three vertices of an equilateral
triangle will be the same distance from the center.
So the other vertices in polar coordinates are (⁶√2, 3π/4) and (⁶√2, 17π/12). Most people would
just use a calculator to compute the sines and cosines of these angles, but they can be
computed using these handy identities:
cos 2t = 1 − 2 sin² t (use this to calculate sin(π/12) from sin(π/3) = √3/2)
sin(a + b) = sin(a) cos(b) + cos(a) sin(b) (use a = 3π/4 and b = 2π/3)
cos(a + b) = cos(a) cos(b) − sin(a) sin(b)

The process of calculating these values is left as an exercise to the reader in the interest of
space. The rectangular coordinates, the cube roots of 1 + i, are:

(⁶√2, π/12) = ⁶√2 ((√6 + √2)/4 + i(√6 − √2)/4)
(⁶√2, 3π/4) = ⁶√2 (−√2/2 + i√2/2)
(⁶√2, 17π/12) = ⁶√2 (−(√6 − √2)/4 − i(√6 + √2)/4)
Version: 8 Owner: mathcam Author(s): mathcam, wberry

1.57

null tree

A null tree is simply a tree with zero nodes.


Version: 1 Owner: Logan Author(s): Logan

1.58

open ball

Let (X, d) be a metric space and x0 ∈ X. Let r be a positive number. The set

B(x0 , r) = {x ∈ X : d(x, x0 ) < r}

is called the open ball with center x0 and radius r. On some spaces like ℂ or ℝ² this is also known
as an open disk, and when the space is ℝ, it is known as an open interval (all three spaces with
the standard metric).
Version: 2 Owner: drini Author(s): drini, apmxi

1.59

opposite ring

If R is a ring, then we may construct the opposite ring Rop which has the same underlying
abelian group structure, but with multiplication in the opposite order: the product of r1 and
r2 in Rop is r2 r1 .

If M is a left R-module, then it can be made into a right Rop -module, where a module
element m, when multiplied on the right by an element r of Rop , yields the product rm given
by the left R-module action on M. Similarly, right R-modules can be made into left
Rop -modules.
If R is a commutative ring, then it is equal to its own opposite ring.
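A concrete non-commutative illustration: 2×2 integer matrices, with multiplication reversed for the opposite ring. A quick check that the reversed product really is the ordinary product in the other order, and that it is still associative:

```python
def matmul(A, B):
    # Ordinary 2x2 matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matmul_op(A, B):
    # Multiplication in R^op: the product of A and B is B*A.
    return matmul(B, A)

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]

assert matmul_op(A, B) == matmul(B, A)
# Associativity survives the reversal: both sides equal C*B*A.
assert matmul_op(matmul_op(A, B), C) == matmul_op(A, matmul_op(B, C))
print("opposite-ring multiplication checks out")
```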
Version: 1 Owner: antizeus Author(s): antizeus

1.60

orbit-stabilizer theorem

Given a group action of G on a set X, define Gx to be the orbit of x and G_x to be the
stabilizer of x. For each x ∈ X the correspondence g(x) ↔ gG_x is a bijection between Gx
and the set of left cosets of G_x.
A famous corollary is that

|Gx| · |G_x| = |G|  for all x ∈ X.
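The corollary is easy to check by brute force on a small example, say G = S₃ acting on {0, 1, 2}:

```python
from itertools import permutations

G = list(permutations(range(3)))          # each g is a tuple: g[i] = g(i)
x = 0
orbit = {g[x] for g in G}                 # Gx
stabilizer = [g for g in G if g[x] == x]  # G_x

print(len(orbit), len(stabilizer), len(G))   # 3 2 6
assert len(orbit) * len(stabilizer) == len(G)
```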
Version: 8 Owner: vitriol Author(s): vitriol

1.61

orthogonal

The definition of orthogonal varies depending on the mathematical constructs in question.


There are particular definitions for
orthogonal matrices
orthogonal polynomials
orthogonal vectors
In general, two objects are orthogonal if they do not coincide in some sense. Sometimes
orthogonal means roughly the same thing as perpendicular.
Version: 2 Owner: akrowne Author(s): akrowne

1.62

permutation group on a set

permutation group on a set



1: A set
2: (SA , ∘) symmetric group
3: X ≤ SA
4: (X, ∘)

fact: conjugating stabilizer of an element by permutation produces


stabilizer of permuted element
1: A set
2: a ∈ A
3: X permutation group on A
4: σ ∈ X
5: σ StabX (a) σ⁻¹ = StabX (σ(a))

fact: if a permutation group acts transitively, then the intersection


of conjugated stabilizers is the identity
1: A set
2: a ∈ A
3: X permutation group on A
4: ⋂_{σ∈X} σ StabX (a) σ⁻¹ = 1

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.63

prime element

An element p in a ring R is a prime element if it generates a prime ideal. If R is commutative,
this is equivalent to saying that for all a, b ∈ R, if p divides ab, then p divides a or p divides b.

When R = Z the prime elements as formulated above are simply prime numbers.
Version: 3 Owner: dublisk Author(s): dublisk

1.64

product measure

Let (E1 , B1 (E1 )) and (E2 , B2 (E2 )) be two measurable spaces, with measures μ1 and μ2 . Let
B1 × B2 be the sigma algebra on E1 × E2 generated by subsets of the form B1 × B2 , where
B1 ∈ B1 (E1 ) and B2 ∈ B2 (E2 ).
The product measure μ1 × μ2 is defined to be the unique measure on the measurable space
(E1 × E2 , B1 × B2 ) satisfying the property

μ1 × μ2 (B1 × B2 ) = μ1 (B1 ) μ2 (B2 )  for all B1 ∈ B1 (E1 ), B2 ∈ B2 (E2 ).
Version: 2 Owner: djao Author(s): djao

1.65

projective line

projective line
example

1: ℓ = {[X, Y, Z, W ] ∈ ℝP³ : Z = W = 0}

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.66

projective plane

projective plane
1: ∼ : S² × S² → {0, 1}
2: x ∼ y ⟺ y = −x
3: p : S² → S²/∼
4: quotient space obtained from p
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 2 Owner: bhaire Author(s): bhaire, apmxi

1.67

proof of calculus theorem used in the Lagrange


method

Let f (x) and gi (x), i = 0, . . . , m, be differentiable scalar functions; x ∈ ℝⁿ.
We will find local extremes of the function f (x) where ∇f = 0. This can be proved by
contradiction: if ∇f ≠ 0 then

∃ε₀ > 0 ∀ε, 0 < ε < ε₀ : f (x − ε∇f ) < f (x) < f (x + ε∇f ),

but then f (x) is not a local extreme.

Now we put up some conditions, such that we should find the x ∈ S ⊆ ℝⁿ that gives a local
extreme of f . Let S = ⋂_{i=1}^m Si , and let Si be defined so that gi (x) = 0 ∀x ∈ Si .
Any vector x ∈ ℝⁿ can have one component perpendicular to the subset Si (for visualization,
think n = 3 and let Si be a flat surface). ∇gi will be perpendicular to Si , because:

∃ε₀ > 0 ∀ε, 0 < ε < ε₀ : gi (x − ε∇gi ) < gi (x) < gi (x + ε∇gi )

But gi (x) = 0, so any vector x ± ε∇gi must be outside Si , and also outside S. (todo: I have
proved that there might exist a component perpendicular to each subset Si , but not that
there exists only one; this should be done)
By the argument above, ∇f must be zero; but now we can ignore all components of ∇f
perpendicular to S. (todo: this should be expressed more formally and proved)
So we will have a local extreme within Si if there exists a λi such that

∇f = λi ∇gi

We will have local extreme(s) within S where there exists a set λi , i = 1, . . . , m such that

∇f = Σ_i λi ∇gi
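The condition ∇f = λ∇g can be seen numerically on a small example of our own choosing (not from the source): extremize f(x, y) = xy subject to g(x, y) = x + y − 2 = 0. The Lagrange condition gives (y, x) = λ(1, 1), so x = y, and the constraint forces x = y = 1 with λ = 1:

```python
f = lambda x, y: x * y
g = lambda x, y: x + y - 2

h = 1e-6
def grad(F, x, y):
    # Central-difference gradient.
    return ((F(x + h, y) - F(x - h, y)) / (2 * h),
            (F(x, y + h) - F(x, y - h)) / (2 * h))

gf, gg = grad(f, 1.0, 1.0), grad(g, 1.0, 1.0)
lam = gf[0] / gg[0]
print(gf, gg, lam)

# On the constraint, nearby points give smaller f: (1, 1) is a local maximum.
assert all(f(1 + t, 1 - t) <= f(1.0, 1.0) for t in (-0.1, -0.01, 0.01, 0.1))
```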
Version: 2 Owner: tobix Author(s): tobix

1.68

proof of orbit-stabilizer theorem

The correspondence is clearly surjective. It is injective because if gGx = g′Gx then g = g′h
for some h ∈ Gx . Therefore g(x) = g′(h(x)) = g′(x).
Version: 1 Owner: vitriol Author(s): vitriol

1.69

proof of power rule

The power rule can be derived by repeated application of the product rule.

Proof for all positive integers n


The power rule has been shown to hold for n = 0 and n = 1. If the power rule is known to
hold for some k > 0, then we have

d/dx x^{k+1} = d/dx (x · x^k)
             = x (d/dx x^k) + x^k
             = x (k x^{k−1}) + x^k
             = k x^k + x^k
             = (k + 1) x^k

Thus the power rule holds for all positive integers n.

Proof for all positive rationals n


Let y = x^{p/q}. We need to show

d/dx (x^{p/q}) = (p/q) x^{p/q − 1}.   (1.69.1)

The proof of this comes from implicit differentiation.

By definition, we have y^q = x^p. We now take the derivative with respect to x on both sides
of the equality.

d/dx y^q = d/dx x^p
(d/dy y^q)(dy/dx) = p x^{p−1}
q y^{q−1} (dy/dx) = p x^{p−1}
dy/dx = (p/q) x^{p−1}/y^{q−1}
      = (p/q) x^{p−1} y^{1−q}
      = (p/q) x^{p−1} x^{p/q − p}
      = (p/q) x^{p−1+p/q−p}
      = (p/q) x^{p/q−1}

Proof for all positive irrationals n


For positive irrationals n, the result follows by continuity: (1.69.1) holds for all positive
rationals, and every positive irrational is the limit of a sequence of positive rationals.

Proof for negative powers n


We again employ implicit differentiation. Let u = x, and differentiate u^{−n} with respect to x
for some non-negative n. We must show

d/dx u^{−n} = −n u^{−n−1}.   (1.69.2)

By definition we have u^{−n} u^n = 1. We begin by taking the derivative with respect to x on
both sides of the equality. By application of the product rule we get


d/dx (u^{−n} u^n) = d/dx 1 = 0
u^n (d/dx u^{−n}) + u^{−n} (d/dx u^n) = 0
u^n (d/dx u^{−n}) + u^{−n} (n u^{n−1}) = 0
u^n (d/dx u^{−n}) = −n u^{−1}
d/dx u^{−n} = −n u^{−n−1}
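The rational and negative cases above can be spot-checked numerically; this sketch (the sample point and exponents are chosen arbitrarily for illustration) compares a central difference quotient against n x^{n−1}:

```python
# Numerical spot-check of d/dx x^n = n * x^(n-1) for fractional and negative n.
def deriv(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for n in (2.5, 1.5, -1.0, -2.5):      # rational and negative exponents
    x = 1.7
    numeric = deriv(lambda t: t ** n, x)
    exact = n * x ** (n - 1)
    assert abs(numeric - exact) < 1e-5
print("power rule verified numerically")
```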

Version: 3 Owner: alek thiery Author(s): alek thiery, Logan

1.70

proof of primitive element theorem

Let P_a ∈ F[x], respectively P_b ∈ F[x], be the monic irreducible polynomial satisfied by a,
respectively b. If E is an extension of F that splits P_a P_b, then E is normal over F, and so
there are a finite number of subfields of E containing F, as many as there are subgroups of
Gal(E/F), by the Fundamental Theorem of Galois Theory. Let c_k = a + kb with k ∈ F, and
consider the fields F(c_k). Since F has characteristic 0, there are infinitely many choices for k.
But F ⊆ F(c_k) ⊆ F(a, b) ⊆ E, so by the above there are only finitely many F(c_k). Therefore,
for some k_i, k_j ∈ F, F(c_{k_i}) = F(c_{k_j}). Then c_{k_j} ∈ F(c_{k_i}), and so c_{k_i} − c_{k_j} = (k_i − k_j)b ∈ F(c_{k_i}),
and thus b ∈ F(c_{k_i}). Then also a = c_{k_i} − k_i b ∈ F(c_{k_i}), which gives F(a, b) ⊆ F(c_{k_i}). But we
also have F(c_{k_i}) ⊆ F(a, b), and thus F(a, b) = F(c_{k_i}), QED.
Version: 1 Owner: sucrose Author(s): sucrose

1.71

proof of product rule

d/dx [f(x)g(x)] = lim_{h→0} [f(x+h)g(x+h) − f(x)g(x)] / h
= lim_{h→0} [f(x+h)g(x+h) − f(x+h)g(x) + f(x+h)g(x) − f(x)g(x)] / h
= lim_{h→0} [ f(x+h) · (g(x+h) − g(x))/h + g(x) · (f(x+h) − f(x))/h ]
= f(x)g′(x) + f′(x)g(x)
Version: 1 Owner: Logan Author(s): Logan

1.72

proof of sum rule


d/dx [f(x) + g(x)] = lim_{h→0} [f(x+h) + g(x+h) − f(x) − g(x)] / h
= lim_{h→0} [ (f(x+h) − f(x))/h + (g(x+h) − g(x))/h ]
= f′(x) + g′(x)

Version: 1 Owner: Logan Author(s): Logan

1.73

proof that countable unions are countable

Let C be a countable collection of countable sets. We will show that ⋃C is countable.

Let P be the set of positive primes. P is countably infinite, so there is a bijection between
P and N. Since there is a bijection between C and a subset of N, there must in turn be a
one-to-one function f : C → P.

Each S ∈ C is countable, so there exists a bijection between S and some subset of N. Call
this function g, and define a new function h_S : S → N such that for all x ∈ S,

h_S(x) = f(S)^{g(x)}

Note that h_S is one-to-one. Also note that for any distinct pair S, T ∈ C, the range of h_S
and the range of h_T are disjoint due to the fundamental theorem of arithmetic.

We may now define a one-to-one function h : ⋃C → N, where, for each x ∈ ⋃C, h(x) =
h_S(x) for some S ∈ C where x ∈ S (the choice of S is irrelevant, so long as it contains x).
Since the range of h is a subset of N, h is a bijection into that set and hence ⋃C is countable.
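The prime-power encoding h_S(x) = f(S)^{g(x)} can be illustrated concretely. The sets and primes below are an assumed toy example, not part of the proof:

```python
# Demo of the injection h_S(x) = f(S)^(g(x)) used in the proof.
# f assigns a distinct prime to each set S; g indexes elements within S.
sets = [["a", "b", "c"], ["d", "e"], ["f"]]
primes = [2, 3, 5]                      # f(S) for each set, in order

h_values = []
for S, p in zip(sets, primes):
    for g_x, x in enumerate(S, start=1):  # g(x) = position of x in S
        h_values.append(p ** g_x)

# Unique prime factorization guarantees no collisions across sets.
assert len(h_values) == len(set(h_values))
print(sorted(h_values))                 # [2, 3, 4, 5, 8, 9]
```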

Version: 2 Owner: vampyr Author(s): vampyr

1.74

quadrature

Quadrature is the computation of a univariate definite integral. It can refer to either


numerical or analytic techniques; one must gather from context which is meant.
Cubature refers to higher-dimensional definite integral computation.
Some numerical quadrature methods are Simpson's rule, the trapezoidal rule, and Riemann sums.
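As a minimal sketch of two of the numerical methods just mentioned (composite rules; the test integral of sin on [0, π], with exact value 2, is an arbitrary choice):

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

def simpson(f, a, b, n):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

# quadrature of sin on [0, pi]; the exact value is 2
print(trapezoid(math.sin, 0, math.pi, 100))  # ~1.99984
print(simpson(math.sin, 0, math.pi, 100))    # ~2.0000000
```

Simpson's rule converges at fourth order in the step size, the trapezoidal rule at second order, which the errors above reflect.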
Version: 4 Owner: akrowne Author(s): akrowne

1.75

quotient module

quotient module
1: X is a ring
2: Y a module over X
3: Z is a submodule of Y
4: Y /Z is the additive group of cosets of Z in Y
5: x(y + Z) = xy + Z module structure
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: Thomas Heye Author(s): apmxi

1.76

regular expression

A regular expression is a particular metasyntax for specifying regular grammars, which


has many useful applications.
While variations abound, fundamentally a regular expression consists of the following components.
Parentheses can be used for grouping and nesting, and must contain a fully-formed regular
expression. The | symbol can be used for denoting alternatives. Some specifications do not
provide nesting or alternatives. There are also a number of postfix operators. The ? operator
means that the preceding element can either be present or not present, and corresponds to a
rule of the form A → B | ε. The * operator means that the preceding element can be present
zero or more times, and corresponds to a rule of the form A → BA | ε. The + operator
means that the preceding element can be present one or more times, and corresponds to a
rule of the form A → BA | B. Note that while these rules are not immediately in regular
form, they can be transformed so that they are.
Here is an example of a regular expression that specifies a grammar that generates the binary
representation of all multiples of 3 (and only multiples of 3).
(0*(1(01*0)*1)*)*0*
This specifies the context-free grammar (in BNF):


S ::= A B
A ::= C D
B ::= 0 B | ε
C ::= 0 C | ε
D ::= 1 E 1
E ::= F E | ε
F ::= 0 G 0
G ::= 1 G | ε

A little further work is required to transform this grammar into an acceptable form for
regular grammars, but it can be shown that this grammar (and any grammar specified by a
regular expression) is equivalent to some regular grammar.
Regular expressions have many applications. Quite often they are used for powerful string
matching and substitution features in many text editors and programming languages.
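The claim that the expression above generates exactly the binary multiples of 3 can be checked mechanically. The sketch below treats the reconstructed pattern as an assumption and tests it with Python's re module:

```python
import re

# Reconstructed pattern for binary multiples of 3
# (equivalent to the more common form (0|1(01*0)*1)*).
pattern = re.compile(r"(0*(1(01*0)*1)*)*0*")

for n in range(100):
    binary = format(n, "b")
    matches = pattern.fullmatch(binary) is not None
    assert matches == (n % 3 == 0), (n, binary)
print("pattern accepts exactly the binary multiples of 3 below 100")
```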
Version: 1 Owner: Logan Author(s): Logan

1.77

regular language

A regular grammar is a context-free grammar where all productions must take one of the
following forms (specified here in BNF; ε is the empty string):

<non-terminal> ::= terminal
<non-terminal> ::= terminal non-terminal
<non-terminal> ::= ε
A regular language is the set of strings generated by a regular grammar. Regular grammars
are also known as Type-3 grammars in the Chomsky hierarchy.
A regular grammar can be represented by a deterministic or non-deterministic finite automaton.
Such automata can serve to either generate or accept sentences in a particular regular
language. Note that since the set of regular languages is a subset of context-free languages, any deterministic or non-deterministic finite automaton can be simulated by a
pushdown automaton.
Version: 2 Owner: Logan Author(s): Logan

1.78

right function notation

We are said to be using right function notation if we write functions to the right of their
arguments. That is, if φ : X → Y is a function and x ∈ X, then xφ is the image of x under
φ.
Furthermore, if we have a function ψ : Y → Z, then we write the composition of the two
functions as φψ : X → Z, and the image of x under the composition as xφψ = x(φψ) =
(xφ)ψ.
Compare this to left function notation.
Version: 1 Owner: antizeus Author(s): antizeus

1.79

ring homomorphism

Let R and S be rings. A ring homomorphism is a function f : R → S such that:

f(a + b) = f(a) + f(b) for all a, b ∈ R
f(a · b) = f(a) · f(b) for all a, b ∈ R

When working in a context in which all rings have a multiplicative identity, one also requires
that f(1_R) = 1_S.
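As an illustration (the map and the modulus 6 are an assumed example, not from the entry), reduction modulo 6 satisfies both conditions as a map from Z to Z/6Z:

```python
# Sketch: checking the ring-homomorphism identities for f(a) = a mod 6,
# viewed as a map from Z to Z/6Z (an illustrative choice).
def f(a):
    return a % 6

for a in range(-20, 20):
    for b in range(-20, 20):
        assert f(a + b) == (f(a) + f(b)) % 6   # additive condition
        assert f(a * b) == (f(a) * f(b)) % 6   # multiplicative condition

assert f(1) == 1  # f(1_R) = 1_S
print("f(a) = a mod 6 satisfies the ring homomorphism conditions")
```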
Version: 3 Owner: djao Author(s): djao

1.80

scalar

A scalar is a quantity that is invariant under coordinate transformation, also known as a


tensor of rank 0. For example, the number 1 is a scalar, as is any number or variable n ∈ R.
The point (3, 4) is not a scalar because it varies under rotation. As such, a scalar can
be an element of a field over which a vector space is defined.
Version: 3 Owner: slider142 Author(s): slider142

1.81

Schrödinger operator

Schrödinger operator

1: V : R → R
2: y ↦ −d²y/dx² + V(x)y
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.82

selection sort

The Problem
See the Sorting Problem.

The Algorithm
Suppose L = {x_1, x_2, …, x_n} is the initial list of unsorted elements. The selection sort
algorithm sorts this list in n steps. At each step i, find the largest element L[j] such that
j ≤ n − i + 1, and swap it with the element at L[n − i + 1]. So, for the first step, find the
largest value in the list and swap it with the last element in the list. For the second step,
find the largest value in the list up to (but not including) the last element, and swap it with
the next to last element. This is continued for n − 1 steps. Thus the selection sort algorithm
is a very simple, in-place sorting algorithm.

Pseudocode
Algorithm Selection_Sort(L, n)
Input: A list L of n elements
Output: The list L in sorted order
begin
  for i ← n downto 2 do
  begin
    temp ← L[i]
    max ← 1
    for j ← 2 to i do
      if L[j] > L[max] then
        max ← j
    L[i] ← L[max]
    L[max] ← temp
  end
end


Analysis
The selection sort algorithm has the same runtime for any set of n elements, no matter
what the values or order of those elements are. Finding the maximum element of a list of
i elements requires i − 1 comparisons. Thus T(n), the number of comparisons required to
sort a list of n elements with the selection sort, can be found:

T(n) = Σ_{i=2}^{n} (i − 1) = Σ_{i=1}^{n−1} i = (n² − n)/2 = O(n²)

However, the number of data movements is the number of swaps required, which is n − 1. This
algorithm is very similar to the insertion sort algorithm. It requires fewer data movements,
but requires more comparisons.
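A direct transcription of the pseudocode into Python (using 0-based indexing, an adaptation of the entry's 1-based pseudocode) looks like:

```python
def selection_sort(lst):
    """In-place selection sort, following the pseudocode above
    (0-based indexing instead of the entry's 1-based indexing)."""
    n = len(lst)
    for i in range(n - 1, 0, -1):
        # find the index of the largest element in lst[0..i]
        max_idx = 0
        for j in range(1, i + 1):
            if lst[j] > lst[max_idx]:
                max_idx = j
        # swap it into position i
        lst[i], lst[max_idx] = lst[max_idx], lst[i]
    return lst

print(selection_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```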
Version: 1 Owner: Logan Author(s): Logan

1.83

semiring

A semiring is an algebra (A, ·, +, 0, 1) on a set A, where 0 and 1 are constants, (A, ·, 1) is
a monoid, (A, +, 0) is a commutative monoid, · distributes over + from the left and right,
and 0 is both a left and right annihilator (0 · a = a · 0 = 0). Often a · b is written simply as ab,
and the semiring (A, ·, +, 0, 1) as simply A.

The relation ≤ on a semiring A is defined as a ≤ b if and only if there exists some c ∈ A
such that a + c = b, and is a quasiordering. If + is idempotent over A (that is, a + a = a
holds for all a ∈ A), then ≤ is a partial ordering.

Addition and (left and right) multiplication are monotonic operators with respect to ≤, with
0 as the minimal element.
Version: 2 Owner: Logan Author(s): Logan


1.84

simple function

Let (X, B) be a measurable space. Let χ_{A_k}, k = 1, 2, …, n be the characteristic functions
of sets A_k ∈ B. We call h a simple function if it can be written as

h = Σ_{k=1}^{n} c_k χ_{A_k},   c_k ∈ R,   (1.84.1)

for some n ∈ N.
Version: 2 Owner: drummond Author(s): drummond

1.85

simple path

A simple path in a graph is a path that contains no vertex more than once. By definition,
cycles are particular instances of simple paths.
Version: 1 Owner: Logan Author(s): Logan

1.86

solutions of an equation

solutions of an equation

1: {x : f(x) = 0}

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: Thomas Heye Author(s): apmxi

1.87

spanning tree

A spanning tree of a (connected) graph G is a connected, acyclic subgraph of G that


contains all of the vertices of G. Below is an example of a spanning tree T , where the edges
in T are drawn as solid lines and the edges in G but not in T are drawn as dotted lines.


For any tree there is exactly one spanning tree: the tree itself.
Version: 2 Owner: Logan Author(s): Logan

1.88

square root

The square root of a non-negative real number x, written as √x, is the real number y such
that y² = x. Equivalently, √x · √x = x. Or, √x · √x = √(x · x) = x.

Example: √9 = 3 because 3² = 3 · 3 = 9.

Example: √(x² + 2x + 1) = x + 1 because (x + 1)² = (x + 1)(x + 1) = x² + x + x + 1 = x² + 2x + 1.

In some situations it is better to allow two values for √x. For example, √4 = ±2 because
2² = 4 and (−2)² = 4.

The square root operation is distributive for multiplication and division, but not for addition
and subtraction.

That is, √(x · y) = √x · √y, and √(x/y) = √x/√y.

However, in general, √(x + y) ≠ √x + √y and √(x − y) ≠ √x − √y.

Example: √(x²y²) = xy because (xy)² = xy · xy = x · x · y · y = x²y².

Example: √(9/25) = 3/5 because (3/5)² = 3²/5² = 9/25.

The square root notation is actually an alternative to exponentiation. That is, √x = x^{1/2}. As
such, the square root operation is associative with exponentiation. That is, √(x³) = x^{3/2} = (√x)³.

Negative real numbers do not have real square roots. For example, √−4 is not a real number.
Proof by contradiction: Suppose √−4 = x ∈ R. If x is negative, x² is positive. But if x is
positive, x² is also positive. But x cannot be zero either, because 0² = 0. So √−4 ∉ R.

For additional discussion of the square root and negative numbers, see the discussion of
complex numbers.
Version: 9 Owner: wberry Author(s): wberry

1.89

stable sorting algorithm

A stable sorting algorithm is any sorting algorithm that preserves the relative ordering of
items with equal values. For instance, consider a list of ordered pairs
L := {(A, 3), (B, 5), (C, 2), (D, 5), (E, 4)}.
If a stable sorting algorithm sorts L on the second value in each pair using the ≤ relation,
then the result is guaranteed to be {(C, 2), (A, 3), (E, 4), (B, 5), (D, 5)}. However, if an
algorithm is not stable, then it is possible that (D, 5) may come before (B, 5) in the sorted
output.
Some examples of stable sorting algorithms are bubblesort and mergesort (although the
stability of mergesort is dependent upon how it is implemented). Some examples of unstable
sorting algorithms are heapsort and quicksort (quicksort could be made stable, but then it
wouldn't be quick any more). Stability is a useful property when the total ordering relation
is dependent upon initial position. Using a stable sorting algorithm means that sorting by
ascending position for equal keys is built-in, and need not be implemented explicitly in the
comparison operator.
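Python's built-in sorted() is stable, which makes the guarantee easy to demonstrate on the list from the entry:

```python
# Demonstration of stability using Python's sorted(), which is stable.
L = [("A", 3), ("B", 5), ("C", 2), ("D", 5), ("E", 4)]
result = sorted(L, key=lambda pair: pair[1])  # sort on the second value only

# (B, 5) is guaranteed to precede (D, 5) because it did so in the input.
print(result)  # [('C', 2), ('A', 3), ('E', 4), ('B', 5), ('D', 5)]
```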
Version: 3 Owner: Logan Author(s): Logan

1.90

standard deviation

Given a random variable X, the standard deviation of X is defined as

SD[X] = √(Var[X]).

The standard deviation is a measure of the variation of X around the expected value.
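A minimal sketch using Python's statistics module (the data values are an arbitrary example), checking that the standard deviation is the square root of the variance:

```python
import statistics

# The standard deviation as the square root of the variance,
# here for a small population of sample values.
data = [2, 4, 4, 4, 5, 5, 7, 9]
var = statistics.pvariance(data)   # population variance
sd = statistics.pstdev(data)       # population standard deviation

assert abs(sd - var ** 0.5) < 1e-12
print(var, sd)
```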
Version: 1 Owner: Riemann Author(s): Riemann

1.91

stochastic independence

The random variables X_1, X_2, …, X_n are stochastically independent (or just independent)
if

f_{X_1,…,X_n}(x_1, …, x_n) = f_{X_1}(x_1) ⋯ f_{X_n}(x_n)  for all (x_1, …, x_n) ∈ R^n.

That is, the random variables X_1, …, X_n are independent if their joint distribution function can
be expressed as the product of the marginal distributions of the variables, evaluated at the
corresponding points.

This definition implies all the following:

1. F_{X_1,…,X_n}(x_1, …, x_n) = F_{X_1}(x_1) ⋯ F_{X_n}(x_n) for all (x_1, …, x_n) ∈ R^n (joint cumulative distribution)
2. M_{X_1+⋯+X_n}(t) = M_{X_1}(t) ⋯ M_{X_n}(t) for all t (moment generating function)
3. E[∏_{i=1}^n X_i] = ∏_{i=1}^n E[X_i] (expectation)

However, only the first two above imply independence. See also correlation.

There are other definitions of independence, too.
Version: 3 Owner: Riemann Author(s): Riemann

1.92

substring

Given a string s ∈ Σ*, a string t is a substring of s if s = utv for some strings u, v ∈ Σ*.

For example, lp, al, ha, alpha, and ε (the empty string) are all substrings of the string alpha.
Version: 2 Owner: Logan Author(s): Logan

1.93

successor

Given a set S, the successor of S is the set S ∪ {S}. One often denotes the successor of S
by S′.

Version: 1 Owner: djao Author(s): djao

1.94

sum rule

The sum rule states that

d/dx [f(x) + g(x)] = f′(x) + g′(x)

Proof
See the proof of the sum rule.

Examples

d/dx (x + 1) = d/dx x + d/dx 1 = 1

d/dx (x² − 3x + 2) = d/dx x² + d/dx (−3x) + d/dx 2 = 2x − 3

d/dx (sin x + cos x) = d/dx sin x + d/dx cos x = cos x − sin x
Version: 3 Owner: Logan Author(s): Logan

1.95

superset

Given two sets A and B, A is a superset of B if every element in B is also in A. We
denote this relation as A ⊇ B. This is equivalent to saying that B is a subset of A, that is
A ⊇ B ⇔ B ⊆ A.
Similar rules that hold for ⊆ also hold for ⊇. If X ⊇ Y and Y ⊇ X, then X = Y. Every set
is a superset of itself, and every set is a superset of the empty set.
A is a proper superset of B if A ⊇ B and A ≠ B. This relation is often denoted as A ⊃ B.
Unfortunately, A ⊃ B is often used to mean the more general superset relation, and thus it
should be made explicit when proper superset is intended.
Version: 2 Owner: Logan Author(s): Logan


1.96

symmetric polynomial

A polynomial f ∈ R[x_1, …, x_n] in n variables with coefficients in a ring R is symmetric if
σ(f) = f for every permutation σ of the set {x_1, …, x_n}.

Every symmetric polynomial can be written as a polynomial expression in the elementary symmetric polynomials.
Version: 2 Owner: djao Author(s): djao

1.97

the argument principle

the argument principle


1: f meromorphic in Ω
2: 0 ≤ i ≤ n : f(a_i) = 0
3: 0 ≤ i ≤ m : f(b_i) = ∞
4: γ a cycle
5: γ homologous to zero with respect to Ω
6: a_i ∉ im(γ), b_i ∉ im(γ) :

(1/(2πi)) ∫_γ f′(z)/f(z) dz = Σ_{j=0}^{n} ind_γ(a_j) − Σ_{k=0}^{m} ind_γ(b_k)

Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.98

torsion-free module

torsion-free module
1: R integral domain
2: X left module over R
3: Xt torsion submodule
4: Xt = 0

fact: a finitely generated torsion-free submodule is a free module


1: X finitely generated R-module
2: X torsion-free
3: X free
(to be fixed)
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 2 Owner: drini Author(s): drini, apmxi

1.99

total order

A total order is a special case of a partial order. If ≤ is a partial order on A, then it
satisfies the following three properties:
1. reflexivity: a ≤ a for all a ∈ A
2. antisymmetry: If a ≤ b and b ≤ a for any a, b ∈ A, then a = b
3. transitivity: If a ≤ b and b ≤ c for any a, b, c ∈ A, then a ≤ c
The relation ≤ is a total order if it satisfies the above three properties and the following
additional property:
4. Comparability: For any a, b ∈ A, either a ≤ b or b ≤ a.
Version: 2 Owner: Logan Author(s): Logan

1.100

tree traversals

A tree traversal is an algorithm for visiting all the nodes in a rooted tree exactly once.
The constraint is on rooted trees, because the root is taken to be the starting point of the
traversal. A traversal is also defined on a forest in the sense that each tree in the forest can
be iteratively traversed (provided one knows the roots of every tree beforehand). This entry
presents a few common and simple tree traversals.

In the description of a tree, the notion of rooted-subtrees was presented. Full understanding
of this notion is necessary to understand the traversals presented here, as each of these
traversals depends heavily upon this notion.
In a traversal, there is the notion of visiting a node. Visiting a node often consists of doing
some computation with that node. The traversals are defined here without any notion of
what is being done to visit a node, and simply indicate where the visit occurs (and most
importantly, in what order).
Examples of each traversal will be illustrated on the following binary tree.

Vertices will be numbered in the order they are visited, and edges will be drawn with arrows
indicating the path of the traversal.
Preorder Traversal
Given a rooted tree, a preorder traversal consists of first visiting the root, and then
executing a preorder traversal on each of the root's children (if any).
For example:

[figure omitted: the example tree with vertices numbered in preorder]

The term preorder refers to the fact that a node is visited before any of its descendants.
A preorder traversal is defined for any rooted tree. As pseudocode, the preorder traversal is
Algorithm PreorderTraversal(x, Visit)
Input: A node x of a binary tree, with children left(x) and right(x), and some computation
Visit defined for x
Output: Visits nodes of subtree rooted at x in a preorder traversal
begin
  Visit(x)
  PreorderTraversal(left(x), Visit)
  PreorderTraversal(right(x), Visit)
end

Postorder Traversal
Given a rooted tree, a postorder traversal consists of first executing a postorder traversal
on each of the root's children (if any), and then visiting the root.
For example:

[figure omitted: the example tree with vertices numbered in postorder]
b

As with the preorder traversal, the term postorder here refers to the fact that a node is
visited after all of its descendants. A postorder traversal is defined for any rooted tree. As
pseudocode, the postorder traversal is
Algorithm PostorderTraversal(x, Visit)
Input: A node x of a binary tree, with children left(x) and right(x), and some computation
Visit defined for x
Output: Visits nodes of subtree rooted at x in a postorder traversal
begin
  PostorderTraversal(left(x), Visit)
  PostorderTraversal(right(x), Visit)
  Visit(x)
end


In-order Traversal
Given a binary tree, an in-order traversal consists of executing an in-order traversal on
the root's left child (if present), then visiting the root, then executing an in-order traversal
on the root's right child (if present). Thus all of a root's left descendants are visited before
the root, and the root is visited before any of its right descendants.
For example:

[figure omitted: the example tree with vertices numbered in in-order]
b

As can be seen, the in-order traversal has the wonderful property of traversing a tree from
left to right (if the tree is visualized as it has been drawn here). The term in-order comes
from the fact that an in-order traversal of a binary search tree visits the data associated with
the nodes in sorted order. As pseudocode, the in-order traversal is
Algorithm InOrderTraversal(x, Visit)
Input: A node x of a binary tree, with children left(x) and right(x), and some computation
Visit defined for x
Output: Visits nodes of subtree rooted at x in an in-order traversal
begin
  InOrderTraversal(left(x), Visit)
  Visit(x)
  InOrderTraversal(right(x), Visit)
end
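The three traversals above can be sketched in Python as follows (the tuple-based node layout is an assumption for illustration, not from the entry):

```python
# Minimal binary-tree traversals; a node is (value, left, right) or None.
def preorder(node, visit):
    if node is None:
        return
    value, left, right = node
    visit(value)                # visit before the descendants
    preorder(left, visit)
    preorder(right, visit)

def postorder(node, visit):
    if node is None:
        return
    value, left, right = node
    postorder(left, visit)
    postorder(right, visit)
    visit(value)                # visit after the descendants

def inorder(node, visit):
    if node is None:
        return
    value, left, right = node
    inorder(left, visit)
    visit(value)                # left subtree, then root, then right subtree
    inorder(right, visit)

# A small binary search tree:  2
#                             / \
#                            1   3
tree = (2, (1, None, None), (3, None, None))
for traverse in (preorder, inorder, postorder):
    out = []
    traverse(tree, out.append)
    print(traverse.__name__, out)
```

Note that on a binary search tree the in-order traversal produces the values in sorted order, as the entry observes.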
Version: 3 Owner: Logan Author(s): Logan

1.101

trie

A trie is a digital tree for storing a set of strings in which there is one node for every prefix
of every string in the set. The name comes from the word retrieval, and thus is pronounced
the same as tree (which leads to much confusion when spoken aloud). The word retrieval is
stressed, because a trie has a lookup time that is equivalent to the length of the string being
looked up.
If a trie is to store some set of strings S ⊆ Σ* (where Σ is an alphabet), then it takes the
following form. Each edge leading to non-leaf nodes in the trie is labelled by an element
of Σ. Any edge leading to a leaf node is labelled by $ (some symbol not in Σ). For every
string s ∈ S, there is a path from the root of the trie to a leaf, the labels of which when
concatenated form s ++ $ (where ++ is the string concatenation operator). For every path
from the root of the trie to a leaf, the labels of the edges concatenated form some string in
S.
Example
Suppose we wish to store the set of strings S := {alpha, beta, bear, beast, beat}. The trie that
stores S would be:

[figure omitted: the trie storing S]
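A minimal dict-based sketch of such a trie (this representation is an assumption; the entry's figure labels edges rather than nested dictionaries):

```python
# A minimal dict-based trie; "$" marks end-of-string, as in the entry.
END = "$"

def trie_insert(trie, s):
    node = trie
    for ch in s:
        node = node.setdefault(ch, {})
    node[END] = {}

def trie_contains(trie, s):
    node = trie
    for ch in s:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

trie = {}
for word in ["alpha", "beta", "bear", "beast", "beat"]:
    trie_insert(trie, word)

print(trie_contains(trie, "bear"))   # True
print(trie_contains(trie, "be"))     # False (a prefix, not a stored string)
```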
Version: 4 Owner: Logan Author(s): Logan

1.102

unit vector

A unit vector is a vector with a length, or vector norm, of one. In R^n, one can obtain such a
vector by dividing a vector ~v by its magnitude |~v|. For example, take the vector <1, 2, 3>.
A unit vector pointing in this direction would be

(1/|<1, 2, 3>|) <1, 2, 3> = (1/√14) <1, 2, 3> = <1/√14, 2/√14, 3/√14>.

The magnitude of this vector is 1.


Version: 7 Owner: slider142 Author(s): slider142

1.103

unstable fixed point

A fixed point is considered unstable if it is neither attracting nor Liapunov stable. A saddle
point is an example of such a fixed point.
Version: 1 Owner: armbrusterb Author(s): armbrusterb

1.104

weak* convergence in normed linear space

weak* convergence in normed linear space


1: (x′_n) ⊆ X′
2: X a Banach space
3: x′ ∈ X′ such that ∀x ∈ X : lim_{n→∞} x′_n(x) = x′(x).
4: If X is reflexive, then weak-* convergence is the same as weak convergence
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 1 Owner: bwebste Author(s): apmxi

1.105

well-ordering principle for natural numbers

Every nonempty set S of nonnegative integers contains a least element; that is, there is some
integer a in S such that a ≤ b for all b belonging to S.
For example, the positive integers are a well-ordered set under the standard order.
Version: 5 Owner: KimJ Author(s): KimJ


Chapter 2
00-01 Instructional exposition
(textbooks, tutorial papers, etc.)
2.1

dimension

The word dimension in mathematics has many definitions, but all of them are trying to
quantify our intuition that, for example, a sheet of paper has somehow one less dimension
than a stack of papers.
One common way to define dimension is through some notion of a number of independent
quantities needed to describe an element of an object. For example, it is natural to say
that the sheet of paper is two-dimensional because one needs two real numbers to specify
a position on the sheet, whereas the stack of papers is three-dimensional because a position
in a stack is specified by a sheet and a position on the sheet. Following this notion, in
linear algebra the dimension of a vector space is defined as the minimal number of vectors
such that every other vector in the vector space is representable as a sum of these. Similarly,
the word rank denotes various dimension-like invariants that appear throughout the algebra.
However, if we try to generalize this notion to the mathematical objects that do not possess
an algebraic structure, then we run into a difficulty. From the point of view of set theory
there are as many real numbers as pairs of real numbers since there is a bijection from real
numbers to pairs of real numbers. To distinguish a plane from a cube one needs to impose
restrictions on the kind of mapping. Surprisingly, it turns out that continuity is not
enough as was pointed out by Peano. There are continuous functions that map a square
onto a cube. So, in topology one uses another intuitive notion that in a high-dimensional
space there are more directions than in a low-dimensional. Hence, the (Lebesgue covering)
dimension of a topological space is defined as the smallest number d such that every covering
of the space by open sets can be refined so that no point is contained in more than d + 1 sets.
For example, no matter how one covers a sheet of paper by sufficiently small other sheets
of paper (where the sheets may overlap each other, but not merely touch), one will
always find a point that is covered by 2 + 1 = 3 sheets.


Another definition of dimension rests on the idea that higher-dimensional objects are in some
sense larger than the lower-dimensional ones. For example, to cover a cube with a side length
2 one needs at least 2³ = 8 cubes with a side length 1, but a square with a side length 2 can
be covered by only 2² = 4 unit squares. Let N(ε) be the minimal number of open balls in
any covering of a bounded set S by balls of radius ε. The Besicovitch-Hausdorff dimension
of S is defined as lim_{ε→0} −log N(ε)/log ε. The Besicovitch-Hausdorff dimension is not always
defined, and when defined it might be non-integral.
Version: 4 Owner: bbukh Author(s): bbukh

2.2

toy theorem

A toy theorem is a simplified version of a more general theorem. For instance, by introducing some simplifying assumptions in a theorem, one obtains a toy theorem.
Usually, a toy theorem is used to illustrate the claim of a theorem. It can also be illustrative
and insightful to study proofs of a toy theorem derived from a non-trivial theorem. Toy
theorems also have great educational value. After presenting a theorem (with, say, a highly
non-trivial proof), one can sometimes give some assurance that the theorem really holds, by
proving a toy version of the theorem.
For instance, a toy theorem of the Brouwer fixed point theorem is obtained by restricting the
dimension to one. In this case, the Brouwer fixed point theorem follows almost immediately
from the intermediate value theorem (see this page).
Version: 1 Owner: matte Author(s): matte


Chapter 3
00-XX General
3.1

method of exhaustion

The method of exhaustion is calculating an area by approximating it by the areas of a


sequence of polygons.
For example, filling up the interior of a circle by inscribing polygons with more and more
sides.
Version: 1 Owner: vladm Author(s): vladm


Chapter 4
00A05 General mathematics
4.1

Conways chained arrow notation

Conway's chained arrow notation is a way of writing numbers even larger than those
provided by the up arrow notation. We define m→n→p = m ↑⋯↑ n (with p up arrows; that
is, m^{(p+2)} n in the alternate up-arrow notation) and m→n = m→n→1 = m^n. Longer
chains are evaluated by

m→n→p→1 = m→n→p
m→n→1→q = m→n

and

m→n→(p+1)→(q+1) = m→n→(m→n→p→(q+1))→q
For example:

3→3→2 =
3→(3→2→2)→1 =
3→(3→2→2) =
3→(3→(3→1→2)→1) =
3→(3→3→1) =
3→(3→3) =
3→3³ = 3^27 = 7625597484987

A much larger example is:

3→2→4→4 =
3→2→(3→2→3→4)→3 =
3→2→(3→2→(3→2→2→4)→3)→3 =
3→2→(3→2→(3→2→(3→2→1→4)→3)→3)→3 =
3→2→(3→2→(3→2→(3→2)→3)→3)→3 =
3→2→(3→2→(3→2→9→3)→3)→3

Clearly this is going to be a very large number. Note that, as large as it is, it is proceeding
towards an eventual final evaluation, as evidenced by the fact that the final number in the
chain is getting smaller.
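For chains small enough to evaluate, the rules above can be transcribed directly into code (conway is an illustrative name; the code generalizes the entry's rules, stated for 4-chains, to chains of any length):

```python
def conway(chain):
    """Evaluate a Conway chain given as a list of positive integers (a sketch)."""
    c = list(chain)
    if len(c) == 1:
        return c[0]
    if len(c) == 2:
        return c[0] ** c[1]
    if c[-1] == 1:            # X -> p -> 1 = X -> p
        return conway(c[:-1])
    if c[-2] == 1:            # X -> 1 -> q = X
        return conway(c[:-2])
    # X -> (p+1) -> (q+1) = X -> (X -> p -> (q+1)) -> q
    x, p, q = c[:-2], c[-2], c[-1]
    return conway(x + [conway(x + [p - 1, q]), q - 1])

print(conway([3, 3]))      # 27
print(conway([3, 2, 2]))   # 27
print(conway([3, 3, 2]))   # 7625597484987, i.e. 3^27
```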
Version: 4 Owner: Henry Author(s): Henry

4.2

Knuth's up arrow notation

Knuth's up arrow notation is a way of writing numbers which would be unwieldy in
standard decimal notation. It expands on the exponential notation m↑n = m^n. Define
m↑↑0 = 1 and m↑↑n = m↑(m↑↑[n−1]).

Obviously m↑1 = m¹ = m, so 3↑↑2 = 3↑(3↑↑1) = 3↑3 = 27, but 2↑↑3 = 2↑(2↑↑2) =
2↑(2↑2) = 2^(2²) = 16.

In general, m↑↑n = m^{m^{⋯^m}}, a tower of height n.

Clearly, this process can be extended: m↑↑↑0 = 1 and m↑↑↑n = m↑↑(m↑↑↑[n−1]).

An alternate notation is to write m^{(i)} n for m ↑⋯↑ n (with i − 2 arrows, because then
m^{(2)} n = mn and m^{(1)} n = m + n). Then in general we can define
m^{(i)} n = m^{(i−1)} (m^{(i)} (n − 1)).

To get a sense of how quickly these numbers grow, 3↑↑↑2 = 3↑↑3 is more than seven and
a half trillion, and the numbers continue to grow much more than exponentially.
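The recursive definition transcribes directly into code for small arguments (up is an illustrative name):

```python
def up(m, n, arrows):
    """m followed by the given number of Knuth up arrows, then n (a sketch)."""
    if arrows == 1:
        return m ** n           # m ↑ n = m^n
    if n == 0:
        return 1                # m ↑↑...↑ 0 = 1
    # m ↑^k n = m ↑^(k-1) (m ↑^k (n-1))
    return up(m, up(m, n - 1, arrows), arrows - 1)

print(up(3, 2, 2))   # 3↑↑2 = 27
print(up(2, 3, 2))   # 2↑↑3 = 16
print(up(3, 3, 2))   # 3↑↑3 = 7625597484987
```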
Version: 3 Owner: Henry Author(s): Henry

4.3

arithmetic progression

Arithmetic progression of length n, initial term a_1 and common difference d is the sequence

a_1, a_1 + d, a_1 + 2d, …, a_1 + (n − 1)d.

The sum of terms of an arithmetic progression can be computed using Gauss's trick:

 S = a_1 + (a_1 + d) + ⋯ + (a_1 + (n − 1)d)
+S = (a_1 + (n − 1)d) + (a_1 + (n − 2)d) + ⋯ + a_1

2S = (2a_1 + (n − 1)d) + (2a_1 + (n − 1)d) + ⋯ + (2a_1 + (n − 1)d)

We just add the sum with itself written backwards, and the sum of each of the columns equals
(2a_1 + (n − 1)d). The sum is then

S = (2a_1 + (n − 1)d)n / 2.

Version: 3 Owner: bbukh Author(s): bbukh

4.4

arity

The arity of something is the number of arguments it takes. This is usually applied to
functions: an n-ary function is one that takes n arguments. Unary is a synonym for 1-ary,
and binary for 2-ary.
Version: 1 Owner: Henry Author(s): Henry

4.5

introducing 0th power

Let a be a number. Then for all n ∈ N, a^n is the product of n a's. For integers (and their
extensions) we have a multiplicative identity called 1, i.e. a · 1 = a for all a. So we can
write

a^n = a^{n+0} = a^n · 1.

From the definition of the power of a the usual laws can be derived; so it is plausible to set
a⁰ = 1, since 0 doesn't change a sum, just as 1 doesn't change the product.
Version: 4 Owner: Thomas Heye Author(s): Thomas Heye

4.6

lemma

There is no technical distinction between a lemma and a theorem. A lemma is a proven
statement, typically named a lemma to distinguish it as a truth used as a stepping stone to
a larger result rather than an important statement in and of itself. Of course, some of the
most powerful statements in mathematics are known as lemmas, including Zorn's lemma,
Bezout's lemma, Gauss' lemma, Fatou's lemma, etc., so one clearly can't get too much simply by reading into a proposition's name.
According to [1], the plural Lemmas is commonly used. The correct plural of lemma,
however, is lemmata.

REFERENCES
1. N. Higham, Handbook of Writing for the Mathematical Sciences, Society for Industrial and Applied Mathematics, 1998. (p. 16)

Version: 5 Owner: mathcam Author(s): mathcam

4.7

property

Given each element of a set X, a property is either true or false. Formally, a property is a function P : X → {true, false}. Any property gives rise in a natural way to the set {x : x has the property P} and the corresponding characteristic function.
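The correspondence between a property, the set it carves out, and its characteristic function can be made concrete; a small Python sketch with an illustrative property ("is even" on X = {0, …, 9}; all names are ours):

```python
X = set(range(10))

def P(x):
    # a property on X: each element is either true or false under P
    return x % 2 == 0

# the set {x : x has the property P}
subset = {x for x in X if P(x)}

def chi(x):
    # the corresponding characteristic function, X -> {1, 0}
    return 1 if P(x) else 0
```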
Version: 3 Owner: fibonaci Author(s): bbukh, fibonaci, apmxi

4.8

saddle point approximation

The saddle point approximation (SPA), a.k.a. stationary phase approximation, is a widely used method in quantum field theory (QFT) and related fields. Suppose we want to evaluate the following integral in the limit λ → ∞:

    I = lim_{λ→∞} ∫ dx e^{−λ f(x)}.    (4.8.1)

The saddle point approximation can be applied if the function f(x) satisfies certain conditions. Assume that f(x) has a global minimum f(x_0) = y_min at x = x_0, which is sufficiently separated from the other local minima and whose value is sufficiently smaller than the value of those. Consider the Taylor expansion of f(x) about the point x_0:

    f(x) = f(x_0) + ∂_x f(x)|_{x=x_0} (x − x_0) + (1/2) ∂_x^2 f(x)|_{x=x_0} (x − x_0)^2 + O(x^3).    (4.8.2)

Since f(x_0) is a (global) minimum, it is clear that f′(x_0) = 0. Therefore f(x) may be approximated to quadratic order as

    f(x) ≈ f(x_0) + (1/2) f″(x_0) (x − x_0)^2.    (4.8.3)

The above assumptions on the minima of f(x) ensure that the dominant contribution to (4.8.1) in the limit λ → ∞ will come from the region of integration around x_0:

    I ≈ lim_{λ→∞} e^{−λ f(x_0)} ∫ dx e^{−(λ/2) f″(x_0)(x − x_0)^2}
      = lim_{λ→∞} e^{−λ f(x_0)} (2π / (λ f″(x_0)))^{1/2}.    (4.8.4)

In the last step we have performed the Gaussian integral. The next nonvanishing higher-order correction to (4.8.4) stems from the quartic term of the expansion (4.8.2). This correction may be incorporated into (4.8.4) to yield (after expanding part of the exponential):

    I ≈ lim_{λ→∞} e^{−λ f(x_0)} ∫ dx e^{−(λ/2) f″(x_0)(x − x_0)^2} [1 − (λ/4!) ∂_x^4 f(x)|_{x=x_0} (x − x_0)^4].    (4.8.5)
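As a numerical sanity check of the leading-order formula (4.8.4): for a sample f with a single global minimum, the estimate should approach a direct numerical evaluation of the integral as λ grows. A sketch in Python (the choice of f, the value of λ, and all helper names are our own illustrations; the integral is truncated to a finite range where the integrand is negligible):

```python
import math

def spa(f_x0, fpp_x0, lam):
    # leading-order estimate (4.8.4): e^(-lam*f(x0)) * sqrt(2*pi / (lam*f''(x0)))
    return math.exp(-lam * f_x0) * math.sqrt(2 * math.pi / (lam * fpp_x0))

def integral(f, lam, a=-3.0, b=3.0, n=60001):
    # simple trapezoid rule for I = integral of e^(-lam*f(x)) dx
    h = (b - a) / (n - 1)
    s = 0.0
    for i in range(n):
        w = 0.5 if i in (0, n - 1) else 1.0
        s += w * math.exp(-lam * f(a + i * h))
    return s * h

f = lambda x: x**2 + x**4   # global minimum at x0 = 0, f(0) = 0, f''(0) = 2
lam = 50.0
estimate = spa(0.0, 2.0, lam)
exact = integral(f, lam)
```

For this f the relative error is of order 1/λ, consistent with the quartic correction in (4.8.5).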
...to be continued with applications to physics...
Version: 2 Owner: msihl Author(s): msihl

4.9

singleton

A set consisting of a single element is usually referred to as a singleton.


Version: 2 Owner: Koro Author(s): Koro

4.10

subsequence

If X is a set and (a_n)_{n∈N} is a sequence in X, then a subsequence of (a_n) is a sequence of the form (a_{n_r})_{r∈N}, where (n_r)_{r∈N} is a strictly increasing sequence of natural numbers.
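Concretely, picking out a subsequence amounts to sampling the sequence at a strictly increasing list of indices; a small Python sketch (all names are ours):

```python
a = [2 ** k for k in range(10)]   # the sequence (a_n): 1, 2, 4, 8, ...
n_r = [1, 3, 6]                   # a strictly increasing sequence of indices
sub = [a[i] for i in n_r]         # the subsequence (a_{n_r})
```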
Version: 2 Owner: Evandar Author(s): Evandar

4.11

surreal number

The surreal numbers are a generalization of the reals. Each surreal number consists of two parts (called the left and right), each of which is a set of surreal numbers. For any surreal number N, these parts can be called N_L and N_R. (This could be viewed as an ordered pair of sets; however, the surreal numbers were intended to be a basis for mathematics, not something to be embedded in set theory.) A surreal number is written N = ⟨N_L | N_R⟩.
Not every number of this form is a surreal number. The surreal numbers satisfy two additional properties. First, if x ∈ N_R and y ∈ N_L then x ≰ y. Secondly, they must be well-founded. These properties are both satisfied by the following construction of the surreal numbers and the ⩽ relation by mutual induction:

⟨|⟩, which has both left and right parts empty, is 0.

Given two (possibly empty) sets of surreal numbers R and L such that for any x ∈ R and y ∈ L, x ≰ y, then ⟨L | R⟩ is a surreal number.

Define N ⩽ M if there is no x ∈ N_L such that M ⩽ x and no y ∈ M_R such that y ⩽ N.
This process can be continued transfinitely, to define infinite and infinitesimal numbers. For instance if Z is the set of integers then ω = ⟨Z |⟩. Note that this does not make equality the same as identity: ⟨−1 | 1⟩ = ⟨|⟩, for instance.
It can be shown that N is sandwiched between the elements of N_L and N_R: it is larger than any element of N_L and smaller than any element of N_R.
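The mutual-induction definition of ⩽ translates almost verbatim into code; a minimal Python sketch, representing a surreal number as a pair (left set, right set) of previously built numbers (all names are our own, and only finitely generated numbers are handled):

```python
def leq(n, m):
    # N <= M iff there is no x in N_L with M <= x, and no y in M_R with y <= N
    n_left, _ = n
    _, m_right = m
    return (all(not leq(m, x) for x in n_left)
            and all(not leq(y, n) for y in m_right))

def eq(n, m):
    # equality of surreal numbers is mutual <=, not identity of representations
    return leq(n, m) and leq(m, n)

zero = ((), ())              # <|>
one = ((zero,), ())          # <0|>
neg_one = ((), (zero,))      # <|0>
half = ((zero,), (one,))     # <0|1>
```

With these definitions, ⟨−1 | 1⟩ compares equal to ⟨|⟩ even though the pairs differ, illustrating that equality is not identity.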
Addition of surreal numbers is defined by

    N + M = ⟨{N + x | x ∈ M_L} ∪ {M + y | y ∈ N_L} | {N + x | x ∈ M_R} ∪ {M + y | y ∈ N_R}⟩.

It follows that −N = ⟨−N_R | −N_L⟩.


The definition of multiplication can be written more easily by defining M N_L = {M x | x ∈ N_L}, and similarly for N_R. Then

    N M = ⟨M N_L + N M_L − N_L M_L, M N_R + N M_R − N_R M_R |
          M N_L + N M_R − N_L M_R, M N_R + N M_L − N_R M_L⟩.
The surreal numbers satisfy the axioms for a field under addition and multiplication (whether they really are a field is complicated by the fact that they are too large to form a set).
The integers of surreal mathematics are called the omnific integers. In general positive integers n can always be written ⟨n − 1 |⟩, and so −n = ⟨| −(n − 1)⟩ = ⟨| 1 − n⟩. So for instance 1 = ⟨0 |⟩.
In general, ⟨a | b⟩ is the simplest number between a and b. This can be easily used to define the dyadic fractions: for any integer a, a + 1/2 = ⟨a | a + 1⟩. Then 1/2 = ⟨0 | 1⟩, 1/4 = ⟨0 | 1/2⟩, and so on. This can then be used to locate non-dyadic fractions by pinning them between a left part which gets infinitely close from below and a right part which gets infinitely close from above.

Ordinal arithmetic can be defined starting with ω as defined above and adding numbers such as ⟨ω |⟩ = ω + 1 and so on. Similarly, a starting infinitesimal can be found as ⟨0 | 1, 1/2, 1/4, …⟩ = 1/ω, and again more can be developed from there.
Version: 5 Owner: Henry Author(s): Henry


Chapter 5
00A07 Problem books
5.1

Nesbitt's inequality

Nesbitt's inequality says that for positive real a, b and c we have:

    a/(b+c) + b/(a+c) + c/(a+b) ≥ 3/2.
Version: 2 Owner: mathwizard Author(s): mathwizard

5.2

proof of Nesbitt's inequality

Starting from Nesbitt's inequality

    a/(b+c) + b/(a+c) + c/(a+b) ≥ 3/2

we transform the left hand side:

    (a+b+c)/(b+c) + (a+b+c)/(a+c) + (a+b+c)/(a+b) − 3 ≥ 3/2.

Now this can be transformed into:

    ((a+b) + (a+c) + (b+c)) · (1/(a+b) + 1/(a+c) + 1/(b+c)) ≥ 9.

Division by 3 and the right factor yields:

    ((a+b) + (a+c) + (b+c))/3 ≥ 3 / (1/(a+b) + 1/(a+c) + 1/(b+c)).

Now on the left we have the arithmetic mean and on the right the harmonic mean of the numbers a+b, a+c and b+c, so this inequality is true.
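The inequality (and the equality case a = b = c) is easy to spot-check numerically; a small Python sketch (the helper name is ours):

```python
def nesbitt_lhs(a, b, c):
    # the left hand side a/(b+c) + b/(a+c) + c/(a+b)
    return a / (b + c) + b / (a + c) + c / (a + b)
```

For a = b = c the left hand side is exactly 3/2, the equality case.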
Version: 2 Owner: mathwizard Author(s): mathwizard


Chapter 6
00A20 Dictionaries and other
general reference works
6.1

completing the square

Let us consider the expression x^2 + xy, where x and y are real (or complex) numbers. Using the formula

    (x + y)^2 = x^2 + 2xy + y^2

we can write

    x^2 + xy = x^2 + xy + y^2/4 − y^2/4
             = (x + y/2)^2 − y^2/4.

This manipulation is called completing the square [3] in x^2 + xy. Replacing y by −y, we also have

    x^2 − xy = (x − y/2)^2 − y^2/4.
Here are some applications of this method:
- Derivation of the solution formula to the quadratic equation.

- Completing the square can also be used to find the extremal value of a quadratic polynomial [2] without calculus. Let us illustrate this for the polynomial p(x) = 4x^2 + 8x + 9. Completing the square yields

      p(x) = (2x + 2)^2 − 4 + 9
           = (2x + 2)^2 + 5
           ≥ 5,

  since (2x + 2)^2 ≥ 0. Here, equality holds if and only if x = −1. Thus p(x) ≥ 5 for all x ∈ R, and p(x) = 5 if and only if x = −1. It follows that p(x) has a global minimum at x = −1, where p(−1) = 5.

- Completing the square can also be used as an integration technique to integrate, say, 1/(4x^2 + 8x + 9) [3].
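The manipulation for p(x) = 4x^2 + 8x + 9 generalizes to any Ax^2 + Bx + C with A = a^2; a small Python sketch checking the rewritten form and the minimum (all names are our own):

```python
# rewrite p(x) = A x^2 + B x + C (with A = a^2) as (a x + h)^2 + k
A, B, C = 4.0, 8.0, 9.0
a = A ** 0.5            # 2
h = B / (2 * a)         # 2, so p(x) = (2x + 2)^2 + ...
k = C - h * h           # 5, the global minimum value, attained at x = -h/a = -1

def p(x):
    return A * x * x + B * x + C
```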

REFERENCES
1. R. Adams, Calculus: A Complete Course, Addison-Wesley Publishers Ltd., 3rd ed.
2. J. Thompson, T. Martinsson, Matematik Lexikon (in Swedish), Wahlström & Widstrand, 1991.

(Does anyone have an English reference?)


Version: 7 Owner: mathcam Author(s): matte


Chapter 7
00A99 Miscellaneous topics
7.1

QED

The term QED is actually an abbreviation and stands for the Latin quod erat demonstrandum, meaning "which was to be demonstrated."
QED typically is used to signify the end of a mathematical proof. The symbol ∎ is often used in place of QED, and is called the Halmos symbol after mathematician Paul Halmos (it can vary in width, however, and sometimes it is fully or partially shaded). Halmos borrowed this symbol from magazines, where it was used to denote "end of article".
Version: 3 Owner: akrowne Author(s): akrowne

7.2

TFAE

The abbreviation TFAE is shorthand for the following are equivalent. It is used before
a set of equivalent conditions (each implies all the others).
In a definition, when one of the conditions is somehow better (simpler, shorter, ...), it
makes sense to phrase the definition with that condition, and mention that the others are
equivalent. TFAE is typically used when none of the conditions can take priority over the
others. Actually proving the claimed equivalence must, of course, be done separately.
Version: 1 Owner: ariels Author(s): ariels


7.3

WLOG

WLOG (or WOLOG) is an acronym which stands for without loss of generality.
WLOG is invoked in situations where some property of a model or system is invariant
under the particular choice of instance attributes, but for the sake of demonstration, these
attributes must be fixed.
For example, we might be discussing properties of a segment (open or closed) of the real number
line. Due to the nature of the reals, we can select endpoints a and b without loss of generality. Nothing about our discussion of this segment depends on the choice of a or b. Of
course, any segment does actually have specific endpoints, so it may help to actually select
some (say 0 and 1) for clarity.
WLOG can also be invoked to shorten proofs where there are a number of choices of configuration, but the proof is the same for each of them. We need only walk through the proof for one of these configurations, and WLOG serves as a note that we haven't lost anything in the choosing.
Version: 2 Owner: akrowne Author(s): akrowne

7.4

order of operations

The order of operations is a convention that tells us how to evaluate mathematical expressions (these could be purely numerical). The problem arises because expressions consist of
operators applied to variables or values (or other expressions) that each demand individual
evaluation, yet the order in which these individual evaluations are done leads to different
outcomes.
A conventional order of operations solves this. One could technically do without memorizing
this convention, but the only alternative is to use parentheses to group every single term of
an expression and evaluate the innermost operations first.
For example, in the expression a · b + c, how do we know whether to apply multiplication or addition first? We could interpret even this simple expression in two drastically different ways:
1. Add b and c,
2. Multiply the sum from (1) with a.
or
1. Multiply a and b,

2. Add to the product in (1) the value of c.


One can see the different outcomes for the two cases by selecting some different values for a,
b, and c. The issue is resolved by convention in order of operations: the correct evaluation
would be the second one.
The nearly universal mathematical convention dictates the following order of operations (in
order of which operators should be evaluated first):
1. Factorial.
2. Exponentiation.
3. Multiplication.
4. Division.
5. Addition.
Any parenthesized expressions are automatically higher priority than anything on the
above list.
There is also the problem of what order to evaluate repeated operators of the same type, as
in:
a/b/c/d
The solution in this problem is typically to assume the left-to-right interpretation. For the
above, this would lead to the following evaluation:
(((a/b)/c)/d)
In other words,
1. Evaluate a/b.
2. Evaluate (1)/c.
3. Evaluate (2)/d.
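The left-to-right rule for repeated division, and the precedence of multiplication over addition, can be checked directly in a language that follows these conventions; a small Python sketch (the sample values are our own):

```python
a, b, c, d = 100.0, 5.0, 2.0, 2.0
chained = a / b / c / d           # evaluated left-to-right
explicit = ((a / b) / c) / d      # the fully parenthesized form
mixed = 2 + 3 * 4                 # multiplication before addition
power = 2 ** 3 ** 2               # note: Python's ** is right-associative, i.e. 2 ** (3 ** 2)
```

The last line is a reminder that exponentiation is one operator for which many languages (and much mathematical notation) use right-to-left grouping rather than left-to-right.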
Note that this isn't a problem for associative operators such as multiplication or addition in the reals. One must still proceed with caution, however, as associativity is a notion bound up with the concept of groups rather than just operators. Hence, context is extremely important.
For more obscure operations than the ones listed above, parentheses should be used to remove ambiguity. Completely new operations are typically assumed to have the highest priority, but the definition of the operation should be accompanied by some sort of explanation of how it is evaluated in relation to itself. For example, Conway's chained arrow notation explicitly defines what order repeated applications of itself should be evaluated in (it is right-to-left rather than left-to-right)!
Version: 2 Owner: akrowne Author(s): akrowne


Chapter 8
01A20 Greek, Roman
8.1

Roman numerals

Roman numerals are a method of writing numbers employed primarily by the ancient Romans. In place of digits, the Romans used letters to represent the numbers central to the system:

    I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000

Larger numbers can be made by writing a bar over the letter, which means one thousand times as much. For instance, V with a bar over it is 5000.
Other numbers were written by putting letters together. For instance II means 2. Larger
letters go on the left, so LII is 52, but IIL is not a valid Roman numeral.
One additional rule allows a letter to the left of a larger letter to signify subtracting the
smaller from the larger. For instance IV is 4. This can only be done once; 3 is written III,
not IIV . Also, it is generally required that the smaller letter be the one immediately smaller
than the larger, so 1999 is usually written MCMXCIX, not MIM.
It is worth noting that today it is usually considered incorrect to repeat a letter four times, so IV is preferred to IIII. However, many older monuments do not use the subtraction rule at all, so 44 was written XXXXIIII instead of the now preferable XLIV.
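The letter values and the subtraction rule can be sketched as a short conversion routine; a hypothetical Python helper (not part of the entry):

```python
def to_roman(n):
    # standard subtractive notation (IV preferred to IIII);
    # the smaller letter subtracted is always from the pair below the larger
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, letters in pairs:
        while n >= value:
            out.append(letters)
            n -= value
    return "".join(out)
```

This reproduces the examples in the entry: 52 becomes LII and 1999 becomes MCMXCIX (not MIM).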
Version: 3 Owner: Henry Author(s): Henry

Chapter 9
01A55 19th century
9.1

Poincaré, Jules Henri

Jules Henri Poincaré was born on April 29th 1854 in Cité Ducale[BA], a neighborhood in Nancy, a city in France. He was the son of Dr. Léon Poincaré (1828-1892), who was a professor at the University of Nancy in the faculty of medicine.[14] His mother, Eugénie Launois (1830-1897), was described as a gifted mother[6] who gave special instruction to her son. She was 24 and his father 26 years of age when Henri was born[9]. Two years after the birth of Henri, his sister Aline was born.[6]
In 1862 Henri entered the Lycée of Nancy, which is today called, in his honor, the Lycée Henri Poincaré. In fact the University of Nancy is also named in his honor. He graduated from the Lycée in 1871 with a bachelor's degree in letters and sciences. Henri was at the top of his class in almost all subjects; he did not have much success in music and was described as "average at best" in any physical activities.[9] This could be blamed on his poor eyesight and absentmindedness.[4] Later, in 1873, Poincaré entered l'École Polytechnique, where he performed better in mathematics than all the other students. He published his first paper at 20 years of age, titled "Démonstration nouvelle des propriétés de l'indicatrice d'une surface".[3] He graduated from the institution in 1876. The same year he decided to attend l'École des Mines and graduated in 1879 with a degree in mining engineering.[14]
After his graduation he was appointed as an ordinary engineer in charge of the mining services in Vesoul. At the same time he was preparing for his doctorate in sciences (not surprisingly) in mathematics, under the supervision of Charles Hermite. Some of Charles Hermite's most famous contributions to mathematics are: Hermite polynomials, Hermite's differential equation, Hermite's formula of interpolation and Hermitian matrices.[9] Poincaré, as expected, graduated from the University of Paris in 1879, with a thesis relating to differential equations.

[Portrait: Jules Henri Poincaré (1854 - 1912)]

He then became a teacher at the University of Caen, where he taught
analysis. He remained there until 1881. He then was appointed as the maître de conférences d'analyse[14] (professor in charge of analysis conferences) at the University of Paris. Also in that same year he married Miss Poulain d'Andecy. Together they had four children: Jeanne born in 1887, Yvonne born in 1889, Henriette born in 1891, and finally Léon born in 1893.
He had now returned to work at the Ministry of Public Services as an engineer. He was responsible for the development of the northern railway. He held that position from 1881 to 1885. This was the last job he held in administration for the government of France. In 1893 he was awarded the title of head engineer in charge of the mines. After that his career awards and positions continuously escalated in greatness and quantity. He died two years before the war, on July 17th 1912, of an embolism at the age of 58. Interestingly, at the beginning of World War I, his cousin Raymond Poincaré was the president of the French Republic.
Poincaré's work habits have been compared to a bee flying from flower to flower. Poincaré was interested in the way his mind worked; he studied his habits. He gave a talk about his observations in 1908 at the Institute of General Psychology in Paris. He linked his way of thinking to how he made several discoveries. His mental organization was not only interesting to him but also to Toulouse, a psychologist of the Psychology Laboratory of the School of Higher Studies in Paris. Toulouse wrote a book called Henri Poincaré, which was published in 1910. He discussed Poincaré's regular schedule: he worked during the same times each day in short periods of time. He never spent a long time on a problem, since he believed that the subconscious would continue working on the problem while he worked on another problem. Toulouse also noted that Poincaré had an exceptional memory. In addition he stated that most mathematicians worked from principles already established, while Poincaré was the type that started from basic principles each time.[9] His method of thinking is well summarized as:
Habitué à négliger les détails et à ne regarder que les cimes, il passait de l'une à l'autre avec une promptitude surprenante et les faits qu'il découvrait, se groupant d'eux-mêmes autour de leur centre, étaient instantanément et automatiquement classés dans sa mémoire. (He neglected details and jumped from idea to idea; the facts gathered from each idea would then come together and solve the problem.) [BA]
The mathematician Darboux claimed he was "un intuitif" (intuitive)[BA], arguing that this is demonstrated by the fact that he worked so often by visual representation. He did not care about being rigorous and disliked logic. He believed that logic was not a way to invent but a way to structure ideas, and that logic limits ideas.
Poincaré held the opposite philosophical views of Bertrand Russell and Gottlob Frege, who believed that mathematics was a branch of logic. Poincaré strongly disagreed, claiming that intuition was the life of mathematics. Poincaré gives an interesting point of view in his book Science and Hypothesis:
For a superficial observer, scientific truth is beyond the possibility of doubt; the

logic of science is infallible, and if the scientists are sometimes mistaken, this is
only from their mistaking its rule. [12]
Poincaré believed that arithmetic is a synthetic science. He argued that Peano's axioms cannot be proven non-circularly with the principle of induction,[7] therefore concluding that arithmetic is a priori synthetic and not analytic. Poincaré then went on to say that mathematics cannot be deduced from logic since it is not analytic. It is important to note that even today Poincaré has not been proven wrong in his argumentation. His views were the same as those of Kant[8]. However Poincaré did not share Kantian views in all branches of philosophy and mathematics. For example, in geometry Poincaré believed that the structure of non-Euclidean space can be known analytically. He wrote three books that made his philosophies known: Science and Hypothesis, The Value of Science and Science and Method.
Poincaré's first area of interest in mathematics was the Fuchsian functions, which he named after the mathematician Lazarus Fuchs because Fuchs was known for being a good teacher and had done a lot of research in differential equations and in the theory of functions. The functions did not keep the name Fuchsian and are today called automorphic. Poincaré actually developed the concept of those functions as part of his doctoral thesis.[9] An automorphic function is a function f(z), where z ∈ C, which is analytic on its domain and invariant under a denumerably infinite group of linear fractional transformations; they are generalizations of trigonometric functions and elliptic functions.[15] Below Poincaré explains how he discovered Fuchsian functions:
For fifteen days I strove to prove that there could not be any functions like those
I have since called Fuchsian functions. I was then very ignorant; every day I
seated myself at my work table, stayed an hour or two, tried a great number
of combinations and reached no results. One evening, contrary to my custom, I
drank black coffee and could not sleep. Ideas rose in crowds; I felt them collide
until pairs interlocked, so to speak, making a stable combination. By the next
morning I had established the existence of a class of Fuchsian functions, those
which come from the hypergeometric series; I had only to write out the results,
which took but a few hours. [11]
This is a clear indication of Henri Poincaré's brilliance. Poincaré communicated a lot with Klein, another mathematician working on Fuchsian functions. They were able to discuss and further the theory of automorphic (Fuchsian) functions. Apparently Klein became jealous of Poincaré's high opinion of Fuchs' work and ended their relationship on bad terms.
Poincaré contributed to the field of algebraic topology and published Analysis situs in 1895, which was the first real systematic look at topology. He acquired most of his knowledge from his work on differential equations. He also formulated the Poincaré conjecture, one of the great unsolved mathematics problems. It is currently one of the Millennium Prize Problems. The problem is stated as:

Consider a compact 3-dimensional manifold V without boundary. Is it possible that the fundamental group of V could be trivial, even though V is not homeomorphic to the 3-dimensional sphere? [5]
The problem has been attacked by many mathematicians, such as Henry Whitehead in 1934, but without success. Later, in the 50s and 60s, progress was made and it was discovered that for higher-dimensional manifolds the problem was easier. (Theorems have been stated for those higher dimensions by Stephen Smale, John Stallings, Andrew Wallace, and many more.)[5] Poincaré also studied homotopy theory, which is the study of topology reduced to various groups that are algebraically invariant.[9] He introduced the fundamental group in a paper in 1894, and later stated his famous conjecture. He also did work in analytic functions, algebraic geometry, and Diophantine problems, where he made important contributions, as he did in most of the areas he studied.
In 1887, Oscar II, King of Sweden and Norway held a competition to celebrate his sixtieth
birthday and to promote higher learning.[1] The King wanted a contest that would be of
interest so he decided to hold a mathematics competition. Poincare entered the competition
submitting a memoir on the three body problem which he describes as:
Le but final de la Mécanique céleste est de résoudre cette grande question de savoir si la loi de Newton explique à elle seule tous les phénomènes astronomiques; le seul moyen d'y parvenir est de faire des observations aussi précises que possible et de les comparer ensuite aux résultats du calcul. (The goal of celestial mechanics is to answer the great question of whether Newton's law alone explains all astronomical phenomena. The only way this can be determined is by making the most precise observations possible and comparing them to the theoretical calculations.) [13]
Poincaré did in fact win the competition. In his memoir he described new mathematical ideas such as homoclinic points. The memoir was about to be published in Acta Mathematica when an error was found by the editor. This error in fact led to the discovery of chaos theory. The memoir was published later, in 1890.[9] In addition Poincaré proved that determinism and predictability were disjoint problems. He also found that the solution of the three body problem would change drastically with small changes in the initial conditions. This area of research was neglected until 1963, when Edward Lorenz discovered a famous chaotic deterministic system using a simple model of the atmosphere.[7]
He made many contributions to different fields of applied mathematics as well, such as: celestial mechanics, fluid mechanics, optics, electricity, telegraphy, capillarity, elasticity, thermodynamics, potential theory, quantum theory, theory of relativity and cosmology. In the field of differential equations Poincaré has given many results that are critical for the qualitative theory of differential equations, for example the Poincaré sphere and the Poincaré map.
It is that intuition that led him to discover and study so many areas of science. Poincaré is considered to be the next universalist after Gauss. After Gauss's death in 1855 people generally believed that there would be no one else who could master all branches of mathematics. However they were wrong, because Poincaré took all areas of mathematics as his province[4].

REFERENCES
1. The 1911 Edition Encyclopedia: Oscar II of Sweden and Norway, [online], http://63.1911encyclopedia.org/O/OS/OSCAR II OF SWEDEN AND NORWAY.htm
2. Belliver, André: Henri Poincaré ou la vocation souveraine, Gallimard, 1956.
3. Bour P-E., Rebuschi M.: Serveur W3 des Archives H. Poincaré, [online] http://www.univnancy2.fr/ACERHP/
4. Boyer, Carl B.: A History of Mathematics: Henri Poincaré, John Wiley & Sons, Inc., Toronto, 1968.
5. Clay Mathematics Institute: Millennium Prize Problems, 2000, [online] http://www.claymath.org/prizeproblems/
6. Encyclopaedia Britannica: Biography of Jules Henri Poincaré.
7. Murz, Mauro: Jules Henri Poincare [Internet Encyclopedia of Philosophy], [online]
http://www.utm.edu/research/iep/p/poincare.htm, 2001.
8. Kolak, Daniel: Lovers of Wisdom (second edition), Wadsworth, Belmont, 2001.
9. OConnor, J. John & Robertson, F. Edmund: The MacTutor History of Mathematics Archive,
[online] http://www-gap.dcs.st-and.ac.uk/ history/, 2002.
10. Oeuvres de Henri Poincare: Tome XI, Gauthier-Villard, Paris, 1956.
11. Poincare, Henri: Science and Method; The Foundations of Science, The Science Press, Lancaster, 1946.
12. Poincare, Henri: Science and Hypothesis; The Foundations of Science, The Science Press,
Lancaster, 1946.
13. Poincaré, Henri: Les méthodes nouvelles de la mécanique céleste, Dover Publications, Inc., New York, 1957.
14. Sageret, Jules: Henri Poincare, Mercvre de France, Paris, 1911.
15. Weisstein, W. Eric: World of Mathematics: Automorphic Function, CRC Press LLC, 2002.

Version: 6 Owner: Daume Author(s): Daume


Chapter 10
01A60 20th century
10.1

Bourbaki, Nicolas

by Émilie Richer

The Problem
The devastation of World War I presented a unique challenge to aspiring mathematicians of
the mid 1920s. Among the many casualties of the war were great numbers of scientists and
mathematicians who would at this time have been serving as mentors to the young students.
Whereas other countries such as Germany were sending their scholars to do scientific work,
France was sending promising young students to the front. A war-time directory of the
École Normale Supérieure in Paris confirms that about 2/3 of their student population was
killed in the war.[DJ] Young men studying after the war had no young teachers, they had
no previous generation to rely on for guidance. What did this mean? According to Jean Dieudonné, it meant that students like him were missing out on important discoveries and advances being made in mathematics at that time. He explained: "I am not saying that they (the older professors) did not teach us excellent mathematics (...) But it is indubitable that a 50 year old mathematician knows the mathematics he learned at 20 or 30, but has only notions, often rather vague, of the mathematics of his epoch, i.e. the period of time when he is 50." He continued: "I had graduated from the École Normale and I did not know what an ideal was! This gives you an idea of what a young French mathematician knew in 1930."[DJ] Henri Cartan, another student in Paris shortly after the war, affirmed: "we were the first generation after the war. Before us there was a vide, a vacuum, and it was necessary to make everything new."[JA] This is exactly what a few young Parisian math students set out to do.
The Beginnings
After graduation from the École Normale Supérieure de Paris a group of about ten young mathematicians had maintained very close ties.[WA] They had all begun their careers and were scattered across France teaching in universities. Among them were Henri Cartan and André Weil, who were both in charge of teaching a course on differential and integral calculus at the University of Strasbourg. The standard textbook for this class at the time was Traité d'Analyse by E. Goursat, which the young professors found to be inadequate in many ways.[BA] According to Weil, his friend Cartan was constantly asking him questions about the best way to present a given topic to his class, so much so that Weil eventually nicknamed him "the grand inquisitor".[WA] After months of persistent questioning, in the winter of 1934, Weil finally got the idea to gather friends (and former classmates) to settle their problem by rewriting the treatise for their course. It is at this moment that Bourbaki was conceived.
The suggestion of writing this treatise spread, and very soon a loose circle of friends, including Henri Cartan, André Weil, Jean Delsarte, Jean Dieudonné and Claude Chevalley, began meeting regularly at the Capoulade, a café in the Latin quarter of Paris, to plan it. They called themselves the "Committee on the Analysis Treatise".[BL] According to Chevalley the project was "extremely naive": the idea was to simply write another textbook to replace Goursat's.[GD] After many discussions over what to include in their treatise they finally came to the conclusion that they needed to start from scratch and present all of essential mathematics from beginning to end, with the idea that the work had to be "primarily a tool, not usable in some small part of mathematics but in the greatest possible number of places".[DJ] Gradually the young men realized that their meetings were not sufficient, and they decided they would dedicate a few weeks in the summer to their new project. The collaborators on this project were not aware of its enormity, but were soon to find out.
In July of 1935 the young men gathered for their first congress (as they would later call them)
in Besse-en-Chandesse. The men believed that they would be able to draft the essentials of
mathematics in about three years. They did not set out wanting to write something new,
but to perfect everything already known. Little did they know that their first chapter would
not be completed until 4 years later. It was at one of their first meetings that the young
men chose their name : Nicolas Bourbaki. The organization and its membership would go
on to become one of the greatest enigmas of 20th century mathematics.
The first Bourbaki congress, July 1935. From left to right, back row: Henri Cartan, René de Possel, Jean Dieudonné, André Weil, a university lab technician; seated: Mirlès, Claude Chevalley, Szolem Mandelbrojt.

Andre Weil recounts many years later how they decided on this name. He and a few other
Bourbaki collaborators had been attending the ecole Normale in Paris, when a notification
was sent out to all first year science students : a guest speaker would be giving a lecture and
attendance was highly recommended. As the Story goes, the young students gathered to
hear, (unbeknownst to them) an older student, Raoul Husson who had disguised himself with
a fake beard and an unrecognizable accent. He gave what is said to be an incomprehensible,
nonsensical lecture, with the young students trying desperately to follow him. All his results
were wrong in a non-trivial way, and he ended with his most extravagant: "Bourbaki's
Theorem". One student even claimed to have followed the lecture from beginning to end.
Raoul had taken the name for his theorem from a general in the Franco-Prussian war. The
committee was so amused by the story that they unanimously chose Bourbaki as their name.
Weil's wife was present at the discussion about choosing a name, and she became Bourbaki's
godmother, baptizing him Nicolas.[WA] Thus was born Nicolas Bourbaki.
André Weil, Claude Chevalley, Jean Dieudonné, Henri Cartan and Jean Delsarte were among
the few present at these first meetings; all were active members of Bourbaki until their
retirements. Today they are considered by most to be the founding fathers of the Bourbaki
group. According to a later member, they were "those who shaped Bourbaki and gave it
much of their time and thought until they retired"; he also notes that some other early
contributors were Szolem Mandelbrojt and René de Possel.[BA]
Reforming Mathematics: The Idea
Bourbaki members all believed that they had to completely rethink mathematics. They felt
that older mathematicians were holding on to old practices and ignoring the new. That is
why, very early on, Bourbaki established one of its first and only rules: obligatory retirement
at age 50. As Dieudonné explained, "if the mathematics set forth by Bourbaki no longer
correspond to the trends of the period, the work is useless and has to be redone; this is why
we decided that all Bourbaki collaborators would retire at age 50".[DJ] Bourbaki wanted to
create a work that would be an essential tool for all mathematicians. Their aim was to create
something logically ordered, starting with a strong foundation and building continuously on
it. The foundation that they chose was set theory, which would be the first book in a series
of six that they named Éléments de mathématique (with the "s" dropped from "mathématique"
to represent their underlying belief in the unity of mathematics). Bourbaki felt that the old
mathematical divisions were no longer valid, comparing them to ancient zoological divisions.
The ancient zoologist would classify animals based on superficial similarities such
as "all these animals live in the ocean". Eventually they realized that more complexity
was required to classify these animals. Past mathematicians had apparently made similar
mistakes: "the order in which we (Bourbaki) arranged our subjects was decided according to
a logical and rational scheme. If that does not agree with what was done previously, well, it
means that what was done previously has to be thrown overboard".[DJ] After many heated
discussions, Bourbaki eventually settled on the topics for Éléments de mathématique; they
would be, in order:
I. Set theory
II. Algebra
III. Topology
IV. Functions of one real variable
V. Topological vector spaces
VI. Integration
They now felt that they had eliminated all secondary mathematics that, according to them,
"did not lead to anything of proved importance".[DJ] The following table summarizes Bourbaki's choices.
What remains after cutting the loose threads:
Linear and multilinear algebra
A little general topology (the least possible)
Topological vector spaces
Homological algebra
Commutative algebra
Non-commutative algebra
Lie groups
Integration
Differentiable manifolds
Riemannian geometry

What is excluded (the loose threads):
Theory of ordinals and cardinals
Lattices
Most general topology
Most of group theory (finite groups)
Most of number theory
Trigonometrical series
Interpolation
Series of polynomials
Applied mathematics

Dieudonné's metaphorical ball of yarn: "here is my picture of mathematics now. It
is a ball of wool, a tangled hank where all mathematics react upon one another in an almost
unpredictable way. And then in this ball of wool, there are a certain number of threads coming
out in all directions and not connecting with anything else. Well, the Bourbaki method is very
simple: we cut the threads."[DJ]

Reforming Mathematics: The Process


It didn't take long for Bourbaki to become aware of the size of their project. They were
now meeting three times a year (twice for one week and once for two weeks) for Bourbaki
congresses to work on their books. Their main rule was unanimity on every point. Any
member had the right to veto anything he felt was inadequate or imperfect. Once Bourbaki
had agreed on a topic for a chapter, the job of writing up the first draft was given to any
member who wanted it. He would write his version, and when it was complete it would be
presented at the next Bourbaki congress, where it would be read aloud line by line. According
to Dieudonné, each proof was "examined point by point and criticized pitilessly". He goes
on: "one has to see a Bourbaki congress to realize the virulence of this criticism and how it
surpasses by far any outside attack".[DJ] Weil recalls a first draft written by Cartan (who was
unable to attend the congress where it would be presented). Bourbaki sent him a telegram
summarizing the congress; it read: "union intersection partie produit tu es démembré foutu
Bourbaki" (union intersection subset product you are dismembered screwed Bourbaki).[WA]
During a congress any member was allowed to interrupt to criticize, comment or ask questions
at any time. Apparently Bourbaki believed it could get better results from confrontation
than from orderly discussion.[BA] Armand Borel summarized his first congress as "two or
three monologues shouted at top voice, seemingly independent of one another".[BA]
Bourbaki congress 1951.

After a first draft had been completely reduced to pieces, it was the job of a new collaborator
to write up a second draft. This second collaborator would use all the suggestions and
changes that the group had put forward during the congress. Any member had to be able to
take on this task, because one of Bourbaki's mottoes was "the control of the specialists by the
non-specialists"[BA]; that is, a member had to be able to write a chapter in a field that was not
his specialty. This second writer would set out on his assignment knowing that, by the time
he was ready to present his draft, the views of the congress would have changed and his draft
would also be torn apart despite its adherence to the congress's earlier suggestions. The
same chapter might appear up to ten times before it would finally be unanimously approved
for publishing. There was an average of 8 to 12 years from the time a chapter was approved
to the time it appeared on a bookshelf.[DJ] Bourbaki proceeded this way for over twenty
years, (surprisingly) publishing a great number of volumes.
Bourbaki congress 1951.

Recruitment and Membership


During these years, most Bourbaki members held permanent positions at universities across
France. There they could recruit for Bourbaki students showing great promise in mathematics. Members would never be replaced formally, nor was there ever a fixed number of
members. However, when it felt the need, Bourbaki would invite a student or colleague to a
congress as a "cobaye" (guinea pig). To be accepted, not only would the guinea pig have
to understand everything, but he would have to actively participate. He also had to show
broad interests and an ability to adapt to the Bourbaki style. If he was silent, he would
not be invited again (a challenging task, considering he would be in the presence of some of
the strongest mathematical minds of the time). Bourbaki described the reaction of certain
guinea pigs invited to a congress: "they would come out with the impression that it was a
gathering of madmen. They could not imagine how these people, shouting (sometimes three
or four at a time) about mathematics, could ever come up with something intelligent."[DJ]
If a new recruit was showing promise, he would continue to be invited and would gradually
become a member of Bourbaki without any formal announcement. Although complete
anonymity was impossible, Bourbaki was never discussed with the outside world. It was many
years before Bourbaki members agreed to speak publicly about their story. The following
table gives the names of some of Bourbaki's collaborators.
1st generation (founding fathers): H. Cartan, C. Chevalley, J. Delsarte, J. Dieudonné, A. Weil

2nd generation (invited after WWII): J. Dixmier, R. Godement, S. Eilenberg, J.-L. Koszul, P. Samuel, J.-P. Serre, L. Schwartz

3rd generation: A. Borel, F. Bruhat, P. Cartier, A. Grothendieck, S. Lang, J. Tate
Three generations of Bourbaki (membership according to Pierre Cartier)[SM]. Note: there
have been a great number of Bourbaki contributors, some lasting longer than others; this table
gives the members listed by Pierre Cartier. Different sources list different official members;
in fact, the Bourbaki website lists J. Coulomb, C. Ehresmann, R. de Possel and S. Mandelbrojt
as 1st generation members.[BW]

Bourbaki congress 1938, from left to right: S. Weil, C. Pisot, A. Weil, J. Dieudonné, C.
Chabauty, C. Ehresmann, J. Delsarte.

The Books
The Bourbaki books were the first to have such a tight organization, and the first to use an
axiomatic presentation. They tried as often as possible to start from the general and work
towards the particular, working with the belief that mathematics is fundamentally simple
and that for each mathematical question there is an optimal way of answering it. This
required an extremely rigid structure and notation. In fact, the first six books of Éléments de
mathématique use a completely linearly-ordered reference system. That is, any reference
at a given spot can only be to something earlier in the text or in an earlier book. This
did not please all of its readers, as Borel elaborates: "I was rather put off by the very dry
style, without any concession to the reader, the apparent striving for the utmost generality,
the inflexible system of internal references and the total absence of outside ones." However,
Bourbaki's style was in fact so efficient that a lot of its notation and vocabulary is still
in current usage. Weil recalls that his granddaughter was impressed when she learned that
he had been personally responsible for the symbol ∅ for the empty set,[WA] and Chevalley
explains that "to bourbakise" now means to take a text that is considered screwed up and
to arrange it and improve it, concluding that it is the notion of structure which is truly
"bourbakique".[GD]
As well as ∅, Bourbaki is responsible for the introduction of ⇒ (the implication arrow);
N, R, C, Q and Z (respectively the natural, real, complex, rational numbers and the integers);
∁A (the complement of a set A); as well as the words bijective, surjective and injective.[DR]
The Decline
Once Bourbaki had finally finished its first six books, the obvious question was "what next?"
The founding members, who (not intentionally) had often carried most of the weight, were now
approaching mandatory retirement age. The group had to start looking at more specialized
topics, having covered the basics in their first books. But was the highly structured Bourbaki
style the best way to approach these topics? The motto "everyone must be interested in
everything" was becoming much more difficult to enforce. (It was easy for the first six
books, whose contents are considered essential knowledge of most mathematicians.) Pierre
Cartier was working with Bourbaki at this point. He says: "in the forties you can say that
Bourbaki knew where to go: his goal was to provide the foundation for mathematics."[SM] It
seemed now that they did not know where to go. Nevertheless, Bourbaki kept publishing.
Its second series (falling short of Dieudonné's plan of 27 books encompassing most of modern
mathematics [BA]) consisted of two very successful books:
Book VII: Commutative algebra
Book VIII: Lie groups
However, Cartier claims that by the end of the seventies Bourbaki's method was understood,
and many textbooks were being written in its style: "Bourbaki was left without a task. (...)
With their rigid format they were finding it extremely difficult to incorporate new mathematical developments."[SM] To add to its difficulties, Bourbaki was now becoming involved
in a battle with its publishing company over royalties and translation rights. The matter was
settled in 1980 after a long and unpleasant legal process where, as one Bourbaki member
put it, "both parties lost and the lawyer got rich".[SM] In 1983 Bourbaki published its last
volume: IX, Spectral Theory.
By that time, Cartier says, Bourbaki "was a dinosaur, the head too far away from the tail",
explaining: "when Dieudonné was the scribe of Bourbaki, every printed word came from
his pen. With his fantastic memory he knew every single word. You could say, 'Dieudonné,
what is the result about so and so?' and he would go to the shelf and take down the book and
open it to the right page. After Dieudonné retired no one was able to do this. So Bourbaki
lost awareness of his own body, the 40 published volumes."[SM] Now, after almost twenty
years without a significant publication, is it safe to say the dinosaur has become extinct?¹
But since Nicolas Bourbaki never in fact existed, and was nothing but a clever teaching and
research ploy, could he ever be said to be extinct?

REFERENCES
[BL] L. BEAULIEU: A Parisian Café and Ten Proto-Bourbaki Meetings (1934-1935), The Mathematical Intelligencer Vol. 15 No. 1, 1993, pp 27-35.
[BCCC] A. BOREL, P. CARTIER, K. CHANDRASEKHARAN, S. CHERN, S. IYANAGA: André Weil (1906-1998), Notices of the AMS Vol. 46 No. 4, 1999, pp 440-447.
[BA] A. BOREL: Twenty-Five Years with Nicolas Bourbaki, 1949-1973, Notices of the AMS Vol. 45 No. 3, 1998, pp 373-380.
[BN] N. BOURBAKI: Théorie des Ensembles, de la collection Éléments de Mathématique, Hermann, Paris 1970.
[BW] Bourbaki website: [online] at www.bourbaki.ens.fr.
[CH] H. CARTAN: André Weil: Memories of a Long Friendship, Notices of the AMS Vol. 46 No. 6, 1999, pp 633-636.
[DR] R. DeCAMPS: Qui est Nicolas Bourbaki?, [online] at http://faq.maths.free.fr.
[DJ] J. DIEUDONNÉ: The Work of Nicholas Bourbaki, American Math. Monthly 77, 1970, pp 134-145.
[EY] Encyclopédie Yahoo: Nicolas Bourbaki, [online] at http://fr.encylopedia.yahoo.com.
[GD] D. GUEDJ: Nicholas Bourbaki, Collective Mathematician: An Interview with Claude Chevalley, The Mathematical Intelligencer Vol. 7 No. 2, 1985, pp 18-22.
[JA] A. JACKSON: Interview with Henri Cartan, Notices of the AMS Vol. 46 No. 7, 1999, pp 782-788.
[SM] M. SENECHAL: The Continuing Silence of Bourbaki: An Interview with Pierre Cartier, The Mathematical Intelligencer No. 1, 1998, pp 22-28.
[WA] A. WEIL: The Apprenticeship of a Mathematician, Birkhäuser Verlag 1992, pp 93-122.
¹ Today what remains is L'Association des Collaborateurs de Nicolas Bourbaki, who organize Bourbaki
seminars three times a year. These are international conferences, hosting over 200 mathematicians who come
to listen to presentations on topics chosen by Bourbaki (or the A.C.N.B.). Their last publication was in 1998:
chapter 10 of book VI, commutative algebra.

Version: 6 Owner: Daume Author(s): Daume

10.2

Erdős Number

A low Erdős number is a status symbol among 20th-century mathematicians and is similar
to the six-degrees-of-separation concept.
Let e(p) be the Erdős number of person p. Your Erdős number is
0 if you are Paul Erdős;
min{e(x) | x ∈ X} + 1, where X is the set of all persons you have authored a paper
with.
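The recursive definition above is equivalent to the shortest-path distance from Erdős in the coauthorship graph, so it can be computed by breadth-first search. A minimal sketch, on purely hypothetical coauthorship data:

```python
from collections import deque

def erdos_numbers(coauthors, source="Erdos"):
    """Compute e(p) for every person reachable from `source` in a
    coauthorship graph, by breadth-first search.  `coauthors` maps each
    person to the set of people they have written a paper with."""
    e = {source: 0}
    queue = deque([source])
    while queue:
        p = queue.popleft()
        for x in coauthors.get(p, ()):
            if x not in e:            # first visit gives the minimum
                e[x] = e[p] + 1
                queue.append(x)
    return e

# Hypothetical coauthorship data, for illustration only.
graph = {
    "Erdos": {"A", "B"},
    "A": {"Erdos", "C"},
    "B": {"Erdos"},
    "C": {"A"},
}
assert erdos_numbers(graph) == {"Erdos": 0, "A": 1, "B": 1, "C": 2}
print(sorted(erdos_numbers(graph).items()))
```

People not connected to Erdős at all simply receive no entry (their Erdős number is conventionally infinite).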
Version: 7 Owner: tz26 Author(s): tz26

Chapter 11
03-00 General reference works
(handbooks, dictionaries,
bibliographies, etc.)
11.1

Burali-Forti paradox

The Burali-Forti paradox demonstrates that the class of all ordinals is not a set. If there
were a set of all ordinals, Ord, then it would follow that Ord was itself an ordinal, and
therefore that Ord ∈ Ord. Even if sets in general are allowed to contain themselves, ordinals
cannot, since they are defined so that ∈ is well-founded over them.
This paradox is similar to both Russell's paradox and Cantor's paradox, although it predates
both. All of these paradoxes prove that a certain object is too large to be a set.
Version: 2 Owner: Henry Author(s): Henry

11.2

Cantors paradox

Cantor's paradox demonstrates that there can be no largest cardinality. In particular,
there must be an unlimited number of infinite cardinalities. For suppose that κ were the
largest cardinal. Then we would have |P(κ)| = |κ|. Suppose f : κ → P(κ) is a bijection
proving their equicardinality. Then X = {β ∈ κ | β ∉ f(β)} is a subset of κ, and so there
is some α ∈ κ such that f(α) = X. But then α ∈ X if and only if α ∉ X, which is a paradox.
The key part of the argument strongly resembles Russell's paradox, which is in some sense
a generalization of this paradox.

Besides allowing an unbounded number of cardinalities as ZF set theory does, this paradox
could be avoided by a few other tricks, for instance by not allowing the construction of a
power set or by adopting paraconsistent logic.
Version: 2 Owner: Henry Author(s): Henry

11.3

Russells paradox

Suppose that for any coherent proposition P(x), we can construct a set {x : P(x)}. Let
S = {x : x ∉ x}. Suppose S ∈ S; then, by definition, S ∉ S. Likewise, if S ∉ S, then by
definition S ∈ S. Therefore, we have a contradiction. Bertrand Russell gave this paradox as
an example of how a purely intuitive set theory can be inconsistent. The regularity axiom,
one of the Zermelo-Fraenkel axioms, was devised to avoid this paradox by prohibiting self-swallowing sets.
An interpretation of Russell's paradox without any formal language of set theory could be
stated thus: "If the barber shaves all those, and only those, who do not shave themselves, does
he shave himself?" If you answer "himself", that is false, since he shaves only those who do
not shave themselves. If you answer "someone else", that is also false, because he shaves all
those who do not shave themselves, and in this case he is part of that set, since he does not
shave himself. Therefore we have a contradiction.
Version: 5 Owner: Daume Author(s): Daume, vampyr

11.4

biconditional

A biconditional is a truth function that is true only in the case that both parameters are true
or both are false. For example, "a if and only if b" and "a just in case b", as well as "b implies
a and a implies b", are all ways of stating a biconditional in English. Symbolically the
biconditional is written as
a ↔ b
or
a ⇔ b
Its truth table is

a b | a ↔ b
F F |   T
F T |   F
T F |   F
T T |   T

In addition, the biconditional function is sometimes written as "iff", meaning "if and only
if".
The biconditional gets its name from the fact that it is really two conditionals in conjunction,
(a → b) ∧ (b → a)

This fact is important to recognize when writing a mathematical proof, as both conditionals
must be proven independently.
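The decomposition into two conditionals can be checked mechanically by enumerating the four rows of the truth table; a small sketch (the helper names are ours):

```python
from itertools import product

# implies(p, q) models the conditional p -> q; iff models p <-> q.
implies = lambda p, q: (not p) or q
iff = lambda p, q: p == q

# The biconditional agrees with the conjunction of the two conditionals
# on every row of the truth table.
for a, b in product([False, True], repeat=2):
    assert iff(a, b) == (implies(a, b) and implies(b, a))
print("a <-> b matches (a -> b) and (b -> a) on all four rows")
```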
Version: 8 Owner: akrowne Author(s): akrowne

11.5

bijection

Let X and Y be sets. A function f : X → Y that is one-to-one and onto is called a bijection
or bijective function from X to Y.
When X = Y , f is also called a permutation of X.
Version: 8 Owner: mathcam Author(s): mathcam, drini

11.6

cartesian product

For any sets A and B, the cartesian product A × B is the set consisting of all ordered pairs
(a, b) where a ∈ A and b ∈ B.
Version: 1 Owner: djao Author(s): djao

11.7

chain

Let B ⊆ A, where A is ordered by ≤. B is a chain in A if any two elements of B are
comparable.
That is, B is a linearly ordered subset of A.
Version: 1 Owner: akrowne Author(s): akrowne
11.8

characteristic function

Definition. Suppose A is a subset of a set X. Then the function

χ_A(x) = 1 when x ∈ A, and χ_A(x) = 0 when x ∈ X \ A,

is the characteristic function for A.
Properties
Suppose A, B are subsets of a set X.
1. For set intersections and set unions, we have
χ_{A ∩ B} = χ_A χ_B,
χ_{A ∪ B} = χ_A + χ_B − χ_{A ∩ B}.
2. For the symmetric difference,
χ_{A △ B} = χ_A + χ_B − 2 χ_{A ∩ B}.
3. For the set complement,
χ_{A^∁} = 1 − χ_A.
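These identities are easy to verify pointwise on concrete finite sets; a small sketch (the set names are ours):

```python
def chi(A):
    """Return the characteristic function of the set A."""
    return lambda x: 1 if x in A else 0

X = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5}

cA, cB = chi(A), chi(B)
for x in X:
    assert chi(A & B)(x) == cA(x) * cB(x)                      # intersection
    assert chi(A | B)(x) == cA(x) + cB(x) - chi(A & B)(x)      # union
    assert chi(A ^ B)(x) == cA(x) + cB(x) - 2 * chi(A & B)(x)  # symmetric difference
    assert chi(X - A)(x) == 1 - cA(x)                          # complement
print("all four identities hold pointwise on X")
```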
Remarks
A synonym for characteristic function is indicator function [1].

REFERENCES
1. G.B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed,
John Wiley & Sons, Inc., 1999.

Version: 6 Owner: bbukh Author(s): bbukh, matte, vampyr

11.9

concentric circles

A collection of circles is said to be concentric if they have the same center. The region formed
between two concentric circles is therefore an annulus.
Version: 1 Owner: dublisk Author(s): dublisk
11.10

conjunction

A conjunction is true only when both parameters (called conjuncts) are true. In English, conjunction is denoted by the word "and". Symbolically, we represent it as ∧, or as multiplication
applied to Boolean parameters. Conjunction of a and b would be written
a ∧ b
or, in algebraic context,
ab
or
a · b
The truth table for conjunction is

a b | a ∧ b
F F |   F
F T |   F
T F |   F
T T |   T
Version: 6 Owner: akrowne Author(s): akrowne

11.11

disjoint

Two sets X and Y are disjoint if their intersection X ∩ Y is the empty set.
Version: 1 Owner: djao Author(s): djao

11.12

empty set

An empty set is a set that contains no elements. The Zermelo-Fraenkel axioms of set theory
postulate that there exists an empty set.
Version: 2 Owner: djao Author(s): djao

11.13

even number

Definition. Suppose k is an integer. If there exists an integer r such that k = 2r + 1, then
k is an odd number. If there exists an integer r such that k = 2r, then k is an even
number.
The concepts of even and odd numbers are most easily understood in the binary base. Then
the above definition simply states that even numbers end with a 0, and odd numbers end
with a 1.
Properties
1. Every integer is either even or odd. This can be proven using induction, or using the
fundamental theorem of arithmetic.
2. An integer k is even (odd) if and only if k² is even (odd).
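The binary observation and property 2 can both be checked directly, since the last binary digit of |k| equals |k| mod 2. A small sketch:

```python
for k in range(-8, 9):
    # bin() of a negative int shows the sign and then the magnitude, so
    # inspect |k|; k and |k| always have the same parity.
    last_bit = bin(abs(k))[-1]                 # '0' or '1'
    assert (k % 2 == 0) == (last_bit == '0')   # even iff binary ends in 0
    assert (k % 2 == 0) == (k**2 % 2 == 0)     # property 2: k, k^2 share parity
print("parity matches the last binary digit for k in [-8, 8]")
```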
Version: 3 Owner: mathcam Author(s): matte

11.14

fixed point

A fixed point x of a function f : X → X is a point that remains constant upon application
of that function, i.e.:
f(x) = x.
Version: 5 Owner: mathwizard Author(s): mathwizard

11.15

infinite

A set S is infinite if it is not finite; that is, there is no n ∈ N for which there is a bijection
between n and S. Hence an infinite set has a cardinality greater than any natural number:
|S| ≥ ℵ₀
Infinite sets can be divided into countable and uncountable. For countably infinite sets S,
there is a bijection between S and N. This is not the case for uncountably infinite sets (like
the reals and any non-trivial real interval).
Some examples of finite sets:
The empty set: {}.
{0, 1}
{1, 2, 3, 4, 5}
{1, 1.5, e, π}
Some examples of infinite sets:
{1, 2, 3, 4, . . .} (countable)
The primes: {2, 3, 5, 7, 11, . . .} (countable)
An interval of the reals: (0, 1) (uncountable)
The rational numbers: Q (countable)
Version: 4 Owner: akrowne Author(s): akrowne, vampyr

11.16

injective function

We say that a function f : X → Y is injective or one-to-one if f(x) = f(y) implies x = y,
or equivalently, whenever x ≠ y, then f(x) ≠ f(y).
Version: 6 Owner: drini Author(s): drini

11.17

integer

The set of integers, denoted by the symbol Z, is the set {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}, consisting of the natural numbers and their negatives.
Mathematically, Z is defined to be the set of equivalence classes of pairs of natural numbers
N × N under the equivalence relation (a, b) ∼ (c, d) if a + d = b + c.
Addition and multiplication of integers are defined as follows:
(a, b) + (c, d) := (a + c, b + d)
(a, b) · (c, d) := (ac + bd, ad + bc)
Typically, the class of (a, b) is denoted by the symbol n if b ≤ a (resp. −n if a ≤ b), where n is
the unique natural number such that a = b + n (resp. a + n = b). Under this notation, we
recover the familiar representation of the integers as {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}. Here
are some examples:
0 = equivalence class of (0, 0) = equivalence class of (1, 1) = . . .
1 = equivalence class of (1, 0) = equivalence class of (2, 1) = . . .
−1 = equivalence class of (0, 1) = equivalence class of (1, 2) = . . .
The set of integers Z under the addition and multiplication operations defined above forms
an integral domain. The integers admit the following ordering relation making Z into an
ordered ring: (a, b) ≤ (c, d) in Z if a + d ≤ b + c in N.
The ring of integers is also a Euclidean domain, with valuation given by the absolute value
function.
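The pair construction can be exercised directly: represent an integer by any pair (a, b) of naturals and read it as a − b. A minimal sketch (function names ours):

```python
def value(p):
    """Read the pair (a, b) of naturals as the integer a - b."""
    a, b = p
    return a - b

def add(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)

def equiv(p, q):                      # (a, b) ~ (c, d)  iff  a + d = b + c
    (a, b), (c, d) = p, q
    return a + d == b + c

two, minus3 = (5, 3), (0, 3)          # representatives of 2 and -3
assert value(add(two, minus3)) == -1
assert value(mul(two, minus3)) == -6
assert equiv((5, 3), (2, 0))          # both pairs represent 2
print("pair arithmetic agrees with ordinary integer arithmetic")
```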
Version: 7 Owner: djao Author(s): djao

11.18

inverse function

Definition. Suppose f : X → Y is a mapping between sets X and Y, and suppose f⁻¹ :
Y → X is a mapping that satisfies
f⁻¹ ∘ f = id_X,
f ∘ f⁻¹ = id_Y.
Then f⁻¹ is called the inverse of f, or the inverse function of f.
Remarks
1. The inverse function of a function f : X → Y exists if and only if f is a bijection, that
is, f is an injection and a surjection.
2. When an inverse function exists, it is unique.
3. The inverse function and the inverse image of a set coincide in the following sense.
Suppose f⁻¹(A) is the inverse image of a set A ⊆ Y under a function f : X → Y. If f
is a bijection, then f⁻¹(y) = f⁻¹({y}).
Version: 3 Owner: matte Author(s): matte
11.19

linearly ordered

An ordering ≤ (or <) of A is called linear or total if any two elements of A are comparable.
The pair (A, ≤) is then called a linearly ordered set.
Version: 1 Owner: akrowne Author(s): akrowne

11.20

operator

Synonym of mapping and function. Often used to refer to mappings where the domain and
codomain are, in some sense, spaces of functions.
Examples: differential operator, convolution operator.
Version: 2 Owner: rmilson Author(s): rmilson

11.21

ordered pair

For any sets a and b, the ordered pair (a, b) is the set {{a}, {a, b}}.
The characterizing property of an ordered pair is:
(a, b) = (c, d) ⟺ a = c and b = d,
and the above construction of the ordered pair, as weird as it seems, is actually the simplest
possible formulation which achieves this property.
Version: 4 Owner: djao Author(s): djao

11.22

ordering relation

Let S be a set. An ordering relation is a relation ≤ on S such that, for every a, b, c ∈ S:
Either a ≤ b, or b ≤ a;
If a ≤ b and b ≤ c, then a ≤ c;
If a ≤ b and b ≤ a, then a = b.
Given an ordering relation ≤, one can define a relation < by: a < b if a ≤ b and a ≠ b. The
opposite ordering is the relation ≥ given by: a ≥ b if b ≤ a, and the relation > is defined
analogously.
Version: 3 Owner: djao Author(s): djao

11.23

partition

A partition P of a set S is a collection of mutually disjoint non-empty sets such that
⋃P = S.
Any partition P of a set S introduces an equivalence relation on S, where each p ∈ P is
an equivalence class. Similarly, given an equivalence relation on S, the collection of distinct
equivalence classes is a partition of S.
Version: 4 Owner: vampyr Author(s): vampyr

11.24

pullback

Definition. Suppose X, Y, Z are sets, and we have maps
f : Y → Z,
φ : X → Y.
Then the pullback of f under φ is the mapping
φ∗f : X → Z,
x ↦ (f ∘ φ)(x).
Let us denote by M(X, Y) the set of all mappings f : X → Y. We then see that φ∗ is a
mapping M(Y, Z) → M(X, Z). In other words, φ∗ pulls back the set where f is defined
from Y to X. This is illustrated in the following (commutative) diagram:
X →(φ) Y →(f) Z, with φ∗f : X → Z along the bottom.
Properties
1. For any set X, (id_X)∗ = id_{M(X,X)}.
2. Suppose we have maps
φ : X → Y,
ψ : Y → Z
between sets X, Y, Z. Then
(ψ ∘ φ)∗ = φ∗ ∘ ψ∗.
3. If φ : X → Y is a bijection, then φ∗ is a bijection and
(φ∗)⁻¹ = (φ⁻¹)∗.
4. Suppose X, Y are sets with X ⊆ Y. Then we have the inclusion map ι : X ↪ Y, and
for any f : Y → Z, we have
ι∗f = f|_X,
where f|_X is the restriction of f to X [1].
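Since the pullback is just precomposition, it is easy to model with higher-order functions; a sketch (all names ours), which also checks the contravariance property 2:

```python
def pullback(phi):
    """Return phi*, which sends f in M(Y, Z) to f . phi in M(X, Z)."""
    return lambda f: (lambda x: f(phi(x)))

phi = lambda x: x + 1        # phi : X -> Y
f = lambda y: y * y          # f   : Y -> Z
g = pullback(phi)(f)         # phi* f : X -> Z

assert g(3) == f(phi(3)) == 16

# Property 2: (psi . phi)* = phi* . psi*  (note the reversed order)
psi = lambda y: y - 5
lhs = pullback(lambda x: psi(phi(x)))(f)
rhs = pullback(phi)(pullback(psi)(f))
assert all(lhs(x) == rhs(x) for x in range(10))
print("pullback behaves as precomposition")
```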

REFERENCES
1. W. Aitken, de Rham Cohomology: Summary of Lectures 1-4, online.

Version: 7 Owner: matte Author(s): matte

11.25

set closed under an operation

A set X is said to be closed under some map L if L maps elements in X to elements in X,
i.e., L : X → X. More generally, suppose Y is the n-fold cartesian product Y = X × · · · × X.
If L is a map L : Y → X, then we also say that X is closed under the map L.
The above definition has no relation to the definition of a closed set in topology. Instead,
one should think of X and L as a closed system.

Examples
1. The set of invertible matrices is closed under matrix inversion. This means that the
inverse of an invertible matrix is again an invertible matrix.
2. Let C(X) be the set of complex-valued continuous functions on some topological space
X. Suppose f, g are functions in C(X). Then we define the pointwise product of f
and g as the function f g : x ↦ f(x)g(x). Since f g is continuous, we have that C(X)
is closed under pointwise multiplication.
In the first example, the operation is of the type X → X. In the latter, pointwise multiplication is a map C(X) × C(X) → C(X).
Version: 2 Owner: matte Author(s): matte

11.26

signature of a permutation

Let X be a finite set, and let G be the group of permutations of X (see permutation group).
There exists a unique homomorphism σ from G to the multiplicative group {1, −1} such
that σ(t) = −1 for any transposition (loc. cit.) t ∈ G. The value σ(g), for any g ∈ G,
is called the signature or sign of the permutation g. If σ(g) = 1, g is said to be of even
parity; if σ(g) = −1, g is said to be of odd parity.
Proposition: If X is totally ordered by a relation <, then for all g ∈ G,
σ(g) = (−1)^{k(g)}
(11.26.1)
where k(g) is the number of pairs (x, y) ∈ X × X such that x < y and g(x) > g(y). (Such a
pair is sometimes called an inversion of the permutation g.)
Proof: This is clear if g is the identity map X → X. If g is any other permutation, then for
some consecutive a, b ∈ X we have a < b and g(a) > g(b). Let h ∈ G be the transposition
of a and b. We have
k(h ∘ g) = k(g) − 1
σ(h ∘ g) = −σ(g)
and the proposition follows by induction on k(g).
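Equation (11.26.1) translates directly into code: count inversions and raise −1 to that power. A sketch for permutations of {0, …, n−1} given as lists of images:

```python
def sign(perm):
    """Signature of a permutation of range(n), given as a list of images,
    computed via (11.26.1): sign = (-1) ** (number of inversions)."""
    n = len(perm)
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if perm[i] > perm[j]
    )
    return (-1) ** inversions

assert sign([0, 1, 2]) == 1      # identity: even
assert sign([1, 0, 2]) == -1     # one transposition: odd
assert sign([1, 2, 0]) == 1      # 3-cycle = two transpositions: even
print("signature via inversion count")
```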
Version: 4 Owner: drini Author(s): Larry Hammick

11.27

subset

Given two sets A and B, we say that A is a subset of B (which we denote as A ⊆ B, or
simply A ⊂ B) if every element of A is also in B. That is, the following implication holds:
x ∈ A ⟹ x ∈ B.
Some examples: The set A = {d, r, i, t, o} is a subset of the set B = {p, e, d, r, i, t, o} because
every element of A is also in B. That is, A ⊆ B.
On the other hand, if C = {p, e, d, r, o}, neither is A a subset of C (because t ∈ A but t ∉ C)
nor is C a subset of A (because p ∈ C but p ∉ A). The fact that A is not a subset of C is
written as A ⊈ C. And then, in this example, we also have C ⊈ A.
If X ⊆ Y and Y ⊆ X, it must be the case that X = Y.
Every set is a subset of itself, and the empty set is a subset of every other set. The set A is
called a proper subset of B if A ⊆ B and A ≠ B (in this case we do not use A ⊆ B).
Version: 5 Owner: drini Author(s): drini

11.28

surjective

A function f : X → Y is called surjective or onto if, for every y ∈ Y, there is an x ∈ X
such that f(x) = y.
Equivalently, f : X → Y is onto when its image is all of the codomain:
Im f = Y.
Version: 2 Owner: drini Author(s): drini

11.29

transposition

Given a set X = {a_1, a_2, . . . , a_n}, a transposition is a permutation (a bijective function of X
onto itself) f such that there exist indices i, j such that f(a_i) = a_j, f(a_j) = a_i, and f(a_k) = a_k
for all other indices k.
Example: If X = {a, b, c, d, e}, the function f given by
f(a) = a
f(b) = e
f(c) = c
f(d) = d
f(e) = b
is a transposition.
One of the main results on symmetric groups states that any permutation can be expressed
as a composition of transpositions, and for any two decompositions of a given permutation,
the number of transpositions is always even or always odd.
Version: 2 Owner: drini Author(s): drini

11.30

truth table

A truth table is a tabular listing of all possible input value combinations for a truth function
and their corresponding output values. For n input variables, there will always be 2^n rows
in the truth table. A sample truth table for (a ∧ b) → c would be

a b c | (a ∧ b) → c
F F F |      T
F F T |      T
F T F |      T
F T T |      T
T F F |      T
T F T |      T
T T F |      F
T T T |      T

(Note that ∧ represents logical and, while → represents the conditional truth function.)
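A table like the one above can be generated mechanically by enumerating all 2^n input rows; a small sketch (function names ours):

```python
from itertools import product

def truth_table(expr, names):
    """List every row (inputs, output) of a truth function, in the
    F-before-T order used above; 2**n rows for n variables."""
    return [
        (vals, expr(*vals))
        for vals in product([False, True], repeat=len(names))
    ]

rows = truth_table(lambda a, b, c: (not (a and b)) or c, ["a", "b", "c"])
assert len(rows) == 2 ** 3
# The only false row is a = b = T, c = F:
assert [v for v, out in rows if not out] == [(True, True, False)]
print("8 rows; (a and b) -> c fails only at (T, T, F)")
```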
Version: 4 Owner: akrowne Author(s): akrowne

Chapter 12
03-XX Mathematical logic and
foundations
12.1

standard enumeration

The standard enumeration of {0, 1}* is the sequence of strings s₀ = λ, s₁ = 0, s₂ = 1,
s₃ = 00, s₄ = 01, . . . , in lexicographic order.
The characteristic function of a language A ⊆ {0, 1}* is χ_A : ℕ → {0, 1} such that

χ_A(n) = 1, if sₙ ∈ A
χ_A(n) = 0, if sₙ ∉ A.

The characteristic sequence of a language A (also denoted χ_A) is the concatenation of the
values of the characteristic function in the natural order.
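The enumeration and a characteristic sequence are easy to generate. In the Python sketch below, the language A (strings with an even number of 1s) is an arbitrary illustrative choice:

```python
from itertools import count, islice, product

# Standard enumeration of {0,1}*: the empty string, then all strings of
# length 1, length 2, ... each block in lexicographic order.
def standard_enumeration():
    yield ""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

strings = list(islice(standard_enumeration(), 7))
print(strings)  # ['', '0', '1', '00', '01', '10', '11']

# Characteristic sequence of A = {strings with an even number of 1s}.
in_A = lambda s: s.count("1") % 2 == 0
chi = "".join("1" if in_A(s) else "0" for s in strings)
print(chi)
```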
Version: 12 Owner: xiaoyanggu Author(s): xiaoyanggu


Chapter 13
03B05 Classical propositional logic
13.1

CNF

A propositional formula is a CNF formula, meaning it is in conjunctive normal form, if it is a
conjunction of disjunctions of literals (a literal is a propositional variable or its negation).
Hence, a CNF is a formula of the form K₁ ∧ K₂ ∧ . . . ∧ Kₙ, where each Kᵢ is of the form
lᵢ₁ ∨ lᵢ₂ ∨ . . . ∨ lᵢₘ for literals lᵢⱼ and some m.
Example: (x ∨ y ∨ z) ∧ (y ∨ w ∨ u) ∧ (x ∨ v ∨ u).
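Evaluating a CNF formula under a truth assignment follows the shape of the definition: a conjunction over clauses of disjunctions over literals. The encoding below (clauses as lists of (variable, polarity) pairs) is one illustrative choice, not a standard one:

```python
# A CNF formula: list of clauses; a clause is a list of literals; a
# literal is (variable_name, polarity), polarity False meaning negated.
def eval_cnf(cnf, assignment):
    return all(
        any(assignment[var] == pol for var, pol in clause)
        for clause in cnf
    )

# (x or y or z) and (not y or w)
cnf = [[("x", True), ("y", True), ("z", True)],
       [("y", False), ("w", True)]]
print(eval_cnf(cnf, {"x": True, "y": True, "z": False, "w": True}))    # True
print(eval_cnf(cnf, {"x": False, "y": False, "z": False, "w": False})) # False
```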
Version: 2 Owner: iddo Author(s): iddo

13.2

Proof that the contrapositive statement is true using logical equivalence

You can see that the contrapositive of an implication is true by considering the following:
The statement p → q is logically equivalent to ¬p ∨ q, which can also be written as q ∨ ¬p.
By the same token, the contrapositive statement ¬q → ¬p is logically equivalent to ¬(¬q) ∨ ¬p
which, using double negation on q, becomes q ∨ ¬p.
This, of course, is the same logical statement.
Version: 2 Owner: sprocketboy Author(s): sprocketboy


13.3

contrapositive

Given an implication of the form

p → q

(p implies q), the contrapositive of this implication is

¬q → ¬p

(not q implies not p).
An implication and its contrapositive are equivalent statements. When proving a theorem,
it is often more convenient or more intuitive to prove the contrapositive instead.
Version: 3 Owner: vampyr Author(s): vampyr

13.4

disjunction

A disjunction is true if either of its parameters (called disjuncts) is true. Disjunction
does not correspond exactly to "or" in English (see exclusive or). Disjunction uses the symbol
∨, or sometimes + when taken in an algebraic context. Hence, the disjunction of a and b would be
written

a ∨ b

or

a + b

The truth table for disjunction is

a b | a ∨ b
F F | F
F T | T
T F | T
T T | T
Version: 8 Owner: akrowne Author(s): akrowne

13.5

equivalent

Two statements A and B are said to be (logically) equivalent if A is true if and only if B is
true (that is, A implies B and B implies A). This is usually written as A ⇔ B. For example,
for any integer z, the statement "z is positive" is equivalent to "z is not negative and z ≠ 0".
Version: 1 Owner: sleske Author(s): sleske

13.6

implication

An implication is a logical construction that essentially tells us that if one condition is true, then
another condition must also be true. Formally it is written

a → b

or

a ⇒ b

which would be read "a implies b", or "a therefore b", or "if a, then b" (to name a few).
Implication is often confused with "if and only if", the biconditional truth function (↔).
They are not, however, the same. The implication a → b is true even if only b is true. So
the statement "pigs have wings, therefore it is raining today" is true if it is indeed raining,
despite the fact that the first item is false.
In fact, any implication a → b is called vacuously true when a is false. By contrast, a ↔ b
would be false if either a or b was by itself false (a ↔ b ≡ (a ∧ b) ∨ (¬a ∧ ¬b), or in terms of
implication, (a → b) ∧ (b → a)).
It may be useful to remember that a → b only tells you that it cannot be the case that
b is false while a is true; b must "follow" from a (and "false" does follow from "false").
Alternatively, a → b is in fact equivalent to

¬b → ¬a

The truth table for implication is therefore

a b | a → b
F F | T
F T | T
T F | F
T T | T
Version: 3 Owner: akrowne Author(s): akrowne

13.7

propositional logic

A propositional logic is a logic in which the only objects are propositions, that is,
objects which themselves have truth values. Variables represent propositions, and there are
no relations, functions, or quantifiers except for the constants ⊤ and ⊥ (representing true
and false respectively). The connectives are typically ¬, ∧, ∨, and → (representing negation,
conjunction, disjunction, and implication); however, this set is redundant, and other choices
can be used (⊤ and ⊥ can also be considered 0-ary connectives).
A model for propositional logic is just a truth function ν on a set of variables. Such a truth
function can be easily extended to a truth function ν̄ on all formulas which contain only the
variables ν is defined on by adding recursive clauses for the usual definitions of the connectives.
For instance ν̄(φ ∧ ψ) = T iff ν̄(φ) = ν̄(ψ) = T.
Then we say ν ⊨ φ if ν̄(φ) = T, and we say ⊨ φ if ν ⊨ φ for every ν such that ν̄(φ) is defined
(and in that case we say that φ is a tautology).
Propositional logic is decidable: there is an easy way to determine whether a sentence is a
tautology. It can be done using truth tables, since a truth table for a particular formula can
be easily produced, and the formula is a tautology if every assignment of truth values makes
it true. It is not known whether this method is efficient: the equivalent problem of whether
a formula is satisfiable (that is, whether its negation is not a tautology) is a canonical example
of an NP-complete problem.
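The truth-table decision procedure is short to implement. In this Python sketch, formulas are modeled as Boolean functions (an illustrative encoding); `is_tautology` simply checks all 2ⁿ assignments:

```python
from itertools import product

# phi is a tautology iff every assignment of truth values to its
# variables makes it true; formulas are Python functions of booleans.
def is_tautology(phi, variables):
    return all(
        phi(**dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )

imp = lambda a, b: (not a) or b   # material implication

# Excluded middle: p or not p
print(is_tautology(lambda p: p or not p, ["p"]))                  # True
# An implication is equivalent to its contrapositive
print(is_tautology(lambda p, q: imp(p, q) == imp(not q, not p),
                   ["p", "q"]))                                    # True
# p -> q alone is not a tautology
print(is_tautology(lambda p, q: imp(p, q), ["p", "q"]))            # False
```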
Version: 3 Owner: Henry Author(s): Henry

13.8

theory

If L is a logical language for some logic 𝔏, and T is a set of formulas of L with no free variables,
then T is a theory of 𝔏.
We write T ⊨ φ, for any formula φ, if for every model M of 𝔏 such that M ⊨ T, we have M ⊨ φ.
We write T ⊢ φ for "there is a proof of φ from T".
Version: 1 Owner: Henry Author(s): Henry

13.9

transitive

The transitive property of logic is

(a → b) ∧ (b → c) → (a → c)

where → is the conditional truth function. From this we can derive that

(a = b) ∧ (b = c) → (a = c)

Version: 1 Owner: akrowne Author(s): akrowne

13.10

truth function

A truth function is a function that returns one of two values, one of which is interpreted
as true, and the other of which is interpreted as false. Typically either T and F are
used, or 1 and 0, respectively. Using the latter, we can write

f : {0, 1}ⁿ → {0, 1}

which defines a truth function f. That is, f is a mapping from any number (n) of true/false (0 or
1) values to a single value, which is 0 or 1.
Version: 2 Owner: akrowne Author(s): akrowne


Chapter 14
03B10 Classical first-order logic
14.1

Δ₁ bootstrapping

This proves that a number of useful relations and functions are Δ₁ in first order arithmetic,
providing a bootstrapping of parts of mathematical practice into any system including the Δ₁
relations (since the Δ₁ relations are exactly the recursive ones, this includes Turing machines).
First, we want to build a tupling relation which will allow a finite set of numbers to be
encoded by a single number. To do this we first show that R(a, b) ⇔ a|b is Δ₁. This is true
since a|b ⇔ ∃c ≤ b(a · c = b), a formula with only bounded quantifiers.
Next note that P(x) ⇔ "x is prime" is Δ₁ since P(x) ⇔ ∀y < x(y = 1 ∨ ¬y|x). Also
AP(x, y) ⇔ P(x) ∧ P(y) ∧ ∀z < y(x < z → ¬P(z)).
These two can be used to define (the graph of) a primality function, p(a) = the (a+1)-th prime.
Let p(a) = b ⇔ ∃c ≤ b^(a²)([2|c] ∧ [∀q < b ∀r ≤ b(AP(q, r) → ∀j < c[q^j|c → r^(j+1)|c])] ∧ [b^a|c]
∧ ¬[b^(a+1)|c]).
This rather awkward looking formula is worth examining, since it illustrates a principle which
will be used repeatedly. c is intended to be a number of the form 2⁰ · 3¹ · 5² and so on. If
it includes b^a but not b^(a+1) then we know that b must be the (a+1)-th prime. The definition
is so complicated because we cannot just say, as we'd like to, that p(a + 1) is the smallest prime
greater than p(a) (since we don't allow recursive definitions). Instead we embed the series
of values this recursion would take into a single number (c) and guarantee that the recursive
relationship holds for at least a terms; then we just check whether the a-th value is b.
Finally, we can define our tupling relation. Technically, since a given relation must have
a fixed arity, we define for each n a function ⟨x₀, . . . , xₙ⟩ = ∏ᵢ≤ₙ pᵢ^(xᵢ+1). Then define (x)ᵢ
to be the i-th element of x when x is interpreted as a tuple, so ⟨(x)₀, . . . , (x)ₙ⟩ = x. Note
that the tupling relation, even taken collectively, is not total. For instance 5 is not a tuple
(although it is sometimes convenient to view it as a tuple with empty spaces: ⟨ , , 5⟩). In
situations like this, and also when attempting to extract entries beyond the length, (x)ᵢ = 0
(for instance, (5)₀ = 0). On the other hand there is a 0-ary tupling relation, ⟨⟩ = 1.
Thanks to our definition of p, we have ⟨x₀, . . . , xₙ⟩ = x ⇔ x = p(0)^(x₀+1) ⋯ p(n)^(xₙ+1). This
is clearly Δ₁. (Note that we don't use the ∏ as above, since we don't have that, but since
we have a different tupling function for each n this isn't a problem.)
For the reverse, (x)ᵢ = y ⇔ ([p(i)^(y+1)|x] ∧ ¬[p(i)^(y+2)|x]) ∨ ([y = 0] ∧ ¬[p(i)|x]).
Also, define a length function by len(x) = y ⇔ ¬[p(y + 1)|x] ∧ ∀z ≤ y[p(z)|x], and a
membership relation by in(x, n) ⇔ ∃i < len(x)[(x)ᵢ = n].
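The tupling function can be tried out concretely. The Python sketch below (helper names are illustrative) encodes ⟨x₀, …, xₙ⟩ as ∏ᵢ pᵢ^(xᵢ+1) and recovers entries and length by counting prime-power divisibility, with the convention (x)ᵢ = 0 when p(i) does not divide x:

```python
# p(i): the (i+1)-th prime, by trial division (0-indexed: nth_prime(0) == 2).
def nth_prime(i):
    found, n = -1, 1
    while found < i:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return n

# <x0, ..., xn> = prod_i p_i^(x_i + 1)
def encode(xs):
    x = 1
    for i, xi in enumerate(xs):
        x *= nth_prime(i) ** (xi + 1)
    return x

# (x)_i: the exponent of p(i) in x, minus 1; 0 if p(i) does not divide x.
def entry(x, i):
    p, y = nth_prime(i), 0
    while x % p == 0:
        x //= p
        y += 1
    return max(y - 1, 0)

# One convention for length: the number of consecutive primes dividing x.
def length(x):
    i = 0
    while x % nth_prime(i) == 0:
        i += 1
    return i

t = encode([3, 0, 5])   # 2^4 * 3^1 * 5^6 = 750000
print(t, length(t), [entry(t, i) for i in range(length(t))])
```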
Armed with this, we can show that all primitive recursive functions are Δ₁. To see this, note
that x = 0, the zero function, is trivially recursive, as are x = Sy and pₙ,ₘ(x₁, . . . , xₙ) = xₘ.
The Δ₁ functions are closed under composition, since if φ(x⃗) and ψ(x⃗) both have no unbounded
quantifiers, φ(ψ(x⃗)) obviously doesn't either.
Finally, suppose we have functions f(x⃗) and g(x⃗, m, n) in Δ₁. Then define the primitive
recursion h(x⃗, y) by first defining:

h̄(x⃗, y) = z ⇔ len(z) = y ∧ ∀i < y[(z)ᵢ₊₁ = g(x⃗, i, (z)ᵢ)] ∧ [len(z) = 0 ∨ (z)₀ = f(x⃗)]

and then h(x⃗, y) = (h̄(x⃗, y))_y.
Δ₁ is also closed under minimization: if R(x⃗, y) is a Δ₁ relation then μy.R(x⃗, y) is a function
giving the least y satisfying R(x⃗, y). To see this, note that μy.R(x⃗, y) = z ⇔ R(x⃗, z) ∧
∀m < z ¬R(x⃗, m).
Finally, using primitive recursion it is possible to concatenate sequences. First, to
concatenate a single number, if s = ⟨x₀, . . . , xₙ⟩ then s ⌢₁ y = s · p(len(s) + 1)^(y+1). Then we
can define the concatenation of s with t = ⟨y₀, . . . , yₘ⟩ by defining f(s, t) = s and
g(s, t, j, i) = j ⌢₁ (t)ᵢ, and by primitive recursion, there is a function h(s, t, i) whose value is
the first i elements of t appended to s. Then s ⌢ t = h(s, t, len(t)).
We can also define ⌢ᵤ, which concatenates only those elements of t not appearing in s. This just
requires defining the graph of g to be g(s, t, j, i, x) ⇔ [in(s, (t)ᵢ) ∧ x = j] ∨ [¬in(s, (t)ᵢ) ∧ x =
j ⌢₁ (t)ᵢ].
Version: 6 Owner: Henry Author(s): Henry

14.2

Boolean

"Boolean" refers to that which can take on the values "true" or "false", or that which concerns
truth and falsity. For example: "Boolean variable", "Boolean logic", "Boolean statement",
etc.

Boolean is named for George Boole, the 19th century mathematician.


Version: 5 Owner: akrowne Author(s): akrowne

14.3

Gödel numbering

A Gödel numbering is any way of assigning numbers to the formulas of a language. This is
often useful in allowing sentences of a language to be self-referential. The number associated
with a formula φ is called its Gödel number and is denoted ⌜φ⌝.
More formally, if L is a language and G is a surjective partial function from the terms of
L to the formulas over L then G is a Gödel numbering. ⌜φ⌝ may be any term t such that
G(t) = φ. Note that G is not defined within L (there is no formula or object of L representing
G); however, properties of it (such as being in the domain of G, being a subformula, and so
on) are.
Although anything meeting the properties above is a Gödel numbering, depending on the
specific language and usage, any of the following properties may also be desired (and can
often be found if more effort is put into the numbering):
If φ is a subformula of ψ then ⌜φ⌝ < ⌜ψ⌝
For every number n, there is some φ such that ⌜φ⌝ = n
G is injective
Version: 4 Owner: Henry Author(s): Henry

14.4

Gödel's incompleteness theorems

Gödel's first and second incompleteness theorems are perhaps the most celebrated results
in mathematical logic. The basic idea behind Gödel's proofs is that by the device of
Gödel numbering, one can formulate properties of theories and sentences as arithmetical
properties of the corresponding Gödel numbers, thus allowing first order arithmetic to speak
of its own consistency, provability of some sentence, and so forth.
The original result Gödel proved in his classic paper "On Formally Undecidable Propositions
in Principia Mathematica and Related Systems" can be stated as
Theorem 1. No theory T axiomatisable in the type system of PM (i.e. in Russell's theory of types)
which contains Peano arithmetic and is ω-consistent proves all true theorems of arithmetic
(and no false ones).

Stated this way, the theorem is an obvious corollary of Tarski's result on the undefinability of truth.
This can be seen as follows. Consider a Gödel numbering G, which assigns to each formula
φ its Gödel number ⌜φ⌝. The set of Gödel numbers of all true sentences of arithmetic is
{⌜φ⌝ | N ⊨ φ}, and by Tarski's result it isn't definable by any arithmetic formula. But
assume there's a theory T, an axiomatisation Ax_T of which is definable in arithmetic, and
which proves all true statements of arithmetic. But now ∃P(P is a proof of x from Ax_T)
defines the set of (Gödel numbers of) true sentences of arithmetic, which contradicts Tarski's
result.
The proof given above is highly non-constructive. A much stronger version can actually be
extracted from Gödel's paper, namely
Theorem 2. There is a primitive recursive function G, s.t. if T is a theory with a p.r.
axiomatisation τ, and if all primitive recursive functions are representable in T, then N ⊨
G(τ) but T ⊬ G(τ).
This second form of the theorem is the one usually proved, although the theorem is usually
stated in a form for which the nonconstructive proof based on Tarski's result would suffice.
The proof for this stronger version is based on a similar idea as Tarski's result.
Consider the formula ∃P(P is a proof of x from τ), which defines a predicate Prov_τ(x)
which represents provability from τ. Assume we have enumerated the open formulae with
one variable in a sequence Bᵢ, so that every open formula occurs. Consider now the sentence
¬Prov_τ(B_x(x)), which defines the "non-provability from τ" predicate. Now, since ¬Prov_τ(B_x(x)) is
an open formula with one variable, it must be B_k for some k. Thus we can consider the
closed sentence B_k(k). This sentence is equivalent to ¬Prov_τ(subst(⌜¬Prov_τ(x)⌝, k)), but
since subst(⌜¬Prov_τ(x)⌝, k) is just B_k(k), it asserts its own unprovability.
Since all the steps we took to get the undecided but true sentence B_k(k) were very simple
mechanical manipulations of Gödel numbers guaranteed to terminate in bounded time, we
have in fact produced the p.r. function G required by the statement of the theorem.
The first version of the proof can be used to show that also many non-axiomatisable theories
are incomplete. For example, consider PA + all true Π₁ sentences. Since Π₁ truth is
definable at the Σ₂ level, this theory is definable in arithmetic by a formula τ. However, it's not
complete, since otherwise ∃p(p is a proof of x from τ) would define the set of true sentences
of arithmetic. This can be extended to show that no arithmetically definable theory with
sufficient expressive power is complete.
The second version of Gödel's first incompleteness theorem suggests a natural way to extend
theories to stronger theories which are exactly as sound as the original theories. This sort
of process has been studied by Turing, Feferman, Fenstad and others under the names of
"ordinal logics" and "transfinite recursive progressions of arithmetical theories".
Gödel's second incompleteness theorem concerns what a theory can prove about its own
provability predicate, in particular whether it can prove that no contradiction is provable.
The answer under very general settings is that a theory can't prove that it is consistent
without actually being inconsistent.
The second incompleteness theorem is best presented by means of a provability logic. Consider
an arithmetic theory T which is p.r. axiomatised by τ. We extend the language this
theory is expressed in with a new sentence-forming operator □, so that any sentence in
parentheses prefixed by □ is a sentence. Thus for example □(0 = 1) is a formula. Intuitively,
we want □(φ) to express the provability of φ from τ. Thus the semantics of our new
language is exactly the same as that of the original language, with the additional rule that
□(φ) is true if and only if τ ⊢ φ. There is a slight difficulty here; φ might itself contain
boxed expressions, and we haven't yet provided any semantics for these. The answer is simple:
whenever a boxed expression □(ψ) occurs within the scope of another box, we replace
it with the arithmetical statement Prov_τ(ψ). Thus for example the truth of □(¬□(0 = 1))
is equivalent to τ ⊢ ¬Prov_τ(⌜0 = 1⌝). Assuming that τ is strong enough to prove all true
instances of Prov_τ(⌜φ⌝), we can in fact interpret the whole of the new boxed language by
the translation. This is what we shall do, so formally τ ⊢ φ (where φ might contain boxed
sentences) is taken to mean τ ⊢ φ′ where φ′ is obtained by replacing the boxed expressions
with arithmetical formulae as above.
There are a number of restrictions we must impose on τ (and thus on □, the meaning of
which is determined by τ). These are known as the Hilbert-Bernays derivability conditions,
and they are as follows:
if τ ⊢ φ then τ ⊢ □(φ)
τ ⊢ □(φ) → □(□(φ))
τ ⊢ □(φ → ψ) → (□(φ) → □(ψ))
A statement Cons_τ asserts the consistency of τ if it's equivalent to ¬□(0 = 1). Gödel's first
incompleteness theorem shows that there is a sentence B_k(k) for which the following is true:
¬□(0 = 1) → ◊(B_k(k)) ∧ ◊(¬B_k(k)), where ◊ is the dual of □, i.e. ◊(φ) ⇔ ¬□(¬φ).
A careful analysis reveals that this is provable in any τ which satisfies the derivability
conditions, i.e. τ ⊢ ¬□(0 = 1) → ◊(B_k(k)) ∧ ◊(¬B_k(k)). Assume now that τ can prove
¬□(0 = 1), i.e. that τ can prove its own consistency. Then τ can prove ◊(B_k(k)) ∧
◊(¬B_k(k)). But this means that τ can prove B_k(k)! Thus τ is inconsistent.
Version: 4 Owner: Aatu Author(s): Aatu

14.5

Lindenbaum algebra

Let L be a first order language. We define the equivalence relation ∼ over formulas of L by
φ ∼ ψ if and only if ⊢ φ ↔ ψ. Let B = L/∼ be the set of equivalence classes. We define
the operations ∨ and ∧ and complementation, denoted ¬[φ], on B by:

[φ] ∨ [ψ] = [φ ∨ ψ]
[φ] ∧ [ψ] = [φ ∧ ψ]
¬[φ] = [¬φ]

We let 0 = [⊥] and 1 = [⊤]. Then the structure (B, ∨, ∧, ¬, 0, 1) is a Boolean algebra,
called the Lindenbaum algebra.
Note that it may be possible to define the Lindenbaum algebra on extensions of first order logic,
as long as there is a notion of formal proof that can allow the definition of the equivalence
relation.
Version: 12 Owner: jihemme Author(s): jihemme

14.6

Lindström's theorem

One of the very first results of the study of model theoretic logics is a characterisation
theorem due to Per Lindström. He showed that classical first order logic is the strongest
logic having the following properties:
being closed under contradictory negation
compactness
the Löwenheim-Skolem theorem
Also, he showed that first order logic can be characterised as the strongest logic for which
the following hold:
completeness (r.e. axiomatisability)
the Löwenheim-Skolem theorem
The notion of strength used here is as follows: a logic L′ is stronger than L, or as strong,
if the class of sets definable in L is included in the class of sets definable in L′.
Version: 2 Owner: Aatu Author(s): Aatu


14.7

Presburger arithmetic

Presburger arithmetic is a weakened form of arithmetic which includes the structure
ℕ, the constant 0, the unary function S, the binary function +, and the binary relation <.
Essentially, it is Peano arithmetic without multiplication.
Presburger arithmetic is decidable, but is consequently very limited in what it can express.
Version: 2 Owner: Henry Author(s): Henry

14.8

R-minimal element

Let S be a set and R be a relation on S. An element a ∈ S is said to be R-minimal if and
only if there is no x ∈ S such that xRa.
Version: 1 Owner: jihemme Author(s): jihemme

14.9

Skolemization

Skolemization is a way of removing existential quantifiers from a formula. Variables bound
by existential quantifiers which are not inside the scope of universal quantifiers can simply
be replaced by constants: ∃x[x < 3] can be changed to c < 3, with c a suitable constant.
When the existential quantifier is inside a universal quantifier, the bound variable must
be replaced by a Skolem function of the variables bound by universal quantifiers. Thus
∀x[x = 0 ∨ ∃y[x = y + 1]] becomes ∀x[x = 0 ∨ x = f(x) + 1].
This is used in second order logic to move all existential quantifiers outside the scope of first
order universal quantifiers. This can be done since second order quantifiers can quantify over
functions. For instance ∀¹x∀¹y∃¹z φ(x, y, z) is equivalent to ∃²F∀¹x∀¹y φ(x, y, F(x, y)).
Version: 1 Owner: Henry Author(s): Henry

14.10

arithmetical hierarchy

The arithmetical hierarchy is a hierarchy of either (depending on the context) formulas
or relations. The relations of a particular level of the hierarchy are exactly the relations
defined by the formulas of that level, so the two uses are essentially the same.
The first level consists of formulas with only bounded quantifiers; the corresponding relations
are also called the primitive recursive relations (this definition is equivalent to the definition
from computer science). This level is called any of Σ⁰₀, Π⁰₀ and Δ⁰₀, depending on context.
A formula φ is Σ⁰ₙ if there is some Δ⁰₁ formula ψ such that φ can be written:

φ(k⃗) = ∃x₁∀x₂ ⋯ Qxₙ ψ(k⃗, x⃗)

where Q is either ∀ or ∃, whichever maintains the pattern of alternating quantifiers.
The Σ⁰₁ relations are the same as the recursively enumerable relations.
Similarly, φ is a Π⁰ₙ relation if there is some Δ⁰₁ formula ψ such that:

φ(k⃗) = ∀x₁∃x₂ ⋯ Qxₙ ψ(k⃗, x⃗)

where Q is either ∀ or ∃, whichever maintains the pattern of alternating quantifiers.
A formula is Δ⁰ₙ if it is both Σ⁰ₙ and Π⁰ₙ. Since each Σ⁰ₙ formula is just the negation of a Π⁰ₙ
formula and vice-versa, the Σ⁰ₙ relations are the complements of the Π⁰ₙ relations.
The relations in Δ⁰₁ = Σ⁰₁ ∩ Π⁰₁ are the recursive relations.
Higher levels of the hierarchy correspond to broader and broader classes of relations. A
formula or relation which is Σ⁰ₙ (or, equivalently, Π⁰ₙ) for some integer n is called arithmetical.
The superscript 0 is often omitted when it is not necessary to distinguish from the analytic
hierarchy.
Functions can be described as being in one of the levels of the hierarchy if the graph of the
function is in that level.
Version: 14 Owner: iddo Author(s): yark, iddo, Henry

14.11

arithmetical hierarchy is a proper hierarchy

By definition, we have Δ⁰ₙ = Σ⁰ₙ ∩ Π⁰ₙ. In addition, Σ⁰ₙ ∪ Π⁰ₙ ⊆ Δ⁰ₙ₊₁.
This is proved by vacuous quantification. If R is equivalent to φ(n⃗) then R is equivalent to
∀xφ(n⃗) and ∃xφ(n⃗), where x is some variable that does not occur free in φ.
More significant is the proof that all containments are proper. First, let n ≥ 1 and U be
universal for 2-ary Σ⁰ₙ relations. Then D(x) ⇔ U(x, x) is obviously Σ⁰ₙ. But suppose D ∈ Δ⁰ₙ.
Then ¬D ∈ Δ⁰ₙ, so ¬D ∈ Σ⁰ₙ. Since U is universal, there is some e such that ¬D(x) ⇔ U(e, x),
and therefore ¬D(e) ⇔ U(e, e) ⇔ D(e). This is clearly a contradiction, so D ∈ Σ⁰ₙ \ Δ⁰ₙ
and ¬D ∈ Π⁰ₙ \ Δ⁰ₙ.
In addition, consider the recursive join of D and ¬D, defined by

D ⊕ ¬D(x) ⇔ (∃y < x[x = 2 · y] ∧ D(x)) ∨ (¬∃y < x[x = 2 · y] ∧ ¬D(x))

Clearly both D and ¬D can be recovered from D ⊕ ¬D, so it is contained in neither Σ⁰ₙ nor
Π⁰ₙ. However the definition above has only bounded quantifiers apart from those in D and
¬D, so D ⊕ ¬D ∈ Δ⁰ₙ₊₁ \ (Σ⁰ₙ ∪ Π⁰ₙ).
Version: 3 Owner: Henry Author(s): Henry

14.12

atomic formula

Let L be a first order language, and suppose it has signature Σ. A formula φ of L is said to
be atomic if and only if:
1. φ = (t₁ = t₂), where t₁ and t₂ are terms; or
2. φ = R(t₁, ..., tₙ), where R is an n-ary relation symbol.
Version: 1 Owner: jihemme Author(s): jihemme

14.13

creating an infinite model

From the syntactic compactness theorem for first order logic, we get this nice (and useful)
result:
Let T be a theory of first-order logic. If T has finite models of unboundedly large sizes, then
T also has an infinite model.
Define the propositions

φₙ ≡ ∃x₁ . . . ∃xₙ . (x₁ ≠ x₂) ∧ . . . ∧ (x₁ ≠ xₙ) ∧ (x₂ ≠ x₃) ∧ . . . ∧ (xₙ₋₁ ≠ xₙ)

(φₙ says "there exist (at least) n different elements in the world"). Note that . . . ⊢ φₙ ⊢
. . . ⊢ φ₂ ⊢ φ₁. Define a new theory

T∞ = T ∪ {φ₁, φ₂, . . .} .

For any finite subset T′ ⊆ T∞, we claim that T′ is consistent: Indeed, T′ contains axioms of
T, along with finitely many of {φₙ}ₙ≥₁. Let φₘ correspond to the largest index appearing in
T′. If Mₘ ⊨ T is a model of T with at least m elements (and by hypothesis, such a model
exists), then Mₘ ⊨ T ∪ {φₘ} ⊢ T′.
So every finite subset of T∞ is consistent; by the compactness theorem for first-order logic,
T∞ is consistent, and by Gödel's completeness theorem for first-order logic it has a model
M. Then M ⊨ T∞ ⊢ T, so M is a model of T with infinitely many elements (M ⊨ φₙ for
any n, so M has at least n elements for all n).
Version: 3 Owner: ariels Author(s): ariels

14.14

criterion for consistency of sets of formulas

Let L be a first order language, and let Γ ⊆ L be a set of sentences. Then Γ is consistent if
and only if every finite subset of Γ is consistent.
Version: 2 Owner: jihemme Author(s): jihemme

14.15

deductions are Δ₁

Using the example of Gödel numbering, we can show that Proves(a, x) (the statement that
a is a proof of x, which will be formally defined below) is Δ₁.
First, Term(x) should be true iff x is the Gödel number of a term. Thanks to primitive
recursion, we can define it by:

Term(x) ⇔ ∃i < x[x = ⟨0, i⟩]
∨ x = ⟨5⟩
∨ ∃y < x[x = ⟨6, y⟩ ∧ Term(y)]
∨ ∃y, z < x[x = ⟨8, y, z⟩ ∧ Term(y) ∧ Term(z)]
∨ ∃y, z < x[x = ⟨9, y, z⟩ ∧ Term(y) ∧ Term(z)]

Then AtForm(x), which is true when x is the Gödel number of an atomic formula, is defined
by:

AtForm(x) ⇔ ∃y, z < x[x = ⟨1, y, z⟩ ∧ Term(y) ∧ Term(z)]
∨ ∃y, z < x[x = ⟨7, y, z⟩ ∧ Term(y) ∧ Term(z)]

Next, Form(x), which is true only if x is the Gödel number of a formula, is defined recursively
by:

Form(x) ⇔ AtForm(x)
∨ ∃i, y < x[x = ⟨2, i, y⟩ ∧ Form(y)]
∨ ∃y < x[x = ⟨3, y⟩ ∧ Form(y)]
∨ ∃y, z < x[x = ⟨4, y, z⟩ ∧ Form(y) ∧ Form(z)]

The definition of QFForm(x), which is true when x is the Gödel number of a quantifier free
formula, is the same way except without the second clause.
Next we want to show that the set of logical tautologies is Δ₁. This will be done by
formalizing the concept of truth tables, which will require some development. First we show that
AtForms(a), which is a sequence containing the (unique) atomic formulas of a, is Δ₁. Define
it by:

AtForms(a, t) ⇔ (¬Form(a) ∧ t = 0)
∨ Form(a) ∧ (
∃x, y < a[a = ⟨1, x, y⟩ ∧ t = a]
∨ ∃x, y < a[a = ⟨7, x, y⟩ ∧ t = a]
∨ ∃i, x < a[a = ⟨2, i, x⟩ ∧ t = AtForms(x)]
∨ ∃x < a[a = ⟨3, x⟩ ∧ t = AtForms(x)]
∨ ∃x, y < a[a = ⟨4, x, y⟩ ∧ t = AtForms(x) ⌢ᵤ AtForms(y)])

We say v is a truth assignment if it is a sequence of pairs, with the first member of each
pair being an atomic formula and the second being either 1 or 0:

TA(v) ⇔ ∀i < len(v)∃x, y < (v)ᵢ[(v)ᵢ = ⟨x, y⟩ ∧ AtForm(x) ∧ (y = 1 ∨ y = 0)]

Then v is a truth assignment for a if v is a truth assignment, a is quantifier free, and every
atomic formula in a is the first member of one of the pairs in v. That is:

TAf(v, a) ⇔ TA(v) ∧ QFForm(a) ∧ ∀i < len(AtForms(a))∃j < len(v)[((v)ⱼ)₀ = (AtForms(a))ᵢ]

Then we can define when v makes a true by:

True(v, a) ⇔ TAf(v, a) ∧ (
AtForm(a) ∧ ∃i < len(v)[((v)ᵢ)₀ = a ∧ ((v)ᵢ)₁ = 1]
∨ ∃y < a[a = ⟨3, y⟩ ∧ ¬True(v, y)]
∨ ∃y, z < a[a = ⟨4, y, z⟩ ∧ (¬True(v, y) ∨ True(v, z))])

Then a is a tautology if every truth assignment makes it true:

Taut(a) ⇔ ∀v < 2^(2^AtForms(a)) [TAf(v, a) → True(v, a)]

We say that a number a is a deduction of φ if it encodes a proof of φ from a set of axioms
Ax. This means that a is a sequence where for each (a)ᵢ either:
(a)ᵢ is the Gödel number of an axiom,
(a)ᵢ is a logical tautology, or
there are some j, k < i such that (a)ⱼ = ⟨4, (a)ₖ, (a)ᵢ⟩ (that is, (a)ᵢ is a conclusion
under modus ponens from (a)ⱼ and (a)ₖ),
and the last element of a is ⌜φ⌝.
If Ax is Δ₁ (almost every system of axioms, including PA, is Δ₁) then Proves(a, x), which is
true if a is a deduction whose last value is x, is also Δ₁. This is fairly simple to see from the
above results (let Ax(x) be the relation specifying that x is the Gödel number of an axiom):

Proves(a, x) ⇔ ∀i < len(a)[Ax((a)ᵢ) ∨ ∃j, k < i[(a)ⱼ = ⟨4, (a)ₖ, (a)ᵢ⟩] ∨ Taut((a)ᵢ)]
Version: 5 Owner: Henry Author(s): Henry

14.16

example of Gödel numbering

We can define by recursion a function e from formulas of arithmetic to numbers, and the
corresponding Gödel numbering as the inverse.
The symbols of the language of arithmetic are =, ∀, ¬, →, 0, S, <, +, ·, the variables vᵢ
for any integer i, and ( and ). ( and ) are only used to define the order of operations, and
should be inferred where appropriate in the definition below.
We can define a function e by recursion as follows:

e(vᵢ) = ⟨0, i⟩
e(σ = τ) = ⟨1, e(σ), e(τ)⟩
e(∀vᵢ φ) = ⟨2, e(vᵢ), e(φ)⟩
e(¬φ) = ⟨3, e(φ)⟩
e(φ → ψ) = ⟨4, e(φ), e(ψ)⟩
e(0) = ⟨5⟩
e(Sσ) = ⟨6, e(σ)⟩
e(σ < τ) = ⟨7, e(σ), e(τ)⟩
e(σ + τ) = ⟨8, e(σ), e(τ)⟩
e(σ · τ) = ⟨9, e(σ), e(τ)⟩

Clearly e⁻¹ is a Gödel numbering, with ⌜φ⌝ = e(φ).
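The recursion for e can be mirrored directly in code. In the sketch below, formulas are nested tuples and the tuple codes ⟨a, b, …⟩ are represented as Python tuples rather than via the prime-power pairing (which would produce astronomically large numbers for nested terms); the operator names in `codes` are illustrative:

```python
# e maps a formula/term, written as a nested tuple like
# ("=", ("var", 0), ("S", ("0",))), to its code, following the clauses
# of the entry: variables -> (0, i), equality -> (1, ...), forall -> (2, ...),
# negation -> (3, ...), implication -> (4, ...), 0 -> (5,), S -> (6, ...),
# < -> (7, ...), + -> (8, ...), * -> (9, ...).
def e(t):
    codes = {"=": 1, "forall": 2, "not": 3, "imp": 4,
             "S": 6, "<": 7, "+": 8, "*": 9}
    op, args = t[0], t[1:]
    if op == "var":
        return (0, args[0])
    if op == "0":
        return (5,)
    return (codes[op],) + tuple(e(a) for a in args)

# The formula  v0 = S0:
phi = ("=", ("var", 0), ("S", ("0",)))
print(e(phi))  # (1, (0, 0), (6, (5,)))
```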
Version: 3 Owner: Henry Author(s): Henry

14.17

example of well-founded induction

As an example of the use of well-founded induction in the case where the order is not a
linear one, I'll prove the fundamental theorem of arithmetic: every natural number has a
prime factorization.
First note that the division relation | is well-founded. This fact is proven in every algebra
book. The |-minimal elements are the prime numbers. We detail the two steps of the proof:
1. If n is prime, then n is its own factorization into primes, so the assertion is true for
the |-minimal elements.
2. If n is not prime, then n has a non-trivial factorization (by definition of not being
prime), i.e. n = mℓ, where m, ℓ ≠ 1. By induction, m and ℓ have prime factorizations,
and we can see that this implies that n has one too. This takes care of case 2.
Here are other commonly used well-founded sets:
1. ideals of a Noetherian ring ordered by inverse proper inclusion;
2. ideals of an Artinian ring ordered by inclusion;
3. graphs ordered by minors (a graph A is a minor of B if and only if it can be obtained
from B by collapsing edges);
4. ordinal numbers;
5. etc.
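The inductive proof translates directly into a recursive algorithm, which terminates precisely because the division order is well-founded. A minimal Python sketch (trial division is just an illustrative way of finding a nontrivial factor):

```python
# Prime factorization by recursion on the divisibility order: a number
# with no nontrivial divisor is |-minimal (prime) and is its own
# factorization; otherwise n = m * (n // m) and we recurse on both parts.
def factor(n):
    assert n > 1
    for m in range(2, int(n ** 0.5) + 1):
        if n % m == 0:
            return factor(m) + factor(n // m)
    return [n]   # n is |-minimal, i.e. prime

print(factor(60))  # [2, 2, 3, 5]
```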
Version: 4 Owner: jihemme Author(s): jihemme

14.18

first order language

Terms and formulas of first order logic are constructed with the classical logical symbols ¬,
∧, ∨, →, ↔, ∀, ∃, and also ( and ), and a set

Σ = (⋃ₙ Relₙ) ∪ (⋃ₙ Funₙ) ∪ Const

where for each natural number n,
Relₙ is a (usually countable) set of n-ary relation symbols,
Funₙ is a (usually countable) set of n-ary function symbols,
Const is a (usually countable) set of constant symbols.
We require that all these sets be disjoint. The elements of the set Σ are the only non-logical
symbols that we are allowed to use when we construct terms and formulas. They form the
signature of the language. So far they are only symbols, so they don't mean anything. For
most structures that we encounter, the set Σ is finite, but we allow it to be infinite, even
uncountable, as this sometimes makes things easier, and just about everything still works
when the signature is uncountable. We also assume that we have an unlimited supply of
variables, with the only constraint that the collection of variables form a set, which should
be disjoint from the other sets of non-logical symbols.
The arity of a function or relation symbol is the number of parameters the symbol takes.
It is usually assumed to be a property of the symbol, and it is bad grammar to use
an n-ary function or relation with m parameters if m ≠ n.
Terms are built inductively according to the following rules:
1. Any variable is a term;
2. Any constant symbol is a term;
3. If f is an n-ary function symbol, and t₁, ..., tₙ are terms, then f(t₁, ..., tₙ) is a term.
With terms in hand, we build formulas inductively by a finite application of the following
rules:
1. If t₁ and t₂ are terms, then t₁ = t₂ is a formula;
2. If R is an n-ary relation symbol and t₁, ..., tₙ are terms, then R(t₁, ..., tₙ) is a formula;
3. If φ is a formula, then so is ¬φ;
4. If φ and ψ are formulas, then so is φ ∨ ψ;
5. If φ is a formula, and x is a variable, then ∃x(φ) is a formula.
The other logical symbols are obtained in the following way:

φ ∧ ψ := ¬(¬φ ∨ ¬ψ)
φ → ψ := ¬φ ∨ ψ
φ ↔ ψ := (φ → ψ) ∧ (ψ → φ)
∀x.φ := ¬(∃x(¬φ))

All logical symbols are used when building formulas.


Version: 8 Owner: jihemme Author(s): jihemme

14.19

first order logic

A logic is first order if it has exactly one type. Usually the term refers specifically to the
logic with connectives ¬, ∨, ∧, →, and ↔ and the quantifiers ∀ and ∃, all given the usual
semantics:
¬φ is true iff φ is not true
φ ∨ ψ is true iff either φ is true or ψ is true
∀xφ(x) is true iff φ(t/x) is true for every object t (where φ(t/x) is the result of replacing every
unbound occurrence of x in φ with t)
φ ∧ ψ is the same as ¬(¬φ ∨ ¬ψ)
φ → ψ is the same as (¬φ) ∨ ψ
φ ↔ ψ is the same as (φ → ψ) ∧ (ψ → φ)
∃xφ(x) is the same as ¬∀x(¬φ(x))

However languages with slightly different quantifiers and connectives are sometimes still
called first order as long as there is only one type.
Version: 4 Owner: Henry Author(s): Henry

14.20 first order theories

Let L be a first-order language. A theory in L is a set of sentences of L, i.e. a set of
formulas of L that have no free variables.
Definition. A theory T is said to be consistent if and only if T ⊬ ⊥, where ⊥ stands for
false. In other words, T is consistent if one cannot derive a contradiction from it. If φ is
a sentence of L, then we say φ is consistent with T if and only if the theory T ∪ {φ} is
consistent.
Definition. A theory T ⊆ L is said to be complete if and only if for every formula φ ∈ L,
either T ⊢ φ or T ⊢ ¬φ.
Lemma. A theory T in L is complete if and only if it is maximal consistent. In other words,
T is complete if and only if for every φ ∉ T, T ∪ {φ} is inconsistent.

Theorem. (Tarski) Every consistent theory T in L can be extended to a complete theory.

Proof: Use Zorn's lemma on the collection of consistent theories extending T.

Version: 3 Owner: jihemme Author(s): jihemme

14.21 free and bound variables

In the entry first-order languages, I have mentioned the use of variables without mentioning what variables really are. A variable is a symbol that is supposed to range over the
universe of discourse. Unlike a constant, it has no fixed value.
There are two ways in which a variable can occur in a formula: free or bound. Informally,
a variable is said to occur free in a formula if and only if it is not within the scope of a
quantifier. For instance, x occurs free in φ if and only if it occurs in it as a symbol, and no
subformula of φ is of the form ∃x.ψ. Here the x after the ∃ is to be taken literally: it is x
and no other symbol.
The set FV(φ) of free variables of φ is defined by well-founded induction on the construction
of formulas. First we define Var(t), where t is a term, to be the set of all variables occurring
in t, and then:

FV(t1 = t2) = Var(t1) ∪ Var(t2)

FV(R(t1, ..., tn)) = Var(t1) ∪ · · · ∪ Var(tn)

FV(¬φ) = FV(φ)

FV(φ ∨ ψ) = FV(φ) ∪ FV(ψ)

FV(∃x(φ)) = FV(φ) \ {x}
When for some φ, the set FV(φ) is not empty, then it is customary to write φ as φ(x1, ..., xn),
in order to stress the fact that there are some free variables left in φ, and that those free
variables are among x1, ..., xn. When x1, ..., xn appear free in φ, then they are considered as
place-holders, and it is understood that we will have to supply values for them when
we want to determine the truth of φ. If FV(φ) = ∅, then φ is called a sentence.
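The recursive definition of FV can be transcribed almost verbatim. A minimal Python sketch, assuming formulas are encoded as nested tuples; the tag names ("not", "or", "exists") and the lowercase-variable convention are illustrative:

```python
# Sketch: free variables of a formula, mirroring the recursive definition above.
# Formulas are nested tuples, e.g. ("exists", "x", body); encoding is illustrative.

def var(t):
    """Var(t): all variables occurring in a term (variables are lowercase strings)."""
    if isinstance(t, str):
        return {t} if t.islower() else set()
    return set().union(*(var(s) for s in t[1:]))

def fv(phi):
    op = phi[0]
    if op == "=":          # FV(t1 = t2) = Var(t1) ∪ Var(t2)
        return var(phi[1]) | var(phi[2])
    if op == "R":          # FV(R(t1, ..., tn)) = union of the Var(tk)
        return set().union(*(var(t) for t in phi[1:]))
    if op == "not":        # FV(¬φ) = FV(φ)
        return fv(phi[1])
    if op == "or":         # FV(φ ∨ ψ) = FV(φ) ∪ FV(ψ)
        return fv(phi[1]) | fv(phi[2])
    if op == "exists":     # FV(∃x(φ)) = FV(φ) \ {x}
        return fv(phi[2]) - {phi[1]}
    raise ValueError(op)

# x occurs both free and bound, y occurs free (terms simplified to bare variables):
phi = ("or", ("R", "x"), ("exists", "x", ("=", "x", "y")))
print(fv(phi))   # {'x', 'y'}
```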
If a variable never occurs free in ψ (and occurs in it as a symbol), then we say the variable is
bound. A variable x is bound in ψ if and only if ∃x(φ) or ∀x(φ) is a subformula of ψ for some φ.
The problem with this definition is that a variable can occur both free and bound in the
same formula. For example, consider the following formula of the language {+, ·, 0, 1} of ring
theory:

x + 1 = 0 ∨ ∃x(x + y = 1)

The variable x occurs both free and bound here. However, the following lemma tells us that
we can always avoid this situation:
Lemma 1. It is possible to rename the bound variables without affecting the truth of a
formula. In other words, if ψ = ∃x(φ) or ψ = ∀x(φ), and z is a variable not occurring in φ, then
⊢ ψ ↔ ∃z(φ(z/x)) (respectively ∀z(φ(z/x))), where φ(z/x) is the formula obtained from φ by
replacing every free occurrence of x by z.
Version: 5 Owner: jihemme Author(s): jihemme

14.22 generalized quantifier

Generalized quantifiers are an abstract way of defining quantifiers.


The underlying principle is that formulas quantified by a generalized quantifier are true if
the set of elements satisfying those formulas belong in some relation associated with the
quantifier.

Every generalized quantifier has an arity, which is the number of formulas it takes as arguments, and a type, which for an n-ary quantifier is a tuple of length n. The tuple represents
the number of quantified variables for each argument.
The most common quantifiers are those of type ⟨1⟩, including ∀ and ∃. If Q is a quantifier
of type ⟨1⟩, M is the universe of a model, and Q_M is the relation associated with Q in that
model, then Qxφ(x) ↔ {x ∈ M | φ(x)} ∈ Q_M.
So ∀_M = {M}, since the quantified formula is only true when all elements satisfy it. On the
other hand ∃_M = P(M) \ {∅}.
In general, the monadic quantifiers are those of type ⟨1, . . . , 1⟩ and if Q is an n-ary monadic
quantifier then Q_M ⊆ P(M)^n. Härtig's quantifier, for instance, is of type ⟨1, 1⟩, and
I_M = {⟨X, Y⟩ | X, Y ⊆ M ∧ |X| = |Y|}.
A quantifier Q is polyadic if it is of type ⟨n_1, . . . , n_n⟩ where each n_i ∈ ℕ. Then:

Q_M ⊆ ∏_i P(M^(n_i))

These can get quite elaborate; Wxyφ(x, y) is a ⟨2⟩ quantifier where X ∈ W_M ↔ X is a
well-ordering. That is, it is true if the set of pairs making φ true is a well-ordering.
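On a finite universe these definitions can be checked directly. A Python sketch, not from the entry itself; the universe, the frozenset encoding of subsets, and the predicate examples are illustrative assumptions:

```python
# Sketch: generalized quantifiers as relations over a finite universe M.
# A type-<1> quantifier Q is a set Q_M of subsets of M; Qx phi(x) holds
# iff {x in M | phi(x)} is a member of Q_M.

from itertools import chain, combinations

M = {0, 1, 2, 3}

powerset = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(M), r) for r in range(len(M) + 1))]

forall_M = {frozenset(M)}              # only the whole universe
exists_M = {s for s in powerset if s}  # every non-empty subset

def holds(quantifier_M, phi):
    return frozenset(x for x in M if phi(x)) in quantifier_M

print(holds(forall_M, lambda x: x >= 0))  # True: every element satisfies phi
print(holds(exists_M, lambda x: x > 2))   # True: {3} is non-empty

# Hartig's quantifier I (type <1, 1>): |{x | phi(x)}| = |{y | psi(y)}|
def hartig(phi, psi):
    return len([x for x in M if phi(x)]) == len([y for y in M if psi(y)])

print(hartig(lambda x: x < 2, lambda y: y > 1))  # True: both sets have size 2
```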
Version: 1 Owner: Henry Author(s): Henry

14.23 logic

Generally, by logic, people mean first order logic, a formal set of rules for building mathematical statements out of symbols like ¬ (negation) and → (implication), along with quantifiers
like ∀ (for every) and ∃ (there exists).
More generally, a logic is any set of rules for forming sentences (the logic's syntax) together
with rules for assigning truth values to them (the logic's semantics). Normally it includes
a (possibly empty) set of types T (also called sorts), which represent the different kinds of
objects that the theory discusses (typical examples might be sets, numbers, or sets of numbers). In addition it specifies particular quantifiers, connectives, and variables. Particular
theories in the logic can then add relations and functions to fully specify a logical language.
Version: 5 Owner: Henry Author(s): Henry


14.24 proof of compactness theorem for first order logic

The theorem states that if a set Σ of sentences of a first-order language L is inconsistent, then
some finite subset of it is inconsistent. Suppose Σ ⊆ L is inconsistent. Then by definition
Σ ⊢ ⊥, i.e. there is a formal proof of false using only assumptions from Σ. Formal proofs
are finite objects, so let Δ collect all the formulas of Σ that are used in the proof.
Version: 1 Owner: jihemme Author(s): jihemme

14.25 proof of principle of transfinite induction

To prove the transfinite induction theorem, we note that the class of ordinals is well-ordered
by <. So suppose for some Φ, there are ordinals α such that Φ(α) is not true. Suppose
further that Φ satisfies the hypothesis, i.e. ∀α((∀β(β < α → Φ(β))) → Φ(α)). We will reach a
contradiction.
The class C = {α : ¬Φ(α)} is not empty. Note that it may be a proper class, but this is not
important. Let β = min(C) be the <-minimal element of C. Then by assumption, for every
γ < β, Φ(γ) is true. Thus, by hypothesis, Φ(β) is true, contradiction.
Version: 8 Owner: jihemme Author(s): jihemme, quadrate

14.26 proof of the well-founded induction principle

This proof is very similar to the proof of the transfinite induction theorem. Suppose Φ is
defined for a well-founded set (S, R), and suppose Φ(a) is not true for every a ∈ S. Assume
further that Φ satisfies requirements 1 and 2 of the statement. Since R is a well-founded
relation, the set {a ∈ S : ¬Φ(a)} has an R-minimal element r. This element is either an
R-minimal element of S itself, in which case condition 1 is violated, or it has R-predecessors. In
this case, we have by minimality Φ(s) for every s such that sRr, and by condition 2, Φ(r) is
true, contradiction.
Version: 4 Owner: jihemme Author(s): jihemme

14.27 quantifier

A quantifier is a logical symbol which makes an assertion about the set of values which
make one or more formulas true. This is an exceedingly general concept; the vast majority of
mathematics is done with the two standard quantifiers, ∀ and ∃.

The universal quantifier ∀ takes a variable and a formula and asserts that the formula
holds for any value of x. A typical example would be a sentence like:

∀x[0 ≤ x]

which states that no matter what value x takes, 0 ≤ x.
The existential quantifier ∃ is the dual; that is, the formula ∃xφ(x) is equivalent to
¬∀x¬φ(x). It states that there is some x satisfying the formula, as in

∃x[x > 0]

which states that there is some value of x greater than 0.
The scope of a quantifier is the portion of a formula where it binds its variables. Note
that previous bindings of a variable are overridden within the scope of a quantifier. In the
examples above, the scope of the quantifiers was the entire formula, but that need not be
the case. The following is a more complicated use of quantifiers:
∀x[x = 0 ∨ ∃y[x = y + 1 ∧ (y = 0 ∨ ∃x[y = x + 1])]]

Here the scope of the universal quantifier is the entire formula; the scope of the first
existential quantifier is ∃y[x = y + 1 ∧ (y = 0 ∨ ∃x[y = x + 1])]; and the scope of the second
existential quantifier is ∃x[y = x + 1]. Within this last scope, all references to x refer to
the variable bound by the existential quantifier. It is impossible to refer directly to the one
bound by the universal quantifier.
As that example illustrates, it can be very confusing when one quantifier overrides another.
Since it does not change the meaning of a sentence to change a bound variable and all bound
occurrences of it, it is better form to replace sentences like that with an equivalent but more
readable one like:
∀x[x = 0 ∨ ∃y[x = y + 1 ∧ (y = 0 ∨ ∃z[y = z + 1])]]
These sentences both assert that every number is either equal to zero, or that there is some
number one less than it, and that the number one less than it is also either zero or has
a number one less than it. [Note: This is not the most useful of sentences. It would be
nice to replace this with a mathematically simple sentence which uses nested quantifiers
meaningfully.]


The quantifiers may not range over all objects. That is, ∀xφ(x) may not specify that x
can be any object, but rather any object belonging to some class of objects. Similarly
∃xφ(x) may specify that there is some x within that class which satisfies φ. For instance
second order logic has two universal quantifiers, ∀1 and ∀2 (with corresponding existential
quantifiers), and variables bound by them only range over the first and second order objects
respectively. So ∀1 x[0 ≤ x] only states that all numbers are greater than or equal to 0, not
that sets of numbers are as well (which would be meaningless).
A particular use of a quantifier is called bounded or restricted if it limits the objects
to a smaller range. This is not quite the same as the situation mentioned above; in the
situation above, the definition of the quantifier does not include all objects. In this case,
quantifiers can range over everything, but in a particular formula they do not. This is expressed
in first order logic with formulas like these four:

∀x[x < c → φ(x)]    ∀x[x ∈ X → φ(x)]

∃x[x < c ∧ φ(x)]    ∃x[x ∈ X ∧ φ(x)]

The restriction is often incorporated into the quantifier. For instance the first example might
be written ∀x < c[φ(x)].
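When the domain is an initial segment of the natural numbers, the bounded forms can be transcribed directly. A Python sketch (the bound and the predicates are illustrative, not from the original entry):

```python
# Sketch: restricted quantifiers over the natural numbers below a bound c.
# The bounded forms correspond directly to all() and any() over range(c).

def forall_below(c, phi):
    """forall x [x < c -> phi(x)], with x ranging over the natural numbers."""
    return all(phi(x) for x in range(c))

def exists_below(c, phi):
    """exists x [x < c and phi(x)], with x ranging over the natural numbers."""
    return any(phi(x) for x in range(c))

print(forall_below(10, lambda x: x >= 0))       # True
print(exists_below(10, lambda x: x * x == 49))  # True (witness x = 7)
```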
A quantifier is called vacuous if the variable it binds does not appear anywhere in its scope,
such as ∀x∃y[0 ≤ x]. While vacuous quantifiers do not change the meaning of a sentence,
they are occasionally useful in finding an equivalent formula of a specific form.
While these are the most common quantifiers (in particular, they are the only quantifiers
appearing in classical first-order logic), some logics use others. The quantifier ∃!xφ(x), which
means that there is a unique x satisfying φ(x), is equivalent to ∃x[φ(x) ∧ ∀y[φ(y) → x = y]].
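Over a finite domain the defining equivalence of ∃! can be checked by direct transcription. A Python sketch (the domain and predicates are illustrative):

```python
# Sketch: unique existence over a finite domain, transcribing
# exists x [phi(x) and forall y [phi(y) -> x = y]].

def exists_unique(domain, phi):
    domain = list(domain)
    return any(phi(x) and all((not phi(y)) or x == y for y in domain)
               for x in domain)

print(exists_unique(range(10), lambda x: x * x == 49))  # True: only x = 7
print(exists_unique(range(10), lambda x: x % 2 == 0))   # False: many witnesses
```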
Other quantifiers go beyond the usual two. Examples include interpreting Qxφ(x) to mean
there are an infinite (or uncountably infinite) number of x satisfying φ(x). More elaborate
examples include the branching Henkin quantifier, written with the two quantifier rows stacked:

∀x∃y
        φ(x, y, a, b)
∀a∃b

This quantifier is similar to ∀x∃y∀a∃b φ(x, y, a, b) except that the choice of y cannot
depend on the values of a and b, and the choice of b cannot depend on the values of x and y.
This concept can be further generalized to the game-semantic, or independence-friendly,
quantifiers. All of these quantifiers are examples of generalized quantifiers.
Version: 7 Owner: Henry Author(s): Henry


14.28 quantifier free

Let L be a first order language. A formula φ is quantifier free iff it contains no quantifiers.
Let T be a complete L-theory. Let S ⊆ L. Then S is an elimination set for T iff for every
φ(x̄) ∈ L there is some ψ(x̄) ∈ S so that T ⊢ ∀x̄(φ(x̄) ↔ ψ(x̄)).
In particular, T has quantifier elimination iff the set of quantifier free formulas is an elimination set for T. In other words, T has quantifier elimination iff for every φ(x̄) ∈ L there is
some quantifier free ψ(x̄) ∈ L so that T ⊢ ∀x̄(φ(x̄) ↔ ψ(x̄)).
Version: 2 Owner: mathcam Author(s): mathcam, Timmy

14.29 subformula

Let L be a first order language and suppose φ, ψ ∈ L are formulas. Then we say that φ is a
subformula of ψ if and only if:
1. φ ≠ ψ;
2. ψ is one of ¬α, ∃x(α) or ∀x(α), and either φ = α, or φ is a subformula of α;
3. ψ is α ∨ β or α ∧ β, and either φ = α, φ = β, or φ is a subformula of α or β.
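The clauses can be turned into a routine that enumerates proper subformulas. A Python sketch, assuming formulas are encoded as nested tuples; the tag names are illustrative assumptions:

```python
# Sketch: enumerating the proper subformulas of a formula, following the
# clauses above.  Formulas are nested tuples with illustrative tag names.

def subformulas(psi):
    """All proper subformulas of psi (psi itself is excluded, since phi != psi)."""
    op = psi[0]
    if op in ("not", "exists", "forall"):
        alpha = psi[-1]                 # clause 2: immediate subformula alpha
        return {alpha} | subformulas(alpha)
    if op in ("or", "and"):
        alpha, beta = psi[1], psi[2]    # clause 3: two immediate subformulas
        return {alpha, beta} | subformulas(alpha) | subformulas(beta)
    return set()  # atomic formulas have no proper subformulas

psi = ("or", ("R", "x"), ("exists", "x", ("not", ("R", "y"))))
for s in sorted(subformulas(psi)):
    print(s)
```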
Version: 2 Owner: jihemme Author(s): jihemme

14.30 syntactic compactness theorem for first order logic

Let L be a first-order language, and Σ ⊆ L be a set of sentences. If Σ is inconsistent, then
some finite Δ ⊆ Σ is inconsistent.
Version: 2 Owner: jihemme Author(s): jihemme

14.31 transfinite induction

Suppose Φ(α) is a property defined for every ordinal α. The principle of transfinite induction states that if, for every α, the fact that Φ(β) is true for every β < α
implies that Φ(α) is true, then Φ(α) is true for every ordinal α. Formally:

∀α((∀β(β < α → Φ(β))) → Φ(α)) → ∀α(Φ(α))


The principle of transfinite induction is very similar to the principle of finite induction, except that it is stated in terms of the whole class of the ordinals.
Version: 7 Owner: jihemme Author(s): jihemme, quadrate

14.32 universal relation

If C is a class of n-ary relations with ~x as the only free variables, an (n+1)-ary formula φ is
universal for C if for any ψ ∈ C there is some e such that φ(e, ~x) ↔ ψ(~x). In other words,
φ can simulate any element of C.
Similarly, if C is a class of functions of ~x, a formula φ is universal for C if for any f ∈ C there
is some e such that φ(e, ~x) = f(~x).
Version: 3 Owner: Henry Author(s): Henry

14.33 universal relations exist for each level of the arithmetical hierarchy

Let L ∈ {Σ_n, Π_n, Δ_n} and take any k ∈ ℕ. Then there is a (k+1)-ary relation U ∈ L such
that U is universal for the k-ary relations in L.

Proof
First we prove the case where L = Δ_1, the recursive relations. We use the example of a
Gödel numbering.
Define T to be a (k+2)-ary relation such that T(e, ~x, a) if:
e = ⌜φ⌝
a is a deduction of either φ(~x) or ¬φ(~x)
Since deductions are Δ_1, it follows that T is Δ_1. Then define U′(e, ~x) to be the least a such
that T(e, ~x, a), and U(e, ~x) ↔ (U′(e, ~x))_{len(U′(e,~x))} = e. This is again Δ_1 since the Δ_1 functions
are closed under minimization.

If f is any k-ary Δ_1 function then f(~x) = U(⌜f⌝, ~x).

Now take L to be the k-ary relations in either Σ_n or Π_n. Call the universal relation for
(k+n)-ary Δ_1 relations U_Δ. Then any φ ∈ L is equivalent to a relation of the form
Qy_1 Q′y_2 · · · Q*y_n ψ(~x, ~y) where ψ ∈ Δ_1, and so U(~x) = Qy_1 Q′y_2 · · · Q*y_n U_Δ(⌜ψ⌝, ~x, ~y). Then
U is universal for L.
Finally, if L is the k-ary Δ_n relations and φ ∈ L then φ is equivalent to relations of the form
∃y_1 ∀y_2 · · · Qy_n ψ(~x, ~y) and ∀z_1 ∃z_2 · · · Q′z_n χ(~x, ~z). If the k-ary universal relations for Σ_n and
Π_n are U_Σ and U_Π respectively then φ(~x) ↔ U_Σ(⌜φ⌝, ~x) ↔ U_Π(⌜φ⌝, ~x).
Version: 2 Owner: Henry Author(s): Henry

14.34 well-founded induction

The principle of well-founded induction is a generalization of the principle of transfinite induction.


Definition. Let S be a non-empty set, and R be a partial order relation on S. Then R is
said to be a well-founded relation if and only if every non-empty subset X ⊆ S has an R-minimal
element. In the special case where R is a total order, we say S is well-ordered by R. The
structure (S, R) is called a well-founded set.
Note that R is by no means required to be a total order. A classical example of a well-founded
set that is not totally ordered is the set ℕ of natural numbers ordered by division,
i.e. aRb if and only if a divides b, and a ≠ 1. The R-minimal elements of this order are the
prime numbers.
Let Φ be a property defined on a well-founded set S. The principle of well-founded induction
states that if the following is true:
1. Φ is true for all the R-minimal elements of S;
2. for every a, if for every x such that xRa, we have Φ(x), then we have Φ(a);
then Φ is true for every a ∈ S.

As an example of application of this principle, we mention the proof of the fundamental theorem of arithmetic: every natural number has a unique factorization into prime numbers. The proof goes by
well-founded induction in the set ℕ ordered by division.
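The factorization argument can be mirrored as a recursion whose termination is exactly the well-foundedness of the division order: primes are the R-minimal base case, and each recursive call moves to a strictly smaller element of the order. A Python sketch (the function names are illustrative):

```python
# Sketch: prime factorization by recursion on the division order.
# Primes are the R-minimal elements of the order; they form the base case.

def smallest_divisor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n is prime: an R-minimal element of the division order

def factor(n):
    """Prime factorization of n >= 2, by well-founded recursion."""
    d = smallest_divisor(n)
    if d == n:
        return [n]               # base case: n is prime
    return [d] + factor(n // d)  # n // d strictly precedes n in the order

print(factor(60))   # [2, 2, 3, 5]
```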
Version: 10 Owner: jihemme Author(s): jihemme


14.35 well-founded induction on formulas

Let L be a first-order language. The formulas of L are built by a finite application of the
rules of construction. This says that the relation ≤ defined on formulas by φ ≤ ψ if and only
if φ is a subformula of ψ is a well-founded relation. Therefore, we can formulate a principle
of induction for formulas as follows: suppose P is a property defined on formulas; then P
is true for every formula of L if and only if
1. P is true for the atomic formulas;
2. for every formula φ, if P is true for every subformula of φ, then P is true for φ.
Version: 3 Owner: jihemme Author(s): jihemme


Chapter 15
03B15 Higher-order logic and type theory
15.1 Härtig's quantifier

Härtig's quantifier is a quantifier which takes two variables and two formulas, written
Ixy φ(x)ψ(y). It asserts that |{x | φ(x)}| = |{y | ψ(y)}|. That is, the cardinality of the values
of x which make φ(x) true is the same as the cardinality of the values of y which make ψ(y) true. Viewed
as a generalized quantifier, I is a ⟨1, 1⟩ quantifier.
Closely related is the Rescher quantifier, which also takes two variables and two formulas,
is written Jxy φ(x)ψ(y), and asserts that |{x | φ(x)}| ≤ |{y | ψ(y)}|. The Rescher quantifier is
sometimes defined instead to be a similar but different quantifier, Jxφ(x) ↔ |{x | φ(x)}| >
|{x | ¬φ(x)}|. The first definition is a ⟨1, 1⟩ quantifier while the second is a ⟨1⟩ quantifier.
Another similar quantifier is Chang's quantifier Q^C, a ⟨1⟩ quantifier defined by
Q^C_M = {X ⊆ M | |X| = |M|}. That is, Q^C xφ(x) is true if the number of x satisfying φ has the same
cardinality as the universe; for finite models this is the same as ∀, but for infinite ones it is
not.
Version: 3 Owner: Henry Author(s): Henry

15.2 Russell's theory of types

After the discovery of the paradoxes of set theory (notably Russell's paradox), it became
apparent that naive set theory must be replaced by something in which the paradoxes can't
arise. Two solutions were proposed: type theory and axiomatic set theory based on a
limitation of size principle (see the entries class and von Neumann-Bernays-Gödel set theory).

Type theory is based on the idea that impredicative definitions are the root of all evil.
Bertrand Russell and various other logicians in the beginning of the 20th century proposed
an analysis of the paradoxes that singled out so called vicious circles as the culprits. A
vicious circle arises when one attempts to define a class by quantifying over a totality of
classes including the class being defined. For example, Russell's class R = {x | x ∉ x}
contains a variable x that ranges over all classes.
Russell's type theory, which is found in its mature form in the momentous Principia Mathematica, avoids the paradoxes by two devices. First, Frege's fifth axiom is abandoned entirely:
the extensions of predicates do not appear among the objects. Secondly, the predicates
themselves are ordered into a ramified hierarchy so that the predicates at the lowest level
can be defined by speaking of objects only, the predicates at the next level by speaking of
objects and of predicates at the previous level and so forth.
The first of these principles has drastic implications for mathematics. For example, the
predicate has the same cardinality seemingly can't be defined at all. For predicates apply
only to objects, and not to other predicates. In Freges system this is easy to overcome: the
equicardinality predicate is defined for extensions of predicates, which are objects. In order
to overcome this, Russell introduced the notion of types (which are today known as degrees).
Predicates of degree 1 apply only to objects, predicates of degree 2 apply to predicates of
degree 1, and so forth.
The type theoretic universe may seem quite odd to someone familiar with the cumulative hierarchy of set theory. For example, the empty set appears anew in all degrees, as do various
other familiar structures, such as the natural numbers. Because of this, it is common to
indicate only the relative differences in degrees when writing down a formula of type theory,
instead of the absolute degrees. Thus instead of writing

∃P_1 ∀x_0 (x_0 ∈ P_1 ↔ x_0 ≠ x_0)

one writes

∃P_{i+1} ∀x_i (x_i ∈ P_{i+1} ↔ x_i ≠ x_i)

to indicate that the formula holds for any i. Another possibility is simply to drop the
subscripts indicating degree and let the degrees be determined implicitly (this can usually
be done since we know that x ∈ y implies that if y is of degree n, then x is of degree n − 1).
A formula for which there is an assignment of types (degrees) to the variables and constants
so that it accords to the restrictions of type theory is said to be stratified.
The second device implies another dimension in which the predicates are ordered. In any
given degree, there appears a hierarchy of levels. At first level of degree n + 1 one has
predicates that apply to elements of degree n and which can be defined with reference only
to predicates of degree n. At second level there appear all the predicates that can be defined

with reference to predicates of degree n and to predicates of degree n + 1 of level 1, and so


forth.
This second principle makes virtually all mathematics break down. For example, when
speaking of the real number system and its completeness, one wishes to quantify over all
predicates of real numbers (this is possible at degree n + 1 if the predicates of real numbers
appear at degree n), not only over those of a given level. In order to overcome this, Russell
and Whitehead introduced in PM the so-called axiom of reducibility, which states that if a
predicate P_n occurs at some level k (i.e. P_n = P_n^k), it occurs already on the first level.
Frank P. Ramsey was the first to notice that the axiom of reducibility in effect collapses the
hierarchy of levels, so that the hierarchy is entirely superfluous in presence of the axiom. The
original form of type theory is known as ramified type theory, and the simpler alternative
with no second hierarchy of levels is known as unramified type theory or simply as simple
type theory.
One descendant of type theory is W. V. Quine's system of set theory known as NF (New
Foundations), which differs considerably from the more familiar set theories (ZFC, NBG,
Morse-Kelley). In NF there is a class comprehension axiom saying that to any stratified
formula there corresponds a set of elements satisfying the formula. The Russell class is not
a set, since it is defined by the formula x ∉ x, which can't be stratified, but the universal class
is a set: x = x is perfectly legal in type theory, as we can assign to x any degree and get a
well-formed formula of type theory. It is not known if NF axiomatises any extensor (see the
entry class) based on a limitation of size principle, like the more familiar set theories do.
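Stratification is a mechanical check: assign degrees to the variables so that every atom x ∈ y forces deg(y) = deg(x) + 1 and every atom x = y forces equal degrees. A Python sketch over a simplified atom-list encoding of formulas (the encoding and names are illustrative assumptions, not NF's official syntax):

```python
# Sketch: checking stratifiability of a conjunction of membership and
# equality atoms.  Each atom is an edge carrying a degree offset; a formula
# is stratifiable iff the resulting difference constraints are consistent.

from collections import defaultdict, deque

def stratified(atoms):
    # "x in y" forces deg(y) - deg(x) = 1; "x = y" forces deg(y) - deg(x) = 0.
    edges = defaultdict(list)
    for op, x, y in atoms:
        off = 1 if op == "in" else 0
        edges[x].append((y, off))
        edges[y].append((x, -off))
    deg = {}
    for start in list(edges):
        if start in deg:
            continue
        deg[start] = 0          # degrees are relative within a component
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, off in edges[v]:
                if w not in deg:
                    deg[w] = deg[v] + off
                    queue.append(w)
                elif deg[w] != deg[v] + off:
                    return False  # inconsistent degrees: not stratifiable
    return True

print(stratified([("in", "x", "y"), ("in", "y", "z")]))  # True
print(stratified([("in", "x", "x")]))                    # False: x in x
print(stratified([("=", "x", "x")]))                     # True: x = x
```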
In the modern variants of type theory, one usually has a more general supply of types.
Beginning with some set τ of types (presumably a division of the simple objects into some
natural categories), one defines the set of types T by setting
if a, b ∈ T, then (a → b) ∈ T
for all t ∈ τ, t ∈ T
One way to proceed to get something familiar is to have τ contain a type t for truth values.
Then sentences are objects of type t, open formulae of one variable are of type Object → t,
and so forth. This sort of type system is often found in the study of typed lambda calculus
and also in intensional logics, which are often based on the former.
Version: 4 Owner: Aatu Author(s): Aatu

15.3 analytic hierarchy

The analytic hierarchy is a hierarchy of either (depending on context) formulas or relations


similar to the arithmetical hierarchy. It is essentially the second order equivalent. Like the

arithmetical hierarchy, the relations in each level are exactly the relations defined by the
formulas of that level.
The first level can be called Σ^1_0, Π^1_0, or Δ^1_0, and consists of the arithmetical formulas or
relations.
A formula φ is Σ^1_n if there is some arithmetical formula ψ such that:

φ(~k) = ∃X_1 ∀X_2 · · · QX_n ψ(~k, X_1, . . . , X_n)

where Q is either ∀ or ∃, whichever maintains the pattern of alternating quantifiers, and each X_i is a set variable.
Similarly, a formula φ is Π^1_n if there is some arithmetical formula ψ such that:

φ(~k) = ∀X_1 ∃X_2 · · · QX_n ψ(~k, X_1, . . . , X_n)

where Q is either ∀ or ∃, whichever maintains the pattern of alternating quantifiers, and each X_i is a set variable.
Version: 1 Owner: Henry Author(s): Henry

15.4 game-theoretical quantifier

A Henkin or branching quantifier is a multi-variable quantifier in which the selection of
variables depends only on some, but not all, of the other quantified variables. For instance
the simplest Henkin quantifier can be written with the two quantifier rows stacked:

∀x∃y
        φ(x, y, a, b)
∀a∃b

This quantifier, inexpressible in ordinary first order logic, can best be understood by its
Skolemization. The formula above is equivalent to ∃f∃g∀x∀a φ(x, f(x), a, g(a)). Critically, the
selection of y depends only on x while the selection of b depends only on a.
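On a finite universe the Skolemization gives a brute-force decision procedure: search over all candidate Skolem functions f and g. A Python sketch (the universe and the example predicates are illustrative assumptions):

```python
# Sketch: brute-force evaluation of the Henkin quantifier on a finite
# universe, via its Skolemization: exist f, g such that for all x, a,
# phi(x, f(x), a, g(a)) holds.

from itertools import product

M = [0, 1, 2]

def henkin(phi):
    # Functions f, g : M -> M are encoded as tuples of values, f[x] = f(x).
    for f in product(M, repeat=len(M)):
        for g in product(M, repeat=len(M)):
            if all(phi(x, f[x], a, g[a]) for x in M for a in M):
                return True
    return False

# Satisfiable: take f and g to be the identity.
print(henkin(lambda x, y, a, b: y == x and b == a))   # True
# Unsatisfiable: y would have to depend on a, which branching forbids.
print(henkin(lambda x, y, a, b: y == a))              # False
```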
Logics with this quantifier are stronger than first order logic, lying between first and second order logic
in strength. For instance the Henkin quantifier can be used to define the Rescher quantifier,
and by extension Härtig's quantifier:

∀x∃y
        [(x = a ↔ y = b) ∧ (φ(x) → ψ(y))] ↔ Rxy φ(x)ψ(y)
∀a∃b

To see that this is true, observe that this essentially requires that the Skolem functions
f(x) = y and g(a) = b be the same, and moreover that they be injective. Then for each x
satisfying φ(x), there is a different f(x) satisfying ψ(f(x)).

This concept can be generalized to the game-theoretical quantifiers. This concept comes
from interpreting a formula as a game between a Prover and Refuter. A theorem is
provable whenever the Prover has a winning strategy; at each ∧ the Refuter chooses which
side they will play (so the Prover must be prepared to win on either) while each ∨ is a choice
for the Prover. At a ¬, the players switch roles. Then ∀ represents a choice for the Refuter
and ∃ for the Prover.
Classical first order logic, then, adds the requirement that the games have perfect information.
The game-theoretical quantifiers remove this requirement, so for instance the Henkin quantifier, which would be written ∀x∃y∀a(∃b/x) φ(x, y, a, b), states that when the Prover makes a
choice for b, it is made without knowledge of what was chosen at x.
Version: 2 Owner: Henry Author(s): Henry

15.5 logical language

In its most general form, a logical language is a set of rules for constructing formulas for
some logic, which can then be assigned truth values based on the rules of that logic.
A logical language L consists of:
A set F of function symbols (common examples include + and ·)
A set R of relation symbols (common examples include = and <)
A set C of logical connectives (usually ¬, ∧, ∨, →, and ↔)
A set Q of quantifiers (usually ∀ and ∃)
A set V of variables
Every function symbol, relation symbol, and connective is associated with an arity (the set
of n-ary function symbols is denoted F_n, and similarly for relation symbols and connectives).
Each quantifier is a generalized quantifier associated with a quantifier type ⟨n_1, . . . , n_n⟩.
The underlying logic has a (possibly empty) set of types T. There is a function
Type : F ∪ V → T which assigns a type to each function and variable. For each arity n there is a
function Inputs_n : F_n ∪ R_n → T^n which gives the types of each of the arguments to a
function symbol or relation. In addition, for each quantifier type ⟨n_1, . . . , n_n⟩ there is a
function Inputs_{⟨n_1,...,n_n⟩} defined on Q_{⟨n_1,...,n_n⟩} (the set of quantifiers of that type) which gives
an n-tuple of n_i-tuples of types of the arguments taken by formulas the quantifier applies to.
The terms of L of type t ∈ T are built as follows:

1. If v is a variable such that Type(v) = t, then v is a term of type t.
2. If f is an n-ary function symbol such that Type(f) = t and t_1, . . . , t_n are terms such
that for each i ≤ n, Type(t_i) = (Inputs_n(f))_i, then f t_1, . . . , t_n is a term of type t.
The formulas of L are built as follows:
1. If r is an n-ary relation symbol and t_1, . . . , t_n are terms such that Type(t_i) = (Inputs_n(r))_i,
then r t_1, . . . , t_n is a formula.
2. If c is an n-ary connective and f_1, . . . , f_n are formulas, then c f_1, . . . , f_n is a formula.
3. If q is a quantifier of type ⟨n_1, . . . , n_n⟩, v_{1,1}, . . . , v_{1,n_1}, v_{2,1}, . . . , v_{n,1}, . . . , v_{n,n_n} are a sequence
of variables such that Type(v_{i,j}) = ((Inputs_{⟨n_1,...,n_n⟩}(q))_j)_i, and f_1, . . . , f_n are formulas,
then q v_{1,1}, . . . , v_{1,n_1}, v_{2,1}, . . . , v_{n,1}, . . . , v_{n,n_n} f_1, . . . , f_n is a formula.
Generally the connectives, quantifiers, and variables are specified by the appropriate logic,
while the function and relation symbols are specified for particular languages. Note that
0-ary functions are usually called constants.
If there is only one type which is equated directly with truth values then this is essentially
a propositional logic. If the standard quantifiers and connectives are used, there is only one
type, and one of the relations is = (with its usual semantics), this produces first order logic.
If the standard quantifiers and connectives are used, there are two types, and the relations
include = and ∈ with appropriate semantics, this is second order logic (a slightly different
formulation replaces ∈ with a 2-ary function which represents function application; this views
second order objects as functions rather than sets).
Note that often connectives are written with infix notation with parentheses used to control
order of operations.
Version: 7 Owner: Henry Author(s): Henry

15.6 second order logic

Second order logic refers to logics with two (or three) types where one type consists of the
objects of interest and the second is either sets of those objects or functions on those objects
(or both, in the three type case). For instance, second order arithmetic has two types: the
numbers and the sets of numbers.
Formally, second order logic usually has:
the standard quantifiers (four of them, since each type needs its own universal and
existential quantifiers)

the standard connectives


the relation = with its normal semantics
if the second type represents sets, a relation ∈ where the first argument is of the first
type and the second argument is of the second type
if the second type represents functions, a binary function which takes one argument of
each type and results in an object of the first type, representing function application
Specific second order logics may deviate from this definition slightly. In particular, some
mathematicians have argued that first order logics with additional quantifiers which give
them most or all of the strength of second order logic should be considered second order logics.
Some people, chiefly Quine, have raised philosophical objections to second order logic, centering on the question of whether models require fixing some set of sets or functions as the
actual sets or functions for the purposes of that model.
Version: 4 Owner: Henry Author(s): Henry


Chapter 16
03B40 Combinatory logic and lambda-calculus
16.1 Church integer

A Church integer is a representation of integers as functions, invented by Alonzo Church.


An integer N is represented as a higher-order function, which applies a given function to a
given expression N times.
For example, in Haskell, a function that returns a particular Church integer might be
church n = \f x -> iterate f x !! n
The transformation from a Church integer to an integer might be
unchurch n = n (+1) 0
Thus the (+1) function would be applied to an initial value of 0 n times, yielding the ordinary
integer n.
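The same construction can be sketched in Python; the names church and unchurch below are illustrative helpers, mirroring the two directions of the translation:

```python
def church(n):
    # The Church integer for n: a function that applies a given
    # function f to a given value x, n times.
    def numeral(f):
        def go(x):
            for _ in range(n):
                x = f(x)
            return x
        return go
    return numeral

def unchurch(c):
    # Mirrors the Haskell line above: apply the successor (+1) to 0, n times.
    return c(lambda k: k + 1)(0)

print(unchurch(church(5)))  # → 5
```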
Version: 2 Owner: Logan Author(s): Logan

16.2

combinatory logic

Combinatory logic was invented by Moses Schönfinkel in the early 1920s, and was mostly developed by Haskell Curry. The idea was to reduce the notation of logic to the simplest terms possible. As such, combinatory logic consists only of combinators and combination operations, with no free variables.


A combinator is simply a function with no free variables. A free variable is any variable
referred to in a function that is not a parameter of that function. The operation of combination is then simply the application of a combinator to its parameters. Combination is
specified by simple juxtaposition of two terms, and is left-associative. Parentheses may also
be present to override associativity. For example
f gxy = (f g)xy = ((f g)x)y
All combinators in combinatory logic can be derived from two basic combinators, S and K. They are defined as

S f g x = f x (g x)
K x y = x
Reference is sometimes made to a third basic combinator, I, which can be defined in terms of S and K:

I x = S K K x = x

Combinatory logic where I is considered to be derived from S and K is sometimes known as pure combinatory logic.
Combinatory logic and lambda calculus are equivalent. However, lambda calculus is more concise than combinatory logic; an expression of size O(n) in lambda calculus is equivalent to an expression of size O(n²) in combinatory logic.
For example, S f g x = f x (g x) in combinatory logic is equivalent to S = λf. λg. λx. f x (g x), and K x y = x is equivalent to K = λx. λy. x.
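A quick Python sketch of S and K, written as curried one-argument functions to mirror the juxtaposition notation; I is then derived exactly as above:

```python
# S f g x = f x (g x), one argument per application (currying)
S = lambda f: lambda g: lambda x: f(x)(g(x))
# K x y = x
K = lambda x: lambda y: x

# I is derivable: I x = S K K x = K x (K x) = x
I = S(K)(K)

print(I(42))  # → 42
```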
Version: 2 Owner: Logan Author(s): Logan

16.3

lambda calculus

Lambda calculus (often referred to as λ-calculus) was invented in the 1930s by Alonzo Church, as a form of mathematical logic dealing primarily with functions and the application of functions to their arguments. In pure lambda calculus, there are no constants. Instead, there are only lambda abstractions (which are simply specifications of functions), variables, and applications of functions to functions. For instance, Church integers are used as a substitute for actual constants representing integers.

A lambda abstraction is typically specified using a lambda expression, which might look like the following:

λx. f x

The above specifies a function of one argument that can be reduced by applying the function f to its argument (function application is left-associative by default, and parentheses can be used to specify associativity).
The λ-calculus is equivalent to combinatory logic (though much more concise). Most functional programming languages are also equivalent to λ-calculus, to a degree (any imperative features in such languages are, of course, not equivalent).
Examples

We can specify the Church integer 3 in λ-calculus as

3 = λf. λx. f (f (f x))

Suppose we have a function inc which, when given a string representing an integer, returns a new string representing the number following that integer. Then

3 inc "0" = "3"
Addition of Church integers in λ-calculus is

add = λx. λy. (λf. λz. x f (y f z))

add 2 3 = λf. λz. 2 f (3 f z)
        = λf. λz. 2 f (f (f (f z)))
        = λf. λz. f (f (f (f (f z))))
        = 5

Multiplication is

mul = λx. λy. (λf. λz. x (λw. y f w) z)

mul 2 3 = λf. λz. 2 (λw. 3 f w) z
        = λf. λz. 2 (λw. f (f (f w))) z
        = λf. λz. f (f (f (f (f (f z)))))
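These reductions can be checked mechanically. The Python sketch below transcribes add and mul as curried functions; church and unchurch are illustrative helpers, not part of the original entry:

```python
# Church numerals as curried Python functions.
zero = lambda f: lambda z: z

def church(n):
    # n-fold application: church(n) f z = f (f (... (f z)))
    return (lambda f: lambda z: f(church(n - 1)(f)(z))) if n > 0 else zero

# add = λx. λy. (λf. λz. x f (y f z))
add = lambda x: lambda y: lambda f: lambda z: x(f)(y(f)(z))
# mul = λx. λy. (λf. λz. x (λw. y f w) z)
mul = lambda x: lambda y: lambda f: lambda z: x(lambda w: y(f)(w))(z)

unchurch = lambda c: c(lambda k: k + 1)(0)
print(unchurch(add(church(2))(church(3))))  # → 5
print(unchurch(mul(church(2))(church(3))))  # → 6
```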

Russell's Paradox in λ-calculus

The λ-calculus readily admits Russell's paradox. Let us define a function r that takes a function x as an argument, and is reduced to the application of the logical function not to the application of x to itself:

r = λx. not (x x)

Now what happens when we apply r to itself?

r r = not (r r)
    = not (not (r r))
    = ...

Since we have not (r r) = (r r), we have a paradox.
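Under eager evaluation the same self-application simply never reaches a value; a Python sketch of the divergence:

```python
# r = λx. not (x x); applying r to itself reduces forever:
# not (not (not (... (r r)))).
r = lambda x: not x(x)

try:
    r(r)
except RecursionError:
    print("r r does not terminate")
```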
Version: 3 Owner: Logan Author(s): Logan


Chapter 17
03B48 Probability and inductive
logic
17.1

conditional probability

Let (Ω, B, μ) be a probability space, and let X and Y be random variables on Ω with joint probability distribution μ(X, Y) := μ(X ∩ Y).

The conditional probability of X given Y is defined as

    μ(X | Y) := μ(X ∩ Y) / μ(Y).                    (17.1.1)

In general,

    μ(X | Y) μ(Y) = μ(X, Y) = μ(Y | X) μ(X),        (17.1.2)

and so we have

    μ(X | Y) = μ(Y | X) μ(X) / μ(Y).                (17.1.3)
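A numeric sketch of equations (17.1.1)-(17.1.3), using a made-up joint distribution over two binary random variables (the distribution and function names are illustrative assumptions):

```python
# A made-up joint distribution for two binary random variables X and Y:
# p[(x, y)] plays the role of the joint probability μ(X ∩ Y).
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def marginal_x(x):
    return sum(q for (x_, _), q in p.items() if x_ == x)

def marginal_y(y):
    return sum(q for (_, y_), q in p.items() if y_ == y)

def cond(x, y):
    # (17.1.1): μ(X | Y) = μ(X ∩ Y) / μ(Y)
    return p[(x, y)] / marginal_y(y)

def cond_bayes(x, y):
    # (17.1.3): μ(X | Y) = μ(Y | X) μ(X) / μ(Y)
    p_y_given_x = p[(x, y)] / marginal_x(x)
    return p_y_given_x * marginal_x(x) / marginal_y(y)

print(abs(cond(1, 1) - cond_bayes(1, 1)) < 1e-12)  # → True
```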
Version: 1 Owner: drummond Author(s): drummond



Chapter 18
03B99 Miscellaneous
18.1

Beth property

A logic is said to have the Beth property if whenever a predicate R is implicitly definable by a formula φ (i.e. every model has at most one extension of R satisfying φ), then R is explicitly definable relative to φ (i.e. there is a formula ψ not containing R, such that φ ⊨ ∀x_1, ..., x_n (R(x_1, ..., x_n) ↔ ψ(x_1, ..., x_n))).
Version: 3 Owner: Aatu Author(s): Aatu

18.2

Hofstadters MIU system

The alphabet of the system contains three symbols: M, I, U. The set of theorems, denoted by T, is the set of strings constructed from the axiom by the rules, and can be built as follows:

(axiom) MI ∈ T.
(i) If xI ∈ T then xIU ∈ T.
(ii) If Mx ∈ T then Mxx ∈ T.
(iii) In any theorem, III can be replaced by U.
(iv) In any theorem, UU can be omitted.

Example:


Show that MUII ∈ T:

MI ∈ T          by axiom
MII ∈ T         by rule (ii) where x = I
MIIII ∈ T       by rule (ii) where x = II
MIIIIIIII ∈ T   by rule (ii) where x = IIII
MIIIIIIIIU ∈ T  by rule (i) where x = MIIIIIII
MIIIIIUU ∈ T    by rule (iii)
MIIIII ∈ T      by rule (iv)
MUII ∈ T        by rule (iii)
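The derivation can be replayed mechanically. A Python sketch, with the rules written as string operations (the 0-based index arguments are an implementation choice, not part of the original entry):

```python
# The four MIU rules as string operations.
def rule1(s):            # xI → xIU
    assert s.endswith("I")
    return s + "U"

def rule2(s):            # Mx → Mxx
    assert s.startswith("M")
    return "M" + s[1:] * 2

def rule3(s, i):         # replace III at position i by U
    assert s[i:i + 3] == "III"
    return s[:i] + "U" + s[i + 3:]

def rule4(s, i):         # remove UU at position i
    assert s[i:i + 2] == "UU"
    return s[:i] + s[i + 2:]

s = "MI"                 # axiom
s = rule2(s)             # MII
s = rule2(s)             # MIIII
s = rule2(s)             # MIIIIIIII
s = rule1(s)             # MIIIIIIIIU
s = rule3(s, 6)          # MIIIIIUU
s = rule4(s, 6)          # MIIIII
s = rule3(s, 1)          # MUII
print(s)                 # → MUII
```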
Is MU a theorem?

No. Why? Because the number of I's in a theorem is never a multiple of 3. We will show this by structural induction.

Base case: The statement is true for the axiom, since the axiom has one I, and 1 is not a multiple of 3.

Induction hypothesis: Suppose the statement is true for the premise of each rule.

Induction step: By the induction hypothesis we assume the premise of each rule to be true, and show that the application of the rule keeps the statement true.

Rule (i): Applying rule (i) does not add any I's to the formula. Therefore the statement is true for rule (i) by the induction hypothesis.

Rule (ii): Applying rule (ii) doubles the number of I's in the formula, but since the initial number of I's was not a multiple of 3 by the induction hypothesis, doubling that number does not make it a multiple of 3 (i.e. if n ≢ 0 mod 3 then 2n ≢ 0 mod 3). Therefore the statement is true for rule (ii).

Rule (iii): Applying rule (iii) replaces III by U, removing three I's. Since the initial number of I's was not a multiple of 3 by the induction hypothesis, and removing three I's leaves the count in the same residue class mod 3, the result is still not a multiple of 3. Therefore the statement is true for rule (iii).

Rule (iv): Applying rule (iv) removes UU and does not change the number of I's. Therefore the statement is true for rule (iv) by the induction hypothesis.

Therefore no theorem has a number of I's that is a multiple of 3; in particular MU, which has zero I's, is not a theorem.
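The two arithmetic facts the induction step relies on can be spot-checked directly; a small Python sketch:

```python
# If the I-count n is not a multiple of 3, neither doubling it (rule ii)
# nor removing three I's (rule iii) makes it one; rules (i) and (iv)
# leave the I-count unchanged.
for n in range(3, 1000):
    if n % 3 != 0:
        assert (2 * n) % 3 != 0      # rule (ii): doubling
        assert (n - 3) % 3 != 0      # rule (iii): replacing III by U
print("invariant holds")  # → invariant holds
```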
[GVL]

REFERENCES
[HD] Hofstadter, Douglas R.: Gödel, Escher, Bach: an Eternal Golden Braid. Basic Books, Inc., New York, 1979.

Version: 5 Owner: Daume Author(s): Daume


18.3

IF-logic

Independence Friendly logic (IF-logic) is an interesting conservative extension of classical first order logic based on very natural ideas from the game theoretical semantics developed by Jaakko Hintikka and Gabriel Sandu, among others. Although IF-logic is a conservative extension of first order logic, it has a number of interesting properties, such as allowing truth-definitions and admitting a translation of all Σ¹₁ sentences (second order sentences with an initial second order existential quantifier followed by a first order sentence).
IF-logic can be characterised as the natural extension of first order logic when one allows
informational independence to occur in the game theoretical truth definition. To understand
this idea we need first to introduce the game theoretical definition of truth for classical first
order logic.
To each first order sentence φ we assign a game G(φ) with two players, played on models of the appropriate language. The two players are called the verifier and the falsifier (or nature). The idea is that the verifier attempts to show that the sentence is true in the model, while the falsifier attempts to show that it is false in the model. The game G(φ) is defined as follows. We will use the convention that if p is a symbol that names a function, a predicate or an object of the model M, then p^M is that named entity.
- if P is an n-ary predicate and the t_i are names of elements of the model, then G(P(t_1, ..., t_n)) is a game in which the verifier immediately wins if (t_1^M, ..., t_n^M) ∈ P^M, and otherwise the falsifier immediately wins
- the game G(φ₁ ∨ φ₂) begins with the choice of φ_i from φ₁ and φ₂ (i = 1 or i = 2) by the verifier, and then proceeds as the game G(φ_i)
- the game G(φ₁ ∧ φ₂) is the same as G(φ₁ ∨ φ₂), except that the choice is made by the falsifier
- the game G(∃xφ(x)) begins with the choice by the verifier of a member of M, which is given a name a, and then proceeds as G(φ(a))
- the game G(∀xφ(x)) is the same as G(∃xφ(x)), except that the choice of a is made by the falsifier
- the game G(¬φ) is the same as G(φ) with the roles of the falsifier and verifier exchanged
Truth of a sentence φ is defined as the existence of a winning strategy for the verifier in the game G(φ). Similarly, falsity of φ is defined as the existence of a winning strategy for the falsifier in the game G(φ). (A strategy is a specification which determines, for each move the opponent makes, what the player should do. A winning strategy is a strategy which guarantees victory no matter what strategy the opponent follows.)
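On a finite model, the existence of a winning strategy for the verifier can be computed directly by recursion over the game rules: some choice must work at the verifier's nodes, every choice must work at the falsifier's nodes. The tuple encoding of formulas below is an illustrative assumption, not a standard one:

```python
# Naive game-semantics evaluator for first order sentences over a finite
# domain. `neg` swaps the players' roles, which for these determined
# finite games is just logical negation.
def wins(formula, domain, env):
    op = formula[0]
    if op == "atom":                       # ("atom", predicate, var_names)
        _, pred, names = formula
        return pred(*(env[n] for n in names))
    if op == "or":                         # verifier picks a disjunct
        return any(wins(f, domain, env) for f in formula[1:])
    if op == "and":                        # falsifier picks a conjunct
        return all(wins(f, domain, env) for f in formula[1:])
    if op == "exists":                     # verifier picks an element
        _, var, body = formula
        return any(wins(body, domain, {**env, var: a}) for a in domain)
    if op == "forall":                     # falsifier picks an element
        _, var, body = formula
        return all(wins(body, domain, {**env, var: a}) for a in domain)
    if op == "neg":
        return not wins(formula[1], domain, env)
    raise ValueError(op)

# ∀x∃y (x < y) over {0, 1, 2}: false, since the falsifier can pick x = 2.
phi = ("forall", "x", ("exists", "y", ("atom", lambda x, y: x < y, ("x", "y"))))
print(wins(phi, [0, 1, 2], {}))  # → False
```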


For classical first order logic, this definition is equivalent to the usual Tarskian definition of truth (i.e. the one based on satisfaction found in most treatments of the semantics of first order logic). This also means, since the law of excluded middle holds for first order logic, that the games G(φ) have a very strong property: either the falsifier or the verifier has a winning strategy.
Notice that all rules except those for negation and atomic sentences concern choosing a sentence or finding an element. These choices can be codified into functions, which tell us which sentence to pick or which element of the model to choose, based on our previous choices and those of our opponent. For example, consider the sentence ∀x(P(x) ∨ Q(x)). The corresponding game begins with the falsifier picking an element a from the model, so a strategy for the verifier must specify, for each element a, which of P(a) and Q(a) to pick. The truth of the sentence is equivalent to the existence of a winning strategy for the verifier, i.e. just such a function. But this means that ∀x(P(x) ∨ Q(x)) is equivalent to ∃f∀x((P(x) ∧ f(x) = 0) ∨ (Q(x) ∧ f(x) = 1)). Let's consider a more complicated example: ∀x∃y∀z∃s P(x, y, z, s). The truth of this is equivalent to the existence of functions f and g such that ∀x∀z P(x, f(x), z, g(x, z)). These sorts of functions are known as Skolem functions, and they are in essence just winning strategies for the verifier. We won't prove it here, but all first order sentences can be expressed in the form ∃f_1...∃f_n∀x_1...∀x_k φ, where φ is a truth functional combination of atomic sentences in which all terms are either constants, variables x_i, or formed by application of the functions f_i to such terms. Such sentences are said to be in Σ¹₁ form.
Let's consider the Σ¹₁ sentence ∃f∃g∀x∀z φ(x, f(x), z, g(z)). Up front, it seems to assert the existence of a winning strategy in a simple semantical game like those described above. However, the game can't correspond to any (classical) first order formula! Let's first see what the game whose winning strategy this formula asserts the existence of looks like. First, the falsifier chooses elements a and b to serve as x and z. Then the verifier chooses an element c knowing only a, and an element d knowing only b. The verifier's goal is that φ(a, c, b, d) comes out as a true atomic sentence. The game could actually be arranged so that the verifier is a team of two players (who aren't allowed to communicate with each other), one of which picks c, the other one picking d.

From a game theoretical point of view, games in which some moves must be made without depending on some of the earlier moves are called informationally incomplete games, and they occur very commonly. Bridge is such a game, for example, and real examples of such games usually have players which are actually teams made up of several people.
IF-logic comes out of the game theoretical definition in a natural way if we allow informational independence in our semantical games. In IF-logic, every quantifier or connective Q can be augmented with an independence marker, so that Q/Q′ means that the game for the occurrence of Q′ within the scope of Q must be played without knowledge of the choices made for Q. For example, (∀x/∃y)∃y φ(x, y) asserts that for any choice of value for x by the falsifier, the verifier can find a value for y which does not depend on the value of x, such that φ(x, y) comes out true. This is not a very characteristic example, as it can be written as an ordinary first order formula ∃y∀x φ(x, y). The curious game we described above, corresponding to the second order Skolem-function formulation by the Σ¹₁ sentence ∃f∃g∀x∀z φ(x, f(x), z, g(z)), corresponds to the IF-sentence (∀x/∃u)(∀z/∃y)∃y∃u φ(x, y, z, u). IF-logic allows informational independence also for the usual logical connectives; for example, (∀x/∨)(φ(x) ∨ ψ(x)) is true if and only if for all x, either φ(x) or ψ(x) is true, but which of these is picked by the verifier must be decided independently of the choice for x by the falsifier.
One of the striking characteristics of IF-logic is that every Σ¹₁ formula φ has an IF-translation φ_IF which is true if and only if φ is true (the equivalence does not in general hold if we replace "true" with "false"). Since, for example, first order truth (in a model) is Σ¹₁ definable (it's just quantification over all possible valuations, which are second order objects), there are IF-theories which correctly represent the truth predicate for their first order part. What is even more striking is that sufficiently strong IF-theories can do this for the whole of the language they are expressed in.
This seems to contradict Tarski's famous result on the undefinability of truth, but the contradiction is illusory. Tarski's result depends on the assumption that the logic is closed under contradictory negation. This is not the case for IF-logic: in general, for a given sentence φ there is no sentence ψ which is true just in case φ is not true. Thus the law of excluded middle does not hold in general in IF-logic (although it does for the classical first order portion). This is quite unsurprising, since games of imperfect information are very seldom determined in the sense that either the verifier or the falsifier has a winning strategy. For example, a game in which I choose a 10-letter word and you have one go at guessing it is not determined in this sense, since there is no 10-letter word you couldn't guess, and on the other hand you have no way of forcing me to choose any particular 10-letter word (which would guarantee your victory).
IF-logic is stronger than first order logic in the usual sense that there are classes of structures which are IF-definable but not first-order definable; some of these are even finite. Many interesting concepts are expressible in IF-logic, such as equicardinality, infinity (which can be expressed by a purely logical formula, in contradistinction to ordinary first order logic, in which non-logical symbols are needed), and well-ordering.
By Lindström's theorem we thus know that either IF-logic is not complete (i.e. its set of validities is not r.e.) or the Löwenheim-Skolem theorem does not hold for it. In fact, the (downward) Löwenheim-Skolem theorem does hold for IF-logic, so it's not complete. There is a complete disproof procedure for IF-logic, but because IF-logic is not closed under contradictory negation this does not yield a complete proof procedure.
IF-logic can be extended by allowing contradictory negations of closed sentences and truth functional combinations thereof. This extended IF-logic is extremely strong. For example, the second order induction axiom for PA is ∀X((X(0) ∧ ∀y(X(y) → X(y + 1))) → ∀yX(y)). The negation of this is a Σ¹₁ sentence asserting the existence of a set which invalidates the induction axiom. Since Σ¹₁ sentences are expressible in IF-logic, we can translate the negation of the induction axiom into an IF-sentence φ. But now ¬φ is a formula of extended IF-logic, and is clearly equivalent to the usual induction axiom! As all the rest of the PA axioms are first order, this shows that extended IF-logic PA can correctly define the natural number system.

There exists also an interesting translation of nth order logic into extended IF-logic. Consider an n-sorted first order language and an nth order theory T translated into this language. Now, extend the language to second order and add the axiom stating that the sort k + 1 actually comprises the whole of the powerset of the sort k. This is a Π¹₁ sentence (i.e. of the form "for all predicates P there is a first order element of sort k + 1 which comprises exactly the extension of P on sort k"). It is easy to see that a formula is valid in this new system if and only if it was valid in the original nth order logic. The negation of this axiom is again Σ¹₁ and translatable into IF-logic, and thus the axiom itself is expressible in extended IF-logic. Moreover, since most interesting second order theories are finitely axiomatisable, we can consider sentences of the form T → φ (where T is the multisorted translation of T), which express the logical implication of φ by T (correctly). This is equivalent to ¬(T ∧ ¬φ) (where ¬ is contradictory negation); since T ∧ ¬φ is a conjunction of the Π¹₁ sentence asserting comprehension and first order sentences (the translated axioms of T and ¬φ), its contradictory negation is a Σ¹₁ formula translatable into non-extended IF-logic. Thus sentences of the form T → φ of nth order logic are translatable into IF-sentences which are true just in case the originals were.
Version: 1 Owner: Aatu Author(s): Aatu

18.4

Tarski's result on the undefinability of truth

Assume L is a logic which is closed under contradictory negation and has the usual truth-functional connectives. Assume also that L has a notion of open formula with one variable and of substitution. Assume that T is a theory of L in which we can define surrogates for formulae of L, and in which all true instances of the substitution relation and the truth-functional connective relations are provable. We show that either T is inconsistent or T can't be augmented with a truth predicate True for which the following T-schema holds:

True(⌜φ⌝) ↔ φ

Assume that the open formulae with one variable of L have been indexed by some suitable set that is representable in T (otherwise the predicate True would be next to useless, since if there's no way to speak of the sentences of a logic, there's little hope of defining a truth predicate for it). Denote the i:th element in this indexing by B_i. Consider now the following open formula with one variable:

Liar(x) = ¬True(⌜B_x(x)⌝)

Now, since Liar is an open formula with one free variable, it's indexed by some i. Now consider the sentence Liar(i). From the T-schema we know that


True(⌜Liar(i)⌝) ↔ Liar(i)

and by the definition of Liar and the fact that i is the index of Liar(x) we have

True(⌜Liar(i)⌝) ↔ ¬True(⌜Liar(i)⌝)

which clearly is absurd. Thus there can't be an extension of T with a predicate True for which the T-schema holds.

We have made several assumptions on the logic L which are crucial in order for this proof to go through. The most important is that L is closed under contradictory negation. There are logics which allow truth predicates, but these are not usually closed under contradictory negation (so that it's possible that True(⌜Liar(i)⌝) is neither true nor false). These logics usually have stronger notions of negation, so that a sentence ¬P says more than just that P is not true, and the proposition that P is simply not true is not expressible.
An example of a logic for which Tarski's undefinability result does not hold is the so-called
Independence Friendly logic, the semantics of which is based on game theory and which
allows various generalised quantifiers (the Henkin branching quantifier, &c.) to be used.
Version: 5 Owner: Aatu Author(s): Aatu

18.5

axiom

In a nutshell, the logico-deductive method is a system of inference where conclusions (new


knowledge) follow from premises (old knowledge) through the application of sound arguments
(syllogisms, rules of inference). Tautologies excluded, nothing can be deduced if nothing
is assumed. Axioms and postulates are the basic assumptions underlying a given body
of deductive knowledge. They are accepted without demonstration. All other assertions
(theorems, if we are talking about mathematics) must be proven with the aid of the basic
assumptions.
The logico-deductive method was developed by the ancient Greeks, and has become the core
principle of modern mathematics. However, the interpretation of mathematical knowledge
has changed from ancient times to the modern, and consequently the terms axiom and postulate hold a slightly different meaning for the present day mathematician than they did for Aristotle and Euclid.
The ancient Greeks considered geometry as just one of several sciences, and held the theorems of geometry on par with scientific facts. As such, they developed and used the logico-deductive method as a means of avoiding error, and for structuring and communicating knowledge. Aristotle's Posterior Analytics is a definitive exposition of the classical view.

Axiom, in classical terminology, referred to a self-evident assumption common to many


branches of science. A good example would be the assertion that
When an equal amount is taken from equals, an equal amount results.
At the foundation of the various sciences lay certain basic hypotheses that had to be accepted
without proof. Such a hypothesis was termed a postulate. The postulates of each science
were different. Their validity had to be established by means of real-world experience.
Indeed, Aristotle warns that the content of a science cannot be successfully communicated,
if the learner is in doubt about the truth of the postulates.
The classical approach is well illustrated by Euclid's Elements, where we see a list of axioms (very basic, self-evident assertions) and postulates (common-sensical geometric facts drawn from our experience).
A1 Things which are equal to the same thing are also equal to one another.
A2 If equals be added to equals, the wholes are equal.
A3 If equals be subtracted from equals, the remainders are equal.
A4 Things which coincide with one another are equal to one another.
A5 The whole is greater than the part.
P1 It is possible to draw a straight line from any point to any other point.
P2 It is possible to produce a finite straight line continuously in a straight line.
P3 It is possible to describe a circle with any centre and distance.
P4 It is true that all right angles are equal to one another.
P5 It is true that, if a straight line falling on two straight lines make the interior angles on
the same side less than two right angles, the two straight lines, if produced indefinitely,
meet on that side on which are the angles less than the two right angles.
The classical viewpoint is explored in more detail elsewhere.
A great lesson learned by mathematics in the last 150 years is that it is useful to strip the
meaning away from the mathematical assertions (axioms, postulates, propositions, theorems)
and definitions. This abstraction, one might even say formalization, makes mathematical
knowledge more general, capable of multiple different meanings, and therefore useful in
multiple contexts.
In structuralist mathematics we go even further, and develop theories and axioms (like
field theory, group theory, topology, vector spaces) without any particular application in

mind. The distinction between an axiom and a postulate disappears. The postulates of Euclid are profitably motivated by saying that they lead to a great wealth of geometric facts. The truth of these complicated facts rests on the acceptance of the basic hypotheses. However, by throwing out postulate 5, we get theories that have meaning in wider contexts, hyperbolic geometry for example. We must simply be prepared to use labels like "line" and "parallel" with greater flexibility. The development of hyperbolic geometry taught mathematicians that postulates should be regarded as purely formal statements, and not as facts based on experience.
When mathematicians employ the axioms of a field, the intentions are even more abstract.
The propositions of field theory do not concern any one particular application; the mathematician now works in complete abstraction. There are many examples of fields; field theory
gives correct knowledge in all contexts.
It is not correct to say that the axioms of field theory are propositions that are regarded as
true without proof. Rather, the Field Axioms are a set of constraints. If any given system of
addition and multiplication tolerates these constraints, then one is in a position to instantly
know a great deal of extra information about this system. There is a lot of bang for the
formalist buck.
Modern mathematics formalizes its foundations to such an extent that mathematical theories
can be regarded as mathematical objects, and logic itself can be regarded as a branch of
mathematics. Frege, Russell, Poincaré, Hilbert, and Gödel are some of the key figures in
this development.
In the modern understanding, a set of axioms is any collection of formally stated assertions
from which other formally stated assertions follow by the application of certain well-defined
rules. In this view, logic becomes just another formal system. A set of axioms should be
consistent; it should be impossible to derive a contradiction from the axioms. A set of axioms
should also be non-redundant; an assertion that can be deduced from other axioms need not
be regarded as an axiom.
It was the early hope of modern logicians that various branches of mathematics, perhaps
all of mathematics, could be derived from a consistent collection of basic axioms. An early
success of the formalist program was Hilberts formalization of Euclidean geometry, and the
related demonstration of the consistency of those axioms.
In a wider context, there was an attempt to base all of mathematics on Cantor's set theory. Here the emergence of Russell's paradox and similar antinomies of naive set theory raised the possibility that any such system could turn out to be inconsistent.
The formalist project suffered a decisive setback when in 1931 Gödel showed that it is possible, for any sufficiently large set of axioms (Peano's axioms, for example), to construct a statement whose truth is independent of that set of axioms. As a corollary, Gödel proved that the consistency of a theory like Peano arithmetic is an unprovable assertion within the scope of that theory.

It is reasonable to believe in the consistency of Peano arithmetic because it is satisfied by the system of natural numbers, an infinite but intuitively accessible formal system. However, at this date we have no way of demonstrating the consistency of modern set theory (the Zermelo-Fraenkel axioms). The axiom of choice, a key hypothesis of this theory, remains a very controversial assumption. Furthermore, using techniques of forcing (Cohen), one can show that the continuum hypothesis (Cantor) is independent of the Zermelo-Fraenkel axioms. Thus, even this very general set of axioms cannot be regarded as the definitive foundation for mathematics.
Version: 11 Owner: rmilson Author(s): rmilson, digitalis

18.6

compactness

A logic is said to be (κ, λ)-compact if the following holds:

If Δ is a set of sentences of cardinality less than or equal to κ, and all subsets of Δ of cardinality less than λ are consistent, then Δ is consistent.

For example, first order logic is (ω, ω)-compact, for if all finite subsets of some countable class of sentences are consistent, so is the class itself.
Version: 2 Owner: Aatu Author(s): Aatu

18.7

consistent

If T is a theory of L then it is consistent iff there is some model M of L such that M ⊨ T.

If a theory is not consistent then it is inconsistent.

A slightly different definition is sometimes used: T is consistent iff T ⊬ ⊥ (that is, as long as it does not prove a contradiction). As long as the proof calculus used is sound and complete, these two definitions are equivalent.
Version: 3 Owner: Henry Author(s): Henry

18.8

interpolation property

A logic is said to have the interpolation property if whenever φ(R, S) → ψ(S, T) holds, there is a sentence θ(S) such that φ(R, S) → θ(S) and θ(S) → ψ(S, T), where R, S and T are the sets of non-logical symbols that occur in the formulae, S being the set of symbols common to both φ and ψ.

The interpolation property holds for first order logic. The interpolation property is related to the Beth definability property and Robinson's consistency property. Also, a natural generalisation is the concept of a Δ-closed logic.
Version: 2 Owner: Aatu Author(s): Aatu

18.9

sentence

A sentence is a formula with no free variables.

Simple examples include:

∀x∃y[x < y]

or

∃z[z + 7 − 43 = 0]

However, the following formula is not a sentence:

x + 2 = 3
Version: 2 Owner: Henry Author(s): Henry


Chapter 19
03Bxx General logic
19.1

Banach-Tarski paradox

The 3-dimensional ball can be split into a finite number of pieces which can be pasted together to give two balls of the same volume as the first!
Let us formulate the theorem formally. We say that a set A ⊆ Rⁿ is decomposable in N pieces A_1, ..., A_N if there exist some isometries θ_1, ..., θ_N of Rⁿ such that A = θ_1(A_1) ∪ ... ∪ θ_N(A_N) while θ_1(A_1), ..., θ_N(A_N) are all disjoint.

We then say that two sets A, B ⊆ Rⁿ are equi-decomposable if both A and B are decomposable in the same pieces A_1, ..., A_N.

Theorem 2 (Banach-Tarski). The unit ball B³ ⊆ R³ is equi-decomposable to the union of two disjoint unit balls.

19.1.1

Comments

The actual number of pieces needed for this decomposition is not so large; ten pieces are enough.

Also, it is not important that the set considered is a ball: any two sets with non-empty interior are equi-decomposable in R³. The ambient space can also be chosen larger: the theorem is true in every Rⁿ with n ≥ 3, but it is not true in R² nor in R.
Where is the paradox? We are saying that a piece of (say) gold can be cut and pasted to obtain two pieces equal to the previous one. And we may divide these two pieces in the same way to obtain four pieces, and so on...

We believe that this is not possible, since the weight of a piece of gold does not change when we cut it.


A consequence of this theorem is, in fact, that it is not possible to define the volume of all subsets of the 3-dimensional space. In particular, the volume cannot be computed for some of the pieces in which the unit ball is decomposed (some of them are not measurable).

The existence of non-measurable sets is proved more simply, and in every dimension, by Vitali's theorem. However, the Banach-Tarski paradox says something more: it says that it is not possible to define a measure on all the subsets of R³ even if we drop countable additivity and replace it with finite additivity:

μ(A ∪ B) = μ(A) + μ(B)    for A, B disjoint.

Another point to be noticed is that the proof needs the axiom of choice, so some of the pieces into which the ball is divided are not constructible.
Version: 4 Owner: paolini Author(s): paolini


Chapter 20
03C05 Equational classes, universal
algebra
20.1

congruence

Let Σ be a fixed signature, and A a structure for Σ. A congruence ∼ on A is an equivalence relation such that for every natural number n and n-ary function symbol F of Σ, if a_i ∼ a_i′ then F^A(a_1, ..., a_n) ∼ F^A(a_1′, ..., a_n′).
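A concrete sketch: on an initial segment of the natural numbers, congruence modulo 3 satisfies the defining condition for the binary function symbols + and × (the finite cutoff is an artifact of the sketch, not of the definition):

```python
# a ∼ b iff a ≡ b (mod 3); check the congruence condition for + and *:
# if a ∼ a2 and b ∼ b2, then a + b ∼ a2 + b2 and a * b ∼ a2 * b2.
cong = lambda a, b: a % 3 == b % 3

universe = range(12)
for a in universe:
    for a2 in universe:
        if not cong(a, a2):
            continue
        for b in universe:
            for b2 in universe:
                if cong(b, b2):
                    assert cong(a + b, a2 + b2)
                    assert cong(a * b, a2 * b2)
print("congruence condition holds for + and *")
```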
Version: 6 Owner: almann Author(s): almann

20.2

every congruence is the kernel of a homomorphism

Let Σ be a fixed signature, and A a structure for Σ. If ∼ is a congruence on A, then there is a homomorphism f such that ∼ = ker(f).

Define a homomorphism f : A → A/∼ : a ↦ [[a]]. Observe that a ∼ b if and only if f(a) = f(b), so ∼ = ker(f). To verify that f is a homomorphism, observe that

1. For each constant symbol c of Σ, f(c^A) = [[c^A]] = c^{A/∼}.

2. For every natural number n and n-ary relation symbol R of Σ, if R^A(a_1, ..., a_n) then R^{A/∼}([[a_1]], ..., [[a_n]]), so R^{A/∼}(f(a_1), ..., f(a_n)).

3. For every natural number n and n-ary function symbol F of Σ,

f(F^A(a_1, ..., a_n)) = [[F^A(a_1, ..., a_n)]]
                      = F^{A/∼}([[a_1]], ..., [[a_n]])
                      = F^{A/∼}(f(a_1), ..., f(a_n)).

Version: 3 Owner: almann Author(s): almann

20.3

homomorphic image of a σ-structure is a σ-structure

Let σ be a fixed signature, and A and B two structures for σ. If f : A → B is a
homomorphism, then im(f) is a structure for σ.
Version: 3 Owner: almann Author(s): almann

20.4

kernel

Given a function f : A → B, the kernel of f is the equivalence relation on A defined by

(a, a') ∈ ker(f) ⟺ f(a) = f(a').
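Concretely, the kernel partitions the domain into classes on which f is constant. A minimal sketch over a finite domain (the function and names below are illustrative, not from the text):

```python
from collections import defaultdict

def kernel_classes(f, domain):
    """Group the domain into the equivalence classes of ker(f):
    a and a' fall in the same class iff f(a) = f(a')."""
    classes = defaultdict(set)
    for a in domain:
        classes[f(a)].add(a)
    return list(classes.values())

# The residue map a -> a mod 3 on {0, ..., 8} yields three classes.
print(kernel_classes(lambda a: a % 3, range(9)))
```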
Version: 3 Owner: almann Author(s): almann

20.5

kernel of a homomorphism is a congruence

Let σ be a fixed signature, and A and B two structures for σ. If f : A → B is a homomorphism,
then ker(f) is a congruence on A.

If F is an n-ary function symbol of σ, and f(ai) = f(ai') for each i, then

f(F^A(a1, ..., an)) = F^B(f(a1), ..., f(an))
= F^B(f(a1'), ..., f(an'))
= f(F^A(a1', ..., an')).

Version: 4 Owner: almann Author(s): almann



20.6

quotient structure

Let σ be a fixed signature, A a structure for σ, and ∼ a congruence on A. The quotient
structure of A by ∼, denoted A/∼, is defined as follows:

1. The universe of A/∼ is the set {[[a]] | a ∈ A}.

2. For each constant symbol c of σ, c^(A/∼) = [[c^A]].

3. For every natural number n and every n-ary function symbol F of σ,
F^(A/∼)([[a1]], ..., [[an]]) = [[F^A(a1, ..., an)]].

4. For every natural number n and every n-ary relation symbol R of σ, R^(A/∼)([[a1]], ..., [[an]])
if and only if for some ai' ∼ ai we have R^A(a1', ..., an').
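As a concrete instance of the definition, the integers under addition modulo the congruence a ∼ b iff n divides a − b give the familiar quotient Z/nZ. The sketch below (a finite window stands in for Z, purely for illustration) checks that the induced operation on classes is well defined:

```python
n = 4
window = range(-20, 20)   # a finite window into Z, for illustration only

def cls(a):
    """Canonical representative of the class [[a]] under a ~ b iff n | a - b."""
    return a % n

# Clause 3 of the definition: the class of a sum depends only on the
# classes of the summands, so + descends to the quotient.
for a in window:
    for b in window:
        assert cls(a + b) == cls(cls(a) + cls(b))

print(sorted({cls(a) for a in window}))  # the universe of the quotient
```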
Version: 7 Owner: almann Author(s): almann


Chapter 21
03C07 Basic properties of first-order
languages and structures
21.1

Models constructed from constants

The definition of a structure and of the satisfaction relation is nice, but it raises the following
question: how do we get models in the first place? The most basic construction for models
of a first-order theory is the construction that uses constants. Throughout this entry, L is a
fixed first-order language.
Let C be a set of constant symbols of L, and T be a theory in L. Then we say C is a set of
witnesses for T if and only if for every formula φ with at most one free variable x, we have
T ⊢ ∃x φ(x) → φ(c) for some c ∈ C.
Lemma. Let T be any consistent set of sentences of L, and C a set of new symbols such
that |C| = |L|. Let L' = L ∪ C. Then there is a consistent set T' of L'-sentences extending T
which has C as a set of witnesses.
Lemma. If T is a consistent theory in L, and C is a set of witnesses for T in L, then T has
a model whose elements are the constants in C.
Proof: Let σ be the signature for L. If T is a consistent set of sentences of L, then there is
a maximal consistent T' ⊇ T. Note that T' and T have the same sets of witnesses. As every
model of T' is also a model of T, we may assume T is maximal consistent.
We let the universe of M be the set of equivalence classes C/∼, where a ∼ b if and only if
(a = b) ∈ T. As T is maximal consistent, this is an equivalence relation. We interpret the
non-logical symbols as follows:

1. [a] =^M [b] if and only if a ∼ b;

2. Constant symbols are interpreted in the obvious way, i.e. if c is a constant symbol,
then c^M = [c];

3. If R is an n-ary relation symbol, then ([a1], ..., [an]) ∈ R^M if and only if R(a1, ..., an) ∈ T;

4. If F is an n-ary function symbol, then F^M([a1], ..., [an]) = [b] if and only if
(F(a1, ..., an) = b) ∈ T.
From the fact that T is maximal consistent, and ∼ is an equivalence relation, we get that
the operations are well-defined (it is not so simple; I'll write it out later). The proof that
M ⊨ T is a straightforward induction on the complexity of the formulas of T.

Corollary. (The extended completeness theorem) A set T of formulas of L is consistent if
and only if it has a model (regardless of whether or not L has witnesses for T).
Proof: First add a set C of new constants to L, and expand T to T' in such a way that C
is a set of witnesses for T'. Then expand T' to a maximal consistent set T''. This set has a
model M consisting of the constants in C, and M is also a model of T.

Corollary. (Compactness theorem) A set T of sentences of L has a model if and only if
every finite subset of T has a model.
Proof: Replace "has a model" by "is consistent", and apply the syntactic compactness
theorem.

Corollary. (Gödel's completeness theorem) Let T be a consistent set of formulas of L. Then
a sentence φ is a theorem of T if and only if it is true in every model of T.
Proof: If φ is not a theorem of T, then ¬φ is consistent with T, so T ∪ {¬φ} has a model
M, in which φ cannot be true.

Corollary. (Downward Löwenheim-Skolem theorem) If T ⊆ L has a model, then it has a
model of power at most |L|.
Proof: If T has a model, then it is consistent. The model constructed from constants has power
at most |L| (because we add at most |L| many new constants).

Most of the treatment found in this entry can be read in more detail in Chang and Keisler's
book Model Theory.
Version: 6 Owner: jihemme Author(s): jihemme

21.2

Stone space

Suppose L is a first order language and B is a set of parameters from an L-structure M.



Let Sn (B) be the set of (complete) n-types over B (see type). Then we put a topology on
Sn (B) in the following manner.
For every formula φ ∈ L(B) we let S(φ) := {p ∈ Sn(B) : φ ∈ p}. Then the topology is the
one with a basis of open sets given by {S(φ) : φ ∈ L(B)}. We call Sn(B) endowed
with this topology the Stone space of complete n-types over B.
Some logical theorems and conditions are equivalent to topological conditions on this topology.
The compactness theorem for first order logic is so named because it is equivalent to
this topology being compact.
We define p to be an isolated type iff p is an isolated point in the Stone space. This is
equivalent to there being some formula φ ∈ p so that for every ψ ∈ p we have T ⊢ φ → ψ,
i.e. all the formulas in p are implied by some single formula.
The Morley rank of a type p ∈ S1(M) is equal to the Cantor-Bendixson rank of p in
this space.
The idea of considering the Stone space of types dates back to [1].
We can see that the set of formulas in a language is a Boolean lattice. A type is an ultrafilter
on this lattice. The definition of a Stone space can be made in an analogous way on the set
of ultrafilters on any Boolean lattice.

REFERENCES
1. M. Morley, Categoricity in power. Trans. Amer. Math. Soc. 114 (1965), 514-538.

Version: 5 Owner: ratboy Author(s): Larry Hammick, Timmy

21.3

alphabet

An alphabet Σ is a nonempty finite set of symbols. The main restriction is that we must
make sure that every string formed from Σ can be broken back down into symbols in only
one way.
For example, {b, lo, g, bl, og} is not a valid alphabet because the string blog can be broken
up in two ways: b lo g and bl og. {Ca, na, d, a} is a valid alphabet, because there is only
one way to fully break up any given string formed from it.
If Σ is our alphabet and n ∈ Z⁺, we define the following as the powers of Σ:

Σ⁰ = {λ}, where λ stands for the empty string.

Σⁿ = {xy | x ∈ Σ, y ∈ Σⁿ⁻¹} (xy is the juxtaposition of x and y)

So, Σⁿ is the set of all strings formed from Σ of length n.
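The unique-decomposability requirement can be tested for any particular string by counting its decompositions with a short dynamic program; a sketch (the function name is ours):

```python
def count_decompositions(s, alphabet):
    """Number of ways s can be broken into symbols of the alphabet."""
    ways = [0] * (len(s) + 1)
    ways[0] = 1  # the empty string decomposes in exactly one way
    for i in range(1, len(s) + 1):
        for sym in alphabet:
            if len(sym) <= i and s[i - len(sym):i] == sym:
                ways[i] += ways[i - len(sym)]
    return ways[len(s)]

# The invalid alphabet from the text: blog splits as b-lo-g and as bl-og.
print(count_decompositions("blog", {"b", "lo", "g", "bl", "og"}))  # 2
```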
Version: 1 Owner: xriso Author(s): xriso

21.4

axiomatizable theory

Let T be a first order theory. A subset Δ ⊆ T is a set of axioms for T if and only if T is
the set of all consequences of the formulas in Δ. In other words, φ ∈ T if and only if φ is
provable using only assumptions from Δ.
Definition. A theory T is said to be finitely axiomatizable if and only if there is a finite
set of axioms for T; it is said to be recursively axiomatizable if and only if it has a
recursive set of axioms.
For example, group theory is finitely axiomatizable (it has only three axioms), and Peano arithmetic
is recursively axiomatizable: there is clearly an algorithm that can decide if a formula of
the language of the natural numbers is an axiom.
Theorem. Complete recursively axiomatizable theories are decidable.
As an example of the use of this theorem, consider the theory of algebraically closed fields
of characteristic p, for p prime or 0. It is complete, and its set of axioms is
obviously recursive, so it is decidable.
Version: 2 Owner: jihemme Author(s): jihemme

21.5

definable

21.5.1

Definable sets and functions

Definability In Model Theory


Let L be a first order language. Let M be an L-structure. Denote x1, ..., xn by ~x and
y1, ..., ym by ~y, and suppose φ(~x, ~y) is a formula from L, and b1, ..., bm is some sequence
from M.
Then we write φ(M^n, ~b) to denote {~a ∈ M^n : M ⊨ φ(~a, ~b)}. We say that φ(M^n, ~b) is
~b-definable. More generally, if S is some set and B ⊆ M, and there is some ~b from B so that
S is ~b-definable, then we say that S is B-definable.


In particular we say that a set S is ∅-definable or zero definable iff it is the solution set of
some formula without parameters.
Let f be a function; then we say f is B-definable iff the graph of f (i.e. {(x, y) : f(x) = y})
is a B-definable set.
If S is B-definable then any automorphism of M that fixes B pointwise fixes S setwise.
A set or function is definable iff it is B-definable for some set of parameters B.
Some authors use the term definable to mean what we have called ∅-definable here. If this
is the convention of a paper, then the term parameter definable will refer to sets that are
definable over some parameters.
Sometimes in model theory it is not actually very important what language one is using, but
merely what the definable sets are, or what the definability relation is.

Definability of functions in Proof Theory


In proof theory, given a theory T in the language L, for a function f : M → M to be
definable in the theory T, we have two conditions:
(i) There is a formula in the language L such that f is definable over the model M, as in the above
definition; i.e., its graph is definable in the language L over the model M, by some formula
φ(~x, y).
(ii) The theory T proves that f is indeed a function, that is T ⊢ ∀~x ∃!y φ(~x, y).
For example: the graph of the exponentiation function x^y = z is definable in the language of
the theory IΔ₀ (a weak subsystem of PA); however, the function itself is not definable in
this theory.
Version: 13 Owner: iddo Author(s): iddo, yark, Timmy

21.6

definable type

Let M be a first order structure. Let A and B be sets of parameters from M. Let p be
a complete n-type over B. Then we say that p is an A-definable type iff for every formula
ψ(~x, ~y) with len(~x) = n, there is some formula dψ(~y, ~z) and some parameters ~a from A so
that for any ~b from B we have ψ(~x, ~b) ∈ p iff M ⊨ dψ(~b, ~a).
Note that if p is a type over the model M then this condition is equivalent to showing that
{~b ∈ M : ψ(~x, ~b) ∈ p} is an A-definable set.
For p a type over B, we say p is definable if it is B-definable.
If p is definable, we call dψ the defining formula for ψ, and the function ψ ↦ dψ a defining
scheme for p.
Version: 1 Owner: Timmy Author(s): Timmy

21.7

downward Löwenheim-Skolem theorem

Let L be a first order language, let A be an L-structure and let K ⊆ dom(A). Then there is
an L-structure B such that K ⊆ B and |B| ≤ Max(|K|, |L|), and B is elementarily embedded
in A.
Version: 1 Owner: Evandar Author(s): Evandar

21.8

example of definable type

Consider (Q, <) as a structure in a language with one binary relation, which we interpret as
the order. This is a universal, ℵ₀-categorical structure (see example of universal structure).
The theory of (Q, <) has quantifier elimination, and so is o-minimal. Thus a type over the
set Q is determined by the quantifier free formulas over Q, which in turn are determined by
the atomic formulas over Q. An atomic formula in one variable x over B is of the form x < b
or x > b or x = b for some b ∈ B. Thus each 1-type over Q determines a Dedekind cut over
Q, and conversely a Dedekind cut determines a complete type over Q. Let D(p) := {a ∈ Q :
(x > a) ∈ p}.
Thus there are two classes of type over Q:
1. Ones where D(p) is of the form (−∞, a) or (−∞, a] for some a ∈ Q. It is clear that
these are definable from the above discussion.
2. Ones where D(p) has no supremum in Q. These are clearly not definable, by o-minimality
of Q.
Version: 1 Owner: Timmy Author(s): Timmy


21.9

example of strongly minimal

Let L_R be the language of rings. In other words, L_R has two constant symbols 0, 1 and three
binary function symbols +, ·, −. Let T be the L_R-theory that includes the field axioms and
for each n the formula

∀x0 x1 ... xn ∃y ( ¬(x1 = 0 ∧ ... ∧ xn = 0) → x0 + x1·y + ... + xn·yⁿ = 0 )

which expresses that every non-constant polynomial of degree at most n has a root. Then any
model of T is an algebraically closed field. One can show that this is a complete theory and
has quantifier elimination (Tarski). Thus every B-definable subset of any K ⊨ T is definable
by a quantifier free formula in L_R(B) with one free variable y. A quantifier free formula is a
Boolean combination of atomic formulas. Each of these is of the form b0 + b1·y + ... + bn·yⁿ = 0, which
defines a finite set. Thus every definable subset of K is a finite or cofinite set. Thus K and
T are strongly minimal.
Version: 3 Owner: Timmy Author(s): Timmy

21.10

first isomorphism theorem

Let σ be a fixed signature, and A and B structures for σ. If f : A → B is a homomorphism,
then A/ker(f) is bimorphic to im(f). Furthermore, if f has the additional property that for
every natural number n and n-ary relation symbol R of σ,

R^B(f(a1), ..., f(an)) → ∃a1', ..., an' [f(ai) = f(ai') ∧ R^A(a1', ..., an')],

then A/ker(f) ≅ im(f).

Since the homomorphic image of a σ-structure is also a σ-structure, we may assume that
im(f) = B.

Let ∼ = ker(f). Define a bimorphism φ : A/∼ → B : [[a]] ↦ f(a). To verify that φ is well
defined, let a ∼ a'. Then φ([[a]]) = f(a) = f(a') = φ([[a']]). To show that φ is injective,
suppose φ([[a]]) = φ([[a']]). Then f(a) = f(a'), so a ∼ a'. Hence [[a]] = [[a']]. To show that φ is
a homomorphism, observe that for any constant symbol c of σ we have φ([[c^A]]) = f(c^A) = c^B.
For every natural number n and n-ary relation symbol R of σ,

R^(A/∼)([[a1]], ..., [[an]]) ⟹ R^A(a1, ..., an)
⟹ R^B(f(a1), ..., f(an))
⟹ R^B(φ([[a1]]), ..., φ([[an]])).


For every natural number n and n-ary function symbol F of σ,

φ(F^(A/∼)([[a1]], ..., [[an]])) = φ([[F^A(a1, ..., an)]])
= f(F^A(a1, ..., an))
= F^B(f(a1), ..., f(an))
= F^B(φ([[a1]]), ..., φ([[an]])).

Thus φ is a bimorphism.
Now suppose f has the additional property mentioned in the statement of the theorem.
Then

R^B(φ([[a1]]), ..., φ([[an]])) ⟺ R^B(f(a1), ..., f(an))
⟺ ∃a1', ..., an' [ai ∼ ai' ∧ R^A(a1', ..., an')]
⟺ R^(A/∼)([[a1]], ..., [[an]]).

Thus φ is an isomorphism.
Version: 4 Owner: almann Author(s): almann

21.11

language

Let Σ be an alphabet. We then define the following using the powers of an alphabet and
infinite union, where n ∈ Z:

Σ⁺ = ⋃_{n≥1} Σⁿ

Σ* = ⋃_{n≥0} Σⁿ = Σ⁺ ∪ {λ}

A string is an element of Σ*, meaning that it is a grouping of symbols from Σ one after
another. For example, abbc is a string, and cbba is a different string. Σ⁺, like Σ*, contains
all finite strings except that Σ⁺ does not contain the empty string λ.
A language over Σ is a subset of Σ*, meaning that it is a set of strings made from the
symbols in the alphabet Σ.
Take for example an alphabet Σ = {63, a, A}. We can construct languages over Σ, such
as: L = {aaa, λ, A63, 63, AaAaA}, or {a, aa, aaa, aaaa, ...}, or even the empty set
∅. In the context of languages, ∅ is called the empty language.
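The definitions above are easy to check on a small alphabet. The sketch below builds Σⁿ by juxtaposition and truncated versions of Σ⁺ and Σ* (the empty Python string plays the role of λ; the helper name is ours):

```python
from itertools import product

sigma = ["a", "b"]

def power(alphabet, n):
    """Sigma^n: all strings of exactly n symbols, built by juxtaposition."""
    return {"".join(p) for p in product(alphabet, repeat=n)}

# Truncations (n <= 3) of the infinite unions defining Sigma^+ and Sigma^*.
sigma_plus = set().union(*(power(sigma, n) for n in range(1, 4)))
sigma_star = sigma_plus | {""}   # "" stands for the empty string lambda

print(sorted(power(sigma, 2)))   # ['aa', 'ab', 'ba', 'bb']
```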
Version: 12 Owner: bbukh Author(s): bbukh, xriso

21.12

length of a string

Suppose we have a string w on alphabet Σ. We can then represent the string as w =
x1 x2 x3 ... x(n−1) xn, where for all xi (1 ≤ i ≤ n), xi ∈ Σ (this means that each xi must be
a letter from the alphabet). Then, the length of w is n. The length of a string w is
represented as ‖w‖.
For example, if our alphabet is Σ = {a, b, ca} then the length of the string w = bcaab is
‖w‖ = 4, since the string breaks down as follows: x1 = b, x2 = ca, x3 = a, x4 = b. So, our
xn is x4 and therefore n = 4. Although you may think that ca is two separate symbols, our
chosen alphabet in fact classifies it as a single symbol.
A special case occurs when ‖w‖ = 0, i.e. the string does not have any symbols in it. This string
is called the empty string. Instead of writing nothing, we use λ to represent the empty string:
w = λ. This is similar to the practice of using a visible placeholder to represent a space, even
though a space is really blank.
If your alphabet contains λ as a symbol, then you must use something else to denote the
empty string.
Suppose you also have a string v on the same alphabet as w. We write w as x1 ... xn just
as before, and similarly v = y1 ... ym. We say v is equal to w if and only if both m = n
and, for every i, xi = yi.
For example, suppose w = bba and v = bab, both strings on alphabet = {a, b}. These
strings are not equal because the second symbols do not match.
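Breaking a string back into symbols, and hence measuring its length over a multi-character alphabet, can be sketched with a small backtracking splitter (assuming, as the entry does, a uniquely decodable alphabet; the function name is ours):

```python
def symbols(s, alphabet):
    """Split s into alphabet symbols, or None if s is not over the alphabet."""
    if s == "":
        return []
    for sym in alphabet:
        if s.startswith(sym):
            rest = symbols(s[len(sym):], alphabet)
            if rest is not None:
                return [sym] + rest
    return None

# The example from the text: bcaab over {a, b, ca} has length 4.
print(symbols("bcaab", {"a", "b", "ca"}))   # ['b', 'ca', 'a', 'b']
```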
Version: 3 Owner: xriso Author(s): xriso

21.13

proof of homomorphic image of a σ-structure is a σ-structure

We need to show that im(f) is closed under functions. For every constant symbol c of σ,
c^B = f(c^A). Hence c^B ∈ im(f). Also, if b1, ..., bn ∈ im(f) and F is an n-ary function symbol of
σ, then for some a1, ..., an ∈ A we have

F^B(b1, ..., bn) = F^B(f(a1), ..., f(an)) = f(F^A(a1, ..., an)).

Hence F^B(b1, ..., bn) ∈ im(f).
Version: 1 Owner: almann Author(s): almann


21.14

satisfaction relation

Alfred Tarski was the first mathematician to give a definition of what it means for a formula
to be true in a structure. To do this, we need to provide a meaning to terms, and truth
values to the formulas. In doing this, free variables cause a problem: what value are they
going to have? One possible answer is to supply temporary values for the free variables,
and define our notions in terms of these temporary values.
Let A be a structure for the signature τ. Suppose I is an interpretation, and σ is a function
that assigns elements of A to variables. We define the function Val_{I,σ} inductively on the
construction of terms:

Val_{I,σ}(c) = I(c)   (c a constant symbol)
Val_{I,σ}(x) = σ(x)   (x a variable)
Val_{I,σ}(f(t1, ..., tn)) = I(f)(Val_{I,σ}(t1), ..., Val_{I,σ}(tn))   (f an n-ary function symbol)

Now we are set to define satisfaction. Again we have to take care of free variables by assigning
temporary values to them via a function σ. We define the relation A, σ ⊨ φ by induction
on the construction of formulas:

A, σ ⊨ t1 = t2 if and only if Val_{I,σ}(t1) = Val_{I,σ}(t2)
A, σ ⊨ R(t1, ..., tn) if and only if (Val_{I,σ}(t1), ..., Val_{I,σ}(tn)) ∈ I(R)
A, σ ⊨ ¬φ if and only if A, σ ⊭ φ
A, σ ⊨ φ ∨ ψ if and only if either A, σ ⊨ φ or A, σ ⊨ ψ
A, σ ⊨ ∃x.φ(x) if and only if for some a ∈ A, A, σ[x/a] ⊨ φ

Here

σ[x/a](y) = a if x = y, and σ(y) else.

In case for some φ of L we have A, σ ⊨ φ, we say that A models, or is a model of,
or satisfies φ in environment, or context, σ. If φ has the free variables x1, ..., xn,
and a1, ..., an ∈ A, we also write A ⊨ φ(a1, ..., an) or A ⊨ φ(a1/x1, ..., an/xn) instead of
A, σ[x1/a1]...[xn/an] ⊨ φ. In case φ is a sentence (formula with no free variables), we write
A ⊨ φ.
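The inductive clauses above translate directly into a recursive evaluator. The sketch below fixes a tiny illustrative encoding of terms and formulas (our own, not from the text) and checks satisfaction in a four-element structure:

```python
# Terms: a variable name, or ('f', t1, ..., tn).  Formulas: ('=', t1, t2),
# ('not', phi), ('or', phi, psi), ('exists', x, phi).  Constants could be
# treated as 0-ary function symbols.
def val(term, I, sigma):
    if isinstance(term, str):                 # a variable: look it up in sigma
        return sigma[term]
    f, *args = term
    return I[f](*(val(t, I, sigma) for t in args))

def sat(A, I, sigma, phi):
    op = phi[0]
    if op == '=':
        return val(phi[1], I, sigma) == val(phi[2], I, sigma)
    if op == 'not':
        return not sat(A, I, sigma, phi[1])
    if op == 'or':
        return sat(A, I, sigma, phi[1]) or sat(A, I, sigma, phi[2])
    if op == 'exists':                        # try every a in the universe
        x, body = phi[1], phi[2]
        return any(sat(A, I, dict(sigma, **{x: a}), body) for a in A)
    raise ValueError(op)

# Universe {0,1,2,3} with successor mod 4; does some y satisfy s(y) = x?
A = {0, 1, 2, 3}
I = {'s': lambda a: (a + 1) % 4}
print(sat(A, I, {'x': 0}, ('exists', 'y', ('=', ('s', 'y'), 'x'))))  # True
```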
Version: 8 Owner: jihemme Author(s): jihemme

21.15

signature

A signature σ is the collection of a set of constant symbols, and for every natural number n,
a set of n-ary relation symbols and a set of n-ary function symbols.
Version: 1 Owner: almann Author(s): almann

21.16

strongly minimal

Let L be a first order language and let M be an L-structure. Let S, a subset of the domain
of M, be a definable infinite set. Then S is strongly minimal iff for every definable C ⊆ S we
have either C is finite or S \ C is finite. We say that M is strongly minimal iff the domain
of M is a strongly minimal set.
If M is strongly minimal and N ≡ M then N is strongly minimal. Thus if T is a complete
L-theory then we say T is strongly minimal if it has some model (equivalently all models)
which is strongly minimal.
Note that M is strongly minimal iff every definable subset of M is quantifier free definable
in a language with just equality. Compare this to the notion of o-minimal structures.
Version: 1 Owner: Timmy Author(s): Timmy

21.17

structure preserving mappings

Let σ be a fixed signature, and A and B be two structures for σ. The interesting functions
from A to B are the ones that preserve the structure.
A function f : A → B is said to be a homomorphism if and only if:
1. For every constant symbol c of σ, f(c^A) = c^B.
2. For every natural number n and every n-ary function symbol F of σ,
f(F^A(a1, ..., an)) = F^B(f(a1), ..., f(an)).
3. For every natural number n and every n-ary relation symbol R of σ,
R^A(a1, ..., an) ⟹ R^B(f(a1), ..., f(an)).
Homomorphisms with various additional properties have special names:

An injective homomorphism is called a monomorphism.


A surjective homomorphism is called an epimorphism.
A bijective homomorphism is called a bimorphism.
An injective homomorphism whose inverse function is also a homomorphism is called
an embedding.
A surjective embedding is called an isomorphism.
A homomorphism from a structure to itself (e.g., f : A → A) is called an endomorphism.
An isomorphism from a structure to itself is called an automorphism.
Version: 5 Owner: almann Author(s): almann, yark, jihemme

21.18

structures

Suppose σ is a fixed signature, and L is the corresponding first-order language. A σ-structure
A consists of a set A, called the universe of A, together with an interpretation
for the non-logical symbols contained in σ. The interpretation of σ in A is an operation
J on sets that has the following properties:
1. For each constant symbol c, J(c) is an element of A.
2. For each n ∈ N, and each n-ary function symbol f, J(f) : A^n → A is a function from
A^n to A.
3. For each n ∈ N, and each n-ary relation symbol R, J(R) is a subset of (an n-ary
relation on) A^n.
Another commonly used notation is J(c) = c^A, J(R) = R^A, J(f) = f^A. For notational
convenience, when the context makes it clear in which structure we are working, we use the
elements of σ to stand for both the symbols and their interpretation. When σ is understood,
we call A a structure, instead of a σ-structure. In some texts, model may be used for
structure. Also, we shall write a ∈ A for elements a of the universe. Of course, there are many different
possibilities for the interpretation J. If A is a structure, then the power of A, which we
denote |A|, is the cardinality of its universe A. It is easy to see that the number of possibilities
for the interpretation J is at most 2^|A| when A is infinite.
Version: 5 Owner: jihemme Author(s): jihemme


21.19

substructure

Let σ be a fixed signature, and A and B structures for σ. We say A is a substructure of
B, denoted A ⊆ B, if for all x ∈ A we have x ∈ B, and the inclusion map i : A → B : x ↦ x
is an embedding.
Version: 1 Owner: almann Author(s): almann

21.20

type

Let L be a first order language. Let M be an L-structure. Let B ⊆ M, and let a ∈ M^n.
Then we define the type of a over B to be the set of L-formulas φ(x, b) with parameters b
from B so that M ⊨ φ(a, b). A collection of L-formulas is a complete n-type over B iff it is
of the above form for some B, M and a ∈ M^n.
We call any consistent collection of formulas p in n variables with parameters from B a
partial n-type over B. (See criterion for consistency of sets of formulas.)
Note that a complete n-type p over B is consistent, so is in particular a partial type over
B. Also p is maximal in the sense that for every formula φ(x, b) over B we have either
φ(x, b) ∈ p or ¬φ(x, b) ∈ p. In fact, for every collection of formulas p in n variables
the following are equivalent:

p is the type of some sequence of n elements a over B in some model N ≽ M

p is a maximal consistent set of formulas.

For n ∈ ω we define Sn(B) to be the set of complete n-types over B.
Some authors define a collection of formulas p to be an n-type iff p is a partial n-type. Others
define p to be a type iff p is a complete n-type.
A type (resp. partial type/complete type) is any n-type (resp. partial n-type/complete n-type)
for some n ∈ ω.
Version: 2 Owner: Timmy Author(s): Timmy

21.21

upward Löwenheim-Skolem theorem

Let L be a first-order language and let A be an infinite L-structure. Then if κ is a cardinal
with κ ≥ Max(|A|, |L|), then there is an L-structure B of cardinality κ such that A is
elementarily embedded in B.
Version: 2 Owner: Evandar Author(s): Evandar


Chapter 22
03C15 Denumerable structures
22.1

random graph (infinite)

Suppose we have some method M of generating sequences of letters from {p, q} so that at
each generation the probability of obtaining p is x, a real number strictly between 0 and 1.
Let {ai : i < ω} be a set of vertices. For each i < ω, i ≥ 1, we construct a graph Gi on the
vertices a1, ..., ai recursively.
G1 is the unique graph on one vertex.
For i > 1 we must describe for any j < k ≤ i when aj and ak are joined.
If k < i then join aj and ak in Gi iff aj and ak are joined in G(i−1).
If k = i then generate a letter l(j, k) with M. Join aj to ak iff l(j, k) = p.
Now let Γ be the graph on {ai : i < ω} so that for any n, m < ω, an is joined to am in Γ iff
it is in some Gi.
Then we call Γ a random graph. Consider the following property, which we shall call f-saturation:
Given any finite disjoint U and V, subsets of {ai : i < ω}, there is some an ∈ {ai : i <
ω} \ (U ∪ V) so that an is joined to every point of U and no points of V.
Proposition 1. A random graph has f-saturation with probability 1.

Proof: Let b1, b2, ..., bn, ... be an enumeration of {ai : i < ω} \ (U ∪ V). We say that bi is
correctly joined to (U, V) iff it is joined to all the members of U and none of the members of
V. Then the probability that bi is not correctly joined is (1 − x^|U| (1 − x)^|V|), which is some
real number y strictly between 0 and 1. The probability that none of the first m are correctly
joined is y^m, and the probability that none of the bi's are correctly joined is lim_{n→∞} y^n = 0.
Thus one of the bi's is correctly joined.
Proposition 2. Any two countable graphs with f-saturation are isomorphic.
Proof: This is via a back and forth argument. The property of f-saturation is exactly what
is needed.
Thus although the system of generation of a random graph looked as though it could deliver
many potentially different graphs, this is not the case. Thus we talk about the random
graph.
The random graph can also be constructed as a Fraïssé limit of all finite graphs, and in many
other ways. It is homogeneous and universal for the class of all countable graphs.
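A finite stage of this construction, and the search for the witness demanded by f-saturation, can be simulated directly (x = 1/2 and the helper names below are our own choices):

```python
import random

def random_graph(n, x=0.5, seed=0):
    """Edges of a random graph on vertices 0..n-1: each pair is joined
    independently with probability x, as in the construction above."""
    rng = random.Random(seed)
    return {(j, k) for k in range(n) for j in range(k) if rng.random() < x}

def witness(edges, n, U, V):
    """A vertex joined to every point of U and to no point of V, if any."""
    joined = lambda a, b: (min(a, b), max(a, b)) in edges
    for a in range(n):
        if a not in U and a not in V \
           and all(joined(a, u) for u in U) \
           and not any(joined(a, v) for v in V):
            return a
    return None

# With 200 vertices the probability that no witness exists is minuscule.
edges = random_graph(200)
print(witness(edges, 200, U={0, 1}, V={2, 3}))
```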
The theorem that almost any two random infinite graphs are isomorphic was first proved
in [1].

REFERENCES
1. Paul Erdős and Alfréd Rényi. Asymmetric graphs. Acta Math. Acad. Sci. Hung., 14:295-315,
1963.

Version: 2 Owner: bbukh Author(s): bbukh, Timmy


Chapter 23
03C35 Categoricity and
completeness of theories
23.1

κ-categorical

Let L be a first order language and let S be a set of L-sentences. If κ is a cardinal, then S
is said to be κ-categorical if S has a model of cardinality κ and any two such models are
isomorphic.
In other words, S is κ-categorical iff it has a unique model of cardinality κ, to within isomorphism.
Version: 1 Owner: Evandar Author(s): Evandar

23.2

Vaught's test

Let L be a first order language, and let S be a set of L-sentences with no finite models which
is κ-categorical for some cardinal κ ≥ |L|. Then S is complete.
Version: 4 Owner: Evandar Author(s): Evandar

23.3

proof of Vaught's test

Let φ be an L-sentence, and let A be the unique model of S of cardinality κ. Suppose A ⊨ φ.
Then if B is any model of S, then by the upward and downward Löwenheim-Skolem theorems,
there is a model C of S which is elementarily equivalent to B such that |C| = κ. Then C is
isomorphic to A, and so C ⊨ φ, and B ⊨ φ. So B ⊨ φ for all models B of S, so S ⊨ φ.

Similarly, if A ⊨ ¬φ then S ⊨ ¬φ. So S is complete.


Version: 1 Owner: Evandar Author(s): Evandar


Chapter 24
03C50 Models with special
properties (saturated, rigid, etc.)
24.1

example of universal structure

Let L be the first order language with the binary relation ≤. Consider the following sentences:

∀x,y ((x ≤ y ∨ y ≤ x) ∧ ((x ≤ y ∧ y ≤ x) → x = y))

∀x,y,z ((x ≤ y ∧ y ≤ z) → x ≤ z)

Any L-structure satisfying these is called a linear order. We define the relation < so that
x < y iff x ≤ y ∧ x ≠ y. Now consider these sentences:

1. ∀x,y (x < y → ∃z (x < z < y))

2. ∀x ∃y,z (y < x < z)
A linear order that satisfies 1. is called dense. We say that a linear order that satisfies 2. is
without endpoints. Let T be the theory of dense linear orders without endpoints. This is a
complete theory.
We can see that (Q, ≤) is a model of T. It is actually a rather special model.
Theorem 3. Let (S, ≤) be any finite linear order. Then S embeds in (Q, ≤).
Proof: By induction on |S|; it is trivial for |S| = 1.

Suppose that the statement holds for all linear orders with cardinality less than or equal to
n. Let |S| = n + 1, then pick some a ∈ S, and let S' be the structure induced by S on S \ {a}.
Then there is some embedding e of S' into Q.
Now suppose a is less than every member of S'. Then as Q is without endpoints, there
is some element b less than every element in the image of e. Thus we can extend e to
map a to b, which gives an embedding of S into Q.
We work similarly if a is greater than every element in S'.
If neither of the above hold then we can pick some maximum c1 ∈ S' so that c1 < a.
Similarly we can pick some minimum c2 ∈ S' so that a < c2. Now there is some b ∈ Q
with e(c1) < b < e(c2). Then extending e by mapping a to b is the required embedding.

It is easy to extend the above result to countable structures. One views a countable structure
as the union of an increasing chain of finite substructures. The necessary embedding is
the union of the embeddings of the substructures. Thus (Q, ≤) is a universal countable linear
order.
Theorem 4. (Q, ≤) is homogeneous.
Proof: The following type of proof is known as a back and forth argument. Let S1 and S2 be
two finite substructures of (Q, ≤). Let e : S1 → S2 be an isomorphism. It is easier to think
of two disjoint copies B and C of Q with S1 a substructure of B and S2 a substructure of C.
Let b1, b2, ... be an enumeration of B \ S1. Let c1, c2, ... be an enumeration of C \ S2.
We iterate the following two step process:
The ith forth step: If bi is already in the domain of e then do nothing. If bi is not in the
domain of e, then as in Theorem 3, either bi is less than every element in the domain of
e, or greater than every such element, or it has an immediate successor and predecessor in the
domain of e. Either way there is an element c in C \ range(e) in the same position relative
to the range of e. Thus we can extend the isomorphism to include bi.
The ith back step: If ci is already in the range of e then do nothing. If ci is not in the
range of e, then exactly as above we can find some b ∈ B \ dom(e) and extend e so that
e(b) = ci.
After ω stages, we have an isomorphism whose domain includes every bi and whose range
includes every ci. Thus we have an isomorphism from B to C extending e.
A similar back and forth argument shows that any countable dense linear order without
endpoints is isomorphic to (Q, ≤), so T is ℵ₀-categorical.
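One extension step of this back and forth argument can be written out concretely for (Q, ≤). The sketch below (names ours) places a new point of the domain in the same relative position within the range, using density and the absence of endpoints:

```python
from fractions import Fraction

def extend(e, b):
    """Extend a nonempty finite partial order-isomorphism e (a list of
    (domain, range) pairs, increasing in both coordinates) to cover b."""
    if any(d == b for d, _ in e):
        return sorted(e)
    lower = [d for d, _ in e if d < b]
    upper = [d for d, _ in e if d > b]
    if not lower:                       # b below everything: no left endpoint
        c = min(r for _, r in e) - 1
    elif not upper:                     # b above everything: no right endpoint
        c = max(r for _, r in e) + 1
    else:                               # density: squeeze between neighbours
        m = dict(e)
        c = (m[max(lower)] + m[min(upper)]) / 2
    return sorted(e + [(b, c)])

e = [(Fraction(0), Fraction(10)), (Fraction(1), Fraction(12))]
print(extend(e, Fraction(1, 2)))   # new image lands strictly between 10 and 12
```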
Version: 5 Owner: Timmy Author(s): Timmy

24.2

homogeneous

Let L be a first order language. Let M be an L-structure. Then we say M is homogeneous
if the following holds: if σ is an isomorphism between finite substructures of M, then σ
extends to an automorphism of M.
Version: 1 Owner: Timmy Author(s): Timmy

24.3

universal structure

Let L be a first order language, and let R be an elementary class of L-structures. Let κ be
a cardinal. Let R_κ be the set of structures from R with cardinality less than or equal to κ.
Let M ∈ R_κ. Suppose that for every N ∈ R_κ there is an embedding of N into M. Then we
say M is universal.


Chapter 25
03C52 Properties of classes of models

25.1 amalgamation property
A class of L-structures S has the amalgamation property iff whenever A, B1, B2 ∈ S and
fi : A → Bi are elementary embeddings for i ∈ {1, 2}, then there is some C ∈ S and some
elementary embeddings gi : Bi → C for i ∈ {1, 2} so that g1(f1(x)) = g2(f2(x)) for all x ∈ A.
Compare this with the free product with amalgamated subgroup for groups and the definition
of pushout contained there.
Version: 2 Owner: Timmy Author(s): Timmy

Chapter 26
03C64 Model theory of ordered structures; o-minimality

26.1 infinitesimal
Let R be a real closed field, for example the reals thought of as a structure in L, the language
of ordered rings. Let B be some set of parameters from R. Consider the following set of
formulas in L(B):

{x < b : b ∈ B ∧ b > 0}

Then this set of formulas is finitely satisfied, so by compactness is consistent. In fact this
set of formulas extends to a unique type p over B, as it defines a Dedekind cut. Thus there
is some model M containing B and some a ∈ M so that tp(a/B) = p.
Any such element will be called B-infinitesimal. In particular, suppose B = ∅. Then the
definable closure of B is the intersection of the reals with the algebraic numbers. Then an
∅-infinitesimal (or simply infinitesimal) is any element of any real closed field that is positive
but smaller than every real algebraic (positive) number.
As noted above such models exist, by compactness. One can construct them using
ultraproducts; see the entry on hyperreal. This is due to Abraham Robinson, who used such
fields to formulate nonstandard analysis.
Let K be any ordered ring; then K contains N. We say K is archimedean iff for every a ∈ K
there is some n ∈ N so that a < n. Otherwise K is non-archimedean.
Real closed fields with infinitesimal elements are non-archimedean: for a an infinitesimal we
have a < 1/n and thus 1/a > n for each n ∈ N.
Reference: A. Robinson, Selected papers of Abraham Robinson. Vol. II. Nonstandard analysis
and philosophy (New Haven, Conn., 1979)
Version: 2 Owner: Timmy Author(s): Timmy

26.2 o-minimality

Let M be an ordered structure. An interval in M is any subset of M that can be expressed
in one of the following forms:
{x : a < x < b} for some a, b from M
{x : x > a} for some a from M
{x : x < a} for some a from M
Then we define M to be o-minimal iff every definable subset of M is a finite union of intervals
and points. This is a property of the theory of M, i.e. if M ≡ N and M is o-minimal, then
N is o-minimal. Note that M being o-minimal is equivalent to every definable subset of M
being quantifier free definable in the language with just the ordering. Compare this with
strong minimality.
The model theory of o-minimal structures is well understood; for an excellent account see
Lou van den Dries, Tame topology and o-minimal structures, CUP 1998. In particular,
although this condition is merely on definable subsets of M, it gives very good information
about definable subsets of M^n for n ∈ ω.
Version: 4 Owner: Timmy Author(s): Timmy

26.3 real closed fields

It is clear that the axioms for a structure to be an ordered field can be written in L, the
first order language of ordered rings. It is also true that the following conditions can be
written in a schema of first order sentences in this language: for each odd degree polynomial
p ∈ K[x], p has a root.
Let A be all these sentences together with one that states that all positive elements have a
square root. Then one can show that the consequences of A are a complete theory T. It is
clear that this theory is the theory of the real numbers. We call any L-structure satisfying
T a real closed field.

The semi algebraic sets on a real closed field are Boolean combinations of solution sets of
polynomial equalities and inequalities. Tarski showed that T has quantifier elimination,
which is equivalent to the class of semi algebraic sets being closed under projection.
Let K be a real closed field. Consider the definable subsets of K. By quantifier elimination,
each is definable by a quantifier free formula, i.e. a boolean combination of atomic formulas.
An atomic formula in one variable has one of the following forms:
f(x) > g(x) for some f, g ∈ K[x]
f(x) = g(x) for some f, g ∈ K[x].
The first defines a finite union of intervals, the second defines a finite union of points. Every
definable subset of K is a finite union of these kinds of sets, so is a finite union of intervals
and points. Thus any real closed field is o-minimal.
Version: 2 Owner: Timmy Author(s): Timmy


Chapter 27
03C68 Other classical first-order model theory

27.1 imaginaries
Given an algebraic structure S to investigate, mathematicians consider substructures,
restrictions of the structure, quotient structures and the like. A natural question for a
mathematician to ask if he is to understand S is "What structures naturally live in S?" We can
formalise this question in the following manner: given some logic appropriate to the
structure S, we say another structure T is definable in S iff there is some definable subset T′ of
S^n, a bijection σ : T′ → T and a definable function (respectively relation) on T′ for each
function (resp. relation) on T so that σ is an isomorphism (of the relevant type for T).
For an example take some infinite group (G, ·). Consider the centre of G, Z := {x ∈ G :
∀y ∈ G (xy = yx)}. Then Z is a first order definable subset of G, which forms a group with
the restriction of the multiplication, so (Z, ·) is a first order definable structure in (G, ·).
As another example consider the structure (R, +, ·, 0, 1) as a field. Then the structure (R, <)
is first order definable in the structure (R, +, ·, 0, 1), as for all x, y ∈ R we have x ≤ y iff
∃z (z² = y − x). Thus we know that (R, +, ·, 0, 1) is unstable, as it has a definable order on
an infinite subset.
Returning to the first example, Z is normal in G, so the set of (left) cosets of Z forms a
factor group. The domain of the factor group is the quotient of G under the equivalence
relation x ∼ y iff ∃z ∈ Z (xz = y). Therefore the factor group G/Z will not (in general)
be a definable structure, but would seem to be a natural structure. We therefore weaken
our formalisation of "natural" from definable to interpretable. Here we require that a
structure is isomorphic to some definable structure on equivalence classes of definable
equivalence relations. The equivalence classes of a ∅-definable equivalence relation are
called imaginaries.

In [2] Poizat defined the property of Elimination of Imaginaries. This is equivalent to the
following definition:
Definition 1. A structure A with at least two distinct ∅-definable elements admits elimination
of imaginaries iff for every n ∈ N and ∅-definable equivalence relation ∼ on A^n there is
a ∅-definable function f : A^n → A^p (for some p) such that for all x and y from A^n we have
x ∼ y iff f(x) = f(y).
Given this property, we think of the function f as coding the equivalence classes of ∼,
and we call f(x) a code for x/∼. If a structure has elimination of imaginaries then every
interpretable structure is definable.
In [3] Shelah defined, for any structure A, a multi-sorted structure A^eq. This is done by
adding a sort for every ∅-definable equivalence relation, so that the equivalence classes are
elements (and code themselves). This is a closure operator, i.e. A^eq has elimination of
imaginaries. See [1] chapter 4 for a good presentation of imaginaries and A^eq. The idea
of passing to A^eq is very useful for many purposes. Unfortunately A^eq has an unwieldy
language and theory. Also this approach does not answer the question above. We would
like to show that our structure has elimination of imaginaries with just a small selection of
sorts added, and perhaps in a simple language. This would allow us to describe the definable
structures more easily, and as we have elimination of imaginaries this would also describe
the interpretable structures.

REFERENCES
1. Wilfrid Hodges, A shorter model theory, Cambridge University Press, 1997.
2. Bruno Poizat, Une théorie de Galois imaginaire, Journal of Symbolic Logic, 48 (1983),
pp. 1151-1170.
3. Saharon Shelah, Classification Theory and the Number of Non-isomorphic Models,
North-Holland, Amsterdam, 1978.

Version: 2 Owner: Timmy Author(s): Timmy

Chapter 28
03C90 Nonclassical models (Boolean-valued, sheaf, etc.)

28.1 Boolean valued model

A traditional model of a language makes every formula of that language either true or
false. A Boolean valued model is a generalization in which formulas take on any value in a
Boolean algebra.
Specifically, a Boolean valued model of a signature over the language L is a set A together
with a Boolean algebra B. The objects of the model are the functions A^B = {f : B → A}.
For any formula φ, we can assign a value ‖φ‖ from the Boolean algebra. For example, if L is
the language of first order logic, a typical recursive definition of ‖φ‖ might look something
like this:
‖f = g‖ = ⋁_{f(b)=g(b)} b

‖¬φ‖ = ‖φ‖′

‖φ ∧ ψ‖ = ‖φ‖ ∧ ‖ψ‖

‖∃x φ(x)‖ = ⋁_{f∈A^B} ‖φ(f)‖

Version: 1 Owner: Henry Author(s): Henry

Chapter 29
03C99 Miscellaneous

29.1 axiom of foundation

The axiom of foundation (also called the axiom of regularity) is an axiom of ZF
set theory prohibiting circular sets and sets with infinite levels of containment. Intuitively,
it states that every set can be built up from the empty set. There are several equivalent
formulations, for instance:
For any nonempty set X there is some y ∈ X such that y ∩ X = ∅.
For any set X, there is no function f from ω to the transitive closure of X such that
f(n + 1) ∈ f(n) for all n.
For any formula φ, if there is any set x such that φ(x) then there is some X such that φ(X)
but there is no y ∈ X such that φ(y).
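The first formulation can be checked directly for hereditarily finite sets. A minimal sketch, not from the text: the encoding by nested `frozenset`s and the helper name `epsilon_minimal` are choices made here for illustration.

```python
def epsilon_minimal(X):
    """Return some y in X with y ∩ X = ∅ -- the element whose existence
    the first formulation of foundation guarantees.  X is a finite set
    of hereditarily finite sets encoded as nested frozensets."""
    for y in X:
        if not (y & X):        # y shares no element with X
            return y
    raise ValueError("X has no ∈-minimal element (not well-founded)")

empty = frozenset()            # ∅
one = frozenset({empty})       # {∅}
two = frozenset({empty, one})  # {∅, {∅}}
```

For example, `epsilon_minimal(frozenset({one, two}))` returns `one`, since `two` contains `one` but `one` shares no element with the set.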
Version: 2 Owner: Henry Author(s): Henry

29.2 elementarily equivalent

If M and N are models of L then they are elementarily equivalent, denoted M ≡ N, iff
for every sentence φ:
M ⊨ φ iff N ⊨ φ
Version: 1 Owner: Henry Author(s): Henry

29.3 elementary embedding

If A and B are models of L such that for each t ∈ T, At ⊆ Bt, then we say B is an
elementary extension of A, or, equivalently, A is an elementary substructure of B, if,
whenever φ is a formula of L with free variables included in x1, . . . , xn (of types t1, . . . , tn)
and a1, . . . , an are such that ai ∈ Ati for each i ≤ n, then:
A ⊨ φ(a1, . . . , an) iff B ⊨ φ(a1, . . . , an)
If A and B are models of L then a collection of one-to-one functions ft : At → Bt for each
t ∈ T is an elementary embedding of A if whenever φ is a formula of L with free
variables included in x1, . . . , xn (of types t1, . . . , tn) and a1, . . . , an are such that ai ∈ Ati
for each i ≤ n, then:
A ⊨ φ(a1, . . . , an) iff B ⊨ φ(ft1(a1), . . . , ftn(an))
Version: 1 Owner: Henry Author(s): Henry

29.4 model

Let L be a logical language with function symbols F, relations R, and types T. Then
M = ⟨{Mt | t ∈ T}, {f^M | f ∈ F}, {r^M | r ∈ R}⟩
is a model of L (also called an L-structure, or, if the underlying logic is clear, a σ-structure,
where σ is a signature specifying just F and R) if:
Whenever f is an n-ary function symbol such that Type(f) = t and Inputs_n(f) =
⟨t1, . . . , tn⟩ then f^M : ∏_{i=1}^n Mti → Mt
Whenever r is an n-ary relation symbol such that Inputs_n(r) = ⟨t1, . . . , tn⟩ then r^M is
a relation on ∏_{i=1}^n Mti
If s is a term of L of type ts without free variables then it follows that s = f s1 . . . sn and
s^M = f^M(s1^M, . . . , sn^M) ∈ Mts.
If φ is a sentence then we write M ⊨ φ (and say that M satisfies φ) if φ is true in M, where
truth of a relation is defined by:
R t1 . . . tn is true if R^M(t1^M, . . . , tn^M)
Truth of a non-atomic formula is defined using the semantics of the underlying logic.
If Φ is a class of sentences, we write M ⊨ Φ if for every φ ∈ Φ, M ⊨ φ.
For any term s of L whose only free variables are included in x1, . . . , xn with types
t1, . . . , tn, and for any a1, . . . , an such that ai ∈ Mti, define s^M(a1, . . . , an) by:
If s = xi then s^M(a1, . . . , an) = ai
If s = f s1 . . . sm then s^M(a1, . . . , an) = f^M(s1^M(a1, . . . , an), . . . , sm^M(a1, . . . , an))
If φ is a formula whose only free variables are included in x1, . . . , xn with types t1, . . . , tn,
then for any a1, . . . , an such that ai ∈ Mti define M ⊨ φ(a1, . . . , an) recursively by:
If φ = R s1 . . . sm then M ⊨ φ(a1, . . . , an) iff R^M(s1^M(a1, . . . , an), . . . , sm^M(a1, . . . , an))
Otherwise the truth of φ is determined by the semantics of the underlying logic.
As above, M ⊨ Φ(a1, . . . , an) iff for every φ ∈ Φ, M ⊨ φ(a1, . . . , an).
Version: 10 Owner: Henry Author(s): Henry

29.5 proof of equivalence of formulations of foundation

We show that each of the three formulations of the axiom of foundation given are equivalent.

1 → 2
Let X be a set and consider any function f : ω → tc(X). Consider Y = {f(n) | n ∈ ω}. By
assumption, there is some f(n) ∈ Y such that f(n) ∩ Y = ∅, hence f(n + 1) ∉ f(n).

2 → 3
Let φ be some formula such that φ(x) is true for some x and, for every X such that φ(X),
there is some y ∈ X such that φ(y). Then define f(0) = x, and let f(n + 1) be some
z ∈ f(n) such that φ(z). This would construct a function violating the second formulation,
so there is no such φ.

3 → 1
Let X be a nonempty set and define φ(x) ≡ x ∈ X. Then φ is true for some x, and by
assumption there is some y such that φ(y) but there is no z ∈ y such that φ(z). Hence
y ∈ X but y ∩ X = ∅.
Version: 1 Owner: Henry Author(s): Henry

Chapter 30
03D10 Turing machines and related notions

30.1 Turing machine

A Turing machine is an imaginary computing machine invented by Alan Turing to describe
what it means to compute something.
The physical description of a Turing machine is a box with a tape and a tape head. The
tape consists of an infinite number of cells stretching in both directions, with the tape head
always located over exactly one of these cells. Each cell has one of a finite number of symbols
written on it.
The machine has a finite set of states, and with every move the machine can change states,
change the symbol written on the current cell, and move one space left or right. The machine
has a program which specifies each move based on the current state and the symbol under
the current cell. The machine stops when it reaches a combination of state and symbol
for which no move is defined. One state is the start state, which the machine is in at the
beginning of a computation.
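The description above can be turned into a small simulator. This is an illustrative sketch under assumptions made here, not a standard API: the program is encoded as a dict from (state, symbol) to (new state, new symbol, move), and a step cap is added only so the sketch always returns.

```python
def run_tm(program, tape, state='start', blank=' ', max_steps=10_000):
    """Simulate a deterministic Turing machine.  The tape is kept as a
    sparse dict, so it is unbounded in both directions; the machine
    stops when no move is defined for (state, current symbol)."""
    cells = dict(enumerate(tape))
    head = 0
    for _ in range(max_steps):
        key = (state, cells.get(head, blank))
        if key not in program:          # no move defined: halt
            break
        state, cells[head], move = program[key]
        head += move                    # move is -1 (left) or +1 (right)
    lo, hi = min(cells), max(cells)
    return state, ''.join(cells.get(i, blank) for i in range(lo, hi + 1))

# a two-symbol machine that flips every bit, halting at the first blank
flip = {('start', '0'): ('start', '1', +1),
        ('start', '1'): ('start', '0', +1)}
```

Here `run_tm(flip, '1011')` returns `('start', '0100')`: the head sweeps right, rewriting each cell, until it reads a blank for which no move is defined.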
A Turing machine may be viewed as computing either a partial function or a relation. When
viewed as a function, the tape begins with a set of symbols which are the input, and when
the machine halts, whatever is on the tape is the output. For instance it is not difficult to
write a program which doubles a binary number, so input of 10 (with 0 on the first cell, 1
on the second, and all the rest blank) would give output 100. If the machine does not halt
on a particular input then the function is undefined on that input.
Alternatively, a Turing machine may be viewed as computing a relation. In that case the
initial symbols on the tape are again an input, and some states are denoted accepting. If
the machine halts in an accepting state, the symbol is accepted; if it halts in any other state,
the symbol is rejected. A slight variation is when all states are accepting, and a symbol
is rejected if the machine never halts (of course, if the only method of determining if the
machine will halt is watching it then you can never be sure that it won't stop at some point
in the future).
Another way for a Turing machine to compute a relation is to list (enumerate) its members
one by one. A relation is recursively enumerable if there is some Turing machine which can
list it in this way, or equivalently if there is a machine which halts in an accepting state only
on the members of the relation. A relation is recursive if it is recursively enumerable and its
complement is also. An equivalent definition is that there is a Turing machine which halts
in an accepting state only on members of the relation and always halts.
There are many variations on the definition of a Turing machine. The tape could be infinite
in only one direction, having a first cell but no last cell. Even stricter, a tape could move in
only one direction. It could be two (or more) dimensional. There could be multiple tapes,
and some of them could be read only. The cells could have multiple tracks, so that they hold
multiple symbols simultaneously.
The programs mentioned above define only one move for each possible state and symbol
combination; these are called deterministic. Some programs define multiple moves for some
combinations.
If the machine halts whenever there is any series of legal moves which leads to a situation
without moves, the machine is called non-deterministic. The notion is that the machine
guesses which move to use whenever there are multiple choices, and always guesses right.
Yet other machines are probabilistic; when given the choice between different moves they
select one at random.
No matter which of these variations is used, the recursive and recursively enumerable
relations and functions are unchanged (with two exceptions: one of the tapes has to move in
two directions, although it need not be infinite in both directions, and there can only be a
finite number of symbols, states, and tapes): the simplest imaginable machine, with a single
one-way infinite tape and only two symbols, is equivalent to the most elaborate imaginable
array of multidimensional tapes, lucky guesses, and fancy symbols.
However not all these machines can compute at the same speed; the speed-up theorem states
that the number of moves it takes a machine to halt can be divided by an arbitrary constant
(the basic method involves increasing the number of symbols so that each cell encodes several
cells from the original machine; each move of the new machine emulates several moves from
the old one).
In particular, the question "P = NP?", which asks whether an important class of deterministic
machines (those which have a polynomial function of the input length bounding the time it
takes them to halt) is the same as the corresponding class of non-deterministic machines, is
one of the major unsolved problems in modern mathematics.

Version: 2 Owner: Henry Author(s): Henry

Chapter 31
03D20 Recursive functions and relations, subrecursive hierarchies

31.1 primitive recursive

The class of primitive recursive functions is the smallest class of functions on the naturals
(from N^n to N) that
1. Includes
the zero function: z(x) = 0
the successor function: s(x) = x + 1
the projection functions: p_{n,m}(x1, . . . , xn) = xm, m ≤ n
2. Is closed under
composition: h(x1, . . . , xn) = f(g1(x1, . . . , xn), . . . , gm(x1, . . . , xn))
primitive recursion: h(x, 0) = f(x); h(x, y + 1) = g(x, y, h(x, y))

The primitive recursive functions are Turing-computable, but not all Turing-computable
functions are primitive recursive (see Ackermann's function).
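The two closure operations can be written out directly as higher-order functions. A minimal sketch: the helper names `compose` and `prim_rec` are chosen here, and addition is built from the base functions as an example.

```python
def compose(f, *gs):
    """Composition: h(x1..xn) = f(g1(x1..xn), ..., gm(x1..xn))."""
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(f, g):
    """Primitive recursion: h(x, 0) = f(x); h(x, y+1) = g(x, y, h(x, y)).
    Implemented with a loop, which computes the same values as the
    recursive definition."""
    def h(x, y):
        acc = f(x)
        for i in range(y):
            acc = g(x, i, acc)
        return acc
    return h

zero = lambda x: 0
succ = lambda x: x + 1
proj1 = lambda x: x                       # the projection p_{1,1}

# addition: add(x, 0) = x;  add(x, y+1) = succ(add(x, y))
add = prim_rec(proj1, lambda x, y, acc: succ(acc))
```

With these combinators, `add(3, 4)` evaluates to 7; multiplication, exponentiation and so on can be stacked up the same way.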
Further Reading

Daves Homepage: Primitive Recursive Functions: http://www.its.caltech.edu/ boozer/symbols/pr


Primitive recursive functions: http://public.logica.com/ stepneys/cyc/p/primrec.htm
Version: 2 Owner: akrowne Author(s): akrowne
Chapter 32
03D25 Recursively (computably) enumerable sets and degrees

32.1 recursively enumerable

For a language L, TFAE:
There exists a Turing machine f such that for all x, x ∈ L iff the computation f(x) terminates.
There exists a total recursive function f : N → L which is onto.
There exists a total recursive function f : N → L which is one-to-one and onto.
A language L fulfilling any (and therefore all) of the above conditions is called recursively
enumerable.

Examples
1. Any recursive language.
2. The set of encodings of Turing machines which halt when given no input.
3. The set of encodings of theorems of Peano arithmetic.
4. The set of integers n for which the hailstone sequence starting at n reaches 1. (We
don't know if this set is recursive, or even if it is N; but a trivial program shows it is
recursively enumerable.)
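The "trivial program" of example 4 is a semi-decision procedure: it halts, accepting, exactly when the sequence reaches 1. A sketch; the step cap is an assumption added here only so the example terminates, where a genuine semi-decision procedure would simply run unboundedly.

```python
def hailstone_reaches_one(n, max_steps=100_000):
    """Follow the hailstone (Collatz) sequence from n.  Returns True if
    it reaches 1 within max_steps; a true semi-decision procedure would
    loop forever on a counterexample instead of giving up."""
    for _ in range(max_steps):
        if n == 1:
            return True
        n = 3 * n + 1 if n % 2 else n // 2
    return False
```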
Version: 3 Owner: ariels Author(s): ariels
Chapter 33
03D75 Abstract and axiomatic computability and recursion theory

33.1 Ackermann function

Ackermann's function A(x, y) is defined by the recurrence relations

A(0, y) = y + 1
A(x + 1, 0) = A(x, 1)
A(x + 1, y + 1) = A(x, A(x + 1, y))

Ackermann's function is an example of a recursive function that is not primitive recursive,
but is instead μ-recursive (that is, Turing-computable).
Ackermann's function grows extremely fast. In fact, we find that

A(0, y) = y + 1
A(1, y) = 2 + (y + 3) − 3
A(2, y) = 2 · (y + 3) − 3
A(3, y) = 2^(y+3) − 3
A(4, y) = 2^2^···^2 − 3   (a tower of y + 3 exponentiations)
... and at this point conventional notation breaks down, and we need to employ something
like Conway notation or Knuth notation for large numbers.
Ackermann's function wasn't actually written in this form by its namesake, Wilhelm
Ackermann. Instead, Ackermann found that the z-fold exponentiation of x with y was an
example of a recursive function which was not primitive recursive. Later this was simplified
by Rózsa Péter to a function of two variables, similar to the one given above.
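The recurrences translate directly into code; memoization keeps the (still enormous) recursion manageable for tiny arguments. A sketch:

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(100_000)  # the recursion is deep even for small inputs

@lru_cache(maxsize=None)
def A(x, y):
    """Ackermann's function, straight from the recurrence relations."""
    if x == 0:
        return y + 1
    if y == 0:
        return A(x - 1, 1)
    return A(x - 1, A(x, y - 1))
```

Checking against the closed forms above: `A(2, 4)` is 11 (that is, 2·(4+3) − 3) and `A(3, 3)` is 61 (that is, 2^6 − 3); anything with x ≥ 4 quickly becomes astronomically large.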
Version: 5 Owner: akrowne Author(s): akrowne

33.2 halting problem

The halting problem is to determine, given a particular input to a particular computer
program, whether the program will terminate after a finite number of steps.
The consequences of a solution to the halting problem are far-reaching. Consider some
predicate P(x) regarding natural numbers; suppose we conjecture that P(x) holds for all
x ∈ N. (Goldbach's conjecture, for example, takes this form.) We can write a program
that will count up through the natural numbers and terminate upon finding some n such
that P(n) is false; if the conjecture holds in general, then our program will never terminate.
Then, without running the program, we could pass it along to a "halting program" to
prove or disprove the conjecture.
In 1936, Alan Turing proved that the halting problem is undecidable; the argument is
presented here informally. Consider a hypothetical program that decides the halting
problem:

Algorithm Halt(P, I)
Input: A computer program P and some input I for P
Output: True if P halts on I and false otherwise

The implementation of the algorithm, as it turns out, is irrelevant. Now consider another
program:

Algorithm Break(x)
Input: An irrelevant parameter x
Output:
begin
  if Halt(Break, x) then
    while true do
      nothing
  else
    Break ← true
end
In other words, we can design a program that will break any solution to the halting problem.
If our halting solution determines that Break halts, then it will immediately enter an infinite
loop; otherwise, Break will return immediately. We must conclude that the Halt program
does not decide the halting problem.
Version: 2 Owner: vampyr Author(s): vampyr

Chapter 34
03E04 Ordered sets and their cofinalities; pcf theory

34.1 another definition of cofinality
Let β be a limit ordinal (e.g. a cardinal). The cofinality of β, cf(β), could also be defined as:
cf(β) = inf{|U| : U ⊆ β and sup U = β}
(sup U is calculated using the natural order of the ordinals). The cofinality of a cardinal is
always a regular cardinal, and hence cf(β) = cf(cf(β)).
This definition is equivalent to the parent definition.
Version: 5 Owner: x bas Author(s): x bas

34.2 cofinality

If α is an ordinal and X ⊆ α, then X is said to be cofinal in α if whenever y ∈ α there is
x ∈ X with y ≤ x.
A map f : β → α between ordinals α and β is said to be cofinal if the image of f is cofinal
in α.
If α is an ordinal, the cofinality cf(α) of α is the least ordinal β such that there is a cofinal
map f : β → α. Note that cf(α) ≤ α, because the identity map on α is cofinal.
It is not hard to show that the cofinality of any ordinal is a cardinal, in fact a regular
cardinal: a cardinal κ is said to be regular if cf(κ) = κ and singular if cf(κ) < κ.
For any infinite cardinal κ it can be shown that κ < κ^cf(κ), and so also κ < cf(2^κ).
Examples
0 and 1 are regular cardinals. All other finite cardinals have cofinality 1 and are therefore
singular.
ℵ0 is regular.
Any infinite successor cardinal is regular.
The smallest infinite singular cardinal is ℵω. In fact, the map f : ω → ℵω given by
f(n) = ℵn is cofinal, so cf(ℵω) = ℵ0. Note that cf(2^ℵ0) > ℵ0, and consequently
2^ℵ0 ≠ ℵω.
Version: 14 Owner: yark Author(s): yark, Evandar

34.3 maximal element

Let ≤ be an ordering on a set S, and let A ⊆ S. Then, with respect to the ordering ≤,
a ∈ A is the least element of A if a ≤ x for all x ∈ A.
a ∈ A is a minimal element of A if there exists no x ∈ A such that x ≤ a and x ≠ a.
a ∈ A is the greatest element of A if x ≤ a for all x ∈ A.
a ∈ A is a maximal element of A if there exists no x ∈ A such that a ≤ x and x ≠ a.
Examples.
The natural numbers N ordered by divisibility (|) have a least element, 1. The natural
numbers greater than 1 (N ∖ {1}) have no least element, but infinitely many minimal
elements (the primes). In neither case is there a greatest or maximal element.
The negative integers ordered by the standard definition of ≤ have a maximal element
which is also the greatest element, −1. They have no minimal or least element.
The natural numbers N ordered by the standard ≤ have a least element, 1, which is
also a minimal element. They have no greatest or maximal element.
The rationals greater than zero with the standard ordering ≤ have no least element or
minimal element, and no maximal or greatest element.
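The first example above is easy to check by brute force on a finite piece of N. A sketch; the helper name `minimal_elements` is a choice made here.

```python
def minimal_elements(A, leq):
    """Elements of A with nothing strictly below them under leq."""
    A = list(A)
    return [a for a in A if not any(leq(x, a) and x != a for x in A)]

divides = lambda x, y: y % x == 0

# minimal elements of {2, ..., 29} under divisibility: the primes
mins = minimal_elements(range(2, 30), divides)
```

Here `mins` is `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`, matching the claim that the minimal elements of the naturals greater than 1 under divisibility are exactly the primes.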
Version: 3 Owner: akrowne Author(s): akrowne
34.4 partitions less than cofinality

If λ < cf(κ) then κ → (κ)^1_λ.

This follows easily from the definition of cofinality. For any coloring f : κ → λ, define
g : λ → κ + 1 by g(β) = |f⁻¹(β)|. Then κ = Σ_{β<λ} g(β), and by the normal rules of
cardinal arithmetic sup_{β<λ} g(β) = κ. Since λ < cf(κ), there must be some β < λ such
that g(β) = κ.
Version: 1 Owner: Henry Author(s): Henry

34.5 well ordered set

A well-ordered set is a totally ordered set in which every nonempty subset has a least
member.
An example of a well-ordered set is the set of positive integers with the standard order
relation (Z+, <), because any nonempty subset of it has a least member. However, R+ (the
positive reals) is not a well-ordered set with the usual order, because (0, 1) = {x : 0 < x < 1}
is a nonempty subset but it doesn't contain a least number.
A well-ordering of a set X is the result of defining a binary relation ≤ on X in
such a way that X becomes well-ordered with respect to ≤.

34.6 pigeonhole principle

For any natural number n, there does not exist a bijection between n and a proper subset
of n.
The name of the theorem is based upon the observation that pigeons will not occupy a
pigeonhole that already contains a pigeon, so there is no way to fit n pigeons in fewer than
n pigeonholes.
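For small n the principle can be verified exhaustively. A sketch, feasible only for tiny sets since it enumerates all m^n functions; the helper name `has_injection` is chosen here.

```python
from itertools import product

def has_injection(n, m):
    """Is there an injective map from an n-element set to an m-element
    set?  Checked by brute force over every function, represented as a
    tuple of images."""
    return any(len(set(f)) == n for f in product(range(m), repeat=n))
```

For instance `has_injection(4, 3)` is False (four pigeons cannot fit injectively into three holes) while `has_injection(3, 3)` is True.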
Version: 6 Owner: djao Author(s): djao

34.7 proof of pigeonhole principle

It will first be proven that, if a bijection exists between two finite sets, then the two sets
have the same number of elements.
Let S and T be finite sets and f : S → T be a bijection. Since f is injective, |S| = |ran f|.
Since f is surjective, |T| = |ran f|. Thus, |S| = |T|.
Since the pigeonhole principle is the contrapositive of the proven statement, it follows that
the pigeonhole principle holds.
Version: 2 Owner: Wkbj79 Author(s): Wkbj79

34.8 tree (set theoretic)

In set theory, a tree is defined to be a set T and a relation <T ⊆ T × T such that:
<T is a partial ordering of T
For any t ∈ T, {s ∈ T | s <T t} is well-ordered
The nodes immediately greater than a node are termed its children, the node immediately
less is its parent (if it exists), any node less is an ancestor and any node greater is a
descendant. A node with no ancestors is a root.
The partial ordering represents distance from the root, and the well-ordering requirement
prohibits any loops or splits below a node (that is, each node has at most one parent, and
therefore at most one grand-parent, and so on). Since there is generally no requirement that
the tree be connected, the null ordering makes any set into a tree, although the tree is a
trivial one, since each element of the set forms a single node with no children.
Since the set of ancestors of any node is well-ordered, we can associate it with an ordinal.
We call this the height, and write: ht(t) = o.t.({s ∈ T | s <T t}). This all accords with
normal usage: a root has height 0, something immediately above the root has height 1, and
so on. We can then assign a height to the tree itself, which we define to be the least number
greater than the height of any element of the tree. For finite trees this is just one greater
than the height of its tallest element, but infinite trees may not have a tallest element, so
we define ht(T) = sup{ht(t) + 1 | t ∈ T}.
For every α < ht(T) we define the α-th level to be the set Tα = {t ∈ T | ht(t) = α}. So
of course T0 is all roots of the tree. If α < ht(T) then T(α) is the subtree of elements with
height less than α: t ∈ T(α) ⟺ t ∈ T ∧ ht(t) < α.
We call a tree a κ-tree for any cardinal κ if |T| = κ and ht(T) = κ. If κ is finite, the only
way to do this is to have a single branch of length κ.
Version: 6 Owner: Henry Author(s): Henry

34.9 κ-complete

A structured set S (typically a filter or a Boolean algebra) is κ-complete if, given any
K ⊆ S with |K| < κ, ⋂K ∈ S. It is complete if it is κ-complete for all κ.
Similarly, a partial order is κ-complete if any sequence of fewer than κ elements has an
upper bound within the partial order.
An ℵ1-complete structure is called countably complete.
Version: 8 Owner: Henry Author(s): Henry

34.10 Cantor's diagonal argument

One of the starting points in Cantor's development of set theory was his discovery that there
are different degrees of infinity. The rational numbers, for example, are countably infinite;
it is possible to enumerate all the rational numbers by means of an infinite list. By contrast,
the real numbers are uncountable: it is impossible to enumerate them by means of an
infinite list. These discoveries underlie the idea of cardinality, which is expressed by saying
that two sets have the same cardinality if there exists a bijective correspondence between
them.
In essence, Cantor discovered two theorems: first, that the set of real numbers has the same
cardinality as the power set of the naturals; and second, that a set and its power set have a
different cardinality (see Cantor's theorem). The proof of the second result is based on the
celebrated diagonalization argument.
Cantor showed that for every given infinite sequence of real numbers x1, x2, x3, . . . it is
possible to construct a real number x that is not on that list. Consequently, it is impossible
to enumerate the real numbers; they are uncountable. No generality is lost if we suppose
that all the numbers on the list are between 0 and 1. Certainly, if this subset of the real
numbers is uncountable, then the full set is uncountable as well.
Let us write our sequence as a table of decimal expansions:

x1 = 0.d11 d12 d13 d14 . . .
x2 = 0.d21 d22 d23 d24 . . .
x3 = 0.d31 d32 d33 d34 . . .
x4 = 0.d41 d42 d43 d44 . . .
. . .

where
xn = 0.dn1 dn2 dn3 dn4 . . . ,
and the expansion avoids an infinite trailing string of the digit 9.
For each n = 1, 2, . . . we choose a digit cn that is different from dnn and not equal to 9, and
consider the real number x with decimal expansion
0.c1 c2 c3 . . .
By construction, this number x is different from every member of the given sequence. After
all, for every n, the number x differs from the number xn in the nth decimal digit. The claim
is proven.
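The construction is fully effective: given any listing of digit sequences, the diagonal rule produces the digits of a number missing from the list. A finite sketch, with a digit-choice rule and example rows made up here for illustration:

```python
def diagonal_digits(rows):
    """Given n digit sequences (row k holding the digits of x_{k+1}),
    return digits c_1..c_n with c_k != d_kk and c_k != 9, exactly as
    in the construction above."""
    return [1 if row[k] != 1 else 2 for k, row in enumerate(rows)]

rows = [[1, 4, 1, 5],    # digits of x1
        [2, 7, 1, 8],    # digits of x2
        [3, 3, 3, 3],    # digits of x3
        [1, 6, 1, 8]]    # digits of x4
c = diagonal_digits(rows)
```

By construction `c` differs from row k in position k, so 0.c1c2c3c4... cannot equal any number on the list.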
Version: 6 Owner: rmilson Author(s): rmilson, slider142

34.11 Fodor's lemma

If κ is a regular, uncountable cardinal, S is a stationary subset of κ, and f : κ → κ is
regressive on S (that is, f(α) < α for any α ∈ S), then there is some β and some stationary
S0 ⊆ S such that f(α) = β for any α ∈ S0.
Version: 1 Owner: Henry Author(s): Henry

34.12 Schroeder-Bernstein theorem

Let S and T be sets. If there exists an injection f : S → T and an injection g : T → S, then
S and T have the same cardinality.
The Schröder-Bernstein theorem is useful for proving many results about cardinality, since
it replaces one hard problem (finding a bijection between S and T) with two generally easier
problems (finding two injections).
Version: 2 Owner: vampyr Author(s): vampyr

34.13

Veblen function

The Veblen function is used to obtain larger ordinal numbers than those provided by
exponentiation. It builds on a hierarchy of closed and unbounded classes:
Cr(0) is the class H of additively indecomposable numbers
Cr(α + 1) = Cr(α)′, the set of fixed points of the enumerating function of Cr(α)
Cr(λ) = ⋂_{α<λ} Cr(α) for limit λ
The Veblen function is defined by setting φα equal to the enumerating function of Cr(α).
We call a number α strongly critical if α ∈ Cr(α). The class of strongly critical ordinals
is written SC, and its enumerating function is written fSC(α) = Γα.
Γ0, the first strongly critical ordinal, is also called the Feferman-Schütte ordinal.
Version: 1 Owner: Henry Author(s): Henry

34.14

additively indecomposable

An ordinal α is called additively indecomposable if it is not 0 and, for any β, γ < α,
β + γ < α. The set of additively indecomposable ordinals is denoted H.
Obviously 1 ∈ H, since 0 + 0 < 1. Also ω ∈ H since the sum of two finite numbers is still
finite, and no finite numbers other than 1 are in H.
H is closed and unbounded, so the enumerating function of H is normal. In fact, fH(α) = ω^α.
The derivative f′H(α) is written εα. The number ε0 = ω^ω^ω^···, therefore, is the first fixed
point of the series ω, ω^ω, ω^ω^ω, . . ..

Version: 1 Owner: Henry Author(s): Henry

34.15

cardinal number

A cardinal number is an ordinal number S with the property that S ≤ X for every ordinal
number X which has the same cardinality as S.
Version: 3 Owner: djao Author(s): rmilson, djao

34.16

cardinal successor

The cardinal successor of a cardinal κ is the least cardinal greater than κ. It is denoted κ+.
Version: 1 Owner: yark Author(s): yark

34.17

cardinality

Cardinality is a notion of the size of a set which does not rely on numbers. It is a relative
notion because, for instance, two sets may each have an infinite number of elements, yet one
may have a greater cardinality; that is, in a precise sense, one may have "more" elements
than the other.
The formal definition of cardinality rests upon the notion of a one-to-one mapping between
sets.
Definition.
Sets A and B have the same cardinality if there is a one-to-one and onto function f from A
to B (a bijection). Symbolically, we write |A| = |B|. This is also called equipotence.
Results.
1. A is equipotent to A.
2. If A is equipotent to B, then B is equipotent to A.
3. If A is equipotent to B and B is equipotent to C, then A is equipotent to C.
Proof.
1. The identity function on A is a bijection from A to A.
2. If f is a bijection from A to B, then f⁻¹ exists and is a bijection from B to A.
3. If f is a bijection from A to B and g is a bijection from B to C, then g ∘ f is a bijection
from A to C.
Example.
The set of even integers E has the same cardinality as the set of integers Z. We define
f : E → Z such that f(x) = x/2. Then f is a bijection, therefore |E| = |Z|.
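A finite spot check of the halving map on a sample window (a sketch, not from the entry; the sample range is arbitrary):

```python
# Halving maps the even integers onto the integers; check a finite window.
evens = range(-10, 12, 2)            # -10, -8, ..., 10
f = {x: x // 2 for x in evens}       # x/2 is exact since x is even
assert len(set(f.values())) == len(f)        # injective on the sample
assert set(f.values()) == set(range(-5, 6))  # onto the matching window of Z
```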
Version: 10 Owner: akrowne Author(s): akrowne

34.18

cardinality of a countable union

Let C be a countable collection of countable sets. Then ⋃C is countable.
Version: 1 Owner: vampyr Author(s): vampyr
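The standard proof lists the elements of the sets Ai = {aij : j ∈ N} along finite anti-diagonals i + j = n. A small sketch (illustrative code, not from the entry), using pairs (i, j) to stand for the elements aij:

```python
from itertools import count, islice

def pairs():
    """Enumerate N x N along anti-diagonals i + j = n, so every pair
    (i, j) -- i.e. every element a_ij of the union -- appears eventually."""
    for n in count():
        for i in range(n + 1):
            yield (i, n - i)

first = list(islice(pairs(), 10))
assert first[0] == (0, 0)
assert (3, 2) in islice(pairs(), 21)          # appears at a finite position
assert len(set(islice(pairs(), 100))) == 100  # no repetitions
```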

34.19

cardinality of the rationals

The set of rational numbers Q is countable, and therefore its cardinality is ℵ0.


Version: 2 Owner: quadrate Author(s): quadrate

34.20

classes of ordinals and enumerating functions

A class of ordinals is just a subset of the ordinals. For every class of ordinals M there is an
enumerating function fM defined by transfinite recursion:
fM(α) = min{x ∈ M | fM(β) < x for all β < α}
This function simply lists the elements of M in order. Note that it is not necessarily defined
for all ordinals, although it is defined for a segment of the ordinals. Let otype(M) = dom(fM)
be the order type of M, which is either On or some ordinal α. If β < γ then fM(β) < fM(γ),
so fM is an order isomorphism between otype(M) and M.
We say M is κ-closed if for any N ⊆ M such that |N| < κ, also sup N ∈ M.
We say M is κ-unbounded if for any α < κ there is some β ∈ M such that α < β.
We say a function f : M → On is κ-continuous if M is κ-closed and
f(sup N) = sup{f(α) | α ∈ N}
for any N ⊆ M with |N| < κ.
A function is κ-normal if it is order preserving (α < β implies f(α) < f(β)) and κ-continuous.
In particular, the enumerating function of a κ-closed class is always κ-normal.
All these definitions can be easily extended to all ordinals: a class is closed (resp. unbounded)
if it is κ-closed (κ-unbounded) for all κ. A function is continuous (resp. normal) if it is
κ-continuous (κ-normal) for all κ.
Version: 2 Owner: Henry Author(s): Henry

34.21

club

If κ is a cardinal then a set C ⊆ κ is closed iff, for any S ⊆ C and α < κ, if sup(S ∩ α) = α
then α ∈ C. (That is, if the limit of some sequence in C is less than κ then the limit is also
in C.)
If κ is a cardinal and C ⊆ κ then C is unbounded if, for any α < κ, there is some β ∈ C
such that α < β.
If a set is both closed and unbounded then it is a club set.


Version: 1 Owner: Henry Author(s): Henry

34.22

club filter

If κ is a regular uncountable cardinal then club(κ), the filter of all sets containing a club
subset of κ, is a κ-complete filter closed under diagonal intersection, called the club filter.
To see that this is a filter, note that κ ∈ club(κ) since it is obviously both closed and
unbounded. If x ∈ club(κ) then any subset of κ containing x is also in club(κ), since x, and
therefore anything containing it, contains a club set.
It is a κ-complete filter because the intersection of fewer than κ club sets is a club set. To
see this, suppose ⟨Ci⟩_{i<α} is a sequence of club sets where α < κ. Obviously C = ⋂ Ci is
closed, since any sequence which appears in C appears in every Ci, and therefore its limit is
also in every Ci. To show that it is unbounded, take some β0 < κ. Let ⟨β1,i⟩ be an increasing
sequence with β1,1 > β0 and β1,i ∈ Ci for every i < α. Such a sequence can be constructed,
since every Ci is unbounded. Since α < κ and κ is regular, the limit of this sequence is less
than κ. We call it β2, and define a new sequence ⟨β2,i⟩ similar to the previous sequence.
We can repeat this process, getting a sequence of sequences ⟨βj,i⟩ where each element of a
sequence is greater than every member of the previous sequences. Then for each i < α, ⟨βj,i⟩j
is an increasing sequence contained in Ci, and all these sequences have the same limit. This
limit is then contained in every Ci, and therefore in C, and is greater than β0.
To see that club(κ) is closed under diagonal intersection, let ⟨Ci⟩, i < κ, be a sequence of
club sets, and let C = Δ_{i<κ} Ci. Since the diagonal intersection contains the intersection,
obviously C is unbounded. Then suppose S ⊆ C and sup(S ∩ α) = α. For any β < α, every
γ ∈ S ∩ α with γ > β belongs to Cβ, and since each Cβ is closed, α ∈ Cβ; hence α ∈ C.
Version: 2 Owner: Henry Author(s): Henry

34.23

countable

A set S is countable if there exists a bijection between S and some subset of N.


All finite sets are countable.
Version: 2 Owner: vampyr Author(s): vampyr

34.24

countably infinite

A set S is countably infinite if there is a bijection between S and N.


As the name implies, any countably infinite set is both countable and infinite.
Countably infinite sets are also sometimes called denumerable.
Version: 3 Owner: vampyr Author(s): vampyr

34.25

finite

A set S is finite if there exists a natural number n and a bijection from S to n. If there
exists such an n, then it is unique, and it is called the cardinality of S.
Version: 2 Owner: djao Author(s): djao

34.26

fixed points of normal functions

If f : M → On is a function then Fix(f) = {x ∈ M | f(x) = x} is the set of fixed points of
f. f′, the derivative of f, is the enumerating function of Fix(f).
If f is κ-normal then Fix(f) is κ-closed and κ-unbounded, and therefore f′ is also κ-normal.
Version: 1 Owner: Henry Author(s): Henry

34.27

height of an algebraic number

Suppose we have an algebraic number α such that the polynomial of smallest degree it is a
root of (with the coefficients relatively prime) is given by:
Σ_{i=0}^{n} ai x^i
Then the height h of the algebraic number is given by:
h = n + Σ_{i=0}^{n} |ai|
This is a quantity which is used in the proof of the existence of transcendental numbers.
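As a quick numeric sketch (not from the entry; height is a made-up helper taking the integer coefficients a0, …, an):

```python
from math import gcd
from functools import reduce

def height(coeffs):
    """h = n + sum |a_i| for a minimal polynomial with relatively prime
    integer coefficients a_0, ..., a_n (degree n)."""
    assert reduce(gcd, (abs(c) for c in coeffs)) == 1  # relatively prime
    n = len(coeffs) - 1
    return n + sum(abs(c) for c in coeffs)

# x^2 - 2, the minimal polynomial of sqrt(2): coefficients (-2, 0, 1)
assert height((-2, 0, 1)) == 2 + 3   # n = 2 and |-2| + |0| + |1| = 3
```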

REFERENCES
1. Shaw, R. Mathematics Society Notes, 1st edition. Kings School Chester, 2003.
2. Stewart, I. Galois Theory, 3rd edition. Chapman and Hall, 2003.
3. Baker, A. Transcendental Number Theory, 1st edition. Cambridge University Press, 1975.

Version: 13 Owner: kidburla2003 Author(s): kidburla2003

34.28

if A is infinite and B is a finite subset of A, then A \ B is infinite

Theorem. If A is an infinite set and B is a finite subset of A, then A \ B is infinite.

Proof. The proof is by contradiction. If A \ B were finite, there would exist a k ∈ N
and a bijection f : {1, . . . , k} → A \ B. Since B is finite, there also exists a bijection
g : {1, . . . , l} → B. We can then define a mapping h : {1, . . . , k + l} → A by
h(i) = f(i) when i ∈ {1, . . . , k},
h(i) = g(i − k) when i ∈ {k + 1, . . . , k + l}.
Since f and g are bijections, h is a bijection between a finite subset of N and A. This is a
contradiction since A is infinite. □
Version: 3 Owner: mathcam Author(s): matte

34.29

limit cardinal

A limit cardinal is a cardinal λ such that κ+ < λ for every cardinal κ < λ. Here κ+ denotes
the cardinal successor of κ. If 2^κ < λ for every cardinal κ < λ, then λ is called a strong limit
cardinal.
Every strong limit cardinal is a limit cardinal, because κ+ ≤ 2^κ holds for every cardinal κ.
Under GCH, every limit cardinal is a strong limit cardinal because in this case κ+ = 2^κ for
every infinite cardinal κ.
The three smallest limit cardinals are 0, ℵ0 and ℵω. Note that some authors do not count 0,
or sometimes even ℵ0, as a limit cardinal.
Version: 7 Owner: yark Author(s): yark
34.30

natural number

Given the Zermelo-Fraenkel axioms of set theory, one can prove that there exists an inductive set
X such that X. The natural numbers N are then defined to be the intersection of all
subsets of X which are inductive sets and contain the empty set as an element.
The first few natural numbers are:
0 :=
1 := 00 = {0} = {}
2 := 10 = {0, 1} = {, {}}
3 := 20 = {0, 1, 2} = {, {}, {, {}}}
Note that the set 0 has zero elements, the set 1 has one element, the set 2 has two elements,
etc. Informally, the set n is the set consisting of the n elements 0, 1, . . . , n 1, and n is both
a subset of N and an element of N.
In some contexts (most notably, in number theory), it is more convenient to exclude 0 from
the set of natural numbers, so that N = {1, 2, 3, . . . }. When it is not explicitly specified, one
must determine from context whether 0 is being considered a natural number or not.
Addition of natural numbers is defined inductively as follows:
a + 0 := a for all a N
a + b0 := (a + b)0 for all a, b N
Multiplication of natural numbers is defined inductively as follows:
a 0 := 0 for all a N
a b0 := (a b) + a for all a, b N
The natural numbers form a monoid under either addition or multiplication. There is an
ordering relation on the natural numbers, defined by: a 6 b if a b.
Version: 11 Owner: djao Author(s): djao

34.31

ordinal arithmetic

Ordinal arithmetic is the extension of normal arithmetic to the transfinite ordinal numbers.
The successor operation Sx (sometimes written x + 1, although this notation risks confusion
with the general definition of addition) is part of the definition of the ordinals, and addition
is naturally defined by recursion over this:
x + 0 = x
x + Sy = S(x + y)
x + α = sup_{β<α} (x + β) for limit α
If x and y are finite then x + y under this definition is just the usual sum; however, when x
and y become infinite, there are differences. In particular, ordinal addition is not commutative.
For example,
ω + 1 = ω + S0 = S(ω + 0) = Sω
but
1 + ω = sup_{n<ω} (1 + n) = ω.
Multiplication in turn is defined by iterated addition:
x · 0 = 0
x · Sy = x · y + x
x · α = sup_{β<α} (x · β) for limit α
Once again this definition is equivalent to normal multiplication when x and y are finite, but
is not commutative:
ω · 2 = ω · 1 + ω = ω + ω
but
2 · ω = sup_{n<ω} (2 · n) = ω.
Both these functions are strongly increasing in the second argument and weakly increasing
in the first argument. That is, if α < β then
γ + α < γ + β
γ · α < γ · β (for γ > 0)
α + γ ≤ β + γ
α · γ ≤ β · γ
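The non-commutativity can be checked mechanically for ordinals below ω^ω, written in Cantor normal form. The sketch below is not from the entry; it represents an ordinal as a tuple of (exponent, coefficient) pairs with strictly decreasing finite exponents, and o_add is a made-up helper implementing the usual absorption rule for addition:

```python
def o_add(x, y):
    """Add ordinals in Cantor normal form (tuples of (exponent, coeff)
    pairs, exponents strictly decreasing). The leading term of y absorbs
    every term of x with a smaller exponent, hence non-commutativity."""
    if not y:
        return tuple(x)
    e = y[0][0]
    kept = [t for t in x if t[0] > e]   # survives the absorption
    same = [t for t in x if t[0] == e]  # merges with y's leading term
    if same:
        return tuple(kept) + ((e, same[0][1] + y[0][1]),) + tuple(y[1:])
    return tuple(kept) + tuple(y)

OMEGA = ((1, 1),)   # the ordinal w
ONE = ((0, 1),)     # the ordinal 1

assert o_add(ONE, OMEGA) == OMEGA             # 1 + w = w
assert o_add(OMEGA, ONE) == ((1, 1), (0, 1))  # w + 1 > w
assert o_add(OMEGA, OMEGA) == ((1, 2),)       # w + w = w * 2
```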
Version: 2 Owner: Henry Author(s): Henry
34.32

ordinal number

An ordinal number is a well ordered set S such that, for every x ∈ S,
x = {z ∈ S | z < x}
(where < is the ordering relation on S).
Version: 2 Owner: djao Author(s): djao

34.33

power set

Definition If X is a set, then the power set of X is the set whose elements are the subsets
of X. It is usually denoted as P(X) or 2^X.
1. If X is a finite set, then |2^X| = 2^|X|. This property motivates the notation 2^X.
2. For an arbitrary set X, Cantor's theorem states two things about the power set: First,
there is no bijection between X and P(X). Second, the cardinality of 2^X is greater
than the cardinality of X.
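Property 1 can be checked directly for small sets. A brief sketch (not part of the entry; power_set is a made-up helper built on the standard library):

```python
from itertools import combinations

def power_set(xs):
    """All subsets of xs, returned as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

X = {'a', 'b', 'c'}
assert len(power_set(X)) == 2 ** len(X)  # |2^X| = 2^|X| = 8
assert frozenset() in power_set(X)       # the empty set is a subset
assert frozenset(X) in power_set(X)      # so is X itself
```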
Version: 5 Owner: matte Author(s): matte, drini

34.34

proof of Fodor's lemma

If we let f⁻¹ : κ → P(S) be the preimage function of f restricted to S, then Fodor's lemma
is equivalent to the claim that for any regressive function f (that is, f(α) < α for every
α ∈ S) there is some λ such that f⁻¹(λ) is stationary.
If Fodor's lemma is false, then for every λ ∈ κ there is some club set Cλ such that
Cλ ∩ f⁻¹(λ) = ∅. Let C = Δ_{λ<κ} Cλ. The club sets are closed under diagonal intersection,
so C is also club and therefore there is some α ∈ S ∩ C. Then α ∈ Cλ for each λ < α, and
so there can be no λ < α such that α ∈ f⁻¹(λ); hence f(α) ≥ α, a contradiction.
Version: 1 Owner: Henry Author(s): Henry

34.35

proof of Schroeder-Bernstein theorem

We first prove as a lemma that for any B ⊆ A, if there is an injection f : A → B, then there
is also a bijection h : A → B.
Define a sequence {Ck}_{k=0}^∞ of subsets of A by C0 = A \ B and, for k ≥ 0, Ck+1 = f(Ck).
If the Ck are not pairwise disjoint, then there are minimal integers j and k with j < k and
Cj ∩ Ck nonempty. Then k ≥ 1, and so Ck ⊆ B. Since C0 ∩ B = ∅, we have j > 0. Thus
Cj = f(Cj−1) and Ck = f(Ck−1). By assumption, f is injective, so Cj−1 ∩ Ck−1 is nonempty,
contradicting the minimality of j. Hence the Ck are pairwise disjoint.
Now let C = ⋃_{k=0}^∞ Ck, and define h : A → B by
h(z) = f(z) if z ∈ C, and h(z) = z if z ∉ C.
If z ∈ C, then h(z) = f(z) ∈ B. But if z ∉ C, then z ∉ C0, so z ∈ B, and so h(z) ∈ B.
Hence h is well-defined; h is injective by construction. Let b ∈ B. If b ∉ C, then h(b) = b.
Otherwise, b ∈ Ck = f(Ck−1) for some k ≥ 1, and so there is some a ∈ Ck−1 such that
h(a) = f(a) = b. Thus h is bijective; in particular, if B = A, then h is simply the identity
map on A.
To prove the theorem, suppose f : S → T and g : T → S are injective. Then the composition
g ∘ f : S → g(T) is also injective. By the lemma, there is a bijection h′ : S → g(T). The
injectivity of g implies that g⁻¹ : g(T) → T exists and is bijective. Define h : S → T by
h(z) = g⁻¹(h′(z)); this map is a bijection, and so S and T have the same cardinality.
Version: 13 Owner: mps Author(s): mps

34.36

proof of fixed points of normal functions

Suppose f is a κ-normal function and consider any α < κ, and define a sequence by α0 = α
and αn+1 = f(αn). Let β = sup_{n<ω} αn. Then, since f is continuous,
f(β) = sup_{n<ω} f(αn) = sup_{n<ω} αn+1 = β.
So Fix(f) is unbounded.
Suppose N is a set of fixed points of f with |N| < κ. Then
f(sup N) = sup_{α∈N} f(α) = sup_{α∈N} α = sup N,
so sup N is also a fixed point of f, and therefore Fix(f) is closed.
Version: 1 Owner: Henry Author(s): Henry

34.37

proof of the existence of transcendental numbers

Cantor discovered this proof.


Lemma:
Consider a natural number k. Then the number of algebraic numbers of height k is finite.

Proof:
To see this, note that each term in the sum in the definition of height is non-negative.
Therefore:
n ≤ k
where n is the degree of the polynomial. For a polynomial of degree n, there are only n + 1
coefficients, the sum of whose moduli is k − n, and there is only a finite number of ways
of choosing such coefficients. For every polynomial with degree less than n, there are fewer
ways. So the sum of all of these is also finite, and this is the number of algebraic numbers
with height k (with some repetitions). The result follows.

Proof of the main theorem:


You can start writing a list of the algebraic numbers because you can put all the ones with
height 1, then with height 2, etc, and write them in numerical order within those sets because
they are finite sets. This implies that the set of algebraic numbers is countable. However,
by diagonalisation, the set of real numbers is uncountable. So there are more real numbers
than algebraic numbers; the result follows.
Version: 5 Owner: kidburla2003 Author(s): kidburla2003

34.38

proof of theorems in additively indecomposable

H is closed
Let {αi | i < γ} be some increasing sequence of elements of H and let α = sup{αi | i < γ}.
Then for any x, y < α, it must be that x < αi and y < αj for some i, j < γ. But then
x + y < α_max{i,j} < α.

H is unbounded
Consider any α, and define a sequence by β0 = Sα and βn+1 = βn + βn. Let β = sup_{n<ω} βn
be the limit of this sequence. If x, y < β then it must be that x < βi and y < βj for some
i, j < ω, and therefore x + y < β_max{i,j}+1 < β. Note that β is, in fact, the next element of
H, since every element in the sequence is clearly additively decomposable.

fH(α) = ω^α
Since 0 is not in H, we have fH(0) = 1 = ω^0.
For any α + 1, fH(α + 1) is the least additively indecomposable number greater than fH(α).
Let β0 = SfH(α) and βn+1 = βn + βn = βn · 2. Then fH(α + 1) = sup_{n<ω} βn =
sup_{n<ω} SfH(α) · 2^n = fH(α) · ω = ω^{α+1}. The limit case is trivial since H is closed and
unbounded, so fH is continuous.
Version: 1 Owner: Henry Author(s): Henry

34.39

proof that the rationals are countable

Suppose we have a rational number α = p/q in lowest terms with q > 0. Define the height
of this number as h(α) = |p| + q. For example, h(0) = h(0/1) = 1, h(1) = h(−1) = 2, and
h(2) = h(−2) = h(1/2) = h(−1/2) = 3. Note that the set of numbers with a given height
is finite. The rationals can now be partitioned into classes by height, and the numbers in
each class can be ordered by way of increasing numerators. Thus it is possible to assign
a natural number to each of the rationals by starting with 0, 1, −1, 2, 1/2, −1/2, −2, 3, . . . and
progressing through classes of increasing heights. This assignment constitutes a bijection
between N and Q and proves that Q is countable.
A corollary is that the irrational numbers are uncountable, since the union of the irrationals
and the rationals is R, which is uncountable.
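The finiteness of each height class, and the resulting enumeration, can be sketched in code (illustrative only; rationals_of_height is a made-up helper returning pairs (p, q) in lowest terms):

```python
from math import gcd

def rationals_of_height(h):
    """All rationals p/q in lowest terms with q > 0 and |p| + q = h."""
    out = []
    for q in range(1, h + 1):
        p = h - q
        for s in ([0] if p == 0 else [p, -p]):
            if gcd(abs(s), q) == 1:
                out.append((s, q))
    return out

assert rationals_of_height(1) == [(0, 1)]                # just 0
assert set(rationals_of_height(2)) == {(1, 1), (-1, 1)}  # 1 and -1
assert len(rationals_of_height(3)) == 4                  # +-2 and +-1/2
```

Concatenating these finite classes in order of height yields the enumeration used in the proof.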
Version: 5 Owner: quadrate Author(s): quadrate

34.40

stationary set

If κ is a cardinal, C ⊆ κ, and C intersects every club in κ, then C is stationary. If C is not
stationary then it is thin.
Version: 1 Owner: Henry Author(s): Henry
34.41

successor cardinal

A successor cardinal is a cardinal that is the cardinal successor of some cardinal.


Version: 1 Owner: yark Author(s): yark

34.42

uncountable

Definition A set is uncountable if it is not countable. In other words, a set S is uncountable
if there is no subset of N with the same cardinality as S.
1. All uncountable sets are infinite. However, the converse is not true. For instance, the
natural numbers and the rational numbers, although infinite, are both countable.
2. The real numbers form an uncountable set. The famous proof of this result is based
on Cantor's diagonal argument.
Version: 2 Owner: matte Author(s): matte, vampyr

34.43

von Neumann integer

A von Neumann integer is not an integer, but instead a construction of a natural number
using some basic set notation. The von Neumann integers are defined inductively. The
von Neumann integer zero is defined to be the empty set, , and there are no smaller von
Neumann integers. The von Neumann integer N is then the set of all von Neumann integers
less than N. The set of von Neumann integers is the set of all finite von Neumann ordinals.
This form of construction from very basic notions of sets is applicable to various forms of
set theory (for instance, Zermelo-Fraenkel set theory). While this construction suffices to
define the set of natural numbers, a little more work must be done to define the set of all
integers.

Examples

0 = ∅
1 = {0} = {∅}
2 = {0, 1} = {∅, {∅}}
3 = {0, 1, 2} = {∅, {∅}, {∅, {∅}}}
. . .
N = {0, 1, . . . , N − 1}
Version: 3 Owner: mathcam Author(s): mathcam, Logan

34.44

von Neumann ordinal

The von Neumann ordinal is a method of defining ordinals in set theory.
The von Neumann ordinal α is defined to be the well-ordered set containing the von
Neumann ordinals which precede α. The set of finite von Neumann ordinals is known as the
von Neumann integers. Every well-ordered set is isomorphic to a von Neumann ordinal.
They can be constructed by transfinite recursion as follows:
The empty set is 0.
Given any ordinal α, the ordinal α + 1 (the successor of α) is defined to be α ∪ {α}.
Given a set A of ordinals, ⋃_{a∈A} a is an ordinal.
If an ordinal is the successor of another ordinal, it is a successor ordinal. If an ordinal is
neither 0 nor a successor ordinal then it is a limit ordinal. The first limit ordinal is named
ω.
The class of ordinals is denoted On.
The von Neumann ordinals have the convenient property that if a < b then a ∈ b and a ⊆ b.
Version: 5 Owner: Henry Author(s): Henry, Logan

34.45

weakly compact cardinal

Weakly compact cardinals are (large) infinite cardinals which have a property related to the
syntactic compactness theorem for first order logic. Specifically, for any infinite cardinal κ,
consider the language L_{κ,κ}.
This language is identical to first order logic except that:

(a) infinite conjunctions and disjunctions of fewer than κ formulas are allowed
(b) infinite strings of fewer than κ quantifiers are allowed

The weak compactness theorem for L_{κ,κ} states that if Δ is a set of sentences of L_{κ,κ} such
that |Δ| = κ and any Θ ⊆ Δ with |Θ| < κ is consistent, then Δ is consistent.
A cardinal κ is weakly compact if the weak compactness theorem holds for L_{κ,κ}.
Version: 1 Owner: Henry Author(s): Henry

34.46

weakly compact cardinals and the tree property

A cardinal κ is weakly compact if and only if it is inaccessible and has the tree property.

Weak compactness implies tree property
Let κ be a weakly compact cardinal and let (T, <_T) be a tree of height κ with all levels
smaller than κ. We define a theory Δ in L_{κ,κ} with, for each x ∈ T, a constant cx, and a
single unary relation B. Then our theory consists of the sentences:
¬[B(cx) ∧ B(cy)] for every incompatible x, y ∈ T
⋁_{x∈T(α)} B(cx) for each α < κ

It should be clear that B represents membership in a cofinal branch, since the first class of
sentences asserts that no incompatible elements are both in B while the second class states
that the branch intersects every level.
Clearly |Δ| = κ, since there are κ elements in T, and hence at most κ · κ = κ sentences
in the first group, and of course there are κ levels and therefore κ sentences in the second
group.
Now consider any Θ ⊆ Δ with |Θ| < κ. Fewer than κ sentences of the second group are
included, so the elements x whose constants cx appear in Θ must all appear in levels T(β)
with β below some α < κ. But since T has branches of arbitrary height, a branch reaching
level α yields a model: T(α) ⊨ Θ.
Since κ is weakly compact, it follows that Δ also has a model, and that model obviously has
a set of cx such that B(cx) whose corresponding elements of T intersect every level and are
compatible, therefore forming a cofinal branch of T, proving that T is not Aronszajn.
Version: 4 Owner: Henry Author(s): Henry

34.47

Cantor's theorem

Let X be any set and P(X) its power set. Cantor's theorem states that there is no bijection
between X and P(X). Moreover, the cardinality of P(X) is strictly greater than that of X,
that is, |X| < |P(X)|.
Version: 2 Owner: igor Author(s): igor

34.48

proof of Cantor's theorem

The proof of this theorem is fairly simple using the following construction, which is central
to Cantor's diagonal argument.
Consider a function F : X → P(X) from X to its power set. Then we define the set Z ⊆ X
as follows:
Z = {x ∈ X | x ∉ F(x)}.
Suppose that F is, in fact, a bijection. Then there must exist an x ∈ X such that F(x) = Z.
But, by construction, we have the following contradiction:
x ∈ Z ⟺ x ∉ F(x) ⟺ x ∉ Z.

Hence F cannot be a bijection between X and P(X).
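For a finite set the argument can be checked exhaustively: every candidate F misses its own diagonal set Z, so no F is onto. A sketch (illustrative only, not part of the entry):

```python
from itertools import product

X = (0, 1)
P = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]

# Try every function F : X -> P(X); its diagonal set Z is never hit.
for images in product(P, repeat=len(X)):
    F = dict(zip(X, images))
    Z = frozenset(x for x in X if x not in F[x])
    assert Z not in F.values()  # F is not surjective, hence no bijection
```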


Version: 2 Owner: igor Author(s): igor

34.49

additive

Let μ be some real function defined on an algebra of sets A. We say that μ is additive if,
whenever A and B are disjoint sets in A, we have
μ(A ∪ B) = μ(A) + μ(B).
Suppose A is a σ-algebra. Then, given any sequence ⟨Ai⟩ of disjoint sets in A, if we have
μ(⋃ Ai) = Σ μ(Ai)
we say that μ is countably additive or σ-additive.

Useful properties of an additive set function μ include the following:

1. μ(∅) = 0.
2. If A ⊆ B, then μ(A) ≤ μ(B).
3. If A ⊆ B, then μ(B \ A) = μ(B) − μ(A).
4. Given A and B, μ(A ∪ B) + μ(A ∩ B) = μ(A) + μ(B).
Version: 3 Owner: vampyr Author(s): vampyr

34.50

antisymmetric

A relation R on A is antisymmetric iff, for all x, y ∈ A, (xRy ∧ yRx) → (x = y). Out of
the 2^(n²) total possible relations, the number of possible antisymmetric relations on A is
2^n · 3^((n² − n)/2), where n = |A|.
Antisymmetric is not the same thing as "not symmetric", as it is possible to have both at
the same time. However, a relation R that is both antisymmetric and symmetric has the
condition that xRy → x = y. There are only 2^n such possible relations on A.
An example of an antisymmetric relation on A = {a, b, c} would be
R = {(c, c), (a, b), (b, c), (c, a)}. One relation that isn't antisymmetric is
R = {(a, b), (c, a), (a, c)}, because we have both cRa and aRc, but a ≠ c.
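The counting formula can be verified by brute force for a three-element set (a sketch, not part of the entry; is_antisymmetric is a made-up helper):

```python
from itertools import product

A = (0, 1, 2)
pairs = [(x, y) for x in A for y in A]

def is_antisymmetric(R):
    return all(x == y or not ((x, y) in R and (y, x) in R)
               for x in A for y in A)

# Enumerate all 2^(n^2) = 512 relations and count the antisymmetric ones.
count = sum(is_antisymmetric({p for p, b in zip(pairs, bits) if b})
            for bits in product([0, 1], repeat=len(pairs)))
n = len(A)
assert count == 2**n * 3**((n*n - n)//2)  # 8 * 27 = 216
```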
Version: 4 Owner: xriso Author(s): xriso

34.51

constant function

Definition Suppose X and Y are sets and f : X → Y is a function. Then f is a constant
function if f(a) = f(b) for all a, b in X.

Properties
1. The composition of a constant function with any function (for which composition is
defined) is a constant function.
2. A constant map between topological spaces is continuous.
Version: 2 Owner: mathcam Author(s): matte

34.52

direct image

Let f : A → B be a function, and let U ⊆ A be a subset. The direct image of U is the set
f(U) ⊆ B consisting of all elements of B which equal f(u) for some u ∈ U.
Version: 4 Owner: djao Author(s): rmilson, djao

34.53

domain

Let R be a binary relation. Then the set of all x such that xRy is called the domain of R.
That is, the domain of R is the set of all first coordinates of the ordered pairs in R.
Version: 5 Owner: akrowne Author(s): akrowne

34.54

dynkin system

Let Ω be a set, and P(Ω) be the power set of Ω. A dynkin system on Ω is a set D ⊆ P(Ω)
such that
1. Ω ∈ D
2. A, B ∈ D and A ⊆ B ⟹ B \ A ∈ D
3. An ∈ D, An ⊆ An+1, n ≥ 1 ⟹ ⋃_{k=1}^∞ Ak ∈ D.
Let A ⊆ P(Ω) be a set, and consider
G = {X : X is a dynkin system and A ⊆ X}.    (34.54.1)
We define the intersection of all the dynkin systems containing A as
D(A) := ⋂_{X∈G} X.    (34.54.2)
One can easily verify that D(A) is itself a dynkin system and that it contains A. We call
D(A) the dynkin system generated by A. It is the smallest dynkin system containing
A.
A dynkin system which is also a π-system is a σ-algebra.
Version: 4 Owner: drummond Author(s): drummond

34.55

equivalence class

Let S be a set with an equivalence relation ~. An equivalence class of S under ~ is a subset
T ⊆ S such that
If x ∈ T and y ∈ S, then x ~ y if and only if y ∈ T
If S is nonempty, then T is nonempty
For x ∈ S, the equivalence class containing x is often denoted by [x], so that
[x] := {y ∈ S | x ~ y}.
The set of all equivalence classes of S under ~ is defined to be the set of all subsets of S
which are equivalence classes of S under ~.
For any equivalence relation ~, the set of all equivalence classes of S under ~ is a partition
of S, and this correspondence is a bijection between the set of equivalence relations on S
and the set of partitions of S (consisting of nonempty sets).
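The partition property is easy to observe computationally. A sketch (not from the entry; equivalence_classes is a made-up helper that assumes rel really is an equivalence relation):

```python
def equivalence_classes(S, rel):
    """Group the elements of S into classes of the equivalence relation rel."""
    classes = []
    for x in S:
        for c in classes:
            if rel(x, next(iter(c))):  # compare with any representative
                c.add(x)
                break
        else:
            classes.append({x})
    return classes

# congruence mod 3 on {0, ..., 9} partitions it into three classes
classes = equivalence_classes(range(10), lambda a, b: a % 3 == b % 3)
assert len(classes) == 3
assert sorted(len(c) for c in classes) == [3, 3, 4]
```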
Version: 3 Owner: djao Author(s): djao, rmilson

34.56

fibre

Given a function f : X → Y, a fibre is an inverse image of an element of Y. That is, given
y ∈ Y, f⁻¹({y}) = {x ∈ X | f(x) = y} is a fibre.
Example
Define f : R² → R by f(x, y) = x² + y². Then the fibres of f consist of concentric circles
about the origin, the origin itself, and empty sets, depending on whether we look at the
inverse image of a positive number, zero, or a negative number respectively.
Version: 3 Owner: dublisk Author(s): dublisk

34.57

filtration

A filtration is a sequence of sets A1, A2, . . . , An with

A1 ⊆ A2 ⊆ · · · ⊆ An.
If one considers the sets A1 , . . . , An as elements of a larger set which are partially ordered
by inclusion, then a filtration is simply a finite chain with respect to this partial ordering.
It should be noted that in some contexts the word filtration may also be employed to
describe an infinite chain.
Version: 3 Owner: djao Author(s): djao

34.58

finite character

A family F of sets is of finite character if

1. For each A ∈ F, every finite subset of A belongs to F;
2. If every finite subset of a given set A belongs to F, then A belongs to F.
Version: 4 Owner: Koro Author(s): Koro

34.59

fix (transformation actions)

Let A be a set, and T : A → A a transformation of that set. We say that x ∈ A is fixed by
T, or that T fixes x, whenever
T(x) = x.
The subset of fixed elements is called the fixed set of T, and is frequently denoted as A^T.
We say that a subset B ⊆ A is fixed by T whenever all elements of B are fixed by T, i.e.
B ⊆ A^T.
If this is so, T restricts to the identity transformation on B.
The definition generalizes readily to a family of transformations with common domain
Ti : A → A,  i ∈ I.
In this case we say that a subset B ⊆ A is fixed if it is fixed by all the elements of the
family, i.e. whenever
B ⊆ ⋂_{i∈I} A^{Ti}.
Version: 7 Owner: rmilson Author(s): rmilson

34.60

function

Let A and B be sets. A function f : A → B is a relation R from A to B such that

For every a ∈ A, there exists b ∈ B such that (a, b) ∈ R.
If a ∈ A, b1, b2 ∈ B, and (a, b1) ∈ R and (a, b2) ∈ R, then b1 = b2.
For a ∈ A, one usually denotes by f(a) the unique element b ∈ B such that (a, b) ∈ R. The
set A is called the domain of f, and the set B is called the codomain.
Version: 5 Owner: djao Author(s): djao

34.61

functional

Definition A functional T is a function mapping a function space (often a vector space)
V into a field of scalars K, typically taken to be R or C.
Discussion Examples of functionals include the integral and entropy. A functional T is
often indicated by the use of square brackets, T[x] rather than T(x).
The linear functionals are those functionals T that satisfy
T(x + y) = T(x) + T(y)
T(cx) = cT(x)
for any c ∈ K, x, y ∈ V.
Version: 4 Owner: mathcam Author(s): mathcam, drummond

34.62

generalized cartesian product

Given any family of sets {Aj}_{j∈J} indexed by an index set J, the generalized cartesian product
∏_{j∈J} Aj
is the set of all functions
f : J → ⋃_{j∈J} Aj
such that f(j) ∈ Aj for all j ∈ J.
For each i ∈ J, the projection map
πi : ∏_{j∈J} Aj → Ai
is the function defined by
πi(f) := f(i).
Version: 4 Owner: djao Author(s): djao

34.63

graph

The graph of a function f : X → Y is the subset of X × Y given by {(x, f(x)) : x ∈ X}.


Version: 7 Owner: Koro Author(s): Koro

34.64

identity map

Definition If X is a set, then the identity map in X is the mapping that maps each
element in X to itself.

Properties
1. An identity map is always a bijection.
2. Suppose X has two topologies τ1 and τ2. Then the identity mapping I : (X, τ1) →
(X, τ2) is continuous if and only if τ1 is finer than τ2, i.e., τ1 ⊇ τ2.
3. The identity map on the n-sphere is homotopic to the antipodal map A : S^n → S^n if
n is odd [1].

REFERENCES
1. V. Guillemin, A. Pollack, Differential topology, Prentice-Hall Inc., 1974.

Version: 3 Owner: bwebste Author(s): matte

34.65

inclusion mapping

Definition Let X be a subset of Y. Then the inclusion map from X to Y is the mapping
ι : X → Y
x ↦ x.
In other words, the inclusion map is simply a fancy way to say that every element in X is
also an element in Y.
To indicate that a mapping is an inclusion mapping, one usually writes ↪ instead of →
when defining or mentioning an inclusion map. This hooked arrow symbol ↪ can be seen
as a combination of the symbols ⊂ and →. In the above definition, we have not used this
convention. However, examples of this convention would be:
Let ι : X ↪ Y be the inclusion map from X to Y.
We have the inclusion S^n ↪ R^{n+1}.
Version: 4 Owner: matte Author(s): matte

34.66 inductive set

An inductive set is a set X with the property that, for every x ∈ X, the successor x′ of x is also an element of X.
One major example of an inductive set is the set of natural numbers ℕ.


Version: 7 Owner: djao Author(s): djao

34.67 invariant

Let A be a set, and T : A → A a transformation of that set. We say that x ∈ A is an invariant of T whenever x is fixed by T:

    T(x) = x.

We say that a subset B ⊆ A is invariant with respect to T whenever

    T(B) ⊆ B.

If this is so, the restriction of T is a well-defined transformation of the invariant subset:

    T|_B : B → B.

The definition generalizes readily to a family of transformations with common domain

    T_i : A → A,   i ∈ I.

In this case we say that a subset is invariant, if it is invariant with respect to all elements of
the family.
Version: 5 Owner: rmilson Author(s): rmilson

34.68 inverse function theorem

Let f be a continuously differentiable, vector-valued function mapping the open set E ⊆ ℝⁿ to ℝⁿ and let S = f(E). If, for some point a ∈ E, the Jacobian, |Jf(a)|, is non-zero, then there is a uniquely defined function g and two open sets X ⊆ E and Y ⊆ S such that

1. a ∈ X, f(a) ∈ Y;
2. Y = f(X);
3. f : X → Y is one-one;
4. g is continuously differentiable on Y and g(f(x)) = x for all x ∈ X.

Simplest case

When n = 1, this theorem becomes: Let f be a continuously differentiable, real-valued function defined on the open interval I. If for some point a ∈ I, f′(a) ≠ 0, then there is a neighbourhood [α, β] of a in which f is strictly monotonic. Then y ↦ f⁻¹(y) is a continuously differentiable, strictly monotonic function from [f(α), f(β)] to [α, β]. If f is increasing (or decreasing) on [α, β], then so is f⁻¹ on [f(α), f(β)].

Note

The inverse function theorem is a special case of the implicit function theorem where the dimension of each variable is the same.
Version: 6 Owner: vypertd Author(s): vypertd

34.69 inverse image

Let f : A → B be a function, and let U ⊆ B be a subset. The inverse image of U is the set f⁻¹(U) ⊆ A consisting of all elements a ∈ A such that f(a) ∈ U.
The inverse image commutes with all set operations: For any collection {U_i}_{i∈I} of subsets of B, we have the following identities for

1. unions:

    f⁻¹(⋃_{i∈I} U_i) = ⋃_{i∈I} f⁻¹(U_i)

2. intersections:

    f⁻¹(⋂_{i∈I} U_i) = ⋂_{i∈I} f⁻¹(U_i)

and for any subsets U and V of B, we have identities for

3. complements:

    (f⁻¹(U))^∁ = f⁻¹(U^∁)

4. set differences:

    f⁻¹(U \ V) = f⁻¹(U) \ f⁻¹(V)

5. symmetric differences:

    f⁻¹(U △ V) = f⁻¹(U) △ f⁻¹(V)

In addition, for X ⊆ A and Y ⊆ B, the inverse image satisfies the miscellaneous identities

6. (f|_X)⁻¹(Y) = X ∩ f⁻¹(Y)
7. f(f⁻¹(Y)) = Y ∩ f(A)
8. X ⊆ f⁻¹(f(X)), with equality if f is injective.

Version: 5 Owner: djao Author(s): djao, rmilson
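The identities above are easy to spot-check on a small example. In this Python sketch the function f and the sets are arbitrary illustrative choices; identities 1-5 are verified by brute force.

```python
# Brute-force check of identities 1-5 above on a small concrete function.
A = {0, 1, 2, 3, 4, 5}
B = {0, 1, 2}

def f(a):
    return a % 3

def preimage(U):
    """The inverse image f^{-1}(U) as a subset of A."""
    return {a for a in A if f(a) in U}

U, V = {0, 1}, {1, 2}
assert preimage(U | V) == preimage(U) | preimage(V)    # 1. unions
assert preimage(U & V) == preimage(U) & preimage(V)    # 2. intersections
assert A - preimage(U) == preimage(B - U)              # 3. complements
assert preimage(U - V) == preimage(U) - preimage(V)    # 4. set differences
assert preimage(U ^ V) == preimage(U) ^ preimage(V)    # 5. symmetric differences
```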

34.70 mapping

Synonym of function, although typical usage suggests that mapping is the more generic term.
In a geometric context, the term function is often employed to connote a mapping whose
purpose is to assign values to the elements of its domain, i.e. a function defines a field of
values, whereas mapping seems to have a more geometric connotation, as in a mapping of
one space to another.
Version: 8 Owner: rmilson Author(s): rmilson

34.71 mapping of period n is a bijection

Theorem. Suppose X is a set. Then a mapping f : X → X of period n is a bijection.

Proof. If n = 1, the claim is trivial; f is the identity mapping. Suppose n = 2, 3, . . .. Then for any x ∈ X, we have x = f(f^{n−1}(x)), so f is a surjection. To see that f is an injection, suppose f(x) = f(y) for some x, y in X. Since fⁿ is the identity, it follows that x = y. □
Version: 3 Owner: Koro Author(s): matte

34.72 partial function

A function f : A → B is sometimes called a total function, to signify that f(a) is defined for every a ∈ A. If C is any set such that A ⊆ C, then f is also a partial function from C to B.
Clearly if f is a function from A to B then it is a partial function from A to B, but a partial function need not be defined for every element of its domain.
Version: 6 Owner: Henry Author(s): Henry

34.73 partial mapping

Let X₁, . . . , Xₙ and Y be sets, and let f be a function of n variables: f : X₁ × X₂ × ⋯ × Xₙ → Y. Fix x_i ∈ X_i for 2 ≤ i ≤ n. The induced mapping a ↦ f(a, x₂, . . . , xₙ) is called the partial mapping determined by f corresponding to the first variable.
In the case where n = 2, the map defined by a ↦ f(a, x) is often denoted f(·, x). Further, any function f : X₁ × X₂ → Y determines a mapping from X₁ into the set of mappings of X₂ into Y, namely f̂ : x ↦ (y ↦ f(x, y)). The converse holds too, and it is customary to identify f with f̂. Many of the canonical isomorphisms that we come across (e.g. in multilinear algebra) are illustrations of this kind of identification.
Version: 2 Owner: mathcam Author(s): mathcam, Larry Hammick
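The identification of f with f̂ above is what programmers call currying. A minimal Python sketch (the function and values are illustrative choices):

```python
from functools import partial

# A function of two variables f : X1 x X2 -> Y (illustrative choice).
def f(x, y):
    return x + 2 * y

# The curried form f_hat : X1 -> (X2 -> Y), identifying f with f_hat.
def f_hat(x):
    return lambda y: f(x, y)

assert f_hat(3)(4) == f(3, 4)

# The partial mapping f(., y) obtained by fixing the second variable:
f_dot_5 = partial(f, y=5)
assert f_dot_5(1) == f(1, 5)
```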

34.74 period of mapping

Definition. Suppose X is a set and f is a mapping f : X → X. If fⁿ is the identity mapping on X for some n = 1, 2, . . ., then f is said to be a mapping of period n. Here, the notation fⁿ means the n-fold composition f ∘ ⋯ ∘ f.

Examples

1. A mapping f is of period 1 if and only if f is the identity mapping.
2. Suppose V is a vector space. Then a linear involution L : V → V is a mapping of period 2. For example, the reflection mapping x ↦ −x is a mapping of period 2.
3. In the complex plane, the mapping z ↦ e^{2πi/n} z is a mapping of period n for n = 1, 2, . . ..
4. Let us consider the function space spanned by the trigonometric functions sin and cos. On this space, the derivative is a mapping of period 4.

Properties

1. Suppose X is a set. Then a mapping f : X → X of period n is a bijection. (proof.)
2. Suppose X is a topological space. Then a continuous mapping f : X → X of period n is a homeomorphism.
Version: 8 Owner: bwebste Author(s): matte
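Example 3 above has a discrete analogue that is easy to test: rotation of ℤ/n by one step plays the role of z ↦ e^{2πi/n}z. The choice n = 6 in this Python sketch is arbitrary.

```python
# Iterating a mapping n times and checking that f^n is the identity.
def compose_n(f, n, x):
    """Apply the n-fold composition f o ... o f to x."""
    for _ in range(n):
        x = f(x)
    return x

n = 6
rotate = lambda k: (k + 1) % n   # analogue of z -> e^{2 pi i / n} z on Z/n

assert all(compose_n(rotate, n, k) == k for k in range(n))     # f^n = id
assert all(compose_n(rotate, m, 0) != 0 for m in range(1, n))  # period exactly n
```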


34.75 pi-system

Let Ω be a set, and P(Ω) be the power set of Ω. A π-system (or pi-system) on Ω is a set F ⊆ P(Ω) such that

    A, B ∈ F ⟹ A ∩ B ∈ F.        (34.75.1)

A π-system is closed under finite intersection.
Version: 1 Owner: drummond Author(s): drummond

34.76 proof of inverse function theorem

Since det Df(a) ≠ 0, the Jacobian matrix Df(a) is invertible: let A = (Df(a))⁻¹ be its inverse. Choose r > 0 and ρ > 0 such that

    B = B_ρ(a) ⊆ E,
    ‖Df(x) − Df(a)‖ ≤ 1/(2n‖A‖)   for all x ∈ B,
    r ≤ ρ/(2‖A‖).

Let y ∈ B_r(f(a)) and consider the mapping

    T_y : B → ℝⁿ
    T_y(x) = x + A·(y − f(x)).

If x ∈ B we have

    ‖DT_y(x)‖ = ‖1 − A·Df(x)‖ ≤ ‖A‖·‖Df(a) − Df(x)‖ ≤ 1/(2n).

Let us verify that T_y is a contraction mapping. Given x₁, x₂ ∈ B, by the mean-value theorem on ℝⁿ we have

    |T_y(x₁) − T_y(x₂)| ≤ sup_{x∈[x₁,x₂]} n‖DT_y(x)‖ · |x₁ − x₂| ≤ (1/2)|x₁ − x₂|.

Also notice that T_y(B) ⊆ B. In fact, given x ∈ B,

    |T_y(x) − a| ≤ |T_y(x) − T_y(a)| + |T_y(a) − a| ≤ (1/2)|x − a| + |A·(y − f(a))| ≤ ρ/2 + ‖A‖r ≤ ρ.

So T_y : B → B is a contraction mapping and hence by the contraction principle there exists one and only one solution to the equation

    T_y(x) = x,

i.e. x is the only point in B such that f(x) = y.
Hence given any y ∈ B_r(f(a)) we can find x ∈ B which solves f(x) = y. Let us call g : B_r(f(a)) → B the mapping which gives this solution, i.e.

    f(g(y)) = y.

Let V = B_r(f(a)) and U = g(V). Clearly f : U → V is one to one and the inverse of f is g. We have to prove that U is a neighbourhood of a. However, since f is continuous in a, we know that there exists a ball B_δ(a) such that f(B_δ(a)) ⊆ B_r(f(a)), and hence we have B_δ(a) ⊆ U.
We now want to study the differentiability of g. Let y ∈ V be any point, take w ∈ ℝⁿ and ε > 0 so small that y + εw ∈ V. Let x = g(y) and define v(ε) = g(y + εw) − g(y).
First of all notice that, being

    |T_y(x + v(ε)) − T_y(x)| ≤ (1/2)|v(ε)|,

we have

    |v(ε)| − |εA·w| ≤ |v(ε) − εA·w| ≤ (1/2)|v(ε)|,

and hence

    |v(ε)| ≤ 2‖A‖ε|w|.

On the other hand we know that f is differentiable in x, that is, for all v it holds

    f(x + v) − f(x) = Df(x)·v + h(v)

with lim_{v→0} h(v)/|v| = 0. So we get

    |h(v(ε))|/ε ≤ 2‖A‖|w| · |h(v(ε))|/|v(ε)| → 0   when ε → 0.

So

    lim_{ε→0} (g(y + εw) − g(y))/ε = lim_{ε→0} v(ε)/ε = lim_{ε→0} Df(x)⁻¹ (εw − h(v(ε)))/ε = Df(x)⁻¹·w,

that is

    Dg(y) = Df(x)⁻¹.
Version: 2 Owner: paolini Author(s): paolini
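The contraction T_y in the proof above is also a practical way to compute the local inverse numerically. This is a one-dimensional sketch only; the function f and the base point a are illustrative assumptions, not part of the original entry.

```python
# Compute g(y) as the fixed point of the contraction
# T_y(x) = x + A (y - f(x)), with A = (Df(a))^(-1).
f = lambda x: x**3 + x          # f'(x) = 3x^2 + 1 > 0 (illustrative choice)
a = 1.0                         # base point (illustrative choice)
A = 1.0 / (3 * a**2 + 1)        # A = (f'(a))^(-1) = 1/4

def g(y, steps=100):
    """Local inverse of f near a, by iterating the contraction T_y."""
    x = a
    for _ in range(steps):
        x = x + A * (y - f(x))
    return x

y = f(1.1)                      # a point near f(a) = 2
assert abs(g(y) - 1.1) < 1e-9   # g recovers the original x
assert abs(f(g(2.3)) - 2.3) < 1e-9
```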


34.77 proper subset

Let S be a set and let X ⊆ S be a subset. We say X is a proper subset of S if X ≠ S.


Version: 2 Owner: djao Author(s): djao

34.78 range

Let R be a binary relation. Then the set of all y such that xRy for some x is called the
range of R. That is, the range of R is the set of all second coordinates in the ordered pairs
of R.
In terms of functions, this means that the range of a function is the full set of values it can
take on (the outputs), given the full set of parameters (the inputs). Note that the range is
a subset of the codomain.
Version: 2 Owner: akrowne Author(s): akrowne

34.79 reflexive

A relation R on A is reflexive if and only if ∀a ∈ A, aRa. The number of possible reflexive relations on A is 2^{n²−n}, out of the 2^{n²} total possible relations, where n = |A|.
For example, let A = {1, 2, 3} and let R be a relation on A. Then R = {(1, 1), (2, 2), (3, 3), (1, 3), (3, 2)} would be a reflexive relation, because it contains all the (a, a), a ∈ A pairs. However, R = {(1, 1), (2, 2), (2, 3), (3, 1)} is not reflexive because it would also have to contain (3, 3).
Version: 6 Owner: xriso Author(s): xriso

34.80 relation

A relation is any subset of a cartesian product of two sets A and B. That is, any R ⊆ A × B is a binary relation. One may write aRb to denote a ∈ A, b ∈ B and (a, b) ∈ R. A subset of A × A is simply called a relation on A.
An example of a relation is the less-than relation on integers, i.e. < ⊆ ℤ × ℤ. We have (1, 2) ∈ <, but (2, 1) ∉ <.
Version: 3 Owner: Logan Author(s): Logan
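The less-than example can be written out literally as a set of ordered pairs, restricted here to a small finite slice of ℤ:

```python
# The "less than" relation on a slice of the integers, as a set of pairs
# R subset of A x A.
A = range(-3, 4)
less_than = {(a, b) for a in A for b in A if a < b}

assert (1, 2) in less_than      # 1 < 2
assert (2, 1) not in less_than  # but not 2 < 1
```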


34.81 restriction of a mapping

Definition. Let f : X → Y be a mapping from a set X to a set Y. If A is a subset of X, then the restriction of f to A is the mapping

    f|_A : A → Y
    a ↦ f(a).
Version: 2 Owner: matte Author(s): matte

34.82 set difference

Let A and B be sets in some ambient set X. The set difference, or simply difference, between A and B (in that order) is the set of all elements that are contained in A, but not in B. This set is denoted by A \ B, and we have

    A \ B = {x ∈ X | x ∈ A, x ∉ B}
          = A ∩ B^∁,

where B^∁ is the complement of B in X.

Remark

Sometimes the set difference is also written as A − B. However, if A and B are sets in a vector space, then A − B is commonly used to denote the set

    A − B = {a − b | a ∈ A, b ∈ B},

which, in general, is not the same as the set difference of A and B. Therefore, to avoid confusion, one should try to avoid the notation A − B for the set difference.
Version: 5 Owner: matte Author(s): matte, quadrate

34.83 symmetric

A relation R on A is symmetric iff ∀x, y ∈ A, xRy ⟹ yRx. The number of possible symmetric relations on A is 2^{(n²+n)/2} out of the 2^{n²} total possible relations, where n = |A|.
An example of a symmetric relation on A = {a, b, c} would be R = {(a, a), (c, b), (b, c), (a, c), (c, a)}. One relation that is not symmetric is R = {(b, b), (a, b), (b, a), (c, b)}: since we have (c, b), we must also have (b, c) in order for R to be symmetric.
Version: 6 Owner: xriso Author(s): xriso

34.84 symmetric difference

The symmetric difference between two sets A and B, written A △ B, is the set of all x such that either x ∈ A or x ∈ B but not both. It is equal to (A \ B) ∪ (B \ A) and to (A ∪ B) \ (A ∩ B).
The symmetric difference operator is commutative, since A △ B = (A \ B) ∪ (B \ A) = (B \ A) ∪ (A \ B) = B △ A.
The operation is also associative. To see this, consider three sets A, B, and C. Any given element x is in zero, one, two, or all three of these sets. If x is not in any of A, B, or C, then it is not in the symmetric difference of the three sets no matter how it is computed. If x is in one of the sets, let that set be A; then x ∈ A △ B and x ∈ (A △ B) △ C; also, x ∉ B △ C and therefore x ∈ A △ (B △ C). If x is in two of the sets, let them be A and B; then x ∉ A △ B and x ∉ (A △ B) △ C; also, x ∈ B △ C, but because x is in A, x ∉ A △ (B △ C). If x is in all three, then x ∉ A △ B but x ∈ (A △ B) △ C; similarly, x ∉ B △ C but x ∈ A △ (B △ C). Thus, A △ (B △ C) = (A △ B) △ C.
In general, an element will be in the symmetric difference of several sets iff it is in an odd number of the sets.
Version: 5 Owner: mathcam Author(s): mathcam, quadrate
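Python's ^ operator on sets computes exactly the symmetric difference, so the commutativity, associativity, and odd-membership claims above can be spot-checked directly (the three sets chosen are illustrative):

```python
# Spot-check of the symmetric difference identities on small sets.
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

assert A ^ B == (A - B) | (B - A) == (A | B) - (A & B)
assert A ^ B == B ^ A                      # commutative
assert (A ^ B) ^ C == A ^ (B ^ C)          # associative

# An element lies in the symmetric difference of several sets iff it lies
# in an odd number of them:
triple = A ^ B ^ C
assert all((x in triple) == (sum(x in S for S in (A, B, C)) % 2 == 1)
           for x in A | B | C)
```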

34.85 the inverse image commutes with set operations

Theorem. Let f be a mapping from X to Y. If {B_i}_{i∈I} is a (possibly uncountable) collection of subsets in Y, then the following relations hold for the inverse image:

    (1) f⁻¹(⋃_{i∈I} B_i) = ⋃_{i∈I} f⁻¹(B_i)

    (2) f⁻¹(⋂_{i∈I} B_i) = ⋂_{i∈I} f⁻¹(B_i)

If A and B are subsets in Y, then we also have:

    (3) For the set complement, (f⁻¹(A))^∁ = f⁻¹(A^∁).

    (4) For the set difference, f⁻¹(A \ B) = f⁻¹(A) \ f⁻¹(B).

    (5) For the symmetric difference, f⁻¹(A △ B) = f⁻¹(A) △ f⁻¹(B).

Proof. For part (1), we have

    f⁻¹(⋃_{i∈I} B_i) = {x ∈ X | f(x) ∈ ⋃_{i∈I} B_i}
                     = {x ∈ X | f(x) ∈ B_i for some i ∈ I}
                     = ⋃_{i∈I} {x ∈ X | f(x) ∈ B_i}
                     = ⋃_{i∈I} f⁻¹(B_i).

Similarly, for part (2), we have

    f⁻¹(⋂_{i∈I} B_i) = {x ∈ X | f(x) ∈ ⋂_{i∈I} B_i}
                     = {x ∈ X | f(x) ∈ B_i for all i ∈ I}
                     = ⋂_{i∈I} {x ∈ X | f(x) ∈ B_i}
                     = ⋂_{i∈I} f⁻¹(B_i).

For the set complement, suppose x ∉ f⁻¹(A). This is equivalent to f(x) ∉ A, or f(x) ∈ A^∁, which is equivalent to x ∈ f⁻¹(A^∁). Since the set difference A \ B can be written as A ∩ B^∁, part (4) follows from parts (2) and (3). Similarly, since A △ B = (A \ B) ∪ (B \ A), part (5) follows from parts (1) and (4). □
Version: 8 Owner: matte Author(s): matte

34.86 transformation

Synonym of mapping and function. Often used to refer to mappings where the domain and codomain are the same set, i.e. one can compose a transformation with itself. For example, when one speaks of a transformation of a space, one refers to some deformation of that space.
Version: 3 Owner: rmilson Author(s): rmilson

34.87 transitive

Let A be a set. A is said to be transitive if whenever x ∈ A then x ⊆ A.
Equivalently, A is transitive if whenever x ∈ A and y ∈ x then y ∈ A.
Version: 1 Owner: Evandar Author(s): Evandar

34.88 transitive

A relation R on A is transitive if and only if ∀x, y, z ∈ A, (xRy ∧ yRz) ⟹ xRz.
For example, the "is a subset of" relation ⊆ between sets is transitive. The "is not equal to" relation ≠ between integers is not transitive. If we assign to our definition x = 5, y = 42, and z = 5, we know that both 5 ≠ 42 (x ≠ y) and 42 ≠ 5 (y ≠ z). However, 5 = 5 (x = z), so ≠ is not transitive.
Version: 5 Owner: xriso Author(s): xriso
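The three relation properties defined in sections 34.79, 34.83, and 34.88 can all be checked by brute force on a finite set. A Python sketch (the relations used are illustrative):

```python
# Brute-force checks of reflexivity, symmetry, and transitivity for a
# relation R given as a set of ordered pairs over a finite set A.
def is_reflexive(R, A):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, w) in R
               for (x, y) in R for (z, w) in R if y == z)

A = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}
assert is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)
assert not is_transitive({(1, 2), (2, 3)})   # missing (1, 3)
```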

34.89 transitive closure

The transitive closure of a set X is the smallest transitive set tc(X) such that X ⊆ tc(X).
The transitive closure of a set can be constructed as follows:
Define a function f on ω by f(0) = X and f(n + 1) = ⋃ f(n). Then

    tc(X) = ⋃_{n<ω} f(n).
Version: 1 Owner: Henry Author(s): Henry
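The construction above can be mimicked for hereditarily finite sets modelled as nested frozensets. A Python sketch (the example sets encode the von Neumann naturals 0, 1, 2):

```python
# tc(X): repeatedly take the union of the elements of the current layer
# until nothing new appears, collecting everything seen along the way.
def tc(X):
    result, layer = set(), set(X)
    while layer:
        result |= layer
        # the union of all (set-valued) elements of the current layer
        layer = {z for y in layer if isinstance(y, frozenset) for z in y} - result
    return result

one = frozenset([frozenset()])            # 1 = {0}
two = frozenset([frozenset(), one])       # 2 = {0, 1}
assert tc({two}) == {two, one, frozenset()}   # tc({2}) = {2, 1, 0}
```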

34.90 Hausdorff's maximum principle

Theorem. Let X be a partially ordered set. Then there exists a maximal totally ordered subset of X.
Hausdorff's maximum principle is one of the many theorems equivalent to the axiom of choice. The proof below uses Zorn's lemma, which is also equivalent to the axiom of choice.

Proof. Let S be the set of all totally ordered subsets of X. S is not empty, since the empty set is an element of S. Given a chain of elements of S (ordered by inclusion), the union of all its elements is again an element of S, as is easily verified. This shows that S, ordered by inclusion, is inductive. The result now follows from Zorn's lemma. □
Version: 3 Owner: matte Author(s): matte, cryo

34.91 Kuratowski's lemma

Any chain in an ordered set is contained in a maximal chain.


This proposition is equivalent to the axiom of choice.
Version: 2 Owner: Koro Author(s): Koro

34.92 Tukey's lemma

Each nonempty family of finite character has a maximal element.

Here, by a maximal element we mean a maximal element with respect to the inclusion ordering: A ≤ B iff A ⊆ B. This lemma is equivalent to the axiom of choice.
Version: 3 Owner: Koro Author(s): Koro

34.93 Zermelo's postulate

If F is a disjoint family of nonempty sets, then there is a set C which has exactly one element of each A ∈ F (i.e. such that A ∩ C is a singleton for each A ∈ F).
This is one of the many propositions which are equivalent to the axiom of choice.
This is one of the many propositions which are equivalent to the axiom of choice.
Version: 2 Owner: Koro Author(s): Koro

34.94 Zermelo's well-ordering theorem

If X is any set whatsoever, then there exists a well-ordering of X. The well-ordering theorem
is equivalent to the axiom of choice.
Version: 2 Owner: vypertd Author(s): vypertd

34.95 Zorn's lemma

Let X be a partially ordered set, and suppose that every chain in X has an upper bound. Then X has a maximal element x, in the sense that for all y ∈ X, y ≯ x.
Zorn's lemma is equivalent to the axiom of choice.
Version: 3 Owner: Evandar Author(s): Evandar

34.96 axiom of choice

Let C be a collection of nonempty sets. Then there exists a function f with domain C such that f(x) ∈ x for all x ∈ C. The function f is sometimes called a choice function on C.
The axiom of choice is commonly (although not universally) accepted with the axioms of Zermelo-Fraenkel set theory. The axiom of choice is equivalent to the well-ordering principle and to Zorn's lemma.
The axiom of choice is sometimes called the multiplicative axiom, as it is equivalent to the proposition that a product of cardinals is zero if and only if one of the factors is zero.
Version: 5 Owner: vampyr Author(s): vampyr

34.97 equivalence of Hausdorff's maximum principle, Zorn's lemma and the well-ordering theorem

Hausdorff's maximum principle implies Zorn's lemma. Consider a partially ordered set X, where every chain has an upper bound. According to the maximum principle there exists a maximal totally ordered subset Y ⊆ X. This then has an upper bound, x. If x is not the largest element in Y, then {x} ∪ Y would be a totally ordered set in which Y would be properly contained, contradicting the definition. Thus x is a maximal element in X.

Zorn's lemma implies the well-ordering theorem. Let X be any non-empty set, and let A be the collection of pairs (A, ≤), where A ⊆ X and ≤ is a well-ordering on A. Define a relation ⪯ on A so that for all x, y ∈ A: x ⪯ y iff x equals an initial segment of y. It is easy to see that this defines a partial order relation on A (it inherits reflexivity, antisymmetry and transitivity from one set being an initial segment, and thus a subset, of the other).

For each chain C ⊆ A, define C′ = (R, ≤′), where R is the union of all the sets A for all (A, ≤) ∈ C, and ≤′ is the union of all the relations ≤ for all (A, ≤) ∈ C. It follows that C′ is an upper bound for C in A.

According to Zorn's lemma, A now has a maximal element (M, ≤_M). We postulate that M contains all members of X, for if this were not true we could for any a ∈ X − M construct (M*, ≤*), where M* = M ∪ {a} and ≤* extends ≤_M so that the initial segment of M* determined by a is M (i.e. a is larger than every element of M). Clearly ≤* then defines a well-order on M*, and (M*, ≤*) would be larger than (M, ≤_M), contrary to the definition.

Since M contains all the members of X and ≤_M is a well-ordering of M, it is also a well-ordering on X, as required.

The well-ordering theorem implies Hausdorff's maximum principle. Let (X, ⪯) be a partially ordered set, and let ≤ be a well-ordering on X. We define the function φ by transfinite recursion over (X, ≤) so that

    φ(a) = {a}   if {a} ∪ ⋃_{b<a} φ(b) is totally ordered under ⪯,
    φ(a) = ∅     otherwise.

It follows that ⋃_{x∈X} φ(x) is a maximal totally ordered subset of X, as required.
Version: 4 Owner: mathcam Author(s): mathcam, cryo

34.98 equivalence of Zorn's lemma and the axiom of choice

Let X be a set partially ordered by < such that each chain has an upper bound. Equate each x ∈ X with p(x) = {y ∈ X | x < y} ∈ P(X). Let p(X) = {p(x) | x ∈ X}. If p(x) = ∅ then it follows that x is maximal.
Suppose no p(x) = ∅. Then by the axiom of choice there is a choice function f on p(X), and since for each p(x) we have f(p(x)) ∈ p(x), it follows that x < f(p(x)). Define f_i(p(x)) for all ordinals i by transfinite induction:

    f_0(p(x)) = p(x)
    f_{α+1}(p(x)) = f(f_α(p(x)))

And for a limit ordinal δ, let f_δ(p(x)) be the upper bound of the f_i(p(x)) for i < δ.
This construction can go on forever, for any ordinal. Then we can easily construct a surjective function g from X to Ord by g(f_i(p(x))) = i. But that requires that X be a proper class, in contradiction to the fact that it is a set. So there can be no such choice function, and there must be a maximal element of X.

For the reverse, assume Zorn's lemma and let C be any set of non-empty sets. Consider the set of functions F = {f | ∀a ∈ dom(f) (a ∈ C ∧ f(a) ∈ a)}, partially ordered by inclusion.

Then the union of any chain in F is also a member of F (since the union of a chain of functions is always a function). By Zorn's lemma, F has a maximal element f, and since any function with domain smaller than C can be easily expanded, dom(f) = C, and so f is a choice function for C.
Version: 2 Owner: Henry Author(s): Henry

34.99 maximality principle

Let S be a collection of sets. If, for each chain C ⊆ S, there exists an X ∈ S such that every element of C is a subset of X, then S contains a maximal element. This is known as the maximality principle.
The maximality principle is equivalent to the axiom of choice.
Version: 4 Owner: akrowne Author(s): akrowne

34.100 principle of finite induction

Let S be a set of positive integers with the properties

1. 1 belongs to S, and
2. whenever the integer k is in S, then the next integer k + 1 must also be in S.

Then S is the set of all positive integers.

The Second Principle of Finite Induction would replace (2) above with

2′. If k is a positive integer such that 1, 2, . . . , k belong to S, then k + 1 must also be in S.

The Principle of Finite Induction is a consequence of the well-ordering principle.
Version: 3 Owner: KimJ Author(s): KimJ


34.101 principle of finite induction proven from well-ordering principle

Let T be the set of all positive integers not in S. Assume T is nonempty. The well-ordering principle says T contains a least element; call it a. Since 1 ∈ S, we have a > 1, hence 0 < a − 1 < a. The choice of a as the smallest element of T means a − 1 is not in T, and hence is in S. But then (a − 1) + 1 is in S, which forces a ∈ S, contradicting a ∈ T. Hence T is empty, and S is all positive integers.
Version: 4 Owner: KimJ Author(s): KimJ

34.102 proof of Tukey's lemma

Let S be a set and F a set of subsets of S such that F is of finite character. By Zorn's lemma, it is enough to show that F is inductive. For that, it will be enough to show that if (F_i)_{i∈I} is a family of elements of F which is totally ordered by inclusion, then the union U of the F_i is an element of F as well (since U is then an upper bound on the family (F_i)). So, let K be a finite subset of U. Each element of U is in F_i for some i ∈ I. Since K is finite and the F_i are totally ordered by inclusion, there is some j ∈ I such that all elements of K are in F_j. That is, K ⊆ F_j. Since F is of finite character and F_j ∈ F, we get K ∈ F; as K was an arbitrary finite subset of U, it follows that U ∈ F, QED.
Version: 1 Owner: Koro Author(s): Larry Hammick

34.103 proof of Zermelo's well-ordering theorem

Let X be any set and let f be a choice function on P(X) \ {∅}. Then define a function i by transfinite recursion on the class of ordinals as follows:

    i(α) = f(X − ⋃_{β<α} {i(β)}),  unless X − ⋃_{β<α} {i(β)} = ∅ or i(β) is undefined for some β < α

(the function is undefined if either of the unless clauses holds).
Thus i(0) is just f(X) (the least element of X), and i(1) = f(X − {i(0)}) (the least element of X other than i(0)).
Define by the axiom of replacement the set α = i⁻¹[X] = {β | i(β) = x for some x ∈ X}. Since α is a set of ordinals, it cannot contain all the ordinals (by the Burali-Forti paradox).
Since the ordinals are well ordered, there is a least ordinal γ not in α, and therefore i(γ) is undefined. It cannot be that the second unless clause holds (since γ is the least such ordinal), so it must be that X − ⋃_{β<γ} {i(β)} = ∅, and therefore for every x ∈ X there is some β < γ such that i(β) = x. Since we already know that i is injective, it is a bijection between γ and X, and therefore establishes a well-ordering of X by x <_X y ↔ i⁻¹(x) < i⁻¹(y).
The reverse is simple. If C is a set of nonempty sets, select any well-ordering of ⋃C. Then a choice function is just f(a) = the least member of a under that well-ordering.
Version: 5 Owner: Henry Author(s): Henry

34.104 axiom of extensionality

If X and Y have the same elements, then X = Y.

The Axiom of Extensionality is one of the axioms of Zermelo-Fraenkel set theory. In symbols, it reads:

    ∀u(u ∈ X ↔ u ∈ Y) → X = Y.

Note that the converse,

    X = Y → ∀u(u ∈ X ↔ u ∈ Y),

is an axiom of the predicate calculus. Hence we have

    X = Y ↔ ∀u(u ∈ X ↔ u ∈ Y).

Therefore the Axiom of Extensionality expresses the most fundamental notion of a set: a set is determined by its elements.
Version: 2 Owner: Sabean Author(s): Sabean

34.105 axiom of infinity

There exists an infinite set.

The Axiom of Infinity is an axiom of Zermelo-Fraenkel set theory. At first glance, this axiom seems to be ill-defined. How are we to know what constitutes an infinite set when we have not yet defined the notion of a finite set? However, once we have a theory of ordinal numbers in hand, the axiom makes sense.
Meanwhile, we can give a definition of finiteness that does not rely upon the concept of number. We do this by introducing the notion of an inductive set. A set S is said to be inductive if ∅ ∈ S and for every x ∈ S, x ∪ {x} ∈ S. We may then state the Axiom of Infinity as follows:
There exists an inductive set.
In symbols:

    ∃S[∅ ∈ S ∧ (∀x ∈ S)[x ∪ {x} ∈ S]]

We shall then be able to prove that the following conditions are equivalent:

1. There exists an inductive set.
2. There exists an infinite set.
3. The least nonzero limit ordinal, ω, is a set.
Version: 3 Owner: Sabean Author(s): Sabean

34.106 axiom of pairing

For any a and b there exists a set {a, b} that contains exactly a and b.
The Axiom of Pairing is one of the axioms of Zermelo-Fraenkel set theory. In symbols, it reads:

    ∀a∀b∃c∀x(x ∈ c ↔ x = a ∨ x = b).

Using the axiom of extensionality, we see that the set c is unique, so it makes sense to define the pair

    {a, b} = the unique c such that ∀x(x ∈ c ↔ x = a ∨ x = b).

Using the Axiom of Pairing, we may define, for any set a, the singleton

    {a} = {a, a}.

We may also define, for any sets a and b, the ordered pair

    (a, b) = {{a}, {a, b}}.

Note that this definition satisfies the condition

    (a, b) = (c, d) iff a = c and b = d.

We may define the ordered n-tuple recursively:

    (a₁, . . . , aₙ) = ((a₁, . . . , aₙ₋₁), aₙ).
Version: 4 Owner: Sabean Author(s): Sabean

34.107 axiom of power set

For any X, there exists a set Y = P(X).

The Axiom of Power Set is an axiom of Zermelo-Fraenkel set theory. In symbols, it reads:

    ∀X∃Y∀u(u ∈ Y ↔ u ⊆ X).

In the above, u ⊆ X is defined as ∀z(z ∈ u → z ∈ X). Hence Y is the set of all subsets of X. Y is called the power set of X and is denoted P(X). By extensionality, the set Y is unique.
The Power Set Axiom allows us to define the cartesian product of two sets X and Y:

    X × Y = {(x, y) : x ∈ X ∧ y ∈ Y}.

The cartesian product is a set since

    X × Y ⊆ P(P(X ∪ Y)).

We may define the cartesian product of any finite collection of sets recursively:

    X₁ × ⋯ × Xₙ = (X₁ × ⋯ × Xₙ₋₁) × Xₙ.
Version: 5 Owner: Sabean Author(s): Sabean
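For finite sets, both P(X) and the cartesian product defined above can be computed directly with the standard library. A Python sketch (the sets X and Y are illustrative):

```python
from itertools import chain, combinations, product

# P(X): all subsets of X, enumerated by subset size.
def power_set(X):
    X = list(X)
    return [set(c) for c in chain.from_iterable(
        combinations(X, r) for r in range(len(X) + 1))]

X, Y = {1, 2}, {"a"}
assert len(power_set(X)) == 2 ** len(X)   # |P(X)| = 2^|X|

# X x Y as the set of ordered pairs:
XxY = set(product(X, Y))
assert XxY == {(1, "a"), (2, "a")}
```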

34.108 axiom of union

For any X there exists a set Y = ⋃X.

The Axiom of Union is an axiom of Zermelo-Fraenkel set theory. In symbols, it reads:

    ∀X∃Y∀u(u ∈ Y ↔ ∃z(z ∈ X ∧ u ∈ z)).

Notice that this means that Y is the set of elements of all elements of X. More succinctly, the union of any set of sets is a set. By extensionality, the set Y is unique. Y is called the union of X.
In particular, the Axiom of Union, along with the axiom of pairing, allows us to define

    X ∪ Y = ⋃{X, Y},

as well as the triple

    {a, b, c} = {a, b} ∪ {c},

and therefore the n-element set

    {a₁, . . . , aₙ} = {a₁} ∪ ⋯ ∪ {aₙ}.
Version: 5 Owner: Sabean Author(s): Sabean

34.109 axiom schema of separation

Let φ(u, p) be a formula. For any X and p, there exists a set Y = {u ∈ X : φ(u, p)}.
The Axiom Schema of Separation is an axiom schema of Zermelo-Fraenkel set theory. Note that it represents infinitely many individual axioms, one for each formula φ. In symbols, it reads:

    ∀X∀p∃Y∀u(u ∈ Y ↔ u ∈ X ∧ φ(u, p)).

By extensionality, the set Y is unique.

The Axiom Schema of Separation generalizes to the case where φ depends on more than one parameter p. We may show by induction that if φ(u, p₁, . . . , pₙ) is a formula, then

    ∀X∀p₁⋯∀pₙ∃Y∀u(u ∈ Y ↔ u ∈ X ∧ φ(u, p₁, . . . , pₙ))

holds, using the Axiom Schema of Separation and the axiom of pairing.
Another consequence of the Axiom Schema of Separation is that the intersection of a subclass with any set is a set. To see this, let C be the class C = {u : φ(u, p₁, . . . , pₙ)}. Then

    ∀X∃Y(C ∩ X = Y)

holds, which means that the intersection of C with any set is a set. Therefore, in particular, the intersection of two sets X ∩ Y = {x ∈ X : x ∈ Y} is a set. Furthermore the difference of two sets X − Y = {x ∈ X : x ∉ Y} is a set and, provided there exists at least one set, which is guaranteed by the axiom of infinity, the empty set is a set. For if X is a set, then ∅ = {x ∈ X : x ≠ x} is a set.
Moreover, if C is a nonempty class, then ⋂C is a set, by Separation: ⋂C is a subset of every X ∈ C.
Lastly, we may use Separation to show that the class of all sets, V, is not a set, i.e., V is a proper class. For example, suppose V is a set. Then by Separation

    V′ = {x ∈ V : x ∉ x}

is a set and we have reached a Russell paradox.
Version: 15 Owner: Sabean Author(s): Sabean

34.110 de Morgan's laws

In set theory, de Morgan's laws relate the three basic set operations to each other: the union, the intersection, and the complement. De Morgan's laws are named after the Indian-born British mathematician and logician Augustus De Morgan (1806-1871) [1].
If A and B are subsets of a set X, de Morgan's laws state that

    (A ∪ B)^∁ = A^∁ ∩ B^∁,
    (A ∩ B)^∁ = A^∁ ∪ B^∁.

Here, ∪ denotes the union, ∩ denotes the intersection, and A^∁ denotes the set complement of A in X, i.e., A^∁ = X \ A.
Above, de Morgan's laws are written for two sets. In this form, they are intuitively quite clear. For instance, the first claim states that an element that is not in A ∪ B is not in A and not in B. It also states that an element not in A and not in B is not in A ∪ B.
For an arbitrary collection of subsets, de Morgan's laws are as follows:

Theorem. Let X be a set with subsets A_i ⊆ X for i ∈ I, where I is an arbitrary index set. In other words, I can be finite, countable, or uncountable. Then

    (⋃_{i∈I} A_i)^∁ = ⋂_{i∈I} A_i^∁,
    (⋂_{i∈I} A_i)^∁ = ⋃_{i∈I} A_i^∁.

(proof)

de Morgan's laws in a Boolean algebra

For Boolean variables x and y in a Boolean algebra, de Morgan's laws state that

    (x ∨ y)′ = x′ ∧ y′,
    (x ∧ y)′ = x′ ∨ y′.

Not surprisingly, de Morgan's laws form an indispensable tool when simplifying digital circuits involving and, or, and not gates [2].

REFERENCES
1. Wikipedias entry on de Morgan, 4/2003.
2. M.M. Mano, Computer Engineering: Hardware Design, Prentice Hall, 1988.

Version: 11 Owner: matte Author(s): matte, drini, greg
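Both the two-set laws and the indexed version in the theorem above can be spot-checked by brute force. A Python sketch (the ambient set and the family of subsets are illustrative choices):

```python
# Spot-check of de Morgan's laws inside a small ambient set X,
# with complements computed as X - A.
X = set(range(12))
A = {n for n in X if n % 2 == 0}
B = {n for n in X if n % 3 == 0}

assert X - (A | B) == (X - A) & (X - B)   # (A u B)^c = A^c n B^c
assert X - (A & B) == (X - A) | (X - B)   # (A n B)^c = A^c u B^c

# Indexed version, with A_i = multiples of i in X:
family = {i: {n for n in X if n % i == 0} for i in (2, 3, 4)}
union = set().union(*family.values())
inter = set.intersection(*family.values())
assert X - union == set.intersection(*(X - Ai for Ai in family.values()))
assert X - inter == set().union(*(X - Ai for Ai in family.values()))
```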



34.111 de Morgan's laws for sets (proof)

Let X be a set with subsets A_i ⊆ X for i ∈ I, where I is an arbitrary index set. In other words, I can be finite, countable, or uncountable. We first show that

    (⋃_{i∈I} A_i)′ = ⋂_{i∈I} A_i′,

where A′ denotes the complement of A.

Let us define S = (⋃_{i∈I} A_i)′ and T = ⋂_{i∈I} A_i′. To establish the equality S = T, we shall use a standard argument for proving equalities in set theory. Namely, we show that S ⊆ T and T ⊆ S. For the first claim, suppose x is an element in S. Then x ∉ ⋃_{i∈I} A_i, so x ∉ A_i for any i ∈ I. Hence x ∈ A_i′ for all i ∈ I, and x ∈ ⋂_{i∈I} A_i′ = T. Conversely, suppose x is an element in T = ⋂_{i∈I} A_i′. Then x ∈ A_i′ for all i ∈ I. Hence x ∉ A_i for any i ∈ I, so x ∉ ⋃_{i∈I} A_i, and x ∈ S.
The second claim,

    (⋂_{i∈I} A_i)′ = ⋃_{i∈I} A_i′,

follows by applying the first claim to the sets A_i′.
Version: 3 Owner: mathcam Author(s): matte

34.112 set theory

Set theory is special among mathematical theories, in two ways: It plays a central role in
putting mathematics on a reliable axiomatic foundation, and it provides the basic language
and apparatus in which most of mathematics is expressed.

34.112.1 Axiomatic set theory

I will informally list the undefined notions, the axioms, and two of the schemes of set theory, along the lines of Bourbaki's account. The axioms are closer to the von Neumann–Bernays–Gödel model than to the equivalent ZFC model. (But some of the axioms are identical to some in ZFC; see the entry ZermeloFraenkelAxioms.) The intention here is just to give an idea of the level and scope of these fundamental things.

There are three undefined notions:

1. the relation of equality of two sets
There are three undefined notions:
1. the relation of equality of two sets

2. the relation of membership of one set in another ($x \in y$)

3. the notion of an ordered pair, which is a set comprised of two other sets, in a specific order.
Most of the eight schemes belong more properly to logic than to set theory, but they, or something on the same level, are needed in the work of formalizing any theory that uses the notion of equality, or uses quantifiers such as $\exists$ and $\forall$. Because of their formal nature, let me just
(informally) state two of the schemes:
S6. If A and B are sets, and A = B, then anything true of A is true of B, and conversely.
S7. If two properties F (x) and G(x) of a set x are equivalent, then the generic set having
the property F , is the same as the generic set having the property G.
(The notion of a generic set having a given property, is formalized with the help of the
Hilbert symbol; this is one way, but not the only way, to incorporate what is called the
axiom of choice.)
Finally come the five axioms in this axiomatization of set theory. (Some are identical to
axioms in ZFC, q.v.)
A1. Two sets $A$ and $B$ are equal iff they have the same elements, i.e. iff the relation $x \in A$ implies $x \in B$ and vice versa.

A2. For any two sets $A$ and $B$, there is a set $C$ such that $x \in C$ is equivalent to $x = A$ or $x = B$.

A3. Two ordered pairs $(A, B)$ and $(C, D)$ are equal iff $A = C$ and $B = D$.

A4. For any set $A$, there exists a set $B$ such that $x \in B$ is equivalent to $x \subseteq A$; in other words, there is a set of all subsets of $A$, for any given set $A$.
A5. There exists an infinite set.
The word infinite is defined in terms of Axioms A1–A4. But to formulate the definition, one must first build up some definitions and results about functions and ordered sets, which we haven't done here.

34.112.2 Product sets, relations, functions, etc.

Moving away from foundations and toward applications, all the more complex structures and relations of set theory are built up out of the three undefined notions. (See the entry Set.) For instance, the relation $A \subseteq B$ between two sets means simply: if $x \in A$ then $x \in B$.

Using the notion of ordered pair, we soon get the very important structure called the product $A \times B$ of two sets $A$ and $B$. Next, we can get such things as equivalence relations and order relations on a set $A$, for they are subsets of $A \times A$. And we get the critical notion of a function $A \to B$, as a subset of $A \times B$. Using functions, we get such things as the product $\prod_{i \in I} A_i$ of a family of sets. (Family is a variation of the notion of function.)
To be strictly formal, we should distinguish between a function and the graph of that function, and between a relation and its graph, but the distinction is rarely necessary in practice.

34.112.3 Some structures defined in terms of sets

The natural numbers provide the first example. Peano, Zermelo and Fraenkel, and others have given axiom-lists for the set $\mathbb{N}$, with its addition, multiplication, and order relation; but nowadays the custom is to define even the natural numbers in terms of sets. In more detail, a natural number is the order-type of a finite well-ordered set. The relation $m \leq n$ between $m, n \in \mathbb{N}$ is defined with the aid of a certain theorem which says, roughly, that for any two well-ordered sets, one is a segment of the other. The sum or product of two natural numbers is defined as the cardinal of the sum or product, respectively, of two sets. (For an extension of this idea, see surreal numbers.)

(The term cardinal takes some work to define. The type of an ordered set, or any other kind of structure, is the generic structure of that kind, which is defined using the Hilbert symbol.)
Groups provide another simple example of a structure defined in terms of sets and ordered pairs. A group is a pair $(G, f)$ in which $G$ is just a set, and $f$ is a mapping $G \times G \to G$ satisfying certain axioms; the axioms (associativity etc.) can all be spelled out in terms of sets and ordered pairs, although in practice one uses algebraic notation to do it. When we speak of (e.g.) the group $S_3$ of permutations of a 3-element set, we mean the type of such a group.
Topological spaces provide another example of how mathematical structures can be defined in terms of, ultimately, the sets and ordered pairs in set theory. A topological space is a pair $(S, \mathcal{U})$, where the set $S$ is arbitrary, but $\mathcal{U}$ has these properties:

- any element of $\mathcal{U}$ is a subset of $S$
- the union of any family (or set) of elements of $\mathcal{U}$ is also an element of $\mathcal{U}$
- the intersection of any finite family of elements of $\mathcal{U}$ is an element of $\mathcal{U}$.

Many special kinds of topological spaces are defined by enlarging this list of restrictions on $\mathcal{U}$.
Finally, many kinds of structure are based on more than one set. E.g. a left module is a commutative group $M$ together with a ring $R$, plus a mapping $R \times M \to M$ which satisfies a specific set of restrictions.

34.112.4 Categories, homological algebra

Although set theory provides some of the language and apparatus used in mathematics
generally, that language and apparatus have expanded over time, and now include what are
called categories and functors. A category is not a set, and a functor is not a mapping,
despite similarities in both cases. A category comprises all the structured sets of the same
kind, e.g. the groups, and contains also a definition of the notion of a morphism from one
such structured set to another of the same kind. A functor is similar to a morphism but
compares one category to another, not one structured set to another. The classic examples
are certain functors from the category of topological spaces to the category of groups.
Homological algebra is concerned with sequences of morphisms within a category, plus
functors from one category to another. One of its aims is to get structure theories for specific
categories; the homology of groups and the cohomology of Lie algebras are examples. For
more details on the categories and functors of homological algebra, I recommend a search
for Eilenberg-Steenrod axioms.
Version: 8 Owner: mathwizard Author(s): Larry Hammick

34.113 union

The union of two sets $A$ and $B$ is the set which contains all $x \in A$ and all $x \in B$. The union of $A$ and $B$ is written as $A \cup B$.

For any sets $A$ and $B$,
$$x \in A \cup B \iff (x \in A) \lor (x \in B)$$
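As a quick illustration (a toy Python example added here, not part of the original entry):

```python
# x is in A ∪ B exactly when x is in A or x is in B.
A = {1, 2, 3}
B = {3, 4}

union = A | B
assert union == {1, 2, 3, 4}
assert all((x in A) or (x in B) for x in union)
assert all(x in union for x in A) and all(x in union for x in B)
```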

Version: 1 Owner: imran Author(s): imran

34.114 universe

A universe $\mathcal{U}$ is a nonempty set satisfying the following axioms:

1. If $x \in \mathcal{U}$ and $y \in x$, then $y \in \mathcal{U}$.

2. If $x, y \in \mathcal{U}$, then $\{x, y\} \in \mathcal{U}$.

3. If $x \in \mathcal{U}$, then the power set $\mathcal{P}(x) \in \mathcal{U}$.

4. If $\{x_i \mid i \in I \in \mathcal{U}\}$ is a family of elements of $\mathcal{U}$, then $\bigcup_{i \in I} x_i \in \mathcal{U}$.

From these axioms, one can deduce the following properties:

1. If $x \in \mathcal{U}$, then $\{x\} \in \mathcal{U}$.

2. If $x$ is a subset of $y \in \mathcal{U}$, then $x \in \mathcal{U}$.

3. If $x, y \in \mathcal{U}$, then the ordered pair $(x, y) = \{\{x, y\}, x\}$ is in $\mathcal{U}$.

4. If $x, y \in \mathcal{U}$, then $x \cup y$ and $x \times y$ are in $\mathcal{U}$.

5. If $\{x_i \mid i \in I \in \mathcal{U}\}$ is a family of elements of $\mathcal{U}$, then the product $\prod_{i \in I} x_i$ is in $\mathcal{U}$.

6. If $x \in \mathcal{U}$, then the cardinality of $x$ is strictly less than the cardinality of $\mathcal{U}$. In particular, $\mathcal{U} \notin \mathcal{U}$.
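The pairing in property 3 can be played with concretely. The Python sketch below (an added illustration; frozensets over integers are toy stand-ins for sets) checks that this pairing distinguishes order:

```python
# Toy encoding of the ordered pair (x, y) = {{x, y}, x} from property 3,
# using frozensets over integers as stand-ins for sets.

def pair(x, y):
    return frozenset({frozenset({x, y}), x})

assert pair(1, 2) == pair(1, 2)
assert pair(1, 2) != pair(2, 1)   # order matters
assert pair(1, 2) != pair(1, 3)   # second coordinate matters
```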
The standard reference for universes is [SGA4].

REFERENCES
[SGA4] Grothendieck et al. SGA4.

Version: 2 Owner: nerdy2 Author(s): nerdy2

34.115 von Neumann–Bernays–Gödel set theory

von Neumann–Bernays–Gödel set theory (commonly referred to as NBG or vNBG) is an axiomatisation of set theory closely related to the more familiar Zermelo–Fraenkel with choice (ZFC) axiomatisation. The primary difference between ZFC and NBG is that NBG has proper classes among its objects. NBG and ZFC are very closely related and are in fact equiconsistent, NBG being a conservative extension of ZFC.

In NBG, the proper classes are differentiated from sets by the fact that they do not belong to other classes. Thus in NBG we have
$$\mathrm{Set}(x) \iff \exists y\,(x \in y)$$
Another interesting fact about proper classes within NBG is the following limitation of size principle of von Neumann:
$$\neg\mathrm{Set}(x) \iff |x| = |V|$$
where $V$ is the set-theoretic universe. This principle can in fact replace in NBG essentially all set existence axioms, with the exception of the power set axiom (and obviously the axiom of infinity). Thus the classes that are proper in NBG are in a very clear sense big, while the sets are small.
The NBG set theory can be axiomatised in two different ways:

- using the Gödel class construction functions, resulting in a finite axiomatisation
- using a class comprehension axiom scheme, resulting in an infinite axiomatisation
In the latter alternative we take ZFC and relativise all of its axioms to sets, i.e. we replace every expression of the form $\forall x\, \phi$ with $\forall x\, (\mathrm{Set}(x) \rightarrow \phi)$ and $\exists x\, \phi$ with $\exists x\, (\mathrm{Set}(x) \wedge \phi)$, and add the class comprehension scheme:

If $\phi$ is a formula with a free variable $x$ with all its quantifiers restricted to sets, then the following is an axiom: $\exists A\, \forall x\, (x \in A \leftrightarrow \phi)$
Notice the important restriction in the scheme to formulae with quantifiers restricted to sets. This requirement makes the NBG proper classes predicative; you can't prove the existence of a class the definition of which quantifies over all classes. This restriction is essential; if we loosen it we get a theory that is not conservative over ZFC. If we allow arbitrary formulae in the class comprehension axiom scheme we get what is called Morse–Kelley set theory. This theory is essentially stronger than ZFC or NBG. In addition to these axioms, NBG also contains the global axiom of choice
$$\exists C\, \forall x\, \exists z\, (C \cap x = \{z\})$$
Another way to axiomatise NBG is to use the eight Gödel class construction functions. These functions correspond to the various ways in which one can build up formulae (restricted to sets!) with set parameters. However, the functions are finite in number and so are the resulting axioms governing their behaviour. In particular, since there is a class corresponding to any restricted formula, the intersection of any set and this class exists too (and is a set). Thus the comprehension scheme of ZFC can be replaced with a finite number of axioms, provided we allow for proper classes.
It is easy to show that everything provable in ZF is also provable in NBG. It is also not too difficult to show that NBG without global choice is a conservative extension of ZFC. However, showing that NBG (including global choice) is a conservative extension of ZFC is considerably more difficult. This is equivalent to showing that NBG with global choice is conservative over NBG with only local choice (choice restricted to sets). In order to do this one needs to use (class) forcing. This result is usually credited to Easton and Solovay.
Version: 8 Owner: Aatu Author(s): Aatu

34.116 FS iterated forcing preserves chain condition

Let $\kappa$ be a regular cardinal and let $\langle \hat{Q}_i \rangle_{i<\alpha}$ be a finite support iterated forcing where for every $\beta < \alpha$, $\Vdash_{P_\beta}$ ``$\hat{Q}_\beta$ has the $\kappa$-chain condition''.
By induction:

$P_0$ is the empty set.

If $P_\beta$ satisfies the $\kappa$-chain condition then so does $P_{\beta+1}$, since $P_{\beta+1}$ is equivalent to $P_\beta * \hat{Q}_\beta$ and composition preserves the $\kappa$-chain condition for regular $\kappa$.

Suppose $\beta$ is a limit ordinal and $P_\gamma$ satisfies the $\kappa$-chain condition for all $\gamma < \beta$. Let $S = \langle p_i \rangle_{i<\kappa}$ be a subset of $P_\beta$ of size $\kappa$. The domains of the elements $p_i$ form finite subsets of $\beta$, so if $\operatorname{cf}(\beta) > \kappa$ then these are bounded, and by the inductive hypothesis, two of them are compatible.

Otherwise, if $\operatorname{cf}(\beta) < \kappa$, let $\langle \beta_j \rangle_{j<\operatorname{cf}(\beta)}$ be an increasing sequence of ordinals cofinal in $\beta$. Then for any $i < \kappa$ there is some $n(i) < \operatorname{cf}(\beta)$ such that $\operatorname{dom}(p_i) \subseteq \beta_{n(i)}$. Since $\kappa$ is regular and this is a partition of $\kappa$ into fewer than $\kappa$ pieces, one piece must have size $\kappa$; that is, there is some $j$ such that $\beta_j = \beta_{n(i)}$ for $\kappa$ values of $i$, and so $\{p_i \mid n(i) = j\}$ is a set of conditions of size $\kappa$ contained in $P_{\beta_j}$, and therefore contains compatible members by the induction hypothesis.

Finally, if $\operatorname{cf}(\beta) = \kappa$, let $C = \langle \beta_j \rangle_{j<\kappa}$ be a strictly increasing, continuous sequence cofinal in $\beta$. Then for every $i < \kappa$ there is some $n(i) < \kappa$ such that $\operatorname{dom}(p_i) \subseteq \beta_{n(i)}$. When $n(i)$ is a limit ordinal, since $C$ is continuous, there is also (since $\operatorname{dom}(p_i)$ is finite) some $f(i) < i$ such that $\operatorname{dom}(p_i) \cap [\beta_{f(i)}, \beta_i) = \emptyset$. Consider the set $E$ of elements $i$ such that $i$ is a limit ordinal and for any $j < i$, $n(j) < i$. This is a club, so by Fodor's lemma there is some $j$ such that $\{i \mid f(i) = j\}$ is stationary.

For each $p_i$ such that $f(i) = j$, consider $p_i' = p_i \restriction \beta_j$. There are $\kappa$ of these, all members of $P_{\beta_j}$, so two of them must be compatible, and hence those two are also compatible in $P_\beta$.
Version: 1 Owner: Henry Author(s): Henry


34.117 chain condition

A partial order $P$ satisfies the $\kappa$-chain condition if for any $S \subseteq P$ with $|S| = \kappa$ there exist distinct $x, y \in S$ such that either $x \leq y$ or $y \leq x$.

If $\kappa = \aleph_1$ then $P$ is said to satisfy the countable chain condition (c.c.c.).
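On a finite poset the definition can be checked by brute force. The Python sketch below (a hypothetical four-element example added here, not from the original entry) tests the $\kappa$-chain condition for small $\kappa$:

```python
from itertools import combinations

# Brute-force check of the κ-chain condition on a toy finite poset.
# leq[(x, y)] == True means x ≤ y.

elements = ['a', 'b', 'c', 'd']
leq = {(x, y): x == y for x in elements for y in elements}
# a ≤ b ≤ d and a ≤ c: b and c are incomparable, as are c and d.
leq[('a', 'b')] = leq[('a', 'c')] = leq[('a', 'd')] = leq[('b', 'd')] = True

def satisfies_chain_condition(kappa):
    # Every subset S with |S| = kappa must contain distinct x, y
    # with x ≤ y or y ≤ x.
    return all(
        any(leq[(x, y)] or leq[(y, x)] for x, y in combinations(S, 2))
        for S in combinations(elements, kappa)
    )

assert satisfies_chain_condition(3)      # any 3 elements contain a comparable pair
assert not satisfies_chain_condition(2)  # {b, c} is an antichain of size 2
```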
Version: 2 Owner: Henry Author(s): Henry

34.118 composition of forcing notions

Suppose $P$ is a forcing notion in $M$ and $\hat{Q}$ is some $P$-name such that $\Vdash_P$ ``$\hat{Q}$ is a forcing notion''.

Then take a set of $P$-names $Q$ such that, given the $P$-name $\hat{Q}$ of $Q$, $\Vdash_P Q = \hat{Q}$ (that is, no matter which generic subset $G$ of $P$ we force with, the names in $Q$ correspond precisely to the elements of $\hat{Q}[G]$).

We can define
$$P * \hat{Q} = \{\langle p, \hat{q} \rangle \mid p \in P, \hat{q} \in Q\}$$

We can define a partial order on $P * \hat{Q}$ such that $\langle p_1, \hat{q}_1 \rangle \leq \langle p_2, \hat{q}_2 \rangle$ iff $p_1 \leq_P p_2$ and $p_1 \Vdash \hat{q}_1 \leq_{\hat{Q}} \hat{q}_2$. (A note on interpretation: $\hat{q}_1$ and $\hat{q}_2$ are $P$-names; this requires only that $\hat{q}_1 \leq \hat{q}_2$ in generic subsets containing $p_1$, so in other generic subsets that fact could fail.)

Then $P * \hat{Q}$ is itself a forcing notion, and it can be shown that forcing by $P * \hat{Q}$ is equivalent to forcing first by $P$ and then by $\hat{Q}[G]$.


Version: 1 Owner: Henry Author(s): Henry

34.119 composition preserves chain condition

Let $\kappa$ be a regular cardinal. Let $P$ be a forcing notion satisfying the $\kappa$-chain condition. Let $\hat{Q}$ be a $P$-name such that $\Vdash_P$ ``$\hat{Q}$ is a forcing notion satisfying the $\kappa$-chain condition''. Then $P * \hat{Q}$ satisfies the $\kappa$-chain condition.


Proof:

Outline

We prove that there is some $p$ such that any generic subset of $P$ including $p$ also includes $\kappa$ of the $p_i$. Then, since $\hat{Q}[G]$ satisfies the $\kappa$-chain condition, two of the corresponding $\hat{q}_i$ must be compatible. Then, since $G$ is directed, there is some $p'$ stronger than any of these which forces this to be true, and therefore makes two elements of $S$ compatible.

Let $S = \langle p_i, \hat{q}_i \rangle_{i<\kappa} \subseteq P * \hat{Q}$.

Claim: There is some $p \in P$ such that $p \Vdash |\{i \mid p_i \in \hat{G}\}| = \kappa$.

(Note: $\hat{G} = \{\langle p, \check{p} \rangle \mid p \in P\}$, hence $\hat{G}[G] = G$.)

If no $p$ forces this then every $p$ forces that it is not true, and therefore $\Vdash_P |\{i \mid p_i \in \hat{G}\}| < \kappa$. Since $\kappa$ is regular, this means that for any generic $G \subseteq P$, $\{i \mid p_i \in G\}$ is bounded. For each $G$, let $f(G)$ be the least $\alpha$ such that $\alpha < i < \kappa$ implies $p_i \notin G$. Define $B = \{\alpha \mid \alpha = f(G) \text{ for some } G\}$.

Claim: $|B| < \kappa$

If $\alpha \in B$ then there is some $p_\alpha' \in P$ such that $p_\alpha' \Vdash f(\hat{G}) = \alpha$, and if $\alpha \neq \beta \in B$ then $p_\alpha'$ must be incompatible with $p_\beta'$. Since $P$ satisfies the $\kappa$-chain condition, it follows that $|B| < \kappa$.

Since $\kappa$ is regular, $\gamma = \sup(B) < \kappa$. But obviously $p_{\gamma+1} \Vdash p_{\gamma+1} \in \hat{G}$. This is a contradiction, so we conclude that there must be some $p$ such that $p \Vdash |\{i \mid p_i \in \hat{G}\}| = \kappa$.

If $G \subseteq P$ is any generic subset containing $p$ then $A = \{\hat{q}_i[G] \mid p_i \in G\}$ must have cardinality $\kappa$. Since $\hat{Q}[G]$ satisfies the $\kappa$-chain condition, there exist $i, j < \kappa$ such that $p_i, p_j \in G$ and there is some $\hat{q}[G] \in \hat{Q}[G]$ such that $\hat{q}[G] \leq \hat{q}_i[G], \hat{q}_j[G]$. Then since $G$ is directed, there is some $p' \in G$ such that $p' \leq p_i, p_j, p$ and $p' \Vdash \hat{q} \leq \hat{q}_i, \hat{q}_j$. So $\langle p', \hat{q} \rangle \leq \langle p_i, \hat{q}_i \rangle, \langle p_j, \hat{q}_j \rangle$.
Version: 1 Owner: Henry Author(s): Henry

34.120 equivalence of forcing notions

Let $P$ and $Q$ be two forcing notions such that given any generic subset $G$ of $P$ there is a generic subset $H$ of $Q$ with $M[G] = M[H]$ and vice-versa. Then $P$ and $Q$ are equivalent.

Since if $G \in M[H]$ then $\tau[G] \in M[H]$ for any $P$-name $\tau$, it follows that if $G \in M[H]$ and $H \in M[G]$ then $M[G] = M[H]$.
Version: 2 Owner: Henry Author(s): Henry

34.121 forcing relation

If $M$ is a transitive model of set theory and $P$ is a partial order then we can define a forcing relation
$$p \Vdash_P \phi(\tau_1, \ldots, \tau_n)$$
($p$ forces $\phi(\tau_1, \ldots, \tau_n)$)
for any $p \in P$, where $\tau_1, \ldots, \tau_n$ are $P$-names.

Specifically, the relation holds if for every generic filter $G$ over $P$ which contains $p$,
$$M[G] \models \phi(\tau_1[G], \ldots, \tau_n[G])$$

That is, $p$ forces $\phi$ if every extension of $M$ by a generic filter over $P$ containing $p$ makes $\phi$ true.

If $p \Vdash_P \phi$ holds for every $p \in P$ then we can write $\Vdash_P \phi$ to mean that for any generic $G \subseteq P$, $M[G] \models \phi$.
Version: 2 Owner: Henry Author(s): Henry

34.122 forcings are equivalent if one is dense in the other

Suppose $P$ and $Q$ are forcing notions and that $f : P \to Q$ is a function such that:

- $p_1 \leq_P p_2$ implies $f(p_1) \leq_Q f(p_2)$
- if $p_1, p_2 \in P$ are incomparable then $f(p_1), f(p_2)$ are incomparable
- $f[P]$ is dense in $Q$

then $P$ and $Q$ are equivalent.


Proof

We seek to provide two operations (computable in the appropriate universes) which convert between generic subsets of $P$ and $Q$, and to prove that they are inverses.

$F(G) = H$ where $H$ is generic

Given a generic $G \subseteq P$, consider $H = \{q \mid f(p) \leq q \text{ for some } p \in G\}$.

If $q_1 \in H$ and $q_1 \leq q_2$ then $q_2 \in H$ by the definition of $H$. If $q_1, q_2 \in H$ then let $p_1, p_2 \in P$ be such that $f(p_1) \leq q_1$ and $f(p_2) \leq q_2$. Then there is some $p_3 \leq p_1, p_2$ such that $p_3 \in G$, and since $f$ is order preserving, $f(p_3) \leq f(p_1) \leq q_1$ and $f(p_3) \leq f(p_2) \leq q_2$.

Suppose $D$ is a dense subset of $Q$. Since $f[P]$ is dense in $Q$, for any $d \in D$ there is some $p \in P$ such that $f(p) \leq d$. For each $d \in D$, assign (using the axiom of choice) some $d_p \in P$ such that $f(d_p) \leq d$, and call the set of these $D_P$. This is dense in $P$: for any $p \in P$ there is some $d \in D$ such that $d \leq f(p)$, and so some $d_p \in D_P$ such that $f(d_p) \leq d$. If $d_p \leq p$ then this provides a member of $D_P$ less than $p$; alternatively, since $f(d_p)$ and $f(p)$ are compatible, $d_p$ and $p$ are compatible, so $p \leq d_p$, and therefore $f(p) = f(d_p) = d$, so $p \in D_P$. Since $D_P$ is dense in $P$, there is some element $p \in D_P \cap G$. Since $p \in D_P$, there is some $d \in D$ such that $f(p) \leq d$. But since $p \in G$, $d \in H$, so $H$ intersects $D$.

$G$ can be recovered from $F(G)$

Given $H$ constructed as above, we can recover $G$ as the set of $p \in P$ such that $f(p) \in H$. Obviously every element of $G$ is included in the new set, so consider some $p$ such that $f(p) \in H$. By definition, there is some $p_1 \in G$ such that $f(p_1) \leq f(p)$. Take some dense $D \subseteq Q$ such that there is no $d \in D$ with $f(p) \leq d$ (this can be done easily by taking any dense subset and removing all such elements; the resulting set is still dense since there is some $d_1$ such that $d_1 \leq f(p) \leq d$). This set intersects $f[G]$ in some $q$, so there is some $p_2 \in G$ such that $f(p_2) \leq q$, and since $G$ is directed, some $p_3 \in G$ such that $p_3 \leq p_2, p_1$. So $f(p_3) \leq f(p_1) \leq f(p)$. If $p_3 \nleq p$ then we would have $p \leq p_3$ and then $f(p) \leq f(p_3) \leq q$, contradicting the definition of $D$, so $p_3 \leq p$ and $p \in G$ since $G$ is directed.

$F^{-1}(H) = G$ where $G$ is generic

Given any generic $H$ in $Q$, we define a corresponding $G$ as above: $G = \{p \in P \mid f(p) \in H\}$. If $p_1 \in G$ and $p_1 \leq p_2$ then $f(p_1) \in H$ and $f(p_1) \leq f(p_2)$, so $p_2 \in G$ since $H$ is directed. If $p_1, p_2 \in G$ then $f(p_1), f(p_2) \in H$ and there is some $q \in H$ such that $q \leq f(p_1), f(p_2)$.

Consider $D$, the set of elements of $Q$ which are $f(p)$ for some $p \in P$ and either $f(p) \leq q$ or such that there is no element greater than both $f(p)$ and $q$. This is dense, since given any $q_1 \in Q$: if $q_1 \leq q$ then (since $f[P]$ is dense) there is some $p$ such that $f(p) \leq q_1 \leq q$; if $q \leq q_1$ then there is some $p$ such that $f(p) \leq q \leq q_1$; if neither of these holds and there is some $r \leq q_1, q$ then any $p$ such that $f(p) \leq r$ suffices, and if there is no such $r$ then any $p$ such that $f(p) \leq q_1$ suffices.

There is some $f(p) \in D \cap H$, and so $p \in G$. Since $H$ is directed, there is some $r \leq f(p), q$, so $f(p) \leq q \leq f(p_1), f(p_2)$. If it is not the case that $f(p) \leq f(p_1)$ then $f(p) = f(p_1) = f(p_2)$. In either case, we confirm that $G$ is directed.

Finally, let $D$ be a dense subset of $P$. Then $f[D]$ is dense in $Q$, since given any $q \in Q$ there is some $p \in P$ such that $f(p) \leq q$, and some $d \in D$ such that $d \leq p$, whence $f(d) \leq f(p) \leq q$. So there is some $f(p) \in f[D] \cap H$, and so $p \in D \cap G$.

$H$ can be recovered from $F^{-1}(H)$

Finally, given $G$ constructed by this method, $H = \{q \mid f(p) \leq q \text{ for some } p \in G\}$. To see this, if there is some $f(p)$ with $p \in G$ such that $f(p) \leq q$, then $f(p) \in H$, so $q \in H$. On the other hand, if $q \in H$ then the set of $f(p)$ such that either $f(p) \leq q$ or there is no $r \in Q$ such that $r \leq q, f(p)$ is dense (as shown above), and so intersects $H$. But since $H$ is directed, it must be that there is some $f(p) \in H$ such that $f(p) \leq q$, and therefore $p \in G$.
Version: 3 Owner: Henry Author(s): Henry

34.123 iterated forcing

We can define an iterated forcing of length $\alpha$ by induction as follows:

Let $P_0 = \emptyset$.

Let $\hat{Q}_0$ be a forcing notion.

For $\beta \leq \alpha$, $P_\beta$ is the set of all functions $f$ such that $\operatorname{dom}(f) \subseteq \beta$ and for any $i \in \operatorname{dom}(f)$, $f(i)$ is a $P_i$-name for a member of $\hat{Q}_i$. Order $P_\beta$ by the rule $f \leq g$ iff $\operatorname{dom}(g) \subseteq \operatorname{dom}(f)$ and for any $i \in \operatorname{dom}(f)$, $g \restriction i \Vdash f(i) \leq_{\hat{Q}_i} g(i)$. (Translated, this means that any generic subset including $g$ restricted to $i$ forces that $f(i)$, an element of $\hat{Q}_i$, be less than $g(i)$.)

For $\beta < \alpha$, $\hat{Q}_\beta$ is a forcing notion in $P_\beta$ (so $P_\beta * \hat{Q}_\beta$ is a forcing notion).

Then the sequence $\langle \hat{Q}_i \rangle_{i<\alpha}$ is an iterated forcing.

If each $P_\beta$ is restricted to finite functions then it is called a finite support iterated forcing (FS), if restricted to countable functions it is called a countable support iterated forcing (CS), and in general if each function in each $P_\beta$ has size less than $\kappa$ then it is a ${<}\kappa$-support iterated forcing.

Typically we construct the sequence of $\hat{Q}_\beta$'s by induction, using a function $F$ such that $F(\langle \hat{Q}_i \rangle_{i<\beta}) = \hat{Q}_\beta$.
Version: 2 Owner: Henry Author(s): Henry

34.124 iterated forcing and composition

There is a function $f : P_\alpha * \hat{Q}_\alpha \to P_{\alpha+1}$ satisfying the conditions of ``forcings are equivalent if one is dense in the other''.

Proof

Let $f(\langle g, \hat{q} \rangle) = g \cup \{\langle \alpha, \hat{q} \rangle\}$. This is obviously a member of $P_{\alpha+1}$, since it is a partial function from $\alpha + 1$ (and if the domain of $g$ is less than $\kappa$ then so is the domain of $f(\langle g, \hat{q} \rangle)$); if $i < \alpha$ then obviously $f(\langle g, \hat{q} \rangle)$ applied to $i$ satisfies the definition of iterated forcing (since $g$ does), and if $i = \alpha$ then the definition is satisfied since $\hat{q}$ is a $P_\alpha$-name for a member of $\hat{Q}_\alpha$.

$f$ is order preserving, since if $\langle g_1, \hat{q}_1 \rangle \leq \langle g_2, \hat{q}_2 \rangle$, all the appropriate characteristics of a function carry over to the image, and $g_1 \Vdash_{P_\alpha} \hat{q}_1 \leq \hat{q}_2$ (by the definition of $\leq$ in $P_\alpha * \hat{Q}_\alpha$).

If $\langle g_1, \hat{q}_1 \rangle$ and $\langle g_2, \hat{q}_2 \rangle$ are incomparable then either $g_1$ and $g_2$ are incomparable, in which case whatever prevents them from being compared applies to their images as well, or $\hat{q}_1$ and $\hat{q}_2$ aren't compared appropriately, in which case again this prevents the images from being compared.

Finally, let $g$ be any element of $P_{\alpha+1}$. Then $g \restriction \alpha \in P_\alpha$. If $\alpha \notin \operatorname{dom}(g)$ then this is just $g$, and $f(\langle g, \hat{q} \rangle) \leq g$ for any $\hat{q}$. If $\alpha \in \operatorname{dom}(g)$ then $f(\langle g \restriction \alpha, g(\alpha) \rangle) = g$. Hence $f[P_\alpha * \hat{Q}_\alpha]$ is dense in $P_{\alpha+1}$, and so these are equivalent.
Version: 3 Owner: Henry Author(s): Henry

34.125 name

We need a way to refer to objects of $M[G]$ within $M$. This is done by assigning a name to each element of $M[G]$.

Given a partial order $P$, we construct the $P$-names by induction. Each name is just a relation between $P$ and the set of names already constructed; that is, a name is a set of ordered pairs of the form $(p, \tau)$ where $p \in P$ and $\tau$ is a name constructed at an earlier level of the induction.

Given a generic subset $G \subseteq P$, we can then define the interpretation $\tau[G]$ of a $P$-name $\tau$ in $M[G]$ by:
$$\tau[G] = \{\tau'[G] \mid (p, \tau') \in \tau \text{ for some } p \in G\}$$

Of course, two different names can have the same interpretation.

The generic subset can be thought of as a key which reveals which potential elements of $\tau$ are actually elements.

Any element $x \in M$ can be given a canonical name
$$\check{x} = \{(p, \check{y}) \mid y \in x, p \in P\}$$

This guarantees that the elements of $\check{x}[G]$ will be exactly the same as the elements of $x$, regardless of which members of $P$ are contained in $G$.
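The interpretation map can be mimicked on toy finite names. In the Python sketch below (illustrative only; string conditions stand in for elements of $P$), a name is a frozenset of (condition, subname) pairs:

```python
# Toy interpretation of P-names:
# tau[G] = { sub[G] : (p, sub) in tau for some p in G }.

def interpret(name, G):
    # name is a frozenset of (condition, subname) pairs
    return frozenset(interpret(sub, G) for (p, sub) in name if p in G)

empty = frozenset()                # a name for the empty set
one = frozenset({('p', empty)})    # names {∅} when 'p' is in G, else ∅

assert interpret(one, {'p'}) == frozenset({frozenset()})
assert interpret(one, {'q'}) == frozenset()
```

The second assertion shows how the generic set acts as a "key": without the condition 'p', the potential element named by `empty` is not revealed.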
Version: 3 Owner: Henry Author(s): Henry

34.126 partial order with chain condition does not collapse cardinals

If $P$ is a partial order which satisfies the $\kappa$-chain condition and $G$ is a generic subset of $P$ then for any cardinal $\lambda \geq \kappa$ of $M$, $\lambda$ is also a cardinal in $M[G]$, and if $\operatorname{cf}(\lambda) = \mu \geq \kappa$ in $M$ then also $\operatorname{cf}(\lambda) = \mu$ in $M[G]$.

This theorem is the simplest way to control a notion of forcing, since it means that a notion of forcing does not have an effect above a certain point. Given that any $P$ satisfies the $|P|^+$-chain condition, this means that most forcings leave all of $M$ above a certain point alone. (Although it is possible to get around this limit by forcing with a proper class.)
Version: 2 Owner: Henry Author(s): Henry

34.127 proof of partial order with chain condition does not collapse cardinals

Outline:

Given any function $f$ purporting to violate the theorem by being surjective (or cofinal) on $\lambda$, we show that there are fewer than $\kappa$ possible values of $f(\alpha)$ for each $\alpha$, and therefore only $\max(\mu, \kappa)$ possible elements in the entire range of $f$, so $f$ is not surjective (or cofinal).

Details:

Suppose $\lambda > \kappa$ is a cardinal of $M$ that is not a cardinal in $M[G]$.

There is some function $f \in M[G]$ and some cardinal $\mu < \lambda$ such that $f : \mu \to \lambda$ is surjective. This has a name, $\hat{f}$. For each $\alpha < \mu$, consider
$$F_\alpha = \{\beta < \lambda \mid p \Vdash \hat{f}(\alpha) = \beta \text{ for some } p \in P\}$$

$|F_\alpha| < \kappa$, since any two $p \in P$ which force different values for $\hat{f}(\alpha)$ are incompatible and $P$ has no sets of incompatible elements of size $\kappa$.

Notice that $F_\alpha$ is definable in $M$. Then the range of $f$ must be contained in $F = \bigcup_{i<\mu} F_i$. But $|F| \leq \mu \cdot \kappa = \max(\mu, \kappa) < \lambda$. So $f$ cannot possibly be surjective, and therefore $\lambda$ is not collapsed.

Now suppose that for some $\lambda > \mu > \kappa$, $\operatorname{cf}(\lambda) = \mu$ in $M$ and for some $\eta < \mu$ there is a cofinal function $f : \eta \to \lambda$.

We can construct $F$ as above, and again the range of $f$ is contained in $F = \bigcup_{i<\eta} F_i$. But then $|\operatorname{range}(f)| \leq |F| \leq \max(\eta, \kappa) < \mu$. So there is some $\beta < \lambda$ such that $f(\alpha) < \beta$ for any $\alpha < \eta$, and therefore $f$ is not cofinal in $\lambda$.
Version: 1 Owner: Henry Author(s): Henry

34.128 proof that forcing notions are equivalent to their composition

This is a long and complicated proof, the more so because the meaning of $\hat{Q}$ shifts depending on what generic subset of $P$ is being used. It is therefore broken into a number of steps. The core of the proof is to prove that, given any generic subset $G$ of $P$ and a generic subset $H$ of $\hat{Q}[G]$, there is a corresponding generic subset $G * H$ of $P * \hat{Q}$ such that $M[G][H] = M[G * H]$, and conversely, given any generic subset $G$ of $P * \hat{Q}$, we can find some generic $G_P$ of $P$ and a generic $G_Q$ of $\hat{Q}[G_P]$ such that $M[G_P][G_Q] = M[G]$.

We do this by constructing functions using operations which can be performed within the forced universes so that, for example, since $M[G][H]$ has both $G$ and $H$, $G * H$ can be calculated, proving that it contains $M[G * H]$. To ensure equality, we will also have to ensure that our operations are inverses; that is, given $G$, $G_P * G_Q = G$, and given $G$ and $H$, $(G * H)_P = G$ and $(G * H)_Q = H$.

The remainder of the proof merely defines the precise operations, proves that they give
generic sets, and proves that they are inverses.
Before beginning, we prove a lemma which comes up several times:

Lemma: If $G$ is generic in $P$ and $D$ is dense above some $p \in G$ then $G \cap D \neq \emptyset$.

Let $D' = \{p' \in P \mid p' \in D \lor p' \text{ is incompatible with } p\}$. This is dense, since if $p' \in P$ then either $p'$ is incompatible with $p$, in which case $p' \in D'$, or there is some $p_1$ such that $p_1 \leq p, p'$, and therefore there is some $p_2 \leq p_1$ such that $p_2 \in D$, and therefore $p_2 \leq p'$. So $G$ intersects $D'$. But since a generic set is directed, no two elements are incompatible, so $G$ must contain an element of $D'$ which is not incompatible with $p$, so it must contain an element of $D$.

$G * H$ is a generic filter

First, given generic subsets $G$ and $H$ of $P$ and $\hat{Q}[G]$ respectively, we can define:
$$G * H = \{\langle p, \hat{q} \rangle \mid p \in G \land \hat{q}[G] \in H\}$$

$G * H$ is closed

Let $\langle p_1, \hat{q}_1 \rangle \in G * H$ and let $\langle p_1, \hat{q}_1 \rangle \leq \langle p_2, \hat{q}_2 \rangle$. Then we can conclude $p_1 \in G$, $p_1 \leq p_2$, $\hat{q}_1[G] \in H$, and $p_1 \Vdash \hat{q}_1 \leq \hat{q}_2$, so $p_2 \in G$ (since $G$ is closed) and $\hat{q}_2[G] \in H$, since $p_1 \in G$ and $p_1$ forces both $\hat{q}_1 \leq \hat{q}_2$ and that $H$ is downward closed. So $\langle p_2, \hat{q}_2 \rangle \in G * H$.

$G * H$ is directed

Suppose $\langle p_1, \hat{q}_1 \rangle, \langle p_2, \hat{q}_2 \rangle \in G * H$. So $p_1, p_2 \in G$, and since $G$ is directed, there is some $p_3 \leq p_1, p_2$. Since $\hat{q}_1[G], \hat{q}_2[G] \in H$ and $H$ is directed, there is some $\hat{q}_3[G] \leq \hat{q}_1[G], \hat{q}_2[G]$. Therefore there is some $p_4 \leq p_3$, $p_4 \in G$, such that $p_4 \Vdash \hat{q}_3 \leq \hat{q}_1, \hat{q}_2$, so $\langle p_4, \hat{q}_3 \rangle \leq \langle p_1, \hat{q}_1 \rangle, \langle p_2, \hat{q}_2 \rangle$ and $\langle p_4, \hat{q}_3 \rangle \in G * H$.
$G * H$ is generic

Suppose $D$ is a dense subset of $P * \hat{Q}$. We can project it into a dense subset of $\hat{Q}[G]$ using $G$:
$$D_Q = \{\hat{q}[G] \mid \langle p, \hat{q} \rangle \in D \text{ for some } p \in G\}$$

Lemma: $D_Q$ is dense in $\hat{Q}[G]$

Given any $\hat{q}_0 \in Q$, take any $p_0 \in G$. Then we can define yet another dense subset, this one in $P$:
$$D_{q_0} = \{p \mid p \leq p_0 \land p \Vdash \hat{q} \leq \hat{q}_0 \land \langle p, \hat{q} \rangle \in D \text{ for some } \hat{q} \in Q\}$$

Lemma: $D_{q_0}$ is dense above $p_0$ in $P$

Take any $p \in P$ such that $p \leq p_0$. Then, since $D$ is dense in $P * \hat{Q}$, we have some $\langle p_1, \hat{q}_1 \rangle \leq \langle p, \hat{q}_0 \rangle$ such that $\langle p_1, \hat{q}_1 \rangle \in D$. Then by definition $p_1 \leq p$ and $p_1 \in D_{q_0}$.

From this lemma, we can conclude that there is some $p_1 \leq p_0$ such that $p_1 \in G \cap D_{q_0}$, and therefore some $\hat{q}_1$ such that $p_1 \Vdash \hat{q}_1 \leq \hat{q}_0$ where $\langle p_1, \hat{q}_1 \rangle \in D$. So $D_Q$ is indeed dense in $\hat{Q}[G]$.

Since $D_Q$ is dense in $\hat{Q}[G]$, there is some $\hat{q}$ such that $\hat{q}[G] \in D_Q \cap H$, and so some $p \in G$ such that $\langle p, \hat{q} \rangle \in D$. But since $p \in G$ and $\hat{q}[G] \in H$, $\langle p, \hat{q} \rangle \in G * H$, so $G * H$ is indeed generic.

$G_P$ is a generic filter

Given some generic subset $G$ of $P * \hat{Q}$, let:
$$G_P = \{p \in P \mid p' \leq p \land \langle p', \hat{q} \rangle \in G \text{ for some } p' \in P \text{ and some } \hat{q} \in Q\}$$

$G_P$ is closed

Take any $p_1 \in G_P$ and any $p_2$ such that $p_1 \leq p_2$. Then there is some $p' \leq p_1$ satisfying the definition of $G_P$, and also $p' \leq p_2$, so $p_2 \in G_P$.

$G_P$ is directed

Consider $p_1, p_2 \in G_P$. Then there is some $p_1'$ and some $\hat{q}_1$ such that $\langle p_1', \hat{q}_1 \rangle \in G$, and some $p_2'$ and some $\hat{q}_2$ such that $\langle p_2', \hat{q}_2 \rangle \in G$. Since $G$ is directed, there is some $\langle p_3, \hat{q}_3 \rangle \in G$ such that $\langle p_3, \hat{q}_3 \rangle \leq \langle p_1', \hat{q}_1 \rangle, \langle p_2', \hat{q}_2 \rangle$, and therefore $p_3 \in G_P$, $p_3 \leq p_1, p_2$.

$G_P$ is generic

Let $D$ be a dense subset of $P$. Then let $D' = \{\langle p, \hat{q} \rangle \mid p \in D\}$. Clearly this is dense, since if $\langle p, \hat{q} \rangle \in P * \hat{Q}$ then there is some $p' \leq p$ such that $p' \in D$, so $\langle p', \hat{q} \rangle \in D'$ and $\langle p', \hat{q} \rangle \leq \langle p, \hat{q} \rangle$. So there is some $\langle p, \hat{q} \rangle \in D' \cap G$, and therefore $p \in D \cap G_P$. So $G_P$ is generic.

$G_Q$ is a generic filter

Given a generic subset $G \subseteq P * \hat{Q}$, define:
$$G_Q = \{\hat{q}[G_P] \mid \langle p, \hat{q} \rangle \in G \text{ for some } p \in P\}$$

(Notice that $G_Q$ is dependent on $G_P$, and is a subset of $\hat{Q}[G_P]$, that is, the forcing notion inside $M[G_P]$, as opposed to the set of names $Q$ which we've been primarily working with.)

$G_Q$ is closed

Suppose $\hat{q}_1[G_P] \in G_Q$ and $\hat{q}_1[G_P] \leq \hat{q}_2[G_P]$. Then there is some $p_1 \in G_P$ such that $p_1 \Vdash \hat{q}_1 \leq \hat{q}_2$. Since $p_1 \in G_P$, there is some $p_2 \leq p_1$ such that for some $\hat{q}_3$, $\langle p_2, \hat{q}_3 \rangle \in G$. By the definition of $G_Q$, there is some $p_3$ such that $\langle p_3, \hat{q}_1 \rangle \in G$, and since $G$ is directed, there is some $\langle p_4, \hat{q}_4 \rangle \in G$ with $\langle p_4, \hat{q}_4 \rangle \leq \langle p_3, \hat{q}_1 \rangle, \langle p_2, \hat{q}_3 \rangle$. Since $G$ is closed and $\langle p_4, \hat{q}_4 \rangle \leq \langle p_4, \hat{q}_2 \rangle$, we have $\hat{q}_2[G_P] \in G_Q$.

$G_Q$ is directed

Suppose $\hat{q}_1[G_P], \hat{q}_2[G_P] \in G_Q$. Then for some $p_1, p_2$ we have $\langle p_1, \hat{q}_1 \rangle, \langle p_2, \hat{q}_2 \rangle \in G$, and since $G$ is directed, there is some $\langle p_3, \hat{q}_3 \rangle \in G$ such that $\langle p_3, \hat{q}_3 \rangle \leq \langle p_1, \hat{q}_1 \rangle, \langle p_2, \hat{q}_2 \rangle$. Then $\hat{q}_3[G_P] \in G_Q$, and since $p_3 \in G_P$ and $p_3 \Vdash \hat{q}_3 \leq \hat{q}_1, \hat{q}_2$, we have $\hat{q}_3[G_P] \leq \hat{q}_1[G_P], \hat{q}_2[G_P]$.

$G_Q$ is generic

Let $D$ be a dense subset of $\hat{Q}[G_P]$ (in $M[G_P]$). Let $\hat{D}$ be a $P$-name for $D$, and let $p_1 \in G_P$ be such that $p_1 \Vdash$ ``$\hat{D}$ is dense''. By the definition of $G_P$, there is some $p_2 \leq p_1$ such that $\langle p_2, \hat{q}_2 \rangle \in G$ for some $\hat{q}_2$. Then let $D' = \{\langle p, \hat{q} \rangle \mid p \Vdash \hat{q} \in \hat{D} \land p \leq p_2\}$.

Lemma: $D'$ is dense (in $G$) above $\langle p_2, \hat{q}_2 \rangle$

Take any $\langle p, \hat{q} \rangle \in P * \hat{Q}$ such that $\langle p, \hat{q} \rangle \leq \langle p_2, \hat{q}_2 \rangle$. Then $p \Vdash$ ``$\hat{D}$ is dense'', and therefore there is some $\hat{q}_3$ such that $p \Vdash \hat{q}_3 \in \hat{D}$ and $p \Vdash \hat{q}_3 \leq \hat{q}$. So $\langle p, \hat{q}_3 \rangle \leq \langle p, \hat{q} \rangle$ and $\langle p, \hat{q}_3 \rangle \in D'$.

Take any $\langle p_3, \hat{q}_3 \rangle \in D' \cap G$. Then $p_3 \in G_P$, so $\hat{q}_3[G_P] \in D$, and by the definition of $G_Q$, $\hat{q}_3[G_P] \in G_Q$.

$G_P * G_Q = G$

If $G$ is a generic subset of $P * \hat{Q}$, observe that:
$$G_P * G_Q = \{\langle p, \hat{q} \rangle \mid p_0 \leq p \land \langle p_0, \hat{q}_0 \rangle \in G \land \langle p', \hat{q} \rangle \in G \text{ for some } p_0, \hat{q}_0, p'\}$$

If $\langle p, \hat{q} \rangle \in G$ then obviously this holds, so $G \subseteq G_P * G_Q$. Conversely, if $\langle p, \hat{q} \rangle \in G_P * G_Q$ then there exist $p_0, \hat{q}_0$ and $p'$ such that $\langle p_0, \hat{q}_0 \rangle, \langle p', \hat{q} \rangle \in G$, and since $G$ is directed, some $\langle p_1, \hat{q}_1 \rangle \in G$ such that $\langle p_1, \hat{q}_1 \rangle \leq \langle p_0, \hat{q}_0 \rangle, \langle p', \hat{q} \rangle$. But then $p_1 \leq p$ and $p_1 \Vdash \hat{q}_1 \leq \hat{q}$, and since $G$ is closed, $\langle p, \hat{q} \rangle \in G$.

$(G * H)_P = G$

Assume that $G$ is generic in $P$ and $H$ is generic in $\hat{Q}[G]$.

Suppose $p \in (G * H)_P$. Then there is some $p' \in P$ and some $\hat{q} \in Q$ such that $p' \leq p$ and $\langle p', \hat{q} \rangle \in G * H$. By the definition of $G * H$, $p' \in G$, and then since $G$ is closed, $p \in G$. Conversely, suppose $p \in G$. Then (since $H$ is non-trivial) $\langle p, \hat{q} \rangle \in G * H$ for some $\hat{q}$, and therefore $p \in (G * H)_P$.

$(G * H)_Q = H$

Assume that $G$ is generic in $P$ and $H$ is generic in $\hat{Q}[G]$.

Given any $q \in H$, there is some $\hat{q} \in Q$ such that $\hat{q}[G] = q$, and so there is some $p$ such that $\langle p, \hat{q} \rangle \in G * H$, and therefore $q \in (G * H)_Q$. On the other hand, if $q \in (G * H)_Q$ then there is some $\langle p, \hat{q} \rangle \in G * H$ with $\hat{q}[G] = q$, and therefore $q \in H$.
Version: 1 Owner: Henry Author(s): Henry

279

34.129 complete partial orders do not add small subsets

Suppose P is a κ-complete partial order in M. Then for any generic subset G, M[G] contains
no bounded subsets of κ which are not in M.

Version: 1 Owner: Henry Author(s): Henry
34.130 proof of complete partial orders do not add small subsets

Take any x ∈ M[G] with x ⊆ κ bounded. Let x̂ be a name for x. There is some p ∈ G such that
p ⊩ x̂ is a subset of κ bounded by β < κ.

Outline:

For any q ≤ p, we construct by induction a sequence of elements q_α stronger than p. Each q_α
will determine whether or not α ∈ x. Since we know the subset is bounded below β, we can
use the fact that P is κ-complete to find a single element stronger than every q_α which fixes the
exact value of x. Since the sequence is definable in M, so is x, so we can conclude that above
any element q ≤ p is an element which forces x ∈ M. Then p also forces x ∈ M, completing
the proof.

Details:

Since forcing can be described within M, S = {q ∈ P | q ⊩ x̂ ∈ V} is a set in M. Then,
given any q ≤ p, we can define q_0 = q, and for any q_α (α < β), q_{α+1} is an element of P
stronger than q_α such that either q_{α+1} ⊩ α + 1 ∈ x̂ or q_{α+1} ⊩ α + 1 ∉ x̂. For limit α, let q′_α
be any upper bound of the q_γ for γ < α (this exists since P is κ-complete and α < κ), and let
q_α be stronger than q′_α and satisfy either q_α ⊩ α ∈ x̂ or q_α ⊩ α ∉ x̂. Finally let q_β be an
upper bound of the q_α for α < β; q_β ∈ P since P is κ-complete.

Note that these elements all exist, since for any p ∈ P and any (first-order) sentence φ there
is some q ≤ p such that q forces either φ or ¬φ.

q_β not only forces that x̂ is a bounded subset of κ, but for every ordinal it forces whether or
not that ordinal is contained in x̂. But the set {α < β | q_β ⊩ α ∈ x̂} is definable in M, and
is of course equal to x̂[G′] in any generic G′ containing q_β. So q_β ⊩ x̂ ∈ M.

Since this holds for any element stronger than p, it follows that p ⊩ x̂ ∈ M, and therefore
x̂[G] ∈ M.

Version: 1 Owner: Henry Author(s): Henry

34.131 ◊ is equivalent to ♣ and continuum hypothesis

If S is a stationary subset of ω₁, and λ < ω₁ implies 2^λ ≤ ω₁, then

◊_S ⟺ ♣_S

Moreover, this is best possible: ¬◊_S is consistent with ♣_S.

Version: 3 Owner: Henry Author(s): Henry

34.132 Levy collapse

Given any cardinals κ and λ in M, we can use the Levy collapse to give a new model
M[G] where |λ| = κ. Let P = Levy(κ, λ) be the set of partial functions f : κ → λ with
|dom(f)| < κ. These functions each give partial information about a function F which
collapses λ onto κ.

Given any generic subset G of P, M[G] has a set G, so let F = ⋃G. Each element of G is a
partial function, and they are all compatible, so F is a function. dom(F) = κ, since for each
α < κ the set of f ∈ P such that α ∈ dom(f) is dense (given any function without α, it is
trivial to add ⟨α, 0⟩, giving a stronger function which includes α). Also range(F) = λ, since
the set of f ∈ P such that β < λ is in the range of f is again dense (the domain of each f is
bounded, so if α is larger than any element of dom(f), then f ∪ {⟨α, β⟩} is stronger than f and
includes β in its range).

So F is a surjective function from κ to λ, and λ is collapsed in M[G]. In addition,
|Levy(κ, λ)| = λ, so it satisfies the λ⁺ chain condition, and therefore λ⁺ is not collapsed, and
becomes κ⁺ (since for any ordinal between λ and λ⁺ there is already a surjective function
to it from λ).

We can generalize this by forcing with P = Levy(κ, <λ) with λ regular: the set of partial
functions f : λ × κ → λ such that f(0, i) = 0, |dom(f)| < κ, and if α > 0 then f(α, i) < α.
In essence, this is the union of Levy(κ, α) for each κ < α < λ.

In M[G], define F = ⋃G and F_α(i) = F(α, i). Each F_α is a function from κ to α, and by
the same argument as above each F_α is both total and surjective. Moreover, it can be shown that
P satisfies the λ chain condition, so λ does not collapse and λ = κ⁺.

Version: 2 Owner: Henry Author(s): Henry

34.133 proof of ◊ is equivalent to ♣ and continuum hypothesis

The proof that ◊_S implies both ♣_S and that for every λ < ω₁, 2^λ ≤ ω₁ is given in the entries
for ◊_S and ♣_S.

Let A = ⟨A_α⟩_{α∈S} be a sequence which satisfies ♣_S.

Since there are only ω₁ bounded subsets of ω₁, there is a surjective function f : ω₁ →
Bounded(ω₁), where Bounded(ω₁) is the set of bounded subsets of ω₁. Define a sequence B =
⟨B_α⟩_{α<ω₁} by B_α = f(α) if sup(f(α)) < α, and B_α = ∅ otherwise. Since the set of α such that
f(α) = T and sup(T) < α is unbounded for any bounded subset T, it follows that
every bounded subset of ω₁ occurs ω₁ times in B.

We can define a new sequence, D = ⟨D_α⟩_{α∈S}, such that x ∈ D_α ⟺ x ∈ B_i for some i ∈ A_α.
We can show that D satisfies ◊_S.

First, for any α, x ∈ D_α means that x ∈ B_i for some i ∈ A_α, and since B_i ⊆ i ⊆ α,
we have D_α ⊆ α.

Next take any T ⊆ ω₁. We consider two cases:

T is bounded

The set of α such that T = B_α forms an unbounded set T′, so there is a stationary
S′ ⊆ S such that α ∈ S′ implies A_α ⊆ T′. For each such α, x ∈ D_α ⟺ x ∈ B_i for some
i ∈ A_α ⊆ T′. But each such B_i is equal to T, so D_α = T.

T is unbounded

We define a function j : ω₁ → ω₁ as follows:

j(0) = 0

To find j(α), take X = T ∩ ⋃{j(β) | β < α}. This is a bounded subset of ω₁, so it is equal to
an unbounded number of elements of B. Take j(α) = γ, where γ is the least ordinal
greater than any element of {α} ∪ {j(β) | β < α} such that B_γ = X.

Let T′ = range(j). This is obviously unbounded, and so there is a stationary S′ ⊆ S such
that α ∈ S′ implies A_α ⊆ T′.

Next, consider C, the set of ordinals less than ω₁ closed under j. Clearly it is unbounded,
since for any α < ω₁, induction gives an ordinal greater than α closed under j (essentially
the result of applying j an infinite number of times). Also, C is closed: take any c ⊆ C and
suppose sup(c) = α. Then for any β < α, there is some γ ∈ c such that β < γ < α and
therefore j(β) < γ < α. So α is closed under j, and therefore contained in C.

Since C is a club, C′ = C ∩ S′ is stationary. Suppose α ∈ C′. Then x ∈ D_α ⟺ x ∈ B_i
where i ∈ A_α. Since α ∈ S′, i ∈ range(j), and therefore B_i ⊆ T. Next take any x ∈ T ∩ α.
Since α ∈ C, it is closed under j, hence there is some β < α such that j(x) < β. Since
sup(A_α) = α, there is some i ∈ A_α such that β < i, so j(x) < i. Since i ∈ A_α, B_i ⊆ D_α,
and since i ∈ range(j), j(γ) ∈ B_i for any γ < j⁻¹(i), and in particular x ∈ B_i. Since we
showed above that D_α ⊆ T ∩ α, we have D_α = T ∩ α for any α ∈ C′.

Version: 3 Owner: Henry Author(s): Henry

34.134 Martin's axiom

For any cardinal κ, Martin's Axiom for κ (MA_κ) states that if P is a partial order
satisfying the ccc, then given any set of κ dense subsets of P, there is a directed subset of P intersecting
each such dense subset. Martin's Axiom (MA) states that MA_κ holds for every κ < 2^{ℵ₀}.

Version: 3 Owner: Henry Author(s): Henry

34.135 Martin's axiom and the continuum hypothesis

MA_{ℵ₀} always holds

Given a countable collection of dense subsets of a partial order, we can select a sequence ⟨p_n⟩_{n<ω}
such that p_n is in the n-th dense subset, and p_{n+1} ≤ p_n for each n. The upward closure of this
chain is a directed subset meeting every dense set in the collection. Therefore CH implies MA.

If MA_κ then 2^{ℵ₀} > κ, and in fact 2^κ = 2^{ℵ₀}

κ ≥ ℵ₀, so 2^κ ≥ 2^{ℵ₀}; hence it will suffice to find a surjective function from P(ℵ₀) onto P(κ).

Let A = ⟨A_α⟩_{α<κ} be a sequence of infinite subsets of ω such that for any α ≠ β, A_α ∩ A_β is
finite.

Given any subset S ⊆ κ we will construct a function f : ω → {0, 1} such that a unique S can
be recovered from each f. f will have the property that if i ∈ S then f(a) = 0 for finitely
many elements a ∈ A_i, and if i ∉ S then f(a) = 0 for infinitely many elements of A_i.

Let P be the partial order (under inclusion) such that each element p ∈ P satisfies:

p is a partial function from ω to {0, 1}

There exist i_1, ..., i_n ∈ S such that for each j ≤ n, A_{i_j} ⊆ dom(p)

There is a finite subset w_p of ω such that dom(p) = w_p ∪ ⋃_{j≤n} A_{i_j}

For each j ≤ n, p(a) = 0 for finitely many elements of A_{i_j}

This satisfies the ccc. To see this, consider any uncountable sequence ⟨p_α⟩_{α<ω₁} of elements
of P. There are only countably many finite subsets of ω, so there is some w such that
w = w_{p_α} for uncountably many p_α, and p_α ↾ w is the same for each such element. Since each
of these functions' domains covers only a finite number of the A_i, and each is 1 on all but a finite
number of elements of each, there are only a countable number of different combinations
available, and therefore two of them are compatible.

Consider the following groups of dense subsets:

D_n = {p ∈ P | n ∈ dom(p)} for n < ω. This is obviously dense since any p not already
in D_n can be extended to one which is by adding ⟨n, 1⟩.

D_α = {p ∈ P | A_α ⊆ dom(p)} for α ∈ S. This is dense since if p ∉ D_α then
p ∪ {⟨a, 1⟩ | a ∈ A_α \ dom(p)} is.

For each α ∉ S and n < ω, D_{n,α} = {p ∈ P | p(m) = 0 for some m ∈ A_α with m > n}. This
is dense since if p ∉ D_{n,α} then dom(p) ∩ A_α = A_α ∩ (w_p ∪ ⋃_j A_{i_j}). But w_p is finite,
and the intersection of A_α with any other A_i is finite, so this intersection is finite,
and hence bounded by some m′. A_α is infinite, so there is some x ∈ A_α with x > m′, n. So
p ∪ {⟨x, 0⟩} ∈ D_{n,α}.

By MA_κ, given any set of κ dense subsets of P, there is a generic G which intersects all of
them. There are a total of ℵ₀ + |S| + (κ − |S|) · ℵ₀ = κ dense subsets in these three groups,
and hence some generic G intersecting all of them. Since G is directed, g = ⋃G is a partial
function from ω to {0, 1}. Since for each n < ω, G ∩ D_n is non-empty, n ∈ dom(g), so g is
a total function. Since G ∩ D_α for α ∈ S is non-empty, there is some element of G whose
domain contains all of A_α and which is 0 on only a finite number of them; hence g(a) = 0 for a finite
number of a ∈ A_α. Finally, since G ∩ D_{n,α} ≠ ∅ for each n < ω and α ∉ S, the set of n ∈ A_α such
that g(n) = 0 is unbounded, and hence infinite. So g is as promised, and 2^κ = 2^{ℵ₀}.

Version: 1 Owner: Henry Author(s): Henry

34.136 Martin's axiom is consistent

If κ is an uncountable strong limit cardinal such that for any λ < κ, κ^λ = κ, then it is
consistent that 2^{ℵ₀} = κ and MA. This is shown by using finite support iterated forcing to
construct a model of ZFC in which this is true. Historically, this proof was the motivation
for developing iterated forcing.

Outline

The proof uses the convenient fact that MA holds as long as it holds for all partial orders
smaller than κ. Given the conditions on κ, there are at most κ names for these partial orders.
At each step in the forcing, we force with one of these names. The result is that the actual
generic subset we add intersects every dense subset of every partial order.

Construction of Q̂

Q̂ will be constructed by induction with three conditions: |P_α| ≤ κ for all α ≤ κ, Q̂_α ∈
M^{P_α}, and ⊩_{P_α} Q̂_α satisfies the ccc. Note that a partial ordering on a cardinal λ < κ is a function
from λ × λ to {0, 1}, so there are at most 2^λ < κ of them. Since a canonical name for a
partial ordering of a cardinal λ is just a function from P_α to the partial orderings of that cardinal,
there are at most κ^{2^λ} ≤ κ of them.

At each of the κ steps, we want to deal with one of these possible partial orderings, so we
need to partition the κ steps into κ steps for each of the cardinals less than κ. In addition,
we need to include every P_α name for any level. Therefore, we partition κ into ⟨S_{λ,α}⟩_{λ,α<κ} for
each cardinal λ, with each S_{λ,α} having cardinality κ and the added condition that β ∈ S_{λ,α}
implies β ≥ α. Then each P_α name for a partial ordering of λ is assigned some index in S_{λ,α},
and that partial order will be dealt with at the corresponding stage.

Formally, given P_β for β ≤ α, P_α can be constructed and the P_β names for partial orderings
of each cardinal λ enumerated by the elements of S_{λ,β}. We have α ∈ S_{λ,β} for some λ and β, and
β ≤ α, so some canonical P_β name ≤̂ for a partial order of λ has already been assigned to α.

Since ≤̂ is a P_β name, it is also a P_α name, so Q̂_α can be defined as ⟨λ, ≤̂⟩ if ⊩_{P_α} ⟨λ, ≤̂⟩
satisfies the ccc, and by the trivial partial order ⟨1, {⟨1, 1⟩}⟩ otherwise. Obviously this
satisfies the ccc, and so P_{α+1} does as well. Since Q̂_α is either trivial or a cardinal together
with a canonical name, Q̂_α ∈ M^{P_α}. Finally, |P_{α+1}| ≤ Σ_n |P_α|^n · (sup_i |Q̂_i|)^n ≤ κ.

Proof that MA_λ holds for λ < κ

Lemma: It suffices to show that MA_λ holds for partial orders of size ≤ λ

Suppose P is a partial order with |P| > λ and let ⟨D_α⟩_{α<λ} be dense subsets of P. Define
functions f_α : P → D_α for α < λ with f_α(p) ≤ p (such elements exist since D_α is
dense). Let g : P × P → P be a function such that g(p, q) ≤ p, q whenever p and q are
compatible. Then pick some element q ∈ P and let Q be the closure of {q} under the f_α and g,
with the same ordering as P (restricted to Q).

Since there are only λ functions being used, it must be that |Q| ≤ λ. If p ∈ Q then f_α(p) ≤ p
and clearly f_α(p) ∈ Q ∩ D_α, so each D_α ∩ Q is dense in Q. In addition, Q is ccc: if A is an
antichain in Q and p1, p2 ∈ A then p1, p2 are incompatible in Q. But if they were compatible
in P then g(p1, p2) ≤ p1, p2 would be an element of Q, so they must be incompatible in P.
Therefore A is an antichain in P, and therefore must have countable cardinality, since P
satisfies the ccc.

By assumption, there is a directed G ⊆ Q such that G ∩ (D_α ∩ Q) ≠ ∅ for each α < λ, and
therefore MA_λ holds in full.

Now we must prove that, if G is a generic subset of P_κ, R some partial order with |R| ≤ λ,
and ⟨D_α⟩_{α<λ} are dense subsets of R, then there is some directed subset of R intersecting each
D_α.

If |R| < λ then additional elements can be added greater than any other element of R to
make |R| = λ, and then, since there is an order isomorphism onto some partial order of λ,
assume R is a partial ordering of λ. Then let D = {⟨α, β⟩ | β ∈ D_α}.

Take canonical names so that R = R̂[G], D = D̂[G] and D_i = D̂_i[G] for each i < λ, and:

⊩_P R̂ is a partial ordering satisfying the ccc, and D̂ is dense in R̂

For any α, β, there is a maximal antichain D_{α,β} ⊆ P_κ such that if p ∈ D_{α,β} then either
p ⊩_P α ≤_{R̂} β or p ⊩_P ¬(α ≤_{R̂} β), and another maximal antichain E_{α,β} ⊆ P_κ such that if
p ∈ E_{α,β} then either p ⊩_P ⟨α, β⟩ ∈ D̂ or p ⊩_P ⟨α, β⟩ ∉ D̂. These antichains determine the
value of those two formulas.

Then, since cf(κ) > λ and κ^λ = κ for λ < κ, it must be that cf(κ) = κ, so κ is regular. Then
γ = sup({η + 1 | η ∈ dom(p), p ∈ ⋃_{α,β<λ} D_{α,β} ∪ E_{α,β}}) < κ, so D_{α,β}, E_{α,β} ⊆ P_γ, and therefore
the P_κ names R̂ and D̂ are also P_γ names.

Lemma: For any α, G_α = {p ↾ α | p ∈ G} is a generic subset of P_α

First, it is directed, since if p1 ↾ α, p2 ↾ α ∈ G_α then there is some p ∈ G such that
p ≤ p1, p2, and therefore p ↾ α ∈ G_α and p ↾ α ≤ p1 ↾ α, p2 ↾ α.

Also, it is generic. If D is a dense subset of P_α then D′ = {p ∈ P_κ | p ↾ α ∈ D} is dense in
P_κ, since if p ∈ P_κ then there is some d ∈ D with d ≤ p ↾ α; d is compatible with p, and a
common extension of the two lies in D′. Therefore there is some p ∈ D′ ∩ G, and so p ↾ α ∈ D ∩ G_α.

Since R̂ and D̂ are P_γ names, R̂[G_γ] = R̂[G] = R and D̂[G_γ] = D̂[G] = D, so

V[G_γ] ⊨ R̂ is a partial ordering of λ satisfying the ccc, and D̂ is dense in R̂

Then there must be some p ∈ G_γ such that

p ⊩_{P_γ} R̂ is a partial ordering of λ satisfying the ccc

Let A_p be a maximal antichain of P_γ such that p ∈ A_p, and define ≤̂ as a P_γ name with
⟨p, m⟩ ∈ ≤̂ for each m ∈ R̂, and ⟨a, n⟩ ∈ ≤̂ if n = ⟨α, α⟩ where α < λ and p ≠ a ∈ A_p.
That is, ≤̂[G′] = R when p ∈ G′, and ≤̂[G′] is the trivial ordering of λ otherwise. Then this is the name for a
partial ordering of λ, and therefore there is some η ∈ S_{λ,γ} such that ≤̂ = ≤̂_η, and η ≥ γ.

Since p ∈ G_γ ⊆ G_η, Q̂_η[G_η] = ⟨λ, ≤̂[G_η]⟩ = R. Since P_{η+1} = P_η ∗ Q̂_η, we know that
G_{Q̂_η} ⊆ Q̂_η[G_η] is generic, since forcing with the composition is equivalent to forcing first
with P_η and then with Q̂_η.

Since D_i ∈ V[G_γ] ⊆ V[G_η] and is dense, it follows that D_i ∩ G_{Q̂_η} ≠ ∅, and since G_{Q̂_η} is a
directed subset of R, MA_λ holds.

Proof that 2^{ℵ₀} = κ

The relationship between Martin's axiom and the continuum hypothesis tells us that 2^{ℵ₀} ≥
κ. Since 2^{ℵ₀} was less than κ in V, and since forcing with P_κ, of size κ, adds at most κ subsets
of ω, it must be that 2^{ℵ₀} = κ.

Version: 3 Owner: Henry Author(s): Henry

34.137 a shorter proof: Martin's axiom and the continuum hypothesis

This is another, shorter proof of the fact that MA_{ℵ₀} always holds.

Let (P, ≤) be a partially ordered set and D be a collection of subsets of (P, ≤). We recall
that a filter G on (P, ≤) is D-generic if G ∩ D ≠ ∅ for all D ∈ D which are dense in (P, ≤).
(Dense in this context means: if D is dense in (P, ≤), then for every p ∈ P there is a d ∈ D
such that d ≤ p.)

Let (P, ≤) be a partially ordered set and D a countable collection of dense subsets of P; then
there exists a D-generic filter G on P. Moreover, it can be shown that for every p ∈ P
there is such a D-generic filter G with p ∈ G.

Let D_1, ..., D_n, ... be the dense subsets in D. Furthermore let p_0 = p. Now we can choose
for every 1 ≤ n < ω an element p_n ∈ P such that p_n ≤ p_{n−1} and p_n ∈ D_n. If we now consider
the set G := {q ∈ P : there is n < ω s.t. p_n ≤ q}, then it is easy to check that G is a D-generic
filter on P, and obviously p ∈ G. This completes the proof.
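The descending-chain construction in this proof can be made concrete. Here is a small Python sketch (not part of the original entry; the poset and dense sets are invented for the demonstration) that builds a D-generic filter for finitely many dense sets in a finite partial order by exactly this method:

```python
# Illustration of the proof's construction: choose p_0 >= p_1 >= ... with
# p_n in the n-th dense set, then close upward to get a D-generic filter.

# Demo poset: divisors of 12; here "q is stronger than p" means p divides q,
# so 12 is the strongest condition (a convention chosen just for this demo).
P = [1, 2, 3, 4, 6, 12]

def stronger(q, p):
    """q extends (is stronger than) p."""
    return q % p == 0

# Each of these is dense: every p in P has a stronger element in the set.
dense_sets = [[2, 12], [4, 6, 12], [12]]

def generic_filter(p, dense_sets):
    chain = [p]
    for D in dense_sets:
        # density guarantees some d in D stronger than the current condition
        d = next(x for x in D if stronger(x, chain[-1]))
        chain.append(d)
    # the upward closure of the descending chain is the desired filter
    return {q for q in P if any(stronger(c, q) for c in chain)}

G = generic_filter(1, dense_sets)
assert all(G.intersection(D) for D in dense_sets)   # G is D-generic
```

The same loop works for any countable list of dense sets, which is precisely the content of MA_{ℵ₀}.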
Version: 4 Owner: x bas Author(s): x bas

34.138 continuum hypothesis

The Continuum Hypothesis states that there is no cardinal number κ such that ℵ₀ < κ < 2^{ℵ₀}.
An equivalent statement is that ℵ₁ = 2^{ℵ₀}.

It is known to be independent of the axioms of ZFC.

The continuum hypothesis can also be stated as: there is no subset of the real numbers
which has cardinality strictly between that of the reals and that of the integers. It is from
this that the name comes, since the set of real numbers is also known as the continuum.

Version: 8 Owner: Evandar Author(s): Evandar

34.139 forcing

Forcing is the method used by Paul Cohen to prove the independence of the continuum hypothesis
(CH). In fact, the method was used by Cohen to prove that CH could be violated. The treatment I give here is very informal; I will develop it later. First let me give an example from
algebra.

Suppose we have a field k, and we want to add to this field an element ι such that ι² = −1.
We see that we cannot simply drop a new ι into k, since then we are not guaranteed that we
still have a field. Neither can we simply assume that k already has such an element. The
standard way of doing this is to start by adjoining a generic indeterminate X, and impose a
constraint on X, saying that X² + 1 = 0. What we do is take the quotient k[X]/(X² + 1),
and make a field out of it by taking the quotient field. We then obtain k(ι), where ι is the
equivalence class of X in the quotient. The general case of this is the theorem of algebra
saying that every polynomial p over a field k has a root in some extension field.

We can rephrase this and say that it is consistent with standard field theory that −1 have
a square root.

When the theory we consider is ZFC, we run into exactly the same problem: we can't just
add a new set and pretend it has the required properties, because then we may violate
something else, like foundation. Let M be a transitive model of set theory, which we call
the ground model. We want to add a new set S to M in such a way that the extension
M′ has M as a subclass, the properties of M are preserved, and S ∈ M′.

The first step is to approximate the new set using elements of M. This is the analogue of
finding the irreducible polynomial in the algebraic example. The set P of such approximations can be ordered by how much information the approximations give: let p, q ∈ P; then
p ≤ q if and only if p is stronger than q. We call this set a set of forcing conditions.
Furthermore, it is required that the set P itself and the order relation be elements of M.

Since P is a partial order, some of its subsets have interesting properties. Consider P as
a topological space with the order topology. A subset D ⊆ P is dense in P if and only if
for every p ∈ P, there is d ∈ D such that d ≤ p. A filter in P is said to be M-generic if
and only if it intersects every one of the dense subsets of P which are in M. An M-generic
filter in P is also referred to as a generic set of conditions in the literature. In general,
even though P is a set in M, generic filters are not elements of M.

If P is a set of forcing conditions, and G is a generic set of conditions in P, all in the ground
model M, then we define M[G] to be the least model of ZFC that contains G. In forthcoming
entries I will detail the construction of M[G]. The big theorem is this:

Theorem 5. M[G] is a model of ZFC, has the same ordinals as M, and M ⊆ M[G].

The way to prove that we can violate CH using a generic extension is to add many new
subsets of ω in the following way: let M be a transitive model of ZFC, and let (P, ≤) be
the set (in M) of all functions f whose domain is a finite subset of ω₂ × ω, and whose range
is the set {0, 1}. The ordering here is p ≤ q if and only if p ⊇ q. Let G be a generic set of
conditions in P. Then ⋃G is a total function whose domain is ω₂ × ω, and whose range is {0, 1}.
We can see this f as coding ω₂ new functions f_α : ω → {0, 1}, α < ω₂, which are subsets of
ω. These functions are all distinct, and so CH is violated in M[G].
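The combinatorics of these Cohen-style conditions can be sketched in code. The following Python fragment (my illustration, not from the entry) models conditions as finite partial functions from pairs (alpha, n) to {0, 1}, with the reverse-inclusion ordering and the dense sets that force each pair into the domain:

```python
# Cohen-style forcing conditions: finite partial functions from pairs
# (alpha, n) to {0, 1}, represented as Python dicts, ordered by reverse
# inclusion (p <= q iff p extends q).

def extends(p, q):
    """p <= q in the forcing order: p is a stronger condition than q."""
    return all(k in p and p[k] == v for k, v in q.items())

def compatible(p, q):
    """Two conditions are compatible iff they agree on their common domain."""
    return all(p[k] == q[k] for k in p.keys() & q.keys())

def dense_extend(p, key):
    """D_key = {p : key in dom(p)} is dense: any p extends to a member."""
    if key in p:
        return dict(p)
    return {**p, key: 1}   # any value works; 1 is an arbitrary choice

p = {(0, 0): 1, (1, 3): 0}
q = {(0, 0): 1}
assert extends(p, q) and compatible(p, q)
r = dense_extend(p, (2, 5))
assert extends(r, p) and (2, 5) in r
```

A generic filter meets every such dense set, which is why its union is a total function on ω₂ × ω.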
All this relies on a proper definition of the satisfaction relation in M[G], and the forcing relation,
which will come in a forthcoming entry. Details can be found in Thomas Jech's book Set
Theory.

Version: 6 Owner: jihemme Author(s): jihemme

34.140 generalized continuum hypothesis

The generalized continuum hypothesis states that for any infinite cardinal κ there is no
cardinal λ such that κ < λ < 2^κ.

Equivalently, for every ordinal α, ℵ_{α+1} = 2^{ℵ_α}.

Like the continuum hypothesis, the generalized continuum hypothesis is known to be independent
of the axioms of ZFC.

Version: 7 Owner: Evandar Author(s): Evandar

34.141 inaccessible cardinals

A limit cardinal κ is a strong limit cardinal if for any λ < κ, 2^λ < κ.

A regular limit cardinal is called weakly inaccessible, and a regular strong limit cardinal
is called inaccessible.

Version: 2 Owner: Henry Author(s): Henry

34.142 ◊_S

◊_S is a combinatoric principle regarding a stationary set S ⊆ ω₁. It holds when there is a
sequence ⟨A_α⟩_{α∈S} such that each A_α ⊆ α and for any A ⊆ ω₁, {α ∈ S | A ∩ α = A_α} is
stationary.

To get some sense of what this means, observe that for any β < ω₁, {β} ⊆ ω₁, so the set of
α with A_α = {β} is stationary (in ω₁). More strongly, suppose T ⊆ β for some β < ω₁. Then T is
bounded in ω₁, so A_α = T on a stationary set. Since |S| = ω₁, it follows that 2^{ℵ₀} ≤ ℵ₁. Hence
◊_{ω₁}, the most common form (often written as just ◊), implies CH.

Version: 3 Owner: Henry Author(s): Henry

34.143 ♣_S

♣_S is a combinatoric principle weaker than ◊_S. It states that, for S stationary in ω₁, there
is a sequence ⟨A_α⟩_{α∈S} such that A_α ⊆ α and sup(A_α) = α, with the property that for
each unbounded subset T ⊆ ω₁ there is some A_α ⊆ T.

Any sequence satisfying ◊_S can be adjusted so that sup(A_α) = α, so this is indeed a weakened
form of ◊_S.

Any such sequence actually contains a stationary set of α such that A_α ⊆ T for each T:
given any club C and any unbounded T, construct a sequence, C′ from C and T′ from T, from the
elements of each, such that the α-th member of C′ is greater than the α-th member of T′,
which is in turn greater than any earlier member of C′. Since both sets are unbounded, this
construction is possible, and T′ is a subset of T still unbounded in ω₁. So there is some α
such that A_α ⊆ T′, and since sup(A_α) = α, α is also the limit of a subsequence of C′ and
therefore an element of C.

Version: 1 Owner: Henry Author(s): Henry

34.144 Dedekind infinite

A set A is said to be Dedekind infinite if there is an injective function f : ω → A, where
ω denotes the set of natural numbers.

A Dedekind infinite set is certainly infinite, and if the axiom of choice is assumed, then an
infinite set is Dedekind infinite. However, it is consistent with the failure of the axiom of
choice that there is a set which is infinite but not Dedekind infinite.

Version: 4 Owner: Evandar Author(s): Evandar

34.145 Zermelo-Fraenkel axioms

Equality of sets: If X and Y are sets, and x ∈ X iff x ∈ Y, then X = Y.

Pair set: If X and Y are sets, then there is a set Z containing only X and Y.

Union over a set: If X is a set, then there exists a set that contains every element of each
x ∈ X.

Axiom of power set: If X is a set, then there exists a set P(X) with the property that
Y ∈ P(X) iff every element y ∈ Y is also in X.

Replacement axiom: Let F(x, y) be some formula. If, for all x, there is exactly one y such
that F(x, y) is true, then for any set A there exists a set B with the property that b ∈ B iff
there exists some a ∈ A such that F(a, b) is true.

Regularity axiom: Let F(x) be some formula. If there is some x that makes F(x) true,
then there is a set Y such that F(Y) is true, but for no y ∈ Y is F(y) true.

Existence of an infinite set: There exists a non-empty set X with the property that, for
any x ∈ X, there is some y ∈ X such that x ⊆ y but x ≠ y.

Ernst Zermelo and Abraham Fraenkel proposed these axioms as a foundation for what is
now called Zermelo-Fraenkel set theory, or ZF. If these axioms are accepted along with the
axiom of choice, the resulting theory is often denoted ZFC.

Version: 10 Owner: mathcam Author(s): mathcam, vampyr

34.146 class

By a class in modern set theory we mean an arbitrary collection of elements of the universe.
All sets are classes (as they are collections of elements of the universe, which are usually
sets, but could also be urelements), but not all classes are sets. Classes which are not sets
are called proper classes.

The need for this distinction arises from the paradoxes of so-called naive set theory. In
naive set theory one assumes that to each possible division of the universe into two disjoint
and mutually comprehensive parts there corresponds an entity of the universe, a set. This is
the content of Frege's famous fifth axiom, which states that to each second order predicate
P there corresponds a first order object p, called the extension of P, s.t. ∀x(P(x) ↔ x ∈ p).
(Every predicate P divides the universe into two mutually comprehensive and disjoint parts;
namely the part which consists of objects for which P holds and the part consisting of objects
for which P does not hold.)

Speaking in modern terms we may view the situation as follows. Consider a model of set
theory M. The interpretation the model gives to ∈ implicitly defines a function f : P(M) →
M. Seen this way, the fact that not all classes can be sets simply means that we can't
injectively map the powerset of any set into the set itself, which is a famous result by
Cantor. Functions like f here are known as extensors, and they have been used in the study
of the semantics of set theory.

Russell's paradox, which could be seen as a proof of Cantor's theorem about the cardinalities
of powersets, shows that Frege's fifth axiom is contradictory; not all classes can be sets.
From here there are two traditional ways to proceed: either through the theory of types or
through some form of limitation of size principle.

The limitation of size principle in its vague form says that all small classes (in the sense of
cardinality) are sets, while all proper classes are very big; too big to be sets. The limitation
of size principle can be found in Cantor's work, where it is the basis for Cantor's doctrine that
only transfinite collections can be thought of as specific objects (sets), but some collections are
absolutely infinite, and can't be thought to be comprehended into an object. This can be
given a precise formulation: all classes which are of the same cardinality as the universal
class are too big, and all other classes are small. In fact, this formulation can be used
in von Neumann-Bernays-Gödel set theory to replace the replacement axiom and almost all
other set existence axioms (with the exception of the powerset axiom).

The limitation of size principle can be seen to give rise to extensors of type P_{<|A|}(A) → A.
(P_{<|A|}(A) is the set of all subsets of A which are of cardinality less than that of A.) This is
not the only possible way to avoid Russell's paradox. We could use an extensor according
to which all classes which are of cardinality less than that of the universe, or for which the
cardinality of their complement is less than that of the universe, are sets (i.e. map into
elements of the model).

In many set theories there are formally no proper classes; ZFC is an example of just such a
set theory. In these theories one usually means by a proper class an open formula φ, possibly
with set parameters a_1, ..., a_n. Notice, however, that these do not exhaust all possible proper
classes that should really exist for the universe, as this only allows us to deal with proper
classes that can be defined by means of an open formula with parameters. The theory NBG
formalises this usage: it's conservative over ZFC (as speaking about open formulae
with parameters clearly must be!).

There is a set theory known as Morse-Kelley set theory which allows us to speak about and
to quantify over an extended class of impredicatively defined proper classes that can't be
reduced to simply speaking about open formulae.

Version: 5 Owner: Aatu Author(s): Aatu

34.147 complement

Let A be a subset of B. The complement of A in B (denoted A^∁ when the larger set B is
clear from context) is the set difference B \ A.

Version: 1 Owner: djao Author(s): djao

34.148 delta system

If S is a set of finite sets, then S is a Δ-system if there is some (possibly empty) X such
that for any a, b ∈ S, if a ≠ b then a ∩ b = X.
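For finite families this property can be checked directly. Here is a small Python helper (my illustration, not part of the entry) that returns the common root X when the family is a Δ-system:

```python
# Check whether a family of finite sets is a Delta-system, i.e. all
# pairwise intersections of distinct members equal one fixed root X.
from itertools import combinations

def delta_system_root(family):
    """Return the root X if `family` is a Delta-system, else None."""
    pairs = list(combinations(family, 2))
    if not pairs:
        return frozenset()          # 0 or 1 sets: trivially a Delta-system
    root = pairs[0][0] & pairs[0][1]
    if all((a & b) == root for a, b in pairs):
        return root
    return None

sunflower = [frozenset({1, 2, 9}), frozenset({1, 2, 7}), frozenset({1, 2, 5})]
assert delta_system_root(sunflower) == frozenset({1, 2})
assert delta_system_root([frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]) is None
```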
Version: 2 Owner: Henry Author(s): Henry

34.149 delta system lemma

If S is a set of finite sets such that |S| = ℵ₁, then there is an S′ ⊆ S such that |S′| = ℵ₁ and
S′ is a Δ-system.

Version: 3 Owner: Henry Author(s): Henry

34.150 diagonal intersection

If ⟨S_i⟩_{i<κ} is a sequence of sets, then the diagonal intersection, △_{i<κ} S_i, is defined to be {β < κ |
β ∈ ⋂_{i<β} S_i}.

That is, β is in △_{i<κ} S_i if it is contained in the first β members of the sequence.
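The definition has a direct finite analogue. This Python sketch (an illustration of mine, with κ replaced by a natural number) computes the diagonal intersection of a finite sequence of sets:

```python
# Finite analogue of the diagonal intersection: beta belongs iff it lies
# in S_i for every i < beta.

def diagonal_intersection(sets):
    kappa = len(sets)
    return {b for b in range(kappa) if all(b in sets[i] for i in range(b))}

S = [set(range(10)), {2, 3, 5, 7}, {3, 5, 9}, {5, 7}]
# 0 qualifies vacuously; 1 needs 1 in S_0; 2 needs 2 in S_0 and S_1;
# 3 needs 3 in S_0, S_1 and S_2.
assert diagonal_intersection(S) == {0, 1, 2, 3}
```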


Version: 2 Owner: Henry Author(s): Henry


34.151 intersection

The intersection of two sets A and B is the set that contains all the elements x such that
x ∈ A and x ∈ B. The intersection of A and B is written as A ∩ B.

Example. If A = {1, 2, 3, 4, 5} and B = {1, 3, 5, 7, 9} then A ∩ B = {1, 3, 5}.

We can also define the intersection of an arbitrary number of sets. If {A_j}_{j∈J} is a family of
sets, we define the intersection of all of them, denoted ⋂_{j∈J} A_j, as the set consisting of those
elements belonging to all the sets A_j:

⋂_{j∈J} A_j = {x : x ∈ A_j for all j ∈ J}.
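For finite sets these definitions can be checked directly; here is a small Python rendering (mine, not part of the entry) of the example above and of an intersection over an indexed family:

```python
# The definitions above in Python's built-in set notation.
A = {1, 2, 3, 4, 5}
B = {1, 3, 5, 7, 9}
assert A & B == {1, 3, 5}          # binary intersection

# Intersection of a family {A_j} indexed by J (the index names are made up):
family = {"j1": {1, 2, 3}, "j2": {2, 3, 4}, "j3": {0, 2, 3}}
common = set.intersection(*family.values())
assert common == {2, 3}
```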

Version: 7 Owner: drini Author(s): drini, xriso

34.152 multiset

A multiset is a generalization of a set in which duplicate elements are allowed.

For example, {1, 1, 3} is a multiset, but not a set.
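In programming terms, a multiset records how many times each element occurs. Python's standard library models this with collections.Counter, as a quick illustration:

```python
# A multiset keeps multiplicities; Python models this with Counter.
from collections import Counter

m = Counter([1, 1, 3])
assert m[1] == 2 and m[3] == 1       # multiplicities are remembered
assert set(m) == {1, 3}              # the underlying set forgets them
assert sorted(m.elements()) == [1, 1, 3]
```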
Version: 2 Owner: akrowne Author(s): akrowne

34.153 proof of delta system lemma

Since there are only ℵ₀ possible cardinalities for the elements of S, there must be some
n such that there are an uncountable number of elements of S with cardinality n. Let
S* = {a ∈ S | |a| = n} for this n. By induction on n, the lemma holds:

If n = 1 then the elements of S* are distinct singletons, and no two of them intersect,
so X = ∅ and S′ = S*.

Suppose n > 1. If there is some x which is in an uncountable number of elements of S*, then
take S** = {a \ {x} | x ∈ a ∈ S*}. Obviously this is uncountable and every element has n − 1
elements, so by the induction hypothesis there is some S′ ⊆ S** of uncountable cardinality
such that the intersection of any two elements of S′ is X. Then {a ∪ {x} | a ∈ S′} satisfies
the lemma, since the intersection of any two of its elements is X ∪ {x}.

On the other hand, if there is no such x, then we can construct by induction a sequence ⟨a_i⟩_{i<ω₁} such that
each a_i ∈ S* and for any i ≠ j, a_i ∩ a_j = ∅. Take any element for a_0, and
given ⟨a_i⟩_{i<α}, since α is countable, A = ⋃_{i<α} a_i is countable. Each element of
A is in only a countable number of elements of S*, so there are an uncountable number of
elements of S* which are candidates for a_α. Then this sequence satisfies the lemma, since
the intersection of any two of its elements is ∅.
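The case split in this proof (either some x lies in many sets, or many sets can be chosen pairwise disjoint) is also the skeleton of the finite "sunflower lemma" algorithm. The following Python sketch is my own finite rendering of that recursion; the function name, the greedy step, and the None-on-failure behavior are choices made for this illustration, not part of the entry:

```python
# Finite analogue of the proof: from a family of equal-size finite sets,
# try to extract k sets forming a Delta-system (a "sunflower").
def sunflower(family, k):
    """Return k sets from `family` whose pairwise intersections are all
    equal, or None if the family is too small for this recursion."""
    family = [frozenset(a) for a in set(map(frozenset, family))]
    if len(family) < k:
        return None
    n = len(next(iter(family)))
    # Case "no common element": greedily pick a maximal disjoint subfamily.
    disjoint, used = [], set()
    for a in family:
        if not (a & used):
            disjoint.append(a)
            used |= a
    if len(disjoint) >= k:
        return disjoint[:k]                 # root X is empty
    # Case "some x is in many sets": recurse on those sets minus x.
    x = max(used, key=lambda e: sum(e in a for a in family))
    core = [a - {x} for a in family if x in a]
    sub = sunflower(core, k)
    return None if sub is None else [a | {x} for a in sub]

petals = sunflower([{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 2, 6}], 3)
assert petals is not None and len(petals) == 3
```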
Version: 2 Owner: Henry Author(s): Henry

34.154 rational number

The rational numbers Q are the fraction field of the ring Z of integers. In more elementary
terms, a rational number is a quotient a/b of two integers a and b, with b nonzero. Two fractions a/b and
c/d are equivalent if the product of the cross terms is equal:

a/b = c/d ⟺ ad = bc

Addition and multiplication of fractions are given by the formulae

a/b + c/d = (ad + bc)/(bd)

(a/b) · (c/d) = (ac)/(bd)

The field of rational numbers is an ordered field, under the ordering relation: a/b ≤ c/d
(with b, d positive) if the inequality a·d ≤ b·c holds in the integers.
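These formulae are exactly what Python's fractions.Fraction implements (it also keeps quotients in lowest terms), so the rules above can be checked directly:

```python
# The cross-term rules above, checked against Python's exact rationals.
from fractions import Fraction

a, b = Fraction(1, 2), Fraction(3, 4)
assert a + b == Fraction(1*4 + 2*3, 2*4)   # (ad + bc) / bd
assert a * b == Fraction(1*3, 2*4)         # ac / bd
assert Fraction(2, 4) == Fraction(1, 2)    # ad == bc, so the fractions are equal
assert a <= b                              # 1*4 <= 2*3 in the integers
```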
Version: 7 Owner: djao Author(s): djao

34.155 saturated (set)

If p : X → Y is a surjective map, we say that a subset C ⊆ X is saturated (with respect to
p) if C contains every set p⁻¹({y}) it intersects. Equivalently, C is saturated if it is a union
of fibres.
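For finite sets the condition is easy to test. Here is a small Python illustration (mine; the map and sets are invented for the example):

```python
# A set C is saturated with respect to a surjection p iff it contains
# every fibre p^{-1}({y}) that it meets.
def fiber(p, domain, y):
    return {x for x in domain if p(x) == y}

def is_saturated(C, p, domain):
    return all(fiber(p, domain, p(x)) <= C for x in C)

X = set(range(6))
p = lambda x: x % 3        # surjection onto {0,1,2}; fibres {0,3}, {1,4}, {2,5}
assert is_saturated({0, 1, 3, 4}, p, X)    # a union of two fibres
assert not is_saturated({0, 1}, p, X)      # meets fibres without containing them
```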
Version: 2 Owner: dublisk Author(s): dublisk

34.156 separation and doubletons axiom

Separation axiom: If X is a set and P is a condition on sets, there exists a set
Y whose members are precisely the members of X satisfying P. Common notation:
Y = {A ∈ X : P(A)}.

Doubletons axiom (or Pairs): If X and Y are sets, there is a set Z whose only
members are X and Y. Common notation: Z = {X, Y}.

REFERENCES
1. G.M. Bergman, An Invitation to General Algebra and Universal Constructions.

Version: 3 Owner: vladm Author(s): vladm

34.157  set

34.157.1  Introduction

A set is a collection, group, or conglomerate.¹

Sets can be of real objects or mathematical objects; but the sets themselves are purely conceptual. This is an important point to note: the set of all cows (for example) does not physically exist, even though the cows do. The set is a gathering of the cows into one conceptual unit that is not part of physical reality. This makes it easy to see why we can have sets with an infinite number of elements: even though we may not be able to point out infinitely many objects in the real world, we can construct conceptual sets with an infinite number of elements (see the examples below).
Mathematics is thus built upon sets of purely conceptual, or mathematical, objects. Sets
are usually denoted by upper-case roman letters (like S). Sets can be defined by listing the
members, as in
S = {a, b, c, d}
Or, a set can be defined from a formula. This type of statement defining a set is of the form
$$S = \{x : P(x)\}$$
where $S$ is the symbol denoting the set, $x$ is the variable we are introducing to represent a generic element of the set, and $P(x)$ is some property that is true for values $x$ within $S$ (that is, $x \in S$ iff $P(x)$ holds). (We denote logical "and" by comma-separated clauses in $P(x)$. Also note that the "$x :$" portion of the set definition may contain a qualification which narrows the values of $x$ to some other set which is already known.)
Sets are, in fact, completely defined by their elements. If two sets have the same elements, they are equal. This is called the axiom of extensionality, and it is one of the most important characteristics of sets that distinguishes them from predicates or properties.
¹ However, not every collection has to be a set (in fact, not every collection can be a set). See proper class for more details.


The symbol $\in$ denotes inclusion in a set. For example,
$$s \in S$$
would be read "s is an element of S", or "S contains s".

Some examples of sets, with formal definitions, are:

- The set of all even integers: $\{x \in \mathbb{Z} : 2 \mid x\}$
- The set of all prime numbers: $\{p \in \mathbb{N} : \forall x \in \mathbb{N},\ x \mid p \Rightarrow x \in \{1, p\}\}$, where $\Rightarrow$ denotes implies and $\mid$ denotes divides.
- The set of all real functions of one real parameter: $\{f(x) \in \mathbb{R} : x \in \mathbb{R}\}$
- The set of all isosceles triangles: $\{\triangle ABC : \overline{AB} = \overline{BC} \neq \overline{AC}\}$, where the overline denotes segment length.

$\mathbb{Z}$, $\mathbb{N}$, and $\mathbb{R}$ are all standard sets: the integers, the natural numbers, and the real numbers, respectively. These are all infinite sets.
The most basic set is the empty set (denoted $\emptyset$ or $\{\}$).
The astute reader may have noticed that all of our examples of sets utilize sets, which does
not suffice for rigorous definition. We can be more rigorous if we postulate only the empty
set, and define a set in general as anything which one can construct from the empty set and
the ZFC axioms.
All objects in modern mathematics are constructed via sets.

34.157.2  Set Notions

An important set notion is cardinality. Cardinality is roughly the same as the intuitive notion of size. For sets with finitely many elements, cardinality can simply be thought of as size. However, intuition breaks down for sets with an infinite number of elements. For more detail, see the cardinality entry.

Another important set concept is that of subsets. A subset $B$ of a set $A$ is any set which contains only elements that appear in $A$. Subsets are denoted with the $\subseteq$ symbol, i.e. $B \subseteq A$. Also useful is the notion of a proper subset, denoted $B \subset A$, which adds the restriction that $B$ must not be all of $A$ (for finite sets, this means $B$ has a strictly smaller cardinality).


34.157.3  Set Operations

There are a number of standard (common) operations which are used to manipulate sets,
producing new sets from combinations of existing sets (sometimes with entirely different
types of elements). These standard operations are:
- union
- intersection
- set difference
- symmetric set difference
- complement
- cartesian product
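All of these standard operations have direct counterparts on Python's built-in sets, which gives a quick way to experiment with them (an editorial illustration; the sets `A`, `B`, `U` are ad hoc):

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4}

print(sorted(A | B))   # union: [1, 2, 3, 4]
print(sorted(A & B))   # intersection: [3]
print(sorted(A - B))   # set difference: [1, 2]
print(sorted(A ^ B))   # symmetric set difference: [1, 2, 4]

# Complement only makes sense relative to a universe U.
U = {1, 2, 3, 4, 5}
print(sorted(U - A))   # complement of A in U: [4, 5]

# Cartesian product: the set of ordered pairs (a, b).
print(sorted(product(A, B)))  # [(1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4)]
```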
Version: 5 Owner: akrowne Author(s): akrowne


Chapter 35
03Exx Set theory
35.1  intersection of sets

Let $X$, $Y$ be sets. The intersection of $X$ and $Y$, denoted $X \cap Y$, is the set
$$X \cap Y = \{z : z \in X,\ z \in Y\}$$
Version: 3 Owner: drini Author(s): drini, apmxi


Chapter 36
03F03 Proof theory, general
36.1  NJp

NJp is a natural deduction proof system for intuitionistic propositional logic. Its only axiom is $\Gamma, \phi \vdash \phi$ for any atomic $\phi$. Its rules are:

$$\frac{\Gamma \vdash \phi \quad \Delta \vdash \psi}{[\Gamma, \Delta] \vdash \phi \wedge \psi}\ (\wedge I) \qquad \frac{\Gamma \vdash \phi \wedge \psi}{\Gamma \vdash \phi}\ (\wedge E) \qquad \frac{\Gamma \vdash \phi \wedge \psi}{\Gamma \vdash \psi}\ (\wedge E)$$

$$\frac{\Gamma \vdash \phi}{\Gamma \vdash \phi \vee \psi}\ (\vee I) \qquad \frac{\Gamma \vdash \psi}{\Gamma \vdash \phi \vee \psi}\ (\vee I) \qquad \frac{\Gamma \vdash \phi \vee \psi \quad \Delta, \phi \vdash \chi \quad \Lambda, \psi \vdash \chi}{[\Gamma, \Delta, \Lambda] \vdash \chi}\ (\vee E)$$

$$\frac{\Gamma, \phi \vdash \psi}{\Gamma \vdash \phi \rightarrow \psi}\ (\rightarrow I) \qquad \frac{\Gamma \vdash \phi \rightarrow \psi \quad \Delta \vdash \phi}{[\Gamma, \Delta] \vdash \psi}\ (\rightarrow E)$$

$$\frac{\Gamma \vdash \bot}{\Gamma \vdash \phi} \text{ where } \phi \text{ is atomic}\ (\bot_i)$$

Version: 3 Owner: Henry Author(s): Henry

36.2  NKp

NKp is a natural deduction proof system for classical propositional logic. It is identical to NJp except that it replaces the rule $\bot_i$ with the rule:

$$\frac{\Gamma, \neg\phi \vdash \bot}{\Gamma \vdash \phi} \text{ where } \phi \text{ is atomic}\ (\bot_c)$$

Version: 1 Owner: Henry Author(s): Henry

36.3  natural deduction

Natural deduction refers to related proof systems for several different kinds of logic, intended
to be similar to the way people actually reason. Unlike many other proof systems, it has
many rules and few axioms. Sequents in natural deduction have only one formula on the
right side.
Typically the rules consist of one pair for each connective, one of which allows the introduction of that symbol and the other its elimination.
To give one example, the proof rules $\wedge I$ and $\wedge E$ are:
$$\frac{\Gamma \vdash \phi \quad \Delta \vdash \psi}{[\Gamma, \Delta] \vdash \phi \wedge \psi}\ (\wedge I) \qquad \text{and} \qquad \frac{\Gamma \vdash \phi \wedge \psi}{\Gamma \vdash \phi}\ (\wedge E)$$

Version: 1 Owner: Henry Author(s): Henry

36.4  sequent

A sequent represents a formal step in a proof. Typically it consists of two lists of formulas, one representing the premises and one the conclusions. A typical sequent might be:
$$\phi, \psi \vdash \chi, \theta$$
This claims that, from premises $\phi$ and $\psi$, either $\chi$ or $\theta$ must be true. Note that $\vdash$ is not a symbol in the language; rather it is a symbol in the metalanguage used to discuss proofs. Also, notice the asymmetry: everything on the left must be true to conclude only one thing on the right. This does create a different kind of symmetry, since adding formulas to either side results in a weaker sequent, while removing them from either side gives a stronger one. Some systems allow only one formula on the right.

Most proof systems provide ways to deduce one sequent from another. These rules are written with a list of sequents above and below a line. Such a rule indicates that if everything above the line is true, so is everything under the line. A typical rule is:
$$\frac{\Gamma \vdash \phi}{\Gamma, \Delta \vdash \phi}$$
This indicates that if we can deduce $\phi$ from $\Gamma$, we can also deduce it from $\Gamma$ together with $\Delta$.

Note that the capital Greek letters are usually used to denote a (possibly empty) list of formulas. $[\Gamma, \Delta]$ is used to denote the contraction of $\Gamma$ and $\Delta$, that is, the list of those formulas appearing in either $\Gamma$ or $\Delta$ but with no repeats.
Version: 5 Owner: Henry Author(s): Henry
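One way to make the bookkeeping concrete is to model sequents as data and the weakening rule and the contraction $[\Gamma, \Delta]$ as functions. This is an editorial sketch, not part of any entry; formulas are represented simply by name strings:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sequent:
    """premises |- conclusions, each an ordered tuple of formula names."""
    left: tuple
    right: tuple

def weaken(s, extra):
    """From Gamma |- phi derive Gamma, Delta |- phi (a weaker sequent)."""
    return Sequent(s.left + tuple(extra), s.right)

def contract(gamma, delta):
    """[Gamma, Delta]: the formulas of either list, with no repeats."""
    seen, out = set(), []
    for f in gamma + delta:
        if f not in seen:
            seen.add(f)
            out.append(f)
    return tuple(out)

s = Sequent(('phi',), ('chi',))
print(weaken(s, ('psi',)).left)                    # ('phi', 'psi')
print(contract(('phi', 'psi'), ('psi', 'theta')))  # ('phi', 'psi', 'theta')
```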

36.5  sound, complete

If $Th$ and $Pr$ are two sets of facts (in particular, a theory of some language and the set of things provable by some method) we say $Pr$ is sound for $Th$ if $Pr \subseteq Th$. Typically we have a theory and a set of rules for constructing proofs, and we say the set of rules is sound (which theory is intended is usually clear from context) since everything they prove is true (in $Th$).

If $Th \subseteq Pr$ we say $Pr$ is complete for $Th$. Again, we usually have a theory and a set of rules for constructing proofs, and say that the set of rules is complete since everything true (in $Th$) can be proven.
Version: 4 Owner: Henry Author(s): Henry


Chapter 37
03F07 Structure of proofs
37.1  induction

Induction is the name given to a certain kind of proof, and also to a (related) way of defining
a function. For a proof, the statement to be proved has a suitably ordered set of cases.
Some cases (usually one, but possibly zero or more than one), are proved separately, and
the other cases are deduced from those. The deduction goes by contradiction, as we shall
see. For a function, its domain is suitably ordered. The function is first defined on some
(usually nonempty) subset of its domain, and is then defined at other points x in terms of
its values at points y such that y < x.

37.1.1  Elementary proof by induction

Proof by induction is a variety of proof by contradiction, relying, in the elementary cases, on the fact that every non-empty set of natural numbers has a least element. Suppose we want to prove a statement $F(n)$ which involves a natural number $n$. It is enough to prove:

1) If $n \in \mathbb{N}$, and $F(m)$ is true for all $m \in \mathbb{N}$ such that $m < n$, then $F(n)$ is true.

or, what is the same thing,

2) If $F(n)$ is false, then $F(m)$ is false for some $m < n$.

To see why, assume that $F(n)$ is false for some $n$. Then there is a smallest $k \in \mathbb{N}$ such that $F(k)$ is false. Then, by hypothesis, $F(n)$ is true for all $n < k$. By (1), $F(k)$ is true, which is a contradiction.
(If we don't regard induction as a kind of proof by contradiction, then we have to think of it as supplying some kind of sequence of proofs, of unlimited length. That's not very satisfactory, particularly for transfinite inductions, which we will get to below.)


Usually the initial case of $n = 0$, and sometimes a few cases, need to be proved separately, as in the following example. Write $B_n = \sum_{k=0}^{n} k^2$. We claim
$$B_n = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6} \quad \text{for all } n \in \mathbb{N}$$
Let us try to apply (1). We have the inductive hypothesis (as it is called)
$$B_m = \frac{m^3}{3} + \frac{m^2}{2} + \frac{m}{6} \quad \text{for all } m < n$$
which tells us something if $n > 0$. In particular, setting $m = n - 1$,
$$B_{n-1} = \frac{(n-1)^3}{3} + \frac{(n-1)^2}{2} + \frac{n-1}{6}$$
Now we just add $n^2$ to each side, and verify that the right side becomes $\frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}$. This proves (1) for nonzero $n$. But if $n = 0$, the inductive hypothesis is vacuously true, but of no use. So we need to prove $F(0)$ separately, which in this case is trivial.
Textbooks sometimes distinguish between weak and strong (or complete) inductive proofs. A proof that relies on the inductive hypothesis (1) is said to go by strong induction. But in the sum-of-squares formula above, we needed only the hypothesis $F(n-1)$, not $F(m)$ for all $m < n$. For another example, a proof about the Fibonacci sequence might use just $F(n-2)$ and $F(n-1)$. An argument using only $F(n-1)$ is referred to as weak induction.

37.1.2  Definition of a function by induction

Let's begin with an example, the function $\mathbb{N} \to \mathbb{N}$, $n \mapsto a^n$, where $a$ is some integer $> 0$. The inductive definition reads
$$a^0 = 1$$
$$a^n = a \cdot (a^{n-1}) \quad \text{for all } n > 0$$
Formally, such a definition requires some justification, which runs roughly as follows. Let $T$ be the set of $m \in \mathbb{N}$ for which the following definition has no problem.
$$a^0 = 1$$
$$a^n = a \cdot (a^{n-1}) \quad \text{for } 0 < n \leq m$$
We now have a finite sequence $f_m$ on the interval $[0, m]$, for each $m \in T$. We verify that any $f_l$ and $f_m$ have the same values throughout the intersection of their two domains. Thus we can define a single function on the union of the various domains. Now suppose $T \neq \mathbb{N}$, and let $k$ be the least element of $\mathbb{N} - T$. That means that the definition has a problem when $m = k$ but not when $m < k$. We soon get a contradiction, so we deduce $T = \mathbb{N}$. That means that the union of those domains is all of $\mathbb{N}$, i.e. the function $a^n$ is defined, unambiguously, throughout $\mathbb{N}$.
Another inductively defined function is the Fibonacci sequence, q.v. We have been speaking of the inductive definition of a function, rather than just a sequence (a function on $\mathbb{N}$), because the notions extend with little change to transfinite inductions. An illustration par excellence of inductive proofs and definitions is Conway's theory of surreal numbers. The numbers and their algebraic laws of composition are defined entirely by inductions which have no special starting cases.
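The inductive definition of $a^n$ transcribes directly into a recursive function (an editorial sketch using ordinary Python integers as stand-ins for the naturals):

```python
def power(a, n):
    """a**n by induction on n: a^0 = 1, a^n = a * a^(n-1) for n > 0."""
    if n == 0:
        return 1
    return a * power(a, n - 1)

print(power(2, 10))  # 1024
print(power(5, 0))   # 1
```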

37.1.3  Minor variations of the method

The reader can figure out what is meant by induction starting at $k$, where $k$ is not necessarily zero. Likewise, the term downward induction is self-explanatory.

A common variation of the method is proof by induction on a function of the index $n$. Rather than spell it out formally, let me just give an example. Let $n$ be a positive integer having no prime factors of the form $4m + 3$. Then $n = a^2 + b^2$ for some integers $a$ and $b$. The usual textbook proof uses induction on a function of $n$, namely the number of prime factors of $n$. The induction starts at 1 (i.e. either $n = 2$ or prime $n = 4m + 1$), which in this instance is the only part of the proof that is not quite easy.

37.1.4  Well-ordered sets

An ordered set $(S, \leq)$ is said to be well-ordered if any nonempty subset of $S$ has a least element. The criterion (1), and its proof, hold without change for any well-ordered set $S$ in place of $\mathbb{N}$ (which is a well-ordered set). But notice that it won't be enough to prove that $F(n)$ implies $F(n + 1)$ (where $n + 1$ denotes the least element $> n$, if it exists). The reason is, given an element $m$, there may exist elements $< m$ but no element $k$ such that $m = k + 1$. Then the induction from $n$ to $n + 1$ will fail to reach $m$. For more on this topic, look for limit ordinals.

Informally, any variety of induction which works for ordered sets $S$ in which a segment $S_x = \{y \in S \mid y < x\}$ may be infinite, is called transfinite induction.

37.1.5  Noetherian induction

An ordered set $S$, or its order, is called Noetherian if any non-empty subset of $S$ has a maximal element. Several equivalent definitions are possible, such as the ascending chain condition: any strictly increasing sequence of elements of $S$ is finite. The following result is easily proved by contradiction.

Principle of Noetherian induction: Let $(S, \leq)$ be a set with a Noetherian order, and let $T$ be a subset of $S$ having this property: if $x \in S$ is such that the condition $y > x$ implies $y \in T$, then $x \in T$. Then $T = S$.

So, to prove something $F(x)$ about every element $x$ of a Noetherian set, it is enough to prove that "$F(z)$ for all $z > y$" implies $F(y)$. This time the induction is going downward, but of course that is only a matter of notation. The opposite of a Noetherian order, i.e. an order in which any strictly decreasing sequence is finite, is also in use; it is called a partial well-order, or an ordered set having no infinite antichain.
The standard example of a Noetherian ordered set is the set of ideals in a Noetherian ring.
But the notion has various other uses, in topology as well as algebra. For a nontrivial
example of a proof by Noetherian induction, look up the Hilbert basis theorem.

37.1.6  Inductive ordered sets

An ordered set $(S, \leq)$ is said to be inductive if any totally ordered subset of $S$ has an upper bound in $S$. Since the empty set is totally ordered, any inductive ordered set is nonempty. We have this important result:

Zorn's lemma: Any inductive ordered set has a maximal element.

Zorn's lemma is widely used in existence proofs, rather than in proofs of a property $F(x)$ of an arbitrary element $x$ of an ordered set. Let me sketch one typical application. We claim that every vector space has a basis. First, we prove that if a free subset $F$, of a vector space $V$, is a maximal free subset (with respect to the order relation $\subseteq$), then it is a basis. Next, to see that the set of free subsets is inductive, it is enough to verify that the union of any totally ordered set of free subsets is free, because that union is an upper bound on the totally ordered set. Last, we apply Zorn's lemma to conclude that $V$ has a maximal free subset.
Version: 10 Owner: Daume Author(s): Larry Hammick, slider142


Chapter 38
03F30 First-order arithmetic and fragments
38.1  Elementary Functional Arithmetic

Elementary Functional Arithmetic, or EFA, is a weak theory of arithmetic created by removing induction from Peano arithmetic. Because it lacks induction, axioms defining exponentiation must be added.

$\forall x\, (x' \neq 0)$ (0 is the first number)
$\forall x, y\, (x' = y' \rightarrow x = y)$ (the successor function is one-to-one)
$\forall x\, (x + 0 = x)$ (0 is the additive identity)
$\forall x, y\, (x + y' = (x + y)')$ (addition is the repeated application of the successor function)
$\forall x\, (x \cdot 0 = 0)$
$\forall x, y\, (x \cdot y' = x \cdot y + x)$ (multiplication is repeated addition)
$\forall x\, (\neg(x < 0))$ (0 is the smallest number)
$\forall x, y\, (x < y' \leftrightarrow x < y \vee x = y)$
$\forall x\, (x^0 = 1)$
$\forall x, y\, (x^{y'} = x^y \cdot x)$
Version: 2 Owner: Henry Author(s): Henry


38.2  PA

Peano Arithmetic (PA) is the restriction of Peano's axioms to a first order theory of arithmetic. The only change is that the induction axiom is replaced by induction restricted to arithmetic formulas:
$$\phi(0) \wedge \forall x\, (\phi(x) \rightarrow \phi(x')) \rightarrow \forall x\, \phi(x) \quad \text{where } \phi \text{ is arithmetical}$$
Note that this replaces the single, second-order, axiom of induction with a countably infinite schema of axioms.

Appropriate axioms defining $+$, $\cdot$, and $<$ are included. A full list of the axioms of PA looks like this (although the exact list of axioms varies somewhat from source to source):

$\forall x\, (x' \neq 0)$ (0 is the first number)
$\forall x, y\, (x' = y' \rightarrow x = y)$ (the successor function is one-to-one)
$\forall x\, (x + 0 = x)$ (0 is the additive identity)
$\forall x, y\, (x + y' = (x + y)')$ (addition is the repeated application of the successor function)
$\forall x\, (x \cdot 0 = 0)$
$\forall x, y\, (x \cdot y' = x \cdot y + x)$ (multiplication is repeated addition)
$\forall x\, (\neg(x < 0))$ (0 is the smallest number)
$\forall x, y\, (x < y' \leftrightarrow x < y \vee x = y)$
$\phi(0) \wedge \forall x\, (\phi(x) \rightarrow \phi(x')) \rightarrow \forall x\, \phi(x)$ where $\phi$ is arithmetical
Version: 7 Owner: Henry Author(s): Henry

38.3  Peano arithmetic

Peano's axioms are a definition of the set of natural numbers, denoted $\mathbb{N}$. From these axioms Peano arithmetic on natural numbers can be derived.

1. $0 \in \mathbb{N}$ (0 is a natural number)
2. For each $x \in \mathbb{N}$, there exists exactly one $x' \in \mathbb{N}$, called the successor of $x$
3. $x' \neq 0$ (0 is not the successor of any natural number)
4. $x = y$ if and only if $x' = y'$.
5. (axiom of induction) If $M \subseteq \mathbb{N}$ and $0 \in M$ and $x \in M$ implies $x' \in M$, then $M = \mathbb{N}$.

The successor of $x$ is sometimes denoted $Sx$ instead of $x'$. We then have $1 = S0$, $2 = S1 = SS0$, and so on.

Peano arithmetic consists of statements derived via these axioms. For instance, from these axioms we can define addition and multiplication on natural numbers. Addition is defined as
$$x + 1 = x' \quad \text{for all } x \in \mathbb{N}$$
$$x + y' = (x + y)' \quad \text{for all } x, y \in \mathbb{N}$$
Addition defined in this manner can then be proven to be both associative and commutative. Multiplication is
$$x \cdot 1 = x \quad \text{for all } x \in \mathbb{N}$$
$$x \cdot y' = x \cdot y + x \quad \text{for all } x, y \in \mathbb{N}$$
This definition of multiplication can also be proven to be both associative and commutative, and it can also be shown to be distributive over addition.
Version: 4 Owner: Henry Author(s): Henry, Logan
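The recursive definitions of addition and multiplication above can be transcribed almost verbatim (an editorial sketch: Python integers stand in for the naturals, with base case at 1 as in the entry, so the arguments are assumed $\geq 1$):

```python
def succ(x):
    return x + 1

def add(x, y):
    """Peano addition: x + 1 = x', and x + y' = (x + y)'."""
    if y == 1:
        return succ(x)
    return succ(add(x, y - 1))   # y = z' where z = y - 1

def mul(x, y):
    """Peano multiplication: x * 1 = x, and x * y' = x * y + x."""
    if y == 1:
        return x
    return add(mul(x, y - 1), x)

print(add(3, 4))  # 7
print(mul(3, 4))  # 12
```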


Chapter 39
03F35 Second- and higher-order arithmetic and fragments
39.1  ACA0

ACA0 is a weakened form of second order arithmetic. Its axioms include the axioms of PA
together with arithmetic comprehension.
Version: 1 Owner: Henry Author(s): Henry

39.2  RCA0

RCA0 is a weakened form of second order arithmetic. It consists of the axioms of PA other than induction, together with $\Sigma^0_1$-IND and $\Delta^0_1$-CA.
Version: 1 Owner: Henry Author(s): Henry

39.3  Z2

Z2 is the full system of second order arithmetic, that is, the full theory of numbers and sets
of numbers. It is sufficient for a great deal of mathematics, including much of number theory
and analysis.
The axioms defining successor, addition, multiplication, and comparison are the same as
those of PA. Z2 adds the full induction axiom and the full comprehension axiom.


Version: 1 Owner: Henry Author(s): Henry

39.4  comprehension axiom

The axiom of comprehension (CA) states that every formula defines a set. That is,
$$\exists X \forall x\, (x \in X \leftrightarrow \phi(x)) \quad \text{for any formula } \phi \text{ where } X \text{ does not occur free in } \phi$$
The names specification and separation are sometimes used in place of comprehension, particularly for weakened forms of the axiom (see below).

In theories which make no distinction between objects and sets (such as ZF), this formulation leads to Russell's paradox, however in stratified theories this is not a problem (for example second order arithmetic includes the axiom of comprehension).

This axiom can be restricted in various ways. One possibility is to restrict it to forming subsets of sets:
$$\forall Y \exists X \forall x\, (x \in X \leftrightarrow x \in Y \wedge \phi(x)) \quad \text{for any formula } \phi \text{ where } X \text{ does not occur free in } \phi$$
This formulation (used in ZF set theory) is sometimes called the Aussonderungsaxiom.

Another way is to restrict $\phi$ to some family $F$, giving the axiom $F$-CA. For instance the axiom $\Sigma^0_1$-CA is:
$$\exists X \forall x\, (x \in X \leftrightarrow \phi(x)) \quad \text{where } \phi \text{ is } \Sigma^0_1 \text{ and } X \text{ does not occur free in } \phi$$
A third form (usually called separation) uses two formulas, and guarantees only that those satisfying one are included while those satisfying the other are excluded. The unrestricted form is the same as unrestricted collection, but, for instance, $\Sigma^0_1$ separation:
$$\forall x\, \neg(\phi(x) \wedge \psi(x)) \rightarrow \exists X \forall x\, ((\phi(x) \rightarrow x \in X) \wedge (\psi(x) \rightarrow x \notin X))$$
$$\text{where } \phi \text{ and } \psi \text{ are } \Sigma^0_1 \text{ and } X \text{ does not occur free in } \phi \text{ or } \psi$$
is weaker than $\Sigma^0_1$-CA.


Version: 4 Owner: Henry Author(s): Henry

39.5  induction axiom

An induction axiom specifies that a theory includes induction, possibly restricted to specific formulas. IND is the general axiom of induction:
$$\phi(0) \wedge \forall x\, (\phi(x) \rightarrow \phi(x + 1)) \rightarrow \forall x\, \phi(x) \quad \text{for any formula } \phi$$
If $\phi$ is restricted to some family of formulas $F$ then the axiom is called $F$-IND, or $F$ induction. For example the axiom $\Sigma^0_1$-IND is:
$$\phi(0) \wedge \forall x\, (\phi(x) \rightarrow \phi(x + 1)) \rightarrow \forall x\, \phi(x) \quad \text{where } \phi \text{ is } \Sigma^0_1$$
Version: 4 Owner: Henry Author(s): Henry


Chapter 40
03G05 Boolean algebras
40.1  Boolean algebra

A Boolean algebra is a set $B$ with two binary operators, $\wedge$ (meet) and $\vee$ (join), and one unary operator $'$ (complement), which together form a Boolean lattice. If $X$ and $Y$ are Boolean algebras, a mapping $f : X \to Y$ is a morphism of Boolean algebras when it is a morphism of $\wedge$, $\vee$, and $'$.
Version: 6 Owner: greg Author(s): greg

40.2  M. H. Stone's representation theorem

Theorem 3. Given a Boolean algebra $B$ there exists a totally disconnected Hausdorff space $X$ such that $B$ is isomorphic to the Boolean algebra of clopen subsets of $X$.

[Very rough sketch of proof] Let
$$X = \{f : B \to \{0, 1\} \mid f \text{ is a homomorphism}\}$$
endowed with the subspace topology induced by the product topology on $\{0, 1\}^B$. Then $X$ is a totally disconnected Hausdorff space. Let $Cl(X)$ denote the Boolean algebra of clopen subsets of $X$; then the following map
$$T : B \to Cl(X), \qquad T(x) = \{f \in X \mid f(x) = 1\}$$
is well defined (i.e. $T(x)$ is indeed a clopen set), and an isomorphism.
Version: 4 Owner: Dr Absentius Author(s): Dr Absentius

Chapter 41
03G10 Lattices and related structures
41.1  Boolean lattice

A Boolean lattice $B$ is a distributive lattice in which for each element $x \in B$ there exists a complement $x' \in B$ such that
$$x \wedge x' = 0$$
$$x \vee x' = I$$
$$(x')' = x$$
$$(x \wedge y)' = x' \vee y'$$
$$(x \vee y)' = x' \wedge y'$$
Given a set, any collection of subsets that is closed under unions, intersections, and complements is a Boolean algebra.

Boolean rings (with identity, but allowing $0 = 1$) are equivalent to Boolean lattices. To view a Boolean ring as a Boolean lattice, define $x \wedge y = xy$ and $x \vee y = x + y + xy$. To view a Boolean lattice as a Boolean ring, define $xy = x \wedge y$ and $x + y = (x' \wedge y) \vee (x \wedge y')$.
Version: 3 Owner: mathcam Author(s): mathcam, greg
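The ring-to-lattice translation can be checked exhaustively on a small example. Below is an editorial sketch on the Boolean ring of subsets of $\{1, 2, 3\}$, where ring addition is symmetric difference and ring multiplication is intersection; the check confirms that $x + y + xy$ really is the lattice join (union), along with one De Morgan law:

```python
from itertools import combinations

U = frozenset({1, 2, 3})
subsets = [frozenset(c) for r in range(4) for c in combinations(U, r)]

def ring_add(x, y):
    return x ^ y          # symmetric difference

def ring_mul(x, y):
    return x & y          # intersection

for x in subsets:
    for y in subsets:
        join = ring_add(ring_add(x, y), ring_mul(x, y))  # x + y + xy
        assert join == x | y                             # equals the union
        # De Morgan: (x ^ y)' = x' v y', complement taken relative to U
        assert U - (x & y) == (U - x) | (U - y)
print("ok")
```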

41.2  complete lattice

A complete lattice is a nonempty poset in which every nonempty subset has a supremum
and an infimum.

In particular, a complete lattice is a lattice.


Version: 1 Owner: Evandar Author(s): Evandar

41.3  lattice

A lattice is any non-empty poset $P$ in which any two elements $x$ and $y$ have a least upper bound, $x \vee y$, and a greatest lower bound, $x \wedge y$.

In other words, if $q = x \wedge y$ then $q \in P$, $q \leq x$ and $q \leq y$. Further, for all $p \in P$, if $p \leq x$ and $p \leq y$, then $p \leq q$.

Likewise, if $q = x \vee y$ then $q \in P$, $x \leq q$ and $y \leq q$, and for all $p \in P$, if $x \leq p$ and $y \leq p$, then $q \leq p$.

Since $P$ is a poset, the operations $\wedge$ and $\vee$ have the following properties:
$$x \wedge x = x, \quad x \vee x = x \quad \text{(idempotency)}$$
$$x \wedge y = y \wedge x, \quad x \vee y = y \vee x \quad \text{(commutativity)}$$
$$x \wedge (y \wedge z) = (x \wedge y) \wedge z, \quad x \vee (y \vee z) = (x \vee y) \vee z \quad \text{(associativity)}$$
$$x \vee (x \wedge y) = x \wedge (x \vee y) = x \quad \text{(absorption)}$$
Further, $x \leq y$ is equivalent to $x \wedge y = x$ and $x \vee y = y$ (consistency).

Version: 5 Owner: mps Author(s): mps, greg
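A concrete lattice on which all of these laws can be verified by brute force is the set of positive divisors of an integer, ordered by divisibility, with meet = gcd and join = lcm (an editorial check, using the divisors of 60):

```python
# The divisors of 60 form a lattice under divisibility.
from math import gcd
from itertools import product

def lcm(a, b):
    return a * b // gcd(a, b)

D = [d for d in range(1, 61) if 60 % d == 0]

for x, y, z in product(D, repeat=3):
    assert gcd(x, y) == gcd(y, x) and lcm(x, y) == lcm(y, x)   # commutativity
    assert gcd(x, gcd(y, z)) == gcd(gcd(x, y), z)              # associativity
    assert lcm(x, gcd(x, y)) == x == gcd(x, lcm(x, y))         # absorption
    # consistency: x | y  iff  gcd(x, y) == x  iff  lcm(x, y) == y
    assert (y % x == 0) == (gcd(x, y) == x) == (lcm(x, y) == y)
print("ok")
```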

Chapter 42
03G99 Miscellaneous
42.1  Chu space

A Chu space over a set $\Sigma$ is a triple $(A, r, X)$ with $r : A \times X \to \Sigma$. $A$ is called the carrier and $X$ the cocarrier.

Although the definition is symmetrical, in practice asymmetric uses are common. In particular, often $X$ is just taken to be a set of functions from $A$ to $\Sigma$, with $r(a, x) = x(a)$ (such a Chu space is called normal and is abbreviated $(A, X)$).

We define the perp of a Chu space $C = (A, r, X)$ to be $C^\perp = (X, \breve{r}, A)$ where $\breve{r}(x, a) = r(a, x)$.

Define $\hat{r}$ and $\check{r}$ to be functions defining the rows and columns of $C$ respectively, so that $\hat{r}(a) : X \to \Sigma$ and $\check{r}(x) : A \to \Sigma$ are given by $\hat{r}(a)(x) = \check{r}(x)(a) = r(a, x)$. Clearly the rows of $C$ are the columns of $C^\perp$.

Using these definitions, a Chu space can be represented using a matrix.

If $\hat{r}$ is injective then we call $C$ separable and if $\check{r}$ is injective we call $C$ extensional. A Chu space which is both separable and extensional is biextensional.
Version: 3 Owner: Henry Author(s): Henry

42.2  Chu transform

If $C = (A, r, X)$ and $D = (B, s, Y)$ are Chu spaces then we say a pair of functions $f : A \to B$ and $g : Y \to X$ form a Chu transform from $C$ to $D$ if for any $(a, y) \in A \times Y$ we have $r(a, g(y)) = s(f(a), y)$.

Version: 1 Owner: Henry Author(s): Henry

42.3  biextensional collapse

If $C = (A, r, X)$ is a Chu space, we can define the biextensional collapse of $C$ to be $(\hat{r}[A], r', \check{r}[X])$ where $r'(\hat{r}(a), \check{r}(x)) = r(a, x)$.

That is, to name the rows of the biextensional collapse, we just use functions representing the actual rows of the original Chu space (and similarly for the columns). The effect is to merge indistinguishable rows and columns.
We say that two Chu spaces are equivalent if their biextensional collapses are isomorphic.
Version: 3 Owner: Henry Author(s): Henry

42.4  example of Chu space

Any set $A$ can be represented as a Chu space over $\{0, 1\}$ by $(A, r, \mathcal{P}(A))$ with $r(a, X) = 1$ iff $a \in X$. This Chu space satisfies only the trivial property $2^A$, signifying the fact that sets have no internal structure. If $A = \{a, b, c\}$ then the matrix representation is:

        {}  {a}  {b}  {c}  {a,b}  {a,c}  {b,c}  {a,b,c}
    a    0   1    0    0     1      1      0       1
    b    0   0    1    0     1      0      1       1
    c    0   0    0    1     0      1      1       1

Increasing the structure of a Chu space, that is, adding properties, is equivalent to deleting columns. For instance we can delete the columns named $\{c\}$ and $\{b, c\}$ to turn this into the partial order satisfying $c \leq a$. By deleting more columns, we can further increase the structure. For example, if we require that the set of rows be closed under the bitwise or operation (and delete those columns which would prevent this) then it will define a semilattice, and if it is closed under both bitwise or and bitwise and then it will define a lattice. If the rows are also closed under complementation then we have a Boolean algebra. Note that these are not arbitrary connections: the Chu transforms on each of these classes of Chu spaces correspond to the appropriate notion of homomorphism for those classes.

For instance, to see that Chu transforms are order preserving on Chu spaces viewed as partial orders, let $C = (A, r, X)$ be a Chu space satisfying $b \leq a$. That is, for any $x \in X$ we have $r(b, x) = 1 \Rightarrow r(a, x) = 1$. Then let $(f, g)$ be a Chu transform to $D = (B, s, Y)$, and suppose $s(f(b), y) = 1$. Then $r(b, g(y)) = 1$ by the definition of a Chu transform, and then we have $r(a, g(y)) = 1$ and so $s(f(a), y) = 1$, demonstrating that $f(b) \leq f(a)$.

Version: 2 Owner: Henry Author(s): Henry
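The powerset Chu space above is small enough to build explicitly. The following editorial sketch constructs it for $A = \{a, b, c\}$ and checks that it is separable (distinct rows) and extensional (distinct columns), hence biextensional:

```python
from itertools import combinations

# Rows are indexed by points of A, columns by subsets of A;
# the entry is 1 iff the point lies in the subset.
A = ('a', 'b', 'c')
cols = [frozenset(c) for n in range(len(A) + 1) for c in combinations(A, n)]
r = {(a, X): int(a in X) for a in A for X in cols}

row = lambda a: tuple(r[(a, X)] for X in cols)   # the function r-hat(a)
col = lambda X: tuple(r[(a, X)] for a in A)      # the function r-check(X)

assert len({row(a) for a in A}) == len(A)        # separable
assert len({col(X) for X in cols}) == len(cols)  # extensional
print("biextensional")
```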

42.5  property of a Chu space

A property of a Chu space over $\Sigma$ with carrier $A$ is some $Y \subseteq \Sigma^A$. We say that a Chu space $C = (A, r, X)$ satisfies $Y$ if $\check{r}[X] \subseteq Y$.

For example, every Chu space satisfies the property $\Sigma^A$.
Version: 2 Owner: Henry Author(s): Henry


Chapter 43
05-00 General reference works (handbooks, dictionaries, bibliographies, etc.)
43.1  example of pigeonhole principle

A simple example. For any group of 8 integers, there exist at least two of them whose difference is divisible by 7.

Consider the residue classes modulo 7. These are 0, 1, 2, 3, 4, 5, 6. We have seven classes and eight integers. So it must be the case that two integers fall in the same residue class, and therefore their difference will be divisible by 7.

Version: 1 Owner: drini Author(s): drini
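The argument is constructive: grouping the integers into their seven residue classes always exposes a collision. An editorial sketch in Python (the test data is arbitrary):

```python
import random

def pair_with_divisible_difference(nums):
    """Among any 8 integers, find two whose difference is divisible by 7,
    by filing them into the 7 residue classes mod 7 (the pigeonholes)."""
    seen = {}                      # residue -> first integer with that residue
    for n in nums:
        r = n % 7
        if r in seen:
            return seen[r], n      # same class, so the difference is a multiple of 7
        seen[r] = n

random.seed(0)
nums = random.sample(range(1000), 8)
a, b = pair_with_divisible_difference(nums)
assert (a - b) % 7 == 0
print("found a pair")
```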

43.2  multi-index derivative of a power

Theorem. If $i, k$ are multi-indices in $\mathbb{N}^n$, and $x = (x_1, \ldots, x_n)$, then
$$\partial^i x^k = \begin{cases} \frac{k!}{(k-i)!}\, x^{k-i} & \text{if } i \leq k, \\ 0 & \text{otherwise.} \end{cases}$$
Proof. The proof follows from the corresponding rule for the ordinary derivative; if $i, k$ are in $\{0, 1, 2, \ldots\}$, then
$$\frac{d^i}{dx^i} x^k = \begin{cases} \frac{k!}{(k-i)!}\, x^{k-i} & \text{if } i \leq k, \\ 0 & \text{otherwise.} \end{cases} \qquad (43.2.1)$$

Suppose $i = (i_1, \ldots, i_n)$, $k = (k_1, \ldots, k_n)$, and $x = (x_1, \ldots, x_n)$. Then we have that
$$\partial^i x^k = \frac{\partial^{|i|}}{\partial x_1^{i_1} \cdots \partial x_n^{i_n}}\, x_1^{k_1} \cdots x_n^{k_n} = \frac{\partial^{i_1}}{\partial x_1^{i_1}} x_1^{k_1} \cdots \frac{\partial^{i_n}}{\partial x_n^{i_n}} x_n^{k_n}.$$
For each $r = 1, \ldots, n$, the function $x_r^{k_r}$ only depends on $x_r$. In the above, each partial differentiation $\partial/\partial x_r$ therefore reduces to the corresponding ordinary differentiation $d/dx_r$. Hence, from equation 43.2.1, it follows that $\partial^i x^k$ vanishes if $i_r > k_r$ for any $r = 1, \ldots, n$. If this is not the case, i.e., if $i \leq k$ as multi-indices, then for each $r$,
$$\frac{d^{i_r}}{dx_r^{i_r}} x_r^{k_r} = \frac{k_r!}{(k_r - i_r)!}\, x_r^{k_r - i_r},$$
and the theorem follows. $\Box$
Version: 4 Owner: matte Author(s): matte
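Since the derivative of a monomial is again a monomial, the theorem reduces to computing the coefficient $k!/(k-i)!$ componentwise. An editorial check in plain Python (no symbolic algebra needed):

```python
from math import factorial, prod

def partial_monomial(i, k):
    """Coefficient of the multi-index derivative of x^k under d^i:
    k!/(k-i)! when i <= k componentwise, else 0 (the monomial vanishes)."""
    if any(ir > kr for ir, kr in zip(i, k)):
        return 0
    return prod(factorial(kr) // factorial(kr - ir) for ir, kr in zip(i, k))

# d^2/(dx dy) of x^3 y^2 = 3 * 2 * x^2 y, so the coefficient is 6.
print(partial_monomial((1, 1), (3, 2)))  # 6
print(partial_monomial((2, 3), (3, 2)))  # 0, since 3 > 2 in the y-slot
```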

43.3  multi-index notation

Definition [1, 2, 3] A multi-index is an $n$-tuple $i = (i_1, \ldots, i_n)$ of non-negative integers $i_1, \ldots, i_n$. In other words, $i \in \mathbb{N}^n$. Usually, $n$ is the dimension of the underlying space, and its value is assumed to be clear from the context.
Operations on multi-indices

For a multi-index $i$, we define the length (or order) as
$$|i| = i_1 + \cdots + i_n,$$
and the factorial as
$$i! = \prod_{k=1}^{n} i_k!.$$
If $i = (i_1, \ldots, i_n)$ and $j = (j_1, \ldots, j_n)$ are two multi-indices, their sum and difference is defined component-wise as
$$i + j = (i_1 + j_1, \ldots, i_n + j_n),$$
$$i - j = (i_1 - j_1, \ldots, i_n - j_n).$$
Thus $|i \pm j| = |i| \pm |j|$. Also, if $j_k \leq i_k$ for all $k = 1, \ldots, n$, then we write $j \leq i$. For multi-indices $i, j$, with $j \leq i$, we define
$$\binom{i}{j} = \frac{i!}{(i - j)!\, j!}.$$
For a point $x = (x_1, \ldots, x_n)$ in $\mathbb{R}^n$ (with standard coordinates) we define
$$x^i = \prod_{k=1}^{n} x_k^{i_k}.$$
Also, if $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth function, and $i = (i_1, \ldots, i_n)$ is a multi-index, we define
$$\partial^i f = \frac{\partial^{|i|} f}{\partial_{e_1}^{i_1} \cdots \partial_{e_n}^{i_n}},$$
where $e_1, \ldots, e_n$ are the standard unit vectors of $\mathbb{R}^n$. Since $f$ is sufficiently smooth, the order in which the derivations are performed is irrelevant. For multi-indices $i$ and $j$, we thus have
$$\partial^i \partial^j = \partial^{i+j} = \partial^{j+i} = \partial^j \partial^i.$$
Much of the motivation for the above notation is that standard results such as Leibniz' rule, Taylor's formula, etc. can be written more or less as-is in many dimensions by replacing indices in $\mathbb{N}$ with multi-indices. Below are some examples of this.
Examples

1. If $n$ is a positive integer, and $x_1, \ldots, x_k$ are complex numbers, the multinomial expansion states that
$$(x_1 + \cdots + x_k)^n = n! \sum_{|i| = n} \frac{x^i}{i!},$$
where $x = (x_1, \ldots, x_k)$ and $i$ is a multi-index. (proof)

2. Leibniz' rule [1]: If $f, g : \mathbb{R}^n \to \mathbb{R}$ are smooth functions, and $j$ is a multi-index, then
$$\partial^j (fg) = \sum_{i \leq j} \binom{j}{i}\, \partial^i(f)\, \partial^{j-i}(g),$$
where $i$ is a multi-index.

REFERENCES
1. http://www.math.umn.edu/~jodeit/course/TmprDist1.pdf
2. M. Reed, B. Simon, Methods of Modern Mathematical Physics, I: Functional Analysis, Academic Press, 1980.
3. E. Weisstein, Eric W. Weisstein's world of mathematics, entry on Multi-Index Notation

Version: 8 Owner: matte Author(s): matte
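The multinomial expansion in Example 1 can be verified numerically by summing over all multi-indices $i$ with $|i| = n$ (an editorial check; integer inputs keep the arithmetic exact):

```python
from math import factorial, prod
from itertools import product

def multinomial_expansion(xs, n):
    """Evaluate n! * sum over |i| = n of x^i / i!, for integer xs."""
    k = len(xs)
    total = 0
    for i in product(range(n + 1), repeat=k):
        if sum(i) == n:
            x_pow = prod(x**e for x, e in zip(xs, i))
            i_fact = prod(factorial(e) for e in i)
            total += factorial(n) * x_pow // i_fact   # exact division
    return total

xs = (1, 2, 3)
assert multinomial_expansion(xs, 4) == sum(xs)**4
print(multinomial_expansion(xs, 4))  # 1296
```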



Chapter 44
05A10 Factorials, binomial coefficients, combinatorial functions
44.1  Catalan numbers

The Catalan numbers, or Catalan sequence, have many interesting applications in combinatorics.

The $n$th Catalan number is given by:
$$C_n = \frac{1}{n+1} \binom{2n}{n}$$
where $\binom{n}{r}$ represents the binomial coefficient. The first several Catalan numbers are 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ... (see EIS sequence A000108 for more terms). The Catalan numbers are also generated by the recurrence relation
$$C_0 = 1, \qquad C_n = \sum_{i=0}^{n-1} C_i C_{n-1-i}.$$
For example, $C_3 = 1 \cdot 2 + 1 \cdot 1 + 2 \cdot 1 = 5$, $C_4 = 1 \cdot 5 + 1 \cdot 2 + 2 \cdot 1 + 5 \cdot 1 = 14$, etc.

The ordinary generating function for the Catalan numbers is
$$\sum_{n=0}^{\infty} C_n z^n = \frac{1 - \sqrt{1 - 4z}}{2z}.$$
Interpretations of the nth Catalan number include:

322

1. The number of ways to arrange n pairs of matching parentheses, e.g.:


()
(()) ()()
((())) (()()) ()(()) (())() ()()()
2. The number of ways a polygon of n + 2 sides can be split into n triangles.
3. The number of rooted binary trees with exactly n + 1 leaves.
The Catalan sequence is named for Eugène Charles Catalan, but it was discovered in 1751 by Euler when he was trying to solve the problem of subdividing polygons into triangles.

REFERENCES
1. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. Addison-Wesley, 1998. Zbl 0836.00001.

Version: 3 Owner: bbukh Author(s): bbukh, vampyr

44.2

Levi-Civita permutation symbol

Definition. Let k_i ∈ {1, . . . , n} for all i = 1, . . . , n. The Levi-Civita permutation
symbols ε_{k_1 ⋯ k_n} and ε^{k_1 ⋯ k_n} are defined as

    ε_{k_1 ⋯ k_n} = ε^{k_1 ⋯ k_n} =
        +1  when {l ↦ k_l} is an even permutation (of {1, . . . , n}),
        −1  when {l ↦ k_l} is an odd permutation,
         0  otherwise, i.e., when k_i = k_j for some i ≠ j.

The Levi-Civita permutation symbol is a special case of the generalized Kronecker delta symbol.
Using this fact one can write the Levi-Civita permutation symbol as the determinant of an
n n matrix consisting of traditional delta symbols. See the entry on the generalized Kronecker symbol for details.
When using the Levi-Civita permutation symbol and the generalized Kronecker delta symbol,
the Einstein summation convention is usually employed. Below, we shall also use this
convention.

Properties.


When n = 2, we have for all i, j, m, n in {1, 2},

    ε_{ij} ε_{mn} = δ_{im} δ_{jn} − δ_{in} δ_{jm},     (44.2.1)
    ε_{ij} ε_{in} = δ_{jn},                            (44.2.2)
    ε_{ij} ε_{ij} = 2.                                 (44.2.3)

When n = 3, we have for all i, j, k, m, n in {1, 2, 3},

    ε_{jmn} ε_{imn} = 2 δ_{ji},                        (44.2.4)
    ε_{ijk} ε_{ijk} = 6.                               (44.2.5)

Let us prove these properties. The proofs are instructional since they demonstrate typical
argumentation methods for manipulating the permutation symbols.
Proof. For equation 44.2.1, let us first note that both sides are antisymmetric with respect
to ij and mn. We therefore only need to consider the case i ≠ j and m ≠ n. By substitution,
we see that the equation holds for ε_{12} ε_{12}, i.e., for i = m = 1 and j = n = 2. (Both sides are
then one.) Since the equation is antisymmetric in ij and mn, any set of values for these
can be reduced to the above case (which holds). The equation thus holds for all values of
ij and mn. Using equation 44.2.1, we have for equation 44.2.2

    ε_{ij} ε_{in} = δ_{ii} δ_{jn} − δ_{in} δ_{ji}
                 = 2 δ_{jn} − δ_{jn}
                 = δ_{jn}.

Here we used the Einstein summation convention with i going from 1 to 2. Equation 44.2.3
follows similarly from equation 44.2.2. To establish equation 44.2.4, let us first observe that
both sides vanish when i ≠ j. Indeed, if i ≠ j, then one cannot choose m and n such
that both permutation symbols on the left are nonzero. Then, with i = j fixed, there are
only two ways to choose m and n from the remaining two indices. For any such indices,
we have ε_{jmn} ε_{imn} = (ε_{imn})^2 = 1 (no summation), and the result follows. The last property
follows since 3! = 6 and for any distinct indices i, j, k in {1, 2, 3}, we have ε_{ijk} ε_{ijk} = 1 (no
summation). □
Examples.

The determinant of an n × n matrix A = (a_{ij}) can be written as

    det A = ε_{i_1 ⋯ i_n} a_{1 i_1} ⋯ a_{n i_n},

where each i_l should be summed over 1, . . . , n.

If A = (A^1, A^2, A^3) and B = (B^1, B^2, B^3) are vectors in R^3 (represented in some right-hand
oriented orthonormal basis), then the ith component of their cross product equals

    (A × B)^i = ε^{ijk} A^j B^k.

For instance, the first component of A × B is A^2 B^3 − A^3 B^2. From the above expression
for the cross product, it is clear that A × B = −B × A. Further, if C = (C^1, C^2, C^3)
is a vector like A and B, then the triple scalar product equals

    A · (B × C) = ε^{ijk} A^i B^j C^k.

From this expression, it can be seen that the triple scalar product is antisymmetric
when exchanging any adjacent arguments. For example, A · (B × C) = −B · (A × C).

Suppose F = (F^1, F^2, F^3) is a vector field defined on some domain of R^3 with Cartesian
coordinates x = (x^1, x^2, x^3). Then the ith component of the curl of F equals

    (∇ × F)^i(x) = ε^{ijk} \frac{\partial}{\partial x^j} F^k(x).
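The sign rule in the definition and the cross-product formula can be checked directly. In this Python sketch (an illustration, not part of the original entry), ε is computed by counting inversions:

```python
def epsilon(*idx):
    """Levi-Civita symbol: 0 on a repeated index, else the permutation's sign."""
    if len(set(idx)) != len(idx):
        return 0
    sign = 1
    idx = list(idx)
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            if idx[i] > idx[j]:
                sign = -sign
    return sign

def cross(A, B):
    """(A x B)^i = eps(i, j, k) A^j B^k, with j and k summed over 1..3."""
    return [sum(epsilon(i, j, k) * A[j - 1] * B[k - 1]
                for j in range(1, 4) for k in range(1, 4))
            for i in range(1, 4)]

print(cross([1, 0, 0], [0, 1, 0]))  # [0, 0, 1]
```

Summing ε_{ijk} ε_{ijk} over all indices reproduces the value 6 from equation (44.2.5).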

Version: 7 Owner: matte Author(s): matte

44.3

Pascal's rule (bit string proof)


This proof is based on an alternate, but equivalent, definition of the binomial coefficient: \binom{n}{r}
is the number of bit strings (finite sequences of 0s and 1s) of length n with exactly r ones.
We want to show that

    \binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}.

To do so, we will show that both sides of the equation are counting the same set of bit
strings.
The left-hand side counts the set of strings of n bits with r ones. Suppose we take one of these
strings and remove the first bit b. There are two cases: either b = 1, or b = 0.

If b = 1, then the new string is n − 1 bits with r − 1 ones; there are \binom{n-1}{r-1} bit strings of this
nature.

If b = 0, then the new string is n − 1 bits with r ones, and there are \binom{n-1}{r} strings of this
nature.
Therefore every string counted on the left is covered by one, but not both, of these two cases.
If we add the two cases, we find that

    \binom{n-1}{r-1} + \binom{n-1}{r} = \binom{n}{r}.
Version: 2 Owner: vampyr Author(s): vampyr

44.4

Pascal's rule proof

We need to show

    \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}.

Let us begin by writing the right-hand side as

    \frac{n!}{k!(n-k)!} + \frac{n!}{(k-1)!(n-(k-1))!}.

Getting a common denominator and simplifying, we have

    \frac{n!}{k!(n-k)!} + \frac{n!}{(k-1)!(n-k+1)!}
      = \frac{(n-k+1)\,n!}{(n-k+1)\,k!(n-k)!} + \frac{k\,n!}{k\,(k-1)!(n-k+1)!}
      = \frac{(n-k+1)\,n! + k\,n!}{k!(n-k+1)!}
      = \frac{(n+1)\,n!}{k!((n+1)-k)!}
      = \frac{(n+1)!}{k!((n+1)-k)!}
      = \binom{n+1}{k}.
Version: 5 Owner: akrowne Author(s): akrowne

44.5

Pascal's triangle

Pascal's triangle is the following configuration of numbers:

                          1
                        1   1
                      1   2   1
                    1   3   3   1
                  1   4   6   4   1
                1   5  10  10   5   1
              1   6  15  20  15   6   1
            1   7  21  35  35  21   7   1
                         ...

This triangle goes on into infinity. Therefore we have only printed the first 8 lines. In general,
this triangle is constructed such that entries on the left side and right side are 1, and every
entry inside the triangle is obtained by summing the two entries immediately above it. For
instance, on the fourth row, 4 = 1 + 3.
Historically, the application of this triangle has been to give the coefficients when expanding
binomial expressions. For instance, to expand (a + b)^4, one simply looks up the coefficients
on the fourth row, and writes

    (a + b)^4 = a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4.
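The construction rule described above (the edges are 1, and each interior entry is the sum of the two entries above it) translates directly into a short Python sketch:

```python
def pascal_rows(n):
    """First n rows of Pascal's triangle; each interior entry sums the two above."""
    rows = [[1]]
    for _ in range(n - 1):
        prev = rows[-1]
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows

for row in pascal_rows(8):
    print(row)
# The row [1, 4, 6, 4, 1] gives the coefficients of (a + b)^4.
```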
Pascal's triangle is named after the French mathematician Blaise Pascal (1623-1662) [3].
However, this triangle was known at least around 1100 AD in China, five centuries before Pascal [1]. In modern language, the expansion of the binomial is given by the binomial theorem
discovered by Isaac Newton in 1665 [2]: For any n = 1, 2, . . . and real numbers a, b, we have
    (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k
              = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + ⋯ + b^n.

Thus, in Pascal's triangle, the entries on the nth row are given by the binomial coefficients

    \binom{n}{k} = \frac{n!}{(n-k)!\,k!}

for k = 0, . . . , n.

REFERENCES
1. Wikipedia's entry on the binomial coefficients
2. Wikipedia's entry on Isaac Newton
3. Wikipedia's entry on Blaise Pascal

Version: 1 Owner: Koro Author(s): matte


44.6

Upper and lower bounds to binomial coefficient


 
    \binom{n}{k} \le \frac{n^k}{k!}

    \binom{n}{k} \le \left(\frac{ne}{k}\right)^k

    \binom{n}{k} \ge \left(\frac{n}{k}\right)^k

Also, for large n:

    \binom{n}{k} \sim \frac{n^k}{k!}.
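As a quick numerical illustration of the three bounds (a Python check, with n = 20 and k = 5 chosen arbitrarily):

```python
from math import comb, e, factorial

n, k = 20, 5
binom = comb(n, k)
# (n/k)^k <= C(n, k) <= n^k/k! <= (ne/k)^k
assert (n / k) ** k <= binom <= n ** k / factorial(k) <= (n * e / k) ** k
print(binom)  # 15504
```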

Version: 1 Owner: gantsich Author(s): gantsich

44.7

binomial coefficient

The number of ways to choose r objects from a set with n elements (n ≥ r) is given by

    \frac{n!}{(n-r)!\,r!}.

It is usually denoted in several ways, like

    \binom{n}{r},   C(n, r),   C^n_r.

These numbers are called binomial coefficients, because they show up when expanding (x + y)^n.
Some interesting properties:

    \binom{n}{r} is the coefficient of x^r y^{n-r} in (x + y)^n (binomial theorem).

    \binom{n}{r} = \binom{n}{n-r}.

    \binom{n}{r-1} + \binom{n}{r} = \binom{n+1}{r} (Pascal's rule).

    \binom{n}{0} = 1 = \binom{n}{n} for all n.

    \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + ⋯ + \binom{n}{n} = 2^n.

    \binom{n}{0} − \binom{n}{1} + \binom{n}{2} − ⋯ + (−1)^n \binom{n}{n} = 0.

    \sum_{t=1}^{n} \binom{t}{k} = \binom{n+1}{k+1}.


In the context of Computer Science, it also helps to see \binom{n}{r} as the number of strings
consisting of ones and zeros with r ones and n − r zeros. This equivalence comes from the
fact that if S is a finite set with n elements, \binom{n}{r} is the number of distinct subsets of S with
r elements. For each subset T of S, consider the function

    X_T : S → {0, 1}

where X_T(x) = 1 whenever x ∈ T and 0 otherwise (so X_T is the characteristic function for
T). For each T ∈ P(S), X_T can be used to produce a unique bit string of length n with
exactly r ones.
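The subset/bit-string correspondence can be made concrete with a few lines of Python (the helper name is ours):

```python
from itertools import combinations
from math import comb

def bitstrings_with_r_ones(n, r):
    """One length-n bit string per r-element subset T of {0, ..., n-1}."""
    return [''.join('1' if i in T else '0' for i in range(n))
            for T in map(set, combinations(range(n), r))]

strings = bitstrings_with_r_ones(4, 2)
print(len(strings), comb(4, 2))  # 6 6
```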
Version: 14 Owner: drini Author(s): drini

44.8

double factorial

The double factorial of a positive integer n is

    n!! = n(n − 2) ⋯ k_n

where k_n denotes 1 if n is odd and 2 if n is even.

For example,

    7!! = 7 · 5 · 3 · 1 = 105
    10!! = 10 · 8 · 6 · 4 · 2 = 3840

Note that n!! is not the same as (n!)!.
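A direct Python sketch of the definition:

```python
def double_factorial(n):
    """n!! = n (n - 2) (n - 4) ..., ending at 1 (n odd) or 2 (n even)."""
    result = 1
    while n > 0:
        result *= n
        n -= 2
    return result

print(double_factorial(7), double_factorial(10))  # 105 3840
```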
Version: 3 Owner: drini Author(s): Larry Hammick, Riemann

44.9

factorial

For any non-negative integer n, the factorial of n, denoted n!, can be defined by

    n! = \prod_{r=1}^{n} r,

where for n = 0 the empty product is taken to be 1.

Alternatively, the factorial can be defined recursively by 0! = 1 and n! = n(n − 1)! for n > 0.
n! is equal to the number of permutations of n distinct objects. For example, there are 5!
ways to arrange the five letters A, B, C, D and E into a word.

Euler's gamma function Γ(x) generalizes the notion of factorial to almost all complex values,
as

    Γ(n + 1) = n!

for every non-negative integer n.
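A small Python check of the product definition and the Gamma-function relation (using the standard-library `math.gamma`):

```python
from math import gamma, prod

def factorial(n):
    """n! as the product 1 * 2 * ... * n; the empty product gives 0! = 1."""
    return prod(range(1, n + 1))

assert factorial(0) == 1 and factorial(5) == 120
assert round(gamma(5 + 1)) == factorial(5)  # Gamma(n + 1) = n!
```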
Version: 13 Owner: yark Author(s): yark, Riemann

44.10

falling factorial

For n ∈ N, the rising and falling factorials are nth degree polynomials described, respectively,
by

    x^{\overline{n}} = x(x + 1) . . . (x + n − 1),
    x^{\underline{n}} = x(x − 1) . . . (x − n + 1).

The two types of polynomials are related by:

    x^{\overline{n}} = (−1)^n (−x)^{\underline{n}}.
The rising factorial is often written as (x)n , and referred to as the Pochhammer symbol (see
hypergeometric series). Unfortunately, the falling factorial is also often denoted by (x)n , so
great care must be taken when encountering this notation.
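In code the two products and the relation between them look like this (a Python sketch; the function names are ours, chosen to sidestep the notational clash discussed below):

```python
def rising_factorial(x, n):
    """x(x + 1)...(x + n - 1)."""
    result = 1
    for k in range(n):
        result *= x + k
    return result

def falling_factorial(x, n):
    """x(x - 1)...(x - n + 1)."""
    result = 1
    for k in range(n):
        result *= x - k
    return result

# The relation between the two: rising(x, n) = (-1)^n falling(-x, n)
assert rising_factorial(3, 4) == (-1) ** 4 * falling_factorial(-3, 4) == 360
```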
Notes.
Unfortunately, the notational conventions for the rising and falling factorials lack a common
standard, and are plagued with a fundamental inconsistency. An examination of reference
works and textbooks reveals two fundamental sources of notation: works in combinatorics
and works dealing with hypergeometric functions.
Works of combinatorics [1,2,3] give greater focus to the falling factorial because of its role
in defining the Stirling numbers. The symbol (x)_n almost always denotes the falling factorial.
The notation for the rising factorial varies widely; we find ⟨x⟩_n in [1] and (x)^{(n)} in [3].
Works focusing on special functions [4,5] universally use (x)_n to denote the rising factorial and
use this symbol in the description of the various flavours of hypergeometric series. Watson [5]
credits this notation to Pochhammer [6], and indeed the special functions literature eschews
"falling factorial" in favour of "Pochhammer symbol". Curiously, according to Knuth [7],
Pochhammer himself used (x)_n to denote the binomial coefficient. (Note: I haven't verified
this.)
The notation featured in this entry is due to D. Knuth [7,8]. Given the fundamental inconsistency in the existing notations, it seems sensible to break with both traditions, and
to adopt new and graphically suggestive notation for these two concepts. The traditional
notation, especially in the hypergeometric camp, is so deeply entrenched that, realistically,
one needs to be familiar with the traditional modes and to take care when encountering the
symbol (x)_n.
References
1. Comtet, Advanced combinatorics.
2. Jordan, Calculus of finite differences.
3. Riordan, Introduction to combinatorial analysis.
4. Erdelyi, et. al., Bateman manuscript project.
5. Watson, A treatise on the theory of Bessel functions.
6. Pochhammer, Ueber hypergeometrische Functionen nter Ordnung, Journal für die
reine und angewandte Mathematik 71 (1870), 316-352.
7. Knuth, Two notes on notation download
8. Greene, Knuth, Mathematics for the analysis of algorithms.
Version: 7 Owner: rmilson Author(s): rmilson

44.11

inductive proof of binomial theorem

When n = 1,

    (a + b)^1 = \sum_{k=0}^{1} \binom{1}{k} a^{1-k} b^k = \binom{1}{0} a^1 b^0 + \binom{1}{1} a^0 b^1 = a + b.

For the inductive step, assume it holds for m. Then for n = m + 1,

    (a + b)^{m+1}
      = a(a + b)^m + b(a + b)^m
      = a \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k + b \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^j   (by the inductive hypothesis)
      = \sum_{k=0}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^{j+1}   (multiplying through by a and b)
      = a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{j=0}^{m} \binom{m}{j} a^{m-j} b^{j+1}   (pulling out the k = 0 term)
      = a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=1}^{m+1} \binom{m}{k-1} a^{m-k+1} b^k   (letting j = k − 1)
      = a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=1}^{m} \binom{m}{k-1} a^{m+1-k} b^k + b^{m+1}   (pulling out the k = m + 1 term)
      = a^{m+1} + b^{m+1} + \sum_{k=1}^{m} \left[ \binom{m}{k} + \binom{m}{k-1} \right] a^{m+1-k} b^k   (combining the sums)
      = a^{m+1} + b^{m+1} + \sum_{k=1}^{m} \binom{m+1}{k} a^{m+1-k} b^k   (by Pascal's rule)
      = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k   (adding in the k = 0 and k = m + 1 terms),

as desired.
Version: 5 Owner: KimJ Author(s): KimJ

44.12

multinomial theorem

A multinomial is a mathematical expression consisting of two or more terms, e.g.

    a_1 x_1 + a_2 x_2 + . . . + a_k x_k.

The multinomial theorem provides the general form of the expansion of the powers of this
expression, in the process specifying the multinomial coefficients which are found in that
expansion. The expansion is:

    (x_1 + x_2 + . . . + x_k)^n = \sum \frac{n!}{n_1! n_2! ⋯ n_k!} x_1^{n_1} x_2^{n_2} ⋯ x_k^{n_k},     (44.12.1)

where the sum is taken over all multi-indices (n_1, . . . , n_k) ∈ N^k that sum to n.

The expression \frac{n!}{n_1! n_2! ⋯ n_k!} occurring in the expansion is called the multinomial coefficient
and is denoted by

    \binom{n}{n_1, n_2, . . . , n_k}.
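A minimal Python sketch of the multinomial coefficient (the function name is ours):

```python
from math import factorial

def multinomial(n, ks):
    """n! / (n_1! n_2! ... n_k!) for a tuple ks with sum(ks) == n."""
    assert sum(ks) == n
    result = factorial(n)
    for k in ks:
        result //= factorial(k)
    return result

# Coefficient of x1^2 x2 x3 in (x1 + x2 + x3)^4:
print(multinomial(4, (2, 1, 1)))  # 12
```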

Version: 7 Owner: bshanks Author(s): yark, bbukh, rmilson, bshanks

44.13

multinomial theorem (proof )

Proof. The below proof of the multinomial theorem uses the binomial theorem and induction
on k. In addition, we shall use multi-index notation.
First, for k = 1, both sides equal x_1^n. For the induction step, suppose the multinomial
theorem holds for k. Then the binomial theorem and the induction assumption yield

    (x_1 + ⋯ + x_k + x_{k+1})^n
      = \sum_{l=0}^{n} \binom{n}{l} (x_1 + ⋯ + x_k)^l x_{k+1}^{n−l}
      = \sum_{l=0}^{n} \binom{n}{l} l! \sum_{|i|=l} \frac{x^i}{i!} x_{k+1}^{n−l}
      = n! \sum_{l=0}^{n} \sum_{|i|=l} \frac{x^i x_{k+1}^{n−l}}{i!\,(n−l)!},

where x = (x_1, . . . , x_k) and i is a multi-index in I_+^k. To complete the proof, we need to show
that the sets

    A = {(i_1, . . . , i_k, n − l) ∈ I_+^{k+1} | l = 0, . . . , n, |(i_1, . . . , i_k)| = l},
    B = {j ∈ I_+^{k+1} | |j| = n}

are equal. The inclusion A ⊆ B is clear since

    |(i_1, . . . , i_k, n − l)| = l + n − l = n.

For B ⊆ A, suppose j = (j_1, . . . , j_{k+1}) ∈ I_+^{k+1} and |j| = n. Let l = |(j_1, . . . , j_k)|. Then
l = n − j_{k+1}, so j_{k+1} = n − l for some l = 0, . . . , n. It follows that A = B.

Let us define y = (x_1, . . . , x_{k+1}) and let j = (j_1, . . . , j_{k+1}) be a multi-index in I_+^{k+1}. Then

    (x_1 + ⋯ + x_{k+1})^n = n! \sum_{|j|=n} \frac{x^{(j_1, ..., j_k)} x_{k+1}^{j_{k+1}}}{(j_1, . . . , j_k)!\, j_{k+1}!}
                          = n! \sum_{|j|=n} \frac{y^j}{j!}.

This completes the proof. □


Version: 1 Owner: matte Author(s): matte

44.14

proof of upper and lower bounds to binomial coefficient

Let 2 ≤ k ≤ n be natural numbers. We'll first prove the inequality

    \binom{n}{k} \le \left(\frac{ne}{k}\right)^k.

We rewrite \binom{n}{k} as

    \binom{n}{k} = \frac{n^k}{k!} \left(1 − \frac{1}{n}\right) ⋯ \left(1 − \frac{k−1}{n}\right)

to get

    \frac{(n−1) ⋯ (n−k+1)}{n^{k−1}} < 1.

Multiplying the inequality above by \frac{k^k}{k!} < e^{k−1} yields

    \binom{n}{k} = \frac{n(n−1) ⋯ (n−k+1)}{k!}
                 \le \frac{n^k}{k^k} \cdot \frac{k^k}{k!}
                 < \left(\frac{n}{k}\right)^k e^{k−1}
                 < \left(\frac{ne}{k}\right)^k.

To conclude the proof we show that

    \frac{n^n}{n!} = \prod_{i=1}^{n−1} \left(1 + \frac{1}{i}\right)^i,   n ≥ 2.     (44.14.1)

Indeed,

    \prod_{i=1}^{n−1} \left(1 + \frac{1}{i}\right)^i
      = \prod_{i=1}^{n−1} \frac{(i+1)^i}{i^i}
      = \frac{\prod_{i=2}^{n} i^{i−1}}{(n−1)! \prod_{i=1}^{n−1} i^{i−1}}
      = \frac{n^{n−1}}{(n−1)!} = \frac{n^n}{n!}.

Since each left-hand factor in (44.14.1) is < e, we have k^k/k! < e^{k−1}, which was used above.
Since n − i < n for 1 ≤ i ≤ k − 1, we immediately get

    \binom{n}{k} = \frac{1}{k!} \prod_{i=0}^{k−1} (n − i) < \frac{n^k}{k!}.

And from

    k ≤ n ⟹ (n − i) k ≥ (k − i) n,   1 ≤ i ≤ k − 1,

we obtain

    \binom{n}{k} = \frac{n}{k} \prod_{i=1}^{k−1} \frac{n − i}{k − i} \ge \left(\frac{n}{k}\right)^k.

Version: 4 Owner: Thomas Heye Author(s): Thomas Heye


Chapter 45
05A15 Exact enumeration problems,
generating functions
45.1

Stirling numbers of the first kind

Introduction. The Stirling numbers of the first kind, frequently denoted as

    s(n, k),   k, n ∈ N,   1 ≤ k ≤ n,

are the integer coefficients of the falling factorial polynomials. To be more precise, the
defining relation for the Stirling numbers of the first kind is:

    x^{\underline{n}} = x(x − 1)(x − 2) . . . (x − n + 1) = \sum_{k=1}^{n} s(n, k) x^k.

Here is the table of some initial values.

    n\k    1    2    3    4   5
     1     1
     2    -1    1
     3     2   -3    1
     4    -6   11   -6    1
     5    24  -50   35  -10   1

Recurrence Relation. The evident observation that

    x^{\underline{n+1}} = x \cdot x^{\underline{n}} − n x^{\underline{n}}

leads to the following equivalent characterization of the s(n, k), in terms of a 2-place recurrence formula:

    s(n + 1, k) = s(n, k − 1) − n s(n, k),   1 ≤ k < n,

subject to the following initial conditions:

    s(n, 0) = 0,   s(1, 1) = 1.
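The recurrence, together with the initial conditions, determines all the values in the table above; a direct Python sketch:

```python
def s(n, k):
    """Signed Stirling numbers of the first kind,
    via s(n + 1, k) = s(n, k - 1) - n s(n, k)."""
    if n == k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return s(n - 1, k - 1) - (n - 1) * s(n - 1, k)

print([s(5, k) for k in range(1, 6)])  # [24, -50, 35, -10, 1]
```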

Generating Function. There is also a strong connection with the generalized binomial formula,
which furnishes us with the following generating function:

    (1 + t)^x = \sum_{n=0}^{\infty} \sum_{k=1}^{n} s(n, k) x^k \frac{t^n}{n!}.

This generating function implies a number of identities. Taking the derivative of both sides
with respect to t and equating powers leads to the recurrence relation described above.
Taking the derivative of both sides with respect to x gives

    (k + 1)\, s(n + 1, k + 1) = \sum_{j=k}^{n} (−1)^{n−j} (n − j)! \binom{n+1}{j} s(j, k).

This is because the derivative of the left side of the generating function equation with respect
to x is

    (1 + t)^x \ln(1 + t) = (1 + t)^x \sum_{k=1}^{\infty} (−1)^{k−1} \frac{t^k}{k}.
The relation

    (1 + t)^{x_1} (1 + t)^{x_2} = (1 + t)^{x_1 + x_2}

yields the following family of summation identities. For any given k_1, k_2, d ≥ 1 we have

    \binom{k_1 + k_2}{k_1} s(d + k_1 + k_2, k_1 + k_2) = \sum_{d_1 + d_2 = d} \binom{d + k_1 + k_2}{k_1 + d_1} s(d_1 + k_1, k_1)\, s(d_2 + k_2, k_2).

Enumerative interpretation. The absolute value of the Stirling number of the first kind,
s(n, k), counts the number of permutations of n objects with exactly k orbits (equivalently, with exactly k cycles). For example, s(4, 2) = 11 corresponds to the fact that
the symmetric group on 4 objects has 3 permutations of the form

    (∗∗)(∗∗)   (2 orbits of size 2 each)

and 8 permutations of the form

    (∗∗∗)(∗)   (1 orbit of size 3, and 1 orbit of size 1)

(see the entry on cycle notation for the meaning of the above expressions.)
Let us prove this. First, we can remark that the unsigned Stirling numbers of the first kind are
characterized by the following recurrence relation:

    |s(n + 1, k)| = |s(n, k − 1)| + n |s(n, k)|,   1 ≤ k < n.

To see why the above recurrence relation matches the count of permutations with k cycles,
consider forming a permutation of n + 1 objects from a permutation of n objects by adding
a distinguished object. There are exactly two ways in which this can be accomplished. We
could do this by forming a singleton cycle, i.e. leaving the extra object alone. This accounts
for the |s(n, k − 1)| term in the recurrence formula. We could also insert the new object into
one of the existing cycles. Consider an arbitrary permutation of n objects with k cycles, and
label the objects a_1, . . . , a_n, so that the permutation is represented by

    (a_1 . . . a_{j_1})(a_{j_1+1} . . . a_{j_2}) . . . (a_{j_{k−1}+1} . . . a_n)   (k cycles).

To form a new permutation of n + 1 objects and k cycles one must insert the new object into
this array. There are, evidently, n ways to perform this insertion. This explains the n|s(n, k)|
term of the recurrence relation. Q.E.D.
Version: 1 Owner: rmilson Author(s): rmilson

45.2

Stirling numbers of the second kind

Summary. The Stirling numbers of the second kind,

    S(n, k),   k, n ∈ N,   1 ≤ k ≤ n,

are a doubly indexed sequence of natural numbers, enjoying a wealth of interesting combinatorial properties. There exist several logically equivalent characterizations, but the starting
point of the present entry will be the following definition:

    The Stirling number S(n, k) is the number of ways to partition a set of n objects
    into k groups.
For example, S(4, 2) = 7 because there are seven ways to partition 4 objects (call them a,
b, c, d) into two groups, namely:

    (a)(bcd), (b)(acd), (c)(abd), (d)(abc), (ab)(cd), (ac)(bd), (ad)(bc)
Four additional characterizations will be discussed in this entry:
a recurrence relation
a generating function related to the falling factorial
differential operators
a double-index generating function
Each of these will be discussed below, and shown to be equivalent.

A recurrence relation. The Stirling numbers of the second kind can be characterized in
terms of the following recurrence relation:
    S(n, k) = k S(n − 1, k) + S(n − 1, k − 1),   1 ≤ k < n,

subject to the following initial conditions:

    S(n, n) = S(n, 1) = 1.
Let us now show that the recurrence formula follows from the enumerative definition. Evidently,
there is only one way to partition n objects into 1 group (everything is in that group), and
only one way to partition n objects into n groups (every object is a group all by itself).
Proceeding recursively, a division of n objects a_1, . . . , a_{n−1}, a_n into k groups can be achieved
by only one of two basic maneuvers:

    We could partition the first n − 1 objects into k groups, and then add object a_n into
    one of those groups. There are k S(n − 1, k) ways to do this.

    We could partition the first n − 1 objects into k − 1 groups and then add object a_n as
    a new, 1-element group. This gives an additional S(n − 1, k − 1) ways to create the
    desired partition.
The recursive point of view therefore explains the connection between the recurrence formula and the original definition.
Using the recurrence formula we can easily obtain a table of the initial Stirling numbers:

    n\k   1    2    3    4   5
     1    1
     2    1    1
     3    1    3    1
     4    1    7    6    1
     5    1   15   25   10   1
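The recurrence and initial conditions reproduce the table directly; a Python sketch:

```python
def S(n, k):
    """Stirling numbers of the second kind,
    via S(n, k) = k S(n - 1, k) + S(n - 1, k - 1)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * S(n - 1, k) + S(n - 1, k - 1)

print([S(5, k) for k in range(1, 6)])  # [1, 15, 25, 10, 1]
```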

Falling Factorials. Consider the vector space of polynomials in indeterminate x. The
most obvious basis of this infinite-dimensional vector space is the sequence of monomial
powers: x^n, n ∈ N. However, the sequence of falling factorials:

    x^{\underline{n}} = x(x − 1)(x − 2) . . . (x − n + 1),   n ∈ N

is also a basis, and hence can be used to generate the monomial basis. Indeed, the Stirling
numbers of the second kind can be characterized as the coefficients involved in the
corresponding change of basis matrix, i.e.

    x^n = \sum_{k=1}^{n} S(n, k)\, x^{\underline{k}}.

So, for example,

    x^4 = x + 7x(x − 1) + 6x(x − 1)(x − 2) + x(x − 1)(x − 2)(x − 3).
Arguing inductively, let us prove that this characterization follows from the recurrence relation. Evidently the formula is true for n = 1. Suppose then that the formula is true for a
given n. We have

    x \cdot x^{\underline{k}} = x^{\underline{k+1}} + k\, x^{\underline{k}},

and hence using the recurrence relation we deduce that

    x^{n+1} = \sum_{k=1}^{n} S(n, k)\, x \cdot x^{\underline{k}}
            = \sum_{k=1}^{n} \left( k\, S(n, k)\, x^{\underline{k}} + S(n, k)\, x^{\underline{k+1}} \right)
            = \sum_{k=1}^{n+1} S(n + 1, k)\, x^{\underline{k}}.

Differential operators. Let D_x denote the ordinary derivative, applied to polynomials
in indeterminate x, and let T_x denote the differential operator x D_x. We have the following
characterization of the Stirling numbers of the second kind in terms of these two operators:

    (T_x)^n = \sum_{k=1}^{n} S(n, k)\, x^k (D_x)^k,

where an exponentiated differential operator denotes the operator composed with itself the
indicated number of times. Let us show that this follows from the recurrence relation. The
proof is, once again, inductive. Suppose that the characterization is true for a given n. We
have

    T_x (x^k (D_x)^k) = k\, x^k (D_x)^k + x^{k+1} (D_x)^{k+1},

and hence using the recurrence relation we deduce that

    (T_x)^{n+1} = x D_x \sum_{k=1}^{n} S(n, k)\, x^k (D_x)^k
                = \sum_{k=1}^{n} S(n, k) \left( k\, x^k (D_x)^k + x^{k+1} (D_x)^{k+1} \right)
                = \sum_{k=1}^{n+1} S(n + 1, k)\, x^k (D_x)^k.

Double index generating function. One can also characterize the Stirling numbers of
the second kind in terms of the following generating function:

    e^{x(e^t − 1)} = 1 + \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n, k) x^k \frac{t^n}{n!}.

Let us now prove this. Note that the differential equation

    \frac{dφ}{dt} = φ

admits the general solution

    φ = e^t x.

It follows that for any polynomial p(φ) we have

    \exp(t T_φ)[p(φ)] \Big|_{φ=x} = \sum_{n=0}^{\infty} \frac{t^n}{n!} (T_φ)^n [p(φ)] \Big|_{φ=x} = p(e^t x).

The proof is simple: just take D_t of both sides. To be more explicit,

    D_t\, p(e^t x) = p'(e^t x)\, e^t x = T_φ[p(φ)] \Big|_{φ = x e^t},

and that is exactly equal to D_t of the left-hand side. Since this relation holds for all polynomials, it also holds for all formal power series. In particular, if we apply the above relation
to e^φ, use the result of the preceding section, and note that

    D_φ[e^φ] = e^φ,

we obtain

    e^{x e^t} = \sum_{n=0}^{\infty} \frac{t^n}{n!} (T_φ)^n [e^φ] \Big|_{φ=x}
             = e^x + \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n, k) x^k (D_φ)^k [e^φ] \Big|_{φ=x} \frac{t^n}{n!}
             = e^x \left( 1 + \sum_{n=1}^{\infty} \sum_{k=1}^{n} S(n, k) x^k \frac{t^n}{n!} \right).

Dividing both sides by e^x we obtain the desired generating function. Q.E.D.


Version: 2 Owner: rmilson Author(s): rmilson


Chapter 46
05A19 Combinatorial identities
46.1

Pascal's rule

Pascal's rule is the binomial identity

    \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}

where 1 ≤ k ≤ n and \binom{n}{k} is the binomial coefficient.
Version: 5 Owner: KimJ Author(s): KimJ


Chapter 47
05A99 Miscellaneous
47.1

principle of inclusion-exclusion

The principle of inclusion-exclusion provides a way of methodically counting the union


of possibly non-disjoint sets.
Let C = {A1 , A2 , . . . AN } be a finite collection of finite sets. Let Ik represent the set of k-fold
intersections of members of C (e.g., I2 contains all possible intersections of two sets chosen
from C).
Then

    \left| \bigcup_{i=1}^{N} A_i \right| = \sum_{j=1}^{N} (−1)^{j+1} \sum_{S ∈ I_j} |S|.

For example:

    |A ∪ B| = (|A| + |B|) − |A ∩ B|

    |A ∪ B ∪ C| = (|A| + |B| + |C|) − (|A ∩ B| + |A ∩ C| + |B ∩ C|) + |A ∩ B ∩ C|
|A

The principle of inclusion-exclusion, combined with de Morgan's theorem, can be used to
count the intersection of sets as well. Let A be some universal set such that A_k ⊆ A for each
k, and let \overline{A_k} represent the complement of A_k with respect to A. Then we have

    \bigcap_{i=1}^{N} A_i = \overline{ \bigcup_{i=1}^{N} \overline{A_i} },

thereby turning the problem of finding an intersection into the problem of finding a union.
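The alternating sum over k-fold intersections can be implemented verbatim; this Python sketch (the function name is ours) checks it against a directly computed union:

```python
from itertools import combinations

def union_size(sets):
    """|A_1 ∪ ... ∪ A_N| via the alternating sum over k-fold intersections."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = (-1) ** (k + 1)
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*map(set, group)))
    return total

A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
print(union_size([A, B, C]), len(A | B | C))  # 5 5
```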
Version: 2 Owner: vampyr Author(s): vampyr

47.2

principle of inclusion-exclusion proof

The proof is by induction. Consider a single set A_1. Then the principle of inclusion-exclusion
states that |A_1| = |A_1|, which is trivially true.

Now consider a collection of exactly two sets A_1 and A_2. We know that

    A ∪ B = (A \ B) ∪ (B \ A) ∪ (A ∩ B).

Furthermore, the three sets on the right-hand side of that equation must be disjoint. Therefore, by the addition principle, we have

    |A ∪ B| = |A \ B| + |B \ A| + |A ∩ B|
            = |A \ B| + |A ∩ B| + |B \ A| + |A ∩ B| − |A ∩ B|
            = |A| + |B| − |A ∩ B|.

So the principle of inclusion-exclusion holds for any two sets.

Now consider a collection of N > 2 finite sets A_1, A_2, . . . , A_N. We assume that the principle
of inclusion-exclusion holds for any collection of M sets where 1 ≤ M < N. Because the
union of sets is associative, we may break up the union of all sets in the collection into a
union of two sets:

    \bigcup_{i=1}^{N} A_i = \left( \bigcup_{i=1}^{N−1} A_i \right) ∪ A_N.

By the principle of inclusion-exclusion for two sets, we have

    \left| \bigcup_{i=1}^{N} A_i \right| = \left| \bigcup_{i=1}^{N−1} A_i \right| + |A_N| − \left| \left( \bigcup_{i=1}^{N−1} A_i \right) ∩ A_N \right|.

Now, let I_k be the collection of all k-fold intersections of A_1, A_2, . . . , A_{N−1}, and let I_k′ be
the collection of all k-fold intersections of A_1, A_2, . . . , A_N that include A_N. Note that A_N is
included in every member of I_k′ and in no member of I_k, so the two sets do not duplicate
one another.

We then have

    \left| \bigcup_{i=1}^{N} A_i \right| = \sum_{j=1}^{N−1} (−1)^{j+1} \sum_{S ∈ I_j} |S| + |A_N| − \left| \left( \bigcup_{i=1}^{N−1} A_i \right) ∩ A_N \right|

by the principle of inclusion-exclusion for a collection of N − 1 sets. Then, we may distribute
set intersection over set union to find that

    \left| \bigcup_{i=1}^{N} A_i \right| = \sum_{j=1}^{N−1} (−1)^{j+1} \sum_{S ∈ I_j} |S| + |A_N| − \left| \bigcup_{i=1}^{N−1} (A_i ∩ A_N) \right|.

Note, however, that

    (A_x ∩ A_N) ∩ (A_y ∩ A_N) = A_x ∩ A_y ∩ A_N.

Hence we may again apply the principle of inclusion-exclusion for N − 1 sets, revealing that

    \left| \bigcup_{i=1}^{N} A_i \right|
      = \sum_{j=1}^{N−1} (−1)^{j+1} \sum_{S ∈ I_j} |S| + |A_N| − \sum_{j=1}^{N−1} (−1)^{j+1} \sum_{S ∈ I_j} |S ∩ A_N|
      = \sum_{j=1}^{N−1} (−1)^{j+1} \sum_{S ∈ I_j} |S| + |A_N| − \sum_{j=1}^{N−1} (−1)^{j+1} \sum_{S ∈ I′_{j+1}} |S|
      = \sum_{j=1}^{N−1} (−1)^{j+1} \sum_{S ∈ I_j} |S| + |A_N| + \sum_{j=2}^{N} (−1)^{j+1} \sum_{S ∈ I′_j} |S|.

The second sum does not include I_1′. Note, however, that I_1′ = {A_N}, so we have

    \left| \bigcup_{i=1}^{N} A_i \right| = \sum_{j=1}^{N−1} (−1)^{j+1} \sum_{S ∈ I_j} |S| + \sum_{j=1}^{N} (−1)^{j+1} \sum_{S ∈ I_j′} |S|.

Combining the two sums yields the principle of inclusion-exclusion for N sets.
Version: 1 Owner: vampyr Author(s): vampyr


Chapter 48
05B15 Orthogonal arrays, Latin
squares, Room squares
48.1

example of Latin squares

It is easily shown that the multiplication table (Cayley table) of a group has exactly these
properties and thus is a latin square. The converse, however, is (unfortunately) not true, i.e.
not all latin squares are multiplication tables for a group (the smallest counterexample is
a latin square of order 5).
Version: 2 Owner: jgade Author(s): jgade

48.2

graeco-latin squares

Let A = (a_{ij}) and B = (b_{ij}) be two n × n matrices. We define their join as the matrix whose
(i, j)th entry is the pair (a_{ij}, b_{ij}).

A graeco-latin square is then the join of two latin squares.

The name comes from Euler's use of Greek and Latin letters to differentiate the entries on
each array.

An example of a graeco-latin square:

    aα bβ cγ dδ
    dβ cα bδ aγ
    bγ aδ dα cβ
    cδ dγ aβ bα

Version: 1 Owner: drini Author(s): drini

48.3

latin square

A latin square of order n is an n × n array such that each column and each row are made
with the same n symbols, using every one exactly once.

Examples:

    a b c d        1 2 3 4
    c d a b        4 3 2 1
    d c b a        2 1 4 3
    b a d c        3 4 1 2
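The defining property is easy to test mechanically; a Python sketch (the function name is ours):

```python
def is_latin_square(square):
    """True iff every row and every column uses the same n symbols once each."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    return (all(set(row) == symbols and len(row) == n for row in square)
            and all(set(col) == symbols for col in zip(*square)))

print(is_latin_square(["abcd", "cdab", "dcba", "badc"]))  # True
```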

Version: 1 Owner: drini Author(s): drini

48.4

magic square

A magic square of order n is an n × n array using each one of the numbers 1, 2, 3, . . . , n² once
and such that the sum of the numbers in each row, column or main diagonal is the same.

Example:

    8 1 6
    3 5 7
    4 9 2

It's easy to prove that the sum is always \frac{1}{2} n(n² + 1). So in the example with n = 3 the sum
is always \frac{1}{2}(3 \cdot 10) = 15.
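The defining sums are likewise easy to test mechanically; a Python sketch using the common sum n(n² + 1)/2 (the function name is ours):

```python
def is_magic(square):
    """Check all rows, columns, and both main diagonals sum to n(n^2 + 1)/2."""
    n = len(square)
    target = n * (n * n + 1) // 2
    lines = [list(row) for row in square] + [list(col) for col in zip(*square)]
    lines.append([square[i][i] for i in range(n)])
    lines.append([square[i][n - 1 - i] for i in range(n)])
    return all(sum(line) == target for line in lines)

print(is_magic([[8, 1, 6], [3, 5, 7], [4, 9, 2]]))  # True
```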
Version: 1 Owner: drini Author(s): drini


Chapter 49
05B35 Matroids, geometric lattices
49.1

matroid

A matroid, or an independence structure, is a kind of finite mathematical structure whose
properties imitate the properties of a finite subset of a vector space. Notions such as rank
and independence (of a subset) have a meaning for any matroid, as does the notion of duality.
A matroid permits several equivalent formal definitions: two definitions in terms of a rank
function, one in terms of independent subsets, and several more.
For a finite set X, P(X) will denote the set of all subsets of X, and |X| will denote the
number of elements of X. E is a fixed finite set throughout.
Definition 1: A matroid is a pair (E, r) where r is a mapping P(E) → N satisfying these
axioms:

r1) r(S) ≤ |S| for all S ⊆ E.

r2) If S ⊆ T ⊆ E then r(S) ≤ r(T).

r3) For any subsets S and T of E,

    r(S ∪ T) + r(S ∩ T) ≤ r(S) + r(T).

The matroid (E, r) is called normal if also

r*) r({e}) = 1 for any e ∈ E.

r is called the rank function of the matroid. (r3) is called the submodular inequality.
The notion of isomorphism between one matroid (E, r) and another (F, s) has the expected
meaning: there exists a bijection f : E → F which preserves rank, i.e. satisfies s(f(A)) =
r(A) for all A ⊆ E.


Definition 2: A matroid is a pair (E, r) where r is a mapping P(E) → N satisfying these
axioms:

q1) r(∅) = 0.

q2) If x ∈ E and S ⊆ E then r(S ∪ {x}) − r(S) ∈ {0, 1}.

q3) If x, y ∈ E and S ⊆ E and r(S ∪ {x}) = r(S ∪ {y}) = r(S) then r(S ∪ {x, y}) = r(S).

Definition 3: A matroid is a pair (E, I) where I is a subset of P(E) satisfying these axioms:

i1) ∅ ∈ I.

i2) If S ⊆ T ⊆ E and T ∈ I then S ∈ I.

i3) If S, T ∈ I and S, T ⊆ U ⊆ E and S and T are both maximal subsets of U with the
property that they are in I, then |S| = |T|.

An element of I is called an independent set. (E, I) is called normal if any singleton subset
of E is independent, i.e.

i*) {x} ∈ I for all x ∈ E.
Definition 4: A matroid is a pair (E, B) where B is a subset of P(E) satisfying these
axioms:

b1) B ≠ ∅.

b2) If S, T ∈ B and S ⊆ T then S = T.

b3) If S, T ∈ B and x ∈ E − S then there exists y ∈ E − T such that (S ∪ {x}) − {y} ∈ B.

An element of B is called a basis (of E). (E, B) is called normal if also

b*) ∪_{b ∈ B} b = E,

i.e. if any singleton subset of E can be extended to a basis.

Definition 5: A matroid is a pair (E, σ) where σ is a mapping P(E) → P(E) satisfying
these axioms:

σ1) S ⊆ σ(S) for all S ⊆ E.

σ2) If S ⊆ σ(T) then σ(S) ⊆ σ(T).

σ3) If x ∈ σ(S ∪ {y}) − σ(S) then y ∈ σ(S ∪ {x}).

σ is called the span mapping of the matroid, and σ(A) is called the span of the subset A.
(E, σ) is called normal if also

σ*) σ(∅) = ∅.
Definition 6: A matroid is a pair (E, C) where C is a subset of P(E) satisfying these
axioms:

c1) ∅ ∉ C.

c2) If S, T ∈ C and S ⊆ T then S = T.

c3) If S, T ∈ C and S ≠ T and x ∈ S ∩ T then there exists U ∈ C such that x ∉ U and
U ⊆ S ∪ T.

An element of C is called a circuit. (E, C) is called normal if also

c*) No singleton subset of E is a circuit.

49.1.1

Equivalence of the definitions

It would take several pages to spell out what a circuit is in terms of rank, and likewise for
each other possible pair of the alternative defining notions, and then to prove that the various
sets of axioms unambiguously define the same structure. So let me sketch just one example:
the equivalence of Definitions 1 (on rank) and 6 (on circuits). Assume first the conditions in
Definition 1. Define a circuit as a minimal subset A of E having the property r(A) < |A|.
With a little effort, we verify the axioms (c1)-(c3). Now assume (c1)-(c3), and let r(A) be
the largest integer n such that A has a subset B with |B| = n which includes no element of
C (i.e. no circuit is a subset of B).
One now proves (r1)-(r3). Next, one shows that if we define C in terms of r, and then
another rank function s in terms of C, we end up with s = r. The equivalence of (r*) and
(c*) is easy enough as well.

49.1.2  Examples of matroids

Let V be a vector space over a field k, and let E be a finite subset of V. For S ⊆ E, let r(S)
be the dimension of the subspace of V generated by S. Then (E, r) is a matroid. Such a
matroid, or one isomorphic to it, is said to be representable over k. The matroid is normal
iff 0 ∉ E. There exist matroids which are not representable over any field.
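As a concrete illustration, the rank function of a matroid representable over the reals can be computed with plain linear algebra. A minimal sketch (not from the original entry; it assumes numpy is available, and the vectors are chosen only for illustration):

```python
from itertools import combinations

import numpy as np

# Four vectors in R^3; vector 3 equals vector 0 + vector 1, so the
# subset {0, 1, 3} is dependent.
E = [np.array(v) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]]

def r(S):
    """Rank of a set S of indices: dimension of the span of those vectors."""
    if not S:
        return 0
    return int(np.linalg.matrix_rank(np.array([E[i] for i in S])))

print(r({0, 1, 2}))  # 3: a basis of R^3
print(r({0, 1, 3}))  # 2: dependent, since E[3] = E[0] + E[1]

# Spot-check axiom q2: adding one element changes the rank by 0 or 1.
for k in range(len(E) + 1):
    for S in combinations(range(len(E)), k):
        for x in range(len(E)):
            assert r(set(S) | {x}) - r(set(S)) in (0, 1)
```

The same brute-force loops, run over pairs of subsets, would check the other rank axioms as well.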
The second example of a matroid comes from graph theory. The following definition will be
rather informal, partly because the terminology of graph theory is not very well standardised.

For our present purpose, a graph consists of a finite set V , whose elements are called vertices,
plus a set E of two-element subsets of V , called edges. A circuit in the graph is a finite set
of at least three edges which can be arranged in a cycle:
{a, b}, {b, c}, . . . {y, z}, {z, a}
such that the vertices a, b, . . . z are distinct. With circuits thus defined, E satisfies the axioms
in Definition 6, and is thus a matroid, and in fact a normal matroid. (The definition is easily
adjusted to permit graphs with loops, which define non-normal matroids.) Such a matroid,
or one isomorphic to it, is called graphic.
Let E = A ∪ B be a finite set, where A and B are nonempty and disjoint. Let G be a subset
of A × B. We get a matching matroid on E as follows. Each element of E defines a line,
which is a subset (a row or column) of the set A × B. Let us call the elements of G points.
For any S ⊆ E let r(S) be the largest number n such that for some set of points P:

|P| = n

No two points of P are on the same line

Any point of P is on a line defined by an element of S.

One can prove (it is not trivial) that r is the rank function of a matroid on E. That
matroid is normal iff every line contains at least one point. Matching matroids participate in
combinatorics, in connection with results on transversals, such as Hall's marriage theorem.

49.1.3  The dual of a matroid

Proposition: Let E be a matroid and r its rank function. Define a mapping s : P(E) → N
by

s(A) = |A| − r(E) + r(E − A).

Then the pair (E, s) is a matroid (called the dual of (E, r)).
We leave the proof as an exercise. Also, it is easy to verify that the dual of the dual is the
original matroid. A circuit in (E, s) is also referred to as a cocircuit in (E, r). There is a
notion of cobasis also, and cospan.
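As a sanity check of the proposition, one can tabulate s for a tiny matroid and confirm that dualizing twice returns the original rank function. A hypothetical sketch (the uniform matroid U_{1,2}, whose rank function is r(A) = min(|A|, 1), is just a convenient test case):

```python
from itertools import chain, combinations

E = frozenset({0, 1})

def subsets(X):
    X = list(X)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(X, k)
                                         for k in range(len(X) + 1))]

def r(A):
    """Rank function of the uniform matroid U_{1,2}."""
    return min(len(A), 1)

def dual(rank):
    """s(A) = |A| - r(E) + r(E - A), as in the proposition."""
    return lambda A: len(A) - rank(E) + rank(E - A)

s = dual(r)
t = dual(s)  # dual of the dual

assert all(t(A) == r(A) for A in subsets(E))
print("dual of dual equals the original rank function")
```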
If the dual of E is graphic, E is called cographic. This notion of duality agrees with the
notion of the same name in the theory of planar graphs (and likewise in linear algebra): given
a plane graph, the dual of its matroid is the matroid of the dual graph. A matroid that is
both graphic and cographic is called planar, and various criteria for planarity of a graph can
be extended to matroids. The notion of orientability can also be extended from graphs to
matroids.


49.1.4  Binary matroids

A matroid is said to be binary if it is representable over the field of two elements. There are
several other (equivalent) characterisations of a binary matroid (E, r), such as:
The symmetric difference of any family of circuits is the union of a family of pairwise
disjoint circuits.

For any circuit C and cocircuit D, we have |C ∩ D| ≡ 0 (mod 2).

Any graphic matroid is binary. The dual of a binary matroid is binary.

49.1.5  Miscellaneous

The definition of the chromatic polynomial of a graph,

χ(x) = Σ_{F⊆E} (−1)^|F| x^(r(E) − r(F)),

extends without change to any matroid. This polynomial has something to say about the
decomposability of matroids into simpler ones.
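The subset expansion defining χ can be evaluated directly for a small matroid. A sketch (illustrative; the graphic matroid of a triangle is used because its answer, (x − 1)(x − 2), is easy to verify by hand):

```python
from itertools import chain, combinations

E = (0, 1, 2)  # the three edges of a triangle

def r(F):
    """Rank in the triangle's graphic matroid: any two edges are
    independent, all three contain the circuit."""
    return min(len(F), 2)

def chi(x):
    """chi(x) = sum over F subset of E of (-1)^|F| x^(r(E) - r(F))."""
    return sum((-1) ** len(F) * x ** (r(E) - r(F))
               for F in chain.from_iterable(combinations(E, k)
                                            for k in range(len(E) + 1)))

print([chi(x) for x in range(5)])  # [2, 0, 0, 2, 6], matching (x-1)(x-2)
```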
Also on the topic of decomposability, matroids have a sort of structure theory, in terms of
what are called minors and separators. That theory, due to Tutte, goes by induction; roughly
speaking, it is an adaptation of the old algorithms for putting a matrix into a canonical form.
Along the same lines are several theorems on basis exchange, such as the following. Let
E be a matroid and let
A = {a1 , . . . , an }
B = {b1 , . . . , bn }

be two (equipotent) bases of E. There exists a permutation π of the set {1, . . . , n} such
that, for every m from 0 to n,
{a1, . . . , am, bπ(m+1), . . . , bπ(n)}
is a basis of E.

49.1.6  Further reading

A good textbook is:


James G. Oxley, Matroid Theory, Oxford University Press, New York etc., 1992
plus the updates-and-errata file at Dr. Oxley's website.

The chromatic polynomial is not discussed in Oxley, but see e.g. Zaslavsky.
Version: 3 Owner: drini Author(s): Larry Hammick, NeuRet

49.2  polymatroid

The polymatroid defined by a given matroid (E, r) is the set of all functions w : E → R
such that

w(e) ≥ 0 for all e ∈ E

Σ_{e∈S} w(e) ≤ r(S) for all S ⊆ E.

Polymatroids are related to the convex polytopes seen in linear programming, and have
similar uses.
Version: 1 Owner: nobody Author(s): Larry Hammick


Chapter 50
05C05 Trees
50.1  AVL tree

An AVL tree is a balanced binary search tree where the heights of the two subtrees (children)
of a node differ by at most one. Look-up, insertion, and deletion are O(log n), where n is
the number of nodes in the tree.
The structure is named for the inventors, Adelson-Velskii and Landis (1962).
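The height invariant is easy to state in code. A minimal sketch (illustrative; not a full AVL implementation, since it omits the rotations that insertion and deletion use to restore balance):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    if node is None:
        return -1  # height of an empty subtree
    return 1 + max(height(node.left), height(node.right))

def is_avl(node):
    """Check the AVL invariant: subtree heights differ by at most one."""
    if node is None:
        return True
    balanced = abs(height(node.left) - height(node.right)) <= 1
    return balanced and is_avl(node.left) and is_avl(node.right)

#       4
#      / \
#     2   5      every node's subtrees differ in height by at most 1
#    / \
#   1   3
root = Node(4, Node(2, Node(1), Node(3)), Node(5))
print(is_avl(root))  # True
print(is_avl(Node(1, None, Node(2, None, Node(3)))))  # False: right spine
```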
Version: 5 Owner: Thomas Heye Author(s): Thomas Heye

50.2  Aronszajn tree

A κ-tree T for which |Tα| < κ for all α < κ and which has no cofinal branches is called a
κ-Aronszajn tree. If κ = ω1 then it is referred to simply as an Aronszajn tree.

If there are no κ-Aronszajn trees for some κ then we say κ has the tree property. ω has
the tree property, but no singular cardinal has the tree property.
Version: 6 Owner: Henry Author(s): Henry

50.3  Suslin tree

An Aronszajn tree is a Suslin tree iff it has no uncountable antichains.


Version: 1 Owner: Henry Author(s): Henry

354

50.4  antichain

A subset A of a poset (P, <P) is an antichain if no two elements are comparable. That is,
if a, b ∈ A and a ≠ b then neither a <P b nor b <P a.
A maximal antichain is an antichain which is not a proper subset of any other antichain.
In particular, if (P, <P ) is a tree then the maximal antichains are exactly those antichains
which intersect every branch, and if the tree is splitting then every level is a maximal
antichain.
Version: 3 Owner: Henry Author(s): Henry

50.5  balanced tree

A balanced tree is a rooted tree where each subtree of the root has an equal number of
nodes (or as near as possible). For an example, see binary tree.
Version: 2 Owner: Logan Author(s): Logan

50.6  binary tree

A binary tree is a rooted tree where every node has two or fewer children. A balanced
binary tree is a binary tree that is also a balanced tree. For example,
[Figure: a balanced binary tree on the nodes A, B, C, D, E]
is a balanced binary tree.


The two (potential) children of a node in a binary tree are often called the left and right
children of that node. The left child of some node X and all that child's descendents are the
left descendents of X. A similar definition applies to X's right descendents. The left
subtree of X is X's left descendents, and the right subtree of X is its right descendents.
Since we know the maximum number of children a binary tree node can have, we can make
some statements regarding minimum and maximum depth of a binary tree as it relates to
355

the total number of nodes. The maximum depth of a binary tree of n nodes is n − 1 (every
non-leaf node has exactly one child). The minimum depth of a binary tree of n nodes (n > 0)
is ⌈log2 n⌉ (every non-leaf node has exactly two children, that is, the tree is balanced).
A binary tree can be implicitly stored as an array, if we designate a constant, maximum
depth for the tree. We begin by storing the root node at index 0 in the array. We then store
its left child at index 1 and its right child at index 2. The children of the node at index 1
are stored at indices 3 and 4, and the children of the node at index 2 are stored at indices 5
and 6. This can be generalized as: if a node is stored at index k, then its left child is located
at index 2k + 1 and its right child at 2k + 2. This form of implicit storage thus eliminates
all overhead of the tree structure, but is only really advantageous for trees that tend to be
balanced. For example, here is the implicit array representation of the tree shown above.

[Figure: the implicit array representation, with A at index 0 and C, D, F, G at later indices]
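The index arithmetic just described can be written out directly; a small sketch (function names are illustrative):

```python
def left(k):
    return 2 * k + 1

def right(k):
    return 2 * k + 2

def parent(k):
    return (k - 1) // 2  # inverse of both child formulas

# Root at index 0; children at 1 and 2; their children at 3..6.
print(left(0), right(0))  # 1 2
print(left(2), right(2))  # 5 6
assert all(parent(left(k)) == k and parent(right(k)) == k
           for k in range(1000))
```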

Many data structures are binary trees. For instance, heaps and binary search trees are binary
trees with particular properties.
Version: 3 Owner: Daume Author(s): Daume, Logan

50.7  branch

A subset B of a tree (T, <T) is a branch if B is a maximal linearly ordered subset of T.
That is:

<T is a linear ordering of B

If t ∈ T \ B then B ∪ {t} is not linearly ordered by <T.

This is the same as the intuitive conception of a branch: it is a set of nodes starting at the
root and going all the way to the tip (in infinite sets the conception is more complicated,
since there may not be a tip, but the idea is the same). Since branches are maximal there is
no way to add an element to a branch and have it remain a branch.
A cofinal branch is a branch which intersects every level of the tree.
Version: 1 Owner: Henry Author(s): Henry

50.8  child node (of a tree)

A child node C of a node P in a tree is any node connected to P which has a path distance
from the root node R which is one greater than the path distance between P and R.

Drawn in the canonical root-at-top manner, a child node of a node P in a tree is simply any
node immediately below P which is connected to it.

Figure: A node (blue) and its children (red.)

Version: 1 Owner: akrowne Author(s): akrowne

50.9  complete binary tree

A complete binary tree is a binary tree with the additional property that every node
must have exactly two children if an internal node, and zero children if a leaf node.
More precisely: for our base case, the complete binary tree of exactly one node is simply
the tree consisting of that node by itself. The property of being complete is preserved
if, at each step, we expand the tree by connecting exactly zero or two individual nodes (or
complete binary trees) to any node in the tree (but both must be connected to the same
node.)
Version: 4 Owner: akrowne Author(s): akrowne

50.10  digital search tree

A digital search tree is a tree which stores strings internally so that there is no need for
extra leaf nodes to store the strings.
Version: 5 Owner: Logan Author(s): Logan


50.11  digital tree

A digital tree is a tree for storing a set of strings where nodes are organized by substrings
common to two or more strings. Examples of digital trees are digital search trees and tries.
Version: 3 Owner: Logan Author(s): Logan

50.12  example of Aronszajn tree

Construction 1: If κ is a singular cardinal then there is a simple construction of a κ-Aronszajn
tree. Let ⟨κα⟩α<λ with λ < κ be a sequence cofinal in κ. Then consider the tree
where T = {(β, κα) | α < λ, β < κα} with (β1, κα1) <T (β2, κα2) iff β1 < β2 and κα1 = κα2.
Note that this is similar to (indeed, a subtree of) the construction given for a tree with no
cofinal branches. It consists of λ disjoint branches, with the α-th branch of height κα. Since
λ < κ, every level has fewer than κ elements, and since the sequence is cofinal in κ, T must
have height and cardinality κ.
Construction 2: We can construct an Aronszajn tree out of the compact subsets of Q+.
<T will be defined by x <T y iff y is an end-extension of x. That is, x ⊆ y and if r ∈ y \ x
and s ∈ x then s < r.

Let T0 = {{0}}. Given a level Tα, let Tα+1 = {x ∪ {q} | x ∈ Tα, q ∈ Q, q > max x}. That is, for
every element x in Tα and every rational number q larger than any element of x, x ∪ {q} is
an element of Tα+1. If α < ω1 is a limit ordinal then each element of Tα is the union of some
branch in T(α).

We can show by induction that |Tα| < ω1 for each α < ω1. For the base case, T0 has only one
element. If |Tα| < ω1 then |Tα+1| = |Tα| · |Q| = ω < ω1. If α < ω1 is a limit ordinal
then T(α) is a countable union of countable sets, and therefore itself countable. Therefore
there are a countable number of branches, so Tα is also countable. So T has countable levels.

Suppose T has an uncountable branch, B = ⟨b0, b1, . . .⟩. Then for any i < j < ω1, bi ⊊ bj.
Then for each i, there is some xi ∈ bi+1 \ bi such that xi is greater than any element of
bi. Then ⟨x0, x1, . . .⟩ is an uncountable increasing sequence of rational numbers. Since the
rational numbers are countable, there is no such sequence, so T has no uncountable branch,
and is therefore Aronszajn.
Version: 1 Owner: Henry Author(s): Henry


50.13  example of tree (set theoretic)

The set Z+ is a tree with <T = <. This isn't a very interesting tree, since it simply consists
of a line of nodes. However note that the height is ω even though no particular node has
that height.

A more interesting tree using Z+ defines m <T n if m = i^a and n = i^b for some i ∈ Z+
and a < b with a, b ∈ Z+ ∪ {0}. Then 1 is the root, and all numbers which are not powers of
another number are in T1. Then all squares (which are not also fourth powers) form T2, and
so on.

To illustrate the concept of a cofinal branch, observe that for any limit ordinal κ we can
construct a κ-tree which has no cofinal branches. We let T = {(α, β) | β < α < κ} and
(α1, β1) <T (α2, β2) ⟺ β1 < β2 and α1 = α2. The tree then has κ disjoint branches, each
consisting of the set {(α, β) | β < α} for some α < κ. No branch is cofinal, since each branch
is capped at α elements, but for any α < κ, there is a branch of height α + 1. Hence the
supremum of the heights is κ.
Version: 1 Owner: Henry Author(s): Henry

50.14  extended binary tree

An extended binary tree is a transformation of any binary tree into a complete binary tree.
This transformation consists of replacing every null subtree of the original tree with special
nodes. The nodes from the original tree are then internal nodes, while the special nodes
are external nodes.
For instance, consider the following binary tree.

The following tree is its extended binary tree. Empty circles represent internal nodes, and
filled circles represent external nodes.

Every internal node in the extended tree has exactly two children, and every external node
is a leaf. The result is a complete binary tree.
Version: 4 Owner: Logan Author(s): Logan


50.15  external path length

Given a binary tree T , construct its extended binary tree T 0 . The external path length
of T is then defined to be the sum of the lengths of the paths to each of the external nodes.
For example, let T be the following tree.

The extended binary tree of T is

The external path length of T (denoted E) is


E = 2 + 3 + 3 + 3 + 3 + 3 + 3 = 20
The internal path length of T is defined to be the sum of the lengths of the paths to each
of the internal nodes. The internal path length of our example tree (denoted I) is
I =1+2+0+2+1+2=8
Note that in this case E = I + 2n, where n is the number of internal nodes. This happens
to hold for all binary trees.
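The identity E = I + 2n can be checked mechanically with one traversal that accumulates both sums. A sketch (representation illustrative: an internal node is a (left, right) pair and None marks an external node):

```python
def path_lengths(node, depth=0):
    """Return (internal path length, external path length, internal count)."""
    if node is None:
        return 0, depth, 0  # an external node contributes its depth to E
    i_l, e_l, n_l = path_lengths(node[0], depth + 1)
    i_r, e_r, n_r = path_lengths(node[1], depth + 1)
    return depth + i_l + i_r, e_l + e_r, 1 + n_l + n_r

# Three internal nodes: a root whose two children are also internal.
tree = ((None, None), (None, None))
I, E, n = path_lengths(tree)
print(I, E, n)         # 2 8 3
assert E == I + 2 * n  # the identity noted above
```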
Version: 1 Owner: Logan Author(s): Logan

50.16  internal node (of a tree)

An internal node of a tree is any node which has degree greater than one. Or, phrased in
rooted tree terminology, the internal nodes of a tree are the nodes which have at least one
child node.


Figure: A tree with internal nodes highlighted in red.

Version: 3 Owner: akrowne Author(s): akrowne

50.17  leaf node (of a tree)

A leaf of a tree is any node which has degree of exactly 1. Put another way, a leaf node of
a rooted tree is any node which has no child nodes.

Figure: A tree with leaf nodes highlighted in red.

Version: 2 Owner: akrowne Author(s): akrowne

50.18  parent node (in a tree)

A parent node P of a node C in a tree is the first node which lies along the path from C
to the root of the tree, R.
Drawn in the canonical root-at-top manner, the parent node of a node C in a tree is simply
the node immediately above C which is connected to it.

Figure: A node (blue) and its parent (red.)

Version: 2 Owner: akrowne Author(s): akrowne

50.19  proof that ω has the tree property

Let T be a tree with finite levels and an infinite number of elements. Then consider the
elements of T0. T can be partitioned into the sets of descendants of each of these elements,
and since any finite partition of an infinite set has at least one infinite part, some element
x0 in T0 has an infinite number of descendants. The same procedure can be applied to the
children of x0 to give an element x1 ∈ T1 which has an infinite number of descendants, and
then to the children of x1, and so on. This gives a sequence X = ⟨x0, x1, . . .⟩. The sequence
is infinite since each element has an infinite number of descendants, and since xi+1 is always
a child of xi, X is a branch, and therefore an infinite branch of T.
Version: 2 Owner: Henry Author(s): Henry

50.20  root (of a tree)

The root of a tree is a place-holder node. It is typically drawn at the top of the page, with
the other nodes below (with all nodes having the same path distance from the root at the
same height.)


Figure: A tree with root highlighted in red.

Any tree can be redrawn this way, selecting any node as the root. This is important to
note: taken as a graph in general, the notion of root is meaningless. We introduce a root
explicitly when we begin speaking of a graph as a tree; there is nothing in general that
selects a root for us.
However, there are some special cases of trees where the root can be distinguished from the
other nodes implicitly due to the properties of the tree. For instance, a root is uniquely
identifiable in a complete binary tree, where it is the only node with degree two.
Version: 4 Owner: akrowne Author(s): akrowne

50.21  tree

Formally, a forest is an undirected, acyclic graph. A forest consists of trees, which are
themselves acyclic, connected graphs. For example, the following diagram represents a forest,
each connected component of which is a tree.

All trees are forests, but not all forests are trees. As in a graph, a forest is made up of vertices
(which are often called nodes interchangeably) and edges. Like any graph, the vertices and
edges may each be labelled, that is, associated with some atom of data. Therefore a forest
or a tree is often used as a data structure.
Often a particular node of a tree is specified as the root. Such trees are typically drawn with
the root at the top of the diagram, with all other nodes depending down from it (however
this is not always the case). A tree where a root has been specified is called a rooted tree. A

tree where no root has been specified is called a free tree. When speaking of tree traversals,
and most especially of trees as data structures, rooted trees are often implied.

The edges of a rooted tree are often treated as directed. In a rooted tree, every non-root
node has exactly one edge leading toward the root. This edge can be thought of as connecting
each node to its parent. Often rooted trees are considered directed in the sense that all edges
connect parents to their children, but not vice-versa. Given this parent-child relationship, a
descendant of a node in a directed tree is defined as any other node reachable from that
node (that is, a node's children and all their descendants).
Given this directed notion of a rooted tree, a rooted subtree can be defined as any node
of a tree and all of its descendants. This notion of a rooted subtree is very useful in dealing
with trees inductively and defining certain algorithms inductively.
Because of their simple structure and unique properties, trees and forests have many uses.
Because of the simple definition of various tree traversals, they are often used to store and
lookup data. Many algorithms are based upon trees, or depend upon a tree in some manner,
such as the heapsort algorithm or Huffman encoding. There are also a great many specific
forms and families of trees, each with its own constraints, strengths, and weaknesses.
Version: 6 Owner: Logan Author(s): Logan

50.22  weight-balanced binary trees are ultrametric

Let X be the set of leaf nodes in a weight-balanced binary tree. Let the distance between
leaf nodes be identified with the weighted path length between them. We will show that this
distance metric on X is ultrametric.
Before we begin, let the join of any two nodes x, y, denoted x ∨ y, be defined as the node
z which is the most immediate common ancestor of x and y (that is, the common ancestor
which is farthest from the root). Also, we are using weight-balanced in the sense that

the weighted path length from the root to each leaf node is equal, and

each subtree is weight-balanced, too.

Lemma: two properties of weight-balanced trees


Because the tree is weight-balanced, the distances between any node and each of the leaf
node descendents of that node are equal. So, for any leaf nodes x, y,

d(x, x ∨ y) = d(y, x ∨ y)    (50.22.1)

Hence,

d(x, y) = d(x, x ∨ y) + d(y, x ∨ y) = 2 d(x, x ∨ y)    (50.22.2)
Back to the main proof


We will now show that the ultrametric three point condition holds for any three leaf nodes
in a weight-balanced binary tree.
Consider any three points a, b, c in a weight-balanced binary tree. If d(a, b) = d(b, c) = d(a, c),
then the three point condition holds. Now assume this is not the case. Without loss of
generality, assume that d(a, b) < d(a, c).

Applying Eqn. 50.22.2,

2 d(a, a ∨ b) < 2 d(a, a ∨ c)

d(a, a ∨ b) < d(a, a ∨ c)

Note that both a ∨ b and a ∨ c are ancestors of a. Hence, a ∨ c is a more distant ancestor of
a, and so a ∨ c must be an ancestor of a ∨ b.

Now, consider the path between b and c. To get from b to c is to go from b up to a ∨ b, then
up to a ∨ c, and then down to c. Since this is a tree, this is the only path. The highest node
in this path (the ancestor of both b and c) was a ∨ c, so the distance d(b, c) = 2 d(b, a ∨ c).
But by Eqn. 50.22.1 and Eqn. 50.22.2 (noting that b is a descendent of a ∨ c), we have

d(b, c) = 2 d(b, a ∨ c) = 2 d(a, a ∨ c) = d(a, c)

To summarize, we have d(a, b) ≤ d(b, c) = d(a, c), which is the desired ultrametric three
point condition. So we are done.
Note that this means that, if a, b are leaf nodes, and you are at a node outside the subtree
under a ∨ b, then d(you, a) = d(you, b). In other words, (from the point of view of distance
between you and them,) the structure of any subtree that is not your own doesn't matter to
you. This is expressed in the three point condition as: if two points are closer to each other
than they are to you, then their distances to you are equal.

(Above, we have only proved this if you are at a leaf node, but it works for any node which is
outside the subtree under a ∨ b, because the paths to a and b must both pass through a ∨ b.)
Version: 2 Owner: bshanks Author(s): bshanks


50.23  weighted path length

Given an extended binary tree T (that is, simply any complete binary tree, where leaves are
denoted as external nodes), associate weights with each external node. The weighted
path length of T is the sum of the product of the weight and path length of each external
node, over all external nodes.

Another formulation is that the weighted path length is Σ wj lj over all external nodes j, where
wj is the weight of an external node j, and lj is the distance from the root of the tree to j.
If wj = 1 for all j, then weighted path length is exactly the same as external path length.
Example
Let T be the following extended binary tree. Square nodes are external nodes, and circular
nodes are internal nodes. Values in external nodes indicate weights, which are given in this
problem, while values in internal nodes represent the weighted path length of subtrees rooted
at those nodes, and are calculated from the given weights and the given tree. The weight of
the tree as a whole is given at the root of the tree.

This tree happens to give the minimum weighted path length for this particular set of
weights.
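The sum Σ wj lj is straightforward to compute. A sketch (representation illustrative: a leaf is its integer weight, an internal node is a (left, right) pair), comparing two tree shapes over the same weights:

```python
def wpl(node, depth=0):
    """Weighted path length: sum of weight times depth over external nodes."""
    if isinstance(node, tuple):
        return wpl(node[0], depth + 1) + wpl(node[1], depth + 1)
    return node * depth

# Weights 1, 2, 3: keeping the heaviest weight near the root wins.
print(wpl(((1, 2), 3)))  # 1*2 + 2*2 + 3*1 = 9
print(wpl((1, (2, 3))))  # 1*1 + 2*2 + 3*2 = 11
```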
Version: 1 Owner: Logan Author(s): Logan


Chapter 51
05C10 Topological graph theory,
imbedding
51.1  Heawood number

The Heawood number of a surface is the maximal number of colors needed to color any graph
embedded in the surface. For example, the four-color conjecture states that the Heawood
number of the sphere is four.
In 1890 Heawood proved for all surfaces except the sphere that the Heawood number satisfies

H(S) ≤ ⌊(7 + √(49 − 24 e(S))) / 2⌋,

where e(S) is the Euler characteristic of the surface.

Later it was proved in the works of Franklin, Ringel and Youngs that

H(S) ≥ ⌊(7 + √(49 − 24 e(S))) / 2⌋.
For example, the complete graph on 7 vertices can be embedded in the torus as follows:

[Figure: an embedding of K7 in the torus]

REFERENCES
1. Béla Bollobás. Graph Theory: An Introductory Course, volume 63 of GTM. Springer-Verlag,
1979. Zbl 0411.05032.
2. Thomas L. Saaty and Paul C. Kainen. The Four-Color Problem: Assaults and Conquest. Dover,
1986. Zbl 0463.05041.

Version: 6 Owner: bbukh Author(s): bbukh

51.2  Kuratowski's theorem

A finite graph is planar if and only if it contains no subgraph that is isomorphic to, or is
a subdivision of, K5 or K3,3, where K5 is the complete graph of order 5 and K3,3 is the
complete bipartite graph of order 6. Wagner's theorem is an equivalent later result.

REFERENCES
1. Kazimierz Kuratowski. Sur le problème des courbes gauches en topologie. Fund. Math.,
15:271–283, 1930.

Version: 7 Owner: bbukh Author(s): bbukh, digitalis

51.3  Szemerédi-Trotter theorem

The number of incidences of a set of n points and a set of m lines in the real plane R2 is

I = O(n + m + (nm)^(2/3)).

Proof. Let's consider the points as vertices of a graph, and connect two vertices by an edge
if they are adjacent on some line. Then the number of edges is e = I − m. If e < 4n then
we are done. If e ≥ 4n then by the crossing lemma

m² ≥ cr(G) ≥ (1/64) (I − m)³ / n²,

and the theorem follows.

Recently, Tóth [1] extended the theorem to the complex plane C2. The proof is difficult.


REFERENCES
1. Csaba D. Tóth. The Szemerédi-Trotter theorem in the complex plane. arXiv:CO/0305283,
May 2003.

Version: 3 Owner: bbukh Author(s): bbukh

51.4  crossing lemma

The crossing number of a graph G with n vertices and m ≥ 4n edges is

cr(G) ≥ (1/64) m³ / n².

Version: 1 Owner: bbukh Author(s): bbukh

51.5  crossing number

The crossing number cr(G) of a graph G is the minimal number of crossings among all
embeddings of G in the plane.
Version: 1 Owner: bbukh Author(s): bbukh

51.6  graph topology

A graph (V, E) is identified by its vertices V = {v1, v2, . . .} and its edges E = {{vi, vj}, {vk, vl}, . . .}.
A graph also admits a natural topology, called the graph topology, by identifying every
edge {vi, vj} with the unit interval I = [0, 1] and gluing them together at coincident vertices.

This construction can be easily realized in the framework of simplicial complexes. We can
form a simplicial complex G = {{v} | v ∈ V} ∪ E. And the desired topological realization
of the graph is just the geometric realization |G| of G.

Viewing a graph as a topological space has several advantages:

The notion of graph isomorphism simply becomes that of homeomorphism.

The notion of a connected graph coincides with topological connectedness.

A connected graph is a tree iff its fundamental group is trivial.

Version: 3 Owner: igor Author(s): igor

51.7  planar graph

A planar graph is a graph which can be drawn on a plane (flat 2-d surface) with no edge
crossings.

No complete graphs above K4 are planar. K4, drawn without crossings, looks like:

[Figure: a planar drawing of K4 on four vertices including A, B, C]

Hence it is planar (try this for K5).


Version: 3 Owner: akrowne Author(s): akrowne

51.8  proof of crossing lemma

Euler's formula implies the linear lower bound cr(G) ≥ m − 3n + 6, and so it cannot be used
directly. What we need is to consider the subgraphs of our graph, apply Euler's formula on
them, and then combine the estimates. The probabilistic method provides a natural way to
do that.

Consider a minimal embedding of G. Choose independently every vertex of G with probability
p. Let Gp be the graph induced by those vertices. By Euler's formula, cr(Gp) − mp + 3np ≥ 0.
The expectation is clearly

E(cr(Gp) − mp + 3np) ≥ 0.

Since E(np) = pn, E(mp) = p²m and E(cr(Gp)) = p⁴ cr(G), we get an inequality that bounds
the crossing number of G from below,

cr(G) ≥ p⁻² m − 3 p⁻³ n.

Now set p = 4n/m (which is at most 1 since m ≥ 4n), and the inequality becomes

cr(G) ≥ (1/64) m³ / n².

Similarly, if m ≥ (9/2) n, then we can set p = 9n/(2m) to get

cr(G) ≥ (4/243) m³ / n².

REFERENCES
1. Martin Aigner and Günter M. Ziegler. Proofs from THE BOOK. Springer, 1999.

Version: 2 Owner: bbukh Author(s): bbukh


Chapter 52
05C12 Distance in graphs
52.1  Hamming distance

In comparing two bit patterns, the Hamming distance is the count of bits different in the two
patterns. More generally, if two ordered lists of items are compared, the Hamming distance
is the number of items that do not identically agree. This distance is applicable to encoded
information, and is a particularly simple metric of comparison, often more useful than the
city-block distance or Euclidean distance.
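A direct implementation of this count, for bit strings or for arbitrary equal-length sequences (a minimal sketch):

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("10110", "11100"))      # 2
print(hamming([1, 2, 3], [1, 9, 3]))  # 1
```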
References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Version: 5 Owner: akrowne Author(s): akrowne


Chapter 53
05C15 Coloring of graphs and
hypergraphs
53.1  bipartite graph

A bipartite graph is a graph with a chromatic number of 2.


The following graph, for example, is bipartite:
[Figure: a bipartite graph on vertices including A, B, E]
One way to think of a bipartite graph is by partitioning the vertices into two disjoint sets
where vertices in one set are adjacent only to vertices in the other set. In the above graph,
this may be more obvious with a different representation:


The two subsets are the two columns of vertices, all of which have the same colour.
A graph is bipartite if and only if all its cycles have even length. This is easy to see intuitively:
any path of odd length on a bipartite graph must end on a vertex of the opposite colour from
the beginning vertex and hence cannot be a cycle.
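The partition into two such sets can be found, or shown impossible, by a breadth-first 2-colouring that fails exactly when an odd cycle is met. A sketch (adjacency-list representation and names are illustrative):

```python
from collections import deque

def is_bipartite(adj):
    """2-colour the graph by BFS; report failure on an odd cycle."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False  # u and v close an odd cycle
    return True

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # 4-cycle: even
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}           # 3-cycle: odd
print(is_bipartite(square), is_bipartite(triangle))  # True False
```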
Version: 5 Owner: vampyr Author(s): vampyr

53.2  chromatic number

The chromatic number of a graph is the minimum number of colours required to colour
it.
Consider the following graph:
[Figure: a graph coloured with 3 colours, containing a triangle BCD]

This graph has been coloured using 3 colours. Furthermore, it's clear that it cannot be
coloured with fewer than 3 colours as well: it contains a subgraph (BCD) that is isomorphic
to the complete graph of 3 vertices. As a result, the chromatic number of this graph is indeed
3.
This example was easy to solve by inspection. In general, however, finding the chromatic
number of a large graph (and, similarly, an optimal colouring) is a very difficult (NP-hard)
problem.
Version: 2 Owner: vampyr Author(s): vampyr


53.3  chromatic number and girth

A famous theorem of P. Erdős.¹

Theorem 6. For any natural numbers k and g, there exists a graph G with chromatic number
χ(G) ≥ k and girth girth(G) ≥ g.

Obviously, we can easily have graphs with high chromatic numbers. For instance, the
complete graph Kn trivially has χ(Kn) = n; however girth(Kn) = 3 (for n ≥ 3). And
the cycle graph Cn has girth(Cn) = n, but χ(Cn) = 1 for n = 1, χ(Cn) = 2 for n even, and
χ(Cn) = 3 otherwise.

It seems intuitively plausible that a high chromatic number occurs because of short, local
cycles in the graph; it is hard to envisage how a graph with no short cycles can still have a
high chromatic number.

Instead of envisaging, Erdős' proof shows that, in some appropriately chosen probability space
on graphs with n vertices, the probability of choosing a graph which does not have χ(G) ≥ k
and girth(G) ≥ g tends to zero as n grows. In particular, the desired graphs exist.

This seminal paper is probably the most famous application of the probabilistic method, and
is regarded by some as the foundation of the method.² Today the probabilistic method is
a standard tool for combinatorics. More constructive methods are often preferred, but are
almost always much harder.
Version: 3 Owner: ariels Author(s): ariels

53.4 chromatic polynomial

Let G be a graph (in the sense of graph theory) whose set V of vertices is finite and nonempty,
and which has no loops or multiple edges. For any natural number x, let χ(G, x), or just χ(x),
denote the number of x-colorations of G, i.e. the number of mappings f : V → {1, 2, . . . , x}
such that f(a) ≠ f(b) for any pair (a, b) of adjacent vertices. Let us prove that χ (which
is called the chromatic polynomial of the graph G) is a polynomial function in x with
coefficients in Z. Write E for the set of edges in G. If |E| = 0, then trivially χ(x) = x^|V|
(where | · | denotes the number of elements of a finite set). If not, then we choose an edge e
¹ See the very readable P. Erdős, Graph theory and probability, Canad. J. Math. 11 (1959), 34–38.
² However, as always, with the benefit of hindsight we can see that the probabilistic method had been used before, e.g. in various applications of Sard's theorem. This does nothing to diminish the importance of the clear statement of the tool.

and construct two graphs having fewer edges than G: H is obtained from G by contracting
the edge e, and K is obtained from G by omitting the edge e. We have
χ(G, x) = χ(K, x) − χ(H, x)    (53.4.1)

for all x ∈ N, because the polynomial χ(K, x) is the number of colorations of the vertices of
G which might or might not be valid for the edge e, while χ(H, x) is the number which are
not valid. By induction on |E|, (53.4.1) shows that χ(G, x) is a polynomial over Z.
By refining the argument a little, one can show

χ(x) = x^|V| − |E| x^(|V|−1) + · · · ± s x^k,
for some nonzero integer s, where k is the number of connected components of G, and the
coefficients alternate in sign.
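The recursion (53.4.1) can be evaluated directly for small graphs. The sketch below counts proper x-colourings by recursing on an arbitrary edge, deleting it to get K and contracting it to get H; the representation (frozensets of vertices and edges) is illustrative.

```python
# Deletion-contraction evaluation of the chromatic polynomial at an integer x,
# following chi(G, x) = chi(K, x) - chi(H, x). A sketch for small simple graphs.
def chi(vertices, edges, x):
    if not edges:
        return x ** len(vertices)        # no edges: every map is proper
    e = next(iter(edges))
    a, b = tuple(e)
    deleted = edges - {e}                # K: omit the edge e
    # H: contract e by merging b into a, dropping loops and parallel edges
    merged = set()
    for f in deleted:
        u, v = tuple(f)
        u, v = (a if u == b else u), (a if v == b else v)
        if u != v:
            merged.add(frozenset((u, v)))
    return (chi(vertices, deleted, x)
            - chi(vertices - {b}, frozenset(merged), x))

# Triangle K3 has chromatic polynomial x(x-1)(x-2).
V = frozenset("abc")
E = frozenset(frozenset(p) for p in (("a", "b"), ("b", "c"), ("a", "c")))
assert chi(V, E, 3) == 6 and chi(V, E, 2) == 0
```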
With the help of the Möbius–Rota inversion formula (see Moebius inversion), or directly by
induction, one can prove

χ(x) = Σ_{F ⊆ E} (−1)^|F| x^(|V|−r(F))

where the sum is over all subsets F of E, and r(F) denotes the rank of F in G, i.e. the
number of elements of any maximal cycle-free subset of F. (Alternatively, the sum may be
taken only over subsets F such that F is equal to the span of F; all other summands cancel
out in pairs.)
The chromatic number of G is the smallest x > 0 such that χ(G, x) > 0 or, equivalently,
such that χ(G, x) ≠ 0.
The Tutte polynomial of a graph, or more generally of a matroid (E, r), is this function
of two variables:

t(x, y) = Σ_{F ⊆ E} (x − 1)^(r(E)−r(F)) (y − 1)^(|F|−r(F)).

Compared to the chromatic polynomial, the Tutte polynomial contains more information about the
matroid. Still, two or more nonisomorphic matroids may have the same Tutte polynomial.
Version: 5 Owner: bbukh Author(s): bbukh, Larry Hammick

53.5 colouring problem

The colouring problem is to assign a colour to every vertex of a graph such that no two
adjacent vertices have the same colour. These colours, of course, are not necessarily colours
in the optic sense.
Consider the following graph:
[figure omitted]

One potential colouring of this graph is:

[figure omitted]
A and C have the same colour; B and E have a second colour; and D and F have another.
Graph colouring problems have many applications in such situations as scheduling and
matching problems.
Version: 3 Owner: vampyr Author(s): vampyr

53.6 complete bipartite graph

The complete bipartite graph Kn,m is a graph with two sets of vertices, one with n
members and one with m, such that each vertex in one set is adjacent to every vertex in the
other set and to no vertex in its own set. As the name implies, Kn,m is bipartite.
Examples of complete bipartite graphs:
K2,5 : [figure omitted]
K3,3 : [figure omitted]

Version: 3 Owner: vampyr Author(s): vampyr

53.7 complete k-partite graph

The complete k-partite graph Ka1,a2,...,ak is a k-partite graph with a1, a2, . . . , ak vertices of
the respective colours, wherein every vertex is adjacent to every other vertex with a different
colour and to no vertices with the same colour.
For example, the 3-partite complete graph K2,3,4 :
[figure omitted]
Version: 3 Owner: vampyr Author(s): vampyr

53.8 four-color conjecture

The four-color conjecture was a long-standing problem posed by Guthrie while coloring a
map of England. The conjecture states that every map on a plane or a sphere can be colored
using only four colors such that no two adjacent countries are assigned the same color. This
is equivalent to the statement that the chromatic number of every planar graph is no more than
four. After many unsuccessful attempts the conjecture was proven by Appel and Haken in
1976 with the aid of a computer.

Interestingly, the seemingly harder problem of determining the maximal number of colors
needed for all surfaces other than the sphere was solved long before the four-color conjecture
was settled. This number is now called the Heawood number of the surface.

REFERENCES
1. Thomas L. Saaty and Paul C. Kainen. The Four-Color Problem: Assaults and Conquest. Dover,
1986.

Version: 5 Owner: bbukh Author(s): bbukh

53.9 k-partite graph

A k-partite graph is a graph with a chromatic number of k.


An alternate definition of a k-partite graph is a graph where the vertices are partitioned into
k subsets with the following conditions:
1. No two vertices in the same subset are adjacent.
2. There is no partition of the vertices with fewer than k subsets where condition 1 holds.
These two definitions are equivalent. Informally, we see that a colour can be assigned to all
the vertices in each subset, since they are not adjacent to one another. Furthermore, this is
also an optimal colouring, since the second condition holds.
An example of a 4-partite graph:
[figure omitted]
A 2-partite graph is also called a bipartite graph.


Version: 5 Owner: vampyr Author(s): vampyr


53.10 property B

A hypergraph G is said to possess property B if it is 2-colorable, i.e., its vertices can be colored
in two colors, so that no edge of G is monochromatic.
The property was named after Felix Bernstein by E. W. Miller.
Version: 1 Owner: bbukh Author(s): bbukh


Chapter 54
05C20 Directed graphs (digraphs), tournaments
54.1 cut

On a digraph, define a sink to be a vertex with out-degree zero and a source to be a vertex
with in-degree zero. Let G be a digraph with non-negative weights and with exactly one
sink and exactly one source. A cut C on G is a subset of the edges such that every path
from the source to the sink passes through an edge in C. In other words, if we remove every
edge in C from the graph, there is no longer a path from the source to the sink.
Define the weight of C as

W_C = Σ_{e∈C} W(e)

where W (e) is the weight of the edge e.

Observe that we may achieve a trivial cut by removing all the edges of G. Typically, we are
more interested in minimal cuts, where the weight of the cut is minimized for a particular
graph.
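The definition can be checked mechanically: remove the candidate edges and test whether the sink is still reachable from the source. The dictionary representation and vertex names below are illustrative, not from the entry.

```python
from collections import deque

# A sketch: test whether a set of edges is a cut (removing them leaves no
# source-to-sink path) on a weighted digraph given as {(u, v): weight}.
def is_cut(weights, cut, source, sink):
    adj = {}
    for (u, v) in weights:
        if (u, v) not in cut:
            adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:                    # BFS on the reduced graph
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return sink not in seen

w = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 1, ("b", "t"): 4}
cut = {("a", "t"), ("b", "t")}
assert is_cut(w, cut, "s", "t")
assert sum(w[e] for e in cut) == 5   # the weight W_C of this cut
```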
Version: 2 Owner: vampyr Author(s): vampyr

54.2 de Bruijn digraph

The vertices of the de Bruijn digraph B(n, m) are all possible words of length m − 1 chosen
from an alphabet of size n.
B(n, m) has n^m edges, consisting of each possible word of length m from an alphabet of size
n. The edge a1 a2 . . . am connects the vertex a1 a2 . . . am−1 to the vertex a2 a3 . . . am .


For example, B(2, 4) could be drawn as:
[figure omitted: the vertices are the 8 binary words of length 3 and the edges the 16 binary words of length 4]

Notice that an Euler cycle on B(n, m) represents a shortest sequence of characters from an
alphabet of size n that includes every possible subsequence of m characters. For example,
the sequence 0000111101100101000 includes all 4-bit subsequences. Any de Bruijn digraph
must have an Euler cycle, since it is connected and each vertex has in-degree and out-degree equal to n.
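The construction is easy to carry out directly. The function below is a sketch (its name and the digit alphabet, which limits n to at most 10, are assumptions for illustration):

```python
from itertools import product

# Build the de Bruijn digraph B(n, m): vertices are words of length m-1 over
# an n-letter alphabet; each word a1...am of length m yields the edge
# a1...a(m-1) -> a2...am.
def de_bruijn_digraph(n, m):
    alphabet = "0123456789"[:n]          # illustrative alphabet, n <= 10
    vertices = ["".join(w) for w in product(alphabet, repeat=m - 1)]
    edges = [("".join(w)[:-1], "".join(w)[1:])
             for w in product(alphabet, repeat=m)]
    return vertices, edges

V, E = de_bruijn_digraph(2, 4)
assert len(V) == 2 ** 3 and len(E) == 2 ** 4   # 8 vertices, 16 edges
assert ("000", "001") in E                     # the edge labelled 0001
```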
Version: 3 Owner: vampyr Author(s): vampyr

54.3 directed graph

A directed graph or digraph is a pair (V, E) where V is a set of vertices and E is a


subset of V × V called edges or arcs.
If E is symmetric (i.e., (u, v) ∈ E if and only if (v, u) ∈ E), then the digraph is isomorphic
to an ordinary (that is, undirected) graph.
Digraphs are generally drawn in a similar manner to graphs with arrows on the edges to
indicate a sense of direction. For example, the digraph
({a, b, c, d}, {(a, b), (b, d), (b, c), (c, b), (c, c), (c, d)})
may be drawn as
[figure omitted]

Version: 2 Owner: vampyr Author(s): vampyr

54.4 flow

On a digraph, define a sink to be a vertex with out-degree zero and a source to be a vertex
with in-degree zero. Let G be a digraph with non-negative weights and with exactly one
sink and exactly one source. A flow on G is an assignment f : E(G) → R of values to each
edge of G satisfying certain rules:
1. For any edge e, we must have 0 ≤ f(e) ≤ W(e) (where W(e) is the weight of e).
2. For any vertex v, excluding the source and the sink, let Ein be the set of edges incident
to v and let Eout be the set of edges incident from v. Then we must have

Σ_{e∈Ein} f(e) = Σ_{e∈Eout} f(e).

Let Esource be the edges incident from the source, and let Esink be the set of edges incident
to the sink. If f is a flow, then

Σ_{e∈Esink} f(e) = Σ_{e∈Esource} f(e).

We will refer to this quantity as the amount of flow.


Note that a flow given by f (e) = 0 trivially satisfies these conditions. We are typically more
interested in maximum flows, where the amount of flow is maximized for a particular
graph.
We may interpret a flow as a means of transmitting something through a network. Suppose
we think of the edges in a graph as pipes, with the weights corresponding with the capacities
of the pipes; we are pouring water into the system through the source and draining it through
the sink. Then the first rule requires that we do not pump more water through a pipe than
is possible, and the second rule requires that any water entering a junction of pipes must
leave. Under this interpretation, the maximum amount of flow corresponds to the maximum
amount of water we could pump through this network.
Instead of water in pipes, one may think of electric charge in a network of conductors. Rule
(2) above is one of Kirchhoff's two laws for such networks; the other says that the sum of the
voltage drops around any circuit is zero.
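The two rules and the amount of flow can be verified mechanically. The sketch below uses an illustrative dictionary representation keyed by (u, v) pairs; names are assumptions, not from the entry.

```python
# Verify the capacity rule and the conservation rule for a flow, and
# return the amount of flow (total leaving the source).
def check_flow(weights, flow, source, sink):
    assert all(0 <= flow[e] <= weights[e] for e in weights)      # rule 1
    nodes = {u for e in weights for u in e} - {source, sink}
    for v in nodes:                                              # rule 2
        into = sum(f for (a, b), f in flow.items() if b == v)
        out = sum(f for (a, b), f in flow.items() if a == v)
        assert into == out
    return sum(f for (a, b), f in flow.items() if a == source)

w = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2, ("b", "t"): 4}
f = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 2, ("b", "t"): 1}
assert check_flow(w, f, "s", "t") == 3   # amount of flow
```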
Version: 3 Owner: nobody Author(s): Larry Hammick, vampyr

54.5 maximum flow/minimum cut theorem

Let G be a finite digraph with nonnegative weights and with exactly one sink and exactly
one source. Then
I) For any flow f on G and any cut C of G, the amount of flow for f is less than or equal to
the weight of C.
II) There exists a flow f0 on G and a cut C0 of G such that the flow of f0 equals the weight
of C0 .
Proof: (I) is easy, so we prove only (II). Write R for the set of nonnegative real numbers.
Let V be the set of vertices of G. Define a matrix

Φ : V × V → R

where Φ(x, y) is the sum of the weights (or capacities) of all the directed edges from x to y.
By hypothesis there is a unique v ∈ V (the source) such that

Φ(x, v) = 0 for all x ∈ V

and a unique w ∈ V (the sink) such that

Φ(w, x) = 0 for all x ∈ V.

We may also assume Φ(x, x) = 0 for all x ∈ V. Any flow f will correspond uniquely (see
Remark below) to a matrix

φ : V × V → R

such that

φ(x, y) ≤ Φ(x, y) for all x, y ∈ V,
Σ_z φ(x, z) = Σ_z φ(z, x) for all x ≠ v, w.

Let φ be the matrix of any maximal flow, and let A be the set of x ∈ V such that there
exists a finite sequence x0 = v, x1 , . . . , xn = x such that for all m from 1 to n − 1, we have
either

φ(xm , xm+1 ) < Φ(xm , xm+1 )    (54.5.1)

or

φ(xm+1 , xm ) > 0 .    (54.5.2)

Write B = V − A.
Trivially, v ∈ A. Let us show that w ∈ B. Arguing by contradiction, suppose w ∈ A, and let
(xm ) be a sequence from v to w with the properties we just mentioned. Take a real number
ε > 0 such that

ε + φ(xm , xm+1 ) < Φ(xm , xm+1 )

for all the (finitely many) m for which (54.5.1) holds, and such that

φ(xm+1 , xm ) > ε

for all the m for which (54.5.2) holds. But now we can define a matrix ψ with a larger flow
than φ (larger by ε) by:

ψ(xm , xm+1 ) = ε + φ(xm , xm+1 ) if (54.5.1) holds
ψ(xm+1 , xm ) = φ(xm+1 , xm ) − ε if (54.5.2) holds
ψ(a, b) = φ(a, b) for all other pairs (a, b).

This contradiction shows that w ∈ B.

Now consider the set C of pairs (x, y) of vertices such that x ∈ A and y ∈ B. Since B is
nonempty, C is a cut. But also, for any (x, y) ∈ C we have

φ(x, y) = Φ(x, y)    (54.5.3)

for otherwise we would have y ∈ A. Summing (54.5.3) over C, we see that the amount of
the flow f is the capacity of C, QED.
Remark: We expressed the proof rather informally, because the terminology of graph theory
is not very well standardized and cannot all be found yet here at PlanetMath. Please feel
free to suggest any revision you think worthwhile.
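The augmenting-path argument in the proof is constructive and is the basis of practical algorithms. The sketch below is the Edmonds–Karp variant of Ford–Fulkerson, not the entry's own construction; the capacity dictionary and vertex names are illustrative.

```python
from collections import deque

# Maximum flow via shortest augmenting paths in the residual graph.
# `cap` is a dict of dicts of nonnegative edge capacities.
def max_flow(cap, source, sink):
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():          # reverse residual edges start at 0
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:   # BFS for an augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total                 # no augmenting path: flow is maximal
        path, v = [], sink               # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:                # augment along the path
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
        total += bottleneck

cap = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 4}, "t": {}}
assert max_flow(cap, "s", "t") == 4
```

By the theorem, the value returned also equals the weight of a minimum cut of the network.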
Version: 5 Owner: bbukh Author(s): Larry Hammick, vampyr

54.6 tournament

A tournament is a directed graph obtained by choosing a direction for each edge in an


undirected complete graph. For example, here is a tournament on 4 vertices:
[figure omitted]

Any tournament on a finite number n of vertices contains a Hamiltonian path, i.e., a directed
path on all n vertices. This is easily shown by induction on n: suppose that the statement
holds for n, and consider any tournament T on n + 1 vertices. Choose a vertex v0 of T and
consider a directed path v1 , v2 , . . . , vn in T \ {v0 }. Now let i ∈ {0, . . . , n} be maximal such
that vj → v0 for all j with 1 ≤ j ≤ i. Then

v1 , . . . , vi , v0 , vi+1 , . . . , vn
is a directed path as desired.
The name tournament originates from such a graph's interpretation as the outcome of
some sports competition in which every player encounters every other player exactly once,
and in which no draws occur; let us say that an arrow points from the winner to the loser.
A player who wins all games would naturally be the tournament's winner. However, as the
above example shows, there might not be such a player; a tournament for which there isn't
is called a 1-paradoxical tournament. More generally, a tournament T = (V, E) is called
k-paradoxical if for every k-subset V′ of V there is a v0 ∈ V \ V′ such that v0 → v for all
v ∈ V′. By means of the probabilistic method Erdős showed that if |V| is sufficiently large,
then almost every tournament on V is k-paradoxical.
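The induction above is really an insertion algorithm: given a Hamiltonian path on the vertices seen so far, insert the next vertex after the longest prefix of players who beat it. A sketch, with an assumed representation mapping each player to the set of players it beats:

```python
# Build a Hamiltonian path in a tournament by repeated insertion,
# following the inductive argument above.
def hamiltonian_path(beats):
    path = []
    for v in beats:
        i = 0
        # advance past every prefix vertex that beats v
        while i < len(path) and v in beats[path[i]]:
            i += 1
        path.insert(i, v)        # v beats path[i] (or goes last)
    return path

# The 4-vertex tournament 1->2, 1->3, 2->3, 3->4, 4->1, 4->2:
t = {1: {2, 3}, 2: {3}, 3: {4}, 4: {1, 2}}
p = hamiltonian_path(t)
assert all(p[i + 1] in t[p[i]] for i in range(len(p) - 1))
```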
Version: 3 Owner: bbukh Author(s): bbukh, draisma


Chapter 55
05C25 Graphs and groups
55.1 Cayley graph

Let G = ⟨X | R⟩ be a presentation of the finitely generated group G with generators X and
relations R. We define the Cayley graph Γ = Γ(G, X) of G with generators X as

Γ = (G, E),

where

E = {{u, a · u} | u ∈ G, a ∈ X}.

That is, the vertices of the Cayley graph are precisely the elements of G, and two elements
of G are connected by an edge iff some generator in X transfers the one to the other.

Examples
1. G = Zd , with generators X = {e1 , . . . , ed }, the standard basis vectors. Then Γ(G, X)
is the d-dimensional grid; confusingly, it too is often termed Zd .
2. G = Fd , the free group with the d generators X = {g1 , ..., gd }. Then Γ(G, X) is the
2d-regular tree.
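The definition translates directly into code for a finite group. A sketch, assuming the group is given as a list of elements together with a multiplication function (here Z6 under addition, whose Cayley graph with generating set {1} is the 6-cycle):

```python
# Edge set of a Cayley graph: for each element u and generator a,
# connect u to a * u (written here as op(a, u)).
def cayley_graph(elements, generators, op):
    return {frozenset((u, op(a, u))) for u in elements for a in generators}

Z6 = range(6)
E = cayley_graph(Z6, [1], lambda a, u: (a + u) % 6)
assert len(E) == 6                  # the 6-cycle C6
assert frozenset((0, 1)) in E
```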
Version: 2 Owner: ariels Author(s): ariels


Chapter 56
05C38 Paths and cycles
56.1 Euler path

An Euler path along a connected graph with n vertices is a path connecting all n vertices,
and traversing every edge of the graph only once. Note that a vertex with an odd degree
allows one to traverse through it and return by another path at least once, while a vertex
with an even degree only allows a number of traversals through, but one cannot end an Euler
path at a vertex with even degree. Thus, a connected graph has an Euler path which is a
circuit (an Euler circuit) if all of its vertices have even degree. A connected graph has an
Euler path which is not a circuit if it has exactly two vertices with odd degree.

[figure omitted] This graph has an Euler path which is a circuit. All of its vertices are of even degree.

[figure omitted] This graph has an Euler path which is not a circuit. It has exactly two vertices of odd degree.
Note that a graph must be connected to have an Euler path or circuit. A graph is connected
if every pair of vertices u and z has a path uv, . . . , yz between them.
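The degree criterion above is easy to check by counting odd-degree vertices. A sketch, assuming a connected graph given as an adjacency-list dictionary (connectivity itself is not checked here):

```python
# Classify a connected graph: Euler circuit (no odd vertices),
# non-circuit Euler path (exactly two), or neither.
def euler_path_type(adj):
    odd = sum(1 for v in adj if len(adj[v]) % 2 == 1)
    if odd == 0:
        return "circuit"
    if odd == 2:
        return "path"
    return "none"

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
assert euler_path_type(square) == "circuit"
square[0].append(2); square[2].append(0)   # add a diagonal: two odd vertices
assert euler_path_type(square) == "path"
```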
Version: 12 Owner: slider142 Author(s): slider142

56.2 Veblen's theorem

The edge set of a graph can be partitioned into cycles if and only if every vertex has even
degree.

Version: 2 Owner: digitalis Author(s): digitalis

56.3 acyclic graph

Any graph that contains no cycles is an acyclic graph. A directed acyclic graph is often
called a DAG for short.
For example, the following graph and digraph are acyclic.
[figures omitted]

In contrast, the following graph and digraph are not acyclic, because each contains a cycle.

[figures omitted]

Version: 5 Owner: Logan Author(s): Logan

56.4 bridges of Königsberg

The bridges of Königsberg is a famous problem inspired by an actual place and situation.
The solution of the problem, put forth by Leonhard Euler in 1736, is the first work of
graph theory and is responsible for the foundation of the discipline.
The following figure shows a portion of the Prussian city of Königsberg. A river passes
through the city, and there are two islands in the river. Seven bridges cross between the
islands and the mainland:

Figure 1: Map of the Königsberg bridges.

The mathematical problem arose when citizens of Königsberg noticed that one could not
take a stroll across all seven bridges, returning to the starting point, without crossing at
least one bridge twice.


Answering the question of why this is the case required a mathematical theory that didn't
exist yet: graph theory. This was provided by Euler, in a paper which is still available today.
To solve the problem, we must translate it into a graph-theoretic representation. We model
the land masses, A, B, C and D, as vertices in a graph. The bridges between the land
masses become edges. This generates from the above picture the following graph:

Figure 2: Graph-theoretic representation of the Königsberg bridges.

At this point, we can apply what we know about Euler paths and Euler circuits. Since an
Euler circuit for a graph exists only if every vertex has an even degree, the Königsberg graph
must have no Euler circuit. Hence, we have explained why one cannot take a walk around
Königsberg and return to the starting point without crossing at least one bridge more than
once.
Version: 5 Owner: akrowne Author(s): akrowne

56.5 cycle

A cycle in a graph, digraph, or multigraph, is a simple path from a vertex to itself (i.e., a
path where the first vertex is the same as the last vertex and no edge is repeated).
For example, consider this graph:
[figure omitted]

ABCDA and BDAB are two of the cycles in this graph. ABA is not a cycle, however, since
it uses the edge connecting A and B twice. ABCD is not a cycle because it begins on A but
ends on D.
A cycle of length n is sometimes denoted Cn and may be referred to as a polygon of n sides:
that is, C3 is a triangle, C4 is a quadrilateral, C5 is a pentagon, etc.
An even cycle is one of even length; similarly, an odd cycle is one of odd length.
Version: 4 Owner: vampyr Author(s): vampyr


56.6 girth

The girth of a graph G is the length of the shortest cycle in G. (There is no widespread
agreement on the girth of a forest, which has no cycles; it is also extremely unimportant.)

For instance, the girth of any grid Zd (where d ≥ 2) is 4, and the girth of the vertex graph
of the dodecahedron is 5.
Version: 1 Owner: ariels Author(s): ariels

56.7 path

A path in a graph is a finite sequence of alternating vertices and edges, beginning and ending
with a vertex, v1 e1 v2 e2 v3 . . . en−1 vn , such that every consecutive pair of vertices vx and vx+1
are adjacent and ex is incident with vx and with vx+1 . Typically, the edges may be omitted
when writing a path (e.g., v1 v2 v3 . . . vn ) since only one edge of a graph may connect two
adjacent vertices. In a multigraph, however, the choice of edge may be significant.
The length of a path is the number of edges in it.
Consider the following graph:
[figure omitted]
Paths include (but are certainly not limited to) ABCD (length 3), ABCDA (length 4), and
ABABABABADCBA (length 12). ABD is not a path since B is not adjacent to D.
In a digraph, each consecutive pair of vertices must be connected by an edge with the proper
orientation; if e = (u, v) is an edge, but (v, u) is not, then uev is a valid path but veu is not.
Consider this digraph:
[figure omitted]
GHIJ, GJ, and GHGHGH are all valid paths. GHJ is not a valid path because H and
J are not connected. GJI is not a valid path because the edge connecting I to J has the
opposite orientation.
Version: 3 Owner: vampyr Author(s): vampyr

56.8 proof of Veblen's theorem

The proof is very easy by induction on the number of elements of the set E of edges. If E is
empty, then all the vertices have degree zero, which is even. Suppose E is nonempty. If the
graph contains no cycle, then some vertex has degree 1, which is odd. Finally, if the graph
does contain a cycle C, then every vertex has the same degree mod 2 with respect to E − C
as it has with respect to E, and we can conclude by induction.
Version: 1 Owner: mathcam Author(s): Larry Hammick


Chapter 57
05C40 Connectivity
57.1 k-connected graph

For k ∈ N, a graph G is k-connected iff G has more than k vertices and the graph left by
removing any fewer than k vertices is connected. The largest integer k such that G is k-connected
is called the connectivity of G and is denoted by κ(G).
Version: 1 Owner: lieven Author(s): lieven

57.2 Thomassen's theorem on 3-connected graphs

Every 3-connected graph G with more than 4 vertices has an edge e such that G/e is also
3-connected.
Suppose such an edge doesn't exist. Then, for every edge e = xy, the graph G/e isn't
3-connected and can be made disconnected by removing 2 vertices. Since κ(G) ≥ 3, our
contracted vertex vxy has to be one of these two. So for every edge e, G has a vertex z ≠ x, y
such that {vxy , z} separates G/e. Any 2 vertices separated by {vxy , z} in G/e are separated
in G by S := {x, y, z}. Since the minimal size of a separating set is 3, every vertex in S has
an adjacent vertex in every component of G − S.
Now we choose the edge e, the vertex z and the component C such that |C| is minimal. We
also choose a vertex v adjacent to z in C.
By construction G/zv is not 3-connected since removing xy disconnects C − v from G/zv.
So there is a vertex w such that {z, v, w} separates G, and as above every vertex in {z, v, w}
has an adjacent vertex in every component of G − {z, v, w}. We now consider a component
D of G − {z, v, w} that doesn't contain x or y. Such a component exists since x and y belong
to the same component and G − {z, v, w} isn't connected. Any vertex adjacent to v in D
is also an element of C since v is an element of C. This means D is a proper subset of C,
which contradicts our assumption that |C| was minimal.
Version: 2 Owner: lieven Author(s): lieven

57.3 Tutte's wheel theorem

Every 3-connected simple graph can be constructed starting from a wheel graph by repeatedly either adding an edge between two non-adjacent vertices or splitting a vertex.
Version: 1 Owner: lieven Author(s): lieven

57.4 connected graph

A connected graph is a graph such that there exists a path between all pairs of vertices.
If the graph is a directed graph, and there exists a path from each vertex to every other
vertex, then it is a strongly connected graph.
A connected component is a subset of vertices of any graph and any edges between them
that forms a connected graph. Similarly, a strongly connected component is a subset
of vertices of any digraph and any edges between them that forms a strongly connected
graph. Any graph or digraph is a union of connected or strongly connected components,
plus some edges to join the components together. Thus any graph can be decomposed
into its connected or strongly connected components. For instance, Tarjan's algorithm can
decompose any digraph into its strongly connected components.
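For the undirected case the decomposition needs only a graph search. A sketch with BFS, using an illustrative adjacency-list representation (Tarjan's algorithm for the directed case is more involved and not shown):

```python
from collections import deque

# Decompose an undirected graph into its connected components.
def connected_components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:                    # BFS from each unseen vertex
            u = queue.popleft()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

g = {1: [2], 2: [1], 3: [4], 4: [3], 5: []}   # three components
assert sorted(map(sorted, connected_components(g))) == [[1, 2], [3, 4], [5]]
```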
For example, the following graph and digraph are connected and strongly connected, respectively.
[figures omitted]

On the other hand, the following graph is not connected, and consists of the union of two
connected components.

[figure omitted]

The following digraph is not strongly connected, because there is no way to reach F from
other vertices, and there is no vertex reachable from C.

[figure omitted]

The three strongly connected components of this graph are

[figures omitted]
Version: 3 Owner: Logan Author(s): Logan

57.5 cutvertex

A cutvertex of a graph G is a vertex whose deletion increases the number of components


of G. The edge analogue of a cutvertex is a bridge.
Version: 2 Owner: digitalis Author(s): digitalis


Chapter 58
05C45 Eulerian and Hamiltonian graphs
58.1 Bondy and Chvátal theorem

Bondy and Chvátal's theorem:
Let G be a graph of order n ≥ 3 and suppose that u and v are distinct non-adjacent vertices
such that deg(u) + deg(v) ≥ n.
Then G is Hamiltonian if and only if G + uv is Hamiltonian.
Version: 1 Owner: drini Author(s): drini

58.2 Dirac theorem

Theorem: Every graph with n ≥ 3 vertices and minimum degree at least n/2 has a Hamiltonian cycle.

Proof: Let G = (V, E) be a graph with |G| = n ≥ 3 and δ(G) ≥ n/2. Then G is connected:
otherwise, the degree of any vertex in the smallest component C of G would be less than
|C| ≤ n/2. Let P = x0 . . . xk be a longest path in G. By the maximality of P, all the neighbours
of x0 and all the neighbours of xk lie on P. Hence at least n/2 of the vertices x0 , . . . , xk−1 are
adjacent to xk , and at least n/2 of these same k < n vertices xi are such that x0 xi+1 ∈ E. By the
pigeonhole principle, there is a vertex xi that has both properties, so we have x0 xi+1 ∈ E
and xi xk ∈ E for some i < k. We claim that the cycle C := x0 xi+1 P xk xi P x0 is a Hamiltonian
cycle of G. Indeed, since G is connected, C would otherwise have a neighbour in G − C, which
could be combined with a spanning path of C into a path longer than P. □
Version: 5 Owner: vladm Author(s): vladm


58.3 Euler circuit

An Euler circuit is a connected graph such that starting at a vertex a, one can traverse along
every edge of the graph once to each of the other vertices and return to vertex a. In other
words, an Euler circuit is an Euler path that is a circuit. Thus, using the properties of odd
and even degree vertices given in the definition of an Euler path, an Euler circuit exists iff
every vertex of the graph has an even degree.

[figure omitted] This graph is an Euler circuit, as all vertices have degree 2.

[figure omitted] This graph is not an Euler circuit.


Version: 6 Owner: slider142 Author(s): slider142

58.4 Fleury's algorithm

Fleury's algorithm constructs an Euler circuit in a graph (if it's possible).

1. Pick any vertex to start


2. From that vertex pick an edge to traverse, observing the following rule: never cross a
bridge of the reduced graph unless there is no other choice
3. Darken that edge, as a reminder that you can't traverse it again
4. Travel that edge, coming to the next vertex
5. Repeat 2-4 until all edges have been traversed, and you are back at the starting vertex
By reduced graph we mean the original graph minus the darkened (already used) edges.
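The steps above can be sketched directly. Bridge detection here is naive (remove the edge and test whether the far endpoint is still reachable), which is fine for small examples; the multigraph representation as a list of frozenset edges is an assumption for illustration.

```python
# Reachable vertices from `start` in a multigraph given as frozenset edges.
def reachable(edges, start):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                v = next(iter(e - {u}), u)   # other endpoint (or u on a loop)
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen

# Fleury's algorithm: traverse edges, never crossing a bridge of the
# reduced graph unless there is no other choice.
def fleury(edges, start):
    edges = list(edges)
    circuit, u = [start], start
    while edges:
        candidates = [e for e in edges if u in e]
        for e in candidates:
            rest = [f for f in edges if f is not e]
            v = next(iter(e - {u}), u)
            if len(candidates) == 1 or v in reachable(rest, u):
                break                    # e is safe (not a bridge, or forced)
        edges.remove(e)                  # "darken" the traversed edge
        circuit.append(v)
        u = v
    return circuit

square = [frozenset(p) for p in ((0, 1), (1, 2), (2, 3), (3, 0))]
tour = fleury(square, 0)
assert tour[0] == tour[-1] == 0 and len(tour) == 5
```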
Version: 3 Owner: Johan Author(s): Johan


58.5 Hamiltonian cycle

Let G be a graph. If there is a cycle visiting all vertices exactly once, we say that the cycle is
a Hamiltonian cycle.
Version: 2 Owner: drini Author(s): drini

58.6 Hamiltonian graph

Let G be a graph or digraph.


If G has a Hamiltonian cycle, we call G a Hamiltonian graph.
There is no useful necessary and sufficient condition for a graph being Hamiltonian. However,
we can get some necessary conditions from the definition, such as: a Hamiltonian graph is always
connected and has order at least 3. This and other observations lead to the condition:
Let G = (V, E) be a graph of order at least 3. If G is Hamiltonian, then for every proper subset
U of V , the subgraph induced by V \ U has at most |U| components.
For the sufficiency conditions, we get results like Ore's theorem or the Bondy and Chvátal theorem.
Version: 5 Owner: drini Author(s): drini

58.7 Hamiltonian path

Let G be a graph. A path on G that includes every vertex exactly once is called a hamiltonian path.
Version: 4 Owner: drini Author(s): drini

58.8 Ore's theorem

Let G be a graph of order n ≥ 3 such that, for every pair of distinct non-adjacent vertices u
and v, deg(u) + deg(v) ≥ n. Then G is a Hamiltonian graph.
Version: 3 Owner: drini Author(s): drini


58.9 Petersen graph

Petersen's graph: an example of a graph that is traceable but not Hamiltonian. That is, it
has a Hamiltonian path but doesn't have a Hamiltonian cycle.

This is also the canonical example of a hypohamiltonian graph.


Version: 5 Owner: drini Author(s): drini

58.10 hypohamiltonian

A graph G is hypohamiltonian if G is not Hamiltonian, but G − v is Hamiltonian for each
v ∈ V (V the vertex set of G). The smallest hypohamiltonian graph is the Petersen graph,
which has ten vertices.
Version: 1 Owner: digitalis Author(s): digitalis

58.11 traceable

Let G be a graph. If G has a Hamiltonian path, we say that G is traceable.

Not every traceable graph is Hamiltonian. As an example consider Petersen's graph.
Version: 2 Owner: drini Author(s): drini


Chapter 59
05C60 Isomorphism problems (reconstruction conjecture, etc.)
59.1 graph isomorphism

A graph isomorphism is a bijection between the vertices of two graphs G and H:

f : V (G) → V (H)

with the property that any two vertices u and v from G are adjacent if and only if f (u) and
f (v) are adjacent in H.
If an isomorphism can be constructed between two graphs, then we say those graphs are
isomorphic.
For example, consider these two graphs:
[figures omitted]
Although these graphs look very different at first, they are in fact isomorphic; one isomorphism between them is
f (a) = 1
f (b) = 6
f (c) = 8
f (d) = 3
f (g) = 5
f (h) = 2
f (i) = 4
f (j) = 7
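The definition can be tested by brute force: try every bijection between the vertex sets and check that adjacency is preserved in both directions. This is exponential in the number of vertices, so it is only a sketch for small graphs; the adjacency-set representation is an assumption.

```python
from itertools import permutations

# Brute-force graph isomorphism test for small simple graphs.
def isomorphic(adj_g, adj_h):
    vg, vh = sorted(adj_g), sorted(adj_h)
    if len(vg) != len(vh):
        return False
    for perm in permutations(vh):
        f = dict(zip(vg, perm))          # candidate bijection V(G) -> V(H)
        if all((f[v] in adj_h[f[u]]) == (v in adj_g[u])
               for u in vg for v in vg):
            return True
    return False

c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}                 # 4-cycle
p4 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}   # 4-path
c4b = {"w": {"x", "z"}, "x": {"w", "y"}, "y": {"x", "z"}, "z": {"w", "y"}}
assert isomorphic(c4, c4b) and not isomorphic(c4, p4)
```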
Version: 2 Owner: vampyr Author(s): vampyr


Chapter 60
05C65 Hypergraphs
60.1 Steiner system

A Steiner system S(t, k, n) is a k-uniform hypergraph on n vertices such that every set
of t vertices is contained in exactly one edge. Notice that the S(2, k, n) are merely 2-uniform
linear spaces. The families of hypergraphs S(2, 3, n) are known as Steiner triple systems.
Version: 2 Owner: drini Author(s): drini, NeuRet

60.2 finite plane

Let H = (V, E) be a linear space. A finite plane is an intersecting linear space. That is to
say, a linear space in which any two edges in E have a nonempty intersection.
Finite planes are rather restrictive hypergraphs, and the following holds.
Theorem 4. Let H = (V, E) be a finite plane. Then for some positive integer k, H is
(k + 1)-regular, (k + 1)-uniform, and |E| = |V | = k² + k + 1.
The above k is the order of the finite plane. It is not known in general if finite planes exist
of order other than k a power of a prime. The terminology finite plane is suggestive, as we
can think of the edges as a finite collection of lines in Euclidean space in general position
so that they all intersect pairwise in exactly one point. The added restriction that all pairs
of vertices determine an edge (i.e. any two points determine a line), however, makes it
impossible to depict a finite plane in the Euclidean plane by means of straight lines, except
for the trivial case k = 1. The finite plane of order 2 is known as the Fano plane. The
following is a diagrammatic representation:


An edge here is represented by a straight line, and the inscribed circle is also an edge. In
other words, for a vertex set {1, 2, 3, 4, 5, 6, 7}, the edges of the Fano plane are
{1, 2, 4}
{2, 3, 5}
{3, 4, 6}
{4, 5, 7}
{5, 6, 1}
{6, 7, 2}
{7, 1, 3}
Notice that the Fano plane is generated by the ordered triplet (1, 2, 4) and adding 1 to each
entry, modulo 7. The generating triplet has the property that the differences of any two
elements, in either order, are all pairwise different modulo 7. In general, if we can find a set
of k + 1 elements of Z_{k²+k+1} (the integers modulo k² + k + 1) with all pairwise differences
distinct, then this gives a cyclic representation of the finite plane of order k.
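The cyclic construction is short enough to verify by machine: generate the seven lines from the base triple (1, 2, 4) and check both that every pair of points lies on exactly one line and that the lines pairwise intersect.

```python
from itertools import combinations

# Generate the Fano plane cyclically from the base triple (1, 2, 4) mod 7,
# with points labelled 1..7 as in the entry.
base = (1, 2, 4)
edges = [frozenset((x + i - 1) % 7 + 1 for x in base) for i in range(7)]

# every pair of the 7 points lies in exactly one line (linear space)
for pair in combinations(range(1, 8), 2):
    assert sum(set(pair) <= e for e in edges) == 1
# and the lines pairwise intersect (the finite-plane condition)
assert all(a & b for a, b in combinations(edges, 2))
```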
Version: 10 Owner: drini Author(s): NeuRet

60.3 hypergraph

A hypergraph H is an ordered pair (V, E) where V is a set of vertices and E is a set of


edges such that E P(V ). In other words, an edge is nothing more than a set of vertices.
Sometimes it is desirable to restrict this definition more. The empty hypergraph is not very
interesting, so we usually require that V ≠ ∅. Singleton edges are allowed in general, but
not the empty edge. Most applications consider only finite hypergraphs, but occasionally it
is also useful to allow V to be infinite.
Many of the definitions for graphs carry over verbatim to hypergraphs. H is said to be k-uniform
if every edge e ∈ E has cardinality k. The degree of a vertex v is the number of edges in
E that contain this vertex, often denoted d(v). H is k-regular if every vertex has degree k.
Notice that an ordinary graph is merely a 2-uniform hypergraph.
Let V = {v_1, v_2, . . . , v_n} and E = {e_1, e_2, . . . , e_m}. Associated to any hypergraph is the
n × m incidence matrix A = (a_ij) where

a_ij = 1 if v_i ∈ e_j, and a_ij = 0 otherwise.

The transpose A^t of the incidence matrix also defines a hypergraph H*, the dual of H, in
an obvious manner. To be explicit, let H* = (V*, E*) where V* is an m-element set and E*
is an n-element set of subsets of V*. For v_j ∈ V* and e_i ∈ E*, v_j ∈ e_i if and only if a_ij = 1.
Notice, of course, that the dual of a uniform hypergraph is regular and vice-versa. It is not
rare to see fruitful results emerge by considering the dual of a hypergraph.
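As an illustration of the incidence matrix and the dual, here is a short sketch (the helper names are mine, not from the entry) for a small hypergraph on four vertices:

```python
# An illustrative sketch (helper names are mine) of the incidence matrix
# and the dual hypergraph for a small hypergraph on four vertices.

def incidence_matrix(vertices, edges):
    # a[i][j] = 1 if vertex i lies in edge j, else 0
    return [[1 if v in e else 0 for e in edges] for v in vertices]

def dual(vertices, edges):
    # Dual vertices correspond to the edges e_0..e_{m-1}; the dual edge
    # for an original vertex v collects every e_j containing v, i.e. a
    # column of the transposed incidence matrix.
    return [frozenset(j for j, e in enumerate(edges) if v in e)
            for v in vertices]

V = [1, 2, 3, 4]
E = [frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({1, 3, 4})]

A = incidence_matrix(V, E)   # a 4 x 3 matrix
H_dual = dual(V, E)          # 3 dual vertices, 4 dual edges
```

Note how the dual of this non-uniform hypergraph fails to be uniform but each dual edge has size equal to the degree of the corresponding original vertex.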
Version: 9 Owner: drini Author(s): NeuRet

60.4 linear space
A hypergraph H = (V, E) is a linear space if any pair of vertices is found in exactly one
edge. This usage of the term has no relation to its occasional appearance in linear algebra
as a synonym for a vector space.
The following two observations are often useful for linear spaces. Let n = |V| and pick some
arbitrary v ∈ V.

1. ∑_{e ∈ E} C(|e|, 2) = C(n, 2)

2. ∑_{e ∋ v} (|e| − 1) = n − 1, where the sum is taken over all edges containing v.

The first property holds because every pair of vertices is contained in precisely one edge.
For a fixed edge e, C(|e|, 2) counts all the pairs of vertices in e, and summing this over all
edges gives all the possible pairs of vertices, which is C(n, 2). The second holds because,
given any vertex v, this vertex forms exactly one edge with every other vertex; |e| − 1 counts
the vertices other than v in the edge e, and summing over all edges containing v gives all
the vertices except v.
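Both identities are easy to check numerically; the following sketch (not part of the original entry) verifies them on the Fano plane, a linear space on n = 7 vertices:

```python
# A numerical check (not part of the original entry) of the two
# identities, using the Fano plane: a linear space on n = 7 vertices.
from math import comb

edges = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7},
         {5, 6, 1}, {6, 7, 2}, {7, 1, 3}]
n = 7

# 1. the sum over all edges of C(|e|, 2) equals C(n, 2)
assert sum(comb(len(e), 2) for e in edges) == comb(n, 2)

# 2. for every vertex v, the sum of |e| - 1 over edges containing v
#    is n - 1
for v in range(1, n + 1):
    assert sum(len(e) - 1 for e in edges if v in e) == n - 1
```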
Version: 2 Owner: drini Author(s): NeuRet
Chapter 61
05C69 Dominating sets,
independent sets, cliques
61.1 Mantel's theorem

Every graph of order n and size greater than ⌊n^2/4⌋ contains a triangle (a cycle of length 3).
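The bound is tight and can be confirmed by brute force for small n; the following sketch (mine, not from the entry) enumerates every graph on n labelled vertices:

```python
# Brute-force confirmation of the bound for small n (a sketch, not part
# of the original entry): the largest triangle-free graph on n labelled
# vertices has exactly floor(n^2/4) edges.
from itertools import combinations

def max_triangle_free_edges(n):
    all_edges = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(all_edges)):  # every graph on n vertices
        chosen = [e for i, e in enumerate(all_edges) if mask >> i & 1]
        adj = {v: set() for v in range(n)}
        for u, v in chosen:
            adj[u].add(v)
            adj[v].add(u)
        # a triangle exists iff some edge's endpoints share a neighbor
        if any(adj[u] & adj[v] for u, v in chosen):
            continue
        best = max(best, len(chosen))
    return best

for n in range(2, 6):
    assert max_triangle_free_edges(n) == n * n // 4
```

The extremal graphs are the complete bipartite graphs with parts as equal as possible.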
Version: 2 Owner: digitalis Author(s): digitalis

61.2 clique

A maximal complete subgraph of a graph is a clique, and the clique number ω(G) of a
graph G is the maximal order of a clique in G. Simply put, ω(G) is the maximal order of a
complete subgraph of G. Some authors, however, define a clique as any complete subgraph of
G, and refer to a clique of the largest order as a maximum clique.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 9 Owner: mathwizard Author(s): mathwizard, digitalis

61.3 proof of Mantel's theorem

Let us consider a graph G with n vertices and no triangles, and find a characterization of
such graphs which distinguishes them from graphs with n vertices and at least one triangle.
Give each vertex v_i, i = 1, 2, . . . , n, a weight w_i ≥ 0 such that ∑ w_i = 1. Let
S = ∑ w_i w_j, where the sum runs over all (i, j) in E(G). Now take two vertices h, k that are
not joined; let x be the total weight of the neighbors of h and y the total weight of the
neighbors of k, and assume x ≥ y. If we take a little portion of weight from the vertex k
and add it to the weight of h, this shifting won't decrease S, so S is maximal when all the
weight is concentrated on a K_2, that is, a complete subgraph made up of two vertices.
Therefore, since S = vw with v + w = 1, at the maximum v = w = 1/2 and S ≤ 1/4. The
theorem is proven by taking all the weights equal to 1/n: in that case S = (1/n^2)|E(G)|,
and hence |E(G)| ≤ n^2/4.
Version: 2 Owner: deiudi Author(s): deiudi
Chapter 62
05C70 Factorization, matching,
covering and packing
62.1 Petersen theorem

If G is a 3-regular 2-edge connected graph, then G has a complete matching.


Version: 2 Owner: scineram Author(s): scineram

62.2 Tutte theorem

Let G(V, E) be any finite graph. G has a complete matching if and only if for every
X ⊆ V(G) we have c_p(G − X) ≤ |X|, where c_p(H) is the number of components of the
graph H with an odd number of points.
Version: 2 Owner: scineram Author(s): scineram

62.3 bipartite matching

A matching on a bipartite graph is called a bipartite matching. Bipartite matchings have
many interesting properties.

Matrix form Suppose we have a bipartite graph G and we partition the vertices into two
sets, V1 and V2 , of the same colour. We may then represent the graph with a simplified
adjacency matrix with |V1 | rows and |V2 | columns containing a 1 where an edge joins the
corresponding vertices and a 0 where there is no edge.
We say that two 1s in the matrix are line-independent if they are not in the same row or
column. A matching on the graph will then be a subset of the 1s in the matrix that are all
line-independent.
For example, consider this bipartite graph (the thickened edges are a matching):

The graph could be represented as the matrix

1* 1  0  1  0
1  0  1* 0  0
0  0  1  0  1*
0  1* 1  0  1

where a 1* indicates an edge in the matching. Note that all the starred 1s are line-independent.

A complete matching on a bipartite graph G(V1 , V2 , E) is one that saturates all of the
vertices in V1 .
Systems of distinct representatives A system of distinct representatives is equivalent
to a maximal matching on some bipartite graph. Let V1 and V2 be the two sets of vertices in
the graph with |V1| ≤ |V2|. Consider the family {Γ(v) : v ∈ V1} of neighborhoods of the
vertices in V1. An SDR for this family is a choice of a distinct vertex in V2 for each
vertex in V1. There must be an edge joining these vertices; the set of all such edges forms a
matching.
Consider the sets

S1 = {A, B, D}
S2 = {A, C}
S3 = {C, E}
S4 = {B, C, E}

One SDR for these sets is

S1 → A
S2 → C
S3 → E
S4 → B

Note that this is the same matching on the graph shown above.

Finding Bipartite Matchings One method for finding maximal bipartite matchings involves using a network flow algorithm. Before using it, however, we must modify the graph.
Start with a bipartite graph G. As usual, we consider the two sets of vertices V1 and V2 .
Replace every edge in the graph with a directed arc from V1 to V2 of capacity L, where L is
some large integer.
Invent two new vertices: the source and the sink. Add a directed arc of capacity 1 from the
source to each vertex in V1 . Likewise, add a directed arc of capacity 1 from each vertex in
V2 to the sink.
Now find the maximum flow from the source to the sink. The total weight of this flow will
be the size of the maximum matching on G. Moreover, the set of edges between V1 and V2
with non-zero flow will constitute such a matching.
There also exist algorithms specifically for finding bipartite matchings that avoid the overhead of setting up a weighted digraph suitable for network flow.
Version: 4 Owner: mathcam Author(s): mathcam, vampyr

62.4 edge covering

Let G be a graph. An edge covering C on G is a subset of the vertices of G such that each
edge in G is incident with at least one vertex in C.
For any graph, the vertex set is a trivial edge covering. Generally, we are more interested
in minimal coverings. A minimal edge covering is simply an edge covering of the least
possible size.
Version: 5 Owner: vampyr Author(s): vampyr

62.5 matching

Let G be a graph. A matching M on G is a subset of the edges of G such that each vertex
in G is incident with no more than one edge in M.
It is easy to find a matching on a graph; for example, the empty set will always be a matching.
Typically, the most interesting matchings are maximal matchings. A maximal matching
on a graph G is simply a matching of the largest possible size.
Version: 3 Owner: vampyr Author(s): vampyr

62.6 maximal bipartite matching algorithm

The maximal bipartite matching algorithm is similar in some ways to the Ford-Fulkerson algorithm for network flow. This is not a coincidence; network flows and matchings are closely
related. This algorithm, however, avoids some of the overhead associated with finding network flow.
The basic idea behind this algorithm is as follows:
1. Start with some (not necessarily maximal) matching M.
2. Find a path that alternates between edges not in M and edges in M, beginning with
some edge e1 ∉ M and ending with some edge ef ∉ M.
3. For each edge e in the path, add e to M if e ∉ M, or remove e from M if e ∈ M. Note
that this must increase |M| by 1.
4. Repeat until we can no longer augment the matching in this manner.
The algorithm employs a clever labeling trick to find these paths and to ensure that the set
of edges chosen remains a valid matching.
The algorithm as described here uses the matrix form of a bipartite graph. Translating the
matching from a matrix to a graph is straightforward.
There are two phases to this algorithm: labeling and flipping.

Labeling We begin with a matrix with R rows and C columns containing 0s, 1s, and starred 1s
(written 1*), where a 1* indicates an edge in the matching and a plain 1 indicates an edge
not in the matching. Number the columns 1 . . . C and number the rows 1 . . . R.
Start by labeling each column that contains no 1* with the symbol #.
Now we scan the columns. Scan each column i that has been labelled but not scanned. Find
each 1 in column i that is in an unlabelled row; label this row i. Mark column i as scanned.
Next, we scan the rows. Scan each row j that has been labelled but not scanned. Find the
first 1* in row j. Label the column in which it appears j, and mark row j as scanned. If
there is no 1* in row j, proceed to the flipping phase.
Otherwise, go back to column scanning. Continue scanning and labelling until there are no
labelled, unscanned rows or columns; at that point, the set of 1*s is a maximal matching.
Flipping We enter the flipping phase when we scan some row j that contains no 1*. This row
must have some label c, and in column c, row j of the matrix, there must be a 1; change
this to a 1*.
Now consider column c; it has some label r. If r is #, then stop; go back to the labelling
phase. Otherwise, change the 1* at column c, row r to a plain 1.
Move on to row r and continue the process.

Notes The algorithm must begin with some matching; we may begin with the empty set
(or a single edge), since that is always a matching. However, each iteration through the
process increases the size of the matching by exactly one. Therefore, we can make a simple
optimization by starting with a larger matching. A naive greedy algorithm can quickly
choose a valid matching that is usually close to the size of the maximal matching; we may
initialize our matrix with that matching to give the procedure a head start.
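The labelling/flipping procedure above is, in effect, a search for an augmenting path. A compact recursive sketch of the same augmenting-path idea on the matrix form of a bipartite graph (identifiers are mine; this is not the matrix bookkeeping itself):

```python
# A compact recursive sketch of the augmenting-path idea on the matrix
# form of a bipartite graph (identifiers are mine, and this is not the
# matrix labelling bookkeeping described above, just the same principle).

def max_bipartite_matching(matrix):
    """matrix[r][c] == 1 iff row vertex r and column vertex c are joined."""
    R, C = len(matrix), len(matrix[0])
    match_of_row = [-1] * R  # column currently matched to each row, or -1

    def augment(c, seen):
        # try to match column c, re-routing earlier matches if needed
        for r in range(R):
            if matrix[r][c] and r not in seen:
                seen.add(r)
                if match_of_row[r] == -1 or augment(match_of_row[r], seen):
                    match_of_row[r] = c
                    return True
        return False

    return sum(1 for c in range(C) if augment(c, set()))

# a small example matrix with 4 rows and 5 columns
example = [[1, 1, 0, 1, 0],
           [1, 0, 1, 0, 0],
           [0, 0, 1, 0, 1],
           [0, 1, 1, 0, 1]]
size = max_bipartite_matching(example)
```

Each successful call to `augment` flips one alternating path, increasing the matching size by exactly one, just as in the matrix algorithm.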
Version: 1 Owner: vampyr Author(s): vampyr

62.7 maximal matching/minimal edge covering theorem

Theorem Let G be a graph. If M is a matching on G, and C is an edge covering for G,
then |M| ≤ |C|.
Proof Consider an arbitrary matching M on G and an arbitrary edge covering C on G.
We will construct a one-to-one function f : M → C.
Consider some edge e ∈ M. At least one of the vertices that e joins must be in C, because C
is an edge covering and hence every edge is incident with some vertex in C. Call this vertex
v_e, and let f(e) = v_e.
Now we will show that f is one-to-one. Suppose we have two edges e1, e2 ∈ M where f(e1) =
f(e2) = v. By the definition of f, e1 and e2 must both be incident with v. Since M is a
matching, however, no more than one edge in M can be incident with any given vertex in
G. Therefore e1 = e2, so f is one-to-one.
Hence we now have |M| ≤ |C|.
Corollary Let G be a graph. Let M and C be a matching and an edge covering on G,
respectively. If |M| = |C|, then M is a maximal matching and C is a minimal edge covering.
Proof Suppose M is not a maximal matching. Then, by definition, there exists another
matching M′ where |M| < |M′|. But then |M′| > |C|, which violates the above theorem.
Likewise, suppose C is not a minimal edge covering. Then, by definition, there exists another
covering C′ where |C′| < |C|. But then |C′| < |M|, which violates the above theorem.
Version: 3 Owner: vampyr Author(s): vampyr
Chapter 63
05C75 Structural characterization of
types of graphs
63.1 multigraph

A multigraph is a graph in which we allow more than one edge to join a pair of vertices.
Two or more edges that join a pair of vertices are called parallel edges. Every graph, then,
is a multigraph, but not all multigraphs are graphs.
Version: 1 Owner: digitalis Author(s): digitalis

63.2 pseudograph

A pseudograph is a graph that allows both parallel edges and loops.


Version: 4 Owner: digitalis Author(s): digitalis
Chapter 64
05C80 Random graphs
64.1 examples of probabilistic proofs

The first example is the existence of k-paradoxical tournaments. The proof hinges upon the
following basic probabilistic inequality: for any events A and B,

P(A ∪ B) ≤ P(A) + P(B).

Theorem 5. For every k, there exists a tournament T (usually very large) such that T is
k-paradoxical.

Let n = |T| be the number of vertices of T, where n > k. We will show that for n large
enough, a k-paradoxical tournament must exist. The probability space in question is all
possible directions of the arrows of T, where each arrow can point in either direction with
probability 1/2, independently of any other arrow.

We say that a set K of k vertices is arrowed by a vertex v0 outside the set if every arrow
between v0 and w_i ∈ K points from v0 to w_i, for i = 1, . . . , k. Consider a fixed set K of k
vertices and a fixed vertex v0 outside K. There are k arrows from v0 to K, and only
one arrangement of these arrows permits K to be arrowed by v0, thus

P(K is arrowed by v0) = 1/2^k.

The complementary event therefore has probability

P(K is not arrowed by v0) = 1 − 1/2^k.

By independence, and because there are n − k vertices outside of K,

P(K is not arrowed by any vertex) = (1 − 1/2^k)^(n−k).    (64.1.1)


Lastly, since there are nk sets of cardinality k in T , we employ the inequality mentioned
above to obtain that for the union of all events of the form in equation (64.1.1)
nk
 
n
1
.
P (Some set of k vertices is not arrowed by any vertex) 6
1 k
2
k

If the probability of this last event is less than 1 for some n, then there must exist a kparadoxical tournament of n vertices. Indeed there is such an n, since
 

nk
nk
1
n
1
1
1 k
=
n(n 1) (n k + 1) 1 k
k
2
k!
2

nk
1
1 k
<
n 1 k
k!
2
Therefore, regarding k as fixed while n tends to infinity, the right-hand-side above tends to
zero. In particular, for some n it is less than 1, and the result follows.
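To make the final step concrete, one can compute the bound numerically; the sketch below (mine, not part of the original proof) finds, for k = 2, the smallest n at which C(n, k)(1 − 1/2^k)^(n−k) drops below 1:

```python
# Numerical illustration (mine, not part of the original proof): for
# k = 2, find the smallest n at which C(n, k) (1 - 1/2^k)^(n - k) drops
# below 1, guaranteeing a 2-paradoxical tournament on n vertices exists.
from math import comb

def bound(n, k):
    return comb(n, k) * (1 - 1 / 2 ** k) ** (n - k)

k = 2
n = k + 1
while bound(n, k) >= 1:
    n += 1
# n is now the smallest vertex count certified by the argument
```

Note that this only certifies existence at that n; smaller paradoxical tournaments may well exist, since the union bound is far from tight.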
Version: 6 Owner: karteef Author(s): NeuRet

64.2 probabilistic method

The probabilistic method was pioneered by Erdős Pál (known to Westerners as Paul
Erdős) and initially used for solving problems in graph theory, but has acquired ever wider
applications. Broadly, the probabilistic method is somewhat the opposite of extremal graph
theory. Instead of considering how a graph can behave in the most extreme case, we
consider how a collection of graphs behave on average, whereby we can formulate a
probability space. The fruits reaped by this method are often raw existence theorems, usually deduced from the fact that the nonexistence of whatever sort of graph would mean a zero
probability. For instance, by means of the probabilistic method, Erdos proved the existence
of a graph of arbitrarily high girth and chromatic number, a very counterintuitive result.
Graphs tend to get enormous as the chromatic number and girth increase, thereby severely
hindering necessary computations to explicitly construct them, so an existence theorem is
most welcome.
In all honesty, probabilistic proofs are nothing more than counting proofs in disguise, since
determining the probabilities of interest will invariably involve detailed counting arguments.
In fact, we could remove from any probabilistic proof any mention of a probability space,
although the result may be significantly less transparent. Also, the advantage of using
probability is that we can employ all the machinery of probability theory. Markov chains,
martingales, expectations, probabilistic inequalities, and many other results, all become the
tools of the trade in dealing with seemingly static objects of combinatorics and number
theory.
REFERENCES
1. Noga Alon and Joel H. Spencer. The probabilistic method. John Wiley & Sons, Inc., second
edition, 2000. Zbl 0996.05001.
2. Paul Erdős and Joel Spencer. Probabilistic methods in combinatorics. Academic Press, 1973.
Zbl 0308.05001.

Version: 10 Owner: bbukh Author(s): bbukh, NeuRet
Chapter 65
05C90 Applications
65.1 Hasse diagram

If (A, R) is a finite poset, then it can be represented by a Hasse diagram, where a line is
drawn from x ∈ A up to y ∈ A if:
xRy;
there is no z ∈ A such that xRz and zRy (there are no in-between elements).
Since we are always drawing from lower to higher elements, we do not need to direct any
edges.
Example: If A = P({1, 2, 3}), the power set of {1, 2, 3}, and R is the subset relation ⊆, then
we can draw the following:
{1, 2, 3}
{1, 2}

{1, 3}

{2, 3}

{1}

{2}

{3}

Even though {3}R{1, 2, 3} (since {3} ⊆ {1, 2, 3}), we do not draw a line directly between
them because there are in-between elements: {2, 3} and {1, 3}. However, there still remains
an indirect path from {3} to {1, 2, 3}.
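The covering relation drawn in a Hasse diagram is easy to compute; the following sketch (not from the entry) does so for the example poset (P({1, 2, 3}), ⊆):

```python
# A sketch (not from the entry) computing the Hasse diagram edges for
# the example poset (P({1, 2, 3}), subset): draw x up to y exactly when
# x is a proper subset of y with nothing strictly between them.
from itertools import combinations

ground = {1, 2, 3}
subsets = [frozenset(c) for r in range(len(ground) + 1)
           for c in combinations(ground, r)]

def covers(x, y):
    if not x < y:  # must be a proper subset
        return False
    return not any(x < z < y for z in subsets)

hasse_edges = [(x, y) for x in subsets for y in subsets if covers(x, y)]
```

The 12 edges found are exactly the lines in the diagram above; in particular there is no edge from {3} directly to {1, 2, 3}.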
Version: 13 Owner: xriso Author(s): xriso
Chapter 66
05C99 Miscellaneous
66.1 Euler's polyhedron theorem

If a connected plane graph G has n vertices, m edges, and f faces, then

n − m + f = 2.
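The formula is easy to check on the five Platonic solids, whose vertex, edge, and face counts are standard (the snippet itself is only an illustration, not part of the entry):

```python
# Checking n - m + f = 2 on the five Platonic solids; the vertex, edge,
# and face counts below are the standard ones (the snippet itself is
# only an illustration, not part of the entry).
solids = {
    "tetrahedron":  (4, 6, 4),
    "cube":         (8, 12, 6),
    "octahedron":   (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron":  (12, 30, 20),
}
for name, (n, m, f) in solids.items():
    assert n - m + f == 2
```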
Version: 1 Owner: digitalis Author(s): digitalis

66.2 Poincaré formula

The Poincaré formula is a generalization of Euler's polyhedron theorem to polyhedra of
higher genus. If a polyhedron has n vertices, m edges, f faces, and genus g, then

n − m + f = χ(g),

where

χ(g) = 2 − 2g

is known as the Euler characteristic.
Version: 1 Owner: mathwizard Author(s): mathwizard

66.3 Turán's theorem

A graph having n vertices, which contains no p-clique with p ≥ 2, has at most

(1 − 1/(p − 1)) · n^2/2

edges.
Version: 1 Owner: mathwizard Author(s): mathwizard

66.4 Wagner's theorem

A graph is planar if and only if it contains neither K5 nor K3,3 as a minor, where K5 is
the complete graph of order 5 and K3,3 is the complete bipartite graph of order 6. This is
equivalent to Kuratowski's theorem.
Version: 2 Owner: digitalis Author(s): digitalis

66.5 block

A subgraph B of a graph G is a block of G if either it is a bridge (together with the vertices
incident with the bridge) or else it is a maximal 2-connected subgraph of G.
Any two blocks of a graph G have at most one vertex in common. Also, every vertex
belonging to at least two blocks is a cutvertex of G, and, conversely, every cutvertex belongs
to at least two blocks.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 1 Owner: digitalis Author(s): digitalis

66.6 bridge

A bridge of a graph G is an edge whose deletion increases the number of components of G.
The vertex analogue of a bridge is a cutvertex.
Version: 2 Owner: digitalis Author(s): digitalis

66.7 complete graph

The complete graph with n vertices, denoted Kn , contains all possible edges; that is, any
two vertices are adjacent.

The complete graph of 4 vertices, or K4 looks like this:

The number of edges in Kn is the (n − 1)th triangular number. Every vertex in Kn has degree
n − 1; therefore Kn has an Euler circuit if and only if n is odd. A complete graph always
has a Hamiltonian path, and the chromatic number of Kn is always n.
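The stated properties are easy to verify directly for small n (an illustrative sketch, not part of the entry):

```python
# A quick check (illustrative, not from the entry) of the stated
# properties of K_n: the edge count is the (n-1)th triangular number and
# every vertex has degree n - 1, which is even exactly when n is odd.
from itertools import combinations

edge_counts = {}
for n in range(1, 8):
    edges = list(combinations(range(n), 2))
    edge_counts[n] = len(edges)
    assert len(edges) == (n - 1) * n // 2      # triangular number
    for v in range(n):
        degree = sum(1 for e in edges if v in e)
        assert degree == n - 1
        # an Euler circuit needs every degree even, i.e. n odd
        assert (degree % 2 == 0) == (n % 2 == 1)
```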
Version: 6 Owner: vampyr Author(s): vampyr

66.8 degree (of a vertex)

The degree of a vertex x is d(x) = |Γ(x)|, where Γ(x) is the neighborhood of x. If we want
to emphasize that the underlying graph is G, then we write Γ_G(x) and d_G(x); this notation
can be intuitively extended to many other graph-theoretic functions.
The minimal degree of the vertices of a graph G is denoted by δ(G) and the maximal
degree by Δ(G). A vertex of degree 0 is said to be an isolated vertex. If δ(G) = Δ(G) = k,
that is, every vertex has degree k, then G is said to be k-regular or regular of degree k.
A graph is regular if it is k-regular for some k. A 3-regular graph is said to be cubic.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 7 Owner: mathcam Author(s): Larry Hammick, digitalis

66.9 distance (in a graph)

Given vertices x and y, their distance d(x, y) is the minimal length of an x–y path. If
there is no x–y path then d(x, y) = ∞.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 5 Owner: digitalis Author(s): digitalis

66.10 edge-contraction

Given an edge xy of a graph G, the graph G/xy is obtained from G by contracting the edge
xy; that is, to get G/xy we identify the vertices x and y and remove all loops and duplicate
edges. A graph G′ obtained by a sequence of edge-contractions is said to be a contraction
of G.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 1 Owner: digitalis Author(s): digitalis

66.11 graph

A graph G is an ordered pair of disjoint sets (V, E) such that E is a subset of the set V^(2)
of unordered pairs of V. V and E are always assumed to be finite, unless explicitly stated
otherwise. The set V is the set of vertices (sometimes called nodes) and E is the set of
edges. If G is a graph, then V = V(G) is the vertex set of G, and E = E(G) is the edge
set. Typically, V(G) is defined to be nonempty. If x is a vertex of G, we sometimes write
x ∈ G instead of x ∈ V(G).
An edge {x, y} is said to join the vertices x and y and is denoted by xy. Thus xy and yx are
equivalent; the vertices x and y are the endvertices of this edge. If xy ∈ E(G), then x and
y are adjacent, or neighboring, vertices of G, and the vertices x and y are incident with
the edge xy. Two edges are adjacent if they have exactly one common endvertex. Also,
x ∼ y means that the vertex x is adjacent to the vertex y.
Notice that the definition allows pairs of the form {x, x}, which would correspond to a node
joined to itself.

Some graphs.
If, on a given graph, there is at most one edge joining each pair of nodes, we say that the
graph is simple.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 24 Owner: drini Author(s): digitalis, drini

66.12 graph minor theorem

If (G_k)_{k ∈ N} is an infinite sequence of finite graphs, then there exist two numbers m < n such
that G_m is isomorphic to a minor of G_n.
This theorem (proven by Robertson and Seymour) is often referred to as the deepest result
in graph theory. It resolves Wagner's conjecture in the affirmative and leads to an important
generalization of Kuratowski's theorem.
Specifically, for every set 𝒢 of finite graphs which is closed under taking minors (meaning if
G ∈ 𝒢 and H is isomorphic to a minor of G, then H ∈ 𝒢), there exist finitely many graphs
G1, . . . , Gn such that 𝒢 consists precisely of those finite graphs that do not have a minor
isomorphic to one of the Gi. The graphs G1, . . . , Gn are often referred to as the forbidden
minors for the class 𝒢.
Version: 6 Owner: AxelBoldt Author(s): AxelBoldt, JoshuaHorowitz

66.13 graph theory

Graph theory is the branch of mathematics that concerns itself with graphs.
The concept of a graph is extraordinarily simple, which explains its wide applicability. It
is usually agreed that graph theory proper was born in 1736, when Euler formalized
the now-famous bridges of Königsberg problem. Graph theory has now grown to touch
almost every mathematical discipline, in one form or another, and it likewise borrows from
elsewhere tools for its own problems. Anyone who delves into the topic will quickly see that
the lifeblood of graph theory is the abundance of tricky questions and clever answers. There
are, of course, general results that systematize the subject, but we also find an emphasis on
the solutions of substantial problems over building machinery for its own sake.
For quite a long time graph theory was regarded as a branch of topology concerned with
1-dimensional simplices, but this view has faded away. The only remainder of the topological
past is topological graph theory, a branch of graph theory that primarily deals with drawings
of graphs on surfaces. The most famous achievement of topological graph theory is the proof
of the four-color conjecture (every political map in the plane or on the surface of a sphere
can be colored with four colors, given that each country consists of only one piece).
Now, a (finite) graph is usually thought of as a subset of pairs of elements of a finite set
(called vertices), or more generally as a family of arbitrary sets in the case of hypergraphs. For
instance, Ramsey theory as applied to graph theory deals with determining how disordered
graphs can be. The central result here is Ramsey's theorem, which states that one can
always find many vertices that are either all connected or all disconnected from each other,
given that the graph is sufficiently large. Another such result is the Szemerédi regularity lemma.
The four-color conjecture mentioned above is one of the problems in graph coloring. There
are many ways one can color a graph, but the most common are vertex coloring and edge
coloring. In these types of colorings, one colors the vertices (edges) of a graph so that no two
vertices of the same color are adjacent (resp. no edges of the same color share a common
vertex). The most common problem is to find a coloring using the fewest number of colors
possible. Such problems often arise in scheduling problems.


Graph theory benefits greatly from interaction with other fields of mathematics. For example,
probabilistic methods have become the standard tool in the arsenal of graph theorists, and
random graph theory has grown into a full-fledged branch of its own.
Version: 22 Owner: karteef Author(s): yark, bbukh, iddo, mathwizard, NeuRet, akrowne

66.14 homeomorphism

We say that a graph G is homeomorphic to a graph H if the realization R(G) of G is
topologically homeomorphic to R(H) or, equivalently, if G and H have isomorphic subdivisions.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 3 Owner: digitalis Author(s): digitalis

66.15 loop

In graph theory, a loop is an edge which joins a vertex to itself, rather than to some other
vertex. By definition, a graph cannot contain a loop; a pseudograph, however, may contain
both multiple edges and multiple loops. Note that by some definitions, a multigraph may
contain multiple edges and no loops, while other texts define a multigraph as a graph allowing
multiple edges and multiple loops.
In algebra, a loop is a quasigroup which contains an identity element.
Version: 6 Owner: drini Author(s): Larry Hammick, digitalis

66.16 minor (of a graph)

A graph H is a minor of G, written G ⪰ H or H ⪯ G, if it is a subgraph of a graph obtained
from G by a sequence of edge-contractions.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 1 Owner: digitalis Author(s): digitalis
66.17 neighborhood (of a vertex)

For a graph G, the set of vertices adjacent to a vertex x ∈ G, the neighborhood of x, is
denoted by Γ(x). Occasionally one calls Γ(x) the open neighborhood of x, and Γ(x) ∪ {x} the
closed neighborhood of x.
Adapted with permission of the author from Modern graph theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 6 Owner: digitalis Author(s): digitalis

66.18 null graph
Figure 1: Null Graph.

A graph with zero vertices and edges. One possible representation of the null graph is shown
above.
The null graph is the initial object in the category of graphs.
Further Reading
Is the null graph a pointless concept, by Frank Harary and Ronald Read
Version: 4 Owner: wombat Author(s): wombat

66.19 order (of a graph)

The order of a graph G is the number of vertices in G; it is denoted by |G|. The same
notation is used for the number of elements (cardinality) of a set. Thus, |G| = |V (G)|. We
write Gn for an arbitrary graph of order n. Similarly, G(n, m) denotes an arbitrary
graph of order n and size m.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 4 Owner: digitalis Author(s): digitalis
66.20 proof of Euler's polyhedron theorem

This proof is not one of the standard proofs of Euler's formula. I found the idea
presented in one of Coxeter's books. It presents a different approach to the formula, one that
may be more familiar to modern students who have been exposed to a Discrete Mathematics
course. It falls into the category of informal proofs: proofs which assume without proof
certain properties of planar graphs usually proved with algebraic topology. This one makes
deep (but somewhat hidden) use of the Jordan curve theorem.
Let G = (V, E) be a planar graph; we consider some particular planar embedding of G. Let
F be the set of faces of this embedding. Also let G′ = (F, E′) be the dual graph (E′ contains
an edge between any 2 adjacent faces of G). The planar embeddings of G and G′ determine
a correspondence between E and E′: each edge of G lies between a pair of adjacent faces of
G, and thereby corresponds to an edge of G′; denote by φ : E → E′ this correspondence.

In all illustrations, we represent a planar graph G, and the two sets of edges T ⊆ E (in red) and
T′ ⊆ E′ (in blue).

Let T ⊆ E be a spanning tree of G. Let T′ = E′ \ φ[T]. We claim that T′ is a spanning tree
of G′. Indeed,
T′ contains no loop.
Given any loop of edges in T′, we may draw a loop on the faces of G which participate
in the loop. This loop must partition the vertices of G into two non-empty sets, and
only crosses edges of E \ T. Thus, (V, T) has more than a single connected component,
so T is not spanning. [The proof of this utilizes the Jordan curve theorem.]
T′ spans G′.
For suppose T′ does not connect all faces F. Let f1, f2 ∈ F be two faces with no path
between them in T′. Then T must contain a cycle separating f1 from f2, and cannot
be a tree. [The proof of this utilizes the Jordan curve theorem.]
We thus have a partition E = T ∪ φ^{−1}[T′] of the edges of G into two sets. Recall that in
any tree, the number of edges is one less than the number of nodes. It follows that

|E| = |T| + |T′| = (|V| − 1) + (|F| − 1) = |V| + |F| − 2,

as required.
Version: 2 Owner: ariels Author(s): ariels
66.21 proof of Turán's theorem

If the graph G has n ≤ p − 1 vertices it cannot contain any p-clique and thus has at most
C(n, 2) edges. So in this case we only have to prove that

n(n − 1)/2 ≤ (1 − 1/(p − 1)) · n^2/2.

This of course is easy; we get

(n − 1)/n = 1 − 1/n ≤ 1 − 1/(p − 1),

which is true since n ≤ p − 1.

So now we assume that n ≥ p, and the set of vertices of G is denoted by V. If G has the
maximum number of edges possible without containing a p-clique, it contains a (p − 1)-clique,
since otherwise we might add edges to get one. So we denote one such clique by A and define
B := G \ A.

So A has C(p − 1, 2) edges. We are now interested in the number of edges in B, which we will
call e_B, and in the number of edges connecting A and B, which will be called e_{A,B}. By
induction we get:

e_B ≤ (1/2) (1 − 1/(p − 1)) (n − p + 1)^2.

Since G does not contain any p-clique, every vertex of B is connected to at most p − 2
vertices in A, and thus we get:

e_{A,B} ≤ (p − 2)(n − p + 1).

Putting this together we get for the number of edges |E| of G:

|E| ≤ C(p − 1, 2) + (1/2) (1 − 1/(p − 1)) (n − p + 1)^2 + (p − 2)(n − p + 1).

And thus we get:

|E| ≤ (1 − 1/(p − 1)) · n^2/2.
Version: 2 Owner: mathwizard Author(s): mathwizard
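The bound is essentially tight: the complete (p − 1)-partite graph with near-equal classes (the Turán graph) contains no p-clique and nearly attains it. A small numerical sketch comparing the two; the construction and names below are mine, not from the entry:

```python
from itertools import combinations

def turan_graph_edges(n, p):
    """Edges of the complete (p-1)-partite graph on n near-equal classes."""
    parts = [range(i, n, p - 1) for i in range(p - 1)]   # vertex classes
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    return [(u, v) for u, v in combinations(range(n), 2)
            if part_of[u] != part_of[v]]

n, p = 10, 4
edges = turan_graph_edges(n, p)
edge_set = set(edges)

# Exhaustively confirm that the construction has no p-clique.
has_p_clique = any(all((min(a, b), max(a, b)) in edge_set
                       for a, b in combinations(c, 2))
                   for c in combinations(range(n), p))

bound = (1 - 1 / (p - 1)) * n * n / 2
print(len(edges), has_p_clique, bound)  # 33 edges, no 4-clique, bound 33.33...
```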

66.22

realization

Let p_1, p_2, … be distinct points in R³, the 3-dimensional Euclidean space, such that every plane in R³ contains at most 3 of these points. Write (p_i, p_j) for the straight line segment with endpoints p_i and p_j (open or closed, as you like). Given a graph G = (V, E), V = {x_1, x_2, …, x_n}, the topological space

R(G) = ⋃{(p_i, p_j) : x_i x_j ∈ E} ∪ ⋃_{i=1}^{n} {p_i} ⊆ R³

is said to be a realization of G.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 1 Owner: digitalis Author(s): digitalis

66.23

size (of a graph)

The size of a graph G is the number of edges in G; it is denoted by e(G). G(n, m) denotes
an arbitrary graph of order n and size m.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 4 Owner: digitalis Author(s): digitalis

66.24

subdivision

A graph H is said to be a subdivision of a graph G, or a topological G graph, if H is obtained from G by subdividing some of the edges, that is, by replacing the edges by paths having at most their endvertices in common. We often write T G for a topological G graph. Thus, T G denotes any member of a large family of graphs; for example, T C4 is an arbitrary cycle of length at least 4. For any graph G, the spaces R(G) (denoting the realization of G) and R(T G) are homeomorphic.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 1 Owner: digitalis Author(s): digitalis


66.25

subgraph

We say that G′ = (V′, E′) is a subgraph of G = (V, E) if V′ ⊆ V and E′ ⊆ E. In this case we write G′ ⊆ G.

If G′ contains all edges of G that join two vertices in V′, then G′ is said to be the subgraph induced or spanned by V′ and is denoted by G[V′]. Thus, a subgraph G′ of G is an induced subgraph if G′ = G[V(G′)]. If V′ = V, then G′ is said to be a spanning subgraph of G.

Often, new graphs are constructed from old ones by deleting or adding some vertices and edges. If W ⊆ V(G), then G − W = G[V ∖ W] is the subgraph of G obtained by deleting the vertices in W and all edges incident with them. Similarly, if E′ ⊆ E(G), then G − E′ = (V(G), E(G) ∖ E′). If W = {w} and E′ = {xy}, then this notation is simplified to G − w and G − xy. Similarly, if x and y are nonadjacent vertices of G, then G + xy is obtained from G by joining x to y.
Adapted with permission of the author from Modern Graph Theory by Bela Bollobas,
published by Springer-Verlag New York, Inc., 1998.
Version: 5 Owner: bbukh Author(s): digitalis

66.26

wheel graph

The wheel graph on n vertices, W_n, is a graph that contains a cycle of length n − 1 plus a vertex v (sometimes called the hub) not in the cycle such that v is connected to every other vertex. The edges connecting v to the rest of the graph are sometimes called spokes.

[Figures: the wheel graphs W4 and W6, each drawn as a hub joined to every vertex of a surrounding cycle.]

Version: 2 Owner: vampyr Author(s): vampyr


Chapter 67
05D05 Extremal set theory
67.1

LYM inequality

Let F be a Sperner family, that is, a collection of subsets of {1, 2, …, n} such that no member of the collection contains another. Then

Σ_{X∈F} 1/C(n, |X|) ≤ 1,

where C(n, k) denotes a binomial coefficient. This inequality is known as the LYM inequality after the names of the three people who discovered it independently: Lubell [2], Yamamoto [4], Meshalkin [3].

Since C(n, k) ≤ C(n, ⌊n/2⌋) for every integer k, the LYM inequality tells us that |F| / C(n, ⌊n/2⌋) ≤ 1, which is Sperner's theorem.

REFERENCES
1. Konrad Engel. Sperner Theory, volume 65 of Encyclopedia of Mathematics and Its Applications. Cambridge University Press. Zbl 0868.05001.
2. David Lubell. A short proof of Sperner's lemma. J. Comb. Theory, 1:299, 1966. Zbl 0151.01503.
3. Lev D. Meshalkin. Generalization of Sperner's theorem on the number of subsets of a finite set. Teor. Veroyatn. Primen., 8:219–220, 1963. Zbl 0123.36303.
4. Koichi Yamamoto. Logarithmic order of free distributive lattice. J. Math. Soc. Japan, 6:343–353, 1954. Zbl 0056.26301.

Version: 2 Owner: bbukh Author(s): bbukh
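The inequality can be checked numerically on a small example. In the sketch below the family is my own choice, mixing sets of two different sizes, and the antichain property is verified first:

```python
from math import comb
from itertools import combinations

n = 6
# A hypothetical Sperner family on {1,...,6} (my example, not from the entry):
family = [{1, 2}, {1, 3}, {2, 3}, {1, 4, 5, 6}]

# No member may contain another (A < B tests proper inclusion of sets).
antichain = all(not (A < B or B < A) for A, B in combinations(family, 2))

# The LYM sum: 3/C(6,2) + 1/C(6,4) = 3/15 + 1/15 = 4/15 <= 1.
lym_sum = sum(1 / comb(n, len(X)) for X in family)
print(antichain, lym_sum)
```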


67.2

Sperner's theorem

What is the size of the largest family F of subsets of an n-element set such that no A ∈ F is a subset of B ∈ F? Sperner [3] gave an answer in the following elegant theorem:

Theorem 7. For every family F of incomparable subsets of an n-set, |F| ≤ C(n, ⌊n/2⌋).

A family satisfying the conditions of Sperner's theorem is usually called a Sperner family or antichain. The latter terminology stems from the fact that subsets of a finite set ordered by inclusion form a Boolean lattice.

There are many generalizations of Sperner's theorem. On one hand, there are refinements like the LYM inequality that strengthen the theorem in various ways. On the other hand, there are generalizations to posets other than the Boolean lattice. For a comprehensive exposition of the topic one should consult the well-written monograph by Engel [2].

REFERENCES
1. Béla Bollobás. Combinatorics: Set Systems, Hypergraphs, Families of Vectors, and Combinatorial Probability. 1986. Zbl 0595.05001.
2. Konrad Engel. Sperner Theory, volume 65 of Encyclopedia of Mathematics and Its Applications. Cambridge University Press. Zbl 0868.05001.
3. Emanuel Sperner. Ein Satz über Untermengen einer endlichen Menge. Math. Z., 27:544–548, 1928. Available online at JFM.

Version: 2 Owner: bbukh Author(s): bbukh
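For small n the theorem can be confirmed by exhaustive search. The sketch below (my construction, with n = 3) finds the size of the largest antichain among all families of subsets and compares it with C(n, ⌊n/2⌋):

```python
from itertools import combinations
from math import comb

n = 3
# All 2^n subsets of {0, ..., n-1}.
subsets = [frozenset(c) for k in range(n + 1)
           for c in combinations(range(n), k)]

def is_antichain(fam):
    return all(not (A < B or B < A) for A, B in combinations(fam, 2))

# Brute-force the maximum antichain size.
best = 0
for size in range(1, len(subsets) + 1):
    if any(is_antichain(f) for f in combinations(subsets, size)):
        best = size

print(best, comb(n, n // 2))  # 3 3
```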


Chapter 68
05D10 Ramsey theory
68.1

Erdős–Rado theorem

Repeated exponentiation for cardinals is denoted exp_i(κ), where i < ω. It is defined by

exp_0(κ) = κ  and  exp_{i+1}(κ) = 2^{exp_i(κ)}.

The Erdős–Rado theorem states that

exp_i(κ)⁺ → (κ⁺)^{i+1}_κ.

That is, if f : [exp_i(κ)⁺]^{i+1} → κ then there is a homogeneous set of size κ⁺.

As special cases, (2^κ)⁺ → (κ⁺)²_κ and (2^{ℵ₀})⁺ → (ℵ₁)²_{ℵ₀}.
Version: 2 Owner: Henry Author(s): Henry

68.2

Ramsey's theorem

Ramsey's theorem states that a particular arrows relation holds, namely

ω → (ω)^n_k

for any integers n and k.

In words, if f is a function on sets of integers of size n whose range is finite, then there is some infinite X ⊆ ω such that f is constant on the subsets of X of size n.

As an example, suppose f : [ω]² → {0, 1} is defined by

f({x, y}) = 1 if x = y² or y = x², and 0 otherwise.

Then let X be the set of integers which are not perfect squares. This is clearly infinite, and obviously if x, y ∈ X then neither x = y² nor y = x², so f is homogeneous on X.
Version: 3 Owner: Henry Author(s): Henry

68.3

Ramsey's theorem

The original version of Ramsey's theorem states that for all positive integers k_1 and k_2 there is n such that if the edges of a complete graph on n vertices are colored in two colors, then there is either a k_1-clique of the first color or a k_2-clique of the second color.

The standard proof proceeds by induction on k_1 + k_2. If k_1, k_2 ≤ 2, then the theorem holds trivially. To prove the induction step, we consider a graph G that contains no cliques of the desired kind, and then consider any vertex v of G. Partition the rest of the vertices into two classes C_1 and C_2 according to whether the edges from v are of color 1 or 2 respectively. By the inductive hypothesis |C_1| is bounded, since C_1 contains no (k_1 − 1)-clique of color 1 and no k_2-clique of color 2. Similarly, |C_2| is bounded. QED.

A similar argument shows that for any positive integers k_1, k_2, …, k_t, if we color the edges of a sufficiently large complete graph in t colors, then we will be able to find either a k_1-clique of color 1, or a k_2-clique of color 2, …, or a k_t-clique of color t.

The minimal n whose existence is stated in Ramsey's theorem is called a Ramsey number and denoted by R(k_1, k_2) (and R(k_1, k_2, …, k_t) for multicolored graphs). The above proof shows that R(k_1, k_2) ≤ R(k_1, k_2 − 1) + R(k_1 − 1, k_2). From that it is not hard to deduce by induction that R(k_1, k_2) ≤ C(k_1 + k_2 − 2, k_1 − 1), where C(n, k) denotes a binomial coefficient. In the most interesting case k_1 = k_2 = k this yields approximately R(k, k) ≤ (4 + o(1))^k. The lower bounds can be established by means of a probabilistic construction as follows.

Take a complete graph of size n and color its edges at random, choosing the color of each edge uniformly, independently of all other edges. The probability that any given set of k vertices is a monochromatic clique is 2^{1−C(k,2)}. Let I_r be the random variable which is 1 if the r'th set of k elements is a monochromatic clique and 0 otherwise. The sum of the I_r's over all k-element sets is simply the number of monochromatic k-cliques. Therefore, by linearity of expectation, E(Σ I_r) = Σ E(I_r) = 2^{1−C(k,2)} C(n, k). If the expectation is less than 1, then there exists a coloring which has no monochromatic cliques. A little exercise in calculus shows that if we choose n to be no more than (√2 + o(1))^k then the expectation is indeed less than 1. Hence, R(k, k) > (√2 + o(1))^k.

The gap between the lower and upper bounds has been open for several decades. There have been a number of improvements in the o(1) terms, but nothing better than √2 + o(1) ≤ R(k, k)^{1/k} ≤ 4 + o(1) is known. It is not even known whether lim_{k→∞} R(k, k)^{1/k} exists.

The behavior of R(k, x) for fixed k and large x is equally mysterious. For this case Ajtai, Komlós and Szemerédi [1] proved that R(k, x) ≤ c_k x^{k−1}/(ln x)^{k−2}. The matching lower bound has only recently been established for k = 3 by Kim [4]. Even in this case the exact asymptotics is unknown. The combination of the result of Kim and the improvement of Ajtai, Komlós and Szemerédi's result by Shearer [6] yields (1/162)(1 + o(1)) k²/log k ≤ R(3, k) ≤ (1 + o(1)) k²/log k.

A lot of machine and human time has been spent trying to determine Ramsey numbers for small k_1 and k_2. An up-to-date summary of our knowledge about small Ramsey numbers can be found in [5].

REFERENCES
1. Miklós Ajtai, János Komlós, and Endre Szemerédi. A note on Ramsey numbers. J. Combin. Theory Ser. A, 29(3):354–360, 1980. Zbl 0455.05045.
2. Noga Alon and Joel H. Spencer. The Probabilistic Method. John Wiley & Sons, Inc., second edition, 2000. Zbl 0996.05001.
3. Ronald L. Graham, Bruce L. Rothschild, and Joel H. Spencer. Ramsey Theory. Wiley-Interscience Series in Discrete Mathematics. 1980. Zbl 0455.05002.
4. Jeong Han Kim. The Ramsey number R(3, t) has order of magnitude t²/log t. Random Structures & Algorithms, 7(3):173–207, 1995. Zbl 0832.05084. Preprint is available online.
5. Stanisław Radziszowski. Small Ramsey numbers. Electronic Journal of Combinatorics, Dynamical Survey, 2002. Available online.
6. James B. Shearer. A note on the independence number of triangle-free graphs. Discrete Mathematics, 46:83–87, 1983. Zbl 0516.05053. Abstract is available online.

Version: 9 Owner: bbukh Author(s): bbukh
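The smallest nontrivial value, R(3, 3) = 6, can be confirmed by brute force over all 2-colorings. The following sketch (function names are mine) checks that some coloring of K_5 avoids a monochromatic triangle while no coloring of K_6 does:

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    # coloring maps each edge (i, j), i < j, to color 0 or 1
    return any(coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
               for a, b, c in combinations(range(n), 3))

def some_coloring_avoids(n):
    """True iff some 2-coloring of K_n has no monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    return any(not has_mono_triangle(n, dict(zip(edges, bits)))
               for bits in product((0, 1), repeat=len(edges)))

# K_5 admits a triangle-free 2-coloring (pentagon/pentagram); K_6 does not.
print(some_coloring_avoids(5), some_coloring_avoids(6))  # True False
```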

68.4

arrows

Let [X]^α = {Y ⊆ X : |Y| = α}, that is, the set of subsets of X of size α. Then given some cardinals κ, λ, α and γ,

κ → (λ)^α_γ

states that for any set X of size κ and any function f : [X]^α → γ, there is some Y ⊆ X and some δ ∈ γ such that |Y| = λ and f(y) = δ for any y ∈ [Y]^α.

In words, if f is a partition of [X]^α into γ subsets, then f is constant on a subset of size λ (a homogeneous subset).

As an example, the pigeonhole principle is the statement that if n is finite and k < n then

n → (2)^1_k.

That is, if you try to partition n into fewer than n pieces, then one piece has more than one element.

Observe that if κ → (λ)^α_γ then the same statement holds if:

• κ is made larger (since the restriction of f to a set of size κ can be considered)

• λ is made smaller (since a subset of the homogeneous set will suffice)

• γ is made smaller (since any partition into fewer than γ pieces can be expanded by adding empty sets to the partition)

• α is made smaller (since a partition f of [X]^β where β < α can be extended to a partition f′ of [X]^α by f′(Y) = f(Y′), where Y′ consists of the β smallest elements of Y)

κ ↛ (λ)^α_γ is used to state that the corresponding relation is false.
Version: 4 Owner: Henry Author(s): Henry

68.5

coloring

A coloring of a set X by Y is just a function f : X → Y. The term coloring is used because the function can be thought of as assigning a color from Y to each element of X.

Any coloring provides a partition of X: for each y ∈ Y, the set f⁻¹(y) of elements x such that f(x) = y is one element of the partition. Since f is a function, the sets in the partition are disjoint, and since it is a total function, their union is X.
Version: 1 Owner: Henry Author(s): Henry

68.6

proof of Ramsey's theorem


ω → (ω)^n_k

is proven by induction on n.

If n = 1 then this just states that any partition of an infinite set into a finite number of subsets must include an infinite set; equivalently, the union of a finite number of finite sets is finite. This is simple enough to prove: since there are a finite number of sets, there is a largest set, say of size x. Let the number of sets be y. Then the size of the union is no more than xy.

If ω → (ω)^n_k then we can show that ω → (ω)^{n+1}_k.

Let f be some coloring of [S]^{n+1} by k, where S is an infinite subset of ω. Observe that, given x < ω, we can define f^x : [S ∖ {x}]^n → k by f^x(X) = f({x} ∪ X). Since S is infinite, by the induction hypothesis f^x has an infinite homogeneous set.

Then we define a sequence of integers ⟨n_i⟩_{i∈ω} and a sequence of infinite subsets of ω, ⟨S_i⟩_{i∈ω}, by induction. Let n_0 = 0 and let S_0 = ω. Given n_{j−1} and S_{j−1}, we can define S_j as an infinite homogeneous set for f^{n_{j−1}} : [S_{j−1} ∖ {n_{j−1}}]^n → k and n_j as the least element of S_j.

Obviously N = {n_i : i ∈ ω} is infinite, and it is also homogeneous, since each n_i is contained in S_j for each j ≤ i.
Version: 1 Owner: Henry Author(s): Henry


Chapter 69
05D15 Transversal (matching)
theory
69.1

Hall's marriage theorem

Let S = {S_1, S_2, …, S_n} be a finite collection of finite sets. There exists a system of distinct representatives of S if and only if the following condition holds for any T ⊆ S:

|⋃ T| ≥ |T|.

As a corollary, if this condition fails to hold anywhere, then no SDR exists.

This is known as Hall's marriage theorem. The name arises from a particular application of this theorem. Suppose we have a finite set of single men/women, and, for each man/woman, a finite collection of women/men to whom this person is attracted. An SDR for this collection would be a way each man/woman could be (theoretically) married happily. Hence, Hall's marriage theorem can be used to determine if this is possible.

An application of this theorem to graph theory gives that if G(V_1, V_2, E) is a bipartite graph, then G has a complete matching that saturates every vertex of V_1 if and only if |S| ≤ |N(S)| for every subset S ⊆ V_1.
Version: 3 Owner: mathcam Author(s): mathcam, vampyr
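The equivalence can be tested directly on small collections. In the sketch below (the collections and helper names are my own), both a Hall-condition check and a brute-force SDR search are run on a collection that satisfies the condition and one that violates it:

```python
from itertools import combinations

def has_sdr(sets):
    """Brute-force search for a system of distinct representatives."""
    n = len(sets)
    def extend(i, used):
        if i == n:
            return True
        return any(extend(i + 1, used | {x})
                   for x in sets[i] if x not in used)
    return extend(0, frozenset())

def hall_condition(sets):
    """|union of T| >= |T| for every subcollection T."""
    return all(len(set().union(*T)) >= len(T)
               for k in range(1, len(sets) + 1)
               for T in combinations(sets, k))

S_good = [{1, 2}, {2, 3}, {1, 3}]       # SDR exists, e.g. 1, 2, 3
S_bad = [{1, 2}, {1, 2}, {1, 2}, {3}]   # three sets share only {1, 2}

print(has_sdr(S_good), hall_condition(S_good))  # True True
print(has_sdr(S_bad), hall_condition(S_bad))    # False False
```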

69.2

proof of Hall's marriage theorem

We prove Hall's marriage theorem by induction on |S|, the size of S.

The theorem is trivially true for |S| = 0.

Assuming the theorem true for all |S| < n, we prove it for |S| = n.

First suppose that we have the stronger condition

|⋃ T| ≥ |T| + 1

for all ∅ ≠ T ⊊ S. Pick any x ∈ S_n as the representative of S_n; we must choose an SDR from

S′ = {S_1 ∖ {x}, …, S_{n−1} ∖ {x}}.

But if

T′ = {S_{j_1} ∖ {x}, …, S_{j_k} ∖ {x}} ⊆ S′

then, by our assumption,

|⋃ T′| ≥ |⋃_{i=1}^{k} S_{j_i}| − 1 ≥ k.

By the already-proven case of the theorem for S′ we see that we can indeed pick an SDR for S′.

Otherwise, for some ∅ ≠ T ⊊ S we have the exact size

|⋃ T| = |T|.

Inside T itself, for any T′ ⊆ T we have

|⋃ T′| ≥ |T′|,

so by an already-proven case of the theorem we can pick an SDR for T.

It remains to pick an SDR for S ∖ T which avoids all elements of ⋃ T (these elements are taken by the SDR for T). To use the already-proven case of the theorem (again) and do this, we must show that for any T′ ⊆ S ∖ T, even after discarding elements of ⋃ T there remain enough elements in ⋃ T′: we must prove

|⋃ T′ ∖ ⋃ T| ≥ |T′|.

But

|⋃ T′ ∖ ⋃ T| = |⋃(T ∪ T′)| − |⋃ T| ≥ |T ∪ T′| − |⋃ T| = |T| + |T′| − |T| = |T′|,

using the disjointness of T and T′ (so that |T ∪ T′| = |T| + |T′|) and the exact size |⋃ T| = |T|. So by an already-proven case of the theorem, S ∖ T does indeed have an SDR which avoids all elements of ⋃ T.

QED.

Version: 4 Owner: ariels Author(s): ariels



69.3

saturate

Let G(V, E) be a graph and M a matching in G. A vertex v V (G) is said to be saturated


by M if there is an edge in M incident to v. A vertex v V (G) with no such edge is said
to be unsaturated by M. We also say that M saturates v.
Version: 1 Owner: mathcam Author(s): mathcam

69.4

system of distinct representatives

Let S = {S_1, S_2, …, S_n} be a finite collection of finite sets. A system of distinct representatives, or SDR, of S is a set

{x_1, x_2, …, x_n} with x_1 ∈ S_1, x_2 ∈ S_2, …, x_n ∈ S_n,

such that x_i ≠ x_j whenever i ≠ j (i.e., each choice must be unique).


Chapter 70
05E05 Symmetric functions
70.1

elementary symmetric polynomial

The coefficient of x^{n−k} in the polynomial (x + t_1)(x + t_2) ⋯ (x + t_n) is called the k-th elementary symmetric polynomial in the n variables t_1, …, t_n.

The first few examples are:

n = 1:  t_1

n = 2:  t_1 + t_2;  t_1 t_2

n = 3:  t_1 + t_2 + t_3;  t_1 t_2 + t_2 t_3 + t_1 t_3;  t_1 t_2 t_3
Version: 2 Owner: djao Author(s): rmilson, djao
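The defining property can be checked numerically by expanding (x + t_1)(x + t_2)(x + t_3) and comparing coefficients with the elementary symmetric polynomials evaluated at the chosen values. The example values t = (1, 2, 3) are my own:

```python
from itertools import combinations
from math import prod

def elem_sym(ts, k):
    """k-th elementary symmetric polynomial evaluated at the numbers ts."""
    return sum(prod(c) for c in combinations(ts, k))

ts = [1, 2, 3]

# Expand (x + t1)(x + t2)(x + t3) as a coefficient list, lowest degree first.
coeffs = [1]
for t in ts:
    # Multiply the current polynomial by (x + t).
    coeffs = [t * a + b for a, b in zip(coeffs + [0], [0] + coeffs)]

# The coefficient of x^(n-k) equals the k-th elementary symmetric polynomial.
n = len(ts)
for k in range(n + 1):
    assert coeffs[n - k] == elem_sym(ts, k)

print(coeffs)  # [6, 11, 6, 1], i.e. x^3 + 6x^2 + 11x + 6
```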

70.2

reduction algorithm for symmetric polynomials

We give here an algorithm for reducing a symmetric polynomial into a polynomial in the
elementary symmetric polynomials.

We define the height of a monomial x_1^{e_1} ⋯ x_n^{e_n} in R[x_1, …, x_n] to be e_1 + 2e_2 + ⋯ + n·e_n. The height of a polynomial is defined to be the maximum height of any of its monomial terms, or 0 if it is the zero polynomial.

Let f be a symmetric polynomial. We reduce f into elementary symmetric polynomials by induction on the height of f. Let c·x_1^{e_1} ⋯ x_n^{e_n} be the monomial term of maximal height in f. Consider the polynomial

g := f − c · s_1^{e_n − e_{n−1}} s_2^{e_{n−1} − e_{n−2}} ⋯ s_{n−1}^{e_2 − e_1} s_n^{e_1},

where s_k is the k-th elementary symmetric polynomial in the n variables x_1, …, x_n. Then g is a symmetric polynomial of lower height than f, so by the induction hypothesis, g is a polynomial in s_1, …, s_n, and it follows immediately that f is also a polynomial in s_1, …, s_n.
Version: 2 Owner: djao Author(s): djao


Chapter 71
06-00 General reference works
(handbooks, dictionaries,
bibliographies, etc.)
71.1

equivalence relation

An equivalence relation ∼ on a set S is a relation that is:

• Reflexive: a ∼ a for all a ∈ S.

• Symmetric: whenever a ∼ b, then b ∼ a.

• Transitive: if a ∼ b and b ∼ c then a ∼ c.

If a and b are related this way we say that they are equivalent under ∼. If a ∈ S, then the set of all elements of S that are equivalent to a is called the equivalence class of a.

An equivalence relation on a set induces a partition on it, and conversely any partition induces an equivalence relation. Equivalence relations are important, because often the set S can be transformed into another set (a quotient space) by considering each equivalence class as a single unit.

Two examples of equivalence relations:

1. Consider the set of integers Z and take a positive integer m. Then m induces an equivalence relation by a ∼ b when m divides b − a (that is, a and b leave the same remainder when divided by m).

2. Take a group (G, ·) and a subgroup H. Define a ∼ b whenever ab⁻¹ ∈ H. That defines an equivalence relation. Here the equivalence classes are called cosets.

Version: 7 Owner: drini Author(s): drini
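Example 1 above can be illustrated concretely. The sketch below (sample range and names are mine) builds the three residue classes modulo m = 3 and checks the three defining properties on a finite sample:

```python
# The relation "m divides b - a" partitions the integers into m residue
# classes; illustrated on 0..11 with m = 3.
m = 3
classes = {}
for a in range(12):
    classes.setdefault(a % m, []).append(a)

print(classes)
# {0: [0, 3, 6, 9], 1: [1, 4, 7, 10], 2: [2, 5, 8, 11]}

# Check reflexivity, symmetry and transitivity on the finite sample.
related = lambda a, b: (b - a) % m == 0
sample = range(12)
assert all(related(a, a) for a in sample)                       # reflexive
assert all(related(b, a) for a in sample for b in sample
           if related(a, b))                                    # symmetric
assert all(related(a, c) for a in sample for b in sample
           for c in sample if related(a, b) and related(b, c))  # transitive
```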


Chapter 72
06-XX Order, lattices, ordered
algebraic structures
72.1

join

Certain posets X have a binary operator called join, denoted ∨, such that x ∨ y is the least upper bound of x and y. Further, if j and j′ are both joins of x and y, then j ≤ j′ and j′ ≤ j, and so j = j′; thus a join, if it exists, is unique. The join is also known as the "or" operator.
Version: 4 Owner: yark Author(s): yark, greg

72.2

meet

Certain posets X have a binary operator called meet, denoted ∧, such that x ∧ y is the greatest lower bound of x and y. Further, if m and m′ are both meets of x and y, then m ≤ m′ and m ≥ m′, and so m = m′; thus a meet, if it exists, is unique.
Version: 5 Owner: yark Author(s): yark, greg


Chapter 73
06A06 Partial order, general
73.1

directed set

A directed set is a partially ordered set (A, ≤) such that whenever a, b ∈ A there is c ∈ A such that a ≤ c and b ≤ c.

A subset B ⊆ A is said to be residual iff there is a ∈ A such that b ∈ B whenever a ≤ b, and cofinal iff for each a ∈ A there is b ∈ B such that a ≤ b.

Note: Many authors do not require ≤ to be antisymmetric, so that (A, ≤) is only a suborder with the above property.
Version: 3 Owner: Evandar Author(s): Evandar

73.2

infimum

The infimum of a set S is the greatest lower bound of S and is denoted inf(S).

Let A be a set with a partial order ≤, and let S ⊆ A. For any x ∈ A, x is a lower bound of S if x ≤ y for any y ∈ S. The infimum of S, denoted inf(S), is the greatest such lower bound; that is, if b is a lower bound of S, then b ≤ inf(S).

Note that it is not necessarily the case that inf(S) ∈ S. Suppose S = (0, 1); then inf(S) = 0, but 0 ∉ S.

Also note that a set does not necessarily have an infimum. See the attachments to this entry for examples.
Version: 5 Owner: vampyr Author(s): vampyr

73.3

sets that do not have an infimum

Some examples for sets that do not have an infimum:


• The set M_1 := Q (as a subset of Q) does not have an infimum (nor a supremum). Intuitively this is clear, as the set is unbounded. The (easy) formal proof is left as an exercise for the reader.

• A more interesting example: the set M_2 := {x ∈ Q : x² > 2, x > 0} (again as a subset of Q).

Clearly, inf(M_2) ≥ 0. Assume i ≥ 0 is an infimum of M_2. Now we use the fact that √2 is not rational, and therefore i < √2 or i > √2.

If i < √2, choose any j ∈ Q from the interval (i, √2) ⊆ R (this is a real interval, but as the rational numbers are dense in the real numbers, every nonempty interval in R contains a rational number, hence such a j exists).

Then j > i, but j < √2, hence j² < 2 and therefore j is a lower bound for M_2, which is a contradiction: j would be a lower bound greater than the greatest lower bound i.

On the other hand, if i > √2, the argument is very similar: choose any j ∈ Q from the interval (√2, i) ⊆ R. Then j < i, but j > √2, hence j² > 2 and therefore j ∈ M_2. Thus M_2 contains an element j smaller than i, which is a contradiction to the assumption that i = inf(M_2).

Intuitively speaking, this example exploits the fact that Q does not have "enough" elements. More formally, Q as a metric space is not complete. The M_2 defined above is the real interval M_2′ := (√2, ∞) ⊆ R intersected with Q. M_2′ as a subset of R does have an infimum (namely √2), but as that is not an element of Q, M_2 does not have an infimum as a subset of Q.

This example also makes it clear that it is important to clearly state the superset one is working in when using the notion of infimum or supremum.

It also illustrates that the infimum is a natural generalization of the minimum of a set, as a set that does not have a minimum may still have an infimum (such as M_2).

Of course all the ideas expressed here equally apply to the supremum, as the two notions are completely analogous (just reverse all inequalities).
Version: 4 Owner: sleske Author(s): yark, sleske

73.4

supremum

The supremum of a set S is the least upper bound of S and is denoted sup(S).

Let A be a set with a partial order ≤, and let S ⊆ A. For any x ∈ A, x is an upper bound of S if y ≤ x for any y ∈ S. The supremum of S is the least such upper bound; that is, if b is an upper bound of S, then sup(S) ≤ b.

Note that it is not necessarily the case that sup(S) ∈ S. Suppose S = (0, 1); then sup(S) = 1, but 1 ∉ S.

Note also that a set may not have an upper bound at all.
Version: 4 Owner: vampyr Author(s): vampyr

73.5

upper bound

Let S be a set with an ordering relation ≤, and let T be a subset of S. An upper bound for T is an element z ∈ S such that x ≤ z for all x ∈ T. We say that T is bounded from above if there exists an upper bound for T.

Lower bound and bounded from below are defined in a similar manner.
Version: 3 Owner: djao Author(s): rmilson, djao


Chapter 74
06A99 Miscellaneous
74.1

dense (in a poset)

If (P, ≤) is a poset, then a subset Q ⊆ P is dense if for any p ∈ P there is some q ∈ Q such that q ≤ p.
Version: 1 Owner: Henry Author(s): Henry

74.2

partial order

A partial order (often simply referred to as an order or ordering) is a relation ≤ ⊆ A × A that satisfies the following three properties:

1. Reflexivity: a ≤ a for all a ∈ A.

2. Antisymmetry: if a ≤ b and b ≤ a for any a, b ∈ A, then a = b.

3. Transitivity: if a ≤ b and b ≤ c for any a, b, c ∈ A, then a ≤ c.

A total order is a partial order that satisfies a fourth property known as comparability.

A set and a partial order on that set define a poset.
Version: 11 Owner: Logan Author(s): Logan


74.3

poset

A poset is a partially ordered set, that is, a pair (P, ≤) where ≤ is a partial order relation on P.

A few examples:

• (Z, ≤), where ≤ is the common less-than-or-equal relation on the integers.

• (P(X), ⊆), where P(X) is the power set of X and the relation is the common inclusion of sets.

In a partial order, not any two elements need to be comparable. As an example, consider X = {a, b, c} and the poset on its power set given by inclusion. Here {a} ⊆ {a, c}, but the two subsets {a, b} and {a, c} are not comparable (neither {a, b} ⊆ {a, c} nor {a, c} ⊆ {a, b}).
Version: 4 Owner: drini Author(s): drini

74.4

quasi-order

A quasi-order on a set S is a relation ≲ on S satisfying the following two axioms:

1. Reflexivity: s ≲ s for all s ∈ S.

2. Transitivity: if s ≲ t and t ≲ u, then s ≲ u, for all s, t, u ∈ S.

Given such a relation, the relation

s ∼ t :⟺ (s ≲ t) ∧ (t ≲ s)

is an equivalence relation on S, and ≲ induces a partial order ≤ on the set S/∼ of equivalence classes of ∼, defined by

[s] ≤ [t] :⟺ s ≲ t,

where [s] and [t] denote the equivalence classes of s and t. In particular, ≤ does satisfy antisymmetry, whereas ≲ may not.
Version: 3 Owner: draisma Author(s): draisma

74.5

well quasi ordering

A quasi-order (Q, ≲) is a well-quasi-ordering (wqo) if for every infinite sequence a_1, a_2, a_3, … from Q there exist i < j ∈ N such that a_i ≲ a_j. An infinite sequence from Q is usually referred to as bad if a_i ≲ a_j holds for no i < j; otherwise it is called good. Note that an antichain is obviously a bad sequence.

The following proposition gives equivalent definitions for well-quasi-ordering:

Proposition 3. Given a set Q and a binary relation ≲ over Q, the following conditions are equivalent:

• (Q, ≲) is a well-quasi-ordering;

• (Q, ≲) has no infinite strictly decreasing chains and no infinite antichains;

• every linear extension of Q/∼ is a well-order, where ∼ is the equivalence relation and Q/∼ is the set of equivalence classes induced by ∼;

• any infinite sequence from Q contains an infinite increasing chain.

The equivalence of WQO to the second and the fourth conditions is proved by the infinite version of Ramsey's theorem.
Version: 8 Owner: iddo Author(s): iddo


Chapter 75
06B10 Ideals, congruence relations
75.1

order in an algebra

Let A be an algebra, finitely generated over Q. An order R of A is a subring of A which is finitely generated as a Z-module and which satisfies R ⊗_Z Q = A.

Remark: the algebra A is not necessarily commutative.

Examples:

1. The ring of integers in a number field is an order, known as the maximal order.

2. Let K be a quadratic imaginary field and O its ring of integers. Then for each integer n, the ring Z + nO is an order of K (in fact it can be proved that every order of K is of this form).

Reference: Joseph H. Silverman, The Arithmetic of Elliptic Curves, Springer-Verlag, New York, 1986.
Version: 6 Owner: alozano Author(s): alozano


Chapter 76
06C05 Modular lattices,
Desarguesian lattices
76.1

modular lattice

A lattice L is said to be modular if x ∨ (y ∧ z) = (x ∨ y) ∧ z for all x, y, z ∈ L such that x ≤ z.

The following are examples of modular lattices:

• all distributive lattices,

• the lattice of normal subgroups of any group,

• the lattice of submodules of any module.

A finite lattice L is modular if and only if it is graded and its rank function ρ satisfies ρ(x) + ρ(y) = ρ(x ∧ y) + ρ(x ∨ y) for all x, y ∈ L.
Version: 8 Owner: yark Author(s): yark, greg


Chapter 77
06D99 Miscellaneous
77.1

distributive

Given a set S with two binary operations + : S × S → S and · : S × S → S, we say that · is right distributive over + if

(a + b) · c = (a · c) + (b · c) for all a, b, c ∈ S,

and left distributive over + if

a · (b + c) = (a · b) + (a · c) for all a, b, c ∈ S.

If · is both left and right distributive over +, then it is said to be distributive over +.
Version: 8 Owner: yark Author(s): yark

77.2

distributive lattice

A lattice is said to be distributive if it satisfies either (and therefore both) of the distributive laws:

x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)

x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)

Every distributive lattice is modular.
Examples of distributive lattices include Boolean lattices and totally ordered sets.
Version: 12 Owner: yark Author(s): yark, greg
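A concrete check: the divisors of 30 ordered by divisibility form a lattice with meet = gcd and join = lcm, and both distributive laws can be verified exhaustively. The choice of 30 is my own example:

```python
from math import gcd
from itertools import product

def lcm(a, b):
    return a * b // gcd(a, b)

# Divisors of 30 under divisibility: meet is gcd, join is lcm.
divisors = [d for d in range(1, 31) if 30 % d == 0]

# Check both distributive laws for all triples of lattice elements.
ok = all(gcd(x, lcm(y, z)) == lcm(gcd(x, y), gcd(x, z)) and
         lcm(x, gcd(y, z)) == gcd(lcm(x, y), lcm(x, z))
         for x, y, z in product(divisors, repeat=3))
print(ok)  # True
```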

Chapter 78
06E99 Miscellaneous
78.1

Boolean ring

A Boolean ring is a ring R that has a unit element and in which every element is idempotent; in other words,

x² = x for all x ∈ R.

Example of a Boolean ring: let R be the ring Z_2 × Z_2 with the operations taken coordinate-wise. Then we can check:

(1, 1) · (1, 1) = (1, 1)
(1, 0) · (1, 0) = (1, 0)
(0, 1) · (0, 1) = (0, 1)
(0, 0) · (0, 0) = (0, 0)

All four elements that form the ring are idempotent, so R is Boolean.
Version: 5 Owner: drini Author(s): drini
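The idempotency computation in the example can be automated. The sketch below (helper names are mine) also checks two standard consequences that hold in any Boolean ring, namely x + x = 0 and commutativity of multiplication:

```python
from itertools import product

# The ring Z2 x Z2 with coordinate-wise operations.
elements = list(product((0, 1), repeat=2))

def mul(x, y):
    return ((x[0] * y[0]) % 2, (x[1] * y[1]) % 2)

def add(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

# Every element is idempotent, as the entry's table shows.
assert all(mul(x, x) == x for x in elements)

# Standard consequences of idempotence in any Boolean ring:
assert all(add(x, x) == (0, 0) for x in elements)              # x + x = 0
assert all(mul(x, y) == mul(y, x)
           for x in elements for y in elements)                # commutative
print("all checks passed")
```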


Chapter 79
08A40 Operations, polynomials,
primal algebras
79.1
coefficients of a polynomial

If p = Σ_{i=0}^{n} a_i x^i is a polynomial, then its coefficients are {a_i}_{i=0}^{n}.

Version: 5 Owner: say 10 Author(s): say 10, apmxi


Chapter 80
08A99 Miscellaneous
80.1

binary operation

A binary operation on a set X is a function from X X to X.


Rather than using function notation, it is usual to write binary operations with an operation symbol between elements, or even with no operation at all, it being understood that
juxtaposed elements are to be combined using an operation that should be clear from the
context.
Thus, addition of real numbers is the operation

(x, y) ↦ x + y,

and multiplication in a groupoid is the operation

(x, y) ↦ xy.
Version: 1 Owner: mclase Author(s): mclase

80.2

filtered algebra

Definition 2. A filtered algebra over the field k is an algebra (A, ·) over k which is endowed with a filtration F = {F_i}_{i∈N} compatible with the multiplication in the following sense:

for all m, n ∈ N,  F_m · F_n ⊆ F_{n+m}.

A special case of a filtered algebra is a graded algebra. In general there is the following construction that produces a graded algebra out of a filtered algebra.

Definition 3. Let (A, ·, F) be a filtered algebra; then the associated graded algebra G(A) is defined as follows. As a vector space,

G(A) = ⊕_{n∈N} G_n,

where G_0 = F_0 and, for n > 0, G_n = F_n/F_{n−1}. The multiplication is defined by

(x + F_{n−1})(y + F_{m−1}) = x·y + F_{n+m−1}.

Theorem 6. The multiplication is well defined and endows G(A) with the structure of a graded algebra, with gradation {G_n}_{n∈N}. Furthermore, if A is associative then so is G(A).

An example of a filtered algebra is the Clifford algebra Cliff(V, q) of a vector space V endowed with a quadratic form q. The associated graded algebra is ⋀V, the exterior algebra of V.

As algebras, A and G(A) are distinct (with the exception of the trivial case that A is graded), but as vector spaces they are isomorphic.

Theorem 7. The underlying vector spaces of A and G(A) are isomorphic.
Version: 5 Owner: Dr Absentius Author(s): Dr Absentius


Chapter 81
11-00 General reference works
(handbooks, dictionaries,
bibliographies, etc.)
81.1

Euler phi-function

For any positive integer n, φ(n) is the number of positive integers less than or equal to n which are coprime to n. This φ is known as the Euler φ-function. Among its useful properties are the facts that φ is multiplicative, meaning that if gcd(a, b) = 1 then φ(ab) = φ(a)φ(b), and that φ(p^k) = p^{k−1}(p − 1) if p is prime. These two facts combined give a numeric computation of φ for all integers:

φ(n) = n ∏_{p|n} (1 − 1/p),

where the product runs over the primes p dividing n. For example,

φ(2000) = φ(2⁴ · 5³) = 2000 (1 − 1/2)(1 − 1/5) = 2000 · (1/2) · (4/5) = 8000/10 = 800.

In addition,

Σ_{d|n} φ(d) = n,

where the sum extends over all positive divisors d of n. Also, φ(n) is the number of units in the ring Z/nZ of integers modulo n.
459

Version: 6 Owner: KimJ Author(s): KimJ
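The product formula above can be turned into a short computation. The following is an illustrative sketch (the function name `phi` is ours, not from the entry):

```python
from math import gcd

def phi(n):
    """Euler phi via the product formula: phi(n) = n * prod(1 - 1/p) over primes p | n."""
    result = n
    p, m = 2, n
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p  # multiply by (1 - 1/p) exactly, in integers
        p += 1
    if m > 1:                       # leftover prime factor
        result -= result // m
    return result

# Cross-check against the definition: count 1 <= k <= n with gcd(k, n) == 1.
assert phi(2000) == 800
assert phi(2000) == sum(1 for k in range(1, 2001) if gcd(k, 2000) == 1)
```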

81.2

Euler-Fermat theorem

Given $a, n \in \mathbb{Z}$, $a^{\phi(n)} \equiv 1 \pmod{n}$ when $\gcd(a, n) = 1$, where $\phi$ is the Euler totient function.
Version: 4 Owner: KimJ Author(s): KimJ
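The theorem is easy to spot-check numerically; a brute-force sketch (the helper `phi` is ours):

```python
from math import gcd

def phi(n):
    """Naive totient, directly from the definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Verify a^phi(n) ≡ 1 (mod n) for every a coprime to n, over a small range.
for n in range(2, 50):
    for a in range(1, n):
        if gcd(a, n) == 1:
            assert pow(a, phi(n), n) == 1
```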

81.3

Fermat's little theorem

If $a, p \in \mathbb{Z}$ with $p$ a prime and $p \nmid a$, then $a^{p-1} \equiv 1 \pmod{p}$.


Version: 3 Owner: KimJ Author(s): KimJ

81.4

Fermat's theorem proof

Consider the sequence $a, 2a, \ldots, (p-1)a$.

They are all different (modulo $p$), because if $ma \equiv na$ with $1 \le m < n \le p - 1$ then
$0 \equiv a(m - n)$, and since $p \nmid a$ we get $p \mid (m - n)$, which is impossible.

Now, since all these numbers are different, the set $\{a, 2a, 3a, \ldots, (p-1)a\}$ will have the $p - 1$
possible congruence classes (although not necessarily in the same order) and therefore
$$a \cdot 2a \cdot 3a \cdots (p-1)a \equiv (p-1)!\,a^{p-1} \equiv (p-1)! \pmod p,$$
and using $\gcd((p-1)!, p) = 1$ we get
$$a^{p-1} \equiv 1 \pmod p.$$
Version: 3 Owner: drini Author(s): drini
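The key step of the proof, that multiplication by $a$ permutes the nonzero residue classes mod $p$, can be checked directly; the sample values $p = 13$, $a = 5$ below are ours, and any prime $p$ with $p \nmid a$ works:

```python
# Multiplying by a permutes the nonzero residues mod p (the heart of the proof).
p, a = 13, 5
residues = sorted((a * k) % p for k in range(1, p))
assert residues == list(range(1, p))
```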

81.5

Goldbach's conjecture

The conjecture states that every even integer $n > 2$ is expressible as the sum of two primes.
In 1966 Chen proved that every sufficiently large even number can be expressed as the sum
of a prime and a number with at most two prime divisors.

Vinogradov proved that every sufficiently large odd number is a sum of three primes. In
1997 it was shown by J.-M. Deshouillers, G. Effinger, H. te Riele, and D. Zinoviev that,
assuming the generalized Riemann hypothesis, every odd number $n > 5$ can be represented as
a sum of three primes.

The conjecture was first proposed in a 1742 letter from Christian Goldbach to Euler and
still remains unproved.
Version: 6 Owner: drini Author(s): bbukh, drini, imran
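The conjecture is easy to verify empirically for small even numbers; a brute-force sketch:

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Every even n with 4 <= n < 1000 is a sum of two primes.
for n in range(4, 1000, 2):
    assert any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))
```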

81.6

Jordan's totient function

Let $p$ be a prime, and $k$ and $n$ natural numbers. Then
$$J_k(n) = n^k \prod_{p \mid n} \left(1 - p^{-k}\right),$$
where the product is over the prime divisors of $n$.

This is a generalization of Euler's totient function.
Version: 8 Owner: akrowne Author(s): akrowne
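The product formula translates into code as follows; this sketch (function names ours) also checks that $J_1$ agrees with Euler's totient:

```python
from math import gcd

def prime_factors(n):
    """Distinct prime divisors of n, by trial division."""
    ps, p, m = [], 2, n
    while p * p <= m:
        if m % p == 0:
            ps.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        ps.append(m)
    return ps

def jordan(k, n):
    """J_k(n) = n^k * prod over primes p | n of (1 - p^-k), computed in integers."""
    num, den = n**k, 1
    for p in prime_factors(n):
        num *= p**k - 1
        den *= p**k
    return num // den  # the product formula always yields an integer

# J_1 is Euler's phi: compare with a direct count for one value.
assert jordan(1, 2000) == sum(1 for m in range(1, 2001) if gcd(m, 2000) == 1)
```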

81.7

Legendre symbol

Legendre Symbol.

Let $p$ be an odd prime. The symbol $\left(\frac{a}{p}\right)$, also written $(a \mid p)$, has the value $1$ if $a$ is a quadratic residue
modulo $p$, $-1$ if $a$ is not a quadratic residue, and $0$ if $p$ divides $a$. The symbol defined this
way is called the Legendre symbol.

The Legendre symbol can be computed by means of Euler's criterion or Gauss' lemma.

A generalization of this symbol is the Jacobi symbol.
Version: 3 Owner: drini Author(s): drini
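One of the computation methods mentioned, Euler's criterion, states that $\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod p$; a minimal sketch (function name ours):

```python
def legendre(a, p):
    """Legendre symbol (a/p) for odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)   # result is 0, 1, or p-1 (i.e., -1 mod p)
    return -1 if r == p - 1 else r

# 2 is a quadratic residue mod 7 (3^2 = 9 ≡ 2); 3 is not; 14 is divisible by 7.
assert legendre(2, 7) == 1
assert legendre(3, 7) == -1
assert legendre(14, 7) == 0
```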

461

81.8

Pythagorean triplet

A Pythagorean triplet is a set $\{a, b, c\}$ of three integers such that
$$a^2 + b^2 = c^2.$$
That is, $\{a, b, c\}$ is a Pythagorean triplet if there exists a right triangle whose sides are $a, b, c$.

If $\{a, b, c\}$ is a Pythagorean triplet, so is $\{ka, kb, kc\}$. If $a, b, c$ are coprime, then we say that
the triplet is primitive.

All the primitive Pythagorean triplets are given by
$$a = 2mn, \quad b = m^2 - n^2, \quad c = m^2 + n^2,$$
where $m, n$ are any two coprime integers, one odd and the other even, with $m > n$.
Version: 2 Owner: drini Author(s): drini
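The parametrization above generates primitive triplets directly; a sketch (function name ours):

```python
from math import gcd

def primitive_triplets(limit):
    """Yield primitive triplets (2mn, m^2 - n^2, m^2 + n^2) for coprime m > n of opposite parity."""
    for m in range(2, limit):
        for n in range(1, m):
            if gcd(m, n) == 1 and (m - n) % 2 == 1:
                yield 2 * m * n, m * m - n * n, m * m + n * n

for a, b, c in primitive_triplets(10):
    assert a * a + b * b == c * c          # it is a Pythagorean triplet
    assert gcd(a, gcd(b, c)) == 1          # and it is primitive
```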

81.9

Wilson's theorem

Wilson's theorem states that
$$(p-1)! \equiv -1 \pmod p$$
for prime numbers $p$.


Version: 5 Owner: akrowne Author(s): akrowne

81.10

arithmetic mean

Arithmetic Mean.

If $a_1, a_2, \ldots, a_n$ are real numbers, we define the arithmetic mean of them as
$$A.M. = \frac{a_1 + a_2 + \cdots + a_n}{n}.$$
The arithmetic mean is what is commonly called the average of the numbers.
Version: 3 Owner: drini Author(s): drini

81.11

ceiling

The ceiling function is the smallest integer greater than or equal to its argument. It is usually
denoted $\lceil x \rceil$.

Some examples: $\lceil 6.2 \rceil = 7$, $\lceil 0.4 \rceil = 1$, $\lceil 7 \rceil = 7$, $\lceil -5.1 \rceil = -5$, $\lceil \pi \rceil = 4$, $\lceil -4 \rceil = -4$.

Note that this function is NOT the integer part ($[x]$), since $\lceil 3.5 \rceil = 4$ and $[3.5] = 3$.
Version: 3 Owner: drini Author(s): drini

81.12

computation of powers using Fermats little theorem

A straightforward application of the Euler-Fermat theorem consists of rewriting the power of an
integer mod $n$. Suppose we have $x \equiv a^b \pmod n$ with $a \in U(n)$. Then, by the Euler-Fermat
theorem, we have
$$a^{\phi(n)} \equiv 1 \pmod n,$$
so
$$x \equiv a^b (1)^k \equiv a^b \left(a^{\phi(n)}\right)^k \equiv a^{b + k\phi(n)} \pmod n$$
for any integer $k$. This means we can replace $b$ by any integer congruent to it mod $\phi(n)$. In
particular we have
$$x \equiv a^{b \,\%\, \phi(n)} \pmod n,$$
where $b \,\%\, \phi(n)$ denotes the remainder of $b$ upon division by $\phi(n)$.

This can be used to make the computation of large powers easier. It also allows one to find
an easy-to-compute inverse to $x^b \pmod n$ whenever $b \in U(\phi(n))$. In fact, this is just $x^{b^{-1}}$, where
$b^{-1}$ is an inverse to $b$ mod $\phi(n)$. This forms the basis of the RSA cryptosystem, where a
message $x$ is encrypted by raising it to the $b$th power, giving $x^b$, and is decrypted by raising
it to the $b^{-1}$th power, giving
$$\left(x^b\right)^{b^{-1}} \equiv x^{b b^{-1}},$$
which, by the above argument, is just
$$x^{b b^{-1} \,\%\, \phi(n)} \equiv x,$$
the original message!


Version: 2 Owner: basseykay Author(s): basseykay
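The encrypt/decrypt round trip can be sketched with small illustrative parameters; the modulus $3233 = 61 \cdot 53$, exponent $17$, and message $65$ below are standard textbook toy values, not from the entry:

```python
# Toy RSA round trip using the congruence x^(b*b^-1) ≡ x (mod n).
n = 3233                    # 61 * 53
phi_n = 60 * 52             # phi(n) = (61 - 1)(53 - 1) = 3120
b = 17                      # public exponent, coprime to phi(n)
b_inv = pow(b, -1, phi_n)   # private exponent: inverse of b mod phi(n) (Python 3.8+)

x = 65                      # the message
cipher = pow(x, b, n)       # encrypt: x^b mod n
assert pow(cipher, b_inv, n) == x   # decrypt recovers the original message
```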

463

81.13

congruences

Let $a, b$ be integers and $m$ a non-zero integer. We say that $a$ is congruent to $b$ modulo
$m$ if $m$ divides $b - a$. We write this as
$$a \equiv b \pmod m.$$
If $a$ and $b$ are congruent modulo $m$, it means that both numbers leave the same residue when
divided by $m$.

Congruence with a fixed modulus is an equivalence relation on $\mathbb{Z}$. The set of equivalence classes
is a cyclic group of order $m$ with respect to the sum, and a ring if we also consider multiplication
modulo $m$. This ring is usually denoted as
$$\frac{\mathbb{Z}}{m\mathbb{Z}}.$$
This ring is also commonly denoted as $\mathbb{Z}_m$, although that notation is also used to represent
the $m$-adic integers.
Version: 3 Owner: drini Author(s): drini

81.14

coprime

Two integers a, b are coprime if their greatest common divisor is 1. It is also said that a, b
are relatively prime.
Version: 4 Owner: drini Author(s): drini

81.15

cube root

The cube root of a real number $x$, written as $\sqrt[3]{x}$, is the real number $y$ such that $y^3 = x$.
Equivalently, $\left(\sqrt[3]{x}\right)^3 = x$; or, $\sqrt[3]{x} \cdot \sqrt[3]{x} \cdot \sqrt[3]{x} = x$.

Example: $\sqrt[3]{-8} = -2$ because $(-2)^3 = (-2)(-2)(-2) = -8$.

Example: $\sqrt[3]{x^3 + 3x^2 + 3x + 1} = x + 1$ because $(x+1)^3 = (x+1)(x+1)(x+1) = (x^2 + 2x + 1)(x+1) = x^3 + 3x^2 + 3x + 1$.

The cube root operation is distributive over multiplication and division, but not over addition
and subtraction.

That is, $\sqrt[3]{x \cdot y} = \sqrt[3]{x} \cdot \sqrt[3]{y}$, and $\sqrt[3]{\frac{x}{y}} = \frac{\sqrt[3]{x}}{\sqrt[3]{y}}$.

However, in general, $\sqrt[3]{x + y} \ne \sqrt[3]{x} + \sqrt[3]{y}$ and $\sqrt[3]{x - y} \ne \sqrt[3]{x} - \sqrt[3]{y}$.

Example: $\sqrt[3]{x^3 y^3} = xy$ because $(xy)^3 = xy \cdot xy \cdot xy = x^3 y^3$.

Example: $\sqrt[3]{\frac{8}{125}} = \frac{2}{5}$ because $\left(\frac{2}{5}\right)^3 = \frac{2^3}{5^3} = \frac{8}{125}$.

The cube root notation is actually an alternative to exponentiation. That is, $\sqrt[3]{x} = x^{1/3}$. As
such, the cube root operation commutes with exponentiation. That is, $\sqrt[3]{x^2} = x^{2/3} = \left(\sqrt[3]{x}\right)^2$.

Version: 4 Owner: wberry Author(s): wberry

81.16

floor

The floor function is the greatest integer less than or equal to its argument. It is usually
denoted $\lfloor x \rfloor$.

Some examples: $\lfloor 6.2 \rfloor = 6$, $\lfloor 0.4 \rfloor = 0$, $\lfloor 7 \rfloor = 7$, $\lfloor -5.1 \rfloor = -6$, $\lfloor \pi \rfloor = 3$, $\lfloor -4 \rfloor = -4$.

Note that this function is NOT the integer part ($[x]$), since $\lfloor -3.5 \rfloor = -4$ and $[-3.5] = -3$.
However, both functions agree for non-negative numbers.

In some texts the bracket notation $[x]$ is used to denote the floor function (although such texts
actually work with the integer part), so the floor function is sometimes also called the bracket
function.
Version: 6 Owner: drini Author(s): drini
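Floor, ceiling, and the integer part differ exactly on negative non-integers, which a few standard-library calls illustrate:

```python
import math

# Floor vs ceiling vs truncation ("integer part") on a negative argument.
x = -3.5
assert math.floor(x) == -4   # greatest integer <= x
assert math.ceil(x) == -3    # smallest integer >= x
assert math.trunc(x) == -3   # integer part: rounds toward zero

# All three agree on non-negative integers.
assert math.floor(4.0) == math.ceil(4.0) == math.trunc(4.0) == 4
```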

81.17

geometric mean

Geometric Mean.

If $a_1, a_2, \ldots, a_n$ are real numbers, we define their geometric mean as
$$G.M. = \sqrt[n]{a_1 a_2 \cdots a_n}.$$
(We usually require the numbers to be non-negative so the mean always exists.)

Version: 2 Owner: drini Author(s): drini


81.18

googol

A googol is equal to the number $10^{100}$, that is, a one followed by one hundred zeros. A
googolplex is ten raised to the power of a googol, i.e., $10^{(10^{100})}$.

Although these numbers do not have much use in traditional mathematics, they are useful
for illustrating what "big" can mean in mathematics. Written out in numbers, a googol is

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

This is already a huge number. For instance, it is more than the number of atoms in the
known universe. A googolplex is even larger. In fact, since a googolplex has a googol number
of zeros in its decimal representation, a googolplex has more digits than there are atoms
in our universe. Thus, even if all matter in the universe were at one's disposal, it would not be
possible to write down the decimal representation of a googolplex [1].

Properties

1. A googol is approximately the factorial of 70. The only prime factors in a googol and
a googolplex are 2 and 5 [1].

2. Using Stirling's formula we can approximate the factorial of a googol to obtain
$$\left(10^{100}\right)! \approx 10^{(9.95 \times 10^{101})}.$$
History and etymology

The googol was created by the American mathematician Edward Kasner (1878-1955) [4] in
[2] to illustrate the difference between an unimaginably large number and infinity. The name
googol was coined by Kasner's nine-year-old nephew Milton Sirotta in 1938 when asked to
give a name for a huge number. The name googol was perhaps influenced by the comic strip
character Barney Google [1, 3].

REFERENCES
1. Wikipedia's entry on googol.
2. Kasner, Edward & Newman, James Roy, Mathematics and the Imagination (New York, NY, USA: Simon and Schuster, 1967; Dover Pubns, April 2001; London: Penguin, 1940, ISBN 0486417034).
3. Douglas Harper's Etymology online dictionary, "Googol", 12/2003.
4. Wikipedia's entry on Edward Kasner.

Version: 7 Owner: matte Author(s): matte, drini



81.19

googolplex
A googolplex is $10^{10^{100}}$; that is, 10 raised to the googol-th power. This can also be viewed as
a one followed by a googol zeros at its right.
Version: 1 Owner: drini Author(s): drini

81.20

greatest common divisor

Let $a$ and $b$ be given integers, with at least one of them different from zero. The greatest
common divisor of $a$ and $b$, denoted by $\gcd(a, b)$, is the positive integer $d$ satisfying

1. $d \mid a$ and $d \mid b$,
2. if $c \mid a$ and $c \mid b$, then $c \le d$.

More intuitively, the greatest common divisor is the largest integer dividing both $a$ and $b$.
Version: 3 Owner: KimJ Author(s): KimJ

81.21

group theoretic proof of Wilsons theorem

Here we present a group-theoretic proof of it. Clearly, it is enough to show that
$(p-2)! \equiv 1 \pmod p$, since $p - 1 \equiv -1 \pmod p$. By Sylow's theorems, we have
that the $p$-Sylow subgroups of $S_p$, the symmetric group on $p$ elements, have order $p$, and the
number $n_p$ of Sylow subgroups is congruent to 1 modulo $p$. Let $P$ be a $p$-Sylow subgroup of
$S_p$. Note that $P$ is generated by a $p$-cycle. There are $(p-1)!$ cycles of length $p$ in $S_p$. Each
$p$-Sylow subgroup contains $p - 1$ cycles of length $p$, hence there are $\frac{(p-1)!}{p-1} = (p-2)!$ different
$p$-Sylow subgroups in $S_p$, i.e. $n_P = (p-2)!$. From Sylow's second theorem, it follows that
$(p-2)! \equiv 1 \pmod p$, so $(p-1)! \equiv -1 \pmod p$.
Version: 4 Owner: ottocolori Author(s): ottocolori

81.22

harmonic mean

If $a_1, a_2, \ldots, a_n$ are positive numbers, we define their harmonic mean as:
$$H.M. = \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}}.$$
Suppose you travel from city A to city B at $x$ miles per hour, and then you travel back at $y$ miles
per hour. What was the average velocity for the whole trip?

The harmonic mean of $x$ and $y$! That is, the average velocity is
$$\frac{2}{\frac{1}{x} + \frac{1}{y}} = \frac{2xy}{x + y}.$$
Version: 4 Owner: drini Author(s): drini
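The harmonic mean, together with the arithmetic and geometric means defined in the neighboring entries, can be sketched as follows (function names ours); the round-trip example uses speeds of 30 and 60 mph:

```python
def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    prod = 1.0
    for x in xs:
        prod *= x
    return prod ** (1.0 / len(xs))

def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)

# Round trip at 30 mph out and 60 mph back averages 40 mph, not 45.
xs = [30.0, 60.0]
assert abs(harmonic_mean(xs) - 40.0) < 1e-9

# For positive inputs the classical inequality H.M. <= G.M. <= A.M. holds.
assert harmonic_mean(xs) <= geometric_mean(xs) <= arithmetic_mean(xs)
```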

81.23

mean

A mean is a homogeneous function $f$ whose domain is the collection of all finite multisets of
$\mathbb{R}$ and whose codomain is $\mathbb{R}$, such that for any multiset $S = \{x_1, x_2, x_3, \ldots, x_t\}$ of real numbers,
$$\min\{x_1, x_2, x_3, \ldots, x_t\} \le f(S) \le \max\{x_1, x_2, x_3, \ldots, x_t\}.$$
Pythagoras identified three types of means: the arithmetic mean, the geometric mean,
and the harmonic mean. Other well-known means include: the median, the mode, the
arithmetic-geometric mean, the arithmetic-harmonic mean, the harmonic-geometric mean,
the root-mean-square (sometimes called the quadratic mean), the identric mean, the Heronian
mean, and the Cesàro mean. Even the minimum and maximum functions are means
(though vacuously so).
It should be noted that the arithmetic mean is sometimes simply referred to as the mean.
Version: 3 Owner: digitalis Author(s): digitalis

81.24

number field

A field which is a finite extension of Q, the rational numbers, is called a number field.
Version: 2 Owner: nerdy2 Author(s): nerdy2

81.25

pi

The number $\pi$ is the ratio between the perimeter and the diameter of any given circle.
That is, in any circle, dividing the perimeter by the diameter always gives the same answer:
$3.14159265358\ldots$

Over human history there were many attempts to calculate this number precisely. One of the
oldest approximations appears in the Rhind Papyrus (circa 1650 B.C.), where a geometrical
construction is given in which $(16/9)^2 = 3.1604\ldots$ is used as an approximation to $\pi$, although
this was not explicitly mentioned.

It wasn't until the Greeks that there were systematic attempts to calculate $\pi$. Archimedes
[1], in the third century B.C., used regular polygons inscribed in and circumscribed about a circle
to approximate $\pi$: the more sides a polygon has, the closer to the circle it becomes, and
therefore the ratio between the polygon's area and the square of the radius yields approximations
to $\pi$. Using this method he showed that $223/71 < \pi < 22/7$ ($3.140845\ldots < \pi < 3.142857\ldots$).

Around the world there were also attempts to calculate $\pi$. Brahmagupta [1] gave the value
of $\sqrt{10} = 3.16227\ldots$ using a method similar to Archimedes'. The Chinese mathematician Tsu
Chung-Chih (ca. 500 A.D.) gave the approximation $355/113 = 3.141592920\ldots$

Later, during the Renaissance, Leonardo of Pisa (Fibonacci) [1] used 96-sided regular polygons
to find the approximation $864/275 = 3.141818\ldots$
For centuries, variations on Archimedes' method were the only tool known, but Viète [1]
gave in 1593 the formula
$$\frac{2}{\pi} = \sqrt{\frac{1}{2}} \cdot \sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2}}} \cdot \sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2} + \frac{1}{2}\sqrt{\frac{1}{2}}}} \cdots,$$
which was the first analytical expression for $\pi$ involving infinite summations or products.

Later, with the advent of calculus, many more such formulas were discovered. Some examples
are Wallis' [1] formula:
$$\frac{\pi}{2} = \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdots$$
and Leibniz's formula,
$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \frac{1}{11} + \cdots,$$
obtained by developing $\arctan(1) = \pi/4$ using power series, and, with some more advanced
techniques,
$$\pi = \sqrt{6\,\zeta(2)},$$
found by determining the value of the Riemann zeta function at $s = 2$.

The Leibniz expression provides an alternate way to define $\pi$ (namely, 4 times the limit of
the series), and it is one of the formal ways to define $\pi$ when studying analysis, in order to
avoid the geometrical definition.

It is known that $\pi$ is not a rational number (a quotient of two integers). Moreover, $\pi$ is not
algebraic over the rationals (that is, it is a transcendental number). This means that no
polynomial with rational coefficients can have $\pi$ as a root. Its irrationality implies that its
decimal expansion (or its expansion in any integer base, for that matter) is neither finite nor periodic.

REFERENCES
1. The MacTutor History of Mathematics archive, The MacTutor History of Mathematics Archive.
2. The ubiquitous $\pi$. [Dario Castellanos] Mathematics Magazine Vol 61, No. 2. April 1988. Mathematical Association of America.
3. A history of pi. [O'Connor and Robertson] http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/P
4. Pi chronology. [O'Connor and Robertson] http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/P
5. Asian contributions to Mathematics. [Ramesh Gangolli] http://www.pps.k12.or.us/depts-c/mc-me/be-as-ma.pdf
6. Archimedes' approximation of pi. [Chuck Lindsey] http://itech.fgcu.edu/faculty/clindsey/mhf4404/ar
Version: 16 Owner: mathcam Author(s): mathcam, drini
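Leibniz's series gives a simple, if slowly converging, way to approximate $\pi$; a sketch (the error after $n$ terms is on the order of $1/n$):

```python
def leibniz_pi(terms):
    """Approximate pi as 4 * (1 - 1/3 + 1/5 - 1/7 + ...), summing `terms` terms."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total

# With 100,000 terms the error is roughly 1/100000, well under 1e-4.
assert abs(leibniz_pi(100_000) - 3.14159265358979) < 1e-4
```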

81.26

proof of Wilsons theorem

We begin by noting that
$$(p-1)! = (p-1)(p-2)\cdots(2)(1).$$
Since we are working mod $p$, all the numbers $2, \ldots, p-2$ have inverses mod $p$, and
each can be paired with its inverse within $2, \ldots, p-2$. This leaves $1$, which is its own inverse,
and $p-1$, also its own inverse. Hence we can write
$$(p-1)! \equiv (p-1)(1)\cdots(1) \pmod p,$$
but then
$$(p-1)! \equiv p-1 \equiv -1 \pmod p.$$
Hence
$$(p-1)! \equiv -1 \pmod p.$$

Version: 2 Owner: akrowne Author(s): akrowne
81.27

proof of fundamental theorem of arithmetic

If $n$ is prime, there is nothing to prove, so assume $n$ is composite. Then there exists an
integer $d$ such that $1 < d < n$ and $d \mid n$. By the well-ordering principle, we can pick the
smallest such integer; call it $p_1$. If $p_1$ were composite, then it would have a divisor $1 < q < p_1$, but
$q \mid p_1$ and $p_1 \mid n$ imply $q \mid n$, contradicting the minimality of $p_1$. Thus $p_1$ is prime. Write $n = p_1 n_1$. If $n_1$ is prime,
we have the desired representation. Otherwise, the same argument yields a new prime, $p_2$,
such that $n_1 = p_2 n_2$, so $n = p_1 p_2 n_2$. The decreasing sequence $n > n_1 > n_2 > \cdots > 1$ cannot
continue indefinitely, so at some point $n_{k-1}$ is a prime; call it $p_k$. This leads to the prime
factorization $n = p_1 p_2 \cdots p_k$.

To show uniqueness, assume we have two prime factorizations
$$n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s.$$
Assume without loss of generality that $r \le s$, and that our primes are written in increasing
magnitude, so $p_1 \le \cdots \le p_r$ and $q_1 \le \cdots \le q_s$. Since $p_1 \mid q_1 q_2 \cdots q_s$ and the $q_i$ are prime, we must
have $p_1 = q_k$ for some $k$, but then $p_1 \ge q_1$. The same reasoning gives $q_1 \ge p_1$, so $p_1 = q_1$.
Cancel this common factor to get
$$p_2 p_3 \cdots p_r = q_2 q_3 \cdots q_s.$$
Continuing, we can divide by all the $p_i$ and get
$$1 = q_{r+1} q_{r+2} \cdots q_s.$$
But the $q_i$ were assumed to be $> 1$, so the right-hand side must be an empty product. Then $r = s$ and $p_i = q_i$ for
all $i$, making the two factorizations identical.
Version: 6 Owner: KimJ Author(s): KimJ
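The existence half of the proof is effectively trial division: repeatedly split off the smallest divisor greater than 1, which is necessarily prime. A sketch (function name ours):

```python
def factorize(n):
    """Prime factorization of n >= 2 by trial division, smallest factors first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # d is the smallest remaining divisor > 1, hence prime
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # the remaining cofactor is prime
        factors.append(n)
    return factors

assert factorize(360) == [2, 2, 2, 3, 3, 5]
```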

81.28

root of unity

The fundamental theorem of algebra assures us that the polynomial $x^n - 1 = 0$ has $n$ roots
in $\mathbb{C}$. That is, there exist $n$ complex numbers $z$ such that $z^n = 1$. These numbers are called
roots of unity.

If $\omega = e^{2\pi i/n} = \cos(2\pi/n) + i\sin(2\pi/n)$, then the $n$-th roots of unity are $\omega^k = e^{2k\pi i/n} = \cos(2k\pi/n) + i\sin(2k\pi/n)$ for $k = 1, 2, \ldots, n$.

If drawn on the complex plane, the $n$-th roots of unity are the vertices of a regular $n$-gon.
Version: 2 Owner: drini Author(s): drini

Chapter 82
11-01 Instructional exposition
(textbooks, tutorial papers, etc.)
82.1

base

Most written number systems¹ are built upon the concept of base for their functioning and
conveying of quantitative meaning. In these systems, meaning is derived from two things:
symbols and places. The representation of a value then follows the schema:
$$\ldots s_2\, s_1\, s_0 \,.\, s_{-1}\, s_{-2}\, s_{-3} \ldots$$
where each $s_i$ is some symbol that has a quantitative value. Places to the left of the point
(.) are worth whole units, and places to the right are worth fractional units. It is the base
that tells us how much of a fraction or how many whole units. Once a base $b$ is chosen, the
value of a number $s_2 s_1 s_0 . s_{-1} s_{-2} s_{-3}$ would be calculated like:
$$s_2 s_1 s_0 \,.\, s_{-1} s_{-2} s_{-3} = s_2 b^2 + s_1 b^1 + s_0 b^0 + s_{-1} b^{-1} + s_{-2} b^{-2} + s_{-3} b^{-3}.$$
In our now-standard, Arabic-derived decimal system, the base $b$ is equal to 10. Other very
common and useful systems are binary, hexadecimal, and octal, having $b = 2$, $b = 16$, and
$b = 8$ respectively².

¹ But not all: see Roman numerals for an example of a baseless number system.
² These are generic systems which are capable of representing any number. By contrast, our system of
written time is a curious hybrid of bases (60, 60, and then 10 from there on) and has a fixed number of whole
places and a different number of symbols (24) in the highest place, making it capable only of representing
the same discrete, finite set of values over and over again.

Each $s_i$ is a member of an alphabet of symbols which must have $b$ members. Intuitively this

makes sense: when we try to represent the number which follows 9 in the decimal system,
we know it must be "10", since there is no symbol after 9. Hence, place as well as symbol
conveys the meaning, and base tells us how much a unit in each place is worth.

Curiously, though one would think that the choice of base leads merely to a different way
of rendering the same information, there are instances where things are variously provable
or proven in some bases, but not others. For instance, there exists a non-recursive formula
for the $n$th binary digit of $\pi$, but not for a decimal one: one still must calculate all of the $n - 1$
preceding decimal digits of $\pi$ to get the $n$th (see this paper).
Version: 3 Owner: akrowne Author(s): akrowne
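The evaluation schema above can be sketched directly in code (function name ours); digits are given as lists, most significant first for the whole part:

```python
def from_digits(whole, frac, base):
    """Evaluate digits around the point: sum of s_i * base^i per the positional schema."""
    value = 0.0
    for s in whole:            # places left of the point: ..., b^2, b^1, b^0
        value = value * base + s
    scale = 1.0 / base
    for s in frac:             # places right of the point: b^-1, b^-2, ...
        value += s * scale
        scale /= base
    return value

assert from_digits([1, 0, 1], [1], 2) == 5.5     # binary 101.1
assert from_digits([2, 5, 5], [], 10) == 255.0   # decimal 255
```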


Chapter 83
11-XX Number theory
83.1

Lehmers Conjecture

"If $m$ is a positive quantity, to find a polynomial of the form
$$f(x) = x^r + a_1 x^{r-1} + \ldots + a_r,$$
where the $a$'s are integers, such that the absolute value of the product of those roots of $f$
which lie outside the unit circle lies between 1 and $1 + m$.

This problem, of interest in itself, is especially important for our purposes. Whether or not
the problem has a solution for $m < 0.176$ we do not know." (Derrick Henry Lehmer, 1933.)

We define the Mahler measure of a polynomial $f$ to be the absolute value of the product of those
roots of $f$ which lie outside the unit circle, multiplied by the absolute value of the coefficient
of the leading term of $f$. We shall denote it $M(f)$.

Lehmer's conjecture states that there exists a constant $C > 1$ such that every polynomial $f$
with integer coefficients and $M(f) > 1$ has $M(f) \ge C$.
Version: 3 Owner: vladm Author(s): vladm

83.2

Sierpinski conjecture

In 1960 Waclaw Sierpinski (1882-1969) proved the following interesting result:

Theorem: There exist infinitely many odd integers $k$ such that $k \cdot 2^n + 1$ is composite for
every $n \ge 1$.

A multiplier $k$ with this property is called a Sierpinski number. The Sierpinski problem
consists in determining the smallest Sierpinski number. In 1962, John Selfridge discovered
the Sierpinski number $k = 78557$, which is now believed to be in fact the smallest such
number.

Conjecture: The integer $k = 78557$ is the smallest Sierpinski number. To prove the
conjecture, it would be sufficient to exhibit a prime of the form $k \cdot 2^n + 1$ for each $k < 78557$.
Version: 1 Owner: vladm Author(s): vladm

83.3

prime triples conjecture

There exist infinitely many triples $p, q, r$ of prime numbers such that $P = pq + pr + qr$ is a
prime.
Version: 2 Owner: Johan Author(s): Johan

475

Chapter 84
11A05 Multiplicative structure;
Euclidean algorithm; greatest
common divisors
84.1

Bezout's lemma (number theory)

Let $a, b$ be integers, not both zero. Then there exist two integers $x, y$ such that:
$$ax + by = \gcd(a, b).$$
This works not only in $\mathbb{Z}$ but in every integral domain in which a Euclidean valuation
has been defined.
Version: 6 Owner: mathwizard Author(s): mathwizard

84.2

Euclid's algorithm

Euclid's algorithm describes a procedure for finding the greatest common divisor of two
integers.

Suppose $a, b \in \mathbb{Z}$, and without loss of generality $b > 0$, because if $b < 0$ then $\gcd(a, b) = \gcd(a, |b|)$, and if $b = 0$ then $\gcd(a, b) = |a|$. Put $d := \gcd(a, b)$.

By the division algorithm for integers, we may find integers $q_0$ and $r_0$ such that $a = q_0 b + r_0$,
where $0 \le r_0 < b$.

Notice that $\gcd(a, b) = \gcd(b, r_0)$, because $d \mid a$ and $d \mid b$, so $d \mid r_0 = a - q_0 b$; and if $b$ and $r_0$
had a common divisor $d'$ larger than $d$, then $d'$ would also be a common divisor of $a$ and $b$,
contradicting $d$'s maximality. Thus, $d = \gcd(b, r_0)$.
476

So we may repeat the division, this time with $b$ and $r_0$. Proceeding recursively, we obtain
$$\begin{aligned}
a &= q_0 b + r_0 &&\text{with } 0 \le r_0 < b\\
b &= q_1 r_0 + r_1 &&\text{with } 0 \le r_1 < r_0\\
r_0 &= q_2 r_1 + r_2 &&\text{with } 0 \le r_2 < r_1\\
r_1 &= q_3 r_2 + r_3 &&\text{with } 0 \le r_3 < r_2\\
&\;\;\vdots
\end{aligned}$$
Thus we obtain a decreasing sequence of nonnegative integers $b > r_0 > r_1 > r_2 > \ldots$,
which must eventually reach zero, that is to say, $r_n = 0$ for some $n$, and the algorithm
terminates. We may easily generalize the previous argument to show that $d = \gcd(r_{k-1}, r_k) = \gcd(r_k, r_{k+1})$ for $k = 0, 1, 2, \ldots$, where $r_{-1} = b$. Therefore,
$$d = \gcd(r_{n-1}, r_n) = \gcd(r_{n-1}, 0) = r_{n-1}.$$
More colloquially, the greatest common divisor is the last nonzero remainder in the algorithm.
The algorithm provides a bit more than this. It also yields a way to express $d$ as a
linear combination of $a$ and $b$, a fact obscurely known as Bezout's lemma. For we have that
$$\begin{aligned}
a - q_0 b &= r_0\\
b - q_1 r_0 &= r_1\\
r_0 - q_2 r_1 &= r_2\\
r_1 - q_3 r_2 &= r_3\\
&\;\;\vdots\\
r_{n-3} - q_{n-1} r_{n-2} &= r_{n-1}\\
r_{n-2} &= q_n r_{n-1},
\end{aligned}$$
so substituting each remainder $r_k$ into the next equation we obtain
$$\begin{aligned}
b - q_1 (a - q_0 b) &= k_1 a + l_1 b = r_1\\
(a - q_0 b) - q_2 (k_1 a + l_1 b) &= k_2 a + l_2 b = r_2\\
(k_1 a + l_1 b) - q_3 (k_2 a + l_2 b) &= k_3 a + l_3 b = r_3\\
&\;\;\vdots\\
(k_{n-3} a + l_{n-3} b) - q_{n-1} (k_{n-2} a + l_{n-2} b) &= k_{n-1} a + l_{n-1} b = r_{n-1}.
\end{aligned}$$


Sometimes, especially for manual computations, it is preferable to write the whole algorithm
in a tabular format. As an example, let us apply the algorithm to $a = 756$ and $b = 595$.
The following table details the procedure. The variables at the top of each column (without
subscripts) have the same meaning as above. That is to say, $r$ is used for the sequence of
remainders and $q$ for the corresponding sequence of quotients. The entries in the $k$ and $l$
columns are obtained by multiplying the current values for $k$ and $l$ by the $q$ in this row, and
subtracting the results from the $k$ and $l$ in the previous row.

      r    q     k     l
    756          1     0
    595          0     1
    161    1     1    -1
    112    3    -3     4
     49    1     4    -5
     14    2   -11    14
      7    3    37   -47
      0    2

Thus, $\gcd(756, 595) = 7$ and $37 \cdot 756 - 47 \cdot 595 = 7$.
Euclid's algorithm was first described in his classic work Elements, which also contained
procedures for geometrical constructions. These are the first known formally described
algorithms. Prior to this, informally defined algorithms were in common use to perform various
computations, but Elements contained the first attempt to rigorously describe a procedure
and explain why its results are admissible. Euclid's algorithm for the greatest common divisor
is still commonly used today; since Elements was published around the fourth century BC, this
algorithm has been in use for nearly 2400 years!
Version: 13 Owner: rmilson Author(s): NeuRet, drini, vampyr
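The tabular bookkeeping for $k$ and $l$ is exactly the extended Euclidean algorithm; a sketch (function name ours) reproducing the worked example:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b), updating Bezout
    coefficients exactly as in the k and l columns of the table."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

g, x, y = extended_gcd(756, 595)
assert (g, x, y) == (7, 37, -47)
assert 37 * 756 - 47 * 595 == 7
```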

84.3

Euclid's lemma

If $a \mid bc$ with $\gcd(a, b) = 1$, then $a \mid c$.


Version: 3 Owner: KimJ Author(s): KimJ

84.4

Euclid's lemma proof

We have $a \mid bc$, so $bc = na$, with $n$ an integer. Dividing both sides by $a$, we have
$$\frac{bc}{a} = n.$$
But $\gcd(a, b) = 1$ implies that $b/a$ is an integer only if $a = 1$. So
$$\frac{bc}{a} = b\,\frac{c}{a} = n,$$
which means $a$ must divide $c$.


Version: 3 Owner: akrowne Author(s): akrowne

84.5

fundamental theorem of arithmetic

Each natural number $n > 1$ can be decomposed uniquely, up to the order of the factors, as
a product of prime numbers. This allows us to write $n$ in the unique representation
$$n = p_1^{a_1} p_2^{a_2} p_3^{a_3} \cdots p_k^{a_k}$$
for some nonnegative integer $k$, with the $p_i$ prime and $p_i \ne p_j$ for $i \ne j$. For some results it is
also useful to assume that $p_i < p_j$ for $i < j$.
Version: 4 Owner: KimJ Author(s): KimJ

84.6

perfect number

An integer $n$ is called perfect if it is the sum of all divisors of $n$ less than $n$ itself. It is
not known if there are any odd perfect numbers, but all even perfect numbers have been
classified as follows:

If $2^k - 1$ is prime for some $k > 1$, then $2^{k-1}(2^k - 1)$ is perfect, and every even perfect
number is of this form.

Proof: ($\Leftarrow$) Let $p = 2^k - 1$ be prime, let $n = 2^{k-1} p$, and define $\sigma(a)$ as the sum of all
positive divisors of the integer $a$. Since $\sigma$ is multiplicative (meaning $\sigma(ab) = \sigma(a)\sigma(b)$ when
$\gcd(a, b) = 1$), we have:
$$\sigma(n) = \sigma(2^{k-1} p) = \sigma(2^{k-1})\sigma(p) = (2^k - 1)(p + 1) = (2^k - 1)(2^k) = 2n,$$
which shows $n$ is perfect.

($\Rightarrow$) Assume $n$ is an even perfect number. Write $n = 2^{k-1} m$ for some odd $m$ and $k \ge 2$.
Then we have $\gcd(2^{k-1}, m) = 1$, so
$$\sigma(n) = \sigma(2^{k-1} m) = \sigma(2^{k-1})\sigma(m) = (2^k - 1)\sigma(m).$$
But if $n$ is perfect, then by definition $\sigma(n) = 2n$, which in our case means $\sigma(n) = 2n = 2^k m$.
Piecing together our two formulae for $\sigma(n)$, we get
$$2^k m = (2^k - 1)\sigma(m).$$
So $(2^k - 1) \mid 2^k m$, which forces $(2^k - 1) \mid m$. Write $m = (2^k - 1)M$. So from above we have:
$$2^k m = (2^k - 1)\sigma(m) \implies 2^k (2^k - 1) M = (2^k - 1)\sigma(m) \implies 2^k M = \sigma(m).$$
Since $m \mid m$ by definition and $M \mid m$ by assumption, we have
$$2^k M = \sigma(m) \ge m + M = 2^k M,$$
which forces $\sigma(m) = m + M$. Thus $m$ has only two divisors, $m$ and $M$. Hence $m$ must be
prime, $M = 1$, and $m = (2^k - 1)M = 2^k - 1$, from which the result follows.
Version: 7 Owner: KimJ Author(s): KimJ
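The forward direction can be spot-checked numerically with a naive divisor sum (function name ours):

```python
def sigma(n):
    """Sum of all positive divisors of n (naive, fine for small n)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

# 2^k - 1 is prime for k = 2, 3, 5, 7; the resulting 2^(k-1)(2^k - 1)
# are the perfect numbers 6, 28, 496, 8128 (sigma(n) = 2n).
for k in [2, 3, 5, 7]:
    n = 2**(k - 1) * (2**k - 1)
    assert sigma(n) == 2 * n
```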

84.7

smooth number

n is a k-smooth number if all prime divisors of n are less than k.


Version: 1 Owner: bbukh Author(s): bbukh


Chapter 85
11A07 Congruences; primitive
roots; residue systems
85.1

Antons congruence

For every n N (n!)p stands for the product of numbers between 1 and n which are not
divisible by a given prime p. And we set (0!)p = 1.
The corollary below generalizes a result first found by Anton, Stickelberger, and Hensel:
Let N0 be the least non-negative residue of n (mod ps ) where p is a prime number and
n N. Then

bn/ps c
+
(n!)p 1
(N0 !)p (mod ps ).
Proof: We write each r in the product below as ips + j to get
Q
(n!)p
=
r
1rn,ps 6|r

ips + j

0ibn/ps c1,1j<ps ,ps 6|j

bn/ps c1

i=0

ips + j

i=bn/ps c,1jN

1j<ps ,ps 6|j


s

(ps !)pbn/p c (N0 !)p

!!

N0
Q

j=1,ps 6|j

(mod ps ).

From Wilsons theorem for prime powers it follows that



(N0 !)p
for
p = 2, s 3
(n!)p
bn/ps c
(1)
(N0 !)p otherwise
481

,ps 6|j

(mod ps ).

Version: 1 Owner: Thomas Heye Author(s): Thomas Heye

85.2

Fermats Little Theorem proof (Inductive)

We must show
$$a^{p-1} \equiv 1 \pmod p$$
with $p$ prime and $p \nmid a$.

When $a = 1$, we have
$$1^{p-1} \equiv 1 \pmod p.$$
Now assume the theorem holds for some $a$. We have as a direct consequence that
$$a^p \equiv a \pmod p.$$
Let's examine $a + 1$. By the binomial theorem, we have
$$\begin{aligned}
(a+1)^p &\equiv \binom{p}{p} a^p + \binom{p}{p-1} a^{p-1} + \cdots + \binom{p}{1} a + \binom{p}{0}\\
&\equiv a^p + \left[\binom{p}{p-1} a^{p-1} + \binom{p}{p-2} a^{p-2} + \cdots + \binom{p}{1} a\right] + 1\\
&\equiv (a+1) + \left[\binom{p}{p-1} a^{p-1} + \binom{p}{p-2} a^{p-2} + \cdots + \binom{p}{1} a\right] \pmod p.
\end{aligned}$$
However, note that the entire bracketed term is divisible by $p$, since each binomial coefficient
$\binom{p}{i}$ with $0 < i < p$ is divisible by $p$. Hence
$$(a+1)^p \equiv (a+1) \pmod p.$$
Since $p$ is prime and we may take $p \nmid (a+1)$, we can cancel an $(a+1)$ from both sides, giving
$$(a+1)^{p-1} \equiv 1 \pmod p.$$
Then by induction, Fermat's little theorem holds in general.
Version: 5 Owner: akrowne Author(s): akrowne


85.3

Jacobi symbol

Jacobi Symbol.

Let $n$ be an odd positive integer with prime factorization $p_1^{e_1} \cdots p_k^{e_k}$, and let $a > 0$ be an
integer. The Jacobi symbol $\left(\frac{a}{n}\right)$ is defined to be
$$\left(\frac{a}{n}\right) = \prod_{i=1}^{k} \left(\frac{a}{p_i}\right)^{e_i},$$
where $\left(\frac{a}{p_i}\right)$ is the Legendre symbol of $a$ and $p_i$.
Version: 2 Owner: saforres Author(s): saforres

85.4

Shanks-Tonelli algorithm

The Shanks-Tonelli algorithm is a procedure for solving a congruence of the form $x^2 \equiv n \pmod p$, where $p$ is an odd prime and $n$ is a quadratic residue of $p$. In other words, it can
be used to compute modular square roots.

First find positive integers $Q$ and $S$ such that $p - 1 = 2^S Q$, where $Q$ is odd. Then find a
quadratic nonresidue $W$ of $p$ and compute $V \equiv W^Q \pmod p$. Then find an integer $n'$ that
is the multiplicative inverse of $n \pmod p$ (i.e., $n n' \equiv 1 \pmod p$).

Compute
$$R \equiv n^{\frac{Q+1}{2}} \pmod p$$
and find the smallest integer $i \ge 0$ that satisfies
$$\left(R^2 n'\right)^{2^i} \equiv 1 \pmod p.$$
If $i = 0$, then $x = R$, and the algorithm stops. Otherwise, compute
$$R' \equiv R V^{2^{(S - i - 1)}} \pmod p$$
and repeat the procedure for $R = R'$.

Version: 4 Owner: mathcam Author(s): mathcam, vampyr
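A sketch of the procedure in code (function name ours); it follows the common iterative formulation, in which the loop variable `t` equals the quantity $R^2 n'$ from the description above, since $R^2 n^{-1} = n^{Q+1} n^{-1} = n^Q$:

```python
def tonelli_shanks(n, p):
    """Solve x^2 ≡ n (mod p) for odd prime p, assuming n is a quadratic residue."""
    # Write p - 1 = 2^S * Q with Q odd.
    Q, S = p - 1, 0
    while Q % 2 == 0:
        Q //= 2
        S += 1
    # Find a quadratic nonresidue W via Euler's criterion.
    W = 2
    while pow(W, (p - 1) // 2, p) != p - 1:
        W += 1
    M, c, t, R = S, pow(W, Q, p), pow(n, Q, p), pow(n, (Q + 1) // 2, p)
    while t != 1:
        # Find the least i with t^(2^i) ≡ 1 (mod p).
        i, t2 = 0, t
        while t2 != 1:
            t2 = t2 * t2 % p
            i += 1
        b = pow(c, 1 << (M - i - 1), p)   # the correction factor V^(2^(S-i-1))
        M, c, t, R = i, b * b % p, t * b * b % p, R * b % p
    return R

x = tonelli_shanks(10, 13)      # 6^2 = 36 ≡ 10 (mod 13)
assert x * x % 13 == 10
```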

85.5

Wieferich prime

By Fermat's little theorem, the relationship $p \mid 2^{p-1} - 1$ holds for any odd prime $p$. An odd
prime $p$ such that $p^2 \mid 2^{p-1} - 1$ is called a Wieferich prime. It is currently unknown whether or
not there are infinitely many Wieferich primes, or whether or not there are infinitely many
primes that are not Wieferich, though the ABC conjecture implies the latter.

REFERENCES
1. Ireland, Kenneth and Rosen, Michael. A Classical Introduction to Modern Number Theory.
Springer, 1998.
2. Nathanson, Melvyn B. Elementary Methods in Number Theory. Springer, 2000.

Version: 3 Owner: mathcam Author(s): mathcam

85.6

Wilson's theorem for prime powers

For every natural number $n$, let $(n!)_p$ denote the product of the numbers $1 \le m \le n$ with
$\gcd(m, p) = 1$. For prime $p$ and $s \in \mathbb{N}$,
$$(p^s!)_p \equiv \begin{cases} 1 & \text{for } p = 2,\; s \ge 3\\ -1 & \text{otherwise} \end{cases} \pmod{p^s}.$$
Proof: We pair up all factors of the product $(p^s!)_p$ into those numbers $m$ where $m \not\equiv m^{-1} \pmod{p^s}$ and those where this is not the case. So $(p^s!)_p$ is congruent (modulo $p^s$) to the
product of those numbers $m$ where $m \equiv m^{-1} \pmod{p^s}$, i.e. $m^2 \equiv 1 \pmod{p^s}$.

Let $p$ be an odd prime and $s \in \mathbb{N}$. Since $2 \nmid p^s$, $p^s \mid (m^2 - 1)$ implies either
$p^s \mid (m + 1)$ or $p^s \mid (m - 1)$. This leads to
$$(p^s!)_p \equiv -1 \pmod{p^s}$$
for an odd prime $p$ and any $s \in \mathbb{N}$.

Now let $p = 2$ and $s \ge 2$. Then
$$\left(1 + t \cdot 2^{s-1}\right)^2 \equiv 1 \pmod{2^s}, \quad t = \pm 1.$$
Since
$$\left(2^{s-1} + 1\right)\left(2^{s-1} - 1\right) \equiv -1 \pmod{2^s},$$
we have
$$(p^s!)_p \equiv (-1) \cdot (-1) = 1 \pmod{p^s}$$
for $p = 2$, $s \ge 3$, but $\equiv -1$ for $s = 1, 2$.
Version: 5 Owner: Thomas Heye Author(s): Thomas Heye


85.7

factorial modulo prime powers

For $n \in \mathbb{N}$ and a prime number $p$, $(n!)_p$ is the product of the numbers $1 \le m \le n$ with $p \nmid m$.

For $n, s \in \mathbb{N}$ and a prime number $p$, we have the congruence
$$\frac{n!}{p^{\sum_{i=1}^{d} \lfloor n/p^i \rfloor}} \equiv \prod_{i=0}^{d} (\pm 1)^{\lfloor n/p^{s+i} \rfloor} (N_i!)_p \pmod{p^s},$$
where $N_i$ is the least non-negative residue of $\lfloor n/p^i \rfloor \pmod{p^s}$, and $d + 1$ denotes the number of
digits in the $p$-adic representation of $n$. More precisely, the sign $\pm 1$ is $-1$ unless $p = 2$, $s \ge 3$.

Proof: Let $i \ge 0$. Then the set of numbers between 1 and $\lfloor n/p^i \rfloor$ divisible by $p$ is
$$\left\{ kp : k \ge 1,\; k \le \left\lfloor \frac{n}{p^{i+1}} \right\rfloor \right\}.$$
This is true for every integer $i$ with $p^{i+1} \le n$. So we have
$$\frac{\left\lfloor \frac{n}{p^i} \right\rfloor !}{p^{\left\lfloor \frac{n}{p^{i+1}} \right\rfloor} \left\lfloor \frac{n}{p^{i+1}} \right\rfloor !} = \left( \left\lfloor \frac{n}{p^i} \right\rfloor ! \right)_p. \tag{85.7.1}$$
Multiplying all these equations for $0 \le i \le d$, where $p^d$ is the largest power of $p$ not greater than $n$,
the statement follows from the generalization of Anton's congruence.

85.8

proof of Euler-Fermat theorem

Let a₁, a₂, …, a_φ(n) be all positive integers less than n which are coprime to n. Since gcd(a, n) = 1, the integers aa₁, aa₂, …, aa_φ(n) are each congruent to one of the integers a₁, a₂, …, a_φ(n), in some order. Taking the product of these congruences, we get

  (aa₁)(aa₂) ⋯ (aa_φ(n)) ≡ a₁a₂ ⋯ a_φ(n)  (mod n),

hence

  a^φ(n) (a₁a₂ ⋯ a_φ(n)) ≡ a₁a₂ ⋯ a_φ(n)  (mod n).

Since gcd(a₁a₂ ⋯ a_φ(n), n) = 1, we can divide both sides by a₁a₂ ⋯ a_φ(n), and the desired result a^φ(n) ≡ 1 (mod n) follows.
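The theorem can be verified exhaustively for small moduli; a minimal sketch (function names are ours):

```python
from math import gcd

def phi(n):
    """Euler totient, by direct count of integers coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def euler_fermat_holds(n):
    """Check a^phi(n) == 1 (mod n) for every a coprime to n."""
    return all(pow(a, phi(n), n) == 1
               for a in range(1, n) if gcd(a, n) == 1)
```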
Version: 5 Owner: KimJ Author(s): KimJ

85.9

proof of Lucas's theorem

Let n ≥ m ∈ ℕ. Let a₀, b₀ be the least non-negative residues of n, m (mod p), respectively. (Additionally, we set r = n − m, and r₀ is the least non-negative residue of r modulo p.) Writing C(n, m) for the binomial coefficient, the statement follows from

  C(n, m) ≡ C(⌊n/p⌋, ⌊m/p⌋) · C(a₀, b₀)  (mod p).

We define the carry indicators c_i for all i ≥ 0 as

  c_i = 1 if b_i + r_i ≥ p, and c_i = 0 otherwise,

and additionally c_(−1) = 0.

The special case s = 1 of Anton's congruence is:

  (n!)_p ≡ (−1)^⌊n/p⌋ a₀!  (mod p),   (85.9.1)

where a₀ is as defined above, and (n!)_p is the product of the numbers ≤ n not divisible by p. So we have

  n! / (⌊n/p⌋! · p^⌊n/p⌋) = (n!)_p ≡ (−1)^⌊n/p⌋ a₀!  (mod p).

When dividing by the left-hand terms of the corresponding congruences for m and r, we see that the resulting power of −1 is

  ⌊n/p⌋ − ⌊m/p⌋ − ⌊r/p⌋ = c₀.

So we get the congruence

  C(n, m) / C(⌊n/p⌋, ⌊m/p⌋) ≡ (−1)^(c₀) · a₀! / (b₀! r₀!)  (mod p),

or equivalently

  C(n, m) ≡ (−1)^(c₀) C(⌊n/p⌋, ⌊m/p⌋) · a₀! / (b₀! r₀!)  (mod p).   (85.9.2)

If c₀ = 0, then r₀ = a₀ − b₀, and a₀!/(b₀! r₀!) = C(a₀, b₀). Now we consider c₀ = 1. Since

  a₀ = b₀ + r₀ − p·c₀,

b₀ + r₀ ≥ p and c₀ = 1 give b₀ − (p − r₀) = a₀ < b₀, so C(a₀, b₀) = 0. So both congruences (the one in the statement and (85.9.2)) produce the same results.
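Iterating the congruence in the statement over the base-p digits of n and m gives a fast way to compute binomial coefficients mod p. A minimal sketch (function name ours):

```python
from math import comb

def lucas_binom_mod(n, m, p):
    """C(n, m) mod p computed digit-by-digit in base p, as Lucas's
    theorem prescribes; comb(a0, b0) is 0 whenever b0 > a0."""
    result = 1
    while n or m:
        n, a0 = divmod(n, p)
        m, b0 = divmod(m, p)
        result = result * comb(a0, b0) % p
    return result
```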
Version: 3 Owner: Thomas Heye Author(s): Thomas Heye

Chapter 86
11A15 Power residues, reciprocity
86.1

Euler's criterion

Let p be an odd prime and n an integer such that gcd(n, p) = 1 (that is, n and p are relatively prime). Then

  (n|p) ≡ n^((p−1)/2)  (mod p),

where (n|p) is the Legendre symbol.
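The criterion gives a direct way to evaluate the Legendre symbol with one modular exponentiation; the sketch below (names ours) compares it against a brute-force quadratic-residue test.

```python
def legendre(n, p):
    """Legendre symbol (n|p) via Euler's criterion, for odd prime p
    and gcd(n, p) = 1."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def is_qr(n, p):
    """Direct check: is n a nonzero square mod p?"""
    return any(pow(x, 2, p) == n % p for x in range(1, p))
```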
Version: 2 Owner: drini Author(s): drini

86.2

Gauss lemma

Gauss's lemma on quadratic residues (GL) is:

Proposition 1: Let p be an odd prime and let n be an integer which is not a multiple of p. Let u be the number of elements of the set

  { n, 2n, 3n, …, ((p − 1)/2)·n }

whose least positive residues, modulo p, are greater than p/2. Then

  (n/p) = (−1)^u,

where (n/p) is the Legendre symbol.

That is, n is a quadratic residue modulo p when u is even and it is a quadratic nonresidue when u is odd.

GL is the special case

  S = { 1, 2, …, (p − 1)/2 }

of the slightly more general statement below. Write F_p for the field of p elements, and identify F_p with the set {0, 1, …, p − 1}, with its addition and multiplication mod p.

Proposition 2: Let S be a subset of F_p such that x ∈ S or −x ∈ S, but not both, for any nonzero x ∈ F_p. For nonzero n ∈ F_p let u(n) be the number of elements k ∈ S such that kn ∉ S. Then

  (n/p) = (−1)^(u(n)).

Proof: If a and b are distinct elements of S, we cannot have an = ±bn, in view of the hypothesis on S. Therefore

  Π_{a∈S} an = (−1)^(u(n)) Π_{a∈S} a.

On the left we have

  n^((p−1)/2) Π_{a∈S} a = (n/p) Π_{a∈S} a

by Euler's criterion. So

  (n/p) Π_{a∈S} a = (−1)^(u(n)) Π_{a∈S} a.

The product is nonzero, hence can be cancelled, yielding the proposition.

Remarks: Using GL, it is straightforward to prove that for any odd prime p:

  (−1/p) = 1 if p ≡ 1 (mod 4), and −1 if p ≡ −1 (mod 4);
  (2/p) = 1 if p ≡ ±1 (mod 8), and −1 if p ≡ ±3 (mod 8).

The condition on S can also be stated like this: for any nonzero square x² ∈ F_p, there is a unique y ∈ S such that x² = y². Apart from the usual choice

  S = { 1, 2, …, (p − 1)/2 },

the set

  { 2, 4, …, p − 1 }

has also been used, notably by Eisenstein. I think it was also Eisenstein who gave us this trigonometric identity, which is closely related to GL:

  (n/p) = Π_{a∈S} sin(2πan/p) / sin(2πa/p).

It is possible to prove GL or Proposition 2 from scratch, without leaning on Euler's criterion, the existence of a primitive root, or the fact that a polynomial over F_p has no more zeros than its degree.
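The count u in Proposition 1 is easy to compute, and the resulting sign can be checked against Euler's criterion. A minimal sketch (names ours):

```python
def gauss_lemma_sign(n, p):
    """(-1)^u, where u counts the multiples k*n (k = 1..(p-1)/2) whose
    least positive residue mod p exceeds p/2."""
    u = sum(1 for k in range(1, (p - 1) // 2 + 1) if k * n % p > p / 2)
    return (-1) ** u

def euler_symbol(n, p):
    """Legendre symbol via Euler's criterion, for comparison."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r
```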
Version: 6 Owner: drini Author(s): Larry Hammick, drini

86.3

Zolotarev's lemma

We will identify the ring ℤ_n of integers modulo n with the set {0, 1, …, n − 1}.

Lemma 1: (Zolotarev) For any prime number p and any nonzero m ∈ ℤ_p, the Legendre symbol (m/p) is equal to the signature of the permutation σ_m : x ↦ mx of ℤ_p.

Proof: Let us write ε(σ) for the signature of any permutation σ. If σ is a circular permutation on a set of k elements, then ε(σ) = (−1)^(k−1).

Let i be the order of m in ℤ_p^×. Then the permutation σ_m consists of (p − 1)/i orbits, each of size i (together with the fixed point 0), whence

  ε(σ_m) = (−1)^((i−1)(p−1)/i).

If i is even, then

  m^((p−1)/2) = m^((i/2)·((p−1)/i)) = (−1)^((p−1)/i) = ε(σ_m).

And if i is odd, then 2i divides p − 1, so

  m^((p−1)/2) = m^(i·((p−1)/(2i))) = 1 = ε(σ_m).

In both cases, the lemma follows from Euler's criterion.

Lemma 1 extends easily from the Legendre symbol to the Jacobi symbol (m/n) for odd n.

The following is Zolotarev's penetrating proof of the quadratic reciprocity law, using Lemma 1.

Lemma 2: Let λ be the permutation of the set

  A_mn = {0, 1, …, m − 1} × {0, 1, …, n − 1}

which maps the kth element of the sequence

  (0, 0)(0, 1) … (0, n − 1)(1, 0) … (1, n − 1)(2, 0) … (m − 1, n − 1)

to the kth element of the sequence

  (0, 0)(1, 0) … (m − 1, 0)(0, 1) … (m − 1, 1)(0, 2) … (m − 1, n − 1),

for every k from 1 to mn. Then

  ε(λ) = (−1)^(m(m−1)n(n−1)/4),

and if m and n are both odd,

  ε(λ) = (−1)^((m−1)(n−1)/4).

Proof: We will use the fact that the signature of a permutation of a finite totally ordered set is determined by the number of inversions of that permutation. The sequence (0, 0), (0, 1), … defines on A_mn a total order in which the relation (i, j) < (i′, j′) means

  i < i′ or (i = i′ and j < j′),

while in the second sequence (i′, j′) precedes (i, j) when

  j′ < j or (j′ = j and i′ < i).

The only pairs ((i, j), (i′, j′)) that get inverted are, therefore, the ones with i < i′ and j > j′. There are indeed (m(m − 1)/2)·(n(n − 1)/2) such pairs, proving the first formula, and the second follows easily.

Now let p and q be distinct odd primes. Denote by π the canonical ring isomorphism ℤ_pq → ℤ_p × ℤ_q. Define two permutations α and β of ℤ_p × ℤ_q by

  α(x, y) = (qx + y, y),
  β(x, y) = (x, x + py).

Last, define a map λ : ℤ_pq → ℤ_pq by

  λ(x + qy) = px + y

for x ∈ {0, 1, …, q − 1} and y ∈ {0, 1, …, p − 1}. Evidently λ is a permutation.

For such x and y we have

  π(x + qy) = (x + qy, x) = α(y, x),
  π(λ(x + qy)) = π(px + y) = (y, px + y) = β(y, x),

and therefore

  π ∘ λ = β ∘ α^(−1) ∘ π.

Let us compare the signatures of the two sides. The permutation x ↦ qx + y (for fixed y) is the composition of x ↦ qx and x ↦ x + y. The latter has signature 1, whence by Lemma 1,

  ε(α) = (q/p)^q = (q/p),

and similarly

  ε(β) = (p/q)^p = (p/q).

By Lemma 2,

  ε(λ) = (−1)^((p−1)(q−1)/4).

Thus

  (−1)^((p−1)(q−1)/4) = ε(λ) = ε(β)ε(α) = (p/q)(q/p),

which is the quadratic reciprocity law.

Reference
G. Zolotarev, Nouvelle démonstration de la loi de réciprocité de Legendre, Nouv. Ann. Math. (2), 11 (1872), 354–362.
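Lemma 1 can be checked numerically by counting inversions of the multiplication permutation. A minimal sketch (names ours):

```python
def signature(perm):
    """Signature of a permutation of range(len(perm)), via inversions."""
    inv = sum(1
              for i in range(len(perm))
              for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def zolotarev(m, p):
    """Signature of x -> m*x on Z_p; by Lemma 1 this equals (m/p)."""
    return signature([m * x % p for x in range(p)])

def euler_symbol(m, p):
    """Legendre symbol via Euler's criterion, for comparison."""
    r = pow(m, (p - 1) // 2, p)
    return -1 if r == p - 1 else r
```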
Version: 7 Owner: mathcam Author(s): Larry Hammick

86.4

cubic reciprocity law

In a ring ℤ/nℤ, a cubic residue is just a value of the function x³ for some invertible element x of the ring. Cubic residues display a reciprocity phenomenon similar to that seen with quadratic residues. But we need some preparation in order to state the cubic reciprocity law.

ω will denote (1 + i√3)/2, which is one of the complex cube roots of −1. K will denote the ring K = ℤ[ω]. The elements of K are the complex numbers a + bω where a and b are integers. We define the norm N : K → ℤ by

  N(a + bω) = a² + ab + b²,

or equivalently

  N(z) = z·z̄.

Whereas ℤ has only two units (meaning invertible elements), namely ±1, K has six, namely all the sixth roots of 1:

  ±1, ±ω, ±ω²,

and we know ω² = ω − 1. Two nonzero elements μ and ν of K are said to be associates if μ = νε for some unit ε. This is an equivalence relation, and any nonzero element has six associates.

K is a principal ring, hence has unique factorization. Let us call λ ∈ K irreducible if the condition λ = μν implies that μ or ν, but not both, is a unit. It turns out that the irreducible elements of K are (up to multiplication by units):

the number 1 + ω, which has norm 3 (we will denote it by θ);

positive real integers q ≡ 2 (mod 3) which are prime in ℤ; such integers are called rational primes in K;

complex numbers q = a + bω where N(q) is a prime in ℤ and N(q) ≡ 1 (mod 3).

For example, 3 + 2ω is a prime in K because its norm, 19, is prime in ℤ and is 1 mod 3; but 19 is not a prime in K.

Now we need some convention whereby at most one of any six associates is called a prime. By convention, the following numbers are nominated:

the number θ;

rational primes (rather than their negative or complex associates);

complex numbers q = a + bω where N(q) ≡ 1 (mod 3) is prime in ℤ and

  a ≡ 2 (mod 3),
  b ≡ 0 (mod 3).

One can verify that this selection exists and is unambiguous.

Next, we seek a three-valued function analogous to the two-valued quadratic residue character x ↦ (x/p). Let π be a prime in K, with π ≠ θ. If α is any element of K such that π ∤ α, then

  α^(N(π)−1) ≡ 1  (mod π).

Since N(π) − 1 is a multiple of 3, we can define a function

  χ_π : K → {0, 1, ω, ω²}

by

  χ_π(α) ≡ α^((N(π)−1)/3) (mod π)  if π ∤ α,
  χ_π(α) = 0  if π | α.

χ_π is a character, called the cubic residue character mod π. We have χ_π(α) = 1 if and only if α is a nonzero cube mod π. (Compare Euler's criterion.)

At last we can state this famous result of Eisenstein and Jacobi:

Theorem (Cubic Reciprocity Law): If π and φ are any two distinct primes in K, neither of them θ, then

  χ_π(φ) = χ_φ(π).

The quadratic reciprocity law has two supplements, which describe (−1/p) and (2/p). Likewise the cubic law has this supplement, due to Eisenstein:

Theorem: For any prime π in K, other than θ,

  χ_π(θ) = ω^(2m),

where

  m = (q + 1)/3  if π = q is a rational prime,
  m = (a + 1)/3  if π = a + bω is a complex prime.

Remarks: Some writers refer to our irreducible elements as primes in K; what we have called primes, they call primary primes.

The quadratic reciprocity law would take a simpler form if we were to make a different convention on what is a prime in ℤ, a convention similar to the one in K: a prime in ℤ is either 2 or an irreducible element x of ℤ such that x ≡ 1 (mod 4). The primes would then be 2, −3, 5, −7, −11, 13, … and the QRL would say simply

  (q/p)(p/q) = 1

for any two distinct odd primes p and q.
Version: 1 Owner: mathcam Author(s): Larry Hammick

86.5

proof of Euler's criterion

(All congruences are modulo p for the proof; omitted for clarity.)

Let

  x = a^((p−1)/2).

Then x² ≡ 1 by Fermat's little theorem. Thus:

  x ≡ ±1.

Now consider the two possibilities:

If a is a quadratic residue then by definition, a ≡ b² for some b. Hence:

  x ≡ a^((p−1)/2) ≡ b^(p−1) ≡ 1.

It remains to show that a^((p−1)/2) ≡ −1 if a is a quadratic non-residue. We can proceed in two ways:

Proof (a): Partition the set {1, …, p − 1} into pairs {c, d} such that cd ≡ a. Then c and d must always be distinct, since a is a non-residue. Hence, the product of the union of the partitions is:

  (p − 1)! ≡ a^((p−1)/2) ≡ −1,

and the result follows by Wilson's theorem.

Proof (b): The equation:

  z^((p−1)/2) ≡ 1

has at most (p − 1)/2 roots. But we already know of (p − 1)/2 distinct roots of the above equation, these being the quadratic residues modulo p. So a can't be a root, yet a^(p−1) ≡ 1. Thus we must have:

  a^((p−1)/2) ≡ −1.

QED.
Version: 3 Owner: liyang Author(s): liyang

86.6

proof of quadratic reciprocity rule

The quadratic reciprocity law is:

Theorem: (Gauss) Let p and q be distinct odd primes, and write p = 2a + 1 and q = 2b + 1. Then

  (p/q)(q/p) = (−1)^(ab).

((v/w) is the Legendre symbol.)

Proof: Let R be the subset [−a, a] × [−b, b] of ℤ × ℤ. Let S be the interval

  [−(pq − 1)/2, (pq − 1)/2]

of ℤ. By the Chinese remainder theorem, there exists a unique bijection f : S → R such that, for any s ∈ S, if we write f(s) = (x, y), then x ≡ s (mod p) and y ≡ s (mod q).

Let P be the subset of R consisting of the values of f on [1, (pq − 1)/2]. P contains, say, u elements of the form (x, 0) such that x < 0, and v elements of the form (0, y) with y < 0. Intending to apply Gauss's lemma, we seek some kind of comparison between u and v.

We define three subsets of P by

  R₀ = {(x, y) ∈ P | x > 0, y > 0}
  R₁ = {(x, y) ∈ P | x < 0, y ≥ 0}
  R₂ = {(x, y) ∈ P | x ≥ 0, y < 0}

and we let N_i be the cardinality of R_i for each i.

P has ab + b elements in the region y > 0, namely f(m) for all m of the form k + lq with 1 ≤ k ≤ b and 0 ≤ l ≤ a. Thus

  N₀ + N₁ = ab + b − (b − v) + u,

i.e.

  N₀ + N₁ = ab + u + v.   (86.6.1)

Swapping p and q, we have likewise

  N₀ + N₂ = ab + u + v.   (86.6.2)

Furthermore, for any s ∈ S, if f(s) = (x, y) then f(−s) = (−x, −y). It follows that for any (x, y) ∈ R other than (0, 0), either (x, y) or (−x, −y) is in P, but not both. Therefore

  N₁ + N₂ = ab + u + v.   (86.6.3)

Adding (1), (2), and (3) gives us

  0 ≡ ab + u + v  (mod 2),

so

  (−1)^(ab) = (−1)^u (−1)^v,

which, in view of Gauss's lemma, is the desired conclusion.

For a bibliography of the more than 200 known proofs of the QRL, see Lemmermeyer.
Version: 9 Owner: mathcam Author(s): Larry Hammick

86.7

quadratic character of 2

For any odd prime p, Gauss lemma quickly yields


 
2
= 1 if p 1 (mod 8)
p
 
2
= 1 if p 3 (mod 8)
p

(86.7.1)
(86.7.2)

But there is another way, which goes back to Euler, and is worth seeing, inasmuch as it is
the prototype of certain more general arguments about character sums.
Let be a primitive eighth root of unity in an algebraic closure of Z/pZ, and write =
+ 1 . We have 4 = 1, whence 2 + 2 = 0, whence
2 = 2 .
By the binomial formula, we have
p = p + p .
495

If p 1 (mod 8), this implies p = . If p 3 (mod 8), we get instead p = 5 + 5 =


1 = . In both cases, we get p1 = p2 , proving (1) and (2).

A variation of the argument, closer to Eulers, goes as follows. Write


= exp(2i/8)
= + 1

Both are algebraic integers. Arguing much as above, we end up with


 
2
p1

(mod p)
p
which is enough.
Version: 2 Owner: mathcam Author(s): Larry Hammick

86.8

quadratic reciprocity for polynomials

Let F be a finite field of characteristic p, and let f and g be distinct monic irreducible (non-constant) polynomials in the polynomial ring F[X]. Define the Legendre symbol (f/g) by

  (f/g) := 1 if f is a square in the quotient ring F[X]/(g),
  (f/g) := −1 otherwise.

The quadratic reciprocity theorem for polynomials over a finite field states that

  (f/g)(g/f) = (−1)^(((p−1)/2)·deg(f)·deg(g)).

REFERENCES
1. Feng, Ke Qin and Ying, Linsheng. An elementary proof of the law of quadratic reciprocity in F_q(T). Sichuan Daxue Xuebao 26 (1989), Special Issue, 36–40.
2. Merrill, Kathy D. and Walling, Lynne H. On quadratic reciprocity over function fields. Pacific J. Math. 173 (1996), no. 1, 147–150.

Version: 4 Owner: djao Author(s): djao


86.9

quadratic reciprocity rule

The quadratic reciprocity rule states that

  (p/q)(q/p) = (−1)^((p−1)(q−1)/4),

where (·/·) is the Legendre symbol, p and q are odd and prime, and at least one of p or q is positive.

Note that the Legendre symbol (p/q) may also appear as (p | q).
Version: 7 Owner: akrowne Author(s): akrowne

86.10

quadratic residue

Let a, n be relatively prime integers. If there exists an integer x that satisfies

  x² ≡ a  (mod n),

then a is said to be a quadratic residue of n. Otherwise, a is called a quadratic non-residue of n.


Version: 3 Owner: mathcam Author(s): mathcam, vampyr


Chapter 87
11A25 Arithmetic functions; related
numbers; inversion formulas
87.1

Dirichlet character

A Dirichlet character mod m is a group homomorphism χ from (ℤ/mℤ)^× to ℂ^×. The function χ̃ : ℤ → ℂ given by

  χ̃(n) = χ(n mod m)  if gcd(n, m) = 1,
  χ̃(n) = 0  if gcd(n, m) ≠ 1

is also referred to as a Dirichlet character. The Dirichlet characters mod m form a group if one defines χχ′ to be the function which takes a ∈ (ℤ/mℤ)^× to χ(a)χ′(a). It turns out that this resulting group is isomorphic to (ℤ/mℤ)^×. The trivial character is given by χ(a) = 1 for all a ∈ (ℤ/mℤ)^×, and it acts as the identity element for the group. A character χ is said to be primitive if it does not arise as the composite

  (ℤ/mℤ)^× → (ℤ/m′ℤ)^× → ℂ^×

for any proper divisor m′ | m, where the first map is the natural mapping and the second map is a character mod m′. If χ is non-primitive, the gcd of all such m′ is called the conductor of χ.
Version: 3 Owner: sucrose Author(s): sucrose

87.2

Liouville function

The Liouville function λ is defined by λ(1) = 1 and λ(n) = (−1)^(k₁+k₂+⋯+k_r), if the prime factorization of n > 1 is n = p₁^(k₁) p₂^(k₂) ⋯ p_r^(k_r). This function is multiplicative and satisfies the identity

  Σ_{d|n} λ(d) = 1 if n = m² for some integer m,
  Σ_{d|n} λ(d) = 0 otherwise.
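Both the definition and the divisor-sum identity are easy to check numerically; a minimal sketch (names ours):

```python
def liouville(n):
    """lambda(n) = (-1)^Omega(n), Omega counting prime factors with
    multiplicity; computed by trial division."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

def liouville_divisor_sum(n):
    """Sum of lambda(d) over the divisors d of n; 1 iff n is a square."""
    return sum(liouville(d) for d in range(1, n + 1) if n % d == 0)
```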

Version: 4 Owner: KimJ Author(s): KimJ

87.3

Mangoldt function

The Mangoldt function Λ is defined by

  Λ(n) = ln p, if n = p^k, where p is a prime and k ≥ 1 is a natural number,
  Λ(n) = 0, otherwise.

The Moebius inversion formula leads to the identity

  Λ(n) = Σ_{d|n} μ(n/d) ln d = −Σ_{d|n} μ(d) ln d.

Version: 5 Owner: KimJ Author(s): KimJ

Mertens' first theorem

For any real number x > 2 we have

  Σ_{p ≤ x} (ln p)/p = ln x + O(1),

the sum running over all prime integers p ≤ x.

Moreover, the term O(1) arising in this formula lies in the open interval (1 − ln 4, ln 4).
Version: 3 Owner: KimJ Author(s): KimJ

87.5

Moebius function

For a positive integer n, define μ by

  μ(n) = 1, if n = 1,
  μ(n) = 0, if p² | n for some prime p,
  μ(n) = (−1)^r, if n = p₁p₂ ⋯ p_r, where the p_i are distinct primes.

In other words, μ(n) = 0 if n is not a square-free integer, while μ(n) = (−1)^r if n is square-free with r prime factors. The function μ is a multiplicative function, and obeys the identity

  Σ_{d|n} μ(d) = 1 if n = 1,
  Σ_{d|n} μ(d) = 0 if n > 1,

where d runs through the positive divisors of n.
Version: 4 Owner: KimJ Author(s): KimJ

87.6

Moebius inversion

The Möbius function and inversion formula

For any integer n ≥ 1, let μ(n) be 0 if n is divisible by the square of a prime number, and if not, let μ(n) = (−1)^(k(n)), where k(n) is the number of primes which divide n. The resulting function μ : ℕ → {−1, 0, 1} is called the Möbius function, or just the Mobius function.

Proposition 1: μ is the unique mapping ℕ → ℤ such that

  μ(1) = 1   (87.6.1)
  Σ_{d|n} μ(d) = 0 for all n > 1.   (87.6.2)

Proof: By induction, there can only be one function with these properties. μ clearly satisfies (1), so take some n > 1. Let p be some prime factor of n, and let m be the product of all the distinct prime factors of n. Then

  Σ_{d|n} μ(d) = Σ_{d|m} μ(d)
             = Σ_{d|m, p∤d} μ(d) + Σ_{d|m, p|d} μ(d)
             = Σ_{d|m/p} μ(d) − Σ_{d|m/p} μ(d)
             = 0.

Proposition 2: Let f and g be two mappings of ℕ into some given additive group. The conditions

  f(n) = Σ_{d|n} g(d) for all n ∈ ℕ   (87.6.3)
  g(n) = Σ_{d|n} μ(d) f(n/d) for all n ∈ ℕ   (87.6.4)

are equivalent.

Proof: Fix some n ∈ ℕ. Assuming (3), we have

  Σ_{d|n} μ(d) f(n/d) = Σ_{d|n} μ(d) Σ_{e|n/d} g(e)
                     = Σ_{k|n} Σ_{d|k} μ(d) g(n/k)
                     = Σ_{k|n} g(n/k) Σ_{d|k} μ(d)
                     = g(n)  by Proposition 1.

Conversely, assuming (4), we get

  Σ_{d|n} g(d) = Σ_{d|n} Σ_{e|d} μ(e) f(d/e)
             = Σ_{k|n} f(n/k) Σ_{e|k} μ(e)
             = f(n)  by Proposition 1,

as claimed.

Definitions: In the notation of Proposition 2, f is called the Möbius transform of g, and formula (4) is called the Möbius inversion formula.

Möbius–Rota inversion

G.-C. Rota has described a generalization of the Möbius formalism. In it, the set ℕ, ordered by the relation x|y between elements x and y, is replaced by a more general ordered set, and μ is replaced by a function of two variables.

Let (S, ≤) be a locally finite ordered set, i.e. an ordered set such that {z ∈ S | x ≤ z ≤ y} is a finite set for all x, y ∈ S. Let A be the set of functions α : S × S → ℤ such that

  α(x, x) = 1 for all x ∈ S   (87.6.5)
  α(x, y) ≠ 0 implies x ≤ y.   (87.6.6)

A becomes a monoid if we define the product of any two of its elements, say α and β, by

  (αβ)(x, y) = Σ_{t∈S} α(x, t) β(t, y).

The sum makes sense because α(x, t)β(t, y) is nonzero for only finitely many values of t. (Clearly this definition is akin to the definition of the product of two square matrices.)

Consider the element ζ of A defined simply by

  ζ(x, y) = 1 if x ≤ y,
  ζ(x, y) = 0 otherwise.

The function ζ, regarded as a matrix over ℤ, has an inverse matrix, say μ. That means

  Σ_{t∈S} μ(x, t) ζ(t, y) = 1 if x = y,
  Σ_{t∈S} μ(x, t) ζ(t, y) = 0 otherwise.

Thus for any f, g ∈ A, the equations

  f = ζg   (87.6.7)
  g = μf   (87.6.8)

are equivalent.

Now let's sketch out how the traditional Möbius inversion is a special case of Rota's notion. Let S be the set ℕ, ordered by the relation x|y between elements x and y. In this case, μ is essentially a function of only one variable:

Proposition 3: With the above notation, μ(x, y) = μ(y/x) for all x, y ∈ ℕ such that x|y.

The proof is fairly straightforward, by induction on the number of elements of the interval {z ∈ S | x ≤ z ≤ y}.

Now let g be a function from ℕ to some additive group, and write ĝ(x, y) = g(y/x) for all pairs (x, y) such that x|y. The equivalence of (7) and (8), for ĝ and its transform, is just Proposition 2.

Example: Let E be a set, and let S be the set of all finite subsets of E, ordered by inclusion. The ordered set S is locally finite, and for any x, y ∈ S such that x ⊆ y, we have μ(x, y) = (−1)^|y−x|, where |z| denotes the cardinality of the finite set z.

A slightly more sophisticated example comes up in connection with the chromatic polynomial of a graph or matroid.
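The classical inversion formula of Proposition 2 can be verified numerically; a minimal sketch (names ours), using g(n) = n² as the test function:

```python
def mu(n):
    """Moebius function by trial factorization: 0 on a square factor,
    otherwise (-1)^(number of prime factors)."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:   # d^2 divides the original n
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def moebius_invert(f, n):
    """Given f(n) = sum_{d|n} g(d), recover g(n) = sum_{d|n} mu(d) f(n/d)."""
    return sum(mu(d) * f(n // d) for d in range(1, n + 1) if n % d == 0)
```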
Version: 9 Owner: mathcam Author(s): Larry Hammick, KimJ

87.7

arithmetic function

An arithmetic function is a function f : ℤ⁺ → ℂ from the positive integers to the complex numbers.

There are two noteworthy operations on the set of arithmetic functions:

If f and g are two arithmetic functions, the sum of f and g, denoted f + g, is given by

  (f + g)(n) = f(n) + g(n),

and the Dirichlet convolution of f and g, denoted by f∗g, is given by

  (f∗g)(n) = Σ_{d|n} f(d) g(n/d).

The set of arithmetic functions, equipped with these two binary operations, forms a commutative ring. The 0 of the ring is the function f such that f(n) = 0 for any positive integer n. The 1 of the ring is the function f with f(1) = 1 and f(n) = 0 for any n > 1.
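The two operations translate directly into code; a minimal sketch (names ours), which also recovers the number-of-divisors function as the convolution of the constant function 1 with itself:

```python
def dirichlet(f, g):
    """Dirichlet convolution: (f*g)(n) = sum over d|n of f(d) g(n/d)."""
    def h(n):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

one = lambda n: 1                   # the constant function 1(n) = 1
eps = lambda n: 1 if n == 1 else 0  # the 1 of the ring

num_divisors = dirichlet(one, one)  # 1 * 1 = d
```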
Version: 1 Owner: mathcam Author(s): mathcam

87.8

multiplicative function

In number theory, a multiplicative function is an arithmetic function f(n) of the positive integer n with the property that f(1) = 1 and, whenever a and b are coprime,

  f(ab) = f(a)f(b).

An arithmetic function f(n) is said to be completely multiplicative if f(1) = 1 and f(ab) = f(a)f(b) holds for all positive integers a and b, even when they are not coprime. In this case the function is a homomorphism of monoids and, because of the fundamental theorem of arithmetic, is completely determined by its restriction to the prime numbers. Every completely multiplicative function is multiplicative.

Outside number theory, the term multiplicative is usually used for all functions with the property f(ab) = f(a)f(b) for all arguments a and b. This article discusses number-theoretic multiplicative functions.

Examples. Examples of multiplicative functions include many functions of importance in number theory, such as:

φ(n): the Euler totient function, counting the positive integers coprime to n;

μ(n): the Moebius function, related to the number of prime factors of square-free numbers;

d(n): the number of positive divisors of n;

σ(n): the sum of all the positive divisors of n;

σ_k(n): the sum of the k-th powers of all the positive divisors of n (where k may be any complex number);

Id(n): the identity function, defined by Id(n) = n;

Id_k(n): the power functions, defined by Id_k(n) = n^k for any natural number (or even complex number) k;

1(n): the constant function, defined by 1(n) = 1;

ε(n): the function defined by

  ε(n) = Σ_{d|n} μ(d) = 1 if n = 1, and 0 if n > 1,

where d runs through the positive divisors of n.

An example of a non-multiplicative function is the arithmetic function r₂(n), the number of representations of n as a sum of squares of two integers, positive, negative, or zero, where in counting the number of ways, reversal of order is allowed. For example:

  1 = 1² + 0² = (−1)² + 0² = 0² + 1² = 0² + (−1)²,

and therefore r₂(1) = 4 ≠ 1. This shows that the function is not multiplicative. However, r₂(n)/4 is multiplicative.
Properties. A multiplicative function is completely determined by its values at the powers of prime numbers, a consequence of the fundamental theorem of arithmetic. Thus, if n is a product of powers of distinct prime numbers, say n = p^a q^b, then f(n) = f(p^a) f(q^b).

This property of multiplicative functions significantly reduces the need for computation, as in the following examples for n = 144 = 2⁴ · 3²:

  d(144) = σ₀(144) = σ₀(2⁴)σ₀(3²) = (1⁰ + 2⁰ + 4⁰ + 8⁰ + 16⁰)(1⁰ + 3⁰ + 9⁰) = 5 · 3 = 15,
  σ(144) = σ₁(144) = σ₁(2⁴)σ₁(3²) = (1¹ + 2¹ + 4¹ + 8¹ + 16¹)(1¹ + 3¹ + 9¹) = 31 · 13 = 403,
  σ₂(144) = σ₂(2⁴)σ₂(3²) = (1² + 2² + 4² + 8² + 16²)(1² + 3² + 9²) = 341 · 91 = 31031,
  σ₃(144) = σ₃(2⁴)σ₃(3²) = (1³ + 2³ + 4³ + 8³ + 16³)(1³ + 3³ + 9³) = 4681 · 757 = 3543517.

Similarly, we have:

  φ(144) = φ(2⁴)φ(3²) = 8 · 6 = 48.

Convolution. If f and g are two arithmetic functions, one defines a new arithmetic function f∗g, the convolution of f and g, by:

  (f∗g)(n) = Σ_{d|n} f(d) g(n/d),

where the sum extends over all positive divisors d of n. Some general properties of this operation include (here the argument n is omitted in all functions):

  If both f and g are multiplicative, then so is f∗g;
  f∗g = g∗f;
  (f∗g)∗h = f∗(g∗h);
  f∗ε = ε∗f = f.

This shows that the multiplicative functions with the convolution form a commutative monoid with the identity element ε. Relations among the multiplicative functions discussed above include:

  μ∗1 = ε (the Moebius inversion formula);
  1∗1 = d;
  Id∗1 = σ;
  Id_k∗1 = σ_k;
  φ∗1 = Id.
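The worked values for n = 144 above, and the multiplicativity of σ_k, can be checked directly; a minimal sketch (names ours):

```python
from math import gcd

def sigma(k, n):
    """sigma_k(n): the sum of the k-th powers of the divisors of n."""
    return sum(d ** k for d in range(1, n + 1) if n % d == 0)

def multiplicative_at(k, a, b):
    """Check sigma_k(ab) = sigma_k(a) sigma_k(b) for coprime a, b."""
    assert gcd(a, b) == 1
    return sigma(k, a * b) == sigma(k, a) * sigma(k, b)
```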
Version: 8 Owner: XJamRastafire Author(s): XJamRastafire

87.9

non-multiplicative function

In number theory, a non-multiplicative function is an arithmetic function f(n) of the positive integer n which is not multiplicative.

Examples. Some examples of non-multiplicative arithmetic functions:

r₂(n), the number of representations of n as a sum of squares of two integers, positive, negative or zero, where representations differing only in the order of the summands are counted separately.

c₄(n), the number of ways that n can be expressed as the sum of four squares of nonnegative integers, where we distinguish between different orders of the summands. For example:

  1 = 1² + 0² + 0² + 0² = 0² + 1² + 0² + 0² = 0² + 0² + 1² + 0² = 0² + 0² + 0² + 1²,

hence c₄(1) = 4 ≠ 1.

The partition function P(n), the number of representations of n as a sum of positive integers, disregarding order. For instance:

  P(2 · 5) = P(10) = 42 and P(2)P(5) = 2 · 7 = 14 ≠ 42.

The prime counting function π(n). Here we first have π(1) = 0 ≠ 1, and then for example:

  π(2 · 5) = π(10) = 4 and π(2)π(5) = 1 · 3 = 3 ≠ 4.

The Mangoldt function Λ(n): Λ(1) = ln 1 = 0 ≠ 1, and for example:

  Λ(2 · 5) = Λ(10) = 0 and Λ(2)Λ(5) = ln 2 · ln 5 ≠ 0.

We might think that for some n multiplicativity of Λ(n) would hold, as in:

  Λ(2 · 6) = Λ(12) = 0 and Λ(2)Λ(6) = ln 2 · 0 = 0,

but 2 and 6 are not coprime; writing 12 as a product of coprime factors, we have to write:

  Λ(2²)Λ(3) = ln 2 · ln 3 ≠ 0 = Λ(12).
Version: 11 Owner: XJamRastafire Author(s): XJamRastafire

506

87.10

totient

A totient is a sequence f : {1, 2, 3, …} → ℂ such that

  g∗f = h

for some two completely multiplicative sequences g and h, where ∗ denotes the convolution product (or Dirichlet product; see multiplicative function).

The term totient was introduced by Sylvester in the 1880s, but is seldom used nowadays except in two cases. The Euler totient φ satisfies

  φ∗ν₀ = ν₁,

where ν_k denotes the function n ↦ n^k (which is completely multiplicative). The more general Jordan totient J_k is defined by

  J_k∗ν₀ = ν_k.
Version: 2 Owner: mathcam Author(s): Larry Hammick

87.11

unit

Let R be a ring with multiplicative identity 1. We say that u ∈ R is a unit (or is unital) if u divides 1 (denoted u | 1). That is, there exists an r ∈ R such that 1 = ur = ru.
Version: 8 Owner: drini Author(s): drini


Chapter 88
11A41 Primes
88.1

Chebyshev functions

There are two different functions which are collectively known as the Chebyshev functions:

  θ(x) = Σ_{p ≤ x} log p,

where the notation used indicates the summation over all positive primes p less than or equal to x, and

  ψ(x) = Σ_{p ≤ x} k log p,

where the same summation notation is used and k denotes the unique integer such that p^k ≤ x but p^(k+1) > x. Heuristically, the first of these two functions measures the number of primes less than x and the second does the same, but weighting each prime in accordance with its logarithmic relationship to x.

Many innocuous results in number theory owe their proof to a relatively simple analysis of the asymptotics of one or both of these functions. For example, the fact that for any n, we have

  Π_{p ≤ n} p < 4^n

is equivalent to the statement that θ(x) < x log 4.

A somewhat less innocuous result is that the prime number theorem (i.e. that π(x) ∼ x/log x) is equivalent to the statement that θ(x) ∼ x, which in turn, is equivalent to the statement that ψ(x) ∼ x.
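Both functions are straightforward to compute for small x; a minimal sketch (names ours):

```python
from math import log

def primes_upto(x):
    """All primes p <= x, by trial division."""
    return [p for p in range(2, int(x) + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def theta(x):
    """First Chebyshev function: sum of log p over primes p <= x."""
    return sum(log(p) for p in primes_upto(x))

def psi(x):
    """Second Chebyshev function: each prime p <= x is weighted by
    k log p, where p^k <= x < p^(k+1)."""
    total = 0.0
    for p in primes_upto(x):
        k = 1
        while p ** (k + 1) <= x:
            k += 1
        total += k * log(p)
    return total
```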

REFERENCES
1. Ireland, Kenneth and Rosen, Michael. A Classical Introduction to Modern Number Theory.
Springer, 1998.
2. Nathanson, Melvyn B. Elementary Methods in Number Theory. Springer, 2000.

Version: 4 Owner: mathcam Author(s): mathcam

88.2

Euclid's proof of the infinitude of primes

If there were only finitely many primes, then there would be some largest prime p. However, p! + 1 is not divisible by any number 1 < n ≤ p, so p! + 1 cannot be factored using the primes we already know; but every integer greater than one is divisible by at least one prime, so there must be some prime greater than p by which p! + 1 is divisible.
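The argument is constructive enough to run; a minimal sketch (names ours) that, given a prime p, exhibits a prime larger than p:

```python
from math import factorial

def smallest_prime_factor(n):
    """Least prime factor of n > 1, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def prime_beyond(p):
    """A prime greater than p: any prime factor of p! + 1 works, since
    p! + 1 leaves remainder 1 on division by every 2 <= n <= p."""
    return smallest_prime_factor(factorial(p) + 1)
```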
Version: 2 Owner: mathwizard Author(s): mathwizard

88.3

Mangoldt summatory function

A number-theoretic function used in the study of prime numbers; specifically, it was used in the proof of the prime number theorem. It is defined thus:

  ψ(x) = Σ_{r ≤ x} Λ(r),

where Λ(x) is the Mangoldt function.

The Mangoldt summatory function is valid for all positive real x. Note that we do not have to worry that the inequality above is ambiguous, because Λ(x) is only non-zero for natural x. So no matter whether we take it to mean r is real, integer or natural, the result is the same because we just get a lot of zeros added to our answer.

The prime number theorem, which states:

  π(x) ∼ x/ln x,

where π(x) is the prime counting function, is equivalent to the statement that:

  ψ(x) ∼ x.

We can also define a smoothing function for the summatory function, defined as:

  ψ₁(x) = ∫₀ˣ ψ(t) dt,

and then the prime number theorem is also equivalent to:

  ψ₁(x) ∼ x²/2,

which turns out to be easier to work with than the original form.
Version: 4 Owner: kidburla2003 Author(s): kidburla2003

88.4

Mersenne numbers

Numbers of the form

  M_n = 2^n − 1, n ≥ 1,

are called Mersenne numbers after Father Marin Mersenne, a French monk who wanted to discover which such numbers are actually prime. Mersenne primes have a strong connection with perfect numbers.

The currently known Mersenne primes M_n occur for n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433, 1257787, 1398269, 2976221, 3021377, 6972593, and 13466917.

It is conjectured that the density of Mersenne primes with exponent p < x is of order

  (e^γ / log 2) · log log x,

where γ is Euler's constant.
Version: 5 Owner: KimJ Author(s): KimJ

88.5

Thues lemma

Let p be a prime number of the form 4k + 1. Then there are two unique integers a and b with 0 < a < b such that p = a² + b². Additionally, if a number n can be written as the sum of two squares in two different ways, then n is composite.


Version: 4 Owner: mathcam Author(s): mathcam, slash

88.6

composite number

A composite number is a natural number which is not prime and not equal to 1. That is, n is composite if n = ab, with a and b natural numbers both not equal to 1.

Examples.
1 is not composite (and also not prime), by definition.
2 is not composite, as it is prime.
15 is composite, since 15 = 3 · 5.
93555 is composite, since 93555 = 3⁵ · 5 · 7 · 11.
52223 is not composite, since it is prime.
Version: 1 Owner: akrowne Author(s): akrowne

88.7

prime

An integer p is prime if it has exactly two positive divisors. The first few positive prime
numbers are 2, 3, 5, 7, 11, . . . .
A prime number is often (but not always) required to be positive.
Version: 2 Owner: djao Author(s): djao

88.8    prime counting function

The prime counting function, denoted $\pi(x)$, is a non-multiplicative function of a positive real number x giving the number of primes not exceeding x. It usually takes a positive integer n for an argument. The first few values of $\pi(n)$ for n = 1, 2, 3, ... are 0, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, ... (Sloane's sequence A000720).


The asymptotic behavior $\pi(x) \sim x/\ln x$ is given by the prime number theorem. This function is closely related to Chebyshev's functions $\theta(x)$ and $\psi(x)$.
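The tabulated values of $\pi(n)$ can be reproduced with a short computation. The following Python sketch (the function name prime_pi is purely illustrative, not part of the entry) counts primes by trial division:

```python
def prime_pi(x):
    """Count the primes not exceeding x by trial division."""
    count = 0
    for n in range(2, int(x) + 1):
        # n is prime if no d in [2, sqrt(n)] divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

# First few values of pi(n) for n = 1, 2, 3, ...
values = [prime_pi(n) for n in range(1, 21)]
```

Running this reproduces the sequence quoted above, starting 0, 1, 2, 2, 3, 3, 4, ...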
Version: 8 Owner: XJamRastafire Author(s): XJamRastafire

88.9    prime difference function

The prime difference function is an arithmetic function of a positive integer n, denoted $d_n$, which gives the difference between the consecutive primes $p_n$ and $p_{n+1}$:
$$d_n = p_{n+1} - p_n.$$
For example:
$d_1 = p_2 - p_1 = 3 - 2 = 1$,
$d_{10} = p_{11} - p_{10} = 31 - 29 = 2$,
$d_{100} = p_{101} - p_{100} = 547 - 541 = 6$,
$d_{1000} = p_{1001} - p_{1000} = 7927 - 7919 = 8$,
$d_{10000} = p_{10001} - p_{10000} = 104743 - 104729 = 14$, and so forth.
The first few values of $d_n$ for n = 1, 2, 3, ... are 1, 2, 2, 4, 2, 4, 2, 4, 6, 2, 6, 4, 2, 4, 6, 6, 2, 6, 4, 2, ... (Sloane's sequence A001223).
Version: 2 Owner: XJamRastafire Author(s): XJamRastafire

88.10    prime number theorem

Define $\pi(x)$ as the number of primes less than or equal to x. The prime number theorem asserts that
$$\pi(x) \sim \frac{x}{\log x}$$
as $x \to \infty$, that is, $\pi(x)\big/\frac{x}{\log x}$ tends to 1 as x increases. Here $\log x$ is the natural logarithm.
There is a sharper statement that is also known as the prime number theorem:
$$\pi(x) = \operatorname{li} x + R(x),$$
where li is the logarithmic integral defined as
$$\operatorname{li} x = \int_2^x \frac{dt}{\log t} = \frac{x}{\log x} + \frac{1!\,x}{(\log x)^2} + \cdots + \frac{(k-1)!\,x}{(\log x)^k} + O\!\left(\frac{x}{(\log x)^{k+1}}\right)$$
and $R(x)$ is the error term whose behavior is still not fully known. From the work of Korobov and Vinogradov on zeroes of the Riemann zeta-function it is known that
$$R(x) = O\bigl(x \exp(-c(\varepsilon)(\log x)^{\varepsilon})\bigr)$$
for every $\varepsilon < \frac{3}{5}$. The unproven Riemann hypothesis is equivalent to the statement that $R(x) = O(x^{1/2} \log x)$.

There exist a number of proofs of the prime number theorem. The original proofs by Hadamard [4] and de la Vallée Poussin [7] called on analysis of the behavior of the Riemann zeta function $\zeta(s)$ near the line $\operatorname{Re} s = 1$ to deduce the estimates for $R(x)$. For a long time it was an open problem to find an elementary proof of the prime number theorem ("elementary" meaning not involving complex analysis). Finally Erdős and Selberg [3, 6] found such a proof. Nowadays there are some very short proofs of the prime number theorem (for example, see [5]).

REFERENCES
1. Tom M. Apostol. Introduction to Analytic Number Theory. Narosa Publishing House, second edition, 1990. Zbl 0335.10001.
2. Harold Davenport. Multiplicative Number Theory. Markham Pub. Co., 1967. Zbl 0159.06303.
3. Paul Erdős. On a new method in elementary number theory. Proc. Nat. Acad. Sci. U.S.A., 35:374–384, 1949. Zbl 0034.31403.
4. Jacques Hadamard. Sur la distribution des zéros de la fonction ζ(s) et ses conséquences arithmétiques. Bull. Soc. Math. France, 24:199–220. JFM 27.0154.01.
5. Donald J. Newman. Simple analytic proof of the prime number theorem. Amer. Math. Monthly, 87:693–696, 1980. Available online at JSTOR.
6. Atle Selberg. An elementary proof of the prime number theorem. Ann. Math. (2), 50:305–311, 1949. Zbl 0036.30604.
7. Charles de la Vallée Poussin. Recherches analytiques sur la théorie des nombres premiers. Ann. Soc. Sci. Bruxelles, 1897.

Version: 7 Owner: bbukh Author(s): bbukh, KimJ

88.11    prime number theorem result

Gauss discovered that $\pi(n)$ is approximately
$$\frac{n}{\ln n}$$
but is also approximated by the function
$$\operatorname{Li}(n) = \int_2^n \frac{1}{\ln x}\,dx.$$
Version: 4 Owner: vladm Author(s): vladm

88.12    proof of Thue's Lemma

Let p be a prime congruent to 1 mod 4. By Euler's criterion (or by Gauss' lemma), the congruence
$$x^2 \equiv -1 \pmod{p} \qquad (88.12.1)$$
has a solution. By Dirichlet's approximation theorem, there exist integers a and b such that
$$\left| \frac{x}{p}\,a - b \right| \le \frac{1}{[\sqrt{p}]+1} < \frac{1}{\sqrt{p}}, \qquad 1 \le a \le [\sqrt{p}] < \sqrt{p}. \qquad (88.12.2)$$
Multiplying through by p, (88.12.2) tells us
$$|ax - bp| < \sqrt{p}.$$
Write $u = |ax - bp|$. We get
$$u^2 + a^2 \equiv a^2 x^2 + a^2 \equiv 0 \pmod{p}$$
and
$$0 < u^2 + a^2 < 2p,$$
whence $u^2 + a^2 = p$, as desired.

To prove Thue's lemma in another way, we will imitate a part of the proof of Lagrange's four-square theorem. From (88.12.1), we know that the equation
$$x^2 + y^2 = mp \qquad (88.12.3)$$
has a solution (x, y, m) with, we may assume, $1 \le m < p$. It is enough to show that if m > 1, then there exists (u, v, n) such that $1 \le n < m$ and
$$u^2 + v^2 = np.$$
If m is even, then x and y are both even or both odd; therefore, in the identity
$$\left(\frac{x+y}{2}\right)^2 + \left(\frac{x-y}{2}\right)^2 = \frac{x^2+y^2}{2}$$
both summands are integers, and we can just take n = m/2 and conclude.

If m is odd, write $a \equiv x \pmod{m}$ and $b \equiv y \pmod{m}$ with $|a| < m/2$ and $|b| < m/2$. We get
$$a^2 + b^2 = nm$$
for some n < m. But consider the identity
$$(a^2 + b^2)(x^2 + y^2) = (ax + by)^2 + (ay - bx)^2.$$
On the left is $nm^2 p$, and on the right we see
$$ax + by \equiv x^2 + y^2 \equiv 0 \pmod{m},$$
$$ay - bx \equiv xy - yx \equiv 0 \pmod{m}.$$
Thus we can divide the equation
$$nm^2 p = (ax + by)^2 + (ay - bx)^2$$
through by $m^2$, getting an expression for np as a sum of two squares. The proof is complete.
Remark: The solutions of the congruence (88.12.1) are explicitly
$$x \equiv \pm\left(\frac{p-1}{2}\right)! \pmod{p}.$$
Version: 4 Owner: mathcam Author(s): Larry Hammick, slash

88.13    semiprime

A composite number which is the product of two (possibly equal) primes is called semiprime. Such numbers are sometimes also called 2-almost primes. For example:
1 is not a semiprime, because it is not a composite number or a prime,
2 is not a semiprime, as it is a prime,
4 is a semiprime, since $4 = 2 \cdot 2$,
8 is not a semiprime, since it is a product of three primes ($8 = 2 \cdot 2 \cdot 2$),
2003 is not a semiprime, as it is a prime,
2005 is a semiprime, since $2005 = 5 \cdot 401$,
2007 is not a semiprime, since it is a product of three primes ($2007 = 3 \cdot 3 \cdot 223$).

The first few semiprimes are 4, 6, 9, 10, 14, 15, 21, 22, 25, 26, 33, 34, 35, 38, 39, 46, 49, 51, 55, 57, 58, 62, ... (Sloane's sequence A001358). The Möbius function $\mu(n)$ for semiprimes can only be equal to 0 or 1. If we form an integer sequence of the values of $\mu(n)$ over the semiprimes we get a binary sequence: 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, ... (Sloane's sequence A072165).
All the squares of primes are also semiprimes. The first few squares of primes are then 4, 9, 25, 49, 121, 169, 289, 361, 529, 841, 961, 1369, 1681, 1849, 2209, 2809, 3481, 3721, 4489, 5041, ... (Sloane's sequence A001248). The Möbius function $\mu(n)$ for the squares of primes is always equal to 0, as it is equal to 0 for every number divisible by a square.
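Semiprimality is easy to test by counting prime factors with multiplicity. A minimal Python sketch (the function name is illustrative, not part of the entry):

```python
def is_semiprime(n):
    """True if n is a product of exactly two (possibly equal) primes."""
    count = 0          # number of prime factors, counted with multiplicity
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:          # leftover factor is prime
        count += 1
    return count == 2

semiprimes = [n for n in range(2, 40) if is_semiprime(n)]
```

This reproduces the opening terms 4, 6, 9, 10, 14, 15, ... of Sloane's sequence A001358.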
Version: 5 Owner: drini Author(s): drini, XJamRastafire

88.14    sieve of Eratosthenes

The sieve of Eratosthenes is a simple algorithm for generating the prime numbers between 1 and some arbitrary integer N.
Let p = 2, which is of course known to be prime. Mark all multiples of p greater than p (4, 6, 8, ...) as composite. Now let p be the smallest number greater than the previous p not marked as composite (in this case, 3); it must be the next prime. Again, mark all multiples of p greater than p as composite. Continue this process while $p \le \sqrt{N}$. When done, all numbers up to N that have not been marked as composite are prime.
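The procedure above can be sketched in Python as follows (a minimal illustration; the function name is ours):

```python
def sieve(N):
    """Return the primes up to N using the sieve of Eratosthenes."""
    is_composite = [False] * (N + 1)
    p = 2
    while p * p <= N:                      # continue while p <= sqrt(N)
        if not is_composite[p]:            # p is the next prime
            # mark multiples of p (starting at p*p suffices) as composite
            for multiple in range(p * p, N + 1, p):
                is_composite[multiple] = True
        p += 1
    return [n for n in range(2, N + 1) if not is_composite[n]]
```

Starting the marking at p*p rather than 2p is a standard refinement: smaller multiples of p were already marked by smaller primes.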
For many years, the sieve of Eratosthenes was the fastest known algorithm for generating
primes. Today, there are faster methods, such as a quadratic sieve.
Version: 2 Owner: vampyr Author(s): vampyr

88.15    test for primality of Mersenne numbers

Suppose p is an odd prime, and define a sequence $L_n$ recursively as
$$L_0 = 4, \qquad L_{n+1} = (L_n^2 - 2) \bmod (2^p - 1).$$
The number $2^p - 1$ is prime if and only if $L_{p-2} = 0$.
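The recursion translates directly into code. A minimal Python sketch of this test (commonly known as the Lucas-Lehmer test; the function name is ours):

```python
def lucas_lehmer(p):
    """For an odd prime p, decide whether 2**p - 1 is prime
    via the recursion L_0 = 4, L_{n+1} = (L_n**2 - 2) mod (2**p - 1)."""
    M = 2 ** p - 1
    L = 4
    for _ in range(p - 2):       # compute L_{p-2}
        L = (L * L - 2) % M
    return L == 0
```

For example, p = 11 is prime but 2^11 - 1 = 2047 = 23 * 89 is not, and the test detects this.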

REFERENCES
1. Donald E. Knuth. The Art of Computer Programming, volume 2. Addison-Wesley, 1969.

Version: 3 Owner: bbukh Author(s): bbukh



Chapter 89
11A51 Factorization; primality
89.1    Fermat Numbers

The n-th Fermat number is defined as
$$F_n = 2^{2^n} + 1.$$
Fermat incorrectly conjectured that all these numbers were primes, although he had no proof. The first 5 Fermat numbers, 3, 5, 17, 257, 65537 (corresponding to n = 0, 1, 2, 3, 4), are all primes (so-called Fermat primes). Euler was the first to point out the falsity of Fermat's conjecture by proving that 641 is a divisor of $F_5$. (In fact, $F_5 = 641 \cdot 6700417$.) Moreover, no other Fermat number is known to be prime for n > 4, so it is now conjectured that those are all the prime Fermat numbers. It is also unknown whether there are infinitely many composite Fermat numbers or not.
One of the famous achievements of Gauss was to prove that the regular polygon of m sides
can be constructed with ruler and compass if and only if m can be written as
$$m = 2^k F_{r_1} F_{r_2} \cdots F_{r_t}$$
where $k \ge 0$ and the other factors are distinct primes of the form $F_n$.
Version: 9 Owner: drini Author(s): drini

89.2    Fermat compositeness test

The Fermat compositeness test states that for any odd integer n > 0, if there exists an integer b between 1 and n − 1 such that $b^{n-1} \not\equiv 1 \pmod{n}$, then n is composite.

If $b^{n-1} \not\equiv 1 \pmod{n}$, then b is a witness to n's compositeness. If $b^{n-1} \equiv 1 \pmod{n}$, then n is a pseudoprime to base b.
The Fermat compositeness test is a fast way to prove compositeness of most numbers, but unfortunately there are composite numbers that are pseudoprime to every base. An example of such a number is 561. These numbers are called Carmichael numbers (see EIS sequence A002997 for a list of the first few Carmichael numbers).
Proof of the Fermat compositeness test. Suppose n is prime. Then the Euler phi-function of n is given by $\phi(n) = n - 1$, and by Fermat's little theorem $b^{n-1} \equiv 1 \pmod{n}$ for all integers b not divisible by n. We can conclude that if this is not the case, then n is not prime, so n must be composite.
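The test amounts to a single modular exponentiation, for which Python's built-in three-argument pow is suited. A small sketch (function name ours), also exhibiting the Carmichael number 561:

```python
def fermat_witness(b, n):
    """True if b witnesses the compositeness of odd n,
    i.e. b**(n-1) mod n != 1."""
    return pow(b, n - 1, n) != 1

# 221 = 13 * 17 is composite, and b = 2 is a witness.
# 561 = 3 * 11 * 17 is a Carmichael number: no base b coprime to 561
# (such as 2 or 5) witnesses its compositeness.
```

Thus fermat_witness(2, 221) is True, while fermat_witness(2, 561) and fermat_witness(5, 561) are both False even though 561 is composite.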

Version: 8 Owner: bbukh Author(s): bbukh, basseykay

89.3    Zsigmondy's theorem

For all positive integers q > 1 and n > 1, there exists a prime p which divides $q^n - 1$ but does not divide $q^m - 1$ for 0 < m < n, except when $q = 2^k - 1$ and n = 2, or q = 2 and n = 6.
Version: 2 Owner: lieven Author(s): lieven

89.4    divisibility

Given integers a and b, we say a divides b if and only if there is some $q \in \mathbb{Z}$ such that
b = qa.
There are many equivalent ways to notate this relationship:
There are many equivalent ways to notate this relationship:
a|b (read a divides b)
b is divisible by a
a is a factor of b
a is a divisor of b
The notion of divisibility can apply to other rings (e.g., polynomials).
Version: 3 Owner: vampyr Author(s): vampyr


89.5    division algorithm for integers

Given any two integers a, b where b > 0, there exists a unique pair of integers q, r such that a = qb + r and $0 \le r < b$. q is called the quotient of a and b, and r is the remainder.
The division algorithm is not an algorithm at all but rather a theorem. Its name probably
derives from the fact that it was first proved by showing that an algorithm to calculate the
quotient of two integers yields this result.
There are similar forms of the division algorithm that apply to other rings (for example,
polynomials).
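The theorem can be illustrated with a short computation; note that Python's floor division already produces the remainder in [0, b) when b > 0, even for negative a (a sketch, with an illustrative function name):

```python
def divide(a, b):
    """Return the unique (q, r) with a == q*b + r and 0 <= r < b, for b > 0."""
    q = a // b        # floor division rounds toward minus infinity,
    r = a - q * b     # so r always lands in [0, b) when b > 0
    return q, r
```

For instance, divide(-7, 3) gives q = -3 and r = 2, since -7 = (-3)*3 + 2 with 0 <= 2 < 3.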
Version: 3 Owner: vampyr Author(s): vampyr

89.6    proof of division algorithm for integers

Let a, b be integers (b > 0). We want to express a = bq + r for some integers q, r with $0 \le r < b$, and to show that such an expression is unique.
Consider the numbers
$$\ldots,\; a - 3b,\; a - 2b,\; a - b,\; a,\; a + b,\; a + 2b,\; a + 3b,\; \ldots$$
From all these numbers, there has to be a smallest non-negative one. Let it be r. Since $r = a - qb$ for some q,¹ we have a = bq + r. And if $r \ge b$, then r wasn't the smallest non-negative number on the list, since the previous one (equal to r − b) would also be non-negative. Thus $0 \le r < b$.
So far, we have proved that we can express a as bq + r for some pair of integers q, r such that $0 \le r < b$. Now we will prove the uniqueness of such an expression.
Let q′ and r′ be another pair of integers holding a = bq′ + r′ and $0 \le r' < b$. Suppose $r \ne r'$. Since r′ = a − bq′ is a number on the list, it cannot be smaller than r, and thus r < r′. Notice that
$$0 < r' - r = (a - bq') - (a - bq) = b(q - q'),$$
so b divides r′ − r, which is impossible since 0 < r′ − r < b. We conclude that r′ = r. Finally, if r = r′ then a − bq = a − bq′ and therefore q = q′. This concludes the proof of the uniqueness part.
Version: 1 Owner: drini Author(s): drini
¹ For example, if r = a + 5b then q = −5.


89.7    square-free number

A square-free number is a natural number that contains no powers greater than 1 in its prime factorization. In other words, if x is our number, and
$$x = \prod_{i=1}^{r} p_i^{a_i}$$
is the prime factorization of x into r distinct primes, then $a_i \ge 2$ is always false for square-free x.
The name derives from the fact that if any $a_i$ were to be greater than or equal to two, we could be sure that at least one square divides x (namely, $p_i^2$).
The asymptotic density of square-free numbers is $\frac{6}{\pi^2}$, which can be proved by application of a square-free variation of the sieve of Eratosthenes as follows:
$$A(n) = \sum_{k \le n} [k \text{ is squarefree}] = \sum_{k \le n} \sum_{d^2 \mid k} \mu(d) = \sum_{d \le \sqrt{n}} \mu(d) \sum_{\substack{k \le n \\ d^2 \mid k}} 1 = \sum_{d \le \sqrt{n}} \mu(d) \left\lfloor \frac{n}{d^2} \right\rfloor$$
$$= n \sum_{d \le \sqrt{n}} \frac{\mu(d)}{d^2} + O(\sqrt{n}) = n \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} + O(\sqrt{n}) = n\,\frac{1}{\zeta(2)} + O(\sqrt{n}) = n\,\frac{6}{\pi^2} + O(\sqrt{n}).$$
It was shown that the Riemann hypothesis implies the error term $O(n^{7/22+\varepsilon})$ in the above.
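The density $6/\pi^2 \approx 0.6079$ can be observed numerically; the sketch below (function name ours, purely illustrative) counts the square-free integers up to 100:

```python
def is_squarefree(n):
    """True if no prime square divides n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

# Count square-free integers up to 100; 6/pi^2 * 100 is about 60.8
count = sum(1 for k in range(1, 101) if is_squarefree(k))
```

The exact count up to 100 is 61, close to the predicted 60.8.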
Version: 6 Owner: akrowne Author(s): bbukh, akrowne

89.8    squarefull number

A natural number n is called squarefull (or powerful) if for every prime $p \mid n$ we have $p^2 \mid n$. In 1978 Erdős conjectured that we cannot have three consecutive squarefull natural numbers.
If we assume the ABC conjecture, there are only finitely many such consecutive triples.
Version: 4 Owner: KimJ Author(s): KimJ

89.9    the prime power dividing a factorial

In 1808, Legendre showed that the exact power of a prime p dividing n! is
$$\sum_{i=1}^{K} \left\lfloor \frac{n}{p^i} \right\rfloor,$$
where K is the largest power of p with $p^K \le n$.

If p > n, then p does not divide n!, its power is 0, and the sum above is empty. So let the prime $p \le n$. For each $1 \le i \le K$, there are $\left\lfloor \frac{n}{p^i} \right\rfloor - \left\lfloor \frac{n}{p^{i+1}} \right\rfloor$ numbers between 1 and n with i being the greatest power of p dividing each. So the power of p dividing n! is
$$\sum_{i=1}^{K} i \left( \left\lfloor \frac{n}{p^i} \right\rfloor - \left\lfloor \frac{n}{p^{i+1}} \right\rfloor \right).$$
But each $\left\lfloor \frac{n}{p^i} \right\rfloor$ with $i \ge 2$ appears in the sum with factors i and −(i − 1), so the above sum equals
$$\sum_{i=1}^{K} \left\lfloor \frac{n}{p^i} \right\rfloor.$$

Corollary 1.
$$\sum_{k=1}^{K} \left\lfloor \frac{n}{p^k} \right\rfloor = \frac{n - \sigma_p(n)}{p-1},$$
where $\sigma_p$ denotes the sum-of-digits function in base p.

If n < p, then $\sigma_p(n) = n$ and $\left\lfloor \frac{n}{p} \right\rfloor$ is $0 = \frac{n - \sigma_p(n)}{p-1}$. So we assume $p \le n$.

Let $n_K n_{K-1} \cdots n_0$ be the p-adic representation of n. Then
$$\frac{n - \sigma_p(n)}{p-1} = \frac{\sum_{k=1}^{K} n_k (p^k - 1)}{p-1} = \sum_{k=1}^{K} n_k \sum_{j=0}^{k-1} p^j$$
$$= (n_1 + n_2 p + \dots + n_K p^{K-1}) + (n_2 + n_3 p + \dots + n_K p^{K-2}) + \dots + n_K = \sum_{k=1}^{K} \left\lfloor \frac{n}{p^k} \right\rfloor.$$
Version: 7 Owner: Thomas Heye Author(s): Thomas Heye


Chapter 90
11A55 Continued fractions
90.1    Stern-Brocot tree

If we start with the irreducible fractions representing zero and infinity,
$$\frac{0}{1}, \quad \frac{1}{0},$$
and then between adjacent fractions $\frac{m}{n}$ and $\frac{m'}{n'}$ we insert the fraction $\frac{m+m'}{n+n'}$, then we obtain
$$\frac{0}{1}, \quad \frac{1}{1}, \quad \frac{1}{0}.$$
Repeating the process, we get
$$\frac{0}{1}, \quad \frac{1}{2}, \quad \frac{1}{1}, \quad \frac{2}{1}, \quad \frac{1}{0},$$
and then
$$\frac{0}{1}, \quad \frac{1}{3}, \quad \frac{1}{2}, \quad \frac{2}{3}, \quad \frac{1}{1}, \quad \frac{3}{2}, \quad \frac{2}{1}, \quad \frac{3}{1}, \quad \frac{1}{0},$$
and so forth. It can be proven that every irreducible fraction appears at some iteration [1]. The process can be represented graphically by means of the so-called Stern-Brocot tree, named after its discoverers, Moritz Stern and Achille Brocot.

[Figure: the first levels of the Stern-Brocot tree, descending from 0/1 and 1/0 through 1/1, then 1/2 and 2/1, and so on down to fractions such as 1/4, 2/5, 3/5, 5/3, 5/2, and 4/1.]
If we specify the position of a fraction in the tree as a path consisting of L(eft) and R(ight) moves along the tree starting from the top (fraction $\frac{1}{1}$), and also define matrices
$$L = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},$$
then the product of the matrices corresponding to the path is the matrix $\begin{pmatrix} n & n' \\ m & m' \end{pmatrix}$ whose entries are the numerators and denominators of the parent fractions. For example, the path leading to the fraction $\frac{3}{5}$ is LRL. The corresponding matrix product is
$$LRL = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 1 & 2 \end{pmatrix},$$
and the parents of $\frac{3}{5}$ are $\frac{1}{2}$ and $\frac{2}{3}$.
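The matrix bookkeeping above can be sketched in a few lines of Python (function name ours): we maintain the product matrix entries while reading the path, and return the mediant of the two parent fractions, which is the fraction at that node.

```python
def stern_brocot(path):
    """Given a path of 'L'/'R' moves from the top of the Stern-Brocot tree,
    return (numerator, denominator) of the fraction at that node."""
    # product matrix [[n, n2], [m, m2]], starting from the identity
    n, n2, m, m2 = 1, 0, 0, 1
    for move in path:
        if move == 'L':       # right-multiply by L = [[1,1],[0,1]]
            n2 += n
            m2 += m
        else:                 # right-multiply by R = [[1,0],[1,1]]
            n += n2
            m += m2
    # the node's fraction is the mediant (m + m2)/(n + n2) of the parents
    return m + m2, n + n2
```

For the example in the text, stern_brocot("LRL") returns (3, 5), and the empty path gives the root 1/1.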

REFERENCES
1. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. AddisonWesley, 1998. Zbl 0836.00001.

Version: 1 Owner: bbukh Author(s): bbukh

90.2    continued fraction

Let $(a_n)_{n \ge 1}$ be a sequence of positive real numbers, and let $a_0$ be any real number. Consider the sequence
$$c_1 = a_0 + \frac{1}{a_1}$$
$$c_2 = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}}$$
$$c_3 = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3}}}$$
$$c_4 = \dots$$
The limit c of this sequence, if it exists, is called the value or limit of the infinite continued fraction with convergents $(c_n)$, and is denoted by
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \dots}}}$$
or by
$$a_0 + \frac{1}{a_1+}\,\frac{1}{a_2+}\,\frac{1}{a_3+} \cdots$$
In the same way, a finite sequence
$$(a_n)_{1 \le n \le k}$$
defines a finite sequence
$$(c_n)_{1 \le n \le k}.$$
We then speak of a finite continued fraction with value $c_k$.
An archaic word for a continued fraction is anthyphairetic ratio.
If the denominators $a_n$ are all (positive) integers, we speak of a simple continued fraction. We then use the notation $q = \langle a_0; a_1, a_2, a_3, \dots \rangle$ or, in the finite case, $q = \langle a_0; a_1, a_2, a_3, \dots, a_n \rangle$.
It is not hard to prove that any irrational number c is the value of a unique infinite simple continued fraction. Moreover, if $c_n$ denotes its nth convergent, then $c - c_n$ is an alternating sequence and $|c - c_n|$ is decreasing (as well as convergent to zero). Also, the value of an infinite simple continued fraction is perforce irrational.
Any rational number is the value of two and only two finite simple continued fractions; in one of them, the last denominator is 1. E.g.
$$\frac{43}{30} = \langle 1; 2, 3, 4 \rangle = \langle 1; 2, 3, 3, 1 \rangle.$$
These two conditions on a real number c are equivalent:
1. c is a root of an irreducible quadratic polynomial with integer coefficients.
2. c is irrational and its simple continued fraction is eventually periodic; i.e.
$$c = \langle a_0; a_1, a_2, \dots \rangle$$
and, for some integer m and some integer k > 0, we have $a_n = a_{n+k}$ for all $n \ge m$.
For example, consider the quadratic equation for the golden ratio:
$$x^2 = x + 1$$
or equivalently
$$x = 1 + \frac{1}{x}.$$
We get
$$x = 1 + \cfrac{1}{1 + \cfrac{1}{x}} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{x}}}$$
and so on. If x > 0, we therefore expect
$$x = \langle 1; 1, 1, 1, \dots \rangle,$$
which indeed can be proved. As an exercise, you might like to look for a continued fraction expansion of the other solution of $x^2 = x + 1$.
Although e is transcendental, there is a surprising pattern in its simple continued fraction expansion:
$$e = \langle 2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, \dots \rangle$$
No pattern is apparent in the expansions of some other well-known transcendental constants, such as $\pi$ and the Euler-Mascheroni constant $\gamma$.

Owing to a kinship with the Euclidean division algorithm, continued fractions arise naturally in number theory. An interesting example is the Pell Diophantine equation
$$x^2 - Dy^2 = 1$$
where D is a nonsquare integer > 0. It turns out that if (x, y) is any solution of the Pell equation other than $(\pm 1, 0)$, then $|x/y|$ is a convergent to $\sqrt{D}$.
$\frac{22}{7}$ and $\frac{355}{113}$ are well-known rational approximations to $\pi$, and indeed both are convergents to $\pi$:
$$3.14159265\ldots = \pi = \langle 3; 7, 15, 1, 292, \ldots \rangle$$
$$3.14285714\ldots = \frac{22}{7} = \langle 3; 7 \rangle$$
$$3.14159292\ldots = \frac{355}{113} = \langle 3; 7, 15, 1 \rangle = \langle 3; 7, 16 \rangle$$
For one more example, the distribution of leap years in the 4800-month cycle of the Gregorian calendar can be interpreted (loosely speaking) in terms of the continued fraction expansion of the number of days in a solar year.
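For rational arguments the expansion and its value can be computed exactly; a small Python sketch using exact fractions (function names ours, purely illustrative):

```python
from fractions import Fraction
from math import floor

def cf_expansion(q):
    """Terms <a0; a1, a2, ...> of the simple continued fraction of a rational q."""
    terms = []
    while True:
        a = floor(q)
        terms.append(a)
        q -= a
        if q == 0:
            break
        q = 1 / q          # Fraction arithmetic stays exact
    return terms

def cf_value(terms):
    """Value of a finite simple continued fraction given by its terms."""
    v = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        v = a + 1 / v
    return v
```

This reproduces the examples above: 43/30 expands to <1; 2, 3, 4>, and both <1; 2, 3, 4> and <1; 2, 3, 3, 1> evaluate back to 43/30.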
Version: 18 Owner: bbukh Author(s): Larry Hammick, XJamRastafire

Chapter 91
11A63 Radix representation; digital
problems
91.1    Kummer's theorem

Given integers $n \ge m \ge 0$ and a prime number p, the power of p dividing $\binom{n}{m}$ is equal to the number of carries when adding m and n − m in base p.

Proof:
For the proof we can allow base-p representations of numbers with leading zeros. So let
$$n_d n_{d-1} \cdots n_0 := n, \qquad m_d m_{d-1} \cdots m_0 := m,$$
all in base p. We set r = n − m and denote the p-adic representation of r by $r_d r_{d-1} \cdots r_0$. We define $c_{-1} = 0$, and for each $0 \le j \le d$
$$c_j = \begin{cases} 1 & \text{for } m_j + r_j + c_{j-1} \ge p \\ 0 & \text{otherwise.} \end{cases} \qquad (91.1.1)$$
Finally, we introduce, as in the corollary in the entry on the prime power dividing a given factorial, $\sigma_p(n)$ as the sum of digits in the p-adic representation of n. Then it follows that the power of p dividing $\binom{n}{m}$ is
$$\frac{\sigma_p(m) + \sigma_p(r) - \sigma_p(n)}{p-1}.$$
For each $j \ge 0$, we have
$$n_j = m_j + r_j + c_{j-1} - p\,c_j.$$
Then
$$\sigma_p(m) + \sigma_p(r) - \sigma_p(n) = \sum_{j=0}^{d} (m_j + r_j - n_j) = \sum_{j=0}^{d} \bigl((p-1)c_j + (c_j - c_{j-1})\bigr),$$
where
$$\sum_{j=0}^{d} (c_j - c_{j-1}) = c_d - c_{-1} = 0.$$
Hence we have
$$\frac{\sigma_p(m) + \sigma_p(r) - \sigma_p(n)}{p-1} = \sum_{j=0}^{d} c_j,$$
the total number of carries.
Version: 6 Owner: Thomas Heye Author(s): Thomas Heye

91.2    corollary of Kummer's theorem

As shown in Kummer's theorem, the power of a prime number p dividing $\binom{n}{m}$, $m \le n \in \mathbb{N}$, is the total number of carries when adding m and n − m in base p. We'll give an explicit formula for the carry indicator.
Given integers $n \ge m \ge 0$ and a prime number p, let $n_i$, $m_i$, $r_i$ be the i-th digits of n, m, and r := n − m, respectively, in their p-adic representations.
Define $c_{-1} = 0$, and for each integer $i \ge 0$ define
$$c_i = \begin{cases} 1 & \text{if } m_i + r_i + c_{i-1} \ge p \\ 0 & \text{otherwise.} \end{cases}$$
For each $i \ge 0$ we have
$$n_i = m_i + r_i + c_{i-1} - p\,c_i.$$
Starting with the i-th digit of n, we multiply by increasing powers of p to get
$$\sum_{k=i}^{d} n_k p^{k-i} = \sum_{k=i}^{d} p^{k-i}(m_k + r_k) + \sum_{k=i}^{d} \left( p^{k-i} c_{k-1} - p^{k-i+1} c_k \right).$$
The last sum telescopes, leaving only the terms for indices i − 1 and d, and since $c_d = 0$ we get
$$\left\lfloor \frac{n}{p^i} \right\rfloor = \left\lfloor \frac{m}{p^i} \right\rfloor + \left\lfloor \frac{r}{p^i} \right\rfloor + c_{i-1} \qquad (91.2.1)$$
for all $i \ge 0$.
Version: 2 Owner: Thomas Heye Author(s): Thomas Heye

Chapter 92
11A67 Other representations
92.1    Sierpiński-Erdős Egyptian fraction conjecture

Erdős and Sierpiński conjectured that for any integer n > 3 there exist positive integers a, b, c so that:
$$\frac{5}{n} = \frac{1}{a} + \frac{1}{b} + \frac{1}{c}$$
Version: 4 Owner: alek thiery Author(s): alek thiery

92.2    adjacent fraction

Two fractions $\frac{a}{b}$ and $\frac{c}{d}$, $\frac{a}{b} > \frac{c}{d}$, of the positive integers a, b, c, d are adjacent if their difference is some unit fraction $\frac{1}{n}$, n > 0, that is, if we can write:
$$\frac{a}{b} - \frac{c}{d} = \frac{1}{n}.$$
For example the two proper fractions and unit fractions $\frac{1}{11}$ and $\frac{1}{12}$ are adjacent since:
$$\frac{1}{11} - \frac{1}{12} = \frac{1}{132},$$
while $\frac{1}{17}$ and $\frac{1}{19}$ are not since:
$$\frac{1}{17} - \frac{1}{19} = \frac{2}{323}.$$
It is not necessary of course that the fractions are both proper fractions:
$$\frac{20}{19} - \frac{19}{19} = \frac{1}{19},$$
or unit fractions:
$$\frac{3}{4} - \frac{2}{3} = \frac{1}{12}.$$
All successive terms of some Farey sequence $F_n$ of a degree n are always adjacent fractions. In the first Farey sequence $F_1$ of a degree 1 there are only two adjacent fractions, namely $\frac{1}{1}$ and $\frac{0}{1}$.
Adjacent unit fractions can be parts of many Egyptian fractions:
$$\frac{1}{70} + \frac{1}{71} = \frac{141}{4970}.$$
Version: 13 Owner: XJamRastafire Author(s): XJamRastafire

92.3    any rational number is a sum of unit fractions

Representation
Any rational number $\frac{a}{b} \in \mathbb{Q}$ between 0 and 1 can be represented as a sum of different unit fractions. This result was known to the Egyptians, whose way of representing rational numbers was as a sum of different unit fractions.
The following greedy algorithm can represent any $0 \le \frac{a}{b} < 1$ as such a sum:
1. If a = 0, terminate. Otherwise, let
$$n = \left\lceil \frac{b}{a} \right\rceil$$
be the smallest natural number for which $\frac{1}{n} \le \frac{a}{b}$.
2. Output $\frac{1}{n}$ as the next term of the sum.
3. Continue from step 1, setting
$$\frac{a'}{b'} = \frac{a}{b} - \frac{1}{n}.$$

Proof of correctness
The algorithm can never output the same unit fraction twice. Indeed, any n selected in step 1 is at least 2, and $\frac{1}{n-1} \le \frac{2}{n}$, so the same n cannot be selected twice by the algorithm, as then n − 1 could have been selected instead of n.
It remains to prove that the algorithm terminates. We do this by induction on a.
For a = 0: The algorithm terminates immediately.
For a > 0: The n selected in step 1 satisfies
$$b \le an < b + a.$$
So
$$\frac{a}{b} - \frac{1}{n} = \frac{an - b}{bn},$$
and $0 \le an - b < a$, so by the induction hypothesis the algorithm terminates for $\frac{an-b}{bn}$.
QED
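The three steps above can be sketched directly in Python with exact fractions (the function name is ours, purely illustrative):

```python
from fractions import Fraction
from math import ceil

def greedy_egyptian(a, b):
    """Denominators of the greedy unit-fraction expansion of a/b, 0 <= a/b < 1."""
    q = Fraction(a, b)
    denominators = []
    while q > 0:
        n = ceil(1 / q)              # smallest n with 1/n <= q
        denominators.append(n)
        q -= Fraction(1, n)          # subtract the chosen unit fraction
    return denominators
```

For example, greedy_egyptian(47, 60) returns [2, 4, 30], matching the representation 47/60 = 1/2 + 1/4 + 1/30 discussed below.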

Problems
1. The greedy algorithm always works, but it tends to produce unnecessarily large denominators. For instance,
$$\frac{47}{60} = \frac{1}{3} + \frac{1}{4} + \frac{1}{5},$$
but the greedy algorithm selects $\frac{1}{2}$, leading to the representation
$$\frac{47}{60} = \frac{1}{2} + \frac{1}{4} + \frac{1}{30}.$$
2. The representation is never unique. For instance, for any n we have the representation
$$\frac{1}{n} = \frac{1}{n+1} + \frac{1}{n(n+1)}.$$
So given any one representation of $\frac{a}{b}$ as a sum of different unit fractions we can take the largest denominator n appearing and replace it with two (larger) denominators. Continuing the process indefinitely, we see infinitely many such representations, always.
Version: 2 Owner: ariels Author(s): ariels


92.4    conjecture on fractions with odd denominators

Egyptian fractions raise many open problems; this is one of the most famous of them.
Suppose we wish to write fractions as sums of distinct unit fractions with odd denominators. Obviously, every such sum will have a reduced representation with an odd denominator.
For instance, the greedy algorithm applied to $\frac{2}{7}$ gives
$$\frac{2}{7} = \frac{1}{4} + \frac{1}{28},$$
but we may also write $\frac{2}{7}$ as
$$\frac{2}{7} = \frac{1}{7} + \frac{1}{9} + \frac{1}{35} + \frac{1}{315}.$$
It is known that we can represent every rational number with odd denominator as a sum of distinct unit fractions with odd denominators.
However it is not known whether the greedy algorithm works when limited to odd denominators.
Conjecture 1. For any fraction $0 \le \frac{a}{2k+1} < 1$ with odd denominator, if we repeatedly subtract the largest unit fraction with odd denominator that is smaller than our fraction, we will eventually reach 0.

Version: 5 Owner: drini Author(s): bbukh, drini, ariels

92.5    unit fraction

A unit fraction $\frac{n}{d}$ is a fraction whose numerator is n = 1. If its integer denominator d > 1, then the fraction is also a proper fraction. So there is only one unit fraction which is improper, namely 1.
Such fractions are known from Egyptian mathematics, where we can find a lot of special representations of numbers as a sum of unit fractions, which are now called Egyptian fractions. From the Rhind papyrus, as an example:
$$\frac{2}{71} = \frac{1}{40} + \frac{1}{568} + \frac{1}{710}.$$
Many unit fractions are in pairs of adjacent fractions. Unit fractions appear as successive or non-successive terms of any Farey sequence $F_n$ of a degree n. For example the fractions $\frac{1}{2}$ and $\frac{1}{4}$ are adjacent, but they are not successive terms in the Farey sequence $F_5$. The fractions $\frac{1}{3}$ and $\frac{1}{4}$ are also adjacent and they are successive terms in $F_5$.
Version: 7 Owner: XJamRastafire Author(s): XJamRastafire


Chapter 93
11A99 Miscellaneous
93.1    ABC conjecture

Suppose we have three mutually coprime integers A, B, C satisfying A + B = C. Given any $\varepsilon > 0$, the ABC conjecture is that there is a constant $\kappa(\varepsilon)$ such that
$$\max(|A|, |B|, |C|) \le \kappa(\varepsilon)\,(\operatorname{rad}(ABC))^{1+\varepsilon},$$
where rad is the radical of an integer. This conjecture was formulated by Masser and Oesterlé in 1980.
The ABC conjecture is considered one of the most important unsolved problems in number theory, as many results would follow directly from this conjecture. For example, Fermat's last theorem could be proved (for sufficiently large exponents) with perhaps one page's worth of proof.

Further Reading

An interesting and elementary article on the ABC conjecture can be found at http://www.maa.org/mathland
Version: 9 Owner: KimJ Author(s): KimJ

93.2    Surányi's theorem

Every integer k can be expressed as the following sum:
$$k = \pm 1^2 \pm 2^2 \pm \cdots \pm m^2$$
for some $m \in \mathbb{Z}^+$.
We firstly note that:
$$0 = 1^2 + 2^2 - 3^2 + 4^2 - 5^2 - 6^2 + 7^2$$
$$1 = 1^2$$
$$2 = -1^2 - 2^2 - 3^2 + 4^2$$
$$3 = -1^2 + 2^2$$
$$4 = -1^2 - 2^2 + 3^2$$
Now it suffices to prove that if the theorem is true for k then it is also true for k + 4. As
$$(m+1)^2 - (m+2)^2 - (m+3)^2 + (m+4)^2 = 4,$$
it is simple to finish the proof: if $k = \pm 1^2 \pm \cdots \pm m^2$ then
$$k + 4 = \pm 1^2 \pm \cdots \pm m^2 + (m+1)^2 - (m+2)^2 - (m+3)^2 + (m+4)^2$$
and we are done.
Version: 4 Owner: mathcam Author(s): mathcam, alek thiery

93.3    irrational to an irrational power can be rational

Let $A = \sqrt{2}^{\sqrt{2}}$. If A is a rational number, we are finished. Otherwise, if A is an irrational number, let $B = A^{\sqrt{2}}$. Then $B = \sqrt{2}^{\sqrt{2}\cdot\sqrt{2}} = \sqrt{2}^2 = 2$ is rational. Hence an irrational number to an irrational power can be a rational number. (In fact it is proved, thanks to the Gelfond-Schneider theorem, that A is a transcendental number and thus an irrational number.)
Version: 4 Owner: alek thiery Author(s): alek thiery

93.4    triangular numbers

The triangular numbers are defined by the series
$$t_n = \sum_{i=1}^{n} i$$
That is, the nth triangular number is simply the sum of the first n natural numbers. The first few triangular numbers are
$$1, 3, 6, 10, 15, 21, 28, \ldots$$
The name triangular number comes from the fact that the summation defining $t_n$ can be visualized as the number of dots in a triangular array whose successive rows contain 1, 2, ..., n dots, where the number of rows is equal to n.


The closed-form for the triangular numbers is
$$t(n) = \frac{n(n+1)}{2}$$

Legend has it that a grammar-school-aged Gauss was told by his teacher to sum up all the numbers from 1 to 100. He reasoned that each number i could be paired up with 101 − i, to form a sum of 101, and if this was done 100 times, it would result in twice the actual sum (since each number would get used twice due to the pairing). Hence, the sum would be
$$1 + 2 + 3 + \cdots + 100 = \frac{100 \cdot 101}{2}$$

The same line of reasoning works to give us the closed form for any n.
Another way to derive the closed form is to assume that the nth triangular number is less than or equal to the nth square (that is, each row is less than or equal to n, so the sum of all rows must be less than or equal to $n \cdot n = n^2$), and then use the first few triangular numbers to solve the general 2nd-degree polynomial $An^2 + Bn + C$ for A, B, and C. This leads to A = 1/2, B = 1/2, and C = 0, which is the same as the above formula for t(n).
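The agreement between the summation definition and the closed form can be checked mechanically; a brief Python sketch (the function name is illustrative):

```python
def triangular(n):
    """n-th triangular number via the closed form n(n+1)/2."""
    return n * (n + 1) // 2

# Gauss's pairing argument gives sum(1..100) = 100 * 101 / 2 = 5050
```

Comparing triangular(n) against the direct sum 1 + 2 + ... + n for many n confirms the formula.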
Version: 2 Owner: akrowne Author(s): akrowne


Chapter 94
11B05 Density, gaps, topology
94.1    Cauchy-Davenport theorem

If A and B are non-empty subsets of $\mathbb{Z}_p$, then
$$|A + B| \ge \min(|A| + |B| - 1, p),$$
where A + B denotes the sumset of A and B.

REFERENCES
1. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996.

Version: 3 Owner: bbukh Author(s): bbukh

94.2    Mann's theorem

Let A and B be subsets of $\mathbb{Z}$. If $0 \in A \cap B$, then
$$\sigma(A + B) \ge \min(1, \sigma A + \sigma B),$$
where $\sigma$ denotes Schnirelmann density.
This statement was also known as the $(\alpha + \beta)$-conjecture until H. B. Mann proved it in 1942.
Version: 2 Owner: bbukh Author(s): bbukh

94.3    Schnirelmann density

Let A be a subset of $\mathbb{Z}$, and let A(n) be the number of elements of A in [1, n]. The Schnirelmann density of A is
$$\sigma A = \inf_n \frac{A(n)}{n}.$$
Schnirelmann density has the following properties:
1. $A(n) \ge n \sigma A$ for all n.
2. $\sigma A = 1$ if and only if $\mathbb{N} \subseteq A$.
3. If 1 does not belong to A, then $\sigma A = 0$.
Schnirelmann proved that if $0 \in A \cap B$ then
$$\sigma(A + B) \ge \sigma A + \sigma B - \sigma A \cdot \sigma B,$$
and also that if $\sigma A + \sigma B \ge 1$, then $\sigma(A + B) = 1$. From these he deduced that if $\sigma A > 0$ then A is an additive basis.
Version: 2 Owner: bbukh Author(s): bbukh

94.4    Sidon set

A set of natural numbers is called a Sidon set if all pairwise sums of its elements are distinct. Equivalently, the equation a + b = c + d has only the trivial solution {a, b} = {c, d} in elements of the set.
Sidon sets are a special case of so-called $B_h[g]$ sets. A set A is called a $B_h[g]$ set if for every $n \in \mathbb{N}$ the equation $n = a_1 + \ldots + a_h$ has at most g different solutions with $a_1 \le \cdots \le a_h$ being elements of A. The Sidon sets are $B_2[1]$ sets.
Define $F_h(n, g)$ as the size of the largest $B_h[g]$ set contained in the interval [1, n]. Whereas it is known that $F_2(n, 1) = n^{1/2} + O(n^{5/16})$ [2, p. 85], no asymptotical results are known for g > 1 or h > 2 [1].
The infinite $B_h[g]$ sets are understood even worse. Erdős [2, p. 89] proved that for every infinite Sidon set A we have
$$\liminf_{n \to \infty} \frac{|A \cap [1, n]|}{\sqrt{n / \log n}} < C$$
for some constant C. On the other hand, for a long time no example of a set for which $|A \cap [1, n]| > n^{1/3 + \epsilon}$ for some $\epsilon > 0$ was known. Only recently Ruzsa [3] used an extremely clever construction to prove the existence of a set A for which $|A \cap [1, n]| > n^{\sqrt{2} - 1 - \epsilon}$ for every $\epsilon > 0$ and for all sufficiently large n.

REFERENCES
1. Ben Green. $B_h[g]$ sets: the current state of affairs. http://www.dpmms.cam.ac.uk/~bjg23/papers/bhgbounds.dvi, 2000.
2. Heini Halberstam and Klaus Friedrich Roth. Sequences. Springer-Verlag, second edition, 1983. Zbl 0498.10001.
3. Imre Ruzsa. An infinite Sidon sequence. J. Number Theory, 68(1):63–71, 1998. Available at http://www.math-inst.hu/~ruzsa/cikkek.html.

Version: 4 Owner: bbukh Author(s): bbukh

94.5

asymptotic density

Let A be a subset of ℤ^+. For any n ∈ ℤ^+ put A(n) = {1, 2, . . . , n} ∩ A.

Define the upper asymptotic density d̄(A) of A by

d̄(A) = lim sup_{n→∞} |A(n)|/n.

d̄(A) is also known simply as the upper density of A.

Similarly, we define d̲(A), the lower asymptotic density of A, by

d̲(A) = lim inf_{n→∞} |A(n)|/n.

We say A has asymptotic density d(A) if d̄(A) = d̲(A), in which case we put d(A) = d̄(A).
Version: 3 Owner: mathcam Author(s): mathcam, saforres

94.6 discrete space

Definition Let X be a set. Then the discrete topology for X is the topology given by the
power set of X. A topological space equipped with the discrete topology is called a discrete
space.
In other words, the discrete topology is the finest topology one can give to a set.
Theorem The following conditions are equivalent:
1. X is a discrete space.
2. Every singleton in X is an open set.
538

3. If A is a subset of X, and x ∈ A, then A is a neighborhood of x.

Definition Suppose X is a topological space and Y is a subset equipped with the subspace topology.
If Y is a discrete space, then Y is called a discrete subspace of X.
Theorem Suppose X is a topological space and Y is a subset of X. Then Y is a discrete
subspace if and only if for any y ∈ Y there is an open subset S of X such that S ∩ Y = {y}.
Example The set ℤ is a discrete subspace of ℝ and of ℂ.

Version: 5 Owner: matte Author(s): matte, Larry Hammick, drini, drummond

94.7 essential component

If A is a set of nonnegative integers such that

σ(A + B) > σB        (94.7.1)

for every set B with Schnirelmann density 0 < σB < 1, then A is an essential component.
Erdős proved that every basis is an essential component. In fact he proved that

σ(A + B) ≥ σB + (1/(2h)) (1 − σB) σB,

where h denotes the order of A.
Plünnecke improved that to

σ(A + B) ≥ (σB)^{1−1/h}.

There are non-basic essential components. Linnik constructed a non-basic essential component
for which A(n) = O(n^ε) for every ε > 0.

REFERENCES
1. Heini Halberstam and Klaus Friedrich Roth. Sequences. Springer-Verlag, second edition, 1983.

Version: 3 Owner: bbukh Author(s): bbukh

94.8 normal order

Let f(n) and F(n) be functions from ℤ^+ → ℝ. We say that f(n) has normal order F(n) if
for each ε > 0 the set

A(ε) = {n ∈ ℤ^+ : (1 − ε)F(n) < f(n) < (1 + ε)F(n)}

has the property that d(A(ε)) = 1. Equivalently, if B(ε) = ℤ^+ \ A(ε), then d(B(ε)) = 0.
(Note that d(X) denotes the lower asymptotic density of X.)
We say that f has average order F if

Σ_{j=1}^n f(j) ∼ Σ_{j=1}^n F(j).

Version: 2 Owner: saforres Author(s): saforres

Chapter 95
11B13 Additive bases
95.1 Erdős-Turán conjecture

The Erdős-Turán conjecture asserts that there exists no asymptotic basis A ⊆ ℕ_0 of order 2 such
that its representation function

r'_{A,2}(n) = Σ_{a_1 + a_2 = n, a_1 ≤ a_2} 1

is bounded.
Alternatively, the question can be phrased as whether there exists a power series F with
coefficients 0 and 1 such that all coefficients of F^2 are greater than 0 but bounded.
If we replace the set of nonnegative integers by the set of all integers, then the question was
settled by Nathanson [2] in the negative; that is, there exists a set A ⊆ ℤ such that
r'_{A,2}(n) = 1 for all n.

REFERENCES
1. Heini Halberstam and Klaus Friedrich Roth. Sequences. Springer-Verlag, second edition, 1983.
Zbl 0498.10001.
2. Melvyn B. Nathanson. Every function is the representation function of an additive basis for
the integers. arXiv:math.NT/0302091.

Version: 4 Owner: bbukh Author(s): bbukh


95.2 additive basis

A subset A of ℤ is an (additive) basis of order n if

nA = ℕ ∪ {0},

where nA is the n-fold sumset of A. Usually it is assumed that 0 belongs to A when saying that
A is an additive basis.
Version: 3 Owner: bbukh Author(s): bbukh

95.3 asymptotic basis

A subset A of ℤ is an asymptotic basis of order n if the n-fold sumset nA contains all
sufficiently large integers.
Version: 1 Owner: bbukh Author(s): bbukh

95.4 base conversion

The bases entry gives a good overview of the symbolic representation of numbers, something
we use every day. This entry will give a simple overview and method of converting numbers
between bases: that is, taking one representation of a number and converting it to another.
Perhaps the simplest way to explain base conversion is to describe the conversion to and
from base 10 (in which everyone is accustomed to performing arithmetic). We will begin
with the easier method, which is the conversion from some other base to base 10.

Conversion to Base 10
Suppose we have a number represented in base b. This number is given as a sequence of
symbols s_n s_{n−1} ⋯ s_2 s_1 . t_1 t_2 ⋯ t_m. This sequence represents

s_n b^{n−1} + s_{n−1} b^{n−2} + ⋯ + s_2 b + s_1 + t_1 b^{−1} + t_2 b^{−2} + ⋯ + t_m b^{−m}.
This is straightforward enough. All we need to do is convert each symbol s to its decimal
equivalent. Typically this is simple. The symbols 0, 1, . . . , 9 usually represent the same values
in any other base. For b > 10, the letters of the alphabet begin to be used, with a = 10, b = 11,

and so on. This serves up to b = 36. Since the most common bases used are binary (b = 2),
octal (b = 8), decimal (b = 10), and hexadecimal (b = 16), this scheme is generally sufficient.
Once we map symbols to values, we can easily apply the formula above, adding and multiplying as we go along.

Example of Conversion to Base 10


1. Binary to Decimal. Let b = 2. Convert 10010.011 to decimal.

10010.011 = 1·2^4 + 0·2^3 + 0·2^2 + 1·2 + 0 + 0·2^{−1} + 1·2^{−2} + 1·2^{−3}
          = 16 + 2 + 1/4 + 1/8
          = 18.375
2. Ternary to Decimal. Let b = 3. Convert 10210.21 to decimal.

10210.21 = 1·3^4 + 0·3^3 + 2·3^2 + 1·3 + 0 + 2·3^{−1} + 1·3^{−2}
         = 81 + 18 + 3 + 2/3 + 1/9
         = 102.777777 . . .
Note that there is no exact decimal representation of the ternary value 10210.21.
This happens often with conversions between bases. This is why many decimal values
(such as 0.1) cannot be represented precisely in binary floating point (generally used
in computers for non-integral arithmetic).
3. Hexadecimal to Decimal. Let b = 16. Convert 4ad9.e3 to decimal.

4ad9.e3 = 4·16^3 + 10·16^2 + 13·16 + 9 + 14·16^{−1} + 3·16^{−2}
        = 4·4096 + 10·256 + 13·16 + 9 + 14/16 + 3/256
        = 16384 + 2560 + 208 + 9 + 227/256
        = 19161.88671875

Iterative Method
Obviously base conversion can become very tedious. It would make sense to write a computer
program to perform these operations. What follows is a simple algorithm that iterates
through the symbols of a representation, accumulating a value as it goes along. Once all the
symbols are iterated, the result is in the accumulator. This method only works for whole
numbers. The fractional part of a number could be converted in the same manner, but it
would have to be a separate process, working from the end of the number rather than the
beginning.
Algorithm Parse(A, n, b)
Input: Sequence A, where 0 ≤ A[i] < b for 1 ≤ i ≤ n, b > 1
Output: The value represented by A in base b (where A[1] is the left-most digit)
  v ← 0
  for i ← 1 to n do
    v ← b · v + A[i]
  return v
This is a simple enough method that it could even be done in one's head.

Example of Iterative Method

Let b = 2. Convert 101101 to decimal.

Remaining Digits   Accumulator
101101             0
01101              0·2 + 1 = 1
1101               1·2 + 0 = 2
101                2·2 + 1 = 5
01                 5·2 + 1 = 11
1                  11·2 + 0 = 22
                   22·2 + 1 = 45

This is a bit easier than remembering that the first digit corresponds to the 2^5 = 32 place.

Conversion From Base 10

To convert a value to some base b simply consists of inverse application of the iterative
method. As the iterative method for parsing a symbolic representation of a value consists
of multiplication and addition, the iterative method for forming a symbolic representation
of a value will consist of division and subtraction.
Algorithm Generate(A, n, b)
Input: Array A of sufficient size to store the representation, n > 0, b > 1
Output: The representation of n in base b will be stored in array A in reverse order
  i ← 1
  while n > 0 do
    A[i] ← n mod b (remainder of n/b)
    n ← ⌊n/b⌋ (integral division)
    i ← i + 1
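Both algorithms can be sketched in Python: Horner-style accumulation for Parse, repeated division for Generate (function names are ours, whole numbers only; Python's built-in int(s, b) does the same job as parse):

```python
def parse(digits, b):
    """Horner-style accumulation: the value of `digits`
    (most significant first) read in base b."""
    v = 0
    for d in digits:
        v = b * v + d
    return v

def generate(n, b):
    """Repeated division: the base-b digits of n, returned most
    significant first (the pseudocode above stores them reversed)."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % b)   # next symbol, moving left
        n //= b                # whole part of n / b
    return digits[::-1]
```

For example, parse([1, 0, 1, 1, 0, 1], 2) reproduces the worked 101101 example, and generate(20000, 16) reproduces the hexadecimal example below.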

Example of Conversion from Base 10

Convert the decimal value 20000 to hexadecimal.

n      Value Sequence   Symbolic Sequence
20000  {}
1250   {0}              0
78     {2, 0}           20
4      {14, 2, 0}       e20
0      {4, 14, 2, 0}    4e20

Note how we obtain the symbols from the right. We get the next symbol (moving left) by
taking the value of n mod 16. We then replace n with the whole part of n/16, and repeat
until n = 0.
This isn't as easy to do in one's head as the other direction, though for small bases (e.g.
b = 2) it is feasible. For example, to convert 20000 to binary:

n      Representation
20000
10000  0
5000   00
2500   000
1250   0000
625    0 0000
312    10 0000
156    010 0000
78     0010 0000
39     0 0010 0000
19     10 0010 0000
9      110 0010 0000
4      1110 0010 0000
2      0 1110 0010 0000
1      00 1110 0010 0000
0      100 1110 0010 0000

Of course, remembering that many digits (15) might be difficult.

Conversion Between Similar Bases

The digits in the previous example are grouped into sets of four both to ease readability, and
to highlight the relationship between binary and hexadecimal. Since 2^4 = 16, each group of
four binary digits is the representation of a hexadecimal digit in the same position of the
sequence.
It is trivial to get the octal and hexadecimal representations of any number once one has the
binary representation. For instance, since 2^3 = 8, the octal representation can be obtained
by grouping the binary digits into groups of 3, and converting each group to an octal digit.
100 111 000 100 000 = 47040
Even base 2^5 = 32 could be obtained:
10011 10001 00000 = jh0
Version: 6 Owner: mathcam Author(s): mathcam, Logan

95.5 sumset

Let A_1, A_2, . . . , A_n be subsets of an additive group G. The sumset

A_1 + A_2 + ⋯ + A_n

is the set of all elements of the form a_1 + a_2 + ⋯ + a_n, where a_i ∈ A_i.
In geometry a sumset is often called a Minkowski sum.
Version: 2 Owner: bbukh Author(s): bbukh


Chapter 96
11B25 Arithmetic progressions
96.1 Behrend's construction

At first sight it may seem that the greedy algorithm yields the densest subset of {0, 1, . . . , N}
that is free of arithmetic progressions of length 3. It is not hard to show that the greedy
algorithm yields the set of numbers that lack the digit 2 in their ternary development. The
density of such numbers is O(N^{log_3 2 − 1}).
However, in 1946 Behrend [1] constructed much denser subsets that are free of arithmetic
progressions. His major idea is that if we were looking for progression-free sets in ℝ^n, then
we could use spheres. So, consider the d-dimensional cube [1, n]^d ∩ ℤ^d and the family of
spheres x_1^2 + x_2^2 + ⋯ + x_d^2 = t for t = 1, . . . , dn^2. Each point in the cube is contained
in one of the spheres, and so at least one of the spheres contains n^d/(dn^2) lattice points. Let
us call this set A. Since a sphere does not contain arithmetic progressions, A does not contain
any progressions either. Now let f be a Freiman isomorphism from A to a subset of ℤ defined
as follows. If x = (x_1, x_2, . . . , x_d) is a point of A, then

f(x) = x_1 + x_2(2n) + x_3(2n)^2 + ⋯ + x_d(2n)^{d−1},

that is, we treat x_i as the ith digit of f(x) in base 2n. It is not hard to see that f is indeed
a Freiman isomorphism of order 2, and that f(A) ⊆ {1, 2, . . . , N = (2n)^d}. If we set
d = c√(ln N), then we get that there is a progression-free subset of {1, 2, . . . , N} of size at
least N e^{−(c ln 2 + 2/c + o(1))√(ln N)}. To maximize this value we can set c = √(2/ln 2). Thus, there
exists a progression-free set of size at least

N e^{−√(8 ln 2 ln N)(1+o(1))}.
This result was later generalized to sets not containing arithmetic progressions of length k
by Rankin [3]. His construction is more complicated, and depends on estimates of the
number of representations of an integer as a sum of many squares. He proves that the size
of a set free of k-term arithmetic progressions is at least

N e^{−c(log N)^{1/(k−1)}}.

On the other hand, Moser[2] gave a construction analogous to that of Behrend, but which
was explicit since it did not use the pigeonhole principle.

REFERENCES
1. Felix A. Behrend. On the sets of integers which contain no three in arithmetic progression. Proc. Nat. Acad. Sci., 23:331–332, 1946. Zbl 0060.10302.
2. Leo Moser. On non-averaging sets of integers. Canadian J. Math., 5:245–252, 1953. Zbl 0050.04001.
3. Robert A. Rankin. Sets of integers containing not more than a given number of terms in arithmetical progression. Proc. Roy. Soc. Edinburgh Sect. A, 65:332–344, 1962. Zbl 0104.03705.

Version: 7 Owner: bbukh Author(s): bbukh

96.2 Freiman's theorem

Let A be a finite set of integers such that the 2-fold sumset 2A is small, i.e., |2A| < c|A|
for some constant c. There exists an n-dimensional arithmetic progression of length c0 |A|
that contains A, and such that c0 and n are functions of c only.

REFERENCES
1. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996.

Version: 3 Owner: bbukh Author(s): bbukh

96.3 Szemerédi's theorem

Let k be a positive integer and let δ > 0. There exists a positive integer N = N(k, δ) such
that every subset of {1, 2, . . . , N} of size δN contains an arithmetic progression of length k.
The case k = 3 was first proved by Roth [4]. His method did not seem to extend to the case
k > 3. Using completely different ideas Szemerédi proved the case k = 4 [5], and the general
case of an arbitrary k [6].
The best known bounds for N(k, δ) are

c^{(log(1/δ))^{k−1}} ≤ N(k, δ) ≤ 2^{2^{δ^{−2^{2^{k+9}}}}},

where the lower bound is due to Behrend [1] (for k = 3) and Rankin [3], and the upper bound
is due to Gowers [2].
For k = 3 a better upper bound was obtained by Bourgain:

N(3, δ) ≤ c δ^{−2} e^{256 δ^{−2}}.

REFERENCES
1. Felix A. Behrend. On the sets of integers which contain no three in arithmetic progression. Proc. Nat. Acad. Sci., 23:331–332, 1946. Zbl 0060.10302.
2. Timothy Gowers. A new proof of Szemerédi's theorem. Geom. Funct. Anal., 11(3):465–588, 2001. Preprint available at http://www.dpmms.cam.ac.uk/~wtg10/papers.html.
3. Robert A. Rankin. Sets of integers containing not more than a given number of terms in arithmetical progression. Proc. Roy. Soc. Edinburgh Sect. A, 65:332–344, 1962. Zbl 0104.03705.
4. Klaus Friedrich Roth. On certain sets of integers. J. London Math. Soc., 28:245–252, 1953. Zbl 0050.04002.
5. Endre Szemerédi. On sets of integers containing no four elements in arithmetic progression. Acta Math. Acad. Sci. Hung., 20:89–104, 1969. Zbl 0175.04301.
6. Endre Szemerédi. On sets of integers containing no k elements in arithmetic progression. Acta. Arith., 27:299–345, 1975. Zbl 0303.10056.

Version: 10 Owner: bbukh Author(s): bbukh

96.4 multidimensional arithmetic progression

An n-dimensional arithmetic progression is a set of the form

Q = Q(a; q_1, . . . , q_n; l_1, . . . , l_n)
  = { a + x_1 q_1 + ⋯ + x_n q_n | 0 ≤ x_i < l_i for i = 1, . . . , n }.

The length of the progression is defined as l_1 ⋯ l_n. The progression is proper if |Q| = l_1 ⋯ l_n.

REFERENCES
1. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996.

Version: 3 Owner: bbukh Author(s): bbukh


Chapter 97
11B34 Representation functions
97.1 Erdős-Fuchs theorem

Let A be a set of natural numbers. Let R_n(A) be the number of ways to represent n as a
sum of two elements in A, that is,

R_n(A) = Σ_{a_i + a_j = n; a_i, a_j ∈ A} 1.

The Erdős-Fuchs theorem [1, 2] states that if c > 0, then

Σ_{n ≤ N} R_n(A) = cN + o(N^{1/4} log^{−1/2} N)

cannot hold.

On the other hand, Ruzsa [3] constructed a set A for which

Σ_{n ≤ N} R_n(A) = cN + O(N^{1/4} log N).

REFERENCES
1. Paul Erdős and Wolfgang H.J. Fuchs. On a problem of additive number theory. J. Lond. Math. Soc., 31:67–73, 1956. Zbl 0070.04104.
2. Heini Halberstam and Klaus Friedrich Roth. Sequences. Springer-Verlag, second edition, 1983. Zbl 0498.10001.
3. Imre Ruzsa. A converse to a theorem of Erdős and Fuchs. J. Number Theory, 62(2):397–402, 1997. Zbl 0872.11014.

Version: 6 Owner: bbukh Author(s): bbukh



Chapter 98
11B37 Recurrences
98.1 Collatz problem

We define the function f : ℕ → ℕ (where ℕ excludes zero) such that

f(a) = 3a + 1 if a is odd,
f(a) = a/2 if a is even.

Then let the sequence c_n be defined as c_i = f(c_{i−1}), with c_0 an arbitrary natural seed value.
It is conjectured that the sequence c_0, c_1, c_2, . . . will always end in 1, 4, 2, repeating infinitely.
This has been verified by computer up to very large values of c_0, but is unproven in general.
It is also not known whether this problem is decidable. This is generally called the Collatz
problem.
The sequence cn is sometimes called the hailstone sequence. This is because it behaves
analogously to a hailstone in a cloud which falls by gravity and is tossed up again repeatedly.
The sequence similarly ends in an eternal oscillation.
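The iteration is easy to sketch in Python (the function name is ours); note that the loop terminates only because the conjecture has been verified far beyond any seed one would try by hand:

```python
def hailstone(c0):
    """Iterate the Collatz map from seed c0 until reaching 1
    (which the conjecture says always happens)."""
    seq = [c0]
    while seq[-1] != 1:
        a = seq[-1]
        seq.append(3 * a + 1 if a % 2 else a // 2)
    return seq

# hailstone(6) walks 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
```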
Version: 16 Owner: akrowne Author(s): akrowne

98.2 recurrence relation

A recurrence relation is a function which gives the value of a sequence at some position
based on the values of the sequence at previous positions and the position index itself. If
the current position n of a sequence s is denoted by s_n, then the next value of the sequence
expressed as a recurrence relation would be of the form

s_{n+1} = f(s_1, s_2, . . . , s_{n−1}, s_n, n)


where f is any function. An example of a simple recurrence relation is

s_{n+1} = s_n + (n + 1),

which is the recurrence relation for the sum of the integers from 1 to n + 1. This could also
be expressed as

s_n = s_{n−1} + n,

keeping in mind that as long as we set the proper initial values of the sequence, the recurrence
relation indices can have any constant amount added or subtracted.
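As a minimal illustration, the recurrence s_n = s_{n−1} + n with initial value s_1 = 1 can be evaluated by direct iteration (function name ours):

```python
def s(n):
    """Evaluate the recurrence s_n = s_{n-1} + n with s_1 = 1,
    i.e. the sum of the integers from 1 to n."""
    value = 1
    for k in range(2, n + 1):
        value = value + k   # one application of the recurrence
    return value

# s(10) gives 1 + 2 + ... + 10 = 55
```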
Version: 3 Owner: akrowne Author(s): akrowne


Chapter 99
11B39 Fibonacci and Lucas numbers
and polynomials and generalizations
99.1 Fibonacci sequence

The Fibonacci sequence, discovered by Leonardo Pisano Fibonacci, begins

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, . . .

The nth Fibonacci number is generated by adding the previous two. Thus, the Fibonacci
sequence has the recurrence relation

f_n = f_{n−1} + f_{n−2}

with f_0 = 0 and f_1 = 1. This recurrence relation can be solved into the closed form

f(n) = (φ^n − φ'^n)/√5,

where φ is the golden ratio (also see that entry for an explanation of φ'). Note that

lim_{n→∞} f_{n+1}/f_n = φ.

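The recurrence and the closed form can be checked against each other numerically (up to floating-point rounding; names ours):

```python
from math import sqrt

def fib(n):
    """f_0 = 0, f_1 = 1, f_n = f_{n-1} + f_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

phi = (1 + sqrt(5)) / 2        # the golden ratio
phi_prime = (1 - sqrt(5)) / 2  # its conjugate root

# Closed form agrees with the recurrence for small n:
assert all(round((phi**n - phi_prime**n) / sqrt(5)) == fib(n) for n in range(30))
```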
Version: 10 Owner: akrowne Author(s): akrowne


99.2 Hogatt's theorem

Hogatt's theorem states that every positive integer can be expressed as a sum of distinct
Fibonacci numbers.
For any positive integer k ∈ ℤ^+, there exists a unique positive integer n so that F_{n−1} < k ≤
F_n. We proceed by strong induction on n. For k = 0, 1, 2, 3, the property is true, as 0, 1, 2, 3
are themselves Fibonacci numbers. Suppose k ≥ 4 and that every integer less than k is a
sum of distinct Fibonacci numbers. Let n be the largest positive integer such that F_n < k.
We first note that if k − F_n > F_{n−1}, then

F_{n+1} ≥ k > F_n + F_{n−1} = F_{n+1},

giving us a contradiction. Hence k − F_n ≤ F_{n−1}, and consequently the positive integer
(k − F_n) can be expressed as a sum of distinct Fibonacci numbers. Moreover, this sum does
not contain the term F_n, as k − F_n ≤ F_{n−1} < F_n. Hence k = (k − F_n) + F_n is a sum of
distinct Fibonacci numbers, and Hogatt's theorem is proved by induction.
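The greedy step in the proof (repeatedly subtract the largest Fibonacci number not exceeding k) can be run directly; a Python sketch (function name ours):

```python
def fib_sum(k):
    """Greedy decomposition of k >= 1 into distinct Fibonacci numbers,
    mirroring the proof: repeatedly remove the largest F_n <= k."""
    fibs = [1, 2]
    while fibs[-1] < k:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    while k > 0:
        f = max(f for f in fibs if f <= k)
        parts.append(f)
        k -= f
    return parts

# fib_sum(100) -> [89, 8, 3], distinct Fibonacci numbers summing to 100
```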
Version: 4 Owner: mathcam Author(s): mathcam, alek thiery

99.3 Lucas numbers

The Lucas numbers are a slight variation on the Fibonacci numbers. These numbers follow the
same recursion

l_{n+1} = l_n + l_{n−1},

but have different initial conditions: l_1 = 1, l_2 = 3, leading to the sequence
1, 3, 4, 7, 11, 18, 29, 47, 76, 123, . . .
Lucas numbers satisfy the following property: l_n = f_{n−1} + f_{n+1}, where f_n is the nth
Fibonacci number.
Version: 3 Owner: drini Author(s): drini

99.4 golden ratio

The Golden Ratio, or φ, has the value

φ = 1.61803398874989484820 . . .

φ gets its rather illustrious name from the fact that the Greeks thought that a rectangle
with ratio of side lengths of about 1.6 was the most pleasing to the eye. Classical Greek
architecture is based on this premise.

Above: the golden rectangle; l/w = φ.

φ has plenty of interesting mathematical properties, however. Its value is exactly

(1 + √5)/2.

The value

(1 − √5)/2

is often called φ'. φ and φ' are the two roots of the recurrence relation given by the
Fibonacci sequence. The following identities hold for φ and φ':

φ − 1 = −φ'
φ − 1 = 1/φ
φ φ' = −1
φ + φ' = 1

and so on. These give us

1 + 1/φ = φ,

which implies

φ^{n−1} + φ^n = φ^{n+1}.
Version: 8 Owner: akrowne Author(s): akrowne


Chapter 100
11B50 Sequences (mod m)
100.1 Erdős-Ginzburg-Ziv theorem

If a_1, a_2, . . . , a_{2n−1} is a set of integers, then there exists a subset a_{i_1}, a_{i_2}, . . . , a_{i_n} of n integers
such that

a_{i_1} + a_{i_2} + ⋯ + a_{i_n} ≡ 0 (mod n).

REFERENCES
1. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996.

Version: 2 Owner: bbukh Author(s): bbukh


Chapter 101
11B57 Farey sequences; the sequences 1^k, 2^k, . . .
101.1 Farey sequence

The nth Farey sequence is the ascending sequence of all rationals {0 ≤ a/b ≤ 1 : b ≤ n}.

The first 5 Farey sequences are

F_1:  0/1 < 1/1
F_2:  0/1 < 1/2 < 1/1
F_3:  0/1 < 1/3 < 1/2 < 2/3 < 1/1
F_4:  0/1 < 1/4 < 1/3 < 1/2 < 2/3 < 3/4 < 1/1
F_5:  0/1 < 1/5 < 1/4 < 1/3 < 2/5 < 1/2 < 3/5 < 2/3 < 3/4 < 4/5 < 1/1
Farey sequences are a singularly useful tool in understanding the convergents that appear in
continued fractions. The convergents for any irrational α can be found: they are precisely
the closest numbers to α on the sequences F_n.
It is also of value to look at the sequences F_n as n grows. If a/b and c/d are reduced
representations of adjacent terms in some Farey sequence F_n (where b, d ≤ n), then they are
adjacent fractions; their difference is the least possible:

|a/b − c/d| = 1/(bd).

Furthermore, the first fraction to appear between the two in a Farey sequence is (a + c)/(b + d),
in sequence F_{b+d}, and (as written here) this fraction is already reduced.
An alternate view of the dynamics of how Farey sequences develop is given by Stern-Brocot
trees.
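The definition can be turned into a short brute-force generator, and the adjacency property checked directly; a Python sketch (function name ours; the fractions module keeps every a/b reduced):

```python
from fractions import Fraction

def farey(n):
    """The nth Farey sequence: ascending reduced fractions a/b in [0, 1]
    with denominator b <= n."""
    return sorted({Fraction(a, b) for b in range(1, n + 1) for a in range(b + 1)})

# Adjacent terms a/b < c/d always differ by exactly 1/(bd):
f5 = farey(5)
assert all(q - p == Fraction(1, p.denominator * q.denominator)
           for p, q in zip(f5, f5[1:]))
```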

Version: 4 Owner: ariels Author(s): ariels


Chapter 102
11B65 Binomial coefficients;
factorials; q-identities
102.1 Lucas's Theorem

Let m, n ∈ ℕ ∪ {0} be two natural numbers. If p is a prime number and

m = a_k p^k + a_{k−1} p^{k−1} + ⋯ + a_1 p + a_0,  n = b_k p^k + b_{k−1} p^{k−1} + ⋯ + b_1 p + b_0

are the base-p expansions of m and n, then the following congruence is true:

\binom{m}{n} ≡ \binom{a_0}{b_0} \binom{a_1}{b_1} ⋯ \binom{a_k}{b_k}  (mod p).

Note: the binomial coefficient is defined in the usual way, namely

\binom{x}{y} = x! / (y! (x − y)!)

if x ≥ y, and 0 otherwise (of course, x and y are natural numbers).
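The congruence gives a fast way to compute a binomial coefficient mod p digit by digit; a Python sketch (function name ours; math.comb returns 0 when the lower index exceeds the upper, matching the convention above):

```python
from math import comb

def lucas_binom_mod(m, n, p):
    """Compute C(m, n) mod p, for prime p, via Lucas's theorem:
    multiply binomial coefficients of the base-p digits."""
    result = 1
    while m or n:
        result = result * comb(m % p, n % p) % p  # digit-wise factor
        m //= p
        n //= p
    return result
```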
Version: 1 Owner: slash Author(s): slash

102.2 binomial theorem

The binomial theorem is a formula for the expansion of (a + b)^n, for n a positive integer and
a and b any two real (or complex) numbers, into a sum of powers of a and b. More precisely,

(a + b)^n = a^n + \binom{n}{1} a^{n−1} b + \binom{n}{2} a^{n−2} b^2 + ⋯ + b^n.

For example, if n is 3 or 4, we have:

(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3
(a + b)^4 = a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4.
Version: 10 Owner: KimJ Author(s): KimJ


Chapter 103
11B68 Bernoulli and Euler numbers
and polynomials
103.1 Bernoulli number

Let B_r be the rth Bernoulli periodic function. Then the rth Bernoulli number is

B_r := B_r(0).

One can see that B_{2r+1} = 0 for r ≥ 1. Numerically, B_0 = 1, B_1 = −1/2, B_2 = 1/6,
B_4 = −1/30, . . .

Version: 4 Owner: KimJ Author(s): KimJ

103.2 Bernoulli periodic function

Let br be the rth Bernoulli polynomial. Then the rth Bernoulli periodic function Br (x) is
defined as the periodic function of period 1 which coincides with br on [0, 1].
Version: 5 Owner: KimJ Author(s): KimJ


103.3 Bernoulli polynomial

The Bernoulli polynomials are the sequence {b_r(x)}_{r=0}^∞ of polynomials defined on [0, 1] by
the conditions:

b_0(x) = 1,
b_r'(x) = r b_{r−1}(x),  r ≥ 1,
∫_0^1 b_r(x) dx = 0,  r ≥ 1.

These assumptions imply the identity

Σ_{r=0}^∞ b_r(x) y^r/r! = y e^{xy}/(e^y − 1),

allowing us to calculate the b_r. We have

b_0(x) = 1
b_1(x) = x − 1/2
b_2(x) = x^2 − x + 1/6
b_3(x) = x^3 − (3/2)x^2 + (1/2)x
b_4(x) = x^4 − 2x^3 + x^2 − 1/30
..
.

Version: 6 Owner: KimJ Author(s): KimJ

103.4 generalized Bernoulli number

Let χ be a non-trivial primitive character mod m. The generalized Bernoulli numbers B_{n,χ}
are given by

Σ_{a=1}^m χ(a) t e^{at} / (e^{mt} − 1) = Σ_{n=0}^∞ B_{n,χ} t^n / n!.

They are members of the field ℚ(χ) generated by the values of χ.


Version: 3 Owner: sucrose Author(s): sucrose

Chapter 104
11B75 Other combinatorial number
theory
104.1 Erdős-Heilbronn conjecture

Let A ⊆ ℤ_p be a set of residues modulo p with |A| = k, and let h be a positive integer. Then

hA = { a_1 + a_2 + ⋯ + a_h | a_1, a_2, . . . , a_h are distinct elements of A }

has cardinality at least min(p, hk − h^2 + 1). This was conjectured by Erdős and Heilbronn
in 1964 [1]. The first proof was given by Dias da Silva and Hamidoune in 1994.

REFERENCES
1. Paul Erdős and Hans Heilbronn. On the addition of residue classes mod p. Acta Arith., 9:149–159, 1964. Zbl 0156.04801.
2. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets,
volume 165 of GTM. Springer, 1996. Zbl 0859.11003.

Version: 7 Owner: bbukh Author(s): bbukh

104.2 Freiman isomorphism

Let A and B be subsets of abelian groups G_A and G_B respectively. A Freiman isomorphism
of order s is a bijective mapping f : A → B such that

a_1 + a_2 + ⋯ + a_s = a'_1 + a'_2 + ⋯ + a'_s

holds if and only if

f(a_1) + f(a_2) + ⋯ + f(a_s) = f(a'_1) + f(a'_2) + ⋯ + f(a'_s).

The Freiman isomorphism is a restriction of the conventional notion of a group isomorphism
to a limited number of group operations. In particular, a Freiman isomorphism of order s is
also a Freiman isomorphism of order s − 1, and the mapping is a Freiman isomorphism of
every order precisely when it is a conventional isomorphism.
Freiman isomorphisms were introduced by Freiman in his monograph [1] to build a general
theory of set addition that is independent of the underlying group.
The number of equivalence classes under Freiman isomorphisms of order 2 is n^{2n(1+o(1))} [2].

REFERENCES
1. Gregory Freiman. Foundations of Structural Theory of Set Addition, volume 37 of Translations of Mathematical Monographs. AMS, 1973. Zbl 0271.10044.
2. Sergei V. Konyagin and Vsevolod F. Lev. Combinatorics and linear algebra of Freiman's isomorphism. Mathematika, 47:39–51, 2000. Available at http://math.haifa.ac.il/~seva/pub_list.html.
3. Melvyn B. Nathanson. Additive Number Theory: Inverse Problems and Geometry of Sumsets, volume 165 of GTM. Springer, 1996. Zbl 0859.11003.

Version: 3 Owner: bbukh Author(s): bbukh

104.3 sum-free

A set A is called sum-free if the equation a_1 + a_2 = a_3 has no solutions in elements
of A. Equivalently, a set is sum-free if it is disjoint from its 2-fold sumset, i.e., A ∩ 2A = ∅.

Version: 2 Owner: bbukh Author(s): bbukh


Chapter 105
11B83 Special sequences and
polynomials
105.1 Beatty sequence

The integer sequence

B(α, β) := ( ⌊(n − β)/α⌋ )_{n=1}^∞

is called the Beatty sequence with density α, slope 1/α, offset β, and y-intercept −β/α.

Sometimes a sequence of the above type is called a floor Beatty sequence, and denoted
B^{(f)}(α, β), while an integer sequence

B^{(c)}(α, β) := ( ⌈(n − β)/α⌉ )_{n=1}^∞

is called a ceiling Beatty sequence.

References

[1] M. Lothaire, Algebraic combinatorics on words, vol. 90, Cambridge University Press,
Cambridge, 2002, ISBN 0-521-81220-8, available online at http://www-igm.univ-mlv.fr/~berstel.
A collective work by Jean Berstel, Dominique Perrin, Patrice Seebold, Julien Cassaigne,
Aldo De Luca, Steffano Varricchio, Alain Lascoux, Bernard Leclerc, Jean-Yves Thibon,
Veronique Bruyere, Christiane Frougny, Filippo Mignosi, Antonio Restivo, Christophe
Reutenauer, Dominique Foata, Guo-Niu Han, Jacques Desarmenien, Volker Diekert,
Tero Harju, Juhani Karhumaki and Wojciech Plandowski; with a preface by Berstel
and Perrin. MR 1905123
Version: 8 Owner: Kevin OBryant Author(s): Kevin OBryant

105.2 Beatty's theorem

If p and q are positive irrationals such that

1/p + 1/q = 1,

then the sequences

(⌊np⌋)_{n=1}^∞ = ⌊p⌋, ⌊2p⌋, ⌊3p⌋, . . .
(⌊nq⌋)_{n=1}^∞ = ⌊q⌋, ⌊2q⌋, ⌊3q⌋, . . .

where ⌊x⌋ denotes the floor (or greatest integer function) of x, constitute a partition of the
set of positive integers.
That is, every positive integer is a member of exactly one of the two sequences, and the
two sequences have no common terms.
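The partition can be sanity-checked numerically for the classical pair p = golden ratio, q = p/(p − 1) = p^2; a sketch (floating point is safe here only because n stays small):

```python
from math import floor, sqrt

# p = golden ratio, q chosen so that 1/p + 1/q = 1.
p = (1 + sqrt(5)) / 2
q = p / (p - 1)

seq_p = {floor(n * p) for n in range(1, 1000)}
seq_q = {floor(n * q) for n in range(1, 1000)}

# Disjoint, and together covering an initial segment of the positive
# integers (checked up to 500, well inside both truncated sequences).
first = set(range(1, 500))
assert seq_p & seq_q == set()
assert first <= (seq_p | seq_q)
```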
Version: 2 Owner: drini Author(s): drini

105.3 Fraenkel's partition theorem

Fraenkel's partition theorem is a generalization of Beatty's theorem. Set

B(α, β) := ( ⌊(n − β)/α⌋ )_{n=1}^∞.

We say that two sequences partition ℕ = {1, 2, 3, . . .} if the sequences are disjoint and their
union is ℕ.
Fraenkel's Partition Theorem: The sequences B(α, β) and B(α', β') partition ℕ if and
only if the following five conditions are satisfied.
1. 0 < α < 1.
2. α + α' = 1.
3. 0 ≤ β + β' < 1.
4. If α is irrational, then β + β' = 0 and kα + β ∉ ℤ for 2 ≤ k ∈ ℕ.
5. If α is rational (say q ∈ ℕ is minimal with qα ∈ ℕ), then 1/q ≤ β + β' and
⌈qβ⌉ + ⌈qβ'⌉ = 1.
References

[1] Aviezri S. Fraenkel, The bracket function and complementary sets of integers, Canad.
J. Math. 21 (1969), 6–27. MR 38:3214
[2] Kevin O'Bryant, Fraenkel's partition and Brown's decomposition, arXiv:math.NT/0305133.
Version: 1 Owner: Kevin OBryant Author(s): Kevin OBryant

105.4 Sierpinski numbers

An integer k is a Sierpinski number if for every positive integer n, the number k·2^n + 1 is
composite.
That such numbers exist is amazing, and even more surprising is that there are infinitely
many of them (in fact, infinitely many odd ones). The smallest known Sierpinski number
is 78557, but it is not known whether or not this is the smallest one. The smallest number
m for which it is unknown whether or not m is a Sierpinski number is 4847.
A process for generating Sierpinski numbers using covering sets of primes can be found at
http://www.glasgowg43.freeserve.co.uk/siercvr.htm
Visit
http://www.seventeenorbust.com/
for the distributed computing effort to show that 78557 is indeed the smallest Sierpinski
number (or find a smaller one).
Similarly, a Riesel number is a number k such that for every positive integer n, the number
k·2^n − 1 is composite. The smallest known Riesel number is 509203, but again, it is not
known for sure that this is the smallest.
Version: 1 Owner: mathcam Author(s): mathcam

105.5 palindrome

A palindrome is a number which yields itself when its digits are reversed. Some palindromes
are:

121
2002
314159951413

Clearly one can construct arbitrary-length palindromes by taking any number and appending
to it a reversed copy of itself or of all but the last digit.
The concept of palindromes can also be extended to sequences and strings.
Version: 1 Owner: akrowne Author(s): akrowne

105.6 proof of Beatty's theorem

We define a_n := np and b_n := nq. Since p and q are irrational, so are a_n and b_n.

It is also the case that a_n ≠ b_m for all m and n, for if np = mq, then q = 1 + n/m would be
rational.

The theorem is equivalent to the statement that for each integer N ≥ 1 exactly one element
of {a_n} ∪ {b_n} lies in (N, N + 1).

Choose an integer N. Let s(N) be the number of elements of {a_n} ∪ {b_n} less than N.

a_n < N ⟺ np < N ⟺ n < N/p,

so there are ⌊N/p⌋ elements of {a_n} less than N, and likewise ⌊N/q⌋ elements of {b_n}.
By definition,

N/p − 1 < ⌊N/p⌋ < N/p,
N/q − 1 < ⌊N/q⌋ < N/q,

and summing these inequalities gives N − 2 < s(N) < N, which gives s(N) = N − 1, since
s(N) is an integer.

The number of elements of {a_n} ∪ {b_n} lying in (N, N + 1) is then s(N + 1) − s(N) = 1.

Version: 4 Owner: lieven Author(s): lieven


105.7 square-free sequence

A square-free sequence is a sequence which has no adjacent repeating subsequences of any


length.
The name square-free comes from notation: Let {s} be a sequence. Then {s, s} is also a
sequence, which we write compactly as {s2 }. In the rest of this entry we use a compact
notation, lacking commas or braces. This notation is commonly used when dealing with
sequences in the capacity of strings. Hence we can write {s, s} = ss = s2 .
Some examples:
xabcabcx = x(abc)2 x, not a square-free sequence.
abcdabc cannot have any subsequence written in square notation, hence it is a squarefree sequence.
ababab = (ab)3 = ab(ab)2 , not a square-free sequence.
Note that, while notationally similar to the number-theoretic sense of square-free, the two concepts are distinct. For example, for integers $a$ and $b$ the product $aba = a^2b$ contains the square $a^2$. But as a sequence, $aba = \{a, b, a\}$, clearly lacking any commutativity that might allow us to shift elements. Hence, the sequence $aba$ is square-free.
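As a quick illustration (the helper below is ours, not part of the entry), a brute-force check for adjacent repeated blocks can be written in a few lines:

```python
def is_square_free(seq):
    """True if seq contains no two identical adjacent blocks, i.e. no
    factor of the form ss anywhere in the sequence."""
    n = len(seq)
    for start in range(n):
        for length in range(1, (n - start) // 2 + 1):
            if seq[start:start + length] == seq[start + length:start + 2 * length]:
                return False
    return True

assert not is_square_free("xabcabcx")   # contains (abc)^2
assert is_square_free("abcdabc")
assert not is_square_free("ababab")     # contains (ab)^2
```

This runs in cubic time, which is fine for short strings; much faster algorithms exist but are not needed to check the examples above.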
Version: 6 Owner: akrowne Author(s): akrowne

105.8

superincreasing sequence

A sequence $s_1, s_2, \ldots$ is superincreasing if
$$s_n > \sum_{i=1}^{n-1} s_i$$
That is, any element of the sequence is greater than all of the previous elements added together. A commonly used superincreasing sequence is that of powers of two ($s_i = 2^i$).

Suppose $x = \sum_{i=1}^{n} a_i s_i$. If $s$ is a superincreasing sequence and every $a_i \in \{0, 1\}$, then we can always determine the $a_i$s simply by knowing $x$ (this is analogous to the fact that we can always determine which bits are on and off in the binary bitstring representing a number, given that number).
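The recovery argument is a greedy scan from the largest term down: the largest $s_i \leq x$ must have $a_i = 1$, precisely because $s_i$ exceeds the sum of everything smaller. A sketch (ours, not part of the entry):

```python
def decode_superincreasing(x, s):
    """Recover the 0/1 coefficients a_i with x = sum(a_i * s_i), scanning
    the superincreasing sequence s from its largest element down."""
    a = [0] * len(s)
    for i in range(len(s) - 1, -1, -1):
        if s[i] <= x:
            a[i] = 1
            x -= s[i]
    if x != 0:
        raise ValueError("x is not a subset sum of s")
    return a

s = [2, 4, 8, 16, 32]   # powers of two: each term exceeds the sum of the rest
assert decode_superincreasing(22, s) == [1, 1, 0, 1, 0]   # 22 = 2 + 4 + 16
```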
Version: 2 Owner: akrowne Author(s): akrowne

Chapter 106
11B99 Miscellaneous
106.1

Lychrel number

A Lychrel number is a number which never yields a palindrome in the iterative process
of adding to itself a copy of itself with digits reversed. For example, if we start with the
number 983 we get:
983 + 389 = 1372
1372 + 2731 = 4103
4103 + 3014 = 7117
So in 3 steps we get a palindrome, hence 983 is not a Lychrel number.
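The reverse-and-add iteration above is easy to sketch in code (this helper is ours, not part of the entry):

```python
def reverse_and_add_steps(n, max_steps=1000):
    """Count reverse-and-add steps until n becomes a palindrome,
    or return None if none appears within max_steps iterations."""
    for step in range(1, max_steps + 1):
        n += int(str(n)[::-1])
        if str(n) == str(n)[::-1]:
            return step
    return None

assert reverse_and_add_steps(983) == 3           # 983 -> 1372 -> 4103 -> 7117
assert reverse_and_add_steps(196, 100) is None   # the first Lychrel candidate
```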
In fact, it is not known if there exist any Lychrel numbers in base 10 (in base 2, for instance, there have been numbers proven to be Lychrel numbers¹). The first Lychrel candidate is
196:
196 + 691 = 887
887 + 788 = 1675
1675 + 5761 = 7436
7436 + 6347 = 13783
13783 + 38731 = 52514
¹ [2] informs us that Ronald Sprague has proved that the number 10110 in base 2 is a Lychrel number.

52514 + 41525 = 94039


94039 + 93049 = 187088
187088 + 880781 = 1067869
...
This has been followed out to millions of digits, with no palindrome found in the sequence.
The following table gives the number of Lychrel candidates found within ascending ranges:

    Range                         Possible Lychrels
    0 - 100                       0
    100 - 1,000                   2
    1,000 - 10,000                3
    10,000 - 100,000              69
    100,000 - 1,000,000           99
    10,000,000 - 100,000,000      1728
    100,000,000 - 1,000,000,000   29,813

REFERENCES
1. Wade VanLandingham, 196 And Other Lychrel Numbers
2. John Walker, Three Years of Computing

Version: 5 Owner: akrowne Author(s): akrowne

106.2

closed form

A closed form for a sequence is a function which gives the value of the sequence at index n using only one parameter, n itself. This is in contrast to the recurrence relation form, which can have all of the
previous values of the sequence as parameters.
The benefit of the closed form is that one does not have to calculate all of the previous values
of the sequence to get the next value. This is not too useful if one wants to print out or
utilize all of the values of a sequence up to some n, but it is very useful to get the value of
the sequence just at some index n.
There are many techniques used to find a closed-form solution for a recurrence relation. Some are:

- Repeated substitution. Replace each $s_k$ in the expression of $s_n$ (with $k < n$) with its recurrence relation representation. Repeat again on the resulting expression, until some pattern is evident.
- Estimate an upper bound for $s_n$ in terms of $n$. Then, solve for the unknowns (say there are $r$ unknowns) by finding the first $r$ values of the recurrence relation and solving the linear system formed by them and the unknowns.
- Find the characteristic equation of the recurrence relation and solve for the roots. If the recurrence relation is not homogeneous, then you'll have to apply a method such as the method of undetermined coefficients.
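As an illustration (the recurrence below is our own example, not from the entry), compare computing $s_0 = 1$, $s_n = 3s_{n-1} + 4$ by iteration against its closed form $s_n = 3^{n+1} - 2$, which can be found by the substitution or characteristic-equation methods above:

```python
def s_rec(n):
    """Recurrence form: must compute all previous values to reach s_n."""
    s = 1
    for _ in range(n):
        s = 3 * s + 4
    return s

def s_closed(n):
    """Closed form: s_n = 3**(n+1) - 2 satisfies s_0 = 1, s_n = 3*s_{n-1} + 4,
    since s_n + 2 = 3*(s_{n-1} + 2) and s_0 + 2 = 3."""
    return 3 ** (n + 1) - 2

assert all(s_rec(n) == s_closed(n) for n in range(20))
```

The closed form gives $s_n$ in constant arithmetic steps, while the recurrence needs $n$ steps, illustrating the benefit described above.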
Version: 3 Owner: akrowne Author(s): akrowne


Chapter 107
11C08 Polynomials
107.1

content of a polynomial

Let $P = a_0 + a_1 x + \ldots + a_n x^n \in \mathbb{Z}[x]$ be a polynomial with integer coefficients. The content of $P$ is the greatest common divisor of the coefficients of $P$:
$$c(P) = \gcd(a_0, a_1, \ldots, a_n)$$
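A one-line sketch of the definition (ours, not part of the entry), with a polynomial given by its coefficient list:

```python
from math import gcd
from functools import reduce

def content(coeffs):
    """c(P) = gcd(a_0, ..., a_n) for P given as [a_0, a_1, ..., a_n]."""
    return reduce(gcd, (abs(c) for c in coeffs))

assert content([6, -4, 10]) == 2   # c(6 - 4x + 10x^2) = gcd(6, 4, 10) = 2
```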
Version: 1 Owner: Daume Author(s): Daume

107.2

cyclotomic polynomial

For any positive integer $n$, we define $\Phi_n(x)$, the $n$th cyclotomic polynomial, by
$$\Phi_n(x) = \prod_{\substack{j=1 \\ (j,n)=1}}^{n} \left(x - \zeta_n^j\right)$$
where $\zeta_n = e^{2\pi i/n}$, i.e. $\zeta_n$ is an $n$th root of unity.

$\Phi_n(x)$ is an irreducible polynomial of degree $\varphi(n)$ in $\mathbb{Q}[x]$ for all $n \in \mathbb{Z}^+$.
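One standard way to compute $\Phi_n$ with integer arithmetic (a sketch of ours, not part of the entry) uses the identity $x^n - 1 = \prod_{d \mid n} \Phi_d(x)$ together with exact polynomial division; all the divisors $\Phi_d$ are monic, so integer division is exact:

```python
def polydiv(num, den):
    """Exact division of integer polynomials (coefficient lists, low degree
    first); den is assumed monic (or with unit leading coefficient)."""
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = num[i + len(den) - 1] // den[-1]
        for j, c in enumerate(den):
            num[i + j] -= quot[i] * c
    return quot

def cyclotomic(n, memo={}):
    """Coefficient list of Phi_n, via x^n - 1 = prod over d|n of Phi_d(x)."""
    if n in memo:
        return memo[n]
    poly = [-1] + [0] * (n - 1) + [1]          # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = polydiv(poly, cyclotomic(d, memo))
    memo[n] = poly
    return poly

assert cyclotomic(1) == [-1, 1]             # x - 1
assert cyclotomic(6) == [1, -1, 1]          # x^2 - x + 1
assert cyclotomic(12) == [1, 0, -1, 0, 1]   # x^4 - x^2 + 1
```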


Version: 3 Owner: saforres Author(s): saforres


107.3

height of a polynomial

Let $P = a_0 + a_1 x + \ldots + a_n x^n \in \mathbb{C}[x]$ be a polynomial with complex coefficients. The height of $P$ is
$$H(P) = \max\{|a_0|, |a_1|, \ldots, |a_n|\}.$$
Version: 1 Owner: Daume Author(s): Daume

107.4

length of a polynomial

Let $P = a_0 + a_1 x + \ldots + a_n x^n \in \mathbb{C}[x]$ be a polynomial with complex coefficients. The length of $P$ is
$$L(P) = |a_0| + |a_1| + \ldots + |a_n| = \sum_{i=0}^{n} |a_i|$$

Version: 2 Owner: Daume Author(s): Daume

107.5

proof of Eisenstein criterion

Let $f(x) \in R[x]$ be a polynomial satisfying Eisenstein's criterion with prime $p$.

Suppose that $f(x) = g(x)h(x)$ with $g(x), h(x) \in F[x]$, where $F$ is the field of fractions of $R$. A lemma of Gauss states that there exist $g'(x), h'(x) \in R[x]$ such that $f(x) = g'(x)h'(x)$, i.e. any factorization can be converted to a factorization in $R[x]$.

Let $f(x) = \sum_{i=0}^{n} a_i x^i$, $g'(x) = \sum_{j=0}^{\ell} b_j x^j$, $h'(x) = \sum_{k=0}^{m} c_k x^k$ be the expansions of $f(x)$, $g'(x)$, and $h'(x)$ respectively.

Let $\varphi \colon R[x] \to (R/pR)[x]$ be the natural homomorphism from $R[x]$ to $(R/pR)[x]$. Note that since $p \mid a_i$ for $i < n$ and $p \nmid a_n$, we have $\varphi(a_i) = 0$ for $i < n$ and $\varphi(a_n) = \alpha \neq 0$:
$$\varphi(f(x)) = \varphi\left(\sum_{i=0}^{n} a_i x^i\right) = \sum_{i=0}^{n} \varphi(a_i) x^i = \varphi(a_n) x^n = \alpha x^n$$

Therefore we have $\alpha x^n = \varphi(f(x)) = \varphi(g'(x)h'(x)) = \varphi(g'(x))\varphi(h'(x))$, so we must have $\varphi(g'(x)) = \beta x^{\ell'}$ and $\varphi(h'(x)) = \gamma x^{m'}$ for some $\beta, \gamma \in R/pR$ and some integers $\ell', m'$.

Clearly $\ell' \leq \deg(g'(x)) = \ell$ and $m' \leq \deg(h'(x)) = m$, and therefore since $\ell' + m' = n = \ell + m$, we must have $\ell' = \ell$ and $m' = m$. Thus $\varphi(g'(x)) = \beta x^{\ell}$ and $\varphi(h'(x)) = \gamma x^{m}$.

If $\ell > 0$, then $\varphi(b_i) = 0$ for $i < \ell$. In particular, $\varphi(b_0) = 0$, hence $p \mid b_0$. Similarly if $m > 0$, then $p \mid c_0$.

Since $f(x) = g'(x)h'(x)$, by equating coefficients we see that $a_0 = b_0 c_0$.

If $\ell > 0$ and $m > 0$, then $p \mid b_0$ and $p \mid c_0$, which implies that $p^2 \mid a_0$. But this contradicts our assumptions on $f(x)$, and therefore we must have $\ell = 0$ or $m = 0$, that is, we must have a trivial factorization. Therefore $f(x)$ is irreducible.
Version: 6 Owner: saforres Author(s): saforres

107.6

proof that the cyclotomic polynomial is irreducible

We first prove that $\Phi_n(x) \in \mathbb{Z}[x]$. The field extension $\mathbb{Q}(\zeta_n)$ of $\mathbb{Q}$ is the splitting field of the polynomial $x^n - 1 \in \mathbb{Q}[x]$, since it splits this polynomial and is generated as an algebra by a single root of the polynomial. Since splitting fields are normal, the extension $\mathbb{Q}(\zeta_n)/\mathbb{Q}$ is a Galois extension. Any element of the Galois group, being a field automorphism, must map $\zeta_n$ to another root of unity of exact order $n$. Therefore, since the Galois group of $\mathbb{Q}(\zeta_n)/\mathbb{Q}$ permutes the roots of $\Phi_n(x)$, it must fix the coefficients of $\Phi_n(x)$, so by Galois theory these coefficients are in $\mathbb{Q}$. Moreover, since the coefficients are algebraic integers, they must be in $\mathbb{Z}$ as well.

Let $f(x)$ be the minimal polynomial of $\zeta_n$ in $\mathbb{Q}[x]$. Then $f(x)$ has integer coefficients as well, since $\zeta_n$ is an algebraic integer. We will prove $f(x) = \Phi_n(x)$ by showing that every root of $\Phi_n(x)$ is a root of $f(x)$. We do so via the following claim:

Claim: For any prime $p$ not dividing $n$, and any primitive $n$th root of unity $\zeta \in \mathbb{C}$, if $f(\zeta) = 0$ then $f(\zeta^p) = 0$.

This claim does the job, since we know $f(\zeta_n) = 0$, and any other primitive $n$th root of unity can be obtained from $\zeta_n$ by successively raising $\zeta_n$ to prime powers $p$ not dividing $n$ a finite number of times¹.

To prove this claim, consider the factorization $x^n - 1 = f(x)g(x)$ for some polynomial $g(x) \in \mathbb{Z}[x]$. Writing $\mathcal{O}$ for the ring of integers of $\mathbb{Q}(\zeta_n)$, we treat the factorization as taking place in $\mathcal{O}[x]$ and proceed to mod out both sides of the factorization by any prime ideal $\mathfrak{p}$ of $\mathcal{O}$ lying over $(p)$. Note that the polynomial $x^n - 1$ has no repeated roots mod $\mathfrak{p}$, since its derivative $nx^{n-1}$ is relatively prime to $x^n - 1$ mod $\mathfrak{p}$. Therefore, if $f(\zeta) = 0 \bmod \mathfrak{p}$, then $g(\zeta) \neq 0 \bmod \mathfrak{p}$, and applying the $p$th power Frobenius map to both sides yields $g(\zeta^p) \neq 0 \bmod \mathfrak{p}$. This means that $g(\zeta^p)$ cannot be $0$ in $\mathbb{C}$, because it doesn't even equal $0$ mod $\mathfrak{p}$. However, $\zeta^p$ is a root of $x^n - 1$, so if it is not a root of $g$, it must be a root of $f$, and so we have $f(\zeta^p) = 0$, as desired.

¹ Actually, if one applies Dirichlet's theorem on primes in arithmetic progressions here, it turns out that one prime is enough, but we do not need such a sharp result here.

Version: 4 Owner: djao Author(s): djao


Chapter 108
11D09 Quadratic and bilinear
equations
108.1

Pell's equation and simple continued fractions

Let $d$ be a positive integer which is not a perfect square, and let $(x, y)$ be a solution of $x^2 - dy^2 = 1$. Then $\frac{x}{y}$ is a convergent in the simple continued fraction expansion of $\sqrt{d}$.

Suppose we have a non-trivial solution $x, y$ of Pell's equation, i.e. $y \neq 0$, and let $x, y$ both be positive integers. From
$$\left(\frac{x}{y}\right)^2 = d + \frac{1}{y^2}$$
we see that $\left(\frac{x}{y}\right)^2 > d$, so we have $\frac{x}{y} > \sqrt{d}$. So we get
$$\frac{x}{y} - \sqrt{d} = \frac{1}{y^2\left(\frac{x}{y} + \sqrt{d}\right)} < \frac{1}{y^2 \cdot 2\sqrt{d}} < \frac{1}{2y^2}.$$
This implies that $\frac{x}{y}$ is a convergent of the continued fraction of $\sqrt{d}$.

Version: 3 Owner: Thomas Heye Author(s): Thomas Heye
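The theorem suggests an algorithm for Pell's equation: generate the convergents $h/k$ of $\sqrt{d}$ by the standard continued fraction recurrence and stop at the first one solving $h^2 - dk^2 = 1$. A sketch (ours, not part of the entry):

```python
from math import isqrt

def pell_fundamental(d):
    """Smallest positive solution of x^2 - d*y^2 = 1, found by walking the
    convergents h/k of the continued fraction of sqrt(d)."""
    a0 = isqrt(d)
    assert a0 * a0 != d, "d must not be a perfect square"
    m, c, a = 0, 1, a0          # state of the expansion of sqrt(d)
    h_prev, h = 1, a0           # convergent numerators
    k_prev, k = 0, 1            # convergent denominators
    while h * h - d * k * k != 1:
        m = a * c - m
        c = (d - m * m) // c
        a = (a0 + m) // c
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k

assert pell_fundamental(2) == (3, 2)   # 3^2 - 2*2^2 = 1
```

Even for small $d$ the fundamental solution can be enormous (the classical case $d = 61$ gives a ten-digit $x$), which is why the continued fraction approach beats brute force.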

Chapter 109
11D41 Higher degree equations;
Fermat's equation
109.1

Beal conjecture

The Beal conjecture states:

Let $A, B, C, x, y, z$ be nonzero integers such that $x$, $y$, and $z$ are all $\geq 3$, and
$$A^x + B^y = C^z \qquad (109.1.1)$$
Then $A$, $B$, and $C$ (or any two of them) are not relatively prime.

It is clear that the famous statement known as Fermat's last theorem would follow from this stronger claim.
Solutions of equation (109.1.1) are not very scarce. One parametric solution is
$$[a(a^m + b^m)]^m + [b(a^m + b^m)]^m = (a^m + b^m)^{m+1}$$
for $m \geq 3$, and $a, b$ such that the terms are nonzero. But computerized searching brings forth quite a few additional solutions, such as:

    3^3 + 6^3 = 3^5
    3^9 + 54^3 = 3^11
    3^6 + 18^3 = 3^8
    7^6 + 7^7 = 98^3
    27^4 + 162^3 = 9^7
    211^3 + 3165^3 = 422^4
    386^3 + 4825^3 = 579^4
    307^3 + 614^4 = 5219^3
    5400^3 + 90^4 = 630^4
    217^3 + 5642^3 = 651^4
    271^3 + 813^4 = 7588^3
    602^3 + 903^4 = 8729^3
    624^3 + 14352^3 = 312^5
    1862^3 + 57722^3 = 3724^4
    2246^3 + 4492^4 = 74118^3
    1838^3 + 97414^3 = 5514^4

Mysteriously, the summands have a common factor > 1 in each instance.
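These identities can be checked exactly with arbitrary-precision integers; the sketch below (ours, not part of the entry) verifies a handful of the listed solutions and confirms the shared factor in the bases:

```python
from math import gcd

# a sample of the listed identities, as (A, x, B, y, C, z) with A^x + B^y = C^z
solutions = [
    (3, 3, 6, 3, 3, 5),
    (3, 9, 54, 3, 3, 11),
    (7, 6, 7, 7, 98, 3),
    (27, 4, 162, 3, 9, 7),
    (271, 3, 813, 4, 7588, 3),
]

for A, x, B, y, C, z in solutions:
    assert A**x + B**y == C**z      # the identity holds exactly
    assert gcd(A, B) > 1            # the bases share a common factor
```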


This conjecture is wanted in Texas, dead or alive. For the details, plus some additional
links, see Mauldin.
Version: 12 Owner: mathcam Author(s): mathcam, Larry Hammick

109.2

Euler quartic conjecture

Inspired by Fermat's last theorem, Euler conjectured that there are no positive integer solutions to the quartic equation
$$x^4 + y^4 + z^4 = w^4.$$
This conjecture was disproved by Elkies (1988), who found an infinite class of solutions. One of the first solutions discovered was
$$2682440^4 + 15365639^4 + 187960^4 = 20615673^4$$
Bibliography
Simon Singh - Fermat's Last Theorem

Version: 4 Owner: vladm Author(s): vladm

109.3

Fermat's last theorem

The Theorem
Fermat's last theorem was put forth by Pierre de Fermat around 1630. It states that the Diophantine equation ($a, b, c, n \in \mathbb{N}$)
$$a^n + b^n = c^n$$
has no non-zero solutions for $n > 2$.
History
Fermat's last theorem was actually a conjecture and remained unproved for over 300 years. It was finally proven in 1994 by Andrew Wiles, an English mathematician working at Princeton. It was always called a "theorem" due to Fermat's uncanny ability to propose true conjectures. Originally the statement was discovered by Fermat's son Clement-Samuel among margin notes that Fermat had made in his copy of Diophantus' Arithmetica. Fermat followed the statement of the conjecture with the infamous teaser:

"I have discovered a truly remarkable proof which this margin is too small to contain."
Over the years, Fermat's last theorem was proven for various sub-cases which required specific values of n, but no direct progress was made along these lines towards a general proof. These proofs were bittersweet victories, as each one still left an infinite number of cases unproved. Among the big names who took a crack at the theorem are Euler, Gauss, Germain, Cauchy, Dirichlet, and Legendre.
The theorem finally began to yield to direct attack in the 20th century.
Proof
In 1982 Gerhard Frey conjectured that if FLT has a solution $(a, b, c, n)$, then the elliptic curve defined by
$$y^2 = x(x - a^n)(x + b^n)$$
is semistable, but not modular. The above equation is known as Frey's equation, or the Frey curve. Ribet proved this conjecture in 1986.
The Taniyama-Shimura conjecture, which appeared in an early form in 1955, says that all elliptic curves are modular. If in fact this conjecture were to be proven in the semistable
case, then it would follow that the Frey equation would be semistable and modular, hence
FLT could have no solutions.
After a flawed attempt in 1993, Wiles along with Richard Taylor successfully proved the semistable case of the Taniyama-Shimura conjecture in 1994, hence proving Fermat's last theorem. The proof appears in the May 1995 Annals of Mathematics, Vol. 141, No. 3. It is 129 pages.
Speculation
Wiles' proof rests upon the work of hundreds of mathematicians and the mathematics created up to and including the 20th century. We cannot imagine how Fermat's last theorem could be proved without these advanced mathematical tools, which include group theory and Galois theory, the theory of modular forms, Riemannian topology, and the theory of elliptic equations.

Could Fermat, then, have possibly had a proof to his own conjecture, in the year 1630? It doesn't seem likely, given the requisite mathematics behind the proof as we know it. Assuming Fermat's teaser was truthful, and Fermat was not in error, this paradox has led some to (hopefully) jokingly attribute supernatural abilities to Fermat.

A more interesting possibility is that there is yet another proof, which is elementary and utilizes no more knowledge than Fermat had available in his day.

Most mathematicians, however, think that Fermat was just in error. It is also possible that he realized later that he didn't have a solution, but of course did not amend the margin notes where he wrote his tantalizing statement. Still, we cannot rule out the existence of a simpler proof, so for some, the search continues...
Further Reading
Fermat's Last Theorem, by J. J. O'Connor and E. F. Robertson.
Fermat's Last Theorem, web site by David Shay.
Fermat's Enigma (book, offline), by Simon Singh.
Version: 9 Owner: akrowne Author(s): akrowne


Chapter 110
11D79 Congruences in many
variables
110.1

Chinese remainder theorem

Suppose we have a set of $n$ congruences of the form
$$x \equiv a_1 \pmod{p_1}$$
$$x \equiv a_2 \pmod{p_2}$$
$$\vdots$$
$$x \equiv a_n \pmod{p_n}$$
where $p_1, p_2, \ldots, p_n$ are relatively prime. Let
$$P = \prod_{i=1}^{n} p_i$$
and, for all $i \in \mathbb{N}$ ($1 \leq i \leq n$), let $y_i$ be an integer that satisfies
$$y_i \cdot \frac{P}{p_i} \equiv 1 \pmod{p_i}$$
Then one solution of these congruences is
$$x_0 = \sum_{i=1}^{n} a_i y_i \frac{P}{p_i}$$
Any $x \in \mathbb{Z}$ satisfies the set of congruences if and only if it satisfies
$$x \equiv x_0 \pmod{P}$$

The Chinese remainder theorem is said to have been used to count the size of the ancient
Chinese armies (i.e., the soldiers would split into groups of 3, then 5, then 7, etc, and the
leftover soldiers from each grouping would be counted).
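The construction in the statement translates directly into code; this sketch (ours, not part of the entry) uses Python's built-in modular inverse for the $y_i$:

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod p_i) for pairwise coprime moduli p_i, following
    the construction above; returns the solution x_0 reduced mod P."""
    P = prod(moduli)
    x0 = 0
    for a, p in zip(residues, moduli):
        q = P // p                # the factor P / p_i
        y = pow(q, -1, p)         # y_i with y_i * (P / p_i) = 1 (mod p_i)
        x0 += a * y * q
    return x0 % P

# the army-counting example: remainders 2, 3, 2 in groups of 3, 5, 7
assert crt([2, 3, 2], [3, 5, 7]) == 23
```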
Version: 2 Owner: vampyr Author(s): vampyr

110.2

Chinese remainder theorem proof

We first prove the following lemma: if
$$a \equiv b \pmod{p}, \qquad a \equiv b \pmod{q}, \qquad \gcd(p, q) = 1$$
then
$$a \equiv b \pmod{pq}$$

We know that for some $k \in \mathbb{Z}$, $a - b = kp$; likewise, for some $j \in \mathbb{Z}$, $a - b = jq$, so $kp = jq$. Therefore $kp - jq = 0$.

It is a well-known theorem that, given $a, b, c, x_0, y_0 \in \mathbb{Z}$ such that $x_0 a + y_0 b = c$ and $d = \gcd(a, b)$, any solutions to the Diophantine equation $ax + by = c$ are given by
$$x = x_0 + n\frac{b}{d}, \qquad y = y_0 - n\frac{a}{d}$$
where $n \in \mathbb{Z}$.

We apply this theorem to the Diophantine equation $kp - jq = 0$. Clearly one solution of this Diophantine equation is $k = 0$, $j = 0$. Since $\gcd(q, p) = 1$, all solutions of this equation are given by $k = nq$ and $j = np$ for any $n \in \mathbb{Z}$. So we have $a - b = npq$; therefore $pq$ divides $a - b$, so $a \equiv b \pmod{pq}$, thus completing the lemma.

Now, to prove the Chinese remainder theorem, we first show that $y_i$ must exist for any natural $i$ where $1 \leq i \leq n$. If
$$y_i \cdot \frac{P}{p_i} \equiv 1 \pmod{p_i}$$
then by definition there exists some $k \in \mathbb{Z}$ such that
$$y_i \cdot \frac{P}{p_i} - 1 = kp_i$$
which in turn implies that
$$y_i \cdot \frac{P}{p_i} - kp_i = 1$$
This is a Diophantine equation with $y_i$ and $k$ being the unknown integers. It is a well-known theorem that a Diophantine equation of the form
$$ax + by = c$$
has solutions for $x$ and $y$ if and only if $\gcd(a, b)$ divides $c$. Since $\frac{P}{p_i}$ is the product of each $p_j$ ($j \in \mathbb{N}$, $1 \leq j \leq n$) except $p_i$, and every $p_j$ is relatively prime to $p_i$, $\frac{P}{p_i}$ and $p_i$ are relatively prime. Therefore, by definition, $\gcd(\frac{P}{p_i}, p_i) = 1$; since $1$ divides $1$, there are integers $k$ and $y_i$ that satisfy the above equation.

Consider some $j \in \mathbb{N}$, $1 \leq j \leq n$. For any $i \in \mathbb{N}$, $1 \leq i \leq n$, either $i \neq j$ or $i = j$. If $i \neq j$, then
$$a_i y_i \frac{P}{p_i} = a_i y_i \frac{P}{p_i p_j} \cdot p_j$$
so $p_j$ divides $a_i y_i \frac{P}{p_i}$, and we know
$$a_i y_i \frac{P}{p_i} \equiv 0 \pmod{p_j}$$
Now consider the case that $i = j$. $y_j$ was selected so that
$$y_j \cdot \frac{P}{p_j} \equiv 1 \pmod{p_j}$$
so we know
$$a_j y_j \frac{P}{p_j} \equiv a_j \pmod{p_j}$$
So we have a set of $n$ congruences mod $p_j$; summing them shows that
$$\sum_{i=1}^{n} a_i y_i \frac{P}{p_i} \equiv a_j \pmod{p_j}$$
Therefore $x_0$ satisfies all the congruences.

Suppose we have some $x \equiv x_0 \pmod{P}$. This implies that for some $k \in \mathbb{Z}$, $x - x_0 = kP$. So, for any $p_i$, we know that
$$x - x_0 = k\frac{P}{p_i} \cdot p_i$$
so $x \equiv x_0 \pmod{p_i}$. Since congruence is transitive, $x$ must in turn satisfy all the original congruences.

Likewise, suppose we have some $x$ that satisfies all the original congruences. Then, for any $p_i$, we know that
$$x \equiv a_i \pmod{p_i}$$
and since
$$x_0 \equiv a_i \pmod{p_i}$$
the transitive and symmetric properties of congruence imply that
$$x \equiv x_0 \pmod{p_i}$$
for all $p_i$. So, by our lemma, we know that
$$x \equiv x_0 \pmod{p_1 p_2 \cdots p_n}$$
or
$$x \equiv x_0 \pmod{P}$$

Version: 3 Owner: vampyr Author(s): vampyr


Chapter 111
11D85 Representation problems
111.1

polygonal number

A polygonal number, or figurate number, is any value of the function
$$P_d(n) = \frac{(d-2)n^2 + (4-d)n}{2}$$
for integers $n \geq 0$ and $d \geq 3$. A generalized polygonal number is any value of $P_d(n)$ for some integer $d \geq 3$ and any $n \in \mathbb{Z}$. For fixed $d$, $P_d(n)$ is called a $d$-gonal or $d$-polygonal number. For $d = 3, 4, 5, \ldots$, we speak of a triangular number, a square number or a square, a pentagonal number, and so on.

An equivalent definition of $P_d$, by induction on $n$, is:
$$P_d(0) = 0$$
$$P_d(n) = P_d(n-1) + (d-2)(n-1) + 1 \qquad \text{for all } n \geq 1$$
$$P_d(n-1) = P_d(n) + (d-2)(1-n) - 1 \qquad \text{for all } n < 0.$$

From these equations, we can deduce that all generalized polygonal numbers are nonnegative integers. The first two formulas show that $P_d(n)$ points can be arranged in a set of $n$ nested $d$-gons, as in this diagram of $P_3(5) = 15$ and $P_5(5) = 35$.

Polygonal numbers were studied somewhat by the ancients, as far back as the Pythagoreans, but nowadays their interest is mostly historical, in connection with this famous result:

Theorem: For any $d \geq 3$, any integer $n \geq 0$ is the sum of some $d$ $d$-gonal numbers.

In other words, any nonnegative integer is a sum of three triangular numbers, four squares,
five pentagonal numbers, and so on. Fermat made this remarkable statement in a letter to
Mersenne. Regrettably, he never revealed the argument or proof that he had in mind. More
than a century passed before Lagrange proved the easiest case: Lagrange's four-square theorem.
The case d = 3 was demonstrated by Gauss around 1797, and the general case by Cauchy
in 1813.
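The defining formula is immediate to compute; this sketch (ours, not part of the entry) also spot-checks the $d = 3$ case of the theorem (every nonnegative integer is a sum of three triangular numbers) for small values:

```python
def polygonal(d, n):
    """P_d(n) = ((d-2)*n**2 + (4-d)*n) / 2, always an integer for d >= 3."""
    return ((d - 2) * n * n + (4 - d) * n) // 2

assert [polygonal(3, n) for n in range(6)] == [0, 1, 3, 6, 10, 15]  # triangular
assert [polygonal(4, n) for n in range(6)] == [0, 1, 4, 9, 16, 25]  # squares
assert polygonal(5, 5) == 35                                        # pentagonal

# spot-check the d = 3 case of the theorem for small n
tri = {polygonal(3, k) for k in range(12)}
assert all(any(n - a - b in tri for a in tri for b in tri) for n in range(60))
```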
Version: 2 Owner: mathcam Author(s): Larry Hammick


Chapter 112
11D99 Miscellaneous
112.1

Diophantine equation

A Diophantine equation is an equation for which the solutions are required to be integers.
Generally, solving a Diophantine equation is not as straightforward as solving a similar
equation in the real numbers. For example, consider this equation:
$$x^4 + y^4 = z^4$$
It is easy to find real numbers x, y, z that satisfy this equation: pick any arbitrary x and
y, and you can compute a z from them. But if we require that x, y, z all be integers, it is
no longer obvious at all how to find solutions. Even though raising an integer to an integer
power yields another integer, the reverse is not true in general.
As it turns out, of course, there are no solutions to the above Diophantine equation: it is a
case of Fermat's last theorem.
At the Second International Congress of Mathematicians in 1900, David Hilbert presented
several unsolved problems in mathematics that he believed held special importance. Hilbert's
tenth problem was to find a general procedure for determining if Diophantine equations have
solutions:
Given a Diophantine equation with any number of unknowns and with rational integer
coefficients: devise a process, which could determine by a finite number of operations whether
the equation is solvable in rational integers.
Note that this preceded the formal study of computing and Gödel's incompleteness theorem,
and it is unlikely that Hilbert had anticipated a negative solution (that is, a proof that no such algorithm is possible), but that turned out to be the case. In the 1950s and 60s, Martin Davis, Julia Robinson, and Hilary Putnam showed that an algorithm to determine
the solubility of all exponential Diophantine equations is impossible.
Version: 3 Owner: vampyr Author(s): vampyr


Chapter 113
11E39 Bilinear and Hermitian forms
113.1

Hermitian form

A sesquilinear form over a complex vector space $V$ is a function $B \colon V \times V \to \mathbb{C}$ with the properties:

1. $B(x + y, z) = B(x, z) + B(y, z)$
2. $B(x, y + z) = B(x, y) + B(x, z)$
3. $B(cx, dy) = c\bar{d}\,B(x, y)$

for all $x, y, z \in V$ and $c, d \in \mathbb{C}$.

A Hermitian form is a sesquilinear form $B$ which is also complex conjugate symmetric:
$$B(x, y) = \overline{B(y, x)}.$$
An inner product over a complex vector space is a positive definite Hermitian form.
Version: 4 Owner: djao Author(s): djao

113.2

non-degenerate bilinear form

A bilinear form $B$ over a vector space $V$ is said to be non-degenerate when

- if $B(x, y) = 0$ for all $x \in V$, then $y = 0$, and
- if $B(x, y) = 0$ for all $y \in V$, then $x = 0$.

Version: 1 Owner: djao Author(s): djao

113.3

positive definite form

A bilinear form $B$ on a real or complex vector space $V$ is positive definite if $B(x, x) > 0$ for all nonzero vectors $x \in V$. On the other hand, if $B(x, x) < 0$ for all nonzero vectors $x \in V$, then we say $B$ is negative definite.
A form which is neither positive definite nor negative definite is called indefinite.
Version: 1 Owner: djao Author(s): djao

113.4

symmetric bilinear form

A symmetric bilinear form is a bilinear form B which is symmetric in the two coordinates;
that is, B(x, y) = B(y, x) for all vectors x and y.
Every inner product over a real vector space is a positive definite symmetric bilinear form.
Version: 2 Owner: djao Author(s): djao

113.5

Clifford algebra

Let $V$ be a vector space over a field $k$, and $Q \colon V \times V \to k$ a symmetric bilinear form. Then the Clifford algebra $\mathrm{Cliff}(Q, V)$ is the quotient of the tensor algebra $T(V)$ by the relations
$$v \otimes w + w \otimes v = 2Q(v, w) \qquad v, w \in V.$$

Since the above relationship is not homogeneous in the usual $\mathbb{Z}$-grading on $T(V)$, $\mathrm{Cliff}(Q, V)$ does not inherit a $\mathbb{Z}$-grading. However, by reducing mod 2, we also have a $\mathbb{Z}_2$-grading on $T(V)$, and the relations above are homogeneous with respect to this, so $\mathrm{Cliff}(Q, V)$ has a natural $\mathbb{Z}_2$-grading, which makes it into a superalgebra.

In addition, we do have a filtration on $\mathrm{Cliff}(Q, V)$ (making it a filtered algebra), and the associated graded algebra of $\mathrm{Cliff}(Q, V)$ is simply $\Lambda V$, the exterior algebra of $V$. In particular,
$$\dim \mathrm{Cliff}(Q, V) = \dim \Lambda V = 2^{\dim V}.$$

The most commonly used Clifford algebra is the case $V = \mathbb{R}^n$, and $Q$ is the standard inner product with orthonormal basis $e_1, \ldots, e_n$. In this case, the algebra is generated by $e_1, \ldots, e_n$ and the identity of the algebra $1$, with the relations
$$e_i^2 = -1$$
$$e_i e_j = -e_j e_i \qquad (i \neq j)$$

Trivially, $\mathrm{Cliff}(\mathbb{R}^0) = \mathbb{R}$, and it can be seen from the relations above that $\mathrm{Cliff}(\mathbb{R}) \cong \mathbb{C}$, the complex numbers, and $\mathrm{Cliff}(\mathbb{R}^2) \cong \mathbb{H}$, the quaternions.

On the other hand, for $V = \mathbb{C}^n$ we get the particularly simple answer of
$$\mathrm{Cliff}(\mathbb{C}^{2k}) \cong M_{2^k}(\mathbb{C}) \qquad \mathrm{Cliff}(\mathbb{C}^{2k+1}) \cong M_{2^k}(\mathbb{C}) \oplus M_{2^k}(\mathbb{C}).$$
Version: 5 Owner: bwebste Author(s): bwebste


Chapter 114
11Exx Forms and linear algebraic
groups
114.1

quadratic function associated with a linear functional

Let $V$ be a real Hilbert space (and thus an inner product space), and let $f$ be a continuous linear functional on $V$. Then $f$ has an associated quadratic function $\varphi \colon V \to \mathbb{R}$ given by
$$\varphi(v) = \frac{1}{2}\|v\|^2 - f(v)$$
Version: 3 Owner: drini Author(s): matte, drini, apmxi


Chapter 115
11F06 Structure of modular groups
and generalizations; arithmetic groups
115.1

Taniyama-Shimura theorem

For any natural number $N \geq 1$, define the modular group $\Gamma_0(N)$ to be the following subgroup of the group $\mathrm{SL}(2, \mathbb{Z})$ of integer coefficient matrices of determinant 1:
$$\Gamma_0(N) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{Z}) \;\middle|\; c \equiv 0 \pmod{N} \right\}.$$

Let $\mathbb{H}^*$ be the subset of the Riemann sphere consisting of all points in the upper half plane (i.e., complex numbers with strictly positive imaginary part), together with the rational numbers and the point at infinity. Then $\Gamma_0(N)$ acts on $\mathbb{H}^*$, with group action given by the operation
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot z := \frac{az + b}{cz + d}.$$

Define $X_0(N)$ to be the quotient of $\mathbb{H}^*$ by the action of $\Gamma_0(N)$. The quotient space $X_0(N)$ inherits a quotient topology and holomorphic structure from $\mathbb{C}$ making it into a compact Riemann surface. (Note: $\mathbb{H}^*$ itself is not a Riemann surface; only the quotient $X_0(N)$ is.) By a general theorem in complex algebraic geometry, every compact Riemann surface admits a unique realization as a complex nonsingular projective curve; in particular, $X_0(N)$ has such a realization, which by abuse of notation we will also denote $X_0(N)$. This curve is defined over $\mathbb{Q}$, although the proof of this fact is beyond the scope of this entry¹.

¹ Explicitly, the curve $X_0(N)$ is the unique nonsingular projective curve which has function field equal to $\mathbb{C}(j(z), j(Nz))$, where $j$ denotes the elliptic modular $j$-function. The curve $X_0(N)$ is essentially the algebraic curve defined by the polynomial equation $\Phi_N(X, Y) = 0$ where $\Phi_N$ is the modular polynomial, with the caveat that this procedure yields singularities which must be resolved manually. The fact that $\Phi_N$ has integer coefficients provides one proof that $X_0(N)$ is defined over $\mathbb{Q}$.

Taniyama-Shimura Theorem (weak form): For any elliptic curve $E$ defined over $\mathbb{Q}$, there exists a positive integer $N$ and a surjective algebraic morphism $\varphi \colon X_0(N) \to E$ defined over $\mathbb{Q}$.

This theorem was first conjectured (in a much more precise, but equivalent formulation) by Taniyama, Shimura, and Weil in the 1970s. It attracted considerable interest in the 1980s when Frey [2] proposed that the Taniyama-Shimura conjecture implies Fermat's last theorem. In 1995, Andrew Wiles [1] proved a special case of the Taniyama-Shimura theorem which was strong enough to yield a proof of Fermat's Last Theorem. The full Taniyama-Shimura theorem was finally proved in 1997 by a team of a half-dozen mathematicians who, building on Wiles' work, incrementally chipped away at the remaining cases until the full result was proved. As of this writing, the proof of the full theorem can still be found on Richard Taylor's preprints page.


REFERENCES
1. Breuil, Christophe; Conrad, Brian; Diamond, Fred; Taylor, Richard. On the modularity of elliptic curves over Q: wild 3-adic exercises. J. Amer. Math. Soc. 14 (2001), no. 4, 843-939.
2. Frey, G. Links between stable elliptic curves and certain Diophantine equations. Ann. Univ. Sarav. 1 (1986), 1-40.
3. Wiles, A. Modular elliptic curves and Fermat's Last Theorem. Annals of Math. 141 (1995), 443-551.

Version: 10 Owner: djao Author(s): djao


Chapter 116
11F30 Fourier coefficients of
automorphic forms
116.1

Fourier coefficients

Let $f$ be a Riemann integrable function from $[-\pi, \pi]$ to $\mathbb{R}$. Then the numbers
$$a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx,$$
$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\cos(nx)\,dx,$$
$$b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin(nx)\,dx$$
are called the Fourier coefficients of the function $f$.

The trigonometric series
$$a_0 + \sum_{n=1}^{\infty} \left(a_n \cos(nx) + b_n \sin(nx)\right)$$
is called the trigonometric series of the function $f$, or Fourier series of the function $f$.
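The coefficient integrals can be approximated numerically; a sketch of ours (not part of the entry) using the midpoint rule, checked against $f(x) = x$, whose Fourier series is $2\sum_{n \geq 1} (-1)^{n+1}\sin(nx)/n$:

```python
from math import pi, sin, cos

def fourier_coefficients(f, n, steps=20000):
    """Midpoint-rule approximations of a_n and b_n on [-pi, pi]."""
    h = 2 * pi / steps
    xs = [-pi + (k + 0.5) * h for k in range(steps)]
    an = sum(f(x) * cos(n * x) for x in xs) * h / pi
    bn = sum(f(x) * sin(n * x) for x in xs) * h / pi
    return an, bn

# for f(x) = x: a_1 = 0 (odd function) and b_1 = 2
an, bn = fourier_coefficients(lambda x: x, 1)
assert abs(an) < 1e-6 and abs(bn - 2.0) < 1e-6
```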
Version: 5 Owner: mathcam Author(s): mathcam, vladm


Chapter 117
11F67 Special values of automorphic
L-series, periods of modular forms,
cohomology, modular symbols
117.1

Schanuel's conjecture

Let $x_1, x_2, \ldots, x_n$ be complex numbers linearly independent over $\mathbb{Q}$. Then the set
$$\{x_1, x_2, \ldots, x_n, e^{x_1}, e^{x_2}, \ldots, e^{x_n}\}$$
has transcendence degree greater than or equal to $n$. Though seemingly innocuous, a proof of Schanuel's conjecture would prove hundreds of open conjectures in transcendental number theory.
Version: 4 Owner: mathcam Author(s): mathcam

117.2

period

A real number $x$ is a period if it is expressible as the integral of an algebraic function (with algebraic coefficients) over an algebraic domain, and this integral is absolutely convergent. This representation is called the number's period representation. An algebraic domain is a subset of $\mathbb{R}^n$ given by polynomial inequalities with algebraic coefficients. A complex number is defined to be a period if both its real and imaginary parts are. The set of all complex periods is denoted by $\mathcal{P}$.


117.2.1

Examples

Example 1. The transcendental number $\pi$ is a period since we can write
$$\pi = \int_{x^2 + y^2 \leq 1} dx\, dy.$$

Example 2. Any algebraic number $\alpha$ is a period since we use the somewhat natural definition that integration over a 0-dimensional space is taken to mean evaluation:
$$\alpha = \int_{\{\alpha\}} x$$

Example 3. Logarithms of algebraic numbers are periods:
$$\log \alpha = \int_1^{\alpha} \frac{1}{x}\, dx$$
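The period representation of $\pi$ in Example 1 can be checked numerically; our sketch below (not part of the entry) evaluates the area of the unit disk by a Riemann sum:

```python
def disk_area(n=1500):
    """Riemann sum for the period representation of pi: the area of the
    unit disk {x^2 + y^2 <= 1}, sampling cell midpoints on an n-by-n grid."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1 + (i + 0.5) * h
        for j in range(n):
            y = -1 + (j + 0.5) * h
            if x * x + y * y <= 1:
                total += h * h
    return total

assert abs(disk_area() - 3.141592653589793) < 0.01
```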

117.2.2

Non-periods

It is by no means trivial to find complex non-periods, though their existence is clear by a counting argument: The set of complex numbers is uncountable, whereas the set of periods is countable, as there are only countably many algebraic domains to choose and countably many algebraic functions over which to integrate.

117.2.3

Inclusion

With the existence of a non-period, we have the following chain of set inclusions:
$$\mathbb{Z} \subsetneq \mathbb{Q} \subsetneq \overline{\mathbb{Q}} \subsetneq \mathcal{P} \subsetneq \mathbb{C},$$
where $\overline{\mathbb{Q}}$ denotes the set of algebraic numbers. The periods promise to prove an interesting and important set of numbers in that nebulous region between $\overline{\mathbb{Q}}$ and $\mathbb{C}$.

117.2.4

References

Kontsevich and Zagier. Periods. 2001. Available on line at http://www.ihes.fr/PREPRINTS/M01/M01-22


Version: 5 Owner: mathcam Author(s): mathcam


Chapter 118
11G05 Elliptic curves over global
fields
118.1

complex multiplication

Let $E$ be an elliptic curve. The endomorphism ring of $E$, denoted $\mathrm{End}(E)$, is the set of all regular maps $\varphi \colon E \to E$ such that $\varphi(O) = O$, where $O \in E$ is the identity element for the group structure of $E$. Note that this is indeed a ring under addition ($(\varphi + \psi)(P) = \varphi(P) + \psi(P)$) and composition of maps.
The following theorem implies that every endomorphism is also a group endomorphism:
Theorem 8. Let $E_1$, $E_2$ be elliptic curves, and let $\varphi \colon E_1 \to E_2$ be a regular map such that $\varphi(O_{E_1}) = O_{E_2}$. Then $\varphi$ is also a group homomorphism, i.e.
$$\forall P, Q \in E_1, \quad \varphi(P +_{E_1} Q) = \varphi(P) +_{E_2} \varphi(Q).$$
[Proof: See [2], Theorem 4.8, page 75]
If End(E) is isomorphic (as a ring) to an order R in a quadratic imaginary field K then we
say that the elliptic curve E has complex multiplication by K (or complex multiplication by
R).
Note: $\mathrm{End}(E)$ always contains a subring isomorphic to $\mathbb{Z}$, formed by the multiplication by $n$ maps:
$$[n] \colon E \to E, \qquad [n]P = n \cdot P$$
and, in general, these are all the maps in the endomorphism ring of E.

Example: fix $d \in \mathbb{Z}$. Let $E$ be the elliptic curve defined by
$$y^2 = x^3 - dx$$
then this curve has complex multiplication by $\mathbb{Q}(i)$ (more concretely by $\mathbb{Z}[i]$). Besides the multiplication by $n$ maps, $\mathrm{End}(E)$ contains a genuine new element:
$$[i] \colon E \to E, \qquad [i](x, y) = (-x, iy)$$
(the name "complex multiplication" comes from the fact that we are multiplying the points in the curve by a complex number, $i$ in this case).

REFERENCES
1. James Milne, Elliptic Curves, online course notes. http://www.jmilne.org/math/CourseNotes/math679.html
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton
University Press, Princeton, New Jersey, 1971.

Version: 8 Owner: alozano Author(s): alozano


Chapter 119
11H06 Lattices and convex bodies
119.1

Minkowski's theorem

Let L ⊆ R² be a lattice in the sense of number theory, i.e. a 2-dimensional free group over
Z which generates R² over R. Let w1, w2 be generators of the lattice L. A set F of the form

F = { (x, y) ∈ R² : (x, y) = λw1 + μw2,  0 ≤ λ < 1,  0 ≤ μ < 1 }

is usually called a fundamental domain or fundamental parallelogram for the lattice L.

Theorem 9 (Minkowski's Theorem). Let L be an arbitrary lattice in R² and let Δ be
the area of a fundamental parallelogram. Any convex region K symmetrical about the origin
and of area greater than 4Δ contains points of the lattice L other than the origin.
Version: 3 Owner: alozano Author(s): alozano

119.2

lattice in Rn

Definition 4. A lattice in Rⁿ is an n-dimensional additive free group over Z which generates
Rⁿ over R.
Example: The following is an example of a lattice L ⊆ R², generated by w1 = (1, 2), w2 = (4, 1):

L = { αw1 + βw2 | α, β ∈ Z }
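For this example lattice one can also illustrate Minkowski's theorem from the previous section numerically. The fundamental parallelogram has area Δ = |det(w1, w2)| = 7, so any origin-symmetric convex region of area greater than 4Δ = 28 must contain a nonzero lattice point. The brute-force search below (illustrative, not part of the original entry) finds one in a centered square of area just over 28:

```python
w1, w2 = (1, 2), (4, 1)
delta = abs(w1[0] * w2[1] - w1[1] * w2[0])   # area of the fundamental parallelogram

def lattice_points(bound):
    """All points alpha*w1 + beta*w2 with |alpha|, |beta| <= bound."""
    return [(a * w1[0] + b * w2[0], a * w1[1] + b * w2[1])
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)]

# the square |x| <= s, |y| <= s has area (2s)^2 = 29.16 > 4*delta = 28
s = 2.7
hits = [p for p in lattice_points(10)
        if p != (0, 0) and abs(p[0]) <= s and abs(p[1]) <= s]
assert delta == 7 and hits   # e.g. the generator (1, 2) itself lies inside
```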

Version: 4 Owner: alozano Author(s): alozano


Chapter 120
11H46 Products of linear forms
120.1
triple scalar product

The triple scalar product of three vectors is an extension of the cross product. It is defined
as

    det | a1 b1 c1 |
        | a2 b2 c2 |  =  ~a · (~b × ~c)  =  a1 det | b2 c2 |  −  a2 det | b1 c1 |  +  a3 det | b1 c1 |
        | a3 b3 c3 |                               | b3 c3 |            | b3 c3 |            | b2 c2 |

The determinant above is positive if the three vectors satisfy the right-hand rule and negative
otherwise. Recall that the magnitude of the cross product of two vectors is equivalent to the
area of the parallelogram they form, and the dot product is equivalent to the product of the
projection of one vector onto another with the length of the vector projected upon. Putting
these two ideas together, we can see that
|~a · (~b × ~c)| = |~b × ~c| · |~a| · |cos θ| = base × height = volume of the parallelepiped

Thus, the magnitude of the triple scalar product is equivalent to the volume of the parallelepiped formed by the three vectors. (A parallelepiped is a three-dimensional object whose
opposing faces are parallel. An example is a brick or sheared brick.) It follows that the triple
scalar product of three coplanar or collinear vectors is then 0.
Identities related to the triple scalar product:

(~A × ~B) · ~C = (~B × ~C) · ~A = (~C × ~A) · ~B

~A · (~B × ~C) = −~A · (~C × ~B)

The latter is implied by the properties of the cross product.
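These identities, and the volume interpretation, are easy to check numerically; the following plain-Python sketch (example vectors chosen arbitrarily) computes ~a · (~b × ~c) from components:

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def triple(a, b, c):
    """Triple scalar product a . (b x c); equals the 3x3 determinant."""
    return dot(a, cross(b, c))

a, b, c = (1, 0, 0), (1, 2, 0), (3, 1, 4)
assert triple(a, b, c) == triple(b, c, a) == triple(c, a, b)   # cyclic symmetry
assert triple(a, b, c) == -triple(a, c, b)                     # swapping the cross product's arguments flips the sign
assert abs(triple(a, b, c)) == 8     # volume of the parallelepiped they span
assert triple(a, b, (2, 4, 0)) == 0  # coplanar vectors give 0
```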
Version: 4 Owner: slider142 Author(s): slider142


Chapter 121
11J04 Homogeneous approximation
to one number
121.1

Dirichlet's approximation theorem

Theorem (Dirichlet, c. 1840): For any real number θ and any integer n ≥ 1, there exist
integers a and b such that 1 ≤ a ≤ n and |aθ − b| ≤ 1/(n+1).
Proof: We can suppose n ≥ 2. For each integer a in the interval [1, n], write r_a = aθ − [aθ]
∈ [0, 1). Since the n + 2 numbers 0, r_a, 1 all lie in the same unit interval, some two of them
differ (in absolute value) by at most 1/(n+1). If 0 or 1 is in any such pair, then the other element
of the pair is one of the r_a, and we are done. If not, then 0 ≤ r_k − r_l ≤ 1/(n+1) for some distinct
k and l. If k > l we have r_k − r_l = r_{k−l}, since each side is in [0, 1) and the difference between
them is an integer. Similarly, if k < l, we have 1 − (r_k − r_l) = r_{l−k}. So, with a = k − l or
a = l − k respectively, we get

|r_a − c| ≤ 1/(n+1)

where c is 0 or 1, and the result follows.
It is clear that we can add the condition gcd(a, b) = 1 to the conclusion.
The same statement, but with the weaker conclusion |aθ − b| < 1/n, admits a slightly shorter
proof, and is sometimes also referred to as the Dirichlet approximation theorem. (It was that
shorter proof which made the pigeonhole principle famous.) Also, the theorem is sometimes
restricted to irrational values of θ, with the (nominally stronger) conclusion |aθ − b| < 1/(n+1).
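The theorem is constructive enough to check by brute force: for each a ≤ n, the best integer b is the nearest integer to aθ, and the theorem guarantees that some such a works. A small search sketch (θ and the ranges are illustrative):

```python
from math import sqrt

def dirichlet_pair(theta, n):
    """Return (a, b) with 1 <= a <= n and |a*theta - b| <= 1/(n+1),
    whose existence is guaranteed by Dirichlet's theorem."""
    for a in range(1, n + 1):
        b = round(a * theta)           # nearest integer to a*theta
        if abs(a * theta - b) <= 1.0 / (n + 1):
            return a, b
    return None                        # never reached, by the theorem

theta = sqrt(2)
for n in range(1, 50):
    a, b = dirichlet_pair(theta, n)
    assert 1 <= a <= n and abs(a * theta - b) <= 1 / (n + 1)
```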
Version: 2 Owner: Koro Author(s): Larry Hammick


Chapter 122
11J68 Approximation to algebraic
numbers
122.1

Davenport-Schmidt theorem

For any real ξ which is not rational or quadratic irrational, there are infinitely many rational
or real quadratic irrational α which satisfy

|ξ − α| < C · H(α)^(−3),

where

C = C₀ if |ξ| < 1,    C = C₀ξ² if |ξ| > 1,

C₀ is any fixed number greater than 160/9, and H(α) is the height of α.

REFERENCES
1. Davenport, H. & Schmidt, Wolfgang M.: Approximation to real numbers by quadratic irrationals. Acta Arithmetica XIII, 1967.

Version: 3 Owner: Daume Author(s): Daume

122.2

Liouville approximation theorem

Given α, a real algebraic number of degree n ≠ 1, there is a constant c = c(α) > 0 such that
for all rational numbers p/q, (p, q) = 1, the inequality

|α − p/q| > c(α)/qⁿ

holds.
Many mathematicians have worked at strengthening this theorem:
Thue: If α is an algebraic number of degree n ≥ 3, then there is a constant c₀ =
c₀(α, ε) > 0 such that for all rational numbers p/q, the inequality

|α − p/q| > c₀ q^(−1−ε−n/2)

holds.

Siegel: If α is an algebraic number of degree n ≥ 2, then there is a constant c₁ =
c₁(α, ε) > 0 such that for all rational numbers p/q, the inequality

|α − p/q| > c₁ q^(−λ),    λ = min_{t=1,...,n} ( n/(t+1) + t ) + ε,

holds.

Dyson: If α is an algebraic number of degree n ≥ 3, then there is a constant c₂ =
c₂(α, ε) > 0 such that for all rational numbers p/q with q > c₂, the inequality

|α − p/q| > q^(−√(2n))

holds.

Roth: If α is an irrational algebraic number and ε > 0, then there is a constant
c₃ = c₃(α, ε) > 0 such that for all rational numbers p/q, the inequality

|α − p/q| > c₃ q^(−2−ε)

holds.

Version: 8 Owner: KimJ Author(s): KimJ

122.3

proof of Liouville approximation theorem

Let α satisfy the equation f(α) = a_n αⁿ + a_{n−1} α^{n−1} + · · · + a_0 = 0, where the a_i are integers.
Choose M such that M > max_{α−1 ≤ x ≤ α+1} |f′(x)|.
Suppose p/q lies in (α − 1, α + 1) and f(p/q) ≠ 0. Then

|f(p/q)| = |a_n pⁿ + a_{n−1} p^{n−1} q + · · · + a_0 qⁿ| / qⁿ ≥ 1/qⁿ,

since the numerator is a non-zero integer.
By the mean-value theorem, there is an x between p/q and α with

1/qⁿ ≤ |f(p/q)| = |f(α) − f(p/q)| = |α − p/q| · |f′(x)| < M · |α − p/q|,

and hence |α − p/q| > (1/M) · q^(−n).
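For a concrete illustration of the bound (not part of the original proof), take α = √2, a root of f(x) = x² − 2 of degree n = 2. On (α − 1, α + 1) we have |f′(x)| = 2|x| < 2(α + 1), so any M slightly larger works, and the proof gives |α − p/q| > (1/M) q⁻²; the sketch below checks this against many rationals:

```python
from math import sqrt

alpha = sqrt(2)              # root of f(x) = x^2 - 2, degree n = 2
M = 2 * (alpha + 1) + 0.01   # M > max |f'(x)| on (alpha - 1, alpha + 1)
c = 1 / M                    # the constant c(alpha) produced by the proof

# q^2 * |alpha - p/q| over the rationals closest to alpha for each q
worst = min(abs(alpha - p / q) * q ** 2
            for q in range(1, 500)
            for p in (round(q * alpha) - 1, round(q * alpha), round(q * alpha) + 1))
assert worst > c             # |alpha - p/q| > c / q^2 for every rational tried
```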
Version: 3 Owner: lieven Author(s): lieven


Chapter 123
11J72 Irrationality; linear
independence over a field
123.1

nth root of 2 is irrational for n ≥ 3 (proof using Fermat's last theorem)

Proof. Suppose n ≥ 3, and suppose that ⁿ√2 = a/b for some positive integers a, b. It follows
that 2 = aⁿ/bⁿ, or

bⁿ + bⁿ = aⁿ.    (123.1.1)

We can now apply a recent result of Andrew Wiles [1], which states that there are no non-zero
integers a, b satisfying equation (123.1.1). Thus ⁿ√2 is irrational. ∎
The above proof is given in [2], where it is attributed to W.H. Schultz.

REFERENCES
1. A. Wiles, Modular elliptic curves and Fermat's last theorem, Annals of Mathematics,
Volume 141, No. 3, May 1995, 443–551.
2. W.H. Schultz, An observation, American Mathematical Monthly, Vol. 110, No. 5, May 2003
(submitted by R. Ehrenborg).

Version: 4 Owner: matte Author(s): matte


123.2

e is irrational (proof)

From the Taylor series for e^x we know the following equation:

e = Σ_{k=0}^{∞} 1/k!.    (123.2.1)

Now let us assume that e is rational. This would mean there are two natural numbers a and
b, such that:

e = a/b.

This yields:

b!e ∈ N.

Now we can write e using (123.2.1):

b!e = b! Σ_{k=0}^{∞} 1/k!.

This can also be written:

b!e = Σ_{k=0}^{b} b!/k! + Σ_{k=b+1}^{∞} b!/k!.

The first sum is obviously a natural number, and thus

Σ_{k=b+1}^{∞} b!/k!

must also be natural. Now we see:

Σ_{k=b+1}^{∞} b!/k! = 1/(b+1) + 1/((b+1)(b+2)) + · · · < Σ_{k=1}^{∞} (1/(b+1))^k = 1/b.

Since 1/b < 1 we conclude:

0 < Σ_{k=b+1}^{∞} b!/k! < 1.

We have also seen that this is an integer, but there is no integer between 0 and 1. So there
cannot exist two natural numbers a and b such that e = a/b, so e is irrational.
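The contradiction can be watched numerically: for any b, the quantity b!e minus its integer part Σ_{k≤b} b!/k! lands strictly between 0 and 1/b. A floating-point sketch (small b only, to stay within double precision):

```python
from math import e, factorial

for b in range(2, 15):
    integer_part = sum(factorial(b) // factorial(k) for k in range(b + 1))
    tail = factorial(b) * e - integer_part
    assert 0 < tail < 1 / b   # so b!*e is never an integer
```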
Version: 7 Owner: mathwizard Author(s): mathwizard

123.3

irrational

An irrational number is a real number which cannot be represented as a ratio of two
integers. That is, if x is irrational, then

x ≠ a/b

for any a, b ∈ Z with b ≠ 0.
Version: 4 Owner: akrowne Author(s): akrowne

123.4

square root of 2 is irrational

Assume that the square root of 2 (√2) is rational; then we can write

√2 = a/b,

where a, b ∈ N and a and b are relatively prime. But then

2 = (√2)² = a²/b²
2b² = a².

From the above we have 2 | a², and since 2 is prime it must divide a. Now we can write
c = a/2 and

2b² = 4c²
b² = 2c².

From the above we have 2 | b², and since 2 is prime it must divide b.
But if 2 | a and 2 | b, then a and b are not relatively prime, which contradicts the hypothesis.
Hence the initial assumption is false and √2 is irrational.
With a little bit of work this argument can be generalized to any positive integer that is not
a square. Let n be such an integer; then there must exist a prime p such that n = p^k m,
where p ∤ m and k is odd. Assume that √n = a/b, where a, b ∈ N and are relatively prime.
This is equivalent to

nb² = p^k m b² = a².

From the fundamental theorem of arithmetic, it is clear that the maximum power of p that
divides a² or b² is even. So, since k is odd, the maximum power of p that divides p^k m b²
is also odd, and, from the above equation, the same should be true for a². Hence, we have
reached a contradiction and √n must be irrational.
The same argument can be generalized even further, for example to the case of nonsquare
irreducible fractions and to higher order roots.
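The parity argument in the generalization can be made concrete. With n = 8 = 2³ (so p = 2 and k = 3 odd), the exponent of 2 in n·b² is always odd while the exponent of 2 in a² is always even, so n·b² = a² has no solution; a small check (illustrative ranges):

```python
def valuation(m, p):
    """Exponent of the prime p in the factorization of m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v

n, p = 8, 2   # n = p^k * m with m = 1 and k = 3 odd
for a in range(1, 100):
    assert valuation(a * a, p) % 2 == 0       # even on the right-hand side
for b in range(1, 100):
    assert valuation(n * b * b, p) % 2 == 1   # odd on the left-hand side
# hence n*b^2 = a^2 is impossible, and sqrt(8) is irrational:
assert not any(a * a == n * b * b for a in range(1, 300) for b in range(1, 300))
```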
Version: 4 Owner: igor Author(s): igor

Chapter 124
11J81 Transcendence (general
theory)
124.1

Fundamental Theorem of Transcendence

The tongue-in-cheek name given to the fact that if n is a nonzero integer, then |n| ≥ 1. This
trick is used in many transcendental number theory proofs. In fact, the hardest step of many
problems is showing that a particular integer is not zero.
Version: 1 Owner: KimJ Author(s): KimJ

124.2

Gelfond's theorem

Let α and β be algebraic over Q, with β irrational and α not equal to 0 or 1. Then α^β is
transcendental over Q.
This is perhaps the most useful result in determining whether a number is algebraic or
transcendental.
Version: 4 Owner: mathcam Author(s): mathcam, kidburla2003

124.3

four exponentials conjecture

Four exponentials conjecture: Given four complex numbers x1 , x2 , y1 , y2 , either x1 /x2


or y1 /y2 is rational, or one of the four numbers exp(xi yj ) is transcendental.


This conjecture is stronger than the six exponentials theorem.

REFERENCES
[1] Waldschmidt, Michel, Diophantine approximation on linear algebraic groups. Transcendence
properties of the exponential function in several variables. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 326. Springer-Verlag,
Berlin, 2000. xxiv+633 pp. ISBN 3-540-66785-7.

Version: 2 Owner: Kevin OBryant Author(s): Kevin OBryant

124.4

six exponentials theorem

Complex numbers x_1, x_2, . . . , x_n are Q-linearly independent if the only rational numbers
r_1, r_2, . . . , r_n with

r_1 x_1 + r_2 x_2 + · · · + r_n x_n = 0

are r_1 = r_2 = · · · = r_n = 0.

Six exponentials Theorem: If x1 , x2 , x3 are Q-linearly independent, and y1 , y2 are also


Q-linearly independent, then at least one of the six numbers exp(xi yj ) is transcendental.
This is weaker than the four exponentials conjecture.
Four Exponentials Conjecture: Given four complex numbers x1 , x2 , y1 , y2 , either x1 /x2
or y1 /y2 is rational, or one of the four numbers exp(xi yj ) is transcendental.
For the history of the six exponentials theorem, we quote briefly from [6, p. 15]:
The six exponentials theorem occurs for the first time in a paper by L. Alaoglu
and P. Erdős [1]: when these authors try to prove Ramanujan's assertion that the
quotient of two consecutive superior highly composite numbers is a prime, they
need to know that if x is a real number such that p_1^x and p_2^x are both rational
numbers, with p_1 and p_2 distinct prime numbers, then x is an integer. However,
this statement (special case of the four exponentials conjecture) is yet unproven.
They quote C. L. Siegel and claim that x indeed is an integer if one assumes p_i^x
to be rational for three distinct primes p_i. This is just a special case of the six
exponentials theorem. They deduce that the quotient of two consecutive superior
highly composite numbers is either a prime, or else a product of two primes.
The six exponentials theorem can be deduced from a very general result of Th.
Schneider [4]. The four exponentials conjecture is equivalent to the first of the

eight problems at the end of Schneiders book [5]. An explicit statement of the six
exponentials theorem, together with a proof, has been published independently
and at about the same time by S. Lang [2, Chapter 2] and K. Ramachandra [3,
Chapter 2]. They both formulated the four exponentials conjecture explicitly.

REFERENCES
[1] L. Alaoglu and P. Erdős, On highly composite and similar numbers. Trans. Amer. Math. Soc.
56 (1944), 448–469. Available online at www.jstor.org.
[2] S. Lang, Introduction to transcendental numbers, Addison-Wesley Publishing Co., Reading,
Mass., 1966.
[3] K. Ramachandra, Contributions to the theory of transcendental numbers. I, II. Acta Arith. 14
(1967/68), 65–72; ibid. 14 (1967/68), 73–88.
[4] Schneider, Theodor, Ein Satz über ganzwertige Funktionen als Prinzip für Transzendenzbeweise.
(German) Math. Ann. 121 (1949), 131–140.
[5] Schneider, Theodor, Einführung in die transzendenten Zahlen. (German) Springer-Verlag,
Berlin-Göttingen-Heidelberg, 1957. v+150 pp.
[6] Waldschmidt, Michel, Diophantine approximation on linear algebraic groups. Transcendence
properties of the exponential function in several variables. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 326. Springer-Verlag,
Berlin, 2000. xxiv+633 pp. ISBN 3-540-66785-7.

Version: 2 Owner: Kevin OBryant Author(s): Kevin OBryant

124.5

transcendental number

A transcendental number is a complex number that is not an algebraic number. The
most famous transcendental numbers are π and e (the natural log base).
Cantor showed that, in a sense, almost all numbers are transcendental, because the algebraic numbers are countable, whereas the transcendental numbers are not.
Version: 6 Owner: akrowne Author(s): akrowne


Chapter 125
11K16 Normal numbers, radix
expansions, etc.
125.1

absolutely normal

Let x ∈ R and b ∈ N with b ≥ 2. We consider the sequence of digits of x in base b. If s is a
finite string of digits in base b and n ∈ N, we let N(s, n) be the number of times the string
s occurs among the first n digits of x in base b. We say that x is normal in base b if

lim_{n→∞} N(s, n)/n = 1/b^k

for every string s of length k.
Intuitively, x is normal in base b if all digits and digit-blocks in the base-b digit sequence of
x occur just as often as would be expected if the sequence had been produced completely
randomly.
We say that x is absolutely normal if it is normal in every base b ≥ 2. (Some authors use
the term normal instead of absolutely normal.)

Absolutely normal numbers were first defined by Émile Borel in 1909. Borel also proved that
almost all real numbers are absolutely normal, in the sense that the numbers that are not
absolutely normal form a set with Lebesgue measure zero. However, for any base b, there
are uncountably many numbers that are not normal in base b.
Champernowne's number

0.1234567891011121314...

(obtained by concatenating the decimal expansions of all natural numbers) is normal in
base 10, but not absolutely normal.
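A quick empirical sketch (not a proof, and the ranges are illustrative): single-digit frequencies in a prefix of Champernowne's number are already close to 1/10.

```python
# concatenate the decimal expansions of 1, 2, 3, ..., 9999
digits = "".join(str(n) for n in range(1, 10000))
freqs = {d: digits.count(d) / len(digits) for d in "0123456789"}
# normality in base 10 predicts each frequency tends to 1/10;
# '0' lags in finite prefixes because numbers carry no leading zeros
assert all(abs(f - 0.1) < 0.03 for f in freqs.values())
```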
Few absolutely normal numbers are known. The first one was constructed by Sierpiński in
1916, and a related construction led to a computable absolutely normal number in 2002.
Maybe the most prominent absolutely normal number is Chaitin's constant Ω, which is not
computable.
No rational number can be normal in any base. Beyond that, it is extremely hard to prove
or disprove normality of a given constant. For instance, it has been conjectured that all
irrational algebraic numbers are absolutely normal, since no counterexamples are known; on
the other hand, not a single irrational algebraic number has been proven normal in any base.
Likewise, it is conjectured that the transcendental constants π, e, and ln(2) are normal, and
this is supported by some empirical evidence, but a proof is out of reach. We don't even
know which digits occur infinitely often in the decimal expansion of π.
Version: 4 Owner: AxelBoldt Author(s): AxelBoldt


Chapter 126
11K45 Pseudo-random numbers;
Monte Carlo methods
126.1

pseudorandom numbers

Generated in a digital computer by a numerical algorithm, pseudorandom numbers are not
random, but should appear to be random when used in Monte Carlo calculations.
The most widely used and best understood pseudorandom generator is the Lehmer multiplicative
congruential generator, in which each number r is calculated as a function of the preceding
number in the sequence:

r_i = [a r_{i−1}] (mod m)

or

r_i = [a r_{i−1} + c] (mod m)

where a and c are carefully chosen constants, and m is usually a power of two, 2^k. All
quantities appearing in the formula (except m) are integers of k bits. The expression in
brackets is an integer of length 2k bits, and the effect of the modulo (mod m) is to mask
off the most significant part of the result of the multiplication. r_0 is the seed of a generation
sequence; many generators allow one to start with a different seed for each run of a program,
to avoid re-generating the same sequence, or to preserve the seed at the end of one run for
the beginning of a subsequent one. Before being used in calculations, the r_i are usually
transformed to floating point numbers normalized into the range [0, 1]. Generators of this
type can be found which attain the maximum possible period of 2^(k−2), and whose sequences
pass all reasonable tests of randomness, provided one does not exhaust more than a few
percent of the full period.
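A minimal sketch of a Lehmer-style multiplicative congruential generator. The constants below are the classic "minimal standard" choice a = 7⁵, m = 2³¹ − 1 of Park and Miller (a prime modulus, rather than the power-of-two modulus described above); they are illustrative, not a recommendation.

```python
def lehmer(seed, a=16807, m=2 ** 31 - 1):
    """Yield pseudorandom floats in [0, 1) via r_i = a * r_{i-1} mod m."""
    r = seed
    while True:
        r = (a * r) % m
        yield r / m

gen = lehmer(seed=1)
sample = [next(gen) for _ in range(10000)]
mean = sum(sample) / len(sample)
assert all(0.0 <= u < 1.0 for u in sample)
assert abs(mean - 0.5) < 0.02    # crude uniformity check
```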


References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Version: 1 Owner: akrowne Author(s): akrowne

126.2

quasirandom numbers

Quasirandom Numbers are sequences of numbers to be used in Monte Carlo calculations,


optimized not to appear highly random, but rather to give the fastest convergence in the
computation. They are applicable mainly to multidimensional integration, where the theory
is based on that of uniformity of distribution ([Kuipers74]).
Because the way of generating and using them is quite different, one must distinguish between
finite and infinite quasirandom sequences.
A finite quasirandom sequence is optimized for a particular number of points in a particular
dimensionality of space. However, the complexity of this optimization is so horrendous that
exact solutions are known only for very small point sets ([Kuipers74], [Zaremba72]). The most
widely used sequences in practice are the Korobov sequences.
An infinite quasirandom sequence is an algorithm which allows the generation of sequences
of an arbitrary number of vectors of arbitrary length (p-dimensional points). The properties
of these sequences are generally known only asymptotically, where they perform considerably
better than truly random or pseudorandom sequences, since they give 1/N convergence for
Monte Carlo integration instead of 1/√N. The short-term distribution may, however, be
rather poor, and generators should be examined carefully before being used in sensitive
calculations. Major improvements are possible by shuffling, or changing the order in which
the numbers are used. An effective shuffling technique is given in [Braaten79].
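As a concrete (if one-dimensional) illustration of the idea, here is the van der Corput radical-inverse sequence in base 2, a standard infinite low-discrepancy sequence; it is not one of the Korobov sequences mentioned above, and is included only as a sketch:

```python
def van_der_corput(n, base=2):
    """Radical inverse of n: mirror the base-b digits of n about the point."""
    v, denom = 0.0, 1.0
    while n:
        n, digit = divmod(n, base)
        denom *= base
        v += digit / denom
    return v

pts = [van_der_corput(n) for n in range(1, 1025)]
# quasirandom estimate of the integral of x^2 over [0, 1] (exact value 1/3)
est = sum(x * x for x in pts) / len(pts)
assert abs(est - 1 / 3) < 1e-2
```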
References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Kuipers74 L. Kuipers and H. Niederreiter, Uniform Distribution of Sequences, Wiley, New York,
1974.
Zaremba72 S.K. Zaremba (Ed.), Applications of Number Theory to Numerical Analysis, Academic
Press, New York, 1972.


Braaten79 E. Braaten and G. Weller, An Improved Low-discrepancy Sequence for Multidimensional Quasi-Monte Carlo Integration, J. Comp. Phys. 33 (1979) 249.
Version: 1 Owner: akrowne Author(s): akrowne

126.3

random numbers

Random numbers are particular occurrences of random variables. They are necessary for
Monte Carlo calculations as well as many other computerized processes. There are three
kinds of random numbers:
truly random numbers
pseudorandom numbers
quasirandom numbers
Version: 1 Owner: akrowne Author(s): akrowne

126.4

truly random numbers

Truly random numbers can only be generated by a physical process and cannot be generated
via software. This makes it rather clumsy to use them in Monte Carlo calculations, since
they must be first generated in a separate device and either sent to the computer or recorded
(for example on removable storage media) for later use in calculations. Traditionally, tapes
containing millions of random numbers generated using radioactive decay were available from
laboratories.
Nowadays, standard digital computers often have provisions for obtaining truly random
numbers, that is, numbers generated by a physical process. For instance, Intel has provided
a function since their i810 chipsets which utilizes noise in a particularly noise-prone semiconductor
as a source of randomness. Oftentimes it is possible to use other incidental physical noise as
a source; for example, static on the input channel of a sound card. In addition, peripheral
devices (add-ons) to personal computers exist which provide truly random numbers, when
the previous methods fail.
References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Version: 2 Owner: akrowne Author(s): akrowne

Chapter 127
11L03 Trigonometric and
exponential sums, general
127.1

Ramanujan sum

For positive integers s and n, the complex number

c_s(n) = Σ_{0 ≤ k < s, (k,s)=1} e^{2πikn/s}

is referred to as a Ramanujan sum, or a Ramanujan trigonometric sum. Since e^{2πi} = 1, an
equivalent definition is

c_s(n) = Σ_{k ∈ r(s)} e^{2πikn/s}

where r(s) is some reduced residue system mod s, meaning any subset of Z containing
exactly one element of each invertible residue class mod s.

Using a symmetry argument about roots of unity, one can show

Σ_{d|s} c_d(n) = s if s | n, and 0 otherwise.

Applying Möbius inversion, we get

c_s(n) = Σ_{d|s, d|n} μ(s/d) d = Σ_{d|(s,n)} μ(s/d) d

which shows that c_s(n) is a real number, and indeed an integer. In particular c_s(1) = μ(s).
More generally,

c_{st}(mn) = c_s(m) c_t(n) if (m, t) = (n, s) = 1.

Using the Chinese remainder theorem, it is not hard to show that for any fixed n, the function
s ↦ c_s(n) is multiplicative:

c_s(n) c_t(n) = c_{st}(n) if (s, t) = 1.

If m is invertible mod s, then the mapping k ↦ km is a permutation of the invertible residue
classes mod s. Therefore

c_s(mn) = c_s(n) if (m, s) = 1.
Remarks: Trigonometric sums often make convenient apparatus in number theory, since any
function on a quotient ring of Z defines a periodic function on Z itself, and conversely. For
another example, see the Landsberg-Schaar relation.
Some writers use different notation from ours, reversing the roles of s and n in the expression
c_s(n).
The name Ramanujan sum was introduced by Hardy.
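Both the exponential-sum definition (with s as the modulus) and the Möbius-inversion formula are easy to evaluate, and comparing them is a useful sanity check (an illustrative script, not part of the original entry):

```python
from cmath import exp, pi
from math import gcd

def mobius(n):
    """The Moebius function, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0           # square factor
            result = -result
        p += 1
    return -result if n > 1 else result

def c_exp(s, n):
    """c_s(n) as the exponential sum over k coprime to s."""
    z = sum(exp(2j * pi * k * n / s) for k in range(s) if gcd(k, s) == 1)
    return round(z.real)

def c_mob(s, n):
    """c_s(n) via Moebius inversion: sum of mu(s/d)*d over d | (s, n)."""
    g = gcd(s, n)
    return sum(mobius(s // d) * d for d in range(1, g + 1) if g % d == 0)

assert all(c_exp(s, n) == c_mob(s, n) for s in range(1, 20) for n in range(1, 20))
assert all(c_exp(s, 1) == mobius(s) for s in range(1, 20))   # c_s(1) = mu(s)
```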
Version: 5 Owner: drini Author(s): Larry Hammick, drini


Chapter 128
11L05 Gauss and Kloosterman
sums; generalizations
128.1

Gauss sum

Let p be a prime. Let χ be any multiplicative group character on Z/pZ (that is, any
group homomorphism of multiplicative groups (Z/pZ)* → C*). For any a ∈ Z/pZ, the
complex number

g_a(χ) := Σ_{t ∈ Z/pZ} χ(t) e^{2πiat/p}

is called a Gauss sum on Z/pZ associated to χ.
In general, the equation g_a(χ) = χ(a^{−1}) g_1(χ) (for nontrivial a and χ) reduces the computation of general Gauss sums to that of g_1(χ). The absolute value of g_1(χ) is always √p as
long as χ is nontrivial, and if χ is a quadratic character (that is, χ(t) is the Legendre symbol
(t/p)), then the value of the Gauss sum is known to be

g_1(χ) = √p if p ≡ 1 (mod 4),  and  g_1(χ) = i√p if p ≡ 3 (mod 4).
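These facts can be verified numerically for small primes (an illustrative check, not part of the original entry): build the quadratic character from Euler's criterion and sum.

```python
from cmath import exp, pi

def legendre(t, p):
    """The Legendre symbol (t/p) via Euler's criterion."""
    if t % p == 0:
        return 0
    return 1 if pow(t, (p - 1) // 2, p) == 1 else -1

def gauss_sum(p):
    """g_1(chi) for the quadratic character chi mod p."""
    return sum(legendre(t, p) * exp(2j * pi * t / p) for t in range(p))

for p in (5, 7, 11, 13, 17, 19):
    g = gauss_sum(p)
    assert abs(abs(g) - p ** 0.5) < 1e-9             # |g_1| = sqrt(p)
    expected = p ** 0.5 if p % 4 == 1 else 1j * p ** 0.5
    assert abs(g - expected) < 1e-9                  # the value stated above
```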

REFERENCES
1. Kenneth Ireland & Michael Rosen, A Classical Introduction to Modern Number Theory, Second
Edition, SpringerVerlag, 1990.

Version: 4 Owner: djao Author(s): djao


128.2

Kloosterman sum

The Kloosterman sum is one of various trigonometric sums that are useful in number theory
and, more generally, in finite harmonic analysis. The original Kloosterman sum is

K_p(a, b) = Σ_{x ∈ F_p*} exp( 2πi(ax + bx^{−1})/p )

where F_p is the field of prime order p. Such sums have been generalized in a few different
ways since their introduction in 1926. For instance, let q be a prime power, F_q the field of q
elements, χ : F_q* → C* a character, and ψ : F_q → C* a mapping such that ψ(x + y) = ψ(x)ψ(y)
identically. The sums

K(χ, ψ | a, b) = Σ_{x ∈ F_q*} χ(x) ψ(ax + bx^{−1})

are of interest, because they come up as Fourier coefficients of modular forms.
Kloosterman sums are finite analogs of the K-Bessel functions of this kind:

K_s(a) = (1/2) ∫_0^∞ x^{s−1} exp( −(a/2)(x + x^{−1}) ) dx

where Re(a) > 0.
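Small Kloosterman sums are easy to compute directly (an illustrative check, not part of the original entry). They are real, since x ↦ −x pairs each term with its conjugate; the final assertion also checks the Weil bound |K_p(a, b)| ≤ 2√p, a deeper fact not discussed above.

```python
from cmath import exp, pi

def kloosterman(p, a, b):
    """K_p(a, b) = sum over x in F_p^* of exp(2*pi*i*(a*x + b/x)/p)."""
    total = 0
    for x in range(1, p):
        xinv = pow(x, p - 2, p)      # x^{-1} mod p, valid since p is prime
        total += exp(2j * pi * ((a * x + b * xinv) % p) / p)
    return total

for p in (7, 11, 13):
    for a in range(1, 4):
        k = kloosterman(p, a, 1)
        assert abs(k.imag) < 1e-9              # Kloosterman sums are real
        assert abs(k) <= 2 * p ** 0.5 + 1e-9   # Weil bound (p does not divide ab)
```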
Version: 4 Owner: mathcam Author(s): Larry Hammick

128.3

Landsberg-Schaar relation

The Landsberg-Schaar relation states that for any positive integers p and q:

(1/√p) Σ_{n=0}^{p−1} exp( 2πin²q/p ) = (e^{iπ/4}/√(2q)) Σ_{n=0}^{2q−1} exp( −πin²p/(2q) )    (128.3.1)

Although both sides of (128.3.1) are mere finite sums, no one has yet found a proof which uses no
infinite limiting process. One way to prove it is to put τ = 2iq/p + ε, where ε > 0, in this
identity due to Jacobi:

Σ_{n=−∞}^{+∞} e^{−πn²τ} = (1/√τ) Σ_{n=−∞}^{+∞} e^{−πn²/τ}    (128.3.2)

and let ε → 0. The details can be found in various works on harmonic analysis, such as [1].
The identity (128.3.2) is a basic one in the theory of theta functions. It is sometimes called the
functional equation for the Riemann theta function. See e.g. [2, VII.6.2].
If we just let q = 1 in the Landsberg-Schaar identity, it reduces to a formula for the quadratic
Gauss sum mod p; notice that p need not be prime.
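Since both sides of (128.3.1) are finite sums, the relation can at least be confirmed numerically for small p and q (an illustrative check, not a proof):

```python
from cmath import exp, pi

def lhs(p, q):
    """Left side of the Landsberg-Schaar relation."""
    return sum(exp(2j * pi * n * n * q / p) for n in range(p)) / p ** 0.5

def rhs(p, q):
    """Right side of the Landsberg-Schaar relation."""
    pref = exp(1j * pi / 4) / (2 * q) ** 0.5
    return pref * sum(exp(-1j * pi * n * n * p / (2 * q)) for n in range(2 * q))

for p in range(1, 9):
    for q in range(1, 6):
        assert abs(lhs(p, q) - rhs(p, q)) < 1e-9
```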

References:
[1] H. Dym and H.P. McKean. Fourier series and Integrals. Academic Press, 1972.
[2] J.-P. Serre. A Course in Arithmetic. Springer, 1970.
Version: 4 Owner: mathcam Author(s): Larry Hammick

128.4

derivation of Gauss sum up to a sign

The Gauss sum can be easily evaluated up to a sign by squaring the original series

g_1(χ)² = ( Σ_{s ∈ Z/pZ} (s/p) e^{2πis/p} ) ( Σ_{t ∈ Z/pZ} (t/p) e^{2πit/p} ) = Σ_{s,t ∈ Z/pZ} (st/p) e^{2πi(s+t)/p}

and summing over a new variable n = s^{−1}t (mod p). (The terms with s = 0 or t = 0 vanish,
since the Legendre symbol is 0 there, and (s · ns/p) = (n/p) for s ≢ 0.) Thus

g_1(χ)² = Σ_{s,n} (n/p) e^{2πis(1+n)/p}
        = Σ_n (n/p) Σ_{s ≢ 0} e^{2πis(n+1)/p}
        = Σ_n (n/p) ( p·[n ≡ −1 (mod p)] − 1 )
        = p·(−1/p) − Σ_n (n/p)
        = p·(−1/p)
        = p if p ≡ 1 (mod 4),  −p if p ≡ 3 (mod 4),

using Σ_n (n/p) = 0 in the next-to-last step.

REFERENCES
1. Harold Davenport. Multiplicative Number Theory. Markham Pub. Co., 1967. Zbl 0159.06303.

Version: 5 Owner: bbukh Author(s): bbukh


Chapter 129
11L40 Estimates on character sums
129.1

Pólya-Vinogradov inequality

Theorem 8. For m, n ∈ N and p a positive odd rational prime,

| Σ_{t=m}^{m+n} (t/p) | < √p · ln p.

Start with the following manipulations:

Σ_{t=m}^{m+n} (t/p) = (1/p) Σ_{t=0}^{p−1} Σ_{x=m}^{m+n} Σ_{a=0}^{p−1} (t/p) e^{2πia(x−t)/p} = (1/p) Σ_{a=1}^{p−1} Σ_{x=m}^{m+n} e^{2πiax/p} Σ_{t=0}^{p−1} (t/p) e^{−2πiat/p}

(the innermost sum over a equals p when x ≡ t (mod p) and 0 otherwise; the a = 0 term
drops out because Σ_t (t/p) = 0).

The expression Σ_{t=0}^{p−1} (t/p) e^{−2πiat/p} is just a Gauss sum, and has magnitude √p. Hence

| Σ_{t=m}^{m+n} (t/p) | ≤ (√p/p) Σ_{a=1}^{p−1} | Σ_{x=m}^{m+n} e^{2πiax/p} | = (1/√p) Σ_{a=1}^{p−1} | e^{2πiam/p} Σ_{x=0}^{n} e^{2πiax/p} | = (1/√p) Σ_{a=1}^{p−1} | ( e^{2πia(n+1)/p} − 1 ) / ( e^{2πia/p} − 1 ) |

= (1/√p) Σ_{a=1}^{p−1} | e^{πia(n+1)/p} sin(πa(n+1)/p) | / | e^{πia/p} sin(πa/p) | ≤ (1/√p) Σ_{a=1}^{p−1} 1/sin(π⟨a/p⟩) ≤ (1/√p) Σ_{a=1}^{p−1} 1/(2⟨a/p⟩).

Here ⟨x⟩ denotes the absolute value of the difference between x and the closest integer to x,
i.e. ⟨x⟩ = inf_{z ∈ Z} |x − z|.

Since p is odd, we have

Σ_{a=1}^{p−1} 1/(2⟨a/p⟩) = Σ_{0<a<p/2} p/a = p Σ_{a=1}^{(p−1)/2} 1/a.

Now ln( (2x+1)/(2x−1) ) > 1/x for x ≥ 1; to prove this, it suffices to show that the function f : [1, ∞) → R
given by f(x) = x ln( (2x+1)/(2x−1) ) is decreasing and approaches 1 as x → ∞. To prove the latter
statement, substitute v = 1/x and take the limit as v → 0 using L'Hopital's rule. To prove
the former statement, it will suffice to show that f′ is less than zero on the interval [1, ∞).
But f′(x) → 0 as x → ∞ and f′ is increasing on [1, ∞), since f″(x) = 8/(4x² − 1)² > 0
for x > 1, so f′ is less than zero for x > 1.

With this in hand, we have

| Σ_{t=m}^{m+n} (t/p) | ≤ (1/√p) · p Σ_{a=1}^{(p−1)/2} 1/a < √p Σ_{a=1}^{(p−1)/2} ln( (2a+1)/(2a−1) ) = √p · ln p,

since the last sum telescopes to ln p.
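The inequality is comfortable in practice; an exhaustive check over all character-sum intervals inside one period, for a few small primes (illustrative, not part of the original proof):

```python
from math import log

def legendre(t, p):
    """The Legendre symbol (t/p) via Euler's criterion."""
    t %= p
    if t == 0:
        return 0
    return 1 if pow(t, (p - 1) // 2, p) == 1 else -1

def max_char_sum(p):
    """max over m and n of |sum_{t=m}^{m+n} (t/p)|."""
    best = 0
    for m in range(p):
        s = 0
        for t in range(m, m + p):   # intervals longer than one period add nothing new
            s += legendre(t, p)
            best = max(best, abs(s))
    return best

for p in (11, 13, 17, 19, 23):
    assert max_char_sum(p) < p ** 0.5 * log(p)
```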

REFERENCES

1. Vinogradov, I. M., Elements of Number Theory, 5th rev. ed., Dover, 1954.

Version: 2 Owner: djao Author(s): djao


Chapter 130
11M06 ζ(s) and L(s, χ)
130.1

Ap
erys constant

The number

ζ(3) = Σ_{n=1}^{∞} 1/n³ = 1.202056903159594285399738161511449990764986292...

has been called Apéry's constant since 1979, when Roger Apéry published a remarkable proof
that it is irrational [1].
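The series converges quickly enough for a direct partial-sum check of the decimal expansion above (an illustrative computation; the tail beyond N is below 1/(2N²)):

```python
N = 200000
zeta3 = sum(1 / n ** 3 for n in range(1, N + 1))
# tail < 1/(2*N^2) = 1.25e-11, so about ten digits are already correct
assert abs(zeta3 - 1.202056903159594) < 1e-9
```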

REFERENCES
1. Roger Apéry. Irrationalité de ζ(2) et ζ(3). Astérisque, 61:11–13, 1979.
2. Alfred van der Poorten. A proof that Euler missed. Apéry's proof of the irrationality of ζ(3).
An informal report. Math. Intell., 1:195–203, 1979.

Version: 5 Owner: bbukh Author(s): bbukh, Larry Hammick

130.2

Dedekind zeta function

Let K be a number field with ring of integers O_K. Then the Dedekind zeta function of K is
the analytic continuation of the following series:

ζ_K(s) = Σ_{I ⊆ O_K} ( N^K_Q(I) )^{−s}

where I ranges over non-zero ideals of O_K, and N^K_Q(I) = |O_K : I| is the norm of I.
This converges for Re(s) > 1, and has a meromorphic continuation to the whole plane, with
a simple pole at s = 1, and no others.
The Dedekind zeta function has an Euler product expansion,

ζ_K(s) = Π_p 1/(1 − (N^K_Q(p))^{−s})

where p ranges over prime ideals of O_K. The Dedekind zeta function of Q is just the
Riemann zeta function.
Version: 5 Owner: bwebste Author(s): bwebste

130.3

Dirichlet L-series

The Dirichlet L-series associated to a Dirichlet character χ is the series

L(χ, s) = Σ_{n=1}^{∞} χ(n)/n^s.    (130.3.1)

It converges absolutely and uniformly in the domain Re(s) > 1 + δ for any positive δ, and
admits the Euler product identity

L(χ, s) = Π_p 1/(1 − χ(p)p^{−s})    (130.3.2)

where the product is over all primes p, by virtue of the multiplicativity of χ. In the case
where χ = χ_0 is the trivial character mod m, we have

L(χ_0, s) = ζ(s) Π_{p|m} (1 − p^{−s}),    (130.3.3)

where ζ(s) is the Riemann zeta function. If χ is non-primitive, and C is the conductor of
χ, we have

L(χ, s) = L(χ′, s) Π_{p|m, p∤C} (1 − χ′(p)p^{−s}),    (130.3.4)

where χ′ is the primitive character which induces χ. For non-trivial, primitive characters
χ mod m, L(χ, s) admits an analytic continuation to all of C and satisfies the symmetric
functional equation

(m/π)^{s/2} Γ((s + e)/2) L(χ, s) = (g_1(χ)/(i^e √m)) (m/π)^{(1−s)/2} Γ((1 − s + e)/2) L(χ̄, 1 − s).    (130.3.5)

Here, e ∈ {0, 1} is defined by χ(−1) = (−1)^e χ(1), Γ is the gamma function, and g_1(χ)
is a Gauss sum. (130.3.3), (130.3.4), and (130.3.5) combined show that L(χ, s) admits a meromorphic
continuation to all of C for all Dirichlet characters χ, and an analytic one for non-trivial χ. Again assuming that χ is a non-trivial and primitive character mod m, if k is a
positive integer, we have

L(χ, 1 − k) = −B_{k,χ}/k,    (130.3.6)

where B_{k,χ} is a generalized Bernoulli number. By (130.3.5), taking into account the poles of Γ, we
get for k positive, k ≡ e (mod 2),

L(χ, k) = (−1)^{1 + (k−e)/2} (g_1(χ)/(2i^e)) (2π/m)^k (B_{k,χ̄}/k!).    (130.3.7)

This series was first investigated by (duh) Dirichlet, who used the non-vanishing of L(χ, 1) for
non-trivial χ to prove his famous theorem on primes in arithmetic progression.
This is probably the first instance of using complex analysis to prove a purely number
theoretic result.
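A minimal numerical illustration of the non-vanishing that drives Dirichlet's theorem: for the non-trivial character mod 4, L(χ, 1) = 1 − 1/3 + 1/5 − ... = π/4 ≠ 0 (a standard example; the cutoff is illustrative):

```python
from math import pi

def chi4(n):
    """The non-trivial Dirichlet character mod 4."""
    return {0: 0, 1: 1, 2: 0, 3: -1}[n % 4]

partial = sum(chi4(n) / n for n in range(1, 10 ** 6))
assert abs(partial - pi / 4) < 1e-5   # L(chi, 1) = pi/4, in particular nonzero
```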
Version: 7 Owner: sucrose Author(s): sucrose

130.4

Riemann θ-function

The Riemann theta function is a number theoretic function which is only really used in the
derivation of the functional equation for the Riemann Xi function.
It is defined as:

θ(x) = 2ω(x) + 1

where ω is the Riemann omega function.
The domain of the Riemann theta function is x > 0.
To give an exact form for the theta function, note that

ω(x) = Σ_{n=1}^{∞} e^{−πn²x} = Σ_{n=−∞}^{−1} e^{−πn²x}

(the summand depends only on n²), so that:

θ(x) = 2ω(x) + 1 = Σ_{n=1}^{∞} e^{−πn²x} + e^0 + Σ_{n=−∞}^{−1} e^{−πn²x} = Σ_{n=−∞}^{∞} e^{−πn²x}.

Riemann showed that the theta function satisfied a functional equation, which was the key
step in the proof of the analytic continuation for the Riemann Xi function; this has direct
consequences, of course, for the Riemann zeta function.

Version: 5 Owner: kidburla2003 Author(s): kidburla2003

130.5

Riemann Xi function

The Xi function is the function which is the key to the functional equation for the Riemann zeta function.
It is defined as:

Ξ(s) = π^{−s/2} Γ(s/2) ζ(s)

Riemann himself used the notation of a lower case xi (ξ). The famous Riemann hypothesis is
equivalent to the assertion that all the zeros of Ξ are real; in fact Riemann himself presented
his original hypothesis in terms of that function.
Riemann's lower case xi is defined as:

ξ(s) = (1/2) s (s − 1) Ξ(s)
Version: 6 Owner: kidburla2003 Author(s): kidburla2003

130.6

Riemann omega function

The Riemann omega function is used in the proof of the analytic continuation for the Riemann Xi function to the whole complex plane. It is defined as:

ω(x) = Σ_{n=1}^∞ e^{−πn²x}
The Omega function satisfies a functional equation also, which can be easily derived from
the theta functional equation.
Version: 3 Owner: kidburla2003 Author(s): kidburla2003

130.7

functional equation for the Riemann Xi function

The Riemann Xi function satisfies a functional equation, which directly implies the Riemann zeta function's functional equation. The proof depends on the Riemann theta function.

Ξ(s) = Ξ(1 − s)

You can see from the definition that the Xi function is not initially defined for Re(s) ≤ 1, since the series defining the Riemann zeta function converges only for Re(s) > 1, so this is an important theorem for the zeta function (in fact, there are no zeros with real part greater than 1, so without this functional equation the study of the zeta function would be very limited).
Version: 3 Owner: kidburla2003 Author(s): kidburla2003

130.8

functional equation for the Riemann theta function

This is the lemma used in the derivation of the functional equation for the Riemann Xi function. This functional equation is not as remarkable as the one for the Xi function, because it does not actually extend the domain of the function.

θ(1/x) = √x · θ(x)

The proof relies on the Cauchy integral formula and the Poisson summation formula.
Version: 3 Owner: kidburla2003 Author(s): kidburla2003
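The functional equation θ(1/x) = √x θ(x) can be checked numerically to high precision, since the series converges very rapidly. The sketch below (not part of the original entry) assumes the exponent is −πn²x, as in the standard theta series to which this functional equation applies.

```python
import math

def theta(x, terms=50):
    """Partial sum of theta(x) = sum over all integers n of exp(-pi*n^2*x);
    by symmetry this is 1 + 2*sum_{n>=1} exp(-pi*n^2*x)."""
    return 1.0 + 2.0 * sum(math.exp(-math.pi * n * n * x) for n in range(1, terms + 1))

# Check theta(1/x) = sqrt(x) * theta(x) at a few sample points.
for x in (0.5, 1.0, 2.3):
    lhs = theta(1.0 / x)
    rhs = math.sqrt(x) * theta(x)
    assert abs(lhs - rhs) < 1e-12, (x, lhs, rhs)
print("functional equation verified")
```

The agreement to near machine precision reflects the super-exponential decay of the terms, which is also why θ is so well suited to deriving analytic continuations.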

130.9

generalized Riemann hypothesis

This generalization of the Riemann hypothesis to arbitrary Dedekind zeta functions states that for any number field K, the only zeros s of the Dedekind zeta function ζ_K(s) that lie in the strip 0 ≤ Re s ≤ 1 satisfy Re s = 1/2.
Version: 2 Owner: mathcam Author(s): mathcam

130.10

proof of functional equation for the Riemann theta function
All sums are over all integers unless otherwise specified. Thus the Riemann theta function is

θ(x) = Σ_n e^{−πn²x}.

Now, we wish to apply Poisson summation to f(x, y) = e^{−πxy²}, in the variable y. θ(x) = Σ_n f(x, n), and thus is equal to Σ_n f̂(x, n), where

f̂(x, n) = ∫_R f(x, y) e^{−2πiny} dy = ∫_R e^{−π(xy² + 2iny)} dy.

Completing the square in the exponent gives −π(xy² + 2iny) = −πx(y + in/x)² − πn²/x, and since the integrand is entire and decays rapidly, shifting the contour reduces the integral to a Gaussian:

f̂(x, n) = e^{−πn²/x} ∫_R e^{−πxy²} dy = x^{−1/2} e^{−πn²/x}.

Hence θ(x) = x^{−1/2} Σ_n e^{−πn²/x} = x^{−1/2} θ(1/x), which is the functional equation θ(1/x) = √x · θ(x).
Version: 2 Owner: bwebste Author(s): bwebste


Chapter 131
11M99 Miscellaneous
131.1

Riemann zeta function

131.1.1

Definition

The Riemann zeta function is defined to be the complex-valued function given by the series

ζ(s) := Σ_{n=1}^∞ 1/n^s,    (131.1.1)
which is valid (in fact, absolutely convergent) for all complex numbers s with Re(s) > 1. We
list here some of the key properties [1] of the zeta function.
1. For all s with Re(s) > 1, the zeta function satisfies the Euler product formula

ζ(s) = Π_p 1/(1 − p^{−s}),    (131.1.2)

where the product is taken over all positive integer primes p, and converges uniformly in a neighborhood of s.
2. The zeta function has a meromorphic continuation to the entire complex plane with a
simple pole at s = 1, of residue 1, and no other singularities.
3. The zeta function satisfies the functional equation

ζ(s) = 2^s π^{s−1} sin(πs/2) Γ(1 − s) ζ(1 − s),    (131.1.3)

for any s ∈ C (where Γ denotes the gamma function).
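As a numerical sanity check of the Euler product formula (131.1.2), the sketch below (not part of the original entry) compares a truncated product over primes with a truncated series at s = 2, where the exact value π²/6 is known; the truncation bounds used in the assertions are rough estimates.

```python
import math

def zeta_series(s, N=200000):
    # Truncated Dirichlet series for zeta(s), Re(s) > 1.
    return sum(n ** -s for n in range(1, N + 1))

def primes_up_to(n):
    # Simple sieve of Eratosthenes.
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_p in enumerate(sieve) if is_p]

def zeta_euler(s, P=200000):
    # Truncated Euler product over primes p <= P.
    prod = 1.0
    for p in primes_up_to(P):
        prod *= 1.0 / (1.0 - p ** -s)
    return prod

exact = math.pi ** 2 / 6
assert abs(zeta_series(2.0) - exact) < 1e-4
assert abs(zeta_euler(2.0) - exact) < 1e-4
```

Both truncations agree with π²/6 to the stated tolerance, illustrating that the product and the series define the same function for Re(s) > 1.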

131.1.2

Distribution of primes

The Euler product formula (131.1.2) given above expresses the zeta function as a product
over the primes p Z, and consequently provides a link between the analytic properties of
the zeta function and the distribution of primes in the integers. As the simplest possible
illustration of this link, we show how the properties of the zeta function given above can be
used to prove that there are infinitely many primes.
If the set S of primes in Z were finite, then the Euler product formula

ζ(s) = Π_{p∈S} 1/(1 − p^{−s})

would be a finite product, and consequently lim_{s→1} ζ(s) would exist and would equal

lim_{s→1} ζ(s) = Π_{p∈S} 1/(1 − p^{−1}).

But the existence of this limit contradicts the fact that ζ(s) has a pole at s = 1, so the set S of primes cannot be finite.
A more sophisticated analysis of the zeta function along these lines can be used to prove both the analytic prime number theorem and Dirichlet's theorem on primes in arithmetic progressions¹. Proofs of the prime number theorem can be found in [2] and [5], and for proofs of Dirichlet's theorem on primes in arithmetic progressions the reader may look in [2] and [1].

131.1.3

Zeros of the zeta function

A nontrivial zero of the Riemann zeta function is defined to be a root ζ(s) = 0 of the zeta function with the property that 0 ≤ Re(s) ≤ 1. Any other zero is called a trivial zero of the zeta function.
The reason behind the terminology is as follows. For complex numbers s with real part
greater than 1, the series definition (131.1.1) immediately shows that no zeros of the zeta
function exist in this region. It is then an easy matter to use the functional equation (131.1.3)
to find all zeros of the zeta function with real part less than 0 (it turns out they are exactly the values −2n, for n a positive integer). However, for values of s with real part between 0
and 1, the situation is quite different, since we have neither a series definition nor a functional
equation to fall back upon; and indeed to this day very little is known about the behavior
of the zeta function inside this critical strip of the complex plane.
It is known that the prime number theorem is equivalent to the assertion that the zeta
function has no zeros s with Re(s) = 0 or Re(s) = 1. The celebrated Riemann hypothesis
¹ In the case of arithmetic progressions, one also needs to examine the closely related Dirichlet L-functions in addition to the zeta function itself.

asserts that all nontrivial zeros s of the zeta function satisfy the much more precise equation
Re(s) = 1/2. If true, the hypothesis would have profound consequences on the distribution
of primes in the integers [5].


REFERENCES
1. Lars Ahlfors, Complex Analysis, Third Edition, McGraw-Hill, Inc., 1979.
2. Joseph Bak & Donald Newman, Complex Analysis, Second Edition, Springer-Verlag, 1991.
3. Gerald Janusz, Algebraic Number Fields, Second Edition, American Mathematical Society, 1996.
4. Serge Lang, Algebraic Number Theory, Second Edition, Springer-Verlag, 1994.
5. Stephen Patterson, Introduction to the Theory of the Riemann Zeta Function, Cambridge University Press, 1988.
6. B. Riemann, Ueber die Anzahl der Primzahlen unter einer gegebenen Grösse,
http://www.maths.tcd.ie/pub/HistMath/People/Riemann/Zeta/
7. Jean-Pierre Serre, A Course in Arithmetic, Springer-Verlag, 1973.

Version: 11 Owner: djao Author(s): djao

131.2

formulae for zeta in the critical strip

Let us use the traditional notation s = σ + it for the complex variable, where σ and t are real numbers.

ζ(s) = (1/(1 − 2^{1−s})) Σ_{n=1}^∞ (−1)^{n+1} n^{−s}    (σ > 0)    (131.2.1)

ζ(s) = 1/(s − 1) + 1 − s ∫_1^∞ (x − [x]) x^{−s−1} dx    (σ > 0)    (131.2.2)

ζ(s) = 1/(s − 1) + 1/2 − s ∫_1^∞ ((x)) x^{−s−1} dx    (σ > −1)    (131.2.3)

where [x] denotes the largest integer ≤ x, and ((x)) denotes x − [x] − 1/2.
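The alternating-series formula (131.2.1) is easy to test numerically. The sketch below (not part of the original entry) evaluates it at s = 2 and compares with the known value ζ(2) = π²/6; the error bound for an alternating series is the first omitted term.

```python
import math

def zeta_alternating(s, N=100000):
    """Evaluate zeta(s) via formula (131.2.1): the alternating series
    sum (-1)^(n+1) n^(-s), divided by (1 - 2^(1-s)); converges for sigma > 0."""
    eta = sum((-1) ** (n + 1) * n ** -s for n in range(1, N + 1))
    return eta / (1 - 2 ** (1 - s))

# At s = 2 the exact value is pi^2/6; truncation error is about N^(-2).
assert abs(zeta_alternating(2.0) - math.pi ** 2 / 6) < 1e-8
```

Unlike the defining series (131.1.1), this expression also makes sense for real s between 0 and 1, which is what makes it useful inside the critical strip.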
We will prove (131.2.2) and (131.2.3) with the help of this useful lemma:

Lemma: For integers u and v such that 0 < u < v:

Σ_{n=u+1}^{v} n^{−s} = −s ∫_u^v (x − [x]) x^{−s−1} dx + (v^{1−s} − u^{1−s})/(1 − s)

Proof: If we can prove the special case v = u + 1, namely

(u + 1)^{−s} = −s ∫_u^{u+1} (x − [x]) x^{−s−1} dx + ((u + 1)^{1−s} − u^{1−s})/(1 − s)    (131.2.4)

then the lemma will follow by summing a finite sequence of cases of (131.2.4). The integral
in (131.2.4) is
∫_0^1 t (u + t)^{−s−1} dt = ∫_0^1 (u + t)^{−s} dt − ∫_0^1 u (u + t)^{−s−1} dt
= ((u + 1)^{1−s} − u^{1−s})/(1 − s) + u ((u + 1)^{−s} − u^{−s})/s

so the right side of (131.2.4) is

−s [ ((u + 1)^{1−s} − u^{1−s})/(1 − s) + u ((u + 1)^{−s} − u^{−s})/s ] + ((u + 1)^{1−s} − u^{1−s})/(1 − s)
= (u + 1)^{1−s} − u^{1−s} − u ((u + 1)^{−s} − u^{−s})
= (u + 1)^{−s} · 1 + u^{−s} · 0 = (u + 1)^{−s}

and the lemma is proved.

Now take u = 1 and let v → ∞ in the lemma, showing that (131.2.2) holds for σ > 1. By the principle of analytic continuation, if the integral in (131.2.2) is analytic for σ > 0, then (131.2.2) holds for σ > 0. But x − [x] is bounded, so the integral converges uniformly on σ ≥ ε for any ε > 0, and the claim (131.2.2) follows.
We have

s ∫_1^∞ (1/2) x^{−s−1} dx = 1/2.

Adding and subtracting this quantity from (131.2.2), we get (131.2.3) for σ > 0. We need to show that

∫_1^∞ ((x)) x^{−s−1} dx

is analytic on σ > −1. Write

f(y) = ∫_1^y ((x)) dx

and integrate by parts:

∫_1^∞ ((x)) x^{−s−1} dx = lim_{x→∞} f(x) x^{−1−s} − f(1) · 1^{−1−s} + (s + 1) ∫_1^∞ f(x) x^{−s−2} dx.

The first two terms on the right are zero, and the integral converges for σ > −1 because f is bounded.

Remarks: We will prove (131.2.1) in a later version of this entry.

Using formula (131.2.3), one can verify Riemann's functional equation in the strip −1 < σ < 2. By analytic continuation, it follows that the functional equation holds everywhere. One way to prove it in the strip is to decompose the sawtooth function ((x)) into a Fourier series, and do a termwise integration. But the proof gets rather technical, because that series does not converge uniformly.
Version: 8 Owner: mathcam Author(s): Larry Hammick

131.3

functional equation of the Riemann zeta function

Let Γ denote the gamma function, ζ the Riemann zeta function, and s any complex number. Then

π^{(s−1)/2} Γ((1 − s)/2) ζ(1 − s) = π^{−s/2} Γ(s/2) ζ(s).

Though the equation appears too intricate to be of any use, the inherent symmetry in the
formula makes this the simplest method of evaluating (s) at points to the left of the critical
strip.
Version: 2 Owner: mathcam Author(s): mathcam

131.4

value of the Riemann zeta function at s = 2

Here we present an application of Parsevals equality to number theory. Let (s) denote the
Riemann zeta function. We will compute the value
(2)
with the help of Fourier analysis.
Example:
Let f : R R be the identity function, defined by
f (x) = x, for all x R
The Fourier series of this function has been computed in the entry examples of Fourier series.
Thus

f(x) = x = a_0^f + Σ_{n=1}^∞ (a_n^f cos(nx) + b_n^f sin(nx)) = Σ_{n=1}^∞ (2/n)(−1)^{n+1} sin(nx),    x ∈ (−π, π)

Parseval's theorem asserts that:

(1/π) ∫_{−π}^{π} f(x)² dx = 2(a_0^f)² + Σ_{k=1}^∞ [(a_k^f)² + (b_k^f)²]

So we apply this to the function f(x) = x. Here a_0^f = 0 and a_k^f = 0, while b_k^f = (2/k)(−1)^{k+1}, so

2(a_0^f)² + Σ_{k=1}^∞ [(a_k^f)² + (b_k^f)²] = Σ_{k=1}^∞ 4/k²

and

(1/π) ∫_{−π}^{π} f(x)² dx = (1/π) ∫_{−π}^{π} x² dx = 2π²/3.

Hence by Parseval's equality

2π²/3 = Σ_{n=1}^∞ 4/n² = 4 Σ_{n=1}^∞ 1/n²

and hence

ζ(2) = Σ_{n=1}^∞ 1/n² = π²/6.
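The final identity 2π²/3 = 4 Σ 1/n² is straightforward to confirm numerically; the sketch below (not part of the original entry) checks a partial sum against the exact value of the integral side.

```python
import math

N = 1_000_000
# Sum of the squared Fourier coefficients (b_k)^2 = 4/k^2.
partial = 4 * sum(1 / k**2 for k in range(1, N + 1))
# (1/pi) * integral of x^2 over (-pi, pi) = 2*pi^2/3.
target = 2 * math.pi**2 / 3
# The tail of the sum is about 4/N, so agreement to 1e-4 is expected.
assert abs(partial - target) < 1e-4
```

Dividing both sides by 4 recovers ζ(2) = π²/6, as derived above.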

Version: 8 Owner: alozano Author(s): alozano


Chapter 132
11N05 Distribution of primes
132.1

Bertrands conjecture

Bertrand conjectured that for every integer n > 1, there exists at least one prime p satisfying n < p < 2n. This result was proven in 1850 by Chebyshev, but the name "Bertrand's conjecture" remains in the literature.
Version: 5 Owner: KimJ Author(s): KimJ
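The statement is easy to verify directly for small n. The sketch below (not part of the original entry) sieves the primes and checks the interval (n, 2n) for every n up to a chosen bound.

```python
def primes_up_to(n):
    # Boolean sieve: sieve[k] is True iff k is prime.
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = [False] * len(sieve[p*p::p])
    return sieve

LIMIT = 10_000
is_prime = primes_up_to(2 * LIMIT)
for n in range(2, LIMIT + 1):
    # Bertrand: some prime p satisfies n < p < 2n.
    assert any(is_prime[p] for p in range(n + 1, 2 * n)), n
print("a prime exists strictly between n and 2n for all 2 <= n <=", LIMIT)
```

This is of course only a finite check; the general statement is Chebyshev's theorem, proved in the entry below.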

132.2

Bruns constant

Brun's constant is the sum of the reciprocals of all twin primes

B = Σ_{p : p+2 is prime} (1/p + 1/(p + 2)) ≈ 1.9021605.

Viggo Brun proved that the constant exists by using a new sieving method, which later became known as Brun's sieve.
Version: 3 Owner: bbukh Author(s): bbukh
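The partial sums of this series can be computed directly, though they converge to B extremely slowly (logarithmically, under standard heuristics), so the sketch below (not part of the original entry) only checks that a partial sum lies below the limiting value.

```python
def sieve(n):
    s = [True] * (n + 1)
    s[0] = s[1] = False
    for p in range(2, int(n**0.5) + 1):
        if s[p]:
            s[p*p::p] = [False] * len(s[p*p::p])
    return s

N = 100_000
is_prime = sieve(N + 2)
# Sum 1/p + 1/(p+2) over twin prime pairs with p < N.
partial = sum(1/p + 1/(p + 2) for p in range(3, N) if is_prime[p] and is_prime[p + 2])
# Partial sums increase toward B but are still well short of it at this range.
assert 1.3 < partial < 1.902
print(f"partial sum over twin primes below {N}: {partial:.6f}")
```

Accurate estimates of B in the literature rely on extrapolation from much larger sieving ranges, not on direct summation.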

132.3

proof of Bertrands conjecture

This is a version of Erdős's proof as it appears in Hardy and Wright.

We start by deriving an upper bound on ϑ(n).

Definition.

ϑ(n) = Σ_{p ≤ n, p prime} log p

Theorem. ϑ(n) ≤ n log 4

Proof. By induction.

The cases for n = 1 and n = 2 follow by inspection.

For even n > 2, the case follows immediately from the case for n − 1, since n isn't prime.

So let n = 2m + 1 with m > 0 and consider (1 + 1)^{2m+1} and its binomial expansion. Since C(2m+1, m) = C(2m+1, m+1) (writing C(a, b) for the binomial coefficient) and both terms occur exactly once, we find C(2m+1, m) ≤ 4^m. Each prime p with m + 1 < p ≤ 2m + 1 divides C(2m+1, m), and so ϑ(2m+1) − ϑ(m+1) ≤ log C(2m+1, m) ≤ m log 4. By induction ϑ(m + 1) ≤ (m + 1) log 4 and so ϑ(2m + 1) ≤ (2m + 1) log 4.
Now we can deal with the main theorem. Suppose n > 2 and there is no prime p with n < p < 2n.

Consider (1 + 1)^{2n}. Since the binomial coefficient C(2n, n) is the largest of the 2n + 1 terms in the binomial expansion, C(2n, n) ≥ 4^n/(2n + 1).

For a prime p define r(p, n) to be the highest power of p dividing C(2n, n). We first look at the highest power of p dividing n!. Every p-th term contributes a factor, so we have already [n/p] factors, where [x] is the integer part of x. However, every p²-th term contributes an extra factor above that, and every p³-th term one more, and so on. So the highest power of p dividing n! is Σ_j [n/p^j]. Now for r(p, n) we need to take the factors contributed by (2n)! and subtract twice the factors taken away by n!. This leads us to r(p, n) = Σ_j ([2n/p^j] − 2[n/p^j]). Now each of these terms is either 0 or 1 (as is every value of [2x] − 2[x]), and the terms vanish for j > [log(2n)/log p], so r(p, n) ≤ [log(2n)/log p], or p^{r(p,n)} ≤ 2n.

Now C(2n, n) = Π_p p^{r(p,n)}. By the previous inequality, primes larger than 2n do not contribute to this product, and by assumption there are no primes between n and 2n. So

C(2n, n) = Π_{1 ≤ p ≤ n, p prime} p^{r(p,n)}.

If p > √(2n), all the terms for higher powers of p vanish and r(p, n) = [2n/p] − 2[n/p].

For n ≥ p > 2n/3 we have 3/2 > n/p ≥ 1, and so for p > 2n/3 ≥ √(2n) we can apply the previous formula for r(p, n) and find that it is zero. So for all n > 4, the contribution of the primes larger than 2n/3 is zero.

Now for p > √(2n), r(p, n) is at most 1, so an upper bound for the contribution of the primes between √(2n) and 2n/3 is the product of all primes smaller than 2n/3, which equals e^{ϑ(2n/3)}. There are at most √(2n) primes smaller than √(2n), and by the inequality p^{r(p,n)} ≤ 2n their product is less than (2n)^{√(2n)}.

Combining these we get

4^n/(2n + 1) ≤ C(2n, n) ≤ (2n)^{√(2n)} e^{ϑ(2n/3)}.

Using 2n + 1 < (2n)² and ϑ(2n/3) ≤ (2n/3) log 4, and taking logarithms, we find (n log 4)/3 ≤ (√(2n) + 2) log(2n), which is clearly false for large enough n, say n = 2^{11}, leading to a contradiction. For smaller n, we prove the theorem by exhibiting a sequence of primes in which each is smaller than the double of its predecessor, e.g., 3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503.
Version: 1 Owner: lieven Author(s): lieven
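Both quantitative ingredients of the proof — the Chebyshev-type bound ϑ(n) ≤ n log 4 and the lower bound on the central binomial coefficient — can be checked numerically. The sketch below (not part of the original entry) does so for a range of n.

```python
import math

def primes_up_to(n):
    s = [True] * (n + 1)
    s[0] = s[1] = False
    for p in range(2, int(n**0.5) + 1):
        if s[p]:
            s[p*p::p] = [False] * len(s[p*p::p])
    return [p for p, b in enumerate(s) if b]

N = 5000
primes = primes_up_to(N)
theta = 0.0   # running value of theta(n) = sum of log p over primes p <= n
idx = 0
for n in range(1, N + 1):
    while idx < len(primes) and primes[idx] <= n:
        theta += math.log(primes[idx])
        idx += 1
    assert theta <= n * math.log(4), n

# The lower bound C(2n, n) >= 4^n / (2n + 1) used in the proof:
for n in range(1, 200):
    assert math.comb(2 * n, n) * (2 * n + 1) >= 4 ** n, n
```

The binomial check uses exact integer arithmetic, so there is no floating-point issue in the second loop.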

132.4

twin prime conjecture

Two consecutive odd numbers which are both prime are called twin primes, e.g. 5 and 7, or 41 and 43, or 1,000,000,000,061 and 1,000,000,000,063. But are there infinitely many twin primes?
Version: 4 Owner: vladm Author(s): vladm


Chapter 133
11N13 Primes in progressions


Chapter 134
11N32 Primes represented by
polynomials; other multiplicative
structure of polynomial values
134.1

Euler four-square identity

The Euler four-square identity simply states that

(x₁² + x₂² + x₃² + x₄²)(y₁² + y₂² + y₃² + y₄²) = (x₁y₁ + x₂y₂ + x₃y₃ + x₄y₄)² + (x₁y₂ − x₂y₁ + x₃y₄ − x₄y₃)²
+ (x₁y₃ − x₃y₁ + x₄y₂ − x₂y₄)² + (x₁y₄ − x₄y₁ + x₂y₃ − x₃y₂)²
It may be derived from the property of quaternions that the norm of the product is equal to
the product of the norms.
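Since the identity is a polynomial statement, it can be verified exactly on random integer inputs. The sketch below (not part of the original entry) does a randomized check in exact integer arithmetic.

```python
import random

def euler_four_square(x, y):
    """The four terms whose squares sum to (sum x_i^2)(sum y_i^2)."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    t1 = x1*y1 + x2*y2 + x3*y3 + x4*y4
    t2 = x1*y2 - x2*y1 + x3*y4 - x4*y3
    t3 = x1*y3 - x3*y1 + x4*y2 - x2*y4
    t4 = x1*y4 - x4*y1 + x2*y3 - x3*y2
    return t1, t2, t3, t4

random.seed(0)
for _ in range(1000):
    x = [random.randint(-50, 50) for _ in range(4)]
    y = [random.randint(-50, 50) for _ in range(4)]
    lhs = sum(a*a for a in x) * sum(b*b for b in y)
    assert lhs == sum(t*t for t in euler_four_square(x, y))
```

The four terms are exactly the coordinates of a quaternion product, which is where the identity comes from.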
Version: 3 Owner: vitriol Author(s): vitriol


Chapter 135
11N56 Rate of growth of arithmetic
functions
135.1

highly composite number

We call n a highly composite number if d(n) > d(m) for all m < n, where d(n) is the number of divisors of n. The first several are 1, 2, 4, 6, 12, 24. The sequence is A002182 in Sloane's encyclopedia.

The integer n is superior highly composite if there is an ε > 0 such that for all m ≠ n,

d(n)/n^ε > d(m)/m^ε.

The first several superior highly composite numbers are 2, 6, 12, 60, 120, 360. The sequence is A002201 in Sloane's encyclopedia.
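The definition of a highly composite number translates directly into a record-keeping loop. The sketch below (not part of the original entry) recovers the first terms listed above.

```python
def num_divisors(n):
    # Count divisors by trial division up to sqrt(n).
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            count += 2 if d * d != n else 1
        d += 1
    return count

highly_composite = []
record = 0
for n in range(1, 50):
    dn = num_divisors(n)
    if dn > record:     # new record number of divisors
        record = dn
        highly_composite.append(n)

assert highly_composite[:6] == [1, 2, 4, 6, 12, 24]
```

Scanning further would continue the sequence (36, 48, 60, …) exactly as in A002182.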

REFERENCES
[1] L. Alaoglu and P. Erdos, On highly composite and similar numbers. Trans. Amer. Math. Soc.
56 (1944), 448469. Available at www.jstor.org

Version: 2 Owner: Kevin OBryant Author(s): Kevin OBryant


Chapter 136
11N99 Miscellaneous
136.1

Chinese remainder theorem

Let R be a commutative ring with identity. If I_1, …, I_n are ideals of R such that I_i + I_j = R whenever i ≠ j, then let

I = ∩_{i=1}^n I_i = Π_{i=1}^n I_i.

The sum of quotient maps R/I → R/I_i gives an isomorphism

R/I ≅ Π_{i=1}^n R/I_i.

This has the slightly weaker consequence that given a system of congruences x ≡ a_i (mod I_i), there is a solution in R which is unique mod I, as the theorem is usually stated for the integers.
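For R = Z the theorem is constructive: the element x built in the surjectivity argument of the proof can be computed with modular inverses. The sketch below (not part of the original entry) does this for pairwise coprime integer moduli.

```python
from math import prod

def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise coprime m_i, mirroring the
    surjectivity argument: x = sum of r_i * z_i where z_i = 1 (mod m_i)
    and z_i = 0 (mod m_j) for j != i."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # Mi * (Mi inverse mod m) plays the role of z_i in the proof.
        x += r * Mi * pow(Mi, -1, m)
    return x % M

x = crt([2, 3, 2], [3, 5, 7])
assert x == 23
assert all(x % m == r for r, m in zip([2, 3, 2], [3, 5, 7]))
```

The classical example x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7) yields x = 23, unique mod 105.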
Version: 4 Owner: bwebste Author(s): bwebste, nerdy2

136.2

proof of chinese remainder theorem

First we prove that a_i + Π_{j≠i} a_j = R for each i. Without loss of generality, assume that i = 1. Then

R = (a_1 + a_2)(a_1 + a_3) ⋯ (a_1 + a_n),

since each factor a_1 + a_j is R. Expanding the product, each term will contain a_1 as a factor, except the term a_2 a_3 ⋯ a_n. So we have

(a_1 + a_2)(a_1 + a_3) ⋯ (a_1 + a_n) ⊆ a_1 + a_2 a_3 ⋯ a_n,

and hence the expression on the right hand side must equal R.

Now we can prove that Π a_i = ∩ a_i, by induction. The statement is trivial for n = 1. For n = 2, note that

a_1 ∩ a_2 = (a_1 ∩ a_2)R = (a_1 ∩ a_2)(a_1 + a_2) ⊆ a_2 a_1 + a_1 a_2 = a_1 a_2,

and the reverse inclusion is obvious, since each a_i is an ideal. Assume that the statement is proved for n − 1, and consider it for n. Then

∩_{i=1}^n a_i = a_1 ∩ (∩_{i=2}^n a_i) = a_1 ∩ (Π_{i=2}^n a_i),

using the induction hypothesis in the last step. But using the fact proved above and the n = 2 case, we see that

a_1 ∩ Π_{i=2}^n a_i = a_1 Π_{i=2}^n a_i = Π_{i=1}^n a_i.

Finally, we are ready to prove the Chinese remainder theorem. Consider the ring homomorphism R → Π R/a_i defined by projection on each component of the product: x ↦ (a_1 + x, a_2 + x, …, a_n + x). It is easy to see that the kernel of this map is ∩ a_i, which is also Π a_i by the earlier part of the proof. So it only remains to show that the map is surjective.

Accordingly, take an arbitrary element (a_1 + x_1, a_2 + x_2, …, a_n + x_n) of Π R/a_i. Using the first part of the proof, for each i we can find elements y_i ∈ a_i and z_i ∈ Π_{j≠i} a_j such that y_i + z_i = 1. Put

x = x_1 z_1 + x_2 z_2 + ⋯ + x_n z_n.

Then for each i,

a_i + x = a_i + x_i z_i,    since x_j z_j ∈ a_i for all j ≠ i,
        = a_i + x_i y_i + x_i z_i,    since x_i y_i ∈ a_i,
        = a_i + x_i (y_i + z_i) = a_i + x_i · 1 = a_i + x_i.

Thus the map is surjective as required, and induces the isomorphism

R/Π a_i ≅ Π R/a_i.


Chapter 137
11P05 Warings problem and
variants
137.1

Lagranges four-square theorem

Lagrange's four-square theorem states that every non-negative integer may be expressed as the sum of at most four squares. By the Euler four-square identity, it is enough to show that every prime is expressible as a sum of at most four squares. It was later proved that any number not of the form 4^n(8m + 7) may be expressed as the sum of at most three squares.
This shows that g(2) = G(2) = 4, where g and G are the Waring functions.
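The theorem admits a direct brute-force check for small n. The sketch below (not part of the original entry) finds an explicit four-square representation with a ≤ b ≤ c ≤ d for every n up to a small bound.

```python
import math

def four_square(n):
    """Return (a, b, c, d) with a<=b<=c<=d and a^2+b^2+c^2+d^2 = n, by brute force."""
    r = math.isqrt(n)
    for a in range(r + 1):
        for b in range(a, r + 1):
            ab = a*a + b*b
            if ab > n:
                break
            for c in range(b, r + 1):
                abc = ab + c*c
                if abc > n:
                    break
                d2 = n - abc
                d = math.isqrt(d2)
                if d * d == d2 and d >= c:
                    return (a, b, c, d)
    return None

for n in range(0, 501):
    rep = four_square(n)
    assert rep is not None and sum(t*t for t in rep) == n, n
```

Restricting to sorted tuples loses nothing, since any representation can be reordered, and it keeps the search small.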
Version: 5 Owner: vitriol Author(s): vitriol

137.2

Warings problem

Waring asked whether it is possible to represent every natural number as a sum of a bounded number of nonnegative kth powers, that is, whether the set { n^k | n ∈ Z⁺ } is a basis. He was led to this conjecture by Lagrange's theorem, which asserted that every natural number can be represented as a sum of four squares.

Hilbert [1] was the first to prove Waring's problem for all k. In his paper he did not give an explicit bound on g(k), the number of powers needed, but later it was proved that

g(k) = 2^k + [(3/2)^k] − 2

except for possibly finitely many exceptional k, none of which are known to date.
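The formula for g(k) is easy to evaluate exactly, since [(3/2)^k] = ⌊3^k/2^k⌋ in integer arithmetic. The sketch below (not part of the original entry) checks it against the classically known values.

```python
def g(k):
    """g(k) = 2^k + floor((3/2)^k) - 2, valid for all k outside a
    (possibly empty) finite exceptional set."""
    return 2 ** k + (3 ** k) // (2 ** k) - 2

# Known values: g(2) = 4 (Lagrange), g(3) = 9, g(4) = 19, g(5) = 37.
assert [g(k) for k in (2, 3, 4, 5)] == [4, 9, 19, 37]
```

Note that g(k) is dominated by the single "hard" integer 2^k ⌊(3/2)^k⌋ − 1, which can only use powers 1^k and 2^k.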

Wooley [3], improving the result of Vinogradov, proved that the number of kth powers needed to represent all sufficiently large integers is

G(k) ≤ k(ln k + ln ln k + O(1)).

REFERENCES
1. David Hilbert. Beweis für die Darstellbarkeit der ganzen Zahlen durch eine feste Anzahl n-ter Potenzen (Waringsches Problem). Math. Ann., pages 281–300, 1909. Available electronically from GDZ.
2. Robert C. Vaughan. The Hardy-Littlewood method. Cambridge University Press, 1981.
Zbl 0868.11046.
3. Trevor D. Wooley. Large improvements in Warings problem. Ann. Math, 135(1):131164, 1992.
Zbl 0754.11026. Available online at JSTOR.

Version: 6 Owner: bbukh Author(s): bbukh

137.3

proof of Lagranges four-square theorem

The following proof is essentially Lagrange's original, from around 1770.

Lemma 1: For any integers a, b, c, d, w, x, y, z,

(a² + b² + c² + d²)(w² + x² + y² + z²) = (aw + bx + cy + dz)²
+ (ax − bw − cz + dy)²
+ (ay + bz − cw − dx)²
+ (az − by + cx − dw)².

This is the Euler four-square identity, q.v., with different notation.

Lemma 2: If an even number 2m is a sum of two squares, then so is m.

Proof: Say 2m = x² + y². Then x and y are both even or both odd. Therefore, in the identity

m = ((x − y)/2)² + ((x + y)/2)²,

both fractions on the right side are integers.
Lemma 3: If p is an odd prime, then a² + b² + 1 = kp for some integers a, b, k with 0 < k.

Proof: Consider the values of a², and the values of −b² − 1, for

a = 0, 1, …, (p − 1)/2    and    b = 0, 1, …, (p − 1)/2.

No two elements of the first set are congruent mod p, and no two of the second. Since each set has (p + 1)/2 elements, but there are only p residue classes, something in the first set is congruent to something in the second, i.e.

a² ≡ −b² − 1 (mod p), whence a² + b² + 1 = kp

for some k. Clearly 0 < k.
By Lemma 1 we need only show that an arbitrary prime p is a sum of four squares. Since that is trivial for p = 2, suppose p is odd. By Lemma 3, we know

mp = a² + b² + c² + d²

for some m, a, b, c, d with 0 < m. To complete the proof, we will show that if m > 1 then np is a sum of four squares for some n with 1 ≤ n < m.

If m is even, then none, two, or all four of a, b, c, d are even; in any of those cases, Lemma 2 allows us to take n = m/2. So assume m is odd but > 1. Write
w ≡ a (mod m)
x ≡ b (mod m)
y ≡ c (mod m)
z ≡ d (mod m)

where w, x, y, z are all in the interval (−m/2, m/2). We have

w² + x² + y² + z² < 4(m²/4) = m²

and

w² + x² + y² + z² ≡ 0 (mod m).
So w² + x² + y² + z² = nm for some integer n with 0 < n < m. But now look at Lemma 1. On the left is nm²p. Evidently these three sums:

ax − bw − cz + dy
ay + bz − cw − dx
az − by + cx − dw

are multiples of m. The same is true of the other sum on the right in Lemma 1:

aw + bx + cy + dz ≡ w² + x² + y² + z² ≡ 0 (mod m).

The equation in Lemma 1 can therefore be divided through by m². The result is an expression for np as a sum of four squares. Since 0 < n < m, the proof is complete.
Remark: Lemma 3 can be improved: it is enough for p to be an odd number, not necessarily
prime. But that stronger statement requires a longer proof.
Version: 2 Owner: mathcam Author(s): Larry Hammick

Chapter 138
11P81 Elementary theory of
partitions
138.1

pentagonal number theorem

Theorem:

Π_{k=1}^∞ (1 − x^k) = Σ_{n=−∞}^∞ (−1)^n x^{n(3n+1)/2}    (138.1.1)
where the two sides are regarded as formal power series over Z.
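As a formal power series identity, the theorem can be verified coefficient-by-coefficient up to any finite degree. The sketch below (not part of the original entry) expands the product and compares with the pentagonal-number series.

```python
def product_coeffs(N):
    """Coefficients up to x^N of prod_{k=1}^{N} (1 - x^k)."""
    coeffs = [0] * (N + 1)
    coeffs[0] = 1
    for k in range(1, N + 1):
        # Multiply by (1 - x^k): iterate downward so each x^k is used once.
        for i in range(N, k - 1, -1):
            coeffs[i] -= coeffs[i - k]
    return coeffs

def pentagonal_coeffs(N):
    """Coefficients of the sum over all integers n of (-1)^n x^{n(3n+1)/2}."""
    coeffs = [0] * (N + 1)
    for n in range(-N, N + 1):
        e = n * (3 * n + 1) // 2   # generalized pentagonal number
        if 0 <= e <= N:
            coeffs[e] += -1 if n % 2 else 1
    return coeffs

N = 60
assert product_coeffs(N) == pentagonal_coeffs(N)
```

Up to degree N, the factors (1 − x^k) with k > N do not affect the coefficients, so truncating the product at k = N is exact.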
Proof: For n ≥ 0, denote by f(n) the coefficient of x^n in the product on the left, i.e. write

Π_{k=1}^∞ (1 − x^k) = Σ_{n=0}^∞ f(n) x^n.

By this definition, we have for all n

f(n) = e(n) − d(n)

where e(n) (resp. d(n)) is the number of partitions of n as a sum of an even (resp. odd) number of distinct summands. To fix the notation, let P(n) be the set of pairs (s, g) where s is a natural number > 0 and g is a decreasing mapping {1, 2, …, s} → N⁺ such that Σ_x g(x) = n. The cardinality of P(n) is thus e(n) + d(n), and P(n) is the union of these two disjoint sets:

E(n) = {(s, g) ∈ P(n) | s is even},
D(n) = {(s, g) ∈ P(n) | s is odd}.

Now on the right side of (138.1.1) we have

1 + Σ_{n=1}^∞ (−1)^n x^{n(3n+1)/2} + Σ_{n=1}^∞ (−1)^n x^{n(3n−1)/2}.

Therefore what we want to prove is

e(n) = d(n) + (−1)^m    if n = m(3m ± 1)/2 for some m    (138.1.2)
e(n) = d(n)    otherwise.    (138.1.3)

For m ≥ 1 we have

m(3m + 1)/2 = 2m + (2m − 1) + … + (m + 1)    (138.1.4)
m(3m − 1)/2 = (2m − 1) + (2m − 2) + … + m    (138.1.5)
Take some (s, g) ∈ P(n), and suppose first that n is not of the form (138.1.4) nor (138.1.5). Since g is decreasing, there is a unique integer k ∈ [1, s] such that

g(j) = g(1) − j + 1 for j ∈ [1, k],    g(j) < g(1) − j + 1 for j ∈ [k + 1, s].

If g(s) ≤ k, define g′: [1, s − 1] → N⁺ by

g′(x) = g(x) + 1 if x ∈ [1, g(s)],
g′(x) = g(x) if x ∈ [g(s) + 1, s − 1].

If g(s) > k, define g′: [1, s + 1] → N⁺ by

g′(x) = g(x) − 1 if x ∈ [1, k],
g′(x) = g(x) if x ∈ [k + 1, s],
g′(s + 1) = k.

In both cases, g′ is decreasing and Σ_x g′(x) = n. The mapping g ↦ g′ takes an element having odd s to an element having even s, and vice versa. Finally, the reader can verify that g′′ = g. Thus we have constructed a bijection E(n) → D(n), proving (138.1.3).
Now suppose that n = m(3m + 1)/2 for some (perforce unique) m. The above construction still yields a bijection between E(n) and D(n) excluding (from one set or the other) the single element (m, g₀):

g₀(x) = 2m + 1 − x    for x ∈ [1, m]

as in (138.1.4). Likewise if n = m(3m − 1)/2, only this element (m, g₁) is excluded:

g₁(x) = 2m − x    for x ∈ [1, m]

as in (138.1.5). In both cases we deduce (138.1.2), completing the proof.


Remarks: The name of the theorem derives from the fact that the exponents n(3n + 1)/2 are the generalized pentagonal numbers.

The theorem was discovered and proved by Euler around 1750. This was one of the first results about what are now called theta functions, and was also one of the earliest applications of the formalism of generating functions.

The above proof is due to F. Franklin (Comptes Rendus de l'Académie des Sciences, 92, 1881, pp. 448–450).

Chapter 139
11R04 Algebraic numbers; rings of
algebraic integers
139.1

Dedekind domain

A Dedekind domain is an integral domain R for which:


Every ideal in R is finitely generated.
Every nonzero prime ideal is a maximal ideal.
The domain R is integrally closed in its field of fractions.
It is worth noting that the clause "every nonzero prime ideal is maximal" implies that the maximal length of a strictly increasing chain of prime ideals is 1, so the Krull dimension of any Dedekind domain is 1. In particular, the affine ring of an algebraic set is a Dedekind domain if and only if the set is normal, irreducible, and 1-dimensional.
Every Dedekind domain is a Noetherian ring.
If K is a number field, then OK , the ring of algebraic integers of K, is a Dedekind domain.
Version: 10 Owner: mathcam Author(s): mathcam, saforres

139.2

Dirichlets unit theorem

Let K be a number field, O_K be its ring of integers. Then

O_K^× ≅ μ(K) × Z^{r+s−1}.

Here μ(K) is the finite cyclic group of the roots of unity in O_K, r is the number of real embeddings K → R, and 2s is the number of non-real complex embeddings K → C (they occur in complex conjugate pairs, so s is an integer).
Version: 3 Owner: sucrose Author(s): sucrose
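For a real quadratic field K = Q(√d) we have r = 2 and s = 0, so the unit rank is r + s − 1 = 1: the units form {±1} times the powers of a single fundamental unit. The sketch below (not part of the original entry) finds that generator for the order Z[√d] by brute-force search over the Pell equations x² − dy² = ±1; note that for d ≡ 1 (mod 4) the full ring of integers is larger than Z[√d], so this gives a finite-index subgroup of O_K^× in that case.

```python
from math import isqrt

def fundamental_unit(d):
    """Smallest x + y*sqrt(d) > 1 with x^2 - d*y^2 = +1 or -1, i.e. a
    generator (up to roots of unity) of the units of the order Z[sqrt(d)]."""
    y = 1
    while True:
        for norm in (1, -1):
            x2 = norm + d * y * y
            x = isqrt(x2)
            if x * x == x2:
                return x, y, norm
        y += 1

# Q(sqrt(2)): fundamental unit 1 + sqrt(2), of norm -1.
assert fundamental_unit(2) == (1, 1, -1)
# Q(sqrt(3)): fundamental unit 2 + sqrt(3), of norm +1.
assert fundamental_unit(3) == (2, 1, 1)
```

Searching in increasing y works because the units x + y√d > 1 of the order increase with y.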

139.3

Eisenstein integers

Let ω = (−1 + √−3)/2, where we arbitrarily choose √−3 to be either of the complex numbers whose square is −3. The Eisenstein integers are the ring Z[ω] = {a + bω : a, b ∈ Z}.
Version: 4 Owner: KimJ Author(s): KimJ
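Since ω satisfies ω² + ω + 1 = 0, the product of two Eisenstein integers stays in Z[ω], with (a + bω)(c + dω) = (ac − bd) + (ad + bc − bd)ω. The sketch below (not part of the original entry) checks both facts numerically.

```python
import cmath

omega = (-1 + cmath.sqrt(-3)) / 2   # a primitive cube root of unity

# omega satisfies omega^2 + omega + 1 = 0
assert abs(omega**2 + omega + 1) < 1e-12

def multiply(a, b, c, d):
    """(a + b*omega)(c + d*omega) in integer coordinates, using omega^2 = -1 - omega."""
    return (a * c - b * d, a * d + b * c - b * d)

# Spot-check the coordinate formula against complex arithmetic.
for (a, b, c, d) in [(1, 2, 3, 4), (0, 1, 0, 1), (-2, 5, 7, -3)]:
    e, f = multiply(a, b, c, d)
    assert abs((a + b*omega) * (c + d*omega) - (e + f*omega)) < 1e-9
```

Closure under multiplication with integer coordinates is exactly what makes Z[ω] a ring.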

139.4

Galois representation

In general, let K be any field. Write K̄ for a separable closure of K, and G_K for the absolute Galois group Gal(K̄/K) of K. Let A be a (Hausdorff) abelian topological group. Then an (A-valued) Galois representation for K is a continuous homomorphism

ρ: G_K → Aut(A),

where we endow G_K with the Krull topology, and where Aut(A) is the group of continuous automorphisms of A, endowed with the compact-open topology. One calls A the representation space for ρ.
The simplest case is where A = C^n, the group of n × 1 column vectors with complex entries. Then Aut(C^n) = GL_n(C), and we have what is usually called a complex representation of degree n. In the same manner, letting A = F^n, with F any field (such as R or a finite field F_q), we obtain the usual definition of a degree n representation over F.

There is an alternate definition which we should also mention. Write Z[G_K] for the group ring of G_K with coefficients in Z. Then a Galois representation for K is simply a continuous Z[G_K]-module A. In other words, all the information in a representation is preserved in considering the representation space A as a continuous Z[G_K]-module. The equivalence of these two definitions is as described in the entry for the group algebra.

When A is complete, the continuity requirement is equivalent to the action of Z[G_K] on A naturally extending to a Z[[G_K]]-module structure on A. The notation Z[[G_K]] denotes the completed group ring:

Z[[G]] = lim← Z[G/H],

where G is any profinite group, and H ranges over all normal subgroups of finite index.

A notation we will be using often is the following. Suppose ρ: G_K → Aut(A) is a representation, and H ⊆ G_K a subgroup. Then we let

A^H = {a ∈ A | ρ(h)a = a for all h ∈ H},

the subgroup of A fixed pointwise by H.

Given a Galois representation ρ, let G₀ = ker ρ. By the fundamental theorem of infinite Galois theory, since G₀ is a closed normal subgroup of G_K, it corresponds to a certain normal subfield of K̄. Naturally, this is the fixed field of G₀, and we denote it by K(ρ). (The notation becomes better justified after we view some examples.) Notice that since ρ is trivial on G₀ = Gal(K̄/K(ρ)), it factors through a representation

ρ̃: Gal(K(ρ)/K) → Aut(A),

which is faithful. This property characterizes K(ρ).

In the case A = R^n or A = C^n, the so-called "no small subgroups" argument implies that the image of G_K is finite.
For a first application of the definition, we say that ρ is discrete if for all a ∈ A, the stabilizer of a in G_K is open in G_K. This is the case when A is given the discrete topology, such as when A is finite and Hausdorff. The stabilizer of any a ∈ A fixes a finite extension of K, which we denote by K(a). One has that K(ρ) is the union of all the K(a).

As a second application, suppose that the image ρ(G_K) is Abelian. Then the quotient G_K/G₀ is Abelian, so G₀ contains the commutator subgroup of G_K, which means that K(ρ) is contained in K^ab, the maximal Abelian extension of K. This is the case when ρ is a character, i.e. a 1-dimensional representation over some (commutative unital) ring,

ρ: G_K → GL₁(A) = A^×.

Associated to any field K are two basic Galois representations, namely those with representation spaces A = L and A = L^×, for any normal intermediate field K ⊆ L ⊆ K̄, with the usual action of the Galois group on them. Both of these representations are discrete. The additive representation is rather simple if L/K is finite: by the normal basis theorem, it is merely a permutation representation on the normal basis. Also, if L = K̄ and x ∈ K̄, then K(x), the field obtained by adjoining x to K, agrees with the fixed field of the stabilizer of x in G_K. This motivates the notation K(a) introduced above.

By contrast, in general, L^× can become a rather complicated object. To look at just a piece of the representation L^×, assume that L contains the group μ_m of m-th roots of unity, where m is prime to the characteristic of K. Then we let A = μ_m. It is possible to choose an isomorphism of Abelian groups μ_m ≅ Z/m, and it follows that our representation is χ: G_K → (Z/m)^×. Now assume that m has the form p^n, where p is a prime not equal to the characteristic, and set A_n = μ_{p^n}. This gives a sequence of representations χ_n: G_K → (Z/p^n)^×, which are compatible with the natural maps (Z/p^{n+1})^× → (Z/p^n)^×. This compatibility allows us to glue them together into a big representation

χ: G_K → Aut(T_p G_m) ≅ Z_p^×,

called the p-adic cyclotomic representation of K. This representation is often not discrete. The notation T_p G_m will be explained below.
The notation Tp Gm will be explained below.
This example may be generalized as follows. Let B be an Abelian algebraic group defined over K. For each integer n, let B_n = B(K̄)[p^n] be the set of K̄-rational points whose order divides p^n. Then we define the p-adic Tate module of B via

T_p B = lim← B_n.

It acquires a natural Galois action from the ones on the B_n. The two most commonly treated examples of this are the cases B = G_m (the multiplicative group, giving the cyclotomic representation above) and B = E, an elliptic curve defined over K.

The last thing which we shall mention about generalities is that to any Galois representation ρ: G_K → Aut(A), one may associate the Galois cohomology groups H^n(K, ρ), more commonly written H^n(K, A), which are defined to be the group cohomology of G_K (computed with continuous cochains) with coefficients in A.

Galois representations play a fundamental role in algebraic number theory, as many objects
and properties related to global fields and local fields may be determined by certain Galois
representations and their properties. We shall describe the local case first, and then the
global case.

Let $K$ be a local field, by which we mean the fraction field of a complete DVR with finite
residue field. We write $v_K$ for the normalized valuation, $\mathcal{O}_K$ for the associated DVR, $\mathfrak{m}_K$ for
the maximal ideal of $\mathcal{O}_K$, $k_K = \mathcal{O}_K/\mathfrak{m}_K$ for the residue field, and $\ell$ for the characteristic of
$k_K$.
Let $L/K$ be a finite Galois extension, and define $v_L$, $\mathcal{O}_L$, $\mathfrak{m}_L$, and $k_L$ accordingly. There
is a natural surjection $\mathrm{Gal}(L/K) \to \mathrm{Gal}(k_L/k_K)$. We call the kernel of this map the
inertia group, and write it $I(L/K) = \ker(\mathrm{Gal}(L/K) \to \mathrm{Gal}(k_L/k_K))$. Further, the $p$-Sylow
subgroup of $I(L/K)$ is normal, and we call it the wild ramification group, and denote
it by $W(L/K)$. One calls $I/W$ the tame ramification group.
It happens that the formation of these groups is compatible with extensions $L'/L/K$, in
that we have surjections $I(L'/K) \to I(L/K)$ and $W(L'/K) \to W(L/K)$. This lets us
define $W_K \subseteq I_K \subseteq G_K$ to be the inverse limits of the subgroups $W(L/K) \subseteq I(L/K) \subseteq \mathrm{Gal}(L/K)$, with $L$ as usual ranging over all finite Galois extensions of $K$ in $\bar{K}$.
Let $\rho$ be a Galois representation for $K$ with representation space $A$. We say that $\rho$ is
unramified if the inertia group $I_K$ acts trivially on $A$, or in other words $I_K \subseteq \ker \rho$, or $A^{I_K} = A$.
Otherwise we say it is ramified. Similarly, we say that $\rho$ is (at most) tamely ramified if
the wild ramification group acts trivially, or $W_K \subseteq \ker \rho$, or $A^{W_K} = A$; and if not we say it
is wildly ramified.
We let $K^{\mathrm{ur}} = \bar{K}^{I_K}$ be the maximal unramified extension of $K$, and $K^{\mathrm{tame}} = \bar{K}^{W_K}$ be the
maximal tamely ramified extension of $K$.

Unramified or tamely ramified extensions are usually much easier to study than wildly ramified
extensions. In the unramified case, this results from the fact that $G_K/I_K \cong G_{k_K} \cong \hat{\mathbb{Z}}$
is pro-cyclic. Thus an unramified representation is completely determined by the action of
$\rho(\phi)$ for $\phi$ a topological generator of $G_K/I_K$. (Such a $\phi$ is often called a Frobenius element.)
Given a finite extension $L/K$, one defines the inertia degree $f_{L/K} = [k_L : k_K]$ and the
ramification degree $e_{L/K} = [v_L(L^{\times}) : v_L(K^{\times})]$ as usual. Then in the Galois case one may
recover them as $f_{L/K} = [\mathrm{Gal}(L/K) : I(L/K)]$ and $e_{L/K} = \# I(L/K)$. The tame inertia
degree, which is the non-$p$-part of $e_{L/K}$, is equal to $[I(L/K) : W(L/K)]$, while the wild
inertia degree, which is the $p$-part of $e_{L/K}$, is equal to $\# W(L/K)$.
One finds that the inertia and ramification properties of $L/K$ may be computed from the
ramification properties of the Galois representation on $\mathcal{O}_L$.

We now turn to global fields. We shall only treat the number field case. Thus we let K be
a finite extension of Q, and write OK for its ring of integers. For each place v of K, write
Kv for the completion of K with respect to v. When v is a finite place, we write simply v
for its associated normalized valuation, Ov for OKv , mv for mKv , kv for kKv , and `(v) for the
characteristic of kv .
For each place $v$, fix an algebraic closure $\bar{K}_v$ of $K_v$. Furthermore, choose an embedding
$\bar{K} \hookrightarrow \bar{K}_v$. This choice is equivalent to choosing an extension of $v$ to all of $\bar{K}$, and to choosing an
embedding $G_{K_v} \hookrightarrow G_K$. We denote the image of this last embedding by $G_v \subseteq G_K$; it is called
a decomposition group at $v$. Sitting inside $G_v$ are two groups, $I_v$ and $W_v$, corresponding to
the inertia and wild ramification subgroups $I_{K_v}$ and $W_{K_v}$ of $G_{K_v}$; we call $I_v$ and
$W_v$ the inertia group at $v$ and the wild ramification group at $v$, respectively.


For a Galois representation $\rho \colon G_K \to \mathrm{Aut}(A)$ and a place $v$, it is profitable to consider
the restricted representation $\rho_v = \rho|_{G_v}$. One calls $\rho$ a global representation, and $\rho_v$ a local
representation. We say that $\rho$ is ramified or tamely ramified (or not) at $v$ if $\rho_v$ is (or
isn't). The Tchebotarev density theorem implies that the corresponding Frobenius elements
$\phi_v \in G_v$ are dense in $G_K$, so that the union of the $G_v$ is dense in $G_K$. Therefore, it is
reasonable to try to reduce questions about $\rho$ to questions about all the $\rho_v$ independently.
This is a manifestation of Hasse's local-to-global principle.
Given a global Galois representation with representation space $\mathbb{Z}_p^n$ which is unramified at
all but finitely many places $v$, it is a goal of number theory to prove that it arises naturally
in arithmetic geometry (namely, as a subrepresentation of an étale cohomology group of a
motive), and also to prove that it arises from an automorphic form. This can only be shown
in certain special cases.
Version: 6 Owner: jay Author(s): jay

139.5

Gaussian integer

A complex number of the form $a + bi$, where $a, b \in \mathbb{Z}$, is called a Gaussian integer.


It is easy to see that the set $S$ of all Gaussian integers is a subring of $\mathbb{C}$; specifically, $S$ is the
smallest subring containing $\{1, i\}$, whence $S = \mathbb{Z}[i]$. We denote this ring by $G$ below.
G is a Euclidean ring, hence a principal ring, hence a unique factorization domain.
There are four units (i.e. invertible elements) in the ring $G$, namely $\pm 1$ and $\pm i$. Up to
multiplication by units, the primes in $G$ are:
- the ordinary prime numbers $\equiv 3 \pmod 4$;
- the elements of the form $a \pm bi$ where $a^2 + b^2$ is an ordinary prime $\equiv 1 \pmod 4$ (see
Thue's lemma);
- the element $1 + i$.
Using the ring of Gaussian integers, it is not hard to show, for example, that the Diophantine
equation $x^2 + 1 = y^3$ has no solutions $(x, y) \in \mathbb{Z} \times \mathbb{Z}$ except $(0, 1)$.
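The three-way classification of primes above can be checked numerically. The following sketch (helper names are ours) classifies a rational prime $p$ by its residue mod 4, finding the two-squares representation by brute force in the split case:

```python
# Sketch: classify how a rational prime p behaves in the Gaussian integers Z[i],
# following the case analysis above.

def split_in_gaussian_integers(p):
    """Return a description of the factorization of the prime p in Z[i]."""
    if p == 2:
        return "ramified: 2 = -i * (1+i)^2"
    if p % 4 == 3:
        return "inert: p stays prime in Z[i]"
    # p = 1 (mod 4): p = a^2 + b^2, so p splits as (a+bi)(a-bi)
    for a in range(1, int(p**0.5) + 1):
        b2 = p - a * a
        b = int(b2**0.5)
        if b * b == b2:
            return f"split: {p} = ({a}+{b}i)({a}-{b}i)"

print(split_in_gaussian_integers(2))
print(split_in_gaussian_integers(7))    # 7 = 3 (mod 4)
print(split_in_gaussian_integers(13))   # 13 = 4 + 9
```

By Fermat's two-squares theorem the search in the last branch always succeeds for $p \equiv 1 \pmod 4$.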
Version: 6 Owner: Daume Author(s): Larry Hammick, KimJ


139.6

algebraic conjugates

Let $L$ be an algebraic extension of a field $K$, and let $\alpha_1 \in L$ be algebraic over $K$. Then
$\alpha_1$ is a root of a minimal polynomial $f(x) \in K[x]$. Denote the other roots of $f(x)$ in $L$
by $\alpha_2, \alpha_3, \ldots, \alpha_n$. These are the algebraic conjugates of $\alpha_1$, and any two are said to be
algebraically conjugate.
The notion of algebraic conjugacy is a special case of group conjugacy in the case where the
group in question is the Galois group of the above minimal polynomial, viewed as acting on
the roots of said polynomial.
Version: 3 Owner: mathcam Author(s): mathcam

139.7

algebraic integer

Let $K$ be an extension of $\mathbb{Q}$. A number $\alpha \in K$ is called an algebraic integer of $K$ if it is the
root of a monic polynomial with coefficients in $\mathbb{Z}$, i.e., an element of $K$ that is integral over
$\mathbb{Z}$. Every algebraic integer is an algebraic number (with $K = \mathbb{C}$), but the converse is false.
Version: 4 Owner: KimJ Author(s): KimJ

139.8

algebraic number

A number $\alpha \in \mathbb{C}$ is called an algebraic number if there exists a polynomial $f(x) = a_n x^n +
\cdots + a_0$ such that $a_0, \ldots, a_n$, not all zero, are in $\mathbb{Q}$ and $f(\alpha) = 0$.
Version: 4 Owner: KimJ Author(s): KimJ

139.9

algebraic number field

A field $K \subseteq \mathbb{C}$ is called an algebraic number field if its dimension over $\mathbb{Q}$ is finite.


Version: 3 Owner: KimJ Author(s): KimJ


139.10

calculating the splitting of primes

Let $K|L$ be an extension of number fields, with rings of integers $\mathcal{O}_K$, $\mathcal{O}_L$. Since this extension
is separable, there exists $\alpha \in K$ with $L(\alpha) = K$, and by multiplying by a suitable integer, we
may assume that $\alpha \in \mathcal{O}_K$ (we do not require that $\mathcal{O}_L[\alpha] = \mathcal{O}_K$; there is not, in general, an
$\alpha \in \mathcal{O}_K$ with this property). Let $f \in \mathcal{O}_L[x]$ be the minimal polynomial of $\alpha$.
Now, let $\mathfrak{p}$ be a prime ideal of $L$ that does not divide the conductor of $\mathcal{O}_L[\alpha]$ in $\mathcal{O}_K$, let $\bar{f} \in (\mathcal{O}_L/\mathfrak{p}\mathcal{O}_L)[x]$
be the reduction of $f$ mod $\mathfrak{p}$, and let $\bar{f} = \bar{f}_1 \cdots \bar{f}_n$ be its factorization into irreducible
polynomials. If there are no repeated factors, then $\mathfrak{p}$ splits in $K$ as the product
$$\mathfrak{p} = (\mathfrak{p}, f_1(\alpha)) \cdots (\mathfrak{p}, f_n(\alpha)),$$
where $f_i$ is any polynomial in $\mathcal{O}_L[x]$ reducing to $\bar{f}_i$. Note that in this case $\mathfrak{p}$ is unramified,
since all the $\bar{f}_i$ are pairwise coprime mod $\mathfrak{p}$.

For example, let $L = \mathbb{Q}$, $K = \mathbb{Q}(\sqrt{d})$ where $d$ is a square-free integer. Then $f = x^2 - d$. For
any prime $p$, $f$ is irreducible mod $p$ if and only if it has no roots mod $p$, i.e. $d$ is a quadratic
non-residue mod $p$. Using quadratic reciprocity, we can obtain a congruence condition mod
$4p$ for which primes split and which do not. In general, this is possible for all fields with
abelian Galois groups, using class field theory.
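Whether $d$ is a square mod an odd prime $p$ can be tested with Euler's criterion, $d^{(p-1)/2} \equiv 1 \pmod p$. A minimal sketch (the helper name is ours):

```python
# Sketch: decide whether an odd prime p (not dividing d) splits in Q(sqrt(d)),
# i.e. whether x^2 - d factors mod p, via Euler's criterion.

def splits(d, p):
    """True iff d is a nonzero square mod the odd prime p."""
    return pow(d, (p - 1) // 2, p) == 1

# d = 5: quadratic reciprocity predicts p splits iff p = +-1 (mod 5)
print([p for p in [3, 7, 11, 13, 19, 29] if splits(5, p)])
```

The printed primes are exactly those congruent to $\pm 1$ mod 5, matching the congruence condition promised by quadratic reciprocity.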
Furthermore, let $K'$ be the splitting field of $f$ over $L$. Then $G = \mathrm{Gal}(K'|L)$ acts on the roots of $f$,
giving a map $G \to S_m$, where $m = \deg f$. Given a prime $\mathfrak{p}$ of $\mathcal{O}_L$, the Artin symbol $[\mathfrak{P}, K'|L]$
for any $\mathfrak{P}$ lying over $\mathfrak{p}$ is determined up to conjugacy by $\mathfrak{p}$. Its image in $S_m$ is a product of
disjoint cycles of lengths $m_1, \ldots, m_n$, where $m_i = \deg \bar{f}_i$. This information is useful not just
for prime splitting, but also for the calculation of Galois groups.
Another useful fact is the Frobenius density theorem, which states that every element of $G$
is $[\mathfrak{P}, K'|L]$ for infinitely many primes $\mathfrak{P}$ of $\mathcal{O}_{K'}$.
For example, let $f = x^3 + x^2 + 2 \in \mathbb{Z}[x]$. This is irreducible mod 3, and thus irreducible.
Galois theory tells us that $G = \mathrm{Gal}(K'|L)$ is a subgroup of $S_3$, and so is isomorphic to $C_3$ or
$S_3$, but it is not obvious which. But if we consider $p = 7$, then $f \equiv (x - 2)(x^2 + 3x - 1) \pmod 7$,
and the quadratic factor is irreducible mod 7. Thus, $G \cong S_3$.
Or let $f = x^4 + ax^2 + b$ for some integers $a$, $b$, and suppose $f$ is irreducible. For a prime $p$, consider the
factorization of $f$ mod $p$. Either it remains irreducible ($G$ contains a 4-cycle), it splits as the product
of irreducible quadratics ($G$ contains a cycle of the form $(12)(34)$), or $f$ has a root. If $\alpha$ is
a root of $f$, then so is $-\alpha$, and so assuming $p \neq 2$, there are at least two roots, and so a
3-cycle is impossible. Thus $G \cong C_4$ or $D_4$.
Version: 7 Owner: mathcam Author(s): mathcam, bwebste


139.11

characterization in terms of prime ideals

Let R be a Dedekind domain and let I be an ideal of R. Then there exists an ideal J in R
such that IJ is principal.
Version: 3 Owner: saforres Author(s): saforres

139.12

ideal classes form an abelian group

As above, define $\mathcal{C}$ as the set of ideal classes, with multiplication defined by
$$[\mathfrak{a}] * [\mathfrak{b}] = [\mathfrak{a}\mathfrak{b}]$$
where $\mathfrak{a}$, $\mathfrak{b}$ are ideals of $\mathcal{O}_K$.
We shall check the group properties:
1. Associativity: $[\mathfrak{a}] * ([\mathfrak{b}] * [\mathfrak{c}]) = [\mathfrak{a}] * [\mathfrak{b}\mathfrak{c}] = [\mathfrak{a}(\mathfrak{b}\mathfrak{c})] = [\mathfrak{a}\mathfrak{b}\mathfrak{c}] = [(\mathfrak{a}\mathfrak{b})\mathfrak{c}] = [\mathfrak{a}\mathfrak{b}] * [\mathfrak{c}] = ([\mathfrak{a}] * [\mathfrak{b}]) * [\mathfrak{c}]$.
2. Identity element: $[\mathcal{O}_K] * [\mathfrak{b}] = [\mathfrak{b}] = [\mathfrak{b}] * [\mathcal{O}_K]$.
3. Inverses: Consider $[\mathfrak{b}]$. Let $b$ be a nonzero integer in $\mathfrak{b}$. Then $(b) \subseteq \mathfrak{b}$, so there exists $\mathfrak{c}$ such
that $\mathfrak{b}\mathfrak{c} = (b)$. Then the ideal class $[\mathfrak{b}] * [\mathfrak{c}] = [(b)] = [\mathcal{O}_K]$.
Then $\mathcal{C}$ is a group under the operation $*$.
It is abelian since $[\mathfrak{a}][\mathfrak{b}] = [\mathfrak{a}\mathfrak{b}] = [\mathfrak{b}\mathfrak{a}] = [\mathfrak{b}][\mathfrak{a}]$.
Version: 5 Owner: drini Author(s): drini, saforres

139.13

integral basis

Let $K$ be a number field. A set of algebraic integers $\{\alpha_1, \ldots, \alpha_s\}$ is said to be an integral
basis for $K$ if every $\gamma$ in $\mathcal{O}_K$ can be represented uniquely as an integer linear combination of
$\{\alpha_1, \ldots, \alpha_s\}$ (i.e. one can write $\gamma = m_1\alpha_1 + \cdots + m_s\alpha_s$ with $m_1, \ldots, m_s$ (rational) integers).
If $I$ is an ideal of $\mathcal{O}_K$, then $\{\alpha_1, \ldots, \alpha_s\} \subseteq I$ is said to be an integral basis for $I$ if every
element of $I$ can be represented uniquely as an integer linear combination of $\{\alpha_1, \ldots, \alpha_s\}$.
(In the above, $\mathcal{O}_K$ denotes the ring of algebraic integers of $K$.)

An integral basis for $K$ is, in particular, a basis for $K$ as a vector space over $\mathbb{Q}$.


Version: 7 Owner: saforres Author(s): saforres

139.14

integrally closed

A subring $R$ of a ring $S$ is said to be integrally closed in $S$ if whenever $\alpha \in S$ and $\alpha$ is integral
over $R$, then $\alpha \in R$.
The integral closure of R in S is integrally closed in S.
A ring R is said to be integrally closed (or normal) if it is integrally closed in its fraction field.
Version: 5 Owner: saforres Author(s): saforres

139.15

transcendental root theorem

Suppose a constant $x$ is transcendental over some field $F$. Then $\sqrt{x}$ is also transcendental
over $F$. Informally, this theorem is true because if $\sqrt{x}$ were algebraic, then we could take
its minimal polynomial, group the terms into odd and even powers, and then show that $x$ is
also algebraic over $F$, a contradiction.
In fact this theorem is true for 3rd roots, 4th roots, 5th roots, ..., nth roots etc, but the proof
is somewhat more involved.
Version: 4 Owner: kidburla2003 Author(s): kidburla2003


Chapter 140
11R06 PV-numbers and
generalizations; other special
algebraic numbers
140.1

Salem number

A Salem number is a real algebraic integer $\tau > 1$ whose algebraic conjugates all lie in the
unit disk $\{\, z \in \mathbb{C} : |z| \leq 1 \,\}$, with at least one on the unit circle $\{\, z \in \mathbb{C} : |z| = 1 \,\}$.
Powers $\tau^n$ ($n = 1, 2, \ldots$) of a Salem number are everywhere dense modulo 1, but are not
uniformly distributed modulo 1.
The smallest known Salem number is the largest positive root of
$$x^{10} + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1 = 0.$$
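This polynomial is Lehmer's polynomial, and its largest real root ($\approx 1.17628$) can be located numerically, for instance by bisection; the variable names below are ours:

```python
# Sketch: locate the smallest known Salem number, the largest real root of
# Lehmer's polynomial x^10 + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1.

def lehmer(x):
    return x**10 + x**9 - x**7 - x**6 - x**5 - x**4 - x**3 + x + 1

lo, hi = 1.0, 1.2           # lehmer(1) = -1 < 0 < lehmer(1.2)
for _ in range(60):         # bisection to high precision
    mid = (lo + hi) / 2
    if lehmer(mid) < 0:
        lo = mid
    else:
        hi = mid
print(round(lo, 5))
```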
Version: 3 Owner: bbukh Author(s): bbukh


Chapter 141
11R11 Quadratic extensions
141.1

prime ideal decomposition in quadratic extensions of Q

Let $K$ be a quadratic number field, i.e. $K = \mathbb{Q}(\sqrt{d})$ for some square-free integer $d$. The
discriminant of the extension is
$$D_{K/\mathbb{Q}} = \begin{cases} d, & \text{if } d \equiv 1 \mod 4, \\ 4d, & \text{if } d \equiv 2, 3 \mod 4. \end{cases}$$
Let $\mathcal{O}_K$ denote the ring of integers of $K$. We have:
$$\mathcal{O}_K = \begin{cases} \mathbb{Z} \oplus \frac{1 + \sqrt{d}}{2}\,\mathbb{Z}, & \text{if } d \equiv 1 \mod 4, \\ \mathbb{Z} \oplus \sqrt{d}\,\mathbb{Z}, & \text{if } d \equiv 2, 3 \mod 4. \end{cases}$$
The prime ideals of $\mathbb{Z}$ decompose as follows in $\mathcal{O}_K$:
Theorem 10. Let $p \in \mathbb{Z}$ be a prime.
1. If $p \mid d$ (divides), then $p\mathcal{O}_K = (p, \sqrt{d})^2$;
2. If $d$ is odd, then
$$2\mathcal{O}_K = \begin{cases} (2, 1 + \sqrt{d})^2, & \text{if } d \equiv 3 \mod 4, \\ \left(2, \frac{1+\sqrt{d}}{2}\right)\left(2, \frac{1-\sqrt{d}}{2}\right), & \text{if } d \equiv 1 \mod 8, \\ \text{prime}, & \text{if } d \equiv 5 \mod 8; \end{cases}$$
3. If $p \neq 2$ and $p$ does not divide $d$, then
$$p\mathcal{O}_K = \begin{cases} (p, n + \sqrt{d})(p, n - \sqrt{d}), & \text{if } d \equiv n^2 \mod p, \\ \text{prime}, & \text{if } d \text{ is not a square} \mod p. \end{cases}$$
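The case analysis of Theorem 10 is easy to turn into a small program. The sketch below (helper names ours) reports the splitting type of $p$ in $\mathbb{Q}(\sqrt{d})$, using Euler's criterion for the odd-prime case:

```python
# Sketch of Theorem 10: the shape of p*O_K in Q(sqrt(d)), d squarefree.

def decomposition(d, p):
    if p == 2:
        if d % 2 == 0 or d % 4 == 3:
            return "ramified"
        return "split" if d % 8 == 1 else "inert"   # d = 1 or 5 (mod 8)
    if d % p == 0:
        return "ramified"
    # Euler's criterion: d is a square mod p iff d^((p-1)/2) = 1 (mod p)
    return "split" if pow(d, (p - 1) // 2, p) == 1 else "inert"

# In Q(i) (d = -1): 2 ramifies, p = 1 (mod 4) splits, p = 3 (mod 4) is inert.
print(decomposition(-1, 2), decomposition(-1, 5), decomposition(-1, 7))
```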

REFERENCES
1. Daniel A.Marcus, Number Fields. Springer, New York.

Version: 1 Owner: alozano Author(s): alozano


Chapter 142
11R18 Cyclotomic extensions
142.1

Kronecker-Weber theorem

The following theorem classifies the possible Abelian extensions of Q.


Theorem 11 (Kronecker-Weber Theorem). Let $L/\mathbb{Q}$ be a finite abelian extension. Then
$L$ is contained in a cyclotomic extension, i.e. there is a root of unity $\zeta$ such that $L \subseteq \mathbb{Q}(\zeta)$.
In a similar fashion to this result, the theory of elliptic curves with complex multiplication
provides a classification of abelian extensions of quadratic imaginary number fields:
Theorem 12. Let K be a quadratic imaginary number field with ring of integers OK . Let
E be an elliptic curve with complex multiplication by OK and let j(E) be the j-invariant of
E. Then:
1. K(j(E)) is the Hilbert class field of K.
2. If j(E) 6= 0, 1728 then the maximal abelian extension of K is given by:
K ab = K(j(E), h(Etorsion ))
where h(Etorsion ) is the set of x-coordinates of all the torsion points of E.
Note: The map h : E C is called a Weber function for E. We can define a Weber
function for the cases j(E) = 0, 1728 so the theorem holds true for those two cases as well.
Assume $E \colon y^2 = x^3 + Ax + B$; then:
$$h(P) = \begin{cases} x(P), & \text{if } j(E) \neq 0, 1728; \\ x^2(P), & \text{if } j(E) = 1728; \\ x^3(P), & \text{if } j(E) = 0. \end{cases}$$

REFERENCES
1. S. Lang, Algebraic Number Theory, Springer-Verlag, New York.
2. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York.

Version: 1 Owner: alozano Author(s): alozano

142.2

examples of regular primes

Examples:
1. These are all the irregular primes up to 1061:
37,59,67,101,103,131,149,157,233,257,263,271,283,293,307, 311,347,353,379,389,401,
409,421,433,461,463,467,491,523,541,547,557,577,587,593, 607,613,617,619,631,647,
653,659,673,677,683,691,727,751,757,761,773,797,809,811, 821,827,839,877,881,887,
929,953,971,1061.
(for this, see the On-Line Encyclopedia of Integer Sequences, sequence A000928)
2. The following are the first few class numbers of the cyclotomic fields $\mathbb{Q}(\zeta_p)$, where $\zeta_p$
is a primitive $p$-th root of unity:

   p    Class number
   3    1
   5    1
   7    1
   11   1
   13   1
   17   1
   19   1
   23   3
   29   8
   31   9
   37   37
   41   121
   43   211
   47   695
   53   4889
   59   41241
   61   76301
Remarks:
- Notice that 37 divides 37, and 59 divides $41241 = 3 \cdot 59 \cdot 233$, thus 37, 59 are
irregular primes (see above).
- The class number of the cyclotomic fields grows very quickly with $p$. For example,
$p = 19$ gives the last cyclotomic field of class number 1.
Version: 3 Owner: alozano Author(s): alozano

142.3

prime ideal decomposition in cyclotomic extensions of Q

Let $q \in \mathbb{Z}$ be a prime greater than 2, let $\zeta_q = e^{2\pi i/q}$ and write $L = \mathbb{Q}(\zeta_q)$ for the cyclotomic extension.
The ring of integers of $L$ is $\mathcal{O}_L = \mathbb{Z}[\zeta_q]$. The discriminant of $L/\mathbb{Q}$ is:
$$D_{L/\mathbb{Q}} = \pm q^{q-2},$$
and it is $+$ exactly when $q \equiv 1 \mod 4$.

Proposition 4. $\sqrt{\pm q} \in \mathbb{Q}(\zeta_q)$, with $+$ exactly when $q \equiv 1 \mod 4$.


It can be proved that:
$$D_{L/\mathbb{Q}} = \pm q^{q-2} = \prod_{1 \leq i < j \leq q-1} (\zeta_q^i - \zeta_q^j)^2.$$
Taking square roots we obtain
$$q^{\frac{q-3}{2}} \sqrt{\pm q} = \prod_{1 \leq i < j \leq q-1} (\zeta_q^i - \zeta_q^j) \in \mathbb{Q}(\zeta_q).$$
Hence the result holds (and the sign depends on whether $q \equiv 1 \mod 4$).

Let $K = \mathbb{Q}(\sqrt{\pm q})$ with the corresponding sign. Thus, by the proposition we have a tower of
fields:
$$L = \mathbb{Q}(\zeta_q) \supseteq K \supseteq \mathbb{Q}.$$
For a prime ideal $p\mathbb{Z}$ the decomposition in the quadratic extension $K/\mathbb{Q}$ is well-known (see
this entry). The next theorem characterizes the decomposition in the extension $L/\mathbb{Q}$:
Theorem 13. Let $p \in \mathbb{Z}$ be a prime.
1. If $p = q$, then $q\mathcal{O}_L = (1 - \zeta_q)^{q-1}$. In other words, the prime $q$ is totally ramified in $L$.
2. If $p \neq q$, then $p\mathbb{Z}$ splits into $(q-1)/f$ distinct primes in $\mathcal{O}_L$, where $f$ is the order of
$p$ mod $q$ (i.e. $f$ is the smallest positive integer with $p^f \equiv 1 \mod q$).
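Part 2 of Theorem 13 is easy to check numerically: compute the multiplicative order $f$ of $p$ mod $q$ and divide. A minimal sketch (helper names ours):

```python
# Sketch: the splitting type of p in Q(zeta_q) from the order of p mod q.

def order_mod(p, q):
    """Multiplicative order of p modulo the odd prime q (assumes p != q)."""
    f, x = 1, p % q
    while x != 1:
        x = (x * p) % q
        f += 1
    return f

def splitting_in_cyclotomic(p, q):
    if p == q:
        return f"totally ramified: (1 - zeta)^{q - 1}"
    f = order_mod(p, q)
    return f"{(q - 1) // f} primes, each of residue degree {f}"

# q = 7: 2 has order 3 mod 7, so 2 splits into (7-1)/3 = 2 primes of degree 3.
print(splitting_in_cyclotomic(2, 7))
```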

REFERENCES
1. Daniel A.Marcus, Number Fields. Springer, New York.

Version: 1 Owner: alozano Author(s): alozano

142.4

regular prime

A prime $p$ is regular if the class number of the cyclotomic field $\mathbb{Q}(\zeta_p)$ is not divisible by $p$
(where $\zeta_p := e^{2\pi i/p}$ denotes a primitive $p$th root of unity). An irregular prime is a prime that
is not regular.
Regular primes rose to prominence as a result of Ernst Kummer's work in the 1850s on
Fermat's Last Theorem. Kummer was able to prove Fermat's Last Theorem in the case where
the exponent is a regular prime, a result that prior to Wiles's recent work was the only
demonstration of Fermat's Last Theorem for a large class of exponents. In the course of this
work Kummer also established the following numerical criterion for determining whether a
prime is regular:
$p$ is regular if and only if none of the numerators of the Bernoulli numbers $B_0, B_2, B_4, \ldots, B_{p-3}$
is a multiple of $p$.
Based on this criterion it is possible to give a heuristic argument that the regular primes
have density $e^{-1/2}$ in the set of all primes [1]. Despite this, there is no known proof that the
set of regular primes is infinite, although it is known that there are infinitely many irregular
primes.
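Kummer's criterion can be tested directly with exact rational arithmetic. The sketch below (helper names ours) computes Bernoulli numbers by the Akiyama-Tanigawa algorithm and applies the criterion; it confirms that 31 is regular while 37 is not:

```python
# Sketch of Kummer's criterion using exact rational arithmetic.
from fractions import Fraction

def bernoulli(n):
    """B_n via the Akiyama-Tanigawa algorithm (convention B_1 = +1/2)."""
    A = [Fraction(0)] * (n + 1)
    for m in range(n + 1):
        A[m] = Fraction(1, m + 1)
        for j in range(m, 0, -1):
            A[j - 1] = j * (A[j - 1] - A[j])
    return A[0]

def is_regular(p):
    """Kummer: p is regular iff p divides no numerator of B_2, B_4, ..., B_{p-3}."""
    return all(bernoulli(k).numerator % p != 0 for k in range(2, p - 2, 2))

print(is_regular(31), is_regular(37))
```

Here 37 fails because 37 divides the numerator of $B_{32}$.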

REFERENCES
1. Kenneth Ireland & Michael Rosen, A Classical Introduction to Modern Number Theory,
Springer-Verlag, New York, Second Edition, 1990.

Version: 1 Owner: djao Author(s): djao


Chapter 143
11R27 Units and factorization
143.1

regulator

Let $K$ be a number field with $[K : \mathbb{Q}] = n = r_1 + 2r_2$. Here $r_1$ denotes the number of
real embeddings:
$$\sigma_i \colon K \hookrightarrow \mathbb{R}, \quad 1 \leq i \leq r_1,$$
while $r_2$ is half of the number of complex embeddings:
$$\tau_j \colon K \hookrightarrow \mathbb{C}, \quad 1 \leq j \leq r_2.$$
Note that $\{\tau_j, \bar{\tau}_j \mid 1 \leq j \leq r_2\}$ are all the complex embeddings of $K$. Let $r = r_1 + r_2$ and
for $1 \leq i \leq r$ define the norm in $K$ corresponding to each embedding:
$$\|\alpha\|_i = |\sigma_i(\alpha)|, \quad 1 \leq i \leq r_1,$$
$$\|\alpha\|_{r_1 + j} = |\tau_j(\alpha)|^2, \quad 1 \leq j \leq r_2.$$

Let $\mathcal{O}_K$ be the ring of integers of $K$. By Dirichlet's unit theorem, we know that the rank of
the unit group $\mathcal{O}_K^{\times}$ is exactly $r - 1 = r_1 + r_2 - 1$. Let
$$\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{r-1}\}$$
be a fundamental system of generators of $\mathcal{O}_K^{\times}$ modulo roots of unity (this is, modulo the
torsion subgroup). Let $A$ be the $r \times (r-1)$ matrix
$$A = \begin{pmatrix}
\log\|\varepsilon_1\|_1 & \log\|\varepsilon_2\|_1 & \ldots & \log\|\varepsilon_{r-1}\|_1 \\
\log\|\varepsilon_1\|_2 & \log\|\varepsilon_2\|_2 & \ldots & \log\|\varepsilon_{r-1}\|_2 \\
\vdots & \vdots & \ddots & \vdots \\
\log\|\varepsilon_1\|_r & \log\|\varepsilon_2\|_r & \ldots & \log\|\varepsilon_{r-1}\|_r
\end{pmatrix}$$
and let $A_i$ be the $(r-1) \times (r-1)$ matrix obtained by deleting the $i$-th row from $A$, $1 \leq i \leq r$.
It can be checked that the determinant of $A_i$, $\det A_i$, is independent up to sign of the choice
of fundamental system of generators of $\mathcal{O}_K^{\times}$ and is also independent of the choice of $i$.

Definition 5. The regulator of $K$ is defined to be
$$\mathrm{Reg}_K = |\det A_1|.$$
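For a real quadratic field the definition collapses to a single logarithm. A numerical sketch for $K = \mathbb{Q}(\sqrt{2})$ (variable names ours): here $r_1 = 2$, $r_2 = 0$, so $r = 2$, the unit group has rank 1, and a fundamental unit is $1 + \sqrt{2}$, so $A$ is a $2 \times 1$ matrix and $\mathrm{Reg}_K = |\log\|1 + \sqrt{2}\|_1|$.

```python
# Sketch: the regulator of Q(sqrt(2)), with fundamental unit 1 + sqrt(2).
import math

eps = 1 + math.sqrt(2)          # image under the embedding sqrt(2) -> +sqrt(2)
reg = abs(math.log(abs(eps)))   # Reg_K = |log ||eps||_1|
print(round(reg, 4))
```

The value is $\log(1 + \sqrt{2}) \approx 0.8814$.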
Version: 1 Owner: alozano Author(s): alozano


Chapter 144
11R29 Class numbers, class groups,
discriminants
144.1

Existence of Hilbert Class Field

Let K be a number field. There exists a finite extension E of K with the following properties:
1. [E : K] = hK , where hK is the class number of K.
2. E is Galois over K.
3. The ideal class group of K is isomorphic to the Galois group of E over K.
4. Every ideal of OK is a principal ideal of the ring extension OE .
5. Every prime ideal $\mathfrak{P}$ of $\mathcal{O}_K$ decomposes into the product of $\frac{h_K}{f}$ prime ideals in $\mathcal{O}_E$,
where $f$ is the order of $[\mathfrak{P}]$ in the ideal class group of $\mathcal{O}_K$.

There is a unique field E satisfying the above five properties, and it is known as the Hilbert
class field of K.
The field E may also be characterized as the maximal abelian unramified extension of K.
Version: 7 Owner: saforres Author(s): saforres


144.2

class number formula

Let $K$ be a number field with $[K : \mathbb{Q}] = n = r_1 + 2r_2$, where $r_1$ denotes the number of
real embeddings of $K$, and $2r_2$ is the number of complex embeddings of $K$. Let $\zeta_K(s)$
be the Dedekind zeta function of $K$. Also define the following invariants:
1. $h_K$ is the class number, the number of elements in the ideal class group of $K$.
2. $\mathrm{Reg}_K$ is the regulator of $K$.
3. $w_K$ is the number of roots of unity contained in $K$.
4. $D_K$ is the discriminant of the extension $K/\mathbb{Q}$.
Then:
Theorem 14 (Class Number Formula). The Dedekind zeta function of $K$, $\zeta_K(s)$, converges absolutely
for $\mathrm{Re}(s) > 1$ and extends to a meromorphic function defined for $\mathrm{Re}(s) > 1 - \frac{1}{n}$ with only
one simple pole at $s = 1$. Moreover:
$$\lim_{s \to 1} (s-1)\zeta_K(s) = \frac{2^{r_1} (2\pi)^{r_2}\, h_K\, \mathrm{Reg}_K}{w_K \sqrt{|D_K|}}.$$
Note: This is the most general class number formula. In particular cases, for example
when K is a cyclotomic extension of Q, there are particular and more refined class number
formulas.
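The formula can be checked numerically for $K = \mathbb{Q}(i)$, where $r_1 = 0$, $r_2 = 1$, $h_K = 1$, $\mathrm{Reg}_K = 1$, $w_K = 4$, $D_K = -4$, so the predicted residue is $2\pi/(4\sqrt{4}) = \pi/4$. On the other hand $\zeta_{\mathbb{Q}(i)}(s) = \zeta(s)L(s, \chi_{-4})$, so the residue also equals $L(1, \chi_{-4}) = 1 - \frac{1}{3} + \frac{1}{5} - \cdots$ (the Leibniz series). A sketch of the comparison (variable names ours):

```python
# Sketch: numerical check of the class number formula for K = Q(i).
import math

predicted = (2 * math.pi) / (4 * math.sqrt(4))          # pi/4
leibniz = sum((-1) ** k / (2 * k + 1) for k in range(10 ** 6))
print(abs(predicted - leibniz) < 1e-5)
```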
Version: 2 Owner: alozano Author(s): alozano

144.3

discriminant

144.3.1

Definitions

Let $R$ be any Dedekind domain with field of fractions $K$. Fix a finite dimensional field
extension $L/K$ and let $S$ denote the integral closure of $R$ in $L$. For any basis $x_1, \ldots, x_n$ of
$L$ over $K$, the determinant
$$\Delta(x_1, \ldots, x_n) := \det[\mathrm{Tr}(x_i x_j)],$$
whose entries are the traces of $x_i x_j$ over all pairs $i, j$, is called the discriminant of the basis
$x_1, \ldots, x_n$. The ideal in $R$ generated by all discriminants of the form
$$\Delta(x_1, \ldots, x_n), \quad x_i \in S,$$
is called the discriminant ideal of $S$ over $R$, and denoted $\Delta(S/R)$.
In the special case where $S$ is a free $R$-module, the discriminant ideal $\Delta(S/R)$ is always a
principal ideal, generated by any discriminant of the form $\Delta(x_1, \ldots, x_n)$ where $x_1, \ldots, x_n$ is
a basis for $S$ as an $R$-module. In particular, this situation holds whenever $K$ and $L$ are
number fields.

144.3.2

Properties

The discriminant is so named because it allows one to determine which ideals of $R$ are
ramified in $S$. Specifically, the prime ideals of $R$ that ramify in $S$ are precisely the ones that
contain the discriminant ideal $\Delta(S/R)$. In the case $R = \mathbb{Z}$, Minkowski's theorem states that
any ring of integers $S$ of a number field larger than $\mathbb{Q}$ has discriminant strictly smaller than
$\mathbb{Z}$ itself, and this fact combined with the previous result shows that any number field $K \neq \mathbb{Q}$
admits at least one ramified prime over $\mathbb{Q}$.
Version: 5 Owner: djao Author(s): djao, saforres

144.4

ideal class

Let $K$ be a number field. Let $\mathfrak{a}$ and $\mathfrak{b}$ be ideals in $\mathcal{O}_K$ (the ring of algebraic integers of $K$).
Define a relation $\sim$ on the ideals of $\mathcal{O}_K$ in the following way: write $\mathfrak{a} \sim \mathfrak{b}$ if there exist
nonzero elements $\alpha$ and $\beta$ of $\mathcal{O}_K$ such that $(\alpha)\mathfrak{a} = (\beta)\mathfrak{b}$.
The relation $\sim$ is an equivalence relation, and the equivalence classes under $\sim$ are known as
ideal classes.
The number of equivalence classes, denoted by $h$ or $h_K$, is called the class number of $K$.
Note that the set of ideals of any ring $R$ forms an abelian semigroup with the product of ideals
as the semigroup operation. By replacing ideals by ideal classes, it is possible to define a
group on the ideal classes of $\mathcal{O}_K$ in the following way.
Let $\mathfrak{a}$, $\mathfrak{b}$ be ideals of $\mathcal{O}_K$. Denote the ideal classes of which $\mathfrak{a}$ and $\mathfrak{b}$ are representatives by $[\mathfrak{a}]$
and $[\mathfrak{b}]$ respectively. Then define $*$ by
$$[\mathfrak{a}] * [\mathfrak{b}] = [\mathfrak{a}\mathfrak{b}].$$

Let $\mathcal{C} = \{[\mathfrak{a}] \mid \mathfrak{a} \neq (0),\ \mathfrak{a} \text{ an ideal of } \mathcal{O}_K\}$. With the above definition of multiplication, $\mathcal{C}$ is
an abelian group, called the ideal class group of $K$.

Note that the ideal class group of K is simply the quotient group of the ideal group of K by
the subgroup of principal fractional ideals.
Version: 18 Owner: saforres Author(s): saforres

144.5

ray class group

Let $\mathfrak{m}$ be a modulus for a number field $K$. The ray class group of $K$ mod $\mathfrak{m}$ is the group
$I^{\mathfrak{m}}/K_{\mathfrak{m},1}$, where
- $I^{\mathfrak{m}}$ is the subgroup of the ideal group of $K$ generated by all prime ideals which do not
occur in the factorization of $\mathfrak{m}$;
- $K_{\mathfrak{m},1}$ is the subgroup of $I^{\mathfrak{m}}$ consisting of all principal ideals in the ring of integers of $K$
having the form $(\alpha)$ where $\alpha$ is multiplicatively congruent to 1 mod $\mathfrak{m}$.
Version: 1 Owner: djao Author(s): djao


Chapter 145
11R32 Galois theory
145.1

Galois criterion for solvability of a polynomial


by radicals

Let f F [x] be a polynomial over a field F , and let K be its splitting field. Then K is a
radical extension if and only if the Galois group Gal(K/F ) is a solvable group.
Version: 2 Owner: djao Author(s): djao


Chapter 146
11R34 Galois cohomology
146.1

Hilbert Theorem 90

Let $L/K$ be a finite Galois extension with Galois group $G = \mathrm{Gal}(L/K)$. Then the first
Galois cohomology group $H^1(G, L^{\times})$ is 0.
A corollary (and the actual result that Hilbert called his Theorem 90) is that, if $G$ is cyclic
with generator $\sigma$, then $x \in L^{\times}$ has norm 1 if and only if
$$x = y/\sigma(y)$$
for some $y \in L^{\times}$.
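A concrete instance of the corollary, for $L/K = \mathbb{Q}(i)/\mathbb{Q}$ with $\sigma$ complex conjugation: the element $x = (3 + 4i)/5$ has norm $x\sigma(x) = 1$, and indeed $x = y/\sigma(y)$ with $y = 2 + i$. A floating-point sketch (values chosen by us for illustration):

```python
# Sketch: Hilbert 90 for Q(i)/Q, where sigma is complex conjugation.
x = complex(3, 4) / 5   # norm-1 element: x * conj(x) = (9 + 16)/25 = 1
y = complex(2, 1)
print(abs(x * x.conjugate() - 1) < 1e-12)   # x has norm 1
print(abs(x - y / y.conjugate()) < 1e-12)   # and x = y / sigma(y)
```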
Version: 2 Owner: djao Author(s): djao


Chapter 147
11R37 Class field theory
147.1

Artin map

Let $L/K$ be a Galois extension of number fields, with rings of integers $\mathcal{O}_L$ and $\mathcal{O}_K$. For any
finite prime $\mathfrak{P} \subseteq L$ lying over a prime $\mathfrak{p} \subseteq K$, let $D(\mathfrak{P})$ denote the decomposition group
of $\mathfrak{P}$, let $T(\mathfrak{P})$ denote the inertia group of $\mathfrak{P}$, and let $l := \mathcal{O}_L/\mathfrak{P}$ and $k := \mathcal{O}_K/\mathfrak{p}$ be the
residue fields. The exact sequence
$$1 \to T(\mathfrak{P}) \to D(\mathfrak{P}) \to \mathrm{Gal}(l/k) \to 1$$
yields an isomorphism $D(\mathfrak{P})/T(\mathfrak{P}) \cong \mathrm{Gal}(l/k)$. In particular, there is a unique element
in $D(\mathfrak{P})/T(\mathfrak{P})$, denoted $[L/K, \mathfrak{P}]$, which maps to the $q$-th power Frobenius map $\mathrm{Frob}_q \in \mathrm{Gal}(l/k)$
under this isomorphism (where $q$ is the number of elements in $k$). The notation
$[L/K, \mathfrak{P}]$ is referred to as the Artin symbol of the extension $L/K$ at $\mathfrak{P}$.
If we add the additional assumption that $\mathfrak{p}$ is unramified, then $T(\mathfrak{P})$ is the trivial group,
and $[L/K, \mathfrak{P}]$ in this situation is an element of $D(\mathfrak{P}) \subseteq \mathrm{Gal}(L/K)$, called the Frobenius
automorphism of $\mathfrak{P}$.
If, furthermore, $L/K$ is an abelian extension (that is, $\mathrm{Gal}(L/K)$ is an abelian group), then
$[L/K, \mathfrak{P}] = [L/K, \mathfrak{P}']$ for any other prime $\mathfrak{P}' \subseteq L$ lying over $\mathfrak{p}$. In this case, the Frobenius
automorphism $[L/K, \mathfrak{P}]$ is denoted $(L/K, \mathfrak{p})$; the change in notation from $\mathfrak{P}$ to $\mathfrak{p}$ reflects the
fact that the automorphism is determined by $\mathfrak{p} \subseteq K$ independent of which prime $\mathfrak{P}$ of $L$
above it is chosen for use in the above construction.
Definition 4. Let $S$ be a finite set of primes of $K$, containing all the primes that ramify in
$L$. Let $I_K^S$ denote the subgroup of the group $I_K$ of fractional ideals of $K$ which is generated
by all the primes in $K$ that are not in $S$. The Artin map
$$\phi_{L/K} \colon I_K^S \to \mathrm{Gal}(L/K)$$
is the map given by $\phi_{L/K}(\mathfrak{p}) := (L/K, \mathfrak{p})$ for all primes $\mathfrak{p} \notin S$, extended linearly to $I_K^S$.


Version: 5 Owner: djao Author(s): djao

147.2

Tchebotarev density theorem

Let $L/K$ be any finite Galois extension of number fields with Galois group $G$. For any
conjugacy class $C \subseteq G$, the subset of prime ideals $\mathfrak{p} \subseteq K$ which are unramified in $L$ and
satisfy the property
$$[L/K, \mathfrak{P}] \in C \quad \text{for any prime } \mathfrak{P} \subseteq L \text{ containing } \mathfrak{p}$$
has analytic density
$$\frac{|C|}{|G|},$$
where $[L/K, \mathfrak{P}]$ denotes the Artin symbol at $\mathfrak{P}$.

Note that the conjugacy class of [L/K, P] is independent of the choice of prime P lying
over p, since any two such choices of primes are related by a Galois automorphism and their
corresponding Artin symbols are conjugate by this same automorphism.
Version: 2 Owner: djao Author(s): djao

147.3

modulus

A modulus for a number field $K$ is a formal product
$$\prod_{\mathfrak{p}} \mathfrak{p}^{n_{\mathfrak{p}}}$$
where
- the product is taken over all finite primes and infinite primes of $K$;
- the exponents $n_{\mathfrak{p}}$ are nonnegative integers;
- all but finitely many of the $n_{\mathfrak{p}}$ are zero;
- for every real prime $\mathfrak{p}$, the exponent $n_{\mathfrak{p}}$ is either 0 or 1;
- for every complex prime $\mathfrak{p}$, the exponent $n_{\mathfrak{p}}$ is 0.
A modulus can be written as a product of its finite part
$$\prod_{\mathfrak{p} \text{ finite}} \mathfrak{p}^{n_{\mathfrak{p}}}$$
and its infinite part
$$\prod_{\mathfrak{p} \text{ real}} \mathfrak{p}^{n_{\mathfrak{p}}},$$
with the finite part equal to some ideal in the ring of integers $\mathcal{O}_K$ of $K$, and the infinite part
equal to the product of some subcollection of the real primes of $K$.
Version: 1 Owner: djao Author(s): djao

147.4

multiplicative congruence

Let $\mathfrak{p}$ be any real prime of a number field $K$, and write $i \colon K \to \mathbb{R}$ for the corresponding
real embedding of $K$. We say two elements $\alpha, \beta \in K$ are multiplicatively congruent mod $\mathfrak{p}$ if
the real numbers $i(\alpha)$ and $i(\beta)$ are either both positive or both negative.
Now let $\mathfrak{p}$ be a finite prime of $K$, and write $(\mathcal{O}_K)_{\mathfrak{p}}$ for the localization of the ring of integers
$\mathcal{O}_K$ of $K$ at $\mathfrak{p}$. For any natural number $n$, we say $\alpha$ and $\beta$ are multiplicatively congruent mod
$\mathfrak{p}^n$ if they are members of the same coset of the subgroup $1 + \mathfrak{p}^n (\mathcal{O}_K)_{\mathfrak{p}}$ of the multiplicative
group $K^{\times}$ of $K$.
If $\mathfrak{m}$ is any modulus for $K$, with factorization
$$\mathfrak{m} = \prod_{\mathfrak{p}} \mathfrak{p}^{n_{\mathfrak{p}}},$$
then we say $\alpha$ and $\beta$ are multiplicatively congruent mod $\mathfrak{m}$ if they are multiplicatively congruent mod $\mathfrak{p}^{n_{\mathfrak{p}}}$ for every prime $\mathfrak{p}$ appearing in the factorization of $\mathfrak{m}$.
Multiplicative congruence of $\alpha$ and $\beta$ mod $\mathfrak{m}$ is commonly denoted using the notation
$$\alpha \equiv^{*} \beta \pmod{\mathfrak{m}}.$$

Version: 1 Owner: djao Author(s): djao

147.5

ray class field

Proposition 5. Let $L/K$ be a finite abelian extension of number fields, and let $\mathcal{O}_K$ be the
ring of integers of $K$. There exists an integral ideal $\mathcal{C} \subseteq \mathcal{O}_K$, divisible by precisely the
prime ideals of $K$ that ramify in $L$, such that
$$((\alpha), L/K) = 1 \quad \text{for all } \alpha \in K^{\times},\ \alpha \equiv 1 \mod \mathcal{C},$$
where $((\alpha), L/K)$ is the Artin map.



Definition 6. The conductor of a finite abelian extension $L/K$ is the largest ideal $\mathcal{C}_{L/K} \subseteq \mathcal{O}_K$
satisfying the above properties.
Note that there is a largest ideal with this condition because if the proposition is true for
$\mathcal{C}_1$, $\mathcal{C}_2$, then it is also true for $\mathcal{C}_1 + \mathcal{C}_2$.
Definition 7. Let $\mathcal{I}$ be an integral ideal of $K$. A ray class field of $K$ (modulo $\mathcal{I}$) is a finite
abelian extension $K_{\mathcal{I}}/K$ with the property that for any other finite abelian extension $L/K$
with conductor $\mathcal{C}_{L/K}$,
$$\mathcal{C}_{L/K} \mid \mathcal{I} \iff L \subseteq K_{\mathcal{I}}.$$
Note: It can be proved that there is a unique ray class field with a given conductor. In words,
the ray class field is the biggest abelian extension of $K$ with a given conductor (although
the conductor of $K_{\mathcal{I}}$ does not necessarily equal $\mathcal{I}$!, see Example 2).
Remark: Let $\mathfrak{p}$ be a prime of $K$ unramified in $L$, and let $\mathfrak{P}$ be a prime above $\mathfrak{p}$. Then
$(\mathfrak{p}, L/K) = 1$ if and only if the extension of residue fields is of degree 1,
$$[\mathcal{O}_L/\mathfrak{P} : \mathcal{O}_K/\mathfrak{p}] = 1,$$
if and only if $\mathfrak{p}$ splits completely in $L$. Thus we obtain a characterization of the ray class
field of conductor $\mathcal{C}$ as the abelian extension of $K$ such that a prime of $K$ splits completely
if and only if it is of the form
$$(\alpha), \quad \alpha \in K^{\times},\ \alpha \equiv 1 \mod \mathcal{C}.$$

Examples:
1. The ray class field of $\mathbb{Q}$ of conductor $N\mathbb{Z}$ is the $N$th cyclotomic extension of $\mathbb{Q}$. More
concretely, let $\zeta_N$ be a primitive $N$th root of unity. Then
$$\mathbb{Q}_{N\mathbb{Z}} = \mathbb{Q}(\zeta_N).$$
2. $\mathbb{Q}(i)_{(2)} = \mathbb{Q}(i)$, so the conductor of $\mathbb{Q}(i)_{(2)}/\mathbb{Q}(i)$ is $(1)$.
3. $K_{(1)}$, the ray class field of conductor $(1)$, is the maximal abelian extension of $K$ which
is unramified everywhere. It is, in fact, the Hilbert class field of $K$.

REFERENCES
1. Artin/Tate, Class Field Theory. W.A.Benjamin Inc., New York.

Version: 2 Owner: alozano Author(s): alozano



Chapter 148
11R56 Adèle rings and groups
148.1

adèle

Let $K$ be a number field. For each finite prime $v$ of $K$, let $\mathfrak{o}_v$ denote the valuation ring of the
completion $K_v$ of $K$ at $v$. The adèle group $\mathbb{A}_K$ of $K$ is defined to be the restricted direct product
of the collection of locally compact additive groups $\{K_v\}$ over all primes $v$ of $K$ (both finite
primes and infinite primes), with respect to the collection of compact open subgroups $\{\mathfrak{o}_v\}$
defined for all finite primes $v$.
The set $\mathbb{A}_K$ inherits addition and multiplication operations (defined pointwise) which make
it into a topological ring. The original field $K$ embeds as a ring into $\mathbb{A}_K$ via the map
$$x \mapsto \prod_v x_v$$
defined for $x \in K$, where $x_v$ denotes the image of $x$ in $K_v$ under the embedding $K \hookrightarrow K_v$.


Note that $x_v \in \mathfrak{o}_v$ for all but finitely many $v$, so that the element $x$ is sent under the above
definition into the restricted direct product as claimed.
It turns out that the image of $K$ in $\mathbb{A}_K$ is a discrete set and the quotient group $\mathbb{A}_K/K$ is a
compact space in the quotient topology.
Version: 1 Owner: djao Author(s): djao

148.2

idèle

Let $K$ be a number field. For each finite prime $v$ of $K$, let $\mathfrak{o}_v$ be the valuation ring of the
completion $K_v$ of $K$ at $v$, and let $U_v$ be the group of units in $\mathfrak{o}_v$. Then each group $U_v$ is a
compact open subgroup of the group of units $K_v^{\times}$ of $K_v$. The idèle group $\mathbb{I}_K$ of $K$ is defined
to be the restricted direct product of the multiplicative groups $\{K_v^{\times}\}$ with respect to the
compact open subgroups $\{U_v\}$, taken over all finite primes and infinite primes $v$ of $K$.
The units $K^{\times}$ in $K$ embed into $\mathbb{I}_K$ via the diagonal embedding
$$x \mapsto \prod_v x_v,$$
where $x_v$ is the image of $x$ under the embedding $K \hookrightarrow K_v$ of $K$ into its completion $K_v$.
As in the case of adèles, the group $K^{\times}$ is a discrete subgroup of the group of idèles $\mathbb{I}_K$, but
unlike the case of adèles, the quotient group $\mathbb{I}_K/K^{\times}$ is not a compact group. It is, however,
possible to define a certain subgroup of the idèles (the subgroup of norm 1 elements) which
does have compact quotient under $K^{\times}$.
Warning: The group $\mathbb{I}_K$ is a multiplicative subgroup of the ring of adèles $\mathbb{A}_K$, but the
topology on $\mathbb{I}_K$ is different from the subspace topology that $\mathbb{I}_K$ would have as a subset of
$\mathbb{A}_K$.
Version: 3 Owner: djao Author(s): djao

148.3

restricted direct product

Let {G_v}_{v∈V} be a collection of locally compact topological groups. For all but finitely many
v ∈ V, let H_v ⊂ G_v be a compact open subgroup of G_v. The restricted direct product of the
collection {G_v} with respect to the collection {H_v} is the subgroup

G := { (g_v)_{v∈V} ∈ ∏_{v∈V} G_v : g_v ∈ H_v for all but finitely many v ∈ V }

of the direct product ∏_{v∈V} G_v.
We define a topology on G as follows. For every finite subset S ⊂ V that contains all the
elements v for which H_v is undefined, form the topological group

G_S := ∏_{v∈S} G_v × ∏_{v∉S} H_v,

consisting of the direct product of the G_v's, for v ∈ S, and the H_v's, for v ∉ S. The
topological group G_S is a subset of G for each such S, and we take for a topology on G the
weakest topology such that the G_S are open subsets of G, with the subspace topology on
each G_S equal to the topology that G_S already has in its own right.
Version: 2 Owner: djao Author(s): djao


Chapter 149
11R99 Miscellaneous
149.1

Henselian field

Let | | be a nonarchimedean valuation on K. Define the set V := {x : |x| ≤ 1}. We can see
that V is closed under addition as | | is an ultrametric, and in fact V is an additive group.
The other valuation axioms ensure that V is a ring. We call V the valuation ring of K with
respect to | |. Note that the field of fractions of V is K.
Let m := {x : |x| < 1}. It is easy to show that this is a maximal ideal of V. Let R := V/m
be called the residue field.
The map res: V → V/m given by x ↦ x + m is called the residue map. We extend the
definition of the residue map to sequences of elements from V, and hence to V[X], so that if
f(X) ∈ V[X] is given by Σ_{i≤n} a_i X^i then res(f) ∈ R[X] is given by Σ_{i≤n} res(a_i) X^i.

Hensel property: Let f(x) ∈ V[x]. Suppose res(f)(x) has a simple root e ∈ R. Then f(x)
has a root e′ ∈ V with res(e′) = e.
Any valued field satisfying the Hensel property shall be called henselian. The completion of
a nonarchimedean valued field K with respect to the valuation (cf. constructing the reals
from the rationals as the completion with respect to the standard metric) is a henselian field.
Every nonarchimedean valued field K has a unique (up to isomorphism) smallest henselian
field K^h containing it. We call K^h the henselisation of K.
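The Hensel property is constructive for complete fields such as Q_p: a simple root of res(f) lifts, one power of p at a time, via Newton's iteration. The following sketch (the helper name and the specific example are ours, not from the entry) lifts a root of x² − 2 from Z/7 up to Z/7^5:

```python
def hensel_lift(f, df, root, p, k):
    """Lift a simple root of f mod p to a root mod p**k by Newton's
    iteration x <- x - f(x) * f'(x)^(-1), one power of p at a time."""
    x, pk = root % p, p
    for _ in range(k - 1):
        pk *= p
        inv = pow(df(x), -1, pk)   # f'(x) is a unit mod p since the root is simple
        x = (x - f(x) * inv) % pk
    return x

# x^2 - 2 has the simple root 3 mod 7 (since 3^2 = 9 = 2 mod 7);
# lift it to a square root of 2 modulo 7^5.
r = hensel_lift(lambda x: x * x - 2, lambda x: 2 * x, 3, 7, 5)
assert (r * r - 2) % 7**5 == 0 and r % 7 == 3
```

Because the root is simple, f′(x) stays a unit at every level, so the modular inverse in the loop always exists; this is exactly why the Hensel property holds in the completion.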
Version: 1 Owner: Timmy Author(s): Timmy


149.2

valuation

Let K be a field. A valuation on K is a function | | : K → R satisfying the properties:

1. |x| ≥ 0 for all x ∈ K, with equality if and only if x = 0
2. |xy| = |x| |y| for all x, y ∈ K
3. |x + y| ≤ |x| + |y|

If a valuation satisfies |x + y| ≤ max(|x|, |y|), then we say that it is a nonarchimedean
valuation. Otherwise we say that it is an archimedean valuation.
Every valuation on K defines a metric on K, given by d(x, y) := |x − y|. This metric is an
ultrametric if and only if the valuation is nonarchimedean. Two valuations are equivalent
if their corresponding metrics induce the same topology on K. An equivalence class v of
valuations on K is called a prime of K. If v consists of archimedean valuations, we say that
v is an infinite prime, or archimedean prime. Otherwise, we say that v is a finite prime, or
nonarchimedean prime.
In the case where K is a number field, primes as defined above generalize the notion of
prime ideals in the following way. Let p ⊂ K be a nonzero prime ideal¹, considered as a
fractional ideal. For every nonzero element x ∈ K, let r be the unique integer such that
x ∈ p^r but x ∉ p^(r+1). Define

|x|_p := 1/N(p)^r  if x ≠ 0,    |x|_p := 0  if x = 0,

where N(p) denotes the absolute norm of p. Then | |_p is a nonarchimedean valuation on
K, and furthermore every nonarchimedean valuation on K is equivalent to | |_p for some
prime ideal p. Hence, the prime ideals of K correspond bijectively with the finite primes of
K, and it is in this sense that the notion of primes as valuations generalizes that of a prime
ideal.
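For K = Q the finite primes are the ideals (p) for rational primes p, and N((p)) = p, so the recipe above reduces to the familiar p-adic absolute value. A small sketch (the function name is ours):

```python
from fractions import Fraction

def padic_abs(x, p):
    """|x|_p = N((p))^(-r) where x lies in (p)^r but not (p)^(r+1);
    for K = Q this is p^(-r), with r the exponent of p in x."""
    if x == 0:
        return Fraction(0)
    x = Fraction(x)
    r = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:       # count powers of p in the numerator
        num //= p
        r += 1
    while den % p == 0:       # powers of p in the denominator count negatively
        den //= p
        r -= 1
    return Fraction(1, p) ** r

assert padic_abs(12, 2) == Fraction(1, 4)     # 12 = 2^2 * 3
assert padic_abs(Fraction(5, 8), 2) == 8      # 5/8 = 2^(-3) * 5
```

One can also check the ultrametric inequality |x + y|_p ≤ max(|x|_p, |y|_p) on a few samples, confirming that this valuation is nonarchimedean.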
As for the archimedean valuations, when K is a number field every embedding of K into R
or C yields a valuation of K by way of the standard absolute value on R or C, and one can
show that every archimedean valuation of K is equivalent to one arising in this way. Thus
the infinite primes of K correspond to embeddings of K into R or C, and we call such a
prime real or complex according to whether the valuations comprising it arise from real or
complex embeddings.
Version: 9 Owner: djao Author(s): djao

¹By prime ideal we mean a prime fractional ideal of K or, equivalently, a prime ideal of the
ring of integers of K. We do not mean literally a prime ideal of the ring K, which would be the zero ideal.


Chapter 150
11S15 Ramification and extension
theory
150.1

decomposition group

150.1.1

Decomposition Group

Let A be a noetherian integrally closed integral domain with field of fractions K. Let L be
a Galois extension of K and denote by B the integral closure of A in L. Then, for any
prime ideal p ⊂ A, the Galois group G := Gal(L/K) acts transitively on the set of all
prime ideals P ⊂ B containing p. If we fix a particular prime ideal P ⊂ B lying over p,
then the stabilizer of P under this group action is a subgroup of G, called the decomposition
group at P and denoted D(P/p). In other words,

D(P/p) := {σ ∈ G | σ(P) = P}.

If P′ ⊂ B is another prime ideal of B lying over p, then the decomposition groups D(P/p)
and D(P′/p) are conjugate in G via any Galois automorphism mapping P to P′.

150.1.2

Inertia Group

Write l for the residue field B/P and k for the residue field A/p. Assume that the extension
l/k is separable (if it is not, then this development is still possible, but considerably more
complicated; see [1, p. 20]). Any element σ ∈ D(P/p), by definition, fixes P and hence
descends to a well defined automorphism of the field l. Since σ also fixes A by virtue of
being in G, it induces an automorphism of the extension l/k fixing k. We therefore have a
group homomorphism

D(P/p) → Gal(l/k),

and the kernel of this homomorphism is called the inertia group of P, and written T(P/p). It
turns out that this homomorphism is actually surjective, so there is an exact sequence

1 → T(P/p) → D(P/p) → Gal(l/k) → 1.    (150.1.1)

150.1.3

Decomposition of Extensions

The decomposition group is so named because it can be used to decompose the field extension
L/K into a series of intermediate extensions each of which has very simple factorization
behavior at p. If we let L_D denote the fixed field of D(P/p) and L_T the fixed field of T(P/p),
then the exact sequence (150.1.1) corresponds under Galois theory to the lattice of fields

L
| e
L_T
| f
L_D
| g
K

If we write e, f, g for the degrees of these intermediate extensions as in the diagram, then we
have the following remarkable series of equalities:
1. The number e equals the ramification index e(P/p) of P over p, which is independent
of the choice of prime ideal P lying over p since L/K is Galois.
2. The number f equals the inertial degree f(P/p) of P over p, which is also independent
of the choice of prime ideal P since L/K is Galois.
3. The number g is equal to the number of prime ideals P of B that lie over p ⊂ A.
Furthermore, the fields L_D and L_T have the following independent characterizations:
- L_T is the smallest intermediate field F such that P is totally ramified over P ∩ F, and
it is the largest intermediate field such that e(P ∩ F / p) = 1.
- L_D is the smallest intermediate field F such that P is the only prime of B lying over
P ∩ F, and it is the largest intermediate field such that e(P ∩ F / p) = f(P ∩ F / p) = 1.

Informally, this decomposition of the extension says that the extension LD /K encapsulates
all of the factorization of p into distinct primes, while the extension LT /LD is the source
of all the inertial degree in P over p and the extension L/LT is responsible for all of the
ramification that occurs over p.

150.1.4

Localization

The decomposition groups and inertia groups of P behave well under localization. That
is, the decomposition and inertia groups of PB_P ⊂ B_P over the prime ideal pA_p in the
localization A_p of A are identical to the ones obtained using A and B themselves. In fact,
the same holds true even in the completions of the local rings A_p and B_P at p and P.

REFERENCES
1. J.-P. Serre, Local Fields, Springer-Verlag, 1979 (GTM 67).

Version: 4 Owner: djao Author(s): djao

150.2

examples of prime ideal decomposition in number fields

Here we follow the notation of the entry on the decomposition group.
Example 1
Let K = Q(√−7); then Gal(K/Q) = {Id, σ} ≅ Z/2Z, where σ is the complex conjugation
map. Let OK be the ring of integers of K. In this case:

OK = Z[(1 + √−7)/2]
The discriminant of this field is D_{K/Q} = −7. We look at the decomposition into prime
ideals of some prime ideals in Z:
1. The only prime ideal in Z that ramifies is (7):

(7)OK = (√−7)²

and we have e = 2, f = g = 1. Next we compute the decomposition and inertia groups
from the definitions. Notice that both Id, σ fix the ideal (√−7). Thus:

D((√−7)/(7)) = Gal(K/Q)

For the inertia group, notice that σ ≡ Id mod (√−7). Hence:

T((√−7)/(7)) = Gal(K/Q)

Also note that this is trivial if we use the properties of the fixed fields of D((√−7)/(7))
and T((√−7)/(7)) (see the section on decomposition of extensions in the entry on
decomposition group), and the fact that e · f · g = n, where n is the degree of the
extension (n = 2 in our case).

2. The primes (5), (13) are inert, i.e. they are prime ideals in OK. Thus e = 1 = g, f = 2.
Obviously the conjugation map fixes the ideals (5), (13), so

D(5OK/(5)) = Gal(K/Q) = D(13OK/(13))

On the other hand σ(√−7) ≡ −√−7 mod (5), (13), so σ ≢ Id mod (5), (13) and

T(5OK/(5)) = {Id} = T(13OK/(13))

3. The primes (2), (29) are split:

2OK = (2, (1 + √−7)/2)(2, (1 − √−7)/2) = P · P′

29OK = (29, 14 + √−7)(29, 14 − √−7) = R · R′

so e = f = 1, g = 2 and

D(P/(2)) = T(P/(2)) = {Id} = D(R/(29)) = T(R/(29))


Example 2
Let ζ₇ = e^(2πi/7), i.e. a primitive 7th root of unity, and let L = Q(ζ₇). This is a cyclotomic
extension of Q with Galois group

Gal(L/Q) ≅ (Z/7Z)^× ≅ Z/6Z

Moreover

Gal(L/Q) = {σ_a : L → L | σ_a(ζ₇) = ζ₇^a,  a ∈ (Z/7Z)^×}

Galois theory gives us the subfields of L:

        L = Q(ζ₇)
       /         \
Q(ζ₇ + ζ₇⁶)    Q(√−7)
       \         /
          Q

The discriminant of the extension L/Q is D_{L/Q} = −7⁵. Let OL denote the ring of integers of
L, thus OL = Z[ζ₇]. We use the results of this entry to find the decomposition of the primes
2, 5, 7, 13, 29; the factorizations in K = Q(√−7) and in L = Q(ζ₇) are summarized below:

in Z:          (7)         (2)                            (5)   (13)        (29)
in K = Q(√−7): (√−7)²      (2, (1+√−7)/2)(2, (1−√−7)/2)   (5)   (13)        R R′
in L = Q(ζ₇):  (1 − ζ₇)⁶   P P′                           (5)   Q₁ Q₂ Q₃    R₁ R₂ R₃ R′₁ R′₂ R′₃
689

1. The prime ideal 7Z is totally ramified in L, and is the only prime ideal that ramifies:

7OL = (1 − ζ₇)⁶ = T⁶

Thus

e(T/(7)) = 6,  f(T/(7)) = g(T/(7)) = 1

Note that, by the properties of the fixed fields of decomposition and inertia groups, we
must have L_{T(T/(7))} = Q = L_{D(T/(7))}, thus, by Galois theory,

D(T/(7)) = T(T/(7)) = Gal(L/Q)

2. The ideal 2Z factors in K as above, 2OK = P · P′, and each of the prime ideals P, P′
remains inert from K to L, i.e. POL is a prime ideal of L. Note also that the order
of 2 mod 7 is 3, and since g is at least 2 and 2 · 3 = 6, e must equal 1 (recall that
e f g = n):

e(P/(2)) = 1,  f(P/(2)) = 3,  g(P/(2)) = 2

Since e = 1, L_{T(P/(2))} = L, and [L : L_{D(P/(2))}] = 3, so

D(P/(2)) = ⟨σ₂⟩ ≅ Z/3Z,  T(P/(2)) = {Id}

3. The ideal (5) is inert, 5OL = S is prime, and the order of 5 modulo 7 is 6. Thus:

e(S/(5)) = 1,  f(S/(5)) = 6,  g(S/(5)) = 1

D(S/(5)) = Gal(L/Q),  T(S/(5)) = {Id}

4. The prime ideal 13Z is inert in K but it splits in L, 13OL = Q₁ Q₂ Q₃, and
13 ≡ −1 ≢ 1 mod 7, so the order of 13 is 2:

e(Q_i/(13)) = 1,  f(Q_i/(13)) = 2,  g(Q_i/(13)) = 3

D(Q_i/(13)) = ⟨σ₆⟩ ≅ Z/2Z,  T(Q_i/(13)) = {Id}

5. The prime ideal 29Z splits completely in L,

29OL = R₁ R₂ R₃ R′₁ R′₂ R′₃

Also 29 ≡ 1 mod 7, so f = 1,

e(R_i/(29)) = 1,  f(R_i/(29)) = 1,  g(R_i/(29)) = 6

D(R_i/(29)) = T(R_i/(29)) = {Id}
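In each unramified case above, the inertial degree f of a prime p ≠ 7 in Q(ζ₇) is just the multiplicative order of p modulo 7, and g = 6/f. This is easy to confirm numerically (a quick sketch with our own helper name, not part of the original entry):

```python
def mult_order(a, n):
    """Multiplicative order of a modulo n (a assumed coprime to n)."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

# f = order of p mod 7 and g = 6/f, matching items 2-5 above
for p, f_expected in [(2, 3), (5, 6), (13, 2), (29, 1)]:
    f = mult_order(p, 7)
    assert f == f_expected and 6 % f == 0
```

This reflects the general fact that in Q(ζ_N) an unramified rational prime p has residue degree equal to the order of p in (Z/NZ)^×.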


Version: 9 Owner: alozano Author(s): alozano


150.3

inertial degree

Let φ : A → B be a ring homomorphism. Let P ⊂ B be a prime ideal, with p := φ⁻¹(P) ⊂ A.
The algebra map φ induces an A/p-module structure on the ring B/P. If the dimension of
B/P as an A/p-module exists, then it is called the inertial degree of P over A.
A particular case of special importance in number theory is when L/K is a field extension
and φ : OK → OL is the inclusion map of the rings of integers. In this case, the domain
OK/p is a field, so dim_{OK/p} OL/P is guaranteed to exist, and the inertial degree of P over
OK is denoted f(P/p). We have the formula

Σ_{P|p} e(P/p) f(P/p) = [L : K],

where e(P/p) is the ramification index of P over p and the sum is taken over all prime ideals
P of OL dividing pOL.
Example:
Let φ : Z → Z[i] be the inclusion of the integers into the Gaussian integers. A prime p in Z
may or may not factor in Z[i]; if it does factor, then it must factor as p = (x + yi)(x − yi) for
some integers x, y. Thus a prime p factors into two primes if it equals x² + y², and remains
prime in Z[i] otherwise. There are then three categories of primes in Z[i]:
1. The prime 2 factors as (1 + i)(1 − i), and the principal ideals generated by (1 + i) and
(1 − i) are equal in Z[i], so the ramification index of (1 + i) over Z is two. The ring
Z[i]/(1 + i) is isomorphic to Z/2, so the inertial degree f((1 + i)/(2)) is one.
2. For primes p ≡ 1 mod 4, the prime p ∈ Z factors into the product of the two distinct primes
(x + yi)(x − yi), with ramification index and inertial degree one.
3. For primes p ≡ 3 mod 4, the prime p remains prime in Z[i] and Z[i]/(p) is a two
dimensional field extension of Z/p, so the inertial degree is two and the ramification
index is one.
In all cases, the sum of the products of the inertial degrees and ramification indices is equal to
2, which is the degree of the corresponding extension Q(i)/Q of number fields.
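These three cases can be checked numerically: an odd prime p is a sum of two squares, and hence split in Z[i], exactly when p ≡ 1 mod 4. A small sketch (the helper names are ours, for illustration only):

```python
def factor_type(p):
    """Classify a rational prime p in Z[i]: return (e, f, g)."""
    if p == 2:
        return (2, 1, 1)          # 2 = -i * (1+i)^2: ramified
    if p % 4 == 1:
        return (1, 1, 2)          # p = (x+yi)(x-yi): split
    return (1, 2, 1)              # p inert

def two_squares(p):
    """Search for x, y with x^2 + y^2 == p (exists iff p = 2 or p % 4 == 1)."""
    for x in range(1, int(p ** 0.5) + 1):
        y2 = p - x * x
        y = int(y2 ** 0.5)
        if y * y == y2:
            return (x, y)
    return None

# In every case the sum of e_i * f_i over the g primes above p is [Q(i):Q] = 2
for p in (2, 5, 13, 29):
    e, f, g = factor_type(p)
    assert e * f * g == 2 and two_squares(p) is not None
assert two_squares(7) is None and factor_type(7) == (1, 2, 1)
```

For instance, two_squares(13) finds 13 = 2² + 3², exhibiting the split factorization 13 = (2 + 3i)(2 − 3i).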

150.3.1

Local interpretations & generalizations

For any extension φ : A → B of Dedekind domains, the inertial degree of the prime P ⊂ B
over the prime p := φ⁻¹(P) ⊂ A is equal to the inertial degree of PB_P over pA_p in the
localizations at P and p. Moreover, the same is true even if we pass to completions of the
local rings B_P and A_p at P and p. The preservation of inertial degrees and ramification indices

with respect to localization is one of the reasons why the technique of localization is a useful
tool in the study of such domains.
As in the case of ramification indices, it is possible to define the notion of inertial degree
in the more general setting of locally ringed spaces. However, the generalizations of inertial
degree are not as widely used because in algebraic geometry one usually works with a fixed
base field, which makes all the residue fields at the points equal to the same field.
Version: 1 Owner: djao Author(s): djao

150.4

ramification index

150.4.1

Ramification in number fields

Definition 5 (First definition). Let L/K be an extension of number fields. Let p be a
nonzero prime ideal in the ring of integers OK of K, and suppose the ideal pOL ⊂ OL factors
as

pOL = ∏_{i=1}^{n} P_i^{e_i}

for some prime ideals P_i ⊂ OL and exponents e_i ∈ N. The natural number e_i is called the
ramification index of P_i over p. It is often denoted e(P_i/p). If e_i > 1 for any i, then we say
the ideal p ramifies in L.
Likewise, if P is a nonzero prime ideal in OL, and p := P ∩ OK, then we say P ramifies over
K if the ramification index e(P/p) of P in the factorization of the ideal pOL ⊂ OL is greater
than 1. That is, a prime p in OK ramifies in L if at least one prime P dividing pOL ramifies
over K. If L/K is a Galois extension, then the ramification indices of all the primes dividing
pOL are equal, since the Galois group acts transitively on this set of primes.

150.4.2

The local view

The phenomenon of ramification has an equivalent interpretation in terms of local rings.
With L/K as before, let P be a prime in OL with p := P ∩ OK. Then the induced
map of localizations (OK)_p → (OL)_P is a local homomorphism of local rings (in fact, of
discrete valuation rings), and the ramification index of P over p is the unique natural
number e such that

p(OL)_P = (P(OL)_P)^e.

An astute reader may notice that this formulation of ramification index does not require
that L and K be number fields, or even that they play any role at all. We take advantage
of this fact here to give a second, more general definition.

Definition 6 (Second definition). Let φ : A → B be any ring homomorphism. Suppose
P ⊂ B is a prime ideal such that the localization B_P of B at P is a discrete valuation ring.
Let p be the prime ideal φ⁻¹(P) ⊂ A, so that φ induces a local homomorphism φ_P : A_p → B_P.
Then the ramification index e(P/p) is defined to be the unique natural number such that

φ(p)B_P = (PB_P)^{e(P/p)},

or ∞ if φ(p)B_P = (0).
The reader who is not interested in local rings may assume that A and B are unique
factorization domains, in which case e(P/p) is the exponent of P in the factorization of the
ideal φ(p)B, just as in our first definition (but without the requirement that the rings A and
B originate from number fields).
There is of course much more that can be said about ramification indices even in this purely
algebraic setting, but we limit ourselves to the following remarks:
1. Suppose A and B are themselves discrete valuation rings, with respective maximal ideals
p and P. Let Â := lim← A/pⁿ and B̂ := lim← B/Pⁿ be the completions of A and B with
respect to p and P. Then

e(P/p) = e(PB̂/pÂ).    (150.4.1)

In other words, the ramification index of P over p in the A-algebra B equals the
ramification index in the completions of A and B with respect to p and P.
2. Suppose A and B are Dedekind domains, with respective fraction fields K and L. If
B equals the integral closure of A in L, then

Σ_{P|p} e(P/p) f(P/p) ≤ [L : K],    (150.4.2)

where P ranges over all prime ideals in B that divide pB, and f(P/p) := dim_{A/p}(B/P)
is the inertial degree of P over p. Equality holds in Equation (150.4.2) whenever B is
finitely generated as an A-module.

150.4.3

Ramification in algebraic geometry

The word "ramify" in English means "to divide into two or more branches," and we will
show in this section that the mathematical term lives up to its common English meaning.
Definition 7 (Algebraic version). Let f : C₁ → C₂ be a nonconstant regular morphism
of curves (by which we mean one dimensional nonsingular irreducible algebraic varieties) over
an algebraically closed field k. Then f has a nonzero degree n := deg f, which can be defined
in any of the following ways:

[Figure 150.1: The function f(y) = y² near y = 0.]


- The number of points in a generic fiber f⁻¹(p), for p ∈ C₂
- The maximum number of points in f⁻¹(p), for p ∈ C₂
- The degree of the extension k(C₁)/f*k(C₂) of function fields
There is a finite set of points p ∈ C₂ for which the inverse image f⁻¹(p) does not have size
n, and we call these points the branch points or ramification points of f. If P ∈ C₁ with
f(P) = p, then the ramification index e(P/p) of f at P is the ramification index obtained
algebraically from Definition 6 by taking
- A = k[C₂]_p, the local ring consisting of all rational functions in the function field k(C₂)
which are regular at p.
- B = k[C₁]_P, the local ring consisting of all rational functions in the function field k(C₁)
which are regular at P.
- p = m_p, the maximal ideal in A consisting of all functions which vanish at p.
- P = m_P, the maximal ideal in B consisting of all functions which vanish at P.
- φ = f*_p : k[C₂]_p → k[C₁]_P, the map on the function fields induced by the morphism f.
Example 1. The following picture may be worth a thousand words. Let k = C and C₁ =
C₂ = C = A¹_C. Take the map f : C → C given by f(y) = y². Then f is plainly a map of
degree 2, and every point in C₂ except for 0 has two preimages in C₁. The point 0 is thus a
ramification point of f of index 2, and near 0 we have the graph of f shown in Figure 150.1.
Note that we have only drawn the real locus of f because that is all that can fit into two
dimensions. We see from the figure that a typical point on C₂ such as the point x = 1 has
two points in C₁ which map to it, but that the point x = 0 has only one corresponding point
of C₁, which "branches" or "ramifies" into two distinct points of C₁ whenever one moves
away from 0.

150.4.4

Relation to the number field case

The relationship between Definition 6 and Definition 7 is easiest to explain in the case where
f is a map between affine varieties. When C1 and C2 are affine, then their coordinate rings
k[C1 ] and k[C2 ] are Dedekind domains, and the points of the curve C1 (respectively, C2 )
correspond naturally with the maximal ideals of the ring k[C1 ] (respectively, k[C2 ]). The
ramification points of the curve C₁ are then exactly the points of C₁ which correspond
to maximal ideals of k[C₁] that ramify in the algebraic sense, with respect to the map
f* : k[C₂] → k[C₁] of coordinate rings.

Equation (150.4.2) in this case says

Σ_{P ∈ f⁻¹(p)} e(P/p) = n,

and we see that the well known formula (150.4.2) in number theory is simply the algebraic
analogue of the geometric fact that the number of points in the fiber of f, counting
multiplicities, is always n.
Example 2. Let f : C → C be given by f(y) = y² as in Example 1. Since C₂ is just the
affine line, the coordinate ring C[C₂] is equal to C[X], the polynomial ring in one variable
over C. Likewise, C[C₁] = C[Y], and the induced map f* : C[X] → C[Y] is naturally given
by f*(X) = Y². We may accordingly identify the coordinate ring C[C₂] with the subring
C[X²] of C[X] = C[C₁].
Now, the ring C[X] is a principal ideal domain, and the maximal ideals in C[X] are exactly
the principal ideals of the form (X − a) for any a ∈ C. Hence the nonzero prime ideals in
C[X²] are of the form (X² − a), and these factor in C[X] as

(X² − a) = (X − √a)(X + √a) ⊂ C[X].

Note that the two prime ideals (X − √a) and (X + √a) of C[X] are equal only when a = 0,
so we see that the ideal (X² − a) in C[X²], corresponding to the point a ∈ C₂, ramifies in C₁
exactly when a = 0. We have therefore recovered our previous geometric characterization of
the ramified points of f, solely in terms of the algebraic factorizations of ideals in C[X].
In the case where f is a map between projective varieties, Definition 6 does not directly
apply to the coordinate rings of C₁ and C₂, but only to those of open covers of C₁ and C₂
by affine varieties. Thus we do have an instance of yet another new phenomenon here, and
rather than keep the reader in suspense we jump straight to the final, most general definition
of ramification that we will give.
Definition 8 (Final form). Let f : (X, O_X) → (Y, O_Y) be a morphism of locally ringed
spaces. Let p ∈ X and suppose that the stalk (O_X)_p is a discrete valuation ring. Write
φ_p : (O_Y)_{f(p)} → (O_X)_p for the induced map of f on stalks at p. Then the ramification
index of p over Y is the unique natural number e, if it exists (or ∞ if it does not exist), such
that

φ_p(m_{f(p)}) · (O_X)_p = m_p^e,

where m_p and m_{f(p)} are the respective maximal ideals of (O_X)_p and (O_Y)_{f(p)}. We say p is
ramified in Y if e > 1.
Example 3. A ring homomorphism φ : A → B corresponds functorially to a morphism
Spec(B) → Spec(A) of locally ringed spaces from the prime spectrum of B to that of A,
and the algebraic notion of ramification from Definition 6 equals the sheaf-theoretic notion
of ramification from Definition 8.


Example 4. For any morphism of varieties f : C₁ → C₂, there is an induced morphism
f# on the structure sheaves of C₁ and C₂, which are locally ringed spaces. If C₁ and C₂
are curves, then the stalks are one dimensional regular local rings and therefore discrete
valuation rings, so in this way we recover the algebraic geometric definition (Definition 7)
from the sheaf definition (Definition 8).

150.4.5

Ramification in complex analysis

Ramification points or branch points in complex geometry are merely a special case of the
high-flown terminology of Definition 8. However, they are important enough to merit
a separate mention here.
Definition 9 (Analytic version). Let f : M → N be a holomorphic map of Riemann
surfaces. For any p ∈ M, there exist local coordinate charts U and V around p and f(p)
such that f is locally the map z ↦ z^e from U to V. The natural number e is called the
ramification index of f at p, and p is said to be a branch point or ramification point of f if
e > 1.
Example 5. Take the map f : C → C, f(y) = y² of Example 1. We study the behavior
of f near the unramified point y = 1 and near the ramified point y = 0. Near y = 1, take
the coordinate w = y − 1 on the domain and v = x − 1 on the range. Then f maps w + 1
to (w + 1)², which in the v coordinate is (w + 1)² − 1 = 2w + w². If we change coordinates
to z = 2w + w² on the domain, keeping v on the range, then f(z) = z, so the ramification
index of f at y = 1 is equal to 1.
Near y = 0, the function f(y) = y² is already in the form z ↦ z^e with e = 2, so the
ramification index of f at y = 0 is equal to 2.

150.4.6

Algebraic-analytic correspondence

Of course, the analytic notion of ramification given in Definition 9 can be couched in terms
of locally ringed spaces as well. Any Riemann surface together with its sheaf of holomorphic
functions is a locally ringed space. Furthermore the stalk at any point is always a discrete
valuation ring, because germs of holomorphic functions have Taylor expansions, making the
stalk isomorphic to the power series ring C[[z]]. We can therefore apply Definition 8 to
any holomorphic map of Riemann surfaces, and it is not surprising that this process yields
the same results as Definition 9.
More generally, every morphism of algebraic curves over C, f : V → W, can be interpreted
as a holomorphic map of Riemann surfaces in the usual way, and the ramification points on V
and W under f as algebraic varieties are identical to their ramification points as Riemann
surfaces. It turns out that the analytic structure may be regarded in a certain sense as


the completion of the algebraic structure, and in this sense the algebraic-analytic
correspondence between the ramification points may be regarded as the geometric version of the
equality (150.4.1) in number theory.
The algebraic-analytic correspondence of ramification points is itself only one manifestation
of the wide-ranging identification between algebraic geometry and analytic geometry which
is explained to great effect in the seminal paper of Serre [6].

REFERENCES
1. Robin Hartshorne, Algebraic Geometry, Springer-Verlag, 1977 (GTM 52).
2. Gerald Janusz, Algebraic Number Fields, Second Edition, American Mathematical Society, 1996
(GSM 7).
3. Jürgen Jost, Compact Riemann Surfaces, Springer-Verlag, 1997.
4. Dino Lorenzini, An Invitation to Arithmetic Geometry, American Mathematical Society, 1996
(GSM 9).
5. Jean-Pierre Serre, Local Fields, Springer-Verlag, 1979 (GTM 67).
6. Jean-Pierre Serre, Géométrie algébrique et géométrie analytique, Ann. Inst. Fourier 6,
pp. 1-42, 1955-56.
7. Joseph Silverman, The Arithmetic of Elliptic Curves, Springer-Verlag, 1986 (GTM 106).

Version: 11 Owner: saforres Author(s): djao, saforres

150.5

unramified action

Let K be a number field and let ν be a discrete valuation on K (this might be, for example,
the valuation attached to a prime ideal P of K).
Let K_ν be the completion of K at ν, and let O_ν be the ring of integers of K_ν, i.e.

O_ν = {k ∈ K_ν | ν(k) ≥ 0}

The maximal ideal of O_ν will be denoted by

M_ν = {k ∈ K_ν | ν(k) > 0}

and we denote by k_ν the residue field of K_ν, which is

k_ν = O_ν/M_ν

We will consider three different Galois groups, namely

G_{K̄/K} = Gal(K̄/K),  G_{K̄_ν/K_ν} = Gal(K̄_ν/K_ν),  G_{k̄_ν/k_ν} = Gal(k̄_ν/k_ν)

where K̄, K̄_ν, k̄_ν are separable algebraic closures of the corresponding fields. We also define
notation for the inertia group of G_{K̄_ν/K_ν}:

I_ν ⊂ G_{K̄_ν/K_ν}

Definition 8. Let S be a set and suppose there is a group action of Gal(K̄_ν/K_ν) on S. We
say that S is unramified at ν, or the action of G_{K̄_ν/K_ν} on S is unramified at ν, if the
action of I_ν on S is trivial, i.e.

σ(s) = s  for all σ ∈ I_ν, s ∈ S

Remark: By Galois theory we know that K_ν^{nr}, the fixed field of I_ν, the inertia subgroup,
is the maximal unramified extension of K_ν, so

I_ν ≅ Gal(K̄_ν/K_ν^{nr})
Version: 1 Owner: alozano Author(s): alozano


Chapter 151
11S31 Class field theory; p-adic
formal groups
151.1

Hilbert symbol

Let K be any local field. For any two nonzero elements a, b ∈ K^×, we define:

(a, b) := +1  if z² = ax² + by² has a nonzero solution (x, y, z) ≠ (0, 0, 0) in K³,
(a, b) := −1  otherwise.

The number (a, b) is called the Hilbert symbol of a and b in K.
Version: 2 Owner: djao Author(s): djao


Chapter 152
11S99 Miscellaneous
152.1

p-adic integers

152.1.1

Basic construction

For any prime p, the ring of p-adic integers is obtained by taking the completion of the
ring Z with respect to the metric induced by the valuation

|x| := 1/p^{ν_p(x)},  x ∈ Z,    (152.1.1)

where ν_p(x) denotes the largest integer e such that p^e divides x. The ring of p-adic integers
is usually denoted by Z_p, and its fraction field by Q_p.

152.1.2

Profinite viewpoint

The ring Z_p of p-adic integers can also be constructed by taking the inverse limit

Z_p := lim← Z/pⁿZ

over the inverse system · · · → Z/p²Z → Z/pZ → 0 consisting of the rings Z/pⁿZ, for all
n ≥ 0, with the projection maps Z/pⁿ⁺¹Z → Z/pⁿZ defined to be the unique maps commuting
with the quotient maps Z → Z/pⁿZ and Z → Z/pⁿ⁺¹Z. An algebraic and topological
isomorphism between the two constructions is obtained by taking the coordinatewise
projection map Z → lim← Z/pⁿZ, extended to the completion of Z under the p-adic metric.
This alternate characterization shows that Z_p is compact, since it is a closed subspace of
the space

∏_{n>0} Z/pⁿZ,

which is an infinite product of finite topological spaces and hence compact under the product
topology.
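The inverse-limit description is very concrete: a p-adic integer is just a sequence of residues x_n ∈ Z/pⁿZ, each projecting onto the previous one. For instance, −1 ∈ Z₅ is the sequence 4, 24, 124, . . . , that is, 5ⁿ − 1 at level n. A minimal sketch (the helper name is ours):

```python
def as_inverse_limit(x, p, depth):
    """Residues of the integer x in Z/p^n for n = 1..depth."""
    return [x % p**n for n in range(1, depth + 1)]

seq = as_inverse_limit(-1, 5, 4)   # -1 viewed inside Z_5
assert seq == [4, 24, 124, 624]    # 5^n - 1 at each level
# compatibility: each term projects onto the previous one under Z/5^(n+1) -> Z/5^n
assert all(seq[n] % 5**n == seq[n - 1] for n in range(1, 4))
```

The compatibility check at the end is exactly the condition defining an element of the inverse limit.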

152.1.3

Generalizations

If we interpret the prime p as an equivalence class of valuations on Q, then the field Q_p is
simply the completion of the topological field Q with respect to the metric induced by any
member valuation of p (indeed, the valuation defined in Equation (152.1.1) may serve as
the representative). This notion easily generalizes to other fields and valuations; namely,
if K is any field, and p is any prime of K, then the p-adic field K_p is defined to be the
completion of K with respect to any valuation in p. The analogue of the p-adic integers in
this case can be obtained by taking the subset (and subring) of K_p consisting of all elements
of absolute value less than or equal to 1, which is well defined independent of the choice of
valuation representing p.
In the special case where K is a number field, the p-adic field K_p is always a finite extension
of Q_p whenever p is a finite prime, and is always equal to either R or C whenever p is an
infinite prime.
Version: 6 Owner: djao Author(s): djao

152.2

local field

A local field is a topological field which is Hausdorff and locally compact as a topological space.
Examples of local fields include:
- Any field together with the discrete topology.
- The field R of real numbers.
- The field C of complex numbers.
- The field Q_p of p-adic rationals, or any finite extension thereof.
- The field F_q((t)) of formal Laurent series in one variable t with coefficients in the
finite field F_q of q elements.
In fact, this list is complete: every local field is isomorphic as a topological field to one of
the above fields.

152.2.1 Acknowledgements

This document is dedicated to those who made it all the way through Serre's book [1] before
realizing that nowhere within the book is there a definition of the term "local field".

REFERENCES
1. Jean-Pierre Serre, Local Fields, Springer-Verlag, 1979 (GTM 67).

Version: 3 Owner: djao Author(s): djao


Chapter 153
11Y05 Factorization
153.1 Pollard's rho method

Say, for example, that you have a big number n and you want to know the factors of n. Let's
use 16843009. And say, for example, that we know that n is not a prime number. In this
case, I know it isn't because I multiplied two prime numbers together to make n. (For the
crypto weenies out there, you know that there are a lot of numbers lying around which were
made by multiplying two prime numbers together. And you probably wouldn't mind finding
the factors of some of them.) In cases where you don't know, a priori, that the number is
composite, there are a variety of methods to test for compositeness.
Let's assume that n has a factor d. Since we know n is composite, we know that there must
be one. We just don't know what its value happens to be. But there are some things that
we do know about d. First of all, d is smaller than n. In fact, there is at least one such d
which is no bigger than the square root of n.

So, how does this help? If you start picking numbers at random (keeping your numbers
greater than or equal to zero and strictly less than n), then the only time you will get
a ≡ b (mod n) is when a and b are identical. However, since d is smaller than n, there is a
good chance that a ≡ b (mod d) sometimes when a ≠ b.

Well, if a ≡ b (mod d), that means that (a - b) is a multiple of d. Since n is also a multiple
of d, the greatest common divisor of (a - b) and n is a positive, integer multiple of d. We can
keep picking numbers randomly until the greatest common divisor of n and the difference
of two of our random numbers is greater than one. Then we can divide n by whatever this
greatest common divisor turned out to be. In doing so, we have broken down n into two
factors. If we suspect that the factors may be composite, we can continue trying to break
them down further by doing the algorithm again on each half.
The amazing thing here is that through all of this, we just knew there had to be some divisor
of n. We were able to use properties of that divisor to our advantage before we even knew
what the divisor was!
This is at the heart of Pollard's rho method. Pick a random number a. Pick another random
number b. See if the greatest common divisor of (a - b) and n is greater than one. If not,
pick another random number c. Now check the greatest common divisor of (c - b) and n.
If that is not greater than one, check the greatest common divisor of (c - a) and n. If that
doesn't work, pick another random number d. Check (d - c), (d - b), and (d - a). Continue
in this way until you find a factor.

As you can see from the above paragraph, this could get quite cumbersome quite quickly. By
the k-th iteration, you will have to do (k - 1) greatest common divisor checks. Fortunately,
there is a way around that. By structuring the way in which you pick random numbers, you
can avoid this buildup.
Let's say we have some polynomial f(x) that we can use to pick random numbers. Because
we're only concerned with numbers from zero up to (but not including) n, we will take all of
the values of f(x) modulo n. We start with some x_1. We then pick our random numbers
by x_{k+1} = f(x_k) (mod n).

Now, say for example we get to some point k where x_k ≡ x_j (mod d) with j < k. Then,
because of the way that modular arithmetic works, f(x_k) will be congruent to f(x_j) modulo
d. So, once we hit upon x_k and x_j, each element in the sequence starting with x_k will be
congruent modulo d to the corresponding element in the sequence starting at x_j. Thus, once
the sequence gets to x_k it has looped back upon itself to match up with x_j (when considering
them modulo d).
This looping is what gives the rho method its name. If you go back through (once you
determine d) and look at the sequence of random numbers that you used (looking at them
modulo d), you will see that they start off just going along by themselves for a bit. Then
they start to come back upon themselves. They don't typically loop the whole way back to
the first number of your sequence. So they have a bit of a tail and a loop, just like the
Greek letter rho (ρ).
Before we see why that looping helps, we will first speak to why it has to happen. When
we consider a number modulo d, we are only considering the numbers greater than or equal
to zero and strictly less than d. This is a finite set of numbers. Your random sequence
cannot possibly go on for more than d numbers without having some number repeat modulo
d. And if the function f(x) is well-chosen, you can probably loop back a great deal sooner.

The looping helps because it means that we can get away without accumulating the number
of greatest common divisor steps we need to perform with each new random number. In
fact, it makes it so that we only need to do one greatest common divisor check for every
second random number that we pick.
Now, why is that? Let's assume that the loop is of length t and starts at the j-th random
number. Say that we are on the k-th element of our random sequence. Furthermore, say
that k is greater than or equal to j and t divides k. Because k is greater than j we know it
is inside the looping part of the ρ. We also know that if t divides k, then t also divides 2k.
What this means is that x_2k and x_k will be congruent modulo d because they correspond
to the same point on the loop. Because they are congruent modulo d, their difference is a
multiple of d. So, if we check the greatest common divisor of (x_k - x_{k/2}) with n every time we
get to an even k, we will find some factor of n without having to do k - 1 greatest common
divisor calculations every time we come up with a new random number. Instead, we only
have to do one greatest common divisor calculation for every second random number.
The only open question is what to use for a polynomial f(x) to get some random numbers
which don't have too many choices modulo d. Since we don't usually know much about d,
we really can't tailor the polynomial too much. A typical choice of polynomial is

    f(x) = x^2 + a

where a is some constant which isn't congruent to 0 or -2 modulo n. If you don't place
those restrictions on a, then you will end up degenerating into a constant sequence like
{1, 1, 1, 1, ...} as soon as you hit upon some x which is congruent to either 1 or -1 modulo n.
Let's use the algorithm now to factor our number 16843009. We will use the sequence x_1 = 1
with x_{n+1} = 1024·x_n^2 + 32767 (mod n). [I also tried it with the very basic polynomial
f(x) = x^2 + 1, but that one went 80 rounds before stopping, so I didn't include the table
here.]
     k    x_k         gcd(n, x_k - x_{k/2})
     1    1
     2    33791       1
     3    10832340
     4    12473782    1
     5    4239855
     6    309274      1
     7    11965503
     8    15903688    1
     9    3345998
    10    2476108     1
    11    11948879
    12    9350010     1
    13    4540646
    14    858249      1
    15    14246641
    16    4073290     1
    17    4451768
    18    14770419    257
Let's try to factor again with a different random number schema. We will use the sequence
x_1 = 1 with x_{n+1} = 2048·x_n^2 + 32767 (mod n).


k

xk

1
2
3
4
5
6

1
34815
9016138
4752700
1678844
14535213

gcd(n, xk xk/2 )
1
1
257
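The procedure above can be sketched in a few lines of code. This is a minimal illustration (not the author's own implementation); the helper names, the Floyd-style cycle detection, and the retry loop over the constant a are choices of this sketch:

```python
from math import gcd

def pollard_rho(n, a=1, x1=1):
    # Floyd-style cycle detection: x takes one f-step per iteration and
    # y takes two, so y plays the role of x_{2k} while x plays x_k.
    f = lambda v: (v * v + a) % n
    x, y, d = x1, x1, 1
    while d == 1:
        x = f(x)
        y = f(f(y))
        d = gcd(abs(x - y), n)
    return d  # a nontrivial factor, or n itself if this run failed

def factor_with_retries(n):
    # If one constant a fails (d == n), simply try another one.
    for a in range(1, 20):
        d = pollard_rho(n, a)
        if 1 < d < n:
            return d

d = factor_with_retries(16843009)
print(d, 16843009 // d)
```

Only one gcd is computed per pair of new sequence elements, exactly as described above.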

Version: 3 Owner: patrickwonders Author(s): patrickwonders

153.2 quadratic sieve

Algorithm. To factor a number n using the quadratic sieve, one seeks two numbers x and
y which are not congruent modulo n, with x also not congruent to -y modulo n, but which
have x^2 ≡ y^2 (mod n). If two such numbers are found, one can then say that
(x + y)(x - y) ≡ 0 (mod n). Then x + y and x - y must have non-trivial factors in common
with n.
The quadratic sieve method of factoring depends upon being able to create a set of numbers
whose factorizations can be expressed as products of pre-chosen primes. These factorizations
are recorded as vectors of the exponents. Once enough vectors are collected to form a set
which contains a linear dependence, this linear dependence is exploited to find two squares
which are equivalent modulo n.

To accomplish this, the quadratic sieve method uses a set of prime numbers called a factor
base. Then it searches for numbers which can be factored entirely within that factor base.
If there are k prime numbers in the factor base, then each number which can be factored
within the factor base is stored as a k-dimensional vector, where the i-th component of the
vector for y gives the exponent of the i-th prime from the factor base in the factorization of
y. For example, if the factor base were {2, 3, 5, 7, 11, 13}, then the number y = 2^3 · 3^2 · 11^5
would be stored as the vector ⟨3, 2, 0, 0, 5, 0⟩.

Once k + 1 of these vectors have been collected, there must be a linear dependence among
them. The k + 1 vectors are taken modulo 2 to form vectors in Z_2^k. The linear dependence
among them is used to find a combination of the vectors which sums up to the zero vector
in Z_2^k. Summing these vectors is equivalent to multiplying the y's to which they correspond.
And the zero vector in Z_2^k signals a perfect square.
To factor n, choose a factor base B = {p1, p2, ..., pk} such that 2 ∈ B and, for each odd
prime pj in B, n is a quadratic residue of pj. Now start picking xi near √n and calculate
yi = xi^2 - n. Clearly yi ≡ xi^2 (mod n). If yi can be completely factored by numbers in B,
then it is called B-smooth. If it is not B-smooth, then discard xi and yi and move on to a
new choice of xi. If it is B-smooth, then store xi, yi, and the vector of its exponents for the
primes in B. Also record a copy of the exponent vector with each component taken modulo
2.
Once k + 1 vectors have been recorded, there must be a linear dependence among them.
Using the copies of the exponent vectors that were taken modulo 2, determine which ones
can be added together to form the zero vector. Multiply together the xi that correspond to
those chosen vectors; call this x. Also, add together the original exponent vectors that
correspond to the chosen vectors to form a new vector v. Every component of this vector
will be even. Divide each component of v by 2 and form y = ∏_{i=1}^{k} p_i^{v_i}.

Because each yi ≡ xi^2 (mod n), we have x^2 ≡ y^2 (mod n). If x ≡ ±y (mod n), then find
some more B-smooth numbers and try again. If x is not congruent to ±y modulo n, then
(x + y) and (x - y) have non-trivial factors in common with n.
Example. Consider the number n = 16843009. The integer nearest its square root is 4104.
Given the factor base

    B = {2, 3, 5, 7, 13},

the first few B-smooth values of yi = f(xi) = xi^2 - n are:


    x_i     y_i = f(x_i)    2  3  5  7  13
    4122    147875          0  0  3  1  2
    4159    454272          7  1  0  1  2
    4187    687960          3  3  1  2  1
    4241    1143072         5  6  0  2  0
    4497    3380000         5  0  4  0  2
    4993    8087040         9  5  1  0  1

Using x0 = 4241 and x1 = 4497, one obtains:

    y0 = 1143072 = 2^5 · 3^6 · 5^0 · 7^2 · 13^0
    y1 = 3380000 = 2^5 · 3^0 · 5^4 · 7^0 · 13^2

which results in:

    x = 4241 · 4497 = 19071777
    y = 2^5 · 3^3 · 5^2 · 7^1 · 13^1 = 1965600

From there:

    gcd(x - y, n) = 257
    gcd(x + y, n) = 65537

It may not be completely obvious why we required that n be a quadratic residue of each pi in
the factor base B. One might intuitively think that we actually want the pi to be quadratic
residues of n instead. But that is not the case.

We are trying to express n as:

    (x + y)(x - y) = x^2 - y^2 = n

where

    y = ∏_{i=1}^{k} p_i^{v_i}

Because we end up squaring y, there is no reason that the pi would need to be quadratic
residues of n.
So why do we require that n be a quadratic residue of each pi? We can rewrite x^2 - y^2 = n
as:

    x^2 - ∏_{i=1}^{k} p_i^{2v_i} = n

If we take that expression modulo pi, for any pi for which the corresponding vi is non-zero,
we are left with:

    x^2 ≡ n (mod pi)

Thus, in order for pi to show up in a useful solution, n must be a quadratic residue of
pi. We would be wasting time and space to employ other primes in our factoring and
linear combinations.
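The arithmetic of the worked example can be checked directly. This sketch only re-verifies the numbers above; it does not perform the sieving or the linear algebra:

```python
from math import gcd

n = 16843009
x = 4241 * 4497                    # product of the chosen x_i
y = 2**5 * 3**3 * 5**2 * 7 * 13    # half-exponent product of the chosen y_i

assert x == 19071777 and y == 1965600
assert (x * x - y * y) % n == 0    # x^2 = y^2 (mod n)
print(gcd(x - y, n), gcd(x + y, n))  # the two factors: 257 and 65537
```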
Version: 6 Owner: patrickwonders Author(s): patrickwonders


Chapter 154
11Y55 Calculation of integer sequences
154.1 Kolakoski sequence

A self-describing sequence {k_n}_{n=0}^∞ of alternating blocks of 1s and 2s, given by the following rules:

- k_0 = 1.¹
- k_n is the length of the (n + 1)-th block.

Thus, the sequence begins 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, ...
It is conjectured that the density of 1s in the sequence is 0.5. It is not known whether the
1s have a density; however, it is known that were this true, that density would be 0.5. It is
also not known whether the sequence is a strongly recurrent sequence; this too would imply
density 0.5.

Extensive computer experiments strongly support the conjecture. Furthermore, if o_n is the
number of 1s in the first n elements, then it appears that o_n = 0.5n + O(log n). Note for
comparison that for a random sequence of 1s and 2s, the number of 1s in the first n
elements is with high probability 0.5n + O(√n).

To generate rapidly a large number of elements of the sequence, it is most efficient to build
a hierarchy of generators for the sequence. If the conjecture is correct, then the depth of
this hierarchy is only O(log n) to generate the first n elements.
¹ Some sources start the sequence at k_0 = 2 instead. This only has the effect of shifting the
sequence by one position.


This is sequence A000002 in the Online Encyclopedia of Integer Sequences.
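A straightforward (non-hierarchical) generator follows directly from the two rules; the seed [1, 2, 2] and the function name are choices of this sketch:

```python
def kolakoski(n):
    # seq describes itself: seq[i] is the length of the (i+1)-th block,
    # and blocks alternate between runs of 1s and runs of 2s.
    seq = [1, 2, 2]
    i = 2  # index of the term describing the next block to emit
    while len(seq) < n:
        next_val = 1 if seq[-1] == 2 else 2  # alternate the block value
        seq.extend([next_val] * seq[i])
        i += 1
    return seq[:n]

print(kolakoski(16))  # [1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1]
```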


Version: 1 Owner: ariels Author(s): ariels


Chapter 155
11Z05 Miscellaneous applications of number theory
155.1 τ function

The τ function takes positive integers as its input and gives the number of positive divisors
of its input as its output. For example, since 1, 2, and 4 are all of the positive divisors of 4,
we have τ(4) = 3. As another example, since 1, 2, 5, and 10 are all of the positive divisors of
10, we have τ(10) = 4.

The τ function behaves according to the following two rules:

1. If p is a prime and x is a nonnegative integer, then τ(p^x) = x + 1.
2. If gcd(a, b) = 1, then τ(ab) = τ(a) τ(b).

Because these two rules hold for the τ function, it is a multiplicative function.

Note that these rules work for the previous two examples. Since 2 is prime, τ(4) =
τ(2^2) = 2 + 1 = 3. Since 2 and 5 are distinct primes, τ(10) = τ(2 · 5) = τ(2) τ(5) =
(1 + 1)(1 + 1) = 4.

The τ function is extremely useful for studying cyclic rings.
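For small inputs τ can be computed by brute force, which makes the two rules easy to check; this one-liner is just an illustration:

```python
def tau(n):
    # count the positive divisors of n by trial division
    return sum(1 for d in range(1, n + 1) if n % d == 0)

print(tau(4), tau(10))  # 3 4
# the multiplicative rules: tau(p^x) = x + 1, and tau(ab) = tau(a)tau(b)
# whenever gcd(a, b) = 1
assert tau(2**5) == 5 + 1
assert tau(2 * 5) == tau(2) * tau(5)
```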
Version: 7 Owner: Wkbj79 Author(s): Wkbj79

155.2 arithmetic derivative

The arithmetic derivative n′ of a natural number n is defined by the following rules:

- p′ = 1 for any prime p.
- (ab)′ = a′b + ab′ for any a, b ∈ N (Leibniz rule).

Note: One of the major contributors to the theory of the arithmetic derivative is E. J.
Barbeau, who published "Remark on an arithmetic derivative" in 1961.
Version: 2 Owner: Johan Author(s): Johan

155.3 example of arithmetic derivative

Consider the natural number 6. Using the rules of the arithmetic derivative we get:

    6′ = (2 · 3)′ = 2′ · 3 + 2 · 3′ = 1 · 3 + 2 · 1 = 5

Below is a list of the first 10 natural numbers and their first and second arithmetic derivatives:

    n    1  2  3  4  5  6  7  8   9  10
    n′   0  1  1  4  1  5  1  12  6  7
    n″   0  0  0  4  0  1  0  16  5  1
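The two rules imply the closed form n′ = n · Σ e_p/p over the prime factorization n = ∏ p^{e_p}, which is easy to implement. In this sketch the convention 0′ = 1′ = 0 is an assumption:

```python
def arithmetic_derivative(n):
    # n' = n * sum(e_p / p) over the prime factors p^e_p of n;
    # each division by p below contributes one n // p term.
    if n < 2:
        return 0  # convention assumed here: 0' = 1' = 0
    total, m, p = 0, n, 2
    while p * p <= m:
        while m % p == 0:
            total += n // p
            m //= p
        p += 1
    if m > 1:          # leftover prime factor
        total += n // m
    return total

print([arithmetic_derivative(n) for n in range(1, 11)])
# [0, 1, 1, 4, 1, 5, 1, 12, 6, 7]  -- matches the n' row of the table
```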

Version: 4 Owner: Johan Author(s): Johan

155.4 proof that τ(n) is the number of positive divisors of n

Following is a proof that, if τ behaves according to the following two rules...

1. If p is a prime and x is a nonnegative integer, then τ(p^x) = x + 1.
2. If gcd(a, b) = 1, then τ(ab) = τ(a) τ(b).

...then τ counts the positive divisors of its input, which must be a positive integer.

Let p be a prime. Then p^0 = 1. Since 1 is the only positive divisor of 1 and τ(1) = τ(p^0) =
0 + 1 = 1, then τ(1) is equal to the number of positive divisors of 1.

Suppose that, for all positive integers k smaller than z ∈ Z with z > 1, the number of
positive divisors of k is τ(k). Since z > 1, z has a prime divisor. Let p be a prime that
divides z. Let x ∈ Z+ such that p^x divides z and p^{x+1} does not divide z. Let a ∈ Z+ such
that z = p^x a. Then gcd(a, p) = 1. Thus gcd(a, p^x) = 1. Since a < z, then, by the induction
hypothesis, there are τ(a) positive divisors of a.

Let d be a positive divisor of z. Let y be a nonnegative integer such that p^y divides d
and p^{y+1} does not divide d. Thus 0 ≤ y ≤ x, and there are x + 1 choices for y. Let
c ∈ Z+ such that d = p^y c. Then gcd(c, p) = 1. Since c divides d and d divides z, then
c divides z. Since c divides p^x a and gcd(c, p) = 1, then c divides a. Thus there are τ(a)
choices for c. Since there are x + 1 choices for y and there are τ(a) choices for c, then there
are (x + 1) τ(a) choices for d. Hence there are (x + 1) τ(a) positive divisors of z. Since
τ(z) = τ(p^x a) = τ(p^x) τ(a) = (x + 1) τ(a), it follows that, for every n ∈ Z+, the number of
positive divisors of n is τ(n).
Version: 2 Owner: Wkbj79 Author(s): Wkbj79


Chapter 156
12-00 General reference works (handbooks, dictionaries, bibliographies, etc.)
156.1 monomial

A monomial is a product of non-negative powers of variables. It may also include an
optional coefficient (which is sometimes ignored when discussing particular properties of
monomials). A polynomial can be thought of as a sum over a set of monomials.

For example, the following are monomials:

    1    x^2 y    xyz    3x^4 y^2 z^3    z

If there are n variables from which a monomial may be formed, then a monomial may be
represented without its coefficient as a vector of n natural numbers. Each position in this
vector corresponds to a particular variable, and the value of the element at each position
corresponds to the power of that variable in the monomial. For instance, the monomial
x^2 y z^3 formed from the set of variables {w, x, y, z} would be represented as (0, 2, 1, 3)^T. A
constant would be the zero vector.

Given this representation, we may define a few more concepts. First, the degree of a
monomial is the sum of the elements of its vector representation. Thus, the degree of x^2 y z^3
is 0 + 2 + 1 + 3 = 6, and the degree of a constant is 0. If a polynomial is represented as a
sum over a set of monomials, then the degree of the polynomial can be defined as the degree
of the monomial of largest degree belonging to that polynomial.
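This exponent-vector representation is one short step from code; the tuple encoding below is an illustrative choice, not part of the original entry:

```python
# the monomial x^2*y*z^3 over the ordered variables (w, x, y, z),
# stored as its exponent vector
mono = (0, 2, 1, 3)

def degree(exponents):
    # the degree of a monomial is the sum of its exponent vector
    return sum(exponents)

# a polynomial as a set of monomials: x^2*y*z^3 + w + 1
poly = [(0, 2, 1, 3), (1, 0, 0, 0), (0, 0, 0, 0)]

print(degree(mono))                  # 6
print(max(degree(m) for m in poly))  # degree of the polynomial: 6
```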

Version: 2 Owner: bbukh Author(s): bbukh, Logan

156.2 order and degree of polynomial

Let f be a polynomial in two variables, viz. f(x, y) = Σ_{i,j} a_{ij} x^i y^j.¹ Then the degree of f
is given by:

    deg f = sup{i + j | a_{ij} ≠ 0}

Note the degree of the zero polynomial is -∞, since sup ∅ (per definition) is -∞; thus
deg f ∈ N ∪ {0} ∪ {-∞}.

Similarly the order of f is given by:

    ord f = inf{i + j | a_{ij} ≠ 0}

Note the order of the zero polynomial is +∞ (because inf ∅ = +∞). Thus ord f ∈ N ∪ {0} ∪ {+∞}.

Please note that the term "order" is not as common as "degree". In fact, it is perhaps more
frequently associated with power series (a form of generalized polynomials) than with
ordinary polynomials. Also be aware that the term "order" occasionally is used as a synonym
for "degree".

¹ In order to simplify the notation, the definition is given in terms of a polynomial in two
variables; however, the definition naturally scales to any number of variables.

Version: 4 Owner: jgade Author(s): jgade


Chapter 157
12-XX Field theory and polynomials
157.1 homogeneous polynomial

A polynomial P(x1, ..., xn) of degree k is called homogeneous if P(cx1, ..., cxn) = c^k P(x1, ..., xn)
for all constants c.

An equivalent definition is that all terms of the polynomial have the same degree (i.e. k).
Observe that a polynomial P is homogeneous iff deg P = ord P.

As an important example of homogeneous polynomials one can mention the elementary
symmetric polynomials.
Version: 7 Owner: jgade Author(s): jgade

157.2 subfield

Let F be a field and S a subset such that S, with the operations inherited from F, is also a
field. Then we say that S is a subfield of F.
Version: 2 Owner: drini Author(s): drini, apmxi


Chapter 158
12D05 Polynomials: factorization
158.1 factor theorem

If f(x) is a polynomial, then x - a is a factor if and only if a is a root (that is, f(a) = 0).

This theorem is of great help for finding factorizations of higher order polynomials. As an
example, suppose that we need to factor the polynomial p(x) = x^3 + 3x^2 - 33x - 35.
With some help from the rational root theorem we can find that x = -1 is a root (that is,
p(-1) = 0), so we know (x + 1) must be a factor of the polynomial. We can then write

    p(x) = (x + 1) q(x)

where the polynomial q(x) can be found using long or synthetic division of p(x) by x + 1.
Some calculations show us that for this example q(x) = x^2 + 2x - 35, which can be
easily factored as (x - 5)(x + 7). We conclude that

    p(x) = (x + 1)(x - 5)(x + 7).
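The division step can be checked mechanically with synthetic division; this helper is an illustrative sketch, not part of the original entry:

```python
def synthetic_division(coeffs, a):
    # divide the polynomial with coefficients coeffs (highest power first)
    # by (x - a); returns (quotient coefficients, remainder)
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + a * out[-1])
    return out[:-1], out[-1]

# p(x) = x^3 + 3x^2 - 33x - 35 divided by (x + 1), i.e. a = -1
q, r = synthetic_division([1, 3, -33, -35], -1)
print(q, r)  # [1, 2, -35] 0  ->  q(x) = x^2 + 2x - 35, remainder 0
```

A zero remainder confirms the factor theorem's "if" direction for a = -1.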
Version: 3 Owner: drini Author(s): drini

158.2 proof of factor theorem

Suppose that f(x) is a polynomial of degree n - 1. It is infinitely differentiable, and therefore
has a Taylor series expansion about a. Since f^(n)(x) = 0, the expansion terminates after the
n - 1 term. Also, the n-th remainder of the Taylor series vanishes:

    R_n(x) = (f^(n)(y) / n!) x^n = 0

Thus the function is equal to its Taylor series:

    f(x) = f(a) + (f′(a)/1!)(x - a) + (f″(a)/2!)(x - a)^2 + Σ_{k=3}^{n-1} (f^(k)(a)/k!)(x - a)^k

    f(x) = f(a) + Σ_{k=1}^{n-1} (f^(k)(a)/k!)(x - a)^k

    f(x) = f(a) + (x - a) Σ_{k=1}^{n-1} (f^(k)(a)/k!)(x - a)^{k-1}

Now if x - a is a factor of f(x) then we can write f(x) = (x - a) g(x) for some polynomial
g(x). If f(a) = 0 we have

    f(x) = (x - a) Σ_{k=1}^{n-1} (f^(k)(a)/k!)(x - a)^{k-1},

and therefore f(x) = (x - a) g(x). So x - a is a factor of f(x). Now if f(x) = (x - a) g(x),
that is, if x - a is a factor of f(x), we immediately have f(a) = (a - a) g(a) = 0. Thus x - a
is a factor of f(x) if and only if f(a) = 0.
Version: 2 Owner: volator Author(s): volator

158.3 proof of rational root theorem

Let p/q be a root of p(x). Then we have

    a_n (p/q)^n + a_{n-1} (p/q)^{n-1} + ... + a_1 (p/q) + a_0 = 0.

Now multiply through by q^n, and do some simple rearrangements to obtain:

    a_n p^n + a_{n-1} p^{n-1} q + ... + a_1 p q^{n-1} + a_0 q^n = 0
    a_0 q^n = -a_n p^n - a_{n-1} p^{n-1} q - ... - a_1 p q^{n-1}
    a_0 q^n = p(-a_n p^{n-1} - a_{n-1} p^{n-2} q - ... - a_1 q^{n-1}).

So p | a_0 q^n, and by hypothesis gcd(p, q) = 1. This implies that p | a_0. After similar
rearrangements, we obtain:

    a_n p^n = q(-a_{n-1} p^{n-1} - ... - a_1 p q^{n-2} - a_0 q^{n-1}).

So q | a_n p^n and hence q | a_n.
Version: 2 Owner: bs Author(s): bs

158.4 rational root theorem

Consider the polynomial

    p(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0

where all the coefficients a_i are integers.

If p(x) has a rational root p/q with gcd(p, q) = 1, then p | a_0 and q | a_n.

This theorem is a special case of a result about polynomials whose coefficients belong to a
unique factorization domain. The theorem then states that any root in the fraction field is
also in the base domain.
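The theorem turns the search for rational roots into a finite check: try every ±p/q with p dividing a_0 and q dividing a_n. The brute-force helper below is an illustration, not an efficient algorithm:

```python
from fractions import Fraction

def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    # coeffs are integer coefficients, highest power first, with a_0 != 0;
    # by the rational root theorem every rational root is +-p/q with
    # p | a_0 and q | a_n
    a_n, a_0 = coeffs[0], coeffs[-1]
    found = set()
    for p in divisors(a_0):
        for q in divisors(a_n):
            for cand in (Fraction(p, q), Fraction(-p, q)):
                value = sum(c * cand ** i
                            for i, c in enumerate(reversed(coeffs)))
                if value == 0:
                    found.add(cand)
    return found

# the factor theorem example: x^3 + 3x^2 - 33x - 35 = (x+1)(x-5)(x+7)
print(rational_roots([1, 3, -33, -35]))  # roots -1, 5, -7
```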
Version: 3 Owner: drini Author(s): drini

158.5 sextic equation

The sextic equation is the univariate polynomial equation of the sixth degree:

    x^6 + ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0.

Joubert showed in 1861 that this polynomial can be reduced, without any form of accessory
irrationalities, to the 3-parameterized resolvent:

    x^6 + ax^4 + bx^2 + cx + c = 0.

This polynomial was studied in great detail by Felix Klein and Robert Fricke in the 19th
century, and it is directly related to the algebraic aspect of Hilbert's 13th problem. Its solution
has been reduced (by Klein) to the solution of the so-called Valentiner form problem, a
ternary form problem which seeks the ratios of the variables involved in the invariant system
of the Valentiner group of order 360. It can also be solved with a class of generalized
hypergeometric series, by Birkeland's approach to algebraic equations. Scott Crass has
given an explicit solution to the Valentiner problem by purely iterational methods.
Version: 9 Owner: mathcam Author(s): ottem, mathcam


Chapter 159
12D10 Polynomials: location of zeros (algebraic theorems)
159.1 Cardano's derivation of the cubic formula

To solve the cubic polynomial equation x^3 + ax^2 + bx + c = 0 for x, the first step is to apply
the Tchirnhaus transformation x = y - a/3. This reduces the equation to y^3 + py + q = 0,
where

    p = b - a^2/3
    q = c - ab/3 + 2a^3/27

The next step is to substitute y = u - v, to obtain

    (u - v)^3 + p(u - v) + q = 0    (159.1.1)

or, with the terms collected,

    (q - (v^3 - u^3)) + (u - v)(p - 3uv) = 0    (159.1.2)

From equation (159.1.2), we see that if u and v are chosen so that q = v^3 - u^3 and p = 3uv,
then y = u - v will satisfy equation (159.1.1), and the cubic equation will be solved!

There remains the matter of solving q = v^3 - u^3 and p = 3uv for u and v. From the second
equation, we get v = p/(3u), and substituting this v into the first equation yields

    q = p^3/(3u)^3 - u^3

which is a quadratic equation in u^3. Solving for u^3 using the quadratic formula, we get

    u^3 = (-27q + √(108p^3 + 729q^2)) / 54
    v^3 = (27q + √(108p^3 + 729q^2)) / 54

Using these values for u and v, you can back-substitute y = u - v, p = b - a^2/3, q =
c - ab/3 + 2a^3/27, and x = y - a/3 to get the expression for the first root r1 in the cubic
formula. The second and third roots r2 and r3 are obtained by performing synthetic division
using r1, and using the quadratic formula on the remaining quadratic factor.
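The substitution can be verified numerically on a depressed cubic; the sample values p = 3, q = -4 (whose real root is y = 1) are an assumption of this sketch, chosen so that the expression under the square root is positive:

```python
# check Cardano's substitution on y^3 + p*y + q = 0 with p = 3, q = -4
p, q = 3.0, -4.0
u3 = (-27 * q + (108 * p**3 + 729 * q**2) ** 0.5) / 54  # this is u^3
u = u3 ** (1 / 3)
v = p / (3 * u)   # from p = 3uv
y = u - v
print(y)  # approximately 1.0
assert abs(y**3 + p * y + q) < 1e-9  # y really solves the cubic
```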
Version: 7 Owner: djao Author(s): djao

159.2 Ferrari-Cardano derivation of the quartic formula

Given a quartic equation x^4 + ax^3 + bx^2 + cx + d = 0, apply the Tchirnhaus transformation
x ↦ y - a/4 to obtain

    y^4 + py^2 + qy + r = 0    (159.2.1)

where

    p = b - 3a^2/8
    q = c - ab/2 + a^3/8
    r = d - ac/4 + a^2 b/16 - 3a^4/256

Clearly a solution to Equation (159.2.1) solves the original, so we replace the original equation
with Equation (159.2.1). Move qy + r to the other side and complete the square on the left
to get:

    (y^2 + p)^2 = py^2 - qy + (p^2 - r).

We now wish to add the quantity (y^2 + p + z)^2 - (y^2 + p)^2 to both sides, for some unspecified
value of z whose purpose will be made clear in what follows. Note that (y^2 + p + z)^2 - (y^2 + p)^2
is a quadratic in y. Carrying out this addition, we get

    (y^2 + p + z)^2 = (p + 2z)y^2 - qy + (z^2 + 2pz + p^2 - r)    (159.2.2)

The goal is now to choose a value for z which makes the right hand side of Equation (159.2.2)
a perfect square. The right hand side is a quadratic polynomial in y whose discriminant is

    -8z^3 - 20pz^2 + (8r - 16p^2)z + q^2 + 4pr - 4p^3.

Our goal will be achieved if we can find a value for z which makes this discriminant zero.
But the above polynomial is a cubic polynomial in z, so its roots can be found using the
cubic formula. Choosing then such a value for z, we may rewrite Equation (159.2.2) as

    (y^2 + p + z)^2 = (sy + t)^2

for some (complicated!) values s and t, and then taking the square root of both sides and
solving the resulting quadratic equation in y provides a root of Equation (159.2.1).
Version: 5 Owner: djao Author(s): djao

159.3 Galois-theoretic derivation of the cubic formula

We are trying to find the roots r1, r2, r3 of the polynomial x^3 + ax^2 + bx + c = 0. From the
equation

    (x - r1)(x - r2)(x - r3) = x^3 + ax^2 + bx + c

we see that

    a = -(r1 + r2 + r3)
    b = r1 r2 + r1 r3 + r2 r3
    c = -r1 r2 r3

The goal is to explicitly construct a radical tower over the field k = C(a, b, c) that contains
the three roots r1, r2, r3.

Let L = C(r1, r2, r3). By Galois theory we know that Gal(L/C(a, b, c)) = S3. Let K ⊂ L
be the fixed field of A3 ⊂ S3. We have a tower of field extensions

    L = C(r1, r2, r3)
      |  A3
    K = ?
      |  S3/A3
    k = C(a, b, c)

which we know from Galois theory is radical. We use Galois theory to find K and exhibit
radical generators for these extensions.

Let σ := (123) be a generator of Gal(L/K) = A3. Let ω = e^{2πi/3} ∈ C ⊂ L be a primitive
cube root of unity. Since ω has norm 1, Hilbert's Theorem 90 tells us that ω = y/σ(y) for
some y ∈ L. Galois theory (or Kummer theory) then tells us that L = K(y) and y^3 ∈ K,
thus exhibiting L as a radical extension of K.

The proof of Hilbert's Theorem 90 provides a procedure for finding y, which is as follows:
choose any x ∈ L, form the quantity

    x + ω σ(x) + ω^2 σ^2(x);

then this quantity automatically yields a suitable value for y provided that it is nonzero. In
particular, choosing x = r2 yields

    y = r1 + ω r2 + ω^2 r3,

and we have L = K(y) with y^3 ∈ K. Moreover, since τ := (23) does not fix y^3, it follows
that y^3 ∉ k, and this, combined with [K : k] = 2, shows that K = k(y^3).

Set z := τ(y) = r1 + ω^2 r2 + ω r3. Applying the same technique to the extension K/k, we find
that K = k(y^3 - z^3) with (y^3 - z^3)^2 ∈ k, and this exhibits K as a radical extension of k.

To get explicit formulas, start with y^3 + z^3 and y^3 z^3, which are fixed by S3 and thus
guaranteed to be in k. Using the reduction algorithm for symmetric polynomials, we find

    y^3 + z^3 = -2a^3 + 9ab - 27c
    y^3 z^3 = (a^2 - 3b)^3

Solving this system for y and z yields

    y = ((-2a^3 + 9ab - 27c + √((2a^3 - 9ab + 27c)^2 + 4(-a^2 + 3b)^3)) / 2)^{1/3}
    z = ((-2a^3 + 9ab - 27c - √((2a^3 - 9ab + 27c)^2 + 4(-a^2 + 3b)^3)) / 2)^{1/3}

Now we solve the linear system

    -a = r1 + r2 + r3
     y = r1 + ω r2 + ω^2 r3
     z = r1 + ω^2 r2 + ω r3

and we get

    r1 = (1/3)(-a + y + z)
    r2 = (1/3)(-a + ω^2 y + ω z)
    r3 = (1/3)(-a + ω y + ω^2 z)

which expresses r1, r2, r3 as radical expressions of a, b, c by way of the previously obtained
expressions for y and z, and completes the derivation of the cubic formula.
Version: 4 Owner: djao Author(s): djao

159.4 Galois-theoretic derivation of the quartic formula

Let x^4 + ax^3 + bx^2 + cx + d be a general polynomial with four roots r1, r2, r3, r4, so
(x - r1)(x - r2)(x - r3)(x - r4) = x^4 + ax^3 + bx^2 + cx + d. The goal is to exhibit the field
extension C(r1, r2, r3, r4)/C(a, b, c, d) as a radical extension, thereby expressing r1, r2, r3, r4
in terms of a, b, c, d by radicals.

Write N for C(r1, r2, r3, r4) and F for C(a, b, c, d). The Galois group Gal(N/F) is the
symmetric group S4, the permutation group on the four elements {r1, r2, r3, r4}, which has
a composition series

    1 ◁ Z/2 ◁ V4 ◁ A4 ◁ S4,

where:

- A4 is the alternating group in S4, consisting of the even permutations.
- V4 = {1, (12)(34), (13)(24), (14)(23)} is the Klein four-group.
- Z/2 is the two-element subgroup {1, (12)(34)} of V4.

Under the Galois correspondence, each of these subgroups corresponds to an intermediate
field of the extension N/F. We denote these fixed fields by (in increasing order) K, L, and
M. We thus have a tower of field extensions, with each field fixed by the corresponding
subgroup:

    Subgroup:     1  ⊂  Z/2  ⊂  V4  ⊂  A4  ⊂  S4
    Fixed field:  N  ⊃   M   ⊃  L   ⊃  K   ⊃  F

By Galois theory, or Kummer theory, each field in this diagram is a radical extension of the
one below it, and our job is done if we explicitly find what the radical extension is in each
case.

We start with K/F. The index of A4 in S4 is two, so K/F is a degree two extension. We
have to find an element of K that is not in F. The easiest such element to take is the element
obtained by taking the product of the differences of the roots, namely,

    δ := ∏_{1≤i<j≤4} (ri - rj) = (r1 - r2)(r1 - r3)(r1 - r4)(r2 - r3)(r2 - r4)(r3 - r4).

Observe that δ is fixed by any even permutation of the roots ri, but that σ(δ) = -δ for
any odd permutation σ. Accordingly, δ^2 is actually fixed by all of S4, so:

- δ ∈ K, but δ ∉ F.
- δ^2 ∈ F.
- K = F[δ] = F[√(δ^2)], thus exhibiting K/F as a radical extension.

The element δ^2 ∈ F is called the discriminant of the polynomial. An explicit formula for
δ^2 can be found using the reduction algorithm for symmetric polynomials, and, although it
is not needed for our purposes, we list it here for reference:

    δ^2 = 256d^3 - d^2 (27a^4 - 144a^2 b + 128b^2 + 192ac)
          - c^2 (27c^2 - 18abc + 4a^3 c + 4b^3 - a^2 b^2)
          + 2d (abc(9a^2 - 40b) - 2b^3 (a^2 - 4b) - 3c^2 (a^2 - 24b)).
Next up is the extension L/K, which has degree 3 since [A4 : V4] = 3. We have to find an
element of N which is fixed by V4 but not by A4. Luckily, the form of V4 almost cries out
that the following elements be used:

    t1 := (r1 + r2)(r3 + r4)
    t2 := (r1 + r3)(r2 + r4)
    t3 := (r1 + r4)(r2 + r3)

These three elements of N are fixed by everything in V4, but not by everything in A4. They
are therefore elements of L that are not in K. Moreover, every permutation in S4 permutes
the set {t1, t2, t3}, so the cubic polynomial

    Φ(x) := (x - t1)(x - t2)(x - t3)

actually has coefficients in F! In fancier language, the cubic polynomial Φ(x) defines a
cubic extension E of F which is linearly disjoint from K, with the composite extension EK
equal to L. The polynomial Φ(x) is called the resolvent cubic of the quartic polynomial
x^4 + ax^3 + bx^2 + cx + d. The coefficients of Φ(x) can be found fairly easily using (again) the
reduction algorithm for symmetric polynomials, which yields

    Φ(x) = x^3 - 2bx^2 + (b^2 + ac - 4d)x + (c^2 + a^2 d - abc).    (159.4.1)

Using the cubic formula, one can find radical expressions for the three roots of this polynomial, which are t1 , t2 , and t3 , and henceforth we assume radical expressions for these three
quantities are known. We also have L = K[t1 ], which in light of what we just said, exhibits
L/K as an explicit radical extension.
The remaining extensions are easier and the reader who has followed to this point should
have no trouble with the rest. For the degree two extension M/L, we require an element of
M that is not in L; one convenient such element is r₁ + r₂, which is a root of the quadratic polynomial

(x − (r₁ + r₂))(x − (r₃ + r₄)) = x² + ax + t₁ ∈ L[x]    (159.4.2)

and therefore equals (−a + √(a² − 4t₁))/2. Hence M = L[r₁ + r₂] = L[(−a + √(a² − 4t₁))/2] is a radical extension of L.
Finally, for the extension N/M, an element of N that is not in M is of course r1 , which is a
root of the quadratic polynomial
(x − r₁)(x − r₂) = x² − (r₁ + r₂)x + r₁r₂.    (159.4.3)
Now, r1 + r2 is known from the previous paragraph, so it remains to find an expression for
r1 r2 . Note that r1 r2 is fixed by (12)(34), so it is in M but not in L. To find it, use the
equation (t₂ + t₃ − t₁)/2 = r₁r₂ + r₃r₄, which gives

(x − r₁r₂)(x − r₃r₄) = x² − ((t₂ + t₃ − t₁)/2) x + d

and, upon solving for r₁r₂ with the quadratic formula, yields

r₁r₂ = ((t₂ + t₃ − t₁) + √((t₂ + t₃ − t₁)² − 16d))/4    (159.4.4)
r₃r₄ = ((t₂ + t₃ − t₁) − √((t₂ + t₃ − t₁)² − 16d))/4    (159.4.5)
We can then use this expression, combined with Equation (159.4.3), to solve for r1 using
the quadratic formula. Perhaps, at this point, our poor reader needs a summary of the
procedure, so we give one here:
1. Find t1 , t2 , and t3 by solving the resolvent cubic (Equation (159.4.1)) using the cubic
formula,
2. From Equation (159.4.2), obtain

   r₁ + r₂ = (−a + √(a² − 4t₁))/2
   r₃ + r₄ = (−a − √(a² − 4t₁))/2

3. Using Equation (159.4.3), write

   r₁ = ((r₁ + r₂) + √((r₁ + r₂)² − 4r₁r₂))/2
   r₂ = ((r₁ + r₂) − √((r₁ + r₂)² − 4r₁r₂))/2
   r₃ = ((r₃ + r₄) + √((r₃ + r₄)² − 4r₃r₄))/2
   r₄ = ((r₃ + r₄) − √((r₃ + r₄)² − 4r₃r₄))/2

   where the expressions r₁ + r₂ and r₃ + r₄ are derived in the previous step, and the expressions r₁r₂ and r₃r₄ come from Equations (159.4.4) and (159.4.5).
4. Now the roots r1 , r2 , r3 , r4 of the quartic polynomial x4 + ax3 + bx2 + cx + d have been
found, and we are done!
Version: 6 Owner: djao Author(s): djao
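The identities above can be checked numerically. The following Python sketch (an editor's illustration, not part of the original entry) picks four concrete roots, rebuilds the coefficients a, b, c, d, and verifies the resolvent cubic (159.4.1) together with the quadratics behind Equations (159.4.2), (159.4.4) and (159.4.5):

```python
from itertools import combinations

# Concrete roots of x^4 + a x^3 + b x^2 + c x + d = (x-r1)(x-r2)(x-r3)(x-r4).
r1, r2, r3, r4 = 1.0, 2.0, -3.0, 5.0
roots = [r1, r2, r3, r4]
a = -sum(roots)
b = sum(x * y for x, y in combinations(roots, 2))
c = -sum(x * y * z for x, y, z in combinations(roots, 3))
d = r1 * r2 * r3 * r4

# The quantities t1, t2, t3 are roots of the resolvent cubic (159.4.1).
t1 = (r1 + r2) * (r3 + r4)
t2 = (r1 + r3) * (r2 + r4)
t3 = (r1 + r4) * (r2 + r3)
g = lambda x: x**3 - 2*b*x**2 + (b**2 + a*c - 4*d)*x + (c**2 + a**2*d - a*b*c)
assert all(abs(g(t)) < 1e-8 for t in (t1, t2, t3))

# Equation (159.4.2): r1 + r2 and r3 + r4 are the roots of x^2 + a x + t1.
assert abs((r1 + r2) + (r3 + r4) + a) < 1e-8
assert abs((r1 + r2) * (r3 + r4) - t1) < 1e-8

# Equations (159.4.4)-(159.4.5): recover r1*r2 and r3*r4 from t1, t2, t3, d.
s = t2 + t3 - t1
w = (s * s - 16 * d) ** 0.5
pair = sorted(((s + w) / 4, (s - w) / 4))
assert all(abs(u - v) < 1e-8 for u, v in zip(pair, sorted((r1 * r2, r3 * r4))))
```

The check is agnostic to which pairing of roots is labeled t₁; relabeling the rᵢ permutes {t₁, t₂, t₃} accordingly.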


159.5

cubic formula

The three roots r₁, r₂, r₃ of a cubic polynomial equation x³ + ax² + bx + c = 0 are given by

r₁ = −a/3 − ∛2·(−a² + 3b)/(3C) + C/(3·∛2)

r₂ = −a/3 + (1 + i√3)(−a² + 3b)/(3·∛4·C) − (1 − i√3)·C/(6·∛2)

r₃ = −a/3 + (1 − i√3)(−a² + 3b)/(3·∛4·C) − (1 + i√3)·C/(6·∛2)

where

C = ∛(−2a³ + 9ab − 27c + √(4(−a² + 3b)³ + (−2a³ + 9ab − 27c)²)),

the square and cube roots denoting a fixed (but arbitrary) choice made consistently in all three formulas, with the cube root chosen so that C ≠ 0.

Version: 4 Owner: djao Author(s): djao
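The formula transcribes directly into Python using complex arithmetic. This is an illustrative sketch (the helper name `cubic_roots` is ours), valid in the generic case where the chosen cube root C is nonzero:

```python
import cmath

def cubic_roots(a, b, c):
    """Roots of x^3 + a x^2 + b x + c = 0 via the formula above.

    A sketch for the generic case; assumes the chosen cube root C != 0."""
    X = -2 * a**3 + 9 * a * b - 27 * c
    R = cmath.sqrt(4 * (-a**2 + 3 * b) ** 3 + X**2)
    C = (X + R) ** (1 / 3)              # a fixed principal cube root
    if abs(C) < 1e-12:                  # degenerate choice: take the other sign
        C = (X - R) ** (1 / 3)
    p = -a**2 + 3 * b
    s3 = cmath.sqrt(-3)                 # i*sqrt(3)
    cb2, cb4 = 2 ** (1 / 3), 2 ** (2 / 3)
    r1 = -a / 3 - cb2 * p / (3 * C) + C / (3 * cb2)
    r2 = -a / 3 + (1 + s3) * p / (3 * cb4 * C) - (1 - s3) * C / (6 * cb2)
    r3 = -a / 3 + (1 - s3) * p / (3 * cb4 * C) - (1 + s3) * C / (6 * cb2)
    return r1, r2, r3

# x^3 - 6x - 9 = (x - 3)(x^2 + 3x + 3): one real root x = 3.
for r in cubic_roots(0, -6, -9):
    assert abs(r**3 - 6 * r - 9) < 1e-6
```

One can verify that the result does not depend on the branch of cube root, since the two Cardano terms are tied together through C.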

159.6

derivation of quadratic formula

Suppose A, B, C are real numbers, with A ≠ 0, and suppose

Ax² + Bx + C = 0.

Since A is nonzero, we can divide by A and obtain the equation

x² + bx + c = 0,

where b = B/A and c = C/A. This equation can be written as

x² + bx + b²/4 − b²/4 + c = 0,

so completing the square, i.e., applying the identity (p + q)² = p² + 2pq + q², yields

(x + b/2)² = b²/4 − c.

Then, taking the square root of both sides, and solving for x, we obtain the solution formula

x = −b/2 ± √(b²/4 − c)
  = −B/(2A) ± √(B²/(4A²) − C/A)
  = (−B ± √(B² − 4AC))/(2A).
Version: 4 Owner: mathcam Author(s): matte, fiziko

159.7

quadratic formula

The roots of the quadratic equation


ax² + bx + c = 0,    a, b, c ∈ ℝ, a ≠ 0

are given by the following formula:

x = (−b ± √(b² − 4ac))/(2a).

The number Δ = b² − 4ac is called the discriminant of the equation. If Δ > 0, there are two different real roots; if Δ = 0, there is a single real root (counted twice); and if Δ < 0, there are no real roots (but two different complex roots).

Let's work a few examples.

First, consider 2x² − 14x + 24 = 0. Here a = 2, b = −14, c = 24. Substituting in the formula gives us

x = (14 ± √((−14)² − 4·2·24))/(2·2) = (14 ± √4)/4 = (14 ± 2)/4.

So we have two solutions (depending on whether you take the sign + or −): x = 16/4 = 4 and x = 12/4 = 3.

Now we will solve x² − x − 1 = 0. Here a = 1, b = −1, c = −1, so

x = (1 ± √((−1)² − 4(1)(−1)))/2 = (1 ± √5)/2,

so the solutions are x = (1 + √5)/2 and x = (1 − √5)/2.

Version: 5 Owner: drini Author(s): drini
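The two worked examples can be reproduced mechanically. A short illustrative sketch (the helper name `quadratic_roots` is ours; complex arithmetic handles the Δ < 0 case for free):

```python
import cmath

def quadratic_roots(a, b, c):
    # Roots of a x^2 + b x + c = 0 (a != 0) by the quadratic formula.
    sq = cmath.sqrt(b * b - 4 * a * c)
    return (-b + sq) / (2 * a), (-b - sq) / (2 * a)

# The two examples from the text:
assert quadratic_roots(2, -14, 24) == (4, 3)
x1, x2 = quadratic_roots(1, -1, -1)
assert abs(x1 - (1 + 5 ** 0.5) / 2) < 1e-12
assert abs(x2 - (1 - 5 ** 0.5) / 2) < 1e-12
```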



159.8

quartic formula

The four roots r₁, r₂, r₃, r₄ of a quartic polynomial equation x⁴ + ax³ + bx² + cx + d = 0 are given by

r₁ = −a/4 − S − (1/2)√(−4S² − 2p + q/S)
r₂ = −a/4 − S + (1/2)√(−4S² − 2p + q/S)
r₃ = −a/4 + S − (1/2)√(−4S² − 2p − q/S)
r₄ = −a/4 + S + (1/2)√(−4S² − 2p − q/S)

where

p = b − 3a²/8,
q = c − ab/2 + a³/8,
S = (1/2)√(−2p/3 + (Q + Δ₀/Q)/3),
Q = ((Δ₁ + √(Δ₁² − 4Δ₀³))/2)^{1/3},
Δ₀ = b² − 3ac + 12d,
Δ₁ = 2b³ − 9abc + 27c² + 27a²d − 72bd,

with the cube root in Q chosen (if necessary) so that S ≠ 0.
Version: 2 Owner: djao Author(s): djao

159.9

reciprocal polynomial

Definition [1] Let p : ℂ → ℂ be a polynomial of degree n with complex (or real) coefficients. Then p is a reciprocal polynomial if

p(z) = zⁿ p(1/z)

for all z ∈ ℂ, z ≠ 0.
It is clear that if z is a zero for a reciprocal polynomial, then 1/z is also a zero. This property
motivates the name.
Examples of matrices whose characteristic polynomial is reciprocal include:

1. orthogonal matrices,
2. involution matrices,
3. the Pascal matrices [2].

REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. N.J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM, 2002.

Version: 3 Owner: matte Author(s): matte
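For example (1), the characteristic polynomial of a 2×2 rotation matrix, which is orthogonal, is x² − 2cos(θ)x + 1, and the defining relation can be checked numerically. A self-contained illustrative sketch:

```python
import math

# Characteristic polynomial of a 2x2 rotation matrix (an orthogonal matrix):
# det(xI - Q) = x^2 - 2 cos(theta) x + 1.  We check p(z) = z^n p(1/z).
theta = 0.7
coeffs = [1.0, -2.0 * math.cos(theta), 1.0]   # highest degree first
n = len(coeffs) - 1

def p(z):
    return sum(c * z ** (n - k) for k, c in enumerate(coeffs))

for z in (0.5 + 0.25j, 2.0, -1.3 + 1.0j):
    assert abs(p(z) - z ** n * p(1 / z)) < 1e-9
```

Note that the zeros here are e^{iθ} and e^{−iθ}, a reciprocal pair, as the definition predicts.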

159.10

root

Suppose you're given a function f(x), where x is an independent variable. Then a root of f is a number a that is a solution of the equation f(x) = 0 (that is, substituting a for x into f gives 0 as a result).

Example. If f(x) = x² − 4, then x = 2 is a root, since f(2) = 2² − 4 = 0.

Graphically, a root of f is a value where the graph of the function intersects the x-axis.

Of course the definition can be generalized to other kinds of functions. The domain need not be ℝ, nor the codomain. As long as the codomain has some kind of 0 element, a root will be an element of the domain belonging to the preimage of the 0. The function f : ℝ → ℝ given by f(x) = x² + 1 has no roots, but the function f : ℂ → ℂ given by f(x) = x² + 1 has i as a root.

In the special case of polynomials, there are general formulas for finding roots of polynomials with degree up to 4: the quadratic formula, the cubic formula and the quartic formula.

If we have a root a of a polynomial f(x), we can divide f(x) by x − a (either by polynomial long division or synthetic division) and we are left with a polynomial of smaller degree whose roots are the other roots of f. We can use that result together with the rational root theorem to find a rational root, if one exists, and then obtain a polynomial of smaller degree whose remaining roots we can possibly find easily.

Considering the general case of functions y = f(x) (not necessarily polynomials), there are several numerical methods (like Newton's method) to approximate roots. This can be handy too for polynomials whose roots are not rational numbers.
Version: 6 Owner: mathcam Author(s): mathcam, drini
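Newton's method, mentioned above, iterates x ← x − f(x)/f′(x). A minimal illustrative sketch (the helper name `newton_root` is ours, and convergence is assumed, not guaranteed, for a given starting point):

```python
def newton_root(f, df, x0, tol=1e-12, max_iter=100):
    """Newton's method: iterate x <- x - f(x)/f'(x).
    A sketch; assumes df(x) != 0 along the way and a good starting point."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Approximate the irrational root sqrt(2) of f(x) = x^2 - 2.
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
assert abs(root - 2 ** 0.5) < 1e-10
```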


159.11

variant of Cardanos derivation

By a linear change of variable, a cubic polynomial over ℂ can be given the form x³ + 3bx + c. To find the zeros of this cubic in the form of surds in b and c, make the substitution x = y^{1/3} + z^{1/3}, thus replacing one unknown with two, and then write down identities which are suggested by the resulting equation in two unknowns. Specifically, we get

y + 3(y^{1/3} + z^{1/3})y^{1/3}z^{1/3} + z + 3b(y^{1/3} + z^{1/3}) + c = 0.    (159.11.1)

This will be true if

y + z + c = 0    (159.11.2)
3y^{1/3}z^{1/3} + 3b = 0,    (159.11.3)

which in turn requires

yz = −b³.    (159.11.4)

The pair of equations (159.11.2) and (159.11.4) is a quadratic system in y and z, readily solved. But notice that (159.11.3) puts a restriction on a certain choice of cube roots.
Version: 4 Owner: mathcam Author(s): mathcam, Larry Hammick
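A worked numerical instance of this derivation (our illustration, using the classical identity ∛(2+√5) + ∛(2−√5) = 1): take b = 1, c = −4, so the cubic is x³ + 3x − 4, which has the real root x = 1.

```python
import math

# x^3 + 3bx + c with b = 1, c = -4 has the real root x = 1.
b, c = 1.0, -4.0

# Equations (159.11.2) and (159.11.4): y + z = -c and yz = -b^3,
# a quadratic system solved by the quadratic formula.
s, prod = -c, -b**3
w = math.sqrt(s * s - 4 * prod)
y, z = (s + w) / 2, (s - w) / 2          # y, z = 2 + sqrt(5), 2 - sqrt(5)

# Real cube roots, which here satisfy the restriction (159.11.3):
# y^(1/3) * z^(1/3) = -b.
cbrt = lambda t: math.copysign(abs(t) ** (1 / 3), t)
x = cbrt(y) + cbrt(z)
assert abs(cbrt(y) * cbrt(z) + b) < 1e-9
assert abs(x - 1.0) < 1e-9
```

In the case of three real roots, y and z come out complex and the cube roots must be paired so that their product is −b, which is exactly the restriction the text points out.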


Chapter 160
12D99 Miscellaneous
160.1

Archimedean property

Let x be any real number. Then there exists a natural number n such that n > x.
This theorem is known as the Archimedean property of real numbers. It is also sometimes called the axiom of Archimedes, although this name is doubly deceptive: it is neither
an axiom (it is rather a consequence of the least upper bound property) nor attributed to
Archimedes (in fact, Archimedes credits it to Eudoxus).
Let x be a real number, and let S = {a ∈ ℕ : a ≤ x}. If S is empty, let n = 1; note that x < n (otherwise 1 ∈ S).
Assume S is nonempty. Since S has an upper bound, S must have a least upper bound; call it b. Now consider b − 1. Since b is the least upper bound, b − 1 cannot be an upper bound of S; therefore, there exists some y ∈ S such that y > b − 1. Let n = y + 1; then n > b. But y is a natural, so n must also be a natural. Since n > b, we know n ∉ S; since n ∉ S, we know n > x. Thus we have a natural greater than x.
Corollary 4. If x and y are real numbers with x > 0, there exists a natural n such that nx > y.

Since x and y are reals, and x ≠ 0, y/x is a real. By the Archimedean property, we can choose an n ∈ ℕ such that n > y/x. Then nx > y.

Corollary 5. If w is a real number greater than 0, there exists a natural n such that 0 < 1/n < w.

Using Corollary 4, choose n ∈ ℕ satisfying nw > 1. Then 0 < 1/n < w. QED

Corollary 6. If x and y are real numbers with x < y, there exists a rational number a such
that x < a < y.

First examine the case where 0 ≤ x. Using Corollary 5, find a natural n satisfying 0 < 1/n < (y − x). Let S = {m ∈ ℕ : m/n ≥ y}. By Corollary 4, S is non-empty, so let m₀ be the least element of S and let a = (m₀ − 1)/n. Then a < y. Furthermore, since y ≤ m₀/n, we have y − 1/n ≤ a; and x < y − 1/n ≤ a. Thus a satisfies x < a < y.

Now examine the case where x < 0 < y. Take a = 0.

Finally consider the case where x < y ≤ 0. Using the first case, let b be a rational satisfying −y < b < −x. Then let a = −b.
Version: 1 Owner: vampyr Author(s): vampyr
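The first case of Corollary 6 is entirely constructive and can be run in exact rational arithmetic. The helper name `rational_between` below is our own illustrative sketch of that construction:

```python
import math
from fractions import Fraction

def rational_between(x, y):
    """Corollary 6's construction, first case only (0 <= x < y):
    pick n with 1/n < y - x, then a = (m0 - 1)/n where m0 is the least
    natural with m0/n >= y."""
    assert 0 <= x < y
    n = math.floor(1 / (y - x)) + 1     # Corollary 5: 1/n < y - x
    m0 = math.ceil(y * n)               # least m with m/n >= y
    return Fraction(m0 - 1, n)

x, y = Fraction(3, 10), Fraction(30002, 100000)
a = rational_between(x, y)
assert x < a < y
```

Using `Fraction` inputs keeps the floor and ceiling computations exact, which matters when y − x is tiny.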

160.2

complex

There are some polynomial equations that don't have (real) solutions. Examples of these are x² + 5 = 0 and x² + x + 1 = 0. Mathematically we express this by saying that ℝ is not an algebraically closed field.

In order to solve that kind of equation, we have to extend our number system by adding a number i that has the property that i² = −1. In this way we extend the field of real numbers ℝ to a field ℂ whose elements are called complex numbers. The formal construction can be seen at [complex numbers]. The field ℂ is algebraically closed: every polynomial with complex coefficients, and therefore every polynomial with real coefficients, has at least one complex root (which might be real as well).

Any complex number can be written as z = x + iy (with x, y ∈ ℝ). Here we call x the real part of z and y the imaginary part of z. We write this as

x = Re(z),    y = Im(z).

Real numbers are a subset of complex numbers, and a real number r can also be written as r + i0. Thus, a complex number is real if and only if its imaginary part is equal to zero.

By writing x + iy as (x, y) we can also look at complex numbers as ordered pairs. With this notation, real numbers are the pairs of the form (r, 0).

The rules of addition and multiplication for complex numbers are:

(a + ib) + (x + iy) = (a + x) + i(b + y),    (a, b) + (x, y) = (a + x, b + y)
(a + ib)(x + iy) = (ax − by) + i(ay + bx),   (a, b)(x, y) = (ax − by, ay + bx)

(to see why the last identity holds, expand the first product and then simplify by using i² = −1).

We also have negatives, −(a, b) = (−a, −b), and multiplicative inverses:

(a, b)⁻¹ = (a/(a² + b²), −b/(a² + b²)).

Seeing complex numbers as ordered pairs also lets us give ℂ the structure of a vector space (over ℝ). The norm of z = x + iy is defined as

|z| = √(x² + y²).

Then we have |z|² = zz̄, where z̄ is the conjugate of z = x + iy, defined as z̄ = x − iy. Thus we can also characterize real numbers as those complex numbers z such that z = z̄. Conjugation obeys the following rules: the conjugate of z₁ + z₂ is z̄₁ + z̄₂, the conjugate of z₁z₂ is z̄₁·z̄₂, and the conjugate of z̄ is z itself.

The ordered-pair notation lets us visualize complex numbers as points in the plane, but then we can also describe complex numbers with polar coordinates.

If z = a + ib is represented in polar coordinates as (r, t), we call r the modulus of z and t its argument. Then a = r cos t and b = r sin t. So we have the following expression, called the polar form of z:

z = a + ib = r(cos t + i sin t).

Multiplication of complex numbers can be done in a very neat way using polar coordinates:

(r₁, t₁)(r₂, t₂) = (r₁r₂, t₁ + t₂).

The latter expression proves de Moivre's theorem.
Version: 26 Owner: drini Author(s): drini, mathwizard
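Both multiplication rules, rectangular and polar, can be checked with Python's built-in complex type and the standard `cmath` module (an editor's illustration):

```python
import cmath

u, v = 3 + 4j, 1 - 2j

# Rectangular rule: (a + ib)(x + iy) = (ax - by) + i(ay + bx).
a, b = u.real, u.imag
x, y = v.real, v.imag
assert u * v == complex(a * x - b * y, a * y + b * x)

# Polar rule: moduli multiply and arguments add.
r1, t1 = cmath.polar(u)
r2, t2 = cmath.polar(v)
assert abs(cmath.rect(r1 * r2, t1 + t2) - u * v) < 1e-9
```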

160.3

complex conjugate

160.3.1

Definition

Scalar Complex Conjugate


Let z be a complex number with real part a and imaginary part b,

z = a + bi.

Then the complex conjugate of z is

z̄ = a − bi.
Complex conjugation represents a reflection about the real axis on the Argand diagram
representing a complex number.
Sometimes a star (∗) is used instead of an overline, e.g. in physics you might see

∫ ψ∗ψ dx = 1,

where ψ∗ is the complex conjugate of a wave function ψ.

160.3.2

Matrix Complex Conjugate

Let A = (aᵢⱼ) be an n × m matrix with complex entries. Then the complex conjugate of A is the matrix Ā = (āᵢⱼ). In particular, if v = (v₁, …, vₙ) is a complex row/column vector, then v̄ = (v̄₁, …, v̄ₙ).
Hence, the matrix complex conjugate is what we would expect: the same matrix with all of
its scalar components conjugated.

160.3.3

Properties of the Complex Conjugate

Scalar Properties
If u, v are complex numbers, then

1. The conjugate of uv is ū·v̄.
2. The conjugate of u + v is ū + v̄.
3. The conjugate of u⁻¹ is (ū)⁻¹.
4. The conjugate of ū is u.
5. If v ≠ 0, then the conjugate of u/v is ū/v̄.
6. Let u = a + bi. Then ūu = uū = a² + b² ≥ 0 (the square of the complex modulus).
7. If u is written in polar form as u = re^{iθ}, then ū = re^{−iθ}.

160.3.4

Matrix and Vector Properties

Let A be a matrix with complex entries, and let v be a complex row/column vector.
Then
1. The conjugate of Aᵀ is (Ā)ᵀ.
2. The conjugate of Av is Ā·v̄, and the conjugate of vA is v̄·Ā. (Here we assume that A and v are of compatible sizes.)

Now assume further that A is a complex square matrix. Then

1. trace(Ā) is the conjugate of trace(A).
2. det(Ā) is the conjugate of det(A).
3. The conjugate of A⁻¹ is (Ā)⁻¹.
Version: 6 Owner: akrowne Author(s): akrowne
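Several of these rules can be checked with the standard library alone; matrices are represented as nested lists and conjugated entrywise (an illustrative sketch):

```python
# Scalar conjugation rules.
u, v = 2 + 3j, -1 + 4j
conj = lambda z: z.conjugate()

assert conj(u * v) == conj(u) * conj(v)                   # conjugate of a product
assert conj(u + v) == conj(u) + conj(v)                   # conjugate of a sum
assert u * conj(u) == complex(u.real**2 + u.imag**2, 0)   # = a^2 + b^2

# Entrywise matrix conjugation, with the matrix as a list of rows.
A = [[1 + 1j, 2 + 0j], [0 + 0j, 3 - 2j]]
A_bar = [[conj(z) for z in row] for row in A]
assert A_bar == [[1 - 1j, 2 + 0j], [0 + 0j, 3 + 2j]]
```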

160.4

complex number

The ring of complex numbers ℂ is defined to be the quotient ring of the polynomial ring ℝ[X] in one variable over the reals by the principal ideal (X² + 1). For a, b ∈ ℝ, the equivalence class of a + bX in ℂ is usually denoted a + bi, and one has i² = −1.

The complex numbers form an algebraically closed field. There is a standard metric on the complex numbers, defined by

d(a₁ + b₁i, a₂ + b₂i) := √((a₂ − a₁)² + (b₂ − b₁)²).
Version: 4 Owner: djao Author(s): djao


160.5

examples of totally real fields

Here we present examples of totally real fields, totally imaginary fields and CM-fields.
Examples:

1. Let K = ℚ(√d) with d a square-free positive integer. Then

Σ_K = {Id_K, σ}

where Id_K : K ↪ ℂ is the identity map (Id_K(k) = k, for all k ∈ K), whereas

σ : K ↪ ℂ,  σ(a + b√d) = a − b√d.

Since √d ∈ ℝ, it follows that K is a totally real field.

2. Similarly, let K = ℚ(√d) with d a square-free negative integer. Then

Σ_K = {Id_K, σ}

where Id_K : K ↪ ℂ is the identity map, whereas

σ : K ↪ ℂ,  σ(a + b√d) = a − b√d.

Since √d ∈ ℂ and it is not in ℝ, it follows that K is a totally imaginary field.

3. Let ζₙ, n ≥ 3, be a primitive nth root of unity and let L = ℚ(ζₙ), a cyclotomic extension. Note that the only roots of unity that are real are ±1. If ψ : L ↪ ℂ is an embedding, then ψ(ζₙ) must be a conjugate of ζₙ, i.e. one of

{ζₙᵃ | a ∈ (ℤ/nℤ)*}

but those are all imaginary. Thus ψ(L) ⊄ ℝ. Hence L is a totally imaginary field.

4. In fact, L as in (3) is a CM-field. Indeed, the maximal real subfield of L is

F = ℚ(ζₙ + ζₙ⁻¹).

Notice that the minimal polynomial of ζₙ over F is

X² − (ζₙ + ζₙ⁻¹)X + 1

so we obtain L from F by adjoining the square root of the discriminant of this polynomial, which is

ζₙ² + ζₙ⁻² − 2 = 2 cos(4π/n) − 2 < 0

and any other conjugate is

ζₙ²ᵃ + ζₙ⁻²ᵃ − 2 = 2 cos(4πa/n) − 2 < 0,  a ∈ (ℤ/nℤ)*.

Hence, L is a CM-field.
5. Notice that any quadratic imaginary number field is obviously a CM-field.
Version: 3 Owner: alozano Author(s): alozano

160.6

fundamental theorem of algebra

Let f : ℂ → ℂ be a non-constant polynomial. Then there is z ∈ ℂ with f(z) = 0.

In other words, ℂ is algebraically closed.
Version: 2 Owner: Evandar Author(s): Evandar

160.7

imaginary

A complex number c ∈ ℂ is called imaginary if its real part is 0.

All complex numbers may be written as c = a + bi, where i is the imaginary unit i = √(−1) and a, b ∈ ℝ. An imaginary number can be written as c = bi, and because of this it is sometimes called a pure imaginary number.
The imaginary numbers are closed under addition but not under multiplication.
Version: 3 Owner: drummond Author(s): drummond

160.8

imaginary unit

The imaginary unit is i := √(−1). Any imaginary number m may be written as m = bi, b ∈ ℝ. Any complex number c ∈ ℂ may be written as c = a + bi, a, b ∈ ℝ.

Note that there are two complex square roots of −1 (i.e. the two solutions to the equation x² + 1 = 0 in ℂ), so there is always some ambiguity in which of these we choose to call i and which we call −i, though this has little bearing on any applications of complex numbers.
Version: 5 Owner: mathcam Author(s): mathcam, drummond

160.9

indeterminate form

The expression

0/0

is known as the indeterminate form. The motivation for this name is that there are no rules for comparing the value of 0/0 to the other real numbers. Note that, for example, 1/0 is not indeterminate, since we can justifiably associate it with +∞, which does compare with the rest of the real numbers (in particular, it is defined to be greater than all of them.)

Although 0/0 is called the indeterminate form, another indeterminate form is ∞/∞, for the same motivating reasons.

Version: 2 Owner: akrowne Author(s): akrowne

160.10

inequalities for real numbers

Suppose a and b are real numbers. Then we have four types of inequalities between a and b:

1. The inequality a < b means that a − b is negative.
2. The inequality a > b means that a − b is positive.
3. The inequality a ≤ b means that a − b is non-positive.
4. The inequality a ≥ b means that a − b is non-negative.

The first two inequalities are also called strict inequalities.

Properties

Suppose a and b are real numbers.

1. If a > b, then −a < −b. If a < b, then −a > −b.
2. If a ≥ b, then −a ≤ −b. If a ≤ b, then −a ≥ −b.
3. Suppose a₀, a₁, … is a sequence of real numbers converging to a, and suppose that either aᵢ < b or aᵢ ≤ b for some real number b for each i. Then a ≤ b.


Examples
1. The triangle inequality. If a, b, c are real numbers, then


||a − c| − |b − c|| ≤ |a − b| ≤ |a − c| + |b − c|.

2. Jordan's inequality
3. Young's inequality
4. Bernoulli's inequality
5. Nesbitt's inequality
6. Shapiro inequality

Inequalities for sequences

1. Chebyshev's inequality
2. MacLaurin's inequality
3. Carleman's inequality
4. arithmetic-geometric-harmonic means inequality and the general means inequality
5. Jensen's inequality
6. Minkowski inequality
7. rearrangement inequality

Geometric inequalities

1. Hadwiger-Finsler inequality
2. Weitzenböck's inequality
3. Brunn-Minkowski inequality

Matrix inequalities

1. Schur's inequality
Version: 4 Owner: mathcam Author(s): matte, mathcam


160.11

interval

Loosely speaking, an interval is a part of the real numbers that starts at one number and stops at another number. For instance, all numbers greater than 1 and smaller than 2 form an interval. Another interval is formed by numbers greater than or equal to 1 and smaller than 2. Thus, when talking about intervals, it is necessary to specify whether the endpoints are part of the interval or not. There are then four types of intervals with three different names: open, closed and half-open. Let us next define these precisely.

1. The open interval contains neither of the endpoints. If a < b are real numbers, then the open interval of numbers between a and b is written as (a, b) and

(a, b) = {x ∈ ℝ | a < x < b}.

2. The closed interval contains both endpoints. If a < b are real numbers, then the closed interval is written as [a, b] and

[a, b] = {x ∈ ℝ | a ≤ x ≤ b}.

3. A half-open interval contains only one of the endpoints. If a < b are real numbers, the half-open intervals (a, b] and [a, b) are defined as

(a, b] = {x ∈ ℝ | a < x ≤ b},
[a, b) = {x ∈ ℝ | a ≤ x < b}.

Infinite intervals

If we allow either (or both) of a and b to be infinite, then we define

(a, ∞) = {x ∈ ℝ | x > a},
[a, ∞) = {x ∈ ℝ | x ≥ a},
(−∞, a) = {x ∈ ℝ | x < a},
(−∞, a] = {x ∈ ℝ | x ≤ a},
(−∞, ∞) = ℝ.
Note on naming and notation


In [1, 2], an open interval is always called a segment, and a closed interval is called simply
an interval. However, the above naming with open, closed, and half-open interval seems to
be more widely adopted. See e.g. [3, 4, 2]. To distinguish between [a, b) and (a, b], the former
is sometimes called a right half-open interval and the latter a left half-open interval
[6]. The notation (a, b), [a, b), (a, b], [a, b] seems to be standard. However, some authors
(especially from the French school) use the notation ]a, b[, [a, b[, ]a, b], [a, b] as opposed to
(a, b), [a, b), (a, b], [a, b].

REFERENCES
1. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Inc., 1976.
2. W. Rudin, Real and Complex Analysis, 3rd ed., McGraw-Hill Inc., 1987.
3. R. Adams, Calculus, a Complete Course, 3rd ed., Addison-Wesley Publishers Ltd., 1995.
4. L. Råde, B. Westergren, Mathematics Handbook for Science and Engineering, Studentlitteratur, 1995.
5. R.A. Silverman, Introductory Complex Analysis, Dover Publications, 1972.
6. S. Igari, Real Analysis - With an Introduction to Wavelet Theory, American Mathematical Society, 1998.

Version: 2 Owner: mathcam Author(s): matte

160.12

modulus of complex number

Definition. Let z be a complex number, and let z̄ be the complex conjugate of z. Then the modulus, or absolute value, of z is defined as [1]

|z| = √(zz̄).

If we write z in polar form as z = re^{iθ} with r ≥ 0, θ ∈ [0, 2π), then |z| = r. It follows that the modulus is a positive real number or zero. Alternatively, if a and b are the real respectively imaginary parts of z, then

|z| = √(a² + b²),    (160.12.1)

which is simply the Euclidean norm of the point (a, b) ∈ ℝ². It follows that the modulus satisfies the triangle inequality, i.e., property 2 below. Other properties of the modulus are as follows [1, 2]: If u, v ∈ ℂ, then

1. |u| ≥ 0, with |u| = 0 if and only if u = 0.
2. |u + v| ≤ |u| + |v|.
3. |uv| = |u||v|.
4. For any n = 1, 2, …, we have |uⁿ| = |u|ⁿ.
5. If v ≠ 0, then |u/v| = |u|/|v|.
6. |ū| = |u|.
7. |u| is a strictly increasing function of |Re u| and |Im u|.

Properties 3 and 6 follow by writing u and v in polar form (see e.g. [2]). Property 5 follows from property 3 and the identity 1/u = ū/|u|². Indeed,

|u/v| = |u·v̄/|v|²| = |u||v̄/|v|²| = |u|/|v|.

REFERENCES
1. E. Kreyszig, Advanced Engineering Mathematics, 7th ed., John Wiley & Sons, 1993.
2. E. Weisstein, Eric W. Weisstein's World of Mathematics, entry on "Complex Modulus".

Version: 8 Owner: matte Author(s): matte

160.13

proof of fundamental theorem of algebra

If f(x) ∈ ℂ[x], let a be a root of f(x) in some extension of ℂ. Let K be a Galois closure of ℂ(a) over ℝ and set G = Gal(K/ℝ). Let H be a Sylow 2-subgroup of G and let L = K^H (the fixed field of H in K). By the Fundamental Theorem of Galois Theory we have [L : ℝ] = [G : H], an odd number. We may write L = ℝ(b) for some b ∈ L, so the minimal polynomial m_{b,ℝ}(x) is irreducible over ℝ and of odd degree. That degree must be 1, and hence L = ℝ, which means that G = H, a 2-group. Thus G₁ = Gal(K/ℂ) is also a 2-group. If G₁ ≠ 1, choose a subgroup G₂ ⊆ G₁ such that [G₁ : G₂] = 2, and set M = K^{G₂}, so that [M : ℂ] = [G₁ : G₂] = 2. But any polynomial of degree 2 over ℂ has roots in ℂ by the quadratic formula, so such a field M cannot exist. This contradiction shows that G₁ = 1. Hence K = ℂ and a ∈ ℂ, completing the proof.
Version: 1 Owner: scanez Author(s): scanez

160.14

proof of the fundamental theorem of algebra

Let f : ℂ → ℂ be a polynomial, and suppose f has no root in ℂ. We will show f is constant.

Let g = 1/f. Since f is never zero, g is defined and holomorphic on ℂ (i.e., it is entire). Moreover, since f is a polynomial, |f(z)| → ∞ as |z| → ∞, and so |g(z)| → 0 as |z| → ∞. Then there is some M such that |g(z)| < 1 whenever |z| > M, and g is continuous and so bounded on the compact set {z ∈ ℂ : |z| ≤ M}.

So g is bounded and entire, and therefore by Liouville's theorem g is constant. So f is constant as required.
Version: 1 Owner: Evandar Author(s): Evandar

160.15

real and complex embeddings

Let L be a subfield of C.

Definition 9.
1. A real embedding of L is an injective field homomorphism

σ : L ↪ ℝ.

2. A (non-real) complex embedding of L is an injective field homomorphism

τ : L ↪ ℂ

such that τ(L) ⊄ ℝ.

3. We denote by Σ_L the set of all embeddings, real and complex, of L in ℂ (note that all of them must fix ℚ, since they are field homomorphisms).

Note that if σ is a real embedding then σ̄ = σ, where ¯ denotes the complex conjugation automorphism:

¯ : ℂ → ℂ,  a + bi ↦ a − bi.

On the other hand, if τ is a complex embedding, then τ̄ is another complex embedding, so the complex embeddings always come in pairs {τ, τ̄}.

Let K ⊆ L be another subfield of ℂ. Moreover, assume that [L : K] is finite (this is the dimension of L as a vector space over K). We are interested in the embeddings of L that fix K pointwise, i.e. embeddings ψ : L ↪ ℂ such that

ψ(k) = k,  ∀k ∈ K.

Theorem 15. For any embedding ψ of K in ℂ, there are exactly [L : K] embeddings of L which extend ψ. In other words, if σ is one of them, then

σ(k) = ψ(k),  ∀k ∈ K.

Thus, by taking ψ = Id_K, there are exactly [L : K] embeddings of L which fix K pointwise. Hence, by the theorem, we know that the order of Σ_L is [L : ℚ]. The number [L : ℚ] is usually decomposed as

[L : ℚ] = r₁ + 2r₂

where r₁ is the number of embeddings which are real, and 2r₂ is the number of embeddings which are complex (non-real). Notice that by the remark above this latter number is always even, so r₂ is an integer.

Remark: Let σ be an embedding of L in ℂ. Since σ is injective, we have σ(L) ≅ L, so we can regard σ as an automorphism of L. When L/ℚ is a Galois extension, we can prove that Σ_L ≅ Gal(L/ℚ), and hence prove in a different way the fact that

|Σ_L| = [L : ℚ] = |Gal(L/ℚ)|.
Version: 1 Owner: alozano Author(s): alozano

160.16

real number

There are several equivalent definitions of real number, all in common use. We give one
definition in detail and mention the other ones.
A Cauchy sequence of rational numbers is a sequence {xᵢ}, i = 0, 1, 2, …, of rational numbers with the property that, for every rational number ε > 0, there exists a natural number N such that, for all natural numbers n, m > N, the absolute value |xₙ − xₘ| satisfies |xₙ − xₘ| < ε.

The set ℝ of real numbers is the set of equivalence classes of Cauchy sequences of rational numbers, under the equivalence relation {xᵢ} ~ {yᵢ} if the interleave sequence of the two sequences is itself a Cauchy sequence. The real numbers form a ring, with addition and multiplication defined by

{xᵢ} + {yᵢ} = {(xᵢ + yᵢ)}
{xᵢ} · {yᵢ} = {(xᵢ · yᵢ)}

There is an ordering relation on ℝ, defined by {xᵢ} ≤ {yᵢ} if either {xᵢ} ~ {yᵢ} or there exists a natural number N such that xₙ < yₙ for all n > N.

One can prove that the real numbers form an ordered field and that they satisfy the least upper bound property: for every nonempty subset S ⊆ ℝ, if S has an upper bound then S has a least upper bound. It is also true that every ordered field with the least upper bound property is isomorphic to ℝ.

Alternative definitions of the set of real numbers include:

1. Equivalence classes of decimal sequences (sequences consisting of natural numbers between 0 and 9, and a single decimal point), where two decimal sequences are equivalent if they are identical, or if one has an infinite tail of 9s, the other has an infinite tail of 0s, and the leading portion of the first sequence is one lower than the leading portion of the second.

2. Dedekind cuts of rational numbers (that is, subsets S of ℚ with the property that, if a ∈ S and b < a, then b ∈ S).

3. The real numbers can also be defined as the unique (up to isomorphism) ordered field satisfying the least upper bound property.

The real numbers are often described as the unique (up to isomorphism) complete ordered field. While this fact is true, care must be taken when using it for the definition, because the standard definition of complete is logically dependent on the notion of real number.
Version: 11 Owner: djao Author(s): djao
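The Cauchy-sequence definition can be made concrete in exact rational arithmetic. The sketch below (our illustration, using Python's `fractions.Fraction`) builds a Cauchy sequence of rationals whose equivalence class is the irrational real number √2:

```python
from fractions import Fraction

# The Babylonian iteration x -> (x + 2/x)/2, run in exact rational
# arithmetic, produces a Cauchy sequence of rationals converging to sqrt(2).
x = Fraction(1)
seq = [x]
for _ in range(6):
    x = (x + 2 / x) / 2
    seq.append(x)

assert abs(seq[-1] - seq[-2]) < Fraction(1, 10**6)   # consecutive terms cluster
assert abs(seq[-1] ** 2 - 2) < Fraction(1, 10**6)    # squares approach 2
```

Every term is a genuine rational (1, 3/2, 17/12, 577/408, …), yet no rational can be the limit, which is exactly why the equivalence classes enlarge ℚ.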

160.17

totally real and imaginary fields

For this entry, we follow the notation of the entry real and complex embeddings.

Let K be a subfield of the complex numbers ℂ, and let Σ_K be the set of all embeddings of K in ℂ.

Definition 10. With K as above:

1. K is a totally real field if all embeddings ψ ∈ Σ_K are real embeddings.
2. K is a totally imaginary field if all embeddings ψ ∈ Σ_K are (non-real) complex embeddings.
3. K is a CM-field or complex multiplication field if K is a totally imaginary quadratic extension of a totally real field, i.e. K is the extension obtained from a totally real field by adjoining the square root of a number all of whose conjugates are negative.

Note: A complex number α is real if and only if ᾱ, the complex conjugate of α, equals α:

α ∈ ℝ ⟺ α = ᾱ

Thus, a field K which is fixed pointwise by complex conjugation is totally real. Given a field L, the subfield of L fixed pointwise by complex conjugation is called the maximal real subfield of L.
For examples (of (1), (2) and (3)), see examples of totally real fields.
Version: 2 Owner: alozano Author(s): alozano


Chapter 161
12E05 Polynomials (irreducibility,
etc.)
161.1

Gauss's Lemma I

There are a few different things that are sometimes called Gauss's lemma. See also Gauss's Lemma II.

Gauss's Lemma I: If R is a UFD and f(x) and g(x) are both primitive in R[x], then so is f(x)g(x).

Proof: Suppose f(x)g(x) is not primitive. We will show that either f(x) or g(x) is not primitive either. That f(x)g(x) is not primitive means the gcd of the coefficients of f(x)g(x) is not a unit. Let p be a prime factor of that gcd. We consider the image of R mod p, i.e. under the natural ring homomorphism φ : R → R/pR, and extend φ to the polynomial ring R[x] → (R/pR)[x].

Since p is a prime element of R, R/pR is an integral domain, so (R/pR)[x] is an integral domain. And we have

f̄(x) · ḡ(x) = 0,

where f̄(x) is the image of f(x) in (R/pR)[x], and similarly ḡ(x). So f̄(x) = 0 or ḡ(x) = 0. So all the coefficients of f(x) or of g(x) are divisible by p, so one of them is not primitive.
Version: 4 Owner: bshanks Author(s): zugzwang, bshanks
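The lemma can be checked computationally over R = ℤ, where the content of a polynomial is the gcd of its coefficients. The helper names `content` and `poly_mul` below are our own; this is an illustrative sketch, not part of the original entry:

```python
from functools import reduce
from math import gcd

def content(coeffs):
    # gcd of the coefficients; a polynomial over Z is primitive iff this is 1
    return reduce(gcd, coeffs)

def poly_mul(p, q):
    # polynomial product; coefficient lists are in increasing degree
    out = [0] * (len(p) + len(q) - 1)
    for i, u in enumerate(p):
        for j, w in enumerate(q):
            out[i + j] += u * w
    return out

f = [2, 3, 1]       # 2 + 3x + x^2, content 1: primitive
g = [5, 0, 7]       # 5 + 7x^2,     content 1: primitive
assert content(f) == 1 and content(g) == 1
assert content(poly_mul(f, g)) == 1    # Gauss: the product is primitive too
assert content([4, 6, 10]) == 2        # a non-primitive example, for contrast
```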


161.2

Gauss's Lemma II

Definition: A polynomial p(x) = aₙxⁿ + … + a₀ over a UFD R is said to be primitive if its coefficients are not all divisible by any element of R other than a unit.

Proposition (Gauss): Let R be a UFD and F its field of fractions. If a polynomial p ∈ R[x] is reducible in F[x], then it is reducible in R[x].

Proof: We may assume that p is primitive. Suppose p = qr with q, r ∈ F[x]. There are unique elements a, b ∈ F such that q/a and r/b are in R[x] and are primitive. But p/(ab) = (q/a)(r/b). Since p is primitive, it follows from Gauss's Lemma I that ab is a unit, and therefore so are a and b. This completes the proof.

Remark: Another result with the same name is Gauss's lemma on quadratic residues.
Version: 3 Owner: bshanks Author(s): Larry Hammick, pbruin, bshanks

161.3 discriminant

Summary. The discriminant of a given polynomial is a number, calculated from the coefficients of that polynomial, that vanishes if and only if that polynomial has one or more
multiple roots. Using the discriminant we can test for the presence of multiple roots, without
having to actually calculate the roots of the polynomial in question.
Definition. The discriminant of order n ∈ N is the polynomial, denoted here by Δ(n) = Δ(n)(a_1, ..., a_n), characterized by the following relation:

Δ(n)(s_1, s_2, ..., s_n) = ∏_{i=1}^{n} ∏_{j=i+1}^{n} (x_i − x_j)^2,    (161.3.1)

where

s_k = s_k(x_1, ..., x_n),    k = 1, ..., n

is the k-th elementary symmetric polynomial.


The above relation is a defining one, because the right-hand side of (1) is, evidently, a (1)
symmetric (1) polynomial, and because the (1) algebra of symmetric polynomials is freely
generated by the basic symmetric polynomials, i.e. every symmetric polynomial arises in a
unique fashion as a polynomial of s1 , . . . , sn .
Proposition 1. Up to sign, the discriminant is given by the determinant of a (2n − 1) × (2n − 1) square matrix, with columns 1 to n − 1 formed by shifting the sequence 1, a_1, ..., a_n, and columns n to 2n − 1 formed by shifting the sequence n, (n − 1)a_1, ..., a_{n−1}, i.e.

          | 1    0    ...  0        n         0         ...  0        |
          | a_1  1    ...  0        (n−1)a_1  n         ...  0        |
Δ(n) = ± | a_2  a_1  ...  0        (n−2)a_2  (n−1)a_1  ...  0        |
          | :    :         :        :         :              :        |
          | 0    0    ...  a_{n−1}  0         0         ...  2a_{n−2} |
          | 0    0    ...  a_n      0         0         ...  a_{n−1}  |    (161.3.2)

Multiple root test. Let K be a field, let x denote an indeterminate, and let

p = x^n + a_1 x^{n−1} + ... + a_{n−1} x + a_n,    a_i ∈ K

be a monic polynomial over K. We define Δ[p], the discriminant of p, by setting

Δ[p] = Δ(n)(a_1, ..., a_n).

The discriminant of a non-monic polynomial is defined by homogenizing the above definition, i.e. by setting

Δ[ap] = a^{2n−2} Δ[p],    a ∈ K.
Proposition 2. The discriminant vanishes if and only if p has multiple roots in its splitting field.
Proof. It isn't hard to show that a polynomial has multiple roots if and only if that polynomial
and its derivative share a common root. The desired conclusion now follows by observing
that the determinant formula in equation (161.3.2) gives the resolvent of a polynomial and
its derivative. This resolvent vanishes if and only if the polynomial in question has a multiple
root. Q.E.D.

Some Examples. Here are the first few discriminants.

Δ(1) = 1

Δ(2) = a_1^2 − 4 a_2

Δ(3) = 18 a_1 a_2 a_3 + a_1^2 a_2^2 − 4 a_2^3 − 4 a_1^3 a_3 − 27 a_3^2

Δ(4) = a_1^2 a_2^2 a_3^2 − 4 a_2^3 a_3^2 − 4 a_1^3 a_3^3 + 18 a_1 a_2 a_3^3 − 27 a_3^4
       − 4 a_1^2 a_2^3 a_4 + 16 a_2^4 a_4 + 18 a_1^3 a_2 a_3 a_4 − 80 a_1 a_2^2 a_3 a_4
       − 6 a_1^2 a_3^2 a_4 + 144 a_2 a_3^2 a_4 − 27 a_1^4 a_4^2 + 144 a_1^2 a_2 a_4^2
       − 128 a_2^2 a_4^2 − 192 a_1 a_3 a_4^2 + 256 a_4^3
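These formulas can be checked numerically against the defining relation (161.3.1). The sketch below (function names are illustrative) compares Δ(3) computed from the coefficients with the product of squared root differences, using a_k = (−1)^k s_k for a monic polynomial with prescribed roots:

```python
from itertools import combinations

def disc3(a1, a2, a3):
    """Δ(3) for the monic cubic x^3 + a1 x^2 + a2 x + a3 (formula from the text)."""
    return 18*a1*a2*a3 + a1**2*a2**2 - 4*a2**3 - 4*a1**3*a3 - 27*a3**2

def disc_from_roots(roots):
    """Defining relation (161.3.1): product of squared root differences."""
    prod = 1
    for xi, xj in combinations(roots, 2):
        prod *= (xi - xj)**2
    return prod

roots = [1, 2, 4]
# elementary symmetric polynomials of the roots; for p(x) = ∏(x - x_i), a_k = (-1)^k s_k
s1 = sum(roots)                  # 7
s2 = 1*2 + 1*4 + 2*4             # 14
s3 = 1*2*4                       # 8
a1, a2, a3 = -s1, s2, -s3
print(disc3(a1, a2, a3), disc_from_roots(roots))  # both 36
```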


Here is the matrix used to calculate Δ(4):

       | 1    0    0    4     0     0     0    |
       | a_1  1    0    3a_1  4     0     0    |
       | a_2  a_1  1    2a_2  3a_1  4     0    |
Δ(4) = | a_3  a_2  a_1  a_3   2a_2  3a_1  4    |
       | a_4  a_3  a_2  0     a_3   2a_2  3a_1 |
       | 0    a_4  a_3  0     0     a_3   2a_2 |
       | 0    0    a_4  0     0     0     a_3  |

Version: 5 Owner: rmilson Author(s): rmilson

161.4 polynomial ring

Let R be a ring. The polynomial ring over R in one variable X is the set R[X] of all sequences in R with only finitely many nonzero terms. If (a_0, a_1, a_2, a_3, ...) is an element of R[X], with a_n = 0 for all n > N, then we usually write this element as

Σ_{n=0}^{N} a_n X^n = a_0 + a_1 X + a_2 X^2 + a_3 X^3 + ... + a_N X^N

Addition and multiplication in R[X] are defined by

Σ_{n=0}^{N} a_n X^n + Σ_{n=0}^{N} b_n X^n = Σ_{n=0}^{N} (a_n + b_n) X^n    (161.4.1)

(Σ_{n=0}^{N} a_n X^n) · (Σ_{n=0}^{N} b_n X^n) = Σ_{n=0}^{2N} (Σ_{k=0}^{n} a_k b_{n−k}) X^n    (161.4.2)

R[X] is a ring under these operations.


The polynomial ring over R in two variables X, Y is defined to be R[X, Y] := R[X][Y]. In three variables, we have R[X, Y, Z] := R[X, Y][Z] = R[X][Y][Z], and in any finite number of variables, we have inductively R[X_1, X_2, ..., X_n] := R[X_1, ..., X_{n−1}][X_n] = R[X_1][X_2] ... [X_n].
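The sequence definition can be sketched directly in Python, with a polynomial over R = Z stored as a list of coefficients (index n holding a_n); addition and multiplication follow (161.4.1) and (161.4.2):

```python
def poly_add(f, g):
    """Coefficientwise sum, as in (161.4.1)."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_mul(f, g):
    """Convolution product, as in (161.4.2): coefficient of X^n is sum of a_k * b_{n-k}."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return prod

f = [1, 2]       # 1 + 2X
g = [3, 0, 1]    # 3 + X^2
print(poly_add(f, g))   # [4, 2, 1]   i.e. 4 + 2X + X^2
print(poly_mul(f, g))   # [3, 6, 1, 2]   i.e. 3 + 6X + X^2 + 2X^3
```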
Version: 2 Owner: djao Author(s): djao

161.5 resolvent

Summary. The resolvent of two polynomials is a number, calculated from the coefficients
of those polynomials, that vanishes if and only if the two polynomials share a common root.
Conversely, the resolvent is non-zero if and only if the two polynomials are mutually prime.

Definition. Let K be a field and let

p(x) = a_0 x^n + a_1 x^{n−1} + ... + a_n,
q(x) = b_0 x^m + b_1 x^{m−1} + ... + b_m

be two polynomials over K of degree n and m, respectively. We define Res[p, q] ∈ K, the resolvent of p(x) and q(x), to be the determinant of an (n + m) × (n + m) square matrix with columns 1 to m formed by shifted sequences consisting of the coefficients of p(x), and columns m + 1 to n + m formed by shifted sequences consisting of the coefficients of q(x), i.e.

            | a_0  0    ...  0        b_0  0    ...  0        |
            | a_1  a_0  ...  0        b_1  b_0  ...  0        |
Res[p, q] = | a_2  a_1  ...  0        b_2  b_1  ...  0        |
            | :    :         :        :    :         :        |
            | 0    0    ...  a_{n−1}  0    0    ...  b_{m−1}  |
            | 0    0    ...  a_n      0    0    ...  b_m      |

Proposition 3. The resolvent of two polynomials is non-zero if and only if the polynomials
are relatively prime.
Proof. Let p(x), q(x) ∈ K[x] be two arbitrary polynomials of degree n and m, respectively. The polynomials are relatively prime if and only if every polynomial, including the unit polynomial 1, can be formed as a linear combination of p(x) and q(x). Let

r(x) = c_0 x^{m−1} + c_1 x^{m−2} + ... + c_{m−1},
s(x) = d_0 x^{n−1} + d_1 x^{n−2} + ... + d_{n−1}

be polynomials of degree m − 1 and n − 1, respectively. The coefficients of the linear combination r(x)p(x) + s(x)q(x) are given by the matrix-vector multiplication of the matrix in the definition of Res[p, q] with the column vector (c_0, c_1, ..., c_{m−1}, d_0, d_1, ..., d_{n−1})^T.

In consequence of the preceding remarks, p(x) and q(x) are relatively prime if and only if the matrix above is non-singular, i.e. the resolvent is non-vanishing. Q.E.D.
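The determinant definition above can be implemented directly. The sketch below (our own helper, using exact rational arithmetic) builds the (n + m) × (n + m) matrix and evaluates its determinant by Gaussian elimination:

```python
from fractions import Fraction

def resultant(p, q):
    """Res[p, q] via the determinant described above.
    p and q are coefficient lists, highest degree first: p = [a0, a1, ..., an]."""
    n, m = len(p) - 1, len(q) - 1
    size = n + m
    # columns 0..m-1 hold p shifted down; columns m..m+n-1 hold q shifted down
    M = [[Fraction(0)] * size for _ in range(size)]
    for j in range(m):
        for i, a in enumerate(p):
            M[i + j][j] = Fraction(a)
    for j in range(n):
        for i, b in enumerate(q):
            M[i + j][m + j] = Fraction(b)
    # determinant by Gaussian elimination with exact rational arithmetic
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        inv = 1 / M[col][col]
        for r in range(col + 1, size):
            factor = M[r][col] * inv
            for c in range(col, size):
                M[r][c] -= factor * M[col][c]
    return det

# p = x^2 - 1 (roots ±1), q = x - 2 (root 2): Res = (1-2)(-1-2) = 3
print(resultant([1, 0, -1], [1, -2]))  # 3
print(resultant([1, 0, -1], [1, -1]))  # 0 (shared root x = 1)
```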

Alternative Characterization. The following proposition describes the resolvent of two polynomials in terms of the polynomials' roots. Indeed this property uniquely characterizes the resolvent, as can be seen by carefully studying the appended proof.
Proposition 4. Let p(x), q(x) be as above and let x_1, ..., x_n and y_1, ..., y_m be their respective roots in the algebraic closure of K. Then,

Res[p, q] = a_0^m b_0^n ∏_{i=1}^{n} ∏_{j=1}^{m} (x_i − y_j).

Proof. The multilinearity property of determinants implies that

                      | 1    0    ...  0        1    0    ...  0        |
                      | A_1  1    ...  0        B_1  1    ...  0        |
Res[p, q] = a_0^m b_0^n | A_2  A_1  ...  0        B_2  B_1  ...  0        |
                      | :    :         :        :    :         :        |
                      | 0    0    ...  A_{n−1}  0    0    ...  B_{m−1}  |
                      | 0    0    ...  A_n      0    0    ...  B_m      |

where

A_i = a_i / a_0,    i = 1, ..., n,
B_j = b_j / b_0,    j = 1, ..., m.

It therefore suffices to prove the proposition for monic polynomials. Without loss of generality we can also assume that the roots in question are algebraically independent.

Thus, let X_1, ..., X_n, Y_1, ..., Y_m be indeterminates and set

F(X_1, ..., X_n, Y_1, ..., Y_m) = ∏_{i=1}^{n} ∏_{j=1}^{m} (X_i − Y_j),
P(x) = (x − X_1) ... (x − X_n),
Q(x) = (x − Y_1) ... (x − Y_m),
G(X_1, ..., X_n, Y_1, ..., Y_m) = Res[P, Q].

Now by Proposition 3, G vanishes if we replace any of the Y_1, ..., Y_m by any of the X_1, ..., X_n, and hence F divides G.

Next, consider the main diagonal of the matrix whose determinant gives Res[P, Q]. The first m entries of the diagonal are equal to 1, and the next n entries are equal to (−1)^m Y_1 ... Y_m. It follows that the expansion of G contains a term of the form (−1)^{mn} Y_1^n ... Y_m^n. However, the expansion of F contains exactly the same term, and therefore F = G. Q.E.D.
Version: 2 Owner: rmilson Author(s): rmilson

161.6 de Moivre identity

From the Euler relation

e^{iθ} = cos θ + i sin θ    (161.6.1)

it follows that

e^{inθ} = (e^{iθ})^n    (161.6.2)

cos nθ + i sin nθ = (cos θ + i sin θ)^n.    (161.6.3)

This is called de Moivre's formula, and besides being generally useful, it's a convenient way to remember double- (and higher-multiple-) angle formulas. For example,

cos 2θ + i sin 2θ = (cos θ + i sin θ)^2 = cos^2 θ + 2i sin θ cos θ − sin^2 θ.    (161.6.4)

Since the imaginary parts and real parts on each side must be equal, we must have

cos 2θ = cos^2 θ − sin^2 θ    (161.6.5)

and

sin 2θ = 2 sin θ cos θ.    (161.6.6)
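A quick numerical check of (161.6.5) and (161.6.6) with Python's cmath (the value of θ is arbitrary):

```python
import cmath
import math

theta = 0.7
z = cmath.exp(1j * theta)   # e^{iθ} = cos θ + i sin θ
lhs = z ** 2                # (cos θ + i sin θ)^2 = cos 2θ + i sin 2θ by de Moivre
print(math.isclose(lhs.real, math.cos(2 * theta)))  # True: cos 2θ = cos²θ − sin²θ
print(math.isclose(lhs.imag, math.sin(2 * theta)))  # True: sin 2θ = 2 sin θ cos θ
```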

Version: 3 Owner: Daume Author(s): Larry Hammick, drummond

161.7 monic

A monic polynomial is a polynomial with a leading coefficient of 1. That is, if P_n(x) is a polynomial of order n in the variable x, then the coefficient of x^n in P_n(x) is 1.

For example, x^5 + 3x^3 − 10x^2 + 1 is a monic 5th-order polynomial. 3x^2 + 2x − 5 is a 2nd-order polynomial which is not monic.
Version: 4 Owner: akrowne Author(s): akrowne

161.8 Wedderburn's Theorem

A finite division ring is a field.


One of the many consequences of this theorem is that for a finite projective plane, Desargues' Theorem implies Pappus' theorem.

Version: 2 Owner: saforres Author(s): saforres

161.9 proof of Wedderburn's theorem

We want to show that the multiplication operation in a finite division ring D is abelian.

We denote the centralizer in D of an element x as C_D(x).

Lemma. The centralizer is a subring.

Proof: 0 and 1 are obviously elements of C_D(x), and if y and z are, then x(−y) = −(xy) = −(yx) = (−y)x, x(y + z) = xy + xz = yx + zx = (y + z)x and x(yz) = (xy)z = (yx)z = y(xz) = y(zx) = (yz)x, so −y, y + z, and yz are also elements of C_D(x). Moreover, for y ≠ 0, xy = yx implies y^{−1}x = xy^{−1}, so y^{−1} is also an element of C_D(x).
Now we consider the center of D, which we'll call Z(D). This is also a subring, and is in fact the intersection of all centralizers:

Z(D) = ∩_{x ∈ D} C_D(x)

Z(D) is an abelian subring of D and is thus a field. We can consider D and every C_D(x) as vector spaces over Z(D), of dimension n and n_x respectively. Since D can be viewed as a module over C_D(x), we find that n_x divides n. If we put q := |Z(D)|, we see that q ≥ 2 since {0, 1} ⊆ Z(D), and that |C_D(x)| = q^{n_x} and |D| = q^n.

It suffices to show that n = 1 to prove that multiplication is abelian, since then |Z(D)| = |D| and so Z(D) = D.
We now consider D* := D − {0} and apply the conjugacy class formula:

|D*| = |Z(D*)| + Σ_x [D* : C_{D*}(x)]

which gives

q^n − 1 = q − 1 + Σ_x (q^n − 1)/(q^{n_x} − 1).

By Zsigmondy's theorem, there exists a prime p that divides q^n − 1 but doesn't divide any of the q^m − 1 for 0 < m < n, except in 2 pathological cases which will be dealt with separately.
Such a prime p will divide q^n − 1 and each of the (q^n − 1)/(q^{n_x} − 1). So it will also divide q − 1, which can only happen if n = 1.

We now deal with the 2 exceptional cases. In the first case, n equals 2, which would mean D is a vector space of dimension 2 over Z(D), with elements of the form a + bα where a, b ∈ Z(D). Such elements clearly commute, so D = Z(D), which contradicts the assumption that n = 2.

In the second case, n = 6 and q = 2. The class equation reduces to 2^6 − 1 = 2 − 1 + Σ_x (2^6 − 1)/(2^{n_x} − 1), where each n_x properly divides 6. This gives 62 = 63x + 21y + 9z with x, y and z nonnegative integers, which is impossible since the right hand side is divisible by 3 and the left hand side isn't.
Version: 1 Owner: lieven Author(s): lieven

161.10 second proof of Wedderburn's theorem

We can prove Wedderburn's theorem without using Zsigmondy's theorem, starting from the conjugacy class formula of the first proof. Let G_n be the set of n-th roots of unity, P_n the set of n-th primitive roots of unity, and Φ_d(q) the d-th cyclotomic polynomial.

It results that:

Φ_n(q) = ∏_{ζ ∈ P_n} (q − ζ)

p(q) = q^n − 1 = ∏_{ζ ∈ G_n} (q − ζ) = ∏_{d|n} Φ_d(q)

Φ_n(q) ∈ Z[q], it is monic, and Φ_n(q) | q^n − 1

Φ_n(q) | (q^n − 1)/(q^d − 1)    with d | n, d < n

By the conjugacy class formula, we have:

q^n − 1 = q − 1 + Σ_x (q^n − 1)/(q^{n_x} − 1)

By the last two previous properties, it results that:

Φ_n(q) | q^n − 1 and Φ_n(q) | Σ_x (q^n − 1)/(q^{n_x} − 1)  ⟹  Φ_n(q) | q − 1,

because Φ_n(q) divides the left member and each addend of the sum on the right member of the conjugacy class formula.

By the third property,

q > 1, Φ_n(x) ∈ Z[x] ⟹ Φ_n(q) ∈ Z ⟹ |Φ_n(q)| divides |q − 1| ⟹ |Φ_n(q)| ≤ q − 1.

If, for n > 1, we can show that |Φ_n(q)| > q − 1, this contradiction forces n = 1, and the theorem is proved.

We know that

|Φ_n(q)| = ∏_{ζ ∈ P_n} |q − ζ|,    with ζ ∈ C.

By the triangle inequality in C,

|q − ζ| ≥ | |q| − |ζ| | = |q − 1|,

as ζ is a root of unity; equality |q − ζ| = |q − 1| holds only when ζ = 1, but n > 1 ⟹ ζ ≠ 1. Therefore we have

|q − ζ| > |q − 1| = q − 1  ⟹  |Φ_n(q)| > q − 1.
Version: 7 Owner: ottocolori Author(s): ottocolori

161.11 finite field

A finite field is a field F which has finitely many elements. We will present some basic facts
about finite fields.

161.11.1 Size of a finite field

Theorem 9. A finite field F has positive characteristic p > 0. The cardinality of F is p^n, where n := [F : F_p] and F_p denotes the prime subfield of F.

Proof. The characteristic of F is positive because otherwise the additive subgroup generated by 1 would be an infinite subset of F. Accordingly, the prime subfield F_p of F is isomorphic to the field Z/pZ of integers mod p. Since the field F is an n-dimensional vector space over F_p, it is set-isomorphic to F_p^n and thus has cardinality p^n.

161.11.2 Existence of finite fields

Now that we know every finite field has pn elements, it is natural to ask which of these
actually arise as cardinalities of finite fields. It turns out that for each prime p and each
natural number n, there is essentially exactly one finite field of size pn .

Lemma 2. In any field F with m elements, the equation x^m = x is satisfied by all elements x of F.

Proof. The result is clearly true if x = 0. We may therefore assume x is not zero. By definition of field, the set F* of nonzero elements of F forms a group under multiplication. This set has m − 1 elements, and by Lagrange's theorem x^{m−1} = 1 for any x ∈ F*, so x^m = x follows.

Theorem 10. For each prime p > 0 and each natural number n ∈ N, there exists a finite field of cardinality p^n, and any two such are isomorphic.

Proof. For n = 1, the finite field F_p := Z/pZ has p elements, and any two such are isomorphic by the map sending 1 to 1.

In general, the polynomial f(X) := X^{p^n} − X ∈ F_p[X] has derivative −1 and thus is separable over F_p. We claim that the splitting field F of this polynomial is a finite field of size p^n. The field F certainly contains the set S of roots of f(X). However, the set S is closed under the field operations, so S is itself a field. Since splitting fields are minimal by definition, the containment S ⊆ F means that S = F. Finally, S has p^n elements since f(X) is separable, so F is a field of size p^n.

For the uniqueness part, any other field F′ of size p^n contains a subfield isomorphic to F_p. Moreover, F′ equals the splitting field of the polynomial X^{p^n} − X over F_p, since by Lemma 2 every element of F′ is a root of this polynomial, and all p^n possible roots of the polynomial are accounted for in this way. By the uniqueness of splitting fields up to isomorphism, the two fields F and F′ are isomorphic.
Note: The proof of Theorem 10 given here, while standard because of its efficiency, relies
on more abstract algebra than is strictly necessary. The reader may find a more concrete
presentation of this and many other results about finite fields in [1, Ch. 7].
Corollary 7. Every finite field F is a normal extension of its prime subfield F_p.

Proof. This follows from the fact that field extensions obtained from splitting fields are normal extensions.
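The construction in the proof of Theorem 10 can be made concrete for small fields. The following sketch (an illustration; the representation is our own choice) realizes F_9 as F_3[X] modulo the irreducible polynomial X^2 + 1, and verifies that every element satisfies x^9 = x, as Lemma 2 predicts:

```python
# F_9 = F_3[X]/(X^2 + 1); X^2 + 1 is irreducible mod 3 since -1 is not a square mod 3.
p = 3

def mul(a, b):
    """Multiply elements written as pairs (c1, c0) meaning c1*X + c0, reducing via X^2 ≡ -1."""
    (a1, a0), (b1, b0) = a, b
    # (a1 X + a0)(b1 X + b0) = a1 b1 X^2 + (a1 b0 + a0 b1) X + a0 b0
    c1 = (a1 * b0 + a0 * b1) % p
    c0 = (a0 * b0 - a1 * b1) % p      # the X^2 term contributes -a1*b1
    return (c1, c0)

def power(a, k):
    result = (0, 1)                    # the multiplicative identity 1
    for _ in range(k):
        result = mul(result, a)
    return result

elements = [(c1, c0) for c1 in range(p) for c0 in range(p)]
# Lemma 2: every element of a field with m = 9 elements satisfies x^9 = x
print(all(power(a, 9) == a for a in elements))  # True
```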

161.11.3 Units in a finite field

Henceforth, in light of Theorem 10, we will write F_q for the unique (up to isomorphism) finite field of cardinality q = p^n. A fundamental step in the investigation of finite fields is the observation that their multiplicative groups are cyclic:

Theorem 11. Let F_q^* denote the multiplicative group of nonzero elements of the finite field F_q. Then F_q^* is a cyclic group.


Proof. We begin with the formula

Σ_{d|k} φ(d) = k,    (161.11.1)

where φ denotes the Euler totient function. It is proved as follows. For every divisor d of k, the cyclic group C_k of size k has exactly one cyclic subgroup C_d of size d. Let G_d be the subset of C_d consisting of elements of C_d which have the maximum possible order of d. Since every element of C_k has maximal order in the subgroup of C_k that it generates, we see that the sets G_d partition the set C_k, so that

Σ_{d|k} |G_d| = |C_k| = k.

The identity (161.11.1) then follows from the observation that the cyclic subgroup C_d has exactly φ(d) elements of maximal order d.
We now prove the theorem. Let k = q − 1, and for each divisor d of k, let ψ(d) be the number of elements of F_q^* of order d. We claim that ψ(d) is either zero or φ(d). Indeed, if it is nonzero, then let x ∈ F_q^* be an element of order d, and let G_x be the subgroup of F_q^* generated by x. Then G_x has size d and every element of G_x is a root of the polynomial x^d − 1. But this polynomial cannot have more than d roots in a field, so every root of x^d − 1 must be an element of G_x. In particular, every element of order d must be in G_x already, and we see that G_x only has φ(d) elements of order d.

We have proved that ψ(d) ≤ φ(d) for all d | q − 1. If ψ(q − 1) were 0, then we would have

Σ_{d|q−1} ψ(d) < Σ_{d|q−1} φ(d) = q − 1,

which is impossible since the first sum must equal q − 1 (because every element of F_q^* has order equal to some divisor d of q − 1).
A more constructive proof of Theorem 11, which actually exhibits a generator for the cyclic
group, may be found in [2, Ch. 16].
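For a small concrete instance of Theorem 11, one can search F_13^* for generators by brute force (a sketch; the helper name order is our own):

```python
# F_13^* is cyclic of order 12; its generators are exactly the elements of order 12.
q = 13

def order(x):
    """Multiplicative order of x in F_q^*."""
    k, y = 1, x
    while y != 1:
        y = (y * x) % q
        k += 1
    return k

generators = [x for x in range(1, q) if order(x) == q - 1]
print(generators)  # [2, 6, 7, 11] — there are φ(12) = 4 of them
```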

161.11.4 Automorphisms of a finite field


Observe that, since a splitting field for X^{q^m} − X over F_p contains all the roots of X^q − X, the field F_{q^m} contains a subfield isomorphic to F_q. We will show later (Theorem 13) that this is the only way that extensions of finite fields can arise. For now we will construct the Galois group of the field extension F_{q^m}/F_q, which is normal by Corollary 7.

Theorem 12. The Galois group of the field extension F_{q^m}/F_q is a cyclic group of size m generated by the q-th power Frobenius map Frob_q.

Proof. The fact that Frob_q is an element of Gal(F_{q^m}/F_q), and that (Frob_q)^m = Frob_{q^m} is the identity on F_{q^m}, is obvious. Since the extension F_{q^m}/F_q is normal and of degree m, the group Gal(F_{q^m}/F_q) must have size m, and we will be done if we can show that (Frob_q)^k, for k = 0, 1, ..., m − 1, are distinct elements of Gal(F_{q^m}/F_q).

It is enough to show that none of (Frob_q)^k, for k = 1, 2, ..., m − 1, is the identity map on F_{q^m}, for then we will have shown that Frob_q is of order exactly equal to m. But if any such (Frob_q)^k were the identity map, then the polynomial X^{q^k} − X would have q^m distinct roots in F_{q^m}, which is impossible in a field since q^k < q^m.
We can now use the Galois correspondence between subgroups of the Galois group and intermediate fields of a field extension to immediately classify all the intermediate fields in the extension F_{q^m}/F_q.

Theorem 13. The field extension F_{q^m}/F_q contains exactly one intermediate field isomorphic to F_{q^d}, for each divisor d of m, and no others. In particular, the subfields of F_q are precisely the fields F_{p^d} for d | n.

Proof. By the fundamental theorem of Galois theory, each intermediate field of F_{q^m}/F_q corresponds to a subgroup of Gal(F_{q^m}/F_q). The latter is a cyclic group of order m, so its subgroups are exactly the cyclic groups generated by (Frob_q)^d, one for each d | m. The fixed field of (Frob_q)^d is the set of roots of X^{q^d} − X, which forms a subfield of F_{q^m} isomorphic to F_{q^d}, so the result follows.

The subfields of F_q can be obtained by applying the above considerations to the extension F_{p^n}/F_p.

REFERENCES
1. Kenneth Ireland & Michael Rosen, A Classical Introduction to Modern Number Theory, Second
Edition, Springer-Verlag, 1990 (GTM 84).
2. Ian Stewart, Galois Theory, Second Edition, Chapman & Hall, 1989.

Version: 3 Owner: djao Author(s): djao

161.12 Frobenius automorphism

Let F be a field of characteristic p > 0. Then for any a, b ∈ F,

(a + b)^p = a^p + b^p,
(ab)^p = a^p b^p.

Thus the map

φ : F → F
φ : a ↦ a^p

is an injective field homomorphism (hence an automorphism when F is finite), called the Frobenius automorphism, or simply the Frobenius map on F.

Note: This morphism is sometimes also called the small Frobenius to distinguish it from the map a ↦ a^q, with q = p^n. This map is then also referred to as the big Frobenius or the power Frobenius map.
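The identity (a + b)^p = a^p + b^p is easy to verify exhaustively in F_p for a small prime, here p = 7 (an illustrative check):

```python
# In F_p the binomial coefficients C(p, k) for 0 < k < p are divisible by p,
# so the cross terms of (a + b)^p vanish mod p.
p = 7
ok = all((a + b) ** p % p == (a ** p + b ** p) % p
         for a in range(p) for b in range(p))
print(ok)  # True
```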
Version: 6 Owner: mathcam Author(s): bbukh, mathcam, sleske, yark, bshanks

161.13 characteristic

Let (F, +, ·) be a field. The characteristic Char(F) of F is commonly given by one of three equivalent definitions:

- If there is some positive integer n for which the result of adding 1 to itself n times yields 0, then the characteristic of the field is the least such n. Otherwise, Char(F) is defined to be 0.

- If f : Z → F is defined by f(n) = n · 1, then Char(F) is the least strictly positive generator of ker(f) if ker(f) ≠ {0}; otherwise it is 0.

- If K is the prime subfield of F, then Char(F) is the size of K if this is finite, and 0 otherwise.

Note that the first two definitions also apply to arbitrary rings, and not just to fields.

The characteristic of a field (or more generally an integral domain) is always either 0 or prime. For if the characteristic of F were composite, say mn for m, n > 1, then in particular mn · 1 would equal zero. Then either m · 1 would be zero or n · 1 would be zero, so the characteristic of F would actually be smaller than mn, contradicting the minimality condition.
Version: 7 Owner: Evandar Author(s): Evandar

161.14 characterization of field

Let R ≠ 0 be a commutative ring with identity.

Proposition 6. The ring R (as above) is a field if and only if R has exactly two ideals: (0), R.

Proof. (⇒) Suppose R is a field and let A be a non-zero ideal of R. Then there exists r ∈ A ⊆ R with r ≠ 0. Since R is a field and r is a non-zero element, there exists s ∈ R such that

s · r = 1 ∈ R.

Moreover, A is an ideal, r ∈ A, s ∈ R, so s · r = 1 ∈ A. Hence A = R. We have proved that the only ideals of R are (0) and R, as desired.

(⇐) Suppose the ring R has only two ideals, namely (0), R. Let a ∈ R be a non-zero element; we would like to prove the existence of a multiplicative inverse for a in R. Define the following set:

A = (a) = {r ∈ R | r = s · a, for some s ∈ R}

This is clearly an ideal, the ideal generated by the element a. Moreover, this ideal is not the zero ideal, because a ∈ A and a was assumed to be non-zero. Thus, since there are only two ideals, we conclude A = R. Therefore 1 ∈ A = R, so there exists an element s ∈ R such that

s · a = 1 ∈ R.

Hence every non-zero a ∈ R has a multiplicative inverse in R, so R is, in fact, a field.
Version: 3 Owner: alozano Author(s): alozano

161.15 example of an infinite field of finite characteristic

Let K be a field of finite characteristic, such as F_p. Then the ring of polynomials, K[X], is an integral domain. We may therefore construct its quotient field, namely the field of rational functions K(X). This is an example of an infinite field with finite characteristic.
Version: 1 Owner: vitriol Author(s): vitriol

161.16 examples of fields

Fields are typically sets of numbers in which the arithmetic operations of addition, subtraction, multiplication and division are defined. Another important class of fields are the
function fields defined on geometric objects such as algebraic varieties or Riemann surfaces.
The following is a list of examples of fields.
The rational numbers Q, the real numbers R and the complex numbers C are the most
familiar examples of fields.

Slightly more exotic, the hyperreal numbers and the surreal numbers are fields containing infinitesimal and infinitely large numbers. (The surreal numbers aren't a field in the strict sense, since they form a proper class and not a set.)
The algebraic numbers form a field; this is the algebraic closure of Q. In general, every
field has an (essentially unique) algebraic closure.
The computable complex numbers (those whose digit sequence can be produced by a
Turing machine) form a field. The definable complex numbers (those which can be precisely specified using a logical formula) form a field containing the computable numbers;
arguably, this field contains all the numbers we can ever talk about. It is countable.
The so-called number fields arise from Q by adding some algebraic numbers. For instance, Q(√2) = {u + v√2 | u, v ∈ Q} and Q(∛2, i) = {u + vi + w∛2 + xi∛2 + y∛4 + zi∛4 | u, v, w, x, y, z ∈ Q}.
If p is a prime number, then the p-adic rationals form a field Qp .
If p is a prime number, then the integers modulo p form a finite field with p elements,
typically denoted by Fp . More generally, for every prime power pn there is one and
only one finite field Fpn with pn elements.
If K is a field, we can form the field of rational functions over K, denoted by K(X).
It consists of quotients of polynomials in X with coefficients in K.
If V is a variety over the field K, then the function field of V , denoted by K(V ),
consists of all quotients of polynomial functions defined on V .
If U is a domain (= connected open set) in C, then the set of all meromorphic functions
on U is a field. More generally, the meromorphic functions on any Riemann surface
form a field.
The field of formal Laurent series over the field K in the variable X consists of all expressions of the form

Σ_{j=−M}^{∞} a_j X^j

where M is some integer and the coefficients a_j come from K.

More generally, whenever R is an integral domain, we can form its field of fractions, a field whose elements are the fractions of elements of R.
Version: 4 Owner: AxelBoldt Author(s): AxelBoldt, yark


161.17 field

A field is a commutative ring F with identity such that:

- 1 ≠ 0
- If a ∈ F and a ≠ 0, then there exists b ∈ F with a · b = 1.
Version: 2 Owner: djao Author(s): djao

161.18 field homomorphism

Let F and K be fields.


Definition 11. A field homomorphism is a function φ : F → K such that:

1. φ(a + b) = φ(a) + φ(b) for all a, b ∈ F

2. φ(a · b) = φ(a) · φ(b) for all a, b ∈ F

3. φ(1) = 1, φ(0) = 0

If φ is injective and surjective, then we say that φ is a field isomorphism.

Lemma 3. Let φ : F → K be a field homomorphism. Then φ is injective.

Proof. Indeed, if φ is a field homomorphism, in particular it is a ring homomorphism. Note that the kernel of a ring homomorphism is an ideal, and a field F only has two ideals, namely {0}, F. Moreover, by the definition of field homomorphism, φ(1) = 1, hence 1 is not in the kernel of the map, so the kernel must be equal to {0}.

Remark: For this reason the terms field homomorphism and field monomorphism are synonymous. Also note that if φ is a field monomorphism, then

φ(F) ≅ F,    φ(F) ⊆ K

so there is a copy of F in K. In other words, if

φ : F → K

is a field homomorphism, then there exists a subfield H of K such that H ≅ F. Conversely, suppose there exists H ⊆ K with H isomorphic to F. Then there is an isomorphism

χ : F → H

and we also have the inclusion homomorphism

ι : H ↪ K

Thus the composition

ι ∘ χ : F → K

is a field homomorphism.
Version: 2 Owner: alozano Author(s): alozano

161.19 prime subfield

The prime subfield of a field F is the intersection of all subfields of F , or equivalently the
smallest subfield of F . It can also be constructed by taking the quotient field of the additive
subgroup of F generated by the multiplicative identity 1.
If F has characteristic p where p > 0 is a prime, then the prime subfield of F is isomorphic
to the field Z/pZ of integers mod p. When F has characteristic zero, the prime subfield of
F is isomorphic to the field Q of rational numbers.
Version: 1 Owner: djao Author(s): djao


Chapter 162
12F05 Algebraic extensions
162.1 a finite extension of fields is an algebraic extension

Theorem 16. Let L/K be a finite field extension. Then L/K is an algebraic extension.

Proof. In order to prove that L/K is an algebraic extension, we need to show that any element α ∈ L is algebraic, i.e., there exists a non-zero polynomial p(x) ∈ K[x] such that p(α) = 0.

Recall that L/K is a finite extension of fields; by definition, this means that L is a finite dimensional vector space over K. Let the dimension be

[L : K] = n

for some n ∈ N.

Consider the following set of vectors in L:

S = {1, α, α^2, α^3, ..., α^n}

Note that the cardinality of S is n + 1, one more than the dimension of the vector space. Therefore the elements of S must be linearly dependent over K; otherwise the dimension of the subspace they span would be greater than n. Hence there exist k_i ∈ K, 0 ≤ i ≤ n, not all zero, such that

k_0 + k_1 α + k_2 α^2 + k_3 α^3 + ... + k_n α^n = 0.

Thus, if we define

p(X) = k_0 + k_1 X + k_2 X^2 + k_3 X^3 + ... + k_n X^n

then p(X) ∈ K[X] and p(α) = 0, as desired.

NOTE: The converse is not true. See the entry algebraic extension for details.
Version: 3 Owner: alozano Author(s): alozano

162.2 algebraic closure

An extension field L of a field K is an algebraic closure of K if L is algebraically closed and


every element of L is algebraic over K.
Any two algebraic closures of K are isomorphic as fields, but not necessarily canonically.
Version: 2 Owner: djao Author(s): rmilson, djao

162.3 algebraic extension

Definition 12. Let L/K be an extension of fields. L/K is said to be an algebraic extension of fields if every element of L is algebraic over K.

Examples:

1. Let L = Q(√2). The extension L/Q is an algebraic extension. Indeed, any element α ∈ L is of the form

α = q + t√2 ∈ L

for some q, t ∈ Q. Then α ∈ L is a root of

X^2 − 2qX + q^2 − 2t^2 = 0.

2. The field extension R/Q is not an algebraic extension. For example, π ∈ R is a transcendental number over Q (see pi).

3. Let K be a field and denote by K̄ the algebraic closure of K. Then the extension K̄/K is algebraic.

4. In general, a finite extension of fields is an algebraic extension. However, the converse is not true. The extension Q̄/Q is far from finite.
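Example 1 can be verified numerically for sample values of q and t (an illustrative check):

```python
import math

# α = q + t√2 should be a root of X² − 2qX + (q² − 2t²)
q, t = 1.0, 2.0
alpha = q + t * math.sqrt(2)
value = alpha**2 - 2*q*alpha + (q**2 - 2*t**2)
print(abs(value) < 1e-9)  # True
```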
Version: 3 Owner: alozano Author(s): alozano

162.4 algebraically closed

A field K is algebraically closed if every non-constant polynomial in K[X] has a root in K.


Version: 3 Owner: djao Author(s): djao, rmilson


162.5 algebraically dependent

Let L be an algebraic field extension of a field K. Two elements α, β of L are algebraically dependent if there exists a non-zero polynomial f(x, y) ∈ K[x, y] such that f(α, β) = 0. If no such polynomial exists, α and β are said to be algebraically independent.
Version: 1 Owner: mathcam Author(s): mathcam

162.6 existence of the minimal polynomial

Proposition 7. Let K/L be a finite extension of fields and let k ∈ K. There exists a unique polynomial m_k(x) ∈ L[x] such that:

1. m_k(x) is a monic polynomial;

2. m_k(k) = 0;

3. If p(x) ∈ L[x] is another polynomial such that p(k) = 0, then m_k(x) divides p(x).

Proof. We start by defining the following map:

ψ : L[x] → K
ψ(p(x)) = p(k)

Note that this map is clearly a ring homomorphism. For all p(x), q(x) ∈ L[x]:

ψ(p(x) + q(x)) = p(k) + q(k) = ψ(p(x)) + ψ(q(x))
ψ(p(x) · q(x)) = p(k) · q(k) = ψ(p(x)) · ψ(q(x))

Thus, the kernel of ψ is an ideal of L[x]:

Ker(ψ) = {p(x) ∈ L[x] | p(k) = 0}

Note that the kernel is a non-zero ideal. This relies on the fact that K/L is a finite extension of fields, and therefore it is an algebraic extension, so every element of K is a root of a non-zero polynomial p(x) with coefficients in L, that is, p(x) ∈ Ker(ψ).

Moreover, the ring of polynomials L[x] is a principal ideal domain (see example of PID). Therefore, the kernel of ψ is a principal ideal, generated by some polynomial m(x):

Ker(ψ) = (m(x))

Note that the only units in L[x] are the non-zero constant polynomials, hence if m′(x) is another generator of Ker(ψ) then

m′(x) = l · m(x),    l ≠ 0,    l ∈ L

Let λ be the leading coefficient of m(x). We define m_k(x) = λ^{−1} · m(x), so that the leading coefficient of m_k is 1. Also note that by the previous remark, m_k is the unique generator of Ker(ψ) which is monic.

By construction, m_k(k) = 0, since m_k belongs to the kernel of ψ, so it satisfies (2).

Finally, if p(x) is any polynomial such that p(k) = 0, then p(x) ∈ Ker(ψ). Since m_k generates this ideal, we know that m_k must divide p(x) (this is property (3)).

For the uniqueness, note that any polynomial satisfying (1), (2) and (3) must be a generator of Ker(ψ), and, as we pointed out, there is a unique monic generator, namely m_k(x).
Version: 3 Owner: alozano Author(s): alozano

162.7 finite extension

Let K be an extension field of F. We say that K is a finite extension if [K : F] is finite. That is, K is a finite dimensional vector space over F.

An important result on finite extensions establishes that any finite extension is also an algebraic extension.
Version: 5 Owner: drini Author(s): drini

162.8 minimal polynomial

Let $K/L$ be a finite field extension and let $\alpha \in K$. The minimal polynomial of $\alpha$ is the unique monic non-zero polynomial $m(x) \in L[x]$ such that $m(\alpha) = 0$ and such that any other polynomial $f \in L[x]$ with $f(\alpha) = 0$ is divisible by $m$.
Given $\alpha$, a polynomial $m$ is the minimal polynomial of $\alpha$ if and only if $m$ is monic, irreducible, and $m(\alpha) = 0$.
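As an illustration added here (not part of the original entry), minimal polynomials of algebraic numbers over $\mathbb{Q}$ can be computed with the sympy library; `minimal_polynomial` below is sympy's routine, and by the uniqueness argument above its monic output is the minimal polynomial:

```python
from sympy import sqrt, Symbol, minimal_polynomial

x = Symbol('x')

# The minimal polynomial of sqrt(2) over Q: monic, irreducible, with sqrt(2) as a root.
m1 = minimal_polynomial(sqrt(2), x)            # x**2 - 2
# A degree-4 example: sqrt(2) + sqrt(3) satisfies x**4 - 10*x**2 + 1 = 0.
m2 = minimal_polynomial(sqrt(2) + sqrt(3), x)
print(m1, m2, sep='; ')
```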
Version: 1 Owner: bwebste Author(s): bwebste


162.9

norm

Let $K/F$ be a Galois extension, and let $x \in K$. The norm $N^K_F(x)$ of $x$ is defined to be the product of all the elements of the orbit of $x$ under the group action of the Galois group $\operatorname{Gal}(K/F)$ on $K$, taken with multiplicities if $K/F$ is a finite extension.
In the case where $K/F$ is a finite extension, the norm of $x$ can be defined to be the determinant of the linear transformation $[x] \colon K \to K$ given by $[x](k) := xk$, where $K$ is regarded as a vector space over $F$. This definition does not require that $K/F$ be Galois, or even that $K$ be a field; for instance, it remains valid when $K$ is a division ring (although $F$ does have to be a field, in order for the determinant to be defined). Of course, for finite Galois extensions $K/F$, this definition agrees with the previous one, and moreover the formula
$$N^K_F(x) := \prod_{\sigma \in \operatorname{Gal}(K/F)} \sigma(x)$$
holds.
The norm of $x$ is always an element of $F$, since any element of $\operatorname{Gal}(K/F)$ permutes the orbit of $x$ and thus fixes $N^K_F(x)$.
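For a worked finite case (an addition to this entry, with illustrative numbers): take $K = \mathbb{Q}(\sqrt{2})$ and $F = \mathbb{Q}$. In the basis $\{1, \sqrt{2}\}$, multiplication by $x = a + b\sqrt{2}$ has matrix $\begin{pmatrix} a & 2b \\ b & a \end{pmatrix}$, so the determinant definition gives $N(x) = a^2 - 2b^2$, which is also the product of the two Galois conjugates $a \pm b\sqrt{2}$. The helper name below is ours, not from the entry:

```python
import math

def norm_Q_sqrt2(a, b):
    """Norm of x = a + b*sqrt(2) from Q(sqrt(2)) down to Q, computed as the
    determinant of multiplication-by-x in the basis {1, sqrt(2)}:
        det [[a, 2*b], [b, a]] = a**2 - 2*b**2
    """
    return a * a - 2 * b * b

a, b = 3, 0.5
# The product of the Galois conjugates (a + b*sqrt(2))*(a - b*sqrt(2))
# agrees with the determinant formula, as the entry states.
conjugate_product = (a + b * math.sqrt(2)) * (a - b * math.sqrt(2))
print(norm_Q_sqrt2(a, b), conjugate_product)
```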
Version: 2 Owner: djao Author(s): djao

162.10

primitive element theorem

If F is a field of characteristic 0, and a and b are algebraic over F , then there is an element
c in F (a, b) such that F (a, b) = F (c).
Version: 4 Owner: KimJ Author(s): KimJ

162.11

splitting field

Let $f \in F[x]$ be a polynomial over a field $F$. A splitting field for $f$ is a field extension $K$ of $F$ such that
1. $f$ splits (factors into a product of linear factors) in $K[x]$,
2. $K$ is the smallest field with this property (any sub-extension field of $K$ which satisfies the first property is equal to $K$).
Theorem: Any polynomial over any field has a splitting field, and any two such splitting
fields are isomorphic. A splitting field is always a normal extension of the ground field.
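As a concrete illustration (added here, not part of the entry), sympy can factor a polynomial over an algebraic extension of $\mathbb{Q}$ via its `extension` keyword, exhibiting the splitting:

```python
from sympy import Symbol, sqrt, factor, expand, I

x = Symbol('x')

# Over Q, x**2 - 2 is irreducible, but over Q(sqrt(2)) it splits into linear
# factors, so Q(sqrt(2)) is a splitting field for x**2 - 2 over Q.
split_sqrt2 = factor(x**2 - 2, extension=sqrt(2))
# Likewise Q(i) is a splitting field for x**2 + 1 over Q.
split_i = factor(x**2 + 1, extension=I)
print(split_sqrt2, split_i, sep='; ')
```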

Version: 3 Owner: djao Author(s): djao

162.12

the field extension R/Q is not finite

Theorem 17. Let $L/K$ be a finite field extension. Then $L/K$ is an algebraic extension.
Corollary 2. The extension of fields $\mathbb{R}/\mathbb{Q}$ is not finite.
Proof of the Corollary: If the extension were finite, it would be an algebraic extension. However, the extension $\mathbb{R}/\mathbb{Q}$ is clearly not algebraic; for example, $\pi \in \mathbb{R}$ is transcendental over $\mathbb{Q}$ (see pi).
Version: 2 Owner: alozano Author(s): alozano

162.13

trace

Let $K/F$ be a Galois extension, and let $x \in K$. The trace $\operatorname{Tr}^K_F(x)$ of $x$ is defined to be the sum of all the elements of the orbit of $x$ under the group action of the Galois group $\operatorname{Gal}(K/F)$ on $K$, taken with multiplicities if $K/F$ is a finite extension.
In the case where $K/F$ is a finite extension,
$$\operatorname{Tr}^K_F(x) := \sum_{\sigma \in \operatorname{Gal}(K/F)} \sigma(x).$$
The trace of $x$ is always an element of $F$, since any element of $\operatorname{Gal}(K/F)$ permutes the orbit of $x$ and thus fixes $\operatorname{Tr}^K_F(x)$.
The name trace derives from the fact that, when $K/F$ is finite, the trace of $x$ is simply the trace of the linear transformation $T \colon K \to K$ of vector spaces over $F$ defined by $T(v) := xv$.
Version: 3 Owner: djao Author(s): djao


Chapter 163
12F10 Separable extensions, Galois theory
163.1

Abelian extension

Let $K$ be a Galois extension of $F$. The extension is said to be an abelian extension if the Galois group $\operatorname{Gal}(K/F)$ is abelian.
Examples: $\mathbb{Q}(\sqrt{2})/\mathbb{Q}$ has Galois group $\mathbb{Z}/2\mathbb{Z}$, so $\mathbb{Q}(\sqrt{2})/\mathbb{Q}$ is an abelian extension.
Let $\zeta_n$ be a primitive $n$th root of unity. Then $\mathbb{Q}(\zeta_n)/\mathbb{Q}$ has Galois group $(\mathbb{Z}/n\mathbb{Z})^{\times}$ (the group of units of $\mathbb{Z}/n\mathbb{Z}$), so $\mathbb{Q}(\zeta_n)/\mathbb{Q}$ is abelian.
Version: 1 Owner: scanez Author(s): scanez

163.2

Fundamental Theorem of Galois Theory

Let $L/F$ be a finite dimensional Galois extension of fields, with Galois group $G := \operatorname{Gal}(L/F)$. There is a bijective, inclusion-reversing correspondence between subgroups of $G$ and extensions of $F$ contained in $L$, given by
- $K \mapsto \operatorname{Gal}(L/K)$, for any field $K$ with $F \subseteq K \subseteq L$,
- $H \mapsto L^H$ (the fixed field of $H$ in $L$), for any subgroup $H \leq G$.
The extension $L^H/F$ is normal if and only if $H$ is a normal subgroup of $G$, and in this case the homomorphism $G \to \operatorname{Gal}(L^H/F)$ given by $\sigma \mapsto \sigma|_{L^H}$ induces (via the first isomorphism theorem) a natural identification $\operatorname{Gal}(L^H/F) = G/H$ between the Galois group of $L^H/F$ and the quotient group $G/H$.
Version: 2 Owner: djao Author(s): djao

163.3

Galois closure

Let $K$ be an extension field of $F$. A Galois closure of $K/F$ is a field $L \supseteq K$ that is a Galois extension of $F$ and is minimal in that respect, i.e. no proper subfield of $L$ containing $K$ is a Galois extension of $F$.
Version: 2 Owner: scanez Author(s): scanez

163.4

Galois conjugate

Let $K$ be a field, and let $L$ be a separable closure of $K$. For any $x \in L$, the Galois conjugates of $x$ are the elements of $L$ which are in the orbit of $x$ under the group action of the absolute Galois group $G_K$ on $L$.
Version: 3 Owner: djao Author(s): rmilson, djao

163.5

Galois extension

A field extension is Galois if it is normal and separable.


Version: 3 Owner: djao Author(s): djao

163.6

Galois group

The Galois group $\operatorname{Gal}(K/F)$ of a field extension $K/F$ is the group of all field automorphisms $\sigma \colon K \to K$ of $K$ which fix $F$ (i.e., $\sigma(x) = x$ for all $x \in F$).
The group operation is given by composition: for two automorphisms $\sigma_1, \sigma_2 \in \operatorname{Gal}(K/F)$, given by $\sigma_1 \colon K \to K$ and $\sigma_2 \colon K \to K$, the product $\sigma_1 \sigma_2 \in \operatorname{Gal}(K/F)$ is the composite of the two maps, $\sigma_1 \circ \sigma_2 \colon K \to K$.
Version: 3 Owner: djao Author(s): djao


163.7

absolute Galois group

Let $k$ be a field. The absolute Galois group $G_k$ of $k$ is the Galois group $\operatorname{Gal}(k^{\mathrm{sep}}/k)$ of the field extension $k^{\mathrm{sep}}/k$, where $k^{\mathrm{sep}}$ is the separable closure of $k$.
Version: 2 Owner: djao Author(s): rmilson, djao

163.8

cyclic extension

A Galois extension K/F is said to be a cyclic extension if the Galois group Gal(K/F ) is
cyclic.
Version: 2 Owner: scanez Author(s): scanez

163.9

example of nonperfect field

Let $F = \mathbb{F}_p(t)$, where $\mathbb{F}_p$ is the field with $p$ elements. The splitting field $E$ of the irreducible polynomial $f = x^p - t$ is not separable over $F$. Indeed, if $\alpha$ is an element of $E$ such that $\alpha^p = t$, we have
$$x^p - t = x^p - \alpha^p = (x - \alpha)^p,$$
which shows that $f$ has one root of multiplicity $p$.
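The identity $x^p - \alpha^p = (x - \alpha)^p$ used here holds in characteristic $p$ because $p$ divides every middle binomial coefficient $\binom{p}{k}$ for $0 < k < p$. This added sanity check uses only Python's standard library:

```python
from math import comb

def middle_binomials_vanish_mod(p):
    """True when every binomial coefficient C(p, k) with 0 < k < p is divisible
    by p -- the reason (x - a)**p collapses to x**p - a**p in characteristic p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

for p in (2, 3, 5, 7, 11, 13):
    assert middle_binomials_vanish_mod(p)      # holds for every prime p
assert not middle_binomials_vanish_mod(4)      # fails for composites: C(4, 2) = 6
print("middle binomial coefficients vanish mod p for the primes tested")
```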

Version: 3 Owner: n3o Author(s): n3o

163.10

fixed field

Let $K/F$ be a field extension with Galois group $G = \operatorname{Gal}(K/F)$, and let $H$ be a subgroup of $G$. The fixed field of $H$ in $K$ is the set
$$K^H := \{x \in K \mid \sigma(x) = x \text{ for all } \sigma \in H\}.$$
The set $K^H$ is always a field, and $F \subseteq K^H \subseteq K$.
Version: 2 Owner: djao Author(s): rmilson, djao

163.11

infinite Galois theory

Let L/F be a Galois extension, not necessarily finite dimensional.



163.11.1

Topology on the Galois group

Recall that the Galois group $G := \operatorname{Gal}(L/F)$ of $L/F$ is the group of all field automorphisms $\sigma \colon L \to L$ that restrict to the identity map on $F$, under the group operation of composition. In the case where the extension $L/F$ is infinite dimensional, the group $G$ comes equipped with a natural topology, which plays a key role in the statement of the Galois correspondence.
We define a subset $U$ of $G$ to be open if, for each $\sigma \in U$, there exists an intermediate field $F \subseteq K \subseteq L$ such that
- the degree $[K : F]$ is finite,
- if $\sigma'$ is another element of $G$, and the restrictions $\sigma|_K$ and $\sigma'|_K$ are equal, then $\sigma' \in U$.
The resulting collection of open sets forms a topology on $G$, called the Krull topology, and $G$ is a topological group under the Krull topology.

163.11.2

Inverse limit structure

In this section we exhibit the group $G$ as a projective limit of an inverse system of finite groups. This construction shows that the Galois group $G$ is actually a profinite group.
Let $A$ denote the set of finite normal extensions $K$ of $F$ which are contained in $L$. The set $A$ is a partially ordered set under the inclusion relation. Form the inverse limit
$$\Lambda := \varprojlim_{K \in A} \operatorname{Gal}(K/F),$$
consisting, as usual, of the set of all $(\sigma_K) \in \prod_K \operatorname{Gal}(K/F)$ such that $\sigma_{K'}|_K = \sigma_K$ for all $K, K' \in A$ with $K \subseteq K'$. We make $\Lambda$ into a topological space by putting the discrete topology on each finite set $\operatorname{Gal}(K/F)$ and giving $\Lambda$ the subspace topology induced by the product topology on $\prod_K \operatorname{Gal}(K/F)$. The group $\Lambda$ is a closed subset of the compact group $\prod_K \operatorname{Gal}(K/F)$, and is therefore compact.
Let
$$\phi \colon G \to \prod_{K \in A} \operatorname{Gal}(K/F)$$
be the group homomorphism which sends an element $\sigma \in G$ to the element $(\sigma_K)$ of $\prod_K \operatorname{Gal}(K/F)$ whose $K$th coordinate is the automorphism $\sigma|_K \in \operatorname{Gal}(K/F)$. Then the function $\phi$ has image equal to $\Lambda$, and in fact $\phi$ is a homeomorphism between $G$ and $\Lambda$. Since $\Lambda$ is profinite, it follows that $G$ is profinite as well.


163.11.3

The Galois correspondence

Theorem 14 (Galois correspondence for infinite extensions). Let $G, L, F$ be as before. For every closed subgroup $H$ of $G$, let $L^H$ denote the fixed field of $H$. The correspondence
$$K \mapsto \operatorname{Gal}(L/K),$$
defined for all intermediate field extensions $F \subseteq K \subseteq L$, is an inclusion-reversing bijection between the set of all intermediate extensions $K$ and the set of all closed subgroups of $G$. Its inverse is the correspondence
$$H \mapsto L^H,$$
defined for all closed subgroups $H$ of $G$. The extension $K/F$ is normal if and only if $\operatorname{Gal}(L/K)$ is a normal subgroup of $G$, and in this case the restriction map
$$G \to \operatorname{Gal}(K/F)$$
has kernel $\operatorname{Gal}(L/K)$.
Theorem 15 (Galois correspondence for finite subextensions). Let $G, L, F$ be as before.
- Every open subgroup $H \leq G$ is closed and has finite index in $G$.
- If $H \leq G$ is an open subgroup, then the field extension $L^H/F$ is finite.
- For every intermediate field $K$ with $[K : F]$ finite, the Galois group $\operatorname{Gal}(L/K)$ is an open subgroup of $G$.
Version: 2 Owner: djao Author(s): djao

163.12

normal closure

Let $K$ be an extension field of $F$. A normal closure of $K/F$ is a field $L \supseteq K$ that is a normal extension of $F$ and is minimal in that respect, i.e. no proper subfield of $L$ containing $K$ is normal over $F$. If $K$ is an algebraic extension of $F$, then a normal closure for $K/F$ exists and is unique up to isomorphism.
Version: 2 Owner: scanez Author(s): scanez

163.13

normal extension

A field extension $K/F$ is normal if every irreducible polynomial $f \in F[x]$ which has at least one root in $K$ splits (factors into a product of linear factors) in $K[x]$.
An extension $K/F$ of finite degree is normal if and only if there exists a polynomial $p \in F[x]$ such that $K$ is the splitting field for $p$ over $F$.
Version: 5 Owner: djao Author(s): djao

163.14

perfect field

A perfect field is a field K such that any algebraic extension field L/K is separable over K.
All fields of characteristic 0 are perfect, so in particular the fields R, C and Q are perfect.
Version: 3 Owner: sleske Author(s): sleske

163.15

radical extension

A radical tower is a field extension $L/F$ which has a filtration
$$F = L_0 \subseteq L_1 \subseteq \cdots \subseteq L_n = L$$
where for each $i$, $0 \leq i < n$, there exists an element $\alpha_i \in L_{i+1}$ and a natural number $n_i$ such that $L_{i+1} = L_i(\alpha_i)$ and $\alpha_i^{n_i} \in L_i$.
A radical extension is a field extension $K/F$ for which there exists a radical tower $L/F$ with $L \supseteq K$. The notion of radical extension coincides with the informal concept of solving for the roots of a polynomial by radicals, in the sense that a polynomial over $F$ is solvable by radicals if and only if its splitting field is a radical extension of $F$.
Version: 3 Owner: djao Author(s): djao

163.16

separable

An irreducible polynomial $f \in F[x]$ with coefficients in a field $F$ is separable if $f$ factors into distinct linear factors over a splitting field $K$ of $f$.
A polynomial $g$ with coefficients in $F$ is separable if each irreducible factor of $g$ in $F[x]$ is a separable polynomial.
A field extension $K/F$ is separable if, for each $a \in K$, the minimal polynomial of $a$ over $F$ is separable. When $F$ has characteristic zero, every extension is separable; examples of inseparable extensions include the quotient field $K(u)[t]/(t^p - u)$ over the field $K(u)$ of rational functions in one variable, where $K$ has characteristic $p > 0$.

Version: 5 Owner: djao Author(s): djao

163.17

separable closure

Let K be a field and let L be an algebraic closure of K. The separable closure of K inside L
is the compositum of all finite separable extensions of K contained in L (that is to say, the
smallest subfield of L that contains every finite separable extension of K).
Version: 2 Owner: djao Author(s): djao


Chapter 164
12F20 Transcendental extensions
164.1

transcendence degree

The transcendence degree of a set $S$ over a field $K$, denoted $T_S$, is the size of a maximal subset $S'$ of $S$ such that all the elements of $S'$ are algebraically independent.
The transcendence degree of a field extension $L$ over $K$ is the transcendence degree of a minimal subset of $L$ needed to generate $L$ over $K$.
Heuristically speaking, the transcendence degree of a finite set $S$ is obtained by taking the number of elements in the set, subtracting the number of algebraic elements in that set, and then subtracting the number of algebraic relations between distinct pairs of elements in $S$.

Example 4 (Computing the Transcendence Degree). The set $S = \{\sqrt{7}, \pi, \pi^2, e\}$ has $T_S \leq 2$, since there are four elements, $\sqrt{7}$ is algebraic, and the polynomial $f(x, y) = x^2 - y$ gives an algebraic dependence between $\pi$ and $\pi^2$ (i.e. $(\pi, \pi^2)$ is a root of $f$), giving $T_S \leq 4 - 1 - 1 = 2$. If we assume the conjecture that $e$ and $\pi$ are algebraically independent, then no more dependencies can exist, and we can conclude that, in fact, $T_S = 2$.
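The two reductions in the example can be checked with sympy (an added sketch, not part of the entry): $\sqrt{7}$ is algebraic because it has a minimal polynomial over $\mathbb{Q}$, and $f(x, y) = x^2 - y$ vanishes at $(\pi, \pi^2)$:

```python
from sympy import sqrt, pi, Symbol, minimal_polynomial

x, y = Symbol('x'), Symbol('y')

# sqrt(7) is algebraic over Q: it is a root of x**2 - 7.
m = minimal_polynomial(sqrt(7), x)
# pi and pi**2 are algebraically dependent: f(x, y) = x**2 - y vanishes at (pi, pi**2).
f = x**2 - y
dependence = f.subs({x: pi, y: pi**2})
print(m, dependence, sep='; ')
```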
Version: 2 Owner: mathcam Author(s): mathcam


Chapter 165
12F99 Miscellaneous
165.1

composite field

Let $\{K_\alpha\}$, $\alpha \in J$, be a collection of subfields of a field $L$. The composite field of the collection is the smallest subfield of $L$ that contains all the fields $K_\alpha$.
The notation $K_1 K_2$ (resp., $K_1 K_2 \cdots K_n$) is often used to denote the composite field of two (resp., finitely many) fields.
Version: 2 Owner: djao Author(s): djao

165.2

extension field

We say that a field $K$ is an extension of $F$ if $F$ is a subfield of $K$.
We usually denote that $K$ is an extension of $F$ by $F \subseteq K$, $F \subset K$, $K/F$, or $\frac{K}{F}$.
If $K$ is an extension of $F$, we can regard $K$ as a vector space over $F$. The dimension of this space (which could possibly be infinite) is denoted $[K : F]$, and called the degree of the extension.¹

¹ The term degree reflects the fact that, in the more general setting of Dedekind domains and scheme-theoretic algebraic curves, the degree of an extension of function fields equals the algebraic degree of the polynomial defining the projection map of the underlying curves.

One of the classic theorems on extensions states that if $F \subseteq K \subseteq L$, then
$$[L : F] = [L : K]\,[K : F]$$
(in other words, degrees are multiplicative in towers).
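Multiplicativity of degrees can be illustrated numerically with sympy (an added sketch; the tower is $\mathbb{Q} \subseteq \mathbb{Q}(2^{1/3}) \subseteq \mathbb{Q}(2^{1/6})$): since $2^{1/6}$ satisfies $t^2 - 2^{1/3}$ over the middle field, the top step has degree 2, and $6 = 2 \cdot 3$:

```python
from sympy import Rational, Symbol, minimal_polynomial, degree

t = Symbol('t')
cbrt2 = 2 ** Rational(1, 3)        # generates K = Q(2**(1/3))
sixth_root2 = 2 ** Rational(1, 6)  # generates L = Q(2**(1/6)); K is a subfield of L

deg_K_over_Q = degree(minimal_polynomial(cbrt2, t), t)        # [K : Q] = 3
deg_L_over_Q = degree(minimal_polynomial(sixth_root2, t), t)  # [L : Q] = 6

# [L : Q] = [L : K] * [K : Q]; here [L : K] = 2 because 2**(1/6) is a root of
# t**2 - 2**(1/3) over K, so 6 = 2 * 3.
print(deg_K_over_Q, deg_L_over_Q)
```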
Version: 4 Owner: drini Author(s): drini, djao


Chapter 166
12J15 Ordered fields
166.1

ordered field

An ordered field is an ordered ring which is a field.


Version: 2 Owner: djao Author(s): djao


Chapter 167
13-00 General reference works (handbooks, dictionaries, bibliographies, etc.)
167.1

absolute value

Let $R$ be an ordered ring and let $a \in R$. The absolute value of $a$ is defined to be the function $|\cdot| \colon R \to R$ given by
$$|a| := \begin{cases} a & \text{if } a \geq 0, \\ -a & \text{otherwise.} \end{cases}$$
In particular, the usual absolute value $|\cdot|$ on the field $\mathbb{R}$ of real numbers is defined in this manner.
Absolute value has a different meaning in the case of complex numbers: for a complex number $z \in \mathbb{C}$, the absolute value $|z|$ of $z$ is defined to be $\sqrt{x^2 + y^2}$, where $z = x + yi$ and $x, y \in \mathbb{R}$ are real.
All absolute value functions satisfy the defining properties of a valuation, including:
- $|a| \geq 0$ for all $a \in R$, with equality if and only if $a = 0$,
- $|ab| = |a| \cdot |b|$ for all $a, b \in R$,
- $|a + b| \leq |a| + |b|$ for all $a, b \in R$ (triangle inequality).
However, in general they are not literally valuations, because valuations are required to be real valued. In the case of $\mathbb{R}$ and $\mathbb{C}$, the absolute value is a valuation, and it induces a metric in the usual way, with distance function defined by $d(x, y) := |x - y|$.
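Python's built-in `abs` implements both definitions (the piecewise one on the reals and $\sqrt{x^2 + y^2}$ on complex numbers), so the valuation properties above can be spot-checked on a few arbitrary sample points (an added illustration):

```python
samples = [3.5, -2.0, 0.0, 3 + 4j, -1 - 1j, 2j]

for a in samples:
    assert abs(a) >= 0                                   # |a| >= 0
    for b in samples:
        assert abs(abs(a * b) - abs(a) * abs(b)) < 1e-9  # |ab| = |a||b| (up to rounding)
        assert abs(a + b) <= abs(a) + abs(b) + 1e-12     # triangle inequality

print(abs(-7), abs(3 + 4j))  # |.| on R and C: 7 and 5.0
```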

Version: 4 Owner: djao Author(s): djao, rmilson

167.2

associates

Let $a, b$ be elements of a ring such that $a = bu$, where $u$ is a unit. Then we say that $a$ and $b$ are associates.
Being associates is an equivalence relation on the ring.
In an integral domain: $a$ and $b$ are associates if and only if $(a) = (b)$, where $(x)$ denotes the principal ideal generated by $x$.

167.3

cancellation ring

A ring $R$ is a cancellation ring if for all $a, b \in R$, $a \cdot b = 0$ implies that either $a = 0$ or $b = 0$.


Version: 3 Owner: djao Author(s): djao, rmilson

167.4

comaximal

Let $R$ be a ring. Two ideals $I, J$ of $R$ are comaximal if $I + J = R$ (i.e. if $1 = a + b$ for some $a \in I$, $b \in J$).
Two distinct maximal ideals are comaximal.
Two distinct maximal ideals are comaximal.
Version: 2 Owner: saforres Author(s): saforres

167.5

every prime ideal is radical

Let $R$ be a commutative ring and let $P$ be a prime ideal of $R$.

Proposition 8. Every prime ideal $P$ of $R$ is a radical ideal, i.e.
$$P = \operatorname{Rad}(P).$$
Recall that $P \subsetneq R$ is a prime ideal if and only if for any $a, b \in R$,
$$a \cdot b \in P \Rightarrow a \in P \text{ or } b \in P.$$
Also, recall that
$$\operatorname{Rad}(P) = \{r \in R \mid \exists n \in \mathbb{N} \text{ such that } r^n \in P\}.$$
Obviously, we have $P \subseteq \operatorname{Rad}(P)$ (just take $n = 1$), so it remains to show the reverse inclusion.
Suppose $r \in \operatorname{Rad}(P)$, so there exists some $n \in \mathbb{N}$ such that $r^n \in P$. We want to prove that $r$ must be an element of the prime ideal $P$. For this, we use induction on $n$ to prove the following proposition:
For all $n \in \mathbb{N}$ and for all $r \in R$, $r^n \in P \Rightarrow r \in P$.
Case $n = 1$: This is clear, $r \in P \Rightarrow r \in P$.
Case $n \Rightarrow$ Case $n + 1$: Suppose we have proved the proposition for the case $n$, so our induction hypothesis is
$$\forall r \in R, \quad r^n \in P \Rightarrow r \in P,$$
and suppose $r^{n+1} \in P$. Then
$$r \cdot r^n = r^{n+1} \in P,$$
and since $P$ is a prime ideal we have
$$r \in P \quad \text{or} \quad r^n \in P.$$
Thus we conclude, either directly or using the induction hypothesis, that $r \in P$, as desired.
Version: 2 Owner: alozano Author(s): alozano

167.6

module

Let $R$ be a ring with identity. A left module $M$ over $R$ is a set with two binary operations, $+ \colon M \times M \to M$ and $\cdot \colon R \times M \to M$, such that
1. $(a + b) + c = a + (b + c)$ for all $a, b, c \in M$
2. $a + b = b + a$ for all $a, b \in M$
3. There exists an element $0 \in M$ such that $a + 0 = a$ for all $a \in M$
4. For any $a \in M$, there exists an element $b \in M$ such that $a + b = 0$
5. $r_1 \cdot (r_2 \cdot m) = (r_1 \cdot r_2) \cdot m$ for all $r_1, r_2 \in R$ and $m \in M$
6. $1 \cdot m = m$ for all $m \in M$
7. $r \cdot (m + n) = (r \cdot m) + (r \cdot n)$ for all $r \in R$ and $m, n \in M$
8. $(r_1 + r_2) \cdot m = (r_1 \cdot m) + (r_2 \cdot m)$ for all $r_1, r_2 \in R$ and $m \in M$
A right module is defined analogously, except that the scalar multiplication is a function from $M \times R$ to $M$. If $R$ is commutative, there is an equivalence of categories between the category of left $R$-modules and the category of right $R$-modules.
Version: 4 Owner: djao Author(s): djao

167.7

radical of an ideal

Let $R$ be a commutative ring. For any ideal $I$ of $R$, the radical of $I$, written $\operatorname{Rad}(I)$, is the set
$$\operatorname{Rad}(I) = \{a \in R : a^n \in I \text{ for some integer } n > 0\}.$$
The radical of an ideal $I$ is always an ideal of $R$.
If $I = \operatorname{Rad}(I)$, then $I$ is called a radical ideal.
Every prime ideal is a radical ideal. If $I$ is a radical ideal, the quotient ring $R/I$ is a ring with no nonzero nilpotent elements.
Version: 10 Owner: saforres Author(s): saforres

167.8

ring

A ring is a set $R$ together with two binary operations, denoted $+ \colon R \times R \to R$ and $\cdot \colon R \times R \to R$, such that
1. $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ for all $a, b, c \in R$ (associative law)
2. $a + b = b + a$ for all $a, b \in R$ (commutative law)
3. There exists an element $0 \in R$ such that $a + 0 = a$ for all $a \in R$ (additive identity)
4. For all $a \in R$, there exists $b \in R$ such that $a + b = 0$ (additive inverse)
5. $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$ and $(a + b) \cdot c = (a \cdot c) + (b \cdot c)$ for all $a, b, c \in R$ (distributive law)
Equivalently, a ring is an abelian group $(R, +)$ together with a second binary operation $\cdot$ such that $\cdot$ is associative and distributes over $+$.
We say $R$ has a multiplicative identity if there exists an element $1 \in R$ such that $a \cdot 1 = 1 \cdot a = a$ for all $a \in R$. We say $R$ is commutative if $a \cdot b = b \cdot a$ for all $a, b \in R$.
Every element $a$ in a ring has a unique additive inverse, denoted $-a$. The subtraction operation in a ring is defined by the equation $a - b := a + (-b)$.
Version: 6 Owner: djao Author(s): djao

167.9

subring

Let $(A, +, \cdot)$ be a ring. A subring is a subset $S$ of $A$, with the operations $+$ and $\cdot$ of $A$ restricted to $S$, such that $S$ is a ring by itself.
Since the restricted operations inherit the associativity of $\cdot$, the commutativity of $+$, etc., usually only closure has to be checked.
A subring $S$ is called an ideal if whenever $s \in S$ and $a \in A$, it happens that $sa \in S$. In ring theory, ideals are far more important than subrings (since they play a role analogous to that of normal subgroups for groups).
Example: Consider the ring $(\mathbb{Z}, +, \cdot)$. Then $(2\mathbb{Z}, +, \cdot)$ is a subring, since the sum or product of two even numbers is again an even number.
Version: 2 Owner: drini Author(s): drini

167.10

tensor product

Summary. The tensor product is a formal bilinear multiplication of two modules or vector spaces. In essence, it permits us to replace bilinear maps from two such objects by equivalent linear maps from the tensor product of the two objects. The origin of this operation lies in classic differential geometry and physics, which had need of multiply indexed geometric objects such as the first and second fundamental forms, and the stress tensor; see Tensor Product (Classical).

Definition (Standard). Let $R$ be a commutative ring, and let $A, B$ be $R$-modules. There exists an $R$-module $A \otimes B$, called the tensor product of $A$ and $B$ over $R$, together with a canonical bilinear homomorphism
$$\otimes \colon A \times B \to A \otimes B,$$
distinguished, up to isomorphism, by the following universal property. Every bilinear $R$-module homomorphism
$$\phi \colon A \times B \to C$$
lifts to a unique $R$-module homomorphism
$$\tilde{\phi} \colon A \otimes B \to C$$
such that
$$\phi(a, b) = \tilde{\phi}(a \otimes b)$$
for all $a \in A$, $b \in B$. Diagrammatically, $\phi$ factors through $A \otimes B$:
$$A \times B \xrightarrow{\;\otimes\;} A \otimes B \xrightarrow{\;\tilde{\phi}\;} C.$$
The tensor product $A \otimes B$ can be constructed by taking the free $R$-module generated by all formal symbols
$$a \otimes b, \qquad a \in A,\ b \in B,$$
and quotienting by the obvious bilinear relations:
$$(a_1 + a_2) \otimes b = a_1 \otimes b + a_2 \otimes b, \qquad a_1, a_2 \in A,\ b \in B,$$
$$a \otimes (b_1 + b_2) = a \otimes b_1 + a \otimes b_2, \qquad a \in A,\ b_1, b_2 \in B,$$
$$r(a \otimes b) = (ra) \otimes b = a \otimes (rb), \qquad a \in A,\ b \in B,\ r \in R.$$

Definition (Categorical). Using the language of categories, all of the above can be expressed quite simply by stating that for all $R$-modules $M$, the functor $(-) \otimes M$ is left-adjoint to the functor $\operatorname{Hom}(M, -)$.
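For finite-dimensional vector spaces with chosen bases, $a \otimes b$ can be modeled concretely by the Kronecker product of coordinate vectors, and the defining bilinear relations checked directly (an added sketch in plain Python; the helper names and sample vectors are ours, not the entry's):

```python
def kron(u, v):
    """Coordinate model of u ⊗ v: R^m x R^n -> R^(m*n), with entries u_i * v_j."""
    return [ui * vj for ui in u for vj in v]

def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

def scale(r, u):
    return [r * ui for ui in u]

a1, a2 = [1.0, 2.0], [0.0, -1.0]
b1, b2 = [3.0, 1.0, 4.0], [1.0, 0.0, 2.0]
r = 2.5

# The three defining bilinear relations of the tensor product:
assert kron(add(a1, a2), b1) == add(kron(a1, b1), kron(a2, b1))  # (a1+a2) ⊗ b
assert kron(a1, add(b1, b2)) == add(kron(a1, b1), kron(a1, b2))  # a ⊗ (b1+b2)
assert scale(r, kron(a1, b1)) == kron(scale(r, a1), b1) == kron(a1, scale(r, b1))
print("bilinear relations hold in the coordinate model")
```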
Version: 4 Owner: rmilson Author(s): rmilson


Chapter 168
13-XX Commutative rings and algebras
168.1

commutative ring

Let $(R, +, \cdot)$ be a ring. Since $(R, +)$ is required to be an abelian group, the operation $+$ is necessarily commutative. This need not happen for $\cdot$. Rings in which $\cdot$ is commutative, that is, $x \cdot y = y \cdot x$ for all $x, y \in R$, are called commutative rings.
Version: 2 Owner: drini Author(s): drini, apmxi


Chapter 169
13A02 Graded rings
169.1

graded ring

Let $G$ be an abelian group. A $G$-graded ring is a ring $R$ with a direct sum decomposition $R = \bigoplus_{g \in G} R_g$, indexed by $G$, with the property that $R_g R_h \subseteq R_{gh}$.
Version: 8 Owner: KimJ Author(s): KimJ


Chapter 170
13A05 Divisibility
170.1

Eisenstein criterion

theorem:
Let $f$ be a primitive polynomial over a unique factorization domain $R$, say
$$f(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n.$$
If $R$ has an irreducible element $p$ such that
$$p \mid a_m \ (0 \leq m < n), \qquad p \nmid a_n, \qquad p^2 \nmid a_0,$$
then $f$ is irreducible.
proof:
Suppose
$$f = (b_0 + \ldots + b_s x^s)(c_0 + \ldots + c_t x^t)$$
where $s > 0$ and $t > 0$. Since $a_0 = b_0 c_0$ and $p^2 \nmid a_0$, we know that $p$ divides one but not both of $b_0$ and $c_0$; suppose $p \mid c_0$. Since $p \nmid a_n = b_s c_t$, not all the $c_m$ are divisible by $p$; let $k$ be the smallest index such that $p \nmid c_k$. We have $a_k = b_0 c_k + b_1 c_{k-1} + \ldots + b_k c_0$. We also have $p \mid a_k$ (note that $k \leq t < n$), and $p$ divides every summand except one on the right side, which yields a contradiction. QED
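The three divisibility conditions are easy to check mechanically over $R = \mathbb{Z}$; the helper below is an illustrative addition, not part of the original entry:

```python
def eisenstein_applies(coeffs, p):
    """Test Eisenstein's criterion at the prime p for the integer polynomial
    a0 + a1*x + ... + an*x**n, given as the list [a0, a1, ..., an]:
    p | a_m for all m < n,  p does not divide a_n,  p**2 does not divide a0."""
    a0, an = coeffs[0], coeffs[-1]
    return (all(a % p == 0 for a in coeffs[:-1])
            and an % p != 0
            and a0 % (p * p) != 0)

# x**4 + 10*x + 5 is irreducible over Q: Eisenstein applies at p = 5.
assert eisenstein_applies([5, 10, 0, 0, 1], p=5)
# x**2 - 4 = (x - 2)*(x + 2): the criterion rightly fails at p = 2, since 2**2 | 4.
assert not eisenstein_applies([-4, 0, 1], p=2)
print("Eisenstein examples behave as expected")
```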
Version: 5 Owner: Daume Author(s): Daume, Larry Hammick, nerdy2


Chapter 171
13A10 Radical theory
171.1

Hilbert's Nullstellensatz

Let $K$ be an algebraically closed field, and let $I$ be an ideal in $K[x_1, \ldots, x_n]$, the polynomial ring in $n$ indeterminates.
Define $V(I)$, the zero set of $I$, by
$$V(I) = \{(a_1, \ldots, a_n) \in K^n \mid f(a_1, \ldots, a_n) = 0 \text{ for all } f \in I\}.$$
Weak Nullstellensatz:
If $V(I) = \emptyset$, then $I = K[x_1, \ldots, x_n]$. In other words, the zero set of any proper ideal of $K[x_1, \ldots, x_n]$ is nonempty.
Hilbert's (Strong) Nullstellensatz:
Suppose $f \in K[x_1, \ldots, x_n]$ satisfies $f(a_1, \ldots, a_n) = 0$ for every $(a_1, \ldots, a_n) \in V(I)$. Then $f^r \in I$ for some integer $r > 0$.
In the language of algebraic geometry, the latter result is equivalent to the statement that $\operatorname{Rad}(I) = I(V(I))$.
Version: 3 Owner: saforres Author(s): saforres

171.2

nilradical

Let $R$ be a commutative ring. An element $x \in R$ is said to be nilpotent if $x^n = 0$ for some positive integer $n$. The set of all nilpotent elements of $R$ is an ideal of $R$, called the nilradical of $R$ and denoted $\operatorname{Nil}(R)$. The nilradical is so named because it is the radical of the zero ideal.
The nilradical of $R$ equals the prime radical of $R$, although proving that the two are equal requires the axiom of choice.
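For a finite ring such as $\mathbb{Z}/n\mathbb{Z}$ the nilradical can be found by brute force; for example, in $\mathbb{Z}/8\mathbb{Z}$ the nilpotent elements are exactly the even residues, the ideal generated by $2$ (an added illustration; the helper name is ours):

```python
def nilradical_mod(n):
    """Nilpotent elements of Z/nZ: the x with x**k congruent to 0 mod n for
    some k >= 1. Taking k = n is always a large enough exponent to test."""
    return {x for x in range(n) if pow(x, n, n) == 0}

print(nilradical_mod(8))   # the ideal generated by 2 in Z/8Z: {0, 2, 4, 6}
print(nilradical_mod(12))  # the ideal generated by 6 in Z/12Z: {0, 6}
print(nilradical_mod(7))   # Z/7Z is a field, so the nilradical is {0}
```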
Version: 1 Owner: djao Author(s): djao

171.3

radical of an integer

Given a natural number $n$, let $n = p_1^{a_1} \cdots p_k^{a_k}$ be its unique factorization as a product of prime powers. Define the radical of $n$, denoted $\operatorname{rad}(n)$, to be the product $p_1 \cdots p_k$. This is the square-free part of the integer, and thus the radical of a square-free number is the number itself.
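A direct computation (an added sketch using sympy's integer factorization; the helper name is ours):

```python
from math import prod
from sympy import factorint

def rad(n):
    """rad(n): product of the distinct primes dividing n (rad(1) = 1, an empty product)."""
    return prod(factorint(n))  # factorint(n) maps each prime factor to its exponent

print(rad(360), rad(12), rad(30))  # 360 = 2**3 * 3**2 * 5 -> 30; 12 -> 6; 30 is square-free
```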
Version: 4 Owner: KimJ Author(s): KimJ


Chapter 172
13A15 Ideals; multiplicative ideal theory
172.1

contracted ideal

Let $f \colon A \to B$ be a ring homomorphism. Let $\mathfrak{b}$ be an ideal in $B$. Then it is easy to show that the inverse image of $\mathfrak{b}$, that is $f^{-1}(\mathfrak{b})$, is an ideal in $A$, and we call it a contracted ideal.
A common notation for the contracted ideal in this case is $\mathfrak{b}^c$.
Version: 2 Owner: dublisk Author(s): dublisk

172.2

existence of maximal ideals

Let $R \neq 0$ be a commutative ring with identity. Is there a maximal ideal in $R$? This simple property turns out to be dependent on the axiom of choice. Assuming Zorn's lemma, which is equivalent to the axiom of choice, we are able to prove the following:
Proposition 9. Every ring $R$ (as above) has a maximal ideal.
Let $\Sigma$ be the partially ordered set
$$\Sigma = \{A \mid A \text{ is an ideal of } R,\ A \neq R\}$$
ordered by inclusion.
Since $0 \in R$, the ideal generated by $0$ satisfies $(0) \in \Sigma$, because $(0) \neq R$. Hence $\Sigma$ is non-empty.
In order to apply Zorn's lemma we need to prove that every chain in $\Sigma$ has an upper bound that belongs to $\Sigma$. Let $\{A_\alpha\}$ be a chain of ideals in $\Sigma$, so for all indices $\alpha, \beta$ we have
$$A_\alpha \subseteq A_\beta \quad \text{or} \quad A_\beta \subseteq A_\alpha.$$
We claim that $B$, defined by
$$B = \bigcup_\alpha A_\alpha,$$
is such an upper bound.
$B$ is an ideal. Indeed, let $a, b \in B$, so there exist $\alpha, \beta$ such that $a \in A_\alpha$, $b \in A_\beta$. Since these two ideals are in a totally ordered chain we have
$$A_\alpha \subseteq A_\beta \quad \text{or} \quad A_\beta \subseteq A_\alpha.$$
Without loss of generality, we assume $A_\alpha \subseteq A_\beta$. Then both $a, b \in A_\beta$, and $A_\beta$ is an ideal of the ring $R$. Thus $a + b \in A_\beta \subseteq B$.
Similarly, let $r \in R$ and $b \in B$. As above, there exists $\alpha$ such that $b \in A_\alpha$. Since $A_\alpha$ is an ideal we have
$$r \cdot b \in A_\alpha \subseteq B.$$
Therefore, $B$ is an ideal.
Moreover $B \neq R$: otherwise $1$ would belong to $B$, so there would be an $\alpha$ such that $1 \in A_\alpha$, giving $A_\alpha = R$. But this is impossible because we assumed $A_\alpha \in \Sigma$ for all indices $\alpha$.
Therefore $B \in \Sigma$. Hence every chain in $\Sigma$ has an upper bound in $\Sigma$, and we can apply Zorn's lemma to deduce the existence of $M$, a maximal element (with respect to inclusion) in $\Sigma$. By definition of the set $\Sigma$, $M$ must be a maximal ideal in $R$.
NOTE: Assuming that the axiom of choice does NOT hold, mathematicians have shown the existence of commutative rings (with $1$) that have no maximal ideals.
Version: 4 Owner: alozano Author(s): alozano

172.3

extended ideal

Let $f \colon A \to B$ be a ring map, and let $\mathfrak{a}$ be an ideal in $A$. We can look at the ideal generated by the image of $\mathfrak{a}$, which is called an extended ideal and is denoted by $\mathfrak{a}^e$.
It is not true in general that the image of $\mathfrak{a}$ under $f$ will itself be an ideal in $B$. (For example, consider the embedding $f \colon \mathbb{Z} \to \mathbb{Q}$. The image of the ideal $(2) \subseteq \mathbb{Z}$ is not an ideal in $\mathbb{Q}$, since the only ideals in $\mathbb{Q}$ are $\{0\}$ and all of $\mathbb{Q}$.)
Version: 3 Owner: dublisk Author(s): dublisk


172.4

fractional ideal

172.4.1

Basics

Let $A$ be an integral domain with field of fractions $K$. Then $K$ is an $A$-module, and we define a fractional ideal of $A$ to be a submodule of $K$ which is finitely generated as an $A$-module.
The product of two fractional ideals $\mathfrak{a}$ and $\mathfrak{b}$ of $A$ is defined to be the submodule of $K$ generated by all the products $x \cdot y \in K$, for $x \in \mathfrak{a}$ and $y \in \mathfrak{b}$. This product is denoted $\mathfrak{a} \cdot \mathfrak{b}$, and it is always a fractional ideal of $A$ as well. Note that, if $A$ itself is considered as a fractional ideal of $A$, then $\mathfrak{a} \cdot A = \mathfrak{a}$. Accordingly, the set of fractional ideals is always a monoid under this product operation, with identity element $A$.
We say that a fractional ideal $\mathfrak{a}$ is invertible if there exists a fractional ideal $\mathfrak{a}'$ such that $\mathfrak{a} \cdot \mathfrak{a}' = A$. It can be shown that if $\mathfrak{a}$ is invertible, then its inverse must be $\mathfrak{a}' = (A : \mathfrak{a})$, the annihilator¹ of $\mathfrak{a}$ in $A$.

172.4.2

Fractional ideals in Dedekind domains

We now suppose that $A$ is a Dedekind domain. In this case, every nonzero fractional ideal is invertible, and consequently the nonzero fractional ideals in $A$ form a group under ideal multiplication, called the ideal group of $A$.
The unique factorization of ideals theorem states that every fractional ideal in $A$ factors uniquely into a finite product of prime ideals of $A$ and their (fractional ideal) inverses. It follows that the ideal group of $A$ is freely generated as an abelian group by the nonzero prime ideals of $A$.
A fractional ideal of $A$ is said to be principal if it is generated as an $A$-module by a single element. The set of nonzero principal fractional ideals is a subgroup of the ideal group of $A$, and the quotient group of the ideal group of $A$ by the subgroup of principal fractional ideals is nothing other than the ideal class group of $A$.

¹ In general, for any fractional ideals $\mathfrak{a}$ and $\mathfrak{b}$, the annihilator of $\mathfrak{b}$ in $\mathfrak{a}$ is the fractional ideal $(\mathfrak{a} : \mathfrak{b})$ consisting of all $x \in K$ such that $x\mathfrak{b} \subseteq \mathfrak{a}$.

Version: 2 Owner: djao Author(s): djao


172.5

homogeneous ideal

An ideal generated by homogeneous elements is said to be homogeneous. The most natural example occurs in the polynomial ring $K[x_1, x_2, \ldots, x_n]$, where $K$ is a field: an ideal there is homogeneous if it is generated by polynomials, each of which is homogeneous.
Version: 4 Owner: KimJ Author(s): KimJ

172.6

ideal

Let $R$ be a ring. A left ideal (resp., right ideal) $I$ of $R$ is a nonempty subset $I \subseteq R$ such that:
- $a + b \in I$ for all $a, b \in I$,
- $r \cdot a \in I$ (resp. $a \cdot r \in I$) for all $a \in I$ and $r \in R$.
A 2-sided ideal is a left ideal $I$ which is also a right ideal. If $R$ is a commutative ring, then these three notions of ideal are equivalent.
Version: 9 Owner: djao Author(s): djao

172.7

maximal ideal

Let $R$ be a ring with identity. A proper left (right, two-sided) ideal $\mathfrak{m} \subsetneq R$ is said to be maximal if $\mathfrak{m}$ is not a proper subset of any other proper left (right, two-sided) ideal of $R$. One can prove:
- A left ideal $\mathfrak{m}$ is maximal if and only if $R/\mathfrak{m}$ is a simple left $R$-module.
- A right ideal $\mathfrak{m}$ is maximal if and only if $R/\mathfrak{m}$ is a simple right $R$-module.
- A two-sided ideal $\mathfrak{m}$ is maximal if and only if $R/\mathfrak{m}$ is a simple ring.
All maximal ideals are prime ideals. If $R$ is commutative, an ideal $\mathfrak{m} \subseteq R$ is maximal if and only if the quotient ring $R/\mathfrak{m}$ is a field.
Version: 3 Owner: djao Author(s): djao


172.8

principal ideal

Let $R$ be a ring and let $a \in R$. The principal left (resp. right, 2-sided) ideal generated by $a$ is the smallest left (resp. right, 2-sided) ideal of $R$ containing the element $a$.
When $R$ is a commutative ring, the principal ideal of $a$ is denoted $(a)$.
Version: 2 Owner: djao Author(s): djao

172.9

the set of prime ideals of a commutative ring with identity

The set of prime ideals of a commutative ring with identity is called the spectrum of the ring, and is denoted $\operatorname{Spec}(R)$.
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 4 Owner: bwebste Author(s): yark, apmxi


Chapter 173
13A50 Actions of groups on commutative rings; invariant theory
173.1

Schwarz (1975) theorem

theorem:
Let $\Gamma$ be a compact Lie group acting on $V$. Let $u_1, \ldots, u_s$ be a Hilbert basis for the $\Gamma$-invariant polynomials $\mathcal{P}(\Gamma)$ (see Hilbert-Weyl theorem). Let $f \in \mathcal{E}(\Gamma)$. Then there exists a smooth germ $h \in \mathcal{E}_s$ (the ring of $C^\infty$ germs $\mathbb{R}^s \to \mathbb{R}$) such that $f(x) = h(u_1(x), \ldots, u_s(x))$. [GSS]
proof:
The proof is shown on page 58 of [GSS].
theorem: (as stated by Gerald W. Schwarz)
Let $G$ be a compact Lie group acting orthogonally on $\mathbb{R}^n$, let $\sigma_1, \ldots, \sigma_k$ be generators of $\mathcal{P}(\mathbb{R}^n)^G$ (the set of $G$-invariant polynomials on $\mathbb{R}^n$), and let $\sigma = (\sigma_1, \ldots, \sigma_k) \colon \mathbb{R}^n \to \mathbb{R}^k$. Then $\sigma^* \mathcal{E}(\mathbb{R}^k) = \mathcal{E}(\mathbb{R}^n)^G$. [SG]
proof:
The proof is shown in the following publication [SG].


REFERENCES
[GSS] Golubitsky, Martin; Stewart, Ian; Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.
[SG] Schwarz, Gerald W.: Smooth Functions Invariant Under the Action of a Compact Lie Group, Topology Vol. 14, pp. 63-68, 1975.

Version: 3 Owner: Daume Author(s): Daume

173.2

invariant polynomial

An invariant polynomial is a polynomial P that is invariant under a (compact) Lie group Γ
acting on a vector space V. That is, P is a Γ-invariant polynomial if P(γx) = P(x) for all γ ∈ Γ
and all x ∈ V.

REFERENCES
[GSS] Golubitsky, Martin; Stewart, Ian; Schaeffer, David G.: Singularities and Groups in Bifurcation Theory (Volume II). Springer-Verlag, New York, 1988.

Version: 1 Owner: Daume Author(s): Daume


Chapter 174
13A99 Miscellaneous
174.1

Lagrange's identity

Let R be a commutative ring, and let a_1, …, a_n, b_1, …, b_n be arbitrary elements of R. Then

$$\left(\sum_{k=1}^{n} a_k b_k\right)^2 = \left(\sum_{k=1}^{n} a_k^2\right)\left(\sum_{k=1}^{n} b_k^2\right) - \sum_{1 \le k < i \le n} (a_k b_i - a_i b_k)^2.$$

Proof:
The ring R where we take the x_i, y_i (i = 1, …, n) from is commutative, so we can apply the
binomial formula. We start out with

$$\left(\sum_{i=1}^{n} x_i y_i\right)^2 = \sum_{i=1}^{n} x_i^2 y_i^2 + \sum_{\substack{i,j=1 \\ i \neq j}}^{n} x_i y_i x_j y_j. \tag{174.1.1}$$

Using the binomial theorem, we see that

$$(x_i y_j - x_j y_i)^2 = x_i^2 y_j^2 - 2 x_i x_j y_i y_j + x_j^2 y_i^2.$$

Note that changing the roles of i and j in x_i y_j − x_j y_i, we get

$$x_j y_i - x_i y_j = -(x_i y_j - x_j y_i),$$

but this doesn't matter when we square. Summing over the unordered pairs therefore gives

$$\sum_{1 \le i < j \le n} (x_i y_j - x_j y_i)^2 = \sum_{1 \le i < j \le n} \left(x_i^2 y_j^2 + x_j^2 y_i^2\right) - \sum_{\substack{i,j=1 \\ i \neq j}}^{n} x_i x_j y_i y_j. \tag{174.1.2}$$

Adding equations 174.1.1 and 174.1.2, the mixed terms cancel and we get

$$\left(\sum_{i=1}^{n} x_i y_i\right)^2 + \sum_{1 \le i < j \le n} (x_i y_j - x_j y_i)^2 = \sum_{i=1}^{n} x_i^2 y_i^2 + \sum_{\substack{i,j=1 \\ i \neq j}}^{n} x_i^2 y_j^2 = \left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i^2\right). \tag{174.1.3}$$

This is equivalent to the stated identity.
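Since both sides are polynomial identities with integer coefficients, the statement can be sanity-checked numerically over Z; the following Python sketch (the function name is ours) compares both sides on random integer vectors:

```python
from itertools import combinations
from random import randint, seed

def lagrange_identity_holds(a, b):
    """Check (sum a_k b_k)^2 == (sum a_k^2)(sum b_k^2) - sum_{k<i} (a_k b_i - a_i b_k)^2."""
    n = len(a)
    lhs = sum(a[k] * b[k] for k in range(n)) ** 2
    cross = sum((a[k] * b[i] - a[i] * b[k]) ** 2 for k, i in combinations(range(n), 2))
    rhs = sum(x * x for x in a) * sum(y * y for y in b) - cross
    return lhs == rhs

seed(0)
for _ in range(100):
    n = randint(1, 6)
    a = [randint(-9, 9) for _ in range(n)]
    b = [randint(-9, 9) for _ in range(n)]
    assert lagrange_identity_holds(a, b)
print("Lagrange's identity verified on random integer vectors")
```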


Version: 10 Owner: Thomas Heye Author(s): Thomas Heye

174.2

characteristic

The concept of characteristic that exists for integral domains can be generalized to cyclic rings.
By extending the existing definition in this manner, though, characteristics no longer
have to be 0 or prime.
The characteristic of an infinite cyclic ring is 0. Let R be an infinite cyclic ring and r be
a generator of the additive group of R. If z ∈ Z is such that zr = 0_R, then z = 0. Since no
positive integer c exists such that cr = 0_R, it follows that R has characteristic 0.
A finite ring is cyclic if and only if its order and characteristic are equal. If R is a cyclic
ring and r is a generator of the additive group of R, then |r| = |R|. Since, for every s ∈ R,
|s| divides |R|, it follows that char R = |R|. Conversely, if R is a finite ring such that
char R = |R|, then the exponent of the additive group of R is also equal to |R|. Thus, there
exists t ∈ R such that |t| = |R|. Since ⟨t⟩ is a subgroup of the additive group of R and
|⟨t⟩| = |t| = |R|, it follows that R is a cyclic ring.
Version: 2 Owner: Wkbj79 Author(s): Wkbj79

174.3

cyclic ring

A ring is a cyclic ring if its additive group is cyclic.


Every cyclic ring is commutative under multiplication. For if R is a cyclic ring, r is a
generator of the additive group of R, and s, t ∈ R, then there exist a, b ∈ Z such that s = ar
and t = br. As a result, st = (ar)(br) = (ab)r² = (ba)r² = (br)(ar) = ts. (Note the disguised
use of the distributive property.)
A result of the fundamental theorem of finite abelian groups is that every ring of square-free
order is a cyclic ring.

If n is a positive integer, then, up to isomorphism, there are exactly τ(n) cyclic rings of order
n, where τ refers to the tau function (the number of positive divisors). Also, if a cyclic ring
has order n, then it has exactly τ(n) subrings. This result mainly follows from Lagrange's
theorem and its converse. Note that the converse of Lagrange's theorem does not hold in
general, but it does hold for finite cyclic groups.
Every subring of a cyclic ring is a cyclic ring. Moreover, every subring of a cyclic ring is an
ideal.
R is a finite cyclic ring of order n if and only if there exists a positive divisor k of n such
that R is isomorphic to kZ_{kn}. R is an infinite cyclic ring that has no zero divisors if and
only if there exists a positive integer k such that R is isomorphic to kZ. Finally, R is an
infinite cyclic ring that has zero divisors if and only if it is isomorphic to the following subset
of M_{2×2}(Z):

$$\left\{ \begin{pmatrix} c & -c \\ c & -c \end{pmatrix} : c \in \mathbb{Z} \right\}$$

Thus, any infinite cyclic ring that has zero divisors is a zero ring.
Version: 15 Owner: Wkbj79 Author(s): Wkbj79

174.4

proof of Euler four-square identity

Using Lagrange's identity, we have

$$\left(\sum_{k=1}^{4} x_k y_k\right)^2 = \left(\sum_{k=1}^{4} x_k^2\right)\left(\sum_{k=1}^{4} y_k^2\right) - \sum_{1 \le k < i \le 4} (x_k y_i - x_i y_k)^2. \tag{174.4.1}$$

We group the six squares into 3 groups of two squares and rewrite:

$$(x_1 y_2 - x_2 y_1)^2 + (x_3 y_4 - x_4 y_3)^2 = \left((x_1 y_2 - x_2 y_1) + (x_3 y_4 - x_4 y_3)\right)^2 - 2(x_1 y_2 - x_2 y_1)(x_3 y_4 - x_4 y_3) \tag{174.4.2}$$

$$(x_1 y_3 - x_3 y_1)^2 + (x_2 y_4 - x_4 y_2)^2 = \left((x_1 y_3 - x_3 y_1) - (x_2 y_4 - x_4 y_2)\right)^2 + 2(x_1 y_3 - x_3 y_1)(x_2 y_4 - x_4 y_2) \tag{174.4.3}$$

$$(x_1 y_4 - x_4 y_1)^2 + (x_2 y_3 - x_3 y_2)^2 = \left((x_1 y_4 - x_4 y_1) + (x_2 y_3 - x_3 y_2)\right)^2 - 2(x_1 y_4 - x_4 y_1)(x_2 y_3 - x_3 y_2). \tag{174.4.4}$$

Using

$$-2(x_1 y_2 - x_2 y_1)(x_3 y_4 - x_4 y_3) + 2(x_1 y_3 - x_3 y_1)(x_2 y_4 - x_4 y_2) - 2(x_1 y_4 - x_4 y_1)(x_2 y_3 - x_3 y_2) = 0, \tag{174.4.5}$$

we get, by adding equations 174.4.2–174.4.4,

$$\sum_{1 \le k < i \le 4} (x_k y_i - x_i y_k)^2 = \left((x_1 y_2 - x_2 y_1) + (x_3 y_4 - x_4 y_3)\right)^2 + \left((x_1 y_3 - x_3 y_1) - (x_2 y_4 - x_4 y_2)\right)^2 + \left((x_1 y_4 - x_4 y_1) + (x_2 y_3 - x_3 y_2)\right)^2. \tag{174.4.6}$$

We put the result of equation 174.4.6 into 174.4.1 and get

$$\left(\sum_{k=1}^{4} x_k y_k\right)^2 = \left(\sum_{k=1}^{4} x_k^2\right)\left(\sum_{k=1}^{4} y_k^2\right) - (x_1 y_2 - x_2 y_1 + x_3 y_4 - x_4 y_3)^2 - (x_1 y_3 - x_3 y_1 + x_4 y_2 - x_2 y_4)^2 - (x_1 y_4 - x_4 y_1 + x_2 y_3 - x_3 y_2)^2, \tag{174.4.7}$$

which is equivalent to the claimed identity.
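The proof yields an explicit four-square representation of the product of two sums of four squares; the following Python sketch (the function name is ours) packages the four terms read off from equation 174.4.7 and checks the identity on random integers:

```python
from random import randint, seed

def euler_four_square(x, y):
    """Return (w1, w2, w3, w4) with
    (x1^2+...+x4^2)(y1^2+...+y4^2) = w1^2 + w2^2 + w3^2 + w4^2,
    using the grouping from the proof above."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    w1 = x1*y1 + x2*y2 + x3*y3 + x4*y4
    w2 = x1*y2 - x2*y1 + x3*y4 - x4*y3
    w3 = x1*y3 - x3*y1 + x4*y2 - x2*y4
    w4 = x1*y4 - x4*y1 + x2*y3 - x3*y2
    return w1, w2, w3, w4

seed(1)
for _ in range(100):
    x = [randint(-9, 9) for _ in range(4)]
    y = [randint(-9, 9) for _ in range(4)]
    w = euler_four_square(x, y)
    assert sum(a*a for a in x) * sum(b*b for b in y) == sum(c*c for c in w)
print("four-square identity verified")
```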


Version: 4 Owner: Thomas Heye Author(s): Thomas Heye

174.5

proof that every subring of a cyclic ring is a cyclic ring

Following is a proof that every subring of a cyclic ring is a cyclic ring.

Let R be a cyclic ring and S be a subring of R. Then the additive group of S is a subgroup
of the additive group of R. By definition of cyclic ring, the additive group of R is cyclic.
Since every subgroup of a cyclic group is cyclic, the additive group of S is cyclic. It follows
that S is a cyclic ring.
Version: 1 Owner: Wkbj79 Author(s): Wkbj79

174.6

proof that every subring of a cyclic ring is an ideal

Following is a proof that every subring of a cyclic ring is an ideal.

Let R be a cyclic ring and S be a subring of R. Then R and S are both cyclic rings. Let r
be a generator of the additive group of R and s be a generator of the additive group of S.
Since s ∈ S and S is a subring of R, we have s ∈ R. Thus, there exists z ∈ Z with s = zr.

Let t ∈ R and u ∈ S. Since u ∈ S and S is a subring of R, we have u ∈ R. Since multiplication
is commutative in a cyclic ring, tu = ut. Since t ∈ R, there exists a ∈ Z with
t = ar. Since u ∈ S, there exists b ∈ Z with u = bs.

Since R is a ring, r² ∈ R. Thus, there exists k ∈ Z with r² = kr. Since tu = (ar)(bs) =
(ar)[b(zr)] = (abz)r² = (abz)(kr) = (abkz)r = (abk)(zr) = (abk)s ∈ S, it follows that S is
an ideal of R.
Version: 1 Owner: Wkbj79 Author(s): Wkbj79

174.7

zero ring

A ring is a zero ring if the product of any two elements is the additive identity (or zero
element). Zero rings are commutative under multiplication, for if Z is a zero ring, 0_Z is its
additive identity, and x, y ∈ Z, then xy = 0_Z = yx.
Every zero ring is a nilpotent ring, for if Z is a zero ring, then Z² = {0_Z}.
Since every subring of a zero ring must contain its zero element, every subring of a zero ring
is an ideal, and a zero ring has no proper prime ideals.
The simplest zero ring is Z_1 = {0}.
Zero rings exist in abundance. They can be constructed from any ring. If R is a ring, then

$$\left\{ \begin{pmatrix} r & -r \\ r & -r \end{pmatrix} : r \in R \right\},$$

considered as a subring of M_{2×2}(R) (with standard matrix addition and multiplication), is a
zero ring. Moreover, the cardinality of this subset of M_{2×2}(R) is the same as that of R.
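The claimed construction is easy to check computationally. Here is a Python sketch over R = Z (the helper names are ours), confirming that the set is closed under addition and that every product is the zero matrix:

```python
def mat(r):
    # the matrix ((r, -r), (r, -r)) described above, for r in Z
    return ((r, -r), (r, -r))

def mat_mul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def mat_add(A, B):
    return tuple(tuple(A[i][j] + B[i][j] for j in range(2)) for i in range(2))

# closure under addition, and every product is the zero matrix
for r in range(-5, 6):
    for s in range(-5, 6):
        assert mat_add(mat(r), mat(s)) == mat(r + s)
        assert mat_mul(mat(r), mat(s)) == ((0, 0), (0, 0))
print("the set {((r,-r),(r,-r)) : r in Z} is a zero ring")
```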
Every finite zero ring can be written as a direct product of cyclic rings, which must also be
zero rings themselves. This is proven from the fundamental theorem of finite abelian groups.
Thus, if p_1, …, p_m are distinct primes, a_1, …, a_m are positive integers, and
n = \prod_{i=1}^{m} p_i^{a_i}, then the number of zero rings of order n is
\prod_{i=1}^{m} P(a_i), where P denotes the partition function.
Version: 9 Owner: Wkbj79 Author(s): Wkbj79


Chapter 175
13B02 Extension theory
175.1

algebraic

Let B be a ring with a subring A. An element x ∈ B is algebraic over A if there exist
elements a_0, …, a_n ∈ A, with a_n ≠ 0, such that

$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0.$$

An element x ∈ B is transcendental over A if it is not algebraic.
The ring B is algebraic over A if every element of B is algebraic over A.
Version: 3 Owner: djao Author(s): rmilson, djao

175.2

module-finite

Let S be a ring with subring R.


We say that S is module-finite over R if S is finitely generated as an R-module.
We say that S is ring-finite over R if S = R[v1 , . . . , vn ] for some v1 , . . . , vn S.
Note that module-finite implies ring-finite, but the converse is false.
If L is ring-finite over K, with L, K fields, then L is a finite extension of K.
Version: 3 Owner: saforres Author(s): saforres


Chapter 176
13B05 Galois theory
176.1

algebraic

Let K be an extension field of F and let a ∈ K.

If there is a nonzero polynomial f ∈ F[x] such that f(a) = 0 (in K), we say that a is
algebraic over F.

For example, √2 ∈ R is algebraic over Q, since there is a nonzero polynomial with rational
coefficients, namely f(x) = x² − 2, such that f(√2) = 0.
Version: 3 Owner: drini Author(s): drini


Chapter 177
13B21 Integral dependence
177.1

integral

Let B be a ring with a subring A. An element x ∈ B is integral over A if there exist elements
a_0, …, a_{n−1} ∈ A such that

$$x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0.$$

The ring B is integral over A if every element of B is integral over A.
Version: 3 Owner: djao Author(s): djao


Chapter 178
13B22 Integral closure of rings and
ideals ; integrally closed rings, related
rings (Japanese, etc.)
178.1

integral closure

Let B be a ring with a subring A. The integral closure of A in B is the set A′ ⊆ B consisting
of all elements of B which are integral over A.
It is a theorem that the integral closure of A in B is itself a ring. In the special case where
A = Z, the integral closure A′ of Z in B is often called the ring of integers in B.
Version: 4 Owner: djao Author(s): djao


Chapter 179
13B30 Quotients and localization
179.1

fraction field

Given an integral domain R, the fraction field of R is the localization S⁻¹R of R with respect
to the multiplicative set S = R \ {0}. It is always a field.
Version: 3 Owner: djao Author(s): djao

179.2

localization

Let R be a commutative ring and let S be a nonempty multiplicative subset of R. The
localization of R at S is the ring S⁻¹R whose elements are equivalence classes of R × S
under the equivalence relation (a, s) ∼ (b, t) if r(at − bs) = 0 for some r ∈ S. Addition and
multiplication in S⁻¹R are defined by:

(a, s) + (b, t) = (at + bs, st)
(a, s) · (b, t) = (a · b, s · t)

The equivalence class of (a, s) in S⁻¹R is usually denoted a/s. For a ∈ R, the localization of
R at the minimal multiplicative set containing a is written R_a. When S is the complement
of a prime ideal p in R, the localization of R at S is written R_p.
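As a toy illustration of the equivalence relation (note the extra factor r, which matters when R has zero divisors), the following Python sketch computes S⁻¹R for the hypothetical example R = Z/6Z and S = {1, 2, 4}, the multiplicative set generated by 2; inverting 2 collapses the ring to three classes, so S⁻¹(Z/6Z) ≅ Z/3Z:

```python
from itertools import product

n = 6                      # work in R = Z/6Z (a small example of our choosing)
R = range(n)
S = [1, 2, 4]              # multiplicative set generated by 2 in Z/6Z

def equivalent(a, s, b, t):
    # (a, s) ~ (b, t)  iff  r * (a*t - b*s) = 0 for some r in S
    return any(r * (a * t - b * s) % n == 0 for r in S)

# group the pairs (a, s) into equivalence classes
classes = []
for a, s in product(R, S):
    for cls in classes:
        b, t = cls[0]
        if equivalent(a, s, b, t):
            cls.append((a, s))
            break
    else:
        classes.append([(a, s)])

print(len(classes))  # 3 classes: S^{-1}(Z/6Z) is isomorphic to Z/3Z
```

Note that 3/1 ∼ 0/1 here, since 2·(3·1 − 0·1) = 6 = 0 in Z/6Z: localization can kill elements when S contains zero divisors.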
Version: 6 Owner: djao Author(s): djao


179.3

multiplicative set

Let A be a set on which a multiplication operation · : A × A → A has been defined. A
multiplicative subset of A is a subset S ⊆ A with the property that s · t ∈ S for every
s, t ∈ S.
Version: 3 Owner: djao Author(s): djao


Chapter 180
13C10 Projective and free modules
and ideals
180.1

example of free module

Clearly from the definition, Z^n is free as a Z-module for any positive integer n.
A more interesting example is the following:
Theorem 1. The set of rational numbers Q does not form a free Z-module.

First note that any two elements of Q are Z-linearly dependent: if x = p_1/q_1 and y = p_2/q_2,
then q_1 p_2 x − q_2 p_1 y = 0. Since basis elements must be linearly independent, this shows that
any basis must consist of only one element, say p/q, with p and q relatively prime, and without
loss of generality, q > 0. The Z-span of {p/q} is the set of rational numbers of the form np/q. I
claim that 1/(q+1) is not in this set. If it were, then we would have 1/(q+1) = np/q for some n, but this
implies that np = q/(q+1), which has no solutions for n, p ∈ Z, q ∈ Z⁺, giving us a contradiction.
Version: 2 Owner: mathcam Author(s): mathcam


Chapter 181
13C12 Torsion modules and ideals
181.1

torsion element

Let R be a principal ideal domain and M an R-module. We call an element m ∈ M a torsion element
if there exists a non-zero α ∈ R such that αm = 0. The set of torsion elements is denoted by tor(M).
tor(M) is not empty, since 0 ∈ tor(M). Let m, n ∈ tor(M), so there exist non-zero α, β ∈ R such
that αm = βn = 0. Since αβ(m − n) = β(αm) − α(βn) = 0 and αβ ≠ 0, this implies
that m − n ∈ tor(M). So tor(M) is a subgroup of M. Clearly αm ∈ tor(M) for any
α ∈ R. This shows that tor(M) is a submodule of M, the torsion submodule of M.
Version: 2 Owner: Thomas Heye Author(s): Thomas Heye


Chapter 182
13C15 Dimension theory, depth,
related rings (catenary, etc.)
182.1

Krull's principal ideal theorem

Let R be a Noetherian ring, and P a prime ideal minimal over a principal ideal (x). Then
the height of P, that is, the Krull dimension of R_P, is at most 1. More generally, if P is a minimal
prime of an ideal generated by n elements, then the height of P is at most n.
Version: 1 Owner: bwebste Author(s): bwebste


Chapter 183
13C99 Miscellaneous
183.1

Artin-Rees theorem

Let A be a Noetherian ring, a an ideal, E a finitely generated module, and F a submodule.

Then there exists an integer s ≥ 1 such that for all integers n ≥ s we have

$$a^n E \cap F = a^{n-s} \left( a^s E \cap F \right).$$
Version: 1 Owner: n3o Author(s): n3o

183.2

Nakayama's lemma

Let R be a commutative ring with 1. Let M be a finitely generated R-module. If there exists
an ideal a of R contained in the Jacobson radical and such that aM = M, then M = 0.
Version: 3 Owner: n3o Author(s): n3o

183.3

prime ideal

Let R be a ring. A two-sided proper ideal p of a ring R is called a prime ideal if the following
equivalent conditions are met:

1. If I and J are left ideals and the product of ideals IJ satisfies IJ ⊆ p, then I ⊆ p or J ⊆ p.

2. If I and J are right ideals with IJ ⊆ p, then I ⊆ p or J ⊆ p.

3. If I and J are two-sided ideals with IJ ⊆ p, then I ⊆ p or J ⊆ p.

4. If x and y are elements of R with xRy ⊆ p, then x ∈ p or y ∈ p.

5. R/p is a prime ring.

When R is commutative with identity, an ideal p of R is prime if and only if for any a, b ∈ R,
if ab ∈ p then either a ∈ p or b ∈ p.
One also has in this case that an ideal p ⊆ R is prime if and only if the quotient ring R/p
is an integral domain.
Version: 8 Owner: djao Author(s): djao

183.4

proof of Nakayama's lemma

Let X = {x_1, x_2, …, x_n} be a minimal set of generators for M, in the sense that M is not
generated by any proper subset of X.

Elements of aM can be written as linear combinations Σ a_i x_i, where a_i ∈ a.

Suppose that |X| > 0. Since M = aM, we can express x_1 as such a linear combination:

$$x_1 = \sum_i a_i x_i.$$

Moving the term involving a_1 to the left, we have

$$(1 - a_1) x_1 = \sum_{i > 1} a_i x_i.$$

But a_1 ∈ a ⊆ J(R), so 1 − a_1 is invertible, say with inverse b. Therefore,

$$x_1 = \sum_{i > 1} b a_i x_i.$$

But this means that x_1 is redundant as a generator of M, and so M is generated by the
subset {x_2, x_3, …, x_n}. This contradicts the minimality of X.

We conclude that |X| = 0 and therefore M = 0.
Version: 2 Owner: mclase Author(s): mclase


183.5

proof of Nakayama's lemma

(This proof was taken from [1].)


If M were not zero, it would have a simple quotient, isomorphic to R/m for some maximal ideal
m of R. Then we would have mM ≠ M, so that aM ≠ M as a ⊆ m.

REFERENCES
1. Serre, J.-P. Local Algebra. Springer-Verlag, 2000.

Version: 3 Owner: nerdy2 Author(s): nerdy2

183.6

support

The support Supp(M) of a module M over a ring R is the set of all prime ideals p ⊆ R such
that the localization M_p is nonzero.
The maximal support Supp_m(M) of a module M over a ring R is the set of all maximal ideals
m ⊆ R such that M_m is nonzero.
Version: 2 Owner: djao Author(s): djao


Chapter 184
13E05 Noetherian rings and
modules
184.1

Hilbert basis theorem

Let R be a right (left) Noetherian ring. Then the polynomial ring R[x] is also right (left) Noetherian.
Version: 6 Owner: KimJ Author(s): KimJ

184.2

Noetherian module

A module M over R is said to be Noetherian if the following equivalent conditions hold:

1. Every submodule of M is finitely generated over R;
2. The ascending chain condition holds on submodules;
3. Every nonempty family of submodules has a maximal element.
Version: 5 Owner: KimJ Author(s): KimJ


184.3

proof of Hilbert basis theorem

Let R be a Noetherian ring and let f(x) = a_n x^n + a_{n−1} x^{n−1} + … + a_1 x + a_0 ∈ R[x] with
a_n ≠ 0. Then call a_n the initial coefficient of f.

Let I be an ideal in R[x]. We will show I is finitely generated, so that R[x] is Noetherian. Now
let f_0 be a polynomial of least degree in I, and if f_0, f_1, …, f_k have been chosen, then choose
f_{k+1} of minimal degree from I \ (f_0, f_1, …, f_k). Continuing inductively gives a sequence (f_k)
of elements of I.

Let a_k be the initial coefficient of f_k, and consider the ideal J = (a_0, a_1, a_2, …) of initial
coefficients. Since R is Noetherian, J = (a_0, …, a_N) for some N.

Then I = (f_0, f_1, …, f_N). For if not, then f_{N+1} ∈ I \ (f_0, f_1, …, f_N), and
a_{N+1} = Σ_{k=0}^{N} u_k a_k for some u_0, u_1, …, u_N ∈ R. Let
g(x) = Σ_{k=0}^{N} u_k f_k x^{ε_k}, where ε_k = deg(f_{N+1}) − deg(f_k).
Then deg(f_{N+1} − g) < deg(f_{N+1}), while f_{N+1} − g ∈ I and f_{N+1} − g ∉ (f_0, f_1, …, f_N). But
this contradicts the minimality of deg(f_{N+1}).

Hence, R[x] is Noetherian.
Version: 1 Owner: Evandar Author(s): Evandar

184.4

finitely generated modules over a principal ideal domain

Let R be a principal ideal domain and let M be a finitely generated R-module.

Lemma 1. Let M be a submodule of the R-module R^n. Then M is free and finitely
generated by s ≤ n elements.

For n = 1 this is clear, since M is then an ideal of R and so is generated by some element a ∈ R.
Now suppose that the statement is true for all submodules of R^m, 1 ≤ m ≤ n − 1.

For a submodule M of R^n we define f : M → R, (k_1, …, k_n) ↦ k_1. The image of f is an
ideal I in R. If I = {0}, then M ⊆ ker(f) = {0} × R^{n−1}. Otherwise, I = (g) with g ≠ 0.
In the first case, elements of ker(f) can be mapped bijectively to R^{n−1} by
(0, k_1, …, k_{n−1}) ↦ (k_1, …, k_{n−1}); so the image of M under this mapping
is a submodule of R^{n−1}, which by the induction hypothesis is finitely generated and free.

Now let y ∈ M be such that f(y) = g. For any x ∈ M we have f(x) = gh for some h ∈ R, and then
f(x − hy) = f(x) − f(hy) = 0, which is equivalent to x − hy ∈ ker(f) ∩ M =: N, and N is isomorphic to
a submodule of R^{n−1}. This shows that Ry + N = M.

Let {g_1, …, g_s} be a basis of N. By the induction hypothesis, s ≤ n − 1. We show that {y, g_1, …, g_s}
is linearly independent. So let ry + Σ_{i=1}^{s} r_i g_i = 0. The first components of the g_i are 0, so
the first component r·f(y) of ry must also be 0. Since f(y) = g is a non-zero element of the integral
domain R, it follows that r = 0. Since g_1, …, g_s are linearly independent, all r_i = 0 as well.
Hence {y, g_1, …, g_s} is a basis of M with s + 1 ≤ n elements.

Corollary. If M is a finitely generated R-module generated by s elements and
N is a submodule of M, then N can be generated by s or fewer elements.

Let {g_1, …, g_s} be a generating set of M and f : R^s → M, (r_1, …, r_s) ↦ Σ_{i=1}^{s} r_i g_i. Then
the inverse image N′ of N under f is a submodule of R^s and, according to Lemma 1, can be generated
by s or fewer elements. Let {n_1, …, n_t} be a generating set of N′; then t ≤ s, and since f is
surjective, {f(n_1), …, f(n_t)} is a generating set of N.

Theorem 1. Let M be a finitely generated module over a principal ideal domain R.

(I) M/tor(M) is torsion-free, i.e. tor(M/tor(M)) = {0}. In particular, if M is torsion-free, then
M is free.

(II) Let tor(M) be a proper submodule of M. Then there exists a finitely generated free
submodule F of M such that M = F ⊕ tor(M).

Proof of (I): For short, set T := tor(M). For m ∈ M, let m̄ denote the coset modulo T generated
by m. Let m̄ be a torsion element of M/T, so there exists α ∈ R \ {0} such that αm̄ = 0̄,
which means αm ∈ T. Then there is β ∈ R \ {0} with β(αm) = 0; since βα ≠ 0, m is itself a member
of T, so m̄ = 0̄. This implies that M/T has no non-zero torsion elements (which is obvious if
M = tor(M)).

Now let N be a finitely generated torsion-free R-module with generating set {g_1, …, g_n}.
Choose a maximal linearly independent subset, say {g_1, …, g_r}. Then for each j > r there is
a non-zero α_j ∈ R with α_j g_j ∈ Rg_1 + ⋯ + Rg_r, so setting α = Π_{j>r} α_j ≠ 0 we get
αN ⊆ Rg_1 + ⋯ + Rg_r ≅ R^r. Since N is torsion-free, n ↦ αn is an isomorphism of N onto αN,
so N is isomorphic to a submodule of R^r and, by Lemma 1, N is free. Now let N = M/tor(M),
where tor(M) = {0}; then the cosets can be identified with the elements of M, and the statement follows.

Proof of (II): Let π : M → M/T, a ↦ a + T. π is surjective, so m_1, …, m_t ∈ M can be
chosen such that π(m_i) = n_i, where the n_i form a basis of M/T. If 0_M = Σ_{i=1}^{t} a_i m_i, then
0̄ = Σ_{i=1}^{t} a_i n_i. Since n_1, …, n_t are linearly independent in M/T, it follows that
0 = a_1 = … = a_t. So the submodule F of M spanned by m_1, …, m_t is free.

Now let m be some element of M and write π(m) = Σ_{i=1}^{t} a_i n_i. This is equivalent to
m − Σ_{i=1}^{t} a_i m_i ∈ ker(π) = T. Hence any m is the sum f + t with f ∈ F, t ∈ T. Since F is
torsion-free, F ∩ T = {0}, and it follows that M = F ⊕ T.
Version: 7 Owner: Thomas Heye Author(s): Thomas Heye


Chapter 185
13F07 Euclidean rings and
generalizations
185.1

Euclidean domain

A Euclidean domain is an integral domain D on which a Euclidean valuation has been
defined.

Any Euclidean domain is also a principal ideal domain, and therefore also a unique factorization domain.
But even more importantly, on Euclidean domains we can define the gcd and use Euclid's algorithm.

Examples of Euclidean domains are the ring Z and the polynomial ring in one variable F[x],
where F is a field.
Version: 6 Owner: drini Author(s): drini

185.2

Euclidean valuation

Let D be an integral domain. A Euclidean valuation is a function from the non-zero elements
of D to the non-negative integers,

$$\nu : D \setminus \{0\} \to \mathbb{Z}^+ \cup \{0\},$$

such that

- For any a, b ∈ D with b ≠ 0, there exist q, r ∈ D such that a = bq + r with ν(r) < ν(b) or
r = 0.
- For any a, b ∈ D both non-zero, ν(a) ≤ ν(ab).

Euclidean valuations are important because they let us define greatest common divisors and
use Euclid's algorithm. Some facts about Euclidean valuations:

- The value ν(1) is minimal. That is, ν(1) ≤ ν(a) for any non-zero element a of D.
- u ∈ D is a unit if and only if ν(u) = ν(1).
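For D = Z with ν(a) = |a|, repeated application of the division property is exactly Euclid's algorithm; a minimal Python sketch (the function name is ours):

```python
def euclid_gcd(a, b):
    """Greatest common divisor in Z via repeated division with remainder,
    using the Euclidean valuation nu(x) = |x|."""
    while b != 0:
        a, b = b, a % b          # a = qb + r with nu(r) < nu(b) or r = 0
    return abs(a)

assert euclid_gcd(252, 198) == 18
assert euclid_gcd(-12, 30) == 6
print(euclid_gcd(2**10, 3**5))  # relatively prime, so gcd is 1
```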
Version: 3 Owner: drini Author(s): drini

185.3

proof of Bezouts Theorem

Let D be an integral domain with a Euclidean valuation ν. Let a, b ∈ D, not both 0. Let
(a, b) = {ax + by | x, y ∈ D}. Then (a, b) is a non-zero ideal in D. We choose d ∈ (a, b) such that
ν(d) is the smallest attained value. Then (a, b) is generated by d, and d has the property d | a
and d | b. Two elements x and y in D are associates if and only if (x) = (y). So d is unique
up to a unit in D. Hence d is the greatest common divisor of a and b.
Version: 4 Owner: Thomas Heye Author(s): Thomas Heye

185.4

proof that an Euclidean domain is a PID

Let D be an Euclidean domain, and let a D be an ideal. We show that a is principal.


Indeed, let

$$A = \{\nu(x) : x \in \mathfrak{a},\ x \neq 0\}$$

be the subset of Z which contains the valuations of the non-zero elements of a. Since A is
non-empty and bounded below, it has a minimum m, and suppose that d ∈ a is an element
such that ν(d) = m. We contend that a = (d). Clearly (d) ⊆ a; let's prove the opposite
inclusion. Let x ∈ a. There exist elements y, r ∈ D such that

$$x = yd + r$$

with ν(r) < ν(d) or r = 0. Since r = x − yd ∈ a, it must be that r = 0; hence d | x, which
concludes the proof.
Version: 1 Owner: n3o Author(s): n3o


Chapter 186
13F10 Principal ideal rings
186.1

Smith normal form

Let A ≠ 0 be an m×n matrix with entries from a principal ideal domain R. For a ∈ R \ {0},
let δ(a) denote the number of prime factors of a. Start with t = 1 and choose j_t to be the
smallest column index of A with a non-zero entry.

(I) If a_{t,j_t} = 0 and a_{k,j_t} ≠ 0 for some k, exchange rows t and k.

(II) If there is an entry at position (k, j_t) such that a_{t,j_t} ∤ a_{k,j_t}, then set
β = gcd(a_{t,j_t}, a_{k,j_t}) and choose σ, τ ∈ R such that

$$\sigma \cdot a_{t,j_t} + \tau \cdot a_{k,j_t} = \beta.$$

By left-multiplication with an appropriate invertible matrix it can be achieved that row t of the
matrix product is the sum of row t multiplied by σ and row k multiplied by τ.
Then we get β at position (t, j_t), where δ(β) < δ(a_{t,j_t}). Repeating these steps, one ends
up with a matrix having an entry at position (t, j_t) that divides all entries in column j_t.

(III) Finally, by adding appropriate multiples of row t, it can be achieved that all entries in
column j_t except for that at position (t, j_t) are zero. This too can be achieved by
left-multiplication with an appropriate matrix.

Applying the steps described above to the remaining non-zero columns of the resulting matrix
(if any), we get an m×n matrix with column indices j_1, …, j_r, where r ≤ min(m, n), each
of which satisfies the following:

1. the entry at position (l, j_l) is non-zero;

2. all entries below and above position (l, j_l), as well as all entries to the left of (l, j_l), are zero.

Furthermore, all rows below the r-th row are zero.
This is a version of the Gauss algorithm for principal ideal domains, which is usually described
only for commutative fields.

Now we can re-order the columns of this matrix so that the elements at positions (i, i) for
1 ≤ i ≤ r are non-zero and δ(a_{i,i}) ≤ δ(a_{i+1,i+1}) for 1 ≤ i < r, and so that all columns to
the right of the r-th column (if present) are zero. For short, write α_i for the element at
position (i, i). δ takes non-negative integer values, so δ(α_1) = 0 is equivalent to α_1 being a unit
of R. δ(α_i) = δ(α_{i+1}) can happen only if α_i and α_{i+1} differ by a unit factor, or if they
are relatively prime. In the latter case one can add column i + 1 to column i (which doesn't change α_i)
and then apply appropriate row manipulations, as in step (II), to get α_i = 1. For δ(α_i) < δ(α_{i+1})
and α_i ∤ α_{i+1} one can likewise apply step (II) after adding column i + 1 to column i. This
diminishes the minimal δ-values of the non-zero entries of the matrix, and by re-ordering columns etc.
we end up with a matrix whose diagonal elements α_i satisfy α_i | α_{i+1} for 1 ≤ i < r.

Since all row and column manipulations involved in the process are invertible, this shows
that there exist invertible m×m and n×n matrices S, T so that

$$S A T = \begin{pmatrix} \alpha_1 & & & \\ & \ddots & & \\ & & \alpha_r & \\ & & & 0 \end{pmatrix}, \tag{186.1.1}$$

i.e. the m×n matrix whose diagonal entries are α_1, …, α_r followed by zeros, and whose
remaining entries are zero.

This is the Smith normal form of the matrix. The elements α_i are unique up to associates
and are called elementary divisors.
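Over R = Z the procedure can be sketched directly; the code below is a simplified rendering of steps (I)–(III) above (the function name is ours, and the absolute value stands in for δ as the quantity that decreases), returning the elementary divisors:

```python
from math import gcd

def smith_normal_form(A):
    """Sketch over R = Z of the diagonalization described above.
    Returns the elementary divisors alpha_1 | alpha_2 | ... of A."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    divisors = []
    t = 0
    while t < min(m, n):
        # choose a non-zero entry of least absolute value in the remaining block
        pivot = None
        for i in range(t, m):
            for j in range(t, n):
                if A[i][j] and (pivot is None or abs(A[i][j]) < abs(A[pivot[0]][pivot[1]])):
                    pivot = (i, j)
        if pivot is None:
            break                                  # remaining block is zero
        pi, pj = pivot
        A[t], A[pi] = A[pi], A[t]                  # move the pivot to row t ...
        for row in A:
            row[t], row[pj] = row[pj], row[t]      # ... and to column t
        cleared = True
        for i in range(t + 1, m):                  # clear column t by division
            q = A[i][t] // A[t][t]
            for j in range(t, n):
                A[i][j] -= q * A[t][j]
            cleared = cleared and A[i][t] == 0
        for j in range(t + 1, n):                  # clear row t by division
            q = A[t][j] // A[t][t]
            for i in range(t, m):
                A[i][j] -= q * A[i][t]
            cleared = cleared and A[t][j] == 0
        if cleared:
            divisors.append(abs(A[t][t]))
            t += 1
        # otherwise loop again: the remainders have smaller absolute value
    # enforce the divisibility chain with gcd/lcm swaps on the diagonal
    changed = True
    while changed:
        changed = False
        for i in range(len(divisors)):
            for j in range(i + 1, len(divisors)):
                g = gcd(divisors[i], divisors[j])
                l = divisors[i] * divisors[j] // g
                if (divisors[i], divisors[j]) != (g, l):
                    divisors[i], divisors[j] = g, l
                    changed = True
    return divisors

print(smith_normal_form([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))  # [2, 6, 12]
```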
Version: 2 Owner: Thomas Heye Author(s): Thomas Heye


Chapter 187
13F25 Formal power series rings
187.1

formal power series

Formal power series allow one to employ much of the analytical machinery of power series
in settings which dont have natural notions of convergence. They are also useful in order to
compactly describe sequences and to find closed formulas for recursively described sequences;
this is known as the method of generating functions and will be illustrated below.
We start with a commutative ring R. We want to define the ring of formal power series over
R in the variable X, denoted by R[[X]]; each element of this ring can be written in a unique
way as an infinite sum of the form Σ_{n=0}^{∞} a_n X^n, where the coefficients a_n are elements of R;
any choice of coefficients a_n is allowed. R[[X]] is actually a topological ring, so that these
infinite sums are well-defined and convergent. The addition and multiplication of such sums
follow the usual laws of power series.
Formal construction. Start with the set R^N of all infinite sequences in R. Define addition
of two such sequences by

$$(a_n) + (b_n) = (a_n + b_n)$$

and multiplication by

$$(a_n)(b_n) = \left(\sum_{k=0}^{n} a_k b_{n-k}\right).$$

This turns R^N into a commutative ring with multiplicative identity (1, 0, 0, …). We identify
the element a of R with the sequence (a, 0, 0, …) and define X := (0, 1, 0, 0, …). Then every
element of R^N of the form (a_0, a_1, a_2, …, a_N, 0, 0, …) can be written as the finite sum

$$\sum_{n=0}^{N} a_n X^n.$$

In order to extend this equation to infinite series, we need a metric on R^N. We define
d((a_n), (b_n)) = 2^{−k}, where k is the smallest natural number such that a_k ≠ b_k (if there is no
such k, then the two sequences are equal and we define their distance to be zero). This is a
metric which turns R^N into a topological ring, and the equation

$$(a_n) = \sum_{n=0}^{\infty} a_n X^n$$

can now be rigorously proven using the notion of convergence arising from d; in fact, any
rearrangement of the series converges to the same limit.

This topological ring is the ring of formal power series over R and is denoted by R[[X]].
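The sequence model of R[[X]] can be mimicked with truncated coefficient lists; the following Python sketch (the helper names ps_mul and ps_add are ours) implements the Cauchy product above and checks the relation (1 − X)(1 + X + X² + ⋯) = 1 up to the truncation order:

```python
def ps_mul(a, b):
    """Cauchy product of two formal power series given by their first N coefficients."""
    N = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]

def ps_add(a, b):
    """Coefficientwise addition of two truncated power series."""
    return [x + y for x, y in zip(a, b)]

# (1 - X) * (1 + X + X^2 + ...) = 1, checked on the first 6 coefficients
one_minus_x = [1, -1, 0, 0, 0, 0]
geometric = [1, 1, 1, 1, 1, 1]
print(ps_mul(one_minus_x, geometric))  # [1, 0, 0, 0, 0, 0]
```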

Properties. R[[X]] is an associative algebra over R which contains the ring R[X] of polynomials
over R; the polynomials correspond to the sequences which end in zeros.

The geometric series formula is valid in R[[X]]:

$$(1 - X)^{-1} = \sum_{n=0}^{\infty} X^n.$$

An element Σ a_n X^n of R[[X]] is invertible in R[[X]] if and only if its constant coefficient a_0
is invertible in R. This implies that the Jacobson radical of R[[X]] is the ideal generated by
X and the Jacobson radical of R.

Several algebraic properties of R are inherited by R[[X]]:

- if R is a local ring, then so is R[[X]];
- if R is Noetherian, then so is R[[X]];
- if R is an integral domain, then so is R[[X]];
- if R is a field, then R[[X]] is a discrete valuation ring.
The metric space (R[[X]], d) is complete. The topology on R[[X]] is equal to the product topology
on R^N, where R is equipped with the discrete topology. It follows from Tychonoff's theorem
that R[[X]] is compact if and only if R is finite. The topology on R[[X]] can also be seen as
the I-adic topology, where I = (X) is the ideal generated by X (whose elements are precisely
the formal power series with zero constant coefficient).

If K = R is a field, we can consider the quotient field of the integral domain K[[X]]; it is
denoted by K((X)). It is a topological field whose elements are called formal Laurent
series; they can be uniquely written in the form

$$f = \sum_{n=M}^{\infty} a_n X^n,$$

where M is an integer which depends on the Laurent series f.



Formal power series as functions. In analysis, every convergent power series defines
a function with values in the real or complex numbers. Formal power series can also be
interpreted as functions, but one has to be careful with the domain and codomain. If
f = Σ a_n X^n is an element of R[[X]], if S is a commutative associative algebra over R, if I is
an ideal in S such that the I-adic topology on S is complete, and if x is an element of I,
then we can define

$$f(x) := \sum_{n=0}^{\infty} a_n x^n.$$

This latter series is guaranteed to converge in S given the above assumptions. Furthermore,
we have

$$(f + g)(x) = f(x) + g(x)$$

and

$$(fg)(x) = f(x)g(x)$$

(unlike in the case of bona fide functions, these formulas are not definitions but have to be
proved).

Since the topology on R[[X]] is the (X)-adic topology and R[[X]] is complete, we can in
particular apply power series to other power series, provided that the arguments don't have
constant coefficients: f(0), f(X² − X) and f((1 − X)^{−1} − 1) are all well-defined for any
formal power series f ∈ R[[X]].

With this formalism, we can give an explicit formula for the multiplicative inverse of a power
series f whose constant coefficient a = f(0) is invertible in R:

$$f^{-1} = \sum_{n=0}^{\infty} a^{-n-1} (a - f)^n.$$
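The inverse can also be computed coefficient by coefficient from the equivalent recurrence b_0 = a_0^{-1}, b_n = −a_0^{-1} Σ_{k=1}^{n} a_k b_{n−k} (a reformulation, not the series formula above); a Python sketch over Q using exact rationals, with names of our choosing:

```python
from fractions import Fraction

def ps_inverse(a, N):
    """First N coefficients of the multiplicative inverse of a power series
    with invertible constant coefficient, via b_0 = a_0^{-1} and
    b_n = -a_0^{-1} * sum_{k=1}^{n} a_k b_{n-k}."""
    def coeff(k):
        return a[k] if k < len(a) else Fraction(0)
    b = [1 / Fraction(a[0])]
    for n in range(1, N):
        b.append(-sum(coeff(k) * b[n - k] for k in range(1, n + 1)) / Fraction(a[0]))
    return b

# the inverse of 1 - X is the geometric series 1 + X + X^2 + ...
b = ps_inverse([Fraction(1), Fraction(-1)], 6)
assert b == [Fraction(1)] * 6
# check f * f^{-1} = 1 up to the truncation order
f = [Fraction(1), Fraction(-1)] + [Fraction(0)] * 4
prod = [sum(f[k] * b[n - k] for k in range(n + 1)) for n in range(6)]
assert prod == [Fraction(1)] + [Fraction(0)] * 5
print("power series inverse verified")
```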

Differentiating formal power series. If f = Σ_{n=0}^{∞} a_n X^n ∈ R[[X]], we define the formal
derivative of f as

$$Df = \sum_{n=1}^{\infty} a_n n X^{n-1}.$$

This operation is R-linear, obeys the product rule

$$D(fg) = (Df)\,g + f\,(Dg)$$

and the chain rule

$$D(f(g)) = (Df)(g) \cdot Dg$$

(in case g(0) = 0).

In a sense, all formal power series are Taylor series, because if f = Σ a_n X^n, then

$$(D^k f)(0) = k!\, a_k$$

(here k! denotes the element 1 · (1 + 1) · (1 + 1 + 1) ⋯ ∈ R).

One can also define differentiation for formal Laurent series in a natural way, and then the
quotient rule, in addition to the rules listed above, will also be valid.
Power series in several variables The fastest way to define the ring R[[X1 , . . . , Xr ]]
of formal power series over R in r variables starts with the ring S = R[X1 , . . . , Xr ] of
polynomials over R. Let I be the ideal in S generated by X1 , . . . , Xr , consider the I-adic
topology on S, and form its completion. This results in a complete topological ring containing
S which is denoted by R[[X1 , . . . , Xr ]].
For $\mathbf{n} = (n_1, \ldots, n_r) \in \mathbb{N}^r$, we write $X^{\mathbf{n}} = X_1^{n_1} \cdots X_r^{n_r}$. Then every element of $R[[X_1, \ldots, X_r]]$ can be written in a unique way as a sum
$$\sum_{\mathbf{n} \in \mathbb{N}^r} a_{\mathbf{n}} X^{\mathbf{n}}$$
where the sum extends over all $\mathbf{n} \in \mathbb{N}^r$. These sums converge for any choice of the coefficients $a_{\mathbf{n}} \in R$, and the order in which the summation is carried out does not matter.
If J is the ideal in R[[X1 , . . . , Xr ]] generated by X1 , . . . , Xr (i.e. J consists of those power
series with zero constant coefficient), then the topology on R[[X1 , . . . , Xr ]] is the J-adic
topology.
Since R[[X1 ]] is a commutative ring, we can define its power series ring, say R[[X1 ]][[X2 ]].
This ring is naturally isomorphic to the ring R[[X1 , X2 ]] just defined, but as topological rings
the two are different.
If R = K is a field, then $K[[X_1, \ldots, X_r]]$ is a unique factorization domain.
Similar to the situation described above, we can apply power series in several variables to
other power series with zero constant coefficients. It is also possible to define partial derivatives
for formal power series in a straightforward way. Partial derivatives commute, as they do
for continuously differentiable functions.
Uses One can use formal power series to prove several relations familiar from analysis in a purely algebraic setting. Consider for instance the following elements of Q[[X]]:

$$\sin(X) := \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} X^{2n+1}$$

$$\cos(X) := \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} X^{2n}$$

Then one can easily show that

$$\sin^2 + \cos^2 = 1$$

and
D sin = cos
as well as
sin(X + Y ) = sin(X) cos(Y ) + cos(X) sin(Y )
(the latter being valid in the ring Q[[X, Y ]]).
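The first identity can be verified coefficient-by-coefficient on truncations, since the coefficients of a product modulo $X^N$ depend only on the coefficients below degree N. A small check over Q (helper names are illustrative):

```python
from fractions import Fraction
from math import factorial

N = 12  # work modulo X^N

def mul(f, g):
    # truncated Cauchy product of coefficient lists
    h = [Fraction(0)] * N
    for i in range(N):
        for j in range(N - i):
            h[i + j] += f[i] * g[j]
    return h

sin_ = [Fraction(0)] * N
cos_ = [Fraction(0)] * N
for n in range(N):
    if 2 * n + 1 < N:
        sin_[2 * n + 1] = Fraction((-1) ** n, factorial(2 * n + 1))
    if 2 * n < N:
        cos_[2 * n] = Fraction((-1) ** n, factorial(2 * n))

# sin^2 + cos^2 should be the series 1 + 0*X + 0*X^2 + ...
identity = [a + b for a, b in zip(mul(sin_, sin_), mul(cos_, cos_))]
```

All coefficients beyond the constant term cancel exactly in Q, with no appeal to convergence.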
As an example of the method of generating functions, consider the problem of finding a closed
formula for the Fibonacci numbers fn defined by fn+2 = fn+1 + fn , f0 = 0, and f1 = 1. We
work in the ring R[[X]] and define the power series
$$f = \sum_{n=0}^{\infty} f_n X^n;$$

f is called the generating function for the sequence $(f_n)$. The generating function for the sequence $(f_{n-1})$ is $Xf$, while that for $(f_{n-2})$ is $X^2 f$. From the recurrence relation, we therefore see that the power series $Xf + X^2 f$ agrees with f except for the first two coefficients.
Taking these into account, we find that
$$f = Xf + X^2 f + X$$
(this is the crucial step; recurrence relations can almost always be translated into equations
for the generating functions). Solving this equation for f , we get
$$f = \frac{X}{1 - X - X^2}.$$

Using the golden ratio $\varphi_1 = (1 + \sqrt{5})/2$ and $\varphi_2 = (1 - \sqrt{5})/2$, we can write the latter expression as
$$\frac{1}{\sqrt{5}} \left( \frac{1}{1 - \varphi_1 X} - \frac{1}{1 - \varphi_2 X} \right)$$
These two power series are known explicitly because they are geometric series; comparing coefficients, we find the explicit formula
$$f_n = \frac{1}{\sqrt{5}} \left( \varphi_1^n - \varphi_2^n \right).$$
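The derivation is easy to check numerically: expanding $X/(1 - X - X^2)$ coefficient-by-coefficient reproduces the Fibonacci recurrence, and the closed formula agrees. (The float evaluation of the closed formula below is exact after rounding only for small n; the variable names are our own.)

```python
# Coefficients of f = X/(1 - X - X^2): from f = Xf + X^2 f + X, the
# coefficient of X^n satisfies f_n = f_{n-1} + f_{n-2}, f_0 = 0, f_1 = 1.
N = 20
coeff = [0] * N
coeff[1] = 1
for n in range(2, N):
    coeff[n] = coeff[n - 1] + coeff[n - 2]

# Closed formula f_n = (phi1^n - phi2^n) / sqrt(5)
phi1 = (1 + 5 ** 0.5) / 2
phi2 = (1 - 5 ** 0.5) / 2
closed = [round((phi1 ** n - phi2 ** n) / 5 ** 0.5) for n in range(N)]
```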
In algebra, the ring K[[X1 , . . . , Xr ]] (where K is a field) is often used as the standard, most
general complete local ring over K.

Universal property The power series ring R[[X1 , . . . , Xr ]] can be characterized by the
following universal property: if S is a commutative associative algebra over R, if I is an ideal
in S such that the I-adic topology on S is complete, and if $x_1, \ldots, x_r \in I$ are given, then there exists a unique map $\varphi : R[[X_1, \ldots, X_r]] \to S$ with the following properties:

$\varphi$ is an R-algebra homomorphism

$\varphi$ is continuous

$\varphi(X_i) = x_i$ for $i = 1, \ldots, r$.
Version: 7 Owner: AxelBoldt Author(s): mps, AxelBoldt


Chapter 188
13F30 Valuation rings
188.1

discrete valuation

A discrete valuation on a field K is a valuation $|\cdot| : K \to \mathbb{R}$ whose image is a discrete subset of $\mathbb{R}$.

For any field K with a discrete valuation $|\cdot|$, the set

$$R := \{x \in K : |x| \le 1\}$$

is a subring of K with sole maximal ideal


$$M := \{x \in K : |x| < 1\},$$
and hence R is a discrete valuation ring. Conversely, given any discrete valuation ring R, the field of fractions K of R admits a discrete valuation sending each element $x \in R$ to $c^n$, where $0 < c < 1$ is some arbitrary fixed constant and n is the order of x, and extending multiplicatively to K.
Note: Discrete valuations are often written additively instead of multiplicatively; under this alternate viewpoint, the element x maps to $\log_c |x|$ (in the above notation) instead of just $|x|$. This transformation reverses the order of the absolute values (since $c < 1$), and sends the element $0 \in K$ to $\infty$. It has the advantage that every valuation can be normalized by a suitable scalar multiple to take values in the integers.
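The standard concrete instance is the p-adic valuation on Q: additively, $\mathrm{ord}_p(x)$ is the exponent of p in x; multiplicatively, $|x| = c^{\mathrm{ord}_p(x)}$ for a fixed $0 < c < 1$. A sketch (the function names are our own):

```python
from fractions import Fraction

def ord_p(x, p):
    # additive p-adic valuation of a nonzero rational x
    x = Fraction(x)
    n, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return n

def abs_p(x, p, c=Fraction(1, 2)):
    # multiplicative form |x| = c^ord_p(x), for a fixed 0 < c < 1
    return c ** ord_p(x, p)

v = ord_p(Fraction(18, 5), 3)   # 18/5 = 2 * 3^2 / 5, so ord_3 = 2
w = abs_p(Fraction(18, 5), 3)   # (1/2)^2 = 1/4
```

The image of $|\cdot|$ is the discrete set $\{c^n : n \in \mathbb{Z}\}$, and $\mathrm{ord}_p$ takes values in the integers, as in the note above.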
Version: 1 Owner: djao Author(s): djao

188.2

discrete valuation ring

A discrete valuation ring R is a principal ideal domain with exactly one maximal ideal M. Any generator t of M is called a uniformizer or uniformizing element of R; in other words, a uniformizer of R is an element $t \in R$ such that $t \in M$ but $t \notin M^2$.
Given a discrete valuation ring R and a uniformizer $t \in R$, every element $z \in R$ can be written uniquely in the form $u \cdot t^n$ for some unit $u \in R$ and some nonnegative integer $n \in \mathbb{Z}$. The integer n is called the order of z, and its value is independent of the choice of uniformizing element $t \in R$.
Version: 4 Owner: djao Author(s): djao, nerdy2


Chapter 189
13G05 Integral domains
189.1

Dedekind-Hasse valuation

If D is an integral domain then it is a PID iff it has a Dedekind-Hasse valuation, that is, a function $\nu : D \setminus \{0\} \to \mathbb{Z}^+$ such that for any $a, b \in D \setminus \{0\}$ either
$$a \in (b)$$
or
$$\exists \alpha \in (a),\ \beta \in (b) \text{ such that } 0 < \nu(\alpha + \beta) < \nu(b).$$

Proof: First, let $\nu$ be a Dedekind-Hasse valuation and let I be a nonzero ideal (the zero ideal is trivially principal) of an integral domain D. Take some $b \in I$ with $\nu(b)$ minimal (this exists because the integers are well-ordered) and some $a \in I$ such that $a \neq 0$. I must contain both (a) and (b), and since it is closed under addition, $\alpha + \beta \in I$ for any $\alpha \in (a)$, $\beta \in (b)$.
Since $\nu(b)$ is minimal, the second possibility above is ruled out, so it follows that $a \in (b)$. But this holds for any $a \in I$, so $I = (b)$, and therefore every ideal is principal.
For the converse, let D be a PID. Then define $\nu(u) = 1$ for any unit u. Any non-zero, non-unit can be factored into a finite product of irreducibles (since every PID is a UFD), and every factorization of a given element a has the same length, r. So for $a \in D$ a non-zero non-unit, let $\nu(a) = r + 1$. Obviously $\nu(a) \in \mathbb{Z}^+$.
Then take any $a, b \in D \setminus \{0\}$ and suppose $a \notin (b)$. Then take the ideal of elements of the form $\{\alpha + \beta \mid \alpha \in (a), \beta \in (b)\}$. Since this is a PID, it is a principal ideal (c) for some $c \in D \setminus \{0\}$, and since $0 + b = b \in (c)$, there is some x with $xc = b$; x is not a unit, since $a \in (c)$ but $a \notin (b)$.
Then $\nu(b) = \nu(xc)$. But since x is not a unit, the factorization of b must be longer than the factorization of c, so $\nu(b) > \nu(c)$, so $\nu$ is a Dedekind-Hasse valuation.
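For intuition, the ordinary absolute value is a Dedekind-Hasse valuation on Z (in fact a Euclidean one): when a is not a multiple of b, the remainder of division is of the form $\alpha + \beta$ with $\alpha \in (a)$, $\beta \in (b)$ and $0 < |\alpha + \beta| < |b|$. A brute-force check of the defining condition over a small search range (the helper is our own illustration):

```python
def dedekind_hasse_holds(a, b, nu=abs, search=range(-50, 51)):
    # Either a lies in the ideal (b), or some alpha + beta with
    # alpha in (a), beta in (b) satisfies 0 < nu(alpha + beta) < nu(b).
    if a % b == 0:
        return True
    for s in search:          # alpha = s*a ranges over part of (a)
        for t in search:      # beta  = t*b ranges over part of (b)
            c = s * a + t * b
            if 0 < nu(c) < nu(b):
                return True
    return False

ok = all(dedekind_hasse_holds(a, b)
         for a in range(1, 15) for b in range(2, 15))
```

The witness found is essentially the Euclidean remainder $a - \lfloor a/b \rfloor b$, taking $s = 1$ and $t = -\lfloor a/b \rfloor$.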

Version: 1 Owner: Henry Author(s): Henry

189.2

PID

A principal ideal domain D is an integral domain where every ideal is a principal ideal.
In a PID, an ideal (p) is maximal if and only if p is irreducible (and prime since any PID is
also a UFD).
Version: 2 Owner: drini Author(s): drini

189.3

UFD

An integral domain D such that

Every nonzero element of D that is not a unit can be factored into a product of a finite number of irreducibles.

If $p_1 p_2 \cdots p_r$ and $q_1 q_2 \cdots q_s$ are two factorizations of the same element into irreducibles, then $r = s$ and we can reorder the $q_j$ in such a way that $q_j$ is an associate of $p_j$

is called a unique factorization domain (UFD).
Some of the classic results about UFDs:

In a UFD, the concepts of prime element and irreducible element coincide.

If F is a field, F[x] is a UFD.

If D is a UFD, then D[x] (the ring of polynomials in the variable x over D) is also a UFD.

Since $R[x, y] \cong R[x][y]$, these results can be extended to rings of polynomials with a finite number of variables.

If D is a principal ideal domain, then it is also a UFD.

The converse, however, is not true. Let F be a field and consider the UFD F[x, y]. Let I be the ideal consisting of all the elements of F[x, y] whose constant term is 0. Then it can be proved that I is not a principal ideal. Therefore not every UFD is a PID.
Version: 4 Owner: drini Author(s): drini


189.4

a finite integral domain is a field

A finite (commutative) integral domain is a field.

Let R be a finite integral domain. Let a be a nonzero element of R.

Define a function $\varphi : R \to R$ by $\varphi(r) = ar$.

Suppose $\varphi(r) = \varphi(s)$ for some $r, s \in R$. Then $ar = as$, which implies $a(r - s) = 0$. Since $a \neq 0$ and R is a cancellation ring, we have $r - s = 0$. So $r = s$, and hence $\varphi$ is injective.
Since R is finite and $\varphi$ is injective, by the pigeonhole principle we see that $\varphi$ is also surjective. Thus there exists some $b \in R$ such that $\varphi(b) = ab = 1_R$, and thus a is a unit.
Thus R is a finite division ring. Since it is commutative, it is also a field.
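The pigeonhole argument can be watched in action in $\mathbb{Z}_p$ for a prime p, which is a finite integral domain: every multiplication map is injective on a finite set, hence surjective, so every nonzero element has an inverse. A small illustration (the variable names are ours):

```python
p = 7  # Z_7 is a finite integral domain

inverses = {}
for a in range(1, p):
    # image of the map r -> a*r (mod p) on the finite set Z_p
    image = {(a * r) % p for r in range(p)}
    assert len(image) == p  # injective, hence surjective by pigeonhole
    # surjectivity yields some b with a*b = 1: a is a unit
    inverses[a] = next(b for b in range(p) if (a * b) % p == 1)
```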
Version: 5 Owner: saforres Author(s): saforres

189.5

an artinian integral domain is a field

Let R be an integral domain and artinian.

Let $a \in R$ with $a \neq 0$. Then $R \supseteq aR \supseteq a^2 R \supseteq \cdots$, and this chain must stabilize since R is artinian.
If $a^n R = a^{n+1} R$, then there exists $r \in R$ such that $a^n = a^{n+1} r$; therefore, since $a^n \neq 0$ (as R is an integral domain), we must have $1 = ar$. Hence a is a unit.
Therefore, every artinian integral domain is a field.
Version: 3 Owner: saforres Author(s): saforres

189.6

example of PID

Important examples of principal ideal domains:


The ring of the integers Z.
The ring of polynomials in one variable over a field, i.e. a ring of the form F[X], where
F is a field. Note that the ring of polynomials in more than one variable over a field is
never a PID.
Both of these examples are actually examples of Euclidean rings, which are always PIDs.
There are, however, more complicated examples of PIDs which are not Euclidean rings.

Version: 2 Owner: sleske Author(s): sleske

189.7

field of quotients

field of quotients
1: R integral domain

2: $A = \{a/b \mid a, b \in R,\ b \neq 0\}$

3: $\forall a_1/b_1, a_2/b_2 \in A : a_1 b_2 = a_2 b_1 \Rightarrow a_1/b_1 \sim a_2/b_2$

4: $A/\sim$

random and presumably unrelated definition


1: An irreducible polynomial
2: in F [x]
3: for some field F
4: whose formal derivative is nonzero
5: If the characteristic of F is 0, then every nonzero irreducible polynomial is separable
6: If the characteristic of F is $p \neq 0$, then a nonzero irreducible polynomial is separable if and only if it cannot be written as a polynomial in $x^p$.
Note: This is a seed entry written using a short-hand format described in this FAQ.
Version: 3 Owner: bwebste Author(s): yark, apmxi

189.8

integral domain

An integral domain is a cancellation ring which has an identity element $1 \neq 0$.


Integral domains are usually also assumed to be commutative rings.
Version: 8 Owner: djao Author(s): djao

189.9

irreducible

Let D be an integral domain, and let r be a nonzero element of D. We say that r is irreducible in D if for any factorization $r = ab$ in D we must have that a or b is a unit.
Version: 2 Owner: drini Author(s): drini

189.10

motivation for Euclidean domains

UFDs, PIDs, and Euclidean domains are ways of building successively more of standard
number theory into a domain.
First, observe that the units are numbers that are 1-like. Obvious examples, besides 1 itself, are $-1$ in $\mathbb{Z}$, or $i$ and $-i$ in the complex integers.
Ideals behave something like the set of multiples of an element; in $\mathbb{Z}$ those sets are exactly the ideals (together with the zero ideal), and ideals in other rings have some similar behavior.
In commutative rings, prime ideals have one property similar to the prime numbers. Specifically, a product of two elements is in the ideal exactly when one of those elements is already in the ideal, the way that if $a \cdot b$ is even we know that either a or b is even; but if we know it is a multiple of four, that could be because both are even but not divisible by four.
The other property most associated with prime numbers is their irreducibility: the only way to factor an irreducible element is to use a unit, and since units are 1-like, that doesn't really break the element into smaller pieces. (Specifically, the non-unit factor can always be broken into another unit times the original irreducible element.)
In a UFD these two properties of prime numbers coincide for non-zero numbers. All
prime elements (generators of prime ideals) are irreducible and all irreducibles are prime
elements. In addition, all numbers can be factored into prime elements the same way integers can be factored into primes.
A principal ideal domain behaves even more like the integers by adding the concept of a greatest common divisor. Formally this holds because for any two ideals, in any ring, we can find a minimal ideal which contains both of them, and in a PID we have the guarantee that the new ideal is generated by a particular element: the greatest common divisor. The Dedekind-Hasse valuation on the ring encodes this property, by requiring that, if a is not a multiple of b (that is, not in (b)), then there is a common divisor which is simpler than b (the formal definition is that the element be of the form $ax + by$, but there is, in general, a connection between linear combinations of elements and their greatest common divisor).
Being a Euclidean domain is an even stronger requirement, the most important effect of which is to provide Euclid's algorithm for finding g.c.d.s. The key property is that division with remainders can be performed akin to the way it is done in the integers. A Euclidean valuation again encodes this property by ensuring that remainders are limited (specifically, requiring that the norm of the remainder be less than the norm of the divisor). This forces the remainders to get successively smaller, guaranteeing that the process eventually halts.
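For the integers themselves, the algorithm is three lines: each step replaces (a, b) by (b, a mod b), and the Euclidean valuation $|\cdot|$ of the second argument strictly decreases, so the loop halts:

```python
def euclid_gcd(a, b):
    # Division with remainder: |a mod b| < |b|, so the second argument
    # strictly decreases and the process terminates with gcd(a, b).
    while b != 0:
        a, b = b, a % b
    return abs(a)

g = euclid_gcd(1071, 462)  # gcd(1071, 462) = 21
```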
Version: 2 Owner: Henry Author(s): Henry

189.11

zero divisor

Let R be a ring. A nonzero element $a \in R$ is called a zero divisor if there exists a nonzero element $b \in R$ such that $a \cdot b = 0$.
Example: Let $R = \mathbb{Z}_6$. Then the elements 2 and 3 are zero divisors, since $2 \cdot 3 \equiv 0 \pmod{6}$.
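The example generalizes: in $\mathbb{Z}_n$ the zero divisors are exactly the nonzero elements sharing a nontrivial common factor with n. A brute-force enumeration (the helper is our own):

```python
def zero_divisors(n):
    # nonzero a in Z_n with a*b = 0 (mod n) for some nonzero b
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]

zd6 = zero_divisors(6)  # 2*3 = 0 and 4*3 = 0 in Z_6
zd7 = zero_divisors(7)  # Z_7 is a field, so there are none
```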
Version: 2 Owner: saforres Author(s): saforres


Chapter 190
13H05 Regular local rings
190.1

regular local ring

A local ring R of dimension n is regular if and only if its maximal ideal $\mathfrak{m}$ is generated by n elements.
Equivalently, R is regular if $\dim_{R/\mathfrak{m}} \mathfrak{m}/\mathfrak{m}^2 = \dim R$, where the first dimension is that of a vector space, and the latter is the Krull dimension, since by Nakayama's lemma, elements generate $\mathfrak{m}$ if and only if their images under the projection generate $\mathfrak{m}/\mathfrak{m}^2$.
By Krull's principal ideal theorem, $\mathfrak{m}$ cannot be generated by fewer than n elements, so the maximal ideals of regular local rings have a minimal number of generators.
Version: 1 Owner: bwebste Author(s): bwebste


Chapter 191
13H99 Miscellaneous
191.1

local ring

Commutative case
A commutative ring with multiplicative identity is called local if it has exactly one maximal ideal.
This is the case if and only if $1 \neq 0$ and the sum of any two non-units in the ring is again a non-unit; the unique maximal ideal consists precisely of the non-units.
The name comes from the fact that these rings are important in the study of the local
behavior of varieties and manifolds: the ring of function germs at a point is always local.
(The reason is simple: a germ f is invertible in the ring of germs at x if and only if $f(x) \neq 0$,
which implies that the sum of two non-invertible elements is again non-invertible.) This is
also why schemes, the generalizations of varieties, are defined as certain locally ringed spaces.
Other examples of local rings include:
All fields are local. The unique maximal ideal is (0).
Rings of formal power series over a field are local, even in several variables. The unique
maximal ideal consists of those power series without constant term.
if R is a commutative ring with multiplicative identity, and p is a prime ideal in R,
then the localization of R at p, written as Rp, is always local. The unique maximal
ideal in this ring is pRp.
All discrete valuation rings are local.
A local ring R with maximal ideal m is also written as (R, m).
Every local ring (R, m) is a topological ring in a natural way, taking the powers of m as a
neighborhood base of 0.

Given two local rings (R, m) and (S, n), a local ring homomorphism from R to S is a ring homomorphism $f : R \to S$ (respecting the multiplicative identities) with $f(m) \subseteq n$.
These are precisely the ring homomorphisms that are continuous with respect to the given
topologies on R and S.
The residue field of the local ring (R, m) is the field R/m.

General case
One also considers non-commutative local rings. A ring with multiplicative identity is called local if it has a unique maximal left ideal. In that case, the ring also has a unique maximal right ideal, and the two ideals coincide with the ring's Jacobson radical, which in this case consists precisely of the non-units in the ring.
A ring R is local if and only if the following condition holds: we have $1 \neq 0$, and whenever $x \in R$ is not invertible, then $1 - x$ is invertible.
All skew fields are local rings. More interesting examples are given by endomorphism rings: a finite-length module over some ring is indecomposable if and only if its endomorphism ring is local, a consequence of Fitting's lemma.
Version: 9 Owner: djao Author(s): AxelBoldt, djao

191.2

semi-local ring

A semi-local ring is a commutative ring with finitely many maximal ideals.


Version: 1 Owner: Evandar Author(s): Evandar


Chapter 192
13J10 Complete rings, completion
192.1

completion

Let (X, d) be a metric space. Let C be the set of all Cauchy sequences $\{x_n\}_{n \in \mathbb{N}}$ in X. Define an equivalence relation $\sim$ on C by setting $\{x_n\} \sim \{y_n\}$ if the interleave sequence of the sequences $\{x_n\}$ and $\{y_n\}$ is also a Cauchy sequence. The completion of X is defined to be the set $\bar{X}$ of equivalence classes of C modulo $\sim$.

The metric d on X extends to a metric on $\bar{X}$ in the following manner:
$$\bar{d}(\{x_n\}, \{y_n\}) := \lim_{n \to \infty} d(x_n, y_n),$$
where $\{x_n\}$ and $\{y_n\}$ are representative Cauchy sequences of elements in $\bar{X}$. The definition of $\sim$ is tailored so that the limit in the above definition is well defined, and the fact that these sequences are Cauchy, together with the fact that $\mathbb{R}$ is complete, ensures that the limit exists. The space $\bar{X}$ with this metric is of course a complete metric space.

Note the similarity between the construction of $\bar{X}$ and the construction of $\mathbb{R}$ from $\mathbb{Q}$. The process used here is the same as that used to construct the real numbers $\mathbb{R}$, except for the minor detail that one can not use the terminology of metric spaces in the construction of $\mathbb{R}$ itself, because it is necessary to construct $\mathbb{R}$ in the first place before one can define metric spaces.
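To see the equivalence relation at work, take two different Cauchy sequences of rationals approaching $\sqrt{2}$: decimal truncations and Newton iterates. Their interleave sequence is again Cauchy (its successive gaps shrink), so the two sequences name the same point of the completion $\mathbb{R}$ of $\mathbb{Q}$. A numerical sketch with exact rational arithmetic (the helper names are ours):

```python
from fractions import Fraction
from math import isqrt

def decimal_trunc(n):
    # floor(sqrt(2) * 10^n) / 10^n, a rational approximation from below
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def newton(n):
    # n Newton iterations for sqrt(2), starting from 2
    x = Fraction(2)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

a = [decimal_trunc(k) for k in range(1, 11)]
b = [newton(k) for k in range(1, 11)]
interleave = [x for pair in zip(a, b) for x in pair]
gaps = [abs(interleave[i + 1] - interleave[i])
        for i in range(len(interleave) - 1)]
# the gaps shrink toward 0, the hallmark of a Cauchy sequence
```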

192.1.1

Metric spaces with richer structure

If the metric space X has an algebraic structure, then in many cases this algebraic structure carries through unchanged to $\bar{X}$ simply by applying it one element at a time to sequences in X. We will not attempt to state this principle precisely, but we will mention the following important instances:

1. If $(X, \cdot)$ is a topological group, then $\bar{X}$ is also a topological group with multiplication defined by
$$\{x_n\} \cdot \{y_n\} = \{x_n \cdot y_n\}.$$

2. If X is a topological ring, then addition and multiplication extend to $\bar{X}$ and make the completion into a topological ring.
3. If F is a field with a valuation v, then the completion of F with respect to the metric
imposed by v is a topological field, denoted Fv and called the completion of F at v.

192.1.2

Universal property of completions

The completion $\bar{X}$ of X satisfies the following universal property: for every continuous map $f : X \to Y$ of X into a complete metric space Y, there exists a unique lifting of f to a continuous map $\bar{f} : \bar{X} \to Y$, i.e. a map such that $\bar{f}$ composed with the canonical map $X \to \bar{X}$ is f. Up to isomorphism, the completion of X is the unique metric space satisfying this property.
Version: 3 Owner: djao Author(s): djao


Chapter 193
13J25 Ordered rings
193.1

ordered ring

An ordered ring is a commutative ring R with an ordering relation $\le$ such that, for every $a, b, c \in R$:
1. If $a \le b$, then $a + c \le b + c$
2. If $a \le b$ and $0 \le c$, then $c \cdot a \le c \cdot b$
Version: 2 Owner: djao Author(s): djao


Chapter 194
13J99 Miscellaneous
194.1

topological ring

A ring R which is a topological space is called a topological ring if the addition and multiplication functions are continuous functions from $R \times R$ to R.
A field which is a topological ring is called a topological field.
Version: 1 Owner: djao Author(s): djao


Chapter 195
13N15 Derivations
195.1

derivation

Let k be a field. A derivation d on a k-algebra V is a linear transformation $d : V \to V$ satisfying the properties
$$d(x + y) = dx + dy$$
$$d(x \cdot y) = x \cdot dy + dx \cdot y$$
Version: 3 Owner: djao Author(s): rmilson, djao


Chapter 196
13P10 Polynomial ideals, Gröbner bases
196.1

Gröbner basis

Definition of monomial orderings and support:


Let F be a field, and let S be the set of monomials in $F[x_1, \ldots, x_n]$, the polynomial ring in n indeterminates. A monomial ordering is a total ordering $\le$ on S which satisfies
1. $a \le b$ implies that $ac \le bc$ for all $a, b, c \in S$.
2. $1 \le a$ for all $a \in S$.
Henceforth, assume that we have fixed a monomial ordering. Take $a \in F[x_1, \ldots, x_n]$. Define the support of a, denoted $\mathrm{supp}(a)$, to be the set of monomials of a with nonzero coefficients. Then define $M(a) = \max(\mathrm{supp}(a))$.
A partial order on F [x1 , . . . , xn ]:
We can extend our monomial ordering to a partial ordering on $F[x_1, \ldots, x_n]$ as follows: Let $a, b \in F[x_1, \ldots, x_n]$. If $\mathrm{supp}(a) \neq \mathrm{supp}(b)$, we say that $a < b$ if $\max(\mathrm{supp}(a) \setminus \mathrm{supp}(b)) < \max(\mathrm{supp}(b) \setminus \mathrm{supp}(a))$.
It can be shown that:
1. The relation defined above is indeed a partial order on $F[x_1, \ldots, x_n]$
2. Every descending chain $p_1(x_1, \ldots, x_n) > p_2(x_1, \ldots, x_n) > \ldots$ with $p_i \in F[x_1, \ldots, x_n]$ is finite.
A division algorithm for F [x1 , . . . , xn ]:
We can then formulate a division algorithm for F [x1 , . . . , xn ]:

Let $(f_1, \ldots, f_s)$ be an ordered s-tuple of polynomials, with $f_i \in F[x_1, \ldots, x_n]$. Then for each $f \in F[x_1, \ldots, x_n]$, there exist $a_1, \ldots, a_s, r \in F[x_1, \ldots, x_n]$ with r unique, such that
1. $f = a_1 f_1 + \cdots + a_s f_s + r$
2. For each $i = 1, \ldots, s$, $M(f_i)$ does not divide any monomial in $\mathrm{supp}(r)$.
Furthermore, if $a_i f_i \neq 0$ for some i, then $M(a_i f_i) \le M(f)$.
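The division algorithm can be implemented directly for F = Q. Below is a compact sketch in two variables (polynomials as dictionaries mapping exponent tuples to coefficients; lex order with x > y comes for free from Python's tuple comparison; all names are our own). The classic example divides $x^2y + xy^2 + y^2$ by $(xy - 1,\ y^2 - 1)$, giving quotients $x + y$ and $1$ and remainder $x + y + 1$:

```python
from fractions import Fraction

def leading(f):
    # M(f): the largest monomial of f under lex order (tuple comparison)
    return max(f)

def divides(m, n):
    return all(mi <= ni for mi, ni in zip(m, n))

def divmod_multi(f, divisors):
    f = dict(f)
    quotients = [dict() for _ in divisors]
    r = {}
    while f:
        m = leading(f)
        c = f[m]
        for q, g in zip(quotients, divisors):
            mg = leading(g)
            if divides(mg, m):
                # cancel the leading term of f against a multiple of g
                shift = tuple(a - b for a, b in zip(m, mg))
                coef = c / g[mg]
                q[shift] = q.get(shift, Fraction(0)) + coef
                for mono, cg in g.items():
                    t = tuple(a + b for a, b in zip(mono, shift))
                    f[t] = f.get(t, Fraction(0)) - coef * cg
                    if f[t] == 0:
                        del f[t]
                break
        else:
            # no leading monomial divides m: move this term to the remainder
            r[m] = r.get(m, Fraction(0)) + c
            del f[m]
    return quotients, r

one = Fraction(1)
f = {(2, 1): one, (1, 2): one, (0, 2): one}  # x^2 y + x y^2 + y^2
g1 = {(1, 1): one, (0, 0): -one}             # x y - 1
g2 = {(0, 2): one, (0, 0): -one}             # y^2 - 1
(q1, q2), r = divmod_multi(f, [g1, g2])
```

Each pass strictly decreases the largest monomial of the working polynomial, which is exactly why the descending-chain condition above guarantees termination.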
Definition of Gröbner basis:
Let I be a nonzero ideal of $F[x_1, \ldots, x_n]$. A finite set $T \subseteq I$ of polynomials is a Gröbner basis for I if for all $b \in I$ with $b \neq 0$ there exists $p \in T$ such that $M(p) \mid M(b)$.
Existence of Gröbner bases:
Every ideal $I \subseteq F[x_1, \ldots, x_n]$ other than the zero ideal has a Gröbner basis. Additionally, any Gröbner basis for I is also a basis of I.
Version: 22 Owner: saforres Author(s): saforres


Chapter 197
14-00 General reference works
(handbooks, dictionaries,
bibliographies, etc.)
197.1

Picard group

The Picard group of a variety, scheme, or more generally locally ringed space $(X, \mathcal{O}_X)$ is the group of locally free $\mathcal{O}_X$-modules of rank 1 with tensor product over $\mathcal{O}_X$ as the operation.

It is not difficult to see this is isomorphic to $H^1(X, \mathcal{O}_X^*)$, the first sheaf cohomology group of the multiplicative sheaf $\mathcal{O}_X^*$ which consists of the units of $\mathcal{O}_X$.

Version: 1 Owner: nerdy2 Author(s): nerdy2

197.2

affine space

Affine space of dimension n over a field k is simply the set of ordered n-tuples $k^n$.
Version: 2 Owner: nerdy2 Author(s): nerdy2

197.3

affine variety

An affine variety over an algebraically closed field k is a subset of some affine space k n over
k which can be described as the vanishing set of finitely many polynomials in n variables
with coefficients in k, and which cannot be written as the union of two smaller such sets.

For example, the locus described by $Y - X^2 = 0$ as a subset of $\mathbb{C}^2$ is an affine variety over the complex numbers. But the locus described by $Y \cdot X = 0$ is not (as it is the union of the loci $X = 0$ and $Y = 0$).
One can define a subset of affine space k n or an affine variety in k n to be closed if it is a subset
defined by the vanishing set of finitely many polynomials in n variables with coefficients in k.
The closed subsets then actually satisfy the requirements for closed sets in a topology, so this
defines a topology on the affine variety known as the Zariski topology. The definition above
has the extra condition that an affine variety not be the union of two closed subsets, i.e. it is
required to be an irreducible topological space in the Zariski topology. Anything then satisfying
the definition without possibly the irreducibility is known as an (affine) algebraic set.
A quasi-affine variety is then an open set (in the Zariski topology) of an affine variety.
Note that some geometers do not require what they call a variety to be irreducible, that is
they call algebraic sets varieties. The most prevalent definition however requires varieties to
be irreducible.
References: Hartshorne, Algebraic Geometry.
Version: 5 Owner: nerdy2 Author(s): nerdy2

197.4

dual isogeny

Let E and E′ be elliptic curves over a field K of characteristic $\neq 2, 3$, let $f : E \to E'$ be an isogeny of degree m, and let [m] denote the multiplication-by-m isogeny on E. Then there exists a unique isogeny $\hat{f} : E' \to E$, called the dual isogeny to f, such that $\hat{f} \circ f = [m]$.
Often only the existence of a dual isogeny is needed, but the construction is explicit via the composition
$$E' \to \mathrm{Div}^0(E') \to \mathrm{Div}^0(E) \to E,$$
where $\mathrm{Div}^0$ denotes the divisors of degree 0 on an elliptic curve.
Version: 2 Owner: mathcam Author(s): mathcam, nerdy2

197.5

finite morphism

A finite morphism of affine schemes $f : \mathrm{Spec}\, A \to \mathrm{Spec}\, B$ is a morphism with the property that the associated homomorphism of rings $f^{\#} : B \to A$ makes A into a finite B-algebra.
Likewise, a finite morphism of affine varieties $f : V \to W$ is a morphism with the property that the associated homomorphism of rings $f^{\#} : A(W) \to A(V)$ makes A(V) into a finite A(W)-module, where A(V) denotes the coordinate ring of V.
A morphism $f : X \to Y$ of schemes is finite if Y has a covering by finitely many open affine schemes $U_i$, such that $f^{-1}(U_i) = V_i$ is an open affine subscheme of X for each i, and the induced map $f|_{V_i} : V_i \to U_i$ is finite for each i.
Likewise, a morphism $f : X \to Y$ of varieties is finite if Y has a covering by finitely many open affine varieties $U_i$ such that $f^{-1}(U_i) = V_i$ is an open affine subvariety of X for each i, and the induced map $f|_{V_i} : V_i \to U_i$ is finite for each i.
As an example, consider the map $f : \mathbb{A}^1 \to \mathbb{A}^1$ given by $x \mapsto x^2$, where $\mathbb{A}^1$ is the affine line over some algebraically closed field. The associated map of rings is $k[x] \to k[x]$, $x \mapsto x^2$, which clearly is finite, so the original morphism of affine varieties is finite.
Version: 1 Owner: nerdy2 Author(s): nerdy2

197.6

isogeny

Let E and E′ be elliptic curves over a field k. An isogeny between E and E′ is a finite morphism $f : E \to E'$ that preserves basepoints.
The two curves are called isogenous if there is an isogeny between them. This is an
equivalence relation, symmetry being due to the existence of the dual isogeny. Every isogeny
is an algebraic homomorphism and thus induces homomorphisms of the groups of the elliptic
curves for k-valued points.
Version: 4 Owner: mathcam Author(s): mathcam, nerdy2

197.7

line bundle

In algebraic geometry, the term line bundle refers to a locally free coherent sheaf of rank 1, also called an invertible sheaf. In manifold theory, it refers to a real or complex one dimensional vector bundle. These notions are equivalent on a non-singular complex algebraic variety X: given a one dimensional vector bundle, its sheaf of holomorphic sections is locally free and of rank 1. Similarly, given a locally free sheaf $\mathcal{F}$ of rank one, the space
$$L = \bigcup_{x \in X} \mathcal{F}_x / \mathfrak{m}_x \mathcal{F}_x,$$
given the coarsest topology for which sections of $\mathcal{F}$ define continuous functions, is a vector bundle of complex dimension 1 over X, with the obvious map taking the stalk over a point to that point.

Version: 2 Owner: bwebste Author(s): bwebste

197.8

nonsingular variety

A variety over an algebraically closed field k is nonsingular at a point x if the local ring Ox is
a regular local ring. Equivalently, if around the point, one has an open affine neighborhood
wherein the variety is cut out by certain polynomials F1 , . . . Fn of m variables x1 , . . . , xm ,
then it is nonsingular at x if the Jacobian has maximal rank at that point. Otherwise, x is
a singular point.
A variety is nonsingular if it is nonsingular at each point.
Over the real or complex numbers, nonsingularity corresponds to smoothness: at nonsingular points, varieties are locally real or complex manifolds (this is simply the implicit function theorem).
Singular points generally have corners or self intersections. Typical examples are the curves $x^2 = y^3$, which has a cusp at $(0, 0)$ and is nonsingular everywhere else, and $x^2(x + 1) = y^2$, which has a self-intersection at $(0, 0)$ and is nonsingular everywhere else.
Version: 5 Owner: bwebste Author(s): bwebste, nerdy2

197.9

projective space

Projective space and homogeneous coordinates. Let K be a field. Projective space of dimension n over K, typically denoted by $KP^n$, is the set of lines passing through the origin in $K^{n+1}$. More formally, consider the equivalence relation $\sim$ on the set of non-zero points $K^{n+1} \setminus \{0\}$ defined by
$$x \sim \lambda x, \quad x \in K^{n+1} \setminus \{0\}, \quad \lambda \in K \setminus \{0\}.$$
Projective space is defined to be the set of the corresponding equivalence classes.
Every $x = (x_0, \ldots, x_n) \in K^{n+1} \setminus \{0\}$ determines an element of projective space, namely the line passing through x. Formally, this line is the equivalence class $[x]$, or $[x_0 : x_1 : \ldots : x_n]$, as it is commonly denoted. The numbers $x_0, \ldots, x_n$ are referred to as homogeneous coordinates of the line. Homogeneous coordinates differ from ordinary coordinate systems in that a given element of projective space is labelled by multiple homogeneous coordinates.
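Concretely, two homogeneous coordinate vectors label the same element of $KP^n$ exactly when one is a nonzero scalar multiple of the other, equivalently when all $2 \times 2$ minors $x_i y_j - x_j y_i$ vanish. A small check over the rationals (the helper is our own):

```python
from itertools import combinations

def same_projective_point(x, y):
    # [x] = [y] in KP^n iff y = lambda * x for some nonzero lambda,
    # iff every 2x2 minor x_i*y_j - x_j*y_i vanishes.
    if all(c == 0 for c in x) or all(c == 0 for c in y):
        raise ValueError("(0, ..., 0) does not represent a point")
    return all(x[i] * y[j] - x[j] * y[i] == 0
               for i, j in combinations(range(len(x)), 2))

p = same_projective_point((1, 2, 3), (2, 4, 6))  # same line through 0
q = same_projective_point((1, 2, 3), (1, 2, 4))  # different lines
```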

Affine coordinates. Projective space also admits a more conventional type of coordinate system, called affine coordinates. Let $A_0 \subseteq KP^n$ be the subset of all elements $p = [x_0 : x_1 : \ldots : x_n] \in KP^n$ such that $x_0 \neq 0$. We then define the functions
$$X_i : A_0 \to K, \quad i = 1, \ldots, n,$$
according to
$$X_i(p) = \frac{x_i}{x_0},$$
where $(x_0, x_1, \ldots, x_n)$ is any element of the equivalence class representing p. This definition makes sense because other elements of the same equivalence class have the form
$$(y_0, y_1, \ldots, y_n) = \lambda (x_0, x_1, \ldots, x_n)$$
for some non-zero $\lambda \in K$, and hence
$$\frac{y_i}{y_0} = \frac{x_i}{x_0}.$$
The functions $X_1, \ldots, X_n$ are called affine coordinates relative to the hyperplane
$$H_0 = \{x_0 = 1\} \subseteq K^{n+1}.$$
Geometrically, affine coordinates can be described by saying that the elements of $A_0$ are lines in $K^{n+1}$ that are not parallel to $H_0$, and that every such line intersects $H_0$ in one and exactly one point. Conversely, points of $H_0$ are represented by tuples $(1, x_1, \ldots, x_n)$ with $(x_1, \ldots, x_n) \in K^n$, and each such point uniquely labels a line $[1 : x_1 : \ldots : x_n]$ in $A_0$.
It must be noted that a single system of affine coordinates does not cover all of projective
space. However, it is possible to define a system of affine coordinates relative to every
hyperplane in Kn+1 that does not contain the origin. In particular, we get n + 1 different
systems of affine coordinates corresponding to the hyperplanes {xi = 1}, i = 0, 1, . . . , n.
Every element of projective space is covered by at least one of these n + 1 systems of
coordinates.
Projective automorphisms. The invertible linear transformations of $K^{n+1}$ determine a corresponding group of automorphisms of projective space. Let $A : K^{n+1} \to K^{n+1}$ be a non-singular linear transformation. The corresponding projective automorphism $[A] : KP^n \to KP^n$ is defined to be the transformation with action
$$[x] \mapsto [Ax], \quad x \in K^{n+1}.$$
It is evident that for every non-zero $\lambda \in K$ the transformation $\lambda A$ gives the same projective automorphism as A. For this reason, it is convenient to identify the group of projective automorphisms with
$$\mathrm{PSL}_n(K) = \mathrm{SL}_{n+1}(K) / \mu_{n+1}.$$
Here $\mathrm{SL}_{n+1}$ denotes the special linear group of unimodular linear transformations, that is transformations of $K^{n+1}$ having determinant 1. The symbol $\mu_{n+1}$ denotes the subgroup generated by elements $\omega I$, where $\omega$ is an $(n+1)$st root of unity. The unimodular condition is almost sufficient to uniquely specify a linear transformation to represent a given projective action. However, note that
$$\det(\omega A) = \omega^{n+1} \det(A), \quad A \in \mathrm{SL}_{n+1}, \quad \omega \in K,$$
and hence if the field K admits non-trivial $(n+1)$st roots of unity $\omega$, then multiplication by such an $\omega$ preserves the determinant. Hence, the projective action of $A \in \mathrm{SL}_{n+1}$ coincides with the projective action of $\omega A$, making it necessary to quotient $\mathrm{SL}_{n+1}$ by the normal subgroup $\mu_{n+1}$ in order to obtain the group of projective automorphisms.
Version: 4 Owner: rmilson Author(s): rmilson, nerdy2

197.10  projective variety

Given a homogeneous polynomial F of degree d in n + 1 variables X_0, …, X_n and a point
[x_0 : … : x_n] of projective space, we cannot evaluate F at that point, because the point has
multiple such representations; but since F(λx_0, …, λx_n) = λ^d F(x_0, …, x_n), we can say
whether any such representation (and hence all of them) vanishes at that point.
A projective variety over an algebraically closed field k is a subset of some projective space P^n_k
over k which can be described as the common vanishing locus of finitely many homogeneous
polynomials with coefficients in k, and which is not the union of two such smaller loci.
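The well-definedness of vanishing can be sanity-checked numerically with exact integer arithmetic; the polynomial below is an arbitrary homogeneous example of our choosing, not one from the entry:

```python
# F(x, y, z) = x^3 + 2*x*y*z - z^3 is homogeneous of degree d = 3.
def F(x, y, z):
    return x**3 + 2 * x * y * z - z**3

d = 3
point = (1, 2, -1)          # one representative of [1 : 2 : -1]
for lam in (2, -3, 5):      # rescale the representative by nonzero scalars
    scaled = tuple(lam * c for c in point)
    # Homogeneity: F(lambda * x) == lambda^d * F(x)
    assert F(*scaled) == lam**d * F(*point)
# In particular F vanishes at one representative iff it vanishes at all of them.
```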
Version: 4 Owner: nerdy2 Author(s): nerdy2

197.11  quasi-finite morphism

A morphism f : X → Y of schemes or varieties is quasi-finite if for each y ∈ Y, the fiber
f^{-1}(y) is a finite set.
Version: 1 Owner: nerdy2 Author(s): nerdy2


Chapter 198
14A10 Varieties and morphisms
198.1  Zariski topology
Let A^n_k denote the affine space k^n over a field k. The Zariski topology on A^n_k is defined to be
the topology whose closed sets are the sets

V(I) := {x ∈ A^n_k | f(x) = 0 for all f ∈ I} ⊆ A^n_k,

where I ⊆ k[X_1, …, X_n] is any ideal in the polynomial ring k[X_1, …, X_n]. For any affine variety
V ⊆ A^n_k, the Zariski topology on V is defined to be the subspace topology induced on V as
a subset of A^n_k.

Let P^n_k denote n-dimensional projective space over k. The Zariski topology on P^n_k is defined
to be the topology whose closed sets are the sets

V(I) := {x ∈ P^n_k | f(x) = 0 for all f ∈ I} ⊆ P^n_k,

where I ⊆ k[X_0, …, X_n] is any homogeneous ideal in the graded k-algebra k[X_0, …, X_n].
For any projective variety V ⊆ P^n_k, the Zariski topology on V is defined to be the subspace
topology induced on V as a subset of P^n_k.
The Zariski topology is the predominant topology used in the study of algebraic geometry.
Every regular morphism of varieties is continuous in the Zariski topology (but not every
continuous map in the Zariski topology is a regular morphism). In fact, the Zariski topology
is the weakest topology on varieties making points in A^1_k closed and regular morphisms
continuous.
Version: 1 Owner: djao Author(s): djao


198.2  algebraic map

A map f : X → Y between quasi-affine varieties X ⊆ k^n, Y ⊆ k^m over a field k is called
algebraic if there is a map f′ : k^n → k^m whose component functions are polynomials, such
that f′ restricts to f on X.
Alternatively, f is algebraic if the pullback map f^* : C(Y) → C(X) takes the coordinate
ring of Y, k[Y], to the coordinate ring of X, k[X].
Version: 2 Owner: bwebste Author(s): bwebste

198.3  algebraic sets and polynomial ideals

Suppose k is an algebraically closed field. Let A^n_k denote affine n-space over k.

For S ⊆ k[x_1, …, x_n], define V(S), the zero set of S, by

V(S) = {(a_1, …, a_n) ∈ A^n_k | f(a_1, …, a_n) = 0 for all f ∈ S}.

We say that Y ⊆ A^n_k is an algebraic set if there exists T ⊆ k[x_1, …, x_n] such that Y = V(T).
The subsets of A^n_k which are algebraic sets induce the Zariski topology on A^n_k.

For Y ⊆ A^n_k, define the ideal of Y in k[x_1, …, x_n] by

I(Y) = {f ∈ k[x_1, …, x_n] | f(P) = 0 for all P ∈ Y}.

It is easily shown that I(Y) is an ideal of k[x_1, …, x_n].

Thus we have defined a function V mapping from subsets of k[x_1, …, x_n] to algebraic sets
in A^n_k, and a function I mapping from subsets of A^n_k to ideals of k[x_1, …, x_n].
These maps have the following properties:

1. S_1 ⊆ S_2 ⊆ k[x_1, …, x_n] implies V(S_1) ⊇ V(S_2).
2. Y_1 ⊆ Y_2 ⊆ A^n_k implies I(Y_1) ⊇ I(Y_2).
3. For any ideal a ⊆ k[x_1, …, x_n], I(V(a)) = Rad(a).
4. For any Y ⊆ A^n_k, V(I(Y)) = Ȳ, the closure of Y in the Zariski topology.

From the above, we see that there is a 1-1 correspondence between algebraic sets in A^n_k and
radical ideals of k[x_1, …, x_n]. Furthermore, an algebraic set Y ⊆ A^n_k is an affine variety if
and only if I(Y) is a prime ideal.
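Property 1 of the V/I dictionary is purely formal, so even though the correspondence above assumes k algebraically closed, the inclusion-reversal can be illustrated by brute force over a small finite field; the field F_5 and the two polynomials below are arbitrary choices of ours:

```python
from itertools import product

p = 5                                    # work over the finite field F_5
points = list(product(range(p), repeat=2))

def V(polys):
    """Zero set in A^2(F_5) of a collection of polynomials, by brute force."""
    return {pt for pt in points if all(f(*pt) % p == 0 for f in polys)}

f = lambda x, y: y - x**2
g = lambda x, y: x * y
S1 = [f]
S2 = [f, g]                              # S1 is a subset of S2 ...
assert V(S2) <= V(S1)                    # ... hence V(S1) contains V(S2)
```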
Version: 5 Owner: saforres Author(s): saforres

198.4  noetherian topological space

A topological space X is called noetherian if it satisfies the descending chain condition for
closed subsets: for any sequence

Y_1 ⊇ Y_2 ⊇ ⋯

of closed subsets Y_i of X, there is an integer m such that Y_m = Y_{m+1} = ⋯.

Example: The space A^n_k (affine n-space over a field k) under the Zariski topology is an example of a
noetherian topological space. By properties of the ideal of a subset of A^n_k, we know that if
Y_1 ⊇ Y_2 ⊇ ⋯ is a descending chain of Zariski-closed subsets, then I(Y_1) ⊆ I(Y_2) ⊆ ⋯ is
an ascending chain of ideals of k[x_1, …, x_n].

Since k[x_1, …, x_n] is a noetherian ring, there exists an integer m such that I(Y_m) = I(Y_{m+1}) = ⋯. But because we have a one-to-one correspondence between radical ideals of k[x_1, …, x_n]
and Zariski-closed sets in A^n_k, we have V(I(Y_i)) = Y_i for all i. Hence Y_m = Y_{m+1} = ⋯ as
required.
Version: 3 Owner: saforres Author(s): saforres

198.5  regular map

A regular map φ : k^n → k^m between affine spaces over an algebraically closed field is merely
one given by polynomials. That is, there are m polynomials F_1, …, F_m in n variables such
that the map is given by φ(x_1, …, x_n) = (F_1(x), …, F_m(x)), where x stands for the many
components x_i.
A regular map φ : V → W between affine varieties is one which is the restriction of a regular
map between affine spaces. That is, if V ⊆ k^n and W ⊆ k^m, then there is a regular map
ψ : k^n → k^m with ψ(V) ⊆ W and φ = ψ|_V. So, this is a map given by polynomials, whose
image lies in the intended target.

A regular map between algebraic varieties is a locally regular map. That is, φ : V → W is
regular if around each point x there is an affine variety V_x and around each point φ(x) ∈ W
there is an affine variety W_{φ(x)} with φ(V_x) ⊆ W_{φ(x)}, and such that the restriction V_x → W_{φ(x)}
is a regular map of affine varieties.

Version: 2 Owner: nerdy2 Author(s): nerdy2

198.6  structure sheaf

Let X be an irreducible algebraic variety over a field k, together with the Zariski topology.
Fix a point x ∈ X and let U ⊆ X be any affine open subset of X containing x. Define

o_x := {f/g ∈ k(U) | f, g ∈ k[U], g(x) ≠ 0},

where k[U] is the coordinate ring of U and k(U) is the fraction field of k[U]. The ring o_x is
independent of the choice of affine open neighborhood U of x.

The structure sheaf on the variety X is the sheaf of rings whose sections on any open subset
U ⊆ X are given by

O_X(U) := ⋂_{x ∈ U} o_x,

and where the restriction map for V ⊆ U is the inclusion map O_X(U) ↪ O_X(V).

There is an equivalence of categories under which an affine variety X with its structure sheaf
corresponds to the prime spectrum of the coordinate ring k[X]. In fact, the topological
embedding X ↪ Spec(k[X]) gives rise to a lattice-preserving bijection¹ between the open
sets of X and of Spec(k[X]), and the sections of the structure sheaf on X are isomorphic to
the sections of the structure sheaf of Spec(k[X]).

Version: 1 Owner: djao Author(s): djao

¹ Those who are fans of topos theory will recognize this map as an isomorphism of topoi.


Chapter 199
14A15 Schemes and morphisms
199.1  closed immersion

A morphism of schemes f : (X, O_X) → (Y, O_Y) is a closed immersion if:

1. As a map of topological spaces, f : X → Y is a homeomorphism from X onto a closed
subset of Y.
2. The morphism of sheaves O_Y → f_* O_X associated with f is an epimorphism in the
category of sheaves.
Version: 2 Owner: djao Author(s): djao

199.2  coherent sheaf

Let R be a ring, and let X = Spec R be its prime spectrum. Given an R-module M, one
can define a presheaf on X by defining its sections on an open set U to be O_X(U) ⊗_R M. We
call the sheafification of this M̃, and a sheaf of this form on X is called quasi-coherent. If M
is a finitely generated module, then M̃ is called coherent. A sheaf on an arbitrary scheme
X is called (quasi-)coherent if it is (quasi-)coherent on each open affine subset of X.
Version: 5 Owner: bwebste Author(s): bwebste


199.3  fibre product

Let S be a scheme, and let i : X → S and j : Y → S be schemes over S. A fibre product
of X and Y over S is a scheme X ×_S Y together with morphisms

p : X ×_S Y → X
q : X ×_S Y → Y

such that given any scheme Z with morphisms

x : Z → X
y : Z → Y

where i ∘ x = j ∘ y, there exists a unique morphism

(x, y) : Z → X ×_S Y

making the diagram

[commutative diagram: Z maps to X via x, to Y via y, and to X ×_S Y via (x, y); the projections p, q carry X ×_S Y to X and Y, which in turn map to S via i and j]

commute. In other words, a fibre product is an object X ×_S Y, together with morphisms
p, q making the diagram commute, with the universal property that any other collection
(Z, x, y) forming such a commutative diagram maps into (X ×_S Y, p, q).
Fibre products of schemes always exist and are unique up to canonical isomorphism.
Other notes. Fibre products are also called pullbacks and can be defined in any category
using the same definition (but they need not exist in general). For example, they always exist in
the category of modules over a fixed ring, as well as in the category of groups.
Version: 6 Owner: djao Author(s): djao

199.4  prime spectrum

199.4.1  Spec as a set

Let R be any commutative ring with identity. The prime spectrum Spec(R) of R is defined
to be the set

{P ⊊ R | P is a prime ideal of R}.

For any subset A of R, we define the variety of A to be the set

V(A) := {P ∈ Spec(R) | A ⊆ P} ⊆ Spec(R).

It is enough to restrict attention to subsets of R which are ideals, since, for any subset A of
R, we have V(A) = V(I) where I is the ideal generated by A. In fact, even more is true:
V(I) = V(√I), where √I denotes the radical of the ideal I.

199.4.2  Spec as a topological space
We impose a topology on Spec(R) by defining the sets V(A) to be the collection of closed
subsets of Spec(R) (that is, a subset of Spec(R) is open if and only if it equals the complement
of V(A) for some subset A). The equations

⋂_α V(I_α) = V( ⋃_α I_α ),

⋃_{i=1}^{n} V(I_i) = V( ⋂_{i=1}^{n} I_i ),

for any ideals I_α, I_i of R, establish that this collection does constitute a topology on Spec(R).
This topology is called the Zariski topology in light of its relationship to the Zariski topology
on an algebraic variety (see Section 199.4.4 below). Note that a point P ∈ Spec(R) is closed
if and only if P ⊆ R is a maximal ideal.
A distinguished open set of Spec(R) is defined to be an open set of the form

Spec(R)_f := {P ∈ Spec(R) | f ∉ P} = Spec(R) \ V({f}),

for any element f ∈ R. The collection of distinguished open sets forms a topological basis
for the open sets of Spec(R). In fact, we have

Spec(R) \ V(A) = ⋃_{f ∈ A} Spec(R)_f.

The topological space Spec(R) has the following additional properties:

• Spec(R) is compact (but almost never Hausdorff).
• A subset of Spec(R) is an irreducible closed set if and only if it equals V(P) for some
prime ideal P of R.
• For f ∈ R, let R_f denote the localization of R at f. Then the topological spaces
Spec(R)_f and Spec(R_f) are naturally homeomorphic, via the correspondence sending
a prime ideal of R not containing f to the induced prime ideal in R_f.

• For P ∈ Spec(R), let R_P denote the localization of R at the prime ideal P. Then the
subspace {Q ∈ Spec(R) | Q ⊆ P} of Spec(R) and the space Spec(R_P) are naturally homeomorphic, via
the correspondence sending a prime ideal of R contained in P to the induced prime
ideal in R_P.
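The simplest example is Spec(Z), whose closed points are the primes (p). There, V({f}) consists of the primes dividing f, and the distinguished open Spec(Z)_f of the primes not dividing f. A small Python sketch (restricted to closed points below a chosen bound, which is our own simplification):

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def V(f, bound):
    """Closed points of V({f}) in Spec(Z): primes p with f in (p), i.e. p | f."""
    return {p for p in primes_up_to(bound) if f % p == 0}

def distinguished_open(f, bound):
    """Closed points of Spec(Z)_f: primes p with f not in (p)."""
    return {p for p in primes_up_to(bound) if f % p != 0}

# For f = 12 = 2^2 * 3, V({12}) = {(2), (3)} among the closed points:
assert V(12, 30) == {2, 3}
assert distinguished_open(12, 30) == set(primes_up_to(30)) - {2, 3}
```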

199.4.3  Spec as a sheaf
For convenience, we adopt the usual convention of writing X for Spec(R). For any f ∈ R
and P ∈ X_f, let ι_{f,P} : R_f → R_P be the natural inclusion map. Define a presheaf of rings
O_X on X by setting

O_X(U) := { (s_P) ∈ ∏_{P ∈ U} R_P | U has an open cover {X_{f_α}} with elements s_α ∈ R_{f_α}
such that s_P = ι_{f_α,P}(s_α) whenever P ∈ X_{f_α} },

for each open set U ⊆ X. The restriction map res_{U,V} : O_X(U) → O_X(V) is the map induced
by the projection map

∏_{P ∈ U} R_P → ∏_{P ∈ V} R_P,

for each open subset V ⊆ U. The presheaf O_X satisfies the following properties:

1. O_X is a sheaf.
2. O_X(X_f) = R_f for every f ∈ R.
3. The stalk (O_X)_P is equal to R_P for every P ∈ X. (In particular, X is a locally ringed space.)
4. The restriction sheaf of O_X to X_f is isomorphic as a sheaf to O_{Spec(R_f)}.

199.4.4  Relationship to algebraic varieties
Spec(R) is sometimes called an affine scheme because of the close relationship between
affine varieties in A^n_k and the Spec of their corresponding coordinate rings. In fact, the
correspondence between the two is an equivalence of categories, although a complete statement of this equivalence requires the notion of morphisms of schemes and will not be given
here. Nevertheless, we explain what we can of this correspondence below.

Let k be a field and write as usual A^n_k for the vector space k^n. Recall that an affine variety
V in A^n_k is the set of common zeros of some prime ideal I ⊆ k[X_1, …, X_n]. The coordinate
ring of V is defined to be the ring R := k[X_1, …, X_n]/I, and there is an embedding i : V ↪
Spec(R) given by

i(a_1, …, a_n) := (X_1 − a_1, …, X_n − a_n) ∈ Spec(R).

The function i is not a homeomorphism, because it is not a bijection (its image is contained
inside the set of maximal ideals of R). However, the map i does define an order-preserving
bijection between the open sets of V and the open sets of Spec(R) in the Zariski topology.
This isomorphism between these two lattices of open sets can be used to equate the structure sheaf
of Spec(R) with the structure sheaf of the variety V, showing that the two objects are identical
in every respect except for the minor detail of Spec(R) having more points than V.
The additional points of Spec(R) are valuable in many situations, and a systematic study of
them leads to the general notion of schemes. As just one example, the classical Bézout's theorem is only valid for algebraically closed fields, but admits a scheme-theoretic generalization
which holds over non-algebraically closed fields as well. We will not attempt to explain the
theory of schemes in detail, instead referring the interested reader to the references below.

REFERENCES
1. Robin Hartshorne, Algebraic Geometry, Springer-Verlag New York, Inc., 1977 (GTM 52).
2. David Mumford, The Red Book of Varieties and Schemes, Second Expanded Edition, Springer-Verlag, 1999 (LNM 1358).

Version: 11 Owner: djao Author(s): djao

199.5  scheme

199.5.1  Definitions

An affine scheme is a locally ringed space (X, O_X) with the property that there exists a
ring R (commutative, with identity) whose prime spectrum Spec(R) is isomorphic to X as
a locally ringed space.

A scheme is a locally ringed space (X, O_X) which has an open cover {U_α}_{α ∈ I} with the
property that each open set U_α, together with its restriction sheaf O_X|_{U_α}, is an affine scheme.

We define a morphism of schemes between two schemes (X, O_X) and (Y, O_Y) to be a morphism of locally ringed spaces f : (X, O_X) → (Y, O_Y). A scheme over Y is defined to be a
scheme X together with a morphism of schemes X → Y.

Note: Some authors, notably Mumford and Grothendieck, require that a scheme be separated
as well (and use the term prescheme to describe a scheme that is not separated), but we will
not impose this requirement.


199.5.2  Examples

• Every affine scheme is clearly a scheme as well. In particular, Spec(R) is a scheme for
any commutative ring R.
• Every variety can be interpreted as a scheme. An affine variety corresponds to the
prime spectrum of its coordinate ring, and a projective variety has an open cover by
affine pieces each of which is an affine variety, and hence an affine scheme.
Version: 6 Owner: djao Author(s): djao

199.6  separated scheme

A scheme X is defined to be a separated scheme if the morphism

Δ : X → X ×_{Spec Z} X

into the fibre product X ×_{Spec Z} X which is induced by the identity maps id : X → X in
each coordinate is a closed immersion.

Note the similarity to the definition of a Hausdorff topological space. In the situation of
topological spaces, a space X is Hausdorff if and only if the diagonal morphism X → X × X
is a closed embedding of topological spaces. The definition of a separated scheme is very
similar, except that the topological product is replaced with the scheme fibre product.
Version: 2 Owner: djao Author(s): djao

199.7  singular set

The singular set of a variety X is the set of singular points. This is a proper subvariety. A
subvariety Y of X is contained in the singular set if and only if its local ring O_Y is not regular.
Version: 2 Owner: bwebste Author(s): bwebste


Chapter 200
14A99 Miscellaneous
200.1  Cartier divisor

On a scheme X, a Cartier divisor is a global section of the sheaf K*/O*, where K* is the
multiplicative sheaf of meromorphic functions, and O* the multiplicative sheaf of invertible
regular functions (the units of the structure sheaf).

More explicitly, a Cartier divisor is a choice of open cover U_i of X, and meromorphic functions
f_i ∈ K*(U_i), such that f_i/f_j ∈ O*(U_i ∩ U_j), along with two Cartier divisors being the same
if the open cover of one is a refinement of the other, with the same functions attached to
open sets, or if f_i is replaced by g f_i with g ∈ O*.
Intuitively, the only information carried by a Cartier divisor is where it vanishes, and the order
to which it does so there. Thus, a Cartier divisor should give us a Weil divisor, and vice versa. On nice
schemes (for example, nonsingular schemes over an algebraically closed field), it does.
Version: 1 Owner: bwebste Author(s): bwebste

200.2  General position

In the projective plane, 4 points are said to be in general position iff no three of them lie
on the same line. Dually, 4 lines are in general position iff no three of them meet in the same
point. This definition naturally extends to more than four points/lines.
Version: 2 Owner: jgade Author(s): jgade


200.3  Serre's twisting theorem

Let X be a scheme, and L an ample invertible sheaf on X. Then for any coherent sheaf F
and all sufficiently large n, H^i(F ⊗ L^n) = 0 for every i > 0; that is, the higher sheaf cohomology of F ⊗ L^n is
trivial.
Version: 2 Owner: bwebste Author(s): bwebste

200.4  ample

An invertible sheaf L on a scheme X is called ample if for any coherent sheaf F, the sheaf F ⊗ L^n is
globally generated for sufficiently large n. A sheaf L is ample if and only if L^m is very ample
for some m.
Version: 1 Owner: bwebste Author(s): bwebste

200.5  height of a prime ideal

Let R be a commutative ring. The height of a prime ideal p is the supremum of all integers
n such that there exists a chain p_0 ⊊ p_1 ⊊ ⋯ ⊊ p_n = p of distinct prime ideals.
The Krull dimension of R is the supremum of the heights of all the prime ideals of R.
Version: 5 Owner: saforres Author(s): saforres

200.6  invertible sheaf

A sheaf L of O_X-modules on a ringed space X is called invertible if there is another sheaf
of O_X-modules L′ such that L ⊗ L′ ≅ O_X. A sheaf is invertible if and only if it is locally free
of rank 1, and its inverse is the sheaf L^{-1} ≅ Hom(L, O_X), paired with L by the obvious map.

The isomorphism classes of invertible sheaves form an abelian group under tensor multiplication,
called the Picard group of X.
Version: 3 Owner: bwebste Author(s): bwebste


200.7  locally free

A sheaf F on a ringed space X is called locally free if for each point x ∈ X, there is an open
neighborhood U of x such that F|_U is free, or equivalently, F_x, the stalk of F at x, is free as
an O_x-module. If F_x is of finite rank n, then F is said to be of rank n.
Version: 1 Owner: bwebste Author(s): bwebste

200.8  normal irreducible varieties are nonsingular in codimension 1

Theorem 18. Let X be a normal irreducible variety. The singular set S ⊆ X has codimension
2 or more.

Proof. Assume not. We may assume X is affine, since codimension is local. Now let u be the ideal
of functions vanishing on S. This is an ideal of height 1, so the local ring of S, O_S = A(X)_u,
where A(X) is the affine ring of X, is a 1-dimensional local ring, and integrally closed, since
X is normal. Any integrally closed 1-dimensional local domain is a DVR, and thus regular.
But S is the singular set, so its local ring is not regular, a contradiction. ∎

Version: 3 Owner: bwebste Author(s): bwebste

200.9  sheaf of meromorphic functions

Given a ringed space X, let K_X be the sheafification of the presheaf associating to each
open set U the fraction field of O_X(U), where O_X is the structure sheaf. This is called the
sheaf of meromorphic functions, since on a complex algebraic variety it is isomorphic to the
sheaf of functions which are meromorphic in the analytic sense.
Version: 2 Owner: bwebste Author(s): bwebste

200.10  very ample

An invertible sheaf L on a scheme X over a field k is called very ample if (1) at each point
x ∈ X, there is a global section s ∈ L(X) not vanishing at x, and (2) for each pair of points
x, y ∈ X, there is a global section s ∈ L(X) such that s vanishes at exactly one of x and y.
Equivalently, L is very ample if there is an embedding f : X → P^n such that f^* O(1) ≅ L.

If k is algebraically closed, Riemann-Roch shows that on a curve X of genus g, any invertible sheaf of
degree at least 2g + 1 is very ample.
Version: 1 Owner: bwebste Author(s): bwebste


Chapter 201
14C20 Divisors, linear systems,
invertible sheaves
201.1  divisor

A divisor D on a projective nonsingular curve over an algebraically closed field is a formal
sum of points D = Σ_p n_p p, where only finitely many of the n_p ∈ Z are nonzero.

The degree of a divisor D is deg(D) = Σ_p n_p.
Version: 3 Owner: nerdy2 Author(s): nerdy2


Chapter 202
Rational and birational maps

202.1  general type

A variety is said to be of general type if its Kodaira dimension equals its dimension.
Version: 2 Owner: nerdy2 Author(s): nerdy2


Chapter 203
14F05 Vector bundles, sheaves,
related constructions
203.1  direct image (functor)

If f : X → Y is a continuous map of topological spaces, and if Sheaves(X) is the category
of sheaves of abelian groups on X (and similarly for Sheaves(Y)), then the direct image
functor f_* : Sheaves(X) → Sheaves(Y) sends a sheaf F on X to its direct image f_* F
on Y, given by (f_* F)(V) = F(f^{-1}(V)) for open V ⊆ Y. A morphism of sheaves g : F → G obviously gives rise to a morphism of sheaves
f_* g : f_* F → f_* G, and this determines a functor.

If F is a sheaf of abelian groups (or anything else), so is f_* F, so likewise we get direct image
functors f_* : Ab(X) → Ab(Y), where Ab(X) is the category of sheaves of abelian groups
on X.
Version: 2 Owner: bwebste Author(s): bwebste, nerdy2


Chapter 204

14F20 Étale and other Grothendieck topologies and cohomologies
204.1  site

A site is a category with a Grothendieck topology.


Version: 3 Owner: drini Author(s): drini, nerdy2


Chapter 205
14F25 Classical real and complex
cohomology
205.1  Serre duality

Serre duality is a theorem which can be thought of as a massive generalization of Poincaré duality
to an algebraic context.

The most general version of Serre duality states that on certain schemes X of dimension n, including all projective varieties over any algebraically closed field k, there is a natural isomorphism

Ext^i(F, ω) ≅ H^{n−i}(X, F)^∗,

where F is any coherent sheaf on X and ω is a fixed sheaf, called the dualizing sheaf.

In special cases, this reduces to more approachable forms. If X is nonsingular (or more
generally, Cohen-Macaulay), then ω is simply ⋀^n Ω, where Ω is the sheaf of differentials on
X.

If F is locally free, then

Ext^i(F, ω) ≅ Ext^i(O_X, F^∨ ⊗ ω) ≅ H^i(X, F^∨ ⊗ ω),

so that we obtain the somewhat more familiar-looking fact that there is a perfect pairing

H^i(X, F^∨ ⊗ ω) × H^{n−i}(X, F) → k.
Version: 3 Owner: bwebste Author(s): bwebste


205.2  sheaf cohomology

Let X be a topological space, and assume that the category of sheaves of abelian groups on
X has enough injectives. Then we define the sheaf cohomology H^i(X, F) of a sheaf F to be
the right derived functors of the global section functor F ↦ Γ(X, F).

Usually we are interested in the case where X is a scheme and F is a coherent sheaf. In this
case, it does not matter if we take the derived functors in the category of sheaves of abelian
groups or coherent sheaves.

Sheaf cohomology can be explicitly calculated using Cech


cohomology. Choose an open cover
{Ui } of X. We define
Y
C i (F) =
F(Uj0ji )
T T
where the product is over i + 1 element subsets of {1, . . . , n} and Uj0 ji = Uj0 Uji . If
s F(Uj0ji ) is thought of as an element of C i (F), then the differential
(s) =

Y
`

j`+1 1

(1) s|Uj0 j` kj`+1 ji

k=j` +1

i (X, F)
makes C (F) into a chain complex. The cohomology of this complex is denoted H

and called the Cech


cohomology of F with respect to the cover {Ui }. There is a natural map
i
i

H (X, F) H (X, F) which is an isomorphism for sufficiently fine covers.


Version: 5 Owner: bwebste Author(s): bwebste


Chapter 206
14G05 Rational points
206.1  Hasse principle

Let V be an algebraic variety defined over a field K. By V(K) we denote the set of points
on V defined over K. Let K̄ be an algebraic closure of K. For a valuation ν of K, we write
K_ν for the completion of K at ν. In this case, we can also consider V defined over K_ν and
talk about V(K_ν).

Definition 13.
1. If V(K) is not empty we say that V is soluble in K.
2. If V(K_ν) is not empty then we say that V is locally soluble at ν.
3. If V is locally soluble for all ν then we say that V satisfies the Hasse condition, or
we say that V/K is everywhere locally soluble.
The Hasse principle is the idea (or desire) that an everywhere locally soluble variety V
must have a rational point, i.e. a point defined over K. Unfortunately this is not true: there
are examples of varieties that satisfy the Hasse condition but have no rational points.
Example: A quadric (of any dimension) satisfies the Hasse condition. This was proved by
Minkowski for quadrics over Q and by Hasse for quadrics over a number field.

REFERENCES
1. Swinnerton-Dyer, Diophantine Equations: Progress and Problems, online notes.

Version: 4 Owner: alozano Author(s): alozano



Chapter 207
14H37 Automorphisms
207.1  Frobenius morphism

Let K be a field of characteristic p > 0 and let q = p^r. Let C be a curve defined over K
contained in P^N, the projective space of dimension N. Define the homogeneous ideal of C
to be (the ideal generated by):

I(C) = {f ∈ K[X_0, …, X_N] | f(P) = 0 for all P ∈ C, f homogeneous}.

For f ∈ K[X_0, …, X_N] of the form f = Σ_i a_i X_0^{i_0} ⋯ X_N^{i_N}, we define f^{(q)} = Σ_i a_i^q X_0^{i_0} ⋯ X_N^{i_N}. We
define a new curve C^{(q)} as the zero set of the ideal (generated by):

I(C^{(q)}) = {f^{(q)} | f ∈ I(C)}.

Definition 14. The q-th power Frobenius morphism is defined to be:

φ : C → C^{(q)},    φ([x_0, …, x_N]) = [x_0^q, …, x_N^q].
In order to check that the Frobenius morphism is well defined we need to prove that

P = [x_0, …, x_N] ∈ C  ⟹  φ(P) = [x_0^q, …, x_N^q] ∈ C^{(q)}.

This is equivalent to proving that for any g ∈ I(C^{(q)}) we have g(φ(P)) = 0. Without loss of
generality we can assume that g is a generator of I(C^{(q)}), i.e. g is of the form g = f^{(q)} for
some f ∈ I(C). Then:

g(φ(P)) = f^{(q)}(φ(P)) = f^{(q)}([x_0^q, …, x_N^q])
        = (f([x_0, …, x_N]))^q    [since a^q + b^q = (a + b)^q in characteristic p]
        = (f(P))^q
        = 0    [since P ∈ C and f ∈ I(C)],

as desired.
Example: Suppose E is an elliptic curve defined over K = F_q, the field of q = p^r elements. Since
the q-th power map fixes the coefficients of E, which lie in F_q, we have E = E^{(q)}. Hence the
Frobenius morphism is an endomorphism (or isogeny) of the elliptic curve.
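The well-definedness computation above can be tested numerically for a sample curve. The sketch below (the curve and the realization of F_9 are assumed examples of ours, not from the entry) models F_9 as F_3[i] with i² = −1, lists the affine points of E : y² = x³ + x + 1 over F_9, and checks that the 3-power Frobenius carries each of them back onto E = E^(3):

```python
p = 3

# Elements of F_9 = F_3[i], i^2 = -1, stored as pairs (a, b) meaning a + b*i.
def add(u, v):
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

def mul(u, v):
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def power(u, n):
    r = (1, 0)
    for _ in range(n):
        r = mul(r, u)
    return r

F9 = [(a, b) for a in range(p) for b in range(p)]
one = (1, 0)

def on_curve(x, y):
    # E : y^2 = x^3 + x + 1, coefficients in F_3 (an assumed example curve)
    return mul(y, y) == add(add(power(x, 3), x), one)

points = [(x, y) for x in F9 for y in F9 if on_curve(x, y)]
frob = lambda z: power(z, p)           # the p-power Frobenius on F_9
assert points                          # the curve has affine F_9-points
for x, y in points:
    assert on_curve(frob(x), frob(y))  # phi(P) lands back on E = E^(3)
```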

REFERENCES
1. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.

Version: 1 Owner: alozano Author(s): alozano


Chapter 208
14H45 Special curves and curves of
low genus
208.1  Fermat's spiral

Fermat's spiral (or parabolic spiral) is an archimedean spiral with the equation:

r² = a²θ.

This curve was discovered by Fermat in 1636.

Version: 1 Owner: vladm Author(s): vladm

208.2  archimedean spiral

An archimedean spiral is a spiral with the following polar equation:

r = a θ^{1/t},

where a is a real constant, r is the radial distance, θ is the angle, and t is a constant.

For an archimedean spiral the curvature is given by the following formula:

κ = t θ^{1 − 1/t} (t²θ² + t + 1) / ( a (t²θ² + 1)^{3/2} ).

Version: 1 Owner: vladm Author(s): vladm



208.3  folium of Descartes

The folium of Descartes is a curve with the cartesian equation

x³ + y³ = 3axy,

and the parametric equations

x = 3at / (1 + t³),
y = 3at² / (1 + t³).

The folium of Descartes has as asymptote the line

d : y + x + a = 0,

and the property that

A₁ = A₂ = 3a²/2,

where A₁ is the area of the loop and A₂ is the area between the curve and its asymptote.
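That the parametric equations satisfy the cartesian equation can be verified with exact rational arithmetic; the value a = 2 and the sample parameters below are arbitrary choices:

```python
from fractions import Fraction

a = Fraction(2)                # an arbitrary sample value of the constant a

def folium_point(t):
    """Parametric point of the folium of Descartes (requires t != -1)."""
    t = Fraction(t)
    denom = 1 + t**3
    return 3 * a * t / denom, 3 * a * t**2 / denom

for t in (Fraction(1, 2), Fraction(3), Fraction(-1, 3)):
    x, y = folium_point(t)
    assert x**3 + y**3 == 3 * a * x * y   # the cartesian equation holds exactly
```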

Version: 2 Owner: vladm Author(s): vladm

208.4  spiral

Let c(s) be a curve, and let τ(s) and κ(s) be the torsion and the curvature of c(s). Then a
spiral is a curve for which the ratio τ(s)/κ(s) is constant for all s.
Version: 1 Owner: vladm Author(s): vladm


Chapter 209
14H50 Plane and space curves
209.1  torsion (space curve)

Let g : I → R³ be a parameterized space curve, assumed to be regular and free of points of inflection.
Physically, we conceive of g(t) as a particle moving through space. Let T(t), N(t), B(t) denote the corresponding moving trihedron. The speed of this particle is given by

s(t) = ‖g′(t)‖.

In order for a moving particle to escape the osculating plane, it is necessary for the particle
to roll along the axis of its tangent vector, thereby lifting the normal acceleration vector
out of the osculating plane. The rate of roll, that is to say the rate at which the osculating
plane rotates about the tangent vector, is given by B(t) · N′(t); it is a number that depends
on the speed of the particle. The rate of roll relative to the particle's speed is the quantity

τ(t) = B(t) · N′(t) / s(t) = ( (g′(t) × g″(t)) · g‴(t) ) / ‖g′(t) × g″(t)‖²,

called the torsion of the curve, a quantity that is invariant with respect to reparameterization.
The torsion τ(t) is, therefore, a measure of an intrinsic property of the oriented space curve,
another real number that can be covariantly assigned to the point g(t).
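The formula τ = ((g′ × g″) · g‴) / ‖g′ × g″‖² can be exercised on the circular helix g(t) = (cos t, sin t, ct), whose torsion is the constant c/(1 + c²); the helix and the value c = 0.5 are our own test example, with the derivatives entered analytically:

```python
from math import sin, cos

def torsion(g1, g2, g3):
    """tau = ((g' x g'') . g''') / |g' x g''|^2 for derivative vectors g1, g2, g3."""
    cx = (g1[1] * g2[2] - g1[2] * g2[1],     # cross product g' x g''
          g1[2] * g2[0] - g1[0] * g2[2],
          g1[0] * g2[1] - g1[1] * g2[0])
    num = sum(ci * wi for ci, wi in zip(cx, g3))
    den = sum(ci * ci for ci in cx)
    return num / den

# Circular helix g(t) = (cos t, sin t, c t); its torsion is c / (1 + c^2).
c = 0.5
for t in (0.0, 1.0, 2.5):
    g1 = (-sin(t), cos(t), c)        # g'(t)
    g2 = (-cos(t), -sin(t), 0.0)     # g''(t)
    g3 = (sin(t), -cos(t), 0.0)      # g'''(t)
    assert abs(torsion(g1, g2, g3) - c / (1 + c**2)) < 1e-12
```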
Version: 2 Owner: rmilson Author(s): rmilson, slider142


Chapter 210
14H52 Elliptic curves
210.1  Birch and Swinnerton-Dyer conjecture

Let E be an elliptic curve over Q, and let L(E, s) be the L-series attached to E.
Conjecture 1 (Birch and Swinnerton-Dyer).
1. L(E, s) has a zero at s = 1 of order equal to the rank of E(Q).
2. Let R = rank(E(Q)). Then the leading coefficient of L(E, s) at s = 1, i.e. lim_{s→1} (s − 1)^{−R} L(E, s),
has a concrete expression involving the following invariants of E: the real period, the
Shafarevich-Tate group, the elliptic regulator and the Néron model of E.
J. Tate said about this conjecture: This remarkable conjecture relates the behavior
of a function L at a point where it is not at present known to be defined to the
order of a group (Sha) which is not known to be finite!
The following is an easy consequence of the B-SD conjecture:
Conjecture 2. The root number of E, denoted by w, indicates the parity of the rank of the
elliptic curve; that is, w = 1 if and only if the rank is even.
There has been a great amount of research towards the B-SD conjecture. For example, there
are some particular cases which are already known:
Theorem 19 (Coates, Wiles). Suppose E is an elliptic curve defined over an imaginary
quadratic field K, with complex multiplication by K, and L(E, s) is the L-series of E. If
L(E, 1) 6= 0 then E(K) is finite.


REFERENCES
1. Claymath Institute, Description, online.
2. J. Coates, A. Wiles, On the Conjecture of Birch and Swinnerton-Dyer, Inv. Math. 39, 223-251
(1977).
3. James Milne, Elliptic Curves, online course notes.
4. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
5. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.

Version: 5 Owner: alozano Author(s): alozano

210.2  Hasse's bound for elliptic curves over finite fields

Let E be an elliptic curve defined over a finite field F_q with q = p^r elements (p ∈ Z a
prime). The following theorem gives a bound on the size of E(F_q), N_q, i.e. the number of points
of E defined over F_q. This was first conjectured by Emil Artin (in his thesis!) and proved
by Helmut Hasse in the 1930s.

Theorem 20 (Hasse).

|N_q − q − 1| ≤ 2√q.

Remark: Let a_p = p + 1 − N_p as in the definition of the L-series of an elliptic curve. Then
Hasse's bound reads:

|a_p| ≤ 2√p.

This fact is key for the convergence of the L-series of E.
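Hasse's bound is easy to observe experimentally by brute-force point counting; the curve y² = x³ + x + 1 below is an assumed example of ours (with good reduction away from 2 and 31), and here the count includes the point at infinity:

```python
def count_points(a, b, p):
    """#E(F_p) for E : y^2 = x^3 + a*x + b, counting the point at infinity."""
    squares = {}                      # squares[v] = number of y with y^2 = v mod p
    for y in range(p):
        squares[y * y % p] = squares.get(y * y % p, 0) + 1
    affine = sum(squares.get((x**3 + a * x + b) % p, 0) for x in range(p))
    return affine + 1

# E : y^2 = x^3 + x + 1 (an assumed sample curve).
for p in (3, 5, 7, 11, 13):
    N = count_points(1, 1, p)
    assert (N - p - 1) ** 2 <= 4 * p   # Hasse: |N - p - 1| <= 2*sqrt(p)

# For instance, over F_5 this curve has 9 points, so a_5 = 5 + 1 - 9 = -3.
assert count_points(1, 1, 5) == 9
```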
Version: 2 Owner: alozano Author(s): alozano

210.3

L-series of an elliptic curve

Let $E$ be an elliptic curve over $\mathbb{Q}$ with Weierstrass equation:
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$
with coefficients $a_i \in \mathbb{Z}$. For $p$ a prime in $\mathbb{Z}$, define $N_p$ as the number of points in the reduction of the curve modulo $p$, that is:
$$N_p = \#\{(x, y) \in \mathbb{F}_p^2 : y^2 + a_1 xy + a_3 y - x^3 - a_2 x^2 - a_4 x - a_6 \equiv 0 \bmod p\}$$

Also, let $a_p = p + 1 - N_p$. We define the local part at $p$ of the L-series to be:
$$L_p(T) = \begin{cases} 1 - a_p T + pT^2, & \text{if } E \text{ has good reduction at } p,\\ 1 - T, & \text{if } E \text{ has split multiplicative reduction at } p,\\ 1 + T, & \text{if } E \text{ has non-split multiplicative reduction at } p,\\ 1, & \text{if } E \text{ has additive reduction at } p.\end{cases}$$

Definition 15. The L-series of the elliptic curve $E$ is defined to be:
$$L(E, s) = \prod_p \frac{1}{L_p(p^{-s})}$$
where the product is over all primes in $\mathbb{Z}$.


Note: The product converges and gives an analytic function for all $\operatorname{Re}(s) > 3/2$. This follows from the fact that $|a_p| \leq 2\sqrt{p}$. However, far more is true:


Theorem 21 (Taylor, Wiles). The L-series $L(E, s)$ has an analytic continuation to the entire complex plane, and it satisfies the following functional equation. Define
$$\Lambda(E, s) = (N_{E/\mathbb{Q}})^{s/2} (2\pi)^{-s} \Gamma(s) L(E, s)$$
where $N_{E/\mathbb{Q}}$ is the conductor of $E$ and $\Gamma$ is the gamma function. Then:
$$\Lambda(E, s) = w \Lambda(E, 2 - s) \quad \text{with } w = \pm 1$$
The number $w$ above is usually called the root number of $E$, and it has an important conjectural meaning (see Birch and Swinnerton-Dyer conjecture).
This result was previously known for elliptic curves with complex multiplication (Deuring, Weil), until the general case was finally proven.
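Inside the region of convergence the Euler product can be evaluated by truncation. The sketch below (an illustration, not from the text) does this at $s = 2$ for the curve $y^2 = x^3 - x$, whose only bad prime is 2, so every odd prime uses the good-reduction factor:

```python
# Truncated Euler product for L(E, s) at s = 2 (inside the region of
# convergence Re(s) > 3/2) for E: y^2 = x^3 - x.  The only bad prime is
# p = 2 (local factor 1); the good primes use 1 - a_p T + p T^2.
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [p for p, ok in enumerate(sieve) if ok]

def a_p(p):
    # p + 1 minus the point count (affine solutions plus infinity)
    affine = sum(1 for x in range(p) for y in range(p)
                 if (y * y - x**3 + x) % p == 0)
    return p + 1 - (affine + 1)

def L_truncated(s, bound):
    value = 1.0
    for p in primes_up_to(bound):
        if p == 2:
            continue  # bad (additive) reduction: local factor 1
        value *= 1.0 / (1 - a_p(p) * p**(-s) + p * p**(-2 * s))
    return value

print(L_truncated(2, 200))
```

The Hasse bound guarantees each local factor stays bounded away from 0, which is why the truncated product behaves well here.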

REFERENCES
1. James Milne, Elliptic Curves, online course notes.
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton
University Press, Princeton, New Jersey, 1971.

Version: 3 Owner: alozano Author(s): mathcam, alozano


210.4

Mazur's theorem on torsion of elliptic curves

Theorem 22 (Mazur). Let $E/\mathbb{Q}$ be an elliptic curve. Then the torsion subgroup $E_{\mathrm{torsion}}(\mathbb{Q})$ is exactly one of the following groups:
$$\mathbb{Z}/N\mathbb{Z}, \quad 1 \leq N \leq 10 \ \text{ or } \ N = 12$$
$$\mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/2N\mathbb{Z}, \quad 1 \leq N \leq 4$$
Note: see Nagell-Lutz theorem for an efficient algorithm to compute the torsion subgroup of
an elliptic curve defined over Q.

REFERENCES
1. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
2. Barry Mazur, Modular curves and the Eisenstein ideal, IHES Publ. Math. 47 (1977), 33-186.
3. Barry Mazur, Rational isogenies of prime degree, Invent. Math. 44 (1978), 129-162.

Version: 2 Owner: alozano Author(s): alozano

210.5

Mordell curve

A Mordell curve is an elliptic curve $E/K$, for some field $K$, which admits a model by a Weierstrass equation of the form:
$$y^2 = x^3 + k, \quad k \in K$$
Examples:
1. Let $E_1/\mathbb{Q} : y^2 = x^3 + 2$; this is a Mordell curve with Mordell-Weil group $E_1(\mathbb{Q}) \cong \mathbb{Z}$, generated by $(-1, 1)$.
2. Let $E_2/\mathbb{Q} : y^2 = x^3 + 109858299531561$; then $E_2(\mathbb{Q}) \cong \mathbb{Z}/3\mathbb{Z} \oplus \mathbb{Z}^5$. See generators here.
3. In general, a Mordell curve of the form $y^2 = x^3 + n^2$ has torsion group isomorphic to $\mathbb{Z}/3\mathbb{Z}$, generated by $(0, n)$.
4. Let $E_3/\mathbb{Q} : y^2 = x^3 + 496837487681$; then this is a Mordell curve with $E_3(\mathbb{Q}) \cong \mathbb{Z}^8$. See generators here.


5. Here you can find a list of the minimal-known positive and negative k for Mordell
curves of given rank, and the Mordell curves with maximum rank known (see B-SD
conjecture).
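A naive search for small integral points illustrates Example 1 (an illustration only; integral points on Mordell curves can lie far beyond any fixed search bound, and rational points further still):

```python
# Brute-force search for integer points on y^2 = x^3 + k with |x| bounded.
def integer_points(k, bound=50):
    points = []
    for x in range(-bound, bound + 1):
        rhs = x**3 + k
        if rhs < 0:
            continue
        y = round(rhs ** 0.5)
        if y * y == rhs:
            points.extend([(x, y), (x, -y)] if y else [(x, 0)])
    return points

print(integer_points(2))   # [(-1, 1), (-1, -1)]: the generator of E1(Q) and its negative
```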
Version: 1 Owner: alozano Author(s): alozano

210.6

Nagell-Lutz theorem

The following theorem, proved independently by E. Lutz and T. Nagell, gives a very efficient
method to compute the torsion subgroup of an elliptic curve defined over Q.
Theorem 23 (Nagell-Lutz Theorem). Let $E/\mathbb{Q}$ be an elliptic curve with Weierstrass equation:
$$y^2 = x^3 + Ax + B, \quad A, B \in \mathbb{Z}$$
Then for all non-zero torsion points $P$ we have:
1. The coordinates of $P$ are in $\mathbb{Z}$, i.e.
$$x(P), y(P) \in \mathbb{Z}$$
2. If $P$ is of order greater than 2, then
$$y(P)^2 \ \text{ divides } \ 4A^3 + 27B^2$$
3. If $P$ is of order 2, then
$$y(P) = 0 \quad \text{and} \quad x(P)^3 + Ax(P) + B = 0$$
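The theorem yields a finite search procedure, sketched below (a naive implementation; the points it returns are only candidates, and a complete algorithm would go on to verify that each actually has finite order):

```python
# Nagell-Lutz search on y^2 = x^3 + A x + B: torsion candidates have
# integer coordinates with y = 0 or y^2 dividing 4A^3 + 27B^2.
def integer_roots(A, c):
    """Integer roots of x^3 + A*x + c."""
    if c == 0:
        roots = {0}
        # remaining roots would satisfy x^2 = -A
        if A < 0 and int((-A) ** 0.5) ** 2 == -A:
            r = int((-A) ** 0.5)
            roots.update({r, -r})
        return roots
    divisors = {d for d in range(1, abs(c) + 1) if c % d == 0}
    return {x for d in divisors for x in (d, -d) if x**3 + A * x + c == 0}

def torsion_candidates(A, B):
    disc = 4 * A**3 + 27 * B**2
    ys = {0} | {s * y for y in range(1, int(abs(disc) ** 0.5) + 1)
                if disc % (y * y) == 0 for s in (1, -1)}
    return sorted((x, y) for y in ys for x in integer_roots(A, B - y * y))

print(torsion_candidates(0, 1))   # [(-1, 0), (0, -1), (0, 1), (2, -3), (2, 3)]
```

For $y^2 = x^3 + 1$ these five affine points together with $O$ form the full torsion group $\mathbb{Z}/6\mathbb{Z}$.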

REFERENCES
1. E. Lutz, Sur l'équation $y^2 = x^3 - Ax - B$ dans les corps $p$-adiques, J. Reine Angew. Math. 177 (1937), 431-466.
2. T. Nagell, Solution de quelques problèmes dans la théorie arithmétique des cubiques planes du premier genre, Vid. Akad. Skrifter Oslo I, 1935, Nr. 1.
3. James Milne, Elliptic Curves, online course notes.
4. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.

Version: 1 Owner: alozano Author(s): alozano


210.7

Selmer group

Given an elliptic curve E we can define two very interesting and important groups, the
Selmer group and the Tate-Shafarevich group, which together provide a measure of the
failure of the Hasse principle for elliptic curves, by measuring whether the curve is everywhere
locally soluble. Here we present the construction of these groups.
Let $E, E'$ be elliptic curves defined over $\mathbb{Q}$ and let $\bar{\mathbb{Q}}$ be an algebraic closure of $\mathbb{Q}$. Let $\phi : E \to E'$ be a non-constant isogeny (for example, we can let $E = E'$ and think of $\phi$ as being the multiplication-by-$n$ map, $[n] : E \to E$). The following standard result asserts that $\phi$ is surjective over $\bar{\mathbb{Q}}$:


Theorem 24. Let $C_1, C_2$ be curves defined over an algebraically closed field $K$ and let
$$\psi : C_1 \to C_2$$
be a morphism (or algebraic map) of curves. Then $\psi$ is either constant or surjective.

See [4], Chapter II.6.8.

Since $\phi : E(\bar{\mathbb{Q}}) \to E'(\bar{\mathbb{Q}})$ is non-constant, it must be surjective and we obtain the following exact sequence:
$$0 \to E(\bar{\mathbb{Q}})[\phi] \to E(\bar{\mathbb{Q}}) \xrightarrow{\ \phi\ } E'(\bar{\mathbb{Q}}) \to 0 \qquad (1)$$
where $E(\bar{\mathbb{Q}})[\phi] = \operatorname{Ker} \phi$. Let $G = \operatorname{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$, the absolute Galois group of $\mathbb{Q}$, and consider the $i$-th cohomology group $H^i(G, E(\bar{\mathbb{Q}}))$ (which we abbreviate by $H^i(G, E)$). Using equation (1) we obtain the following long exact sequence (see proposition 1 in group cohomology):
$$0 \to H^0(G, E(\bar{\mathbb{Q}})[\phi]) \to H^0(G, E) \to H^0(G, E') \to H^1(G, E(\bar{\mathbb{Q}})[\phi]) \to H^1(G, E) \to H^1(G, E') \qquad (2)$$
Note that
$$H^0(G, E(\bar{\mathbb{Q}})[\phi]) = (E(\bar{\mathbb{Q}})[\phi])^G = E(\mathbb{Q})[\phi]$$
and similarly
$$H^0(G, E) = E(\mathbb{Q}), \qquad H^0(G, E') = E'(\mathbb{Q})$$
From (2) we can obtain an exact sequence:
$$0 \to E'(\mathbb{Q})/\phi(E(\mathbb{Q})) \to H^1(G, E(\bar{\mathbb{Q}})[\phi]) \to H^1(G, E)[\phi] \to 0$$
We could repeat the same procedure, but this time for $E, E'$ defined over $\mathbb{Q}_p$, for some prime number $p$, and obtain a similar exact sequence with coefficients in $\mathbb{Q}_p$ which relates to the original in the following commutative diagram (here $G_p = \operatorname{Gal}(\bar{\mathbb{Q}}_p/\mathbb{Q}_p)$):

$$\begin{array}{ccccccccc}
0 & \to & E'(\mathbb{Q})/\phi(E(\mathbb{Q})) & \to & H^1(G, E(\bar{\mathbb{Q}})[\phi]) & \to & H^1(G, E)[\phi] & \to & 0 \\
 & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & E'(\mathbb{Q}_p)/\phi(E(\mathbb{Q}_p)) & \to & H^1(G_p, E(\bar{\mathbb{Q}}_p)[\phi]) & \to & H^1(G_p, E)[\phi] & \to & 0
\end{array}$$

The goal here is to find a finite group containing $E'(\mathbb{Q})/\phi(E(\mathbb{Q}))$. Unfortunately $H^1(G, E(\bar{\mathbb{Q}})[\phi])$ is not necessarily finite. With this purpose in mind, we define the $\phi$-Selmer group:
$$S^{(\phi)}(E/\mathbb{Q}) = \operatorname{Ker}\left( H^1(G, E(\bar{\mathbb{Q}})[\phi]) \to \prod_p H^1(G_p, E) \right)$$
Equivalently, the $\phi$-Selmer group is the set of elements $\gamma$ of $H^1(G, E(\bar{\mathbb{Q}})[\phi])$ whose image $\gamma_p$ in $H^1(G_p, E(\bar{\mathbb{Q}}_p)[\phi])$ comes from some element in $E'(\mathbb{Q}_p)$.


Finally, in imitation of the definition of the Selmer group, we define the Tate-Shafarevich group:
$$TS(E/\mathbb{Q}) = \operatorname{Ker}\left( H^1(G, E) \to \prod_p H^1(G_p, E) \right)$$
The Tate-Shafarevich group is precisely the group that measures the failure of the Hasse principle for the elliptic curve $E$. It is not known whether this group is always finite.

REFERENCES
1. J.P. Serre, Galois Cohomology, Springer-Verlag, New York.
2. James Milne, Elliptic Curves, online course notes.
3. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
4. R. Hartshorne, Algebraic Geometry, Springer-Verlag, 1977.

Version: 3 Owner: alozano Author(s): alozano

210.8

bad reduction

210.8.1

Singular Cubic Curves

Let $E$ be a cubic curve over a field $K$ with Weierstrass equation $f(x, y) = 0$, where:
$$f(x, y) = y^2 + a_1 xy + a_3 y - x^3 - a_2 x^2 - a_4 x - a_6$$

which has a singular point $P = (x_0, y_0)$. This is equivalent to:
$$\partial f/\partial x(P) = \partial f/\partial y(P) = 0$$
and so we can write the Taylor expansion of $f(x, y)$ at $(x_0, y_0)$ as follows:
$$f(x, y) - f(x_0, y_0) = \lambda_1 (x - x_0)^2 + \lambda_2 (x - x_0)(y - y_0) + \lambda_3 (y - y_0)^2 - (x - x_0)^3$$
$$= [(y - y_0) - \alpha(x - x_0)][(y - y_0) - \beta(x - x_0)] - (x - x_0)^3$$
for some $\lambda_i \in K$ and $\alpha, \beta \in \bar{K}$ (an algebraic closure of $K$).
Definition 16. The singular point $P$ is a node if $\alpha \neq \beta$. In this case there are two different tangent lines to $E$ at $P$, namely:
$$y - y_0 = \alpha(x - x_0), \qquad y - y_0 = \beta(x - x_0)$$
If $\alpha = \beta$ then we say that $P$ is a cusp, and there is a unique tangent line at $P$.


Note: See the entry for elliptic curve for examples of cusps and nodes.
There is a very simple criterion to know whether a cubic curve in Weierstrass form is singular
and to differentiate nodes from cusps:
Proposition 10. Let $E/K$ be given by a Weierstrass equation, and let $\Delta$ be the discriminant and $c_4$ be as in the definition of the $j$-invariant. Then:
1. $E$ is singular if and only if $\Delta = 0$,
2. $E$ has a node if and only if $\Delta = 0$ and $c_4 \neq 0$,
3. $E$ has a cusp if and only if $\Delta = 0 = c_4$.

See [2], Chapter III, Proposition 1.4, page 50.

210.8.2

Reduction of Elliptic Curves

Let $E/\mathbb{Q}$ be an elliptic curve (we could work over any number field $K$, but we choose $\mathbb{Q}$ for simplicity in the exposition). Assume that $E$ has a Weierstrass equation:
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$
with coefficients in $\mathbb{Z}$. Let $p$ be a prime in $\mathbb{Z}$. By reducing each of the coefficients $a_i$ modulo $p$ we obtain the equation of a cubic curve $\tilde{E}$ over the finite field $\mathbb{F}_p$ (the field with $p$ elements).

Definition 17.
1. If $\tilde{E}$ is a non-singular curve, then $\tilde{E}$ is an elliptic curve over $\mathbb{F}_p$ and we say that $E$ has good reduction at $p$. Otherwise, we say that $E$ has bad reduction at $p$.
2. If $\tilde{E}$ has a cusp, then we say that $E$ has additive reduction at $p$.
3. If $\tilde{E}$ has a node, then we say that $E$ has multiplicative reduction at $p$. If the slopes of the tangent lines ($\alpha$ and $\beta$ as above) are in $\mathbb{F}_p$ then the reduction is said to be split multiplicative (and non-split otherwise).
From Proposition 10 we deduce the following:
Corollary 4. Let $E/\mathbb{Q}$ be an elliptic curve with coefficients in $\mathbb{Z}$. Let $p \in \mathbb{Z}$ be a prime. If $E$ has bad reduction at $p$, then $p \mid \Delta$.
Examples:
1. $E_1 : y^2 = x^3 + 35x + 5$ has good reduction at $p = 7$.
2. However, $E_1$ has bad reduction at $p = 5$, and the reduction is additive (since modulo 5 we can write the equation as $[(y - 0) - 0 \cdot (x - 0)]^2 \equiv x^3$ and the slope is 0).
3. The elliptic curve $E_2 : y^2 = x^3 - x^2 + 35$ has bad multiplicative reduction at 5 and 7. The reduction at 5 is split, while the reduction at 7 is non-split. Indeed, modulo 5 we can write the equation as $[(y - 0) - 2(x - 0)][(y - 0) + 2(x - 0)] \equiv x^3$, the slopes being 2 and $-2$. However, for $p = 7$ the slopes are not in $\mathbb{F}_7$ ($\sqrt{-1}$ is not in $\mathbb{F}_7$).
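For curves in short Weierstrass form and primes $p \geq 5$, Proposition 10 translates directly into code (a sketch only; it does not distinguish split from non-split multiplicative reduction, and the short form hides the subtleties at $p = 2, 3$):

```python
# Reduction type of y^2 = x^3 + A x + B at a prime p >= 5, via
# Proposition 10: singular iff p | Delta; cusp (additive) iff p | c4 too.
def reduction_type(A, B, p):
    delta = -16 * (4 * A**3 + 27 * B**2)
    c4 = -48 * A
    if delta % p != 0:
        return "good"
    return "additive" if c4 % p == 0 else "multiplicative"

assert reduction_type(35, 5, 7) == "good"       # Example 1
assert reduction_type(35, 5, 5) == "additive"   # Example 2
```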

REFERENCES
1. James Milne, Elliptic Curves, online course notes.
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.
4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton
University Press, Princeton, New Jersey, 1971.

Version: 8 Owner: alozano Author(s): alozano


210.9

conductor of an elliptic curve

Let $E$ be an elliptic curve over $\mathbb{Q}$. For each prime $p \in \mathbb{Z}$ define the quantity $f_p$ as follows:
$$f_p = \begin{cases} 0, & \text{if } E \text{ has good reduction at } p,\\ 1, & \text{if } E \text{ has multiplicative reduction at } p,\\ 2, & \text{if } E \text{ has additive reduction at } p \text{, and } p \neq 2, 3,\\ 2 + \delta_p, & \text{if } E \text{ has additive reduction at } p = 2 \text{ or } 3.\end{cases}$$
where $\delta_p$ depends on wild ramification in the action of the inertia group at $p$ of $\operatorname{Gal}(\bar{\mathbb{Q}}/\mathbb{Q})$ on the Tate module $T_p(E)$.
Definition 18. The conductor $N_{E/\mathbb{Q}}$ of $E/\mathbb{Q}$ is defined to be:
$$N_{E/\mathbb{Q}} = \prod_p p^{f_p}$$
where the product is over all primes and the exponent $f_p$ is defined as above.
Example:
Let $E/\mathbb{Q} : y^2 + y = x^3 - x^2 + 2x - 2$. The primes of bad reduction for $E$ are $p = 5$ and $7$. The reduction at $p = 5$ is additive, while the reduction at $p = 7$ is multiplicative. Hence
$$N_{E/\mathbb{Q}} = 5^2 \cdot 7 = 175.$$
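Away from 2 and 3 the conductor can be assembled directly from the reduction types, as in the example (a sketch; at $p = 2, 3$ the wild part $\delta_p$ requires Tate's algorithm, which is not attempted here):

```python
# Conductor from known reduction data: good primes contribute p^0,
# multiplicative primes p^1, additive primes (p != 2, 3) p^2.
def conductor(reduction_types):
    N = 1
    for p, kind in reduction_types.items():
        if kind == "multiplicative":
            N *= p
        elif kind == "additive":
            assert p not in (2, 3), "wild part delta_p not handled"
            N *= p * p
    return N

print(conductor({5: "additive", 7: "multiplicative"}))   # 175, as in the example
```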

REFERENCES
1. James Milne, Elliptic Curves, online course notes.
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.

Version: 3 Owner: alozano Author(s): alozano

210.10

elliptic curve

210.10.1

Basics

An elliptic curve over a field K is a projective nonsingular algebraic curve E over K of genus
1 together with a point $O$ of $E$ defined over $K$. The word "genus" is taken here in the algebraic geometry sense, and has no relation with the topological notion of genus (defined as $1 - \chi/2$, where $\chi$ is the Euler characteristic) except when the field of definition $K$ is the complex numbers $\mathbb{C}$.

[Figure 210.1: Graph of $y^2 = x(x - 1)(x + 1)$]

[Figure 210.2: Graph of $y^2 = x^3 - x + 1$]
Using the Riemann-Roch theorem, one can show that every elliptic curve $E$ is the zero set of a Weierstrass equation of the form
$$E : y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6,$$
for some $a_i \in K$, where the polynomial on the right hand side has no double roots. When $K$ has characteristic other than 2 or 3, one can further simplify this Weierstrass equation into the form
$$E : y^2 = x^3 - 27 c_4 x - 54 c_6.$$
The extremely strange numbering of the coefficients is an artifact of the process by which the above equations are derived. Also, note that these equations are for affine curves; to translate them to projective curves, one has to homogenize the equations (replace $x$ with $X/Z$, and $y$ with $Y/Z$).

210.10.2

Examples

We present here some pictures of elliptic curves over the field R of real numbers. These
pictures are in some sense not representative of most of the elliptic curves that people work
with, since many of the interesting cases tend to be of elliptic curves over algebraically closed
fields. However, curves over the complex numbers (or, even worse, over algebraically closed
fields in characteristic p) are very difficult to graph in three dimensions, let alone two.
Figure 210.1 is a graph of the elliptic curve $y^2 = x^3 - x$.
Figure 210.2 shows the graph of $y^2 = x^3 - x + 1$.
Finally, Figures 210.3 and 210.4 are examples of algebraic curves that are not elliptic curves.
Both of these curves have singularities at the origin.
[Figure 210.3: Graph of $y^2 = x^2(x + 1)$. Has two tangents at the origin.]

[Figure 210.4: Graph of $y^2 = x^3$. Has a cusp at the origin.]

210.10.3

The Group Law

The points on an elliptic curve have a natural group structure, which makes the elliptic curve
into an abelian variety. There are many equivalent ways to define this group structure; two
of the most common are:
- Every Weil divisor on $E$ is linearly equivalent to a unique divisor of the form $[P] - [O]$ for some $P \in E$, where $O \in E$ is the base point. The divisor class group of $E$ then yields a group structure on the points of $E$, by way of this correspondence.
- Let $O \in E$ denote the base point. Then one can show that every line joining two points on $E$ intersects a unique third point of $E$ (after properly accounting for tangent lines as a multiple intersection). For any two points $P, Q \in E$, define their sum as follows:
1. Form the line between $P$ and $Q$; let $R$ be the third point on $E$ that intersects this line;
2. Form the line between $O$ and $R$; define $P + Q$ to be the third point on $E$ that intersects this line.
This addition operation yields a group operation on the points of $E$ having the base point $O$ for identity.
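With $O$ taken as the point at infinity on a short Weierstrass model $y^2 = x^3 + ax + b$, the chord-and-tangent recipe above collapses to explicit formulas. A minimal sketch over $\mathbb{Q}$ (exact arithmetic via fractions; the curve $y^2 = x^3 + 1$ and its torsion point $(2, 3)$ are used as a test case):

```python
# Chord-and-tangent addition on y^2 = x^3 + a*x + b over Q.  The base
# point O (point at infinity) is represented by None.
from fractions import Fraction

O = None  # identity element

def neg(P):
    return None if P is O else (P[0], -P[1])

def add(P, Q, a):
    if P is O: return Q
    if Q is O: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return O                            # vertical line: P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) / (2 * y1)  # tangent slope
    else:
        lam = (y2 - y1) / (x2 - x1)         # chord slope
    x3 = lam * lam - x1 - x2
    return (x3, lam * (x1 - x3) - y1)

# On y^2 = x^3 + 1 the point (2, 3) has order 6: repeated addition
# returns to O after six steps.
a, P = 0, (Fraction(2), Fraction(3))
Q, n = P, 1
while Q is not O:
    Q = add(Q, P, a)
    n += 1
print(n)   # 6
```

The reflection step ("form the line between $O$ and $R$") is what produces the sign flip $-y_1$ in the last line of `add`.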

210.10.4

Elliptic Curves over C

Over the complex numbers, the general correspondence between algebraic and analytic theory
specializes in the elliptic curves case to yield some very useful insights into the structure of
elliptic curves over $\mathbb{C}$. The starting point for this investigation is the Weierstrass $\wp$-function, which we define here.
Definition 10. A lattice in $\mathbb{C}$ is a subgroup $L$ of the additive group $\mathbb{C}$ which is generated by two elements $\omega_1, \omega_2 \in \mathbb{C}$ that are linearly independent over $\mathbb{R}$.
Definition 11. For any lattice $L$ in $\mathbb{C}$, the Weierstrass $\wp$-function of $L$ is the function $\wp_L : \mathbb{C} \to \mathbb{C}$ given by
$$\wp_L(z) := \frac{1}{z^2} + \sum_{\omega \in L \setminus \{0\}} \left( \frac{1}{(z - \omega)^2} - \frac{1}{\omega^2} \right).$$
When the lattice $L$ is clear from context, it is customary to suppress it from the notation and simply write $\wp$ for the Weierstrass $\wp$-function.
Some basic properties of the Weierstrass $\wp$-function:

- $\wp(z)$ is a meromorphic function with double poles at points in $L$.
- $\wp(z)$ is constant on each coset of $\mathbb{C}/L$.
- $\wp(z)$ satisfies the differential equation
$$\wp'(z)^2 = 4\wp(z)^3 - g_2 \wp(z) - g_3$$
where the constants $g_2$ and $g_3$ are given by
$$g_2 := 60 \sum_{\omega \in L \setminus \{0\}} \frac{1}{\omega^4}, \qquad g_3 := 140 \sum_{\omega \in L \setminus \{0\}} \frac{1}{\omega^6}.$$

The last property above implies that, for any $z \in \mathbb{C}/L$, the point $(\wp(z), \wp'(z))$ lies on the elliptic curve $E : y^2 = 4x^3 - g_2 x - g_3$. Let $\phi : \mathbb{C}/L \to E$ be the map given by
$$\phi(z) := \begin{cases} (\wp(z), \wp'(z)) & z \notin L \\ \infty & z \in L \end{cases}$$
(where $\infty$ denotes the point at infinity on $E$). Then $\phi$ is actually a bijection (!), and moreover the map $\phi : \mathbb{C}/L \to E$ is an isomorphism of Riemann surfaces as well as a group isomorphism (with the addition operation on $\mathbb{C}/L$ inherited from $\mathbb{C}$, and the elliptic curve group operation on $E$).
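The differential equation can be checked numerically with truncated lattice sums (an illustration only: the square lattice $\mathbb{Z} + \mathbb{Z}i$ is an arbitrary choice, and accuracy is limited by the truncation):

```python
# Numerical check of p'(z)^2 = 4 p(z)^3 - g2 p(z) - g3 using lattice
# sums truncated to |m|, |n| <= N for the square lattice L = Z + Zi.
# The symmetric truncation window keeps the truncation error small.
N = 40
lattice = [m + n * 1j for m in range(-N, N + 1)
           for n in range(-N, N + 1) if (m, n) != (0, 0)]

def p(z):
    return 1 / z**2 + sum(1 / (z - w)**2 - 1 / w**2 for w in lattice)

def p_prime(z):
    return -2 / z**3 + sum(-2 / (z - w)**3 for w in lattice)

g2 = 60 * sum(1 / w**4 for w in lattice)
g3 = 140 * sum(1 / w**6 for w in lattice)

z = 0.31 + 0.17j   # any point away from the lattice will do
lhs = p_prime(z)**2
rhs = 4 * p(z)**3 - g2 * p(z) - g3
print(abs(lhs - rhs) / abs(lhs))  # small relative error, from truncation only
```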
We can go even further: it turns out that every elliptic curve E over C can be obtained in
this way from some lattice L. More precisely, the following is true:
Theorem 16.
1. For every elliptic curve $E : y^2 = 4x^3 - bx - c$ over $\mathbb{C}$, there is a unique lattice $L \subset \mathbb{C}$ whose constants $g_2$ and $g_3$ satisfy $b = g_2$ and $c = g_3$.
2. Two elliptic curves $E$ and $E'$ over $\mathbb{C}$ are isomorphic if and only if their corresponding lattices $L$ and $L'$ satisfy the equation $L' = \alpha L$ for some scalar $\alpha \in \mathbb{C}$.

REFERENCES
1. Dale Husemoller, Elliptic Curves. Springer-Verlag, New York, 1997.
2. James Milne, Elliptic Curves, online course notes. http://www.jmilne.org/math/CourseNotes/math679.html
3. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.

Version: 24 Owner: djao Author(s): djao


210.11

height function

Definition 19. Let $A$ be an abelian group. A height function on $A$ is a function $h : A \to \mathbb{R}$ with the properties:
1. For all $Q \in A$ there exists a constant $C_1$, depending on $A$ and $Q$, such that for all $P \in A$:
$$h(P + Q) \leq 2h(P) + C_1$$
2. There exists an integer $m \geq 2$ and a constant $C_2$, depending on $A$, such that for all $P \in A$:
$$h(mP) \geq m^2 h(P) - C_2$$
3. For all $C_3 \in \mathbb{R}$, the following set is finite:
$$\{P \in A : h(P) \leq C_3\}$$
Examples:
1. For $t = p/q \in \mathbb{Q}$, a fraction in lowest terms, define $H(t) = \max\{|p|, |q|\}$. Even though this is not a height function as defined above, this is the prototype of what a height function should look like.
2. Let $E$ be an elliptic curve over $\mathbb{Q}$. The function $h_x : E(\mathbb{Q}) \to \mathbb{R}$ on $E(\mathbb{Q})$, the points in $E$ with coordinates in $\mathbb{Q}$, given by
$$h_x(P) = \begin{cases} \log H(x(P)), & \text{if } P \neq O \\ 0, & \text{if } P = O \end{cases}$$
is a height function ($H$ is defined as above). Notice that this depends on the chosen Weierstrass model of the curve.
3. The canonical height of $E/\mathbb{Q}$ (due to Néron and Tate) is defined by:
$$h_C(P) = \frac{1}{2} \lim_{N \to \infty} 4^{-N} h_x([2^N]P)$$
where $h_x$ is defined as in (2).
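The limit defining the canonical height can be watched converging numerically. A sketch for $P = (-1, 1)$ on $y^2 = x^3 + 2$ (duplication formula only, exact rational arithmetic; the printed values approximate $h_C(P)$, and no error bound is claimed):

```python
# Approximating h_C(P) = (1/2) lim 4^(-N) h_x([2^N] P) for P = (-1, 1)
# on y^2 = x^3 + A x + B with A = 0, B = 2, using only point doubling.
from fractions import Fraction
from math import log

A, B = 0, 2

def double(P):
    x, y = P
    lam = (3 * x * x + A) / (2 * y)   # tangent slope
    x2 = lam * lam - 2 * x
    return (x2, lam * (x - x2) - y)

def h_x(P):
    return log(max(abs(P[0].numerator), abs(P[0].denominator)))

P = (Fraction(-1), Fraction(1))
for N in range(6):
    print(N, 0.5 * h_x(P) / 4**N)
    assert P[1] ** 2 == P[0] ** 3 + A * P[0] + B   # still on the curve
    P = double(P)
```

The rapid growth of the numerators and denominators of $x([2^N]P)$ is exactly the quadratic behavior that property (2) of a height function encodes.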


Finally we mention the fundamental theorem of descent, which highlights the importance
of the height functions:
Theorem 25 (Descent). Let $A$ be an abelian group and let $h : A \to \mathbb{R}$ be a height function. Suppose that for the integer $m$, as in property (2) of height, the quotient group $A/mA$ is finite. Then $A$ is finitely generated.

REFERENCES
1. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.

Version: 2 Owner: alozano Author(s): alozano

210.12

j-invariant

Let $E$ be an elliptic curve over $\mathbb{Q}$ with Weierstrass equation:
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$
with coefficients $a_i \in \mathbb{Q}$. Let:
$$b_2 = a_1^2 + 4a_2, \qquad b_4 = 2a_4 + a_1 a_3, \qquad b_6 = a_3^2 + 4a_6,$$
$$b_8 = a_1^2 a_6 + 4a_2 a_6 - a_1 a_3 a_4 + a_2 a_3^2 - a_4^2,$$
$$c_4 = b_2^2 - 24b_4, \qquad c_6 = -b_2^3 + 36b_2 b_4 - 216b_6$$

Definition 20.
1. The discriminant of $E$ is defined to be
$$\Delta = -b_2^2 b_8 - 8b_4^3 - 27b_6^2 + 9b_2 b_4 b_6$$
2. The $j$-invariant of $E$ is
$$j = \frac{c_4^3}{\Delta}$$
3. The invariant differential is
$$\omega = \frac{dx}{2y + a_1 x + a_3} = \frac{dy}{3x^2 + 2a_2 x + a_4 - a_1 y}$$

Example:
If $E$ has a Weierstrass equation in the simplified form $y^2 = x^3 + Ax + B$, then
$$\Delta = -16(4A^3 + 27B^2), \qquad j = \frac{-1728(4A)^3}{\Delta}.$$

Version: 2 Owner: alozano Author(s): alozano

210.13

rank of an elliptic curve

Let K be a number field and let E be an elliptic curve over K. By E(K) we denote the set
of points in E with coordinates in K.
Theorem 26 (Mordell-Weil). E(K) is a finitely generated abelian group.
The proof of this theorem is fairly involved. The two main ingredients are the so-called weak Mordell-Weil theorem (see below), the concept of height function for abelian groups, and the descent theorem. See [2], Chapter VIII, page 189.

Theorem 27 (Weak Mordell-Weil). $E(K)/mE(K)$ is finite for all $m \geq 2$.


The Mordell-Weil theorem implies that for any elliptic curve $E/K$ the group of points has the following structure:
$$E(K) \cong E_{\mathrm{torsion}}(K) \oplus \mathbb{Z}^R$$
where $E_{\mathrm{torsion}}(K)$ denotes the set of points of finite order (or torsion group), and $R$ is a non-negative integer which is called the rank of the elliptic curve. It is not known how big this number $R$ can get for elliptic curves over $\mathbb{Q}$. The largest rank known for an elliptic curve over $\mathbb{Q}$ is 24, due to Martin and McMillen (2000).
Note: see Mazur's theorem for an account of the possible torsion subgroups over $\mathbb{Q}$.
Examples:
1. The elliptic curve $E_1/\mathbb{Q} : y^2 = x^3 + 6$ has rank 0 and $E_1(\mathbb{Q}) \cong 0$.
2. Let $E_2/\mathbb{Q} : y^2 = x^3 + 1$; then $E_2(\mathbb{Q}) \cong \mathbb{Z}/6\mathbb{Z}$. The torsion group is generated by the point $(2, 3)$.
3. Let $E_3/\mathbb{Q} : y^2 = x^3 + 109858299531561$; then $E_3(\mathbb{Q}) \cong \mathbb{Z}/3\mathbb{Z} \oplus \mathbb{Z}^5$. See generators here.
4. Let $E_4/\mathbb{Q} : y^2 + \frac{1951}{164}xy - \frac{3222367}{40344}y = x^3 + \frac{3537}{164}x^2 - \frac{40302641}{121032}x$; then $E_4(\mathbb{Q}) \cong \mathbb{Z}^{10}$. See generators here.

REFERENCES
1. James Milne, Elliptic Curves, online course notes. http://www.jmilne.org/math/CourseNotes/math679.html
2. Joseph H. Silverman, The Arithmetic of Elliptic Curves. Springer-Verlag, New York, 1986.
3. Joseph H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves. Springer-Verlag,
New York, 1994.

4. Goro Shimura, Introduction to the Arithmetic Theory of Automorphic Functions. Princeton University Press, Princeton, New Jersey, 1971.

Version: 8 Owner: alozano Author(s): alozano

210.14

supersingular

An elliptic curve $E$ over a field of characteristic $p$ defined by the cubic equation $f(w, x, y) = 0$ is called supersingular if the coefficient of $(wxy)^{p-1}$ in $f(w, x, y)^{p-1}$ is zero.
A supersingular elliptic curve is said to have Hasse invariant 0; an ordinary (i.e. non-supersingular) elliptic curve is said to have Hasse invariant 1.
This is equivalent to many other conditions. $E$ is supersingular iff the invariant differential is exact. Also, $E$ is supersingular iff the map $F^* : H^1(E, \mathcal{O}_E) \to H^1(E, \mathcal{O}_E)$ induced by the Frobenius morphism $F : E \to E$ is zero.
Version: 2 Owner: nerdy2 Author(s): nerdy2

210.15

the torsion subgroup of an elliptic curve injects in the reduction of the curve

Let $E$ be an elliptic curve defined over $\mathbb{Q}$ and let $p \in \mathbb{Z}$ be a prime. Assume $E$ has a Weierstrass equation of the form:
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$
with coefficients $a_i \in \mathbb{Z}$. Let $\tilde{E}$ be the reduction of $E$ modulo $p$ (see bad reduction), which is a curve defined over $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. We have a map (the reduction map)
$$\pi_p : E(\mathbb{Q}) \to \tilde{E}(\mathbb{F}_p)$$
$$\pi_p(P) = \pi_p([x_0, y_0, z_0]) = [x_0 \bmod p,\ y_0 \bmod p,\ z_0 \bmod p] = \tilde{P}$$
Recall that $\tilde{E}$ might be a singular curve at some points. We denote by $\tilde{E}_{ns}(\mathbb{F}_p)$ the set of non-singular points of $\tilde{E}$. We also define
$$E_0(\mathbb{Q}) = \{P \in E(\mathbb{Q}) \mid \pi_p(P) = \tilde{P} \in \tilde{E}_{ns}(\mathbb{F}_p)\}$$
$$E_1(\mathbb{Q}) = \{P \in E(\mathbb{Q}) \mid \pi_p(P) = \tilde{P} = \tilde{O}\} = \operatorname{Ker}(\pi_p)$$

Proposition 11. There is an exact sequence of abelian groups
$$0 \to E_1(\mathbb{Q}) \to E_0(\mathbb{Q}) \to \tilde{E}_{ns}(\mathbb{F}_p) \to 0$$
where the right-hand side map is $\pi_p$ restricted to $E_0(\mathbb{Q})$.

Notation: Given a group $G$, we denote by $G[m]$ the $m$-torsion of $G$, i.e. the points of order dividing $m$.
Proposition 12. Let $E/\mathbb{Q}$ be an elliptic curve (as above) and let $m$ be a positive integer such that $\gcd(p, m) = 1$. Then:
1. $E_1(\mathbb{Q})[m] = \{O\}$
2. If $\tilde{E}(\mathbb{F}_p)$ is a non-singular curve, then the reduction map, restricted to $E(\mathbb{Q})[m]$, is injective. That is,
$$E(\mathbb{Q})[m] \to \tilde{E}(\mathbb{F}_p)$$
is injective.

Remark: part 2 of the proposition is quite useful when trying to compute the torsion subgroup of $E/\mathbb{Q}$. It can be reinterpreted as follows: for all primes $p$ of good reduction which do not divide $m$, the map $E(\mathbb{Q})[m] \to \tilde{E}(\mathbb{F}_p)$ is injective, and therefore the number of $m$-torsion points divides the number of points of $\tilde{E}$ defined over $\mathbb{F}_p$.
Example:
Let $E/\mathbb{Q}$ be given by
$$y^2 = x^3 + 3$$
The discriminant of this curve is $\Delta = -3888 = -2^4 \cdot 3^5$. Recall that if $p$ is a prime of bad reduction, then $p \mid \Delta$. Thus the only possible primes of bad reduction are 2 and 3, so $\tilde{E}$ is non-singular for all $p \geq 5$.
Let $p = 5$ and consider the reduction $\tilde{E}$ of $E$ modulo 5. Then we have
$$\tilde{E}(\mathbb{Z}/5\mathbb{Z}) = \{\tilde{O}, (1, 2), (1, 3), (2, 1), (2, 4), (3, 0)\}$$
where all the coordinates are to be considered modulo 5 (remember the point at infinity!). Hence $N_5 = |\tilde{E}(\mathbb{Z}/5\mathbb{Z})| = 6$. Similarly, we can prove that $N_7 = 13$.
Now let $q \neq 5, 7$ be a prime number. Then we claim that $E(\mathbb{Q})[q]$ is trivial. Indeed, by the remark above we have
$$|E(\mathbb{Q})[q]| \ \text{ divides } \ N_5 = 6 \ \text{ and } \ N_7 = 13$$
so $|E(\mathbb{Q})[q]|$ must be 1.

For the case $q = 5$ we know that $|E(\mathbb{Q})[5]|$ divides $N_7 = 13$. But it is easy to see that if $E(\mathbb{Q})[p]$ is non-trivial, then $p$ divides its order. Since 5 does not divide 13, we conclude that $E(\mathbb{Q})[5]$ must be trivial. Similarly, $E(\mathbb{Q})[7]$ is trivial as well. Therefore $E(\mathbb{Q})$ has trivial torsion subgroup.
Notice that $(1, 2) \in E(\mathbb{Q})$ is an obvious point on the curve. Since we have proved that there is no non-trivial torsion, this point must be of infinite order! In fact
$$E(\mathbb{Q}) \cong \mathbb{Z}$$
and the group is generated by $(1, 2)$.
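The point counts used in this example are easy to reproduce (plain brute force, specific to this curve):

```python
# N_p for y^2 = x^3 + 3: the counts that pin down the rational torsion.
def N(p):
    squares = {}
    for y in range(p):
        squares.setdefault(y * y % p, []).append(y)
    # affine solutions plus the point at infinity
    return 1 + sum(len(squares.get((x**3 + 3) % p, [])) for x in range(p))

assert N(5) == 6 and N(7) == 13
```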
Version: 3 Owner: alozano Author(s): alozano


Chapter 211
14H99 Miscellaneous
211.1

Riemann-Roch theorem

Let $C$ be a projective nonsingular curve over an algebraically closed field. If $D$ is a divisor on $C$, then
$$\ell(D) - \ell(K - D) = \deg(D) + 1 - g$$
where $g$ is the genus of the curve, and $K$ is the canonical divisor ($\ell(K) = g$).
Version: 3 Owner: mathcam Author(s): nerdy2

211.2

genus

Genus has a number of distinct but compatible definitions.

In topology, if $S$ is an orientable surface, its genus $g(S)$ is the number of handles it has. More precisely, from the classification of surfaces, we know that any orientable surface is a sphere, or the connected sum of $n$ tori. We say the sphere has genus 0, and that the connected sum of $n$ tori has genus $n$ (alternatively, genus is additive with respect to connected sum, and the genus of a torus is 1). Also, $g(S) = 1 - \chi(S)/2$, where $\chi(S)$ is the Euler characteristic of $S$.
In algebraic geometry, the genus of a smooth projective curve $X$ over a field $k$ is the dimension over $k$ of the vector space $\Omega^1(X)$ of global regular differentials on $X$. Recall that a smooth complex curve is also a Riemann surface, and hence topologically a surface. In this case, the two definitions of genus coincide.

Version: 6 Owner: bwebste Author(s): bwebste, muqabala, nerdy2

211.3

projective curve

A projective curve over a field k is an equidimensional projective variety over k of dimension


1. In other words, each of the irreducible components of this variety must have dimension 1.
Version: 2 Owner: muqabala Author(s): muqabala

211.4

proof of Riemann-Roch theorem

For a divisor $D$, let $\mathcal{L}_D$ be the associated line bundle. By Serre duality, $H^0(\mathcal{L}_{K-D}) \cong H^1(\mathcal{L}_D)^*$, so $\ell(D) - \ell(K - D) = \chi(D)$, the Euler characteristic of $\mathcal{L}_D$. Now, let $p$ be a point of $C$, and consider the divisors $D$ and $D + p$. There is a natural injection $\mathcal{L}_D \to \mathcal{L}_{D+p}$. This is an isomorphism anywhere away from $p$, so the quotient $\mathcal{E}$ is a skyscraper sheaf supported at $p$. Since skyscraper sheaves are flasque, they have trivial higher cohomology, and so $\chi(\mathcal{E}) = 1$. Since Euler characteristics add along exact sequences (because of the long exact sequence in cohomology), $\chi(D + p) = \chi(D) + 1$. Since $\deg(D + p) = \deg(D) + 1$, we see that if Riemann-Roch holds for $D$, it holds for $D + p$, and vice-versa. Now, we need only confirm that the theorem holds for a single line bundle. $\mathcal{O}_X$ is a line bundle of degree 0, with $\ell(0) = 1$ and $\ell(K) = g$. Thus, Riemann-Roch holds here, and thus for all line bundles.
Version: 1 Owner: bwebste Author(s): bwebste


Chapter 212
14L17 Affine algebraic groups,
hyperalgebra constructions
212.1

affine algebraic group

An affine algebraic group over a field $k$ is a quasi-affine variety $G$ (a locally closed subset of affine space) over $k$, which is equipped with a group structure such that the multiplication map $m : G \times G \to G$ and the inverse map $i : G \to G$ are algebraic.
For example, $k$ is an affine algebraic group over itself with the group law being addition, as is $k^{\times} = k \setminus \{0\}$ with the group law multiplication. Other common examples of affine algebraic groups are $\mathrm{GL}_n k$, the general linear group over $k$ (identifying matrices with affine space), and any algebraic torus over $k$.
Version: 4 Owner: bwebste Author(s): bwebste

212.2

algebraic torus

Let $k$ be a field. Then $k^{\times}$, the multiplicative group of $k$, is an affine algebraic group over $k$. An affine algebraic group of the form $(k^{\times})^n$ is called an algebraic torus over $k$.
The name is connected to the fact that if $k = \mathbb{C}$, then an algebraic torus is the complexification of the standard torus $(S^1)^n$.
Version: 4 Owner: bwebste Author(s): bwebste


Chapter 213
14M05 Varieties defined by ring
conditions (factorial,
Cohen-Macaulay, seminormal)
213.1

normal

Let $X$ be a variety or algebraic set. $X$ is said to be normal at a point $p \in X$ if the local ring $\mathcal{O}_p$ is integrally closed. $X$ is said to be normal if it is normal at every point. If $X$ is non-singular at $p$, it is normal at $p$, since regular local rings are integrally closed.
Version: 1 Owner: bwebste Author(s): bwebste


Chapter 214
14M15 Grassmannians, Schubert
varieties, flag manifolds
214.1

Borel-Bott-Weil theorem

Let $G$ be a semisimple Lie group, and $\lambda$ an integral weight for that group. $\lambda$ naturally defines a one-dimensional representation $\mathbb{C}_\lambda$ of the Borel subgroup $B$ of $G$, by simply pulling back the representation on the maximal torus $T = B/U$, where $U$ is the unipotent radical of $B$. Since we can think of the projection map $\pi : G \to G/B$ as a principal $B$-bundle, to each $\mathbb{C}_\lambda$ we get an associated fiber bundle $\mathcal{L}_\lambda$ on $G/B$, which is obviously a line bundle. Identifying $\mathcal{L}_\lambda$ with its sheaf of holomorphic sections, we consider the sheaf cohomology groups $H^i(\mathcal{L}_\lambda)$. Realizing $\mathfrak{g}$, the Lie algebra of $G$, as vector fields on $G/B$, we see that $\mathfrak{g}$ acts on the sections of $\mathcal{L}_\lambda$ over any open set, and so we get an action on cohomology groups. This integrates to an action of $G$, which on $H^0(\mathcal{L}_\lambda)$ is simply the obvious action of the group.
The Borel-Bott-Weil theorem states the following: if $(\lambda + \rho, \alpha) = 0$ for some simple root $\alpha$ of $\mathfrak{g}$, then
$$H^i(\mathcal{L}_\lambda) = 0$$
for all $i$, where $\rho$ is half the sum of all the positive roots. Otherwise, let $w \in W$, the Weyl group of $G$, be the unique element such that $w(\lambda + \rho)$ is dominant (i.e. $(w(\lambda + \rho), \alpha) > 0$ for all simple roots $\alpha$). Then
$$H^{\ell(w)}(\mathcal{L}_\lambda) \cong V_{w(\lambda + \rho) - \rho}$$
where $V_{w(\lambda + \rho) - \rho}$ is the unique irreducible representation of highest weight $w(\lambda + \rho) - \rho$, and $H^i(\mathcal{L}_\lambda) = 0$ for all other $i$. In particular, if $\lambda$ is already dominant, then $\Gamma(\mathcal{L}_\lambda) \cong V_\lambda$, and the higher cohomology of $\mathcal{L}_\lambda$ vanishes.
If $\lambda$ is dominant, then $\mathcal{L}_\lambda$ is generated by global sections, and thus determines a map
$$m_\lambda : G/B \to \mathbb{P}(\Gamma(\mathcal{L}_\lambda)^*).$$
This map is an obvious one, which takes the coset of $B$ to the highest weight vector $v_0$ of $V_\lambda$. This can be extended by equivariance since $B$ fixes $v_0$. This provides an alternate description of $\mathcal{L}_\lambda$.
For example, consider $G = \mathrm{SL}_2\mathbb{C}$. Then $G/B$ is $\mathbb{CP}^1$, the Riemann sphere, an integral weight is specified simply by an integer $n$, and $\rho = 1$. The line bundle $\mathcal{L}_n$ is simply $\mathcal{O}(n)$, whose sections are the homogeneous polynomials of degree $n$. This gives us in one stroke the representation theory of $\mathrm{SL}_2\mathbb{C}$: $\Gamma(\mathcal{O}(1))$ is the standard representation, and $\Gamma(\mathcal{O}(n))$ is its $n$th symmetric power. We even have a unified description of the action of the Lie algebra, derived from its realization as vector fields on $\mathbb{CP}^1$: if $H, X, Y$ are the standard generators of $\mathfrak{sl}_2\mathbb{C}$, then
$$H = x\frac{\partial}{\partial x} - y\frac{\partial}{\partial y}, \qquad X = x\frac{\partial}{\partial y}, \qquad Y = y\frac{\partial}{\partial x}.$$
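The commutation relation $[X, Y] = H$ for this vector-field realization can be verified mechanically on monomials $x^a y^b$ (a small sketch; a monomial is encoded as its exponent pair, and each operator returns a coefficient together with the resulting monomial):

```python
# X = x d/dy, Y = y d/dx, H = x d/dx - y d/dy acting on x^a y^b.
# We check [X, Y] = XY - YX = H on a grid of monomials.
def X(m): a, b = m; return b, (a + 1, b - 1)        # x d/dy
def Y(m): a, b = m; return a, (a - 1, b + 1)        # y d/dx
def H(m): a, b = m; return a - b, m                 # x d/dx - y d/dy

for a in range(5):
    for b in range(5):
        m = (a, b)
        cY, mY = Y(m); cXY, mXY = X(mY)             # X(Y m)
        cX, mX = X(m); cYX, mYX = Y(mX)             # Y(X m)
        assert mXY == m and mYX == m                # both land back on m
        assert cY * cXY - cX * cYX == H(m)[0]
```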

Version: 2 Owner: bwebste Author(s): bwebste

214.2

flag variety

Let $k$ be a field, and let $V$ be a vector space over $k$ of dimension $n$. Choose an increasing sequence $\mathbf{i} = (i_1, \ldots, i_m)$, with $1 \leq i_1 < \cdots < i_m \leq n$. Then the (partial) flag variety $F\ell(V, \mathbf{i})$ associated to this data is the set of all flags $\{0\} \leq V_1 \leq \cdots \leq V_m$ with $\dim V_j = i_j$. This has a natural embedding into the product of Grassmannians $G(V, i_1) \times \cdots \times G(V, i_m)$, and its image here is closed, making $F\ell(V, \mathbf{i})$ into a projective variety over $k$. If $k = \mathbb{C}$ these are often called flag manifolds.
The group $\mathrm{Sl}(V)$ acts transitively on $F\ell(V, \mathbf{i})$, and the stabilizer of a point is a parabolic subgroup. Thus, as a homogeneous space, $F\ell(V, \mathbf{i}) \cong \mathrm{Sl}(V)/P$ where $P$ is a parabolic subgroup of $\mathrm{Sl}(V)$. In particular, the complete flag variety is isomorphic to $\mathrm{Sl}(V)/B$, where $B$ is the Borel subgroup.
Version: 2 Owner: bwebste Author(s): bwebste

Chapter 215
14R15 Jacobian problem

215.1 Jacobian conjecture

Let F : Cⁿ → Cⁿ be a polynomial map, i.e.,

    F(x₁, …, xₙ) = (f₁(x₁, …, xₙ), …, fₙ(x₁, …, xₙ))

for certain polynomials fᵢ ∈ C[X₁, …, Xₙ].
If F is invertible, then its Jacobi determinant det(∂fᵢ/∂xⱼ), which is a polynomial over C,
vanishes nowhere and hence must be a non-zero constant.
The Jacobian conjecture asserts the converse: every polynomial map Cⁿ → Cⁿ whose Jacobi
determinant is a non-zero constant is invertible.
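The easy direction can be illustrated concretely. Below is a minimal Python sketch; the map F and its inverse G are illustrative choices (not from the entry): F has constant Jacobi determinant 1 and is inverted by the polynomial map G.

```python
# Hypothetical example map: F(x, y) = (x + y^2, y) has Jacobian [[1, 2y], [0, 1]],
# so its Jacobi determinant is the constant 1, and F is invertible with
# polynomial inverse G(u, v) = (u - v^2, v).

def F(x, y):
    return (x + y**2, y)

def G(u, v):
    return (u - v**2, v)

def jacobi_det_F(x, y):
    # det [[dF1/dx, dF1/dy], [dF2/dx, dF2/dy]] = det [[1, 2y], [0, 1]]
    return 1 * 1 - 2 * y * 0

for (x, y) in [(0.0, 0.0), (1.5, -2.0), (3.0, 7.0)]:
    assert jacobi_det_F(x, y) == 1      # constant, non-zero
    assert G(*F(x, y)) == (x, y)        # G inverts F
```

The conjecture asserts that a constant non-zero Jacobi determinant always forces such a polynomial inverse to exist.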
Version: 1 Owner: petervr Author(s): petervr

Chapter 216
15-00 General reference works (handbooks, dictionaries, bibliographies, etc.)

216.1 Cholesky decomposition

A symmetric and positive definite matrix A can be efficiently decomposed into a product of a
lower and an upper triangular matrix. For a matrix of any type, this is achieved by the LU
decomposition, which factorizes A = LU. If A satisfies the above criteria, one can decompose
it more efficiently into A = LLᵀ, where L (which can be seen as a matrix square root of A)
is a lower triangular matrix with positive diagonal elements. L is called the Cholesky triangle.
To solve Ax = b, one solves first Ly = b for y, and then Lᵀx = y for x.
A variant of the Cholesky decomposition is the form A = RᵀR, where R is upper triangular.
Cholesky decomposition is often used to solve the normal equations in linear least squares
problems; these give AᵀAx = Aᵀb, in which AᵀA is symmetric and positive definite.
To derive A = LLᵀ, we simply equate coefficients on both sides of the equation:

    [ a11 a12 ... a1n ]   [ l11  0   ...  0  ] [ l11 l21 ... ln1 ]
    [ a21 a22 ... a2n ] = [ l21 l22  ...  0  ] [  0  l22 ... ln2 ]
    [  .   .  ...  .  ]   [  .   .   ...  .  ] [  .   .  ...  .  ]
    [ an1 an2 ... ann ]   [ ln1 ln2  ... lnn ] [  0   0  ... lnn ]

Solving for the unknowns (the nonzero lᵢⱼ's), for i = 1, …, n and j = i + 1, …, n, we get:

    l_ii = sqrt( a_ii − Σ_{k=1}^{i−1} l_ik² )

    l_ji = ( a_ji − Σ_{k=1}^{i−1} l_jk l_ik ) / l_ii
Because A is symmetric and positive definite, the expression under the square root is always
positive, and all lij are real.
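The recurrences above translate directly into code. A minimal pure-Python sketch (the test matrix is an arbitrary symmetric positive definite example):

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T, computed column by column
    from the recurrences: l_ii from the diagonal, then l_ji below it."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = A[i][i] - sum(L[i][k] ** 2 for k in range(i))
        L[i][i] = math.sqrt(s)  # positive, since A is positive definite
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * L[i][k] for k in range(i))) / L[i][i]
    return L

# A symmetric positive definite test matrix:
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = cholesky(A)
# Check the reconstruction A = L L^T entrywise:
for i in range(3):
    for j in range(3):
        assert abs(sum(L[i][k] * L[j][k] for k in range(3)) - A[i][j]) < 1e-12
```

Production code would instead use a tuned library routine, but the loop structure is exactly the coefficient-matching above.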
References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Version: 5 Owner: akrowne Author(s): akrowne

216.2 Hadamard matrix

An n × n matrix H = (hᵢⱼ) is an Hadamard matrix of order n if the entries of H are either
+1 or -1 and such that HHᵀ = nI, where Hᵀ is the transpose of H and I is the order n
identity matrix.
In other words, an n × n matrix with only +1 and -1 as its elements is Hadamard if the
inner product of two distinct rows is 0 and the inner product of a row with itself is n.
A few examples of Hadamard matrices are

    [ 1  1 ]    [ 1  1  1  1 ]
    [ 1 -1 ] ,  [ 1 -1  1 -1 ]
                [ 1  1 -1 -1 ]
                [ 1 -1 -1  1 ]
These matrices were first considered as Hadamard determinants, because the determinant
of an Hadamard matrix satisfies equality in Hadamard's determinant theorem, which states
that if X = (xᵢⱼ) is a matrix of order n where |xᵢⱼ| ≤ 1 for all i and j, then

    |det(X)| ≤ n^{n/2}.

property 1:
The order of an Hadamard matrix is 1, 2 or 4n, where n is an integer.

property 2:
If the rows and columns of an Hadamard matrix are permuted, the matrix remains Hadamard.
property 3:
If any row or column is multiplied by 1, the Hadamard property is retained.
Hence it is always possible to arrange to have the first row and first column of an Hadamard
matrix contain only +1 entries. An Hadamard matrix in this form is said to be normalized.
Hadamard matrices are common in signal processing and coding applications.
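The defining condition HHᵀ = nI is easy to check by hand. The sketch below also uses the Sylvester doubling construction [[H, H], [H, -H]], a standard fact that is not stated in the entry itself:

```python
def is_hadamard(H):
    """Check that H has only +/-1 entries and that H H^T = n I."""
    n = len(H)
    if any(abs(x) != 1 for row in H for x in row):
        return False
    for i in range(n):
        for j in range(n):
            dot = sum(H[i][k] * H[j][k] for k in range(n))
            if dot != (n if i == j else 0):
                return False
    return True

def sylvester_double(H):
    """From a Hadamard matrix of order n, build one of order 2n: [[H, H], [H, -H]]."""
    top = [row + row for row in H]
    bottom = [row + [-x for x in row] for row in H]
    return top + bottom

H2 = [[1, 1], [1, -1]]
H4 = sylvester_double(H2)
assert is_hadamard(H2) and is_hadamard(H4)
```

Doubling repeatedly gives Hadamard matrices of every order 2^k, consistent with property 1.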
Version: 3 Owner: giri Author(s): giri

216.3 Hessenberg matrix

An upper Hessenberg matrix is of the form

    [ a11 a12 a13 ...  a1,n-1  a1n ]
    [ a21 a22 a23 ...  a2,n-1  a2n ]
    [ 0   a32 a33 ...  a3,n-1  a3n ]
    [ 0   0   a43 ...  a4,n-1  a4n ]
    [ .   .   .   ...  .       .   ]
    [ 0   0   0   ...  an,n-1  ann ]

and a lower Hessenberg matrix is of the form

    [ a11     a12     0       ...  0         0      ]
    [ a21     a22     a23     ...  0         0      ]
    [ .       .       .       ...  .         .      ]
    [ an-2,1  an-2,2  an-2,3  ...  an-2,n-1  0      ]
    [ an-1,1  an-1,2  an-1,3  ...  an-1,n-1  an-1,n ]
    [ an,1    an,2    an,3    ...  an,n-1    an,n   ]
Version: 1 Owner: akrowne Author(s): akrowne


216.4 If A ∈ Mₙ(k) and A is supertriangular then Aⁿ = 0

theorem: Let A be a square matrix of dimension n over a field k. If A is supertriangular,
then Aⁿ = 0.
proof: Find the characteristic polynomial of A by computing the determinant of tI − A.
The square matrix tI − A is a triangular matrix. The determinant of a triangular matrix is
the product of the diagonal elements of the matrix. Therefore the characteristic polynomial is
p(t) = tⁿ, and by the Cayley-Hamilton theorem the matrix A satisfies the polynomial. That
is, Aⁿ = 0.
QED
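A quick numerical sketch of the theorem (the 3 × 3 matrix is an arbitrary strictly upper triangular example): Aⁿ vanishes, while lower powers generally do not.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, p):
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity
    for _ in range(p):
        R = mat_mul(R, A)
    return R

# A strictly upper triangular ("supertriangular") 3 x 3 matrix:
A = [[0, 5, -2],
     [0, 0, 7],
     [0, 0, 0]]
Z = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
assert mat_pow(A, 3) == Z   # A^n = 0 for n = 3
assert mat_pow(A, 2) != Z   # but A^2 is still nonzero here
```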
Version: 5 Owner: Daume Author(s): Daume

216.5 Jacobi determinant

Let

    f = f(x) = f(x₁, …, xₙ)

be a function of n variables, and let

    u = u(x) = (u₁(x), …, uₙ(x))

be a function of x, where inversely x can be expressed as a function of u,

    x = x(u) = (x₁(u), …, xₙ(u)).

The formula for a change of variable in an n-dimensional integral is then

    ∫_Ω f(x) dⁿx = ∫_{u(Ω)} f(x(u)) |det(dx/du)| dⁿu.

Ω is an integration region, and one integrates over all x ∈ Ω, or equivalently, all u ∈ u(Ω).
dx/du = (du/dx)⁻¹ is the Jacobi matrix, and

    |det(dx/du)| = |det(du/dx)|⁻¹

is the absolute value of the Jacobi determinant or Jacobian.

As an example, take n = 2 and

    Ω = {(x₁, x₂) | 0 < x₁ ≤ 1, 0 < x₂ ≤ 1}.

Define

    r = sqrt(−2 log(x₁)),  φ = 2πx₂,  u₁ = r cos φ,  u₂ = r sin φ.

Then by the chain rule and the definition of the Jacobi matrix,

    du/dx = ∂(u₁, u₂)/∂(x₁, x₂) = (∂(u₁, u₂)/∂(r, φ)) (∂(r, φ)/∂(x₁, x₂))

          = [ cos φ  −r sin φ ] [ −1/(x₁ r)  0  ]
            [ sin φ   r cos φ ] [     0      2π ]

The Jacobi determinant is

    det(du/dx) = det{∂(u₁, u₂)/∂(r, φ)} · det{∂(r, φ)/∂(x₁, x₂)} = r · (−1/(x₁ r)) · 2π = −2π/x₁

and

    d²x = |det(dx/du)| d²u = |det(du/dx)|⁻¹ d²u = (x₁/2π) d²u = (1/2π) exp(−(u₁² + u₂²)/2) d²u.

This shows that if x₁ and x₂ are independent random variables with uniform distributions
between 0 and 1, then u₁ and u₂ as defined above are independent random variables with
standard normal distributions.
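This change of variables is the classical Box-Muller transform. As an empirical sketch (the sample size and tolerances are illustrative assumptions), one can check in Python that it turns uniform variates into approximately standard normal ones:

```python
import math
import random
import statistics

def box_muller(x1, x2):
    """The transform from the example: uniform (x1, x2) -> normal (u1, u2)."""
    r = math.sqrt(-2.0 * math.log(x1))
    phi = 2.0 * math.pi * x2
    return r * math.cos(phi), r * math.sin(phi)

random.seed(0)
samples = []
for _ in range(20000):
    x1 = 1.0 - random.random()   # in (0, 1], so log(x1) is defined
    u1, u2 = box_muller(x1, random.random())
    samples.extend([u1, u2])

# The sample should look standard normal: mean near 0, standard deviation near 1.
assert abs(statistics.fmean(samples)) < 0.05
assert abs(statistics.stdev(samples) - 1.0) < 0.05
```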
References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Version: 2 Owner: akrowne Author(s): akrowne

216.6 Jacobi's Theorem

Jacobi's Theorem. If A is a skew-symmetric matrix of odd dimension, then det A = 0.

Proof. Suppose A is an n × n square matrix. For the determinant, we then have det A =
det Aᵀ, and det(−A) = (−1)ⁿ det A. Thus, since n is odd and Aᵀ = −A, we have det A =
−det A, and the theorem follows. □
Remarks
1. According to [1], this theorem was given by Carl Gustav Jacob Jacobi (1804-1851) [2]
in 1827.
2. The 2 × 2 matrix

    [  0  1 ]
    [ -1  0 ]

shows that Jacobi's theorem does not hold for 2 × 2 matrices. The determinant of the 2n × 2n
block matrix with these 2 × 2 matrices on the diagonal equals 1, which is non-zero. Thus
Jacobi's theorem does not hold for matrices of even dimension.
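Both the theorem and the even-dimension counterexample can be checked with exact integer arithmetic; a small sketch (the skew-symmetric entries are arbitrary choices):

```python
def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def skew(upper_entries, n):
    """Build an n x n skew-symmetric matrix from its strictly upper entries."""
    A = [[0] * n for _ in range(n)]
    it = iter(upper_entries)
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = next(it)
            A[j][i] = -A[i][j]
    return A

# Odd dimension: the determinant vanishes, as Jacobi's theorem asserts.
assert det(skew([1, 2, 3], 3)) == 0
# Even dimension: no such conclusion; [[0, 1], [-1, 0]] has determinant 1.
assert det([[0, 1], [-1, 0]]) == 1
```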

REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. The MacTutor History of Mathematics archive, Carl Gustav Jacob Jacobi

Version: 4 Owner: matte Author(s): matte

216.7 Kronecker product

Definition. Let A be an n × n matrix (with entries aᵢⱼ) and let B be an m × m matrix. Then
the Kronecker product of A and B is the mn × mn block matrix

    A ⊗ B = [ a11·B ... a1n·B ]
            [ .     ...  .    ]
            [ an1·B ... ann·B ]
The Kronecker product is also known as the direct product or the tensor product [1].
Fundamental properties [1, 2]


1. The product is bilinear. If k is a scalar, and A, B and C are square matrices, such that
B and C are of the same dimension, then

    A ⊗ (B + C) = A ⊗ B + A ⊗ C,
    (B + C) ⊗ A = B ⊗ A + C ⊗ A,
    k(A ⊗ B) = (kA) ⊗ B = A ⊗ (kB).

2. If A, B, C, D are square matrices such that the products AC and BD exist, then
(A ⊗ B)(C ⊗ D) exists and

    (A ⊗ B)(C ⊗ D) = AC ⊗ BD.

If A and B are invertible matrices, then

    (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹.

3. If A and B are square matrices, then for the transpose (ᵀ) we have

    (A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ.

4. Let A and B be square matrices of dimensions n and m. If {λᵢ | i = 1, …, n} are
the eigenvalues of A and {μⱼ | j = 1, …, m} are the eigenvalues of B, then {λᵢμⱼ | i =
1, …, n, j = 1, …, m} are the eigenvalues of A ⊗ B. Also,

    det(A ⊗ B) = (det A)ᵐ (det B)ⁿ,
    rank(A ⊗ B) = rank A · rank B,
    trace(A ⊗ B) = trace A · trace B.
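The block form and the mixed-product property translate into a short pure-Python sketch (the 2 × 2 matrices are arbitrary test values):

```python
def kron(A, B):
    """Kronecker product of an n x n matrix A and an m x m matrix B (block form above)."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]

# Mixed-product property: (A (x) B)(C (x) D) = AC (x) BD
assert matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D))

# trace(A (x) B) = trace(A) * trace(B)
tr = lambda M: sum(M[i][i] for i in range(len(M)))
assert tr(kron(A, B)) == tr(A) * tr(B)
```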

REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. T. Kailath, A.H. Sayed, B. Hassibi, Linear estimation, Prentice Hall, 2000

Version: 1 Owner: bwebste Author(s): matte

216.8 LU decomposition

Any non-singular matrix A can be expressed as a product A = LU; there exists exactly one
lower triangular matrix L and exactly one upper triangular matrix U of the form:

    [ a11 a12 ... a1n ]   [ 1    0   ...  0 ] [ u11 u12 ... u1n ]
    [ a21 a22 ... a2n ] = [ l21  1   ...  0 ] [ 0   u22 ... u2n ]
    [ .   .   ...  .  ]   [ .    .   ...  . ] [ .   .   ...  .  ]
    [ an1 an2 ... ann ]   [ ln1  ln2 ...  1 ] [ 0   0   ... unn ]

if row exchanges (partial pivoting) are not necessary. With pivoting, we have to introduce a
permutation matrix P . Instead of A one then decomposes P A:
P A = LU
The LU decomposition can be performed in a way similar to Gaussian elimination.
LU decomposition is useful, e.g. for the solution of the exactly determined system of linear equations
Ax = b, when there is more than one right-hand side b. With A = LU the system becomes
LUx = b
or
Lc = b and Ux = c
c can be computed by forward substitution and x by back substitution.
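The whole pipeline (Doolittle LU without pivoting, then forward and back substitution) fits in a short pure-Python sketch; the test matrix is an arbitrary non-singular example whose pivots happen to be non-zero, so no row exchanges are needed:

```python
def lu(A):
    """Doolittle LU decomposition (no pivoting): A = L U, unit diagonal in L."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1.0
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def solve(A, b):
    """Solve A x = b via L c = b (forward) then U x = c (back substitution)."""
    n = len(A)
    L, U = lu(A)
    c = [0.0] * n
    for i in range(n):
        c[i] = b[i] - sum(L[i][k] * c[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
x = solve(A, b)
# The residual A x - b should vanish (up to rounding):
assert all(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) < 1e-10 for i in range(3))
```

Once L and U are computed, each additional right-hand side b costs only the two substitution sweeps.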
Version: 2 Owner: akrowne Author(s): akrowne

216.9 Peetre's inequality

Theorem [Peetre's inequality] [1, 2] If t is a real number and x, y are vectors in Rⁿ, then

    ( (1 + |x|²) / (1 + |y|²) )ᵗ ≤ 2^{|t|} (1 + |x − y|²)^{|t|}.

Proof. (Following [1].) Suppose b and c are vectors in Rⁿ. Then, from (|b| − |c|)² ≥ 0, we
obtain

    2|b||c| ≤ |b|² + |c|².

Using this inequality and the Cauchy-Schwarz inequality, we obtain

    1 + |b − c|² = 1 + |b|² − 2b·c + |c|²
                 ≤ 1 + |b|² + 2|b||c| + |c|²
                 ≤ 1 + 2|b|² + 2|c|²
                 ≤ 2(1 + |b|² + |c|² + |b|²|c|²)
                 = 2(1 + |b|²)(1 + |c|²).

Let us define a = b − c. Then for any vectors a and b, we have

    (1 + |a|²) / (1 + |b|²) ≤ 2(1 + |a − b|²).     (216.9.1)

Let us now return to the given inequality. If t = 0, the claim is trivially true for all x, y in
Rⁿ. If t > 0, then raising both sides in inequality 216.9.1 to the power t, using t = |t|, and
setting a = x, b = y yields the result. On the other hand, if t < 0, then raising both sides
in inequality 216.9.1 to the power −t, using −t = |t|, and setting a = y, b = x yields the
result. □

REFERENCES
1. J. Barros-Neta, An introduction to the theory of distributions, Marcel Dekker, Inc.,
1973.
2. F. Treves, Introduction To Pseudodifferential and Fourier Integral Operators, Vol. I,
Plenum Press, 1980.

Version: 4 Owner: matte Author(s): matte

216.10 Schur decomposition

If A is a complex square matrix of dimension n (i.e. A ∈ Matₙ(C)), then there exists a
unitary matrix Q ∈ Matₙ(C) such that

    QᴴAQ = T = D + N

where ᴴ denotes the conjugate transpose, D = diag(λ₁, …, λₙ) (the λᵢ are eigenvalues of A), and
N ∈ Matₙ(C) is a strictly upper triangular matrix. Furthermore, Q can be chosen such that
the eigenvalues λᵢ appear in any order along the diagonal. [GVL]


REFERENCES
[GVL] Golub, H. Gene, Van Loan F. Charles: Matrix Computations (Third Edition). The Johns
Hopkins University Press, London, 1996.

Version: 4 Owner: Daume Author(s): Daume

216.11 antipodal

Definition Suppose x and y are points on the n-sphere Sⁿ. If x = −y, then x and y are called
antipodal points. The antipodal map is the map A : Sⁿ → Sⁿ defined as A(x) = −x.
Properties
1. The antipodal map A : Sⁿ → Sⁿ is homotopic to the identity map if n is odd [1].
2. The degree of the antipodal map is (−1)ⁿ⁺¹.

REFERENCES
1. V. Guillemin, A. Pollack, Differential topology, Prentice-Hall Inc., 1974.

Version: 1 Owner: mathcam Author(s): matte

216.12 conjugate transpose

Definition If A is a complex matrix, then the conjugate transpose A∗ is the matrix
A∗ = (Ā)ᵀ, where Ā is the complex conjugate of A, and (·)ᵀ denotes the transpose.
It is clear that for real matrices, the conjugate transpose coincides with the transpose.

Properties
1. If A and B are complex matrices of the same dimension, and λ, μ are complex constants,
then

    (λA + μB)∗ = λ̄A∗ + μ̄B∗,
    (A∗)∗ = A.

2. If A and B are complex matrices such that AB is defined, then

    (AB)∗ = B∗A∗.

3. If A is a complex square matrix, then

    det(A∗) = (det A)∗,
    trace(A∗) = (trace A)∗,
    (A∗)⁻¹ = (A⁻¹)∗,

where trace and det are the trace and the determinant operators, ⁻¹ is the inverse
operator, and ∗ on a scalar denotes complex conjugation.
4. Suppose ⟨·,·⟩ is the standard inner product on Cⁿ. Then for an arbitrary complex
n × n matrix A, and vectors x, y ∈ Cⁿ, we have

    ⟨Ax, y⟩ = ⟨x, A∗y⟩.
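As a quick numerical sketch (the matrix and vectors are arbitrary complex test values), the adjoint property and involutivity can be verified directly in Python:

```python
def conj_transpose(A):
    """A* = conjugate of the transpose, for a matrix given as a list of rows."""
    rows, cols = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(rows)] for j in range(cols)]

def inner(x, y):
    """Standard inner product on C^n: <x, y> = sum_i x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[1 + 2j, 3j], [4 + 0j, 5 - 1j]]
x = [1 - 1j, 2j]
y = [3 + 0j, 1 + 1j]

# Property 4: <Ax, y> = <x, A* y>
assert inner(matvec(A, x), y) == inner(x, matvec(conj_transpose(A), y))
# Applying the conjugate transpose twice gives A back:
assert conj_transpose(conj_transpose(A)) == A
```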
Notes
The conjugate transpose of A is also called the adjoint matrix of A, or the Hermitian
conjugate of A (whence one usually writes A∗ = Aᴴ). The notation A† is also used for the
conjugate transpose [2]. In [1], A∗ is also called the tranjugate of A.

REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965.

See also
Wikipedia, conjugate transpose
Version: 5 Owner: matte Author(s): matte

216.13 corollary of Schur decomposition

theorem: A ∈ C^{n×n} is a normal matrix if and only if there exists a unitary matrix Q ∈ C^{n×n}
such that QᴴAQ = diag(λ₁, …, λₙ) (a diagonal matrix), where ᴴ is the conjugate transpose.
[GVL]

proof: Firstly we show that if there exists a unitary matrix Q ∈ C^{n×n} such that QᴴAQ =
diag(λ₁, …, λₙ), then A ∈ C^{n×n} is a normal matrix. Let D = diag(λ₁, …, λₙ); then A may
be written as A = QDQᴴ. Verifying that A is normal follows from the observation

    AAᴴ = QDQᴴQDᴴQᴴ = QDDᴴQᴴ and AᴴA = QDᴴQᴴQDQᴴ = QDᴴDQᴴ.

Therefore A is a normal matrix, because DDᴴ = diag(|λ₁|², …, |λₙ|²) = DᴴD.
Secondly we show that if A ∈ C^{n×n} is a normal matrix, then there exists a unitary matrix Q ∈ C^{n×n} such that QᴴAQ = diag(λ₁, …, λₙ). By the Schur decomposition we know
that there exists a Q ∈ C^{n×n} such that QᴴAQ = T (T an upper triangular matrix).
Since A is a normal matrix, T is also a normal matrix. That T is a diagonal matrix follows from the fact that a normal upper triangular matrix is diagonal (see the
theorem for normal triangular matrices).
QED

REFERENCES
[GVL] Golub, H. Gene, Van Loan F. Charles: Matrix Computations (Third Edition). The Johns
Hopkins University Press, London, 1996.

Version: 3 Owner: Daume Author(s): Daume

216.14 covector

If V is a vector space over a field k, then a covector is a linear map V → k, that is, an element of the dual space to V. Thus, for example, a covector field on a differentiable manifold
is a synonym for a 1-form.
Version: 3 Owner: bwebste Author(s): bwebste

216.15 diagonal matrix

Definition [1, 2] Let A be a square matrix (with entries in any field). If all off-diagonal entries
of A are zero, then A is a diagonal matrix.
From the definition, we see that an n × n diagonal matrix is completely determined by the
n entries on the diagonal; all other entries are zero. If the diagonal entries are a₁, a₂, …, aₙ,
then we denote the corresponding diagonal matrix by

    diag(a₁, …, aₙ) = [ a1  0   0  ...  0  ]
                      [ 0   a2  0  ...  0  ]
                      [ 0   0   a3 ...  0  ]
                      [ .   .   .  ...  .  ]
                      [ 0   0   0  ...  an ]
Examples
1. The identity matrix and zero matrix are diagonal matrices. Also, any 1 1 matrix is
a diagonal matrix.
2. A matrix A is a diagonal matrix if and only if A is both an upper and lower triangular matrix.

Properties
1. If A and B are diagonal matrices of the same order, then A + B and AB are again
diagonal matrices. Further, diagonal matrices commute, i.e., AB = BA. It follows that
real (and complex) diagonal matrices are normal matrices.
2. A square matrix is diagonal if and only if it is triangular and normal (see this page).
3. The eigenvalues of a diagonal matrix A = diag(a₁, …, aₙ) are a₁, …, aₙ. In consequence, for the determinant, we have det A = a₁a₂⋯aₙ, so A is invertible if and only
if all aᵢ are non-zero. Then the inverse is given by

    diag(a₁, …, aₙ)⁻¹ = diag(1/a₁, …, 1/aₙ).
4. If A is a diagonal matrix, then the adjugate of A is also a diagonal matrix [1].

Remarks
Diagonal matrices are also sometimes called quasi-scalar matrices [1].

REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. Wikipedia, diagonal matrix.

Version: 4 Owner: bwebste Author(s): matte



216.16 diagonalization

Let V be a finite-dimensional linear space over a field K, and T : V → V a linear transformation.
To diagonalize T is to find a basis of V that consists of eigenvectors. The transformation is
called diagonalizable if such a basis exists. The choice of terminology reflects the following.
Proposition 5. The matrix of T relative to a given basis is diagonal if and only if the
basis in question consists of eigenvectors.
Next, we give necessary and sufficient conditions for T to be diagonalizable. For λ ∈ K set

    E_λ = {u ∈ V : Tu = λu}.

The set E_λ is a subspace of V called the eigenspace associated to λ. This subspace is
non-trivial if and only if λ is an eigenvalue of T.
Proposition 6. A transformation is diagonalizable if and only if

    dim V = Σ_λ dim E_λ,

where the sum is taken over all eigenvalues λ of the transformation.
There are two fundamental reasons why a transformation T can fail to be diagonalizable.
1. The characteristic polynomial of T does not factor into linear factors over K.
2. There exists an eigenvalue λ such that the kernel of (T − λI)² is strictly greater than
the kernel of (T − λI). Equivalently, there exists an invariant subspace where T acts
as a nilpotent transformation plus some multiple of the identity.
Version: 7 Owner: rmilson Author(s): rmilson

216.17 diagonally dominant matrix

Let A be a square matrix (possibly complex) of dimension n with entries aᵢⱼ. Then A is said
to be diagonally dominant if

    |aᵢᵢ| ≥ Σ_{j=1, j≠i}^{n} |aᵢⱼ|

for i from 1 to n.
In addition, A is said to be strictly diagonally dominant if

    |aᵢᵢ| > Σ_{j=1, j≠i}^{n} |aᵢⱼ|

for i from 1 to n.
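The definition is a direct row-by-row check; a minimal Python sketch (the test matrices are arbitrary small examples):

```python
def diagonally_dominant(A, strict=False):
    """Check |a_ii| >= (or >, if strict) the sum of |a_ij| over j != i, for every row."""
    n = len(A)
    for i in range(n):
        off = sum(abs(A[i][j]) for j in range(n) if j != i)
        if (abs(A[i][i]) <= off) if strict else (abs(A[i][i]) < off):
            return False
    return True

A = [[4, -1, 0], [1, 5, 2], [0, 2, 3]]   # every row strictly dominant
B = [[4, -1, 0], [1, 5, 2], [0, 2, 2]]   # last row: |2| = |0| + |2|, only weakly
assert diagonally_dominant(A, strict=True)
assert diagonally_dominant(B) and not diagonally_dominant(B, strict=True)
```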
Version: 2 Owner: Daume Author(s): Daume

216.18 eigenvalue (of a matrix)

Let A be an n × n matrix of complex numbers. A number λ ∈ C is said to be an eigenvalue
of A if there is a nonzero n × 1 column vector x for which

    Ax = λx.
This definition raises several natural questions, among them: Does any matrix of complex
numbers have eigenvalues? How many different eigenvalues can a matrix have? Given a
matrix, how does one compute its eigenvalues?
The answers to the above questions are usually studied in introductory linear algebra courses,
usually in the following sequence:
One learns that λ ∈ C is an eigenvalue of A precisely when λ satisfies

    det(λI − A) = 0

where I denotes the n × n identity matrix and det is the determinant function.
Basic facts about the determinant imply that det(λI − A) is a polynomial in λ of degree
n. This is often referred to as the characteristic polynomial of A. (Note: some define
the characteristic polynomial to be det(A − λI); for the purposes of finding eigenvalues
of A it makes no difference.)
From the fundamental theorem of algebra we know that any polynomial with complex
coefficients has at least one complex root, and at most n complex roots.
It follows that any matrix of complex numbers A has at least one eigenvalue, and at
most n eigenvalues.
If one is given an n × n matrix A of real numbers, the above argument implies that A has
at least one complex eigenvalue; the question of whether or not A has real eigenvalues is
more subtle since there is no real-numbers analogue of the fundamental theorem of algebra.
It should not be a surprise then that some real matrices do not have real eigenvalues. For
example, let

    A = [  0  1 ]
        [ -1  0 ].

In this case det(λI − A) = λ² + 1; clearly no real number λ satisfies λ² + 1 = 0; hence A has
no real eigenvalues (although A has complex eigenvalues i and −i).

If one converts the above theory into an algorithm for calculating the eigenvalues of a matrix
A, one is led to a two-step procedure:
Compute the polynomial det(λI − A).
Solve det(λI − A) = 0.
Unfortunately, computing n × n determinants and finding roots of polynomials of degree n
are both computationally messy procedures for even moderately large n, so for most practical
purposes variations on this naive scheme are needed. See the eigenvalue problem for more
information.
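For a 2 × 2 matrix the naive two-step procedure is perfectly tractable, since the characteristic polynomial is λ² − tr(A)λ + det(A); a sketch applying it to the example matrix above:

```python
import cmath

def eigenvalues_2x2(A):
    """For a 2 x 2 matrix, det(lambda I - A) = lambda^2 - tr(A) lambda + det(A);
    solve it with the quadratic formula (cmath handles complex roots)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# The real matrix from the text, with no real eigenvalues:
A = [[0, 1], [-1, 0]]
lam1, lam2 = eigenvalues_2x2(A)
assert abs(lam1 - 1j) < 1e-12 and abs(lam2 + 1j) < 1e-12
```

For larger n, practical methods avoid forming the characteristic polynomial entirely, as the next entry explains.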
Remark: The definition of an eigenvalue for an endomorphism can be found here. The present
entry is an extract of version 6 of that entry.
Version: 3 Owner: mathcam Author(s): matte

216.19 eigenvalue problem

The eigenvalue problem appears as part of the solution in many scientific or engineering
applications. An example of where it arises is the determination of the main axes of a
second order surface Q = xᵀAx = 1 (with a symmetric matrix A). The task is to find the
places where the normal

    ∇Q = (∂Q/∂x₁, …, ∂Q/∂xₙ) = 2Ax

is parallel to the vector x, i.e. Ax = λx.
[picture to go here]
A solution x of the above equation with xᵀAx = 1 has the squared distance d² = xᵀx from
the origin. Therefore, λxᵀx = 1 and d² = 1/λ. The main axes are aᵢ = 1/√λᵢ (i = 1, …, n).

The general algebraic eigenvalue problem is given by

    Ax = λx, or (A − λI)x = 0

with I the identity matrix, an arbitrary square matrix A, an unknown scalar λ, and the
unknown vector x. A non-trivial solution to this system of n linear homogeneous equations
exists if and only if the determinant

    det(A − λI) = | a11-λ  a12    ...  a1n   |
                  | a21    a22-λ  ...  a2n   |
                  | .      .      ...  .     |
                  | an1    an2    ...  ann-λ | = 0.

This nth degree polynomial in λ is called the characteristic equation. Its roots are called
the eigenvalues, and the corresponding vectors x eigenvectors. In the example, x is a right
eigenvector for λ; a left eigenvector y is defined by yᵀA = λyᵀ.
Solving this polynomial for λ is not a practical method to solve the eigenvalue problem; a
QR-based method is a much more adequate tool ([Golub89]); it works as follows:
A is reduced to the (upper) Hessenberg matrix H or, if A is symmetric, to a tridiagonal matrix
T.
This is done with a similarity transform: if S is a non-singular n × n matrix, then Ax = λx
is transformed to SAx = λSx, or By = λy with y = Sx and B = SAS⁻¹,
i.e. A and B share the same eigenvalues (not the eigenvectors). We will choose for S a
Householder transformation. The eigenvalues are then found by applying iteratively the
QR decomposition, i.e. the Hessenberg (or tridiagonal) matrix H will be decomposed into
upper triangular matrices R and orthogonal matrices Q.
The algorithm is surprisingly simple: H = H₁ is decomposed into H₁ = Q₁R₁, then an H₂
is computed, H₂ = R₁Q₁. H₂ is similar to H₁ because H₂ = R₁Q₁ = Q₁⁻¹H₁Q₁, and is
decomposed to H₂ = Q₂R₂. Then H₃ is formed, H₃ = R₂Q₂, etc. In this way a sequence of
Hᵢ's (with the same eigenvalues) is generated, that finally converges to (for conditions, see
[Golub89])

    [ λ1  X   X  ...  X     X  ]
    [ 0   λ2  X  ...  X     X  ]
    [ 0   0   λ3 ...  X     X  ]
    [ .   .   .  ...  .     .  ]
    [ 0   0   0  ...  λn-1  X  ]
    [ 0   0   0  ...  0     λn ]

for the Hessenberg and

    [ λ1  0   0  ...  0     0  ]
    [ 0   λ2  0  ...  0     0  ]
    [ 0   0   λ3 ...  0     0  ]
    [ .   .   .  ...  .     .  ]
    [ 0   0   0  ...  λn-1  0  ]
    [ 0   0   0  ...  0     λn ]

for the tridiagonal.


References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Golub89 Gene H. Golub and Charles F. van Loan: Matrix Computations, 2nd edn., The Johns
Hopkins University Press, 1989.
Version: 9 Owner: akrowne Author(s): akrowne

216.20 eigenvalues of orthogonal matrices

Theorem Let A be an n × n orthogonal matrix. Then the following properties hold:

1. The characteristic polynomial p(λ) = det(A − λI) of A is a reciprocal polynomial, that
is,

    p(λ) = ±λⁿ p(1/λ).

2. If λ is an eigenvalue of A, then so is 1/λ.
3. If n is odd, then either 1 or −1 is an eigenvalue of A.
4. All eigenvalues have unit modulus. In other words, if λ is an eigenvalue, then |λ| = 1.
Here, |λ| is the complex modulus of λ.
Proof. Since A⁻¹ = Aᵀ, we have A − λI = −λA(Aᵀ − I/λ). Taking the determinant of
both sides, and using det A = det Aᵀ and det(cA) = cⁿ det A (c ∈ C), yields

    det(A − λI) = (−λ)ⁿ det A · det(A − (1/λ)I),

and property (1) follows, since det A = ±1 for an orthogonal matrix. For property (2), let us
first note that since A is orthogonal, no eigenvalue can be 0. Thus, p(λ) = 0 implies that
p(1/λ) = 0, and property (2) follows.
For property (3), suppose n is odd. Then A has an odd number of eigenvalues (counted with
multiplicities), which by property (2) come in pairs {λ, 1/λ}. Therefore, there must exist at
least one eigenvalue λ such that λ = 1/λ, i.e., λ² = 1, so λ = 1 or λ = −1. For property (4),
let λ be an eigenvalue corresponding to a (column) eigenvector x, i.e., Ax = λx. Taking the
conjugate transpose gives x∗Aᵀ = λ̄x∗. Here x∗ is the row vector corresponding to x, with
each entry complex conjugated; since A is real, its conjugate transpose is simply Aᵀ. Thus

    λ̄λ x∗x = x∗AᵀAx = x∗x.

As an eigenvector, x is non-zero, so |λ|² = 1, i.e. |λ| = 1. □
These results can be found in [1] (page 268). In the same reference, it is mentioned that
properties (1) and (3) can essentially be found in a paper published in 1854 by Francesco
Brioschi (1824-1897). Later in the same year, an improved proof was given by F. Faà di
Bruno (1825-1888) [1]. Biographies of Brioschi and Faà di Bruno can be found at the
MacTutor History of Mathematics archive [2, 3].

REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. The MacTutor History of Mathematics archive, entry on Francesco Brioschi
3. The MacTutor History of Mathematics archive, entry on Francesco Faà di Bruno.

Version: 2 Owner: matte Author(s): matte

216.21 eigenvector

Let A be an n × n square matrix and x an n × 1 column vector. Then the eigenvectors of
A are nonzero vectors x such that

    Ax = λx.

In other words, these vectors become multiples of themselves when transformed by A.
One can find eigenvectors by first finding eigenvalues, then for each eigenvalue λᵢ, solving
the system

    (A − λᵢI)xᵢ = 0

to find a form which characterizes the eigenvector xᵢ (any multiple of xᵢ is also an eigenvector). Of course this is not the smart way to do it; for this, see singular value decomposition.
Version: 4 Owner: akrowne Author(s): akrowne

216.22 exactly determined

An exactly determined system of linear equations has precisely as many unknowns as
equations, and is hence soluble provided the equations are linearly independent.
Version: 1 Owner: akrowne Author(s): akrowne

216.23 free vector space over a set

In this entry we construct the free vector space over a set, or the vector space generated by a set [1]. For a set X, we shall denote this vector space by C(X). One application of this construction is given in [2], where the free vector space is used to define the
tensor product for modules.
To define the vector space C(X), let us first define C(X) as a set. For a set X and a field
K, we define

    C(X) = {f : X → K | f⁻¹(K∖{0}) is finite}.

In other words, C(X) consists of functions f : X → K that are non-zero only at finitely
many points in X. Here, we denote the identity element in K by 1, and the zero element
by 0. The vector space structure for C(X) is defined as follows. If f and g are functions
in C(X), then f + g is the mapping x ↦ f(x) + g(x). Similarly, if f ∈ C(X) and λ ∈ K,
then λf is the mapping x ↦ λf(x). It is not difficult to see that these operations are well
defined, i.e., both f + g and λf are again functions in C(X).

Basis for C(X)

If a ∈ X, let us define the function δ_a ∈ C(X) by

    δ_a(x) = 1 when x = a,
             0 otherwise.

These functions form a linearly independent basis for C(X), i.e.,

    C(X) = span{δ_a}_{a∈X}.     (216.23.1)

Here, the space span{δ_a}_{a∈X} consists of all finite linear combinations of elements in {δ_a}_{a∈X}.
It is clear that any element in span{δ_a}_{a∈X} is a member of C(X). Let us check the other
direction. Suppose f is a member of C(X). Then, let ξ₁, …, ξ_N be the distinct points in X
where f is non-zero. We then have

    f = Σ_{i=1}^{N} f(ξᵢ) δ_{ξᵢ},

and we have established equality in equation 216.23.1.
To see that the set {δ_a}_{a∈X} is linearly independent, we need to show that any finite
subset of it is linearly independent. Let {δ_{ξ₁}, …, δ_{ξ_N}} be such a finite subset, and suppose
Σ_{i=1}^{N} αᵢ δ_{ξᵢ} = 0 for some αᵢ ∈ K. Since the points ξᵢ are pairwise distinct, it follows that
αᵢ = 0 for all i. This shows that the set {δ_a}_{a∈X} is linearly independent.
Let us define the mapping ι : X → C(X), x ↦ δ_x. This mapping gives a bijection between
X and the basis vectors {δ_a}_{a∈X}. We can thus identify these spaces. Then X becomes a
linearly independent basis for C(X).
Universal property of ι : X → C(X)
The mapping ι : X → C(X) is universal in the following sense. If φ is an arbitrary mapping
from X to a vector space V, then there exists a unique linear mapping φ̄ : C(X) → V such
that φ̄ ∘ ι = φ, i.e., the corresponding triangular diagram (with vertices X, C(X) and V)
commutes.
Proof. We define φ̄ as the linear mapping that maps the basis elements of C(X) by φ̄(δ_x) =
φ(x). Then, by definition, φ̄ is linear. For uniqueness, suppose that there are linear mappings
φ̄, ψ : C(X) → V such that φ̄ ∘ ι = ψ ∘ ι = φ. For all x ∈ X, we then have φ̄(δ_x) = ψ(δ_x).
Thus φ̄ = ψ, since both mappings are linear and they coincide on the basis elements. □

REFERENCES
1. W. Greub, Linear Algebra, Springer-Verlag, Fourth edition, 1975.
2. I. Madsen, J. Tornehave, From Calculus to Cohomology, Cambridge University press, 1997.

Version: 5 Owner: matte Author(s): matte


216.24 in a vector space, λv = 0 if and only if λ = 0 or v is the zero vector

Theorem Let V be a vector space over the field F. Further, let λ ∈ F and v ∈ V. Then
λv = 0 if and only if λ is zero, or v is the zero vector, or both λ and v are zero.
Proof. Let us denote by 0_F and by 1_F the zero and unit elements in F, respectively. Similarly,
we denote by 0_V the zero vector in V. Suppose λ = 0_F. Then, by axiom 8, we have that

    1_F v + 0_F v = 1_F v

for all v ∈ V. By axiom 6, there is an element in V that cancels 1_F v. Adding this element
to both sides yields 0_F v = 0_V. Next, suppose that v = 0_V. We claim that λ 0_V = 0_V for all
λ ∈ F. This follows from the previous claim if λ = 0_F, so let us assume that λ ≠ 0_F. Then
λ⁻¹ exists, and axiom 7 implies that

    λ(λ⁻¹ v) + λ 0_V = λ(λ⁻¹ v + 0_V)

holds for all v ∈ V. Then, using axiom 3, we have that

    v + λ 0_V = v

for all v ∈ V. Thus λ 0_V satisfies the axiom for the zero vector, and λ 0_V = 0_V for all λ ∈ F.
For the other direction, suppose λv = 0_V and λ ≠ 0_F. Then, using axiom 3, we have that

    v = 1_F v = (λ⁻¹λ) v = λ⁻¹(λv) = λ⁻¹ 0_V = 0_V.

On the other hand, suppose λv = 0_V and v ≠ 0_V. If λ ≠ 0_F, then the above calculation for v
is again valid, whence

    0_V ≠ v = 0_V,

which is a contradiction, so λ = 0_F. □
This result with proof can be found in [1], page 6.

REFERENCES
1. W. Greub, Linear Algebra, Springer-Verlag, Fourth edition, 1975.

Version: 3 Owner: drini Author(s): drini, matte


216.25 invariant subspace

Let T : V → V be a linear transformation of a vector space V. A subspace U ⊆ V is called
an invariant subspace of T if

    T(U) ⊆ U.

If U is an invariant subspace, then the restriction of T to U defines a well-defined linear
transformation of U.
Version: 3 Owner: rmilson Author(s): rmilson

216.26 least squares

The general problem to be solved by the least squares method is this: given some direct
measurements y of random variables, and knowing a set of equations f which have to be
satisfied by these measurements, possibly involving unknown parameters x, find the set of x
which comes closest to satisfying

    f(x, y) = 0

where "closest" is defined by a Δy such that

    f(x, y + Δy) = 0 and (Δy)² is minimized.

The sum of squares of the elements of a vector can be written in different ways:

    (Δy)² = Δyᵀ Δy = ||Δy||² = Σᵢ Δyᵢ².

The assumption has been made here that the elements of Δy are statistically uncorrelated and
have equal variance. For this case, the above solution results in the most efficient estimators
for x, Δy. If the Δy are correlated, correlations and variances are defined by a covariance
matrix C, and the above minimum condition becomes

    Δyᵀ C⁻¹ Δy is minimized.
Least squares solutions can be more or less simple, depending on the constraint equations
f . If there is exactly one equation for each measurement, and the functions f are linear in
the elements of y and x, the solution is discussed under linear regression. For other linear
models, see linear least squares. Least squares methods applied to few parameters can lend
929

themselves to very efficient algorithms (e.g. in real-time image processing), as they reduce
to simple matrix operations.
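As a minimal sketch of the linear case described above (not part of the original entry): fitting a straight line y ≈ a + b·t to measurements leads to a 2 × 2 system of normal equations, which reduces to simple matrix arithmetic. The helper name fit_line and the sample data are illustrative assumptions.

```python
# Minimal sketch: least squares fit of y ~= a + b*t to measurements.
# The model is linear in the parameters x = (a, b), so minimizing
# sum((y_i - a - b*t_i)^2) amounts to solving the 2x2 normal equations.

def fit_line(ts, ys):
    n = len(ts)
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sy = sum(ys)
    sty = sum(t * y for t, y in zip(ts, ys))
    # Normal equations: [n  st ] [a]   [sy ]
    #                   [st stt] [b] = [sty]
    det = n * stt - st * st
    a = (sy * stt - st * sty) / det
    b = (n * sty - st * sy) / det
    return a, b

a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)  # 1.0 2.0  (the data lie exactly on y = 1 + 2t)
```

With exact data the residual is zero; with noisy data the same two equations give the minimizing (a, b).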
If the constraint equations are non-linear, one typically solves iteratively by linearization,
using approximate values of x, y in every step, and linearizing by forming the matrix of
derivatives df/dx (the Jacobian matrix) and possibly also df/dy at the last point of approximation.
Note that as the iterative improvements Δx, Δy tend towards zero (if the process converges),
Δy converges towards the final value which enters the minimum condition above.
Algorithms avoiding the explicit calculation of df /dx and df /dy have also been investigated,
e.g. [1]; for a discussion, see [2]. Where convergence (or control over convergence) is problematic, use of a general package for minimization may be indicated.

REFERENCES
1. M.L. Ralston and R.I. Jennrich, Dud, a Derivative-free Algorithm for Non-linear Least Squares,
Technometrics 20-1 (1978) 7.
2. W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C,
Second edition, Cambridge University Press, 1995.

Note: This entry is based on content from the The Data Analysis Briefbook
Version: 3 Owner: akrowne Author(s): akrowne

216.27

linear algebra

Linear algebra is the branch of mathematics devoted to the theory of linear structure. The
axiomatic treatment of linear structure is based on the notions of a linear space (more commonly known as a vector space), and a linear mapping. Broadly speaking, there are two
fundamental questions considered by linear algebra:
the solution of a linear equation, and
diagonalization, a.k.a. the eigenvalue problem.
From the geometric point of view, linear is synonymous with straight, and consequently
linear algebra can be regarded as the branch of mathematics dealing with lines and planes,
as well as with transformations of space that preserve straightness, e.g. rotations and
reflections. The two fundamental questions, in geometric terms, deal with


the intersection of hyperplanes, and


the principal axes of an ellipsoid.
Linearity is a very basic notion, and consequently linear algebra has applications in numerous
areas of mathematics, science, and engineering. Diverse disciplines, such as differential equations,
differential geometry, the theory of relativity, quantum mechanics, electrical circuits, computer graphics, and information theory benefit from the notions and techniques of linear
algebra.
Euclidean geometry is related to a specialized branch of linear algebra that deals with linear
measurement. Here the relevant notions are length and angle. A typical question is the determination of lines perpendicular to a given plane. A somewhat less specialized branch deals
with affine structure, where the key notion is that of area and volume. Here determinants
play an essential role.
Yet another branch of linear algebra is concerned with computation, algorithms, and numerical approximation. Important examples of such techniques include: Gaussian elimination,
the method of least squares, LU factorization, QR decomposition, Gram-Schmidt orthogonalization,
singular value decomposition, and a number of iterative algorithms for the calculation of
eigenvalues and eigenvectors.
Syllabus.
The following subject outline is meant to serve as a survey of some key topics in linear algebra
(Warning: the choice of topics is far from comprehensive, and no doubt reflects the biases
of the present author's background). As such, it may (or may not) be of use to motivated
autodidacts interested in deepening their understanding of the subject.
1. Linear structure.
(a) Introduction: systems of linear equations, Gaussian elimination, matrices, matrix operations.
(b) Foundations: fields and vector spaces, subspace, linear independence, basis,
dimension, direct sum decomposition.
(c) Linear mappings: linearity axioms, kernels and images, injectivity, surjectivity, bijections, compositions, inverses, matrix representations, change of basis,
conjugation, similarity.
2. Affine structure.
(a) Determinants: characterizing properties, cofactor expansion, permutations, Cramer's rule,
classical adjoint.
(b) Geometric aspects: Euclidean volume, orientation, equiaffine transformations,
determinants as geometric invariants of linear transformations.
3. Diagonalization.

(a) Basic notions: eigenvector, eigenvalue, eigenspace, characteristic polynomial.


(b) Obstructions: imaginary eigenvalues, nilpotent transformations, classification
of 2-dimensional real transformations.
(c) Structure theory: invariant subspaces, Cayley-Hamilton theorem, Jordan canonical form,
rational canonical form.
4. Multilinearity.
(a) Foundations: vector space dual, bilinearity, bilinear transpose, Gram-Schmidt
orthogonalization.
(b) Bilinearity: bilinear forms, symmetric bilinear forms, quadratic forms, signature
and Sylvester's theorem, orthogonal transformations, skew-symmetric bilinear forms,
symplectic transformations.
(c) Tensor algebra: tensor product, contraction, invariants of linear transformations, symmetry operations.
5. Euclidean and Hermitian structure.
(a) Foundations: inner product axioms, the adjoint operation, symmetric transformations, skew-symmetric transformations, self-adjoint transformations, normal
transformations.
(b) Spectral theorem: diagonalization of self-adjoint transformations, diagonalization of quadratic forms.
6. Computational and numerical methods.

(a) Linear problems: LU-factorization, QR decomposition, least squares, Householder transformations.


(b) Eigenvalue problems: singular value decomposition, Jacobi and Gauss-Seidel
iterative algorithms.
Version: 3 Owner: rmilson Author(s): rmilson

216.28

linear least squares

Let A be an m × n matrix with m ≥ n and b an m × 1 matrix. We want to consider the
problem

Ax ≈ b,

where ≈ stands for the best approximate solution in the least squares sense, i.e. we want to
minimize the Euclidean norm of the residual r = Ax − b:

‖Ax − b‖₂ = ‖r‖₂ = ( Σ_{i=1}^m rᵢ² )^{1/2}.

We want to find the vector x which is closest to b in the column space of A.


Among the different methods to solve this problem, we mention normal equations (sometimes
ill-conditioned), QR decomposition, and, most generally, singular value decomposition. For
further reading, see [Golub89], [Branham90], [Wong92], [Press95].
Example: Let us consider the problem of finding the closest point (vertex) to measurements
on straight lines (e.g. trajectories emanating from a particle collision). This problem can
be described by Ax ≈ b with an m × 2 matrix A with rows (ai1, ai2), the unknown vector
x = (u, v)ᵀ, and b = (b1, ..., bm)ᵀ.

This is clearly an inconsistent system of linear equations, with more equations than unknowns, a frequently occurring problem in experimental data analysis. The system is, however, not very inconsistent, and there is a point that lies nearly on all straight lines. The
solution can be found with the linear least squares method, e.g. by QR decomposition for
solving Ax ≈ b:

QRx = b  ⟹  x = R⁻¹Qᵀb.
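The vertex problem above can be sketched in a few lines. The entry recommends QR or SVD; for a short self-contained illustration we solve the normal equations AᵀAx = Aᵀb instead (the helper lstsq_2col and the three sample lines are assumptions made for this sketch, not from the Briefbook).

```python
# Sketch of the overdetermined problem Ax ~= b for two unknowns (u, v),
# solved via the normal equations A^T A x = A^T b.

def lstsq_2col(A, b):
    # Accumulate the 2x2 matrix A^T A and the 2-vector A^T b.
    s11 = sum(r[0] * r[0] for r in A)
    s12 = sum(r[0] * r[1] for r in A)
    s22 = sum(r[1] * r[1] for r in A)
    t1 = sum(r[0] * bi for r, bi in zip(A, b))
    t2 = sum(r[1] * bi for r, bi in zip(A, b))
    det = s11 * s22 - s12 * s12
    u = (s22 * t1 - s12 * t2) / det
    v = (s11 * t2 - s12 * t1) / det
    return u, v

# Three lines u + v = 2, u - v = 0, u = 1.02: nearly consistent near (1, 1).
A = [[1.0, 1.0], [1.0, -1.0], [1.0, 0.0]]
b = [2.0, 0.0, 1.02]
u, v = lstsq_2col(A, b)
# u is about 1.0067 and v is exactly 1.0: the point closest to all three lines.
```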
References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Wong92 S.S.M. Wong, Computational Methods in Physics and Engineering, Prentice Hall, 1992.
Golub89 Gene H. Golub and Charles F. van Loan: Matrix Computations, 2nd edn., The John
Hopkins University Press, 1989.
Branham90 R.L. Branham, Scientific Data Analysis, An Introduction to Overdetermined Systems,
Springer, Berlin, Heidelberg, 1990.
Press95 W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes
in C, Second edition, Cambridge University Press, 1995. (The same book exists for
the Fortran language). There is also an Internet version which you can work from.
Version: 1 Owner: akrowne Author(s): akrowne

216.29

linear manifold

Definition [1] Suppose V is a vector space and suppose that L is a non-empty subset of V.
If there exists a v ∈ V such that L + v = {l + v | l ∈ L} is a vector subspace of V, then L
is a linear manifold of V. We say that the dimension of L is the dimension of L + v
and write dim L = dim(L + v). If dim L = dim V − 1, then L is called a hyperplane.
A linear manifold is, in other words, a linear subspace that has possibly been shifted away
from the origin. For instance, in R², examples of linear manifolds are points, lines (which are
hyperplanes), and R² itself.

REFERENCES
1. R. Cristescu, Topological vector spaces, Noordhoff International Publishing, 1977.

Version: 2 Owner: matte Author(s): matte

216.30

matrix exponential

The exponential of a real-valued square matrix A, denoted by e^A, is defined as

e^A = Σ_{k=0}^∞ (1/k!) A^k = I + A + (1/2)A² + ⋯

Let us check that e^A is a real-valued square matrix. Suppose M is a real number such that
|a_ij| < M for all entries a_ij of A. Then |(A²)_ij| < nM² for all entries in A², where n is
the dimension of A. Thus, in general, we have |(A^k)_ij| < n^(k−1) M^k. Since Σ_{k=0}^∞ n^(k−1) M^k / k!
converges, we see that e^A converges to a real-valued n × n matrix.
Example 1. Suppose A is nilpotent, i.e., A^r = 0 for some natural number r. Then

e^A = I + A + (1/2!)A² + ⋯ + (1/(r−1)!)A^(r−1).

Example 2. If A is diagonalizable, i.e., of the form A = LDL⁻¹, where D is a diagonal matrix,
then

e^A = Σ_{k=0}^∞ (1/k!)(LDL⁻¹)^k = L ( Σ_{k=0}^∞ (1/k!) D^k ) L⁻¹ = L e^D L⁻¹.

Further, if D = diag{a1, ..., an}, then D^k = diag{a1^k, ..., an^k}, whence

e^A = L diag{e^{a1}, ..., e^{an}} L⁻¹.

For a diagonalizable matrix A, it follows that det e^A = e^{trace A}. However, this formula is, in
fact, valid for all A.
Properties
Let A be a square n × n real-valued matrix. Then the matrix exponential satisfies the
following properties:
1. For the n × n zero matrix O, e^O = I, where I is the n × n identity matrix.
2. If A = L diag{a1, ..., an} L⁻¹ for an invertible n × n matrix L, then
e^A = L diag{e^{a1}, ..., e^{an}} L⁻¹.
3. If B is a matrix of the same type as A, and A and B commute, then e^{A+B} = e^A e^B.
4. The trace of A and the determinant of e^A are related by the formula
det e^A = e^{trace A}.
In effect, e^A is always invertible. The inverse is given by
(e^A)⁻¹ = e^{−A}.
5. If trace A = 0, then det e^A = 1; if in addition A is skew-symmetric, then e^A is a rotation matrix.
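For a nilpotent matrix, the series in Example 1 terminates and can be evaluated exactly. The following sketch (an illustration, not from the original entry; the helper names are assumptions) computes the finite sum with exact rational arithmetic.

```python
# Sketch: the series for exp(A) terminates when A is nilpotent (Example 1).
# Here A is strictly upper triangular with A^2 = 0, so exp(A) = I + A.
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp_nilpotent(A, r):
    """exp(A) = I + A + A^2/2! + ... + A^(r-1)/(r-1)! when A^r = 0."""
    n = len(A)
    term = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # A^0 = I
    total = [row[:] for row in term]
    fact = 1
    for k in range(1, r):
        term = mat_mul(term, A)   # term is now A^k
        fact *= k                 # fact is now k!
        total = [[total[i][j] + term[i][j] / fact for j in range(n)]
                 for i in range(n)]
    return total

A = [[Fraction(0), Fraction(1)],
     [Fraction(0), Fraction(0)]]       # A^2 = 0
E = mat_exp_nilpotent(A, 2)            # exp(A) = I + A = [[1, 1], [0, 1]]
```

Note that det E = 1 here, matching property 4: trace A = 0 gives det e^A = e^0 = 1.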
Version: 1 Owner: mathcam Author(s): matte

216.31

matrix operations

A matrix is an array, or a rectangular grid, of numbers. An m × n matrix is one which has
m rows and n columns. Examples of matrices include:

The 2 × 3 matrix

A = [ 1 −2 2 ]
    [ 1  0 2 ]

The 3 × 3 matrix

B = [ 1  0 1 ]
    [ 3 −1 5 ]
    [ 4  2 0 ]

The 3 × 1 matrix

C = [ 7 ]
    [ 7 ]
    [ 2 ]

The 1 × 2 matrix

D = [ 1/2  2 ]
All of our example matrices (except the last one) have entries which are integers. In general,
matrices are allowed to have their entries taken from any ring R. The set of all m × n
matrices with entries in a ring R is denoted M_{m×n}(R). If a matrix has exactly as many rows
as it has columns, we say it is a square matrix.
Addition of two matrices is allowed provided that both matrices have the same number of
rows and the same number of columns. The sum of two such matrices is obtained by adding
their respective entries. For example,

[ 7 ]   [ −1  ]   [ 7 + (−1) ]   [ 6    ]
[ 7 ] + [ 4.5 ] = [ 7 + 4.5  ] = [ 11.5 ]
[ 2 ]   [ 0   ]   [ 2 + 0    ]   [ 2    ]
Multiplication of two matrices is allowed provided that the number of columns of the first
matrix equals the number of rows of the second matrix. (For example, multiplication of a
2 × 3 matrix with a 3 × 3 matrix is allowed, but multiplication of a 3 × 3 matrix with a 2 × 3 matrix
is not allowed, since the first matrix has 3 columns, and the second matrix has 2 rows, and
3 doesn't equal 2.) In this case the matrix multiplication is defined by

(AB)_{ij} = Σ_k (A)_{ik} (B)_{kj}.

We will describe how matrix multiplication works with an example. Let

A = [ 1 −2 2 ] ,   B = [ 1  0 1 ]
    [ 1  0 2 ]         [ 3 −1 5 ]
                       [ 4  2 0 ]

be the two matrices that we used above as our very first two examples of matrices. Since A
is a 2 × 3 matrix, and B is a 3 × 3 matrix, it is legal to multiply A and B, but it is not legal
to multiply B and A. The method for computing the product AB is to place A below and
to the left of B, as follows:

             [ 1  0 1 ]
             [ 3 −1 5 ]
             [ 4  2 0 ]
[ 1 −2 2 ]   [ X  Y . ]
[ 1  0 2 ]   [ .  . . ]

A is always in the bottom left corner, B is in the top right corner, and the product, AB, is
always in the bottom right corner. We see from the picture that AB will be a 2 × 3 matrix.
(In general, AB has as many rows as A, and as many columns as B.)
Let us compute the top left entry of AB, denoted by X in the above picture. The way to
calculate this entry of AB (or any other entry) is to take the dot product of the stuff above
it [which is (1, 3, 4)] and the stuff to the left of it [which is (1, −2, 2)]. In this case, we have

X = 1 · 1 + 3 · (−2) + 4 · 2 = 3.

Similarly, the top middle entry of AB (where the Y is in the above picture) is gotten by
taking the dot product of the stuff above it: (0, −1, 2), and the stuff to the left of it: (1, −2, 2),
which gives

Y = 0 · 1 + (−1) · (−2) + 2 · 2 = 6.

Continuing in this way, we can compute every entry of AB one by one to get

             [ 1  0  1 ]
             [ 3 −1  5 ]
             [ 4  2  0 ]
[ 1 −2 2 ]   [ 3  6 −9 ]
[ 1  0 2 ]   [ 9  4  1 ]

and so

AB = [ 3 6 −9 ]
     [ 9 4  1 ] .
If one tries to compute the illegal product BA using this procedure, one winds up with

            [ 1 −2 2 ]
            [ 1  0 2 ]
[ 1  0 1 ]  [ ?      ]
[ 3 −1 5 ]
[ 4  2 0 ]

The top left entry of this illegal product (marked with a ? above) would have to be the dot
product of the stuff above it: (1, 1), and the stuff to the left of it: (1, 0, 1), but these vectors
do not have the same length, so it is impossible to take their dot product, and consequently
it is impossible to take the product of the matrices BA.

Under the correspondence of matrices and linear transformations, one can show that matrix
multiplication is equivalent to composition of linear transformations, which explains why
matrix multiplication is defined in a manner which is so odd at first sight, and why this
strange manner of multiplication is so useful in mathematics.
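The entrywise rule above can be written as a short function. The sketch below (the helper name mat_mul is an assumption of this illustration) reproduces the worked product AB, with the signs as used in the dot-product computations for X and Y.

```python
# The entrywise rule (AB)_ij = sum_k A_ik * B_kj as a short function.

def mat_mul(A, B):
    assert len(A[0]) == len(B), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# The 2x3 and 3x3 matrices from the worked example above.
A = [[1, -2, 2],
     [1,  0, 2]]
B = [[1,  0, 1],
     [3, -1, 5],
     [4,  2, 0]]
AB = mat_mul(A, B)
print(AB)  # [[3, 6, -9], [9, 4, 1]]
```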
Version: 1 Owner: djao Author(s): djao

216.32

nilpotent matrix

The square matrix A is said to be nilpotent if A^n = A·A⋯A (n times) = 0 for some positive integer
n (here 0 denotes the matrix where every entry is 0).

Theorem 28 (Characterization of nilpotent matrices). A matrix is nilpotent iff its
eigenvalues are all 0.

Proof. Assume A^n = 0. Let λ be an eigenvalue of A. Then Ax = λx for some nonzero vector x.
By induction λ^n x = A^n x = 0, so λ = 0.
Conversely, suppose that all eigenvalues of A are zero. Then the characteristic polynomial
of A is det(λI − A) = λ^n. It now follows from the Cayley-Hamilton theorem that A^n = 0.
Since the determinant is the product of the eigenvalues it follows that a nilpotent matrix has
determinant 0. Similarly, since the trace of a square matrix is the sum of the eigenvalues, it
follows that it has trace 0.
One class of nilpotent matrices is the strictly triangular matrices (lower or upper); this
follows from the fact that the eigenvalues of a triangular matrix are the diagonal elements,
and thus are all zero in the case of strictly triangular matrices.
Note that for 2 × 2 matrices A the theorem implies that A is nilpotent iff A = 0 or A² = 0.
Also, it's worth noticing that any matrix that is similar to a nilpotent matrix is nilpotent.
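The strictly-triangular case can be checked directly. The sketch below (an illustration with assumed helper names and an arbitrary 3 × 3 example) verifies that a strictly upper triangular matrix satisfies A³ = 0.

```python
# Sketch: strictly upper triangular matrices are nilpotent. For the 3x3
# example below, A^3 = 0 (but A^2 != 0).

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, p):
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(p):
        R = mat_mul(R, A)
    return R

A = [[0, 2, 5],
     [0, 0, 7],
     [0, 0, 0]]
Z = [[0] * 3 for _ in range(3)]
print(mat_pow(A, 3) == Z)  # True: A is nilpotent with A^3 = 0
```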
Version: 13 Owner: jgade Author(s): jgade

216.33

nilpotent transformation

A linear transformation N : U → U is called nilpotent if there exists a k ∈ N such that
N^k = 0.

A nilpotent transformation naturally determines a flag of subspaces

{0} ⊂ ker N¹ ⊂ ker N² ⊂ ⋯ ⊂ ker N^(k−1) ⊂ ker N^k = U

and a signature

0 = n0 < n1 < n2 < ⋯ < n_{k−1} < n_k = dim U,   n_i = dim ker N^i.

The signature is governed by the following constraint, and characterizes N up to linear
isomorphism.
Proposition 7. A sequence of increasing natural numbers is the signature of a nilpotent
transformation if and only if

n_{j+1} − n_j ≤ n_j − n_{j−1}

for all j = 1, ..., k − 1. Equivalently, there exists a basis of U such that the matrix of N
relative to this basis is block diagonal

N1 0
0 ... 0
0 N2 0 . . . 0

0
0
N
.
.
.
0
3
,

..
..
.. . .
..
.
. .
.
.
0
0
0 . . . Nk

with each of the blocks having the form

0
0
.
.
.
Ni =
0

0
0

1
0
..
.

0 ...
1 ...
.. . .
.
.
..
.
0 0
0 0 ...
0 0 ...

0 0
0 0
..

1 0

0 1
0 0

Letting di denote the number of blocks of size i, the signature of N is given by


ni = ni1 + di + di+1 + . . . + dk ,

i = 1, . . . , k

Version: 3 Owner: rmilson Author(s): rmilson

216.34

non-zero vector

A non-zero vector in a vector space V is a vector that is not equal to the zero vector in
V.
Version: 2 Owner: matte Author(s): matte


216.35

off-diagonal entry

Let A = (a_ij) be a square matrix. An element a_ij is an off-diagonal entry if a_ij is not on
the diagonal, i.e., if i ≠ j.
Version: 3 Owner: mathcam Author(s): matte

216.36

orthogonal matrices

A real square n × n matrix Q is orthogonal if QᵀQ = I, i.e., if Q⁻¹ = Qᵀ. The rows and
columns of an orthogonal matrix form an orthonormal basis.
Orthogonal matrices play a very important role in linear algebra. Inner products are preserved under an orthogonal transform: (Qx)ᵀQy = xᵀQᵀQy = xᵀy, and so is the Euclidean norm:
‖Qx‖₂ = ‖x‖₂. An example of where this is useful is solving the least squares problem
Ax ≈ b by solving the equivalent problem QᵀAx ≈ Qᵀb.
Orthogonal matrices can be thought of as the real case of unitary matrices. A unitary
matrix U ∈ C^{n×n} has the property U*U = I, where U* = Ūᵗ (the conjugate transpose).
Since Q̄ᵗ = Qᵗ for real Q, orthogonal matrices are unitary.
An orthogonal matrix Q has det(Q) = ±1.
Important orthogonal matrices are Givens rotations and Householder transformations. They
help us maintain numerical stability because they do not amplify rounding errors.
Orthogonal 2 × 2 matrices are rotations or reflections if they have the form

[ cos(φ) −sin(φ) ]      or      [ cos(φ)  sin(φ) ]
[ sin(φ)  cos(φ) ]              [ sin(φ) −cos(φ) ]

respectively.
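The defining property QᵀQ = I and the norm preservation can be checked numerically for a rotation matrix. This sketch (an illustration; the angle 0.7 is arbitrary) is not part of the original entry.

```python
# Quick check: for a 2x2 rotation matrix Q, Q^T Q = I (up to rounding)
# and the Euclidean norm of a vector is preserved.
import math

phi = 0.7
Q = [[math.cos(phi), -math.sin(phi)],
     [math.sin(phi),  math.cos(phi)]]

# (Q^T Q)_ij = sum_k Q_ki * Q_kj
QtQ = [[sum(Q[k][i] * Q[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]

x = [3.0, 4.0]
Qx = [sum(Q[i][j] * x[j] for j in range(2)) for i in range(2)]
norm = lambda v: math.hypot(v[0], v[1])
# QtQ is the identity up to rounding, and ||Qx|| == ||x|| == 5.
```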

REFERENCES
1. Friedberg, Insel, Spence. Linear Algebra. Prentice-Hall Inc., 1997.

This entry is based on content from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Version: 6 Owner: akrowne Author(s): akrowne


216.37

orthogonal vectors

Two vectors, v1 and v2, are orthogonal if and only if their inner product ⟨v1, v2⟩ is 0. In two
dimensions, orthogonal vectors are perpendicular (in n dimensions, they are perpendicular in the plane defined
by the two vectors).
A set of vectors is orthogonal when, taken pairwise, any two vectors in the set are orthogonal.
Version: 4 Owner: akrowne Author(s): akrowne

216.38

overdetermined

An overdetermined system of linear equations has more equations than unknowns. In


general, overdetermined systems have no solution. In some cases, linear least squares may
be used to find an approximate solution.
Version: 5 Owner: akrowne Author(s): akrowne

216.39

partitioned matrix

A partitioned matrix, or a block matrix, is a matrix M that has been constructed from
other smaller matrices. These smaller matrices are called blocks or sub-matrices of M.
For instance, if we partition the below 5 × 5 matrix as follows

L = [ 1 0 | 1 2 3 ]
    [ 0 1 | 1 2 3 ]
    [-----+-------]
    [ 2 3 | 9 9 9 ]
    [ 2 3 | 9 9 9 ]
    [ 2 3 | 9 9 9 ] ,

then we can define the matrices

A = [ 1 0 ] ,  B = [ 1 2 3 ] ,  C = [ 2 3 ] ,  D = [ 9 9 9 ]
    [ 0 1 ]        [ 1 2 3 ]        [ 2 3 ]        [ 9 9 9 ]
                                    [ 2 3 ]        [ 9 9 9 ] ,

and write L as

L = [ A B ]
    [ C D ] .

If A1, ..., An are square matrices (of possibly different dimensions), then we define the
direct sum of the matrices A1, ..., An as the partitioned matrix

diag(A1, ..., An) = [ A1        ]
                    [    ⋱      ]
                    [        An ] ,

where the off-diagonal blocks are zero.
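Assembling a partitioned matrix from its blocks is a matter of concatenating rows. The sketch below (the helper name block_matrix is an assumption of this illustration) rebuilds the 5 × 5 example above from A, B, C, D.

```python
# Sketch: assembling the partitioned matrix [[A, B], [C, D]] row by row.

def block_matrix(A, B, C, D):
    top = [ra + rb for ra, rb in zip(A, B)]      # rows of [A B]
    bottom = [rc + rd for rc, rd in zip(C, D)]   # rows of [C D]
    return top + bottom

A = [[1, 0], [0, 1]]
B = [[1, 2, 3], [1, 2, 3]]
C = [[2, 3], [2, 3], [2, 3]]
D = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
L = block_matrix(A, B, C, D)
print(L[0])  # [1, 0, 1, 2, 3]: the first row of the 5x5 example
```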

Version: 4 Owner: mathcam Author(s): mathcam, matte

216.40

pentadiagonal matrix

An n × n pentadiagonal matrix (with n ≥ 3) is a matrix of the form

[ c1 d1 e1                            ]
[ b1 c2 d2 e2                         ]
[ a1 b2 c3 d3 e3                      ]
[    a2  ⋱  ⋱  ⋱   ⋱                  ]
[        ⋱  ⋱  ⋱   ⋱      e_{n−2}     ]
[            ⋱  ⋱  ⋱      d_{n−1}     ]
[            a_{n−2}  b_{n−1}  cn     ]

It follows that a pentadiagonal matrix is determined by five vectors: one n-vector c =
(c1, ..., cn), two (n−1)-vectors b = (b1, ..., b_{n−1}) and d = (d1, ..., d_{n−1}), and two (n−2)-vectors
a = (a1, ..., a_{n−2}) and e = (e1, ..., e_{n−2}). Thus a pentadiagonal matrix is
completely determined by n + 2(n−1) + 2(n−2) = 5n − 6 scalars.
Version: 3 Owner: drini Author(s): matte, dandan

216.41

proof of Cayley-Hamilton theorem

We begin by showing that the theorem is true if the characteristic polynomial does not have
repeated roots, and then prove the general case.
Suppose then that the discriminant of the characteristic polynomial is non-zero, and hence
that T : V → V has n = dim V distinct eigenvalues once we extend¹ to the algebraic closure

¹ Technically, this means that we must work with the vector space V̄ = V ⊗ k̄, where k̄ is the
algebraic closure of the original field of scalars, and with T̄ : V̄ → V̄ the extended automorphism with
action

T̄(v ⊗ a) = T(v) ⊗ a,  v ∈ V, a ∈ k̄.

of the ground field. We can therefore choose a basis of eigenvectors, call them v1, ..., vn, with
λ1, ..., λn the corresponding eigenvalues. From the definition of characteristic polynomial
we have that

c_T(x) = ∏_{i=1}^n (x − λi).

The factors on the right commute, and hence

c_T(T) vi = 0

for all i = 1, ..., n. Since c_T(T) annihilates a basis, it must, in fact, be zero.
To prove the general case, let Δ(p) denote the discriminant of a polynomial p, and let us
remark that the discriminant mapping

T ↦ Δ(c_T),   T ∈ End(V)

is polynomial on End(V). Hence the set of T with distinct eigenvalues is a dense open subset
of End(V) relative to the Zariski topology. Now the characteristic polynomial map

T ↦ c_T(T),   T ∈ End(V)

is a polynomial map on the vector space End(V). Since it vanishes on a dense open subset,
it must vanish identically. Q.E.D.
Version: 3 Owner: rmilson Author(s): rmilson

216.42

proof of Schur decomposition

The columns of the unitary matrix Q in Schur's decomposition theorem form an orthonormal basis
of Cⁿ. The matrix A takes the upper-triangular form D + N on this basis. Conversely, if
v1, ..., vn is an orthonormal basis for which A is of this form, then the matrix Q with vi as
its i-th column satisfies the theorem.
To find such a basis we proceed by induction on n. For n = 1 we can simply take Q =
1. If n > 1, then let v ∈ Cⁿ be an eigenvector of A of unit length and let V = v^⊥ be
its orthogonal complement. If π denotes the orthogonal projection onto the line spanned by v, then
(1 − π)A maps V into V.
By induction there is an orthonormal basis v2, ..., vn of V for which (1 − π)A takes the desired
form on V. Now A = πA + (1 − π)A, so Avi ≡ (1 − π)Avi (mod Cv) for i ∈ {2, ..., n}.
Then v, v2, ..., vn can be used as a basis for the Schur decomposition on Cⁿ.
Version: 1 Owner: debosberg Author(s): debosberg


216.43

singular value decomposition

Any real m × n matrix A can be decomposed into

A = U S Vᵀ,

where U is an m × m orthogonal matrix, V is an n × n orthogonal matrix, and S is a
unique m × n diagonal matrix with real, non-negative elements σi, i = 1, ..., min(m, n), in
descending order:

σ1 ≥ σ2 ≥ ⋯ ≥ σ_{min(m,n)} ≥ 0.

The σi are the singular values of A, and the first min(m, n) columns of U and V are the
left and right (respectively) singular vectors of A. S has the form

[ Σ ]
[ 0 ]   if m ≥ n,   and   [ Σ 0 ]   if m < n,

where Σ is a diagonal matrix with the diagonal elements σ1, σ2, ..., σ_{min(m,n)}. We assume
now m ≥ n. If r = rank(A) < n, then

σ1 ≥ σ2 ≥ ⋯ ≥ σr > σ_{r+1} = ⋯ = σn = 0.

If r ≠ 0 and σ_{r+1} = ⋯ = σn = 0, then r is the rank of A. In this case, Σ becomes an r × r
matrix, and U and V shrink accordingly. SVD can thus be used for rank determination.
The SVD provides a numerically robust solution to the least-squares problem. The matrix-algebraic phrasing of the least-squares solution x is

x = (AᵀA)⁻¹Aᵀb.

Then, utilizing the SVD by making the replacement A = U S Vᵀ, we have

x = V [ Σ⁻¹ 0 ] Uᵀ b.

References

Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)


Version: 5 Owner: akrowne Author(s): akrowne

216.44

skew-symmetric matrix

Definition:
Let A be a square matrix of dimension n × n with real entries (a_ij). The matrix A is
skew-symmetric if a_ij = −a_ji for all 1 ≤ i ≤ n, 1 ≤ j ≤ n:

A = [ a11 = 0  ⋯    a1n   ]
    [    ⋮     ⋱     ⋮    ]
    [  −a1n    ⋯  ann = 0 ]

The main diagonal entries are zero because a_ii = −a_ii implies a_ii = 0.
One can see skew-symmetric matrices as a special case of complex skew-Hermitian matrices.
Thus, all properties of skew-Hermitian matrices also hold for skew-symmetric matrices.
Properties:
1. The matrix A is skew-symmetric if and only if Aᵗ = −A, where Aᵗ is the matrix
transpose.
2. For the trace operator, we have that tr(A) = tr(Aᵗ). Combining this with property
(1), it follows that tr(A) = 0 for a skew-symmetric matrix A.
3. Skew-symmetric matrices form a vector space: if A and B are skew-symmetric and
α, β ∈ R, then αA + βB is also skew-symmetric.
4. Suppose A is a skew-symmetric matrix and B is a matrix of the same dimension as A.
Then BᵗAB is skew-symmetric.
5. All eigenvalues of skew-symmetric matrices are purely imaginary or zero. This result
is proven on the page for skew-Hermitian matrices.
6. According to Jacobi's theorem, the determinant of a skew-symmetric matrix of odd
dimension is zero.
Examples:

[  0 b ]        [  0  b c ]
[ −b 0 ]        [ −b  0 e ]
                [ −c −e 0 ]
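Jacobi's theorem (property 6) can be verified directly for the 3 × 3 example above. This sketch (an illustration with arbitrary values for b, c, e) computes the determinant by cofactor expansion.

```python
# Sketch: for an odd-dimensional skew-symmetric matrix, det(A) = 0
# (Jacobi's theorem, property 6 above).

def det3(M):
    # Cofactor expansion along the first row.
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

b, c, e = 2, 3, 5
A = [[0,  b,  c],
     [-b, 0,  e],
     [-c, -e, 0]]
assert all(A[i][j] == -A[j][i] for i in range(3) for j in range(3))
print(det3(A))  # 0
```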
Version: 3 Owner: Daume Author(s): matte, Daume

216.45

square matrix

A square matrix has the same number of rows as columns.

Examples:

[ 1.00000 0.50000 0.33333 0.25000 ]
[ 0.50000 0.33333 0.25000 0.20000 ]
[ 0.33333 0.25000 0.20000 0.16667 ]
[ 0.25000 0.20000 0.16667 0.14286 ]

[ 89 38 50 ]
[ 64 26 98 ]
[ 40 96 83 ]

The notation Matn(K) is often used to signify the standard class of square matrices which
are of dimension n × n with elements drawn from a field K. Thus, one would use a ∈ Mat3(C)
to declare that a is a three-by-three matrix with elements that are complex numbers.
Property: Suppose A and B are matrices such that AB is a square matrix. Then the
product BA is defined and is also a square matrix.
Version: 3 Owner: akrowne Author(s): akrowne

216.46

strictly upper triangular matrix

A strictly upper triangular matrix is an upper triangular matrix which has 0 on the
main diagonal. Similarly, a strictly lower triangular matrix is a lower triangular
matrix which has 0 on the main diagonal. I.e.,
a strictly upper triangular matrix is of the form

[ 0 a12 a13 ⋯ a1n ]
[ 0  0  a23 ⋯ a2n ]
[ 0  0   0  ⋯ a3n ]
[ ⋮  ⋮   ⋮  ⋱  ⋮  ]
[ 0  0   0  ⋯  0  ]

and a strictly lower triangular matrix is of the form

[  0   0   0  ⋯ 0 ]
[ a21  0   0  ⋯ 0 ]
[ a31 a32  0  ⋯ 0 ]
[  ⋮   ⋮   ⋮  ⋱ ⋮ ]
[ an1 an2 an3 ⋯ 0 ]
Version: 4 Owner: Daume Author(s): Daume

216.47

symmetric matrix

Definition:
Let A be a square matrix of dimension n. The matrix A is symmetric if a_ij = a_ji for all
1 ≤ i ≤ n, 1 ≤ j ≤ n:

A = [ a11 ⋯ a1n ]
    [  ⋮  ⋱  ⋮  ]
    [ an1 ⋯ ann ]

Properties:
1. Aᵗ = A, where Aᵗ is the matrix transpose.

Examples:

[ a b c ]
[ b d e ]
[ c e f ]
Version: 3 Owner: Daume Author(s): Daume

216.48

theorem for normal triangular matrices

Theorem ([1], pp. 82) Let A be a complex square matrix. Then A is diagonal if and only
if A is normal and triangular.

Proof. If A is a diagonal matrix, then the conjugate transpose A* is also a diagonal matrix.
Since arbitrary diagonal matrices commute, it follows that A*A = AA*. Thus any diagonal
matrix is a normal triangular matrix.
Next, suppose A = (a_ij) is a normal upper triangular matrix. Thus a_ij = 0 for i > j, so for
the diagonal elements in A*A and AA*, we obtain

(A*A)_ii = Σ_{k=1}^i |a_ki|²,
(AA*)_ii = Σ_{k=i}^n |a_ik|².

For i = 1, we have

|a11|² = |a11|² + |a12|² + ⋯ + |a1n|².

It follows that the only non-zero entry on the first row of A is a11. Similarly, for i = 2, we
obtain

|a12|² + |a22|² = |a22|² + ⋯ + |a2n|².

Since a12 = 0, it follows that the only non-zero element on the second row is a22. Repeating
this argument for all rows, we see that A is a diagonal matrix. Thus any normal upper
triangular matrix is a diagonal matrix.
Suppose then that A is a normal lower triangular matrix. Then A* is a normal upper
triangular matrix. Thus, by the above, A* is a diagonal matrix, whence
A is also a diagonal matrix. □

REFERENCES
1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical Society, 1994.

Version: 3 Owner: bwebste Author(s): bwebste, matte

216.49

triangular matrix

An upper triangular matrix is of the form

[ a11 a12 a13 ⋯ a1n ]
[  0  a22 a23 ⋯ a2n ]
[  0   0  a33 ⋯ a3n ]
[  ⋮   ⋮   ⋮  ⋱  ⋮  ]
[  0   0   0  ⋯ ann ]

A lower triangular matrix is of the form

[ a11  0   0  ⋯  0  ]
[ a21 a22  0  ⋯  0  ]
[ a31 a32 a33 ⋯  0  ]
[  ⋮   ⋮   ⋮  ⋱  ⋮  ]
[ an1 an2 an3 ⋯ ann ]

Triangular matrices allow numerous algorithmic shortcuts in many situations. For example,
Ax = b can be solved in n² operations if A is an n × n triangular matrix.
Triangular matrices have the following properties (prefix "triangular" with either "upper"
or "lower" uniformly):
- The inverse of a triangular matrix is a triangular matrix.
- The product of two triangular matrices is a triangular matrix.
- The determinant of a triangular matrix is the product of the diagonal elements.
- The eigenvalues of a triangular matrix are the diagonal elements.
The last two properties follow easily from the cofactor expansion of the triangular matrix.
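The n² claim above comes from back substitution: each unknown is recovered with one short inner product. A minimal sketch (the helper name back_substitute and the sample system are assumptions of this illustration):

```python
# Sketch: solving Ux = b for upper triangular U by back substitution,
# using O(n^2) operations in total.

def back_substitute(U, b):
    n = len(U)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Subtract the already-known unknowns, then divide by the diagonal.
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

U = [[2.0, 1.0, 1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
b = [9.0, 13.0, 8.0]
print(back_substitute(U, b))  # [2.0, 3.0, 2.0]
```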
Version: 2 Owner: akrowne Author(s): akrowne

216.50

tridiagonal matrix

An n × n tridiagonal matrix is of the form

[ d1 u1                          ]
[ l1 d2 u2                       ]
[    l2 d3 u3                    ]
[       ⋱  ⋱  ⋱                  ]
[         l_{n−2} d_{n−1} u_{n−1} ]
[                 l_{n−1}   dn    ]
Version: 1 Owner: akrowne Author(s): akrowne

216.51

under determined

An under determined system of linear equations has more unknowns than equations. It
can be consistent with infinitely many solutions, or it can have no solution.
Version: 4 Owner: akrowne Author(s): akrowne

216.52

unit triangular matrix

A unit triangular matrix is a triangular matrix with 1 on the diagonal. I.e.,
a unit upper triangular matrix is of the form

[ 1 a12 a13 ⋯ a1n ]
[ 0  1  a23 ⋯ a2n ]
[ 0  0   1  ⋯ a3n ]
[ ⋮  ⋮   ⋮  ⋱  ⋮  ]
[ 0  0   0  ⋯  1  ]

and a unit lower triangular matrix is of the form

[  1   0   0  ⋯ 0 ]
[ a21  1   0  ⋯ 0 ]
[ a31 a32  1  ⋯ 0 ]
[  ⋮   ⋮   ⋮  ⋱ ⋮ ]
[ an1 an2 an3 ⋯ 1 ]

Version: 1 Owner: Daume Author(s): Daume

216.53

unitary

Definitions. A unitary space V is a complex vector space with a distinguished positive definite
Hermitian form,

⟨·, ·⟩ : V × V → C,

which serves as the inner product on V.
A unitary transformation is a surjective linear transformation T : V → V satisfying

⟨u, v⟩ = ⟨Tu, Tv⟩,   u, v ∈ V.    (216.53.1)

These are the analogues, for a unitary space, of the isometries of Euclidean space.

A unitary matrix is a square complex-valued matrix, A, whose inverse is equal to its
conjugate transpose:

A⁻¹ = Āᵗ.

Remarks.
1. A standard example of a unitary space is Cⁿ with inner product

⟨u, v⟩ = Σ_{i=1}^n ui v̄i,   u, v ∈ Cⁿ.    (216.53.2)
2. Unitary transformations and unitary matrices are closely related. On the one hand,
a unitary matrix defines a unitary transformation of Cn relative to the inner product
(216.53.2). On the other hand, the representing matrix of a unitary transformation
relative to an orthonormal basis is, in fact, a unitary matrix.
3. A unitary transformation is an automorphism. This follows from the fact that a unitary
transformation T preserves the inner-product norm:

‖Tu‖ = ‖u‖,   u ∈ V.    (216.53.3)

Hence, if

Tu = 0,

then by the definition (216.53.1) it follows that

‖u‖ = 0,

and hence by the inner-product axioms that

u = 0.

Thus, the kernel of T is trivial, and therefore it is an automorphism.

4. Indeed, relation (216.53.3) can be taken as the definition of a unitary transformation.
This follows from the polarization identity for sesquilinear forms, namely

2⟨u, v⟩ = ‖u + v‖² + i‖u + iv‖² − (1 + i)‖u‖² − (1 + i)‖v‖².

The polarization identity is obtained by taking a linear combination of the following
two bilinearity relations:

⟨u + v, u + v⟩ = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩,
⟨u + iv, u + iv⟩ = ⟨u, u⟩ − i⟨u, v⟩ + i⟨v, u⟩ + ⟨v, v⟩.

Thanks to the polarization identity, it is possible to show that if T preserves the norm,
then (216.53.1) must hold as well.
5. A simple example of a unitary matrix is the change of coordinates matrix between two
orthonormal bases. Indeed, let u1, ..., un and v1, ..., vn be two orthonormal bases, and
let A = (A_ij) be the corresponding change of basis matrix defined by

vj = Σi A_ij ui,   j = 1, ..., n.

Substituting the above relation into the defining relations for an orthonormal basis,

⟨ui, uj⟩ = δ_ij,   ⟨vk, vl⟩ = δ_kl,

we obtain

Σ_{ij} δ_ij A_ik Ā_jl = Σi A_ik Ā_il = δ_kl.

In matrix notation, the above is simply

Āᵗ A = I,

as desired.
6. Unitary spaces, transformations, and matrices are of fundamental importance in quantum mechanics.
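The remarks above can be illustrated numerically. The following sketch (not part of the original entry; NumPy, with an arbitrary seed and 3×3 size) relies only on the standard fact that the Q factor of a QR factorization is unitary:

```python
import numpy as np

rng = np.random.default_rng(0)

# The Q factor of a QR factorization of a generic complex matrix is unitary.
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(Z)

# Defining property: the inverse equals the conjugate transpose.
assert np.allclose(np.linalg.inv(Q), Q.conj().T)

# Remark 3: a unitary transformation preserves the inner-product norm.
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
assert np.isclose(np.linalg.norm(Q @ u), np.linalg.norm(u))
```

Both assertions hold up to floating-point tolerance, matching the defining property and Remark 3.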
Version: 11 Owner: drini Author(s): rmilson, drini, Daume

216.54

vector space

Let $F$ be a field. A vector space $V$ over $F$ is a set with two binary operations,
$+ : V \times V \to V$ and $\cdot : F \times V \to V$, such that
1. $(a + b) + c = a + (b + c)$ for all $a, b, c \in V$
2. $a + b = b + a$ for all $a, b \in V$
3. There exists an element $0 \in V$ such that $a + 0 = a$ for all $a \in V$
4. For any $a \in V$, there exists an element $b \in V$ such that $a + b = 0$
5. $k_1 \cdot (k_2 \cdot v) = (k_1 k_2) \cdot v$ for all $k_1, k_2 \in F$ and $v \in V$
6. $1 \cdot v = v$ for all $v \in V$
7. $k \cdot (v + w) = (k \cdot v) + (k \cdot w)$ for all $k \in F$ and $v, w \in V$
8. $(k_1 + k_2) \cdot v = (k_1 \cdot v) + (k_2 \cdot v)$ for all $k_1, k_2 \in F$ and $v \in V$
Equivalently, a vector space is a module V over a field F .
The elements of V are called vectors, and the element $0 \in V$ is called the zero vector of V .
Version: 9 Owner: djao Author(s): djao

216.55

vector subspace

Definition Let V be a vector space over a field F , and let W be a nonempty subset of V .
If W is itself a vector space, then W is a vector subspace of V .
If W is a subset of a vector space V , then a sufficient condition for W to be a subspace is
that $\lambda a + \mu b \in W$ for all $a, b \in W$ and all $\lambda, \mu \in F$.
Examples
1. Every vector space contains two trivial vector subspaces: the entire vector space, and
the zero vector space.
2. If S and T are vector subspaces of a vector space V , then the vector sum
$$S + T = \{s + t \in V \mid s \in S,\ t \in T\}$$
and the intersection
$$S \cap T = \{u \in V \mid u \in S,\ u \in T\}$$
are vector subspaces of V .
3. Suppose S and T are vector spaces, and suppose L is a linear mapping $L : S \to T$.
Then Img L is a vector subspace of T , and Ker L is a vector subspace of S.

Results for vector subspaces


Theorem 1 [1] Let V be a finite dimensional vector space. If W is a vector subspace of V
and $\dim W = \dim V$, then $W = V$.
Theorem 2 [2] (Dimension theorem for subspaces) Let V be a finite dimensional vector
space with subspaces S and T . Then
$$\dim(S + T) + \dim(S \cap T) = \dim S + \dim T.$$
Theorem 3 ([1], page 42) (Dimension theorem for a composite mapping) Suppose U, V, W
are finite dimensional vector spaces, and $L : U \to V$ and $M : V \to W$ are linear mappings.
Then
$$\dim(\operatorname{Img} L \cap \operatorname{Ker} M) = \dim \operatorname{Img} L - \dim \operatorname{Img} ML = \dim \operatorname{Ker} ML - \dim \operatorname{Ker} L.$$
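Theorem 2 can be checked numerically. In the following sketch (not from the original entry; the matrices A and B, the seed, and the tolerance are illustrative assumptions), each of the four dimensions is computed independently: three via matrix ranks, and the intersection via the null space of the block matrix [A | −B], whose null vectors (x, y) satisfy Ax = By and hence parametrize S ∩ T:

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.standard_normal((6, 3))   # columns span the subspace S of R^6
B = rng.standard_normal((6, 4))   # columns span the subspace T of R^6

dim_S = np.linalg.matrix_rank(A)
dim_T = np.linalg.matrix_rank(B)
dim_sum = np.linalg.matrix_rank(np.hstack([A, B]))    # dim(S + T)

# dim(S ∩ T): solutions of A x = B y form the null space of [A | -B];
# for generic full-column-rank A and B, (x, y) -> A x is a bijection
# from that null space onto S ∩ T.
_, s, _ = np.linalg.svd(np.hstack([A, -B]))
dim_cap = A.shape[1] + B.shape[1] - int(np.sum(s > 1e-10))

# Theorem 2: dim(S + T) + dim(S ∩ T) = dim S + dim T
assert dim_sum + dim_cap == dim_S + dim_T
```

With these generic random subspaces one gets dim S = 3, dim T = 4, dim(S + T) = 6 and dim(S ∩ T) = 1, confirming the identity.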

REFERENCES
1. S. Lang, Linear Algebra, Addison-Wesley, 1966.
2. W.E. Deskins, Abstract Algebra, Dover publications, 1995.
3. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical Society, 1994.

Version: 7 Owner: matte Author(s): matte, [a] p

216.56

zero map

Definition Suppose X is a set, and Y is a vector space with zero vector 0. If T is a map
$T : X \to Y$ such that $T(x) = 0$ for all x in X, then T is a zero map.
Examples
1. On the set of non-invertible $n \times n$ matrices, the determinant is a zero map.
2. If X is the zero vector space, any linear map $T : X \to Y$ is a zero map. In fact,
$T(0) = T(0 \cdot 0) = 0 \cdot T(0) = 0$.
Version: 1 Owner: matte Author(s): matte

216.57

zero vector in a vector space is unique

Theorem The zero vector in a vector space is unique.

Proof. Suppose $0$ and $0'$ are zero vectors in a vector space V . Then both $0$ and $0'$ must
satisfy axiom 3, i.e., for all $v \in V$,
$$v + 0 = v, \qquad v + 0' = v.$$
Setting $v = 0'$ in the first equation, and $v = 0$ in the second yields $0' + 0 = 0'$ and
$0 + 0' = 0$. Thus, using axiom 2,
$$0 = 0 + 0' = 0' + 0 = 0',$$
and $0 = 0'$. $\Box$
Version: 3 Owner: matte Author(s): matte

216.58

zero vector space

Definition A zero vector space is a vector space that contains only one element, a
zero vector.

Properties
1. Every vector space has a zero vector space as a vector subspace.
2. A vector space X is a zero vector space if and only if the dimension of X is zero.
3. Any linear map defined on a zero vector space is the zero map. If T is linear on $\{0\}$,
then $T(0) = T(0 \cdot 0) = 0 \cdot T(0) = 0$.
Version: 2 Owner: drini Author(s): matte


Chapter 217
15-01 Instructional exposition
(textbooks, tutorial papers, etc.)
217.1

circulant matrix

A square matrix $M : A \times A \to \mathbb{C}$ is said to be circulant if for some cyclic permutation $\sigma$ of
A, we have
$$M(\sigma(x), \sigma(y)) = M(x, y), \qquad x, y \in A,$$
or equivalently
$$M(\sigma(x), y) = M(x, \sigma^{-1}(y)), \qquad x, y \in A.$$
The same term is in use in a more restrictive sense, when the indexing set A is $\{1, 2, \ldots, d\}$.
A matrix of the form
$$\begin{pmatrix}
M_1 & M_2 & M_3 & \cdots & M_d \\
M_d & M_1 & M_2 & \cdots & M_{d-1} \\
M_{d-1} & M_d & M_1 & \cdots & M_{d-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
M_2 & M_3 & M_4 & \cdots & M_1
\end{pmatrix}$$
is called circulant. This concurs with the first definition, since we can define the permutation
$\sigma : A \to A$ by
$$\sigma(x) = x + 1 \quad (1 \le x < d), \qquad \sigma(d) = 1.$$
Because the Jordan decomposition of a circulant matrix is rather simple, circulant matrices
have some interest in connection with the approximation of eigenvalues of more general
matrices. In particular, they have become part of the standard apparatus in the computerized
analysis of signals and images.
Version: 3 Owner: bwebste Author(s): Larry Hammick

217.2

matrix

A matrix is simply a mapping $M : A \times B \to C$ of the product of two sets into some third
set. As a rule, though, the word matrix and the notation associated with it are used only in
connection with linear mappings. In such cases C is the ring or field of scalars.
Matrix of a linear mapping
Definition: Let V and W be finite-dimensional vector spaces over the same field k, with
bases A and B respectively, and let $f : V \to W$ be a linear mapping. For each $a \in A$ let
$(M_{ab})_{b \in B}$ be the unique family of scalars (elements of k) such that
$$f(a) = \sum_{b \in B} M_{ab}\, b.$$
Then the family $(M_{ab})$ (or equivalently the mapping $(a, b) \mapsto M_{ab}$ of $A \times B \to k$) is called
the matrix of f with respect to the given bases A and B. The scalars $M_{ab}$ are called the
components of the matrix.
The matrix describes the function f completely; for any element
$$x = \sum_{a \in A} x_a\, a$$
of V , we have
$$f(x) = \sum_{b \in B} \Big( \sum_{a \in A} x_a M_{ab} \Big)\, b,$$
as is readily verified.

Any two linear mappings $V \to W$ have a sum, defined pointwise; it is easy to verify that the
matrix of the sum is the sum, componentwise, of the two given matrices.
The formalism of matrices extends somewhat to linear mappings between modules, i.e.
extends to a ring k, not necessarily commutative, rather than just a field.
Rows and columns; product of two matrices
Suppose we are given three modules V, W, X, with bases A, B, C respectively, and two linear
mappings $f : V \to W$ and $g : W \to X$. f and g have some matrices $(M_{ab})$ and $(N_{bc})$ with
respect to those bases. The product matrix MN is defined as the matrix $(P_{ac})$ of the
function
$$x \mapsto g(f(x)), \qquad V \to X,$$
with respect to the bases A and C. Straight from the definitions of a linear mapping and a
basis, one verifies that
$$P_{ac} = \sum_{b \in B} M_{ab} N_{bc} \tag{217.2.1}$$
for all $a \in A$ and $c \in C$.
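Equation (217.2.1) says that composing linear maps corresponds to multiplying their matrices. A NumPy sketch (not from the original entry; the dimensions 2, 3, 2 match the illustration in this entry, and with this entry's row-vector convention a vector acts as x ↦ x M):

```python
import numpy as np

rng = np.random.default_rng(2)

M = rng.standard_normal((2, 3))   # matrix of f : V -> W  (dim V = 2, dim W = 3)
N = rng.standard_normal((3, 2))   # matrix of g : W -> X  (dim X = 2)
P = M @ N                         # P_ac = sum_b M_ab N_bc, as in (217.2.1)

x = rng.standard_normal(2)        # row vector of components of x in V
# Applying f then g agrees with applying the product matrix directly.
assert np.allclose((x @ M) @ N, x @ P)
```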


To illustrate the notation of matrices in terms of rows and columns, suppose the spaces
V, W, X have dimensions 2, 3, and 2 respectively, and bases
$$A = \{a_1, a_2\}, \qquad B = \{b_1, b_2, b_3\}, \qquad C = \{c_1, c_2\}.$$
We write
$$\begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \end{pmatrix}
\begin{pmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \\ N_{31} & N_{32} \end{pmatrix}
= \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}.$$
(Notice that we have taken a liberty with the notation, by writing e.g. $M_{12}$ instead of
$M_{a_1 b_2}$.) The equation (217.2.1) shows that the multiplication of two matrices proceeds
rows by columns. Also, in an expression such as $N_{23}$, the first index refers to the row, and
the second to the column, in which that component appears.
Similar notation can describe the calculation of $f(x)$ whenever f is a linear mapping. For
example, if $f : V \to W$ is linear, and $x = \sum_i x_i a_i$ and $f(x) = \sum_i y_i b_i$, we write
$$\begin{pmatrix} x_1 & x_2 \end{pmatrix}
\begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \end{pmatrix}
= \begin{pmatrix} y_1 & y_2 & y_3 \end{pmatrix}.$$
When, as above, a row vector denotes an element of a space, a column vector denotes
an element of the dual space. If, say, $f^t : W^* \to V^*$ is the transpose of f , then, with
respect to the bases dual to A and B, an equation $f^t\big(\sum_j \mu_j \beta_j\big) = \sum_i \lambda_i \alpha_i$ may be written
$$\begin{pmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \end{pmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}
= \begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix}.$$
One more illustration: Given a bilinear form $L : V \times W \to k$, we can denote $L(v, w)$ by
$$\begin{pmatrix} v_1 & v_2 \end{pmatrix}
\begin{pmatrix} L_{11} & L_{12} & L_{13} \\ L_{21} & L_{22} & L_{23} \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix}.$$

square matrix
A matrix $M : A \times B \to C$ is called square if $A = B$, or if some bijection $A \to B$ is implicit
in the context. (It is not enough for A and B to be equipotent.) Square matrices naturally
arise in connection with a linear mapping of a space into itself (called an endomorphism),
and in the related case of a change of basis (from one basis of some space, to another basis
of the same space).
Miscellaneous usages of matrix
The word matrix has come into use in some areas where linear mappings are not at issue.
An example would be a combinatorial statement, such as Hall's marriage theorem, phrased
in terms of 0-1 matrices instead of subsets of $A \times B$.

Remark
Matrices are heavily used in the physical sciences, engineering, statistics, and computer programming. But for purely mathematical purposes, they are less important than one might expect, and indeed are frequently irrelevant in linear algebra. Linear mappings, determinants,
traces, transposes, and a number of other simple notions can and should be defined without
matrices, simply because they have a meaning independent of any basis or bases. Many little
theorems in linear algebra can be proved in a simpler and more enlightening way without
matrices than with them. One more illustration: The derivative (at a point) of a mapping
from one surface to another is a linear mapping; it is not a matrix of partial derivatives,
because the matrix depends on a choice of basis but the derivative does not.
Version: 13 Owner: bbukh Author(s): bbukh, Larry Hammick, Manoj, djao


Chapter 218
15-XX Linear and multilinear
algebra; matrix theory
218.1

linearly dependent functions

Let $f_1, f_2, f_3, \ldots, f_n$ be real valued functions defined on some $I \subseteq \mathbb{R}$. Then $f_1, f_2, f_3, \ldots, f_n$ are
said to be linearly dependent if for some $a_1, a_2, a_3, \ldots, a_n \in \mathbb{R}$, not all zero, we have that:
$$\sum_{i=1}^n a_i f_i(x) = 0, \qquad \forall x \in I.$$
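The definition can be probed numerically by sampling the functions on a grid. The following sketch (not from the original entry; the interval, grid, and choice of functions are illustrative) uses the identity sin² + cos² = 1 as the dependence relation:

```python
import numpy as np

# Sample sin^2, cos^2 and the constant function 1 on a grid in I = [0, 1].
x = np.linspace(0.0, 1.0, 50)
F = np.column_stack([np.sin(x)**2, np.cos(x)**2, np.ones_like(x)])

# Dependence relation: 1*sin^2(x) + 1*cos^2(x) + (-1)*1 = 0 for every x.
a = np.array([1.0, 1.0, -1.0])
assert np.allclose(F @ a, 0.0)

# Consequently the sampled columns are rank-deficient.
assert np.linalg.matrix_rank(F) == 2
```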

Version: 3 Owner: vladm Author(s): vladm


Chapter 219
15A03 Vector spaces, linear
dependence, rank
219.1

Sylvesters law

The rank and signature of a quadratic form are invariant under change of basis.
In matrix terms, if A is a real symmetric matrix, there is an invertible matrix S such that
$$S A S^T = \begin{pmatrix} I_s & 0 & 0 \\ 0 & -I_{r-s} & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
where r, the rank of A, and s, the signature of A, characterise the congruence class.
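Sylvester's law can be checked numerically: congruent symmetric matrices have the same counts of positive and negative eigenvalues. A NumPy sketch (not from the original entry; the seed, the 4×4 size, and the tolerance are illustrative assumptions, and the random S is invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(3)

A = rng.standard_normal((4, 4))
A = A + A.T                        # real symmetric
S = rng.standard_normal((4, 4))    # generically invertible
B = S @ A @ S.T                    # congruent to A

def inertia(M, tol=1e-9):
    """Counts of (positive, negative) eigenvalues of a symmetric matrix."""
    w = np.linalg.eigvalsh(M)
    return (int(np.sum(w > tol)), int(np.sum(w < -tol)))

# Congruence preserves the inertia, hence the rank and signature.
assert inertia(A) == inertia(B)
```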
Version: 2 Owner: vypertd Author(s): vypertd

219.2

basis

A (Hamel) basis of a vector space is a linearly independent spanning set.


It can be proved that any two bases of the same vector space must have the same cardinality.
This introduces the notion of dimension of a vector space, which is precisely the cardinality
of the basis, and is denoted by dim(V ), where V is the vector space.
The fact that every vector space has a Hamel basis is an important consequence of the axiom of choice
(in fact, that proposition is equivalent to the axiom of choice.)


Examples.
• $\epsilon = \{e_i\}$, $1 \le i \le n$, is a basis for $\mathbb{R}^n$ (the $n$-dimensional vector space over the reals).
For $n = 4$,
$$\epsilon = \left\{
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\right\}.$$
• $\{1, x, x^2\}$ is a basis for the vector space of polynomials of degree $\le 2$.
• The set
$$\left\{
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
\right\}$$
is a basis for the vector space of $2 \times 2$ matrices, and so is
$$\left\{
\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 \\ \frac{1}{2} & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix}
\right\}.$$
• The empty set is a basis for the trivial vector space which consists of the unique element
0.
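Whether a finite list of vectors is a basis can be tested by checking that it is linearly independent and of the right cardinality. A NumPy sketch (not from the original entry) for the 2×2-matrix example, flattening each matrix to a vector in $\mathbb{R}^4$:

```python
import numpy as np

# The four matrices E11, E12, E21, E22, flattened to vectors in R^4.
mats = [np.array([[1, 0], [0, 0]]), np.array([[0, 1], [0, 0]]),
        np.array([[0, 0], [1, 0]]), np.array([[0, 0], [0, 1]])]
V = np.column_stack([m.ravel() for m in mats])

# Four linearly independent vectors in a 4-dimensional space form a basis.
assert np.linalg.matrix_rank(V) == 4
```

The same rank test confirms that nonzero scalar multiples of these matrices (as in the second set above) are again a basis.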
Version: 11 Owner: Koro Author(s): Koro, akrowne

219.3

complementary subspace

Direct sum decomposition. Let U be a vector space, and $V, W \subseteq U$ subspaces. We say
that V and W span U , and write
$$U = V + W,$$
if every $u \in U$ can be expressed as a sum
$$u = v + w$$
for some $v \in V$ and $w \in W$.
If in addition, such a decomposition is unique for all $u \in U$, or equivalently if
$$V \cap W = \{0\},$$
then we say that V and W form a direct sum decomposition of U and write
$$U = V \oplus W.$$
962

In such circumstances, we also say that V and W are complementary subspaces.
Here is a useful characterization of complementary subspaces if U is finite-dimensional.
Proposition 8. Let U, V, W be as above, and suppose that U is finite-dimensional. The
subspaces V and W are complementary if and only if for every basis $v_1, \ldots, v_m$ of V and
every basis $w_1, \ldots, w_n$ of W , the combined list
$$v_1, \ldots, v_m, w_1, \ldots, w_n$$
is a basis of U .
Let us also remark that direct sum decompositions of a vector space U are in one-to-one
correspondence with projections on U .
Orthogonal decomposition. Specializing somewhat, suppose that the ground field K
is either the real or complex numbers, and that U is either an inner product space or a
unitary space, i.e. U comes equipped with a positive-definite inner product
$$\langle \cdot, \cdot \rangle : U \times U \to K.$$
In such circumstances, for every subspace $V \subseteq U$ we define the orthogonal complement of
V , denoted by $V^\perp$, to be the subspace
$$V^\perp = \{u \in U : \langle v, u \rangle = 0 \text{ for all } v \in V\}.$$
Proposition 9. Suppose that U is finite-dimensional and $V \subseteq U$ a subspace. Then, V and
its orthogonal complement $V^\perp$ determine a direct sum decomposition of U .
Note: the proposition is false if either the finite-dimensionality or the positive-definiteness
assumptions are violated.
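Proposition 9 can be illustrated numerically: the left singular vectors of a matrix split $\mathbb{R}^n$ into the column space and its orthogonal complement. A NumPy sketch (not from the original entry; the 5×2 size, seed, and tolerance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2))      # columns span a subspace V of R^5

U, s, _ = np.linalg.svd(A)           # full SVD: U is 5x5 orthogonal
r = int(np.sum(s > 1e-10))           # dim V (generically 2)
Vc = U[:, :r]                        # orthonormal basis of V
Wc = U[:, r:]                        # orthonormal basis of V-perp

# Every v in V is orthogonal to every w in the complement...
assert np.allclose(Vc.T @ Wc, 0)
# ...and the combined columns form a basis of R^5 (Proposition 8).
assert np.linalg.matrix_rank(np.column_stack([Vc, Wc])) == 5
```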
Version: 6 Owner: rmilson Author(s): rmilson

219.4

dimension

Let V be a vector space over a field K. We say that V is finite-dimensional if there exists a
finite basis of V . Otherwise we call V infinite-dimensional.
It can be shown that every basis of V has the same cardinality. We call this cardinality the
dimension of V . In particular, if V is finite-dimensional, then every basis of V will consist
of a finite set v1 , . . . , vn . We then call the natural number n the dimension of V .
Next, let $U \subseteq V$ be a subspace. The dimension of the quotient vector space V /U is called the
codimension of U relative to V .

Note: in circumstances where the choice of field is ambiguous, the dimension of a vector
space depends on the choice of field. For example, every complex vector space is also a real
vector space, and therefore has a real dimension, double its complex dimension.
Version: 4 Owner: rmilson Author(s): rmilson

219.5

every vector space has a basis

Every vector space has a basis. This result, trivial in the finite case, is in fact rather
surprising when one thinks of infinite-dimensional vector spaces, and the definition of a basis.
The theorem is equivalent to the axiom of choice. Here we will only prove that Zorn's
lemma implies that every vector space has a basis.
Let X be any vector space, and let A be the set of linearly independent subsets of X. For
$x, y \in A$, we define $x \le y$ iff $x \subseteq y$. It is easy to see that this (the canonical order relation
on subsets) defines a partial order on A. For each chain $C \subseteq A$, define $C' = \bigcup C$. Now any
finite collection of vectors from $C'$ must lie in a single set $c \in C$, and as such they are
linearly independent. This shows that $C' \in A$ and thus $C'$ is an upper bound for C.
According to Zorn's lemma, A now has a maximal element, $\mathcal{B}$, which by definition is linearly
independent. But $\mathcal{B}$ is also a spanning set, for if this were not true let $z \in X$ be any vector
not in the span of $\mathcal{B}$. Then $\mathcal{B} \cup \{z\}$ would again be a linearly independent*) set larger than
$\mathcal{B}$, contradicting that $\mathcal{B}$ is a maximal element. Thus $\mathcal{B}$ is a basis for X.

*) If not, let $(x_i)$ be a finite collection of vectors from $\mathcal{B}$ so that $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + a_z z = 0$,
and not all $a_i$ are 0. We must then have $a_z \ne 0$, because if not we would have a non-trivial
linear combination of vectors from $\mathcal{B}$ equalling the zero vector, contrary to $\mathcal{B}$ being a linearly
independent set. But then
$$z = -\frac{a_1}{a_z} x_1 - \frac{a_2}{a_z} x_2 - \cdots - \frac{a_n}{a_z} x_n,$$
contrary to z not being in the span of $\mathcal{B}$.


Version: 4 Owner: cryo Author(s): cryo

219.6

flag

Let V be a finite-dimensional vector space. A filtration of subspaces
$$V_1 \subset V_2 \subset \cdots \subset V_n = V$$
is called a flag in V . We speak of a complete flag when
$$\dim V_i = i$$
for each $i = 1, \ldots, n$.
Next, putting
$$d_k = \dim V_k, \qquad k = 1, \ldots, n,$$
we say that a list of vectors $(u_1, \ldots, u_{d_n})$ is an adapted basis relative to the flag, if the first
$d_1$ vectors give a basis of $V_1$, the first $d_2$ vectors give a basis of $V_2$, etc. Thus, an alternate
characterization of a complete flag, is that the first k elements of an adapted basis are a
basis of $V_k$.
Example. Let us consider $\mathbb{R}^n$. For each $k = 1, \ldots, n$ let $V_k$ be the span of $e_1, \ldots, e_k$, where
$e_j$ denotes the $j$th basic vector, i.e. the column vector with 1 in the $j$th position and zeros
everywhere else. The $V_k$ give a complete flag in $\mathbb{R}^n$. The list $(e_1, e_2, \ldots, e_n)$ is an adapted
basis relative to this flag, but the list $(e_2, e_1, \ldots, e_n)$ is not.
Version: 2 Owner: rmilson Author(s): rmilson

219.7

frame

Introduction Frames and coframes are notions closely related to the notions of basis and
dual basis. As such, frames and coframes are needed to describe the connection between
list vectors and the more general abstract vectors.
Frames and bases. Let U be a finite-dimensional vector space over a field K, and let I be
a finite, totally ordered set of indices¹, e.g. $(1, 2, \ldots, n)$. We will call a mapping $F : I \to U$
a reference frame, or simply a frame. To put it plainly, F is just a list of elements of U with
indices belonging to I. We will adopt a notation to reflect this and write $F_i$ instead of $F(i)$.
Subscripts are used when writing the frame elements because it is best to regard a frame as
a row-vector² whose entries happen to be elements of U , and write
$$F = (F_1, \ldots, F_n).$$
This is appropriate because every reference frame F naturally corresponds to a linear mapping
$\hat{F} : K^I \to U$ defined by
$$\hat{F} : a \mapsto \sum_{i \in I} a^i F_i, \qquad a \in K^I.$$

¹It is advantageous to allow general indexing sets, because one can indicate the use of multiple frames
of reference by employing multiple, disjoint sets of indices.
²It is customary to use superscripts for the components of a column vector, and subscripts for the
components of a row vector. This is fully described in the vector entry.


In other words, $\hat{F}$ is a linear form on $K^I$ that takes values in U instead of K. We use row
vectors to represent linear forms, and that's why we write the frame as a row vector.
We call F a coordinate frame (equivalently, a basis), if $\hat{F}$ is an isomorphism of vector
spaces. Otherwise we call F degenerate, incomplete, or both, depending on whether $\hat{F}$
fails to be, respectively, injective and surjective.
Coframes and coordinates. In cases where F is a basis, the inverse isomorphism
$$\hat{F}^{-1} : U \to K^I$$
is called the coordinate mapping. It is cumbersome to work with this inverse explicitly, and
instead we introduce linear forms $x^i \in U^*$, $i \in I$, defined by
$$x^i : u \mapsto \hat{F}^{-1}(u)(i), \qquad u \in U.$$
Each $x^i$, $i \in I$, is called the $i$th coordinate function relative to F, or simply the $i$th coordinate³.
In this way we obtain a mapping
$$x : I \to U^*, \qquad i \mapsto x^i,$$
called the coordinate coframe or simply a coframe. The forms $x^i$, $i \in I$, give a basis of $U^*$. It
is the dual basis of $F_i$, $i \in I$, i.e.
$$x^i(F_j) = \delta^i_j, \qquad i, j \in I,$$
where $\delta^i_j$ is the well-known Kronecker symbol.


In full duality to the custom of writing frames as row-vectors, we write the coframe as a
column vector whose components are the coordinate functions:
$$\begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix}.$$

³Strictly speaking, we should denote the coframe by $x_F$ and the coordinate functions by $x^i_F$ so as
to reflect their dependence on the choice of reference frame. Historically, writers have been loath to do this,
preferring a couple of different notational tricks to avoid ambiguity. The cleanest approach is to use different
symbols, e.g. xi versus yj , to distinguish coordinates coming from different frames. Another approach is to
use distinct indexing sets; in this way the indices themselves will indicate the choice of frame. Say we have
two frames F : I U and G : J U with I and J distinct finite sets. We stipulate that the symbol i
refers to elements of I and that j refers to elements of J, and write xi for coordinates relative to F and xj
for coordinates relative to G. Thats the way it was done in all the old-time geometry and physics papers,
and is still largely the way physicists go about writing coordinates. Be that as it may, the notation has its
problem and is the subject of long-standing controversy, named by mathematicians the debauche of indices.
The problem is that the notation employs the same symbol, namely x, to refer to two different objects,
namely a map with domain I and another map with domain J. In practice, ambiguity is avoided because
the old-time notation never refers to the coframe (or indeed any tensor) without also writing the indices.
This is the classical way of the dummy variable, a cousin to the f (x) notation. It creates some confusion for
beginners, but with a little practice it's a perfectly serviceable and useful way to communicate.


We identify $\hat{F}^{-1}$ and x with the above column-vector. This is quite natural because all of
these objects are in natural correspondence with a K-valued function of two arguments,
$$U \times I \to K,$$
that maps an abstract vector $u \in U$ and an index $i \in I$ to a scalar $x^i(u)$, called the $i$th
component of u relative to the reference frame F.
Change of frame. Given two coordinate frames $F : I \to U$ and $G : J \to U$, one can
easily show that I and J must have the same cardinality. Letting $x^i$, $i \in I$, and $y^j$, $j \in J$,
denote the coordinate functions relative to F and G, respectively, we define the transition
matrix from F to G to be the matrix
$$M : I \times J \to K$$
with entries
$$M^j_i = y^j(F_i), \qquad i \in I,\ j \in J.$$

An equivalent description of the transition matrix is given by
$$y^j = \sum_{i \in I} M^j_i\, x^i, \qquad \text{for all } j \in J.$$

It is also the custom to regard the elements of I as indexing the columns of the matrix, while
the elements of J label the rows. Thus, for $I = (1, 2, \ldots, n)$ and $J = (1, 2, \ldots, \bar{n})$, we can
write
$$\begin{pmatrix} y^1 \\ \vdots \\ y^{\bar{n}} \end{pmatrix}
\begin{pmatrix} F_1 & \cdots & F_n \end{pmatrix}
= \begin{pmatrix} M^1_1 & \cdots & M^1_n \\ \vdots & \ddots & \vdots \\ M^{\bar{n}}_1 & \cdots & M^{\bar{n}}_n \end{pmatrix}.$$

In this way we can describe the relation between coordinates relative to the two frames in
terms of ordinary matrix multiplication. To wit, we can write
$$\begin{pmatrix} y^1 \\ \vdots \\ y^{\bar{n}} \end{pmatrix}
= \begin{pmatrix} M^1_1 & \cdots & M^1_n \\ \vdots & \ddots & \vdots \\ M^{\bar{n}}_1 & \cdots & M^{\bar{n}}_n \end{pmatrix}
\begin{pmatrix} x^1 \\ \vdots \\ x^n \end{pmatrix}.$$
Notes. The term frame is often used to refer to objects that should properly be called a
moving frame. The latter can be thought of as a field of frames, or functions taking values
in the space of all frames, and are fully described elsewhere. The confusion in terminology
is unfortunate but quite common, and is related to the questionable practice of using the
word scalar when referring to a scalar field (a.k.a. scalar-valued functions) and using the
word vector when referring to a vector field.


We also mention that in the world of theoretical physics, the preferred terminology seems to
be polyad and related specializations, rather than frame. Most commonly used are dyad, for
a frame of two elements, and tetrad for a frame of four elements.
Version: 7 Owner: rmilson Author(s): rmilson

219.8

linear combination

If $\vec{v}_1, \ldots, \vec{v}_n$ is a collection of vectors in a vector space V , then a linear combination of the
$\vec{v}_i \in V$ is a vector $\vec{u}$ of the form
$$\vec{u} = \sum_{i=1}^n a_i \vec{v}_i$$
for any $a_i$ which are elements of a field F . Thus if $\vec{w} = a_1 \vec{v}_1 + a_2 \vec{v}_2$, where $a_1, a_2 \in \mathbb{R}$, then
$\vec{w}$ is a linear combination of $\vec{v}_1$ and $\vec{v}_2$.
Version: 8 Owner: slider142 Author(s): slider142

219.9

linear independence

Let V be a vector space over a field F . Then for scalars $\alpha_1, \alpha_2, \ldots, \alpha_n \in F$, the vectors
$\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n \in V$ are linearly independent if the following condition holds:
$$\alpha_1 \vec{v}_1 + \alpha_2 \vec{v}_2 + \cdots + \alpha_n \vec{v}_n = 0 \quad \text{implies} \quad \alpha_1 = \alpha_2 = \ldots = \alpha_n = 0.$$
Otherwise, if this condition fails, the vectors are said to be linearly dependent. Furthermore,
an infinite set of vectors is linearly independent if all finite subsets are linearly independent.
In the case of two vectors, linear independence means that neither of these vectors is a scalar
multiple of the other.
As an alternate characterization of dependence, we have that a set of vectors is linearly
dependent if and only if some vector in the set lies in the linear span of the other vectors in
the set.
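Linear independence of finitely many list vectors reduces to a rank computation: the vectors are independent exactly when the matrix having them as columns has rank equal to their number. A NumPy sketch (not from the original entry; the sample vectors are illustrative):

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([1.0, 1.0, 0.0])   # v3 = v1 + v2, so the set is dependent

V = np.column_stack([v1, v2, v3])
# rank < number of vectors  =>  linearly dependent
assert np.linalg.matrix_rank(V) == 2

# v1 and v2 alone are independent: rank equals the number of vectors.
assert np.linalg.matrix_rank(np.column_stack([v1, v2])) == 2
```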
Version: 22 Owner: rmilson Author(s): rmilson, NeuRet, Daume

219.10

list vector

Let K be a field and n a positive natural number. We define $K^n$ to be the set of all mappings
from the index list $(1, 2, \ldots, n)$ to K. Such a mapping $a \in K^n$ is just a formal way of speaking
of a list of field elements $a^1, \ldots, a^n \in K$.

The above description is somewhat restrictive. A more flexible definition of a list vector is
the following. Let I be a finite list of indices⁴, $I = (1, \ldots, n)$ is one such possibility, and let
$K^I$ denote the set of all mappings from I to K. A list vector, an element of $K^I$, is just such
a mapping. Conventionally, superscripts are used to denote the values of a list vector, i.e.
for $u \in K^I$ and $i \in I$, we write $u^i$ instead of $u(i)$.
We add and scale list vectors point-wise, i.e. for $u, v \in K^I$ and $k \in K$, we define $u + v \in K^I$
and $ku \in K^I$, respectively, by
$$(u + v)^i = u^i + v^i, \qquad (ku)^i = k u^i, \qquad i \in I.$$
We also have the zero vector $0 \in K^I$, namely the constant mapping
$$0^i = 0, \qquad i \in I.$$
The above operations give $K^I$ the structure of an (abstract) vector space over K.
Long-standing traditions of linear algebra hold that elements of $K^I$ be regarded as column
vectors. For example, we write $a \in K^n$ as
$$a = \begin{pmatrix} a^1 \\ a^2 \\ \vdots \\ a^n \end{pmatrix}.$$
Row vectors are usually taken to represent linear forms on $K^I$. In other words, row vectors
are elements of the dual space $(K^I)^*$. The components of a row vector are customarily
written with subscripts, rather than superscripts. Thus, we express a row vector $\lambda \in (K^n)^*$
as
$$\lambda = (\lambda_1, \ldots, \lambda_n).$$

219.11

nullity

The nullity of a linear mapping is the dimension of the mapping's kernel. For a linear
mapping $T : V \to W$, the nullity of T gives the number of linearly independent solutions to
the equation
$$T(v) = 0, \qquad v \in V.$$
The nullity is zero if and only if the linear mapping in question is injective.
Version: 2 Owner: rmilson Author(s): rmilson
⁴Distinct index sets are often used when working with multiple frames of reference.


219.12

orthonormal basis

Let X be an inner product space over a field F and $\{x_\alpha\}_{\alpha \in J} \subseteq X$ be a set of orthonormal
vectors in the space. If we can write any vector in our space as the sum of vectors from the set
multiplied by elements of the field, or in symbols, $\forall x \in X\ \exists \{a_\alpha\}_{\alpha \in J} \subseteq F : x = \sum_{\alpha \in J} a_\alpha x_\alpha$,
then we say that $\{x_\alpha\}$ form an orthonormal basis for X.
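For a finite-dimensional real space, the expansion coefficients relative to an orthonormal basis are simply inner products, $a_\alpha = \langle x, x_\alpha \rangle$. A NumPy sketch (not from the original entry; the 4×4 size and seed are illustrative, and the orthonormal basis is obtained from a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(A)        # columns of Q: an orthonormal basis of R^4

x = rng.standard_normal(4)
coeff = Q.T @ x               # a_alpha = <x, x_alpha>

# Expansion in the orthonormal basis reconstructs x.
assert np.allclose(Q @ coeff, x)
```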
Version: 4 Owner: say 10 Author(s): say 10, apmxi

219.13

physical vector

Definition. Let L be a collection of labels $\lambda$. For each ordered pair of labels $(\lambda, \mu) \in
L \times L$ let $M^{\lambda\mu}$ be a non-singular $n \times n$ matrix, the collection of all such satisfying the following
functor-like consistency conditions:
• For all $\lambda \in L$, the matrix $M^{\lambda\lambda}$ is the identity matrix.
• For all $\lambda, \mu, \nu \in L$ we have
$$M^{\lambda\nu} = M^{\lambda\mu} M^{\mu\nu},$$
where the product in the right-hand side is just ordinary matrix multiplication.
We then impose an equivalence relation by stipulating that for all $\lambda, \mu \in L$ and $u \in \mathbb{R}^n$, the
pair $(\lambda, u)$ is equivalent to the pair $(\mu, M^{\mu\lambda} u)$. Finally, we define a physical vector to be an
equivalence class of such pairs relative to the just-defined relation.
The idea behind this definition is that the $\lambda \in L$ are labels of various coordinate systems,
and that the matrices $M^{\lambda\mu}$ encode the corresponding changes of coordinates. For a label $\lambda \in L$
and list-vector $u \in \mathbb{R}^n$ we think of the pair $(\lambda, u)$ as the representation of a physical vector
relative to the coordinate system $\lambda$.
Discussion. All scientific disciplines have a need for formalization. However, the extent to
which rigour is pursued varies from one discipline to the next. Physicists and engineers are
more likely to regard mathematics as a tool for modeling and prediction. As such they are
likely to blur the distinction between list vectors and physical vectors. Consider, for example
the following excerpt from R. Feynman's Lectures on Physics [1]:
All quantities that have a direction, like a step in space, are called vectors. A
vector is three numbers. In order to represent a step in space, . . ., we really need

three numbers, but we are going to invent a single mathematical symbol, r, which
is unlike any other mathematical symbols we have so far used. It is not a single
number, it represents three numbers: x, y, and z. It means three numbers, but
not only those three numbers, because if we were to use a different coordinate
system, the three numbers would be changed to x′, y′, and z′. However, we
want to keep our mathematics simple and so we are going to use the same mark
to represent the three numbers (x, y, z) and the three numbers (x′, y′, z′). That
is, we use the same mark to represent the first set of three numbers for one
coordinate system, but the second set of three numbers if we are using the other
coordinate system. This has the advantage that when we change the coordinate
system, we do not have to change the letters of our equations.
Surely you are joking Mr. Feynman!? What are we supposed to make of this definition? We
learn that a vector is both a physical quantity, and a list of numbers. However we also learn
that it is not really a specific list of numbers, but rather any of a number of possible lists.
Furthermore, the choice of which list is being used is dependent on the context (choice of
coordinate system), but this is not really important because we just end up using the same
symbol r regardless.
What a muddle! Even at the informal level one can do better than Feynman. The central
weakness of his definition is that he is unwilling to distinguish between physical vectors
(quantities) and their representation (lists of numbers). Here is an alternative physical
definition from a book by R. Aris on fluid mechanics [2].
There are many physical quantities with which only a single magnitude can be
associated. For example, when suitable units of mass and length have been
adopted the density of a fluid may be measured. . . . There are other quantities
associated with a point that have not only a magnitude but also a direction. If
a force of 1 lb weight is said to act at a certain point, we can still ask in what
direction the force acts and it is not fully specified until this direction is given.
Such a physical quantity is a vector. . . . We distinguish therefore between the
vector as an entity and its components which allow us to reconstruct it in a
particular system of reference. The set of components is meaningless unless the
system of reference is also prescribed, just as the magnitude 62.427 is meaningless
as a density until the units are also prescribed. . . ..
Definition. A Cartesian vector, a, in three dimensions is a quantity with three
components $a_1, a_2, a_3$ in the frame of reference O123, which, under rotation of
the coordinate frame to $O\bar{1}\bar{2}\bar{3}$, become components $\bar{a}_1, \bar{a}_2, \bar{a}_3$, where
$$\bar{a}_j = l_{ij} a_i.$$
The vector a is to be regarded as an entity, just as the physical quantity it
represents is an entity. It is sometimes convenient to use the bold face a to show
this. In any particular coordinate system it has components $a_1, a_2, a_3$, and it is
at other times convenient to use the typical component $a_i$.

Here we see a carefully drawn distinction between physical quantities and the numerical
measurements that represent them. A system of measurement, i.e. a choice of units and/or
coordinate axes, turns physical quantities into numbers. However the correspondence is
not fixed, but varies according to the choice of measurement system. This point of view can
be formalized by representing physical vectors as labeled list vectors, the label specifying a
choice of measurement systems. The actual vector is then defined to be an equivalence class
of such labeled list vectors.

REFERENCES
1. R. Feynman, R. Leighton, and M. Sands, Lectures on Physics, 11-4, Vol. I, Addison-Wesley.
2. R. Aris, Vectors, Tensors and the Basic Equations of Fluid Mechanics, Dover.

Version: 2 Owner: rmilson Author(s): rmilson

219.14

proof of rank-nullity theorem

Let T : V → W be a linear mapping, with V finite-dimensional. We wish to show that

dim V = dim Ker T + dim Im T.

The images of a basis of V will span Im T, and hence Im T is finite-dimensional. Choose
then a basis w1, . . . , wn of Im T and choose preimages v1, . . . , vn ∈ V such that

wi = T(vi),    i = 1, . . . , n.

Choose a basis u1, . . . , uk of Ker T. The result will follow once we show that u1, . . . , uk, v1, . . . , vn
is a basis of V.

Let v ∈ V be given. Since T(v) ∈ Im T, by definition, we can choose scalars b1, . . . , bn such
that

T(v) = b1 w1 + . . . + bn wn.

Linearity of T now implies that T(b1 v1 + . . . + bn vn − v) = 0, and hence we can choose scalars
a1, . . . , ak such that

b1 v1 + . . . + bn vn − v = a1 u1 + . . . + ak uk.

Therefore u1, . . . , uk, v1, . . . , vn span V.
Next, let a1, . . . , ak, b1, . . . , bn be scalars such that

a1 u1 + . . . + ak uk + b1 v1 + . . . + bn vn = 0.

By applying T to both sides of this equation it follows that

b1 w1 + . . . + bn wn = 0,

and since w1, . . . , wn are linearly independent, that

b1 = b2 = . . . = bn = 0.

Consequently

a1 u1 + . . . + ak uk = 0

as well, and since u1, . . . , uk are also assumed to be linearly independent, we conclude that

a1 = a2 = . . . = ak = 0

also. Therefore u1, . . . , uk, v1, . . . , vn are linearly independent, and are therefore a basis.
Q.E.D.
Version: 1 Owner: rmilson Author(s): rmilson

219.15 rank

The rank of a linear mapping is the dimension of the mapping's image. For a linear mapping
T : V → W, the rank of T gives the number of independent linear constraints on v ∈ V
imposed by the equation

T(v) = 0.

The rank of a linear mapping is equal to the dimension of the codomain if and only if the
mapping in question is surjective.
The rank of a linear mapping is equal to the dimension of the domain if and only if the
mapping in question is injective.
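As a numerical sketch (NumPy, not part of the original entry), the two criteria above can be checked by comparing the rank of a representing matrix with the dimensions of the domain and codomain:

```python
import numpy as np

# Matrix of a linear map T : R^3 -> R^2
# (2 rows = dim of codomain, 3 columns = dim of domain)
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])

rank = np.linalg.matrix_rank(A)

# Surjective iff rank equals the dimension of the codomain
surjective = (rank == A.shape[0])
# Injective iff rank equals the dimension of the domain
injective = (rank == A.shape[1])

print(rank, surjective, injective)
```

Here the rank is 2, so the map is surjective but, being a map from a 3-dimensional space, cannot be injective.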
Version: 4 Owner: rmilson Author(s): rmilson

219.16 rank-nullity theorem

The sum of the rank and the nullity of a linear mapping gives the dimension of the mapping's
domain. More precisely, let T : V → W be a linear mapping. If V is finite-dimensional,
then

dim V = dim Ker T + dim Im T.
The intuitive content of the Rank-Nullity theorem is the principle that
Every independent linear constraint takes away one degree of freedom.


The rank is just the number of independent linear constraints on v ∈ V imposed by the
equation

T(v) = 0.
The dimension of V is the number of unconstrained degrees of freedom. The nullity is the
degrees of freedom in the resulting space of solutions. To put it yet another way:
The number of variables minus the number of independent linear constraints
equals the number of linearly independent solutions.
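The counting principle above can be illustrated numerically (a NumPy sketch, not part of the original entry): the rank is computed directly, while the nullity is counted independently from the vanishing singular values of the matrix, and the two add up to the dimension of the domain.

```python
import numpy as np

# Matrix of T : R^4 -> R^3; dim V = number of columns = 4.
# The third row is the sum of the first two, so the rank is 2.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 2.0]])

rank = np.linalg.matrix_rank(A)

# Nullity counted independently: singular values that vanish
# correspond to directions in the kernel of A.
sing = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - np.sum(sing > 1e-10)

assert rank + nullity == A.shape[1]   # dim V = dim Ker T + dim Im T
print(rank, nullity)
```

Two constraints are independent, leaving two degrees of freedom in the solution space.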
Version: 5 Owner: rmilson Author(s): rmilson

219.17 similar matrix

Definition A square matrix A is similar to a square matrix B if there exists a nonsingular
square matrix S such that

A = S⁻¹BS.    (219.17.1)

Because given S we can define R = S⁻¹ and have A = RBR⁻¹, whether the inverse comes
first or last does not matter.
Transformations of the form S⁻¹BS (or SBS⁻¹) are called similarity transformations.
Discussion Similarity is useful for turning recalcitrant matrices into pliant ones. The
canonical example is that a diagonalizable matrix A is similar to the diagonal matrix Λ of its
eigenvalues, with the matrix T of its eigenvectors acting as the similarity transformation.
That is,

A = TΛT⁻¹    (219.17.2)
  = ( v1 v2 . . . vn ) diag(λ1, λ2, . . . , λn) ( v1 v2 . . . vn )⁻¹.    (219.17.3)

This follows directly from the equation defining eigenvalues and eigenvectors,

AT = TΛ.    (219.17.4)

Through this transformation, we've turned A into the product of an invertible matrix, a
diagonal matrix, and the inverse of the first matrix (when A is symmetric, the eigenvector
matrix can be chosen orthogonal). This can be very useful. As an application, see the
solution for the normalizing constant of a multidimensional Gaussian integral.

Properties of similar matrices

1. Similarity is reflexive. All square matrices A are similar to themselves via the similarity
transformation A = I⁻¹AI, where I is the identity matrix.

2. Similarity is symmetric. If A is similar to B, then B is similar to A, as we can define
a matrix R = S⁻¹ and have

B = R⁻¹AR.    (219.17.5)

3. Similarity is transitive. If A is similar to B, which is similar to C, we have

A = S⁻¹BS = (S⁻¹R⁻¹)C(RS) = (RS)⁻¹C(RS).    (219.17.6)

4. Because of 1, 2 and 3, similarity defines an equivalence relation (reflexive, transitive
and symmetric) on square matrices, partitioning the space of such matrices into a
disjoint set of equivalence classes.

5. If A is similar to B, then their determinants are equal, |A| = |B|. This is easily
verified:

|A| = |S⁻¹BS| = |S⁻¹||B||S| = |S|⁻¹|B||S| = |B|.    (219.17.7)

6. Similar matrices represent the same linear transformation after a change of basis.
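Properties 5 and 6 can be checked numerically. The following NumPy sketch (not part of the original entry) builds A = S⁻¹BS for a sample B and S and verifies that the determinant and the eigenvalues are preserved:

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])     # triangular, so its eigenvalues are 2 and 3
S = np.array([[1.0, 1.0],
              [1.0, 2.0]])     # nonsingular (det = 1)

A = np.linalg.inv(S) @ B @ S   # A = S^{-1} B S, so A is similar to B

# Property 5: similar matrices have equal determinants
assert np.isclose(np.linalg.det(A), np.linalg.det(B))

# Similar matrices also share eigenvalues (same characteristic polynomial)
eig_A = np.sort(np.linalg.eigvals(A).real)
eig_B = np.sort(np.linalg.eigvals(B).real)
assert np.allclose(eig_A, eig_B)
print(eig_A)
```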
Version: 5 Owner: drummond Author(s): drummond

219.18 span

The span of a set of vectors v1, . . . , vn is the set of linear combinations a1 v1 + · · · + an vn. It
is denoted Sp(v1, . . . , vn). The standard basis vectors i and j span R², because every vector of
R² can be represented as a linear combination of i and j. Sp(v1, . . . , vn) is a subspace of Rⁿ
and is the smallest subspace containing v1, . . . , vn.

Span is both a noun and a verb; a set of vectors can span a vector space, and a vector can
be in the span of a set of vectors.

Checking span: To see whether a vector is in the span of other vectors, one can set up
an augmented matrix, since if u is in the span of v1, v2, then u = x1 v1 + x2 v2. This is a
system of linear equations. Thus, if it has a solution, u is in the span of v1, v2. Note that
the solution does not have to be unique for u to be in the span.
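The augmented-matrix test can be sketched in NumPy (not part of the original entry): the system V x = u is consistent, i.e. u lies in the span of the columns of V, exactly when appending u as an extra column does not increase the rank.

```python
import numpy as np

def in_span(u, vectors, tol=1e-10):
    """Check whether u lies in the span of the given vectors by comparing
    the rank of the coefficient matrix with the rank of the augmented
    matrix: V x = u is consistent iff the two ranks agree."""
    V = np.column_stack(vectors)
    augmented = np.column_stack(vectors + [u])
    return np.linalg.matrix_rank(V, tol=tol) == np.linalg.matrix_rank(augmented, tol=tol)

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])

print(in_span(np.array([2.0, 3.0, 5.0]), [v1, v2]))   # True: equals 2*v1 + 3*v2
print(in_span(np.array([0.0, 0.0, 1.0]), [v1, v2]))   # False: system is inconsistent
```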
To see whether a set of vectors spans a vector space, you need to check that there are at
least as many linearly independent vectors as the dimension of the space. For example, it
can be shown that in Rⁿ, n + 1 vectors are never linearly independent, and n − 1 vectors
never span.
Version: 6 Owner: slider142 Author(s): slider142

219.19 theorem for the direct sum of finite dimensional vector spaces

Theorem Let S and T be subspaces of a finite dimensional vector space V. Then V is the
direct sum of S and T, i.e., V = S ⊕ T, if and only if dim V = dim S + dim T and S ∩ T = {0}.

Proof. Suppose that V = S ⊕ T. Then, by definition, V = S + T and S ∩ T = {0}. The
dimension theorem for subspaces states that

dim(S + T) + dim(S ∩ T) = dim S + dim T.

Since the dimension of the zero vector space {0} is zero, we have that

dim V = dim S + dim T,

and the first direction of the claim follows.

For the other direction, suppose dim V = dim S + dim T and S ∩ T = {0}. Then the
dimension theorem for subspaces implies that

dim(S + T) = dim V.

Now S + T is a subspace of V with the same dimension as V so, by Theorem 1 on this page,
V = S + T. This proves the second direction. □
Version: 3 Owner: matte Author(s): matte

219.20 vector

Overview. The word vector has several distinct, but interrelated meanings. The present
entry is an overview and discussion of these concepts, with links at the end to more detailed
definitions.
A list vector (follow the link to the formal definition) is a finite list of numbers.
(Infinite vectors arise in areas such as functional analysis and quantum mechanics, but
require a much more complicated and sophisticated theory.) Most commonly, the vector is
composed of real numbers, in which case a list vector is just an element of Rⁿ. Complex
numbers are also quite common, and then we speak of a complex vector, an element of Cⁿ.
Lists of ones and zeroes are also utilized, and are referred to as binary vectors. More
generally, one can use any field K, in which case a list vector is just an element of Kⁿ.
A physical vector (follow the link to a formal definition and in-depth discussion)
is a geometric quantity that corresponds to a linear displacement. Indeed, it is customary
to depict a physical vector as an arrow. By choosing a system of coordinates
a physical vector v can be represented by a list vector (v¹, . . . , vⁿ)ᵀ. Physically, no
single system of measurement can be preferred to any other, and therefore such a
representation is not canonical. A linear change of coordinates induces a corresponding
linear transformation of the representing list vector.
In most physical applications vectors have a magnitude as well as a direction, and then
we speak of a Euclidean vector. When lengths and angles can be measured, it is most
convenient to utilize an orthogonal system of coordinates. In this case, the magnitude
of a Euclidean vector v is given by the usual Euclidean norm of the corresponding list
vector,

‖v‖ = √( Σᵢ (vⁱ)² ).

This definition is independent of the choice of orthogonal coordinates.

An abstract vector is an element of a vector space. An abstract Euclidean vector
is an element of an inner product space. The connection between list vectors and the
more general abstract vectors is fully described in the entry on frames.
Essentially, given a finite dimensional abstract vector space, a choice of a coordinate
frame (which is really the same thing as a basis) sets up a linear bijection between the
abstract vectors and list vectors, and makes it possible to represent the one in terms
of the other. The representation is not canonical, but depends on the choice of frame.
A change of frame changes the representing list vectors by a matrix multiplication.
We also note that the axioms of a vector space make no mention of lengths and angles.
The vector space formalism can be enriched to include these notions. The result is the
axiom system for inner products.
Why do we bother with the bare-bones formalism of length-less vectors? The reason
is that some applications involve velocity-like quantities, but lack a meaningful notion
of speed. As an example, consider a multi-particle system. The state of the system is
represented as a point in some manifold, and the evolution of the system is represented
by velocity vectors that live in that manifold's tangent space. We can superimpose
and scale these velocities, but it is meaningless to speak of a speed of the evolution.

Discussion. What is a vector? This simple question is surprisingly difficult to answer.
Vectors are an essential scientific concept, indispensable for both the physicist and the
mathematician. It is strange, then, that despite the obvious importance, there is no clear,
universally accepted definition of this term.
The difficulty is one of semantics. The term vector is ambiguous, but its various meanings
are interrelated. The different usages of vector call for different formal definitions, which
are similarly interrelated. List vectors are the most elementary and familiar kind of vectors.
They are easy to define, and are mathematically precise. However, saying that a vector is
just a list of numbers leads to conceptual difficulties.
A physicist needs to be able to say that velocities, forces, and fluxes are vectors. A geometer,
and for that matter a pilot, will think of a vector as a kind of spatial displacement. Everyone
would agree that a choice of a vector involves multiple degrees of freedom, and that
vectors can be linearly superimposed. This description of vector evokes a useful and intuitive
understanding, but is difficult to formalize.
The synthesis of these conflicting viewpoints is the modern mathematical notion of a vector
space. The key innovation of modern, formal mathematics is the pursuit of generality by
means of abstraction. To that end, we do not give an answer to "What is a vector?",
but rather give a list of properties enjoyed by all objects that one may reasonably term a
vector. These properties are just the axioms of an abstract vector space, or, as Forrest
Gump [1] might have put it, "A vector is as a vector does."
The axiomatic approach afforded by vector space theory gives us maximum flexibility. We
can carry out an analysis of various physical vector spaces by employing propositions based
on vector space axioms, or we can choose a basis and perform the same analysis using list
vectors. This flexibility is obtained by means of abstraction. We are not obliged to say what
a vector is; all we have to do is say that these abstract vectors enjoy certain properties,
and make the appropriate deductions. This is similar to the idea of an abstract class in
object-oriented programming.
Surprisingly, the idea that a vector is an element of an abstract vector space has not made
great inroads in the physical sciences and engineering. The stumbling block seems to be a
poor understanding of formal, deductive mathematics and the unstated, but implicit attitude
that
formal manipulation of a physical quantity requires that it be represented by one
or more numbers.
Great historical irony is at work here. The classical, Greek approach to geometry was purely
synthetic, based on idealized notions like point and line, and on various axioms. Analytic
geometry, a la Descartes, arose much later, but became the dominant mode of thought
in scientific applications and largely overshadowed the synthetic method. The pendulum
began to swing back at the end of the nineteenth century as mathematics became more
formal and important new axiomatic systems, such as vector spaces, fields, and topology,
were developed. The cost of increased abstraction in modern mathematics was more than
justified by the improvement in clarity and organization of mathematical knowledge.
Alas, to a large extent physical science and engineering continue to dwell in the 19th century.
The axioms and the formal theory of vector spaces allow one to manipulate formal geometric
entities, such as physical vectors, without first turning everything into numbers. The
increased level of abstraction, however, poses a formidable obstacle toward the acceptance
of this approach. Indeed, mainstream physicists and engineers do not seem in any great
hurry to accept the definition of vector as something that dwells in a vector space. Until this
attitude changes, vector will retain the ambiguous meaning of being both a list of numbers,
and a physical quantity that transforms with respect to matrix multiplication.

REFERENCES
1. R. Zemeckis, Forrest Gump, Paramount Pictures.

Version: 10 Owner: rmilson Author(s): rmilson


Chapter 220
15A04 Linear transformations,
semilinear transformations
220.1 admissibility

Let k be a field, V a vector space over k, and T a linear operator over V. We say that a
subspace W of V is T-admissible if

1. W is a T-invariant subspace;
2. if f ∈ k[X] and f(T)x ∈ W, there is a vector y ∈ W such that f(T)x = f(T)y.
Version: 2 Owner: gumau Author(s): gumau

220.2 conductor of a vector

Let k be a field, V a vector space, T : V → V a linear transformation, and W a T-invariant
subspace of V. Let x ∈ V. The T-conductor of x in W is the set S_T(x, W)
containing all polynomials g ∈ k[X] such that g(T)x ∈ W. It happens that this set
is an ideal of the polynomial ring. We also use the term T-conductor of x in W to refer
to the generator of that ideal. In the special case W = {0}, the T-conductor is called the
T-annihilator of x. Another way to define the T-conductor of x in W is by saying that it
is a monic polynomial p of lowest degree such that p(T)x ∈ W. Of course this polynomial
happens to be unique. So the T-annihilator of x is the monic polynomial m_x of lowest degree
such that m_x(T)x = 0.
Version: 1 Owner: gumau Author(s): gumau

220.3 cyclic decomposition theorem

Let k be a field, V a finite dimensional vector space over k and T a linear operator over V.
Let W0 be a proper T-admissible subspace of V. There are non-zero vectors x1, . . . , xr in V
with respective annihilator polynomials p1, . . . , pr such that

1. V = W0 ⊕ Z(x1, T) ⊕ . . . ⊕ Z(xr, T) (see the cyclic subspace definition);
2. pk divides pk−1 for every k = 2, . . . , r.

Moreover, the integer r and the annihilators p1, . . . , pr are uniquely determined by (1), (2) and
the fact that none of the xk is zero.
Version: 5 Owner: gumau Author(s): gumau

220.4 cyclic subspace

Let k be a field, V a vector space over k, and x ∈ V. Let T : V → V be a linear
transformation.

The T-cyclic subspace generated by x is the smallest T-invariant subspace which
contains x.
We denote this space by Z(x, T). If Z(x, T) = V we say that x is a cyclic vector of T.
Note that Z(x, T) = {p(T)(x) | p ∈ k[X]}.
Version: 3 Owner: gumau Author(s): gumau

220.5 dimension theorem for symplectic complement (proof)

We denote by V∗ the dual space of V, i.e., the space of linear mappings from V to R. Moreover, we
assume known that dim V∗ = dim V for any finite dimensional vector space V.

We begin by showing that the mapping S : V → V∗, a ↦ ω(a, ·), is a linear isomorphism.
First, linearity is clear, and since ω is non-degenerate, ker S = {0}, so S is injective. To show
that S is surjective, we apply the rank-nullity theorem to S, which yields dim V = dim im S.
We now have im S ⊆ V∗ and dim im S = dim V∗. (The first assertion follows directly from
the definition of S.) Hence im S = V∗ (see this page), and S is a surjection. We have shown
that S is a linear isomorphism.

Let us next define the mapping T : V → W∗, a ↦ ω(a, ·)|_W. Applying the rank-nullity theorem
to T yields

dim V = dim ker T + dim im T.    (220.5.1)

Now ker T = W⊥ and im T = W∗. To see the latter assertion, first note that from the
definition of T, we have im T ⊆ W∗. Since S is a linear isomorphism, we also have
W∗ ⊆ im T. Then, since dim W∗ = dim W, the result follows from equation (220.5.1). □
Version: 2 Owner: matte Author(s): matte

220.6 dual homomorphism

Definition. Let U, V be vector spaces over a field K, and T : U → V be a homomorphism
(a linear map) between them. Letting U∗, V∗ denote the corresponding dual spaces, we
define the dual homomorphism T∗ : V∗ → U∗ to be the linear mapping with action

T∗(φ) = φ ∘ T,    φ ∈ V∗.

We can also characterize T∗ as the adjoint of T relative to the natural evaluation bracket
between linear forms and vectors:

⟨·, ·⟩_U : U∗ × U → K,    ⟨φ, u⟩_U = φ(u),    φ ∈ U∗, u ∈ U.

To be more precise, T∗ is characterized by the condition

⟨T∗φ, u⟩_U = ⟨φ, Tu⟩_V,    φ ∈ V∗, u ∈ U.

If U and V are finite dimensional, we can also characterize the dualizing operation as the
composition of the following canonical isomorphisms:

Hom(U, V) ≅ U∗ ⊗ V ≅ (V∗)∗ ⊗ U∗ ≅ Hom(V∗, U∗).

Category theory perspective. The dualizing operation behaves contravariantly with
respect to composition, i.e.

(S ∘ T)∗ = T∗ ∘ S∗,

for all vector space homomorphisms S, T with suitably matched domains. Furthermore, the
dual of the identity homomorphism is the identity homomorphism of the dual space. Thus,
using the language of category theory, the dualizing operation can be characterized as the
homomorphism action of the contravariant, dual-space functor.

Relation to the matrix transpose. The above properties closely mirror the algebraic
properties of the matrix transpose operation. Indeed, T∗ is sometimes referred to as the
transpose of T, because at the level of matrices the dual homomorphism is calculated by
taking the transpose.

To be more precise, suppose that U and V are finite-dimensional, and let M ∈ Mat_{n,m}(K)
be the matrix of T relative to some fixed bases of U and V. Then the dual homomorphism
T∗ is represented as the transposed matrix Mᵀ ∈ Mat_{m,n}(K) relative to the corresponding
dual bases of U∗, V∗.
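The defining bracket condition ⟨T∗φ, u⟩_U = ⟨φ, Tu⟩_V can be checked numerically at the level of coordinates, where T∗ acts by the transposed matrix. The following NumPy sketch (not part of the original entry) verifies the identity for random data:

```python
import numpy as np

rng = np.random.default_rng(0)

M = rng.normal(size=(3, 2))   # matrix of T : U -> V, with dim U = 2, dim V = 3
u = rng.normal(size=2)        # coordinates of a vector in U
phi = rng.normal(size=3)      # coordinates of a linear form on V (dual basis)

# <T* phi, u>_U computed with the transposed matrix ...
lhs = (M.T @ phi) @ u
# ... equals <phi, T u>_V
rhs = phi @ (M @ u)

assert np.isclose(lhs, rhs)
print("bracket identity holds")
```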
Version: 5 Owner: rmilson Author(s): rmilson

220.7 dual homomorphism of the derivative

Let Pn denote the vector space of real polynomials of degree n or less, and let D_n : Pn → Pn−1
denote the ordinary derivative. Linear forms on Pn can be given in terms of evaluations, and
so we introduce the following notation. For every scalar k ∈ R, let Ev_k⁽ⁿ⁾ ∈ (Pn)∗ denote the
evaluation functional

Ev_k⁽ⁿ⁾ : p ↦ p(k),    p ∈ Pn.

Note: the degree superscript matters! For example:

Ev_2⁽¹⁾ = 2 Ev_1⁽¹⁾ − Ev_0⁽¹⁾,

whereas Ev_0⁽²⁾, Ev_1⁽²⁾, Ev_2⁽²⁾ are linearly independent. Let us consider the dual homomorphism
D_2∗, i.e. the adjoint of D_2. We have the following relations:

D_2∗(Ev_0⁽¹⁾) = −(3/2) Ev_0⁽²⁾ + 2 Ev_1⁽²⁾ − (1/2) Ev_2⁽²⁾,
D_2∗(Ev_1⁽¹⁾) = −(1/2) Ev_0⁽²⁾ + (1/2) Ev_2⁽²⁾.

In other words, taking Ev_0⁽¹⁾, Ev_1⁽¹⁾ as the basis of (P1)∗ and Ev_0⁽²⁾, Ev_1⁽²⁾, Ev_2⁽²⁾ as the basis
of (P2)∗, the matrix that represents D_2∗ is just

[ −3/2  −1/2 ]
[   2     0  ]
[ −1/2   1/2 ]

Note the contravariant relationship between D_2 and D_2∗. The former turns second degree
polynomials into first degree polynomials, whereas the latter turns first degree evaluations
into second degree evaluations. The matrix of D_2∗ has 2 columns and 3 rows precisely
because D_2∗ is a homomorphism from a 2-dimensional vector space to a 3-dimensional
vector space.

By contrast, D_2 will be represented by a 2 × 3 matrix. The dual basis of P1 is

−x + 1,    x,

and the dual basis of P2 is

(1/2)(x − 1)(x − 2),    x(2 − x),    (1/2)x(x − 1).

Relative to these bases, D_2 is represented by the transpose of the matrix for D_2∗, namely

[ −3/2   2  −1/2 ]
[ −1/2   0   1/2 ]

This corresponds to the following three relations:

D_2[(1/2)(x − 1)(x − 2)] = −(3/2)(−x + 1) − (1/2)x
D_2[x(2 − x)]            =      2(−x + 1) + 0·x
D_2[(1/2)x(x − 1)]       = −(1/2)(−x + 1) + (1/2)x
Version: 1 Owner: rmilson Author(s): rmilson

220.8 image of a linear transformation

Definition Let T : V → W be a linear transformation. Then the image of T is the set

Im(T) = {w ∈ W | w = T(v) for some v ∈ V}.

Properties

1. T is a surjection if and only if Im(T) = W.
2. Im(T) is a vector subspace of W.
3. If V and W are finite dimensional, then the dimension of Im(T) is given by the
rank-nullity theorem.
Version: 3 Owner: matte Author(s): matte

220.9 invertible linear transformation

An invertible linear transformation is a linear transformation T : V → W which is a
bijection.
If V and W are finite dimensional, the linear transformation T is invertible if and only if the
matrix of T is not singular.
Version: 1 Owner: djao Author(s): djao

220.10 kernel of a linear transformation

Let T : V → W be a linear transformation. The set of all vectors in V that T maps to 0 is
called the kernel (or nullspace) of T, and is denoted ker T:

ker T = { x ∈ V | T(x) = 0 }.

The kernel is a vector subspace of V, and its dimension is called the nullity of T. Note that
T is injective if and only if ker T = {0}.

Suppose T is as above, and U is a vector subspace of V. Then, for the restriction T|_U, we
have

ker T|_U = U ∩ ker T.

When the transformations are given by means of matrices, the kernel of the matrix A is

ker A = { x ∈ V | Ax = 0 }.
Version: 9 Owner: yark Author(s): matte, yark, drini

220.11 linear transformation

Let V and W be vector spaces over the same field F. A linear transformation is a function
T : V → W such that:

T(v + w) = T(v) + T(w) for all v, w ∈ V
T(λv) = λT(v) for all v ∈ V, and λ ∈ F

Properties:

T(0) = 0.
If T : V → W and G : W → U are linear transformations then G ∘ T : V → U is also a linear
transformation.
The kernel ker(T) = {v ∈ V | T(v) = 0} is a subspace of V.
The image Im(T) = {T(v) | v ∈ V} is a subspace of W.
The inverse image T⁻¹(w) is a subspace if and only if w = 0.
A linear transformation is injective if and only if ker(T) = {0}.

If v ∈ V then T⁻¹(T(v)) = v + u where u is any element of ker(T).
If T is a surjection and w ∈ W then T(T⁻¹(w)) = w.

See also:
Wikipedia, linear transformation
Version: 10 Owner: Daume Author(s): Daume

220.12 minimal polynomial (endomorphism)

Let T be an endomorphism of an n-dimensional vector space V.

We say that the minimal polynomial, M_T(X), is the unique monic polynomial of minimal degree
such that M_T(T) = 0. In other words, the matrix representing M_T(T) is the zero matrix.
We say that P(X) is a zero polynomial for T if P(T) is the zero endomorphism.

Firstly, End(V) is a vector space of dimension n². Therefore the n² + 1 vectors id_V, T, T², . . . , T^(n²)
are linearly dependent. So there are coefficients a_i, not all zero, such that Σᵢ₌₀^(n²) a_i Tⁱ = 0.
We conclude that a non-trivial zero polynomial for T exists. We take M_T(X) to be a zero
polynomial for T of minimal degree with leading coefficient one.

Lemma: Suppose P(X) is a zero polynomial for T. Then M_T(X) | P(X).

Proof:
By the division algorithm for polynomials, P(X) = Q(X)M_T(X) + R(X) with deg R <
deg M_T. We note that R(X) is also a zero polynomial for T and, by minimality of M_T(X),
must be just 0. Thus we have shown M_T(X) | P(X).

Now suppose P(X) were also both minimal and monic; then we would have M_T(X) = P(X), which
gives uniqueness.

The minimal polynomial has a number of interesting properties:

1. The roots are exactly the eigenvalues of the endomorphism.
2. If the minimal polynomial of T splits into linear factors, then T is upper-triangular
with respect to some basis.
3. The minimal polynomial of T splits into distinct linear factors (each of multiplicity 1)
if and only if T is diagonal with respect to some basis.

It is then a simple corollary of the fundamental theorem of algebra that every endomorphism
of a finite dimensional vector space over C may be upper-triangularized.

The minimal polynomial is intimately related to the characteristic polynomial for T. For let
χ_T(X) be the characteristic polynomial. Then as was shown earlier M_T(X) | χ_T(X). It is
also a fact that the eigenvalues of T are exactly the roots of χ_T. So when split into linear
factors the only difference between M_T(X) and χ_T(X) is the algebraic multiplicity of the
roots.
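The difference between the minimal and the characteristic polynomial can be seen in a small numerical example (a NumPy sketch, not part of the original entry). For T = diag(1, 1, 2) the characteristic polynomial is (x − 1)²(x − 2), but the minimal polynomial is only (x − 1)(x − 2):

```python
import numpy as np

# T with eigenvalues 1, 1, 2; since T is diagonal, its minimal polynomial
# has each eigenvalue as a root with multiplicity 1.
T = np.diag([1.0, 1.0, 2.0])
I = np.eye(3)

# (T - I)(T - 2I) = 0, so (x-1)(x-2) is a zero polynomial for T ...
assert np.allclose((T - I) @ (T - 2 * I), np.zeros((3, 3)))

# ... while neither linear factor alone annihilates T, so it is minimal.
assert not np.allclose(T - I, np.zeros((3, 3)))
assert not np.allclose(T - 2 * I, np.zeros((3, 3)))
print("minimal polynomial is (x-1)(x-2)")
```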
Version: 4 Owner: vitriol Author(s): vitriol

220.13 symplectic complement

Definition [1, 2] Let (V, ω) be a symplectic vector space and let W be a vector subspace of
V. Then the symplectic complement of W is

W⊥ = {x ∈ V | ω(x, y) = 0 for all y ∈ W}.

It is easy to see that W⊥ is also a vector subspace of V. Depending on the relation between
W and W⊥, W is given different names.

1. If W ⊆ W⊥, then W is an isotropic subspace (of V).
2. If W⊥ ⊆ W, then W is a coisotropic subspace.
3. If W ∩ W⊥ = {0}, then W is a symplectic subspace.
4. If W = W⊥, then W is a Lagrangian subspace.

For the symplectic complement, we have the following dimension theorem.

Theorem [1, 2] Let (V, ω) be a symplectic vector space, and let W be a vector subspace of
V. Then

dim V = dim W + dim W⊥.
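The dimension theorem can be checked numerically for the standard symplectic form on R⁴ (a NumPy sketch, not part of the original entry): W⊥ is the null space of the linear constraints ω(x, w) = 0 over a basis of W, so its dimension is dim V minus the rank of the constraint matrix.

```python
import numpy as np

# Standard symplectic form on R^4: omega(x, y) = x^T Omega y
Omega = np.array([[ 0.0,  0.0, 1.0, 0.0],
                  [ 0.0,  0.0, 0.0, 1.0],
                  [-1.0,  0.0, 0.0, 0.0],
                  [ 0.0, -1.0, 0.0, 0.0]])

# W spanned by e1 and e2 (a Lagrangian plane, as it happens)
W = np.eye(4)[:, :2]

# x lies in W-perp iff omega(x, w) = 0 for every basis vector w of W,
# i.e. iff (W^T Omega^T) x = 0; dim W-perp = dim V - rank of that map.
constraint = W.T @ Omega.T
dim_perp = 4 - np.linalg.matrix_rank(constraint)

assert W.shape[1] + dim_perp == 4   # dim W + dim W-perp = dim V
print(dim_perp)
```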

REFERENCES
1. D. McDuff, D. Salamon, Introduction to Symplectic Topology, Clarendon Press, 1997.
2. R. Abraham, J.E. Marsden, Foundations of Mechanics, 2nd ed., Perseus Books, 1978.

Version: 4 Owner: matte Author(s): matte


220.14 trace

The trace Tr(A) of a square matrix A is defined to be the sum of the diagonal entries of A.
Key formulas for the trace operator:

Tr(A + B) = Tr(A) + Tr(B)
Tr(AB) = Tr(BA)

The trace Tr(T) of a linear transformation T : V → V from any finite dimensional vector space
V to itself is defined to be the trace of any matrix representation of T with respect to a basis
of V. This scalar is independent of the choice of basis of V, and in fact is equal to the sum
of the eigenvalues of T (over a splitting field of the characteristic polynomial), including
multiplicities.
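The key formulas, and the statement that the trace equals the sum of the eigenvalues, can be verified on random matrices (a NumPy sketch, not part of the original entry):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

# Linearity and the cyclic property of the trace
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Trace equals the sum of the eigenvalues (with multiplicity); for a real
# matrix the imaginary parts of the eigenvalues cancel in conjugate pairs.
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real)
print("trace identities verified")
```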
The following link presents some examples for calculating the trace of a matrix.
Version: 5 Owner: djao Author(s): djao


Chapter 221
15A06 Linear equations
221.1 Gaussian elimination

Gaussian elimination is used to solve a system of linear equations

a11 x1 + a12 x2 + . . . + a1m xm = b1
a21 x1 + a22 x2 + . . . + a2m xm = b2
  ...
an1 x1 + an2 x2 + . . . + anm xm = bn

or equivalently Ax = b, where

    [ a11 a12 . . . a1m ]
A = [ a21 a22 . . . a2m ]
    [  ..  ..  . .   .. ]
    [ an1 an2 . . . anm ]

is a given n × m matrix of coefficients (elements of some field K, usually R or C in physical
applications), where

b = (b1, b2, . . . , bn)ᵀ

is a given element of Kⁿ, and where

x = (x1, x2, . . . , xm)ᵀ

is the solution, some unknown element of Kᵐ.

The method consists of combining the coefficient matrix A with the right hand side b to get
the augmented n × (m + 1) matrix

(A | b) = [ a11 a12 . . . a1m | b1 ]
          [ a21 a22 . . . a2m | b2 ]
          [  ..  ..  . .   .. | .. ]
          [ an1 an2 . . . anm | bn ]

A sequence of elementary row operations is then applied to this matrix so as to transform it
to row echelon form. The allowed operations are:
multiply a row by a nonzero scalar c,
swap two rows
add c times one row to another one.
Note that these operations are legal because x is a solution of the transformed system if
and only if it is a solution of the initial system.
If the number of equations equals the number of variables (m = n), and if the coefficient
matrix A is non-singular, then the algorithm will terminate when the augmented matrix has
the following form:

[ a′11 a′12 . . . a′1n | b′1 ]
[  0   a′22 . . . a′2n | b′2 ]
[  ..   ..  . .    ..  | ..  ]
[  0    0   . . . a′nn | b′n ]

With these assumptions, there exists a unique solution, which can be obtained from the
above matrix by back substitution.
For the general case, the termination procedure is somewhat more complicated. First recall
that a matrix is in echelon form if each row has more leading zeros than the rows above it.
A pivot is the leading non-zero entry of some row. We then have:

If there is a pivot in the last column, the system is inconsistent; there will be no
solutions.

If that is not the case, then the general solution will have d degrees of freedom, where
d is the number of columns from 1 to m that have no pivot. To be more precise,
the general solution will have the form of one particular solution plus an arbitrary
linear combination of d linearly independent elements of Kᵐ.

In even more prosaic language, the variables in the non-pivot columns are to be considered
free variables and should be moved to the right-hand side of the equation. The
general solution is then obtained by arbitrarily choosing values of the free variables, and
then solving for the remaining non-free variables that reside in the pivot columns.
A variant of Gaussian elimination is Gauss-Jordan elimination. In this variation we reduce to
echelon form, and then if the system proves to be consistent, continue to apply the elementary
row operations until the augmented matrix is in reduced echelon form. This means that not
only does each pivot have all zeroes below it, but that each pivot also has all zeroes above
it.
In essence, Gauss-Jordan elimination performs the back substitution; the values of the unknowns can be read off directly from the terminal augmented matrix. Not surprisingly,
Gauss-Jordan elimination is slower than Gaussian elimination. It is useful, however, for
solving systems on paper.
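The elimination and back-substitution steps described above can be sketched as follows (a NumPy illustration, not part of the original entry; it assumes a square, nonsingular A, and adds partial pivoting for numerical stability):

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b for square, nonsingular A by forward elimination
    (with partial pivoting) followed by back substitution."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination to row echelon (upper triangular) form
    for k in range(n):
        pivot = k + np.argmax(np.abs(A[k:, k]))       # partial pivoting:
        A[[k, pivot]], b[[k, pivot]] = A[[pivot, k]], b[[pivot, k]]  # swap rows
        for i in range(k + 1, n):
            c = A[i, k] / A[k, k]
            A[i, k:] -= c * A[k, k:]                  # add -c times row k to row i
            b[i] -= c * b[k]
    # Back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gaussian_elimination(A, b))   # x = (2, 3, -1)
```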
Version: 8 Owner: rmilson Author(s): rmilson

221.2 finite-dimensional linear problem

Let L : U → V be a linear mapping, and v ∈ V an element of the codomain. When both
the domain U and codomain V are finite-dimensional, a linear equation

L(u) = v

can be solved by applying the Gaussian elimination algorithm (a.k.a. row reduction). To do
so, we need to have bases u1, . . . , un and v1, . . . , vm of U and V, respectively, so that we can
determine a matrix

    [ M¹₁ M¹₂ . . . M¹ₙ ]
M = [ M²₁ M²₂ . . . M²ₙ ]
    [  ..  ..  . .   .. ]
    [ Mᵐ₁ Mᵐ₂ . . . Mᵐₙ ]

and a column vector b = (b¹, b², . . . , bᵐ)ᵀ that serve to represent L and v, relative to the
chosen bases, i.e.

L(uᵢ) = M¹ᵢ v1 + M²ᵢ v2 + . . . + Mᵐᵢ vm,    i = 1, . . . , n,
v = b¹ v1 + b² v2 + . . . + bᵐ vm.

We are then able to re-express our linear equation as the following problem: find all n-tuples
of scalars x¹, . . . , xⁿ such that

M¹₁ x¹ + M¹₂ x² + . . . + M¹ₙ xⁿ = b¹
M²₁ x¹ + M²₂ x² + . . . + M²ₙ xⁿ = b²
  ...
Mᵐ₁ x¹ + Mᵐ₂ x² + . . . + Mᵐₙ xⁿ = bᵐ

or quite simply as the matrix-vector equation

Mx = b,

where x is the n-place column vector of unknowns.

Note that the dimension of the domain is the number of variables, while the dimension
of the codomain is the number of equations. The equation is called under-determined or
over-determined depending on whether the former is greater than the latter, or vice versa.
In general, over-determined systems are inconsistent, while under-determined ones have multiple solutions. However, this is a rule of thumb only, and exceptions are not hard to find.
A full understanding of consistency, and multiple solutions relies on the notions of kernel,
image, rank, and is described by the rank-nullity theorem.
Notes. Elementary applications focus exclusively on the coefficient matrix and the right-hand vector, and neglect to mention the underlying linear mapping. This is unfortunate,
because the concept of a linear equation is much more general than the traditional notion
of variables and equations, and relies in an essential way on the idea of a linear mapping.
The attached example regarding polynomial interpolation is a case in point. Polynomial
interpolation is a linear problem, but one that is specified abstractly, rather than in terms
of variables and equations.
Version: 4 Owner: rmilson Author(s): rmilson

221.3

homogeneous linear problem

Let $L : U \to V$ be a linear mapping. A linear equation is called homogeneous if it has the form
$$L(u) = 0,\qquad u \in U.$$
A homogeneous linear problem always has a trivial solution, namely u = 0. The key issue
in homogeneous problems is, therefore, the question of the existence of non-trivial solutions,
i.e. whether or not the kernel of L is trivial, or equivalently, whether or not L is injective.
Version: 1 Owner: rmilson Author(s): rmilson

221.4

linear problem

Let $L : U \to V$ be a linear mapping, and $v \in V$ an element of the codomain. A linear equation, a.k.a. a linear problem, is the constraint
$$L(u) = v,$$
imposed upon elements of the domain $u \in U$. The solution of a linear equation is the set of $u \in U$ that satisfy the above constraint, i.e. the pre-image $L^{-1}(v)$. The equation is called inconsistent if no solutions exist, i.e. if the pre-image is the empty set. It is otherwise called consistent.
The general solution of a linear equation has the form
$$u = u_p + u_h,\qquad u_p, u_h \in U,$$
where
$$L(u_p) = v$$
is a particular solution and where
$$L(u_h) = 0$$
is any solution of the corresponding homogeneous problem, i.e. an element of the kernel of $L$.

Notes. Elementary treatments of linear algebra focus almost exclusively on finite-dimensional linear problems. They neglect to mention the underlying mapping, preferring to focus instead on variables and equations. However, the scope of the general concept is considerably wider, e.g. linear differential equations such as
$$y'' + y = 0.$$
Version: 2 Owner: rmilson Author(s): rmilson

221.5

reduced row echelon form

For a matrix to be in reduced row echelon form it has to first satisfy the requirements to be in row echelon form and additionally satisfy the following requirements:
1. The first non-zero element in any row must be 1.
2. The first element of value 1 in any row must be the only non-zero value in its column.
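The two requirements can be enforced mechanically; the following Python sketch (illustrative, not the entry's own code) reduces a matrix to reduced row echelon form using exact rational arithmetic:

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix to reduced row echelon form.

    Exact-arithmetic sketch: scale each pivot to 1 and clear its
    entire column, moving zero rows to the bottom as a side effect."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    lead = 0
    for r in range(m):
        if lead >= n:
            break
        # find a row with a non-zero entry in the current column
        i = r
        while A[i][lead] == 0:
            i += 1
            if i == m:
                i, lead = r, lead + 1
                if lead == n:
                    return A
        A[i], A[r] = A[r], A[i]
        # scale the pivot row so the leading entry is 1  (requirement 1)
        A[r] = [x / A[r][lead] for x in A[r]]
        # clear the pivot column in every other row      (requirement 2)
        for i in range(m):
            if i != r and A[i][lead] != 0:
                A[i] = [a - A[i][lead] * b for a, b in zip(A[i], A[r])]
        lead += 1
    return A
```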


An example of a matrix in reduced row echelon form could be
$$\begin{pmatrix}
0 & 1 & 2 & 6 & 0 & 1 & 0 & 0 & 4 & 0\\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 4 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$
Version: 1 Owner: imran Author(s): imran

221.6

row echelon form

A matrix is said to be in row echelon form when each row in the matrix starts with more zeros than the previous row. Rows which are composed completely of zeros are grouped at the bottom of the matrix.
Examples of matrices in row echelon form include
$$\begin{pmatrix} 0 & 2 & 1\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix},\qquad
\begin{pmatrix} 5 & 0 & 1 & 3 & 2\\ 0 & 0 & 4 & 1 & 0\\ 0 & 0 & 0 & 0 & 7 \end{pmatrix}$$
If several rows have the same number of leading zeros then the matrix is not in row echelon form unless the rows in question contain no non-zero values.
Version: 2 Owner: imran Author(s): imran

221.7

under-determined polynomial interpolation

Consider the following interpolation problem:

Given $x_1, y_1, x_2, y_2 \in \mathbb{R}$ with $x_1 \neq x_2$, determine all cubic polynomials
$$p(x) = a x^3 + b x^2 + c x + d,\qquad x, a, b, c, d \in \mathbb{R},$$
such that
$$p(x_1) = y_1,\qquad p(x_2) = y_2.$$

This is a linear problem. Let $P_3$ denote the vector space of cubic polynomials. The underlying linear mapping is the multi-evaluation mapping
$$E : P_3 \to \mathbb{R}^2,$$
given by
$$p \mapsto \begin{pmatrix} p(x_1)\\ p(x_2) \end{pmatrix},\qquad p \in P_3.$$
The interpolation problem in question is represented by the equation
$$E(p) = \begin{pmatrix} y_1\\ y_2 \end{pmatrix},$$
where $p \in P_3$ is the unknown. One can recast the problem into the traditional form by taking standard bases of $P_3$ and $\mathbb{R}^2$ and then seeking all possible $a, b, c, d \in \mathbb{R}$ such that
$$\begin{pmatrix} (x_1)^3 & (x_1)^2 & x_1 & 1\\ (x_2)^3 & (x_2)^2 & x_2 & 1 \end{pmatrix}
\begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix} = \begin{pmatrix} y_1\\ y_2 \end{pmatrix}.$$
However, it is best to treat this problem at an abstract level, rather than mucking about with row reduction. The Lagrange interpolation formula gives us a particular solution, namely the linear polynomial
$$p_0(x) = \frac{x - x_2}{x_1 - x_2}\, y_1 + \frac{x - x_1}{x_2 - x_1}\, y_2,\qquad x \in \mathbb{R}.$$

The general solution of our interpolation problem is therefore given as $p_0 + q$, where $q \in P_3$ is a solution of the homogeneous problem
$$E(q) = 0.$$
A basis of solutions for the latter is, evidently,
$$q_1(x) = (x - x_1)(x - x_2),\qquad q_2(x) = x\, q_1(x),\qquad x \in \mathbb{R}.$$
The general solution to our interpolation problem is therefore
$$p(x) = \frac{x - x_2}{x_1 - x_2}\, y_1 + \frac{x - x_1}{x_2 - x_1}\, y_2 + (a x + b)(x - x_1)(x - x_2),\qquad x \in \mathbb{R},$$
with $a, b \in \mathbb{R}$ arbitrary. The general under-determined interpolation problem is treated in an entirely analogous manner.
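The two-parameter family of solutions can be checked mechanically; a short Python sketch (illustrative code with sample data, not part of the entry):

```python
from fractions import Fraction as F

def general_solution(x1, y1, x2, y2, a, b):
    """The general solution of the interpolation problem: the Lagrange
    particular solution plus the homogeneous family (ax+b)(x-x1)(x-x2)."""
    def p(x):
        p0 = y1 * (x - x2) / (x1 - x2) + y2 * (x - x1) / (x2 - x1)
        return p0 + (a * x + b) * (x - x1) * (x - x2)
    return p

# any choice of the free parameters a, b still interpolates the data
x1, y1, x2, y2 = F(0), F(5), F(2), F(9)
for a, b in [(F(0), F(0)), (F(1), F(-3)), (F(7), F(2))]:
    p = general_solution(x1, y1, x2, y2, a, b)
    assert p(x1) == y1 and p(x2) == y2
```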
Version: 1 Owner: rmilson Author(s): rmilson


Chapter 222
15A09 Matrix inversion, generalized
inverses
222.1

matrix adjoint

The adjoint (or classical adjoint¹, or adjugate) $A^*$ or $\operatorname{adj}(A)$ of a square matrix $A$ is given by
$$A^*_{ij} = \operatorname{cof}_{ji}(A)$$
where $\operatorname{cof}_{ji}(A)$ denotes the $(j,i)$th cofactor of $A$.
The adjoint is closely related to the matrix inverse, as
$$A A^* = \det(A)\, I$$
characterizes $A^*$ for $A$ invertible.

222.1.1

Property

Let $A$ be invertible and let
$$p(t) = \det(tI - A) = t^n - p_1(A)\, t^{n-1} + \dots + (-1)^n \det(A)$$
be the characteristic polynomial of $A$, where $p_1(A), p_2(A), \dots, p_n(A) = \det(A)$ are the fundamental invariant polynomials of $A$².
From $p(A) = 0$ (by the Cayley-Hamilton theorem) we get that
$$A\left(A^{n-1} - p_1(A) A^{n-2} + \dots + (-1)^{n-1} p_{n-1}(A)\, I\right) = (-1)^{n-1} \det(A)\, I,$$
so we have
$$A^* = p_{n-1}(A)\, I - p_{n-2}(A)\, A + p_{n-3}(A)\, A^2 - \dots + (-1)^{n-2} p_1(A)\, A^{n-2} + (-1)^{n-1} A^{n-1}.$$

¹ This term serves to distinguish this sense from the conjugate-transpose sense over the complexes, which is more recent.
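The defining relations can be checked directly on a small example; a Python sketch (illustrative helper names, exact arithmetic) that builds the adjugate from cofactors and verifies $AA^* = \det(A)I$:

```python
from fractions import Fraction as F

def det(M):
    """Determinant via cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def adjugate(M):
    """adj(A)_{ij} = cof_{ji}(A): signed determinant of the minor
    obtained by deleting row j and column i."""
    n = len(M)
    return [[(-1) ** (i + j) * det([r[:i] + r[i+1:]
                                    for k, r in enumerate(M) if k != j])
             for j in range(n)] for i in range(n)]

# verify A * adj(A) = det(A) * I on a 2x2 example
A = [[F(1), F(2)], [F(3), F(4)]]
adjA = adjugate(A)
prod = [[sum(A[i][k] * adjA[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert prod == [[det(A), 0], [0, det(A)]]
```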
Version: 7 Owner: akrowne Author(s): akrowne

222.2

matrix inverse

222.2.1

Basics

The inverse of an $n \times n$ matrix $A$ is denoted by $A^{-1}$. The inverse is defined so that
$$A A^{-1} = A^{-1} A = I_n,$$
where $I_n$ is the $n \times n$ identity matrix.
It should be stressed that only square matrices have inverses proper; however, a matrix of any dimension may have left and right inverses (which will not be discussed here).
A precondition for the existence of the matrix inverse $A^{-1}$ is that $\det A \neq 0$ (the determinant is nonzero), the reason for which we will see in a second.
The general form of the inverse of a matrix $A$ is
$$A^{-1} = \frac{1}{\det(A)} (A^*)^T,$$
where $A^*$ is the matrix of cofactors of $A$ (so that $(A^*)^T$ is the adjugate). This can be thought of as a generalization of the $2 \times 2$ formula given in the next section. However, due to the inclusion of the determinant in the expression, it is impractical to actually use this to calculate inverses.
² Note that $p_1(A) = \operatorname{tr}(A)$, the trace of $A$.


This general form also explains why the determinant must be nonzero for invertibility: we are dividing through by its value.

222.2.2

Calculating the Inverse

Method 1:
An easy way to calculate the inverse of a matrix by hand is to form an augmented matrix
[A|I] from A and In , then use Gaussian elimination to transform the left half into I. At the
end of this procedure, the right half of the augmented matrix will be A1 (that is, you will
be left with [I|A1 ]).
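Method 1 is easy to transcribe; here is a minimal Python sketch (illustrative code, exact arithmetic, assuming the matrix is invertible) that row-reduces $[A \mid I]$ into $[I \mid A^{-1}]$:

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I].

    Assumes A is invertible; uses exact fractions for clarity rather
    than numerical robustness."""
    n = len(M)
    # form the augmented matrix [A | I]
    A = [[Fraction(M[i][j]) for j in range(n)] +
         [Fraction(1 if i == j else 0) for j in range(n)] for i in range(n)]
    for col in range(n):
        # choose a non-zero pivot and normalize its row
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        # clear the column everywhere else
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[col])]
    # the right half is now the inverse
    return [row[n:] for row in A]
```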
Method 2:
One can calculate the $(i,j)$th element of the inverse by using the formula
$$A^{-1}_{ji} = \operatorname{cof}_{ij}(A) / \det A,$$
where $\operatorname{cof}_{ij}(A)$ is the $(i,j)$th cofactor of the matrix $A$.
Note that the indices on the left-hand side are swapped relative to the right-hand side. This has the effect of the transpose which appears in the general form of the inverse in the previous section.
2-by-2 case:
For the $2 \times 2$ case, the general formula reduces to a memorable shortcut. For the $2 \times 2$ matrix
$$M = \begin{pmatrix} a & b\\ c & d \end{pmatrix},$$
the inverse $M^{-1}$ is always
$$M^{-1} = \frac{1}{\det M} \begin{pmatrix} d & -b\\ -c & a \end{pmatrix},$$
where $\det M$ is simply $ad - bc$.


Remarks
Some caveats: computing the matrix inverse for ill-conditioned matrices is error-prone; special care must be taken, and there are sometimes special algorithms for calculating the inverse of certain classes of matrices (for example, Hilbert matrices).

222.2.3

Avoiding the Inverse and Numerical Calculation

The need to find the matrix inverse depends on the situation whether done by hand or by
computer, and whether the matrix is simply a part of some equation or expression or not.
Instead of computing the matrix A1 as part of an equation or expression, it is nearly always
better to use a matrix factorization instead. For example, when solving the system Ax = b,
actually calculating A1 to get x = A1 b is discouraged. LU-factorization is typically used
instead.
We can even use this fact to speed up our calculation of the inverse by itself. We can cast
the problem as finding X in
AX = B
For n n matrices A, X, and B (where X = A1 and B = In ). To solve this, we first find
the LU decomposition of A, then iterate over the columns, solving Ly = P bk and Uxk = y
each time (k = 1 . . . n). The resulting values for xk are then the columns of A1 .

222.2.4

Elements of Invertible Matrices

Typically the matrix elements are members of a field when we are speaking of inverses (e.g. the reals or the complex numbers). However, the matrix inverse may exist in the case of the elements being members of a commutative ring, provided that the determinant of the matrix is a unit in the ring.

222.2.5

References

Golub and Van Loan, Matrix Computations, Johns Hopkins Univ. Press, 1996.
Matrix Math, http://easyweb.easynet.co.uk/ mrmeanie/matrix/matrices.htm
Version: 7 Owner: akrowne Author(s): akrowne


Chapter 223
15A12 Conditioning of matrices
223.1

singular

An n n matrix A is called singular if its rows or columns are not linearly independent.
This is equivalent to the following conditions:
The determinant det(A) = 0.
The rank of A is less than n.
The nullity of A is greater than zero (null(A) > 0).
The homogeneous linear system Ax = 0 has a non-trivial solution.
More generally, any endomorphism of a finite dimensional vector space with a non-trivial
kernel is a singular transformation.
Because a singular matrix A has det(A) = 0, it is not invertible.
Version: 5 Owner: akrowne Author(s): akrowne


Chapter 224
15A15 Determinants, permanents,
other special matrix functions
224.1

Cayley-Hamilton theorem

Let T be a linear operator on a finite-dimensional vector space V , and let c(t) be the
characteristic polynomial of T . Then c(T ) = T0 , where T0 is the zero transformation. In
other words, T satisfies its own characteristic equation.
Version: 1 Owner: akrowne Author(s): akrowne

224.2

Cramer's rule

Let $Ax = b$ be the matrix form of a system of $n$ linear equations in $n$ unknowns, with $x$ and $b$ as $n \times 1$ column vectors and $A$ an $n \times n$ matrix. If $\det(A) \neq 0$, then this system has a unique solution, and for each $i$ ($1 \leq i \leq n$),
$$x_i = \frac{\det(M_i)}{\det(A)},$$
where $M_i$ is $A$ with column $i$ replaced by $b$.
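The rule translates directly into code; a Python sketch (illustrative, exact arithmetic, exponential-cost determinant, so for small systems only):

```python
from fractions import Fraction as F

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return F(M[0][0])
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def cramer(A, b):
    """Solve A x = b via Cramer's rule: x_i = det(M_i) / det(A),
    where M_i is A with column i replaced by b. Assumes det(A) != 0."""
    d = det(A)
    n = len(A)
    return [det([row[:i] + [b[k]] + row[i+1:] for k, row in enumerate(A)]) / d
            for i in range(n)]
```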


Version: 5 Owner: akrowne Author(s): akrowne


224.3

cofactor expansion

Let $M = (M_{ij})$ be an $n \times n$ matrix with entries in some field of scalars. Let $m_{ij}(M)$ denote the $(n-1) \times (n-1)$ submatrix obtained by deleting row $i$ and column $j$ of $M$, and set
$$\operatorname{cof}_{ij}(M) = (-1)^{i+j} \det(m_{ij}(M)).$$
The matrices $m_{ij}(M)$ are called the minors of $M$, and the scalars $\operatorname{cof}_{ij}(M)$ the cofactors.
The usual definition of the determinant
$$\det(M) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, M_{1\sigma(1)} M_{2\sigma(2)} \cdots M_{n\sigma(n)} \tag{224.3.1}$$
implies the following representation of the determinant in terms of cofactors. For every $j = 1, 2, \dots, n$ we have
$$\det(M) = \sum_{i=1}^n M_{ij} \operatorname{cof}_{ij}(M),$$
$$\det(M) = \sum_{i=1}^n M_{ji} \operatorname{cof}_{ji}(M).$$
These identities are called, respectively, the cofactor expansion of the determinant along the $j$th column, and along the $j$th row.
Example. Consider a general $3 \times 3$ determinant
$$\begin{vmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{vmatrix}
= a_1 b_2 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_1 b_3 c_2 - a_3 b_2 c_1 - a_2 b_1 c_3.$$
The above can equally well be expressed as a cofactor expansion along the first row:
$$\begin{vmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{vmatrix}
= a_1 \begin{vmatrix} b_2 & b_3\\ c_2 & c_3 \end{vmatrix}
- a_2 \begin{vmatrix} b_1 & b_3\\ c_1 & c_3 \end{vmatrix}
+ a_3 \begin{vmatrix} b_1 & b_2\\ c_1 & c_2 \end{vmatrix}
= a_1(b_2 c_3 - b_3 c_2) - a_2(b_1 c_3 - b_3 c_1) + a_3(b_1 c_2 - b_2 c_1);$$

or along the second column:
$$\begin{vmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{vmatrix}
= -a_2 \begin{vmatrix} b_1 & b_3\\ c_1 & c_3 \end{vmatrix}
+ b_2 \begin{vmatrix} a_1 & a_3\\ c_1 & c_3 \end{vmatrix}
- c_2 \begin{vmatrix} a_1 & a_3\\ b_1 & b_3 \end{vmatrix}
= -a_2(b_1 c_3 - b_3 c_1) + b_2(a_1 c_3 - a_3 c_1) - c_2(a_1 b_3 - a_3 b_1);$$
or indeed as four other such expansions corresponding to rows 2 and 3, and columns 1 and 3.
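All six expansions agree, which is easy to confirm mechanically; a Python sketch (illustrative, naive $O(n!)$ recursion for small matrices only):

```python
def minor(M, i, j):
    """Delete row i and column j of M."""
    return [row[:j] + row[j+1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j))
               for j in range(len(M)))

def expand_along_column(M, j):
    """The column-j cofactor expansion; it equals det(M) for every j."""
    return sum((-1) ** (i + j) * M[i][j] * det(minor(M, i, j))
               for i in range(len(M)))

M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
assert all(expand_along_column(M, j) == det(M) for j in range(3))
```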
Version: 7 Owner: rmilson Author(s): rmilson, [a] p

224.4

determinant

Overview
The determinant is an algebraic operation that transforms a square matrix into a scalar.
This operation has many useful and important properties. For example, the determinant
is zero if and only if the corresponding system of homogeneous equations is singular. The
determinant also has important geometric applications, because it describes the area of a
parallelogram, and more generally the volume of a parallelepiped.
The notion of determinant predates matrices and linear transformations. Originally, the
determinant was a number associated to a system of n linear equations in n variables. This
number determined whether the system was singular; i.e., possessed multiple solutions.
In this sense, two-by-two determinants were considered by Cardano at the end of the 16th
century and ones of arbitrary size (see the definition below) by Leibniz about 100 years later.
Definition
Let $M = (M_{ij})$ be an $n \times n$ matrix with entries in some field of scalars¹. The scalar
$$\begin{vmatrix}
M_{11} & M_{12} & \dots & M_{1n}\\
M_{21} & M_{22} & \dots & M_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
M_{n1} & M_{n2} & \dots & M_{nn}
\end{vmatrix}
= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, M_{1\sigma(1)} M_{2\sigma(2)} \cdots M_{n\sigma(n)} \tag{224.4.1}$$
is called the determinant of $M$, or $\det(M)$ for short. The index $\sigma$ in the above sum varies over all the permutations of $\{1, \dots, n\}$ (i.e., the elements of the symmetric group $S_n$). Hence, there are $n!$ terms in the defining sum of the determinant. The symbol $\operatorname{sgn}(\sigma)$ denotes the parity of the permutation; it is $\pm 1$ according to whether $\sigma$ is an even or odd permutation.
By way of example, the determinant of a $2 \times 2$ matrix is given by
$$\begin{vmatrix} M_{11} & M_{12}\\ M_{21} & M_{22} \end{vmatrix} = M_{11} M_{22} - M_{12} M_{21}.$$
There are six permutations of the numbers 1, 2, 3, namely
$$\overset{+}{123},\quad \overset{+}{231},\quad \overset{+}{312},\quad \overset{-}{132},\quad \overset{-}{321},\quad \overset{-}{213};$$
the overset sign indicates the permutation's signature. Accordingly, the $3 \times 3$ determinant is a sum of the following 6 terms:
$$\begin{vmatrix} M_{11} & M_{12} & M_{13}\\ M_{21} & M_{22} & M_{23}\\ M_{31} & M_{32} & M_{33} \end{vmatrix}
= M_{11} M_{22} M_{33} + M_{12} M_{23} M_{31} + M_{13} M_{21} M_{32}
- M_{11} M_{23} M_{32} - M_{13} M_{22} M_{31} - M_{12} M_{21} M_{33}.$$
¹ Most scientific and geometric applications deal with matrices made up of real or complex numbers. However, most properties of the determinant remain valid for matrices with entries in a commutative ring.

Remarks and important properties


1. The determinant operation converts matrix multiplication into scalar multiplication;
det(AB) = det(A) det(B),
where A, B are square matrices of the same size.
2. The determinant operation is multi-linear, and anti-symmetric with respect to the matrix's rows and columns. See the multi-linearity attachment for more details.
3. The determinant of a lower triangular or an upper triangular matrix is the product of the diagonal entries, since all the other summands in (224.4.1) are zero.
4. Similar matrices have the same determinant. To be more precise, let $A$ and $X$ be square matrices with $X$ invertible. Then
$$\det(X A X^{-1}) = \det(A).$$
In particular, if we let $X$ be the matrix representing a change of basis, this shows that the determinant is independent of the basis. The same is true of the trace of a matrix. In fact, the whole characteristic polynomial of an endomorphism is definable without using a basis or a matrix, and it turns out that the determinant and trace are two of its coefficients.
5. The determinant of a matrix A is zero if and only if A is singular; that is, if there
exists a non-trivial solution to the homogeneous equation
Ax = 0.
6. The transpose operation does not change the determinant:
$$\det A^T = \det A.$$
7. The determinant of a diagonalizable transformation is equal to the product of its
eigenvalues, counted with multiplicities.
8. The determinant is homogeneous of degree $n$. This means that
$$\det(kM) = k^n \det M,\qquad k \text{ a scalar}.$$

Version: 13 Owner: matte Author(s): matte, Larry Hammick, rmilson


224.5

determinant as a multilinear mapping

Let $M = (M_{ij})$ be an $n \times n$ matrix with entries in a field $K$. The matrix $M$ is really the same thing as a list of $n$ column vectors of size $n$. Consequently, the determinant operation may be regarded as a mapping
$$\det : \underbrace{K^n \times \dots \times K^n}_{n \text{ times}} \to K.$$
The determinant of a matrix $M$ is then defined to be $\det(M_1, \dots, M_n)$, where $M_j \in K^n$ denotes the $j$th column of $M$.
Starting with the definition
$$\det(M_1, \dots, M_n) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, M_{1\sigma(1)} M_{2\sigma(2)} \cdots M_{n\sigma(n)} \tag{224.5.1}$$

the following properties are easily established:


1. the determinant is multilinear;
2. the determinant is anti-symmetric;
3. the determinant of the identity matrix is 1.
These three properties uniquely characterize the determinant, and indeed can (some would say should) be used as the definition of the determinant operation.
Let us prove this. We proceed by representing elements of $K^n$ as linear combinations of
$$e_1 = \begin{pmatrix} 1\\ 0\\ 0\\ \vdots\\ 0 \end{pmatrix},\quad
e_2 = \begin{pmatrix} 0\\ 1\\ 0\\ \vdots\\ 0 \end{pmatrix},\quad \dots,\quad
e_n = \begin{pmatrix} 0\\ 0\\ 0\\ \vdots\\ 1 \end{pmatrix},$$
the standard basis of $K^n$. Let $M$ be an $n \times n$ matrix. The $j$th column is represented as $\sum_i M_{ij} e_i$; whence using multilinearity
$$\det(M) = \det\left(\sum_i M_{i1} e_i,\ \sum_i M_{i2} e_i,\ \dots,\ \sum_i M_{in} e_i\right)
= \sum_{i_1, \dots, i_n = 1}^n M_{i_1 1} M_{i_2 2} \cdots M_{i_n n}\, \det(e_{i_1}, e_{i_2}, \dots, e_{i_n}).$$


The anti-symmetry assumption implies that the expressions $\det(e_{i_1}, e_{i_2}, \dots, e_{i_n})$ vanish if any two of the indices $i_1, \dots, i_n$ coincide. If all $n$ indices are distinct,
$$\det(e_{i_1}, e_{i_2}, \dots, e_{i_n}) = \pm \det(e_1, \dots, e_n),$$
the sign in the above expression being determined by the number of transpositions required to rearrange the list $(i_1, \dots, i_n)$ into the list $(1, \dots, n)$. The sign is therefore the parity of the permutation $(i_1, \dots, i_n)$. Since we also assume that
$$\det(e_1, \dots, e_n) = 1,$$
we now recover the original definition (224.5.1).
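The defining sum (224.5.1) can be transcribed directly; a Python sketch (illustrative, $O(n \cdot n!)$, small matrices only):

```python
from itertools import permutations
from math import prod

def sign(perm):
    """Signature of a permutation (0-based tuple), via inversion count."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(M):
    """The determinant straight from the defining sum over S_n."""
    n = len(M)
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

# swapping two columns flips the sign, as anti-symmetry demands
assert det([[1, 2], [3, 4]]) == -det([[2, 1], [4, 3]])
```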
Version: 1 Owner: rmilson Author(s): rmilson

224.6

determinants of some matrices of special form

Suppose $A$ is an $n \times n$ square matrix, $u, v$ are two column $n$-vectors, and $\lambda$ is a scalar. Then
$$\det(A + u v^T) = \det A + v^T \operatorname{adj}(A)\, u,$$
$$\det \begin{pmatrix} A & u\\ v^T & \lambda \end{pmatrix} = \lambda \det A - v^T \operatorname{adj}(A)\, u,$$
where $\operatorname{adj} A$ is the adjugate of $A$.

REFERENCES
1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical Society, 1994.

Version: 2 Owner: bwebste Author(s): matte

224.7

example of Cramer's rule

Say we want to solve the system
$$\begin{aligned}
3x + 2y + z - 2w &= 4\\
2x - y + 2z - 5w &= 15\\
4x + 2y - w &= 1\\
3x - 2z - 4w &= 1.
\end{aligned}$$
The associated matrix is
$$\begin{pmatrix}
3 & 2 & 1 & -2\\
2 & -1 & 2 & -5\\
4 & 2 & 0 & -1\\
3 & 0 & -2 & -4
\end{pmatrix},$$
whose determinant is $\Delta = -65$. Since the determinant is non-zero, we can use Cramer's rule. To obtain the value of the $k$-th variable, we replace the $k$-th column of the matrix above by the column vector
$$\begin{pmatrix} 4\\ 15\\ 1\\ 1 \end{pmatrix},$$
the determinant of the obtained matrix is divided by $\Delta$, and the resulting value is the wanted solution.
So
$$x = \frac{\Delta_1}{\Delta} = \frac{1}{\Delta}\begin{vmatrix} 4 & 2 & 1 & -2\\ 15 & -1 & 2 & -5\\ 1 & 2 & 0 & -1\\ 1 & 0 & -2 & -4 \end{vmatrix} = \frac{-65}{-65} = 1,$$
$$y = \frac{\Delta_2}{\Delta} = \frac{1}{\Delta}\begin{vmatrix} 3 & 4 & 1 & -2\\ 2 & 15 & 2 & -5\\ 4 & 1 & 0 & -1\\ 3 & 1 & -2 & -4 \end{vmatrix} = \frac{130}{-65} = -2,$$
$$z = \frac{\Delta_3}{\Delta} = \frac{1}{\Delta}\begin{vmatrix} 3 & 2 & 4 & -2\\ 2 & -1 & 15 & -5\\ 4 & 2 & 1 & -1\\ 3 & 0 & 1 & -4 \end{vmatrix} = \frac{-195}{-65} = 3,$$
$$w = \frac{\Delta_4}{\Delta} = \frac{1}{\Delta}\begin{vmatrix} 3 & 2 & 1 & 4\\ 2 & -1 & 2 & 15\\ 4 & 2 & 0 & 1\\ 3 & 0 & -2 & 1 \end{vmatrix} = \frac{65}{-65} = -1.$$
Version: 3 Owner: drini Author(s): drini
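The five determinants in this example can be double-checked mechanically; a Python sketch (illustrative code, naive cofactor expansion, using the associated matrix and right-hand side of the system above):

```python
from fractions import Fraction as F

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

A = [[3, 2, 1, -2], [2, -1, 2, -5], [4, 2, 0, -1], [3, 0, -2, -4]]
b = [4, 15, 1, 1]
delta = det(A)                      # the determinant of the system
numerators = [det([row[:i] + [b[k]] + row[i+1:] for k, row in enumerate(A)])
              for i in range(4)]    # column i replaced by b
solution = [F(n, delta) for n in numerators]
assert solution == [1, -2, 3, -1]
```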

224.8

proof of Cramer's rule

Since $\det(A) \neq 0$, by properties of the determinant we know that $A$ is invertible.
We claim that this implies that the equation $Ax = b$ has a unique solution. Note that $A^{-1}b$ is a solution since $A(A^{-1}b) = (AA^{-1})b = b$, so we know that a solution exists.
Let $s$ be an arbitrary solution to the equation, so $As = b$. But then $s = (A^{-1}A)s = A^{-1}(As) = A^{-1}b$, so we see that $A^{-1}b$ is the only solution.
For each integer $i$, $1 \leq i \leq n$, let $a_i$ denote the $i$th column of $A$, let $e_i$ denote the $i$th column of the identity matrix $I_n$, and let $X_i$ denote the matrix obtained from $I_n$ by replacing column $i$ with the column vector $x$.
We know that for any matrices $A, B$ the $k$th column of the product $AB$ is simply the product of $A$ and the $k$th column of $B$. Also observe that $Ae_k = a_k$ for $k = 1, \dots, n$. Thus, by multiplication, we have:
$$\begin{aligned}
A X_i &= A(e_1, \dots, e_{i-1}, x, e_{i+1}, \dots, e_n)\\
&= (Ae_1, \dots, Ae_{i-1}, Ax, Ae_{i+1}, \dots, Ae_n)\\
&= (a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n)\\
&= M_i.
\end{aligned}$$
Since $X_i$ is $I_n$ with column $i$ replaced with $x$, computing the determinant of $X_i$ with cofactor expansion gives
$$\det(X_i) = (-1)^{i+i} x_i \det(I_{n-1}) = 1 \cdot x_i \cdot 1 = x_i.$$
Thus by the multiplicative property of the determinant,
$$\det(M_i) = \det(A X_i) = \det(A) \det(X_i) = \det(A)\, x_i,$$
and so $x_i = \frac{\det(M_i)}{\det(A)}$ as required.

Version: 8 Owner: saforres Author(s): saforres

224.9

proof of cofactor expansion

Let $M \in \operatorname{Mat}_n(K)$ be an $n \times n$ matrix with entries from a commutative field $K$. Let $e_1, \dots, e_n$ denote the vectors of the canonical basis of $K^n$. For the proof we need the following lemma: Let $M_{ij}$ be the matrix generated by replacing the $i$-th row of $M$ by $e_j$. Then
$$\det M_{ij} = (-1)^{i+j} \det \hat{M}_{ij},$$
where $\hat{M}_{ij}$ is the $(n-1) \times (n-1)$ matrix obtained from $M$ by removing its $i$-th row and $j$-th column.
Proof: By adding appropriate multiples of the $i$-th row of $M_{ij}$ to its remaining rows we obtain a matrix with 1 at position $(i, j)$ and 0 at positions $(k, j)$ ($k \neq i$). Now we apply the permutation
$$(12)(23)\cdots((i-1)\,i)$$
to rows and
$$(12)(23)\cdots((j-1)\,j)$$
to columns of the matrix. The matrix now looks like this:

row/column 1 is the vector $e_1$;

under row 1 and right of column 1 is the matrix $\hat{M}_{ij}$.

Since the determinant has changed its sign $i + j - 2$ times, we have
$$\det M_{ij} = (-1)^{i+j} \det \hat{M}_{ij}.$$
Note also that in computing the determinant of $M_{ij}$, only those permutations $\sigma \in S_n$ are effective for which $\sigma(i) = j$.
Now we start out with
$$\det M = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{j=1}^n m_{j\sigma(j)}
= \sum_{k=1}^n m_{ik} \left( \sum_{\substack{\sigma \in S_n\\ \sigma(i) = k}} \operatorname{sgn}(\sigma) \prod_{\substack{1 \le j \le n\\ j \neq i}} m_{j\sigma(j)} \right).$$
From the previous lemma, it follows that the inner sum associated with $m_{ik}$ is the determinant of $M_{ik}$. So we have
$$\det M = \sum_{k=1}^n m_{ik} (-1)^{i+k} \det \hat{M}_{ik}.$$

Version: 2 Owner: Thomas Heye Author(s): Thomas Heye

224.10

resolvent matrix

The resolvent matrix of a matrix $A$ is defined as
$$R_A(s) = (sI - A)^{-1}.$$
Note: $I$ is the identity matrix and $s$ is a complex variable. Also note that $R_A(s)$ is undefined on $\operatorname{Sp}(A)$ (the spectrum of $A$).
Version: 2 Owner: Johan Author(s): Johan

Chapter 225
15A18 Eigenvalues, singular values,
and eigenvectors
225.1

Jordan canonical form theorem

Let V be a finite-dimensional vector space over a field F and t : V V be a linear transformation.


Then, if the characteristic polynomial factorizes completely over $F$, there will exist a basis of $V$ with respect to which the matrix of $t$ is of the form
$$\begin{pmatrix}
J_1 & 0 & \dots & 0\\
0 & J_2 & \dots & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \dots & J_k
\end{pmatrix},$$
where each $J_i$ is a reduced (Jordan) matrix in which $\lambda = \lambda_i$.

A Jordan block or Jordan matrix is a matrix of the form
$$\begin{pmatrix}
\lambda & 1 & 0 & \dots & 0\\
0 & \lambda & 1 & \dots & 0\\
\vdots & & \ddots & \ddots & \vdots\\
0 & \dots & 0 & \lambda & 1\\
0 & \dots & 0 & 0 & \lambda
\end{pmatrix},$$
with a constant value $\lambda$ along the diagonal and 1s on the superdiagonal. Some texts place the 1s on the subdiagonal instead.
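The block structure is simple to build programmatically; a Python sketch (illustrative helper names, not part of the entry):

```python
def jordan_block(lam, n):
    """n-by-n Jordan block: lam on the diagonal, 1s on the superdiagonal."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(n)]
            for i in range(n)]

def block_diag(*blocks):
    """Assemble the block-diagonal matrix diag(J_1, ..., J_k)."""
    n = sum(len(B) for B in blocks)
    J = [[0] * n for _ in range(n)]
    offset = 0
    for B in blocks:
        for i, row in enumerate(B):
            J[offset + i][offset:offset + len(B)] = row
        offset += len(B)
    return J
```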
Version: 3 Owner: vypertd Author(s): vypertd

225.2

Lagrange multiplier method

The Lagrange multiplier method is used when one needs to find the extreme values of a
function whose domain is constrained to lie within a particular subset of the domain.
Method
Suppose that $f(x)$ and $g_i(x)$, $i = 1, \dots, m$ ($x \in \mathbb{R}^n$) are differentiable functions that map $\mathbb{R}^n \to \mathbb{R}$, and we want to solve
$$\min f(x) \text{ such that } g_i(x) = 0,\qquad i = 1, \dots, m.$$
By a calculus theorem, the gradient of $f$, $\nabla f$, must satisfy the following equation at an extremum:
$$\nabla f = \sum_{i=1}^m \lambda_i \nabla g_i.$$
Note that this is equivalent to solving the following problem:
$$\min\ f(x) - \sum_{i=1}^m \lambda_i\, g_i(x)$$
for $x$, $\lambda_i$, without restrictions.
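As a tiny worked instance (a hypothetical example, not from the entry): extremize $f(x, y) = x + y$ on the unit circle $g(x, y) = x^2 + y^2 - 1 = 0$. The stationarity condition $\nabla f = \lambda \nabla g$ gives $1 = 2\lambda x$ and $1 = 2\lambda y$, hence $x = y$, and the constraint yields the candidates $\pm(1/\sqrt{2}, 1/\sqrt{2})$. The sketch below merely verifies one candidate:

```python
from math import sqrt

# candidate point from solving grad f = lam * grad g with the constraint
x = y = 1 / sqrt(2)
lam = 1 / (2 * x)

# stationarity: grad f = (1, 1) must equal lam * grad g = lam * (2x, 2y)
assert abs(1 - lam * 2 * x) < 1e-12
assert abs(1 - lam * 2 * y) < 1e-12
# feasibility: the constraint g(x, y) = 0 holds
assert abs(x**2 + y**2 - 1) < 1e-12
```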


Version: 3 Owner: drini Author(s): drini, Riemann

225.3

Perron-Frobenius theorem

Let $A$ be a nonnegative matrix. Denote its spectrum by $\sigma(A)$. Then the spectral radius $\rho(A)$ is an eigenvalue, that is, $\rho(A) \in \sigma(A)$, and it is associated to a nonnegative eigenvector.
If, in addition, $A$ is an irreducible matrix, then $\rho(A) \geq |\lambda|$ for all $\lambda \in \sigma(A)$, and $\rho(A)$ is a simple eigenvalue associated to a positive eigenvector.
If, in addition, $A$ is a primitive matrix, then $\rho(A) > |\lambda|$ for all $\lambda \in \sigma(A)$, $\lambda \neq \rho(A)$.
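For a matrix with positive entries the Perron eigenvalue and its positive eigenvector can be approximated by power iteration; a numerical sketch (illustrative helper name, not a library routine):

```python
def perron_root(A, iterations=100):
    """Power iteration for a matrix with positive entries.

    The iterates converge to the spectral radius rho(A) and an
    eigenvector with positive entries (normalized to max entry 1)."""
    n = len(A)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)                 # with positive A this tends to rho(A)
        v = [x / lam for x in w]     # renormalize so the max entry is 1
    return lam, v
```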
Version: 2 Owner: jarino Author(s): jarino


225.4

characteristic equation

Let $A$ be an $n \times n$ matrix. The characteristic equation of $A$ is defined by
$$\det(A - \lambda I) =
\begin{vmatrix}
a_{11} - \lambda & a_{12} & \dots & a_{1n}\\
a_{21} & a_{22} - \lambda & \dots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \dots & a_{nn} - \lambda
\end{vmatrix} = 0.$$
This forms an $n$th-degree polynomial in $\lambda$, the solutions to which are called the eigenvalues of $A$.
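For the $2 \times 2$ case the characteristic equation is just the quadratic $\lambda^2 - (a+d)\lambda + (ad - bc) = 0$; a small sketch (illustrative, assuming real roots):

```python
from math import sqrt

def eigenvalues_2x2(a, b, c, d):
    """Roots of det(A - lam*I) = lam^2 - tr(A)*lam + det(A)
    for A = [[a, b], [c, d]]; assumes the discriminant is nonnegative."""
    tr, det = a + d, a * d - b * c
    disc = sqrt(tr * tr - 4 * det)
    return (tr - disc) / 2, (tr + disc) / 2
```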
Version: 1 Owner: akrowne Author(s): akrowne

225.5

eigenvalue

Let $V$ be a vector space over a field $k$, and let $A$ be an endomorphism of $V$ (meaning a linear mapping of $V$ into itself). A scalar $\lambda \in k$ is said to be an eigenvalue of $A$ if there is a nonzero $x \in V$ for which
$$Ax = \lambda x. \tag{225.5.1}$$
Geometrically, one thinks of a vector whose direction is unchanged by the action of $A$, but whose magnitude is multiplied by $\lambda$.
If $V$ is finite dimensional, elementary linear algebra shows that there are several equivalent definitions of an eigenvalue:
(2) The linear mapping
$$B = \lambda I - A,$$
i.e. $B : x \mapsto \lambda x - Ax$, has no inverse.
(3) $B$ is not injective.
(4) $B$ is not surjective.
(5) $\det(B) = 0$, i.e. $\det(\lambda I - A) = 0$.
But if $V$ is of infinite dimension, (5) has no meaning and the conditions (2), (3), and (4) are not equivalent to (1). A scalar $\lambda$ satisfying (2) (called a spectral value of $A$) need not be an eigenvalue. Consider for example the complex vector space $V$ of all sequences $(x_n)_{n=1}^{\infty}$ of complex numbers with the obvious operations, and the map $A : V \to V$ given by
$$A(x_1, x_2, x_3, \dots) = (0, x_1, x_2, x_3, \dots).$$

Zero is a spectral value of A, but clearly not an eigenvalue.


Now suppose again that $V$ is of finite dimension, say $n$. The function
$$\chi(\lambda) = \det(B)$$
is a polynomial of degree $n$ over $k$ in the variable $\lambda$, called the characteristic polynomial of the endomorphism $A$. (Note that some writers define the characteristic polynomial as $\det(A - \lambda I)$ rather than $\det(\lambda I - A)$, but the two have the same zeros.)
If $k$ is $\mathbb{C}$ or any other algebraically closed field, or if $k = \mathbb{R}$ and $n$ is odd, then $\chi$ has at least one zero, meaning that $A$ has at least one eigenvalue. In no case does $A$ have more than $n$ eigenvalues.
Although we didn't need to do so here, one can compute the coefficients of $\chi$ by introducing a basis of $V$ and the corresponding matrix for $B$. Unfortunately, computing $n \times n$ determinants and finding roots of polynomials of degree $n$ are computationally messy procedures for even moderately large $n$, so for most practical purposes variations on this naive scheme are needed. See the eigenvalue problem for more information.
If $k = \mathbb{C}$ but the coefficients of $\chi$ are real (and in particular if $V$ has a basis for which the matrix of $A$ has only real entries), then the non-real eigenvalues of $A$ appear in conjugate pairs. For example, if $n = 2$ and, for some basis, $A$ has the matrix
$$A = \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix},$$
then $\chi(\lambda) = \lambda^2 + 1$, with the two zeros $\pm i$.
Eigenvalues are of relatively little importance in connection with an infinite-dimensional
vector space, unless that space is endowed with some additional structure, typically that of
a Banach space, a Hilbert space, or a normed algebra. But in those cases the notion is of
great value in physics, engineering, and mathematics proper. Look for spectral theory for
more on that subject.
The word "eigenvalue" derives from the German Eigenwert, meaning "proper value".
Version: 7 Owner: Koro Author(s): Larry Hammick, gbglace, akrowne

225.6

eigenvalue

Let $V$ be a vector space over $k$ and $T$ a linear operator on $V$. An eigenvalue for $T$ is a scalar $\lambda$ (that is, an element of $k$) such that $T(z) = \lambda z$ for some nonzero vector $z \in V$. In that case, we also say that $z$ is an eigenvector of $T$.
This can also be expressed as follows: $\lambda$ is an eigenvalue for $T$ if the kernel of $T - \lambda I$ is non-trivial.

A linear operator can have several eigenvalues (but no more than the dimension of the space). Eigenvectors corresponding to different eigenvalues are linearly independent.
Version: 2 Owner: drini Author(s): drini, apmxi


Chapter 226
15A21 Canonical forms, reductions,
classification
226.1

companion matrix

Given a monic polynomial $p(x) = x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$, the companion matrix of $p(x)$, denoted $C_{p(x)}$, is the $n \times n$ matrix with 1s down the first subdiagonal and minus the coefficients of $p(x)$ down the last column, or alternatively, as the transpose of this matrix. Adopting the first convention this is simply
$$C_{p(x)} = \begin{pmatrix}
0 & 0 & \dots & 0 & -a_0\\
1 & 0 & \dots & 0 & -a_1\\
0 & 1 & \dots & 0 & -a_2\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \dots & 1 & -a_{n-1}
\end{pmatrix}.$$
Regardless of which convention is used, the minimal polynomial of $C_{p(x)}$ equals the characteristic polynomial of $C_{p(x)}$ and is just $p(x)$.
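A consequence worth checking by hand: since $p$ is the characteristic polynomial, $p(C_{p(x)}) = 0$. A Python sketch (illustrative helper names):

```python
def companion(coeffs):
    """Companion matrix (first convention) of
    x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0,
    with coeffs = [a_0, a_1, ..., a_{n-1}]."""
    n = len(coeffs)
    return [[(1 if i == j + 1 else 0) if j < n - 1 else -coeffs[i]
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# p(x) = x^2 - 3x + 2: check that p(C) is the zero matrix
C = companion([2, -3])
I = [[1, 0], [0, 1]]
C2 = matmul(C, C)
pC = [[C2[i][j] - 3 * C[i][j] + 2 * I[i][j] for j in range(2)]
      for i in range(2)]
assert pC == [[0, 0], [0, 0]]
```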
Version: 2 Owner: basseykay Author(s): basseykay

226.2

eigenvalues of an involution

Proof. For the first claim suppose $\lambda$ is an eigenvalue corresponding to an eigenvector $x$ of $A$. That is, $Ax = \lambda x$. Then $A^2 x = \lambda A x$, so $x = \lambda^2 x$. As an eigenvector, $x$ is non-zero, and $\lambda = \pm 1$. Now property (1) follows since the determinant is the product of the eigenvalues.
For property (2), suppose that $A - \lambda I = -\lambda A (A - \frac{1}{\lambda} I)$, where $A$ and $\lambda$ are as above. Taking the determinant of both sides, and using part (1) and the properties of the determinant, yields
$$\det(A - \lambda I) = \pm \lambda^n \det\left(A - \frac{1}{\lambda} I\right).$$
Property (2) follows. □


Version: 1 Owner: matte Author(s): matte

226.3

linear involution

Definition. Let $V$ be a vector space. A linear involution is a linear operator $L : V \to V$ such that $L^2$ is the identity operator on $V$. An equivalent definition is that a linear involution is a linear operator that equals its own inverse.
Theorem 1. Let $V$ be a vector space and let $A : V \to V$ be a linear involution. Then the eigenvalues of $A$ are $\pm 1$. Further, if $V$ is $\mathbb{C}^n$, and $A$ is an $n \times n$ complex matrix, then we have that:
1. $\det A = \pm 1$.
2. The characteristic polynomial of $A$, $p(\lambda) = \det(A - \lambda I)$, is a reciprocal polynomial, i.e.,
$$p(\lambda) = \pm \lambda^n p(1/\lambda).$$
(proof.)
The next theorem gives a correspondence between involution operators and projection operators.
Theorem 2. Let $L$ and $P$ be linear operators on a vector space $V$, and let $I$ be the identity operator on $V$. If $L$ is an involution then the operators $\frac{1}{2}(I \pm L)$ are projection operators. Conversely, if $P$ is a projection operator, then the operators $\pm(2P - I)$ are involutions.
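Theorem 2 is easy to check on a small matrix example; a Python sketch (illustrative, using a reflection as the involution):

```python
from fractions import Fraction as F

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# L is an involution (a reflection swapping coordinates): L^2 = I
L = [[F(0), F(1)], [F(1), F(0)]]
I = [[F(1), F(0)], [F(0), F(1)]]
assert matmul(L, L) == I

# P = (I + L)/2 and Q = (I - L)/2 are then projections: P^2 = P, Q^2 = Q
P = [[(I[i][j] + L[i][j]) / 2 for j in range(2)] for i in range(2)]
Q = [[(I[i][j] - L[i][j]) / 2 for j in range(2)] for i in range(2)]
assert matmul(P, P) == P and matmul(Q, Q) == Q

# conversely, 2P - I is again an involution
R = [[2 * P[i][j] - I[i][j] for j in range(2)] for i in range(2)]
assert matmul(R, R) == I
```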
The next theorem is given as exercise IV.10.14 in [2].
Theorem 3. Let $A$ be a complex $n \times n$ matrix. Then any two of the below conditions imply the third:
1. $A$ is a Hermitian matrix.
2. $A$ is a unitary matrix.
3. The mapping $A : \mathbb{C}^n \to \mathbb{C}^n$ is an involution.

The proofs of theorems 2 and 3 are straightforward calculations.

REFERENCES
1. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965

Version: 4 Owner: matte Author(s): matte

226.4

normal matrix

A complex matrix $A \in \mathbb{C}^{n \times n}$ is said to be normal if $A^H A = A A^H$, where $^H$ denotes the conjugate transpose.
Similarly, a real matrix $A \in \mathbb{R}^{n \times n}$ is said to be normal if $A^T A = A A^T$, where $^T$ denotes the transpose.
properties:
Equivalently, a complex matrix $A \in \mathbb{C}^{n \times n}$ is normal if it satisfies $[A, A^H] = 0$, where $[\cdot, \cdot]$ is the commutator bracket.
Equivalently, a real matrix $A \in \mathbb{R}^{n \times n}$ is normal if it satisfies $[A, A^T] = 0$, where $[\cdot, \cdot]$ is the commutator bracket.
Let $A$ be a square matrix (possibly complex). It follows from Schur's inequality that if $A$ is a normal matrix then $\sum_{i=1}^n |\lambda_i|^2 = \operatorname{trace}(\bar{A}^T A)$, where $\bar{\phantom{A}}$ is the complex conjugate and the $\lambda_i$ are the eigenvalues of $A$.
Let $A$ be a complex square matrix. Then $A$ is a diagonal matrix if and only if $A$ is a normal matrix and $A$ is a triangular matrix (see the theorem for normal triangular matrices).
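The defining condition can be tested directly; a Python sketch (illustrative helper names, works for real and complex entries):

```python
def ctranspose(A):
    """Conjugate transpose A^H (for real entries this is just A^T)."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_normal(A):
    """Check the defining condition A^H A = A A^H."""
    H = ctranspose(A)
    return matmul(H, A) == matmul(A, H)

assert is_normal([[1, 2], [-2, 1]])        # a rotation-scaling: normal
assert not is_normal([[1, 1], [0, 1]])     # a shear: not normal
```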
examples:
$$\begin{pmatrix} a & b\\ -b & a \end{pmatrix} \text{ where } a, b \in \mathbb{R},\qquad
\begin{pmatrix} i & 1\\ -1 & i \end{pmatrix}$$
see also:

Wikipedia, normal matrix


Version: 4 Owner: Daume Author(s): Daume

226.5

projection

A linear transformation $P : V \to V$ of a vector space $V$ is called a projection if it acts like the identity on its image. This condition can be more succinctly expressed by the equation
$$P^2 = P. \tag{226.5.1}$$
Proposition 10. If $P : V \to V$ is a projection, then its image and the kernel are complementary subspaces, namely
$$V = \ker P \oplus \operatorname{img} P. \tag{226.5.2}$$
Suppose that $P$ is a projection. Let $v \in V$ be given, and set
$$u = v - Pv.$$
The projection condition (226.5.1) then implies that $u \in \ker P$, and we can write $v$ as the sum of an image and kernel vector:
$$v = u + Pv.$$
This decomposition is unique, because the intersection of the image and the kernel is the trivial subspace. Indeed, suppose that $v \in V$ is in both the image and the kernel of $P$. Then, $Pv = v$ and $Pv = 0$, and hence $v = 0$. QED
Conversely, every direct sum decomposition
$$V = V_1 \oplus V_2$$
corresponds to a projection $P : V \to V$ defined by
$$Pv = \begin{cases} v & v \in V_1,\\ 0 & v \in V_2. \end{cases}$$
Specializing somewhat, suppose that the ground field is $\mathbb{R}$ or $\mathbb{C}$ and that $V$ is equipped with a positive-definite inner product. In this setting we call an endomorphism $P : V \to V$ an orthogonal projection if it is self-dual,
$$P^\star = P,$$
in addition to satisfying the projection condition (226.5.1).

Proposition 11. The kernel and image of an orthogonal projection are orthogonal subspaces.

Let $u \in \ker P$ and $v \in \operatorname{img} P$ be given. Since $P$ is self-dual we have
$$0 = \langle Pu, v \rangle = \langle u, Pv \rangle = \langle u, v \rangle.$$
QED

Thus we see that an orthogonal projection $P$ projects a $v \in V$ onto $Pv$ in an orthogonal fashion, i.e.
$$\langle v - Pv, u \rangle = 0$$
for all $u \in \operatorname{img} P$.
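The decomposition of Proposition 10 can be sketched numerically (pure Python; the particular projection below, onto the $x$-axis along the line $y = x$, is a hypothetical example of our own choosing):

```python
# P(x, y) = (x - y, 0): a projection that is idempotent but not orthogonal.
P = [[1.0, -1.0],
     [0.0,  0.0]]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

v = [3.0, 5.0]
Pv = apply(P, v)
u = [v[i] - Pv[i] for i in range(2)]          # kernel component u = v - Pv

assert apply(P, Pv) == Pv                     # P acts as the identity on its image
assert apply(P, u) == [0.0, 0.0]              # u lies in ker P
assert [u[i] + Pv[i] for i in range(2)] == v  # v = u + Pv
print(u, Pv)
```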
Version: 4 Owner: rmilson Author(s): rmilson

226.6 quadratic form

Let $U$ be a vector space over a field $k$ whose characteristic is not equal to 2. A mapping $Q : U \to k$ is called a quadratic form if there exists a symmetric bilinear form $B : U \times U \to k$ such that
$$Q(u) = B(u, u), \qquad u \in U.$$
Thus, every symmetric bilinear form determines a quadratic form. The converse is also true. Let $B$ be a symmetric bilinear form and $Q$ the corresponding quadratic form. A straightforward calculation shows that
$$2B(u, v) = Q(u + v) - Q(u) - Q(v).$$
The above relation is called the polarization identity. It shows that a quadratic form $Q$ fully determines the corresponding bilinear form $B$.
Next, suppose that $U$ is finite-dimensional, and let $u_1, \dots, u_n$ be a basis. Every quadratic form $Q$ is represented relative to this basis by the symmetric matrix of scalars
$$A_{ij} = B(u_i, u_j), \qquad i, j = 1, \dots, n,$$

where $B$ is the corresponding bilinear form. Letting $x_1, \dots, x_n \in k$ denote the coordinates of an arbitrary $u \in U$ relative to this basis, i.e.
$$u = x_1 u_1 + \dots + x_n u_n,$$
we have
$$Q(u) = \sum_{i,j=1}^{n} A_{ij} x_i x_j.$$
Writing $x = (x_1, \dots, x_n)^T \in k^n$ for the corresponding coordinate vector, we can write the above simply as
$$Q(u) = x^T A x.$$
In the case where k is the field of real numbers, we say that a quadratic form is positive definite,
negative definite, or positive semidefinite if the same can be said of the corresponding bilinear
form.
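A small numerical sketch of the polarization identity (pure Python; the symmetric matrix $A$ below is an arbitrary illustrative choice):

```python
# Verify 2B(u,v) = Q(u+v) - Q(u) - Q(v) for B(u,v) = u^T A v on R^2.
A = [[2.0, 1.0],
     [1.0, 3.0]]    # symmetric matrix representing the form

def B(u, v):
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def Q(u):
    return B(u, u)

u, v = [1.0, 2.0], [4.0, -1.0]
lhs = 2 * B(u, v)
rhs = Q([u[0] + v[0], u[1] + v[1]]) - Q(u) - Q(v)
assert abs(lhs - rhs) < 1e-12
print(lhs)
```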
Version: 5 Owner: rmilson Author(s): rmilson, drummond


Chapter 227
15A23 Factorization of matrices
227.1 QR decomposition

Orthogonal matrix triangularization (QR decomposition) reduces a real $m \times n$ matrix $A$ with $m \geq n$ and full rank to a much simpler form. It guarantees numerical stability by minimizing errors caused by machine roundoffs. A suitably chosen orthogonal matrix $Q$ will triangularize the given matrix:
$$A = Q \begin{pmatrix} R \\ 0 \end{pmatrix}$$
with the $n \times n$ upper triangular matrix $R$. One only has then to solve the triangular system $Rx = Pb$, where $P$ consists of the first $n$ rows of $Q^T$.
The least squares problem $Ax \approx b$ is easy to solve with $A = QR$ and $Q$ an orthogonal matrix. The solution
$$x = (A^T A)^{-1} A^T b$$
becomes
$$x = (R^T Q^T Q R)^{-1} R^T Q^T b = (R^T R)^{-1} R^T Q^T b = R^{-1} Q^T b.$$
This is a matrix-vector multiplication $Q^T b$, followed by the solution of the triangular system $Rx = Q^T b$ by back-substitution. The QR factorization saves us the formation of $A^T A$ and the solution of the normal equations.


Many different methods exist for the QR decomposition, e.g. the Householder transformation,
the Givens rotation, or the Gram-Schmidt decomposition.
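The Gram-Schmidt approach mentioned above can be sketched as follows (pure Python, classical Gram-Schmidt on a small square full-rank example; this is not the numerically robust variant used in production code):

```python
import math

def qr_gram_schmidt(A):
    # Orthonormalize the columns of A; the projection coefficients form R.
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    Q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = a[:]
        for k, q in enumerate(Q_cols):
            R[k][j] = sum(q[i] * a[i] for i in range(n))
            v = [v[i] - R[k][j] * q[i] for i in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q_cols.append([x / R[j][j] for x in v])
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

A = [[3.0, 1.0], [4.0, 2.0]]
Q, R = qr_gram_schmidt(A)
QR = [[sum(Q[i][k] * R[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
for i in range(2):
    for j in range(2):
        assert abs(QR[i][j] - A[i][j]) < 1e-9   # A = QR reconstructed
```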
References
Originally from The Data Analysis Briefbook (http://rkb.home.cern.ch/rkb/titleA.html)
Version: 1 Owner: akrowne Author(s): akrowne


Chapter 228
15A30 Algebraic systems of matrices
228.1 ideals in matrix algebras

Let $R$ be a ring with 1. Consider the ring $M_{n \times n}(R)$ of $n \times n$ matrices with entries taken from $R$.
It will be shown that there exists a one-to-one correspondence between the ideals of $R$ and the ideals of $M_{n \times n}(R)$.
For $1 \leq i, j \leq n$, let $E_{ij}$ denote the $n \times n$ matrix having entry 1 at position $(i, j)$ and 0 in all other places. It can be easily checked that
$$E_{ij} E_{kl} = \begin{cases} 0, & k \neq j \\ E_{il}, & \text{otherwise.} \end{cases} \qquad (228.1.1)$$
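The product rule (228.1.1) can be checked by brute force (a pure-Python sketch for $n = 3$):

```python
# Verify E_ij E_kl = 0 if k != j, and E_il otherwise, for all indices.
n = 3

def E(i, j):
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

zero = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                expected = E(i, l) if k == j else zero
                assert matmul(E(i, j), E(k, l)) == expected
print("product rule verified for n =", n)
```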
Let $\mathfrak{m}$ be an ideal in $M_{n \times n}(R)$.
Claim 1. The set $\mathfrak{i} \subseteq R$ given by
$$\mathfrak{i} = \{x \in R \mid x \text{ is an entry of some } A \in \mathfrak{m}\}$$
is an ideal in $R$, and $\mathfrak{m} = M_{n \times n}(\mathfrak{i})$.
We have $\mathfrak{i} \neq \emptyset$ since $0 \in \mathfrak{i}$. Now let $A = (a_{ij})$ and $B = (b_{ij})$ be matrices in $\mathfrak{m}$, and let $x, y \in R$ be entries of $A$ and $B$ respectively, say $x = a_{ij}$ and $y = b_{kl}$. Then the matrix $A E_{jl} + E_{ik} B \in \mathfrak{m}$ has $x + y$ at position $(i, l)$, and it follows: if $x, y \in \mathfrak{i}$, then $x + y \in \mathfrak{i}$. Since $\mathfrak{m}$ is an ideal in $M_{n \times n}(R)$ it contains, in particular, the matrices $D_r A$ and $A D_r$, where
$$D_r := \sum_{i=1}^{n} r E_{ii}, \qquad r \in R.$$
Thus $rx, xr \in \mathfrak{i}$. This shows that $\mathfrak{i}$ is an ideal in $R$. Furthermore, $M_{n \times n}(\mathfrak{i}) \subseteq \mathfrak{m}$.


By construction, any matrix $A \in \mathfrak{m}$ has entries in $\mathfrak{i}$, so we have
$$A = \sum_{1 \leq i,j \leq n} a_{ij} E_{ij}, \qquad a_{ij} \in \mathfrak{i},$$
so $A \in M_{n \times n}(\mathfrak{i})$. Therefore $\mathfrak{m} \subseteq M_{n \times n}(\mathfrak{i})$.


A consequence of this is: if $F$ is a field, then $M_{n \times n}(F)$ is simple.
Version: 5 Owner: Thomas Heye Author(s): Thomas Heye


Chapter 229
15A36 Matrices of integers
229.1 permutation matrix

A permutation matrix is any matrix which can be created by rearranging the rows and/or columns of an identity matrix.
Pre-multiplying a matrix $A$ by a permutation matrix $P$ results in a rearrangement of the rows of $A$. Post-multiplying by $P$ results in a rearrangement of the columns of $A$.
Let $A$ be an $n \times n$ matrix. If the matrix $P$ is obtained by swapping rows $i$ and $j$ of the $n \times n$ identity matrix $I_n$, then rows $i$ and $j$ of $A$ will be swapped in the product $PA$, and columns $i$ and $j$ of $A$ will be swapped in the product $AP$.
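A minimal sketch of the row/column swapping behaviour (pure Python, $n = 2$):

```python
# P is I_2 with its two rows swapped; PA swaps rows of A, AP swaps columns.
P = [[0, 1],
     [1, 0]]
A = [[1, 2],
     [3, 4]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

assert matmul(P, A) == [[3, 4], [1, 2]]   # rows swapped
assert matmul(A, P) == [[2, 1], [4, 3]]   # columns swapped
```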
Version: 1 Owner: akrowne Author(s): akrowne


Chapter 230
15A39 Linear inequalities
230.1 Farkas lemma

Given an $m \times n$ matrix $A$ and an $n \times 1$ column vector $c$, both with real coefficients, one and only one of the following systems has a solution:
1. $Ax \leq 0$ and $cx > 0$ for some $n$-column vector $x$;
2. $wA = c$ and $w \geq 0$ for some $m$-row vector $w$.
Equivalently, one and only one of the following has a solution:
1. $Ax \leq 0$, $x \leq 0$ and $cx > 0$ for some $n$-column vector $x$;
2. $wA \leq c$ and $w \geq 0$ for some $m$-row vector $w$.
Remark. Here, $Ax \geq 0$ means that every component of $Ax$ is nonnegative, and similarly with the other expressions.
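A numerical sketch of the first alternative (pure Python; the matrix $A$ and vector $c$ are illustrative choices of our own):

```python
# For this data, system (2) has a solution, so system (1) must not.
A = [[1, 0],
     [0, 1]]
c = [1, 1]
w = [1, 1]

# w >= 0 and wA = c: alternative (2) holds.
assert all(wi >= 0 for wi in w)
assert [sum(w[i] * A[i][j] for i in range(2)) for j in range(2)] == c

# Spot-check alternative (1): whenever Ax <= 0 (here simply x <= 0),
# cx = x1 + x2 cannot be positive.
for x in [(-1, 0), (0, -2), (-3, -4)]:
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    if all(v <= 0 for v in Ax):
        assert c[0] * x[0] + c[1] * x[1] <= 0
```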
Version: 5 Owner: Koro Author(s): Koro


Chapter 231
15A42 Inequalities involving
eigenvalues and eigenvectors
231.1 Gershgorin's circle theorem

Let $A$ be a square complex matrix. Around every element $a_{ii}$ on the diagonal of the matrix, we draw a circle with radius the sum of the norms of the other elements on the same row, $\sum_{j \neq i} |a_{ij}|$. Such circles are called Gershgorin discs.
Theorem: Every eigenvalue of A lies in one of these Gershgorin discs.

Proof: Let $\lambda$ be an eigenvalue of $A$ and $x$ its corresponding eigenvector. Choose $i$ such that $|x_i| = \max_j |x_j|$. Since $x$ can't be 0, $|x_i| > 0$. Now $Ax = \lambda x$, or looking at the $i$-th component,
$$(\lambda - a_{ii}) x_i = \sum_{j \neq i} a_{ij} x_j.$$
Taking the norm on both sides gives
$$|\lambda - a_{ii}| = \left| \sum_{j \neq i} \frac{a_{ij} x_j}{x_i} \right| \leq \sum_{j \neq i} |a_{ij}|.$$
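A numerical sketch of the theorem (pure Python; the $2 \times 2$ matrix is an illustrative choice, with eigenvalues computed from its characteristic polynomial):

```python
import cmath

A = [[4.0, 1.0],
     [2.0, -3.0]]

# Eigenvalues of a 2x2 matrix via the quadratic formula.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = cmath.sqrt(tr * tr - 4 * det)
eigs = [(tr + disc) / 2, (tr - disc) / 2]

# Every eigenvalue must lie in at least one Gershgorin disc.
radii = [abs(A[0][1]), abs(A[1][0])]
centers = [A[0][0], A[1][1]]
for lam in eigs:
    assert any(abs(lam - centers[i]) <= radii[i] + 1e-9 for i in range(2))
print(eigs)
```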

Version: 4 Owner: lieven Author(s): lieven

231.2 Gershgorin's circle theorem result

Since the eigenvalues of $A$ and $A^T$ are the same, you can get an additional set of discs which has the same centers $a_{ii}$, but a radius calculated by the column sum $\sum_{j \neq i} |a_{ji}|$. In each of these circles there must be an eigenvalue. Hence, by comparing the row and column discs, the eigenvalues may be located efficiently.
Version: 3 Owner: saki Author(s): saki

231.3 Schur's inequality

Theorem (Schur's inequality) Let $A$ be a square $n \times n$ matrix with real (or possibly complex) entries. If $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$, and $D$ is the diagonal matrix $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, then
$$\operatorname{trace}(D^* D) \leq \operatorname{trace}(A^* A).$$
Here $\operatorname{trace}$ is the trace of a matrix, and $^*$ denotes the conjugate transpose. Equality holds if and only if $A$ is a normal matrix ([1], pp. 146).
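A numerical sketch (pure Python, $2 \times 2$ examples of our own, eigenvalues via the quadratic formula): the gap $\operatorname{trace}(A^*A) - \sum_i |\lambda_i|^2$ is nonnegative and vanishes exactly for normal matrices:

```python
import cmath

def schur_gap(A):
    # trace(A* A) - sum |lambda_i|^2 for a real 2x2 matrix.
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    eig_sum = abs((tr + disc) / 2) ** 2 + abs((tr - disc) / 2) ** 2
    frob = sum(abs(A[i][j]) ** 2 for i in range(2) for j in range(2))
    return frob - eig_sum

assert schur_gap([[0.0, 1.0], [-1.0, 0.0]]) < 1e-9   # normal: equality
assert schur_gap([[1.0, 1.0], [0.0, 1.0]]) > 0.5     # non-normal: strict inequality
```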

REFERENCES
1. V.V. Prasolov, Problems and Theorems in Linear Algebra, American Mathematical Society, 1994.

Version: 7 Owner: matte Author(s): matte


Chapter 232
15A48 Positive matrices and their
generalizations; cones of matrices
232.1 negative definite

Let $A$ be an $n \times n$ symmetric real square matrix. If for any non-zero vector $x$ we have
$$x^t A x < 0,$$
we call $A$ a negative definite matrix.
Version: 1 Owner: drini Author(s): drini

232.2 negative semidefinite

Let $A$ be an $n \times n$ symmetric real square matrix. If for any non-zero vector $x$ we have
$$x^t A x \leq 0,$$
we call $A$ a negative semidefinite matrix.
Version: 1 Owner: drini Author(s): drini


232.3 positive definite

Introduction
The definiteness of a matrix is an important property that has use in many areas of mathematics and even physics. Below are some examples:
1. In optimization problems, the definiteness of the Hessian matrix determines the quality of an extremal value. The full details can be found on this page.
2. In electromagnetism, one can show that the definiteness of a certain media matrix determines the qualitative property of the media: if the matrix is positive or negative definite, the media is active or lossy, respectively [1].
Definition [2] Suppose $A$ is an $n \times n$ square Hermitian matrix. If, for any non-zero vector $x$, we have that
$$x^* A x > 0,$$
then $A$ is a positive definite matrix. (Here $x^* = \bar{x}^T$, where $\bar{x}$ is the complex conjugate of $x$, and $\bar{x}^T$ is the transpose of $\bar{x}$.)
One can show that a Hermitian matrix is positive definite if and only if all its eigenvalues
are positive [2]. Thus the determinant of a positive definite matrix is positive, and a positive definite matrix is always invertible. The Cholesky decomposition provides an economic
method for solving linear equations involving a positive definite matrix. Further conditions
and properties for positive definite matrices are given in [3].
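The eigenvalue criterion can be sketched for $2 \times 2$ real symmetric matrices (pure Python; for symmetric matrices the eigenvalues are real, so the quadratic formula suffices; the test matrices are illustrative choices):

```python
import math

def is_positive_definite_2x2(A):
    # Positive definite iff both eigenvalues are positive.
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))  # real for symmetric A
    eigs = ((tr + disc) / 2, (tr - disc) / 2)
    return all(e > 0 for e in eigs)

assert is_positive_definite_2x2([[2.0, 1.0], [1.0, 2.0]])       # eigenvalues 3, 1
assert not is_positive_definite_2x2([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3, -1
```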

REFERENCES
1. I.V. Lindell, Methods for Electromagnetic Field Analysis, Clarendon Press, 1992.
2. M. C. Pease, Methods of Matrix Algebra, Academic Press, 1965
3. C.R. Johnson, Positive definite matrices, American Mathematical Monthly, Vol. 77, Issue
3 (March 1970) 259-264.

Version: 3 Owner: matte Author(s): matte, drini

232.4 positive semidefinite

Let $A$ be an $n \times n$ symmetric real square matrix. If for any non-zero vector $x$ we have
$$x^t A x \geq 0,$$
we call $A$ a positive semidefinite matrix.


Version: 2 Owner: drini Author(s): drini

232.5 primitive matrix

A nonnegative square matrix $A$ is said to be a primitive matrix if there exists $k$ such that $A^k \gg 0$, i.e., if there exists $k$ such that for all $i, j$, the $(i, j)$ entry of $A^k$ is positive.
An equivalent condition for a matrix to be a primitive matrix is for the matrix to be an irreducible matrix with positive trace.
Version: 4 Owner: jarino Author(s): jarino

232.6 reducible matrix

A nonnegative $n \times n$ matrix $A$ is a reducible matrix if there exists a permutation matrix $P$ such that
$$P^T A P = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}.$$
$A$ is an irreducible matrix if it is not a reducible matrix. Two equivalent conditions for a matrix to be an irreducible matrix are:
1. the digraph associated to $A$ is strongly connected,
2. for each $i, j$, there exists $k$ such that the $(i, j)$ entry of $A^k$ is positive.
Version: 3 Owner: jarino Author(s): jarino


Chapter 233
15A51 Stochastic matrices
233.1 Birkhoff-von Neumann theorem

An $n \times n$ matrix is doubly stochastic if and only if it is a convex combination of permutation matrices.


Version: 1 Owner: saforres Author(s): saforres

233.2 proof of Birkhoff-von Neumann theorem

First, we prove the following lemma:

Lemma 4. A convex combination of doubly stochastic matrices is doubly stochastic.

Let $\{A_i\}_{i=1}^m$ be a collection of $n \times n$ doubly stochastic matrices, and suppose $\{\lambda_i\}_{i=1}^m$ is a collection of scalars satisfying $\sum_{i=1}^m \lambda_i = 1$ and $\lambda_i \geq 0$ for each $i = 1, \dots, m$. We claim that $A = \sum_{i=1}^m \lambda_i A_i$ is doubly stochastic.
Take any $i \in \{1, \dots, m\}$. Since $A_i$ is doubly stochastic, each of its rows and columns sums to 1. Thus each of the rows and columns of $\lambda_i A_i$ sums to $\lambda_i$.
By the definition of elementwise summation, given matrices $N = M_1 + M_2$, the sum of the entries in the $i$th column of $N$ is clearly the sum of the sums of entries of the $i$th columns of $M_1$ and $M_2$ respectively. A similar result holds for the $j$th row.
Hence the sum of the entries in the $i$th column of $A$ is the sum of the sums of entries of the $i$th columns of $\lambda_k A_k$ for each $k$, that is, $\sum_{k=1}^m \lambda_k = 1$. The sum of entries of the $j$th row of $A$ is the same. Hence $A$ is doubly stochastic.

Observe that since a permutation matrix has a single nonzero entry, equal to 1, in each row and column, the sum of entries in any row or column must be 1. So a permutation matrix is doubly stochastic, and on applying the lemma, we see that a convex combination of permutation matrices is doubly stochastic.
This provides one direction of our proof; we now prove the more difficult direction: suppose $B$ is doubly stochastic. Define a weighted graph $G = (V, E)$ with vertex set $V = \{r_1, \dots, r_n, c_1, \dots, c_n\}$, edge set $E$, where $e_{ij} = (r_i, c_j) \in E$ iff $B_{ij} \neq 0$, and edge weight function $\omega$, where $\omega(e_{ij}) = B_{ij}$.
Clearly $G$ is a bipartite graph, with partitions $R = \{r_1, \dots, r_n\}$ and $C = \{c_1, \dots, c_n\}$, since the only edges in $E$ are between $r_i$ and $c_j$ for some $i, j \in \{1, \dots, n\}$. Furthermore, since $B_{ij} \geq 0$, we have $\omega(e) > 0$ for every $e \in E$.
For any $A \subseteq V$ define $N(A)$, the neighbourhood of $A$, to be the set of vertices $u \in V$ such that there is some $v \in A$ with $(u, v) \in E$.
We claim that, for any $v \in V$, $\sum_{u \in N(\{v\})} \omega(u, v) = 1$. Take any $v \in V$; either $v \in R$ or $v \in C$. Since $G$ is bipartite, $v \in R$ implies $N(\{v\}) \subseteq C$, and $v \in C$ implies $N(\{v\}) \subseteq R$. Now,
$$v = r_i \implies \sum_{u \in N(r_i)} \omega(r_i, u) = \sum_{j:\, e_{ij} \in E} \omega(r_i, c_j) = \sum_{j:\, B_{ij} \neq 0} B_{ij} = \sum_{j=1}^{n} B_{ij} = 1,$$
$$v = c_j \implies \sum_{u \in N(c_j)} \omega(u, c_j) = \sum_{i:\, e_{ij} \in E} \omega(r_i, c_j) = \sum_{i:\, B_{ij} \neq 0} B_{ij} = \sum_{i=1}^{n} B_{ij} = 1,$$
since $B$ is doubly stochastic. Now, take any $A \subseteq R$. We have


$$\sum_{v \in A} \sum_{w \in N(\{v\})} \omega(v, w) = \sum_{v \in A} \sum_{w \in N(A)} \omega(v, w) = \sum_{v \in A} 1 = |A|.$$
Let $B = N(A)$. But then clearly $A \subseteq N(B)$, by definition of neighbourhood. So
$$|N(A)| = |B| = \sum_{v \in B} \sum_{w \in N(B)} \omega(v, w) \geq \sum_{v \in B} \sum_{w \in A} \omega(v, w) = \sum_{w \in A} \sum_{v \in N(A)} \omega(v, w) = |A|.$$
So $|N(A)| \geq |A|$. We may therefore apply the graph-theoretic version of Hall's marriage theorem to $G$ to conclude that $G$ has a perfect matching.
So let $M \subseteq E$ be a perfect matching for $G$. Define an $n \times n$ matrix $P$ by
$$P_{ij} = \begin{cases} 1 & \text{if } e_{ij} \in M \\ 0 & \text{otherwise.} \end{cases}$$
Note that $B_{ij} = 0$ implies $P_{ij} = 0$: if $B_{ij} = 0$, then $(r_i, c_j) \notin E$, so $(r_i, c_j) \notin M$, which implies $P_{ij} = 0$.
Further, we claim that $P$ is a permutation matrix:
Let $i$ be any row of $P$. Since $M$ is a perfect matching of $G$, there exists $e_0 \in M$ such that $r_i$ is an end of $e_0$. Let the other end be $c_j$ for some $j$; then $P_{ij} = 1$.

Suppose $i_1, i_2 \in \{1, \dots, n\}$ with $i_1 \neq i_2$ and $P_{i_1,j} = P_{i_2,j} = 1$ for some $j$. This implies $(r_{i_1}, c_j), (r_{i_2}, c_j) \in M$, but this implies the vertex $c_j$ is the end of two distinct edges, which contradicts the fact that $M$ is a matching.
Hence, for each row and column of $P$, there is exactly one nonzero entry, whose value is 1. So $P$ is a permutation matrix.
Define $\lambda = \min_{i,j \in \{1,\dots,n\}} \{B_{ij} \mid P_{ij} \neq 0\}$. We see that $\lambda > 0$ since $B_{ij} \geq 0$, and $P_{ij} \neq 0 \implies B_{ij} \neq 0$. Further, $\lambda = B_{pq}$ for some $p, q$.

Let $D = B - \lambda P$. If $D = 0$, then $\lambda = 1$ and $B$ is a permutation matrix, so we are done.

Otherwise, note that $D$ is nonnegative; this is clear since $\lambda P_{ij} \leq \lambda \leq B_{ij}$ for any $B_{ij} \neq 0$. Notice that $D_{pq} = B_{pq} - \lambda P_{pq} = \lambda - \lambda \cdot 1 = 0$.
Note that since every row and column of $B$ sums to 1 and every row and column of $\lambda P$ sums to $\lambda$, every row and column of $D = B - \lambda P$ sums to $1 - \lambda$. Define $B' = \frac{1}{1-\lambda} D$. Then every row and column of $B'$ sums to 1, so $B'$ is doubly stochastic.
Rearranging, we have $B = \lambda P + (1 - \lambda) B'$. Clearly $B_{ij} = 0$ implies that $P_{ij} = 0$, which implies that $B'_{ij} = 0$, so the zero entries of $B'$ are a superset of those of $B$. But notice that $B'_{pq} = \frac{1}{1-\lambda} D_{pq} = 0$, so the zero entries of $B'$ are a strict superset of those of $B$.
We have decomposed B into a convex combination of a permutation matrix and another
doubly stochastic matrix with strictly more zero entries than B. Thus we may apply this
procedure repeatedly on the doubly stochastic matrix obtained from the previous step, and
the number of zero entries will increase with each step. Since B has at most n2 nonzero
entries, we will obtain a convex combination of permutation matrices in at most n2 steps.
Thus B is indeed expressible as a convex combination of permutation matrices.
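The peeling procedure of the proof can be sketched in pure Python (perfect matchings found by brute force over permutations, so this is only sensible for tiny $n$; the example matrix is an illustrative choice):

```python
from itertools import permutations

def birkhoff(B, tol=1e-12):
    # Repeatedly subtract lambda * P for a permutation P supported on the
    # nonzero entries of B, as in the proof above.
    n = len(B)
    B = [row[:] for row in B]
    terms = []
    while True:
        sigma = next(s for s in permutations(range(n))
                     if all(B[i][s[i]] > tol for i in range(n)))
        lam = min(B[i][sigma[i]] for i in range(n))
        terms.append((lam, sigma))
        for i in range(n):
            B[i][sigma[i]] -= lam
        if all(abs(x) < tol for row in B for x in row):
            return terms

B = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
terms = birkhoff(B)
assert abs(sum(lam for lam, _ in terms) - 1.0) < 1e-9   # convex combination
```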
Version: 7 Owner: saforres Author(s): saforres


Chapter 234
15A57 Other types of matrices
(Hermitian, skew-Hermitian, etc.)
234.1 Hermitian matrix

A matrix $A$ is said to be Hermitian or self-adjoint if
$$A = A^* = \bar{A}^T,$$
where $\bar{A}^T$ is the transpose of $\bar{A}$, and $\bar{A}$ is the complex conjugate of $A$.
Note that a Hermitian matrix must have real diagonal elements, as the complex conjugate
of these elements must be equal to themselves.
Any real symmetric matrix is Hermitian; the real symmetric matrices are a subset of the
Hermitian matrices.
An example of a Hermitian matrix is

$$\begin{pmatrix}
1 & 1+i & 1+2i & 1+3i \\
1-i & 2 & 2+2i & 2+3i \\
1-2i & 2-2i & 3 & 3+3i \\
1-3i & 2-3i & 3-3i & 4
\end{pmatrix}$$
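A quick check that the example matrix equals its conjugate transpose, and that its diagonal is real (pure Python complex numbers):

```python
# The example matrix above, with i written as Python's 1j.
A = [[1,    1+1j, 1+2j, 1+3j],
     [1-1j, 2,    2+2j, 2+3j],
     [1-2j, 2-2j, 3,    3+3j],
     [1-3j, 2-3j, 3-3j, 4]]

conj_T = [[complex(A[j][i]).conjugate() for j in range(4)] for i in range(4)]
assert A == conj_T                                       # A = conjugate transpose
assert all(complex(A[i][i]).imag == 0 for i in range(4)) # real diagonal
```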
Hermitian matrices are named after Charles Hermite (1822-1901) [2], who proved in 1855
that the eigenvalues of these matrices are always real [1].


REFERENCES
1. H. Eves, Elementary Matrix Theory, Dover publications, 1980.
2. The MacTutor History of Mathematics archive, Charles Hermite

Version: 5 Owner: akrowne Author(s): akrowne

234.2 direct sum of Hermitian and skew-Hermitian matrices

In this example, we show that any square matrix with complex entries can uniquely be decomposed into the sum of one Hermitian matrix and one skew-Hermitian matrix. A fancy way to say this is that the space of complex square matrices is the direct sum of the Hermitian and the skew-Hermitian matrices.
Let us denote the vector space (over $\mathbb{R}$) of complex square $n \times n$ matrices by $M$. Further, we denote by $M_+$ respectively $M_-$ the vector subspaces of Hermitian and skew-Hermitian matrices. We claim that
$$M = M_+ \oplus M_-. \qquad (234.2.1)$$

Since $M_+$ and $M_-$ are vector subspaces of $M$, it is clear that $M_+ + M_-$ is a vector subspace of $M$. Conversely, suppose $A \in M$. We can then define
$$A_+ = \frac{1}{2}\left(A + A^*\right), \qquad A_- = \frac{1}{2}\left(A - A^*\right).$$
Here $A^* = \bar{A}^T$, where $\bar{A}$ is the complex conjugate of $A$, and $\bar{A}^T$ is the transpose of $\bar{A}$. It follows that $A_+$ is Hermitian and $A_-$ is skew-Hermitian. Since $A = A_+ + A_-$, any element in $M$ can be written as the sum of one element in $M_+$ and one element in $M_-$. Let us check that this decomposition is unique. If $A \in M_+ \cap M_-$, then $A = A^* = -A$, so $A = 0$. We have established equation (234.2.1).
Special cases
In the special case of $1 \times 1$ matrices, we obtain the decomposition of a complex number into its real and imaginary components.
In the special case of real matrices, we obtain the decomposition of an $n \times n$ matrix into a symmetric matrix and an anti-symmetric matrix.
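A numerical sketch of the decomposition (pure Python; the matrix $A$ is an illustrative choice):

```python
# Split A into Hermitian and skew-Hermitian parts: A = A_plus + A_minus.
def ctranspose(M):
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

A = [[1+2j, 3-1j],
     [0+1j, 4+0j]]
AH = ctranspose(A)
A_plus  = [[(A[i][j] + AH[i][j]) / 2 for j in range(2)] for i in range(2)]
A_minus = [[(A[i][j] - AH[i][j]) / 2 for j in range(2)] for i in range(2)]

assert A_plus == ctranspose(A_plus)                                    # Hermitian
assert A_minus == [[-x for x in row] for row in ctranspose(A_minus)]   # skew-Hermitian
assert [[A_plus[i][j] + A_minus[i][j] for j in range(2)] for i in range(2)] == A
```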
Version: 1 Owner: mathcam Author(s): matte

234.3 identity matrix

The $n \times n$ identity matrix $I$ (or $I_n$) over a ring $R$ is the square matrix with coefficients in $R$ given by
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix},$$

where the numerals 1 and 0 respectively represent the multiplicative and additive identities in $R$. The identity matrix $I_n$ serves as the identity in the ring of $n \times n$ matrices over $R$. For any $n \times n$ matrix $M$, we have $I_n M = M I_n = M$, and the identity matrix is uniquely defined by this property. In addition, for any $n \times m$ matrix $A$ and $m \times n$ matrix $B$, we have $IA = A$ and $BI = B$.

Properties
The $n \times n$ identity matrix $I$ satisfies the following properties:
- For the determinant, we have $\det I = 1$, and for the trace, we have $\operatorname{tr} I = n$.
- The identity matrix has only one eigenvalue $\lambda = 1$, of multiplicity $n$. The corresponding eigenvectors can be chosen to be $v_1 = (1, 0, \dots, 0), \dots, v_n = (0, \dots, 0, 1)$.
- The matrix exponential of $I$ gives $e^I = eI$.
- The identity matrix is a diagonal matrix.
Version: 5 Owner: mathcam Author(s): mathcam, akrowne

234.4 skew-Hermitian matrix

Definition. A square matrix $A$ with complex entries is skew-Hermitian if
$$A = -A^*.$$
Here $A^* = \bar{A}^T$, where $\bar{A}^T$ is the transpose of $\bar{A}$, and $\bar{A}$ is the complex conjugate of the matrix $A$.
properties.

1. The trace of a skew-Hermitian matrix is purely imaginary or zero.


2. The eigenvalues of a skew-Hermitian matrix are purely imaginary or zero [1].
Proof. For property (1), let $x_{ij}$ and $y_{ij}$ be the real respectively imaginary parts of the elements in $A$. Then the diagonal elements of $A$ are of the form $x_{kk} + iy_{kk}$, and the diagonal elements in $-A^*$ are of the form $-x_{kk} + iy_{kk}$. Hence $x_{kk}$, i.e., the real part of the diagonal elements in $A$, must vanish, and property (1) follows. For property (2), suppose $A$ is a skew-Hermitian matrix, and $x$ an eigenvector corresponding to the eigenvalue $\lambda$, i.e.,
$$Ax = \lambda x. \qquad (234.4.1)$$

Here, $x$ is a complex column vector. Multiplying both sides by $x^* = \bar{x}^T$ yields
$$x^* A x = \lambda\, x^* x.$$
Since $x$ is an eigenvector, $x$ is not the zero vector, and $x^* x$ is a positive real number. Taking the complex conjugate of the scalar $x^* A x$ and using $A^* = -A$, we obtain
$$\overline{x^* A x} = x^T \bar{A} \bar{x} = \left(x^T \bar{A} \bar{x}\right)^T = \bar{x}^T \bar{A}^T x = x^* A^* x = -x^* A x,$$
so $x^* A x$ is purely imaginary or zero. Hence the eigenvalue $\lambda = (x^* A x)/(x^* x)$ corresponding to $x$ is purely imaginary or zero. $\Box$

REFERENCES
1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons,
1978.

Version: 3 Owner: matte Author(s): matte

234.5 transpose

The transpose of a matrix $A$ is the matrix formed by flipping $A$ about the diagonal line from the upper left corner. It is usually denoted $A^t$, although sometimes it is written as $A^T$ or $A'$. So if $A$ is an $m \times n$ matrix and
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
then
$$A^t = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}$$
Note that the transpose of an $m \times n$ matrix is an $n \times m$ matrix.
properties.
Let $A$ and $B$ be $m \times n$ matrices and $c$ be a constant. Let $x$ and $y$ be column vectors with $n$ rows. Then
1. $(A^t)^t = A$
2. $(A + B)^t = A^t + B^t$
3. $(cA)^t = cA^t$
4. $(AB)^t = B^t A^t$
5. If $A$ is invertible, then $(A^t)^{-1} = (A^{-1})^t$
6. $\operatorname{trace}(A^t A) \geq 0$ (where $\operatorname{trace}$ is the trace of a matrix).
7. The transpose is a linear mapping from the vector space of matrices to itself. That is, $(\alpha A + \beta B)^t = \alpha A^t + \beta B^t$, for same-sized matrices $A$ and $B$ and scalars $\alpha$ and $\beta$.
The familiar vector dot product can also be defined using the matrix transpose. If $x$ and $y$ are column vectors with $n$ rows each,
$$x^t y = x \cdot y,$$
which implies
$$x^t x = x \cdot x = \|x\|_2^2,$$
which is another way of defining the square of the vector Euclidean norm.
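A small sketch of property 4 and the norm identity (pure Python; the matrices are illustrative choices):

```python
# Check (AB)^t = B^t A^t and x^t x = ||x||_2^2 on small examples.
def T(M):
    return [[M[j][i] for j in range(len(M))] for i in range(len(M[0]))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert T(matmul(A, B)) == matmul(T(B), T(A))   # (AB)^t = B^t A^t

x = [3, 4]
assert sum(xi * xi for xi in x) == 25          # x^t x = ||x||_2^2 = 5^2
```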
Version: 5 Owner: akrowne Author(s): akrowne


Chapter 235
15A60 Norms of matrices,
numerical range, applications of
functional analysis to matrix theory
235.1 Frobenius matrix norm

Let $R$ be an ordered ring with a valuation $|\cdot|$ and let $M(R)$ denote the set of matrices over $R$. The Frobenius norm function or Euclidean matrix norm is the norm function $\|\cdot\|_F : M(R) \to R$ given by
$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}.$$

A more concise (though equivalent) definition is
$$\|A\|_F = \sqrt{\operatorname{trace}(A A^*)},$$
where $A^*$ denotes the conjugate transpose of $A$.

Denote the columns of $A$ by $A_i$. A nice property of the norm is that
$$\|A\|_F^2 = \|A_1\|_2^2 + \|A_2\|_2^2 + \dots + \|A_n\|_2^2.$$
(see trace, transpose)
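A sketch checking that the two definitions agree on a small real matrix (pure Python):

```python
import math

A = [[1.0, 2.0],
     [3.0, 4.0]]

# Entrywise definition: sqrt of the sum of squared entries.
entrywise = math.sqrt(sum(a * a for row in A for a in row))

# Trace definition: sqrt(trace(A A^T)) for a real matrix.
AAt = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
via_trace = math.sqrt(AAt[0][0] + AAt[1][1])

assert abs(entrywise - via_trace) < 1e-12
assert abs(entrywise - math.sqrt(30.0)) < 1e-12   # 1 + 4 + 9 + 16 = 30
```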
Version: 8 Owner: mathcam Author(s): mathcam, Logan


235.2 matrix p-norm

A class of matrix norms, denoted $\|\cdot\|_p$, is defined as
$$\|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}, \qquad x \in \mathbb{R}^n,\ A \in \mathbb{R}^{m \times n}.$$
The matrix p-norms are defined in terms of the vector p-norms.


An alternate definition is
$$\|A\|_p = \max_{\|x\|_p = 1} \|Ax\|_p.$$

As with vector p-norms, the most important are the 1, 2, and $\infty$ norms. The 1 and $\infty$ norms are very easy to calculate for an arbitrary matrix:
$$\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^{m} |a_{ij}|,$$
$$\|A\|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}|.$$
It directly follows from this that $\|A\|_1 = \|A^T\|_\infty$.


The calculation of the 2-norm is more complicated. However, it can be shown that the 2-norm of $A$ is the square root of the largest eigenvalue of $A^T A$. There are also various inequalities that allow one to make estimates on the value of $\|A\|_2$:
$$\frac{1}{\sqrt{n}} \|A\|_\infty \leq \|A\|_2 \leq \sqrt{m}\, \|A\|_\infty,$$
$$\frac{1}{\sqrt{m}} \|A\|_1 \leq \|A\|_2 \leq \sqrt{n}\, \|A\|_1,$$
$$\|A\|_2 \leq \|A\|_F \leq \sqrt{n}\, \|A\|_2$$
($\|A\|_F$ is the Frobenius matrix norm).
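A sketch of the 1- and $\infty$-norm formulas (pure Python; the matrix is an illustrative choice):

```python
# ||A||_1 is the max absolute column sum; ||A||_inf is the max row sum.
A = [[1.0, -2.0],
     [-3.0, 4.0]]

def norm1(M):
    return max(sum(abs(M[i][j]) for i in range(len(M))) for j in range(len(M[0])))

def norm_inf(M):
    return max(sum(abs(v) for v in row) for row in M)

At = [[A[j][i] for j in range(2)] for i in range(2)]
assert norm1(A) == 6.0            # column sums: 4 and 6
assert norm_inf(A) == 7.0         # row sums: 3 and 7
assert norm1(A) == norm_inf(At)   # ||A||_1 = ||A^T||_inf
```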
Version: 7 Owner: drini Author(s): drini, Logan


235.3 self consistent matrix norm

A matrix norm $N$ is said to be self consistent if
$$N(AB) \leq N(A)\, N(B)$$
for all pairs of matrices $A$ and $B$ such that $AB$ is defined.
Version: 4 Owner: Johan Author(s): Johan


Chapter 236
15A63 Quadratic and bilinear
forms, inner products
236.1 Cauchy-Schwarz inequality

Let $V$ be a vector space where an inner product $\langle \cdot, \cdot \rangle$ has been defined. Such a space can also be given a norm by defining
$$\|x\| = \sqrt{\langle x, x \rangle}.$$
Then in such a space the Cauchy-Schwarz inequality holds:
$$|\langle v, w \rangle| \leq \|v\| \|w\|$$
for any $v, w \in V$. That is, the modulus (since it might as well be a complex number) of the inner product of two given vectors is less than or equal to the product of their norms. Equality holds if and only if the two vectors are linearly dependent.
A very special case is when $V = \mathbb{R}^n$ and the inner product is the dot product, defined as $\langle v, w \rangle = v^t w$ and usually denoted $v \cdot w$; the resulting norm is the Euclidean norm. If $v = (v_1, v_2, \dots, v_n)$ and $w = (w_1, w_2, \dots, w_n)$, the Cauchy-Schwarz inequality becomes
$$|v \cdot w| = |v_1 w_1 + v_2 w_2 + \dots + v_n w_n| \leq \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}\, \sqrt{w_1^2 + w_2^2 + \dots + w_n^2} = \|v\| \|w\|.$$
Notice that in this case the inequality holds even if the modulus on the middle term (which is a real number) is not used.
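A numerical sketch of the inequality in $\mathbb{R}^3$ (pure Python; the vectors are illustrative choices):

```python
import math

v = [1.0, 2.0, 2.0]
w = [3.0, 0.0, 4.0]

dot = sum(vi * wi for vi, wi in zip(v, w))
norm = lambda u: math.sqrt(sum(x * x for x in u))

# |v . w| <= ||v|| ||w||  (here 11 <= 3 * 5 = 15)
assert abs(dot) <= norm(v) * norm(w) + 1e-12
print(abs(dot), norm(v) * norm(w))
```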
The Cauchy-Schwarz inequality is also a special case of the Hölder inequality. The inequality arises in a lot of fields, so it is known under several other names, such as the Buniakovsky inequality or the Kantorovich inequality. Another form that arises often is "Cauchy-Schwartz inequality", but this is a misspelling, since the inequality is named after Hermann Amandus Schwarz (1843-1921).
Version: 8 Owner: drini Author(s): drini

236.2 adjoint endomorphism

Definition (the bilinear case). Let $U$ be a finite-dimensional vector space over a field $K$, and $B : U \times U \to K$ a symmetric, non-degenerate bilinear mapping, for example a real inner product. For an endomorphism $T : U \to U$ we define the adjoint of $T$ relative to $B$ to be the endomorphism $T^\star : U \to U$ characterized by
$$B(u, Tv) = B(T^\star u, v), \qquad u, v \in U.$$

It is convenient to identify $B$ with a linear isomorphism $B : U \to U^*$ in the sense that
$$B(u, v) = (Bu)(v), \qquad u, v \in U.$$
We then have
$$T^\star = B^{-1} T^* B.$$
To put it another way, $B$ gives an isomorphism between $U$ and the dual $U^*$, and the adjoint $T^\star$ is the endomorphism of $U$ that corresponds to the dual homomorphism $T^* : U^* \to U^*$. [Commutative diagram: $T^\star$ on $U$ corresponds to $T^*$ on $U^*$ via the isomorphism $B$.]

Relation to the matrix transpose. Let $u_1, \dots, u_n$ be a basis of $U$, and let $M \in \operatorname{Mat}_{n,n}(K)$ be the matrix of $T$ relative to this basis, i.e.
$$\sum_j M^j_{\ i}\, u_j = T(u_i).$$
Let $P \in \operatorname{Mat}_{n,n}(K)$ denote the matrix of the inner product relative to the same basis, i.e.
$$P_{ij} = B(u_i, u_j).$$
Then, the representing matrix of $T^\star$ relative to the same basis is given by $P^{-1} M^T P$. Specializing further, suppose that the basis in question is orthonormal, i.e. that
$$B(u_i, u_j) = \delta_{ij}.$$
Then, the matrix of $T^\star$ is simply the transpose $M^T$.
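A numerical sketch of the formula $P^{-1} M^T P$ (pure Python; the matrices $P$ and $M$ are illustrative choices, with $P$ diagonal so its inverse is immediate):

```python
# Check B(u, Tv) = B(T_star u, v) for T_star = P^{-1} M^T P, where
# B(u, v) = u^T P v with P symmetric positive definite.
P = [[2.0, 0.0],
     [0.0, 3.0]]
Pinv = [[0.5, 0.0], [0.0, 1.0 / 3.0]]
M = [[1.0, 2.0],
     [0.0, 1.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def apply(X, v):
    return [sum(X[i][j] * v[j] for j in range(2)) for i in range(2)]

def B(u, v):
    return sum(u[i] * P[i][j] * v[j] for i in range(2) for j in range(2))

MT = [[M[j][i] for j in range(2)] for i in range(2)]
Tstar = matmul(matmul(Pinv, MT), P)

for u, v in [([1.0, 0.0], [0.0, 1.0]), ([1.0, 2.0], [3.0, -1.0])]:
    assert abs(B(u, apply(M, v)) - B(apply(Tstar, u), v)) < 1e-9
```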


The Hermitian (sesqui-linear) case. Suppose $T : U \to U$ is an endomorphism of a unitary space (a complex vector space equipped with a Hermitian inner product). In this setting we can define the Hermitian adjoint $T^\star : U \to U$ by means of the familiar adjointness condition
$$\langle u, Tv \rangle = \langle T^\star u, v \rangle, \qquad u, v \in U.$$
However, the analogous operation at the matrix level is the conjugate transpose. Thus, if $M \in \operatorname{Mat}_{n,n}(\mathbb{C})$ is the matrix of $T$ relative to an orthonormal basis, then $\bar{M}^T$ is the matrix of $T^\star$ relative to the same basis.
Version: 7 Owner: rmilson Author(s): rmilson

236.3 anti-symmetric

A relation $R$ on $A$ is antisymmetric iff $\forall x, y \in A$, $(xRy \wedge yRx) \implies (x = y)$. The number of possible antisymmetric relations on $A$ is $2^n 3^{(n^2-n)/2}$ out of the $2^{n^2}$ total possible relations, where $n = |A|$.

Antisymmetric is not the same thing as "not symmetric", as it is possible for a relation to be both antisymmetric and symmetric at the same time. However, a relation $R$ that is both antisymmetric and symmetric has the condition that $xRy \implies x = y$. There are only $2^n$ such possible relations on $A$.
An example of an antisymmetric relation on $A = \{a, b, c\}$ would be $R = \{(c, c), (b, b), (b, c), (c, a)\}$. One relation that isn't antisymmetric is $R = \{(b, b), (c, a), (a, c)\}$, because we have both $cRa$ and $aRc$, but $a \neq c$.
Version: 1 Owner: rmilson Author(s): rmilson

236.4 bilinear map

Definition. Let $U$ and $V$ be vector spaces over a field $K$. A function $B : U \times V \to K$ is called a bilinear map if
1. $B(cx_1 + x_2, y) = cB(x_1, y) + B(x_2, y)$ for all $c \in K$,
2. $B(x, cy_1 + y_2) = cB(x, y_1) + B(x, y_2)$ for all $c \in K$.
That is, $B$ is bilinear if it is linear in each parameter, taken separately.


Bilinear forms. If $U = V$ then $B$ is a bilinear form. In this case further assumptions are often made:
1. $B(x, y) = B(y, x)$ for all $x, y \in V$ (symmetric),
2. $B(x, y) = -B(y, x)$ for all $x, y \in V$ (skew-symmetric),
3. $B(x, x) = 0$ for all $x \in V$ (alternating).
By expanding $B(x + y, x + y) = 0$, we can show that alternating implies skew-symmetric. Further, if $K$ is not of characteristic 2, then skew-symmetric implies alternating.
Left and Right Maps. We may regard the bilinear map as a left map or a right map, as follows:
$$B_L \in L(U, V^*), \qquad B_R \in L(V, U^*),$$
$$B_L(x)(y) = B(x, y), \qquad B_R(y)(x) = B(x, y).$$
The left map is a linear map from $U$ into the dual of $V$. So, for example, $B$ is skew-symmetric iff $B_L = -B_R$.
Matrix Representation. Suppose $U$ and $V$ are finite-dimensional and we have chosen bases, $\mathcal{B}_1 = \{e_1, \dots\}$ and $\mathcal{B}_2 = \{f_1, \dots\}$. Now we define the matrix $C$ with entries $C_{ij} = B(e_i, f_j)$. This will be the matrix associated to $B$ with respect to this basis, as follows: if we write $x, y$ as column vectors in terms of the chosen bases, then $B(x, y) = x^T C y$. Further, if we choose the corresponding dual bases for $U^*$ and $V^*$, then $C$ and $C^T$ are the corresponding matrices for $B_R$ and $B_L$, respectively (in the sense of linear maps). Thus we see that a symmetric bilinear form is represented by a symmetric matrix, and similarly for skew-symmetric forms.
Let $\mathcal{B}'_1$ and $\mathcal{B}'_2$ be new bases, and $P$ and $Q$ the corresponding change of basis matrices. Then the new matrix is $C' = P^T C Q$.
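A numerical sketch of the change-of-basis rule $C' = P^T C Q$ (pure Python; the matrices are illustrative choices):

```python
# B(x, y) = x^T C y must give the same value whether computed in the old
# basis (matrix C) or the new basis (matrix P^T C Q).
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def T(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def B(x, y, M):
    return sum(x[i] * M[i][j] * y[j] for i in range(2) for j in range(2))

C = [[1.0, 2.0], [0.0, 1.0]]
P = [[1.0, 1.0], [0.0, 1.0]]   # new basis vectors as columns
Q = [[2.0, 0.0], [1.0, 1.0]]

Cnew = matmul(T(P), matmul(C, Q))
# New coordinates x', y' correspond to old coordinates P x', Q y'.
x_new, y_new = [1.0, 2.0], [3.0, 1.0]
x_old = [P[i][0] * x_new[0] + P[i][1] * x_new[1] for i in range(2)]
y_old = [Q[i][0] * y_new[0] + Q[i][1] * y_new[1] for i in range(2)]
assert abs(B(x_old, y_old, C) - B(x_new, y_new, Cnew)) < 1e-9
```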
Rank. If $U$ and $V$ are finite dimensional, it may be shown that $\operatorname{rank} B_L = \operatorname{rank} B_R$. We call this simply the rank of $B$. We say that $B$ is non-degenerate if the left and right maps are both linear isomorphisms.
Now applying the rank-nullity theorem to both the left and right maps gives the following results:
$$\dim U = \dim \ker B_L + r,$$
$$\dim V = \dim \ker B_R + r.$$


Orthogonals. If $T \leq U$ and $S \leq V$ then we may define the orthogonals $T^\perp \leq V$ and ${}^\perp S \leq U$ as follows:
$$T^\perp = \{v \mid B(t, v) = 0 \ \forall t \in T\},$$
$${}^\perp S = \{u \mid B(u, s) = 0 \ \forall s \in S\}.$$
The orthogonal of a subspace is itself a subspace. Further, if $B$ is a symmetric or skew-symmetric bilinear form, then ${}^\perp A = A^\perp$, and we may use the latter notation.
$T$ is a non-degenerate subspace if $T \cap T^\perp = \{0\}$. Similarly when $S \cap {}^\perp S = \{0\}$.
We may also realise $T^\perp$ by considering the restriction $B' = B|_{T \times V}$. It is clear that $T^\perp = \ker B'_R$. Now if $B$ is non-degenerate (or more generally $T \cap {}^\perp V = \{0\}$) then $\ker B'_L = \{0\}$, and we can use the rank-nullity equations to get $\dim V = \dim T + \dim T^\perp$. Similarly we may show that $\dim U = \dim S + \dim {}^\perp S$.
Canonical Representations for Symmetric Forms. If $B : V \times V \to K$ is a symmetric bilinear form over a finite-dimensional vector space, then there is an orthogonal basis such that $B$ is represented by
$$\begin{pmatrix} a_1 & 0 & \dots & 0 \\ 0 & a_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_n \end{pmatrix}$$
Denote the rank of $B$ by $r$.

If $K = \mathbb{R}$ we may choose a basis such that $a_1 = \dots = a_t = 1$, $a_{t+1} = \dots = a_{t+p} = -1$ and $a_{t+p+j} = 0$, for some integers $p$ and $t$. Further, these integers are invariants of the bilinear form. This is known as Sylvester's Law of Inertia. $B$ is positive definite iff $t = n$, $p = 0$. Such a form constitutes a real inner product space.
If $K = \mathbb{C}$ we may go further and choose a basis such that $a_1 = \dots = a_r = 1$ and $a_{r+j} = 0$.
If $K = \mathbb{F}_p$ we may choose a basis such that $a_1 = \dots = a_{r-1} = 1$, $a_{r+j} = 0$ and $a_r = n$ or $a_r = 1$, where $n$ is the least positive quadratic non-residue.


Adjoint. Suppose $B : U \times V \to K$ is a non-degenerate bilinear map. If $T \in L(U, U)$ then we define the adjoint of $T$, $T^\star \in L(V, V)$, to be the unique linear map such that
$$B(Tu, v) = B(u, T^\star v).$$
Let $T^* : U^* \to U^*$ be the dual endomorphism. Then $T^\star = B_R^{-1} T^* B_R$.
If $U = V$ and we choose a canonical (orthogonal) basis for $B$, then the adjoint corresponds to the matrix transpose.
$T$ is then said to be a normal operator (with respect to this bilinear map) if it commutes with its adjoint.

Examples. An important example is the non-degenerate bilinear map
$$B : V \times V^* \to K, \qquad B(v, f) = f(v).$$
Here the orthogonal is exactly the annihilator. This gives the result that $\dim U + \dim U^\perp = \dim V$.
An $n \times m$ matrix may be regarded as a bilinear form over $K^n$ and $K^m$. Two matrices, $B$ and $C$, are then said to be congruent if there exists an invertible $P$ such that $B = P^T C P$. Note this is different from the usual notion of congruence.
If the matrix is the identity, $I$, then this gives the standard Euclidean inner product on $K^n$.
An inner product space on a vector space is a bilinear form if its field is real, but not if it is complex. In fact, the bilinear form associated with a real inner product space is non-degenerate, and every subspace is non-degenerate. So, as we may intuitively expect, $V = U \oplus U^\perp$ for every subspace $U$.
Version: 32 Owner: vitriol Author(s): vitriol, akrowne

236.5 dot product
Let u = (u_1, u_2, …, u_n) and v = (v_1, v_2, …, v_n) be two vectors in k^n, where k is a field (like R
or C). Then we define the dot product of the two vectors as
u · v = u_1 v_1 + u_2 v_2 + ⋯ + u_n v_n.
Notice that u · v is NOT a vector but a scalar (an element of the field k).
If u, v are vectors in R^n and θ is the angle between them, then we also have
u · v = ‖u‖ ‖v‖ cos θ.
Thus, in this case, u ⊥ v if and only if u · v = 0.
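By way of numerical illustration, both formulas above can be checked in Python (numpy assumed; the vectors are made-up examples):

```python
import numpy as np

# The coordinate formula u.v = u1*v1 + ... + un*vn, and the geometric
# formula u.v = |u||v| cos(theta).
u = np.array([3.0, 0.0])
v = np.array([1.0, 1.0])

dot = float(np.dot(u, v))            # 3*1 + 0*1
assert dot == 3.0

theta = np.arccos(dot / (np.linalg.norm(u) * np.linalg.norm(v)))
assert np.isclose(theta, np.pi / 4)  # the angle between u and v is 45 degrees

# Orthogonality: u is perpendicular to w exactly when the dot product is 0.
w = np.array([0.0, 2.0])
assert np.dot(u, w) == 0.0
```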
Version: 6 Owner: drini Author(s): drini

236.6 every orthonormal set is linearly independent

Theorem
Let S be a set of vectors from an inner product space L. If S is orthonormal, then S is
linearly independent.
Proof. We denote by ⟨·, ·⟩ the inner product of L. Let us first consider the case when S is
finite, i.e., S = {e_1, …, e_n} for some n. Suppose
λ_1 e_1 + ⋯ + λ_n e_n = 0
for some scalars λ_i (belonging to the field of the underlying vector space of L). For a fixed
k in 1, …, n, we then have
0 = ⟨e_k, 0⟩
  = ⟨e_k, λ_1 e_1 + ⋯ + λ_n e_n⟩
  = λ_1 ⟨e_k, e_1⟩ + ⋯ + λ_n ⟨e_k, e_n⟩
  = λ_k,
so λ_k = 0, and S is linearly independent. Next, suppose S is infinite (countable or uncountable).
To prove that S is linearly independent, we need to show that all finite subsets of S are
linearly independent. Since any subset of an orthonormal set is also orthonormal, the infinite
case follows from the finite case. ∎
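The theorem can be illustrated numerically in Python (numpy assumed): orthonormal columns, here produced by a QR factorization of a made-up matrix, always form a matrix of full column rank.

```python
import numpy as np

# Orthonormal columns are linearly independent: the matrix they form has
# full column rank, so Qx = 0 forces x = 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
Q, _ = np.linalg.qr(A)          # columns of Q are orthonormal

# Orthonormality: Q^T Q is the identity (<e_k, e_l> = delta_kl).
assert np.allclose(Q.T @ Q, np.eye(3))

# Linear independence: full column rank.
assert np.linalg.matrix_rank(Q) == 3
```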
The present result with proof can be found in [1], page 153.

REFERENCES
1. E. Kreyszig, Introductory Functional Analysis With Applications, John Wiley & Sons,
1978.

Version: 6 Owner: mathcam Author(s): matte

236.7 inner product

An inner product on a vector space V over a field K (which must be either the field R of
real numbers or the field C of complex numbers) is a function (·, ·) : V × V → K such that,
for all k_1, k_2 ∈ K and v_1, v_2, v, w ∈ V, the following properties hold:

1. (k_1 v_1 + k_2 v_2, w) = k_1(v_1, w) + k_2(v_2, w) (linearity; a small minority of authors impose linearity on the second coordinate instead of the first)

2. (v, w) = (w, v)*, where * denotes complex conjugation (conjugate symmetry)

3. (v, v) ≥ 0, and (v, v) = 0 if and only if v = 0 (positive definite)

(Note: Rule 2 guarantees that (v, v) ∈ R, so the inequality (v, v) ≥ 0 in rule 3 makes sense
even when K = C.)
The standard example of an inner product is the dot product on K^n:
((x_1, …, x_n), (y_1, …, y_n)) := Σ_{i=1}^n x_i y_i*.
Every inner product space is a normed vector space, with the norm being defined by
‖v‖ := √((v, v)).
Version: 10 Owner: djao Author(s): djao

236.8 inner product space

A vector space over R or C taken with a specific inner product ⟨x, y⟩ forms an inner product
space.
For example, R^n with the familiar dot product forms an inner product space.
The expression √(⟨x, x⟩) is written ‖x‖ and is called the norm. This makes the inner product
space also a normed vector space. That is, the norm has the following properties:
1. ‖cx‖ = |c| ‖x‖ for all c ∈ K,
2. ‖x‖ ≥ 0, with ‖x‖ = 0 if and only if x = 0,
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality),
where K is the underlying field of the vector space.


In addition, the Cauchy-Schwarz inequality
|⟨x, y⟩| ≤ ‖x‖ ‖y‖
holds and follows from the definition of an inner product space.
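Both inequalities can be spot-checked numerically. The following Python sketch (numpy assumed; vectors drawn at random purely for illustration) uses the dot product on R^4:

```python
import numpy as np

# The norm from the dot product satisfies the triangle inequality and the
# Cauchy-Schwarz inequality.
rng = np.random.default_rng(1)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

norm = lambda v: np.sqrt(np.dot(v, v))   # ||v|| = sqrt(<v, v>)

assert norm(x + y) <= norm(x) + norm(y)          # triangle inequality
assert abs(np.dot(x, y)) <= norm(x) * norm(y)    # Cauchy-Schwarz
```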
Version: 8 Owner: akrowne Author(s): AxelBoldt, akrowne

236.9 proof of Cauchy-Schwarz inequality

If a and b are linearly dependent, we write b = λa. So we get:
⟨a, b⟩² = λ²⟨a, a⟩² = λ²‖a‖⁴ = ‖a‖²‖b‖².
So we have equality if a and b are linearly dependent. In the other case we look at the
quadratic function of x
‖xa + b‖² = x²‖a‖² + 2x⟨a, b⟩ + ‖b‖².
This function is positive for every real x if a and b are linearly independent. Thus it has
no real zeroes, which means that its discriminant
4⟨a, b⟩² − 4‖a‖²‖b‖²
is always negative. So we have:
⟨a, b⟩² < ‖a‖²‖b‖²,
which is the Cauchy-Schwarz inequality if a and b are linearly independent.


Version: 2 Owner: mathwizard Author(s): mathwizard

236.10 self-dual

Definition. Let U be a finite-dimensional inner-product space over a field K. Let T : U → U
be an endomorphism, and note that the adjoint endomorphism T⋆ is also an endomorphism
of U. It is therefore possible to add, subtract, and compare T and T⋆, and we are
able to make the following definitions. An endomorphism T is said to be self-dual (a.k.a.
self-adjoint) if
T = T⋆.
By contrast, we say that the endomorphism is anti self-dual if
T = −T⋆.
Exactly the same definitions can be made for an endomorphism of a complex vector space
with a Hermitian inner product.
Relation to the matrix transpose. All of these definitions have their counterparts in
the matrix setting. Let M ∈ Mat_{n,n}(K) be the matrix of T relative to an orthogonal basis
of U. Then T is self-dual if and only if M is a symmetric matrix, and anti self-dual if and
only if M is a skew-symmetric matrix.
In the case of a Hermitian inner product we must replace the transpose with the conjugate
transpose. Thus T is self-dual if and only if M is a Hermitian matrix, i.e. M equals its
conjugate transpose. It is anti self-dual if and only if M equals the negative of its conjugate
transpose.
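The matrix characterization can be sketched in Python (numpy assumed; the matrix and vectors are made-up examples). With the standard Hermitian inner product on C^n, the adjoint of M is its conjugate transpose, and self-adjointness means M equals that conjugate transpose:

```python
import numpy as np

# A Hermitian matrix: M equals its conjugate transpose, so T is self-dual.
M = np.array([[2.0, 1 + 1j],
              [1 - 1j, 3.0]])

# <x, y> on C^2, conjugate-linear in the second slot (np.vdot conjugates
# its first argument, so we pass y first).
inner = lambda x, y: np.vdot(y, x)

x = np.array([1.0 + 2j, -1.0])
y = np.array([0.5, 1j])

adjoint = M.conj().T
assert np.allclose(adjoint, M)                             # self-adjoint
assert np.isclose(inner(M @ x, y), inner(x, adjoint @ y))  # <Mx,y> = <x,M*y>
```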
Version: 2 Owner: rmilson Author(s): rmilson

236.11 skew-symmetric bilinear form

A skew-symmetric (or antisymmetric) bilinear form is a bilinear form B which is
skew-symmetric in the two coordinates; that is, B(x, y) = −B(y, x) for all vectors x and y. In
particular, this means that B(x, x) = 0.
A bilinear form is skew-symmetric iff its defining matrix is skew-symmetric.
Version: 1 Owner: sleske Author(s): sleske

236.12 spectral theorem

Let U be a finite-dimensional, unitary space and let M : U → U be an endomorphism. We
say that M is normal if it commutes with its Hermitian adjoint, i.e.
M M⋆ = M⋆ M.

Spectral Theorem. Let M : U → U be a linear transformation of a unitary space. TFAE:

1. The transformation M is normal.

2. Letting
Λ = { λ ∈ C | M − λ1_U is singular }
denote the spectrum (set of eigenvalues) of M, the corresponding eigenspaces
E_λ = ker(M − λ1_U), λ ∈ Λ,
give an orthogonal, direct sum decomposition of U, i.e.
U = ⊕_{λ∈Λ} E_λ,
and E_{λ1} ⊥ E_{λ2} for distinct eigenvalues λ1 ≠ λ2.

3. We can decompose M as the sum
M = Σ_{λ∈Λ} λ P_λ,
where Λ ⊂ C is a finite subset of complex numbers indexing a family of commuting
orthogonal projections P_λ : U → U, i.e.
P_λ⋆ = P_λ,    P_λ P_μ = P_λ if λ = μ and 0 if λ ≠ μ,
and where, without loss of generality,
Σ_{λ∈Λ} P_λ = 1_U.

4. There exists an orthonormal basis of U that diagonalizes M.


Remarks.
1. Here are some important classes of normal operators, distinguished by the nature of
their eigenvalues.
Hermitian operators: eigenvalues are real.
Unitary transformations: eigenvalues lie on the unit circle, i.e. the set of complex
numbers of modulus 1.
Orthogonal projections: eigenvalues are either 0 or 1.
2. There is a well-known version of the spectral theorem for R, namely that a self-adjoint
(symmetric) transformation of a real inner product space can be diagonalized and that
eigenvectors corresponding to different eigenvalues are orthogonal. An even more
down-to-earth version of this theorem says that a symmetric, real matrix can always be
diagonalized by an orthonormal basis of eigenvectors.
3. There are several versions of increasing sophistication of the spectral theorem that
hold in the infinite-dimensional, Hilbert space setting. In such a context one must
distinguish between the so-called discrete and continuous (no corresponding eigenspace)
spectra, and replace the representing sum for the operator with some kind of
integral. The definition of self-adjointness is also quite tricky for unbounded operators.
Finally, there are versions of the spectral theorem, of importance in theoretical
quantum mechanics, that can be applied to continuous 1-parameter groups of commuting,
self-adjoint operators.
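The down-to-earth version in remark 2 can be illustrated with numpy (the matrix below is a made-up example): a real symmetric matrix is diagonalized by an orthonormal eigenbasis and decomposes into a sum of spectral projections, as in item 3 of the theorem.

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
assert np.allclose(M, M.T)                      # symmetric, hence normal

w, Q = np.linalg.eigh(M)                        # eigenvalues w, eigenvectors Q
assert np.allclose(Q.T @ Q, np.eye(3))          # columns of Q are orthonormal
assert np.allclose(Q @ np.diag(w) @ Q.T, M)     # M = Q diag(w) Q^T

# Equivalently M = sum_k w_k P_k with rank-one orthogonal projections P_k.
P = [np.outer(Q[:, k], Q[:, k]) for k in range(3)]
assert np.allclose(sum(w[k] * P[k] for k in range(3)), M)
```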
Version: 4 Owner: rmilson Author(s): rmilson, NeuRet

Chapter 237
15A66 Clifford algebras, spinors

237.1 geometric algebra

Geometric algebra is a Clifford algebra which has been used with great success in the modeling of a wide variety of physical phenomena. Clifford algebra is considered a more general
algebraic framework than geometric algebra. The primary distinction is that geometric algebra utilizes only real numbers as scalars and to represent magnitudes.
Let Vn be an n-dimensional vector space over the real numbers. The geometric algebra
Gn = G(Vn) is a graded algebra similar to Grassmann's exterior algebra, except that the
exterior product is replaced by a more fundamental multiplication operation known as the
geometric product. For vectors a, b, c ∈ Vn and real scalars λ, μ ∈ R, the geometric
product satisfies the following axioms:

associativity:   a(bc) = (ab)c;   a + (b + c) = (a + b) + c
commutativity:   λμ = μλ,  λ + μ = μ + λ;   λb = bλ,  λ + b = b + λ;   a + b = b + a;
                 ab = ½(ab + ba) + ½(ab − ba)
distributivity:  a(b + c) = ab + ac;   (b + c)a = ba + ca
linearity:       λ(b + c) = λb + λc = (b + c)λ
contraction:     a² = aa = Σ_{i=1}^n ε_i |a_i|²,  where ε_i ∈ {−1, 0, 1}

Commutativity of scalar-scalar multiplication and vector-scalar multiplication is symmetric;
however, in general, vector-vector multiplication is not commutative. The order of
multiplication of vectors is significant. In particular, for parallel vectors:
ab = ba
and for orthogonal vectors:
ab = −ba.
The parallelism of vectors is encoded as a symmetric property, while orthogonality of vectors
is encoded as an antisymmetric property.
The contraction rule specifies that the square of any vector is a scalar equal to the sum of
the squares of the magnitudes of its components in each basis direction. Depending on the
contraction rule for each of the basis directions, the magnitude of the vector may be positive,
negative, or zero. A vector with a magnitude of zero is called a null vector.
The graded algebra Gn generated from Vn is defined over a 2^n-dimensional linear space.
The basis entities for this space can be generated by successive application of the geometric
product to the basis vectors of Vn until a closed set of basis entities is obtained. The basis
entities for the space are known as blades. The following multiplication table illustrates the
generation of basis blades from the basis vectors e1, e2 ∈ Vn.
         e1        e2        e12
 e1      ε1        e12       ε1 e2
 e2      −e12      ε2        −ε2 e1
 e12     −ε1 e2    ε2 e1     −ε1 ε2

Here, ε1 and ε2 represent the contraction rule for e1 and e2 respectively. Note that the basis
vectors of Vn become blades themselves, in addition to the multiplicative identity, ε0 ≡ 1,
and the new bivector e12 ≡ e1 e2. As the table demonstrates, this set of basis blades is
closed under the geometric product.
The geometric product ab is related to the inner product a · b and the exterior product
a ∧ b by
ab = a · b + a ∧ b = b · a − b ∧ a = 2a · b − ba.
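The multiplication table above can be turned into a tiny computer model of G(2). The following Python sketch (entirely illustrative; it assumes the Euclidean contraction rule ε1 = ε2 = 1) stores a multivector as its coefficients over the blades (1, e1, e2, e12):

```python
import numpy as np

# TABLE[i][j] = (sign, k) means blade_i * blade_j = sign * blade_k,
# with blades indexed 0:1, 1:e1, 2:e2, 3:e12.
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (1, 0), (1, 3), (1, 2)],
    [(1, 2), (-1, 3), (1, 0), (-1, 1)],
    [(1, 3), (-1, 2), (1, 1), (-1, 0)],
]

def gp(a, b):
    """Geometric product of two multivectors in Euclidean G(2)."""
    out = np.zeros(4)
    for i in range(4):
        for j in range(4):
            sign, k = TABLE[i][j]
            out[k] += sign * a[i] * b[j]
    return out

e1 = np.array([0.0, 1.0, 0.0, 0.0])
e2 = np.array([0.0, 0.0, 1.0, 0.0])

assert np.allclose(gp(e1, e1), [1, 0, 0, 0])    # contraction: e1^2 = 1
assert np.allclose(gp(e1, e2), -gp(e2, e1))     # orthogonal vectors anticommute
assert np.allclose(gp(gp(e1, e2), gp(e1, e2)), [-1, 0, 0, 0])  # e12^2 = -1
```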
Version: 7 Owner: PhysBrain Author(s): PhysBrain

Chapter 238
15A69 Multilinear algebra, tensor products

238.1 Einstein summation convention

The Einstein summation convention implies that when an index occurs more than once
in the same expression, the expression is implicitly summed over all possible values for that
index. Therefore, in order to use the summation convention, it must be clear from the
context over what range indices should be summed.
The Einstein summation convention is illustrated in the examples below.

1. Let {e_i}_{i=1}^n be an orthonormal basis in R^n. Then the inner product of the vectors
u = u^i e_i = Σ_{i=1}^n u^i e_i and v = v^i e_i = Σ_{i=1}^n v^i e_i is
u · v = u^i v^j e_i · e_j = δ_{ij} u^i v^j.

2. Let V be a vector space with basis {e_i}_{i=1}^n and a dual basis {e^i}_{i=1}^n. Then, for a vector
v = v^i e_i and dual vectors α = α_i e^i and β = β_i e^i, we have
(α + β)(v) = α_i v^i + β_j v^j = (α_i + β_i) v^i.
This example shows that the summation convention is distributive in a natural way.

3. Chain rule. Let F : R^m → R^n, x ↦ (F^1(x), …, F^n(x)), and G : R^n → R^p, y ↦ (G^1(y), …, G^p(y))
be smooth functions. Then
∂(G ∘ F)^i/∂x^j (x) = ∂G^i/∂y^k (F(x)) · ∂F^k/∂x^j (x),
where the right hand side is summed over k = 1, …, n.
An index which is summed is called a dummy index or dummy variable. For instance, i
is a dummy index in v^i e_i. An expression does not depend on a dummy index, i.e., v^i e_i = v^j e_j.
It is common that one must change the name of dummy indices. For instance, above, in
Example 1 when we calculated u · v, it was necessary to change the index i in v = v^i e_i to j
so that it would not clash with u = u^i e_i.
When using the Einstein summation convention, objects are usually indexed so that when
summing, one index will always be an upper index and the other a lower index. Then
summing should only take place over upper and lower indices. In the above examples, we
have followed this rule. Therefore we did not write δ_{ij} u^i v^j = u^i v^i in the first example, since
u^i v^i has two upper indices. This is consistent; it is not possible to take the inner product of
two vectors without a metric, which is here δ_{ij}. The last example illustrates that when we
consider k as a lower index in ∂G^i/∂y^k, the chain rule obeys this upper-lower rule for the
indices.
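numpy's einsum function implements exactly this convention: repeated indices in its subscript string are summed. The following sketch (the vectors and Jacobians are made-up illustrations) mirrors Examples 1 and 3 above:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
delta = np.eye(3)

# Example 1: u.v = delta_ij u^i v^j; i and j are dummy indices.
assert np.isclose(np.einsum('ij,i,j->', delta, u, v), np.dot(u, v))

# Example 3: the chain rule contracts the Jacobians over the dummy index k.
JF = np.arange(6.0).reshape(3, 2)      # dF^k/dx^j for some F: R^2 -> R^3
JG = np.arange(12.0).reshape(4, 3)     # dG^i/dy^k for some G: R^3 -> R^4
J = np.einsum('ik,kj->ij', JG, JF)     # d(G o F)^i/dx^j
assert np.allclose(J, JG @ JF)
```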
Version: 1 Owner: matte Author(s): matte

238.2 basic tensor

The present entry employs the terminology and notation defined and described in the entry
on tensor arrays. To keep things reasonably self-contained we mention that the symbol T^{p,q}
refers to the vector space of type (p, q) tensor arrays, i.e. maps
I^p × I^q → K,
where I is some finite list of index labels, and where K is a field.
We say that a tensor array is a characteristic array, a.k.a. a basic tensor, if all but one of its
values are 0, and the remaining non-zero value is equal to 1. For tuples A ∈ I^p and B ∈ I^q,
we let
δ^B_A : I^p × I^q → K
denote the characteristic array defined by
(δ^B_A)^{i_1…i_p}_{j_1…j_q} = 1 if (i_1, …, i_p) = A and (j_1, …, j_q) = B, and 0 otherwise.
The type (p, q) characteristic arrays form a natural basis for T^{p,q}.
Furthermore, the outer multiplication of two characteristic arrays gives a characteristic array
of larger valence. In other words, for
A_1 ∈ I^{p_1}, B_1 ∈ I^{q_1}, A_2 ∈ I^{p_2}, B_2 ∈ I^{q_2},
we have that
δ^{B_1}_{A_1} δ^{B_2}_{A_2} = δ^{B_1 B_2}_{A_1 A_2},
where the product on the left-hand side is performed by outer multiplication, and where
A_1 A_2 on the right-hand side refers to the element of I^{p_1 + p_2} obtained by concatenating the
tuples A_1 and A_2, and similarly for B_1 B_2.
In this way we see that the type (1, 0) characteristic arrays δ_{(i)}, i ∈ I (the natural basis of
K^I), and the type (0, 1) characteristic arrays δ^{(i)}, i ∈ I (the natural basis of (K^I)^*) generate
the tensor array algebra relative to the outer multiplication operation.
The just-mentioned fact gives us an alternate way of writing and thinking about tensor
arrays. We introduce the basic symbols
δ_{(i)}, δ^{(i)},  i ∈ I,
subject to the commutation relations
δ^{(i)} δ_{(i′)} = δ_{(i′)} δ^{(i)},  i, i′ ∈ I,
add and multiply these symbols using coefficients in K, and use
δ^{(i_1 … i_q)}_{(j_1 … j_p)},  i_1, …, i_q, j_1, …, j_p ∈ I
as a handy abbreviation for
δ^{(i_1)} ⋯ δ^{(i_q)} δ_{(j_1)} ⋯ δ_{(j_p)}.
We then interpret the resulting expressions as tensor arrays in the obvious fashion: the values
of the tensor array are just the coefficients of the symbol matching the given index. However,
note that in the symbols, the covariant data is written as a superscript, and the contravariant
data as a subscript. This is done to facilitate the Einstein summation convention.
By way of illustration, suppose that I = (1, 2). We can now write down a type (1, 0) tensor,
i.e. a column vector
u = (u^1, u^2)^T ∈ T^{1,0},
as
u = u^1 δ_{(1)} + u^2 δ_{(2)}.
Similarly, a row vector
φ = (φ_1, φ_2) ∈ T^{0,1}
can be written down as
φ = φ_1 δ^{(1)} + φ_2 δ^{(2)}.
In the case of a matrix

    M = ( M^1_1  M^1_2 )
        ( M^2_1  M^2_2 )  ∈ T^{1,1},

we would write
M = M^1_1 δ^{(1)}_{(1)} + M^1_2 δ^{(2)}_{(1)} + M^2_1 δ^{(1)}_{(2)} + M^2_2 δ^{(2)}_{(2)}.
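The expansion of a matrix in the basis of characteristic arrays can be sketched in Python (numpy assumed; the 0-based indexing and the particular matrix are artifacts of the illustration):

```python
import numpy as np

# Type (1,1) characteristic arrays are the matrices with a single 1; every
# matrix expands in this basis with its own entries as coefficients.
def delta(i, j, n=2):
    """Characteristic array with a 1 in row i, column j (0-based)."""
    d = np.zeros((n, n))
    d[i, j] = 1.0
    return d

M = np.array([[5.0, 6.0],
              [7.0, 8.0]])

expansion = sum(M[i, j] * delta(i, j) for i in range(2) for j in range(2))
assert np.allclose(expansion, M)
```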


Version: 3 Owner: rmilson Author(s): rmilson
238.3 multi-linear

Let V_1, V_2, …, V_n, W be vector spaces over a field K. A mapping
M : V_1 × V_2 × ⋯ × V_n → W
is called multi-linear or n-linear if M is linear in each of its arguments.

Notes.
• A bilinear mapping is another name for a 2-linear mapping.
• This definition generalizes in an obvious way to rings and modules.
• An excellent example of a multi-linear map is the determinant operation.
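The determinant example can be checked numerically (numpy assumed; rows are made-up): det is linear in each row of a matrix separately.

```python
import numpy as np

r1 = np.array([1.0, 2.0])
r1b = np.array([0.5, -1.0])
r2 = np.array([3.0, 4.0])

det = lambda a, b: np.linalg.det(np.array([a, b]))

# Homogeneity in the first row.
assert np.isclose(det(5 * r1, r2), 5 * det(r1, r2))
# Additivity in the first row.
assert np.isclose(det(r1 + r1b, r2), det(r1, r2) + det(r1b, r2))
```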
Version: 5 Owner: rmilson Author(s): rmilson

238.4 outer multiplication

Note: the present entry employs the terminology and notation defined and described in the
entry on tensor arrays. To keep things reasonably self-contained we mention that the symbol
T^{p,q} refers to the vector space of type (p, q) tensor arrays, i.e. maps
I^p × I^q → K,
where I is some finite list of index labels, and where K is a field.
Let p_1, p_2, q_1, q_2 be natural numbers. Outer multiplication is a bilinear operation
T^{p_1,q_1} × T^{p_2,q_2} → T^{p_1+p_2, q_1+q_2}
that combines a type (p_1, q_1) tensor array X and a type (p_2, q_2) tensor array Y to produce
a type (p_1 + p_2, q_1 + q_2) tensor array XY (also written as X ⊗ Y), defined by
(XY)^{i_1 … i_{p_1} i_{p_1+1} … i_{p_1+p_2}}_{j_1 … j_{q_1} j_{q_1+1} … j_{q_1+q_2}} = X^{i_1 … i_{p_1}}_{j_1 … j_{q_1}} Y^{i_{p_1+1} … i_{p_1+p_2}}_{j_{q_1+1} … j_{q_1+q_2}}.
Speaking informally, what is going on above is that we multiply every value of the X array
by every possible value of the Y array to create a new array, XY. Quite obviously then,
the size of XY is the size of X times the size of Y, and the index slots of the product XY
are just the union of the index slots of X and of Y.
Outer multiplication is a non-commutative, associative operation. The type (0, 0) arrays are
the scalars, i.e. elements of K; they commute with everything. Thus, we can embed K into
the direct sum
⊕_{p,q ∈ N} T^{p,q},
and thereby endow the latter with the structure of a K-algebra.¹

By way of illustration we mention that the outer product of a column vector, i.e. a type
(1, 0) array, and a row vector, i.e. a type (0, 1) array, gives a matrix, i.e. a type (1, 1) tensor
array. For instance:


ax
ay
az
a

b x y z = bx by bz , a, b, c, x, y, z K
cx cy cz
c
Version: 1 Owner: rmilson Author(s): rmilson

238.5 tensor

Overview. A tensor is the mathematical idealization of a geometric or physical quantity
whose analytic description, relative to a fixed frame of reference, consists of an array of
numbers.² Some well known examples of tensors in geometry are quadratic forms, and the
curvature tensor. Examples of physical tensors are the energy-momentum tensor, and the
polarization tensor.
Geometric and physical quantities may be categorized by considering the degrees of freedom
inherent in their description. The scalar quantities are those that can be represented by a
single number: speed, mass, temperature, for example. There are also vector-like quantities,
such as force, that require a list of numbers for their description. Finally, quantities
such as quadratic forms naturally require a multiply indexed array for their representation.
These latter quantities can only be conceived of as tensors.
Actually, the tensor notion is quite general, and applies to all of the above examples; scalars
and vectors are special kinds of tensors. The feature that distinguishes a scalar from a vector,
and distinguishes both of those from a more general tensor quantity, is the number of indices
in the representing array. This number is called the rank of a tensor. Thus, scalars are rank
zero tensors (no indices at all), and vectors are rank one tensors.
¹ We will not pursue this line of thought here, because the topic of algebra structure is best dealt with in
a more abstract context. The same comment applies to the use of the tensor product sign ⊗ in denoting
outer multiplication. These topics are dealt with in the entry pertaining to abstract tensor algebra.
² Ceci n'est pas une pipe, as René Magritte put it. The image and the object represented by the image
are not the same thing. The mass of a stone is not a number. Rather, the mass can be described by a
number relative to some specified unit mass.

It is also necessary to distinguish between two types of indices, depending on whether the
corresponding numbers transform covariantly or contravariantly relative to a change in the
frame of reference. Contravariant indices are written as superscripts, while the covariant
indices are written as subscripts. The valence of a tensor is the pair (p, q), where p is the
number of contravariant and q the number of covariant indices, respectively.
It is customary to represent the actual tensor, as a stand-alone entity, by a bold-face symbol
such as T. The corresponding array of numbers for a type (p, q) tensor is denoted by the
symbol T^{i_1 … i_p}_{j_1 … j_q}, where the superscripts and subscripts are indices that vary from 1 to n. This
number n, the range of the indices, is called the dimension of the tensor. The total degrees
of freedom required for the specification of a particular tensor is a power of the dimension;
the exponent is the tensor's rank.
Again, it must be emphasized that the tensor T and the representing array T^{i_1 … i_p}_{j_1 … j_q} are not
the same thing. The values of the representing array are given relative to some frame of
reference, and undergo a linear transformation when the frame is changed.
Finally, it must be mentioned that most physical and geometric applications are concerned
with tensor fields, that is to say tensor-valued functions, rather than tensors themselves.
Some care is required, because it is common to see a tensor field called simply a tensor.
There is a difference, however; the entries of a tensor array T^{i_1 … i_p}_{j_1 … j_q} are numbers, whereas
the entries of a tensor field are functions. The present entry treats the purely algebraic
aspect of tensors. Tensor field concepts, which typically involve derivatives of some kind,
are discussed elsewhere.

Definition. The formal definition of a tensor quantity begins with a finite-dimensional
vector space U, which furnishes the uniform building blocks for tensors of all valences.
In typical applications, U is the tangent space at a point of a manifold; the elements of U
represent velocities and forces. The space of (p, q)-valent tensors, denoted here by U^{p,q}, is
obtained by taking the tensor product of p copies of U and q copies of the dual vector space
U^*. To wit,
U^{p,q} = U ⊗ ⋯ ⊗ U (p times) ⊗ U^* ⊗ ⋯ ⊗ U^* (q times).

In order to represent a tensor by a concrete array of numbers, we require a frame of reference,
which is essentially a basis of U, say
e_1, …, e_n ∈ U.
Every vector in U can be measured relative to this basis, meaning that for every v ∈ U
there exist unique scalars v^i such that (note the use of the Einstein summation convention)
v = v^i e_i.
These scalars are called the components of v relative to the frame in question.

Let e^1, …, e^n ∈ U^* be the corresponding dual basis, i.e.
e^i(e_j) = δ^i_j,
where the latter is the Kronecker delta array. For every covector α ∈ U^* there exists a
unique array of components α_i such that
α = α_i e^i.
More generally, every tensor T ∈ U^{p,q} has a unique representation in terms of components.
That is to say, there exists a unique array of scalars T^{i_1 … i_p}_{j_1 … j_q} such that
T = T^{i_1 … i_p}_{j_1 … j_q} e_{i_1} ⊗ ⋯ ⊗ e_{i_p} ⊗ e^{j_1} ⊗ ⋯ ⊗ e^{j_q}.


Transformation rule. Next, suppose that a change is made to a different frame of reference, say
ẽ_1, …, ẽ_n ∈ U.
Any two frames are uniquely related by an invertible transition matrix A^i_j, having the
property that for all values of j we have
ẽ_j = A^i_j e_i.    (238.5.1)
Let v ∈ U be a vector, and let v^i and ṽ^i denote the corresponding component arrays relative
to the two frames. From
v = v^i e_i = ṽ^i ẽ_i
and from (238.5.1) we infer that
ṽ^i = B^i_j v^j,    (238.5.2)
where B^i_j is the matrix inverse of A^i_j, i.e.
A^i_k B^k_j = δ^i_j.
Thus, the transformation rule for a vector's components (238.5.2) is contravariant to the
transformation rule for the frame of reference (238.5.1). It is for this reason that the
superscript indices of a vector are called contravariant.
To establish (238.5.2), we note that the transformation rule for the dual basis takes the form
ẽ^i = B^i_j e^j,
and that
v^i = e^i(v),
while
ṽ^i = ẽ^i(v).

The transformation rule for covector components is covariant. Let α ∈ U^* be a given
covector, and let α_i and α̃_i be the corresponding component arrays. Then
α̃_j = A^i_j α_i.
The above relation is easily established. We need only remark that
α_i = α(e_i)
and that
α̃_j = α(ẽ_j),
and then use (238.5.1).
In light of the above discussion, we see that the transformation rule for a general type (p, q)
tensor takes the form
T̃^{i_1 … i_p}_{j_1 … j_q} = B^{i_1}_{k_1} ⋯ B^{i_p}_{k_p} A^{l_1}_{j_1} ⋯ A^{l_q}_{j_q} T^{k_1 … k_p}_{l_1 … l_q}.
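The transformation rule can be sketched for a type (1, 1) tensor with numpy's einsum (the transition matrix and tensor components below are made-up illustrations):

```python
import numpy as np

# T'^i_j = B^i_k A^l_j T^k_l, where B is the inverse of the transition
# matrix A.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # invertible transition matrix
B = np.linalg.inv(A)
T = rng.standard_normal((3, 3))                   # components of a (1,1) tensor

T_new = np.einsum('ik,lj,kl->ij', B, A, T)

# For valence (1,1) this is just the similarity transformation B T A.
assert np.allclose(T_new, B @ T @ A)

# Changing the frame back (transition matrix B) recovers the original array.
T_back = np.einsum('ik,lj,kl->ij', A, B, T_new)
assert np.allclose(T_back, T)
```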
Version: 6 Owner: rmilson Author(s): rmilson, djao

238.6 tensor algebra

Let R be a ring, and M an R-module. The tensor algebra T(M) is a graded R-algebra with
n-th graded component T^n(M) simply the n-th tensor power M^{⊗n} of M, and multiplication
ab = a ⊗ b ∈ T^{n+m}(M) for a ∈ T^n(M), b ∈ T^m(M). Thus
T(M) = ⊕_{n=0}^∞ T^n(M) = R ⊕ M ⊕ (M ⊗ M) ⊕ (M ⊗ M ⊗ M) ⊕ ⋯
T is a functor, and is left adjoint to the forgetful functor from R-algebras to R-modules. That
is, every module homomorphism M → S, where S is an R-algebra, extends to a unique R-algebra
homomorphism T(M) → S.
Version: 3 Owner: bwebste Author(s): bwebste

238.7 tensor array

Introduction. Tensor arrays, or tensors for short,³ are multidimensional arrays with two
types of (covariant and contravariant) indices. Tensors are widely used in science and
mathematics, because these data structures are the natural choice of representation for a variety
of important physical and geometric quantities.
³ The word tensor has other meanings, cf. the tensor entry.
In this entry we give the definition of a tensor array and establish some related terminology
and notation. The theory of tensor arrays incorporates a number of other essential topics:
basic tensors, tensor transformations, outer multiplication, contraction, inner multiplication,
and generalized transposition. These are fully described in their separate entries.
Valences and the space of tensor arrays. Let K be a field⁴ and let I be a finite list
of indices,⁵ such as (1, 2, …, n). A tensor array of type
(p, q),  p, q ∈ N,
is a mapping
I^p × I^q → K.
The set of all such mappings will be denoted by T^{p,q}(I, K), or, when I and K are clear from
the context, simply as T^{p,q}. The numbers p and q are called, respectively, the contravariant
and the covariant valence of the tensor array.
Point-wise addition and scaling give T^{p,q} the structure of a vector space of dimension
n^{p+q}, where n is the cardinality of I. We will interpret I^0 as signifying a singleton set.
Consequently T^{p,0} and T^{0,q} are just the maps from, respectively, I^p and I^q to K. It is also
customary to identify T^{1,0} with K^I, the vector space of list vectors indexed by I, and to
identify T^{0,1} with the dual space (K^I)^* of linear forms on K^I.⁶ Finally, T^{0,0} can be identified
with K itself. In other words, scalars are tensor arrays of zero valence.
Let X : I^p × I^q → K be a type (p, q) tensor array. In writing the values of X, it is customary to
write contravariant indices using superscripts, and covariant indices using subscripts. Thus,
for indices i_1, …, i_p, j_1, …, j_q ∈ I we write
X^{i_1 … i_p}_{j_1 … j_q}
instead of
X(i_1, …, i_p; j_1, …, j_q).
We also mention that it is customary to use columns to represent contravariant index
dimensions, and rows to represent the covariant index dimensions. Thus column vectors are type
(1, 0) tensor arrays, row vectors are type (0, 1) tensor arrays, and matrices, in as much as
they can be regarded either as rows of columns or as columns of rows, are type (1, 1) tensor
arrays.⁷
⁴ In physics and differential geometry, K is typically R or C.
⁵ It is advantageous to allow general indexing sets, because one can indicate the use of multiple frames
of reference by employing multiple, disjoint sets of indices.
⁶ Curiously, the latter notation is preferred by some authors. See H. Weyl's books and papers, for example.
⁷ It is also customary to use matrices to represent type (2, 0) and type (0, 2) tensor arrays (the latter
are used to represent quadratic forms). Speaking idealistically, such objects should be typeset, respectively,
as a column of column vectors and as a row of row vectors. However, typographical constraints and notational
convenience dictate that they be displayed as matrices.
Notes. It must be noted that our usage of the term tensor array is non-standard. The
traditionally inclined authors simply call these data structures tensors. We bother to make
the distinction because the traditional nomenclature is ambiguous and doesn't include the
modern mathematical understanding of the tensor concept. (This is explained more fully
in the tensor entry.) Precise and meaningful definitions can only be given by treating the
concept of a tensor array as distinct from the concept of a geometric/abstract tensor.
We also mention that the term tensor is often applied to objects that should more
appropriately be termed a tensor field. The latter are tensor-valued functions, or more generally
sections of a tensor bundle. A tensor is what one gets by evaluating a tensor field at one
point. Informally, one can also think of a tensor field as a tensor whose values are functions,
rather than constants.
Version: 5 Owner: rmilson Author(s): rmilson

238.8 tensor product (vector spaces)

Definition. The classical conception of the tensor product operation involved finite-dimensional
vector spaces A, B, say over a field K. To describe the tensor product A ⊗ B one was obliged
to choose bases
a_i ∈ A, i ∈ I,    b_j ∈ B, j ∈ J,
of A and B, indexed by finite sets I and J, respectively, and represent elements a ∈ A
and b ∈ B by their coordinates relative to these bases, i.e. as mappings a : I → K and
b : J → K such that
a = Σ_{i∈I} a^i a_i,    b = Σ_{j∈J} b^j b_j.
One then represented A ⊗ B relative to this particular choice of bases as the vector space of
mappings c : I × J → K. These mappings were called second-order contravariant tensors
and their values were customarily denoted by superscripts, a.k.a. contravariant indices:
c^{ij} ∈ K,  i ∈ I, j ∈ J.
The canonical bilinear multiplication (also known as outer multiplication)
⊗ : A × B → A ⊗ B
was defined by representing a ⊗ b, relative to the chosen bases, as the tensor
c^{ij} = a^i b^j,  i ∈ I, j ∈ J.
In this system, the products
a_i ⊗ b_j,  i ∈ I, j ∈ J,
were represented by basic tensors, specified in terms of the Kronecker deltas as the mappings
(i′, j′) ↦ δ^i_{i′} δ^j_{j′},  i′ ∈ I, j′ ∈ J.
These gave a basis of A ⊗ B.


The construction is independent of the choice of bases in the following sense. Let
a′_i ∈ A, i ∈ I′,    b′_j ∈ B, j ∈ J′,
be different bases of A and B with indexing sets I′ and J′ respectively. Let
r : I × I′ → K,    s : J × J′ → K,
be the corresponding change of basis matrices determined by
a′_{i′} = Σ_{i∈I} r^i_{i′} a_i,  i′ ∈ I′,
b′_{j′} = Σ_{j∈J} s^j_{j′} b_j,  j′ ∈ J′.
One then stipulated that tensors c : I × J → K and c′ : I′ × J′ → K represent the same
element of A ⊗ B if
c^{ij} = Σ_{i′∈I′, j′∈J′} r^i_{i′} s^j_{j′} (c′)^{i′j′}    (238.8.1)
for all i ∈ I, j ∈ J. This relation corresponds to the fact that the products
a′_{i′} ⊗ b′_{j′},  i′ ∈ I′, j′ ∈ J′,
constitute an alternate basis of A ⊗ B, and that the change of basis relations are
a′_{i′} ⊗ b′_{j′} = Σ_{i∈I, j∈J} r^i_{i′} s^j_{j′} a_i ⊗ b_j,  i′ ∈ I′, j′ ∈ J′.    (238.8.2)

Notes. Historically, the tensor product was called the outer product, and has its origins in
the absolute differential calculus (the theory of manifolds). The old-time tensor calculus is
difficult to understand because it is afflicted with a particularly lethal notation that makes
coherent comprehension all but impossible. Instead of talking about an element $a$ of a vector
space, one was obliged to contemplate a symbol $a^i$, which signified a list of real numbers
indexed by $1, 2, \ldots, n$, and which was understood to represent $a$ relative to some specified,
but unnamed basis.

What makes this notation truly lethal is the fact that a symbol $a^j$ was taken to signify an
alternate list of real numbers, also indexed by $1, \ldots, n$, and also representing $a$, albeit relative
to a different, but equally unspecified basis. Note that the choice of dummy variables makes
all the difference. Any sane system of notation would regard the expression
$$a^i, \qquad i = 1, \ldots, n$$
as representing a list of $n$ symbols
$$a^1, a^2, \ldots, a^n.$$

However, in the classical system, one was strictly forbidden from using
$$a^1, a^2, \ldots, a^n,$$
because where, after all, is the all-important dummy variable to indicate the choice of basis?
Thankfully, it is possible to shed some light on this confusion (I have read that this is
credited to Roger Penrose) by interpreting the symbol $a^i$ as a mapping from some finite
index set $I$ to $\mathbb{R}$, whereas $a^j$ is interpreted as a mapping from another finite index set $J$ (of
equal cardinality) to $\mathbb{R}$.
My own surmise is that the source of this notational difficulty stems from the reluctance of
the ancients to deal with geometric objects directly. The prevalent superstition of the age
held that in order to have meaning, a geometric entity had to be measured relative to some
basis. Of course, it was understood that geometrically no one basis could be preferred to any
other, and this leads directly to the definition of geometric entities as lists of measurements
modulo the equivalence engendered by changing the basis.
It is also worth remarking on the contravariant nature of the relationship between the actual
elements of $A \otimes B$ and the corresponding representation by tensors relative to a basis;
compare equations (238.8.1) and (238.8.2). This relationship is the source of the terminology
"contravariant tensor" and "contravariant index", and I surmise that it is this very medieval
pit of darkness and confusion that spawned the present-day notion of contravariant functor.
References.
1. Levi-Civita, The Absolute Differential Calculus.
Version: 6 Owner: rmilson Author(s): rmilson

238.9 tensor transformations

The present entry employs the terminology and notation defined and described in the entry
on tensor arrays and basic tensors. To keep things reasonably self contained we mention
that the symbol $T^{p,q}$ refers to the vector space of type $(p, q)$ tensor arrays, i.e. maps
$$I^p \times I^q \to K,$$
where $I$ is some finite list of index labels, and where $K$ is a field. The symbols $e_{(i)}$, $e^{(i)}$, $i \in I$
refer to the column and row vectors giving the natural basis of $T^{1,0}$ and $T^{0,1}$, respectively.

Let $I$ and $J$ be two finite lists of equal cardinality, and let
$$T : K^I \to K^J$$
be a linear isomorphism. Every such isomorphism is uniquely represented by an invertible
matrix
$$M : J \times I \to K$$
with entries given by
$$M^j_{\ i} = T(e_{(i)})^j, \qquad i \in I,\ j \in J.$$

In other words, the action of $T$ is described by the following substitutions:
$$e_{(i)} \mapsto \sum_{j \in J} M^j_{\ i}\, e_{(j)}, \qquad i \in I. \qquad (238.9.1)$$
Equivalently, the action of $T$ is given by matrix multiplication of column vectors in $K^I$ by
$M$.

The corresponding substitution relations for the type $(0,1)$ tensors involve the inverse matrix
$M^{-1} : I \times J \to K$, and take the form⁸
$$e^{(i)} \mapsto \sum_{j \in J} (M^{-1})^i_{\ j}\, e^{(j)}, \qquad i \in I. \qquad (238.9.2)$$

The rules for type $(0,1)$ substitutions are what they are because of the requirement that
the $e_{(i)}$ and $e^{(i)}$ remain dual bases even after the substitution. In other words, we want the
substitutions to preserve the relations
$$\langle e^{(i_1)}, e_{(i_2)} \rangle = \delta^{i_1}_{i_2}, \qquad i_1, i_2 \in I,$$
where the left-hand side of the above equation features the inner product and the right-hand
side the Kronecker delta. Given that the vector basis transforms as in (238.9.1) and given
the above constraint, the substitution rules for the linear form basis, shown in (238.9.2), are
the only ones possible.
The classical terminology of contravariant and covariant indices is motivated by thinking in
terms of substitutions. Thus, suppose we perform a linear substitution and change a vector,
i.e. a type $(1,0)$ tensor, $X \in K^I$ into a vector $Y \in K^J$. The indexed values of the former
and of the latter are related by
$$Y^j = \sum_{i \in I} M^j_{\ i}\, X^i, \qquad j \in J. \qquad (238.9.3)$$
Thus, we see that the transformation rule for indices is contravariant to the substitution
rule (238.9.1) for basis vectors.
⁸ The above relations describe the action of the dual homomorphism of the inverse transformation $(T^{-1})^\ast : (K^I)^\ast \to (K^J)^\ast$.

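The contravariant and covariant rules fit together so that the pairing of a form with a vector is basis-independent. A small NumPy sketch, with an arbitrary invertible matrix standing in for $M$:

```python
import numpy as np

rng = np.random.default_rng(0)

# An invertible matrix M representing the isomorphism T : K^I -> K^J
# (here |I| = |J| = 3, entries chosen at random).
M = rng.normal(size=(3, 3))

# A vector X in K^I; its image has components Y^j = sum_i M^j_i X^i.
X = rng.normal(size=3)
Y = M @ X

# Covectors (type (0,1) tensors) transform with the inverse matrix,
# beta_j = sum_i (M^-1)^i_j alpha_i, i.e. beta = inv(M).T @ alpha.
alpha = rng.normal(size=3)
beta = np.linalg.inv(M).T @ alpha

print(np.isclose(alpha @ X, beta @ Y))  # True: the pairing is invariant
```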

In modern terms, this contravariance is best described by saying that the dual space
construction is a contravariant functor⁹. In other words, the substitution rule for the linear
forms, i.e. the type $(0,1)$ tensors, is contravariant to the substitution rule for vectors:
$$e^{(j)} \mapsto \sum_{i \in I} M^j_{\ i}\, e^{(i)}, \qquad j \in J, \qquad (238.9.4)$$
in full agreement with the relation shown in (238.9.2). Everything comes together, and
equations (238.9.3) and (238.9.4) are seen to be one and the same, once we remark that
tensor array values can be obtained by contracting with characteristic arrays. For example,
$$X^i = e^{(i)}(X), \quad i \in I; \qquad Y^j = e^{(j)}(Y), \quad j \in J.$$

Finally, we must remark that the transformation rule for covariant indices involves the inverse
matrix $M^{-1}$. Thus if $X \in T^{0,1}(I)$ is transformed to a $Y \in T^{0,1}(J)$, the indices will be related by
$$Y_j = \sum_{i \in I} (M^{-1})^i_{\ j}\, X_i, \qquad j \in J.$$

The most general transformation rule for tensor array indices is therefore the following: the
indexed values of a tensor array $X \in T^{p,q}(I)$ and the values of the transformed tensor array
$Y \in T^{p,q}(J)$ are related by
$$Y^{j_1 \ldots j_p}_{l_1 \ldots l_q} = \sum_{\substack{(i_1,\ldots,i_p) \in I^p \\ (k_1,\ldots,k_q) \in I^q}} M^{j_1}_{\ i_1} \cdots M^{j_p}_{\ i_p}\, (M^{-1})^{k_1}_{\ l_1} \cdots (M^{-1})^{k_q}_{\ l_q}\, X^{i_1 \ldots i_p}_{k_1 \ldots k_q},$$
for all possible choices of indices $j_1, \ldots, j_p, l_1, \ldots, l_q \in J$. A debauche of indices, indeed!
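As a sketch of the general rule, here is the $(p,q) = (1,1)$ case in NumPy, checking that transforming the components with `einsum` agrees with the equivalent matrix formula $Y = M X M^{-1}$ (index set and entries chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
Minv = np.linalg.inv(M)

# A type (1,1) tensor array X^i_k over an index set I of size 3.
X = rng.normal(size=(3, 3))

# General rule with p = q = 1: Y^j_l = sum_{i,k} M^j_i (M^-1)^k_l X^i_k.
Y = np.einsum('ji,kl,ik->jl', M, Minv, X)

# For a (1,1) tensor this is just conjugation by M.
print(np.allclose(Y, M @ X @ Minv))  # True
```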

Version: 1 Owner: rmilson Author(s): rmilson

⁹ See the entry on the dual homomorphism.


Chapter 239
15A72 Vector and tensor algebra, theory of invariants
239.1 bac-cab rule

The bac-cab rule states that for vectors $A$, $B$, and $C$ (that can be either real or complex)
in $\mathbb{R}^3$, we have
$$A \times (B \times C) = B(A \cdot C) - C(A \cdot B).$$
Here $\times$ is the cross product, and $\cdot$ is the real inner product.
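The identity is easy to verify numerically; a minimal NumPy sketch with arbitrary sample vectors:

```python
import numpy as np

# Arbitrary sample vectors in R^3.
A = np.array([1.0, -2.0, 3.0])
B = np.array([4.0, 0.0, -1.0])
C = np.array([2.0, 5.0, 1.0])

lhs = np.cross(A, np.cross(B, C))
rhs = B * np.dot(A, C) - C * np.dot(A, B)
print(np.allclose(lhs, rhs))  # True
```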


Version: 3 Owner: matte Author(s): matte

239.2 cross product

The cross product of two vectors is a vector orthogonal to the plane of the two vectors
being crossed, whose magnitude is equal to the area of the parallelogram defined by the two
vectors. Notice there are two such vectors; the cross product produces the one that forms a
right-handed system with $\vec{a}$ and $\vec{b}$. It is exclusively for use in $\mathbb{R}^3$, as you can
see from the definition. We write the cross product of the vectors $\vec{a}$ and $\vec{b}$ as
$$\vec{a} \times \vec{b} = \det \begin{pmatrix} \vec{i} & \vec{j} & \vec{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix}
= \det \begin{pmatrix} a_2 & a_3 \\ b_2 & b_3 \end{pmatrix} \vec{i}
- \det \begin{pmatrix} a_1 & a_3 \\ b_1 & b_3 \end{pmatrix} \vec{j}
+ \det \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix} \vec{k},$$
where $\{\vec{i}, \vec{j}, \vec{k}\}$ is any right-handed basis for $\mathbb{R}^3$, with $\vec{a} = a_1\vec{i} + a_2\vec{j} + a_3\vec{k}$ and $\vec{b} = b_1\vec{i} + b_2\vec{j} + b_3\vec{k}$.

Note that $\vec{a} \times \vec{b} = -\vec{b} \times \vec{a}$.

Properties of the Cross Product:

From the expression for the area of a parallelogram we obtain $|\vec{a} \times \vec{b}| = |\vec{a}||\vec{b}| \sin\theta$, where
$\theta$ is the angle between $\vec{a}$ and $\vec{b}$.

From the above, you can see that the cross product of two parallel vectors, or of a vector
with itself, is $\vec{0}$ (assuming neither vector is $\vec{0}$), since $\sin 0 = 0$.
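The defining properties can be checked numerically; a NumPy sketch with arbitrary sample vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = np.cross(a, b)

# c is orthogonal to both factors ...
print(np.isclose(np.dot(c, a), 0.0), np.isclose(np.dot(c, b), 0.0))

# ... its magnitude is |a||b| sin(theta) ...
theta = np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(np.isclose(np.linalg.norm(c),
                 np.linalg.norm(a) * np.linalg.norm(b) * np.sin(theta)))

# ... and the product is anti-symmetric.
print(np.allclose(np.cross(b, a), -c))
```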
Version: 9 Owner: mathwizard Author(s): mathwizard, slider142

239.3 euclidean vector

A euclidean vector is a geometric entity that has the properties of magnitude and direction.
For example, a vector in $\mathbb{R}^2$ can be represented by its components, like this: $(3, 4)$, or as the
column vector $\binom{3}{4}$. This particular vector can also be represented geometrically, as an
arrow from the origin to the point $(3, 4)$.
In Rn , a vector can be easily constructed as the line segment between points whose difference
in each coordinate are the components of the vector. A vector constructed at the origin (The
vector (3, 4) drawn from the point (0, 0) to (3, 4)) is called a position vector. Note that a
vector that is not a position vector is independent of position. It remains the same vector
unless one changes its magnitude or direction.
Magnitude: The magnitude of a vector is the distance from one end of the line segment to
the other. The magnitude of the vector comes from the metric of the space it is embedded
in. Its magnitude is also referred to as the vector norm. In Euclidean space it can be
computed using the Pythagorean theorem: for a vector $\vec{v} \in \mathbb{R}^n$ with components $x_1, \ldots, x_n$,
the length $|\vec{v}|$ is
$$|\vec{v}| = \sqrt{\sum_{i=1}^{n} x_i^2}.$$
It can also be found via the dot product, as $|\vec{v}| = \sqrt{\vec{v} \cdot \vec{v}}$.
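Both formulas in plain Python, for the example vector $(3, 4)$:

```python
import math

# Components of a vector in R^2; its length via the Pythagorean theorem.
v = [3.0, 4.0]
length = math.sqrt(sum(x * x for x in v))
print(length)  # 5.0

# Equivalently, via the dot product v . v.
dot = sum(x * y for x, y in zip(v, v))
print(math.sqrt(dot))  # 5.0
```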


Direction: A vector basically is a direction. However, you may want to know the angle
it makes with another vector. In this case, one can use the dot product for the simplest
computation, since for two vectors $\vec{a}$, $\vec{b}$,
$$\vec{a} \cdot \vec{b} = |\vec{a}||\vec{b}| \cos\theta.$$
Since $\theta$ is the angle between the vectors, you just solve for $\theta$. Note that if you want to find
the angle the vector makes with one of the axes, you can use trigonometry: the length of the
vector is the hypotenuse of a right triangle, and the angle with the x-axis is $\arctan\frac{y}{x}$, the
arctangent of the ratio of its components.
Projection, resolving a vector into its components: Say all we had was the magnitude
of the vector and the angle $\theta$ it makes with the x-axis, and we needed the components (maybe
we need to do a cross product or an addition). Look at figure a. again. The x-component of
the vector can be likened to a plumb line dropped from the arrowhead of the vector to the
x-axis, so that it is perpendicular to the x-axis. Thus, to get the x-component, all we have
to do is multiply the magnitude of the vector by the cosine of $\theta$. This is called resolving the
vector into its components.

Now say we had another vector at an angle $\theta_2$ to our vector. We construct a line from
the arrowhead of our vector to the other vector, meeting it perpendicularly, in the same way.
The length from the tail of the other vector to this interception is called the projection of
our vector onto the other. Note that the equation remains the same: the projection is still
$d \cos\theta_2$, where $d$ is the magnitude of our vector. This formula is valid in all higher
dimensions, as the angle between two vectors is measured in the plane they span.
Miscellaneous Properties:
Two vectors that are parallel to each other are called collinear, as they can be drawn
onto the same line. Collinear vectors are scalar multiples of each other.
Two non-collinear vectors are coplanar.
There are two main types of products between vectors. The dot product (scalar product),
and the cross product (vector product). There is also a commonly used combination called
the triple scalar product.
Version: 18 Owner: mathwizard Author(s): mathwizard, slider142

239.4 rotational invariance of cross product

Theorem. Let $R$ be a rotational $3 \times 3$ matrix, i.e., a real matrix with $\det R = 1$ and $R^{-1} = R^T$. Then
for all vectors $u$, $v$ in $\mathbb{R}^3$,
$$R(u \times v) = (Ru) \times (Rv).$$
Proof. Let us first fix some right-hand oriented orthonormal basis in $\mathbb{R}^3$. Further, let
$\{u^1, u^2, u^3\}$ and $\{v^1, v^2, v^3\}$ be the components of $u$ and $v$ in that basis. Also, in the chosen
basis, we denote the entries of $R$ by $R_{ij}$. Since $R$ is rotational, we have $R_{ij}R_{kj} = \delta_{ik}$, where
$\delta_{ik}$ is the Kronecker delta symbol. Here we use the Einstein summation convention; thus, in
the previous expression, on the left hand side, $j$ should be summed over $1, 2, 3$. We shall use
the Levi-Civita permutation symbol $\epsilon$ to write the cross product. Then the $i$:th coordinate
of $u \times v$ equals $(u \times v)^i = \epsilon_{ijk} u^j v^k$. For the $k$th component of $(Ru) \times (Rv)$ we then have
$$\begin{aligned}
\big((Ru) \times (Rv)\big)^k &= \epsilon_{imk}\, R_{ij} R_{mn}\, u^j v^n \\
&= \epsilon_{iml}\, \delta_{kl}\, R_{ij} R_{mn}\, u^j v^n \\
&= \epsilon_{iml}\, R_{kr} R_{lr}\, R_{ij} R_{mn}\, u^j v^n \\
&= \epsilon_{jnr}\, \det R\; R_{kr}\, u^j v^n.
\end{aligned}$$
The last line follows since $\epsilon_{iml} R_{ij} R_{mn} R_{lr} = \epsilon_{jnr} \det R$, which in turn follows from
$\epsilon_{ijk} R_{im} R_{jn} R_{kr} = \epsilon_{mnr}\, \epsilon_{ijk} R_{i1} R_{j2} R_{k3} = \epsilon_{mnr} \det R$. Since $\det R = 1$, it follows that
$$\big((Ru) \times (Rv)\big)^k = R_{kr}\, \epsilon_{jnr}\, u^j v^n = R_{kr}\, (u \times v)^r = \big(R(u \times v)\big)^k,$$
as claimed. $\Box$
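A numerical check of the theorem, using a rotation about the z-axis as a sample rotation matrix:

```python
import numpy as np

# A rotation about the z-axis by an arbitrary angle; det R = 1, R^-1 = R^T.
t = 0.7
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 0.5, 2.0])

print(np.isclose(np.linalg.det(R), 1.0))                        # True
print(np.allclose(R @ np.cross(u, v), np.cross(R @ u, R @ v)))  # True
```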
Version: 3 Owner: matte Author(s): matte


Chapter 240
15A75 Exterior algebra, Grassmann algebras
240.1 contraction

Definition. Let $\omega$ be a smooth $k$-form on a smooth manifold $M$, and let $\xi$ be a smooth
vector field on $M$. The contraction of $\omega$ with $\xi$ is the smooth $(k-1)$-form that maps
$x \in M$ to $\omega_x(\xi_x, \cdot)$. In other words, $\omega$ is point-wise evaluated with $\xi$ in the first slot. We
shall denote this $(k-1)$-form by $\iota_\xi \omega$. If $\omega$ is a $0$-form, we set $\iota_\xi \omega = 0$ for all $\xi$.

Properties. Let $\omega$ and $\xi$ be as above. Then the following properties hold:

1. For any real number $k$,
$$\iota_\xi (k\omega) = k\, \iota_\xi \omega.$$

2. For vector fields $\xi$ and $\eta$,
$$\iota_{\xi + \eta}\, \omega = \iota_\xi \omega + \iota_\eta \omega, \qquad
\iota_\xi \iota_\eta \omega = -\iota_\eta \iota_\xi \omega, \qquad
\iota_\xi \iota_\xi \omega = 0.$$

3. Contraction is an anti-derivation [1]: if $\omega_1$ is a $p$-form and $\omega_2$ is a $q$-form, then
$$\iota_\xi(\omega_1 \wedge \omega_2) = (\iota_\xi \omega_1) \wedge \omega_2 + (-1)^p\, \omega_1 \wedge (\iota_\xi \omega_2).$$

REFERENCES
1. T. Frankel, Geometry of physics, Cambridge University press, 1997.

Version: 1 Owner: matte Author(s): matte



240.2 exterior algebra

Let $V$ be a vector space over a field $K$. The exterior algebra over $V$, commonly denoted by
$\Lambda(V)$, is an associative $K$-algebra with unit, together with a linear injection
$$\iota : V \to \Lambda(V),$$
which is characterized, up to isomorphism, by the universal properties given below. Note
that the exterior algebra product operation is most commonly denoted by a wedge symbol:
$\wedge$. Also note that the accepted convention is to identify $v \in V$ with its image $\iota(v) \in \Lambda(V)$,
i.e. we don't bother writing $\iota(v)$ and just write $v$ instead.

The exterior product is anti-symmetric in the sense that for all $v \in V$ we have
$$v \wedge v = 0.$$

Let $A$ be an associative $K$-algebra with unit. Every $K$-linear homomorphism
$$\phi : V \to A$$
such that
$$\phi(v)\phi(v) = 0, \qquad v \in V,$$
lifts to a unique $K$-algebra homomorphism
$$\hat{\phi} : \Lambda(V) \to A$$
such that
$$\hat{\phi}(\iota(v)) = \phi(v), \qquad v \in V.$$
Diagrammatically: $\phi : V \to A$ factors as $V \xrightarrow{\ \iota\ } \Lambda(V) \xrightarrow{\ \exists!\, \hat{\phi}\ } A$.
So much for the abstract definition. It is concise, but does little to illuminate the nature of
the elements of $\Lambda(V)$. To that end, let us say that $\alpha \in \Lambda(V)$ is $k$-primitive if there exist
$v_1, \ldots, v_k \in V$ such that
$$\alpha = v_1 \wedge \ldots \wedge v_k,$$
and say that an element of $\Lambda(V)$ is $k$-homogeneous if it is a linear combination of $k$-primitive
elements. We then define $\Lambda^0(V)$ to be the span of the unit element, define $\Lambda^1(V)$ to be the
image of the canonical injection $\iota$, and for $k \geq 2$ define $\Lambda^k(V)$ to be the vector space of all
$k$-homogeneous elements of $\Lambda(V)$.

Proposition 12. The above grading gives $\Lambda(V)$ the structure of an anti-symmetrically $\mathbb{N}$-graded
algebra. To be more precise,
$$\Lambda(V) = \bigoplus_{k=0}^{\infty} \Lambda^k(V),$$
such that if $\alpha \in \Lambda^j(V)$ and $\beta \in \Lambda^k(V)$, then
$$\alpha \wedge \beta \in \Lambda^{j+k}(V)$$
with
$$\alpha \wedge \beta = (-1)^{jk}\, \beta \wedge \alpha.$$
Proof. The essence of the proof is that we construct a model of the exterior algebra, and
then use the universality properties of $\Lambda(V)$ to show that it is isomorphic to that model. To
that end, for $k \in \mathbb{N}$ let $V^{\otimes k}$ denote the $k$-fold tensor product of $V$ with itself. Note: it must
be understood that $V^{\otimes 0} = K$ and that $V^{\otimes 1} = V$. For $k \geq 2$ we let $R^k$ be the span of all
elements
$$v_1 \otimes \ldots \otimes v_k \in V^{\otimes k}$$
such that
$$v_j = v_{j+1}$$
for some $j = 1, \ldots, k-1$. We then set
$$E^k = V^{\otimes k}/R^k$$
and set
$$E = \bigoplus_{k=0}^{\infty} E^k.$$
It can be easily shown that the tensor product multiplication
$$\otimes : V^{\otimes j} \times V^{\otimes k} \to V^{\otimes (j+k)}$$
factors through to the quotient and gives a well-defined associative, anti-symmetric product
on $E$. From there, the universality properties of the tensor product and the universality
properties of $\Lambda(V)$ imply that $E$ and $\Lambda(V)$ are isomorphic algebras.
Q.E.D.
In the case that $V$ is a finite-dimensional vector space one can give some more down-to-earth
definitions that may be helpful in understanding the nature of the exterior product.
Suppose then, that $V$ is $n$-dimensional, and let $V^\ast$ denote the dual space of linear forms.
Note that $(V^\ast)^{\otimes k}$ is naturally identified with the vector space of multi-linear mappings
$V^k \to \mathbb{R}$. It therefore makes sense to identify $\Lambda^k(V^\ast)$ with the vector space of anti-symmetric,
multi-linear mappings $V^k \to \mathbb{R}$. Next, we define the alternation operator
$$A^k : (V^\ast)^{\otimes k} \to \Lambda^k(V^\ast)$$
as follows. For $\omega \in (V^\ast)^{\otimes k}$ we define $A^k(\omega) \in \Lambda^k(V^\ast)$ by
$$A^k(\omega)(v_1, \ldots, v_k) = \frac{1}{k!} \sum_{\pi} \operatorname{sgn}(\pi)\, \omega(v_{\pi(1)}, \ldots, v_{\pi(k)}), \qquad v_1, \ldots, v_k \in V,$$
where the right-hand sum is taken over all permutations $\pi$ of $\{1, \ldots, k\}$, and $\operatorname{sgn}(\pi) = \pm 1$
according to the parity of the permutation. Let us also note that $A^k$ restricts to the identity
on $\Lambda^k(V^\ast)$. Finally, for $\alpha \in \Lambda^j(V^\ast)$ and $\beta \in \Lambda^k(V^\ast)$ we define
$$\alpha \wedge \beta = A^{j+k}(\alpha \otimes \beta).$$
Proposition 13. The above wedge product is an associative product on
$$\bigoplus_{k=0}^{n} \Lambda^k(V^\ast),$$
and makes the latter a model of $\Lambda(V^\ast)$, the exterior algebra on $V^\ast$.
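The alternation operator is easy to realize concretely. A sketch in plain Python, representing multilinear maps as functions on tuples of coordinate vectors; it alternates the tensor product of two linear forms and checks the resulting anti-symmetry:

```python
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    s = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                s = -s
    return s

def alternate(omega, k):
    """A^k(omega): antisymmetrize a k-multilinear map omega."""
    def alt(*vs):
        total = sum(sign(p) * omega(*(vs[i] for i in p))
                    for p in permutations(range(k)))
        return total / factorial(k)
    return alt

# Two linear forms on R^3 (coordinate functionals, as an example).
alpha = lambda v: v[0]
beta  = lambda v: v[1]

# Their wedge: alpha ^ beta = A^2(alpha (x) beta).
wedge = alternate(lambda v, w: alpha(v) * beta(w), 2)

v, w = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
print(wedge(v, w))  # (1*5 - 4*2) / 2 = -1.5
print(wedge(w, v))  # anti-symmetry: +1.5
```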


A description of the exterior algebra in terms of a basis may also be useful. Therefore, let
$e_1, \ldots, e_n$ be a basis of $V$. For every sequence $I = (i_1, \ldots, i_k)$ of natural numbers between
$1$ and $n$, let $e_I$ denote the primitive element
$$e_{i_1} \wedge \ldots \wedge e_{i_k}.$$
If $I$ is the empty sequence, we use $e_I$ to denote the unit element of the exterior algebra. Note
that
$$e_I = 0$$
if and only if $I$ contains duplicate entries. For a permutation $\pi$ of $\{1, \ldots, k\}$ let $\pi(I)$ denote
the sequence $(i_{\pi(1)}, \ldots, i_{\pi(k)})$, and note that
$$e_{\pi(I)} = \operatorname{sgn}(\pi)\, e_I.$$
Proposition 14. The exterior algebra $\Lambda(V)$ is a $2^n$-dimensional vector space with basis $\{e_I\}$,
where the index $I$ runs over all strictly increasing sequences (including the empty sequence)
of natural numbers between $1$ and $n$.

The upshot of all this is that for finite-dimensional vector spaces we have another way to
construct a model of the exterior algebra. Namely, we choose a basis $e_1, \ldots, e_n$ and define
formal bi-vector symbols $e_i \wedge e_j$ subject to the anti-symmetric relations
$$e_i \wedge e_i = 0, \qquad e_i \wedge e_j = -e_j \wedge e_i.$$
We then define tri-vector symbols, and more generally $k$-vector symbols, subject to the obvious
$k$-place anti-symmetric relations. The exterior algebra $\Lambda(V)$ is then defined to be the
vector space of all possible linear combinations of the $k$-vector symbols, and the algebra
product is defined by linearly extending the wedge product to all of $\Lambda(V)$. Also note that
for $k > n$ all $k$-vector symbols are identified with zero, and hence that $\Lambda^k(V) = \{0\}$ for all
$k > n$.

Notes. The exterior algebra is also known as the Grassmann algebra, after its inventor
Hermann Grassmann, who created it in order to give an algebraic treatment of linear geometry.
Grassmann was also one of the first people to talk about the geometry of an $n$-dimensional
space with $n$ an arbitrary natural number. The axiomatics of the exterior product are needed
to define differential forms and therefore play an essential role in the theory of integration
on manifolds. Exterior algebra is also an essential prerequisite to understanding de Rham's
theory of differential cohomology.
Version: 6 Owner: rmilson Author(s): rmilson


Chapter 241
15A99 Miscellaneous topics
241.1 Kronecker delta

The Kronecker delta $\delta_{ij}$ is defined as having value $1$ when $i = j$ and $0$ otherwise ($i$ and $j$ are
integers). It may also be written as $\delta^{ij}$ or $\delta^i_j$. It is a special case of the generalized
Kronecker delta symbol.

The delta symbol was first used in print by Kronecker in 1868 [1].

Example. The $n \times n$ identity matrix $I$ can be written in terms of the Kronecker delta as simply
the matrix of the delta, $I_{ij} = \delta_{ij}$, or simply $I = (\delta_{ij})$.
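The identity-matrix example, as a minimal sketch in plain Python:

```python
# The Kronecker delta, and the identity matrix I = (delta_ij) built from it.
def delta(i, j):
    return 1 if i == j else 0

n = 3
identity = [[delta(i, j) for j in range(n)] for i in range(n)]
print(identity)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```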

REFERENCES
1. N. Higham, Handbook of writing for the mathematical sciences, Society for Industrial and
Applied Mathematics, 1998.

Version: 2 Owner: akrowne Author(s): akrowne

241.2 dual space

Dual of a vector space; dual bases

Let $V$ be a vector space over a field $k$. The dual of $V$, denoted by $V^\ast$, is the vector space of
linear forms on $V$, i.e. linear mappings $V \to k$. The operations in $V^\ast$ are defined pointwise:
$$(\phi + \psi)(v) = \phi(v) + \psi(v),$$
$$(\lambda \phi)(v) = \lambda\, \phi(v)$$
for $\lambda \in k$, $v \in V$ and $\phi, \psi \in V^\ast$.

$V$ is isomorphic to $V^\ast$ if and only if the dimension of $V$ is finite. If not, then $V^\ast$ has a larger
(infinite) dimension than $V$; in other words, the cardinal of any basis of $V^\ast$ is strictly greater
than the cardinal of any basis of $V$.
Even when $V$ is finite-dimensional, there is no canonical or natural isomorphism $V \to V^\ast$.
But on the other hand, a basis $\mathcal{B}$ of $V$ does define a basis $\mathcal{B}^\ast$ of $V^\ast$, and moreover a bijection
$\mathcal{B} \to \mathcal{B}^\ast$. For suppose $\mathcal{B} = \{b_1, \ldots, b_n\}$. For each $i$ from $1$ to $n$, define a mapping
$$\phi_i : V \to k$$
by
$$\phi_i\Big(\sum_k x_k b_k\Big) = x_i.$$
It is easy to see that the $\phi_i$ are nonzero elements of $V^\ast$ and are independent. Thus
$\{\phi_1, \ldots, \phi_n\}$ is a basis of $V^\ast$, called the dual basis of $\mathcal{B}$.

The dual of $V^\ast$ is called the second dual or bidual of $V$. There is a very simple canonical
injection $V \to V^{\ast\ast}$, and it is an isomorphism if the dimension of $V$ is finite. To see it, let $x$
be any element of $V$ and define a mapping $x' : V^\ast \to k$ simply by
$$x'(\phi) = \phi(x).$$
$x'$ is linear by definition, and it is readily verified that the mapping $x \mapsto x'$ from $V$ to $V^{\ast\ast}$
is linear and injective.
Dual of a topological vector space

If $V$ is a topological vector space, the continuous dual $V'$ of $V$ is the subspace of $V^\ast$
consisting of the continuous linear forms.

A normed vector space $V$ is said to be reflexive if the natural embedding $V \to V''$ is an
isomorphism. For example, any finite dimensional space is reflexive, and any Hilbert space
is reflexive by the Riesz representation theorem.

Remarks

Linear forms are also known as linear functionals.

Another way in which a linear mapping $V \to V^\ast$ can arise is via a bilinear form
$$V \times V \to k.$$
The notions of duality extend, in part, from vector spaces to modules, especially free modules
over commutative rings. A related notion is the duality in projective spaces.
Version: 9 Owner: Daume Author(s): Daume, Larry Hammick, Evandar

241.3 example of trace of a matrix

Let
$$A = \begin{pmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \\ 14 & 16 & 18 \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix} 9 & 8 & 7 \\ 6 & 5 & 4 \\ 3 & 2 & 1 \end{pmatrix};$$
then:
$$\operatorname{trace}(A + B) = \operatorname{trace}(A) + \operatorname{trace}(B) = (2 + 10 + 18) + (9 + 5 + 1) = 45,$$
and, writing $A = cA'$ with $c = 2$,
$$\operatorname{trace}(A) = \operatorname{trace}(cA') = c \operatorname{trace}(A')
= 2 \operatorname{trace} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
= 2\,(1 + 5 + 9) = 30.$$
Version: 2 Owner: Daume Author(s): Daume

241.4 generalized Kronecker delta symbol

Let $l$ and $n$ be natural numbers such that $1 \leq l \leq n$. Further, let $i_k$ and $j_k$ be natural
numbers in $\{1, \ldots, n\}$ for all $k$ in $\{1, \ldots, l\}$. Then the generalized Kronecker delta symbol,
denoted by $\delta^{i_1 \cdots i_l}_{j_1 \cdots j_l}$, is zero if $i_r = i_s$ or $j_r = j_s$ for some $r \neq s$, or if $\{i_1, \ldots, i_l\} \neq \{j_1, \ldots, j_l\}$
as sets. If none of the above conditions are met, then $\delta^{i_1 \cdots i_l}_{j_1 \cdots j_l}$ is defined as the sign of the
permutation that maps $i_1 \cdots i_l$ to $j_1 \cdots j_l$.

From the definition, it follows that when $l = 1$, the generalized Kronecker delta symbol
reduces to the traditional delta symbol $\delta^i_j$. Also, for $l = n$, we obtain
$$\delta^{i_1 \cdots i_n}_{j_1 \cdots j_n} = \epsilon^{i_1 \cdots i_n}\, \epsilon_{j_1 \cdots j_n}, \qquad
\delta^{1 \cdots n}_{j_1 \cdots j_n} = \epsilon_{j_1 \cdots j_n},$$
where $\epsilon_{j_1 \cdots j_n}$ is the Levi-Civita permutation symbol.


For any $l$ we can write the generalized delta symbol as a determinant of traditional delta
symbols. Indeed, if $S(l)$ is the permutation group of $l$ elements, then
$$\delta^{i_1 \cdots i_l}_{j_1 \cdots j_l}
= \sum_{\pi \in S(l)} \operatorname{sign}(\pi)\, \delta^{i_1}_{j_{\pi(1)}} \cdots \delta^{i_l}_{j_{\pi(l)}}
= \det \begin{pmatrix} \delta^{i_1}_{j_1} & \cdots & \delta^{i_1}_{j_l} \\ \vdots & & \vdots \\ \delta^{i_l}_{j_1} & \cdots & \delta^{i_l}_{j_l} \end{pmatrix}.$$
The first equality follows since the sum on the first line has only one non-zero term: the term
for which $i_k = j_{\pi(k)}$ for all $k$. The second equality follows from the definition of the determinant.
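A direct implementation of the permutation-sum formula, as a sketch in plain Python:

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of (0, ..., l-1), via inversion count."""
    s = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                s = -s
    return s

def generalized_delta(i, j):
    """delta^{i_1...i_l}_{j_1...j_l} = sum_pi sign(pi) prod_k delta(i_k, j_{pi(k)})."""
    l = len(i)
    total = 0
    for pi in permutations(range(l)):
        term = sign(pi)
        for k in range(l):
            if i[k] != j[pi[k]]:
                term = 0
                break
        total += term
    return total

print(generalized_delta((1, 2), (1, 2)))  # 1
print(generalized_delta((1, 2), (2, 1)))  # -1  (odd permutation)
print(generalized_delta((1, 2), (1, 3)))  # 0   (different index sets)
print(generalized_delta((1, 1), (1, 1)))  # 0   (repeated indices)
```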
Version: 2 Owner: matte Author(s): matte

241.5 linear functional

Let $V$ be a vector space over a field $K$. A linear functional on $V$ is a linear transformation
$\phi : V \to K$, where $K$ is thought of as a one-dimensional vector space over itself.
The collection of all linear functionals on V can be made into a vector space by defining
addition and scalar multiplication pointwise; it is called the dual space of V.
Version: 2 Owner: Evandar Author(s): Evandar

241.6 modules are a generalization of vector spaces

A module is the natural generalization of a vector space; in fact, when working over a field
it is just another word for a vector space.

If $M$ and $N$ are $R$-modules, then a mapping $f : M \to N$ is called an $R$-morphism (or
homomorphism) if:
$$\forall x, y \in M : f(x + y) = f(x) + f(y) \quad \text{and} \quad \forall x \in M,\ \lambda \in R : f(\lambda x) = \lambda f(x).$$
Note, as mentioned in the beginning, that if $R$ is a field, these properties are the defining
properties of a linear transformation.

Similarly to vector space terminology, the image $\operatorname{Im} f := \{f(x) : x \in M\}$ and kernel
$\operatorname{Ker} f := \{x \in M : f(x) = 0_N\}$ are called the range and null-space, respectively.
Version: 4 Owner: jgade Author(s): jgade


241.7 proof of properties of trace of a matrix

Proof of properties:

1. Let us check linearity. For sums we have
$$\operatorname{trace}(A + B) = \sum_{i=1}^{n} (a_{i,i} + b_{i,i}) \qquad \text{(property of matrix addition)}$$
$$= \sum_{i=1}^{n} a_{i,i} + \sum_{i=1}^{n} b_{i,i} \qquad \text{(property of sums)}$$
$$= \operatorname{trace}(A) + \operatorname{trace}(B).$$
Similarly,
$$\operatorname{trace}(cA) = \sum_{i=1}^{n} c\, a_{i,i} \qquad \text{(property of matrix scalar multiplication)}$$
$$= c \sum_{i=1}^{n} a_{i,i} \qquad \text{(property of sums)}$$
$$= c \operatorname{trace}(A).$$
2. The second property follows since the transpose does not alter the entries on the main
diagonal.

3. The proof of the third property follows by exchanging the summation order. Suppose
$A$ is an $n \times m$ matrix and $B$ is an $m \times n$ matrix. Then
$$\operatorname{trace}(AB) = \sum_{i=1}^{n} \sum_{j=1}^{m} A_{i,j} B_{j,i}
= \sum_{j=1}^{m} \sum_{i=1}^{n} B_{j,i} A_{i,j} \quad \text{(changing summation order)}
= \operatorname{trace}(BA).$$
4. The last property is a consequence of Property 3 and the fact that matrix multiplication
is associative:
$$\operatorname{trace}(B^{-1}AB) = \operatorname{trace}\big((B^{-1}A)B\big)
= \operatorname{trace}\big(B(B^{-1}A)\big)
= \operatorname{trace}\big((BB^{-1})A\big)
= \operatorname{trace}(A).$$
Version: 1 Owner: Daume Author(s): Daume

241.8 quasipositive matrix

A square matrix $A$ is a quasipositive matrix if it is nonnegative except perhaps on its main
diagonal, i.e., $a_{ij} \geq 0$ for $i \neq j$.
Version: 1 Owner: jarino Author(s): jarino

241.9 trace of a matrix

Definition
Let $A = (a_{i,j})$ be a (real or possibly complex) square matrix of dimension $n$. The trace of
the matrix is the sum of the entries on the main diagonal:
$$\operatorname{trace}(A) = \sum_{i=1}^{n} a_{i,i}.$$

Properties:

1. The trace is a linear transformation from the space of square matrices to the scalars. In
other words, if $A$ and $B$ are square matrices of the same dimension and $c$ is a scalar, then
$$\operatorname{trace}(A + B) = \operatorname{trace}(A) + \operatorname{trace}(B), \qquad \operatorname{trace}(cA) = c \operatorname{trace}(A).$$

2. For the transpose and conjugate transpose, we have for any square matrix $A$,
$$\operatorname{trace}(A^t) = \operatorname{trace}(A), \qquad \operatorname{trace}(A^\ast) = \overline{\operatorname{trace}(A)}.$$

3. If $A$ and $B$ are matrices such that $AB$ is a square matrix, then
$$\operatorname{trace}(AB) = \operatorname{trace}(BA).$$
Thus, if $A$, $B$, $C$ are matrices such that $ABC$ is a square matrix, then
$$\operatorname{trace}(ABC) = \operatorname{trace}(CAB) = \operatorname{trace}(BCA).$$

4. If $B$ is an invertible square matrix of the same dimension as $A$, then
$$\operatorname{trace}(A) = \operatorname{trace}(B^{-1}AB).$$
In other words, the traces of similar matrices are equal.
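Properties 3 and 4 are easy to check numerically; a NumPy sketch with random sample matrices (a random matrix is invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))  # generically invertible

# Property 3: trace(AB) = trace(BA).
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                 # True

# Property 4: similarity invariance of the trace.
print(np.isclose(np.trace(np.linalg.inv(B) @ A @ B), np.trace(A)))  # True
```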

5. Let $A$ be a square $n \times n$ matrix with real (or complex) entries $a_{ij}$. Then
$$\operatorname{trace}(A^\ast A) = \operatorname{trace}(A A^\ast) = \sum_{i,j=1}^{n} |a_{ij}|^2.$$
Here $A^\ast$ is the conjugate transpose of $A$, and $|\cdot|$ is the complex modulus. In particular,
$\operatorname{trace}(A^\ast A) \geq 0$, with equality if and only if $A = 0$. (See the Frobenius matrix norm.)
6. Various inequalities for trace are given in [2].
See the proof of properties of trace of a matrix.

REFERENCES
1. P. Ehrlich, The Trace of a Square Matrix, [online] http://www.math.ufl.edu/~ehrlich/trace.html
2. Z.P. Yang, X.X. Feng, A note on the trace inequality for products of Hermitian matrix
power, Journal of Inequalities in Pure and Applied Mathematics, Volume 3, Issue 5, 2002,
Article 78, online.

Version: 12 Owner: Daume Author(s): Daume, matte
