# On the existence of infinitesimals

**Richard Kaye**
School of Mathematics, University of Birmingham
26th May 2010

**Abstract.** So-called nonstandard mathematics uses infinite and infinitesimal numbers to develop mathematics, especially calculus, without the use of the notion of limit. These methods are rigorous, and all results are provable in usual mathematics, or ZFC. These nonstandard numbers are typical of modern mathematics in that they abstract familiar operations and concepts (in this case, that of limit in ε–δ analysis) as mathematical objects in their own right. This makes them a good test case for the study of what a mathematical object is, and for issues relating to mathematical truth and knowledge. This paper studies, from a mathematical and quasi-philosophical point of view, the potential of such nonstandard numbers for problems of the existence, realism and platonism of mathematical objects, and indeed for developing ideas of what mathematical objects really are. Structuralist and utilitarian views are emphasised, and we conclude with suggestions for a theory of 'pure structuralism' that might unite mathematical and general thought.

Contents

1. Introduction
2. Mathematical existence
3. Existence of infinitesimals
4. The case for 'pure' structuralism
5. Questions for further research

This is the second version of this paper, a complete rewrite of an earlier one. I continue to be interested in the material here and am still developing it. This paper is issued as a preview of work in progress. All comments are most welcome! Part of the material here was used as the basis of a colloquium to the Philosophy Department of Warwick University, in the summer of 2010, and there was a long and useful discussion afterwards.


## 1 Introduction

The nature of mathematical objects, whether they are abstract or real, how one might reason about them, and how one might understand the meaning of statements concerning them, are substantial and interesting philosophical questions.¹ It is quite common in such discussions to direct attention towards the case of the natural numbers, 0, 1, 2, . . . , as examples of mathematical objects, because of their comparative simplicity, their familiarity in the non-mathematical world, and their status as objects of on-going research in pure mathematics. But this very familiarity, and the fact that they are so fundamental, makes the natural numbers rather atypical examples of the kind of mathematical objects usually found in mathematical practice, and therefore not necessarily the most useful examples for our philosophical questions.

This paper will comment on the important questions of the nature of mathematical objects by focusing on another example, that of infinitesimal numbers. It is hoped that by looking at numbers that have a seemingly contradictory or impossible nature but nevertheless—according to mainstream mathematics—incontrovertibly exist, we may learn something about the nature of mathematical objects. In any case, these are particularly interesting numbers in their own right, with many potential applications.

Infinitesimals are numbers that are 'so small that there is no way to see them or to measure them'. More formally, in an ordered field, a positive number x is infinitesimal if 0 < x < 1/n holds for each ordinary positive natural number n, i.e. each number of the form 1 + 1 + · · · + 1. Newton and Leibniz both used infinitesimals in the development of their calculus, but were famously criticised by Berkeley. In the nineteenth century Cauchy, and also Riemann and Weierstrass and many others, replaced the notion of infinitesimal with that of limit.
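As a concrete, entirely standard-mathematics illustration of this definition, consider the field of rational functions in a variable t, ordered so that an element counts as positive when it takes positive values for all sufficiently small t > 0; under this ordering t itself is a positive infinitesimal. The sketch below is my own illustration (not part of any construction in this paper): it checks the defining inequalities 0 < t < 1/n for polynomials, using the fact that the sign of a polynomial just to the right of 0 is the sign of its lowest-order nonzero coefficient.

```python
from fractions import Fraction

def poly_sub(p, q):
    # Subtract coefficient lists (index i holds the coefficient of t**i).
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) - (q[i] if i < len(q) else 0)
            for i in range(n)]

def sign_near_zero(p):
    # Sign of the polynomial p(t) for all sufficiently small t > 0:
    # the sign of the lowest-order nonzero coefficient.
    for c in p:
        if c != 0:
            return 1 if c > 0 else -1
    return 0

t = [0, Fraction(1)]  # the polynomial t, our candidate infinitesimal

# Check 0 < t < 1/n for the first thousand ordinary naturals n.
for n in range(1, 1001):
    one_over_n = [Fraction(1, n)]                          # the constant 1/n
    assert sign_near_zero(t) == 1                          # 0 < t
    assert sign_near_zero(poly_sub(one_over_n, t)) == 1    # t < 1/n
```

No finite loop can check every n, of course; the point is that the sign rule makes the inequality hold for all n at once, since 1/n − t always has positive constant term. This ordered field is not Robinson's hyperreal field, but it does show that ordered fields containing infinitesimals are easy to exhibit.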
But in 1966, Abraham Robinson's book Non-standard Analysis showed that, using techniques from first-order logic, in particular the Completeness, Soundness and Compactness Theorems, the notion of infinitesimal could be put on a firm foundation and made useful enough to develop the calculus in the way Newton and Leibniz intended [9]. The name of Robinson's theory is often abbreviated to NSA.²

Thus, according to Robinson at least, infinitesimals exist and can be used profitably in analysis. However this does not completely deal with questions to do with such numbers, questions I will associate with their 'existence' for reasons that will hopefully become clear. For example, the fact that the existence of infinitesimals follows from other axioms of mathematics (or set theory) can be used as a way to focus on those axioms and provide a testing ground for questions about them: whether we believe them, or in what sense we believe they model the (or a) valid mathematical universe.

This paper attempts to be a mathematician's view on questions on the nature of mathematics, mathematical objects and their existence, and mathematical

---

¹ The volume of essays edited by W.D. Hart [4] is an excellent introduction to these issues.

² Robinson called the new numbers in his system 'non-standard' to distinguish them from the 'standard' numbers, the usual numbers of other kinds of mathematics. Thus 'standard' and 'non-standard' are technical terms with precise meanings. Unfortunately, many people reading 'Non-standard Analysis' see it incorrectly as meaning the activity of analysis done in a non-standard way, and this easily becomes a pejorative term for the subject, which is most unfortunate. Most recent authors write 'nonstandard' without the hyphen to emphasise the technical meaning of the word, and I will follow this convention here.

---


reasoning. There is a range of mathematical ideas here, which I attempt to tell 'straight', without over-simplification, and where there is a choice I concentrate on the mathematical view. I make comments on the underlying philosophy where I am able, without being particularly thorough. A more thorough examination of these ideas might constitute a new research project in its own right, or possibly more than one.

The paper is organised as follows. After this introduction, the first main section, Section 2, contains descriptions of four main points of view of mathematical research, as a working mathematician would see them. These four viewpoints are not mutually exclusive, nor do I claim that the list is complete. I suggest ways in which the four views merge into each other, but these are not the only ways. (Indeed this property of mathematics, that it can be looked at in different ways and at different levels simultaneously, is one of its strengths.) In the following section, Section 3, I shall present the case for and against infinitesimal numbers in each of the four views. My presentation is again mostly mathematical, though I try to make speculative suggestions wherever possible. Section 4 contains my personal conclusions from this thought, which are that the structuralist view is essential, not just for mathematics, but for everyday thought and arguments too. However, the structuralist view as sometimes presented (e.g. Parsons's essay [7] reprinted in Hart [4, Chapter XIII]) has problems which must be addressed. My suggestion is essentially that the structuralist account does not go far enough, and this section concludes with what might be described as a brief manifesto, or research proposal, for what I call 'pure structuralism'. I conclude with a list of research questions that arise out of the discussion and that I think are worthy of further study. An appendix presents some technical information on nonstandard mathematics for the interested reader.
The four viewpoints are as follows. Firstly, what I call the view from a unifying theory is the idea that all mathematics can be done, or perhaps even is best done, as if from a single unifying theory. A set theory such as ZFC is typically used. I mention this view first because it seems to be the most common ground for the majority of mathematicians. It is also a useful simplifying view for mathematical practice: it may not be optimal for all work, or represent the full views of any particular working mathematician, but it is ideal for a first presentation of new work.

Secondly, there is what I call the pluralist view, where the main work of mathematics is in looking at a large variety of different number systems, with 'number' being taken in the widest possible sense, and I would include geometrical arguments within these terms. The underlying theory is weakened, but the construction of these systems is still possible in ZFC. These number systems are regarded as the most important aspect of mathematics, and a number of them are more fundamental than others because of their ease of construction and their applications. Chief amongst these systems are those for the natural numbers, integers, rationals, reals and complexes.

The third of my viewpoints is the structuralist one: that a system of numbers or other mathematical objects takes abstract meaning through what it does rather than what it is—the axioms it satisfies rather than the way it is constructed. Here the axioms take priority, but constructions are still required to show that such objects exist. The most important feature that this view brings to bear for us is canonicity. Remarkably often it is possible to prove that two systems satisfying the same axioms are isomorphic: the number systems are 'naturally forced upon us', or canonical. Sometimes there is even more: sometimes the canonicity itself is

canonical. This, it will be seen, has very deep consequences for the structuralist view of mathematical objects.

Finally, we consider what I call the utilitarian view: that the mathematical objects that are useful for other kinds of mathematics, and for other applications such as science, are the ones that deserve most attention and the ones that may be said to have existence in their own right. Apart from the issue of practicality, an argument related to Hilbert's programme supports this view. It will be seen that the issues of canonicity are essential here too, and also support the utilitarian existence of certain objects; canonicity is particularly interesting with respect to applications in physical theories, possibly even quantum mechanics.

This lists my four viewpoints and summarises the content of Section 2. I should emphasise that this section is not complete, even from the mathematical perspective, in the sense that there are many other sensible viewpoints, including different ways of combining or prioritising the viewpoints I have given. One major omission is the more modern view of mathematics as done on a computer, with computer algebra systems or similar. This view is particularly poorly represented here, and it presents interesting philosophical, mathematical, and computational problems, related to constructivism but perhaps distinct from it too. But perhaps the biggest omission is the lack of discussion of the various ways the views fit together and what extra they offer when they are taken together. In particular I do not mention anything like a constructivist or intuitionist framework for combining these views, this being where I suspect intuitionistic mathematics might still have its greatest impact.

## 2 Mathematical existence

The main issue, according to Benacerraf [1] (reprinted as Hart [4, Chapter I]), is that the theory of truth for mathematical statements should be consistent with that of everyday truth, and so should the theory of knowledge. There are a number of suggestions, not all incompatible with one another, for answers to the questions, 'What is a mathematical object?' and 'What does it mean to say that an object exists?' In this section we highlight some of the main options and choices from the point of view of a typical working mathematician and actual mathematical practice. In each case I will hint at how mathematics is done, and what the idea of its underlying foundation is. (These different views are not exclusive of each other. Indeed one of the interesting things about mathematics is the way the various different foundational views can work together, as different aspects of one's work or at different levels.)

**The unifying view of mathematics.** By 'the unifying view' I mean the view of mathematics that it can all be done from one global theory, such as a set theory, typically ZFC, and that whether or not one chooses to write proofs and other arguments formally (and most choose not to) it is clear from their presentation that they all can be written in this way. Therefore one's work contributes at the very least to the body of knowledge of consequences of the global theory. I think it is fair to say, however, that most mathematicians do not give their underlying theory much attention, preferring to 'get on with the job of doing mathematics'. But if they were asked, they could give a list of principles

they find admissible for deduction, which would probably amount to first-order logic together with axioms available in a (possibly multi-sorted) version of ZFC set theory. Most mathematicians in this sense are rather conservative, and understand that this conservatism places them well within the realm of ZFC. This places their work on a reasonably sound footing, at least according to one of the standard paradigms, but their belief in the soundness of mathematics is typically much stronger. They have little difficulty mentally picturing the set-theoretical universe described by the ZFC axioms, or (more realistically) the part they are working on, as existing in some sort of platonic way. They 'get on' with their mathematics, which is to say, they posit the idea and consequences of there being particular objects with particular properties, both informally (using images, diagrams, analogies, and so on) and semi-formally.

Research mathematicians have generally trained themselves to be 'pessimistic' about their picture of the universe; that is to say, the mental picture they have is generally inclusive of all possibilities and therefore necessarily somewhat incomplete. This 'pessimism' arises because the informal 'brain-storming' stage is important for successful work, but from experience they know it can be unreliable. Potentially unreliable arguments generated by informal means are always carefully checked, verified and communicated in a rather different semi-formal style once it is believed that some important and provable conjecture has been identified. When working semi-formally, proofs are written down in a mixture of natural language and mathematical symbolism in such a way that they could in principle be rewritten or developed into formal proofs in first-order logic. Few mathematicians work in anything other than first-order logic.
The logical principles used in such proofs are, however, always phrased in a way that is compatible (via an informal version of the soundness theorem) with the notion of truth (defined using something like Tarski semantics) relative to the universe they conceive of as (at least for the moment) existing platonically. These proofs can be rewritten as formal proofs in ZFC and are frequently interpreted as such, but at the time of conception the syntactical proof rules are rather considered as semantic rules concerning truth and possible situations, which relate to the conception of the universe being 'explored'. If an example or algorithm or other object is explicitly exhibited, rather than shown to exist by non-constructive means, a mathematician will usually say so, rather than leave the realm of classical logic.

Proofs are deliberately written in a semi-formal way because mathematicians know that there may be a number of subtly different interpretations of what they write, and they emphasise the (semi-formal) arguments rather than the pure statements of their results to aid these different interpretations. A proof can be read as the reason why some statement is true, but also often as a method or process by which to carry out a calculation, and although mathematicians are generally unfamiliar with intuitionistic logic and other constructive logics, they do present proofs as methods when appropriate.

Of course, some mathematicians are more familiar with foundational matters and may explicitly state that they are 'working in ZFC' or similar. A few may be working in areas where additional axioms (such as CH, GCH, AD or large cardinal assumptions) are useful, and these typically pick and choose from this list of additional axioms as suits them. In any of these cases, mathematics is usually done in the first instance within some standard logical framework, such as ZFC, and the actual work taking place is both informal and semi-formal, but both

parts are conceived by the mathematician as taking place in some semantic manner relative to the conceived or imagined set-theoretic universe.

**The pluralist view of mathematics.** Mathematics is primarily (but not exclusively) about numbers, where 'number' is often taken in the most general sense possible. Some number systems are of particular importance, and new number systems are usually built from more fundamental ones. Arguably the most fundamental number system of all is the system N of the natural numbers, 0, 1, 2, 3, . . .. These are frequently described as being the numbers corresponding to a finite sequence of strokes on the page, the number 5 corresponding to ||||| for example. The natural number system is extended to the integers, Z, the rationals, Q, the reals, R, and the complexes, C. Other number systems may be devised by related means, such as polynomial rings, groups, finite and other fields, and extensions and quotients of these. Other structures that are not strictly speaking numbers but are treated as numbers by mathematicians, such as the collection Vω of hereditarily finite sets, or the collection of (finite or infinite) graphs, groups, etc., may also be defined and used.

In this view, mathematics becomes an industry of combining these 'numbers' from these different systems to find interesting properties or facts about them, to devise new systems, and to use these systems to model phenomena in the natural world. Some logical theory is required for this endeavour of course, but mathematicians working like this typically regard each individual system of numbers as having some genuine existence. It may be that a working mathematician will have what amounts to different logical conventions for the different areas of work. The theory of the reals might be based on first-order logic, but that for making new systems out of old may be based on some category-theoretic framework.
From the foundational point of view the mathematician feels on safer ground, as the theory of each system has some sort of independent life, and if one falls, being found to be inconsistent or uninteresting (that is, if some interesting mathematical result is proved stating that there is no system with the particular properties in question), then the others still stand. In this view, each system is constructed, and exists because it is constructed. It has a particular construction, and therefore a particular definition. A rational number p/q is the equivalence class of a pair of integers (p, q) with q ≠ 0.

One of the early tasks of set theory (the theory ZFC, for example) was to verify that all the constructions of all these number systems could be carried out in that theory, so that ZFC could (but not necessarily should) be regarded as a unifying theory of all these number systems. In this sense set theory was more-or-less successful (though it is inconvenient that so-called 'large categories' are not sets), and the pluralist view can be and is partially subsumed by the global unifying-theory view.

**The structuralist view of mathematics.** The main problem with what I have called the pluralist view of mathematics is that two workers may have different constructions of number systems, and these systems need to be compared. It is certainly the case that many number systems that look like the real numbers can be constructed, and the important thing about a particular real number is not how it is actually constructed (as a Dedekind cut, the equivalence class of a Cauchy sequence, a continued fraction, or whatever) but what it does.
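The equivalence-class construction of the rationals mentioned above can be sketched directly. This is a minimal illustration of mine, not from the paper; instead of carrying whole equivalence classes it stores the canonical lowest-terms representative of each class, which amounts to the same thing.

```python
from math import gcd

def normalize(p, q):
    # Pick the canonical representative of the class of (p, q):
    # positive denominator, lowest terms.
    if q == 0:
        raise ValueError("denominator must be nonzero")
    if q < 0:
        p, q = -p, -q
    g = gcd(abs(p), q)  # gcd(0, q) == q, so (0, q) normalizes to (0, 1)
    return (p // g, q // g)

def equivalent(pq, rs):
    # The defining relation: (p, q) ~ (r, s)  iff  p*s == r*q.
    return pq[0] * rs[1] == rs[0] * pq[1]

def add(pq, rs):
    # p/q + r/s = (p*s + r*q) / (q*s), then normalize.
    return normalize(pq[0] * rs[1] + rs[0] * pq[1], pq[1] * rs[1])

def mul(pq, rs):
    # (p/q) * (r/s) = (p*r) / (q*s), then normalize.
    return normalize(pq[0] * rs[0], pq[1] * rs[1])

assert equivalent((1, 2), (3, 6))        # 1/2 ~ 3/6
assert add((1, 2), (1, 3)) == (5, 6)     # 1/2 + 1/3 = 5/6
assert mul((2, 3), (3, 4)) == (1, 2)     # 2/3 * 3/4 = 1/2
```

Of course, on the structuralist view this particular encoding is beside the point: what matters about the resulting system is only how it behaves under the defining relation and the operations, not which sets happen to implement it.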


Thus the important property of π is that it is the ratio of the circumference of a circle to its diameter, and not that it happens to be a Dedekind cut of smaller rational numbers (in the view of one construction of R). To get around this problem one writes down axioms for each structure of numbers one devises, and then proves these axioms to be true of the number system in question. The name 'axiom' is used not because these statements are to be assumed without proof (on the contrary, they must be proved) but because the other features of the structure will be quietly forgotten, and future work regarding the structure will proceed from the axioms we have listed alone and nothing else.

The word 'structure' has crept in here because a set such as N or Q without additional structure is just amorphous and depends only on its cardinality. So the additional algebraic structure (the order relation and the addition and multiplication operations in N and Q) is important to distinguish these systems. The axioms describe the properties of the elements of the set (as primitive objects, i.e. without structure in themselves), and the properties they have relate to order, addition and multiplication. The number 2/3 in Q is described by the properties it has with respect to order, addition and multiplication compared to other rationals. In other words, mathematical objects like N and Q, and also the numbers themselves, are abstract objects characterised by 'what they do' rather than 'what they are'. For an elementary introduction to mathematics considered this way, read Gowers [3]. A more critical philosophical account is to be found in Parsons [7] (Hart [4, Chapter XIII]).

There are two main aspects of this new way of thinking about mathematics. The first is that we have the basis of an understanding of the idea of an abstract mathematical object: an object that has no structure in itself but is characterised by what it does in certain situations.
Whatever the philosophical implications of this view are, it is at least in accordance with modern mathematical practice. The second is that an abstract mathematical object is in some sense independent of how it is constructed—the actual 'internal structure' of a real number as a Dedekind cut or whatever—and is in fact the same object as one constructed in an entirely different way but with the same properties.

Two key examples illustrate this perfectly. The axioms for the natural numbers N given by Dedekind characterise the structural properties of the natural numbers precisely. Similarly, the axioms for the real numbers as being a complete Archimedean ordered field (also essentially due to Dedekind) characterise the structural properties of the reals exactly. We have the following.

**First Canonicity Theorem for the Natural Numbers.** Let N and M satisfy the axioms for the natural numbers. Then N and M are isomorphic.

**First Canonicity Theorem for the Real Numbers.** Let R and S satisfy the axioms for the real numbers. Then R and S are isomorphic.

In undergraduate lectures I like to describe a conversation between humans and some intelligent extra-terrestrial species soon after first contact. Humans and ET might struggle to agree on what constitutes something fashionable, or elegant, or even beautiful, but the mathematicians of the two races would get together and discuss axioms for the real numbers, and would presumably agree on the set of axioms each species takes (or, if not, prove from one set of axioms the axioms of the other, and vice versa, showing that the two sets are equivalent), and therefore be able to conclude that both humans and ET share the same

concept of real number, irrespective of any ideas each race might have of their implementation using Dedekind cuts, Cauchy sequences, or whatever.

Although these theorems are well known and appear to support the view that the structuralist approach to objects works, at least in these two cases, the story is not complete. Although we now know that all systems of reals are structurally similar, these results do not tell us how they are similar. In fact, the theorems that there is essentially only one natural number system and one real number system are even stronger than this, in another subtle but important way. Given two systems N and M of natural numbers, or two systems R and S of real numbers, not only are they isomorphic, but it is possible to show that in each case there is only one possible mapping f : N → M, g : R → S that demonstrates this isomorphism.

**Second Canonicity Theorem for the Natural Numbers.** Let N and M satisfy the axioms for the natural numbers. Then there is a unique isomorphism between N and M.

**Second Canonicity Theorem for the Real Numbers.** Let R and S satisfy the axioms for the real numbers. Then there is a unique isomorphism f : R → S.

Not only are the ideas of 'natural number' and 'real number' canonical, or forced upon us in a natural mathematical way, but the isomorphism that shows this canonicity is canonical too. This means that however one defines the real numbers, not only is the structure of the real numbers essentially unique, but the individual real numbers are characterised by their properties and are also essentially unique. The Second Canonicity Theorem has important mathematical consequences. But it also has important consequences for physical applications of the reals and measurement, which brings us on to our fourth view of mathematics.

**The utilitarian view of mathematics.**
This is the idea that mathematical ideas, objects, and theories exist because they are necessary or useful to explain or model scientific phenomena, including other areas of mathematics. In mathematics, one can temporarily posit the existence of all sorts of mathematical structures and objects, and it is remarkable from a psychological point of view how these objects can take on some sort of real existence in the imagination when one starts to work with them. In this sense one can choose to believe in almost anything, including the leprechaun with a pot of gold at the end of the rainbow. A reasonable restriction is that one's beliefs should not force one into an inconsistent point of view, but it is not necessary to be reasonable.

I have heard mathematicians compared to children in a sweet shop, offered many glittering packages of sweets from which they may pick and choose the ones they want. The choice of such sweets, be they axioms or number systems or something else, is usually made for practical reasons—to solve the current problem at hand—or for reasons of elegance, which may or may not in the long term amount to the same thing. We have already seen an example, where a mathematician needing axioms for set theory that go further than the usual ZFC axioms tends to pick and choose the ones they need without too much concern about how these are justified. But we were all brought up in a very proper way and know that an excess of sweets can give one a tummy-ache. So one tries to get by with as little as reasonably possible, though starting with

a large tub of sweets and being able to pick and choose a small number from such a large variety certainly adds to the excitement, and excites the mind as to the possibilities of some hitherto undreamt-of exotic combination.

There are two arguments supporting this view. One is the application of Peirce's principle of abduction used by Quine [8] (Hart [4, Chapter II]): that if a piece of mathematics X is required to understand an observed phenomenon Y, then the observation of Y tends to support the argument that X is correct, or true, or sound. This argument is also employed even if X cannot be shown to be necessary for an understanding of Y but is perhaps the most elegant or the most powerful account, or the most suggestive of other applications. This argument might be considered to have more force if Y is some aspect of 'the real world' and X is being used as part of a mathematical theory to model phenomena in the real world, but it seems reasonable to take this further and argue that some new kind of mathematics X has mathematical existence (whatever that may be) if it is the most elegant or powerful way of explaining some other piece of mathematics Y.

The second argument relates to Hilbert's programme, and says that provided X can at least be argued to be consistent (or consistent with other mathematical ideas one is using) then it can be regarded as 'ideal' mathematics that has some validity of its own. Gödel's Second Incompleteness Theorem shows that Hilbert's programme as originally posed cannot succeed, but the main thrust of the programme still holds weight.
This is that new axioms or new ideal elements may be accepted if shown consistent, and such ideal mathematics makes a useful contribution if it can be shown to have many reasonable consequences for ordinary 'real' mathematics—and there are many levels of 'realness', from verifiable statements about the natural numbers (Hilbert's original notion of 'real') to comprehensible statements about one of the other standard number systems discussed above.

In some sense the abduction argument and the Hilbert's programme argument are similar, in that they both try to measure the success of a theory in terms of 'real' consequences, be they in some familiar mathematical structure or in their use as models for natural phenomena, and these useful consequences 'trickle down' in the sense that, given a theory Y which has consequences for X, a theory Z that has consequences for Y is likely also to have consequences for X. Put a different way, if at the lower levels of this 'hierarchy' we can readily detect problems (such as inconsistency, and this is the point of Hilbert's programme, that inconsistency is 'real'), then by 'trickle down' any problems in higher mathematics will eventually show up. This is even more true of powerful and elegant higher mathematics, which, being one of the more glittering sweets available, is likely to be taken up more often by other mathematicians, who will surely in due course find out what its problems are, if there are any.

In addition to all this there are mathematical reasons for taking the utilitarian point of view. A consistent first-order system is, by the Completeness Theorem, satisfied by some mathematical structure. The Completeness Theorem is provable in a minimal theory of mathematics (ZF set theory with the Axiom of Choice is certainly sufficient, but rather less is actually required) and, as we shall see, arguments supporting consistency are not necessarily as difficult as they might seem in all such cases.
However, the main problem with consistency as a criterion for belief is that it is rather weak: given its consistency and some unifying ZFC-like framework, the

existence of our number system follows, but we are looking to see if there is more than this. Thus belief (at least in the context of this paper) needs to have some reason or rationality associated with it. From the point of view of mathematics we expect belief in an object to have some usefulness: adding an axiom for the existence of a leprechaun does not in itself improve mathematical knowledge, and if we tried it we would tend to reject the axiom and disbelieve in the leprechaun. But if the 'leprechaun' was simply a fanciful name for an abstract mathematical 'point at infinity' (mathematicians are indeed given to using fanciful names for abstract ideas such as this) and the axioms stated this property of the 'leprechaun' correctly, then its addition could quite likely simplify the description of the geometry of the system being considered, and we would have rational reason to believe the new axioms and the existence of the 'leprechaun' so characterised. In other words, it's not what one believes and what one calls it that's important, but rather how it affects the way one thinks about everything else.

Consider for example the addition of the number i = √−1 to the real numbers, making the complex numbers. From the point of view of the physical universe, especially when one is thinking of the reals as measuring distance or time, the square root of minus one is a mysterious object, and historically it was rejected for a long time because this number does not seem to exist in this physical sense. However, the addition of this number to the reals turns out to be straightforward mathematically, and not nearly as complicated as applying the Completeness Theorem of logic: essentially all that is required is to know that the polynomial X² + 1 is irreducible over the reals, something that is quite easy to establish.
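The pair construction can be sketched in a few lines. This is my own minimal illustration, using nothing beyond the multiplication rule forced by i² = −1 (equivalently, multiplication of polynomials in i reduced modulo X² + 1):

```python
# Complex numbers as formal pairs (x, y) of reals, standing for x + i*y.
def c_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def c_mul(a, b):
    # (x1 + i*y1)(x2 + i*y2) = (x1*x2 - y1*y2) + i*(x1*y2 + y1*x2),
    # the rule obtained by expanding and replacing i**2 with -1.
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

i = (0.0, 1.0)                          # the number i is 0 + i*1
assert c_mul(i, i) == (-1.0, 0.0)       # i**2 = -1

# Pairs of the form (x, 0) behave exactly like the reals they came from.
assert c_add((2.0, 0.0), (3.0, 0.0)) == (5.0, 0.0)
assert c_mul((2.0, 0.0), (3.0, 0.0)) == (6.0, 0.0)
```

The irreducibility of X² + 1 over the reals is what guarantees that these rules yield a field (every nonzero pair has a multiplicative inverse) rather than merely a ring.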
More precisely, the symbol i is taken simply as a formal symbol, and a complex number is a formal expression x + iy where x, y are real numbers; this expression can be considered as being simply a notation for a pair (x, y) of real numbers, with special rules for addition, multiplication and so on. That these rules make sense depends simply on the irreducibility of X² + 1. The number i itself is 0 + i1, and once one has added i to the reals one can see that all numbers of the form x + iy need to be added, so this construction has a pleasant kind of ‘inevitability’ about it. Thus the complex number system is easy to construct from the reals, and this is already in its favour. Is it a useful system of numbers? Well yes, most definitely, as the addition of i simplifies a great many theorems and formulas. For example the ‘Fundamental Theorem of Algebra’, that ‘every polynomial has a root’, becomes true in general, without having to qualify the hypothesis to ‘every polynomial of odd degree’ as one would have to for the reals. Complex numbers simplify the equations for the solutions of polynomial equations of third and fourth degree even when the solutions are purely real numbers (this was the first use they were put to and their original motivation), and the introduction of i unifies equations for real-valued trigonometric and hyperbolic functions into one single set of formulas. It could be said of the complex numbers that they are merely a technical device for handling a pair of real numbers simultaneously, and of course this is how they are constructed or defined. However it is important to be clear that the applications of complex numbers show that they are rather more than this. In particular the important notion of differentiability of a complex-valued function is not at all the same as that for functions of two real variables, and might not have been discovered but for the view of complex numbers as single numbers rather than pairs of reals.
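The construction of complex numbers as pairs of reals described above can be written out directly. The following is a minimal illustrative sketch (the class name and its methods are my own, not anything from the paper): the pair (x, y) stands for x + iy, and the multiplication rule is exactly the one forced by i² = −1.

```python
# Complex numbers as formal pairs (x, y) read as x + iy, with the
# addition and multiplication rules forced by setting i*i = -1.
class Cx:
    def __init__(self, x, y):
        self.x, self.y = x, y  # the pair (x, y), i.e. x + iy

    def __add__(self, other):
        # (x + iy) + (u + iv) = (x + u) + i(y + v)
        return Cx(self.x + other.x, self.y + other.y)

    def __mul__(self, other):
        # (x + iy)(u + iv) = (xu - yv) + i(xv + yu), using i^2 = -1
        return Cx(self.x * other.x - self.y * other.y,
                  self.x * other.y + self.y * other.x)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

i = Cx(0, 1)                 # the number i itself is 0 + i*1
assert i * i == Cx(-1, 0)    # i^2 = -1: the defining property
```

That the arithmetic is consistent corresponds to the irreducibility of X² + 1 over the reals; nothing beyond pair manipulation is needed.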
In other words, the fact that complex numbers suggest new mathematics that would not otherwise have been obvious is a very strong factor for their usefulness. The other aspect of the utilitarian view is the usefulness of the mathematics to scientific theories, especially theories of physics. Here it is important to stress that the question is whether any particular kind of number can be used to develop a useful theoretical model of some aspect of the universe, not whether numbers really exist in the physical world. For example, it is traditional to measure the dimensions of length and time using real numbers. (The switch from the use of rational numbers to real numbers for this was made by the ancient Greeks, who were genuinely concerned about measurements that seemed to have to be made with irrational numbers such as √2. After several centuries we do not seem to have any serious rival for this use of the reals, something I find surprising.) If we ask how complex numbers help us with measurements and physical theories, we see that although distance and time do not obviously have complex values, some quantities, notably current and voltage in AC circuits, are naturally modelled as complex numbers, with the magnitude of the number being the peak value and the argument of the number being the phase.3 So from the point of view of modelling physical phenomena, complex numbers play a part and should be accepted. Whether one goes so far as speculating whether other equations that occur in physical models also apply to complex numbers (for example that a particle with imaginary rest mass might exist and, if so, would travel faster than light) is perhaps more the realm of science fiction. However the fact that numbers such as i promote such speculations, and that at least one or two of these speculations may turn out to be reasonable science rather than fiction, is in itself also a reason to accept the utility of i. For the application of numbers to natural phenomena, the canonicity theorems are particularly important.
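The phasor modelling of AC quantities mentioned above can be made concrete before we turn to canonicity. In the sketch below (the numerical values are invented purely for illustration) a sinusoid A·cos(ωt + φ) is represented by the complex number with modulus A (the peak value) and argument φ (the phase); adding two voltages of the same frequency is then just complex addition.

```python
import cmath
import math

# A sinusoid A*cos(w*t + phi) is modelled by the phasor A*e^{i*phi}:
# modulus = peak value, argument = phase.
def phasor(peak, phase):
    return cmath.rect(peak, phase)  # peak * e^{i*phase}

v1 = phasor(5.0, 0.0)           # 5 V at phase 0
v2 = phasor(5.0, math.pi / 2)   # 5 V, a quarter-cycle ahead

total = v1 + v2                 # superposition is complex addition
peak = abs(total)               # peak value of the summed voltage
phase = cmath.phase(total)      # phase of the summed voltage
# peak is 5*sqrt(2) (about 7.07) and phase is pi/4
```

Note that replacing every phasor by its conjugate (the opposite sign convention for phase) would give exactly the same physical predictions, which is the non-canonicity discussed next.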
The First Theorem is obviously essential, for if we were to measure a quantity by a number (a real number perhaps) it is important that the resulting numerical measurement comes from an identified structure, so that two such measurements can be combined or compared. But the Second Canonicity Theorem is important too: it is this that guarantees that the result of a measurement is unique and reproducible. If there are two different isomorphisms between structures R and S then each of R, S has at least one nontrivial automorphism, sending numbers to different numbers with the same properties. And if two numbers x, y ∈ R have the same properties they are both candidates for the same measurement of a physical quantity. The Second Canonicity Theorem might fail for a structure S because the system S may not have enough structure to distinguish between its elements. An example of a number system satisfying the First but not the Second Canonicity Theorem is the system of complex numbers C as a field with +, ·, 0, 1 and (so that the real number line can be identified as a special subsystem) the absolute value operation |x|. The First Canonicity Theorem follows from that for R. But there is no way to distinguish i and −i, nor (more generally) x + iy and x − iy. In other words conjugation x + iy → x − iy is an automorphism of the structure, and Second Canonicity fails. We can resurrect Second Canonicity by adding to our structure an additional function, for example the argument map arg z

3 The magnitude of x + iy is the real quantity √(x² + y²) and its argument is tan⁻¹(y/x), taken in the appropriate quadrant.

(returning a value in the interval [0, 2π)), but adding such a function requires an arbitrary choice of which is the upper half-plane and which the lower, or of whether angles are measured clockwise or anticlockwise. Failure of canonicity for C has some consequence for measurements using C. For example, in an experiment or electronic design using C to model alternating current (AC) there are two choices for the measurement of the very first current or voltage, but once the convention for this first measurement is chosen the rest of the measurements must follow suit. This is of little consequence to the physical theory using C to model the actual physical system, but it suggests that it is not in fact exactly true that we ‘see’ complex numbers as complex voltages or currents in an AC circuit. Put mathematically, the first complex number value or measurement is one of two values, x + iy or x − iy, the set of which is called the orbit of (either) value under the automorphism group in question. There is nothing to choose between these two values, although they turn out to be the same value if y = 0. But (provided y ≠ 0) the second value u + iv will then be uniquely determined. The orbit of a single point x + iy under the automorphism group has 2 elements (or 1 if y = 0), but the orbit of a pair of points (x + iy, u + iv) also has at most 2 elements. No physical theory of measurement I can think of can distinguish between i and −i, so it is not quite true to say that the complex numbers exactly model AC circuits. For the complex numbers, mathematicians usually choose to live with the failure of the Second Canonicity Theorem; this failure, and the additional properties of C as an extension of R, are signalled by coding the conjugation operation into the structure. The canonicity of the reals is not necessary for believing in their existence, but it is a very desirable property of the reals and strong evidence for such belief.
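The automorphism at issue, conjugation, can be checked mechanically. The sketch below (using Python's built-in complex type purely for illustration) verifies that conjugation respects +, · and |·|, so no property expressible in that structure tells i from −i, and it exhibits the two-element orbit discussed above.

```python
# Conjugation z -> z.conjugate() is an automorphism of (C, +, *, | |):
# it preserves addition, multiplication and absolute value, and it
# swaps i with -i, so the structure cannot distinguish them.
def conj(z):
    return z.conjugate()

z, w = 2 + 3j, -1 + 4j
assert conj(z + w) == conj(z) + conj(w)   # respects +
assert conj(z * w) == conj(z) * conj(w)   # respects *
assert abs(conj(z)) == abs(z)             # respects | |
assert conj(1j) == -1j                    # swaps i and -i

# The orbit of z under {identity, conjugation} has 2 elements,
# or 1 when z is real (then conj(z) == z).
assert len({z, conj(z)}) == 2
assert len({2 + 0j, conj(2 + 0j)}) == 1
```

Adding the argument function arg z to the structure is precisely what destroys this automorphism, at the cost of the arbitrary orientation choice described above.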
In the framework already set up, it is an elegant property of the reals that is potentially highly useful. Although canonicity itself does not imply that real numbers can be used to measure physical quantities, it does at least show that the number system is available for such measurements. And, as we know, it is common in physics to measure distance, time, mass, energy and so on as real numbers with appropriate units. This is not to say that real numbers must be used in this way or that there is no other more appropriate system to use, but rather that the real numbers form a particularly useful model of such quantities that is applied extensively in physical theories. In contrast, consider the case of a family of number systems described by a set of axioms A which fails to have the basic canonicity property, i.e. for which we cannot prove that every two systems satisfying A are isomorphic. We might be able to convince ourselves of the existence of systems satisfying A by elementary or straightforward manipulations of systems whose existence we are already convinced about. For example if A is the set of axioms for abelian groups, we can present the reals with the addition operation, or the reals with zero removed and the multiplication operation, or the integers modulo 5, as concrete examples of systems satisfying A. But if our evidence for the existence of systems satisfying A only comes from complicated arguments in ZFC, this option is not available to us. If we can prove in ZFC that there is, up to isomorphism, only one system satisfying A, then we can posit the existence of such a system ‘in the real world’ and describe it accurately in terms of the theorems about it that are provable in ZFC. We have a lot of concrete information about this system that we can at least consider, and maybe later choose to believe in. (There is another example of the semi-semantical reasoning going on here.) Thus provable canonicity in a set of axioms like ZFC is at least a useful precursor to belief in existence. Equally, with the obvious necessary extra care being taken, provable canonicity in some other conceivable set of axioms other than ZFC is helpful evidence for the belief in prior existence of the number system, irrespective of what we take for our usual axioms for set theory or mathematics. If systems satisfying A are not canonical then perhaps they are there (in models of ZFC) because of some dubious axiom or artifact of the way ZFC is conceived. This is particularly pertinent because one of the axioms of ZFC that has been the subject of much debate as to its correctness over the last hundred years, the Axiom of Choice (AC), is often recognisable in its consequences by their non-canonicity. For example, AC implies that there is a ‘well-order’ of the set of real numbers. It does not really matter what a well-order is for this discussion, except that no such well-order can be defined by elementary means as discussed earlier, and well-orders are (provably in ZFC) highly non-canonical. What is more, from knowing in more detail the structure of any well-order of the reals, it would be possible to read off the solution to one of the biggest open problems in set theory: whether the continuum hypothesis (CH) should be regarded as true or not. (CH is known to be independent of the other axioms of ZFC, but no satisfactory evidence as to whether it is CH or its negation that describes the true mathematical universe is known.) Non-canonicity in itself is sufficient to make the issue warrant further investigation, and my view is that the other evidence is quite compelling in the direction of not accepting, at this time, a prior belief in the existence of a well-order on the reals.
There are other more fundamental issues connected with canonicity that are not related to AC, but which are rather more difficult to isolate. Interestingly, some of these other issues may have an impact on physical principles and quantum mechanics. Consider the air in front of me. Most theories say it consists of particles: air molecules. Certainly there is plenty of good scientific evidence to say that there is matter in the air about us and that it is in the form of very small particles, so this seems entirely reasonable to believe. But if I were asked to focus on one particular air molecule and describe it, in particular whether it exists, this becomes more problematic. The immediate question is: which one? There is an amorphous mass of air molecules in front of me and I cannot pick a single one out. Does that matter? Is it a reasonable position to believe in the existence of the air in front of me, and have some belief about the form of structure that air takes, without any specific belief in any particular air molecule? If I am to believe in the existence of any single air molecule, should I not be able to say something specific about it other than that it is simply an air molecule and exists somewhere? From the point of view of quantum physics, my refusal to believe in a single molecule might be quite a sophisticated position. The uncertainty principle says I should not be so sure of any single molecule, because I cannot specify both its position and velocity. Furthermore, the Pauli exclusion principle says that all individual particles must have distinct states, i.e. there should be ways to distinguish them. Now I did not refuse to believe in the existence of individuals rather than the amorphous mass because I chose to bow to the Heisenberg–Pauli god, but rather because of some more general principle that needs to be pinned down and understood better. I admit to finding it difficult to articulate the exact principle here, but it is something along the lines of the following. Were I to believe in a single air molecule without being able to say anything at all specific about it, this would hardly be a useful belief, but instead would be rather like a belief in an object that has no impact whatsoever on the rest of my thinking, like the arbitrary belief in the leprechaun. I can however reasonably believe in the amorphous mass of air, and also reasonably believe in the theory that says it is best described as a collection of individual molecules. Were I able to say something specific about some air molecules, the ones that are molecules of oxygen perhaps, then I would have a stronger belief in a particular part of the mass of air, the part that is oxygen, but I still would not be able to have any useful belief in any particular oxygen molecule. One wonders whether this issue of canonicity, or definability and the existence of individuals, may have some bearing on the underlying principles of quantum mechanics. Unfortunately I have to leave these speculations open here, as the questions seem difficult at this stage, but it would seem worthwhile returning to them at another time.

3 Existence of infinitesimals

The previous section set out four main viewpoints a working mathematician might typically take in his or her work. None is thought through in detail according to the underlying philosophy; we will make further comments on these views later. In this section I would like to describe the case for the existence of infinitesimals and nonstandard number systems from these different viewpoints. For background information on infinitesimals see Robinson [9], Kossak's article [6], or the technical appendix to this paper. Additional material on first-order logic, as well as a brief introduction to nonstandard analysis, appears also in The Mathematics of Logic [5].

Infinitesimals in the unifying view. From the unifying point of view, the existence of a number system with specified properties follows, if at all, from the axioms of the unifying theory one has chosen to adopt. In the case of the theory ZFC, axioms are available to construct or define the set of natural numbers, N (or ω, as it is usually called in this context), and from this the usual constructions allow us to define Z, Q and R. These systems are regarded (within ZFC) as structures for first-order languages, and ZFC can state and prove the main results of first-order logic, including the Soundness Theorem, the Completeness Theorem, and Łoś's ultraproduct theorem. Then by the usual model-theoretic means, we can either analyse the structure R and using Soundness deduce that a first-order theory of hyper-reals with infinitesimals is consistent, and hence by Completeness there is such a structure in the universe, or go directly from R to a hyper-real structure ∗R by means of Łoś's theorem and a suitable ultrafilter, usually a non-principal ultrafilter on ω or N. In this sense, number systems with infinitesimals clearly exist, and this is why Robinson's approach is considered correct and rigorous.
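The flavour of the ultrapower construction via Łoś's theorem can be suggested by a sketch. In the construction, a hyperreal is (an equivalence class of) a sequence of reals, and a statement about it holds when it holds on a 'large' set of indices, i.e. a set in a fixed non-principal ultrafilter. The ultrafilter itself is non-constructive and cannot appear in code; the function below (my own illustration, emphatically not a real ultrafilter) only witnesses the cofinite sets, which every non-principal ultrafilter must contain, by checking a finite tail.

```python
from fractions import Fraction

# Crude stand-in for 'true on a large set of indices': check that pred(n)
# holds from its first witness up to a finite bound. Every cofinite set
# lies in any non-principal ultrafilter, so this one-sided check suffices
# to witness truths of this kind. Illustration only.
def holds_cofinitely(pred, bound=10000):
    tail_start = next(n for n in range(1, bound) if pred(n))
    return all(pred(n) for n in range(tail_start, bound))

# The sequence (1/n) represents a positive infinitesimal in the ultrapower:
h = lambda n: Fraction(1, n)

assert holds_cofinitely(lambda n: h(n) > 0)  # [h] > 0
for k in range(1, 50):
    # for each standard k, 1/n < 1/k for all n > k, so [h] < 1/k
    assert holds_cofinitely(lambda n: h(n) < Fraction(1, k))
```

The ultrafilter is needed precisely for the sets that are neither finite nor cofinite (such as the even numbers), where no finite check can decide membership; this is where the Axiom of Choice enters.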
One concern that we may have is that the Axiom of Choice (AC) is used in an essential way, either as one of the ZFC axioms required for the Completeness Theorem, or for the construction of the ultrafilter to use when applying Łoś's theorem. Looking at it a different way, these nonstandard number systems could be regarded as a test case for theories such as ZFC: ZFC clearly ‘predicts’ their existence, but direct constructions do not yield such systems. Is there some more direct way that arguments for such number systems can be given? Does this prediction support or refute the traditional belief that ZFC is a good unifying theory? We have concentrated on those unifying axiomatic systems in the ZF family. It is worth remarking that a number of alternative systems exist in which nonstandard numbers appear more naturally. Some of these have associated philosophical motivation (such as Vopěnka's Alternative Set Theory [10]), and there are a number of systems proposed by Kanovei and others. In any case, to adopt such a system requires one to understand the consequences, especially as it forces one's mathematics outside the mainstream.

Infinitesimals in the pluralist view. In the pluralist view, we should take as little as possible from our metatheory and construct nonstandard number systems directly, as an extension of R perhaps, analogously to the construction of C. The theory ZFC ‘predicts’ that this should be possible, and the Łoś construction appears to be the most straightforward approach. It is direct and explicit, once one is given a suitable ultrafilter. For this approach to work one seems to need an axiom for the metatheory saying that such ultrafilters exist, and this axiom needs to be justified. The two possible justifications that come to my mind are: (A) an argument that ZFC or some fragment of it is justified, as is AC (or the Boolean Prime Ideal Theorem), together with the argument for ultrafilters from these; or (B) an argument of the utilitarian sort that says that ultrafilters are necessary to explain and work with a great number of mathematical phenomena. The existence of these ultrafilters is, it seems to me, not unreasonable, so it seems that we can reasonably imagine our pluralist universe populated with nonstandard number systems amongst others.
Without such ultrafilters, it is possible to make poor versions of nonstandard number systems. One can take for an infinitesimal h a number transcendental over R and order the field extension R(h) so that 0 < h < 1/n for all n ∈ N. This is an ordered field with an infinitesimal, but it is not as rich as the hyper-reals constructed from an ultrafilter, and not as useful for mathematical analysis either. I suspect that if this approach were pursued, we might obtain a workable, but clumsy, theory of analysis very like the ε–δ approach with limits.

Infinitesimals in the structuralist view. The most useful and commonly discussed nonstandard number systems in practice are sufficiently saturated models (it is usual to take them ℵ1-saturated) of an appropriate first-order theory: the theory of the reals with additional functions and relations, for example. The systems constructed from a non-principal ultrafilter over ω are of this type, for instance. From the structuralist point of view one would like to work with these properties as axioms, rather than concern oneself with the properties of the ultrafilter used to construct the system, if it was obtained that way. Indeed, a construction via the Completeness Theorem seems preferable in this sense, since it is ‘purer’: from it one cannot easily see the details of how the system was obtained, only the properties of the system so obtained. The canonicity theorems for nonstandard number systems are, on the other hand, more problematic. The immediate reason for concern arises from the fact that all the usual constructions of nonstandard number systems with infinitesimals use the Axiom of Choice (or a slightly weaker form of it such as the Boolean Prime Ideal Theorem) in some essential way. These axioms have been around for some time, but are not to everyone's taste, so are worth looking at in this context. Looking ahead, issues to do with canonicity also impact on the application of such numbers in physical theories, particularly in measurement, and on our question of physical existence. We concern ourselves here with nonstandard number systems that are elementary extensions (in the sense of first-order logic) of structures of the form R = (R, 0, 1, +, ·, <, Z, . . . , f, . . .)f∈F for some suitable set of functions F. Following standard terminology from mathematical logic we shall also call such systems models, the theory of the model being understood to be the elementary diagram of the structure above, i.e. the set of all first-order statements true in the above structure that can be written down using additional constants naming real numbers. Other nonstandard structures considered in nonstandard analysis (NSA) usually contain all this as a substructure or as an interpreted structure, so the non-canonicity phenomena we will be discussing for this structure apply to these others too. I have included the integers Z as a unary predicate in order to code infinite sets, in the way that is common in NSA. The set of natural numbers N can be defined from that of the integers if that is one's main interest, and using this one can use results from the theory of models of arithmetic to help classify models. The issue for canonicity is whether there is some identifiable model of this form that can be described simply by means of mathematical axioms, other than the so-called ‘standard’ one (i.e. the one above), which contains no infinitesimals. The answer in general is no, and there are a number of obstructions.
The first is well-known, but is not a particularly serious obstruction. By a pair of theorems of first-order logic known as the Upward and Downward Löwenheim–Skolem Theorems, models of the appropriate theory can be found of every suitably large cardinality. (Where ‘suitably large’ means in this case at least as big as the cardinality of R itself and of the set of functions F used.) That this is not so much of a problem is because we can specify the cardinality we are interested in in a natural way: as the first cardinal bigger than this minimum, perhaps. The other obstructions to canonicity are specific to the particular theory we are looking at, and the fact that it codes sequences, computations in Z, and other rather complex mathematics. It is necessary for NSA to look at structures that code complex mathematics, to enable us to solve difference equations in the nonstandard world as indicated below, or more generally to use NSA to reduce continuous problems concerning sets of reals or functions of real variables to discrete problems with solutions by combinatorial means. In the terminology of the classification of first-order theories given by model theory, the theory we are looking at is highly unstable, with too many models at each cardinality to expect a classification of these models. One possible candidate for a ‘canonical choice of model’ is a ‘minimal’ or ‘smallest’ one, but it turns out from this and some model theory that there is no minimal nonstandard model. (By a slight irony, minimal models do exist for a third method of construction outlined in the appendix, the one using Gödel's Incompleteness Theorem, but these necessarily give structures satisfying false sentences, such as ¬Con(PA). In any case there is an issue as to which false sentences we are to choose.) Perhaps instead one should look for large models: models which contain every possible feature that one might want, models that contain elements satisfying every possible property. This is a common idea in model theory, and models of this type are said to be saturated. Saturated models are very powerful, not only for model theory but for nonstandard analysis, where saturation principles are often exactly what one needs to transfer a problem or definition from the real world to the nonstandard world. There are many notions of ‘saturation’ in model theory, but for highly unstable theories such as ours, all the notions of saturation have some difficulty too. Some weaker notions of saturation (such as recursive saturation, arithmetical saturation, resplendency) are available which allow all theories to have such models at all suitable cardinalities, but unfortunately these notions of saturation do not characterise the models up to isomorphism, i.e. there are no canonical weakly saturated models. There is a notion of full saturation4 which does characterise models up to isomorphism, but unless the underlying set-theoretical framework of ZFC that we are using is changed, saturated models of our theory need not exist at all. The best general results showing the existence of saturated models are of the following type [2].

Theorem. Assume ZFC together with either the generalised continuum hypothesis (GCH) or the assumption that there is a strongly inaccessible cardinal. Then there is a saturated elementary extension of R.

One might say that the set-theoretic assumptions required to build saturated models are irrelevant in the structuralist view, but if one takes this standpoint one still has to argue for the existence of saturated structures.
In any case, one of the strengths of mathematical work is that the four views I have outlined are in some senses compatible, and we should not throw away the unifying view lightly. If we are adding axioms to set theory, I would argue that adopting GCH is not something one would want to do unless strong evidence is forthcoming on the continuum problem (CH), but an axiom for the existence of arbitrarily large strongly inaccessible cardinals is a much more reasonable addition to our set-theoretic axioms for mathematics. Indeed much of modern set theory is concerned with adding ‘large cardinal axioms’ that cannot be proved from the usual axioms and seeing what their consequences are, especially for ordinary sets such as the set of reals. From the point of view of more advanced NSA, these large cardinal axioms are useful in another way too, since they allow us access to a number of models of set theory (something that is not available without large cardinal axioms), and for some applications it is helpful to start NSA by taking an elementary extension of a suitable model of set theory, rather than an elementary extension of our structure R. Any two saturated models of our theory of the same cardinality will be isomorphic, and it is difficult to see how non-saturated models might be sufficiently canonical to be of interest, so the conclusion is that for canonicity we

4 For experts, the technical definition I refer to is: a model M is λ-saturated if it realises all types over sets of parameters of cardinality strictly less than λ, and it is saturated if it is λ-saturated where λ is the cardinality of M.


do require additional axioms in our mathematics to allow us the chance to work with saturated models. If we feel that nonstandard systems should be included as part of the zoo of useful mathematical structures that we wish to accept and use, then under the unifying view we require additional principles for the existence of mathematical objects; but suitable principles are available as additional axioms in the ZFC style. This deals with the First Canonicity Theorem. The Second Canonicity Theorem adds a further complication. Given any two saturated models of the same infinite cardinality, by standard methods in model theory there will always be a huge number of isomorphisms between the two. A consequence of the Second Canonicity Theorem is that a structure satisfying it has precisely one automorphism, so the Second Canonicity Theorem fails for saturated systems of nonstandard numbers. This does not cause immediate problems for the structuralist view, but it does have important consequences for the utilitarian view, to be discussed later.

Infinitesimals in the utilitarian view. In the utilitarian view, we would have evidence to support the existence of nonstandard number systems if we could show that such systems are useful and important enough. This might mean in relation to scientific theory, or to mathematics itself. We start here by looking at applications of infinitesimals to mathematics, and look at possible applications to other areas later. It is quite easy to say that infinitesimals as used by Robinson are simply a technical device to simplify and code up the idea of ‘limit’, and this is in some sense correct. For example, there is no doubt that the nonstandard definition of the derivative is simpler than the definition using limits, but arguably it does no more than use the same idea underlying that of ‘limit’ in a different way.
Against this criticism we might offer the argument that Newton and Leibniz might not have come up with the differential calculus but for their thinking in terms of infinitesimal quantities. Of course this is difficult to judge so many years later. It is certainly true that many mathematicians today find it easier to think in terms of infinitesimal quantities, even if they later re-work their arguments in terms of limits. But also, many others prefer to think in terms of limits instead. Perhaps it has more to do with how one is (mathematically) nurtured; at present teaching methods at universities certainly emphasise limits rather than any alternative, and indeed infinitesimals rarely enter the undergraduate curriculum at all. Another key thing to look at is whether infinitesimals unify different areas of existing mathematics and simplify their presentation or the statement of their results. In fact there is one area in which infinitesimals do this beautifully: that of the parallel topics of differential equations and difference equations. A differential equation is an equation for an unknown function y(x) of a real variable involving the derivative y′(x) of this function, or higher-order derivatives y′′(x), y′′′(x), etc. A difference equation is an equation for an unknown discrete function y(n) of a natural number variable involving the difference function ∆y(n) = y(n+1) − y(n) and possibly higher-order differences ∆²y(n) = ∆y(n+1) − ∆y(n), ∆³y(n), etc. That these two types of equation can be classified and solved by similar techniques is rather well-known, and a typical method for solving a differential equation numerically (i.e. approximately) on a computer involves choosing a small step size h, approximating the continuous function y(x) by the discrete function ŷ(n) = y(nh), each derivative y′(x) by (ŷ(n+1) − ŷ(n))/h = ∆ŷ(n)/h, and so on. It is usually the case that the resulting equation can be rearranged to take the form ŷ(n) = F(ŷ(n−1)), or possibly ŷ(n) = F(ŷ(n−1), ŷ(n−2), . . . , ŷ(n−k)), so that on choosing appropriate starting values ŷ(0) (or ŷ(0), ŷ(1), . . . , ŷ(k−1)) one can generate all other values ŷ(n) on the computer. A particularly simple example is known as the Euler method, in which the differential equation y′(x) = F(x, y(x)) is replaced by the difference equation ŷ(n) = hF(hn, ŷ(n−1)) + ŷ(n−1). Difference equations like this have the advantage that it is obvious that some solution exists, though finding a closed expression for a solution is often more difficult, whereas the existence of solutions of differential equations is often more delicate. Many differential equations can be solved exactly by the same numerical method in the nonstandard world, by choosing an infinitesimal step size h and solving the resulting difference equation there. Once again, the existence of the solution is usually obvious, and this gives a rapid existence proof for solutions of some kinds of differential equations. So infinitesimals and nonstandard methods unify numerical methods with the classical analysis of real-valued functions. There are other examples too. Nonstandard methods allow one to give adequate nonstandard approximations of useful but, classically speaking, fictitious functions such as the delta function. Nonstandard methods allow problems in the calculus of variations to be solved by traditional means such as Lagrange multipliers, maximising or minimising a function of (nonstandard) infinitely many variables subject to constraints.
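The Euler scheme just described is easy to run with a small (non-infinitesimal) step size. A sketch for the equation y′ = y with y(0) = 1, whose exact solution is eˣ (the function and values here are my own illustrative choices):

```python
import math

# Euler's method as a difference equation, following the text:
#   yhat(n) = h * F(h*n, yhat(n-1)) + yhat(n-1)
def euler(F, y0, h, steps):
    y = y0
    for n in range(1, steps + 1):
        y = h * F(h * n, y) + y
    return y

# Solve y' = y, y(0) = 1 up to x = 1; the exact answer is e = 2.71828...
approx = euler(lambda x, y: y, 1.0, 0.001, 1000)
# approx agrees with math.e to about three decimal places, and shrinking h
# improves the agreement; the nonstandard method replaces this limiting
# process by a single computation with an infinitesimal h.
```

With a real h the result is only approximate; the nonstandard version solves the same difference equation with h infinitesimal and then takes the standard part, which is where the exact solution appears.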
Thus nonstandard methods at least highlight the connection between discrete problems such as difference equations and analytic problems such as differential equations, via versions of the numerical methods normally used to find approximate solutions. This is not quite the same thing as unifying these problems—seeing them as all the same kind of problem. (That would appear to be a useful project for some other time.) Do infinitesimals and nonstandard methods permit new kinds of mathematics to be done that could not easily have been achieved without them? Here I think the jury is still out. Certainly it was hoped (by Robinson and those following him) that some significant problems in analysis could be solved by nonstandard means, and in one case Robinson himself solved an important outstanding problem in analysis by nonstandard means before Halmos identified the key ideas and presented an alternative classical argument. In fact it seems that most work in nonstandard analysis with impact on problems that can be stated purely in the classical language of analysis has been confined to finding elegant nonstandard solutions to problems for which classical methods were already known. It seems that for such problems nonstandard methods and classical methods using limits are too close: it is a little too easy for experts to translate between one method and the other. Where nonstandard methods are most useful, in my opinion, is in allowing the construction of interesting new analytical structures based on traditional discrete structures.

One possible stumbling block for the introduction of nonstandard analysis is that the procedure of arguing for classical results, about the reals R for example, using nonstandard means involves switching between two worlds: the ordinary real world and the hyper-real world. This is the moment where ‘infinitesimal quantities are neglected’, which was most problematic for Berkeley and others, but is given a precise justification by Robinson. In some accounts this is done almost algorithmically: adding or removing stars from symbols, taking ‘standard parts’ and ignoring infinitesimal quantities according to tightly defined rules. Alternatively one does it using the standard tool-kit of first-order logic, which is elegant and comprehensive, but perhaps too much for beginners, especially students, to learn. This is a genuine issue with the subject, and unfortunately a misunderstanding of some of these rules can lead to errors. Obviously there is still work to do in this direction too.

The tentative conclusion to this part of the discussion is that infinitesimals and nonstandard numbers in general do seem to form a useful system or systems by which to do mathematics, and (with some careful warnings about the potential errors that might occur if the methods are incorrectly applied) we might encourage more mathematicians to believe in their existence and usefulness.

Now we address similar questions about the physical existence of nonstandard numbers. The most obvious remark is that infinitesimals seem to be useless for measuring traditional quantities such as time and space, as infinitesimal amounts of time and space would be too small to be measured in any conventional sense. Nor is there (to my knowledge) any physical theory in which infinitesimals are potential measurements for physical quantities. This remark is a bit glib, however, as it presupposes that the traditional real-valued measurements of space and time are ‘correct’.
So let us speculate for a moment on what such a theory with nonstandard values for measurements might look like. Suppose some physical quantity—we will call it the mass of a particle, but nothing we say will be specific to this—is measured in a nonstandard system and the measurement may take infinitesimal values. Then the physical theory will only ‘see’ the orbit of the value obtained by the measurement, i.e. the set of automorphic images of it under the automorphism group of the number system. Rather than seeing a hyperfine continuum of possible particles with infinitesimal masses, we would see a classification of ‘types’ of masses based on the orbits of the values. However, the measurement of this one value will affect the measurement of the mass of a second particle, since the orbit of a pair of points (u, v) is not necessarily the Cartesian product of the orbits of u and v individually. All of this is suggestive of what actually happens in physics, i.e. that certain symmetry groups underlie physical structure and the result of one measurement may affect another, but I am not enough of a physicist to see this idea through. It is not completely without precedent. Complex numbers pervade mathematical physics, but as we have discussed, they cannot be seen in isolation, at least not to the detail required to distinguish i from −i. One reason for proposing the quark model of hadrons was that hadrons show structure and a non-zero size. Some particles, notably quarks, electrons and electron-like objects, do not apparently have a size, i.e. they are point masses according to the best measurements possible today. But if we speculatively imagine physics at the scale of infinitesimals, they might then have structure, such as being made of smaller elementary particles at the infinitesimal scale. These particles and any (infinitesimal) distance between them would not show up as distances, but they would contribute to properties or quantum numbers describing the particle; they might be the key to understanding how a particle such as an electron might have a ‘hidden variable’, or to understanding the seemingly random processes in quantum mechanics. They might interact (possibly in infinitesimal time) with other particles, and the large number of different kinds of interactions might usefully be unified. From the ‘outside’, i.e. at non-infinitesimal scales, we would not actually see these infinitesimal distances; instead we would see the properties of infinitesimals that are preserved by the automorphisms of the infinitesimal number system.

Of course, most of the ideas in this section are complete speculation. The message I want to draw out from this discussion is not the detailed speculations as such, but rather that infinitesimal quantities would not look infinitesimal on the macroscopic scale: they would manifest themselves as ‘quantum numbers’ or properties of the objects concerned, identified through classification and patterns that relate closely to the symmetries and orbits of the situation at the infinitesimal level, and in particular to how the automorphism group of the nonstandard universe acts on such numbers and on the physical set-up at the infinitesimal level.

4 The case for ‘pure’ structuralism

The main issues in the philosophy of mathematics that I wish to address, concerning the existence and nature of mathematical objects, theories of truth and deduction about them, and theories of knowledge of mathematical truths, are discussed in Hart’s volume The Philosophy of Mathematics [4]. I will present some speculations and personal views that will need defending further elsewhere; I will not do these points justice here, and certainly do not answer them fully. My aim is simply to show how discussion of other kinds of objects can stimulate this debate.

Firstly, there is a major difficulty in combining theories of truth and knowledge of mathematics and mathematical objects with theories of ordinary everyday objects. (See Benacerraf [1], reprinted as Hart [4, Chapter I].) Mathematical objects, if they are to have Tarskian semantics, must have something like platonic existence, but it is difficult to see how this allows us to interact with them to obtain knowledge of them. Secondly, if mathematical objects are to be viewed from a structuralist standpoint (as I think they must, for it is impossible to conceive any other nature for them, and this is in any case closest to the working point of view of most pure mathematicians), we face the issue that ‘structuralism’ is subject to circularity: structures explain objects, but structures, being objects themselves, must be explained first. (See Parsons [7], reprinted as Hart [4, Chapter XIII], for a more detailed account.)

It seems to me that what I have called the utilitarian view is the only reasonable way to justify new axioms or new mathematics. (It corresponds to a modern and somewhat more pluralistic version of Hilbert’s programme, with ‘levels’ of realness and different modes of application, and it is these applications and their success that give the axioms credibility.) Structuralism is the only reasonable way to manipulate these different kinds of mathematics. One


can go further and postulate a unifying theory that brings the various strands together. This may be a matter of taste, or a matter for some other overall view of mathematics (an overarching constructive or intuitionistic one, perhaps), but I find the evidence given above that the bulk of mathematics is utilitarian and structuralist compelling. Canonicity results are important for both the structuralist and the utilitarian approaches, and, as I have argued, the issues of existence associated with canonicity are very much ‘true to life’ too, possibly even having the ability to explain some phenomena from quantum mechanics that from a naïve point of view seem unnatural.

The other major notion that arises from my discussion above of how mathematics is typically done is that of quasi-semantic reasoning, and reasoning about ‘imagined’ structures. This doesn’t quite fit the usual semantics-versus-syntax canon that we have come to expect from foundational results in mathematics about mathematics, but it is natural, commonplace, and appears reliable enough, especially with the additional checks that mathematicians employ.5

To speculate first on the first issue, the similarity or otherwise of truth and knowledge in the mathematical and everyday realms: it seems to me that the theoretical dichotomy between pure Tarskian semantics and formal first-order theories and their syntax is stretching things rather, and that in everyday life, as well as in mathematics, quasi-semantic arguments take place rather more often than Hart’s introduction (op. cit.) would have us think. Take his ‘trite’ example, ‘All bachelors are unmarried.’ We see that this is true in a quasi-semantic way, not by some argument in a formal system, but by observing the semantic meaning of the definitions of ‘bachelor’ and ‘married’ and making a connection in an (imagined) structure of people, some of whom are bachelors and some of whom are married.
Hart’s more worldly example, ‘All bachelors are sexually frustrated,’ is determined to be false not by engaging in an opinion poll of bachelors on the street, to see if we can find one that is not sexually frustrated or else exhaust the supply, but rather by recalling from past experience some bachelor, one we might perhaps have envied, who was not sexually frustrated. We don’t actually know whether this individual has since got married (if so he would be no use as an example for us), but we do have a semantic conception of the world and its people as being large, with all reasonable possibilities represented, as not changing particularly fast, and as far exceeding our own rather limited experience. Since it was not difficult to find a counterexample twenty years ago, things are unlikely to be different now. To answer a critic who says this argument is not proof enough, we reserve the right to carry out an opinion poll, or possibly to scan the men’s magazines to see if such a poll has already been carried out.

Other examples work similarly. Benacerraf’s example, ‘There are at least three perfect numbers greater than 17’, can be determined by an opinion poll—or rather a computer search—but other quasi-semantic arguments are more satisfactory and more revealing, and the reason why mathematicians prefer these other arguments is that they give more information, not because a computer search is out of the question. There seems to be a spectrum of modes of informal argument for propositions

5 Woven in with all of this are psychological effects, and we must ask to what extent mathematicians work in the way they do because it is convenient and productive for them, and to what extent they actually need to because of the nature of mathematics itself. This question clearly requires further study.


of all types, and I suggest that these arguments are essentially semantic in nature. In mathematics there are arguments about propositions concerning numbers that could in principle be settled by calculation or computer search, and these are the ‘real’ propositions of Hilbert; but in mathematics there are also indirect means of argument, and many levels of indirectness, corresponding to Hilbert’s ‘ideal mathematics’. But so too are there indirect means of arguing in real life. The fact that mathematical argument can be formalised as a syntactic system is an interesting and useful fact (useful for reliability, communication and verifying arguments), but it is no accident that the so-called ‘natural deduction’ rules are based on quasi-semantic steps.

Nevertheless, if we are going to argue that mathematical arguments and knowledge, like other kinds of knowledge, are essentially informal but semantic, we have the problem of providing some details of these semantics and showing that these arguments determine truths—in particular that mathematical truths are truths like any others. Here the usual approach is to try to identify the objects to which Tarskian semantics apply, and this is fraught with difficulties. The same difficulties apply in everyday arguments: for example, which is the object corresponding to the ideal present-day sexually satisfied bachelor that we know exists by rather convincing means based on personal experience of the world? This too I will have to leave aside for further detailed discussion another time. But in some sense we all use these arguments and they do work, even if they correspond to neither the usual Tarskian semantics nor any precisely delineated syntactical formal system.
Objects, I am arguing, are of necessity abstract objects, described by what they do, in both mathematics (where this requirement is quite clear) and ordinary experience (where the issues are muddied by the apparent availability of a pollster’s approach to truth). In other words, they are objects presented to us by some kind of structuralist view of abstract objects. One take on structuralism is that objects are presented as being part of a structure (perhaps a structure for a first-order language) and are identified in some way by what they do, i.e. what properties they have in that structure. But, according to Parsons, this has two major difficulties. The first is that many objects have the same properties and therefore must be identified in some way. But it is difficult to see how to do this: are we somehow taking a typical or particular or canonical example of each? If so, it seems difficult to see how one could be chosen over the others. Or are we taking the equivalence class of all such objects? This has issues with the underlying theory of sets, of course, but I also feel uncomfortable with it because the equivalence class of an object is not the same sort of thing as the object itself: the equivalence-class construct has added extra unwanted structure onto the object. The second difficulty is that the first-order structure which gives the abstract objects their definition is apparently itself an abstract object, so needs to be defined in the structuralist way in terms of some other structure, and this creates circularity.

It seems to me that these problems may be resolvable, and indeed I have argued that they must be resolvable. The mistake (and I believe it is a mistake) is focusing on ‘objects’ too strongly. Structuralism is a way of looking at things, but it isn’t itself defined in terms of objects. Structuralism is a pair of spectacles, or a lens, or a filter, through which we look at things.
The spectacles or filter removes properties that we are not interested in and leaves us looking at things

with certain limited properties relating to some specific operations. The abstract objects that this gives us are the ‘things’ identified by their properties. Thus an object exists if there are ‘things’ that correspond to it, but the object is not one such thing, nor the set of all of them, but an abstraction of all of them by their property. Mathematical examples are easy to find, but the everyday example I used earlier is a good one to think about. In the collection of things in front of me there are (or so the physical theory says) air molecules, but the abstract object ‘air molecule’ is not a particular air molecule or the set of them all, but an abstraction of those properties perceived through the filter.

Having chosen the structuralist view, where a discussion of circularity leaves us a choice we must put the structuralist view first. This means that the filter through which we see things is not an abstract object at all. It is a sort of description of how we are for the moment looking at things. Such descriptions are typically very simple. It might just be that we are looking at the things in front of us as a collection of molecules, and not as tables, chairs, etc. But these filters or descriptions will be difficult to express in words. The mistake made by many with respect to structuralism is, I believe, that they think of the filters or spectacles as arising from structures which are objects. But they are not objects, nor do they arise from objects: they are something else less easy to pin down, akin to ‘pure descriptions’. When we choose to be more precise about them, what we are actually doing is modelling the filter with a theory of objects, just as we choose to model space with a mathematical theory in which distances are given by real numbers, or we choose to model the flight of a projectile as the movement of a point-mass particle in a uniform gravitational field.
Structuralism taken this way (where the structuralist view is taken as primary and is modelled by abstract objects rather than defined by them) I call ‘pure structuralism’. Clearly the scope of this paper has not allowed for any detailed look at it, and there is much work still to do. It seems to me that the filters or spectacles are well modelled by the idea of a forgetful functor in category theory, and category theory should also provide a kind of semantics that is closer in spirit to pure structuralism than the Tarskian one.

5 Questions for further research

Examine mathematics from the point of view of mathematics done on a computer, with a computer algebra system for example. To what extent does the computer-science concept of an abstract object correspond to the one suggested above? Do the considerations above suggest any improvements to the object-oriented paradigm for computer programming?

Does the fact that mathematics has several different compatible viewpoints (and different ways of making them compatible) add to the weight of its results, or make cloudy water even more murky?

What precisely is informal ‘quasi-semantic’ argument and how does it contrast with more formal modes of argument? What is it good for and what is it not so good at?

There is also the interesting issue of the role of first-order logic and what I have called ‘quasi-semantic’ arguments. As I have said, it is easy and natural to argue in ZFC ‘quasi-semantically’, with reference to an imagined universe or part of that universe. Then the syntactic rules for first-order logic are justified quasi-semantically, by a reflection on this imagined universe and an argument similar to the soundness theorem. In principle, quasi-semantic arguments of this form are more powerful, as other rules could in principle be imagined and used that go beyond the usual first-order logical rules. In practice this rarely occurs, and one wonders why. Is this some psychological phenomenon restricting the mathematician’s imagination, perhaps related to the ‘pessimism’ mentioned earlier? Or is there some deeper philosophical reason that puts these imagined universes and quasi-semantic reasoning about them on a firmer foundation, something to do with an informal Completeness Theorem for such quasi-semantic deductions? In any case, it always seems remarkable to me that the true Completeness Theorem makes excellent predictions about provability in first-order theories despite its non-constructive nature and the fact that AC is required to prove it.

References

[1] Paul Benacerraf. Mathematical truth. J. Philos., 70(19):661–679, 1973.

[2] C. C. Chang and H. J. Keisler. Model theory, volume 73 of Studies in Logic and the Foundations of Mathematics. North-Holland Publishing Co., Amsterdam, third edition, 1990.

[3] Timothy Gowers. Mathematics, a very short introduction. Oxford University Press, Oxford, 2002.

[4] W. D. Hart, editor. The Philosophy of Mathematics. Oxford University Press, 1996.

[5] Richard Kaye. The mathematics of logic. Cambridge University Press, Cambridge, 2007. A guide to completeness theorems and their applications.

[6] Roman Kossak. What are infinitesimals and why they cannot be seen. Amer. Math. Monthly, 103(10):846–853, 1996.

[7] Charles Parsons. The structuralist view of mathematical objects. Synthese, 84(3):303–346, 1990.

[8] W. V. Quine. Two dogmas of empiricism. In The philosophy of language, pages 39–52. Oxford Univ. Press, New York, 1996.

[9] Abraham Robinson. Non-standard analysis. North-Holland Publishing Co., Amsterdam, 1966.

[10] Petr Vopěnka. Mathematics in the alternative set theory. BSB B. G. Teubner Verlagsgesellschaft, Leipzig, 1979. Teubner-Texte zur Mathematik [Teubner Texts in Mathematics]. With German, French and Russian summaries.

**Appendix: commentary on the different views**

[consider deleting some of this or merging with the text above]

The discussion throughout this paper is about the existence of abstract objects such as numbers and number systems in mathematics, and of course the main

area of interest is in the foundations of mathematics. Because of the nature of the main questions addressed in this paper I have tended to take the working mathematician’s point of view as understood, and have rather skated over the fundamentals behind this view. It is time to make amends, address the foundation for this view, and put a bit more flesh on what a mathematician might mean by ‘existence’. It may seem strange to discuss this at the very end of the paper, but the ideas here require some understanding of a few key examples, in particular that of the real numbers, as discussed earlier.

The notion of real number is a good place to start, since it is straightforward enough for most working mathematicians to appreciate, and yet complicated enough for a number of differing views to have been put forward, especially in the first half of the twentieth century. To most people there is a clear concept of real number, based initially perhaps on the idea of a decimal expansion, with several examples in mind: terminating decimal expansions, such as that for 1/4; repeating ones, such as that for 1/7; non-repeating ones, such as that for √2; and more complicated ones, such as that for π. At some point, with these examples in mind, one can abstract the idea of a general decimal expansion, and develop the concept of real number from that. (Of course some other intuition, such as Dedekind cuts, can be used in place of decimal expansions.) One feels justified in this endeavour at the point when one writes down a set of very reasonable-looking axioms for the resulting system of numbers, proves the existence theorem for such a system based on decimal expansions, and proves both canonicity theorems.

It seems to me that two very positive mental acts are being described in the last paragraph, and both give strong evidence towards the existence of a system of real numbers populated by familiar numbers such as 1/4, 1/7, √2 and π.
The first is the moment of abstraction: after several calculations with particular examples one realises that every sequence of decimal digits corresponds to a real number, that (ignoring the technical problem with recurring sequences of 9s) each distinct decimal corresponds to a distinct real number, and conversely that the mental view of a number on a number line shows that each can be measured by a sequence of decimal digits. The second is the moment of axiomatisation, and the theorems of existence and canonicity that show that there is essentially only one system of real numbers. The first is a private moment of insight: after playing with calculations and symbols on a page I suddenly have an inkling of this particular kind of number in all its generality. The second is a way of sharing this insight: now that I can describe my real numbers and prove that my real numbers are the same as yours, I can discuss them with you and examine their structure in much more detail.

It seems that for most people this is ample evidence that the system of real numbers exists in as concrete a way as is required. It is just as robust, or perhaps more robust, than some of the notions of the physical objects around us and their qualities, and we have excellent reasons for believing that these mathematical objects are viewed in the same way by all other mathematicians—arguably much better reasons than we might have for believing that some particular patch of grass is actually seen as the same colour by all individuals, irrespective of the label ‘green’ that they choose to put on that colour.

Against this evidence, some counter-arguments have been put forward. Most centre round difficulties in the idea of an arbitrary sequence of digits (or an arbitrary bounded set of rational numbers, or whatever is relevant on one’s favourite

conception of the reals). When one examines it, this idea of an ‘arbitrary set’ or ‘sequence’ is harder to pin down than one might expect, and it is essential for a full understanding of the reals, especially for canonicity. One alternative that has been proposed is the formalists’, which says that general sequences of digits exist in an ideal sense, and we can accept them in this limited way because belief in them and their properties does not impact in any negative way on our conception of concrete real numbers such as √2 and π—indeed more, this world of ideal sequences might provide new information about our familiar numbers. Another alternative is the intuitionists’, which says that only numbers that can actually be constructed (such as via a computer program that prints out their digits) can be accepted, there being no others, and that all decisions (such as whether one number is bigger than another) have to be made in a similarly constructive manner.

Cases can be made for both these points of view, especially in certain areas of research in which they are relevant, such as the foundations of mathematics, theoretical computer science, the philosophy of scientific method, etc. But most mathematicians reject them for practical reasons. It is as if mathematics has moved on a few steps beyond the formalists and intuitionists, in abstracting a number of important concepts as objects satisfying axioms; perhaps this process of abstraction is genuinely necessary for human or social or scientific reasons: if a concept is to be useful and to be used as one of the building blocks of the next piece of theory, then we have to in some sense be able to mentally picture the objects involved and believe in their existence.

For another example that clearly separates the formalist, the intuitionist and the classical mathematician, consider the natural numbers, the ordinary counting numbers. For an intuitionist, the natural numbers are given. They are the starting point for the rest of mathematics.
For a formalist they are strokes on a page, together with rules that combine sequences of strokes or compare two different such sequences. To a classical mathematician, the intuitionistic approach explains little; in particular there is no place in it for Dedekind’s elegant axiomatic description of the natural numbers and the canonicity of this system, because that description rests on more complex notions such as arbitrary sets of numbers, something the intuitionist rejects. Similarly, the formalist approach is cumbersome, since in it numbers always have to be manipulated through the formal systems. There is no place for the intuition of number that suddenly arises in a child when he or she sees some sheep and exclaims for the first time ‘Three sheep!’, having identified and abstracted the concept of ‘three’, and who therefore no longer has to perform any tedious matching against a collection of apples to determine that there is exactly one apple for each sheep to eat.

Abstract objects in general presumably arise in the same sort of way, even outside mathematics. Perhaps by playing a mental game, maybe with symbols following rules, or using a formal system, or by argumentation using a known style of argument, or whatever, one sees through intuition a pattern. The description of the pattern and the analysis of how it arises are then the first part of abstracting the pattern into an object or system of objects that have personal meaning to us. The next stage is to describe the pattern more fully and show that in some sense it is canonical, enough to communicate it to other people who have a concept of the same sort of pattern; a shared concept then arises for discussion and research, and in this discussion and research it is most convenient to speak and think of these objects as having some sort of prior existence

as abstract objects. It’s important to remember that failure of canonicity does not necessarily indicate the failure of this programme. An object—be it a number system or whatever—may not be canonical, but that may be more of the fault of the purported deﬁnition. Even if the object is canonical, the failure of the Second Canonicity Theorem may be an issue. But on the other hand, this may be something one can live with, or may even be necessary for theories based on the concept.

**Appendix: Infinitesimals**

This section provides a slightly technical background on nonstandard number systems and infinitesimals. Just as with the complex numbers, where the addition of i to the reals necessarily requires us to add other numbers of the form x + iy to preserve as many as possible of the usual properties of arithmetic, adding infinitesimals to the reals requires us to add other numbers too. Thus if h is a positive infinitesimal, −h should be a negative infinitesimal and 1/h a positive infinite number; also π + h is a new number that is infinitesimally close to π, but slightly larger than it, and so on. We get a picture of a system of hyper-real numbers which is like an extended real number line where each real number is ‘fattened’ to a set of numbers all infinitesimally close to that real number. The set of numbers infinitesimally close to a real number x is usually called the monad of x and written μ(x) or st⁻¹(x). As well as containing infinitesimals, our hyper-real number line also contains infinite numbers larger than all the previously existing reals, together with the negatives of these infinite numbers. Thus there are finite hyper-reals (ones bounded in magnitude by ordinary real numbers) and infinite hyper-reals (ones greater in magnitude than all ordinary real numbers). One non-obvious fact that follows from the completeness of the real number system is that every finite hyper-real y lies in the monad μ(x) of some standard real number x, which is uniquely determined by y. This standard real is called the standard part of y and written st(y).

The previous paragraph gives an account of the intuitive structure of the hyper-reals, but rather fails to explain why such number systems exist. In fact, prototype systems with infinitesimals can be constructed by algebraic means similar to the construction of the complex numbers, in a way in which the usual arithmetic laws of addition and multiplication hold, but such systems are not particularly rich or useful.
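One such algebraic prototype (my choice of illustration; the text does not name one) is the ring of dual numbers, in which a symbol h is adjoined to the reals subject to h·h = 0. The usual laws of addition and multiplication hold, yet the system is poor in exactly the way described: h is a zero divisor, the ring cannot be ordered as a field, and nothing tells us how to apply functions such as sin(1/x) to h. A minimal sketch:

```python
class Dual:
    """Numbers a + b*h, where the adjoined infinitesimal h satisfies h*h = 0."""
    def __init__(self, a, b=0.0):
        self.a, self.b = float(a), float(b)   # real part, infinitesimal part

    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*h)(c + d*h) = ac + (ad + bc)*h, the h*h term vanishing
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    def __repr__(self):
        return f"{self.a} + {self.b}h"

h = Dual(0.0, 1.0)       # the adjoined infinitesimal
x = Dual(3.0) + h        # 3 + h
print(x * x)             # (3 + h)^2 = 9 + 6h: the infinitesimal part
                         # records the derivative of x^2 at 3
```

Polynomial arithmetic works, and even exposes derivatives in the coefficient of h, but the construction gives no guidance at all for transcendental functions evaluated at 1/h, which is the difficulty taken up next.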
For analysis and much other mathematics we need plenty of other functions to be defined, and it is not clear how, by algebraic means alone, we might define the value of a function such as sin(1/x) at an infinitesimal h. Because h is infinitesimal and sin(1/x) oscillates between −1 and 1 infinitely often near x = 0, any choice of value for sin(1/h) seems arbitrary. There are many other similarly arbitrary choices to make. This is where tools from logic help, and one method of constructing the hyper-reals is to apply the Completeness Theorem of logic as described above. Essentially what is happening is that one of the axioms of ZFC, the Axiom of Choice or AC,6 usually via its equivalent form, Zorn’s Lemma, is applied to decide all these arbitrary choices simultaneously,

6 In this paper, AC for the Axiom of Choice is not to be confused with AC for Alternating Current. Both acronyms are too ﬁrmly set in place to be changed, unfortunately.


and in a consistent way relative to all the other properties of the real numbers. Thus the first true models of nonstandard analysis contain infinitesimals h, have many or perhaps all real-valued functions defined, and satisfy all first-order statements that were already true in the reals.

An alternative, popular and related method of construction is to use an ultrafilter. Essentially an ultrafilter is a set-theoretic object which encodes a collection of infinitely many choices, and which is shown to exist in ZFC by an application of Zorn’s Lemma. The hyper-reals are now constructed by taking a large Cartesian product R^N of the reals (the set of all sequences of reals) and using the ultrafilter to decide which elements of this product should be regarded as equal and how functions such as sin(1/x) should be defined on it. This type of construction goes back to the Polish mathematician Łoś and his work in the 1950s, though similar ideas were already being used by Skolem (in a slightly less powerful way) to construct nonstandard models of the counting numbers in the 1920s and 1930s.7

A third method of construction uses Gödel’s Incompleteness Theorem. Taking T to be one’s favourite consistent recursively axiomatised theory of arithmetic, first-order Peano Arithmetic (PA) being a common choice, by Gödel’s Second Incompleteness Theorem the theory with the axioms of T together with a single extra statement ¬Con(T) saying that T is inconsistent is itself consistent, and so by the Completeness Theorem of logic has a model, a system of numbers satisfying it. In this model the statement that there is a proof of an inconsistency 0 = 1 from the axioms of T is true, but we knew in advance that there is no such proof in the real world. Therefore that proof of 0 = 1 is a nonstandard object, and it turns out rather quickly that it must have infinite length. So our model has infinite numbers.
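The key step of this third construction, from the consistency of T together with ¬Con(T) to the existence of an infinite number, can be written out schematically as follows. This is a sketch of the standard argument; the notation Prf_T is mine, not the paper's:

```latex
% G\"odel II: a consistent T cannot prove its own consistency, so
% T + \neg\mathrm{Con}(T) is consistent, and by the Completeness Theorem
% this theory has a model M with
\[
  M \models \exists p\, \mathrm{Prf}_T\bigl(p, \ulcorner 0 = 1 \urcorner\bigr),
\]
% where \mathrm{Prf}_T(p, \ulcorner\varphi\urcorner) formalises
% `p codes a T-proof of \varphi'.  Since T really is consistent, no
% standard number codes such a proof, so any witness p satisfies
\[
  M \models p > \underbrace{1 + 1 + \cdots + 1}_{n}
  \quad\text{for every ordinary } n,
\]
% i.e. p is an infinite (nonstandard) element of M.
```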
Now we can replicate the standard construction of the integers, rational numbers and real numbers, starting with this model rather than with the usual set of counting numbers N. The result is a nonstandard model resembling the reals and containing all standard reals.

For the purposes of this paper, constructions using tools from first-order logic and constructions by ultrafilters are perfectly reasonable constructions of nonstandard number systems. Nonstandard analysts tend to dismiss the third method, based on the incompleteness theorems, because it is inconvenient to check that statements they know to be true in the reals (such as the continuity of the sin function, perhaps) are indeed expressible and provable in the theory T. We shall also dismiss this third construction, for a different but related reason: starting with a theory T that we know is consistent, we build a model in which the theory T is not consistent. In some sense the model is wrong about the consistency of T, and therefore we must reject it as not being a ‘true model of infinitesimals’.

Newton and Leibniz’s accounts already contain some ideas of when an infinitesimal number can be ignored and when it must not be. For example, the ratio of two infinitesimals x/y must be calculated carefully, as it could turn out to be zero, infinite or some finite number, but in the sum of a real number and an infinitesimal a + h the infinitesimal can often be neglected. Berkeley reasonably criticised this because the rules for when an infinitesimal may be neglected were not explained. Robinson’s nonstandard analysis provides these rules, and simplifies many of the definitions of analysis that, according to Cauchy and others,

7 References required.


need the concept of limit. For example, the sum of a real number and an infinitesimal a + h is a number in the monad of a. A continuous function is one that maps monads to monads. More precisely, a function f defined on the reals is continuous at a if ∗f(a + h) ∈ µ(f(a)) for all infinitesimals h, where ∗f is the nonstandard version of the function f. (The function f is defined on the real numbers only, so for technical reasons the corresponding function defined on the hyper-reals is a different function ∗f, though it extends f in as natural a way as possible.) In post-Cauchy mathematics this is a theorem: one can prove quite easily the equivalence of Cauchy’s definition of continuity and the one just given. Differentiability and all the other notions of analysis can be given a similar treatment. Newton and Leibniz calculated the derivative of a function f at a by computing (f (a + h) − f (a))/h for an infinitesimal h and then neglecting infinitesimals afterwards. (For a continuous function f we have just seen that f (a + h) − f (a) is infinitesimal, so this is the ratio of two infinitesimals, which must initially be calculated in some other way.) In nonstandard analysis, f is differentiable at a (with derivative b) if the quantity (∗f (a + h) − ∗f (a))/h always lies in the same monad µ(b), irrespective of which nonzero infinitesimal h is taken. This appears to be exactly what Newton and Leibniz intended, and agrees exactly with Cauchy’s definition.
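As a concrete instance of the nonstandard definition of the derivative, take f (x) = x². This standard worked example is mine, not the paper’s:

```latex
\[
  \frac{{}^{*}\!f(a+h) - {}^{*}\!f(a)}{h}
  = \frac{(a+h)^2 - a^2}{h}
  = \frac{2ah + h^2}{h}
  = 2a + h \;\in\; \mu(2a)
\]
% for every nonzero infinitesimal h.  The quotient always lies in the
% monad \mu(2a), so taking standard parts gives
% \mathrm{st}(2a + h) = 2a = f'(a), with no limit required.
```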
