This is a modern-English version of Introduction to Mathematical Philosophy, originally written by Russell, Bertrand. It has been thoroughly updated, including changes to sentence structure, words, spelling, and grammar—to ensure clarity for contemporary readers, while preserving the original spirit and nuance. If you click on a paragraph, you will see the original text that we modified, and you can toggle between the two versions.

Scroll to the bottom of this page and you will find a free ePUB download link for this book.

Library of Philosophy

Philosophy Library

EDITED BY J. H. MUIRHEAD, LL.D.

INTRODUCTION TO MATHEMATICAL
PHILOSOPHY

INTRO TO MATH PHILOSOPHY

By the same Author.

By the Same Author.

PRINCIPLES OF SOCIAL RECONSTRUCTION. 3rd Impression. Demy 8vo. 7s. 6d. net.

PRINCIPLES OF SOCIAL RECONSTRUCTION. 3rd Impression. Demy 8vo. £7.50. net.

"Mr Russell has written a big and living book."—The Nation.

"Mr. Russell has written a significant and vibrant book."—The Nation.

ROADS TO FREEDOM: SOCIALISM, ANARCHISM, AND SYNDICALISM. Demy 8vo. 7s. 6d. net.

ROADS TO FREEDOM: SOCIALISM, ANARCHISM, AND SYNDICALISM. Demy 8vo. £7.50 net.

An attempt to extract the essence of these three doctrines, first historically, then as guidance for the coming reconstruction.

An effort to capture the essence of these three beliefs, first historically, and then as a guide for the upcoming reconstruction.

London: George Allen & Unwin, Ltd. [Pg iii]

INTRODUCTION TO MATHEMATICAL PHILOSOPHY

BERTRAND RUSSELL

LONDON: GEORGE ALLEN & UNWIN, LTD.

NEW YORK: THE MACMILLAN CO. [Pg iv]

First published May 1919

Second Edition April 1920

PREFACE

THIS book is intended essentially as an "Introduction," and does not aim at giving an exhaustive discussion of the problems with which it deals. It seemed desirable to set forth certain results, hitherto only available to those who have mastered logical symbolism, in a form offering the minimum of difficulty to the beginner. The utmost endeavour has been made to avoid dogmatism on such questions as are still open to serious doubt, and this endeavour has to some extent dominated the choice of topics considered. The beginnings of mathematical logic are less definitely known than its later portions, but are of at least equal philosophical interest. Much of what is set forth in the following chapters is not properly to be called "philosophy," though the matters concerned were included in philosophy so long as no satisfactory science of them existed. The nature of infinity and continuity, for example, belonged in former days to philosophy, but belongs now to mathematics. Mathematical philosophy, in the strict sense, cannot, perhaps, be held to include such definite scientific results as have been obtained in this region; the philosophy of mathematics will naturally be expected to deal with questions on the frontier of knowledge, as to which comparative certainty is not yet attained. But speculation on such questions is hardly likely to be fruitful unless the more scientific parts of the principles of mathematics are known. A book dealing with those parts may, therefore, claim to be an introduction to mathematical philosophy, though it can hardly claim, except where it steps outside its province, [Pg vi] to be actually dealing with a part of philosophy. It does deal, however, with a body of knowledge which, to those who accept it, appears to invalidate much traditional philosophy, and even a good deal of what is current in the present day. In this way, as well as by its bearing on still unsolved problems, mathematical logic is relevant to philosophy. For this reason, as well as on account of the intrinsic importance of the subject, some purpose may be served by a succinct account of the main results of mathematical logic in a form requiring neither a knowledge of mathematics nor an aptitude for mathematical symbolism. Here, however, as elsewhere, the method is more important than the results, from the point of view of further research; and the method cannot well be explained within the framework of such a book as the following. It is to be hoped that some readers may be sufficiently interested to advance to a study of the method by which mathematical logic can be made helpful in investigating the traditional problems of philosophy. But that is a topic with which the following pages have not attempted to deal.

THIS book is primarily meant as an "Introduction" and does not aim to provide a comprehensive discussion of the issues it addresses. It seemed worthwhile to present certain findings, previously accessible only to those proficient in logical symbolism, in a way that minimizes difficulty for beginners. Great effort has been made to avoid dogmatism on questions that are still open to serious debate, and this goal has influenced the selection of topics discussed. The origins of mathematical logic are less clearly defined than its later developments, but they are of at least equal philosophical significance. Much of what is explored in the following chapters isn't strictly "philosophy," although these topics were considered philosophical until a satisfactory science of them emerged. For instance, the nature of infinity and continuity used to fall under philosophy but now belongs to mathematics. Mathematical philosophy, in a strict sense, may not include the specific scientific results achieved in this area; the philosophy of mathematics is expected to engage with questions at the edge of knowledge, where certainty has not yet been reached. However, speculating on such questions is unlikely to be productive without an understanding of the more scientific aspects of mathematics. A book addressing those aspects can thus be regarded as an introduction to mathematical philosophy, though it can hardly claim to engage with a legitimate part of philosophy except where it steps outside its boundaries. Nonetheless, it does explore a body of knowledge that, to those who embrace it, seems to undermine much traditional philosophy, as well as a significant portion of contemporary thought. In this regard, along with its relevance to still unresolved issues, mathematical logic is connected to philosophy. Therefore, given the subject's inherent importance, presenting a concise overview of the main findings of mathematical logic in a manner that requires neither familiarity with mathematics nor mathematical symbolism serves a purpose. Here, as elsewhere, the method is more crucial than the results for further research; and the method cannot be effectively explained within the confines of a book like this. It is hoped that some readers will be intrigued enough to pursue a study of the method that makes mathematical logic useful in exploring the traditional problems of philosophy. However, that is a topic that the following pages do not address.

BERTRAND RUSSELL.

Bertrand Russell.

[Pg vii]

EDITOR'S NOTE

THOSE who, relying on the distinction between Mathematical Philosophy and the Philosophy of Mathematics, think that this book is out of place in the present Library, may be referred to what the author himself says on this head in the Preface. It is not necessary to agree with what he there suggests as to the readjustment of the field of philosophy by the transference from it to mathematics of such problems as those of class, continuity, infinity, in order to perceive the bearing of the definitions and discussions that follow on the work of "traditional philosophy." If philosophers cannot consent to relegate the criticism of these categories to any of the special sciences, it is essential, at any rate, that they should know the precise meaning that the science of mathematics, in which these concepts play so large a part, assigns to them. If, on the other hand, there be mathematicians to whom these definitions and discussions seem to be an elaboration and complication of the simple, it may be well to remind them from the side of philosophy that here, as elsewhere, apparent simplicity may conceal a complexity which it is the business of somebody, whether philosopher or mathematician, or, like the author of this volume, both in one, to unravel. [Pg viii]

THOSE who believe that this book doesn't belong in the current Library because they see a distinction between Mathematical Philosophy and the Philosophy of Mathematics may want to check what the author has to say about this in the Preface. It’s not necessary to agree with his suggestions about redefining the field of philosophy by moving certain problems—like those of class, continuity, and infinity—over to mathematics in order to understand how the definitions and discussions that follow relate to the work of "traditional philosophy." If philosophers can’t agree to hand off the critique of these categories to any of the specialized sciences, it's crucial that they at least understand the exact meanings that the science of mathematics, in which these concepts are so significant, gives to them. Conversely, if some mathematicians think these definitions and discussions are just making things more complicated than they need to be, it may be helpful to remind them, from a philosophical perspective, that what seems simple can often hide a complexity that someone—be it a philosopher, a mathematician, or like the author of this volume, someone who is both—needs to unravel. [Pg viii]

CHAP.
__A_TAG_PLACEHOLDER_0__
__A_TAG_PLACEHOLDER_1__
1. __A_TAG_PLACEHOLDER_2__
2. __A_TAG_PLACEHOLDER_3__
3. __A_TAG_PLACEHOLDER_4__
4. __A_TAG_PLACEHOLDER_5__
5. __A_TAG_PLACEHOLDER_6__
6. __A_TAG_PLACEHOLDER_7__
7. __A_TAG_PLACEHOLDER_8__
8. __A_TAG_PLACEHOLDER_9__
9. __A_TAG_PLACEHOLDER_10__
10. __A_TAG_PLACEHOLDER_11__
11. __A_TAG_PLACEHOLDER_12__
12. __A_TAG_PLACEHOLDER_13__
13. __A_TAG_PLACEHOLDER_14__
14. __A_TAG_PLACEHOLDER_15__
15. __A_TAG_PLACEHOLDER_16__
16. __A_TAG_PLACEHOLDER_17__
17. __A_TAG_PLACEHOLDER_18__
18. __A_TAG_PLACEHOLDER_19__
__A_TAG_PLACEHOLDER_20__

[Pg ix]

INTRODUCTION TO MATHEMATICAL PHILOSOPHY

CHAPTER I

THE SERIES OF NATURAL NUMBERS

MATHEMATICS is a study which, when we start from its most familiar portions, may be pursued in either of two opposite directions. The more familiar direction is constructive, towards gradually increasing complexity: from integers to fractions, real numbers, complex numbers; from addition and multiplication to differentiation and integration, and on to higher mathematics. The other direction, which is less familiar, proceeds, by analysing, to greater and greater abstractness and logical simplicity; instead of asking what can be defined and deduced from what is assumed to begin with, we ask instead what more general ideas and principles can be found, in terms of which what was our starting-point can be defined or deduced. It is the fact of pursuing this opposite direction that characterises mathematical philosophy as opposed to ordinary mathematics. But it should be understood that the distinction is one, not in the subject matter, but in the state of mind of the investigator. Early Greek geometers, passing from the empirical rules of Egyptian land-surveying to the general propositions by which those rules were found to be justifiable, and thence to Euclid's axioms and postulates, were engaged in mathematical philosophy, according to the above definition; but when once the axioms and postulates had been reached, their deductive employment, as we find it in Euclid, belonged to mathematics in the [Pg 1] ordinary sense. The distinction between mathematics and mathematical philosophy is one which depends upon the interest inspiring the research, and upon the stage which the research has reached; not upon the propositions with which the research is concerned.

MATHEMATICS is a field that, starting from its most basic aspects, can be explored in two different ways. The more common approach is constructive, leading to increasingly complex ideas: from integers to fractions, real numbers, and complex numbers; from addition and multiplication to differentiation and integration, and then onto advanced mathematics. The other, less familiar approach moves towards greater abstraction and logical simplicity through analysis; instead of asking what can be defined and derived from initial assumptions, we instead seek broader concepts and principles that allow us to define or derive what we began with. This pursuit of the opposite approach defines mathematical philosophy in contrast to regular mathematics. It's important to recognize that the difference lies not in the subject matter, but in the mindset of the researcher. Early Greek geometers, moving from the practical rules of Egyptian land surveying to the general principles validating those rules, and then to Euclid's axioms and postulates, were engaged in mathematical philosophy as defined above; however, once those axioms and postulates were established, their deductive work, as seen in Euclid, belonged to mathematics in the usual sense. The distinction between mathematics and mathematical philosophy depends on the interest motivating the research and the phase the research is in, rather than the propositions being examined.

We may state the same distinction in another way. The most obvious and easy things in mathematics are not those that come logically at the beginning; they are things that, from the point of view of logical deduction, come somewhere in the middle. Just as the easiest bodies to see are those that are neither very near nor very far, neither very small nor very great, so the easiest conceptions to grasp are those that are neither very complex nor very simple (using "simple" in a logical sense). And as we need two sorts of instruments, the telescope and the microscope, for the enlargement of our visual powers, so we need two sorts of instruments for the enlargement of our logical powers, one to take us forward to the higher mathematics, the other to take us backward to the logical foundations of the things that we are inclined to take for granted in mathematics. We shall find that by analysing our ordinary mathematical notions we acquire fresh insight, new powers, and the means of reaching whole new mathematical subjects by adopting fresh lines of advance after our backward journey. It is the purpose of this book to explain mathematical philosophy simply and untechnically, without enlarging upon those portions which are so doubtful or difficult that an elementary treatment is scarcely possible. A full treatment will be found in Principia Mathematica;[1] the treatment in the present volume is intended merely as an introduction.

We can put the same distinction another way. The most obvious and straightforward things in math aren’t necessarily the ones that come first logically; they are actually those that fall somewhere in the middle. Just like the easiest things to see are those that aren’t too close or too far away, and not too small or too large, the easiest concepts to understand are those that aren't overly complex or overly simple (using "simple" in a logical sense). Just as we need two types of tools, the telescope and the microscope, to enhance our vision, we also need two types of tools to expand our logical understanding—one to help us advance to higher mathematics and another to help us return to the logical foundations of concepts we often take for granted in math. We’ll discover that by analyzing our everyday mathematical ideas, we gain fresh insights, new abilities, and the means to explore entirely new mathematical topics by taking new paths after looking back. The goal of this book is to explain mathematical philosophy in a straightforward and non-technical way, without delving into aspects that are so uncertain or challenging that a basic treatment is nearly impossible. A comprehensive treatment can be found in Principia Mathematica;[1] this book is meant to serve only as an introduction.

[1]Cambridge University Press, vol. I., 1910; vol. II., 1911; vol. III., 1913. By Whitehead and Russell.

[1]Cambridge University Press, vol. I, 1910; vol. II, 1911; vol. III, 1913. By Whitehead and Russell.

To the average educated person of the present day, the obvious starting-point of mathematics would be the series of whole numbers, $1,\ 2,\ 3,\ 4,\ \dots\ \text{etc.}$ [Pg 2] Probably only a person with some mathematical knowledge would think of beginning with 0 instead of with 1, but we will presume this degree of knowledge; we will take as our starting-point the series: $0,\ 1,\ 2,\ 3,\ \dots\ n,\ n + 1,\dots$ and it is this series that we shall mean when we speak of the "series of natural numbers."

To the average educated person today, the obvious starting point of mathematics would be the series of whole numbers, $1,\ 2,\ 3,\ 4,\ \dots\ \text{etc.}$ [Pg 2] Probably only someone with a bit of mathematical knowledge would think to start with 0 instead of 1, but we will assume this level of knowledge; we will begin with the series: $0,\ 1,\ 2,\ 3,\ \dots\ n,\ n + 1,\dots$ and it is this series that we will refer to when we talk about the "series of natural numbers."

It is only at a high stage of civilisation that we could take this series as our starting-point. It must have required many ages to discover that a brace of pheasants and a couple of days were both instances of the number 2: the degree of abstraction involved is far from easy. And the discovery that 1 is a number must have been difficult. As for 0, it is a very recent addition; the Greeks and Romans had no such digit. If we had been embarking upon mathematical philosophy in earlier days, we should have had to start with something less abstract than the series of natural numbers, which we should reach as a stage on our backward journey. When the logical foundations of mathematics have grown more familiar, we shall be able to start further back, at what is now a late stage in our analysis. But for the moment the natural numbers seem to represent what is easiest and most familiar in mathematics.

It’s only at a high point of civilization that we could use this series as our starting point. It must have taken many ages to figure out that a pair of pheasants and a couple of days are both examples of the number 2; the level of abstraction involved is not easy to grasp. And realizing that 1 is a number must have been a challenge. As for 0, it’s a very recent addition; the Greeks and Romans had no such digit. If we had started exploring mathematical philosophy in earlier times, we would have had to begin with something less abstract than the series of natural numbers, which we would arrive at as a step on our journey back. Once the logical foundations of mathematics become more familiar, we'll be able to start even further back, at what is now a more advanced step in our analysis. But for now, the natural numbers seem to represent what is simplest and most familiar in mathematics.

But though familiar, they are not understood. Very few people are prepared with a definition of what is meant by "number," or "0," or "1." It is not very difficult to see that, starting from 0, any other of the natural numbers can be reached by repeated additions of 1, but we shall have to define what we mean by "adding 1," and what we mean by "repeated." These questions are by no means easy. It was believed until recently that some, at least, of these first notions of arithmetic must be accepted as too simple and primitive to be defined. Since all terms that are defined are defined by means of other terms, it is clear that human knowledge must always be content to accept some terms as intelligible without definition, in order [Pg 3] to have a starting-point for its definitions. It is not clear that there must be terms which are incapable of definition: it is possible that, however far back we go in defining, we always might go further still. On the other hand, it is also possible that, when analysis has been pushed far enough, we can reach terms that really are simple, and therefore logically incapable of the sort of definition that consists in analysing. This is a question which it is not necessary for us to decide; for our purposes it is sufficient to observe that, since human powers are finite, the definitions known to us must always begin somewhere, with terms undefined for the moment, though perhaps not permanently.

But even though they're familiar, they're not truly understood. Very few people have a clear definition of what "number," "0," or "1" means. It's not hard to see that starting from 0, you can reach any other natural number by adding 1 over and over, but we need to define what we mean by "adding 1" and what "repeated" means. These questions are definitely not easy. Until recently, it was thought that at least some of these basic arithmetic concepts had to be accepted as too simple and primitive to be defined. Since all defined terms depend on other terms, it's clear that human knowledge must accept some terms as understandable without definition to have a starting point for other definitions. It's not clear that there must be terms that are impossible to define: it's possible that no matter how far back we go in defining, we could always go further. On the other hand, it might also be that once analysis is pushed enough, we reach terms that are genuinely simple and therefore logically can't be defined through further analysis. This is a question we don’t need to settle; for our purposes, it’s enough to note that since human abilities are finite, the definitions we know must always start somewhere, with terms currently undefined, though perhaps not forever. [Pg 3]

All traditional pure mathematics, including analytical geometry, may be regarded as consisting wholly of propositions about the natural numbers. That is to say, the terms which occur can be defined by means of the natural numbers, and the propositions can be deduced from the properties of the natural numbers—with the addition, in each case, of the ideas and propositions of pure logic.

All traditional pure mathematics, including analytical geometry, can be seen as entirely made up of statements about natural numbers. In other words, the terms used can be defined using natural numbers, and the statements can be derived from the properties of natural numbers, along with concepts and statements from pure logic in each case.

That all traditional pure mathematics can be derived from the natural numbers is a fairly recent discovery, though it had long been suspected. Pythagoras, who believed that not only mathematics, but everything else could be deduced from numbers, was the discoverer of the most serious obstacle in the way of what is called the "arithmetising" of mathematics. It was Pythagoras who discovered the existence of incommensurables, and, in particular, the incommensurability of the side of a square and the diagonal. If the length of the side is 1 inch, the number of inches in the diagonal is the square root of 2, which appeared not to be a number at all. The problem thus raised was solved only in our own day, and was only solved completely by the help of the reduction of arithmetic to logic, which will be explained in following chapters. For the present, we shall take for granted the arithmetisation of mathematics, though this was a feat of the very greatest importance. [Pg 4]

That all traditional pure mathematics can be derived from natural numbers is a relatively recent discovery, even though it had been suspected for a long time. Pythagoras, who believed that not only mathematics but everything else could be deduced from numbers, identified the biggest challenge to what we now call the "arithmetisation" of mathematics. He discovered the existence of incommensurables, particularly the incommensurability between the side of a square and its diagonal. If the length of the side is 1 inch, the diagonal’s length is the square root of 2, which seemed not to be a number at all. The problem this raised wasn’t completely solved until our own time, and it was only fully resolved with the help of reducing arithmetic to logic, which will be explained in the following chapters. For now, we’ll assume the arithmetisation of mathematics, even though this was an achievement of immense significance. [Pg 4]

Having reduced all traditional pure mathematics to the theory of the natural numbers, the next step in logical analysis was to reduce this theory itself to the smallest set of premisses and undefined terms from which it could be derived. This work was accomplished by Peano. He showed that the entire theory of the natural numbers could be derived from three primitive ideas and five primitive propositions in addition to those of pure logic. These three ideas and five propositions thus became, as it were, hostages for the whole of traditional pure mathematics. If they could be defined and proved in terms of others, so could all pure mathematics. Their logical "weight," if one may use such an expression, is equal to that of the whole series of sciences that have been deduced from the theory of the natural numbers; the truth of this whole series is assured if the truth of the five primitive propositions is guaranteed, provided, of course, that there is nothing erroneous in the purely logical apparatus which is also involved. The work of analysing mathematics is extraordinarily facilitated by this work of Peano's.

Having reduced all traditional pure mathematics to the theory of natural numbers, the next step in logical analysis was to simplify this theory to the smallest set of premises and undefined terms from which it could be derived. This was achieved by Peano. He demonstrated that the entire theory of natural numbers could be derived from three basic ideas and five fundamental propositions, in addition to those of pure logic. These three ideas and five propositions essentially became the foundations for all traditional pure mathematics. If they could be defined and proven in terms of others, then so could all pure mathematics. Their logical "weight," if you will, is equivalent to that of the entire series of sciences that have been deduced from the theory of natural numbers; the truth of this whole series is ensured if the truth of the five fundamental propositions is confirmed, provided, of course, that there are no errors in the purely logical framework involved. The analysis of mathematics is greatly aided by Peano's work.

The three primitive ideas in Peano's arithmetic are: $\text{0, number, successor.}$ By "successor" he means the next number in the natural order. That is to say, the successor of 0 is 1, the successor of 1 is 2, and so on. By "number" he means, in this connection, the class of the natural numbers.[2] He is not assuming that we know all the members of this class, but only that we know what we mean when we say that this or that is a number, just as we know what we mean when we say "Jones is a man," though we do not know all men individually.

The three basic concepts in Peano's arithmetic are: $\text{0, number, successor.}$ By "successor," he refers to the next number in the natural sequence. In other words, the successor of 0 is 1, the successor of 1 is 2, and so on. By "number," he is talking about the group of natural numbers.[2] He isn't assuming that we know every member of this group, but only that we understand what we mean when we say something is a number, just like we know what we mean when we say "Jones is a man," even though we don't personally know every man.

[2]We shall use "number" in this sense in the present chapter. Afterwards the word will be used in a more general sense.

[2]We will use "number" in this specific way in this chapter. After that, the term will be used more broadly.

The five primitive propositions which Peano assumes are:

The five basic propositions that Peano assumes are:

(1) 0 is a number.

0 is a number.

(2) The successor of any number is a number.

(2) The successor of any number is another number.

(3) No two numbers have the same successor. [Pg 5]

(3) No two numbers have the same next number. [Pg 5]

(4) 0 is not the successor of any number.

(4) 0 is not the next number after any number.

(5) Any property which belongs to 0, and also to the successor of every number which has the property, belongs to all numbers.

(5) Any property that belongs to 0, and also to the successor of every number that has the property, belongs to all numbers.

The last of these is the principle of mathematical induction. We shall have much to say concerning mathematical induction in the sequel; for the present, we are concerned with it only as it occurs in Peano's analysis of arithmetic.

The last of these is the principle of mathematical induction. We will discuss mathematical induction a lot later; for now, we are only focusing on it as it appears in Peano's analysis of arithmetic.

Let us consider briefly the kind of way in which the theory of the natural numbers results from these three ideas and five propositions. To begin with, we define 1 as "the successor of 0," 2 as "the successor of 1," and so on. We can obviously go on as long as we like with these definitions, since, in virtue of (2), every number that we reach will have a successor, and, in virtue of (3), this cannot be any of the numbers already defined, because, if it were, two different numbers would have the same successor; and in virtue of (4) none of the numbers we reach in the series of successors can be 0. Thus the series of successors gives us an endless series of continually new numbers. In virtue of (5) all numbers come in this series, which begins with 0 and travels on through successive successors: for (a) 0 belongs to this series, and (b) if a number $n$ belongs to it, so does its successor, whence, by mathematical induction, every number belongs to the series.

Let’s briefly look at how the theory of natural numbers comes from these three concepts and five propositions. First, we define 1 as "the successor of 0," 2 as "the successor of 1," and so on. It’s clear we can keep going with these definitions as long as we want, since, according to (2), every number we reach will have a successor, and according to (3), it can't be any of the numbers we've already defined; if it were, that would mean two different numbers share the same successor. According to (4), none of the numbers in the series of successors can be 0. So, the series of successors provides us with an infinite sequence of always new numbers. Because of (5), all numbers are part of this series, which starts with 0 and continues on through successive successors: for (a) 0 is included in this series, and (b) if a number $n$ is in it, then its successor is also in it, which means, by mathematical induction, every number belongs to this series.

Suppose we wish to define the sum of two numbers. Taking any number $m$ , we define $m + 0$ as $m$ , and $m + (n + 1)$ as the successor of $m + n$ . In virtue of (5) this gives a definition of the sum of $m$ and $n$ , whatever number $n$ may be. Similarly we can define the product of any two numbers. The reader can easily convince himself that any ordinary elementary proposition of arithmetic can be proved by means of our five premisses, and if he has any difficulty he can find the proof in Peano.

Suppose we want to define the sum of two numbers. Taking any number $m$ , we define $m + 0$ as $m$ , and $m + (n + 1)$ as the next number after $m + n$ . Based on (5), this gives us a definition of the sum of $m$ and $n$ , no matter what number $n$ is. Similarly, we can define the product of any two numbers. The reader can easily see that any basic arithmetic statement can be proven using our five premises, and if they have any trouble, they can find the proof in Peano.

It is time now to turn to the considerations which make it necessary to advance beyond the standpoint of Peano, who [Pg 6] represents the last perfection of the "arithmetisation" of mathematics, to that of Frege, who first succeeded in "logicising" mathematics, i.e. in reducing to logic the arithmetical notions which his predecessors had shown to be sufficient for mathematics. We shall not, in this chapter, actually give Frege's definition of number and of particular numbers, but we shall give some of the reasons why Peano's treatment is less final than it appears to be.

It’s now time to look at the reasons why we need to move beyond Peano’s perspective, who represents the ultimate refinement of the "arithmetization" of mathematics, to that of Frege, who was the first to successfully "logicize" mathematics, meaning he reduced the arithmetic concepts that his predecessors had established as adequate for mathematics to logic. In this chapter, we won’t provide Frege's definition of number and specific numbers, but we will outline some reasons why Peano's approach is not as complete as it seems.

In the first place, Peano's three primitive ideas—namely, "0," "number," and "successor"—are capable of an infinite number of different interpretations, all of which will satisfy the five primitive propositions. We will give some examples.

In the first place, Peano's three basic concepts—namely, "0," "number," and "successor"—can be interpreted in countless ways, all of which will meet the five basic propositions. We will provide some examples.

(1) Let "0" be taken to mean 100, and let "number" be taken to mean the numbers from 100 onward in the series of natural numbers. Then all our primitive propositions are satisfied, even the fourth, for, though 100 is the successor of 99, 99 is not a "number" in the sense which we are now giving to the word "number." It is obvious that any number may be substituted for 100 in this example.

(1) Let “0” represent 100, and let “number” refer to all numbers starting from 100 in the series of natural numbers. Then all our basic statements hold true, including the fourth one, because, while 100 is the successor of 99, 99 isn’t a “number” in the way we’re currently defining “number.” It’s clear that any number can replace 100 in this example.

(2) Let "0" have its usual meaning, but let "number" mean what we usually call "even numbers," and let the "successor" of a number be what results from adding two to it. Then "1" will stand for the number two, "2" will stand for the number four, and so on; the series of "numbers" now will be $\text{0, two, four, six, eight} \dots.$ All Peano's five premisses are satisfied still.

(2) Let "0" have its usual meaning, but let "number" mean what we usually call "even numbers," and let the "successor" of a number be what you get by adding two to it. Then "1" will represent the number two, "2" will represent the number four, and so on; the series of "numbers" now will be $\text{0, two, four, six, eight} \dots.$ All five of Peano's premises are still satisfied.

(3) Let "0" mean the number one, let "number" mean the set $1,\ \dfrac{1}{2},\ \dfrac{1}{4},\ \dfrac{1}{8},\ \dfrac{1}{16},\ \dots$ and let "successor" mean "half." Then all Peano's five axioms will be true of this set.

(3) Let "0" represent the number one, let "number" refer to the set $1,\ \dfrac{1}{2},\ \dfrac{1}{4},\ \dfrac{1}{8},\ \dfrac{1}{16},\ \dots$ and let "successor" mean "half." Then all five of Peano's axioms will hold true for this set.

It is clear that such examples might be multiplied indefinitely. In fact, given any series $x_{0},\ x_{1},\ x_{2},\ x_{3},\ \dots\ x_{n},\ \dots$ [Pg 7] which is endless, contains no repetitions, has a beginning, and has no terms that cannot be reached from the beginning in a finite number of steps, we have a set of terms verifying Peano's axioms. This is easily seen, though the formal proof is somewhat long. Let "0" mean $x_{0}$ , let "number" mean the whole set of terms, and let the "successor" of $x_{n}$ mean $x_{n+1}$ . Then

It’s clear that we can keep coming up with more examples. In fact, for any series $x_{0},\ x_{1},\ x_{2},\ x_{3},\ \dots\ x_{n},\ \dots$ [Pg 7] that is infinite, has no duplicates, has a starting point, and has no terms that can’t be reached from the start in a finite number of steps, we have a set of terms that confirm Peano's axioms. This is easy to see, although the formal proof takes a bit longer. Let "0" represent $x_{0}$ , let "number" refer to the entire set of terms, and let the "successor" of $x_{n}$ be $x_{n+1}$ . Then

(1) "0 is a number," i.e. $x_{0}$ is a member of the set.

(1) "0 is a number," that is $x_{0}$ is in the set.

(2) "The successor of any number is a number," i.e. taking any term $x_{n}$ in the set, $x_{n+1}$ is also in the set.

(3) "No two numbers have the same successor," i.e. if $x_{m}$ and $x_{n}$ are two different members of the set, $x_{m+1}$ and $x_{n+1}$ are different; this results from the fact that (by hypothesis) there are no repetitions in the set.

(3) "No two numbers have the same successor," i.e. if $x_{m}$ and $x_{n}$ are two different members of the set, $x_{m+1}$ and $x_{n+1}$ are different; this is because (by assumption) there are no duplicates in the set.

(4) "0 is not the successor of any number," i.e. no term in the set comes before $x_{0}$ .

(5) This becomes: Any property which belongs to $x_{0}$ , and belongs to $x_{n+1}$ provided it belongs to $x_{n}$ , belongs to all the $x$ 's.

(5) This means: Any property that belongs to $x_{0}$ , and belongs to $x_{n+1}$ as long as it belongs to $x_{n}$ , belongs to all the $x$ 's.

This follows from the corresponding property for numbers.

This comes from the same property that applies to numbers.

A series of the form $x_{0},\ x_{1},\ x_{2},\ \dots\ x_{n},\ \dots$ in which there is a first term, a successor to each term (so that there is no last term), no repetitions, and every term can be reached from the start in a finite number of steps, is called a progression. Progressions are of great importance in the principles of mathematics. As we have just seen, every progression verifies Peano's five axioms. It can be proved, conversely, that every series which verifies Peano's five axioms is a progression. Hence these five axioms may be used to define the class of progressions: "progressions" are "those series which verify these five axioms." Any progression may be taken as the basis of pure mathematics: we may give the name "0" to its first term, the name "number" to the whole set of its terms, and the name "successor" to the next in the progression. The progression need not be composed of numbers: it may be [Pg 8] composed of points in space, or moments of time, or any other terms of which there is an infinite supply. Each different progression will give rise to a different interpretation of all the propositions of traditional pure mathematics; all these possible interpretations will be equally true.

A series that looks like this: $x_{0},\ x_{1},\ x_{2},\ \dots\ x_{n},\ \dots$ has a first term, a successor for each term (so there's no final term), no duplicates, and every term can be reached from the start in a finite number of steps is called a progression. Progressions are very important in the fundamentals of mathematics. As we've just seen, every progression satisfies Peano's five axioms. It can also be shown that any series that meets Peano's five axioms is a progression. Therefore, these five axioms can be used to define the class of progressions: "progressions" are "those series that satisfy these five axioms." Any progression can serve as the foundation of pure mathematics: we can label its first term as "0," refer to the full set of its terms as "number," and call the next term in the progression "successor." The progression doesn't have to consist solely of numbers; it can be made up of points in space, moments in time, or any other terms that have an infinite number of possibilities. Each different progression will lead to a unique interpretation of all the propositions found in traditional pure mathematics, and all these interpretations will be equally valid.

In Peano's system there is nothing to enable us to distinguish between these different interpretations of his primitive ideas. It is assumed that we know what is meant by "0," and that we shall not suppose that this symbol means 100 or Cleopatra's Needle or any of the other things that it might mean.

In Peano's system, there's nothing that helps us tell apart these different interpretations of his basic concepts. It assumes that we understand what "0" means, and we won't think that this symbol represents 100, Cleopatra's Needle, or any of the other things it could potentially signify.

This point, that "0" and "number" and "successor" cannot be defined by means of Peano's five axioms, but must be independently understood, is important. We want our numbers not merely to verify mathematical formulæ, but to apply in the right way to common objects. We want to have ten fingers and two eyes and one nose. A system in which "1" meant 100, and "2" meant 101, and so on, might be all right for pure mathematics, but would not suit daily life. We want "0" and "number" and "successor" to have meanings which will give us the right allowance of fingers and eyes and noses. We have already some knowledge (though not sufficiently articulate or analytic) of what we mean by "1" and "2" and so on, and our use of numbers in arithmetic must conform to this knowledge. We cannot secure that this shall be the case by Peano's method; all that we can do, if we adopt his method, is to say "we know what we mean by '0' and 'number' and 'successor,' though we cannot explain what we mean in terms of other simpler concepts." It is quite legitimate to say this when we must, and at some point we all must; but it is the object of mathematical philosophy to put off saying it as long as possible. By the logical theory of arithmetic we are able to put it off for a very long time.

This point, that "0" and "number" and "successor" can't be defined by Peano's five axioms but must be understood on their own, is important. We want our numbers not just to validate mathematical formulas, but to apply correctly to everyday things. We have ten fingers, two eyes, and one nose. A system where "1" meant 100, "2" meant 101, and so on, might work for pure math, but wouldn't work in real life. We want "0," "number," and "successor" to have meanings that match our actual count of fingers, eyes, and noses. We already have some understanding (even if it’s not clearly defined or analytical) of what we mean by "1" and "2" and so on, and our use of numbers in arithmetic has to align with this understanding. We can’t guarantee this will happen using Peano’s method; all we can say if we use his method is "we know what we mean by '0,' 'number,' and 'successor,' even if we can't explain those meanings using simpler concepts." It's completely acceptable to say this when necessary, and at some point, we all have to; however, the goal of mathematical philosophy is to postpone that as long as possible. The logical theory of arithmetic allows us to delay it for a very long time.

It might be suggested that, instead of setting up "0" and "number" and "successor" as terms of which we know the meaning although we cannot define them, we might let them [Pg 9] stand for any three terms that verify Peano's five axioms. They will then no longer be terms which have a meaning that is definite though undefined: they will be "variables," terms concerning which we make certain hypotheses, namely, those stated in the five axioms, but which are otherwise undetermined. If we adopt this plan, our theorems will not be proved concerning an ascertained set of terms called "the natural numbers," but concerning all sets of terms having certain properties. Such a procedure is not fallacious; indeed for certain purposes it represents a valuable generalisation. But from two points of view it fails to give an adequate basis for arithmetic. In the first place, it does not enable us to know whether there are any sets of terms verifying Peano's axioms; it does not even give the faintest suggestion of any way of discovering whether there are such sets. In the second place, as already observed, we want our numbers to be such as can be used for counting common objects, and this requires that our numbers should have a definite meaning, not merely that they should have certain formal properties. This definite meaning is defined by the logical theory of arithmetic. [Pg 10]

It could be proposed that, instead of defining "0," "number," and "successor" as terms we understand but can’t precisely define, we could treat them as [Pg 9] representing any three terms that satisfy Peano's five axioms. They would then shift from being terms with a clear meaning, even if undefined, to being "variables," where we make specific assumptions based on the five axioms, but which remain otherwise unspecified. If we go with this approach, our theorems would not be established based on a known set of terms called "the natural numbers," but rather on all sets of terms that possess certain characteristics. This method is not flawed; in fact, it serves as a valuable generalization for various purposes. However, from two perspectives, it doesn’t provide a solid foundation for arithmetic. Firstly, it doesn’t allow us to determine whether any sets of terms meet Peano's axioms; it fails to even hint at how to find such sets. Secondly, as noted earlier, we want our numbers to be usable for counting real objects, which means they need to have a definite meaning, not just meet certain formal properties. This definite meaning is established by the logical theory of arithmetic. [Pg 10]

CHAPTER II

DEFINITION OF NUMBER

THE question "What is a number?" is one which has been often asked, but has only been correctly answered in our own time. The answer was given by Frege in 1884, in his Grundlagen der Arithmetik.[3] Although this book is quite short, not difficult, and of the very highest importance, it attracted almost no attention, and the definition of number which it contains remained practically unknown until it was rediscovered by the present author in 1901.

THE question "What is a number?" is one that has been asked frequently, but it has only been accurately answered in our time. The answer was provided by Frege in 1884, in his Grundlagen der Arithmetik.[3] Although this book is quite short, easy to read, and extremely important, it drew almost no attention, and the definition of number contained within it remained largely unknown until it was rediscovered by the current author in 1901.

[3]The same answer is given more fully and with more development in his Grundgesetze der Arithmetik, vol. I., 1893.

[3]The same answer is explained in more detail and with more depth in his Fundamentals of Arithmetic, vol. I, 1893.

In seeking a definition of number, the first thing to be clear about is what we may call the grammar of our inquiry. Many philosophers, when attempting to define number, are really setting to work to define plurality, which is quite a different thing. Number is what is characteristic of numbers, as man is what is characteristic of men. A plurality is not an instance of number, but of some particular number. A trio of men, for example, is an instance of the number 3, and the number 3 is an instance of number; but the trio is not an instance of number. This point may seem elementary and scarcely worth mentioning; yet it has proved too subtle for the philosophers, with few exceptions.

In trying to define number, the first thing to clarify is the framework of our inquiry. Many philosophers, when they try to define number, are really aiming to define plurality, which is something entirely different. Number is what defines numbers, just as man defines men. A plurality isn't an example of number but of a specific number. For instance, a group of three men is an example of the number 3, and the number 3 is an example of number; however, the group itself is not an example of number. This idea might seem simple and not worth mentioning, but it has proven too complex for philosophers, with only a few exceptions.

A particular number is not identical with any collection of terms having that number: the number 3 is not identical with [Pg 11] the trio consisting of Brown, Jones, and Robinson. The number 3 is something which all trios have in common, and which distinguishes them from other collections. A number is something that characterises certain collections, namely, those that have that number.

A specific number is not the same as any group of items that has that number: the number 3 is not the same as the group made up of Brown, Jones, and Robinson. The number 3 is something that all groups of three share, and it sets them apart from other groups. A number is something that defines certain groups, specifically those that contain that number.

Instead of speaking of a "collection," we shall as a rule speak of a "class," or sometimes a "set." Other words used in mathematics for the same thing are "aggregate" and "manifold." We shall have much to say later on about classes. For the present, we will say as little as possible. But there are some remarks that must be made immediately.

Instead of talking about a "collection," we'll usually refer to it as a "class," or sometimes a "set." Other terms used in mathematics for the same idea are "aggregate" and "manifold." We'll discuss classes in detail later. For now, we'll keep it brief. However, there are a few points that need to be addressed right away.

A class or collection may be defined in two ways that at first sight seem quite distinct. We may enumerate its members, as when we say, "The collection I mean is Brown, Jones, and Robinson." Or we may mention a defining property, as when we speak of "mankind" or "the inhabitants of London." The definition which enumerates is called a definition by "extension," and the one which mentions a defining property is called a definition by "intension." Of these two kinds of definition, the one by intension is logically more fundamental. This is shown by two considerations: (1) that the extensional definition can always be reduced to an intensional one; (2) that the intensional one often cannot even theoretically be reduced to the extensional one. Each of these points needs a word of explanation.

A class or collection can be defined in two ways that might seem quite different at first. We can list its members, like when we say, "The collection I’m referring to is Brown, Jones, and Robinson." Or we can mention a defining characteristic, as in "mankind" or "the inhabitants of London." The definition that lists members is called a definition by "extension," while the one that mentions a defining characteristic is called a definition by "intension." Between these two types of definitions, the one by intension is logically more fundamental. This is demonstrated by two points: (1) that the extensional definition can always be simplified to an intensional one; (2) that the intensional definition often cannot even theoretically be simplified to the extensional one. Each of these points requires some explanation.

(1) Brown, Jones, and Robinson all of them possess a certain property which is possessed by nothing else in the whole universe, namely, the property of being either Brown or Jones or Robinson. This property can be used to give a definition by intension of the class consisting of Brown and Jones and Robinson. Consider such a formula as " $x$ is Brown or $x$ is Jones or $x$ is Robinson." This formula will be true for just three $x$ 's, namely, Brown and Jones and Robinson. In this respect it resembles a cubic equation with its three roots. It may be taken as assigning a property common to the members of the class consisting of these three [Pg 12] men, and peculiar to them. A similar treatment can obviously be applied to any other class given in extension.

(1) Brown, Jones, and Robinson all have a unique quality that nothing else in the entire universe shares, specifically, the quality of being either Brown, Jones, or Robinson. This quality can be used to define the class made up of Brown, Jones, and Robinson. Consider the formula " $x$ is Brown or $x$ is Jones or $x$ is Robinson." This formula will only be true for the three individuals, Brown, Jones, and Robinson. In this regard, it is similar to a cubic equation with its three solutions. It can be seen as assigning a property that is common to the members of this class and specific only to them. A similar approach can clearly be applied to any other class defined in extension. [Pg 12]

(2) It is obvious that in practice we can often know a great deal about a class without being able to enumerate its members. No one man could actually enumerate all men, or even all the inhabitants of London, yet a great deal is known about each of these classes. This is enough to show that definition by extension is not necessary to knowledge about a class. But when we come to consider infinite classes, we find that enumeration is not even theoretically possible for beings who only live for a finite time. We cannot enumerate all the natural numbers: they are 0, 1, 2, 3, and so on. At some point we must content ourselves with "and so on." We cannot enumerate all fractions or all irrational numbers, or all of any other infinite collection. Thus our knowledge in regard to all such collections can only be derived from a definition by intension.

(2) It's clear that in practice we can often know a lot about a group without being able to list all its members. No one person could actually list every person, or even all the residents of London, yet we know a lot about each of these groups. This is enough to show that defining by listing is not necessary for knowledge about a group. But when we consider infinite groups, we find that listing is not even theoretically possible for beings who only live for a finite time. We cannot list all the natural numbers: they are 0, 1, 2, 3, and so on. At some point, we have to settle for "and so on." We cannot list all fractions or all irrational numbers, or all members of any other infinite collection. Therefore, our understanding of all such collections can only come from a definition based on their properties.

These remarks are relevant, when we are seeking the definition of number, in three different ways. In the first place, numbers themselves form an infinite collection, and cannot therefore be defined by enumeration. In the second place, the collections having a given number of terms themselves presumably form an infinite collection: it is to be presumed, for example, that there are an infinite collection of trios in the world, for if this were not the case the total number of things in the world would be finite, which, though possible, seems unlikely. In the third place, we wish to define "number" in such a way that infinite numbers may be possible; thus we must be able to speak of the number of terms in an infinite collection, and such a collection must be defined by intension, i.e. by a property common to all its members and peculiar to them.

These comments are important when we’re trying to define the concept of number in three different ways. First, numbers themselves form an infinite collection, so they can't be defined just by listing them. Second, collections that have a specific number of elements also likely form an infinite collection: for example, we can assume there are infinite groups of three in the world because if that weren’t true, the total number of things in the world would be finite, which, while possible, seems unlikely. Third, we want to define "number" in a way that allows for infinite numbers to exist; therefore, we need to be able to discuss the number of elements in an infinite collection, and such a collection must be defined by its essence, i.e. by a characteristic that is common to all its members and unique to them.

For many purposes, a class and a defining characteristic of it are practically interchangeable. The vital difference between the two consists in the fact that there is only one class having a given set of members, whereas there are always many different characteristics by which a given class may be defined. Men [Pg 13] may be defined as featherless bipeds, or as rational animals, or (more correctly) by the traits by which Swift delineates the Yahoos. It is this fact that a defining characteristic is never unique which makes classes useful; otherwise we could be content with the properties common and peculiar to their members.[4] Any one of these properties can be used in place of the class whenever uniqueness is not important.

For many purposes, a class and a defining characteristic of it are practically interchangeable. The main difference between the two is that there is only one class with a specific set of members, while there are always numerous different characteristics that can define a given class. Men can be described as featherless bipeds, rational animals, or (more accurately) by the traits Swift outlines for the Yahoos. It’s this fact that a defining characteristic is never unique that makes classes useful; otherwise, we could just focus on the common and unique traits of their members.[Pg 13] Any one of these traits can be used in place of the class whenever uniqueness is not important.[4]

[4]As will be explained later, classes may be regarded as logical fictions, manufactured out of defining characteristics. But for the present it will simplify our exposition to treat classes as if they were real.

[4]As will be explained later, classes can be seen as logical constructs made up of defining features. However, for now, it will make our explanation easier to treat classes as if they were real.

Returning now to the definition of number, it is clear that number is a way of bringing together certain collections, namely, those that have a given number of terms. We can suppose all couples in one bundle, all trios in another, and so on. In this way we obtain various bundles of collections, each bundle consisting of all the collections that have a certain number of terms. Each bundle is a class whose members are collections, i.e. classes; thus each is a class of classes. The bundle consisting of all couples, for example, is a class of classes: each couple is a class with two members, and the whole bundle of couples is a class with an infinite number of members, each of which is a class of two members.

Returning now to the definition of number, it's clear that a number is a way to group certain collections, specifically, those that have a specific number of terms. We can group all pairs in one bundle, all sets of three in another, and so on. This way, we create various bundles of collections, each consisting of all the collections that have a certain number of terms. Each bundle is a class made up of collections, i.e. classes; so each is a class of classes. The bundle containing all pairs, for example, is a class of classes: each pair is a class with two members, and the entire bundle of pairs is a class with an infinite number of members, each of which is a class of two members.

How shall we decide whether two collections are to belong to the same bundle? The answer that suggests itself is: "Find out how many members each has, and put them in the same bundle if they have the same number of members." But this presupposes that we have defined numbers, and that we know how to discover how many terms a collection has. We are so used to the operation of counting that such a presupposition might easily pass unnoticed. In fact, however, counting, though familiar, is logically a very complex operation; moreover it is only available, as a means of discovering how many terms a collection has, when the collection is finite. Our definition of number must not assume in advance that all numbers are finite; and we cannot in any case, without a vicious circle, [Pg 14] use counting to define numbers, because numbers are used in counting. We need, therefore, some other method of deciding when two collections have the same number of terms.

How do we determine if two collections belong in the same group? The obvious answer is: "Check how many items each has, and group them together if they have the same number." But this assumes that we have defined numbers and know how to count the items in a collection. We’re so accustomed to counting that this assumption may go unnoticed. However, counting, while familiar, is actually a very complex process; it can only be used to find out how many items a collection has when the collection is finite. Our definition of number shouldn’t assume that all numbers are finite; furthermore, we can’t use counting to define numbers without falling into a circular argument, because we use numbers in counting. Therefore, we need a different method to determine when two collections have the same number of items.

In actual fact, it is simpler logically to find out whether two collections have the same number of terms than it is to define what that number is. An illustration will make this clear. If there were no polygamy or polyandry anywhere in the world, it is clear that the number of husbands living at any moment would be exactly the same as the number of wives. We do not need a census to assure us of this, nor do we need to know what is the actual number of husbands and of wives. We know the number must be the same in both collections, because each husband has one wife and each wife has one husband. The relation of husband and wife is what is called "one-one."

In reality, it's easier logically to determine if two groups have the same number of elements than to define what that number actually is. A simple example will clarify this. If there were no polygamy or polyandry in the world, it would be obvious that the number of husbands at any given time would be exactly equal to the number of wives. We don’t need a census to confirm this, nor do we need to know the exact number of husbands and wives. It’s clear that the number must be the same in both groups because each husband has one wife and each wife has one husband. The relationship between husband and wife is what’s known as "one-to-one."

A relation is said to be "one-one" when, if $x$ has the relation in question to $y$ , no other term $x$ ' has the same relation to $y$ , and $x$ does not have the same relation to any term $y$ ' other than $y$ . When only the first of these two conditions is fulfilled, the relation is called "one-many"; when only the second is fulfilled, it is called "many-one." It should be observed that the number 1 is not used in these definitions.

A relation is called "one-to-one" when, if $x$ has the relation in question to $y$ , no other term $x$ has the same relation to $y$ , and $x$ does not have the same relation to any term $y$ ' other than $y$ . When only the first of these two conditions is met, the relation is called "one-to-many"; when only the second is met, it is called "many-to-one." It should be noted that the number 1 is not used in these definitions.

In Christian countries, the relation of husband to wife is one-one; in Mahometan countries it is one-many; in Tibet it is many-one. The relation of father to son is one-many; that of son to father is many-one, but that of eldest son to father is one-one. If $n$ is any number, the relation of $n$ to $n + 1$ is one-one; so is the relation of $n$ to $2n$ or to $3n$ . When we are considering only positive numbers, the relation of $n$ to $n^{2}$ is one-one; but when negative numbers are admitted, it becomes two-one, since $n$ and $-n$ have the same square. These instances should suffice to make clear the notions of one-one, one-many, and many-one relations, which play a great part in the principles of mathematics, not only in relation to the definition of numbers, but in many other connections.

In Christian countries, the relationship between husband and wife is one-to-one; in Muslim countries, it's one-to-many; in Tibet, it's many-to-one. The relationship of father to son is one-to-many, while the relationship of son to father is many-to-one, but the relationship of the eldest son to his father is one-to-one. If $n$ represents any number, the relationship of $n$ to $n + 1$ is one-to-one; the same goes for the relationship of $n$ to $2n$ or $3n$ . When we only consider positive numbers, the relationship of $n$ to $n^{2}$ is one-to-one; but when negative numbers are included, it changes to two-to-one because $n$ and $-n$ have the same square. These examples should be enough to clarify the concepts of one-to-one, one-to-many, and many-to-one relationships, which are important in the principles of mathematics, not only in defining numbers but in many other contexts as well.

Two classes are said to be "similar" when there is a one-one [Pg 15] relation which correlates the terms of the one class each with one term of the other class, in the same manner in which the relation of marriage correlates husbands with wives. A few preliminary definitions will help us to state this definition more precisely. The class of those terms that have a given relation to something or other is called the domain of that relation: thus fathers are the domain of the relation of father to child, husbands are the domain of the relation of husband to wife, wives are the domain of the relation of wife to husband, and husbands and wives together are the domain of the relation of marriage. The relation of wife to husband is called the converse of the relation of husband to wife. Similarly less is the converse of greater, later is the converse of earlier, and so on. Generally, the converse of a given relation is that relation which holds between $y$ and $x$ whenever the given relation holds between $x$ and $y$ . The converse domain of a relation is the domain of its converse: thus the class of wives is the converse domain of the relation of husband to wife. We may now state our definition of similarity as follows:—

Two classes are considered "similar" when there is a one-to-one relationship that pairs each term from one class with a term from the other class, similar to how marriage connects husbands with wives. A few preliminary definitions will help us clarify this definition. The group of terms that have a particular relationship with something is called the domain of that relationship: so, fathers form the domain of the father-child relationship, husbands form the domain of the husband-wife relationship, wives form the domain of the wife-husband relationship, and husbands and wives together make up the domain of the marriage relationship. The relationship of wife to husband is known as the converse of the husband-wife relationship. Likewise, less is the converse of greater, later is the converse of earlier, and so forth. Generally, the converse of a given relationship is the one that exists between $y$ and $x$ whenever the original relationship holds between $x$ and $y$ . The converse domain of a relation is the domain of its converse: thus, the class of wives is the converse domain of the husband-wife relationship. We can now define similarity as follows:—

One class is said to be "similar" to another when there is a one-one relation of which the one class is the domain, while the other is the converse domain.

One class is considered "similar" to another when there's a one-to-one relation where one class is the domain and the other is the converse domain.

It is easy to prove (1) that every class is similar to itself, (2) that if a class $\alpha$ is similar to a class $\beta$ , then $\beta$ is similar to $\alpha$ , (3) that if $\alpha$ is similar to $\beta$ and $\beta$ to $\gamma$ , then $\alpha$ is similar to $\gamma$ . A relation is said to be reflexive when it possesses the first of these properties, symmetrical when it possesses the second, and transitive when it possesses the third. It is obvious that a relation which is symmetrical and transitive must be reflexive throughout its domain. Relations which possess these properties are an important kind, and it is worth while to note that similarity is one of this kind of relations.

It's easy to show (1) that every class is similar to itself, (2) that if a class $\alpha$ is similar to a class $\beta$ , then $\beta$ is similar to $\alpha$ , (3) that if $\alpha$ is similar to $\beta$ and $\beta$ is similar to $\gamma$ , then $\alpha$ is similar to $\gamma$ . A relation is called reflexive when it has the first property, symmetrical when it has the second, and transitive when it has the third. It's clear that a relation which is symmetrical and transitive must also be reflexive across its entire domain. Relations that have these properties are significant, and it's important to note that similarity is one of these types of relations.

It is obvious to common sense that two finite classes have the same number of terms if they are similar, but not otherwise. The act of counting consists in establishing a one-one correlation [Pg 16] between the set of objects counted and the natural numbers (excluding 0) that are used up in the process. Accordingly common sense concludes that there are as many objects in the set to be counted as there are numbers up to the last number used in the counting. And we also know that, so long as we confine ourselves to finite numbers, there are just $n$ numbers from 1 up to $n$ . Hence it follows that the last number used in counting a collection is the number of terms in the collection, provided the collection is finite. But this result, besides being only applicable to finite collections, depends upon and assumes the fact that two classes which are similar have the same number of terms; for what we do when we count (say) 10 objects is to show that the set of these objects is similar to the set of numbers 1 to 10. The notion of similarity is logically presupposed in the operation of counting, and is logically simpler though less familiar. In counting, it is necessary to take the objects counted in a certain order, as first, second, third, etc., but order is not of the essence of number: it is an irrelevant addition, an unnecessary complication from the logical point of view. The notion of similarity does not demand an order: for example, we saw that the number of husbands is the same as the number of wives, without having to establish an order of precedence among them. The notion of similarity also does not require that the classes which are similar should be finite. Take, for example, the natural numbers (excluding 0) on the one hand, and the fractions which have 1 for their numerator on the other hand: it is obvious that we can correlate 2 with $\dfrac{1}{2}$ , 3 with $\dfrac{1}{3}$ , and so on, thus proving that the two classes are similar.

It's clear to everyone that two finite sets have the same number of items if they're similar, but not otherwise. Counting involves creating a one-to-one match between the items being counted and the natural numbers (excluding 0) that are used in the process. Therefore, common sense leads us to conclude that there are as many items in the set as there are numbers up to the last number counted. We also know that, as long as we stick to finite numbers, there are exactly $n$ numbers from 1 to $n$ . Thus, the last number counted in a collection represents how many items are in that collection, provided the collection is finite. However, this conclusion applies only to finite collections and relies on the understanding that two similar sets have the same number of items; for when we count (let's say) 10 objects, we're showing that the set of these objects is similar to the set of numbers from 1 to 10. The idea of similarity is logically assumed in the counting process and is logically simpler, though less familiar. When counting, it’s essential to arrange the items in a specific order, like first, second, third, etc., but order isn’t essential to the concept of number: it's an unnecessary addition and complicates things from a logical standpoint. The concept of similarity doesn't require an order; for instance, we can see that the number of husbands is the same as the number of wives without needing to set any precedence among them. Similarity also doesn’t require that the similar sets be finite. For example, consider the natural numbers (excluding 0) on one side and the fractions with 1 as the numerator on the other side: it’s clear that we can pair 2 with $\dfrac{1}{2}$ , 3 with $\dfrac{1}{3}$ , and so forth, thus demonstrating that the two sets are similar.

We may thus use the notion of "similarity" to decide when two collections are to belong to the same bundle, in the sense in which we were asking this question earlier in this chapter. We want to make one bundle containing the class that has no members: this will be for the number 0. Then we want a bundle of all the classes that have one member: this will be for the number 1. Then, for the number 2, we want a bundle consisting [Pg 17] of all couples; then one of all trios; and so on. Given any collection, we can define the bundle it is to belong to as being the class of all those collections that are "similar" to it. It is very easy to see that if (for example) a collection has three members, the class of all those collections that are similar to it will be the class of trios. And whatever number of terms a collection may have, those collections that are "similar" to it will have the same number of terms. We may take this as a definition of "having the same number of terms." It is obvious that it gives results conformable to usage so long as we confine ourselves to finite collections.

We can use the idea of "similarity" to determine when two collections belong to the same group, like we discussed earlier in this chapter. We want to create one group for the empty collection, which will represent the number 0. Then, we want a group for all collections with one member, representing the number 1. For the number 2, we want a group that includes all pairs; then one for all threes; and so on. For any collection, we can define the group it belongs to as the class of all those collections that are "similar" to it. It’s easy to see that if, for example, a collection has three members, the class of all collections similar to it will be the class of threes. No matter how many items a collection has, those collections that are "similar" to it will have the same number of items. We can take this as a definition of "having the same number of items." It’s clear that it produces results consistent with common usage, as long as we stick to finite collections. [Pg 17]

So far we have not suggested anything in the slightest degree paradoxical. But when we come to the actual definition of numbers we cannot avoid what must at first sight seem a paradox, though this impression will soon wear off. We naturally think that the class of couples (for example) is something different from the number 2. But there is no doubt about the class of couples: it is indubitable and not difficult to define, whereas the number 2, in any other sense, is a metaphysical entity about which we can never feel sure that it exists or that we have tracked it down. It is therefore more prudent to content ourselves with the class of couples, which we are sure of, than to hunt for a problematical number 2 which must always remain elusive. Accordingly we set up the following definition:—

So far, we haven’t suggested anything that seems even slightly contradictory. However, when we get to defining numbers, we can't avoid something that might initially come off as a contradiction, even though that feeling will quickly fade. We typically think of the class of pairs (for example) as being different from the number 2. But there’s no doubt about the class of pairs: it’s clear and easy to define, while the number 2, in any other context, is a philosophical idea that we can never be sure really exists or that we’ve identified correctly. It’s therefore wiser to rely on the class of pairs, which we know exists, rather than search for a questionable number 2 that will always remain elusive. So, we propose the following definition:—

The number of a class is the class of all those classes that are similar to it.

The number of a class is the set of all classes that are similar to it.

Thus the number of a couple will be the class of all couples. In fact, the class of all couples will be the number 2, according to our definition. At the expense of a little oddity, this definition secures definiteness and indubitableness; and it is not difficult to prove that numbers so defined have all the properties that we expect numbers to have.

Thus, the number of a couple will be the set of all couples. In fact, the set of all couples will be the number 2, according to our definition. Although this definition is a bit unusual, it ensures clarity and certainty; and it’s not hard to prove that numbers defined this way have all the properties we expect numbers to have.

We may now go on to define numbers in general as any one of the bundles into which similarity collects classes. A number will be a set of classes such as that any two are similar to each [Pg 18] other, and none outside the set are similar to any inside the set. In other words, a number (in general) is any collection which is the number of one of its members; or, more simply still:

We can now define numbers as any of the groups formed by similarity that gather classes together. A number is a collection of classes where any two members are similar to each other, and no member outside the collection is similar to any member inside it. In simpler terms, a number is any collection that represents the count of one of its members; or, even more simply: [Pg 18]

A number is anything which is the number of some class.

A number is anything that represents the quantity of a certain category.

Such a definition has a verbal appearance of being circular, but in fact it is not. We define "the number of a given class" without using the notion of number in general; therefore we may define number in general in terms of "the number of a given class" without committing any logical error.

Such a definition might seem circular at first glance, but it really isn't. We define "the number of a specific class" without using the overall concept of number; thus, we can define number in general based on "the number of a specific class" without making any logical mistakes.

Definitions of this sort are in fact very common. The class of fathers, for example, would have to be defined by first defining what it is to be the father of somebody; then the class of fathers will be all those who are somebody's father. Similarly if we want to define square numbers (say), we must first define what we mean by saying that one number is the square of another, and then define square numbers as those that are the squares of other numbers. This kind of procedure is very common, and it is important to realise that it is legitimate and even often necessary.

Definitions like this are actually quite common. The group of fathers, for instance, would need to be defined by first explaining what it means to be a father to someone; then the group of fathers will include everyone who is someone's father. Likewise, if we want to define square numbers, we first need to explain what we mean when we say one number is the square of another, and then we define square numbers as those that are squares of other numbers. This approach is very common, and it’s important to understand that it is both legitimate and often essential.

We have now given a definition of numbers which will serve for finite collections. It remains to be seen how it will serve for infinite collections. But first we must decide what we mean by "finite" and "infinite," which cannot be done within the limits of the present chapter. [Pg 19]

We have now defined numbers in a way that applies to finite collections. Next, we need to see how this definition applies to infinite collections. But first, we have to clarify what we mean by "finite" and "infinite," which we can't do in this chapter. [Pg 19]

CHAPTER III

FINITUDE AND MATHEMATICAL INDUCTION

THE series of natural numbers, as we saw in Chapter I., can all be defined if we know what we mean by the three terms "0," "number," and "successor." But we may go a step farther: we can define all the natural numbers if we know what we mean by "0" and "successor." It will help us to understand the difference between finite and infinite to see how this can be done, and why the method by which it is done cannot be extended beyond the finite. We will not yet consider how "0" and "successor" are to be defined: we will for the moment assume that we know what these terms mean, and show how thence all other natural numbers can be obtained.

THE series of natural numbers, as we saw in Chapter I, can all be defined if we understand what we mean by the three terms "0," "number," and "successor." But we can take it a step further: we can define all the natural numbers if we only understand "0" and "successor." It will help us grasp the difference between finite and infinite to see how this works, and why the method we use cannot be applied beyond the finite. We won’t dive into how "0" and "successor" are defined just yet; for now, we’ll assume we know what these terms mean and show how we can derive all the other natural numbers from them.

It is easy to see that we can reach any assigned number, say 30,000. We first define "1" as "the successor of 0," then we define "2" as "the successor of 1," and so on. In the case of an assigned number, such as 30,000, the proof that we can reach it by proceeding step by step in this fashion may be made, if we have the patience, by actual experiment: we can go on until we actually arrive at 30,000. But although the method of experiment is available for each particular natural number, it is not available for proving the general proposition that all such numbers can be reached in this way, i.e. by proceeding from 0 step by step from each number to its successor. Is there any other way by which this can be proved?

It’s clear that we can reach any given number, like 30,000. We start by defining "1" as "the successor of 0," then define "2" as "the successor of 1," and so on. For a specific number like 30,000, we can prove we can get there step by step through a practical test: we can keep counting until we actually hit 30,000. However, while this experimental method works for each individual natural number, it doesn’t help prove the broader claim that all such numbers can be reached this way, i.e. by starting from 0 and moving step by step to each number's successor. Is there another way to prove this?

Let us consider the question the other way round. What are the numbers that can be reached, given the terms "0" and [Pg 20] "successor"? Is there any way by which we can define the whole class of such numbers? We reach 1, as the successor of 0; 2, as the successor of 1; 3, as the successor of 2; and so on. It is this "and so on" that we wish to replace by something less vague and indefinite. We might be tempted to say that "and so on" means that the process of proceeding to the successor may be repeated any finite number of times; but the problem upon which we are engaged is the problem of defining "finite number," and therefore we must not use this notion in our definition. Our definition must not assume that we know what a finite number is.

Let’s look at the question from a different angle. What are the numbers we can generate using "0" and "successor"? Can we define the entire set of such numbers? We start with 1 as the successor of 0; then 2 as the successor of 1; 3 as the successor of 2; and so on. It’s this "and so on" that we want to replace with something clearer and more specific. We might think that "and so on" suggests that the process of moving to the successor can be repeated any finite number of times; however, the issue we’re tackling is defining "finite number," so we can’t use that concept in our definition. Our definition must not assume we understand what a finite number is.

The key to our problem lies in mathematical induction. It will be remembered that, in Chapter I., this was the fifth of the five primitive propositions which we laid down about the natural numbers. It stated that any property which belongs to 0, and to the successor of any number which has the property, belongs to all the natural numbers. This was then presented as a principle, but we shall now adopt it as a definition. It is not difficult to see that the terms obeying it are the same as the numbers that can be reached from 0 by successive steps from next to next, but as the point is important we will set forth the matter in some detail.

The key to our problem lies in mathematical induction. As we mentioned in Chapter I, this was the fifth of the five basic principles we established about the natural numbers. It stated that any property that applies to 0, and to the successor of any number that has that property, applies to all natural numbers. Initially, we presented this as a principle, but we will now treat it as a definition. It’s easy to see that the terms following this definition are the same as the numbers that can be reached from 0 by taking successive steps one after another. However, since this point is important, we will explain it in more detail.

We shall do well to begin with some definitions, which will be useful in other connections also.

We should start with some definitions, which will also be helpful in other contexts.

A property is said to be "hereditary" in the natural-number series if, whenever it belongs to a number $n$ , it also belongs to $n + 1$ , the successor of $n$ . Similarly a class is said to be "hereditary" if, whenever $n$ is a member of the class, so is $n + 1$ . It is easy to see, though we are not yet supposed to know, that to say a property is hereditary is equivalent to saying that it belongs to all the natural numbers not less than some one of them, e.g. it must belong to all that are not less than 100, or all that are less than 1000, or it may be that it belongs to all that are not less than 0, i.e. to all without exception.

A property is considered "hereditary" in the natural number series if it belongs to a number n, and it also belongs to n + 1, which is the next number after n. Similarly, a class is described as "hereditary" if whenever n is a member of the class, n + 1 is also a member. It's clear to see, even though we aren't meant to understand it fully yet, that saying a property is hereditary means it applies to all natural numbers starting from a certain point, for example, it must apply to all numbers from 100 onward, or all numbers from 1000 onward, or it could even apply to all numbers starting from 0, meaning it applies to all numbers without exception.

A property is said to be "inductive" when it is a hereditary [Pg 21] property which belongs to 0. Similarly a class is "inductive" when it is a hereditary class of which 0 is a member.

A property is called "inductive" when it is a hereditary property that belongs to 0. In the same way, a class is "inductive" when it is a hereditary class that includes 0. [Pg 21]

Given a hereditary class of which 0 is a member, it follows that 1 is a member of it, because a hereditary class contains the successors of its members, and 1 is the successor of 0. Similarly, given a hereditary class of which 1 is a member, it follows that 2 is a member of it; and so on. Thus we can prove by a step-by-step procedure that any assigned natural number, say 30,000, is a member of every inductive class.

Given a hereditary class that includes 0, it follows that 1 is also a member of it, because a hereditary class includes the successors of its members, and 1 is the successor of 0. In the same way, if a hereditary class includes 1, then 2 must be a member as well; and this continues indefinitely. Therefore, we can demonstrate through a step-by-step process that any specified natural number, like 30,000, belongs to every inductive class.

We will define the "posterity" of a given natural number with respect to the relation "immediate predecessor" (which is the converse of "successor") as all those terms that belong to every hereditary class to which the given number belongs. It is again easy to see that the posterity of a natural number consists of itself and all greater natural numbers; but this also we do not yet officially know.

We will define the "posterity" of a given natural number in relation to the term "immediate predecessor" (which is the opposite of "successor") as all those terms that are part of every hereditary class that the given number belongs to. It’s also clear to see that the posterity of a natural number includes itself and all greater natural numbers; however, we haven't officially established this yet.

By the above definitions, the posterity of 0 will consist of those terms which belong to every inductive class.

By the above definitions, the descendants of 0 will include those terms that belong to every inductive class.

It is now not difficult to make it obvious that the posterity of 0 is the same set as those terms that can be reached from 0 by successive steps from next to next. For, in the first place, 0 belongs to both these sets (in the sense in which we have defined our terms); in the second place, if $n$ belongs to both sets, so does $n + 1$ . It is to be observed that we are dealing here with the kind of matter that does not admit of precise proof, namely, the comparison of a relatively vague idea with a relatively precise one. The notion of "those terms that can be reached from 0 by successive steps from next to next" is vague, though it seems as if it conveyed a definite meaning; on the other hand, "the posterity of 0" is precise and explicit just where the other idea is hazy. It may be taken as giving what we meant to mean when we spoke of the terms that can be reached from 0 by successive steps.

It’s now clear that the descendants of 0 are the same group as those terms that can be reached from 0 by taking steps of two at a time. First, 0 is included in both of these groups (in the way we’ve defined our terms); second, if $n$ is in both groups, then $n + 1$ is too. It’s important to note that we are discussing a matter that doesn’t lend itself to precise proof, specifically the comparison of a somewhat vague idea with a more defined one. The idea of "those terms that can be reached from 0 by successive steps from next to next" is vague, even though it seems to have a clear meaning; on the flip side, "the descendants of 0" is clear and explicit where the other idea is fuzzy. It can be understood as what we intended to convey when we mentioned the terms that can be reached from 0 by taking steps of two.

We now lay down the following definition:—

We now present the following definition:—

The "natural numbers" are the posterity of 0 with respect to the [Pg 22] relation "immediate predecessor" (which is the converse of "successor").

The "natural numbers" are the descendants of 0 in relation to the [Pg 22] term "immediate predecessor" (which is the opposite of "successor").

We have thus arrived at a definition of one of Peano's three primitive ideas in terms of the other two. As a result of this definition, two of his primitive propositions—namely, the one asserting that 0 is a number and the one asserting mathematical induction—become unnecessary, since they result from the definition. The one asserting that the successor of a natural number is a natural number is only needed in the weakened form "every natural number has a successor."

We have now reached a definition of one of Peano's three basic concepts through the other two. Because of this definition, two of his basic propositions—specifically, the assertion that 0 is a number and the assertion of mathematical induction—are no longer needed, as they follow from the definition. The statement that the successor of a natural number is a natural number is only required in the simplified form "every natural number has a successor."

We can, of course, easily define "0" and "successor" by means of the definition of number in general which we arrived at in Chapter II. The number 0 is the number of terms in a class which has no members, i.e. in the class which is called the "null-class." By the general definition of number, the number of terms in the null-class is the set of all classes similar to the null-class, i.e. (as is easily proved) the set consisting of the null-class all alone, i.e. the class whose only member is the null-class. (This is not identical with the null-class: it has one member, namely, the null-class, whereas the null-class itself has no members. A class which has one member is never identical with that one member, as we shall explain when we come to the theory of classes.) Thus we have the following purely logical definition:—

We can easily define "0" and "successor" based on the definition of numbers we discussed in Chapter II. The number 0 refers to the number of items in a class that has no members, meaning the class known as the "null-class." According to the general definition of numbers, the number of items in the null-class is the collection of all classes that are similar to the null-class, which, as can be easily demonstrated, is just the null-class itself, meaning the class that has only the null-class as its member. (This is not the same as the null-class: it has one member, which is the null-class, while the null-class itself has no members. A class with one member is never the same as that member, as we will explain when we delve into the theory of classes.) So, we have the following logical definition:—

0 is the class whose only member is the null-class.

0 is the class that has only one member, which is the empty class.

It remains to define "successor." Given any number $n$ , let $\alpha$ be a class which has $n$ members, and let $x$ be a term which is not a member of $\alpha$ . Then the class consisting of $\alpha$ with $x$ added on will have $n + 1$ members. Thus we have the following definition:—

It’s time to define "successor." Given any number $n$ , let $\alpha$ be a class that has $n$ members, and let $x$ be a term that is not a member of $\alpha$ along with $x$ will have $n + 1$ members. So we have the following definition:—

The successor of the number of terms in the class $\alpha$ is the number of terms in the class consisting of a together with $x$ , where $x$ is any term not belonging to the class.

The successor of the number of terms in the class $\alpha$ is the number of terms in the class that includes a along with $x$ , where $x$ is any term that is not part of the class.

Certain niceties are required to make this definition perfect, but they need not concern us.[5] It will be remembered that we [Pg 23] have already given (in Chapter II.) a logical definition of the number of terms in a class, namely, we defined it as the set of all classes that are similar to the given class.

Certain details are needed to make this definition complete, but they aren't our focus.[5] It should be noted that we [Pg 23] have already provided (in Chapter II.) a logical definition of the number of terms in a class, which we defined as the collection of all classes that are similar to the given class.

[5]See Principia Mathematica, vol. II. * 110.

We have thus reduced Peano's three primitive ideas to ideas of logic: we have given definitions of them which make them definite, no longer capable of an infinity of different meanings, as they were when they were only determinate to the extent of obeying Peano's five axioms. We have removed them from the fundamental apparatus of terms that must be merely apprehended, and have thus increased the deductive articulation of mathematics.

We have therefore simplified Peano's three basic concepts into logical ideas: we’ve defined them in a way that clarifies their meaning, eliminating the many different interpretations they had when they were only somewhat aligned with Peano's five axioms. We have taken them out of the basic toolkit of terms that can only be understood at a surface level, and in doing so, we have enhanced the deductive structure of mathematics.

As regards the five primitive propositions, we have already succeeded in making two of them demonstrable by our definition of "natural number." How stands it with the remaining three? It is very easy to prove that 0 is not the successor of any number, and that the successor of any number is a number. But there is a difficulty about the remaining primitive proposition, namely, "no two numbers have the same successor." The difficulty does not arise unless the total number of individuals in the universe is finite; for given two numbers $m$ and $n$ , neither of which is the total number of individuals in the universe, it is easy to prove that we cannot have $m + 1 = n + 1$ unless we have $m = n$ . But let us suppose that the total number of individuals in the universe were (say) 10; then there would be no class of 11 individuals, and the number 11 would be the null-class. So would the number 12. Thus we should have 11 = 12; therefore the successor of 10 would be the same as the successor of 11, although 10 would not be the same as 11. Thus we should have two different numbers with the same successor. This failure of the third axiom cannot arise, however, if the number of individuals in the world is not finite. We shall return to this topic at a later stage.[6]

As for the five basic propositions, we've already managed to demonstrate two of them using our definition of "natural number." What about the other three? It’s straightforward to show that 0 is not the successor of any number, and that the successor of any number is a number. However, there's a complication with the remaining basic proposition, which states, "no two numbers have the same successor." This issue only comes up if the total number of individuals in the universe is finite; for two numbers $m$ and $n$ , neither of which represents the total number of individuals in the universe, it’s easy to prove that we cannot have $m + 1 = n + 1$ unless we have $m = n$ . But let’s assume that the total number of individuals in the universe was, say, 10; then there wouldn't be a group of 11 individuals, making the number 11 a null class. The same goes for the number 12. Hence, we would end up with 11 = 12, meaning the successor of 10 would be the same as the successor of 11, even though 10 is not the same as 11. This would lead to two different numbers sharing the same successor. However, this issue with the third axiom can't happen if the number of individuals in the world is not finite. We'll come back to this topic later.[6]

[6]See Chapter XIII.

See Chapter 13.

Assuming that the number of individuals in the universe is not finite, we have now succeeded not only in defining Peano's [Pg 24] three primitive ideas, but in seeing how to prove his five primitive propositions, by means of primitive ideas and propositions belonging to logic. It follows that all pure mathematics, in so far as it is deducible from the theory of the natural numbers, is only a prolongation of logic. The extension of this result to those modern branches of mathematics which are not deducible from the theory of the natural numbers offers no difficulty of principle, as we have shown elsewhere.[7]

Assuming that the number of people in the universe is not finite, we have not only defined Peano's three basic concepts but also figured out how to prove his five basic propositions using concepts and propositions that belong to logic. This means that all pure mathematics, as long as it can be derived from the theory of natural numbers, is just an extension of logic. Extending this result to modern areas of mathematics that aren't derived from the theory of natural numbers poses no fundamental challenges, as we have demonstrated elsewhere.[7]

[7]For geometry, in so far as it is not purely analytical, see Principles of Mathematics, part VI.; for rational dynamics, ibid., part VII.

[7]For geometry, as long as it's not just analytical, check out Principles of Mathematics, part VI.; for rational dynamics, ibid., part VII.

The process of mathematical induction, by means of which we defined the natural numbers, is capable of generalisation. We defined the natural numbers as the "posterity" of 0 with respect to the relation of a number to its immediate successor. If we call this relation $\mathrm N$ , any number $m$ will have this relation to $m + 1$ . A property is "hereditary with respect to $\mathrm N$ ," or simply " $\mathrm N$ -hereditary," if, whenever the property belongs to a number $m$ , it also belongs to $m + 1$ , i.e. to the number to which $m$ has the relation $\mathrm N$ . And a number $n$ will be said to belong to the "posterity" of $m$ with respect to the relation $\mathrm N$ if $n$ has every $\mathrm N$ -hereditary property belonging to $m$ . These definitions can all be applied to any other relation just as well as to $\mathrm N$ . Thus if $\mathrm R$ is any relation whatever, we can lay down the following definitions:[8]—

The method of mathematical induction, which we used to define natural numbers, can be generalized. We defined natural numbers as the "descendants" of 0 based on the relationship between a number and its immediate successor. If we denote this relationship as $\mathrm N$ , then any number $m$ will relate to $m + 1$ in this way. A property is "hereditary with respect to $\mathrm N$ ," or simply " $\mathrm N$ -hereditary," if whenever the property holds for a number $m$ , it also holds for $m + 1$ , meaning the number to which $m$ has the relationship $\mathrm N$ . A number $n$ will be considered part of the "descendants" of $m$ based on the relationship $\mathrm N$ if $n$ has every $\mathrm N$ -hereditary property belonging to $m$ . These definitions can apply to any other relationship just as well as to $\mathrm N$ . Therefore, if $\mathrm R$ is any relationship, we can establish the following definitions:[8]—

[8]These definitions, and the generalised theory of induction, are due to Frege, and were published so long ago as 1879 in his Begriffsschrift. In spite of the great value of this work, I was, I believe, the first person who ever read it—more than twenty years after its publication.

[8]These definitions and the broader theory of induction come from Frege, published as far back as 1879 in his Begriffsschrift. Despite the significant importance of this work, I think I was the first person to actually read it—over twenty years after it was published.

A property is called " $\mathrm R$ -hereditary" when, if it belongs to a term $x$ , and $x$ has the relation $\mathrm R$ to $y$ , then it belongs to $y$ .

A property is called " $\mathrm R$ -hereditary" when, if it belongs to a term $x$ and $x$ has the relation $\mathrm R$ to $y$ , then it belongs to $y$ .

A class is $\mathrm R$ -hereditary when its defining property is $\mathrm R$ -hereditary.

A term $x$ is said to be an " $\mathrm R$ -ancestor" of the term $y$ if $y$ has every $\mathrm R$ -hereditary property that $x$ has, provided $x$ is a term which has the relation $\mathrm R$ to something or to which something has the relation $\mathrm R$ . (This is only to exclude trivial cases.) [Pg 25]

A term $x$ is called an " $\mathrm R$ -ancestor" of the term $y$ if $y$ has every $\mathrm R$ -hereditary property that $x$ has, as long as $x$ is a term that has the relation $\mathrm R$ to something or to which something has the relation $\mathrm R$ . (This is just to exclude trivial cases.) [Pg 25]

The " $\mathrm R$ -posterity" of $x$ is all the terms of which $x$ is an $\mathrm R$ -ancestor.

The " $\mathrm R$ -posterity" of $x$ includes all the terms for which $x$ is a $\mathrm R$ -ancestor.

We have framed the above definitions so that if a term is the ancestor of anything it is its own ancestor and belongs to its own posterity. This is merely for convenience.

We have defined the above terms in such a way that if a term is the ancestor of anything, it is also its own ancestor and is part of its own descendants. This is just for convenience.

It will be observed that if we take for $\mathrm R$ the relation "parent," "ancestor" and "posterity" will have the usual meanings, except that a person will be included among his own ancestors and posterity. It is, of course, obvious at once that "ancestor" must be capable of definition in terms of "parent," but until Frege developed his generalised theory of induction, no one could have defined "ancestor" precisely in terms of "parent." A brief consideration of this point will serve to show the importance of the theory. A person confronted for the first time with the problem of defining "ancestor" in terms of "parent" would naturally say that $\mathrm A$ is an ancestor of $\mathrm Z$ if, between $\mathrm A$ and $\mathrm Z$ , there are a certain number of people, $\mathrm B$ , $\mathrm C$ , ..., of whom $\mathrm B$ is a child of $\mathrm A$ , each is a parent of the next, until the last, who is a parent of $\mathrm Z$ . But this definition is not adequate unless we add that the number of intermediate terms is to be finite. Take, for example, such a series as the following:— $1,\ -\tfrac{1}{2},\ -\tfrac{1}{4},\ -\tfrac{1}{8},\ \dots\ \tfrac{1}{8},\ \tfrac{1}{4},\ \tfrac{1}{2},\ 1.$ Here we have first a series of negative fractions with no end, and then a series of positive fractions with no beginning. Shall we say that, in this series, $-\frac{1}{8}$ is an ancestor of $\frac{1}{8}$ ? It will be so according to the beginner's definition suggested above, but it will not be so according to any definition which will give the kind of idea that we wish to define. For this purpose, it is essential that the number of intermediaries should be finite. But, as we saw, "finite" is to be defined by means of mathematical induction, and it is simpler to define the ancestral relation generally at once than to define it first only for the case of the relation of $n$ to $n + 1$ , and then extend it to other cases. Here, as constantly elsewhere, generality from the first, though it may [Pg 26] require more thought at the start, will be found in the long run to economise thought and increase logical power.

It can be noted that if we consider $\mathrm R$ as the relationship "parent," then "ancestor" and "descendant" will have their usual meanings, except that a person will be counted among their own ancestors and descendants. It's clear that "ancestor" can be defined in terms of "parent," but before Frege developed his generalized theory of induction, nobody could define "ancestor" accurately in relation to "parent." A brief exploration of this issue will highlight the significance of the theory. When faced with the task of defining "ancestor" in terms of "parent," one would naturally assert that $\mathrm A$ is an ancestor of $\mathrm Z$ if there are several people, $\mathrm B$ , $\mathrm C$ , ..., between $\mathrm A$ and $\mathrm Z$ where $\mathrm B$ is a child of $\mathrm A$ , and each is a parent of the next until the last, who is a parent of $\mathrm Z$ . However, this definition isn't sufficient unless we specify that the number of people in between must be finite. For instance, consider the following series:— $1,\ -\tfrac{1}{2},\ -\tfrac{1}{4},\ -\tfrac{1}{8},\ \dots\ \tfrac{1}{8},\ \tfrac{1}{4},\ \tfrac{1}{2},\ 1.$ In this case, we first have a series of negative fractions that continues indefinitely, followed by a series of positive fractions that has no start. Should we conclude that in this series, $-\frac{1}{8}$ is an ancestor of $\frac{1}{8}$ ? According to the initial definition suggested earlier, this would be true, but it would not hold under any definition that captures the essence of what we want to define. For this intention, it's essential that the number of intermediaries is finite. However, as we noted, "finite" has to be defined using mathematical induction, and it is easier to define the ancestral relationship generally from the outset than to first define it for the case of $n$ in relation to $n + 1$ and then extend it to other situations. Here, as is often the case, starting with a general approach, though it may require more effort initially, will ultimately save effort and enhance logical capability. [Pg 26]

The use of mathematical induction in demonstrations was, in the past, something of a mystery. There seemed no reasonable doubt that it was a valid method of proof, but no one quite knew why it was valid. Some believed it to be really a case of induction, in the sense in which that word is used in logic. Poincaré [9] considered it to be a principle of the utmost importance, by means of which an infinite number of syllogisms could be condensed into one argument. We now know that all such views are mistaken, and that mathematical induction is a definition, not a principle. There are some numbers to which it can be applied, and there are others (as we shall see in Chapter VIII.) to which it cannot be applied. We define the "natural numbers" as those to which proofs by mathematical induction can be applied, i.e. as those that possess all inductive properties. It follows that such proofs can be applied to the natural numbers, not in virtue of any mysterious intuition or axiom or principle, but as a purely verbal proposition. If "quadrupeds" are defined as animals having four legs, it will follow that animals that have four legs are quadrupeds; and the case of numbers that obey mathematical induction is exactly similar.

The use of mathematical induction in proofs used to be a bit of a mystery. While it was generally accepted as a valid proof method, no one really understood why it was valid. Some thought it was a form of induction, like that term is used in logic. Poincaré [9] considered it a very important principle that allowed an infinite number of syllogisms to be condensed into one argument. We now know that these views are incorrect and that mathematical induction is a definition, not a principle. There are certain numbers it can be applied to, and others (as we'll discuss in Chapter VIII.) that it cannot be applied to. We define "natural numbers" as those that can be used in proofs by mathematical induction, i.e. those that have all inductive properties. This means that such proofs can be applied to natural numbers, not because of some mysterious intuition, axiom, or principle, but simply as a verbal proposition. If "quadrupeds" are defined as animals with four legs, then it follows that animals with four legs are quadrupeds; the situation with numbers that fit mathematical induction is exactly the same.

[9]Science and Method, chap. IV.

[9]Science and Method, ch. IV.

We shall use the phrase "inductive numbers" to mean the same set as we have hitherto spoken of as the "natural numbers." The phrase "inductive numbers" is preferable as affording a reminder that the definition of this set of numbers is obtained from mathematical induction.

We will use the term "inductive numbers" to refer to the same group we’ve previously called "natural numbers." The term "inductive numbers" is better because it serves as a reminder that the definition of this set comes from mathematical induction.

Mathematical induction affords, more than anything else, the essential characteristic by which the finite is distinguished from the infinite. The principle of mathematical induction might be stated popularly in some such form as "what can be inferred from next to next can be inferred from first to last." This is true when the number of intermediate steps between first and last is finite, not otherwise. Anyone who has ever [Pg 27] watched a goods train beginning to move will have noticed how the impulse is communicated with a jerk from each truck to the next, until at last even the hindmost truck is in motion. When the train is very long, it is a very long time before the last truck moves. If the train were infinitely long, there would be an infinite succession of jerks, and the time would never come when the whole train would be in motion. Nevertheless, if there were a series of trucks no longer than the series of inductive numbers (which, as we shall see, is an instance of the smallest of infinites), every truck would begin to move sooner or later if the engine persevered, though there would always be other trucks further back which had not yet begun to move. This image will help to elucidate the argument from next to next, and its connection with finitude. When we come to infinite numbers, where arguments from mathematical induction will be no longer valid, the properties of such numbers will help to make clear, by contrast, the almost unconscious use that is made of mathematical induction where finite numbers are concerned. [Pg 28]

Mathematical induction provides the key feature that sets apart the finite from the infinite. The principle of mathematical induction can be simply expressed as "what you can infer from the next step can be inferred from the first to the last." This holds true only when the number of steps between the first and the last is finite. Anyone who has ever seen a freight train start moving will have noticed how the momentum is passed along with a jolt from one car to the next until the last car finally starts moving. When the train is very long, it takes quite a while for the last car to move. If the train were infinitely long, there would be an endless series of jolts, and the moment when the entire train would be in motion would never arrive. However, if there were a series of cars no longer than the series of inductive numbers (which, as we will see, is an example of the smallest infinity), each car would eventually start moving if the engine kept going, even though there would always be other cars further back that hadn’t started moving yet. This image helps clarify the concept of moving from one step to the next and how it relates to finiteness. When we deal with infinite numbers, where mathematical induction no longer applies, understanding the properties of such numbers will, by contrast, shed light on the almost instinctive use of mathematical induction with finite numbers.

CHAPTER IV

THE DEFINITION OF ORDER

WE have now carried our analysis of the series of natural numbers to the point where we have obtained logical definitions of the members of this series, of the whole class of its members, and of the relation of a number to its immediate successor. We must now consider the serial character of the natural numbers in the order 0, 1, 2, 3,.... We ordinarily think of the numbers as in this order, and it is an essential part of the work of analysing our data to seek a definition of "order" or "series" in logical terms.

We have now taken our analysis of the series of natural numbers to the point where we have logical definitions for the members of this series, the entire group of its members, and the relationship between a number and its immediate successor. We now need to examine the serial nature of the natural numbers in the order 0, 1, 2, 3,.... We usually think of the numbers in this order, and it's crucial to our analysis to find a logical definition of "order" or "series."

The notion of order is one which has enormous importance in mathematics. Not only the integers, but also rational fractions and all real numbers have an order of magnitude, and this is essential to most of their mathematical properties. The order of points on a line is essential to geometry; so is the slightly more complicated order of lines through a point in a plane, or of planes through a line. Dimensions, in geometry, are a development of order. The conception of a limit, which underlies all higher mathematics, is a serial conception. There are parts of mathematics which do not depend upon the notion of order, but they are very few in comparison with the parts in which this notion is involved.

The idea of order is extremely important in mathematics. Not only do integers have order, but so do rational fractions and all real numbers, which is key to most of their mathematical properties. The order of points on a line is crucial for geometry; the more complex order of lines through a point in a plane, or of planes through a line, is also important. Dimensions in geometry are a development of order. The concept of a limit, which is fundamental to all advanced mathematics, is based on a series concept. There are some areas of mathematics that don't rely on the notion of order, but they are very few compared to the many parts where this idea is involved.

In seeking a definition of order, the first thing to realise is that no set of terms has just one order to the exclusion of others. A set of terms has all the orders of which it is capable. Sometimes one order is so much more familiar and natural to our [Pg 29] thoughts that we are inclined to regard it as the order of that set of terms; but this is a mistake. The natural numbers—or the "inductive" numbers, as we shall also call them—occur to us most readily in order of magnitude; but they are capable of an infinite number of other arrangements. We might, for example, consider first all the odd numbers and then all the even numbers; or first 1, then all the even numbers, then all the odd multiples of 3, then all the multiples of 5 but not of 2 or 3, then all the multiples of 7 but not of 2 or 3 or 5, and so on through the whole series of primes. When we say that we "arrange" the numbers in these various orders, that is an inaccurate expression: what we really do is to turn our attention to certain relations between the natural numbers, which themselves generate such-and-such an arrangement. We can no more "arrange" the natural numbers than we can the starry heavens; but just as we may notice among the fixed stars either their order of brightness or their distribution in the sky, so there are various relations among numbers which may be observed, and which give rise to various different orders among numbers, all equally legitimate. And what is true of numbers is equally true of points on a line or of the moments of time: one order is more familiar, but others are equally valid. We might, for example, take first, on a line, all the points that have integral co-ordinates, then all those that have non-integral rational co-ordinates, then all those that have algebraic non-rational co-ordinates, and so on, through any set of complications we please. The resulting order will be one which the points of the line certainly have, whether we choose to notice it or not; the only thing that is arbitrary about the various orders of a set of terms is our attention, for the terms themselves have always all the orders of which they are capable.

In trying to define order, the first thing to understand is that no set of terms has just one order exclusively. A set of terms encompasses every possible order it can have. Sometimes one order feels so much more familiar and natural to us that we tend to see it as the order for that set of terms, but that’s a mistake. The natural numbers—or the "inductive" numbers, as we’ll also refer to them—come to mind most readily in terms of size; however, they can be arranged in countless other ways. For instance, we could list all the odd numbers first and then the even numbers; or start with 1, then list all the even numbers, followed by all the odd multiples of 3, then all the multiples of 5 that aren’t also multiples of 2 or 3, then all the multiples of 7 that aren’t multiples of 2, 3, or 5, and continue this pattern through all the prime numbers. When we say we "arrange" the numbers in these different orders, that’s not entirely accurate: what we actually do is focus on certain relationships between the natural numbers which create that particular arrangement. We can’t "arrange" the natural numbers any more than we can arrange the stars in the sky; but just as we can observe stars based on their brightness or distribution, there are various relationships among numbers that can be examined, leading to different orders that are all valid. What applies to numbers also applies to points on a line or moments in time: one order may be more familiar, but others are just as valid. For example, we could first take all the points on a line that have whole number coordinates, then all those with rational non-integer coordinates, then all those with algebraic non-rational coordinates, and so on through any level of complexity we prefer. The resulting order is one that the points on the line inherently have, whether we acknowledge it or not; the only arbitrary aspect of the different orders within a set of terms is our attention, as the terms inherently possess all the possible orders they can have.

One important result of this consideration is that we must not look for the definition of order in the nature of the set of terms to be ordered, since one set of terms has many orders. The order lies, not in the class of terms, but in a relation among [Pg 30] the members of the class, in respect of which some appear as earlier and some as later. The fact that a class may have many orders is due to the fact that there can be many relations holding among the members of one single class. What properties must a relation have in order to give rise to an order?

One important outcome of this consideration is that we shouldn't seek the definition of order in the nature of the set of terms being ordered, because a single set of terms can have multiple orders. The order is not found in the class of terms, but in a relationship among the members of the class, where some appear as earlier and others as later. The reason a class can have various orders is that there can be multiple relationships existing among the members of a single class. What characteristics must a relationship have to create an order?

The essential characteristics of a relation which is to give rise to order may be discovered by considering that in respect of such a relation we must be able to say, of any two terms in the class which is to be ordered, that one "precedes" and the other "follows." Now, in order that we may be able to use these words in the way in which we should naturally understand them, we require that the ordering relation should have three properties:—

The key features of a relationship that creates order can be found by noting that for this kind of relationship, we must be able to say that, regarding any two items in the group to be ordered, one "comes before" and the other "comes after." To use these terms in the way we would normally understand them, we need the ordering relationship to have three properties:—

(1) If $x$ precedes $y$ , $y$ must not also precede $x$ . This is an obvious characteristic of the kind of relations that lead to series. If $x$ is less than $y$ , $y$ is not also less than $x$ . If $x$ is earlier in time than $y$ , $y$ is not also earlier than $x$ . If $x$ is to the left of $y$ , $y$ is not to the left of $x$ . On the other hand, relations which do not give rise to series often do not have this property. If $x$ is a brother or sister of $y$ , $y$ is a brother or sister of $x$ . If $x$ is of the same height as $y$ , $y$ is of the same height as $x$ . If $x$ is of a different height from $y$ , $y$ is of a different height from $x$ . In all these cases, when the relation holds between $x$ and $y$ , it also holds between $y$ and $x$ . But with serial relations such a thing cannot happen. A relation having this first property is called asymmetrical.

(1) If $x$ comes before $y$ , then $y$ cannot also come before $x$ . This is an obvious trait of the types of relationships that form series. If $x$ is less than $y$ , then $y$ cannot also be less than $x$ . If $x$ occurs earlier than $y$ , then $y$ cannot also occur earlier than $x$ . If $x$ is to the left of $y$ , then $y$ cannot be to the left of $x$ . On the other hand, relationships that do not form series often lack this property. If $x$ is a sibling of $y$ , then $y$ is a sibling of $x$ . If $x$ is the same height as $y$ , then $y$ is the same height as $x$ . If $x$ is a different height from $y$ , then $y$ is a different height from $x$ . In all these cases, when the relationship exists between $x$ and $y$ , it also holds between $y$ and $x$ . But with serial relationships, this cannot happen. A relationship that has this first property is called asymmetrical.

(2) If $x$ precedes $y$ and $y$ precedes $z$ , $x$ must precede $z$ . This may be illustrated by the same instances as before: less, earlier, left of. But as instances of relations which do not have this property only two of our previous three instances will serve. If $x$ is brother or sister of $y$ , and $y$ of $z$ , $x$ may not be brother or sister of $z$ , since $x$ and $z$ may be the same person. The same applies to difference of height, but not to sameness of height, which has our second property but not our first. The relation "father," on the other hand, has our first property but not [Pg 31] our second. A relation having our second property is called transitive.

(2) If $x$ comes before $y$ and $y$ comes before $z$ , then $x$ must come before $z$ . This can be shown with the same examples as before: less, earlier, left of. However, for relationships that do not have this property, only two of our previous three examples will apply. If $x$ is a brother or sister of $y$ and $y$ is of $z$ , then $x$ might not be a brother or sister of $z$ , since $x$ and $z$ could be the same person. The same goes for differences in height, but not for being the same height, which has our second property but not our first. The relationship "father," on the other hand, has our first property but not our second. A relationship that has our second property is called transitive.

(3) Given any two terms of the class which is to be ordered, there must be one which precedes and the other which follows. For example, of any two integers, or fractions, or real numbers, one is smaller and the other greater; but of any two complex numbers this is not true. Of any two moments in time, one must be earlier than the other; but of events, which may be simultaneous, this cannot be said. Of two points on a line, one must be to the left of the other. A relation having this third property is called connected.

(3) For any two terms in the class that needs to be ordered, one must come before the other. For instance, of any two integers, fractions, or real numbers, one is smaller and the other is larger; however, this doesn't hold for any two complex numbers. For any two moments in time, one has to be earlier than the other; but for events, which can happen at the same time, this isn’t applicable. Of two points on a line, one has to be to the left of the other. A relationship that has this third property is called connected.

When a relation possesses these three properties, it is of the sort to give rise to an order among the terms between which it holds; and wherever an order exists, some relation having these three properties can be found generating it.

When a relation has these three properties, it creates an order among the terms it connects; and whenever there is an order, there is some relation with these three properties that can be found to create it.

Before illustrating this thesis, we will introduce a few definitions.

Before explaining this thesis, we will provide a few definitions.

(1) A relation is said to be an aliorelative,[10] or to be contained in or imply diversity, if no term has this relation to itself. Thus, for example, "greater," "different in size," "brother," "husband," "father" are aliorelatives; but "equal," "born of the same parents," "dear friend" are not.

(1) A relation is called an aliorelative,[10] or to be contained in or imply diversity, if no term has this relation to itself. So, for instance, "greater," "different in size," "brother," "husband," and "father" are aliorelatives; but "equal," "born of the same parents," and "dear friend" are not.

[10]This term is due to C. S. Peirce.

[10]This term comes from C. S. Peirce.

(2) The square of a relation is that relation which holds between two terms $x$ and $z$ when there is an intermediate term $y$ such that the given relation holds between $x$ and $y$ and between $y$ and $z$ . Thus "paternal grandfather" is the square of "father," "greater by 2" is the square of "greater by 1," and so on.

(2) The square of a relation is the relation that exists between two terms $x$ and $z$ when there is an intermediate term $y$ such that the given relation holds between $x$ and $y$ and between $y$ and $z$

(3) The domain of a relation consists of all those terms that have the relation to something or other, and the converse domain consists of all those terms to which something or other has the relation. These words have been already defined, but are recalled here for the sake of the following definition:—

(3) The domain of a relation includes all the terms that have a relation to something, while the converse domain includes all the terms that something has a relation to. These terms have already been defined, but they're mentioned again here for the sake of the upcoming definition:—

(4) The field of a relation consists of its domain and converse domain together. [Pg 32]

(4) The field of a relation includes both its domain and converse domain combined. [Pg 32]

(5) One relation is said to contain or be implied by another if it holds whenever the other holds.

(5) One relationship is said to contain or be implied by another if it is true whenever the other is true.

It will be seen that an asymmetrical relation is the same thing as a relation whose square is an aliorelative. It often happens that a relation is an aliorelative without being asymmetrical, though an asymmetrical relation is always an aliorelative. For example, "spouse" is an aliorelative, but is symmetrical, since if $x$ is the spouse of $y$ , $y$ is the spouse of $x$ . But among transitive relations, all aliorelatives are asymmetrical as well as vice versa.

It will be clear that an asymmetrical relation is the same as a relation whose square is an aliorelative. It often happens that a relation is an aliorelative without being asymmetrical, but an asymmetrical relation is always an aliorelative. For example, "spouse" is an aliorelative, but it's symmetrical, since if $x$ is the spouse of $y$ , then $y$ is the spouse of $x$ . However, among transitive relations, all aliorelatives are asymmetrical and vice versa.

From the definitions it will be seen that a transitive relation is one which is implied by its square, or, as we also say, "contains" its square. Thus "ancestor" is transitive, because an ancestor's ancestor is an ancestor; but "father" is not transitive, because a father's father is not a father. A transitive aliorelative is one which contains its square and is contained in diversity; or, what comes to the same thing, one whose square implies both it and diversity—because, when a relation is transitive, asymmetry is equivalent to being an aliorelative.

From the definitions, you can see that a transitive relation is one that is implied by its square, or, as we also say, "contains" its square. For example, "ancestor" is transitive because an ancestor's ancestor is still an ancestor; however, "father" is not transitive because a father's father is not a father. A transitive aliorelative is one that contains its square and is distinct; or, in other words, one whose square implies both itself and distinctness—because when a relation is transitive, asymmetry is the same as being an aliorelative.

A relation is connected when, given any two different terms of its field, the relation holds between the first and the second or between the second and the first (not excluding the possibility that both may happen, though both cannot happen if the relation is asymmetrical).

A relation is connected when, for any two different elements in its set, the relation exists either from the first to the second or from the second to the first (not ruling out the chance that both may occur, although both cannot happen if the relation is asymmetrical).

It will be seen that the relation "ancestor," for example, is an aliorelative and transitive, but not connected; it is because it is not connected that it does not suffice to arrange the human race in a series.

It will be seen that the relationship "ancestor," for example, is an aliorelative and transitive, but not connected; it is because it is not connected that it does not suffice to arrange the human race in a series.

The relation "less than or equal to," among numbers, is transitive and connected, but not asymmetrical or an aliorelative.

The "less than or equal to" relationship among numbers is transitive and connected, but it is not asymmetrical or an aliorelative.

The relation "greater or less" among numbers is an aliorelative and is connected, but is not transitive, for if $x$ is greater or less than $y$ , and $y$ is greater or less than $z$ , it may happen that $x$ and $z$ are the same number.

The relationship of "greater than or less than" between numbers is a non-relative relationship that is connected but not transitive. If $x$ is greater than or less than $y$ , and $z$ , it’s possible for $x$ and $z$ to be the same number.

Thus the three properties of being (1) an aliorelative, (2) transitive, [Pg 33] and (3) connected, are mutually independent, since a relation may have any two without having the third.

Thus the three properties of being (1) an aliorelative, (2) transitive, [Pg 33] and (3) connected are mutually independent because a relation can have any two of them without having the third.

We now lay down the following definition:—

We now present the following definition:—

A relation is serial when it is an aliorelative, transitive, and connected; or, what is equivalent, when it is asymmetrical, transitive, and connected.

A relation is serial when it is an aliorelative, transitive, and connected; or, in other words, when it is asymmetrical, transitive, and connected.

A series is the same thing as a serial relation.

A series is the same as a serial relation.

It might have been thought that a series should be the field of a serial relation, not the serial relation itself. But this would be an error. For example, $1,\ 2,\ 3;\quad 1,\ 3,\ 2;\quad 2,\ 3,\ 1;\quad 2,\ 1,\ 3;\quad 3,\ 1,\ 2;\quad 3,\ 2,\ 1$ are six different series which all have the same field. If the field were the series, there could only be one series with a given field. What distinguishes the above six series is simply the different ordering relations in the six cases. Given the ordering relation, the field and the order are both determinate. Thus the ordering relation may be taken to be the series, but the field cannot be so taken.

It might have been thought that a series should be the field of a serial relationship, not the serial relationship itself. But that would be a mistake. For example, $1,\ 2,\ 3;\quad 1,\ 3,\ 2;\quad 2,\ 3,\ 1;\quad 2,\ 1,\ 3;\quad 3,\ 1,\ 2;\quad 3,\ 2,\ 1$ there are six different series that all have the same field. If the field were the series, there could only be one series with a given field. What sets the six series apart is simply the different ordering relationships in each case. Given the ordering relationship, both the field and the order are clearly defined. Thus, the ordering relationship can be regarded as being the series, but the field cannot be viewed that way.

Given any serial relation, say $\mathrm P$ , we shall say that, in respect of this relation, $x$ "precedes" $y$ if $x$ has the relation $\mathrm P$ to $y$ , which we shall write " $x\mathrm Py$ " for short. The three characteristics which $\mathrm P$ must have in order to be serial are:

Given any serial relation, let's call it $\mathrm P$ , we will say that, in relation to this, $x$ "precedes" $y$ if $x$ has the relation $\mathrm P$ to $y$ , which we will write as " $x\mathrm Py$ " for simplicity. The three characteristics that $\mathrm P$ must have to be considered serial are:

(1) We must never have $x\mathrm Px$ , i.e. no term must precede itself.

(1) We can never have $x\mathrm Px$ , i.e. no term can come before itself.

(2) $\mathrm P^{2}$ must imply $\mathrm P$ , i.e. if $x$ precedes $y$ and $y$ precedes $z$ , $x$ must precede $z$ .

(2) $\mathrm P^{2}$ must imply $\mathrm P$ , i.e. if $x$ comes before $y$ and $y$ comes before $z$ , $x$ must come before $z$ .

(3) If $x$ and $y$ are two different terms in the field of $\mathrm P$ , we shall have $x\mathrm Py$ or $y\mathrm Px$ , i.e. one of the two must precede the other.

(3) If $x$ and $y$ are two distinct terms in the field of $\mathrm P$ , we will have $x\mathrm Py$ or $y\mathrm Px$ , i.e. one of the two has to come before the other.

The reader can easily convince himself that, where these three properties are found in an ordering relation, the characteristics we expect of series will also be found, and vice versa. We are therefore justified in taking the above as a definition of order [Pg 34] or series. And it will be observed that the definition is effected in purely logical terms.

The reader can easily convince themselves that when these three properties are present in an ordering relation, the characteristics we expect from series will also be present, and vice versa. Therefore, we can take the above as a definition of order [Pg 34] or series. It's worth noting that the definition is framed in purely logical terms.

Although a transitive asymmetrical connected relation always exists wherever there is a series, it is not always the relation which would most naturally be regarded as generating the series. The natural-number series may serve as an illustration. The relation we assumed in considering the natural numbers was the relation of immediate succession, i.e. the relation between consecutive integers. This relation is asymmetrical, but not transitive or connected. We can, however, derive from it, by the method of mathematical induction, the "ancestral" relation which we considered in the preceding chapter. This relation will be the same as "less than or equal to" among inductive integers. For purposes of generating the series of natural numbers, we want the relation "less than," excluding "equal to." This is the relation of $m$ to $n$ when $m$ is an ancestor of $n$ but not identical with $n$ , or (what comes to the same thing) when the successor of $m$ is an ancestor of $n$ in the sense in which a number is its own ancestor. That is to say, we shall lay down the following definition:—

Although a transitive asymmetrical connected relation always exists wherever there is a series, it isn't always the relation that would naturally be seen as generating the series. The series of natural numbers can illustrate this. The relation we considered for natural numbers was the relation of immediate succession, meaning the relation between consecutive integers. This relation is asymmetrical, but it isn't transitive or connected. However, we can derive from it, using mathematical induction, the "ancestral" relation we talked about in the previous chapter. This relation will be the same as "less than or equal to" among inductive integers. To generate the series of natural numbers, we want the relation "less than," leaving out "equal to." This is the relation of $m$ to $n$ when $m$ is an ancestor of $n$ but not identical to $n$ , or (which is the same thing) when the successor of $m$ is an ancestor of $n$ in the sense that a number is its own ancestor. In other words, we will establish the following definition:—

An inductive number $m$ is said to be less than another number $n$ when $n$ possesses every hereditary property possessed by the successor of $m$ .

An inductive number $m$ is considered less than another number $n$ when $n$ has every hereditary property that the successor of $m$ has.

It is easy to see, and not difficult to prove, that the relation "less than," so defined, is asymmetrical, transitive, and connected, and has the inductive numbers for its field. Thus by means of this relation the inductive numbers acquire an order in the sense in which we defined the term "order," and this order is the so-called "natural" order, or order of magnitude.

It’s clear and easy to prove that the relation "less than," as defined, is asymmetrical, transitive, and connected, and its field consists of the natural numbers. This relation gives the natural numbers a specific order, as we defined the term "order," and this order is known as the "natural" order or order of magnitude.

The generation of series by means of relations more or less resembling that of $n$ to $n + 1$ is very common. The series of the Kings of England, for example, is generated by relations of each to his successor. This is probably the easiest way, where it is applicable, of conceiving the generation of a series. In this method we pass on from each term to the next, as long as there [Pg 35] is a next, or back to the one before, as long as there is one before. This method always requires the generalised form of mathematical induction in order to enable us to define "earlier" and "later" in a series so generated. On the analogy of "proper fractions," let us give the name "proper posterity of $x$ with respect to $\mathrm R$ " to the class of those terms that belong to the $\mathrm R$ -posterity of some term to which $x$ has the relation $\mathrm R$ , in the sense which we gave before to "posterity," which includes a term in its own posterity. Reverting to the fundamental definitions, we find that the "proper posterity" may be defined as follows:—

The way we create sequences using relationships similar to $n$ and $n + 1$ is quite common. For instance, the list of English Kings is created by the connection of each king to his successor. This is likely the simplest way to understand how a sequence is formed when applicable. We can move from one term to the next, as long as there is a next term, or back to the previous one, as long as there is one before it. This approach always needs a generalized form of mathematical induction to help us define "earlier" and "later" in such a sequence. Following the idea of "proper fractions," let’s call the "proper posterity of $x$ concerning $\mathrm R$ " the group of terms that belong to the $\mathrm R$ -posterity of some term that $x$ is related to via $\mathrm R$ , in the same way we previously defined "posterity," which includes a term in its own posterity. Referring back to the basic definitions, we find that "proper posterity" can be defined as follows:—

The "proper posterity" of $x$ with respect to $\mathrm R$ consists of all terms that possess every $\mathrm R$ -hereditary property possessed by every term to which $x$ has the relation $\mathrm R$ .

The "proper offspring" of $x$ regarding $\mathrm R$ includes all terms that share every $\mathrm R$ -hereditary property of every term related to $x$ by the relation $\mathrm R$ .

It is to be observed that this definition has to be so framed as to be applicable not only when there is only one term to which $x$ has the relation $\mathrm R$ , but also in cases (as e.g. that of father and child) where there may be many terms to which $x$ has the relation $\mathrm R$ . We define further:

It should be noted that this definition needs to be constructed in a way that it applies not only when there is a single term to which $x$ has the relation $\mathrm R$ , but also in situations (for example, that of father and child) where there can be several terms to which $x$ has the relation $\mathrm R$ . We define further:

A term $x$ is a "proper ancestor" of $y$ with respect to $\mathrm R$ if $y$ belongs to the proper posterity of $x$ with respect to $\mathrm R$ .

A term $x$ is a "proper ancestor" of $y$ in relation to $\mathrm R$ if $y$ is part of the proper descendant line of $x$ concerning $\mathrm R$ .

We shall speak for short of " $\mathrm R$ -posterity" and " $\mathrm R$ -ancestors" when these terms seem more convenient.

We will refer to " $\mathrm R$ -posterity" and " $\mathrm R$ -ancestors" when it’s more convenient.

Reverting now to the generation of series by the relation $\mathrm R$ between consecutive terms, we see that, if this method is to be possible, the relation "proper $\mathrm R$ -ancestor" must be an aliorelative, transitive, and connected. Under what circumstances will this occur? It will always be transitive: no matter what sort of relation $\mathrm R$ may be, " $\mathrm R$ -ancestor" and "proper $\mathrm R$ -ancestor" are always both transitive. But it is only under certain circumstances that it will be an aliorelative or connected. Consider, for example, the relation to one's left-hand neighbour at a round dinner-table at which there are twelve people. If we call this relation $\mathrm R$ , the proper $\mathrm R$ -posterity of a person consists of all who can be reached by going round the table from right to left. This includes everybody at the table, including the person himself, since [Pg 36] twelve steps bring us back to our starting-point. Thus in such a case, though the relation "proper $\mathrm R$ -ancestor" is connected, and though $\mathrm R$ itself is an aliorelative, we do not get a series because "proper $\mathrm R$ -ancestor" is not an aliorelative. It is for this reason that we cannot say that one person comes before another with respect to the relation "right of" or to its ancestral derivative.

Reverting now to the generation of series through the relation $\mathrm R$ between consecutive terms, we see that if this method is to work, the relation "proper $\mathrm R$ -ancestor" must be an aliorelative, transitive, and connected. Under what conditions will this happen? It will always be transitive: regardless of what kind of relation $\mathrm R$ may be, "proper $\mathrm R$ -ancestor" and "proper $\mathrm R$ -ancestor" are always both transitive. However, it’s only under certain conditions that it will be an aliorelative or connected. For example, consider the relation to one's left-hand neighbor at a round dinner table with twelve people. If we call this relation $\mathrm R$ , the proper $\mathrm R$ -posterity of a person includes everyone who can be reached by going around the table from right to left. This includes everyone at the table, including the person themselves, since [Pg 36] twelve steps bring us back to our starting point. Thus, in this case, even though the relation "proper $\mathrm R$ -ancestor" is connected, and although $\mathrm R$ itself is an aliorelative, we do not get a series because "proper $\mathrm R$ -ancestor" is not an aliorelative. This is why we cannot say that one person comes before another concerning the relation "right of" or its ancestral derivative.

The above was an instance in which the ancestral relation was connected but not contained in diversity. An instance where it is contained in diversity but not connected is derived from the ordinary sense of the word "ancestor." If $x$ is a proper ancestor of $y$ , $x$ and $y$ cannot be the same person; but it is not true that of any two persons one must be an ancestor of the other.

The example above shows a case where the ancestral relationship is connected but not found in diversity. A case where it is found in diversity but not connected comes from the common understanding of the term "ancestor." If $x$ is a valid ancestor of $y$ , then $x$ and $y$ cannot be the same individual; however, it is not necessarily true that one of any two individuals must be an ancestor of the other.

The question of the circumstances under which series can be generated by ancestral relations derived from relations of consecutiveness is often important. Some of the most important cases are the following: Let $\mathrm R$ be a many-one relation, and let us confine our attention to the posterity of some term $x$ . When so confined, the relation "proper $\mathrm R$ -ancestor" must be connected; therefore all that remains to ensure its being serial is that it shall be contained in diversity. This is a generalisation of the instance of the dinner-table. Another generalisation consists in taking $\mathrm R$ to be a one-one relation, and including the ancestry of $x$ as well as the posterity. Here again, the one condition required to secure the generation of a series is that the relation "proper $\mathrm R$ -ancestor" shall be contained in diversity.

The question of the situations in which series can be created by ancestral relationships based on connections of consecutiveness is often significant. Some of the key cases are the following: Let $\mathrm R$ be a one-to-many relation, and let’s focus on the descendants of a certain term $x$ . When we do this, the relation "proper $\mathrm R$ -ancestor" must be connected; thus, the only requirement to ensure its being serial is that it is contained in diversity. This is a generalization of the example of the dinner table. Another generalization involves taking $\mathrm R$ as a one-to-one relation, including the ancestry of $x$ along with the descendants. Again, the one condition needed to ensure the generation of a series is that the relation "proper $\mathrm R$ -ancestor" must be contained in diversity.

The generation of order by means of relations of consecutiveness, though important in its own sphere, is less general than the method which uses a transitive relation to define the order. It often happens in a series that there are an infinite number of intermediate terms between any two that may be selected, however near together these may be. Take, for instance, fractions in order of magnitude. Between any two fractions there are others—for example, the arithmetic mean of the two. Consequently there is no such thing as a pair of consecutive fractions. If we depended [Pg 37] upon consecutiveness for defining order, we should not be able to define the order of magnitude among fractions. But in fact the relations of greater and less among fractions do not demand generation from relations of consecutiveness, and the relations of greater and less among fractions have the three characteristics which we need for defining serial relations. In all such cases the order must be defined by means of a transitive relation, since only such a relation is able to leap over an infinite number of intermediate terms. The method of consecutiveness, like that of counting for discovering the number of a collection, is appropriate to the finite; it may even be extended to certain infinite series, namely, those in which, though the total number of terms is infinite, the number of terms between any two is always finite; but it must not be regarded as general. Not only so, but care must be taken to eradicate from the imagination all habits of thought resulting from supposing it general. If this is not done, series in which there are no consecutive terms will remain difficult and puzzling. And such series are of vital importance for the understanding of continuity, space, time, and motion.

The generation of order through consecutive relationships, while significant in its own right, is less universally applicable than the method that uses a transitive relation to establish order. Often in a series, there can be an infinite number of intermediate terms between any two chosen terms, no matter how close they are. For instance, consider fractions in order of size. Between any two fractions, there are other fractions—like the arithmetic mean of the two. Therefore, there’s no such thing as a pair of consecutive fractions. If we relied on consecutiveness to define order, we wouldn't be able to establish the order of size among fractions. However, the relationships of greater and less among fractions do not require generation from consecutive relations, and these relationships possess the three characteristics we need for defining serial relations. In all such instances, the order must be defined using a transitive relation, since only such a relation can bypass an infinite number of intermediate terms. The method of consecutiveness, akin to counting for determining the number of a collection, is suitable for finite scenarios; it may even be applied to certain infinite series, specifically those where the total number of terms is infinite, yet the number of terms between any two is always finite; however, it shouldn’t be considered a general method. Moreover, it's crucial to eliminate any ingrained thought patterns that arise from viewing it as general. If this isn't addressed, series that lack consecutive terms will continue to be challenging and confusing. Such series are essential for understanding continuity, space, time, and motion.

There are many ways in which series may be generated, but all depend upon the finding or construction of an asymmetrical transitive connected relation. Some of these ways have considerable importance. We may take as illustrative the generation of series by means of a three-term relation which we may call "between." This method is very useful in geometry, and may serve as an introduction to relations having more than two terms; it is best introduced in connection with elementary geometry.

There are many ways to create series, but all of them rely on discovering or building an asymmetrical transitive connected relation. Some of these methods are quite significant. A good example is generating series using a three-term relation that we can refer to as "between." This approach is very helpful in geometry and can serve as a gateway to relations involving more than two terms; it is best introduced alongside basic geometry.

Given any three points on a straight line in ordinary space, there must be one of them which is between the other two. This will not be the case with the points on a circle or any other closed curve, because, given any three points on a circle, we can travel from any one to any other without passing through the third. In fact, the notion "between" is characteristic of open series—or series in the strict sense—as opposed to what may be called [Pg 38] "cyclic" series, where, as with people at the dinner-table, a sufficient journey brings us back to our starting-point. This notion of "between" may be chosen as the fundamental notion of ordinary geometry; but for the present we will only consider its application to a single straight line and to the ordering of the points on a straight line.[11] Taking any two points $a$ , $b$ , the line $(ab)$ consists of three parts (besides $a$ and $b$ themselves):

Given any three points on a straight line in regular space, at least one of them will be between the other two. This isn’t true for points on a circle or any closed curve, because if you pick any three points on a circle, you can move from one to another without going through the third. In fact, the idea of "between" is specific to open series—or series in the strict sense—contrasting with what can be called a "cyclic" series, where, like people at a dinner table, a long enough journey brings you back to where you started. We can regard the concept of "between" as foundational to ordinary geometry; however, for now, we’ll only look at how it applies to a single straight line and the arrangement of points on that line.[11] Taking any two points $a$ , $b$ , the line $(ab)$ is made up of three parts (in addition to $a$ and $b$ ) itself):

[11]Cf. Rivista di Matematica, IV. pp. 55 ff.; Principles of Mathematics, p. 394 (§ 375).

[11]See Rivista di Matematica, IV. pp. 55 and following; Principles of Mathematics, p. 394 (§ 375).

(1) Points between $a$ and $b$ .

Points between $a$ and $b$ .

(2) Points $x$ such that $a$ is between $x$ and $b$ .

Points $x$ such that $a$ is between $x$ and $b$ .

(3) Points $y$ such that $b$ is between $y$ and $a$ .

Thus the line $(ab)$ can be defined in terms of the relation "between."

Thus the line $(ab)$ can be defined based on the relation "between."

In order that this relation "between" may arrange the points of the line in an order from left to right, we need certain assumptions, namely, the following:—

In order for this relationship "between" to organize the points of the line from left to right, we need certain assumptions, specifically the following:—

(1) If anything is between $a$ and $b$ , $a$ and $b$ are not identical.

(1) If anything is between $a$ and $b$ , $a$ and $b$ are not the same.

(2) Anything between $a$ and $b$ is also between $b$ and $a$ .

(3) Anything between $a$ and $b$ is not identical with $a$ (nor, consequently, with $b$ , in virtue of (2)).

(3) Anything between $a$ and $b$ is not the same as $a$ (and, therefore, not the same as $b$ , because of (2)).

(4) If $x$ is between $a$ and $b$ , anything between $a$ and $x$ is also between $a$ and $b$ .

(5) If $x$ is between $a$ and $b$ , and $b$ is between $x$ and $y$ , then $b$ is between $a$ and $y$ .

(6) If $x$ and $y$ are between $a$ and $b$ , then either $x$ and $y$ are identical, or $x$ is between $a$ and $y$ , or $x$ is between $y$ and $b$ .

(6) If $x$ and $y$ are between $a$ and $b$ , then either $x$ and $y$ are the same, or $x$ is between $a$ and $y$ , or $x$ is between $y$ and $b$ .

(7) If $b$ is between $a$ and $x$ and also between $a$ and $y$ , then either $x$ and $y$ are identical, or $x$ is between $b$ and $y$ , or $y$ is between $b$ and $x$ .

(7) If $b$ is between $a$ and $x$ and also between $a$ and $y$ , then either $x$ and $y$ are the same, or $x$ is between $b$ and $y$ , or $y$ is between $b$ and $x$ .

These seven properties are obviously verified in the case of points on a straight line in ordinary space. Any three-term relation which verifies them gives rise to series, as may be seen from the following definitions. For the sake of definiteness, let us assume [Pg 39] that $a$ is to the left of $b$ . Then the points of the line $(ab)$ are (1) those between which and $b$ , $a$ lies—these we will call to the left of $a$ ; (2) $a$ itself; (3) those between $a$ and $b$ ; (4) $b$ itself; (5) those between which and $a$ lies $b$ —these we will call to the right of $b$ . We may now define generally that of two points $x$ , $y$ , on the line $(ab)$ , we shall say that $x$ is "to the left of" $y$ in any of the following cases:—

These seven properties are clearly demonstrated when looking at points on a straight line in regular space. Any three-term relationship that confirms these properties results in series, as shown in the definitions below. To be specific, let's assume that $a$ is to the left of $b$ . The points on the line $(ab)$ are (1) those that fall between $b$ and $a$ — we will call these to the left of $a$ itself; (3) those between $a$ and $b$

(1) When $x$ and $y$ are both to the left of $a$ , and $y$ is between $x$ and $a$ ;

(1) When $x$ and $y$ are both to the left of $a$ , and $y$ is positioned between $x$ and $a$ ;

(2) When $x$ is to the left of $a$ , and $y$ is $a$ or $b$ or between $a$ and $b$ or to the right of $b$ ;

(3) When $x$ is $a$ , and $y$ is between $a$ and $b$ or is $b$ or is to the right of $b$ ;

(4) When $x$ and $y$ are both between $a$ and $b$ , and $y$ is between $x$ and $b$ ;

(5) When $x$ is between $a$ and $b$ , and $y$ is $b$ or to the right of $b$ ;

(5) When $x$ is positioned between $a$ and $b$ , and $y$ is either $b$ or to the right of $b$ ;

(6) When $x$ is $b$ and $y$ is to the right of $b$ ;

(7) When $x$ and $y$ are both to the right of $b$ and $x$ is between $b$ and $y$ .

(7) When $x$ and $y$ are both to the right of $b$ and $x$ is positioned between $b$ and $y$ .

It will be found that, from the seven properties which we have assigned to the relation "between," it can be deduced that the relation "to the left of," as above defined, is a serial relation as we defined that term. It is important to notice that nothing in the definitions or the argument depends upon our meaning by "between" the actual relation of that name which occurs in empirical space: any three-term relation having the above seven purely formal properties will serve the purpose of the argument equally well.

It can be seen that, from the seven properties we’ve assigned to the relation "between," we can conclude that the relation "to the left of," as we defined it, is a serial relation as we outlined. It's important to note that nothing in the definitions or the argument relies on how we interpret "between" in the actual relation of that name found in real space: any three-term relation with the above seven purely formal properties will work just as well for the argument.

Cyclic order, such as that of the points on a circle, cannot be generated by means of three-term relations of "between." We need a relation of four terms, which may be called "separation of couples." The point may be illustrated by considering a journey round the world. One may go from England to New Zealand by way of Suez or by way of San Francisco; we cannot [Pg 40] say definitely that either of these two places is "between" England and New Zealand. But if a man chooses that route to go round the world, whichever way round he goes, his times in England and New Zealand are separated from each other by his times in Suez and San Francisco, and conversely. Generalising, if we take any four points on a circle, we can separate them into two couples, say $a$ and $b$ and $x$ and $y$ , such that, in order to get from $a$ to $b$ one must pass through either $x$ or $y$ , and in order to get from $x$ to $y$ one must pass through either $a$ or $b$ . Under these circumstances we say that the couple $(a, b)$ are "separated" by the couple $(x, y)$ . Out of this relation a cyclic order can be generated, in a way resembling that in which we generated an open order from "between," but somewhat more complicated.[12]

Cyclic order, like the points on a circle, can't be created using just three-term relationships of "between." We need a four-term relationship, which we can call "separation of couples." This idea can be illustrated by thinking about a journey around the world. You can travel from England to New Zealand by way of Suez or via San Francisco; we can't definitively say that either location is "between" England and New Zealand. However, if someone picks either route to travel around the world, no matter which way they go, their time in England and their time in New Zealand are separated by their time in Suez and their time in San Francisco, and vice versa. In general, if we take any four points on a circle, we can pair them into two couples, say $a$ and $b$ and $x$ and $a$ to $b$ one must go through either $x$ or $y$ and to get from $x$ to $y$ one must pass through either $a$ or $(a, b)$ are "separated" by the couple $(x, y)$ [12]

[12]Cf. Principles of Mathematics, p. 205 (§ 194), and references there given.

[12]See Principles of Mathematics, p. 205 (§ 194), and the references provided there.

The purpose of the latter half of this chapter has been to suggest the subject which one may call "generation of serial relations." When such relations have been defined, the generation of them from other relations possessing only some of the properties required for series becomes very important, especially in the philosophy of geometry and physics. But we cannot, within the limits of the present volume, do more than make the reader aware that such a subject exists. [Pg 41]

The goal of the second half of this chapter has been to introduce what we can refer to as the "generation of serial relations." Once these relations are defined, generating them from other relations that only have some of the necessary properties for series becomes crucial, particularly in the philosophy of geometry and physics. However, within the confines of this book, we can only inform the reader that such a topic exists. [Pg 41]

CHAPTER V

KINDS OF RELATIONS

A great part of the philosophy of mathematics is concerned with relations, and many different kinds of relations have different kinds of uses. It often happens that a property which belongs to all relations is only important as regards relations of certain sorts; in these cases the reader will not see the bearing of the proposition asserting such a property unless he has in mind the sorts of relations for which it is useful. For reasons of this description, as well as from the intrinsic interest of the subject, it is well to have in our minds a rough list of the more mathematically serviceable varieties of relations.

A big part of the philosophy of mathematics focuses on relationships, and various types of relationships have different uses. It's common for a property that applies to all relationships to be significant only in relation to certain types; in these cases, the reader might not grasp the implications of the statement about such a property unless they consider the types of relationships for which it is relevant. Because of this, as well as the inherent interest of the topic, it's helpful to keep a rough list of the more mathematically useful types of relationships in mind.

We dealt in the preceding chapter with a supremely important class, namely, serial relations. Each of the three properties which we combined in defining series—namely, asymmetry, transitiveness, and connexity—has its own importance. We will begin by saying something on each of these three.

We covered a really important category in the previous chapter, which is serial relations. Each of the three properties we used to define series—namely, asymmetry, transitiveness, and connexity—is significant on its own. We'll start by discussing each of these three.

Asymmetry, i.e. the property of being incompatible with the converse, is a characteristic of the very greatest interest and importance. In order to develop its functions, we will consider various examples. The relation husband is asymmetrical, and so is the relation wife; i.e. if $a$ is husband of $b$ , $b$ cannot be husband of $a$ , and similarly in the case of wife. On the other hand, the relation "spouse" is symmetrical: if $a$ is spouse of $b$ , then $b$ is spouse of $a$ . Suppose now we are given the relation spouse, and we wish to derive the relation husband. Husband is the same as male spouse or spouse of a female; thus the relation husband can [Pg 42] be derived from spouse either by limiting the domain to males or by limiting the converse to females. We see from this instance that, when a symmetrical relation is given, it is sometimes possible, without the help of any further relation, to separate it into two asymmetrical relations. But the cases where this is possible are rare and exceptional: they are cases where there are two mutually exclusive classes, say $\alpha$ and $\beta$ , such that whenever the relation holds between two terms, one of the terms is a member of $\alpha$ and the other is a member of $\beta$ —as, in the case of spouse, one term of the relation belongs to the class of males and one to the class of females. In such a case, the relation with its domain confined to $\alpha$ will be asymmetrical, and so will the relation with its domain confined to $\beta$ . But such cases are not of the sort that occur when we are dealing with series of more than two terms; for in a series, all terms, except the first and last (if these exist), belong both to the domain and to the converse domain of the generating relation, so that a relation like husband, where the domain and converse domain do not overlap, is excluded.

Asymmetry, or the quality of not being compatible with the reverse, is a property of significant interest and importance. To explore its functions, we will look at different examples. The relationship husband is asymmetrical, and the same applies to wife; in other words, if $a$ is the husband of $b$ , then $b$ cannot be the husband of $a$ , and the same is true for wife. On the flip side, the relationship "spouse" is symmetrical: if $a$ is the spouse of $b$ , then $b$ is the spouse of $a$ . Now, let's say we have the relationship spouse and want to derive the relationship husband. Husband is the same as male spouse or spouse of a female; therefore, the relationship husband can be derived from spouse by either restricting the domain to males or limiting the converse to females. This example shows that when a symmetrical relation is given, it's sometimes possible, without any additional relation, to break it down into two asymmetrical relations. However, such cases are rare and exceptional: they occur when there are two mutually exclusive classes, such as $\alpha$ and $\beta$ , meaning that whenever the relation holds between two terms, one term belongs to $\alpha$ and the other to $\beta$ —like in the case of spouse, where one term is male and the other is female. In such a case, the relation limited to $\alpha$ will be asymmetrical, as will the relation limited to $\beta$ . However, such cases don't typically arise when dealing with series of more than two terms; in a series, all terms, except the first and last (if they exist), belong to both the domain and the converse domain of the generating relation, meaning that a relation like husband, where the domain and converse domain don't overlap, is not possible.

The question how to construct relations having some useful property by means of operations upon relations which only have rudiments of the property is one of considerable importance. Transitiveness and connexity are easily constructed in many cases where the originally given relation does not possess them: for example, if $\mathrm R$ is any relation whatever, the ancestral relation derived from $\mathrm R$ by generalised induction is transitive; and if $\mathrm R$ is a many-one relation, the ancestral relation will be connected if confined to the posterity of a given term. But asymmetry is a much more difficult property to secure by construction. The method by which we derived husband from spouse is, as we have seen, not available in the most important cases, such as greater, before, to the right of, where domain and converse domain overlap. In all these cases, we can of course obtain a symmetrical relation by adding together the given relation and its converse, but we cannot pass back from this symmetrical relation to the original asymmetrical relation except by the help of some asymmetrical [Pg 43] relation. Take, for example, the relation greater: the relation greater or less—i.e. unequal—is symmetrical, but there is nothing in this relation to show that it is the sum of two asymmetrical relations. Take such a relation as "differing in shape." This is not the sum of an asymmetrical relation and its converse, since shapes do not form a single series; but there is nothing to show that it differs from "differing in magnitude" if we did not already know that magnitudes have relations of greater and less. This illustrates the fundamental character of asymmetry as a property of relations.

The question of how to build relationships with useful properties by using operations on relationships that only partially have those properties is quite important. Transitiveness and connexity can often be easily constructed in cases where the initial relationship does not have them: for instance, if $\mathrm R$ is any kind of relation, the ancestral relation derived from $\mathrm R$ using generalized induction is transitive; and if $\mathrm R$ is a many-one relation, the ancestral relation will be connected if limited to the descendants of a given term. However, creating asymmetry is much more challenging. The method we used to derive husband from spouse is not applicable in the most significant cases, such as greater, before, to the right of, where the domain and converse domain overlap. In all these instances, we can certainly obtain a symmetrical relation by combining the given relation and its converse, but we cannot revert from this symmetrical relation back to the original asymmetrical relation without some kind of asymmetrical [Pg 43] relation. For example, consider the relation greater: the relation greater or less—i.e. unequal—is symmetrical, but there is nothing in this relation to indicate that it is the result of two asymmetrical relations. Take the relation "differing in shape." This is not the combination of an asymmetrical relation and its converse, since shapes do not create a single series; but there is nothing to differentiate it from "differing in magnitude" if we didn't already know that magnitudes relate to greater and less. This highlights the essential nature of asymmetry as a property of relationships.

From the point of view of the classification of relations, being asymmetrical is a much more important characteristic than implying diversity. Asymmetrical relations imply diversity, but the converse is not the case. "Unequal," for example, implies diversity, but is symmetrical. Broadly speaking, we may say that, if we wished as far as possible to dispense with relational propositions and replace them by such as ascribed predicates to subjects, we could succeed in this so long as we confined ourselves to symmetrical relations: those that do not imply diversity, if they are transitive, may be regarded as asserting a common predicate, while those that do imply diversity may be regarded as asserting incompatible predicates. For example, consider the relation of similarity between classes, by means of which we defined numbers. This relation is symmetrical and transitive and does not imply diversity. It would be possible, though less simple than the procedure we adopted, to regard the number of a collection as a predicate of the collection: then two similar classes will be two that have the same numerical predicate, while two that are not similar will be two that have different numerical predicates. Such a method of replacing relations by predicates is formally possible (though often very inconvenient) so long as the relations concerned are symmetrical; but it is formally impossible when the relations are asymmetrical, because both sameness and difference of predicates are symmetrical. Asymmetrical relations are, we may [Pg 44] say, the most characteristically relational of relations, and the most important to the philosopher who wishes to study the ultimate logical nature of relations.

From the perspective of classifying relationships, being asymmetric is a much more significant characteristic than simply implying diversity. Asymmetric relationships imply diversity, but the reverse isn’t true. For instance, "unequal" implies diversity, but it is symmetric. Generally speaking, if we wanted to eliminate relational propositions and replace them with predicates assigned to subjects, we could do so as long as we focused on symmetric relationships: those that don’t imply diversity, if they are transitive, can be seen as asserting a common predicate, while those that do imply diversity can be seen as asserting incompatible predicates. For example, consider the similarity between classes, which we used to define numbers. This relationship is symmetric and transitive and doesn’t imply diversity. It would be possible, though less straightforward than our original method, to consider the number of a collection as a predicate of that collection: then two similar classes would be those that share the same numerical predicate, while two that are not similar would be those that have different numerical predicates. This method of replacing relationships with predicates is formally feasible (though often quite inconvenient) as long as the relationships are symmetric; however, it is formally impossible when the relationships are asymmetric because both the sameness and difference of predicates are symmetric. We can say that asymmetric relationships are, by nature, the most relational of relationships and the most crucial for philosophers who want to investigate the foundational logical nature of relations.

Another class of relations that is of the greatest use is the class of one-many relations, i.e. relations which at most one term can have to a given term. Such are father, mother, husband (except in Tibet), square of, sine of, and so on. But parent, square root, and so on, are not one-many. It is possible, formally, to replace all relations by one-many relations by means of a device. Take (say) the relation less among the inductive numbers. Given any number $n$ greater than 1, there will not be only one number having the relation less to $n$ , but we can form the whole class of numbers that are less than $n$ . This is one class, and its relation to $n$ is not shared by any other class. We may call the class of numbers that are less than $n$ the "proper ancestry" of $n$ , in the sense in which we spoke of ancestry and posterity in connection with mathematical induction. Then "proper ancestry" is a one-many relation (one-many will always be used so as to include one-one), since each number determines a single class of numbers as constituting its proper ancestry. Thus the relation less than can be replaced by being a member of the proper ancestry of. In this way a one-many relation in which the one is a class, together with membership of this class, can always formally replace a relation which is not one-many. Peano, who for some reason always instinctively conceives of a relation as one-many, deals in this way with those that are naturally not so. Reduction to one-many relations by this method, however, though possible as a matter of form, does not represent a technical simplification, and there is every reason to think that it does not represent a philosophical analysis, if only because classes must be regarded as "logical fictions." We shall therefore continue to regard one-many relations as a special kind of relations.

Another important type of relationship is the one-many relationship, meaning that at most one term can relate to a given term. Examples include father, mother, husband (except in Tibet), square of, sine of, and so on. However, terms like parent and square root are not one-many. It is formally possible to turn all relationships into one-many relationships using a method. For instance, consider the relationship "less" among whole numbers. For any number $ n $ greater than 1, there isn't just one number that has the "less" relation to $ n $; we can identify the entire set of numbers that are less than $ n $. This forms one class, and its relationship to $ n $ is unique to that class. We can refer to the set of numbers that are less than $ n $ as the "proper ancestry" of $ n $, similar to how we discussed ancestry and posterity in relation to mathematical induction. Thus, "proper ancestry" is a one-many relationship (we'll always interpret one-many to include one-one), since each number defines a single class of numbers that make up its proper ancestry. Therefore, the "less than" relation can be substituted with "being a member of the proper ancestry of." In this way, a one-many relationship, in which the one is a class alongside membership in that class, can always formally replace a relationship that isn’t one-many. Peano, who instinctively thinks of relationships as one-many, approaches those that naturally aren’t in this way. However, reducing relationships to one-many form, while formally possible, doesn't simplify things technically, and it probably doesn’t provide a philosophical analysis, mainly because we have to consider classes as "logical fictions." So, we will continue to think of one-many relationships as a specific type of relationship.

One-many relations are involved in all phrases of the form "the so-and-so of such-and-such." "The King of England," [Pg 45] "the wife of Socrates," "the father of John Stuart Mill," and so on, all describe some person by means of a one-many relation to a given term. A person cannot have more than one father, therefore "the father of John Stuart Mill" described some one person, even if we did not know whom. There is much to say on the subject of descriptions, but for the present it is relations that we are concerned with, and descriptions are only relevant as exemplifying the uses of one-many relations. It should be observed that all mathematical functions result from one-many relations: the logarithm of $x$ , the cosine of $x$ , etc., are, like the father of $x$ , terms described by means of a one-many relation (logarithm, cosine, etc.) to a given term ( $x$ ). The notion of function need not be confined to numbers, or to the uses to which mathematicians have accustomed us; it can be extended to all cases of one-many relations, and "the father of $x$ " is just as legitimately a function of which $x$ is the argument as is "the logarithm of $x$ ." Functions in this sense are descriptive functions. As we shall see later, there are functions of a still more general and more fundamental sort, namely, propositional functions; but for the present we shall confine our attention to descriptive functions, i.e. "the term having the relation $\mathrm R$ to $x$ ," or, for short, "the $\mathrm R$ of $x$ ," where $\mathrm R$ is any one-many relation.

One-to-many relationships are involved in all phrases of the form "the so-and-so of such-and-such." "The King of England," "the wife of Socrates," "the father of John Stuart Mill," and so forth, all describe a person using a one-to-many relationship to a specific term. A person can only have one father, so "the father of John Stuart Mill" refers to a specific individual, even if we don't know who it is. There's a lot to discuss about descriptions, but for now, we're focused on relationships, and descriptions are only relevant as examples of one-to-many relationships. It's important to note that all mathematical functions come from one-to-many relationships: the logarithm of $x$ , the cosine of $x$ , etc., are, like the father of $x$ , terms described through a one-to-many relationship (logarithm, cosine, etc.) to a specific term ( $x$ ). The concept of function doesn't need to be limited to numbers or the applications familiar to mathematicians; it can apply to all instances of one-to-many relationships, and "the father of $x$ is the argument as "the logarithm of $x$ descriptive functions. As we will explore later, there are even more general and fundamental functions, known as propositional functions; but for now, we will focus on descriptive functions, i.e. "the term having the relationship $\mathrm R$ to $x$ , or, for short, "the $\mathrm R$ of $\mathrm R$ is any one-to-many relationship.

It will be observed that if "the $\mathrm R$ of $x$ " is to describe a definite term, $x$ must be a term to which something has the relation $\mathrm R$ , and there must not be more than one term having the relation $\mathrm R$ to $x$ , since "the," correctly used, must imply uniqueness. Thus we may speak of "the father of $x$ " if $x$ is any human being except Adam and Eve; but we cannot speak of "the father of $x$ " if $x$ is a table or a chair or anything else that does not have a father. We shall say that the $\mathrm R$ of $x$ "exists" when there is just one term, and no more, having the relation $\mathrm R$ to $x$ . Thus if $\mathrm R$ is a one-many relation, the $\mathrm R$ of $x$ exists whenever $x$ belongs to the converse domain of $\mathrm R$ , and not otherwise. Regarding "the $\mathrm R$ of $x$ " as a function in the mathematical [Pg 46] sense, we say that $x$ is the "argument" of the function, and if $y$ is the term which has the relation $\mathrm R$ to $x$ , i.e. if $y$ is the $\mathrm R$ of $x$ , then $y$ is the "value" of the function for the argument $x$ . If $\mathrm R$ is a one-many relation, the range of possible arguments to the function is the converse domain of $\mathrm R$ , and the range of values is the domain. Thus the range of possible arguments to the function "the father of $x$ " is all who have fathers, i.e. the converse domain of the relation father, while the range of possible values for the function is all fathers, i.e. the domain of the relation.

It can be noted that if "the $\mathrm R$ of $x$ " is meant to describe a specific term, $x$ has to be a term that has a relation $\mathrm R$ to it, and there can't be more than one term that has the relation $\mathrm R$ to $x$ , because "the," when used correctly, implies uniqueness. Therefore, we can say "the father of $x$ " if $x$ refers to any human being other than Adam and Eve; however, we can't say "the father of $\mathrm R$ of $x$ father, while the range of possible values for the function includes all fathers, meaning the domain of the relation.

Many of the most important notions in the logic of relations are descriptive functions, for example: converse, domain, converse domain, field. Other examples will occur as we proceed.

Many of the key concepts in relational logic are descriptive functions, such as: converse, domain, converse domain, and field. We will encounter more examples as we continue.

Among one-many relations, one-one relations are a specially important class. We have already had occasion to speak of one-one relations in connection with the definition of number, but it is necessary to be familiar with them, and not merely to know their formal definition. Their formal definition may be derived from that of one-many relations: they may be defined as one-many relations which are also the converses of one-many relations, i.e. as relations which are both one-many and many-one. One-many relations may be defined as relations such that, if $x$ has the relation in question to $y$ , there is no other term $x$ ' which also has the relation to $y$ . Or, again, they may be defined as follows: Given two terms $x$ and $x$ ', the terms to which $x$ has the given relation and those to which $x$ ' has it have no member in common. Or, again, they may be defined as relations such that the relative product of one of them and its converse implies identity, where the "relative product" of two relations $\mathrm R$ and $\mathrm S$ is that relation which holds between $x$ and $z$ when there is an intermediate term $y$ , such that $x$ has the relation $\mathrm R$ to $y$ and $y$ has the relation $\mathrm s$ to $z$ . Thus, for example, if $\mathrm R$ is the relation of father to son, the relative product of $\mathrm R$ and its converse will be the relation which holds between $x$ and a man $z$ when there is a person $y$ , such that $x$ is the father of $y$ and $y$ is the son of $z$ . It is obvious that $x$ and $z$ must be [Pg 47] the same person. If, on the other hand, we take the relation of parent and child, which is not one-many, we can no longer argue that, if $x$ is a parent of $y$ and $y$ is a child of $z$ , $x$ and $z$ must be the same person, because one may be the father of $y$ and the other the mother. This illustrates that it is characteristic of one-many relations when the relative product of a relation and its converse implies identity. In the case of one-one relations this happens, and also the relative product of the converse and the relation implies identity. Given a relation $\mathrm R$ , it is convenient, if $x$ has the relation $\mathrm R$ to $y$ , to think of $y$ as being reached from $x$ by an " $\mathrm R$ -step" or an " $\mathrm R$ -vector." In the same case $x$ will be reached from $y$ by a "backward $\mathrm R$ -step." Thus we may state the characteristic of one-many relations with which we have been dealing by saying that an $\mathrm R$ -step followed by a backward $\mathrm R$ -step must bring us back to our starting-point. With other relations, this is by no means the case; for example, if $\mathrm R$ is the relation of child to parent, the relative product of $\mathrm R$ and its converse is the relation "self or brother or sister," and if $\mathrm R$ is the relation of grandchild to grandparent, the relative product of $\mathrm R$ and its converse is "self or brother or sister or first cousin." It will be observed that the relative product of two relations is not in general commutative, i.e. the relative product of $\mathrm R$ and $\mathrm S$ is not in general the same relation as the relative product of $\mathrm S$ and $\mathrm R$ . E.g. the relative product of parent and brother is uncle, but the relative product of brother and parent is parent.

Among one-many relations, one-one relations are a particularly important category. We've already talked about one-one relations in relation to defining numbers, but it's essential to understand them thoroughly, not just their formal definitions. Their formal definition can be derived from that of one-many relations: they can be defined as one-many relations that are also the converses of one-many relations, relations that are both one-many and many-one. One-many relations can be defined as relations such that if $x$ has the relation in question to $y$ , there is no other term $x$ ' that also has the relation to $y$ . Alternatively, they can be defined as follows: Given two terms $x$ and $x$ ', the terms to which $x$ has the given relation and those to which $x$ ' has it have no member in common. Or, they can be defined as relations such that the relative product of one of them and its converse implies identity, where the "relative product" of two relations $\mathrm R$ and $\mathrm S$ is that relation which holds between $x$ and $z$ when there is an intermediate term $y$ , such that $x$ has the relation $\mathrm R$ to $y$ and $y$ has the relation $\mathrm s$ to $z$ . For example, if $\mathrm R$ is the relation of father to son, the relative product of $\mathrm R$ and its converse will be the relation which holds between $x$ and a man $z$ when there is a person $y$ , so that $x$ is the father of $y$ and $y$ is the son of $z$ . It is clear that $x$ and $z$ must be [Pg 47] the same person. If, on the other hand, we take the relation of parent and child, which is not one-many, we can no longer conclude that if $x$ is a parent of $y$ and $y$ is a child of $z$ , $x$ and $z$ must be the same person, because one could be the father of $y$ and the other the mother. This shows that it's typical of one-many relations for the relative product of a relation and its converse to imply identity. In the case of one-one relations, this is the case, and also the relative product of the converse and the relation implies identity. Given a relation $\mathrm R$ , it helps to think of $y$ as being connected to $x$ by an " $\mathrm R$ -step" or an " $\mathrm R$ -vector." In the same context, $x$ will be reached from $y$ by a "backward $\mathrm R$ -step." Thus we can describe the characteristic of one-many relations we've been discussing by saying that an $\mathrm R$ -step followed by a backward $\mathrm R$ -step must take us back to our starting point. With other relations, this isn't always the case; for example, if $\mathrm R$ is the relation of child to parent, the relative product of $\mathrm R$ and its converse is the relation "self or brother or sister," and if $\mathrm R$ is the relation of grandchild to grandparent, the relative product of $\mathrm R$ and its converse is "self or brother or sister or first cousin." It should be noted that the relative product of two relations is not generally commutative, the relative product of $\mathrm R$ and $\mathrm S$ is not generally the same relation as the relative product of $\mathrm S$ and $\mathrm R$

One-one relations give a correlation of two classes, term for term, so that each term in either class has its correlate in the other. Such correlations are simplest to grasp when the two classes have no members in common, like the class of husbands and the class of wives; for in that case we know at once whether a term is to be considered as one from which the correlating relation $\mathrm R$ goes, or as one to which it goes. It is convenient to use the word referent for the term from which the relation goes, and the term relatum for the term to which it goes. Thus if $x$ and $y$ are husband and wife, then, with respect to the relation [Pg 48] "husband," $x$ is referent and $y$ relatum, but with respect to the relation "wife," $y$ is referent and $x$ relatum. We say that a relation and its converse have opposite "senses"; thus the "sense" of a relation that goes from $x$ to $y$ is the opposite of that of the corresponding relation from $y$ to $x$ . The fact that a relation has a "sense" is fundamental, and is part of the reason why order can be generated by suitable relations. It will be observed that the class of all possible referents to a given relation is its domain, and the class of all possible relata is its converse domain.

One-to-one relationships create a direct link between two categories, term for term, so that each term in either category corresponds to a term in the other. These connections are easiest to understand when the two categories have no overlap, like the category of husbands and the category of wives; in that case, we can immediately determine whether a term should be seen as one from which the correlating relationship $\mathrm R$ originates, or as one to which it leads. It’s useful to call the term from which the relation originates the referent, and the term to which it leads the relatum. So, if $x$ and $y$ are husband and wife, then with respect to the relationship "husband," $x$ is the referent and $y$ is the relatum, but with respect to the relationship "wife," $y$ is the referent and $x$ is the relatum. We say that a relationship and its reverse have opposite "senses"; thus, the "sense" of a relationship that goes from $x$ to $y$ is the opposite of the corresponding relationship from $y$ to $x$ . The fact that a relationship has a "sense" is fundamental and is part of the reason why order can be established through appropriate relationships. It should be noted that the class of all possible referents to a given relationship is its domain, and the class of all possible relata is its converse domain.

But it very often happens that the domain and converse domain of a one-one relation overlap. Take, for example, the first ten integers (excluding 0), and add 1 to each; thus instead of the first ten integers we now have the integers $2,\ 3,\ 4,\ 5,\ 6,\ 7,\ 8,\ 9,\ 10,\ 11.$ These are the same as those we had before, except that 1 has been cut off at the beginning and 11 has been joined on at the end. There are still ten integers: they are correlated with the previous ten by the relation of $n$ to $n + 1$ , which is a one-one relation. Or, again, instead of adding 1 to each of our original ten integers, we could have doubled each of them, thus obtaining the integers $2,\ 4,\ 6,\ 8,\ 10,\ 12,\ 14,\ 16,\ 18,\ 20.$ Here we still have five of our previous set of integers, namely, 2, 4, 6, 8, 10. The correlating relation in this case is the relation of a number to its double, which is again a one-one relation. Or we might have replaced each number by its square, thus obtaining the set $1,\ 4,\ 9,\ 16,\ 25,\ 36,\ 49,\ 64,\ 81,\ 100.$ On this occasion only three of our original set are left, namely, 1, 4, 9. Such processes of correlation may be varied endlessly.

But it often happens that the domain and the range of a one-to-one relation overlap. For example, take the first ten integers (excluding 0) and add 1 to each; now instead of the first ten integers, we have the integers $2,\ 3,\ 4,\ 5,\ 6,\ 7,\ 8,\ 9,\ 10,\ 11.$ These are the same as before, except that 1 has been removed from the beginning and 11 has been added at the end. There are still ten integers: they relate to the previous ten by the relation of $n$ to $n + 1$ , which is a one-to-one relation. Alternatively, instead of adding 1 to each of our original ten integers, we could have doubled each of them, which gives us the integers $2,\ 4,\ 6,\ 8,\ 10,\ 12,\ 14,\ 16,\ 18,\ 20.$ Here, we still have five of our original integers: 2, 4, 6, 8, and 10. The correlating relation in this case is the relation of a number to its double, which is also a one-to-one relation. Or we might have substituted each number with its square, resulting in the set $1,\ 4,\ 9,\ 16,\ 25,\ 36,\ 49,\ 64,\ 81,\ 100.$ In this case, only three of our original set remain: 1, 4, and 9. These correlation processes can be varied endlessly.

The most interesting case of the above kind is the case where our one-one relation has a converse domain which is part, but [Pg 49] not the whole, of the domain. If, instead of confining the domain to the first ten integers, we had considered the whole of the inductive numbers, the above instances would have illustrated this case. We may place the numbers concerned in two rows, putting the correlate directly under the number whose correlate it is. Thus when the correlator is the relation of $n$ to $n + 1$ , we have the two rows: $\begin{align*} &1,\ 2,\ 3,\ 4,\ 5,\ \dots\ n,\ \dots \\ &2,\ 3,\ 4,\ 5,\ 6,\ \dots\ n + 1,\ \dots. \end{align*}$ When the correlator is the relation of a number to its double, we have the two rows: $\begin{align*} &1,\ 2,\ 3,\ 4,\,\,\,\, 5,\ \dots\ n,\ \dots \\ &2,\ 4,\ 6,\ 8,\ 10,\ \dots\ 2n,\ \dots. \end{align*}$ When the correlator is the relation of a number to its square, the rows are: $\begin{align*} &1,\ 2,\ 3,\ \,4,\ \,\,\,\,5,\ \dots\ n,\ \dots \\ &1,\ 4,\ 9,\ 16,\ 25,\ \dots\ n^{2},\ \dots. \end{align*}$ In all these cases, all inductive numbers occur in the top row, and only some in the bottom row.

The most interesting example of this type is when our one-to-one relation has a converse domain that is part, but not all, of the domain. If we had looked at all the inductive numbers instead of just the first ten integers, the examples above would demonstrate this case. We can arrange the relevant numbers in two rows, placing the correlate directly under the number it corresponds to. For instance, when the correlator is the relation of $n$ to $n + 1$ , we have the two rows: $\begin{align*} &1,\ 2,\ 3,\ 4,\ 5,\ \dots\ n,\ \dots \\ &2,\ 3,\ 4,\ 5,\ 6,\ \dots\ n + 1,\ \dots. \end{align*}$ When the correlator is the relation of a number to its double, we have the two rows: $\begin{align*} &1,\ 2,\ 3,\ 4,\,\,\,\, 5,\ \dots\ n,\ \dots \\ &2,\ 4,\ 6,\ 8,\ 10,\ \dots\ 2n,\ \dots. \end{align*}$ When the correlator is the relation of a number to its square, the rows are: $\begin{align*} &1,\ 2,\ 3,\ \,4,\ \,\,\,\,5,\ \dots\ n,\ \dots \\ &1,\ 4,\ 9,\ 16,\ 25,\ \dots\ n^{2},\ \dots. \end{align*}$ In all these cases, all inductive numbers appear in the top row, while only some appear in the bottom row.

Cases of this sort, where the converse domain is a "proper part" of the domain (i.e. a part not the whole), will occupy us again when we come to deal with infinity. For the present, we wish only to note that they exist and demand consideration.

Cases like this, where the converse domain is a "proper part" of the domain (i.e. a part, not the whole), will be important when we discuss infinity. For now, we just want to acknowledge that they exist and need to be considered.

Another class of correlations which are often important is the class called "permutations," where the domain and converse domain are identical. Consider, for example, the six possible arrangements of three letters: $\begin{align*} a,\ b,\ c; \\ a,\ c,\ b; \\ b,\ c,\ a; \\ b,\ a,\ c; \\ c,\ a,\ b; \\ c,\ b,\ a. \end{align*}$ [Pg 50] Each of these can be obtained from any one of the others by means of a correlation. Take, for example, the first and last, $(a, b, c)$ and $(c, b, a)$ . Here $a$ is correlated with $c$ , $b$ with itself, and $c$ with $a$ . It is obvious that the combination of two permutations is again a permutation, i.e. the permutations of a given class form what is called a "group."

Another important type of correlation is known as "permutations," where both the original and the reverse sets are the same. For example, consider the six different arrangements of three letters: $\begin{align*} a,\ b,\ c; \\ a,\ c,\ b; \\ b,\ c,\ a; \\ b,\ a,\ c; \\ c,\ a,\ b; \\ c,\ b,\ a. \end{align*}$ [Pg 50] Each arrangement can be derived from any of the others through a correlation. Take, for instance, the first and last arrangements, $(a, b, c)$ and $(c, b, a)$ . In this case, $a$ is linked to $c$ , $b$ is linked to itself, and $c$ is linked to $a$

These various kinds of correlations have importance in various connections, some for one purpose, some for another. The general notion of one-one correlations has boundless importance in the philosophy of mathematics, as we have partly seen already, but shall see much more fully as we proceed. One of its uses will occupy us in our next chapter. [Pg 51]

These different kinds of correlations are important in various contexts, some for one reason and some for another. The overall idea of one-to-one correlations is extremely significant in the philosophy of mathematics, as we've started to explore, but we will dive deeper into it as we go along. One of its applications will be discussed in our next chapter. [Pg 51]

CHAPTER VI

SIMILARITY OF RELATIONS

WE saw in Chapter II. that two classes have the same number of terms when they are "similar," i.e. when there is a one-one relation whose domain is the one class and whose converse domain is the other. In such a case we say that there is a "one-one correlation" between the two classes.

WE saw in Chapter II that two classes have the same number of terms when they are "similar," i.e. when there is a one-to-one relationship where one class is the domain and the other is its converse. In this case, we say there is a "one-to-one correlation" between the two classes.

In the present chapter we have to define a relation between relations, which will play the same part for them that similarity of classes plays for classes. We will call this relation "similarity of relations," or "likeness" when it seems desirable to use a different word from that which we use for classes. How is likeness to be defined?

In this chapter, we need to establish a connection between relations that serves the same purpose for them as class similarity does for classes. We will refer to this connection as "similarity of relations" or "likeness" when it feels better to use a different term than the one we use for classes. How should we define likeness?

We shall employ still the notion of correlation: we shall assume that the domain of the one relation can be correlated with the domain of the other, and the converse domain with the converse domain; but that is not enough for the sort of resemblance which we desire to have between our two relations. What we desire is that, whenever either relation holds between two terms, the other relation shall hold between the correlates of these two terms. The easiest example of the sort of thing we desire is a map. When one place is north of another, the place on the map corresponding to the one is above the place on the map corresponding to the other; when one place is west of another, the place on the map corresponding to the one is to the left of the place on the map corresponding to the other; and so on. The structure of the map corresponds with that of [Pg 52] the country of which it is a map. The space-relations in the map have "likeness" to the space-relations in the country mapped. It is this kind of connection between relations that we wish to define.

We will still use the concept of correlation: we will assume that the domain of one relation can be linked to the domain of the other, and the opposite domain with the opposite domain; but that’s not enough for the kind of similarity we want to establish between our two relations. What we are looking for is that, whenever either relation applies to two terms, the other relation should also apply to the corresponding elements of those two terms. The simplest example of what we want is a map. When one location is north of another, the spot on the map that matches the first is above the spot on the map that matches the second; when one location is west of another, the spot on the map that matches the first is to the left of the spot on the map that matches the second; and so on. The layout of the map reflects that of the area it represents. The spatial relationships on the map are similar to the spatial relationships in the actual area. It’s this kind of connection between relations that we want to define.

We may, in the first place, profitably introduce a certain restriction. We will confine ourselves, in defining likeness, to such relations as have "fields," i.e. to such as permit of the formation of a single class out of the domain and the converse domain. This is not always the case. Take, for example, the relation "domain," i.e. the relation which the domain of a relation has to the relation. This relation has all classes for its domain, since every class is the domain of some relation; and it has all relations for its converse domain, since every relation has a domain. But classes and relations cannot be added together to form a new single class, because they are of different logical "types." We do not need to enter upon the difficult doctrine of types, but it is well to know when we are abstaining from entering upon it. We may say, without entering upon the grounds for the assertion, that a relation only has a "field" when it is what we call "homogeneous," i.e. when its domain and converse domain are of the same logical type; and as a rough-and-ready indication of what we mean by a "type," we may say that individuals, classes of individuals, relations between individuals, relations between classes, relations of classes to individuals, and so on, are different types. Now the notion of likeness is not very useful as applied to relations that are not homogeneous; we shall, therefore, in defining likeness, simplify our problem by speaking of the "field" of one of the relations concerned. This somewhat limits the generality of our definition, but the limitation is not of any practical importance. And having been stated, it need no longer be remembered. We may define two relations $\mathrm P$ and $\mathrm Q$ as "similar," or as having "likeness," when there is a one-one relation $\mathrm S$ whose domain is the field of $\mathrm P$ and whose converse domain is the field of $\mathrm Q$ , and which is such that, if one term has the relation $\mathrm P$ [Pg 53] to another, the correlate of the one has the relation $\mathrm Q$ to the correlate of the other, and vice versa.

We can start by introducing a specific limitation. We'll limit our definition of likeness to relations that have "fields," i.e. those that allow for the creation of a single class from both the domain and the converse domain. This isn’t always the case. For instance, consider the relation "domain," i.e. the relation between the domain of a relation and the relation itself. This relation includes all classes as its domain, since every class is the domain of some relation; and it includes all relations as its converse domain, since every relation has a domain. However, classes and relations cannot be combined to make a new single class because they belong to different logical "types." We don't need to delve into the complex concept of types, but it's good to recognize when we're choosing not to. We can state, without needing to explain how we know this, that a relation only has a "field" when it is what we call "homogeneous," i.e. when its domain and converse domain are of the same logical type; and as a rough guide to what we mean by a "type," we can say that individuals, classes of individuals, relations between individuals, relations between classes, relations of classes to individuals, and so on, are all different types. Now, the idea of likeness isn't very helpful when applied to non-homogeneous relations; therefore, in defining likeness, we'll simplify our approach by discussing the "field" of one of the relations in question. This slightly reduces the general applicability of our definition, but this limitation isn't practically significant. Once stated, it doesn’t need to be remembered. We can define two relations $\mathrm P$ and $\mathrm Q$ as "similar," or as having "likeness," when there is a one-to-one relation $\mathrm S$ whose domain is the field of $\mathrm P$ and whose converse domain is the field of $\mathrm Q$ , and where if one term has the relation $\mathrm P$ to another, the correlate of the first term has the relation $\mathrm Q$ to the correlate of the second term, and vice versa.

fig1 A figure will make this clearer. Let $x$ and $y$ be two terms having the relation $\mathrm P$ . Then there are to be two terms $z$ , $w$ , such that $x$ has the relation $\mathrm S$ to $z$ , $y$ has the relation $\mathrm S$ to $w$ , and $z$ has the relation $\mathrm Q$ to $w$ . If this happens with every pair of terms such as $x$ and $y$ , and if the converse happens with every pair of terms such as $z$ and $w$ , it is clear that for every instance in which the relation $\mathrm P$ holds there is a corresponding instance in which the relation $\mathrm Q$ holds, and vice versa; and this is what we desire to secure by our definition. We can eliminate some redundancies in the above sketch of a definition, by observing that, when the above conditions are realised, the relation $\mathrm P$ is the same as the relative product of $\mathrm S$ and $\mathrm Q$ and the converse of $\mathrm S$ , i.e. the $\mathrm P$ -step from $x$ to $y$ may be replaced by the succession of the $\mathrm S$ -step from $x$ to $z$ , the $\mathrm Q$ -step from $z$ to $w$ , and the backward $\mathrm S$ -step from $w$ to $y$ . Thus we may set up the following definitions:—

fig1 A diagram will make this clearer. Let $x$ and $y$ be two terms that have the relation $\mathrm P$ . Then there are two terms $z$ , $w$ , such that $x$ has the relation $\mathrm S$ to $z$ , $y$ has the relation $\mathrm S$ to $w$ , and $z$ has the relation $\mathrm Q$ to $w$ . If this happens with every pair of terms like $x$ and $y$ , and if the opposite happens with every pair of terms like $z$ and $w$ , it’s clear that for every situation where the relation $\mathrm P$ holds, there is a corresponding situation where the relation $\mathrm Q$ holds, and vice versa; and this is what we want to establish with our definition. We can remove some redundancies in the above outline of a definition, by noting that, when the above conditions are met, the relation $\mathrm P$ is the same as the relative product of $\mathrm S$ and $\mathrm Q$ and the converse of $\mathrm S$ , i.e. the $\mathrm P$ -step from $x$ to $y$ might be replaced by the sequence of the $\mathrm S$ -step from $x$ to $z$ , the $\mathrm Q$ -step from $z$ to $w$ and the backward $\mathrm S$ -step from $w$ to $y$ . Thus we can set up the following definitions:—

A relation $\mathrm S$ is said to be a "correlator" or an "ordinal correlator" of two relations $\mathrm P$ and $\mathrm Q$ if $\mathrm S$ is one-one, has the field of $\mathrm Q$ for its converse domain, and is such that $\mathrm P$ is the relative product of $\mathrm S$ and $\mathrm Q$ and the converse of $\mathrm S$ .

A relation $\mathrm S$ is called a "correlator" or an "ordinal correlator" of two relations $\mathrm P$ and $\mathrm Q$ if $\mathrm S$ is one-to-one, has the field of $\mathrm Q$ for its converse domain, and is such that $\mathrm P$ is the relative product of $\mathrm S$ and $\mathrm Q$ and the converse of $\mathrm S$ .

Two relations $\mathrm P$ and $\mathrm Q$ are said to be "similar," or to have "likeness," when there is at least one correlator of $\mathrm P$ and $\mathrm Q$ .

Two relations $\mathrm P$ and $\mathrm Q$ are called "similar," or have "likeness," when there is at least one correlator of $\mathrm P$ and $\mathrm Q$ .

These definitions will be found to yield what we above decided to be necessary.

These definitions will provide what we determined earlier to be essential.

It will be found that, when two relations are similar, they share all properties which do not depend upon the actual terms in their fields. For instance, if one implies diversity, so does the other; if one is transitive, so is the other; if one is connected, so is the other. Hence if one is serial, so is the other. Again, if one is one-many or one-one, the other is one-many [Pg 54] or one-one; and so on, through all the general properties of relations. Even statements involving the actual terms of the field of a relation, though they may not be true as they stand when applied to a similar relation, will always be capable of translation into statements that are analogous. We are led by such considerations to a problem which has, in mathematical philosophy, an importance by no means adequately recognised hitherto. Our problem may be stated as follows:—

It will be noted that when two relations are similar, they share all properties that aren’t dependent on the actual terms in their fields. For example, if one implies diversity, the other does too; if one is transitive, so is the other; if one is connected, the same applies to the other. Therefore, if one is serial, the other is also serial. Moreover, if one is one-to-many or one-to-one, the other will also be one-to-many or one-to-one, and this applies to all general properties of relations. Even statements that involve the actual terms of a relation's field, while they may not hold true as they are when applied to a similar relation, can always be transformed into analogous statements. These considerations lead us to a problem that is significantly important in mathematical philosophy but has not been adequately acknowledged so far. Our problem can be expressed as follows:—

Given some statement in a language of which we know the grammar and the syntax, but not the vocabulary, what are the possible meanings of such a statement, and what are the meanings of the unknown words that would make it true?

Given a statement in a language where we understand the grammar and syntax but not the vocabulary, what could the possible meanings of that statement be, and what would the meanings of the unknown words be that would make it true?

The reason that this question is important is that it represents, much more nearly than might be supposed, the state of our knowledge of nature. We know that certain scientific propositions—which, in the most advanced sciences, are expressed in mathematical symbols—are more or less true of the world, but we are very much at sea as to the interpretation to be put upon the terms which occur in these propositions. We know much more (to use, for a moment, an old-fashioned pair of terms) about the form of nature than about the matter. Accordingly, what we really know when we enunciate a law of nature is only that there is probably some interpretation of our terms which will make the law approximately true. Thus great importance attaches to the question: What are the possible meanings of a law expressed in terms of which we do not know the substantive meaning, but only the grammar and syntax? And this question is the one suggested above.

The reason this question is important is that it reflects, much more closely than one might think, the state of our understanding of nature. We know that certain scientific claims—which, in the most advanced sciences, are expressed in mathematical symbols—are more or less accurate regarding the world, but we are quite uncertain about how to interpret the terms used in these claims. We actually know a lot more (to use an old-fashioned pair of terms for a moment) about the form of nature than we do about the matter. Therefore, what we really understand when we state a law of nature is only that there is probably some interpretation of our terms that will make the law roughly accurate. Thus, a key question arises: What are the possible meanings of a law expressed in terms whose substantive meaning we don't understand, but only the grammar and syntax? And this question is the one mentioned above.

For the present we will ignore the general question, which will occupy us again at a later stage; the subject of likeness itself must first be further investigated.

For now, we'll set aside the broader question, which we will revisit later; we first need to investigate the topic of similarity more thoroughly.

Owing to the fact that, when two relations are similar, their properties are the same except when they depend upon the fields being composed of just the terms of which they are composed, it is desirable to have a nomenclature which collects [Pg 55] together all the relations that are similar to a given relation. Just as we called the set of those classes that are similar to a given class the "number" of that class, so we may call the set of all those relations that are similar to a given relation the "number" of that relation. But in order to avoid confusion with the numbers appropriate to classes, we will speak, in this case, of a "relation-number." Thus we have the following definitions:—

Since similar relations have the same properties except when they rely on the specific terms they're composed of, it's useful to have a naming system that groups together all relations similar to a given relation. Just as we referred to the set of classes that are similar to a specific class as the "number" of that class, we can also refer to the set of all relations that are similar to a particular relation as the "number" of that relation. However, to avoid confusion with the numbers associated with classes, we will use the term "relation-number" in this context. Therefore, we have the following definitions:— [Pg 55]

The "relation-number" of a given relation is the class of all those relations that are similar to the given relation.

The "relation-number" of a specific relation is the group of all relations that are similar to that given relation.

"Relation-numbers" are the set of all those classes of relations that are relation-numbers of various relations; or, what comes to the same thing, a relation number is a class of relations consisting of all those relations that are similar to one member of the class.

"Relation-numbers" are the collection of all those types of relations that correspond to various relations; in other words, a relation number is a group of relations that includes all those relations similar to one member of the group.

When it is necessary to speak of the numbers of classes in a way which makes it impossible to confuse them with relation-numbers, we shall call them "cardinal numbers." Thus cardinal numbers are the numbers appropriate to classes. These include the ordinary integers of daily life, and also certain infinite numbers, of which we shall speak later. When we speak of "numbers" without qualification, we are to be understood as meaning cardinal numbers. The definition of a cardinal number, it will be remembered, is as follows:—

When we need to talk about the number of classes in a way that can't be mixed up with relation-numbers, we'll refer to them as "cardinal numbers." So, cardinal numbers are the numbers that relate to classes. These include the regular whole numbers we use every day, and also some infinite numbers, which we will discuss later. When we mention "numbers" without any other context, we mean cardinal numbers. The definition of a cardinal number, as you may recall, is as follows:—

The "cardinal number" of a given class is the set of all those classes that are similar to the given class.

The "cardinal number" of a specific class is the group of all classes that are similar to that class.

The most obvious application of relation-numbers is to series. Two series may be regarded as equally long when they have the same relation-number. Two finite series will have the same relation-number when their fields have the same cardinal number of terms, and only then—i.e. a series of (say) 15 terms will have the same relation-number as any other series of fifteen terms, but will not have the same relation-number as a series of 14 or 16 terms, nor, of course, the same relation-number as a relation which is not serial. Thus, in the quite special case of finite series, there is parallelism between cardinal and relation-numbers. The relation-numbers applicable to series may be [Pg 56] called "serial numbers" (what are commonly called "ordinal numbers" are a sub-class of these); thus a finite serial number is determinate when we know the cardinal number of terms in the field of a series having the serial number in question. If $n$ is a finite cardinal number, the relation-number of a series which has $n$ terms is called the "ordinal" number $n$ . (There are also infinite ordinal numbers, but of them we shall speak in a later chapter.) When the cardinal number of terms in the field of a series is infinite, the relation-number of the series is not determined merely by the cardinal number, indeed an infinite number of relation-numbers exist for one infinite cardinal number, as we shall see when we come to consider infinite series. When a series is infinite, what we may call its "length," i.e. its relation-number, may vary without change in the cardinal number; but when a series is finite, this cannot happen.

The most obvious use of relation-numbers is for series. Two series can be seen as equally long when they share the same relation-number. Two finite series will have the same relation-number if they have the same number of terms, and only then—for example, a series with 15 terms will have the same relation-number as any other series with fifteen terms, but it won't have the same relation-number as a series with 14 or 16 terms, nor the same relation-number as something that isn't a series. Thus, in the specific case of finite series, there is a parallel between cardinal and relation-numbers. The relation-numbers relevant to series can be called "serial numbers" (what are usually referred to as "ordinal numbers" are a subset of these); thus, a finite serial number is defined when we know the cardinal number of terms in the field of a series with that serial number. If $n$ is a finite cardinal number, the relation-number of a series with $n$ terms is called the "ordinal" number $n$ . (There are also infinite ordinal numbers, but we'll discuss those in a later chapter.) When the cardinal number of terms in the field of a series is infinite, the relation-number of the series isn't determined just by the cardinal number; in fact, there are infinite relation-numbers for one infinite cardinal number, as we'll see when we look at infinite series. When a series is infinite, what we might call its "length," i.e. its relation-number, can change without affecting the cardinal number; but when a series is finite, this cannot happen.

We can define addition and multiplication for relation-numbers as well as for cardinal numbers, and a whole arithmetic of relation-numbers can be developed. The manner in which this is to be done is easily seen by considering the case of series. Suppose, for example, that we wish to define the sum of two non-overlapping series in such a way that the relation-number of the sum shall be capable of being defined as the sum of the relation-numbers of the two series. In the first place, it is clear that there is an order involved as between the two series: one of them must be placed before the other. Thus if $\mathrm P$ and $\mathrm Q$ are the generating relations of the two series, in the series which is their sum with $\mathrm P$ put before $\mathrm Q$ , every member of the field of $\mathrm P$ will precede every member of the field of $\mathrm Q$ . Thus the serial relation which is to be defined as the sum of $\mathrm P$ and $\mathrm Q$ is not " $\mathrm P$ or $\mathrm Q$ " simply, but " $\mathrm P$ or $\mathrm Q$ or the relation of any member of the field of $\mathrm P$ to any member of the field of $\mathrm Q$ ." Assuming that $\mathrm P$ and $\mathrm Q$ do not overlap, this relation is serial, but " $\mathrm P$ or $\mathrm Q$ " is not serial, being not connected, since it does not hold between a member of the field of $\mathrm P$ and a member of the field of $\mathrm Q$ . Thus the sum of $\mathrm P$ and $\mathrm Q$ , as above defined, is what we need in order [Pg 57] to define the sum of two relation-numbers. Similar modifications are needed for products and powers. The resulting arithmetic does not obey the commutative law: the sum or product which they are taken. But it obeys the associative law, one form of the distributive law, and two of the formal laws for powers, not only as applied to serial numbers, but as applied to relation-numbers generally. Relation-arithmetic, in fact, though recent, is a thoroughly respectable branch of mathematics.

We can define addition and multiplication for relation-numbers as well as for cardinal numbers, and a complete arithmetic system of relation-numbers can be developed. This can be easily understood by looking at the case of series. Suppose, for instance, that we want to define the sum of two non-overlapping series so that the relation-number of the sum can be defined as the sum of the relation-numbers of the two series. First, it's clear that there is an order between the two series: one must come before the other. So if $\mathrm P$ and $\mathrm Q$ are the generating relations of the two series, in the series that represents their sum, with $\mathrm P$ before $\mathrm Q$ , every member of the field of $\mathrm P$ will come before every member of the field of $\mathrm Q$ . Therefore, the serial relation that is to be defined as the sum of $\mathrm P$ and $\mathrm Q$ is not simply " $\mathrm P$ or $\mathrm P$ or $\mathrm Q$ or the relation of any member of the field of $\mathrm P$ to any member of the field of $\mathrm Q$ ." Assuming that $\mathrm P$ and $\mathrm Q$ do not overlap, this relation is serial, but " $\mathrm P$ or $\mathrm P$ and a member of the field of $\mathrm P$ and $\mathrm Q$

It must not be supposed, merely because series afford the most obvious application of the idea of likeness, that there are no other applications that are important. We have already mentioned maps, and we might extend our thoughts from this illustration to geometry generally. If the system of relations by which a geometry is applied to a certain set of terms can be brought fully into relations of likeness with a system applying to another set of terms, then the geometry of the two sets is indistinguishable from the mathematical point of view, i.e. all the propositions are the same, except for the fact that they are applied in one case to one set of terms and in the other to another. We may illustrate this by the relations of the sort that may be called "between," which we considered in Chapter IV. We there saw that, provided a three-term relation has certain formal logical properties, it will give rise to series, and may be called a "between-relation." Given any two points, we can use the between-relation to define the straight line determined by those two points; it consists of $a$ and $b$ together with all points $x$ , such that the between-relation holds between the three points $a$ , $b$ , $x$ in some order or other. It has been shown by O. Veblen that we may regard our whole space as the field of a three-term between-relation, and define our geometry by the properties we assign to our between-relation.[13] Now likeness is just as easily [Pg 58] definable between three-term relations as between two-term relations. If $\mathrm B$ and $\mathrm B$ ' are two between-relations, so that " $x\mathrm B(y, z)$ " means " $x$ is between $y$ and $z$ with respect to $\mathrm B$ ," we shall call $\mathrm S$ a correlator of $\mathrm B$ and $\mathrm B$ ' if it has the field of $\mathrm B$ ' for its converse domain, and is such that the relation $\mathrm B$ holds between three terms when $\mathrm B$ ' holds between their $\mathrm S$ -correlates, and only then. And we shall say that $\mathrm B$ is like $\mathrm B$ ' when there is at least one correlator of $\mathrm B$ with $\mathrm B$ '. The reader can easily convince himself that, if $\mathrm B$ is like $\mathrm B$ ' in this sense, there can be no difference between the geometry generated by $\mathrm B$ and that generated by $\mathrm B$ '.

It shouldn't be assumed, just because sequences provide a clear application of the idea of similarity, that there aren't other significant applications. We've already mentioned maps, and we can broaden our thinking from this example to geometry as a whole. If the relationship system by which a geometry applies to a specific set of concepts can be fully aligned with a system that applies to another set of concepts, then the geometries of the two sets are indistinguishable from a mathematical perspective, meaning all the propositions are the same, except they are applied to one set of terms in one case and to another in the other. We can illustrate this with the relationships that could be termed "between," which we examined in Chapter IV. There we established that if a three-term relationship has certain formal logical properties, it will give rise to sequences and can be called a "between-relation." Given any two points, we can use the between-relation to define the straight line determined by those points; it consists of $a$ and $b$ along with all points $x$ , where the between-relation holds among the three points $a$ , $b$ , and $x$ in some order. O. Veblen has shown that we can consider our entire space as the field of a three-term between-relation, and define our geometry through the properties we assign to our between-relation.[13] Now, similarity can be just as easily defined among three-term relations as it is among two-term relations. If $\mathrm B$ and $\mathrm B$ ' are two between-relations, so that " $x\mathrm B(y, z)$ " means " $x$ is between $y$ and $z$ with respect to $\mathrm B$ ," we will call $\mathrm S$ a correlator of $\mathrm B$ and $\mathrm B$ ' if it has the field of $\mathrm B$ ' for its converse domain, and it is true that the relation $\mathrm B$ holds among three terms when $\mathrm B$ holds among their $\mathrm S$ -correlates, and only then. We will say that $\mathrm B$ is similar to $\mathrm B$ ' when there is at least one correlator of $\mathrm B$ with $\mathrm B$ '. The reader can easily see that if $\mathrm B$ is similar to $\mathrm B$ ' in this sense, there can be no distinction between the geometry generated by $\mathrm B$ and that generated by $\mathrm B$ '.

[13]This does not apply to elliptic space, but only to spaces in which the straight line is an open series. Modern Mathematics, edited by J. W. A. Young, pp. 3-51 (monograph by O. Veblen on "The Foundations of Geometry").

[13]This only applies to spaces where straight lines form an open series, not to elliptic space. Modern Mathematics, edited by J. W. A. Young, pp. 3-51 (monograph by O. Veblen on "The Foundations of Geometry").

It follows from this that the mathematician need not concern himself with the particular being or intrinsic nature of his points, lines, and planes, even when he is speculating as an applied mathematician. We may say that there is empirical evidence of the approximate truth of such parts of geometry as are not matters of definition. But there is no empirical evidence as to what a "point" is to be. It has to be something that as nearly as possible satisfies our axioms, but it does not have to be "very small" or "without parts." Whether or not it is those things is a matter of indifference, so long as it satisfies the axioms. If we can, out of empirical material, construct a logical structure, no matter how complicated, which will satisfy our geometrical axioms, that structure may legitimately be called a "point." We must not say that there is nothing else that could legitimately be called a "point"; we must only say: "This object we have constructed is sufficient for the geometer; it may be one of many objects, any of which would be sufficient, but that is no concern of ours, since this object is enough to vindicate the empirical truth of geometry, in so far as geometry is not a matter of definition." This is only an illustration of the general principle that what matters in mathematics, and to a very great extent in physical science, is not the intrinsic nature of our terms, but the logical nature of their interrelations.

It follows that the mathematician doesn't need to worry about the specific existence or inherent nature of his points, lines, and planes, even when he is thinking as an applied mathematician. We can say that there is practical evidence supporting the approximate truth of those parts of geometry that aren't just definitions. However, there's no practical evidence about what a "point" actually is. It should be something that comes as close as possible to meeting our axioms, but it doesn't have to be "very small" or "without parts." Whether it has those qualities is unimportant, as long as it meets the axioms. If we can create a logical structure from empirical material, no matter how complex, that satisfies our geometric axioms, that structure can rightfully be called a "point." We shouldn't claim that nothing else could appropriately be termed a "point"; we should simply state: "This object we have created is adequate for the geometer; it might be one of many possible objects that would work, but that doesn't concern us, since this object is sufficient to validate the practical truth of geometry, as long as geometry isn't just about definitions." This exemplifies the broader principle that in mathematics—and to a large extent in physical science—what truly matters isn't the inherent nature of our terms, but the logical relationships between them.

We may say, of two similar relations, that they have the same [Pg 59] "structure." For mathematical purposes (though not for those of pure philosophy) the only thing of importance about a relation is the cases in which it holds, not its intrinsic nature. Just as a class may be defined by various different but co-extensive concepts—e.g. "man" and "featherless biped,"—so two relations which are conceptually different may hold in the same set of instances. An "instance" in which a relation holds is to be conceived as a couple of terms, with an order, so that one of the terms comes first and the other second; the couple is to be, of course, such that its first term has the relation in question to its second. Take (say) the relation "father": we can define what we may call the "extension" of this relation as the class of all ordered couples $(x, y)$ which are such that $x$ is the father of $y$ . From the mathematical point of view, the only thing of importance about the relation "father" is that it defines this set of ordered couples. Speaking generally, we say:

We can say that two similar relationships have the same [Pg 59] "structure." For math purposes (even though not for pure philosophy), the only important thing about a relationship is the cases in which it applies, not its inherent nature. Just like a class can be defined by various different but overlapping concepts—e.g. "man" and "featherless biped"—two relationships that are conceptually different can still apply to the same set of cases. An "instance" where a relationship holds is thought of as a pair of terms with an order, so that one term comes first and the other second; this pair must be such that the first term has the specified relationship with the second. For example, consider the relationship "father": we can define what we might call the "extension" of this relationship as the class of all ordered pairs $(x, y)$ such that $x$ is the father of $y$ . From a mathematical standpoint, the only important thing about the "father" relationship is that it defines this set of ordered pairs. In general, we say:

The "extension" of a relation is the class of those ordered couples $(x, y)$ which are such that $x$ has the relation in question to $y$ .

The "extension" of a relation is the set of ordered pairs $(x, y)$ where $x$ has the specified relation to $y$ .

We can now go a step further in the process of abstraction, and consider what we mean by "structure." Given any relation, we can, if it is a sufficiently simple one, construct a map of it. For the sake of definiteness, let us take a relation of which the extension is the following couples: $ab$ , $ac$ , $ad$ , $bc$ , $ce$ , $dc$ , $de$ , where $a$ , $b$ , $c$ , $d$ , $e$ are five terms, no matter what.

We can now take the next step in the process of abstraction and think about what we mean by "structure." For any relationship, if it's simple enough, we can create a map of it. To be clear, let’s consider a relation with the following pairs: $ab$ , $ac$ , $ad$ , $bc$ , $ce$ , $dc$ , $de$ , where $a$ , $b$ , $c$ , $d$ , $e$ are five terms, regardless of anything else.

fig2 We may make a "map" of this relation by taking five points on a plane and connecting them by arrows, as in the accompanying figure. What is revealed by the map is what we call the "structure" of the relation.

fig2 We can create a "map" of this relationship by selecting five points on a flat surface and connecting them with arrows, like in the figure shown. What the map shows us is what we refer to as the "structure" of the relationship.

It is clear that the "structure" of the relation does not depend upon the particular terms that make up the field of the relation. The field may be changed without changing the structure, and the structure may be changed without changing the field—for [Pg 60] example, if we were to add the couple $ae$ in the above illustration we should alter the structure but not the field. Two relations have the same "structure," we shall say, when the same map will do for both—or, what comes to the same thing, when either can be a map for the other (since every relation can be its own map). And that, as a moment's reflection shows, is the very same thing as what we have called "likeness." That is to say, two relations have the same structure when they have likeness, i.e. when they have the same relation-number. Thus what we defined as the "relation-number" is the very same thing as is obscurely intended by the word "structure"—a word which, important as it is, is never (so far as we know) defined in precise terms by those who use it.

It’s clear that the "structure" of a relationship doesn’t rely on the specific terms that make up that relationship. The field can change without altering the structure, and the structure can change without affecting the field— for [Pg 60] example, if we were to add the couple $ae$ in the illustration above, we would change the structure but not the field. Two relationships have the same "structure," we can say, when the same map works for both—or, looking at it another way, when either can be a map for the other (since any relationship can map to itself). And that, as a moment’s thought indicates, is exactly the same as what we’ve referred to as "likeness." In other words, two relationships have the same structure when they are alike, i.e. when they share the same relation-number. Therefore, what we defined as the "relation-number" is precisely what is vaguely implied by the term "structure"—a term that, as important as it is, has never (to our knowledge) been clearly defined by those who use it.

There has been a great deal of speculation in traditional philosophy which might have been avoided if the importance of structure, and the difficulty of getting behind it, had been realised. For example, it is often said that space and time are subjective, but they have objective counterparts; or that phenomena are subjective, but are caused by things in themselves, which must have differences inter se corresponding with the differences in the phenomena to which they give rise. Where such hypotheses are made, it is generally supposed that we can know very little about the objective counterparts. In actual fact, however, if the hypotheses as stated were correct, the objective counterparts would form a world having the same structure as the phenomenal world, and allowing us to infer from phenomena the truth of all propositions that can be stated in abstract terms and are known to be true of phenomena. If the phenomenal world has three dimensions, so must the world behind phenomena; if the phenomenal world is Euclidean, so must the other be; and so on. In short, every proposition having a communicable significance must be true of both worlds or of neither: the only difference must lie in just that essence of individuality which always eludes words and baffles description, but which, for that very reason, is irrelevant to science. Now the only purpose that philosophers [Pg 61] have in view in condemning phenomena is in order to persuade themselves and others that the real world is very different from the world of appearance. We can all sympathise with their wish to prove such a very desirable proposition, but we cannot congratulate them on their success. It is true that many of them do not assert objective counterparts to phenomena, and these escape from the above argument. Those who do assert counterparts are, as a rule, very reticent on the subject, probably because they feel instinctively that, if pursued, it will bring about too much of a rapprochement between the real and the phenomenal world. If they were to pursue the topic, they could hardly avoid the conclusions which we have been suggesting. In such ways, as well as in many others, the notion of structure or relation-number is important. [Pg 62]

There has been a lot of speculation in traditional philosophy that could have been avoided if the importance of structure and the challenge of understanding it had been recognized. For example, people often claim that space and time are subjective, but they have objective counterparts; or that phenomena are subjective but are caused by things in themselves, which must have differences corresponding to the differences in the phenomena they create. When such hypotheses are made, it is generally assumed that we can know very little about the objective counterparts. In reality, however, if the hypotheses were accurate, the objective counterparts would form a world with the same structure as the phenomenal world, allowing us to infer the truth of all statements that can be expressed in abstract terms and are known to be true of phenomena. If the phenomenal world has three dimensions, then the world beyond phenomena must also have three dimensions; if the phenomenal world is Euclidean, so must be the other one; and so on. In summary, every statement with a communicable significance must be true of both worlds or of neither: the only difference must be in that essence of individuality which always escapes words and defies description, but which, for that very reason, is irrelevant to science. The only goal philosophers have in rejecting phenomena is to convince themselves and others that the real world is very different from the world of appearances. We can all empathize with their desire to prove such a desirable proposition, but we can't applaud their success. It's true that many of them don’t claim objective counterparts to phenomena, and these avoid the above argument. Those who do claim counterparts tend to be very cautious about it, likely because they sense that, if pursued, it will lead to too much of a rapprochement between the real and the phenomenal world. If they were to delve deeper into the topic, they would almost certainly reach the conclusions we have been suggesting. In these and many other ways, the concept of structure or relation-number is significant.

CHAPTER VII

RATIONAL, REAL, AND COMPLEX NUMBERS

WE have now seen how to define cardinal numbers, and also relation-numbers, of which what are commonly called ordinal numbers are a particular species. It will be found that each of these kinds of number may be infinite just as well as finite. But neither is capable, as it stands, of the more familiar extensions of the idea of number, namely, the extensions to negative, fractional, irrational, and complex numbers. In the present chapter we shall briefly supply logical definitions of these various extensions.

WE have now seen how to define cardinal numbers and also relation-numbers, of which the commonly known ordinal numbers are a specific type. You'll find that each of these kinds of numbers can be both infinite and finite. However, neither is capable, in its current form, of the more familiar extensions of the idea of number, such as negative, fractional, irrational, and complex numbers. In this chapter, we will briefly provide logical definitions for these various extensions.

One of the mistakes that have delayed the discovery of correct definitions in this region is the common idea that each extension of number included the previous sorts as special cases. It was thought that, in dealing with positive and negative integers, the positive integers might be identified with the original signless integers. Again it was thought that a fraction whose denominator is 1 may be identified with the natural number which is its numerator. And the irrational numbers, such as the square root of 2, were supposed to find their place among rational fractions, as being greater than some of them and less than the others, so that rational and irrational numbers could be taken together as one class, called "real numbers." And when the idea of number was further extended so as to include "complex" numbers, i.e. numbers involving the square root of -1, it was thought that real numbers could be regarded as those among complex numbers in which the imaginary part (i.e. the part [Pg 63] which was a multiple of the square root of -1) was zero. All these suppositions were erroneous, and must be discarded, as we shall find, if correct definitions are to be given.

One of the mistakes that have held back the discovery of accurate definitions in this field is the common belief that each extension of numbers included the previous types as special cases. People thought that when dealing with positive and negative integers, positive integers could be considered as the original integers without signs. It was also assumed that a fraction with a denominator of 1 could be identified with the natural number that is its numerator. Additionally, irrational numbers, like the square root of 2, were thought to fit among rational fractions because they were greater than some and less than others, allowing rational and irrational numbers to be grouped together as one category called "real numbers." When the concept of numbers was further expanded to include "complex" numbers, meaning numbers that involve the square root of -1, it was believed that real numbers could be seen as those complex numbers where the imaginary part—specifically, the part that is a multiple of the square root of -1—was zero. All these assumptions were incorrect and need to be set aside, as we will discover, if we are to provide accurate definitions.

Let us begin with positive and negative integers. It is obvious on a moment's consideration that +1 and -1 must both be relations, and in fact must be each other's converses. The obvious and sufficient definition is that +1 is the relation of $n + 1$ to $n$ , and -1 is the relation of $n$ to $n + 1$ . Generally, if $m$ is any inductive number, $+m$ will be the relation of $n + m$ to $n$ (for any $n$ ), and $-m$ will be the relation of $n$ to $n + m$ . According to this definition, $+m$ is a relation which is one-one so long as $n$ is a cardinal number (finite or infinite) and $m$ is an inductive cardinal number. But $+m$ is under no circumstances capable of being identified with $m$ , which is not a relation, but a class of classes. Indeed, $+m$ is every bit as distinct from $m$ as $-m$ is.

Let’s start with positive and negative integers. It’s clear after a moment of thought that +1 and -1 must be opposites, and in fact, they must relate to each other as converses. The straightforward definition is that +1 represents the relationship of $n + 1$ to $n$ , while -1 represents the relationship of $n$ to $n + 1$ . Generally, if $m$ is any inductive number, $+m$ will represent the relation of $n + m$ to $n$ ), and $-m$ will represent the relation of $n$ to $n + m$ . According to this definition, $n$ is a cardinal number (finite or infinite) and $m$ is an inductive cardinal number. However, $+m$

Fractions are more interesting than positive or negative integers. We need fractions for many purposes, but perhaps most obviously for purposes of measurement. My friend and collaborator Dr A. N. Whitehead has developed a theory of fractions specially adapted for their application to measurement, which is set forth in Principia Mathematica.[14] But if all that is needed is to define objects having the required purely mathematical properties, this purpose can be achieved by a simpler method, which we shall here adopt. We shall define the fraction $m/n$ as being that relation which holds between two inductive numbers $x$ , $y$ when $xn = ym$ . This definition enables us to prove that $m/n$ is a one-one relation, provided neither $m$ or $n$ is zero. And of course $n/m$ is the converse relation to $m/n$ .

Fractions are more interesting than positive or negative whole numbers. We need fractions for many reasons, but most obviously for measurement. My friend and collaborator Dr. A. N. Whitehead has developed a theory of fractions specifically suited for their use in measurement, detailed in Principia Mathematica.[14] However, if the goal is simply to define objects that have the necessary mathematical properties, we can use a simpler method, which we will adopt here. We will define the fraction $m/n$ as the relationship that exists between two inductive numbers $x$ , $y$ when $xn = ym$ . This definition allows us to demonstrate that $m/n$ is a one-to-one relationship, as long as neither $m$ nor $n$ is zero. Furthermore, $n/m$ is the reverse relation of $m/n$ .

[14]Vol. III. * 300 ff., especially 303.

From the above definition it is clear that the fraction $m/1$ is that relation between two integers $x$ and $y$ which consists in the fact that $x = my$ . This relation, like the relation $+m$ , is by no means capable of being identified with the inductive cardinal number $m$ , because a relation and a class of classes are objects [Pg 64] of utterly different kinds.[15] It will be seen that $0/n$ is always the same relation, whatever inductive number $n$ may be; it is, in short, the relation of 0 to any other inductive cardinal. We may call this the zero of rational numbers; it is not, of course, identical with the cardinal number 0. Conversely, the relation $m/0$ is always the same, whatever inductive number $m$ may be. There is not any inductive cardinal to correspond to $m/0$ . We may call it "the infinity of rationals." It is an instance of the sort of infinite that is traditional in mathematics, and that is represented by " $\infty$ ." This is a totally different sort from the true Cantorian infinite, which we shall consider in our next chapter. The infinity of rationals does not demand, for its definition or use, any infinite classes or infinite integers. It is not, in actual fact, a very important notion, and we could dispense with it altogether if there were any object in doing so. The Cantorian infinite, on the other hand, is of the greatest and most fundamental importance; the understanding of it opens the way to whole new realms of mathematics and philosophy.

From the definition above, it's clear that the fraction $m/1$ represents the relationship between two integers $x$ and $y$ in which $x = my$ . This relationship, like the relationship $+m$ , cannot be confused with the inductive cardinal number $m$ , because a relationship and a class of classes are completely different kinds of objects. It can be seen that $0/n$ is always the same relationship, no matter what inductive number $n$ is; in short, it represents the relationship of 0 to any other inductive cardinal. We can call this the zero of rational numbers; however, it is not the same as the cardinal number 0. On the other hand, the relationship $m/0$ is always the same, regardless of what inductive number $m$ may be. There is no inductive cardinal that corresponds to $m/0$ . We can refer to it as "the infinity of rationals." This is a type of infinite that is traditionally recognized in mathematics, represented by " $\infty$

[15]Of course in practice we shall continue to speak of a fraction as (say) greater or less than 1, meaning greater or less than the ratio $1/1$ . So long as it is understood that the ratio $1/1$ and the cardinal number 1 are different, it is not necessary to be always pedantic in emphasising the difference.

[15]In practice, we’ll keep referring to a fraction as (for example) greater or less than 1, which means greater or less than the ratio $1/1$ . As long as it’s clear that the ratio $1/1$ and the whole number 1 are different, we don’t need to constantly be nitpicky about highlighting that difference.

It will be observed that zero and infinity, alone among ratios, are not one-one. Zero is one-many, and infinity is many-one.

It can be noted that zero and infinity, unlike other ratios, are not one-to-one. Zero represents one-to-many, and infinity represents many-to-one.

There is not any difficulty in defining greater and less among ratios (or fractions). Given two ratios $m/n$ and $p/q$ , we shall say that $m/n$ is less than $p/q$ if $mq$ is less than $pn$ . There is no difficulty in proving that the relation "less than," so defined, is serial, so that the ratios form a series in order of magnitude. In this series, zero is the smallest term and infinity is the largest. If we omit zero and infinity from our series, there is no longer any smallest or largest ratio; it is obvious that if $m/n$ is any ratio other than zero and infinity, $m/2n$ is smaller and $2m/n$ is larger, though neither is zero or infinity, so that $m/n$ is neither the smallest [Pg 65] nor the largest ratio, and therefore (when zero and infinity are omitted) there is no smallest or largest, since $m/n$ was chosen arbitrarily. In like manner we can prove that however nearly equal two fractions may be, there are always other fractions between them. For, let $m/n$ and $p/q$ be two fractions, of which $p/q$ is the greater. Then it is easy to see (or to prove) that $(m + p)/(n + q)$ will be greater than $m/n$ and less than $p/q$ . Thus the series of ratios is one in which no two terms are consecutive, but there are always other terms between any two. Since there are other terms between these others, and so on ad infinitum, it is obvious that there are an infinite number of ratios between any two, however nearly equal these two may be.[16] A series having the property that there are always other terms between any two, so that no two are consecutive, is called "compact." Thus the ratios in order of magnitude form a "compact" series. Such series have many important properties, and it is important to observe that ratios afford an instance of a compact series generated purely logically, without any appeal to space or time or any other empirical datum.

There’s no difficulty in defining greater and less among ratios (or fractions). Given two ratios $m/n$ and $p/q$ , we say that $m/n$ is less than $p/q$ if $mq$ is less than $pn$ . There’s also no difficulty in proving that the relation "less than," as defined here, is serial, meaning the ratios form a series in order of magnitude. In this series, zero is the smallest term and infinity is the largest. If we remove zero and infinity from our series, there's no longer a smallest or largest ratio; it’s clear that if $m/n$ is any ratio other than zero and infinity, $m/2n$ is smaller and $2m/n$ is larger, though neither is zero or infinity, so $m/n$ is neither the smallest nor the largest ratio. Therefore, when zero and infinity are left out, there is no smallest or largest since $m/n$ was chosen arbitrarily. Similarly, we can prove that however close two fractions may be, there are always other fractions in between them. Let $m/n$ and $p/q$ be two fractions, where $p/q$ is the greater. It’s then easy to see (or prove) that $(m + p)/(n + q)$ will be greater than $m/n$ and less than $p/q$ [16] A series that has the property where there are always other terms between any two, so that no two are consecutive, is called "compact." Thus, the ratios in order of magnitude form a "compact" series. Such series have many important properties, and it’s crucial to note that ratios provide an example of a compact series generated purely logically, without involving space, time, or any other empirical data.

[16]Strictly speaking, this statement, as well as those following to the end of the paragraph, involves what is called the "axiom of infinity," which will be discussed in a later chapter.

[16]Technically, this statement, along with those that follow through the end of the paragraph, refers to what's known as the "axiom of infinity," which will be covered in a later chapter.

Positive and negative ratios can be defined in a way analogous to that in which we defined positive and negative integers. Having first defined the sum of two ratios $m/n$ and $p/q$ as $(mq + pn)/nq$ , we define $+p/q$ as the relation of $m/n + p/q$ to $m/n$ , where $m/n$ is any ratio; and $-p/q$ is of course the converse of $+p/q$ . This is not the only possible way of defining positive and negative ratios, but it is a way which, for our purpose, has the merit of being an obvious adaptation of the way we adopted in the case of integers.

Positive and negative ratios can be defined similarly to how we defined positive and negative integers. First, we define the sum of two ratios $m/n$ and $p/q$ as $(mq + pn)/nq$ . We define $+p/q$ as the relation of $m/n + p/q$ to $m/n$ , where $m/n$ is any ratio; and $-p/q$ is obviously the opposite of $+p/q$ . This isn't the only way to define positive and negative ratios, but it clearly adapts the method we used for integers, which is suitable for our purposes.

We come now to a more interesting extension of the idea of number, i.e. the extension to what are called "real" numbers, which are the kind that embrace irrationals. In Chapter I. we had occasion to mention "incommensurables" and their discovery [Pg 66] by Pythagoras. It was through them, i.e. through geometry, that irrational numbers were first thought of. A square of which the side is one inch long will have a diagonal of which the length is the square root of 2 inches. But, as the ancients discovered, there is no fraction of which the square is 2. This proposition is proved in the tenth book of Euclid, which is one of those books that schoolboys supposed to be fortunately lost in the days when Euclid was still used as a text-book. The proof is extraordinarily simple. If possible, let $m/n$ be the square root of 2, so that $m^{2}/n^{2} = 2$ , i.e. $m^{2} = 2n^{2}$ . Thus $m^{2}$ is an even number, and therefore $m$ must be an even number, because the square of an odd number is odd. Now if $m$ is even, $m^{2}$ must divide by 4, for if $m = 2p$ , then $m^{2} = 4p^{2}$ . Thus we shall have $4p^{2} = 2n^{2}$ , where $p$ is half of $m$ . Hence $2p^{2} = n^{2}$ , and therefore $n/p$ will also be the square root of 2. But then we can repeat the argument: if $n = 2q$ , $p/q$ will also be the square root of 2, and so on, through an unending series of numbers that are each half of its predecessor. But this is impossible; if we divide a number by 2, and then halve the half, and so on, we must reach an odd number after a finite number of steps. Or we may put the argument even more simply by assuming that the $m/n$ we start with is in its lowest terms; in that case, $m$ and $n$ cannot both be even; yet we have seen that, if $m^{2}/n^{2} = 2$ , they must be. Thus there cannot be any fraction $m/n$ whose square is 2.

We now move on to a more intriguing expansion of the concept of number, namely the extension to what's known as "real" numbers, which include irrationals. In Chapter I, we referred to "incommensurables" and their discovery by Pythagoras. It was through geometry that irrational numbers were first considered. A square with a side measuring one inch will have a diagonal whose length is the square root of 2 inches. However, as the ancients found, there is no fraction whose square is 2. This statement is demonstrated in the tenth book of Euclid, a text that schoolboys at one time believed was fortunately lost when Euclid was still being used as a textbook. The proof is remarkably straightforward. Suppose that $m/n$ is the square root of 2, so that $m^{2}/n^{2} = 2$ , which means $m^{2} = 2n^{2}$ . Thus, $m^{2}$ is an even number, which means $m$ must also be even, since the square of an odd number is odd. If $m$ is even, then $m^{2}$ must be divisible by 4, because if $m = 2p$ , then $m^{2} = 4p^{2}$ . Therefore, we get $4p^{2} = 2n^{2}$ , where $p$ is half of $2p^{2} = n^{2}$ , and therefore $n/p$ will also be the square root of 2. However, we can repeat this reasoning: if $n = 2q$ , $p/q$ will also be the square root of 2, and so on, through an infinite series of numbers that are each half of the previous one. But this is impossible; if we continuously divide a number by 2, then halve that result, and so forth, we must eventually reach an odd number after a finite number of steps. Alternatively, we can simplify the argument by assuming that the $m/n$ we begin with is in its simplest form; therefore, $m$ and $n$ cannot both be even; yet we have shown that if $m^{2}/n^{2} = 2$ , they must be. Thus, there cannot be any fraction $m/n$ whose square is 2.

Thus no fraction will express exactly the length of the diagonal of a square whose side is one inch long. This seems like a challenge thrown out by nature to arithmetic. However the arithmetician may boast (as Pythagoras did) about the power of numbers, nature seems able to baffle him by exhibiting lengths which no numbers can estimate in terms of the unit. But the problem did not remain in this geometrical form. As soon as algebra was invented, the same problem arose as regards the solution of equations, though here it took on a wider form, since it also involved complex numbers.

Thus, no fraction can precisely express the length of the diagonal of a square with one-inch sides. This seems like a challenge thrown out by nature to arithmetic. However, no matter how much the arithmetician brags (as Pythagoras did) about the power of numbers, nature seems to confuse him by showing lengths that no numbers can measure in terms of the unit. But the problem didn't stay in this geometric form. Once algebra was invented, the same issue came up regarding the solution of equations, though this time it took on a broader form since it also included complex numbers.

It is clear that fractions can be found which approach nearer [Pg 67] and nearer to having their square equal to 2. We can form an ascending series of fractions all of which have their squares less than 2, but differing from 2 in their later members by less than any assigned amount. That is to say, suppose I assign some small amount in advance, say one-billionth, it will be found that all the terms of our series after a certain one, say the tenth, have squares that differ from 2 by less than this amount. And if I had assigned a still smaller amount, it might have been necessary to go further along the series, but we should have reached sooner or later a term in the series, say the twentieth, after which all terms would have had squares differing from 2 by less than this still smaller amount. If we set to work to extract the square root of 2 by the usual arithmetical rule, we shall obtain an unending decimal which, taken to so-and-so many places, exactly fulfils the above conditions. We can equally well form a descending series of fractions whose squares are all greater than 2, but greater by continually smaller amounts as we come to later terms of the series, and differing, sooner or later, by less than any assigned amount. In this way we seem to be drawing a cordon round the square root of 2, and it may seem difficult to believe that it can permanently escape us. Nevertheless, it is not by this method that we shall actually reach the square root of 2.

It’s clear that we can find fractions that get closer and closer to having their square equal to 2. We can create a series of fractions, all of which have squares less than 2, but the later fractions in the series differ from 2 by less than any chosen amount. For example, if I choose a small amount, like one-billionth, it turns out that all the fractions in our series after a certain point, let’s say the tenth one, will have squares that differ from 2 by less than this chosen amount. If I had picked an even smaller amount, we might need to go further along the series, but eventually, we would reach a term in the series, say the twentieth, after which all the terms would have squares differing from 2 by less than this still smaller amount. If we try to calculate the square root of 2 using the normal arithmetic method, we’ll get an endless decimal that, when taken to a specific number of decimal places, meets the above conditions perfectly. We can also create a descending series of fractions whose squares are all greater than 2, but with the differences getting smaller as we progress to the later terms of the series, eventually differing by less than any assigned amount. This way, it seems like we are surrounding the square root of 2, and it might feel hard to believe that it can permanently elude us. However, this isn’t the method we will use to actually find the square root of 2.

If we divide all ratios into two classes, according as their squares are less than 2 or not, we find that, among those whose squares are not less than 2, all have their squares greater than 2. There is no maximum to the ratios whose square is less than 2, and no minimum to those whose square is greater than 2. There is no lower limit short of zero to the difference between the numbers whose square is a little less than 2 and the numbers whose square is a little greater than 2. We can, in short, divide all ratios into two classes such that all the terms in one class are less than all in the other, there is no maximum to the one class, and there is no minimum to the other. Between these two classes, where $\sqrt{2}$ ought to be, there is nothing. Thus our [Pg 68] cordon, though we have drawn it as tight as possible, has been drawn in the wrong place, and has not caught $\sqrt{2}$ .

If we split all ratios into two categories based on whether their squares are less than 2 or not, we see that, among those whose squares are not less than 2, all have squares greater than 2. There’s no maximum for the ratios whose square is less than 2, and no minimum for those whose square is greater than 2. The difference between the numbers whose square is just under 2 and the numbers whose square is just over 2 has no lower limit, except for zero. In summary, we can separate all ratios into two classes where every term in one class is less than every term in the other. There’s no maximum in one class, and no minimum in the other. Between these two classes, where $\sqrt{2}$ should be, there’s nothing. So, even though we've made the division as precise as possible, we've placed it incorrectly and missed $\sqrt{2}$ . [Pg 68]

The above method of dividing all the terms of a series into two classes, of which the one wholly precedes the other, was brought into prominence by Dedekind,[17] and is therefore called a "Dedekind cut." With respect to what happens at the point of section, there are four possibilities: (1) there may be a maximum to the lower section and a minimum to the upper section, (2) there may be a maximum to the one and no minimum to the other, (3) there may be no maximum to the one, but a minimum to the other, (4) there may be neither a maximum to the one nor a minimum to the other. Of these four cases, the first is illustrated by any series in which there are consecutive terms: in the series of integers, for instance, a lower section must end with some number $n$ and the upper section must then begin with $n + 1$ . The second case will be illustrated in the series of ratios if we take as our lower section all ratios up to and including 1, and in our upper section all ratios greater than 1. The third case is illustrated if we take for our lower section all ratios less than 1, and for our upper section all ratios from 1 upward (including 1 itself). The fourth case, as we have seen, is illustrated if we put in our lower section all ratios whose square is less than 2, and in our upper section all ratios whose square is greater than 2.

The method of dividing all the terms of a series into two groups, where one completely precedes the other, was highlighted by Dedekind,[17] and is known as a "Dedekind cut." There are four possibilities regarding what happens at the point of division: (1) there may be a maximum in the lower section and a minimum in the upper section, (2) there may be a maximum in one section and no minimum in the other, (3) there may be no maximum in one section, but a minimum in the other, (4) there may be neither a maximum in one section nor a minimum in the other. The first case is shown by any series with consecutive terms: in the series of integers, for example, the lower section must end with some number $n$ and the upper section must then start with $n + 1$ . The second case is illustrated in the series of ratios if we take our lower section as all ratios up to and including 1, and our upper section as all ratios greater than 1. The third case is shown if we take our lower section as all ratios less than 1, and our upper section as all ratios from 1 upward (including 1 itself). The fourth case, as we have seen, is illustrated if we include in our lower section all ratios whose square is less than 2, and in our upper section all ratios whose square is greater than 2.

[17]Stetigkeit und irrationale Zahlen, 2nd edition, Brunswick, 1892.

[17]Continuity and Irrational Numbers, 2nd edition, Brunswick, 1892.

We may neglect the first of our four cases, since it only arises in series where there are consecutive terms. In the second of our four cases, we say that the maximum of the lower section is the lower limit of the upper section, or of any set of terms chosen out of the upper section in such a way that no term of the upper section is before all of them. In the third of our four cases, we say that the minimum of the upper section is the upper limit of the lower section, or of any set of terms chosen out of the lower section in such a way that no term of the lower section is after all of them. In the fourth case, we say that [Pg 69] there is a "gap": neither the upper section nor the lower has a limit or a last term. In this case, we may also say that we have an "irrational section," since sections of the series of ratios have "gaps" when they correspond to irrationals.

We can skip the first of our four cases since it only comes up when there are consecutive terms. In the second case, we say that the maximum of the lower section is the lower limit of the upper section, or of any group of terms taken from the upper section in such a way that none of the terms in the upper section come before all of them. In the third case, we say that the minimum of the upper section is the upper limit of the lower section, or of any group of terms taken from the lower section in such a way that none of the terms in the lower section come after all of them. In the fourth case, we say there is a "gap": neither the upper section nor the lower section has a limit or a last term. In this scenario, we can also say that we have an "irrational section," since sections of the series of ratios have "gaps" when they relate to irrationals. [Pg 69]

What delayed the true theory of irrationals was a mistaken belief that there must be "limits" of series of ratios. The notion of "limit" is of the utmost importance, and before proceeding further it will be well to define it.

What held back the true understanding of irrationals was a wrong belief that there had to be "limits" to series of ratios. The idea of "limit" is extremely important, and before moving on, it's a good idea to define it.

A term $x$ is said to be an "upper limit" of a class $\alpha$ with respect to a relation $\mathrm P$ if (1) $\alpha$ has no maximum in $\mathrm P$ , (2) every member of $\alpha$ which belongs to the field of $\mathrm P$ precedes $x$ , (3) every member of the field of $\mathrm P$ which precedes $x$ precedes some member of $\alpha$ . (By "precedes" we mean "has the relation $\mathrm P$ to.")

A term $x$ is called an "upper limit" of a class $\alpha$ regarding a relation $\mathrm P$ if (1) $\alpha$ has no maximum in $\mathrm P$ , (2) every member of $\alpha$ that is part of the field of $\mathrm P$ comes before $x$ , (3) every member of the field of $\mathrm P$ that comes before $x$ comes before some member of $\alpha$ . (By "precedes" we mean "has the relation $\mathrm P$ to.")

This presupposes the following definition of a "maximum":—

This assumes the following definition of "maximum":—

A term $x$ is said to be a "maximum" of a class $\alpha$ with respect to a relation $\mathrm P$ if $x$ is a member of $\alpha$ and of the field of $\mathrm P$ and does not have the relation $\mathrm P$ to any other member of $\alpha$ .

A term $x$ is called a "maximum" of a class $\alpha$ regarding a relation $\mathrm P$ if $x$ is a member of $\alpha$ and belongs to the field of $\mathrm P$ and does not have the relation $\mathrm P$ with any other member of $\alpha$ .

These definitions do not demand that the terms to which they are applied should be quantitative. For example, given a series of moments of time arranged by earlier and later, their "maximum" (if any) will be the last of the moments; but if they are arranged by later and earlier, their "maximum" (if any) will be the first of the moments.

These definitions don't require that the terms they describe be quantitative. For instance, if a series of moments in time is arranged from earlier to later, their "maximum" (if there is one) will be the last moment; however, if they are arranged from later to earlier, their "maximum" (if there is one) will be the first moment.

The "minimum" of a class with respect to $\mathrm P$ is its maximum with respect to the converse of $\mathrm P$ ; and the "lower limit" with respect to $\mathrm P$ is the upper limit with respect to the converse of $\mathrm P$ .

The "minimum" of a class in relation to $\mathrm P$ is its maximum in relation to the opposite of $\mathrm P$ ; and the "lower limit" in relation to $\mathrm P$ is the upper limit in relation to the opposite of $\mathrm P$ .

The notions of limit and maximum do not essentially demand that the relation in respect to which they are defined should be serial, but they have few important applications except to cases when the relation is serial or quasi-serial. A notion which is often important is the notion "upper limit or maximum," to which we may give the name "upper boundary." Thus the "upper boundary" of a set of terms chosen out of a series is their last member if they have one, but, if not, it is the first term after all of them, if there is such a term. If there is neither [Pg 70] a maximum nor a limit, there is no upper boundary. The "lower boundary" is the lower limit or minimum.

The concepts of limit and maximum don't necessarily require that the relationship they are based on be serial, but they mainly apply to situations where the relationship is serial or nearly serial. One often important idea is the "upper limit or maximum," which we can call the "upper boundary." Therefore, the "upper boundary" of a selected set of terms from a series is their last member, if there is one. If not, it's the first term that follows all of them, if such a term exists. If there's neither a maximum nor a limit, then there's no upper boundary. The "lower boundary" refers to the lower limit or minimum. [Pg 70]

Reverting to the four kinds of Dedekind section, we see that in the case of the first three kinds each section has a boundary (upper or lower as the case may be), while in the fourth kind neither has a boundary. It is also clear that, whenever the lower section has an upper boundary, the upper section has a lower boundary. In the second and third cases, the two boundaries are identical; in the first, they are consecutive terms of the series.

Revisiting the four types of Dedekind sections, we observe that for the first three types, each section has a boundary (either upper or lower, depending on the situation), while in the fourth type, neither has a boundary. It's also evident that whenever the lower section has an upper boundary, the upper section will have a lower boundary. In the second and third cases, the two boundaries are the same; in the first case, they are consecutive terms of the series.

A series is called "Dedekindian" when every section has a boundary, upper or lower as the case may be.

A series is termed "Dedekindian" when every section has a boundary, whether upper or lower, depending on the situation.

We have seen that the series of ratios in order of magnitude is not Dedekindian.

We have seen that the series of ratios arranged by size is not Dedekindian.

From the habit of being influenced by spatial imagination, people have supposed that series must have limits in cases where it seems odd if they do not. Thus, perceiving that there was no rational limit to the ratios whose square is less than 2, they allowed themselves to "postulate" an irrational limit, which was to fill the Dedekind gap. Dedekind, in the above-mentioned work, set up the axiom that the gap must always be filled, i.e. that every section must have a boundary. It is for this reason that series where his axiom is verified are called "Dedekindian." But there are an infinite number of series for which it is not verified.

From the habit of being influenced by spatial imagination, people have assumed that series must have limits in situations where it seems strange if they don’t. Therefore, realizing that there was no rational limit to the ratios whose square is less than 2, they allowed themselves to "postulate" an irrational limit to fill the Dedekind gap. Dedekind, in the previously mentioned work, established the axiom that the gap must always be filled, i.e. that every section must have a boundary. This is why series that verify his axiom are called "Dedekindian." However, there are infinitely many series for which it is not verified.

The method of "postulating" what we want has many advantages; they are the same as the advantages of theft over honest toil. Let us leave them to others and proceed with our honest toil.

The way of "assuming" what we want has a lot of benefits; they're the same benefits as stealing compared to hard work. Let's leave those benefits to others and stick to our hard work.

It is clear that an irrational Dedekind cut in some way "represents" an irrational. In order to make use of this, which to begin with is no more than a vague feeling, we must find some way of eliciting from it a precise definition; and in order to do this, we must disabuse our minds of the notion that an irrational must be the limit of a set of ratios. Just as ratios whose denominator is 1 are not identical with integers, so those rational [Pg 71] numbers which can be greater or less than irrationals, or can have irrationals as their limits, must not be identified with ratios. We have to define a new kind of numbers called "real numbers," of which some will be rational and some irrational. Those that are rational "correspond" to ratios, in the same kind of way in which the ratio $n/1$ corresponds to the integer $n$ ; but they are not the same as ratios. In order to decide what they are to be, let us observe that an irrational is represented by an irrational cut, and a cut is represented by its lower section. Let us confine ourselves to cuts in which the lower section has no maximum; in this case we will call the lower section a "segment." Then those segments that correspond to ratios are those that consist of all ratios less than the ratio they correspond to, which is their boundary; while those that represent irrationals are those that have no boundary. Segments, both those that have boundaries and those that do not, are such that, of any two pertaining to one series, one must be part of the other; hence they can all be arranged in a series by the relation of whole and part. A series in which there are Dedekind gaps, i.e. in which there are segments that have no boundary, will give rise to more segments than it has terms, since each term will define a segment having that term for boundary, and then the segments without boundaries will be extra.

It’s clear that an irrational Dedekind cut in some way "represents" an irrational number. To use this concept, which initially is just a vague feeling, we need to find a precise definition. To do this, we have to let go of the idea that an irrational number must be the limit of a set of ratios. Just as ratios with a denominator of 1 aren't the same as integers, rational numbers that can be greater or less than irrationals, or can have irrationals as their limits, shouldn’t be confused with ratios. We need to define a new category of numbers called "real numbers," some of which will be rational and some irrational. Rational numbers "correspond" to ratios, much like the ratio $n/1$ corresponds to the integer $n$ ; however, they are not identical to ratios. To determine what they should be, we note that an irrational number is represented by an irrational cut, and a cut is represented by its lower section. Let's focus on cuts where the lower section has no maximum; in this case, we call the lower section a "segment." The segments that correspond to ratios are those that include all ratios less than the ratio they correspond to, which is their boundary; while those that represent irrationals are those with no boundary. Segments, both those with boundaries and those without, are such that for any two in the same series, one must be part of the other; thus, they can all be arranged in a series based on the relationship of whole and part. A series that has Dedekind gaps, i.e. segments without boundaries, will produce more segments than it has terms since each term will define a segment with that term as its boundary, and then the segments without boundaries will be additional.

We are now in a position to define a real number and an irrational number.

We can now define a real number and an irrational number.

A "real number" is a segment of the series of ratios in order of magnitude.

A "real number" is a part of the series of ratios arranged by size.

An "irrational number" is a segment of the series of ratios which has no boundary.

An "irrational number" is a part of the series of ratios that has no limits.

A "rational real number" is a segment of the series of ratios which has a boundary.

A "rational real number" is a part of the series of ratios that has a limit.

Thus a rational real number consists of all ratios less than a certain ratio, and it is the rational real number corresponding to that ratio. The real number 1, for instance, is the class of proper fractions. [Pg 72]

A rational real number includes all ratios that are less than a specific ratio, and it represents the rational real number related to that ratio. For example, the real number 1 is the group of proper fractions. [Pg 72]

In the cases in which we naturally supposed that an irrational must be the limit of a set of ratios, the truth is that it is the limit of the corresponding set of rational real numbers in the series of segments ordered by whole and part. For example, $\sqrt{2}$ is the upper limit of all those segments of the series of ratios that correspond to ratios whose square is less than 2. More simply still, $\sqrt{2}$ is the segment consisting of all those ratios whose square is less than 2.

In cases where we naturally thought that an irrational number must be the limit of a set of ratios, the reality is that it is the limit of the corresponding set of rational real numbers in the series of segments organized by whole and part. For example, $\sqrt{2}$ is the highest limit of all those segments of the ratio series that correspond to ratios whose square is less than 2. More simply, $\sqrt{2}$ is the segment made up of all those ratios whose square is less than 2.

It is easy to prove that the series of segments of any series is Dedekindian. For, given any set of segments, their boundary will be their logical sum, i.e. the class of all those terms that belong to at least one segment of the set.[18]

It’s straightforward to show that the collection of segments from any series is Dedekindian. Because, for any group of segments, their boundary will be their logical sum, i.e. the set of all terms that belong to at least one segment in the group.[18]

[18]For a fuller treatment of the subject of segments and Dedekindian relations, see Principia Mathematica, vol. II. * 210-214. For a fuller treatment of real numbers, see ibid., vol. III. * 310 ff., and Principles of Mathematics, chaps. XXXIII. and XXXIV.

[18]For a more detailed discussion on segments and Dedekindian relations, check out Principia Mathematica, vol. II. * 210-214. For an in-depth look at real numbers, see ibid., vol. III. * 310 ff., and Principles of Mathematics, chaps. XXXIII. and XXXIV.

The above definition of real numbers is an example of "construction" as against "postulation," of which we had another example in the definition of cardinal numbers. The great advantage of this method is that it requires no new assumptions, but enables us to proceed deductively from the original apparatus of logic.

The definition of real numbers above is an example of "construction" compared to "postulation," which we saw in the definition of cardinal numbers. The big advantage of this method is that it doesn’t require any new assumptions, allowing us to proceed deductively from the original logic framework.

There is no difficulty in defining addition and multiplication for real numbers as above defined. Given two real numbers $\mu$ and $\nu$ , each being a class of ratios, take any member of $\mu$ and any member of $\nu$ and add them together according to the rule for the addition of ratios. Form the class of all such sums obtainable by varying the selected members of $\mu$ and $\nu$ . This gives a new class of ratios, and it is easy to prove that this new class is a segment of the series of ratios. We define it as the sum of $\mu$ and $\nu$ . We may state the definition more shortly as follows:—

There’s no trouble in defining addition and multiplication for real numbers as described above. Given two real numbers $\mu$ and $\nu$ , each being a class of ratios, take any member of $\mu$ and any member of $\nu$ and add them together according to the rule for adding ratios. Create the class of all such sums by varying the selected members of $\mu$ and $\mu$ and $\nu$

The arithmetical sum of two real numbers is the class of the arithmetical sums of a member of the one and a member of the other chosen in all possible ways. [Pg 73]

The arithmetical sum of two real numbers is the set of all possible sums formed by taking one member from each group and adding them together in every possible combination. [Pg 73]

We can define the arithmetical product of two real numbers in exactly the same way, by multiplying a member of the one by a member of the other in all possible ways. The class of ratios thus generated is defined as the product of the two real numbers. (In all such definitions, the series of ratios is to be defined as excluding 0 and infinity.)

We can define the arithmetic product of two real numbers in the same way, by multiplying one number by each number in the other in all possible combinations. The set of ratios generated this way is defined as the product of the two real numbers. (In all such definitions, the series of ratios is defined to exclude 0 and infinity.)

There is no difficulty in extending our definitions to positive and negative real numbers and their addition and multiplication.

There’s no trouble in expanding our definitions to include positive and negative real numbers, as well as their addition and multiplication.

It remains to give the definition of complex numbers.

It’s time to define complex numbers.

Complex numbers, though capable of a geometrical interpretation, are not demanded by geometry in the same imperative way in which irrationals are demanded. A "complex" number means a number involving the square root of a negative number, whether integral, fractional, or real. Since the square of a negative number is positive, a number whose square is to be negative has to be a new sort of number. Using the letter $i$ for the square root of $-1$ , any number involving the square root of a negative number can be expressed in the form $x + yi$ , where $x$ and $y$ are real. The part $yi$ is called the "imaginary" part of this number, $x$ being the "real" part. (The reason for the phrase "real numbers" is that they are contrasted with such as are "imaginary.") Complex numbers have been for a long time habitually used by mathematicians, in spite of the absence of any precise definition. It has been simply assumed that they would obey the usual arithmetical rules, and on this assumption their employment has been found profitable. They are required less for geometry than for algebra and analysis. We desire, for example, to be able to say that every quadratic equation has two roots, and every cubic equation has three, and so on. But if we are confined to real numbers, such an equation as $x^{2} + 1 = 0$ has no roots, and such an equation as $x^{3} - 1 = 0$ has only one. Every generalisation of number has first presented itself as needed for some simple problem: negative numbers were needed in order that subtraction might be always possible, since otherwise $a - b$ would be meaningless if $a$ were less than $b$ ; fractions were needed [Pg 74] in order that division might be always possible; and complex numbers are needed in order that extraction of roots and solution of equations may be always possible. But extensions of number are not created by the mere need for them: they are created by the definition, and it is to the definition of complex numbers that we must now turn our attention.

Complex numbers, while they can be visualized geometrically, aren't required by geometry as urgently as irrational numbers are. A "complex" number represents a number that includes the square root of a negative number, whether it's whole, fractional, or real. Given that the square of a negative number is positive, any number whose square is negative has to be a new type of number. By using the letter $i$ to denote the square root of $-1$ , we can express any number involving the square root of a negative number in the form $x + yi$ , where $x$ and $y$ are both real numbers. The part $yi$ is called the "imaginary" part, while $x$ is the "real" part. (The term "real numbers" comes from the distinction made with "imaginary" numbers.) Mathematicians have been using complex numbers routinely for a long time, even without a precise definition. It's been assumed they would follow standard arithmetic rules, and this assumption has proved useful. Complex numbers are needed more for algebra and analysis than for geometry. For example, we want to say that every quadratic equation has two roots and every cubic equation has three, and so forth. However, if we limit ourselves to real numbers, an equation like $x^{2} + 1 = 0$ has no roots, and an equation like $x^{3} - 1 = 0$ has only one. Each extension of numbers has emerged because of a particular problem: negative numbers were necessary so that subtraction could always be performed, because otherwise $a - b$ would be meaningless if $a$ was smaller than $b$ ; fractions were required so that division could always be performed, and complex numbers are necessary so that we can always extract roots and solve equations. However, number extensions aren't simply created out of need; they arise from definitions, and it is to the definition of complex numbers that we now need to focus our attention.

A complex number may be regarded and defined as simply an ordered couple of real numbers. Here, as elsewhere, many definitions are possible. All that is necessary is that the definitions adopted shall lead to certain properties. In the case of complex numbers, if they are defined as ordered couples of real numbers, we secure at once some of the properties required, namely, that two real numbers are required to determine a complex number, and that among these we can distinguish a first and a second, and that two complex numbers are only identical when the first real number involved in the one is equal to the first involved in the other, and the second to the second. What is needed further can be secured by defining the rules of addition and multiplication. We are to have $\begin{alignat*}{2} &(x + yi) + (x' + y'i) &&= (x + x') + (y + y')i, \\ &(x + yi) (x' + y'i) &&= (xx' - yy') + (xy' + x'y)i. \end{alignat*}$ Thus we shall define that, given two ordered couples of real numbers, $(x, y)$ and $(x', y')$ , their sum is to be the couple $(x + x', y + y')$ , and their product is to be the couple $(xx' - yy', xy' + x'y)$ . By these definitions we shall secure that our ordered couples shall have the properties we desire. For example, take the product of the two couples $(0, y)$ and $(0, y')$ . This will, by the above rule, be the couple $(-yy', 0)$ . Thus the square of the couple $(0, 1)$ will be the couple $(-1, 0)$ . Now those couples in which the second term is 0 are those which, according to the usual nomenclature, have their imaginary part zero; in the notation $x + yi$ , they are $x + 0i$ , which it is natural to write simply $x$ . Just as it is natural (but erroneous) to identify ratios whose denominator is unity with integers, so it is natural (but erroneous) [Pg 75] to identify complex numbers whose imaginary part is zero with real numbers. Although this is an error in theory, it is a convenience in practice; " $x + 0i$ " may be replaced simply by " $x$ " and " $0 + yi$ " by " $yi$ ," provided we remember that the " $x$ " is not really a real number, but a special case of a complex number. And when $y$ is 1, " $yi$ " may of course be replaced by " $i$ ." Thus the couple $(0, 1)$ is represented by $i$ , and the couple $(-1, 0)$ is represented by -1. Now our rules of multiplication make the square of $(0, 1)$ equal to $(-1, 0)$ , i.e. the square of $i$ is -1. This is what we desired to secure. Thus our definitions serve all necessary purposes.

A complex number can be understood as just an ordered pair of real numbers. There are many ways to define it, but what's important is that the definitions lead to specific properties. When we define complex numbers as ordered pairs of real numbers, we immediately establish some necessary properties: two real numbers are needed to define a complex number, and we can identify a first and second number in the pair. Two complex numbers are identical only if the first real number of one equals the first of the other, and the same for the second number. We can further specify things by defining how addition and multiplication work. We will have $\begin{alignat*}{2} &(x + yi) + (x' + y'i) &&= (x + x') + (y + y')i, \\ &(x + yi) (x' + y'i) &&= (xx' - yy') + (xy' + x'y)i. \end{alignat*}$ So, we define that, given two ordered pairs of real numbers, $(x, y)$ and $(x', y')$ , their sum will be the pair $(x + x', y + y')$ , and their product will be the pair $(xx' - yy', xy' + x'y)$ . With these definitions, we ensure that our ordered pairs have the properties we want. For instance, if we take the product of the pairs $(0, y)$ and $(0, y')$ . According to the above rule, this will be the pair $(-yy', 0)$ . Therefore, the square of the pair $(0, 1)$ will be the pair $(-1, 0)$ . The pairs where the second term is 0 are those which, in common terminology, have their imaginary part as zero; in the notation $x + yi$ , they are $x + 0i$ , which we typically just write as $x$ . Similarly, just as it's common (but incorrect) to identify ratios with a denominator of one as integers, it's common (but incorrect) to identify complex numbers where the imaginary part is zero with real numbers. While this is not theoretically accurate, it is practically convenient; " $x + 0i$ " can be simply replaced with " $x$ ," and " $0 + yi$ " can be replaced with " $yi$ ," as long as we remember that " $x$ " isn't truly a real number, but a specific case of a complex number. And when $y$ is 1, " $yi$ " can of course be replaced by " $i$ ." Therefore, the pair $(0, 1)$ is represented by $i$ , and the pair $(-1, 0)$ is represented by -1. Now, our multiplication rules show that the square of $(0, 1)$ equals $(-1, 0)$ , that is, the square of $i$ is -1. This is what we aimed to establish. Therefore, our definitions fulfill all necessary requirements.

It is easy to give a geometrical interpretation of complex numbers in the geometry of the plane. This subject was agreeably expounded by W. K. Clifford in his Common Sense of the Exact Sciences, a book of great merit, but written before the importance of purely logical definitions had been realised.

It’s simple to understand complex numbers geometrically in plane geometry. W. K. Clifford explained this well in his Common Sense of the Exact Sciences, an excellent book, though it was written before the importance of purely logical definitions was recognized.

Complex numbers of a higher order, though much less useful and important than those what we have been defining, have certain uses that are not without importance in geometry, as may be seen, for example, in Dr Whitehead's Universal Algebra. The definition of complex numbers of order $n$ is obtained by an obvious extension of the definition we have given. We define a complex number of order $n$ as a one-many relation whose domain consists of certain real numbers and whose converse domain consists of the integers from 1 to $n$ .[19] This is what would ordinarily be indicated by the notation $(x_{1}, x_{2}, x_{3}, \dots, x_{n})$ , where the suffixes denote correlation with the integers used as suffixes, and the correlation is one-many, not necessarily one-one, because $x_{r}$ and $x_{s}$ may be equal when $r$ and $s$ are not equal. The above definition, with a suitable rule of multiplication, will serve all purposes for which complex numbers of higher orders are needed.

Complex numbers of a higher order, although not as useful and significant as those we've been discussing, do have some importance in geometry, as seen, for instance, in Dr. Whitehead's Universal Algebra. We can define complex numbers of order $n$ by extending the definition we've already provided. A complex number of order $n$ is defined as a one-to-many relation where the domain consists of certain real numbers and the converse domain consists of the integers from 1 to $n$ .[19] This is typically represented by the notation $(x_{1}, x_{2}, x_{3}, \dots, x_{n})$ where the suffixes indicate a correlation with the integers used as suffixes. The correlation is one-to-many, not necessarily one-to-one, because $x_{r}$ and $x_{s}$ can be equal even when $r$ and $s$ are different. This definition, along with an appropriate multiplication rule, will fulfill all the requirements for using complex numbers of higher orders.

[19]Cf. Principles of Mathematics, § 360, p. 379.

[19]See Principles of Mathematics, § 360, p. 379.

We have now completed our review of those extensions of number which do not involve infinity. The application of number to infinite collections must be our next topic. [Pg 76]

We have now finished our review of the extensions of numbers that don't include infinity. Our next topic will be the application of numbers to infinite collections. [Pg 76]

CHAPTER VIII

INFINITE CARDINAL NUMBERS

THE definition of cardinal numbers which we gave in Chapter II. was applied in Chapter III. to finite numbers, i.e. to the ordinary natural numbers. To these we gave the name "inductive numbers," because we found that they are to be defined as numbers which obey mathematical induction starting from 0. But we have not yet considered collections which do not have an inductive number of terms, nor have we inquired whether such collections can be said to have a number at all. This is an ancient problem, which has been solved in our own day, chiefly by Georg Cantor. In the present chapter we shall attempt to explain the theory of transfinite or infinite cardinal numbers as it results from a combination of his discoveries with those of Frege on the logical theory of numbers.

THE definition of cardinal numbers that we presented in Chapter II was used in Chapter III for finite numbers, i.e. for the usual natural numbers. We called these "inductive numbers" because we found that they can be defined as numbers that follow mathematical induction starting from 0. However, we haven't yet looked at collections that don’t have an inductive number of terms, nor have we examined whether such collections can be considered as having a number at all. This is an age-old problem that has been addressed in our time, mainly by Georg Cantor. In this chapter, we will try to explain the theory of transfinite or infinite cardinal numbers that emerges from a blend of his findings and those of Frege on the logical theory of numbers.

It cannot be said to be certain that there are in fact any infinite collections in the world. The assumption that there are is what we call the "axiom of infinity." Although various ways suggest themselves by which we might hope to prove this axiom, there is reason to fear that they are all fallacious, and that there is no conclusive logical reason for believing it to be true. At the same time, there is certainly no logical reason against infinite collections, and we are therefore justified, in logic, in investigating the hypothesis that there are such collections. The practical form of this hypothesis, for our present purposes, is the assumption that, if $n$ is any inductive number, $n$ is not equal to $n + 1$ . Various subtleties arise in identifying this form of our assumption with [Pg 77] the form that asserts the existence of infinite collections; but we will leave these out of account until, in a later chapter, we come to consider the axiom of infinity on its own account. For the present we shall merely assume that, if $n$ is an inductive number, $n$ is not equal to $n + 1$ . This is involved in Peano's assumption that no two inductive numbers have the same successor; for, if $n = n + 1$ , then $n - 1$ and $n$ have the same successor, namely $n$ . Thus we are assuming nothing that was not involved in Peano's primitive propositions.

It can't be said for sure that there are any infinite collections in the world. The belief that there are infinite collections is known as the "axiom of infinity." Although there are several approaches we might take to prove this axiom, there's reason to worry that they might all be flawed and that there's no definitive logical reason to think it's true. At the same time, there's absolutely no logical reason against infinite collections, so it makes sense, logically, to explore the idea that such collections exist. Practically speaking, for our current purposes, this hypothesis assumes that if $n$ is any inductive number, $n$ is not equal to $n + 1$ . There are various complexities in relating this assumption to the claim of the existence of infinite collections, but we'll set those aside for now until we discuss the axiom of infinity in a later chapter. For now, we will just assume that if $n$ is an inductive number, $n$ is not equal to $n = n + 1$ were true, then $n - 1$ and $n$ would have the same successor, which is $n$

Let us now consider the collection of the inductive numbers themselves. This is a perfectly well-defined class. In the first place, a cardinal number is a set of classes which are all similar to each other and are not similar to anything except each other. We then define as the "inductive numbers" those among cardinals which belong to the posterity of 0 with respect to the relation of $n$ to $n + 1$ , i.e. those which possess every property possessed by 0 and by the successors of possessors, meaning by the "successor" of $n$ the number $n + 1$ . Thus the class of "inductive numbers" is perfectly definite. By our general definition of cardinal numbers, the number of terms in the class of inductive numbers is to be defined as "all those classes that are similar to the class of inductive numbers"—i.e. this set of classes is the number of the inductive numbers according to our definitions.

Let’s now look at the collection of inductive numbers themselves. This is a clearly defined group. First, a cardinal number is a set of classes that are all similar to one another and aren't similar to anything else except for each other. We define "inductive numbers" as those cardinal numbers that are part of the succession of 0 in relation to $n$ and $n + 1$ , meaning those that have every property of 0 and of the successors of others, where the "successor" of $n$ is the number $n + 1$ . So, the class of "inductive numbers" is precisely defined. According to our general definition of cardinal numbers, the number of terms in the inductive numbers class is defined as "all those classes that are similar to the class of inductive numbers"—in other words, this set of classes is the count of the inductive numbers based on our definitions.

Now it is easy to see that this number is not one of the inductive numbers. If $n$ is any inductive number, the number of numbers from 0 to $n$ (both included) is $n + 1$ ; therefore the total number of inductive numbers is greater than $n$ , no matter which of the inductive numbers $n$ may be. If we arrange the inductive numbers in a series in order of magnitude, this series has no last term; but if $n$ is an inductive number, every series whose field has $n$ terms has a last term, as it is easy to prove. Such differences might be multiplied ad lib. Thus the number of inductive numbers is a new number, different from all of them, not possessing all inductive properties. It may happen that 0 has a certain [Pg 78] property, and that if $n$ has it so has $n + 1$ , and yet that this new number does not have it. The difficulties that so long delayed the theory of infinite numbers were largely due to the fact that some, at least, of the inductive properties were wrongly judged to be such as must belong to all numbers; indeed it was thought that they could not be denied without contradiction. The first step in understanding infinite numbers consists in realising the mistakenness of this view.

Now it's clear that this number isn't one of the natural numbers. If $n$ is any natural number, the total count of numbers from 0 to $n$ (inclusive) is $n + 1$ ; therefore, the total number of natural numbers is greater than $n$ , regardless of which natural number $n$ it is. If we arrange the natural numbers in a series by size, this series has no final term; but if $n$ is a natural number, every series with $n$ terms has a final term, which is easy to prove. Such differences could be multiplied ad lib. Therefore, the number of natural numbers is a new number, distinct from all of them, lacking all natural properties. It's possible that 0 has a certain [Pg 78] property, and that if $n$ has it, then $n + 1$ also has it, yet this new number does not. The challenges that hindered the theory of infinite numbers for so long were mainly due to the mistaken belief that some, at least, of the natural properties must apply to all numbers; in fact, it was thought that denying them would lead to contradictions. The first step in understanding infinite numbers is realizing the error in this belief.

The most noteworthy and astonishing difference between an inductive number and this new number is that this new number is unchanged by adding 1 or subtracting 1 or doubling or halving or any of a number of other operations which we think of as necessarily making a number larger or smaller. The fact of being not altered by the addition of 1 is used by Cantor for the definition of what he calls "transfinite" cardinal numbers; but for various reasons, some of which will appear as we proceed, it is better to define an infinite cardinal number as one which does not possess all inductive properties, i.e. simply as one which is not an inductive number. Nevertheless, the property of being unchanged by the addition of 1 is a very important one, and we must dwell on it for a time.

The most remarkable and surprising difference between an inductive number and this new number is that this new number remains the same when you add 1, subtract 1, double it, halve it, or perform any of several other operations that we usually think of as making a number larger or smaller. Cantor uses the fact that adding 1 doesn’t change it to define what he calls "transfinite" cardinal numbers; however, for various reasons, some of which will become clear as we continue, it's more accurate to define an infinite cardinal number as one that doesn’t have all the inductive properties, i.e. simply as one that isn’t an inductive number. Still, the fact that it remains unchanged by adding 1 is really significant, and we need to focus on it for a while.

To say that a class has a number which is not altered by the addition of 1 is the same thing as to say that, if we take a term $x$ which does not belong to the class, we can find a one-one relation whose domain is the class and whose converse domain is obtained by adding $x$ to the class. For in that case, the class is similar to the sum of itself and the term $x$ , i.e. to a class having one extra term; so that it has the same number as a class with one extra term, so that if $n$ is this number, $n = n + 1$ . In this case, we shall also have $n = n - 1$ , i.e. there will be one-one relations whose domains consist of the whole class and whose converse domains consist of just one term short of the whole class. It can be shown that the cases in which this happens are the same as the apparently more general cases in which some part (short of the whole) can be put into one-one relation with the whole. When this can be done, [Pg 79] the correlator by which it is done may be said to "reflect" the whole class into a part of itself; for this reason, such classes will be called "reflexive." Thus:

To say that a class has a number that doesn’t change with the addition of 1 means that if we take a term $x$ that isn’t part of the class, we can find a one-to-one relation where the domain is the class and the opposite domain is formed by adding $x$ to the class. In that case, the class is similar to the total of itself and the term $x$ , i.e. to a class that has one extra term; therefore, it has the same number as a class with one additional term. If $n$ is this number, then $n = n + 1$ . In this case, we will also have $n = n - 1$ , i.e. there will be one-to-one relations where the domains are the entire class and where the opposite domains are just one term short of the entire class. It can be shown that the scenarios in which this happens are the same as the seemingly more general cases where some part (less than the whole) can be related one-to-one with the whole. When this is possible, the correlator that enables it may be said to “reflect” the whole class into a part of itself; for this reason, such classes are called “reflexive.” Thus:

A "reflexive" class is one which is similar to a proper part of itself. (A "proper part" is a part short of the whole.)

A "reflexive" class is one that is similar to a proper part of itself. (A "proper part" is a part that is not the whole.)

A "reflexive" cardinal number is the cardinal number of a reflexive class.

We have now to consider this property of reflexiveness.

We now need to look at this property of reflexiveness.

One of the most striking instances of a "reflexion" is Royce's illustration of the map: he imagines it decided to make a map of England upon a part of the surface of England. A map, if it is accurate, has a perfect one-one correspondence with its original; thus our map, which is part, is in one-one relation with the whole, and must contain the same number of points as the whole, which must therefore be a reflexive number. Royce is interested in the fact that the map, if it is correct, must contain a map of the map, which must in turn contain a map of the map of the map, and so on ad infinitum. This point is interesting, but need not occupy us at this moment. In fact, we shall do well to pass from picturesque illustrations to such as are more completely definite, and for this purpose we cannot do better than consider the number-series itself.

One of the most striking examples of a "reflection" is Royce's illustration of the map: he imagines it decided to create a map of England on a part of the surface of England. A map, if it is accurate, has a perfect one-to-one correspondence with its original; thus our map, which is a part, is in one-to-one relation with the whole, and must contain the same number of points as the whole, which must therefore be a reflexive number. Royce is interested in the idea that the map, if it is correct, must include a map of the map, which must then include a map of the map of the map, and so on ad infinitum. This idea is interesting, but we don’t need to focus on it right now. In fact, we should move from picturesque illustrations to those that are more clearly defined, and for this purpose, it would be best to consider the number series itself.

The relation of $n$ to $n + 1$ , confined to inductive numbers, is one-one, has the whole of the inductive numbers for its domain, and all except 0 for its converse domain. Thus the whole class of inductive numbers is similar to what the same class becomes when we omit 0. Consequently it is a "reflexive" class according to the definition, and the number of its terms is a "reflexive" number. Again, the relation of $n$ to $2n$ , confined to inductive numbers, is one-one, has the whole of the inductive numbers for its domain, and the even inductive numbers alone for its converse domain. Hence the total number of inductive numbers is the same as the number of even inductive numbers. This property was used by Leibniz (and many others) as a proof that infinite numbers are impossible; it was thought self-contradictory that [Pg 80] "the part should be equal to the whole." But this is one of those phrases that depend for their plausibility upon an unperceived vagueness: the word "equal" has many meanings, but if it is taken to mean what we have called "similar," there is no contradiction, since an infinite collection can perfectly well have parts similar to itself. Those who regard this as impossible have, unconsciously as a rule, attributed to numbers in general properties which can only be proved by mathematical induction, and which only their familiarity makes us regard, mistakenly, as true beyond the region of the finite.

The relationship of $n$ to $n + 1$ among natural numbers is one-to-one, has all natural numbers as its domain, and all but 0 as its inverse domain. Therefore, the entire set of natural numbers is similar to what that set becomes when we exclude 0. This makes it a "reflexive" class by definition, and the total number of its elements is a "reflexive" number. Similarly, the relationship of $n$ to $2n$ among natural numbers is one-to-one, has all natural numbers as its domain, and only the even natural numbers as its inverse domain. Thus, the total number of natural numbers is equal to the number of even natural numbers. This property was used by Leibniz (and many others) to argue that infinite numbers are impossible; it was considered contradictory that "a part should equal the whole." However, this is one of those statements that sound reasonable due to an unnoticed ambiguity: the word "equal" has many interpretations, but if we interpret it as "similar," there's no contradiction, as an infinite collection can indeed have parts similar to itself. Those who consider this impossible have, often unconsciously, attributed characteristics to numbers in general that can only be demonstrated through mathematical induction, and which our familiarity mistakenly leads us to believe are true beyond the finite realm.

Whenever we can "reflect" a class into a part of itself, the same relation will necessarily reflect that part into a smaller part, and so on ad infinitum. For example, we can reflect, as we have just seen, all the inductive numbers into the even numbers; we can, by the same relation (that of $n$ to $2n$ ) reflect the even numbers into the multiples of 4, these into the multiples of 8, and so on. This is an abstract analogue to Royce's problem of the map. The even numbers are a "map" of all the inductive numbers; the multiples of 4 are a map of the map; the multiples of 8 are a map of the map of the map; and so on. If we had applied the same process to the relation of $n$ to $n + 1$ , our "map" would have consisted of all the inductive numbers except 0; the map of the map would have consisted of all from 2 onward, the map of the map of the map of all from 3 onward; and so on. The chief use of such illustrations is in order to become familiar with the idea of reflexive classes, so that apparently paradoxical arithmetical propositions can be readily translated into the language of reflexions and classes, in which the air of paradox is much less.

Whenever we can "reflect" a class into a part of itself, that same relationship will also reflect that part into a smaller part, and so on ad infinitum. For example, as we've just seen, we can reflect all the inductive numbers into the even numbers; we can, using the same relationship (that of $n$ to $2n$ ) reflect the even numbers into the multiples of 4, then those into the multiples of 8, and so on. This is an abstract parallel to Royce's problem of the map. The even numbers are a "map" of all the inductive numbers; the multiples of 4 are a map of the map; the multiples of 8 are a map of the map of the map; and so on. If we had applied the same process to the relationship of $n$ to $n + 1$ , our "map" would have included all the inductive numbers except 0; the map of the map would have included all from 2 onward, the map of the map of the map would have included all from 3 onward; and so on. The main purpose of these illustrations is to help familiarize ourselves with the concept of reflexive classes, so that seemingly paradoxical arithmetic propositions can be easily translated into the language of reflections and classes, where the paradox seems much less pronounced.

It will be useful to give a definition of the number which is that of the inductive cardinals. For this purpose we will first define the kind of series exemplified by the inductive cardinals in order of magnitude. The kind of series which is called a "progression" has already been considered in Chapter I. It is a series which can be generated by a relation of consecutiveness: [Pg 81] every member of the series is to have a successor, but there is to be just one which has no predecessor, and every member of the series is to be in the posterity of this term with respect to the relation "immediate predecessor." These characteristics may be summed up in the following definition:[20]—

It’s helpful to define the number that represents the inductive cardinals. To do this, we'll first outline the type of series that the inductive cardinals exemplify in terms of magnitude. The type of series referred to as a "progression" was previously discussed in Chapter I. It’s a series that can be generated by a relationship of consecutiveness: [Pg 81] each member of the series should have a successor, but there should only be one that has no predecessor, and every member of the series must be a descendant of this term in relation to "immediate predecessor." These features can be summarized in the following definition:[20]—

[20]Cf. Principia Mathematica, vol. II. * 123.

[20]See Principia Mathematica, vol. II. * 123.

A "progession" is a one-one relation such that there is just one term belonging to the domain but not to the converse domain, and the domain is identical with the posterity of this one term.

A "progression" is a one-to-one relation where there is only one term in the domain that doesn't belong to the converse domain, and the domain is the same as the set of all terms that come after this one term.

It is easy to see that a progression, so defined, satisfies Peano's five axioms. The term belonging to the domain but not to the converse domain will be what he calls "0"; the term to which a term has the one-one relation will be the "successor" of the term; and the domain of the one-one relation will be what he calls "number." Taking his five axioms in turn, we have the following translations:—

It is easy to see that a progression, as defined, meets Peano's five axioms. The term in the domain but not in the converse domain is what he refers to as "0"; the term that has a one-to-one relation with another term is the "successor" of that term; and the domain of the one-to-one relation is what he calls "number." Taking his five axioms one by one, we have the following translations:—

(1) "0 is a number" becomes: "The member of the domain which is not a member of the converse domain is a member of the domain." This is equivalent to the existence of such a member, which is given in our definition. We will call this member "the first term."

(1) "0 is a number" becomes: "The member of the domain that isn’t part of the converse domain is a member of the domain." This means that such a member exists, as stated in our definition. We will refer to this member as "the first term."

(2) "The successor of any number is a number" becomes: "The term to which a given member of the domain has the relation in question is again a member of the domain." This is proved as follows: By the definition, every member of the domain is a member of the posterity of the first term; hence the successor of a member of the domain must be a member of the posterity of the first term (because the posterity of a term always contains its own successors, by the general definition of posterity), and therefore a member of the domain, because by the definition the posterity of the first term is the same as the domain.

(2) "The successor of any number is a number" becomes: "The term that relates to a given member of the domain is also a member of the domain." This is demonstrated as follows: According to the definition, every member of the domain is a member of the descendants of the first term; therefore, the successor of a member of the domain must also be a member of the descendants of the first term (since the descendants of a term always include their own successors, based on the general definition of descendants), and thus a member of the domain, because according to the definition, the descendants of the first term are the same as the domain.

(3) "No two numbers have the same successor." This is only to say that the relation is one-many, which it is by definition (being one-one). [Pg 82]

(3) "No two numbers have the same successor." This just means that the relationship is one-to-many, which it is by definition (being one-to-one). [Pg 82]

(4) "0 is not the successor of any number" becomes: "The first term is not a member of the converse domain," which is again an immediate result of the definition.

(5) This is mathematical induction, and becomes: "Every member of the domain belongs to the posterity of the first term," which was part of our definition.

(5) This is mathematical induction and it means: "Every member of the domain is a descendant of the first term," which was included in our definition.

Thus progressions as we have defined them have the five formal properties from which Peano deduces arithmetic. It is easy to show that two progessions are "similar" in the sense defined for similarity of relations in Chapter VI. We can, of course, derive a relation which is serial from the one-one relation by which we define a progression: the method used is that explained in Chapter IV., and the relation is that of a term to a member of its proper posterity with respect to the original one-one relation.

Thus, progressions, as we've defined them, have the five formal properties from which Peano derives arithmetic. It's easy to demonstrate that two progressions are "similar" in the way we've defined similarity of relations in Chapter VI. We can, of course, derive a serial relationship from the one-to-one relationship we use to define a progression: the method used is the one explained in Chapter IV, and the relationship is that of a term to a member of its proper posterity concerning the original one-to-one relationship.

Two transitive asymmetrical relations which generate progressions are similar, for the same reasons for which the corresponding one-one relations are similar. The class of all such transitive generators of progressions is a "serial number" in the sense of Chapter VI.; it is in fact the smallest of infinite serial numbers, the number to which Cantor has given the name $\omega$ , by which he has made it famous.

Two transitive asymmetrical relationships that create progressions are similar for the same reasons that the corresponding one-to-one relationships are similar. The group of all these transitive generators of progressions is a "serial number" in the sense of Chapter VI; it is actually the smallest of infinite serial numbers, the number that Cantor named $\omega$ and made well-known.

But we are concerned, for the moment, with cardinal numbers. Since two progressions are similar relations, it follows that their domains (or their fields, which are the same as their domains) are similar classes. The domains of progressions form a cardinal number, since every class which is similar to the domain of a progression is easily shown to be itself the domain of a progression. This cardinal number is the smallest of the infinite cardinal numbers; it is the one to which Cantor has appropriated the Hebrew Aleph with the suffix 0, to distinguish it from larger infinite cardinals, which have other suffixes. Thus the name of the smallest of infinite cardinals is $\aleph_{0}$ .

But for now, we're focused on cardinal numbers. Since two progressions are similar relations, it follows that their domains (or fields, which are the same as their domains) are similar classes. The domains of progressions form a cardinal number because any class that is similar to the domain of a progression can easily be shown to be the domain of a progression itself. This cardinal number is the smallest of the infinite cardinal numbers; it’s the one Cantor assigned the Hebrew Aleph with the suffix 0, to set it apart from larger infinite cardinals, which have different suffixes. So, the name of the smallest infinite cardinal is $\aleph_{0}$ .

To say that a class has $\aleph_{0}$ terms is the same thing as to say that it is a member of $\aleph_{0}$ , and this is the same thing as to say [Pg 83] that the members of the class can be arranged in a progression. It is obvious that any progression remains a progression if we omit a finite number of terms from it, or every other term, or all except every tenth term or every hundredth term. These methods of thinning out a progression do not make it cease to be a progression, and therefore do not diminish the number of its terms, which remains $\aleph_{0}$ . In fact, any selection from a progression is a progression if it has no last term, however sparsely it may be distributed. Take (say) inductive numbers of the form $n^{n}$ , or $n^{n^{n}}$ . Such numbers grow very rare in the higher parts of the number series, and yet there are just as many of them as there are inductive numbers altogether, namely, $\aleph_{0}$ .

To say that a class has $\aleph_{0}$ terms means it belongs to $\aleph_{0}$ . This is equivalent to stating that the members of the class can be arranged in a sequence. It's clear that any sequence stays a sequence if we remove a finite number of terms from it, or every other term, or all terms except every tenth or every hundredth term. These ways of narrowing down a sequence do not stop it from being a sequence, and thus do not reduce its number of terms, which remains $\aleph_{0}$ . In fact, any selection from a sequence is a sequence if it has no last term, no matter how widely spaced it may be. Consider, for instance, inductive numbers of the form $n^{n}$ , or $n^{n^{n}}$ . These numbers become increasingly rare in the higher ranges of the number series, yet there are just as many of them as there are inductive numbers overall, specifically $\aleph_{0}$ .

Conversely, we can add terms to the inductive numbers without increasing their number. Take, for example, ratios. One might be inclined to think that there must be many more ratios than integers, since ratios whose denominator is 1 correspond to the integers, and seem to be only an infinitesimal proportion of ratios. But in actual fact the number of ratios (or fractions) is exactly the same as the number of inductive numbers, namely, $\aleph_{0}$ . This is easily seen by arranging ratios in a series on the following plan: If the sum of numerator and denominator in one is less than in the other, put the one before the other; if the sum is equal in the two, put first the one with the smaller numerator. This gives us the series $1,\ 1/2,\ 2,\ 1/3,\ 3,\ 1/4,\ 2/3,\ 3/2,\ 4,\ 1/5,\ \dots.$ This series is a progression, and all ratios occur in it sooner or later. Hence we can arrange all ratios in a progression, and their number is therefore $\aleph_{0}$ .

Conversely, we can add terms to the natural numbers without increasing their quantity. For instance, consider ratios. One might think there are many more ratios than integers since ratios with a denominator of 1 correspond to integers and seem to represent only a tiny fraction of all ratios. But actually, the number of ratios (or fractions) is exactly the same as the number of natural numbers, which is $\aleph_{0}$ . This can be easily demonstrated by arranging the ratios in a series like this: If the sum of the numerator and denominator in one ratio is less than in another, place it before the other; if the sums are equal, place the one with the smaller numerator first. This gives us the series $1,\ 1/2,\ 2,\ 1/3,\ 3,\ 1/4,\ 2/3,\ 3/2,\ 4,\ 1/5,\ \dots.$ This series progresses, and all ratios will appear in it eventually. Thus, we can arrange all ratios in a progression, and their number is therefore $\aleph_{0}$ .

It is not the case, however, that all infinite collections have $\aleph_{0}$ terms. The number of real numbers, for example, is greater than $\aleph_{0}$ ; it is, in fact, $2^{\aleph_{0}}$ , and it is not hard to prove that $2^{n}$ is greater than $n$ even when $n$ is infinite. The easiest way of proving this is to prove, first, that if a class has $n$ members, it contains $2^{n}$ sub-classes—in other words, that there are $2^{n}$ ways [Pg 84] of selecting some of its members (including the extreme cases where we select all or none); and secondly, that the number of sub-classes contained in a class is always greater than the number of members of the class. Of these two propositions, the first is familiar in the case of finite numbers, and is not hard to extend to infinite numbers. The proof of the second is so simple and so instructive that we shall give it:

It’s not true that all infinite collections have $\aleph_{0}$ terms. For example, the number of real numbers is greater than $\aleph_{0}$ ; it is actually $2^{\aleph_{0}}$ , and it's easy to prove that $2^{n}$ is greater than $n$ even when $n$ is infinite. The simplest way to prove this is to show, first, that if a class has $n$ members, it contains $2^{n}$ sub-classes—in other words, that there are $2^{n}$ ways [Pg 84] to select some of its members (including the extremes where we select all or none); and secondly, that the number of sub-classes within a class is always greater than the number of members in that class. Of these two statements, the first is well-known in the case of finite numbers and is easy to extend to infinite numbers. The proof of the second is so straightforward and so insightful that we will present it:

In the first place, it is clear that the number of sub-classes of a given class (say $\alpha$ ) is at least as great as the number of members, since each member constitutes a sub-class, and we thus have a correlation of all the members with some of the sub-classes. Hence it follows that, if the number of sub-classes is not equal to the number of members, it must be greater. Now it is easy to prove that the number is not equal, by showing that, given any one-one relation whose domain is the members and whose converse domain is contained among the set of sub-classes, there must be at least one sub-class not belonging to the converse domain. The proof is as follows:[21] When a one-one correlation $\mathrm R$ is established between all the members of $\alpha$ and some of the sub-classes, it may happen that a given member $x$ is correlated with a sub-class of which it is a member; or, again, it may happen that $x$ is correlated with a sub-class of which it is not a member. Let us form the whole class, $\beta$ say, of those members $x$ which are correlated with sub-classes of which they are not members. This is a sub-class of $\alpha$ , and it is not correlated with any member of $\alpha$ . For, taking first the members of $\beta$ , each of them is (by the definition of $\beta$ ) correlated with some sub-class of which it is not a member, and is therefore not correlated with $\beta$ . Taking next the terms which are not members of $\beta$ , each of them (by the definition of $\beta$ ) is correlated with some sub-class of which it is a member, and therefore again is not correlated with $\beta$ . Thus no member of $\alpha$ is correlated with $\beta$ . Since $\mathrm R$ was any one-one correlation of all members [Pg 85] with some sub-classes, it follows that there is no correlation of all members with all sub-classes. It does not matter to the proof if $\beta$ has no members: all that happens in that case is that the sub-class which is shown to be omitted is the null-class. Hence in any case the number of sub-classes is not equal to the number of members, and therefore, by what was said earlier, it is greater. Combining this with the proposition that, if $n$ is the number of members, $2^{n}$ is the number of sub-classes, we have the theorem that $2^{n}$ is always greater than $n$ , even when $n$ is infinite.

In the first place, it's clear that the number of sub-classes in a given class (let's say $\alpha$ ) is at least as large as the number of members, since each member represents a sub-class, and we thus have a correlation of all the members with some of the sub-classes. Therefore, it follows that if the number of sub-classes is not equal to the number of members, it must be greater. Now, it’s easy to prove that the number is not equal by showing that, given any one-to-one relation whose domain consists of the members and whose converse domain is included among the set of sub-classes, there must be at least one sub-class not included in the converse domain. The proof is as follows:[21] When a one-to-one correlation $\mathrm R$ is established between all the members of $\alpha$ and some of the sub-classes, it could happen that a given member $x$ is correlated with a sub-class of which it is also a member; or, alternatively, $x$ could be correlated with a sub-class of which it is not a member. Let’s create the whole class, $\beta$ , consisting of those members $x$ that are correlated with sub-classes of which they are not members. This is a sub-class of $\alpha$ , and it is not correlated with any member of $\alpha$ . For, starting with the members of $\beta$ , each of them is (by the definition of $\beta$ ) correlated with some sub-class of which it is not a member, and therefore is not correlated with $\beta$ . Next, consider the terms that are not members of $\beta$ . Each of them (by the definition of $\beta$ ) is correlated with some sub-class of which it is a member, and therefore again is not correlated with $\beta$ . Thus no member of $\alpha$ is correlated with $\beta$ . Since $\mathrm R$ was any one-to-one correlation of all members with some sub-classes, it follows that there is no correlation of all members with all sub-classes. It doesn't matter for the proof if $\beta$ has no members: the only consequence in that case is that the sub-class that is shown to be omitted is the null-class. Therefore, in any case, the number of sub-classes is not equal to the number of members, and thus, as stated earlier, it is greater. Combining this with the idea that if $n$ is the number of members, $2^{n}$ is the number of sub-classes, we arrive at the theorem that $2^{n}$ is always greater than $n$ , even when $n$ is infinite.

[21]This proof is taken from Cantor, with some simplifications: see Jahresbericht der deutschen Mathematiker-Vereinigung, I. (1892), p. 77.

[21]This proof is derived from Cantor, with some simplifications: see Jahresbericht der deutschen Mathematiker-Vereinigung, I. (1892), p. 77.

It follows from this proposition that there is no maximum to the infinite cardinal numbers. However great an infinite number $n$ may be, $2^{n}$ will be still greater. The arithmetic of infinite numbers is somewhat surprising until one becomes accustomed to it. We have, for example, $\begin{align*} \aleph_{0} + 1 &= \aleph_{0}, \\ \aleph_{0} + n &= \aleph_{0}, \text{where $n$ is any inductive number,} \\ \aleph_{0}^{2} &= \aleph_{0}. \end{align*}$ (This follows from the case of the ratios, for, since a ratio is determined by a pair of inductive numbers, it is easy to see that the number of ratios is the square of the number of inductive numbers, i.e. it is $\aleph_{0}^{2}$ ; but we saw that it is also $\aleph_{0}$ .) $\begin{alignat*}{2} &&\llap{$\aleph_{0}^{n}$} &= \aleph_{0}, \text{where $n$ is any inductive number.} \\ &\text{(This follows from } \aleph_{0}^{2} &&= \aleph_{0} \text{ by induction; for if $\aleph_{0}^{n} = \aleph_{0}$,} \\ &\text{then} &\llap{$\aleph_{0}^{n+1}} &= \aleph_{0}^{2} = \aleph_{0}.) \end{alignat*}$ But $2^{\aleph_{0}} > \aleph_{0}.$ In fact, as we shall see later, $2^{\aleph_{0}}$ is a very important number, namely, the number of terms in a series which has "continuity" in the sense in which this word is used by Cantor. Assuming space and time to be continuous in this sense (as we commonly do in analytical geometry and kinematics), this will be the number of points in space or of instants in time; it will also be the number of points in any finite portion of space, whether [Pg 86] line, area, or volume. After $\aleph_{0}$ , $2^{\aleph_{0}}$ is the most important and interesting of infinite cardinal numbers.

It follows from this proposition that there is no maximum to the infinite cardinal numbers. No matter how large an infinite number $ n $ is, $ 2^{n} $ will be even larger. The arithmetic of infinite numbers can be surprising until you get used to it. For example, we have: $\begin{align*} \aleph_{0} + 1 &= \aleph_{0}, \\ \aleph_{0} + n &= \aleph_{0}, \text{where $n$ is any inductive number,} \\ \aleph_{0}^{2} &= \aleph_{0}. \end{align*}$ (This comes from the case of ratios, because a ratio is determined by a pair of inductive numbers, it's easy to see that the number of ratios is the square of the number of inductive numbers, meaning it is $ \aleph_{0}^{2} $; but we also found that it is $ \aleph_{0} $.) $\begin{alignat*}{2} &&\llap{$\aleph_{0}^{n}$} &= \aleph_{0}, \text{where $n$ is any inductive number.} \\ &\text{(This follows from } \aleph_{0}^{2} &&= \aleph_{0} \text{ by induction; for if $\aleph_{0}^{n} = \aleph_{0}$,} \\ &\text{then} &\llap{$\aleph_{0}^{n+1}} &= \aleph_{0}^{2} = \aleph_{0}.) \end{alignat*}$ But $2^{\aleph_{0}} > \aleph_{0}.$ In fact, as we will see later, $ 2^{\aleph_{0}} $ is a very important number, specifically, the number of terms in a series that has "continuity" in the sense that Cantor uses this word. Assuming space and time to be continuous in this way (as we usually do in analytical geometry and kinematics), this will be the number of points in space or moments in time; it will also be the number of points in any finite portion of space, whether line, area, or volume. After $ \aleph_{0} $, $ 2^{\aleph_{0}} $ is the most important and interesting of infinite cardinal numbers.

Although addition and multiplication are always possible with infinite cardinals, subtraction and division no longer give definite results, and cannot therefore be employed as they are employed in elementary arithmetic. Take subtraction to begin with: so long as the number subtracted is finite, all goes well; if the other number is reflexive, it remains unchanged. Thus $\aleph_{0} - n = \aleph_{0}$ , if $n$ is finite; so far, subtraction gives a perfectly definite result. But it is otherwise when we subtract $\aleph_{0}$ from itself; we may then get any result, from 0 up to $\aleph_{0}$ . This is easily seen by examples. From the inductive, numbers, take away the following collections of $\aleph_{0}$ terms:—

Although addition and multiplication are always possible with infinite cardinals, subtraction and division don't provide definite results and can't be used in the same way as they are in basic arithmetic. Let's start with subtraction: as long as the number being subtracted is finite, everything works fine; if the other number is reflexive, it stays the same. So, $\aleph_{0} - n = \aleph_{0}$ , if $n$ is finite; in this case, subtraction gives a clear result. However, it's different when we subtract $\aleph_{0}$ from itself; we could end up with any result, from 0 to $\aleph_{0}$ . This is easily demonstrated with examples. From the inductive numbers, remove the following collections of $\aleph_{0}$ terms:—

(1) All the inductive numbers—remainder, zero.

(1) All the whole numbers—remainder, zero.

(2) All the inductive numbers from $n$ onwards—remainder, the numbers from 0 to $n - 1$ , numbering $n$ terms in all.

(2) All the inductive numbers from $n$ onwards—remainder, the numbers from 0 to $n - 1$ , numbering $n$ terms in total.

(3) All the odd numbers—remainder, all the even numbers, numbering $\aleph_{0}$ terms.

All these are different ways of subtracting $\aleph_{0}$ from $\aleph_{0}$ , and all give different results.

All these are different ways of subtracting $\aleph_{0}$ from $\aleph_{0}$ , and they all produce different outcomes.

As regards division, very similar results follow from the fact that $\aleph_{0}$ is unchanged when multiplied by 2 or 3 or any finite number $n$ or by $\aleph_{0}$ . It follows that $\aleph_{0}$ divided by $\aleph_{0}$ may have any value from 1 up to $\aleph_{0}$ .

As for division, very similar outcomes result from the fact that $\aleph_{0}$ stays the same when multiplied by 2, 3, or any finite number $n$ or by $\aleph_{0}$ . This means that $\aleph_{0}$ divided by $\aleph_{0}$ can have any value from 1 to $\aleph_{0}$ .

From the ambiguity of subtraction and division it results that negative numbers and ratios cannot be extended to infinite numbers. Addition, multiplication, and exponentiation proceed quite satisfactorily, but the inverse operations—subtraction, division, and extraction of roots—are ambiguous, and the notions that depend upon them fail when infinite numbers are concerned.

From the uncertainty of subtraction and division, it follows that negative numbers and ratios can't be extended to infinite numbers. Addition, multiplication, and exponentiation work just fine, but the opposite operations—subtraction, division, and finding roots—are unclear, and the concepts that rely on them break down when it comes to infinite numbers.

The characteristic by which we defined finitude was mathematical induction, i.e. we defined a number as finite when it obeys mathematical induction starting from 0, and a class as finite when its number is finite. This definition yields the sort of result that a definition ought to yield, namely, that the finite [Pg 87] numbers are those that occur in the ordinary number-series 0, 1, 2, 3, ... But in the present chapter, the infinite numbers we have discussed have not merely been non-inductive: they have also been reflexive. Cantor used reflexiveness as the definition of the infinite, and believes that it is equivalent to non-inductiveness; that is to say, he believes that every class and every cardinal is either inductive or reflexive. This may be true, and may very possibly be capable of proof; but the proofs hitherto offered by Cantor and others (including the present author in former days) are fallacious, for reasons which will be explained when we come to consider the "multiplicative axiom." At present, it is not known whether there are classes and cardinals which are neither reflexive nor inductive. If $n$ were such a cardinal, we should not have $n = n + 1$ , but $n$ would not be one of the "natural numbers," and would be lacking in some of the inductive properties. All known infinite classes and cardinals are reflexive; but for the present it is well to preserve an open mind as to whether there are instances, hitherto unknown, of classes and cardinals which are neither reflexive nor inductive. Meanwhile, we adopt the following definitions:—

The way we defined finitude was through mathematical induction; that is, we considered a number as finite if it follows mathematical induction starting from 0, and a class as finite if its number is finite. This definition gives the expected result, which is that the finite numbers are the ones that show up in the standard number series 0, 1, 2, 3, ... However, in this chapter, the infinite numbers we've talked about are not just non-inductive; they are also reflexive. Cantor used reflexiveness as the definition of the infinite and believes it is the same as non-inductiveness. In other words, he thinks that every class and every cardinal is either inductive or reflexive. This might be true and could potentially be proven, but the proofs presented so far by Cantor and others (including myself in the past) are flawed, for reasons that will be discussed when we examine the "multiplicative axiom." Currently, it is unknown whether there are classes and cardinals that are neither reflexive nor inductive. If $n$ were such a cardinal, we wouldn't have $n = n + 1$ , but $n$ would not be one of the "natural numbers" and would lack some inductive properties. All known infinite classes and cardinals are reflexive, but for now, it's wise to keep an open mind about whether there are unknown instances of classes and cardinals that are neither reflexive nor inductive. In the meantime, we adopt the following definitions:—

A finite class or cardinal is one which is inductive.

A finite class or cardinal is one that is inductive.

An infinite class or cardinal is one which is not inductive. All reflexive classes and cardinals are infinite; but it is not known at present whether all infinite classes and cardinals are reflexive. We shall return to this subject in Chapter XII. [Pg 88]

An infinite class or cardinal is one that is not inductive. All reflexive classes and cardinals are infinite, but it's currently unknown whether all infinite classes and cardinals are reflexive. We will revisit this topic in Chapter XII. [Pg 88]

CHAPTER IX

INFINITE SERIES AND ORDINALS

AN "infinite series" may be defined as a series of which the field is an infinite class. We have already had occasion to consider one kind of infinite series, namely, progressions. In this chapter we shall consider the subject more generally.

AN "infinite series" can be defined as a series with an infinite set of elements. We've already looked at one type of infinite series, namely progressions. In this chapter, we'll explore the topic in more detail.

The most noteworthy characteristic of an infinite series is that its serial number can be altered by merely re-arranging its terms. In this respect there is a certain oppositeness between cardinal and serial numbers. It is possible to keep the cardinal number of a reflexive class unchanged in spite of adding terms to it; on the other hand, it is possible to change the serial number of a series without adding or taking away any terms, by mere re-arrangement. At the same time, in the case of any infinite series it is also possible, as with cardinals, to add terms without altering the serial number: everything depends upon the way in which they are added.

The most notable feature of an infinite series is that its position can be changed simply by rearranging its terms. In this regard, there is a clear difference between cardinal and serial numbers. You can keep the cardinal number of a reflexive class the same even while adding terms to it; however, you can change the serial number of a series without adding or removing terms, just by rearranging them. At the same time, with any infinite series, it is also possible, just like with cardinals, to add terms without changing the serial number: it all depends on how they are added.

In order to make matters clear, it will be best to begin with examples. Let us first consider various different kinds of series which can be made out of the inductive numbers arranged on various plans. We start with the series $1,\ 2,\ 3,\ 4,\ \dots\ n,\ \dots,$ which, as we have already seen, represents the smallest of infinite serial numbers, the sort that Cantor calls $\omega$ . Let us proceed to thin out this series by repeatedly performing the [Pg 89] operation of removing to the end the first even number that occurs. We thus obtain in succession the various series: $\begin{align*} &1,\ 3,\ 4,\ 5,\ \dots\ n,\ \dots\ 2, \\ &1,\ 3,\ 5,\ 6,\ \dots\ n + 1,\ \dots\ 2,\ 4, \\ &1,\ 3,\ 5,\ 7,\ \dots\ n + 2,\ \dots\ 2,\ 4,\ 6, \end{align*}$ and so on. If we imagine this process carried on as long as possible, we finally reach the series $1,\ 3,\ 5,\ 7,\ \dots\ 2n + 1,\ \dots\ 2,\ 4,\ 6,\ 8,\ \dots\ 2n,\ \dots,$ in which we have first all the odd numbers and then all the even numbers.

To clarify things, let’s start with some examples. First, let's look at different types of sequences made from the inductive numbers arranged in various ways. We begin with the sequence $1,\ 2,\ 3,\ 4,\ \dots\ n,\ \dots,$ which, as we've seen, represents the smallest set of infinite serial numbers, referred to by Cantor as $\omega$ . Let’s gradually dwindle this sequence by repeatedly moving the first even number that appears to the end. This gives us the following series: $\begin{align*} &1,\ 3,\ 4,\ 5,\ \dots\ n,\ \dots\ 2, \\ &1,\ 3,\ 5,\ 6,\ \dots\ n + 1,\ \dots\ 2,\ 4, \\ &1,\ 3,\ 5,\ 7,\ \dots\ n + 2,\ \dots\ 2,\ 4,\ 6, \end{align*}$ and so on. If we imagine this process continued as long as possible, we arrive at the series $1,\ 3,\ 5,\ 7,\ \dots\ 2n + 1,\ \dots\ 2,\ 4,\ 6,\ 8,\ \dots\ 2n,\ \dots,$ where we have all the odd numbers first, followed by all the even numbers.

The serial numbers of these various series are $\omega + 1, \omega + 2, \omega + 3, \dots\ 2\omega$ . Each of these numbers is "greater" than any of its predecessors, in the following sense:—

The serial numbers of these different series are $\omega + 1, \omega + 2, \omega + 3, \dots\ 2\omega$ . Each of these numbers is "greater" than any of the numbers that came before it, in this way:—

One serial number is said to be "greater" than another if any series having the first number contains a part having the second number, but no series having the second number contains a part having the first number.

One serial number is considered "greater" than another if any sequence that includes the first number has a segment containing the second number, but no sequence containing the second number has a segment with the first number.

If we compare the two series $\begin{align*} &1,\ 2,\ 3,\ 4,\ \dots\ n,\ \dots, \\ &1,\ 3,\ 4,\ 5,\ \dots\ n + 1,\ \dots\ 2, \end{align*}$ we see that the first is similar to the part of the second which omits the last term, namely, the number 2, but the second is not similar to any part of the first. (This is obvious, but is easily demonstrated.) Thus the second series has a greater serial number than the first, according to the definition—i.e. $\omega + 1$ is greater than $\omega$ . But if we add a term at the beginning of a progression instead of the end, we still have a progression. Thus $1 + \omega = \omega$ . Thus $1 + \omega$ is not equal to $\omega + 1$ . This is characteristic of relation-arithmetic generally: if $\mu$ and $\nu$ are two relation-numbers, the general rule is that $\mu + \nu$ is not equal to $\nu + \mu$ . The case of finite ordinals, in which there is equality, is quite exceptional.

If we compare the two series $\begin{align*} &1,\ 2,\ 3,\ 4,\ \dots\ n,\ \dots, \\ &1,\ 3,\ 4,\ 5,\ \dots\ n + 1,\ \dots\ 2, \end{align*}$ we see that the first series resembles the part of the second series that excludes the last term, specifically the number 2. However, the second series does not resemble any part of the first. (This is obvious but can be easily demonstrated.) Therefore, the second series has a higher serial number than the first, according to the definition—i.e. $\omega + 1$ is greater than $\omega$ . However, if we add a term at the beginning of a sequence instead of the end, we still maintain a sequence. Thus, $1 + \omega = \omega$ . Therefore, $1 + \omega$ is not equal to $\omega + 1$ . This is characteristic of relation-arithmetic in general: if $\mu$ and $\nu$ are two relation-numbers, the general rule is that $\mu + \nu$ is not equal to $\nu + \mu$ . The case of finite ordinals, where equality exists, is quite exceptional.

The series we finally reached just now consisted of first all the odd numbers and then all the even numbers, and its serial [Pg 90] number is $2\omega$ . This number is greater than $\omega$ or $\omega + n$ , where $n$ is finite. It is to be observed that, in accordance with the general definition of order, each of these arrangements of integers is to be regarded as resulting from some definite relation. E.g. the one which merely removes 2 to the end will be defined by the following relation: " $x$ and $y$ are finite integers, and either $y$ is 2 and $x$ is not 2, or neither is 2 and $x$ is less than $y$ ." The one which puts first all the odd numbers and then all the even ones will be defined by: " $x$ and $y$ are finite integers, and either $x$ is odd and $y$ is even or $x$ is less than $y$ and both are odd or both are even." We shall not trouble, as a rule, to give these formulæ in future; but the fact that they could be given is essential.

The series we just reached consisted first of all the odd numbers and then all the even numbers, and its serial [Pg 90] number is $2\omega$ . This number is greater than $\omega$ or $\omega + n$ , where $n$ is finite. It should be noted that, according to the general definition of order, each of these arrangements of integers results from a specific relation. For example, the one that just moves 2 to the end will be defined by the following relation: " $x$ and $y$ are finite integers, and either $y$ is 2 and $x$ is not 2, or neither is 2 and $x$ is less than $y$ ." The one that puts all the odd numbers first and then all the even ones will be defined as: " $x$ and $y$ are finite integers, and either $x$ is odd and $y$ is even, or $x$ is less than $y$ and both are odd or both are even." We generally won't bother to provide these formulas in the future, but the fact that they could be provided is important.

The number which we have called $2\omega$ , namely, the number of a series consisting of two progressions, is sometimes called $\omega · 2$ . Multiplication, like addition, depends upon the order of the factors: a progression of couples gives a series such as $x_{1},\ y_{1},\ x_{2},\ y_{2},\ x_{3},\ y_{3},\ \dots\ x_{n},\ y_{n},\ \dots,$ which is itself a progression; but a couple of progressions gives a series which is twice as long as a progression. It is therefore necessary to distinguish between $2\omega$ and $\omega · 2$ . Usage is variable; we shall use $2\omega$ for a couple of progressions and $\omega · 2$ for a progression of couples, and this decision of course governs our general interpretation of " $\alpha · \beta$ " when $\alpha$ and $\beta$ are relation-numbers: " $\alpha · \beta$ " will have to stand for a suitably constructed sum of a relations each having $\beta$ terms.

The number we've called $2\omega$ , which represents a series made up of two progressions, is sometimes referred to as $\omega · 2$ . Multiplication, similar to addition, is influenced by the order of the factors: a couple's progression results in a series like $x_{1},\ y_{1},\ x_{2},\ y_{2},\ x_{3},\ y_{3},\ \dots\ x_{n},\ y_{n},\ \dots,$ which is a progression itself; however, a pair of progressions creates a series that is twice as long as a single progression. It is therefore crucial to distinguish between $2\omega$ and $\omega · 2$ . Usage can vary; we will use $2\omega$ for a pair of progressions and $\omega · 2$ for a progression of pairs, and this choice will guide our general interpretation of " $\alpha$ and $\beta$ are relation-numbers: " $\beta$ terms.

We can proceed indefinitely with the process of thinning out the inductive numbers. For example, we can place first the odd numbers, then their doubles, then the doubles of these, and so on. We thus obtain the series $\begin{align*} 1,\ 3,\ 5,\ 7,\ \dots;\quad 2,\ 6,\ 10,\ 14,\ \dots;\quad 4,\ 12,\ 20,\ 28,\ \dots; \\ 8,\ 24,\ 40,\ 56,\ \dots, \end{align*}$ of which the number is $\omega^{2}$ , since it is a progression of progressions. Any one of the progressions in this new series can of course be [Pg 91] thinned out as we thinned out our original progression. We can proceed to $\omega^{3}$ , $\omega^{4}$ , ... $\omega^{\omega}$ , and so on; however far we have gone, we can always go further.

We can continue endlessly with the process of narrowing down the inductive numbers. For example, we can start with the odd numbers, then their doubles, then the doubles of those, and so on. This gives us the series $\begin{align*} 1,\ 3,\ 5,\ 7,\ \dots;\quad 2,\ 6,\ 10,\ 14,\ \dots;\quad 4,\ 12,\ 20,\ 28,\ \dots; \\ 8,\ 24,\ 40,\ 56,\ \dots, \end{align*}$ which has the number $\omega^{2}$ , since it’s a progression of progressions. Any one of the sequences in this new series can be thinned out just like we did with our original sequence. We can continue to $\omega^{3}$ , $\omega^{4}$ , ... $\omega^{\omega}$ , and so on; no matter how far we've gone, we can always go further.

The series of all the ordinals that can be obtained in this way, i.e. all that can be obtained by thinning out a progression, is itself longer than any series that can be obtained by re-arranging the terms of a progression. (This is not difficult to prove.) The cardinal number of the class of such ordinals can be shown to be greater than $\aleph_{0}$ ; it is the number which Cantor calls $\aleph_{1}$ . The ordinal number of the series of all ordinals that can be made out of an $\aleph_{0}$ , taken in order of magnitude, is called $\omega_{1}$ . Thus a series whose ordinal number is $\omega_{1}$ has a field whose cardinal number is $\aleph_{1}$ .

The sequence of all the ordinals that can be produced this way, i.e. all that can be created by narrowing down a progression, is itself longer than any sequence that can be formed by rearranging the terms of a progression. (This isn’t hard to prove.) The cardinal number of the set of such ordinals can be shown to be greater than $\aleph_{0}$ ; it is the number that Cantor refers to as $\aleph_{1}$ . The ordinal number of the series of all ordinals that can be constructed from an $\aleph_{0}$ , arranged in order of size, is called $\omega_{1}$ . Therefore, a series whose ordinal number is $\omega_{1}$ has a field whose cardinal number is $\aleph_{1}$ .

We can proceed from $\omega_{1}$ and $\aleph_{1}$ to $\omega_{2}$ and $\aleph_{2}$ by a process exactly analogous to that by which we advanced from $\omega$ and $\aleph_{0}$ to $\omega_{1}$ and $\aleph_{1}$ . And there is nothing to prevent us from advancing indefinitely in this way to new cardinals and new ordinals. It is not known whether $2^{\aleph_{0}}$ is equal to any of the cardinals in the series of Alephs. It is not even known whether it is comparable with them in magnitude; for aught we know, it may be neither equal to nor greater nor less than any one of the Alephs. This question is connected with the multiplicative axiom, of which we shall treat later.

We can move from $\omega_{1}$ and $\aleph_{1}$ to $\omega_{2}$ and $\aleph_{2}$ using a process similar to how we moved from $\omega$ and $\aleph_{0}$ to $\omega_{1}$ and $\aleph_{1}$ . And there’s nothing stopping us from continuing this way to new cardinals and new ordinals. It isn't known whether $2^{\aleph_{0}}$ is equal to any of the cardinals in the Aleph series. It's not even known if it's comparable in size to them; for all we know, it might be neither equal to, greater than, nor less than any of the Alephs. This question is tied to the multiplicative axiom, which we will discuss later.

All the series we have been considering so far in this chapter have been what is called "well-ordered." A well-ordered series is one which has a beginning, and has consecutive terms, and has a term next after any selection of its terms, provided there are any terms after the selection. This excludes, on the one hand, compact series, in which there are terms between any two, and on the other hand series which have no beginning, or in which there are subordinate parts having no beginning. The series of negative integers in order of magnitude, having no beginning, but ending with -1, is not well-ordered; but taken in the reverse order, beginning with -1, it is well-ordered, being in fact a progression. The definition is: [Pg 92]

All the series we've looked at so far in this chapter are what we call "well-ordered." A well-ordered series has a starting point, features consecutive terms, and has a term next after any selection of its terms, as long as there are additional terms following that selection. This rules out, on one side, compact series, where there are terms between any two, and on the other side, series that have no beginning or where there are subparts that lack a starting point. The series of negative integers arranged by magnitude doesn’t have a beginning and only ends with -1, so it’s not well-ordered. However, if we reverse it, starting with -1, it becomes well-ordered, essentially forming a progression. The definition is: [Pg 92]

A "well-ordered" series is one in which every sub-class (except, of course, the null-class) has a first term.

A "well-ordered" series is one where every subclass (except, of course, the null class) has a first term.

An "ordinal" number means the relation-number of a well-ordered series. It is thus a species of serial number.

An "ordinal" number refers to the position of an item in a well-organized sequence. It is essentially a type of serial number.

Among well-ordered series, a generalised form of mathematical induction applies. A property may be said to be "transfinitely hereditary" if, when it belongs to a certain selection of the terms in a series, it belongs to their immediate successor provided they have one. In a well-ordered series, a transfinitely hereditary property belonging to the first term of the series belongs to the whole series. This makes it possible to prove many propositions concerning well-ordered series which are not true of all series.

Among well-ordered sequences, a generalized version of mathematical induction applies. A property is considered "transfinitely hereditary" if it holds for a specific selection of terms in a sequence, and it also holds for their immediate successor if there is one. In a well-ordered sequence, a transfinitely hereditary property that holds for the first term also holds for the entire sequence. This allows for the proof of many statements about well-ordered sequences that are not true for all sequences.

It is easy to arrange the inductive numbers in series which are not well-ordered, and even to arrange them in compact series. For example, we can adopt the following plan: consider the decimals from .1 (inclusive) to 1 (exclusive), arranged in order of magnitude. These form a compact series; between any two there are always an infinite number of others. Now omit the dot at the beginning of each, and we have a compact series consisting of all finite integers except such as divide by 10. If we wish to include those that divide by 10, there is no difficulty; instead of starting with .1, we will include all decimals less than 1, but when we remove the dot, we will transfer to the right any 0's that occur at the beginning of our decimal. Omitting these, and returning to the ones that have no 0's at the beginning, we can state the rule for the arrangement of our integers as follows: Of two integers that do not begin with the same digit, the one that begins with the smaller digit comes first. Of two that do begin with the same digit, but differ at the second digit, the one with the smaller second digit comes first, but first of all the one with no second digit; and so on. Generally, if two integers agree as regards the first $n$ digits, but not as regards the $(n + 1)^\mathord{th}$ , that one comes first which has either no $(n + 1)^\mathord{th}$ digit or a smaller one than the other. This rule of arrangement, [Pg 93] as the reader can easily convince himself, gives rise to a compact series containing all the integers not divisible by 10; and, as we saw, there is no difficulty about including those that are divisible by 10. It follows from this example that it is possible to construct compact series having $\aleph_{0}$ terms. In fact, we have already seen that there are $\aleph_{0}$ ratios, and ratios in order of magnitude form a compact series; thus we have here another example. We shall resume this topic in the next chapter.

It’s easy to arrange inductive numbers in sequences that aren't well-ordered and even to organize them in compact series. For instance, let’s consider the decimals from .1 (inclusive) to 1 (exclusive), sorted by size. These create a compact series; there are always an infinite number of other decimals between any two. Now, if we drop the dot at the start of each, we get a compact series made up of all finite integers except those that are divisible by 10. If we want to include those divisible by 10, it’s not an issue; instead of starting with .1, we'll consider all decimals less than 1, but when we remove the dot, we’ll shift any leading 0's to the right. Ignoring these and returning to the numbers that don’t have leading 0's, we can set up the rule for arranging our integers like this: Of two integers that start with different digits, the one that starts with the smaller digit comes first. Of two that start with the same digit but differ at the second digit, the one with the smaller second digit comes first, but the one without a second digit comes first overall; and so on. Generally, if two integers have the same first $n$ digits, but differ at the $(n + 1)^\mathord{th}$ digit, the one that has no $(n + 1)^\mathord{th}$ digit or a smaller one comes first. This arrangement rule, [Pg 93] as you can easily verify, results in a compact series comprising all integers not divisible by 10; and, as we noted, there’s no problem in including those that are divisible by 10. From this example, we can see that it’s possible to create compact series with $\aleph_{0}$ terms. Indeed, we’ve already established that there are $\aleph_{0}$ ratios, and ratios sorted by size create a compact series, giving us yet another example. We’ll revisit this topic in the next chapter.

Of the usual formal laws of addition, multiplication, and exponentiation, all are obeyed by transfinite cardinals, but only some are obeyed by transfinite ordinals, and those that are obeyed by them are obeyed by all relation-numbers. By the "usual formal laws" we mean the following:—

Of the standard rules for addition, multiplication, and exponentiation, all apply to transfinite cardinals, but only some apply to transfinite ordinals, and those that do apply to them are also applicable to all relation-numbers. By the "standard rules," we mean the following:—

I. The commutative law: $\alpha + \beta = \beta + \alpha \quad\text{and}\quad \alpha × \beta = \beta × \alpha.$

II. The associative law: $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \quad\text{and}\quad (\alpha × \beta) × \gamma = \alpha × (\beta × \gamma).$

III. The distributive law: $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma.$

When the commutative law does not hold, the above form of the distributive law must be distinguished from $(\beta + \gamma)\alpha = \beta\alpha + \gamma\alpha.$ As we shall see immediately, one form may be true and the other false.

When the commutative law doesn't apply, we need to differentiate the above version of the distributive law from $(\beta + \gamma)\alpha = \beta\alpha + \gamma\alpha.$ As we will see shortly, one version may be true while the other is false.

IV. The laws of exponentiation: $\alpha^{\beta} · \alpha^{\gamma} = \alpha^{\beta + \gamma},\quad \alpha^{\gamma} · \beta^{\gamma} = (\alpha\beta)^{\gamma},\quad (\alpha^{\beta})^{\gamma} = \alpha^{\beta\gamma}.$

All these laws hold for cardinals, whether finite or infinite, and for finite ordinals. But when we come to infinite ordinals, or indeed to relation-numbers in general, some hold and some do not. The commutative law does not hold; the associative law does hold; the distributive law (adopting the convention [Pg 94] we have adopted above as regards the order of the factors in a product) holds in the form $(\beta + \gamma)\alpha = \beta\alpha + \gamma\alpha,$ but not in the form $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma;$ the exponential laws $\alpha^{\beta} · \alpha^{\gamma} = \alpha^{\beta + \gamma}, \quad\text{and}\quad (\alpha^{\beta})^{\gamma} = \alpha^{\beta\gamma}$ still hold, but not the law $\alpha^{\gamma} · \beta^{\gamma} = (\alpha\beta)^{\gamma},$ which is obviously connected with the commutative law for multiplication.

All these laws apply to cardinals, whether finite or infinite, and to finite ordinals. However, when it comes to infinite ordinals, or really to relation numbers in general, some laws apply and some do not. The commutative law does not apply; the associative law does apply; the distributive law (following the convention we established earlier about the order of the factors in a product) applies in the form $(\beta + \gamma)\alpha = \beta\alpha + \gamma\alpha,$ but not in the form $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma;$ The exponential laws $\alpha^{\beta} · \alpha^{\gamma} = \alpha^{\beta + \gamma}, \quad\text{and}\quad (\alpha^{\beta})^{\gamma} = \alpha^{\beta\gamma}$ still hold, but not the law $\alpha^{\gamma} · \beta^{\gamma} = (\alpha\beta)^{\gamma},$ which is clearly linked to the commutative law for multiplication.

The definitions of multiplication and exponentiation that are assumed in the above propositions are somewhat complicated. The reader who wishes to know what they are and how the above laws are proved must consult the second volume of Principia Mathematica, * 172-176.

The definitions of multiplication and exponentiation that are assumed in the above propositions are a bit complex. Readers who want to understand what they are and how the above laws are proven should check out the second volume of Principia Mathematica, * 172-176.

Ordinal transfinite arithmetic was developed by Cantor at an earlier stage than cardinal transfinite arithmetic, because it has various technical mathematical uses which led him to it. But from the point of view of the philosophy of mathematics it is less important and less fundamental than the theory of transfinite cardinals. Cardinals are essentially simpler than ordinals, and it is a curious historical accident that they first appeared as an abstraction from the latter, and only gradually came to be studied on their own account. This does not apply to Frege's work, in which cardinals, finite and transfinite, were treated in complete independence of ordinals; but it was Cantor's work that made the world aware of the subject, while Frege's remained almost unknown, probably in the main on account of the difficulty of his symbolism. And mathematicians, like other people, have more difficulty in understanding and using notions which are comparatively "simple" in the logical sense than in manipulating more complex notions which are [Pg 95] more akin to their ordinary practice. For these reasons, it was only gradually that the true importance of cardinals in mathematical philosophy was recognised. The importance of ordinals, though by no means small, is distinctly less than that of cardinals, and is very largely merged in that of the more general conception of relation-numbers. [Pg 96]

Ordinal transfinite arithmetic was created by Cantor earlier than cardinal transfinite arithmetic because it has various technical mathematical applications that led him to it. However, from a philosophical perspective in mathematics, it is less significant and not as fundamental as the theory of transfinite cardinals. Cardinals are fundamentally simpler than ordinals, and it's an interesting historical coincidence that they first emerged as an abstraction from ordinals and only gradually began to be studied independently. This does not apply to Frege's work, where cardinals, both finite and transfinite, were considered completely separate from ordinals. However, it was Cantor's work that brought the subject to the world's attention, while Frege's remained almost obscure, likely due to the complexity of his symbols. Mathematicians, like everyone else, find it harder to grasp and utilize ideas that are comparatively “simple” in a logical sense than to work with more complex concepts that align more closely with their everyday practices. For these reasons, the true significance of cardinals in mathematical philosophy was only gradually recognized. The significance of ordinals, while still considerable, is notably less than that of cardinals and largely overlaps with the broader idea of relation-numbers. [Pg 95] [Pg 96]

CHAPTER X

LIMITS AND CONTINUITY

THE conception of a "limit" is one of which the importance in mathematics has been found continually greater than had been thought. The whole of the differential and integral calculus, indeed practically everything in higher mathematics, depends upon limits. Formerly, it was supposed that infinitesimals were involved in the foundations of these subjects, but Weierstrass showed that this is an error: wherever infinitesimals were thought to occur, what really occurs is a set of finite quantities having zero for their lower limit. It used to be thought that "limit" was an essentially quantitative notion, namely, the notion of a quantity to which others approached nearer and nearer, so that among those others there would be some differing by less than any assigned quantity. But in fact the notion of "limit" is a purely ordinal notion, not involving quantity at all (except by accident when the series concerned happens to be quantitative). A given point on a line may be the limit of a set of points on the line, without its being necessary to bring in co-ordinates or measurement or anything quantitative. The cardinal number $\aleph_{0}$ is the limit (in the order of magnitude) of the cardinal numbers 1, 2, 3, ... $n$ , ..., although the numerical difference between $\aleph_{0}$ and a finite cardinal is constant and infinite: from a quantitative point of view, finite numbers get no nearer to $\aleph_{0}$ as they grow larger. What makes $\aleph_{0}$ the limit of the finite numbers is the fact that, in the series, it comes immediately after them, which is an ordinal fact, not a quantitative fact. [Pg 97]

THE concept of a "limit" is increasingly recognized for its importance in mathematics. The entire differential and integral calculus, and indeed almost everything in higher mathematics, relies on limits. In the past, it was believed that infinitesimals were crucial to these foundations, but Weierstrass demonstrated that this was incorrect: wherever infinitesimals were thought to appear, what actually occurs is a collection of finite quantities that have zero as their lower limit. It used to be assumed that "limit" was primarily a quantitative idea, meaning a quantity that others get closer to, so that among those others, there would be some that differ by less than any specified quantity. However, the concept of "limit" is fundamentally an ordinal idea, not involving quantity at all (except by coincidence when the related series happens to be quantitative). A specific point on a line can be the limit of a set of points on that line, without needing to use coordinates, measurement, or anything quantitative. The cardinal number $\aleph_{0}$ is the limit (in terms of order) of the cardinal numbers 1, 2, 3, ... $n$ , ..., even though the numerical difference between $\aleph_{0}$ and any finite cardinal is constant and infinite: from a quantitative perspective, finite numbers do not get closer to $\aleph_{0}$ as they increase. What makes $\aleph_{0}$ the limit of finite numbers is that, in the series, it follows them directly, which is an ordinal fact, not a quantitative fact. [Pg 97]

There are various forms of the notion of "limit," of increasing complexity. The simplest and most fundamental form, from which the rest are derived, has been already defined, but we will here repeat the definitions which lead to it, in a general form in which they do not demand that the relation concerned shall be serial. The definitions are as follows:—

There are different types of the concept of "limit

The "minima" of a class $\alpha$ with respect to a relation $\mathrm P$ are those members of $\alpha$ and the field of $\mathrm P$ (if any) to which no member of $\alpha$ has the relation $\mathrm P$ .

The "minima" of a class $\alpha$ related to a relation $\mathrm P$ are the members of $\alpha$ and the field of $\mathrm P$ (if there is one) to which no member of $\alpha$ has the relation $\mathrm P$ .

The "maxima" with respect to $\mathrm P$ are the minima with respect to the converse of $\mathrm P$ .

The "maxima" regarding $\mathrm P$ are the minima concerning the opposite of $\mathrm P$ .

The "sequents" of a class $\alpha$ with respect to a relation $\mathrm P$ are the minima of the "successors" of $\alpha$ , and the "successors" of $\alpha$ are those members of the field of $\mathrm P$ to which every member of the common part of $\alpha$ and the field of $\mathrm P$ has the relation $\mathrm P$ .

The "sequents" of a class $\alpha$ regarding a relation $\mathrm P$ are the minimum of the "successors" of $\alpha$ , and the "successors" of $\alpha$ are those members of the field of $\mathrm P$ to which every member of the common part of $\alpha$ and the field of $\mathrm P$ is related by $\mathrm P$ .

The "precedents" with respect to $\mathrm P$ are the sequents with respect to the converse of $\mathrm P$ .

The "precedents" regarding $\mathrm P$ are the sequents related to the converse of $\mathrm P$ .

The "upper limits" of $\alpha$ with respect to $\mathrm P$ are the sequents provided $\alpha$ has no maximum; but if $\alpha$ has a maximum, it has no upper limits.

The "upper limits" of $\alpha$ in relation to $\mathrm P$ are the sequents as long as $\alpha$ does not have a maximum; however, if $\alpha$ has a maximum, it does not possess upper limits.

The "lower limits" with respect to $\mathrm P$ are the upper limits with respect to the converse of $\mathrm P$ .

The "lower limits" regarding $\mathrm P$ are the upper limits concerning the opposite of $\mathrm P$ .

Whenever $\mathrm P$ has connexity, a class can have at most one maximum, one minimum, one sequent, etc. Thus, in the cases we are concerned with in practice, we can speak of "the limit" (if any).

Whenever $\mathrm P$ has connectivity, a class can have at most one maximum, one minimum, one successive limit, etc. Therefore, in the practical cases we are dealing with, we can refer to "the limit" (if it exists).

When $\mathrm P$ is a serial relation, we can greatly simplify the above definition of a limit. We can, in that case, define first the "boundary" of a class $\alpha$ , i.e. its limits or maximum, and then proceed to distinguish the case where the boundary is the limit from the case where it is a maximum. For this purpose it is best to use the notion of "segment."

When $\mathrm P$ is a serial relation, we can greatly simplify the definition of a limit mentioned above. In this case, we can first define the "boundary" of a class $\alpha$ , meaning its limits or maximum, and then distinguish between the situation where the boundary is the limit and the situation where it is a maximum. For this purpose, it's best to use the concept of "segment."

We will speak of the "segment of $\mathrm P$ defined by a class $\alpha$ " as all those terms that have the relation $\mathrm P$ to some one or more of the members of $\alpha$ . This will be a segment in the sense defined [Pg 98] in Chapter VII.; indeed, every segment in the sense there defined is the segment defined by some class $\alpha$ . If $\mathrm P$ is serial, the segment defined by $\alpha$ consists of all the terms that precede some term or other of $\alpha$ . If $\alpha$ has a maximum, the segment will be all the predecessors of the maximum. But if $\alpha$ has no maximum, every member of $\alpha$ precedes some other member of $\alpha$ , and the whole of $\alpha$ is therefore included in the segment defined by $\alpha$ . Take, for example, the class consisting of the fractions $\tfrac{1}{2},\ \tfrac{3}{4},\ \tfrac{7}{8},\ \tfrac{15}{16},\ \dots,$ i.e. of all fractions of the form $1 - \dfrac{1}{2^{n}}$ for different finite values of $n$ . This series of fractions has no maximum, and it is clear that the segment which it defines (in the whole series of fractions in order of magnitude) is the class of all proper fractions. Or, again, consider the prime numbers, considered as a selection from the cardinals (finite and infinite) in order of magnitude. In this case the segment defined consists of all finite integers.

We will discuss the "segment of $\mathrm P$ defined by a class $\alpha$ " as all the terms that relate $\mathrm P$ to one or more of the members of $\alpha$ [Pg 98] in Chapter VII.; in fact, every segment in the way defined there is the segment created by some class $\alpha$ $\tfrac{1}{2},\ \tfrac{3}{4},\ \tfrac{7}{8},\ \tfrac{15}{16},\ \dots,$ i.e. all fractions of the form $1 - \dfrac{1}{2^{n}}$

Assuming that $\mathrm P$ is serial, the "boundary" of a class $\alpha$ will be the term $x$ (if it exists) whose predecessors are the segment defined by $\alpha$ .

A "maximum" of $\alpha$ is a boundary which is a member of $\alpha$ .

A "maximum" of $\alpha$ is a limit that belongs to $\alpha$ .

An "upper limit" of $\alpha$ is a boundary which is not a member of $\alpha$ .

An "upper limit" of $\alpha$ is a boundary that isn't included in $\alpha$ .

If a class has no boundary, it has neither maximum nor limit. This is the case of an "irrational" Dedekind cut, or of what is called a "gap."

If a class has no boundary, it has no maximum or limit. This is the case of an "irrational" Dedekind cut, or what’s referred to as a "gap."

Thus the "upper limit" of a set of terms $\alpha$ with respect to a series $\mathrm P$ is that term $x$ (if it exists) which comes after all the $\alpha$ 's, but is such that every earlier term comes before some of the $\alpha$ 's.

Thus, the "upper limit" of a set of terms $\alpha$ regarding a series $\mathrm P$ is the term $x$ (if it exists) that comes after all the $\alpha$ ’s, but every earlier term comes before some of the $\alpha$ ’s.

We may define all the "upper limiting-points" of a set of terms $\beta$ as all those that are the upper limits of sets of terms chosen out of $\beta$ . We shall, of course, have to distinguish upper limiting-points from lower limiting-points. If we consider, for example, the series of ordinal numbers: $1,\ 2,\ 3,\ \dots\ \omega,\ \omega + 1,\ \dots\ 2\omega,\ 2\omega + 1,\ 3\omega,\ \dots\ \omega^{2},\ \dots\ \omega^{3},\ \dots,$ [Pg 99] the upper limiting-points of the field of this series are those that have no immediate predecessors, i.e. $1,\ \omega,\ 2\omega,\ 3\omega,\ \dots\ \omega^{2},\ \omega^{2} + \omega,\ \dots\ 2\omega^{2},\ \dots\ \omega^{3}\ \dots$ The upper limiting-points of the field of this new series will be $1,\ \omega^{2},\ 2\omega^{2},\ \dots\ \omega^{3},\ \omega^{3} + \omega^{2}\ \dots$ On the other hand, the series of ordinals—and indeed every well-ordered series—has no lower limiting-points, because there are no terms except the last that have no immediate successors. But if we consider such a series as the series of ratios, every member of this series is both an upper and a lower limiting-point for suitably chosen sets. If we consider the series of real numbers, and select out of it the rational real numbers, this set (the rationals) will have all the real numbers as upper and lower limiting-points. The limiting-points of a set are called its "first derivative," and the limiting-points of the first derivative are called the second derivative, and so on.

We can define all the "upper limiting points" of a set of terms $\beta$ as those that are the upper limits of sets of terms selected from $\beta$ . We will need to differentiate between upper limiting points and lower limiting points. For instance, if we look at the series of ordinal numbers: $1,\ 2,\ 3,\ \dots\ \omega,\ \omega + 1,\ \dots\ 2\omega,\ 2\omega + 1,\ 3\omega,\ \dots\ \omega^{2},\ \dots\ \omega^{3},\ \dots,$ [Pg 99] The upper limiting points of this series are those that have no immediate predecessors, i.e. $1,\ \omega,\ 2\omega,\ 3\omega,\ \dots\ \omega^{2},\ \omega^{2} + \omega,\ \dots\ 2\omega^{2},\ \dots\ \omega^{3}\ \dots$ The upper limiting points of the field of this new series will be $1,\ \omega^{2},\ 2\omega^{2},\ \dots\ \omega^{3},\ \omega^{3} + \omega^{2}\ \dots$ On the other hand, the series of ordinals—and actually every well-ordered series—has no lower limiting points because there are no terms other than the last that have no immediate successors. However, if we consider a series like the series of ratios, every member of this series can be both an upper and a lower limiting point for appropriately selected sets. If we examine the series of real numbers and isolate the rational real numbers, this set (the rationals) will have all the real numbers as its upper and lower limiting points. The limiting points of a set are referred to as its "first derivative," and the limiting points of the first derivative are called the second derivative, and so forth.

With regard to limits, we may distinguish various grades of what may be called "continuity" in a series. The word "continuity" had been used for a long time, but had remained without any precise definition until the time of Dedekind and Cantor. Each of these two men gave a precise significance to the term, but Cantor's definition is narrower than Dedekind's: a series which has Cantorian continuity must have Dedekindian continuity, but the converse does not hold.

With respect to limits, we can identify different levels of what could be called "continuity" in a series. The term "continuity" has been used for a long time but was not clearly defined until the work of Dedekind and Cantor. Both of these thinkers provided a specific meaning for the term; however, Cantor's definition is more limited than Dedekind's: a series that exhibits Cantorian continuity must also have Dedekindian continuity, but the reverse is not true.

The first definition that would naturally occur to a man seeking a precise meaning for the continuity of series would be to define it as consisting in what we have called "compactness," i.e. in the fact that between any two terms of the series there are others. But this would be an inadequate definition, because of the existence of "gaps" in series such as the series of ratios. We saw in Chapter VII. that there are innumerable ways in which the series of ratios can be divided into two parts, of which one wholly precedes the other, and of which the first has no last term, [Pg 100] while the second has no first term. Such a state of affairs seems contrary to the vague feeling we have as to what should characterise "continuity," and, what is more, it shows that the series of ratios is not the sort of series that is needed for many mathematical purposes. Take geometry, for example: we wish to be able to say that when two straight lines cross each other they have a point in common, but if the series of points on a line were similar to the series of ratios, the two lines might cross in a "gap" and have no point in common. This is a crude example, but many others might be given to show that compactness is inadequate as a mathematical definition of continuity.

The first definition that would probably come to mind for someone looking for a clear meaning of continuity in series is to define it as "compactness," meaning that there are other terms between any two terms of the series. But this isn't a complete definition, because there are "gaps" in series like the series of ratios. We saw in Chapter VII that there are countless ways to split the series of ratios into two parts, where one completely comes before the other, with the first part having no last term and the second part having no first term. This situation seems to go against the vague idea we have of what should define "continuity," and, what's more, it indicates that the series of ratios isn't suitable for many mathematical needs. Take geometry, for instance: we need to state that when two straight lines intersect, they share a common point. However, if the series of points on a line were like the series of ratios, the two lines could intersect in a "gap" and wouldn’t have a common point. This is a simple example, but there are many others that could illustrate that compactness is not a sufficient mathematical definition of continuity.

It was the needs of geometry, as much as anything, that led to the definition of "Dedekindian" continuity. It will be remembered that we defined a series as Dedekindian when every sub-class of the field has a boundary. (It is sufficient to assume that there is always an upper boundary, or that there is always a lower boundary. If one of these is assumed, the other can be deduced.) That is to say, a series is Dedekindian when there are no gaps. The absence of gaps may arise either through terms having successors, or through the existence of limits in the absence of maxima. Thus a finite series or a well-ordered series is Dedekindian, and so is the series of real numbers. The former sort of Dedekindian series is excluded by assuming that our series is compact; in that case our series must have a property which may, for many purposes, be fittingly called continuity. Thus we are led to the definition:

It was the demands of geometry, like many other factors, that brought about the definition of "Dedekindian" continuity. We remember that we defined a series as Dedekindian when every sub-class of the field has a boundary. (It's enough to assume there’s always an upper boundary or that there’s always a lower boundary. If one of these is assumed, the other can be derived.) In other words, a series is Dedekindian when there are no gaps. The lack of gaps can occur either because terms have successors or because limits exist without maxima. So, a finite series or a well-ordered series is Dedekindian, just like the series of real numbers. The first type of Dedekindian series is ruled out by assuming that our series is compact; in that case, our series must have a property that can suitably be called continuity for many purposes. Thus, we arrive at the definition:

A series has "Dedekindian continuity" when it is Dedekindian and compact.

But this definition is still too wide for many purposes. Suppose, for example, that we desire to be able to assign such properties to geometrical space as shall make it certain that every point can be specified by means of co-ordinates which are real numbers: this is not insured by Dedekindian continuity alone. We want to be sure that every point which cannot be specified by rational co-ordinates can be specified as the limit of a progression of points [Pg 101] whose co-ordinates are rational, and this is a further property which our definition does not enable us to deduce.

But this definition is still too broad for many purposes. For instance, if we want to be able to assign properties to geometric space that ensure every point can be identified using coordinates that are real numbers, Dedekindian continuity alone doesn't guarantee that. We need to ensure that every point that can't be specified by rational coordinates can be identified as the limit of a progression of points [Pg 101] with rational coordinates, and this is an additional property that our definition doesn't allow us to conclude.

We are thus led to a closer investigation of series with respect to limits. This investigation was made by Cantor and formed the basis of his definition of continuity, although, in its simplest form, this definition somewhat conceals the considerations which have given rise to it. We shall, therefore, first travel through some of Cantor's conceptions in this subject before giving his definition of continuity.

We are therefore prompted to take a closer look at series in relation to limits. This analysis was conducted by Cantor and laid the groundwork for his definition of continuity, although, in its simplest form, this definition somewhat masks the underlying ideas that led to it. Therefore, we will first explore some of Cantor's ideas on this topic before presenting his definition of continuity.

Cantor defines a series as "perfect" when all its points are limiting-points and all its limiting-points belong to it. But this definition does not express quite accurately what he means. There is no correction required so far as concerns the property that all its points are to be limiting-points; this is a property belonging to compact series, and to no others if all points are to be upper limiting- or all lower limiting-points. But if it is only assumed that they are limiting-points one way, without specifying which, there will be other series that will have the property in question—for example, the series of decimals in which a decimal ending in a recurring 9 is distinguished from the corresponding terminating decimal and placed immediately before it. Such a series is very nearly compact, but has exceptional terms which are consecutive, and of which the first has no immediate predecessor, while the second has no immediate successor. Apart from such series, the series in which every point is a limiting-point are compact series; and this holds without qualification if it is specified that every point is to be an upper limiting-point (or that every point is to be a lower limiting-point).

Cantor defines a series as "perfect" when all its points are limiting points and all its limiting points are included in the series. However, this definition doesn't fully capture what he means. There’s no correction needed regarding the property that all points must be limiting points; this is a characteristic of compact series, and no others if all points are to be upper limiting points or all lower limiting points. But if we only assume they are limiting points in one direction without specifying which, there will be other series that meet this criterion—for instance, the series of decimals where a decimal ending in a recurring 9 is distinguished from the corresponding terminating decimal and placed right before it. This kind of series is very close to being compact, but it has exceptional consecutive terms, the first of which has no immediate predecessor, while the second has no immediate successor. Aside from such series, the series in which every point is a limiting point are compact series; and this is true without exception if it's stated that every point is meant to be an upper limiting point (or that every point is meant to be a lower limiting point).

Although Cantor does not explicitly consider the matter, we must distinguish different kinds of limiting-points according to the nature of the smallest sub-series by which they can be defined. Cantor assumes that they are to be defined by progressions, or by regressions (which are the converses of progressions). When every member of our series is the limit of a progression or regression, Cantor calls our series "condensed in itself" (insichdicht). [Pg 102]

Although Cantor doesn’t explicitly address this issue, we need to differentiate between various types of limit points based on the smallest subsequences that can define them. Cantor assumes they should be defined by progressions or regressions (which are the opposites of progressions). When every member of our series is the limit of a progression or regression, Cantor refers to our series as "condensed in itself" (insichdicht). [Pg 102]

We come now to the second property by which perfection was to be defined, namely, the property which Cantor calls that of being "closed" (abgeschlossen). This, as we saw, was first defined as consisting in the fact that all the limiting-points of a series belong to it. But this only has any effective significance if our series is given as contained in some other larger series (as is the case, e.g., with a selection of real numbers), and limiting-points are taken in relation to the larger series. Otherwise, if a series is considered simply on its own account, it cannot fail to contain its limiting-points. What Cantor means is not exactly what he says; indeed, on other occasions he says something rather different, which is what he means. What he really means is that every subordinate series which is of the sort that might be expected to have a limit does have a limit within the given series; i.e. every subordinate series which has no maximum has a limit, i.e. every subordinate series has a boundary. But Cantor does not state this for every subordinate series, but only for progressions and regressions. (It is not clear how far he recognises that this is a limitation.) Thus, finally, we find that the definition we want is the following:—

We now turn to the second property that defines perfection, which Cantor refers to as being "closed" (abgeschlossen). As we noted, this was initially defined as the idea that all the limit points of a series belong to it. However, this only holds true if our series is given as part of some larger series (like a selection of real numbers), and limit points are considered in relation to that larger series. Otherwise, if a series is looked at in isolation, it will always include its limit points. What Cantor means isn't exactly what he says; in fact, at times he suggests something quite different, which is actually his intended meaning. What he really intends is that every subordinate series that might be expected to have a limit does indeed have a limit within the given series; i.e. every subordinate series without a maximum has a limit, i.e. every subordinate series has a boundary. However, Cantor only states this for every subordinate series, specifically for progressions and regressions. (It's unclear how much he acknowledges this as a limitation.) Therefore, we conclude that the definition we are looking for is as follows:—

A series is said to be "closed" (abgeschlossen) when every progression or regression contained in the series has a limit in the series.

A series is considered "closed" (abgeschlossen) when every increase or decrease within the series has a limit in the series.

We then have the further definition:—

We then have the additional definition:—

A series is "perfect" when it is condensed in itself and closed, i.e. when every term is the limit of a progression or regression, and every progression or regression contained in the series has a limit in the series.

A series is "perfect" when it is self-contained and complete, i.e. when every term is the limit of a sequence moving forward or backward, and every sequence contained in the series has a limit within the series.

In seeking a definition of continuity, what Cantor has in mind is the search for a definition which shall apply to the series of real numbers and to any series similar to that, but to no others. For this purpose we have to add a further property. Among the real numbers some are rational, some are irrational; although the number of irrationals is greater than the number of rationals, yet there are rationals between any two real numbers, however [Pg 103] little the two may differ. The number of rationals, as we saw, is $\aleph_{0}$ . This gives a further property which suffices to characterise continuity completely, namely, the property of containing a class of $\aleph_{0}$ members in such a way that some of this class occur between any two terms of our series, however near together. This property, added to perfection, suffices to define a class of series which are all similar and are in fact a serial number. This class Cantor defines as that of continuous series.

In looking for a definition of continuity, Cantor aims to find a definition that applies specifically to the series of real numbers and any similar series, but not to others. To achieve this, we need to include an additional property. Among the real numbers, there are both rational and irrational numbers; although there are more irrational numbers than rational ones, there are still rational numbers between any two real numbers, no matter how close they are to each other. As we observed, the count of rational numbers is $\aleph_{0}$ . This provides another property that fully characterizes continuity, specifically the property of containing a set of $\aleph_{0}$ members such that some of these members exist between any two terms of our series, no matter how close together they are. This property, combined with perfection, is enough to define a class of series that are all similar and effectively constitute a serial number. Cantor defines this class as that of continuous series.

We may slightly simplify his definition. To begin with, we say:

We can simplify his definition a bit. To start, we say:

A "median class" of a series is a sub-class of the field such that members of it are to be found between any two terms of the series.

A "median class" of a series is a subgroup within the field where its members are found between any two terms of the series.

Thus the rationals are a median class in the series of real numbers. It is obvious that there cannot be median classes except in compact series.

Thus, the rational numbers are a middle class in the series of real numbers. It's clear that there can't be middle classes except in compact series.

We then find that Cantor's definition is equivalent to the following:—

We then find that Cantor's definition is the same as the following:—

A series is "continuous" when (1) it is Dedekindian, (2) it contains a median class having $\aleph_{0}$ terms.

A series is "continuous" when (1) it is Dedekindian, (2) it has a median class containing $\aleph_{0}$ terms.

To avoid confusion, we shall speak of this kind as "Cantorian continuity." It will be seen that it implies Dedekindian continuity, but the converse is not the case. All series having Cantorian continuity are similar, but not all series having Dedekindian continuity.

To avoid confusion, we'll refer to this type as "Cantorian continuity." It will be clear that it implies Dedekindian continuity, but the opposite is not true. All series with Cantorian continuity are similar, but not all series with Dedekindian continuity.

The notions of limit and continuity which we have been defining must not be confounded with the notions of the limit of a function for approaches to a given argument, or the continuity of a function in the neighbourhood of a given argument. These are different notions, very important, but derivative from the above and more complicated. The continuity of motion (if motion is continuous) is an instance of the continuity of a function; on the other hand, the continuity of space and time (if they are continuous) is an instance of the continuity of series, or (to speak more cautiously) of a kind of continuity which can, by sufficient mathematical [Pg 104] manipulation, be reduced to the continuity of series. In view of the fundamental importance of motion in applied mathematics, as well as for other reasons, it will be well to deal briefly with the notions of limits and continuity as applied to functions; but this subject will be best reserved for a separate chapter.

The concepts of limit and continuity that we've been defining should not be confused with the idea of the limit of a function as it approaches a specific argument, or the continuity of a function near a specific argument. These are distinct ideas that are very important but are derived from the above concepts and are more complex. The continuity of motion (if motion is continuous) is an example of the continuity of a function; on the flip side, the continuity of space and time (if they are continuous) represents a type of continuity that can, with enough mathematical manipulation, be related to the continuity of series. Given the crucial role of motion in applied mathematics, among other reasons, it would be beneficial to briefly discuss the ideas of limits and continuity as they relate to functions; however, this topic is better suited for a separate chapter.

The definitions of continuity which we have been considering, namely, those of Dedekind and Cantor, do not correspond very closely to the vague idea which is associated with the word in the mind of the man in the street or the philosopher. They conceive continuity rather as absence of separateness, the sort of general obliteration of distinctions which characterises a thick fog. A fog gives an impression of vastness without definite multiplicity or division. It is this sort of thing that a metaphysician means by "continuity," declaring it, very truly, to be characteristic of his mental life and of that of children and animals.

The definitions of continuity we've been looking at, specifically those from Dedekind and Cantor, don't really match up with the vague idea the average person or philosopher associates with the term. They see continuity more as a lack of separateness, like the way a thick fog blurs distinctions. A fog feels expansive without any clear divisions or separations. This is what a metaphysician means by "continuity," and it's accurate to say it's a key feature of his mental life as well as that of children and animals.

The general idea vaguely indicated by the word "continuity" when so employed, or by the word "flux," is one which is certainly quite different from that which we have been defining. Take, for example, the series of real numbers. Each is what it is, quite definitely and uncompromisingly; it does not pass over by imperceptible degrees into another; it is a hard, separate unit, and its distance from every other unit is finite, though it can be made less than any given finite amount assigned in advance. The question of the relation between the kind of continuity existing among the real numbers and the kind exhibited, e.g. by what we see at a given time, is a difficult and intricate one. It is not to be maintained that the two kinds are simply identical, but it may, I think, be very well maintained that the mathematical conception which we have been considering in this chapter gives the abstract logical scheme to which it must be possible to bring empirical material by suitable manipulation, if that material is to be called "continuous" in any precisely definable sense. It would be quite impossible [Pg 105] to justify this thesis within the limits of the present volume. The reader who is interested may read an attempt to justify it as regards time in particular by the present author in the Monist for 1914-5, as well as in parts of Our Knowledge of the External World. With these indications, we must leave this problem, interesting as it is, in order to return to topics more closely connected with mathematics. [Pg 106]

The general idea hinted at by the term "continuity," or by the term "flux," is certainly quite different from what we have been defining. Take, for instance, the series of real numbers. Each number is precisely what it is, without compromise; it does not gradually transition into another number. It stands as a distinct, separate unit, and the distance between any two units is finite, although it can be made smaller than any predetermined finite amount. The relationship between the type of continuity found among real numbers and the type we observe at any given moment is a challenging and complex issue. It’s not accurate to claim that the two types are simply the same, but I believe it can be strongly argued that the mathematical idea we have been exploring in this chapter provides the abstract logical framework that empirical material must conform to through appropriate manipulation if we are to categorize it as "continuous" in any clearly defined way. It would be impossible to justify this claim within the boundaries of this volume. Readers interested in this topic may refer to an attempt by the author to justify it specifically concerning time in the Monist for 1914-5, as well as in parts of Our Knowledge of the External World. With these points mentioned, we must set aside this intriguing problem to return to discussions more directly related to mathematics. [Pg 105] [Pg 106]

CHAPTER XI

LIMITS AND CONTINUITY OF FUNCTIONS

IN this chapter we shall be concerned with the definition of the limit of a function (if any) as the argument approaches a given value, and also with the definition of what is meant by a "continuous function." Both of these ideas are somewhat technical, and would hardly demand treatment in a mere introduction to mathematical philosophy but for the fact that, especially through the so-called infinitesimal calculus, wrong views upon our present topics have become so firmly embedded in the minds of professional philosophers that a prolonged and considerable effort is required for their uprooting. It has been thought ever since the time of Leibniz that the differential and integral calculus required infinitesimal quantities. Mathematicians (especially Weierstrass) proved that this is an error; but errors incorporated, e.g. in what Hegel has to say about mathematics, die hard, and philosophers have tended to ignore the work of such men as Weierstrass.

IN this chapter, we will focus on defining the limit of a function (if one exists) as the input approaches a specific value, as well as clarifying what is meant by a "continuous function." Both concepts are a bit technical and would usually not require discussion in a simple introduction to mathematical philosophy, except that, especially through the so-called infinitesimal calculus, misconceptions about these topics have deeply taken root in the minds of professional philosophers, so it takes a significant amount of effort to correct them. Since Leibniz's time, it has been believed that differential and integral calculus relied on infinitesimal quantities. Mathematicians (especially Weierstrass) have proven that this is a mistake, but misconceptions embedded, e.g. in Hegel's discussions on mathematics, are hard to shake, and philosophers have often overlooked the contributions of figures like Weierstrass.

Limits and continuity of functions, in works on ordinary mathematics, are defined in terms involving number. This is not essential, as Dr Whitehead has shown.[22] We will, however, begin with the definitions in the text-books, and proceed afterwards to show how these definitions can be generalised so as to apply to series in general, and not only to such as are numerical or numerically measurable.

Limits and continuity of functions, in textbooks on basic mathematics, are defined using concepts related to numbers. This isn’t necessary, as Dr. Whitehead has demonstrated.[22] However, we will start with the definitions found in the textbooks and then show how these definitions can be generalized to apply to series in general, not just to those that are numerical or numerically measurable.

[22]See Principia Mathematica, vol. II. * 230-234.

Let us consider any ordinary mathematical function $fx$ , where [Pg 107] $x$ and $fx$ are both real numbers, and $fx$ is one-valued—i.e. when $x$ is given, there is only one value that $fx$ can have. We call $x$ the "argument," and $fx$ the "value for the argument $x$ ." When a function is what we call "continuous," the rough idea for which we are seeking a precise definition is that small differences in $x$ shall correspond to small differences in $fx$ , and if we make the differences in $x$ small enough, we can make the differences in $fx$ fall below any assigned amount. We do not want, if a function is to be continuous, that there shall be sudden jumps, so that, for some value of $x$ , any change, however small, will make a change in $fx$ which exceeds some assigned finite amount. The ordinary simple functions of mathematics have this property: it belongs, for example, to $x^{2}$ , $x^{3}$ , ... $\log x$ , $\sin x$ , and so on. But it is not at all difficult to define discontinuous functions. Take, as a non-mathematical example, "the place of birth of the youngest person living at time $t$ ." This is a function of $t$ ; its value is constant from the time of one person's birth to the time of the next birth, and then the value changes suddenly from one birthplace to the other. An analogous mathematical example would be "the integer next below $x$ ," where $x$ is a real number. This function remains constant from one integer to the next, and then gives a sudden jump. The actual fact is that, though continuous functions are more familiar, they are the exceptions: there are infinitely more discontinuous functions than continuous ones.

Let’s look at any regular mathematical function $fx$ , where [Pg 107] $x$ and $fx$ are both real numbers, and $fx$ has a single output—i.e. when $x$ is provided, there is only one corresponding value for $fx$ . We refer to $x$ as the "input," and $fx$ as the "output for the input $x$ should lead to small changes in $fx$

Many functions are discontinuous for one or several values of the variable, but continuous for all other values. Take as an example $\sin 1/x$ . The function $\sin \theta$ passes through all values from -1 to 1 every time that $\theta$ passes from $-\pi/2$ to $\pi/2$ , or from $\pi/2$ to $3\pi/2$ , or generally from $(2n - 1)\pi/2$ to $(2n + 1)\pi/2$ , where $n$ is any integer. Now if we consider $1/x$ when $x$ is very small, we see that as $x$ diminishes $1/x$ grows faster and faster, so that it passes more and more quickly through the cycle of values from one multiple of $\pi/2$ to another as $x$ becomes smaller and smaller. Consequently $\sin 1/x$ passes more and more quickly from -1 [Pg 108] to 1 and back again, as $x$ grows smaller. In fact, if we take any interval containing 0, say the interval from $-\epsilon$ to $+\epsilon$ where $\epsilon$ is some very small number, $\sin 1/x$ will go through an infinite number of oscillations in this interval, and we cannot diminish the oscillations by making the interval smaller. Thus round about the argument 0 the function is discontinuous. It is easy to manufacture functions which are discontinuous in several places, or in $\aleph_{0}$ places, or everywhere. Examples will be found in any book on the theory of functions of a real variable.

Many functions are discontinuous for one or more values of the variable but continuous for all the others. For instance, consider $\sin 1/x$ . The function $\sin \theta$ takes on all values from -1 to 1 whenever $\theta$ moves from $-\pi/2$ to $\pi/2$ , or from $\pi/2$ to $3\pi/2$ , or generally from $(2n - 1)\pi/2$ to $(2n + 1)\pi/2$ , where $n$ is any integer. Now, if we look at $1/x$ when $x$ is very small, we can see that as $x$ gets smaller, $1/x$ increases rapidly, cycling through values from one multiple of $\pi/2$ to another faster and faster as $x$ becomes smaller. As a result, $\sin 1/x$ moves more and more quickly from -1 to 1 and back again as $x$ gets smaller. In fact, if we take any interval that includes 0, like the interval from $-\epsilon$ to $+\epsilon$ where $\epsilon$ is a very small number, $\sin 1/x$ will undergo an infinite number of oscillations in this interval, and we cannot reduce these oscillations by making the interval smaller. Thus, around the point 0, the function is discontinuous. It’s easy to create functions that are discontinuous in several places, in $\aleph_{0}$ places, or everywhere. You can find examples of this in any book about the theory of functions of a real variable.

Proceeding now to seek a precise definition of what is meant by saying that a function is continuous for a given argument, when argument and value are both real numbers, let us first define a "neighbourhood" of a number $x$ as all the numbers from $x - \epsilon$ to $x + \epsilon$ , where $\epsilon$ is some number which, in important cases, will be very small. It is clear that continuity at a given point has to do with what happens in any neighbourhood of that point, however small.

Now, let's find a clear definition of what it means for a function to be continuous for a certain input when both the input and output are real numbers. First, we’ll define a "neighborhood" of a number $x$ as all the numbers from $x - \epsilon$ to $x + \epsilon$ , where $\epsilon$ is some number that will be quite small in important cases. It's clear that continuity at a certain point relates to what occurs in any neighborhood of that point, no matter how small.

What we desire is this: If $a$ is the argument for which we wish our function to be continuous, let us first define a neighbourhood ( $\alpha$ say) containing the value $fa$ which the function has for the argument $a$ ; we desire that, if we take a sufficiently small neighbourhood containing $a$ , all values for arguments throughout this neighbourhood shall be contained in the neighbourhood $\alpha$ , no matter how small we may have made $\alpha$ . That is to say, if we decree that our function is not to differ from $fa$ by more than some very tiny amount, we can always find a stretch of real numbers, having $a$ in the middle of it, such that throughout this stretch $fx$ will not differ from $fa$ by more than the prescribed tiny amount. And this is to remain true whatever tiny amount we may select. Hence we are led to the following definition:—

What we want is this: If $a$ is the input for which we want our function to be continuous, let’s first define a neighborhood ( $\alpha$ say) around the value $fa$ that the function has for the input $a$ , all the values for inputs within this neighborhood should be within the neighborhood $\alpha$ , no matter how small we make $fa$ by more than a very small amount, we can always find a range of real numbers, with $a$ in the center, such that throughout this range $fx$ won’t differ from $fa$ by more than the specified small amount. And this holds true no matter what small amount we choose. Therefore, we arrive at the following definition:—

The function $f(x)$ is said to be "continuous" for the argument $a$ if, for every positive number $\epsilon$ , different from 0, but as small as we please, there exists a positive number $\delta$ , different from 0, such that, for all values of $\delta$ which are numerically [Pg 109] less [23] than $\epsilon$ , the difference $f(a) + \delta) - f(a)$ is numerically less than $\sigma$ .

The function $f(x)$ is considered "continuous" at the point $a$ if, for every positive number $\epsilon$ , which is greater than 0 but can be made as small as needed, there exists a positive number $\delta$ , which is also greater than 0, such that for all values of $\delta$ that are numerically [Pg 109] less [23] than $\epsilon$ , the difference $f(a) + \delta) - f(a)$ is numerically less than $\sigma$ .

[23]A number is said to be "numerically less" than $\epsilon$ when it lies between $-\epsilon$ and $+\epsilon$ .

[23]A number is considered "numerically less" than $\epsilon$ when it falls between $-\epsilon$ and $+\epsilon$ .

In this definition, $\sigma$ first defines a neighbourhood of $f(a)$ , namely, the neighbourhood from $f(a) - \sigma$ to $f(a) + \epsilon$ . The definition then proceeds to say that we can (by means of $\epsilon$ define a neighbourhood, namely, that from $a - \epsilon$ to $a + \epsilon$ , such that, for all arguments within this neighbourhood, the value of the function lies within the neighbourhood horn $f(a) - \sigma$ to $f(a) + \sigma$ . If this can be done, however $\sigma$ may be chosen, the function is "continuous" for the argument $a$ .

In this definition, $\sigma$ first defines a neighborhood of $f(a)$ , which is the neighborhood from $f(a) - \sigma$ to $f(a) + \epsilon$ . The definition then goes on to say that we can, using $\epsilon$ define a neighborhood from $a - \epsilon$ to $a + \epsilon$ , such that for all arguments within this neighborhood, the value of the function lies within the neighborhood from $f(a) - \sigma$ to $f(a) + \sigma$ . If this can be accomplished, regardless of how $\sigma$ is chosen, the function is considered "continuous" for the argument $a$ .

So far we have not defined the "limit" of a function for a given argument. If we had done so, we could have defined the continuity of a function differently: a function is continuous at a point where its value is the same as the limit of its value for approaches either from above or from below. But it is only the exceptionally "tame" function that has a definite limit as the argument approaches a given point. The general rule is that a function oscillates, and that, given any neighbourhood of a given argument, however small, a whole stretch of values will occur for arguments within this neighbourhood. As this is the general rule, let us consider it first.

So far, we haven't defined the "limit" of a function for a specific argument. If we had, we could have defined the continuity of a function differently: a function is continuous at a point where its value matches the limit of its value as you approach from either direction. However, only very "well-behaved" functions have a clear limit as the argument approaches a specific point. Generally, functions oscillate, meaning that within any neighborhood of a given argument, no matter how small, you’ll find a whole range of values for arguments in that neighborhood. Since this is the general case, let's focus on it first.

Let us consider what may happen as the argument approaches some value $a$ from below. That is to say, we wish to consider what happens for arguments contained in the interval from $a - \epsilon$ to $a$ , where $\epsilon$ is some number which, in important cases, will be very small.

Let’s look at what might happen as the argument gets close to the value $a$ from below. Essentially, we want to examine what occurs for arguments within the range from $a - \epsilon$ to $a$ , where $\epsilon$ is a number that, in key situations, will be very small.

The values of the function for arguments from $a - \epsilon$ to $a$ ( $a$ excluded) will be a set of real numbers which will define a certain section of the set of real numbers, namely, the section consisting of those numbers that are not greater than all the values for arguments from $a - \epsilon$ to $a$ . Given any number in this section, there are values at least as great as this number for arguments between $a - \epsilon$ and $a$ , i.e. for arguments that fall very little short [Pg 110] of $a$ (if $\epsilon$ is very small). Let us take all possible $\epsilon$ 's and all possible corresponding sections. The common part of all these sections we will call the "ultimate section" as the argument approaches $a$ . To say that a number $z$ belongs to the ultimate section is to say that, however small we may make $\epsilon$ , there are arguments between $a - \epsilon$ and $a$ for which the value of the function is not less than $z$ .

The values of the function for inputs from $a - \epsilon$ to $a$ ( $a$ excluded) will create a set of real numbers that defines a specific section of the real number set, namely, the section containing those numbers that are not greater than all the values for inputs from $a - \epsilon$ to $a - \epsilon$ and $a$ , i.e. for arguments that come very close to $a$ (if $\epsilon$ is very small). Let’s take all possible $\epsilon$ values and all corresponding sections. The overlapping part of all these sections will be called the "ultimate section" as the argument approaches $a$ . Saying that a number $z$ is part of the ultimate section means that no matter how small we make $\epsilon$ , there are inputs between $a - \epsilon$ and $a$ for which the function value is not less than $z$ .

We may apply exactly the same process to upper sections, i.e. to sections that go from some point up to the top, instead of from the bottom up to some point. Here we take those numbers that are not less than all the values for arguments from $a - \epsilon$ to $a$ ; this defines an upper section which will vary as $\epsilon$ varies. Taking the common part of all such sections for all possible $\epsilon$ 's, we obtain the "ultimate upper section." To say that a number $z$ belongs to the ultimate upper section is to say that, however small we make $\epsilon$ , there are arguments between $a - \epsilon$ and $a$ for which the value of the function is not greater than $z$ .

We can use the same process for the upper sections, meaning those that extend from a certain point to the top, rather than from the bottom to a certain point. Here, we focus on the numbers that are not less than all the values for arguments from $a - \epsilon$ to $a$ ; this creates an upper section that will change as $\epsilon$ varies. By taking the common part of all these sections for all possible $\epsilon$ , we arrive at the "ultimate upper section." To say that a number $z$ belongs to the ultimate upper section means that, no matter how small we make $\epsilon$ , there are arguments between $a - \epsilon$ and $a$ for which the value of the function is not greater than $z$

If a term $z$ belongs both to the ultimate section and to the ultimate upper section, we shall say that it belongs to the "ultimate oscillation." We may illustrate the matter by considering once more the function $\sin 1/x$ as $x$ approaches the value 0. We shall assume, in order to fit in with the above definitions, that this value is approached from below.

If a term $z$ is part of both the ultimate section and the ultimate upper section, we will say it belongs to the "ultimate oscillation." To illustrate this, let's consider the function $\sin 1/x$ as $x$ approaches 0. We will assume, in line with the definitions above, that this value is approached from below.

Let us begin with the "ultimate section." Between $-\epsilon$ and 0, whatever $\epsilon$ may be, the function will assume the value 1 for certain arguments, but will never assume any greater value. Hence the ultimate section consists of all real numbers, positive and negative, up to and including 1; i.e. it consists of all negative numbers together with 0, together with the positive numbers up to and including 1.

Let’s start with the "ultimate section." Between $-\epsilon$ and 0, regardless of what $\epsilon$ is, the function will take on the value 1 for certain arguments, but will never go beyond that. Therefore, the ultimate section includes all real numbers, both positive and negative, up to and including 1; i.e. it includes all negative numbers along with 0, plus all positive numbers up to and including 1.

Similarly the "ultimate upper section" consists of all positive numbers together with 0, together with the negative numbers down to and including -1.

Similarly, the "ultimate upper section" includes all positive numbers along with 0, and the negative numbers down to and including -1.

Thus the "ultimate oscillation" consists of all real numbers from -1 to 1, both included. [Pg 111]

Thus the "ultimate oscillation" includes all real numbers from -1 to 1, both inclusive. [Pg 111]

We may say generally that the "ultimate oscillation" of a function as the argument approaches $a$ from below consists of all those numbers $x$ which are such that, however near we come to $a$ , we shall still find values as great as $x$ and values as small as $x$ .

We can generally say that the "ultimate oscillation" of a function as the argument approaches $a$ from below includes all the numbers $x$ such that, no matter how close we get to $a$ , we will still find values as large as $x$ and values as small as $x$ .

The ultimate oscillation may contain no terms, or one term, or many terms. In the first two cases the function has a definite limit for approaches from below. If the ultimate oscillation has one term, this is fairly obvious. It is equally true if it has none; for it is not difficult to prove that, if the ultimate oscillation is null, the boundary of the ultimate section is the same as that of the ultimate upper section, and may be defined as the limit of the function for approaches from below. But if the ultimate oscillation has many terms, there is no definite limit to the function for approaches from below. In this case we can take the lower and upper boundaries of the ultimate oscillation (i.e. the lower boundary of the ultimate upper section and the upper boundary of the ultimate section) as the lower and upper limits of its "ultimate" values for approaches from below. Similarly we obtain lower and upper limits of the "ultimate" values for approaches from above. Thus we have, in the general case, four limits to a function for approaches to a given argument. The limit for a given argument $a$ only exists when all these four are equal, and is then their common value. If it is also the value for the argument $a$ , the function is continuous for this argument. This may be taken as defining continuity: it is equivalent to our former definition.

The ultimate oscillation can have no terms, one term, or multiple terms. In the first two situations, the function has a clear limit when approaching from below. This is obvious if there’s one term; it’s also true if there are none, because it’s easy to show that if the ultimate oscillation is zero, the boundary of the ultimate section matches that of the ultimate upper section and can be defined as the limit of the function when approaching from below. However, if the ultimate oscillation has many terms, there’s no clear limit to the function when approaching from below. In this scenario, we can take the lower and upper boundaries of the ultimate oscillation (i.e., the lower boundary of the ultimate upper section and the upper boundary of the ultimate section) as the lower and upper limits of its "ultimate" values when approaching from below. Similarly, we can find the lower and upper limits of the "ultimate" values for approaches from above. Therefore, in general, we have four limits for a function as it approaches a given argument. The limit for a specific argument only exists when all four of these are the same, and is then considered their common value. If this value also refers to the argument, the function is continuous at that argument. This can be seen as the definition of continuity: it is equivalent to our earlier definition.

We can define the limit of a function for a given argument (if it exists) without passing through the ultimate oscillation and the four limits of the general case. The definition proceeds, in that case, just as the earlier definition of continuity proceeded. Let us define the limit for approaches from below. If there is to be a definite limit for approaches to $a$ from below, it is necessary and sufficient that, given any small number $\sigma$ , two values for arguments sufficiently near to $a$ (but both less than $a$ ) will differ [Pg 112] by less than $\sigma$ ; i.e. if $\epsilon$ is sufficiently small, and our arguments both lie between $a - \epsilon$ and $a$ ( $a$ excluded), then the difference between the values for these arguments will be less than $\sigma$ . This is to hold for any $\sigma$ , however small; in that case the function has a limit for approaches from below. Similarly we define the case when there is a limit for approaches from above. These two limits, even when both exist, need not be identical; and if they are identical, they still need not be identical with the value for the argument $a$ . It is only in this last case that we call the function continuous for the argument $a$ .

We can define the limit of a function for a given argument (if it exists) without going through ultimate oscillation and the four limits of the general case. The definition works just like the earlier definition of continuity. Let’s define the limit for approaches from below. For there to be a definite limit for approaches to $a$ from below, it’s necessary and sufficient that, given any small number $\sigma$ , two argument values that are close enough to $a$ (but both less than $a$ ) will differ by less than $\sigma$ ; i.e. if $\epsilon$ is sufficiently small, and our arguments both lie between $a - \epsilon$ and $a$ ( $a$ excluded), then the difference between the values for these arguments will be less than $a$ . It’s only in this last case that we call the function continuous for the argument $a$ .

A function is called "continuous" (without qualification) when it is continuous for every argument.

A function is called "continuous" (without any qualifiers) when it is continuous for every input.

Another slightly different method of reaching the definition of continuity is the following:—

Another slightly different way to define continuity is as follows:—

Let us say that a function "ultimately converges into a class $\alpha$ " if there is some real number such that, for this argument and all arguments greater than this, the value of the function is a member of the class $alpha$ . Similarly we shall say that a function "converges into $\alpha$ as the argument approaches $x$ from below" if there is some argument $y$ less than $x$ such that throughout the interval from $y$ (included) to $x$ (excluded) the function has values which are members of $alpha$ . We may now say that a function is continuous for the argument $a$ , for which it has the value $fa$ , if it satisfies four conditions, namely:—

Let’s say that a function "ultimately converges to a class $\alpha$ " if there is some real number such that, for this argument and all arguments greater than this, the value of the function is a member of the class $alpha$ . Similarly, we will say that a function "converges to $\alpha$ as the argument approaches $x$ from below" if there is some argument $y$ less than $x$ such that throughout the interval from $y$ (inclusive) to $x$ (exclusive) the function has values that are members of $alpha$ . We can now say that a function is continuous for the argument $a$ , which has the value $fa$ , if it meets four conditions, namely:—

(1) Given any real number less than $fa$ , the function converges into the successors of this number as the argument approaches $a$ from below;

(1) For any real number less than $fa$ , the function approaches the next numbers after this one as the argument gets closer to $a$ from below;

(2) Given any real number greater than $fa$ , the function converges into the predecessors of this number as the argument approaches $a$ from below;

(2) For any real number greater than $fa$ , the function approaches the values just below this number as the argument gets closer to $a$ from below;

(3) and (4) Similar conditions for approaches to $a$ from above.

(3) and (4) The same conditions apply for approaches to $a$ from above.

The advantages of this form of definition is that it analyses the conditions of continuity into four, derived from considering arguments and values respectively greater or less than the argument and value for which continuity is to be defined. [Pg 113]

The benefits of this type of definition are that it breaks down the conditions of continuity into four parts, based on examining arguments and values that are either greater or lesser than the argument and value for which continuity is being defined. [Pg 113]

We may now generalise our definitions so as to apply to series which are not numerical or known to be numerically measurable. The case of motion is a convenient one to bear in mind. There is a story by H. G. Wells which will illustrate, from the case of motion, the difference between the limit of a function for a given argument and its value for the same argument. The hero of the story, who possessed, without his knowledge, the power of realising his wishes, was being attacked by a policeman, but on ejaculating "Go to——" he found that the policeman disappeared. If $f(t)$ was the policeman's position at time $t$ , and $t_{0}$ the moment of the ejaculation, the limit of the policeman's positions as $t$ approached to $t_{0}$ from below would be in contact with the hero, whereas the value for the argument $t_{0}$ was —. But such occurrences are supposed to be rare in the real world, and it is assumed, though without adequate evidence, that all motions are continuous, i.e. that, given any body, if $f(t)$ is its position at time $t$ , $f(t)$ is a continuous function of $t$ . It is the meaning of "continuity" involved in such statements which we now wish to define as simply as possible.

We can now broaden our definitions to include series that are not numerical or can't be measured quantitatively. The concept of motion serves as a useful example. There's a story by H. G. Wells that highlights the difference between the limit of a function for a specific argument and its actual value for that same argument. In the story, the main character, who unknowingly has the ability to make his wishes come true, is confronted by a policeman. When he shouts "Go to——," the policeman vanishes. If $f(t)$ represents the policeman's position at time $t$ , and $t_{0}$ is the moment he shouted, the limit of the policeman's position as $t$ approaches $t_{0}$ from below would just touch the hero. However, the value for the argument $t_{0}$ was —. Such events are assumed to be rare in the real world, and it is thought—though without sufficient evidence—that all movements are continuous, meaning that for any object, if $f(t)$ is its position at time $t$ , $f(t)$ is a continuous function of $t$ . We now want to define the concept of "continuity" in such statements as simply as possible.

The definitions given for the case of functions where argument and value are real numbers can readily be adapted for more general use.

The definitions provided for functions where the input and output are real numbers can easily be adapted for broader applications.

Let $\mathrm P$ and $\mathrm Q$ be two relations, which it is well to imagine serial, though it is not necessary to our definitions that they should be so. Let $\mathrm R$ be a one-many relation whose domain is contained in the field of $\mathrm P$ , while its converse domain is contained in the field of $\mathrm Q$ . Then $\mathrm R$ is (in a generalised sense) a function, whose arguments belong to the field of $\mathrm Q$ , while its values belong to the field of $\mathrm P$ . Suppose, for example, that we are dealing with a particle moving on a line: let $\mathrm Q$ be the time-series, $\mathrm P$ the series of points on our line from left to right, $\mathrm R$ the relation of the position of our particle on the line at time $a$ to the time $a$ , so that "the $\mathrm R$ of $a$ " is its position at time $a$ . This illustration may be borne in mind throughout our definitions.

Let $\mathrm P$ and $\mathrm Q$ represent two relationships, which it’s helpful to think of as a series, even though it’s not essential for our definitions that they be so. Let $\mathrm R$ be a one-to-many relationship where its domain is in the field of $\mathrm P$ , while its converse domain is in the field of $\mathrm Q$ . Then $\mathrm R$ is (in a generalized sense) a function, whose inputs are from the field of $\mathrm Q$ , while its outputs belong to the field of $\mathrm P$ . For example, let’s consider a particle moving along a line: let $\mathrm Q$ be the time series, $\mathrm P$ the series of points on the line from left to right, and $\mathrm R$ the relationship of the particle's position on the line at time $a$ to the time $a$ . This way, "the $\mathrm R$ of $a$ ” is its position at time $a$ . This example can be kept in mind throughout our definitions.

We shall say that the function $\mathrm R$ is continuous for the argument $a$ [Pg 114] if, given any interval $\alpha$ on the $\mathrm P$ -series containing the value of the function for the argument $a$ , there is an interval on the $\mathrm Q$ -series containing $a$ not as an end-point and such that, throughout this interval, the function has values which are members of $\alpha$ . (We mean by an "interval" all the terms between any two; i.e. if $x$ and $y$ are two members of the field of $\mathrm P$ , and $x$ has the relation $\mathrm P$ to $y$ , we shall mean by the " $\mathrm P$ -interval $x$ to $y$ " all terms $z$ such that $x$ has the relation $\mathrm P$ to $x$ and $z$ has the relation $\mathrm P$ to $y$ —together, when so stated, with $x$ or $y$ themselves.)

We will say that the function $\mathrm R$ is continuous for the input $a$ [Pg 114] if, for any interval $\alpha$ in the $\mathrm P$ -series that includes the value of the function for the input $a$ , there exists an interval in the $\mathrm Q$ -series that contains $a$ not as an endpoint, and within this interval, the function has values that are part of $\alpha$ . (By "interval," we mean all the terms between any two; i.e. if $x$ and $y$ are two members of the field of $\mathrm P$ , and $x$ has the relation $\mathrm P$ to $y$ , we will refer to the " $\mathrm P$ -interval $x$ to $y$

We can easily define the "ultimate section" and the "ultimate oscillation." To define the "ultimate section" for approaches to the argument $a$ from below, take any argument $y$ which precedes $a$ (i.e. has the relation $\mathrm Q$ to $a$ ), take the values of the function for all arguments up to and including $y$ , and form the section of $\mathrm P$ defined by these values, i.e. those members of the $\mathrm P$ -series which are earlier than or identical with some of these values. Form all such sections for all $y$ 's that precede $a$ , and take their common part; this will be the ultimate section. The ultimate upper section and the ultimate oscillation are then defined exactly as in the previous case.

We can easily define the "ultimate section" and the "ultimate oscillation." To define the "ultimate section" for approaches to the argument $a$ from below, take any argument $y$ that comes before $a$ (i.e. has the relation $\mathrm Q$ to $a$ ), take the values of the function for all arguments up to and including $y$ , and form the section of $\mathrm P$ defined by these values, i.e. those members of the $\mathrm P$ -series that are earlier than or equal to some of these values. Form all such sections for all $y$ 's that come before $a$ , and take their common part; this will be the ultimate section. The ultimate upper section and the ultimate oscillation are then defined exactly as in the previous case.

The adaptation of the definition of convergence and the resulting alternative definition of continuity offers no difficulty of any kind.

The updated definition of convergence and the new definition of continuity are straightforward and easy to understand.

We say that a function $\mathrm R$ is "ultimately $\mathrm Q$ -convergent into $\alpha$ " if there is a member $y$ of the converse domain of $\mathrm Q$ and the field of $\mathrm Q$ such that the value of the function for the argument $y$ and for any argument to which $y$ has the relation $\alpha$ is a member of $\alpha$ . We say that $\mathrm R$ " $\mathrm Q$ -converges into $\alpha$ as the argument approaches a given argument $a$ " if there is a term $y$ having the relation $\mathrm Q$ to $a$ and belonging to the converse domain of $\mathrm R$ and such that the value of the function for any argument in the $\mathrm Q$ -interval from $y$ (inclusive) to $a$ (exclusive) belongs to $\alpha$ .

We say that a function $\mathrm R$ is "ultimately $\mathrm Q$ -convergent to $y$ of the converse domain of $\mathrm Q$ and the field of $\mathrm Q$ such that the function's value for the argument $y$ and any argument related to $y$ by $\alpha$ is a member of $\mathrm R$ " $\mathrm Q$

Of the four conditions that a function must fulfil in order to be continuous for the argument $a$ , the first is, putting $b$ for the value for the argument $a$ : [Pg 115]

Of the four conditions that a function must meet to be continuous for the input $a$ , the first is to substitute $b$ for the value of the input $a$ : [Pg 115]

Given any term having the relation $\mathrm P$ to $b$ , $\mathrm R$ $\mathrm Q$ -converges into the successors of $b$ (with respect to $\mathrm P$ ) as the argument approaches $a$ from below.

Given any term that has the relation $\mathrm P$ to $b$ , $\mathrm R$ $\mathrm Q$ -converges into the successors of $b$ (with respect to $\mathrm P$ ) as the argument approaches $a$ from below.

The second condition is obtained by replacing $\mathrm P$ by its converse; the third and fourth are obtained from the first and second by replacing $\mathrm Q$ by its converse.

The second condition is achieved by replacing $\mathrm P$ with its converse; the third and fourth are derived from the first and second by substituting $\mathrm Q$ with its converse.

There is thus nothing, in the notions of the limit of a function or the continuity of a function, that essentially involves number. Both can be defined generally, and many propositions about them can be proved for any two series (one being the argument-series and the other the value-series). It will be seen that the definitions do not involve infinitesimals. They involve infinite classes of intervals, growing smaller without any limit short of zero, but they do not involve any intervals that are not finite. This is analogous to the fact that if a line an inch long be halved, then halved again, and so on indefinitely, we never reach infinitesimals in this way: after $n$ bisections, the length of our bit is $\dfrac{1}{2^{n}}$ of an inch; and this is finite whatever finite number $n$ may be. The process of successive bisection does not lead to divisions whose ordinal number is infinite, since it is essentially a one-by-one process. Thus infinitesimals are not to be reached in this way. Confusions on such topics have had much to do with the difficulties which have been found in the discussion of infinity and continuity. [Pg 116]

There is nothing in the concepts of the limit of a function or the continuity of a function that fundamentally involves numbers. Both can be defined in general terms, and many statements about them can be proved for any two series (one being the argument series and the other the value series). You'll see that the definitions do not include infinitesimals. They involve infinite sets of intervals that get smaller without ever reaching zero, but they do not include any intervals that are not finite. This is similar to the fact that if you take a line one inch long and keep halving it, you never actually reach infinitesimals through this method: after $n$ bisections, the length of our piece is $\dfrac{1}{2^{n}}$ of an inch; and this is finite no matter what finite number $n$ may be. The process of repeated bisection does not result in divisions that are infinitely ordinal, since it is essentially a one-by-one process. Therefore, you cannot reach infinitesimals this way. Confusions around these topics have contributed significantly to the challenges encountered in discussions of infinity and continuity. [Pg 116]

CHAPTER XII

SELECTIONS AND THE MULTIPLICATIVE AXIOM

IN this chapter we have to consider an axiom which can be enunciated, but not proved, in terms of logic, and which is convenient, though not indispensable, in certain portions of mathematics. It is convenient, in the sense that many interesting propositions, which it seems natural to suppose true, cannot be proved without its help; but it is not indispensable, because even without those propositions the subjects in which they occur still exist, though in a somewhat mutilated form.

IN this chapter, we need to look at a principle that can be stated but not proven through logic, and which is useful, although not essential, in some areas of mathematics. It's useful in that many interesting statements that seem intuitively true can’t be proven without it. However, it’s not essential because even without those statements, the topics they relate to still exist, albeit in a somewhat incomplete form.

Before enunciating the multiplicative axiom, we must first explain the theory of selections, and the definition of multiplication when the number of factors may be infinite.

Before stating the multiplicative axiom, we first need to explain the theory of selections and the definition of multiplication when the number of factors can be infinite.

In defining the arithmetical operations, the only correct procedure is to construct an actual class (or relation, in the case of relation-numbers) having the required number of terms. This sometimes demands a certain amount of ingenuity, but it is essential in order to prove the existence of the number defined. Take, as the simplest example, the case of addition. Suppose we are given a cardinal number $\mu$ , and a class $\alpha$ which has $\mu$ terms. How shall we define $\mu + \mu$ ? For this purpose we must have two classes having $\mu$ terms, and they must not overlap. We can construct such classes from $\alpha$ in various ways, of which the following is perhaps the simplest: Form first all the ordered couples whose first term is a class consisting of a single member of $\alpha$ , and whose second term is the null-class; then, secondly, form all the ordered couples whose first term is [Pg 117] the null-class and whose second term is a class consisting of a single member of $\alpha$ . These two classes of couples have no member in common, and the logical sum of the two classes will have $\mu + \mu$ terms. Exactly analogously we can define $\mu + \nu$ , given that $\alpha$ is the number of some class $\alpha$ and $\nu$ is the number of some class $\beta$ .

In defining arithmetic operations, the only proper method is to create an actual class (or relation, in the case of relation-numbers) that has the required number of terms. This can sometimes take a bit of creativity, but it's crucial to prove the existence of the defined number. Let's consider the simplest example of addition. Suppose we have a cardinal number $\mu$ , and a class $\alpha$ that has $\mu$ terms. How can we define $\mu + \mu$ ? For this, we need two classes that each have $\mu$ terms, and they must not overlap. We can create such classes from $\alpha$ in various ways, one of the simplest being: first, form all the ordered pairs where the first element is a class consisting of a single member of $\alpha$ , and the second element is the empty class; then, form all the ordered pairs where the first element is the empty class and the second element is a class consisting of a single member of $\alpha$ . These two classes of pairs have no members in common, and the logical union of the two classes will have $\mu + \mu$ terms. Similarly, we can define $\mu + \nu$ , given that $\alpha$ is the number of some class $\alpha$ and $\nu$ is the number of some class $\beta$ .

Such definitions, as a rule, are merely a question of a suitable technical device. But in the case of multiplication, where the number of factors may be infinite, important problems arise out of the definition.

Such definitions are usually just a matter of an appropriate technical tool. However, in the case of multiplication, where the number of factors can be infinite, significant issues come up from the definition.

Multiplication when the number of factors is finite offers no difficulty. Given two classes $\alpha$ and $\beta$ , of which the first has $\mu$ terms and the second $\nu$ terms, we can define $\mu × \nu$ as the number of ordered couples that can be formed by choosing the first term out of $\alpha$ and the second out of $\beta$ . It will be seen that this definition does not require that $\alpha$ and $\beta$ should not overlap; it even remains adequate when $\alpha$ and $\beta$ are identical. For example, let $\alpha$ be the class whose members are $x_{1}$ , $x_{2}$ , $x_{3}$ . Then the class which is used to define the product $\mu × \mu$ is the class of couples: $\begin{align*} (x_{1}, x_{1}),\ (x_{1}, x_{2}),\ (x_{1}, x_{3}); \\ (x_{2}, x_{1}),\ (x_{2}, x_{2}),\ (x_{2}, x_{3}); \\ (x_{3}, x_{1}),\ (x_{3}, x_{2}),\ (x_{3}, x_{3}). \end{align*}$ This definition remains applicable when $\mu$ or $\nu$ or both are infinite, and it can be extended step by step to three or four or any finite number of factors. No difficulty arises as regards this definition, except that it cannot be extended to an infinite number of factors.

Multiplying a finite number of factors is straightforward. Given two sets $\alpha$ and $\beta$ , where the first set has $\mu$ elements and the second has $\nu$ elements, we define $\mu × \nu$ as the number of ordered pairs that can be made by picking the first element from $\alpha$ and the second from $\beta$ . This definition does not require that $\alpha$ and $\beta$ be distinct; it even holds true when $\alpha$ and $\beta$ are the same. For instance, let $\alpha$ be the set containing elements $x_{1}$ , $x_{2}$ , $x_{3}$ . Then the set used to define the product $\mu × \mu$ consists of the pairs: $\begin{align*} (x_{1}, x_{1}),\ (x_{1}, x_{2}),\ (x_{1}, x_{3}); \\ (x_{2}, x_{1}),\ (x_{2}, x_{2}),\ (x_{2}, x_{3}); \\ (x_{3}, x_{1}),\ (x_{3}, x_{2}),\ (x_{3}, x_{3}). \end{align*}$ This definition also applies when $\mu$ or $\nu$ or both are infinite, and it can be expanded step by step to three, four, or any finite number of factors. The only limitation is that it cannot be applied to an infinite number of factors.

The problem of multiplication when the number of factors may be infinite arises in this way: Suppose we have a class $\kappa$ consisting of classes; suppose the number of terms in each of these classes is given. How shall we define the product of all these numbers? If we can frame our definition generally, it will be applicable whether $\kappa$ is finite or infinite. It is to be observed that the problem is to be able to deal with the case when $\kappa$ is infinite, not with the case when its members are. If [Pg 118] $\kappa$ is not infinite, the method defined above is just as applicable when its members are infinite as when they are finite. It is the case when $\kappa$ is infinite, even though its members may be finite, that we have to find a way of dealing with.

The issue of multiplication when there can be an infinite number of factors comes up like this: Imagine we have a class $\kappa$ made up of classes; suppose each of these classes has a specific number of terms. How do we define the product of all these numbers? If we can come up with a general definition, it will work whether $\kappa$ is finite or infinite. It's important to note that the challenge is to handle the situation when $\kappa$ is infinite, not when its members are. If [Pg 118] $\kappa$ is not infinite, the method we defined earlier applies equally whether its members are infinite or finite. The real challenge is when $\kappa$ is infinite, even if its members are finite, and that's where we need to find a solution.

The following method of defining multiplication generally is due to Dr Whitehead. It is explained and treated at length in Principia Mathematica, vol. I. * 80 ff., and vol. II. * 114.

The method of defining multiplication outlined here is primarily attributed to Dr. Whitehead. It is discussed in detail in Principia Mathematica, vol. I. * 80 ff., and vol. II. * 114.

Let us suppose to begin with that $\kappa$ is a class of classes no two of which overlap—say the constituencies in a country where there is no plural voting, each constituency being considered as a class of voters. Let us now set to work to choose one term out of each class to be its representative, as constituencies do when they elect members of Parliament, assuming that by law each constituency has to elect a man who is a voter in that constituency. We thus arrive at a class of representatives, who make up our Parliament, one being selected out of each constituency. How many different possible ways of choosing a Parliament are there? Each constituency can select any one of its voters, and therefore if there are $\mu$ voters in a constituency, it can make $\mu$ choices. The choices of the different constituencies are independent; thus it is obvious that, when the total number of constituencies is finite, the number of possible Parliaments is obtained by multiplying together the numbers of voters in the various constituencies. When we do not know whether the number of constituencies is finite or infinite, we may take the number of possible Parliaments as defining the product of the numbers of the separate constituencies. This is the method by which infinite products are defined. We must now drop our illustration, and proceed to exact statements.

Let’s start by assuming that $\kappa$ is a class of classes in which no two overlap—like the electoral districts in a country without plural voting, with each district seen as a class of voters. Now, let’s choose one representative from each class, similar to how districts elect Parliament members, assuming that by law each district must elect someone who is a voter in that district. This leads us to a class of representatives who make up our Parliament, one being selected from each district. How many different ways can we form a Parliament? Each district can choose any of its voters, so if there are $\mu$ voters in a district, it has $\mu$ options. The choices made by different districts are independent of each other. Therefore, when the total number of districts is finite, the number of possible Parliaments is found by multiplying together the number of voters in each district. If we're unsure whether the number of districts is finite or infinite, we can define the number of possible Parliaments as the product of the numbers of the individual districts. This is how we define infinite products. Now we should set aside our example and move on to precise statements.

Let $\kappa$ be a class of classes, and let us assume to begin with that no two members of $\kappa$ overlap, i.e. that if $\alpha$ and $\beta$ are two different members of $\kappa$ , then no member of the one is a member of the other. We shall call a class a "selection" from $\kappa$ when it consists of just one term from each member of $\kappa$ ; i.e. $\mu$ is a "selection" from $\kappa$ if every member of $\mu$ belongs to some member [Pg 119] of $\kappa$ , and if $\alpha$ be any member of $\kappa$ , $\mu$ and $\alpha$ have exactly one term in common. The class of all "selections" from $\kappa$ we shall call the "multiplicative class" of $\kappa$ . The number of terms in the multiplicative class of $\kappa$ , i.e. the number of possible selections from $\kappa$ , is defined as the product of the numbers of the members of $\kappa$ . This definition is equally applicable whether $\kappa$ is finite or infinite.

Let $\kappa$ be a class of classes, and let's start by assuming that no two members of $\kappa$ overlap. This means that if $\alpha$ and $\beta$ are two different members of $\kappa$ , then no member of one is a member of the other. We'll call a class a "selection" from $\kappa$ when it includes exactly one term from each member of $\kappa$ . In other words, $\mu$ is a "selection" from $\kappa$ if every member of $\mu$ belongs to some member of $\kappa$ , and if $\alpha$ is any member of $\kappa$ , then $\mu$ and $\alpha$ have exactly one term in common. The class of all "selections" from $\kappa$ will be called the "multiplicative class" of $\kappa$ . The number of terms in the multiplicative class of $\kappa$ , meaning the number of possible selections from $\kappa$ , is defined as the product of the numbers of the members of $\kappa$ . This definition works whether $\kappa$ is finite or infinite.

Before we can be wholly satisfied with these definitions, we must remove the restriction that no two members of $\kappa$ are to overlap. For this purpose, instead of defining first a class called a "selection," we will define first a relation which we will call a "selector." A relation $\mathrm R$ will be called a "selector" from $\kappa$ if, from every member of $\kappa$ , it picks out one term as the representative of that member, i.e. if, given any member $\alpha$ of $\kappa$ , there is just one term $x$ which is a member of $\alpha$ and has the relation $\mathrm R$ to $\alpha$ ; and this is to be all that $\mathrm R$ does. The formal definition is:

Before we can be completely satisfied with these definitions, we need to remove the requirement that no two members of $\kappa$ can overlap. To do this, instead of first defining a class called a "selection," we'll first define a relation, which we will call a "selector." A relation $\mathrm R$ will be called a "selector" from $\kappa$ if, from every member of $\kappa$ , it selects one term as the representative of that member, i.e. if, for any member $\alpha$ of $\kappa$ , there is exactly one term $x$ that is a member of $\alpha$ and has the relation $\mathrm R$ to $\alpha$ ; and that is all that $\mathrm R$ does. The formal definition is:

A "selector" from a class of classes $\kappa$ is a one-many relation, having $\kappa$ for its converse domain, and such that, if $x$ has the relation to $\alpha$ , then $x$ is a member of $\alpha$ .

A "selector" from a class of classes $\kappa$ is a one-to-many relationship, having $\kappa$ as its converse domain. This means that if $x$ is related to $\alpha$ , then $x$ is a member of $\alpha$

If $\mathrm R$ is a selector from $\kappa$ , and $\alpha$ is a member of $\kappa$ , and $x$ is the term which has the relation $\mathrm R$ to $\alpha$ , we call $x$ the "representative" of $\alpha$ in respect of the relation $\mathrm R$ .

If $\mathrm R$ is a selector from $\kappa$ , and $\alpha$ is a member of $\kappa$ , and $x$ is the term that has the relation $\mathrm R$ to $\alpha$ , we call $x$ the "representative" of $\alpha$ regarding the relation $\mathrm R$ .

A "selection" from $\kappa$ will now be defined as the domain of a selector; and the multiplicative class, as before, will be the class of selections.

A "selection" from $\kappa$ will now be defined as the area of a selector; and the multiplicative class, as before, will be the class of selections.

But when the members of $\kappa$ overlap, there may be more selectors than selections, since a term $x$ which belongs to two classes $\alpha$ and $\beta$ may be selected once to represent $\alpha$ and once to represent $\beta$ , giving rise to different selectors in the two cases, but to the same selection. For purposes of defining multiplication, it is the selectors we require rather than the selections. Thus we define:

But when the members of $\kappa$ overlap, there can be more selectors than selections, since a term $x$ that belongs to both classes $\alpha$ and $\alpha$ and once to represent $\beta$ , leading to different selectors in the two instances, but the same selection. For defining multiplication, we need the selectors more than the selections. So we define:

"The product of the numbers of the members of a class of classes $\kappa$ " is the number of selectors from $\kappa$ .

"The product of the numbers of the members of a class of classes $\kappa$ is the number of selectors from $\kappa$ ."

We can define exponentiation by an adaptation of the above [Pg 120] plan. We might, of course, define $\mu^{\nu}$ as the number of selectors from $\nu$ classes, each of which has $\mu$ terms. But there are objections to this definition, derived from the fact that the multiplicative axiom (of which we shall speak shortly) is unnecessarily involved if it is adopted. We adopt instead the following construction:—

We can define exponentiation by adapting the plan above. We could define $\mu^{\nu}$ as the number of selectors from $\nu$ classes, each with $\mu$ terms. However, this definition has some issues because the multiplicative axiom (which we'll discuss soon) becomes unnecessarily complicated if we use it. Instead, we'll use the following approach:—

Let $\alpha$ be a class having $\mu$ terms, and $\beta$ a class having $\nu$ terms.

Let $\alpha$ be a class with $\mu$ terms, and $\beta$ a class with $\nu$ terms.

Let $y$ be a member of $\beta$ , and form the class of all ordered couples that have $y$ for their second term and a member of $\alpha$ for their first term. There will be $\mu$ such couples for a given $y$ , since any member of $\alpha$ may be chosen for the first term, and $\alpha$ has $\mu$ members. If we now form all the classes of this sort that result from varying $y$ , we obtain altogether $\nu$ classes, since $y$ may be any member of $\beta$ , and $\beta$ has $\nu$ members. These $\nu$ classes are each of them a class of couples, namely, all the couples that can be formed of a variable member of $\alpha$ and a fixed member of $\beta$ . We define $\mu^{\nu}$ as the number of selectors from the class consisting of these $\nu$ classes. Or we may equally well define $\mu^{\nu}$ as the number of selections, for, since our classes of couples are mutually exclusive, the number of selectors is the same as the number of selections. A selection from our class of classes will be a set of ordered couples, of which there will be exactly one having any given member of $\beta$ for its second term, and the first term may be any member of $\alpha$ . Thus $\mu^{\nu}$ is defined by the selectors from a certain set of $\nu$ classes each having $\mu$ terms, but the set is one having a certain structure and a more manageable composition than is the case in general. The relevance of this to the multiplicative axiom will appear shortly.

Let $y$ be a member of $\beta$ , and form the class of all ordered pairs that have $y$ as their second term and a member of $\alpha$ as their first term. There will be $\mu$ such pairs for a given $y$ , since any member of $\alpha$ may be chosen for the first term, and $\alpha$ has $\mu$ members. If we now create all the classes of this type that result from varying $y$ , we end up with $\nu$ classes altogether, since $y$ can be any member of $\beta$ , and $\beta$ has $\nu$ members. These $\nu$ classes are each a class of pairs, meaning all the pairs that can be formed with a variable member of $\alpha$ and a fixed member of $\beta$ . We define $\mu^{\nu}$ as the number of selectors from the class made up of these $\nu$ classes. Alternatively, we can define $\mu^{\nu}$ as the number of selections, since our classes of pairs are mutually exclusive, the number of selectors equals the number of selections. A selection from our class of classes will be a set of ordered pairs, with exactly one having any given member of $\beta$ as its second term, and the first term can be any member of $\mu^{\nu}$ is defined by the selectors from a specific set of $\nu$ classes each having $\mu$ terms, but the set has a specific structure and a more manageable composition than is generally the case. The relevance of this to the multiplicative axiom will become clear shortly.

What applies to exponentiation applies also to the product of two cardinals. We might define " $\mu × \nu$ " as the sum of the numbers of $\nu$ classes each having $\mu$ terms, but we prefer to define it as the number of ordered couples to be formed consisting of a member of $\alpha$ followed by a member of $\beta$ , where $\alpha$ has $\mu$ terms and $\beta$ has $\nu$ terms. This definition, also, is designed to evade the necessity of assuming the multiplicative axiom. [Pg 121]

What applies to exponentiation also applies to the product of two cardinals. We could define " $\mu × \nu$ " as the sum of the numbers of $\nu$ classes each having $\mu$ terms, but we prefer to define it as the number of ordered pairs that can be formed consisting of a member of $\alpha$ followed by a member of $\beta$ , where $\alpha$ has $\mu$ terms and $\beta$ has $\nu$ terms. This definition is also intended to avoid the need to assume the multiplicative axiom. [Pg 121]

With our definitions, we can prove the usual formal laws of multiplication and exponentiation. But there is one thing we cannot prove: we cannot prove that a product is only zero when one of its factors is zero. We can prove this when the number of factors is finite, but not when it is infinite. In other words, we cannot prove that, given a class of classes none of which is null, there must be selectors from them; or that, given a class of mutually exclusive classes, there must be at least one class consisting of one term out of each of the given classes. These things cannot be proved; and although, at first sight, they seem obviously true, yet reflection brings gradually increasing doubt, until at last we become content to register the assumption and its consequences, as we register the axiom of parallels, without assuming that we can know whether it is true or false. The assumption, loosely worded, is that selectors and selections exist when we should expect them. There are many equivalent ways of stating it precisely. We may begin with the following:—

With our definitions, we can prove the usual formal laws of multiplication and exponentiation. However, there’s one thing we can’t prove: a product is only zero when one of its factors is zero. We can demonstrate this when the number of factors is finite, but not when it's infinite. In other words, we can’t prove that, given a class of classes none of which is empty, there must be selectors from them; or that, given a class of mutually exclusive classes, there must be at least one class consisting of one term from each of the given classes. These things can’t be proven; and although they may seem obviously true at first glance, deeper thought leads to increasing doubt until we finally accept the assumption and its consequences, much like we accept the axiom of parallels, without claiming to know whether it's true or false. The assumption, simply put, is that selectors and selections exist when we expect them to. There are many equivalent ways to state it precisely. We can start with the following:—

"Given any class of mutually exclusive classes, of which none is null, there is at least one class which has exactly one term in common with each of the given classes."

"Given any set of mutually exclusive classes, none of which is empty, there is at least one class that shares exactly one term with each of the given classes."

This proposition we will call the "multiplicative axiom."[24] We will first give various equivalent forms of the proposition, and then consider certain ways in which its truth or falsehood is of interest to mathematics.

This proposition will be referred to as the "multiplicative axiom."[24] We'll first provide different equivalent forms of the proposition, and then explore how its truth or falsehood is relevant to mathematics.

[24]Principia Mathematica, vol. I. * 88. Also vol. III. * 257-258.

The multiplicative axiom is equivalent to the proposition that a product is only zero when at least one of its factors is zero; i.e. that, if any number of cardinal numbers be multiplied together, the result cannot be 0 unless one of the numbers concerned is 0.

The multiplicative axiom is equivalent to the statement that a product is only zero when at least one of its factors is zero; i.e. if any number of cardinal numbers are multiplied together, the result cannot be 0 unless one of the numbers involved is 0.

The multiplicative axiom is equivalent to the proposition that, if $\mathrm R$ be any relation, and $\kappa$ any class contained in the converse domain of $\mathrm R$ , then there is at least one one-many relation implying $\mathrm R$ and having $\kappa$ for its converse domain.

The multiplicative axiom is equivalent to the statement that if $\mathrm R$ represents any relation, and $\kappa$ is any class found in the converse domain of $\mathrm R$ , then there exists at least one one-many relation that indicates $\mathrm R$ and has $\kappa$ as its converse domain.

The multiplicative axiom is equivalent to the assumption that if $\alpha$ be any class, and $\kappa$ all the sub-classes of $\alpha$ with the exception [Pg 122] of the null-class, then there is at least one selector from $\kappa$ . This is the form in which the axiom was first brought to the notice of the learned world by Zermelo, in his "Beweis, dass jede Menge wohlgeordnet werden kann."[25] Zermelo regards the axiom as an unquestionable truth. It must be confessed that, until he made it explicit, mathematicians had used it without a qualm; but it would seem that they had done so unconsciously. And the credit due to Zermelo for having made it explicit is entirely independent of the question whether it is true or false.

The multiplicative axiom means that if $\alpha$ represents any class, and $\kappa$ represents all the sub-classes of $\alpha$ except for the null-class, then there’s at least one selector from $\kappa$ . This is how the axiom was first presented to the academic world by Zermelo in his "Beweis, dass jede Menge wohlgeordnet werden kann."[25] Zermelo sees the axiom as a self-evident truth. It must be acknowledged that, before he clarified it, mathematicians had been using it without hesitation; however, it seems they were doing so unconsciously. The recognition Zermelo deserves for clarifying it stands apart from whether it is actually true or not.

[25]Mathematische Annalen, vol. LIX. pp. 514-6. In this form we shall speak of it as Zermelo's axiom.

[25]Mathematical Annals, vol. LIX. pp. 514-6. In this context, we will refer to it as Zermelo's axiom.

The multiplicative axiom has been shown by Zermelo, in the above-mentioned proof, to be equivalent to the proposition that every class can be well-ordered, i.e. can be arranged in a series in which every sub-class has a first term (except, of course, the null-class). The full proof of this proposition is difficult, but it is not difficult to see the general principle upon which it proceeds. It uses the form which we call "Zermelo's axiom," i.e. it assumes that, given any class $\alpha$ , there is at least one one-many relation $\mathrm R$ whose converse domain consists of all existent sub-classes of $\alpha$ and which is such that, if $x$ has the relation $\mathrm R$ to $\xi$ , then $x$ is a member of $\xi$ . Such a relation picks out a "representative" from each sub-class; of course, it will often happen that two sub-classes have the same representative. What Zermelo does, in effect, is to count off the members of $\alpha$ , one by one, by means of $\mathrm R$ and transfinite induction. We put first the representative of $\alpha$ ; call it $x_{1}$ . Then take the representative of the class consisting of all of $\alpha$ except $x_{1}$ ; call it $x_{2}$ . It must be different from $x_{1}$ , because every representative is a member of its class, and $x_{1}$ is shut out from this class. Proceed similarly to take away $x_{2}$ , and let $x_{3}$ be the representative of what is left. In this way we first obtain a progression $x_{1}$ , $x_{2}$ , ... $x_{n}$ , ..., assuming that $\alpha$ is not finite. We then take away the whole progression; let $x_{\omega}$ be the representative of what is left of $\alpha$ . In this way we can go on until nothing is left. The successive representatives will form a [Pg 123] well-ordered series containing all the members of $\alpha$ . (The above is, of course, only a hint of the general lines of the proof.) This proposition is called "Zermelo's theorem."

The multiplicative axiom has been demonstrated by Zermelo, in the earlier proof, to be equivalent to the statement that every class can be well-ordered, i.e. can be organized in a sequence such that every sub-class has a first term (except, of course, the empty class). The complete proof of this statement is challenging, but the general principle behind it is not hard to grasp. It utilizes what we call "Zermelo's axiom," i.e. it assumes that, for any class $\alpha$ , there is at least one one-many relation $\mathrm R$ whose converse domain consists of all existing sub-classes of $\alpha$ and such that, if $x$ has the relation $\mathrm R$ to $\xi$ , then $x$ is a member of $\xi$ . Such a relation selects a "representative" from each sub-class; naturally, it will often occur that two sub-classes share the same representative. What Zermelo effectively does is count the members of $\alpha$ one by one, using $\mathrm R$ and transfinite induction. We start with the representative of $\alpha$ ; let's call it $x_{1}$ . Then we take the representative of the class consisting of all of $\alpha$ except $x_{1}$ ; we'll call it $x_{2}$ . It must be different from $x_{1}$ , since every representative is a member of its class, and $x_{1}$ is excluded from this class. We continue similarly to remove $x_{2}$ , and let $x_{3}$ be the representative of what remains. In this way we first obtain a sequence $x_{1}$ , $x_{2}$ , ... $x_{n}$ , ..., assuming that $\alpha$ is not finite. We then remove the entire sequence; let $x_{\omega}$ be the representative of what is left of $\alpha$ [Pg 123] well-ordered series that includes all the members of $\alpha$

The multiplicative axiom is also equivalent to the assumption that of any two cardinals which are not equal, one must be the greater. If the axiom is false, there will be cardinals $\mu$ and $\nu$ such that $\mu$ is neither less than, equal to, nor greater than $\nu$ . We have seen that $\aleph_{1}$ and $2^{\aleph_{0}}$ possibly form an instance of such a pair.

The multiplicative axiom is also equivalent to the idea that out of any two unequal cardinals, one must be greater than the other. If the axiom is false, there will be cardinals $\mu$ and $\nu$ such that $\mu$ is neither less than, equal to, nor greater than $\nu$ . We have seen that $\aleph_{1}$ and $2^{\aleph_{0}}$ could possibly be an example of such a pair.

Many other forms of the axiom might be given, but the above are the most important of the forms known at present. As to the truth or falsehood of the axiom in any of its forms, nothing is known at present.

Many other versions of the axiom could be provided, but the ones above are the most significant forms currently known. Regarding the truth or falsehood of the axiom in any of its forms, nothing is known at this time.

The propositions that depend upon the axiom, without being known to be equivalent to it, are numerous and important. Take first the connection of addition and multiplication. We naturally think that the sum of $\nu$ mutually exclusive classes, each having $\mu$ terms, must have $\mu × \nu$ terms. When $\nu$ is finite, this can be proved. But when $\nu$ is infinite, it cannot be proved without the multiplicative axiom, except where, owing to some special circumstance, the existence of certain selectors can be proved. The way the multiplicative axiom enters in is as follows: Suppose we have two sets of $\nu$ mutually exclusive classes, each having $\mu$ terms, and we wish to prove that the sum of one set has as many terms as the sum of the other. In order to prove this, we must establish a one-one relation. Now, since there are in each case $\nu$ classes, there is some one-one relation between the two sets of classes; but what we want is a one-one relation between their terms. Let us consider some one-one relation $\mathrm S$ between the classes. Then if $\kappa$ and $\lambda$ are the two sets of classes, and $\alpha$ is some member of $\kappa$ , there will be a member $\beta$ of $\lambda$ which will be the correlate of $\alpha$ with respect to $\mathrm S$ . Now $\alpha$ and $\beta$ each have $\mu$ terms, and are therefore similar. There are, accordingly, one-one correlations of $\alpha$ and $\beta$ . The trouble is that there are so many. In order to obtain a one-one correlation of the sum of $\kappa$ with the sum of $\lambda$ , we have to pick out one selection from a set of classes [Pg 124] of correlators, one class of the set being all the one-one correlators of $\alpha$ with $\beta$ . If $\kappa$ and $\lambda$ are infinite, we cannot in general know that such a selection exists, unless we can know that the multiplicative axiom is true. Hence we cannot establish the usual kind of connection between addition and multiplication.

The propositions that rely on the axiom, without being recognized as equivalent to it, are numerous and significant. First, consider the relationship between addition and multiplication. We naturally assume that the sum of $\nu$ mutually exclusive classes, each containing $\mu$ terms, should total $\mu × \nu$ terms. When $\nu$ is finite, this can be proved. However, when $\nu$ is infinite, it cannot be proved without the multiplicative axiom, unless specific conditions allow for the proof of certain selectors' existence. The role of the multiplicative axiom comes into play as follows: Suppose we have two sets of $\nu$ mutually exclusive classes, each with $\mu$ terms, and we want to prove that the sum of one set has the same number of terms as the other. To establish this, we must demonstrate a one-to-one relationship. Since each case has $\nu$ classes, there is some one-to-one relationship between the two sets of classes; however, we need a one-to-one relationship between their terms. Let’s consider some one-to-one relationship $\mathrm S$ among the classes. If $\kappa$ and $\lambda$ are the two sets of classes, and $\alpha$ is a member of $\kappa$ , then there will be a member $\beta$ of $\lambda$ that corresponds to $\alpha$ through $\mathrm S$ . Now $\alpha$ and $\beta$ each have $\mu$ terms, making them similar. Therefore, there are one-to-one correlations between $\alpha$ and $\beta$ . The problem is that there are so many. To achieve a one-to-one correlation of the sum of $\kappa$ with the sum of $\lambda$ , we have to choose one selection from a set of classes [Pg 124] of correlators, where one class of the set consists of all the one-to-one correlators of $\alpha$ with $\beta$ . If $\kappa$ and $\lambda$ are infinite, we generally cannot determine whether such a selection exists unless we know that the multiplicative axiom is true. Therefore, we cannot establish the usual connection between addition and multiplication.

This fact has various curious consequences. To begin with, we know that $\aleph_{0}^{2} = \aleph_{0} × \aleph_{0} = \aleph_{0}$ . It is commonly inferred from this that the sum of $\aleph_{0}$ classes each having $\aleph_{0}$ members must itself have $\aleph_{0}$ members, but this inference is fallacious, since we do not know that the number of terms in such a sum is $\aleph_{0} × \aleph_{0}$ , nor consequently that it is $\aleph_{0}$ . This has a bearing upon the theory of transfinite ordinals. It is easy to prove that an ordinal which has $\aleph_{0}$ predecessors must be one of what Cantor calls the "second class," i.e. such that a series having this ordinal number will have $\aleph_{0}$ terms in its field. It is also easy to see that, if we take any progression of ordinals of the second class, the predecessors of their limit form at most the sum of $\aleph_{0}$ classes each having $\aleph_{0}$ terms. It is inferred thence—fallaciously, unless the multiplicative axiom is true—that the predecessors of the limit are $\aleph_{0}$ in number, and therefore that the limit is a number of the "second class." That is to say, it is supposed to be proved that any progression of ordinals of the second class has a limit which is again an ordinal of the second class. This proposition, with the corollary that $\omega_{1}$ (the smallest ordinal of the third class) is not the limit of any progression, is involved in most of the recognised theory of ordinals of the second class. In view of the way in which the multiplicative axiom is involved, the proposition and its corollary cannot be regarded as proved. They may be true, or they may not. All that can be said at present is that we do not know. Thus the greater part of the theory of ordinals of the second class must be regarded as unproved.

This fact has several interesting consequences. To start with, we know that $\aleph_{0}^{2} = \aleph_{0} × \aleph_{0} = \aleph_{0}$ . It is often assumed from this that the sum of $\aleph_{0}$ classes, each having $\aleph_{0}$ members, must also have $\aleph_{0}$ members. However, this assumption is incorrect, as we don't know that the number of terms in such a sum is $\aleph_{0} × \aleph_{0}$ , nor that it is $\aleph_{0}$ . This has implications for the theory of transfinite ordinals. It's easy to show that an ordinal with $\aleph_{0}$ predecessors must be one of what Cantor refers to as the "second class," i.e. such that a series with this ordinal number will have $\aleph_{0}$ terms in its field. It's also clear that if we take any sequence of ordinals from the second class, the predecessors of their limit form at most the sum of $\aleph_{0}$ classes, each having $\aleph_{0}$ terms. It is incorrectly inferred from this—unless the multiplicative axiom is true—that the predecessors of the limit are $\aleph_{0}$ in number, and therefore that the limit is a number of the "second class." In other words, it's claimed that any sequence of ordinals from the second class has a limit that is again an ordinal from the second class. This proposition, along with the corollary that $\omega_{1}$ (the smallest ordinal of the third class) is not the limit of any progression, is central to most recognized theory regarding the second class ordinals. Given how the multiplicative axiom is involved, neither the proposition nor its corollary can be considered proven. They might be true, or they might not. All we can currently say is that we don't know. Hence, much of the theory regarding second class ordinals should be viewed as unproven.

Another illustration may help to make the point clearer. We know that $2 × \aleph_{0} = \aleph_{0}$ . Hence we might suppose that the sum of $\aleph_{0}$ pairs must have $\aleph_{0}$ terms. But this, though we can prove that it is sometimes the case, cannot be proved to happen always [Pg 125] unless we assume the multiplicative axiom. This is illustrated by the millionaire who bought a pair of socks whenever he bought a pair of boots, and never at any other time, and who had such a passion for buying both that at last he had $\aleph_{0}$ pairs of boots and $\aleph_{0}$ pairs of socks. The problem is: How many boots had he, and how many socks? One would naturally suppose that he had twice as many boots and twice as many socks as he had pairs of each, and that therefore he had $\aleph_{0}$ of each, since that number is not increased by doubling. But this is an instance of the difficulty, already noted, of connecting the sum of $\nu$ classes each having $\mu$ terms with $\mu × \nu$ . Sometimes this can be done, sometimes it cannot. In our case it can be done with the boots, but not with the socks, except by some very artificial device. The reason for the difference is this: Among boots we can distinguish right and left, and therefore we can make a selection of one out of each pair, namely, we can choose all the right boots or all the left boots; but with socks no such principle of selection suggests itself, and we cannot be sure, unless we assume the multiplicative axiom, that there is any class consisting of one sock out of each pair. Hence the problem.

Another example might make the point clearer. We know that $2 × \aleph_{0} = \aleph_{0}$ . So, we might think that the total of $\aleph_{0}$ pairs should have $\aleph_{0}$ terms. However, while we can show that this is sometimes true, it can't always be proven true without assuming the multiplicative axiom. This is illustrated by the millionaire who bought a new pair of socks every time he bought a pair of boots, and never at any other time. He was so obsessed with buying both that he ended up with $\aleph_{0}$ pairs of boots and $\aleph_{0}$ pairs of socks. The question is: How many boots did he have, and how many socks? One might naturally think he had twice as many boots and twice as many socks as he had pairs of each, which would mean he had $\aleph_{0}$ of each, since that number doesn't change when doubled. But this exemplifies the difficulty, already mentioned, of linking the total of $\nu$ classes, each containing $\mu$ terms, with $\mu × \nu$ . Sometimes this can be done, and sometimes it can't. In this case, it can be done with the boots, but not with the socks, unless we use some very artificial method. The reason for the difference is this: With boots, we can distinguish between right and left, allowing us to select one from each pair, meaning we could choose all the right boots or all the left boots. But with socks, no such principle of selection comes to mind, and we can't be sure, unless we assume the multiplicative axiom, that there's a class consisting of one sock from each pair. Hence the dilemma.

We may put the matter in another way. To prove that a class has $\aleph_{0}$ terms, it is necessary and sufficient to find some way of arranging its terms in a progression. There is no difficulty in doing this with the boots. The pairs are given as forming an $\aleph_{0}$ , and therefore as the field of a progression. Within each pair, take the left boot first and the right second, keeping the order of the pairs unchanged; in this way we obtain a progression of all the boots. But with the socks we shall have to choose arbitrarily, with each pair, which to put first; and an infinite number of arbitrary choices is an impossibility. Unless we can find a rule for selecting, i.e. a relation which is a selector, we do not know that a selection is even theoretically possible. Of course, in the case of objects in space, like socks, we always can find some principle of selection. For example, take the centres of mass of the socks: there will be points $p$ in space such that, with any [Pg 126] pair, the centres of mass of the two socks are not both at exactly the same distance from $p$ ; thus we can choose, from each pair, that sock which has its centre of mass nearer to $p$ . But there is no theoretical reason why a method of selection such as this should always be possible, and the case of the socks, with a little goodwill on the part of the reader, may serve to show how a selection might be impossible.

We can explain this in a different way. To prove that a class has $\aleph_{0}$ items, we need to find a way to arrange its items in a sequence. This is easy to do with the boots. The pairs are given as forming an $\aleph_{0}$ , and thus can represent a sequence. For each pair, we take the left boot first and the right boot second, keeping the order of the pairs the same; this way, we get a sequence of all the boots. However, with the socks, we will have to randomly decide which one to put first in each pair; making an infinite number of random choices is impossible. Unless we can find a specific rule for selecting, i.e., a relation that acts as a selector, we can't even say that a selection is theoretically possible. Of course, when it comes to physical objects, like socks, we can usually find some selection principle. For instance, consider the centers of mass of the socks: there will be points $p$ in space such that, with each pair, the centers of mass of the two socks are not both exactly the same distance from $p$ ; therefore, we can choose the sock from each pair that has its center of mass closer to $p$ . However, there’s no theoretical guarantee that this method of selection will always be possible, and the sock example, with a bit of goodwill from the reader, illustrates how selection might be unfeasible.

It is to be observed that, if it were impossible to select one out of each pair of socks, it would follow that the socks could not be arranged in a progression, and therefore that there were not $\aleph_{0}$ of them. This case illustrates that, if $\mu$ is an infinite number, one set of $\mu$ pairs may not contain the same number of terms as another set of $\mu$ pairs; for, given $\aleph_{0}$ pairs of boots, there are certainly $\aleph_{0}$ boots, but we cannot be sure of this in the case of the socks unless we assume the multiplicative axiom or fall back upon some fortuitous geometrical method of selection such as the above.

It's important to note that if it were impossible to pick one sock from each pair, then the socks could not be arranged in a sequence, which means there wouldn't be $\aleph_{0}$ of them. This example shows that if $\mu$ is an infinite number, one set of $\mu$ pairs might not have the same number of items as another set of $\mu$ pairs; because, given $\aleph_{0}$ pairs of boots, there are definitely $\aleph_{0}$ boots, but we can't be certain of this with the socks unless we accept the multiplicative axiom or rely on some random geometric method of selection like the one mentioned above.

Another important problem involving the multiplicative axiom is the relation of reflexiveness to non-inductiveness. It will be remembered that in Chapter VIII. we pointed out that a reflexive number must be non-inductive, but that the converse (so far as is known at present) can only be proved if we assume the multiplicative axiom. The way in which this comes about is as follows:—

Another important issue related to the multiplicative axiom is the connection between reflexiveness and non-inductiveness. As we mentioned in Chapter VIII, a reflexive number has to be non-inductive, but currently, we can only prove the opposite if we assume the multiplicative axiom. Here’s how this happens:—

It is easy to prove that a reflexive class is one which contains sub-classes having $\aleph_{0}$ terms. (The class may, of course, itself have $\aleph_{0}$ terms.) Thus we have to prove, if we can, that, given any non-inductive class, it is possible to choose a progression out of its terms. Now there is no difficulty in showing that a non-inductive class must contain more terms than any inductive class, or, what comes to the same thing, that if $\alpha$ is a non-inductive class and $\nu$ is any inductive number, there are sub-classes of $\alpha$ that have $\nu$ terms. Thus we can form sets of finite sub-classes of $\alpha$ : First one class having no terms, then classes having 1 term (as many as there are members of $\alpha$ ), then classes having [Pg 127] 2 terms, and so on. We thus get a progression of sets of sub-classes, each set consisting of all those that have a certain given finite number of terms. So far we have not used the multiplicative axiom, but we have only proved that the number of collections of sub-classes of $\alpha$ is a reflexive number, i.e. that, if $\mu$ is the number of members of $\alpha$ , so that $2^{\mu}$ is the number of sub-classes of $\alpha$ and $2^{2^{\mu}}$ is the number of collections of sub-classes, then, provided $\mu$ is not inductive, $2^{2^{\mu}}$ must be reflexive. But this is a long way from what we set out to prove.

It’s easy to show that a reflexive class includes sub-classes that have $\aleph_{0}$ terms. (The class itself can also have $\aleph_{0}$ terms.) Thus, we need to prove that it’s possible to choose a progression from its terms for any non-inductive class. It’s not difficult to demonstrate that a non-inductive class must contain more terms than any inductive class—or, similarly, if $\alpha$ is a non-inductive class and $\nu$ is any inductive number, there are sub-classes of $\alpha$ that have $\nu$ terms. So we can create sets of finite sub-classes of

In order to advance beyond this point, we must employ the multiplicative axiom. From each set of sub-classes let us choose out one, omitting the sub-class consisting of the null-class alone. That is to say, we select one sub-class containing one term, $\alpha_{1}$ , say; one containing two terms, $\alpha_{2}$ , say; one containing three, $\alpha_{3}$ , say; and so on. (We can do this if the multiplicative axiom is assumed; otherwise, we do not know whether we can always do it or not.) We have now a progression $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ , ... sub-classes of $\alpha$ , instead of a progression of collections of sub-classes; thus we are one step nearer to our goal. We now know that, assuming the multiplicative axiom, if $\mu$ is a non-inductive number, $2^{\mu}$ must be a reflexive number.

To move forward from this point, we need to use the multiplicative axiom. From each group of sub-classes, let's pick one, excluding the sub-class that contains only the null-class. In other words, we select one sub-class with one term, $\alpha_{1}$ , for example; one with two terms, $\alpha_{2}$ , for example; one with three terms, $\alpha_{3}$ , and so on. (We can do this if we accept the multiplicative axiom; otherwise, we can't be certain if we can always do it.) Now we have a sequence $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ , ... sub-classes of $\alpha$ , instead of a series of collections of sub-classes; so we are one step closer to our goal. We now understand that, assuming the multiplicative axiom, if $\mu$ is a non-inductive number, then $2^{\mu}$ must be a reflexive number.

The next step is to notice that, although we cannot be sure that new members of $\alpha$ come in at any one specified stage in the progression $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ , ... we can be sure that new members keep on coming in from time to time. Let us illustrate. The class $\alpha_{1}$ , which consists of one term, is a new beginning; let the one term be $x_{1}$ . The class $\alpha_{2}$ , consisting of two terms, may or may not contain $x_{1}$ ; if it does, it introduces one new term; and if it does not, it must introduce two new terms, say $x_{2}$ , $x_{3}$ . In this case it is possible that $\alpha_{3}$ consists of $x_{1}$ , $x_{2}$ , $x_{3}$ , and so introduces no new terms, but in that case $\alpha_{4}$ must introduce a new term. The first $\nu$ classes $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ , ... $\alpha_{\nu}$ contain, at the very most, $1 + 2 + 3 + \dots + \nu$ terms, i.e. $\nu/(\nu + 1)/2$ terms; thus it would be possible, if there were no repetitions in the first $\nu$ classes, to go on with only repetitions from the $(\nu + 1)^{th}$ [Pg 128] class to the $\nu(\nu + 1)/2^{th}$ class. But by that time the old terms would no longer be sufficiently numerous to form a next class with the right number of members, i.e. $\nu(\nu + 1)/2 + 1$ , therefore new terms must come in at this point if not sooner. It follows that, if we omit from our progression $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ ,... all those classes that are composed entirely of members that have occurred in previous classes, we shall still have a progression. Let our new progression be called $\beta_{1}$ , $\beta_{2}$ , $\beta_{3}$ .... (We shall have $\alpha_{1} = \beta_{1}$ and $\alpha_{2} = \beta_{2}$ , because $\alpha_{1}$ and $\alpha_{2}$ must introduce new terms. We may or may not have $\alpha_{3} = \beta_{3}$ , but, speaking generally, $\beta_{\nu}$ will be $\alpha_{\nu}$ , where $\nu$ is some number greater than $\mu$ ; i.e. the $\beta$ 's are some of the $\alpha$ 's.) Now these $\beta$ 's are such that any one of them, say $\beta_{\mu}$ , contains members which have not occurred in any of the previous $\beta$ 's. Let $\gamma_{\mu}$ be the part of $\beta_{\mu}$ which consists of new members. Thus we get a new progression $\gamma_{1}$ , $\gamma_{2}$ , $\gamma_{3}$ ,... (Again $\gamma_{1}$ will be identical with $\beta_{1}$ and with $\alpha_{1}$ ; if $\alpha_{2}$ does not contain the one member of $\alpha_{1}$ , we shall have $\gamma_{2} = \beta_{2} = \alpha_{2}$ , but if $\alpha_{2}$ does contain this one member, $\gamma_{2}$ will consist of the other member of $\alpha_{2}$ ). This new progression of $\gamma$ 's consists of mutually exclusive classes. Hence a selection from them will be a progression; i.e. if $x_{1}$ is the member of $\gamma_{1}$ , $x_{2}$ is a member of $\gamma_{2}$ , $x_{3}$ is a member of $\gamma_{3}$ , and so on; then $x_{1}$ , $x_{2}$ , $x_{3}$ , ... is a progression, and is a sub-class of $\alpha$ . Assuming the multiplicative axiom, such a selection can be made. Thus by twice using this axiom we can prove that, if the axiom is true, every non-inductive cardinal must be reflexive. This could also be deduced from Zermelo's theorem, that, if the axiom is true, every class can be well ordered; for a well-ordered series must have either a finite or a reflexive number of terms in its field.

The next step is to notice that, even though we can't be sure that new members of $\alpha$ arrive at any specific stage in the progression $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ , ... we can be sure that new members keep coming in from time to time. Let’s illustrate this. The class $\alpha_{1}$ , which has one term, is a new beginning; let that term be $x_{1}$ . The class $\alpha_{2}$ , which has two terms, may or may not include $x_{1}$ ; if it does, it introduces one new term; if it doesn't, it must introduce two new terms, let’s say $x_{2}$ , $x_{3}$ . In this case, it's possible that $\alpha_{3}$ consists of $x_{1}$ , $x_{2}$ , $x_{3}$ and thus introduces no new terms, but in that case $\alpha_{4}$ must introduce a new term. The first $\nu$ classes $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ , ... $\alpha_{\nu}$ contain, at most, $1 + 2 + 3 + \dots + \nu$ terms, i.e. $\nu/(\nu + 1)/2$ terms; therefore, it would be possible, if there were no repetitions in the first $\nu$ classes, to continue with only repetitions from the $(\nu + 1)^{th}$ class to the $\nu(\nu + 1)/2^{th}$ class. But by that time the old terms would no longer be numerous enough to form the next class with the required number of members, $\nu(\nu + 1)/2 + 1$ ; thus, new terms must come in at this point if not before. It follows that, if we exclude from our progression $\alpha_{1}$ , $\alpha_{2}$ , $\alpha_{3}$ ,... all those classes that are made up entirely of members that have appeared in earlier classes, we will still have a progression. Let our new progression be called $\beta_{1}$ , $\beta_{2}$ , $\beta_{3}$ .... (We will have $\alpha_{1} = \beta_{1}$ and $\alpha_{2} = \beta_{2}$ , because $\alpha_{1}$ and $\alpha_{2}$ must introduce new terms. We might or might not have $\alpha_{3} = \beta_{3}$ but generally, $\beta_{\nu}$ will be $\alpha_{\nu}$ where $\nu$ is some number greater than $\mu$ ; i.e. the $\beta$ 's are some of the $\alpha$ 's.) Now these $\beta$ 's are such that any one of them, let’s say $\beta_{\mu}$ , contains members that have not appeared in any of the previous $\beta$ 's. Let $\gamma_{\mu}$ be the part of $\beta_{\mu}$ that consists of new members. Thus we get a new progression $\gamma_{1}$ , $\gamma_{2}$ , $\gamma_{3}$ ,... (Again $\gamma_{1}$ will be identical to $\beta_{1}$ and with $\alpha_{1}$ ; if $\alpha_{2}$ does not contain the one member of $\alpha_{1}$ , we will have $\gamma_{2} = \beta_{2} = \alpha_{2}$ , but if $\alpha_{2}$ does include this one member, $\gamma_{2}$ will consist of the other member of $\alpha_{2}$ ). This new progression of $\gamma$ 's consists of mutually exclusive classes. Hence, a selection from them will create a progression; i.e. if $x_{1}$ is a member of $\gamma_{1}$ , $x_{2}$ is a member of $\gamma_{2}$ , $x_{3}$ is a member of $\gamma_{3}$ , and so on; then $x_{1}$ , $x_{2}$ , $x_{3}$ , ... forms a progression and is a subclass of $\alpha$ . Assuming the multiplicative axiom, such a selection can be made. Thus, by applying this axiom twice, we can prove that if the axiom is true, every non-inductive cardinal must be reflexive. This could also be derived from Zermelo's theorem, which states that if the axiom is true, every class can be well ordered; because a well-ordered series must have either a finite or a reflexive number of terms in its field.

There is one advantage in the above direct argument, as against deduction from Zermelo's theorem, that the above argument does not demand the universal truth of the multiplicative axiom, but only its truth as applied to a set of $\aleph_{0}$ classes. It may happen that the axiom holds for $\aleph_{0}$ classes, though not for larger numbers of classes. For this reason it is better, when [Pg 129] it is possible, to content ourselves with the more restricted assumption. The assumption made in the above direct argument is that a product of $\aleph_{0}$ factors is never zero unless one of the factors is zero. We may state this assumption in the form: " $\aleph_{0}$ is a multipliable number," where a number $\nu$ is defined as "multipliable" when a product of $\nu$ factors is never zero unless one of the factors is zero. We can prove that a finite number is always multipliable, but we cannot prove that any infinite number is so. The multiplicative axiom is equivalent to the assumption that all cardinal numbers are multipliable. But in order to identify the reflexive with the non-inductive, or to deal with the problem of the boots and socks, or to show that any progression of numbers of the second class is of the second class, we only need the very much smaller assumption that $\aleph_{0}$ is multipliable.

There’s one advantage to the direct argument mentioned above compared to deducing from Zermelo's theorem: the argument doesn't require the universal truth of the multiplicative axiom, just that it holds true for a set of $\aleph_{0}$ classes. It’s possible that the axiom is valid for $\aleph_{0}$ classes but not for a larger number of classes. For this reason, it’s better to stick with the more limited assumption whenever we can. The assumption made in the direct argument is that a product of $\aleph_{0}$ factors is never zero unless one of the factors is zero. We can express this assumption as: " $\aleph_{0}$ is a multipliable number," where a number $\nu$ is considered "multipliable" if a product of $\nu$ factors is never zero unless one of the factors is zero. We can prove that a finite number is always multipliable, but we can't prove that any infinite number is. The multiplicative axiom is equivalent to the assumption that all cardinal numbers are multipliable. However, to identify the reflexive with the non-inductive, to address the issue of boots and socks, or to show that any progression of numbers of the second class is of the second class, we only need the much smaller assumption that $\aleph_{0}$ is multipliable.

It is not improbable that there is much to be discovered in regard to the topics discussed in the present chapter. Cases may be found where propositions which seem to involve the multiplicative axiom can be proved without it. It is conceivable that the multiplicative axiom in its general form may be shown to be false. From this point of view, Zermelo's theorem offers the best hope: the continuum or some still more dense series might be proved to be incapable of having its terms well ordered, which would prove the multiplicative axiom false, in virtue of Zermelo's theorem. But so far, no method of obtaining such results has been discovered, and the subject remains wrapped in obscurity. [Pg 130]

It's quite possible that there’s a lot more to uncover about the topics discussed in this chapter. There could be instances where statements that appear to rely on the multiplicative axiom can actually be proven without it. It’s possible that the multiplicative axiom in its general form could be shown to be incorrect. From this perspective, Zermelo's theorem offers the best hope: the continuum or potentially an even denser series might be shown to lack a well-ordered structure, which would demonstrate the multiplicative axiom to be false, according to Zermelo's theorem. However, up to now, no method has been found to achieve these results, and the topic remains unclear. [Pg 130]

CHAPTER XIII

THE AXIOM OF INFINITY AND LOGICAL TYPES

THE axiom of infinity is an assumption which may be enunciated as follows:—

THE axiom of infinity is an assumption that can be stated as follows:—

"If $n$ be any inductive cardinal number, there is at least one class of individuals having $n$ terms."

"If $n$ is any inductive cardinal number, there exists at least one class of individuals that has $n$ terms."

If this is true, it follows, of course, that there are many classes of individuals having $n$ terms, and that the total number of individuals in the world is not an inductive number. For, by the axiom, there is at least one class having $n + 1$ terms, from which it follows that there are many classes of $n$ terms and that $n$ is not the number of individuals in the world. Since $n$ is any inductive number, it follows that the number of individuals in the world must (if our axiom be true) exceed any inductive number. In view of what we found in the preceding chapter, about the possibility of cardinals which are neither inductive nor reflexive, we cannot infer from our axiom that there are at least $\aleph_{0}$ individuals, unless we assume the multiplicative axiom. But we do know that there are at least $\aleph_{0}$ classes of classes, since the inductive cardinals are classes of classes, and form a progression if our axiom is true. The way in which the need for this axiom arises may be explained as follows:—One of Peano's assumptions is that no two inductive cardinals have the same successor, i.e. that we shall not have $m + 1 = n + 1$ unless $m = n$ , if $m$ and $n$ are inductive cardinals. In Chapter VIII. we had occasion to use what is virtually the same as the above assumption of Peano's, namely, that, if $n$ is an inductive cardinal, [Pg 131] $n$ is not equal to $n + 1$ . It might be thought that this could be proved. We can prove that, if $\alpha$ is an inductive class, and $n$ is the number of members of $\alpha$ , then $n$ is not equal to $n + 1$ . This proposition is easily proved by induction, and might be thought to imply the other. But in fact it does not, since there might be no such class as $\alpha$ . What it does imply is this: If $n$ is an inductive cardinal such that there is at least one class having $n$ members, then $n$ is not equal to $n + 1$ . The axiom of infinity assures us (whether truly or falsely) that there are classes having $n$ members, and thus enables us to assert that $n$ is not equal to $n + 1$ . But without this axiom we should be left with the possibility that $n$ and $n + 1$ might both be the null-class.

If this is true, then it naturally follows that there are many groups of individuals with $n$ terms, and that the total number of individuals in the world is not an inductive number. According to the axiom, there is at least one group with $n + 1$ terms, which implies that there are many groups with $n$ terms and that $n$ is not the number of individuals in the world. Since $n$ is any inductive number, it follows that the number of individuals in the world must (if our axiom is true) exceed any inductive number. Based on what we found in the previous chapter about the existence of cardinals that are neither inductive nor reflexive, we cannot conclude from our axiom that there are at least $\aleph_{0}$ individuals unless we accept the multiplicative axiom. However, we do know that there are at least $\aleph_{0}$ classes of classes since the inductive cardinals are classes of classes and form a progression if our axiom holds true. The reason this axiom is necessary can be explained as follows: One of Peano's assumptions is that no two inductive cardinals can have the same successor, i.e. we won't have $m + 1 = n + 1$ unless $m = n$ if $m$ and $n$ are inductive cardinals. In Chapter VIII, we had the opportunity to use something very similar to Peano's assumption, specifically that if $n$ is an inductive cardinal, [Pg 131] $n$ is not equal to $n + 1$ . It may seem like this could be proven. We can show that if $\alpha$ is an inductive class, and $n$ is the number of members in $\alpha$ , then $n$ is not equal to $n + 1$ . This proposition can be easily proven by induction and might seem to imply the other. However, it doesn’t, since there may not be such a class as $\alpha$ . What it does imply is this: If $n$ is an inductive cardinal and there is at least one class with $n$ members, then $n$ is not equal to $n$ members, allowing us to say that $n$ is not equal to $n + 1$

Let us illustrate this possibility by an example: Suppose there were exactly nine individuals in the world. (As to what is meant by the word "individual," I must ask the reader to be patient.) Then the inductive cardinals from 0 up to 9 would be such as we expect, but 10 (defined as $9 + 1$ ) would be the null-class. It will be remembered that $n + 1$ may be defined as follows: $n + 1$ is the collection of all those classes which have a term $x$ such that, when $x$ is taken away, there remains a class of $n$ terms. Now applying this definition, we see that, in the case supposed, $9 + 1$ is a class consisting of no classes, i.e. it is the null-class. The same will be true of $9 + 2$ , or generally of $9 + n$ , unless $n$ is zero. Thus 10 and all subsequent inductive cardinals will all be identical, since they will all be the null-class. In such a case the inductive cardinals will not form a progression, nor will it be true that no two have the same successor, for 9 and 10 will both be succeeded by the null-class (10 being itself the null-class). It is in order to prevent such arithmetical catastrophes that we require the axiom of infinity.

Let’s illustrate this possibility with an example: Imagine there are exactly nine people in the world. (As for what is meant by the term "person," I ask for your patience.) The inductive cardinals from 0 to 9 would be what we expect, but 10 (defined as $9 + 1$ ) would be the null class. It should be noted that $n + 1$ can be defined as follows: $n + 1$ is the collection of all those classes that have a term $x$ such that, when $x$ is removed, what remains is a class with $n$ terms. Now, applying this definition, we see that in the assumed case, $9 + 1$ is a class that consists of no classes; in other words, it is the null-class. The same will be true for $9 + 2$ or, generally, for $9 + n$ , unless $n$ is zero. Thus, 10 and all following inductive cardinals will be the same, as they will all be the null-class. In such a scenario, the inductive cardinals won’t form a progression, nor will it be true that no two have the same successor, since both 9 and 10 will be succeeded by the null-class (with 10 itself being the null-class). It is to prevent such mathematical disasters that we require the axiom of infinity.

As a matter of fact, so long as we are content with the arithmetic of finite integers, and do not introduce either infinite integers or infinite classes or series of finite integers or ratios, it is possible to obtain all desired results without the axiom of infinity. That is to say, we can deal with the addition, multiplication, [Pg 132] and exponentiation of finite integers and of ratios, but we cannot deal with infinite integers or with irrationals. Thus the theory of the transfinite and the theory of real numbers fails us. How these various results come about must now be explained.

As long as we stick to the arithmetic of finite integers and don't involve infinite integers or infinite sets or series of finite integers or ratios, we can achieve all our goals without needing the axiom of infinity. In other words, we can work with the addition, multiplication, and exponentiation of finite integers and their ratios, but we can’t handle infinite integers or irrationals. Therefore, the theories of transfinite numbers and real numbers are inadequate for us. Now, we need to explain how these different results come about. [Pg 132]

Assuming that the number of individuals in the world is $n$ , the number of classes of individuals will be $2^{n}$ . This is in virtue of the general proposition mentioned in Chapter VIII. that the number of classes contained in a class which has $n$ members is $2^{n}$ . Now $2^{n}$ is always greater than $n$ . Hence the number of classes in the world is greater than the number of individuals. If, now, we suppose the number of individuals to be 9, as we did just now, the number of classes will be $2^{9}$ , i.e. 512. Thus if we take our numbers as being applied to the counting of classes instead of to the counting of individuals, our arithmetic will be normal until we reach 512: the first number to be null will be 513. And if we advance to classes of classes we shall do still better: the number of them will be $2^{512}$ , a number which is so large as to stagger imagination, since it has about 153 digits. And if we advance to classes of classes of classes, we shall obtain a number represented by 2 raised to a power which has about 153 digits; the number of digits in this number will be about three times $10^{152}$ . In a time of paper shortage it is undesirable to write out this number, and if we want larger ones we can obtain them by travelling further along the logical hierarchy. In this way any assigned inductive cardinal can be made to find its place among numbers which are not null, merely by travelling along the hierarchy for a sufficient distance.[26]

Assuming the number of people in the world is $n$ , the number of categories of people will be $2^{n}$ . This follows the general idea mentioned in Chapter VIII, that the number of categories contained in a category with $n$ members is $2^{n}$ . Now $2^{n}$ is always greater than $2^{9}$ or 512. So if we apply our numbers to counting categories instead of individuals, our math will work normally until we get to 512: the first number that won't work will be 513. And if we move on to categories of categories, it gets even better: the number of them will be $2^{512}$ , which is such a gigantic number it's hard to imagine, having about 153 digits. If we go further to categories of categories of categories, we will get a number represented by 2 raised to a power that has about 153 digits; the number of digits in this number will be around three times $10^{152}$ [26]

[26]On this subject see Principia Mathematica, vol. II. * 120 ff. On the corresponding problems as regards ratio, see ibid., vol. III. * 303 ff.

[26]For more on this topic, see Principia Mathematica, vol. II. * 120 ff. For related issues regarding ratio, refer to ibid., vol. III. * 303 ff.

As regards ratios, we have a very similar state of affairs. If a ratio $\mu/\nu$ is to have the expected properties, there must be enough objects of whatever sort is being counted to insure that the null-class does not suddenly obtrude itself. But this can be insured, for any given ratio $\mu/\nu$ , without the axiom of [Pg 133] infinity, by merely travelling up the hierarchy a sufficient distance. If we cannot succeed by counting individuals, we can try counting classes of individuals; if we still do not succeed, we can try classes of classes, and so on. Ultimately, however few individuals there may be in the world, we shall reach a stage where there are many more than $\mu$ objects, whatever inductive number $\mu$ may be. Even if there were no individuals at all, this would still be true, for there would then be one class, namely, the null-class, 2 classes of classes (namely, the null-class of classes and the class whose only member is the null-class of individuals), 4 classes of classes of classes, 16 at the next stage, 65,536 at the next stage, and so on. Thus no such assumption as the axiom of infinity is required in order to reach any given ratio or any given inductive cardinal.

When it comes to ratios, we have a very similar situation. For a ratio $\mu/\nu$ to have the expected properties, there needs to be enough objects of whatever type is being counted to ensure that the null-class doesn’t suddenly appear. But this can be guaranteed, for any given ratio $\mu/\nu$ , without needing the axiom of [Pg 133] infinity, just by moving up the hierarchy a sufficient amount. If we can’t succeed by counting individuals, we can try counting classes of individuals; if that doesn’t work, we can count classes of classes, and so on. Ultimately, no matter how few individuals there may be in the world, we’ll reach a point where there are many more than $\mu$ objects, whatever inductive number $\mu$ may be. Even if there were no individuals at all, this would still hold true, because there would then be one class, the null-class, 2 classes of classes (the null-class of classes and the class whose only member is the null-class of individuals), 4 classes of classes of classes, 16 at the next level, 65,536 at the following level, and so on. Therefore, there’s no need for an assumption like the axiom of infinity to reach any given ratio or any given inductive cardinal.

It is when we wish to deal with the whole class or series of inductive cardinals or of ratios that the axiom is required. We need the whole class of inductive cardinals in order to establish the existence of $\aleph_{0}$ , and the whole series in order to establish the existence of progressions: for these results, it is necessary that we should be able to make a single class or series in which no inductive cardinal is null. We need the whole series of ratios in order of magnitude in order to define real numbers as segments: this definition will not give the desired result unless the series of ratios is compact, which it cannot be if the total number of ratios, at the stage concerned, is finite.

It’s when we want to work with the entire class or series of inductive cardinals or ratios that we need the axiom. We require the entire class of inductive cardinals to prove the existence of $\aleph_{0}$ , and the whole series to show the existence of progressions. For these outcomes, it’s essential that we can create a single class or series where no inductive cardinal is null. We need the entire series of ratios arranged by size to define real numbers as segments: this definition won’t yield the expected result unless the series of ratios is compact, which it can't be if the total number of ratios, at the relevant stage, is finite.

It would be natural to suppose—as I supposed myself in former days—that, by means of constructions such as we have been considering, the axiom of infinity could be proved. It may be said: Let us assume that the number of individuals is $n$ , where $n$ may be 0 without spoiling our argument; then if we form the complete set of individuals, classes, classes of classes, etc., all taken together, the number of terms in our whole set will be $n + 2^{n} + 2^{2^{n}} + \dots\ \mathit{ad\,inf.},$ which is $\aleph_{0}$ . Thus taking all kinds of objects together, and not [Pg 134] confining ourselves to objects of any one type, we shall certainly obtain an infinite class, and shall therefore not need the axiom of infinity. So it might be said.

It would be natural to think—like I once thought—that, with constructions like the ones we've been discussing, the axiom of infinity could be proved. One might say: Let's assume the number of individuals is $n$ , where $n$ can be 0 without affecting our argument; then if we create the complete set of individuals, classes, classes of classes, etc., all combined, the number of terms in our entire set will be $n + 2^{n} + 2^{2^{n}} + \dots\ \mathit{ad\,inf.},$ which is $\aleph_{0}$ . So, by considering all types of objects together, and not limiting ourselves to just one kind, we would definitely have an infinite class, and therefore wouldn't need the axiom of infinity. That’s one way to put it.

Now, before going into this argument, the first thing to observe is that there is an air of hocus-pocus about it: something reminds one of the conjurer who brings things out of the hat. The man who has lent his hat is quite sure there wasn't a live rabbit in it before, but he is at a loss to say how the rabbit got there. So the reader, if he has a robust sense of reality, will feel convinced that it is impossible to manufacture an infinite collection out of a finite collection of individuals, though he may be unable to say where the flaw is in the above construction. It would be a mistake to lay too much stress on such feelings of hocus-pocus; like other emotions, they may easily lead us astray. But they afford a prima facie ground for scrutinising very closely any argument which arouses them. And when the above argument is scrutinised it will, in my opinion, be found to be fallacious, though the fallacy is a subtle one and by no means easy to avoid consistently.

Now, before diving into this argument, the first thing to notice is that there’s a sense of trickery about it: it reminds one of a magician pulling things out of a hat. The person who lent his hat is pretty sure there wasn’t a live rabbit in it before, but he can’t quite explain how the rabbit ended up there. So the reader, if they have a strong grasp on reality, will feel convinced that it’s impossible to create an infinite collection from a finite collection of individuals, even if they can’t pinpoint where the flaw is in that reasoning. It would be a mistake to put too much emphasis on such feelings of trickery; like other emotions, they can easily mislead us. However, they provide a prima facie basis for closely examining any argument that stirs them. And when this argument is examined, it will, in my view, turn out to be flawed, although the fallacy is subtle and not easy to consistently avoid.

The fallacy involved is the fallacy which may be called "confusion of types." To explain the subject of "types" fully would require a whole volume; moreover, it is the purpose of this book to avoid those parts of the subjects which are still obscure and controversial, isolating, for the convenience of beginners, those parts which can be accepted as embodying mathematically ascertained truths. Now the theory of types emphatically does not belong to the finished and certain part of our subject: much of this theory is still inchoate, confused, and obscure. But the need of some doctrine of types is less doubtful than the precise form the doctrine should take; and in connection with the axiom of infinity it is particularly easy to see the necessity of some such doctrine.

The mistake here is known as the "confusion of types." To fully explain the topic of "types" would take an entire book; additionally, this book aims to focus on the clearer and less controversial aspects of the subject, making it easier for beginners by highlighting parts that represent mathematically proven truths. However, the theory of types is definitely not part of the settled and certain aspects of our subject: a lot of this theory is still unfinished, mixed up, and unclear. That said, the need for some theory of types is more certain than the exact way that theory should be formulated; and when it comes to the axiom of infinity, it's especially apparent why we need some sort of theory.

This necessity results, for example, from the "contradiction of the greatest cardinal." We saw in Chapter VIII. that the number of classes contained in a given class is always greater than the [Pg 135] number of members of the class, and we inferred that there is no greatest cardinal number. But if we could, as we suggested a moment ago, add together into one class the individuals, classes of individuals, classes of classes of individuals, etc., we should obtain a class of which its own sub-classes would be members. The class consisting of all objects that can be counted, of whatever sort, must, if there be such a class, have a cardinal number which is the greatest possible. Since all its sub-classes will be members of it, there cannot be more of them than there are members. Hence we arrive at a contradiction.

This necessity comes, for instance, from the "contradiction of the greatest cardinal." We discussed in Chapter VIII that the number of classes within a given class is always greater than the number of members in that class, leading us to the conclusion that there is no greatest cardinal number. However, if we could, as we suggested earlier, combine all individuals, classes of individuals, classes of classes of individuals, etc., into one class, we would create a class where its own sub-classes would be members. The class that includes all objects that can be counted, of any kind, must, if such a class exists, have a cardinal number that is the highest possible. Since all its sub-classes will be members of it, there can't be more of them than there are members. Therefore, we reach a contradiction.

When I first came upon this contradiction, in the year 1901, I attempted to discover some flaw in Cantor's proof that there is no greatest cardinal, which we gave in Chapter VIII. Applying this proof to the supposed class of all imaginable objects, I was led to a new and simpler contradiction, namely, the following:—

When I first encountered this contradiction in 1901, I tried to find a flaw in Cantor's proof that there is no greatest cardinal, which we discussed in Chapter VIII. By applying this proof to the supposed class of all imaginable objects, I arrived at a new and simpler contradiction, specifically the following:—

The comprehensive class we are considering, which is to embrace everything, must embrace itself as one of its members. In other words, if there is such a thing as "everything," then "everything" is something, and is a member of the class "everything." But normally a class is not a member of itself. Mankind, for example, is not a man. Form now the assemblage of all classes which are not members of themselves. This is a class: is it a member of itself or not? If it is, it is one of those classes that are not members of themselves, i.e. it is not a member of itself. If it is not, it is not one of those classes that are not members of themselves, i.e. it is a member of itself. Thus of the two hypotheses—that it is, and that it is not, a member of itself—each implies its contradictory. This is a contradiction.

The all-encompassing class we’re talking about, which is supposed to include everything, has to include itself as one of its members. In other words, if there’s such a thing as “everything,” then “everything” is something and is part of the class of “everything.” But typically, a class isn’t a member of itself. For example, humanity isn’t a single man. Now consider the collection of all classes that aren’t members of themselves. This is a class: is it a member of itself or not? If it is, then it’s one of those classes that aren’t members of themselves, meaning it isn’t a member of itself. If it’s not, then it’s not one of those classes that aren’t members of themselves, meaning it is a member of itself. Therefore, of the two possibilities—that it is a member of itself, and that it isn’t—each leads to a contradiction. This is a contradiction.

There is no difficulty in manufacturing similar contradictions ad lib. The solution of such contradictions by the theory of types is set forth fully in Principia Mathematica,[27] and also, more briefly, in articles by the present author in the American Journal [Pg 136] of Mathematics,[28] and in the Revue de Metaphysique et de Morale.[29] For the present an outline of the solution must suffice.

There’s no difficulty in creating similar contradictions ad lib. The theory of types provides a complete solution to these contradictions as detailed in Principia Mathematica,[27] and more briefly in articles by the author in the American Journal [Pg 136] of Mathematics,[28] and in the Revue de Metaphysique et de Morale.[29] For now, an outline of the solution will suffice.

[27]Vol. I., Introduction, chap. II., * 12 and * 20; vol. II., Prefatory Statement

[28]"Mathematical Logic as based on the Theory of Types," vol. XXX., 1908, pp. 222-262.

[28]"Mathematical Logic based on the Theory of Types," vol. XXX., 1908, pp. 222-262.

[29]"Les paradoxes de la logique," 1906, pp. 627-650.

[29]"The Paradoxes of Logic," 1906, pp. 627-650.

The fallacy consists in the formation of what we may call "impure" classes, i.e. classes which are not pure as to "type." As we shall see in a later chapter, classes are logical fictions, and a statement which appears to be about a class will only be significant if it is capable of translation into a form in which no mention is made of the class. This places a limitation upon the ways in which what are nominally, though not really, names for classes can occur significantly: a sentence or set of symbols in which such pseudo-names occur in wrong ways is not false, but strictly devoid of meaning. The supposition that a class is, or that it is not, a member of itself is meaningless in just this way. And more generally, to suppose that one class of individuals is a member, or is not a member, of another class of individuals will be to suppose nonsense; and to construct symbolically any class whose members are not all of the same grade in the logical hierarchy is to use symbols in a way which makes them no longer symbolise anything.

The fallacy lies in creating what we can call "impure" classes, meaning classes that aren't pure in terms of "type." As we will explore in a later chapter, classes are logical constructs, and a statement that seems to be about a class will only have meaning if it can be rephrased in a way that doesn’t reference the class itself. This limits how what are nominally, but not actually, names for classes can be used meaningfully: a sentence or set of symbols that incorrectly uses such pseudo-names isn’t false, but is completely devoid of meaning. The idea that a class is, or isn't, a member of itself is meaningless in this sense. More broadly, assuming that one class of individuals is a member or isn’t a member of another class of individuals leads to nonsense; and creating a class symbolically whose members don’t all belong to the same level in the logical hierarchy means using symbols in a way that makes them no longer represent anything.

Thus if there are $n$ individuals in the world, and $2^{n}$ classes of individuals, we cannot form a new class, consisting of both individuals and classes and having $n + 2^{n}$ members. In this way the attempt to escape from the need for the axiom of infinity breaks down. I do not pretend to have explained the doctrine of types, or done more than indicate, in rough outline, why there is need of such a doctrine. I have aimed only at saying just so much as was required in order to show that we cannot prove the existence of infinite numbers and classes by such conjurer's methods as we have been examining. There remain, however, certain other possible methods which must be considered.

Thus, if there are $n$ individuals in the world, and $2^{n}$ classes of individuals, we can't form a new class that includes both individuals and classes and has $n + 2^{n}$ members. In this way, the attempt to avoid the necessity of the axiom of infinity fails. I don’t claim to have fully explained the doctrine of types or more than just sketched out why we need such a doctrine. I’ve only aimed to explain just enough to show that we cannot prove the existence of infinite numbers and classes through the tricks we've been looking at. However, there are still certain other potential methods that must be discussed.

Various arguments professing to prove the existence of infinite classes are given in the Principles of Mathematics, § 339 (p. 357). [Pg 137] In so far as these arguments assume that, if $n$ is an inductive cardinal, $n$ is not equal to $n + 1$ , they have been already dealt with. There is an argument, suggested by a passage in Plato's Parmenides, to the effect that, if there is such a number as 1, then 1 has being; but 1 is not identical with being, and therefore 1 and being are two, and therefore there is such a number as 2, and 2 together with 1 and being gives a class of three terms, and so on. This argument is fallacious, partly because "being" is not a term having any definite meaning, and still more because, if a definite meaning were invented for it, it would be found that numbers do not have being—they are, in fact, what are called "logical fictions," as we shall see when we come to consider the definition of classes.

Various arguments claiming to prove the existence of infinite classes are presented in the Principles of Mathematics, § 339 (p. 357). [Pg 137] As far as these arguments assume that, if $n$ is an inductive cardinal, then $n$ is not equal to $n + 1$ , they have already been addressed. There is an argument, suggested by a passage in Plato's Parmenides, stating that if there is a number 1, then 1 exists; however, 1 is not the same as existence, and therefore 1 and existence are distinct, which implies that the number 2 exists as well. Hence, 2 along with 1 and existence creates a class of three terms, and so forth. This argument is flawed, partly because "existence" does not have a clear definition, and even more so because, if a specific meaning were created for it, it would be evident that numbers do not exist—they are essentially what are known as "logical fictions," as we will discuss when we examine the definition of classes.

The argument that the number of numbers from 0 to $n$ (both inclusive) is $n + 1$ depends upon the assumption that up to and including $n$ no number is equal to its successor, which, as we have seen, will not be always true if the axiom of infinity is false. It must be understood that the equation $n = n + 1$ , which might be true for a finite $n$ if $n$ exceeded the total number of individuals in the world, is quite different from the same equation as applied to a reflexive number. As applied to a reflexive number, it means that, given a class of $n$ terms, this class is "similar" to that obtained by adding another term. But as applied to a number which is too great for the actual world, it merely means that there is no class of $n$ individuals, and no class of $n + 1$ individuals; it does not mean that, if we mount the hierarchy of types sufficiently far to secure the existence of a class of $n$ terms, we shall then find this class "similar" to one of $n + 1$ terms, for if $n$ is inductive this will not be the case, quite independently of the truth or falsehood of the axiom of infinity.

The argument that the total count of numbers from 0 to $n$ (including both ends) is $n + 1$ relies on the idea that up to and including $n$ no number is the same as its successor. As we've noted, this won’t always hold true if the axiom of infinity isn’t valid. It’s important to recognize that the equation $n = n + 1$ could be true for a finite $n$ if $n$ exceeded the entire population of individuals in the world. This is quite different when applied to a reflexive number. For a reflexive number, it means that with a set of $n$ terms, this set is "similar" to one generated by adding another term. However, when a number is too large for the actual world, it just indicates there isn’t a set of $n$ individuals or of $n + 1$ individuals. It does not imply that if we elevate the hierarchy of types enough to achieve the existence of a set of $n$ terms, we will then find this set "similar" to one of $n + 1$ terms. This is not the case if $n$ is inductive, regardless of whether the axiom of infinity is true or false.

There is an argument employed by both Bolzano [30] and Dedekind [31] to prove the existence of reflexive classes. The argument, in brief, is this: An object is not identical with the idea of the [Pg 138] object, but there is (at least in the realm of being) an idea of any object. The relation of an object to the idea of it is one-one, and ideas are only some among objects. Hence the relation "idea of" constitutes a reflexion of the whole class of objects into a part of itself, namely, into that part which consists of ideas. Accordingly, the class of objects and the class of ideas are both infinite. This argument is interesting, not only on its own account, but because the mistakes in it (or what I judge to be mistakes) are of a kind which it is instructive to note. The main error consists in assuming that there is an idea of every object. It is, of course, exceedingly difficult to decide what is meant by an "idea"; but let us assume that we know. We are then to suppose that, starting (say) with Socrates, there is the idea of Socrates, and so on ad inf. Now it is plain that this is not the case in the sense that all these ideas have actual empirical existence in people's minds. Beyond the third or fourth stage they become mythical. If the argument is to be upheld, the "ideas" intended must be Platonic ideas laid up in heaven, for certainly they are not on earth. But then it at once becomes doubtful whether there are such ideas. If we are to know that there are, it must be on the basis of some logical theory, proving that it is necessary to a thing that there should be an idea of it. We certainly cannot obtain this result empirically, or apply it, as Dedekind does, to "meine Gedankenwelt"—the world of my thoughts.

There’s an argument used by both Bolzano [30] and Dedekind [31] to demonstrate the existence of reflexive classes. The gist of the argument is this: An object is not the same as the idea of that object, but there is (at least in existence) an idea of any object. The relationship between an object and its idea is one-to-one, and ideas are just some of the many objects. Therefore, the relationship "idea of" reflects the entire class of objects into a part of itself, specifically, the part that consists of ideas. Consequently, both the class of objects and the class of ideas are infinite. This argument is intriguing, not only for its own sake but also because the mistakes within it (or what I perceive to be mistakes) are worth noting. The main error lies in assuming that there is an idea for every object. It’s certainly very challenging to clarify what is meant by an "idea," but let’s assume we have that figured out. We should then consider that, starting (for example) with Socrates, there is the idea of Socrates, and so on ad inf. Clearly, this isn't true in the sense that all these ideas actually exist in people's minds. Beyond the third or fourth mention, they become mythical. If the argument is to hold, the "ideas" being referred to must be Platonic ideas existing somewhere beyond, as they certainly aren’t found on Earth. But then it becomes questionable whether such ideas exist at all. For us to know that they do, it must be based on some logical theory that proves it’s necessary for something to have an idea of it. We definitely can’t arrive at this conclusion empirically, nor can we apply it, as Dedekind does, to "meine Gedankenwelt"—the world of my thoughts.

[30]Bolzano, Paradoxien des Unendlichen, 13.

__A_TAG_PLACEHOLDER_0__Bolzano, Paradox of the Infinite, 13.

[31]Dedekind, Was sind und was sollen die Zahlen? No. 66.

[31]Dedekind, What Are Numbers and What Should They Be? No. 66.

If we were concerned to examine fully the relation of idea and object, we should have to enter upon a number of psychological and logical inquiries, which are not relevant to our main purpose. But a few further points should be noted. If "idea" is to be understood logically, it may be identical with the object, or it may stand for a description (in the sense to be explained in a subsequent chapter). In the former case the argument fails, because it was essential to the proof of reflexiveness that object and idea should be distinct. In the second case the argument also fails, because the relation of object and description is not [Pg 139] one-one: there are innumerable correct descriptions of any given object. Socrates (e.g.) may be described as "the master of Plato," or as "the philosopher who drank the hemlock," or as "the husband of Xantippe." If—to take up the remaining hypothesis—"idea" is to be interpreted psychologically, it must be maintained that there is not any one definite psychological entity which could be called the idea of the object: there are innumerable beliefs and attitudes, each of which could be called an idea of the object in the sense in which we might say "my idea of Socrates is quite different from yours," but there is not any central entity (except Socrates himself) to bind together various "ideas of Socrates," and thus there is not any such one-one relation of idea and object as the argument supposes. Nor, of course, as we have already noted, is it true psychologically that there are ideas (in however extended a sense) of more than a tiny proportion of the things in the world. For all these reasons, the above argument in favour of the logical existence of reflexive classes must be rejected.

If we were really going to look closely at the relationship between ideas and objects, we would need to delve into various psychological and logical discussions that aren't relevant to our main focus. However, a few additional points should be noted. If "idea" is to be understood in a logical sense, it could either be identical to the object or represent a description (as will be explained in a later chapter). In the first scenario, the argument doesn't hold because it was crucial for the proof of reflexiveness that the object and idea be distinct. In the second scenario, the argument also fails because the relationship between the object and its description isn't one-to-one: there are countless accurate descriptions for any given object. For example, Socrates can be referred to as "the master of Plato," "the philosopher who drank the hemlock," or "the husband of Xantippe." If we consider the remaining hypothesis—interpreting "idea" psychologically—we must assert that there isn't a single psychological entity that can be called the idea of the object. Instead, there are many beliefs and attitudes, each of which could be referred to as an idea of the object in the way that we might say, "my idea of Socrates is quite different from yours." However, there isn't any central entity (other than Socrates himself) that links the various "ideas of Socrates," and thus, there isn't a one-to-one relationship between idea and object as the argument suggests. Furthermore, as we've already noted, it isn't psychologically accurate to say there are ideas (in any broad sense) for more than a small fraction of things in the world. For all these reasons, the argument for the logical existence of reflexive classes must be dismissed.

It might be thought that, whatever may be said of logical arguments, the empirical arguments derivable from space and time, the diversity of colours, etc., are quite sufficient to prove the actual existence of an infinite number of particulars. I do not believe this. We have no reason except prejudice for believing in the infinite extent of space and time, at any rate in the sense in which space and time are physical facts, not mathematical fictions. We naturally regard space and time as continuous, or, at least, as compact; but this again is mainly prejudice. The theory of "quanta" in physics, whether true or false, illustrates the fact that physics can never afford proof of continuity, though it might quite possibly afford disproof. The senses are not sufficiently exact to distinguish between continuous motion and rapid discrete succession, as anyone may discover in a cinema. A world in which all motion consisted of a series of small finite jerks would be empirically indistinguishable from one in which motion was continuous. It would take up too much space to [Pg 140] defend these theses adequately; for the present I am merely suggesting them for the reader's consideration. If they are valid, it follows that there is no empirical reason for believing the number of particulars in the world to be infinite, and that there never can be; also that there is at present no empirical reason to believe the number to be finite, though it is theoretically conceivable that some day there might be evidence pointing, though not conclusively, in that direction.

It might be assumed that, regardless of what is said about logical arguments, the empirical arguments drawn from space and time, the variety of colors, etc., are quite enough to prove the actual existence of an infinite number of particulars. I don't believe that. We have no reason beyond bias to believe in the infinite nature of space and time, at least in the sense that space and time are physical realities, not mathematical concepts. We naturally see space and time as continuous, or at least as compact; but this too is mostly bias. The theory of "quanta" in physics, whether it's true or false, demonstrates that physics can never provide proof of continuity, although it might very well disprove it. Our senses are not precise enough to tell the difference between continuous motion and a rapid series of discrete movements, as anyone can see in a movie theater. A world where all motion was made up of a series of small, finite jolts would be empirically indistinguishable from one where motion was continuous. It would require too much space to defend these ideas thoroughly; for now, I am just putting them out there for the reader to think about. If they are valid, this means there is no empirical reason to believe that the number of particulars in the world is infinite, nor can there ever be; it also suggests that there is currently no empirical reason to believe the number is finite, although it is theoretically possible that one day there might be evidence, though not conclusive, pointing in that direction.

From the fact that the infinite is not self-contradictory, but is also not demonstrable logically, we must conclude that nothing can be known a priori as to whether the number of things in the world is finite or infinite. The conclusion is, therefore, to adopt a Leibnizian phraseology, that some of the possible worlds are finite, some infinite, and we have no means of knowing to which of these two kinds our actual world belongs. The axiom of infinity will be true in some possible worlds and false in others; whether it is true or false in this world, we cannot tell.

Since the infinite isn't self-contradictory but also can't be logically proven, we have to conclude that we can't know anything a priori about whether the number of things in the world is finite or infinite. Therefore, in Leibniz's terms, some possible worlds are finite, some are infinite, and we have no way of knowing which category our actual world falls into. The axiom of infinity will be true in some possible worlds and false in others; whether it's true or false in this world, we can't say.

Throughout this chapter the synonyms "individual" and "particular" have been used without explanation. It would be impossible to explain them adequately without a longer disquisition on the theory of types than would be appropriate to the present work, but a few words before we leave this topic may do something to diminish the obscurity which would otherwise envelop the meaning of these words.

Throughout this chapter, the terms "individual" and "particular" have been used without explanation. It would be impossible to explain them properly without a longer discussion on the theory of types than this work allows, but a few words before we move on may help clarify the meaning of these words.

In an ordinary statement we can distinguish a verb, expressing an attribute or relation, from the substantives which express the subject of the attribute or the terms of the relation. "Cæsar lived" ascribes an attribute to Cæsar; "Brutus killed Cæsar" expresses a relation between Brutus and Cæsar. Using the word "subject" in a generalised sense, we may call both Brutus and Cæsar subjects of this proposition: the fact that Brutus is grammatically subject and Cæsar object is logically irrelevant, since the same occurrence may be expressed in the words "Cæsar was killed by Brutus," where Cæsar is the grammatical subject. [Pg 141] Thus in the simpler sort of proposition we shall have an attribute or relation holding of or between one, two or more "subjects" in the extended sense. (A relation may have more than two terms: e.g. " $\mathrm A$ gives $\mathrm B$ to $\mathrm C$ " is a relation of three terms.) Now it often happens that, on a closer scrutiny, the apparent subjects are found to be not really subjects, but to be capable of analysis; the only result of this, however, is that new subjects take their places. It also happens that the verb may grammatically be made subject: e.g. we may say, "Killing is a relation which holds between Brutus and Cæsar." But in such cases the grammar is misleading, and in a straightforward statement, following the rules that should guide philosophical grammar, Brutus and Cæsar will appear as the subjects and killing as the verb.

In a regular statement, we can identify a verb that expresses an attribute or relation, separate from the nouns that represent the subject of that attribute or the terms of the relation. For example, "Cæsar lived" assigns an attribute to Cæsar; "Brutus killed Cæsar" shows a relation between Brutus and Cæsar. Using the term "subject" in a broader sense, we can refer to both Brutus and Cæsar as subjects of this statement. The fact that Brutus is the grammatical subject and Cæsar is the object doesn’t change the logic, as we could say "Cæsar was killed by Brutus," where Cæsar becomes the grammatical subject. [Pg 141] So, in simpler statements, we have an attribute or relation involving one, two, or more "subjects" in the broad sense. (A relation can involve more than two terms: e.g. " $\mathrm A$ gives $\mathrm B$ to $\mathrm C$ " is a relation of three terms.) Often, upon closer examination, the apparent subjects are found not to truly be subjects and can be analyzed further; however, this merely leads to new subjects taking their place. It can also happen that the verb may grammatically become the subject: e.g. we might say, "Killing is a relation that exists between Brutus and Cæsar." But in these cases, the grammar can be misleading, and in a clear statement that follows the principles of philosophical grammar, Brutus and Cæsar will be recognized as the subjects and killing as the verb.

We are thus led to the conception of terms which, when they occur in propositions, can only occur as subjects, and never in any other way. This is part of the old scholastic definition of substance; but persistence through time, which belonged to that notion, forms no part of the notion with which we are concerned. We shall define "proper names" as those terms which can only occur as subjects in propositions (using "subject" in the extended sense just explained). We shall further define "individuals" or "particulars" as the objects that can be named by proper names. (It would be better to define them directly, rather than by means of the kind of symbols by which they are symbolised; but in order to do that we should have to plunge deeper into metaphysics than is desirable here.) It is, of course, possible that there is an endless regress: that whatever appears as a particular is really, on closer scrutiny, a class or some kind of complex. If this be the case, the axiom of infinity must of course be true. But if it be not the case, it must be theoretically possible for analysis to reach ultimate subjects, and it is these that give the meaning of "particulars" or "individuals." It is to the number of these that the axiom of infinity is assumed to apply. If it is true of them, it is true [Pg 142] of classes of them, and classes of classes of them, and so on; similarly if it is false of them, it is false throughout this hierarchy. Hence it is natural to enunciate the axiom concerning them rather than concerning any other stage in the hierarchy. But whether the axiom is true or false, there seems no known method of discovering. [Pg 143]

We are led to the idea of terms that can only be used as subjects in propositions and never in any other way. This is part of the old scholastic definition of substance; however, the aspect of persistence through time, which belonged to that concept, does not pertain to the notion we’re discussing. We will define "proper names" as those terms that can only function as subjects in propositions (using "subject" in the broader sense just explained). Additionally, we will define "individuals" or "particulars" as the objects that can be identified by proper names. (It would be better to define them directly rather than through the symbols that represent them, but to do that we would need to delve deeper into metaphysics than is appropriate here.) It is possible that there is an endless regress: that what seems to be a particular is actually, upon closer examination, a class or some form of complex. If this is the case, the axiom of infinity must be true. But if it isn’t the case, then it should theoretically be possible for analysis to reach ultimate subjects, and these are what define "particulars" or "individuals." The axiom of infinity is assumed to apply to these. If it is true for them, it is true for classes of them, and classes of classes of them, and so on; similarly, if it is false for them, it is false throughout this hierarchy. Therefore, it makes sense to state the axiom regarding them rather than any other level in the hierarchy. However, whether the axiom is true or false, there seems to be no known method for discovering this. [Pg 142] [Pg 143]

CHAPTER XIV

INCOMPATIBILITY AND THE THEORY OF DEDUCTION

WE have now explored, somewhat hastily it is true, that part of the philosophy of mathematics which does not demand a critical examination of the idea of class. In the preceding chapter, however, we found ourselves confronted by problems which make such an examination imperative. Before we can undertake it, we must consider certain other parts of the philosophy of mathematics, which we have hitherto ignored. In a synthetic treatment, the parts which we shall now be concerned with come first: they are more fundamental than anything that we have discussed hitherto. Three topics will concern us before we reach the theory of classes, namely: (1) the theory of deduction, (2) propositional functions, (3) descriptions. Of these, the third is not logically presupposed in the theory of classes, but it is a simpler example of the kind of theory that is needed in dealing with classes. It is the first topic, the theory of deduction, that will concern us in the present chapter.

WE have now explored, somewhat quickly, that part of the philosophy of mathematics that doesn't require a critical examination of the idea of class. In the previous chapter, however, we faced problems that make such an examination necessary. Before we can dive into it, we need to look at certain other aspects of the philosophy of mathematics that we've overlooked so far. In a comprehensive approach, the areas we will focus on first are more fundamental than anything we've discussed up to this point. Three topics will occupy our attention before we tackle the theory of classes: (1) the theory of deduction, (2) propositional functions, and (3) descriptions. Of these, the third doesn't logically depend on the theory of classes, but it's a simpler example of the kind of theory needed when dealing with classes. It is the first topic, the theory of deduction, that we will explore in this chapter.

Mathematics is a deductive science: starting from certain premisses, it arrives, by a strict process of deduction, at the various theorems which constitute it. It is true that, in the past, mathematical deductions were often greatly lacking in rigour; it is true also that perfect rigour is a scarcely attainable ideal. Nevertheless, in so far as rigour is lacking in a mathematical proof, the proof is defective; it is no defence to urge that common sense shows the result to be correct, for if we were to rely upon that, it would be better to dispense with argument altogether, [Pg 144] rather than bring fallacy to the rescue of common sense. No appeal to common sense, or "intuition," or anything except strict deductive logic, ought to be needed in mathematics after the premisses have been laid down.

Mathematics is a deductive science: starting from certain premises, it arrives at the various theorems that make it up through a strict process of deduction. It's true that in the past, mathematical deductions often lacked rigor; it's also true that achieving perfect rigor is an ideal that's hard to reach. However, if a mathematical proof lacks rigor, it is defective; it's not a valid excuse to say that common sense suggests the result is correct because if we rely on that, we might as well skip the argument altogether rather than use fallacy to support common sense. After the premises are established, there should be no need to appeal to common sense, "intuition," or anything other than strict deductive logic in mathematics. [Pg 144]

Kant, having observed that the geometers of his day could not prove their theorems by unaided argument, but required an appeal to the figure, invented a theory of mathematical reasoning according to which the inference is never strictly logical, but always requires the support of what is called "intuition." The whole trend of modern mathematics, with its increased pursuit of rigour, has been against this Kantian theory. The things in the mathematics of Kant's day which cannot be proved, cannot be known—for example, the axiom of parallels. What can be known, in mathematics and by mathematical methods, is what can be deduced from pure logic. What else is to belong to human knowledge must be ascertained otherwise—empirically, through the senses or through experience in some form, but not a priori. The positive grounds for this thesis are to be found in Principia Mathematica, passim; a controversial defence of it is given in the Principles of Mathematics. We cannot here do more than refer the reader to those works, since the subject is too vast for hasty treatment. Meanwhile, we shall assume that all mathematics is deductive, and proceed to inquire as to what is involved in deduction.

Kant observed that the mathematicians of his time couldn’t prove their theorems solely through argument; they needed to refer to diagrams for support. He developed a theory of mathematical reasoning that suggested inference isn’t strictly logical but always relies on what he called "intuition." The movement in modern mathematics, which seeks greater rigor, has largely opposed this Kantian view. In Kant's era, things in mathematics that couldn't be proved also couldn't be known—like the axiom of parallels. What can be known in mathematics and through mathematical methods is what can be derived from pure logic. Anything else that falls under human knowledge must be determined through other means—empirically, using the senses or experiences, but not a priori. Solid arguments for this view can be found in Principia Mathematica, passim; a heated defense can be found in Principles of Mathematics. Here, we can only point the reader to those works since the topic is too extensive for a quick overview. For now, we will assume all mathematics is deductive and explore what deduction entails.

In deduction, we have one or more propositions called premisses, from which we infer a proposition called the conclusion. For our purposes, it will be convenient, when there are originally several premisses, to amalgamate them into a single proposition, so as to be able to speak of the premiss as well as of the conclusion. Thus we may regard deduction as a process by which we pass from knowledge of a certain proposition, the premiss, to knowledge of a certain other proposition, the conclusion. But we shall not regard such a process as logical deduction unless it is correct, i.e. unless there is such a relation between premiss and conclusion that we have a right to believe the conclusion [Pg 145] if we know the premiss to be true. It is this relation that is chiefly of interest in the logical theory of deduction.

In deduction, we have one or more statements called premises, from which we draw a statement called the conclusion. For our purposes, it will be more convenient, when there are multiple premises, to combine them into a single statement, so we can refer to the premise as well as the conclusion. Thus, we can see deduction as a process where we move from knowing a specific statement, the premise, to knowing another specific statement, the conclusion. However, we will not consider such a process as logical deduction unless it is correct, meaning there is a relationship between the premise and conclusion that justifies believing the conclusion if we know the premise to be true. This relationship is what primarily interests us in the logical theory of deduction. [Pg 145]

In order to be able validly to infer the truth of a proposition, we must know that some other proposition is true, and that there is between the two a relation of the sort called "implication," i.e. that (as we say) the premiss "implies" the conclusion. (We shall define this relation shortly.) Or we may know that a certain other proposition is false, and that there is a relation between the two of the sort called "disjunction," expressed by " $p$ or $q$ ,"[32] so that the knowledge that the one is false allows us to infer that the other is true. Again, what we wish to infer may be the falsehood of some proposition, not its truth. This may be inferred from the truth of another proposition, provided we know that the two are "incompatible," i.e. that if one is true, the other is false. It may also be inferred from the falsehood of another proposition, in just the same circumstances in which the truth of the other might have been inferred from the truth of the one; i.e. from the falsehood of $p$ we may infer the falsehood of $q$ , when $q$ implies $p$ . All these four are cases of inference. When our minds are fixed upon inference, it seems natural to take "implication" as the primitive fundamental relation, since this is the relation which must hold between $p$ and $q$ if we are to be able to infer the truth of $q$ from the truth of $p$ . But for technical reasons this is not the best primitive idea to choose. Before proceeding to primitive ideas and definitions, let us consider further the various functions of propositions suggested by the above-mentioned relations of propositions.

To validly conclude that a statement is true, we need to know that another statement is true and that there is a relationship between the two called "implication," which means that the premise "implies" the conclusion. (We'll define this relationship shortly.) Alternatively, we can know that a certain other statement is false, and that there’s a relationship called "disjunction," expressed as " $p$ or $q$ ,"[32] allowing us to conclude that if one is false, the other must be true. Furthermore, what we want to conclude might be the falsehood of a statement rather than its truth. This can be concluded from the truth of another statement, provided we know that the two are "incompatible," meaning if one is true, the other is false. It can also be inferred from the falsehood of another statement, just like the truth of one could have been inferred from the truth of the other; that is, from the falsehood of $p$ we could infer the falsehood of $q$ , when $q$ implies $p$ . All four of these are forms of inference. When we focus on inference, it makes sense to view "implication" as the basic fundamental relationship, since this relationship must exist between $p$ and $q$ if we are to infer the truth of $q$ from the truth of $p$ . However, for technical reasons, this may not be the best foundational idea to use. Before we delve into foundational ideas and definitions, let's further explore the various functions of propositions suggested by the relationships mentioned above.

[32]We shall use the letters $p$ , $q$ , $r$ , $s$ , $t$ to denote variable propositions.

[32]We will use the letters $p$ , $q$ , $r$ , $s$ , $t$ to represent variable propositions.

The simplest of such functions is the negative, "not- $p$ ." This is that function of $p$ which is true when $p$ is false, and false when $p$ is true. It is convenient to speak of the truth of a proposition, or its falsehood, as its "truth-value"[33]; i.e. truth is the "truth-value" of a true proposition, and falsehood of a false one. Thus not $p$ has the opposite truth-value to $p$ . [Pg 146]

The simplest of these functions is the negative, "not- $p$ ." This function of $p$ is true when $p$ is false, and false when $p$ is true. It’s useful to talk about the truth of a statement, or its falsehood, as its "truth-value"[33]; i.e. truth is the "truth-value" of a true statement, and falsehood is that of a false one. So, not $p$ has the opposite truth-value of $p$ . [Pg 146]

[33]This term is due to Frege.

This term comes from Frege.

We may take next disjunction, " $p$ or $q$ ." This is a function whose truth-value is truth when $p$ is true and also when $q$ is true, but is falsehood when both $p$ and $q$ are false.

We can move on to the next disjunction, " $p$ or $q$ ." This is a function that is true when $p$ is true or when $q$ is true, but it is false when both $p$ and $q$ are false.

Next we may take conjunction, " $p$ and $q$ " This has truth for its truth-value when $p$ and $q$ are both true; otherwise it has falsehood for its truth-value.

Next, we can look at the conjunction, " $p$ and $q$ " This is true when both $p$ and $q$ are true; otherwise, it is false.

Take next incompatibility, i.e. " $p$ and $q$ are not both true." This is the negation of conjunction; it is also the disjunction of the negations of $p$ and $q$ , i.e. it is "not- $p$ or not- $q$ ." Its truth-value is truth when $p$ is false and likewise when $q$ is false; its truth-value is falsehood when $p$ and $q$ are both true.

Take the next incompatibility, i.e. " $p$ and $q$ cannot both be true." This represents the negation of conjunction; it’s also the disjunction of the negations of $p$ and $q$ , i.e. it is "not- $p$ or not- $q$ ." Its truth value is true when $p$ is false, and also when $q$ is false; its truth value is false when both $p$ and $q$ are true.

Last take implication, i.e. " $p$ implies $q$ ," or "if $p$ , then $q$ ." This is to be understood in the widest sense that will allow us to infer the truth of $q$ if we know the truth of $p$ . Thus we interpret it as meaning: "Unless $p$ is false, $q$ is true," or "either $p$ is false or $q$ is true." (The fact that "implies" is capable of other meanings does not concern us; this is the meaning which is convenient for us.) That is to say, " $p$ implies $q$ " is to mean "not- $p$ or $q$ ": its truth-value is to be truth if $p$ is false, likewise if $q$ is true, and is to be falsehood if $p$ is true and $q$ is false.

Last take implication, i.e. " $p$ implies $q$ ," or "if $p$ , then $q$ ." This should be understood in the broadest sense that allows us to determine the truth of $q$ if we know that $p$ is true. Thus we interpret it as meaning: "Unless $p$ is false, $q$ is true," or "either $p$ is false or $q$ is true." (The fact that "implies" can have other meanings doesn’t concern us; this is the meaning that works for our purposes.) In other words, " $p$ implies $p$ or $q$

We have thus five functions: negation, disjunction, conjunction, incompatibility, and implication. We might have added others, for example, joint falsehood, "not- $p$ and not- $q$ ," but the above five will suffice. Negation differs from the other four in being a function of one proposition, whereas the others are functions of two. But all five agree in this, that their truth-value depends only upon that of the propositions which are their arguments. Given the truth or falsehood of $p$ , or of $p$ and $q$ (as the case may be), we are given the truth or falsehood of the negation, disjunction, conjunction, incompatibility, or implication. A function of propositions which has this property is called a "truth-function."

We have five functions: negation, disjunction, conjunction, incompatibility, and implication. We could have added more, like joint falsehood, "not- $p$ and not- $q$ ," but the five listed will be enough. Negation is different from the other four because it’s based on one proposition, while the others are based on two. However, all five share this: their truth value only depends on the truth values of the propositions they take as arguments. Given the truth or falsehood of $p$ , or of $p$ and $q$ (as applicable), we can determine the truth or falsehood of negation, disjunction, conjunction, incompatibility, or implication. A function of propositions with this characteristic is called a "truth-function."

The whole meaning of a truth-function is exhausted by the statement of the circumstances under which it is true or false. "Not- $p$ ," for example, is simply that function of $p$ which is true when $p$ is false, and false when $p$ is true: there is no further [Pg 147] meaning to be assigned to it. The same applies to " $p$ or $q$ " and the rest. It follows that two truth-functions which have the same truth-value for all values of the argument are indistinguishable. For example, " $p$ and $q$ " is the negation of "not- $p$ or not- $q$ " and vice versa; thus either of these may be defined as the negation of the other. There is no further meaning in a truth-function over and above the conditions under which it is true or false.

The entire meaning of a truth-function is based on the situations in which it is true or false. For example, "Not- $p$ ,” is simply the function of $p$ that is true when $p$ is false, and false when $p$ is true: there’s no additional meaning to it. The same goes for " $p$ or $p$ and $q$ " is the negation of "not- $p$ or not- $q$ vice versa; thus either of these can be defined as the negation of the other. There is no added meaning in a truth-function beyond the conditions under which it is true or false.

It is clear that the above five truth-functions are not all independent. We can define some of them in terms of others. There is no great difficulty in reducing the number to two; the two chosen in Principia Mathematica are negation and disjunction. Implication is then defined as "not- $p$ or $q$ "; incompatibility as "not- $p$ or not- $q$ "; conjunction as the negation of incompatibility. But it has been shown by Sheffer [34] that we can be content with one primitive idea for all five, and by Nicod [35] that this enables us to reduce the primitive propositions required in the theory of deduction to two non-formal principles and one formal one. For this purpose, we may take as our one indefinable either incompatibility or joint falsehood. We will choose the former.

It's clear that the five truth-functions mentioned above aren’t entirely independent. We can define some of them using others. It’s not too difficult to narrow the number down to two; the ones selected in Principia Mathematica are negation and disjunction. Implication is then defined as "not- $p$ or $q$ "; incompatibility is defined as "not- $p$ or not- $q$ "; and conjunction is defined as the negation of incompatibility. However, Sheffer [34] has shown that we can be satisfied with one basic concept for all five, and Nicod [35] demonstrated that this allows us to reduce the basic propositions needed in the theory of deduction to two non-formal principles and one formal principle. For this purpose, we can consider either incompatibility or joint falsehood as our one indefinable concept. We'll select the former.

[34]Trans. Am. Math. Soc., vol. XIV. pp. 481-488.

[34]Trans. Am. Math. Soc., vol. 14, pp. 481-488.

[35]Proc. Camb. Phil. Soc., vol. XIX., i., January 1917.

[35]Proc. Camb. Phil. Soc., vol. 19, no. 1, January 1917.

Our primitive idea, now, is a certain truth-function called "incompatibility," which we will denote by $p/q$ . Negation can be at once defined as the incompatibility of a proposition with itself, i.e. "not- $p$ " is defined as " $p/p$ ." Disjunction is the incompatibility of not- $p$ and not- $q$ , i.e. it is $(p/p) | (q/q)$ . Implication is the incompatibility of $p$ and not- $q$ , i.e. $p | (q/q)$ . Conjunction is the negation of incompatibility, i.e. it is $(p/q) | (p/q)$ . Thus all our four other functions are defined in terms of incompatibility.

Our basic concept now is a certain truth-function called "incompatibility," which we'll represent as $p/q$ . Negation can be defined as the incompatibility of a proposition with itself; that is, "not- $p$ " is defined as " $p/p$ ." Disjunction is the incompatibility of not- $p$ and not- $q$ , meaning it is $(p/p) | (q/q)$ . Implication is the incompatibility of $p$ and not- $q$ , which means $p | (q/q)$ . Conjunction is the negation of incompatibility, so it is $(p/q) | (p/q)$

It is obvious that there is no limit to the manufacture of truth-functions, either by introducing more arguments or by repeating arguments. What we are concerned with is the connection of this subject with inference. [Pg 148]

It’s clear that there’s no limit to creating truth-functions, whether by adding more arguments or by using the same arguments repeatedly. What we’re focused on is how this topic relates to inference. [Pg 148]

If we know that $p$ is true and that $p$ implies $q$ , we can proceed to assert $q$ . There is always unavoidably something psychological about inference: inference is a method by which we arrive at new knowledge, and what is not psychological about it is the relation which allows us to infer correctly; but the actual passage from the assertion of $p$ to the assertion of $q$ is a psychological process, and we must not seek to represent it in purely logical terms.

If we know that $p$ is true and that $p$ implies $q$ , we can confidently state $q$ . There is always inevitably something psychological about inference: inference is a way we gain new knowledge, and what isn’t psychological about it is the relationship that lets us infer correctly; however, the actual shift from stating $p$ to stating $q$ is a psychological process, and we shouldn’t try to express it solely in logical terms.

In mathematical practice, when we infer, we have always some expression containing variable propositions, say $p$ and $q$ , which is known, in virtue of its form, to be true for all values of $p$ and $q$ ; we have also some other expression, part of the former, which is also known to be true for all values of $p$ and $q$ ; and in virtue of the principles of inference, we are able to drop this part of our original expression, and assert what is left. This somewhat abstract account may be made clearer by a few examples.

In math, when we make inferences, we always deal with some expression that has variable propositions, such as $p$ and $q$ , which we know is true for all values of $p$ and $p$ and $q$

Let us assume that we know the five formal principles of deduction enumerated in Principia Mathematica. (M. Nicod has reduced these to one, but as it is a complicated proposition, we will begin with the five.) These five propositions are as follows:—

Let’s assume we know the five formal principles of deduction listed in Principia Mathematica. (M. Nicod has simplified these to one, but since it’s a complex statement, we’ll start with the five.) These five propositions are as follows:—

(1) " $p$ or $p$ " implies $p$ —i.e. if either $p$ is true or $p$ is true, then $p$ is true.

(1) " $p$ or $p$ " means $p$ —i.e. if either $p$ is true or $p$ is true, then $p$ is true.

(2) $q$ implies " $p$ or $q$ "—i.e. the disjunction " $p$ or $q$ " is true when one of its alternatives is true.

(2) $q$ implies " $p$ or $q$ "—i.e. the disjunction " $p$ or $q$

(3) " $p$ or $q$ " implies " $q$ or $p$ ." This would not be required if we had a theoretically more perfect notation, since in the conception of disjunction there is no order involved, so that " $p$ or $q$ " and " $q$ or $p$ " should be identical. But since our symbols, in any convenient form, inevitably introduce an order, we need suitable assumptions for showing that the order is irrelevant.

(3) " $p$ or $q$ " suggests " $q$ or $p$ ." This wouldn't be necessary if we had a more perfect notation, since disjunction doesn't involve any order. Therefore, " $p$ or $q$ " and " $q$ or $p$ " should be the same. But since our symbols, in any practical form, inevitably create an order, we need appropriate assumptions to demonstrate that the order doesn’t matter.

(4) If either $p$ is true or " $q$ or $r$ " is true, then either $q$ is true or " $p$ or $r$ " is true. (The twist in this proposition serves to increase its deductive power.) [Pg 149]

(5) If $q$ implies $r$ , then " $p$ or $q$ " implies " $p$ or $r$ ."

These are the formal principles of deduction employed in Principia Mathematica. A formal principle of deduction has a double use, and it is in order to make this clear that we have cited the above five propositions. It has a use as the premiss of an inference, and a use as establishing the fact that the premiss implies the conclusion. In the schema of an inference we have a proposition $p$ , and a proposition " $p$ implies $q$ ," from which we infer $q$ . Now when we are concerned with the principles of deduction, our apparatus of primitive propositions has to yield both the $p$ and the " $p$ implies $q$ " of our inferences. That is to say, our rules of deduction are to be used, not only as rules, which is their use for establishing " $p$ implies $q$ " but also as substantive premisses, i.e. as the $p$ of our schema. Suppose, for example, we wish to prove that if $p$ implies $q$ , then if $q$ implies $r$ it follows that $p$ implies $r$ . We have here a relation of three propositions which state implications. Put

These are the formal principles of deduction used in Principia Mathematica. A formal principle of deduction serves a dual purpose, and we have cited the five propositions above to clarify this. It functions as the premise of an inference, and it also establishes that the premise implies the conclusion. In the structure of an inference, we have a proposition $p$ , and a proposition " $p$ implies $q$ ," from which we infer $q$ . When we focus on the principles of deduction, our set of basic propositions must provide both the $p$ and the " $p$ implies $q$ only as rules for establishing " $p$ implies $q$ also as concrete premises, i.e. as the $p$

$p_{1} = p$ implies $q$ , $p_{2} = q$ implies $r$ , and $p_{3} = p$ implies $r$ .

Then we have to prove that $p_{1}$ implies that $p_{2}$ implies $p_{3}$ . Now take the fifth of our above principles, substitute not- $p$ for $p$ , and remember that "not- $p$ or $q$ " is by definition the same as " $p$ implies $q$ ." Thus our fifth principle yields:

Then we need to show that $p_{1}$ leads to $p_{2}$ leading to $p_{3}$ . Now take the fifth of our earlier principles, substitute not- $p$ for $p$ , and keep in mind that "not- $p$ or $q$ " is, by definition, the same as " $p$ leads to $q$ ." So, our fifth principle gives us:

"If $q$ implies $r$ , then ' $p$ implies $q$ ' implies ' $p$ implies $r$ ,'" i.e. " $p_{2}$ implies that $p_{1}$ implies $p_{3}$ ." Call this proposition $\mathrm A$ .

But the fourth of our principles, when we substitute not- $p$ , not- $q$ , for $p$ and $q$ , and remember the definition of implication, becomes:

But the fourth of our principles, when we replace not- $p$ , not- $q$ with $p$ and $q$ , and recall the definition of implication, becomes:

"If $p$ implies that $q$ implies $r$ , then $q$ implies that $p$ implies $r$ ."

Writing $p_{2}$ in place of $p$ , $p_{1}$ in place of $q$ , and $p_{3}$ in place of $r$ , this becomes:

Writing $p_{2}$ instead of $p$ , $p_{1}$ instead of $q$ , and $p_{3}$ instead of $r$ , this becomes:

"If $p_{2}$ implies that $p_{1}$ implies $p_{3}$ , then $p_{1}$ implies that $p_{2}$ implies $p_{3}$ ." Call this $\mathrm B$ .

[Pg 150]

Now we proved by means of our fifth principle that

Now we've shown through our fifth principle that

" $p_{2}$ implies that $p_{1}$ implies $p_{3}$ ," which was what we called $\mathrm A$ .

" $p_{2}$ indicates that $p_{1}$ indicates $p_{3}$ ," which we referred to as $\mathrm A$ .

Thus we have here an instance of the schema of inference, since $\mathrm A$ represents the $p$ of our scheme, and $\mathrm B$ represents the " $p$ implies $q$ ." Hence we arrive at $q$ , namely,

Thus we have an example of the inference scheme, since $\mathrm A$ represents the $p$ of our scheme, and $\mathrm B$ represents the " $p$ implies $q$ .

" $p_{1}$ implies that $p_{2}$ implies $p_{3}$ ,"

which was the proposition to be proved. In this proof, the adaptation of our fifth principle, which yields $\mathrm A$ , occurs as a substantive premiss; while the adaptation of our fourth principle, which yields $\mathrm B$ , is used to give the form of the inference. The formal and material employments of premisses in the theory of deduction are closely intertwined, and it is not very important to keep them separated, provided we realise that they are in theory distinct.

which was the proposition to be proved. In this proof, the adaptation of our fifth principle, which gives $\mathrm A$ , serves as a substantial premise; while the adaptation of our fourth principle, which produces $\mathrm B$ , is used to shape the form of the inference. The formal and material uses of premises in the theory of deduction are closely intertwined, and it's not very important to keep them separate, as long as we understand that they are theoretically distinct.

The earliest method of arriving at new results from a premiss is one which is illustrated in the above deduction, but which itself can hardly be called deduction. The primitive propositions, whatever they may be, are to be regarded as asserted for all possible values of the variable propositions $p$ , $q$ , $r$ which occur in them. We may therefore substitute for (say) $p$ any expression whose value is always a proposition, e.g. not- $p$ , " $s$ implies $t$ ," and so on. By means of such substitutions we really obtain sets of special cases of our original proposition, but from a practical point of view we obtain what are virtually new propositions. The legitimacy of substitutions of this kind has to be insured by means of a non-formal principle of inference.[36]

The earliest way to get new results from a premise is shown in the deduction above, but it’s not really a deduction itself. The basic statements, whatever they may be, should be seen as applicable to all possible values of the variable statements $p$ , $q$ , $r$ that appear in them. Therefore, we can replace (for example) $p$ with any expression that always results in a proposition, such as not- $p$ , " $s$ implies $t$ , and so forth. By using these substitutions, we actually generate specific cases of our original statement, but from a practical viewpoint, we create what are effectively new propositions. The validity of these types of substitutions needs to be ensured through a non-formal principle of inference.[36]

[36]No such principle is enunciated in Principia Mathematica, or in M. Nicod's article mentioned above. But this would seem to be an omission.

[36]There isn't any principle stated in Principia Mathematica, or in M. Nicod's article mentioned earlier. However, this seems like a gap.

We may now state the one formal principle of inference to which M. Nicod has reduced the five given above. For this purpose we will first show how certain truth-functions can be defined in terms of incompatibility. We saw already that

We can now outline the single formal principle of inference that M. Nicod has condensed from the five mentioned above. To do this, we will first demonstrate how certain truth-functions can be defined in relation to incompatibility. We have already seen that

$p | (q/q)$ means " $p$ implies $q$ ." [Pg 151]

$p | (q/q)$ means " $p$ means $q$ ." [Pg 151]

We now observe that

We now see that

$p | (q/r)$ means " $p$ implies both $q$ and $r$ ."

For this expression means " $p$ is incompatible with the incompatibility of $q$ and $r$ ," i.e. " $p$ implies that $q$ and $r$ are not incompatible," i.e. " $p$ implies that $q$ and $r$ are both true"—for, as we saw, the conjunction of $q$ and $r$ is the negation of their incompatibility.

For this expression means " $p$ is not compatible with the fact that $q$ and $r$ ," i.e. " $p$ implies that $q$ and $r$ are not incompatible," i.e. " $p$ implies that $q$ and $r$ are both true"—for, as we saw, the conjunction of $q$ and $r$ is the negation of their incompatibility.

Observe next that $t | (t/t)$ means " $t$ implies itself." This is a particular case of $p | (q/q)$ .

Observe next that $t | (t/t)$ means " $t$ implies itself." This is a specific example of $p | (q/q)$ .

Let us write $\overline{p}$ for the negation of $p$ ; thus $\overline{p/s}$ will mean the negation of $p/s$ , i.e. it will mean the conjunction of $p$ and $s$ . It follows that $(s/q) | \overline{p/s}$ expresses the incompatibility of $s/q$ with the conjunction of $p$ and $s$ ; in other words, it states that if $p$ and $s$ are both true, $s/q$ is false, i.e. $s$ and $q$ are both true; in still simpler words, it states that $p$ and $s$ jointly imply $s$ and $q$ jointly.

Let’s represent $\overline{p}$ as the negation of $p$ ; so $\overline{p/s}$ will signify the negation of $p/s$ , meaning it will denote the conjunction of $p$ and $s$ . Consequently, $(s/q) | \overline{p/s}$ expresses the incompatibility of $s/q$ with the conjunction of $p$ and $s$ ; in simpler terms, it asserts that if $p$ and $s$ are both true, then $s/q$ is false, which means $s$ and $q$ are both true; to put it even more simply, it states that $p$ and $s$ together imply $s$ and $q$ together.

Now, put $\begin{align*} P &= p | (q/r), \\ \pi &= t | (t/t), \\ Q &= (s/q) | \overline{p/s}. \end{align*}$ Then M. Nicod's sole formal principle of deduction is $P | \pi/Q,$ in other words, $\mathrm P$ implies both $\pi$ and $\mathrm Q$ .

Now, put $\begin{align*} P &= p | (q/r), \\ \pi &= t | (t/t), \\ Q &= (s/q) | \overline{p/s}. \end{align*}$ Then M. Nicod's only formal principle of deduction is $P | \pi/Q,$ in other words, $\mathrm P$ implies both $\pi$ and $\mathrm Q$ .

He employs in addition one non-formal principle belonging to the theory of types (which need not concern us), and one corresponding to the principle that, given $p$ , and given that $p$ implies $q$ , we can assert $q$ . This principle is:

He also uses one informal principle from the theory of types (which we don't need to worry about) and one that aligns with the principle that if $p$ is true, and if $p$ implies $q$ , we can conclude that $q$ . This principle is:

"If $p | (r/q)$ is true, and $p$ is true, then $q$ is true."

From this apparatus the whole theory of deduction follows, except in so far as we are concerned with deduction from or to the existence or the universal truth of "propositional functions," which we shall consider in the next chapter.

From this setup, the entire theory of deduction follows, except for when we're dealing with deduction from or to the existence or the universal truth of "propositional functions," which we will discuss in the next chapter.

There is? if I am not mistaken, a certain confusion in the [Pg 152] minds of some authors as to the relation, between propositions, in virtue of which an inference is valid. In order that it may be valid to infer $q$ from $p$ , it is only necessary that $p$ should be true and that the proposition "not- $p$ or $q$ " should be true. Whenever this is the case, it is clear that $q$ must be true. But inference will only in fact take place when the proposition "not- $p$ or $q$ " is known otherwise than through knowledge of not- $p$ or knowledge of $q$ . Whenever $p$ is false, "not- $p$ or $q$ " is true, but is useless for inference, which requires that $p$ should be true. Whenever $q$ is already known to be true, "not- $p$ or $q$ " is of course also known to be true, but is again useless for inference, since $q$ is already known, and therefore does not need to be inferred. In fact, inference only arises when "not- $p$ or $q$ " can be known without our knowing already which of the two alternatives it is that makes the disjunction true. Now, the circumstances under which this occurs are those in which certain relations of form exist between $p$ and $q$ . For example, we know that if $r$ implies the negation of $s$ , then $s$ implies the negation of $r$ . Between " $r$ implies not- $s$ " and " $s$ implies not- $r$ " there is a formal relation which enables us to know that the first implies the second, without having first to know that the first is false or to know that the second is true. It is under such circumstances that the relation of implication is practically useful for drawing inferences.

There seems to be, if I'm not mistaken, some confusion among certain authors about the relationship between propositions that makes an inference valid. In order for it to be valid to infer $q$ from $p$ , it is only necessary for $p$ to be true and for the proposition "not- $p$ or $q$ " to be true. Whenever this is the case, it's clear that $q$ must be true. However, inference will only actually happen when the proposition "not- $p$ or $q$ known by means other than knowing not- $p$ or knowing $q$ . Whenever $p$ is false, "not- $p$ or $p$ to be true. Whenever $q$ is already known to be true, "not- $p$ or $q$ is already known, and thus doesn’t need to be inferred. In fact, inference only occurs when "not- $p$ or $p$ and $r$ implies the negation of $s$ , then $s$ implies the negation of $r$ . There is a formal relationship between " $r$ implies not- $s$ implies not- $r$ know that the first statement implies the second, without needing to know that the first statement is false or the second is true beforehand. It’s under these circumstances that the relationship of implication is practically useful for making inferences.

But this formal relation is only required in order that we may be able to know that either the premiss is false or the conclusion is true. It is the truth of "not- $p$ or $q$ " that is required for the validity of the inference; what is required further is only required for the practical feasibility of the inference. Professor C. I. Lewis [37] has especially studied the narrower, formal relation which we may call "formal deducibility." He urges that the wider relation, that expressed by "not- $p$ or $q$ " should not be called "implication." That is, however, a matter of words. [Pg 153] Provided our use of words is consistent, it matters little how we define them. The essential point of difference between the theory which I advocate and the theory advocated by Professor Lewis is this: He maintains that, when one proposition $q$ is "formally deducible" from another $p$ , the relation which we perceive between them is one which he calls "strict implication," which is not the relation expressed by "not- $p$ or $q$ " but a narrower relation, holding only when there are certain formal connections between $p$ and $q$ . I maintain that, whether or not there be such a relation as he speaks of, it is in any case one that mathematics does not need, and therefore one that, on general grounds of economy, ought not to be admitted into our apparatus of fundamental notions; that, whenever the relation of "formal deducibility" holds between two propositions, it is the case that we can see that either the first is false or the second true, and that nothing beyond this fact is necessary to be admitted into our premisses; and that, finally, the reasons of detail which Professor Lewis adduces against the view which I advocate can all be met in detail, and depend for their plausibility upon a covert and unconscious assumption of the point of view which I reject. I conclude, therefore, that there is no need to admit as a fundamental notion any form of implication not expressible as a truth-function.

But this formal relationship is only necessary so that we can know that either the premise is false or the conclusion is true. It's the truth of "not- $p$ or $q$ " that is needed for the validity of the inference; what is needed beyond that is only for the practical feasibility of the inference. Professor C. I. Lewis [37] has particularly studied the narrower, formal relationship that we can call "formal deducibility." He argues that the broader relationship represented by "not- $p$ or $q$ " shouldn't be labeled "implication." However, that's just a matter of terminology. [Pg 153] As long as we are consistent in our use of terms, it doesn't matter much how we define them. The key difference between the theory I support and Professor Lewis's theory is this: He asserts that when one proposition $q$ is "formally deducible" from another $p$ , the relationship perceived between them is what he calls "strict implication," which is not the relationship expressed by "not- $p$ or $p$ and $q$ . I argue that whether or not such a relationship exists, it is one that mathematics does not require, and therefore, for the sake of simplicity, we shouldn't include it in our basic concepts; that whenever the relationship of "formal deducibility" exists between two propositions, we can see that either the first is false or the second one is true, and that nothing more is necessary to be included in our premises; and finally, the detailed reasons that Professor Lewis presents against the view I support can all be addressed in detail and rely on a covert and unconscious assumption of the perspective I reject. Therefore, I conclude that there's no need to accept any notion of implication that can't be expressed as a truth-function.

[37]See Mind, vol. XXI., 1912, pp. 522-531; and vol. XXIII., 1914, pp. 240-247.

[37]See Mind, vol. XXI, 1912, pp. 522-531; and vol. XXIII, 1914, pp. 240-247.

[Pg 154]

CHAPTER XV

PROPOSITIONAL FUNCTIONS

WHEN, in the preceding chapter, we were discussing propositions, we did not attempt to give a definition of the word "proposition." But although the word cannot be formally defined, it is necessary to say something as to its meaning, in order to avoid the very common confusion with "propositional functions," which are to be the topic of the present chapter.

WHEN, in the previous chapter, we talked about propositions, we didn’t try to define the term “proposition.” However, since it can’t be formally defined, we need to clarify its meaning to avoid the common confusion with “propositional functions,” which will be the focus of this chapter.

We mean by a "proposition" primarily a form of words which expresses what is either true or false. I say "primarily," because I do not wish to exclude other than verbal symbols, or even mere thoughts if they have a symbolic character. But I think the word "proposition" should be limited to what may, in some sense, be called "symbols," and further to such symbols as give expression to truth and falsehood. Thus "two and two are four" and "two and two are five" will be propositions, and so will "Socrates is a man" and "Socrates is not a man." The statement: "Whatever numbers $a$ and $b$ may be, $(a + b)^{2} = a^{2} + 2ab + b^{2}$ " is a proposition; but the bare formula " $(a + b)^{2} = a^{2} + 2ab + b^{2}$ " alone is not, since it asserts nothing definite unless we are further told, or led to suppose, that $a$ and $b$ are to have all possible values, or are to have such-and-such values. The former of these is tacitly assumed, as a rule, in the enunciation of mathematical formulæ, which thus become propositions; but if no such assumption were made, they would be "propositional functions." A "propositional function," in fact, is an expression containing one or more undetermined constituents, [Pg 155] such that, when values are assigned to these constituents, the expression becomes a proposition. In other words, it is a function whose values are propositions. But this latter definition must be used with caution. A descriptive function, e.g. "the hardest proposition in $\mathrm A$ 's mathematical treatise," will not be a propositional function, although its values are propositions. But in such a case the propositions are only described: in a propositional function, the values must actually enunciate propositions.

We refer to a "proposition" primarily as a set of words that expresses something that can be either true or false. I say "primarily" because I don't want to exclude symbols other than words, or even thoughts if they convey a symbolic meaning. But I believe the term "proposition" should be limited to what can, in some way, be considered "symbols," specifically those symbols that express truth and falsehood. For example, "two and two are four" and "two and two are five" are propositions, as well as "Socrates is a man" and "Socrates is not a man." The statement: "Whatever numbers $a$ and $b$ may be, $(a + b)^{2} = a^{2} + 2ab + b^{2}$ " is a proposition; however, the mere formula " $(a + b)^{2} = a^{2} + 2ab + b^{2}$ " by itself is not, since it doesn’t make any specific assertion unless we are further informed or led to assume that $a$ and $b$ can take all possible values, or specific values. The first scenario is generally assumed when stating mathematical formulas, which then qualify as propositions; however, if no such assumption is made, they would be considered "propositional functions." A "propositional function" is, in fact, an expression that contains one or more undefined elements, [Pg 155] which, when values are assigned to these elements, becomes a proposition. In other words, it's a function whose outputs are propositions. But this definition requires some caution. A descriptive function, for example, "the hardest proposition in $\mathrm A$ 's mathematical treatise," is not a propositional function, even though its outputs are propositions. In that case, the propositions are merely described; in a propositional function, the outputs must actually state propositions.

Examples of propositional functions are easy to give: " $x$ is human" is a propositional function; so long as $x$ remains undetermined, it is neither true nor false, but when a value is assigned to $x$ it becomes a true or false proposition. Any mathematical equation is a propositional function. So long as the variables have no definite value, the equation is merely an expression awaiting determination in order to become a true or false proposition. If it is an equation containing one variable, it becomes true when the variable is made equal to a root of the equation, otherwise it becomes false; but if it is an "identity" it will be true when the variable is any number. The equation to a curve in a plane or to a surface in space is a propositional function, true for values of the co-ordinates belonging to points on the curve or surface, false for other values. Expressions of traditional logic such as "all $\mathrm A$ is $\mathrm B$ " are propositional functions: $\mathrm A$ and $\mathrm B$ have to be determined as definite classes before such expressions become true or false.

Examples of propositional functions are easy to provide: " $x$ is human" is a propositional function; as long as $x$ remains undefined, it is neither true nor false, but when a value is assigned to $x$ it becomes a true or false proposition. Any mathematical equation is a propositional function. As long as the variables have no specific value, the equation is just an expression waiting to be determined in order to become a true or false proposition. If it is an equation with one variable, it becomes true when the variable equals a root of the equation; otherwise, it is false. If it is an "identity," it will be true for any number the variable takes. The equation for a curve in a plane or a surface in space is a propositional function, true for values of the coordinates that belong to points on the curve or surface and false for other values. Statements in traditional logic such as "all $\mathrm A$ is $\mathrm B$ " are also propositional functions: $\mathrm A$ and $\mathrm B$ need to be defined as specific classes before such statements become true or false.

The notion of "cases" or "instances" depends upon propositional functions. Consider, for example, the kind of process suggested by what is called "generalisation," and let us take some very primitive example, say, "lightning is followed by thunder." We have a number of "instances" of this, i.e. a number of propositions such as: "this is a flash of lightning and is followed by thunder." What are these occurrences "instances" of? They are instances of the propositional function: "If $x$ is a flash of lightning, $x$ is followed by thunder." The process of generalisation (with whose validity we are fortunately [Pg 156] not concerned) consists in passing from a number of such instances to the universal truth of the propositional function: "If $x$ is a flash of lightning, $x$ is followed by thunder." It will be found that, in an analogous way, propositional functions are always involved whenever we talk of instances or cases or examples.

The idea of "cases" or "instances" is based on propositional functions. For example, think about the process known as "generalization," and let's use a very basic example, like, "lightning is followed by thunder." We have several "instances" of this, such as: "this is a flash of lightning and it is followed by thunder." What are these occurrences "instances" of? They are instances of the propositional function: "If $x$ is a flash of lightning, $x$ is followed by thunder." The process of generalization (whose validity we are not concerned with) involves moving from several such instances to the universal truth of the propositional function: "If $x$ is a flash of lightning, $x$ is followed by thunder." Similarly, we can see that propositional functions are always involved whenever we discuss instances, cases, or examples.

We do not need to ask, or attempt to answer, the question: "What is a propositional function?" A propositional function standing all alone may be taken to be a mere schema, a mere shell, an empty receptacle for meaning, not something already significant. We are concerned with propositional functions, broadly speaking, in two ways: first, as involved in the notions "true in all cases" and "true in some cases"; secondly, as involved in the theory of classes and relations. The second of these topics we will postpone to a later chapter; the first must occupy us now.

We don’t need to ask or try to answer the question: “What is a propositional function?” A propositional function on its own can be seen as just a framework, an empty container for meaning, rather than something that already holds significance. We're looking at propositional functions in two main ways: first, in terms of the ideas “true in all cases” and “true in some cases”; second, in relation to the theory of classes and relations. We'll cover the second topic in a later chapter; for now, we need to focus on the first.

When we say that something is "always true" or "true in all cases," it is clear that the "something" involved cannot be a proposition. A proposition is just true or false, and there is an end of the matter. There are no instances or cases of "Socrates is a man" or "Napoleon died at St Helena." These are propositions, and it would be meaningless to speak of their being true "in all cases." This phrase is only applicable to propositional functions. Take, for example, the sort of thing that is often said when causation is being discussed. (We are net concerned with the truth or falsehood of what is said, but only with its logical analysis.) We are told that $\mathrm A$ is, in every instance, followed by $\mathrm B$ . Now if there are "instances" of $\mathrm A$ , $\mathrm A$ must be some general concept of which it is significant to say " $x_{1}$ is $\mathrm A$ ," " $x_{2}$ is $\mathrm A$ ," " $x_{3}$ is $\mathrm A$ ," and so on, where $x_{1}$ , $x_{2}$ , $x_{3}$ are particulars which are not identical one with another. This applies, e.g., to our previous case of lightning. We say that lightning ( $\mathrm A$ ) is followed by thunder ( $\mathrm B$ ). But the separate flashes are particulars, not identical, but sharing the common property of being lightning. The only way of expressing a [Pg 157] common property generally is to say that a common property of a number of objects is a propositional function which becomes true when any one of these objects is taken as the value of the variable. In this case all the objects are "instances" of the truth of the propositional function—for a propositional function, though it cannot itself be true or false, is true in certain instances and false in certain others, unless it is "always true" or "always false." When, to return to our example, we say that $\mathrm A$ is in every instance followed by $\mathrm B$ , we mean that, whatever $x$ may be, if $x$ is an $\mathrm A$ , it is followed by a $\mathrm B$ ; that is, we are asserting that a certain propositional function is "always true."

When we say that something is "always true" or "true in all cases," it's clear that the "something" involved can't be a straightforward statement. A statement is simply either true or false, and that's the end of it. There are no examples or cases of "Socrates is a man" or "Napoleon died at St Helena." These are statements, and it wouldn't make sense to talk about them being true "in all cases." This phrase only applies to propositional functions. For instance, look at what is often mentioned when discussing causation. (We aren't focused on whether what is said is true or false; we're only interested in its logical analysis.) We're told that $\mathrm A$ is, in every instance, followed by $\mathrm B$ . Now, if there are "instances" of $\mathrm A$ , $\mathrm A$ must be some general concept where it's meaningful to say " $x_{1}$ is $x_{2}$ is $x_{3}$ is $x_{1}$ , $x_{2}$ , $x_{3}$ are specific examples that are not identical to one another. This applies, for example, to our earlier case of lightning. We say that lightning ( $\mathrm A$ ) is followed by thunder ( $\mathrm B$ ). But the individual flashes are specific examples; they’re not identical, but they share the common trait of being lightning. The only way to express a common property generally is to say that a common property of several objects is a propositional function that becomes true when any of these objects is taken as the value of the variable. In this instance, all the objects are "instances" of the truth of the propositional function—because a propositional function, while it can't itself be true or false, is true in some cases and false in others, unless it is "always true" or "always false." When, going back to our example, we say that $\mathrm A$ is in every instance followed by $\mathrm B$ , we mean that, regardless of what $x$ may be, if $x$ is an $\mathrm A$ , it is followed by a $\mathrm B$ ; that is, we're asserting that a certain propositional function is "always true."

Sentences involving such words as "all," "every," "a," "the," "some" require propositional functions for their interpretation. The way in which propositional functions occur can be explained by means of two of the above words, namely, "all" and "some."

Sentences that use words like "all," "every," "a," "the," and "some" need propositional functions to be understood. The way propositional functions appear can be clarified using two of those words: "all" and "some."

There are, in the last analysis, only two things that can be done with a propositional function: one is to assert that it is true in all cases, the other to assert that it is true in at least one case, or in some cases (as we shall say, assuming that there is to be no necessary implication of a plurality of cases). All the other uses of propositional functions can be reduced to these two. When we say that a propositional function is true "in all cases," or "always" (as we shall also say, without any temporal suggestion), we mean that all its values are true. If " $\phi x$ " is the function, and $a$ is the right sort of object to be an argument to " $\phi x$ ," then $\phi a$ is to be true, however $a$ may have been chosen. For example, "if $a$ is human, $a$ is mortal" is true whether $a$ is human or not; in fact, every proposition of this form is true. Thus the propositional function "if $x$ is human, $x$ is mortal" is "always true," or "true in all cases." Or, again, the statement "there are no unicorns" is the same as the statement "the propositional function ' $x$ is not a unicorn' is true in all cases." The assertions in the preceding chapter about propositions, e.g. "' $p$ or $q$ ' implies ' $q$ or $p$ ,'" are really assertions [Pg 158] that certain propositional functions are true in all cases. We do not assert the above principle, for example, as being true only of this or that particular $p$ or $q$ , but as being true of any $p$ or $q$ concerning which it can be made significantly. The condition that a function is to be significant for a given argument is the same as the condition that it shall have a value for that argument, either true or false. The study of the conditions of significance belongs to the doctrine of types, which we shall not pursue beyond the sketch given in the preceding chapter.

There are ultimately only two things you can do with a propositional function: one is to claim that it is true in all cases, and the other is to claim that it is true in at least one case, or in some cases (we will say, assuming there's no necessary implication of multiple cases). All other uses of propositional functions can be reduced to these two. When we say a propositional function is true "in all cases," or "always" (as we will also say, without any time implication), we mean that all its values are true. If " $\phi x$ " is the function, and $a$ is the appropriate argument for " $\phi x$ ," then $\phi a$ must be true, regardless of how $a$ was chosen. For example, "if $a$ is human, $a$ is mortal" is true whether $a$ is human or not; in fact, every proposition of this form is true. So the propositional function "if $x$ is human, $x$ is mortal" is "always true" or "true in all cases." Similarly, the statement "there are no unicorns" is the same as saying that the propositional function ' $x$ is not a unicorn' is true in all cases." The assertions in the previous chapter about propositions, e.g. "' $p$ or $q$ ' implies ' $q$ or $p$ ,'" are really claims [Pg 158] that certain propositional functions are true in all cases. We do not claim the above principle, for example, as being true only of this or that specific $p$ or $q$ , but as being true of any $p$ or $q$ that can be significantly discussed. The requirement that a function must be significant for a given argument is the same as the requirement that it must have a value for that argument, either true or false. The study of the conditions for significance belongs to the doctrine of types, which we will not explore beyond the brief overview given in the previous chapter.

Not only the principles of deduction, but all the primitive propositions of logic, consist of assertions that certain propositional functions are always true. If this were not the case, they would have to mention particular things or concepts—Socrates, or redness, or east and west, or what not,—and clearly it is not the province of logic to make assertions which are true concerning one such thing or concept but not concerning another. It is part of the definition of logic (but not the whole of its definition) that all its propositions are completely general, i.e. they all consist of the assertion that some propositional function containing no constant terms is always true. We shall return in our final chapter to the discussion of propositional functions containing no constant terms. For the present we will proceed to the other thing that is to be done with a propositional function, namely, the assertion that it is "sometimes true," i.e. true in at least one instance.

Not only do the principles of deduction, but also all the basic propositions of logic, consist of claims that certain propositional functions are always true. If this weren’t the case, they would have to refer to specific things or concepts—like Socrates, or the color red, or directions like east and west, or whatever else—and it’s clear that it’s not the role of logic to make claims that are true for one specific thing or concept but not for another. It’s part of the definition of logic (though not all of it) that all its propositions are entirely general, i.e. they all assert that some propositional function without constant terms is always true. We’ll revisit the discussion of propositional functions without constant terms in our final chapter. For now, we’ll move on to the other aspect of propositional functions, which is the claim that they can be "sometimes true," i.e. true in at least one instance.

When we say "there are men," that means that the propositional function " $x$ is a man" is sometimes true. When we say "some men are Greeks," that means that the propositional function " $x$ is a man and a Greek" is sometimes true. When we say "cannibals still exist in Africa," that means that the propositional function " $x$ is a cannibal now in Africa" is sometimes true, i.e. is true for some values of $x$ . To say "there are at least $n$ individuals in the world" is to say that the propositional function " $\alpha$ is a class of individuals and a member of the cardinal number $n$ " is sometimes true, or, as we may say, is true for certain [Pg 159] values of $\alpha$ . This form of expression is more convenient when it is necessary to indicate which is the variable constituent which we are taking as the argument to our propositional function. For example, the above propositional function, which we may shorten to " $\alpha$ is a class of $n$ individuals," contains two variables, $\alpha$ and $n$ . The axiom of infinity, in the language of propositional functions, is: "The propositional function 'if $n$ is an inductive number, it is true for some values of $\alpha$ that $\alpha$ is a class of $n$ individuals' is true for all possible values of $n$ ." Here there is a subordinate function, " $\alpha$ is a class of $n$ individuals," which is said to be, in respect of $\alpha$ , sometimes true; and the assertion that this happens if $n$ is an inductive number is said to be, in respect of $n$ , always true.

When we say "there are men," it means that the statement " $x$ is a man" is sometimes true. When we say "some men are Greeks," it means that the statement " $x$ is a man and a Greek" is sometimes true. When we say "cannibals still exist in Africa," it means that the statement " $x$ is a cannibal now in Africa" is sometimes true, i.e. it is true for some values of $x$ . To say "there are at least $n$ individuals in the world" means that the statement " $\alpha$ is a class of individuals and a member of the cardinal number $n$

The statement that a function $\phi x$ is always true is the negation of the statement that not- $\phi x$ is sometimes true, and the statement that $\phi x$ is sometimes true is the negation of the statement that not- $\phi x$ is always true. Thus the statement "all men are mortals" is the negation of the statement that the function " $x$ is an immortal man" is sometimes true. And the statement "there are unicorns" is the negation of the statement that the function " $x$ is not a unicorn" is always true.[38] We say that $\phi x$ is "never true" or "always false" if not- $\phi x$ is always true. We can, if we choose, take one of the pair "always," "sometimes" as a primitive idea, and define the other by means of the one and negation. Thus if we choose "sometimes" as our primitive idea, we can define: "' $\phi x$ is always true' is to mean 'it is false that not- $\phi x$ is sometimes true.'"[39] But for reasons connected with the theory of types it seems more correct to take both "always" and "sometimes" as primitive ideas, and define by their means the negation of propositions in which they occur. That is to say, assuming that we have already [Pg 160] defined (or adopted as a primitive idea) the negation of propositions of the type to which $x$ belongs, we define: "The negation of ' $\phi x$ always' is 'not- $\phi x$ sometimes'; and the negation of ' $\phi x$ sometimes' is 'not- $\phi x$ always.'" In like manner we can re-define disjunction and the other truth-functions, as applied to propositions containing apparent variables, in terms of the definitions and primitive ideas for propositions containing no apparent variables. Propositions containing no apparent variables are called "elementary propositions." From these we can mount up step by step, using such methods as have just been indicated, to the theory of truth-functions as applied to propositions containing one, two, three, ... variables, or any number up to $n$ , where $n$ is any assigned finite number.

The claim that a function $\phi x$ is always true contradicts the claim that not- $\phi x$ is sometimes true. Similarly, the claim that $\phi x$ is sometimes true contradicts the claim that not- $\phi x$ is always true. Therefore, the statement "all men are mortals" is the negation of the assertion that the function " $x$ is an immortal man" is sometimes true. Likewise, the statement "there are unicorns" negates the claim that the function " $x$ is not a unicorn" is always true.[38] We say that $\phi x$ is "never true" or "always false" if not- $\phi x$ is always true. We can choose either "always" or "sometimes" as a basic concept and define the other in relation to it and its negation. So, if we consider "sometimes" as our basic concept, we can define: "' $\phi x$ is always true' means 'it's false that not- $\phi x$ is sometimes true.'" [39] However, due to considerations related to type theory, it seems more accurate to treat both "always" and "sometimes" as basic concepts and define the negation of propositions that include them. In other words, assuming we've already defined (or adopted as a basic concept) the negation of propositions of the type to which $x$ belongs, we define: "The negation of ' $\phi x$ always' is 'not- $\phi x$ sometimes;' and the negation of ' $\phi x$ sometimes' is 'not- $\phi x$ always.'" Likewise, we can redefine disjunction and other truth-functions related to propositions with apparent variables based on the definitions and basic concepts for propositions without apparent variables. Propositions that contain no apparent variables are called "elementary propositions." From these, we can systematically build up, using methods as described, to the theory of truth-functions applied to propositions with one, two, three, ... variables, or any number up to $n$ , where $n$ is any assigned finite number.

[38]The method of deduction is given in Principia Mathematica, vol. I. * 9.

[38]The method of deduction is explained in Principia Mathematica, vol. I. * 9.

[39]For linguistic reasons, to avoid suggesting either the plural or the singular, it is often convenient to say " $\phi x$ is not always false" rather than " $\phi x$ sometimes" or " $\phi x$ is sometimes true."

[39]For linguistic reasons, to avoid implying either the plural or the singular, it’s often easier to say " $\phi x$ is not always false" instead of " $\phi x$ sometimes" or " $\phi x$ is sometimes true."

The forms which are taken as simplest in traditional formal logic are really far from being so, and all involve the assertion of all values or some values of a compound propositional function. Take, to begin with, "all $\mathrm S$ is $\mathrm P$ ." We will take it that $\mathrm S$ is defined by a propositional function $\phi x$ , and $\mathrm P$ by a propositional function $\psi x$ . E.g., if $\mathrm S$ is men, $\phi x$ will be " $x$ is human"; if $\mathrm P$ is mortals, $\psi x$ will be "there is a time at which $x$ dies." Then "all $\mathrm S$ is $\mathrm P$ " means: "' $\phi x$ implies $\psi x$ ' is always true." It is to be observed that "all $\mathrm S$ is $\mathrm P$ " does not apply only to those terms that actually are $\mathrm S$ 's; it says something equally about terms which are not $\mathrm S$ 's. Suppose we come across an $x$ of which we do not know whether it is an $\mathrm S$ or not; still, our statement "all $\mathrm S$ is $\mathrm P$ " tells us something about $x$ , namely, that if $x$ is an $\mathrm S$ , then $x$ is a $\mathrm P$ . And this is every bit as true when $x$ is not an $\mathrm S$ as when $x$ is an $\mathrm S$ . If it were not equally true in both cases, the reductio ad absurdum would not be a valid method; for the essence of this method consists in using implications in cases where (as it afterwards turns out) the hypothesis is false. We may put the matter another way. In order to understand "all $\mathrm S$ is $\mathrm P$ ," it is not necessary to be able to enumerate what terms are $\mathrm S$ 's; provided we know what is meant by being an $\mathrm S$ and what by being a $\mathrm P$ , we can understand completely what is actually affirmed [Pg 161] by "all $\mathrm S$ is $\mathrm P$ ," however little we may know of actual instances of either. This shows that it is not merely the actual terms that are $\mathrm S$ 's that are relevant in the statement "all $\mathrm S$ is $\mathrm P$ ," but all the terms concerning which the supposition that they are $\mathrm S$ 's is significant, i.e. all the terms that are $\mathrm S$ 's, together with all the terms that are not $\mathrm S$ 's—i.e. the whole of the appropriate logical "type." What applies to statements about all applies also to statements about some. "There are men," e.g., means that " $x$ is human" is true for some values of $x$ . Here all values of $x$ (i.e. all values for which " $x$ is human" is significant, whether true or false) are relevant, and not only those that in fact are human. (This becomes obvious if we consider how we could prove such a statement to be false.) Every assertion about "all" or "some" thus involves not only the arguments that make a certain function true, but all that make it significant, i.e. all for which it has a value at all, whether true or false.

The forms that are considered the simplest in traditional formal logic are actually not that simple at all, and all involve stating all values or some values of a combined propositional function. Let's start with "all $\mathrm S$ is $\mathrm P$ ." We assume that $\mathrm S$ is defined by a propositional function $\phi x$ , and $\mathrm P$ by a propositional function $\psi x$ . For example, if $\mathrm S$ is men, then $\phi x$ will be " $x$ is human"; if $\mathrm P$ is mortals, then $\psi x$ will be "there is a time when $x$ dies." Therefore, "all $\mathrm S$ is $\phi x$ implies $\psi x$ ' is always true." It's important to note that "all $\mathrm S$ is $\mathrm S$ ; it equally includes terms that are not $\mathrm S$ . Suppose we encounter an $x$ whose status as $\mathrm S$ is unclear; still, our statement "all $\mathrm S$ is $x$ , which is that if $x$ is an $\mathrm S$ , then $x$ is a $x$ is not an $\mathrm S$ or when it is. If it weren't equally true in both scenarios, the reductio ad absurdum would not be a valid method; the essence of this method is based on using implications in cases where (as it later turns out) the assumption is false. We can explain this differently. To comprehend "all $\mathrm S$ is $\mathrm S$ ; as long as we know what it means to be an $\mathrm S$ and what it means to be a $\mathrm P$ [Pg 161] by "all $\mathrm S$ that matter in the statement "all $\mathrm S$ is significant, meaning all terms that are $\mathrm S$ , along with all terms that are not $\mathrm S$ —that is, the entire relevant logical "type." What applies to statements about all also applies to statements about some. For instance, "There are men" means that " $x$ is human" is true for some values of $x$ . Here, all values of $x$ (meaning all values for which " $x$ is human" is relevant, whether true or false) are relevant, not just those that actually are human. (This is clear when we consider how we could disprove such a statement.) Every claim about "all" or "some" therefore includes not only the arguments that make a certain function true but also everything that makes it significant, meaning all for which it has a value at all, true or false.

We may now proceed with our interpretation of the traditional forms of the old-fashioned formal logic. We assume that $\mathrm S$ is those terms $x$ for which $\phi x$ is true, and $\mathrm P$ is those for which $\psi x$ is true. (As we shall see in a later chapter, all classes are derived in this way from propositional functions.) Then:

We can now continue with our explanation of the traditional forms of classic formal logic. We assume that $\mathrm S$ refers to those terms $x$ for which $\phi x$ is true, and $\mathrm P$ refers to those for which $\psi x$ is true. (As we will discuss in a later chapter, all classes are derived this way from propositional functions.) Then:

"All $\mathrm S$ is $\mathrm P$ " means "' $\phi x$ implies $\psi x$ ' is always true."

"All $\mathrm S$ is $\mathrm P$ means ' $\phi x$ implies $\psi x$ ' is always true."

"Some $\mathrm S$ is $\mathrm P$ " means "' $\phi x$ and $\psi x$ ' is sometimes true."

"No $\mathrm S$ is $\mathrm P$ " means "' $\phi x$ implies not- $\psi x$ ' is always true."

"No $\mathrm S$ is $\mathrm P$ means "' $\phi x$ implies not- $\psi x$ ' is always true."

"Some $\mathrm S$ is not $\mathrm P$ " means "' $\phi x$ and not- $\psi x$ ' is sometimes true."

It will be observed that the propositional functions which are here asserted for all or some values are not $\phi x$ and $\psi x$ themselves, but truth-functions of $\phi x$ and $\psi x$ for the same argument $x$ . The easiest way to conceive of the sort of thing that is intended is to start not from $\phi x$ and $\psi x$ in general, but from $\phi a$ and $\psi a$ , where $a$ is some constant. Suppose we are considering "all men are mortal": we will begin with

It’s important to note that the propositional functions being discussed for all or some values are not $\phi x$ and $\psi x$ themselves, but rather truth-functions of $\phi x$ and $\psi x$ for the same argument $x$ . The simplest way to grasp the concept intended is to begin not with $\phi x$ and $\phi a$ and $\psi a$ , where $a$ is some constant. Let’s say we are considering "all men are mortal": we will begin with

"If Socrates is human, Socrates is mortal,"

"If Socrates is human, then Socrates is mortal,"

[Pg 162]

and then we will regard "Socrates" as replaced by a variable $x$ wherever "Socrates" occurs. The object to be secured is that, although $x$ remains a variable, without any definite value, yet it is to have the same value in " $\phi x$ " as in " $\psi x$ " when we are asserting that " $\phi x$ implies $\psi x$ " is always true. This requires that we shall start with a function whose values are such as " $\phi a$ implies $\psi a$ ," rather than with two separate functions $\phi x$ and $\psi x$ ; for if we start with two separate functions we can never secure that the $x$ , while remaining undetermined, shall have the same value in both.

and then we will treat "Socrates" as replaced by a variable $x$ wherever "Socrates" appears. The goal is that, although $x$ remains a variable without any specific value, it should have the same value in " $\phi x$ " as in " $\psi x$ " when we are claiming that " $\phi x$ implies $\psi x$ " is always true. This requires us to start with a function whose values satisfy " $\phi a$ implies $\phi x$ and $\psi x$ ; because if we start with two separate functions, we can never ensure that the $x$ , while remaining undefined, will have the same value in both.

For brevity we say " $\phi x$ always implies $\psi x$ " when we mean that " $\phi x$ implies $\psi x$ " is always true. Propositions of the form " $\phi x$ always implies $\psi x$ " are called "formal implications"; this name is given equally if there are several variables.

For simplicity, we say " $\phi x$ always implies $\psi x$ " when we mean that " $\phi x$ implies $\psi x$ " is always true. Statements of the form " $\phi x$ always implies $\psi x$ " are called "formal implications"; this term applies equally if there are multiple variables.

The above definitions show how far removed from the simplest forms are such propositions as "all $\mathrm S$ is $\mathrm P$ ," with which traditional logic begins. It is typical of the lack of analysis involved that traditional logic treats "all $\mathrm S$ is $\mathrm P$ " as a proposition of the same form as " $x$ is $\mathrm P$ "—e.g., it treats "all men are mortal" as of the same form as "Socrates is mortal." As we have just seen, the first is of the form " $\phi x$ always implies $\psi x$ ," while the second is of the form " $\psi x$ ." The emphatic separation of these two forms, which was effected by Peano and Frege, was a very vital advance in symbolic logic.

The definitions above illustrate how different complex statements are from the simplest ones, like "all $\mathrm S$ is $\mathrm S$ is $\mathrm P$

It will be seen that "all $\mathrm S$ is $\mathrm P$ " and "no $\mathrm S$ is $\mathrm P$ " do not really differ in form, except by the substitution of not- $\psi x$ for $\psi x$ , and that the same applies to "some $\mathrm S$ is $\mathrm P$ " and "some $\mathrm S$ is not $\mathrm P$ ." It should also be observed that the traditional rules of conversion are faulty, if we adopt the view, which is the only technically tolerable one, that such propositions as "all $\mathrm S$ is $\mathrm P$ " do not involve the "existence" of $\mathrm S$ 's, i.e. do not require that there should be terms which are $\mathrm S$ 's. The above definitions lead to the result that, if $\phi x$ is always false, i.e. if there are no $\mathrm S$ 's, then "all $\mathrm S$ is $\mathrm P$ " and "no $\mathrm S$ is $\mathrm P$ " will both be true, whatever [Pg 163] $\mathrm P$ may be. For, according to the definition in the last chapter, " $\phi x$ implies $\psi x$ " means "not- $\phi x$ or $\psi x$ " which is always true if not- $\phi x$ is always true. At the first moment, this result might lead the reader to desire different definitions, but a little practical experience soon shows that any different definitions would be inconvenient and would conceal the important ideas. The proposition " $\phi x$ always implies $\psi x$ , and $\phi x$ is sometimes true" is essentially composite, and it would be very awkward to give this as the definition of "all $\mathrm S$ is $\mathrm P$ ," for then we should have no language left for " $\phi x$ always implies $\psi x$ ," which is needed a hundred times for once that the other is needed. But, with our definitions, "all $\mathrm S$ is $\mathrm P$ " does not imply "some $\mathrm S$ is $\mathrm P$ ," since the first allows the non-existence of $\mathrm S$ and the second does not; thus conversion per accidens becomes invalid, and some moods of the syllogism are fallacious, e.g. Darapti: "All $\mathrm M$ is $\mathrm S$ , all $\mathrm M$ is $\mathrm P$ , therefore some $\mathrm S$ is $\mathrm P$ ," which fails if there is no $\mathrm M$ .

It will be evident that "all $\mathrm S$ is $\mathrm P$ ” and “no $\mathrm S$ is $\psi x$ with $\psi x$ , and that the same applies to “some $\mathrm S$ is $\mathrm S$ is not $\mathrm S$ is $\mathrm S$ ’s, meaning they don’t require that there should be items that are $\mathrm S$ . The definitions above lead to the conclusion that, if $\phi x$ is always false, meaning there are no $\mathrm S$ ’s, then both “all $\mathrm S$ is $\mathrm S$ is $\phi x$ implies $\mathrm S$ is $\mathrm S$ is $\mathrm S$ is $\mathrm S$ and the second does not; thus conversion per accidens becomes invalid, and some moods of the syllogism are fallacious, e.g. Darapti: "All $\mathrm M$ is $\mathrm M$ is $\mathrm P$

The notion of "existence" has several forms, one of which will occupy us in the next chapter; but the fundamental form is that which is derived immediately from the notion of "sometimes true." We say that an argument $a$ "satisfies" a function $\phi x$ if $\phi a$ is true; this is the same sense in which the roots of an equation are said to satisfy the equation. Now if $\phi x$ is sometimes true, we may say there are $x$ 's for which it is true, or we may say "arguments satisfying $\phi x$ exist" This is the fundamental meaning of the word "existence." Other meanings are either derived from this, or embody mere confusion of thought. We may correctly say "men exist," meaning that " $x$ is a man" is sometimes true. But if we make a pseudo-syllogism: "Men exist, Socrates is a man, therefore Socrates exists," we are talking nonsense, since "Socrates" is not, like "men," merely an undetermined argument to a given propositional function. The fallacy is closely analogous to that of the argument: "Men are numerous, Socrates is a man, therefore Socrates is numerous." In this case it is obvious that the conclusion is nonsensical, but [Pg 164] in the case of existence it is not obvious, for reasons which will appear more fully in the next chapter. For the present let us merely note the fact that, though it is correct to say "men exist," it is incorrect, or rather meaningless, to ascribe existence to a given particular $x$ who happens to be a man. Generally, "terms satisfying $\phi x$ exist" means " $\phi x$ is sometimes true"; but " $a$ exists" (where $a$ is a term satisfying $\phi x$ ) is a mere noise or shape, devoid of significance. It will be found that by bearing in mind this simple fallacy we can solve many ancient philosophical puzzles concerning the meaning of existence.

The idea of "existence" has several forms, one of which we'll discuss in the next chapter; but the basic form comes directly from the idea of "sometimes true." We say that an argument $a$ "satisfies" a function $\phi x$ if $\phi a$ is true; this is the same way we say the roots of an equation satisfy the equation. Now, if $\phi x$ is sometimes true, we can say there are $x$ that make it true, or we might say "arguments satisfying $\phi x$ exist." This is the fundamental meaning of the word "existence." Other meanings come from this or are just a misunderstanding. We can correctly say "men exist," meaning that " $x$ is a man" is sometimes true. But if we create a false syllogism: "Men exist, Socrates is a man, therefore Socrates exists," we're speaking nonsense because "Socrates" isn't just an undetermined argument like "men" is, in relation to a given propositional function. The mistake is similar to the argument: "Men are numerous, Socrates is a man, therefore Socrates is numerous." In this case, it's clear that the conclusion is nonsensical, but with existence, it's not obvious for reasons we'll explore in the next chapter. For now, let's simply note that while it’s accurate to say "men exist," it’s incorrect, or rather meaningless, to attribute existence to a specific $x$ who happens to be a man. Generally, "terms satisfying $\phi x$ exist" means " $\phi x$ is sometimes true"; but " $a$ exists" (where $a$ is a term satisfying $\phi x$ ) is just meaningless chatter or a shapeless sound, lacking significance. It turns out that by keeping this simple misunderstanding in mind, we can resolve many old philosophical riddles about the meaning of existence.

Another set of notions as to which philosophy has allowed itself to fall into hopeless confusions through not sufficiently separating propositions and propositional functions are the notions of "modality": necessary, possible, and impossible. (Sometimes contingent or assertoric is used instead of possible.) The traditional view was that, among true propositions, some were necessary, while others were merely contingent or assertoric; while among false propositions some were impossible, namely, those whose contradictories were necessary, while others merely happened not to be true. In fact, however, there was never any clear account of what was added to truth by the conception of necessity. In the case of propositional functions, the three-fold division is obvious. If " $\phi x$ " is an undetermined value of a certain propositional function, it will be necessary if the function is always true, possible if it is sometimes true, and impossible if it is never true. This sort of situation arises in regard to probability, for example. Suppose a ball $x$ is drawn from a bag which contains a number of balls: if all the balls are white, " $x$ is white" is necessary; if some are white, it is possible; if none, it is impossible. Here all that is known about $x$ is that it satisfies a certain propositional function, namely, " $x$ was a ball in the bag." This is a situation which is general in probability problems and not uncommon in practical life—e.g. when a person calls of whom we know nothing except that he brings a letter of introduction from our friend so-and-so. In all such [Pg 165] cases, as in regard to modality in general, the propositional function is relevant. For clear thinking, in many very diverse directions, the habit of keeping propositional functions sharply separated from propositions is of the utmost importance, and the failure to do so in the past has been a disgrace to philosophy. [Pg 166]

Another set of ideas where philosophy has gotten itself into a tangled mess is in the understanding of "modality": necessary, possible, and impossible. (Sometimes contingent or assertoric is used instead of possible.) The traditional understanding was that among true statements, some were necessary, while others were just contingent or assertoric; and among false statements, some were impossible—specifically, those whose opposites were necessary—while others were simply not true. However, there was never a clear explanation of what concept of necessity added to the notion of truth. When it comes to propositional functions, the three categories are clear. If " $\phi x$ " is an unspecified value of a particular propositional function, it will be necessary if the function is always true, possible if it’s sometimes true, and impossible if it’s never true. This kind of scenario comes up in probabilities, for example. Imagine pulling a ball $x$ from a bag filled with different balls: if all the balls are white, then "the ball $x$ is white" is necessary; if some are white, it's possible; if none are, it's impossible. Here, all we really know about $x$ is that it meets a certain propositional function: "the ball $x$ was in the bag." This situation often comes up in probability scenarios and is not rare in everyday life—for instance, when someone calls and all we know is that they have a letter of introduction from our mutual friend. In all such cases, and with modality in general, the propositional function is significant. For clear thinking across many different areas, keeping propositional functions distinct from propositions is crucial, and the failure to do this in the past has been an embarrassment to philosophy. [Pg 165] [Pg 166]

CHAPTER XVI

DESCRIPTIONS

We dealt in the preceding chapter with the words all and some; in this chapter we shall consider the word the in the singular, and in the next chapter we shall consider the word the in the plural. It may be thought excessive to devote two chapters to one word, but to the philosophical mathematician it is a word of very great importance: like Browning's Grammarian with the enclitic $\delta\epsilon$ , I would give the doctrine of this word if I were "dead from the waist down" and not merely in a prison.

We talked about the words all and some in the last chapter; in this chapter, we will look at the word the in the singular, and in the next chapter, we will discuss the word the in the plural. It might seem excessive to spend two chapters on one word, but for the philosophical mathematician, it's a word of great significance: like Browning's Grammarian with the enclitic $\delta\epsilon$ , I would explain the doctrine of this word even if I were "dead from the waist down" and not just in a prison.

We have already had occasion to mention "descriptive functions," i.e. such expressions as "the father of $x$ " or "the sine of $x$ ." These are to be defined by first defining "descriptions."

We’ve already had a chance to talk about “descriptive functions,” i.e. expressions like “the father of $x$ " or “the sine of $x$ .” We need to define these by first explaining what “descriptions” are.

A "description" may be of two sorts, definite and indefinite (or ambiguous). An indefinite description is a phrase of the form "a so-and-so," and a definite description is a phrase of the form "the so-and-so" (in the singular). Let us begin with the former.

A "description" can be of two types: definite and indefinite (or ambiguous). An indefinite description is a phrase like "a something," while a definite description is a phrase like "the something" (in the singular). Let's start with the first type.

"Who did you meet?" "I met a man." "That is a very indefinite description." We are therefore not departing from usage in our terminology. Our question is: What do I really assert when I assert "I met a man"? Let us assume, for the moment, that my assertion is true, and that in fact I met Jones. It is clear that what I assert is not "I met Jones." I may say "I met a man, but it was not Jones"; in that case, though I lie, I do not contradict myself, as I should do if when I say I met a [Pg 167] man I really mean that I met Jones. It is clear also that the person to whom I am speaking can understand what I say, even if he is a foreigner and has never heard of Jones.

"Who did you meet?" "I met a guy." "That's a pretty vague description." So we aren't straying from the usual language here. Our question is: What am I really claiming when I say "I met a guy"? For now, let's assume my statement is true, and I actually met Jones. It's clear that what I'm claiming is not "I met Jones." I could say, "I met a guy, but it wasn't Jones"; in that case, even though I'm lying, I'm not contradicting myself, as I would be if when I said I met a guy I actually meant that I met Jones. It's also clear that the person I’m talking to can understand what I’m saying, even if they are a foreigner and have never heard of Jones.

But we may go further: not only Jones, but no actual man, enters into my statement. This becomes obvious when the statement is false, since then there is no more reason why Jones should be supposed to enter into the proposition than why anyone else should. Indeed the statement would remain significant, though it could not possibly be true, even if there were no man at all. "I met a unicorn" or "I met a sea-serpent" is a perfectly significant assertion, if we know what it would be to be a unicorn or a sea-serpent, i.e. what is the definition of these fabulous monsters. Thus it is only what we may call the concept that enters into the proposition. In the case of "unicorn," for example, there is only the concept: there is not also, somewhere among the shades, something unreal which may be called "a unicorn." Therefore, since it is significant (though false) to say "I met a unicorn," it is clear that this proposition, rightly analysed, does not contain a constituent "a unicorn," though it does contain the concept "unicorn."

But we can go further: not just Jones, but no real person, is part of my statement. This becomes clear when the statement is false, because then there's no reason to think Jones should be included in the proposition any more than anyone else. In fact, the statement would still hold meaning even if it couldn't possibly be true, like saying "I met a unicorn" or "I met a sea-serpent." These are perfectly meaningful claims if we understand what it means to be a unicorn or a sea-serpent, i.e., we know the definitions of these mythical creatures. So, it's really just the concept we can refer to in the proposition. In the case of "unicorn," for instance, there's just the concept; there's nothing existing in some shadowy form that we could call "a unicorn." Therefore, since it's meaningful (even though it's false) to say "I met a unicorn," it's clear that this proposition, when properly analyzed, doesn't actually include a component "a unicorn," even though it does include the concept "unicorn."

The question of "unreality," which confronts us at this point, is a very important one. Misled by grammar, the great majority of those logicians who have dealt with this question have dealt with it on mistaken lines. They have regarded grammatical form as a surer guide in analysis than, in fact, it is. And they have not known what differences in grammatical form are important. "I met Jones" and "I met a man" would count traditionally as propositions of the same form, but in actual fact they are of quite different forms: the first names an actual person, Jones; while the second involves a propositional function, and becomes, when made explicit: "The function 'I met $x$ and $x$ is human' is sometimes true." (It will be remembered that we adopted the convention of using "sometimes" as not implying more than once.) This proposition is obviously not of the form "I met $x$ ," which accounts [Pg 168] for the existence of the proposition "I met a unicorn" in spite of the fact that there is no such thing as "a unicorn."

The question of "unreality" that we face here is really significant. Many logicians who have tackled this issue, misled by grammar, have approached it incorrectly. They've treated grammatical structure as a more reliable guide in analysis than it actually is. They also haven’t recognized which differences in grammatical structure matter. "I met Jones" and "I met a man" would traditionally be seen as statements of the same type, but they are actually very different: the first identifies a real person, Jones, while the second involves a propositional function and can be made explicit as: "The function 'I met $x$ and $x$ is human' is sometimes true." (Remember, we agreed to use "sometimes" to mean not more than once.) This statement is clearly not in the form "I met $x$

For want of the apparatus of propositional functions, many logicians have been driven to the conclusion that there are unreal objects. It is argued, e.g. by Meinong,[40] that we can speak about "the golden mountain," "the round square," and so on; we can make true propositions of which these are the subjects; hence they must have some kind of logical being, since otherwise the propositions in which they occur would be meaningless. In such theories, it seems to me, there is a failure of that feeling for reality which ought to be preserved even in the most abstract studies. Logic, I should maintain, must no more admit a unicorn than zoology can; for logic is concerned with the real world just as truly as zoology, though with its more abstract and general features. To say that unicorns have an existence in heraldry, or in literature, or in imagination, is a most pitiful and paltry evasion. What exists in heraldry is not an animal, made of flesh and blood, moving and breathing of its own initiative. What exists is a picture, or a description in words. Similarly, to maintain that Hamlet, for example, exists in his own world, namely, in the world of Shakespeare's imagination, just as truly as (say) Napoleon existed in the ordinary world, is to say something deliberately confusing, or else confused to a degree which is scarcely credible. There is only one world, the "real" world: Shakespeare's imagination is part of it, and the thoughts that he had in writing Hamlet are real. So are the thoughts that we have in reading the play. But it is of the very essence of fiction that only the thoughts, feelings, etc., in Shakespeare and his readers are real, and that there is not, in addition to them, an objective Hamlet. When you have taken account of all the feelings roused by Napoleon in writers and readers of history, you have not touched the actual man; but in the case of Hamlet you have come to the end of him. If no one thought about Hamlet, there would be nothing [Pg 169] left of him; if no one had thought about Napoleon, he would have soon seen to it that some one did. The sense of reality is vital in logic, and whoever juggles with it by pretending that Hamlet has another kind of reality is doing a disservice to thought. A robust sense of reality is very necessary in framing a correct analysis of propositions about unicorns, golden mountains, round squares, and other such pseudo-objects.

For lack of the tools for propositional functions, many logicians have come to the conclusion that unreal objects exist. Meinong, for instance, argues that we can discuss "the golden mountain," "the round square," and so on; we can make true statements about these subjects; therefore, they must have some kind of logical existence, or else the statements that involve them would be meaningless. It seems to me that such theories lack a sense of reality that should be maintained even in the most abstract studies. I would argue that logic should not accept a unicorn any more than zoology would; for logic is just as concerned with the real world as zoology is, even though it deals with more abstract and general features. To claim that unicorns exist in heraldry, literature, or imagination is a pitiful and trivial evasion. What exists in heraldry is not an actual animal made of flesh and blood that moves and breathes on its own. What exists is a picture or a description in words. Similarly, to insist that Hamlet, for example, exists in his own world, that is, in Shakespeare's imagination, just as Napoleon existed in the ordinary world, is to say something that is deliberately confusing or else confusing to a degree that is hard to believe. There is only one world, the "real" world: Shakespeare's imagination is a part of it, and the thoughts he had while writing Hamlet are real. So are the thoughts we have while reading the play. But the essence of fiction is that only the thoughts, feelings, etc., in Shakespeare and his readers are real, and that there isn’t an objective Hamlet in addition to them. When you consider all the feelings stirred by Napoleon in historians and readers, you haven’t touched the actual man; but with Hamlet, you’ve reached the end of him. If no one thought about Hamlet, there would be nothing left of him; if no one had thought about Napoleon, he would have made sure someone did. A sense of reality is crucial in logic, and whoever plays games with it by suggesting that Hamlet has another kind of reality is undermining clear thinking. A strong sense of reality is essential for accurately analyzing propositions about unicorns, golden mountains, round squares, and other such pseudo-objects.

[40]Untersuchungen zur Gegenstandstheorie und Psychologie, 1904.

[40]Studies on Object Theory and Psychology, 1904.

In obedience to the feeling of reality, we shall insist that, in the analysis of propositions, nothing "unreal" is to be admitted. But, after all, if there is nothing unreal, how, it may be asked, could we admit anything unreal? The reply is that, in dealing with propositions, we are dealing in the first instance with symbols, and if we attribute significance to groups of symbols which have no significance, we shall fall into the error of admitting unrealities, in the only sense in which this is possible, namely, as objects described. In the proposition "I met a unicorn," the whole four words together make a significant proposition, and the word "unicorn" by itself is significant, in just the same sense as the word "man." But the two words "a unicorn" do not form a subordinate group having a meaning of its own. Thus if we falsely attribute meaning to these two words, we find ourselves saddled with "a unicorn," and with the problem how there can be such a thing in a world where there are no unicorns. "A unicorn" is an indefinite description which describes nothing. It is not an indefinite description which describes something unreal. Such a proposition as " $x$ is unreal" only has meaning when " $x$ " is a description, definite or indefinite; in that case the proposition will be true if " $x$ " is a description which describes nothing. But whether the description " $x$ " describes something or describes nothing, it is in any case not a constituent of the proposition in which it occurs; like "a unicorn" just now, it is not a subordinate group having a meaning of its own. All this results from the fact that, when " $x$ " is a description, " $x$ is unreal" or " $x$ does not exist" is not nonsense, but is always significant and sometimes true. [Pg 170]

In line with the principle of reality, we assert that, in analyzing propositions, nothing "unreal" should be accepted. But if there truly is nothing unreal, one might wonder, how could we accept anything unreal? The answer is that, when we analyze propositions, we are first dealing with symbols, and if we ascribe meaning to groups of symbols that lack significance, we risk the mistake of accepting unrealities, but only in the sense of objects described. In the statement "I met a unicorn," all four words together form a meaningful proposition, and the term "unicorn" is meaningful on its own, much like the word "man." However, the two words "a unicorn" do not create a subordinate group with their own meaning. Therefore, if we mistakenly assign meaning to these two words, we are left with "a unicorn," and the dilemma of how such a thing could exist in a world without unicorns. "A unicorn" is an indefinite description that describes nothing. It is not an indefinite description that describes something unreal. A proposition like " $x$ is unreal" only holds meaning when " $x$ [Pg 170]

We may now proceed to define generally the meaning of propositions which contain ambiguous descriptions. Suppose we wish to make some statement about "a so-and-so," where "so-and-so's" are those objects that have a certain property $\phi$ , i.e. those objects $x$ for which the propositional function $\phi x$ is true. (E.g. if we take "a man" as our instance of "a so-and-so," $\phi x$ will be " $x$ is human.") Let us now wish to assert the property $\psi$ of "a so-and-so," i.e. we wish to assert that "a so-and-so" has that property which $x$ has when $\psi x$ is true. (E.g. in the case of "I met a man," $\psi x$ will be "I met $x$ .") Now the proposition that "a so-and-so" has the property $\psi$ is not a proposition of the form " $\psi x$ ." If it were, "a so-and-so" would have to be identical with $x$ for a suitable $x$ ; and although (in a sense) this may be true in some cases, it is certainly not true in such a case as "a unicorn." It is just this fact, that the statement that a so-and-so has the property $\psi$ is not of the form $\psi x$ , which makes it possible for "a so-and-so" to be, in a certain clearly definable sense, "unreal." The definition is as follows:—

We can now generally define what propositions with ambiguous descriptions mean. Let's say we want to make a statement about "a so-and-so," where "so-and-sos" are objects that have a specific property $\phi$ , i.e. those objects $x$ for which the propositional function $\phi x$ is true. (E.g. if we take "a man" as our example of "a so-and-so," then $\phi x$ will be " $x$ is human.") Now, let’s say we want to claim the property $\psi$ of "a so-and-so," i.e. we want to claim that "a so-and-so" has that property which $x$ has when $\psi x$ is true. (E.g. in the case of "I met a man," $\psi x$ will be "I met $x$ .") Now, the proposition that "a so-and-so" has the property $\psi$ is not of the form " $\psi x$ ." If it were, "a so-and-so" would have to be identical with $x$ for some suitable $x$ ; and although (in a sense) this may be true in some cases, it is definitely not true in a case like "a unicorn." It is precisely this fact that the statement that a so-and-so has the property $\psi$ is not of the form $\psi x$ , which allows "a so-and-so" to be, in a specific, clearly definable sense, "unreal." The definition is as follows:—

The statement that "an object having the property $\phi$ has the property $\psi$ "

The statement that "an object with the property $\phi$ has the property $\psi$

means:

"The joint assertion of $\phi x$ and $\psi x$ is not always false."

"The combined statement of $\phi x$ and $\psi x$ isn’t always untrue."

So far as logic goes, this is the same proposition as might be expressed by "some $\phi$ 's are $\psi$ 's"; but rhetorically there is a difference, because in the one case there is a suggestion of singularity, and in the other case of plurality. This, however, is not the important point. The important point is that, when rightly analysed, propositions verbally about "a so-and-so" are found to contain no constituent represented by this phrase. And that is why such propositions can be significant even when there is no such thing as a so-and-so.

As far as logic goes, this is the same statement as saying "some $\phi$ 's are $\psi$ 's"; but rhetorically there is a difference since one implies singularity while the other implies plurality. However, this isn’t the key issue. The key issue is that when properly analyzed, statements that refer to "a so-and-so" are found to have no actual component represented by that phrase. That’s why such statements can still be meaningful even when there isn’t actually a so-and-so.

The definition of existence, as applied to ambiguous descriptions, results from what was said at the end of the preceding chapter. We say that "men exist" or "a man exists" if the [Pg 171] propositional function " $x$ is human" is sometimes true; and generally "a so-and-so" exists if " $x$ is so-and-so" is sometimes true. We may put this in other language. The proposition "Socrates is a man" is no doubt equivalent to "Socrates is human," but it is not the very same proposition. The is of "Socrates is human" expresses the relation of subject and predicate; the is of "Socrates is a man" expresses identity. It is a disgrace to the human race that it has chosen to employ the same word "is" for these two entirely different ideas—a disgrace which a symbolic logical language of course remedies. The identity in "Socrates is a man" is identity between an object named (accepting "Socrates" as a name, subject to qualifications explained later) and an object ambiguously described. An object ambiguously described will "exist" when at least one such proposition is true, i.e. when there is at least one true proposition of the form " $x$ is a so-and-so," where " $x$ " is a name. It is characteristic of ambiguous (as opposed to definite) descriptions that there may be any number of true propositions of the above form—Socrates is a man, Plato is a man, etc. Thus "a man exists" follows from Socrates, or Plato, or anyone else. With definite descriptions, on the other hand, the corresponding form of proposition, namely, " $x$ is the so-and-so" (where " $x$ " is a name), can only be true for one value of $x$ at most. This brings us to the subject of definite descriptions, which are to be defined in a way analogous to that employed for ambiguous descriptions, but rather more complicated.

The definition of existence, when it comes to ambiguous descriptions, comes from what was mentioned at the end of the previous chapter. We say that "men exist" or "a man exists" if the propositional function " $x$ is human" is sometimes true; and generally "a so-and-so" exists if " $x$ is so-and-so" is sometimes true. We can express this in different words. The statement "Socrates is a man" is definitely equivalent to "Socrates is human," but they are not exactly the same statement. The is in "Socrates is human" shows the relationship between subject and predicate; the is in "Socrates is a man" shows identity. It’s unfortunate for humanity that we use the same word "is" for these two completely different concepts—a problem that symbolic logical language can fix. The identity in "Socrates is a man" refers to the identity between a named object (considering "Socrates" as a name, with qualifications to be discussed later) and an ambiguously described object. An ambiguously described object will "exist" when at least one of these propositions is true, i.e. when there is at least one true proposition of the form " $x$ is a so-and-so," where " $x$ " is a name. It’s common for ambiguous (as opposed to definite) descriptions to have any number of true propositions of this form—Socrates is a man, Plato is a man, etc. So, "a man exists" can be derived from Socrates, Plato, or anyone else. In contrast, for definite descriptions, the corresponding proposition, namely, " $x$ is the so-and-so" (where " $x$ " is a name), can only be true for one value of $x$ at most. This leads us to the topic of definite descriptions, which will be defined in a way similar to that used for ambiguous descriptions, but in a somewhat more complex manner.

We come now to the main subject of the present chapter, namely, the definition of the word the (in the singular). One very important point about the definition of "a so-and-so" applies equally to "the so-and-so"; the definition to be sought is a definition of propositions in which this phrase occurs, not a definition of the phrase itself in isolation. In the case of "a so-and-so," this is fairly obvious: no one could suppose that "a man" was a definite object, which could be defined by itself. [Pg 172] Socrates is a man, Plato is a man, Aristotle is a man, but we cannot infer that "a man" means the same as "Socrates" means and also the same as "Plato" means and also the same as "Aristotle" means, since these three names have different meanings. Nevertheless, when we have enumerated all the men in the world, there is nothing left of which we can say, "This is a man, and not only so, but it is the 'a man,' the quintessential entity that is just an indefinite man without being anybody in particular." It is of course quite clear that whatever there is in the world is definite: if it is a man it is one definite man and not any other. Thus there cannot be such an entity as "a man" to be found in the world, as opposed to specific man. And accordingly it is natural that we do not define "a man" itself, but only the propositions in which it occurs.

We now arrive at the main topic of this chapter, which is the definition of the word the (in the singular). A crucial point about defining "a so-and-so" also applies to "the so-and-so"; the definition we are looking for is of the propositions in which this phrase appears, not of the phrase itself in isolation. With "a so-and-so," this is quite clear: no one could think that "a man" is a specific object that can be defined on its own. Socrates is a man, Plato is a man, Aristotle is a man, but we can't conclude that "a man" means the same thing as "Socrates," or "Plato," or "Aristotle," because these three names carry different meanings. However, once we list all the men in the world, there is nothing left we can say, "This is a man, and not only that, but it is the 'a man,' the ideal example that is just an indefinite man without being anyone in particular." It's clear that whatever exists in the world is specific: if it's a man, it’s one particular man and not any other. Therefore, there can't be such a thing as "a man" in the world, distinct from specific men. Thus, it makes sense that we don't define "a man" itself, but only the propositions in which it appears.

In the case of "the so-and-so" this is equally true, though at first sight less obvious. We may demonstrate that this must be the case, by a consideration of the difference between a name and a definite description. Take the proposition, "Scott is the author of Waverley." We have here a name, "Scott," and a description, "the author of Waverley," which are asserted to apply to the same person. The distinction between a name and all other symbols may be explained as follows:—

In the case of "the so-and-so," this is also true, although it's less obvious at first glance. We can show that this must be the case by looking at the difference between a name and a definite description. Take the statement, "Scott is the author of Waverley." Here, we have a name, "Scott," and a description, "the author of Waverley," that are said to refer to the same person. The difference between a name and all other symbols can be explained as follows:—

A name is a simple symbol whose meaning is something that can only occur as subject, i.e. something of the kind that, in Chapter XIII., we defined as an "individual" or a "particular." And a "simple" symbol is one which has no parts that are symbols. Thus "Scott" is a simple symbol, because, though it has parts (namely, separate letters), these parts are not symbols. On the other hand, "the author of Waverley" is not a simple symbol, because the separate words that compose the phrase are parts which are symbols. If, as may be the case, whatever seems to be an "individual" is really capable of further analysis, we shall have to content ourselves with what may be called "relative individuals," which will be terms that, throughout the context in question, are never analysed and never occur [Pg 173] otherwise than as subjects. And in that case we shall have correspondingly to content ourselves with "relative names." From the standpoint of our present problem, namely, the definition of descriptions, this problem, whether these are absolute names or only relative names, may be ignored, since it concerns different stages in the hierarchy of "types," whereas we have to compare such couples as "Scott" and "the author of Waverley," which both apply to the same object, and do not raise the problem of types. We may, therefore, for the moment, treat names as capable of being absolute; nothing that we shall have to say will depend upon this assumption, but the wording may be a little shortened by it.

A name is a simple symbol that has a meaning which can only exist as a subject, meaning something that, as we defined in Chapter XIII, is an "individual" or a "particular." A "simple" symbol is one that has no parts that are symbols. So "Scott" is a simple symbol because, although it has parts (like separate letters), those parts are not symbols. In contrast, "the author of Waverley" is not a simple symbol since the individual words that make up the phrase are parts that are symbols. If it happens that whatever appears to be an "individual" can actually be analyzed further, we’ll have to settle for what we can call "relative individuals," which will be terms that, throughout the relevant context, are never analyzed and only appear as subjects. In that case, we will similarly have to stick with "relative names." From the perspective of our current issue, which is the definition of descriptions, we can overlook whether these are absolute names or just relative names, since it involves different levels in the hierarchy of "types." We need to compare pairs like "Scott" and "the author of Waverley," which both refer to the same object and don’t raise the type problem. Thus, for now, we can consider names as being absolute; nothing we say will rely on this assumption, but it may make the wording slightly more concise.

We have, then, two things to compare: (1) a name, which is a simple symbol, directly designating an individual which is its meaning, and having this meaning in its own right, independently of the meanings of all other words; (2) a description, which consists of several words, whose meanings are already fixed, and from which results whatever is to be taken as the "meaning" of the description.

We have two things to compare: (1) a name, which is a straightforward symbol that directly refers to an individual and has its own meaning, independent of the meanings of other words; (2) a description, which is made up of several words with established meanings, and from which we derive whatever is considered the "meaning" of the description.

A proposition containing a description is not identical with what that proposition becomes when a name is substituted, even if the name names the same object as the description describes. "Scott is the author of Waverley" is obviously a different proposition from "Scott is Scott": the first is a fact in literary history, the second a trivial truism. And if we put anyone other than Scott in place of "the author of Waverley," our proposition would become false, and would therefore certainly no longer be the same proposition. But, it may be said, our proposition is essentially of the same form as (say) "Scott is Sir Walter," in which two names are said to apply to the same person. The reply is that, if "Scott is Sir Walter" really means "the person named 'Scott' is the person named 'Sir Walter,'" then the names are being used as descriptions: i.e. the individual, instead of being named, is being described as the person having that name. This is a way in which names are frequently used [Pg 174] in practice, and there will, as a rule, be nothing in the phraseology to show whether they are being used in this way or as names. When a name is used directly, merely to indicate what we are speaking about, it is no part of the fact asserted, or of the falsehood if our assertion happens to be false: it is merely part of the symbolism by which we express our thought. What we want to express is something which might (for example) be translated into a foreign language; it is something for which the actual words are a vehicle, but of which they are no part. On the other hand, when we make a proposition about "the person called 'Scott,'" the actual name "Scott" enters into what we are asserting, and not merely into the language used in making the assertion. Our proposition will now be a different one if we substitute "the person called 'Sir Walter.'" But so long as we are using names as names, whether we say "Scott" or whether we say "Sir Walter" is as irrelevant to what we are asserting as whether we speak English or French. Thus so long as names are used as names, "Scott is Sir Walter" is the same trivial proposition as "Scott is Scott." This completes the proof that "Scott is the author of Waverley" is not the same proposition as results from substituting a name for "the author of Waverley," no matter what name may be substituted.

A statement that includes a description isn't the same as what it becomes when you replace it with a name, even if the name refers to the same thing the description does. "Scott is the author of Waverley" is clearly a different statement from "Scott is Scott": the first is a historical fact in literature, while the second is just a trivial truth. If we replace "the author of Waverley" with anyone other than Scott, our statement would be false and would definitely no longer be the same statement. However, one might argue that our statement is essentially similar to "Scott is Sir Walter," where two names refer to the same person. The response is that if "Scott is Sir Walter" really means "the person named 'Scott' is the person named 'Sir Walter,'" then the names are being used as descriptions: that is, instead of naming the individual, we are describing them as the person with that name. This is often how names are used in practice, and typically, there’s nothing in the wording to indicate whether they are being used this way or as names. When a name is used directly just to point out what we’re talking about, it’s not a part of the actual fact we’re stating or of the falsehood if our statement happens to be false; it’s just part of the symbols we use to express our thoughts. What we aim to express could (for example) be translated into another language; it’s something that the actual words serve as a vehicle for, but those words aren’t part of it. Conversely, when we make a statement about "the person called 'Scott,'" the actual name "Scott" becomes part of what we are asserting, not just part of the language used to make the assertion. Our statement would be different if we substituted "the person called 'Sir Walter.'" But as long as we are using names as names, whether we say "Scott" or "Sir Walter" is as irrelevant to what we’re asserting as whether we speak English or French. Therefore, as long as names are used as names, "Scott is Sir Walter" is the same trivial statement as "Scott is Scott." This completes the proof that "Scott is the author of Waverley" is not the same statement that results from substituting a name for "the author of Waverley," regardless of what name is substituted.

When we use a variable, and speak of a propositional function, $\phi x$ say, the process of applying general statements about $x$ to particular cases will consist in substituting a name for the letter " $x$ ," assuming that $\phi$ is a function which has individuals for its arguments. Suppose, for example, that $\phi x$ is "always true"; let it be, say, the "law of identity," $x = x$ . Then we may substitute for " $x$ " any name we choose, and we shall obtain a true proposition. Assuming for the moment that "Socrates," "Plato," and "Aristotle" are names (a very rash assumption), we can infer from the law of identity that Socrates is Socrates, Plato is Plato, and Aristotle is Aristotle. But we shall commit a fallacy if we attempt to infer, without further premisses, that the author of Waverley is the author of Waverley. This results [Pg 175] from what we have just proved, that, if we substitute a name for "the author of Waverley" in a proposition, the proposition we obtain is a different one. That is to say, applying the result to our present case: If " $x$ " is a name, " $x = x$ " is not the same proposition as "the author of Waverley is the author of Waverley," no matter what name " $x$ " may be. Thus from the fact that all propositions of the form " $x = x$ " are true we cannot infer, without more ado, that the author of Waverley is the author of Waverley. In fact, propositions of the form "the so-and-so is the so-and-so" are not always true: it is necessary that the so-and-so should exist (a term which will be explained shortly). It is false that the present King of France is the present King of France, or that the round square is the round square. When we substitute a description for a name, propositional functions which are "always true" may become false, if the description describes nothing. There is no mystery in this as soon as we realise (what was proved in the preceding paragraph) that when we substitute a description the result is not a value of the propositional function in question.

When we use a variable and talk about a propositional function, $\phi x$ —let’s say that the process of applying general statements about $x$ to specific cases involves substituting a name for the letter " $\phi$ is a function that has individuals as its arguments. For instance, if $\phi x$ is "always true," let’s say it's the "law of identity," $x = x$ . Then we can substitute any name we want for " $x$ Waverley is the author of Waverley. This follows from what we just proved: if we replace "the author of Waverley" in a statement, the statement we end up with is a different one. In other words, applying this result to our current case: if " $x = x$ Waverley is the author of Waverley," no matter what name " $x = x$ Waverley is the author of Waverley. In fact, statements of the form "the so-and-so is the so-and-so" are not always true: it's necessary for the so-and-so to actually exist (a concept we’ll clarify soon). It is false that the current King of France is the current King of France, or that the round square is the round square. When we replace a description with a name, propositional functions that are "always true" might turn false if the description refers to nothing. This isn’t mysterious as soon as we understand (as proved in the previous paragraph) that when we substitute a description, the result is not a value of the propositional function in question.

We are now in a position to define propositions in which a definite description occurs. The only thing that distinguishes "the so-and-so" from "a so-and-so" is the implication of uniqueness. We cannot speak of "the inhabitant of London," because inhabiting London is an attribute which is not unique. We cannot speak about "the present King of France," because there is none; but we can speak about "the present King of England." Thus propositions about "the so-and-so" always imply the corresponding propositions about "a so-and-so," with the addendum that there is not more than one so-and-so. Such a proposition as "Scott is the author of Waverly" could not be true if Waverly had never been written, or if several people had written it; and no more could any other proposition resulting from a propositional function $x$ by the substitution of "the author of Waverly" for " $x$ ." We may say that "the author of Waverly" means "the value of $x$ for which ' $x$ wrote [Pg 176] Waverly' is true." Thus the proposition "the author of Waverly was Scotch," for example, involves:

We can now define propositions that include a definite description. The main difference between "the so-and-so" and "a so-and-so" is the implication of uniqueness. We can't refer to "the inhabitant of London," because living in London is not unique. We can't talk about "the current King of France," because there isn't one; but we can say "the current King of England." Therefore, propositions about "the so-and-so" always imply the corresponding propositions about "a so-and-so," with the added note that there is only one so-and-so. A proposition like "Scott is the author of Waverly" cannot be true if Waverly was never written or if multiple people wrote it; nor can any other proposition resulting from a propositional function $x$ by replacing "the author of Waverly" for " $x$ ." We can say that "the author of Waverly" means "the value of $x$ for which ' $x$ wrote [Pg 176] Waverly' is true." So, the proposition "the author of Waverly was Scotch," for example, involves:

(1) " $x$ wrote Waverly" is not always false;

" $x$ wrote Waverly" isn't always incorrect;

(2) "if $x$ and $y$ wrote Waverly, $x$ and $y$ are identical" is always true;

(2) "if $x$ and $y$ wrote Waverly, then $x$ and $y$ are the same" is always true;

(3) "if $x$ wrote Waverly, $x$ was Scotch" is always true.

(3) "if $x$ wrote Waverly, $x$ was Scottish" is always true.

These three propositions, translated into ordinary language, state:

These three statements, put into everyday language, say:

(1) at least one person wrote Waverly;

(2) at most one person wrote Waverly;

(3) whoever wrote Waverly was Scotch.

whoever wrote Waverly was Scottish.

All these three are implied by "the author of Waverly was Scotch." Conversely, the three together (but no two of them) imply that the author of Waverly was Scotch. Hence the three together may be taken as defining what is meant by the proposition "the author of Waverly was Scotch."

All three of these are suggested by "the author of Waverly was Scottish." On the other hand, the three together (but not any two of them) suggest that the author of Waverly was Scottish. Therefore, the three together can be considered as defining what is meant by the statement "the author of Waverly was Scottish."

We may somewhat simplify these three propositions. The first and second together are equivalent to: "There is a term $c$ such that ' $x$ wrote Waverly' is true when $x$ is $c$ and is false when $x$ is not $c$ ." In other words, "There is a term $c$ such that ' $x$ wrote Waverly' is always equivalent to ' $x$ is $c$ .'" (Two propositions are "equivalent" when both are true or both are false.) We have here, to begin with, two functions of $x$ , " $x$ wrote Waverly" and " $x$ is $c$ ," and we form a function of $c$ by considering the equivalence of these two functions of $x$ for all values of $x$ ; we then proceed to assert that the resulting function of $c$ is "sometimes true," i.e. that it is true for at least one value of $c$ . (It obviously cannot be true for more than one value of $c$ .) These two conditions together are defined as giving the meaning of "the author of Waverly exists."

We can simplify these three statements a bit. The first and second together mean: "There is a term $c$ such that ' $x$ wrote Waverly' is true when $x$ is $c$ and is false when $x$ is not $c$ ." In other words, "There is a term $c$ such that ' $x$ wrote Waverly' is always equivalent to ' $x$ is $c$ .'" (Two propositions are "equivalent" when both are true or both are false.) We initially have two functions of $x$ , " $x$ wrote Waverly" and " $x$ is $c$ by looking at the equivalence of these two functions of $x$ for all values of $x$ ; we then claim that the resulting function of $c$ is "sometimes true," i.e. that it is true for at least one value of $c$ . (It clearly cannot be true for more than one value of $c$ Waverly exists."

We may now define "the term satisfying the function $\phi x$ exists." This is the general form of which the above is a particular case. "The author of Waverly" is "the term satisfying the function ' $x$ wrote Waverly.'" And "the so-and-so" will [Pg 177] always involve reference to some propositional function, namely, that which defines the property that makes a thing a so-and-so. Our definition is as follows:—

We can now define "the term satisfying the function $\phi x$ exists." This is the general form of which the above is a specific case. "The author of Waverly" is "the term satisfying the function ' $x$ wrote Waverly.'" And "the so-and-so" will [Pg 177] always refer to some propositional function, which is the one that defines the characteristic that makes something a so-and-so. Our definition is as follows:—

"The term satisfying the function $\phi x$ exists" means:

"The phrase 'the term satisfying the function $\phi x$ exists' means:"

"There is a term $c$ such that $\phi x$ is always equivalent to ' $x$ is $c$ .'"

In order to define "the author of Waverly was Scotch," we have still to take account of the third of our three propositions, namely, "Whoever wrote Waverly was Scotch." This will be satisfied by merely adding that the $c$ in question is to be Scotch. Thus "the author of Waverly was Scotch" is:

In order to define "the author of Waverly was Scottish," we still need to consider the third of our three points, which is "Whoever wrote Waverly was Scottish." This will be fulfilled by simply adding that the $c$ in question is to be Scottish. So, "the author of Waverly was Scottish" is:

"There is a term $c$ such that (1) ' $x$ wrote Waverly' is always equivalent to ' $x$ is $c$ ,' (2) $c$ is Scotch."

And generally: "the term satisfying $\phi x$ satisfies $\psi x$ " is defined as meaning:

"There is a term $c$ such that (1) $\phi x$ is always equivalent to ' $x$ is $c$ ,' (2) $\psi c$ is true."

This is the definition of propositions in which descriptions occur.

This is the definition of propositions that include descriptions.

It is possible to have much knowledge concerning a term described, i.e. to know many propositions concerning "the so-and-so," without actually knowing what the so-and-so is, i.e. without knowing any proposition of the form " $x$ is the so-and-so," where " $x$ " is a name. In a detective story propositions about "the man who did the deed" are accumulated, in the hope that ultimately they will suffice to demonstrate that it was $\mathrm A$ who did the deed. We may even go so far as to say that, in all such knowledge as can be expressed in words—with the exception of "this" and "that" and a few other words of which the meaning varies on different occasions—no names, in the strict sense, occur, but what seem like names are really descriptions. We may inquire significantly whether Homer existed, which we could not do if "Homer" were a name. The proposition "the so-and-so exists" is significant, whether true or false; but if $a$ is the so-and-so (where " $a$ " is a name), the words " $a$ exists" are meaningless. It is only of descriptions—definite [Pg 178] or indefinite—that existence can be significantly asserted; for, if " $a$ " is a name, it must name something: what does not name anything is not a name, and therefore, if intended to be a name, is a symbol devoid of meaning, whereas a description, like "the present King of France," does not become incapable of occurring significantly merely on the ground that it describes nothing, the reason being that it is a complex symbol, of which the meaning is derived from that of its constituent symbols. And so, when we ask whether Homer existed, we are using the word "Homer" as an abbreviated description: we may replace it by (say) "the author of the Iliad and the Odyssey." The same considerations apply to almost all uses of what look like proper names.

It’s possible to have a lot of knowledge about a term described, i.e. to know many statements about "the so-and-so," without actually knowing what the so-and-so is, i.e. without knowing any statement of the form " $x$ is the so-and-so," where " $x$ " is a name. In a detective story, statements about "the man who did the deed" pile up, hoping that ultimately they will be enough to prove that it was $\mathrm A$ who did it. We might even say that, in all such knowledge expressible in words—with the exception of "this" and "that" and a few other words whose meanings change in different situations—no names, in the strict sense, exist, but what seem like names are actually descriptions. We can meaningfully ask if Homer existed, which we couldn't do if "Homer" were just a name. The statement "the so-and-so exists" is meaningful, whether it’s true or false; but if $a$ is the so-and-so (where " $a$ " is a name), the words " $a$ exists" are meaningless. Only with descriptions—definite or indefinite—can existence be meaningfully asserted; because if " $a$ " is a name, it must refer to something: what doesn’t refer to anything isn’t a name, and therefore, if intended as a name, is a symbol without meaning, while a description, like "the current King of France," doesn’t lose its significance just because it describes nothing. The reason is that it is a complex symbol, whose meaning comes from its component symbols. So, when we ask if Homer existed, we’re using the word "Homer" as a shortened description: we can replace it with (for example) "the author of the Iliad and the Odyssey." The same points apply to nearly all uses of what look like proper names.

When descriptions occur in propositions, it is necessary to distinguish what may be called "primary" and "secondary" occurrences. The abstract distinction is as follows. A description has a "primary" occurrence when the proposition in which it occurs results from substituting the description for " $x$ " in some propositional function $\phi x$ ; a description has a "secondary" occurrence when the result of substituting the description for $x$ in $\phi x$ gives only part of the proposition concerned. An instance will make this clearer. Consider "the present King of France is bald." Here "the present King of France" has a primary occurrence, and the proposition is false. Every proposition in which a description which describes nothing has a primary occurrence is false. But now consider "the present King of France is not bald." This is ambiguous. If we are first to take " $x$ is bald," then substitute "the present King of France" for " $x$ " and then deny the result, the occurrence of "the present King of France" is secondary and our proposition is true; but if we are to take " $x$ is not bald" and substitute "the present King of France" for " $x$ " then "the present King of France" has a primary occurrence and the proposition is false. Confusion of primary and secondary occurrences is a ready source of fallacies where descriptions are concerned. [Pg 179]

When descriptions appear in propositions, it’s important to differentiate between what can be termed "primary" and "secondary" occurrences. The abstract distinction is as follows: a description has a "primary" occurrence when the proposition in which it appears results from replacing the description for " $x$ in some propositional function $\phi x$ ; a description has a "secondary" occurrence when substituting the description for $x$ in $\phi x$ produces only part of the proposition in question. An example will clarify this. Take the statement "the current King of France is bald." Here, "the current King of France" has a primary occurrence, and the proposition is false. Any proposition in which a description that refers to nothing has a primary occurrence is false. Now consider "the current King of France is not bald." This is ambiguous. If we first take " $x$ is bald," then substitute "the current King of France" for " $x$ is not bald" and substitute "the current King of France" for " $x$ [Pg 179]

Descriptions occur in mathematics chiefly in the form of descriptive functions, i.e. "the term having the relation $\mathrm R$ to $y$ ," or "the $\mathrm R$ of $y$ " as we may say, on the analogy of "the father of $y$ " and similar phrases. To say "the father of $y$ is rich," for example, is to say that the following propositional function of $c$ : " $c$ is rich, and ' $x$ begat $y$ ' is always equivalent to ' $x$ is $c$ ,'" is "sometimes true," i.e. is true for at least one value of $c$ . It obviously cannot be true for more than one value.

Descriptions in mathematics mainly take the form of descriptive functions, meaning "the term relating $\mathrm R$ to $\mathrm R$ of $c$ : " $c$ is rich, and ' $x$ begat $y$

The theory of descriptions, briefly outlined in the present chapter, is of the utmost importance both in logic and in theory of knowledge. But for purposes of mathematics, the more philosophical parts of the theory are not essential, and have therefore been omitted in the above account, which has confined itself to the barest mathematical requisites. [Pg 180]

The theory of descriptions, briefly summarized in this chapter, is extremely important in both logic and knowledge theory. However, for the purposes of mathematics, the more philosophical aspects of the theory are not necessary, so they have been left out of the above explanation, which has focused only on the most essential mathematical requirements. [Pg 180]

CHAPTER XVII

CLASSES

IN the present chapter we shall be concerned with the in the plural: the inhabitants of London, the sons of rich men, and so on. In other words, we shall be concerned with classes. We saw in Chapter II. that a cardinal number is to be defined as a class of classes, and in Chapter III. that the number 1 is to be defined as the class of all unit classes, i.e. of all that have just one member, as we should say but for the vicious circle. Of course, when the number 1 is defined as the class of all unit classes, "unit classes" must be defined so as not to assume that we know what is meant by "one"; in fact, they are defined in a way closely analogous to that used for descriptions, namely: A class $\alpha$ is said to be a "unit" class if the propositional function "' $x$ is an $\alpha$ ' is always equivalent to ' $x$ is $c$ '" (regarded as a function of $c$ ) is not always false, i.e., in more ordinary language, if there is a term $x$ such that $x$ will be a member of $\alpha$ when $x$ is $c$ but not otherwise. This gives us a definition of a unit class if we already know what a class is in general. Hitherto we have, in dealing with arithmetic, treated "class" as a primitive idea. But, for the reasons set forth in Chapter XIII., if for no others, we cannot accept "class" as a primitive idea. We must seek a definition on the same lines as the definition of descriptions, i.e. a definition which will assign a meaning to propositions in whose verbal or symbolic expression words or symbols apparently representing classes occur, but which will assign a meaning that altogether eliminates all mention of classes from a right analysis [Pg 181] of such propositions. We shall then be able to say that the symbols for classes are mere conveniences, not representing objects called "classes," and that classes are in fact, like descriptions, logical fictions, or (as we say) "incomplete symbols."

IN this chapter, we will focus on the in the plural: the people of London, the sons of wealthy families, and so on. In other words, we will focus on classes. We noted in Chapter II that a cardinal number can be defined as a class of classes, and in Chapter III, we established that the number 1 is defined as the class of all unit classes, i.e. all that have exactly one member, as we would state if it weren't for the problematic circular reasoning. Obviously, when defining the number 1 as the class of all unit classes, "unit classes" must be defined in a way that does not presuppose an understanding of "one"; in fact, they are defined similarly to how descriptions are defined, which means: A class $\alpha$ is considered a "unit" class if the propositional function "' $x$ is an $\alpha$ ' is always equivalent to ' $x$ is $c$ '" (considered as a function of $c$ ) is not always false, , in simpler terms, if there is a term $x$ such that $x$ will be a member of $\alpha$ when $x$ is $c$ but not otherwise. This provides us with a definition of a unit class, assuming we already understand what a class is in general. Until now, in addressing arithmetic, we have treated "class" as a fundamental concept. However, for the reasons outlined in Chapter XIII, among others, we cannot accept "class" as a primitive concept. We need to seek a definition along the same lines as the definition of descriptions, a definition that will give meaning to propositions where words or symbols seemingly representing classes appear, yet which will provide a meaning that completely removes any reference to classes from an accurate analysis of such propositions. Thus, we can say that the symbols for classes are just useful tools, not representing entities called "classes," and that classes are, in fact, like descriptions, logical fictions, or as we put it, "incomplete symbols."

The theory of classes is less complete than the theory of descriptions, and there are reasons (which we shall give in outline) for regarding the definition of classes that will be suggested as not finally satisfactory. Some further subtlety appears to be required; but the reasons for regarding the definition which will be offered as being approximately correct and on the right lines are overwhelming.

The theory of classes is not as thorough as the theory of descriptions, and there are reasons (which we will briefly outline) for thinking that the proposed definition of classes is not entirely satisfying. It seems some additional complexity is needed; however, the arguments for considering the definition we will provide as roughly accurate and headed in the right direction are convincing.

The first thing is to realise why classes cannot be regarded as part of the ultimate furniture of the world. It is difficult to explain precisely what one means by this statement, but one consequence which it implies may be used to elucidate its meaning. If we had a complete symbolic language, with a definition for everything definable, and an undefined symbol for everything indefinable, the undefined symbols in this language would represent symbolically what I mean by "the ultimate furniture of the world." I am maintaining that no symbols either for "class" in general or for particular classes would be included in this apparatus of undefined symbols. On the other hand, all the particular things there are in the world would have to have names which would be included among undefined symbols. We might try to avoid this conclusion by the use of descriptions. Take (say) "the last thing Cæsar saw before he died." This is a description of some particular; we might use it as (in one perfectly legitimate sense) a definition of that particular. But if " $a$ " is a name for the same particular, a proposition in which " $a$ " occurs is not (as we saw in the preceding chapter) identical with what this proposition becomes when for " $a$ " we substitute "the last thing Cæsar saw before he died." If our language does not contain the name " $a$ " or some other name for the same particular, we shall have no means of expressing the proposition which we expressed by means of " $a$ " as opposed to the one that [Pg 182] we expressed by means of the description. Thus descriptions would not enable a perfect language to dispense with names for all particulars. In this respect, we are maintaining, classes differ from particulars, and need not be represented by undefined symbols. Our first business is to give the reasons for this opinion.

The first thing to understand is why classes can’t be seen as part of the essential makeup of the world. It’s hard to explain exactly what this means, but one consequence that clarifies it can be used to illustrate the point. If we had a complete symbolic language, with definitions for everything that can be defined, and an undefined symbol for everything that can't, the undefined symbols would represent what I mean by “the essential makeup of the world.” I argue that there would be no symbols for “class” in general or for any specific classes in this set of undefined symbols. On the other hand, everything specific in the world would need to have names that fall under the category of undefined symbols. We might try to get around this conclusion by using descriptions. For example, take “the last thing Cæsar saw before he died.” This is a description of a specific thing; we could use it as a definition of that thing in one valid sense. But if " $a$ " is a name for the same specific thing, a statement that includes " $a$ " is not (as we saw in the previous chapter) the same as what that statement becomes when we replace " $a$ " with "the last thing Cæsar saw before he died." If our language does not include the name " $a$ " or another name for the same specific thing, we won’t be able to express the statement that we expressed using " $a$ " as opposed to the one conveyed by the description. Thus, descriptions wouldn’t allow a perfect language to do without names for all specific things. In this regard, we assert that classes are different from specifics and do not need to be represented by undefined symbols. Our first task is to provide the reasons for this view.

We have already seen that classes cannot be regarded as a species of individuals, on account of the contradiction about classes which are not members of themselves (explained in Chapter XIII.), and because we can prove that the number of classes is greater than the number of individuals.

We have already seen that classes can't be viewed as a type of individual, due to the contradiction of classes that aren't members of themselves (explained in Chapter XIII), and because we can demonstrate that the number of classes is greater than the number of individuals.

We cannot take classes in the pure extensional way as simply heaps or conglomerations. If we were to attempt to do that, we should find it impossible to understand how there can be such a class as the null-class, which has no members at all and cannot be regarded as a "heap"; we should also find it very hard to understand how it comes about that a class which has only one member is not identical with that one member. I do not mean to assert, or to deny, that there are such entities as "heaps." As a mathematical logician, I am not called upon to have an opinion on this point. All that I am maintaining is that, if there are such things as heaps, we cannot identify them with the classes composed of their constituents.

We can’t view classes in a purely extensional way as just random collections. If we tried to do that, we wouldn’t be able to grasp how a class like the null-class, which has no members at all, can exist since it can’t be seen as a "heap." We would also struggle to understand why a class with just one member isn’t the same as that member. I’m not claiming or denying the existence of "heaps." As a mathematical logician, I don’t need to take a stance on that. What I’m saying is that, if heaps do exist, we can’t equate them with the classes made up of their parts.

We shall come much nearer to a satisfactory theory if we try to identify classes with propositional functions. Every class, as we explained in Chapter II., is defined by some propositional function which is true of the members of the class and false of other things. But if a class can be defined by one propositional function, it can equally well be defined by any other which is true whenever the first is true and false whenever the first is false. For this reason the class cannot be identified with any one such propositional function rather than with any other—and given a propositional function, there are always many others which are true when it is true and false when it is false. We say that two propositional functions are "formally equivalent" when this happens. Two propositions are "equivalent" [Pg 183] when both are true or both false; two propositional functions $\phi x$ , $\psi x$ are "formally equivalent" when $\phi x$ is always equivalent to $\psi x$ . It is the fact that there are other functions formally equivalent to a given function that makes it impossible to identify a class with a function; for we wish classes to be such that no two distinct classes have exactly the same members, and therefore two formally equivalent functions will have to determine the same class.

We can get much closer to a solid theory if we try to connect classes with propositional functions. Every class, as we discussed in Chapter II, is defined by some propositional function that is true for its members and false for everything else. However, if a class can be defined by one propositional function, it can also be defined by any other function that is true whenever the first one is true and false whenever the first one is false. Because of this, a class can't be tied to just one propositional function instead of another—and given a propositional function, there are always many others that are true when it is true and false when it is false. We say that two propositional functions are "formally equivalent" when this is the case. Two propositions are "equivalent" when both are true or both are false; two propositional functions $\phi x$ , $\psi x$ are "formally equivalent" when $\phi x$ is always equivalent to $\psi x$ . The existence of other functions that are formally equivalent to a given function is what prevents us from linking a class to a function; we want classes to be such that no two distinct classes share exactly the same members, and therefore two formally equivalent functions will end up defining the same class.

When we have decided that classes cannot be things of the same sort as their members, that they cannot be just heaps or aggregates, and also that they cannot be identified with propositional functions, it becomes very difficult to see what they can be, if they are to be more than symbolic fictions. And if we can find any way of dealing with them as symbolic fictions, we increase the logical security of our position, since we avoid the need of assuming that there are classes without being compelled to make the opposite assumption that there are no classes. We merely abstain from both assumptions. This is an example of Occam's razor, namely, "entities are not to be multiplied without necessity." But when we refuse to assert that there are classes, we must not be supposed to be asserting dogmatically that there are none. We are merely agnostic as regards them: like Laplace, we can say, "je n'ai pas besoin de cette hypothèse."

When we decide that classes can’t be the same as their members, that they can't simply be piles or collections, and that they can’t be equated with propositional functions, it becomes really tough to figure out what they could be if they’re meant to be more than just symbolic fictions. If we can find a way to treat them as symbolic fictions, we strengthen our logical stance because we sidestep the need to assume that classes exist while also avoiding the opposite assumption that they don’t. We just refrain from both assumptions. This illustrates Occam's razor: "entities should not be multiplied beyond necessity." However, when we choose not to claim that classes exist, we shouldn't be thought of as dogmatically claiming that they don’t exist either. We simply take an agnostic view on them: like Laplace, we can say, "je n'ai pas besoin de cette hypothèse."

Let us set forth the conditions that a symbol must fulfil if it is to serve as a class. I think the following conditions will be found necessary and sufficient:—

Let’s outline the criteria that a symbol must meet to function as a class. I believe the following criteria will be necessary and sufficient:—

(1) Every propositional function must determine a class, consisting of those arguments for which the function is true. Given any proposition (true or false), say about Socrates, we can imagine Socrates replaced by Plato or Aristotle or a gorilla or the man in the moon or any other individual in the world. In general, some of these substitutions will give a true proposition and some a false one. The class determined will consist of all those substitutions that give a true one. Of course, we have still to decide what we mean by "all those which, etc." All that [Pg 184] we are observing at present is that a class is rendered determinate by a propositional function, and that every propositional function determines an appropriate class.

(1) Every propositional function defines a group made up of those arguments for which the function is true. Given any proposition (whether true or false), let's say about Socrates, we can picture Socrates being replaced by Plato, Aristotle, a gorilla, the man in the moon, or any other individual in the world. Generally, some of these replacements will result in a true proposition and some will not. The group defined will include all those substitutions that result in a true proposition. Of course, we still need to clarify what we mean by "all those which, etc." What we are currently noting is that a group is made specific by a propositional function, and that every propositional function defines an appropriate group. [Pg 184]

(2) Two formally equivalent propositional functions must determine the same class, and two which are not formally equivalent must determine different classes. That is, a class is determined by its membership, and no two different classes can have the same membership. (If a class is determined by a function $\phi x$ , we say that $a$ is a "member" of the class if $\phi a$ is true.)

(2) Two propositional functions that are formally equivalent must define the same class, while two that are not formally equivalent must define different classes. In other words, a class is defined by its members, and no two different classes can share the same members. (If a class is defined by a function $\phi x$ , we say that $a$ is a "member" of the class if $\phi a$ is true.)

(3) We must find some way of defining not only classes, but classes of classes. We saw in Chapter II. that cardinal numbers are to be defined as classes of classes. The ordinary phrase of elementary mathematics, "The combinations of $n$ things $m$ at a time" represents a class of classes, namely, the class of all classes of $m$ terms that can be selected out of a given class of $n$ terms. Without some symbolic method of dealing with classes of classes, mathematical logic would break down.

(3) We need to find a way to define not just classes but also classes of classes. As we saw in Chapter II, cardinal numbers should be defined as classes of classes. The common phrase in basic mathematics, "The combinations of $n$ things $m$ at a time," refers to a class of classes, specifically, the class of all classes of $m$ terms that can be selected from a given class of $n$ terms. Without some symbolic approach to handle classes of classes, mathematical logic would collapse.

(4) It must under all circumstances be meaningless (not false) to suppose a class a member of itself or not a member of itself. This results from the contradiction which we discussed in Chapter XIII.

(4) It has to be meaningless (not false) to think of a class being a member of itself or not being a member of itself. This comes from the contradiction we talked about in Chapter XIII.

(5) Lastly—and this is the condition which is most difficult of fulfilment,—it must be possible to make propositions about all the classes that are composed of individuals, or about all the classes that are composed of objects of any one logical "type." If this were not the case, many uses of classes would go astray—for example, mathematical induction. In defining the posterity of a given term, we need to be able to say that a member of the posterity belongs to all hereditary classes to which the given term belongs, and this requires the sort of totality that is in question. The reason there is a difficulty about this condition is that it can be proved to be impossible to speak of all the propositional functions that can have arguments of a given type.

(5) Lastly—and this is the hardest condition to meet—it must be possible to make statements about all the groups made up of individuals, or about all the groups made up of objects of any one logical "type." If this weren’t the case, many applications of groups would be misguided—for example, mathematical induction. In defining the descendants of a given term, we need to be able to state that a member of the descendants belongs to all hereditary groups that the given term belongs to, and this requires the kind of totality in question. The reason this condition is problematic is that it can be shown to be impossible to discuss all the propositional functions that can have arguments of a given type.

We will, to begin with, ignore this last condition and the problems which it raises. The first two conditions may be [Pg 185] taken together. They state that there is to be one class, no more and no less, for each group of formally equivalent propositional functions; e.g. the class of men is to be the same as that of featherless bipeds or rational animals or Yahoos or whatever other characteristic may be preferred for defining a human being. Now, when we say that two formally equivalent propositional functions may be not identical, although they define the same class, we may prove the truth of the assertion by pointing out that a statement may be true of the one function and false of the other; e.g. "I believe that all men are mortal" may be true, while "I believe that all rational animals are mortal" may be false, since I may believe falsely that the Phoenix is an immortal rational animal. Thus we are led to consider statements about functions, or (more correctly) functions of functions.

We'll start by ignoring this last condition and the issues it brings up. The first two conditions can be viewed together. They say that there is one class, no more and no less, for each group of formally equivalent propositional functions; for example, the class of men will be the same as that of featherless bipeds, rational animals, Yahoos, or any other characteristic preferred for defining a human being. Now, when we say that two formally equivalent propositional functions might not be identical, even though they define the same class, we can prove this claim by showing that a statement can be true for one function and false for the other; for instance, "I believe that all men are mortal" might be true, while "I believe that all rational animals are mortal" could be false, since I might wrongly believe that the Phoenix is an immortal rational animal. Thus, we are led to consider statements about functions, or (more accurately) functions of functions.

Some of the things that may be said about a function may be regarded as said about the class defined by the function, whereas others cannot. The statement "all men are mortal" involves the functions " $x$ is human" and " $x$ is mortal"; or, if we choose, we can say that it involves the classes men and mortals. We can interpret the statement in either way, because its truth-value is unchanged if we substitute for " $x$ is human" or for " $x$ is mortal" any formally equivalent function. But, as we have just seen, the statement "I believe that all men are mortal" cannot be regarded as being about the class determined by either function, because its truth-value may be changed by the substitution of a formally equivalent function (which leaves the class unchanged). We will call a statement involving a function $\phi x$ an "extensional" function of the function $\phi x$ , if it is like "all men are mortal," i.e. if its truth-value is unchanged by the substitution of any formally equivalent function; and when a function of a function is not extensional, we will call it "intensional," so that "I believe that all men are mortal" is an intensional function of " $x$ is human" or " $x$ is mortal." Thus extensional functions of a function $x$ may, for practical [Pg 186] purposes, be regarded as functions of the class determined by $x$ , while intensional functions cannot be so regarded.

Some things that can be said about a function can also be seen as about the class defined by that function, while others cannot. The statement "all men are mortal" involves the functions " $x$ is human" and " $x$ is mortal"; or, if we prefer, it can involve the classes men and mortals. We can understand the statement in either way, because its truth doesn't change if we substitute " $x$ is human" or " $x$ is mortal" with any formally equivalent function. However, as we've just seen, the statement "I believe that all men are mortal" cannot be viewed as referring to the class determined by either function because its truth-value can change when substituting in a formally equivalent function (which leaves the class unchanged). We will call a statement involving a function $\phi x$ an "extensional" function of the function $\phi x$ , if it is like "all men are mortal," i.e. if its truth-value remains the same when substituting any formally equivalent function; and when a function of a function is not extensional, we will refer to it as "intensional," so that "I believe that all men are mortal" is an intensional function of " $x$ is human" or " $x$ is mortal." Thus, extensional functions of a function $x$ can, for practical purposes, be viewed as functions of the class determined by $x$ , while intensional functions cannot be regarded that way.

It is to be observed that all the specific functions of functions that we have occasion to introduce in mathematical logic are extensional. Thus, for example, the two fundamental functions of functions are: " $\phi x$ is always true" and " $\phi x$ is sometimes true." Each of these has its truth-value unchanged if any formally equivalent function is substituted for $\phi x$ . In the language of classes, if $\alpha$ is the class determined by $\phi x$ , " $\phi x$ is always true" is equivalent to "everything is a member of $\alpha$ ," and " $\phi x$ is sometimes true" is equivalent to " $\alpha$ has members" or (better) " $\alpha$ has at least one member." Take, again, the condition, dealt with in the preceding chapter, for the existence of "the term satisfying $\phi x$ ." The condition is that there is a term $c$ such that $\phi x$ is always equivalent to " $x$ is $c$ ." This is obviously extensional. It is equivalent to the assertion that the class defined by the function $\phi x$ is a unit class, i.e. a class having one member; in other words, a class which is a member of 1.

It should be noted that all the specific functions we introduce in mathematical logic are extensional. For instance, the two fundamental functions of functions are: " $\phi x$ is always true" and " $\phi x$ is sometimes true." The truth value of each of these remains unchanged if any formally equivalent function replaces $\phi x$ . In the context of classes, if $\alpha$ is the class determined by $\phi x$ , then " $\phi x$ is always true" is equivalent to "everything is a member of $\phi x$ is sometimes true" is equivalent to " $\alpha$ has members" or (better) " $\alpha$ has at least one member." Consider again the condition discussed in the previous chapter for the existence of "the term satisfying $\phi x$

Given a function of a function which may or may not be extensional, we can always derive from it a connected and certainly extensional function of the same function, by the following plan: Let our original function of a function be one which attributes to $\phi x$ the property $f$ ; then consider the assertion "there is a function having the property $f$ and formally equivalent to $\phi x$ ." This is an extensional function of $\phi x$ ; it is true when our original statement is true, and it is formally equivalent to the original function of $\phi x$ if this original function is extensional; but when the original function is intensional, the new one is more often true than the old one. For example, consider again "I believe that all men are mortal," regarded as a function of " $x$ is human." The derived extensional function is: "There is a function formally equivalent to ' $x$ is human' and such that I believe that whatever satisfies it is mortal." This remains true when we substitute " $x$ is a rational animal" [Pg 187] for " $x$ is human," even if I believe falsely that the Phoenix is rational and immortal.

Given a function of a function that might or might not be extensional, we can always create a connected and definitely extensional function of the same original function using the following approach: Let our original function of a function be one that assigns the property $\phi x$ the property $f$ ; then consider the claim "there is a function that has the property $f$ and is formally equivalent to $\phi x$ ; it holds true when our original statement is true, and is formally equivalent to the original function of $\phi x$ if the original function is extensional; but when the original function is intensional, the new one is usually more accurate than the old one. For example, consider again the statement "I believe that all men are mortal," viewed as a function of " $x$ is human." The derived extensional function is: "There is a function formally equivalent to ' $x$ is human' and such that I believe that whatever satisfies it is mortal." This remains true when we replace " $x$ is a rational animal" [Pg 187] for " $x$ is human," even if I mistakenly believe that the Phoenix is rational and immortal.

We give the name of "derived extensional function" to the function constructed as above, namely, to the function: "There is a function having the property $f$ and formally equivalent to $\phi x$ ," where the original function was "the function $\phi x$ has the property $f$ ."

We call the function created as described above a "derived extensional function." This function states: "There is a function that has the property $f$ and is formally equivalent to $\phi x$ ," where the original function was "the function $\phi x$ has the property $f$ ."

We may regard the derived extensional function as having for its argument the class determined by the function $\phi x$ , and as asserting $f$ of this class. This may be taken as the definition of a proposition about a class. I.e. we may define:

We can think of the derived extensional function as having the class identified by the function $\phi x$ as its argument, and it states $f$ for this class. This can be considered the definition of a proposition regarding a class. That is, we can define:

To assert that "the class determined by the function $\phi x$ has the property $f$ " is to assert that $\phi x$ satisfies the extensional function derived from $f$ .

To say that "the class defined by the function $\phi x$ has the property $\phi x$ meets the extensional function derived from $f$ .

This gives a meaning to any statement about a class which can be made significantly about a function; and it will be found that technically it yields the results which are required in order to make a theory symbolically satisfactory.[41]

This provides meaning to any statement about a class that can be clearly made about a function; and it will be found that technically it produces the results necessary to make a theory symbolically satisfactory.[41]

[41]See Principia Mathematica, vol. I. pp. 75-84 and * 20.

[41]See Principia Mathematica, vol. I, pp. 75-84 and * 20.

What we have said just now as regards the definition of classes is sufficient to satisfy our first four conditions. The way in which it secures the third and fourth, namely, the possibility of classes of classes, and the impossibility of a class being or not being a member of itself, is somewhat technical; it is explained in Principia Mathematica, but may be taken for granted here. It results that, but for our fifth condition, we might regard our task as completed. But this condition—at once the most important and the most difficult—is not fulfilled in virtue of anything we have said as yet. The difficulty is connected with the theory of types, and must be briefly discussed.[42]

What we've just discussed about the definition of classes is enough to meet our first four conditions. The way it addresses the third and fourth conditions—specifically, the possibility of classes of classes and the impossibility of a class being a member of itself—is a bit technical; it's detailed in Principia Mathematica, but we can assume it here. This means that, aside from our fifth condition, we could consider our task finished. However, this condition—both the most important and the most challenging—is not met based on what we've covered so far. The difficulty relates to the theory of types, and we need to discuss it briefly. [42]

[42]The reader who desires a fuller discussion should consult Principia Mathematica, Introduction, chap. II.; also * 12.

[42]Readers who want a more detailed discussion should check out Principia Mathematica, Introduction, chapter II; also * 12.

We saw in Chapter XIII. that there is a hierarchy of logical types, and that it is a fallacy to allow an object belonging to one of these to be substituted for an object belonging to another. [Pg 188] Now it is not difficult to show that the various functions which can take a given object $a$ as argument are not all of one type. Let us call them all $a$ -functions. We may take first those among them which do not involve reference to any collection of functions; these we will call "predicative $a$ -functions." If we now proceed to functions involving reference to the totality of predicative $a$ -functions, we shall incur a fallacy if we regard these as of the same type as the predicative $a$ -functions. Take such an everyday statement as " $a$ is a typical Frenchman." How shall we define a "typical" Frenchman? We may define him as one "possessing all qualities that are possessed by most French men." But unless we confine "all qualities" to such as do not involve a reference to any totality of qualities, we shall have to observe that most Frenchmen are not typical in the above sense, and therefore the definition shows that to be not typical is essential to a typical Frenchman. This is not a logical contradiction, since there is no reason why there should be any typical Frenchmen; but it illustrates the need for separating off qualities that involve reference to a totality of qualities from those that do not.

We saw in Chapter XIII that there’s a hierarchy of logical types, and it’s a mistake to substitute one object from this hierarchy for an object from another level. [Pg 188] Now, it's easy to show that the different functions accepting a given object $a$ as an argument are not all of the same type. Let’s call them all $a$ -functions. First, let’s consider those that don’t reference any collection of functions; we’ll refer to these as “predicative $a$ -functions.” If we then move on to functions that refer to all the predicative $a$ -functions, we’ll fall into a fallacy if we treat these as the same type as the predicative $a$ -functions. Take a common statement like “ $a$ is a typical Frenchman.” How do we define a “typical” Frenchman? We might say he’s someone “who has all the qualities that most Frenchmen possess.” But unless we limit “all qualities” to those that don’t reference any totality of qualities, we’ll find that most Frenchmen are not typical in that sense, which means the definition implies that not being typical is key to being a typical Frenchman. This isn't a logical contradiction since there's no reason why there should be typical Frenchmen, but it highlights the need to differentiate between qualities that reference a totality of qualities and those that don’t.

Whenever, by statements about "all" or "some" of the values that a variable can significantly take, we generate a new object, this new object must not be among the values which our previous variable could take, since, if it were, the totality of values over which the variable could range would only be definable in terms of itself, and we should be involved in a vicious circle. For example, if I say "Napoleon had all the qualities that make a great general," I must define "qualities" in such a way that it will not include what I am now saying, i.e. "having all the qualities that make a great general" must not be itself a quality in the sense supposed. This is fairly obvious, and is the principle which leads to the theory of types by which vicious-circle paradoxes are avoided. As applied to $a$ -functions, we may suppose that "qualities" is to mean "predicative functions." Then when I say "Napoleon had all the qualities, etc.," I mean [Pg 189] "Napoleon satisfied all the predicative functions, etc." This statement attributes a property to Napoleon, but not a predicative property; thus we escape the vicious circle. But wherever "all functions which" occurs, the functions in question must be limited to one type if a vicious circle is to be avoided; and, as Napoleon and the typical Frenchman have shown, the type is not rendered determinate by that of the argument. It would require a much fuller discussion to set forth this point fully, but what has been said may suffice to make it clear that the functions which can take a given argument are of an infinite series of types. We could, by various technical devices, construct a variable which would run through the first $n$ of these types, where $n$ is finite, but we cannot construct a variable which will run through them all, and, if we could, that mere fact would at once generate a new type of function with the same arguments, and would set the whole process going again.

Whenever we create a new object by discussing "all" or "some" of the values that a variable can take, this new object cannot be one of the values that our original variable could take. If it were, the full range of values for the variable would only be definable in terms of itself, leading to a circular reasoning problem. For example, if I say, "Napoleon had all the qualities that make a great general," I need to define "qualities" in such a way that it doesn't include what I'm currently saying—meaning "having all the qualities that make a great general" cannot itself be one of those qualities. This is fairly straightforward and is the principle that leads to the theory of types, which helps avoid circular paradoxes. When applied to $a$ -functions, we can understand "qualities" to mean "predicative functions." So when I say "Napoleon had all the qualities, etc.," I mean "Napoleon satisfied all the predicative functions, etc." This statement attributes a property to Napoleon, but not in a predicative sense; thus, we avoid the circular reasoning. However, whenever "all functions which" comes up, the functions being discussed must be restricted to one type to avoid circularity; and, as shown by Napoleon and the typical Frenchman, the type is not determined by that of the argument. It would take a longer discussion to explain this fully, but what has been said should clarify that the functions applicable to a given argument belong to an infinite series of types. We could, using various technical methods, create a variable that runs through the first $n$ of these types, where $n$ is finite, but we cannot create a variable that runs through them all. If we could, that alone would create a new type of function with the same arguments and restart the whole process.

We call predicative $a$ -functions the first type of $a$ -functions; $a$ -functions involving reference to the totality of the first type we call the second type; and so on. No variable $a$ -function can run through all these different types: it must stop short at some definite one.

We refer to predicative $a$ -functions as the first type of $a$ -functions; $a$ -functions that reference the totality of the first type are called the second type; and so on. No variable $a$ -function can go through all these different types: it must stop at a specific one.

These considerations are relevant to our definition of the derived extensional function. We there spoke of "a function formally equivalent to $\phi x$ ." It is necessary to decide upon the type of our function. Any decision will do, but some decision is unavoidable. Let us call the supposed formally equivalent function $\psi$ . Then $\psi$ appears as a variable, and must be of some determinate type. All that we know necessarily about the type of $\phi$ is that it takes arguments of a given type—that it is (say) an $a$ -function. But this, as we have just seen, does not determine its type. If we are to be able (as our fifth requisite demands) to deal with all classes whose members are of the same type as $a$ , we must be able to define all such classes by means of functions of some one type; that is to say, there must be some type of $a$ -function, say the $n^\mathord{th}$ , such that any $a$ -function is formally [Pg 190] equivalent to some $a$ -function of the $n^\mathord{th}$ type. If this is the case, then any extensional function which holds of all $a$ -functions of the $n^\mathord{th}$ type will hold of any $a$ -function whatever. It is chiefly as a technical means of embodying an assumption leading to this result that classes are useful. The assumption is called the "axiom of reducibility," and may be stated as follows:—

These considerations are important for defining the derived extensional function. We previously mentioned "a function that is formally equivalent to $\phi x$ ." It's essential to determine the type of our function. Any choice is acceptable, but a decision is necessary. Let's refer to the supposed formally equivalent function as $\psi$ . Then $\psi$ appears as a variable and must have a specific type. What we know for sure about the type of $\phi$ is that it takes arguments of a certain type—that it is (for example) an $a$ -function. However, as we've just noted, this does not define its type. If we are to fulfill our fifth requirement of being able to deal with all classes whose members are of the same type as $a$ , we must define all such classes using functions of a single type; in other words, there needs to be a specific type of $a$ -function, let’s say the $n^\mathord{th}$ type, such that any $a$ -function is formally equivalent to some $a$ -function of the $n^\mathord{th}$ type. If this is true, then any extensional function that applies to all $a$ -functions of the $n^\mathord{th}$ type will apply to any $a$ -function in general. Classes are primarily useful as a technical way to support an assumption leading to this conclusion. This assumption is known as the "axiom of reducibility," and can be stated as follows:—

"There is a type ( $\tau$ say) of $a$ -functions such that, given any $a$ -function, it is formally equivalent to some function of the type in question."

"There’s a type ( $\tau$ say) of $a$ -functions such that, for any $a$ -function, it’s formally equivalent to some function of the type in question."

If this axiom is assumed, we use functions of this type in defining our associated extensional function. Statements about all $a$ -classes (i.e. all classes defined by $a$ -functions) can be reduced to statements about all $a$ -functions of the type $\tau$ . So long as only extensional functions of functions are involved, this gives us in practice results which would otherwise have required the impossible notion of "all $a$ -functions." One particular region where this is vital is mathematical induction.

If we accept this principle, we use this type of function to define our related extensional function. Statements about all $a$ -classes (i.e. all classes defined by $a$ -functions) can be simplified to statements about all $a$ -functions of the type $\tau$ . As long as we are only dealing with extensional functions of functions, this practically provides results that would otherwise require the impossible concept of "all $a$ -functions." A key area where this is crucial is mathematical induction.

The axiom of reducibility involves all that is really essential in the theory of classes. It is therefore worth while to ask whether there is any reason to suppose it true.

The axiom of reducibility includes everything that is truly important in the theory of classes. Therefore, it's worth asking if there's any reason to believe it is true.

This axiom, like the multiplicative axiom and the axiom of infinity, is necessary for certain results, but not for the bare existence of deductive reasoning. The theory of deduction, as explained in Chapter XIV., and the laws for propositions involving "all" and "some," are of the very texture of mathematical reasoning: without them, or something like them, we should not merely not obtain the same results, but we should not obtain any results at all. We cannot use them as hypotheses, and deduce hypothetical consequences, for they are rules of deduction as well as premisses. They must be absolutely true, or else what we deduce according to them does not even follow from the premisses. On the other hand, the axiom of reducibility, like our two previous mathematical axioms, could perfectly well be stated as an hypothesis whenever it is used, instead of being assumed to be actually true. We can deduce [Pg 191] its consequences hypothetically; we can also deduce the consequences of supposing it false. It is therefore only convenient, not necessary. And in view of the complication of the theory of types, and of the uncertainty of all except its most general principles, it is impossible as yet to say whether there may not be some way of dispensing with the axiom of reducibility altogether. However, assuming the correctness of the theory outlined above, what can we say as to the truth or falsehood of the axiom?

This axiom, similar to the multiplicative axiom and the axiom of infinity, is essential for certain outcomes but not for the mere existence of deductive reasoning. The theory of deduction, as explained in Chapter XIV, and the laws pertaining to propositions involving "all" and "some," are fundamental to mathematical reasoning: without these or something similar, we wouldn't just miss similar results; we'd fail to get any results at all. We can't treat them as hypotheses to deduce hypothetical outcomes, as they serve as both rules of deduction and premises. They must be absolutely true, or else what we deduce based on them doesn't even logically follow from the premises. Conversely, the axiom of reducibility, like our two previous mathematical axioms, can easily be framed as a hypothesis whenever it is utilized, rather than being assumed to be actually true. We can deduce its consequences hypothetically, and we can also deduce what follows if we assume it's false. Therefore, it is merely convenient, not essential. Given the complexity of the theory of types and the uncertainty surrounding everything but its most general principles, it’s currently impossible to determine whether there might be a way to completely eliminate the axiom of reducibility. However, assuming the theory outlined above is correct, what can we say about the truth or falsehood of the axiom?

The axiom, we may observe, is a generalised form of Leibniz's identity of indiscernibles. Leibniz assumed, as a logical principle, that two different subjects must differ as to predicates. Now predicates are only some among what we called "predicative functions," which will include also relations to given terms, and various properties not to be reckoned as predicates. Thus Leibniz's assumption is a much stricter and narrower one than ours. (Not, of course, according to his logic, which regarded all propositions as reducible to the subject-predicate form.) But there is no good reason for believing his form, so far as I can see. There might quite well, as a matter of abstract logical possibility, be two things which had exactly the same predicates, in the narrow sense in which we have been using the word "predicate." How does our axiom look when we pass beyond predicates in this narrow sense? In the actual world there seems no way of doubting its empirical truth as regards particulars, owing to spatio-temporal differentiation: no two particulars have exactly the same spatial and temporal relations to all other particulars. But this is, as it were, an accident, a fact about the world in which we happen to find ourselves. Pure logic, and pure mathematics (which is the same thing), aims at being true, in Leibnizian phraseology, in all possible worlds, not only in this higgledy-piggledy job-lot of a world in which chance has imprisoned us. There is a certain lordliness which the logician should preserve: he must not condescend to derive arguments from the things he sees about him. [Pg 192]

The axiom, as we can see, is a general version of Leibniz's identity of indiscernibles. Leibniz believed, as a logical principle, that two different entities must differ in their predicates. However, predicates are just one type of what we call "predicative functions," which also includes relations to specific terms and various properties that aren't considered predicates. So, Leibniz's assumption is much stricter and narrower than ours. (This is not, of course, according to his logic, which viewed all propositions as reducible to the subject-predicate form.) But I see no good reason to endorse his form. It's entirely possible, in abstract logical terms, for two things to have exactly the same predicates, in the specific sense we've been using the word "predicate." How does our axiom hold up when we look beyond predicates in this narrow sense? In the real world, it seems undeniable that its empirical truth holds for particulars, due to spatio-temporal differentiation: no two particulars have exactly the same spatial and temporal relations to all other particulars. Yet, this is somewhat of an accident, a fact about the world we're in. Pure logic, and pure mathematics (which is essentially the same), aims to be true, using Leibnizian terminology, in all possible worlds, not just in this chaotic jumble of a world where chance has trapped us. There’s a certain dignity that the logician should maintain: they shouldn’t stoop to deriving arguments from the things they observe around them. [Pg 192]

Viewed from this strictly logical point of view, I do not see any reason to believe that the axiom of reducibility is logically necessary, which is what would be meant by saying that it is true in all possible worlds. The admission of this axiom into a system of logic is therefore a defect, even if the axiom is empirically true. It is for this reason that the theory of classes cannot be regarded as being as complete as the theory of descriptions. There is need of further work on the theory of types, in the hope of arriving at a doctrine of classes which does not require such a dubious assumption. But it is reasonable to regard the theory outlined in the present chapter as right in its main lines, i.e. in its reduction of propositions nominally about classes to propositions about their defining functions. The avoidance of classes as entities by this method must, it would seem, be sound in principle, however the detail may still require adjustment. It is because this seems indubitable that we have included the theory of classes, in spite of our desire to exclude, as far as possible, whatever seemed open to serious doubt.

From this strictly logical perspective, I don't see any reason to think that the axiom of reducibility is logically necessary, which would imply it's true in all possible worlds. Allowing this axiom into a logical system is therefore a flaw, even if the axiom is empirically valid. For this reason, the theory of classes can't be seen as complete as the theory of descriptions. More work is needed on the theory of types, with the hope of developing a class doctrine that doesn't rely on such a questionable assumption. However, it's reasonable to consider the theory described in this chapter as correct in its main ideas, i.e., in how it reduces propositions nominally about classes to propositions about their defining functions. This method's avoidance of classes as entities appears to be fundamentally sound, even if the specifics still need refinement. Because this seems undeniable, we have included the theory of classes, despite our intention to exclude anything that seemed seriously questionable.

The theory of classes, as above outlined, reduces itself to one axiom and one definition. For the sake of definiteness, we will here repeat them. The axiom is:

The theory of classes, as described above, boils down to one axiom and one definition. To be clear, we will repeat them here. The axiom is:

There is a type $\tau$ such that if $\phi$ is a function which can take a given object $a$ as argument, then there is a function $\psi$ of the type $\tau$ which is formally equivalent to $\phi$ .

There is a type $\tau$ such that if $\phi$ is a function that can take a given object $a$ as an argument, then there is a function $\psi$ of the type $\tau$ that is formally equivalent to $\phi$ .

The definition is:

If $\phi$ is a function which can take a given object $a$ as argument, and $\tau$ the type mentioned in the above axiom, then to say that the class determined by $\phi$ has the property $f$ is to say that there is a function of type $\tau$ , formally equivalent to $\phi$ , and having the property $f$ . [Pg 193]

If $\phi$ is a function that can take a specific object $a$ as an input, and $\tau$ is the type mentioned in the previous axiom, then stating that the class defined by $\phi$ has the property $f$ means there is a function of type $\tau$ , formally equivalent to $\phi$ , that has the property $f$ . [Pg 193]

CHAPTER XVIII

MATHEMATICS AND LOGIC

MATHEMATICS and logic, historically speaking, have been entirely distinct studies. Mathematics has been connected with science, logic with Greek. But both have developed in modern times: logic has become more mathematical and mathematics has become more logical. The consequence is that it has now become wholly impossible to draw a line between the two; in fact, the two are one. They differ as boy and man: logic is the youth of mathematics and mathematics is the manhood of logic. This view is resented by logicians who, having spent their time in the study of classical texts, are incapable of following a piece of symbolic reasoning, and by mathematicians who have learnt a technique without troubling to inquire into its meaning or justification. Both types are now fortunately growing rarer. So much of modern mathematical work is obviously on the border-line of logic, so much of modern logic is symbolic and formal, that the very close relationship of logic and mathematics has become obvious to every instructed student. The proof of their identity is, of course, a matter of detail: starting with premisses which would be universally admitted to belong to logic, and arriving by deduction at results which as obviously belong to mathematics, we find that there is no point at which a sharp line can be drawn, with logic to the left and mathematics to the right. If there are still those who do not admit the identity of logic and mathematics, we may challenge them to indicate at what point, in the successive definitions and [Pg 194] deductions of Principia Mathematica, they consider that logic ends and mathematics begins. It will then be obvious that any answer must be quite arbitrary.

MATHEMATICS and logic have historically been completely separate fields. Mathematics has been associated with science, while logic has been linked to Greek philosophy. However, both have evolved in modern times: logic has become more mathematical, and mathematics has taken on more logical aspects. As a result, it’s now practically impossible to draw a clear distinction between the two; in fact, they are essentially one. They are different like a boy and a man: logic is the youth of mathematics, and mathematics is the adulthood of logic. This perspective is often resisted by logicians who, having focused on classical texts, struggle to follow symbolic reasoning, and by mathematicians who have mastered techniques without considering their meaning or justification. Fortunately, both types of thinkers are becoming rarer. Much of today’s mathematical work lies on the edge of logic, and much of today’s logic is symbolic and formal, making the close relationship between logic and mathematics clear to any knowledgeable student. The proof of their identity is a matter of detail: if we start with universally accepted premises from logic and deduce results that clearly belong to mathematics, we see there isn’t a point where a strict line can be drawn, with logic on one side and mathematics on the other. For those who still deny the identity of logic and mathematics, we challenge them to pinpoint where they believe logic ends and mathematics begins in the successive definitions and deductions of Principia Mathematica. It will then become evident that any response must be entirely arbitrary.

In the earlier chapters of this book, starting from the natural numbers, we have first defined "cardinal number" and shown how to generalise the conception of number, and have then analysed the conceptions involved in the definition, until we found ourselves dealing with the fundamentals of logic. In a synthetic, deductive treatment these fundamentals come first, and the natural numbers are only reached after a long journey. Such treatment, though formally more correct than that which we have adopted, is more difficult for the reader, because the ultimate logical concepts and propositions with which it starts are remote and unfamiliar as compared with the natural numbers. Also they represent the present frontier of knowledge, beyond which is the still unknown; and the dominion of knowledge over them is not as yet very secure.

In the earlier chapters of this book, starting with the natural numbers, we first defined "cardinal number" and demonstrated how to expand the idea of number. We then examined the concepts involved in the definition until we found ourselves addressing the basics of logic. In a synthetic, deductive approach, these basics come first, and the natural numbers are only reached after a long journey. While this approach is formally more correct than what we've chosen, it is harder for the reader because the ultimate logical concepts and propositions it begins with are distant and unfamiliar compared to the natural numbers. They also represent the current edge of knowledge, beyond which lies the unknown, and our understanding of them isn't yet very secure.

It used to be said that mathematics is the science of "quantity." "Quantity" is a vague word, but for the sake of argument we may replace it by the word "number." The statement that mathematics is the science of number would be untrue in two different ways. On the one hand, there are recognised branches of mathematics which have nothing to do with number—all geometry that does not use co-ordinates or measurement, for example: projective and descriptive geometry, down to the point at which co-ordinates are introduced, does not have to do with number, or even with quantity in the sense of greater and less. On the other hand, through the definition of cardinals, through the theory of induction and ancestral relations, through the general theory of series, and through the definitions of the arithmetical operations, it has become possible to generalise much that used to be proved only in connection with numbers. The result is that what was formerly the single study of Arithmetic has now become divided into numbers of separate studies, no one of which is specially concerned with numbers. The most [Pg 195] elementary properties of numbers are concerned with one-one relations, and similarity between classes. Addition is concerned with the construction of mutually exclusive classes respectively similar to a set of classes which are not known to be mutually exclusive. Multiplication is merged in the theory of "selections," i.e. of a certain kind of one-many relations. Finitude is merged in the general study of ancestral relations, which yields the whole theory of mathematical induction. The ordinal properties of the various kinds of number-series, and the elements of the theory of continuity of functions and the limits of functions, can be generalised so as no longer to involve any essential reference to numbers. It is a principle, in all formal reasoning, to generalise to the utmost, since we thereby secure that a given process of deduction shall have more widely applicable results; we are, therefore, in thus generalising the reasoning of arithmetic, merely following a precept which is universally admitted in mathematics. And in thus generalising we have, in effect, created a set of new deductive systems, in which traditional arithmetic is at once dissolved and enlarged; but whether any one of these new deductive systems—for example, the theory of selections—is to be said to belong to logic or to arithmetic is entirely arbitrary, and incapable of being decided rationally.

It used to be said that mathematics is the science of "quantity." "Quantity" is a vague term, but for the sake of discussion, we can replace it with the word "number." The claim that mathematics is the study of numbers is inaccurate in two ways. First, there are established branches of mathematics that have nothing to do with numbers—like all of geometry that doesn’t involve coordinates or measurements. For example, projective and descriptive geometry, until coordinates are introduced, doesn’t deal with numbers or even with quantity in the sense of greater and less. Second, thanks to the definition of cardinals, the theory of induction and ancestral relations, the general theory of series, and the definitions of arithmetic operations, it has become possible to generalize many concepts that were previously only proved in relation to numbers. As a result, what used to be the sole study of Arithmetic has now split into several distinct fields, none of which focus specifically on numbers. The most basic properties of numbers relate to one-to-one relationships and similarities between classes. Addition involves creating mutually exclusive classes that are similar to a set of classes that are not known to be mutually exclusive. Multiplication is embedded in the theory of "selections," which deals with a specific kind of one-to-many relationships. Finitude is integrated into the broader study of ancestral relations, leading to the entire theory of mathematical induction. The ordinal properties of various kinds of number series, along with elements of the theory of function continuity and function limits, can be generalized so they no longer require any essential reference to numbers. It’s a principle in all formal reasoning to generalize as much as possible, since this ensures that a given deductive process will yield results that are more broadly applicable. Therefore, by generalizing the reasoning of arithmetic, we are simply following a guideline that is widely accepted in mathematics. In doing so, we have effectively created a set of new deductive systems, in which traditional arithmetic is both dissolved and expanded. However, whether any of these new deductive systems—like the theory of selections—belongs to logic or arithmetic is entirely arbitrary and cannot be rationally determined.

We are thus brought face to face with the question: What is this subject, which may be called indifferently either mathematics or logic? Is there any way in which we can define it?

We are therefore confronted with the question: What is this subject, which can be referred to as either mathematics or logic? Is there any way we can define it?

Certain characteristics of the subject are clear. To begin with, we do not, in this subject, deal with particular things or particular properties: we deal formally with what can be said about any thing or any property. We are prepared to say that one and one are two, but not that Socrates and Plato are two, because, in our capacity of logicians or pure mathematicians, we have never heard of Socrates and Plato. A world in which there were no such individuals would still be a world in which one and one are two. It is not open to us, as pure mathematicians or logicians, to mention anything at all, because, if we do so, [Pg 196] we introduce something irrelevant and not formal. We may make this clear by applying it to the case of the syllogism. Traditional logic says: "All men are mortal, Socrates is a man, therefore Socrates is mortal." Now it is clear that what we mean to assert, to begin with, is only that the premisses imply the conclusion, not that premisses and conclusion are actually true; even the most traditional logic points out that the actual truth of the premisses is irrelevant to logic. Thus the first change to be made in the above traditional syllogism is to state it in the form: "If all men are mortal and Socrates is a man, then Socrates is mortal." We may now observe that it is intended to convey that this argument is valid in virtue of its form, not in virtue of the particular terms occurring in it. If we had omitted "Socrates is a man" from our premisses, we should have had a non-formal argument, only admissible because Socrates is in fact a man; in that case we could not have generalised the argument. But when, as above, the argument is formal, nothing depends upon the terms that occur in it. Thus we may substitute $\alpha$ for men, $\beta$ for mortals, and $x$ for Socrates, where $\alpha$ and $\beta$ are any classes whatever, and $x$ is any individual. We then arrive at the statement: "No matter what possible values $x$ and $\alpha$ and $\beta$ may have, if all $\alpha$ 's are $\beta$ 's and $x$ is an $\alpha$ , then $x$ is a $\beta$ "; in other words, "the propositional function 'if all $\alpha$ 's are $\beta$ and $x$ is an $\alpha$ , then $x$ is a $\beta$ ' is always true." Here at last we have a proposition of logic—the one which is only suggested by the traditional statement about Socrates and men and mortals.

Certain characteristics of the subject are clear. To start, we don't focus on specific things or specific properties: we formally address what can be said about any thing or any property. We can state that one and one make two, but we wouldn’t say that Socrates and Plato make two, because, as logicians or pure mathematicians, we have no knowledge of Socrates and Plato. A world without such individuals would still be a world where one and one make two. As pure mathematicians or logicians, we can't mention anything at all, because doing so would introduce something irrelevant and not formal. We can clarify this by applying it to syllogisms. Traditional logic says: "All men are mortal, Socrates is a man, therefore Socrates is mortal." It's clear that what we actually intend to assert is only that the premises imply the conclusion, not that the premises and conclusion are actually true; even traditional logic points out that the actual truth of the premises is irrelevant to logic. Therefore, the first change made to the above traditional syllogism is to express it in the form: "If all men are mortal and Socrates is a man, then Socrates is mortal." We can now observe that it's meant to indicate that this argument is valid due to its form, not because of the specific terms used. If we omitted "Socrates is a man" from our premises, we would end up with a non-formal argument that’s only valid because Socrates is indeed a man; in that case, we wouldn't be able to generalize the argument. But when, as stated above, the argument is formal, nothing depends on the terms involved. Thus we can substitute $\alpha$ for men, $\beta$ for mortals, and $x$ for Socrates, where $\alpha$ and $\beta$ are any classes at all, and $x$ is any individual. We then arrive at the statement: "No matter what possible values $x$ and $\alpha$ and $\beta$ may have, if all $\alpha$ 's are $\beta$ 's and $x$ is an $\alpha$ , then $x$ is a $\beta$ "; in other words, "the propositional function 'if all $\alpha$ 's are $\beta$ and $x$ is an $\alpha$ , then $x$ is a $\beta$ ' is always true." Here, finally, we have a proposition of logic—the one that is only suggested by the traditional statement about Socrates and men and mortals.

It is clear that, if formal reasoning is what we are aiming at, we shall always arrive ultimately at statements like the above, in which no actual things or properties are mentioned; this will happen through the mere desire not to waste our time proving in a particular case what can be proved generally. It would be ridiculous to go through a long argument about Socrates, and then go through precisely the same argument again about Plato. If our argument is one (say) which holds of all men, we shall prove it concerning " $x$ ," with the hypothesis "if $x$ is a man." With [Pg 197] this hypothesis, the argument will retain its hypothetical validity even when $x$ is not a man. But now we shall find that our argument would still be valid if, instead of supposing $x$ to be a man, we were to suppose him to be a monkey or a goose or a Prime Minister. We shall therefore not waste our time taking as our premiss " $x$ is a man" but shall take " $x$ is an $\alpha$ ," where $\alpha$ is any class of individuals, or " $\phi x$ " where $\phi$ is any propositional function of some assigned type. Thus the absence of all mention of particular things or properties in logic or pure mathematics is a necessary result of the fact that this study is, as we say, "purely formal."

It's clear that if we're aiming for formal reasoning, we'll always end up with statements like the one above, where no specific things or properties are mentioned. This happens simply because we want to avoid wasting time proving something in a particular case when it can be proven generally. It would be pointless to argue for a long time about Socrates and then repeat the same argument for Plato. If our argument is one that applies to all men, we'll prove it concerning " $x$ ," with the assumption "if $x$ is a man." With this assumption, the argument remains hypothetically valid even if $x$ is not a man. However, we'll find that our argument would still work if, instead of assuming $x$ is a man, we assume he's a monkey, a goose, or a Prime Minister. Therefore, we won't waste our time using " $x$ is a man" as our premise; instead, we'll take " $x$ is an $\alpha$ " where $\alpha$ is any class of individuals, or " $\phi x$ " where $\phi$ is any propositional function of a certain type. Thus, the lack of specific references to things or properties in logic or pure mathematics is a necessary outcome of the fact that this study is, as we say, "purely formal."

At this point we find ourselves faced with a problem which is easier to state than to solve. The problem is: "What are the constituents of a logical proposition?" I do not know the answer, but I propose to explain how the problem arises.

At this point, we face a problem that's easier to describe than to resolve. The problem is: "What are the elements of a logical proposition?" I don’t have the answer, but I’d like to explain how this problem comes up.

Take (say) the proposition "Socrates was before Aristotle." Here it seems obvious that we have a relation between two terms, and that the constituents of the proposition (as well as of the corresponding fact) are simply the two terms and the relation, i.e. Socrates, Aristotle, and before. (I ignore the fact that Socrates and Aristotle are not simple; also the fact that what appear to be their names are really truncated descriptions. Neither of these facts is relevant to the present issue.) We may represent the general form of such propositions by " $x\mathrm Ry$ ," which may be read " $x$ has the relation $\mathrm R$ to $y$ ." This general form may occur in logical propositions, but no particular instance of it can occur. Are we to infer that the general form itself is a constituent of such logical propositions?

Take, for example, the statement "Socrates was before Aristotle." Here, it's clear that we have a relationship between two terms, and the elements of the statement (as well as the related fact) are simply the two terms and the relationship, namely Socrates, Aristotle, and before. (I'm ignoring the fact that Socrates and Aristotle are not simple figures; I'm also disregarding that what seem to be their names are actually shortened descriptions. Neither of these points is relevant to our current discussion.) We can represent the general form of such statements as " $x\mathrm Ry$ ," which can be interpreted as " $x$ has the relationship $\mathrm R$ to $y$ ." While this general form may appear in logical statements, no specific instance of it can occur. Should we conclude that the general form itself is a component of such logical statements?

Given a proposition, such as "Socrates is before Aristotle," we have certain constituents and also a certain form. But the form is not itself a new constituent; if it were, we should need a new form to embrace both it and the other constituents. We can, in fact, turn all the constituents of a proposition into variables, while keeping the form unchanged. This is what we do when we use such a schema as " $x\mathrm Ry$ ," which stands for any [Pg 198] one of a certain class of propositions, namely, those asserting relations between two terms. We can proceed to general assertions, such as " $x\mathrm Ry$ is sometimes true"—i.e. there are cases where dual relations hold. This assertion will belong to logic (or mathematics) in the sense in which we are using the word. But in this assertion we do not mention any particular things or particular relations; no particular things or relations can ever enter into a proposition of pure logic. We are left with pure forms as the only possible constituents of logical propositions.

Given a statement like "Socrates is before Aristotle," we have specific components and a particular structure. However, the structure isn't a new component itself; if it were, we would need a new structure to include both it and the other components. In fact, we can turn all the components of a statement into variables while the structure remains the same. This is what we do when we use a format like " $x\mathrm Ry$ ," which represents any one of a specific class of statements, specifically those that assert relationships between two terms. We can also make general statements, like " $x\mathrm Ry$ is sometimes true"—i.e. there are instances where two relationships apply. This statement will be part of logic (or mathematics) in the way we are using the term. However, in this statement, we don’t refer to any specific things or specific relationships; no specific things or relationships can ever be included in a statement of pure logic. We are left with pure forms as the only possible components of logical statements.

I do not wish to assert positively that pure forms—e.g. the form " $x\mathrm Ry$ "—do actually enter into propositions of the kind we are considering. The question of the analysis of such propositions is a difficult one, with conflicting considerations on the one side and on the other. We cannot embark upon this question now, but we may accept, as a first approximation, the view that forms are what enter into logical propositions as their constituents. And we may explain (though not formally define) what we mean by the "form" of a proposition as follows:—

I don’t want to definitely say that pure forms—e.g. the form " $x\mathrm Ry$ "—actually play a role in the types of propositions we’re looking at. The analysis of such propositions is a complex issue, with differing points to consider on both sides. We can’t dive into this question right now, but we can tentatively accept the idea that forms are what make up logical propositions as their building blocks. And we can describe (though not formally define) what we mean by the "form" of a proposition like this:—

The "form" of a proposition is that, in it, that remains unchanged when every constituent of the proposition is replaced by another.

The "form" of a proposition is what stays the same when every part of the proposition is substituted with something else.

Thus "Socrates is earlier than Aristotle" has the same form as "Napoleon is greater than Wellington," though every constituent of the two propositions is different.

Thus "Socrates is earlier than Aristotle" has the same structure as "Napoleon is greater than Wellington," even though every part of the two statements is different.

We may thus lay down, as a necessary (though not sufficient) characteristic of logical or mathematical propositions, that they are to be such as can be obtained from a proposition containing no variables (i.e. no such words as all, some, a, the, etc.) by turning every constituent into a variable and asserting that the result is always true or sometimes true, or that it is always true in respect of some of the variables that the result is sometimes true in respect of the others, or any variant of these forms. And another way of stating the same thing is to say that logic (or mathematics) is concerned only with forms, and is concerned with them only in the way of stating that they are always or [Pg 199] sometimes true—with all the permutations of "always" and "sometimes" that may occur.

We can therefore establish as a necessary (though not sufficient) characteristic of logical or mathematical propositions that they should be derived from a proposition that contains no variables (i.e., none of the words like "all," "some," "a," "the," etc.) by changing every part into a variable and claiming that the outcome is always true or sometimes true, or that it is always true concerning some of the variables while being sometimes true regarding others, or any variation of these forms. Another way to express the same idea is to say that logic (or mathematics) deals only with forms and only in the sense of stating that they are always or [Pg 199] sometimes true—with all the different combinations of "always" and "sometimes" that might occur.

There are in every language some words whose sole function is to indicate form. These words, broadly speaking, are commonest in languages having fewest inflections. Take "Socrates is human." Here "is" is not a constituent of the proposition, but merely indicates the subject-predicate form. Similarly in "Socrates is earlier than Aristotle," "is" and "than" merely indicate form; the proposition is the same as "Socrates precedes Aristotle," in which these words have disappeared and the form is otherwise indicated. Form, as a rule, can be indicated otherwise than by specific words: the order of the words can do most of what is wanted. But this principle must not be pressed. For example, it is difficult to see how we could conveniently express molecular forms of propositions (i.e. what we call "truth-functions") without any word at all. We saw in Chapter XIV. that one word or symbol is enough for this purpose, namely, a word or symbol expressing incompatibility. But without even one we should find ourselves in difficulties. This, however, is not the point that is important for our present purpose. What is important for us is to observe that form may be the one concern of a general proposition, even when no word or symbol in that proposition designates the form. If we wish to speak about the form itself, we must have a word for it; but if, as in mathematics, we wish to speak about all propositions that have the form, a word for the form will usually be found not indispensable; probably in theory it is never indispensable.

In every language, there are some words whose only purpose is to show the structure. Generally, these words are most common in languages with fewer inflections. Take "Socrates is human." Here, "is" isn’t part of the actual proposition but just shows the subject-predicate structure. Similarly, in "Socrates is earlier than Aristotle," "is" and "than" simply indicate structure; the proposition is the same as "Socrates precedes Aristotle," where these words are omitted and the structure is shown in a different way. Usually, structure can be indicated by means other than specific words: the order of the words can accomplish much of what is needed. However, this principle shouldn't be pushed too far. For instance, it’s hard to see how we could express molecular forms of propositions (i.e., what we call "truth-functions") conveniently without any words at all. We noted in Chapter XIV. that one word or symbol can serve this purpose, specifically a word or symbol that expresses incompatibility. But without even one, we would run into problems. Nonetheless, this isn’t the key point for our current discussion. What’s important for us to recognize is that form may be the sole focus of a general proposition, even when no word or symbol in that proposition indicates the form. If we want to talk about the form itself, we need a word for it; but if, as in mathematics, we want to discuss all propositions that share the form, a word for the form isn’t usually necessary; in theory, it’s probably never essential.

Assuming—as I think we may—that the forms of propositions can be represented by the forms of the propositions in which they are expressed without any special word for forms, we should arrive at a language in which everything formal belonged to syntax and not to vocabulary. In such a language we could express all the propositions of mathematics even if we did not know one single word of the language. The language of mathematical [Pg 200] logic, if it were perfected, would be such a language. We should have symbols for variables, such as " $x$ " and " $\mathrm R$ " and " $y$ ," arranged in various ways; and the way of arrangement would indicate that something was being said to be true of all values or some values of the variables. We should not need to know any words, because they would only be needed for giving values to the variables, which is the business of the applied mathematician, not of the pure mathematician or logician. It is one of the marks of a proposition of logic that, given a suitable language, such a proposition can be asserted in such a language by a person who knows the syntax without knowing a single word of the vocabulary.

Assuming—as I believe we can—that the forms of propositions can be represented by the forms of the propositions in which they are expressed without needing a specific word for forms, we would arrive at a language where everything formal belonged to syntax rather than vocabulary. In such a language, we could express all the propositions of mathematics even if we didn't know a single word of the language. The language of mathematical logic, if it were perfected, would be such a language. We would have symbols for variables, like " $x$ " and " $\mathrm R$ " and " $y$ ," arranged in different ways; and the way they are arranged would indicate that something is being stated to be true of all values or some values of the variables. We wouldn’t need to know any words, because they would only be necessary for assigning values to the variables, which is the job of the applied mathematician, not the pure mathematician or logician. One of the hallmarks of a logical proposition is that, given a suitable language, such a proposition can be asserted in that language by someone who understands the syntax without knowing a single word of the vocabulary.

But, after all, there are words that express form, such as "is" and "than." And in every symbolism hitherto invented for mathematical logic there are symbols having constant formal meanings. We may take as an example the symbol for incompatibility which is employed in building up truth-functions. Such words or symbols may occur in logic. The question is: How are we to define them?

But, after all, there are words that express form, like "is" and "than." And in every symbolism created for mathematical logic, there are symbols that have consistent formal meanings. We can take the symbol for incompatibility, which is used in constructing truth functions, as an example. Such words or symbols can appear in logic. The question is: How do we define them?

Such words or symbols express what are called "logical constants." Logical constants may be defined exactly as we defined forms; in fact, they are in essence the same thing. A fundamental logical constant will be that which is in common among a number of propositions, any one of which can result from any other by substitution of terms one for another. For example, "Napoleon is greater than Wellington" results from "Socrates is earlier than Aristotle" by the substitution of "Napoleon" for "Socrates," "Wellington" for "Aristotle," and "greater" for "earlier." Some propositions can be obtained in this way from the prototype "Socrates is earlier than Aristotle" and some cannot; those that can are those that are of the form " $x\mathrm Ry$ ," i.e. express dual relations. We cannot obtain from the above prototype by term-for-term substitution such propositions as "Socrates is human" or "the Athenians gave the hemlock to Socrates," because the first is of the subject-predicate [Pg 201] form and the second expresses a three-term relation. If we are to have any words in our pure logical language, they must be such as express "logical constants," and "logical constants" will always either be, or be derived from, what is in common among a group of propositions derivable from each other, in the above manner, by term-for-term substitution. And this which is in common is what we call "form."

Such words or symbols represent what we call "logical constants." Logical constants can be defined just like we defined forms; essentially, they are the same thing. A basic logical constant will be what is shared among several propositions, where any one of them can be derived from another by replacing terms with one another. For example, "Napoleon is greater than Wellington" comes from "Socrates is earlier than Aristotle" by substituting "Napoleon" for "Socrates," "Wellington" for "Aristotle," and "greater" for "earlier." Some propositions can be formed this way from the prototype "Socrates is earlier than Aristotle," and some cannot; those that can take the form of " $x\mathrm Ry$ ," i.e. express dual relations. We cannot derive propositions like "Socrates is human" or "the Athenians gave the hemlock to Socrates" from the above prototype through term-for-term substitution, because the first is a subject-predicate form and the second expresses a three-term relation. If we are to include any words in our pure logical language, they must express "logical constants," and "logical constants" will always either be or be derived from what is common among a group of propositions that can be derived from each other in the manner discussed, through term-for-term substitution. And this shared aspect is what we refer to as "form."

In this sense all the "constants" that occur in pure mathematics are logical constants. The number 1, for example, is derivative from propositions of the form: "There is a term $c$ such that $\phi x$ is true when, and only when, $x$ is $c$ ." This is a function of $\phi$ , and various different propositions result from giving different values to $\phi$ . We may (with a little omission of intermediate steps not relevant to our present purpose) take the above function of $\phi$ as what is meant by "the class determined by $\phi$ is a unit class" or "the class determined by $\phi$ is a member of 1" (1 being a class of classes). In this way, propositions in which 1 occurs acquire a meaning which is derived from a certain constant logical form. And the same will be found to be the case with all mathematical constants: all are logical constants, or symbolic abbreviations whose full use in a proper context is defined by means of logical constants.

In this way, all the "constants" found in pure mathematics are logical constants. For instance, the number 1 comes from statements like: "There exists a term $c$ such that $\phi x$ is true if and only if $x$ is $c$ ." This represents a function of $\phi$ , and different statements come from assigning different values to $\phi$ . We can (with a slight omission of intermediate steps not relevant to our current focus) take the function of $\phi$ as what is meant by "the class determined by $\phi$ is a unit class" or "the class determined by $\phi$ is a member of 1" (with 1 being a class of classes). This way, statements involving 1 gain a meaning derived from a specific constant logical form. The same applies to all mathematical constants: they are all logical constants or symbolic shortcuts whose full meaning in the right context is defined using logical constants.

But although all logical (or mathematical) propositions can be expressed wholly in terms of logical constants together with variables, it is not the case that, conversely, all propositions that can be expressed in this way are logical. We have found so far a necessary but not a sufficient criterion of mathematical propositions. We have sufficiently defined the character of the primitive ideas in terms of which all the ideas of mathematics can be defined, but not of the primitive propositions from which all the propositions of mathematics can be deduced. This is a more difficult matter, as to which it is not yet known what the full answer is.

But even though all logical (or mathematical) statements can be completely expressed using logical constants and variables, not all statements that can be expressed this way are logical. So far, we've established a necessary but not sufficient criterion for mathematical statements. We’ve clearly defined the nature of the primitive ideas that all mathematical ideas can be defined in terms of, but not of the primitive propositions from which all mathematical propositions can be deduced. This is a more complex issue, and it’s still unclear what the complete answer is.

We may take the axiom of infinity as an example of a proposition which, though it can be enunciated in logical terms, [Pg 202] cannot be asserted by logic to be true. All the propositions of logic have a characteristic which used to be expressed by saying that they were analytic, or that their contradictories were self-contradictory. This mode of statement, however, is not satisfactory. The law of contradiction is merely one among logical propositions; it has no special pre-eminence; and the proof that the contradictory of some proposition is self-contradictory is likely to require other principles of deduction besides the law of contradiction. Nevertheless, the characteristic of logical propositions that we are in search of is the one which was felt, and intended to be defined, by those who said that it consisted in deducibility from the law of contradiction. This characteristic, which, for the moment, we may call tautology, obviously does not belong to the assertion that the number of individuals in the universe is $n$ , whatever number $n$ may be. But for the diversity of types, it would be possible to prove logically that there are classes of $n$ terms, where $n$ is any finite integer; or even that there are classes of $\aleph_{0}$ terms. But, owing to types, such proofs, as we saw in Chapter XIII., are fallacious. We are left to empirical observation to determine whether there are as many as $n$ individuals in the world. Among "possible" worlds, in the Leibnizian sense, there will be worlds having one, two, three, ... individuals. There does not even seem any logical necessity why there should be even one individual [43]—why, in fact, there should be any world at all. The ontological proof of the existence of God, if it were valid, would establish the logical necessity of at least one individual. But it is generally recognised as invalid, and in fact rests upon a mistaken view of existence—i.e. it fails to realise that existence can only be asserted of something described, not of something named, so that it is meaningless to argue from "this is the so-and-so" and "the so-and-so exists" to "this exists." If we reject the ontological [Pg 203] argument, we seem driven to conclude that the existence of a world is an accident—i.e. it is not logically necessary. If that be so, no principle of logic can assert "existence" except under a hypothesis, i.e. none can be of the form "the propositional function so-and-so is sometimes true." Propositions of this form, when they occur in logic, will have to occur as hypotheses or consequences of hypotheses, not as complete asserted propositions. The complete asserted propositions of logic will all be such as affirm that some propositional function is always true. For example, it is always true that if $p$ implies $q$ and $q$ implies $r$ then $p$ implies $r$ , or that, if all $\alpha$ 's are $\beta$ 's and $x$ is an $\alpha$ then $x$ is a $\beta$ . Such propositions may occur in logic, and their truth is independent of the existence of the universe. We may lay it down that, if there were no universe, all general propositions would be true; for the contradictory of a general proposition (as we saw in Chapter XV.) is a proposition asserting existence, and would therefore always be false if no universe existed.

We can consider the axiom of infinity as an example of a statement that, even though it can be expressed in logical terms, [Pg 202] cannot be claimed to be true by logic. All logical propositions share a feature that used to be described as analytic, meaning their opposites are self-contradictory. However, this way of stating it is not satisfactory. The law of contradiction is just one of many logical propositions; it doesn’t have any special importance. Proving that the opposite of a certain proposition is self-contradictory will likely require additional deductive principles besides the law of contradiction. Still, the feature of logical propositions we’re trying to identify is the one that was sensed and intended to be defined by those who claimed it could be derived from the law of contradiction. For now, we can call this feature tautology, which clearly does not apply to the claim that the number of individuals in the universe is $n$ , regardless of what number $n$ represents. If it weren't for the variety of types, it would be possible to logically prove that there are classes of $n$ terms, where $n$ is any finite integer; or even that there are classes of $\aleph_{0}$ terms. However, because of types, as we saw in Chapter XIII, such proofs are flawed. We are left to rely on empirical observation to figure out whether there are as many as $n$ individuals in the world. Among "possible" worlds, as per Leibniz's definition, there will be worlds with one, two, three, ... individuals. There doesn’t even appear to be any logical necessity for there to be even one individual [43]—or, in fact, for any world to exist at all. The ontological proof of God’s existence, if it were valid, would confirm the logical necessity of at least one individual. But it is widely recognized as invalid and is based on a misunderstanding of existence—i.e. it does not acknowledge that existence can only be affirmed of something described, not merely named, making it nonsensical to argue from "this is the so-and-so" and "the so-and-so exists" to "this exists." If we dismiss the ontological [Pg 203] argument, we seem compelled to conclude that the existence of a world is an accident—i.e. it is not logically necessary. If that's the case, no principle of logic can assert "existence" without a hypothesis, i.e. none can be framed as "the propositional function so-and-so is sometimes true." Propositions of this kind, when they appear in logic, will need to be presented as hypotheses or as consequences of hypotheses, not as fully asserted propositions. Fully asserted propositions in logic will always state that some propositional function is always true. For instance, it is always true that if $p$ implies $q$ and $q$ implies $r$ then $p$ implies $r$ , or that if all $\alpha$ 's are $\beta$ 's and $x$ is an $\alpha$ then $x$ is a $\beta$ . These propositions can appear in logic, and their truth does not depend on the universe's existence. We can establish that, if there were no universe, all general propositions would be true; because the opposite of a general proposition (as we explored in Chapter XV) is a proposition claiming existence, which would therefore always be false if no universe existed.

[43]The primitive propositions in Principia Mathematica are such as to allow the inference that at least one individual exists. But I now view this as a defect in logical purity.

[43]The basic statements in Principia Mathematica suggest that at least one individual exists. However, I now see this as a flaw in logical clarity.

Logical propositions are such as can be known a priori, without study of the actual world. We only know from a study of empirical facts that Socrates is a man, but we know the correctness of the syllogism in its abstract form (i.e. when it is stated in terms of variables) without needing any appeal to experience. This is a characteristic, not of logical propositions in themselves, but of the way in which we know them. It has, however, a bearing upon the question what their nature may be, since there are some kinds of propositions which it would be very difficult to suppose we could know without experience.

Logical propositions are ones that can be understood a priori, without having to study the actual world. We only learn from examining empirical facts that Socrates is a man, but we recognize the validity of the syllogism in its abstract form (i.e. when it's expressed in terms of variables) without needing any real-world experience. This reflects not so much on the logical propositions themselves, but on how we come to know them. However, it does impact the question of what their nature might be, since there are some types of propositions that it would be very hard to believe we could understand without experience.

It is clear that the definition of "logic" or "mathematics" must be sought by trying to give a new definition of the old notion of "analytic" propositions. Although we can no longer be satisfied to define logical propositions as those that follow from the law of contradiction, we can and must still admit that they are a wholly different class of propositions from those that we come to know empirically. They all have the characteristic which, a moment ago, we agreed to call "tautology." This, [Pg 204] combined with the fact that they can be expressed wholly in terms of variables and logical constants (a logical constant being something which remains constant in a proposition even when all its constituents are changed)—will give the definition of logic or pure mathematics. For the moment, I do not know how to define "tautology."[44] It would be easy to offer a definition which might seem satisfactory for a while; but I know of none that I feel to be satisfactory, in spite of feeling thoroughly familiar with the characteristic of which a definition is wanted. At this point, therefore, for the moment, we reach the frontier of knowledge on our backward journey into the logical foundations of mathematics.

It’s clear that we need to find a new definition for the old idea of "analytic" propositions to really understand what "logic" or "mathematics" means. While we can't just define logical propositions as those that come from the law of contradiction anymore, we can still acknowledge that they are a completely different category of propositions compared to what we learn through experience. They all share what we agreed to call "tautology." This, along with the fact that they can be expressed entirely through variables and logical constants (a logical constant being something that stays the same in a proposition even when all its parts are changed)—will help us define logic or pure mathematics. Right now, I don’t know how to define "tautology." It would be easy to come up with a definition that might seem acceptable for a while; however, I haven’t found one that I truly think is satisfactory, despite feeling very familiar with the characteristic that needs defining. At this point in our exploration of the logical foundations of mathematics, we’ve reached the edge of what we know. [Pg 204]

[44]The importance of "tautology" for a definition of mathematics was pointed out to me by my former pupil Ludwig Wittgenstein, who was working on the problem. I do not know whether he has solved it, or even whether he is alive or dead.

[44]My former student Ludwig Wittgenstein highlighted the significance of "tautology" for defining mathematics while he was tackling this issue. I'm not sure if he has resolved it, or even if he is still alive.

We have now come to an end of our somewhat summary introduction to mathematical philosophy. It is impossible to convey adequately the ideas that are concerned in this subject so long as we abstain from the use of logical symbols. Since ordinary language has no words that naturally express exactly what we wish to express, it is necessary, so long as we adhere to ordinary language, to strain words into unusual meanings; and the reader is sure, after a time if not at first, to lapse into attaching the usual meanings to words, thus arriving at wrong notions as to what is intended to be said. Moreover, ordinary grammar and syntax is extraordinarily misleading. This is the case, e.g., as regards numbers; "ten men" is grammatically the same form as "white men," so that 10 might be thought to be an adjective qualifying "men." It is the case, again, wherever propositional functions are involved, and in particular as regards existence and descriptions. Because language is misleading, as well as because it is diffuse and inexact when applied to logic (for which it was never intended), logical symbolism is absolutely necessary to any exact or thorough treatment of our subject. Those readers, [Pg 205] therefore, who wish to acquire a mastery of the principles of mathematics, will, it is to be hoped, not shrink from the labour of mastering the symbols—a labour which is, in fact, much less than might be thought. As the above hasty survey must have made evident, there are innumerable unsolved problems in the subject, and much work needs to be done. If any student is led into a serious study of mathematical logic by this little book, it will have served the chief purpose for which it has been written. [Pg 206]

We have now reached the end of our brief introduction to mathematical philosophy. It’s difficult to effectively convey the ideas related to this topic without using logical symbols. Since everyday language doesn’t have words that precisely express what we want to say, we have to stretch words to fit unusual meanings. After a while, if not right away, readers will likely revert to the usual meanings of words, leading to misunderstandings about what is meant. Additionally, regular grammar and syntax can be extremely misleading. For example, "ten men" is grammatically similar to "white men," so someone might mistakenly think that 10 is an adjective describing "men." The same applies whenever propositional functions are involved, particularly regarding existence and descriptions. Because language can be misleading, and because it is vague and imprecise when applied to logic (for which it was never designed), logical symbolism is essential for any precise or thorough examination of our topic. Therefore, those readers who want to master the principles of mathematics should not shy away from the effort of learning the symbols—a task that is actually much easier than it may seem. As this quick overview should make clear, there are countless unsolved problems in this field, and a lot of work still needs to be done. If this little book inspires any student to seriously study mathematical logic, it will have fulfilled the main purpose for which it was written.

INDEX

Aggregates, 12
Alephs, 83, 92, 97, 125
Aliorelatives, 32
All, 158 ff.
Analysis, 4
Ancestors, 25, 33
Argument of a function, 47, 108
Arithmetising of mathematics, 4
Associative law, 58, 94
Axioms, 1

Between, 38 ff., 58
Bolzano, 138 n.
Boots and socks, 126
Boundary, 70, 98, 99

Cantor, Georg, 77, 79, 85 n., 86, 89,
95, 102, 136
Classes, 12, 137, 181 ff.;
reflexive, 80, 127, 138;
similar, 15, 16
Clifford, W. K., 76
Collections, infinite, 13
Commutative law, 58, 94
Conjunction, 147
Consecutiveness, 37, 38, 81
Constants, 202
Construction, method of, 73
Continuity, 86, 97 ff.;
Cantorian, 102 ff.;
Dedekindian, 101 ff.;
in philosophy, 105;
of functions, 106 ff.
Contradictions, 135 ff.
Convergence, 115
Converse, 16, 32, 49
Correlators, 54
Counterparts, objective, 61
Counting, 14, 16

Dedekind, 69, 99, 138 n.
Deduction, 144 ff.
Definition, 3;
extensional and intensional, 12
Derivatives, 100
Descriptions, 139, 144
Descriptions, 167
Dimensions, 29
Disjunction, 147
Distributive law, 58, 94
Diversity, 87
Domain, 16, 32, 49

Equivalence, 183
Euclid, 67
Existence, 164, 171, 177
Exponentiation, 94, 120
Extension of a relation, 60

Fictions, logical, 14 n., 45, 137
Field of a relation, 32, 53
Finite, 27
Flux, 105
Form, 198
Fractions, 37, 64
Frege, 7, 10, 25 n., 77, 95, 146 n.
Functions, 46;
descriptive, 46, 180;
intensional and extensional, 186;
predicative, 189;
propositional, 46, 144;
propositional, 155;

Gap, Dedekindian, 70 ff., 99
Generalisation, 156
Geometry, 29, 59, 67, 74, 100, 145;
analytical, 4, 86
Greater and less, 65, 90

Hegel, 107
Hereditary properties, 21

Implication, 146, 153;
formal, 163
Incommensurables, 4, 66
Incompatibility, 147 ff., 200
Incomplete symbols, 182
Indiscernibles, 192
Individuals, 132, 141, 173
Induction, mathematical, 20 ff., 87, 93,
185
Inductive properties, 21
Inference, 148
Infinite, 28; of rationals, 65;
Cantorian, 65;
of cardinals, 77 ff.;
and series and ordinals, 89 ff.
Infinity, axiom of, 66 n., 77, 131 ff.,
202
Instances, 156
Integers, positive and negative, 64
Intervals, 115
Intuition, 145
Irrationals, 66, 72
[Pg 207]

Kant, 145

Leibniz, 80, 107, 192
Lewis, C. I., 153, 154
Likeness, 52
Limit, 29, 69 ff., 97 ff.;
of functions, 106 ff.
Limiting points, 99
Logic, 159, 65, 194 ff.;
mathematical, v, 201, 206
Logicising of mathematics, 7

Maps, 52, 60 ff., 80
Mathematics, 194 ff.
Maximum, 70, 98
Median class, 104
Meinong, 169
Method, vi
Minimum, 70, 98
Modality, 165
Multiplication, 118 ff.
Multiplicative axiom, 92, 117 ff.

Names, 173, 182
Necessity, 165
Neighbourhood, 109
Nicod, 148, 149, 151
Null-class, 23, 132
Number, cardinal, 10 ff., 56, 77 ff., 95;
complex, 74 ff.;
finite, 20 ff.;
inductive, 27, 78, 131;
infinite, 77 ff.;
irrational, 66, 72;
maximum? 135;
multipliable, 130;
natural, 2 ff., 22;
non-inductive, 88, 127;
real, 66, 72, 84;
reflexive, 80, 127;
relation, 56, 94;
serial, 57

Occam, 184
Occurrences, primary and secondary,
179
Ontological proof, 203
Order 29ff.; cyclic, 40
Oscillation, ultimate, 111

Parmenides, 138
Particulars, 140 ff., 173
Peano, 5 ff., 23, 24, 78, 81, 131, 163
Peirce, 32 n.
Permutations, 50
Philosophy, mathematical, v, 1
Plato, 138
Plurality, 10
Poincaré, 27
Points, 59
Posterity, 22 ff., 32; proper, 36
Postulates, 71, 73
Precedent, 98
Premisses of arithmetic, 5
Primitive ideas and propositions, 5, 202
Progressions, 8, 81 ff.
Propositions, 155; analytic, 204;
elementary, 161
Pythagoras, 4, 67

Quantity, 97, 195

Ratios, 64, 71, 84, 133
Reducibility, axiom of, 191
Referent, 48
Relation numbers, 56 ff.
Relations, asymmetrical 31, 42;
connected, 32;
many-one, 15;
one-many, 15, 45;
one-one, 15, 47, 79;
reflexive, 16;
serial, 34;
similar, 52;
squares of, 32;
symmetrical, 16, 44;
transitive, 16, 32
Relatum, 48
Representatives, 120
Rigour, 144
Royce, 80

Section, Dedekindian, 69 ff.;
ultimate, 111
Segments, 72, 98
Selections, 117
Sequent, 98
Series, 29 ff.; closed, 103;
compact, 66, 93, 100;
condensed in itself, 102;
Dedekindian, 71, 73, 101;
generation of, 41;
infinite, 89;
perfect, 102, 103;
well-ordered, 92, 123
Sheffer, 148
Similarity, of classes, 15 ff.;
of relations, 83;
of relations, 52
Some, 158 ff.
Space, 61, 86, 140
Structure, 60 ff.
Sub-classes, 84 ff.
Subjects, 142
Subtraction, 87
Successor of a number, 23, 35
Syllogism, 197

Tautology, 203, 205
The, 167, 172 ff.
Time, 61, 86, 140
Truth-function, 147
Truth-value, 146
Types, logical, 53, 135 ff., 185, 188

Unreality, 168

Value of a function, 47, 108
Variables, 10, 161, 199
Veblen, 58
Verbs, 141

Weierstrass, 97, 107
Wells, H. G., 114
Whitehead, 64, 76, 107, 119
Wittgenstein, 205 n.

Zermelo, 123, 129
Zero, 65

Aggregates, 12
Alephs, 83, 92, 97, 125
Aliorelatives, 32
All, 158 ff.
Analysis, 4
Ancestors, 25, 33
Argument of a function, 47, 108
Arithmetising of mathematics, 4
Associative law, 58, 94
Axioms, 1

Between, 38 ff., 58
Bolzano, 138 n.
Boots and socks, 126
Boundary, 70, 98, 99

Cantor, Georg, 77, 79, 85 n., 86, 89,
__A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__, __A_TAG_PLACEHOLDER_2__
Classes, 12, 137, 181 ff.;
reflexive, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__, __A_TAG_PLACEHOLDER_2__;
similar, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__
Clifford, W. K., 76
Collections, infinite, 13
Commutative law, 58, 94
Conjunction, 147
Consecutiveness, 37, 38, 81
Constants, 202
Construction, method of, 73
Continuity, 86, 97 ff.;
Cantorian, __A_TAG_PLACEHOLDER_0__ ff.;
Dedekindian, __A_TAG_PLACEHOLDER_0__ etc.;
in philosophy, __A_TAG_PLACEHOLDER_0__;
of functions, __A_TAG_PLACEHOLDER_0__ etc.
Contradictions, 135 ff.
Convergence, 115
Converse, 16, 32, 49
Correlators, 54
Counterparts, objective, 61
Counting, 14, 16

Dedekind, 69, 99, 138 n.
Deduction, 144 ff.
Definition, 3;
extensional and intensional, __A_TAG_PLACEHOLDER_0__
Derivatives, 100
Descriptions, 139, 144
Descriptions, 167
Dimensions, 29
Disjunction, 147
Distributive law, 58, 94
Diversity, 87
Domain, 16, 32, 49

Equivalence, 183
Euclid, 67
Existence, 164, 171, 177
Exponentiation, 94, 120
Extension of a relation, 60

Fictions, logical, 14 n., 45, 137
Field of a relation, 32, 53
Finite, 27
Flux, 105
Form, 198
Fractions, 37, 64
Frege, 7, 10, 25 n., 77, 95, 146 n.
Functions, 46;
descriptive, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
intensional and extensional, __A_TAG_PLACEHOLDER_0__;
predicative, __A_TAG_PLACEHOLDER_0__;
propositional, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
propositional, __A_TAG_PLACEHOLDER_0__;

Gap, Dedekindian, 70 ff., 99
Generalisation, 156
Geometry, 29, 59, 67, 74, 100, 145;
analytical, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__
Greater and less, 65, 90

Hegel, 107
Hereditary properties, 21

Implication, 146, 153;
formal, __A_TAG_PLACEHOLDER_0__
Incommensurables, 4, 66
Incompatibility, 147 ff., 200
Incomplete symbols, 182
Indiscernibles, 192
Individuals, 132, 141, 173
Induction, mathematical, 20 ff., 87, 93,
185
Inductive properties, 21
Inference, 148
Infinite, 28; of rationals, 65;
Cantorian, __A_TAG_PLACEHOLDER_0__;
of cardinals, __A_TAG_PLACEHOLDER_0__ ff.;
and series and ordinals, __A_TAG_PLACEHOLDER_0__ etc.
Infinity, axiom of, 66 n., 77, 131 ff.,
202
Instances, 156
Integers, positive and negative, 64
Intervals, 115
Intuition, 145
Irrationals, 66, 72
[Pg 207]

Kant, 145

Leibniz, 80, 107, 192
Lewis, C. I., 153, 154
Likeness, 52
Limit, 29, 69 ff., 97 ff.;
of functions, __A_TAG_PLACEHOLDER_0__ etc.
Limiting points, 99
Logic, 159, 65, 194 ff.;
mathematical, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__, __A_TAG_PLACEHOLDER_2__
Logicising of mathematics, 7

Maps, 52, 60 ff., 80
Mathematics, 194 ff.
Maximum, 70, 98
Median class, 104
Meinong, 169
Method, vi
Minimum, 70, 98
Modality, 165
Multiplication, 118 ff.
Multiplicative axiom, 92, 117 ff.

Names, 173, 182
Necessity, 165
Neighbourhood, 109
Nicod, 148, 149, 151
Null-class, 23, 132
Number, cardinal, 10 ff., 56, 77 ff., 95;
complex, __A_TAG_PLACEHOLDER_0__ ff.
finite, __A_TAG_PLACEHOLDER_0__ ff.;
inductive, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__, __A_TAG_PLACEHOLDER_2__;
infinite, __A_TAG_PLACEHOLDER_0__ ff.
irrational, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
max? __A_TAG_PLACEHOLDER_0__;
multipliable, __A_TAG_PLACEHOLDER_0__;
natural, __A_TAG_PLACEHOLDER_0__ ff., __A_TAG_PLACEHOLDER_1__;
non-inductive, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
real, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__, __A_TAG_PLACEHOLDER_2__;
reflexive, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
relationship, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
serial, __A_TAG_PLACEHOLDER_0__

Occam, 184
Occurrences, primary and secondary,
179
Ontological proof, 203
Order 29ff.; cyclic, 40
Oscillation, ultimate, 111

Parmenides, 138
Particulars, 140 ff., 173
Peano, 5 ff., 23, 24, 78, 81, 131, 163
Peirce, 32 n.
Permutations, 50
Philosophy, mathematical, v, 1
Plato, 138
Plurality, 10
Poincaré, 27
Points, 59
Posterity, 22 ff., 32; proper, 36
Postulates, 71, 73
Precedent, 98
Premisses of arithmetic, 5
Primitive ideas and propositions, 5, 202
Progressions, 8, 81 ff.
Propositions, 155; analytic, 204;
elementary, __A_TAG_PLACEHOLDER_0__
Pythagoras, 4, 67

Quantity, 97, 195

Ratios, 64, 71, 84, 133
Reducibility, axiom of, 191
Referent, 48
Relation numbers, 56 ff.
Relations, asymmetrical 31, 42;
connected, __A_TAG_PLACEHOLDER_0__;
many-to-one, __A_TAG_PLACEHOLDER_0__;
one-to-many, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
one-on-one, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__, __A_TAG_PLACEHOLDER_2__;
reflexive, __A_TAG_PLACEHOLDER_0__;
serial, __A_TAG_PLACEHOLDER_0__;
similar, __A_TAG_PLACEHOLDER_0__;
squares of __A_TAG_PLACEHOLDER_0__;
symmetrical, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
transitive, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__
Relatum, 48
Representatives, 120
Rigour, 144
Royce, 80

Section, Dedekindian, 69 ff.;
ultimate, __A_TAG_PLACEHOLDER_0__
Segments, 72, 98
Selections, 117
Sequent, 98
Series, 29 ff.; closed, 103;
compact, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__, __A_TAG_PLACEHOLDER_2__;
condensed in itself, __A_TAG_PLACEHOLDER_0__;
Dedekindian, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__, __A_TAG_PLACEHOLDER_2__;
generation of __A_TAG_PLACEHOLDER_0__;
infinite, __A_TAG_PLACEHOLDER_0__;
perfect, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__;
well-organized, __A_TAG_PLACEHOLDER_0__, __A_TAG_PLACEHOLDER_1__
Sheffer, 148
Similarity, of classes, 15 ff.;
of relationships, __A_TAG_PLACEHOLDER_0__;
of relationships, __A_TAG_PLACEHOLDER_0__
Some, 158 ff.
Space, 61, 86, 140
Structure, 60 ff.
Sub-classes, 84 ff.
Subjects, 142
Subtraction, 87
Successor of a number, 23, 35
Syllogism, 197

Tautology, 203, 205
The, 167, 172 ff.
Time, 61, 86, 140
Truth-function, 147
Truth-value, 146
Types, logical, 53, 135 ff., 185, 188

Unreality, 168

Value of a function, 47, 108
Variables, 10, 161, 199
Veblen, 58
Verbs, 141

Weierstrass, 97, 107
Wells, H. G., 114
Whitehead, 64, 76, 107, 119
Wittgenstein, 205 n.

Zermelo, 123, 129
Zero, 65

PRINTED IN GREAT BRITAIN BY NEILL AND CO., LTD., EDINBURGH.

TRANSCRIBER'S NOTES

Transcriber's Notes

Minor typographical corrections and presentational changes have been made without comment.

Minor typographical corrections and formatting changes have been made without comment.

Download ePUB

If you like this ebook, consider a donation!

INTRODUCTION TO MATHEMATICAL PHILOSOPHY

PREFACE

EDITOR'S NOTE

CONTENTS

INTRODUCTION TO MATHEMATICAL PHILOSOPHY

CHAPTER I THE SERIES OF NATURAL NUMBERS

CHAPTER II DEFINITION OF NUMBER

CHAPTER III FINITUDE AND MATHEMATICAL INDUCTION

CHAPTER IV THE DEFINITION OF ORDER

CHAPTER V KINDS OF RELATIONS

CHAPTER VI SIMILARITY OF RELATIONS

CHAPTER VII RATIONAL, REAL, AND COMPLEX NUMBERS

CHAPTER VIII INFINITE CARDINAL NUMBERS

CHAPTER IX INFINITE SERIES AND ORDINALS

CHAPTER X LIMITS AND CONTINUITY

CHAPTER XI LIMITS AND CONTINUITY OF FUNCTIONS

CHAPTER XII SELECTIONS AND THE MULTIPLICATIVE AXIOM

CHAPTER XIII THE AXIOM OF INFINITY AND LOGICAL TYPES

CHAPTER XIV INCOMPATIBILITY AND THE THEORY OF DEDUCTION

CHAPTER XV PROPOSITIONAL FUNCTIONS

CHAPTER XVI DESCRIPTIONS

CHAPTER XVII CLASSES

CHAPTER XVIII MATHEMATICS AND LOGIC

INDEX

CHAPTER I

THE SERIES OF NATURAL NUMBERS

CHAPTER II

DEFINITION OF NUMBER

CHAPTER III

FINITUDE AND MATHEMATICAL INDUCTION

CHAPTER IV

THE DEFINITION OF ORDER

CHAPTER V

KINDS OF RELATIONS

CHAPTER VI

SIMILARITY OF RELATIONS

CHAPTER VII

RATIONAL, REAL, AND COMPLEX NUMBERS

CHAPTER VIII

INFINITE CARDINAL NUMBERS

CHAPTER IX

INFINITE SERIES AND ORDINALS

CHAPTER X

LIMITS AND CONTINUITY

CHAPTER XI

LIMITS AND CONTINUITY OF FUNCTIONS

CHAPTER XII

SELECTIONS AND THE MULTIPLICATIVE AXIOM

CHAPTER XIII

THE AXIOM OF INFINITY AND LOGICAL TYPES

CHAPTER XIV

INCOMPATIBILITY AND THE THEORY OF DEDUCTION

CHAPTER XV

PROPOSITIONAL FUNCTIONS

CHAPTER XVI

DESCRIPTIONS

CHAPTER XVII

CLASSES

CHAPTER XVIII

MATHEMATICS AND LOGIC