Chapter 11
The Calculus of Probabilities

No doubt the reader will be astonished to find reflections on the calculus of probabilities in such a volume as this. What has that calculus to do with physical science? The questions I shall raise—without, however, giving them a solution—are naturally raised by the philosopher who is examining the problems of physics. So far is this the case, that in the two preceding chapters I have several times used the words “probability” and “chance.” “Predicted facts,” as I said above, “can only be probable.” However solidly founded a prediction may appear to be, we are never absolutely certain that experiment will not prove it false; but the probability is often so great that practically it may be accepted. And a little farther on I added:—“See what a part the belief in simplicity plays in our generalisations. We have verified a simple law in a large number of particular cases, and we refuse to admit that this so-often-repeated coincidence is a mere effect of chance.” Thus, in a multitude of circumstances the physicist is often in the same position as the gambler who reckons up his chances. Every time that he reasons by induction, he more or less consciously requires the calculus of probabilities, and that is why I am obliged to open this chapter parenthetically, and to interrupt our discussion of method in the physical sciences in order to examine a little closer what this calculus is worth, and what dependence we may place upon it.

The very name of the calculus of probabilities is a paradox. Probability as opposed to certainty is what one does not know, and how can we calculate the unknown? Yet many eminent scientists have devoted themselves to this calculus, and it cannot be denied that science has drawn therefrom no small advantage. How can we explain this apparent contradiction? Has probability been defined? Can it even be defined? And if it cannot, how can we venture to reason upon it? The definition, it will be said, is very simple. The probability of an event is the ratio of the number of cases favourable to the event to the total number of possible cases. A simple example will show how incomplete this definition is:—I throw two dice. What is the probability that one of the two at least turns up a 6? Each can turn up in six different ways; the number of possible cases is 6 × 6 = 36. The number of favourable cases is 11; the probability is 11/36. That is the correct solution. But why cannot we just as well proceed as follows?—The points which turn up on the two dice form 6 × 7/2 = 21 different combinations. Among these combinations, six are favourable; the probability is 6/21. Now why is the first method of calculating the number of possible cases more legitimate than the second? In any case it is not the definition that tells us.

We are therefore bound to complete the definition by saying, “…to the total number of possible cases, provided the cases are equally probable.” So we are compelled to define the probable by the probable. How can we know that two possible cases are equally probable? Will it be by a convention? If we insert at the beginning of every problem an explicit convention, well and good! We then have nothing to do but to apply the rules of arithmetic and algebra, and we complete our calculation, when our result cannot be called in question. But if we wish to make the slightest application of this result, we must prove that our convention is legitimate, and we shall find ourselves in the presence of the very difficulty we thought we had avoided.
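A minimal sketch in Python makes the two ways of counting concrete: the 36 ordered throws are equally probable and give 11/36, while the 21 unordered combinations are not equally probable, which is why 6/21 is wrong.

```python
from itertools import product

# All 36 ordered throws of two dice; these cases ARE equally probable.
ordered = list(product(range(1, 7), repeat=2))
favourable = [t for t in ordered if 6 in t]
print(len(favourable), "/", len(ordered))      # 11 / 36

# The 21 unordered combinations; these cases are NOT equally probable:
# (1, 2) arises from two ordered throws, (1, 1) from only one.
unordered = [(i, j) for i in range(1, 7) for j in range(i, 7)]
print(sum(6 in c for c in unordered), "/", len(unordered))   # 6 / 21
```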
It may be said that common-sense is enough to show us the convention that should be adopted. Alas! M. Bertrand has amused himself by discussing the following simple problem:—“What is the probability that a chord of a circle may be greater than the side of the inscribed equilateral triangle?” The illustrious geometer successively adopted two conventions which seemed to be equally imperative in the eyes of common-sense, and with one convention he finds 1/2, and with the other 1/3.

The conclusion which seems to follow from this is that the calculus of probabilities is a useless science, and that the obscure instinct which we call common-sense, and to which we appeal for the legitimisation of our conventions, must be distrusted. But to this conclusion we cannot subscribe. We cannot do without that obscure instinct. Without it, science would be impossible, and without it we could neither discover nor apply a law. Have we any right, for instance, to enunciate Newton’s law? No doubt numerous observations are in agreement with it, but is not that a simple fact of chance? And how do we know, besides, that this law which has been true for so many generations will not be untrue in the next? To this objection the only answer you can give is: It is very improbable. But grant the law. By means of it I can calculate the position of Jupiter in a year from now. Yet have I any right to say this? Who can tell if a gigantic mass of enormous velocity is not going to pass near the solar system and produce unforeseen perturbations? Here again the only answer is: It is very improbable. From this point of view all the sciences would only be unconscious applications of the calculus of probabilities. And if this calculus be condemned, then the whole of the sciences must also be condemned.

I shall not dwell at length on scientific problems in which the intervention of the calculus of probabilities is more evident. In the forefront of these is the problem of interpolation, in which, knowing a certain number of values of a function, we try to discover the intermediary values. I may also mention the celebrated theory of errors of observation, to which I shall return later; and the kinetic theory of gases, a well-known hypothesis wherein each gaseous molecule is supposed to describe an extremely complicated path, but in which, through the effect of great numbers, the mean phenomena, which are all we observe, obey the simple laws of Mariotte and Gay-Lussac. All these theories are based upon the law of great numbers, and the ruin of the calculus of probabilities would evidently involve them in its own. It is true that they have only a particular interest, and that, save as far as interpolation is concerned, they are sacrifices to which we might readily be resigned. But, as I have said above, it would not be these partial sacrifices alone that would be in question; it would be the legitimacy of the whole of science that would be challenged.

I quite see that it might be said: We do not know, and yet we must act. To act, we have not time to devote ourselves to an inquiry that would suffice to dispel our ignorance; besides, such an inquiry would demand unlimited time. We must therefore make up our minds without knowing. This must often be done whatever may happen, and we must follow the rules although we may have but little confidence in them. What I know is, not that such a thing is true, but that the best course for me is to act as if it were true.
The calculus of probabilities, and therefore science itself, would thenceforward have only a practical value.

Unfortunately the difficulty does not thus disappear. A gambler wants to try a coup, and he asks my advice. If I give it to him, I use the calculus of probabilities; but I shall not guarantee success. That is what I shall call subjective probability. In this case we might be content with the explanation of which I have just given a sketch. But assume that an observer is present at the play, that he notes every coup, that play goes on for a long time, and that he then makes a summary of his notes. He will find that events have taken place in conformity with the laws of the calculus of probabilities. That is what I shall call objective probability, and it is this phenomenon which has to be explained. There are numerous Insurance Societies which apply the rules of the calculus of probabilities, and they distribute to their shareholders dividends, the objective reality of which cannot be contested. In order to explain them, we must do more than invoke our ignorance and the necessity of action. Thus, absolute scepticism is not admissible. We may distrust, but we cannot condemn en bloc. Discussion is necessary.
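As a concrete check of M. Bertrand's paradox, here is a minimal Monte Carlo sketch in Python; it assumes two common readings of his conventions, a chord drawn to a random second endpoint and a chord at a random distance from the centre, and recovers the discordant answers 1/3 and 1/2.

```python
import math
import random

R = 1.0
side = math.sqrt(3) * R          # side of the inscribed equilateral triangle
N = 200_000

# Convention 1: fix one endpoint, choose the other uniformly on the circle.
long1 = sum(2 * R * math.sin(random.uniform(0, 2 * math.pi) / 2) > side
            for _ in range(N))

# Convention 2: choose the chord's distance from the centre uniformly in [0, R].
long2 = sum(2 * math.sqrt(R**2 - random.uniform(0, R)**2) > side
            for _ in range(N))

print(long1 / N, long2 / N)      # close to 1/3 and 1/2 respectively
```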

I. Classification of the Problems of Probability.—In order to classify the problems which are presented to us with reference to probabilities, we must look at them from different points of view, and first of all, from that of generality. I said above that probability is the ratio of the number of favourable to the number of possible cases. What for want of a better term I call generality will increase with the number of possible cases. This number may be finite, as, for instance, if we take a throw of the dice in which the number of possible cases is 36. That is the first degree of generality. But if we ask, for instance, what is the probability that a point within a circle is within the inscribed square, there are as many possible cases as there are points in the circle—that is to say, an infinite number. This is the second degree of generality. Generality can be pushed further still. We may ask the probability that a function will satisfy a given condition. There are then as many possible cases as one can imagine different functions. This is the third degree of generality, which we reach, for instance, when we try to find the most probable law after a finite number of observations.

Yet we may place ourselves at a quite different point of view. If we were not ignorant there would be no probability, there could only be certainty. But our ignorance cannot be absolute, for then there would be no longer any probability at all. Thus the problems of probability may be classed according to the greater or less depth of this ignorance. In mathematics we may set ourselves problems in probability. What is the probability that the fifth decimal of a logarithm taken at random from a table is a 9? There is no hesitation in answering that this probability is 1/10. Here we possess all the data of the problem. We can calculate our logarithm without having recourse to the table, but we need not give ourselves the trouble. This is the first degree of ignorance. In the physical sciences our ignorance is already greater. The state of a system at a given moment depends on two things—its initial state, and the law according to which that state varies. If we know both this law and this initial state, we have a simple mathematical problem to solve, and we fall back upon our first degree of ignorance. But it often happens that we know the law and do not know the initial state. It may be asked, for instance, what is the present distribution of the minor planets? We know that from all time they have obeyed the laws of Kepler, but we do not know what was their initial distribution. In the kinetic theory of gases we assume that the gaseous molecules follow rectilinear paths and obey the laws of impact of elastic bodies; yet as we know nothing of their initial velocities, we know nothing of their present velocities. The calculus of probabilities alone enables us to predict the mean phenomena which will result from a combination of these velocities. This is the second degree of ignorance. Finally, it is possible that not only the initial conditions but the laws themselves are unknown. We then reach the third degree of ignorance, and in general we can no longer affirm anything at all as to the probability of a phenomenon.

It often happens that instead of trying to foresee an event by means of a more or less imperfect knowledge of the law, the events may be known, and we want to find the law; or that, instead of deducing effects from causes, we wish to deduce the causes from the effects.
Now, these problems are classified as probability of causes, and are the most interesting of all from their scientific applications. I play at écarté with a gentleman whom I know to be perfectly honest. What is the chance that he turns up the king? It is 1/8. This is a problem of the probability of effects. I play with a gentleman whom I do not know. He has dealt ten times, and he has turned the king up six times. What is the chance that he is a sharper? This is a problem in the probability of causes. It may be said that it is the essential problem of the experimental method. I have observed n values of x and the corresponding values of y. I have found that the ratio of the latter to the former is practically constant. There is the event; what is the cause? Is it probable that there is a general law according to which y would be proportional to x, and that small divergencies are due to errors of observation? This is the type of question that we are ever asking, and which we unconsciously solve whenever we are engaged in scientific work. I am now going to pass in review these different categories of problems by discussing in succession what I have called subjective and objective probability.

II. Probability in Mathematics.—The impossibility of squaring the circle was shown in 1882, but before that date all geometers considered this impossibility as so “probable” that the Académie des Sciences rejected without examination the, alas! too numerous memoirs on this subject that a few unhappy madmen sent in every year. Was the Académie wrong? Evidently not, and it knew perfectly well that by acting in this manner it did not run the least risk of stifling a discovery of moment. The Académie could not have proved that it was right, but it knew quite well that its instinct did not deceive it. If you had asked the Academicians, they would have answered: “We have compared the probability that an unknown scientist should have found out what has been vainly sought for so long, with the probability that there is one madman the more on the earth, and the latter has appeared to us the greater.” These are very good reasons, but there is nothing mathematical about them; they are purely psychological. If you had pressed them further, they would have added: “Why do you expect a particular value of a transcendental function to be an algebraical number? If π be the root of an algebraical equation, why do you expect this root to be a period of the function sin 2x, and why is it not the same with the other roots of the same equation?” To sum up, they would have invoked the principle of sufficient reason in its vaguest form. Yet what information could they draw from it? At most a rule of conduct for the employment of their time, which would be more usefully spent at their ordinary work than in reading a lucubration that inspired in them a legitimate distrust. But what I called above objective probability has nothing in common with this first problem.

It is otherwise with the second. Let us consider the first 10,000 logarithms that we find in a table. Among these 10,000 logarithms I take one at random. What is the probability that its third decimal is an even number? You will say without any hesitation that the probability is 1/2, and in fact if you pick out in a table the third decimals in these 10,000 numbers you will find nearly as many even digits as odd. Or, if you prefer it, let us write 10,000 numbers corresponding to our 10,000 logarithms, writing down for each of these numbers +1 if the third decimal of the corresponding logarithm is even, and −1 if odd; and then let us take the mean of these 10,000 numbers. I do not hesitate to say that the mean of these 10,000 numbers is probably zero, and if I were to calculate it practically, I should verify that it is extremely small. But this verification is needless. I might have rigorously proved that this mean is smaller than 0.003. To prove this result I should have had to make a rather long calculation for which there is no room here, and for which I may refer the reader to an article that I published in the Revue générale des Sciences, April 15th, 1899. The only point to which I wish to draw attention is the following. In this calculation I had occasion to rest my case on only two facts—namely, that the first and second derivatives of the logarithm remain, in the interval considered, between certain limits. Hence our first conclusion is that the property is not only true of the logarithm but of any continuous function whatever, since the derivatives of every continuous function are limited.
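The experiment is easy to repeat today without a printed table. A small Python sketch over the logarithms of 1 to 10,000 (any range of a table would serve equally well) shows the mean of the ±1 marks coming out very close to zero.

```python
import math

# +1 when the third decimal of log10(n) is even, -1 when it is odd.
marks = []
for n in range(1, 10_001):
    third_decimal = int(math.log10(n) * 1000) % 10
    marks.append(+1 if third_decimal % 2 == 0 else -1)

print(sum(marks) / len(marks))   # a very small number, near zero
```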
If I was certain beforehand of the result, it is first because I have often observed analogous facts for other continuous functions; and next, it is because I went through in my mind in a more or less unconscious and imperfect manner the reasoning which led me to the preceding inequalities, just as a skilled calculator before finishing his multiplication takes into account what it ought to come to approximately. And besides, since what I call my intuition was only an incomplete summary of a piece of true reasoning, it is clear that observation has confirmed my predictions, and that the objective and subjective probabilities are in agreement. As a third example I shall choose the following:—The number u is taken at random and n is a given very large integer. What is the mean value of sin nu? This problem has no meaning by itself. To give it one, a convention is required—namely, we agree that the probability for the number u to lie between a and a + da is ϕ(a)da; that it is therefore proportional to the infinitely small interval da, and is equal to this multiplied by a function ϕ(a) depending only on a. As for this function I choose it arbitrarily, but I must assume it to be continuous. The value of sin nu remaining the same when u increases by 2π, I may without loss of generality assume that u lies between 0 and 2π, and I shall thus be led to suppose that ϕ(a) is a periodic function whose period is 2π. The mean value that we seek is readily expressed by a simple integral, and it is easy to show that this integral is smaller than

2πM_K / n^K,
M_K being the maximum value of the Kth derivative of ϕ(u). We see then that if the Kth derivative is finite, our mean value will tend towards zero when n increases indefinitely, and that more rapidly than 1/n^(K-1).

The mean value of sin nu when n is very large is therefore zero. To define this value I required a convention, but the result remains the same whatever that convention may be. I have imposed upon myself but slight restrictions when I assumed that the function ϕ(a) is continuous and periodic, and these hypotheses are so natural that we may ask ourselves how they can be escaped. Examination of the three preceding examples, so different in all respects, has already given us a glimpse on the one hand of the rôle of what philosophers call the principle of sufficient reason, and on the other hand of the importance of the fact that certain properties are common to all continuous functions. The study of probability in the physical sciences will lead us to the same result.
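The rapidity of this convergence is easy to witness numerically. The sketch below assumes, purely for illustration, the smooth periodic density ϕ(u) proportional to exp(cos(u − 1)); the offset keeps the means from vanishing for trivial reasons of symmetry, and since all derivatives of this ϕ are finite, the collapse with n is faster than any fixed power of 1/n.

```python
import numpy as np

u = np.linspace(0.0, 2.0 * np.pi, 400_001)
du = u[1] - u[0]
phi = np.exp(np.cos(u - 1.0))    # an arbitrary smooth periodic density
phi /= np.sum(phi) * du          # normalise phi to integrate to 1

for n in (1, 5, 10, 20):
    mean = np.sum(np.sin(n * u) * phi) * du   # the integral of sin(nu) phi(u)
    print(n, mean)               # shrinks far faster than any power of 1/n
```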

III. Probability in the Physical Sciences.—We now come to the problems which are connected with what I have called the second degree of ignorance—namely, those in which we know the law but do not know the initial state of the system. I could multiply examples, but I shall take only one. What is the probable present distribution of the minor planets on the zodiac? We know they obey the laws of Kepler. We may even, without changing the nature of the problem, suppose that their orbits are circular and situated in the same plane, a plane which we are given. On the other hand, we know absolutely nothing about their initial distribution. However, we do not hesitate to affirm that this distribution is now nearly uniform. Why?

Let b be the longitude of a minor planet in the initial epoch—that is to say, the epoch zero. Let a be its mean motion. Its longitude at the present time—i.e., at the time t—will be at + b. To say that the present distribution is uniform is to say that the mean value of the sines and cosines of multiples of at + b is zero. Why do we assert this? Let us represent our minor planet by a point in a plane—namely, the point whose co-ordinates are a and b. All these representative points will be contained in a certain region of the plane, but as they are very numerous this region will appear dotted with points. We know nothing else about the distribution of the points.

Now what do we do when we apply the calculus of probabilities to such a question as this? What is the probability that one or more representative points may be found in a certain portion of the plane? In our ignorance we are compelled to make an arbitrary hypothesis. To explain the nature of this hypothesis I may be allowed to use, instead of a mathematical formula, a crude but concrete image. Let us suppose that over the surface of our plane has been spread imaginary matter, the density of which is variable, but varies continuously. We shall then agree to say that the probable number of representative points to be found on a certain portion of the plane is proportional to the quantity of this imaginary matter which is found there. If there are, then, two regions of the plane of the same extent, the probabilities that a representative point of one of our minor planets is in one or other of these regions will be as the mean densities of the imaginary matter in one or other of the regions.

Here then are two distributions, one real, in which the representative points are very numerous, very close together, but discrete like the molecules of matter in the atomic hypothesis; the other remote from reality, in which our representative points are replaced by imaginary continuous matter. We know that the latter cannot be real, but we are forced to adopt it through our ignorance. If, again, we had some idea of the real distribution of the representative points, we could arrange it so that in a region of some extent the density of this imaginary continuous matter may be nearly proportional to the number of representative points, or, if it is preferred, to the number of atoms which are contained in that region. Even that is impossible, and our ignorance is so great that we are forced to choose arbitrarily the function which defines the density of our imaginary matter. We shall be compelled to adopt a hypothesis from which we can hardly get away; we shall suppose that this function is continuous. That is sufficient, as we shall see, to enable us to reach our conclusion.

What is at the instant t the probable distribution of the minor planets—or rather, what is the mean value of the sine of the longitude at the moment t—i.e., of sin(at + b)? We made at the outset an arbitrary convention, but if we adopt it, this probable value is entirely defined. Let us decompose the plane into elements of surface. Consider the value of sin(at + b) at the centre of each of these elements. Multiply this value by the surface of the element and by the corresponding density of the imaginary matter. Let us then take the sum for all the elements of the plane. This sum, by definition, will be the probable mean value we seek, which will thus be expressed by a double integral. It may be thought at first that this mean value depends on the choice of the function ϕ which defines the density of the imaginary matter, and that, as this function ϕ is arbitrary, we can, according to the arbitrary choice which we make, obtain any mean value whatever. But this is not the case. A simple calculation shows us that our double integral decreases very rapidly as t increases. Thus, I cannot tell what hypothesis to make as to the probability of this or that initial distribution, but when once the hypothesis is made the result will be the same, and this gets me out of my difficulty. Whatever the function ϕ may be, the mean value tends towards zero as t increases, and as the minor planets have certainly accomplished a very large number of revolutions, I may assert that this mean value is very small. I may give to ϕ any value I choose, with one restriction: this function must be continuous; and, in fact, from the point of view of subjective probability, the choice of a discontinuous function would have been unreasonable. What reason could I have, for instance, for supposing that the initial longitude might be exactly 0°, but that it could not lie between 0° and 1°?

The difficulty reappears if we look at it from the point of view of objective probability; if we pass from our imaginary distribution in which the supposititious matter was assumed to be continuous, to the real distribution in which our representative points are formed of discrete atoms. The mean value of sin(at + b) will be represented quite simply by

(1/n) Σ sin(at + b),
n being the number of minor planets. Instead of a double integral referring to a continuous function, we shall have a sum of discrete terms. However, no one will seriously doubt that this mean value is practically very small. Our representative points being very close together, our discrete sum will in general differ very little from an integral. An integral is the limit towards which a sum of terms tends when the number of these terms is indefinitely increased. If the terms are very numerous, the sum will differ very little from its limit—that is to say, from the integral, and what I said of the latter will still be true of the sum itself.

But there are exceptions. If, for instance, for all the minor planets b = π/2 − at, the longitude of all the planets at the time t would be π/2, and the mean value in question would be evidently unity. For this to be the case at the time 0, the minor planets must have all been lying on a kind of spiral of peculiar form, with its spires very close together. All will admit that such an initial distribution is extremely improbable (and even if it were realised, the distribution would not be uniform at the present time—for example, on the 1st January 1900—but it would become so a few years later).

Why, then, do we think this initial distribution improbable? This must be explained, for if we are wrong in rejecting as improbable this absurd hypothesis, our inquiry breaks down, and we can no longer affirm anything on the subject of the probability of this or that present distribution. Once more we shall invoke the principle of sufficient reason, to which we must always recur. We might admit that at the beginning the planets were distributed almost in a straight line. We might admit that they were irregularly distributed. But it seems to us that there is no sufficient reason for the unknown cause that gave them birth to have acted along a curve so regular and yet so complicated, which would appear to have been expressly chosen so that the distribution at the present day would not be uniform.
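The whole argument can be imitated numerically. In the sketch below (with mean motions and initial longitudes drawn from distributions chosen purely for illustration), the mean of sin(at + b) collapses as t grows; the exceptional "spiral" arrangement b = π/2 − at makes it unity at the chosen instant, and only at that instant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a = rng.uniform(1.0, 2.0, n)            # mean motions (arbitrary spread)
b = rng.uniform(0.0, 2.0 * np.pi, n)    # initial longitudes (any smooth law)

for t in (1.0, 10.0, 100.0, 1000.0):
    print(t, np.mean(np.sin(a * t + b)))    # stays close to zero

# The exceptional initial distribution: the spiral b = pi/2 - a*t0.
t0 = 1000.0
b_spiral = np.pi / 2 - a * t0
print(np.mean(np.sin(a * t0 + b_spiral)))          # equal to 1 at t = t0
print(np.mean(np.sin(a * (t0 + 25.0) + b_spiral))) # small again soon after
```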

IV. Rouge et Noir.—The questions raised by games of chance, such as roulette, are, fundamentally, quite analogous to those we have just treated. For example, a wheel is divided into a large number of equal compartments, alternately red and black. A needle is spun round the wheel, and after having moved round a number of times, it stops in front of one of these sub-divisions. The probability that the division is red is obviously 1/2. The needle describes an angle 𝜃, including several complete revolutions. I do not know what is the probability that the needle is spun with such a force that this angle should lie between 𝜃 and 𝜃 + d𝜃, but I can make a convention. I can suppose that this probability is ϕ(𝜃)d𝜃. As for the function ϕ(𝜃), I can choose it in an entirely arbitrary manner. I have nothing to guide me in my choice, but I am naturally induced to suppose the function to be continuous.

Let 𝜖 be the length (measured on the circumference of the circle of radius unity) of each red and black compartment. We have to calculate the integral of ϕ(𝜃)d𝜃, extending it on the one hand to all the red, and on the other hand to all the black compartments, and to compare the results. Consider an interval 2𝜖 comprising two consecutive red and black compartments. Let M and m be the maximum and minimum values of the function ϕ(𝜃) in this interval. The integral extended to the red compartment will be smaller than M𝜖; extended to the black it will be greater than m𝜖. The difference will therefore be smaller than (M − m)𝜖. But if the function ϕ is supposed continuous, and if on the other hand the interval 𝜖 is very small with respect to the total angle described by the needle, the difference M − m will be very small. The difference of the two integrals will therefore be very small, and the probability will be very nearly 1/2. We see that, without knowing anything of the function ϕ, we must act as if the probability were 1/2. And this, on the other hand, explains why, from the objective point of view, if I watch a certain number of coups, observation will give me almost as many black coups as red.

All the players know this objective law; but it leads them into a remarkable error, which has often been exposed, but into which they are always falling. When the red has won, for example, six times running, they bet on black, thinking that they are playing an absolutely safe game, because they say it is a very rare thing for the red to win seven times running. In reality their probability of winning is still 1/2. Observation shows, it is true, that series of seven consecutive reds are very rare, but series of six reds followed by a black are just as rare. They have noticed the rarity of the series of seven reds; if they have not remarked the rarity of six reds followed by a black, it is only because such series strike the attention less.
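That the answer 1/2 survives any reasonable choice of ϕ can be tested directly. The sketch below (with two quite different, arbitrarily chosen laws for the total angle, and a hypothetical wheel of 50 compartments) gives nearly 1/2 in both cases, because both densities are continuous and spread over many compartments.

```python
import numpy as np

rng = np.random.default_rng(1)
compartments = 50                         # hypothetical wheel: 25 red, 25 black
eps = 2.0 * np.pi / compartments          # angular width of one compartment

# Two very different laws phi for the total angle described by the needle:
normal_spin = rng.normal(40.0 * np.pi, 6.0 * np.pi, 1_000_000)
gamma_spin = rng.gamma(9.0, 14.0, 1_000_000)

for theta in (normal_spin, gamma_spin):
    red = (np.floor(theta / eps).astype(np.int64) % 2) == 0
    print(red.mean())                     # close to 1/2 for either law
```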

V. The Probability of Causes.—We now come to the problems of the probability of causes, the most important from the point of view of scientific applications. Two stars, for instance, are very close together on the celestial sphere. Is this apparent contiguity a mere effect of chance? Are these stars, although almost on the same visual ray, situated at very different distances from the earth, and therefore very far indeed from one another? or does the apparent correspond to a real contiguity? This is a problem on the probability of causes.

First of all, I recall that at the outset of all the problems of probability of effects that have occupied our attention up to now, we have had to use a convention which was more or less justified; and if in most cases the result was to a certain extent independent of this convention, it was only on the condition of certain hypotheses which enabled us à priori to reject discontinuous functions, for example, or certain absurd conventions. We shall again find something analogous to this when we deal with the probability of causes. An effect may be produced by the cause a or by the cause b. The effect has just been observed. We ask the probability that it is due to the cause a. This is an à posteriori probability of cause. But I could not calculate it, if a convention more or less justified did not tell me in advance what is the à priori probability for the cause a to come into play—I mean the probability of this event to some one who had not observed the effect.

To make my meaning clearer, I go back to the game of écarté mentioned before. My adversary deals for the first time and turns up a king. What is the probability that he is a sharper? The formulæ ordinarily taught give 8/9, a result which is obviously rather surprising. If we look at it closer, we see that the conclusion is arrived at as if, before sitting down at the table, I had considered that there was one chance in two that my adversary was not honest. An absurd hypothesis, because in that case I should certainly not have played with him; and this explains the absurdity of the conclusion. The convention as to the à priori probability was unjustified, and that is why the conclusion drawn from it à posteriori was inadmissible. The importance of this preliminary convention is obvious. I shall even add that if none were made, the problem of the à posteriori probability would have no meaning. It must be always made either explicitly or tacitly.
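The 8/9, and its dependence on that tacit convention, can be recovered by a direct application of Bayes' rule. The sketch below assumes what the taught formulæ implicitly assume: a sharper turns up the king every time, an honest dealer with probability 1/8, and dishonesty has à priori probability 1/2; replacing that last convention with a saner one changes the conclusion entirely.

```python
def posterior_sharper(prior: float, p_king_honest: float = 1 / 8,
                      p_king_sharper: float = 1.0) -> float:
    """A posteriori probability of a sharper, given one king turned up."""
    joint_sharper = p_king_sharper * prior
    joint_honest = p_king_honest * (1.0 - prior)
    return joint_sharper / (joint_sharper + joint_honest)

print(posterior_sharper(prior=1 / 2))     # 8/9 = 0.888..., the absurd result
print(posterior_sharper(prior=1 / 1000))  # about 0.008 with a saner prior
```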

Let us pass on to an example of a more scientific character. I require to determine an experimental law; this law, when discovered, can be represented by a curve. I make a certain number of isolated observations, each of which may be represented by a point. When I have obtained these different points, I draw a curve between them as carefully as possible, giving my curve a regular form, avoiding sharp angles, accentuated inflexions, and any sudden variation of the radius of curvature. This curve will represent to me the probable law, and not only will it give me the values of the functions intermediary to those which have been observed, but it also gives me the observed values more accurately than direct observation does; that is why I make the curve pass near the points and not through the points themselves.

Here, then, is a problem in the probability of causes. The effects are the measurements I have recorded; they depend on the combination of two causes—the true law of the phenomenon and the errors of observation. Knowing the effects, we have to find the probability that the phenomenon obeys this law or that, and that the observations have been accompanied by this or that error. The most probable law, therefore, corresponds to the curve we have traced, and the most probable error is represented by the distance of the corresponding point from that curve. But the problem has no meaning if, before the observations, I had no à priori idea of the probability of this law or that, or of the chances of error to which I am exposed. If my instruments are good (and I knew whether this was so or not before beginning the observations), I shall not draw the curve far from the points which represent the rough measurements. If they are inferior, I may draw it a little farther from the points, so that I may get a less sinuous curve; much will be sacrificed to regularity.

Why, then, do I draw a curve without sinuosities? Because I consider à priori a law represented by a continuous function (or by a function whose derivatives of high order are small) as more probable than a law not satisfying those conditions. But for this conviction the problem would have no meaning; interpolation would be impossible; no law could be deduced from a finite number of observations; science would cease to exist.
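The preference for a curve without sinuosities can be written down explicitly as a penalty on curvature. The sketch below (a Whittaker-style smoother, offered only as one way of encoding the à priori belief in continuity; the data and the weight lam are invented for illustration) passes near the noisy points rather than through them, exactly as the text prescribes.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(0.0, 0.05, x.size)   # a linear "true law" plus errors

# Minimise |f - y|^2 + lam * |second differences of f|^2.
# lam expresses the a priori belief in continuity; lam = 0 would force the
# curve through every observed point.
lam = 5.0
D2 = np.diff(np.eye(x.size), n=2, axis=0)     # second-difference operator
f = np.linalg.solve(np.eye(x.size) + lam * (D2.T @ D2), y)

print(np.max(np.abs(f - y)))        # the curve passes near, not through
print(np.max(np.abs(f - 2.0 * x)))  # and it lies close to the true law
```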

Fifty years ago physicists considered, other things being equal, a simple law as more probable than a complicated law. This principle was even invoked in favour of Mariotte’s law as against that of Regnault. But this belief is now repudiated; and yet, how many times are we compelled to act as though we still held it! However that may be, what remains of this tendency is the belief in continuity, and as we have just seen, if the belief in continuity were to disappear, experimental science would become impossible.

VI. The Theory of Errors.—We are thus brought to consider the theory of errors, which is directly connected with the problem of the probability of causes. Here again we find effects—to wit, a certain number of irreconcilable observations—and we try to find the causes, which are, on the one hand, the true value of the quantity to be measured, and, on the other, the error made in each isolated observation. We must calculate the probable à posteriori value of each error, and therefore the probable value of the quantity to be measured. But, as I have just explained, we cannot undertake this calculation unless we admit à priori—i.e., before any observations are made—that there is a law of the probability of errors. Is there a law of errors? The law to which all calculators assent is Gauss’s law, which is represented by a certain transcendental curve known as the “bell.”

But it is first of all necessary to recall the classic distinction between systematic and accidental errors. If the metre with which we measure a length is too long, the number we get will be too small, and it will be no use to measure several times—that is a systematic error. If we measure with an accurate metre, we may make a mistake, and find the length sometimes too large and sometimes too small, and when we take the mean of a large number of measurements, the error will tend to grow small. These are accidental errors.

It is clear that systematic errors do not satisfy Gauss’s law, but do accidental errors satisfy it? Numerous proofs have been attempted, almost all of them crude paralogisms. But starting from the following hypotheses we may prove Gauss’s law: the error is the result of a very large number of partial and independent errors; each partial error is very small and obeys any law of probability whatever, provided the probability of a positive error is the same as that of an equal negative error. It is clear that these conditions will be often, but not always, fulfilled, and we may reserve the name of accidental for errors which satisfy them.
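That hypothesis can be watched producing the bell. In the sketch below each total error is the sum of 500 small, symmetric partial errors whose individual law (uniform, chosen arbitrarily) is certainly not Gaussian; the histogram of the totals is nevertheless very nearly Gauss's curve.

```python
import numpy as np

rng = np.random.default_rng(3)

# 50,000 observations, each the sum of 500 small independent partial errors.
partial = rng.uniform(-0.01, 0.01, size=(50_000, 500))
total = partial.sum(axis=1)

# A crude text histogram: the familiar bell appears.
counts, edges = np.histogram(total, bins=15)
for c, left in zip(counts, edges):
    print(f"{left:+.2f} {'#' * (60 * c // counts.max())}")
```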

We see that the method of least squares is not legitimate in every case; in general, physicists are more distrustful of it than astronomers. This is no doubt because the latter, apart from the systematic errors to which they and the physicists are subject alike, have to contend with an extremely important source of error which is entirely accidental—I mean atmospheric undulations. So it is very curious to hear a discussion between a physicist and an astronomer about a method of observation. The physicist, persuaded that one good measurement is worth more than many bad ones, is pre-eminently concerned with eliminating, by means of every precaution, the last traces of systematic error; the astronomer retorts: “But you can only observe a small number of stars in that way, and the accidental errors will not disappear.”

What conclusion must we draw? Must we continue to use the method of least squares? We must distinguish. We have eliminated all the systematic errors of which we have any suspicion; we are quite certain that there are others still, but we cannot detect them; and yet we must make up our minds and adopt a definitive value which will be regarded as the probable value; and for that purpose it is clear that the best thing we can do is to apply Gauss’s law. We have only applied a practical rule referring to subjective probability. And there is no more to be said.

Yet we want to go farther and say that not only is the probable value so much, but that the probable error in the result is so much. This is absolutely invalid: it would be true only if we were sure that all the systematic errors were eliminated, and of that we know absolutely nothing. We have two series of observations; by applying the method of least squares we find that the probable error in the first series is only half as great as in the second. The second series may, however, be more accurate than the first, because the first is perhaps affected by a large systematic error. All that we can say is that the first series is probably better than the second because its accidental error is smaller, and that we have no reason for affirming that the systematic error is greater for one of the series than for the other, our ignorance on this point being absolute.
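The warning is easy to dramatise with invented numbers. In the sketch below the first series has a tiny accidental error but a hidden systematic bias, and the second a larger accidental error and no bias; the classical probable error (0.6745 times the standard error) duly declares the first series the better, and is wrong.

```python
import numpy as np

rng = np.random.default_rng(4)
true_value = 10.0

series1 = true_value + 0.30 + rng.normal(0.0, 0.01, 50)   # hidden bias of 0.30
series2 = true_value + rng.normal(0.0, 0.02, 50)          # no bias

for s in (series1, series2):
    probable_error = 0.6745 * s.std(ddof=1) / np.sqrt(s.size)
    print(f"mean {s.mean():.4f}  probable error {probable_error:.5f}  "
          f"actual error {abs(s.mean() - true_value):.4f}")
```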

VII. Conclusions.—In the preceding lines I have set several problems, and have given no solution. I do not regret this, for perhaps they will invite the reader to reflect on these delicate questions.

However that may be, there are certain points which seem to be well established. To undertake the calculation of any probability, and even for that calculation to have any meaning at all, we must admit, as a point of departure, an hypothesis or convention which has always something arbitrary about it. In the choice of this convention we can be guided only by the principle of sufficient reason. Unfortunately, this principle is very vague and very elastic, and in the cursory examination we have just made we have seen it assume different forms. The form under which we meet it most often is the belief in continuity, a belief which it would be difficult to justify by apodeictic reasoning, but without which all science would be impossible. Finally, the problems to which the calculus of probabilities may be applied with profit are those in which the result is independent of the hypothesis made at the outset, provided only that this hypothesis satisfies the condition of continuity.