On A New Axiomatic Theory of Probability

By
ALFRÉD RÉNYI (Budapest), corresponding member of the Academy
Dedicated to Professors L. FEJÉR and F. RIESZ on their 75th birthday
Introduction
The axiomatic foundation of probability theory given by A. N. KOLMOGOROV [1] in the year 1933 has been the starting point of a new and brilliant period in the development of probability theory. According to this theory, to every situation (experiment, observation etc.) in which chance plays a role, there corresponds a probability space $[S, \mathcal{A}, P]$, i. e. an abstract space $S$ (the space of elementary events), a $\sigma$-algebra $\mathcal{A}$ (the set of events) of subsets of $S$, and a measure $P(A)$ (the probability of the event $A$) defined for $A \in \mathcal{A}$ and satisfying $P(S) = 1$. The theory of KOLMOGOROV furnished an appropriate and mathematically exact basis for the rapid development of probability theory, which took place in the last 30 years, as well as for its fruitful application in a great number of branches of science, including other chapters of mathematics too. Nevertheless in the course of development there arose some problems concerning probability which can not be fitted into the frames of the theory of KOLMOGOROV.
The common feature of these problems is that in them unbounded measures occur, while in the theory of KOLMOGOROV probability is a bounded measure normed by the condition $P(S) = 1$. Unbounded measures occur in statistical mechanics, quantum mechanics, in some problems of mathematical statistics (for example in connection with the application of the theorem of Bayes), as limiting distributions of Markov chains and Markov processes, in integral geometry, in connection with the applications of probability concepts in number theory etc.
In the theory of KOLMOGOROV it has, for instance, no sense to speak about a probability distribution which is uniform on the whole real axis or on the whole plane; further it has no sense to say that we choose an integer in such a way that all integers (or all non-negative integers) are equiprobable.
At first glance it seems that unbounded measures can play no role in probability theory, because, in view of the connection between probability and relative frequency, probability clearly can not take any value greater than 1. But if we observe more attentively how unbounded measures are really used in all cases mentioned above, we see that unbounded measures are used only to calculate conditional probabilities as the quotient of the values of the unbounded measure of two sets (the first being contained in the second) and in this way reasonable values (not exceeding 1) are obtained. This is the reason why unbounded measures can be used with success in calculating (conditional) probabilities. But as the use of unbounded measures can not be justified in the theory of KOLMOGOROV, the necessity arises to generalize this theory. In the present paper such an attempt is made.
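The quotient rule described above can be illustrated by a short Python sketch. It is not taken from the paper: it assumes, for illustration only, that the unbounded measure is the length (Lebesgue measure) of intervals on the real axis, and the function names `length_of_overlap` and `cond_prob` are hypothetical.

```python
from fractions import Fraction


def length_of_overlap(a, b):
    """Length of the intersection of two intervals given as (lo, hi) pairs."""
    lo = max(a[0], b[0])
    hi = min(a[1], b[1])
    return max(Fraction(0), Fraction(hi) - Fraction(lo))


def cond_prob(a, b):
    """Quotient m(A ∩ B) / m(B) for a bounded interval B of positive length."""
    m_b = Fraction(b[1]) - Fraction(b[0])
    if m_b <= 0:
        raise ValueError("the condition B must have positive, finite length")
    return length_of_overlap(a, b) / m_b


# The whole real axis has infinite length, yet each quotient is at most 1.
print(cond_prob((0, 1), (0, 4)))       # 1/4
print(cond_prob((-5, 5), (0, 1000)))   # 1/200
```

Only quotients relative to a condition of finite positive measure are ever formed, which is why the values stay between 0 and 1 although the total measure is infinite.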
Clearly in a theory in which unbounded measures are allowed, conditional probability must be taken as the fundamental concept. This is natural also from another point of view. In fact, the probability of an event depends essentially on the circumstances under which the event possibly occurs, and it is a commonplace to say that in reality every probability is conditional. This has been realized by several authors; I mention here without aiming at completeness only H. JEFFREYS [2], H. REICHENBACH [3], J. KEYNES [4], R. KOOPMAN [5], A. COPELAND [6], G. A. BARNARD [7] and I. J. GOOD [8]. But none of the mentioned authors developed his theory on a measure-theoretic basis. The axiomatic theory developed in the present paper combines the measure-theoretic treatment of KOLMOGOROV with the idea proposed by the authors mentioned (and also by others) to consider conditional probability as the fundamental concept.
The novelty of the theory lies only in this combination. It follows from what has been said that in developing the theory proposed in this paper we follow the same way as KOLMOGOROV, only we try to go one step further. Thus, this new theory should be considered as a generalization of that of KOLMOGOROV. In fact, it contains the theory of KOLMOGOROV as a special case, but includes also cases which can not be fitted into the theory of KOLMOGOROV, namely cases in which conditional probabilities are calculated by means of unbounded measures.
Among the authors mentioned above only H. JEFFREYS explicitly uses unbounded probability distributions (especially random variables $\xi$ for which $\log \xi$ is uniformly distributed on the whole real axis), but he does not give an exact mathematical meaning to such distributions, and restricts himself to the remark that it is a mere convention that to a certain event there corresponds the probability 1; he says that in some cases instead of 1 the value $+\infty$ can also be taken.
The attempt to extend the scope of the mathematical theory of probability with the aim of giving a well founded basis for calculating procedures which were successfully used in many fields of application without being rigorously justified mathematically has many analogues in the history of mathematics. I mention only, as one of the latest such successful attempts, that of S. L. SOBOLEV and L. SCHWARTZ, concerning the generalization of the notion of a function, to include, for instance, the function of DIRAC, together with its derivatives etc.¹
The theory as presented in this paper is still far from being fully developed; we restrict ourselves to giving here only the axioms and their immediate consequences (§ 1), further to discussing some examples (§ 2) and applications (§ 3), and finally to developing some "conditional" laws of large numbers (§ 4). As an application of the conditional strong law of large numbers, a generalization of BOREL'S theorem on normal decimals for CANTOR'S series is given. I hope to return to the questions left unsolved in a forthcoming publication.
I had the occasion to lecture on the present theory in 1954 at several congresses: in Budapest [9] at the General Assembly of the Hungarian Academy in May 1954, at the Conference on probability and statistics in Prague in June 1954, at the International Congress in Amsterdam [10] in September 1954, at the Conference on probability and statistics in Berlin [11] in October 1954 and at the Conference on stochastic processes in Wrocław in December 1954. On these (and other) occasions many valuable remarks have been made. I mention only the following ones: B. V. GNEDENKO kindly informed me in Prague in June 1954 that in a lecture held some years ago in Moscow A. N. KOLMOGOROV himself has put forward the idea to develop his theory in such a manner that conditional probability should be taken as the fundamental concept, but he never published his ideas regarding this question. I was glad to learn from this information that my attempt, besides being a continuation of the work of KOLMOGOROV, follows the lines which have been pointed out by himself. In § 3 an inequality of J. HÁJEK is used which he communicated to me in Prague in June 1954. A proof of this inequality is published in a joint paper of the author and J. HÁJEK [12] in this volume. Some interesting measure-theoretic problems, which arose in connection with the present paper, have been solved by Á. CSÁSZÁR: his results, which he presented for the first time as comments to my lecture [9], are published in his paper [13] in this volume and settle the question under what conditions the conditional probability, introduced in the present paper as a set function of two set variables, can be expressed in quotient form by means of (one or more) set functions of one variable. D. VAN DANTZIG called my attention to the work of KEYNES, COPELAND and KOOPMAN. JU. V. LINNIK called my attention to a paper of J. NEYMAN [14], where the foundations of the theory of probability are sketched in a form which has some points of contact with the point of view of the present paper. In Berlin A. N. KOLMOGOROV called my attention to those conditional probability spaces which I call "Cavalieri spaces" (§ 2). I hope to return to the thorough investigation of such spaces in a forthcoming paper. I owe to a discussion with E. MARCZEWSKI Theorem 14 of § 2. The result of 3.5 has been suggested to the author by a remark of K. SARKADI. Á. CSÁSZÁR and J. CZIPSZER kindly read the manuscript of the present paper and made some valuable remarks which I have utilized in preparing the final form of this paper.
I am thankful to all those mentioned for their remarks and suggestions.

¹ The Dirac $\delta$-"function" and the uniform distribution on the whole real axis (or the whole space etc.) arise in the same way in quantum mechanics and they are in a certain sense dual to each other. As a matter of fact, it is easy to verify the following fact connected with Heisenberg's uncertainty relation (for the sake of simplicity we restrict ourselves to the one dimensional case): if the wave function $\psi(x)$ of the position of a particle degenerates into a Dirac $\delta$-"function", the wave function $\varphi(p)$ of the impulse of the particle degenerates into a function for which $|\varphi(p)|^2 = \text{const.}$ is the "density" of a "uniform probability distribution in the whole phase space". This can be shown as follows: as it is known, denoting by $\psi(x)$ the wave function of the position and by $\varphi(p)$ the wave function of the impulse of a particle, we have
$$\varphi(p) = \frac{1}{\sqrt{2\pi\hbar}} \int_{-\infty}^{+\infty} \psi(x)\, e^{\frac{ipx}{\hbar}}\, dx.$$
This formula shows that if $\psi(x)$ degenerates into the Dirac function $\delta(x - x_0)$, we have $\varphi(p) = \frac{1}{\sqrt{2\pi\hbar}}\, e^{\frac{ipx_0}{\hbar}}$, for which $|\varphi(p)|^2 = \text{const.}$
§ 1. The axioms and their immediate consequences
1.1. Notations. In what follows if $A$ and $B$ are sets, we denote by $A + B$ the sum of the sets $A$ and $B$ (i. e. the set of those elements which belong to at least one of the sets $A$ and $B$); $AB$ denotes the product of the sets $A$ and $B$ (i. e. the set of those elements which belong to both of the sets $A$ and $B$); to denote the sum resp. the product of a finite or infinite family of sets, we use also the notation $\sum$ resp. $\prod$. The empty set will be denoted by $O$; $A \subseteq B$ expresses the fact that $A$ is a subset of $B$; the subset of $B$ consisting of those elements of $B$ which do not belong to $A$ will be denoted by $B - A$. If $a$ is an element of the set $A$, this will be denoted by $a \in A$. If $a$ does not belong to the set $A$, this will be denoted by $a \notin A$.
1.2. Definitions and axioms. Let us be given an arbitrary set $S$; the elements of $S$, which will be denoted by small letters $a, b, \ldots$, will be called elementary events. Let $\mathcal{A}$ denote a $\sigma$-algebra of subsets of $S$; the subsets of $S$ which are elements of $\mathcal{A}$ will be denoted by capital letters $A, B, C, \ldots$ and called random events, or simply events. [The supposition that $\mathcal{A}$ is a $\sigma$-algebra means (see [15], p. 28) that 1. if $A_n \in \mathcal{A}$ $(n = 1, 2, \ldots)$, we have $\sum_{n=1}^{\infty} A_n \in \mathcal{A}$; 2. if $A \in \mathcal{A}$, we have $S - A \in \mathcal{A}$; 3. $\mathcal{A}$ is not empty. This implies that $O \in \mathcal{A}$ and $S \in \mathcal{A}$.] Let us suppose further that a non-empty subset $\mathcal{B}$ of $\mathcal{A}$ is given; we do not suppose any restrictions regarding the set $\mathcal{B}$.² We suppose finally that a set function $P(A \mid B)$ of two set variables is defined for $A \in \mathcal{A}$ and $B \in \mathcal{B}$; $P(A \mid B)$ will be called the conditional probability of the event $A$ with respect to the event $B$. As the conditional probability of the event $A \in \mathcal{A}$ with respect to the event $B$ is defined if and only if $B$ belongs to $\mathcal{B}$, $\mathcal{B}$ may be called the set of possible conditions. We suppose that the set function $P(A \mid B)$ satisfies the following axioms:
AXIOM I. $P(A \mid B) \geq 0$, if $A \in \mathcal{A}$ and $B \in \mathcal{B}$; further $P(B \mid B) = 1$, if $B \in \mathcal{B}$.
AXIOM II. For any fixed $B \in \mathcal{B}$, $P(A \mid B)$ is a measure, i. e. a countably additive set function of $A \in \mathcal{A}$, i. e. if $A_n \in \mathcal{A}$ $(n = 1, 2, \ldots)$ and $A_j A_k = O$ for $j \neq k$ $(j, k = 1, 2, \ldots)$, we have
$$P\Big(\sum_{n=1}^{\infty} A_n \,\Big|\, B\Big) = \sum_{n=1}^{\infty} P(A_n \mid B).$$
AXIOM III.³ If $A \in \mathcal{A}$, $B \in \mathcal{A}$, $C \in \mathcal{B}$, and $BC \in \mathcal{B}$, we have
$$P(A \mid BC)\, P(B \mid C) = P(AB \mid C).$$
If the Axioms I-III are satisfied, we shall call the set $S$, together with the $\sigma$-algebra $\mathcal{A}$ of subsets of $S$, the subset $\mathcal{B}$ of $\mathcal{A}$ and the set function $P(A \mid B)$ defined for $A \in \mathcal{A}$, $B \in \mathcal{B}$, a conditional probability space and denote it for the sake of brevity by $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$.
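The axioms can be checked mechanically on a small example. The following Python sketch is an illustration under stated assumptions and is not part of the paper: it takes the counting measure on a finite window of integers, lets every non-empty subset be an admissible condition, and uses the hypothetical names `P`, `subsets` and `check_axioms`. On a finite universe only finite additivity can be tested, so Axiom II is checked in that weaker form.

```python
from fractions import Fraction
from itertools import combinations


def P(A, B):
    """P(A | B) = |A ∩ B| / |B| for a non-empty finite set B."""
    if not B:
        raise ValueError("the empty set is not an admissible condition")
    return Fraction(len(A & B), len(B))


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_axioms(universe):
    events = subsets(universe)
    conditions = [B for B in events if B]
    for B in conditions:
        assert P(B, B) == 1                          # Axiom I, second part
        for A in events:
            assert P(A, B) >= 0                      # Axiom I, first part
            for A2 in events:
                if not (A & A2):                     # Axiom II, finite form
                    assert P(A | A2, B) == P(A, B) + P(A2, B)
    for C in conditions:                             # Axiom III
        for B in events:
            if B & C:                                # BC must be a condition
                for A in events:
                    assert P(A, B & C) * P(B, C) == P(A & B, C)
    return True


print(check_axioms({-2, -1, 0, 1}))
```

Exact rational arithmetic (`Fraction`) is used so that the multiplicative identity of Axiom III is tested without rounding error.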
1.3. Connection with Kolmogorov's theory. If $P(A)$ is a measure (i. e. a countably additive and non-negative set function) defined on the $\sigma$-algebra $\mathcal{A}$ of subsets of the set $S$, if further $P(S) = 1$, then the triple $[S, \mathcal{A}, P(A)]$ is called a probability space in the sense of KOLMOGOROV, and if we define $\mathcal{B}^*$ as the set of those sets $B \in \mathcal{A}$ for which $P(B) > 0$ and put $P(A \mid B) = \frac{P(AB)}{P(B)}$ for $A \in \mathcal{A}$, $B \in \mathcal{B}^*$, clearly $[S, \mathcal{A}, \mathcal{B}^*, P(A \mid B)]$ is a conditional probability space which will be called the conditional probability space generated by the probability space $[S, \mathcal{A}, P(A)]$.
Conversely, if $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ is a conditional probability space and $C$ is an arbitrary element of $\mathcal{B}$, putting $P_C(A) = P(A \mid C)$, $[S, \mathcal{A}, P_C(A)]$ will be a probability space in the sense of KOLMOGOROV. Thus a conditional probability space is nothing else than a set of ordinary probability spaces which are connected with each other by Axiom III. This connection is such that it is in conformity with the usual definition of conditional probability. Namely, if we put $P_C(A) = P(A \mid C)$ for $A \in \mathcal{A}$ with $C \in \mathcal{B}$ fixed, and define the conditional probability $P^*(A \mid B)$ for a $B \in \mathcal{A}$ for which $P_C(B) > 0$, as usual in the theory of KOLMOGOROV, by $P^*(A \mid B) = \frac{P_C(AB)}{P_C(B)}$, we have by Axiom III
$$P^*(A \mid B) = \frac{P(AB \mid C)}{P(B \mid C)} = P(A \mid BC).$$
In case $S \in \mathcal{B}$, clearly $[S, \mathcal{A}, P_S(A)]$ is a probability space in the sense of KOLMOGOROV. It must be mentioned that in this case $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ may not be identical with the conditional probability space generated by $[S, \mathcal{A}, P_S(A)]$, because $\mathcal{B}$ may contain sets $B$ for which $P(B \mid S) = 0$ and at the same time need not contain every set $B$ for which $P(B \mid S) > 0$, i. e. the system $\mathcal{B}^*$ consisting of all sets $B \in \mathcal{A}$ for which $P(B \mid S) > 0$ need not be identical with $\mathcal{B}$. However, if $P^*(A \mid B)$ is defined by $P^*(A \mid B) = \frac{P(AB \mid S)}{P(B \mid S)}$ for $B \in \mathcal{B}^*$, we have $P^*(A \mid B) = P(A \mid B)$, provided that $B \in \mathcal{B}$.

² It will be seen that our axioms imply that $O \notin \mathcal{B}$, but it is possible that $\mathcal{B}$ contains all elements of $\mathcal{A}$ except $O$; on the other hand, it is possible that $\mathcal{B}$ contains only one set. The theory is somewhat less general, but considerably simpler if it is supposed that $\mathcal{B}$ is an additive class of sets, i. e. if from $B_1 \in \mathcal{B}$ and $B_2 \in \mathcal{B}$ it follows $B_1 + B_2 \in \mathcal{B}$. Concerning the consequences of this supposition see [13].
³ See 1.7 where an equivalent axiom (Axiom III') is discussed, and 1.8 where a stronger form of this axiom is mentioned.
1.4. Immediate consequences of the axioms. We shall prove some simple theorems which follow from our axioms. In what follows if $P(A \mid B)$ occurs, it is always tacitly assumed that $A \in \mathcal{A}$ and $B \in \mathcal{B}$. We denote the set $S - A$ by $\bar{A}$.
THEOREM 1. $P(A \mid B) = P(AB \mid B)$.
PROOF. If in Axiom III $C = B$, we have $P(A \mid B)\, P(B \mid B) = P(AB \mid B)$. Taking Axiom I into account Theorem 1 follows.
REMARK. It follows from Theorem 1 that $P(S \mid B) = 1$; namely, by Theorem 1 $P(S \mid B) = P(SB \mid B) = P(B \mid B)$ and thus, by Axiom I, $P(S \mid B) = 1$.
THEOREM 2. $P(A \mid B) \leq 1$.
PROOF. According to Axiom II we have $P(AB \mid B) + P(\bar{A}B \mid B) = P(B \mid B)$. As by Axiom I $P(B \mid B) = 1$ and $P(\bar{A}B \mid B) \geq 0$, it follows $P(AB \mid B) \leq 1$ and thus by Theorem 1 we obtain $P(A \mid B) \leq 1$.
THEOREM 3. $P(O \mid B) = 0$.
PROOF. According to Axiom II $P(O \mid B) = P(O + O \mid B) = P(O \mid B) + P(O \mid B)$ and thus $P(O \mid B) = 0$.
REMARK 1. It follows from Theorem 3 that $O \notin \mathcal{B}$, because if $O$ belonged to $\mathcal{B}$, we should have $P(O \mid O) = 1$ by Axiom I and $P(O \mid O) = 0$ by Theorem 3; thus the assumption $O \in \mathcal{B}$ leads to a contradiction.
REMARK 2. It follows from Theorems 1 and 3 that if $AB = O$, then $P(A \mid B) = 0$.
THEOREM 4. $P(A \mid BC)\, P(B \mid C) = P(B \mid AC)\, P(A \mid C)$.
PROOF. By Axiom III both expressions are equal to $P(AB \mid C)$ and thus to each other.
THEOREM 5. If $A \subseteq A' \subseteq B \subseteq B'$, we have
$$P(A \mid B') \leq P(A' \mid B).$$
PROOF. We have
$$P(A \mid B') = P(AA'B \mid B') = P(AA' \mid BB')\, P(B \mid B') \leq P(AA' \mid B) = P(A' \mid B) - P(\bar{A}A' \mid B) \leq P(A' \mid B).$$
REMARK 1. If $A = A'$, we obtain the following special case of Theorem 5: if $A \subseteq B \subseteq B'$, we have $P(A \mid B') \leq P(A \mid B)$.
REMARK 2. If $B = B'$, we obtain the following special case of Theorem 5: if $A \subseteq A'$, we have (without supposing that $A' \subseteq B$) $P(A \mid B) \leq P(A' \mid B)$.⁴ As a matter of fact, $P(A \mid B) = P(AB \mid B)$ and $P(A' \mid B) = P(A'B \mid B)$ by Theorem 1; if $A \subseteq A'$, we have $AB \subseteq A'B \subseteq B$, and Theorem 5 can be applied.
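THEOREM 6. If $A_1 + A_2 \subseteq B_1 B_2 \in \mathcal{B}$, further $P(A_2 \mid B_1)\, P(A_2 \mid B_2) > 0$, we have
$$\frac{P(A_1 \mid B_1)}{P(A_2 \mid B_1)} = \frac{P(A_1 \mid B_2)}{P(A_2 \mid B_2)}.$$
PROOF. According to Axiom III
$$P(A_1 \mid B_1 B_2)\, P(B_1 \mid B_2) = P(A_1 B_1 \mid B_2) = P(A_1 \mid B_2) \tag{1}$$
and similarly
$$P(A_2 \mid B_1 B_2)\, P(B_1 \mid B_2) = P(A_2 B_1 \mid B_2) = P(A_2 \mid B_2). \tag{2}$$
Dividing (1) by (2) we obtain
$$\frac{P(A_1 \mid B_1 B_2)}{P(A_2 \mid B_1 B_2)} = \frac{P(A_1 \mid B_2)}{P(A_2 \mid B_2)}. \tag{3}$$
Interchanging $B_1$ and $B_2$ in (3) we obtain
$$\frac{P(A_1 \mid B_1 B_2)}{P(A_2 \mid B_1 B_2)} = \frac{P(A_1 \mid B_1)}{P(A_2 \mid B_1)}. \tag{4}$$
Comparing (3) with (4), Theorem 6 follows.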
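Theorem 6 says that the ratio of the conditional probabilities of two events does not depend on which (sufficiently large) condition is used. A small numerical check in Python, offered only as an illustration: the counting-measure function `P` and the particular sets below are assumptions made for this example and do not come from the paper.

```python
from fractions import Fraction


def P(A, B):
    """Counting-measure conditional probability |A ∩ B| / |B|."""
    return Fraction(len(A & B), len(B))


A1 = frozenset({1, 2, 3})
A2 = frozenset({3, 4})
B1 = frozenset(range(0, 10))    # {0, ..., 9}
B2 = frozenset(range(1, 21))    # {1, ..., 20}

# Hypothesis of Theorem 6: A1 + A2 is contained in B1 B2.
assert (A1 | A2) <= (B1 & B2)

lhs = P(A1, B1) / P(A2, B1)
rhs = P(A1, B2) / P(A2, B2)
print(lhs, rhs)   # 3/2 3/2
```

The individual probabilities change with the condition (3/10 against 3/20 for the first event), while their ratio stays the same.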
THEOREM 7. If $C \subseteq B = \sum_{k=1}^{\infty} B_k$ and $A B_j B_k C = O$ for $j \neq k$ $(j, k = 1, 2, \ldots)$, we have
$$P(A \mid C) = \sum_{k=1}^{\infty} P(A \mid B_k C)\, P(B_k \mid C).$$
PROOF. By Axiom III $P(A \mid B_k C)\, P(B_k \mid C) = P(AB_k \mid C)$ and thus by Axiom II and applying Theorem 1 twice, it follows
$$\sum_{k=1}^{\infty} P(A \mid B_k C)\, P(B_k \mid C) = \sum_{k=1}^{\infty} P(AB_k \mid C) = P(AB \mid C) = P(ABC \mid C) = P(AC \mid C) = P(A \mid C).$$

⁴ It should be mentioned that this special case of Theorem 5 follows from Axioms I and II; Axiom III is not needed.
REMARK. We mention the following consequences of Theorem 7: Let us suppose $B_k \in \mathcal{B}$ and $B_j B_k = O$ for $j \neq k$, $B = \sum_{k=1}^{\infty} B_k$ and $B \in \mathcal{B}$; if $P(A \mid B_k) \leq \lambda P(A' \mid B_k)$ for $k = 1, 2, \ldots$ where $\lambda \geq 0$, we have $P(A \mid B) \leq \lambda P(A' \mid B)$.
Especially if $P(A \mid B_k) = \lambda P(A' \mid B_k)$ for $k = 1, 2, \ldots$, we have $P(A \mid B) = \lambda P(A' \mid B)$; further if $P(A \mid B_k) \leq \lambda$ for $k = 1, 2, \ldots$, we have $P(A \mid B) \leq \lambda$.
PROOF. Applying Theorem 7 twice with $C = B$ we obtain
$$P(A \mid B) = \sum_{k=1}^{\infty} P(A \mid B_k)\, P(B_k \mid B) \leq \lambda \sum_{k=1}^{\infty} P(A' \mid B_k)\, P(B_k \mid B) = \lambda P(A' \mid B).$$
The two other assertions are evident consequences.
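Theorem 7 is the conditional form of the theorem of total probability, and it can be checked numerically even when the underlying measure is unbounded. The sketch below is only an illustration: the weight $1 + k^2$ on the integers, and the names `weight`, `Q` and `P`, are assumptions chosen for this example and are not taken from the paper.

```python
from fractions import Fraction


def weight(k):
    """Hypothetical weight of the integer k; the total mass is infinite."""
    return 1 + k * k


def Q(A):
    """Measure of a finite set of integers."""
    return sum(weight(k) for k in A)


def P(A, B):
    """Conditional probability in quotient form, Q(A ∩ B) / Q(B)."""
    return Fraction(Q(A & B), Q(B))


C = frozenset(range(-3, 4))                       # {-3, ..., 3}
parts = [frozenset({-3, -2}), frozenset({-1, 0, 1}), frozenset({2, 3})]
A = frozenset({-3, 0, 1, 3, 7})

# The parts are pairwise disjoint and their sum is C.
assert frozenset().union(*parts) == C

total = sum(P(A, Bk & C) * P(Bk, C) for Bk in parts)
print(total, P(A, C))   # 23/35 23/35
```

The element 7 lies outside the condition and contributes nothing, in agreement with Theorem 1.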

1.5. Representation of the conditional probability as a quotient. We shall give a sufficient condition under which the set function $P(A \mid B)$ of two set variables can be represented in "quotient form", i. e. in the form $P(A \mid B) = \frac{Q(AB)}{Q(B)}$ where the set function $Q(A)$ is a measure on $\mathcal{A}$ and satisfies $Q(B) > 0$ if $B \in \mathcal{B}$.
THEOREM 8. Let $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ be a conditional probability space. Let us suppose that there exists a sequence of sets $B_n$ $(n = 0, 1, \ldots)$ for which the following properties hold:
a) $B_n \subseteq B_{n+1}$ $(n = 0, 1, \ldots)$,
b) $P(B_0 \mid B_n) > 0$ $(n = 1, 2, \ldots)$,
c) for any $B \in \mathcal{B}$ there can be found a $B_n$ for which $B \subseteq B_n$ and $P(B \mid B_n) > 0$.
Then there exists a finite measure $Q(C)$ defined for $C \in \mathcal{A}^*$, where $\mathcal{A}^*$ is the ring of those sets $C \in \mathcal{A}$ for which there can be found a $B_n$ with $C \subseteq B_n$, and this measure $Q(C)$ has the following properties:
$$Q(B) > 0 \quad \text{if } B \in \mathcal{B},$$
$$P(A \mid B) = \frac{Q(AB)}{Q(B)}.$$
If the sequence $B_n$ satisfies besides a), b), c) also the following condition:
d) $\lim_{n \to \infty} P(B_0 \mid B_n) > 0$,
then $Q(C)$ can be defined for all $C \in \mathcal{A}$ and is a bounded measure on $\mathcal{A}$, and thus, putting $P(C) = \frac{Q(C)}{Q(S)}$, we have $P(S) = 1$. Denoting by $\mathcal{B}^*$ the set of those sets $B \in \mathcal{A}$ for which $P(B) > 0$, if $\mathcal{B}^*$ is not identical with $\mathcal{B}$, we may extend the definition of $P(A \mid B)$ to all $B \in \mathcal{B}^*$ putting
$$P(A \mid B) = \frac{P(AB)}{P(B)};$$
the conditional probability space $[S, \mathcal{A}, \mathcal{B}^*, P(A \mid B)]$ obtained in this way will be identical with the conditional probability space generated by the ordinary probability space $[S, \mathcal{A}, P(A)]$.
PROOF. First we suppose only that the sequence $B_n$ has the properties a), b) and c). Clearly $\mathcal{A}^*$ is a ring and we have $\mathcal{B} \subseteq \mathcal{A}^* \subseteq \mathcal{A}$. Let us consider a set $A \in \mathcal{A}^*$, choose an index $n$ for which $A \subseteq B_n$ and define $Q(A)$ as follows:
$$Q(A) = \frac{P(A \mid B_n)}{P(B_0 \mid B_n)}. \tag{5}$$
It is clear that the value of $Q(A)$ does not depend on the choice of $n$. As a matter of fact, if $A \subseteq B_n$ and $A \subseteq B_m$ where $n < m$, we have by Theorem 6
$$\frac{P(A \mid B_n)}{P(B_0 \mid B_n)} = \frac{P(A \mid B_m)}{P(B_0 \mid B_m)}.$$
Now, if $B \in \mathcal{B}$, $Q(B) > 0$, we have
$$P(A \mid B) = \frac{Q(AB)}{Q(B)}. \tag{6}$$
This can be shown as follows: if $B \subseteq B_n$ and $P(B \mid B_n) > 0$, we have
$$\frac{Q(AB)}{Q(B)} = \frac{P(AB \mid B_n)}{P(B_0 \mid B_n)} \cdot \frac{P(B_0 \mid B_n)}{P(B \mid B_n)} = \frac{P(AB \mid B_n)}{P(B \mid B_n)}.$$
Applying Axiom III we obtain
$$\frac{Q(AB)}{Q(B)} = P(A \mid BB_n).$$
As $BB_n = B$, (6) is proved.
From (5) it can be seen that $Q(A)$ is non-negative. To show that $Q(A)$ is a measure, we have to prove that $Q(A)$ is countably additive on $\mathcal{A}^*$, i. e. if $A_k \in \mathcal{A}^*$ $(k = 1, 2, \ldots)$ and $A_j A_k = O$ for $j \neq k$ and $\sum_{k=1}^{\infty} A_k = A \in \mathcal{A}^*$, then $Q(A) = \sum_{k=1}^{\infty} Q(A_k)$. This follows simply from the remark that if $A \subseteq B_n$, we have $A_k \subseteq B_n$ for $k = 1, 2, \ldots$ and thus in the relations
$$Q(A_k) = \frac{P(A_k \mid B_n)}{P(B_0 \mid B_n)} \quad (k = 1, 2, \ldots); \qquad Q(A) = \frac{P(A \mid B_n)}{P(B_0 \mid B_n)}$$
the same $B_n$ may be used, and therefore the countable additivity of $Q(A)$ follows from that of $P(A \mid B)$ for fixed $B \in \mathcal{B}$ (Axiom II).
Thus the first part of Theorem 8 is proved. Now we suppose that the sequence $B_n$ has besides a)-c) also the property d). By a well-known theorem (see e. g. [15], § 13, Theorem A) the definition of $Q(A)$ can be extended to the smallest $\sigma$-ring $\mathcal{A}^{**}$ containing $\mathcal{A}^*$ in such a manner that $Q(A)$ remains countably additive on $\mathcal{A}^{**}$. Let us put $S^* = \sum_{n=0}^{\infty} B_n$; we shall show that $\mathcal{A}^{**}$ is identical with the $\sigma$-algebra $\mathcal{A}S^*$, i. e. with the set of all sets of the form $AS^*$ where $A \in \mathcal{A}$. As a matter of fact, $\mathcal{A}^* \subseteq \mathcal{A}S^*$ and $\mathcal{A}S^*$ is a $\sigma$-ring, thus we have clearly $\mathcal{A}^{**} \subseteq \mathcal{A}S^*$. On the other hand, if $A \in \mathcal{A}$, we have
$$AS^* = \sum_{n=0}^{\infty} AB_n.$$
Now $AB_n \subseteq B_n$, and thus $AB_n \in \mathcal{A}^*$, and therefore $AS^* \in \mathcal{A}^{**}$; thus $\mathcal{A}S^* \subseteq \mathcal{A}^{**}$, which implies $\mathcal{A}S^* = \mathcal{A}^{**}$. Thus the definition of $Q(A)$ can be extended to all $A \in \mathcal{A}S^*$. We prove now that $Q(A)$ is bounded on $\mathcal{A}S^*$. To show this it is sufficient to prove that $Q(S^*)$ is finite. But $S^* = \lim_{n \to \infty} B_n$ and $B_n \subseteq B_{n+1}$, and thus $Q(S^*) = \lim_{n \to \infty} Q(B_n)$ where $Q(B_n)$ is non-decreasing. On the other hand,
$$Q(B_n) = \frac{P(B_n \mid B_n)}{P(B_0 \mid B_n)} = \frac{1}{P(B_0 \mid B_n)}.$$
Thus d) implies that $Q(S^*) < +\infty$. Defining $Q(A)$ by $Q(A) = Q(AS^*)$ for $A \in \mathcal{A}$, $A \notin \mathcal{A}S^*$, the definition of $Q(A)$ is extended to the whole $\sigma$-algebra $\mathcal{A}$. The final part of Theorem 8 concerning $P(A)$ is obvious. Thus Theorem 8 is proved.
A necessary and sufficient condition for the existence of the quotient representation $P(A \mid B) = \frac{Q(AB)}{Q(B)}$ is contained in the paper [13] of Á. CSÁSZÁR.
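The construction (5) of $Q$ can be traced on a concrete case. The Python sketch below is an illustration under assumptions that are not in the paper: conditional probabilities on the integers are taken to be counting-measure quotients, the exhausting sequence is $B_n = \{-n, \ldots, n\}$, and the names `P`, `B_n` and `Q` are hypothetical. In this case conditions a), b), c) hold for finite non-empty conditions, while d) fails because $P(B_0 \mid B_n) = 1/(2n+1)$ tends to 0, so the recovered measure is unbounded.

```python
from fractions import Fraction


def P(A, B):
    """Counting-measure conditional probability |A ∩ B| / |B|."""
    return Fraction(len(A & B), len(B))


def B_n(n):
    """The exhausting sequence B_n = {-n, ..., n}."""
    return frozenset(range(-n, n + 1))


def Q(A, n):
    """Formula (5): Q(A) = P(A | B_n) / P(B_0 | B_n), for any n with A in B_n."""
    Bn = B_n(n)
    if not A <= Bn:
        raise ValueError("choose n large enough that A is contained in B_n")
    return P(A, Bn) / P(B_n(0), Bn)


A = frozenset({-2, 1})
B = frozenset({-2, 0, 1, 3})

print(Q(A, 3), Q(A, 50))                 # 2 2
print(P(A, B), Q(A & B, 5) / Q(B, 5))    # 1/2 1/2
```

The first line shows that the value of $Q(A)$ does not depend on the index $n$; the second shows the quotient representation (6). Here $Q$ is simply the number of elements, normalised so that $Q(B_0) = 1$.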
1.6. Random variables on a conditional probability space. Let $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ be a conditional probability space. If $\xi = \xi(a)$ denotes a real-valued function defined for $a \in S$ which is measurable with respect to $\mathcal{A}$, i. e. if $A_x$ denotes the set of those $a \in S$ for which $\xi(a) < x$, we have $A_x \in \mathcal{A}$ for all real $x$, we shall call $\xi$ a random variable on $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$. Vector-valued random variables are defined similarly. The (ordinary) conditional probability distribution function of a random variable $\xi$ with respect to an event $B \in \mathcal{B}$ is defined by $F(x \mid B) = P(A_x \mid B)$; if $F(x \mid B)$ is absolutely continuous, $F'(x \mid B) = f(x \mid B)$ is called the (ordinary) conditional probability density function of $\xi$ with respect to $B$. The conditional mean value $M(\xi \mid B)$ of $\xi$ with respect to an event $B \in \mathcal{B}$ is defined as the abstract Lebesgue integral
$$M(\xi \mid B) = \int_S \xi(a)\, dP(A \mid B)$$
of $\xi$ with respect to the measure defined on $S$ by $P(A \mid B)$ with $B$ fixed. Higher conditional moments, the conditional characteristic function etc. are defined similarly. The random variables $\xi$ and $\eta$ are called independent with respect to an event $C$, if denoting by $A_x$ the set of those $a \in S$ for which $\xi(a) < x$ and by $B_y$ the set of those $a \in S$ for which $\eta(a) < y$, we have $P(A_x B_y \mid C) = P(A_x \mid C)\, P(B_y \mid C)$ for every real $x$ and $y$.
As $[S, \mathcal{A}, P(A \mid B)]$ is for any fixed $B \in \mathcal{B}$ a probability field in the sense of the theory of KOLMOGOROV, any theorem of ordinary probability theory remains valid when ordinary probabilities, distributions, mean values, independence etc. are replaced by conditional probabilities, conditional distributions, conditional mean values, conditional independence etc. with respect to the same $B \in \mathcal{B}$. Problems which are peculiar to the theory presented in this paper are those in which conditional probabilities with respect to different conditions figure. We shall see in § 3 some examples of such problems.
Let us mention that if $\xi$ is a random variable, and $A_\alpha^\beta$ denotes the set consisting of those elements $a \in S$ for which $\alpha \leq \xi(a) < \beta$, and if $A_\alpha^\beta \in \mathcal{B}$ for a set $X$ of intervals $(\alpha, \beta)$, then the conditional probabilities $P(\,\cdot \mid A_\alpha^\beta)$ can be considered for $(\alpha, \beta) \in X$, and thus $\xi$ generates a conditional probability space on the real axis $R$, as the space of elementary events, the $\sigma$-algebra $\mathcal{A}$ being the set of Borel subsets of $R$ and $\mathcal{B}$ consisting of the intervals $(\alpha, \beta) \in X$.
This conditional probability space will be called the conditional probability distribution generated by $\xi$ on the real axis.
Let $F(x)$ denote a non-decreasing function of $x$ which is continuous to the left for $-\infty < x < +\infty$. If the set $A_\alpha^\beta$ belongs to $\mathcal{B}$ whenever $F(\beta) - F(\alpha) > 0$, and we have for any subinterval $(x, y)$ of such an interval $(\alpha, \beta)$
$$P(A_x^y \mid A_\alpha^\beta) = \frac{F(y) - F(x)}{F(\beta) - F(\alpha)}, \tag{7a}$$
we shall call $F(x)$ the generalized distribution function⁵ of $\xi$; the function $F(x)$ is not uniquely determined, as together with $F(x)$, $G(x) = cF(x) + d$ where $c > 0$ is also a distribution function of $\xi$; but as $F(x)$ will be used only to calculate the conditional probabilities (7a), this will never lead to a misunderstanding. If the distribution function $F(x)$ of $\xi$ is absolutely continuous, and $F'(x) = f(x)$, we shall call $f(x)$ the generalized density function of $\xi$; clearly $f(x)$ is determined only up to a positive constant factor. If $F(x) = x$ (i. e. $f(x) = 1$) for $-\infty < x < +\infty$, we shall say that the distribution of $\xi$ is uniform in $(-\infty, +\infty)$.

⁵ Conditions ensuring the existence of a generalized distribution function of a random variable $\xi$ can be formulated by using the results of the paper [13].

If $f(x)$ is the generalized density function of $\xi$, we have
$$P(A_x^y \mid A_\alpha^\beta) = \frac{\int_x^y f(u)\, du}{\int_\alpha^\beta f(u)\, du}. \tag{7b}$$
The generalized distribution function resp. density function of a random vector is defined similarly.
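Formula (7a) and the remark that $F$ is determined only up to a transformation $cF + d$ with $c > 0$ can be illustrated by a few lines of Python. This is a sketch under assumptions: the helper name `cond_prob_from_F` and the constants 3 and 7 are hypothetical choices made for the example, and floating-point numbers are used, so results are compared only approximately.

```python
import math


def cond_prob_from_F(F, x, y, alpha, beta):
    """Formula (7a): (F(y) - F(x)) / (F(beta) - F(alpha)) for a subinterval."""
    if not (alpha <= x <= y <= beta):
        raise ValueError("(x, y) must be a subinterval of (alpha, beta)")
    denom = F(beta) - F(alpha)
    if denom <= 0:
        raise ValueError("the condition needs F(beta) - F(alpha) > 0")
    return (F(y) - F(x)) / denom


def uniform(x):
    """F(x) = x, the uniform distribution on the whole real axis."""
    return x


def rescaled(x):
    """G(x) = c F(x) + d with the hypothetical constants c = 3, d = 7."""
    return 3.0 * x + 7.0


p1 = cond_prob_from_F(uniform, 1.0, 2.0, 0.0, 10.0)
p2 = cond_prob_from_F(rescaled, 1.0, 2.0, 0.0, 10.0)
print(math.isclose(p1, 0.1), math.isclose(p1, p2))
```

Both distribution functions give the same conditional probability, since the constants $c$ and $d$ cancel in the quotient.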
1.7. An alternative form of Axiom III. We have already pointed out that our system of axioms can be characterised in the following manner: the set $S$, the $\sigma$-algebra $\mathcal{A}$ of subsets of $S$, the subset $\mathcal{B}$ of $\mathcal{A}$ and the set function of two set variables $P(A \mid B)$ defined for $A \in \mathcal{A}$ and $B \in \mathcal{B}$ form a conditional probability space if $\mathcal{S}_B = [S, \mathcal{A}, P(A \mid B)]$ is an ordinary probability space for every fixed $B \in \mathcal{B}$ and if the probability spaces $\mathcal{S}_C$ and $\mathcal{S}_{BC}$ are connected by Axiom III if $C \in \mathcal{B}$ and $BC \in \mathcal{B}$. Thus different probability fields can be combined to form a conditional probability field only if they are "compatible" in the sense that they satisfy Axiom III, which can be considered as the condition of compatibility.
Now it is easy to prove
THEOREM 9. Axiom III can be brought to the following equivalent form:
AXIOM III'. If $B \in \mathcal{B}$, $C \in \mathcal{B}$, $B \subseteq C$ and $P(B \mid C) > 0$, we have for any $A \in \mathcal{A}$
$$P(A \mid B) = \frac{P(AB \mid C)}{P(B \mid C)}.$$
PROOF. Clearly Axiom III' is a special case of Axiom III. Conversely, if Axiom III' is valid, Axiom III follows. This can be shown as follows: if $A \in \mathcal{A}$, $B \in \mathcal{A}$, $C \in \mathcal{B}$ and $BC \in \mathcal{B}$, two cases are possible: either $P(B \mid C) = 0$ or $P(B \mid C) > 0$. In the first case we have also⁶ $P(AB \mid C) = 0$ and thus $P(A \mid BC)\, P(B \mid C) = P(AB \mid C)$ reduces to $0 = 0$. Now let us suppose $P(B \mid C) > 0$. It is easy to see that Theorem 1 follows already from Axioms I, II and III', and thus it can be applied in the present proof. But this means that $P(BC \mid C) = P(B \mid C) > 0$ and thus the conditions of Axiom III' are satisfied with $BC$ instead of $B$, and it follows from Axiom III'
$$P(A \mid BC)\, P(BC \mid C) = P(ABC \mid C). \tag{8}$$
As $P(BC \mid C) = P(B \mid C)$ and $P(ABC \mid C) = P(AB \mid C)$, it follows from (8)
$$P(A \mid BC)\, P(B \mid C) = P(AB \mid C).$$
Thus Axiom III follows from Axiom III'.
⁶ See the footnote⁴ to Remark 2 to Theorem 5.

REMARK. Theorem 9 means that Axiom III contains a compatibility condition for $\mathcal{S}_B$ and $\mathcal{S}_C$ where $B \subseteq C$ if and only if $P(B \mid C) > 0$; if $P(B \mid C) = 0$, $\mathcal{S}_B$ and $\mathcal{S}_C$ are compatible without any restriction. This fact is the basis of a general principle by use of which conditional probability spaces can be constructed. This principle is expressed in Theorem 14 of § 2.
1.8. Extensions of a conditional probability space. If $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ is a conditional probability space, it is natural to ask how this space could be extended by including into $\mathcal{B}$ sets $A \in \mathcal{A}$ which are not contained in $\mathcal{B}$. The simplest way is suggested by Axiom III, and is contained in the following
THEOREM 10. Let $B_1$ denote a set for which $B_1 \in \mathcal{A}$ and $B_1 \notin \mathcal{B}$. If there exists at least one set $B_2$ with the following three properties: $\alpha$) $B_2 \in \mathcal{B}$, $\beta$) $B_1 \subseteq B_2$, $\gamma$) $P(B_1 \mid B_2) > 0$, further if for any other set $B_3$ which also has the properties $\alpha$), $\beta$), $\gamma$), we have $B_2 B_3 \in \mathcal{B}$, the definition of $P(A \mid B)$ can be extended for $B = B_1$ by putting
$$P(A \mid B_1) = \frac{P(AB_1 \mid B_2)}{P(B_1 \mid B_2)}. \tag{9}$$
PROOF. If $P(A \mid B_1)$ is defined by (9), Axioms I and II are clearly satisfied, therefore we have to verify only Axiom III. Three cases must be distinguished. a) If we put $B_1 = C'$ and $B'$ is a set for which $B'C' \in \mathcal{B}$, we must verify
$$P(A \mid B'C')\, P(B' \mid C') = P(AB' \mid C').$$
But this means by (9)
$$P(A \mid B_1 B')\, P(B_1 B' \mid B_2) = P(AB_1 B' \mid B_2),$$
which is true by force of Axiom III, in view of $B_1 B' = B'C' \in \mathcal{B}$ and $B_2 \in \mathcal{B}$.
b) If $B_1 = B'C'$ where $C' \in \mathcal{B}$, we must verify $P(A \mid B_1)\, P(B' \mid C') = P(AB' \mid C')$. But this means by (9)
$$P(B' \mid C')\, P(AB_1 \mid B_2) = P(AB' \mid C')\, P(B_1 \mid B_2),$$
which reduces to $0 = 0$ if $P(B' \mid C') = 0$ and to
$$\frac{P(AB_1 \mid B_2)}{P(B_1 \mid B_2)} = \frac{P(AB' \mid C')}{P(B' \mid C')} \tag{10}$$
if $P(B' \mid C') > 0$. But as $B_1 = B'C'$, by applying Theorem 1 to both conditional probabilities on the right of (10) we find that (10) is equivalent to
$$\frac{P(AB_1 \mid B_2)}{P(B_1 \mid B_2)} = \frac{P(AB_1 \mid C')}{P(B_1 \mid C')}. \tag{11}$$
But (11) follows from Theorem 6, taking into account that
$$AB_1 + B_1 = B_1 \subseteq B_2 \quad \text{and} \quad AB_1 + B_1 = B_1 = B'C' \subseteq C',$$
and that clearly $P(B_1 \mid C') = P(B' \mid C') > 0$ and thus $C'$ has the properties $\alpha$), $\beta$) and $\gamma$) and therefore $B_2 C' \in \mathcal{B}$.
c) If $B'C' = B_1$ and $C' = B_1$ where $B' \in \mathcal{A}$, we have to verify $P(A \mid B'C')\, P(B' \mid C') = P(AB' \mid C')$. As $P(AB' \mid C') = P(AB'C' \mid C')$, this relation is trivially equivalent to $P(A \mid B_1) = P(AB_1 \mid B_1)$, which is satisfied by force of (9).
It is easy to see that the definition of $P(A \mid B_1)$ does not depend on the choice of $B_2$; as a matter of fact, if both $B_2$ and $B_3$ have properties $\alpha$), $\beta$), and $\gamma$), it follows by Theorem 6 that
$$\frac{P(AB_1 \mid B_2)}{P(B_1 \mid B_2)} = \frac{P(AB_1 \mid B_3)}{P(B_1 \mid B_3)}.$$
It is also clear that $P(A \mid B_1)$ can not be defined otherwise than by (9), because if $B_1$ is included into $\mathcal{B}$, (9) must hold by force of Axiom III.
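The extension (9) can be followed on a very small example. The Python sketch below is illustrative only and rests on assumptions not made in the paper: the space has the single admissible condition $B_2 = \{1, \ldots, 6\}$ with counting-measure probabilities, the new condition is $B_1 = \{1, 2, 3\}$, and the names `make_space` and `P_ext` are hypothetical.

```python
from fractions import Fraction


def make_space(conditions):
    """Counting-measure conditional probabilities on the given conditions only."""
    conditions = set(conditions)

    def P(A, B):
        if B not in conditions:
            raise KeyError("B is not an admissible condition")
        return Fraction(len(A & B), len(B))

    return P


B2 = frozenset(range(1, 7))     # the only condition of the original space
B1 = frozenset({1, 2, 3})       # new condition: B1 in B2 and P(B1 | B2) > 0
P = make_space({B2})


def P_ext(A, B):
    """Extended set function: formula (9) for B = B1, the old values otherwise."""
    if B == B1:
        return P(A & B1, B2) / P(B1, B2)
    return P(A, B)


A = frozenset({2, 3, 5})
print(P_ext(A, B1))                                            # 2/3
print(P_ext(A, B1) * P_ext(B1, B2) == P_ext(A & B1, B2))       # True
```

The second line checks Axiom III for the pair $B = B_1$, $C = B_2$, which is the relation that forces the definition (9).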
Another possibility for including new sets into $\mathcal{B}$ is yielded by passing to the limit; this procedure is described by the following
THEOREM 11. Let us suppose that $B_n \in \mathcal{B}$, $B_n \subseteq B_{n+1}$, further $P(B_n \mid B_{n+1}) > 0$ $(n = 0, 1, \ldots)$ and that $\prod_{n=0}^{\infty} P(B_n \mid B_{n+1})$ converges. If $B_\infty = \sum_{n=0}^{\infty} B_n$ does not belong to $\mathcal{B}$, the definition of $P(A \mid B)$ can be extended for $B = B_\infty$ by putting
$$P(A \mid B_\infty) = \lim_{N \to \infty} P(A \mid B_N) \quad \text{for any } A \in \mathcal{A}, \tag{12}$$
provided that the following condition is satisfied: if $B \in \mathcal{B}$ and $P(B \mid B_N) > 0$ for some $N$, we have $BB_N \in \mathcal{B}$.
PROOF. For an arbitrary A~e% we put A~~ AOO AB~B~.:_~
( k : 1, 2, ...). As AoO~B,~ for n ~ k, the sequence P(A(~OlB,,) is by Theorem
5, Remark 1 monotonically decreasing for n : / c , k@ 1 , . . . and thus
lira P (A(Z")!B,~~- - P (A(J':)i B~)
exists.
Putting
(13) $P^*(A \mid B_\infty) = \sum_{k=0}^{\infty} P(A^{(k)} \mid B_\infty)$
we have defined $P^*(A \mid B_\infty)$ for every $A \in \mathcal{A}$. To prove Theorem 11, we have to show that if $P^*(A \mid B_\infty)$ is defined by (13), Axioms I-III remain valid, further that $\lim_{n \to \infty} P(A \mid B_n) = P(A \mid B_\infty)$ exists for all $A \in \mathcal{A}$ and $P(A \mid B_\infty) = P^*(A \mid B_\infty)$, i.e. that (12) and (13) are equivalent. As regards Axiom I, it is clear that $P^*(A \mid B_\infty) \ge 0$; the validity of $P^*(B_\infty \mid B_\infty) = 1$ can be shown as follows: if $A = B_\infty$, then $A^{(0)} = B_0$ and $A^{(k)} = B_k\bar{B}_{k-1}$ $(k = 1, 2, \dots)$, which implies
$P^*(B_\infty \mid B_\infty) = P(B_0 \mid B_\infty) + \sum_{k=1}^{\infty} P(B_k\bar{B}_{k-1} \mid B_\infty) = \lim_{N \to \infty} P(B_N \mid B_\infty)$.
But
$P(B_N \mid B_\infty) = \lim_{n \to \infty} P(B_N \mid B_n) = \lim_{n \to \infty} \prod_{k=N}^{n-1} P(B_k \mid B_{k+1}) = \prod_{k=N}^{\infty} P(B_k \mid B_{k+1})$
and thus
$P^*(B_\infty \mid B_\infty) = \lim_{N \to \infty} \prod_{n=N}^{\infty} P(B_n \mid B_{n+1}) = 1$
because the remainder-products of a convergent infinite product always tend to 1. Now let us verify Axiom II, i.e. that $P^*(A \mid B_\infty)$ defined by
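(13) is countably additive. Let us suppose that $A_j \in \mathcal{A}$ and $A_jA_k = \emptyset$ if $j \ne k$ $(j, k = 1, 2, \dots)$, and $A_\infty = \sum_{n=1}^{\infty} A_n$. Let us put $A_n^{(0)} = A_nB_0$ and $A_n^{(k)} = A_nB_k\bar{B}_{k-1}$ for $k = 1, 2, \dots$, further $A_\infty^{(0)} = A_\infty B_0$ and $A_\infty^{(k)} = A_\infty B_k\bar{B}_{k-1}$ for $k = 1, 2, \dots$. As for $A \subseteq B_k$ we have
(14) $P(A \mid B_\infty) = P(A \mid B_k) \prod_{n=k}^{\infty} P(B_n \mid B_{n+1})$
and $P(A \mid B_k)$ is countably additive, further $A_\infty^{(k)} = \sum_{n=1}^{\infty} A_n^{(k)}$ and $A_n^{(k)}A_m^{(k)} = \emptyset$ for $n \ne m$, it follows
(15) $P(A_\infty^{(k)} \mid B_\infty) = \sum_{n=1}^{\infty} P(A_n^{(k)} \mid B_\infty)$.
But
(16) $P^*(A_\infty \mid B_\infty) = \sum_{k=0}^{\infty} P(A_\infty^{(k)} \mid B_\infty)$.
From (15) and (16) it follows
(17) $P^*(A_\infty \mid B_\infty) = \sum_{k=0}^{\infty} \sum_{n=1}^{\infty} P(A_n^{(k)} \mid B_\infty) = \sum_{n=1}^{\infty} \sum_{k=0}^{\infty} P(A_n^{(k)} \mid B_\infty)$.
As we have from (13)
(18) $\sum_{k=0}^{\infty} P(A_n^{(k)} \mid B_\infty) = P^*(A_n \mid B_\infty)$,
it follows from (16) and (17) that
(19) $P^*(A_\infty \mid B_\infty) = \sum_{n=1}^{\infty} P^*(A_n \mid B_\infty)$,
i.e. that $P^*(A \mid B_\infty)$ is countably additive.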

Now we prove that
(20) $P^*(A \mid B_\infty) = \lim_{N \to \infty} P(A \mid B_N)$
for any $A \in \mathcal{A}$. As a matter of fact, by (14)
(21) $P(A \mid B_N) = \sum_{k=0}^{N} P(A^{(k)} \mid B_N) = \frac{\sum_{k=0}^{N} P(A^{(k)} \mid B_\infty)}{\prod_{n=N}^{\infty} P(B_n \mid B_{n+1})}$.
It follows from (21) that
(22) $\lim_{N \to \infty} P(A \mid B_N) = \sum_{k=0}^{\infty} P(A^{(k)} \mid B_\infty)$
and thus, by (13), (20) follows.

Now let us prove that Axiom III' is valid. We distinguish three cases. The first case is when $C \in \mathcal{B}$, $B_\infty \subseteq C$ and $P(B_\infty \mid C) > 0$; then we have $P(B_N \mid C) > 0$ if $N$ is sufficiently large and thus
(23) $P(A \mid B_N) = \frac{P(AB_N \mid C)}{P(B_N \mid C)}$.
Passing to the limit $N \to \infty$ in (23), and using (20), it follows that
(24) $P(A \mid B_\infty) = \frac{P(AB_\infty \mid C)}{P(B_\infty \mid C)}$.
The second case is when $B \in \mathcal{B}$, $B \subseteq B_\infty$ and $P(B \mid B_\infty) > 0$; we have to prove that
(25) $P(A \mid B) = \frac{P(AB \mid B_\infty)}{P(B \mid B_\infty)}$.
If there exists an index $N$ for which $B \subseteq B_N$, we have for sufficiently large $N$
$\frac{P(AB \mid B_N)}{P(B \mid B_N)} = P(A \mid B)$
and thus, passing to the limit $N \to \infty$, (25) follows.
In the general case, let us consider the two measures $\mu_1(A) = P(A \mid B)$ and $\mu_2(A) = P(A \mid B_\infty)$. Let $A \in \mathcal{A}$ denote an arbitrary set for which $A \subseteq B_NB$ for some $N$. Then we have by Theorem 6, taking $N$ sufficiently large to ensure $P(B \mid B_N) > 0$,
$\frac{P(A \mid B_N)}{P(B \mid B_N)} = \frac{P(A \mid B)}{P(B \mid B)}$ and thus $\mu_2(A) = P(B \mid B_\infty)\,\mu_1(A)$,
taking into account that $\lim_{N \to \infty} P(B \mid B_N) = P(B \mid B_\infty) > 0$ and thus $P(B \mid B_N) > 0$ for sufficiently large $N$. Thus $\mu_2(A) = C\mu_1(A)$, where $C = P(B \mid B_\infty)$ does not depend on $A$ if $A$ runs over all sets for which there exists an index $N$ for which $A \subseteq B_NB$.
Clearly these sets $A$ form a ring. As $\mu_2(A)$ and $C\mu_1(A)$ are finite measures, it follows that they coincide for the least $\sigma$-ring which contains all mentioned sets $A$. As $B_\infty = \sum_{N=0}^{\infty} B_N$, it follows that $\mu_2(A) = C\mu_1(A)$ if $A \subseteq B_\infty B$ and $A \in \mathcal{A}$. As $B \subseteq B_\infty$, this means that $\mu_2(A) = C\mu_1(A)$ if $A \subseteq B$, i.e. we have proved that
$P(A \mid B) = \frac{P(A \mid B_\infty)}{P(B \mid B_\infty)}$ if $A \subseteq B$.
As $AB \subseteq B$ for any $A \in \mathcal{A}$, this means that for any $A \in \mathcal{A}$ we have
$P(A \mid B) = P(AB \mid B) = \frac{P(AB \mid B_\infty)}{P(B \mid B_\infty)}$.
Thus (25) is proved. The third case is when $B = B_\infty$ and $C = B_\infty$; in this case we have to prove that $P(A \mid B_\infty) = \frac{P(AB_\infty \mid B_\infty)}{P(B_\infty \mid B_\infty)}$; but this is evident because we have shown that $P(B_\infty \mid B_\infty) = 1$ and by (15) $P(AB_\infty \mid B_\infty) = P(A \mid B_\infty)$. Thereby the proof of Theorem 11 is accomplished.
REMARK. The assertion of Theorem 6 for the case $A_1 + A_2 \subseteq B_1B_2$, but without the supposition $B_1B_2 \in \mathcal{B}$, can be considered as a stronger form of Axiom III; it shall be called Axiom III*.
AXIOM III*. If $A_1 \in \mathcal{A}$, $A_2 \in \mathcal{A}$, $B_1 \in \mathcal{B}$ and $B_2 \in \mathcal{B}$, further $A_1 + A_2 \subseteq B_1B_2$ and $P(A_2 \mid B_1)P(A_2 \mid B_2) > 0$, we have
$\frac{P(A_1 \mid B_1)}{P(A_2 \mid B_1)} = \frac{P(A_1 \mid B_2)}{P(A_2 \mid B_2)}$.
As Theorem 1 is not a consequence of Axiom III*, in case we replace Axiom III by Axiom III*, we must suppose the validity of Theorem 1 as
AXIOM III**. $P(A \mid B) = P(AB \mid B)$ for $A \in \mathcal{A}$ and $B \in \mathcal{B}$.
Axiom III* and Axiom III** together imply Axiom III' and thus Axiom III. As a matter of fact, choosing $A_1 = AB$, $A_2 = B$, $B_1 = B$ and $B_2 = C$, where $A \in \mathcal{A}$, $B \in \mathcal{B}$, $C \in \mathcal{B}$, $B \subseteq C$ and $P(B \mid C) > 0$, the conditions of Axiom III* are satisfied, and as by Axiom III** $P(A \mid B) = P(AB \mid B)$, it follows that
$P(A \mid B) = \frac{P(AB \mid B)}{P(B \mid B)} = \frac{P(AB \mid C)}{P(B \mid C)}$,
i.e. Axiom III' is really a consequence of Axiom III* and Axiom III**.
As Axiom III and Axiom III' are equivalent, Axiom III is also a consequence of Axiom III*. Conversely, Axiom III* does not follow from Axiom III, only in the special case when $B_1B_2 \in \mathcal{B}$. If we would suppose Axioms III* and III** instead of Axiom III, the conditions of the extension theorems Theorem 10 and Theorem 11 could be reduced: in Theorem 10 the condition that for two sets $B_2$, $B_3$ with properties $\alpha)$, $\beta)$ and $\gamma)$, $B_2B_3 \in \mathcal{B}$ could be omitted; in Theorem 11 the condition that if $B \in \mathcal{B}$ and $P(B \mid B_N) > 0$, then $BB_N \in \mathcal{B}$ could be omitted. In what follows we do not suppose the validity of Axiom III*; it should be mentioned, however, that Axiom III* is contained as a special case in Axiom IV2 introduced by Á. CSÁSZÁR [13].
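If $\mathcal{S} = [S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ and $\mathcal{S}' = [S', \mathcal{A}', \mathcal{B}', P'(A \mid B)]$ are two such conditional probability spaces that $S \subseteq S'$, $\mathcal{A} \subseteq \mathcal{A}'$, $\mathcal{B} \subseteq \mathcal{B}'$ and $P'(A \mid B) = P(A \mid B)$ if $A \in \mathcal{A}$ and $B \in \mathcal{B}$, we shall say that the conditional probability space $\mathcal{S}$ is imbedded into the conditional probability space $\mathcal{S}'$. In the foregoing section we considered only such special imbeddings of a conditional probability space $\mathcal{S}$ into a conditional probability space $\mathcal{S}'$ for which $S' = S$, $\mathcal{A}' = \mathcal{A}$ and $\mathcal{B} \subseteq \mathcal{B}'$. These imbeddings of $\mathcal{S}$ were obtained by extension of $\mathcal{B}$ in several manners. If $\mathcal{S}$ is imbedded into $\mathcal{S}'$, we shall write $\mathcal{S} \subseteq \mathcal{S}'$.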
1.9. Continuity properties of conditional probability. It follows from Axiom II that $P(A \mid B)$ is for fixed $B \in \mathcal{B}$ a bounded measure in $A$, and thus, of course, continuous in $A$, i.e. if $A_n \in \mathcal{A}$ and $A_n \subseteq A_{n+1}$ (or $A_n \supseteq A_{n+1}$) for $n = 1, 2, \dots$, we have for $B \in \mathcal{B}$
$\lim_{n \to \infty} P(A_n \mid B) = P(\lim_{n \to \infty} A_n \mid B)$.
As regards the continuity of $P(A \mid B)$ as a function of $B$, we have
THEOREM 12. If $B_n \in \mathcal{B}$ and $B_n \subseteq B_{n+1}$ $(n = 1, 2, \dots)$, further $\sum_{n=1}^{\infty} B_n = B \in \mathcal{B}$, we have for $A \in \mathcal{A}$
$\lim_{n \to \infty} P(A \mid B_n) = P(A \mid B)$.
PROOF. We have by Axiom III
$P(A \mid B_n) = \frac{P(AB_n \mid B)}{P(B_n \mid B)}$
if $P(B_n \mid B) > 0$. As $\lim_{n \to \infty} P(B_n \mid B) = P(B \mid B) = 1$, the last condition is satisfied for sufficiently large $n$, and thus
$\lim_{n \to \infty} P(A \mid B_n) = \frac{\lim_{n \to \infty} P(AB_n \mid B)}{\lim_{n \to \infty} P(B_n \mid B)} = \frac{P(AB \mid B)}{P(B \mid B)} = P(A \mid B)$.
Thus Theorem 12 is proved.

The situation is somewhat more complicated if we consider a decreasing sequence of conditions. In this case we have
THEOREM 13. If $B \in \mathcal{B}$, $BC_n \in \mathcal{B}$ and $C_n \supseteq C_{n+1}$ $(n = 1, 2, \dots)$, further if, putting $C = \prod_{n=1}^{\infty} C_n$, we have $BC \in \mathcal{B}$ and $P(C \mid B) > 0$, it follows that
$\lim_{n \to \infty} P(A \mid BC_n) = P(A \mid BC)$.
PROOF. We have by Axiom III
$P(A \mid BC_n) = \frac{P(AC_n \mid B)}{P(C_n \mid B)}$ if $P(C_n \mid B) > 0$.
As $P(C_n \mid B) \ge P(C \mid B) > 0$, the condition $P(C_n \mid B) > 0$ is satisfied by every $n$, and thus it follows
$\lim_{n \to \infty} P(A \mid BC_n) = \frac{\lim_{n \to \infty} P(AC_n \mid B)}{\lim_{n \to \infty} P(C_n \mid B)} = \frac{P(AC \mid B)}{P(C \mid B)} = P(A \mid BC)$,
which proves Theorem 13.

1.10. Products of conditional probability spaces. The product of two conditional probability spaces $[S^{(k)}, \mathcal{A}^{(k)}, \mathcal{B}^{(k)}, P^{(k)}] = \mathcal{S}^{(k)}$ $(k = 1, 2)$ is defined as follows: let $S = S^{(1)} * S^{(2)}$ denote the Cartesian product of the sets $S^{(k)}$ $(k = 1, 2)$, i.e. let $S$ denote the set of all ordered pairs $(a^{(1)}, a^{(2)})$ where $a^{(1)} \in S^{(1)}$ and $a^{(2)} \in S^{(2)}$. Let $\mathcal{B}$ denote the set of all subsets $B = B_1 * B_2$ of $S$ where $B_1 \in \mathcal{B}^{(1)}$ and $B_2 \in \mathcal{B}^{(2)}$. Let $\mathcal{A}_1$ denote the set of all subsets $A = A_1 * A_2$ where $A_1 \in \mathcal{A}^{(1)}$ and $A_2 \in \mathcal{A}^{(2)}$, and let $\mathcal{A}$ denote the least $\sigma$-algebra containing $\mathcal{A}_1$. Let us define $P(A \mid B)$ for $A = A_1 * A_2$ and $B = B_1 * B_2$, where $A_1 \in \mathcal{A}^{(1)}$, $A_2 \in \mathcal{A}^{(2)}$, $B_1 \in \mathcal{B}^{(1)}$ and $B_2 \in \mathcal{B}^{(2)}$, by
$P(A \mid B) = P^{(1)}(A_1 \mid B_1)\, P^{(2)}(A_2 \mid B_2)$,
and let us extend the definition of $P(A \mid B)$ for every fixed $B \in \mathcal{B}$ to all $A \in \mathcal{A}$ in the usual way (see [15], § 13). Thus we obtain a conditional probability space $\mathcal{S} = [S, \mathcal{A}, \mathcal{B}, P]$ which will be called the Cartesian product of the conditional probability spaces $\mathcal{S}^{(1)}$ and $\mathcal{S}^{(2)}$ and denoted by $\mathcal{S} = \mathcal{S}^{(1)} * \mathcal{S}^{(2)}$.
The Cartesian product of a finite number of conditional probability spaces is defined similarly. The Cartesian product of a denumerable sequence of conditional probability spaces $\mathcal{S}^{(k)} = [S^{(k)}, \mathcal{A}^{(k)}, \mathcal{B}^{(k)}, P^{(k)}]$ $(k = 1, 2, \dots)$ is defined as follows: we denote by $S = S^{(1)} * \dots * S^{(n)} * \dots$ the Cartesian product of the sets $S^{(k)}$ $(k = 1, 2, \dots)$, by $\mathcal{B}$ the set of all sets $B = B_1 * \dots * B_n * \dots$ where $B_n \in \mathcal{B}^{(n)}$ $(n = 1, 2, \dots)$, and by $\mathcal{A}_1$ the set of all sets $A$ of the form $A = A_1 * A_2 * \dots * A_n * S^{(n+1)} * \dots$, i.e. $\mathcal{A}_1$ is the set of all "cylinders" of $S$. We define $P(A \mid B)$ for $A \in \mathcal{A}_1$ and $B \in \mathcal{B}$, where $A = A_1 * \dots * A_n * S^{(n+1)} * \dots$ and $B = B_1 * \dots * B_n * \dots$, by
$P(A \mid B) = \prod_{k=1}^{n} P^{(k)}(A_k \mid B_k)$,
and extend the definition of $P(A \mid B)$ in the usual way for any fixed $B \in \mathcal{B}$ to all sets $A$ belonging to the least $\sigma$-algebra $\mathcal{A}$ containing $\mathcal{A}_1$. In this way we obtain a conditional probability space $\mathcal{S} = [S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ which will be called the Cartesian product of the conditional probability spaces $\mathcal{S}^{(k)}$ and denoted by $\mathcal{S} = \mathcal{S}^{(1)} * \dots * \mathcal{S}^{(n)} * \dots$. To prove that $\mathcal{S}$ is really a conditional probability space, we have only to verify the validity of Axiom III, as Axioms I and II are clearly valid for $\mathcal{S}$. As regards the validity of $P(AB \mid B) = P(A \mid B)$ (i.e. of Theorem 1) for $\mathcal{S}$, it suffices to prove that
(26) $P(A \mid BC)\, P(BC \mid C) = P(ABC \mid C)$
for $A \in \mathcal{A}$, $B \in \mathcal{B}$, $C \in \mathcal{B}$ and $BC \in \mathcal{B}$.
As both sides of (26) are measures as functions of $A$ for fixed $B$ and $C$, it suffices to verify (26) for the case when $A$ is a cylinder, $A = A^{(1)} * \dots * A^{(N)} * S^{(N+1)} * \dots$. In this case
$P(A \mid BC) = \prod_{k=1}^{N} P^{(k)}(A^{(k)} \mid (BC)^{(k)})$.
Putting
$p_N = \prod_{k=1}^{N} P^{(k)}((BC)^{(k)} \mid C^{(k)})$,
the sequence $p_N$ is monotonically non-increasing, and thus $\lim_{N \to \infty} p_N = p$ exists. Two cases are to be distinguished. If $p = 0$, we have $P(BC \mid C) = 0$ and thus $P(ABC \mid C) = 0$ and therefore (26) is satisfied. If $p > 0$, we have
$P(ABC \mid C) = \prod_{k=1}^{N} P^{(k)}(A^{(k)}(BC)^{(k)} \mid C^{(k)}) \prod_{k=N+1}^{\infty} P^{(k)}((BC)^{(k)} \mid C^{(k)})$
and
$P(BC \mid C) = \prod_{k=1}^{\infty} P^{(k)}((BC)^{(k)} \mid C^{(k)})$,
and thus
$P(A \mid BC)\, P(BC \mid C) = \prod_{k=1}^{N} P^{(k)}(A^{(k)} \mid (BC)^{(k)})\, P^{(k)}((BC)^{(k)} \mid C^{(k)}) \prod_{k=N+1}^{\infty} P^{(k)}((BC)^{(k)} \mid C^{(k)})$
and thus, as $P^{(k)}$ satisfies Axiom III, we have
$P^{(k)}(A^{(k)} \mid (BC)^{(k)})\, P^{(k)}((BC)^{(k)} \mid C^{(k)}) = P^{(k)}(A^{(k)}(BC)^{(k)} \mid C^{(k)})$
and thus (26) is valid.
§ 2. Examples of conditional probability spaces
2.1. Conditional probability spaces of simple quotient type. An important class of conditional probability spaces is obtained as follows:
Let $S$ denote a set, $\mathcal{A}$ a $\sigma$-algebra of subsets of $S$, $\mu(A)$ a measure on $\mathcal{A}$, $\mathcal{B}$ the set of those sets $B \in \mathcal{A}$ for which $\mu(B)$ is positive and finite, and put $P(A \mid B) = \frac{\mu(AB)}{\mu(B)}$ for $A \in \mathcal{A}$, $B \in \mathcal{B}$. Then $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ is a conditional probability space which will be called a "conditional probability space of simple quotient-type". We mention two special cases.
a) Let us choose for $S$ the $n$-dimensional Euclidean space $E_n$, for $\mathcal{A}$ the set of all measurable subsets of $E_n$, and let $f(x_1, x_2, \dots, x_n)$ be a non-negative measurable function on $E_n$. For $\mathcal{B}$ we choose the set of those subsets $B \in \mathcal{A}$ for which
$\omega(B) = \int_B f(x_1, x_2, \dots, x_n)\, dx_1\, dx_2 \dots dx_n$
is finite and positive and we suppose that $\mathcal{B}$ is not empty. We put
$P(A \mid B) = \frac{\omega(AB)}{\omega(B)}$
for $A \in \mathcal{A}$ and $B \in \mathcal{B}$. Clearly $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ is a conditional probability space of simple quotient-type, provided that $\mathcal{B}$ is not empty.
Especially, when $f(x_1, x_2, \dots, x_n) \equiv 1$, we obtain a conditional probability space for which $P(A \mid B) = \frac{m(AB)}{m(B)}$, where $m(C)$ denotes the Lebesgue measure of the measurable set $C$; in this case $\mathcal{B}$ consists of all measurable sets which have a positive and finite Lebesgue measure. The conditional probability space thus obtained will be called the uniform conditional probability space in $E_n$.
b) Let $S$ denote a finite or denumerable set, with elements $a_1, a_2, \dots, a_k, \dots$; let $p_k$ $(k = 1, 2, \dots)$ denote an arbitrary sequence of non-negative numbers. Let $\mathcal{A}$ denote the set of all subsets of $S$ and $\mathcal{B}$ the set of those subsets $B$ of $S$ for which $\sum_{a_k \in B} p_k = \Pi(B)$ is positive and finite. Let us suppose that $\mathcal{B}$ is not empty and put
$P(A \mid B) = \frac{\Pi(AB)}{\Pi(B)}$ for $A \in \mathcal{A}$ and $B \in \mathcal{B}$.
Clearly $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ is a conditional probability space of simple quotient-type. Especially, if $p_k = 1$ $(k = 1, 2, \dots)$, we obtain a conditional probability space which will be called uniform on the denumerable set $S$.
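The construction in case b) can be tried out numerically. The following is a minimal sketch, not part of the original text: the helper name `make_cond_prob`, the restriction to finite sets of positive integers, and the particular sets are assumptions made for illustration. It evaluates the quotient of weights and checks the multiplication rule $P(A \mid BC)\,P(B \mid C) = P(AB \mid C)$ on one example.

```python
from fractions import Fraction

def make_cond_prob(weight):
    """Return P(A|B) = Pi(AB)/Pi(B) for finite sets of positive integers.

    `weight(k)` plays the role of p_k; the name is hypothetical.
    """
    def Pi(E):
        return sum((Fraction(weight(k)) for k in E), Fraction(0))

    def P(A, B):
        total = Pi(B)
        if total <= 0:
            raise ValueError("B must have positive finite weight")
        return Pi(set(A) & set(B)) / total

    return P

# Uniform case p_k = 1: P(A|B) is a ratio of element counts.
P = make_cond_prob(lambda k: 1)
A, B, C = {2, 4, 6, 8}, {1, 2, 3, 4}, {1, 2, 3, 4, 5, 6, 7, 8}
lhs = P(A, B & C) * P(B, C)   # P(A|BC) P(B|C)
rhs = P(A & B, C)             # P(AB|C)

# Non-uniform weights p_k = k, to show the same rule with other weights.
Q = make_cond_prob(lambda k: k)
lhs_w = Q(A, B & C) * Q(B, C)
rhs_w = Q(A & B, C)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.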
2. 2. Conditional probability spaces of alternative quotient-type. A general
type of conditional probability spaces is described by the following
THEOREM 14. Let S denote a set, ~ a a-ring of subsets of S if 7 E F
where F is an arbitrary ordered set of "indices", and suppose that ~ # ~ v
if fl < 7 according to the order relation of F. Let us put ~ ~ ~ ~ . Let ~
.yE1~
denote a subset of c~ and put ~ ~ ~ ; we suppose that ~ is not empty
.yE F
for any 7 E F and that J3# and J3~ are disjoint for fl ~ 7, Let us suppose
that for any 7 E F a measure l*~(A) is given for A E ~ (which can take also
the value $+\infty$), and that if $B \in \mathcal{B}_\gamma$, we have $0 < \mu_\gamma(B) < +\infty$. Let us suppose that if $\beta \in \Gamma$, $\gamma \in \Gamma$, $\beta < \gamma$ (according to the order relation defined in $\Gamma$) and $\mu_\beta(A) < +\infty$, then $\mu_\gamma(A) = 0$.
Let $P(A \mid B)$ be defined for $A \in \mathcal{A}$ and $B \in \mathcal{B}$ as follows:
$P(A \mid B) = \frac{\mu_\gamma(AB)}{\mu_\gamma(B)}$,
where $\gamma$ is uniquely determined by the condition $B \in \mathcal{B}_\gamma$. Then $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ is a conditional probability space.
PROOF. Axioms I and II are clearly satisfied. Instead of Axiom III we verify the validity of Axiom III', which has been shown to be equivalent to Axiom III. Clearly Axiom III' is satisfied, because if $B \in \mathcal{B}_\beta$ and $C \in \mathcal{B}_\gamma$ and $B \subseteq C$, only three cases are possible: $\beta = \gamma$, $\beta < \gamma$ or $\beta > \gamma$. The case $\beta = \gamma$ is evident. If $\beta < \gamma$, then $P(B \mid C) = \frac{\mu_\gamma(B)}{\mu_\gamma(C)}$, and as $\mu_\beta(B) < +\infty$ and $\beta < \gamma$, we have by supposition $\mu_\gamma(B) = 0$ and therefore $P(B \mid C) = 0$; and thus there is nothing to prove.
Finally $\gamma < \beta$ is impossible, because from $C \in \mathcal{B}_\gamma$ it follows $\mu_\gamma(C) < +\infty$ and thus $\mu_\beta(C) = 0$, which implies $\mu_\beta(B) = 0$ because of $B \subseteq C$; but this contradicts $B \in \mathcal{B}_\beta$. Thus Theorem 14 is proved.
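To illustrate Theorem 14 in the simplest situation, the following sketch (an illustration only, not taken from the original) uses two measures on the real line: the counting measure $\mu_0$ and the length $\mu_1$, so that a set of finite $\mu_0$-measure has $\mu_1$-measure zero. The encoding of sets as lists of disjoint closed intervals and the function names are assumptions of the sketch.

```python
from fractions import Fraction
import math

# A set is a list of pairwise disjoint closed intervals (a, b) with a <= b;
# a == b encodes a single point. This encoding is an assumption of the sketch.

def intersect(A, B):
    """Pairwise intersections; disjoint inputs give disjoint outputs."""
    pieces = []
    for a1, b1 in A:
        for a2, b2 in B:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo <= hi:
                pieces.append((lo, hi))
    return sorted(pieces)

def length(A):
    """mu_1: total length."""
    return sum((Fraction(b) - Fraction(a) for a, b in A), Fraction(0))

def count(A):
    """mu_0: number of points, +infinity if A contains a proper interval."""
    return math.inf if any(a < b for a, b in A) else len(A)

def cond_prob(A, B):
    """P(A|B) = mu_gamma(AB) / mu_gamma(B) with gamma fixed by the condition B."""
    AB = intersect(A, B)
    if length(B) > 0:                 # B is measured by length
        return length(AB) / length(B)
    if count(B) > 0:                  # B is a finite non-empty set of points
        return Fraction(count(AB), count(B))
    raise ValueError("B must have positive finite measure for some index")

A, B, C = [(1, 3)], [(0, 2)], [(0, 4)]
three_points = [(0, 0), (1, 1), (5, 5)]
```

Here `cond_prob([(1, 1)], C)` is zero, which corresponds to the case $\beta < \gamma$ of the proof: a set that is finite for the counting measure has conditional probability zero with respect to a condition measured by length.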
The conditional probability spaces described by Theorem 14 will be called "conditional probability spaces of alternative quotient-type". If $B \in \mathcal{B}_\gamma$, we may call $\gamma$ the "dimension" of $B$, and say that the measures $\mu_\gamma$ are "dimensionally ordered".
We mention two special cases:
a) A special case of a conditional probability space of alternative
quotient-type is obtained as follows : let f(x~, x~ . . . . . x,,) denote a non-negative
Borel measurable function in S--=En; let cq; denote the set of all measurable
subsets of E,,, and ~ the set of those measurable sets B, for which either
m(B)=.ff(x,,...,x,O dx~dx2.., dx,~ is finite and positive, o r B ' i s contained in a
k-dimensional subspace E !:~'~..... %~ of E,~ ( k = 1,2, n~l) defined by
x,~ : c ~ , x,.~ = c,,, ..., x~,,_7~ : c , : , , _ z . , and the integral
(27) m~:;,"":._;,
~''.....
..., ~"-~,_~= j'f(&, x2, . . . . x,~) dx,I. d x A 999 dxy,~
B
is finite and positive; where j ~ , j 2 , . . . , j~ denote those of the indices 1,2, ..., n
which are different from il, i.,, ..., i,~ ~; and in (26) x:~ = c~, x~-.~ c~,..,, x~,,_~= c,~_~,.
are substituted; we suppose that N is not empty; in the first case we
put
P (A IB) - - ~ (A B) .
re(B) '
el c 2
if B ~ ~ and B is contained in E~I,'~, " ' ~,-k
...~ ,,_~, we put
~; .... ~--~ tAB~/

"'" ~ * . - k \
P(AIB)=
m~l
i I ....... ,,-kk (B)
. . . ~ in--
where ~ ; .... ~,-7~ is defined by (27).

b) Another special case of a conditional probability space of alternative quotient-type is the following: let $S = E_n$ be the $n$-dimensional Euclidean space, $\mathcal{A}$ the set of all Borel measurable subsets of $E_n$, $\mathcal{B}_k$ the set of those Borel measurable subsets of $E_n$ which have finite positive $k$-dimensional measure$^{7}$ $(1 \le k \le n)$. Let us put $\mathcal{B} = \sum_{k=1}^{n} \mathcal{B}_k$ and $P(A \mid B) = \frac{m_k(AB)}{m_k(B)}$ if $B \in \mathcal{B}_k$, where $m_k(B)$ denotes the $k$-dimensional Lebesgue measure of the set $B$. Clearly this conditional probability space is an extension of the uniform conditional probability space mentioned in 2.1, a).
2.3. Cavalieri spaces. Now we introduce a general concept, that of a "Cavalieri space". Let $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ be a conditional probability space and let us make the following assumptions:
C 1. To every real number $t$ $(\alpha \le t < \beta)$ there corresponds a set $C_t \in \mathcal{B}$.
C 2. For any numbers $a$ and $b$ for which $\alpha \le a < b \le \beta$, we have $C_a^b = \sum_{a \le t < b} C_t \in \mathcal{B}$.
C 3. If $\alpha \le s < t < \beta$, we have $C_sC_t = \emptyset$.
C 4. If for $A \in \mathcal{A}$, $A' \in \mathcal{A}$, we have $P(A \mid C_t) \le \lambda P(A' \mid C_t)$, where $\lambda \ge 0$, for every $t$ lying in the interval $[a, b)$ $(\alpha \le a < b \le \beta)$, it follows $P(A \mid C_a^b) \le \lambda P(A' \mid C_a^b)$.
If C 1-C 4 hold, we shall say that $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ is a Cavalieri space with respect to the family of sets $\{C_t;\ \alpha \le t < \beta\}$.
Let us mention that if instead of a family $C_t$ of the power of the continuum we consider a denumerable set $\{C_n\}$ satisfying properties C 1, C 2 and C 3, then (as it has been mentioned in the remark to Theorem 7) C 4 is always satisfied. On the other hand, Theorem 14 shows that conditional probability spaces in general are not Cavalieri spaces with respect to a family $\{C_t\}$ of the power of the continuum satisfying C 1, C 2 and C 3, because if $P(C_t \mid C_a^b) = 0$ for $a \le t < b$, $P(A \mid C_t)$ can be replaced by any other measure for every $t \in [a, b)$, leaving $P(A \mid C_a^b)$ unchanged.
The following simple consequences of the definition of a Cavalieri space may be mentioned: if $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ is a Cavalieri space with respect to the family of sets $\{C_t;\ \alpha \le t < \beta\}$, then
I.$^{8}$ If $P(A \mid C_t) = \lambda P(A' \mid C_t)$ for $a \le t < b$, we have $P(A \mid C_a^b) = \lambda P(A' \mid C_a^b)$.
II. If $P(A \mid C_t) = \lambda$ for $a \le t < b$, we have $P(A \mid C_a^b) = \lambda$.
As a matter of fact, if $P(A \mid C_t) = \lambda P(A' \mid C_t)$ for $a \le t < b$, we have $P(A \mid C_t) \le \lambda P(A' \mid C_t)$ and thus by C 4 $P(A \mid C_a^b) \le \lambda P(A' \mid C_a^b)$; further (if $\lambda > 0$) we have $P(A' \mid C_t) \le \frac{1}{\lambda} P(A \mid C_t)$ and thus $P(A' \mid C_a^b) \le \frac{1}{\lambda} P(A \mid C_a^b)$, i.e. $\lambda P(A' \mid C_a^b) \le P(A \mid C_a^b) \le \lambda P(A' \mid C_a^b)$ (also for $\lambda = 0$) and therefore $P(A \mid C_a^b) = \lambda P(A' \mid C_a^b)$.
On the other hand, II follows from I by choosing $A' = C_a^b$. The notion of a Cavalieri space can be extended by replacing the one-parametric family $\{C_t\}$ by a $k$-parametric family $\{C_{t_1, t_2, \dots, t_k};\ \alpha_i \le t_i < \beta_i;\ i = 1, 2, \dots, k\}$.
$^{7}$ See [26], p. 108.
2.4. Regular spaces. Let $\mathcal{S} = [S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$ denote a conditional probability space with the following properties:
R 1. To every real number $t$ $(\alpha \le t < \beta)$ there corresponds a set $R_t \in \mathcal{B}$.
R 2. For any pair of numbers $a, b$ $(\alpha \le a < b \le \beta)$ we have $R_a^b = \sum_{a \le t < b} R_t \in \mathcal{B}$.
R 3. If $\alpha \le s < t < \beta$, we have $R_sR_t = \emptyset$.
R 4. If $A \in \mathcal{A}$, $P(A \mid R_t)$ is a Borel measurable function of $t$ for $\alpha \le t < \beta$.
R 5. There exists a strictly increasing and continuous function $F(t)$ in $(\alpha, \beta)$ with the property that if $\alpha \le a < b \le \beta$ we have for any $A \in \mathcal{A}$
(28) $P(A \mid R_a^b) = \frac{\int_a^b P(A \mid R_t)\, dF(t)}{\int_a^b dF(t)}$.
In this case we shall say that $\mathcal{S}$ is regular with respect to the family $\{R_t;\ \alpha \le t < \beta\}$. If $F(t)$ is absolutely continuous, we shall say that $\mathcal{S}$ is absolutely regular.
Clearly if $\mathcal{S}$ is regular with respect to the family $\{R_t\}$, it is also a Cavalieri space with respect to $\{R_t\}$.$^{9}$ As a matter of fact, C 1, C 2 and C 3 are identical with R 1, R 2 and R 3, and C 4 follows from R 4 and R 5.
It is easy to prove by using FUBINI's theorem that if $0 < f(x_1, x_2, \dots, x_n) < M$, the conditional probability space defined in 2.2, a) is regular with respect to the family $\{R_t\}$ defined as follows: $R_t$ is the set of points $(x_1, x_2, \dots, x_n)$ for which $a_i \le x_i < b_i$ for $i \ne k$, $i = 1, 2, \dots, n$, and $x_k = t$ $(\alpha \le t < \beta)$.
$^{8}$ This consequence of the definition of a Cavalieri space (for $\lambda = 1$) is analogous with the well-known "principle of Cavalieri" (according to which if the areas of the intersections of two solids with every horizontal plane are equal, the two solids have the same volume) introduced by BONAVENTURA CAVALIERI (1598-1647) in 1635. This justifies the name "Cavalieri space".
$^{9}$ Á. CSÁSZÁR has shown that there exist Cavalieri spaces with respect to a family $\{R_t\}$ of sets which are not regular with respect to the same family $\{R_t\}$.
It is clear that if $\mathcal{S}$ is regular with respect to the family $\{R_t;\ \alpha \le t < \beta\}$, we have
(29) $\lim_{h > 0,\ h \to 0} P(A \mid R_t^{t+h}) = P(A \mid R_t)$
for almost every $t$ in $(\alpha, \beta)$ with respect to the measure generated by $F(t)$ on $(\alpha, \beta)$, for any fixed $A$.
It is also evident that if $\mathcal{S}$ is regular with respect to the family $\{R_t;\ \alpha \le t < \beta\}$, we have $P(R_t \mid R_a^b) = 0$ for $\alpha \le a \le t < b \le \beta$.
Let $\xi = \xi(a)$ $(a \in S)$ be a random variable on a conditional probability space $\mathcal{S} = [S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$; further let us denote by $R_t$ the set of those $a \in S$ for which $\xi(a) = t$ $(-\infty < t < +\infty)$. Let us suppose that $\mathcal{S}$ is regular with respect to the family $\{R_t\}$, i.e. we have for $A \in \mathcal{A}$
(30) $P(A \mid R_a^b) = \frac{\int_a^b P(A \mid R_t)\, dF(t)}{\int_a^b dF(t)}$
for $-\infty < a < b < +\infty$. Evidently the conditional distribution generated by $\xi$ on the real axis is the distribution
$P(a \le \xi < b \mid c \le \xi < d) = P(R_a^b \mid R_c^d) = \frac{F(b) - F(a)}{F(d) - F(c)}$ for $c \le a < b \le d$,
i.e. $F(x)$ is the (generalized) distribution function of $\xi$. If (30) holds, we shall say that $\xi$ has a regular conditional distribution with the distribution function $F(x)$; if $F(x) \equiv x$, we shall say that $\xi$ has a regular uniform distribution.
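The role of the generalized distribution function can be seen in a small computation. The sketch below is an illustration under assumptions (the function name `cond_interval_prob` and the chosen intervals are not from the original): it evaluates the quotient $(F(b) - F(a))/(F(d) - F(c))$ for the unbounded choice $F(x) = x$, where the result depends only on interval lengths, and for $F(x) = e^x$, which is also strictly increasing and continuous but does not define a bounded measure on the real axis.

```python
import math

def cond_interval_prob(F, a, b, c, d):
    """P(a <= xi < b | c <= xi < d) = (F(b) - F(a)) / (F(d) - F(c)).

    Requires c <= a < b <= d; F is a strictly increasing continuous function.
    """
    if not (c <= a < b <= d):
        raise ValueError("need c <= a < b <= d")
    return (F(b) - F(a)) / (F(d) - F(c))

uniform = lambda x: x
p_near = cond_interval_prob(uniform, 0.0, 1.0, -1.0, 3.0)
p_far = cond_interval_prob(uniform, 100.0, 101.0, 99.0, 103.0)

# F(x) = exp(x): the total mass F(+inf) - F(-inf) is infinite as well.
q = cond_interval_prob(math.exp, 0.0, 1.0, 0.0, 2.0)
```

For $F(x) = x$ both values equal $1/4$, reflecting the translation invariance of the regular uniform distribution; for $F(x) = e^x$ the value is $(e - 1)/(e^2 - 1) = 1/(e + 1)$.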
The notion of a regular space can be extended by replacing the one-parametric family $\{R_t\}$ by a $k$-parametric family $\{R_{t_1, \dots, t_k}\}$. We consider only the case when the following postulates are satisfied:
R 1. To every point $T = (t_1, t_2, \dots, t_k)$ of a $k$-dimensional interval $I = (\alpha_i \le t_i < \beta_i;\ i = 1, 2, \dots, k)$ there corresponds a set $R_T \in \mathcal{B}$.
R 2. For any subinterval $J = (a_i \le t_i < b_i;\ i = 1, 2, \dots, k)$ where $\alpha_i \le a_i < b_i \le \beta_i$ $(i = 1, 2, \dots, k)$ we have $R_J = \sum_{T \in J} R_T \in \mathcal{B}$.
R 3. If $T_1 \ne T_2$, we have $R_{T_1}R_{T_2} = \emptyset$.
R 4. If $A \in \mathcal{A}$, $P(A \mid R_T)$ is a Borel-measurable function of the variables $t_1, \dots, t_k$.
R 5. There exists a function $f(t_1, t_2, \dots, t_k)$ which is positive and integrable on $I$, and if $J$ is the interval $a_i \le t_i < b_i$ $(i = 1, 2, \dots, k)$, $(\alpha_i \le a_i < b_i \le \beta_i;\ i = 1, 2, \dots, k)$, we have
$P(A \mid R_J) = \frac{\int_J P(A \mid R_T)\, f(t_1, \dots, t_k)\, dt_1 \dots dt_k}{\int_J f(t_1, \dots, t_k)\, dt_1 \dots dt_k}$.
If R 1-R 5 hold, and $f(t_1, \dots, t_k) = \prod_{j=1}^{k} f_j(t_j)$, we say that the space is absolutely regular with respect to the family $\{R_T\}$ with independent parameters. Especially if $R_T$ is the set defined by $\xi_1 = t_1, \dots, \xi_k = t_k$, where $\xi_1, \xi_2, \dots, \xi_k$ are random variables defined on $[S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$, and $f(t_1, \dots, t_k) = \prod_{j=1}^{k} f_j(t_j)$, the random variables $\xi_1, \dots, \xi_k$ are independent with respect to any $R_J$; further $\xi_j$ has a regular conditional distribution with the generalized density function $f_j(t)$ $(j = 1, 2, \dots, k)$.
2.5. Remark on Borel's paradox. The notions of Cavalieri spaces resp. regular spaces introduced in the preceding sections are connected with the well-known paradox of BOREL. (Cf. [1], p. 44.) Let $S$ denote the surface of the unit sphere $x^2 + y^2 + z^2 = 1$; let us introduce spherical coordinates $\varphi$ and $\vartheta$, defined by $x = \cos\varphi \sin\vartheta$, $y = \sin\varphi \sin\vartheta$, $z = \cos\vartheta$ $(0 \le \varphi < 2\pi;\ 0 \le \vartheta \le \pi)$. Let us denote by $\mathcal{A}$ the set of all Borel subsets of $S$ (i.e. sets $A$ for which the set of all pairs of numbers $(\vartheta, \varphi)$ for which the corresponding point belongs to $A$ is a plane Borel set). Let us denote by $\mathcal{B}$ the set of all sets $B_{a,c}^{b,d}$ defined by the inequalities $a \le \varphi \le b$, $c \le \vartheta \le d$, where $0 \le a \le b \le 2\pi$ and $0 \le c < d \le \pi$. Let us put
A B b, d
P ( A B a,b'~) ~176
b d
if a<b,
i. e. let P(AiB ) for fixed B be proportional to the area of A B . Let us denote

by Bt the set q~= t, 0 ~ ,9 _< :v (i. e. put B~ ~ B t't, oj,
_ - - ~\. let us choose a bounded
measure / ~ ( B ~ ) ~ I defined on the Borel subsets of B~ and normed by
,u~(B0~ l, and put
t d
"B 5
/a( t,
(o --<_c < d ,0,
2~ x
and let us suppose that P(Bo,5 t B t ) = F ( x , t ) is a continuous function of x
and t, strictly increasing as a function of x. Clearly [S, ~, ~5, P(A[B)] = 8
is a conditional probability space by any choice of the measures (J,(A); we
shall prove that it is a Cavalieri space with respect to the family {B~; 0 ~ t < 2:~}
if and only if
1 9~
(31) = sin ,9d,9.
ABt,
This can be shown as follows: let us suppose that $ is a Cavalieri s p a c e

with respect to {B~; 0 ~ t < 2 z } ; let us choose a number ,~ (0 =< i, < 1) and
define x(t, ~) as the number for which, if A~(/~) denotes the interval 0--< ,9--<
<=x(t, 2O, we have P(A~(/.)]B~)~-L i. e. F(x(t, 2),t)~L Now it follows from
o u r supposition, that x(t, 20 is a continuous function of t and thus if
A ( 2 ) ~ ~.~ At(/.), we have A()0 E ~ , further P ( A ( g ) ] B 0 = 2 for 0 ~ t < 2 z .
O~<t<2n
It follows by II of 2 . 3 that putting B ~,o---~
b' ~ B ~,
b we have P(A(/~)]B,,)=g
b for
~ x(t,D
0 --<_a < b < 2~, i.e. ~- sin ~cl~dt-~-)~(b--a). By taking the derivative
a 0
of both sides, wi!h respect tob, by virtue ofthe continuity of x(t, ~), we obtain
that ~-,j ~ (t, D
sin,gd~)~ for every t (0 =< t < 2~). Thus, it follows easily that
0
(31) holds.
Our result can be stated as follows: the uniform probability distribution on the surface of the sphere combined with arbitrary distributions on the meridians $\varphi = t$ always gives a conditional probability space, but this space is a Cavalieri space with respect to the set of meridians (under suitable continuity restrictions) if and only if the density of the distribution is the same on every meridian, and equal to $\frac{1}{2}\sin\vartheta$. In this case the conditional probability space is clearly also regular with respect to the set of meridians, as we have
b
" ~P(A]BOdt
p ( A ' B~~ ) - - b :1a f ( f l s i n ~gdoq]j dt -- "
e,
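The density $\frac{1}{2}\sin\vartheta$ on a meridian can be checked against the uniform distribution on the sphere by conditioning on a thin lune $0 \le \varphi < h$. The following Monte Carlo sketch is only an illustration: the function name, sample size, seed and lune width are assumptions. It estimates $P(\vartheta \le x \mid 0 \le \varphi < h)$, which for the density $\frac{1}{2}\sin\vartheta$ should be close to $(1 - \cos x)/2$ and not to $x/\pi$, the value a uniform distribution of $\vartheta$ on the meridian would give.

```python
import math
import random

def meridian_cdf_estimate(x, h, n_samples=200_000, seed=0):
    """Estimate P(theta <= x | 0 <= phi < h) for a uniform point on the sphere."""
    rng = random.Random(seed)
    kept = below = 0
    for _ in range(n_samples):
        z = rng.uniform(-1.0, 1.0)              # z = cos(theta) is uniform on [-1, 1]
        phi = rng.uniform(0.0, 2.0 * math.pi)   # longitude is uniform on [0, 2*pi)
        if phi < h:                             # keep only points in the thin lune
            kept += 1
            if math.acos(z) <= x:
                below += 1
    return below / kept

x = math.pi / 3
estimate = meridian_cdf_estimate(x, h=0.2)
sine_law = (1.0 - math.cos(x)) / 2.0   # integral of (1/2) sin(theta) over [0, x]
flat_law = x / math.pi                 # what a uniform theta would give
```

The sampling uses the fact that for a uniformly distributed point on the unit sphere the coordinate $z = \cos\vartheta$ is uniform on $[-1, 1]$ and independent of $\varphi$.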
2.6. Composition of conditional probability distributions. Let us suppose that $\xi_j = \xi_j(a)$ $(j = 1, 2, \dots, k;\ a \in S)$ are positive random variables on the conditional probability space $\mathcal{S} = [S, \mathcal{A}, \mathcal{B}, P(A \mid B)]$. Let us denote by $R_{t_1, t_2, \dots, t_k}$ the set of those $a \in S$ for which $\xi_1(a) = t_1, \dots, \xi_k(a) = t_k$, and suppose that the space $\mathcal{S}$ is absolutely regular with respect to the family $\{R_{t_1, \dots, t_k};\ 0 \le t_i < +\infty;\ i = 1, 2, \dots, k\}$ with independent parameters, and thus the functions $F_j(t_j)$ figuring in R 5 are absolutely continuous and $F_j'(t) = f_j(t)$ is almost everywhere positive $(0 < t < +\infty)$. Let us suppose further that if

M is a Borel subset of the k-dimensional Euclidean space, ~ R~..... ,tj.
(h ..... tk)E~f
belongs to ~ . tn this case i f 1 and J~_t are k-dimensional intervals,
l=(a~<=h<fl~; i = l , 2 , . . . , k ) and J=(a~<-_t~<br i--1,2,...,k)
-+
where 0 =< a,; ~ a~ < b,: <= N ( i = 1, 2 . . . . , k), we have, putting ~ = ( ~ , . . . , .~),
vj
fZ.(tOdt
(32) P(~'EJI~'E I ) : M v' .
j-:l i~fi(t)d9t
aj
Thus the random variables ~, ~, ..., ~j~ are independent with respect to any R~.
Let u s calculate the conditional probability distribution of ~k = ~-t- ~ - t -
+ - . . § ~ . Let B~, be the interval 0 =< tj < T U = t, 2 . . . . , k) and A~ the set
defined b y ~ < z. Clearly, A~ ~ ~ , a s A~ = ~.~ B~..... , ~ ; we have by R 5
9 L;
9.
t HL.(t;) d#
j=l
k
i~__1~2.<.z, O ~ t j < T
P(A~IB~) . . . . - k
N o w let us suppose that 0 =< z~ < z.~ < T; in this case A,,C_A,..,~Br and thus
U ~ (tj) dt:,-
j-1
k
P(A~,.:]BT)-- ~ 9<~':
k T
,-
and
P (A~ A ~ ]Br) _ P (A~, [BT)
P(A~, ]AO - - P ( & , IA~.BT)
and thus
f k
.J j=l
k
f~: tj.. : zI
(33) p(& jA~o)-- j=l
f _Hfj(#)dti
k
i=l
It follows that if we put
$g_1(t) = f_1(t)$ and $g_j(t) = \int_0^t g_{j-1}(t - u)\, f_j(u)\, du$ for $t \ge 0$
and
$g_j(t) = 0$ if $t < 0$ $(j = 2, 3, \dots, k)$,
we have by (33)
(34) $P(A_{z_1} \mid A_{z_2}) = \frac{\int_0^{z_1} g_k(t)\, dt}{\int_0^{z_2} g_k(t)\, dt}$.
Thus $g_k(t)$ is the generalized density function of $\zeta_k$.
If $\int_0^{\infty} f_j(t)\, dt < +\infty$ $(j = 1, 2, \dots, k)$, we obtain as a special case the well-known law of composition of independent probability distributions, but clearly some or all of the integrals $\int_0^{\infty} f_j(t)\, dt$ may be divergent.
The method of Laplace transforms can be applied also in the case when $\int_0^{\infty} f_j(t)\, dt = +\infty$, but $\int_0^{\infty} e^{-st} f_j(t)\, dt = \varphi_j(s)$ exists for some $s > 0$ $(j = 1, 2, \dots, k)$, and in this case we have, putting $\psi_j(s) = \int_0^{\infty} e^{-st} g_j(t)\, dt$, evidently
$\psi_k(s) = \prod_{j=1}^{k} \varphi_j(s)$.
If e.g. $\xi$ and $\eta$ are independent, regularly distributed random variables which have the (generalized) density functions $x^{\alpha-1}$ resp. $x^{\beta-1}$ in $(0, +\infty)$, where $\alpha > 0$ and $\beta > 0$, the (generalized) density function of the random variable $\xi + \eta$ is $x^{\alpha+\beta-1}$. A striking property of the random variables with density functions $x^{\alpha-1}$ $(0 < x < +\infty)$ is the following: if $\xi$ has the density function $x^{\alpha-1}$ in $(0, +\infty)$ and $C > 0$ is a constant, the random variable $C\xi$ has the same density function.
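The statement about the densities $x^{\alpha-1}$ and $x^{\beta-1}$ can be checked numerically, bearing in mind that a generalized density is only determined up to a constant factor. The sketch below is an illustration under assumptions (the function names, the midpoint rule and the choice $\alpha = 2$, $\beta = 3$ are not from the original; exponents at least 1 are chosen so that the integrand has no singularity). It compares the convolution integral with $B(\alpha, \beta)\, t^{\alpha+\beta-1}$, where $B$ is the Beta function.

```python
import math

def convolve_power_densities(alpha, beta, t, steps=20_000):
    """Midpoint-rule value of the integral of (t-u)^(alpha-1) u^(beta-1) over [0, t]."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += (t - u) ** (alpha - 1) * u ** (beta - 1)
    return h * total

def beta_fn(alpha, beta):
    """Euler Beta function B(alpha, beta) expressed through the Gamma function."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

alpha, beta = 2, 3
g_small = convolve_power_densities(alpha, beta, 1.5)
g_large = convolve_power_densities(alpha, beta, 3.0)
exact_small = beta_fn(alpha, beta) * 1.5 ** (alpha + beta - 1)
```

Doubling $t$ multiplies the convolution by $2^{\alpha+\beta-1}$, which is the behaviour of the density $x^{\alpha+\beta-1}$; the constant factor $B(\alpha, \beta)$ cancels in every conditional probability of the form (34).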
The problem of composition of distributions in the general case (i.e. if it is not supposed that the random variables considered are positive) is more intricate.$^{10}$ If $\xi$ and $\eta$ are independent random variables with regular joint distribution and density functions $f(x)$, $g(y)$, respectively, and the limit
+A
f(x--y) g(y)dy
lim -.4
+A
-- --h(x)
exists uniformly in $x$ and is positive on a set of positive measure, then $h(x)$ is the density function of $\xi + \eta$. This implies for example that if the distribution of $\xi$ is uniform on the real axis, the distribution of $\xi + \eta$ is also uniform, whatever the density function of $\eta$ should be.
$^{10}$ The theory of "distributions" of L. SCHWARTZ has to be applied.
§ 3. Some applications of the notion of a conditional probability space
In this § we consider simple applications of the notions introduced in § 1 and § 2 to some problems of ordinary probability theory. The random variables considered in this § are thus random variables defined on an ordinary probability space $[S, \mathcal{A}, P]$, except when it is explicitly stated that they are defined on a conditional probability space.
3.1. The limit of the Poisson distribution for $\lambda \to +\infty$. Let $\xi_\lambda$ denote a random variable which is distributed according to Poisson's law with mean value $\lambda > 0$. The random variable $\xi_\lambda$ generates the probability distribution on the set $\mathcal{G}_+$ of non-negative integers
$p_k(\lambda) = P(\xi_\lambda = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ $(k = 0, 1, \dots)$.
It follows easily from Stirling's formula that
(35) $\lim_{\lambda \to +\infty} \frac{p_{[\lambda]+j}(\lambda)}{p_{[\lambda]+k}(\lambda)} = 1$
for $j, k = 0, \pm 1, \pm 2, \dots$; thus if $\mathcal{G}$ is the set of all integers, $A$ and $B$ are finite subsets of $\mathcal{G}$ and $B$ is not empty, then we have
$\lim_{\lambda \to +\infty} P(\xi_\lambda - [\lambda] \in A \mid \xi_\lambda - [\lambda] \in B) = \frac{n(AB)}{n(B)}$,
where $n(B)$ denotes the number of elements of $B$. The result can be stated as follows: the conditional probability distribution generated by $\xi_\lambda - [\lambda]$ on $\mathcal{G}$ tends to the uniform conditional probability distribution on $\mathcal{G}$ if $\lambda \to +\infty$.
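The limit relation of 3.1 can be observed numerically. The following sketch is an illustration only; the function names and the sets $A = \{0, 1\}$, $B = \{-2, \dots, 2\}$ are assumptions. It evaluates the Poisson probabilities through logarithms (to avoid overflow for large mean) and compares the conditional probability of the shifted variable with the ratio $n(AB)/n(B) = 2/5$.

```python
import math

def poisson_pmf(k, lam):
    """P(xi = k) for a Poisson variable with mean lam, computed via logarithms."""
    if k < 0:
        return 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def shifted_cond_prob(A, B, lam):
    """P(xi - [lam] in A | xi - [lam] in B) for finite sets of integers A, B."""
    n = math.floor(lam)
    num = sum(poisson_pmf(n + j, lam) for j in set(A) & set(B))
    den = sum(poisson_pmf(n + j, lam) for j in B)
    return num / den

A = {0, 1}
B = {-2, -1, 0, 1, 2}
p_small = shifted_cond_prob(A, B, 5.0)
p_large = shifted_cond_prob(A, B, 1.0e6)
```

For a mean of $10^6$ the value is within rounding of $2/5$, while for mean 5 the Poisson probabilities near $[\lambda]$ are still visibly unequal.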
3.2. The number of prime divisors. An interesting application of the result in 3.1 to number theory is as follows: let $U(n)$ denote the number of different prime divisors of the integer $n$; let $\pi_k(N)$ denote the number of those natural numbers $n \le N$ for which $U(n) = k$. By a theorem of P. ERDŐS [16] we have
(36) $\frac{\pi_k(N)}{N} = \frac{(\log\log N)^k}{k!}\, e^{-\log\log N}\, (1 + o(1))$
for $|k - \log\log N| < c\sqrt{\log\log N}$, where $c > 0$ is constant, and $o(1)$ tends uniformly to 0 for $N \to \infty$ if $c$ is fixed. It follows that if $D(1) = 0$, $D(2) = 1$ and $D(n) = U(n) - [\log\log n]$ for $n = 3, 4, \dots$, and $P_k(N)$ denotes the number of positive integers $n \le N$ for which $D(n) = k$, we have by (35) and (36)
$\lim_{N \to \infty} \frac{P_j(N)}{P_k(N)} = 1$
for $j, k = 0, \pm 1, \pm 2, \dots$; thus the conditional probability distribution of the number-theoretical function $D(n)$ $(1 \le n \le N)$ tends for $N \to \infty$ to the uniform conditional probability distribution on $\mathcal{G}$.
3.3. The normal distribution for $\sigma \to +\infty$. Let $\xi_\sigma$ be a random variable,
normally distributed with mean value 0 and variance $\sigma^2$. Then we have for
$c \le a < b \le d$

$P(a \le \xi_\sigma < b \mid c \le \xi_\sigma < d) = \frac{\int_a^b e^{-x^2/2\sigma^2}\,dx}{\int_c^d e^{-x^2/2\sigma^2}\,dx}$

and thus

$\lim_{\sigma \to +\infty} P(a \le \xi_\sigma < b \mid c \le \xi_\sigma < d) = \frac{b-a}{d-c}$,

i.e. the conditional probability distribution generated by $\xi_\sigma$ on the real axis
$\mathfrak{R}$ tends for $\sigma \to +\infty$ to the uniform conditional probability distribution on $\mathfrak{R}$.
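
As a numerical illustration of 3.3, the conditional probability can be evaluated with the error function. This is a sketch under stated assumptions: the helper names and the intervals $[0, 1)$ and $[-1, 3)$ are chosen here for illustration only.

```python
import math

def normal_cdf(x: float, sigma: float) -> float:
    """Distribution function of the normal law with mean 0 and variance sigma^2."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def conditional_prob(a: float, b: float, c: float, d: float, sigma: float) -> float:
    """P(a <= xi < b | c <= xi < d) for c <= a < b <= d."""
    num = normal_cdf(b, sigma) - normal_cdf(a, sigma)
    den = normal_cdf(d, sigma) - normal_cdf(c, sigma)
    return num / den

# Illustrative intervals: the limit (b - a)/(d - c) is 1/4 here.
for sigma in (1.0, 10.0, 1000.0):
    print(sigma, conditional_prob(0.0, 1.0, -1.0, 3.0, sigma))
```

For small $\sigma$ the value is far from $1/4$; as $\sigma$ grows it settles at $(b-a)/(d-c)$.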
3.4. Limiting distribution of the sum of independent random variables.
Let $\xi_1, \xi_2, \dots, \xi_n, \dots$ denote independent random variables; let us suppose
that the variables $\xi_n$ have the same distribution with the density function $f(x)$
for which $\int_{-\infty}^{+\infty} f^2(x)\,dx$ exists.* Let us suppose that $M(\xi_n) = 0$ and

$M(\xi_n^2) = \int_{-\infty}^{+\infty} x^2 f(x)\,dx = D^2 < \infty$.

Let us put $\zeta_n = \xi_1 + \xi_2 + \dots + \xi_n$. Putting

$\varphi(t) = \int_{-\infty}^{+\infty} e^{itx} f(x)\,dx$   $(-\infty < t < +\infty)$,

we have $\int_{-\infty}^{+\infty} |\varphi(t)|^n\,dt < +\infty$ for $n \ge 2$ (see [17]), and thus denoting by $f_n(x)$
the density function of $\zeta_n$, we have

(37)   $f_n(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} (\varphi(t))^n e^{-itx}\,dt$.

* These conditions can be replaced by weaker ones.

As by supposition $|\varphi(t)| \ne 1$ for $t \ne 0$ and $\lim_{t \to \pm\infty} |\varphi(t)| = 0$, it follows easily
by the method of LAPLACE (see [17]) that

(38)   $f_n(x) = \frac{1}{D\sqrt{2\pi n}} (1 + o(1))$,

where $o(1) \to 0$ if $n \to \infty$ uniformly for $|x| \le C$, where $C$ does not depend
on $n$; thus for $c \le a < b \le d$ we have

(39)   $\lim_{n \to \infty} P(a \le \zeta_n < b \mid c \le \zeta_n < d) = \frac{b-a}{d-c}$,

i.e. the conditional probability distribution generated by $\zeta_n$ on $\mathfrak{R}$ tends to
the uniform conditional probability distribution on $\mathfrak{R}$ for $n \to \infty$.
3.5. A connection between the uniform conditional probability distribution
on $\mathfrak{R}_+$ and on $\mathfrak{I}_+$. Let $[S, \mathcal{A}, \mathcal{B}, P(A|B)]$ denote a conditional probability
field, $\xi = \xi(a)$ $(a \in S)$ a random variable, which takes only non-negative
integer values; let us denote by $A_k$ the set of those $a \in S$ for which $\xi(a) = k$;
put $B_n = \sum_{k=0}^{n} A_k$, and let us suppose that $A_k \in \mathcal{B}$ and $B_n \in \mathcal{B}$ $(k, n = 0, 1, 2, \dots)$.
Let us suppose that $\lambda = \lambda(a)$ is another random variable which generates
a regular uniform conditional probability distribution on the positive real
axis $\mathfrak{R}_+$; let us denote by $C_t$ the set of those $a \in S$ for which $\lambda(a) = t$ and
put $D_x = \sum_{0 \le y \le x} C_y$; let us suppose that $C_t \in \mathcal{B}$, $D_x \in \mathcal{B}$, further that $B_n D_x \in \mathcal{B}$
$(0 < x < +\infty,\ n = 0, 1, 2, \dots)$. We suppose

(40)   $P(A_k \mid C_t) = \frac{t^k e^{-t}}{k!}$   $(k = 0, 1, \dots)$,

i.e. that $\xi$ has a Poisson distribution with mean value $t$ under the hypothesis
that $\lambda = t$. We shall prove that in this case the conditional probability
distribution generated by $\xi$ on the set of non-negative integers $\mathfrak{I}_+$ is the
uniform distribution, i.e.

$P(A_k \mid B_n) = P(\xi = k \mid \xi \le n) = \frac{1}{n+1}$   for $k = 0, 1, \dots, n$; $n = 0, 1, \dots$.

This can be shown as follows: We have by Theorem 12

(41)   $P(A_k \mid B_n) = \lim_{x \to +\infty} P(A_k \mid B_n D_x) = \lim_{x \to +\infty} \frac{P(A_k \mid D_x)}{P(B_n \mid D_x)}$.

As we have supposed that the distribution of $\lambda$ is not only uniform on $\mathfrak{R}_+$
but also regular, we have

(42)   $P(A_k \mid D_x) = \frac{1}{x} \int_0^x P(A_k \mid C_t)\,dt$

and thus by (41) and (42)

(43)   $P(A_k \mid B_n) = \frac{\int_0^\infty P(A_k \mid C_t)\,dt}{\int_0^\infty P(B_n \mid C_t)\,dt}$.

As by (40)

$\int_0^\infty P(A_k \mid C_t)\,dt = \int_0^\infty \frac{t^k e^{-t}}{k!}\,dt = 1$,

and hence $\int_0^\infty P(B_n \mid C_t)\,dt = n + 1$, it follows from (43) that

$P(A_k \mid B_n) = \frac{1}{n+1}$,

which was to be proved.
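
The passage to the limit in (41)-(43) can be followed numerically. The sketch below is illustrative only (the function names are assumptions made here); it uses the elementary identity $\int_0^x t^k e^{-t}/k!\,dt = 1 - \sum_{j=0}^{k} x^j e^{-x}/j!$ to evaluate (42) in closed form.

```python
import math

def poisson_cdf(k: int, t: float) -> float:
    """sum_{j<=k} t^j e^{-t} / j!, computed through logarithms."""
    return sum(math.exp(j * math.log(t) - t - math.lgamma(j + 1)) for j in range(k + 1))

def prob_Ak_given_Dx(k: int, x: float) -> float:
    """Formula (42): (1/x) * integral_0^x t^k e^{-t}/k! dt."""
    return (1.0 - poisson_cdf(k, x)) / x

def prob_Ak_given_Bn_Dx(k: int, n: int, x: float) -> float:
    """The quotient under the limit sign in (41), with B_n = A_0 + ... + A_n."""
    return prob_Ak_given_Dx(k, x) / sum(prob_Ak_given_Dx(i, x) for i in range(n + 1))

# Illustrative check with n = 4: the values approach 1/5 as x grows.
for x in (2.0, 20.0, 200.0):
    print(x, [round(prob_Ak_given_Bn_Dx(k, 4, x), 6) for k in range(5)])
```

For small $x$ the values are visibly unequal; for large $x$ each of them is close to $1/(n+1)$.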
A realisation of what has been said can be given as follows: let $S$
denote the strip $0 \le x < +\infty$, $0 \le y < 1$ of the $(x, y)$ plane, $\mathcal{A}$ the set of all
measurable subsets of $S$, $\mathcal{B}$ the set of all Borel-measurable subsets of $S$ with
positive and finite measure, and of all sets of positive linear measure lying on
some line $x = t$ $(0 \le t < +\infty)$, and let us put $P(A|B) = \frac{m_2(AB)}{m_2(B)}$ if $m_2(B) > 0$,
where $m_2(B)$ and $m_2(AB)$ denote the plane Lebesgue measure of $B$ and $AB$,
respectively, and put $P(A|B) = \frac{m_1(AB)}{m_1(B)}$ if $B$ is lying on a line $x = t$, where
$m_1(B)$ denotes the linear Lebesgue measure of $B$. Define $\lambda$ as follows:
$\lambda = \lambda(x, y) = x$, and define $\xi = \xi(x, y)$ as follows: for a fixed value of $x$ let us
have $\xi = 0$ for $0 \le y < e^{-x}$, and $\xi = k$ for $\sum_{j=0}^{k-1} \frac{x^j e^{-x}}{j!} \le y < \sum_{j=0}^{k} \frac{x^j e^{-x}}{j!}$ $(k = 1, 2, \dots)$.
The sets $A_k$ are shown on Fig. 1.

Fig. 1

Clearly each set $A_k$ $(k = 0, 1, \dots)$ has the area 1. Our result can be
stated in the following suggestive form: if $\xi$ has the Poisson distribution
with mean value $t$ under the hypothesis $\lambda = t$ and the conditional distribution
of $\lambda$ is regular and uniform on $\mathfrak{R}_+$, $\xi$ is uniformly distributed on $\mathfrak{I}_+$. Clearly
the result obtained is a consequence of the following remarkable property of
the Poisson distribution: each member $\frac{t^k e^{-t}}{k!}$ of the Poisson distribution is, as
a function of $t$ for fixed $k$, a probability density function in the interval
$(0, +\infty)$.
3.6. Remarks on the method of Bayes. Let $\mathcal{S} = [S, \mathcal{A}, \mathcal{B}, P(A|B)]$ denote
a conditional probability space. Let $\xi = \xi(a)$ and $\eta = \eta(a)$ $(a \in S)$ be random
variables on $\mathcal{S}$ and suppose that $\mathcal{S}$ is regular with respect to the family of
sets $R_{xy}$ defined by $\xi = x$, $\eta = y$ for $-\infty < x < +\infty$; $-\infty < y < +\infty$ and
let us suppose that

$R_{c\gamma}^{d\delta} = \sum_{c \le x < d,\ \gamma \le y < \delta} R_{xy} \in \mathcal{B}$

where $-\infty < c < d < +\infty$, $-\infty < \gamma < \delta < +\infty$, further if $-\infty \le c < d \le +\infty$ and
$-\infty < \gamma < \delta < +\infty$, we have

(44)   $P(A \mid R_{c\gamma}^{d\delta}) = \frac{\int_c^d \int_\gamma^\delta P(A \mid R_{xy})\, h(x, y)\,dx\,dy}{\int_c^d \int_\gamma^\delta h(x, y)\,dx\,dy}$,

where $h(x, y)$ is positive and integrable on any finite rectangle of the $(x, y)$
plane.
Let us suppose further that $R_{cy}^{dy} \in \mathcal{B}$ if $c < d$ (the set $c \le \xi < d$, $\eta = y$) and
$R_{x\gamma}^{x\delta} \in \mathcal{B}$ if $\gamma < \delta$ (the set $\xi = x$, $\gamma \le \eta < \delta$), further that we have

(45)   $P(A \mid R_{cy}^{dy}) = \frac{\int_c^d P(A \mid R_{xy})\, h(x, y)\,dx}{\int_c^d h(x, y)\,dx}$   $(c < d)$

and

(46)   $P(A \mid R_{x\gamma}^{x\delta}) = \frac{\int_\gamma^\delta P(A \mid R_{xy})\, h(x, y)\,dy}{\int_\gamma^\delta h(x, y)\,dy}$   $(\gamma < \delta)$.

In this case clearly for any fixed $y$, $h(x, y)$ is, as a function of $x$, the
conditional density function of $\xi$ under the hypothesis $\eta = y$ and, for any fixed
$x$, $h(x, y)$ is, as a function of $y$, the conditional density function of $\eta$ under
the hypothesis $\xi = x$. As a matter of fact, choosing $\gamma = \delta = y$ we have for
$c \le a < b \le d$ from (45)

(47)   $P(a \le \xi < b \mid c \le \xi < d,\ \eta = y) = \frac{\int_a^b h(x, y)\,dx}{\int_c^d h(x, y)\,dx}$

and choosing $c = d = x$ we have for $\gamma \le \alpha < \beta \le \delta$ from (46)

(48)   $P(\alpha \le \eta < \beta \mid \gamma \le \eta < \delta,\ \xi = x) = \frac{\int_\alpha^\beta h(x, y)\,dy}{\int_\gamma^\delta h(x, y)\,dy}$.
It should be mentioned that the set $R_x$ defined by $\xi = x$ is not supposed
to belong to $\mathcal{B}$, and thus $h(x, y)$ is not an ordinary conditional density
function, but only a generalized conditional density function, in the sense
defined in 1.6. In other words, $h(x, y)$ is the generalized density function
of the random variable $\eta$ on the subspace $[R_x, \mathcal{A}_x, \mathcal{B}_x, P_x]$ of $[S, \mathcal{A}, \mathcal{B}, P]$
where $\mathcal{A}_x$ is the set of all sets of the form $A R_x$ with $A \in \mathcal{A}$, $\mathcal{B}_x$ the set of
all sets $B \in \mathcal{B}$ which are subsets of $R_x$ and $P_x(A|B)$ is defined by

$P_x(A \mid B) = P(A \mid B)$

for $A \in \mathcal{A}_x$ and $B \in \mathcal{B}_x$. If $R_x \in \mathcal{B}$, clearly $h(x, y)$ is up to a constant factor
the ordinary conditional density function of $\eta$ under the condition $R_x$ and
in this case $f(x) = \int_{-\infty}^{+\infty} h(x, y)\,dy$ is finite.
If $f(x) = \int_{-\infty}^{+\infty} h(x, y)\,dy$ is finite for every $x$, then $f(x)$ is the generalized
density function of $\xi$. As a matter of fact, we have by Theorem 12

$P(a \le \xi < b \mid c \le \xi < d) = \lim_{\gamma \to -\infty,\ \delta \to +\infty} P(a \le \xi < b \mid c \le \xi < d,\ \gamma \le \eta < \delta) = \frac{\int_a^b f(x)\,dx}{\int_c^d f(x)\,dx}$.
Putting

(49)   $g(y \mid x) = \frac{h(x, y)}{f(x)}$,

besides $h(x, y)$, $g(y|x)$ is the normed (i.e. ordinary) conditional density
function of $\eta$ under condition $\xi = x$. Similarly, if we denote by $f(x|y)$ the
generalized conditional density function of $\xi$ under the condition $\eta = y$, we may
put $f(x|y) = h(x, y)$ and thus we have

(50)   $f(x \mid y) = g(y \mid x) f(x)$.

If $\int_{-\infty}^{+\infty} h(x, y)\,dx$ is also finite,

(51)   $f^*(x \mid y) = \frac{g(y \mid x) f(x)}{\int_{-\infty}^{+\infty} g(y \mid x) f(x)\,dx}$

is the normed (ordinary) conditional density function of $\xi$ under the
hypothesis $\eta = y$.
The result obtained in this § is a generalization of the well-known
method of BAYES. The generalization consists in the possibility that
$\int_{-\infty}^{+\infty} f(x)\,dx$ and $\int_{-\infty}^{+\infty} f(x \mid y)\,dx$ may be divergent.
Clearly if $h(x, y)$ vanishes in some points of the plane, all that we
have said remains valid, only we may use as conditions only such sets
$c \le \xi < d$, $\gamma \le \eta < \delta$ for which $\int_c^d \int_\gamma^\delta h(x, y)\,dx\,dy > 0$.
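
A small numerical sketch may make formulas (50) and (51) concrete. Everything specific in it is an assumption chosen for illustration: the prior $f(x) = 1$ on the whole real axis (whose integral diverges), the normal form of $g(y|x)$, the observed value $y = 1.5$ and the integration grid.

```python
import math

def likelihood(y: float, x: float) -> float:
    """g(y|x): illustrative normal density with mean x and unit variance."""
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi)

def prior(x: float) -> float:
    """f(x) = 1: a uniform generalized density on the whole real axis."""
    return 1.0

def normed_posterior(y: float, lo: float, hi: float, steps: int = 4000):
    """Grid values of formula (51); the grid [lo, hi] must carry almost all the mass."""
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    unnormed = [likelihood(y, x) * prior(x) for x in xs]  # formula (50)
    total = h * (sum(unnormed) - 0.5 * (unnormed[0] + unnormed[-1]))  # trapezoid rule
    return xs, [u / total for u in unnormed], h

xs, post, h = normed_posterior(y=1.5, lo=-10.5, hi=13.5)
mean = h * sum(x * p for x, p in zip(xs, post))
print(mean)
```

Although the prior cannot be normed, the posterior (51) is an ordinary density, here concentrated around the observed value.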
3.7. Conditional ergodicity of Markov chains. Let us consider a temporally
homogeneous Markov chain with a denumerable set of possible states;
let us denote the states by the numbers $0, \pm 1, \pm 2, \dots$ and put $\zeta_n = k$ if
the system is in state $k$ at time $n$ $(n = 0, 1, 2, \dots)$. Let us denote by $p_{jk}$ the
probability of passing from state $j$ to state $k$ in one step, i.e.
$p_{jk} = P(\zeta_{n+1} = k \mid \zeta_n = j)$. We introduce the following notion: if there exists
a sequence $P_k$ of positive numbers, with the property that, putting
$P(\zeta_n = k \mid \zeta_0 = j) = p_{jk}^{(n)}$, we have

(52)   $\lim_{n \to \infty} \frac{p_{jk}^{(n)}}{p_{ih}^{(n)}} = \frac{P_k}{P_h}$

for all $h, i, j, k = 0, \pm 1, \pm 2, \dots$, we shall say that the Markov chain is
conditionally ergodic. If the chain is ergodic in the usual sense (see [18]), i.e. if

(53)   $\lim_{n \to \infty} p_{jk}^{(n)} = P_k > 0$   $(j, k = 0, \pm 1, \pm 2, \dots)$,

then (52) holds and we have $\sum_{k=-\infty}^{+\infty} P_k = 1$.
It is possible, however, that (52) holds without (53) taking place;
especially, this is the case if (52) holds and $\sum_{k=-\infty}^{+\infty} P_k = +\infty$. If a Markov
chain is conditionally ergodic, this implies that the conditional probability
distribution generated by $\zeta_n$ on the set of integers converges to the
conditional probability distribution defined by

$P(A \mid B) = \frac{\sum_{k \in AB} P_k}{\sum_{k \in B} P_k}$

where $A$ and $B$ are subsets of the set of integers and $B$ is finite and not empty.
Let us suppose that the Markov chain considered is irreducible and all
its states are recurrent null states (see [18]) in which case $\lim_{n \to \infty} p_{jk}^{(n)} = 0$. If
the numbers $P_k$ in (52) satisfy the system of equations

(54)   $P_k = \sum_{j=-\infty}^{+\infty} P_j p_{jk}$   $(k = 0, \pm 1, \pm 2, \dots)$,

we say that the conditional limiting distribution defined by the numbers $P_k$
is a stationary distribution. If the conditional limiting distribution exists, it is
always a stationary distribution, i.e. (52) implies (54). As a matter of fact,
as we have supposed that all states are recurrent, the series $\sum_{n=1}^{\infty} p_{kk}^{(n)}$ $(k = 0, \pm 1, \dots)$
diverges, and thus, it follows from (52) that

(55)   $V_k = \lim_{n \to \infty} \frac{\sum_{r=1}^{n} p_{kk}^{(r)}}{\sum_{r=1}^{n} p_{00}^{(r)}} = \frac{P_k}{P_0}$.

But it has been proved by C. DERMAN [19] that if all states of an irreducible
chain are recurrent null states, the only solution $P_k = V_k$ of the system of
equations (54) with $V_0 = 1$ is given by the limits in (55) which always exist.
The existence of the limits (52) has been proved by P. ERDŐS and
K. L. CHUNG [20] in the special case of additive Markov chains. Let us
consider an additive Markov chain, i.e. put $\zeta_n = \zeta_0 + \delta_1 + \delta_2 + \dots + \delta_n$ where
the $\delta_k$'s are independent equidistributed random variables which take only
integer values, further suppose that the following conditions are satisfied:
Let us put $P(\delta_k = r) = W_r$ $(r = 0, \pm 1, \pm 2, \dots)$. We suppose that the
greatest common divisor of the differences $r - s$, where $r$ and $s$ are such
integers for which $W_r > 0$ and $W_s > 0$, is equal to 1, further that
$\sum_{r=-\infty}^{+\infty} |r| W_r < +\infty$ and $\sum_{r=-\infty}^{+\infty} r W_r = 0$. It has been shown by K. L. CHUNG
and P. ERDŐS [20] that in this case the chain is (in our terminology)
conditionally ergodic, and the limiting conditional distribution is the uniform
distribution over the set of all integers, i.e. $P_k = 1$. If the variance
$D^2 = \sum_{r=-\infty}^{+\infty} r^2 W_r$ of the variables $\delta_k$ exists, this follows easily by LAPLACE's
method; as a matter of fact in this case, putting $\varphi(t) = \sum_{r=-\infty}^{+\infty} W_r e^{irt}$, we have

(56)   $p_{jk}^{(n)} = \frac{1}{2\pi} \int_{-\pi}^{+\pi} (\varphi(t))^n e^{it(j-k)}\,dt$

and thus, by LAPLACE's method,

(57)   $p_{jk}^{(n)} \sim \frac{1}{D\sqrt{2\pi n}}$,

which implies

$\lim_{n \to \infty} \frac{p_{jk}^{(n)}}{p_{ih}^{(n)}} = 1$.

If $D = +\infty$, the proof is more intricate.
The existence of the limits (52) in the general case is still open, as has
been pointed out by CHUNG [21].
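
For an additive chain with finite variance, the behaviour described by (57) can be observed by direct convolution. The sketch below is only an illustration under assumptions made here: the step distribution $W_{-1} = W_0 = W_1 = 1/3$ (for which $D^2 = 2/3$) and the function name are hypothetical choices, not taken from the paper.

```python
import math

def n_step_distribution(weights: dict, n: int) -> dict:
    """Distribution of delta_1 + ... + delta_n as {displacement: probability}."""
    dist = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for pos, p in dist.items():
            for step, w in weights.items():
                nxt[pos + step] = nxt.get(pos + step, 0.0) + p * w
        dist = nxt
    return dist

# Illustrative step distribution: mean 0, differences of possible steps have g.c.d. 1.
weights = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
D = math.sqrt(sum(r * r * w for r, w in weights.items()))

for n in (10, 100, 400):
    dist = n_step_distribution(weights, n)
    # First column: p_{0,3}^{(n)} / p_{0,0}^{(n)}; second: p_{0,0}^{(n)} * D * sqrt(2 pi n).
    print(n, dist[3] / dist[0], dist[0] * D * math.sqrt(2 * math.pi * n))
```

Each $n$-step probability tends to 0, yet the quotient of two of them tends to 1, which is conditional ergodicity with $P_k = 1$.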
3.8. On a paradox of the renewal theory. Renewal theory (see [22])
deals with stochastic processes of the following type ("recurrent processes"):
events occur at the moments $0 < \tau_1 < \tau_2 < \dots < \tau_n < \dots$ where
the differences $\tau_n - \tau_{n-1} = \xi_n$ are independent positive random variables
with the same distribution function $F(x) = P(\xi_n < x)$; for any $t > 0$
let $\tau_{\nu(t)}$ denote the moment when the first event after time $t$ occurs,
i.e. we suppose $\tau_{\nu(t)-1} \le t < \tau_{\nu(t)}$ and put $\vartheta(t) = \tau_{\nu(t)} - t$; $\vartheta(t)$ is the
waiting time at $t$, i.e. the time somebody arriving at time $t$ has to wait
until the next event occurs. Let us now suppose that somebody arrives at
random in the time interval $(0, T)$ at time $\tau$, where $\tau$ is uniformly distributed
in $(0, T)$ and let us put $\delta(T) = \vartheta(\tau)$. It has been proved by W. FELLER
[22] that if $m = \int_0^\infty x\,dF(x)$ (the mean value of the time interval between
consecutive events) exists, and $F(x)$ is not a lattice distribution function, the
distribution of $\vartheta(T)$ tends for $T \to \infty$ to the limiting distribution

(59)   $H(x) = \lim_{T \to \infty} P(\vartheta(T) < x) = \frac{1}{m} \int_0^x (1 - F(u))\,du$.

It has been shown by L. TAKÁCS [23] that if we consider $\delta(T)$ instead of
$\vartheta(T)$, the limiting distribution exists for any $F(x)$ with finite mean value,
and is the same as in the case of $\vartheta(T)$, i.e. we have

(60)   $H(x) = \lim_{T \to \infty} P(\delta(T) < x) = \frac{1}{m} \int_0^x (1 - F(u))\,du$.

It has been pointed out by L. TAKÁCS [23] that the fact that the mean value
of the distribution function $H(x)$ is $\frac{m^2 + \sigma^2}{2m}$, where $\sigma^2$ denotes the variance
of $\xi_n$, i.e. $\sigma^2 = \int_0^\infty (x - m)^2\,dF(x)$, implies that if $\sigma = +\infty$, the mean value
of $\delta(T)$ tends to $+\infty$ in spite of the fact that $\delta(T)$ is the part of some
time interval $\xi_n$ having the finite mean value $m$.
The situation is still more paradoxical if $m = \infty$; it can be shown (see [9])
that in this case

(61)   $\lim_{T \to \infty} P(\delta(T) < x) = 0$

for all finite $x$; nevertheless, if putting $G(x) = \int_0^x (1 - F(t))\,dt$ the condition

(62)   $\lim_{x \to \infty} \frac{x\,G'(x)}{G(x)} = 0$

is satisfied, there exists the limit of the conditional distribution of $\delta(T)$, and
we have

(63)   $\lim_{T \to \infty} P(\delta(T) < x \mid \delta(T) < y) = \frac{G(x)}{G(y)}$   for $0 < x \le y < +\infty$.
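
Condition (62) and the limit (63) can be illustrated on a concrete distribution with infinite mean. The choice $F(x) = x/(1+x)$ below is a hypothetical example made here, not one taken from the paper; for it $G(x) = \log(1 + x)$ in closed form.

```python
import math

def survival(x: float) -> float:
    """1 - F(x) for the illustrative choice F(x) = x / (1 + x); the mean is infinite."""
    return 1.0 / (1.0 + x)

def G(x: float) -> float:
    """G(x) = integral_0^x (1 - F(t)) dt = log(1 + x) for this F."""
    return math.log1p(x)

def condition_62(x: float) -> float:
    """The quotient x G'(x) / G(x) whose limit appears in (62)."""
    return x * survival(x) / G(x)

def limit_63(x: float, y: float) -> float:
    """Right-hand side of (63): G(x) / G(y) for 0 < x <= y."""
    return G(x) / G(y)

for x in (1e1, 1e4, 1e12):
    print(x, condition_62(x))      # decreases slowly towards 0
print(limit_63(1.0, 3.0))          # log 2 / log 4
```

Here the unconditional probabilities in (61) tend to 0, while the conditional distribution has the definite limit $\log(1+x)/\log(1+y)$.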
3.9. Deduction of Maxwell's law of velocity distribution. Let us
consider an ideal gas consisting of $N$ particles with equal masses $m$, let
$\xi_k, \eta_k, \zeta_k$ denote the $x$, $y$, $z$-component of the velocity of the $k$-th particle;
we suppose that $\xi_k, \eta_k, \zeta_k$ $(k = 1, 2, \dots, N)$ are random variables defined on
a conditional probability space $[S, \mathcal{A}, \mathcal{B}, P(A|B)]$. Let us suppose that the
conditional probability distribution generated by the variables $\xi_k, \eta_k, \zeta_k$
$(k = 1, 2, \dots, N)$ in the $3N$-dimensional Euclidean space is uniform, further
that the space is regular with respect to the sets defined by $\xi_k = x_k$, $\eta_k = y_k$,
$\zeta_k = z_k$ $(-\infty < x_k < +\infty,\ -\infty < y_k < +\infty,\ -\infty < z_k < +\infty;\ k = 1, 2, \dots, N)$,
further that the sets $\sum_{(t_1, \dots, t_{3N}) \in M} B_{t_1, \dots, t_{3N}}$ belong to $\mathcal{A}$, where $B_{t_1, \dots, t_{3N}}$ is
the set defined by $\xi_1 = t_1$, $\eta_1 = t_2$, $\zeta_1 = t_3$, $\xi_2 = t_4, \dots, \zeta_N = t_{3N}$ and $M$ is a
Borel subset of the $3N$-dimensional Euclidean space, further that the space
is also regular with respect to the sets defined by $\xi_k^2 + \eta_k^2 + \zeta_k^2 = z_k$
$(k = 1, 2, \dots, N;\ 0 < z_k < +\infty)$. All these conditions are satisfied, if $S$ is
the $3N$-dimensional Euclidean space, $\mathcal{A}$ the set of all Borel subsets of $S$,
$\mathcal{B}_k$ the set of all Borel subsets of $S$ which have positive and finite $k$-dimensional
measure $(k = 1, 2, \dots, 3N)$, $\mathcal{B}_0$ the set of all points of $S$ and $\mathcal{B} = \sum_{k=0}^{3N} \mathcal{B}_k$,
and we put $P(A|B) = 1$ if $B \in \mathcal{B}_0$ and $B \subseteq A$, $P(A|B) = 0$ if $B \in \mathcal{B}_0$
and $B \not\subseteq A$, further $P(A|B) = \frac{m_k(AB)}{m_k(B)}$ if $B \in \mathcal{B}_k$, where $m_k(C)$ is
the $k$-dimensional measure of $C$ $(k = 1, 2, \dots, 3N)$, further if denoting
by $P = (x_1, y_1, z_1, \dots, x_N, y_N, z_N)$ a point of $S$ we put $\xi_1(P) = x_1$,
$\eta_1(P) = y_1, \dots, \zeta_N(P) = z_N$.
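It follows that the kinetic energy $\varepsilon_k = \frac{1}{2} m (\xi_k^2 + \eta_k^2 + \zeta_k^2)$ of the $k$-th
particle has the density function $\sqrt{x}$ for $0 < x < +\infty$ and thus by 2.6 that the
kinetic energy $\varepsilon = \sum_{k=1}^{N} \varepsilon_k$ of the whole gas has the density function $g_N(x)$
defined by the recursion

$g_1(x) = \sqrt{x}$,   $g_N(x) = \int_0^x g_{N-1}(x - y) \sqrt{y}\,dy$   for $x \ge 0$.

Thus

$g_N(x) = \frac{\left(\Gamma\left(\frac{3}{2}\right)\right)^N}{\Gamma\left(\frac{3N}{2}\right)}\, x^{\frac{3N}{2} - 1}$   $(x \ge 0)$,

and therefore, as any constant factor is irrelevant, we may take $x^{\frac{3N}{2} - 1}$ $(x > 0)$
for the density function of $\varepsilon$.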
Similarly, the conditional density function of $\varepsilon$ under condition $\varepsilon_k = t$
is clearly $(x - t)^{\frac{3N-5}{2}}$ for $x \ge t$. It can be easily verified that the conditions
for applying formula (48) are fulfilled, and thus we obtain that the conditional
density function of $\varepsilon_k$ with respect to the condition $\varepsilon = E$ is

$f(x \mid E) = \frac{(E - x)^{\frac{3N-5}{2}} \sqrt{x}}{\int_0^E (E - x)^{\frac{3N-5}{2}} \sqrt{x}\,dx}$   $(0 < x < E)$.

If $N$ is a large number, we have, putting $\beta = \frac{3N}{2E}$, approximately

$f(x \mid E) \approx \frac{2}{\sqrt{\pi}}\, \beta^{3/2} \sqrt{x}\, e^{-\beta x}$.

Thus the conditional density function $g(x \mid E) = m x\, f\left(\frac{m x^2}{2} \mid E\right)$ of the velocity
$v_k = \sqrt{\xi_k^2 + \eta_k^2 + \zeta_k^2}$ of the $k$-th particle under condition $\varepsilon = E$ is approximately

$g(x \mid E) \approx \sqrt{\frac{2}{\pi}}\, \frac{x^2}{\alpha^3}\, e^{-\frac{x^2}{2\alpha^2}}$

where $\alpha = \frac{1}{\sqrt{\beta m}}$. As it is known that $\frac{E}{N} = \frac{3}{2} k T$ where $T$ denotes the absolute
temperature of the gas and $k$ the constant of Boltzmann, we have
$\beta = \frac{1}{kT}$ and $\alpha = \sqrt{\frac{kT}{m}}$. Thus finally we obtain

(64)   $g(x \mid E) \approx \sqrt{\frac{2}{\pi}} \left(\frac{m}{kT}\right)^{3/2} x^2 e^{-\frac{m x^2}{2kT}}$   $(0 < x < +\infty)$,

i.e. Maxwell's law of velocity distribution.
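
The quality of the large-$N$ approximation of $f(x|E)$ can be inspected numerically. The following sketch is illustrative: the function names and the sample values of $N$, $E$ and $x$ are assumptions made here, and the normalising integral is evaluated through the Beta function identity $\int_0^E (E-u)^a \sqrt{u}\,du = E^{a+3/2} B(3/2, a+1)$.

```python
import math

def exact_density(x: float, E: float, N: int) -> float:
    """f(x|E) = (E - x)^a sqrt(x) / integral_0^E (E - u)^a sqrt(u) du, a = (3N - 5)/2."""
    a = (3 * N - 5) / 2.0
    log_norm = ((a + 1.5) * math.log(E) + math.lgamma(1.5)
                + math.lgamma(a + 1.0) - math.lgamma(a + 2.5))
    return math.exp(a * math.log(E - x) + 0.5 * math.log(x) - log_norm)

def approx_density(x: float, E: float, N: int) -> float:
    """Large-N approximation (2 / sqrt(pi)) beta^{3/2} sqrt(x) e^{-beta x}, beta = 3N / (2E)."""
    beta = 3.0 * N / (2.0 * E)
    return 2.0 / math.sqrt(math.pi) * beta ** 1.5 * math.sqrt(x) * math.exp(-beta * x)

# Illustrative values with E = 3N/2, so that beta = 1 in both cases.
for N in (2, 10, 1000):
    E = 1.5 * N
    print(N, exact_density(1.0, E, N), approx_density(1.0, E, N))
```

For $N = 2$ the two values differ by a few percent; for $N = 1000$ they agree closely, in line with the remark that Maxwell's law is an approximation for large $N$.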

Summarising, we may state our result as follows: if the components of
the velocity of each of $N$ particles are independent of each other and of
the components of the velocity of the other particles, further if all these
random variables have regular uniform (a priori) distributions on the whole real
axis, then the (a posteriori) distribution of the velocity of each particle under
the condition that the kinetic energy of the whole system is given, is
approximately, if $N$ is large, the Maxwell distribution (64).
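The proof given above shows clearly that the law of Maxwell is an
approximation, which is valid only for a large number of particles. For a
small number $N$ we obtain the exact formula

(65)   $g(x \mid E) = \frac{m x \left(E - \frac{m x^2}{2}\right)^{\frac{3N-5}{2}} \sqrt{\frac{m x^2}{2}}}{\int_0^E (E - u)^{\frac{3N-5}{2}} \sqrt{u}\,du}$   $\left(0 < x < \sqrt{\frac{2E}{m}}\right)$.

It can also be shown by the same way that the conditional density function
of each of the variables $\xi_k, \eta_k, \zeta_k$ under the condition $\varepsilon = E$ is exactly

(66)   $\frac{\Gamma\left(\frac{3N}{2}\right)}{\sqrt{3\pi N}\, \Gamma\left(\frac{3N-1}{2}\right) \alpha} \left(1 - \frac{x^2}{3N\alpha^2}\right)^{\frac{3N-3}{2}}$   $(|x| < \alpha\sqrt{3N})$,

where $\alpha = \sqrt{\frac{2E}{3Nm}}$. Thus under the condition $\varepsilon = E$ all these variables are
approximately normally distributed. This is the reason why the velocities have
a Maxwell distribution, as the Maxwell distribution can be defined as the
distribution of a variable $\sqrt{\xi^2 + \eta^2 + \zeta^2}$ where $\xi, \eta, \zeta$ are independent and
normally distributed random variables with mean value 0 and variance $\alpha^2$.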
§ 4. Conditional laws of large numbers
Conditional probability is in the same relation to conditional relative
frequency as ordinary probability to ordinary relative frequency. This relation,
which is well known as an empirical fact from everyday experience, is
described mathematically by the laws of large numbers.
The laws of large numbers concerning the behaviour in the limit of the
conditional relative frequency (and generalizations concerning conditional
means of observations) shall be called "conditional laws of large numbers". These
laws will be investigated in this §. Before going into details it should be
emphasized that the conditional probability $P(A|B)$ is considered in this paper as
an objective characteristic of the random event $A$, under an objective
condition $B$, and its value as that number in the near neighbourhood of which
the corresponding conditional relative frequency $\frac{k_{AB}}{k_B}$ will be found in general,
if a sufficiently great number $n$ of observations is made, where $k_B$
denotes the number of those among the $n$ observations with respect to which
the condition $B$ has been realised, and $k_{AB}$ the number of those observations
which gave such results that, besides the condition $B$ being realised, the
event $A$ occurred also, and it is supposed that $k_B > 0$.
4.1. A strong law of large numbers. In this section we consider random
variables defined on an ordinary probability space. We shall need the
following
LEMMA 1. Let $\xi_1, \xi_2, \dots, \xi_n, \dots$ be mutually independent random variables
with mean values $M(\xi_n) = M_n \ge 0$ and finite variances $D^2(\xi_n) = D_n^2$. Let us
put $\zeta_n = \sum_{k=1}^{n} \xi_k$ and $A_n = M(\zeta_n) = M_1 + M_2 + \dots + M_n$, and suppose that the
following conditions are fulfilled:

a)   $\lim_{n \to \infty} A_n = +\infty$,

b)   $\sum_{n=1}^{\infty} \frac{D_n^2}{A_n^2} < +\infty$.

It follows that

(67)   $P\left(\lim_{n \to \infty} \frac{\zeta_n}{A_n} = 1\right) = 1$.
PROOF. Lemma 1 (see e.g. [27], p. 238) is a consequence of the
well-known theorem of KOLMOGOROV, according to which if $\eta_1, \eta_2, \dots, \eta_n, \dots$
are mutually independent random variables and $M(\eta_k) = 0$ $(k = 1, 2, \dots)$, then
$\sum_{k=1}^{\infty} \eta_k$ converges with probability 1, provided that the series $\sum_{k=1}^{\infty} D^2(\eta_k)$
converges. Lemma 1 can be proved also directly by applying the following
inequality which has been found recently by J. HÁJEK (see [12]):
LEMMA 2. Let $\eta_1, \eta_2, \dots, \eta_n, \dots$ denote mutually independent random
variables with mean values $M(\eta_k) = 0$ and variances $D^2(\eta_k) = D_k^2$ $(k = 1, 2, \dots)$.
Let $c_k$ denote a decreasing sequence of positive numbers, $c_k \ge c_{k+1}$ $(k \ge n)$; then we
have for any $\varepsilon > 0$ and any $m > n$

(68)   $P\left(\max_{n \le k \le m} c_k |\eta_1 + \eta_2 + \dots + \eta_k| \ge \varepsilon\right) \le \frac{1}{\varepsilon^2} \left(c_n^2 \sum_{k=1}^{n} D_k^2 + \sum_{k=n+1}^{m} c_k^2 D_k^2\right)$.

Applying Lemma 2, Lemma 1 can be proved as follows: it follows from (68),
applied for $\eta_k = \xi_k - M_k$, $c_k = \frac{1}{A_k}$, on letting $m \to \infty$, that

(69)   $P\left(\sup_{k \ge n} \left|\frac{\zeta_k}{A_k} - 1\right| \ge \varepsilon\right) \le \frac{1}{\varepsilon^2} \left(\frac{1}{A_n^2} \sum_{k=1}^{n} D_k^2 + \sum_{k=n+1}^{\infty} \frac{D_k^2}{A_k^2}\right)$.

As a) and b) clearly imply that

(70)   $\lim_{n \to \infty} \frac{\sum_{k=1}^{n} D_k^2}{A_n^2} = 0$,

it follows from (69) that

(71)   $\lim_{n \to \infty} P\left(\sup_{k \ge n} \left|\frac{\zeta_k}{A_k} - 1\right| \ge \varepsilon\right) = 0$   for any $\varepsilon > 0$,

and (71) is evidently equivalent to (67).

4.2. A conditional law of large numbers.
THEOREM 15. Let $[S, \mathcal{A}, \mathcal{B}, P(A|B)]$ denote a conditional probability
space and $\xi_1, \xi_2, \dots, \xi_n, \dots$ random variables on $S$ which are mutually
independent with respect to $C \in \mathcal{B}$. Let $J$ denote the interval $a \le x < b$ $(a < b)$ of
the real axis. Let $B_n$ denote the set of those $a \in S$ for which $\xi_n(a) \in J$, and
suppose that $B_n \subseteq C$ and $B_n \in \mathcal{B}$; let us suppose that $M(\xi_n \mid B_n) = M_n > 0$ and
$D^2(\xi_n \mid B_n) = D_n^2$ exists $(n = 1, 2, \dots)$. Let us put $P(B_n \mid C) = p_n$ and suppose
that the following conditions are satisfied:

(72)   $\sum_{n=1}^{\infty} p_n = +\infty$ and $\sum_{n=1}^{\infty} p_n M_n = +\infty$;

(73)   $\lim_{n \to \infty} \frac{\sum_{k=1}^{n} p_k M_k}{\sum_{k=1}^{n} p_k} = M$ exists;

(74)   $\sum_{n=1}^{\infty} \frac{p_n \left(D_n^2 + (1 - p_n) M_n^2\right)}{\left(\sum_{k=1}^{n} p_k M_k\right)^2} < +\infty$.

Then we have

(75)   $P\left(\lim_{n \to \infty} \frac{\sum_{k \le n,\ \xi_k \in J} \xi_k}{\sum_{k \le n,\ \xi_k \in J} 1} = M \,\Big|\, C\right) = 1$.
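PROOF. Let us define the random variable $\varepsilon_k$ as follows: $\varepsilon_k = 1$ if $\xi_k \in J$
and $\varepsilon_k = 0$ if $\xi_k \notin J$; let us put $\xi'_k = \xi_k \varepsilon_k$. Then we have

$M(\xi'_k \mid C) = p_k M_k$ and $M(\xi'^2_k \mid C) = p_k (D_k^2 + M_k^2)$

and thus

$D^2(\xi'_k \mid C) = p_k \left(D_k^2 + (1 - p_k) M_k^2\right)$.

Applying Lemma 1 to the sequence $\xi'_k$ of random variables on $C$, it follows
that

(76)   $P\left(\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \xi'_k}{\sum_{k=1}^{n} p_k M_k} = 1 \,\Big|\, C\right) = 1$.

On the other hand, let us apply Lemma 1 to the sequence of random
variables $\varepsilon_k$ on $C$. As

$M(\varepsilon_k \mid C) = p_k$ and $D^2(\varepsilon_k \mid C) = p_k (1 - p_k)$,

it follows that

(77)   $P\left(\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \varepsilon_k}{\sum_{k=1}^{n} p_k} = 1 \,\Big|\, C\right) = 1$.

Combining (76), (77) and condition (73) of the theorem, and taking into
account that

$\sum_{k \le n,\ \xi_k \in J} \xi_k = \sum_{k=1}^{n} \xi'_k$ and $\sum_{k \le n,\ \xi_k \in J} 1 = \sum_{k=1}^{n} \varepsilon_k$,

(75) follows. Thus Theorem 15 is proved.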

The statement of Theorem 15 can be expressed in words as follows:
the conditional empirical mean value of those of the variables $\xi_1, \xi_2, \dots, \xi_n$
which take on values lying in the interval $J$, converges with conditional
probability 1 with respect to $C$ to the limit $M$ defined by (73). In the special
case, when $M_n = M > 0$ and $D_n = D > 0$ do not depend on $n$, the conditions
(72)-(74) reduce to the single condition that the series $\sum_{n=1}^{\infty} p_n$ diverges.
Especially let us suppose further that $J$ is the closed interval $[0, 1]$ and the
only values in $J$ which the variables $\xi_n$ can take on $B_n$ are the values 0 and
1 (of course $\xi_n$ can take also other values outside $J$). Suppose that $A_n$ is the
set on which $\xi_n = 1$, and $p = P(A_n \mid B_n) > 0$ does not depend on $n$.
In this case the events $A_n$ and $B_n$ can be interpreted as the events
consisting in the realisation of some events $A$ and $B$, respectively, at the $n$-th
experiment in a sequence of independent experiments, and we may put
$p = P(A|B)$; in this case

$f_n(A \mid B) = \frac{\sum_{k \le n,\ \xi_k \in J} \xi_k}{\sum_{k \le n,\ \xi_k \in J} 1}$

is the conditional relative frequency of the event $A$ with respect to the event
$B$ in course of the first $n$ observations.* The statement of Theorem 15 gives
for this special case

(78)   $P\left(\lim_{n \to \infty} f_n(A \mid B) = P(A \mid B) \,\Big|\, C\right) = 1$,

if $P(B \mid C) > 0$, i.e. the conditional relative frequency of the event $A$ with
respect to the event $B$ converges with the conditional probability 1 (with respect
to $C$) to the conditional probability of $A$ with respect to $B$.

* This interpretation suggests itself in the following special case: let us form the
Cartesian product of a denumerable sequence of isomorphic conditional probability spaces
$[S, \mathcal{A}, \mathcal{B}, P]$ and denote by $A_n$ resp. $B_n$ the sets defined by $S * S * \dots * A * S * \dots$ resp.
$S * S * \dots * B * S * \dots$, where $A$ resp. $B$ stands in the $n$-th place (see section 1.11).

4.3. Generalization of the theorem of Borel on normal decimals. The
theorem of BOREL [25] states that if $f_n(k; x)$ denotes the relative frequency
of the number $k$ among the first $n$ digits of the decimal expansion of the
real number $x$ $(0 \le x < 1)$, we have

(79)   $\lim_{n \to \infty} f_n(k; x) = \frac{1}{10}$   $(k = 0, 1, 2, \dots, 9)$

for almost every $x$; similarly all possible digits occur in the limit with equal
frequency in the expansion, in the number system with any basis $q$, of almost
all real numbers.
We consider a more general representation of real numbers; namely
expansion of real numbers into Cantor's series. Let $q_n$ $(n = 1, 2, \dots)$ denote
an arbitrary sequence of integers, $q_n \ge 2$. It is easy to prove (see [24]) that
any real number $x$ $(0 \le x < 1)$ can be represented in the form

$x = \sum_{n=1}^{\infty} \frac{\varepsilon_n(x)}{q_1 q_2 \cdots q_n}$,

where $\varepsilon_n(x)$ may take on the values $0, 1, \dots, q_n - 1$; the digits $\varepsilon_n(x)$ are
uniquely determined except for those rational numbers $x$ which can be written
in the form $x = \sum_{n=1}^{N} \frac{\varepsilon_n(x)}{q_1 q_2 \cdots q_n}$ $(\varepsilon_N(x) > 0)$, where besides this finite
representation the alternative infinite representation

$x = \sum_{n=1}^{N-1} \frac{\varepsilon_n(x)}{q_1 q_2 \cdots q_n} + \frac{\varepsilon_N(x) - 1}{q_1 q_2 \cdots q_N} + \sum_{n=N+1}^{\infty} \frac{q_n - 1}{q_1 q_2 \cdots q_n}$

is also possible. We shall prove that in the case $q_n \to +\infty$, when all
non-negative integers are possible "digits", all these digits occur in the limit with
the same (conditional) relative frequency, provided that $\sum_{n=1}^{\infty} \frac{1}{q_n}$ diverges. This
is contained in the following theorem which is a consequence of Theorem 15.
THEOREM 16. Let the numbers $q_n$ satisfy the following conditions:
a) $q_n$ is an integer, $q_n \ge 2$ for $n = 1, 2, \dots$,
b) $q_n \ge N$ for $n \ge n_0$,
c) $\sum_{n=1}^{\infty} \frac{1}{q_n} = +\infty$.
Let us consider the expansion of real numbers $x$ $(0 \le x < 1)$ into Cantor's
series with quotients $q_n$, i.e. the expansion

(80)   $x = \sum_{n=1}^{\infty} \frac{\varepsilon_n(x)}{q_1 q_2 \cdots q_n}$

where $\varepsilon_n(x)$ is one of the numbers $0, 1, \dots, q_n - 1$.
Let $F_n(k; x)$ denote how many of the "digits" $\varepsilon_1(x), \varepsilon_2(x), \dots, \varepsilon_n(x)$ are
equal to $k$. Then for almost every $x$ in $(0, 1)$ we have

(81)   $\lim_{n \to \infty} \frac{F_n(j; x)}{F_n(k; x)} = 1$   for $0 \le j < k \le N - 1$.
REMARK 1. If $q_n \to \infty$, condition b) is fulfilled for every $N$, and thus
(81) holds for any pair of non-negative integers $j$ and $k$, i.e. all digits
$0, 1, 2, \dots$ occur in the limit asymptotically with the same frequency, in the
Cantor expansion of almost all real numbers. This is the case, for instance,
if $q_n = n + 1$, i.e. if we consider the expansion

(82)   $x = \sum_{k=2}^{\infty} \frac{\varepsilon_k(x)}{k!}$

where $\varepsilon_k(x)$ can have the values $0, 1, \dots, k - 1$.

REMARK 2. It follows from (81) that for almost all $x$

(83)   $\lim_{n \to \infty} \frac{F_n(k; x)}{\sum_{j=0}^{l} F_n(j; x)} = \frac{1}{l + 1}$   for $k = 0, 1, \dots, l$ and $l = 1, 2, \dots, N - 1$.

This is another way of expressing that the frequencies of all digits
$0, 1, \dots, N - 1$ are in the limit equal to one another.
If $q_n \to \infty$, we have

(84)   $\lim_{n \to \infty} \frac{F_n(k; x)}{n} = 0$   for $k = 0, 1, \dots$

for almost all $x$. This can be shown to be a consequence of Lemma 1 (see
(86) below).
REMARK 3. Theorem 16 clearly reduces to the theorem of Borel if
$q_n = 10$ for $n = 1, 2, \dots$.
PROOF. Theorem 16 is a special case of Theorem 15, but it is more
simple to prove it directly by means of Lemma 1 of section 4.1. Let us
consider the ordinary probability space $[S, \mathcal{A}, P(A)]$ which we obtain if we
choose for $S$ the interval $(0, 1)$, for $\mathcal{A}$ the set of all measurable subsets of $S$
and put $P(A) = m(A)$ where $m(A)$ is the Lebesgue measure of the set $A$. Let
us define the random variables $\xi_{nk} = \xi_{nk}(x)$ $(0 \le x < 1)$ for $0 \le k \le N - 1$
$(n = 1, 2, \dots)$ as follows:

(85)   $\xi_{nk}(x) = 1$ if $\varepsilon_n(x) = k$, and $\xi_{nk}(x) = 0$ otherwise.

The variables $\xi_{nk}$ $(n = 1, 2, \dots)$ are clearly mutually independent, further
we have $M_{nk} = M(\xi_{nk}(x)) = \frac{1}{q_n}$ and $D_{nk}^2 = D^2(\xi_{nk}(x)) = \frac{1}{q_n}\left(1 - \frac{1}{q_n}\right)$ for $q_n > k$,
i.e. for $n \ge n_0$ if $k < N$; the values of $M_{nk}$ resp. $D_{nk}$ for $n < n_0$ are irrelevant.
Condition a) of Lemma 1 follows from the divergence of $\sum 1/q_n$ (supposition c)
of Theorem 16), and condition b) of Lemma 1 follows also from supposition c)
of Theorem 16. Thus we have

(86)   $P\left(\lim_{n \to \infty} \frac{\sum_{i=1}^{n} \xi_{ik}}{\sum_{i=1}^{n} \frac{1}{q_i}} = 1\right) = 1$.

As (86) holds for all $k = 0, 1, \dots, N - 1$ and $\sum_{i=1}^{n} \xi_{ik} = F_n(k; x)$, the assertion
of Theorem 16 follows.
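
The Cantor expansion (80) is easy to compute with exact rational arithmetic. The sketch below is a minimal illustration; the function names and the sample number $5/8$ are assumptions made here, and `q` stands for the sequence of quotients $q_n$ given as a function of $n$.

```python
from fractions import Fraction

def cantor_digits(x: Fraction, q, count: int) -> list:
    """First `count` digits eps_1, ..., eps_count of x in [0, 1) for quotients q(1), q(2), ..."""
    digits = []
    for n in range(1, count + 1):
        x = x * q(n)
        d = int(x)          # integer part; x is non-negative, so this is the floor
        digits.append(d)
        x -= d
    return digits

def cantor_value(digits: list, q) -> Fraction:
    """Sum of eps_n / (q_1 q_2 ... q_n) over the given digits."""
    total, denom = Fraction(0), 1
    for n, d in enumerate(digits, start=1):
        denom *= q(n)
        total += Fraction(d, denom)
    return total

factorial_q = lambda n: n + 1                          # q_n = n + 1, the expansion (82)
digits = cantor_digits(Fraction(5, 8), factorial_q, 5)
print(digits, cantor_value(digits, factorial_q))       # 5/8 = 1/2! + 0/3! + 3/4!
print(cantor_digits(Fraction(1, 8), lambda n: 10, 3))  # ordinary decimal digits of 0.125
```

With $q_n = 10$ the routine returns ordinary decimal digits, which corresponds to Remark 3.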
The results of this section (as well as those of section 3.2) show that
there exist sequences $x_1, x_2, \dots, x_n, \dots$ consisting of the numbers $0, 1, 2, \dots$, in
which the relative frequency $\frac{F_n(k)}{n}$ (where $F_n(k)$ denotes how many of the
numbers $x_1, x_2, \dots, x_n$ are equal to $k$) of every number $k = 0, 1, \dots$ tends to
0 for $n \to \infty$, but the conditional relative frequencies converge to definite
limits, i.e.

$\lim_{n \to \infty} \frac{F_n(h)}{\sum_{j=0}^{k} F_n(j)} = \frac{P_h}{\sum_{j=0}^{k} P_j}$   $(h = 0, 1, \dots, k;\ k = 1, 2, \dots)$

where $P_k > 0$ for $k = 0, 1, \dots$ and $\sum_{k=0}^{\infty} P_k = +\infty$. Such sequences may be
considered as mathematical models of sequences of observations on a random
variable $\xi$, defined on a conditional probability field, and having the conditional
distribution

$P(\xi = h \mid \xi \le k) = \frac{P_h}{\sum_{j=0}^{k} P_j}$   $(h = 0, 1, \dots, k;\ k = 1, 2, \dots)$.
(Received 25 July 1955)
Bibliography
[1] A. N. KOLMOGOROFF, Grundbegriffe der Wahrscheinlichkeitsrechnung (Berlin, 1933).
[2] H. JEFFREYS, Theory of probability (London, 1943).
[3] H. REICHENBACH, Wahrscheinlichkeitslehre (Leiden, 1935).
[4] J. M. KEYNES, A treatise on probability theory.
[5] B. O. KOOPMAN, The axioms and algebra of intuitive probability, Annals of Math., 41
(1940), pp. 269--292.
[6] A. H. COPELAND, Postulates for the theory of probability, Amer. Journal of Math., 63
(1941), pp. 741--762.
[7] G. A. BARNARD, Statistical inference, Journal of the Royal Statistical Soc., Ser. B, 11
(1949), pp. 115--139.
[8] I. J. GOOD, Probability and the weighing of evidence (London, 1950).
[9] A. RÉNYI, A valószínűségszámítás új axiomatikus felépítése, MTA Mat. és Fiz. Oszt.
Közl., 4 (1954), pp. 359--427.
[10] A. RÉNYI, On a new axiomatic foundation of the theory of probability. Under press
in the Volume 1 of the Proceedings of the International Mathematical Congress
in Amsterdam, 1954.
[11] A. RÉNYI, Über die axiomatische Begründung der Wahrscheinlichkeitsrechnung. Under
press in the Proceedings of the Conference on Probability Theory in Berlin,
1954.
[12] J. HÁJEK and A. RÉNYI, Generalization of an inequality of Kolmogorov, Acta Math.
Acad. Sci. Hung., 6 (1955), pp. 281--283.
[13] Á. CSÁSZÁR, Sur la structure des espaces de probabilité conditionnelle, Acta Math. Acad.
Sci. Hung., 6 (1955), pp. 337--361.
[14] J. NEYMAN, L'estimation statistique traitée comme un problème classique de probabilité,
Act. Sci. et Ind., 739 (Paris, 1938), pp. 25--57.
[15] P. HALMOS, Measure theory (New York, 1951).
[16] P. ERDŐS, On the integers having exactly k prime factors, Annals of Math., 49 (1948),
pp. 53--66.
[17] B. V. GNEDENKO, Über ein lokales Grenzwerttheorem für gleichverteilte unabhängige
Summanden, Wiss. Zeitschrift der Humboldt-Universität Berlin, 3 (1954), pp.
287--293.
[18] J. L. DOOB, Stochastic processes (New York, 1953).
[19] C. DERMAN, A solution to a set of fundamental equations in Markov chains, Proc. Amer.
Math. Soc., 5 (1954), pp. 332--334.
[20] P. ERDŐS and K. L. CHUNG, Probability limit theorems assuming only the first moment. I,
Memoirs of the Amer. Math. Soc., 6 (1952).
[21] K. L. CHUNG, Contributions to the theory of Markov chains, Trans. Amer. Math. Soc.,
76 (1954), pp. 397--419. See also K. L. CHUNG, An ergodic theorem for stationary
Markov chains with a countable number of states, Proc. Internat. Congress
of Math. Cambridge (USA), 1950 (Amer. Math. Soc., 1952), I, p. 568.
[22] W. FELLER, On the integral equation of renewal theory, Ann. Math. Stat., 12 (1941),
pp. 243--267. See also S. TÄCKLIND, Elementare Behandlung vom Erneuerungsproblem,
Skandinavisk Aktuarietidskrift, (1944), pp. 1--15, and Fourier-analytische
Behandlung vom Erneuerungsproblem, Skandinavisk Aktuarietidskrift,
(1945), pp. 101--105; and J. L. DOOB, Renewal theory from the point of view
of the theory of probability, Trans. Amer. Math. Soc., 63 (1948), pp. 422--438.
[23] L. TAKÁCS, Egy új módszer rekurrens sztochasztikus folyamatok tárgyalásánál, MTA Alk.
Mat. Int. Közleményei, 2 (1953), pp. 135--163.
[24] O. PERRON, Irrationalzahlen (Berlin, 1921), pp. 111--116.
[25] E. BOREL, Les probabilités dénombrables et leurs applications arithmétiques, Rend. Circ.
Math. Palermo, 27 (1909), pp. 247--271.
[26] H. HAHN and A. ROSENTHAL, Set functions (New Mexico, 1948).
[27] M. LOÈVE, Probability theory (New York, 1955).

A NEW AXIOMATIC FOUNDATION OF THE THEORY OF PROBABILITY

A. RÉNYI (Budapest)
(Summary)

The paper contains a new axiomatic foundation of probability theory; the basic notion of this new theory, which is a generalization of the theory of A. N. KOLMOGOROV, is the notion of conditional probability. The need for a new theory arose because unbounded measures are excluded in the theory of A. N. KOLMOGOROV; for example, it has no sense to speak of a uniform distribution on the whole n-dimensional Euclidean space, whereas the applications of probability theory to physics, integral geometry, number theory etc. require the consideration of such distributions, which cannot be normed. If conditional probability is chosen as the basic notion of probability theory, these distributions can be given an exact mathematical meaning. The new theory makes possible the generalization of a number of theorems of probability theory as well as new applications of the theory; the paper develops the new theory and presents some of its typical applications.

The theory starts from the following assumptions and axioms:

Let $S$ be an arbitrary set, which we shall call the space of elementary events; let $\mathcal{A}$ be a σ-algebra of subsets of $S$, whose elements we shall call events; let $\mathcal{B}$ be a non-empty subset of $\mathcal{A}$ ($\mathcal{B}$ is the set of all admissible conditions), and let $P(A \mid B)$ be a set function of two variables, defined for $A \in \mathcal{A}$ and $B \in \mathcal{B}$; the number $P(A \mid B)$ will be called the conditional probability of the event $A$ with respect to the condition $B$. We suppose that the following three axioms are satisfied:

Axiom I. $P(A \mid B) \ge 0$ if $A \in \mathcal{A}$ and $B \in \mathcal{B}$, and $P(B \mid B) = 1$ if $B \in \mathcal{B}$.

Axiom II. For fixed $B \in \mathcal{B}$, $P(A \mid B)$ is a measure on the σ-algebra $\mathcal{A}$.

Axiom III. If $A \in \mathcal{A}$, $B \in \mathcal{A}$, $C \in \mathcal{B}$ and $BC \in \mathcal{B}$, then

$$P(A \mid BC)\, P(B \mid C) = P(AB \mid C).$$

If Axioms I–III are satisfied, the set $S$, the σ-algebra $\mathcal{A}$, the system of sets $\mathcal{B}$ and the set function $P(A \mid B)$ together will be called a conditional probability space and denoted by $[S, \mathcal{A}, \mathcal{B}, P]$.

Evidently, if $B$ is fixed, then $S$, $\mathcal{A}$ and $P(A \mid B)$ form a probability space in the sense of the theory of A. N. KOLMOGOROV; thus a conditional probability space is nothing else than a family of ordinary probability spaces connected by Axiom III.

As an immediate consequence of the axioms one obtains the equality $P(A \mid B) = P(AB \mid B)$, and therefore $P(A \mid B) \le 1$. Section 1.4 deals with further simple consequences of the axioms. In Section 1.5 it is investigated under what conditions the conditional probability $P(A \mid B)$ can be represented in the form

$$P(A \mid B) = \frac{Q(AB)}{Q(B)},$$

where $Q$ is a measure (not necessarily bounded) on the σ-algebra $\mathcal{A}$ and $Q(B) > 0$ for $B \in \mathcal{B}$.
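
The representation $P(A \mid B) = Q(AB)/Q(B)$ can be illustrated by a minimal sketch. It assumes, purely for illustration, that $Q$ is the counting measure on the integers (an unbounded measure) and that the admissible conditions are the finite non-empty sets; the names `Q`, `P` and `satisfies_axiom_III` are hypothetical and do not come from the paper.

```python
from fractions import Fraction

def Q(event):
    """Counting measure of a finite set of integers (assumed example measure)."""
    return len(event)

def P(A, B):
    """Conditional probability P(A | B) = Q(A ∩ B) / Q(B)."""
    if Q(B) == 0:
        # B is admissible only if Q(B) > 0
        raise ValueError("B is not an admissible condition")
    return Fraction(Q(A & B), Q(B))

def satisfies_axiom_III(A, B, C):
    """Check P(A | BC) P(B | C) == P(AB | C), assuming BC is admissible."""
    return P(A, B & C) * P(B, C) == P(A & B, C)

# Illustrative events: finite sets of integers
A = frozenset(range(0, 20, 2))   # even numbers below 20
B = frozenset(range(5, 15))
C = frozenset(range(20))

print(P(A, C))                    # conditional probability of "even" given C
print(satisfies_axiom_III(A, B, C))
```

Although no probability can be assigned to $A$ on the whole set of integers by norming $Q$, every quotient above is well defined, and Axiom III reduces to the cancellation of $Q(BC)$.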

An exhaustive study of this question, and also of the following more general question: in which case can the conditional probability $P(A \mid B)$ be represented in the form

$$P(A \mid B) = \frac{Q_\alpha(AB)}{Q_\alpha(B)},$$

where the measure $Q_\alpha$ is chosen from a certain set of measures in dependence on $B$, can be found in the paper of Á. CSÁSZÁR [13].

Section 1.6 investigates random variables defined on a conditional probability space and contains the definition of their distribution function and density function. A function $\xi = \xi(s)$ $(s \in S)$ is called a random variable if it is measurable with respect to $\mathcal{A}$. A monotonically non-decreasing and left-continuous function $F(x)$ is called a (generalized) distribution function of the random variable $\xi$ if the set where $a \le \xi(s) < b$ belongs to $\mathcal{B}$ whenever $F(b) - F(a) > 0$, and in this case

$$P(c \le \xi < d \mid a \le \xi < b) = \frac{F(d) - F(c)}{F(b) - F(a)} \quad \text{if } a \le c < d \le b.$$
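
The quotient defining the generalized distribution function can be evaluated directly. The following is a minimal sketch, assuming as an example $F(x) = x$, which corresponds to a uniform distribution on the whole real axis; the function names are hypothetical and exact rational arithmetic is used only to keep the illustration free of rounding.

```python
from fractions import Fraction

def conditional_interval_probability(F, c, d, a, b):
    """P(c <= xi < d | a <= xi < b) = (F(d) - F(c)) / (F(b) - F(a))."""
    if not (a <= c < d <= b):
        raise ValueError("need a <= c < d <= b")
    denominator = F(b) - F(a)
    if denominator <= 0:
        # the condition a <= xi < b is admissible only if F(b) - F(a) > 0
        raise ValueError("condition is not admissible")
    return (F(d) - F(c)) / denominator

def uniform_F(x):
    """Assumed example: F(x) = x, i.e. density 1 on the whole real axis."""
    return Fraction(x)

def rescaled_F(x):
    """3 F(x) - 7: a positive multiple of F plus a constant."""
    return 3 * uniform_F(x) - 7

p1 = conditional_interval_probability(uniform_F, 2, 3, 0, 10)
p2 = conditional_interval_probability(rescaled_F, 2, 3, 0, 10)
print(p1, p2)
```

Both calls give the same value, since a positive factor and an additive constant cancel in the quotient; only differences of $F$ enter the definition.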
The distribution function $F(x)$ is not uniquely determined, because together with $F(x)$ the function $\lambda F(x) + \mu$, where $\lambda > 0$ and $\mu$ is real, also satisfies the definition; the distribution function $F(x)$ may take on arbitrary real (hence also negative) values. If $F(x)$ is absolutely continuous, the function $f(x) = F'(x)$ is called the generalized density function of the random variable $\xi$; the density function is determined only up to a positive constant factor, and $\int_{-\infty}^{+\infty} f(x)\,dx$ does not necessarily exist. If, in particular, $f(x) \equiv 1$ $(-\infty < x < +\infty)$, we say that the random variable $\xi$ is uniformly distributed on the whole real axis.

Section 1.7 deals with a reformulation of Axiom III. Section 1.8 considers the extension of conditional probability spaces, Section 1.9 the continuity of conditional probability, and Section 1.10 the definition of the product of conditional probability spaces. Sections 2.1 and 2.2 contain examples of the construction of conditional probability spaces; in Sections 2.3 and 2.4 conditional probability spaces having certain additional properties are studied: the so-called Cavalieri spaces and the regular spaces. Section 2.5 considers the paradox of BOREL, and Section 2.6 the distribution of the sum of independent random variables and the composition of the generalized distribution functions introduced in Section 1.6.

In § 3 it is shown by several examples how the new theory leads to the discovery of some new relations which cannot be formulated within the framework of the ordinary theory of probability. Section 3.6 contains a generalization of the method of BAYES. Section 3.7 contains the definition of conditional ergodicity for Markov chains. In Section 3.8 a paradox of renewal theory is considered. Section 3.9 contains a simple proof of the law of MAXWELL on the distribution of velocities, which starts from simpler assumptions than usual and uses the generalization of the method of BAYES given in Section 3.6.

§ 4 deals with conditional laws of large numbers, which concern the convergence with probability 1 of the conditional mean of observations. As an application, Section 4.3 contains a generalization of the theorem of BOREL on normal decimal expansions to Cantor series.
