CONTENTS

Introduction

PART I
1. Definitions and immediate consequences
2. Mixtures of i.i.d. sequences
3. de Finetti's theorem
4. Exchangeable sequences and their directing random measures
5. Finite exchangeable sequences

PART II
6. Properties equivalent to exchangeability
7. Abstract spaces
8. The subsequence principle
9. Other discrete structures
10. Continuous-time processes
11. Exchangeable random partitions

PART III
12. Abstract results
13. The infinitary tree
14. Partial exchangeability for arrays: the basic structure results
15. Partial exchangeability for arrays: complements
16. The infinite-dimensional cube

PART IV
17. Exchangeable random sets
18. Sufficient statistics and mixtures
19. Exchangeability in population genetics
20. Sampling processes and weak convergence
21. Other results and open problems

Appendix
Notation
References

Introduction

If you had asked a probabilist in 1970 what was known about exchangeability, you would likely have received the answer "There's de Finetti's theorem: what else is there to say?" The purpose of these notes is to dispel this (still prevalent) attitude by presenting, in Parts II-IV, a variety of mostly post-1970 results relating to exchangeability. The selection of topics is biased toward my own interests, and away from those areas for which survey articles already exist. Any student who has taken a standard first year graduate course in measure-theoretic probability theory (e.g. Breiman (1968)) should be able to follow most of this article; some sections require knowledge of weak convergence.

In Bayesian language, de Finetti's theorem says that the general infinite exchangeable sequence (Z_i) is obtained by first picking a distribution θ at random from some prior, and then taking (Z_i) to be i.i.d. with distribution θ. Rephrasing in the language of probability theory, the theorem says that with (Z_i) we can associate a random distribution α(ω,·) such that, conditional on α = θ, the variables (Z_i) are i.i.d. with distribution θ. This formulation is the central fact in the circle of ideas surrounding de Finetti's theorem, which occupies most of Part I. No previous knowledge of exchangeability is assumed, though the reader who finds my proofs overly concise should take time out to read the more carefully detailed account in Chow and Teicher (1978), Section 7.3.

Part II contains results complementary to de Finetti's theorem. Dacunha-Castelle's "spreading-invariance" property and Kallenberg's stopping time property give conditions on an infinite sequence which turn out to be equivalent to exchangeability. Kingman's "paintbox" description of exchangeable random partitions leads to Cauchy's formula for the distribution of cycle lengths in a uniform random permutation, and to results about components of random functions. Continuous-time processes with interchangeable increments are discussed; a notable result is that any continuous-path process on [0,∞) (resp. [0,1]) with interchangeable increments is a mixture of processes which are linear transformations of Brownian motion (resp. Brownian bridge). The subsequence principle reveals exchangeable-like sequences lurking unsuspectedly within arbitrary sequences of random variables.
We also discuss exchangeability in abstract spaces, and weak convergence issues.

The class of exchangeable sequences is the class of processes whose distributions are invariant under a certain group of transformations; in Part III related invariance concepts are described. After giving the abstract result on ergodic decompositions of measures invariant under a group of transformations, we specialize to the setting of partial exchangeability, where we study the class of processes (X_i: i ∈ I) invariant under the action of some group of transformations of the index set I. Whether anything can be proved about partially exchangeable classes in general is a challenging open problem; we can only discuss three particular instances. The most-studied instance, investigated by Hoover and by myself, is partial exchangeability for arrays of random variables, where the picture is fairly complete. We also discuss partial exchangeability on trees of infinite degree, where the basic examples are reversible Markov chains; and on infinite-dimensional cubes, where it appears that the basic examples are random walks, though here the picture remains fragmentary.

Part IV outlines other topics of current research. A now-classical result on convergence of partial sum processes from sampling without replacement to Brownian bridge leads to general questions of convergence for triangular arrays of finite exchangeable sequences, where the present picture is unsatisfactory for applications. Kingman's uses of exchangeability in mathematical genetics will be sketched. The theory of sufficient statistics and mixtures of processes of a specified form will also be sketched--actually, this topic is perhaps the most widely studied relative of exchangeability, but in view of the existing accounts in Lauritzen (1982) and Diaconis and Freedman (1982), I have not emphasized it in these notes. Kallenberg's stopping time approach to continuous-time exchangeability is illustrated by the study of exchangeable subsets of [0,∞). A final section provides references to work related to exchangeability not elsewhere discussed: I apologize in advance to those colleagues whose favorite theorems I have overlooked.

General references. Chow and Teicher (1978) is the only textbook (known to me) to give more than a cursory mention to exchangeability. A short but elegant survey of exchangeability, whose influence can be seen in these notes, has been given by Kingman (1978a). In 1981 a conference on "Exchangeability in Probability and Statistics" was held in Rome to honor Professor Bruno de Finetti; the conference proceedings (EPS in the References) form a sample of the current interests of workers in exchangeability. Dynkin (1978) gives a concise abstract treatment of the "sufficient statistics" approach in several areas of probability including exchangeability. The material in Sections 13 and 16 is new, and perhaps a couple of proofs elsewhere may be new; otherwise no novelty is claimed.

Notation and terminology. The mathematical notation is intended to be standard, so the reader should seldom find it necessary to consult the list of notation at the end. As for terminology, "exchangeable" is more popular and shorter than the synonyms "symmetrically dependent" and "interchangeable". I have introduced "directing random measure" in place of Kallenberg's "canonical random measure", partly as a more vivid metaphor and partly for more grammatical flexibility, so one can say "directed by ...".
I use "partial exchangeability" in the narrow sense of Section 12 (processes with certain types of invariance) rather than in the wider context of Section 18 (processes with specified sufficient statistics). "Problem" means “unsolved problem" rather than "exercise": if you can solve one, please Tet me know. Acknowledgements. My thanks to Persi Diaconis for innumerable invaluable discussions over the last several years; and to the members of the audiences at St. Flour and the Berkeley preview who detected errors and contributed to the presentation. Research supported by National Science Foundation Grant MCS80-02698. PART I The purpose of Part I is to give an account of de Finetti's theorem and some straightforward consequences, using the language and techniques of modern probability theory. I have not attempted to assign attributions to these results: historical accounts of de Finetti's work on exchangeability and the subsequent development of the subject can be found in EPS (Foreword, and Furst's article) and in Hewitt and Savage (1955). 1. Definitions and immediate consequences A finite sequence (Z).---52y) of random variables is called exchangeable (or N-exchangeable, to indicate the number of random variables) if aay (Zyreeeidy) 2 yy Zany) § each permutation 7 of {1,...,N}. An infinite sequence (Z),25, is called exchangeable if (1.2) (52, GyeryZn(2y? for each finite permutation 7 of {1,2,-..}, that is each permutation for which #{i: (i) #1} <=, Throughout Part I we shall regard random variables Z; as real-valued; but we shall see in Section 7 that most results remain true whenever the have any "non-pathological” range space. 4 There are several obvious reformulations of these definitions. Any finite permutation can be obtained by composing permutations which trans- pose 1 and n> 13 so (1.2) is equivalent to the at first sight weaker condition 6 AE a pee ee ereen eee (1.3) (2yseeealgyeZaeZnea ee? each n>, In the other direction, (1.2) implies the at first sight stronger condition (1.4) (2) 225323, each sequence (n;) with distinct elements. In Section 6 we shal] see some non-trivially equivalent conditions. Sampling variables. The most elementary examples of exchangeability arise in sampling. Suppose an urn contains N balls labelled xy,...9Xy- The results Z),Zp.+++ of an infinite sequence of draws with replacement form an infinite exchangeable sequence; the results Z,,---.2y of N draws without replacement form a N-exchangeable sequence (sequences of this latter type we call urn sequences). Both ideas generalize. In the first case, (Z;) is 1.i.d. uniform on {xyo-e+aXy}s obviously any i-i-d. sequence is exchangeable. In the second case we can write (1.5) (Zyseeedy) * myaqy see %re(N)? where a* denotes the uniform random permutation on {1,...sN}, that is (=n) = 1/NI for each a. More generally, let (Y),---sYy) be arbi- trary random variables, take m* independent of (Y;), and then (1.6) (yereeady) = CyecryeeeYor(ny) defines a N-exchangeable sequence. This doesn't work for infinite sequences, since we cannot have a uniform permutation of a countable infinite set (without abandoning countable additivity--see (13.27)). However we can define @ uniform random ordering on a countable infinite set: simply define FRj tomean €,(w) < £j(u), for tid. continuous (€ys£55--+)- This trick is useful in several contexts--see (11.9), (17.4), (19.8). Correlation structure. Exchangeability restricts the possible correlation structure for square-integrable sequences. Let (2;) be N-exchangeable. 
Correlation structure. Exchangeability restricts the possible correlation structure for square-integrable sequences. Let (Z_i) be N-exchangeable. Then there is a common correlation ρ = ρ(Z_i,Z_j), i ≠ j. We assert

(1.7)  ρ ≥ -1/(N-1), with equality iff ΣZ_i is a.s. constant.

In particular, ρ = -1/(N-1) for sampling without replacement from an N-element urn. To prove (1.7), linearly scale to make EZ_i = 0, EZ_i² = 1, and then

0 ≤ E(Σ_{i≤N} Z_i)² = N + N(N-1)ρ,

giving ρ ≥ -1/(N-1), with equality iff ΣZ_i = 0 a.s. Observe that (1.7) implies

(1.8)  ρ ≥ 0 for an infinite exchangeable sequence.

Conversely, every ρ < 1 satisfying (1.7) (resp. (1.8)) occurs as the correlation in some N-exchangeable (resp. infinite exchangeable) sequence. To prove this, let (ξ_i) be i.i.d., Eξ_i = 0, Eξ_i² = 1. Define

Z_i = ξ_i + c(ξ_1 + ··· + ξ_N), for some constant c.

Then (Z_i) is N-exchangeable, and a simple computation gives ρ = 1 - (Nc²+2c+1)^{-1}. As c varies we get all values -1/(N-1) ≤ ρ < 1. (The case c = -1/N, which gives ρ = -1/(N-1), will be familiar from statistics!) Of course we can get ρ = 1 by setting Z_1 = ··· = Z_N. In the infinite case, take (ξ_i: i ≥ 0) i.i.d. and set

(1.9)  Z_i = ξ_i + cξ_0, for some constant c.

Then (Z_i) is an infinite exchangeable sequence with ρ = c²/(c²+1), and as c varies we get all values 0 ≤ ρ < 1. When the ξ_i are Normal, (1.9) is essentially the general Gaussian example:

(1.10)  an infinite exchangeable Gaussian sequence (Z_i) has a representation Z_i = X + Y_i, where X is Normal, (Y_i) is i.i.d. Normal, and X and (Y_i) are independent.

Say (Z_1,...,Z_N) is M-extendible (M > N) if (Z_1,...,Z_N) d= (Ẑ_1,...,Ẑ_N) for some M-exchangeable sequence (Ẑ_i). By (1.7) there exist N-exchangeable sequences which are not (N+1)-extendible; for example, sampling without replacement from an urn with N elements. This suggests several problems, which we state rather vaguely.

(1.11) Problem. Find effective criteria for deciding whether a given N-exchangeable sequence is M-extendible.

(1.12) Problem. What proportion of N-exchangeable sequences are M-extendible?

Such problems seem difficult. Some results can be found in Diaconis (1977), Crisma (1982) and Spizzichino (1982).

Combinatorial arguments. Many identities and inequalities for i.i.d. sequences are proved by combinatorial arguments which remain valid for exchangeable sequences. Such results are scattered in the literature; for a selection, see Kingman (1978a) Section 1 and Marshall and Olkin (1979).

2. Mixtures of i.i.d. sequences

Everyone agrees on how to say de Finetti's theorem in words: "An infinite exchangeable sequence is a mixture of i.i.d. sequences." But there are several mathematical formalizations (at first sight different, though in fact equivalent) of the theorem in the literature, because the concept of "a mixture of i.i.d. sequences" can be defined in several ways. Our strategy is to discuss this concept in detail in this section, and defer discussion of exchangeability and de Finetti's theorem until the next section.

Let θ_1,...,θ_k be probability distributions on R, and let p_1,...,p_k ≥ 0, Σp_i = 1. Then we can describe a sequence (Y_i) by the two-stage procedure:

(2.1) (i) Pick θ at random from {θ_1,...,θ_k}, P(θ = θ_i) = p_i;
     (ii) then let (Y_i) be i.i.d. with distribution θ.

More generally, write P for the set of probability measures on R, let Θ be a distribution on P, and replace (i) by

     (i') Pick θ at random from distribution Θ.

Here we are merely giving the familiar Bayesian idea that (Y_i) is i.i.d. (θ), where θ has a prior distribution Θ. The easiest way to formalize this verbal description is to say

(2.2)  P(Y ∈ A) = ∫_P θ^∞(A) Θ(dθ); A ⊂ R^∞,

where Y = (Y_1,Y_2,...) is regarded as a random variable with values in R^∞, and θ^∞ = θ×θ×··· is the distribution on R^∞ of an i.i.d. (θ) sequence. This describes the distribution of a sequence which is a mixture of i.i.d. sequences. This is a special case of a general idea. Given a family {μ_γ: γ ∈ Γ} of distributions on a space S, call a distribution ν a mixture of (μ_γ)'s if

(2.3)  ν(·) = ∫_Γ μ_γ(·) Θ(dγ), for some distribution Θ on Γ.
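The two-stage description (2.1)-(2.2) is easy to simulate. A minimal sketch, assuming a two-component Normal mixture (NumPy/SciPy and the particular θ_i, p_i are illustrative choices, not from the text):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)

    mus = np.array([-2.0, 3.0])             # theta_i = Normal(mu_i, 1)
    ps = np.array([0.3, 0.7])               # prior probabilities p_i

    def mixture_sequence(n):
        mu = rng.choice(mus, p=ps)          # (2.1)(i): pick theta at random
        return rng.normal(mu, 1.0, size=n)  # (2.1)(ii): i.i.d. (theta)

    # (2.2) checked on the set A = {y: y_1 > 0, y_2 > 0}:
    # P(Y in A) = sum_i p_i * P(N(mu_i,1) > 0)^2.
    sims = np.array([mixture_sequence(2) for _ in range(100000)])
    print((sims > 0).all(axis=1).mean(),
          np.sum(ps * norm.sf(0, loc=mus)**2))

Note that the two coordinates are not independent: the mixing makes them positively dependent, which is the point of the construction.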
But in practice it is much more convenient to use a definition of "(Y_i) is a mixture of i.i.d. sequences" which involves the random variables Y_i explicitly. To do so, we need a brief digression to discuss random measures and regular conditional distributions.

A random measure α is simply a P-valued random variable. So for each ω there is a probability measure α(ω), and this assigns probability α(ω,A) to subsets A ⊂ R. To make this definition precise we need to specify a σ-field on P: the natural σ-field is that generated by the maps θ → θ(A), measurable A ⊂ R. The technicalities about measurability in P that we need are straightforward and will be omitted. We may equivalently define a random measure as a function α(ω,A), ω ∈ Ω, A ⊂ R, such that

  α(ω,·) is a probability measure, each ω ∈ Ω;
  α(·,A) is a random variable, each A ⊂ R.

Say α_1 = α_2 a.s. if they are a.s. equal as random variables in P; or equivalently, if α_1(·,A) = α_2(·,A) a.s. for each A ⊂ R.

Given a real-valued random variable Y and a σ-field F, a regular conditional distribution (r.c.d.) for Y given F is a random measure α such that α(·,A) = P(Y ∈ A|F) a.s., each A ⊂ R. It is well known that r.c.d.'s exist, are a.s. unique, and satisfy the fundamental property

(2.4)  E(g(X,Y)|F) = ∫g(X,y)α(·,dy) a.s.; X ∈ F, g(X,Y) integrable.

We now come to the key idea of this section. Given a random measure α, it is possible to construct (Y_i) such that conditional on α = θ (where θ denotes a generic probability distribution), the sequence (Y_i) is i.i.d. with distribution θ. One way of doing so is to formalize the required properties of (Y_i) in an abstract way (2.6) and appeal to abstract existence theorems to show that random variables with the required properties exist. We prefer to give a concrete construction first. Let F(θ,t) = θ(-∞,t] be the distribution function of θ, and let F^{-1}(θ,x) = inf{t: F(θ,t) ≥ x} be the inverse distribution function. It is well known that if ξ is uniform on (0,1) ("ξ is U(0,1)") then F^{-1}(θ,ξ) is a random variable with distribution θ. So if (ξ_i) is an i.i.d. U(0,1) sequence then (F^{-1}(θ,ξ_i)) is an i.i.d. (θ) sequence. Now given a random measure α, take (ξ_i) as above, independent of α, and let

(2.5)  Y_i = F^{-1}(α,ξ_i).

This construction captures the intuitive idea that, conditional on α = θ, the variables Y_i are i.i.d. (θ). The abstract properties of (Y_i, i ≥ 1; α) are given in

(2.6) Definition. Let α be a random measure and let Y = (Y_i) be a sequence of random variables. Say Y is a mixture of i.i.d.'s directed by α if (α(ω))^∞ is a r.c.d. for Y given σ(α).

Plainly this implies that the distribution of Y is of the form (2.2), where Θ is the distribution of α. We remark that this idea can be abstracted to the general setting of (2.3): X is a mixture of (μ_γ: γ ∈ Γ) directed by a random element β: Ω → Γ if μ_β(ω) is a r.c.d. for X given σ(β). Think of this as the "strong" notion of mixture corresponding to the "weak" notion (2.3).

The condition in (2.6) is equivalent to

(2.6a)  P(Y_i ∈ A_i, 1 ≤ i ≤ n | α) = Π_{i≤n} α(·,A_i) a.s.; all A_1,...,A_n, n ≥ 1.

And this splits into two conditions, as follows.

(2.7) Lemma. Write F = σ(α). Then Y is a mixture of i.i.d.'s directed by α iff

(2.8)  (Y_i: i ≥ 1) are conditionally independent given F, that is
       P(Y_i ∈ A_i, 1 ≤ i ≤ n | F) = Π_{i≤n} P(Y_i ∈ A_i | F);

(2.9)  the conditional distribution of each Y_i given F is α, that is
       P(Y_i ∈ A | F) = α(·,A).

Readers unfamiliar with the concept of conditional independence defined in (2.8) should consult the Appendix, which lists properties (A1)-(A9) and references. Lemma 2.7 suggests a definition of "conditionally i.i.d." without explicit reference to a random measure.
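The concrete construction (2.5) translates directly into code. A sketch, taking for α a Normal distribution with random mean (an arbitrary illustrative choice), so that F^{-1}(α,·) is a Normal quantile function:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)

    # A random measure: alpha(omega) = Normal(M(omega), 1), with M random.
    M = rng.standard_normal()

    # (2.5): Y_i = F^{-1}(alpha, xi_i), with (xi_i) i.i.d. U(0,1) independent
    # of alpha; conditional on alpha the Y_i are i.i.d. (alpha).
    xi = rng.random(10)
    Y = norm.ppf(xi, loc=M, scale=1.0)
    print(M, Y)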
(2.10) Definition. Let (Y_i) be random variables and let F be a σ-field. Say (Y_i) is conditionally i.i.d. given F if (2.8) holds and if

(2.11)  P(Y_i ∈ A|F) = P(Y_j ∈ A|F) a.s.; each A, i ≠ j.

Here is a useful technical lemma.

(2.12) Lemma. Suppose (Y_i) are conditionally i.i.d. given F. Let α be a r.c.d. for Y_1 given F. Then
(a) (Y_i) is a mixture of i.i.d.'s directed by α;
(b) Y and F are conditionally independent given α.

Proof. By (2.11), for each i we have that α is a r.c.d. for Y_i given F. So by (2.8), P(Y_i ∈ A_i, 1 ≤ i ≤ n|F) = Π_{i≤n} α(·,A_i). Since σ(α) ⊂ F, conditioning on σ(α) gives (2.6a), which is (a); and the same identity shows the conditional distribution of Y given F is σ(α)-measurable, which gives (b).

For x = (x_1,x_2,...) define

(2.13)  Λ_n(x_1,...,x_n) = n^{-1} Σ_{i≤n} δ_{x_i}, the empirical distribution of (x_1,...,x_n);

(2.14)  Λ(x) = lim_{n→∞} Λ_n(x_1,...,x_n), the limit taken in P; = δ_0, say, if the limit does not exist. (δ_y denotes the degenerate distribution at y.)

If X = (X_i) is an infinite i.i.d. (θ) sequence, then the Glivenko-Cantelli theorem says that Λ(X) = θ a.s. Thus for a mixture (Y_i) of i.i.d.'s directed by α we have Λ(Y) = α a.s., by (2.6) and the fundamental property of r.c.d.'s. Hence

(2.15) Lemma. If the infinite sequence Y is a mixture of i.i.d.'s, then it is directed by α = Λ(Y), and this directing random measure is a.s. unique.

So we can talk about "the" directing random measure. Since Λ(x) is unchanged by changing a finite number of coordinates of x, we have

(2.16) Lemma. Let the infinite sequence Y be a mixture of i.i.d.'s, let α be the directing random measure. Then α is essentially T-measurable, where T is the tail σ-field of Y.

Keeping the notation of Lemma 2.16, we see that the following are a.s. equal, for m ≥ n:

(2.17) (a) P(Y_i ∈ A_i, 1 ≤ i ≤ n | Y_{m+1},Y_{m+2},...);
      (b) P(Y_i ∈ A_i, 1 ≤ i ≤ n | Y_{m+1},Y_{m+2},...; α);
      (c) P(Y_i ∈ A_i, 1 ≤ i ≤ n | α) = Π_{i≤n} α(·,A_i).

In other words:

(2.19) Lemma. (α(ω))^n is a r.c.d. for (Y_1,...,Y_n) given σ(Y_{m+1},Y_{m+2},...), m ≥ n.

Observe that Lemmas 2.15 and 2.19 provide three ways to obtain (in principle, at least) the directing random measure. Each of these ways is useful in some circumstances.

Facts about mixtures of i.i.d.'s are almost always easiest to prove by conditioning on the directing measure. Let us spell this out in detail. For bounded g: R → R define ĝ: P → R by

(2.20)  ĝ(θ) = ∫g dθ.

By (2.9) and the fundamental property of r.c.d.'s,

(2.21)  E(g(Y_i)|α) = ĝ(α) a.s., and hence Eg(Y_i) = Eĝ(α).

And using the conditional independence (2.8),

(2.22)  E(g_1(Y_i)g_2(Y_j)|α) = ĝ_1(α)ĝ_2(α) a.s., i ≠ j, and hence Eg_1(Y_i)g_2(Y_j) = Eĝ_1(α)ĝ_2(α).

These extend to unbounded g provided |g(Y_i)|, |g_1(Y_i)g_2(Y_j)| are integrable.

Let us use these to record some technical facts about moments of Y_i and α. For a distribution θ let mean(θ), var(θ), abs_r(θ) denote the mean, variance and r-th absolute moment of θ. Let V(Y) denote the variance of a random variable Y. The next lemma gives properties which follow immediately from (2.21) and (2.22).

(2.23) Lemma. (a) E|Y_i|^r = E abs_r(α).
(b) If E|Y_i| is finite, then EY_i = E mean(α).
(c) If EY_i² is finite, then
   (i) EY_iY_j = E(mean(α))², i ≠ j;
  (ii) V(Y_i) = E abs_2(α) - (E mean(α))² = E var(α) + V(mean(α)).

Most classical limit theorems for i.i.d. sequences extend immediately to mixtures of i.i.d. sequences. For instance, writing S_n = Y_1 + ··· + Y_n:

(2.24)  lim_{n→∞} n^{-1}S_n = mean(α) a.s., provided E|Y_i| < ∞.

(2.25)  limsup_{n→∞} (S_n - n·mean(α)) / {var(α)·2n log log(n)}^{1/2} = 1 a.s., provided EY_i² < ∞.

(2.26)  (S_n - n·mean(α)) / {n·var(α)}^{1/2} →d Normal(0,1) as n → ∞, provided EY_i² < ∞.

Let us spell out some details. The fundamental property of r.c.d.'s says

P(lim n^{-1}S_n = mean(α) | α) = h(α), where h(θ) = P(lim n^{-1}(X_1+···+X_n) = mean(θ)) for (X_i) i.i.d. (θ).

The strong law of large numbers says h(θ) = 1 provided abs_1(θ) < ∞. So (2.24) holds provided abs_1(α) < ∞ a.s., and by Lemma 2.23 this is a consequence of E|Y_1| < ∞. The same argument works for (2.25), and for any a.s. convergence theorem for i.i.d. variables.

For weak convergence theorems like (2.26), one more step is needed. Let g: R → R be bounded continuous. Then

E(g((S_n - n·mean(α))/{n·var(α)}^{1/2}) | α) = ĝ_n(α),

where

ĝ_n(θ) = E g((X_1+···+X_n - n·mean(θ))/{n·var(θ)}^{1/2}), for (X_i) i.i.d. (θ).

The central limit theorem says ĝ_n(θ) → Eg(W) when abs_2(θ) < ∞, where W denotes a N(0,1) variable. Hence

E g((S_n - n·mean(α))/{n·var(α)}^{1/2}) → Eg(W)

provided abs_2(α) < ∞ a.s., which by Lemma 2.23 is a consequence of EY_i² < ∞. The same technique works for any weak convergence theorem for i.i.d. variables.

The form of results obtained in this way is slightly unusual, in that random normalization is involved, but they can easily be translated into a more familiar form. To ease notation, suppose EY_i² < ∞ and mean(α) = 0 a.s. (which by Lemma 2.23 is equivalent to assuming EY_i = 0 and (Y_i) uncorrelated). Then (2.25) translates immediately to

limsup_{n→∞} S_n / {2n log log(n)}^{1/2} = {var(α)}^{1/2} a.s.

And (2.26) translates to

(2.27)  n^{-1/2}S_n →d W·{var(α)}^{1/2}, where W is Normal(0,1) independent of α.

To see this, observe that the argument for (2.26) gives

(S_n/{n·var(α)}^{1/2}, var(α)) →d (W, var(α)),

and then the continuous mapping theorem gives (2.27). Finally, keeping the assumptions above, (2.22) shows that var(α) = σ² a.s. (σ constant) iff EY_i² = σ² and E(Y_i²Y_j²) = σ⁴, so that these extra assumptions are what is needed to obtain n^{-1/2}S_n →d N(0,σ²). Weak convergence theorems for mixtures of i.i.d. sequences are a simple class of stable convergence theorems, described further in Section 7.
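The random limit in (2.24) and the mixed-Normal limit in (2.27) are both easy to see by simulation. A sketch, taking α to be Normal with mean 0 and a two-valued random variance (an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(3)
    reps, n = 20000, 4000

    # alpha(omega) = Normal(0, V(omega)): mean(alpha) = 0, var(alpha) = V.
    V = rng.choice([1.0, 9.0], size=reps)
    S = (rng.standard_normal((reps, n)) * np.sqrt(V)[:, None]).sum(axis=1)

    # (2.24): n^{-1} S_n is small in every realization, since the a.s.
    # limit mean(alpha) is 0 here.
    print(np.abs(S / n).max())

    # (2.27): n^{-1/2} S_n is a variance mixture of Normals, not Normal;
    # its kurtosis is 3 E V^2 / (E V)^2 = 4.92 rather than the Normal 3.
    T = S / np.sqrt(n)
    print(np.mean(T**4) / np.mean(T**2)**2)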
Finally, we should point out that there are occasional subtleties in extending results from i.i.d. sequences to mixtures. Consider the weak law of large numbers. The set S of distributions θ such that

(2.28)  n^{-1} Σ_{i≤n} X_i →p 0, for (X_i) i.i.d. (θ),

is known. For a mixture of i.i.d. sequences to satisfy this weak law, it is certainly sufficient that the directing random measure satisfies α(ω) ∈ S a.s., by the usual conditioning argument. But this is not necessary, because (informally speaking) we can arrange the mixture to satisfy the weak law although α takes values θ such that convergence in (2.28) holds as n → ∞ through some set of integers with asymptotic density one but not when n → ∞ through all integers. (My thanks to Mike Klass for this observation.)

Another instance where the results for mixtures are more complicated is the estimation of L^p norms for weighted sums Σa_iX_i. Such estimates are given by Dacunha-Castelle and Schreiber (1974) in connection with Banach space questions; see also (7.21).

We end with an intriguing open problem.

(2.29) Problem. Let S_n = Σ_{i≤n} X_i, Ŝ_n = Σ_{i≤n} X̂_i, where each of the sequences (X_i), (X̂_i) is a mixture of i.i.d. sequences. Suppose S_n d= Ŝ_n for each n. Does this imply (X_i) d= (X̂_i)?

3. de Finetti's theorem

Our verbal description of de Finetti's theorem is now a precise assertion, which we restate as

(3.1) de Finetti's Theorem. An infinite exchangeable sequence (Z_i) is a mixture of i.i.d. sequences.

Remarks. (a) The converse, that a mixture of i.i.d.'s is exchangeable, is plain.

(b) As noted in Section 1, a finite exchangeable sequence may not extend to an infinite sequence, and so a finite exchangeable sequence may not be a mixture of i.i.d.'s.

(c) The directing random measure can in principle be obtained from Lemma 2.15 or 2.19.

Most modern proofs of de Finetti's theorem rely on martingale convergence.
We shall present both the standard proof (which goes back at least to Loève (1960)) and also a more sophisticated variant. Both proofs contain useful techniques which will be used later in other settings.

First proof of Theorem 3.1. Let G_n = σ(f_n(Z_1,...,Z_n); f_n symmetric), and let

H_n = σ(G_n; Z_{n+1},Z_{n+2},...).

Then H_n ⊃ H_{n+1}, n ≥ 1. Exchangeability implies

(Z_i,Y) d= (Z_1,Y); 1 ≤ i ≤ n,

for Y of the form (f_n(Z_1,...,Z_n),Z_{n+1},Z_{n+2},...), f_n symmetric, and hence for all Y ∈ H_n. So for bounded φ,

E(φ(Z_i)|H_n) = E(φ(Z_1)|H_n), 1 ≤ i ≤ n,

and hence

(3.2)  E(φ(Z_1)|H_n) = n^{-1} Σ_{i≤n} φ(Z_i), since this average is G_n-measurable.

The reversed martingale convergence theorem implies

(3.3)  n^{-1} Σ_{i≤n} φ(Z_i) → E(φ(Z_1)|H) a.s., where H = ∩_n H_n.

For bounded φ(x_1,...,x_k), the argument for (3.2) shows that for n ≥ k

E(φ(Z_1,...,Z_k)|H_n) = (#D_{n,k})^{-1} Σ_{D_{n,k}} φ(Z_{j_1},...,Z_{j_k}),

where D_{n,k} = {(j_1,...,j_k): 1 ≤ j_i ≤ n, the (j_i) distinct}. Using martingale convergence, and the fact that #D_{n,k} = n^k(1+o(1)) as n → ∞ for fixed k,

n^{-k} Σ_{1≤j_1,...,j_k≤n} φ(Z_{j_1},...,Z_{j_k}) → E(φ(Z_1,...,Z_k)|H) a.s.

By considering φ(x_1,...,x_k) of the form φ_1(x_1)φ_2(x_2)···φ_k(x_k) and using (3.3),

E(Π_{i≤k} φ_i(Z_i)|H) = Π_{i≤k} E(φ_i(Z_1)|H).

This says that (Z_i) is conditionally i.i.d. given H, and as discussed in Section 2 this is one of several equivalent formalizations of "mixture of i.i.d. sequences".

For the second proof we need an easy lemma.

(3.4) Lemma. Let Y be a bounded real-valued random variable, and let F ⊂ G be σ-fields. If E(E(Y|G))² = E(E(Y|F))², in particular if E(Y|G) d= E(Y|F), then E(Y|G) = E(Y|F) a.s.

Proof. This is immediate from the identity

E(E(Y|G) - E(Y|F))² = E(E(Y|G))² - E(E(Y|F))².

Second proof of Theorem 3.1. Write F_n = σ(Z_n,Z_{n+1},...), and let T = ∩_n F_n be the tail σ-field. We shall show that (Z_i) is conditionally i.i.d. given T. By exchangeability,

(Z_1,Z_2,Z_3,...) d= (Z_1,Z_n,Z_{n+1},...),

and so E(φ(Z_1)|F_2) d= E(φ(Z_1)|F_n) for each bounded φ: R → R. The reversed martingale convergence theorem says E(φ(Z_1)|F_n) → E(φ(Z_1)|T) a.s. as n → ∞, and so E(φ(Z_1)|F_2) d= E(φ(Z_1)|T). But Lemma 3.4 now implies there is a.s. equality, and this means (A4)

Z_1 and F_2 are conditionally independent given T.

The same argument applied to (Z_m,Z_{m+1},...) gives

Z_m and F_{m+1} are conditionally independent given T; m ≥ 1.

These imply that the whole sequence (Z_i; i ≥ 1) is conditionally independent given T. For each n ≥ 1, exchangeability says (Z_1,Z_{n+1},Z_{n+2},...) d= (Z_n,Z_{n+1},Z_{n+2},...), and so E(φ(Z_1)|F_{n+1}) = E(φ(Z_n)|F_{n+1}) a.s. for each bounded φ. Conditioning on T gives E(φ(Z_1)|T) = E(φ(Z_n)|T) a.s. This is (2.11), and so (Z_i; i ≥ 1) is indeed conditionally i.i.d. given T.

Spherically symmetric sequences. Another classical result can be regarded as a specialization of de Finetti's theorem. Call a random vector Y^n = (Y_1,...,Y_n) spherically symmetric if UY^n d= Y^n for each orthogonal n×n matrix U. Call an infinite sequence Y spherically symmetric if Y^n is spherically symmetric for each n. It is easy to check that an i.i.d. Normal N(0,v) sequence is spherically symmetric; and hence so is a mixture (over v) of i.i.d. N(0,v) sequences. On the other hand, computations with characteristic functions (Feller (1971) Section III.4) give

(3.5) Maxwell's Theorem. An independent spherically symmetric sequence is i.i.d. N(0,σ²), for some σ² ≥ 0.

Now a spherically symmetric sequence is exchangeable, since for any permutation π of {1,...,n} the map (y_1,...,y_n) → (y_π(1),...,y_π(n)) is orthogonal. We shall show that Maxwell's theorem and de Finetti's theorem imply

(3.6) Schoenberg's Theorem. An infinite spherically symmetric sequence Y is a mixture (over v) of i.i.d. N(0,v) sequences.

This is apparently due to Schoenberg (1938), and has been rediscovered many times. See Eaton (1981), Letac (1981a) for variations and references. This result also fits naturally into the "sufficient statistics" setting of Section 18.
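Schoenberg's theorem can be probed numerically: a mixture over v of i.i.d. N(0,v) variables should be invariant in distribution under any fixed rotation. A sketch (the exponential prior on v and the QR construction of a Haar-random orthogonal matrix are implementation choices, not from the text):

    import numpy as np

    rng = np.random.default_rng(4)
    reps, n = 200000, 3

    v = rng.exponential(size=reps)                  # random variance v
    Y = rng.standard_normal((reps, n)) * np.sqrt(v)[:, None]

    # A Haar-distributed orthogonal U via QR of a Gaussian matrix.
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    U = Q * np.sign(np.diag(R))

    UY = Y @ U.T
    # Y and UY should agree in distribution; compare a joint moment.
    print((Y[:, 0]**2 * Y[:, 1]**2).mean(), (UY[:, 0]**2 * UY[:, 1]**2).mean())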
Proof. Let U be an n×n orthogonal matrix, let Y^n = (Y_1,...,Y_n), Ŷ^m = (Y_{n+1},...,Y_{n+m}). By considering

  (U 0
   0 I)

as an orthogonal (n+m)×(n+m) matrix, we see (UY^n,Ŷ^m) d= (Y^n,Ŷ^m). Letting m → ∞ gives (UY^n,Ŷ) d= (Y^n,Ŷ), where Ŷ = (Y_{n+1},Y_{n+2},...). By conditioning on the tail σ-field T ⊂ σ(Ŷ), we see that the conditional distributions of Y^n and of UY^n given T are a.s. identical. Applying this to a countable dense set of orthogonal n×n matrices and to each n ≥ 1, we see that the conditional distribution of Y given T is a.s. spherically symmetric. But de Finetti's theorem says that the conditional distribution of Y given T is a.s. an i.i.d. sequence, so the result follows from Maxwell's theorem.

Here is a slight variation on de Finetti's theorem. Given an exchangeable sequence Z, say Z is exchangeable over V if

(3.7)  (V,Z_1,Z_2,...) d= (V,Z_π(1),Z_π(2),...), all finite permutations π.

Similarly, say Z is exchangeable over a σ-field G if (3.7) holds for each V ∈ G.

(3.8) Proposition. Let Z be an infinite sequence, exchangeable over V. Then
(a) (Z_i) is conditionally i.i.d. given σ(V,α), where α is the directing random measure for Z;
(b) Z and V are conditionally independent given α.

Proof. Let Ẑ_i = (V,Z_i). Then Ẑ is exchangeable, so de Finetti's theorem implies (Ẑ_i) is conditionally i.i.d. given α̂, where α̂ is the directing random measure for Ẑ. So in particular,

(Z_i) is conditionally i.i.d. given α̂.

But applying Lemma 2.15 to Ẑ, we see α̂ = δ_V × α, and so σ(α̂) = σ(V,α). This gives (a). And (b) follows from (a) and Lemma 2.12(b).

We remark that the conditional independence assertion of (b) is a special case of a very general result, Proposition 12.12. Proposition 3.8 plays an important role in the study of partial exchangeability in Part III. As a simple example, here is a version of de Finetti's theorem for a family of sequences, each of which is "internally exchangeable".

(3.9) Corollary. For j ≥ 1 let Z^j = (Z^j_i: i ≥ 1), and suppose each Z^j is exchangeable over σ(Z^m: m ≠ j). Let α_j be the directing random measure of Z^j, and let F = σ(α_1,α_2,...). Then, conditional on F, the sequences Z^j are independent, and each Z^j is i.i.d. (α_j); that is, conditional on α_j = θ_j, j ≥ 1, the variables (Z^j_i: i ≥ 1) are i.i.d. (θ_j), independent for different j.

Proof. Fix j. Proposition 3.8 shows

  Z^j and σ(Z^m: m ≠ j) are conditionally independent given α_j.

Then de Finetti's theorem for Z^j yields

  Z^j and σ(Z^m: m ≠ j) are conditionally independent given α_j;
  α_j^∞ is a r.c.d. for Z^j given α_j.

Since σ(α_j) ⊂ F ⊂ σ(α_j; Z^m: m ≠ j), this is equivalent to

  Z^j and σ(Z^m: m ≠ j) are conditionally independent given F;
  α_j^∞ is a r.c.d. for Z^j given F.

Since j is arbitrary, this establishes the result.

Here is another application of Proposition 3.8. Call a subset B of R^∞ exchangeable if (x_1,x_2,...) ∈ B implies (x_π(1),x_π(2),...) ∈ B for each finite permutation π. Given an infinite sequence X = (X_i), call events {X ∈ B}, B exchangeable, exchangeable events, and call the set of exchangeable events the exchangeable σ-field E_X. It is easy to check that E_X ⊃ T_X a.s., where T_X is the tail σ-field of X.

(3.10) Corollary. If Z is exchangeable then E_Z = σ(α) a.s.

Proof. For A ∈ E_Z the random variable V = 1_A satisfies (3.7), and so by Proposition 3.8, V and Z are conditionally independent given α. Hence E_Z and Z are conditionally independent given α. But E_Z ⊂ σ(Z), and so E_Z and E_Z are conditionally independent given α, which implies (A6) that E_Z ⊂ σ(α) a.s. But σ(α) ⊂ T_Z a.s. by Lemma 2.16, and T_Z ⊂ E_Z a.s.

In particular, Corollary 3.10 gives the well-known Hewitt-Savage 0-1 law:

(3.11) Corollary. If X is i.i.d. then E_X is trivial.

This can be proved by more elementary methods (Breiman (1968), Section 3.9). There are several other equivalences possible in Corollary 3.10; let us state two.
(3.12) Corollary. If Z is exchangeable then σ(α) coincides a.s. with
(a) the invariant σ-field of Z;
(b) the tail σ-field of (Z_{n_i}), for any distinct (n_i).

In particular, for an exchangeable process (Z_i: -∞ < i < ∞) the tail σ-fields of (Z_1,Z_2,...) and of (Z_{-1},Z_{-2},...) coincide a.s.

4. Exchangeable sequences and their directing random measures

In this section we record some relationships between an exchangeable sequence and its directing random measure, beginning with two general constructions.

Let Y = (Y_1,Y_2,...) be arbitrary;
(4.1) let (X_1,X_2,...) be i.i.d., independent of Y, taking values in {1,2,...}; let Z_i = Y_{X_i}.

Then Z is exchangeable, and using Lemma 2.15 we see α = Σ_i p_i δ_{Y_i}, where p_i = P(X_1 = i).

Let X_1,X_2,... be i.i.d. with distribution θ;
(4.2) let Y be independent of X, with distribution μ; let Z_i = f(Y,X_i), for some function f.

Then Z is exchangeable. Indeed, from the canonical construction (2.5) and de Finetti's theorem, every exchangeable sequence is of this form (in distribution). However, exchangeable sequences arising in practice can often be put into the form (4.2) where θ, μ, f have some simple form with intuitive significance (e.g. the representation (1.10) for Gaussian exchangeable sequences). To describe the directing random measure for such a sequence, we need some notation.

(4.3) Definition. Given f: R → R, define the induced map f̂: P → P by

f̂(L(Y)) = L(f(Y)).

Given f: R×R → R, define the induced map f̂: R×P → P by

f̂(x,L(Y)) = L(f(x,Y)).

This definition and Lemma 2.15 give the next lemma.

(4.4) Lemma. (a) Let Z be exchangeable, directed by α, let f: R → R have induced map f̂, and let Ẑ_i = f(Z_i). Then Ẑ is exchangeable and is directed by f̂(α).
(b) Let Z be of the form (4.2) for some f: R×R → R, and let f̂: R×P → P be the induced map. Then Z is exchangeable and is directed by f̂(Y,θ).

In particular, for the addition function f(x,y) = x+y we have f̂(x,θ) = δ_x * θ, where * denotes convolution. So (1.10) implies:

(4.5) Z is Gaussian exchangeable if and only if α(ω,·) = δ_{X(ω)} * θ, where θ and L(X) are Normal.

Another simple special case is 0-1 valued exchangeable sequences. Call events (A_i, i ≥ 1) exchangeable if the indicator random variables Z_i = 1_{A_i} are exchangeable. In this case α must have the form

(4.6)  α = X(ω)δ_1 + (1-X(ω))δ_0, for some random variable 0 ≤ X ≤ 1,

and then, by (2.6a),

(4.7)  P(A_{i_1} ∩ ··· ∩ A_{i_k}) = EX^k, for distinct i_1,...,i_k.
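The 0-1 case (4.6)-(4.7) is easy to check by simulation. A sketch with X uniform on (0,1), so that P(A_1 ∩ ··· ∩ A_k) = EX^k = 1/(k+1) (the uniform prior is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(5)
    reps, k = 200000, 3

    X = rng.random(reps)                        # X uniform on (0,1)
    Z = rng.random((reps, k)) < X[:, None]      # given X, i.i.d. coin-flips

    # (4.7): P(A_1 ∩ ... ∩ A_k) = E X^k = 1/(k+1).
    print(Z.all(axis=1).mean(), 1 / (k + 1))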
Recall from (1.8) that an infinite exchangeable square-integrable sequence has correlation ρ ≥ 0. But de Finetti's theorem gives more information: by Lemma 2.23,

(4.8)  ρ = 0 if and only if mean(α) = c a.s., some constant c.

Of course, from de Finetti's theorem

(4.9)  mean(α) = E(Z_1|α) = E(Z_1|T) a.s.

In particular, an exchangeable Z with EZ_1 = 0 is uncorrelated if and only if the random variable in (4.9) is a.s. zero. Curiously, this implies the (generally stronger) property that Z is a martingale difference sequence.

(4.10) Lemma. Suppose Z is exchangeable, EZ_1 = 0. Then Z is a martingale difference sequence if and only if mean(α) = 0 a.s.

Proof. By conditional independence, E(Z_n|Z_1,...,Z_{n-1},α) = E(Z_n|α) = mean(α) a.s. So if mean(α) = 0 a.s. then E(Z_n|Z_1,...,Z_{n-1}) = 0 a.s., and so Z is a martingale difference sequence. Conversely, if Z is a martingale difference sequence then E(Z_n|Z_1,...,Z_{n-1}) = 0 a.s., so by exchangeability E(Z_1|Z_2,...,Z_n) = 0 a.s. The martingale convergence theorem now implies E(Z_1|Z_i, i ≥ 2) = 0 a.s., and since σ(α) ⊂ σ(Z_i, i ≥ 2) we have mean(α) = E(Z_1|α) = 0 a.s.

Here is another instance where for exchangeable sequences one property implies a generally stronger property.

(4.11) Lemma. If an infinite exchangeable sequence (Z_1,Z_2,...) is pairwise independent then it is i.i.d.

Proof. Fix some bounded function f: R → R. The sequence Ẑ_i = f(Z_i) is pairwise independent, and hence uncorrelated. By Lemma 4.4(a), Ẑ is directed by f̂(α), and now (4.8) implies mean(f̂(α)) is a.s. constant, c_f say. In other words:

∫f(x)α(ω,dx) = c_f a.s.; each bounded f.

Standard arguments (7.12) show this implies α = θ a.s., where θ is a distribution with ∫f dθ = c_f. So Z is i.i.d. (θ).

(4.12) Example. Fix N ≥ 2. Let (Y_1,...,Y_N) be uniform on the set of sequences (y_1,...,y_N) of 1's and 0's satisfying Σy_i = 0 mod 2. Then
(a) (Y_1,...,Y_N) is N-exchangeable;
(b) (Y_1,...,Y_{N-1}) are independent.

So Lemma 4.11 is not true for finite exchangeable sequences. And by considering X_i = 2Y_i - 1, we see that a finite exchangeable sequence may be uncorrelated but not a martingale difference sequence.

Our next topic is the Markov property.

(4.13) Lemma. For an infinite exchangeable sequence Z the following are equivalent.
(a) Z is Markov.
(b) σ(α) ⊂ σ(Z_1) a.s.
(c) σ(α) = ∩_i σ(Z_i) a.s.

Remark. When the support of α is some countable set (θ_i) of distributions, these conditions are equivalent to
(d) the distributions (θ_i) are mutually singular.
It seems hard to formalize this in the general case.

Proof. Z is Markov if and only if for each bounded φ: R → R and each n ≥ 2,

(*)  E(φ(Z_n)|Z_1,...,Z_{n-1}) = E(φ(Z_n)|Z_{n-1}) a.s.

Suppose this holds. Then by exchangeability,

E(φ(Z_2)|Z_1,Z_3,...,Z_{n+1}) = E(φ(Z_2)|Z_1) a.s.

So by martingale convergence

E(φ(Z_2)|Z_1) = E(φ(Z_2)|Z_1,Z_3,Z_4,...) = E(φ(Z_2)|α).

In particular, α(·,A) = P(Z_2 ∈ A|α) is essentially σ(Z_1)-measurable, for each A. This gives (b). If (b) holds then by symmetry σ(α) ⊂ σ(Z_i) a.s. for each i, and so σ(α) ⊂ ∩_i σ(Z_i) a.s. And Corollary 3.10 says σ(α) = E_Z ⊃ ∩_i σ(Z_i) a.s., which gives (c). If (c) holds then

E(φ(Z_n)|Z_1,...,Z_{n-1}) = E(φ(Z_n)|Z_1,...,Z_{n-1},α)  by (c)
                          = E(φ(Z_n)|Z_{n-1},α)          by conditional independence
                          = E(φ(Z_n)|Z_{n-1})            by (c),

and this is (*).

This is one situation where the behavior of finite exchangeable sequences is the same as for the infinite case, by the next result.

(4.14) Lemma. Any finite Markov exchangeable sequence (Z_1,...,Z_N) extends to an infinite Markov exchangeable sequence Z, provided N ≥ 3.

This cannot be true for N = 2, since a 2-exchangeable sequence is vacuously Markov.

Proof. The given (Z_1,...,Z_N) extends to an infinite sequence Z whose distribution is specified by the conditions

(4.15) (i) Z is Markov;
      (ii) (Z_n,Z_{n+1}) d= (Z_1,Z_2), each n ≥ 1.

We must prove Z is exchangeable. Suppose, inductively, that (Z_1,...,Z_m) is m-exchangeable for some m ≥ N. Then by (i) and (ii) we get

(4.16)  (Z_1,...,Z_m) d= (Z_2,...,Z_{m+1}).

So these vectors are m-exchangeable. Hence

(4.17a)  (Z_1,Z_2,...,Z_{m+1}) d= (Z_2,Z_1,Z_3,...,Z_{m+1}),

by 2-exchangeability and the Markov property (at time 2). Similarly, using the Markov property at time m,

(4.17b)  (Z_1,...,Z_{m+1}) d= (Z_1,...,Z_{m-1},Z_{m+1},Z_m).

Finally, for any permutation π of (2,...,m) we assert

(4.17c)  (Z_1,Z_π(2),...,Z_π(m),Z_{m+1}) d= (Z_1,Z_2,...,Z_m,Z_{m+1}).

For let Y = (Z_2,...,Z_m), Ŷ = (Z_π(2),...,Z_π(m)). Then the triples (Z_1,Y,Z_{m+1}) and (Z_1,Ŷ,Z_{m+1}) are Markov, and (4.16) with the inductive hypothesis shows (Z_1,Y) d= (Z_1,Ŷ) and (Y,Z_{m+1}) d= (Ŷ,Z_{m+1}), which gives (4.17c). Now (4.17a-c) establish the (m+1)-exchangeability of (Z_1,...,Z_{m+1}).

Let us digress slightly to present the following two results, due to Carnal (1980). Informally, these can be regarded as extensions of Lemmas 4.13 and 4.14 to non-exchangeable sequences.

(4.18) Lemma. Let X_1, X_2, X_3 be such that, for each ordering (i,j,k) of (1,2,3), X_i and X_j are conditionally independent given X_k. Then X_1, X_2, X_3 are conditionally independent given F = σ(X_1) ∩ σ(X_2) ∩ σ(X_3).

(4.19) Lemma. For an infinite sequence X, the following are equivalent:
(a) (X_{n_1},X_{n_2},...) is Markov, for any distinct n_1,n_2,...;
(b) (X_i; i ≥ 1) are conditionally independent given G = ∩_i σ(X_i).

Proof of Lemma 4.18. We shall prove that for any ordering (i,j,k),
(a) σ(X_i) ∩ σ(X_j) = F a.s.;
(b) X_i and σ(X_j,X_k) are conditionally independent given σ(X_j) ∩ σ(X_k).

These imply that X_i and σ(X_j,X_k) are conditionally independent given F, and the lemma follows. For A ∈ σ(X_i) and B ∈ σ(X_j), conditional independence gives P(A∩B|X_k) = P(A|X_k)·P(B|X_k). So for A ∈ σ(X_i) ∩ σ(X_j) we have P(A|X_k) = (P(A|X_k))², so A ∈ σ(X_k) a.s., giving (a). For bounded φ: R → R, conditional independence gives

E(φ(X_i)|X_j,X_k) = E(φ(X_i)|X_j) = E(φ(X_i)|X_k).

So E(φ(X_i)|X_j,X_k) is essentially σ(X_j) ∩ σ(X_k)-measurable, proving (b).
We shall prove that for any ordering (i,j,k), (a) o(45) Mo(X;) = F aes.s {b) X; and OK 5 Xe) are conditionally independent given o(X5) No(K)« These imply that X; and a(X5,%,) are conditionally independent given Fy and the lenma follows. For AS oX;) and BE o(X 5)» conditional independence given P(AMB|X,) = P(ALX,)*P(BIX,). So for A © ofX;) Na(K5) we have P(A] X,) = (PAIX, so AE o(X) as., giving (a). For bounded 6: R—+R, conditional independence gives > ECO(%G)[X50%) = ECOO%) 1X5) = COOK) 1X) + So EC(%5) 1X5 0%) is essentially o(K;) Mo(X,)-measurabie, proving (b). 36 Proof of Lemma 4.19. Suppose (a) holds. for distinct i, j,k, the hypothesis of Lemna 4.18 holds, so-by its conclusion £(0(X5) 1X5) = EC 9X4) [o(Kj) M9(Kj) VO(K,}) : Since this holds for each k # i, j, ECHOX,) 1X5) € 19K) =Ga.s. m1 In other words, X; and X; are conditionally independent given G. But the Markov hypothesis implies that X, and o(X,3 k#i) are conditionally independent given X;. These last two facts imply (A7) that X, and o(%,3 k#4) are conditionally independent given G, and the result follows. 5. Finite exchangeable sequences As mentioned in Section 1, the basic way to obtain an N-exchangeable sequence is as an urn process: take N constants Y,...s¥ys not neces- sarily distinct, and put them in random order. (5.1) (Yyoeeerty) = Gacayees Sac) % the uniform random permutation on {1,.-.,N}. In the notation of (2.13), Y has empirical distribution (5.2) i Y) ayy = Aly) Conversely, it is clear that: (5.3) if Y is N-exchangeable and satisfies (5.2) then Y has distribution (5.1). Let uy denote the set of distributions L(Y) for urn processes ae a7 Let Uy denote the set of empirical distributions ay ~ Let o: Uy—ruy nesy, NN be the natural bijection 9(L(Y)) = Ay(Y). The follo g simple result is a partial analogue of de Finetti's theorem in the finite case. (5.4) Lemma, Let Z= (Z,s---52y) be Noexchangeable. Then 3” (4(Z)) is a regular conditional distribution for Z given Ay(Z). In words: conditional on the empirical distribution, the N (possibly repeated) values comprising the empirical distribution occur in random order. In the real-valued case we can replace "empirical distribution” by "order statistics", which convey the same information. Proof. For any permutation 1 of {1,...yNIs (Zyoeee akg DD) 2 Zcqyeee egy eZ) « So if B(w,+) is a regular conditional distribution for Z given Ay(2) then (a.s. w) (a) the distribution 8(w,+) is N-exchangeable. But from the fundamental property of r.c.d.'s, for a.s. w we have (b) the N-vector with distribution 8(w,+) has empirical distribution ty( Zo). And (5.3) says that (a) and (b) tapty Bws+) = OM Ay(Z(w))) aes. Thus for some purposes the study of finite exchangeable sequences reduces to the study of “sampling without replacement" sequences of the form (5.1). Unlike de Finetti's theorem, this idea is not always useful: for example, it does not seem to help with the weak convergence problems discussed in Section 20. 38 From an abstract viewpoint, the difference between finite and infinite exchangeability is that the group of permutations on a finite set is compact, whereas on an infinite set it is non-compact. Lemma 5.4 has an analogue for distributions invariant under a specified compact group of transforma- tions; see (12.15). We know that an N-exchangeable sequence need not be a mixture of i.i.d.'s, that is to say it need not extend to an infinite exchangeable sequence. But we can ask how "close" it is to some mixture of i.i.d.'s. 
Let us measure closeness of two distributions u, v by the total variation distance (5.5) TuevI = sup u(A)-v6A) | : The next result implies that an M-exchangeable sequence which can be extended to an N-exchangeable sequence, where N is large compared to W, is close to a mixture of i.i.d. " (5.6) Proposition, Let Y be N-exchangeable. Then there exists an infinite exchangeable sequence Z such that, for 1 M) is by taking the first M variables; Proposition 5.6 and the discussion of extendibility in Section 1 show that the M-exchangeable sequences obtainable in this way are restricted. Here is another way of getting new exchangeable sequences from old. Let N,K>1. Let Z = (2: 1<4 0 (Stopping times are relative to the filtration F, = o(Zy5---sZ,)> trivial.) Remarks (a) Ryll-Nardzewski (1957) proved (A) implies exchangeability. Property (A), under the name “spreading-invariance”, arises naturally in the work of Dacunha-Castelle and others who have studied certain Banach space problems using probabilistic techniques. A good survey of this area is Dacunha-Castelle (1982). 43 (b) The fact that (B) and (C) are equivalent to exchangeability is due to Xallenberg (1982a), who calls property (C) "strong stationarity*. The idea of expressing exchangeability-type properties in terms of stopping times seems a promising technique for the study of exchangeability concepts for continuous-time processes, where there is a well-developed technical machinery involving stopping times. See Section 17 for one such study. (c) Stopping times of the form T+I are predictable stopping times. (d) The difficult part is proving these conditions imply exchangeabitity. Let us state the (vague) question (6.2) Problem. What hypotheses prima facie weaker than exchangeability do in fact imply exchangeability? The best result known seems to be that obtained by combining Lemma 6.5 and Proposition 6.4 below, which are taken from Aldous (1982b). For the proof of Theorem 6.1 we need the following extension of Lerma 3.4, (6.3) Lemma. Let (@,) be an increasing sequence of g-fields, let ¢=V6,, n and let FCG. Let Y be a bounded random yariable such that for each n there exists F, CF such that E(Y|F,) 2 E(y|@,). Then E(Y|F) = E(¥|6) a.s. Proof. Write Tul = cv, Then rE(YIG,)¥ = GE(YIF,)I < HE(YIF)I. Since E(Y|6,)—*E(¥I@) in 2 by martingale convergence, we obtain HE(Y|G}I < WE(Y[F)E, But FCG implies e(Y|F)0 s ME(Y|G)n. So WE(Y|G)E JE(Y[F)#, and now Lemma 3.4 establishes the result. infinite sequence with tail (6.4) Proposition. Let X be field T. Suppose that for each j, k>1 there exist ny,..-sm, such that ny > i a4 2 é and Ors oKyag ore aKa) 2 Op oKyan oe Gan) Then (X53 421) are conditionally independent given T. Proof. Fix m,n>1, let F = o(k,.X +) and Tet G, = o(Xp5---2Kq)- me? n By repeatedly applying the hypothesis, there exist qj.---.g, such that 2 Bar ess ag > mM and (Xp5-.09%q) = a So for bounded 9: R-+R we in have £(8(X,)1G,) BECO )IF A)» where Fr = o(Xq seeesX, ) GF. Applying 2 Sn Lema 6.3, E(O(Xq) [Xp sXga+- Y= (GOK) [kyo Xngg ores) 2eSe = E(9(X,)|T) 2.8. by martingale convergence. This says that X, and o(X).X3,-.-) are conditionally independent given T. But for each j the sequence (X,,X;,,.---) satisfies the hypotheses of the Proposition, so X; and o(Xj41+Kj49>- are conditionally indepen- dent given T. This establishes the result. (6.5) Lemma. Let X be an infinite sequence with tail o-field T. Suppose 2 xx gach n>1. 
OX) oXn ar Mnee Xn43?* ne Xns2 nase + Then the random variables X, are conditionally identically distributed given T, that is €(4(X,){T) = E(9(X,)IT), gach n>1, ¢ bounded, Proof. By hypothesis E(O(X))[Xp4qoXpygeee+) = ECO) [Xpa Mnggree ed Condition on T. Proof of Theorem 6.1. It is well known (and easy) that an i.i.d. sequence has property (B). It is also easy to check that any stopping time T on (F,) can be taken to have the form T= t(Z),Zp,..-), where the function t(x) satisfies the condition 45 wy if tlx) =a and xi = x;, ign. then t(x') Let Z be exchangeable, directed by a say, and let T= t,(Z) be an increasing sequence of stopping times. Conditional on a, the variables 1, ave i.i-d, and the times (T,,) are stopping times, since the property (*) is unaffected by conditioning. Thus we can apply the i.i.d. result to see that conditional on a the distributions of Z and Oy eptyar) are identical; hence the unconditional distributions are identical. Thus exchangeability implies (8). Plainly property (B) implies (A) and (C). It remains to show that each of the properties (A), (C) implies both the hypotheses of (6.4) and (6.5), so that (Z;) are conditionally i-i-d. given T, and hence exchangeable. For (A) these implications are obvious. So suppose (C) holds. Let j, k,n >1. Applying (C) to the stopping times S, T, where $= j and T=j on (ZeF), = jtn on (2, ¢FI, we have (Zigp>2jn09°° (Zag e2 tee? ). Since these vectors are identical on {2,€F}, the conditional distributions given 12, ¢F3 must be the same. That is, conditional on {25 €FI the distributions of Zyap2yager) and of (Zysna+Zjanegres7) are the same. Since F is arbitrary, we obtain 2 (6.6) (252541 2je02- + 2jak? 2 L525 anei?- aol and this implies the hypothesis of (6.4). Finally, property (C) with T= 1 shows Z is stationary, so antk) (25 4m2? jane? iw and this gives the hypothesis of (6.5). 6 Remark. Here we have used Proposition 6.3 for sequences which eventually turn out to be exchangeable. However, it can also give information for sequences which are in fact not exchangeable; see (14.7). nite exchangeable sequences. for a finite sequence (Zyy-++s2y) condi- tions (A)=(C) do not imply exchangeability. For instance, when N= 2. they nerely imply 2,2 2). On the other hand an exchangeable sequence obviously satisfies (A); what is Tess obvious is that (B) (and hence (C)) holds in the Finite case, where the argument used in the infinite case based on de Finetti's theorem cannot be used. (6.7) Proposition. Let (Z)>---s2y) be exchangeable and let O is a martingale, so for a stopping time T1, the following more general fact. (6.9)(k) Assertion, Whenever (Z;) is exchangeable over V and O so. 22. 2ygapeeey) by (6.9) for k (Waly yerseety) by exchangeabiTity over V, (WsZ)s2p gees 1 To41 Taq tl iy establishing (6.9) for k+l in the special case T, In the generat case, fix i, and define vi, Z} as at (6.8). On the set (Ty=i) we have Ty = itty, where 7, = 0 and * is a stopping time with respect to Gi = o(v',21,...,24) = Gi, So by (6-8) and the special case, ws wiz s1_,), conditional on (1, = 4}. io i u Th avo +) Neink?* kA This implies ional on (7. 2 ; 4) F WekyyoereeZy)s condi (Vip gqeee eed TT Tht Since this holds for each i, it holds unconditionally, establishing (6.9) for k+1. 49 7. Abstract spaces Let $ be an arbitrary measurable space. For a sequence Z= (2)2Zys+44) of S-valued random variables the definition (1.2) of "exchangeable" and the definition (2.6) of "mixture of i.i.d.'s" make sense. 
So we can ask whether de Finetti's theorem is true for S-valued sequences, i.e. whether these definitions are equivalent. Oubins and Freedman (1979) give an example to show that for general S$ de Finetti's theorem is false: an exchangeable sequence need not be a mixture of i.i.d. sequences. See also Freedman (1980). But, loosely speaking, de Finetti's theorem is true for "non-pathological" spaces. One way to try to prove this would be to examine the proof of the theorem for R and consider what abstract properties of the range space $ were needed to make the proof work for S, However, there is a much simpler technique which enables results for real-valued processes to be extended without effort to a large class of abstract spaces. We now describe this technique. (7.1) Definition. Spaces 51, S, are Borel-isonarphic if there exists a bijection ¢: S)—+S, such that @ and @”! are measurable, A space S is a Borel (or standard) space if it is Borel-isonorphic to some Borel- measurable subset of R. It ig well known (see e.g. Breiman (1968) A7) that any Polish (1.2. complete separate metric) space is Borels in particular a", R” and the familiar function spaces (0,1) and 0(0,1) are Borel. Restricting attention to Borel spaces costs us some generality; for instance, the general compact Hausdorf space is not Borel, and it is known (Diaconis and Freedman (1980a)) that de Finetti's theorem is true for compact Hausdorf spaces, but 50 has the great advantage that results extend automatically from the real-valued setting to the Borel space-valued setting. We need Some notation. Let P(S) denote the set of probability measures on S. As at (4.3), for functions f: SS, or g: S;xS,—+S, define the induced maps #: P(S,)—>P(S)), Gs Sx P(S,)—>P(S3) by (7.3) FL(Y)) = LCF) 5 GExeL(Y)) = L(g(x.¥)) (7.4) Proposition. Let Z be an infinite exchangeable sequence, taking values in a Borel space S. then Z is a mixture of i.i.d. sequences. Proof. Let $: S—+B be an isomorphism as in (7.1) between $ and a Borel subset 8 of R. Let 2 be the real-valued sequence ((2;)). Then 2 is exchangeable, so by the result (3.1) for the real-valued case, 2 is a mixture of i.1.d. sequences, directed by a random measure @, say. Since 2, €B we have &+,8) = 1a.s., so we may regard & as P(B)-valued. The map y= gl: 8 S induces a map $: P(B)—+P(S), and a = $(&) defines a random measure on S. It is straightforward to check that 2 is a mixture of i.i.d.'s directed by a. Exactly the same arguments show that all our results for real-valued exchangeable sequences which involve only “measure-theoretic" properties of R can be extended to S-valued sequences. We shall not write them ail out explicitly. Let us just mention two facts. (7.5) There exists a regular conditional distribution for any S-valued random variable given any o-field. (7.6) Let & be U(0,1). For any distribution » on S there exists f: (0,1)—+S such that #(£) has distribution wy. 51 Topological spaces. To discuss convergence results we need a topology on the range space $. We shall simply make the Convention. All abstract spaces $ mentioned are assumed to be Polish. Roughly speaking, convergence results for real-valued exchangeable processes extend to the Polish space setting. Let us record some notation and basic facts about weak convergence in a Polish space S. We assume the reader has some familiarity with this Parthasarathy (1967)). topic (see e.g. Billingsley (1968. For bounded f: SR write (7.7) #(8) = frooecon . Let C(S) be the set of bounded continuous functions f: S—+R. 
Give P(S) the topology of weak convergence: e, 78 iff F(0,) + F(0)s each f EC(S). The space P(S) itself is Polish: if d is a bounded complete metric on S$ then (7.8) A(usy) = inf{Ed(X,¥): LOX) =u, L(Y) =v} defines a complete metrization of P(S)- (7.9) Skorohod Representation Theorem. Given 9,—>8, we can construct random variables X,—>X a.s L(x) = 6. A sequence (8,) is relatively compact iff it is tight, that 1s for each 2 >0 there exists a compact K,CS such that inf @,(kK.) >1-e. There 4 exists a countable subset H of C(S) which is convergence-determining: 52 (7.10) if lim h(s_) = f(e), hEH, then 6 8. nen n In particular H is determining: (7.11) if f(a) = A(u), HEH, then 6 =u. For a random measure a on S, that is to say a P(S)-valued random variable, and for bounded h: $—>R, the expression h(a) gives the real-valued random variable {n(x)a(+ds). By (7.10), Hf (7.12) Ray) = R(a,) as, hEH where H is a countable determining class, then a, = For a random measure a on S$ define (7.13) (A) = Eo(+,A), ACS, so & is a distribution on S. Here is a technical lemma. (7.14) Lenma. Let (a,) be random measures on S. (a) If (G,) is tight on P(S) then (L(a,)) is tight in P(P(S)). (b) If (o,) i in) As, a martingale, in the sense that 2B) a.s.3 BCS, n>1, Flagg (98)1F,) = Sq for some increasing o-fields (F,), then there exists a random measure 8 such that ot, > B a.s. Pw: aq(us+) —» (ws) in P(S)) = 1 Proof. (a) Fix ¢> 0. By hypothesis there exist compact Kk; CS such that 53 (7.15) Bytk) < eas inode So by Markov's inequality (7.16) Plag(+5K§) > 274) < e224 = 2; jpn 20. So, setting (7.17) 0+ fe: a(n) <2°4, all jo), we have from (7.16) P(a,e6) 21-6; n>V. Since © is a compact subset of P(S), this establishes (a). (b) For each h@C(S) the sequence f(a,) is a real-valued martingale. So for a countable convergence-determining class # we have (2.5.) lim A(a,(w)) exists, each he H now Thus it suffices to prove that a.s. (7.18) the sequence of distributions a,(u,+) is tight. By the martingale property, &, does not depend on n. Take (K;) as at n J (7.15). Using the maximal inequality for the martingale a,(+.k§) gives Plo, cy yond ¢ -i i(toKi) >2°9 for some n) < e2°9, So for @ as at (7.17), P(w: a,(w.+)€O for all n)>l-e. This establishes (7.18). 54 Weak convergence of exchangeable processes. First observe that the class of exchangeable processes is closed under weak convergence. To say this precisely, suppose that for each k >1 we have an infinite exchangeable Kas a random (resp. Neexchangeable) sequence Z* = (2K), Think of 2 element of S® (resp. SN), where this product space has the product tech- nology. If Z*2+x, which in the infinite case is equivalent to (7.19) (1.5K) Be Oaeeeatg) as kote; each m2 Ts then plainly X is exchangeable. Note that by using interpretation (7.19) we can also talk about 2* 2x where z* is wk-exchangeable, ore and X is infinite exchangeable. Note also that tightness of a family (2k) of exchangeable processes is equivalent to tightness of (2). Given some class of exchangeable pro- cesses, one can consider the “weak closure" of the class, i.e. the (neces- sarily exchangeable) processes which are weak limits of processes from the given class. We know that the distribution of an infinite (resp. finite) exchangeable process Z is determined by the distribution of the directing random measure (resp. empirical distribution) a. The next result shows that weak conver- gence of exchangeable processes is equivalent to weak convergence of these associated random measures. 
Kallenberg (1973) gives this and more general results.

(7.20) Proposition. Let Z be an infinite exchangeable sequence directed by α. For k ≥ 1 let Z^k be exchangeable, and suppose either
(a) each Z^k is infinite, directed by α_k, say; or
(b) Z^k is N_k-exchangeable, with empirical distribution α_k, and N_k → ∞.
Then Z^k →d Z if and only if α_k →d α, that is to say L(α_k) → L(α) in P(P(S)).

Proof. (a) Recall the definition (7.8) of ρ̂. It is easy to check that the infimum in (7.8) is attained by some distribution L(X,Y), which may be taken to have the form g(θ,μ) for some measurable g: P(S)×P(S) → P(S×S). To prove the "if" assertion, we may suppose α_k → α a.s., by the Skorohod representation (7.9). Then ρ̂(α_k,α) → 0 a.s. For each k let (V^k,W^k) = ((V^k_i,W^k_i): i ≥ 1) be the S²-valued infinite exchangeable sequence directed by g(α_k,α). Then

  (i)  V^k d= Z^k, W^k d= Z; each k ≥ 1.

Also E(d(V^k_1,W^k_1)|α_k,α) = ρ̂(α_k,α), and so

  (ii) d(V^k_1,W^k_1) →p 0 as k → ∞.

Properties (i) and (ii) imply Z^k →d Z.

Conversely, suppose Z^k →d Z. Since ᾱ_k = L(Z^k_1), Lemma 7.14 shows that (α_k) is tight. If β is a weak limit, the "if" assertion of the Proposition implies β d= α, so α_k →d α as required.

(b) Let Ẑ^k be the infinite exchangeable sequence directed by α_k. By Proposition 5.6, for fixed m ≥ 1 the total variation distance ‖L(Z^k_1,...,Z^k_m) - L(Ẑ^k_1,...,Ẑ^k_m)‖ tends to 0 as k → ∞. So Z^k →d Z iff Ẑ^k →d Z, and part (b) follows from part (a).

Proposition 7.20 is of little practical use in the finite case, e.g. in proving central limit theorems for triangular arrays of exchangeable variables, because generally finite exchangeable sequences are presented in such a way that the distribution of their empirical distribution is not manifest. Section 20 presents more practical results.
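The "if" direction of Proposition 7.20 can be seen numerically: when the directing measures converge in distribution, so do the finite-dimensional laws. A sketch with Bernoulli mixtures directed by X_k with Beta(k,k) distribution, so that L(α_k) concentrates at the fair coin (the Beta family is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(8)
    reps = 200000

    def joint11(k):
        """P(Z_1 = Z_2 = 1) under the Bernoulli mixture with X ~ Beta(k,k)."""
        X = rng.beta(k, k, size=reps)
        return (rng.random((reps, 2)) < X[:, None]).all(axis=1).mean()

    # L(alpha_k) -> point mass at the fair coin as k -> oo, and the law of
    # (Z_1, Z_2) converges accordingly: E X^2 = (k+1)/(2(2k+1)) -> 1/4.
    for k in [1, 4, 16, 64]:
        print(k, joint11(k), (k + 1) / (2 * (2 * k + 1)))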
(7.22) Stable convergence. For random variables X_1, X_2, ... defined on the same probability space, say X_n converges stably if for each non-null event A the conditional distributions L(X_n | A) converge in distribution to some limit, μ_A say. Plainly stable convergence is stronger than convergence in distribution and weaker than convergence in probability. This concept is apparently due to Rényi (1963), but has been rediscovered by many authors; a recent survey of stability and its applications is in Aldous and Eagleson (1978). Rényi and Révész (1963) observed that exchangeable processes provide an example of stable convergence. Let us briefly outline this idea. Copying the usual proof of existence of regular conditional distributions, one readily obtains

(7.23) Lemma. Suppose (X_n) converges stably. Then there exists a random measure β(ω,·) which represents the limit distributions μ_A via

P(A) μ_A(B) = ∫_A β(ω,B) P(dω); A ∈ F, B ⊂ S.

Let us prove

(7.24) Lemma. Suppose (Z_n) is exchangeable, directed by α. Then (Z_n) converges stably, and the representing random measure β = α.

Proof. Let f ∈ C(S) and A ∈ σ(Z_1,...,Z_m). Then for n > m

P(A) E(f(Z_n) | A) = E 1_A E(f(Z_n) | Z_1,...,Z_m)
                   = E 1_A E(f(Z_n) | α)   by conditional independence
                   = E 1_A ∫ f(x) α(ω,dx).

Thus P(A) E(f(Z_n) | A) → E 1_A ∫ f(x) α(ω,dx) as n → ∞, for A ∈ σ(Z_1,...,Z_m), and this easily extends to all A. Thus L(Z_n | A) → ν_A, where P(A) ν_A(·) = E 1_A α(ω,·), as required.

Note that our proof of Lemma (7.24) did not use the general result (7.23). It is actually possible to first prove the general result (7.23) and then use the type of argument above to give another proof of de Finetti's theorem; see Rényi and Révész (1963).

If X_n, defined on (Ω,F,P), converges stably, then we can extend the space to construct a "limit" variable X* such that the representing measure β is a regular conditional distribution for X* given F. Then (see e.g. Aldous and Eagleson (1978))

(7.25)  (Y,X_n) →_D (Y,X*); each random variable Y defined on (Ω,F,P).

Classical weak convergence theorems for exchangeable processes are stable. For instance, let (Z_i) be a square-integrable exchangeable sequence directed by α. Let S_n = n^{−1/2} Σ_{i≤n} (Z_i − mean(α)). Then S_n converges stably, and its representing measure β(ω,·) is the Normal N(0,var(α)) distribution. If we construct a N(0,1) variable W independent of the original probability space, then not only do we have S_n →_D S* = (var(α))^{1/2} W as at (2.27), but also by (7.25)

(Y,S_n) →_D (Y,S*); each Y in the original space.

8. The subsequence principle

Suppose we are given a sequence (X_n) of random variables whose distributions are tight. Then we know we can pick out a subsequence Y_i = X_{n_i} which converges in distribution. Can we say more, e.g. can we pick (Y_i) to have some tractable kind of dependence structure? It turns out that we can: informally,

(A) we can find a subsequence (Y_i) which is similar to some exchangeable sequence Z.

Now we know from de Finetti's theorem that infinite exchangeable sequences are mixtures of i.i.d. sequences, and so satisfy analogues of the classical limit theorems for i.i.d. sequences. So (A) suggests the equally informal assertion

(B) we can find a subsequence (Y_i) which satisfies the analogue of any prescribed limit theorem for i.i.d. sequences.

Historically, the prototype for (B) was the following result of Komlós (1967).

(8.1) Proposition. If sup_i E|X_i| < ∞ then there exists a subsequence (Y_i) such that N^{−1} Σ_{i≤N} Y_i → V a.s., for some random variable V.

This is (B) for the strong law of large numbers. Chatterji (1974) formulated (B) as the subsequence principle and established several other instances of it. A weak form of (A), in which (Y_i) is asymptotically exchangeable in the sense

(Y_j, Y_{j+1}, Y_{j+2}, ...) →_D (Z_1, Z_2, Z_3, ...) as j → ∞,

arose independently from several sources: Dacunha-Castelle (1974), Figiel and Sucheston (1976), and Kingman (unpublished), who was perhaps the first to note the connection between (A) and (B). We shall prove this weak form of (A) as Theorem 8.9. Unfortunately this form is not strong enough to imply (B); we shall discuss stronger results later.

The key idea in our proof is in (b) below. An infinite exchangeable sequence Z has the property (stronger than the property of stable convergence) that the conditional distribution of Z_{n+1} given (Z_1,...,Z_n) converges to the directing random measure; the key idea is a kind of converse, that any sequence with this property is asymptotically exchangeable.
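Pólya's urn gives a concrete instance of this key idea: the sequence of draw indicators is exchangeable, and the conditional law of the next draw given the past converges a.s. to the (random) directing measure. A minimal sketch, for a two-colour urn started with one ball of each colour:

    import numpy as np

    rng = np.random.default_rng(2)

    # Polya's urn: draw a ball, replace it together with another of the
    # same colour. With initial composition (1 red, 1 black),
    # P(X_{n+1} = red | X_1,...,X_n) = (1 + #reds so far) / (n + 2),
    # which converges a.s.; the limit is the weight the directing measure
    # puts on "red" (Beta(1,1)-distributed over realizations).
    n = 10_000
    reds = 0
    for i in range(n):
        p = (1 + reds) / (i + 2)      # conditional law of the next draw
        reds += rng.random() < p
    print((1 + reds) / (n + 2))       # close to the (random) a.s. limit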
Our arguments are rather pedestrian; the proof of Dacunha-Castelle (1974) uses ultrafilters to obtain limits, while Figiel and Sucheston (1976) use Ramsey's combinatorial theorem to prove a result for general Banach spaces which is readily adaptable to our setting.

Suppose random variables take values in a Polish space S.

(8.2) Lemma. Let Z be an infinite exchangeable sequence directed by α.
(a) Let α_n be a regular conditional distribution for Z_{n+1} given (Z_1,...,Z_n). Then α_n → α a.s.
(b) Let X be an infinite sequence, let α_n be a regular conditional distribution for X_{n+1} given (X_1,...,X_n), and suppose α_n → α a.s. for some random measure α. Then

(8.3)  (X_{n+1}, X_{n+2}, ...) →_D (Z_1, Z_2, ...) as n → ∞,

where Z is an infinite exchangeable sequence directed by α.

Proof. (a) Construct Z_0 such that (Z_i; i ≥ 0) is exchangeable. Let h ∈ C(S), and define h̄ as at (7.7). Then

h̄(α_n) = E(h(Z_{n+1}) | Z_1,...,Z_n)
        = E(h(Z_0) | Z_1,...,Z_n)      by exchangeability
        → E(h(Z_0) | Z_i; i ≥ 1) a.s.  by martingale convergence
        = h̄(α).

Apply (7.10).

(b) Given X and (α_n), let F_n = σ(X_1,...,X_n), F_∞ = σ(X_i; i ≥ 1), and construct Z such that Z is an infinite exchangeable sequence directed by α and also

(8.4)  Z and F_∞ are conditionally independent given α.

We shall prove, by induction on k, that

(8.5)  (V, X_{n+1},...,X_{n+k}) →_D (V, Z_1,...,Z_k) as n → ∞; each F_∞-measurable V,

for each k. This will establish (b).

Suppose (8.5) holds for fixed k ≥ 0. Let f: S^k × S → R be bounded continuous. Define f̂: S^k × P(S) → R by

f̂(x_1,...,x_k,μ) = ∫ f(x_1,...,x_k,x) μ(dx).

Note f̂ is continuous. By the fundamental property of conditional distributions,

(8.6)  E(f(X_{n+1},...,X_{n+k},X_{n+k+1}) | F_{n+k}) = f̂(X_{n+1},...,X_{n+k},α_{n+k});

(8.7)  E(f(Z_1,...,Z_k,Z_{k+1}) | F_∞, Z_1,...,Z_k) = f̂(Z_1,...,Z_k,α),

using (8.4).

Fix m ≥ 1 and A ∈ F_m. By the inductive hypothesis,

(α, 1_A, X_{n+1},...,X_{n+k}) →_D (α, 1_A, Z_1,...,Z_k) as n → ∞.

Since α_{n+k} → α a.s.,

(8.8)  (α_{n+k}, 1_A, X_{n+1},...,X_{n+k}) →_D (α, 1_A, Z_1,...,Z_k) as n → ∞.

Now

E f(X_{n+1},...,X_{n+k+1}) 1_A = E f̂(X_{n+1},...,X_{n+k},α_{n+k}) 1_A, n ≥ m, by (8.6)
  → E f̂(Z_1,...,Z_k,α) 1_A as n → ∞, by (8.8) and continuity of f̂
  = E f(Z_1,...,Z_k,Z_{k+1}) 1_A, by (8.7).

Since this convergence holds for all f, we see that the inductive assertion (8.5) holds for k+1 when V ∈ F_m. But m is arbitrary, so this extends to all F_∞-measurable V.

(8.9) Theorem. Let X be a sequence of random variables such that (L(X_i)) is tight. Then there exists a subsequence Y_j = X_{n_j} such that

(Y_j, Y_{j+1}, Y_{j+2}, ...) →_D (Z_1, Z_2, Z_3, ...) as j → ∞

for some exchangeable Z.

We need one preliminary. A standard fact from functional analysis is that the unit ball of a Hilbert space is compact in the weak topology (i.e. the topology generated by the dual space); applying this fact to the space L^2 of random variables gives

(8.10) Lemma. Let (V_i) be a uniformly bounded sequence of real-valued random variables. Then there exist a subsequence (V_{i_j}) and a random variable V such that E V_{i_j} 1_A → E V 1_A for all events A.

Proof of Theorem 8.9. By approximating, we may suppose each X_n takes values in some finite set. Let (h_j) be a convergence-determining class. By Lemma 8.10 and a diagonal argument, we can pick a subsequence (which we continue to denote (X_n)) and random variables (V_j) such that, as n → ∞,

(8.13)  E h_j(X_n) 1_A → E V_j 1_A; each A, j.

We can now pass to a further subsequence in which

(8.14)  |E(h_j(X_{n+1}) 1_A) − E(V_j 1_A)| ≤ 2^{−n} for each n ≥ 1, each 1 ≤ j ≤ n, each atom A of F_n = σ(X_1,...,X_n).

Let α_n be a regular conditional distribution for X_{n+1} given F_n. We shall prove α_n → β a.s. for some random measure β, and then Lemma 8.2(b) establishes the theorem. Note

(8.15)  E(h_j(X_{n+1}) | F_n) = h̄_j(α_n).

Fix m ≥ 1 and an atom A of F_m. By (8.13),

(8.16)  L(X_n | A) → μ_A, say, where h̄_j(μ_A) = E(V_j | A).

Let β_n be the random measure such that β_n(ω,·) = μ_A(·) for ω ∈ A, A an atom of F_n. Then h̄_j(β_n) = E(V_j | F_n), and so by (8.14) and (8.15)

(8.17)  |h̄_j(α_n) − h̄_j(β_n)| ≤ 2^{−n}; 1 ≤ j ≤ n.
We assert that (β_n) forms a martingale, in the sense of Lemma 7.14. For an atom A of F_n is a finite union of atoms A_i of F_{n+1}, and by (8.16)

μ_A(B) = Σ_i P(A_i | A) μ_{A_i}(B); B ⊂ S,

which implies E(β_{n+1}(·,B) | F_n) = β_n(·,B). Now by Lemma 7.14 we have β_n → β a.s. for some random measure β. And (8.17) then implies h̄_j(α_n) → h̄_j(β) a.s. for each j, and so α_n → β a.s. as required.

Let us return to discussion of the subsequence principle. Call (Y_i) almost exchangeable if we can construct exchangeable (Z_i) such that Σ_i |Y_i − Z_i| < ∞ a.s. (we are now taking real-valued sequences). Plainly such a (Y_i) will inherit from (Z_i) the property of satisfying analogues of classical limit theorems. So if we could prove

(8.18)  every tight sequence (X_i) has an almost exchangeable subsequence (Y_i),

then we would have established a solid form of the subsequence principle (B). Unfortunately (8.18) is false. See Kingman (1978) for a counterexample, and Berkes and Rosenthal (1983) for more counterexamples and discussion of which sequences (X_i) do satisfy (8.18).

Thus we need a property weaker than "almost exchangeable" but stronger than "asymptotically exchangeable". Let ε_k ↓ 0. Let (X_i) be such that for each k we can construct an exchangeable sequence (Z_j^k) such that

P(|X_j − Z_j^k| > ε_k) ≤ ε_k for each j ≥ k.

This property (actually, a slightly stronger but more complicated version) was introduced by Berkes and Péter (1983), who call such (X_i) strongly exchangeable at infinity with rate (ε_k). They prove

(8.19) Theorem. Let (X_i) be tight, and let ε_k ↓ 0. Then there exists a subsequence (Y_i) which is strongly exchangeable at infinity with rate (ε_k).

(Again, they actually prove a slightly stronger result.) From this can be deduced results of type (B), such as Proposition 8.1 and, to give another example, the analogue of the law of the iterated logarithm:

(8.20) Proposition. If sup_i E X_i^2 < ∞ then there exist a subsequence (Y_i) and random variables V, S such that

lim sup_N (2N log log N)^{−1/2} Σ_{i≤N} (Y_i − V) = S a.s.

A different approach to the subsequence principle is to abstract the idea of a "limit theorem". Let A ⊂ P(R) × R^∞ be the set

A = {(θ,x): ∫|s| θ(ds) = ∞, or lim_N N^{−1} Σ_{i≤N} x_i = mean(θ)}.

Then the strong law of large numbers is the assertion

(8.21)  P((θ, X_1, X_2, ...) ∈ A) = 1 for (X_i) i.i.d. (θ).

Similarly, any a.s. limit theorem for i.i.d. variables can be put in the form of (8.21) for some set A, which we call a statute. Call A a limit statute if also

if (θ,x) ∈ A and Σ_i |x_i − y_i| < ∞, then (θ,y) ∈ A.

Then Aldous (1977) shows

(8.22) Theorem. Let A be a limit statute and (X_i) a tight sequence. Then there exist a subsequence (Y_i) and a random measure α such that (α, Y_1, Y_2, ...) ∈ A a.s.

Applying this to the statutes describing the strong law of large numbers or the law of the iterated logarithm, we recover Propositions 8.1 and 8.20. To appreciate (8.22), observe that for an exchangeable sequence (Z_i) directed by α we have (α, Z_1, Z_2, ...) ∈ A a.s. for each statute A, by (8.21). So for an almost exchangeable sequence (Y_i) and a limit statute A we have (α, Y_1, Y_2, ...) ∈ A a.s. Thus (8.22) is a consequence of (8.18), when (8.18) holds; what is important is that (8.22) holds in general while (8.18) does not.

The proofs of Theorems 8.19 and 8.22 are too technical to be described here; interested readers should consult the original papers.

9. Other discrete structures

In Part III we shall discuss processes (X_i: i ∈ I) invariant under specified transformations of the index set I.
As an introduction to this subject, we now treat some simple cases where the structure of the invariant processes can be deduced from de Finetti's theorem. We have already seen one result of this type, Corollary 3.9.

Two exchangeable sequences. Consider two infinite S-valued sequences (X_i), (Y_i) such that

(9.1)  the sequence (X_i,Y_i), i ≥ 1, of pairs is exchangeable.

Then this sequence of pairs is a mixture of i.i.d. bivariate sequences, directed by some random measure α on S×S, and the marginals α_X(ω), α_Y(ω) are the directing measures for (X_i) and for (Y_i). Corollary 3.9 says that the stronger condition

(9.2)  (X_1,X_2,...;Y_1,Y_2,...) =_D (X_{π(1)},X_{π(2)},...;Y_{σ(1)},Y_{σ(2)},...) for all finite permutations π, σ

holds iff α(ω) = α_X(ω) × α_Y(ω) a.s. If we wish to allow switching X's and Y's, consider the following possible conditions:

(9.3)  (X_1,X_2,X_3,...;Y_1,Y_2,Y_3,...) =_D (Y_1,Y_2,Y_3,...;X_1,X_2,X_3,...);

(9.4)  (X_1,X_2,X_3,...;Y_1,Y_2,Y_3,...) =_D (Y_1,X_2,X_3,...;X_1,Y_2,Y_3,...).

Let h(x,y) = (y,x), let ĥ: P(S×S) → P(S×S) be the induced map, and let P_s be the set of symmetric (i.e. ĥ-invariant) measures on S×S.

(9.5) Proposition.
(a) Both (9.1) and (9.3) hold iff ĥ(α) =_D α.
(b) Both (9.1) and (9.4) hold iff α ∈ P_s a.s.
(c) Both (9.2) and (9.3) hold iff α = α_X × α_Y a.s. and (α_X,α_Y) =_D (α_Y,α_X).
(d) Both (9.2) and (9.4) hold iff α = α_X × α_Y a.s. and α_X = α_Y a.s., that is, iff the combined sequence (X_1,Y_1,X_2,Y_2,...) is exchangeable.

This is immediate from the remarks above and the following lemma, applied to Z_i = (X_i,Y_i).

(9.6) Lemma. Let h: S → S be measurable, let ĥ: P(S) → P(S) be the induced map, and let P_h be the set of distributions μ which are h-invariant: ĥ(μ) = μ. Let Z be an infinite exchangeable S-valued sequence directed by α.
(i) Z =_D (h(Z_1),h(Z_2),h(Z_3),...) iff ĥ(α) =_D α.
(ii) Z =_D (h(Z_1),Z_2,Z_3,...) iff α = ĥ(α) a.s., that is iff α ∈ P_h a.s.

Proof. Lemma 4.4(a) says that (h(Z_i)) is an exchangeable sequence directed by ĥ(α), and this gives (i). For (ii), note first that

α is a r.c.d. for Z_1 given α; ĥ(α) is a r.c.d. for h(Z_1) given α.

Writing W = (Z_2,Z_3,...), we have by Lemma 2.19

α is a r.c.d. for Z_1 given W; ĥ(α) is a r.c.d. for h(Z_1) given W.

Now (Z_1,W) =_D (h(Z_1),W) iff the conditional distributions for Z_1 and h(Z_1) given W are a.s. equal; this is (ii).

It is convenient to record here a technical result we need in Section 13.

(9.7) Lemma. Let (X_i), (Y_i) be exchangeable. Suppose that for each subset A of {1,2,...} the sequence Z defined by

Z_i = X_i, i ∈ A;  Z_i = Y_i, i ∉ A,

satisfies Z =_D X. Then the directing random measures satisfy α_Z = α_X a.s., for each such Z.

Remark. This says that conditional on α_X = θ, the pairs (X_i,Y_i) are independent as i varies, and X_i and Y_i each have marginal distribution θ.

Proof. In the notation of Lemma 2.15, α_X = Λ(X_1,X_2,...). Now a function of infinitely many variables may be approximated by functions of finitely many variables, so there exist functions g_k such that

(9.8)  E d(α_X, g_k(X_1,...,X_k)) = δ_k,

where d is a bounded metrization of P(S) and δ_k → 0 as k → ∞. Fix Z and define Ẑ by

Ẑ_i = X_i, i ≤ k;  Ẑ_i = Z_i, i > k.

Then Ẑ is again of the form considered in the lemma, so Ẑ =_D X, and by (9.8), E d(α_Ẑ, g_k(Ẑ_1,...,Ẑ_k)) = δ_k. But α_Ẑ = α_Z a.s., because directing random measures are tail-measurable; and Ẑ_i = X_i for i ≤ k. Thus E d(α_Z, g_k(X_1,...,X_k)) = δ_k, and letting k → ∞ in this and in (9.8) gives α_Z = α_X a.s.

10. Continuous-time processes

Consider now real-valued processes (X_t: 0 ≤ t ≤ 1) or (X_t: t ≥ 0) with interchangeable increments, meaning

(10.1a)  for each δ > 0 the sequence (Z_i) = (X_{iδ} − X_{(i−1)δ}) of increments is exchangeable;
(10.1b)  X_0 = 0.

Assumption (b) entails no real loss of generality, since we can replace X_t by X_t − X_0 and preserve (a). Informally, think of interchangeable increments processes as integrals X_t = ∫_0^t Z_s ds of some underlying exchangeable "generalized process" Z. Here is an alternative definition. Given disjoint intervals (a_1,b_1], (a_2,b_2] of equal length, let T: R^+ → R^+ be the natural map which switches these intervals:

(10.2)  T(t) = t, t ∉ (a_1,b_1] ∪ (a_2,b_2];
        T(a_1+t) = a_2+t, T(a_2+t) = a_1+t; 0 < t ≤ b_1 − a_1.

Then X has interchangeable increments iff, for each such pair of intervals, the process obtained from X by switching its increments over (a_1,b_1] and (a_2,b_2] has the same distribution as X.
The Brownian bridge B° provides the basic example. Fix n ≥ 0, set k = 2^n, and set Z_j = B°(j/k) − B°((j−1)/k), 1 ≤ j ≤ k. Then (Z_1,...,Z_k) is Gaussian, and using (10.10) we compute

E Z_j^2 = (k−1)k^{−2};  E Z_i Z_j = −k^{−2} (i ≠ j),

and so (Z_j) is exchangeable. It follows that B° has interchangeable increments. From this we can construct more continuous-path interchangeable increments processes by putting

(10.11)  X_t = α B°_t + βt,

where (α,β) is independent of B°. Note that Brownian motion on [0,1] is of this form, for α = 1 and β having N(0,1) distribution.

(10.12) Theorem. Every continuous-path interchangeable increments process (X_t: 0 ≤ t ≤ 1) is of the form (10.11), for some (α,β) independent of B°.

The proof rests on a characterization of Brownian bridge which can be obtained from Lévy's martingale characterization of Brownian motion:

(10.13) Lemma. Let (V_t: 0 ≤ t ≤ 1) be a continuous-path process adapted to an increasing family (G_t) of σ-fields, with V_0 = V_1 = 0, such that for t < u

E(V_u − V_t | G_t) = −((u−t)/(1−t)) V_t;  Var(V_u − V_t | G_t) = (u−t)(1−u)/(1−t).

Then V is Brownian bridge, independent of G_0.

The proof of Theorem 10.12 we give works only under the additional integrability hypothesis

(10.14)  E X_t^2 < ∞, each t.

Proof of Theorem 10.12. For m ≥ 1 let D_m = {j2^{−m}: 0 ≤ j ≤ 2^m}. For t ∈ D_m let G_t^m be the σ-field generated by (X_s: s ≤ t) together with the unordered collection of increments {X(j2^{−m}) − X((j−1)2^{−m}): j2^{−m} > t}. Then conditionally on G_t^m the remaining increments are an urn process, in the language of Section 5. The elementary formulas for means and variances when sampling without replacement (20.1) show that for u ∈ D_m, u ≥ t,

(10.16)  E(X_u − X_t | G_t^m) = ((u−t)/(1−t))(X_1 − X_t);
         Var(X_u − X_t | G_t^m) = ((u−t)(1−u))/((1−t)(1−t−2^{−m})) · (Q_t^m − (X_1−X_t)^2 2^{−m}/(1−t)),

where

(10.17)  Q_t^m = Σ_{j: j2^{−m} > t} (X(j2^{−m}) − X((j−1)2^{−m}))^2, t ∈ D_m.

For fixed t, the σ-fields G_t^m decrease as m increases; for t ∈ D = ∪_m D_m let G_t = ∩_m G_t^m, and for general t let G_t = ∩_{u∈D, u>t} G_u. Suppose we can prove that there exists a random variable α ≥ 0 such that

(10.18)  Q_t^m →_p α(1−t) as m → ∞, each t ∈ D.

Then reverse martingale convergence in (10.16) shows that for t < u

E(X_u − X_t | G_t) = ((u−t)/(1−t))(X_1 − X_t);  Var(X_u − X_t | G_t) = ((u−t)(1−u)/(1−t)) α.

These extend to all t < u. Suppose first that X_1 = 0. On {α > 0} set V_t = α^{−1/2} X_t; then V satisfies the hypotheses of Lemma 10.13, and so V is Brownian bridge, independent of α; on {α = 0} the formulas above force X ≡ 0. Since X_t = α^{1/2} V_t, this establishes the theorem in the special case X_1 = 0. For the general case, set X̂_t = X_t − tX_1, define Ĝ_t^m using X̂ as G_t^m was defined using X, and include X_1 in the σ-field. The previous argument gives that V_t = α^{−1/2} X̂_t is Brownian bridge independent of Ĝ_0 ⊃ σ(α,X_1), and then writing X_t = α^{1/2} V_t + X_1 t establishes the theorem in the general case.

To prove (10.18), we quote the following lemma, which can be regarded as a consequence of maximal inequalities for sampling without replacement (20.5), or as a degenerate weak convergence result (20.10).

(10.19) Lemma. For each m ≥ 1 let (Z_{m,1},...,Z_{m,k_m}) be exchangeable. If
(a) Σ_j Z_{m,j} = 0 for each m, and
(b) Σ_j Z_{m,j}^2 →_p 0 as m → ∞,
then
(c) max_{k ≤ k_m} |Σ_{j≤k} Z_{m,j}| →_p 0 as m → ∞.

Proof of (10.18). Set Z_{m,j} = (X(j2^{−m}) − X((j−1)2^{−m}))^2 − 2^{−m} Q_0^m. Then (a) is immediate. For (b),

Σ_j Z_{m,j}^2 ≤ max_j |Z_{m,j}| · Σ_j |Z_{m,j}| ≤ (δ_m^2 + 2^{−m} Q_0^m) · 2Q_0^m →_p 0,

where δ_m = max_j |X(j2^{−m}) − X((j−1)2^{−m})| → 0 by uniform continuity of the paths, and Q_0^m converges a.s. by reverse martingale convergence in (10.16). So conclusion (c) says

max_{t∈D_m} |Q_t^m − (1−t) Q_0^m| →_p 0 as m → ∞,

and this is (10.18), with α = lim_m Q_0^m.

Remark. To remove the integrability hypothesis (10.14), note first that for non-integrable variables we can define conditional expectations "locally": E(U|F) = V means that for every A ∈ F for which V 1_A is integrable, we have that U 1_A is integrable and E(U 1_A | F) = V 1_A. In the non-integrable case, (10.16) remains true with this interpretation. To establish convergence of the "local" martingale Q_t^m it is necessary to show that (Q_0^m: m ≥ 1) is tight, and for this we can appeal to results on sampling without replacement in the spirit of (10.19). However, there must be some simpler way of making a direct proof of Theorem 10.12.
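The representation (10.11) is easy to simulate on a dyadic grid: Brownian-bridge increments have exactly the covariance computed above, and one mixes with an independent (α,β). A minimal sketch; the distribution chosen for (α,β) is an arbitrary illustration:

    import numpy as np

    rng = np.random.default_rng(3)
    k = 1024                           # grid 0, 1/k, ..., 1

    def bridge_increments():
        # Increments of Brownian motion minus their mean: they sum to 0,
        # with E Z_j^2 = (k-1)/k^2 and E Z_i Z_j = -1/k^2, as computed above.
        z = rng.normal(scale=k ** -0.5, size=k)
        return z - z.mean()

    # A process of the form (10.11): X_t = alpha * B0_t + beta * t,
    # with (alpha, beta) independent of the bridge.
    alpha, beta = np.exp(rng.normal()), rng.normal()
    incr = alpha * bridge_increments() + beta / k
    X = np.concatenate([[0.0], np.cumsum(incr)])

    # Interchangeable increments: permuting incr before summing leaves the
    # law of the path unchanged; this can be checked over many replications.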
Let us describe briefly the general case of processes with interchangeable increments but discontinuous paths; rather than develop the theory, we present one fundamental example. Recall that the Gamma(b,1) distribution has density Γ(b)^{−1} x^{b−1} e^{−x} on {x > 0}. Since this distribution is infinitely divisible, there exists a Lévy process (X_t) such that X_1 has Gamma(a,1) distribution, and hence X_t has Gamma(at,1) distribution. Call X the Gamma(a) process. Here is an alternative description. Let ν be the measure on (0,∞) with density

(10.21)  ν(dx) = a x^{−1} e^{−x} dx.

Then ν(ε,∞) < ∞ for each ε > 0, but ν(0,∞) = ∞. Let ν×λ be the product of ν and Lebesgue measure on Q = {(x,t): x > 0, t ≥ 0}, and let N be a Poisson point process on Q with intensity ν×λ. Then N is distributed as the set of times and sizes of the jumps of X:

{(x,t): X_t − X_{t−} = x}.

So we can construct X from N by adding up jumps:

X_u = Σ {x: (x,t) ∈ N for some t ≤ u}.

The Dirichlet(a) process is Y_t = X_t / X_1, 0 ≤ t ≤ 1.

11. Exchangeable random partitions

Call a random partition R^N of {1,...,N} exchangeable if its distribution is invariant under permutations of {1,...,N}, and call a family (R^N: N ≥ 1) consistent if the restriction of R^{N+1} to {1,...,N} is distributed as R^N, as at (11.5). If the exchangeable random partitions R^N are consistent, N ≥ 1, then there exists an exchangeable random partition R of {1,2,3,...} such that

(11.6)  the restriction of R to {1,...,N} is distributed as R^N; N ≥ 1.

Our aim is to prove an analogue of de Finetti's theorem for exchangeable partitions of {1,2,3,...}. The role of i.i.d. sequences is played by the "paintbox processes" which we now describe. Let μ be a distribution on [0,1]; think of μ as partly discrete and partly continuous. Let (X_i) be i.i.d. (μ). Let R(ω) be the partition whose components are the sets {i: X_i(ω) = x}, 0 ≤ x ≤ 1. Clearly R is exchangeable, and its distribution depends only on the sizes of the atoms of μ; writing p = (p_1,p_2,...) for these sizes arranged in decreasing order, call R a paintbox(p) process and write Ψ_p for its distribution.

(11.8) Lemma. Let R be a paintbox(p) process.
(a) N^{−1} λ(R^N) → (p_1,p_2,...) a.s., where R^N is the restriction of R to {1,2,...,N} and λ(R^N) denotes the component sizes of R^N arranged in decreasing order.
(b) Let C_1 be the component of R(ω) containing 1. Then

N^{−1} #(C_1 ∩ {1,...,N}) → p_I a.s.,

where P(I = j) = p_j, P(I = 0) = 1 − Σ_j p_j, and p_0 = 0.
(c) P(1,2,...,r in same component) = Σ_i p_i^r; r ≥ 2.

Here is the analogue of de Finetti's theorem, due to Kingman (1978b, 1982a).

(11.9) Proposition. Let R be an exchangeable partition of {1,2,...}, and let R^N be its restriction to {1,2,...,N}. Then
(a) N^{−1} λ(R^N) → D a.s., for some random element D of the set of sequences p_1 ≥ p_2 ≥ ⋯ ≥ 0 with Σ_i p_i ≤ 1;
(b) Ψ_D is a regular conditional distribution for R given σ(D).

So (b) says that conditional on D = p, the partition R has the paintbox(p) distribution Ψ_p. As discussed in Section 2, this is the "strong" notion of R being a mixture of paintbox processes.

Proof. Let (ξ_i) be i.i.d. uniform on (0,1), independent of R. Throwing out a null set, we can assume the values (ξ_i(ω): i ≥ 1) are distinct. Define

Z_i(ω) = min{ξ_j(ω): j in same component of R(ω) as i},

so for each ω the partition R(ω) is precisely the partition with components {i: Z_i(ω) = z}, 0 < z < 1. We assert (Z_i) is exchangeable. For (Z_i) = g((ξ_i),R) for a certain function g, and (Z_{π(i)}) = g((ξ_{π(i)}),π(R)), and ((ξ_{π(i)}),π(R)) =_D ((ξ_i),R) by exchangeability and by independence of R and (ξ_i).

Let α be the directing random measure for (Z_i). Then conditional on α = μ the sequence (Z_i) is i.i.d. (μ), and so R has the paintbox distribution Ψ_{λ(μ)}, where λ(μ) denotes the sequence of atom sizes of μ arranged in decreasing order. In other words Ψ_{λ(α)} is a regular conditional distribution for R given α, and this establishes (b) for D = λ(α). And then (a) follows from Lemma 11.8(a) by conditioning on D.

Remarks. Kingman used a direct martingale argument, in the spirit of the first proof of de Finetti's theorem in Section 3. Our trick of labelling components by external randomization enables us to apply de Finetti's theorem. Despite the external randomization, (a) shows that D is a function of R. Yet another proof of Proposition 11.9 can be obtained from deeper results on partial exchangeability; see (15.23).

The Ewens sampling formula. Proposition 11.9 is a recent result, so perhaps some presently unexpected applications will be found in future. The known applications involve situations where certain extra structure is present.
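The paintbox construction is easy to simulate, and Lemma 11.8(c) provides a check. A minimal sketch, with an arbitrary choice of atom sizes p (the remaining mass plays the role of the continuous part of μ):

    import numpy as np

    rng = np.random.default_rng(4)
    p = np.array([0.5, 0.3, 0.1])   # atom sizes; mass 0.1 left as "dust"

    def paintbox_labels(n):
        # X_i i.i.d.: atom j with probability p_j, otherwise a fresh
        # "continuous" value, so dust points form singleton components.
        u = rng.random(n)
        lab = np.searchsorted(np.cumsum(p), u).astype(float)
        dust = lab == len(p)
        lab[dust] = len(p) + rng.random(int(dust.sum()))   # distinct labels
        return lab

    # Lemma 11.8(c): P(1,...,r in the same component) = sum_j p_j^r.
    r, trials = 3, 50_000
    hits = sum(len(set(paintbox_labels(r))) == 1 for _ in range(trials))
    print(hits / trials, float(np.sum(p ** r)))   # the two should be close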
An exchangeable random partition R of {1,2,3,...} can be regarded as a sequence of exchangeable random partitions R^N of {1,2,...,N} satisfying the consistency condition (11.5). Let us state informally another "consistency" condition which may hold.

Fix N and r ≤ N, and let B ⊂ {1,2,...,N} be such that 1 ∈ B, #B = r. On the event that B is a component of R^N, write R̃ for the partition of {1,...,N}\B obtained from R^N by removing the component containing 1. The condition we want is:

(11.12)  conditional on B being a component of R^N, the partition R̃ is distributed as a relabelled copy of R^{N−r}.

To rephrase this condition: on the event above write R^N = {B} ∪ C; then (11.12) says

P(R^N = {B} ∪ C) = P(B is a component of R^N) · P(R^{N−r} = C̃),

where C̃ is C relabelled as a partition of {1,...,N−r}, and where the first factor on the right depends only on (N,r), by exchangeability. This product form is the basis for the proof of (11.16) below.

Now consider, for some exchangeable R, the chance that 1 and 2 belong to the same set in the partition:

(11.13)  P(R^2 = {{1,2}}) = 1/(1+θ), say, for some 0 ≤ θ ≤ ∞.

There are two extreme cases: if θ = 0 then R is a.s. the trivial partition {{1,2,3,...}}; if θ = ∞ then R is a.s. the discrete partition {{1},{2},{3},...}. The interesting case is 0 < θ < ∞.

(11.14) Theorem. Let R be an exchangeable random partition of {1,2,3,...} satisfying the consistency condition (11.12), and satisfying (11.13) with 0 < θ < ∞. Then
(11.15)  N^{−1} λ(R^N) converges in distribution to the Poisson-Dirichlet(θ) process;
(11.16)  the distribution of R^N is given by the Ewens sampling formula:

P(R^N has a_r components of size r, each r ≥ 1) = (N! / (θ(θ+1)⋯(θ+N−1))) ∏_r (θ/r)^{a_r} (a_r!)^{−1}; Σ_r r a_r = N;

(11.17)  N^{−1} #C_1^N →_D T_θ, where C_1^N is the component of R^N containing 1 and T_θ has density θ(1−t)^{θ−1} on (0,1).

The cycles of random permutations provide the fundamental example. Let R^N be the partition of {1,...,N} into the cycles of a uniform random permutation of {1,...,N}. These partitions, N ≥ 1, are consistent in the sense of (11.5). To see this, for a permutation σ of {1,...,N+1} define a permutation g(σ) of {1,...,N} by deleting N+1 from the cycle representation of σ:

g(σ)(i) = σ(i) if σ(i) ≠ N+1;  g(σ)(i) = σ(N+1) if σ(i) = N+1.

Then if σ is uniform on the permutations of {1,...,N+1}, g(σ) is uniform on the permutations of {1,...,N}, and the cycle partition of g(σ) is the restriction to {1,...,N} of the cycle partition of σ; this gives (11.5). Moreover, given that a specified set B carries a cycle, the permutation restricted to the complement of B is uniform there, so condition (11.12) holds; and P(1 and 2 in the same cycle) = 1/2, so (11.13) holds with θ = 1. Thus Theorem 11.14 applies. In particular, (11.16) recovers Cauchy's formula: the number of permutations of {1,...,N} with exactly a_r cycles of length r (each r ≥ 1), where Σ_r r a_r = N, is

N! / ∏_r r^{a_r} a_r! .

And if (M_{N,1},M_{N,2},...) are the lengths of the cycles of a uniform random permutation of {1,...,N}, arranged in decreasing order, then (11.15) says that N^{−1}(M_{N,1},M_{N,2},...) converges in distribution to the Poisson-Dirichlet(1) process.

Remarks. There is a large literature on random permutations; we mention only a few recent papers related to the discussion above. Vershik and Schmidt (1977) give an interesting "process" description of the Poisson-Dirichlet(1) limit of N^{−1}(M_{N,1},M_{N,2},...). Ignatov (1982) extends their ideas to general θ. Kerov and Vershik (1982) use the ideas of Theorem 11.14 in the analysis of some deeper structure of random permutations (e.g. Young's tableaux).
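The value θ = 1 here, i.e. P(1 and 2 in the same cycle) = 1/2, is easy to check by simulation. A minimal sketch (elements labelled 0,...,N−1; the parameters are arbitrary):

    import numpy as np

    rng = np.random.default_rng(5)

    def cycle_of(perm, i):
        # The cycle of the permutation containing i.
        c, j = [i], perm[i]
        while j != i:
            c.append(j)
            j = perm[j]
        return c

    N, trials, same = 1000, 20_000, 0
    for _ in range(trials):
        perm = rng.permutation(N)
        same += 1 in cycle_of(perm, 0)   # are elements 0 and 1 in one cycle?
    print(same / trials)                 # near 1/2, i.e. theta = 1 in (11.13)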
Components of random functions. A function f: {1,...,N} → {1,...,N} defines a directed graph with edges i → f(i), and thus induces a partition of {1,...,N} into the components of this graph. The partition can be described by saying that i and j are in the same component iff f^{(k)}(i) = f^{(m)}(j) for some k, m ≥ 1, where f^{(k)}(i) denotes the k-fold iteration f(f(⋯f(i)⋯)).

(Figure: the directed graph of a function, showing its partition into components.)

If we now let F_N be a random function, uniform over the set of all N^N possible functions, then we get a random partition R^N of {1,...,N} into the components of F_N. Clearly R^N is exchangeable. We shall outline how Theorem 11.14 can be used to get information about the asymptotic (as N → ∞) sizes of the components.

Remark. Many different questions can be asked about iterating random functions. For results and references see e.g. Knuth (1981), pp. 8, 518-520; Pavlov (1982); Pittel (1983). I do not know references to the results outlined below, though I presume they are known.

Given that a specified set B ⊂ {1,...,N} is a component of F_N, it is clear that F_N restricted to {1,...,N}\B is uniform on the set of functions from {1,...,N}\B into itself. Thus (R^N: N ≥ 1) satisfies the consistency condition (11.12). However, the consistency condition (11.5) is not satisfied exactly, but rather holds in an asymptotic sense. To make this precise, let R^{N,K} denote the restriction to {1,...,K} of R^N, K ≤ N.

(11.21) Lemma. R^{N,K} →_D R̂^K as N → ∞, K fixed, for some random partitions (R̂^K: K ≥ 1).

For each N the family (R^{N,K}: K ≤ N) is consistent in the sense of (11.5), and so Lemma 11.21 implies that (R̂^K: K ≥ 1) is consistent in that sense. It is then not hard to show

(11.22) Lemma. (R̂^K: K ≥ 1) is consistent in the sense (11.12).

Then Theorem 11.14 is applicable to (R̂^K). To identify θ, we need

(11.23) Lemma. P(1 and 2 in same component of R^N) → 2/3 as N → ∞.

For Lemma 11.21 then implies P(R̂^2 = {{1,2}}) = 2/3, and so (11.13) identifies θ = 1/2.

Now Theorem 11.14 gives information about (R̂^K). For instance, writing Ĉ_1^K for the component of R̂^K containing 1, (11.17) says

(11.24)  K^{−1} #Ĉ_1^K →_D T as K → ∞, where T has density f(t) = ½(1−t)^{−1/2}, 0 < t < 1.

What we want is the corresponding result for the components of the random function itself:

(11.25)  N^{−1} #C_1^N →_D T as N → ∞, where C_1^N is the component of F_N containing 1.

To pass from (11.24) to (11.25), use the following properties.

(a)  ‖L(R^{N,K}) − L(R̂^K)‖ → 0 as N → ∞, K fixed, where ‖·‖ is total variation distance (5.5).

By exchangeability, #(C_1^N ∩ {2,...,K}) is distributed as the sum of K−1 draws without replacement from an urn with #C_1^N − 1 "1"s and N − #C_1^N "0"s. Let V_{N,K} be the corresponding sum with replacement. As N → ∞, sampling with or without replacement become equivalent, so

(b)  ‖L(#(C_1^N ∩ {2,...,K})) − L(V_{N,K})‖ → 0 as N → ∞, K fixed.

Now V_{N,K} is conditionally Binomial(K−1, (N−1)^{−1}(#C_1^N − 1)) given #C_1^N, so by the weak law of large numbers

(c)  lim_K lim sup_N E|K^{−1} V_{N,K} − N^{−1} #C_1^N| = 0.

Properties (a)-(c) lead from (11.24) to (11.25). The same argument establishes the result corresponding to (11.15): if (M_1^N, M_2^N, ...) are the sizes of the components of F_N arranged in decreasing order, then N^{−1}(M_1^N, M_2^N, ...) converges in distribution to the Poisson-Dirichlet(1/2) process.

Let us outline the proofs of Lemmas 11.21 and 11.23. Let X_1 = 1, X_{n+1} = F_N(X_n), and S_1 = min{n: X_{n+1}(ω) ∈ {X_1(ω),...,X_n(ω)}}. Then (X_n) is i.i.d. uniform until time S_1, and we get the simple formula

P(S_1 ≥ n) = ∏_{i=1}^{n−1} (1 − i/N).

Of course this is just "the birthday problem". Calculus gives N^{−1/2} S_1 →_D Ŝ_1, where Ŝ_1 has density f(s) = s·exp(−s^2/2), s > 0.

Now let Y_1 = 2, Y_{n+1} = F_N(Y_n), S_2 = min{n: Y_{n+1}(ω) ∈ {X_1(ω),...,X_{S_1}(ω)} ∪ {Y_1(ω),...,Y_n(ω)}}, and let A_N be the event {Y_{S_2+1} ∈ {X_1,...,X_{S_1}}} = {1 and 2 in same component of F_N}. Again (Y_n) is i.i.d. uniform until time S_2, there is a simple formula for P(A_N, S_2 = n | S_1 = s), and calculus gives (A_N, N^{−1/2}S_1, N^{−1/2}S_2) →_D (A, Ŝ_1, Ŝ_2), where the limit has density

P(A, Ŝ_1 ∈ (s_1, s_1+ds_1), Ŝ_2 ∈ (s_2, s_2+ds_2)) = s_1^2 exp(−½(s_1+s_2)^2) ds_1 ds_2; s_1, s_2 > 0.

Integrating this density gives P(A) = 2/3, and this is Lemma 11.23; a similar analysis of the path (X_n: n ≤ S_1) leads to Lemma 11.21. Components of the usual random graphs behave quite differently, and (cf. (11.17)) the partition into components of these random graphs cannot be fitted into the framework of Theorem 11.14. It would be interesting to know which classes of random graphs have components following the Poisson-Dirichlet distribution predicted by Theorem 11.14.
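The constant 2/3 in Lemma 11.23 can likewise be checked by simulation; the sketch below decides whether two points lie in the same component of F_N by following their forward orbits (N and the trial count are arbitrary, and the estimate is only asymptotically 2/3):

    import numpy as np

    rng = np.random.default_rng(6)

    def same_component(f, i, j):
        # i and j are in the same component of the functional graph of f
        # iff their forward orbits meet; every orbit is absorbed into a
        # cycle within len(f) steps, so len(f) iterations suffice.
        n = len(f)
        orbit_i, x = set(), i
        for _ in range(n):
            orbit_i.add(x)
            x = f[x]
        x = j
        for _ in range(n):
            if x in orbit_i:
                return True
            x = f[x]
        return False

    N, trials = 400, 5_000
    hits = sum(same_component(rng.integers(0, N, size=N), 0, 1)
               for _ in range(trials))
    print(hits / trials)   # near 2/3, as in Lemma 11.23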
PART III

The class of exchangeable sequences can be viewed as the class of distributions invariant under certain transformations. So a natural generalization of exchangeability is to consider distributions invariant under families of transformations. Section 12 describes some abstract theory; Sections 13-16 some particular cases.

12. Abstract results

In the general setting of distributions invariant under a group of transformations, there is one classical result: that each invariant measure is a mixture of ergodic invariant measures. We shall discuss this result (Theorem 12.10) without giving detailed proofs; we then specialize to the "partial exchangeability" setting, and pose some hard general problems.

Until further notice, we work in the following general setting:

(12.1)  S is a Polish space; T is a countable group (under composition) of measurable maps T: S → S.

Call a random element X of S invariant if

(12.2)  T(X) =_D X; T ∈ T.

Call a distribution μ on S invariant if it is the distribution of an invariant random element, i.e. if

(12.3)  T̂(μ) = μ; T ∈ T,

where T̂ is the induced map (7.3). Let M denote the set of invariant distributions, and suppose M is non-empty. Call a subset A of S invariant if

(12.4)  T(A) = A; T ∈ T.

The family of invariant subsets forms the invariant σ-field J. Call an invariant distribution μ ergodic if

(12.5)  μ(A) = 0 or 1; each A ∈ J.

We quote two straightforward results:

(12.6)  If μ is invariant and if A is an invariant set with μ(A) > 0, then the conditional distribution μ(·|A) is invariant.

(12.7)  If μ is ergodic, and if a subset B of S is almost invariant, in the sense that T(B) = B μ-a.s. for each T ∈ T, then μ(B) = 0 or 1.

The two obvious examples (to a probabilist!) are the classes of stationary and of exchangeable sequences. To obtain stationarity, take S = R^Z and T = (T_k: k ∈ Z), where T_k is the "shift by k" map taking (x_i) to (x_{i+k}). Then a sequence X = (X_i: i ∈ Z) is stationary iff it is invariant under T, and the definitions of "invariant σ-field" and "ergodic" given above are just the definitions of these concepts for stationary sequences. To obtain exchangeability, take S = R^∞ and T = (T_π), where for a finite permutation π the map T_π takes (x_i) to (x_{π(i)}). Then a sequence X = (X_i) is exchangeable iff it is invariant under T. The invariant σ-field J here is the exchangeable σ-field of Section 3; the ergodic processes are those with trivial exchangeable σ-field, which by Corollary 3.10 are precisely the i.i.d. sequences.

Returning to the abstract setting, the set M of invariant distributions is convex:

(12.8)  μ_1, μ_2 ∈ M implies μ = cμ_1 + (1−c)μ_2 ∈ M; 0 ≤ c ≤ 1.

Call μ ∈ M extreme if μ cannot be written as such a combination with 0 < c < 1 and μ_1 ≠ μ_2.

(12.9) Lemma. An invariant distribution is ergodic iff it is extreme in M.

Proof sketch. If μ is invariant but not ergodic, there is an invariant set A with 0 < μ(A) < 1, and by (12.6) the representation μ = μ(A)μ(·|A) + μ(A^c)μ(·|A^c) shows μ is not extreme. Conversely, suppose μ is ergodic and μ = cμ_1 + (1−c)μ_2 with 0 < c < 1, μ_i ∈ M. Then μ_1 is absolutely continuous with respect to μ; let f = dμ_1/dμ. The sets {f > a}, a constant, are almost invariant in the sense of (12.7), which implies f is μ-a.s. constant. This implies μ_1 = μ_2 = μ, and hence μ is extreme.

Write E for the set of ergodic (= extreme) distributions. For an invariant random element X, write J_X for the σ-field of sets {X ∈ A}, A ∈ J. We can now state the abstract result; the reader is referred to Dynkin (1978), Theorems 3.1 and 6.1, for a proof (in somewhat different notation) and further results.

(12.10) Theorem. (a) E is a measurable subset of P(S).
(b) Let X be an invariant random element, and let α be a r.c.d. for X given J_X. Then α(ω) ∈ E a.s.
(c) To each invariant distribution μ there corresponds a distribution Λ_μ on E such that μ(·) = ∫_E θ(·) Λ_μ(dθ).
(d) The distribution Λ_μ in (c) is unique.

(12.11) Remarks. (i) Assertions (b) and (c) are different ways of saying that an invariant distribution is a mixture of ergodic distributions, corresponding to the "strong" and "weak" notions of mixture discussed in Section 2.

(ii) Dynkin (1978) proves the theorem directly. Maitra (1977) gives another direct proof. Parts (a), (c), (d) may alternatively be deduced from general results on the representation of elements of convex sets as means of distributions on the extreme points. See Choquet (1969), Theorem 31.3, for a version of Theorem 12.10 under somewhat different hypotheses; see also Phelps (1966).

(iii) In the usual case where T consists of continuous maps T, the set M is closed (in the weak topology). But E need not be closed; for instance, in the case of stationary {0,1}-valued sequences E is a dense G_δ.

(iv) It is easy to see informally why (b) should be true. Fix X and α. Property (12.6) implies that α is also a r.c.d. for T(X) given J_X, and so

T̂(α(ω)) = α(ω) a.s.; each T ∈ T.

Since T is countable, this implies α(ω) ∈ M a.s. Now for A ∈ J we have α(ω,A) = P(X ∈ A | J_X) = 1_{(X∈A)} a.s. Then for a sub-σ-field A of J generated by a countable sequence,

(12.11a)  A = σ(A_1,A_2,A_3,...) ⊂ J,

we have P(ω: α(ω,A) = 0 or 1 for each A ∈ A) = 1.
Unfortunately we cannot conclude

P(ω: α(ω,A) = 0 or 1 for each A ∈ J) = 1,

because in general the invariant σ-field J itself cannot be expressed in the form (12.11a); this technical difficulty forces proofs of Theorem 12.10 to take a less direct approach.

Proposition 3.8 generalizes to our abstract setting in the following way (implicit in the "sufficiency" discussion in Dynkin (1978)).

(12.12) Proposition. Let X, V be random elements of S, S' such that

(X,V) =_D (T(X),V) for each T ∈ T

(so in particular X is invariant). Then X and V are conditionally independent given J_X.

Proof. Consider first the special case where X is ergodic. If 0 < P(V ∈ B) < 1, then L(X) is a mixture of the conditional distributions P(X ∈ · | V ∈ B) and P(X ∈ · | V ∈ B^c), which are invariant by hypothesis; so by extremality L(X) = L(X | V ∈ B), and so X and V are independent.

For the general case, let M^2 be the class of distributions of pairs (X*,V*) on S×S' such that (X*,V*) =_D (T(X*),V*) for each T. Let ν be a r.c.d. for (X,V) given J_X. We assert

(12.13)  ν(ω) ∈ M^2 a.s.

Observe first that by the countability of T and (7.11), there exists a countable subset (h_i) of C(S×S') such that

(12.14)  θ ∈ M^2 iff ∫ h_i dθ = 0, i ≥ 1.

Next, for A ∈ J with P(X ∈ A) > 0, the hypothesis implies that the conditional distribution of (X,V) given {X ∈ A} is in M^2. Thus if ν_n is a r.c.d. for (X,V) given a finite sub-σ-field F_n of J_X, then ν_n(ω) ∈ M^2 a.s. Because J_X is a sub-σ-field of the separable σ-field σ(X), it is essentially separable, that is J_X = F_∞ a.s. for some separable F_∞. Taking finite σ-fields F_n ↑ F_∞, Lemma 7.14(b) says ν_n → ν a.s. Since ν_n ∈ M^2 a.s., (12.14) establishes (12.13).

For θ ∈ P(S×S') let θ_1 ∈ P(S) be the marginal distribution on S. Then ν_1(ω) is a r.c.d. for X given J_X, and so by Theorem 12.10, ν_1(ω) ∈ E a.s. But now for each ω we can use the result in the special case to show that ν(ω) is a product measure; this establishes the Proposition.

(12.15) Remarks. (i) In the case where T is a compact group (in particular, when T is finite), let T* be a random element of T with Haar distribution (i.e. uniform, when T is finite). Then for fixed s ∈ S the random element T*(s) is invariant; and it is not hard to show that the set of distributions of T*(s), as s varies, is the set of ergodic distributions. This is an abstract version of Lemma 5.4 for finite exchangeable sequences.

(ii) In the setting (12.1), call a distribution μ quasi-invariant if for each T ∈ T the distributions μ and T̂(μ) are mutually absolutely continuous. Much of the general theory extends to this setting: any quasi-invariant distribution is a mixture of ergodic quasi-invariant distributions. See e.g. Blum (1982); Brown and Dooley (1983).

So far we have been working under assumptions (12.1). We now specialize. Suppose

(12.16)  I is a countable set, T is a countable group (under composition) of maps γ: I → I, and S is a Polish space.

By a process X we mean a family (X_i: i ∈ I) with each X_i taking values in S. The process is invariant if

(12.17)  (X_{γ(i)}: i ∈ I) =_D (X_i: i ∈ I); each γ ∈ T.

To see that this is a particular case of (12.1), take S* = S^I, and take T to be the family (γ*: γ ∈ T) of maps S* → S*, where γ* maps (x_i) to (x_{γ(i)}). Note that γ* is continuous (in the product topology on S*), and so the class of invariant processes is closed under weak convergence. Obviously any exchangeable process X is invariant. We use the phrase partially exchangeable rather loosely to indicate the invariant processes when (I,T) is given. Our main interest is in

(12.18) The Characterization Problem.
Given (I,T), can we say explicitly what the partially exchangeable processes are?

Remarks. (a) To the reader who answers "the ones satisfying (12.17)" we offer the following analogy: the definition of a "simple" finite group tells us when a particular group is simple; the problem of saying explicitly what the finite simple groups are is harder!

(b) In view of Borel-isomorphism (Section 7), the nature of S is essentially unimportant for the characterization problem: one can assume S = [0,1].

Theorem 12.10 gives some information: any partially exchangeable process is a mixture of ergodic partially exchangeable processes. This is all that is presently known in general. But there seem possibilities for better results, as we now describe.

Suppose we are given a collection M_0 of invariant processes. How can we construct from these more invariant processes? In the general setting (12.1), the only way apparent is to take mixtures. Thus it is natural to ask
