Stochastic Processes Doob
…we obtain a new Riemann sum R'_δ, and we shall show that, for most values of t in a sense made precise below, R_δ is a good approximation to the integral of f if δ is small. To show this it is convenient to define

    M(t) = ∫_a^b |f(s + t) − f(s)| ds,

a function which goes to 0 with t (the continuity of translation in the space of integrable functions). Splitting the range of integration and applying this fact, one finds

    limsup_{δ→0} ∫_a^b |R_δ − ∫_a^b f(s) ds| dt ≤ 2ε(b − a).

Since ε can be taken arbitrarily near 0, (2.9) is true if R_δ is replaced by R'_δ, and therefore is true as written, in view of (2.8). The following theorem applies this result to stochastic processes.

THEOREM 2.8  Let {x_t, a ≤ t ≤ b} be a measurable process. Then to each δ > 0 corresponds a choice of the s_j's, say s_j = t_{j,δ}, with the property described in (2.10).

The first statement of the theorem is a trivial application of what we have proved. To prove the second statement, let u(t, ω) be the absolute value in (2.10). Then (2.10) implies that, if ε > 0, the Lebesgue measure of the t set where u > ε goes to 0 with δ, for almost all ω. It follows that the (t, ω) measure of the (t, ω) set where u(t, ω) > ε goes to 0, that is,

    lim_{δ→0} ∫_a^b P{u(t, ω) > ε} dt = 0.

Then, if δ is sufficiently small, P{u(t, ω) ≥ ε} is small for t outside a t set of small Lebesgue measure.

In what follows it is supposed that F' contains all subsets of its sets of probability 0. It will sometimes be necessary to suppose that X is a closed point set of the infinite line (or plane in the complex case), and in speaking of such a closed set we shall consider that the finite line has been closed at both ends by the two points −∞, +∞, and that the plane is the direct product of the coordinate axes, closed in this way. The problem we discuss is how to enlarge the Borel field F' to obtain a class F_Γ of measurable ω sets which will make the process separable and measurable. It has been shown that, if X is the finite line (and the proof is even applicable if X contains at least two points), if c is any real number, and if D is any non-denumerable parameter set, then the ω set {x̃_t(ω) ≤ c, t ∈ D} has probability 0 if it is in the class F'.
It follows that, if the process {x̃_t, a ≤ t ≤ b, X, F'} (with X containing at least two points) is separable, if {t_j} is a sequence of parameter values satisfying the conditions of the separability definition, and if I is an open interval of parameter values, then

    P{L.U.B._{t ∈ I} x̃_t(ω) = +∞} = P{L.U.B._{t_j ∈ I} x̃_{t_j}(ω) = +∞} = 1.

This may happen, and in fact the process under consideration will be separable for certain assignments of finite dimensional probability distributions, but this does not happen in the most interesting cases. It has been shown, if X is the finite line (and again the proof is applicable whenever X contains at least two points), that the process {x̃_t, a ≤ t ≤ b, X, F'} is not measurable for any assignment of finite dimensional probability distributions. These facts make clear the advisability of enlarging the field of measurable sets. The method of enlargement is described in the next paragraph.

Let Γ be an ω set of outer measure 1 relative to the given field F' of measurable sets; that is, we suppose that the relation Γ ⊂ M ∈ F' implies that P{M} = 1. Define the Borel field F_Γ as the class of all sets Λ of the form

    Λ = Λ₁Γ + Λ₂(Ω − Γ),  Λ₁, Λ₂ ∈ F',

and for each such Λ define P_Γ{Λ} = P{Λ₁}. It is easily shown that P_Γ{Λ} is uniquely defined in this way, and is a probability measure, that F' ⊂ F_Γ, and that

    P_Γ{Λ} = P{Λ},  Λ ∈ F'.

The stochastic process {x̃_t, t ∈ T} with F' replaced by F_Γ and P{·} by P_Γ{·} will be called a standard extension of the given one. The standard extension depends on the choice of Γ. Note that a standard extension of the x̃_t process still is of function space type. The only difference between it and the given process is that more classes of functions have been assigned probabilities. In this approach the counterpart of Theorem 2.4 is:

THEOREM 2.4'  Let {x̃_t, t ∈ T, X, F'} be a process of function space type, with X a closed set of the infinite line (or plane in the complex case).
There is then a standard extension of the process which is separable relative to the closed sets.

We omit the proof of this theorem. (It is essentially the same theorem as Theorem 2.4.) The problem of measurability is solved in an equally satisfactory manner. In fact the following theorem can be proved.

THEOREM 2.9  Let {x̃_t, t ∈ T, X, F'} be a process of function space type, with X a closed set of the infinite line (or plane in the complex case), and T a Lebesgue measurable set. There is a standard extension of the process which is separable and measurable if and only if there is a measurable standard modification.

The proof will be omitted. We conclude the discussion of processes of function space type by applying the results to a rather trivial example already considered in this and the previous section. Let {x̃_t, −∞ < t < ∞, X, F'} be a process of function space type, with F' as defined in the above discussion. Suppose that P{x̃_t(ω) = 0} = 1 for each t. If the range X of the function values is the single point 0, the function space Ω contains only a single function, the identically vanishing one. In this case

    P{x̃_t(ω) = 0, −∞ < t < ∞} = 1,

and the x̃_t process is separable and measurable. If X contains a second point, the x̃_t process is neither separable nor measurable, and the probability P{x̃_t(ω) = 0, −∞ < t < ∞} is not defined. This probability will be defined, and will be 1, for every separable standard extension of the process. It is easy to see that the ω set Γ in terms of which the standard extension is defined can be taken as the set containing only the identically vanishing function. This extension makes the process separable. Every separable standard extension is measurable, by Theorem 2.5. On the other hand, a second standard extension can be defined in which the set Γ just used is replaced by its complement. With this definition P_Γ{x̃_t(ω) = 0, −∞ < t < ∞} = 0.

3. Gaussian processes

A process {x_t, t ∈ T} is called Gaussian if its finite dimensional distributions are all Gaussian. The following theorem describes the admissible means and covariances.

THEOREM 3.1  Let μ(·) be any function defined on T, and let r(·, ·) be a function defined on T × T with the properties: (a) r(s, t) = r̄(t, s); (b) if t₁ < ⋯ < t_N is any finite T set, the matrix [r(t_m, t_n)] is non-negative definite.
There is then a Gaussian stochastic process {x_t, t ∈ T} for which

(3.1)    E{x_t} = μ(t),  E{x_s x̄_t} − μ(s) μ̄(t) = r(s, t).

If the functions μ(·) and r(·, ·) are real, there is a real Gaussian process satisfying (3.1). In any case there is a complex Gaussian process satisfying (3.1) and also

(3.2)    E{x_t²} = μ(t)².

Suppose first that μ(t) is real and that r(s, t) is real and satisfies (a) and (b) of the theorem. If t₁ < ⋯ < t_N is any finite t set there is then an N-variate Gaussian distribution with mean values μ(t₁), ⋯, μ(t_N) and covariance matrix [r(t_m, t_n)]. This is the distribution function with characteristic function

(3.3)    exp[ i Σ_{m=1}^N μ(t_m) λ_m − ½ Σ_{m,n=1}^N r(t_m, t_n) λ_m λ_n ].

If the matrix [r(t_m, t_n)] is non-singular, this distribution has density

(3.4)    |a_{mn}|^{1/2} (2π)^{−N/2} exp[ −½ Σ_{m,n=1}^N a_{mn} (ξ_m − μ(t_m))(ξ_n − μ(t_n)) ],

where the matrix [a_{mn}], with determinant |a_{mn}|, is the inverse of the matrix [r(t_m, t_n)]. Moreover, if M < N the distribution defined by the characteristic function (3.3) assigns to x_{t_1}, ⋯, x_{t_M} a Gaussian distribution with means μ(t₁), ⋯, μ(t_M) and the covariance matrix [r(t_m, t_n)] with m, n ≤ M, that is, the same distribution as that assigned to x_{t_1}, ⋯, x_{t_M} directly. Consequently, the consistency conditions of Kolmogorov (I §5) are satisfied, and there is a real Gaussian process satisfying (3.1), with ω space a certain coordinate space. The distributions involved are determined uniquely by the assigned means and covariances. We complete the proof by dropping the hypothesis that the process is necessarily real, and defining a Gaussian process which satisfies (3.1) and (3.2). The process is obviously uniquely determined by these conditions, or, more exactly, the finite dimensional distributions involved are uniquely determined by these conditions. Note that according to these conditions

    E{[x_t − μ(t)]²} = 0.
The process will therefore only be real in trivial cases, in fact only when μ(t) is real and r(t, t) ≡ 0, in which case P{x_t(ω) = μ(t)} = 1. We shall define real random variables ξ_t, η_t to satisfy†

(3.5)    E{ξ_t} = ℜ{μ(t)},  E{η_t} = ℑ{μ(t)},
         E{ξ_s ξ_t} − E{ξ_s}E{ξ_t} = E{η_s η_t} − E{η_s}E{η_t} = ½ ℜ{r(s, t)},
         E{ξ_s η_t} − E{ξ_s}E{η_t} = −½ ℑ{r(s, t)}.

The relations (3.5) imply (3.1) and (3.2) if x_t = ξ_t + iη_t. To show that the ξ_t, η_t process really exists, we apply the criterion for real processes already proved. That is, we show that the covariance matrix of any finite number of ξ_t's and η_t's, as defined by (3.5), is symmetric and non-negative definite. It is no restriction to take corresponding pairs of ξ_t's and η_t's, that is, we take

(3.6)    ξ_{t_1}, ⋯, ξ_{t_N}, η_{t_1}, ⋯, η_{t_N},

and investigate the corresponding 2N-dimensional covariance matrix [ρ_{mn}] defined by (3.5). The matrix is obviously symmetric. Moreover, if λ₁, ⋯, λ_{2N} are real,

    Σ_{m,n=1}^{2N} ρ_{mn} λ_m λ_n = ½ Σ_{m,n=1}^{N} ℜ{r(t_m, t_n)} (λ_m λ_n + λ_{N+m} λ_{N+n}) + ½ Σ_{m,n=1}^{N} ℑ{r(t_m, t_n)} (λ_{N+m} λ_n − λ_m λ_{N+n})
    = ½ Σ_{m,n=1}^{N} r(t_m, t_n) (λ_m − iλ_{N+m})(λ_n + iλ_{N+n}) ≥ 0,

since the matrix [r(t_m, t_n)] is non-negative definite. Hence the matrix [ρ_{mn}] is non-negative definite, as was to be proved.

† Throughout this book, ℜ and ℑ will be used to denote "real part" and "imaginary part," respectively.

This theorem can be used to obtain Gaussian processes intimately related to given processes, as far as first and second moments are concerned. For example, according to the following theorem, if a y_t process is given, a corresponding x_t Gaussian process exists with the same means and covariances. This means that in a discussion involving only means and covariances of the y_t's, it is no restriction to assume that the y_t process is Gaussian. If the y_t process is real, the corresponding x_t process can also be supposed real. This means that, whenever two y_t's are uncorrelated, the corresponding x_t's are independent.
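Theorem 3.1 can be exercised numerically. The sketch below (Python with NumPy; the exponential kernel and linear mean function are hypothetical choices, not taken from the text) builds the matrix [r(t_m, t_n)] for a finite parameter set, checks condition (b), and samples the resulting N-variate Gaussian distribution, whose empirical moments reproduce the assigned μ and r:

```python
import numpy as np

def mu(t):
    return 0.5 * t                    # hypothetical mean function

def r(s, t):
    return np.exp(-abs(s - t))        # hypothetical non-negative definite kernel

ts = np.array([0.0, 0.5, 1.0, 2.0])   # a finite parameter set t_1 < ... < t_N
mean = np.array([mu(t) for t in ts])
cov = np.array([[r(s, t) for t in ts] for s in ts])

# (b) of the theorem: [r(t_m, t_n)] must be non-negative definite
eigvals = np.linalg.eigvalsh(cov)
assert eigvals.min() > -1e-12

# Draw from the N-variate Gaussian law with these means and covariances;
# consistency under deletion of coordinates (Kolmogorov) is automatic.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=200_000)
assert np.allclose(samples.mean(axis=0), mean, atol=0.02)   # empirical means
assert np.allclose(np.cov(samples.T), cov, atol=0.02)       # empirical covariances
```

Deleting a coordinate of `ts` leaves the remaining rows and columns of `cov` unchanged, which is exactly the consistency property used in the proof.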
Whether the y_t's are real or not, the x_t process can be chosen to satisfy (3.2), and uncorrelated y_t's will then correspond to independent x_t's. If the y_t process is not real, the x_t process is not real either, and the assignment of the x_t means and covariances does not determine the x_t distributions. One token of this fact is that it is possible to superimpose the arbitrary condition (3.2).

THEOREM 3.2  Let {y_t, t ∈ T} be any stochastic process, with E{|y_t|²} < ∞, t ∈ T. The parameter set T may be any finite or infinite aggregate. There is then a corresponding Gaussian process, with the same range of the parameter, but defined on a different measure space, whose random variables {x_t} satisfy the equations

(3.7)    E{x_t} = E{y_t},  E{x_s x̄_t} = E{y_s ȳ_t},  s, t ∈ T.

If the y_t process is real, there is a real x_t process satisfying (3.7). In any case there is a complex Gaussian process satisfying (3.7) and the further condition

(3.8)    E{x_s x_t} = E{x_s} E{x_t},  s, t ∈ T.

If the y_t and x_t processes are both real and satisfy (3.7), or if both (3.7) and (3.8) are satisfied, the orthogonality of two y_t's† implies independence of the corresponding x_t's. If y_t is replaced by y_t − E{y_t} in the above, zero correlation of two y_t's implies independence of the corresponding x_t's. More generally, if v₁, v₂ are any two random variables in the closed linear manifold of the y_t's and if u₁, u₂ are the corresponding random variables in the closed linear manifold of the x_t's, then (3.7) (for all y_s, y_t) implies

(3.7')   E{u₁} = E{v₁},  E{u₁ ū₂} = E{v₁ v̄₂},

and (3.8) (for all x_s, x_t) implies

(3.8')   E{u₁ u₂} = E{u₁} E{u₂}.

Applying this theorem to the real and imaginary parts of the random variables of a given complex process, it is clear that there is a complex Gaussian process whose real and imaginary parts have the same covariance relations as the real and imaginary parts of the given process. Then (3.7) is certainly satisfied, and also

(3.9)    E{x_s x_t} = E{y_s y_t}.

† The random variables u, v are said to be orthogonal if E{u v̄} = 0.
The exact covariance correspondence afforded by (3.9) in conjunction with (3.7) is not useful, however, and destroys the correspondence between orthogonality and independence.

The proof of Theorem 3.2 is accomplished simply by noting that, if r(s, t) = E{y_s ȳ_t} − E{y_s}E{ȳ_t}, then r(s, t) = r̄(t, s), and, if λ₁, ⋯, λ_N are any complex numbers,

    Σ_{m,n=1}^N r(t_m, t_n) λ_m λ̄_n = E{ |Σ_{m=1}^N λ_m (y_{t_m} − E{y_{t_m}})|² } ≥ 0,

so that the hypotheses of Theorem 3.1 are certainly satisfied, if we take μ(t) = E{y_t}. The extension of the previous theorem involved in (3.7') and (3.8') is trivial, since the latter relations are obvious when v₁ and v₂ are finite linear combinations of the y_t's; the general case is proved by going to the limit. The theorem states in abstract language that there is a linear transformation taking the closed linear manifold of the y_t's† into that of the x_t's, taking y_t into x_t for each t, which leaves the inner product E{u v̄} invariant. (The transformation is unitary.)

One of the reasons why Gaussian processes are important is the simplification the Gaussian hypothesis brings to the theory of least squares approximation. Let the random variables x, y have a bivariate Gaussian distribution, with zero expectations. We suppose that x and y are either real or, if not, satisfy the relations E{x²} = E{xy} = 0. Then the difference

(3.10)   y − ax,  a = E{y x̄} / E{|x|²},

has zero expectation and is orthogonal to, and uncorrelated with, and therefore independent of x. It follows that the conditional distribution of y for given x is Gaussian, with expectation ax and variance

(3.11)   E{|y − ax|²} = E{|y|²} [ 1 − |E{y x̄}|² / (E{|x|²} E{|y|²}) ].

The bracket is between 0 and 1 by Schwarz's inequality.

† The closed linear manifold determined by a set of random variables is the collection of random variables which are finite linear combinations of the given random variables or limits in the mean of sequences of such linear combinations.
More generally, if the random variables x₁, ⋯, x_n, y have a multivariate Gaussian distribution with zero expectations, and if the variables are either real or, if not, satisfy the relations

    E{x_j x_k} = E{x_k y} = 0,  j, k = 1, ⋯, n,

the difference

(3.10')  y − Σ_{k=1}^n a_k x_k

has zero expectation and is uncorrelated with, and therefore independent of, x₁, ⋯, x_n for properly chosen a₁, ⋯, a_n (see IV §3). It follows that the conditional distribution of y for given x₁, ⋯, x_n is Gaussian, with

(3.12)   E{y | x₁, ⋯, x_n} = Σ_{k=1}^n a_k x_k.

The difference in (3.10') is independent of, and therefore orthogonal to, every function f measurable on the sample space of x₁, ⋯, x_n (in particular to every Baire function of these variables) whose square is integrable. Hence

(3.13)   E{|y − f|²} = E{|(y − Σ a_k x_k) + (Σ a_k x_k − f)|²} = E{|y − Σ a_k x_k|²} + E{|Σ a_k x_k − f|²}.

This equality shows that the problem of minimizing the left side of (3.13), the problem of least squares approximation, is solved by setting f equal to the conditional expectation in (3.12), and that this solution is uniquely determined aside from the usual ambiguity in the definition of a conditional expectation. The Gaussian character of the distributions made it possible to infer independence from zero correlation. If we only suppose that x₁, ⋯, x_n, y are any random variables with

    E{|x_k|²} < ∞,  E{|y|²} < ∞,

the difference (3.10'), with the a_k's calculated in the same way, is still orthogonal to x₁, ⋯, x_n, from which it follows only that (3.13) is true if f is a linear combination of the x_k's. Thus the random variable Σ a_k x_k, which we shall denote by Ê{y | x₁, ⋯, x_n}, solves the problem of minimizing the left side of (3.13) for all linear combinations f of the x_k's, the problem of linear least squares approximation. It will be necessary to extend the above discussion to the case of infinitely many conditioning variables in later chapters.
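The coefficients a_k of (3.10') are determined by the normal equations Σ_k a_k E{x_k x̄_j} = E{y x̄_j}. A short numerical sketch (the joint distribution below is a hypothetical example, with moments estimated by sample averages) solves them and checks the two facts used above: the residual is orthogonal to each x_k, and Ê{y | x₁, x₂} recovers the true linear dependence:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x = rng.standard_normal((n, 2))                  # x_1, x_2
y = 2.0 * x[:, 0] - 1.0 * x[:, 1] + rng.standard_normal(n)

G = x.T @ x / n                                  # sample version of [E{x_j x_k}]
b = x.T @ y / n                                  # sample version of [E{y x_k}]
a = np.linalg.solve(G, b)                        # the a_k of (3.10')

resid = y - x @ a
assert np.allclose(resid @ x / n, 0.0, atol=1e-8)   # residual orthogonal to each x_k
assert np.allclose(a, [2.0, -1.0], atol=0.02)       # recovers the true coefficients
```

In the Gaussian case the same a_k give the full conditional expectation (3.12); without the Gaussian hypothesis they give only the best linear approximation.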
Many concepts which are used in the theory of stochastic processes can be formulated in two ways, in a "strict sense" or in a "wide sense." The general principle is the following: Suppose a y_t process has a certain property P expressed in terms of variances and covariances. Suppose the corresponding complex Gaussian process given by Theorem 3.2, satisfying (3.7) and (3.8), has the corresponding but stronger property P'. Then P' is called a strict sense property and P a wide sense property. (If the y_t process is real, the corresponding Gaussian process can be taken as the uniquely defined corresponding real Gaussian process given by Theorem 3.2, satisfying (3.7).) The point is that a wide sense property implies much more when it is also supposed that the underlying distributions are Gaussian. We shall see that a great many theorems have strict sense and wide sense versions, and that the distinction serves to organize and clarify the theory of stochastic processes.

Example 1  Let y₁, y₂, ⋯ be mutually orthogonal random variables. We take this as a wide sense characterization, and find the corresponding strict sense characterization by noting that, if an x_j process is determined as in Theorem 3.2, the x_j's will be mutually independent. Thus a process with mutually orthogonal random variables (or mutually uncorrelated random variables, if y_j − E{y_j} is considered instead of y_j) is a process of independent random variables in the wide sense.

Example 2  The least squares analysis made above shows that the best linear least squares approximation Ê{y | x₁, ⋯, x_n} is the wide sense version of the best least squares approximation. We carry the analysis further by evaluating the best least squares approximation in the non-Gaussian case. Suppose then that x₁, ⋯, x_n, y are any random variables with E{|y|²} < ∞, and define

    y₀ = E{y | x₁, ⋯, x_n}.

Then

    E{|y₀|²} ≤ E{E{|y|² | x₁, ⋯, x_n}} = E{|y|²},
and it follows from I, Theorem 8.3, that y − y₀ is orthogonal to every function f which is measurable on the sample space of x₁, ⋯, x_n and whose square is integrable. Hence

(3.13')  E{|y − f|²} = E{|(y − y₀) + (y₀ − f)|²} = E{|y − y₀|²} + E{|y₀ − f|²},

so that the problem of minimizing the left side of (3.13') is solved by setting f = y₀. Thus the strict sense version of Ê{y | x₁, ⋯, x_n} is E{y | x₁, ⋯, x_n}. The latter symbol is of course defined for any family of approximating random variables, finite or infinite, and solves the corresponding least squares approximation problem, since (3.13') needs no change in the general case. The symbol Ê{y | ⋯} will be defined for infinite families of approximating random variables in IV. It obeys the same rules of combination as the symbol E{y | ⋯}, since the two become the same in the real Gaussian case (and in the complex Gaussian case if the random variables all satisfy the condition that the expected value of the product of any pair vanishes).

Aside from theorems which are strict sense versions of wide sense theorems, very few facts specifically true of Gaussian processes are known. For this reason no later chapter will be devoted to Gaussian processes as such, although many types of Gaussian processes will be treated.

4. Processes with mutually independent random variables

These processes are discussed only in the discrete parameter case, since the sample functions in the continuous parameter case are too irregular to arise in practice. The processes under discussion are thus simply sequences of mutually independent random variables. Most studies of the "foundations of probability," which commonly identify the mathematical with the philosophical foundations, to the detriment of the former, have as their purpose the study of a sequence of mutually independent random variables with a common distribution function.
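The basic computational object of this section is the convolution: the distribution of a sum of mutually independent random variables is the convolution of their distributions. In the discrete case this is an exact finite computation; two fair dice are a hypothetical example, not one from the text:

```python
import numpy as np

die = np.full(6, 1.0 / 6.0)            # P{x_j = k} = 1/6 for k = 1, ..., 6
pmf = np.convolve(die, die)            # distribution of s_2 = x_1 + x_2 on 2, ..., 12
assert abs(pmf.sum() - 1.0) < 1e-12    # still a probability distribution
assert abs(pmf[7 - 2] - 6.0 / 36.0) < 1e-12   # P{s_2 = 7} = 6/36

pmf3 = np.convolve(pmf, die)           # iterating: distribution of s_3 on 3, ..., 18
assert abs(pmf3.sum() - 1.0) < 1e-12
```

Iterating `np.convolve` in this way is the discrete counterpart of the iterated convolution formula (4.1) below.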
We have already remarked that, if F₁, F₂, ⋯ is any sequence of distribution functions, there is a sequence of mutually independent random variables x₁, x₂, ⋯ with those distribution functions. Because of the independence hypothesis, the joint distribution functions of the x_j's are simply products of the F_j's. It is quite feasible to study these processes without mentioning the word probability. For example, if s_n = x₁ + ⋯ + x_n, the distribution function of s_n is the convolution of those of x₁, ⋯, x_n, that is,

(4.1)    P{s_n(ω) ≤ λ} = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} F_n(λ − ξ₁ − ⋯ − ξ_{n−1}) dF₁(ξ₁) ⋯ dF_{n−1}(ξ_{n−1}).

The study of sums of mutually independent random variables can thus be carried on as a study of iterated convolutions, with no mention of probability. To prove (4.1) we observe that, if x₁, ⋯, x_n are the coordinate functions of the n-dimensional space of points (ξ₁, ⋯, ξ_n), the probabilities of ω sets are probabilities of (ξ₁, ⋯, ξ_n) sets, with

    P{Λ} = ∫ ⋯ ∫_Λ dF₁(ξ₁) ⋯ dF_n(ξ_n).

Then (4.1) is simply the evaluation by iterated integration of P{Λ} for Λ defined by the inequality Σ_{j=1}^n ξ_j ≤ λ. Finally, using the representation theory of I §6, it is no restriction in proving (4.1) to assume that the x_j's are the stated coordinate functions. The proof given also proves the following slightly more general equation, which we shall have occasion to use:

(4.2)    P{x_j(ω) ≤ λ_j, j = 1, ⋯, n − 1; s_n(ω) ≤ λ} = ∫_{−∞}^{λ_1} ⋯ ∫_{−∞}^{λ_{n−1}} F_n(λ − ξ₁ − ⋯ − ξ_{n−1}) dF₁(ξ₁) ⋯ dF_{n−1}(ξ_{n−1}).

5. Processes with uncorrelated or orthogonal random variables

A process with uncorrelated random variables is one whose random variables are uncorrelated in pairs; that is, it is supposed that

(5.1)    E{|x_t|²} < ∞

and that

(5.2)    E{x_s x̄_t} = E{x_s} E{x̄_t},  s ≠ t.

A closely related type of process is a process with orthogonal random variables, that is, one for which (5.1) is true and for which

(5.3)    E{x_s x̄_t} = 0,  s ≠ t.

If an x_t process is a process with uncorrelated random variables, the y_t process determined by y_t = x_t − E{x_t} is a process with orthogonal random variables.
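Conditions (5.2) and (5.3) can be checked numerically on the trigonometric family discussed next in the text, using the exact discrete orthogonality of Fourier modes on a uniform grid over a full period:

```python
import numpy as np

# x uniform on (0, 2*pi), approximated by a uniform midpoint grid
N = 100_000
x = 2.0 * np.pi * (np.arange(N) + 0.5) / N

def E(f):                                  # expectation w.r.t. the uniform law
    return f.mean()

m, n = 1, 2
assert abs(E(np.sin(m * x))) < 1e-10                     # zero means
assert abs(E(np.sin(m * x) * np.sin(n * x))) < 1e-10     # orthogonal, as in (5.3)

u, v = m + np.sin(m * x), n + np.sin(n * x)              # shifted family
cov = E(u * v) - E(u) * E(v)
assert abs(cov) < 1e-10                                  # uncorrelated, as in (5.2)
assert abs(E(u * v)) > 1.0                               # but not orthogonal
```

Centering u and v (subtracting their means) restores orthogonality, which is the passage from (5.2) to (5.3) described above.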
Thus, if x is a real random variable, uniformly distributed on the interval (0, 2π), the random variables sin x, sin 2x, ⋯ constitute both a process with uncorrelated random variables and a process with orthogonal random variables (with zero means). On the other hand, the random variables 1 + sin x, 2 + sin 2x, ⋯ constitute a process with uncorrelated random variables but not one with orthogonal random variables.

As already remarked in the Introduction, the theory of probability is but one aspect of the theory of measure. This is particularly clearly exhibited at the present stage. The theory of orthogonal functions and series has occupied mathematicians for over 100 years, but has never been considered a part of probability. From the present point of view, a sequence of orthogonal functions is a sequence of orthogonal random variables, and theorems like the measure theoretic Riesz-Fischer theorem become probability theorems. The probability approach, however, imposes the rather pointless condition that the basic measure space have total measure 1. Up to the present time this condition has been imposed by the physical interpretation of probability, although many of the ideas and definitions used in probability theory do not intrinsically presuppose the condition. Certain physical situations (such as a free particle in quantum mechanics which is equidistributed throughout space) even suggest the usefulness of a probability measure with a value +∞ for the whole space. For these reasons, and because of later applications in this book, the hypothesis of finiteness of measure is dropped in IV, where orthogonal series are discussed, and to avoid confusion the customary language of measure theory is used. As already noted in §3, the processes of this section are the wide sense versions of processes with independent random variables. Although this terminology is not ordinarily used, the point of view will prove illuminating.

6.
Markov processes

(a) Strict sense. In the following we shall consider only real processes; the changes to be made in going to the complex case will be obvious. A (strict sense) Markov process is a process {x_t, t ∈ T} satisfying the following condition: for any integer n ≥ 1, if t₁ < ⋯ < t_n are parameter values, the conditional x_{t_n} probabilities relative to x_{t_1}, ⋯, x_{t_{n−1}} are the same as those relative to x_{t_{n−1}}, in the sense that, for each λ,

(6.1)    P{x_{t_n}(ω) ≤ λ | x_{t_1}, ⋯, x_{t_{n−1}}} = P{x_{t_n}(ω) ≤ λ | x_{t_{n−1}}}

with probability 1. Then, if T₁ ⊂ T, the process {x_t, t ∈ T₁} is also a Markov process. When a process is called a Markov process, "strict sense" is always understood. Condition (6.1) is at first glance weaker than

(6.1')   P{x_{t_n}(ω) ≤ λ | x_{t_1}, ⋯, x_{t_{n−1}}} = Φ

(to hold with probability 1), where the random variable Φ on the right is a Baire function of x_{t_{n−1}}, or more generally is measurable on the sample space of x_{t_{n−1}}. Actually, however, Φ is then necessarily given by the right side of (6.1). In fact, if the operation E{— | x_{t_{n−1}}} is performed on both sides of (6.1'), the right side is unchanged, and the left side becomes the right side of (6.1), by I (10.9). Because of the equivalence of (6.1) and (6.1'), a Markov process is sometimes loosely described as one in which the conditional probability on the left in (6.1) depends only on x_{t_{n−1}}(ω). The condition (6.1) is also equivalent to the condition that, if s₁ ≤ s₂ and if λ is arbitrary, then

(6.2)    P{x_{s_2}(ω) ≤ λ | x_t, t ≤ s₁} = P{x_{s_2}(ω) ≤ λ | x_{s_1}}

with probability 1. More generally, if y is a random variable measurable on the sample space of the x_t's with t > s, and if E{|y|} < ∞, then

(6.3)    E{y | x_t, t ≤ s} = E{y | x_s}

with probability 1. This equality contains (6.2) as a special case, and thus, as we have seen, implies (6.1). We shall call the validity of (6.3) the Markov property. The Markov property will be derived first under the assumption that T contains only finitely many points ≥ s. Let these points be s = u₀ < u₁ < ⋯. It is sufficient to prove that, for every m and all real λ₁, ⋯, λ_m,

(6.4)    P{x_{u_1}(ω) ≤ λ₁, ⋯, x_{u_m}(ω) ≤ λ_m | x_t, t ≤ s} = P{x_{u_1}(ω) ≤ λ₁, ⋯, x_{u_m}(ω) ≤ λ_m | x_s}

with probability 1. This is true for m = 1, using (6.2). We suppose it true for a given m ≥ 1. We then show that it is true with probability 1 for m replaced by m + 1.
Let λ₁, ⋯, λ_{m+1} be any real numbers, and define y, z by

    y(ω) = 1 if x_{u_j}(ω) ≤ λ_j, j = 1, ⋯, m;  y(ω) = 0 otherwise,
    z(ω) = 1 if x_{u_{m+1}}(ω) ≤ λ_{m+1};  z(ω) = 0 otherwise.

Then, applying the induction hypothesis to the x_t process, and also to this process with all parameter points < u_m deleted except the point s = u₀,

(6.5)    E{z | x_t, t ≤ u_m} = E{z | x_{u_m}} = E{z | x_s, x_{u_m}}

with probability 1. Now

(6.6)    E{yz | x_t, t ≤ s} = E{y E{z | x_t, t ≤ u_m} | x_t, t ≤ s} = E{y E{z | x_{u_m}} | x_t, t ≤ s},

and, since y E{z | x_{u_m}} is a Baire function of x_{u_1}, ⋯, x_{u_m}, the induction hypothesis shows that the last conditional expectation is unchanged when the conditioning is reduced to x_s alone; retracing the steps, E{yz | x_t, t ≤ s} = E{yz | x_s} with probability 1, which is (6.4) for m + 1. Then the result just obtained implies that (6.3) is true with probability 1 if y is the indicator function of a set M measurable on the sample space of some finite aggregate of x_t's with t > s. It then follows (I, Theorem 9.6) that (6.3) is true with probability 1 if E{|y|} < ∞ and if y is equal with probability 1 to an ω function measurable with respect to the Borel field of ω sets generated by the class of such sets M, that is, if y is measurable on the sample space of the x_t's with t > s, as was to be proved.

The definition of a Markov process given above is one-sided in t. An alternative definition, stated in the form of a necessary and sufficient condition, is the following, which, because it is two-sided, shows that, if the process {x_t, t ∈ T} is a Markov process, that with variables {x_{−t}} is also a Markov process. A process is a Markov process if and only if, for any positive integers m, n and real numbers λ₁, ⋯, λ_m, μ₁, ⋯, μ_n, whenever s₁ < ⋯ < s_m < t < t₁ < ⋯ < t_n are parameter values,

(6.8)    P{x_{s_j}(ω) ≤ λ_j, j = 1, ⋯, m; x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n | x_t}
         = P{x_{s_j}(ω) ≤ λ_j, j = 1, ⋯, m | x_t} P{x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n | x_t}

with probability 1. In the time interpretation, with t the present, this states that the past and the future are conditionally independent of each other; that is, for x_t(ω) (the present) known, the past and future are independent of each other. However, the ambiguity in the definition of conditional probabilities makes the interpretation of such statements rather delicate. The following is an intuitive and suggestive proof of the equivalence of (6.8) and the Markov property, in the spirit of the time interpretation of (6.8). A rigorous proof will also be given.
For given x_t(ω), (6.8) states that a certain past event and a certain future event are mutually independent, in view of the multiplying of the corresponding probabilities (all conditioned by the knowledge of the present). An alternative condition for independence is that certain conditional probabilities do not actually involve the conditioning variables; specifically, if Pᵗ denotes probabilities conditioned by x_t, this version of the independence condition becomes

(6.8')   Pᵗ{x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n | x_{s_1}, ⋯, x_{s_m}} = Pᵗ{x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n}

with Pᵗ probability 1. We have seen in I §10 that this equation can be written in the form

(6.8'')  P{x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n | x_{s_1}, ⋯, x_{s_m}, x_t} = P{x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n | x_t}

with probability 1, and we have seen in I §7 that equality in (6.12) for the sets M considered there is sufficient to identify the integrand on the right with P{x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n | x_{s_1}, ⋯, x_{s_m}, x_t}, that is,

    P{x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n | x_{s_1}, ⋯, x_{s_m}, x_t} = P{x_{t_k}(ω) ≤ μ_k, k = 1, ⋯, n | x_t}

with probability 1. This is (6.1) in a slightly different notation.

Example 1  Let y₁, y₂, ⋯ be mutually independent random variables. Then the process {y_n, n ≥ 1} is a Markov process, and in fact P{y_n(ω) ≤ λ} is one version of both P{y_n(ω) ≤ λ | y₁, ⋯, y_{n−1}} and P{y_n(ω) ≤ λ | y_{n−1}}.

Example 2  Let y₁, y₂, ⋯ be mutually independent random variables with distribution functions F₁, F₂, ⋯, and let s_n = y₁ + ⋯ + y_n. Then the process {s_n, n ≥ 1} is a Markov process, and in fact F_n(λ − s_{n−1}(ω)) is one version of both P{s_n(ω) ≤ λ | s₁, ⋯, s_{n−1}} and P{s_n(ω) ≤ λ | s_{n−1}}. This is intuitively clear, but it may be instructive to give a formal proof. We must show that, if Λ is an ω set measurable on the sample space of s₁, ⋯, s_s or, what amounts to the same thing, measurable on the sample space of y₁, ⋯, y_s (where s = n − 1), then

(6.13)   ∫_Λ F_{s+1}(λ − s_s) dP = P{ω ∈ Λ, s_{s+1}(ω) ≤ λ}.

To derive this equation it will be convenient to assume that the s + 1 random variables y₁, ⋯, y_s, Σ_{j=1}^{s+1} y_j are the coordinate functions in the (s + 1)-dimensional space of points (η₁, ⋯, η_{s+1}), with the measure determined by the joint distribution of these variables. According to the representation theory of I §6 it is no restriction to make this assumption. Then, translating (6.13), it is sufficient to prove that

    ∫ ⋯ ∫_A F_{s+1}(λ − η₁ − ⋯ − η_s) dF₁(η₁) ⋯ dF_s(η_s) = P{[y₁(ω), ⋯, y_s(ω)] ∈ A, s_{s+1}(ω) ≤ λ}

for A an s-dimensional Borel set in (η₁, ⋯, η_s)
space. It is sufficient to prove the equality for A an s-dimensional interval determined by inequalities −∞ < η_j ≤ λ_j, j = 1, ⋯, s, and the equality then becomes a special case of (4.2).

Example 3  Let P₁, P₂, ⋯ be functions of ξ, A, where ξ is a real number and A a linear Borel set, with the following properties: (i) P_n(ξ, A) defines a probability measure of A for fixed ξ; (ii) P_n(ξ, A) defines a Baire function of ξ for fixed A. Let P be a probability measure of linear Borel sets. Then there is a Markov process {x_n, n ≥ 1} such that P_n(x_n, A) is one version of P{x_{n+1}(ω) ∈ A | x_n} and that P{x₁(ω) ∈ A} = P(A). To see this we observe first that, for every m, with the obvious conventions,

    ∫ P(dξ₁) ∫ P₁(ξ₁, dξ₂) ⋯ ∫ P_{m−1}(ξ_{m−1}, dξ_m),

with the integrations restricted to A_m, defines a probability measure of m-dimensional Borel sets A_m in the space of points (ξ₁, ⋯, ξ_m). We define a sequence of random variables {x_n, n ≥ 1} such that the distribution of x₁, ⋯, x_m is given by the above multiple integral. This is possible, with the x_n's the coordinate functions of infinite dimensional space, if the above measures are mutually consistent (see I §5), and it is trivial to verify that they are. It is then also trivial to verify that the x_n process obtained in this way is a Markov process, with

    P{x_{n+1}(ω) ∈ A | x₁, ⋯, x_n} = P_n(x_n, A)

with probability 1. In particular, if P and the P_n's are given by densities, so that

    P(A) = ∫_A p(η) dη,  P_n(ξ, A) = ∫_A p_n(ξ, η) dη,

the distribution of x₁, ⋯, x_m has density

    p(ξ₁) p₁(ξ₁, ξ₂) ⋯ p_{m−1}(ξ_{m−1}, ξ_m).

In this form the significance of the Markov property lies in the fact that p_n does not involve ξ₁, ⋯, ξ_{n−1}. We have seen that the x_n process reversed in time is also a Markov process. In the density case just described, the reversed transition probabilities can be exhibited directly in an elegant form.
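Example 3 can be carried out concretely in a discrete state space, where the P_n's are stochastic matrices and the joint law is a product of transition probabilities; the reversed transition probabilities then follow by Bayes' rule, and their dependence on the initial distribution can be seen directly. All the particular numbers below are hypothetical:

```python
import numpy as np

P0 = np.array([0.2, 0.5, 0.3])              # initial distribution P
P1 = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4]])            # P_1(xi, .)
P2 = np.array([[0.5, 0.25, 0.25],
               [0.2, 0.6,  0.2 ],
               [0.1, 0.1,  0.8 ]])          # P_2(xi, .)

# Joint law of (x_1, x_2, x_3): p(xi_1) p_1(xi_1, xi_2) p_2(xi_2, xi_3)
joint = P0[:, None, None] * P1[:, :, None] * P2[None, :, :]
assert abs(joint.sum() - 1.0) < 1e-12
# Kolmogorov consistency: integrating out x_3 recovers the (x_1, x_2) law
assert np.allclose(joint.sum(axis=2), P0[:, None] * P1)

def reversed_kernel(init):
    """Conditional law of x_1 given x_2, by Bayes' rule, for initial law init."""
    q1 = init                               # marginal of x_1
    q2 = init @ P1                          # marginal of x_2
    return (q1[:, None] * P1 / q2[None, :]).T   # rows indexed by the x_2 state

R = reversed_kernel(P0)
assert np.allclose(R.sum(axis=1), 1.0)      # each row is a distribution
# Unlike the forward kernel P_1, the reverse kernel changes with the initial law
R_other = reversed_kernel(np.array([0.6, 0.2, 0.2]))
assert not np.allclose(R, R_other)
```

This is the discrete analogue of the statement below that the forward densities p_n may be chosen independently of p, while the reversed transition probabilities in general depend on p.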
The conditional distribution of x_n for x_{n+1} = ξ_{n+1} has density (in ξ_n)

(6.14)   [∫ ⋯ ∫ p(ξ₁) p₁(ξ₁, ξ₂) ⋯ p_{n−1}(ξ_{n−1}, ξ_n) dξ₁ ⋯ dξ_{n−1}] p_n(ξ_n, ξ_{n+1})
         / ∫ ⋯ ∫ p(ξ₁) p₁(ξ₁, ξ₂) ⋯ p_n(ξ_n, ξ_{n+1}) dξ₁ ⋯ dξ_n.

(Here we have assumed that the p_n's are Baire functions of their pairs of variables. Actually it is easily seen that these densities can always be chosen to have this property.) Note that the forward transition probability density function p_n can be chosen quite independently of the initial probability density function p, so that a change of p does not force a change of p_n. Once the p_n's are chosen, however, the reverse transition probability density function (6.14) will in general depend on the choice of p.

Now let {x_n, n ≥ 1} be any Markov process with the indicated parameter set. We have seen in I §9 that there is a function P_n of a linear Borel set A and a real variable, the wide sense conditional distribution of x_{n+1} relative to x_n, which determines a Baire function when A is fixed and determines a probability measure when the variable is fixed, such that

    P{x_{n+1}(ω) ∈ A | x_n} = P_n(x_n, A)

with probability 1, for each n, A. Let P be the distribution of x₁,

    P(A) = P{x₁(ω) ∈ A}.

We show (cf. Example 3) that, if z is a Baire function of x₁, ⋯, x_m, then

(6.15)   E{z} = ∫ P(dξ₁) ∫ P₁(ξ₁, dξ₂) ⋯ ∫ z(ξ₁, ⋯, ξ_m) P_{m−1}(ξ_{m−1}, dξ_m).

Using the Markov property, the first integration on the right (that over ξ_m) yields a version of E{z | x₁, ⋯, x_{m−1}}, which is a Baire function of x₁, ⋯, x_{m−1}, according to our work in I §9. The next integration yields

    E{E{z | x₁, ⋯, x_{m−1}} | x₁, ⋯, x_{m−2}} = E{z | x₁, ⋯, x_{m−2}},

and proceeding in this way the last integral yields E{E{z | x₁}} = E{z}, as was to be proved. Thus, if P, P₁, P₂, ⋯ are used as in Example 3 to define a Markov process whose variables are coordinate variables, the Markov process so obtained is a representation of the given process. Now let {x_t, t ∈ T} be any Markov process.
Then, if $s < r < t$, the transition probability satisfies the equation

$$(6.16)\qquad P\{x_t(\omega) \in A \mid x_s\} = E\{P\{x_t(\omega) \in A \mid x_r\} \mid x_s\}$$

with probability 1. In fact, the conditional probability on the right is also $P\{x_t(\omega) \in A \mid x_s, x_r\}$, in view of the Markov property, so that (6.16) is a consequence of the general theorems on conditional expectations and their iterations; see I(10.9). This equation is known as the Chapman-Kolmogorov equation, or, in particular cases, as the Smoluchovski equation. Generalizing the sequence of functions $P_1, P_2, \cdots$ of the preceding paragraph, we now observe that there is a function $P$ of $\xi, s, A, t$, with $s < t$, which determines a probability measure of the linear Borel set $A$ when $\xi, s, t$ are fixed, and determines a Baire function of $\xi$ when $s, A, t$ are fixed, such that

$$(6.17)\qquad P(x_s, s; A, t) = P\{x_t(\omega) \in A \mid x_s\}$$

with probability 1, for each $s$, $A$, $t$. Then (6.16) can also be written in the form

$$(6.18)\qquad P(\xi, s; A, t) = \int_{-\infty}^{\infty} P(\eta, r; A, t)\, P(\xi, s; d\eta, r).$$

This equation holds for $\xi$ not in some Borel set $B$ (depending on $s$, $t$, $r$, $A$) with $P\{x_s(\omega) \in B\} = 0$.

In the applications, one is usually given not a Markov process but transition probabilities in terms of which the process is to be constructed. Specifically, it is usually supposed that $T$ has a minimum value $t_0$ and that a function $P$ is given, satisfying (6.18) identically in the free variables. If an initial distribution is given at $t_0$, a Markov process with the transition probability function $P$ is defined as follows. If $t_1 < \cdots < t_m$, the random variables $x_{t_1}, \cdots, x_{t_m}$ are to have a joint distribution determined by the preassigned $x_{t_1}$ distribution and the preassigned transition probabilities as in (6.15) (with $\Phi$ defined as 1 on an $m$-dimensional Borel set, and 0 otherwise, that is, as in Example 3). We have already remarked that it is then trivial to verify that this definition will make the random variables $x_{t_1}, \cdots, x_{t_m}$ a Markov process.
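For a finite state space the Chapman-Kolmogorov equation (6.18) is simply a matrix identity: the integral over the intermediate state becomes a matrix product. A minimal sketch, with hypothetical one-step matrices for a time-inhomogeneous chain:

```python
import numpy as np

# Finite-state Chapman-Kolmogorov: the kernel P(xi, s; A, t) becomes
# a matrix P_{s,t}, and (6.18) reads P_{s,t} = P_{s,r} P_{r,t} for
# s < r < t.  K0 and K1 below are made-up one-step matrices.
K0 = np.array([[0.7, 0.3],
               [0.2, 0.8]])
K1 = np.array([[0.5, 0.5],
               [0.1, 0.9]])

P_01, P_12 = K0, K1
P_02 = K0 @ K1          # transition from time 0 to time 2

# (6.18): the integral over the intermediate state eta at r = 1
# is here the sum defining the matrix product.
assert np.allclose(P_02, P_01 @ P_12)
# Rows remain probability distributions.
assert np.allclose(P_02.sum(axis=1), 1.0)
```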
The distributions assigned in this way are mutually consistent, that is, the Kolmogorov consistency conditions are satisfied, so that there is actually a (Markov) process with the assigned initial and transition probabilities (I §5). The transition function $P$ is frequently supposed to be given by a density,

$$P(\xi, s; A, t) = \int_A p(\xi, s; \eta, t)\, d\eta,$$

and in this case (6.18) reduces to

$$(6.19)\qquad p(\xi, s; \eta, t) = \int_{-\infty}^{\infty} p(\zeta, r; \eta, t)\, p(\xi, s; \zeta, r)\, d\zeta.$$

Equation (6.19) is frequently justified intuitively by the statement that the probability of a transition from $\xi$ at time $s$ to $\eta$ at time $t$ is the probability of a transition to $\zeta$ at the intermediate time $r$ multiplied by the probability of a transition from $\zeta$ at $r$ to $\eta$ at $t$, summed over all values of $\zeta$. There is nothing wrong with this statement, which is simply an imprecise paraphrase of (6.19), but it is imprecise enough to have led some unwary students to believe that (6.19) is true of all stochastic processes. Note, however, that without the Markov property the first factor under the integral sign in (6.19) would depend on $\xi$ and $s$, in general.

A sequence of random variables $\{x_n\}$ is said to constitute a multiple Markov process if there is an integer $\nu$ such that for each $A$ and each $n$

$$P\{x_{n+1}(\omega) \in A \mid x_1, \cdots, x_n\} = P\{x_{n+1}(\omega) \in A \mid x_{n-\nu+1}, \cdots, x_n\}$$

with probability 1. If $\nu = 1$ the process is then a Markov process (sometimes also called a simple Markov process). The generalization is not very significant, because the (vector) process with random variables $\{y_n\}$, $y_n = (x_n, \cdots, x_{n+\nu-1})$, has the Markov property [defined for vector processes by the obvious modification of (6.1)]. Thus multiple Markov processes can be reduced to simple ones at the small expense of going to vector-valued random variables.
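The reduction of a multiple Markov process to a simple one can be made concrete in the finite-state case. The following sketch, with made-up transition numbers, takes a hypothetical order-2 ($\nu = 2$) chain on two values and recodes it as a simple Markov chain on the $N^\nu = 4$ overlapping pairs $y_n = (x_n, x_{n+1})$:

```python
import numpy as np

# Hypothetical order-2 chain on {0, 1}: the next value depends on the
# last two values only.
def next_prob_one(a, b):
    """P{x_{n+1} = 1 | x_{n-1} = a, x_n = b} (made-up numbers)."""
    return [[0.1, 0.6], [0.4, 0.9]][a][b]

# Transition matrix of the pair process: (a, b) -> (b, c).
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = np.zeros((4, 4))
for i, (a, b) in enumerate(states):
    for j, (b2, c) in enumerate(states):
        if b2 == b:                       # successive pairs overlap in x_n
            q = next_prob_one(a, b)
            T[i, j] = q if c == 1 else 1 - q

# T is a genuine stochastic matrix: the double Markov process has
# become a simple Markov chain on 4 states.
assert np.allclose(T.sum(axis=1), 1.0)
```

Note that each row has only two nonzero entries, since a pair $(a, b)$ can only be followed by a pair beginning with $b$.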
In particular, in the important case of random variables $\{x_n\}$ which take on only a fixed finite set of values (Markov chains), the $y_n$ process will be one of the same type; the variables $\{y_n\}$ need not be considered vector variables but simply variables taking on $N^\nu$ values, if the $x_n$'s take on $N$ values.

(b) Wide sense

Let $\{x_t, t \in T\}$ be a Gaussian process with zero means, $E\{x_t\} = 0$, and which either is real or, if not, satisfies the equation $E\{x_s x_t\} = 0$ (cf. Theorem 3.2). To define a Markov process in the wide sense we must discover what apparently weaker property of the $x_t$ process, defined in terms of variances and covariances, is equivalent to the Markov property, or at least to an important special case of this property. We have already remarked (§3) that if $t_1 < \cdots < t_n$, one version of $E\{x_{t_n} \mid x_{t_1}, \cdots, x_{t_{n-1}}\}$ is that linear combination of $x_{t_1}, \cdots, x_{t_{n-1}}$, written $\hat{E}\{x_{t_n} \mid x_{t_1}, \cdots, x_{t_{n-1}}\}$ in the notation of §3, which is the closest to $x_{t_n}$ in the sense of minimizing

$$(6.20)\qquad E\Bigl\{\Bigl|x_{t_n} - \sum_{j=1}^{n-1} a_j x_{t_j}\Bigr|^2\Bigr\} = \sum_{j,k=1}^{n} a_j \bar{a}_k E\{x_{t_j} \bar{x}_{t_k}\} \qquad (a_n = -1).$$

Now the Markov property implies that

$$(6.21)\qquad E\{x_{t_n} \mid x_{t_1}, \cdots, x_{t_{n-1}}\} = E\{x_{t_n} \mid x_{t_{n-1}}\}$$

with probability 1, and in fact considerably more, since the Markov property applies to the conditional distribution of $x_{t_n}$, not merely to its conditional expectation. However, the condition (6.1) is equivalent to (6.21) in the present case, as we shall see in a moment. Hence the condition that the $x_t$ process be a Markov process can be written in the form

$$(6.21')\qquad \hat{E}\{x_{t_n} \mid x_{t_1}, \cdots, x_{t_{n-1}}\} = \hat{E}\{x_{t_n} \mid x_{t_{n-1}}\}$$

with probability 1. This is a condition involving only variances and covariances, according to (6.20). Hence we shall say that any process, Gaussian or not, is a Markov process in the wide sense if $E\{|x_t|^2\} < \infty$ for all $t$, and if, whenever $t_1 < \cdots < t_n$, (6.21') is satisfied with probability 1. We still must justify, for the original Gaussian $x_t$ process, our statement that (6.21) implies (6.1).
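Condition (6.21') can be checked numerically, since by (6.20) it involves only covariances. The sketch below assumes the stationary covariance $E\{x_s x_t\} = e^{-|t-s|}$ (the Ornstein-Uhlenbeck form, which is known to give a Markov process); the time points are arbitrary. The best linear predictor coefficients $a_j$ solve the normal equations $\Gamma a = \gamma$, and (6.21') says that every coefficient except the one on the latest variable vanishes.

```python
import numpy as np

# Assumed covariance: E{x_s x_t} = exp(-|t - s|)  (Markov in the
# wide sense).  Arbitrary observation times t_1 < ... < t_{n-1}
# and a prediction time t_n.
t = np.array([0.0, 0.7, 1.5, 2.4])
t_n = 3.0

Gamma = np.exp(-np.abs(t[:, None] - t[None, :]))   # covariance matrix
gamma = np.exp(-np.abs(t_n - t))                   # cross-covariances
a = np.linalg.solve(Gamma, gamma)                  # normal equations

# (6.21'): E^{x_{t_n} | x_{t_1}, ..., x_{t_{n-1}}} puts zero weight
# on every x_{t_j} except the latest one, x_{t_{n-1}}.
assert np.allclose(a[:-1], 0.0, atol=1e-10)
assert abs(a[-1] - np.exp(-(t_n - t[-1]))) < 1e-10
```

The closed form $a_{n-1} = e^{-(t_n - t_{n-1})}$ drops out of the normal equations because $\gamma$ is proportional to the last column of $\Gamma$ for this covariance; with a non-Markov covariance the earlier coefficients would not vanish.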
For this process, the difference $x_{t_n} - \hat{E}\{x_{t_n} \mid x_{t_1}, \cdots, x_{t_{n-1}}\}$ is a Gaussian variable, with mean 0, and is orthogonal to and therefore independent of $x_{t_1}, \cdots, x_{t_{n-1}}$. Hence the conditional distribution of $x_{t_n}$, given that $x_{t_1}(\omega) = \xi_1, \cdots, x_{t_{n-1}}(\omega) = \xi_{n-1}$, is that of

$$(6.22)\qquad y + \hat{E}\{x_{t_n} \mid x_{t_1}(\omega) = \xi_1, \cdots, x_{t_{n-1}}(\omega) = \xi_{n-1}\},$$

where $y$ denotes the difference above.
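The decomposition behind (6.22) can be computed explicitly: the conditional distribution of $x_{t_n}$ is Gaussian, with mean the linear estimate $\hat{E}$ evaluated at the observed $\xi_j$ and with variance that of the residual $y$, which does not depend on the $\xi_j$. A minimal numerical sketch, with a hypothetical covariance matrix:

```python
import numpy as np

# Hypothetical covariance of the zero-mean Gaussian vector
# (x_{t_1}, x_{t_2}, x_{t_3}).
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.6],
                  [0.3, 0.6, 1.0]])
xi = np.array([0.2, -0.5])              # observed values xi_1, xi_2

S11 = Sigma[:2, :2]                     # covariance of the conditioning variables
s12 = Sigma[:2, 2]                      # cross-covariances with x_{t_3}
a = np.linalg.solve(S11, s12)           # coefficients of the linear estimate

cond_mean = a @ xi                      # E^{x_{t_3} | xi_1, xi_2}, as in (6.22)
cond_var = Sigma[2, 2] - a @ s12        # variance of the residual y: free of xi

# Conditioning reduces the variance but cannot make it negative.
assert 0.0 < cond_var < Sigma[2, 2]
```

Note that `cond_var` is computed from the covariances alone, reflecting the independence of the residual $y$ from the conditioning variables.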