
Review of Gaussian random variables

If x is a Gaussian random variable (with zero mean), then its probability
distribution function is given by

    P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}.    (1)

Note that

    \int_{-\infty}^{\infty} P(x)\, dx = 1.    (2)

(If x were Gaussian with non-zero mean, you could remove the mean by
redefining x \leftarrow x - \langle x \rangle, and then put it back again later.) The moments
are easy to compute, and they all depend only on the standard deviation \sigma.
All the odd moments vanish,

    \langle x^{2m+1} \rangle = 0,    (3)

and the even moments are easily computed by (for example) differentiating
the normalization integral (2) with respect to the parameter 1/2\sigma^2. One
finds that \langle x^2 \rangle = \sigma^2, \langle x^4 \rangle = 3\sigma^4, etc.
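These moment formulas are easy to confirm numerically. The sketch below (assuming numpy is available; the value \sigma = 2 and the sample size are arbitrary choices) estimates the first, second, and fourth moments by Monte Carlo:

```python
import numpy as np

# Monte Carlo estimate of the Gaussian moments quoted above:
# <x> = 0, <x^2> = sigma^2, <x^4> = 3 sigma^4.
# sigma = 2 and the sample size are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=1_000_000)

m1 = x.mean()        # odd moment: should be near 0
m2 = (x**2).mean()   # should be near sigma^2 = 4
m4 = (x**4).mean()   # should be near 3 sigma^4 = 48
print(m1, m2, m4)
```

The sample averages agree with the exact moments up to the expected statistical error of order 1/\sqrt{N}.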
Why be Gaussian? First of all, many things are. For example, the single-
point statistics of velocity in turbulent flow are found to be approximately
Gaussian. This is often considered to be a manifestation of the Central Limit
Theorem, which we prove below. Second, Gaussian distributions are easy to
handle mathematically. Many statistical theories of turbulence can be viewed
as expansions about Gaussianity.
For any (not necessarily Gaussian) probability distribution P(x), the
characteristic function is defined as

    \phi(\omega) = \int_{-\infty}^{\infty} P(x)\, e^{i\omega x}\, dx.    (4)

Thus \phi(\omega) is the Fourier transform of P(x). It follows that


    \phi'(\omega) = \int_{-\infty}^{\infty} ix\, P(x)\, e^{i\omega x}\, dx, \qquad \phi'(0) = i\langle x \rangle,

    \phi''(\omega) = \int_{-\infty}^{\infty} (ix)^2 P(x)\, e^{i\omega x}\, dx, \qquad \phi''(0) = -\langle x^2 \rangle,    (5)

etc. Thus, knowledge of the probability distribution is equivalent to knowl-
edge of all its moments. For the Gaussian distribution (1) we obtain
    \phi(\omega) = e^{-\sigma^2 \omega^2/2}.    (6)
The characteristic function is often used to do proofs. As an example,
we will prove the central limit theorem. Let x1 , x2 , . . . , xn be n independent
but not necessarily Gaussian random variables all having zero mean and the
same distribution function P1 (x). Consider the random variable defined as
their average:
    x = \frac{1}{n}(x_1 + x_2 + \cdots + x_n).    (7)
The Central Limit Theorem says that, as n → ∞, x becomes Gaussian even
though the xi are not.
Let P (x) be the unknown distribution of x, and let φ(ω) be its charac-
teristic function. Then
    \phi(\omega) = \int P(x)\, e^{i\omega x}\, dx
           = \int\!\!\cdots\!\!\int P(x_1, x_2, \ldots, x_n) \exp\!\left[\frac{i\omega}{n}(x_1 + x_2 + \cdots + x_n)\right] dx_1\, dx_2 \cdots dx_n
           = \int\!\!\cdots\!\!\int P_1(x_1) P_1(x_2) \cdots P_1(x_n) \exp\!\left[\frac{i\omega}{n}(x_1 + x_2 + \cdots + x_n)\right] dx_1\, dx_2 \cdots dx_n
           = \int P_1(x_1)\, e^{i\omega x_1/n}\, dx_1 \int P_1(x_2)\, e^{i\omega x_2/n}\, dx_2 \cdots \int P_1(x_n)\, e^{i\omega x_n/n}\, dx_n
           = \left[\phi_1\!\left(\frac{\omega}{n}\right)\right]^n,    (8)
where φ1 (ω) is the Fourier transform of P1 (x). Since
    \phi_1(\omega) = 1 + 0 - \frac{\sigma_1^2}{2}\,\omega^2 + \cdots,    (9)

where the 0 is the vanishing first-moment term i\omega\langle x_i \rangle and \sigma_1^2 is the
variance of any of the x_i, we have

    \phi_1\!\left(\frac{\omega}{n}\right) = 1 - \frac{\sigma_1^2}{2}\,\frac{\omega^2}{n^2} + \cdots.    (10)
Thus, by (8), as n → ∞,
    \phi(\omega) \to \left(1 - \frac{\sigma_1^2}{2}\,\frac{\omega^2}{n^2}\right)^{\!n}.    (11)

Since, as n \to \infty,

    \left(1 + \frac{s}{n}\right)^{\!n} \to e^s,    (12)

we find that

    \phi(\omega) \to e^{-\sigma_1^2 \omega^2/2n},    (13)

and therefore (cf. (6)) x is Gaussian with standard deviation \sigma = \sigma_1/\sqrt{n}.
QED
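The theorem is easy to see in a simulation. The sketch below (assuming numpy; the choice of uniform x_i, n = 50, and the number of trials are arbitrary) averages decidedly non-Gaussian variables and checks that the result has variance \sigma_1^2/n and a nearly Gaussian fourth moment:

```python
import numpy as np

# Monte Carlo illustration of the central limit theorem just proved.
# The x_i are uniform on [-1, 1] (zero mean, sigma_1^2 = 1/3), certainly
# not Gaussian, yet their average is nearly Gaussian already for n = 50.
# n and the number of trials are arbitrary choices for this sketch.
rng = np.random.default_rng(1)
n, trials = 50, 100_000
xbar = rng.uniform(-1.0, 1.0, size=(trials, n)).mean(axis=1)

var = xbar.var()                    # should be near sigma_1^2 / n = 1/150
kurt = (xbar**4).mean() / var**2    # should be near the Gaussian value 3
print(var, kurt)
```

The normalized fourth moment \langle x^4 \rangle / \langle x^2 \rangle^2 approaches 3, the Gaussian value computed earlier, as n grows.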
Now let x1 , x2 , . . . , xn be n random variables with probability distribution
function

    P(x_1, x_2, \ldots, x_n) = C \exp\!\left(-\frac{1}{2}\sum_{ij} A_{ij}\, x_i x_j\right).    (14)

The x_1, x_2, \ldots, x_n are said to be jointly Gaussian. With no loss in generality
we can assume that A_{ij} is symmetric. Then the matrix A has real eigenvalues
and orthogonal eigenvectors (which can be made orthonormal). That is,

    A e^{(i)} = \lambda^{(i)} e^{(i)},    (15)

where

    e^{(i)T} e^{(j)} = \delta_{ij},    (16)

and the T means transpose. It is possible to transform from the original
variables x_1, x_2, \ldots, x_n to new variables y_1, y_2, \ldots, y_n in which the quadratic
form

    Q = \sum_{ij} A_{ij}\, x_i x_j    (17)

is diagonal. The transformation is given by

    x_i = \sum_j y_j\, e_i^{(j)} = \sum_j U_{ij}\, y_j.    (18)

Thus y_j is the amplitude of the eigenvector e^{(j)} in an expansion of the column
vector (x_1, x_2, \ldots, x_n)^T in terms of the eigenvectors. We write this in matrix
notation as

    x = U y,    (19)

with the understanding that x and y are n-dimensional column vectors. The
matrix U is defined by

    U = \left(e^{(1)}\; e^{(2)} \cdots e^{(n)}\right).    (20)

Thus the column vectors of U are the eigenvectors of A. By the orthonor-
mality of the column vectors, we have

    U^T U = I.    (21)

That is, the transpose of U is equal to its inverse. Real matrices with this
property are called orthogonal matrices.
Now we compute

    Q = x^T A x
      = (U y)^T A (U y)
      = y^T U^T A U\, y
      = y^T D y
      = \sum_i \lambda^{(i)} y_i^2,    (22)

where

    U^T A U = D \equiv \mathrm{diag}(\lambda^{(1)}, \lambda^{(2)}, \ldots, \lambda^{(n)}).    (23)
That is, D is the diagonal matrix with the eigenvalues as its diagonal com-
ponents.
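This diagonalization is exactly what a numerical eigensolver provides. As a sketch (assuming numpy; the 3\times3 positive-definite matrix is an arbitrary example), `numpy.linalg.eigh` returns the orthonormal eigenvectors as the columns of U, so (21) and (23) can be verified directly:

```python
import numpy as np

# Diagonalizing the quadratic form: for a symmetric A, eigh returns
# orthonormal eigenvectors as the columns of U, so U^T U = I (Eq. 21)
# and U^T A U = D (Eq. 23).  A is an arbitrary symmetric example.
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
lam, U = np.linalg.eigh(A)   # eigenvalues lam, eigenvectors in columns of U
D = np.diag(lam)

orth_err = np.max(np.abs(U.T @ U - np.eye(3)))   # Eq. (21)
diag_err = np.max(np.abs(U.T @ A @ U - D))       # Eq. (23)
print(orth_err, diag_err)
```

Both residuals are at the level of machine round-off, confirming that U^T is the inverse of U and that U^T A U is diagonal.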
The probability distribution of y_1, y_2, \ldots, y_n takes the form

    P(y_1, y_2, \ldots, y_n) = C \exp\!\left(-\frac{1}{2}\sum_i \lambda^{(i)} y_i^2\right).    (24)

Since this factors into functions of each y_i, we easily compute the normaliza-
tion constant

    C = \frac{(\lambda^{(1)} \lambda^{(2)} \cdots \lambda^{(n)})^{1/2}}{(2\pi)^{n/2}}    (25)

and the covariances

    \langle y_i y_j \rangle = \frac{1}{\lambda^{(i)}}\, \delta_{ij}.    (26)

In matrix notation,

    \langle y y^T \rangle = D^{-1}.    (27)
What are the moments of our original variables x_i? By the linearity of the
transformation from y to x, we have that

    \langle x_i \rangle = \langle x_i x_j x_k \rangle = 0.    (28)

In fact, all the odd moments vanish. For the second moments we have, using
(27),

    \langle x x^T \rangle = \langle U y (U y)^T \rangle
             = U \langle y y^T \rangle U^T
             = U D^{-1} U^T.    (29)

But by (23) and (21), this is

    \langle x x^T \rangle = U (U^T A U)^{-1} U^T = A^{-1}.    (30)

That is,

    \langle x_i x_j \rangle = (A^{-1})_{ij}.    (31)
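This result can be checked by sampling. The sketch below (assuming numpy; the positive-definite matrix A and sample size are arbitrary choices) builds x = U y from independent y_i with variance 1/\lambda^{(i)}, as in the construction above, and compares the sample covariance with A^{-1}:

```python
import numpy as np

# Sampling check of Eq. (31): if x has density C exp(-x^T A x / 2),
# then <x_i x_j> = (A^{-1})_{ij}.  We build x = U y with the y_i
# independent Gaussians of variance 1/lambda_i, per Eq. (26).
# A and the sample size are arbitrary choices for this sketch.
rng = np.random.default_rng(2)
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
lam, U = np.linalg.eigh(A)

y = rng.normal(0.0, 1.0 / np.sqrt(lam), size=(500_000, 3))  # <y_i y_j> = delta_ij / lambda_i
x = y @ U.T                                                  # x = U y, one sample per row
cov = x.T @ x / len(x)                                       # sample <x x^T>
err = np.max(np.abs(cov - np.linalg.inv(A)))
print(err)
```

The sample covariance matches A^{-1} entry by entry, up to the Monte Carlo error of order 1/\sqrt{N}.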
Finally we consider \langle x_i x_j x_k x_m \rangle. We have

    \langle x_i x_j x_k x_m \rangle = \sum_{rspq} \langle U_{ir} y_r\, U_{js} y_s\, U_{kp} y_p\, U_{mq} y_q \rangle
                   = \sum_{rspq} U_{ir} U_{js} U_{kp} U_{mq} \langle y_r y_s y_p y_q \rangle.    (32)

But we know that the y_i are independent Gaussian random variables. Thus

    \langle y_r y_s y_p y_q \rangle = \langle y_r y_s \rangle \langle y_p y_q \rangle + \langle y_r y_p \rangle \langle y_s y_q \rangle + \langle y_r y_q \rangle \langle y_s y_p \rangle,    (33)

where each factor carries a Kronecker delta (\langle y_r y_s \rangle \propto \delta_{rs}), and the
formula holds even in the special case r = s = p = q, since \langle y^4 \rangle = 3\langle y^2 \rangle^2
for a Gaussian. Substituting this back into (32), we conclude that

    \langle x_i x_j x_k x_m \rangle = \langle x_i x_j \rangle \langle x_k x_m \rangle + \langle x_i x_m \rangle \langle x_j x_k \rangle + \langle x_i x_k \rangle \langle x_j x_m \rangle    (34)

A similar factorization rule applies to all the higher even moments, but we
shall find particular use for (34).
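The factorization rule (34) is also easy to confirm by simulation. The sketch below (assuming numpy; the 2\times2 precision matrix A, the index choice, and the sample size are arbitrary) compares a sampled fourth moment with the sum of products of second moments:

```python
import numpy as np

# Monte Carlo check of the factorization rule (34) for jointly Gaussian
# variables.  A (the precision matrix), the indices, and the sample size
# are arbitrary choices for this sketch.
rng = np.random.default_rng(3)
A = np.array([[2.0, 0.6],
              [0.6, 1.5]])
cov_exact = np.linalg.inv(A)                 # <x_i x_j>, by Eq. (31)
x = rng.multivariate_normal(np.zeros(2), cov_exact, size=1_000_000)

i, j, k, m = 0, 1, 0, 1
lhs = np.mean(x[:, i] * x[:, j] * x[:, k] * x[:, m])
rhs = (cov_exact[i, j] * cov_exact[k, m]
       + cov_exact[i, k] * cov_exact[j, m]
       + cov_exact[i, m] * cov_exact[j, k])
print(lhs, rhs)
```

The sampled fourth moment agrees with the three-term sum of pair covariances, as (34) predicts.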