Notes for Math 450

Lecture Notes 3
Renato Feres

1 Moments of Random Variables
We introduce some of the standard parameters associated to a random variable.
The two most common are the expected value and the variance. These are
special cases of moments of a probability distribution. Before defining these
quantities, it may be helpful to recall some basic concepts associated to random
variables.

1.1 Discrete and continuous random variables
A random variable X is said to be discrete if, with probability one, it can take
only a finite or countably infinite number of possible values. That is, there is a
set {x1 , x2 , . . . } ⊆ R such that

∑_{k=1}^{∞} P (X = x_k) = 1.

X is a continuous random variable if there exists a function fX : R → [0, ∞)
(not necessarily continuous) such that, for all x,
P (X ≤ x) = ∫_{−∞}^{x} fX (s) ds.

The probability density function, or PDF, fX (x), must satisfy:

1. fX (x) ≥ 0 for all x ∈ R;
2. ∫_{−∞}^{∞} fX (x) dx = 1;

3. P (a ≤ X ≤ b) = ∫_{a}^{b} fX (x) dx for any a ≤ b.

If fX (x) is continuous, it follows from item 3 and the fundamental theorem
of calculus that
fX (x) = lim_{h→0} P (x ≤ X ≤ x + h)/h.


The cumulative distribution function of X is defined (both for continuous and
discrete random variables) as:
FX (x) = P (X ≤ x), for all x.
In terms of the probability density function, FX (x) takes the form

FX (x) = P (X ≤ x) = ∫_{−∞}^{x} fX (z) dz.

Proposition 1.1 (PDF of a linear transformation) Let X be a continuous random variable with PDF fX (x) and cumulative distribution function FX (x), and let Y = aX + b, where a ≠ 0. Then the PDF of Y is given by

fY (y) = (1/|a|) fX ((y − b)/a).

Proof. First assume that a > 0. Since Y ≤ y if and only if X ≤ (y − b)/a, we have

FY (y) = P (Y ≤ y) = P (X ≤ (y − b)/a) = FX ((y − b)/a).

Differentiating both sides, we find fY (y) = (1/a) fX ((y − b)/a). The case a < 0 is left as an exercise. □

This proposition is a special case of the following. (The higher dimensional generalization of the proposition was already mentioned in the second set of lecture notes.)

Proposition 1.2 (PDF of a transformation of X) Let X be a continuous random variable with PDF fX (x) and y = g(x) a differentiable one-to-one function with inverse x = h(y). Define a new random variable Y = g(X). Then Y is a continuous random variable with PDF fY (y) given by

fY (y) = fX (h(y)) |h′(y)|.

1.2 Expectation

The most basic parameter associated to a random variable is its expected value or mean. Fix a probability space (S, F, P) and let X : S → R be a random variable.

Definition 1.1 (Expectation) The expectation or mean value of the random variable X is defined as

E[X] = ∑_i x_i P (X = x_i)   if X is discrete,
E[X] = ∫_{−∞}^{∞} x fX (x) dx   if X is continuous.

Example 1.1 (A game of dice) A game consists in tossing a die and receiving a payoff X equal to $n for n pips. It is natural to define the fair price to play one round of the game as being the expected value of X: if you could play the game for less than E[X], you would make a sure profit by playing it long enough, and if you pay more you are sure to lose money in the long run. The fair price is then

E[X] = ∑_{i=1}^{6} i/6 = 21/6 = $3.50.

Example 1.2 (Waiting in line) Let us suppose that the waiting time to be served at the post office at a particular location and time of day is known to follow an exponential distribution with parameter λ = 6 (in units 1/hour). What is the expected time of wait? We have now a continuous random variable T with probability density function fT (t) = λe^{−λt}. The expected value is easily calculated to be

E[T ] = ∫_{0}^{∞} t λ e^{−λt} dt = 1/λ.

Therefore, the mean time of wait is one-sixth of an hour, or 10 minutes.

It is a bit inconvenient to have to distinguish the continuous and discrete cases every time we refer to the expected value of a random variable, so we need a uniform notation that represents all cases. We will use the notation for the Lebesgue integral, introduced in the appendix of the previous set of notes. (You do not need to know about the Lebesgue integral; we are only using the notation.) So we will often denote the expected value of a random variable X, of any type, by

E[X] = ∫_S X(s) dP (s).

Sometimes dP (s) is written P (ds). The notation should be understood as follows. Suppose that we decompose the range of values of X into intervals (x_i, x_i + h], where h is a small step-size. Then

E[X] ∼ ∑_i x_i P (X ∈ (x_i, x_i + h]).

For discrete random variables, the same integral represents the sum in definition 1.1.
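As a quick numerical check of the two examples above, both expectations can also be estimated by simulation. The sketch below is only an illustration (it is not part of the notes' programs, and the sample size is arbitrary); it anticipates the simulation techniques of Section 4.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Monte Carlo check of Examples 1.1 and 1.2.
n=100000;
payoff=ceil(6*rand(1,n));     %simulated die payoffs
mean(payoff)                  %should be close to 3.50
lambda=6;
T=-log(rand(1,n))/lambda;     %exponential waiting times (cf. Example 4.3)
mean(T)                       %should be close to 1/6 of an hour
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%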

Here are a few simple properties of expectations.

Proposition 1.3 Let X and Y be random variables on the probability space (S, F, P). Then:

1. If X ≥ 0 then E[X] ≥ 0.

2. For any real number a, E[aX] = aE[X].

3. E[X + Y ] = E[X] + E[Y ].

4. If X is constant equal to a, then E[a] = a.

5. If X and Y are independent and both E[|X|] and E[|Y |] are finite, then E[XY ] = E[X]E[Y ].

6. E[XY ]² ≤ E[X²]E[Y ²], with equality if and only if X and Y are linearly dependent, that is, there are constants a, b, not both zero, such that P (aX + bY = 0) = 1. (This is the Cauchy-Schwartz inequality.)

It is not difficult to obtain from the definition that if Y = g(X) for some function g(x) and X is a continuous random variable with probability density function fX (x), then

E[g(X)] = ∫_S g(X(s)) dP (s) = ∫_{−∞}^{∞} g(x) fX (x) dx.

1.3 Variance

The variance of a random variable X refines our knowledge of the probability distribution of X by giving a broad measure of how X is dispersed around its mean.

Definition 1.2 (Variance) Let (S, F, P) be a probability space and consider a random variable X : S → R with expectation m = E[X]. (We assume that m exists and is finite.) We define the variance of X as the mean square of the difference X − m, that is,

Var(X) = E[(X − m)²] = ∫_S (X(s) − m)² dP (s).

The standard deviation of X is defined as σ(X) = √Var(X).

Example 1.3 Let D be the determinant of the matrix

D = det [ X Y ; Z T ],

where X, Y, Z, T are independent random variables uniformly distributed in [0, 1]. We wish to find E[D] and Var(D). The probability space of this problem is S = [0, 1]⁴, F the σ-algebra generated by 4-dimensional parallelepipeds, and P the 4-dimensional volume obtained by integrating dV = dx dy dz dt. Thus

E[D] = ∫_0^1 ∫_0^1 ∫_0^1 ∫_0^1 (xt − zy) dx dy dz dt = 0.

The variance is

Var(D) = ∫_0^1 ∫_0^1 ∫_0^1 ∫_0^1 (xt − zy)² dx dy dz dt = 7/72.
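The moments computed in Example 1.3 can likewise be checked by a short Monte Carlo experiment; the following sketch is only illustrative and the sample size is arbitrary.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Monte Carlo check of Example 1.3: E[D]=0 and Var(D)=7/72=0.0972...
n=100000;
X=rand(1,n); Y=rand(1,n); Z=rand(1,n); T=rand(1,n);
D=X.*T-Z.*Y;
mean(D)                 %should be close to 0
mean(D.^2)-mean(D)^2    %should be close to 7/72
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%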

Mean and variance are examples of moments and central moments of probability distributions. These are defined as follows.

Definition 1.3 (Moments) The moment of order k = 1, 2, 3, . . . of a random variable X is defined as

E[X^k] = ∫_S X(s)^k dP (s).

The central moment of order k of the random variable with mean µ is

E[(X − µ)^k] = ∫_S (X(s) − µ)^k dP (s).

Some of the general properties of variance are enumerated in the next proposition. They can be derived from the definitions by simple calculations. The details are left as exercises.

Proposition 1.4 Let X, Y be random variables on a probability space (S, F, P) and let a be any real constant. Then:

1. Var(X) = E[X²] − E[X]².

2. Var(aX) = a² Var(X).

3. If X and Y have finite variance and are independent, then Var(X + Y ) = Var(X) + Var(Y ).

The variance of a sum of any number of independent random variables now follows from the above results. The next proposition implies that the standard deviation of the arithmetic mean of independent random variables X1, . . . , Xn decreases like 1/√n.

Proposition 1.5 Let X1, X2, . . . , Xn be independent random variables having the same standard deviation σ. Denote their sum by Sn = X1 + · · · + Xn. Then

Var(Sn/n) = σ²/n.

The meaning of the central moments, and the variance in particular, is easier to interpret using Chebyshev's inequality. Broadly speaking, this inequality says that if the central moments are small, then the random variable cannot deviate much from its mean.

Theorem 1.1 (Chebyshev inequality) Let (S, F, P) be a probability space, X : S → R a random variable, and ε > 0 a fixed number.

1. Let X ≥ 0 (with probability 1) and let k ∈ N. Then P (X ≥ ε) ≤ E[X^k]/ε^k.

2. Let X be a random variable with finite expectation m. Then P (|X − m| ≥ ε) ≤ Var(X)/ε².

3. Let X be a random variable with finite expectation m and finite standard deviation σ > 0. For any positive c > 0 we have P (|X − m| ≥ cσ) ≤ 1/c².

Proof. The second inequality follows from the first by substituting |X − m| for X (and taking k = 2), and the third follows from the second by taking ε = cσ. Thus we only need to prove the first. This is done by noting that

E[X^k] = ∫_S X(s)^k dP (s) ≥ ∫_{{s∈S : X(s)≥ε}} X(s)^k dP (s) ≥ ∫_{{s∈S : X(s)≥ε}} ε^k dP (s) = ε^k P (X ≥ ε).

So we get P (X ≥ ε) ≤ E[X^k]/ε^k as claimed. □

Example 1.4 Chebyshev's inequality, in the form of inequality 3 in the theorem, implies that if X is a random variable with finite mean m and finite variance σ², then the probability that X lies in the interval (m − 3σ, m + 3σ) is at least 1 − 1/3² = 8/9.

Example 1.5 (Tosses of a fair coin) We make N = 1000 tosses of a fair coin and denote by SN the number of heads. Notice that SN = X1 + X2 + · · · + XN, where Xi is 1 if the i-th toss obtains 'head' and 0 if 'tail.' We assume that the Xi are independent and P (Xi = 0) = P (Xi = 1) = 1/2. Then E[SN] = N/2 = 500 and Var(SN) = N/4 = 250. From the second inequality in theorem 1.1 we have that P (450 ≤ SN ≤ 550) is at least 1 − 250/50² = 0.9. A better estimate of the dispersion around the mean will be provided by the central limit theorem, discussed later.

2 The Laws of Large Numbers

The frequentist interpretation of probability rests on the intuitive idea that if we perform a large number of independent trials of an experiment, yielding numerical outcomes x1, x2, . . . , then the averages (x1 + · · · + xn)/n converge to some value x̄ as n grows. For example, if we toss a fair coin n times and let xi be 0 or 1 when the coin gives, respectively, 'head' or 'tail,' then the running averages should converge to the relative frequency of 'tails,' which is 0.5. We discuss now two theorems that make this idea precise.

2.1 The weak law of large numbers

Given a sequence X1, X2, . . . of random variables, we write Sn = X1 + · · · + Xn.

Theorem 2.1 (Weak law of large numbers) Let (S, F, P) be a probability space and let X1, X2, . . . be a sequence of independent random variables with means mi and variances σi². We assume that the means are finite and there is a constant L such that σi² ≤ L for all i. Then, for every ε > 0,

lim_{n→∞} P (|(Sn − E[Sn])/n| ≥ ε) = 0;

in particular, if E[Xi] = m for all i, then

lim_{n→∞} P (|Sn/n − m| ≥ ε) = 0.

Proof. The independence of the random variables implies

Var(Sn) = Var(X1) + · · · + Var(Xn) ≤ nL.

Chebyshev's (second) inequality applied to Sn then gives

P (|(Sn − E[Sn])/n| ≥ ε) = P (|Sn − E[Sn]| ≥ nε) ≤ Var(Sn)/(nε)² ≤ L/(nε²).

This proves the theorem. □

For example, let X1, X2, . . . be a sequence of independent, identically distributed random variables with two outcomes: 1 with probability p and 0 with probability 1 − p. Then the weak law of large numbers says that the arithmetic mean Sn/n converges to p = E[Xi] in the sense that, for any ε > 0, the probability that Sn/n lies outside the interval [p − ε, p + ε] goes to zero as n goes to ∞.
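A minimal numerical illustration of the weak law (and of Example 1.5) for fair coin tosses is sketched below; the sample sizes are arbitrary and the script is not part of the original programs.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Running averages of fair coin tosses; they settle near 1/2.
N=10000;
x=(rand(1,N)<1/2);
plot(cumsum(x)./(1:N))
grid
%Empirical frequency of |S/1000 - 1/2| >= 0.05 over many repetitions;
%it should be well under the Chebyshev bound 0.1 of Example 1.5.
M=2000;
S=sum(rand(1000,M)<1/2);
sum(abs(S/1000-1/2)>=0.05)/M
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%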

2.2 The strong law of large numbers

The weak law of large numbers, applied to a sequence Xi ∈ {0, 1} of coin tosses, says that Sn/n must lie in an arbitrarily small interval around 1/2 with high probability (arbitrarily close to 1) if n is taken big enough. A stronger statement would be to say that, with probability one, a sequence of coin tosses yields a sum Sn such that Sn/n actually converges to 1/2. To explain the meaning of the stronger claim, let us be more explicit and view the random variables as functions Xi : S → R on the same probability space (S, F, P). Then, for each s ∈ S we can consider the sample sequence X1(s), X2(s), . . . , as well as the arithmetic averages Sn(s)/n, and ask whether Sn(s)/n (an ordinary sequence of numbers) actually converges to 1/2. The strong law of large numbers states that the set of s for which this holds is an event of probability 1. This is a much more subtle result than the weak law. We state this theorem here without proof, and we will be content with simply stating the general theorem, although experimental evidence for its validity will be given in a number of examples.

Theorem 2.2 (Strong law of large numbers) Let (S, F, P) be a probability space and X1, X2, . . . be random variables defined on S with finite means and variances satisfying

∑_{i=1}^{∞} Var(Xi)/i² < ∞.

Then, there is an event E ∈ F of probability 1 such that for all s ∈ E,

Sn/n − E[Sn]/n → 0

as n → ∞. In particular, if in addition all the means are equal to m, then for all s in a subset of S of probability 1,

lim_{n→∞} Sn(s)/n = m.

3 The Central Limit Theorem

The reason why the normal distribution arises so often is the central limit theorem. Let (S, F, P) be a probability space and let X1, X2, . . . be independent random variables defined on S. Assume that the Xi have a common distribution with finite expectation m and finite nonzero variance σ². Define the sum Sn = X1 + X2 + · · · + Xn.

Theorem 3.1 (The central limit theorem) If X1, X2, . . . are independent random variables with mean µ and variance σ², then the random variable

Zn = (Sn − nµ)/(σ√n)

converges in distribution to a standard normal random variable. In other words,

P (Zn ≤ z) → Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−u²/2} du

as n → ∞.

We saw in Lecture Notes 2 that if random variables X and Y are independent and have probability density functions fX and fY, then the sum X + Y has probability density function equal to the convolution fX ∗ fY. Therefore, one way to observe the convergence to the normal distribution claimed in the central limit theorem is to consider the convolution powers f^{n∗} = f ∗ · · · ∗ f of the common probability density function f with itself n times and see that it approaches a normal distribution. By the central limit theorem, after centering and re-scaling (not done in the figure), f^{n∗} approaches a normal distribution.

Figure 1: Convolution powers of the function f (x) = 1 over [−1, 1].

In a typical application of the central limit theorem, one considers a random variable X with mean µ and variance σ², and a sequence X1, X2, . . . of independent realizations of X. Let X̄n be the sample mean, X̄n = (X1 + · · · + Xn)/n. Then the limiting distribution of the random variable Zn = (X̄n − µ)/(σ/√n) is the standard normal distribution. (Subtracting µ and dividing by σ/√n changes the probability density function in the simple way described in proposition 1.1.)
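A rough version of Figure 1 can be produced with Matlab's conv function, discretizing the convolution integral on a grid; the sketch below assumes f is the function of the figure (the constant 1 on [−1, 1]) and makes no attempt at careful normalization.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Convolution powers of the indicator of [-1,1] on a grid (a sketch).
dx=0.01;
x=-1:dx:1;
f=ones(size(x));        %f(x)=1 on [-1,1], as in Figure 1
g=f;
for k=1:3
    g=conv(g,f)*dx;     %g approximates the (k+1)-fold convolution
end
y=linspace(-4,4,length(g));
plot(y,g)               %already close to a bell-shaped curve
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%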

Example 3.1 (Die tossing) Consider the experiment of tossing a fair die n times. Let Xi be the number obtained in the i-th toss and Sn = X1 + · · · + Xn. The Xi are independent and have a common discrete distribution with mean µ = 3.5 and σ² = 35/12. Assuming n = 1000, by the central limit theorem Sn has approximately the normal distribution with mean µ(Sn) = 3500 and standard deviation σ(Sn) = √(35 × 1000/12), which is approximately 54. Therefore, if we simulate the experiment of tossing a die 1000 times, repeat the experiment a number of times (say 500) and plot a histogram of the result, what we obtain should be approximated by the function

f (x) = (1/(σ√(2π))) e^{−(1/2)((x − µ)/σ)²},

where µ = 3500 and σ = 54.

Figure 2: Comparison between the sample distribution given by the stem plot and the normal distribution for the experiment of tossing a die 1000 times and counting the total number of pips.

Example 3.2 (Die tossing II) We would like to compute the probability that after tossing a die 1000 times, one obtains more than 150 6s. Here, we consider the random variable Xi, i = 1, . . . , 1000, taking values in {0, 1}, with P (Xi = 1) = 1/6. (Xi = 1 represents the event of getting a 6 in the i-th toss.) Writing Sn = X1 + · · · + Xn, we wish to compute the probability P (S1000 > 150). Each Xi has mean p = 1/6 and standard deviation √((1 − p)p). By the central limit theorem, we approximate the probability distribution of Sn by a normal distribution with mean µ = 1000p and standard deviation σ = √(1000(1 − p)p). This is approximately µ = 166.67 and σ = 11.79. Now, the distribution of (Sn − µ)/σ is approximately the standard normal, so we can write

P (Sn > 150) = P ((Sn − µ)/σ > (150 − µ)/σ) = P ((Sn − µ)/σ > −1.4139) ∼ (1/√(2π)) ∫_{−1.41}^{∞} e^{−z²/2} dz = 0.9215.

The integral above was evaluated numerically by a simple Riemann sum over the interval [−1.41, 10] and step-size 0.01. We conclude that the probability of obtaining at least 150 6s in 1000 tosses is approximately 0.92.

4 Some Simulation Techniques

Up until now we have mostly done simulations of random variables with a finite number of possible values. In this section we explore a few ideas for simulating continuous random variables.

Suppose we have a continuous random variable X with probability density function f (x) and we wish to evaluate the expectation E[g(X)], for some function g(x). This requires evaluating the integral

E[g(X)] = ∫_{−∞}^{∞} g(x) f (x) dx.

If the integral is not tractable by analytical or standard numerical methods, one can approach it by simulating realizations x1, x2, . . . , xn of X and, provided that the variance of g(X) is finite, one can apply the law of large numbers to obtain the approximation

E[g(X)] ∼ (1/n) ∑_{i=1}^{n} g(xi).

This is the basic idea behind Monte-Carlo integration.

It may happen that we cannot simulate realizations of X, but we can simulate realizations y1, y2, . . . , yn of a random variable Y with probability density function h(x), which is related to X in that h(x) is not 0 unless f (x) is 0. In this case we can write

E[g(X)] = ∫_{−∞}^{∞} g(x) f (x) dx = ∫_{−∞}^{∞} (g(x) f (x)/h(x)) h(x) dx = E[g(Y ) f (Y )/h(Y )] ∼ (1/n) ∑_{i=1}^{n} g(yi) f (yi)/h(yi).

The above procedure is known as importance sampling.
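As a small, hypothetical illustration of the importance-sampling identity above, take g(x) = x and let f be the density (1/2) sin(x) on [0, π] used later in these notes, with proposal density h(x) = 1/π on [0, π]; the exact value of E[g(X)] is π/2. This is a sketch of the idea only, not code from the notes.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Importance sampling sketch: estimate E[X] for the density
%f(x)=(1/2)sin(x) on [0,pi] by sampling from h(x)=1/pi on [0,pi].
n=100000;
y=pi*rand(1,n);               %samples from h
w=((1/2)*sin(y))./(1/pi);     %weights f(y)/h(y)
sum(y.*w)/n                   %should be close to pi/2 = 1.5708
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%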

4.1 Uniform random number generation

We have already looked at simulating uniform random numbers in Lecture Notes 1; we review the main idea here. We need now ways to simulate realizations of random variables. This is usually done by number theoretic methods, the simplest of which are the linear congruential algorithms. Such an algorithm begins by setting an integer seed, u0, and then generating a sequence of new integer values by some deterministic rule of the type

u_{n+1} = K u_n + b (mod M),

where M and K are integers. Then x = u/M gives a pseudo-random number in [0, 1]. Typically b is set to 0, in which case the pseudo-random number generator is called a multiplicative congruential generator. Since only finitely many different numbers occur, the modulus M should be chosen as large as possible, and to prevent cycling with a period less than M the multiplier K should be taken relatively prime to M. A good choice of parameters is K = 7⁵ and M = 2³¹ − 1. We may return to this topic later and discuss statistical tests for evaluating the quality of such pseudo-random numbers. For now, we will continue to take for granted that this is a good way to simulate uniform random numbers over [0, 1], that is, a random variable uniformly distributed over [0, 1]. If we want to simulate a random point in the square [0, 1] × [0, 1], we can naturally do it by picking a pair of independent random variables X1, X2 with the uniform distribution on [0, 1]. (Similarly for cubes [0, 1]ⁿ in any dimension n.)

Example 4.1 (Area of a disc by Monte Carlo) As a simple illustration of the Monte Carlo integration method, suppose we wish to find the area of a disc of radius 1. The disc is inscribed in the square S = [−1, 1] × [−1, 1]. If we pick a point from S at random with the uniform distribution, then the probability that it will be from the disc is p = π/4, which is the ratio of the area of the disc by the area of the square S. The following program simulates this experiment (500000 random points) and estimates a = 4p = π. It gave the value a = 3.1403.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed',121)
n=500000;
X=2*rand(n,2)-1;
a=4*sum(X(:,1).^2 + X(:,2).^2<=1)/n
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
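For completeness, a bare-bones multiplicative congruential generator of the kind described at the beginning of this subsection might look as follows. This is an illustration only (in practice we simply use rand); the parameters are the ones suggested above, and the seed is arbitrary.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%A minimal multiplicative congruential generator (illustration only).
K=7^5; M=2^31-1;
u=123456789;            %arbitrary integer seed
x=zeros(1,10);
for i=1:10
    u=mod(K*u,M);       %u_{n+1} = K*u_n (mod M)
    x(i)=u/M;           %pseudo-random number in (0,1)
end
x
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%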

Figure 3: Simulation of 1000 random points on the square [−1, 1]².

The above example should prompt the question: How do we estimate the error involved in, say, our calculation of π, and how do we determine the number of random points needed for a given precision? First, consider the probability space S = [−1, 1]² with the uniform distribution, that is, with probability measure given by

P (E) = (1/4) ∬_E dx dy,

and the random variable D : S → {0, 1} which is 1 for a point in the disc and 0 for a point in the complement of the disc. The expected value of D is µ = π/4 and the variance is easily calculated to be σ² = µ(1 − µ) = π(4 − π)/16. If we draw n independent points on the square, and call the outcomes D1, D2, . . . , Dn, then the fraction of points in the disc is given by the random variable

D̄n = (D1 + · · · + Dn)/n.

As we have already seen (proposition 1.5), D̄n must have mean value µ and standard deviation σ/√n. One way to estimate the error in our calculation is to ask for the probability that |D̄n − µ| is bigger than Kσ/√n. Equivalently, fix a positive number K; we ask for the probability P (|Zn| ≥ K), where

Zn = (D̄n − µ)/(σ/√n).

This probability can now be estimated using the central limit theorem. Recall that the probability distribution density of Zn, for big n, is very nearly a standard normal distribution. Thus

P (|D̄n − µ| ≥ Kσ/√n) ∼ (2/√(2π)) ∫_{K}^{∞} e^{−z²/2} dz.

If we take K = 3, the integral on the right-hand side is approximately 1.6 × 10⁻⁵. Using n = 500000, we obtain P (|D̄n − µ| ≥ 0.0017) ∼ 1.6 × 10⁻⁵. In other words, the probability that our simulated D̄n, for n = 500000, does not lie in an interval around the true value µ of radius 0.0017 is approximately 1.6 × 10⁻⁵. This assumes, of course, that our random numbers generator is ideal. Deviations from this error bound can serve as a quality test for the generator.

4.2 Transformation methods

We indicate by U ∼ U(0, 1) a random variable that has the uniform distribution over [0, 1]. Our problem now is to simulate realizations of another random variable X with probability density function f (x). Let

F (x) = ∫_{−∞}^{x} f (s) ds

denote the cumulative distribution function of X. Assume that F is invertible, and denote by G : (0, 1) → R the inverse function G = F⁻¹.

Proposition 4.1 (Inverse distribution method) Let U ∼ U(0, 1) and G : (0, 1) → R the inverse of the cumulative distribution function with PDF f (x). Set X = G(U). Then X has cumulative distribution function F (x).

Proof. Notice that F : R → (0, 1). Set X = F⁻¹(U). Then, using U ∼ U(0, 1),

P (X ≤ x) = P (F⁻¹(U) ≤ x) = P (U ≤ F (x)) = FU (F (x)) = F (x). □

Example 4.2 The random variable X = a + (b − a)U clearly has the uniform distribution on [a, b] if U ∼ U(0, 1). This follows from the proposition since

F (x) = (x − a)/(b − a),

which has inverse function F⁻¹(u) = a + (b − a)u.

Example 4.3 (Exponential random variables) If U ∼ U(0, 1) and λ > 0, then

X = −(1/λ) log(U)

has an exponential distribution with parameter λ. In fact, an exponential random variable has PDF

f (x) = λ e^{−λx},

and its cumulative distribution function is easily obtained by explicit integration: F (x) = 1 − e^{−λx}. Therefore,

F⁻¹(u) = −(1/λ) log(1 − u).

But 1 − U ∼ U(0, 1) if U ∼ U(0, 1), so we have the claim.

Figure 4: Stem plot of relative frequencies of 5000 independent realizations of an exponential random variable with parameter 1. Superposed to it is the graph of frequencies given by the exact density e^{−x}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=exponential(lambda,n)
%Simulates n independent realizations of a
%random variable with the exponential
%distribution with parameter lambda.
y=-log(rand(1,n))/lambda;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Here are the commands used to produce figure 4.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
y=exponential(1,5000);
[n,xout]=hist(y,40);
stem(xout,n/5000)
grid
hold on
dx=xout(2)-xout(1);
fdx=exp(-xout)*dx;
plot(xout,fdx)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

4.3 Lookup methods

This is the discrete version of the inverse transformation method. Suppose one is interested in simulating a discrete random variable X with sample space S = {0, 1, 2, . . . } or a subset of it. Write pk = P (X = k). Some, possibly infinitely many, of the pk may be zero. Now define

qk = P (X ≤ k) = ∑_{i=0}^{k} pi.

Let U ∼ U(0, 1) and write X = min{k : qk ≥ U}. Then X has the desired probability distribution. In fact, we must necessarily have q_{k−1} < U ≤ q_k for some k and so

P (X = k) = P (U ∈ (q_{k−1}, q_k]) = q_k − q_{k−1} = p_k.

4.4 Scaling

We have already seen that if Y = aX + b for a non-zero a, then

fY (y) = (1/|a|) f ((y − b)/a).

Thus, for example, if we can simulate a random variable Z with the standard normal distribution, then X = σZ + µ will be a normal random variable with mean µ and standard deviation σ. Similarly, if X is exponentially distributed with parameter 1, then Y = X/λ is exponentially distributed with parameter λ.

4.5 The uniform rejection method

The methods of this and the next subsection are examples of the rejection sampler method. Suppose we want to simulate a random variable with PDF f (x) such that f (x) is zero outside of the interval [a, b] and f (x) ≤ L for all x. Choose X ∼ U(a, b) and Y ∼ U(0, L) independently. If Y < f (X), accept X as the simulated value we want, i.e., take that X for which Y < f (X) as the output of the algorithm and call it X. If the acceptance condition is not satisfied, try again enough times until it holds. This procedure is referred to as the uniform rejection method for density f (x).

As above, we denote by (X, Y ) a random variable uniformly distributed on [a, b] × [0, L]. Let A represent the region in [a, b] × [0, L] consisting of points (x, y) such that y < f (x). We call A the acceptance region.

Proposition 4.2 (Uniform rejection method) The random variable X produced by the uniform rejection method for density f (x) has probability distribution function f (x).

Proof. Let F (x) = P (X ≤ x) denote the cumulative distribution function of X. We wish to show that F (x) = ∫_a^x f (s) ds. This is a consequence of the following calculation, which uses the continuous version of the total probability formula and the key fact: P ((X, Y ) ∈ A | X = s) = f (s)/L.

F (x) = P (X ≤ x) = P (X ≤ x | (X, Y ) ∈ A)
      = P ({X ≤ x} ∩ {(X, Y ) ∈ A}) / P ((X, Y ) ∈ A)
      = [ (1/(b − a)) ∫_a^b P ({X ≤ x} ∩ {(X, Y ) ∈ A} | X = s) ds ] / [ (1/(b − a)) ∫_a^b P ((X, Y ) ∈ A | X = s) ds ]
      = [ ∫_a^x P ((X, Y ) ∈ A | X = s) ds ] / [ ∫_a^b P ((X, Y ) ∈ A | X = s) ds ]
      = [ ∫_a^x (f (s)/L) ds ] / [ ∫_a^b (f (s)/L) ds ]
      = ∫_a^x f (s) ds. □

It is clear that the efficiency of the rejection method will depend on the probability that a random point (X, Y ) will fall in the acceptance region A.

This probability can be estimated as follows:

P (accept) = P ((X, Y ) ∈ A) = (1/(b − a)) ∫_a^b P ((X, Y ) ∈ A | X = s) ds = (1/((b − a)L)) ∫_a^b f (s) ds = 1/((b − a)L).

If this number is too small, the procedure will be inefficient.

Example 4.4 (Uniform rejection method) Consider the probability density function

f (x) = (1/2) sin(x)

over the interval [0, π]. We wish to simulate a random variable X with PDF f (x) using, first, the uniform rejection method. The following program does that.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x=samplefromsine(n)
%Simulates n independent realizations of a random
%variable with PDF (1/2)sin(x) over the interval
%[0,pi].
x=[];
for i=1:n
    U=0;
    Y=1/2;
    while Y>=(1/2)*sin(U)
        U=pi*rand;
        Y=(1/2)*rand;
    end
    x=[x U];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The probability of acceptance is 2/π, or approximately 0.64. The following Matlab commands can be used to obtain figure 5:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
y=samplefromsine(5000);
[n,xout]=hist(y,40);
stem(xout,n/5000)
grid
hold on
dx=xout(2)-xout(1);
fdx=(1/2)*sin(xout)*dx;
plot(xout,fdx)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Figure 5: Stem plot of the relative frequencies of 5000 independent realizations of a simulated random variable with PDF 0.5 sin(x) over [0, π]. The exact frequencies are superposed for comparison.

4.6 The envelope method

One limitation of the uniform rejection method is the requirement that the PDF f (x) be 0 on the complement of a finite interval [a, b]. A more general procedure, called the envelope method for f (x), can sometimes be used when the uniform rejection method does not apply. Suppose that we wish to simulate a random variable with PDF f (x) and that we already know how to simulate a second random variable Y with PDF g(x) having the property that f (x) ≤ ag(x) for some positive a and all x. Note that a ≥ 1 since the total integral of both f (x) and g(x) is 1.

Now consider the following algorithm. Draw a realization of Y with the distribution density g(y) and then draw a realization of U with the uniform distribution U(0, ag(y)). Repeat the procedure until a pair (Y, U) such that U < f (Y ) is obtained. Then set X equal to the obtained value of Y. In other words, simulate a value from the distribution g(y) and accept this value with probability f (y)/(ag(y)); otherwise reject and try again. The method will work more efficiently if the acceptance rate is high. The overall acceptance probability is P (U < f (Y )). It is not difficult to calculate this probability as we did in the case of the uniform rejection method. (Simply apply the integral form of the total probability formula.) The result is

P (accept) = 1/a.

Proposition 4.3 (The envelope method) The envelope method for f (x) described above simulates a random variable X with probability distribution f (x).

Proof. The argument is essentially the same as for the uniform rejection method. Note now that P (U ≤ f (Y ) | Y = s) = f (s)/(ag(s)). Then we have:

F (x) = P (X ≤ x) = P (Y ≤ x | U ≤ f (Y ))
      = P ({Y ≤ x} ∩ {U ≤ f (Y )}) / P (U ≤ f (Y ))
      = [ ∫_{−∞}^{∞} P ({Y ≤ x} ∩ {U ≤ f (Y )} | Y = s) g(s) ds ] / [ ∫_{−∞}^{∞} P (U ≤ f (Y ) | Y = s) g(s) ds ]
      = [ ∫_{−∞}^{x} P (U ≤ f (Y ) | Y = s) g(s) ds ] / [ ∫_{−∞}^{∞} P (U ≤ f (Y ) | Y = s) g(s) ds ]
      = [ ∫_{−∞}^{x} (f (s)/a) ds ] / [ ∫_{−∞}^{∞} (f (s)/a) ds ]
      = ∫_{−∞}^{x} f (s) ds. □

Example 4.5 (Envelope method) This is the same as the previous example, but we now approach the problem via the envelope method. We wish to simulate a random variable X with PDF (1/2) sin(x) over [0, π]. We first simulate a random variable Y with probability density g(y), where

g(y) = (4/π²) y if y ∈ [0, π/2],  g(y) = (4/π²)(π − y) if y ∈ [π/2, π].

To simulate the random variable Y, note that g(x) = (h ∗ h)(x), where h(x) = 2/π over [0, π/2]. Therefore, we can take Y = V1 + V2, where the Vi are identically distributed uniform random variables over [0, π/2]. Notice that f (x) ≤ ag(x) for a = π²/8. Therefore, the envelope method will have probability of acceptance 1/a = 8/π², or approximately 0.81. The following program does that, using the envelope method.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x=samplefromsine2(n)
%Simulates n independent realizations of a random
%variable with PDF (1/2)sin(x) over the interval
%[0,pi], using the envelope method.
x=[];
for i=1:n
    U=1/2;
    Y=0;
    while U>=(1/2)*sin(Y)
        Y=(pi/2)*sum(rand(1,2));
        U=(pi^2/8)*((2/pi)-(4/pi^2)*abs(Y-pi/2))*rand;
    end
    x=[x Y];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

5 Standard Probability Distributions

We study here a number of the more commonly occurring probability distributions. They are associated with basic types of random experiments that often serve as building blocks for more complicated probability models. Among the most important for our later study are the normal, the exponential, and the Poisson distributions.

5.1 The discrete uniform distribution

A random variable X is discrete uniform on the numbers 1, 2, . . . , n, written X ∼ DU(n), if it takes values in the set S = {1, 2, . . . , n} and each of the possible values is equally likely to occur, that is,

P (X = k) = 1/n,  k ∈ S.

The cumulative distribution function of X is, therefore,

P (X ≤ k) = k/n,  k ∈ S.

The expectation of a discrete uniform random variable X is easily calculated:

E[X] = ∑_{k=1}^{n} k/n = n(n + 1)/(2n) = (n + 1)/2.

The variance is similarly calculated. Its value is

Var(X) = (n² − 1)/12.

The following is a simple way to simulate a DU(n) random variable.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=discreteuniform(n,m)
%Simulates m independent samples of a DU(n) random variable.
y=[];
for i=1:m
    y=[y ceil(n*rand)];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

5.2 The binomial distribution

Given a positive integer n and an integer k between 0 and n, recall that the binomial coefficient is defined by

C(n, k) = n!/(k!(n − k)!).

It gives the number of ways to pick k elements in a set of n elements, independent of the order in which they occur. We often read C(n, k) as "n choose k."

The binomial distribution is the distribution of the number of "successful" outcomes in a series of n independent trials, each with a probability p of "success" and 1 − p of "failure." If the total number of successes is denoted X, we write X ∼ B(n, p) to indicate that X is a binomial random variable for n independent trials and success probability p. The sample space for a binomial random variable X is S = {0, 1, . . . , n}. The probability of k successes followed by n − k failures is p^k (1 − p)^{n−k}; indeed, this is the probability of any sequence of n outcomes with k success trials. There are C(n, k) such sequences, so the probability of k successes is

P (X = k) = C(n, k) p^k (1 − p)^{n−k}.

Thus, if Z1, . . . , Zn are independent random variables taking values in {0, 1}, with P (Zi = 1) = p and P (Zi = 0) = 1 − p, then X = Z1 + · · · + Zn is a B(n, p) random variable. The expectation and variance of the binomial distribution are easily obtained:

E[X] = np,  Var(X) = np(1 − p).

Example 5.1 (Urn problem) An urn contains N balls, of which K are black and N − K are red. Let p = K/N. We draw with replacement n balls and count the number X of black balls drawn. Then X ∼ B(n, p).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=binomial(n,p,m)
%Simulates drawing m independent samples of a
%binomial random variable B(n,p).
y=sum(rand(n,m)<=p);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

5.3 The multinomial distribution

Consider the following random experiment. An urn contains balls of colors c1, c2, . . . , cr, which can be drawn with probabilities p1, p2, . . . , pr. Suppose we draw n balls with replacement and register the number Xi, for i = 1, 2, . . . , r, that a ball of color ci was drawn. Note that X1 + · · · + Xr = n. The vector X = (X1, X2, . . . , Xr) is then said to have the multinomial distribution, and we write X ∼ M(n, p1, . . . , pr). More explicitly, the multinomial distribution assigns probabilities

P (X1 = k1, . . . , Xr = kr) = (n!/(k1! · · · kr!)) p1^{k1} · · · pr^{kr}.

Binomial random variables correspond to the special case r = 2. Note that if X = (X1, . . . , Xr) ∼ M(n, p1, . . . , pr), then each Xi can be interpreted as the number of successes in n trials, each of which has probability pi of success and 1 − pi of failure. Therefore, Xi is a binomial random variable B(n, pi).

Example 5.2 Suppose that 100 independent observations are taken from a uniform distribution on [0, 1]. We partition the interval into 10 equal subintervals (bins), and record the numbers X1, . . . , X10 of observations that fall in each bin. The information is then represented as a histogram, or bar graph, in which the bar over the bin labeled by i equals Xi. Therefore, the histogram itself can be viewed as a random variable with the multinomial distribution, where n = 100 and pi = 1/10 for i = 1, 2, . . . , 10.

The following program simulates one sample draw of a random variable with the multinomial distribution.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=multinomial(n,p)
%Simulates drawing a sample vector y=[y1,...,yr]
%with the multinomial distribution M(n,p),
%where p=[p1,...,pr] is a probability vector.
r=length(p);
x=rand(n,1);
A=zeros(n,1);
a=0;
for i=1:r
    A=A+i*(a<=x & x<a+p(i));
    a=a+p(i);
end
y=zeros(1,r);
for j=1:r
    y(j)=sum(A==j);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

5.4 The geometric distribution

As in the binomial distribution, consider a sequence of trials with a success/fail outcome and probability p of success. Let X be the number of independent trials until the first success is encountered. Then X is said to have the geometric distribution, denoted X ∼ Geom(p). In other words, X is the waiting time until the first success. The sample space is S = {1, 2, 3, . . . }. In order to have X = k, there must be a sequence of k − 1 failures followed by one success. Therefore,

P (X = k) = (1 − p)^{k−1} p.

Another way to describe a geometric random variable X is as follows. Let X1, X2, . . . be independent random variables with values in {0, 1} such that P (Xi = 1) = p and P (Xi = 0) = 1 − p. Then X = min{n ≥ 1 : Xn = 1} ∼ Geom(p). In fact, as the Xi are independent, we have

P (X = n) = P ({X1 = 0} ∩ {X2 = 0} ∩ · · · ∩ {X_{n−1} = 0} ∩ {Xn = 1}) = P (X1 = 0)P (X2 = 0) · · · P (X_{n−1} = 0)P (Xn = 1) = (1 − p)^{n−1} p.

The expectation of a geometrically distributed random variable is calculated as follows:

E[X] = ∑_{i=1}^{∞} i P (X = i) = ∑_{i=1}^{∞} i (1 − p)^{i−1} p = p/(1 − (1 − p))² = 1/p.

Similarly:

E[X²] = ∑_{i=1}^{∞} i² P (X = i) = ∑_{i=1}^{∞} i² (1 − p)^{i−1} p = p (1 + (1 − p))/(1 − (1 − p))³ = (2 − p)/p²,

from which we obtain the variance:

Var(X) = E[X²] − E[X]² = (2 − p)/p² − 1/p² = (1 − p)/p².

We have used above the following formulas:

∑_{i=1}^{∞} a^{i−1} = 1/(1 − a),  ∑_{i=1}^{∞} i a^{i−1} = 1/(1 − a)²,  ∑_{i=1}^{∞} i² a^{i−1} = (1 + a)/(1 − a)³,

as well as the formulas

∑_{i=1}^{n} i = n(n + 1)/2,  ∑_{i=1}^{n} i² = (1/6) n(n + 1)(2n + 1).

Example 5.3 (Waiting for a six) How long should we expect to have to wait to get a 6 in a sequence of die tosses? Let X denote the number of tosses until 6 appears for the first time. Then the probability that X = k is

P (X = k) = (5/6)^{k−1} (1/6).

In other words, we have k − 1 failures, each with probability 5/6, until a success, with probability 1/6. The expected value of X is

∑_{k=1}^{∞} k P (X = k) = ∑_{k=1}^{∞} k (5/6)^{k−1} (1/6) = 6.

So on average we need to wait 6 tosses to get a 6.

The following program simulates one sample draw of a random variable with the Geom(p) distribution.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=geometric(p)
%Simulates one draw of a geometric
%random variable with parameter p.
y=0;
a=0;
while a==0
    y=y+1;
    a=(rand<p);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
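Using the function geometric(p) above, the conclusion of Example 5.3 can be checked numerically; the number of repetitions below is arbitrary.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Check Example 5.3: the average waiting time for a six is about 6.
m=10000;
w=zeros(1,m);
for i=1:m
    w(i)=geometric(1/6);
end
mean(w)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%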

5.5 The negative binomial distribution

A random variable X has the negative binomial distribution, also called the Pascal distribution, denoted X ∼ NB(n, p), if there exist an integer n ≥ 1 and a real number p ∈ (0, 1) such that

P (X = n + k) = C(n + k − 1, k) p^n (1 − p)^k,  k = 0, 1, 2, . . . .

The negative binomial distribution has the following interpretation.

Proposition 5.1 Let X1, . . . , Xn be independent Geom(p) random variables. Then X = X1 + · · · + Xn has the negative binomial distribution with parameters n and p.

Therefore, to simulate a negative binomial random variable all we need is to simulate n independent geometric random variables, then add them up. It also follows from this proposition that

E[X] = n/p,  Var(X) = n(1 − p)/p².

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=negbinomial(n,p)
%Simulates one draw of a negative binomial
%random variable with parameters n and p.
y=0;
for i=1:n
    a=0;
    u=0;
    while a==0
        u=u+1;
        a=(rand<p);
    end
    y=y+u;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

5.6 The Poisson distribution

A Poisson random variable X with parameter λ, denoted X ∼ Po(λ), is a random variable with sample space S = {0, 1, 2, . . . } such that

P (X = k) = (λ^k/k!) e^{−λ},  k = 0, 1, 2, . . . .

This is a very ubiquitous distribution and we will encounter it many times in the course.

The expectation and variance of a Poisson random variable are easily calculated from the definition or by the limit of the corresponding quantities for the binomial distribution. The result is:

E[X] = λ,  Var(X) = λ.

One way to think about the Poisson distribution is as the limit of a binomial distribution B(n, p) as n → ∞, p → 0, while λ = np = E[X] remains constant. In fact, replacing p by λ/n in the binomial distribution gives

P (X = k) = C(n, k) (λ/n)^k (1 − λ/n)^{n−k} = (λ^k/k!) · [n(n − 1)(n − 2) · · · (n − k + 1)/n^k] · (1 − λ/n)^n/(1 − λ/n)^k → (λ^k/k!) e^{−λ}.

Notice that we have used the limit (1 − λ/n)^n → e^{−λ}.

One noteworthy property of Poisson random variables is that, if X ∼ Po(λ) and Y ∼ Po(µ) are independent Poisson random variables, then Z = X + Y ∼ Po(λ + µ).

A numerical example may help clarify the meaning of the Poisson distribution. Consider the interval [0, 1] partitioned into a large number, n, of subintervals of equal length: [0, 1/n), [1/n, 2/n), . . . , [(n − 1)/n, 1]. To each subinterval we randomly assign a value 1 with a small probability λ/n (for a fixed λ) and 0 with probability 1 − λ/n. Let X be the number of 1s. Then, for large n, the random variable X is approximately Poisson with parameter λ. The following Matlab script illustrates this procedure. It produces samples of a Poisson random variable with parameter λ = 3 over the interval [0, 1] and a graph that shows the positions where an event occurs.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Approximate Poisson random variable, X, with parameter lambda.
lambda=3;
n=500;
p=lambda/n;
x=1/n:1/n:1;
a=(rand(1,n)<p);
X=sum(a)
stem(x,a)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Figure 6: Poisson distributed events over the interval [0, 1] for λ = 3.

A sequence of times of occurrences of random events is said to be a Poisson process with rate λ if the number of observations, Nt, in any interval of length t is Nt ∼ Po(λt) and the numbers of events in disjoint intervals are independent of one another. This is a simple model for discrete events occurring continuously in time. The following is a function script for the arrival times over [0, T] of a Poisson process with parameter λ.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function a=poisson(lambda,T)
%Input  - lambda arrival rate
%       - T time interval, [0,T]
%Output - a arrival times in interval [0,T]
for i=1:1000
    z(i,1)=(1/lambda)*log(1/(1-rand(1,1)));   %interarrival times
    if i==1
        t(i,1)=z(i,1);
    else
        t(i,1)=t(i-1)+z(i,1);
    end
    if t(i)>T
        break
    end
end
M=length(t)-1;
a=t(1:M);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
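As a quick sanity check of poisson(lambda,T), the average number of arrivals in [0, T] should be close to λT; the sketch below assumes the function above is available on the Matlab path, and the parameter values are arbitrary.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Check that the mean number of arrivals in [0,T] is about lambda*T.
lambda=3; T=10; m=1000;
N=zeros(1,m);
for i=1:m
    N(i)=length(poisson(lambda,T));
end
mean(N)        %should be close to lambda*T = 30
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%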

The binomial, the multinomial and the Poisson distributions arise when one wants to count the number of successes in situations that generally correspond to drawing from a population with replacement. The hypergeometric distribution arises when the experiment involves drawing without replacement.

5.7 The hypergeometric distribution

A random variable X is said to have a hypergeometric distribution if there exist positive integers r, n and m such that for any k = 0, 1, 2, . . . , n we have

P (X = k) = C(r, k) C(n − r, m − k) / C(n, m).

Example 5.4 (Drawing without replacement) An urn contains n balls, r of which are black and n − r are red. We draw from the urn without replacement m balls and denote by X the number of black balls among them. Then X has the hypergeometric distribution with parameters r, n and m.

The expectation and variance of a random variable X having the hypergeometric distribution are given by

E[X] = np,  Var(X) = npq (N − n)/(N − 1),

where N is the population size, n the number of balls drawn, p the proportion of black balls in the population, and q = 1 − p.

Example 5.5 (Capture/recapture) The capture/recapture method is sometimes used to estimate the size of a wildlife population. Suppose that 10 animals are captured, tagged, and released. On a later occasion, 20 animals are captured, and it is found that 4 of them are tagged. How large is the population? We assume that there are n animals in the population. The number n cannot be precisely determined from the given information, but it can be estimated using the maximum likelihood method. The idea is to estimate the value of n as the value that makes the observed outcome (X = 4 in this example) most probable. If the 20 animals captured later are taken in such a way that all the n-choose-20 possible groups are equally likely, then the probability that 4 of them are tagged is

L(n) = C(10, 4) C(n − 10, 16) / C(n, 20).

In other words, we estimate the population size to be the value n that maximizes L(n). It is left as an exercise to check that L(n)/L(n − 1) > 1 if and only if n < 50; therefore, the maximum is attained for n = 50, and L(n) stops growing from n = 50 on. This number serves as our estimate of the population size in the sense that it is the value that maximizes the likelihood of the outcome X = 4.
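The maximization of L(n) in Example 5.5 can also be carried out numerically. The sketch below works with log L(n), dropping the terms that do not depend on n, and uses Matlab's gammaln; the range of n searched is arbitrary.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Maximize the capture/recapture likelihood L(n) of Example 5.5.
%log C(n-10,16) - log C(n,20), up to constants not depending on n.
nvals=30:200;
logL=gammaln(nvals-9)-gammaln(nvals-25)-gammaln(nvals+1)+gammaln(nvals-19);
[~,i]=max(logL);
nvals(i)        %returns 50
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%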

5.8 The uniform distribution

A random variable X has a uniform distribution over the range [a, b], written X ∼ U(a, b), if the PDF is given by

fX (x) = 1/(b − a) if a ≤ x ≤ b,  fX (x) = 0 otherwise.

If x ∈ [a, b], then

FX (x) = ∫_{−∞}^{x} fX (y) dy = ∫_{a}^{x} fX (y) dy = (x − a)/(b − a).

Therefore

FX (x) = 0 if x < a,  (x − a)/(b − a) if a ≤ x ≤ b,  1 if x > b.

The expectation and variance are easily calculated:

E[X] = (a + b)/2,  Var(X) = (b − a)²/12.

5.9 The exponential distribution

A random variable X has an exponential distribution with parameter λ > 0, written X ∼ Exp(λ), if it has the PDF

fX (x) = 0 if x < 0,  fX (x) = λ e^{−λx} if x ≥ 0.

The cumulative distribution function is, therefore, given by

FX (x) = 0 if x < 0,  FX (x) = 1 − e^{−λx} if x ≥ 0.

Expectation and variance are given by

E[X] = 1/λ,  Var(X) = 1/λ².

Exponential random variables often arise as random times; we will see them very often. The following propositions contain some of their most notable properties.

The first property, the memoryless property, can be interpreted as follows. Suppose that something has a random life span which is exponentially distributed with parameter λ. (For example, the atom of a radioactive element.) Then, having survived by time t, the probability of surviving for an additional time s is the same probability it had initially to survive for a time s. Thus the system does not keep any memory of the passage of time. To put it differently, if an entity has a life span that is exponentially distributed, its death cannot be due to an "aging mechanism" since, having survived for a time t, the chances of it surviving an extra time s are the same as the chances that it would have survived to time s from the very beginning. More precisely, we have the following proposition.

Proposition 5.2 (Memoryless property) If X ∼ Exp(λ), then we have

P (X > s + t | X > t) = P (X > s)

for any s, t ≥ 0.

Proof.

P (X > s + t | X > t) = P ({X > s + t} ∩ {X > t}) / P (X > t) = P (X > s + t)/P (X > t) = (1 − FX (s + t))/(1 − FX (t)) = e^{−λ(s+t)}/e^{−λt} = e^{−λs} = 1 − FX (s) = P (X > s). □
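The memoryless property is easy to observe empirically; the sketch below estimates both sides of the identity for λ = 1 and the arbitrary choices s = 1, t = 2.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Empirical check of the memoryless property for X~Exp(1), s=1, t=2.
n=1000000;
X=-log(rand(1,n));
s=1; t=2;
sum(X>s+t)/sum(X>t)   %estimates P(X>s+t | X>t)
sum(X>s)/n            %estimates P(X>s); both near exp(-1)=0.3679
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%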

The next proposition states that the time to the first event of a Poisson process with rate λ is exponentially distributed with parameter λ.

Proposition 5.3 Consider a Poisson process with rate λ. Let T be the time to the first event (after 0). Then T ∼ Exp(λ).

Proof. Let Nt be the number of events in the interval (0, t] (for given fixed t > 0). Then Nt ∼ Po(λt). Consider the cumulative distribution function of T:

FT (t) = P (T ≤ t) = 1 − P (T > t) = 1 − P (Nt = 0) = 1 − (λt)⁰ e^{−λt}/0! = 1 − e^{−λt}.

This is the distribution function of an Exp(λ) random variable, so T ∼ Exp(λ). □

So the time of the first event of a Poisson process is an exponential random variable. Using the independence properties of the Poisson process, it should be clear (more details later) that the time between any two such events has the same exponential distribution. Thus the times between events of the Poisson process are exponential.

There is another way of thinking about the Poisson process that this result suggests. For a small time h we have

P (T ≤ h)/h = (1 − e^{−λh})/h = (1 − (1 − λh))/h + O(h²)/h → λ

as h → 0. So for very small h, P (T ≤ h) is approximately λh and, due to the independence property of the Poisson process, this is the probability for any time interval of length h. The Poisson process can therefore be thought of as a process with constant event "hazard" λ, where the "hazard" is essentially a measure of event density on the time axis. The exponential distribution with parameter λ can therefore also be reinterpreted as the time to an event of constant hazard λ.

The next proposition describes the distribution of the minimum of a collection of independent exponential random variables.

Proposition 5.4 Let Xi ∼ Exp(λi), i = 1, 2, . . . , n, be independent random variables, and define X0 = min{X1, X2, . . . , Xn}. Then X0 ∼ Exp(λ0), where λ0 = λ1 + λ2 + · · · + λn.

Proof. First note that for X ∼ Exp(λ) we have P (X > x) = e^{−λx}. Then

P (X0 > x) = P (min_i{Xi} > x) = P ({X1 > x} ∩ {X2 > x} ∩ · · · ∩ {Xn > x}) = ∏_{i=1}^{n} P (Xi > x) = ∏_{i=1}^{n} e^{−λi x} = e^{−x(λ1 + · · · + λn)} = e^{−λ0 x}. □

Proposition 5.5 Suppose that X ∼ Exp(λ) and Y ∼ Exp(µ) are independent random variables. Then P (X < Y ) = λ/(λ + µ).

Proof.

P (X < Y ) = ∫_0^∞ P (X < Y | Y = y) f (y) dy = ∫_0^∞ P (X < y) µ e^{−µy} dy = ∫_0^∞ (1 − e^{−λy}) µ e^{−µy} dy = λ/(λ + µ). □

The next result gives the likelihood of a particular exponential random variable of an independent collection being the smallest.

Proposition 5.6 Let Xi ∼ Exp(λi), i = 1, 2, . . . , n, be independent random variables and let J be the index of the smallest of the Xi. Then J is a discrete random variable with probability mass function

P (J = i) = λi/λ0,

where λ0 = λ1 + · · · + λn.

Proof. For each j, define the random variable Y = min_{k≠j}{Xk} and set λ_{−j} = λ0 − λj. Then, by propositions 5.4 and 5.5,

P (J = j) = P (Xj < min_{k≠j}{Xk}) = P (Xj < Y ) = λj/(λj + λ_{−j}) = λj/λ0. □
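Propositions 5.4 and 5.6 can be checked by simulation as well; the sketch below uses the arbitrary rates λ1 = 1 and λ2 = 3, so that λ0 = 4 and P (J = 1) = 1/4.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Check Propositions 5.4 and 5.6 with lambda1=1, lambda2=3.
n=100000;
X1=-log(rand(1,n))/1;
X2=-log(rand(1,n))/3;
X0=min(X1,X2);
mean(X0)          %should be close to 1/(1+3)=0.25
sum(X1<X2)/n      %estimates P(J=1); should be close to 1/4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%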

From the formula for a linear transformation of a random variable we immediately have:

Proposition 5.7 Let X ∼ Exp(λ). Then, for α > 0, Y = αX has distribution Y ∼ Exp(λ/α).

5.10 The Erlang distribution

A continuous random variable X taking values in [0, ∞) is said to have the Erlang distribution if it has PDF

f (x) = λ(λx)^{n−1} e^{−λx}/(n − 1)!.

It can be shown that if T1, T2, . . . , Tn are independent random variables with a common exponential distribution with parameter λ, then

Sn = T1 + T2 + · · · + Tn

has the Erlang distribution with parameters n and λ. It follows from this claim that the expectation and variance of an Erlang random variable are given by

E[X] = n/λ,  Var(X) = n/λ².

5.11 The normal distribution

The normal, or Gaussian, distribution is one of the most important distributions in probability theory. One reason for this is that sums of random variables often approximately follow a normal distribution.

Definition 5.1 A random variable X has a normal distribution with parameters µ and σ², written X ∼ N(µ, σ²), if it has probability density function

fX (x) = (1/(σ√(2π))) exp{−(1/2)((x − µ)/σ)²}

for −∞ < x < ∞ and σ > 0.

Note that the PDF is symmetric about x = µ, so the median and mean of the distribution will be µ. Checking that the density integrates to 1 requires the well-known integral

∫_{−∞}^{∞} e^{−αx²} dx = √(π/α),  α > 0.

We leave this calculation and that of the variance as an exercise. The result is

E[X] = µ,  Var(X) = σ².

The random variable Z is said to have the standard normal distribution if Z ∼ N(0, 1). The density of Z, which is usually denoted φ(z), is given by

φ(z) = (1/√(2π)) exp(−z²/2)

for −∞ < z < ∞. The cumulative distribution function of a standard normal random variable is denoted Φ(z), and is given by

Φ(z) = ∫_{−∞}^{z} φ(x) dx.

There is no simple analytic expression for Φ(z) in terms of elementary functions.

Consider Z ∼ N(0, 1) and let X = µ + σZ. Then X ∼ N(µ, σ²), for σ > 0. Conversely, if X ∼ N(µ, σ²), then

Z = (X − µ)/σ ∼ N(0, 1).

But we know that fX (x) = (1/σ) φ((x − µ)/σ), from which the claim follows. It is also easily shown that the cumulative distribution function satisfies

FX (x) = Φ((x − µ)/σ),

and so the cumulative probabilities for any normal random variable can be calculated using the tables for the standard normal distribution.

The sum of normal random variables is also a normal random variable. This is shown in the following proposition. The elementary proof will be left as an exercise.

Proposition 5.8 If X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²) are independent normal random variables, then Y = X1 + X2 is also normal and Y ∼ N(µ1 + µ2, σ1² + σ2²).

Therefore, any linear combination of independent normal random variables is also a normal random variable. The mean and variance of the resulting random variable can then be calculated from the proposition.

One simple way to generate a normal random variable is to use the central limit theorem. Consider

Z = √(12/n) ∑_{i=1}^{n} (Ui − 1/2),

where the Ui are independent random variables with the uniform distribution on [0, 1]. Then Z has mean 0 and variance 1 and is approximately normal. This method is not very efficient since it requires sampling from the uniform distribution many times for a single realization of the normal distribution. In Matlab we can sample a standard normal random variable using the command randn, which has the same usage as rand.

5.12 The Box-Muller method

A more efficient procedure for simulating normal random variables is the so-called Box-Muller method. This is what we will typically use when sampling from a normal distribution. It consists of first simulating a uniform and an exponential random variable independently: Θ ∼ U(0, 2π) and R² ∼ Exp(1/2). Then

X1 = R cos(Θ),  X2 = R sin(Θ)

are two independent standard normal random variables. The following proposition is needed to justify this claim.

Proposition 5.9 Let X1 and X2 be random variables with values in R. Let R and Θ be the radius and angle expressing the vector valued random variable X = (X1, X2) in polar coordinates. Then X1, X2 are independent standard normal random variables if and only if R² and Θ are independent with R² ∼ Exp(1/2) and Θ ∼ U(0, 2π).

Proof. First assume that X1 and X2 are independent standard normal random variables. By independence, the PDF of the vector random variable X is the product of the respective PDFs:

fX (x1, x2) = f1(x1) f2(x2) = (1/(2π)) e^{−(x1² + x2²)/2}.

The PDFs of (X1, X2) and of (R, Θ) are related by the change of coordinates x1 = r cos(θ), x2 = r sin(θ). Using the general change of coordinate formula, we obtain

f_{(R,Θ)}(r, θ) = f_{X1,X2}(x1, x2) |∂(x1, x2)/∂(r, θ)| = (1/(2π)) e^{−r²/2} |det [ cos θ  −r sin θ ; sin θ  r cos θ ]| = (1/(2π)) · r e^{−r²/2}.

Therefore the PDF of (R, Θ) splits as a product of a (constant) function of θ and a function of r. This shows that R and Θ are independent, Θ ∼ U(0, 2π), and R has PDF r e^{−r²/2}. Applying the transformation formula again, this time for the function g(r) = r², we obtain that the PDF for R² is

f_{R²}(u) = (1/2) e^{−u/2}.

This shows that R² and Θ are as claimed. The converse is shown similarly. □

Figure 7: Simulation of 5000 realizations of a two dimensional standard normal random variable, using the Box-Muller method.

The following program implements the Box-Muller method. It uses the program exponential(lambda,n) to simulate an exponential random variable. The program is shown below.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=stdnormal2d(n)
%Simulates n realizations of two independent
%standard normal random variables (X1,X2) using
%the Box-Muller method. The output is a matrix
%of size n-by-2. Requires the function
%exponential(lambda,n) to simulate an exponential random variable.
theta=2*pi*rand(1,n);
r=sqrt(exponential(0.5,n));
x1=r.*cos(theta);
x2=r.*sin(theta);
y=[x1;x2]';
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

5.13 The χ²_n distribution

If Z1, . . . , Zn are independent standard normal random variables, then

    X = Σ_{i=1}^{n} Z_i²

has a χ²_n distribution.

5.14 The gamma distribution

The gamma function, Γ(x), is defined by the integral

    Γ(x) = ∫_{0}^{∞} y^{x−1} e^{−y} dy.

It is not difficult to show from the definition that Γ(1) = 1 and Γ(x + 1) = xΓ(x). If x = n is a positive integer, it follows that Γ(n + 1) = n!. Also worth noting, Γ(1/2) = √π.

A random variable X has a gamma distribution with parameters α, β, written X ∼ Γ(α, β), if it has PDF

    f(x) = (β^α / Γ(α)) x^{α−1} e^{−βx}   if x > 0,   and   f(x) = 0   if x ≤ 0.

Note that Γ(1, λ) = Exp(λ), so the gamma distribution is a generalization of the exponential distribution. It is also not difficult to show that if X ∼ Γ(α, β), then

    E[X] = α/β,   Var(X) = α/β².

We show how to compute E[X] and leave the variance as an exercise:

    E[X] = ∫_{0}^{∞} x f(x) dx
         = ∫_{0}^{∞} (β^α / Γ(α)) x^α e^{−βx} dx
         = (α/β) ∫_{0}^{∞} (β^{α+1} / (α Γ(α))) x^α e^{−βx} dx
         = (α/β) ∫_{0}^{∞} (β^{α+1} / Γ(α + 1)) x^α e^{−βx} dx
         = α/β,

where the last equality holds because the final integrand is the PDF of a Γ(α + 1, β) random variable and therefore integrates to 1. Figure 8 shows the graph of the PDF for the Γ(4, 1) distribution.
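As a quick numerical sanity check (not part of the original notes), these formulas can be verified for a particular case, say Γ(4, 2), by integrating the density over a truncated grid; the grid spacing and the truncation point 40 below are arbitrary choices.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Illustration only: check that the Gamma(4,2) density integrates
%to 1 and has mean alpha/beta = 2.
alpha=4; beta=2;
x=0:0.001:40;                       %the density is negligible beyond x=40
c=beta^alpha/factorial(alpha-1);    %Gamma(alpha) = (alpha-1)! for integer alpha
f=c*x.^(alpha-1).*exp(-beta*x);
trapz(x,f)                          %approximately 1
trapz(x,x.*f)                       %approximately alpha/beta = 2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%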

Figure 8: Stem plot of the relative frequencies of 5000 independent realizations of a random variable with the Γ(4, 1) distribution. Superposed to it are the frequencies given by the density f(x) = x³ e^{−x}/6.

We note the following property of gamma random variables. The proof is left as an exercise.

Proposition 5.10 If X1 ∼ Γ(α1, β) and X2 ∼ Γ(α2, β) are two independent random variables, and Y = X1 + X2, then Y ∼ Γ(α1 + α2, β).

This proposition implies that the sum of n independent exponentially distributed random variables with parameter λ is Γ(n, λ). Since the inter-event times of a Poisson process with rate λ are exponential with parameter λ, it follows that the time of the nth event of a Poisson process is Γ(n, λ).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=gamma(r,n)
%Simulates n independent realizations of a Gamma(r,1)
%random variable. The proposal e is exponential with
%mean r and is accepted with probability t.
for j=1:n
  t=2;
  u=3;
  while (t<u)
    e=-r*log(rand);
    t=((e/r)^(r-1))*exp((1-r)*(e/r-1));
    u=rand;
  end
  y(j)=e;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
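The experiment of Figure 8 can be reproduced along the following lines. This is a sketch of mine rather than the script actually used for the figure; it normalizes the bin counts to the density scale, and the choice of 30 bins is arbitrary.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
y=gamma(4,5000);             %5000 realizations of Gamma(4,1) using the sampler above
[n,xout]=hist(y,30);         %bin counts and bin centers
dx=xout(2)-xout(1);
stem(xout,n/(5000*dx))       %normalized so the heights approximate the density
hold on
x=0:0.01:15;
plot(x,x.^3.*exp(-x)/6)      %Gamma(4,1) density f(x) = x^3*exp(-x)/6
hold off
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%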

5.15 The beta distribution

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=beta(a,b,n)
%Simulates n independent realizations of a Beta(a,b)
%random variable.
x1=gamma(a,n);
x2=gamma(b,n);
y=x1./(x1+x2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

6 Exercises and Computer Experiments

Exercise 6.1 Show that the variance of a random variable X satisfies Var(X) = E[X²] − E[X]².

Exercise 6.2 (Chebyshev's inequality) Use Chebyshev's inequality to show that if X is a random variable with finite mean m and finite variance σ², then with probability at least 0.96, X is in (m − 5σ, m + 5σ).

Exercise 6.3 (Geometric distribution) We are drawing with replacement from a population of size N and want to know how long we need to wait to obtain exactly r distinct elements. If the time of wait is T, find the expected value and variance of T. Hint: if we have just obtained n distinct elements, n < r, the waiting time for the next distinct element is a random variable, T_n, with the geometric distribution. It satisfies

    P(T_n = k) = (n/N)^{k−1} (N − n)/N,   k = 1, 2, 3, . . . .

The total waiting time is then T = 1 + T_1 + T_2 + · · · + T_{r−1}. Now use the fact that E[T] is the sum of the expectations E[T_i].

Exercise 6.4 Show that the cumulative distribution function of a Geom(p) random variable is F_X(k) = P(X ≤ k) = 1 − (1 − p)^k.

Exercise 6.5 At each lecture a professor will call one student at random from a class of 30 students to solve a problem on the chalkboard. Thus the probability that you will be selected on any particular day is 1/30. On average, how many lectures will you have to wait to be called?

Exercise 6.6 If X ∼ U(a, b), show that E[X] = (a + b)/2 and Var(X) = (b − a)²/12.

Exercise 6.7 Show that if X ∼ Exp(λ), then E[X] = 1/λ and Var(X) = 1/λ².
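Several of these exercises can also be explored by simulation before they are solved analytically. For instance, here is a rough Monte Carlo sketch (mine, not part of the notes) for the situation of Exercise 6.5; the number of repetitions is an arbitrary choice.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Monte Carlo sketch for Exercise 6.5: simulate the number of
%lectures until you are first called, when at each lecture you are
%selected with probability p = 1/30, and average over many runs.
p=1/30; M=10000;
w=zeros(1,M);
for i=1:M
  k=1;
  while rand>p               %not selected at this lecture
    k=k+1;
  end
  w(i)=k;
end
mean(w)                      %Monte Carlo estimate of the expected wait
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%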

6.1 Convolution of PDFs

Exercise 6.8 Let f(x) and g(x) be two functions of a real variable x ∈ R. Suppose that f(x) is zero for x in the complement of the interval [a, b], and g(x) is zero in the complement of [c, d]. Show that (f ∗ g)(x) is zero in the complement of [a + c, b + d]. Hint: show that the convolution of the indicator functions of the first two intervals is zero in the complement of the third.

Exercise 6.9 Suppose that f(x) and g(x) are two functions of a real variable x ∈ R which are zero outside of the intervals [a, b] and [c, d], respectively. We wish to obtain an approximation formula for the convolution h = f ∗ g of f and g by discretizing the convolution integral. Assume that the lengths of the two intervals are multiples of a common small positive step size e. This means that there are positive integers N and M such that

    (b − a)/N = e = (d − c)/M.

Show that the approximation of (f ∗ g)(x) over the interval [a + c, b + d] by Riemann sum discretization of the convolution integral is

    h(x_j) = Σ_{i = max{1, j−N}}^{min{j, M+1}} f(a + (j − i)e) g(c + (i − 1)e) e,   j = 1, . . . , N + M + 1.

The following script implements this approximation of the convolution integral to obtain the nth convolution power of a function f.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function g=convolution(f,a,b,n)
%Input  - f vector discretization of a function
%        - a and b are the left and right endpoints
%          of an interval outside of which f is zero
%        - n degree of convolution
%Output - g vector approximating the degree n
%          convolution of f with itself
%          over the interval [na, nb]
N=length(f)-1;
e=(b-a)/N;
s=[a:e:b];
g=f;
for k=2:n
  x=[k*a:e:k*b];
  h=zeros(size(x));
  for j=1:k*N+1
    for i=max([j-N,1]):min([j,(k-1)*N+1])
      h(j)=h(j)+f(j-i+1)*g(i)*e;
    end
  end
  g=h;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
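As a quick test of the function (not part of the original notes), convolving the discretized U(0, 1) density with itself should produce an approximation of the triangular density on [0, 2], with peak value 1 at x = 1:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
x=[0:0.01:1];
f=ones(size(x));             %discretized density of U(0,1), identically 1
g=convolution(f,0,1,2);      %approximate density of the sum of two U(0,1)
plot([0:0.01:2],g)           %should be close to the triangle peaking at x=1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%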

Exercise 6.10 Let f be the function f(x) = cx² over the interval [−1, 1], where c is a normalization constant. Draw the graphs of f and the convolution powers of degree n = 2, 5 and 20.

We discretize f and write in Matlab:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
x=[-1:0.01:1];
f=x.^2;
f=f/sum(f);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

To find the convolution power f^{∗n} = f ∗ · · · ∗ f of degree n, we invoke the function convolution defined above. This is done with the command

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
g=convolution(f,-1,1,20);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

When plotting g, keep in mind that it may be non-zero over the bigger interval [na, nb]. So we now write x=[-n:0.01:n] (here n = 20) and use plot(x,g).

6.2 Central limit theorem

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed',121)
p=ones(1,6)/6;       %Probabilities of the outcomes of tossing a die
N=1000;              %Number of tosses in each trial
M=500;               %Number of trials
y=[];
for i=1:M
  y=[y sum(samplefromp(p,N))];
end
[n,xout]=hist(y,20); %n is the count in each bin, and xout
                     %is a vector giving the bin locations
dx=xout(2)-xout(1);
hold off
stem(xout,n/M)       %stem plot of the relative frequencies
%Now, we superpose the plot of the normal density
m=N*3.5;
s=sqrt(N*35/12);
x=m-3*s:0.1:m+3*s;
f=(1/(s*sqrt(2*pi)))*exp(-0.5*((x-m)/s).^2)*dx;
hold on
plot(x,f)
grid
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
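The script relies on the function samplefromp(p,N), from an earlier set of notes, which returns N independent samples from the discrete distribution with probability vector p. That function is not reproduced here; a minimal stand-in could look like the sketch below (the name and implementation details are my own, not the original).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=samplefromp_sketch(p,n)
%Sketch of a sampler from a finite discrete distribution: returns
%a 1-by-n vector of values in {1,...,length(p)}, where the value k
%is drawn with probability p(k).
c=cumsum(p(:)');           %cumulative probabilities, c(end) = 1
u=rand(1,n);               %independent uniform samples
y=zeros(1,n);
for i=1:n
  y(i)=find(u(i)<=c,1);    %smallest k with u(i) <= c(k)
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%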

6.3 Normal distributions

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=normalpdf(x,mu,sigma)
%Input  - x a real number
%       - mu, sigma: parameters of normal distribution
%Output - value of the pdf at x
y=exp(-0.5*((x-mu)/sigma).^2)/(sigma*sqrt(2*pi));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Exercise 6.11 (Transformation method) Write a program to simulate a random variable X with probability density function f(x) = 3x² over the interval [0, 1] using the transformation method. Simulate 1000 realizations of X and plot a stem plot with 20 bins. On the same coordinate system, superpose the graph of f(x) (appropriately normalized so as to give the correct frequencies of each bin).

Exercise 6.12 (Uniform rejection method) Write a program to simulate a random variable taking values in [−1, 1] with probability density

    f(x) = (3/4)(1 − x²).

Use the uniform rejection method. Simulate 1000 realizations of X and plot a stem plot with 20 bins. On the same coordinate system, superpose the graph of f(x) (appropriately normalized so as to give the correct frequencies of each bin).
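One possible shape for the sampler asked for in Exercise 6.12 is sketched below. It is only an illustration of the uniform rejection idea (propose uniformly on [−1, 1] and accept with probability f(x)/(3/4), since 3/4 is the maximum of f), not a prescribed solution, and the function name is my own.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=rejection_sketch(n)
%Sketch of the uniform rejection method for the density
%f(x) = (3/4)*(1-x^2) on [-1,1]; the maximum of f is 3/4 at x=0.
y=zeros(1,n);
for i=1:n
  accept=0;
  while ~accept
    x=2*rand-1;              %proposal: uniform on [-1,1]
    u=rand;                  %uniform on [0,1]
    accept=(u<=1-x^2);       %accept with probability f(x)/(3/4) = 1-x^2
  end
  y(i)=x;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%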

7 Appendix: summary of famous distributions

Binomial
    P(X = k) = C(n, k) p^k (1 − p)^{n−k},   k = 0, 1, . . . , n.

Multinomial
    P(n1, . . . , nk) = ((n1 + · · · + nk)! / (n1! n2! · · · nk!)) Π_{i=1}^{k} p_i^{n_i},   where Σ_{i=1}^{k} p_i = 1.

Geometric
    P(X = k) = (1 − p)^{k−1} p,   k = 1, 2, . . . .

Poisson
    P(X = k) = λ^k e^{−λ} / k!,   k = 0, 1, 2, . . . .

Uniform
    f(x) = (1/(b − a)) I_{[a,b]}(x).

Negative binomial
    P(X = n + k) = C(n + k − 1, k) p^n (1 − p)^k,   k = 0, 1, 2, . . . .

Exponential
    f(x) = λ e^{−λx},   x ≥ 0.

Gamma
    f(x) = λ^n x^{n−1} e^{−λx} / Γ(n),   x ≥ 0.

Chi-square with ν degrees of freedom
    f(x) = x^{ν/2 − 1} e^{−x/2} / (Γ(ν/2) 2^{ν/2}),   x ≥ 0.

t_ν
    f(x) = (Γ((ν + 1)/2) / (√(νπ) Γ(ν/2))) (1 + x²/ν)^{−(ν+1)/2},   x ∈ R.

Cauchy
    f(x) = (1/π) · β / (β² + (x − α)²),   x ∈ R.

Weibull
    f(x) = κρ(ρx)^{κ−1} e^{−(ρx)^κ},   x ≥ 0.

Normal
    f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)},   x ∈ R.

Half-normal
    f(x) = (1/σ) √(2/π) e^{−(x−µ)²/(2σ²)},   x ≥ µ.

Multivariate normal
    f(x) = (2π)^{−k/2} |det(C)|^{−1/2} exp(−(1/2)(x − µ)′ C^{−1} (x − µ)),   x ∈ R^k.

Logistic
    f(x) = β e^{−(α+βx)} / (1 + e^{−(α+βx)})²,   x ∈ R.

Log-logistic
    f(x) = β e^{−α} x^{−(β+1)} / (1 + e^{−α} x^{−β})²,   x ≥ 0.