
# The Memoryless Property

Theorem 1. Let $X$ be an Exponential random variable with parameter $\lambda > 0$. Then $X$ has the memoryless property, which means that for any two real numbers $a, b > 0$,

$$P(X > a + b \mid X > b) = P(X > a).$$

WARNING: This is not saying that $P(X > a + b \mid X > b) = P(X > a + b)$. That would mean - literally - that the event $\{X > a + b\}$ is independent of the event $\{X > b\}$, i.e. that the future values of $X$ are independent of the past, which is not correct. The theorem is simply saying - intuitively - that the probability that $X$ lasts an additional amount $a$ does not remember how much time $b$ has already elapsed; the probability of the event $\{X > a + b\}$ itself may still depend on the past, however.

Proof. First, we'll derive an expression for $P(X > t)$ for any $t > 0$.

$$P(X > t) = 1 - P(X \le t) = 1 - \int_0^t \lambda e^{-\lambda x}\,dx = 1 + \left[ e^{-\lambda x} \right]_0^t = 1 + e^{-\lambda t} - 1 = e^{-\lambda t}.$$
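As a quick numerical sanity check of the tail formula above, here is a minimal sketch that approximates the integral of the exponential density with the midpoint rule and compares the result with $e^{-\lambda t}$. The function name `exp_tail_numeric`, the step count, and the values of `lam` and `t` are illustrative assumptions, not part of the notes.

```python
import math

def exp_tail_numeric(lam, t, steps=100_000):
    """Approximate P(X > t) = 1 - integral_0^t lam*exp(-lam*x) dx (midpoint rule)."""
    h = t / steps
    integral = sum(lam * math.exp(-lam * (i + 0.5) * h) for i in range(steps)) * h
    return 1.0 - integral

# Illustrative (assumed) values: rate 0.5, threshold t = 3.
lam, t = 0.5, 3.0
numeric = exp_tail_numeric(lam, t)
closed_form = math.exp(-lam * t)  # the formula derived above
print(numeric, closed_form)
```

The two printed numbers should agree to many decimal places; any other positive `lam` and `t` can be substituted.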

Now, compute $P(X > a + b \mid X > b)$ using the definition of conditional probability. Because $X > a + b$ implies $X > b$, the intersection $\{X > a + b\} \cap \{X > b\}$ is just $\{X > a + b\}$, so

$$P(X > a + b \mid X > b) = \frac{P(\{X > a + b\} \cap \{X > b\})}{P(X > b)} = \frac{P(X > a + b)}{P(X > b)} = \frac{e^{-\lambda(a+b)}}{e^{-\lambda b}} = \frac{e^{-\lambda a} e^{-\lambda b}}{e^{-\lambda b}} = e^{-\lambda a}.$$

But we just showed that $e^{-\lambda a}$ is exactly $P(X > a)$. Thus, we see that $P(X > a + b \mid X > b) = P(X > a)$.

## Some Remarks on the Consequences and Interpretations of the Memoryless Property

First of all, keep in mind that this property is not a universal property. It does not hold for all continuous random variables. Moreover, it would obviously not apply in all physical situations. For instance, suppose $X$ is the lifetime of a car engine, measured in miles driven. If the engine has lasted 200,000 miles, we might not expect - based on actual, physical experience - that the probability that the engine lasts another 100,000 miles is the same as the probability that the engine lasts 100,000 miles from the time it was first built. That is, we would probably not expect to have

$$P(X > 300{,}000 \mid X > 200{,}000) = P(X > 300{,}000 - 200{,}000) = P(X > 100{,}000).$$

But if empirical data showed that the lifetime of a car engine was, in fact, exponentially distributed, then this property would, indeed, hold, whether it matches your intuition or not.

In fact, this property and the questions that were raised in class regarding it bring up an important philosophical point in probability theory. Probabilities do not exist in a vacuum, and they are not universal from one situation to another. They carry with them some particular distribution or form. When you estimate or compute probabilities in real-life, everyday experiences, you are - whether you realize it or not - implicitly imposing some sample space and probability mass/density function on the quantity you are computing. For instance, you would not estimate the probability that you get a particular hand in a game of poker in the same way you would estimate the probability that you have to wait 10 or more minutes in the checkout line at the grocery store. Each of those situations implicitly carries with it a distinct notion of what probability means (or, rather, how it is distributed and how to compute it) in the particular context.

Consequently - because different probabilities behave in different ways - you cannot always fall back on your intuition. Just because your gut tells you something about probabilities in one instance doesn't mean that it holds in all instances. Assuming that it does is a common probabilistic fallacy.

Take, for example, a problem like the one we discussed in class. Let $X$ be the lifetime of a machine, machine component, battery - anything that has a finite lifetime. It may very well be the case that $X$ is not memoryless. For example, if it is known that the machine has already lasted 20 hours, it might not be reasonable - in this particular physical context - to assume that the probability that the machine lasts at least another 15 hours (that is, that it lasts at least 35 hours total, given that it has lasted 20 hours) is equal to the probability that it lasts at least 15 hours from the time it starts. That is, we might not have that

$$P(X > 20 + 15 \mid X > 20) = P(X > 15).$$

Whether or not this holds depends on the probability distribution you are assuming applies to $X$. On the other hand, if you are told, or if you know based on empirical data, that the lifetime of this machine is exponentially distributed, then the above equation would, indeed, hold, whether it seems intuitive to you or not.
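
To make the machine example concrete, the following minimal sketch compares $P(X > 20 + 15 \mid X > 20)$ with $P(X > 15)$ under two lifetime models. Everything numerical here is an assumption for illustration: the exponential rate of 1/30 per hour, and the choice of a Weibull distribution with shape 2 (survival function $e^{-(t/\text{scale})^{\text{shape}}}$) as one possible example of a lifetime that is not memoryless. The notes themselves do not name an alternative distribution.

```python
import math

def exp_survival(t, rate):
    """P(X > t) for an Exponential(rate) lifetime."""
    return math.exp(-rate * t)

def weibull_survival(t, scale, shape):
    """P(X > t) for a Weibull lifetime (assumed stand-in for a wear-out model)."""
    return math.exp(-((t / scale) ** shape))

def conditional_survival(survival, already, extra):
    """P(X > already + extra | X > already) = S(already + extra) / S(already)."""
    return survival(already + extra) / survival(already)

# Hypothetical parameters: mean exponential lifetime of 30 hours,
# Weibull scale 30 hours with shape 2.
exp_s = lambda t: exp_survival(t, rate=1 / 30)
wb_s = lambda t: weibull_survival(t, scale=30.0, shape=2.0)

exp_cond, exp_fresh = conditional_survival(exp_s, 20, 15), exp_s(15)
wb_cond, wb_fresh = conditional_survival(wb_s, 20, 15), wb_s(15)

# Exponential: the two numbers coincide (memoryless).
print("exponential:", exp_cond, exp_fresh)
# Weibull, shape 2: the machine that has already run 20 hours is less
# likely to last 15 more hours than a brand-new one.
print("weibull:    ", wb_cond, wb_fresh)
```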
The equation holds for an exponential lifetime because you are imposing the condition that $X$ is exponentially distributed, which means - regardless of what we think or feel should be right - that $X$ is memoryless. If the lifetime of the machine is not memoryless, then we simply wouldn't describe its lifetime using an exponential random variable; we would use some other random variable.

In other words, the memoryless property is a specific property of exponential random variables. If it seems counterintuitive to you in certain physical circumstances - and if your intuition is correct in those circumstances - then that doesn't mean that the memoryless property is wrong. It just means that whatever quantity you're measuring in that particular problem is not an exponential random variable.

If this property still seems strange and counterintuitive to you, perhaps it will be better to think of it in discrete terms, since there is one more random variable that has the memoryless property. Some basic examples follow after that.

## Another Memoryless Distribution: Geometric Random Variables

The geometric random variable also has the memoryless property. This should make intuitive sense given how we interpret exponential random variables. Exponential random variables usually measure the time until some event occurs. Similarly, recall that geometric random variables describe the first occurrence of a particular event (i.e. a success) in a Bernoulli experiment.

Theorem 2. If $X$ is a geometric random variable with parameter $p$, then $X$ has the memoryless property, which means that for any two positive integers $m, n \ge 1$,

$$P(X > m + n \mid X > n) = P(X > m).$$

Intuitively, this can be interpreted as follows. If you have a Bernoulli experiment of identical, independent trials, each of which results in one of two outcomes with respective probabilities $p$ and $1 - p$ (e.g. either a 1 or a 0, or a success or a failure, etc.), then $X$ is the number of the trial on which the first success occurs. Now, suppose, for instance, you run through 10 trials of this experiment, all of which have been failures or 0's. Just because you've gotten 10 failures in a row to begin the experiment doesn't mean probabilistically that you should necessarily expect a success to be more likely (or a failure to be less likely) on the 11th trial. If we go purely on intuition - which, as I've tried to point out above, isn't always the best route in probability - you might expect that we should eventually have to get a success, so the longer we go without one, the more likely we are to get one next. But that's not true! Because $X$ is a geometric random variable, which has the memoryless property, it actually doesn't matter how many consecutive failures we get. The probability that the first success occurs a given number of trials from now is the same as it was at the beginning of the sequence! If the intuition worked in this case, then $X$ wouldn't be a geometric random variable, because it wouldn't be memoryless. But it is geometric, so we have an example where the everyday intuition we try to apply to probabilities fails us.

Proof. The probability mass function of $X$ is $p(i) = P(X = i) = p(1 - p)^{i-1}$. As we did before, we will find an expression for $P(X > n)$ to make things easier.
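
$$P(X > n) = 1 - P(X \le n) = 1 - \sum_{k=1}^{n} p(1 - p)^{k-1} = 1 - p \sum_{k=0}^{n-1} (1 - p)^{k} = 1 - p \, \frac{1 - (1 - p)^{n}}{1 - (1 - p)}$$

$$\Longrightarrow \quad P(X > n) = (1 - p)^{n}.$$

Proceeding as in Theorem 1, we have the following.

$$P(X > m + n \mid X > n) = \frac{P(\{X > m + n\} \cap \{X > n\})}{P(X > n)} = \frac{P(X > m + n)}{P(X > n)} = \frac{(1 - p)^{m+n}}{(1 - p)^{n}} = (1 - p)^{m} = P(X > m).$$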
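The geometric case can be checked exactly rather than approximately. The sketch below sums the probability mass function directly using exact rational arithmetic and confirms both the tail formula $(1 - p)^n$ and the memoryless identity. The helper names and the values $p = 1/6$, $m = 3$, $n = 10$ are assumptions chosen only for illustration.

```python
from fractions import Fraction

def geometric_pmf(i, p):
    """P(X = i) = p * (1 - p)**(i - 1) for i = 1, 2, 3, ..."""
    return p * (1 - p) ** (i - 1)

def geometric_tail(n, p):
    """P(X > n), computed by summing the p.m.f. over k = 1..n."""
    return 1 - sum(geometric_pmf(k, p) for k in range(1, n + 1))

# Assumed example values: success probability 1/6, m = 3, n = 10.
p = Fraction(1, 6)
m, n = 3, 10

lhs = geometric_tail(m + n, p) / geometric_tail(n, p)  # P(X > m+n | X > n)
rhs = geometric_tail(m, p)                             # P(X > m)
print(lhs, rhs, (1 - p) ** m)  # prints: 125/216 125/216 125/216
```

Because `Fraction` arithmetic is exact, the three values are identical rather than merely close, which mirrors the algebra in the proof.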

## Some Basic Examples

Memoryless random variables like the exponential random variable may seem strange, but they actually describe many different real-world phenomena. For instance, the time between arrivals of customers to a store, drive-thru, or any comparable service center is an exponential random variable. You might expect more customers to arrive at certain times of the day than at others, but just because a customer hasn't arrived in, say, the last 20 minutes doesn't mean that you should expect one in the next 5 minutes with any more likelihood than you would have 20 minutes ago.

By the same reasoning, the time between telephone calls received by a particular phone is an exponential random variable. Ignoring circumstances where you're expecting a planned phone call, and regardless of your average frequency of calls, just because you haven't received a call in, say, the last hour doesn't mean that you're any more likely to receive a call in the next 10 minutes than you were at the beginning of the hour.

On the other hand - to admittedly contradict one of my own examples from class - the lifetime of a cell phone battery would not be well-modeled by an exponential random variable. It works fine as a mathematical example, but experience tells us that - in physical terms - the longer a battery is used and recharged, the more likely it is to ultimately fail. You could model the battery's lifetime with an exponential random variable if you simply chose to, and it would even be appropriate for short times. But in the long run, modeling the overall lifetime exponentially wouldn't be a good idea.

Also, you should read through Example 5d in Section 5.5, and I encourage you to read the portion of Section 4.7 that discusses Poisson processes and see how the exponential random variable is used to model the time between occurrences of an event. In particular, notice that if $X$ is a Poisson random variable with parameter $\lambda$ (which means $\lambda$ is the average/expected number of events that occur in a particular time interval), it follows from the Poisson p.m.f. that $P(X = 1) = \lambda e^{-\lambda}$. This is just the p.d.f. of an exponential random variable with parameter $\lambda$ evaluated at $x = 1$. If we let $Y$ be this exponential random variable, then the probability that $Y$ is close to 1 is approximately

$$P(1 - \Delta x < Y < 1 + \Delta x) \approx f_Y(1) \cdot 2\Delta x = P(X = 1) \cdot 2\Delta x,$$

where the factor $2\Delta x$ is the length of the interval. This shows the connection between the Poisson and Exponential random variables. In particular, it shows why we interpret the expected values the way we do for the Poisson and exponential random variables. If we expect $\lambda$ events to occur in a given time interval of length $T$ (from a Poisson($\lambda$) random variable), then we expect them to occur at a rate of $\lambda / T$ events per unit time. This means that we would expect $T / \lambda$ units of time between events. If we set $T = 1$, we see that the expected value of a Poisson random variable is the number of events per unit time, and the reciprocal of that is the expected time between events (units of time per event). This is, intuitively, why the expected values of Poisson($\lambda$) random variables and Exponential($\lambda$) random variables are, respectively, $\lambda$ and $1/\lambda$, reciprocals of each other. Each variable measures the inverse aspect of the same problem.
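
The reciprocal relationship described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the rate `lam = 4.0`, the number of simulated events, the seed, and the helper `simulate_arrivals` are all hypothetical choices. It builds arrival times from Exponential($\lambda$) gaps using the standard-library `random.expovariate`, then counts how many events land in each unit-length interval.

```python
import math
import random
from collections import Counter

def simulate_arrivals(lam, n_events, seed=0):
    """Return (gaps, arrival_times) where gaps are i.i.d. Exponential(lam)."""
    rng = random.Random(seed)
    gaps = [rng.expovariate(lam) for _ in range(n_events)]
    times, t = [], 0.0
    for g in gaps:
        t += g
        times.append(t)
    return gaps, times

lam = 4.0  # assumed: 4 events per unit time on average
gaps, times = simulate_arrivals(lam, 200_000)

# Average time between events; expected to be near 1 / lam.
mean_gap = sum(gaps) / len(gaps)

# Count events in each complete unit interval [k, k + 1).
n_intervals = int(times[-1])
counts = Counter(int(t) for t in times if t < n_intervals)

# Average number of events per unit interval; expected to be near lam.
mean_count = sum(counts.values()) / n_intervals

# Fraction of intervals with exactly one event; compare with lam * exp(-lam).
frac_exactly_one = sum(1 for k in range(n_intervals) if counts[k] == 1) / n_intervals

print(mean_gap, 1 / lam)
print(mean_count, lam)
print(frac_exactly_one, lam * math.exp(-lam))
```

Each printed pair should be close but not identical, since the left-hand value in each pair is a simulation estimate that varies with the seed and sample size.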