Probability Distribution

Continuous Probability Distributions

A continuous random variable can assume any value in an
interval on the real line or in a collection of intervals.

It is not possible to talk about the probability of the random
variable assuming a particular value.

Instead, we talk about the probability of the random
variable assuming a value within a given interval.

The probability of the random variable assuming a value
within some given interval from x1 to x2 is defined to be the
area under the graph of the probability density function
between x1 and x2.

[Figure: graphs of the probability density function f(x) for the uniform, normal, and exponential distributions, each with the interval from x1 to x2 marked on the x-axis.]

Normal Probability Distributions

• The normal probability distribution is the most important distribution for describing a continuous random variable.
• It is widely used in statistical inference.

Normal Probability Density Function

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$

where:
μ = mean
σ = standard deviation
π = 3.14159
e = 2.71828
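The density formula can be evaluated directly. The following is a minimal sketch, assuming Python 3.8 or later; the function name `normal_pdf` and the parameter values are illustrative choices, not part of the slides. It also compares the result with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Illustrative parameters (an assumption): mean 0, standard deviation 1.
peak = normal_pdf(0.0, 0.0, 1.0)
print(f"height of the curve at the mean: {peak:.4f}")

# Cross-check against the standard library's implementation.
print(math.isclose(peak, NormalDist(0.0, 1.0).pdf(0.0)))
```

Note that the value returned is a height of the curve, not a probability; probabilities come from areas under the curve.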

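It's a probability density function, so no matter what the values of μ and σ, it must integrate to 1:

$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 1$

$E(X) = \mu = \int_{-\infty}^{\infty} x \, \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx$

$Var(X) = \sigma^2 = \left( \int_{-\infty}^{\infty} x^2 \, \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \right) - \mu^2$

Standard Deviation(X) = σ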
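These three integrals can be checked numerically. The sketch below is only an illustration: it uses a simple midpoint rule over μ ± 10σ (a finite stand-in for the infinite range), and the values μ = 10 and σ = 2 as well as the helper names are assumptions chosen for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

mu, sigma = 10.0, 2.0                      # illustrative values (assumption)
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # tails beyond 10 SDs are negligible

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
second_moment = integrate(lambda x: x * x * normal_pdf(x, mu, sigma), lo, hi)
variance = second_moment - mean ** 2

print(f"area = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
```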

Characteristics

• The distribution is symmetric; its skewness measure is zero.
• The entire family of normal probability distributions is defined by its mean μ and its standard deviation σ.
• The highest point on the normal curve is at the mean, which is also the median and mode.
• The mean can be any numerical value: negative, zero, or positive (for example, -10, 0, or 20).
• The standard deviation determines the width of the curve: larger values result in wider, flatter curves (for example, σ = 15 versus σ = 25).
• Probabilities for the normal random variable are given by areas under the curve. The total area under the curve is 1 (0.5 to the left of the mean and 0.5 to the right).

Since the area under the curve represents probability, the probability of a normal random variable at one specific value is zero. With a single value, there is no area to measure, since an area must span an interval. Thus:

P(x = 10) = 0
P(x = 3) = 0
P(x = 7.5) = 0

However, one can find the following probabilities:

P(1 < x < 3)
P(2.2 < x < 3.7)
P(x > 3)

Characteristics (continued)

• 68.26% of values of a normal random variable are within +/- 1 standard deviation of its mean.
• 95.44% of values of a normal random variable are within +/- 2 standard deviations of its mean.
• 99.72% of values of a normal random variable are within +/- 3 standard deviations of its mean.

[Figure: normal curve with the intervals μ ± 1σ (68.26%), μ ± 2σ (95.44%), and μ ± 3σ (99.72%) marked on the x-axis.]

There may be thousands of normal distribution curves, each with a different mean and a different standard deviation. Since the shapes are different, the areas under the curves between any two points are also different.

• To make life easier, all normal distributions can be converted to a standard normal distribution. A standard normal distribution has a mean of 0 and a standard deviation of 1.
• No matter what μ and σ are, the area between μ-σ and μ+σ is about 68.26%, the area between μ-2σ and μ+2σ is about 95.44%, and the area between μ-3σ and μ+3σ is about 99.72%. Almost all values fall within 3 standard deviations.
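The three areas can be computed without a table. This is a small sketch using the identity P(|Z| < k) = erf(k/√2) and Python's `math.erf`; the function name `prob_within` is a hypothetical helper introduced here.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X, via the error function."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * prob_within(k):.2f}%")
```

The percentages quoted on the slides come from doubling four-decimal table entries, so they can differ from the computed values by about 0.01 percentage point.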

How good is the rule for real data?

Check some example data (weights of 120 women runners):
• The mean of the weight of the women = 127.8 lbs
• The standard deviation (SD) = 15.5 lbs

68% of 120 = 0.68 x 120 = ~82 runners. In fact, 79 runners fall within 1 SD (15.5 lbs) of the mean (112.3 to 143.3 lbs).

95% of 120 = 0.95 x 120 = 114 runners. In fact, 115 runners fall within 2 SDs of the mean (96.8 to 158.8 lbs).

99.7% of 120 = 0.997 x 120 = 119.6 runners. In fact, all 120 runners fall within 3 SDs of the mean (81.3 to 174.3 lbs).

[Figure: histogram of the weights, percent versus pounds (80 to 160), shown three times with the 1-SD, 2-SD, and 3-SD intervals marked.]

Example

Suppose SAT scores roughly follow a normal distribution in the U.S. population of college-bound students (with range restricted to 200-800), and the average math SAT is 500 with a standard deviation of 50. Then:

• 68% of students will have scores between 450 and 550
• 95% will be between 400 and 600
• 99.7% will be between 350 and 650

Standard Normal Probability Distributions

The formula for the standardized normal probability density function is

$p(Z) = \frac{1}{(1)\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Z-0}{1}\right)^2} = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}Z^2}$

The Standard Normal Distribution (Z)

All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation:

Z = (X - μ) / σ

Somebody calculated all the integrals for the standard normal and put them in a table! So we never have to integrate! Even better, computers now do all the integration.

The letter z is used to designate the standard normal random variable.

[Figure: standard normal curve centered at z = 0 with σ = 1.]

A score of 575 is 1..5 50 i.e. =500 and =50? 575  500 Z  1.Applications of Standard Normal Distribution Problem: What’s the probability of getting a math SAT score of 575 or less.5 1 x 500 2  ( )  e 2 50 dx     1 2 1  Z2  e 2 dz Yikes! But to look up Z= 1.9332 .5 standard deviations above the mean 575  P( X  575)  1  (50) 200 2 1.5 in standard normal chart (or enter into SAS) no problem! = .

Applications of Standard Normal Distribution

Problem: Test scores of a special examination administered to all potential employees of a firm are normally distributed with a mean of 500 points and a standard deviation of 100 points. What is the probability that a score selected at random will be higher than 700?

P(x > 700) = ?

If we convert this normal variable, x, to a standard normal variable, z:

z = (x - µ) / σ = (700 – 500) / 100 = 2

-------------500----------700   x-scale
----------------0-----------2   z-scale

P(x > 700) = P(z > 2) = 1 – 0.9772 = 0.0228

Problem

If birth weights in a population are normally distributed with a mean of 109 oz and a standard deviation of 13 oz,

a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random?
b. What is the chance of obtaining a birth weight of 120 oz or lighter?

Solution

a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random?

Z = (141 - 109) / 13 = 2.46

From the chart or SAS, a Z of 2.46 corresponds to a right tail (greater than) area of:
P(Z ≥ 2.46) = 1 - (.9931) = .0069 or .69%

Solution b.8023= 80.85) = .85 corresponds to a left tail area of: P(Z≤. What is the chance of obtaining a birth weight of 120 or lighter? 120  109 Z  .23% .85 13 From the chart or SAS  Z of .

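Looking up probabilities in the standard normal table

What is the area to the left of Z = 1.51 in a standard normal curve?

Area is 93.45%

[Figure: standard normal table with the row and column for Z = 1.51 highlighted, next to a curve with the area left of Z = 1.51 marked.]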
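A row of the cumulative standard normal table can be regenerated to see where the lookup value comes from. The following is a sketch under the assumption that the table lists areas to the left of z, with rows for tenths and columns for hundredths; `table_row` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist()

def table_row(z_tenths):
    """Cumulative areas for z_tenths + 0.00, 0.01, ..., 0.09, rounded to 4 places."""
    return [round(std_normal.cdf(z_tenths + hundredths / 100), 4)
            for hundredths in range(10)]

row = table_row(1.5)
area = row[1]   # column .01, so z = 1.51

print("row 1.5:", row)
print(f"area left of z = 1.51: {area}")
```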

Are my data "normal"?

• Not all continuous random variables are normally distributed!
• It is important to evaluate how well the data are approximated by a normal distribution.

Are my data normally distributed?

1. Look at the histogram! Does it appear bell shaped?
2. Compute descriptive summary measures: are mean, median, and mode similar?
3. Do 2/3 of observations lie within 1 std dev of the mean? Do 95% of observations lie within 2 std dev of the mean?
4. Look at a normal probability plot: is it approximately linear?
5. Run tests of normality (such as Kolmogorov-Smirnov). But be cautious: these are highly influenced by sample size!
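Checks 2 and 3 in the list above are easy to script. The sketch below runs them on simulated data, since the slides' raw data are not available; the sample is drawn with `random.gauss` using the runners' mean and SD purely as illustrative parameters, and `fraction_within` is a hypothetical helper.

```python
import random
import statistics

def fraction_within(data, k):
    """Fraction of observations within k sample standard deviations of the sample mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)

# Simulated sample (assumption): 5000 draws from a normal distribution.
random.seed(0)
sample = [random.gauss(127.8, 15.5) for _ in range(5000)]

sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
within_1 = fraction_within(sample, 1)
within_2 = fraction_within(sample, 2)

print(f"mean = {sample_mean:.1f}, median = {sample_median:.1f}")
print(f"within 1 SD: {within_1:.1%}, within 2 SD: {within_2:.1%}")
```

For real data, replace `sample` with the observed values; fractions far from roughly 2/3 and 95%, or a mean far from the median, suggest the normal model is a poor fit.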

Normal approximation to the binomial

When you have a binomial distribution where n is large and p is middle-of-the-road (not too small, not too big, closer to .5), then the binomial starts to look like a normal distribution. In fact, this doesn't even take a particularly large n.

Recall: if the probability of being a smoker among a group of cases with lung cancer is .6, what's the probability that in a group of 8 cases you have fewer than 2 smokers?

Normal approximation to the binomial

When you have a binomial distribution where n is large and p isn't too small (rule of thumb: mean > 5), then the binomial starts to look like a normal distribution.

Recall: smoking example…

[Figure: bar chart of the binomial probabilities for 0 through 8 smokers, with the tallest bar at about .27.]

Starting to have a normal shape even with fairly small n. You can imagine that if n got larger, the bars would get thinner and thinner and this would look more and more like a continuous function, with a bell curve shape. Here np = 4.8.

Normal approximation to binomial

[Figure: bar chart of the binomial probabilities for 0 through 8 smokers, with the tallest bar at about .27.]

What is the probability of fewer than 2 smokers?

Exact binomial probability (from before) = .00065 + .008 = .00865

Normal approximation probability:
μ = 4.8
σ = 1.39

Z ≈ (2 - 4.8) / 1.39 = -2.8 / 1.39 = -2

P(Z < -2) = 0.0228

A little off, but in the right ballpark… we could also use the value to the left of 1.5 (as we really wanted to know less than but not including 2; this is called the "continuity correction")…

Z = (1.5 - 4.8) / 1.39 = -3.3 / 1.39 = -2.37

P(Z ≤ -2.37) = .0089

A fairly good approximation of the exact probability, .00865.
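The comparison between the exact binomial probability and the two normal approximations can be sketched as follows. This assumes Python 3.8 or later for `math.comb`; it uses the unrounded σ and z, so the printed values differ slightly from the slide's hand calculation, which rounds σ to 1.39 and z to two decimals.

```python
import math
from statistics import NormalDist

n, p = 8, 0.6   # 8 lung cancer cases, P(smoker) = .6

# Exact: P(X < 2) = P(X = 0) + P(X = 1)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))

mu = n * p                           # 4.8
sigma = math.sqrt(n * p * (1 - p))   # about 1.39
approx = NormalDist(mu, sigma)

plain = approx.cdf(2)        # area to the left of 2
corrected = approx.cdf(1.5)  # continuity correction: area to the left of 1.5

print(f"exact = {exact:.5f}")
print(f"normal approximation = {plain:.5f}")
print(f"with continuity correction = {corrected:.5f}")
```

The unrounded exact sum is about .0085; the slide's .00865 comes from adding the rounded terms .00065 and .008.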

Practice problem

1. You are performing a cohort study. If the probability of developing disease in the exposed group is .25 for the study duration, then if you sample (randomly) 500 exposed people, what's the probability that at most 120 people develop the disease?

Solution:

By hand (yikes!):

P(X≤120) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + … + P(X=120)

$= \binom{500}{0}(.25)^{0}(.75)^{500} + \binom{500}{1}(.25)^{1}(.75)^{499} + \binom{500}{2}(.25)^{2}(.75)^{498} + \dots + \binom{500}{120}(.25)^{120}(.75)^{380}$

OR use SAS:

    data _null_;
      Cohort = cdf('binomial', 120, .25, 500);
      put Cohort;
    run;

which prints 0.323504227

OR use the normal approximation:

μ = np = 500(.25) = 125 and σ² = np(1-p) = 93.75, so σ = 9.68

Z = (120 - 125) / 9.68 = -.52

P(Z < -.52) = 0.3015
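The "by hand" sum is tedious for a person but trivial for a program. As a sketch of the same calculation outside SAS, assuming Python 3.8 or later, the exact sum and the normal approximation (without continuity correction, as on the slide) look like this; variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p, m = 500, 0.25, 120   # sample size, disease probability, "at most" threshold

# Exact binomial: sum P(X = k) for k = 0..120
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1))

# Normal approximation with mu = np and sigma^2 = np(1 - p)
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma).cdf(m)

print(f"mu = {mu}, sigma = {sigma:.2f}")
print(f"exact P(X <= 120) = {exact:.6f}")
print(f"normal approximation = {approx:.4f}")
```

The approximation printed here uses the unrounded z, so it is slightly higher than the table value for z = -.52.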

More Sample Problems

a. P(z ≤ 1.5) = 0.9332
b. P(z ≤ 1.0) = 0.8413
c. P(1 ≤ z ≤ 1.5) = 0.9332 – 0.8413 = 0.0919
d. P(0 < z < 2.5) = 0.9938 – 0.5000 = 0.4938

More Sample Problems

a. P(z ≤ –1.0) = 0.1587
b. P(z ≥ –1) = 1 – P(z ≤ –1) = 1 – 0.1587 = 0.8413
c. P(z ≥ –1.5) = 1 – P(z ≤ –1.5) = 1 – 0.0668 = 0.9332
d. P(–3 < z ≤ 0) = 0.5 – 0.0014 = 0.4986
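The sample problems above all reduce to differences of cumulative areas, which the following sketch makes explicit. It uses `statistics.NormalDist().cdf` in place of the table; the dictionary of labels is just a convenience for printing, and the computed values may differ from four-decimal table arithmetic in the last digit.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # area to the left of z under the standard normal curve

answers = {
    "P(z <= 1.5)": phi(1.5),
    "P(1 <= z <= 1.5)": phi(1.5) - phi(1.0),
    "P(0 < z < 2.5)": phi(2.5) - phi(0.0),
    "P(z <= -1.0)": phi(-1.0),
    "P(z >= -1.5)": 1 - phi(-1.5),
    "P(-3 < z <= 0)": phi(0.0) - phi(-3.0),
}

for label, value in answers.items():
    print(f"{label} = {value:.4f}")
```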

More Sample Problems

Given: µ = 77, σ = 20

a. P(x < 50) = ?
Convert to z: z = (x - µ) / σ = (50 – 77) / 20 = –1.35
P(x < 50) = P(z < –1.35) = 0.0885

b. P(x > 100) = ?
z = (100 – 77) / 20 = 1.15
P(x > 100) = P(z > 1.15) = 1 – P(z ≤ 1.15) = 1 – 0.8749 = 0.1251 or 12.51%

Continuation of Sample Problem

c. x = ? to be considered a heavy user

Upper 20% of the area is in the right tail of the normal curve, so 80% of the area is to the left. Go to Table 1 and locate 0.8 (or 80%) as the table entry. The closest entry is 0.7995. That point represents a z-value of 0.84. Use this value of z in the following equation:

z = (x - µ) / σ
0.84 = (x – 77) / 20
x = 93.8 hours
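Working backward from an area to a value is an inverse lookup. As a sketch, `statistics.NormalDist.inv_cdf` (Python 3.8 or later) returns the value with a given area to its left, replacing the search for the closest table entry; the variable names are illustrative.

```python
from statistics import NormalDist

usage = NormalDist(mu=77, sigma=20)   # distribution from the sample problem

z_cut = NormalDist().inv_cdf(0.80)    # z with 80% of the area to its left
x_cut = usage.inv_cdf(0.80)           # same cut-off on the original scale

print(f"z = {z_cut:.2f}, x = {x_cut:.1f} hours")
```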

More Sample Problems

A statistics instructor grades on a curve. He does not want to give more than 15 percent A's in his class. If test scores of students in statistics are normally distributed with a mean of 75 and a standard deviation of 10, what should be the cut-off point for an A?

The upper 15% of the area is in the right tail, so 85% of the area is to the left, which corresponds to a z-value of about 1.04.

z = (x - µ) / σ
1.04 = (x – 75) / 10
x = 85.4 or 85

Sample Problems

The service life of a certain brand of automobile battery is normally distributed with a mean of 1000 days and a standard deviation of 100 days. The manufacturer of the battery wants to offer a guarantee, but does not know the length of the warranty. It does not want to replace more than 10 percent of the batteries sold. What should be the length of the warranty?

The lower 10% of the area is in the left tail, which corresponds to a z-value of about –1.28.

z = (x - µ) / σ
–1.28 = (x – 1000) / 100
x = 872 days
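The grading and warranty problems differ only in which tail is given. A small helper makes that explicit; this is a sketch, and the function `cutoff` with its `tail` argument is a hypothetical convenience built on `statistics.NormalDist.inv_cdf`, not something from the slides.

```python
from statistics import NormalDist

def cutoff(mu, sigma, tail_area, tail="upper"):
    """Value x such that the chosen tail of a normal(mu, sigma) has area tail_area."""
    dist = NormalDist(mu, sigma)
    if tail == "upper":
        # Upper tail area t means 1 - t of the area lies to the left of x.
        return dist.inv_cdf(1 - tail_area)
    # Lower tail: tail_area itself is the area to the left of x.
    return dist.inv_cdf(tail_area)

grade_a = cutoff(75, 10, 0.15, tail="upper")       # at most 15% A's
warranty = cutoff(1000, 100, 0.10, tail="lower")   # replace at most 10%

print(f"A cut-off: {grade_a:.1f} points")
print(f"warranty length: {warranty:.0f} days")
```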

Reference: Anderson, Sweeney, Williams.