
UCLA STAT 110 A

Applied Probability & Statistics for Engineers

Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
Teaching Assistant: Neda Farzinnia, UCLA Statistics

University of California, Los Angeles, Spring 2004

http://www.stat.ucla.edu/~dinov/

Stat 110A, UCLA, Ivo Dinov


Chapter 4
Continuous Random Variables and Probability Distributions

4.1 Continuous Random Variables and Probability Distributions

Continuous Random Variables
A random variable X is continuous if its set of possible values is an entire interval of numbers (if A < B, then any number x between A and B is possible).

Probability Distribution
Let X be a continuous rv. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b,
$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

Probability Density Function
For f(x) to be a pdf:
1. f(x) ≥ 0 for all values of x.
2. The area of the region between the graph of f and the x axis is equal to 1, that is, $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
The graph of f is the density curve.
[Figure: density curve y = f(x); total area under the curve = 1]

Probability Density Function
P(a ≤ X ≤ b) is given by the area of the shaded region under the density curve y = f(x) between a and b.
[Figure: density curve y = f(x) with the region between a and b shaded]

Continuous RVs
• A RV is continuous if it can take on any real value in a non-trivial interval (a, b).
• The PDF (probability density function) of a continuous RV Y is a non-negative function p_Y(y), defined for every real value y, such that for each interval (a, b) the probability that Y takes on a value in (a, b), P(a < Y < b), equals the area under p_Y(y) over the interval (a, b).
[Figure: density p_Y(y) with the area P(a < Y < b) shaded]

Convergence of density histograms to the PDF
• For a continuous RV the density histograms converge to the PDF as the size of the bins goes to zero.
[Figure: density histograms, data file AdditionalInstructorAids\BirthdayDistribution_1978_systat.SYD]
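The convergence statement can be checked by simulation. The sketch below is illustrative only: it assumes exponential(1) data drawn with Python's `random.expovariate` instead of the SYSTAT birthday file, and the helper name `max_density_gap` is hypothetical. A density histogram divides each bin count by n times the bin width, so its heights are directly comparable with the pdf f(x) = e^(−x).

```python
import math
import random

def max_density_gap(n, width, upper=3.0, seed=1):
    """Largest |density-histogram height - pdf(midpoint)| over bins of `width` on [0, upper)."""
    rng = random.Random(seed)
    nbins = int(round(upper / width))
    counts = [0] * nbins
    for _ in range(n):
        x = rng.expovariate(1.0)            # one Exponential(rate 1) draw
        k = int(x / width)                  # index of the bin containing x
        if k < nbins:
            counts[k] += 1
    gap = 0.0
    for k, c in enumerate(counts):
        density = c / (n * width)           # density histogram height
        mid = (k + 0.5) * width
        gap = max(gap, abs(density - math.exp(-mid)))
    return gap

for width in (1.0, 0.5, 0.1):
    print(width, round(max_density_gap(200_000, width), 4))
```

With 200,000 draws the gap for width-1 bins is dominated by discretization error, while for width-0.1 bins it is mostly sampling noise; convergence to the PDF needs the bins to shrink and the sample size to grow together.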

Uniform Distribution
A continuous rv X is said to have a uniform distribution on the interval [A, B] if the pdf of X is
$$f(x; A, B) = \begin{cases} \dfrac{1}{B - A}, & A \le x \le B \\ 0, & \text{otherwise.} \end{cases}$$

Probability for a Continuous rv
If X is a continuous rv, then for any number c, P(X = c) = 0. For any two numbers a and b with a < b,
$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).$$

4.2 Cumulative Distribution Functions and Expected Values

The Cumulative Distribution Function
The cumulative distribution function F(x) for a continuous rv X is defined for every number x by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy.$$
For each x, F(x) is the area under the density curve to the left of x.

Using F(x) to Compute Probabilities
Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any number a,
$$P(X > a) = 1 - F(a),$$
and for any numbers a and b with a < b,
$$P(a \le X \le b) = F(b) - F(a).$$

Obtaining f(x) from F(x)
If X is a continuous rv with pdf f(x) and cdf F(x), then at every number x for which the derivative F'(x) exists,
$$F'(x) = f(x).$$
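As a hedged numerical illustration of these three facts, the sketch below builds F by integrating an assumed example density, f(x) = x/2 on [0, 2] (not one from the slides), then computes probabilities as F(b) − F(a) and 1 − F(a), and recovers f from a central difference of F. The names `f`, `F` and `F_prime` are illustrative.

```python
def f(x):
    """Assumed example pdf: f(x) = x/2 on [0, 2], 0 elsewhere."""
    return x / 2.0 if 0.0 <= x <= 2.0 else 0.0

def F(x, steps=10_000):
    """cdf F(x): midpoint-rule integral of f from 0 (left end of the support) to x."""
    if x <= 0.0:
        return 0.0
    x = min(x, 2.0)                      # F(x) = 1 beyond the support
    h = x / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

def F_prime(x, h=1e-4):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

a, b = 0.5, 1.5
print(F(b) - F(a))        # P(a <= X <= b); exact value (b^2 - a^2)/4 = 0.5
print(1.0 - F(1.0))       # P(X > 1) = 1 - F(1); exact value 0.75
print(F_prime(1.0), f(1.0))
```

The midpoint rule is exact for this linear density up to rounding, so the printed probabilities match the closed forms closely.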

Percentiles
Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by
$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy.$$

Median
The median of a continuous distribution, denoted by $\tilde{\mu}$, is the 50th percentile. So $\tilde{\mu}$ satisfies $0.5 = F(\tilde{\mu})$. That is, half the area under the density curve is to the left of $\tilde{\mu}$.
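Because F is continuous and nondecreasing, the defining equation p = F(η(p)) can be solved numerically by bisection when no closed form is handy. This is a minimal sketch; the exponential cdf 1 − e^(−λx) with λ = 2 is an assumed test case, chosen because its percentiles have the closed form −ln(1 − p)/λ.

```python
import math

def percentile(cdf, p, lo, hi, tol=1e-10):
    """Solve cdf(eta) = p by bisection.
    Assumes cdf is continuous and nondecreasing with cdf(lo) <= p <= cdf(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = 2.0                                   # assumed rate for the illustration

def expon_cdf(x):
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

median = percentile(expon_cdf, 0.5, 0.0, 100.0)
eta90 = percentile(expon_cdf, 0.9, 0.0, 100.0)
print(median, math.log(2) / lam)            # both about 0.3466
print(eta90, -math.log(1 - 0.9) / lam)      # both about 1.1513
```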

Expected Value
The expected or mean value of a continuous rv X with pdf f(x) is
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx.$$

Expected Value of h(X)
If X is a continuous rv with pdf f(x) and h(X) is any function of X, then
$$E[h(X)] = \mu_{h(X)} = \int_{-\infty}^{\infty} h(x)\, f(x)\,dx.$$

Variance and Standard Deviation
The variance of a continuous rv X with pdf f(x) and mean μ is
$$\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E[(X - \mu)^2].$$
The standard deviation is $\sigma_X = \sqrt{V(X)}$.

Short-cut Formula for Variance
$$V(X) = E(X^2) - [E(X)]^2.$$
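To see that the definition of V(X) and the short-cut formula agree, the sketch below integrates numerically for an assumed Uniform[2, 8] example (any pdf would do) and compares both with the known uniform variance (B − A)²/12. The helper `integrate` is a plain midpoint rule written for illustration, not a library routine.

```python
def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

A, B = 2.0, 8.0                                   # assumed Uniform[A, B] example

def pdf(x):
    return 1.0 / (B - A)                          # only evaluated on [A, B]

mean = integrate(lambda x: x * pdf(x), A, B)                    # E(X)
var_def = integrate(lambda x: (x - mean) ** 2 * pdf(x), A, B)   # E[(X - mu)^2]
ex2 = integrate(lambda x: x ** 2 * pdf(x), A, B)                # E[h(X)] with h(x) = x^2
var_short = ex2 - mean ** 2                                     # short-cut formula
print(mean, var_def, var_short, (B - A) ** 2 / 12)              # about 5, 3, 3, 3
```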

4.3 The Normal Distribution

Normal Distributions
A continuous rv X is said to have a normal distribution with parameters μ and σ, where −∞ < μ < ∞ and 0 < σ, if the pdf of X is
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty.$$

Standard Normal Distributions
The normal distribution with parameter values μ = 0 and σ = 1 is called a standard normal distribution. The random variable is denoted by Z. The pdf is
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty.$$
The cdf is
$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy.$$

Standard Normal Cumulative Areas
[Figure: standard normal curve; shaded area to the left of z = Φ(z)]

Standard Normal Distribution
Let Z be the standard normal variable. Find (from table)
a. P(Z ≤ 0.85)
Area to the left of 0.85 = 0.8023
b. P(Z > 1.32)
1 − P(Z ≤ 1.32) = 0.0934
c. P(−2.1 ≤ Z ≤ 1.78)
Find the area to the left of 1.78, then subtract the area to the left of −2.1.
= P(Z ≤ 1.78) − P(Z ≤ −2.1)
= 0.9625 − 0.0179
= 0.9446
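When no table is at hand, Φ can be evaluated in software. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+), whose `cdf` method is Φ for the default N(0, 1), reproduces the three table answers:

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)          # standard normal; Z.cdf is Phi

a = Z.cdf(0.85)                            # a. P(Z <= 0.85)
b = 1.0 - Z.cdf(1.32)                      # b. P(Z > 1.32) = 1 - P(Z <= 1.32)
c = Z.cdf(1.78) - Z.cdf(-2.1)              # c. P(-2.1 <= Z <= 1.78)
print(round(a, 4), round(b, 4), round(c, 4))   # 0.8023 0.0934 0.9446
```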

Ex. Let Z be the standard normal variable. Find z if
a. P(Z < z) = 0.9278.
Look at the table and find an entry = 0.9278, then read back to find z = 1.46.
b. P(−z < Z < z) = 0.8132.
P(−z < Z < z) = 2P(0 < Z < z)
= 2[P(Z < z) − 0.5]
= 2P(Z < z) − 1 = 0.8132
so P(Z < z) = 0.9066 and z = 1.32.

$z_\alpha$ Notation
$z_\alpha$ will denote the value on the measurement axis for which α of the area under the z curve lies to the right of $z_\alpha$.
[Figure: standard normal curve; shaded area to the right of $z_\alpha$ = P(Z ≥ $z_\alpha$) = α]
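Reading the table backwards corresponds to the inverse cdf. As a sketch, `statistics.NormalDist.inv_cdf` (Python 3.8+) gives both answers above, and $z_\alpha$ is simply the inverse cdf at 1 − α; the helper name `z_alpha` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()                           # standard normal

z_a = Z.inv_cdf(0.9278)                    # a. P(Z < z) = 0.9278
z_b = Z.inv_cdf((1 + 0.8132) / 2)          # b. 2P(Z < z) - 1 = 0.8132, so P(Z < z) = 0.9066

def z_alpha(alpha):
    """Point with area alpha to its right: Phi(z_alpha) = 1 - alpha."""
    return Z.inv_cdf(1.0 - alpha)

print(round(z_a, 2), round(z_b, 2), round(z_alpha(0.05), 3))   # 1.46 1.32 1.645
```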

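Nonstandard Normal Distributions
If X has a normal distribution with mean μ and standard deviation σ, then
$$Z = \frac{X - \mu}{\sigma}$$
has a standard normal distribution.

Normal Curve
Approximate percentage of area within given standard deviations (empirical rule): about 68% of the area lies within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
[Figure: normal curve with the 68%, 95% and 99.7% regions marked]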

Ex. Let X be a normal random variable with μ = 80 and σ = 20. Find P(X ≤ 65).
$$P(X \le 65) = P\left(Z \le \frac{65 - 80}{20}\right) = P(Z \le -0.75) = 0.2266$$

Ex. A particular rash has shown up at an elementary school. It has been determined that the length of time that the rash will last is normally distributed with μ = 6 days and σ = 1.5 days. Find the probability that, for a student selected at random, the rash will last between 3.75 and 9 days.
$$P(3.75 \le X \le 9) = P\left(\frac{3.75 - 6}{1.5} \le Z \le \frac{9 - 6}{1.5}\right) = P(-1.5 \le Z \le 2) = 0.9772 - 0.0668 = 0.9104$$

Percentiles of an Arbitrary Normal Distribution
$$(100p)\text{th percentile for normal}(\mu, \sigma) = \mu + [(100p)\text{th percentile for standard normal}] \cdot \sigma$$
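Both examples and the percentile rule can be reproduced with `statistics.NormalDist` (Python 3.8+), which accepts any μ and σ and standardizes internally. The percentile level p = 0.95 below is an assumed value for illustration.

```python
from statistics import NormalDist

# Example 1: X ~ N(80, 20); P(X <= 65)
X1 = NormalDist(mu=80, sigma=20)
p1 = X1.cdf(65)                         # same as Phi((65 - 80) / 20) = Phi(-0.75)

# Example 2 (rash duration): X ~ N(6, 1.5); P(3.75 <= X <= 9)
X2 = NormalDist(mu=6, sigma=1.5)
p2 = X2.cdf(9) - X2.cdf(3.75)           # Phi(2) - Phi(-1.5)

# Percentile rule: eta(p) = mu + z_p * sigma, z_p = standard normal percentile
p = 0.95                                # assumed percentile level
eta = X2.mean + NormalDist().inv_cdf(p) * X2.stdev
print(round(p1, 4), round(p2, 4), abs(eta - X2.inv_cdf(p)) < 1e-9)   # 0.2266 0.9104 True
```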

Normal Approximation to the Binomial Distribution
Let X be a binomial rv based on n trials, each with probability of success p, and let q = 1 − p. If the binomial probability histogram is not too skewed, X may be approximated by a normal distribution with μ = np and σ = √(npq). With a continuity correction of 0.5,
$$P(X \le x) \approx \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right).$$

Ex. At a particular small college the pass rate of Intermediate Algebra is 72%. If 500 students enroll in a semester, determine the probability that at most 375 students pass.
μ = np = 500(0.72) = 360
σ = √(npq) = √(500(0.72)(0.28)) ≈ 10
$$P(X \le 375) \approx \Phi\left(\frac{375.5 - 360}{10}\right) = \Phi(1.55) = 0.9394$$
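The quality of the approximation can be checked against the exact binomial sum. A minimal sketch, assuming Python 3.8+ for `math.comb` and `statistics.NormalDist`; `binom_cdf` is an illustrative helper:

```python
import math
from statistics import NormalDist

def binom_cdf(x, n, p):
    """Exact P(X <= x) for X ~ Binomial(n, p), summing the pmf."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p, x = 500, 0.72, 375
mu = n * p                                         # 360
sigma = math.sqrt(n * p * (1 - p))                 # about 10.04
approx = NormalDist().cdf((x + 0.5 - mu) / sigma)  # continuity-corrected normal approximation
exact = binom_cdf(x, n, p)
print(round(approx, 4), round(exact, 4))
```

Using the unrounded σ ≈ 10.04 moves the approximation slightly away from the slide's 0.9394. For "at least 375 pass" one would instead compute 1 − binom_cdf(374, n, p).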

Normal approximation to Binomial
• Suppose Y ~ Binomial(n, p).
• Then Y = Y1 + Y2 + Y3 + … + Yn, where Yk ~ Bernoulli(p), E(Yk) = p and Var(Yk) = p(1 − p).
• E(Y) = np, Var(Y) = np(1 − p), SD(Y) = (np(1 − p))^{1/2}.
• Standardize Y: Z = (Y − np)/(np(1 − p))^{1/2}. By the CLT, Z is approximately N(0, 1) for large n. So Y is approximately N[np, (np(1 − p))^{1/2}].
• The Normal approximation to the Binomial is reasonable when np ≥ 10 and n(1 − p) > 10 (p and 1 − p are NOT too small relative to n).

Normal approximation to Binomial Example
• Roulette wheel investigation: compute P(Y ≥ 58), where Y ~ Binomial(100, 0.47), that is, the proportion of the Binomial(100, 0.47) population having at least 58 reds (successes) out of 100 roulette spins (trials). Roulette has 38 slots: 18 red, 18 black and 2 neutral.
• Since np = 47 ≥ 10 and n(1 − p) = 53 > 10, the Normal approximation is justified.
• Z = (Y − np)/√(np(1 − p)) = (58 − 100·0.47)/√(100·0.47·0.53) ≈ 2.2
• P(Y ≥ 58) ≈ P(Z ≥ 2.2) = 0.0139
• True P(Y ≥ 58) = 0.0177, using SOCR (demo!)
• The Normal approximation is useful when there is no access to SOCR.
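A short script can stand in for the SOCR demo: it sums the exact binomial tail and compares it with the slide's approximation and with a continuity-corrected version (58 replaced by 57.5, as in the small-college example). A sketch assuming Python 3.8+; `binom_sf` is an illustrative helper.

```python
import math
from statistics import NormalDist

def binom_sf(y, n, p):
    """Exact P(Y >= y) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y, n + 1))

n, p, y = 100, 0.47, 58
mu, sd = n * p, math.sqrt(n * p * (1 - p))
Phi = NormalDist().cdf

exact = binom_sf(y, n, p)
plain = 1 - Phi((y - mu) / sd)              # slide's approximation, z about 2.2
corrected = 1 - Phi((y - 0.5 - mu) / sd)    # with a continuity correction of 0.5
print(round(exact, 4), round(plain, 4), round(corrected, 4))
```

Here the continuity correction brings the normal answer much closer to the exact tail probability.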

Normal approximation to Poisson
• Let X1 ~ Poisson(λ1) and X2 ~ Poisson(λ2) be independent. Then X1 + X2 ~ Poisson(λ1 + λ2).
• Let X1, X2, X3, …, Xk ~ Poisson(λ), independent, and Yk = X1 + X2 + … + Xk ~ Poisson(kλ), so E(Yk) = Var(Yk) = kλ.
• The random variables in the sum on the right are independent and each has the Poisson distribution with parameter λ.
• By the CLT, the distribution of the standardized variable (Yk − kλ)/(kλ)^{1/2} → N(0, 1) as k increases to infinity.
• So, for k ≥ 100, Zk = (Yk − kλ)/(kλ)^{1/2} is approximately N(0, 1), and Yk is approximately N(kλ, (kλ)^{1/2}).

Normal approximation to Poisson example
• Let X1, X2, X3, …, X200 ~ Poisson(2), independent.
• Yk = X1 + X2 + … + X200 ~ Poisson(400), E(Yk) = Var(Yk) = 400.
• By the CLT, the standardized variable (Yk − 400)/(400)^{1/2} is approximately N(0, 1).
• Zk = (Yk − 400)/20 ≈ N(0, 1), so Yk ≈ N(400, 20), i.e., mean 400 and variance 400.
• P(2 < Yk < 400) = (standardize 2 and 400) = P((2 − 400)/20 < Zk < (400 − 400)/20) = P(−19.9 < Zk < 0) ≈ 0.5
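The exact Poisson(400) probability is easy to compute, so the approximation can be checked directly. This sketch evaluates pmf terms in log space with `math.lgamma` to avoid overflow; `poisson_cdf` is an illustrative helper.

```python
import math
from statistics import NormalDist

def poisson_cdf(y, lam):
    """Exact P(Y <= y) for Y ~ Poisson(lam), with pmf terms computed in log space."""
    return sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(y + 1))

lam = 200 * 2.0                                        # sum of 200 independent Poisson(2) variables
exact = poisson_cdf(399, lam) - poisson_cdf(2, lam)    # P(2 < Y < 400) = P(3 <= Y <= 399)
N = NormalDist(lam, math.sqrt(lam))                    # N(400, 20)
approx = N.cdf(400) - N.cdf(2)
print(round(exact, 4), round(approx, 4))
```

The exact value falls a little below 0.5 because P(Y < 400) excludes the point Y = 400 itself, which the normal approximation without a continuity correction ignores.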

Poisson or Normal approximation to Binomial?
• Poisson Approximation (Binomial(n, p_n) → Poisson(λ)):
WHY?
$$\binom{n}{y} p_n^{\,y} (1 - p_n)^{n - y} \to \frac{\lambda^y e^{-\lambda}}{y!} \quad \text{as } n \to \infty,\ n p_n \to \lambda.$$
Use when n ≥ 100, p ≤ 0.01 and λ = np ≤ 20.
• Normal Approximation (Binomial(n, p) → N(np, (np(1 − p))^{1/2})):
Use when np ≥ 10 and n(1 − p) > 10.
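To make the Poisson rule of thumb concrete, the sketch below compares the two pmfs for an assumed case inside the stated range (n = 1000, p = 0.005, so λ = np = 5) and reports the largest pointwise difference over y = 0..20.

```python
import math

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def poisson_pmf(y, lam):
    return lam**y * math.exp(-lam) / math.factorial(y)

n, p = 1000, 0.005                 # assumed example: n >= 100, p <= 0.01, np = 5 <= 20
lam = n * p
gap = max(abs(binom_pmf(y, n, p) - poisson_pmf(y, lam)) for y in range(21))
print(gap)                         # largest pmf difference over y = 0..20
```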

4.4 The Gamma Distribution and Its Relatives

The Gamma Function
For α > 0, the gamma function Γ(α) is defined by
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx.$$

Gamma Distribution
A continuous rv X has a gamma distribution if the pdf is
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
where the parameters satisfy α > 0, β > 0. The standard gamma distribution has β = 1.

Mean and Variance
The mean and variance of a random variable X having the gamma distribution f(x; α, β) are
$$E(X) = \mu = \alpha\beta, \qquad V(X) = \sigma^2 = \alpha\beta^2.$$

Probabilities from the Gamma Distribution
Let X have a gamma distribution with parameters α and β. Then for any x > 0, the cdf of X is given by
$$P(X \le x) = F(x; \alpha, \beta) = F\left(\frac{x}{\beta}; \alpha\right),$$
where
$$F(x; \alpha) = \int_0^{x} \frac{y^{\alpha - 1} e^{-y}}{\Gamma(\alpha)}\,dy.$$
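The rescaling rule F(x; α, β) = F(x/β; α) means only the standard (β = 1) incomplete gamma integral is ever needed. A sketch under the assumption α ≥ 1 (so the integrand is bounded at 0), using a hand-written midpoint rule and `math.gamma`; for α = 2 the integral has the closed form 1 − e^(−t)(1 + t), which serves as a check.

```python
import math

def std_gamma_cdf(x, alpha, steps=100_000):
    """F(x; alpha): integral of y^(alpha-1) e^(-y) / Gamma(alpha) over [0, x], midpoint rule.
    Assumes alpha >= 1 so the integrand is bounded at 0."""
    h = x / steps
    total = sum(((i + 0.5) * h) ** (alpha - 1) * math.exp(-(i + 0.5) * h) for i in range(steps))
    return total * h / math.gamma(alpha)

def gamma_cdf(x, alpha, beta):
    """P(X <= x) for X ~ Gamma(alpha, beta), via F(x; alpha, beta) = F(x / beta; alpha)."""
    return std_gamma_cdf(x / beta, alpha) if x > 0 else 0.0

alpha, beta = 2.0, 3.0                 # assumed parameters; mean alpha * beta = 6
t = 6.0 / beta
print(gamma_cdf(6.0, alpha, beta), 1 - math.exp(-t) * (1 + t))   # both about 0.594
```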

Exponential Distribution
A continuous rv X has an exponential distribution with parameter λ (λ > 0) if the pdf is
$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

Let X have an exponential distribution with parameter λ. Then the cdf of X is given by
$$F(x; \lambda) = \begin{cases} 0, & x < 0 \\ 1 - e^{-\lambda x}, & x \ge 0. \end{cases}$$

Mean and Variance
The exponential distribution is the gamma distribution with α = 1 and β = 1/λ, so the mean and variance of a random variable X having the exponential distribution are
$$\mu = \alpha\beta = \frac{1}{\lambda}, \qquad \sigma^2 = \alpha\beta^2 = \frac{1}{\lambda^2}.$$

Applications of the Exponential Distribution
Suppose that the number of events occurring in any time interval of length t has a Poisson distribution with parameter αt (α is the expected number of events per unit time) and that the numbers of occurrences in nonoverlapping intervals are independent of one another. Then the distribution of elapsed time between the occurrences of two successive events is exponential with parameter λ = α.

The Chi-Squared Distribution
Let v be a positive integer. Then a random variable X is said to have a chi-squared distribution with parameter v if the pdf of X is the gamma density with α = v/2 and β = 2. The pdf is
$$f(x; v) = \begin{cases} \dfrac{1}{2^{v/2}\,\Gamma(v/2)}\, x^{(v/2) - 1} e^{-x/2}, & x \ge 0 \\ 0, & x < 0. \end{cases}$$
The parameter v is called the number of degrees of freedom (df) of X. The symbol χ² is often used in place of "chi-squared."
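Since the chi-squared pdf is the gamma density with α = v/2 and β = 2, its mean is αβ = v and its variance is αβ² = 2v. A numerical sketch confirming this for an assumed v = 4 (v ≥ 2 keeps the pdf bounded at 0); `moment` is an illustrative midpoint-rule helper.

```python
import math

def chi2_pdf(x, v):
    """Chi-squared pdf with v df: the gamma density with alpha = v/2, beta = 2."""
    if x < 0:
        return 0.0
    return x ** (v / 2 - 1) * math.exp(-x / 2) / (2 ** (v / 2) * math.gamma(v / 2))

def moment(g, v, upper=100.0, steps=100_000):
    """Midpoint-rule integral of g(x) * chi2_pdf(x, v) over [0, upper]; the tail is negligible."""
    h = upper / steps
    return sum(g((i + 0.5) * h) * chi2_pdf((i + 0.5) * h, v) for i in range(steps)) * h

v = 4                                           # assumed degrees of freedom
total = moment(lambda x: 1.0, v)
mean = moment(lambda x: x, v)
var = moment(lambda x: (x - mean) ** 2, v)
print(round(total, 6), round(mean, 6), round(var, 6))   # about 1, v = 4 and 2v = 8
```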

Identifying Common Distributions: QQ plots
• Quantile-Quantile plots indicate how well the model distribution agrees with the data.
• The q-th quantile, for 0 < q < 1, is the (data-space) value, V_q, at or below which lies a proportion q of the data.
[Figure: graph of the CDF F_Y(y), with F_Y(V_q) = P(Y ≤ V_q) = q]

Constructing QQ plots
• Start off with data {y1, y2, y3, …, yn}.
• Order statistics: y(1) ≤ y(2) ≤ y(3) ≤ … ≤ y(n).
• Compute the quantile rank, q(k), for each observation, y(k):
P(Y ≤ q(k)) = (k − 0.375)/(n + 0.250),
where Y is a RV from the (target) model distribution.
• Finally, plot the points (y(k), q(k)) in the 2D plane, 1 ≤ k ≤ n.
• Note: different statistical packages use slightly different formulas for the computation of q(k). However, the results are quite similar. This is the formula employed in SAS.
• Basic idea: P((model) Y ≤ (data) y(1)) ≈ 1/n; P(Y ≤ y(2)) ≈ 2/n; P(Y ≤ y(3)) ≈ 3/n; …

Example - Constructing QQ plots
• Start off with data {y1, y2, y3, …, yn}.
• Plot the points (y(k), q(k)) in the 2D plane, 1 ≤ k ≤ n.
[Figure: SYSTAT normal probability plot (Graph, Probability Plot, Var4, Normal Distribution) of BirthdayDistribution_1978_systat.SYD; vertical axis "Expected Value for Normal Distribution" from −3 to 3]
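The construction above can be sketched in a few lines, assuming a normal target model: sort the data and pair each y(k) with the model quantile at rank (k − 0.375)/(n + 0.25), obtained from `statistics.NormalDist.inv_cdf`. The function name `qq_points` and the sample values are hypothetical.

```python
from statistics import NormalDist

def qq_points(data, model=NormalDist()):
    """Return (y_(k), q_(k)) pairs: ordered data against model quantiles at rank
    (k - 0.375) / (n + 0.25), k = 1..n."""
    ys = sorted(data)
    n = len(ys)
    return [(y, model.inv_cdf((k - 0.375) / (n + 0.25))) for k, y in enumerate(ys, start=1)]

sample = [2.1, -0.4, 0.9, 1.5, -1.2, 0.3, 0.0, -0.8]   # hypothetical data
for y, q in qq_points(sample):
    print(f"{y:5.2f}  {q:6.3f}")
```

If the data follow the model, the printed pairs lie close to a straight line; any other target distribution with an `inv_cdf`-style quantile function could be swapped in for `model`.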

4.5 Other Continuous Distributions

The Weibull Distribution
A continuous rv X has a Weibull distribution if the pdf is
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha - 1} e^{-(x/\beta)^{\alpha}}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where the parameters satisfy α > 0, β > 0.

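Mean and Variance
The mean and variance of a random variable X having the Weibull distribution are
$$\mu = \beta\,\Gamma\left(1 + \frac{1}{\alpha}\right), \qquad \sigma^2 = \beta^2\left\{\Gamma\left(1 + \frac{2}{\alpha}\right) - \left[\Gamma\left(1 + \frac{1}{\alpha}\right)\right]^2\right\}.$$

Weibull Distribution
The cdf of a Weibull rv having parameters α and β is
$$F(x; \alpha, \beta) = \begin{cases} 0, & x < 0 \\ 1 - e^{-(x/\beta)^{\alpha}}, & x \ge 0. \end{cases}$$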
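The gamma-function formulas for the Weibull mean and variance can be checked against direct numerical integration of the pdf. A sketch with assumed parameters α = 2 and β = 3, using `math.gamma` and a midpoint rule on [0, 40] (the tail beyond 40 is negligible for these values):

```python
import math

def weibull_pdf(x, a, b):
    return (a / b**a) * x**(a - 1) * math.exp(-((x / b) ** a)) if x >= 0 else 0.0

def weibull_cdf(x, a, b):
    return 1.0 - math.exp(-((x / b) ** a)) if x >= 0 else 0.0

a, b = 2.0, 3.0                                   # assumed shape alpha and scale beta
mu = b * math.gamma(1 + 1 / a)
var = b**2 * (math.gamma(1 + 2 / a) - math.gamma(1 + 1 / a) ** 2)

steps = 200_000
h = 40.0 / steps
xs = [(i + 0.5) * h for i in range(steps)]        # midpoints on [0, 40]
mu_num = sum(x * weibull_pdf(x, a, b) for x in xs) * h
var_num = sum((x - mu_num) ** 2 * weibull_pdf(x, a, b) for x in xs) * h
print(mu, mu_num, var, var_num)
print(weibull_cdf(b, a, b), 1 - math.exp(-1))     # F(beta) = 1 - e^(-1) for any alpha
```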

Lognormal Distribution
A nonnegative rv X has a lognormal distribution if the rv Y = ln(X) has a normal distribution; the resulting pdf has parameters μ and σ and is
$$f(x; \mu, \sigma) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-[\ln(x) - \mu]^2/(2\sigma^2)}, & x > 0 \\ 0, & x \le 0. \end{cases}$$

Mean and Variance
The mean and variance of a variable X having the lognormal distribution are
$$E(X) = e^{\mu + \sigma^2/2}, \qquad V(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right).$$

Lognormal Distribution
The cdf of the lognormal distribution is given by
$$F(x; \mu, \sigma) = P(X \le x) = P[\ln(X) \le \ln(x)] = P\left(Z \le \frac{\ln(x) - \mu}{\sigma}\right) = \Phi\left(\frac{\ln(x) - \mu}{\sigma}\right), \qquad x > 0.$$

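Beta Distribution
A rv X is said to have a beta distribution with parameters A, B, α > 0, and β > 0 if the pdf of X is
$$f(x; \alpha, \beta, A, B) = \begin{cases} \dfrac{1}{B - A} \cdot \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \left(\dfrac{x - A}{B - A}\right)^{\alpha - 1} \left(\dfrac{B - x}{B - A}\right)^{\beta - 1}, & A \le x \le B \\ 0, & \text{otherwise.} \end{cases}$$

Mean and Variance
The mean and variance of a variable X having the beta distribution are
$$\mu = A + (B - A)\cdot\frac{\alpha}{\alpha + \beta}, \qquad \sigma^2 = \frac{(B - A)^2\,\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$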

4.6 Probability Plots

Sample Percentile
Order the n sample observations from smallest to largest. The ith smallest observation in the list is taken to be the [100(i − 0.5)/n]th sample percentile.

Probability Plot
A probability plot is a plot of the pairs
([100(i − 0.5)/n]th percentile of the distribution, ith smallest sample observation).
If the sample percentiles are close to the corresponding population distribution percentiles, the first number will roughly equal the second.

Normal Probability Plot
A plot of the pairs
([100(i − 0.5)/n]th z percentile, ith smallest observation)
on a two-dimensional coordinate system is called a normal probability plot. If the sample observations are drawn from a normal distribution with mean μ and standard deviation σ, the points should fall close to a line with slope σ and intercept μ.
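The slope-σ, intercept-μ property can be illustrated by simulation. This sketch assumes simulated N(10, 2) data from Python's seeded `random.Random.gauss`, forms the normal probability plot pairs with `statistics.NormalDist.inv_cdf`, and fits an ordinary least-squares line; the helper names are illustrative.

```python
import random
from statistics import NormalDist

def normal_prob_plot_pairs(data):
    """Pairs ([100(i - 0.5)/n]th z percentile, ith smallest observation), i = 1..n."""
    ys = sorted(data)
    n = len(ys)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), y) for i, y in enumerate(ys, start=1)]

def least_squares_line(pairs):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(5_000)]   # simulated N(mu=10, sigma=2) data
slope, intercept = least_squares_line(normal_prob_plot_pairs(sample))
print(round(slope, 2), round(intercept, 2))             # close to sigma = 2 and mu = 10
```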

Beyond Normality
Consider a family of probability distributions involving two parameters θ1 and θ2. Let F(x; θ1, θ2) denote the corresponding cdfs. The parameters θ1 and θ2 are said to be location and scale parameters if F(x; θ1, θ2) is a function of (x − θ1)/θ2.

Relation among Distributions
[Figure: relation among distributions, linking Normal (X) and Normal (Z) via Z = (X − μ)/σ; Normal (Z) and Chi-square (df) via χ² = Σ_{i=1}^{df} Z_i²; Normal (X) and Lognormal (Y) via Y = e^X and X = ln Y; Uniform (X) on (α, β) and Uniform (U) on (0, 1) via X = (β − α)U + α; Uniform (U) and Exponential (X) via X = −ln U; and Gamma (α = n/2, β = 2 gives Chi-square), Exponential, Weibull, Beta (α = β = 1 gives Uniform(0, 1)), Student t (df = n) and Cauchy (0, 1)]
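Two of the transformations in the relation diagram turn Uniform(0, 1) draws into other distributions, which is the basis of inverse-transform sampling. A seeded simulation sketch: X = −ln U should be exponential with λ = 1 (mean 1), and X = (β − α)U + α should be uniform on (α, β); the values α = 2 and β = 5 are assumptions for illustration.

```python
import math
import random

rng = random.Random(42)
n = 200_000
us = [rng.random() for _ in range(n)]           # U ~ Uniform(0, 1); random() lies in [0.0, 1.0)

# X = -ln U is Exponential with lambda = 1. Using 1 - U inside the log avoids log(0),
# and 1 - U is also Uniform(0, 1), so the distribution is unchanged.
xs = [-math.log(1.0 - u) for u in us]

# X = (beta - alpha) U + alpha is Uniform(alpha, beta); alpha = 2, beta = 5 are assumed values.
alpha, beta = 2.0, 5.0
ws = [(beta - alpha) * u + alpha for u in us]

mean_x = sum(xs) / n
mean_w = sum(ws) / n
print(round(mean_x, 3), round(mean_w, 3))       # near 1 and (alpha + beta) / 2 = 3.5
```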