Week 5

Statistics for Data Science - 2
Week 5 Notes
Continuous Random Variables
1. Cumulative distribution function:

A function F : R → [0, 1] is said to be a Cumulative Distribution Function (CDF) if
(i) F is a non-decreasing function taking values between 0 and 1.
(ii) As x → −∞, F → 0
(iii) As x → ∞, F → 1
(iv) Technical: F is continuous from the right.
2. CDF of a random variable:

Cumulative distribution function of a random variable X is a function FX : R → [0, 1]
defined as
FX (x) = P (X ≤ x)
Properties of CDF
• FX (b) − FX (a) = P (a < X ≤ b)

• FX is a non-decreasing function of x.
• FX takes non-negative values.
• As x → −∞, FX (x) → 0
• As x → ∞, FX (x) → 1
3. Theorem: Random variable with CDF F(x)

Given a valid CDF F (x), there exists a random variable X taking values in R such
that
P (X ≤ x) = F (x)
• If F is not continuous at x and F rises from F1 to F2 at x (a jump at x), then
P (X = x) = F2 − F1
• If F is continuous at x, then
P (X = x) = 0
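As a small illustration of the jump rule, the sketch below uses a hypothetical step CDF (a fair coin taking the values 0 and 1, which is not from these notes) and approximates the left limit of F numerically with a small offset `eps`. The names `coin_cdf` and `jump_size` are assumptions made only for this example.

```python
def coin_cdf(x):
    """CDF of a hypothetical random variable that is 0 or 1 with probability 1/2 each."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0


def jump_size(cdf, x, eps=1e-9):
    """Approximate P(X = x) = F(x) - F(x-) using a small left offset eps."""
    return cdf(x) - cdf(x - eps)


print(jump_size(coin_cdf, 0.0))  # F jumps from 0 to 0.5 at x = 0
print(jump_size(coin_cdf, 0.5))  # F is continuous at x = 0.5
```

The offset `eps` only stands in for the left limit; for a CDF with jumps closer together than `eps` the estimate would be wrong.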
4. Continuous random variable:

A random variable X with CDF FX (x) is said to be a continuous random variable if
FX (x) is continuous at every x.
Properties of continuous random variables
• CDF has no jumps or steps.

• P (X = x) = 0 for all x.
• The probability of X falling in an interval is given by the CDF and can be nonzero:
P (a < X ≤ b) = F (b) − F (a)
• Since P (X = a) = 0 and P (X = b) = 0, we have
P (a ≤ X ≤ b) = P (a < X ≤ b) = P (a ≤ X < b) = P (a < X < b)
5. Probability density function (PDF):

A continuous random variable X with CDF FX (x) is said to have a PDF fX (x) if, for
all x0,
\[ F_X(x_0) = \int_{-\infty}^{x_0} f_X(x)\,dx \]
• CDF is the integral of the PDF.

• Derivative of the CDF (wherever it exists) is usually taken as the PDF.
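• The value of the PDF at x0 is related to the chance of X taking a value near x0: where fX is continuous, P (x0 < X ≤ x0 + δ) ≈ fX (x0 ) δ for small δ > 0.
• The higher the PDF in a region, the higher the chance that X lies there.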
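The link between the PDF and the probability of a small interval can be checked numerically. The sketch below uses the exponential distribution defined later in these notes, with an assumed rate `LAM = 2.0` and an assumed point `x0 = 0.5`, and compares the exact interval probability from the CDF with the approximation PDF times interval width.

```python
import math

LAM = 2.0  # assumed rate, chosen only for illustration


def cdf(x):
    """CDF of Exp(LAM)."""
    return 1.0 - math.exp(-LAM * x) if x > 0 else 0.0


def pdf(x):
    """PDF of Exp(LAM), the derivative of the CDF for x > 0."""
    return LAM * math.exp(-LAM * x) if x > 0 else 0.0


x0, delta = 0.5, 1e-4
exact = cdf(x0 + delta) - cdf(x0)   # P(x0 < X <= x0 + delta)
approx = pdf(x0) * delta            # density times interval width
print(exact, approx)
```

The two printed values agree more closely as `delta` shrinks, which is the sense in which the PDF measures the chance of X lying near x0.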
6. For a random variable X with PDF fX , an event A is a subset of the real line and its
probability is computed as
\[ P(A) = \int_A f_X(x)\,dx \]
• \( P(a < X < b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx \)
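To see that integrating the PDF over an interval matches the difference of CDF values, the sketch below uses a hypothetical density f(x) = 2x on (0, 1), whose CDF is x² on that interval. The density, the endpoints, and the midpoint-rule helper `integrate` are all assumptions made for the example.

```python
def pdf(x):
    """Hypothetical density f(x) = 2x on (0, 1), zero elsewhere."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0


def cdf(x):
    """CDF of the density above: 0 below 0, x^2 on [0, 1], 1 above 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


a, b = 0.2, 0.5
by_integral = integrate(pdf, a, b)
by_cdf = cdf(b) - cdf(a)
print(by_integral, by_cdf)  # both are approximately 0.21
```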
7. Density function:
A function f : R → R is said to be a density function if
(i) f (x) ≥ 0
(ii) \( \int_{-\infty}^{\infty} f(x)\,dx = 1 \)
(iii) f (x) is piecewise continuous
8. Given a density function f , there is a continuous random variable X whose PDF is f .
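The first two conditions for a density function can be checked numerically for a given candidate. The sketch below tests a hypothetical function, 0.75 (1 − x²) on (−1, 1), by sampling it on a grid for non-negativity and approximating its total integral with a midpoint rule. The candidate and the helper name `total_mass` are assumptions for illustration, and a grid check is evidence rather than a proof.

```python
def candidate(x):
    """Hypothetical candidate density: 0.75 * (1 - x^2) on (-1, 1), zero elsewhere."""
    return 0.75 * (1.0 - x * x) if -1.0 < x < 1.0 else 0.0


def total_mass(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h


# Condition (i): non-negative on a grid covering [-2, 2].
grid = [-2.0 + 4.0 * i / 1000 for i in range(1001)]
nonneg = all(candidate(x) >= 0.0 for x in grid)

# Condition (ii): integrates to 1 (the function is zero outside (-1, 1)).
mass = total_mass(candidate, -1.0, 1.0)
print(nonneg, mass)
```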
9. Support of random variable X

Support of the random variable X with PDF fX is
supp(X) = {x : fX (x) > 0}
• supp(X) contains intervals in which X can fall with positive probability.
10. Continuous Uniform distribution:
• X ∼ Uniform[a, b]
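• PDF:
\[ f_X(x) = \begin{cases} \dfrac{1}{b-a} & a < x < b \\ 0 & \text{otherwise} \end{cases} \]
• CDF:
\[ F_X(x) = \begin{cases} 0 & x \le a \\ \dfrac{x-a}{b-a} & a < x < b \\ 1 & x \ge b \end{cases} \]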
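The uniform PDF and CDF translate directly into code. The following is a minimal sketch; the function names and the endpoints a = 2, b = 6 are chosen only for illustration.

```python
def uniform_pdf(x, a, b):
    """PDF of Uniform[a, b]: constant 1/(b - a) on (a, b), zero elsewhere."""
    return 1.0 / (b - a) if a < x < b else 0.0


def uniform_cdf(x, a, b):
    """CDF of Uniform[a, b]: 0 up to a, linear on (a, b), 1 from b onward."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 2.0, 6.0
# P(3 < X <= 5) = F(5) - F(3) = 0.75 - 0.25
p = uniform_cdf(5.0, a, b) - uniform_cdf(3.0, a, b)
print(p)
```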

11. Exponential distribution:

• X ∼ Exp(λ)
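• PDF:
\[ f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{otherwise} \end{cases} \]
• CDF:
\[ F_X(x) = \begin{cases} 0 & x \le 0 \\ 1 - e^{-\lambda x} & x > 0 \end{cases} \]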
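The exponential PDF and CDF can be sketched the same way. The rate `lam = 0.5` and the interval (1, 3] below are assumed values used only to show an interval probability computed from the CDF.

```python
import math


def exp_pdf(x, lam):
    """PDF of Exp(lam): lam * exp(-lam * x) for x > 0, zero otherwise."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0


def exp_cdf(x, lam):
    """CDF of Exp(lam): 1 - exp(-lam * x) for x > 0, zero otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


lam = 0.5
# P(1 < X <= 3) = F(3) - F(1) = exp(-0.5) - exp(-1.5)
p = exp_cdf(3.0, lam) - exp_cdf(1.0, lam)
print(p)
```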
12. Normal distribution:

• X ∼ Normal(µ, σ²)
• PDF:
\[ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty \]
• CDF:
\[ F_X(x) = \int_{-\infty}^{x} f_X(u)\,du \]
• CDF has no closed form expression.

• Standard normal: Z ∼ Normal(0, 1)
– PDF: \( f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right), \quad -\infty < z < \infty \)
13. Standardization:
If X ∼ Normal(µ, σ²), then
\[ Z = \frac{X-\mu}{\sigma} \sim \text{Normal}(0, 1) \]
14. To compute probabilities for a normal distribution, convert the computation to one
for the standard normal.
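The standardization step can be sketched in code. Because the normal CDF has no closed form, the example evaluates the standard normal CDF through the error function from Python's `math` module, using the identity Φ(z) = (1 + erf(z/√2))/2. The parameter values µ = 10, σ = 2 and the helper names are assumptions for illustration.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval_prob(a, b, mu, sigma):
    """P(a < X <= b) for X ~ Normal(mu, sigma^2), via Z = (X - mu) / sigma."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return std_normal_cdf(z_b) - std_normal_cdf(z_a)


# X ~ Normal(10, 4): the interval (8, 12] is within one standard deviation of the mean.
p = normal_interval_prob(8.0, 12.0, mu=10.0, sigma=2.0)
print(p)  # approximately 0.6827
```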
15. Functions of continuous random variable:
Suppose X is a continuous random variable with CDF FX and PDF fX and suppose
g : R → R is a (reasonable) function. Then, Y = g(X) is a random variable with CDF
FY determined as follows:
• FY (y) = P (Y ≤ y) = P (g(X) ≤ y) = P (X ∈ {x : g(x) ≤ y})
• To evaluate the above probability
– Convert the subset Ay = {x : g(x) ≤ y} into intervals in real line.
– Find the probability that X falls in those intervals.
– \( F_Y(y) = P(X \in A_y) = \int_{A_y} f_X(x)\,dx \)
• If FY has no jumps, you may be able to differentiate and find a PDF.
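The procedure above can be sketched for one concrete case. Here X ∼ Uniform[−1, 2] and g(x) = x² are assumed for illustration; for y ≥ 0 the set {x : x² ≤ y} is the interval [−√y, √y], so the CDF of Y = X² follows from two evaluations of the CDF of X.

```python
import math


def cdf_x(x, a=-1.0, b=2.0):
    """CDF of X ~ Uniform[a, b] (assumed distribution for this example)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def cdf_y(y):
    """CDF of Y = X^2: P(X^2 <= y) = P(-sqrt(y) <= X <= sqrt(y)) for y >= 0."""
    if y < 0:
        return 0.0  # X^2 is never negative, so the set {x : x^2 <= y} is empty
    r = math.sqrt(y)
    # X is continuous, so including or excluding the endpoints makes no difference.
    return cdf_x(r) - cdf_x(-r)


for y in (0.25, 1.0, 4.0):
    print(y, cdf_y(y))
```

In this example the CDF of Y has no jumps, so it could be differentiated piece by piece to obtain a PDF.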
16. Theorem: Monotonic differentiable function
Suppose X is a continuous random variable with PDF fX . Let g(x) be monotonic for
x ∈ supp(X) with derivative g′(x) = dg(x)/dx. Then, the PDF of Y = g(X) is
\[ f_Y(y) = \frac{1}{\left|g'\!\left(g^{-1}(y)\right)\right|}\, f_X\!\left(g^{-1}(y)\right) \]
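• Translation: Y = X + a
\[ f_Y(y) = f_X(y - a) \]
• Scaling: Y = aX, with a ≠ 0
\[ f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y}{a}\right) \]
• Affine: Y = aX + b, with a ≠ 0
\[ f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y-b}{a}\right) \]
• An affine transformation of a normal random variable is normal: if X ∼ Normal(µ, σ²) and a ≠ 0, then aX + b ∼ Normal(aµ + b, a²σ²).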
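The monotonic-transformation formula can be written as a small higher-order function. The sketch below applies it to an affine map of a standard normal variable and compares the result with the Normal(b, a²) density, which is what the formula should reproduce. The helper `transformed_pdf` and the values a = −3, b = 2 are assumptions for illustration.

```python
import math


def transformed_pdf(f_x, g_inv, g_prime):
    """Return the PDF of Y = g(X) for monotonic g, given g's inverse and derivative."""
    def f_y(y):
        x = g_inv(y)
        return f_x(x) / abs(g_prime(x))
    return f_y


def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of Normal(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


a, b = -3.0, 2.0  # Y = aX + b with X standard normal
f_y = transformed_pdf(
    normal_pdf,
    g_inv=lambda y: (y - b) / a,   # inverse of g(x) = a*x + b
    g_prime=lambda x: a,           # derivative of g is the constant a
)

for y in (-4.0, 0.5, 2.0, 7.0):
    print(y, f_y(y), normal_pdf(y, mu=b, sigma=abs(a)))
```

The absolute value in the formula is what makes a negative slope such as a = −3 work without any change to the code.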
17. Expected value of function of continuous random variable:
Let X be a continuous random variable with density fX (x). Let g : R → R be a
function. The expected value of g(X), denoted E[g(X)], is given by
\[ E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx \]
whenever the above integral exists.

• The integral may diverge to ±∞ or may not exist in some cases.
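When the integral exists and the support is a bounded interval, E[g(X)] can be approximated numerically. The sketch below does this for X ∼ Uniform[0, 1] and g(x) = x², where the exact value is 1/3. The midpoint rule and the helper name `expected_value` are assumptions for illustration, not a method prescribed by these notes.

```python
def expected_value(g, pdf, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g(x) * pdf(x) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += g(x) * pdf(x)
    return total * h


def uniform01_pdf(x):
    """PDF of Uniform[0, 1]."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


# E[X^2] for X ~ Uniform[0, 1]; integrating over the support is enough.
e_x2 = expected_value(lambda x: x * x, uniform01_pdf, 0.0, 1.0)
print(e_x2)  # approximately 1/3
```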
18. Expected value (mean) of a continuous random variable:
Mean, denoted E[X] or µX or simply µ, is given by
\[ E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx \]
19. Variance of a continuous random variable:
Variance, denoted Var[X] or σX² or simply σ², is given by
\[ \mathrm{Var}(X) = E\!\left[(X - E[X])^2\right] = \int_{-\infty}^{\infty} (x-\mu)^2 f_X(x)\,dx \]
• Variance is a measure of spread of X about its mean.

• Var(X) = E[X²] − (E[X])²

X                E[X]         Var(X)
Uniform[a, b]    (a + b)/2    (b − a)²/12
Exp(λ)           1/λ          1/λ²
Normal(µ, σ²)    µ            σ²
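The Exp(λ) row of the table can be checked numerically using Var(X) = E[X²] − (E[X])². The sketch below assumes λ = 2 and truncates the integrals at 40/λ, beyond which the remaining tail mass is about e⁻⁴⁰ and is ignored; the midpoint-rule helper is likewise an assumption for illustration.

```python
import math


def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h


lam = 2.0            # assumed rate
upper = 40.0 / lam   # truncation point; the tail beyond it is negligible


def pdf(x):
    """PDF of Exp(lam) on x > 0."""
    return lam * math.exp(-lam * x)


mean = integrate(lambda x: x * pdf(x), 0.0, upper)
second_moment = integrate(lambda x: x * x * pdf(x), 0.0, upper)
variance = second_moment - mean ** 2

print(mean, 1.0 / lam)            # both approximately 0.5
print(variance, 1.0 / lam ** 2)   # both approximately 0.25
```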
20. Markov’s inequality:

If X is a continuous random variable with mean µ and non-negative supp(X) (i.e.
P (X < 0) = 0), then for every c > 0,
\[ P(X > c) \le \frac{\mu}{c} \]
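Markov's inequality can be compared against an exact tail probability. The sketch below uses X ∼ Exp(λ) with an assumed rate λ = 2, for which µ = 1/λ and P(X > c) = e^(−λc), and prints the exact tail next to the Markov bound for a few assumed values of c.

```python
import math

lam = 2.0          # assumed rate
mu = 1.0 / lam     # mean of Exp(lam)

rows = []
for c in (0.25, 0.5, 1.0, 2.0, 5.0):
    exact_tail = math.exp(-lam * c)   # P(X > c) for Exp(lam)
    markov_bound = mu / c             # bound from Markov's inequality
    rows.append((c, exact_tail, markov_bound))
    print(c, exact_tail, markov_bound)
```

For small c the bound exceeds 1 and says nothing; for large c it holds but is far from the exact tail, which is typical of a bound that uses only the mean.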
21. Chebyshev’s inequality:
If X is a continuous random variable with mean µ and variance σ², then for every k > 0,
\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \]
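Chebyshev's inequality can be checked the same way against a normal random variable, for which P(|X − µ| ≥ kσ) = 2(1 − Φ(k)) = erfc(k/√2) regardless of µ and σ. The values of k below are assumed for illustration.

```python
import math

rows = []
for k in (1.0, 2.0, 3.0):
    exact = math.erfc(k / math.sqrt(2.0))   # P(|X - mu| >= k*sigma) for a normal X
    bound = 1.0 / k ** 2                    # Chebyshev bound
    rows.append((k, exact, bound))
    print(k, exact, bound)
```

The bound holds for every distribution with finite variance, so it is much looser than the exact normal tail.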