MB+SCP
mbhushan,sachinp@iitb.ac.in
Spring 2020
Expectations
with f_X(x) being the probability density function. For the discrete case, it is a generalization of the probability mass function, written using Dirac delta functions. Also denoted as µ.
It is the average value of X in a large number of repetitions of the
experiment and not necessarily the value that we expect X to have.
and,

E[X] = ∫_{−∞}^{∞} x (Σ_i α_i δ(x − x_i)) dx = Σ_i α_i ∫_{−∞}^{∞} x δ(x − x_i) dx = Σ_i α_i x_i
For the case of two coin tosses with a fair coin, if X = number of heads, then

E[X] = 0(1/4) + 1(1/2) + 2(1/4) = 1
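As a quick numerical sketch of this computation (the PMF values are the ones above):

```python
# Expected number of heads in two fair-coin tosses, from the PMF above.
pmf = {0: 1/4, 1: 1/2, 2: 1/4}   # P(X = k), k = number of heads

mean = sum(k * p for k, p in pmf.items())
print(mean)  # 1.0
```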
If the probability of a head on a coin toss were 3/4,

P(X = k) = (1/4)² = 1/16,             k = 0
           2 × (1/4) × (3/4) = 6/16,  k = 1
           (3/4)² = 9/16,             k = 2
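The same PMF can be generated from the binomial formula; a small sketch (using `math.comb` is our choice, not from the slides):

```python
from math import comb

# PMF of the number of heads in n = 2 tosses with P(head) = 3/4,
# and its expectation, via the binomial formula.
n, p = 2, 3/4
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

print(pmf)                              # {0: 0.0625, 1: 0.375, 2: 0.5625}
mean = sum(k * q for k, q in pmf.items())
print(mean)                             # 1.5, i.e. n*p
```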
Q. You are waiting for a phone call after dinner (9 PM). The number of hours (X) after 9 PM until the call arrives is a random variable with probability density function:

f(x) = 1/1.5,  0 < x < 1.5
       0,      otherwise
Answer: E[X] = ∫_0^{1.5} (x/1.5) dx = 0.75.
Note: The above density function is a uniform density function over (0, 1.5) and
the random variable is said to have a uniform distribution over (0, 1.5).
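A numerical check of E[X] = 0.75 via a midpoint-rule approximation of the integral (the step count is an arbitrary choice):

```python
# Midpoint-rule approximation of E[X] = integral of x * (1/1.5)
# over (0, 1.5) for the uniform density above.
N = 100_000
h = 1.5 / N
approx = sum((i + 0.5) * h * (1 / 1.5) * h for i in range(N))
print(approx)  # ≈ 0.75
```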
Given RV X and its PMF or PDF, compute the expected value of some function of X, say g(X).
Option 1:
- g(X) is also a RV: for a given ω ∈ S, x = X(ω) is a number and g(X(ω)) = g(x) is also a number.
- Find the probability distribution of g(X) and then compute its expectation.
For example, let X be uniform on (0, 1) and Y = g(X) = X³. Then, for 0 < y < 1,

F_Y(y) = P{Y ≤ y} = P{X³ ≤ y} = P{X ≤ y^{1/3}} = ∫_0^{y^{1/3}} 1 dx = y^{1/3}

Thus,

F_Y(y) = 0,        y ≤ 0
         y^{1/3},  0 < y < 1
         1,        1 ≤ y
E[Y] = ∫_{−∞}^{∞} y f_Y(y) dy = ∫_0^1 y · (1/3) y^{−2/3} dy = ∫_0^1 (1/3) y^{1/3} dy = 1/4
Option 2: compute directly from the distribution of X:

E[g(X)] = Σ_i g(x_i) α_i,           X discrete
          ∫_{−∞}^{∞} g(x) f(x) dx,  X continuous

For the earlier example, E[g(X)] = ∫_0^1 x³ dx = 1/4, as before.
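Both routes to E[Y] can be checked numerically; a sketch using midpoint-rule integration (the grid size is an arbitrary choice):

```python
# Two routes to E[Y] for Y = X^3 with X uniform on (0, 1):
# (1) via the derived density f_Y(y) = (1/3) y^(-2/3) on (0, 1),
# (2) directly, E[g(X)] = integral of x^3 * f_X(x) with f_X = 1 on (0, 1).
N = 100_000
h = 1.0 / N
mid = [(i + 0.5) * h for i in range(N)]          # midpoints of the grid

e_via_fY = sum(y * (1/3) * y ** (-2/3) * h for y in mid)
e_via_direct = sum(x ** 3 * h for x in mid)

print(e_via_fY, e_via_direct)  # both ≈ 0.25
```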
For constants a, b,

E[aX + b] = a E[X] + b

for X continuous or discrete. Show this.
What happens for a = 0? The expectation of a constant is the constant itself: E[b] = b.
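A quick check of linearity on the two-coin-toss PMF, with arbitrary choices of a and b:

```python
# Verify E[aX + b] = a E[X] + b on a discrete PMF (two fair-coin tosses).
pmf = {0: 1/4, 1: 1/2, 2: 1/4}
a, b = 3.0, 5.0                   # arbitrary constants

lhs = sum((a * x + b) * p for x, p in pmf.items())
rhs = a * sum(x * p for x, p in pmf.items()) + b
print(lhs, rhs)  # 8.0 8.0
```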
E[X] is called the mean (µ) or the first moment of X.
E[X^n] is called the nth moment of X.
E[X^n] = Σ_i x_i^n α_i,           X discrete
         ∫_{−∞}^{∞} x^n f(x) dx,  X continuous
Var(X) = Σ_i (x_i − E[X])² α_i
       = Σ_i (x_i² − 2 x_i E[X] + (E[X])²) α_i
       = Σ_i x_i² α_i − 2 E[X] Σ_i x_i α_i + (E[X])² Σ_i α_i
       = E[X²] − 2(E[X])² + (E[X])²
       = E[X²] − (E[X])²
       = E[X²] − µ²
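The shortcut Var(X) = E[X²] − (E[X])² can be checked against the definition on a discrete PMF (here the biased-coin PMF with p = 3/4 from earlier):

```python
# Variance two ways: from the definition, and from E[X^2] - (E[X])^2.
pmf = {0: 1/16, 1: 6/16, 2: 9/16}   # biased-coin example, p = 3/4

mu = sum(x * p for x, p in pmf.items())
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())
var_id = sum(x * x * p for x, p in pmf.items()) - mu ** 2

print(var_def, var_id)  # both 0.375, i.e. np(1 - p)
```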
For Y = aX + b:
E[Y] = aE[X] + b = aµ + b
Var(aX + b) = a² Var(X)
a = 0: Var(b) = 0.
b = 0: Var(aX) = a² Var(X); b does not affect variance.
skewness = E[(X − µ)³] / (E[(X − µ)²])^{3/2} = E[(X − µ)³] / σ³

is the (normalized) third moment about the mean.
Indicates degree of asymmetry
[Figure: three density shapes: (a) negative or left skew, (b) symmetric, (c) positive or right skew.]
The normalized fourth moment about the mean is the kurtosis:

E[(X − µ)⁴] / σ⁴
Can keep defining higher moments.
Moments can be used to summarize a probability density function or
probability mass function.
We tend to use only the first two moments usually.
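A sketch computing the normalized third and fourth moments (skewness and kurtosis) of a discrete PMF, using the biased-coin PMF with p = 3/4 as an example:

```python
# Skewness and kurtosis of a discrete PMF from their definitions.
pmf = {0: 1/16, 1: 6/16, 2: 9/16}   # biased-coin example, p = 3/4

mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sigma = var ** 0.5

skew = sum((x - mu) ** 3 * p for x, p in pmf.items()) / sigma ** 3
kurt = sum((x - mu) ** 4 * p for x, p in pmf.items()) / sigma ** 4
print(skew, kurt)  # ≈ -0.8165 and ≈ 2.6667: left-skewed, since p > 1/2
```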
How does one obtain E[X^n] from the given density function of X? From the basic definition,

E[X^n] = Σ_i x_i^n α_i,           X discrete
         ∫_{−∞}^{∞} x^n f(x) dx,  X continuous
Another route is the moment generating function:

φ_X(t) = φ(t) = E[e^{tX}] = Σ_i e^{t x_i} α_i,           X discrete
                            ∫_{−∞}^{∞} e^{tx} f(x) dx,   X continuous
Differentiating n times and evaluating at t = 0 gives the moments:

φ^{(n)}(0) = E[X^n]
For example, for an exponential RV with mean τ,

E[X] = τ, E[X²] = 2τ², ..., E[X^k] = k! τ^k
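These exponential moments can be verified by numerical integration of x^k f(x); a sketch with τ = 2 (an arbitrary choice) and a truncated midpoint rule:

```python
from math import exp, factorial

# Check E[X^k] = k! * tau^k for an exponential RV with mean tau,
# density f(x) = (1/tau) exp(-x/tau), by midpoint-rule integration.
tau = 2.0
N, upper = 200_000, 80.0          # truncate the integral at 40*tau
h = upper / N

moments = {}
for k in range(1, 4):
    moments[k] = sum(((i + 0.5) * h) ** k
                     * exp(-(i + 0.5) * h / tau) / tau * h
                     for i in range(N))

print(moments)  # ≈ {1: 2.0, 2: 8.0, 3: 48.0}, i.e. k! * tau^k
```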
Markov's inequality (for X ≥ 0 and any a > 0):

P{X ≥ a} ≤ E[X]/a
Chebyshev's inequality (for any k > 0):

P{|X − µ| ≥ k} ≤ σ²/k²
Chebyshev’s inequality is a special case of Markov’s inequality.
These inequalities allow derivation of probability bounds when only the mean, or the mean and variance, are known.
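A sketch comparing the Markov bound with an exact tail, for an exponential RV with mean 1 (so P{X ≥ a} = e^{−a}):

```python
from math import exp

# Markov bound P{X >= a} <= E[X]/a versus the exact tail e^{-a}
# for an exponential RV with mean 1.
results = {a: (exp(-a), 1.0 / a) for a in (1.0, 2.0, 4.0)}
for a, (exact, bound) in results.items():
    print(a, exact, bound)   # the exact tail never exceeds the bound
```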
[Figure: a density curve with µ − kσ, µ, and µ + kσ marked on the x-axis.]
P{|X − E[X]| ≥ kσ} ≤ 1/k²

where

σ² = E[(X − µ)²]
For a Gaussian,
P{|X − µ| ≥ 2σ} ≤ 0.05
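The exact Gaussian tail can be computed from the error function and compared with the Chebyshev bound 1/k²; a sketch using `math.erf`:

```python
from math import erf, sqrt

# Exact two-sided Gaussian tail P{|X - mu| >= k*sigma} = 2(1 - Phi(k)),
# with Phi(k) = (1 + erf(k/sqrt(2)))/2, versus the Chebyshev bound 1/k^2.
tails = {k: 2 * (1 - (1 + erf(k / sqrt(2))) / 2) for k in (1, 2, 3)}
for k, tail in tails.items():
    print(k, round(tail, 4), 1 / k**2)
# k = 2 gives ≈ 0.0455, well under both 0.05 and the Chebyshev bound 0.25
```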
[Figure: Gaussian density f(x) with ticks at µ ± σ, µ ± 2σ, µ ± 3σ; about 68% of the mass lies within µ ± σ, 95% within µ ± 2σ, and 99.7% within µ ± 3σ.]
Q. Show that for 40000 tosses of a fair coin, there is at least a 0.99 probability that the proportion of heads will be between 0.475 and 0.525.
RV: X= number of heads in n = 40000 tosses.
For a binomial RV, E [X ] = np and Var(X ) = np(1 − p).
Soln.
- E[X] = np = 40000 × (1/2) = 20000.
- Also σ = √(np(1 − p)) = √(40000 × 1/2 × 1/2) = 100.
- 1 − (1/k²) = 0.99 ⇒ k = 10.
- The probability is at least 0.99 that we get between 20000 − (10 × 100) = 19000 and 20000 + (10 × 100) = 21000 heads.
- 19000/40000 = 0.475 and 21000/40000 = 0.525.
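The arithmetic of the solution can be traced in a few lines:

```python
from math import sqrt

# Chebyshev with k = 10 on X = number of heads in n = 40000 fair tosses.
n, p = 40000, 0.5
mu = n * p                       # 20000
sigma = sqrt(n * p * (1 - p))    # 100
k = 10                           # so 1 - 1/k^2 = 0.99

lo, hi = mu - k * sigma, mu + k * sigma
print(lo / n, hi / n)            # 0.475 0.525
```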