
Random Variables

Kaustav Banerjee

IIM Lucknow

July 2021

K Banerjee (IIML) Random Variables July 2021 1 / 18


Random variable
Flip a coin three times: Ω = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
If the coin is fair, each outcome has probability 1/8. Suppose our
interest is in the number of heads. We then define a random variable
X: number of heads in 3 flips of a coin ⇒ ΩX = {0, 1, 2, 3}

x       P(X = x)
0       1/8
1       3/8
2       3/8
3       1/8
Total   1
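The table above can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming a fair coin as on the slide; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three flips.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# X maps each outcome to its number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / 8 for a fair coin.
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```

Exact fractions are used so that the probabilities match the table without rounding.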

X numerically describes the outcomes of the random experiment. An
experiment is random if its possible outcomes are known in advance but
the outcome of any single trial is unpredictable.



Discrete random variable

X : number of calls a student gets from ABCL


ΩX = {0, 1, 2, 3, 4}
Y : number of trials necessary to develop a vaccine for COVID
ΩY = {1, 2, 3, 4, ...}
A discrete random variable has either a finite number of values or
infinitely many values that can be arranged in a sequence: it can
take only countably many values.
Number of road accidents in a month in a city
Number of defective products in a batch of size 20
The story of a discrete random variable X is summarized in its
probability mass function P(X = x), where $\sum_{x \in A} P(X = x) = 1$



Figure: probability mass function of the number of defectives (x-axis: defectives, 0 to 20; y-axis: probability, 0.00 to 0.20)

$$P(X = x),\quad x = 0, 1, \ldots, 20; \qquad \sum_{x=0}^{20} P(X = x) = 1$$
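The slides do not say which distribution produced the plotted probabilities. As a hedged illustration only, the sketch below assumes a binomial model with a hypothetical defect probability of 0.10 per item and checks that the 21 probabilities sum to 1.

```python
from math import comb

n = 20      # batch size, as in the defectives example
p = 0.10    # hypothetical per-item defect probability (an assumption)

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) count of defectives."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probabilities for x = 0, 1, ..., 20.
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf)

print(round(total, 12))
```

Any other valid probability mass function on 0, 1, ..., 20 would satisfy the same normalisation condition.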
Continuous random variable

X : survival time of a patient following a heart attack ∈ (0, ∞)


Y : random number picked from the interval (0, 1)
A random variable representing a measurement on a continuous scale
can assume any value in an interval. Such a variable is a continuous
random variable: it takes values in a continuum.
Any measuring device has limited accuracy, so a
continuous scale must be interpreted as an abstraction.
Waiting time for a machine to break down after repair
Blood pressure of a student while writing CAT
The story of a continuous random variable X is summarized in its
probability density function f(x), where $\int_{x \in A} f(x)\,dx = 1$



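Figure: probability density function of a waiting time (x-axis: waiting time, 0 to 40; y-axis: probability density, 0.00 to 0.10)

$$P(X \in A) = \int_A f(x)\,dx; \qquad f(x) \ge 0\ \ \forall x \in \Omega \quad \text{and} \quad \int_\Omega f(x)\,dx = 1$$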
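The two integrals above can be checked numerically. The density behind the plot is not stated, so the sketch below assumes an exponential waiting time with a hypothetical rate of 0.1; the midpoint-rule integrator is also just an illustrative helper.

```python
from math import exp

rate = 0.1  # hypothetical rate of the exponential density (an assumption)

def density(x):
    """f(x) = rate * exp(-rate * x) for x >= 0, and 0 otherwise."""
    return rate * exp(-rate * x) if x >= 0 else 0.0

def integrate(f, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width

# Total area under the density; the tail beyond 400 is negligible here.
total = integrate(density, 0.0, 400.0)

# P(10 < X < 20) is the area under f over the set A = (10, 20).
prob_10_to_20 = integrate(density, 10.0, 20.0)

print(total, prob_10_to_20)
```

For this assumed density the second integral has the closed form exp(-1) - exp(-2), which gives a way to check the numerical answer.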



Cumulative distribution function

$$F(a) = P(X \le a) = \begin{cases} \sum_{x \le a} P(X = x), & \text{if X is a discrete variable;} \\ \int_{x \le a} f(x)\,dx, & \text{if X is a continuous variable.} \end{cases}$$

For a discrete variable, F(a) is a step function

For a continuous variable, F(a) is a continuous curve

$$\lim_{h \to 0} \frac{P(x < X < x + h)}{h} = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = \frac{d}{dx} F(x) = f(x)$$

The probability density function is the rate of change of the
cumulative distribution function; it should not be confused with a
probability.
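The limit above can be watched numerically: as h shrinks, the difference quotient of F approaches f(x). This sketch again assumes an exponential distribution with a hypothetical rate of 0.1, purely for illustration. A density is not a probability; for instance, a uniform density on (0, 0.5) equals 2 on that interval.

```python
from math import exp

rate = 0.1  # hypothetical rate (an assumption, not from the slides)

def cdf(x):
    """F(x) = 1 - exp(-rate * x) for x > 0, and 0 otherwise."""
    return 1.0 - exp(-rate * x) if x > 0 else 0.0

def density(x):
    """f(x) = rate * exp(-rate * x) for x > 0, and 0 otherwise."""
    return rate * exp(-rate * x) if x > 0 else 0.0

def difference_quotient(x, h):
    """[F(x + h) - F(x)] / h, i.e. P(x < X <= x + h) / h."""
    return (cdf(x + h) - cdf(x)) / h

x = 5.0
for h in (1.0, 0.1, 0.001):
    print(h, difference_quotient(x, h), density(x))
```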



Figure: cumulative distribution function of a discrete distribution (x-axis: defectives, 0 to 20; y-axis: cumulative probability, 0.0 to 1.0)



Figure: cumulative distribution function of a continuous distribution (x-axis: waiting time, 0 to 40; y-axis: cumulative probability, 0.0 to 1.0)



Measure of centre
X: number of heads in 3 flips of a fair coin

x       P(X = x)   x × P(X = x)
0       1/8        0
1       3/8        3/8
2       3/8        6/8
3       1/8        3/8
Total   1          12/8 = 1.5 = µ

The mean of a probability distribution, also called the population mean
of the variable X or its expected value, is denoted E(X) and written
symbolically as µ. It plays the role of the centre of mass.
$$E(X) = \mu = \begin{cases} \sum_{x \in A} x \times P(X = x), & \text{if X is discrete;} \\ \int_{x \in A} x \times f(x)\,dx, & \text{if X is continuous.} \end{cases}$$
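For the discrete case, the formula is a weighted sum. A minimal sketch using the three-flip probabilities from the table on this slide (the function name is illustrative):

```python
from fractions import Fraction

# P(X = x) for the number of heads in 3 flips of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())

mu = expected_value(pmf)
print(mu)  # 3/2
```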



Setting premium

A trip insurance policy pays ₹2000 to the customer in case of a loss
due to theft or damage. If the risk of such a loss is 1 in 200, what is
the expected cost, per customer, of providing this cover?

Payment   Probability
₹0        0.995
₹2000     0.005

The company’s expected cost per customer is
0 × 0.995 + 2000 × 0.005 = ₹10. A premium equal
to this amount is viewed as fair. If this premium is charged and no
other costs are involved, the company will neither make a profit nor
lose money in the long run. In practice, the premium is set at a higher
price because it must include administrative costs and intended profit.
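The "long run" claim can be illustrated with a small simulation. This is a sketch only: the payout and risk come from the slide, while the seed and the number of simulated customers are arbitrary, hypothetical choices.

```python
import random

payout = 2000       # amount paid on a loss, from the slide
p_loss = 1 / 200    # risk of a loss, from the slide

# Exact expected cost per customer: sum of payment * probability.
expected_cost = 0 * (1 - p_loss) + payout * p_loss

# Long-run check: average payout over many simulated customers.
rng = random.Random(2021)    # fixed seed, arbitrary choice
n_customers = 200_000        # hypothetical portfolio size
total_paid = sum(payout for _ in range(n_customers) if rng.random() < p_loss)
average_paid = total_paid / n_customers

print(expected_cost, average_paid)
```

With a fair premium equal to the expected cost, the simulated average payout per customer should sit close to the premium collected per customer.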



Gambler’s expectation

Consider a simple bet on the red of a roulette wheel that has 18 red,
18 black, and 2 green slots. This bet is at even money: a $10 wager
on red has an expected profit

$$E(\text{Profit}) = 10 \times \frac{18}{38} + (-10) \times \frac{20}{38} = -0.526$$

The negative expected profit says we expect to lose an average of
52.6¢ on every $10 bet. Over a long series of bets, the relative
frequency of winning will approach the probability 18/38 and that of
losing will approach 20/38, so a player will lose a substantial amount
of money. Other bets against the house have a similar negative
expected profit. How else could a casino stay in business?
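The roulette arithmetic can be verified with exact fractions. A minimal sketch; the figure of 1000 bets is a hypothetical illustration of the long-run effect, not a number from the slides.

```python
from fractions import Fraction

stake = 10
p_win = Fraction(18, 38)    # 18 red slots out of 38
p_lose = Fraction(20, 38)   # 18 black + 2 green slots out of 38

# Even-money bet: win +stake with p_win, lose -stake with p_lose.
expected_profit = stake * p_win + (-stake) * p_lose

# Expected total profit over a hypothetical series of 1000 such bets.
n_bets = 1000
expected_total = n_bets * expected_profit

print(expected_profit, float(expected_profit), float(expected_total))
```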



Expectation rules

If P(X = c) = 1, then E(X) = c


For any constants a and b, E(a + bX) = a + bE(X)
For any two random variables X and Y, E(X + Y) = E(X) + E(Y)
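Expectation of a function of a random variable X, say g(X), is

$$E\{g(X)\} = \begin{cases} \sum_{x \in A} g(x)\, P(X = x), & \text{if X is discrete,} \\ \int_{x \in A} g(x)\, f(x)\,dx, & \text{if X is continuous} \end{cases}$$

$$V(X) = E(X - \mu)^2 = \begin{cases} \sum_{x \in A} (x - \mu)^2\, P(X = x), & \text{if X is discrete,} \\ \int_{x \in A} (x - \mu)^2 f(x)\,dx, & \text{if X is continuous} \end{cases}$$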
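These rules can be checked on the three-flip coin distribution. The sketch below uses a generic helper for E{g(X)}; the constants a = 2 and b = 3 are hypothetical values chosen only for the check.

```python
from fractions import Fraction

# P(X = x) for the number of heads in 3 flips of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(g, pmf):
    """E{g(X)} = sum of g(x) * P(X = x) over the possible values."""
    return sum(g(x) * p for x, p in pmf.items())

mu = expectation(lambda x: x, pmf)

# Linearity: E(a + bX) should equal a + b * E(X).
a, b = 2, 3  # hypothetical constants
lhs = expectation(lambda x: a + b * x, pmf)
rhs = a + b * mu

# Variance as the expectation of g(x) = (x - mu)^2.
var = expectation(lambda x: (x - mu) ** 2, pmf)

print(mu, lhs, rhs, var)
```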



Flaw of the average



Measure of spread

$X = 0$ with probability 1

$$Y = \begin{cases} -1 & \text{with probability } 1/2; \\ 1 & \text{with probability } 1/2. \end{cases}$$

$$Z = \begin{cases} -100 & \text{with probability } 1/2; \\ 100 & \text{with probability } 1/2. \end{cases}$$

All these distributions have the same expected value, 0. Y is more
spread out than X, which is a constant, and Z is more spread out than Y.
The population variance, $\sigma^2 = V(X) = E(X - \mu)^2 = E(X^2) - \mu^2$,
captures the spread of a distribution. As it is in squared units, an
alternative measure is the standard deviation: $\sigma = \sqrt{V(X)}$.
Note: $\min_a E(X - a)^2 = E(X - \mu)^2$ and $V(a + bX) = b^2 V(X)$
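The three distributions can be compared directly. The sketch below computes each variance from the definition and from the shortcut E(X²) − µ², and uses the fact that Z has the same distribution as 100·Y to check the scaling rule; helper names are illustrative.

```python
from fractions import Fraction
from math import sqrt

half = Fraction(1, 2)
distributions = {
    "X": {0: Fraction(1)},
    "Y": {-1: half, 1: half},
    "Z": {-100: half, 100: half},
}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E(X - mu)^2, straight from the definition."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    """V(X) = E(X^2) - mu^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

for name, pmf in distributions.items():
    v = variance(pmf)
    print(name, mean(pmf), v, sqrt(v))
```

All three means are 0, while the standard deviations are 0, 1 and 100, which matches the ordering of spread described above.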



Risk assessment
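As a manager, which one of the following funds would you prefer?

$\text{Profit}_1 = \text{₹}400$ with probability 1

$$\text{Profit}_2 = \begin{cases} \text{₹}10000 & \text{with probability } 0.15; \\ -\text{₹}1000 & \text{with probability } 0.85. \end{cases}$$

$$\text{Profit}_3 = \begin{cases} \text{₹}1000 & \text{with probability } 0.50; \\ \text{₹}500 & \text{with probability } 0.30; \\ -\text{₹}500 & \text{with probability } 0.20. \end{cases}$$

One way to assess volatility is to consider the coefficient of variation

$$CV(X) = \frac{SD(X)}{E(X)} = \frac{\sigma}{\mu}$$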
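The coefficient of variation of each fund follows from its mean and standard deviation. A minimal sketch using the three profit distributions listed above; the dictionary layout and function names are illustrative.

```python
from math import sqrt

# Each fund maps a profit amount to its probability.
funds = {
    "Profit1": {400: 1.0},
    "Profit2": {10000: 0.15, -1000: 0.85},
    "Profit3": {1000: 0.50, 500: 0.30, -500: 0.20},
}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def sd(pmf):
    mu = mean(pmf)
    return sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))

def coefficient_of_variation(pmf):
    """CV = SD / mean; only meaningful when the mean is nonzero."""
    return sd(pmf) / mean(pmf)

summary = {
    name: (mean(pmf), sd(pmf), coefficient_of_variation(pmf))
    for name, pmf in funds.items()
}

for name, (mu, sigma, cv) in summary.items():
    print(f"{name}: mean={mu:.0f}, sd={sigma:.1f}, cv={cv:.2f}")
```

The second fund has the highest expected profit (650) but also by far the largest coefficient of variation, so the choice depends on how much volatility the manager is willing to accept.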



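Figure: probability density curve over −4 to 4, with area p to the left of the marked point x and area 1 − p to its right (y-axis: probability density, 0.00 to 0.25)

x is the 100p’th percentile if P(X ≤ x) ≥ p and P(X ≥ x) ≥ 1 − p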


Some alternative measures
Alternatively, x is the 100p’th percentile if p ≤ F(x) ≤ p + P(X = x). For
a continuous distribution, the 100p’th percentile is a solution of F(x) = p.
The median (50’th percentile) is a robust measure of centre. A robust
measure of spread is IQR = 75’th percentile − 25’th percentile
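For a discrete distribution, percentiles need not be unique. The sketch below adopts one common convention, the smallest value x with F(x) ≥ p, which satisfies the definition given in these slides; that convention is an assumption here. It is applied to the three-flip coin distribution, for which any x between 1 and 2 would also qualify as a median.

```python
from fractions import Fraction

# P(X = x) for the number of heads in 3 flips of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def percentile(pmf, p):
    """Smallest possible value x with F(x) = P(X <= x) >= p."""
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= p:
            return x
    raise ValueError("p exceeds the total probability")

median = percentile(pmf, Fraction(1, 2))
q1 = percentile(pmf, Fraction(1, 4))
q3 = percentile(pmf, Fraction(3, 4))
iqr = q3 - q1

print(median, q1, q3, iqr)
```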


