
Chapter 3: Fundamentals of Data Analysis Statistics

RANDOM VARIABLES AND DISTRIBUTIONS
Outline

• Random variables

• Probability distributions

• Expectation

• Variance
Random Variables
• Suppose x = 5
– What is 2x + 7?
• Suppose x is 5 with probability 50% and 2 with probability 50%
– What is 2x + 7?
• A random variable is a quantity that can take on different numerical values based on some probabilities
Distribution
• We describe the values that the random variable can take, along with the probabilities of those values, using a probability distribution
• X =
    1, with prob 0.7
    2, with prob 0.3

• All probabilities must be non-negative and sum to 1

• When all possible values the RV can take can be listed, we call it a discrete random variable
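A discrete distribution like this can be sketched in code as a value-to-probability mapping, and the two conditions above become one check each. (The dictionary representation and the name `dist` are illustrative choices, not from the slides.)

```python
# A discrete distribution represented as a mapping from value to probability
# (the example from the slide: 1 with prob 0.7, 2 with prob 0.3).
dist = {1: 0.7, 2: 0.3}

# All probabilities must be non-negative...
assert all(p >= 0 for p in dist.values())
# ...and they must sum to 1 (allow a tiny floating-point tolerance).
assert abs(sum(dist.values()) - 1.0) < 1e-9
```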
Distribution
• Suppose the number of insurance claims filed by a driver in a
month is a random variable described by

– X =
    0, with prob 0.95
    1, with prob 0.04
    2, with prob 0.008
    3, with prob 0.002
• What is the probability that a driver will file fewer than 2
claims this month?
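As a sketch, this question can be answered by summing the probabilities of the qualifying values; representing the distribution as a Python dict is an illustrative choice, not from the slides.

```python
# Claims distribution from the slide: value -> probability.
claims = {0: 0.95, 1: 0.04, 2: 0.008, 3: 0.002}

# P(X < 2) = P(X = 0) + P(X = 1) = 0.95 + 0.04 = 0.99
p_fewer_than_2 = sum(p for x, p in claims.items() if x < 2)
```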
Distribution
• Suppose the number of mL in a soda bottle is described by a random
variable
• Can we list all possible values?
– 498 mL, 499 mL, 500 mL, …
– What about 499.2129415 mL?
• Sometimes it is just not possible to list all values a RV can take
• In this case the random variable can take any number in a continuous
range
• We call it a continuous random variable
• We will go into more details about continuous random variables later
Random Variables
• There is a random variable, 𝑋, described by a distribution
with possible values 𝑥1 , 𝑥2 , … , 𝑥𝑚 and corresponding
probabilities 𝑝1 , 𝑝2 , … , 𝑝𝑚
– What is the mean of that random variable?
• Until now, to calculate the mean we used the arithmetic average of the data
• We can’t simply average all the possible values of X, because some values are more likely than others
• Suppose we have an m-sided weighted die
– What is the average side that shows up when we roll it many times?
Random Variables
• Let’s rearrange terms

• mean = (num side 1 / num rolls) x1 + (num side 2 / num rolls) x2 + … + (num side m / num rolls) xm
• If we roll infinitely many times then we’ll get
– num side 1 / num rolls → p1
– num side 2 / num rolls → p2
• mean → p1 x1 + p2 x2 + … + pm xm
• For random variables we call this the expectation
– E[X] = p1 x1 + p2 x2 + … + pm xm
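The expectation formula translates directly into code. A minimal sketch, using the insurance-claims distribution from the earlier slide (the helper name `expectation` is hypothetical, not from the slides):

```python
def expectation(dist):
    """E[X] = p1*x1 + ... + pm*xm for a discrete distribution given as {value: prob}."""
    return sum(p * x for x, p in dist.items())

# The insurance-claims distribution from the earlier slide:
claims = {0: 0.95, 1: 0.04, 2: 0.008, 3: 0.002}
e_x = expectation(claims)  # 0*0.95 + 1*0.04 + 2*0.008 + 3*0.002 = 0.062
```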
Random Variables
• Suppose I have another RV defined by Y = √X
– What is E[Y]?
• E[Y] = p1 y1 + p2 y2 + … + pm ym
• E[√X] = p1 √x1 + p2 √x2 + … + pm √xm

• In general we can take the expectation of any function, f, of a random variable
• E[f(X)] = p1 f(x1) + p2 f(x2) + … + pm f(xm)
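As a sketch of this formula, any function can be plugged in as f; here the square root is used, matching the Y = √X example (the helper name `expectation_of` is an illustrative choice):

```python
import math

def expectation_of(f, dist):
    """E[f(X)] = p1*f(x1) + ... + pm*f(xm) for a discrete distribution {value: prob}."""
    return sum(p * f(x) for x, p in dist.items())

dist = {1: 0.7, 2: 0.3}
# E[sqrt(X)] = 0.7*sqrt(1) + 0.3*sqrt(2)
e_sqrt = expectation_of(math.sqrt, dist)
```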

• If we just have data on X then we approximate the expectation with the arithmetic average
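This connection can be illustrated by simulation, assuming the claims distribution from the earlier slide; the seed and sample size are arbitrary choices for reproducibility.

```python
import random

claims = {0: 0.95, 1: 0.04, 2: 0.008, 3: 0.002}

random.seed(0)  # fixed seed so the run is reproducible
# Draw many values according to their probabilities.
sample = random.choices(list(claims), weights=list(claims.values()), k=100_000)

# The arithmetic average of the draws approximates E[X] = 0.062.
sample_mean = sum(sample) / len(sample)
```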
Random Variables
• What about variance?
• Remember the definition of variance
– Mean squared difference between data and mean
• Var[X] = E[(X − E[X])²]
• This means to calculate variance we first have to calculate the
expectation
– Var[X] = p1 (x1 − E[X])² + … + pm (xm − E[X])²

• Standard deviation still has the same definition: square root of variance!
Random Variables
• X =
    0, with prob 0.95
    1, with prob 0.04
    2, with prob 0.008
    3, with prob 0.002
• What are E[X], Var[X] and sd[X]? (0.062, 0.0862, 0.294)
• E[X] = 0 × 0.95 + 1 × 0.04 + 2 × 0.008 + 3 × 0.002 = 0.062
• Var[X] = 0.95 × (0 − 0.062)² + 0.04 × (1 − 0.062)² + 0.008 × (2 − 0.062)² + 0.002 × (3 − 0.062)² = 0.0862
• sd[X] = √Var[X] = 0.294
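The worked example above can be checked with a few lines of Python; this is just a sketch using the standard library, with the distribution written as a value-to-probability dict.

```python
import math

claims = {0: 0.95, 1: 0.04, 2: 0.008, 3: 0.002}

mean = sum(p * x for x, p in claims.items())               # E[X]
var = sum(p * (x - mean) ** 2 for x, p in claims.items())  # Var[X]
sd = math.sqrt(var)                                        # sd[X]

print(round(mean, 3), round(var, 4), round(sd, 3))  # 0.062 0.0862 0.294
```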
