
Statistical Simulation in Python

Tushar Shanker
Data Scientist
Topics covered
Basics of randomness & simulation.

Simulation & probability.

Bootstrapping and resampling methods.

Advanced applications of simulation.

STATISTICAL SIMULATION IN PYTHON


Introduction to random variables
Continuous Random Variables

Infinitely many possible values.

e.g., Height / Weight

Discrete Random Variables

Finite or countably infinite set of possible values.

e.g., Outcomes of a six-sided die
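
The difference between the two kinds of random variable can be seen by sampling from each. The sketch below is only an illustration: the normal distribution for heights, its parameters (170 cm, 10 cm), and the sample size are assumptions, not values from the slides. It uses the standard library so that it runs without extra packages.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Continuous random variable: heights in cm.
# Normal(170, 10) is an assumed distribution, chosen only for illustration.
heights = [rng.gauss(170, 10) for _ in range(10_000)]

# Discrete random variable: outcomes of a fair six-sided die.
rolls = [rng.randint(1, 6) for _ in range(10_000)]

print(f"mean height: {statistics.fmean(heights):.1f} cm")
print(f"distinct die outcomes: {sorted(set(rolls))}")
print(f"distinct height values: {len(set(heights))}")
```

The die only ever produces six values, while almost every sampled height is different from every other one.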

Probability distributions
Continuous Probability Distributions

Discrete Probability Distributions


Let's practice!
Simulation basics

Simulations
Framework for modeling real-world events.

Characterized by repeated random sampling.

Gives us an approximate solution.

Can help solve complex problems.

Simulation steps
1. Define possible outcomes for random variables.

2. Assign probabilities.

3. Define relationships between random variables.

4. Get multiple outcomes by repeated random sampling.

5. Analyze sample outcomes.


Simulating the dice game
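
The figures for the dice game did not survive extraction, so the rules below are an assumption: two fair dice are rolled and the player wins when both show the same number. Under that hypothetical rule, the five simulation steps can be sketched as follows.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Steps 1 and 2: possible outcomes and their probabilities (fair die assumed).
die = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6


def play_once():
    """Step 3: roll two dice; the hypothetical win rule is a matching pair."""
    first, second = rng.choices(die, weights=probabilities, k=2)
    return first == second


# Step 4: get multiple outcomes by repeated random sampling.
n_games = 100_000
wins = sum(play_once() for _ in range(n_games))

# Step 5: analyze the sample outcomes.
win_rate = wins / n_games
print(f"estimated P(win) = {win_rate:.3f}, exact value = {1 / 6:.3f}")
```

Changing `probabilities` or the rule inside `play_once` is enough to simulate a different game with the same skeleton.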


Let's practice!
Using simulation for decision-making
Simulation workflow

Change input, evaluate output

Outcomes: New B vs. Old B

Change input to get desired output

Modify C and record outcomes


Let's practice!
Probability Basics
Sample Space
Sample Space S: Set of all possible outcomes

Probability
Probability P(A): Likelihood of event A

0 ≤ P(A) ≤ 1
P(S) = 1, e.g., P(H) + P(T) = 1

Mutually Exclusive Events
For mutually exclusive events A and B:
P(A ∩ B) = 0
P(A ∪ B) = P(A) + P(B)

For any two events A and B:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
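
The addition rules can be checked exactly on a small sample space. The sketch below assumes one roll of a fair die; the events `A`, `B`, `odd` and the helper `prob` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))                 # one roll of a fair die
A = {x for x in sample_space if x % 2 == 0}     # event: even number
B = {x for x in sample_space if x > 3}          # event: greater than 3
odd = sample_space - A                          # mutually exclusive with A


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


# General rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)

# Mutually exclusive events: the intersection is empty, so probabilities add.
print(prob(A & odd), prob(A | odd), prob(A) + prob(odd))
```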


Using Simulation for Probability Estimation
Steps for Estimating Probability:

1. Construct sample space or population.

2. Determine how to simulate one outcome.

3. Determine rule for success.

4. Sample repeatedly and count successes.

5. Calculate frequency of successes as an estimate of probability.
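
The five steps above can be sketched on a small problem. The problem itself is an assumption chosen for illustration (it is not from the slides): estimating the probability of seeing at least one six in four rolls of a fair die, which has the known exact value 1 − (5/6)^4.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# 1. Construct the sample space.
faces = [1, 2, 3, 4, 5, 6]


# 2. Determine how to simulate one outcome: four independent rolls.
def one_outcome():
    return [rng.choice(faces) for _ in range(4)]


# 3. Determine the rule for success: at least one six.
def is_success(rolls):
    return 6 in rolls


# 4. Sample repeatedly and count successes.
n_sims = 50_000
successes = sum(is_success(one_outcome()) for _ in range(n_sims))

# 5. The frequency of successes is the probability estimate.
estimate = successes / n_sims
exact = 1 - (5 / 6) ** 4
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```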



Let's practice!
More Probability Concepts
Conditional Probability
P(A∣B) = P(A ∩ B) / P(B)
P(B∣A) = P(B ∩ A) / P(A)
P(A ∩ B) = P(B ∩ A)

Bayes Rule
Bayes' rule: P(A∣B) = P(B∣A) P(A) / P(B)


Independent Events
P(A ∩ B) = P(A) P(B)
Conditional Probability: P(A∣B) = P(A ∩ B) / P(B) = P(A) P(B) / P(B) = P(A)


Solar Panels & Clean Vehicles
Number of houses = 150

P(Solar) = P(Solar ∩ Hybrid, EV) + P(Solar ∩ No Hybrid, EV) = 30/150 + 10/150 = 40/150

P(Solar∣Hybrid, EV) = P(Solar ∩ Hybrid, EV) / P(Hybrid, EV) = 30/80 = 0.375
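
The counts in this example (150 houses, 30 with solar panels and a hybrid or EV, 10 with solar panels only, 80 with a hybrid or EV) are enough to check the conditional probability, Bayes' rule, and independence with exact arithmetic. The variable names are hypothetical; only the counts come from the slides.

```python
from fractions import Fraction

total = 150            # number of houses
solar_and_clean = 30   # solar panels and a hybrid/EV
solar_no_clean = 10    # solar panels, no hybrid/EV
clean = 80             # houses with a hybrid/EV

p_solar = Fraction(solar_and_clean + solar_no_clean, total)
p_clean = Fraction(clean, total)
p_both = Fraction(solar_and_clean, total)

# Conditional probability: P(Solar | Hybrid, EV)
p_solar_given_clean = p_both / p_clean

# Bayes' rule gives the same value from the reverse conditional.
p_clean_given_solar = p_both / p_solar
bayes = p_clean_given_solar * p_solar / p_clean

# Independence would require P(Solar ∩ Hybrid, EV) = P(Solar) P(Hybrid, EV).
independent = p_both == p_solar * p_clean

print(float(p_solar_given_clean), float(bayes), independent)
```

With these counts the two events are not independent: knowing that a house has a hybrid or EV raises the probability of solar panels from 40/150 to 0.375.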


Let's practice!
Data Generating Process
Simulation Steps
1. Define Possible Outcomes for Random Variables.

2. Assign Probabilities.

3. Define Relationships between Random Variables.


Data Generating Process

Cricket

1 Source: Wikipedia


Let's practice!
eCommerce Ad Simulation
eCommerce Funnel

Signup Flow

Purchase Flow


Let's practice!
Introduction to resampling methods
Resampling workflow

Why resample?
Advantages

Simple implementation procedure.

Applicable to complex estimators.

No strict assumptions.

Drawbacks

Computationally expensive.

Types of resampling methods


Let's practice!
Bootstrapping
Easter eggs

Bootstrapping Easter eggs

Bootstrapped distribution


Bootstrap - Good to know
Run at least 5-10k iterations.

Expect an approximate answer.

Consider bias correction.
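
A minimal bootstrap sketch for the mean weight of a sample of Easter eggs follows. The egg weights are hypothetical (generated here, not the course data), and the percentile interval is one common choice of confidence interval, assumed for illustration; it applies no bias correction.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical sample of 30 egg weights in grams (not the course data).
weights = [rng.gauss(50, 10) for _ in range(30)]

n_iterations = 10_000  # within the 5-10k range suggested above
boot_means = []
for _ in range(n_iterations):
    # Resample the original data with replacement, keeping the same size.
    resample = rng.choices(weights, k=len(weights))
    boot_means.append(statistics.fmean(resample))

# Percentile 95% confidence interval from the bootstrapped distribution.
boot_means.sort()
lower = boot_means[int(0.025 * n_iterations)]
upper = boot_means[int(0.975 * n_iterations)]

print(f"sample mean = {statistics.fmean(weights):.2f} g")
print(f"95% CI = [{lower:.2f} g, {upper:.2f} g]")
```

Because the resamples are random, rerunning with a different seed gives a slightly different interval, which is why the answer should be treated as approximate.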



Let's practice!
Jackknife resampling
Easter eggs

Jackknifing Easter eggs


Jackknife estimate
Jackknife Estimate

Variance of Jackknife Estimate
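
The formulas on this slide did not survive extraction. The sketch below therefore assumes the standard leave-one-out definitions: the jackknife estimate is the average of the n leave-one-out estimates, and its variance is (n − 1)/n times the sum of squared deviations of those estimates from their average. The egg weights are hypothetical, and the 1.96 multiplier assumes a normal approximation for the 95% interval.

```python
import math
import statistics

# Hypothetical egg weights in grams (not the course data).
weights = [48.2, 51.5, 39.0, 62.3, 55.1, 44.7, 58.9, 50.4, 47.6, 53.8]
n = len(weights)

# Leave-one-out estimates: the mean of the sample with item i removed.
loo_means = [statistics.fmean(weights[:i] + weights[i + 1:]) for i in range(n)]

# Jackknife estimate: average of the leave-one-out estimates.
jack_estimate = statistics.fmean(loo_means)

# Jackknife variance: (n - 1) / n * sum of squared deviations.
jack_variance = (n - 1) / n * sum((m - jack_estimate) ** 2 for m in loo_means)

# 95% interval under a normal approximation (an assumption).
half_width = 1.96 * math.sqrt(jack_variance)
print(f"jackknife mean = {jack_estimate:.2f} g")
print(f"95% CI = [{jack_estimate - half_width:.2f} g, "
      f"{jack_estimate + half_width:.2f} g]")
```

For the mean, the jackknife estimate equals the ordinary sample mean and the jackknife variance equals the sample variance divided by n, which gives a quick sanity check on the implementation.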

Jackknife vs Bootstrap
                Jackknife             Bootstrap
Mean Weight     51g                   50.8g
95% CI          [33.36g, 68.64g]      [35g, 67.03g]


Let's practice!
Permutation testing
Steps involved

Discussion
Advantages

Very flexible

No strict assumptions

Widely applicable

Drawbacks

Computationally Expensive

Custom coding required


Donation website
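
The step-by-step figures for permutation testing and the donation data did not survive extraction. The sketch below assumes the usual procedure for a two-sample test: compute the observed difference in means, repeatedly shuffle the pooled data and split it into two groups of the original sizes, and report the share of shuffles with a difference at least as extreme. The donation amounts for the two hypothetical website designs are generated here, not taken from the course.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical donation amounts (dollars) under two website designs.
design_a = [rng.gauss(20, 5) for _ in range(50)]
design_b = [rng.gauss(23, 5) for _ in range(50)]

# Observed test statistic: difference in mean donation.
observed = statistics.fmean(design_b) - statistics.fmean(design_a)

# Under the null hypothesis the group labels are exchangeable,
# so shuffle the pooled data and re-split it many times.
pooled = design_a + design_b
n_a = len(design_a)
n_permutations = 5_000
extreme = 0
for _ in range(n_permutations):
    rng.shuffle(pooled)
    diff = statistics.fmean(pooled[n_a:]) - statistics.fmean(pooled[:n_a])
    if abs(diff) >= abs(observed):
        extreme += 1

# Two-sided p-value: share of permutations at least as extreme as observed.
p_value = extreme / n_permutations
print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}")
```

The loop is the "custom coding" and the computational cost mentioned in the discussion: no distributional formula is needed, but every test statistic requires its own resampling code.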



Let's practice!
Advanced applications of simulation
Overview
Simulation for Business Planning

Monte Carlo Integration

Simulation for Power Analysis

Portfolio Simulation

Simulation for business planning
Corn Farm

Business profitability


Let's practice!
Monte Carlo integration
Definite integration

Monte Carlo integration
Calculate overall area.

Randomly sample points in the area.

Multiply the fraction of the points below the curve by overall area.

Example: f(x) = x², integral ∫₁² x² dx

Calculate Overall Area
x_min = 1, x_max = 2
min(0, f_min(x)) = 0, f_max(x) = 4
Overall Area = (x_max − x_min) × (f_max(x) − min(0, f_min(x))) = 4

Random Sampling

Fraction of Area
Overall Area × fraction = 2.303
Actual Answer = 2.333
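
The three steps can be sketched for the integral of x² from 1 to 2, using the bounding box from the example (x from 1 to 2, y from 0 to 4). The number of points and the seed are assumptions; the estimate varies from run to run around the exact value 7/3.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable


def f(x):
    return x ** 2


# Step 1: overall area of the bounding box.
x_min, x_max = 1.0, 2.0
y_min, y_max = 0.0, 4.0   # min(0, f_min) and f_max on [1, 2]
overall_area = (x_max - x_min) * (y_max - y_min)

# Step 2: randomly sample points in the box and count those below the curve.
n_points = 200_000
below = 0
for _ in range(n_points):
    x = rng.uniform(x_min, x_max)
    y = rng.uniform(y_min, y_max)
    if y <= f(x):
        below += 1

# Step 3: fraction of points below the curve times the overall area.
estimate = overall_area * below / n_points
print(f"estimate = {estimate:.3f}, exact = {7 / 3:.3f}")
```

This sketch relies on the function being non-negative on the interval; a function that dips below zero would need the area under the axis counted with a negative sign.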



Let's practice!
Simulation for power analysis
What is power?
What Is Power? - Statistics Teacher

Power = P(rejecting null ∣ alternative is true)
Probability of detecting an effect if it exists.

Depends on sample size, α, and effect size.

Typically, 80% power is recommended for α = 0.05.

News media website

Simulation for power analysis
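
The figure for this slide did not survive extraction. The sketch below shows one common way to estimate power by simulation, under stated assumptions: time spent on the site is normally distributed with a mean of 60 seconds in the control group and 66 seconds in the treatment group, with a standard deviation of 20 seconds. All of these numbers are hypothetical. The test is a large-sample two-sided z-test on the difference in means, used here as an approximation to a t-test so that only the standard library is needed.

```python
import math
import random
import statistics

rng = random.Random(5)  # fixed seed so the run is repeatable

alpha = 0.05
z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)


def rejects_null(n, mean_control, mean_treatment, sd):
    """Simulate one experiment with n users per group and test it."""
    control = [rng.gauss(mean_control, sd) for _ in range(n)]
    treatment = [rng.gauss(mean_treatment, sd) for _ in range(n)]
    se = math.sqrt(statistics.variance(control) / n
                   + statistics.variance(treatment) / n)
    z = (statistics.fmean(treatment) - statistics.fmean(control)) / se
    return abs(z) > z_crit


def estimate_power(n, n_sims=1_000):
    """Power = share of simulated experiments that reject the null."""
    hits = sum(rejects_null(n, 60.0, 66.0, 20.0) for _ in range(n_sims))
    return hits / n_sims


# Power as a function of sample size per group.
power_by_n = {n: estimate_power(n) for n in (50, 100, 200, 400)}
for n, power in power_by_n.items():
    print(f"n = {n:4d} per group -> estimated power = {power:.2f}")
```

Reading off the smallest sample size whose estimated power reaches 0.80 gives the sample size needed for the assumed effect.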



Let's practice!
Applications in Finance

Portfolio Simulation


Let's practice!
Wrap up
Simulation concepts covered
Basics of Random Variables

Simulation for Probability

Data Generating Process

Resampling Methods

Monte Carlo Integration



Real-World applications designed
eCommerce Ad Simulation

Website Design for Donation

Corn Production

Portfolio Simulation

Thank You & Good Luck!