Causal AI For LQG - Ben Steiner - Apr 2023

4/17/2023
Ben Steiner | Causal AI | Apr 11th 2023
CAUSAL AI FOR INVESTMENT MANAGEMENT

THE POTENTIAL, PITFALLS & PERILS
BEN STEINER, COLUMBIA UNIVERSITY
2021: CAUSAL TECHNIQUES RECOGNIZED
Nobel Prize 2021 (Economic Sciences) awarded for the ‘analysis of causal relationships’
2022: CAUSAL TECHNIQUES HYPED
“Causal AI” added to the Gartner Hype Cycle for Artificial Intelligence, 2022
KEY TAKEAWAYS FROM TODAY
1. “Correlation is not causation”: causal techniques now offer a way to move beyond association alone
  • Understanding the relationships between variables, specifically their causal effects on each other, is valuable when decisions impact outcomes
  • We want to understand how much of an outcome is due to each variable
  • Even when decisions do not impact outcomes, causal tools add explainability. This helps communication and accelerates model development within teams
2. Not one algorithm or one technique
  • Many different algorithms and techniques: not one-size-fits-all
  • Different algorithms and techniques have different assumptions and limitations. As always: use the right tool for the job
3. Appropriate expectations for investment management: don’t expect a crystal ball inside a black box
  • If prediction accuracy is the goal, there are better ways to curve-fit (given enough data and no concept drift)
  • For investment management specifically: benefits include communication and collaboration to accelerate innovation
  • More generally: causal methods can unify domain experts and machine learning
ROADMAP
1. Intro
  • What do we mean by causality?
  • Introduction to causal AI
2. Foundations
  • Causal Graphs
  • Structural Causal Models
  • Causal Discovery
  • Causal Inference
3. Examples
  • Examples from capital markets and asset management
4. Final Thoughts
  • Assumptions, Weaknesses & Limitations
  • What are good and bad uses of causal approaches?
WHAT DO WE MEAN BY CAUSALITY?

CORRELATIONS CANNOT CAPTURE CAUSE AND EFFECT
[Figure: volume of ice cream sold and frequency of shark attacks by month. The two series are correlated, but not causally related.]
“Correlation is not causation”

SIMPSON’S PARADOX SHOWS WHY CAUSE AND EFFECT MATTERS
• Using correlations instead of causal relationships can lead to incorrect inferences
• Models trained on incorrect inferences can deliver incorrect predictions
• Simpson’s Paradox: a statistical association that holds for the total population is reversed for sub-populations
• Example of the paradox from Judea Pearl’s The Book of Why:
  – Cholesterol is positively correlated with exercise (upper right)
  – To reduce cholesterol, should our health policy suggest less exercise?!
  – But within each age group individually, the correlation is negative (lower right)
  – Which is correct? We need to know the causal mechanism that generated the data
Figure source: Judea Pearl and Dana Mackenzie, The Book of Why
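The paradox is easy to reproduce with made-up numbers. A minimal sketch, assuming a hypothetical linear data-generating process (not the data behind Pearl’s figure): within each of three age groups exercise lowers cholesterol, but older people both exercise more and have higher cholesterol, so the pooled correlation comes out positive.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
AGE_GROUPS = (30, 50, 70)  # hypothetical age groups
rows = []  # (age_group, exercise, cholesterol)
for age in AGE_GROUPS:
    for _ in range(200):
        exercise = age / 10 + rng.gauss(0, 1)  # in this toy model, older people exercise more
        # Within a group, more exercise lowers cholesterol; age raises it.
        cholesterol = 2 * age - 5 * exercise + rng.gauss(0, 2)
        rows.append((age, exercise, cholesterol))

pooled = pearson([r[1] for r in rows], [r[2] for r in rows])
by_group = {
    age: pearson([r[1] for r in rows if r[0] == age], [r[2] for r in rows if r[0] == age])
    for age in AGE_GROUPS
}
print(f"pooled correlation: {pooled:+.2f}")  # positive
for age, r in by_group.items():
    print(f"age group {age}: {r:+.2f}")  # negative in every group
```

Only the causal story (age drives both exercise and cholesterol) tells us that the within-group answer is the one to act on.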
INTRODUCTION TO CAUSAL AI
CORRELATION VERSUS CAUSATION
“Correlation is not causation.” But that’s the problem... because sometimes it is!
The challenge is knowing when it is, and when it isn’t.
CAUSAL DRIVERS ARE A SUBSET OF CORRELATED ASSOCIATIONS
1. Of all the correlated factors in a model…
2. …only some are true causal drivers.
3. The challenge is to (a) differentiate between causal and correlated, and (b) discover more causal drivers.
4. Extra causal drivers can be discovered.
X CAN BE CORRELATED WITH Y… BUT NOT CAUSAL
Correlation can be the same in these 5 cases:
1. X causes Y (X → Y): X is a true causal driver of Y; changes in X lead to changes in Y. Use X to model Y.
2. Y causes X (Y → X): now the other way around, Y is a driver of X. Do not use X to model Y.
3. C causes both (X ← C → Y), where C = ‘confounder’ (“spurious correlation”): a 3rd variable is the driver of both of them. Do not use X to model Y.
4. Association by chance: no causal relationship, but an association is detected in a limited sample. Do not use X to model Y.
5. X has an indirect impact on Y (X → M → Y), where M = ‘mediator’: X is the driver of a 3rd variable, which is the driver of Y. X could be used to model Y… but M would create a better model for Y.
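Case 3 can be checked by simulation. A minimal sketch, assuming a hypothetical linear model in which the confounder C drives both X and Y and X does not appear in Y’s equation at all: the observational correlation is strong, but when X is set independently of C (a simulated intervention, do(X)) it disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n, intervene_on_x=False, seed=1):
    """Sample from X <- C -> Y (hypothetical coefficients); Y's equation has no X term."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)  # confounder
        if intervene_on_x:
            x = rng.gauss(0, 1)  # do(X): X is set without looking at C
        else:
            x = c + rng.gauss(0, 0.5)  # X listens to C
        y = 2 * c + rng.gauss(0, 0.5)  # Y listens to C only
        xs.append(x)
        ys.append(y)
    return xs, ys


r_obs = pearson(*simulate(5000))
r_do = pearson(*simulate(5000, intervene_on_x=True))
print(f"observational corr(X, Y): {r_obs:.2f}")  # strong
print(f"corr(X, Y) under do(X):   {r_do:.2f}")  # close to zero
```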
POTENTIAL FOR BOTH ACCURACY AND EXPLAINABILITY
[Figure: Causal AI]
CAUSAL AI IN CAPITAL MARKETS

POTENTIAL TO BE USED WHEREVER WE FIND CORRELATION
● Human-guided research: integration of human expertise into models (below right)
● Portfolio construction: similar performance, lower turnover (right)
● Orthogonal factor research: enhance existing models and/or uncover true causal drivers (below left)
● Client analytics: maximize lifetime value with next-best actions
● Evaluate alternative datasets faster and only on-board datasets with value
● Other use cases: performance attribution, market impact, scenario analysis, stress testing of factors, model validation…
[Figure panels: causal discovery algorithms, where data is used to uncover true causal drivers · human-guided research, where humans can see models to interact with and modify them]
ROADMAP: 2. FOUNDATIONS
ROADMAP OF FOUNDATIONAL CONCEPTS
• Causal Graphs
• Structural Causal Models
• Conditional Independence and CI tests
• Causal Discovery
• Causal Inference
• Counterfactuals
  – “What-if”
  – Algorithmic recourse for explainability
  – Counterfactuals for model validation
CAUSAL GRAPHS
WHAT ARE CAUSAL GRAPHS?

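FROM SIMPLE RELATIONSHIPS TO COMPLEX GRAPHS
[Figure: causal graphs of increasing complexity, from a single edge between X and Y to graphs with a mediator (M), a confounder (C), and multiple causes (X1, X2) of Y]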

CAUSAL GRAPHS ENCODE AND VISUALIZE CAUSAL RELATIONSHIPS
• Nodes represent variables (inputs and outputs)
• Edges show the direction and strength of relationships
Example
• A negative correlation is observed between Y and X5, but X5 is not a causal driver of Y
• Y is directly caused by X3 and X4 only
• Y is associated with X5 (correlated)
[Figure legend: nodes represent variables; line thickness represents strength of relationship; line colour represents sign (+ vs -)]
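In code, such a graph can be held as a dictionary of weighted, directed edges. A minimal sketch: only the edges X3 → Y and X4 → Y come from the example; the edge X4 → X5 and all weights are hypothetical, chosen so that X5 ends up negatively correlated with Y without being one of its causes (it shares the cause X4).

```python
import random
from collections import defaultdict, deque

# Weighted, directed edges (cause, effect) -> weight.
# Sign of the weight ~ line colour, magnitude ~ line thickness.
# X3 -> Y and X4 -> Y follow the example; X4 -> X5 and all weights are hypothetical.
EDGES = {("X3", "Y"): 1.0, ("X4", "Y"): 0.8, ("X4", "X5"): -1.0}
NODES = ["X3", "X4", "X5", "Y"]


def parents(node):
    """Direct causes of `node`."""
    return {cause for cause, effect in EDGES if effect == node}


def topological_order(nodes, edges):
    """Kahn's algorithm; raises ValueError if the graph has a cycle (is not a DAG)."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for cause, effect in edges:
        indegree[effect] += 1
        children[cause].append(effect)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def sample(n, seed=0):
    """Linear-Gaussian simulation: each node = weighted sum of its parents + N(0, 1) noise."""
    rng = random.Random(seed)
    order = topological_order(NODES, EDGES)
    data = {v: [] for v in NODES}
    for _ in range(n):
        row = {}
        for v in order:  # parents are always computed before their children
            row[v] = sum(w * row[c] for (c, e), w in EDGES.items() if e == v) + rng.gauss(0, 1)
            data[v].append(row[v])
    return data


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


data = sample(5000)
r_y_x5 = pearson(data["Y"], data["X5"])
print("parents of Y:", sorted(parents("Y")))  # ['X3', 'X4']
print(f"corr(Y, X5) = {r_y_x5:.2f}")  # negative, yet X5 is not a parent of Y
```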

EDGE FUNCTIONS CAN BE NON-LINEAR
• Edges can be constants (X1 → Y ← X2): X1 = 1, X2 = 2, so Y = 1 + 2 = 3
• Edges can be linear (X1 → Y ← X2): X1 = Revenue, X2 = Expense, Y = Profit = Revenue – Expense
• Edges can be any functional form (X1 → Y ← X2): Y = f(X1, X2), where the relationship between each input and the output can be strictly positive, strictly negative, or neither
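The three kinds of edge function can be written as ordinary structural assignments. A minimal sketch: the constant and revenue/expense examples follow the slide, while the non-linear f is a hypothetical choice that is strictly increasing in X1 and neither increasing nor decreasing in X2.

```python
import math


def y_from_constants(x1, x2):
    """Y <- X1 + X2; with the constant inputs X1 = 1 and X2 = 2 this gives 3."""
    return x1 + x2


def profit(revenue, expense):
    """Linear edges: Profit <- Revenue - Expense (edge weights +1 and -1)."""
    return revenue - expense


def f(x1, x2):
    """Hypothetical non-linear Y <- f(X1, X2).

    Y is strictly increasing in X1 (for X1 > -1), while its response to X2
    is neither: decreasing for X2 < 1 and increasing for X2 > 1.
    """
    return math.log1p(x1) + (x2 - 1) ** 2


print(y_from_constants(1, 2))  # 3
print(profit(100.0, 60.0))  # 40.0
print([round(f(x1, 0.0), 3) for x1 in (0, 1, 2)])  # rises with X1
print([round(f(0.0, x2), 3) for x2 in (0, 1, 2)])  # falls, then rises with X2
```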
MAKING AND USING CAUSAL GRAPHS
[Figure: a causal graph can be made by causal discovery (data & ML only), by human experts, or by human / ML iteration. It can then be used for optimal decision making, orthogonal factor discovery, and model validation (model development & governance).]
FOUNDATIONAL CONCEPTS:
1. STRUCTURAL CAUSAL MODELS (SCM)
2. CAUSAL DISCOVERY
3. CAUSAL INFERENCE
FOUNDATIONAL CONCEPTS
CAUSAL GRAPH (DAG) AND STRUCTURAL CAUSAL MODEL (SCM)
• A Directed Acyclic Graph (DAG) is a graph with directed edges (arrows) and no cycles, used to describe causes and effects.
• An SCM {U, V, F} is fully described by exogenous variables U, endogenous variables V, and a set of functions F. The functions in F assign values to the variables in V based on other variables in the model.
[Figure: a Directed Acyclic Graph and a structural causal model]
Key point: DAGs and SCMs are alternative ways to represent causal relationships in a system
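A minimal sketch of an SCM {U, V, F} in Python, assuming Gaussian exogenous noise and a hypothetical three-variable model (Z → X, Z → Y, X → Y); the names and coefficients are illustrative only. Each function in F computes one endogenous variable from its parents and its own exogenous term, and an intervention do(X = x) replaces X’s function with the constant x.

```python
import random


def sample_u(rng):
    """Exogenous variables U: one independent noise term per endogenous variable."""
    return {"Z": rng.gauss(0, 1), "X": rng.gauss(0, 1), "Y": rng.gauss(0, 1)}


# Structural functions F for V = {Z, X, Y}, listed in causal (topological) order.
F = {
    "Z": lambda v, u: u["Z"],
    "X": lambda v, u: 0.5 * v["Z"] + u["X"],
    "Y": lambda v, u: 2.0 * v["X"] + 1.0 * v["Z"] + u["Y"],
}


def sample(n, interventions=None, seed=0):
    """Draw n samples of V. `interventions` maps a variable to a fixed value: do(var = value)."""
    rng = random.Random(seed)
    interventions = interventions or {}
    draws = []
    for _ in range(n):
        u, v = sample_u(rng), {}
        for name, f in F.items():
            # do(): an intervened variable's structural function is replaced by a constant.
            v[name] = interventions[name] if name in interventions else f(v, u)
        draws.append(v)
    return draws


def mean(draws, name):
    return sum(d[name] for d in draws) / len(draws)


# Same seed in both calls, so both "worlds" share the same exogenous noise U.
effect = mean(sample(10000, {"X": 1.0}), "Y") - mean(sample(10000, {"X": 0.0}), "Y")
print(f"E[Y | do(X=1)] - E[Y | do(X=0)] = {effect:.2f}")  # 2.00, the coefficient on X in F

# Observational comparison: regress Y on X without intervening.
obs = sample(10000)
xs, ys = [d["X"] for d in obs], [d["Y"] for d in obs]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"observational slope of Y on X = {slope:.2f}")  # about 2.4: confounded by Z
```

Because Z drives both X and Y, the naive regression slope overstates the causal effect of X; the intervention recovers the coefficient written in F.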
TRADITIONAL ALGEBRA COULD NOT HANDLE CAUSAL QUESTIONS
1. How effective is a given treatment in preventing a disease?
   How do we prevent customers leaving to reduce churn and increase revenue?
2. Did the new tax break cause sales to go up? Or the new marketing campaign?
   Which factors really cause positive equity returns?
3. What are the annual health-care costs attributed to obesity?
   How much portfolio performance can really be attributed to each factor?
4. Do hiring records prove an employer guilty of sex discrimination?
“Unarticulatable in the standard grammar of science” - Pearl
Y = aX (a symmetric algebraic equation) vs. Y ← aX (an assignment: Y is determined by X)
NEW MATHEMATICAL LANGUAGE TO HANDLE ‘BECAUSE’ NOT ‘WHEN’
As an example, assume X and Y are correlated. Further, assume that one causes the other (but we don’t know which).
How can we express the distribution of Y (our target)? Answer: it depends on whether Y is caused by X (or not).
Traditional mathematical notation (conditional probability) does not differentiate between ‘X causes Y’ and ‘Y causes X’. Both factorizations of the joint distribution are equally possible:
$P(X, Y) = P(Y \mid X)\,P(X) = P(X \mid Y)\,P(Y)$
so conditional probability is unable to express the direction of causality.
In contrast: “do-calculus” allows two different distributions after intervening on X (applying the “do” operator).
If we intervene on X and fix it to x, only one of equations [1] or [2] below will be correct, depending on the causal relationship:
● If X causes Y (X → Y), then intervening on X changes the distribution of Y, which becomes the conditional distribution of Y given X = x:
[1] $P(Y \mid do(X = x)) = P(Y \mid X = x)$
● But if Y causes X (Y → X), the distribution of Y is simply P(Y), regardless of the value x we set X to:
[2] $P(Y \mid do(X = x)) = P(Y)$
The right-hand sides of [1] and [2] are different: a newer language can differentiate between X → Y and Y → X.
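The difference between [1] and [2] can be checked by simulation. A minimal sketch, assuming two hypothetical linear-Gaussian models built to have the same joint distribution of (X, Y) (unit variances, correlation 0.8): in model A, X → Y; in model B, Y → X. Observational data give the same regression slope in both, but setting X = 2 moves the mean of Y only in model A.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def model_a(rng, do_x=None):
    """X -> Y. Noise sd 0.6 gives Var(Y) = 0.8**2 + 0.6**2 = 1."""
    x = rng.gauss(0, 1) if do_x is None else do_x
    y = 0.8 * x + rng.gauss(0, 0.6)
    return x, y


def model_b(rng, do_x=None):
    """Y -> X, built to have the same joint distribution as model A."""
    y = rng.gauss(0, 1)
    if do_x is None:
        x = 0.8 * y + rng.gauss(0, 0.6)
    else:
        x = do_x  # intervening on X cannot reach Y: Y is upstream of X
    return x, y


def run(model, n=20000, do_x=None, seed=0):
    rng = random.Random(seed)
    xs, ys = zip(*(model(rng, do_x) for _ in range(n)))
    return list(xs), list(ys)


results = {}
for name, model in (("A", model_a), ("B", model_b)):
    slope = ols_slope(*run(model))  # observational: about 0.8 in both models
    _, ys_do = run(model, do_x=2.0)
    results[name] = (slope, sum(ys_do) / len(ys_do))  # E[Y | do(X=2)]: 1.6 in A, 0.0 in B
    print(f"model {name}: slope of Y on X = {slope:.2f}, E[Y | do(X=2)] = {results[name][1]:.2f}")
```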
THREE APPROACHES TO CAUSAL DISCOVERY
1. Do experiments for each edge
  • Intervene on X and measure the change in Y
  • Perform a randomized controlled trial (RCT)
  • A/B testing
  But: these are not always possible; they may be unethical, too expensive, etc.
2. Perform causal discovery using observational data
  • Use data, but go beyond just looking for statistical associations with correlations
3. Apply domain knowledge
  • Inject human context
Iterate between 2 & 3:
  Algorithms determine some edges (but some remain unresolved)…
  Human expresses a view on some edges (but perhaps not all)…
  Algorithms determine some additional edges…
  And then repeat…
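Approach 1 in its simplest form: randomly assign a treatment and compare group means. A minimal sketch on simulated, hypothetical data: a treatment adds 0.5 to an outcome that also depends on an unobserved trait. When units with a high trait select themselves into treatment, the naive comparison is badly biased; with coin-flip assignment, the difference in means recovers the effect, reported with the usual two-sample standard error.

```python
import math
import random

TRUE_EFFECT = 0.5  # hypothetical treatment effect


def outcome(trait, treated, rng):
    """Outcome depends on an unobserved trait and on treatment."""
    return 1.0 * trait + TRUE_EFFECT * treated + rng.gauss(0, 1)


def diff_in_means(rows):
    """rows: (treated, y) pairs. Returns (estimate, standard error)."""
    treated = [y for d, y in rows if d]
    control = [y for d, y in rows if not d]

    def mean_var(values):
        m = sum(values) / len(values)
        return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)

    mt, vt = mean_var(treated)
    mc, vc = mean_var(control)
    return mt - mc, math.sqrt(vt / len(treated) + vc / len(control))


rng = random.Random(42)
traits = [rng.gauss(0, 1) for _ in range(10000)]

# Observational: units with a high trait select themselves into treatment (confounding).
observational = [(t > 0, outcome(t, t > 0, rng)) for t in traits]

# RCT / A/B test: a coin flip decides treatment, independently of the trait.
rct = []
for t in traits:
    d = rng.random() < 0.5
    rct.append((d, outcome(t, d, rng)))

naive, _ = diff_in_means(observational)
ate, se = diff_in_means(rct)
print(f"naive observational estimate: {naive:.2f}")  # far above 0.5
print(f"A/B test estimate: {ate:.2f} +/- {1.96 * se:.2f} (95% CI)")
```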
CAUSAL DISCOVERY FROM DATA: MANY DIFFERENT APPROACHES
Source: recent interview & survey paper, Hünermund, Kaminski & Schmitt, 2022
EXAMPLE OF ONE TYPE OF CAUSAL ALGORITHM
(USING CONDITIONAL INDEPENDENCE TESTING)
CONDITIONAL INDEPENDENCE
• Stats 101: X and Z are independent if $P(X) = P(X \mid Z)$, or alternatively, $P(X \cap Z) = P(X)\,P(Z)$
• Conditional independence: X and Z are conditionally independent given Y if $P(X \cap Z \mid Y) = P(X \mid Y)\,P(Z \mid Y)$
• Intuition: if we already know Y, then knowing X tells us nothing more about Z (nor Z about X)
• Example:
  – Ice cream sold (X) and shark attacks (Z) are not independent. They are both higher when it’s hot (Y)
  – However, if we know the temperature (‘condition on Y’), then X and Z are conditionally independent
[Figure: X ← Y → Z; X and Z are conditionally independent given Y]
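For linear-Gaussian data, conditional independence given Y shows up as a (near) zero partial correlation. A minimal sketch of the ice cream example, assuming a hypothetical linear model in which temperature (Y) drives both ice cream sales (X) and shark attacks (Z); practical CI tests, especially non-linear ones, are more involved.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(r_xz, r_xy, r_zy):
    """Correlation of X and Z after removing the linear effect of Y."""
    return (r_xz - r_xy * r_zy) / ((1 - r_xy ** 2) * (1 - r_zy ** 2)) ** 0.5


rng = random.Random(7)
temp = [rng.gauss(25, 5) for _ in range(5000)]  # Y: temperature (hypothetical units)
ice_cream = [10 * t + rng.gauss(0, 30) for t in temp]  # X: depends on Y only
sharks = [0.2 * t + rng.gauss(0, 1) for t in temp]  # Z: depends on Y only

r_xz = pearson(ice_cream, sharks)
r_xz_given_y = partial_corr(r_xz, pearson(ice_cream, temp), pearson(sharks, temp))
print(f"corr(X, Z)     = {r_xz:.2f}")  # clearly positive
print(f"corr(X, Z | Y) = {r_xz_given_y:.2f}")  # close to zero
```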
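CAUSAL DISCOVERY FROM DATA: EQUIVALENCE CLASSES
CONDITIONAL INDEPENDENCIES IN DAGS
[Figure: three DAGs over X, Y and Z: the chain X → Y → Z, the chain X ← Y ← Z, and the fork X ← Y → Z]
In all three, X is independent of Z given Y, all with the same p-value!
MARKOV EQUIVALENCE CLASS:
A Markov equivalence class is a set of DAGs that encode the same set of conditional independencies. The three DAGs above form one Markov equivalence class, so conditional independence tests alone cannot tell them apart.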
[Figure: the chains and the fork, where X is independent of Z given Y, versus the collider X → Y ← Z, where X is not independent of Z given Y]
Most causal discovery algorithms work by identifying the Markov equivalence class, performing conditional independence tests.
‘Collider’ relationships are not conditionally independent: in X → Y ← Z, X and Z are independent, but become dependent once we condition on Y.
These can be used to orientate the edges.
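The asymmetry that lets algorithms orient edges can be seen in simulation. A minimal sketch, assuming hypothetical linear-Gaussian versions of a chain, a fork and a collider, with partial correlation standing in for a CI test: in the chain and the fork, X and Z are correlated but become uncorrelated given Y; the collider shows the reverse pattern.

```python
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(r_xz, r_xy, r_zy):
    """Correlation of X and Z after removing the linear effect of Y."""
    return (r_xz - r_xy * r_zy) / ((1 - r_xy ** 2) * (1 - r_zy ** 2)) ** 0.5


def simulate(kind, n=5000, seed=3):
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        if kind == "chain":  # X -> Y -> Z
            x = rng.gauss(0, 1)
            y = x + rng.gauss(0, 1)
            z = y + rng.gauss(0, 1)
        elif kind == "fork":  # X <- Y -> Z
            y = rng.gauss(0, 1)
            x = y + rng.gauss(0, 1)
            z = y + rng.gauss(0, 1)
        else:  # collider: X -> Y <- Z
            x = rng.gauss(0, 1)
            z = rng.gauss(0, 1)
            y = x + z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


results = {}
for kind in ("chain", "fork", "collider"):
    x, y, z = simulate(kind)
    r_xz = pearson(x, z)
    results[kind] = (r_xz, partial_corr(r_xz, pearson(x, y), pearson(z, y)))
    print(f"{kind:8s} corr(X, Z) = {results[kind][0]:+.2f}   corr(X, Z | Y) = {results[kind][1]:+.2f}")
```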
CAUSAL DISCOVERY FROM DATA: EXAMPLE ALGORITHMS
• Constraint-based: based on conditional independence (CI) testing with p-values. Well-studied methods; some of them (e.g. FCI) can identify latent confounding. Examples: PC and FCI.
• Score-based: based on evaluating a potential causal graph with a score function (e.g. BIC). Score functions mostly include likelihood terms, but some also use CI tests. Examples: GES and A*.
• Continuous optimization: treating the graph structure as a continuous variable. Examples include NOTEARS and GOLEM.
• Non-Gaussian noise: linear models with non-Gaussian noise (LiNGAM-based models).
Recent review paper: Glymour et al., 2019
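To make the constraint-based idea concrete, here is a minimal PC-style sketch restricted to three variables and linear-Gaussian data, with Fisher’s z-test on (partial) correlations as the CI test. It is an illustration under those assumptions, not a full PC implementation, which handles larger conditioning sets and further orientation rules. On a chain it returns only the skeleton (the Markov equivalence class cannot be resolved); on a collider it also orients both edges.

```python
import itertools
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def independent(r, n, cond_size, alpha):
    """Fisher z-test: True if the (partial) correlation r is not significantly non-zero."""
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - cond_size - 3)
    return abs(z) < NormalDist().inv_cdf(1 - alpha / 2)


def pc_three(data, alpha=0.001):
    """PC-style skeleton search + collider orientation for exactly three variables."""
    names = sorted(data)
    n = len(data[names[0]])
    r = {frozenset(p): pearson(data[p[0]], data[p[1]]) for p in itertools.combinations(names, 2)}
    edges = set(r)  # start from the complete undirected graph
    sepset = {}
    # Level 0: marginal independence tests.
    for pair in list(edges):
        if independent(r[pair], n, 0, alpha):
            edges.discard(pair)
            sepset[pair] = set()
    # Level 1: condition on the third variable if it is still adjacent to either endpoint.
    for pair in sorted(edges, key=sorted):
        a, b = sorted(pair)
        (c,) = set(names) - pair
        if frozenset((a, c)) not in edges and frozenset((b, c)) not in edges:
            continue
        r_ab, r_ac, r_bc = r[pair], r[frozenset((a, c))], r[frozenset((b, c))]
        partial = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
        if independent(partial, n, 1, alpha):
            edges.discard(pair)
            sepset[pair] = {c}
    # Orientation: unshielded triple a - c - b with c not in sepset(a, b) => a -> c <- b.
    arrows = set()
    for pair, s in sepset.items():
        a, b = sorted(pair)
        (c,) = set(names) - pair
        if frozenset((a, c)) in edges and frozenset((b, c)) in edges and c not in s:
            arrows |= {(a, c), (b, c)}
    return edges, arrows


def simulate(kind, n=5000, seed=11):
    """Hypothetical data from a chain (X -> Y -> Z) or a collider (X -> Y <- Z)."""
    rng = random.Random(seed)
    data = {"X": [], "Y": [], "Z": []}
    for _ in range(n):
        x = rng.gauss(0, 1)
        if kind == "collider":
            z = rng.gauss(0, 1)
            y = x + z + rng.gauss(0, 1)
        else:
            y = x + rng.gauss(0, 1)
            z = y + rng.gauss(0, 1)
        for name, value in (("X", x), ("Y", y), ("Z", z)):
            data[name].append(value)
    return data


results = {kind: pc_three(simulate(kind)) for kind in ("chain", "collider")}
for kind, (edges, arrows) in results.items():
    print(kind, sorted(sorted(e) for e in edges), sorted(arrows))
```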
CAUSAL GRAPHS INCORPORATING HUMAN DOMAIN KNOWLEDGE
Market experts can define the causal drivers and relationships according to their beliefs. These are encoded by the machine as constraints, and the machine calibrates the edge functions based on data and constraints.
• Step 1: Human can specify the drivers of the target
  – This node is a direct parent of the target
  – This node is not a parent of the target
• Step 2: Human can constrain edges between nodes
  – The relationship between two nodes is strictly positive (or negative)
  – The relationship between two nodes is piecewise linear
  – And many other edge functions…
[Figure: CAUSAL MODEL: CALIBRATION OF EDGE FUNCTIONS FROM DATA]
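A minimal sketch of how such constraints might enter the calibration step, assuming linear edge functions fitted by least squares. Step 1 constraints decide which candidate nodes may be parents of the target; Step 2 sign constraints become bounds on the edge weights, enforced with projected coordinate descent. The variable names, data and constraint format are hypothetical, not the interface of any particular tool.

```python
import random

INF = float("inf")


def fit_constrained(data, target, parents, bounds, sweeps=50):
    """Least squares target ~ sum_p w_p * parent_p with box constraints lo <= w_p <= hi.

    Projected coordinate descent: each weight is set to its exact one-dimensional
    least-squares optimum given the others, then clipped to its bounds.
    """
    n = len(data[target])
    means = {v: sum(data[v]) / n for v in parents + [target]}
    centred = {v: [x - means[v] for x in data[v]] for v in means}  # centring removes the intercept
    w = {p: 0.0 for p in parents}
    for _ in range(sweeps):
        for p in parents:
            others = [q for q in parents if q != p]
            resid = [centred[target][i] - sum(w[q] * centred[q][i] for q in others) for i in range(n)]
            xp = centred[p]
            w_opt = sum(a * b for a, b in zip(xp, resid)) / sum(a * a for a in xp)
            lo, hi = bounds.get(p, (-INF, INF))
            w[p] = min(max(w_opt, lo), hi)
    return w


# Hypothetical data: Y depends on X1 and X2; X3 is only correlated with Y through X1.
rng = random.Random(5)
data = {"X1": [], "X2": [], "X3": [], "Y": []}
for _ in range(2000):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x3 = x1 + rng.gauss(0, 0.5)
    y = 2.0 * x1 - 0.5 * x2 + rng.gauss(0, 1)
    for key, value in (("X1", x1), ("X2", x2), ("X3", x3), ("Y", y)):
        data[key].append(value)

# Hypothetical expert constraints.
candidates = ["X1", "X2", "X3"]
not_parents = {"X3"}  # Step 1: X3 is not a parent of Y
signs = {"X1": "positive", "X2": "negative"}  # Step 2: sign of each edge
parents = [c for c in candidates if c not in not_parents]
bounds = {p: (0.0, INF) if s == "positive" else (-INF, 0.0) for p, s in signs.items()}

weights = fit_constrained(data, "Y", parents, bounds)
print({p: round(w, 2) for p, w in weights.items()})  # close to X1: 2.0, X2: -0.5

# A constraint that contradicts the data is enforced, not ignored: the X2 edge is pinned at 0.
wrong = fit_constrained(data, "Y", parents, {"X2": (0.0, INF)})
print(wrong["X2"])  # 0.0
```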
SUMMARY OF CAUSALITY FOUNDATIONAL CONCEPTS
• Causal Graphs: a tool that allows us to encode causal relationships and visualize them. Allows communication and explainability.
• Causal Discovery: identifying a causal graph from observational data. Human-in-the-loop approaches allow integration of domain knowledge.
• Causal Inference: inferring interventional distributions from observational data. Estimating the causal effect of doing certain interventions.
• Counterfactual Reasoning: hypothetical experiments: “What would we have seen if something else had happened?” Extremely useful for explainability, fairness and decision-making.
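Counterfactuals are commonly computed in three steps: abduction, action and prediction. A minimal sketch on a hypothetical one-equation model with additive noise, Y ← 2X + U_Y: from one observed unit (X = 1, Y = 3) we recover its noise term, set X to 2, and recompute Y with the same noise.

```python
def f_y(x, u_y):
    """Hypothetical structural equation for Y with additive exogenous noise U_Y."""
    return 2 * x + u_y


def counterfactual_y(x_obs, y_obs, x_new):
    # 1. Abduction: infer this unit's noise (possible because the noise is additive).
    u_y = y_obs - 2 * x_obs
    # 2. Action: set X to x_new, i.e. do(X = x_new).
    # 3. Prediction: recompute Y with the same noise term.
    return f_y(x_new, u_y)


print(counterfactual_y(x_obs=1.0, y_obs=3.0, x_new=2.0))  # 5.0
```

Keeping the unit’s own noise term is what separates a counterfactual (“what would Y have been for this unit?”) from an interventional average over the whole population.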
ROADMAP: 3. EXAMPLES
PERFORMANCE ATTRIBUTION
UNDERSTAND WHY PERFORMANCE OCCURS, NOT JUST WHEN
An explainable framework to understand performance more precisely:
• Supplement traditional performance attribution
• Investigate the source of a manager’s performance
• Monitor whether a manager is doing what they claim to do
[Figure panels: combine macroeconomic factors with security-specific and trade-specific factors · find causal drivers, beyond linear models, to gain a deeper understanding of performance · explainable and transparent causal performance attribution]
PORTFOLIO CONSTRUCTION
SUPERIOR PORTFOLIO CONSTRUCTION AT LOWER TURNOVER
Traditional methods require accurate correlation forecasts (of the future):
• Markowitz Mean-Variance uses the covariance matrix (& correlations)
• Hierarchical Risk Parity clusters on a distance measure (using correlations)
• Correlations carry a risk of overfitting to historical data, & are unable to capture the asymmetry of causal drivers
Using causal relationships, we find more stable relationships to improve traditional methods:
• Markowitz Mean-Variance -> Causal Quadratic Optimisation
• Hierarchical Risk Parity -> Causal Hierarchical Risk Parity
Higher out-of-sample generalizability and lower turnover with similar performance
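The sensitivity of correlation-based construction shows up even with two assets. A minimal sketch using the textbook two-asset minimum-variance weights (a traditional method, not the causal variants above), with hypothetical volatilities of 20% and 10%: moving the estimated correlation from 0.3 to 0.6 flips the riskier asset from a long to a short position, and the rebalancing shows up directly as turnover (measured here as the sum of absolute weight changes).

```python
def min_variance_weights(sigma1, sigma2, rho):
    """Two-asset minimum-variance portfolio (fully invested, shorting allowed)."""
    cov = rho * sigma1 * sigma2
    w1 = (sigma2 ** 2 - cov) / (sigma1 ** 2 + sigma2 ** 2 - 2 * cov)
    return w1, 1.0 - w1


def turnover(old, new):
    """Sum of absolute weight changes between two portfolios."""
    return sum(abs(a - b) for a, b in zip(old, new))


before = min_variance_weights(0.20, 0.10, 0.3)  # hypothetical correlation estimate
after = min_variance_weights(0.20, 0.10, 0.6)  # re-estimated correlation
to = turnover(before, after)
print([round(w, 3) for w in before])  # [0.105, 0.895]
print([round(w, 3) for w in after])  # [-0.077, 1.077]
print(round(to, 3))  # 0.364
```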
HUMAN GUIDED RESEARCH

TESTING INVESTMENT HYPOTHESES
Human-Guided Causal Discovery allows domain experts to aid the causal discovery process by iteratively injecting their domain knowledge into the graph discovery process.
It also allows quants and non-quants to collaborate efficiently on models.
CLIENT ANALYTICS AND NEXT-BEST-ACTION

WEALTH MANAGEMENT: IMPROVE CLIENT RETENTION TO DRIVE LIFETIME VALUE
Business Challenge
• Difficult to identify the reasons clients leave
• Deploying new models takes too long
• Concerns about model validation and explainability
Solution
• Deliver key propensity indicators and recommend ‘next-best-actions’
• Expose the root causes of changes in client behavior, traced to drivers (fees, discounts, client engagement, etc.)
• Causal model is explainable and fully compliant with banks’ model risk guidelines and AI regulations
Benefits
• Proactive client engagement, with early warnings for relationship managers
• Lower cost to serve existing clients, and improved scalability
• Inherently explainable models that non-technical users can interrogate and trust
ROADMAP: 4. FINAL THOUGHTS
ASSUMPTIONS / LIMITATIONS / WEAKNESSES: LIKE ANY MODEL!
• Requires data or subjective domain knowledge (and ideally both)
• Data or domain knowledge must be representative out-of-sample
• Easy problems vs hard problems:
  1. Low stakes versus high stakes
  2. Multiple decisions versus single decisions
  3. Invariance versus concept drift
WHAT DOES A “GOOD” CAUSAL PROBLEM LOOK LIKE?
• Some data (but not enough)
• Some subjective domain knowledge (but not perfect foresight)
• Objective to increase scalability and bandwidth
• Multiple low-stakes decisions (vs a single high-stakes decision)
• Explainability required
• Accuracy of predictions? “No, not a crystal ball in a black box!”
THE VALUE FOR INVESTMENT MANAGEMENT
• Enhanced communication for hybrid teams → FASTER MODEL DEVELOPMENT
• Integration of human domain knowledge → SMALL DATASETS & FUTURE ≠ PAST
• Explainability (vs black box & accuracy) → TRUST
Final general comments:
• Current AI/ML paradigm: more data = better & “remove the human”
• Sometimes the answer may not lie in historical data…
• Future AI/ML paradigm: “Human–Machine ensembles”
RESOURCES
Introductory / non-technical
• The danger of mixing up causality and correlation – Smeets, 5-minute TEDx Talk, 2012
• Why: A Guide to Finding and Using Causes – Kleinberg, book, 2015
• The Book of Why – Pearl & Mackenzie, book, 2018
Algorithm overview / survey papers
• Review of Causal Discovery Methods Based on Graphical Models – Glymour, Zhang & Spirtes, 2019
• Causal Machine Learning and Business Decision Making – Hünermund, Kaminski & Schmitt, 2022
Technical material
• Lecture series from Brady Neal: https://www.bradyneal.com/causal-inference-course
• From Statistical to Causal Learning – Schoelkopf & von Kuegelgen, paper, 2022
• Elements of Causal Inference – Peters, Janzing & Schoelkopf, book, 2017
Open Source Python Packages
• CausalNex – Quantum Black / McKinsey https://github.com/quantumblacklabs/causalnex
• causallib – IBM https://github.com/IBM/causallib
• CausalML – Uber https://github.com/uber/causalml
• DoWhy – Microsoft https://github.com/py-why/dowhy
• EconML – Microsoft https://github.com/microsoft/EconML
• PyWhy – Amazon https://github.com/py-why
QUESTIONS?
BEN STEINER
LINKEDIN.COM/IN/STEINERBEN/
BS3283@COLUMBIA.EDU
Event: Trading Day, June 8th
Location: BlackRock, Hudson Yards
SPEAKER BIOGRAPHY: BEN STEINER
Ben Steiner's experience is at the intersection of machine learning, decision science and model risk management for investment firms.
With a history of successful leadership roles, including chief of staff in the Global Fixed Income division of BNP Paribas Asset Management, Ben's focus is on delivering long-term sustainable value for his clients. Earlier in his career, Ben was Head of Model Development, Portfolio Manager & Quant Researcher at investment managers and quantitative hedge funds.
Ben is a well-regarded speaker at computational finance and machine learning events and holds a BA (Hons) Economics from the University of Manchester and an MSc Mathematical Finance from Imperial College, London.
Ben is also a lecturer at Columbia University and Director of the Society of Quantitative Analysts (SQA).