
CAUSAL INFERENCE

Causal inference lets us learn about the effects of treatments, policies, or interventions.

SIMPSON'S PARADOX
Let's take a hypothetical situation.

New disease: Covid 25


Treatments: A (0), B (1)
Conditions: Mild (0), Severe (1)
Outcomes: Alive (0), Dead (1)

Assumption: Treatment B is much more scarce than A.

We have data on each patient's condition, the treatment they were given, and what happened afterwards.
Here all the variables are binary, though the analysis can be extended to continuous variables later.
My aim is to reduce the number of deaths in the country. So which treatment will lower the
number of deaths?

------
Looking only at the overall numbers in this picture, treatment A appears to perform better than B,
because a smaller percentage of patients die under A (16%) than under B (19%).

----
However, if we split the data by condition, treatment B has the lower mortality rate within
both subgroups, mild and severe. This reversal is known as Simpson's Paradox.
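
A quick Python check of the reversal. The per-subgroup counts below are illustrative assumptions, chosen only to agree with the figures quoted in these notes (1400 of A's 1500 patients are mild, 500 of B's 550 patients are severe, overall mortality is 16% for A and 19% for B); they are not read off the original picture. The helper name `mortality` is hypothetical.

```python
# Illustrative (assumed) counts: (deaths, patients) for each treatment and condition.
counts = {
    ("A", "mild"): (210, 1400),
    ("A", "severe"): (30, 100),
    ("B", "mild"): (5, 50),
    ("B", "severe"): (100, 500),
}


def mortality(treatment, condition=None):
    """Death rate for one treatment, either overall or within a single condition."""
    deaths = patients = 0
    for (t, c), (d, n) in counts.items():
        if t == treatment and (condition is None or c == condition):
            deaths += d
            patients += n
    return deaths / patients


for t in ("A", "B"):
    print(
        f"{t}: overall {mortality(t):.0%}, "
        f"mild {mortality(t, 'mild'):.0%}, severe {mortality(t, 'severe'):.0%}"
    )
# A: overall 16%, mild 15%, severe 30%
# B: overall 19%, mild 10%, severe 20%
```

With these counts, A wins on the overall rate while B wins in both subgroups.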

---
Each overall percentage is a weighted average of the two subgroup rates. In A's 16%, the largest
weight comes from the mild group (1400 of 1500 patients), whereas in B's 19%, the largest weight
comes from the severe group (500 of 550 patients).
Simpson's Paradox comes from this unequal weighting. People with severe conditions are more
likely to die than those with mild conditions, and most of treatment B's patients are severe
cases, so B's overall mortality is pulled up even though B does better within each condition.
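
The weighting argument can be checked directly: each treatment's overall death rate equals the sum of (share of its patients in a condition) times (death rate in that condition). This sketch reuses the same assumed counts as above, which are consistent with the 1400/1500 and 500/550 figures; the function name is hypothetical.

```python
# Same illustrative (assumed) counts: (deaths, patients) per treatment and condition.
counts = {
    ("A", "mild"): (210, 1400),
    ("A", "severe"): (30, 100),
    ("B", "mild"): (5, 50),
    ("B", "severe"): (100, 500),
}


def overall_as_weighted_average(treatment):
    """Rebuild a treatment's overall death rate from its subgroup rates and weights."""
    groups = {c: dn for (t, c), dn in counts.items() if t == treatment}
    total_patients = sum(n for _, n in groups.values())
    overall = 0.0
    for condition, (deaths, patients) in groups.items():
        weight = patients / total_patients  # share of this treatment's patients in the group
        rate = deaths / patients  # death rate within the group
        print(f"{treatment} {condition}: weight {weight:.2f}, rate {rate:.2f}")
        overall += weight * rate
    return overall


for t in ("A", "B"):
    print(f"{t} overall: {overall_as_weighted_average(t):.2f}")
```

For A the mild group carries about 93% of the weight (1400/1500); for B the severe group carries about 91% (500/550), which is why B's overall rate leans towards its severe-case rate.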

So the question remains: which treatment is more effective? The answer lies in the
causal structure. There are 2 scenarios:
● Condition is a cause of treatment
● Treatment is a cause of condition

---
From the diagram for the 1st scenario, we can see that Condition is a cause of both Treatment
and Outcome, and Treatment is in turn a cause of Outcome.
Here treatment B is the better choice. Doctors keep the scarce treatment B for severe cases, so
B's patients are sicker to begin with; comparing within each condition, B has the lower mortality.

In the 2nd scenario, Treatment is a cause of both Condition and Outcome, and Condition is also
a cause of Outcome. Since B is scarce, patients with mild conditions can become severe over time
while waiting for treatment B. Those extra severe cases are part of B's effect, so the overall
numbers are the right comparison, and A is the better choice here.
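
The two scenarios can be turned into numbers with a short sketch, again using the assumed counts from earlier. For scenario 1, where Condition is a common cause of Treatment and Outcome, it applies the standard adjustment formula P(Y=1 | do(T=t)) = Σ_c P(Y=1 | t, c) P(c), which these notes do not spell out: each treatment's within-condition death rates are averaged over the condition mix of all patients. For scenario 2, where Condition lies on the path from Treatment to Outcome, it uses the unadjusted P(Y=1 | T=t). Function names are hypothetical.

```python
# Same illustrative (assumed) counts: (deaths, patients) per treatment and condition.
counts = {
    ("A", "mild"): (210, 1400),
    ("A", "severe"): (30, 100),
    ("B", "mild"): (5, 50),
    ("B", "severe"): (100, 500),
}
conditions = ("mild", "severe")
all_patients = sum(n for _, n in counts.values())

# P(C = c): share of all patients, across both treatments, in each condition.
p_condition = {
    c: sum(n for (_t, cond), (_d, n) in counts.items() if cond == c) / all_patients
    for c in conditions
}


def adjusted_death_rate(treatment):
    """Scenario 1 (Condition -> Treatment): sum over c of P(Y=1 | t, c) * P(c)."""
    return sum(
        counts[(treatment, c)][0] / counts[(treatment, c)][1] * p_condition[c]
        for c in conditions
    )


def crude_death_rate(treatment):
    """Scenario 2 (Treatment -> Condition): plain P(Y=1 | T=t), no adjustment."""
    deaths = sum(counts[(treatment, c)][0] for c in conditions)
    patients = sum(counts[(treatment, c)][1] for c in conditions)
    return deaths / patients


for t in ("A", "B"):
    print(f"{t}: scenario 1 {adjusted_death_rate(t):.1%}, scenario 2 {crude_death_rate(t):.1%}")
```

With these counts, scenario 1 gives about 19.4% for A and 12.9% for B (B better), while scenario 2 gives 16.0% for A and 19.1% for B (A better), matching the two conclusions above.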

CORRELATION DOESN'T IMPLY CAUSATION

Sleeping with shoes on is associated with waking up with a headache, but this is a confounding
association: drinking the night before is a common cause of both, i.e. drinking is the confounder.
This is different from causal association, which would mean that sleeping with shoes on actually
causes the headache after we wake up. It's a sort of direct relationship.

Total association is a mixture of confounding association and causal association.
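
A small simulation can make this decomposition concrete. In this sketch every probability is a made-up assumption: drinking makes both shoe sleeping and a headache more likely, and the hypothetical parameter `causal_effect` is a direct effect of shoe sleeping on the headache probability. The naive difference in headache rates (shoes minus no shoes) is the total association; comparing within drinkers and within non-drinkers, then averaging by how common each group is, strips out the confounding part.

```python
import random


def simulate(causal_effect, n=100_000, seed=0):
    """Return (naive, within_strata) differences in headache rate, shoes vs. no shoes."""
    rng = random.Random(seed)
    # tallies[drunk][shoes] = [headaches, people]
    tallies = {d: {s: [0, 0] for s in (False, True)} for d in (False, True)}
    for _ in range(n):
        drunk = rng.random() < 0.5  # the confounder
        shoes = rng.random() < (0.8 if drunk else 0.1)  # drinking -> shoe sleeping
        p_headache = (0.7 if drunk else 0.1) + (causal_effect if shoes else 0.0)
        headache = rng.random() < p_headache  # drinking (and maybe shoes) -> headache
        cell = tallies[drunk][shoes]
        cell[0] += headache
        cell[1] += 1

    def rate(cells):
        return sum(c[0] for c in cells) / sum(c[1] for c in cells)

    # Total association: compare shoe sleepers with everyone else, ignoring drinking.
    naive = rate([tallies[d][True] for d in tallies]) - rate([tallies[d][False] for d in tallies])
    # Compare within each drinking stratum, then weight by how common the stratum is.
    within = 0.0
    for d in (False, True):
        share = sum(c[1] for c in tallies[d].values()) / n
        within += share * (rate([tallies[d][True]]) - rate([tallies[d][False]]))
    return naive, within


print(simulate(causal_effect=0.0))  # pure confounding association
print(simulate(causal_effect=0.1))  # confounding plus a 0.1 causal effect
```

With `causal_effect=0.0` the naive difference is large (around 0.4 under these assumptions) while the within-strata difference is close to zero; with `causal_effect=0.1` the within-strata difference moves to about 0.1 and the naive one is larger still.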


DON'T CONFUSE ASSOCIATION WITH CORRELATION, BECAUSE CORRELATION IS
ONLY ONE TYPE OF ASSOCIATION.
Correlation is a measure of association, so it can contain confounding association; this is
why correlation does not equal causation.
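
Pearson correlation, the usual meaning of "correlation", measures only linear association. A minimal sketch (the helper `pearson` is written here for illustration) shows a strong association with zero correlation: y is completely determined by x, yet the correlation is 0.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # y depends entirely on x, but not linearly
print(pearson(xs, ys))  # 0.0
```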

WHAT DOES IMPLY CAUSATION?


Potential outcomes can answer this question.
Say I have a headache and take a pill, and my headache goes away. Compare two scenarios:
● Scenario 1: if I don't take the pill, my headache remains.
● Scenario 2: if I don't take the pill, my headache still goes away.

So from the above situations, how do we actually know whether the pill is causing the headache to
go away?

do(T=1) ---> Taking the pill
do(T=0) ---> Not taking the pill

The two potential outcomes are given by:
● Yi | do(T=1)
● Yi | do(T=0)

The simpler notation for these two is Yi(1) and Yi(0), respectively.

Causal effect = Yi(1) - Yi(0)

The causal effect is the difference between these two potential outcomes.
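
The definition can be written out for the pill example. In this sketch the class name `PotentialOutcomes` is hypothetical, and Y = 1 is assumed to mean "the headache goes away". It pretends we can see both potential outcomes for one person, which is exactly what real data never allows.

```python
from dataclasses import dataclass


@dataclass
class PotentialOutcomes:
    """Both potential outcomes for one person (Y = 1 means the headache goes away)."""

    y1: int  # Yi(1): outcome under do(T=1), taking the pill
    y0: int  # Yi(0): outcome under do(T=0), not taking the pill

    def causal_effect(self) -> int:
        # Causal effect = Yi(1) - Yi(0)
        return self.y1 - self.y0


scenario_1 = PotentialOutcomes(y1=1, y0=0)  # without the pill the headache remains
scenario_2 = PotentialOutcomes(y1=1, y0=1)  # the headache goes away either way
print(scenario_1.causal_effect())  # 1: the pill made the headache go away
print(scenario_2.causal_effect())  # 0: the pill made no difference
```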

The fundamental problem of causal inference is this: if we take the pill, we cannot observe the
outcome of not taking the pill, and if we don't take the pill, we cannot observe the outcome of
taking it. So we cannot compute the causal effect Yi(1) - Yi(0), because we only ever observe one
of the two terms, never both.
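
What the data actually contain can be sketched as follows. The potential-outcome values and treatment choices are hypothetical, and Y = 1 again means the headache goes away. For each person only the outcome matching the treatment they actually received is recorded; the other one (the counterfactual) is missing.

```python
# Hypothetical potential outcomes for four people (Y = 1: headache goes away).
# Only a simulation can know both columns; real data never does.
y1 = [1, 1, 0, 1]  # Yi(1): outcome if person i takes the pill
y0 = [0, 1, 0, 1]  # Yi(0): outcome if person i does not
took_pill = [1, 0, 1, 0]  # what each person actually did

observed_rows = []
for i, (a, b, t) in enumerate(zip(y1, y0, took_pill)):
    # Only the potential outcome matching the actual treatment is ever recorded.
    row = {"i": i, "T": t, "Y(1)": a if t == 1 else None, "Y(0)": b if t == 0 else None}
    observed_rows.append(row)
    print(row)

# Every row is missing one potential outcome, so Yi(1) - Yi(0) is unknown for everyone.
```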

A randomised experiment is one way of establishing causation.


An RCT, or Randomised Controlled Trial, is an experimental (interventional) study in which we
measure the effect of an intervention by comparing two groups: the control group and the
intervention group.

Randomisation is a statistical procedure by which participants are allocated to the two groups
by chance. Randomisation eliminates selection bias. (Under selection bias, a group is not
representative of the population because not every individual had the same chance of being
selected into it.)
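
A simulated sketch of why randomisation helps. Every number here is an assumption: severity raises the risk of death by 0.3, the treatment lowers it by 0.05, and in the non-randomised version doctors tend to treat severe patients (much like scenario 1 of the Simpson's Paradox example). The helper `estimated_effect` is hypothetical. Comparing the groups' death rates is misleading without randomisation and close to the true -0.05 with it.

```python
import random


def estimated_effect(n, randomised, seed=0):
    """Difference in death rates (treated minus control) in one simulated study."""
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    patients = {True: 0, False: 0}
    for _ in range(n):
        severe = rng.random() < 0.3
        if randomised:
            treated = rng.random() < 0.5  # coin flip: ignores the patient's condition
        else:
            # Observational data: severe patients are more likely to be treated.
            treated = rng.random() < (0.8 if severe else 0.2)
        # Assumed model: severity raises death risk by 0.3; treatment lowers it by 0.05.
        p_death = 0.1 + (0.3 if severe else 0.0) - (0.05 if treated else 0.0)
        deaths[treated] += rng.random() < p_death
        patients[treated] += 1
    return deaths[True] / patients[True] - deaths[False] / patients[False]


print(f"observational: {estimated_effect(200_000, randomised=False):+.3f}")
print(f"randomised:    {estimated_effect(200_000, randomised=True):+.3f}")
```

Under these assumptions the observational comparison comes out around +0.11, making a helpful treatment look harmful, because the treated group contains far more severe patients. With a coin flip, severity is balanced across groups on average and the estimate lands near -0.05.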
