In the previous chapter, we showed that the ratio method for a single genetic
variant could be performed using summarized data on genetic associations
with the exposure and with the outcome. In this chapter, we introduce
the inverse-variance weighted method, which combines summarized data for
multiple variants into a single causal estimate. We show that estimates from
the inverse-variance weighted method are the same as those obtained from
the two-stage method, and hence most efficiently combine information from
multiple valid instrumental variables.
68 Mendelian Randomization
FIGURE 5.1
Mendelian randomization estimates for interleukin-1. Estimates represent
odds ratios per interleukin-1 receptor antagonist increasing allele. Error bars
represent 95% confidence intervals. Taken from The Interleukin-1 Genetics
Consortium [2015].
\[ \hat{\theta}_j = \frac{\hat{\beta}_{Yj}}{\hat{\beta}_{Xj}} \tag{5.2} \]
and its approximate standard error is:
\[ \mathrm{se}(\hat{\theta}_j) = \left| \frac{\mathrm{se}(\hat{\beta}_{Yj})}{\hat{\beta}_{Xj}} \right|, \tag{5.3} \]
where hats indicate that the quantities are estimated and the vertical lines
indicate taking the absolute value of the expression. This is the first-order
standard error estimate, which we assume is a reasonable representation of the
uncertainty in the ratio estimate. It only takes into account the uncertainty
in the genetic association with the outcome, but this is typically much greater
than the uncertainty in the genetic association with the exposure (particularly
if the variant is associated with the exposure at a genome-wide level of
significance).
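
As a minimal sketch in R of equations (5.2) and (5.3): the helper name ratio_estimates and the numerical values below are hypothetical, chosen only for illustration. Here bx and by denote the genetic associations with the exposure and with the outcome, and byse the standard errors of the associations with the outcome.

```r
# Variant-specific ratio estimates and their first-order standard errors.
# ratio_estimates is a hypothetical helper, not part of any package.
ratio_estimates <- function(bx, by, byse) {
  theta <- by / bx          # ratio estimate, equation (5.2)
  se    <- abs(byse / bx)   # first-order standard error, equation (5.3)
  data.frame(theta = theta, se = se)
}

# Made-up summarized data for three variants (illustration only)
bx   <- c(0.10, -0.20, 0.25)
by   <- c(0.03, -0.05, 0.10)
byse <- c(0.01, 0.02, 0.02)

ratio_est <- ratio_estimates(bx, by, byse)
print(ratio_est)
```

Note that the sign of the association with the exposure affects the ratio estimate but not its standard error, which is why the absolute value is taken.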
The $\hat{\beta}_{Xj}$ and $\hat{\beta}_{Yj}$ association estimates and their respective standard errors
$\mathrm{se}(\hat{\beta}_{Xj})$ and $\mathrm{se}(\hat{\beta}_{Yj})$ are referred to as summarized data. They are obtained
This is a weighted average (or, more formally, a weighted mean) of the variant-
specific causal estimates. The (fixed-effect) standard error of the IVW estimate
is:
\[ \mathrm{se}(\hat{\theta}_{IVW}) = \sqrt{\frac{1}{\sum_j \hat{\beta}_{Xj}^2 \, \mathrm{se}(\hat{\beta}_{Yj})^{-2}}}. \tag{5.5} \]
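
The weighted-mean form of the IVW estimate and the form based directly on the summarized data can be connected in a few lines. This is a sketch, assuming the weights are the inverse variances of the ratio estimates from (5.3); the symbol $w_j$ is introduced here only for illustration.

```latex
\begin{align*}
w_j &= \mathrm{se}(\hat{\theta}_j)^{-2}
     = \hat{\beta}_{Xj}^2 \, \mathrm{se}(\hat{\beta}_{Yj})^{-2}, \\[4pt]
\hat{\theta}_{IVW}
    &= \frac{\sum_j w_j \hat{\theta}_j}{\sum_j w_j}
     = \frac{\sum_j \hat{\beta}_{Xj}^2 \, \mathrm{se}(\hat{\beta}_{Yj})^{-2} \,
             \hat{\beta}_{Yj} / \hat{\beta}_{Xj}}
            {\sum_j \hat{\beta}_{Xj}^2 \, \mathrm{se}(\hat{\beta}_{Yj})^{-2}}
     = \frac{\sum_j \hat{\beta}_{Yj} \hat{\beta}_{Xj} \, \mathrm{se}(\hat{\beta}_{Yj})^{-2}}
            {\sum_j \hat{\beta}_{Xj}^2 \, \mathrm{se}(\hat{\beta}_{Yj})^{-2}}, \\[4pt]
\mathrm{se}(\hat{\theta}_{IVW})
    &= \sqrt{\frac{1}{\sum_j w_j}}
     = \sqrt{\frac{1}{\sum_j \hat{\beta}_{Xj}^2 \, \mathrm{se}(\hat{\beta}_{Yj})^{-2}}}.
\end{align*}
```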
Estimating a causal effect from summarized data 71
The assumption of uncorrelated IVs ensures that the summarized data from
univariate regressions on each variant in turn equal those from multivariable
regression on all the variants in a single model (as in the two-stage least squares
method). In practice, the two-stage least squares and weighted regression-
based estimates will differ slightly as there will be non-zero correlations
between the genetic variants in finite samples, even if the variants are truly
uncorrelated in the population. However, these differences are likely to be
slight.
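
The near-equality of the two estimates can be checked by simulation. The following is a sketch in base R under assumed, hypothetical parameter values (sample size, allele frequency, effect sizes and the variable names are all illustrative): it simulates independent variants, computes summarized data from univariable regressions, and compares the IVW estimate with a two-stage least squares estimate from the individual-level data.

```r
# Simulated individual-level data; all parameter values are hypothetical.
set.seed(1)
n <- 5000   # participants
J <- 5      # genetic variants
G <- matrix(rbinom(n * J, size = 2, prob = 0.3), nrow = n)  # independent variants
U <- rnorm(n)                                     # unmeasured confounder
X <- as.vector(G %*% rep(0.2, J)) + U + rnorm(n)  # exposure
Y <- 0.3 * X + U + rnorm(n)                       # outcome; true causal effect 0.3

# Summarized data from univariable regressions on each variant in turn
bx <- by <- byse <- numeric(J)
for (j in 1:J) {
  bx[j]   <- coef(lm(X ~ G[, j]))[2]
  fit_y   <- summary(lm(Y ~ G[, j]))$coefficients
  by[j]   <- fit_y[2, 1]
  byse[j] <- fit_y[2, 2]
}
ivw_est <- sum(by * bx * byse^-2) / sum(bx^2 * byse^-2)

# Two-stage least squares: regress the outcome on fitted values of the exposure
x_hat    <- fitted(lm(X ~ G))
tsls_est <- unname(coef(lm(Y ~ x_hat))[2])

print(c(ivw = ivw_est, tsls = tsls_est))
```

The two numbers are not expected to match exactly, because the sample correlations between the simulated variants are not exactly zero.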
where boldface represents vectors and the error distribution is multivariate
normal. The estimate from a weighted generalized linear regression is:
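\[ \hat{\theta}_{IVW,c} = (\hat{\boldsymbol{\beta}}_X^{T} \Omega^{-1} \hat{\boldsymbol{\beta}}_X)^{-1} \hat{\boldsymbol{\beta}}_X^{T} \Omega^{-1} \hat{\boldsymbol{\beta}}_Y . \tag{5.9} \]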
This estimate is also equivalent to the 2SLS estimate with correlated variants.
Inclusion of genetic variants that are perfectly correlated will not result
in gains in efficiency for the IVW estimate. Inclusion of partially correlated
variants will improve efficiency if the extra variants explain additional
variation in the exposure. This is equivalent to saying that they are
conditionally associated with the exposure in a multivariable regression model.
Some caution is required though: if large numbers of correlated variants are
included in an analysis, then small discrepancies in correlation estimates can
lead to overly precise estimates [Burgess et al., 2017c].
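
The matrix formula in equation (5.9) can be sketched in a few lines of R. The function name ivw_correlated and the data are hypothetical. The code assumes that the element of Ω for variants j1 and j2 is the product of their outcome standard errors and the correlation between the variants, and it reports a fixed-effect standard error under that assumption.

```r
# Hypothetical helper implementing the matrix formula in equation (5.9).
# Assumption: Omega[j1, j2] = byse[j1] * byse[j2] * rho[j1, j2],
# where rho is the matrix of correlations between the variants.
ivw_correlated <- function(bx, by, byse, rho) {
  Omega     <- (byse %o% byse) * rho
  Omega_inv <- solve(Omega)
  precision <- as.numeric(t(bx) %*% Omega_inv %*% bx)
  estimate  <- as.numeric(t(bx) %*% Omega_inv %*% by) / precision
  c(estimate = estimate, se = sqrt(1 / precision))  # fixed-effect standard error
}

# Made-up summarized data for three variants, the first two correlated
bx   <- c(0.10, 0.20, 0.25)
by   <- c(0.03, 0.05, 0.10)
byse <- c(0.01, 0.02, 0.02)
rho  <- matrix(c(1.0, 0.3, 0.0,
                 0.3, 1.0, 0.0,
                 0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)

est_corr   <- ivw_correlated(bx, by, byse, rho)
est_uncorr <- ivw_correlated(bx, by, byse, diag(3))  # reduces to the standard IVW
print(rbind(correlated = est_corr, uncorrelated = est_uncorr))
```

With an identity correlation matrix the function reduces to the uncorrelated IVW formula, which gives a simple check. A correlation matrix with perfectly correlated variants is singular, so solve() would fail; such variants would need to be pruned first.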
Under the null hypothesis that each variant identifies the same causal
parameter, this approximately has a chi-squared distribution on J − 1 degrees
of freedom, where J is the number of genetic variants [Greco et al., 2015]. A
large value of the Q statistic means that the variant-specific ratio estimates
[Figure 5.2: scatter plot. Vertical axis: genetic association with Alzheimer's disease, ranging from −0.2 to 0.4. Labelled variants: rs6859 and rs7254892.]
FIGURE 5.2
Genetic associations with LDL-cholesterol (horizontal axis, standard deviation
units) and with Alzheimer’s disease (vertical axis, log odds ratios) for 75
genetic variants associated with LDL-cholesterol at a genome-wide level
of significance. Lines represent 95% confidence intervals for the genetic
associations. Most points suggest a null causal effect. The two marked outlying
variants suggest a positive causal effect, but these are likely to be pleiotropic
and hence unreliable. Taken from Rees et al. [2019b].
an overdispersion parameter:
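\[ \mathrm{se}(\hat{\theta}_{IVW}) = \frac{\hat{\phi}^{*}_{M}}{\sqrt{\sum_j \hat{\beta}_{Xj}^2 \, \mathrm{se}(\hat{\beta}_{Yj})^{-2}}}, \tag{5.12} \]
where $\hat{\phi}^{*}_{M} = \max(1, \hat{\phi}_M)$ and $\hat{\phi}_M$ is the estimate of the residual standard
error.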
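
The fixed-effect and multiplicative random-effects standard errors, together with Cochran's Q statistic, can be computed directly from the summarized data. The sketch below uses a hypothetical helper ivw_summary and made-up data. It relies on the fact that, in the weighted regression of by on bx through the origin with weights byse^-2, the squared residual standard error equals Q divided by J − 1.

```r
# Hypothetical helper: IVW estimate with fixed-effect and multiplicative
# random-effects standard errors, plus Cochran's Q heterogeneity statistic.
ivw_summary <- function(bx, by, byse) {
  J        <- length(bx)
  w        <- bx^2 * byse^-2                  # inverse-variance weights
  theta    <- sum(by * bx * byse^-2) / sum(w) # IVW estimate
  Q        <- sum(w * (by / bx - theta)^2)    # Cochran's Q
  phi      <- sqrt(Q / (J - 1))               # residual standard error
  se_fixed <- sqrt(1 / sum(w))                # fixed-effect standard error
  list(theta     = theta,
       se_fixed  = se_fixed,
       se_random = max(1, phi) * se_fixed,    # equation (5.12)
       phi       = phi,
       Q         = Q,
       Q_pval    = pchisq(Q, df = J - 1, lower.tail = FALSE))
}

# Made-up summarized data for five variants (illustration only)
bx   <- c(0.10, 0.20, 0.15, 0.25, 0.30)
by   <- c(0.01, 0.10, 0.03, 0.20, -0.02)
byse <- rep(0.02, 5)

res <- ivw_summary(bx, by, byse)
str(res)

# Cross-check against weighted regression through the origin
fit <- summary(lm(by ~ bx - 1, weights = byse^-2))
```

Because the overdispersion parameter is truncated below at 1, the random-effects standard error is never smaller than the fixed-effect standard error.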
There are two main reasons why we prefer a multiplicative to an additive
random-effects model. First, the point estimates from the fixed-effect and
multiplicative random-effects models are identical. Using a random-effects model
widens confidence intervals in the case of heterogeneity, but otherwise results
remain the same. Secondly, a multiplicative random-effects model does not
change the relative weights of the estimates in the meta-analysis. If there is
heterogeneity, an additive random-effects model reduces the weight assigned
to the most precise estimates and shares the weight more evenly amongst
all estimates. This has been criticized in the field of meta-analysis, as more
precise estimates are generally more reliable. A further reason is that the
IVW estimate (fixed-effect and multiplicative random-effects) is a consistent
estimate under the assumption of balanced pleiotropy, defined in Section 7.4.1.
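
The difference in weighting between the two random-effects models can be illustrated numerically. The sketch below uses made-up variant-specific estimates and standard errors. For the additive model it assumes the DerSimonian-Laird moment estimator of the between-variant variance, which is a choice made here for illustration; the text does not specify an estimator.

```r
# Made-up variant-specific ratio estimates and standard errors
theta_j <- c(0.1, 0.5, 0.3, 0.9, -0.2)
se_j    <- c(0.05, 0.10, 0.20, 0.10, 0.15)
J       <- length(theta_j)

# Fixed-effect model: inverse-variance weights
w_fixed     <- se_j^-2
theta_fixed <- sum(w_fixed * theta_j) / sum(w_fixed)
Q           <- sum(w_fixed * (theta_j - theta_fixed)^2)

# Multiplicative random-effects: every variance is scaled by the same factor,
# so the relative weights and the point estimate are unchanged
phi2       <- max(1, Q / (J - 1))
w_mult     <- 1 / (phi2 * se_j^2)
theta_mult <- sum(w_mult * theta_j) / sum(w_mult)

# Additive random-effects: a common variance tau2 is added to every variance
# (DerSimonian-Laird estimator, assumed here for illustration)
tau2      <- max(0, (Q - (J - 1)) / (sum(w_fixed) - sum(w_fixed^2) / sum(w_fixed)))
w_add     <- 1 / (se_j^2 + tau2)
theta_add <- sum(w_add * theta_j) / sum(w_add)

rel_weights <- rbind(fixed          = w_fixed / sum(w_fixed),
                     multiplicative = w_mult / sum(w_mult),
                     additive       = w_add / sum(w_add))
print(round(rel_weights, 3))
print(c(fixed = theta_fixed, multiplicative = theta_mult, additive = theta_add))
```

In this example the first estimate is the most precise, and its relative weight falls under the additive model while remaining unchanged under the multiplicative model.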
Unless there is good reason for believing that all variants target the same
parameter (for example, because all variants are from a single gene region, or
there are very few variants in the analysis so heterogeneity cannot be estimated
reliably), we would generally advocate random-effects models for Mendelian
randomization. If there is no excess heterogeneity in the variant-specific causal
estimates, then the random-effects and fixed-effect results will be identical, so
there is no loss of precision. If there is excess heterogeneity in the variant-
specific causal estimates, then the fixed-effect estimate is overly precise, and
this heterogeneity should be accounted for. If there is statistical evidence for
a causal effect in a random-effects analysis, this means that there is consistent
evidence that the genetic variants support a causal effect of the exposure on
the outcome even accounting for heterogeneity in the variant-specific causal
estimates.
where bx and by are the genetic associations with the exposure and outcome,
and bxse and byse are their standard errors. A multiplicative random-effects
model is chosen by default when the number of variants is four or more. The
function automatically reports a heterogeneity test based on Cochran’s Q
statistic. The IVW estimate can also be obtained by direct calculation:
sum(by*bx*byse^-2)/sum(bx^2*byse^-2)
or by meta-analysis:
library(meta)
metagen(by/bx, abs(byse/bx))$TE.fixed
or by weighted regression:
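# IVW point estimate: weighted regression through the origin
lm(by~bx-1, weights=byse^-2)$coef[1]
# standard error: dividing by min(sigma, 1) gives max(1, sigma) times the fixed-effect standard error
summary(lm(by~bx-1, weights=byse^-2))$coef[1,2]/
  min(summary(lm(by~bx-1, weights=byse^-2))$sigma, 1)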
library(MendelianRandomization)
bmi_snps <- c("rs1558902", "rs2867125", "rs571312", "rs10938397",
  "rs10767664", "rs2815752", "rs7359397", "rs9816226", "rs3817334",
  "rs29941", "rs543874", "rs987237", "rs7138803", "rs10150332",
  "rs713586", "rs12444979", "rs2241423", "rs2287019", "rs1514175",
  "rs13107325", "rs2112347", "rs10968576", "rs3810291", "rs887912",
  "rs13078807", "rs11847697", "rs2890652", "rs1555543", "rs4771122",
  "rs4836133", "rs4929949", "rs206936")
mr_obj = pheno_input(bmi_snps,
  exposure="Body mass index", pmidE="25673413", ancestryE="European",
  outcome="Cigarettes per day", pmidO="20418890", ancestryO="European")
mr_ivw(mr_obj)
mr_plot(mr_obj, interactive=FALSE, orientate=TRUE)
FIGURE 5.3
Genetic associations with body mass index (horizontal axis, standard
deviation units) and with smoking intensity (vertical axis, cigarettes per day)
for 31 genetic variants associated with body mass index at a genome-wide level
of significance. Horizontal and vertical lines represent 95% confidence intervals
for the genetic associations, and the line through the origin represents the
IVW estimate. This graph was produced using the mr_plot command from
the MendelianRandomization software package for R.
5.6 Summary
In summary, the inverse-variance weighted method combines summarized
data on multiple genetic variants to provide an estimate equal to that
which would have been obtained from a two-stage method if individual-
level data were available. This has allowed the widespread proliferation of
Mendelian randomization analyses using publicly available data. However, the
inverse-variance weighted method assumes that all genetic variants are valid
instrumental variables, an assumption that may not hold in practice. Before
considering approaches that challenge this assumption, we next discuss the
interpretation of estimates from Mendelian randomization.