Polygenicity of Complex Traits

Abdel Abdellaoui a.abdellaoui@amc.nl
Key observations from GWASs
• The most important loci (near functional genes that
make sense) explain very little variation
• Genes with relevant functions only contribute slightly
more than random genes
• Protein coding variants contribute relatively little (~10%)
– it’s mostly regulatory variants
• The bulk of the heritability is attributable to a huge
number of common variants all across the genome

2
Why are complex traits so polygenic?

• Because of indirect regulatory
effects that influence a large
number of “peripheral” genes
• Because selection pressures have
allowed common variants to have
only small effects (“flattening”)

3
Omnigenic model
2016

• There are some genes that are
functionally proximate to disease
risk:
– They produce the biggest signals
– They are most illuminating in
understanding the biology
behind disease risk
– They are responsible for only a
small proportion of individual
differences
• The bulk of variation is explained by
genes with a wide variety of
functions, many of which have no
obvious functional connection to the
disease (but are expressed in
relevant tissues)

[Figure caption: Influences “synaptic pruning”, the elimination of
connections between neurons]
4
Omnigenic model

Core gene = a gene that encodes a protein or RNA with a direct effect on the cellular and
organismal processes that change the phenotype
Peripheral gene = any other gene that is expressed in relevant cell types and can affect the
phenotype indirectly through its regulatory effects on core genes
5
Omnigenic model

Cis-regulatory elements are
regions of non-coding DNA
which regulate the transcription
of neighboring genes.

Trans-regulatory elements are
genes which may modify (or
regulate) the expression of
distant genes.

• Most cis-regulatory variants for
peripheral genes are weak trans-
regulatory variants for core genes.

7
Omnigenic model
• Some peripheral genes drive
coordinated regulation of multiple
core genes with shared directional
effects (“master regulators”)
• Include transcription factors &
protein regulators
• These can produce strong GWAS hits
as well

• Most cis-regulatory variants for
peripheral genes are weak trans-
regulatory variants for core genes.
• Individually, they make tiny
contributions to the heritability,
but there are many.

10
Omnigenic model

• In other words: all cis and trans
effects go through the core genes.

• Only the core genes have a direct
effect on the phenotype, mainly
through variation in expression
levels.

11
Omnigenic model

A quantitative model that links phenotypic
variation to the expression levels of core genes.

[Equation shown as an image on the slide. Its annotated terms:]
• Phenotype value in individual i
• Population mean phenotype
• Direct effect of a unit change in expression of core gene j on E(Yi)
• Expression of gene j in individual i minus population mean
expression of gene j

There is no direct effect of the peripheral genes on the phenotype
value of individual i [E(Yi)]
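
The annotations above are consistent with a linear model of the following form. This is a reconstruction from the slide labels, not the slide's own rendering: the symbols \gamma_j (direct effect of core gene j), x_{i,j} (expression of gene j in individual i), \bar{x}_j (its population mean), M (number of core genes) and the residual \epsilon_i are assumed notation.

```latex
Y_i = \bar{Y} + \sum_{j=1}^{M} \gamma_j \,\bigl(x_{i,j} - \bar{x}_j\bigr) + \epsilon_i
```

Peripheral genes do not appear as separate terms in this form. They can enter only through their effect on the core-gene expression levels x_{i,j}.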
17
Omnigenic model

Most variation in gene
expression is due to trans-
effects (~70%)

But trans-effects are more
difficult to detect because
they are much smaller

18
Omnigenic model

[Equation shown as an image on the slide. Its annotated terms:]
• Direct effect of a unit change in expression of core gene j on E(Yi)
• cis genetic variance underlying expression of gene j
• trans genetic variance underlying expression of gene j
• The genetic covariance of expression of core genes j & k
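
If the phenotype is linear in core-gene expression, the four annotated quantities combine into the genetic variance of the phenotype. The sketch below is one way to write that, assuming the cis and trans contributions to a gene's expression are independent. The symbols \gamma_j, V^{cis}_j, V^{trans}_j and C_{j,k} are assumed names, and the slide's own equation may group or scale the terms differently.

```latex
V_G(Y) = \sum_{j} \gamma_j^2 \left( V^{\mathrm{cis}}_j + V^{\mathrm{trans}}_j \right)
       + \sum_{j} \sum_{k \neq j} \gamma_j \, \gamma_k \, C_{j,k}
```

The first sum has one term per core gene, while the covariance sum has one term per pair of core genes. Coordinated regulation of many core genes can therefore contribute substantially when the products \gamma_j \gamma_k C_{j,k} tend to share a sign.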

24
Omnigenic model

[Equation shown as an image on the slide. Its annotated terms:]
• SNP
• The effect size of SNP s on core gene j
• Direct effect of a unit change in expression of core gene j on E(Yi)
• Number of core genes
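
As a small numerical sketch of how the annotated terms fit together, the effect of a SNP on the phenotype can be computed as a sum over core genes: the SNP's effect on each core gene's expression times that gene's direct effect on the phenotype. The names `beta`, `gamma` and `snp_effects_on_trait`, and all numbers, are hypothetical and not taken from the slides.

```python
import numpy as np

def snp_effects_on_trait(beta, gamma):
    """Effect of each SNP on the phenotype, routed through core genes.

    beta  : (n_snps, n_core) array, beta[s, j] = effect of SNP s on the
            expression of core gene j (assumed notation).
    gamma : (n_core,) array, gamma[j] = direct effect of a unit change
            in expression of core gene j on the phenotype.
    """
    beta = np.asarray(beta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    return beta @ gamma  # for each SNP s: sum over j of gamma[j] * beta[s, j]

# Toy example with made-up values: 2 SNPs, 3 core genes.
beta = np.array([[0.5, 0.0, 0.0],    # one effect on a single core gene
                 [0.1, 0.1, 0.1]])   # weak effects on every core gene
gamma = np.array([1.0, 2.0, -1.0])
effects = snp_effects_on_trait(beta, gamma)
print(effects)
```

In the second row the weak effects partly cancel, because the core genes act on the phenotype in opposite directions.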

30
Omnigenic model

Pleiotropy

• If the core genes for two traits are
uncorrelated, trans-eQTL SNPs may
affect both traits, but with uncorrelated
directions of effect
• If core genes are shared between traits or
expression of core genes is genetically
correlated, this may lead to genetic
covariance of the traits.
• Genetic covariance occurs if the directions
of trans-regulation and effect sizes line up
between two traits in a coordinated way
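
The pleiotropy argument can be illustrated with a toy simulation. It assumes, purely for illustration, that every SNP has an independent random effect on the expression of every core gene, and that a SNP's effect on a trait is the sum of those effects weighted by each gene's direct effect. All names (`beta`, `gamma_a`, `gamma_b`, `gamma_c`) and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps, n_core = 5000, 40

# beta[s, j]: effect of SNP s on expression of core gene j. Simulated as
# independent across genes, so expression is not genetically correlated.
beta = rng.normal(size=(n_snps, n_core))

# Direct effects of the core genes on three hypothetical traits.
gamma_a = np.zeros(n_core)
gamma_a[:20] = rng.normal(size=20)   # trait A: core genes 0-19
gamma_b = np.zeros(n_core)
gamma_b[20:] = rng.normal(size=20)   # trait B: core genes 20-39
gamma_c = gamma_a.copy()             # trait C: same core genes as A

def snp_effect_correlation(g1, g2):
    """Correlation across SNPs of their effects on two traits."""
    return np.corrcoef(beta @ g1, beta @ g2)[0, 1]

r_disjoint = snp_effect_correlation(gamma_a, gamma_b)
r_shared = snp_effect_correlation(gamma_a, gamma_c)
print(f"disjoint, uncorrelated core genes: r = {r_disjoint:.3f}")
print(f"shared core genes:                 r = {r_shared:.3f}")
```

Every simulated SNP affects both trait A and trait B, yet the directions of effect are uncorrelated, so no genetic covariance arises. Sharing core genes (trait C) makes the SNP effects line up.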

34
Why are complex traits so polygenic?

• Because of trans-regulatory
effects that influence a large
number of “peripheral” genes
• Because selection pressures have
allowed common variants to have
only small effects (“flattening”)

35
Negative Selection
• “The large number of causal SNPs could be
explained by extraordinary biological
complexity, e.g. if thousands of genes affect a
trait under an “omnigenic model”. However,
biological complexity does not explain the
absence of large-effect SNPs.”

36
Negative Selection
• Complementary hypothesis: due to negative
selection, large-effect SNPs are prevented
from becoming common in the population
while small-effect SNPs are unaffected,
resulting in increased polygenicity for
common variants (“flattening”)

37
Negative Selection

[Figure annotation: Effect size threshold imposed by natural
selection pressures]

Large-effect SNPs are prevented from
becoming common in the population
while small-effect SNPs are unaffected,
resulting in increased polygenicity for
common variants (“flattening”).
41
Negative Selection
In order to quantify the effects of flattening, we
introduce a mathematical definition of polygenicity:

42
Negative Selection

[Equation shown as an image on the slide. Its annotated terms:]
• Effective number of associated SNPs (= polygenicity), Ma
• Total number of causal SNPs, Mc
• Total number of SNPs
• Kurtosis
• Standardized SNP effect

SNPs with large effects contribute strongly to the kurtosis, which
decreases Ma
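
The labels above point to a definition of Ma in terms of the kurtosis of standardized SNP effects. A hedged sketch follows, writing \alpha for the standardized SNP effect (assumed to have mean zero), M for the total number of SNPs and \kappa for the kurtosis. These symbols are assumptions, and the normalizing constant used on the slide is not recoverable from the extracted text, so only proportionality is shown.

```latex
\kappa = \frac{E[\alpha^4]}{\bigl(E[\alpha^2]\bigr)^2},
\qquad
M_a \propto \frac{M}{\kappa}
```

Because \alpha enters the numerator of \kappa through its fourth power, a few large-effect SNPs inflate \kappa and lower M_a, while many near-zero effects change it little.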
50
Negative Selection

[Equation shown as an image on the slide. Its annotated terms:]
• Average per-SNP heritability
• Effective number of associated SNPs (= polygenicity)
• Total heritability

Worked example:
• h2 = 1
• Mc = 100 (i.e., 100 causal SNPs in total)
• 4 causal SNPs explain 2/3 of h2, i.e., 1/6 each
• Average per-SNP h2 = Eh2(α) ≈ 1/6 × 2/3 = 1/9 (the average is
weighted by heritability; the small contribution of the other 96 SNPs
is ignored)
• Ma = h2 / (average per-SNP h2) ≈ 9
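
The worked example can be checked numerically. The sketch below reads it as "Ma equals total h2 divided by the heritability-weighted average per-SNP h2", which is an interpretation of the slide rather than its stated formula. It also assumes the remaining 96 causal SNPs share the other 1/3 of h2 equally, which the slide does not specify. The function name `effective_number_of_snps` is hypothetical.

```python
import numpy as np

def effective_number_of_snps(per_snp_h2):
    """Ma as total h2 divided by the heritability-weighted average
    per-SNP h2 (the reading of the worked example assumed here)."""
    per_snp_h2 = np.asarray(per_snp_h2, dtype=float)
    total_h2 = per_snp_h2.sum()
    weights = per_snp_h2 / total_h2              # each SNP's share of h2
    avg_per_snp_h2 = (weights * per_snp_h2).sum()
    return total_h2 / avg_per_snp_h2

# 100 causal SNPs, h2 = 1: 4 SNPs explain 2/3 (1/6 each); the other 96
# are assumed to split the remaining 1/3 equally.
per_snp_h2 = np.array([1 / 6] * 4 + [(1 / 3) / 96] * 96)
ma = effective_number_of_snps(per_snp_h2)
print(round(ma, 2))

# Reference case: 100 equal-effect SNPs give Ma equal to Mc.
print(effective_number_of_snps([0.01] * 100))
```

Including the 96 small-effect SNPs moves the result only slightly below 9, which is why the slide can ignore them. With equal effects, Ma equals the number of causal SNPs.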
58
Negative Selection
• Why estimate Ma (effective nr of
SNPs) and not Mc (total nr of causal
SNPs)?
– Mc is difficult to estimate (hard to
distinguish zero effects from very
tiny ones)
– Negative selection affects large-
effect SNPs more than SNPs with
tiny effects, so it influences Ma
much more than Mc
– Ma is more related to missing
heritability and polygenic
prediction accuracy than Mc (see
Methods of the paper)

62
Negative Selection
• Ma can be defined for categories of SNPs (e.g., low
frequency, coding SNPs).
• Differences between categories in polygenicity are
expressed in per-SNP Ma
• Polygenicity enrichment = [per-SNP Ma in a
category] / [per-SNP Ma for all SNPs]
• Heritability enrichment = [per-SNP h2 in a category]
/ [per-SNP h2 for all SNPs]
• Enrichment can be either >1 or <1 (i.e., depletion)
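
Both enrichment ratios have the same shape, so a single helper can express them. The sketch assumes that "per-SNP" means the quantity in a category divided by the number of SNPs in that category. The function name `enrichment` and all numbers are made up for illustration.

```python
def enrichment(value_in_category, snps_in_category, value_all, snps_all):
    """Ratio of a per-SNP quantity in a category to the same per-SNP
    quantity over all SNPs. Pass Ma values for polygenicity enrichment
    or h2 values for heritability enrichment."""
    per_snp_category = value_in_category / snps_in_category
    per_snp_all = value_all / snps_all
    return per_snp_category / per_snp_all

# Hypothetical category holding 1% of all SNPs.
poly_enrichment = enrichment(value_in_category=200, snps_in_category=10_000,
                             value_all=10_000, snps_all=1_000_000)
h2_enrichment = enrichment(value_in_category=0.05, snps_in_category=10_000,
                           value_all=0.5, snps_all=1_000_000)
print(poly_enrichment, h2_enrichment)
```

In this made-up case the heritability enrichment exceeds the polygenicity enrichment. A value below 1 from the same function would indicate depletion.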

64
Negative Selection
• Stratified LD fourth moments
regression (S-LD4M) estimates Ma
from summary statistics
• S-LD4M regresses squared χ²
statistics (i.e. fourth powers of
signed Z scores) on LD fourth
moments, defined as sums of r⁴
values to each category of SNPs.
• This approach is analogous to
stratified LD score regression
(S-LDSC), which regresses χ² statistics
on LD scores (LD second moments)

[Figure labels: Used for the mean; Used for the variance; Used for the
skewness; Used for the kurtosis]
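
To make the two regressors concrete, the sketch below computes LD second moments (sums of r²) and LD fourth moments (sums of r⁴) from a toy genotype matrix. It shows only these definitions and does not implement S-LD4M or S-LDSC. Practical details such as restricting sums to a genomic window are omitted, and the function name `ld_moments` is hypothetical.

```python
import numpy as np

def ld_moments(genotypes, category_mask=None):
    """LD second and fourth moments of each SNP.

    genotypes     : (n_individuals, n_snps) array of allele counts.
    category_mask : optional boolean (n_snps,) array; if given, the sums
                    run only over SNPs in that category.
    Returns (ld2, ld4): for every SNP, the sum of r^2 and the sum of r^4.
    """
    r = np.corrcoef(np.asarray(genotypes, dtype=float), rowvar=False)
    if category_mask is not None:
        r = r[:, np.asarray(category_mask, dtype=bool)]
    return (r ** 2).sum(axis=1), (r ** 4).sum(axis=1)

# Toy genotypes: SNPs 0 and 1 are identical (r = 1), SNP 2 is
# uncorrelated with both in this sample.
genotypes = np.array([[0, 0, 0],
                      [0, 0, 2],
                      [2, 2, 0],
                      [2, 2, 2]])
ld2, ld4 = ld_moments(genotypes)
print(ld2, ld4)

# Moments with respect to a category that contains only SNP 0.
ld2_cat, ld4_cat = ld_moments(genotypes, category_mask=[True, False, False])
print(ld2_cat, ld4_cat)
```

With correlations of exactly 0 or 1 the second and fourth moments coincide. For intermediate r the fourth moment weights strong LD much more heavily than weak LD.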
65
Negative Selection
Simulations with 10k causal SNPs and 50k total SNPs
(red & blue have 400 large effect SNPs)

Estimation of Mc (total nr of causal SNPs) depends on power
(N). Estimation of Ma (effective nr of causal SNPs) does not!

67
Negative Selection
S-LD4M can be used to compare
the polygenicity of common and
low-frequency SNPs
[Figure: Simulations]

S-LD4M produces approximately
unbiased estimates of
polygenicity enrichment across
functional categories
[Figure: Simulations]

69
Negative Selection
• S-LD4M was used to estimate
polygenicity for 33 complex traits
(average N = 361k)
• Fecundity- and brain-related
traits were remarkably polygenic
• The high polygenicity for these
traits could result from strong
negative selection or a greater
biological complexity

70
Negative Selection
For 15 well-powered
traits, common variants
were more polygenic than
rare variants

Heritability enrichment in functional
categories is predominantly driven by
differences in polygenicity, rather than
differences in effect-size magnitude

72
Conclusions
• The omnigenic model is incomplete – part of
the polygenicity is due to negative selection
• The negative selection paper does, however,
support the existence of core genes

73