RESEARCH
Loci associated with skin pigmentation identified in African populations

Nicholas G. Crawford,1 Derek E. Kelly,1,2* Matthew E. B. Hansen,1* Marcia H. Beltrame,1* Shaohua Fan,1* Shanna L. Bowman,3,4* Ethan Jewett,5,6* Alessia Ranciaro,1 Simon Thompson,1 Yancy Lo,1 Susanne P. Pfeifer,7 Jeffrey D. Jensen,7 Michael C. Campbell,1,8 William Beggs,1 Farhad Hormozdiari,9,10 Sununguko Wata Mpoloka,11 Gaonyadiwe George Mokone,12 Thomas Nyambo,13 Dawit Wolde Meskel,14 Gurja Belay,14 Jake Haut,1 NISC Comparative Sequencing Program,† Harriet Rothschild,15 Leonard Zon,15,16 Yi Zhou,15,17 Michael A. Kovacs,18 Mai Xu,18 Tongwu Zhang,18 Kevin Bishop,19 Jason Sinclair,19 Cecilia Rivas,20 Eugene Elliot,20 Jiyeon Choi,18 Shengchao A. Li,21,22 Belynda Hicks,21,22 Shawn Burgess,19 [...]

Variation in epidermal pigmentation is a striking feature of modern humans. Human pigmentation is correlated with geographic and environmental variation (Fig. 1). Populations at lower latitudes have darker pigmentation than those at higher latitudes, suggesting that skin pigmentation is an adaptation to differing levels of ultraviolet radiation (UVR) (1). Because equatorial regions receive more UVR than temperate regions, populations from these regions (including sub-Saharan Africans, South Asians, and Australo-Melanesians) have darker pigmentation (Fig. 1), which likely mitigates the negative impact of high UVR exposure, such as skin cancer and folate degradation (1). In contrast, the synthesis of vitamin D3 in response to UVR, needed to prevent rickets, may drive selection for light pigmentation at high latitudes (1).

The basal layer of human skin contains melanocytes, specialized pigment cells that harbor subcellular organelles called melanosomes, in which melanin pigments are synthesized and stored and [...] quantity of melanins generated, melanosome size, and the manner in which keratinocytes sequester and degrade melanins (4).

Although more than 350 pigmentation genes have been identified in animal models, only a subset of these genes have been linked to normal variation in humans (5). Of these, there is limited knowledge about loci that affect pigmentation in populations with African ancestry (6, 7).

Skin pigmentation is highly variable within Africa

To identify genes affecting skin pigmentation in Africa, we used a DSM II ColorMeter to quantify light reflectance from the inner arm as a proxy [...] single-nucleotide polymorphisms (SNPs) for analysis. A genome-wide association study (GWAS) analysis with linear mixed models, controlling for age, sex, and genetic relatedness (9), identified four regions with multiple significant associations (P < 5 × 10^−8) (Fig. 1, fig. S3, and tables S2 and S3).

We then performed fine-mapping using local imputation of high-coverage sequencing data from a subset of 135 individuals and data from the Thousand Genomes Project (TGP) (Fig. 3 and table S3) (10). We ranked potential causal variants
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA. 2Genomics and Computational Biology Graduate Program, University of
Pennsylvania, Philadelphia, PA 19104, USA. 3Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA. 4Department
of Pathology and Laboratory Medicine and Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA. 5Department of Electrical
Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94704, USA. 6Department of Statistics, University of California, Berkeley, Berkeley, CA 94704, USA. 7School
of Life Sciences, Arizona State University, Tempe, AZ 85287, USA. 8Department of Biology, Howard University, Washington, DC 20059, USA. 9Department of Epidemiology, Harvard T.H. Chan
School of Public Health, Boston, MA 02115, USA. 10Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA
02142, USA. 11Department of Biological Sciences, University of Botswana, Gaborone, Botswana. 12Department of Biomedical Sciences, University of Botswana School of Medicine, Gaborone,
Botswana. 13Department of Biochemistry, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania. 14Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia.
15Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children’s Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
02115, USA. 16Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA. 17Harvard Stem Cell Institute, Harvard University, Cambridge, MA 02138, USA. 18Laboratory of
Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA. 19Translational and Functional
Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA. 20Genetic Disease Research Branch, National Human Genome Research
Institute, National Institutes of Health, Bethesda, MD 20892, USA. 21Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892,
USA. 22Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD 21701, USA. 23Department of Molecular Pharmacology, Physiology and Biotechnology,
Brown University, Providence, RI 02912, USA. 24Chan Zuckerberg Biohub, San Francisco, CA 94158, USA. 25Department of Biology, School of Arts and Sciences, University of Pennsylvania,
Philadelphia, PA 19104, USA. 26Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA. 27Department of Computer Science and Department of Human Genetics,
University of California, Los Angeles, Los Angeles, CA 90095, USA.
*These authors contributed equally to this work. †National Institutes of Health Intramural Sequencing Center (NISC) Comparative Sequencing Program collaborators and affiliations are listed in the supplementary
materials. ‡These authors contributed equally to this work. §Corresponding author. Email: tishkoff@pennmedicine.upenn.edu
within each locus using CAVIAR, a fine-mapping method that accounts for linkage disequilibrium (LD) and effect sizes (Table 1) (11). We characterized global patterns of variation at these loci using whole-genome sequences from West African, Eurasian, and Australo-Melanesian populations (10, 12, 13).

The SNPs with strongest association with skin color in Africans were on chromosome 15 at or near the solute carrier family 24 member 5 (SLC24A5) gene (Figs. 1 and 3 and tables S2 and S3). A functional nonsynonymous mutation within SLC24A5 (rs1426654) (14) was significantly associated with skin color (F test, P = 5.5 × 10^−62) and was identified as potentially causal by CAVIAR (Table 1). The rs1426654 (A) allele is at high frequency in European, Pakistani, and Indian populations (Fig. 1) and is a target of selection in Europeans, Central Asians, and North Indians (15–18). In Africa, this variant is common (28 to 50% frequency) in populations from [...] ancestry and recent European admixture (Fig. [...] (Fig. 4). Haplotype analysis indicates that the rs1426654 (A) variant in Africans is on the same extended haplotype background as Europeans (Fig. 5 and fig. S6), likely reflecting gene flow from western Eurasia over at least the past 3 to 9 ky (23). The rs1426654 (A) variant is at high frequency (28%) in Tanzanian populations, suggesting a lower bound (~5 ka) for introduction of this allele into East Africa, the time of earliest migration from Ethiopia into Tanzania (24). [...]

[...] to other genes containing MFS domains, conserved throughout vertebrates, which function as transmembrane solute transporters (25). MFSD12 mRNA levels are low in depigmented skin of vitiligo patients (26), likely due to autoimmune-related destruction of melanocytes.

The MFSD12 locus is in a region with extensive recombination, enabling us to fine-map eight potentially causal SNPs (Table 1 and table S3) that cluster in two regions: one within MFSD12 and the other ~7600 to 9000 base pairs (bp) upstream of MFSD12 (Fig. 3). Many SNPs are in predicted regulatory regions active in melanocytes and/or keratinocytes (Table 1 and Fig. 3) and show enhancer activity in luciferase expression assays in a WM88 melanoma cell line (Table 1, table S5, and fig. S7). Within MFSD12, the two SNPs that CAVIAR identifies as having the highest probability of being causal are rs56203814 (F test, P = 3.6 × 10^−18), a synonymous variant within exon 9, and rs10424065 (F test, P = 5.1 × 10^−20), located within intron 8. They are 130 bp apart, are in strong LD, and affect gene expression in luciferase expression assays (1.5 to 2.7× higher [...]

Fig. 1. Correlations between allele frequencies at loci associated with pigmentation and UV exposure in global [...] (MI). These data were [...] (QQ) plot of observed versus expected P values from the GWAS. In both (C) and (D), significant SNPs at P < 5 × 10^−8 are highlighted in purple. (E to L) Allele frequencies of genetic variants associated [...]
[Figure panels: QQ plot and −log10(P value) axes; SLC24A5 - rs1426654 (15:48426484); MFSD12 - rs10424065 (19:3545022); MFSD12 - rs6510760 (19:3565253); TMEM138 - rs7948623 (11:61137147).]
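The QQ plot and the genome-wide threshold mentioned in the Fig. 1 caption can be illustrated with a minimal sketch. It assumes the common convention that the k-th smallest of n null P values has expectation k/(n + 1); the function names are hypothetical, and only the first three P values come from the text (the rest are made up for illustration). It is not the authors' plotting code.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # significance threshold quoted in the text


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, smallest P first.

    Assumption: under the null, the k-th smallest of n P values has
    expectation k / (n + 1).
    """
    ordered = sorted(p_values)
    n = len(ordered)
    points = []
    for rank, p in enumerate(ordered, start=1):
        expected = -math.log10(rank / (n + 1))
        observed = -math.log10(p)
        points.append((expected, observed))
    return points


def genome_wide_hits(p_values, alpha=GENOME_WIDE_ALPHA):
    """Keep only P values below the genome-wide threshold."""
    return [p for p in p_values if p < alpha]


# First three values are reported in the text; the last three are hypothetical.
example_p = [5.5e-62, 2.2e-11, 3.2e-9, 0.04, 0.31, 0.77]
points = qq_points(example_p)
hits = genome_wide_hits(example_p)
```

Points far above the diagonal (observed much larger than expected) correspond to the highlighted significant SNPs.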
[...] derived rs6510760 (A) and rs112332856 (C) alleles (associated with dark pigmentation) are common in all sub-Saharan Africans except the San, as well as in South Asian and Australo-Melanesian populations (Fig. 1 and fig. S4). Haplotype analysis places the rs6510760 (A) allele [and linked rs112332856 (C) allele] in Australo-Melanesians on similar haplotype backgrounds relative to central and eastern Africans (Fig. 5 and fig. S6), suggesting that they are identical by descent from an ancestral African population. Coalescent analysis of the SGDP data set indicates that the TMRCA for the derived rs6510760 (A) allele is 996 ka [95% CI, 0.82 to 1.2 million years ago (Ma); Fig. 4].

We do not detect evidence for positive selection at MFSD12 using Tajima’s D and iHS statistics [figs. S5 and S8; as expected if selection were ancient (28)]. However, levels of genetic differentiation are elevated when comparing East African Nilo-Saharan and western European (CEU) pop[...]

Fig. 2. Melanin distributions. Histograms of melanin index computed from underarm measurements with a DSM II ColorMeter for all individuals in each population as described in (70). Skin tones were visualized by displaying the scaled mean red, green, and blue values from the ColorMeter for individuals binned by melanin index.
[Figure panels: Botswana San, N = 358; Botswana San/Bantu, N = 106; Ethiopia Semitic, N = 193; Botswana Bantu, N = 292; Tanzania Sandawe, N = 98.]
[...] (32), which almost exclusively make eumelanin [...] with two distinct lentivirally encoded shRNAs [...] with nontarget shRNA (Fig. 6, C and D). A frac[...] with pigmented melanosomes (Fig. 6, E to G; [...]

Functional characterization of MFSD12 in mice

CRISPR-Cas9 was used to generate an Mfsd12 null allele in a wild-type mouse background (Fig. 7E and fig. S11). Four founders were observed with a uniformly gray coat color, rather than the expected agouti coat color (fig. S11, A and B). These four gray founders harbored deletions at [...] amino acids for the gr/gr allele, Mfsd12 p.Leu163_[...]

Fig. 3. Genomic context of GWAS loci. Plot of −log10(P value) versus genomic position for variants near the four regions with most strongly associated SNPs from GWAS, including annotations for genes, MITF ChIP-seq (chromatin immunoprecipitation sequencing) data for melanocytes (48), a CTCF ChIP-seq track for NHEK keratinocytes, and H3K27ac, DNase I hypersensitive sites (DHS), and chromHMM tracks for melanocytes and keratinocytes from the Roadmap Epigenomics data set (30). Genome-wide significant variants are highlighted in red. Circles, squares, and triangles denote noncoding, synonymous, and nonsynonymous variants, respectively. (A) SLC24A5 locus. (B) MFSD12 locus. (C) DDB1/TMEM138 locus. (D) OCA2/HERC2 locus.
[Figure panels: y axis −log10(P value); keratinocyte and melanocyte tracks (H3K27ac, DHS, chromHMM); labeled variants rs111317445, rs73527942, rs142317543, rs6510760.]
Skin pigmentation–associated loci that play a role in UV response are targets of selection

Another genomic region associated with pigmentation encompasses a ~195-kb cluster of genes on chromosome 11 that play a role in UV response and melanoma risk, including the damage-specific DNA binding protein 1 (DDB1) gene (Figs. 1 and 3 and table S3). DDB1 (complexed with DDB2 and XPC) functions in DNA repair (39); levels of DDB1 are regulated by UV exposure and MC1R signaling, a regulatory pathway of pigmentation (40). DDB1 is a component of CUL4-RING E3 ubiquitin ligases that regulate several cellular and developmental processes (41); it is critical for follicle maintenance and female fertility in mammals (42) and for plastid size and fruit pigmentation in tomatoes (43). Knockouts of DDB1 orthologs are lethal in both mouse and fruitfly development (44), and DDB1 only exhibits rare (<1% frequency) nonsynonymous mutations in the TGP data set. Genetic variants near DDB1 were associated with human pigmentation in an African population with high levels of European admixture (7).

Because of extensive LD in this region, CAVIAR identified 33 SNPs predicted to be causal (Table 1). The most strongly associated SNPs are located in a region conserved across vertebrates flanked by TMEM138 and TMEM216 (45) ~36 to 44 kb upstream of DDB1 and are in high LD within this cluster (r^2 > 0.7 in East Africans) (Fig. 3, Table 1, and table S3). Among these, the most significantly associated SNP is rs7948623 (F test, P = 2.2 × 10^−11), located 172 bp downstream of TMEM138, which shows enhancer activity in WM88 melanoma cells (91.9 to 140.8× higher than the minimal promoter; fig. S7 and table S5) and interacts with the promoters of DDB1 and neighboring genes in MCF-7 cells (Table 1 and Fig. 3) (46, 47).

A second group of tightly linked SNPs (LD r^2 > 0.7 in East Africans) with predicted high probability of containing causal variants spans a ~195-kb region encompassing DDB1 and TMEM138 (Table 1 and Fig. 3). Two SNPs that tag this LD block are rs1377457 (F test, P = 1.5 × 10^−9), located ~7600 bp downstream of TMEM138, and rs148172827 (F test, P = 1.8 × 10^−9), an insertion/deletion polymorphism at TKFC (triokinase and FMN cyclase) located in an enhancer active in WM88 melanoma cells (67.6 to 76.2× higher than the minimal promoter; fig. S7 and table S5), which overlaps an MITF binding site in melanocytes (30, 48); both SNPs interact with the promoters of DDB1 and neighboring genes in MCF-7 cells (Table 1 and Fig. 3) (46, 47). SNPs within introns of DDB1 (rs12289370, rs7934735, rs11230664, rs12275843, and rs7120594) also tag this LD block (Table 1 and Fig. 3).

RNA-seq data from 106 primary melanocyte cultures indicate that African ancestry is significantly correlated with increased DDB1 gene expression (PCC, P = 2.6 × 10^−5; fig. S9). Association tests using a permutation approach indicated that, of the 35 protein-coding genes with a transcription start site within 1 Mb of rs7948623, expression of DDB1 is most strongly associated with a SNP in an intron of DDB1, rs7120594, at marginal statistical significance after correction for ancestry and multiple testing (Padj = 0.06; fig. S9). The allele associated with dark pigmentation at rs7120594 correlates with increased DDB1 expression. We did not have the power to detect an association between expression of DDB1 and SNPs in LD with rs7948623 due to low minor allele frequencies (~2%). The role of DDB1 and neighboring loci in human pigmentation remains to be further explored.

The derived rs7948623 (T) allele near TMEM138 (associated with dark pigmentation) is most common in East African Nilo-Saharan populations and is at moderate to high frequency in South Asian and Australo-Melanesian populations (Fig. 1 and fig. S4). At SNP rs11230664, within DDB1, the ancestral (C) allele (associated with dark pigmentation) is common in all sub-Saharan African populations, having the highest frequency in East African Nilo-Saharan, Hadza, and San populations (88 to 96%), and is at moderate to high frequency in South Asian and Australo-Melanesian populations (12 to 66%) (Fig. 1 and fig. S4). The derived (T) allele (associated with light pigmentation) is nearly fixed in European, East Asian, and Native American populations.

In South Asians and Australo-Melanesians, the alleles associated with darker pigmentation reside on haplotypes closely related, or identical, to those observed in Africa (Fig. 5 and fig. S6), suggesting that they are identical by descent. The TMRCAs for the derived dark allele at rs7948623 and the derived light allele at rs11230664 are estimated to be older than 600 and 250 ka, respectively (Fig. 4).

Consistent with a selective sweep, we see an excess of rare alleles (and extreme negative Tajima’s D values) and high levels of homozygosity extending ~350 to 550 kb in Europeans and Asians, respectively (figs. S5 and S14). We observe extreme negative Tajima’s D values in East African Nilo-Saharans and San over a shorter distance (115 and 100 kb, respectively) (fig. S5). A haplotype extending greater than 195 kb is common in Eurasians and rare in Africans (Fig. 5) and tags the alleles associated with light skin pigmentation. The TMRCA of a large number of haplotypes carrying the rs7948623 (A) allele in non-Africans, associated with light pigmentation, is 60 ka (95% CI, 58 to 62 ka), close to the inferred time of the migration of modern humans out of Africa (Fig. 4) (49). These results, combined with large FST values between Africans and Europeans at SNPs tagging the extended haplotype near DDB1 (for example, FST = 0.98 between Nilo-Saharans and CEU at rs7948623, within the top 0.01% of values [...]
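To give a sense of how an FST value near 1 arises, the following is a minimal sketch of Wright's FST for a single biallelic SNP in two equally weighted populations. The estimator choice, the function names, and the allele frequencies are all assumptions for illustration; the text does not state which FST estimator was used, and sample-size-corrected estimators would give slightly different values.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for allele frequency p."""
    return 2.0 * p * (1.0 - p)


def wright_fst(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for two equally weighted populations.

    Assumption: no sample-size correction, one biallelic SNP.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = heterozygosity(p_bar)
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic across both populations
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, not values from the study.
fst_fixed = wright_fst(1.0, 0.0)    # alternative alleles fixed in each population
fst_same = wright_fst(0.3, 0.3)     # identical frequencies
fst_high = wright_fst(0.99, 0.01)   # nearly fixed differences
```

Under this definition, values approach 1 only when the two populations are close to fixed for different alleles.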
[...] and San (Fig. 1 and fig. S4), consistent with a previous observation (56). Haplotype (Fig. 5) and coalescent analyses (Fig. 4 and fig. S6) show two divergent clades, one enriched for the rs1800404 (C) allele and the other for the rs1800404 (T) allele. Coalescent analysis indicates that the TMRCA of all lineages is 1.7 Ma (95% CI, 1.5 to 2.0 Ma), and the TMRCA of lineages containing the derived (T) allele is 629 ka (95% CI, 426 to 848 ka) (Fig. 4). The deep coalescence of lineages and the positive Tajima’s D values in this region in both African and non-African populations (fig. S5) are consistent with balancing selection acting at this locus.

The SNP with highest probability of being causal in region 3 is rs4932620 (F test, P = 3.2 × 10^−9) located within intron 11 of HERC2 (Fig. 3, Table 1, and table S3). This SNP is 917 bp from rs916977, a SNP associated with blue eye color in Europeans (57, 58), and is in strong LD (r^2 = [...]
Table 1. Annotations of candidate causal SNPs from GWAS. Top candidate causal variants for the four regions identified based on analysis with CAVIAR (11). For each variant, the genomic position (Location), RSID, and Ancestral>Derived alleles are shown, with the allele associated with dark pigmentation in bold. Beta and standard error [Beta(SE)] and the P values from the GWAS (F test, linear mixed model) are given. For functional genomic data, nearest genes are given and variants overlapping DHS sites for melanocytes (E059) (DHS melanocytes) and/or other cell types (DHS other) available from Roadmap Epigenomics are indicated with X (30, 92). Variants intersecting enhancer regions tested by luciferase assays were labeled with Y (significant enhancer activity) or N (no enhancer activity) (fig. S7). Chromatin interactions with nearby genes measured in MCF-7 or K562 cell lines as identified by ChIA-PET are listed with gene names (Chromatin interactions) (46, 47). SNPs that are in strong LD (r^2 > 0.7 in East Africans) are numerically labeled in the column titled LD block.
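The r^2 > 0.7 criterion used to define LD blocks can be sketched as follows, assuming phased haplotype counts are available for a pair of biallelic SNPs. The function name, the dictionary keys, and the counts are hypothetical and serve only to show the standard definition r^2 = D^2 / [pA(1 − pA) pB(1 − pB)].

```python
def ld_r2(hap_counts):
    """Squared correlation between two biallelic SNPs from haplotype counts.

    hap_counts maps 'AB', 'Ab', 'aB', 'ab' to the number of phased
    haplotypes carrying each allele combination (hypothetical input format).
    """
    n = sum(hap_counts.values())
    p_ab = hap_counts["AB"] / n
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / n
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical counts for 100 phased haplotypes.
counts = {"AB": 48, "Ab": 2, "aB": 2, "ab": 48}
r2 = ld_r2(counts)
same_block = r2 > 0.7  # threshold quoted in the Table 1 caption
```

With these illustrative counts the pair would be assigned to the same LD block; in practice the estimate depends on the population in which it is computed, which is why the caption specifies East Africans.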
Alleles in this LD block associated with dark pigmentation correlate with increased OCA2 expression. We did not observe associations between the candidate causal variants in region 1 and OCA2 expression despite a high minor allele frequency (34%). However, we observe a significant association between a haplotype tagged by rs1800404 and alternative splicing resulting in inclusion/exclusion of exon 10 (linear regression t test, P = 9.1 × 10^−40). Exon 10 encodes the amino acids encompassing the third transmembrane domain of OCA2 and is the location of several albinism-associated OCA2 mutations (60, 61), raising the possibility that the shorter transcript encodes a nonfunctional channel. Comparing splice junction usage across individuals, we estimate that each additional copy of the light rs1800404 (T) allele reduces inclusion of exon 10 by ~20% (95% CI, 17.9 to 21.5%; fig. S9). Therefore, homozygotes for the light rs1800404 (T) allele are expected to produce ~60% functional OCA2 protein (compared to individuals with albinism who produce no functional OCA2 protein).

Skin pigmentation is a complex trait

To estimate the proportion of pigmentation variance explained by the top eight candidate SNPs at SLC24A5, MFSD12, DDB1/TMEM138, and OCA2/HERC2, we used a linear mixed model with two genetic random effect terms: one based on the genome-wide kinship matrix and the other based on the kinship matrix derived from the set of significant variants. About 28.9% (SE, 10.6%) of the pigmentation variance is attributable to these SNPs. Considering each locus in turn and all significantly associated variants (P < 5 × 10^−8), the trait variation attributable to each locus is as follows: SLC24A5 (12.8%; SE, 3.5%), MFSD12 (4.5%; SE, 2.1%), DDB1/TMEM138 (2.2%; SE, 1.5%), and OCA2/HERC2 (3.9%; SE, 2.9%). Thus, ~29% of the additive heritability of skin pigmentation in Africans is due to variation at these four regions. This observation indicates that the genetic architecture of skin pigmentation is simpler (that is, fewer genes of stronger effect) than other complex traits, such as height (62). In addition, most candidate causal variants are in noncoding regions, indicating the importance of regulatory variants influencing skin pigmentation phenotypes.

Evolution of skin pigmentation in modern humans

Skin pigmentation is highly variable within Africa. Populations such as the San from southern Africa are the most lightly pigmented among Africans, whereas the East African Nilo-Saharan populations are the most darkly pigmented in the world (Fig. 1). Most alleles associated with light and dark pigmentation in our data set are estimated to have originated before the origin of modern humans ~300 ka (27). In contrast to the lack of variation at MC1R, which is under purifying selection in Africa (63), our results indicate that both light and dark alleles at MFSD12, DDB1, OCA2, and HERC2 have been segregating in the hominin lineage for hundreds of thousands of years (Fig. 4). Furthermore, the ancestral allele is associated with light pigmentation in about half of the predicted causal SNPs; Neandertal and Denisovan genome sequences, which diverged from modern human sequences 804 ka (64), contain the ancestral allele at all loci. These observations are consistent with the hypothesis that darker pigmentation is a derived trait that originated in the genus Homo within the past ~2 million years (My) after human ancestors lost most of their protective body hair, although these ancestral hominins may have been moderately, rather than darkly, pigmented (65, 66). Moreover, it appears that both light and dark pigmentation have continued to evolve over hominid history.

Individuals from South Asia and Australo-Melanesia share variants associated with dark pigmentation at MFSD12, DDB1/TMEM138, OCA2, and HERC2 that are identical by descent from Africans. This raises the possibility that other phenotypes shared between Africans and some South Asian and Australo-Melanesian populations may also be due to genetic variants identical by descent from African populations rather than convergent evolution (67). This observation is consistent with a proposed southern migration route out of Africa ~80 ka (68). Alternatively, it is possible that light and dark pigmentation alleles segregated in a single African source population (13, 69) and that alleles associated with dark pigmentation were maintained outside of Africa only in the South Asian and Australo-Melanesian populations due to selection.

By studying ethnically, genetically, and phenotypically diverse Africans, we identify novel pigmentation loci that are not highly polymorphic in European populations. The loci identified in this study appear to affect multiple phenotypes. For example, DDB1 influences pigmentation (43), cellular response to the mutagenic effect of UVR (40), and female fertility (42). Thus, some of the pigmentation-associated variants identified here may be maintained because of pleiotropic effects on other aspects of human physiology.

It is important to note that genetic variants that do not reach genome-wide significance in our study might also affect the pigmentation phenotype. The 1000 most strongly associated SNPs exhibit enrichment for genes involved in pigmentation and melanocyte physiology in the mouse phenotype database and in ion transport and pyrimidine metabolism in humans (table S8). Future research in larger numbers of ethnically diverse Africans may reveal additional loci associated with skin pigmentation and will further shed light on the evolutionary history, and adaptive significance, of skin pigmentation in humans.

Materials and methods

[...] each region and phased them using SHAPEIT2 (76, 77). The reference panel came from two datasets: filtered variants from the 135 African genomes and TGP (10). After phasing, imputation was performed using Minimac3 (78). Imputation performed very well at most loci (R^2 > 0.91 with MAF ≥ 0.05) (table S3).

To identify SNPs associated with pigmentation, GWAS was performed first on the Illumina Omni 5M SNP dataset, and independently with imputed variants at candidate regions, using linear mixed models implemented in EMMAX software (9). Age and sex were included as covariates, and we corrected for genetic relatedness with an IBS kinship matrix. We used CAVIAR to identify variants in the imputed dataset most likely to be causal (11). Ontology enrichment for genes near the top 1000 most strongly associated variants from the 5M dataset was obtained using the annotation tool GREAT (79).

We estimated the contribution to the variation [...]

[...]ered a single copy of each chromosome from each of the 279 individuals from (13). We inferred recombination breakpoints within a symmetric window surrounding each locus using the program kwarg (89) and identified the longest shared haplotype between each pair of sequences in which no recombination events occurred. We then computed the expected coalescence time between each pair of sequences, conditional on the observed number of mutations in the nonrecombining region. Genealogies were constructed by applying the WPGMA hierarchical clustering algorithm to the estimated pairwise coalescence times. Our estimator accounts for recombination events and the population size history. However, simulation studies indicate that accounting for time-varying population size has relatively little effect on our estimates when the size changes according to previously inferred histories for human populations (70, 90). Because the true population sizes and relationships among the populations we [...]
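The WPGMA step described above can be sketched in a few lines. This is a generic illustration of the clustering rule (when two clusters merge, the distance from the merged cluster to any other cluster is the simple average of the two previous distances), not the authors' pipeline. The function name, the haplotype labels, and the pairwise coalescence times are hypothetical.

```python
from itertools import combinations


def wpgma(labels, pairwise):
    """Build a genealogy by WPGMA from pairwise coalescence times.

    labels: list of sequence names.
    pairwise: dict mapping (name_a, name_b) tuples to estimated times.
    Returns (tree, merges): tree is a nested tuple; merges lists
    (left, right, time) in the order the clusters were joined.
    """
    active = list(labels)
    dist = {frozenset(pair): t for pair, t in pairwise.items()}
    merges = []
    while len(active) > 1:
        # Join the two clusters with the smallest estimated coalescence time.
        a, b = min(combinations(active, 2), key=lambda pair: dist[frozenset(pair)])
        time = dist[frozenset((a, b))]
        merged = (a, b)
        rest = [c for c in active if c != a and c != b]
        for c in rest:
            # WPGMA update: unweighted mean of the two child distances.
            dist[frozenset((merged, c))] = (
                dist[frozenset((a, c))] + dist[frozenset((b, c))]
            ) / 2.0
        active = rest + [merged]
        merges.append((a, b, time))
    return active[0], merges


# Hypothetical pairwise coalescence times (ka) for four haplotypes.
times = {
    ("h1", "h2"): 60.0, ("h3", "h4"): 100.0,
    ("h1", "h3"): 600.0, ("h1", "h4"): 640.0,
    ("h2", "h3"): 620.0, ("h2", "h4"): 660.0,
}
tree, merges = wpgma(["h1", "h2", "h3", "h4"], times)
```

In this toy input, h1/h2 and h3/h4 form two shallow clades that join at a much deeper time, which is the kind of pattern read as two divergent clades in a coalescent tree. For larger inputs, an established implementation such as SciPy's hierarchical clustering with the "weighted" linkage method would be a more practical choice.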
[...] expression values and principal components, Pearson correlation between gene expression levels and ancestry was calculated, and associations between GWAS variant genotypes and gene expression levels were evaluated using ordinary least squares regression.

To identify associations between our GWAS candidate causal variants and expression of nearby genes (using the 106 melanocyte transcriptomes), we first found all protein-coding genes with a transcription start site (TSS) within 1 Mb of the top GWAS variant for each locus and RSEM values greater than 0.5 in the primary melanocyte cultures. Pearson correlation was used to measure the association between ancestry and gene expression. For each locus, we tested whether any genes with a transcription start site within 1 Mb of the top SNP had an eQTL amongst the set of pigmentation QTLs using an additive linear model with the first two principal components of ancestry as covariates. To identify significant variant-[...]

[...] with renilla luciferase control vector (pRL-CMV) in a dual luciferase assay. Relative luciferase activity (firefly/renilla luminescence ratio) is presented as fold change compared to cells transfected with the empty pGL4.23 vector. Data were analyzed with a modified Kruskal-Wallis Rank Sum test, and pairwise comparisons between groups were performed using the Conover method. P values were corrected for multiple comparisons with the Benjamini-Hochberg method using the R package PMCMR, and P values less than 0.05 were considered significant.

We characterized the function of MFSD12 in vitro in immortalized melanocytes and in vivo in both zebrafish and mice. Immortalized melan-Ink4a melanocytes from C57BL/6 Ink4a-Arf1−/− mice were cultured as described (32). To deplete MFSD12, cells were infected with recombinant lentiviruses (generated by transient transfection in HEK293T cells) to express Mfsd12-targeted shRNAs or non-target controls. Cells resistant to [...]

[...] generate a 134 bp deletion resulting in a null allele of Mfsd12. A mixture of Cas9 mRNA (TriLink BioTechnologies) and each of the two synthesized gRNAs was used for pronuclear injection into C57BL/6J × FVB/N F1 hybrid zygotes. Mutation-carrying mice were viable and presented with gray coat color distinct from littermates. Hairs were plucked from postnatal day 18 mice, and individual awl hairs were mounted in permount and imaged with a stereomicroscope (Zeiss SteREO Discovery.V12) at the base of the subapical yellow band where the switch from eumelanin to pheomelanin is visible.

To characterize Mfsd12 in grizzled mice, Illumina-generated whole-genome sequence reads from grizzled JIGR/DN (gr/gr) mice were mapped using bwa mem to GRCm38/mm10 (available at SRA Accession SRR5571237). Sequence variants between JIGR/DN gr/gr and the C57BL/6J reference genome within the gr/gr candidate region were identified using SnpEff (104). Validation of a 12-bp [...]
15. S. Beleza et al., The timing of pigmentation lightening in Europeans. Mol. Biol. Evol. 30, 24–35 (2013). doi: 10.1093/molbev/mss207; pmid: 22923467
16. M. Jonnalagadda et al., Identifying signatures of positive selection in pigmentation genes in two South Asian populations. Am. J. Hum. Biol. 29, e23012 (2017). doi: 10.1002/ajhb.23012; pmid: 28439965
17. C. Basu Mallick et al., The light skin allele of SLC24A5 in South Asians and Europeans shares identity by descent. PLOS Genet. 9, e1003912 (2013). doi: 10.1371/journal.pgen.1003912
18. I. Mathieson et al., Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015). doi: 10.1038/nature16152; pmid: 26595274
19. L. Pagani et al., Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool. Am. J. Hum. Genet. 91, 83–96 (2012). doi: 10.1016/j.ajhg.2012.05.015; pmid: 22726845
20. F. Tekola-Ayele et al., Novel genomic signals of recent selection in an Ethiopian population. Eur. J. Hum. Genet. 23, 1085–1092 (2015). doi: 10.1038/ejhg.2014.233; pmid: 25370040
21. C. M. Schlebusch et al., Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338, 374–379 (2012). doi: 10.1126/science.1227721; pmid: 22997136

cells. J. Cell Biol. 152, 809–824 (2001). doi: 10.1083/jcb.152.4.809; pmid: 11266471
39. G. Chu, E. Chang, Xeroderma pigmentosum group E cells lack a nuclear factor that binds to damaged DNA. Science 242, 564–567 (1988). doi: 10.1126/science.3175673; pmid: 3175673
40. A. L. Kadekaro et al., Melanocortin 1 receptor genotype: An important determinant of the damage response of melanocytes to ultraviolet radiation. FASEB J. 24, 3850–3860 (2010). doi: 10.1096/fj.10-158485; pmid: 20519635
41. Y. Zhang et al., Arabidopsis DDB1-CUL4 ASSOCIATED FACTOR1 forms a nuclear E3 ubiquitin ligase with DDB1 and CUL4 that is involved in multiple plant developmental processes. Plant Cell 20, 1437–1455 (2008). doi: 10.1105/tpc.108.058891; pmid: 18552200
42. C. Yu et al., CRL4 complex regulates mammalian oocyte survival and reprogramming by activation of TET proteins. Science 342, 1518–1521 (2013). doi: 10.1126/science.1244587; pmid: 24357321
43. M. Lieberman, O. Segev, N. Gilboa, A. Lalazar, I. Levin, The tomato homolog of the gene encoding UV-damaged DNA binding protein 1 (DDB1) underlined as the gene that causes the high pigment-1 mutant phenotype. Theor. Appl. Genet. 108, 1574–1581 (2004). pmid: 14968305
44. K.-i. Takata, H. Yoshida, M. Yamaguchi, K. Sakaguchi,

expression. Hum. Genet. 123, 177–187 (2008). doi: 10.1007/s00439-007-0460-x; pmid: 18172690
59. H. E. Seberg et al., TFAP2 paralogs regulate melanocyte differentiation in parallel with MITF. PLOS Genet. 13, e1006636 (2017). doi: 10.1371/journal.pgen.1006636; pmid: 28249010
60. W. S. Oetting, S. S. Garrett, M. Brott, R. A. King, P gene mutations associated with oculocutaneous albinism type II (OCA2). Hum. Mutat. 25, 323 (2005). doi: 10.1002/humu.9318; pmid: 15712365
61. R. Kerr et al., Identification of P gene mutations in individuals with oculocutaneous albinism in sub-Saharan Africa. Hum. Mutat. 15, 166–172 (2000). doi: 10.1002/(SICI)1098-1004(200002)15:2<166::AID-HUMU5>3.0.CO;2-Z; pmid: 10649493
62. A. R. Wood et al., Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014). doi: 10.1038/ng.3097; pmid: 25282103
63. R. M. Harding et al., Evidence for variable selective pressures at MC1R. Am. J. Hum. Genet. 66, 1351–1361 (2000). doi: 10.1086/302863; pmid: 10733465
64. D. Reich et al., Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060 (2010). doi: 10.1038/nature09710; pmid: 21179161
65. N. G. Jablonski, G. Chaplin, The colours of humanity: The evolution of pigmentation in the human lineage. Philos. Trans.
81. J. Yang et al., Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010). doi: 10.1038/ng.608; pmid: 20562875
82. B. S. Weir, C. C. Cockerham, Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984). doi: 10.2307/2408641; pmid: 28563791
83. B. F. Voight, S. Kudaravalli, X. Wen, J. K. Pritchard, A map of recent positive selection in the human genome. PLOS Biol. 4, e72 (2006). doi: 10.1371/journal.pbio.0040072; pmid: 16494531
84. F. Tajima, Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595 (1989). pmid: 2513255
85. P. Danecek et al., The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011). doi: 10.1093/bioinformatics/btr330; pmid: 21653522
86. Z. A. Szpiech, R. D. Hernandez, selscan: An efficient multithreaded program to perform EHH-based scans for positive selection. Mol. Biol. Evol. 31, 2824–2827 (2014). doi: 10.1093/molbev/msu211; pmid: 25015648
87. H. J. Bandelt, P. Forster, A. Röhl, Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 16, 37–48 (1999). doi: 10.1093/oxfordjournals.molbev.a026036; pmid: 10331250
88. Fluxus Engineering, www.fluxus-engineering.com.
89. R. B. Lyngsø, Y. S. Song, J. Hein, Minimum recombination histories by branch and bound, in Algorithms in Bioinformatics,

96. T. L. Bailey et al., MEME SUITE: Tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009). doi: 10.1093/nar/gkp335; pmid: 19458158
97. A. Mathelier et al., JASPAR 2016: A major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 44, D110–D115 (2016). doi: 10.1093/nar/gkv1176; pmid: 26531826
98. A. Dobin et al., STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013). doi: 10.1093/bioinformatics/bts635; pmid: 23104886
99. B. Li, C. N. Dewey, RSEM: Accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011). doi: 10.1186/1471-2105-12-323; pmid: 21816040
100. O. Stegle, L. Parts, R. Durbin, J. Winn, A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLOS Comput. Biol. 6, e1000770 (2010). doi: 10.1371/journal.pcbi.1000770; pmid: 20463871
101. C. E. Bonferroni, Teoria statistica delle classi e calcolo delle probabilità (Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 1936), vol. 8.
102. C. Delevoye et al., AP-1 and KIF13A coordinate endosomal sorting and positioning during melanosome biogenesis. J. Cell Biol. 187, 247–264 (2009). doi: 10.1083/jcb.200907122; pmid: 19841138

for technical assistance; M. Burmeister at University of Michigan for grizzled mouse samples; L. Garrett at the Embryonic Stem Cell and Transgenic Mouse Core [National Human Genome Research Institute (NHGRI)]; R. Sood in the Zebrafish Core (NHGRI); and the African participants. We acknowledge the contribution of the staff members of the Cancer Genomics Research Laboratory [National Cancer Institute (NCI)], the NIH Intramural Sequencing Center, the NCI Center for Cancer Research Sequencing Facility, the Yale University Skin SPORE Specimen Resource Core, and the Botswana–University of Pennsylvania Partnership. This work used computational resources of the NIH High-Performance Computing (HPC) Biowulf cluster. This research was funded by the following grants: NIH grants 1R01DK104339-0 and 1R01GM113657-01 and NSF grant BCS-1317217 to S.A.T., NIH grant R01 AR048155 from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) to M.S.M., NIH grant R01 AR066318 from NIAMS to E.O., NIH grants 5R24OD017870-04 and 1U54DK110805-01 to L.Z. and Y.Z., NIH grant R01-GM094402 to Y.S.S., and NIH grant K12 GM081259 from NIGMS to S.B. M.H.B. was partly supported by a “Science Without Borders” fellowship from CNPq, Brazil. Y.S.S. is a Chan Zuckerberg Biohub investigator. This work was supported in part by the Center of Excellence in Environmental Toxicology (NIH P30-ES013508, T32-ES019851 to M.E.B.H.) (National Institute of Environmental Health Sciences), the Intramural Program of the NHGRI, and the Division of Cancer Epidemiology
SUPPLEMENTARY MATERIALS
http://science.sciencemag.org/content/suppl/2017/10/11/science.aan8433.DC1

RELATED CONTENT
http://science.sciencemag.org/content/sci/358/6360/157.full
http://science.sciencemag.org/content/sci/358/6365/867.full
http://science.sciencemag.org/content/sci/358/6369/eaar7002.full

REFERENCES
This article cites 127 articles, 26 of which you can access for free
http://science.sciencemag.org/content/358/6365/eaan8433#BIBL

PERMISSIONS
http://www.sciencemag.org/help/reprints-and-permissions

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of
Science, 1200 New York Avenue NW, Washington, DC 20005. 2017 © The Authors, some rights reserved; exclusive
licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. The title
Science is a registered trademark of AAAS.