Professional Documents
Culture Documents
HSC
CMP
GMP
Correspondence
jbuen@broadinstitute.org (J.D.B.),
FACS scATAC-seq clusters wjg@stanford.edu (W.J.G.)
Lymphoid Myeloid Erythroid
single-cell RNA-seq
+ Myeloid pseudo-time human hematopoiesis.
iii) enhancer-gene correlation
AAA
AAA
Highlights
d Single-cell chromatin accessibility reveals a heterogeneous
hematopoietic landscape
Cell 173, 1535–1548, May 31, 2018 ª 2018 Elsevier Inc. 1535
A B CD34 CD45RA
5 5
10 10
HSC 4
4 10 CLP (8.2%)
10
CD10
CD38
3
3 10
10
CD34+ HSPCs
2
10 0
Differentiation
02 2
-10
MPP -10
3 4 5 3 4 5
0 10 10 10 0 10 10 10
5 pDC (15.8%)
LMPP CMP 10 10
5
HSC (10.8%)
4 4 CMP (24.3%)
10 LMPP (2.1%) 10
CD123
GMP
CD90
CLP pDC GMP MEP 3 3 (23.0%)
10 10
2 2
10 10
02 MPP (3.3%) Unknown
02
B, T Peripheral Monocyte, mDC Ery, -10 -10 MEP (7.1%) (5.3%)
3 4 5
NK cells pDC Granulocytes Mega 0 10 10 10 0 10
3
10
4
10
5
CD45RA CD45RA
C Single-cell ATAC-seq
E
100 Filter (n=1,038) Pass Filter (n=2,034)
HSC (N=347) F
Obs
MPP (N=142) GATA1
5 Perm
LMPP (N=160)
Cell-cell variability
Unkown (N=60)
0 0 400 800 1200 1600
Fragments Rank Sorted TF motifs
Figure 1. Single-Cell ATAC-Seq Profiles Chromatin Accessibility within Single Hematopoietic Progenitors
(A) A schematic of human hematopoietic differentiation.
(B) Sorting strategy for CD34+ cells.
(C) Single-cell ATAC-seq workflow used in this study.
(D) Single-cell epigenomic profiles along the TET2 locus.
(E) Percent fragments in peaks by number of fragments in peaks, red lines show cutoffs used for determining which cells pass filter; points are colored by density.
(F) TF motif variability analysis in all single-cell epigenomic profiles collected for this study.
See also Figure S1 and Table S1.
epigenomic and transcriptional dynamics associated with sorted cell epigenomic measurements that may define cis- and trans-
human progenitors across differentiation providing a foundation regulatory mechanisms underlying transcriptional and cell fate
for the dissection of regulatory variation in normal multi-lineage commitment heterogeneity in hematopoiesis.
cellular differentiation (Chen et al., 2014; Corces et al., 2016; Farlik To define a single-cell chromatin accessibility landscape of
et al., 2016; Novershtern et al., 2011). Furthermore, recent work this developmental hierarchy, we applied a single-cell assay
measuring single-cell transcriptomes has revealed significant for transposase-accessible chromatin by sequencing (scATAC-
transcriptional heterogeneity in isolated progenitors (Notta et al., seq) (Buenrostro et al., 2015b) to 10 sortable populations in
2016) and across differentiation (Laurenti and Göttgens, 2018; human bone marrow or blood comprising multipotent and line-
Velten et al., 2017). These observations set the stage for single- age restricted progenitors. We find that the regulatory landscape
Dim. 2
EBF1 0
SPIB
STAT1
IRF8
SPI1 -10
BCL11A
RUNX2
MEIS3 -20
-30
Erythroid
ID3
TCF12 -40 Myeloid
-50
HSC LMPP GMP CLP Unknown -60 -40 -20 0 20 40 60
MPP CMP MEP pDC Z-score -5 5
Mono Dim. 1
Z-score Z-score
C E -11.7 11.2 G -6.1 6.4
GATA1
GA
ATA1
A A1
1 motif
o IID3 motif
t
0 0 0
PC3 (10.42%)
PC3 (10.42%)
PC3 (10.42%)
-2 -2 -2
-4 -4 -4
%)
%)
%)
15 15 15
20
(45.5
20
(45.5
20
(45.5
-6 25 -6 25 -6 25
30 30 30
PC1
35
PC1
35
PC1
-12 -10 35
-8 -6 -4 -2 0 2 -12 -10 -8 -6 -4 -2 0 2 -12 -10 -8 -6 -4 -2 0 2
PC2 (37.8%) PC2 (37.8%) PC2 (37.8%)
Density Z-score Z-score
D 0 1 F -8.4 7.9 H -2.9 3.2
CEBPB
CE
EBPB
E PB
Bmmotif HOXA9
HO
OX
OXXA9
9 motif
o
Lymphoid
Lym
L mpho
m ho
oid
id
d
0 0 0
PC3 (10.42%)
PC3 (10.42%)
PC3 (10.42%)
-2 -2 -2
-4 Erythroid
Eryythro
ro
oiid
d M l
Myeloid -4 -4
%)
%)
%)
15 15 15
20
(45.5
20
(45.5
20
(45.5
-6 25 -6 25 -6 25
30 30 30
PC1
35
PC1
35
PC1
35
-12 -10 -8 -6 -4 -2 0 2 -12 -10 -8 -6 -4 -2 0 2 -12 -10 -8 -6 -4 -2 0 2
PC2 (37.8%) PC2 (37.8%) PC2 (37.8%)
to interpret. We reasoned a reference guided approach that uti- scores each single-cell by the contribution of each PC. Cells
lizes accessibility co-variance of regulatory elements in bulk are subsequently clustered using the Pearson correlation coeffi-
hematopoietic samples, combining previously published (Cor- cients between these normalized PCs scores and all other cells
ces et al., 2016) and additional reference profiles (see STAR (Figure S2G). For low-dimensional visual representation, we per-
Methods), would provide a natural and intuitive subspace for formed PCA on this correlation matrix (Figures 2C and 2D) and
dimensionality reduction of single-cell data. To achieve this, display the first 3 principal components, which represent
we implemented a computational strategy, similar to recent 93.7% of the total subspace variance (Figures S2H–S2M).
methods for single-cell RNA-seq analysis (Li et al., 2017), that We validated this computational approach by down sampling
first identifies principal components (PCs) of variation in bulk bulk profiles to 104 fragments and find that the PCA-projection
ATAC-seq samples (Figure S2F) (Corces et al., 2016), then approach closely follows sample clustering using the bulk
PC3 (10.42%)
5
4 K6
K K12 LMPP
-2
3
-2 K2
K2
2 K5 GMP
-4 1
-4 K3 K9
K 9 Mono
%)
%)
CMP 15 K4
K 15
20 K10
K 0
(45.5
20
(45.5
-6 GMP 25 -6 25 pDC
30 P C1 K11
K1
11
1 30
35
P C1
35 CLP
-12 -10 -8 -6 -4 -2 0 2 -12 -10 -8 -6 -4 -2 0 2
PC2 (37.8%) PC2 (37.8%) 1 2 3 4 5 6 7 8 9 10 11 12 13 14
TF Z-score Clusters Z-score
D 0 Max E F -3.8 3.4
PAX5 2.2 RELA motif
EBF1 GATA motif 2
RUNX2 CTCF ID motif
CTCF motif
NKFB motif 0
Variability, TF z-score
1.8 CTCFL ETS motif
RELA
TCF12
LYL1 NFKB1
FLI1 -2
PC3
GATA2
TCF4 1.4 ETV4 MESP2 GATA1
ID3 MESP1 GATA3 GATA5
MEIS3 ID4 -4
TBX15
IRF1
PRDM1 1
STAT1 -6
MEF2A variability=1.80
BCL11A
KLF4 direction p-value=0.28 Low High
KLF14 0.6
0 5 10 15 20 25 -12 -8 -4 0 4
ESRRA Direction score, -log10 p-value PC2
RARA
RELA Z-score
STAT3 G Z-score H -2.7 2.7
MAFF -3.1 2.6
2 GATA2 motif 2 MESP1 motif
BACH2
ATF4
CEBPD
JUND 0 0
SPI1
ATF3
-2 -2
ERG
HOXA9
CTCF
HOXA4 -4 -4
ALX4
GATA1
FOXO4 -6 variability=1.43 -6 variability=1.36
direction p-value=10-17 Low High direction p-value=10-8 Low High
FOXP3
-12 -8 -4 0 4 -12 -8 -4 0 4
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Clusters
find CTCF, nuclear factor kB (NF-kB) (represented by the RELA this TF bias is consistent with previous studies that lineage trace
motif), and ETS motifs to be significantly variable in HSCs, HSCs that suggest that HSCs exhibit oligo-lineage bias toward
however, uncorrelated with any specific direction of differentia- erythroid-myeloid or lymphoid cell fates (Pei et al., 2017).
tion (Figure 3F). Interestingly, NF-kB signaling (inflammatory We next turned to LMPPs, a subset previously shown to
signaling) has been implicated in mouse HSC stem-cell mainte- demonstrate lineage priming toward dendritic (pDC), myeloid
nance (Zhao et al., 2012) and HSC emergence mediated by (GMP:monocytes), and lymphoid (CLP:B cell) fates in mice
neutrophil secretion of tumor necrosis factor alpha (TNF-a) (Busch et al., 2015; Naik et al., 2013) and in human (Karamitros
(Espı́n-Palazón et al., 2014; Sawamiphak et al., 2014). In et al., 2018). Interestingly, we found motifs associated with the
contrast, we find the GATA (p = 1017) and MESP/ID (p = 108) TFs TCF4, STAT1, and CEBPE to be significantly correlated
motifs (represented by GATA2 and MESP1 motifs) TF Z scores with directionality toward CLP, pDC, and GMP differentiation,
to be significantly correlated to erythroid and lymphoid trajec- respectively (Figures S3O–S3R), notably the TCF4 motif is similar
tories respectively (Figures 3G, 3H, and S3N). The direction of to the TCF3, ID3, and MESP1 motifs. TCF4 and CEBPE motif
C D H
E F
accessibility were anti-correlated with each other, and each To achieve this, we first determined the shortest path between
defined a unique direction toward CLP (lymphoid) or GMP cluster centroids and assigned cells to the closest point along
(myeloid) differentiation, respectively, suggesting antagonism that path; a similar approach has been described for analyzing
between myeloid/lymphoid differentiation programs. Sepa- scRNA-seq data (Shin et al., 2015). This approach aligned
rately, the STAT1 motif accessibility appeared to be directed cells to well-defined lineage pathways (Orkin and Zon, 2008)
toward CLP (lymphoid) and pDC (dendritic) cell fates. Overall, producing an ordering of single cells along continuous
this reference-guided computational approach provides a erythroid (K1,K3,K5,K6,K7), lymphoid (K1,K2,K8,K14), pDC
statistical framework for assigning TF motif-associated vari- (K1,K2,K8,K12,K13), and myeloid (K1,K2,K9,K10,K11) differenti-
ability to lineage-associated CAL variation providing a resource ation trajectories (Figures 4A–4D and S4A–S4D). These
for identifying molecular factors that may be involved in lineage trajectories allow for interpretation of CAL heterogeneity within
priming across multipotent cell populations. progenitors and provide methods to further parse cellular sub
states across differentiation.
Heterogeneous Cell Types Can Be Further Divided along We examined variability within two clusters (K9 and K10) of
Developmental Trajectories GMPs, which show significant differences in accessibility among
We next sought to order cells along continuous differentiation myeloid-defining factors SPI1 (PU.1) and CEBP-associated
trajectories across branches of hematopoietic development. motifs across the myeloid developmental trajectory (Figure S4E).
0.8 0.8
HOXA9
Deviation Score
Deviation Score
k=1
GATA1
HOXB8 0.6 0.6
0.4
52 motifs
CTCF 0.4
k=2
TCF3 0.2
TCF12 HOXB8, K1 0.2 CEBPD, K4
GATA1, K1 BCL11A, K5
0
25 motifs
95% CI 0 95% CI
k=3
BPTF -2 0 2 4 6 8 10 12 -2 0 2 4 6 8 10 12
Myeloid pseudo-time Myeloid pseudo-time
FOXJ3
Rel. Density
D E
35 motifs
SPIB 60 GMP
HSC HSC
DBP Mono
scATAC
CMP
CEBPA 50
CMP
STAT1 40
20 motifs
GMP
IRF8
k=5
BCL11A 30 mono
Dim. 2
IRF4
20
HSC
37 motifs
scRNA
ESRRA 10 CMP
k=6
RORB
KLF2 0 GMP
-2 0 2 4 6 8 10 12 14 -10 mono
Myeloid pseudo-time -1 0 1 2 3 4 5 6 7 8 9 10 11 12
-40 -30 -20 -10 0 10 20 30 40 50
Dim. 1 Myeloid pseudo-time
F 5 CEBPD expression H
log2 expression
CEBPD
0 SPIB
1
Pearson
1 IRF8
IRF2
0.5
0 -2 0 2 4 6 8 10 12 -2 0 2 4 6 8 10 12
-2 0 2 4 6 8 10 12
Myeloid pseudo-time Myeloid pseudo-time
Myeloid pseudo-time
expected as DPT does not explicitly model cell-cell differences Linking TF Expression with Associated Accessibility
in dropout (zero counts for genes) and further suggests that Variation in Binding Motif
computational tools for joint analysis of scATAC-seq and In effort to disambiguate TFs that bind the same or similar motif
scRNA-seq may be more robust to technical confounders. and thus assign expression of TFs to downstream changes
Most importantly, the gene expression trajectories (Figure S5P) in chromatin accessibility at TF motifs, we correlated the
are largely similar between the two approaches, supporting our expression of TFs with the TF motif Z scores across myeloid
ATAC:RNA pairing approach. pseudo-time. We then filtered for motif accessibility-TF
0
0 0
-2 0 2 4 6 8 10 12 14 -2 0 2 4 6 8 10 12 14 Transition
Myeloid pseudo-time Myeloid pseudo-time peaks
D Chromosome 8 Early to
Scale 1 Mb hg19 activate
chr8: 48,500,000 49,000,000 49,500,000 50,000,000
K1-HSC
K2-CMP
Late to
K9-GMP
activate
K10-GMP
Correlate dynamics
K11-Mono REs to nearby genes (+/-10Mb)
CD34+_HSPCs Early to
CEBPD MCM4 SNAI2 C8orf22 1 repress
Pearson
EFCAB1
PRKDC BC042029
KIAA0146 UBE2V2
0.5 Late to
<0.5 repress
Mean Pearson
0.2 3
0.3
Early to
2
0.2 activate
0.1
0.1 1
Late to
activate
0 0 0
s s
b b b b
<1k -10k 100k b-1M -10M
b op op op
s airs scATAC/RNA capture -2 0 2 4 6 8 10 12 14
1 f. lo f. lo nf. lo ene p
10- 100k 1
c on . con c o - g correlation HiC Myeloid pseudo-time
igh d k
h me low l pea
Al
developmentally committed states. While this reference-guided already or soon-to-be collected (Regev et al., 2017). As such,
approach enabled us to pair scATAC-seq and scRNA-seq data the data generated here and associated computational methods
along a common lineage trajectory, this approach may be gener- may be broadly adapted to further develop computational tools
alized to pair cells along more-diverse cell fate transitions. to pair different single-cell data types.
Notably, methods for computationally pairing multi ‘‘-omic’’ pro- Furthermore, single-cell CALs can be aggregated to define
files have advantages over experimentally coupled approaches, unique cis-regulatory elements active at different stages of
for example, a computational approach may provide (1) more differentiation. The intersection of genetic variants with these reg-
flexible experimental workflows, (2) allow pairing data across ulatory elements may provide new insights into cell types or stages
experimental methods that may not be easily combined, and of differentiation relevant to disease (Corces et al., 2016; Guo
(3) the reanalysis of the large repertoire of scRNA-seq data et al., 2017). Current experimental methods that aim to associate
Buenrostro, J.D., Wu, B., Litzenburger, U.M., Ruff, D., Gonzales, M.L., Snyder, Karamitros, D., Stoilova, B., Aboukhalil, Z., Hamey, F., Reinisch, A., Samitsch,
M.P., Chang, H.Y., and Greenleaf, W.J. (2015b). Single-cell chromatin acces- M., Quek, L., Otto, G., Repapi, E., Doondeea, J., et al. (2018). Single-cell anal-
sibility reveals principles of regulatory variation. Nature 523, 486–490. ysis reveals the continuum of human lympho-myeloid progenitor cells. Nat.
Immunol. 19, 85–97.
Busch, K., Klapproth, K., Barile, M., Flossdorf, M., Holland-Letz, T., Schlenner,
Kelsey, G., Stegle, O., and Reik, W. (2017). Single-cell epigenomics: recording
S.M., Reth, M., Höfer, T., and Rodewald, H.-R. (2015). Fundamental properties
the past and predicting the future. Science 358, 69–75.
of unperturbed haematopoiesis from stem cells in vivo. Nature 518, 542–546.
Laurenti, E., and Göttgens, B. (2018). From haematopoietic stem cells to com-
Calo, E., and Wysocka, J. (2013). Modification of enhancer chromatin: what,
plex differentiation landscapes. Nature 553, 418–426.
how, and why? Mol. Cell 49, 825–837.
Lawrence, H.J., Helgason, C.D., Sauvageau, G., Fong, S., Izon, D.J., Humph-
Chen, L., Kostadima, M., Martens, J.H.A., Canu, G., Garcia, S.P., Turro, E.,
ries, R.K., and Largman, C. (1997). Mice bearing a targeted interruption of the
Downes, K., Macaulay, I.C., Bielczyk-Maczynska, E., Coe, S., et al. (2014).
homeobox gene HOXA9 have defects in myeloid, erythroid, and lymphoid he-
Transcriptional diversity during lineage commitment of human blood progeni-
matopoiesis. Blood 89, 1922–1930.
tors. Science 345, 1251033.
Li, H., Courtois, E.T., Sengupta, D., Tan, Y., Chen, K.H., Goh, J.J.L., Kong, S.L.,
Corces, M.R., Buenrostro, J.D., Wu, B., Greenside, P.G., Chan, S.M., Koenig,
Chua, C., Hon, L.K., Tan, W.S., et al. (2017). Reference component analysis of
J.L., Snyder, M.P., Pritchard, J.K., Kundaje, A., Greenleaf, W.J., et al. (2016).
single-cell transcriptomes elucidates cellular heterogeneity in human colo-
Lineage-specific and single-cell chromatin accessibility charts human hema-
rectal tumors. Nat. Genet. 49, 708–718.
topoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203.
Long, H.K., Prescott, S.L., and Wysocka, J. (2016). Ever-changing landscapes:
Crane, G.M., Jeffery, E., and Morrison, S.J. (2017). Adult haematopoietic stem
transcriptional enhancers in development and evolution. Cell 167, 1170–1187.
cell niches. Nat. Rev. Immunol. 17, 573–590.
Magnusson, M., Brun, A.C.M., Lawrence, H.J., and Karlsson, S. (2007).
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, Hoxa9/hoxb3/hoxb4 compound null mice display severe hematopoietic de-
P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq fects. Exp. Hematol. 35, 1421–1428.
aligner. Bioinformatics 29, 15–21.
Manz, M.G., Miyamoto, T., Akashi, K., and Weissman, I.L. (2002). Prospective
Espı́n-Palazón, R., Stachura, D.L., Campbell, C.A., Garcı́a-Moreno, D., Del isolation of human clonogenic common myeloid progenitors. Proc. Natl. Acad.
Cid, N., Kim, A.D., Candel, S., Meseguer, J., Mulero, V., and Traver, D. Sci. USA 99, 11872–11877.
(2014). Proinflammatory signaling regulates hematopoietic stem cell emer-
Matsuoka, Y., Sumide, K., Kawamura, H., Nakatsuka, R., Fujioka, T., Sasaki,
gence. Cell 159, 1070–1085.
Y., and Sonoda, Y. (2015). Human cord blood-derived primitive CD34-negative
Fairfax, B.P., Humburg, P., Makino, S., Naranbhai, V., Wong, D., Lau, E., Jos- hematopoietic stem cells (HSCs) are myeloid-biased long-term repopulating
tins, L., Plant, K., Andrews, R., McGee, C., and Knight, J.C. (2014). Innate HSCs. Blood Cancer J. 5, e290.
immune activity conditions the effect of regulatory variants upon monocyte
Mifsud, B., Tavares-Cadete, F., Young, A.N., Sugar, R., Schoenfelder, S., Fer-
gene expression. Science 343, 1246949.
reira, L., Wingett, S.W., Andrews, S., Grey, W., Ewels, P.A., et al. (2015). Map-
Farlik, M., Halbritter, F., Müller, F., Choudry, F.A., Ebert, P., Klughammer, J., ping long-range promoter contacts in human cells with high-resolution capture
Farrow, S., Santoro, A., Ciaurro, V., Mathur, A., et al. (2016). DNA methylation Hi-C. Nat. Genet. 47, 598–606.
dynamics of human hematopoietic stem cell differentiation. Cell Stem Cell 19,
Naik, S.H., Perié, L., Swart, E., Gerlach, C., van Rooij, N., de Boer, R.J., and
808–822.
Schumacher, T.N. (2013). Diverse and heritable lineage imprinting of early hae-
Frelin, C., Herrington, R., Janmohamed, S., Barbara, M., Tran, G., Paige, C.J., matopoietic progenitors. Nature 496, 229–232.
Benveniste, P., Zuñiga-Pflücker, J.-C., Souabni, A., Busslinger, M., and Notta, F., Gan, O.I., Wilson, G., Kaufmann, K.B., Mcleod, J., Laurenti, E., Dun-
Iscove, N.N. (2013). GATA-3 regulates the self-renewal of long-term hemato- ant, C.F., John, D., Stein, L.D., Dror, Y., et al. (2016). Distinct routes of lineage
poietic stem cells. Nat. Immunol. 14, 1037–1044. development reshape the human blood hierarchy across ontogeny. Science
Fulco, C.P., Munschauer, M., Anyoha, R., Munson, G., Grossman, S.R., Perez, 351, aab2116.
E.M., Kane, M., Cleary, B., Lander, E.S., and Engreitz, J.M. (2016). Systematic Novershtern, N., Subramanian, A., Lawton, L.N., Mak, R.H., Haining, W.N.,
mapping of functional enhancer-promoter connections with CRISPR interfer- McConkey, M.E., Habib, N., Yosef, N., Chang, C.Y., Shay, T., et al. (2011).
ence. Science 354, 769–773. Densely interconnected transcriptional circuits control cell states in human he-
Goldberg, A.D., Allis, C.D., and Bernstein, E. (2007). Epigenetics: a landscape matopoiesis. Cell 144, 296–309.
takes shape. Cell 128, 635–638. Orkin, S.H., and Zon, L.I. (2008). Hematopoiesis: an evolving paradigm for
Graf, T., and Enver, T. (2009). Forcing cells to change lineages. Nature 462, stem cell biology. Cell 132, 631–644.
587–594. Paul, F., Arkin, Y., Giladi, A., Jaitin, D.A., Kenigsberg, E., Keren-Shaul, H.,
Grün, D., Muraro, M.J., Boisset, J.-C., Wiebrands, K., Lyubimova, A., Dhar- Winter, D., Lara-Astiaso, D., Gury, M., Weiner, A., et al. (2015). Transcriptional
madhikari, G., van den Born, M., van Es, J., Jansen, E., Clevers, H., et al. heterogeneity and lineage commitment in myeloid progenitors. Cell 163,
(2016). De novo prediction of stem cell identity using single-cell transcriptome 1663–1677.
data. Cell Stem Cell 19, 266–277. Pei, W., Feyerabend, T.B., Rössler, J., Wang, X., Postrach, D., Busch, K.,
Guo, M.H., Nandakumar, S.K., Ulirsch, J.C., Zekavat, S.M., Buenrostro, J.D., Rode, I., Klapproth, K., Dietlein, N., Quedenau, C., et al. (2017). Polylox bar-
Natarajan, P., Salem, R.M., Chiarle, R., Mitt, M., Kals, M., et al. (2017). coding reveals haematopoietic stem cell fates realized in vivo. Nature 548,
Comprehensive population-based genome sequencing provides insight into 456–460.
hematopoietic regulatory mechanisms. Proc. Natl. Acad. Sci. USA 114, Perié, L., Duffy, K.R., Kok, L., de Boer, R.J., and Schumacher, T.N. (2015). The
E327–E336. branching point in erythro-myeloid differentiation. Cell 163, 1655–1662.
Haghverdi, L., Büttner, M., Wolf, F.A., Buettner, F., and Theis, F.J. (2016). Diffu- Regev, A., Teichmann, S.A., Lander, E.S., Amit, I., Benoist, C., Birney, E., Bod-
sion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, enmiller, B., Campbell, P.J., Carninci, P., Clatworthy, M., et al. (2017). Science
845–848. Forum: The Human Cell Atlas. eLife 6, e27041.
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, William J.
Greenleaf (wjg@stanford.edu).
Single-cell RNA-seq
Single-cell RNA-seq data was collected using the recommended protocol for the 30 scRNA-seq 10X genomics platform
using v2 chemistry. DNA shearing used the Covaris S220, libraries were sequenced on a NextSeq with 26x8x98 read lengths.
PCA projection
To calculate PCs on the bulk datasets, later used for projecting single-cell profiles, we first removed peaks in annotated promoters or
aligning to chrX, peaks associated with chrY and unmapped contigs were filtered out in the preprocessing steps described above.
455,057 peaks remained after filtering and were used for the PCA projection analysis. To normalize the bulk count matrix by library
size, we identified 19,287 low variance promoters, using a MATLAB implementation of a previously described approach (Brennecke
et al., 2013), across all bulk samples and normalized each sample by the mean fragment counts within the low variance promoters.
We subsequently took the mean counts of all normalized bulk sample replicates (HSC, MPP, LMPP, CMP, GMP, MEP, Mono, CD4,
CD8, NK, NKT, B, CLP, Ery, UNK, pDC and Megakaryocyte) and performed PCA-SVD, resulting in 15 principal components.
To score single-cells by the activity for each component, we first centered the counts for each cell by dividing each peak by the
mean fragment counts in peaks for a given cell. We then used the weighted coefficients for each peak and PC (determined using
PCA-SVD of the bulk data above) to take the product of the weighted PC coefficients by the centered count values for each cell,
taking the sum of this value resulted in a matrix of cells by PCs. Last, we calculate a cell-cell similarity matrix using Pearson correlation
The accession number for the sequencing data reported in this paper is GEO: GSE96772. Processed scATAC-seq data, which
include PC values and TF scores per cell can be found in Data S1. The software developed for analyzing these data, which includes
projecting scATAC-seq profiles onto bulk hematopoietic PCs and pairing single-cell ATAC-seq with single-cell RNA-seq, as well as
processed data from this manuscript can be found in Data S2.
ADDITIONAL RESOURCES
The data can be visualized in the UCSC genome browser using the track hubs representing bulk data (https://s3.amazonaws.com/
JasonBuenrostro/scATAC_heme_label/hub.txt) and clusters of single-cell ATAC-seq data (https://s3.amazonaws.com/
JasonBuenrostro/scATAC_heme/hub.txt). This manuscript is accompanied by a web resource for visualizing the single-cell
ATAC-seq data, which can be found at: http://schemer.buenrostrolab.com/.
Dim. 2
Dim. 2
Dim. 2
Dim. 2
0 0 0 0
-10 -10 -10 -10
-20 -20 -20 -20
-30 -30 -30 -30
Z-score Z-score log10 counts FRiP
-40 -40 -40 2.7 4.8 -40 .55
-11.8 10.9 -7.2 7.1 .99
-50 -50 -50 -50
-60 -40 -20 0 20 40 60 -60 -40 -20 0 20 40 60 -60 -40 -20 0 20 40 60 -60 -40 -20 0 20 40 60
Dim. 1 Dim. 1 Dim. 1 Dim. 1
E Cells scored by ENCODE ChIP-seq F G Pearson
1
H 40 PCA of bulk samples
HSC LMPP GMP CLP Unknown 300
50 MPP CMP MEP pDC centroid
Mono 30
CD8
% Variancee Explained
40 bulk sample
CD4 20
30 200 NK -0.1
10
PC2 18.96%
20 HSC
100 Bcell MPP 0
10
Dim. 2
LMPP 0 2 4 6 8 10 12 14 16 18
0 Component #
CMP
-10 0 pDC CLP GMP 50 PCA of projected samples
GMP Mono 40
-20 LMPP MEP
HSC 30
-30 -100 CMP Mega CLP
MEP 20
-40 Ery pDC 10
-50 -200 Unknown 0
-80 -60 -40 -20 0 20 40 60 80 -200 0 200 400 600 0 2 4 6 8 10 12 14 16 18
PC1 37.34% scATAC-seq cells scored by bulk PCs Component #
Dim. 1
I PC1 more stem J PC2 lym-ery K PC3 myel-lym L PC4 CLP-pDC
4 2 8 7
2 1
7 6
0 0
-1 6
PC2 (37.8%)
PC3 (10.4%)
PC4 (2.6%)
PC5 (2.1%)
-2 5
-2 5
-4
-3 4
-6 4
-4
-8 -5 3
3
-10 -6
2 2
-12 -7
-14 -8 1 1
10 15 20 25 30 35 40 -14 -12 -10 -8 -6 -4 -2 0 2 4 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 1 2 3 4 5 6 7 8
PC1 (45.5%) PC2 (37.8%) PC3 (10.4%) PC4 (2.6%)
M N Peak accessibility based correlation PCA-based correlation PCA-based correlation across bulk samples
across bulk samples across bulk samples Downsampled to 10,000 fragments
PC5 CLP 1
Pearson
2.5
2
0
PC6 noise
PC6 (0.9%)
1.5
1
0.5 HSC
MPP
0 LMPP
CMP
-0.5 GMP
-1 MEP
1 2 3 4 5 6 7 CLP
PC5 (2.1%) pDC
O Downsampled
D n m d to m
mean d
depth
t (10k)
th 0
P Downsampled
D
Dow
wn
nsampled
n m d to m
matched
h depths
d hs
de h
Q Synthetic mixture R Synthetic mixture
Downsampled
Do ssampled
p to mean
ea depth
de
e Downsampled
Do s pl d to to mean
e de depth
epth
e
Ratio CMP/GMP Ratio LMPP/CLP
-2 -2 -2 -2
-4 -4 -4 -4
15 15 15 15
20 20 20 20
-6 25 -6 25 -6 25 -6 25
30 30 30 30
35 35 35 35
-12 -10 -8 -6 -4 -2 0 2 -12 -10 -8 -6 -4 -2 0 2 -12 -10 -8 -6 -4 -2 -12 -10 -8 -6 -4 -2
0 2 0 2
S Fragment counts, log10
T Fraction of reads in peaks (FRiP) U Fresh HSC vs frozen profiles V DonorID
2 2 2 2
1 1 1 1
0 0 0 0
-1 -1 -1 -1
-2 -2 -2 -2
PC3
PC3
PC3
PC3
-3 -3 -3 -3 BM4983
-4 -4 -4 -4 BM0106
BM0828
-5 -5 -5 -5 PB1022
-6 -6 -6 -6 BM1077
log10 counts FRiP BM1137
-7 -7 -7 -7
2.7 4.8 .55 .99 Frozen Fresh BM1214
-8 -8 -8 -8
-14 -12 -10 -8 -6 -4 -2 0 2 4 -14 -12 -10 -8 -6 -4 -2 0 2 4 -14 -12 -10 -8 -6 -4 -2 0 2 4 -14 -12 -10 -8 -6 -4 -2 0 2 4
PC2 PC2 PC2 PC2
Figure S2. Analytical Frameworks for Clustering scATAC-Seq Profiles, Related to Figure 2
(A–D) t-SNE embedding of cells colored by the TF z-score activity of (A) GATA1 and (B) CEBPD motifs, or the quality metrics (C) log10 fragment counts and (D)
fraction of reads in peaks.
(legend continued on next page)
(E) t-SNE embedding of single-cell profiles using ENCODE ChIP-seq of K562s as a feature vector, instead of the TF motifs shown above, cells are colored by their
cell identity.
(F) PC1 and PC2 from PCA of fragments in peaks for bulk samples (gray) and their centroids (red).
(G) (left) Hierarchical clustering of the correlation of single-cell epigenomic profiles scored by bulk PCs, (right) profiles colored by their sorted immunophenotype
identity.
(H) Percent variance explained for each PC from (top) PCA derived from bulk data using fragments in peaks and (bottom) PCA of the PCA projected subspace (see
STAR Methods).
(I–M) PC projection of single-cell ATAC-seq data showing cells scored by PC components (I) PC1 and PC2, (J) PC2 and PC3, (K) PC3 and PC4, (L) PC4 and PC5,
and (M) PC5 and PC6.
(N) Bulk sample-sample correlation matrix determined by correlation of (left) fragments in peaks, (middle) PCA projection values and (right) PCA projection values
after down sampling data to 10,000 fragments per sample. Far left represents the sorted immunophenotype of each bulk sample.
(O and P) PCA projection of mean single-cell profiles of immunophenotypically defined cell types down-sampled to (O) 10,000 or (P) matched fragment counts to
the observed single-cell dataset.
(Q and R) Synthetic mixtures of two immunophenotypically defined single-cell profiles down sampled to 10,000 fragments with varying mixtures of (Q) CMP/GMP
and (R) LMPP/CLP cell types, unmixed cell types from (O) are shown for reference.
(S–V) PCA projection of single-cells colored by (S) log10 fragment counts, (T) fraction of reads in peaks, (U) fresh HSC versus frozen profiles, and (V) donor.
A MPP and GMP B CMP and LMPP C Sample Name Subset Name Count D Sample Name Subset Name Count
BM1077_PrimarySort CMP 11668 BM1077_PrimarySort GMP 16428
BM1077_PrimarySort CD34+CD38-CD10- 11885 BM1077_PrimarySort CD34+CD38-CD10- 11885
5 5
10 10
CD90
CD90
PC3 (10.42%)
PC3 (10.42%)
-2 -2 3 3
10 10
-4 -4
%)
%)
15 15 10
2
10
2
20 20
(45.5
(45.5
-6 25 -6 25 0
MPP 0 MPP
30 30 -10
2
-10
2
35
PC1
35
PC1
0 1 2 3 4 5
-12 -10 -8 -6 -4 -2 0 2 -12 -10 -8 -6 -4 -2 0 2 10 10 10 10 10 10 10
0
10
1
10
2
10
3
10
4
10
5
12
Cell-cell TF variability
10 5
4
CMP Clusters
8 3
3
6
4
4 2
2 2
1
0 1 4 D I1 A 3 2 4 Erythroid bias Myeloid bias
TAATF EBP SP L11 ATF ACH TCF
GA
SC
PP
PP
P
P
EP
LP
pD
C
M
LM
M
H
C
G
Sorted TF motifs
F Peak permutation by GC & peak intensity G 6 J Scale 10 kb hg19
chrX: 48,645,000 48,650,000 48,655,000 48,660,000
Fold variance over expected
5 GATA1 HDAC6
1_
4 5
0 CMP Clusters 0_
1_
3
PC3 (10.42%)
3
-2
0_
2 1_
-4 4
%)
15 1 0_
20
(45.5
1_
-6 25
30 0
2
35
PC1
-12 -10 0_
-8 -6 -4 -2 0 2
SC
PP
PP
P
EP
LP
o
on
M
pD
PC2 (37.8%)
C
M
LM
M
H
C
G
GATA2
PC3
LYL1 -2
TCF12 GATA1
MESP1 -4
PAX5
EBF1
MESP2
variability=1.68
- C
- P
- P
- P
- P
P
-L P
K1 -GMP
K1 -G P
K1 -Mo P
K1 -pD o
3 C
4- C
LP
2 n
-6
K3 CM
K4 CM
K5 CM
K6 CM
K7 ME
K8 -ME
K9 MP
1 M
K2 HS
K1 -pD
C
-12 -8 -4 0 4
PC2
P Q -3.5
Z-score
2.4 R -2.9
Z-score
3.2
TF motifs 2 STAT1 motif 2 CEBPE motif
REL
0 0
CEBPE
PC3
PC3
STAT1 -2 -2
ID3 -4 -4
TCF4
MESP1
variability=1.54
-6 variability=1.48 -6
direction p-value=10-11 Low High direction p-value=10-10 Low High
LMPP EIPP pure epigenomes TF Z-score -3 3 -12 -8 -4 0 4 -12 -8 -4 0 4
PC2 PC2
Num of cells
Num of cells
Num of cells
140
Num of cells
120 200 120
100 120
100
150 100
80 80
80
60 100 60 60
40 40 40
50
20 20 20
0 0 0 0
-2 0 2 4 6 8 10 12 14 0 5 10 15 20 25 -2 0 2 4 6 8 10 12 14 -2 0 2 4 6 8 10 12 14 16
Dev. progression Dev. progression Dev. progression Dev. progression
11 11
4 R=0.91 R=0.86
1813-SPI1-D-N5
-3 4 5 6 7 8 9 10 11 4 5 6 7 8 9 10 11
0 400 800 1200 1600 UNK, log2 fragments GMP-A, log2 fragments
Rank sorted TF motifs
H Scale
20 kb hg19
chr3:
128,190,000 128,200,000 128,210,000 128,220,000 128,230,000
500 _
K1-HSC
0_
500 _
K2-CMP
0_
500 _
K9-GMP
0_
500 _
K10-GMP
0_
500 _
K11-Mono
0_
500 _
Bulk HSC
0_
500 _
Bulk GMP-A
0_
500 _
Bulk GMP-B
0_
500 _
Bulk Mono
0_
RefSeq Genes
DNAJB8 GATA2
I J K
1200
GMP-A
GM
MP-A
M A (10.2%)
(10
( 2 2%) GMP
GMP-B
GMP
G M B (15.
(15.0%)
.0%)
..0 1000
Rank Sorted Cells
GMP-C
G - (8.9%)
(8 9%) UNK
800
0
GMP-A
600
PC3 (10.42%)
-2
0.8
GMP-B 400
-4
%)
Freq.
15 200
20
(45.5
-6 25 GMP-C
30
0
0
PC1
35
-12 -10 -8 -6 -4 -2 0 2 K2 K8 K9 K10 K11 K12 -2 0 2 4 6 8 10 12 14
PC2 (37.8%) CMP LMPP GMP GMP Mono pDC
Myeloid developmental progression
Clusters
L 848-BCL11A-D
M 558-CEBPD-D-N2
N 2081-GATA1-D-N7
12 12
5
8 8
0
TF z-score
TF z-score
TF z-score
4 4
-5
0 0
-10
-4 -4
-2 0 2 4 6 8 10 12 14 -2 0 2 4 6 8 10 12 14 -2 0 2 4 6 8 10 12 14
Dev. trajectory Dev. trajectory Dev. trajectory
0.8
Cluster Centroids
Gap Values
0.6 1
0.6
cluster cluster
0.4 1 3
0.5 2 4
0.4
5
0.2 6
0 0.2
0 0
0 0.5 1 1.5 2 2.5 3 3.5 4 0 2 4 6 8 10 12 14 16 18 20 -2 0 2 4 6 8 10 12 14 -2 0 2 4 6 8 10 12 14
Standard Dev. of TF scores Number of Clusters Developmental traj. (euc. distance) Developmental traj. (euc. distance)
D E F 11 Mono
Infer RNA expression Assign scRNA to scATAC cell 1
PC
8 6 pDC
Score 9 CMP
5
10
scATAC cells 11 4 MEP
CLP
Calculate Take max value to 12
13
3
scATAC x scRNA assign scRNA cell 14 MPP
2
Inferred expression 15
similarity (Pearson) to scATAC-seq profile 1 HSC
profiles -10 -5 0 5 10 15 1 2 3 4 5 6 7 8 9 10 11
CEBPD fit coefficients Fit CEBPD expression (log2)
G H I HSC
MPP
LMPP
CMP
GMP
MEP
CLP
pDC
Unknown J FACs isolated CMPs
Pearson Mono K2 K3 K4 K5
Inferred CEBPD expression 1 1.0 1.0
2
-1
0.6 0.6
-2
PC3
-3
0.4 0.4
-4
-5
0.2 0.2
-6
log2 expression
-7
-2 12
0.0 0.0
-8 P P o 4 4 8 7 7 q
C
-14 -12 -10 -8 -6 -4 -2 0 2 4 scATAC-seq HS CM GM Mon CD3 121 M082 M107 M113 NA-se
BM B B B cR
PC2 10X scRNA-seq sorted identity s
scATAC-seq
Norm. -1 1
K L expression M
0.6 HSC CMP GMP
K5
K3 R = 0.86
0.5
K4
K2 0.4
DPT order
K5 0.2
K3
0.1
K4
K2 0
I1 X1 F4 1 2B F1 A1 E 2 C S1 O 4 A5 FF ID1 ID2 T3 74 T2 PB -2 0 2 4 6 8 10
FL PB P FPM GA KL AT PO PRG CL AL MP XCR OX MA FL CD BS EB
A ATAC:RNA pairing order
Z IT G LG C H C
ATAC:RNA pairing order
Smoothed
N O P HSC CMP GMP
scRNA-seq of HSCs scRNA-seq of HSCs HOXA9 CEBPD
Rel. expression
1
log2 expression
2000 2000 0 0 0 0
0 5 10 0 5 10
ATAC:RNA pairing order ATAC:RNA pairing order
1500 1500
HOXA9 CEBPD
log2 expression
Rel. expression
4 1 1
1000 1000
DPT order
Counts Counts 4
2500 2500
500 500 2 2
500 500 0 0 0 0
0 0
-2 0 2 4 6 8 10 0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.1 0.2 0.3 0.4 0.5 0.6
DPT order DPT order
ATAC:RNA pairing order DPT order
Figure S5. Chromatin Accessibility and Expression Dynamics across Myeloid Cell Differentiation, Related to Figure 5
(A) The standard deviation of smoothed TF accessibility scores, as shown above, for observed scores (blue) and cells randomly permuted (red), dotted line
represents an FDR less than 1%.
(B) Gap values for 1 to 20 k-medoids clusters, with optimal cluster number 6 highlighted. Error bars represent 1 SD on the within cluster dispersion.(C) Centroids
for each k-medoid TF cluster for (left) early and (right) late acting TFs, cells ordered by myeloid development (top) are shown for reference.
329 Reactivation
HSC specific
Dim. 2
0.4 404 0
298
338
0.2 -40
203
132
0 -80
66
-2 0 2 4 6 8 10 12 -100 -50 0 50 100
Developmental traj. (euc. distance) 55 Dim. 1
B E
Transition
0.3 78
IL7;-364457
62 80 Mono. activity
65 0 1
Fragments per cell
0.2 58 40
78
Dim. 2
86
0
Monocyte specific
0.1 156
248
-40
285
0 67
-2 0 2 4 6 8 10 12 14
Developmental traj. (euc. distance) -2 0 2 4 6 8 10 12 14 -80
Developmental traj. (euc. distance) -100 -50 0 50 100
Dim. 1
Collect raw
PWM scores
per peak
Compute PWM
peak-peak log-score 3 8
distance
2081-GATA1-D-N7 848-BCL11A-D 1880-SPIB-D-N2 216-TCF12-D-N3 3073-ESRRA-D-N4
For each peak
average over
K-nearest (300)
Local
PWM score min max
G I rs1978487: C->T
0.5 100 kb hg19
31,150,000 31,200,000 31,250,000 31,300,000 31,350,000 31,400,000 31,450,000 31,500,000
Fraction of correlated peaks
K1-HSC
0.45
K2-CMP
0.4
K9-GMP
0.35
K10-GMP
0.3
K11-Mono
0.25
CD34+_HSPCs
0.2 s s
pairs ce loops ce loop ce loop BCKDK PRSS36 FUS PYDC1 ITGAM ITGAX ZNF843 C16orf58
ene n en en
eak-g w evide ed. evid igh evid 1
PRSS8 PYCARD ITGAD TGFB1I1
Pearson
0.5
H Expected peak-gene pairs
<0.5
7
Background peaks-gene pairs
6
5
Rel. density
0
-1.5Mb 0 1.5Mb
Distance of cis-eQTL peak to predicted target gene
Figure S6. Association of TFs and Genetic Variants to Regulatory Element Dynamics across Myeloid Differentiation, Related to Figure 6
(A and B) Example cis-regulatory dynamics across myeloid development for (A) HSC active peaks and (B) a reactivated peak near IL7.
(C) Hierarchical clustering of k-medoids centroids across all peaks, values (left) denote number of peaks per cluster and labels (right) denote categorization into
different global patterns.