
Decoding Gene Regulation of Immune Cells with Deep Learning

Nuria Alina Chandra, Alexander Sasse, Sara Mostafavi, & the Immunological Genome Project
Paul G. Allen Center for Computer Science & Engineering

Problem
All somatic cells, from heart cells to immune cells, have the same genetic code but express different genes. Decoding the regulatory processes that allow the same DNA sequence to produce vastly different gene expression patterns is needed to understand the genetic basis of disease. To study gene regulation of the immune system, we model differences in chromatin accessibility across 90 immune cell types to determine areas of DNA that are differentially bound by transcriptional machinery, which is a reliable proxy for changes in downstream gene expression [1].

The Model
Inspired by the BPNet architecture [2].

[Figure: model architecture. Labels, in extraction order:
  Input: one-hot encoded ~1000 bp sequence
  Layer sizes: 25x4x64; 3x64; 3x64
  9 dilated convolutions with residual skip connections
  Profile prediction: 25 x number of celltypes; add Tn5 cut bias (predicted from separate model); softmax; sequence length x number of celltypes; profile loss
  Total count prediction: average pool; total count loss
  λ = weight on profile loss]
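The figure names two training terms, a total count loss and a profile loss weighted by λ, but does not give their exact form. The sketch below shows one way they could be combined. It assumes BPNet-style choices that the poster does not confirm: mean squared error on log(1 + total counts) for the count head, a multinomial negative log-likelihood (up to a model-independent constant) for the profile head, and a plain sum with λ multiplying only the profile term. All function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax along the sequence axis (last axis)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def profile_loss(profile_logits, observed_counts):
    """Assumed form: multinomial NLL (dropping the constant term).

    Both arrays have shape (n_celltypes, seq_len); observed_counts holds
    the per-base Tn5 cut counts for one region.
    """
    log_p = log_softmax(profile_logits)
    return float(-(observed_counts * log_p).sum(axis=-1).mean())

def total_count_loss(pred_log_total, observed_counts):
    """Assumed form: MSE between predicted and observed log(1 + total counts)."""
    obs_log_total = np.log1p(observed_counts.sum(axis=-1))
    return float(np.mean((pred_log_total - obs_log_total) ** 2))

def combined_loss(profile_logits, pred_log_total, observed_counts, lam):
    """Total count loss plus lam times profile loss (lam plays the role of λ)."""
    return (total_count_loss(pred_log_total, observed_counts)
            + lam * profile_loss(profile_logits, observed_counts))
```

Under these assumptions, setting `lam` to 0 trains the count head alone, and larger values push the shared convolutional layers toward features that explain base-pair resolution cut positions.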
ATAC-seq Data
Assay for transposase-accessible chromatin with sequencing (ATAC-seq) measures chromatin accessibility by counting the number of cuts made by the Tn5 enzyme in regions of DNA. The enzyme cuts where DNA is accessible to the transcriptional machinery. More counts are a proxy for stronger transcriptional activity through binding of transcription factors (TFs). Base-pair resolution ATAC-seq profiles contain positional information about TF binding sites. We use ATAC-seq data from 90 different mouse immune cell types collected by the Immunological Genome Project [1].

[Figure: ATAC-seq footprints change with TF interactions]

Hypothesis: a model learning base-pair resolution ATAC-seq accessibility will have improved total chromatin accessibility prediction accuracy.

Results
The model successfully learns to predict both the total number of ATAC-seq reads and the base-pair resolution ATAC-seq profile of open chromatin regions.

[Figure: Total Count Prediction Performance. x-axis: λ, weight on profile loss; y-axis: Pearson correlation of total counts]
[Figure: Profile Prediction Performance. x-axis: λ, weight on profile loss; y-axis: base-pair Pearson correlation of profile]
[Figure: Profile prediction for λ = 1e-1. Celltype: B.FrE.BM, peak 102436]

Model performance on the profile prediction task improves as the profile loss is weighted more heavily.

A More Complex Model
Modifications: the number of convolutional filters throughout the model was increased to 300, and 3x maxpooling and convolutions were added to the scalar output head.

[Figure: Modified scalar output head. Labels: maxpools followed by convolutional layer, 3x; Maxpool …]
[Figure: Total Count Prediction Performance. x-axis: λ, weight on profile loss; y-axis: Pearson correlation of total counts. Max correlation = 0.49 when λ = 0.7]

Conclusion: In the more complex model, adding base-pair resolution information improved ATAC-seq total count prediction accuracy.
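Both metrics reported in the results are Pearson correlations: one over total counts and one over the base-pair profile. The sketch below shows how such scores could be computed. It assumes the profile correlation is taken along the sequence for a single region and cell type, and the total count correlation across a vector of paired predicted and observed totals. The poster does not say which axis (regions or cell types) the total count correlation runs over, and the function names are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length 1-D arrays (nan if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0.0:
        return float("nan")
    return float((xc * yc).sum() / denom)

def profile_correlation(pred_profile, obs_profile):
    """Base-pair correlation along the sequence for one region and cell type."""
    return pearson(pred_profile, obs_profile)

def total_count_correlation(pred_totals, obs_totals):
    """Correlation between predicted and observed total counts (axis is an assumption)."""
    return pearson(pred_totals, obs_totals)
```

A region with no cuts has a constant observed profile, so its profile correlation is undefined. The sketch returns `nan` in that case so such regions can be excluded before averaging.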


Objective
[Figure: objective overview. Labels: DNA Sequence; Model; Chromatin Accessibility For 90 Celltypes; Learned regulatory motifs; Biological insights]

Acknowledgments: Thank you to the Mary Gates Endowment for Students and the Washington Research Foundation for supporting this project. Thank you to the members of the Mostafavi Lab for providing critiques, support, and insights.

References
1. Yoshida, H., Lareau, C. A., Ramirez, R. N., Rose, S. A., et al. The cis-Regulatory Atlas of the Mouse Immune System. Cell 176(4), 897–912 (2019).
2. Avsec, Ž., Weilert, M., Shrikumar, A., et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 53, 354–366 (2021).
