Evaluating Vitamin D Receptor Binding Site ChIP-Seq Data for Novel Motifs, Histone Modifications, and Disease-Associated SNPs
Jane Shmushkis GCB 535 Spring 2017
Methods
Finding common peaks between VDR antibody ChIP-seq datasets. First, I obtained the ChIP-seq results for both datasets from the Gene Expression Omnibus (GEO). For the THP-1 human monocytic leukemia cells, I used GSM678268 for the unstimulated dataset and GSM678269 for the treatment dataset. For the lymphoblastoid cell lines, I used GSM558631 for the unstimulated dataset and GSM558633 for the calcitriol-stimulated dataset. I used liftOver to convert the lymphoblastoid datasets from the hg18 human genome assembly to the hg19 assembly so that the datasets could be compared. I sorted the peaks and used $ comm in Unix to remove any peaks shared between the control and treatment datasets within a cell type. Next, I used $ bedtools intersect to find overlapping peaks between the two control datasets and between the two treatment datasets, and performed a Fisher's exact test to check whether these overlaps were significant.
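The two Unix steps above can be sketched in Python to make their semantics explicit. This is a minimal illustration, not the pipeline I actually ran: it assumes peaks are held in memory as hypothetical `(chrom, start, end)` tuples in BED's 0-based, half-open coordinates, and the function names are made up for this example.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chrom, start, end), 0-based, half-open.
Peak = Tuple[str, int, int]


def remove_shared(treated: List[Peak], control: List[Peak]) -> List[Peak]:
    """Drop treated peaks that appear verbatim in the control set.

    Mirrors `comm -23` on two sorted BED files: only identical lines are
    removed, so a peak shifted by even one base is kept.
    """
    control_set = set(control)
    return sorted(p for p in treated if p not in control_set)


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def common_peaks(a_peaks: List[Peak], b_peaks: List[Peak]) -> List[Peak]:
    """Peaks in a_peaks that overlap at least one peak in b_peaks.

    Analogous in spirit to reporting each A interval once when it overlaps B.
    Quadratic scan, for illustration only.
    """
    return [a for a in a_peaks if any(overlaps(a, b) for b in b_peaks)]
```

The quadratic scan in `common_peaks` is adequate for a few thousand peaks; genome-scale inputs call for a sorted sweep or an interval index instead.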
Identifying known and de novo motifs among the common VDR-binding peaks. Using the common peaks, I ran a HOMER motif analysis to identify known and de novo motifs within these chromosomal regions. Because the analysis was run against the hg18 genome, I first used liftOver again, this time to convert the chromosomal regions in the common peaks files to the hg18 assembly. I then ran $ findMotifsGenome.pl [common_peaks.bed] hg18 MotifOutput/ -size 200 -mask -preparsedDir parsed_genome -len 8 to identify the motifs. The top five known motifs and the top de novo motifs not flagged by HOMER as possible false positives are shown in Figure 1 and Figure 2, respectively.
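The `-size 200` option asks HOMER to search a fixed-width region around each peak rather than the full peak. A minimal sketch of that idea, assuming the window is centered on the peak midpoint; `center_window` is a hypothetical helper, and HOMER's own rounding at odd widths may differ.

```python
from typing import Tuple


def center_window(start: int, end: int, size: int = 200) -> Tuple[int, int]:
    """Return a `size`-bp half-open window centered on the midpoint of [start, end).

    The window start is clamped at 0, so windows near the beginning of a
    chromosome can come out shorter than `size`.
    """
    mid = (start + end) // 2
    half = size // 2
    return max(0, mid - half), mid + (size - half)
```

For example, `center_window(1000, 1400)` gives `(1100, 1300)`: a 200-bp window around the midpoint 1200, however wide the original peak was. Using one window size for every peak keeps motif statistics comparable between broad and narrow peaks.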
Finding overlaps between histone marks in sigmoid colon tissue and common chromosomal peaks. To investigate whether the common peaks overlapped with histone marks, I selected a ChIP-seq dataset from Homo sapiens sigmoid colon tissue from a female adult (53 years) on ENCODE (ENCBS555KUV). The sigmoid colon was selected because conditions linked to vitamin D deficiency, such as Crohn’s disease and celiac disease, often affect this tissue. I downloaded the stable BED narrowPeak files for ChIP-seq performed with H3K4me1, H3K4me3, and H3K9me3 as the target histone marks. All peaks were mapped to the hg19 assembly. I then used $ bedtools fisher to test for significant overlap between each set of common peaks (treated and untreated) and each of the three histone marks. The right-tailed p-value from the Fisher's exact test, which tests for more overlap than expected by chance rather than less, was taken as the overall p-value.
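To make the one-sided choice concrete, here is a minimal right-tailed Fisher's exact test on a 2x2 contingency table using only the standard library. It assumes the four cell counts are already known; `bedtools fisher` builds its table from interval counts and the genome size, which this sketch does not reproduce, and `fisher_right_tail` is a hypothetical name.

```python
from math import comb


def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided (right-tail) Fisher's exact p-value for the table

        [[a, b],
         [c, d]]

    i.e. P(X >= a) for X hypergeometric with the table's margins fixed.
    A small value means more overlap than expected by chance (enrichment).
    """
    row1 = a + b          # e.g. peaks in set A
    col1 = a + c          # e.g. positions covered by set B
    n = a + b + c + d     # total
    denom = comb(n, row1)
    # Sum the probabilities of every table at least as extreme as observed;
    # comb() returns 0 when the second argument exceeds the first.
    total = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return total / denom
```

For the table `[[3, 0], [0, 3]]` the right-tail p-value is 1 / C(6, 3) = 1/20 = 0.05, while the left tail for the same table is 1.0, which is why the right tail is the relevant one for detecting enrichment.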
Results
Identified chromosomal region peaks common to both cell lines for further analysis. I found that 258 peaks overlapped between the treated THP-1 and treated lymphoblastoid ChIP-seq datasets, out of a total of 4684 peaks across the two treatment datasets. Additionally, 100 peaks were common to the untreated control datasets of the two cell types, out of a total of 3121 peaks across the two control datasets. These 258 common treatment peaks and 100 common control peaks were used in the analyses below.
Known and novel motifs within the common VDR-associated peaks. HOMER results are shown in Figure 1 (known motifs) and Figure 2 (de novo motifs). Notably, the top known motif for the treatment group was the VDR binding site motif, which serves as a positive control for the HOMER analysis. Another top known motif for the treatment group is ETS1, a transcription factor and proto-oncogene. The ETS family of transcription factors is associated with Jacobsen syndrome and estrogen-receptor-negative breast cancer, and the ETS1 gene controls the expression of cytokine and chemokine genes.4 The fourth-ranked result is a motif for Fli1, another transcription factor in the ETS family; this gene is associated with Ewing sarcoma and isolated delta-storage pool disease.5 The top de novo motif for the treatment peaks appears to be linked to the retinoid X receptor (RXR), which is consistent with VDR binding DNA as a heterodimer with RXR. The second de novo motif also appears to be associated with the ETS transcription factor family. The top known motif for the untreated group is the motif for NF-Y, a heterotrimeric transcription factor that recognizes the CCAAT box in the promoters of its target genes.6
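To illustrate what matching the CCAAT box means at the sequence level, the sketch below scans a DNA string for an exact motif on both strands. This is a simplification: HOMER scores candidate sites with position weight matrices and log-odds thresholds rather than exact string matches, and `find_motif` is a hypothetical helper.

```python
from typing import List, Tuple

# Complement table for uppercase DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (returned in uppercase)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = "CCAAT") -> List[Tuple[int, str]]:
    """0-based start positions of exact motif matches on either strand.

    A '-' hit means the reverse complement of the motif (ATTGG for CCAAT)
    occurs in the given sequence starting at that position.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits
```

For example, `find_motif("GGCCAATGGATTGGC")` returns `[(2, '+'), (9, '-')]`: one CCAAT box on the given strand and one on the opposite strand.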
The activating histone mark H3K4me3 is associated with the common peaks. Of the 258 common treatment peaks, 87 overlapped with the H3K4me3 mark in the sigmoid colon ChIP-seq dataset (p = 1.826e-93). Of the 100 common untreated peaks, 96 overlapped with the H3K4me3 mark (p = 7.4975e-173). Overlaps with the H3K9me3 and H3K4me1 marks were not significant (p > 0.05) for either set of common peaks. Apart from the H3K4me3 overlaps, the only overlap observed was a single common treatment peak (1 of 258) that overlapped an H3K9me3 mark.
Disease-associated SNPs mapped to ChIP-seq peaks for each treated cell line individually. None of the common peaks (treated or untreated) overlapped any of the 1054 SNPs tested. As a next step, I checked whether the peaks in each treated dataset individually overlapped any of the SNPs. The calcitriol-treated lymphoblastoid cells had four peaks that overlapped SNPs for Crohn’s disease (p = 0.00018326) and two peaks that overlapped type 1 diabetes SNPs (p = 0.007524). The 1α,25-dihydroxyvitamin D3-treated THP-1 leukemia cells had 2 of their 1818 peaks overlap SNPs linked to multiple sclerosis (p = 0.020834). All remaining comparisons yielded p-values above 0.05.
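The SNP comparison reduces to asking which peaks contain a given SNP position. A minimal sketch using binary search over sorted peak starts, assuming that peaks within a chromosome do not overlap (for example, after merging) and that SNP positions are 0-based; `snps_in_peaks` and its input layout are hypothetical, and the sketch only finds the overlaps, not their significance.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), 0-based, half-open
Snp = Tuple[str, str, int]   # (snp_id, chrom, 0-based position)


def snps_in_peaks(peaks: List[Peak], snps: List[Snp]) -> List[Tuple[str, Peak]]:
    """Return (snp_id, peak) for every SNP that falls inside a peak.

    Assumes peaks on the same chromosome do not overlap each other.
    """
    by_chrom: Dict[str, List[Peak]] = defaultdict(list)
    for peak in sorted(peaks):  # sorts by chrom, then start
        by_chrom[peak[0]].append(peak)
    starts = {chrom: [p[1] for p in ps] for chrom, ps in by_chrom.items()}

    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in by_chrom:  # `in` does not insert into a defaultdict
            continue
        # Index of the last peak that starts at or before pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < by_chrom[chrom][i][2]:
            hits.append((snp_id, by_chrom[chrom][i]))
    return hits
```

Counting the distinct peaks in the result gives a "peaks overlapping SNPs" count of the kind reported above; the p-values would come from a separate enrichment test.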
Limitations and future directions. Limitations of this work include the lack of investigation into the specific peaks that overlapped with histone marks or SNPs. No gene ontology analysis was performed to characterize the roles of the peaks highlighted by chromatin marks or disease-associated SNPs. Future work could examine SNPs for other diseases linked to vitamin D deficiency, such as rheumatoid arthritis. Another limitation is that only two human cell types were investigated. The case for a particular chromosomal region’s involvement in vitamin D-driven changes in gene expression would be stronger if a peak were common to three or four datasets from distinct cell types treated with vitamin D. Additionally, VDR antibody ChIP-seq could be performed on other human cell types after vitamin D treatment, particularly cells involved in insulin regulation or cells of the lower gut.
Figure 1. HOMER known motif results for common treatment (a) and control peaks (b)
Figure 2. HOMER de novo motif results for common treatment (a) and control peaks (b)
References
1. Handel, Adam E., et al. "Vitamin D receptor ChIP-seq in primary CD4+ cells: relationship to serum 25-hydroxyvitamin D levels and autoimmune disease." BMC Medicine 11.1 (2013): 163.
2. Ramagopalan, Sreeram V., et al. "A ChIP-seq defined genome-wide map of vitamin D receptor binding: associations with disease and evolution." Genome Research 20.10 (2010): 1352-1360.