Evaluating Vitamin D Receptor Binding Site ChIP-Seq Data for Novel Motifs, Histone
Modifications, and Disease-Associated SNPs
Jane Shmushkis GCB 535 Spring 2017

Background & Significance


Vitamin D is essential to maintaining overall good health. It is produced by the skin when
exposed to the sun, but it can also be acquired through diet. Vitamin D is converted to its active
form, calcitriol, by the liver and kidneys.¹ Calcitriol then binds to the vitamin D receptor (VDR),
and the VDR protein dimerizes with retinoid X receptors (RXR). This heterodimer binds to
regions in the genome and is thus able to alter gene transcription.² Beyond that, it is poorly
understood how exactly vitamin D alters expression. Low levels of vitamin D have been linked to
multiple sclerosis, rheumatoid arthritis, type 1 diabetes, and other autoimmune disorders.
Deficiency is also widespread: around one billion people worldwide have vitamin D deficiency
due to low sun exposure or inadequate nutrition.¹,² It is important to understand how vitamin D
interacts with genomic regions and other proteins to affect gene expression and phenotype.
Investigating VDR’s interactions with the genome is the first stage in understanding this process
and how it is altered in disease phenotypes.

Question & Specific Aims


The question being asked is: with which DNA sequences/regions and proteins does VDR
interact in human cells? This question will be addressed by investigating two ChIP-seq datasets
that use a VDR antibody on human cell types treated with vitamin D or its active form. The first
dataset is from Heikkinen et al., published in Nucleic Acids Research in 2011. The ChIP-seq
was performed on THP-1 human monocytic leukemia cells treated with 1α,25-dihydroxyvitamin
D3 and on an untreated control, with the reads mapped to the hg19 human genome assembly.³ The
second dataset is from Ramagopalan et al., published in Genome Research in 2010. The ChIP-seq
was performed on lymphoblastoid cell lines (GM10855) treated with calcitriol (as well as a
control treatment) and was mapped to the hg18 human genome assembly.² This work has four
aims for the analysis of these two datasets:
(1) Identify ChIP-seq peaks (VDR binding sites) common to both treated datasets and,
separately, peaks common to both untreated datasets.
(2) Identify de novo motifs in the chromosomal locations of the common peaks from aim 1.
(3) Identify whether the common peaks overlap with particular histone marks.
(4) Determine whether the common peaks overlap with SNPs for diseases associated with
vitamin D deficiency.
Hypothesis. About 25% of the total peaks evaluated will be common to the ChIP-seq peaks
of both datasets. The de novo motifs identified for the common treated and common untreated
peaks will not be the same, and the common peaks will overlap with histone marks from a
sigmoid colon histone mark ChIP-seq dataset. Lastly, the common peaks, especially in the
untreated control groups, will overlap with SNPs for diseases associated with vitamin D
deficiency, including Crohn’s disease, celiac disease, type 1 diabetes, and multiple sclerosis.

Methods
Finding common peaks between VDR antibody ChIP-seq datasets. First, I obtained the ChIP-seq
results for both datasets from the Gene Expression Omnibus. For the THP-1 human monocytic
leukemia cells, I used GSM678268 for the unstimulated dataset and GSM678269 for the
treatment dataset. For the lymphoblastoid cell lines, I used GSM558631 for the unstimulated
dataset and GSM558633 for the calcitriol-stimulated dataset. I used liftOver to convert the
lymphoblastoid datasets from the hg18 human genome assembly to the hg19 assembly to
enable dataset comparison. I sorted the peaks and used $ comm in Unix to remove any peaks
that were shared between the control and treatment datasets within a cell type. Next, I
performed a Fisher’s exact test using $ bedtools intersect to check for overlapping peaks
between the two control datasets and between the two treatment datasets.
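
The filtering and overlap steps described above can be illustrated with a minimal Python sketch. It is not the comm or bedtools commands used in this work: the function names and toy coordinates are hypothetical, and peaks are assumed to be (chromosome, start, end) tuples treated as half-open intervals.

```python
from typing import List, Set, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end); assumed half-open


def remove_shared(peaks: List[Peak], other: List[Peak]) -> List[Peak]:
    """Drop peaks that appear identically in `other` (exact matching, like comparing sorted lines)."""
    shared: Set[Peak] = set(other)
    return sorted(p for p in peaks if p not in shared)


def overlapping(a: List[Peak], b: List[Peak]) -> List[Peak]:
    """Return peaks in `a` that overlap at least one peak in `b` by one base or more."""
    hits = []
    for chrom, start, end in a:
        if any(c == chrom and s < end and start < e for c, s, e in b):
            hits.append((chrom, start, end))
    return hits


# Toy data (hypothetical coordinates, not taken from the GEO datasets)
thp1_treated = [("chr1", 100, 300), ("chr2", 500, 700)]
thp1_control = [("chr2", 500, 700)]
lcl_treated = [("chr1", 250, 400), ("chr3", 10, 50)]

thp1_specific = remove_shared(thp1_treated, thp1_control)
common_treated = overlapping(thp1_specific, lcl_treated)
print(common_treated)  # [('chr1', 100, 300)]
```

The quadratic scan is acceptable for a few thousand peaks; a sorted sweep or an interval tree would be the usual choice for larger inputs.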

Identifying known and de novo motifs among the common VDR-binding peaks. Using the
common peaks, I completed HOMER motif analysis to identify known and de novo motifs within
the chromosomal regions. I used liftOver again, this time to convert the chromosomal regions in
the common peaks files to the hg18 assembly. I used $ findMotifsGenome.pl
[common_peaks.bed] hg18 MotifOutput/ -size 200 -mask -preparsedDir
parsed_genome -len 8 to initiate the motif identification. The top five known motifs and the top
de novo motifs not flagged as possible false positives were recorded in Figure 1 and Figure 2,
respectively.

Finding overlaps between histone marks in sigmoid colon tissue and common chromosomal
peaks. To investigate whether the common peaks overlapped with histone marks, I selected a
ChIP-seq dataset of Homo sapiens sigmoid colon tissue from an adult female (53 years) on
ENCODE (ENCBS555KUV). The sigmoid colon was selected because conditions linked to
vitamin D deficiencies, such as Crohn’s disease and celiac disease, often affect this tissue. I
downloaded the stable bed narrowPeak files for ChIP-seq performed with H3K4me1, H3K4me3,
and H3K9me3 as the target histone marks. All peaks were mapped to the hg19 assembly. I then
used $ bedtools fisher to check for significant overlaps between the common peaks (treated
and untreated) and the three histone marks. The right-tailed p-value (as opposed to the
left-tailed value) reported by Fisher’s exact test was taken as the overall p-value.
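
As a rough illustration of what a right-tailed Fisher's exact p-value measures, the sketch below computes it for a generic 2x2 contingency table using only the Python standard library. The function name and the toy counts are hypothetical, and the way bedtools fisher derives the four counts from two interval files and the genome size is not reproduced here.

```python
from math import comb


def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """Right-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    This is the probability of seeing `a` or more in the top-left cell when
    all row and column totals are held fixed (hypergeometric distribution).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total


# Toy table (hypothetical counts, not the values from this analysis)
p = fisher_right_tail(3, 1, 1, 3)
print(round(p, 4))  # 0.2429
```

A small right-tailed value indicates more overlap than expected by chance, which is why the right tail rather than the left was the relevant one for the histone mark comparison.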

Identifying overlaps with disease-associated SNPs and common VDR-binding peaks. To
determine whether the common peaks overlapped with any disease-associated SNPs for
conditions linked to vitamin D deficiencies, I collected lists of SNPs linked to Crohn’s disease,
celiac disease, type 1 diabetes, and multiple sclerosis from the GWAS Catalog. In Microsoft
Excel, I reduced the lists to the chromosome (format chr#), start base, and end base (one base
after the start base), and I re-uploaded the SNP lists as bed files. I used the
hg19-mapped common peaks to perform a Fisher’s exact test via $ bedtools fisher to
determine the number of overlapping intervals between the SNPs for each disease and the
common peaks (treated and untreated). The overlaps between the SNPs and each of the
treatment datasets (calcitriol-treated lymphoblastoid cells and 1α,25-dihydroxyvitamin
D3-treated THP-1 leukemia cells) were also determined with Fisher’s exact test.
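
The SNP list reformatting described above was done in Excel; a scripted equivalent might look like the following Python sketch. The function name and the example positions are hypothetical, and the end coordinate is simply start + 1, following the convention stated in the text.

```python
from typing import Iterable, List, Tuple


def snps_to_bed(snps: Iterable[Tuple[str, int]]) -> List[str]:
    """Format (chromosome, position) pairs as tab-separated BED lines.

    The chromosome is written as chr#, and the end base is one after the start base.
    """
    lines = []
    for chrom, pos in snps:
        name = chrom if chrom.startswith("chr") else f"chr{chrom}"
        lines.append(f"{name}\t{pos}\t{pos + 1}")
    return lines


# Hypothetical positions for illustration only (not GWAS Catalog entries)
example = snps_to_bed([("1", 1000), ("chrX", 2500)])
print("\n".join(example))
```

Scripting this step avoids spreadsheet side effects such as automatic reformatting of values, although whether that affected the lists used here is not known.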

Results
Identified chromosomal region peaks common to both cell lines for further analysis. I found that
258 peaks overlapped between the ChIP-seq peaks of the treated THP-1 leukemia cells and the
treated lymphoblastoid cells. There were 4684 peaks in total across the two treatment datasets.
Additionally, 100 of the peaks in the untreated control groups were common between the cell
types, out of a total of 3121 peaks in the two control datasets. The 258 common treatment peaks
and 100 common control peaks were used for further analyses.
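
For comparison with the hypothesized 25%, the share of common peaks can be computed directly from the counts reported above. This is a small arithmetic sketch; it assumes the combined peak totals are the appropriate denominators, which the text does not state explicitly, and the function name is hypothetical.

```python
def percent_common(common: int, total: int) -> float:
    """Percentage of peaks that are common, given the total peak count."""
    return 100.0 * common / total


treated = percent_common(258, 4684)  # common treated peaks / total treated peaks
control = percent_common(100, 3121)  # common control peaks / total control peaks
print(f"treated: {treated:.1f}%  control: {control:.1f}%")
# treated: 5.5%  control: 3.2%
```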

Known and novel motifs within the common VDR-associated peaks. Tables of HOMER results
are available in Figure 1 and Figure 2 on pages 4 and 5, respectively. Note that the top known
motif for the treatment groups was the VDR binding site motif. This serves as a positive control
for the HOMER analysis. One of the top known motifs for the treatment group is ETS1, a
transcription factor and proto-oncogene. The ETS family of transcription factors is associated
with Jacobsen syndrome and estrogen-receptor-negative breast cancer. ETS1
controls the expression of cytokine and chemokine genes.⁴ The fourth-highest-ranking result
is a motif for Fli1, another transcription factor in the ETS family. This gene is associated
with Ewing sarcoma and isolated delta-storage pool disease.⁵ The top de novo motif for the
treatment peaks appears to be linked to the retinoid X receptor. The second de novo motif also
appears to be associated with the ETS transcription factor family. The top known motif for the
untreated group is the motif for NFY, whose protein is a component of a heterotrimeric
transcription factor that recognizes the 5'-CCAAT-3' box motif in the promoters of its target
genes.⁶

H3K4me3 activating histone mark associated with the common peaks. Of the 258 common
treatment peaks, 87 overlapped with the H3K4me3 mark in the sigmoid colon ChIP-seq dataset,
yielding a p-value of 1.826e-93. Of the 100 common untreated peaks, 96 overlapped with the
H3K4me3 mark, yielding a p-value of 7.4975e-173. The H3K9me3 and H3K4me1 histone marks
yielded p-values above 0.05 when checked for overlaps with the common peaks. The
only overlap other than those for H3K4me3 was a single one of the 258 common treatment
peaks, which overlapped with an H3K9me3 mark.

Disease-associated SNPs mapped to ChIP-seq peaks for each treated cell line individually.
None of the common peaks (treated or untreated) mapped to any of the 1054 total SNPs tested.
As a next step, I checked whether each of the treated datasets individually had overlaps with
any of the SNPs. I found that the calcitriol-treated lymphoblastoid cells had four peaks that
overlapped with SNPs for Crohn’s disease, yielding a p-value of 0.00018326. These cells also
had two peaks that overlapped with type 1 diabetes SNPs, yielding a p-value of 0.007524. The
1α,25-dihydroxyvitamin D3-treated THP-1 leukemia cells had 2 of their 1818 peaks overlap with
SNPs linked to multiple sclerosis, yielding a p-value of 0.020834. All remaining comparisons
yielded p-values above 0.05.

Discussion & Conclusion


The chromosomal regions that were common peaks between treatment datasets for both cell
types could play an important role in the regulation of gene expression by vitamin D. These
regions could be turned on in the presence of vitamin D and, thus, play an important role in
preventing disease phenotypes or other adverse effects caused by deficiencies. This is further
corroborated by the high number of common VDR-binding chromosomal region peaks
overlapping with H3K4me3, which is an activating histone modification. The peaks common
between the untreated groups could be involved in inducing adverse phenotypes and could be
targets for repression in the presence of vitamin D. The de novo motifs and the known motifs
identified by HOMER analysis suggest that the ETS transcription factor family plays a critical role
in how vitamin D alters gene expression. Therefore, future investigations should look into how
these transcription factors interact with chromosomal regions to better understand the
mechanisms of vitamin D. Since VDR binding sites seem to be strongly associated with the
H3K4me3 histone mark, future investigations should explore whether the removal of these
marks mitigates or induces the adverse effects of vitamin D deficiency in animal models. This
mark could also be synthetically added to particular chromosomal regions (for instance, VDR
binding sites that did not already have the mark in the common treated peaks), and the
phenotypic effects could be evaluated. Lastly, this work suggests VDR may play a role in some
forms of Crohn’s disease, type 1 diabetes, and multiple sclerosis. However, these associations
were weak within individual cell types and were not corroborated across cell types.

Limitations and future directions. Limitations of this work include a lack of investigation into the
particular chromosomal region peaks that overlapped with histone marks or SNPs. No gene
ontology analysis was performed to characterize the roles of the peaks highlighted by
chromatin marks or disease-associated SNPs. Future work could look into SNPs for other
diseases linked to vitamin D deficiencies, such as rheumatoid arthritis. Another limitation was
that only two human cell types were investigated. The case for a particular chromosomal
region’s involvement in vitamin D-driven gene expression alteration could be strengthened if a
peak were common across three or four different datasets in which unique cell types were
treated with vitamin D. Additionally, VDR antibody ChIP-seq could be performed on other
human cell lines after vitamin D treatment, particularly cells involved in insulin maintenance or
cells in the lower gut.

Figure 1. HOMER known motif results for common treatment (a) and control peaks (b)

Figure 2. HOMER de novo motif results for common treatment (a) and control peaks (b)

References

1. Handel, Adam E., et al. "Vitamin D receptor ChIP-seq in primary CD4+ cells: relationship
to serum 25-hydroxyvitamin D levels and autoimmune disease." BMC Medicine 11.1
(2013): 163.

2. Ramagopalan, Sreeram V., et al. "A ChIP-seq defined genome-wide map of vitamin D
receptor binding: associations with disease and evolution." Genome Research 20.10
(2010): 1352-1360.

3. Heikkinen, Sami, et al. "Nuclear hormone 1α,25-dihydroxyvitamin D3 elicits a
genome-wide shift in the locations of VDR chromatin occupancy." Nucleic Acids Research
39.21 (2011): 9181-9193.

4. European Bioinformatics Institute, Protein Information Resource, SIB Swiss Institute of
Bioinformatics. (2017, March 15). UniProtKB - P14921 (ETS1_HUMAN): Protein C-ets-1.
Retrieved April 26, 2017, from http://www.uniprot.org/uniprot/P14921#function

5. European Bioinformatics Institute, Protein Information Resource, SIB Swiss Institute of
Bioinformatics. (2017, March 15). UniProtKB - Q01543 (FLI1_HUMAN): Friend leukemia
integration 1 transcription factor. Retrieved April 26, 2017, from
http://www.uniprot.org/uniprot/Q01543#function

6. European Bioinformatics Institute, Protein Information Resource, SIB Swiss Institute of
Bioinformatics. (2017, March 15). UniProtKB - P23511 (NFYA_HUMAN): Nuclear
transcription factor Y subunit alpha. Retrieved April 26, 2017, from
http://www.uniprot.org/uniprot/P23511#function
