FRONTIERS OF SCIENCE INSTITUTE AT UNIVERSITY OF NORTHERN COLORADO (2015)

Examination of Similarities
in Gene Sequence between
Drosophila melanogaster
and Drosophila elegans
June 14th, 2015-July 24th, 2015
Cian Colgan
7/24/2015

This study examines the genetic similarity between Drosophila elegans and Drosophila melanogaster. It
uses gene annotation to locate five genes on two sections of the Drosophila elegans genome. The
protein sequences of these genes are then used to evaluate conservation with Drosophila melanogaster.
Finally, this study uses RaptorX protein structure prediction to evaluate the potential functions of
genes whose function is unknown. This study concludes that some genes are conserved between the two
species and others are not. This has further implications for understanding the divergent evolution
of the two species.

Acknowledgements:
This study was mentored by Judith Leatherman, PhD, in the department of biological sciences at
the University of Northern Colorado in Greeley, CO. Graduate students Jordan McCarthy and
James Major III assisted in finding uncertain or ambiguous exons. Editing of this paper and the
accompanying poster and presentation was provided by Nicole Wood. Funding for this project
was provided by the Zimmerle Family and the Union Pacific Foundation. This study was
conducted in the department of biological sciences at the University of Northern Colorado in
Greeley, CO.

Introduction:
Mankind's understanding of genetics has shifted rapidly since the creation of the field.
Starting with the discovery of DNA, scientists have developed ways to read the genetic codes of
many organisms. The process by which scientists read this code, or genome, is called
genome sequencing. This process involves cutting longer pieces of DNA into small, manageable
segments, matching them with complementary base pairing, and reading the result via computer
(Pierce, 2006). This process allows scientists to build a complete genetic picture of an
organism.
Genomic sequencing has rapidly improved in efficiency and cost since its invention.
The Human Genome Project took 13 years, from 1990 to 2003, and cost 2.7 billion dollars overall
("Human Genome Project Completion: Frequently Asked Questions," 2010). Since then, the
price of sequencing an individual genome has dropped significantly, to approximately $4,211 in
April 2015 (Wetterstrand, 2015). This has caused a rapid increase in
the number of organisms that can have their genomes sequenced for further study; the unfortunate
side effect is that organisms are being sequenced faster than their genomes can be
analyzed. This has created a need for an alternative solution.
The Genomics Education Partnership, or GEP, is an interuniversity collaboration formed to fill the
need for gene analysis. Based at Washington University in Saint Louis, the GEP and its 66
partner universities take sequence data from organisms important to genetic modeling and divide
the annotation of smaller sections of the genome, called contigs, among the member universities.
Through this collaborative effort, the genomes of important organisms have been collected into a
database for scientists to use in their research.
The current project of the GEP is to analyze gene silencing in the Drosophila
genus. Epigenetic gene silencing is the temporary inactivation of a gene without a change to
the gene itself. It can occur in two ways: DNA methylation or histone modification. DNA
methylation is the addition of a methyl group to cytosine bases in the DNA. This process
alters gene expression but does not change the genetic code itself, and it can
create silent genes or heterochromatin.
The other form of regulation is histone modification. Histones are the proteins that
DNA is coiled around in the formation of chromosomes (Pierce, 2006). Histone modification
involves any chemical change to these proteins. The list of ways in which this can be
achieved is constantly expanding (Goldberg, Allis, & Bernstein, 2007). One such process is the
addition of a methyl group to histone H3, which modulates fibrotic gene expression (Sun et al.,
2010). A similar example is the addition of an acetyl group to the histone (Turner,
2000). Modifications of this kind can create heterochromatin and inactivate gene expression (Pierce,
2006).
The Drosophila genus has long been important to the understanding of
human genetics. Commonly known as fruit flies, Drosophila have served as a model for more
complex eukaryotic organisms for over 50 years. Drosophila have been used to investigate
signal transduction pathways in nervous tissue, allowing scientists
to understand how critical components of the nervous system communicate with one another
(Hardie & Raghu, 2001). Drosophila have also been used to study the importance of sleep
and of the proteins that control sleep patterns (Potdar & Sheeba, 2013). The
importance of Drosophila is difficult to overstate, but there is still much
that scientists have to learn from these organisms.
One section of the Drosophila genome that has not been investigated
thoroughly is chromosome four. Chromosome four, also referred to as the dot chromosome,
exhibits both silent (heterochromatic) and non-silent (euchromatic) properties. This chromosome is
mostly inactivated but maintains several very active genes. It is a
good example of why Drosophila remain worth studying; even in established model
organisms, scientists still do not understand how this chromosome can exist in both states at once
(Leung et al., 2010).
Drosophila melanogaster (D. melanogaster) is the most common model organism from
the Drosophila genus for animal development and the study of animal genes (Pei et al., 2007). Its
genetic composition and gene sets have shown scientists how more complex
animal processes function. Drosophila melanogaster was one of the organisms
sequenced in preparation for the Human Genome Project (Pierce, 2006). It has been extensively
studied, and its genes have been mapped since the Human Genome Project (Pierce, 2006).
Drosophila elegans (D. elegans) is another species of fruit fly, one that has not been
extensively studied. The current project orchestrated by the GEP aims to understand
how chromosome four exhibits the perplexing characteristic of being silent and active at the
same time. Many Drosophila genomes are currently being analyzed to understand this
region.
Current gene annotation procedures are done by hand. Gene annotation uses an
array of tools to map contigs; one such tool is BLAST. The BLAST tool is managed by the
National Center for Biotechnology Information. It gives scientists the ability to
align protein sequences with nucleotide sequences, and vice versa, among many other queries,
through five different modules. Each module allows researchers to crudely map their genes by providing
basic alignment data for the exons of a gene (Madden, 2013). With these basic coordinates, scientists
can then move on to making a fine map of their gene.
BLAST is not the most precise tool for an annotator to use. BLAST often has errors in its
matches, such as mistaking genes for one another (Koski & Golding, 2001). As such, a different
program is required for fine mapping. GEP projects use the UCSC Genome
Browser Mirror for this process. This program, developed by the University of
California at Santa Cruz, allows annotators to examine each individual base pair in order to
find more accurate coordinates for genes. It combines data from many ab initio programs
that predict the locations of genes. It also provides access to RNA-Seq data derived from
sequenced messenger RNAs (Leatherman, 2014). These different sources of data allow an
annotator to make accurate decisions about selecting coordinates for genes to complete their
map.
Within the Drosophila elegans genome, there exist contigs that have not been fully
analyzed for their genes. Contig40 is one such contig. Proper annotation is required for
assembling a complete genetic map. Once contig40 is analyzed, scientists will be able to
examine gene conservation between different Drosophila species.

Figure 1: An annotated contig. Genes identified here are in blue at the top.

Once contigs are fully annotated, scientists can perform comparative analysis of the genes. The
field of comparative genomics deals with the comparison of similar or different organisms at the
genetic level. Organisms throughout the phylogenetic tree have vastly different genomes in their gene
composition and in their size, from Homo sapiens with 35,000 genes to E. coli with 4668
(Bachhawat, 2006). Though there are innumerable differences between these organisms in their
physical characteristics, comparative genomics is finding that some genes are shared by multiple
organisms. For example, the histone H3 gene has been found in almost all eukaryotic organisms
(Leatherman, 2014). Through comparative genomics, scientists are able to identify some of life's
most important genes in many different organisms.
Another important piece of information that can be determined from an analyzed sequence is
the protein structure. From an amino acid sequence, it is possible to predict the shape of the
protein and, by extension, its function (Källberg et al., 2012). This allows for a speculative
analysis of the gene itself by examining its products.
RaptorX is a server run by the University of Chicago that specializes in protein structure
prediction. This server compares a submitted sequence against a database of known protein
structures and sequences. From there, it can identify similar strings of amino acids and arrange them
into protein shapes, domains, and binding sites (Källberg et al., 2012). The binding site
predictions allow researchers to examine the different molecules that bind to the domains and
make the best determination of overall function.

Materials and Methods
Selection and setup of contigs:
Contigs were presequenced by the Genomics Education Partnership at Washington
University in Saint Louis. Contig40 and contig51 were selected through Dr. Judith Leatherman's
account with the GEP. Once contigs were selected, the UCSC Genome Browser Mirror was used for
viewing the contig sequences in Internet Explorer. In the UCSC Genome Browser Mirror, the
BLASTX Alignment module (D. Mel proteins) was set to pack.
Several ab initio program modules were also set to pack: Genscan, Geneid, N-Scan, SNAP, and
GlimmerHMM. The splice predictor module
was set to dense. RNA-Seq data modules were turned on for embryos, adult males, and adult
females. The strand of DNA used for analysis was changed as required by each
gene.

Crude mapping of contigs:
Contigs were analyzed by hand. Genes aligned by BLASTX were verified using
the www.flybase.org database. Genes that were identified by BLASTX alignment but
were not in proximity to the other genes identified were discarded because they were likely part of a
related gene family located elsewhere in the genome. Once the proper genes had been
identified, the genes were searched using the GEP Gene Record Finder program to get sequence
data on all gene isoforms.
Once the first isoform to be annotated was selected, each individual exon was
BLASTed using tBLASTn to align the exon, to find approximate coordinates, and to identify the
correct reading frame. Coordinates were recorded for use in fine mapping. tBLASTn
algorithm parameters were standard except that "Low Complexity Regions" was unchecked and
"No Adjustment" was selected for "Compositional Adjustments".
After fine mapping of one isoform was complete, the next isoform was
annotated. Exons that were shared between isoforms were not re-BLASTed, and their fine mapping
coordinates were reused. New exons were still BLASTed.
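The reading frame identified during crude mapping can also be derived from the hit coordinates. The short Python sketch below is an illustration only and is not part of the GEP tools: the function name and the example coordinates are hypothetical, coordinates are 1-based, and it assumes the common convention that frames +1 to +3 are counted from the first base of the contig and frames -1 to -3 from its last base.

```python
def reading_frame(hit_start, hit_end, contig_length):
    """Return the reading frame (+1..+3 or -1..-3) implied by a hit.

    hit_start and hit_end are 1-based contig coordinates in the direction of
    translation, so hit_start > hit_end means the hit is on the minus strand.
    """
    if hit_start <= hit_end:
        # Plus strand: frame +1 begins at base 1, +2 at base 2, +3 at base 3.
        return (hit_start - 1) % 3 + 1
    # Minus strand: frame -1 begins at the last base of the contig.
    return -((contig_length - hit_start) % 3 + 1)


# Hypothetical hits on a 1000-base contig.
print(reading_frame(4, 93, 1000))     # plus-strand hit
print(reading_frame(999, 910, 1000))  # minus-strand hit
```

A frame computed this way can be compared with the frame reported by tBLASTn as a quick consistency check before fine mapping.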

Fine Mapping of contigs:
Once exons for a gene were crude mapped, fine mapping began. The UCSC Genome
Browser Mirror was zoomed in to ±20 bases of the crude mapped start of the first exon. Once the
start codon had been identified in the proper reading frame, the coordinate of the first base
of the codon was recorded. The splice donor site was identified using the process described below.
The steps described in this and the following paragraph were used for the
ends and beginnings of all exons except the start of the first exon and the end of the last exon,
which are covered in the first and last paragraphs of this
subsection. The UCSC Genome Browser Mirror was zoomed in to ±20 bases of the crude mapped
coordinate of the end of an exon, called a splice donor. The splice donor sequence (GT) was
searched for by hand. All possible splice donor sites were recorded. The splice donor site was
selected using RNA-Seq data, ab initio program predictions, and splice predictor data; the
donor chosen was the one best supported by a combination of these sources. The coordinate of the
base just before the splice donor sequence was recorded as the end of the exon. The phase, or
number of leftover bases, was also recorded.
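The hand search for splice donors can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not software used in this study: the function names, the toy sequence, and the coordinates are hypothetical, coordinates are 1-based, and only the forward strand is handled.

```python
def candidate_donor_sites(seq, crude_end, window=20):
    """List every possible exon end within +/- window bases of crude_end.

    A position qualifies when the two bases immediately after it are GT,
    so the returned coordinate is the base just before the splice donor.
    """
    seq = seq.upper()
    lo = max(1, crude_end - window)
    hi = min(len(seq), crude_end + window)
    # For a 1-based exon end e, the donor occupies bases e+1 and e+2,
    # which are the 0-based slice seq[e:e + 2].
    return [end for end in range(lo, hi + 1) if seq[end:end + 2] == "GT"]


def donor_phase(exon_start, exon_end, leading_bases=0):
    """Leftover bases after the last complete codon of the exon.

    leading_bases is the number of bases at the start of the exon that
    finish a codon begun in the previous exon (0 for the first exon).
    """
    return (exon_end - exon_start + 1 - leading_bases) % 3


toy = "ATGAAACCGGTAAGTTT"  # hypothetical sequence, exon starting at base 1
for end in candidate_donor_sites(toy, crude_end=10, window=5):
    print(end, donor_phase(1, end))
```

The sketch only enumerates candidates; choosing among them still depends on the RNA-Seq, ab initio, and splice predictor evidence described above.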
The beginning of each exon, or splice acceptor, was searched for in a similar manner. The
UCSC Genome Browser Mirror was zoomed in ±20 bases of the crude mapped coordinate. The
splice acceptor sequence (AG) was searched for by hand. All potential splice acceptor sites were
recorded. The splice acceptor was selected based upon RNA-Seq data, ab initio program
predictions, and splice prediction results. The phase of the splice acceptor was also checked to
ensure accuracy of the splice site. The coordinate of the base immediately after the splice
acceptor sequence was recorded as the start of the exon.
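A comparable sketch for the splice acceptor search and the phase check is given below. It is illustrative only: the function names and the toy sequence are hypothetical, coordinates are 1-based, only the forward strand is handled, and the phase check assumes that the leftover bases at a donor and the bases before the first complete codon at the following acceptor must together make up either zero bases or one full codon.

```python
def candidate_acceptor_sites(seq, crude_start, window=20):
    """List every possible exon start within +/- window bases of crude_start.

    A position qualifies when the two bases immediately before it are AG,
    so the returned coordinate is the base just after the splice acceptor.
    """
    seq = seq.upper()
    lo = max(3, crude_start - window)
    hi = min(len(seq), crude_start + window)
    # For a 1-based exon start s, the acceptor occupies bases s-2 and s-1,
    # which are the 0-based slice seq[s - 3:s - 1].
    return [s for s in range(lo, hi + 1) if seq[s - 3:s - 1] == "AG"]


def phases_compatible(donor_phase, acceptor_phase):
    """True when the two phases join into whole codons across the intron."""
    return (donor_phase + acceptor_phase) in (0, 3)


toy = "TTTCAGGCTAGATG"  # hypothetical sequence
print(candidate_acceptor_sites(toy, crude_start=8, window=5))
print(phases_compatible(1, 2), phases_compatible(1, 1))
```

An acceptor whose phase does not pair with the preceding donor would shift the reading frame, which is why the phase was checked before a site was accepted.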
The last exon's splice acceptor was searched for using the aforementioned process. The
UCSC Genome Browser Mirror was zoomed in to ±20 bases of the crude mapped coordinates of the
stop codon. The first stop codon in the proper reading frame was selected as the correct one, and
its coordinates were recorded.
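Selecting the first stop codon in the proper reading frame can be illustrated with the Python sketch below. It is a simplified example rather than the tool used here: the function name and the toy sequence are hypothetical, coordinates are 1-based, and only the forward strand is considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq, frame_start):
    """Return (start, end) of the first stop codon in the given frame.

    frame_start is the 1-based coordinate of the first base of any codon in
    the reading frame of interest. Returns None if no stop codon is found.
    """
    seq = seq.upper()
    # Step through the sequence one codon at a time, staying in frame.
    for i in range(frame_start - 1, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i + 1, i + 3)
    return None


toy = "ATGGCCTGACCTAA"  # hypothetical sequence
print(first_in_frame_stop(toy, 1))  # reads ATG GCC TGA
print(first_in_frame_stop(toy, 3))  # reads GGC CTG ACC TAA
```

Stepping in units of three bases is what keeps the search in frame; stop triplets that appear in the other two frames are ignored.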

Verification of gene model:
Once all exons had been fine mapped, the GEP Gene Model Checker was used to verify
the gene that was annotated. All recorded information was entered into its proper module in the
program. The gene model was then checked for errors. If no errors were reported in the checklist
tab or dot plot tab, the isoform was considered properly annotated. As per GEP protocol, the dot
plot, protein alignment data, FASTA file, PEP file, and GFF file were downloaded, and a screenshot
was taken of the BLASTX Alignment data and the completed checklist.
If an error was reported in the checklist or dot plot, the view tool next to the error
was used. The coordinates were double-checked for accuracy. If they had been mis-entered, they
were corrected and the model was re-verified. If they had not, or if correcting them did not remove the
error message, the procedures described in the previous subsection were used to find the correct
coordinates. This was repeated until the checklist and dot plot came back with no errors.
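The idea behind a dot plot can be shown with a small Python sketch. This is not the Gene Model Checker's implementation; the function name, the word size, and the two short peptide strings are hypothetical and serve only to show how matching stretches between two protein sequences become points along a diagonal.

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return (i, j) index pairs where a word-length window of seq_a
    exactly matches a word-length window of seq_b (0-based indices)."""
    hits = []
    for i in range(len(seq_a) - word + 1):
        for j in range(len(seq_b) - word + 1):
            if seq_a[i:i + word] == seq_b[j:j + word]:
                hits.append((i, j))
    return hits


# Two hypothetical peptides that differ at a single residue.
print(dot_plot("MKTAYIAK", "MKTAYLAK"))
```

In such a plot, a well conserved protein gives a long unbroken diagonal, while gaps or shifts in the diagonal mark regions where the two sequences differ.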

GEP reporting:
The GEP requires annotators to prepare a report of all genes mapped on their contig.
After all isoforms of the genes were properly annotated, a report was
submitted to the GEP using the template sent with the contig files. All instructions on the
template were followed to create this report. This process was repeated for both contigs, as each
contig required a separate report.

Protein structure prediction of unknown proteins:
Genes without a known function are denoted by CG followed by a
number. If such a gene was mapped on the contig, the shape of its protein was predicted.
Prediction was done through the RaptorX server managed by the University of Chicago. Both the
RaptorX structure prediction and RaptorX binding prediction tools were used to determine protein
structures and to hypothesize function.

Results:
Contig Mapping Results:
Contig40 Results:

Figure 2 Contig40 with current gene positions (blue) compared to D. mel positioning (top red)

Figure 3 lgs-PA gene Dot Plot Representing Protein Conservation

Figure 4 CaMKI-PA Dot Plot Representing Sequence Conservation

Figure 5 CaMKI-PB Dot Plot Representing Sequence Conservation

Figure 6 CaMKI-PG Dot Plot Representing Sequence Conservation

Figure 7 CaMKI-PH Dot Plot Representing Sequence Conservation

Figure 8 CaMKI-PI Dot Plot Representing Sequence Conservation

Contig51:

Figure 9 Complete Contig51 with Annotated genes (Blue) Aligned to D. mel Genes (Top Red)

Figure 10 CG31999-PA Dot Plot Representing Sequence Conservation

Figure 11 CG31999-PC Dot Plot Representing Sequence Conservation

Figure 12 yellow-h-PA Dot Plot Representing Sequence Conservation for Exons Located on this Contig

Figure 13 CG1674-PB Dot Plot Representing Sequence Conservation for Exons Located on this Contig

Figure 14 CG1674-PD Dot Plot Representing Sequence Conservation

Figure 15 CG1674-PH, PF, and PE Dot Plot Representing Sequence Conservation for Exons Located on this Contig

Figure 16 CG1674-PI Dot Plot Representing Sequence Conservation for Exons Located on this Contig

Figure 17 CG1674-PJ Dot Plot Representing Sequence Conservation for Exons Located on this Contig

Figure 18 CG1674-PK Dot Plot Representing Sequence Conservation for Exons Located on this Contig

Figure 19 CG1674-PM Dot Plot Representing Sequence Conservation for Exons Located on this Contig

Figure 20 CG31999-PA Predicted Shape

Figure 21 CG31999-PA Binding Domain 1 with Calcium Configuration

Figure 22 CG31999-PA Domain 2 with Calcium Configuration

Figure 23 CG31999-PA Binding Domain 3 with N-Acetyl-D-Glucosamine Configuration

Contig40 was very well conserved. No genes were missing. As can be seen in Figure 2,
the genes had positions very similar to the D. mel gene locations. Figure 3 shows some
sequence conservation for the legless (lgs-PA) gene in D. ele. The CaMKI gene on contig40
shows remarkable conservation with the D. mel gene in all isoforms, as can be seen in Figures
4-8. This high level of sequence conservation is not as prevalent in the last exon.
Contig51 is not as highly conserved as contig40. Figure 9 shows that the positions of the
genes located on the contig match those in D. mel. However, Figures 10-11 show that there is less
amino acid conservation in CG31999 between D. ele and D. mel than in the genes on contig40. A
similar case is seen in yellow-h, as illustrated by Figure 12.
In Figure 9, the CG1674 gene isoforms look much longer than their D. mel homologs; this
is not the case and is caused by an error in the BLAST alignment in the UCSC Genome Browser
Mirror. Another thing to note is that CG1674 has only one complete isoform located on this
contig. All isoforms except CG1674-PD continue on to the next contig. Annotation of the next
contig will be required to get full data on the gene itself.
Of the exons that were present, CG1674 has very little conservation. As Figure 13 shows,
CG1674 is not highly conserved in the PB isoform, as there is almost no sequence similarity.
Figures 17 and 18 show that some isoforms of the gene are still semi-conserved.
CG1674 in this annotation report is missing one isoform, CG1674-PL. This is
because one of the exons required for this isoform could not be found during annotation.
An area with high sequence conservation was identified in the proper region;
however, because of a mutation, this exon has become unreadable. This region also appears to
have stopped being transcribed, because there is a lack of RNA-Seq data for that area.
CG1674 does show one complete isoform on this contig, but it differs from that of D. mel.
As can be seen in the dot plot in Figure 14, the last exon in D. ele is significantly longer than in
D. mel. This is because the stop codon that ends translation of the gene no longer exists where it
should. As a result, translation continues 30-40 amino acids further than it should before
reaching a stop codon.
Because of the significant change in the last exon and the lack of any other complete isoform,
CG1674 was not run through the RaptorX protein prediction server, as it may have yielded inaccurate
results. CG31999 was still run through this program. As can be seen in Figure 20,
the protein is predicted to be largely disordered, with some β-pleated sheets and a few
α-helices. Figures 21-23 show that the binding sites of this protein bind calcium and
N-acetyl-D-glucosamine. This molecule is one of the components that make up chitin, a carbohydrate that is
used in the construction of the exoskeleton and other structures.

Discussion:
Gene differentiation is important to understanding the process of speciation (Feder, Egan,
& Nosil, 2012). As shown by the results, Drosophila elegans is diverging from Drosophila
melanogaster. CG1674 shows that the amino acid sequence has changed drastically, and
as such the protein likely has a different shape (Wilson, Hunt, Alberts, & Lewis, 2002). This may
change the protein's function within Drosophila elegans (Wilson, Hunt, Alberts, & Lewis, 2002).
The legless gene in Drosophila elegans has good conservation with Drosophila
melanogaster, as can be seen in Figure 3. This means that the gene's function is likely unaltered
and that it functions in beta-catenin binding (Kramps et al., 2002). This gene may also still serve in
RNA polymerase II coactivator activity (Thompson, 2004). Further research is needed to
verify its exact function.
CaMKI has the best conservation of any gene on either contig. As can be seen in
Figures 4-8, CaMKI shows almost complete conservation in amino acid sequence. This suggests
that the gene still performs its usual function in Drosophila elegans. This function likely
involves the use of calcium ions in kinase activity during synaptic transmission (Xu et al.,
1998). Further evaluation through knockout studies would be needed to determine exact
function.
Yellow-h on contig51 exhibited some conservation. Figure 12 shows large sections of
conservation among the three exons. This likely means that function has been conserved as well.
This protein likely still functions in the synthesis of melanin and pigmentation of the cuticle (Gaudet,
Livstone, Thomas, & The Reference Genome Project, 2010).
CG1674 has the worst conservation of the genes located on both contigs. Figures 13-19
show low conservation among the exons in most isoforms. This, combined with the loss of isoform
PL, suggests that the function of this gene is being lost (Krylov, Wolf, Rogozin, & Koonin, 2003).
Some isoforms, such as PJ and PK, exhibit moderate conservation. This suggests that function
can still be conserved (Wilson, Hunt, Alberts, & Lewis, 2002). However, the function of this
gene has yet to be determined (Dos Santos et al., 2014). This means that more analysis and
prediction information will be needed to determine potential functions.
CG31999 exhibits good conservation. Figures 10 and 11 show that many of the exons
have been conserved. This likely means that function is conserved and that important functions are
limited to the areas that are conserved (Wilson, Hunt, Alberts, & Lewis, 2002). However, the exact
function of this gene has not yet been determined (Dos Santos et al., 2014). What is
currently known is that this gene is transcribed in the Malpighian tubules and exoskeleton of the
fly (Dos Santos et al., 2014). Figures 21-23 show that the predicted protein has two binding sites
for calcium ions and one binding site for N-acetyl-D-glucosamine. This molecule has been shown
to be a component of the polymer chitin in other organisms (Chen et al., 2011). This leads to the
conclusion that the function of the CG31999 protein involves the degradation or formation of
chitin. Further testing will be needed to confirm this hypothesis.
It is important to note the limitations of this project. A major limitation of this project is
its speculative nature. No empirical testing of protein function has been conducted to analyze the
protein functions themselves. Further testing will be needed in order to determine exact function
and shape.
A second limitation is the size and positioning of the contigs. Contig51 sits in a section of
the genome that breaks up the CG1674 gene into pieces. In order for a complete analysis of the
gene to be performed, the whole gene must be annotated before any prediction work can be done.
This would have to be done on neighboring contig50, where the entire gene is present, to evaluate
its full conservation.

References
Bachhawat, A. K. (2006). Comparative genomics: A powerful new tool in biology. Resonance,
11(8), 22-40. doi:10.1007/BF02855776
Chen, J., Shen, C., Yeh, C., Fang, B., Huang, T., & Liu, C. (2011). N-Acetyl Glucosamine
Obtained from Chitin by Chitin Degrading Factors in Chitinbacter tainanesis.
International Journal of Molecular Sciences, 12(2), 1187-1195.
doi:10.3390/ijms12021187
Dos Santos, G., Schroeder, A. J., Goodman, J. L., Strelets, V. B., Crosby, M. A., Thurmond, J., .
. . Consortium, T. F. (2014, November 14). FlyBase: Introduction of the Drosophila
melanogaster Release 6 reference genome assembly and large-scale migration of
genome annotations. Nucleic Acids Research. doi:10.1093/nar/gku1099
Feder, J. L., Egan, S. P., & Nosil, P. (2012, July). The genomics of speciation-with-gene-flow.
Trends in Genetics, 28(7), 342-350. http://dx.doi.org/10.1016/j.tig.2012.03.009
Gaudet, P., Livstone, M., Thomas, P., & The Reference Genome Project. (2010). Annotation
inferences using phylogenetic trees. Retrieved July 21, 2015, from
http://www.geneontology.org/cgi-bin/references.cgi#GO_REF0000033
Goldberg, A. D., Allis, C. D., & Bernstein, E. (2007, February 23). Epigenetics: A Landscape
Takes Shape. Cell, 128(4), 635-638. http://dx.doi.org/10.1016/j.cell.2007.02.006
Hardie, R. C., & Raghu, P. (2001). Visual transduction in Drosophila. Nature, 413(6852),
186-193. doi:10.1038/35093002
Haynes, K. A., Gracheva, E., & Elgin, S. C. (2007, March). A Distinct Type of Heterochromatin
Within Drosophila melanogaster Chromosome 4. Genetics, 175(3), 1539-1542.
doi:10.1534/genetics.106.066407
Human Genome Project Completion: Frequently Asked Questions. (2010, October 30).
Retrieved June 18, 2015, from https://www.genome.gov/11006943
Kent, W. J. (2002). The Human Genome Browser at UCSC. Genome Research, 12(6), 996-1006.
doi:10.1101/gr.229102.
Källberg, M., Wang, H., Wang, S., Peng, J., Wang, Z., Lu, H., & Xu, J. (2012, August).
Template-based protein structure modeling using the RaptorX web server. Nat.
Protocols, 7(8), 1511-1522. doi:10.1038/nprot.2012.085
Koski, L. B., & Golding, G. B. (2001). The Closest BLAST Hit Is Often Not the Nearest
Neighbor. Journal of Molecular Evolution, 52(6), 540-542.
doi:10.1007/s002390010184
Kramps, T., Peter, O., Brunner, E., Nellen, D., Froesch, B., Chatterjee, S., . . . Basler, K. (2002).
Wnt/wingless signaling requires BCL9/legless-mediated recruitment of pygopus to the
nuclear beta-catenin-TCF complex. Cell, 109(1), 47-60.
Krylov, D. M., Wolf, Y. I., Rogozin, I. B., & Koonin, E. V. (2003, October). Gene Loss, Protein
Sequence Divergence, Gene Dispensability, Expression Level, and Interactivity Are
Correlated in Eukaryotic Evolution. Genome Research, 13(10), 2229-2235.
doi:10.1101/gr.1589103
Kuraku, S., & Kuratani, S. (2011). Genome-Wide Detection of Gene Extinction in Early
Mammalian Evolution. Genome Biology and Evolution, 3, 1449-1462.
doi:10.1093/gbe/evr120
Leatherman, J., Ph.D. (2014). Introduction to comparative genomics [Pamphlet]. Greeley, CO:
University of Northern Colorado.
Leung, W., Shaffer, C., Cordonnier, T., Wong, J., Itano, M., Tempel, E., . . . Elgin, S. (2010).
Evolution of a Distinct Genomic Domain in Drosophila: Comparative Analysis of the
Dot Chromosome in Drosophila melanogaster and Drosophila virilis. Genetics, 185(4),
1519-U629. doi:10.1534/genetics.110.116129
Ma, J., Wang, S., Zhao, F., & Xu, J. (2013, July 01). Protein threading using context-specific
alignment potential. Bioinformatics, 29(13), I257-I265.
doi:10.1093/bioinformatics/btt210
Madden, T., Ph.D. (2013, March 15). The BLAST sequence analysis tool. Retrieved June 28,
2015, from http://www.ncbi.nlm.nih.gov/books/NBK153387/
Pei, Z., Shi, X., Niu, M., Tang, X., Liu, L., Kong, Y., & Liang, Y. (2007). A Method of
Gene-Function Annotation Based on Variable Precision Rough Sets. Journal of Bionic
Engineering, 4(3), 177-184. http://dx.doi.org/10.1016/S1672-6529(07)60030-4
Peng, J., & Xu, J. (2011, January 01). Raptorx: Exploiting structure information for protein
alignment by statistical inference. Proteins: Structure, Function, and Bioinformatics,
79(S10), 161-171. doi:10.1002/prot.23175
Peng, J., & Xu, J. (2011, June). A multiple-template approach to protein threading. Proteins,
79(6), 1930-1939. doi:10.1002/prot.23016
Pierce, B. A. (2006). Genetics: A conceptual approach (2nd ed.). New York, NY: W.H. Freeman
and Company.
Potdar, S., & Sheeba, V. (2013). Lessons From Sleeping Flies: Insights from Drosophila
melanogaster on the Neuronal Circuitry and Importance of Sleep. Journal of
Neurogenetics, 27(1-2), 23-42. doi:10.3109/01677063.2013.791692
Riddle, N., Jung, Y., Gu, T., Alekseyenko, A., Asker, D., Gui, H., . . . Molekylärbiologi
(Teknisk-naturvetenskaplig fakultet), I. F. (2012). Enrichment of HP1a on Drosophila
Chromosome 4 Genes Creates an Alternate Chromatin Structure Critical for Regulation
in this Heterochromatic Domain. Plos Genetics, 8(9), E1002954.
doi:10.1371/journal.pgen.1002954
Schubeler, D. (2015). Function and information content of DNA methylation. Nature,
517(7534), 321-326. doi:10.1038/nature14192
Sun, G., Reddy, M., Yuan, H., Lanting, L., Kato, M., & Natarajan, R. (2010). Epigenetic Histone
Methylation Modulates Fibrotic Gene Expression. Journal of the American Society of
Nephrology, 21(12), 2069-2080. doi:10.1681/ASN.2010060633
Thompson, B. (2004). A complex of Armadillo, Legless, and Pygopus coactivates dTCF to
activate wingless target genes. Current Biology, 14(6), 458-466.
Turner, B. (2000). Histone acetylation and an epigenetic code. Bioessays, 22(9), 836-845.
doi:10.1002/1521-1878(200009)22:93.3.CO;2-O
Washington University. (n.d.). Genomics Education Partnership - Becoming a GEP Member.
Retrieved June 28, 2015, from http://gep.wustl.edu/community/prospective_members
Wetterstrand, K., M.S. (2015, June 15). DNA sequencing costs. Retrieved June 18, 2015, from
http://www.genome.gov/sequencingcosts/
Wilson, J. H., Hunt, T., Alberts, B., & Lewis, J. (2002). Molecular biology of the cell (4th
ed.). New York: Garland.
Xu, X., Wes, P., Chen, H., Li, H., Yu, M., Morgan, S., . . . Montell, C. (1998). Retinal targets for
calmodulin include proteins implicated in synaptic transmission. Journal of Biological
Chemistry, 273(47), 31297-31307.