
Gene Annotation in Drosophila Species
By: Cian Colgan, Sahit Talchutla

The Human Genome Project
• Goal: fully sequence the human genome
• Began in 1990
• Ended in 2003
• Importance
• Medical discoveries
• Fuels research
Image Courtesy of University College, London

Drosophila melanogaster
• First fully annotated fruit fly species
• Sequenced before humans
• Served as a model organism
• Knockout studies
• Allele dominance
Courtesy of Shutter Stock (Oringer, J. 2003)

What did The Human Genome Project do?
• Created faster sequencing
• The project took 13 years
• A genome takes less than 24 hours today

• Made sequencing less expensive
• $100 million per genome in 2001
• $5,000 per genome today

• Provided raw data
Courtesy of National Human Genome Research Institute (Wetterstrand, KA)

Gene Annotation
• Purpose
• Locate genes
• Identify exact sequence of genes
• Helps determine species differentiation

• Why do we do it
• Learn about protein sequences
• Predict protein structure
• Identify exons

• Contigs
Image Courtesy of Plant GDB

What to look for
• Exons

• Coding regions
• Want to know whole sequence of

• Introns (intervening regions)

• Non-coding regions
• Only want to know location

• Splice donors

• End of exon / beginning of intron
• GT nucleotide sequence

• Splice acceptors

• End of intron / beginning of exon
• AG nucleotide sequence

Image Courtesy of (Ben-Hur, A., et al., 2008)
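The GT and AG signals can be searched for directly. The sketch below is a minimal illustration, assuming a made-up toy sequence and a hypothetical helper named `find_dinucleotide`; it is not taken from any annotation tool. It lists every position where GT or AG occurs, which shows why the dinucleotide alone only gives candidates.

```python
def find_dinucleotide(seq, motif):
    """Return the 0-based positions where `motif` starts in `seq`."""
    seq = seq.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


# Hypothetical toy sequence: exon (upper case), intron (lower case), exon.
toy = "ATGGCC" + "gtaagtttcttttcag" + "GATTAA"

donors = find_dinucleotide(toy, "GT")      # candidate splice donors
acceptors = find_dinucleotide(toy, "AG")   # candidate splice acceptors

print("candidate donors:", donors)
print("candidate acceptors:", acceptors)
```

In this toy sequence the intron starts at position 6 and its closing AG starts at position 20, but the scan also reports a GT and an AG inside the intron. Other evidence (reading frame, conservation, RNA data) is needed to choose the real sites.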

How Do you Annotate Genes?
Crude Map
• BLAST amino acid sequence
• Record coordinates

Image Courtesy of BLAST (Altschul, S.F., et al., 1990)
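Recording coordinates from a BLAST search can be sketched as below. This is a minimal example, assuming the search was saved in BLAST+ tabular format (`-outfmt 6`, default 12 columns); the hit lines, sequence names, and the `crude_map` helper are all made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Hypothetical hits of one protein against one contig (values are invented).
EXAMPLE_HITS = (
    "protein_X\tcontig_A\t88.0\t50\t6\t0\t41\t90\t1500\t1649\t2e-25\t98.2\n"
    "protein_X\tcontig_A\t92.5\t40\t3\t0\t1\t40\t1200\t1319\t1e-20\t85.1\n"
)


def crude_map(text):
    """Return hits ordered along the protein, with contig coordinates."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="\t")
    hits = []
    for row in reader:
        sstart, send = int(row["sstart"]), int(row["send"])
        hits.append({
            "query_range": (int(row["qstart"]), int(row["qend"])),
            "contig_range": (min(sstart, send), max(sstart, send)),
            # In translated searches, sstart > send indicates the minus strand.
            "strand": "+" if sstart <= send else "-",
            "pident": float(row["pident"]),
        })
    return sorted(hits, key=lambda h: h["query_range"][0])


hits = crude_map(EXAMPLE_HITS)
for hit in hits:
    print(hit)
```

Each hit gives an approximate exon location; the exact boundaries still have to be found by fine mapping.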

Programs Used

Image Courtesy of UCSC Genome Browser (Fujita PA, et al., 2011)

BLAST, ab initio prediction programs, RNA sequence data

How do ab initio programs work?
• Use a generalized hidden Markov model (GHMM)

• Emphasize and use certain features
• Splice sites
• Open reading frames (ORFs)
• Previously annotated genes
Image Courtesy of UCSC Genome Browser (Fujita PA, et al., 2011)
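One of the signals listed above, the open reading frame, is easy to illustrate. The sketch below is not a GHMM and does not reproduce any real gene predictor; it only scans the three forward reading frames of a made-up sequence for ATG-to-stop stretches. The function name `find_orfs` and the `min_codons` cutoff are assumptions for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    Coordinates are 0-based, `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs


# Hypothetical toy sequence with one short ORF in frame 2.
toy = "CCATGAAATTTTAGGG"
orfs = find_orfs(toy)
print(orfs)
print([toy[s:e] for s, e, _ in orfs])
```

A real ab initio program weighs this kind of signal together with splice sites and codon usage in a probabilistic model, so an ORF scan alone would miss genes that are split by introns.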

Fine Mapping
• Look for start codon (ATG)

Image Courtesy of UCSC Genome Browser (Fujita PA, et al., 2011)

Fine Mapping (cont.)

Image Courtesy of UCSC Genome Browser (Fujita PA, et al., 2011)

• Find first splice donor
• GT

Fine Mapping (cont.)

• Find first splice acceptor
• AG

Image Courtesy of UCSC Genome Browser (Fujita PA, et al., 2011)
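The fine mapping checks (ATG start, GT donor, AG acceptor) can be applied to a finished gene model. The sketch below is a minimal example, assuming forward-strand exons given as 0-based, end-exclusive coordinates; the toy sequence, the coordinates, and the `check_gene_model` helper are invented for illustration and use the standard genetic code.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def check_gene_model(seq, exons):
    """Check a forward-strand gene model against the fine mapping rules."""
    seq = seq.upper()
    cds = "".join(seq[start:end] for start, end in exons)
    introns = [seq[exons[i][1]:exons[i + 1][0]] for i in range(len(exons) - 1)]
    usable = len(cds) - len(cds) % 3
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))
    return {
        "starts_with_ATG": cds.startswith("ATG"),
        "donors_are_GT": all(intron.startswith("GT") for intron in introns),
        "acceptors_are_AG": all(intron.endswith("AG") for intron in introns),
        "length_multiple_of_3": len(cds) % 3 == 0,
        "single_terminal_stop": protein.endswith("*") and "*" not in protein[:-1],
        "protein": protein,
    }


# Hypothetical toy gene: exon, intron (lower case), exon.
toy = "ATGGCC" + "gtaagtttcttttcag" + "GATTAA"
good = check_gene_model(toy, [(0, 6), (22, 28)])
bad = check_gene_model(toy, [(0, 6), (21, 28)])   # second exon starts one base early
print(good)
print(bad)
```

Shifting an exon boundary by a single base breaks both the acceptor check and the reading frame, which is why exact coordinates matter in fine mapping.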

Now What?
• Evaluate for conservation
• D. melanogaster proteins
• Other species

• Protein predictions
• Shape
• Function
• Binding site

Image Courtesy of RaptorX Web Server (Källberg, M., et al., 2012)

• Similarity in protein sequences
• Can tell
• Importance of a gene
• Which areas are important
• If other organisms have similar

Image Courtesy of BLAST (Altschul, S.F., et al., 1990)
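Similarity between two protein sequences is often summarized as percent identity. The sketch below is a minimal illustration, assuming the two sequences are already aligned (same length, gaps written as "-") and that gap columns are left out of the denominator; other definitions exist. The sequences and the `percent_identity` helper are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments from two species.
seq_a = "MKT-LLVA"
seq_b = "MKSALLVA"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identical")
```

Here 6 of the 7 ungapped columns match. Stretches that stay identical across distant species point to the parts of the protein that are most likely to be important.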

• Function
• High speed synaptic transmission
with calcium ions

• Located on contig 47
• Extremely high conservation
• Function is very important

Image courtesy of The Genomics Education Partnership

• Function
• Unknown

• Located on contig 51
• Full gene not on contig

• Little conservation
• Function likely not as important

Image courtesy of The Genomics Education Partnership

Protein Prediction
• RaptorX
• Protein prediction server
• Run by University of Chicago
• How it works
• Uses known protein sequences and
• Looks for similar sequences
• Builds shapes based upon similarity

Image Courtesy of Online Lectures on Bioinformatics

CG31999
• Located on contig 51
• Large conservation with D.

• Transcribed in
• Malpighian tubules
• Exoskeleton

• Binding sites
• Calcium
• N-acetyl-D-glucosamine

• Potential functions
• Chitin production
• Chitin destruction

Image Courtesy of RaptorX Web Server (Källberg, M., et al., 2012)

CG31999 (Cont.)

Image Courtesy of RaptorX Web Server (Källberg, M., et al., 2012)

CG11148
• Located on contig 21
• Moderate conservation with D.

• Transcribed in
• Female ovaries

• Potential functions
• Cell destruction
• Gene suppression and expression
Image Courtesy of RaptorX Web Server (Källberg, M., et al., 2012)

CG11148 (cont.)

Image Courtesy of RaptorX Web Server (Källberg, M., et al., 2012)

The Point
• Fruit flies serve as a model
• Allow for research for humans
• Advance knowledge of other
Image Courtesy of


References

Bachhawat, A. K. (2006). Comparative genomics: A powerful new tool in biology.
Resonance, 11(8), 22-40. doi:10.1007/BF02855776

Kent, W. J. (2002). The Human Genome Browser at UCSC. Genome Research, 12(6), 996-1006. doi:10.1101/gr.229102

Brent, Michael. (2007). How does eukaryotic gene prediction work?. COMPUTATIONAL
BIOLOGY, 883-885. Web. 26 June 2015

Dos Santos, G., Schroeder, A. J., Goodman, J. L., Strelets, V. B., Crosby, M. A., Thurmond,
J., . . . Consortium, T. F. (2014, November 14). FlyBase: Introduction of the Drosophila
melanogaster Release 6 reference genome assembly and large-scale migration of
genome annotations. Nucleic Acids Research. doi:10.1093/nar/gku1099

Källberg, M., Wang, H., Wang, S., Peng, J., Wang, Z., Lu, H., & Xu, J. (2012, August).
Template-based protein structure modeling using the RaptorX web server. Nat.
Protocols, 7(8), 1511-1522. doi:10.1038/nprot.2012.085

Koski, L. B., & Golding, G. B. (2001). The Closest BLAST Hit Is Often Not the Nearest
Neighbor. Journal of Molecular Evolution, 52(6), 540-542. doi:10.1007/s002390010184

Kuraku, S., & Kuratani, S. (2011). Genome-Wide Detection of Gene Extinction in Early
Mammalian Evolution. Genome Biology and Evolution, 3, 1449-1462.

Leatherman, J., Ph.D. (2014). Introduction to comparative genomics [Pamphlet].
Greeley, CO: University of Northern Colorado.

Leung, W., Shaffer, C., Cordonnier, T., Wong, J., Itano, M., Tempel, E., . . . Elgin, S.
(2010). Evolution of a Distinct Genomic Domain in Drosophila: Comparative Analysis of
the Dot Chromosome in Drosophila melanogaster and Drosophila virilis. Genetics,
185(4), 1519-U629. doi:10.1534/genetics.110.116129

Evgeny M. Zdobnov, Christian von Mering, Ivica Letunic. (2002, October). Comparative
Genome and Proteome Analysis of Anopheles gambiae and Drosophila melanogaster.
Web. June 28, 2015.

Goldberg, A. D., Allis, C. D., & Bernstein, E. (2007, February 23). Epigenetics: A
Landscape Takes Shape. Cell, 128(4), 635-638.

Hardie, R. C., & Raghu, P. (2001). Visual transduction in Drosophila. Nature, 413(6852),
186-193. doi:10.1038/35093002

Haynes, K. A., Gracheva, E., & Elgin, S. C. (2007, March). A Distinct Type of
Heterochromatin Within Drosophila melanogaster Chromosome 4. Genetics, 175(3),
1539-1542. doi:10.1534/genetics.106.066407

Madden, T., Ph.D. (2013, March 15). The BLAST sequence analysis tool. Retrieved June
28, 2015, from

HGNC (2015, July 5). SYT7 synaptotagmin VII [Homo sapiens (human)]. Retrieved July
16, 2015.

Morgenstern, B. & Waack, S.(2004, January 21). Gene Prediction with a Hidden Markov
Model. Web. 26 June 2015.

Human Genome Project Completion: Frequently Asked Questions. (2010, October 30).
Retrieved June 18, 2015, from

Murtagh, J., Martin, F., & Gronostajski, R. M. (2003). The nuclear factor I (NFI) gene
family in mammary gland development and function. Journal of Mammary Gland
Biology and Neoplasia, 8(2), 241-254.


References (cont.)

Myrick, K. V., Huet, F., Mohr, S. E., Alvarez-Garcia, I., Lu, J. T., Smith, M. A., . . . Gelbart, W.
M. (2009). Large-scale functional annotation and expanded implementations of the P{wHy}
hybrid transposon in the drosophila melanogaster genome. Genetics, 182(3), 653-660.

Pei, Z., Shi, X., Niu, M., Tang, X., Liu, L., Kong, Y., & Liang, Y. (2007). A Method of Gene-Function Annotation Based on Variable Precision Rough Sets. Journal of Bionic Engineering,
4(3), 177-184.

Pierce, B. A. (2006). Genetics: A conceptual approach (2nd ed.). New York, NY: W.H.
Freeman and Company.

Stanke, M., & Morgenstern, B. (2005, June 27). AUGUSTUS: A web server for gene
prediction in eukaryotes that allows user-defined constraints. Retrieved July 17, 2015.

Sun, Q., S. Muckatira, L. Yuan, Sw Ji, S. Newfeld, S. Kumar, and Jp Ye. "Image-level and
Group-level Models for Drosophila Gene Expression Pattern Annotation." Bmc
Bioinformatics 14.1 (2013): 350. Web. 23 June 2015.

Sun, G., Reddy, M., Yuan, H., Lanting, L., Kato, M., & Natarajan, R. (2010). Epigenetic Histone
Methylation Modulates Fibrotic Gene Expression. Journal of the American Society of
Nephrology, 21(12), 2069-2080. doi:10.1681/ASN.2010060633

Potdar, S., & Sheeba, V. (2013). Lessons From Sleeping Flies: Insights from Drosophila
melanogaster on the Neuronal Circuitry and Importance of Sleep. Journal of Neurogenetics,
27(1-2), 23-42. doi:10.3109/01677063.2013.791692

Turner, B. (2000). Histone acetylation and an epigenetic code. Bioessays, 22(9), 836-845.

Wade, S., & Auble, D. (August, 2010). The Rad23 ubiquitin receptor, the proteasome and
functional specificity in transcriptional control. Retrieved July 16, 2015.

Riddle, N., Jung, Y., Gu, T., Alekseyenko, A., Asker, D., Gui, H., . . . Molekylärbiologi (Teknisknaturvetenskaplig fakultet), I. F. (2012). Enrichment of HP1a on Drosophila Chromosome 4
Genes Creates an Alternate Chromatin Structure Critical for Regulation in this
Heterochromatic Domain. Plos Genetics, 8(9), E1002954. doi:10.1371/journal.pgen.1002954

Washington University. (n.d.). Genomics Education Partnership - Becoming a GEP Member.
Retrieved June 28, 2015, from

Wetterstrand, K., M.S. (2015, June 15). DNA sequencing costs. Retrieved June 18, 2015,

Yok, N., & Rosen, G. (2011). Combining gene prediction methods to improve metagenomic
gene annotation. BMC BIOINFORMATICS, 12(1), 20-20

Rogers, Rebekah L., Ling Shao, Jaleal S. Sanjak, Peter Andolfatto, and Kevin R. Thornton.
(2014) "Revised Annotations, Sex-biased Expression, and Lineage-specific Genes in the
Drosophila Melanogaster Group." G3 (Bethesda, Md.) 4.12 (2014): 2345-351. Web. 23 June

Salamov, A., & Solovyev, V. (2000). Ab initio gene finding in drosophila genomic DNA.
Genome Research, 10(4), 516-522.

Schubeler, D. (2015). Function and information content of DNA methylation. Nature,
517(7534), 321-326. doi:10.1038/nature14192

Skibbe, D., Ging, G., Cao, J., Borsuk, L., Wen, T., & Fu, Y. (2005, March 1). Evaluation of five
ab initio gene prediction programs for the discovery of maize genes. Web. 26 June 2015

Image References
• Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.

• Ben-Hur, A., Ong, C., Sonnenburg, S., Schölkopf, B., & Rätsch, G. (2008). Tutorial: Support Vector Machines and Kernels for Computational
Biology. Retrieved July 20, 2015, from
• Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR,
Giardine BM, Harte RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith
KE, Haussler D, Kent WJ. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2010 Oct 18. [Epub ahead of print](2011)
• Genomics Education Partnership - GEP Home Page. (n.d.). Retrieved July 19, 2015, from

• How to Keep Fruit Flies at Bay - HackCollege. (n.d.). Retrieved July 19, 2015, from
• Molecular And Cultural Evolution Lab. (1990). Retrieved July 20, 2015, from
• Morten Källberg, Haipeng Wang, Sheng Wang, Jian Peng, Zhiyong Wang, Hui Lu & Jinbo Xu. Template-based protein structure modeling
using the RaptorX web server. Nature Protocols 7, 1511–1522, 2012.

• Oringer, J. (2003). Over 50 Million Stock Photos, Vectors, Videos, and Music Tracks. Retrieved July 19, 2015.
• Plant GDB. Annotation for Amateurs. (n.d.). Retrieved July 19, 2015, from
• Protein Structure. (n.d.). Retrieved July 20, 2015, from
• Wetterstrand, KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP) Accessed July 19,2015.

Acknowledgments
• Zimmerle Family
• Union Pacific Foundation
• Tointon Family Foundation
• Kinder Morgan Foundation
• Judith Leatherman PhD
• Jordan McCarthy
• James Major III
• Nicole Wood