Gene Annotation in Drosophila Species
By: Cian Colgan, Sahit Talchutla

The Human Genome Project
• Goal: to fully sequence the human genome
• Began in 1990
• Ended in 2003
• Importance
• Medical discoveries
• Fuels research
Image Courtesy of University College, London

Drosophila melanogaster
• First fully annotated fruit fly species
• Sequenced before humans
• Served as a model organism
• Knockout studies
• Allele dominance
Courtesy of Shutterstock (Oringer, J. 2003)

What did The Human Genome Project do?
• Created faster sequencing
• Took 13 years
• Less than 24 hours today

• Less Expensive
• $100 million in 2001
• $5,000 today

• Provided raw data
Courtesy of National Human Genome Research Institute (Wetterstrand, KA)

Gene Annotation
• Purpose
• Locate genes
• Identify exact sequence of genes
• Helps determine species differentiation

• Why do we do it?
• Learn about protein sequences
• Predict protein structure
• Identify exons

• Contigs
Image Courtesy of Plant GDB

What to look for
• Exons

• Coding regions
• Want to know whole sequence of

• Intervening region (intron)

• Non-coding regions
• Only want to know location

• Splice donors

• End of exon / beginning of intron
• GT nucleotide sequence

• Splice Acceptors

• End of intron / beginning of exon
• AG nucleotide sequence

Image Courtesy of (Ben-Hur, A., et al. 2008)
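The GT and AG signals above can be scanned for directly. The following is a minimal sketch, assuming a forward-strand DNA string; the function names and the example sequence are made up for illustration. Most hits are false positives, since GT and AG occur by chance throughout any sequence, so the positions are only candidates to be checked against other evidence.

```python
def find_dinucleotide(seq, motif):
    """Return 0-based start positions of every occurrence of a 2-base motif."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == motif]


def candidate_splice_sites(seq):
    """List candidate donor (GT) and acceptor (AG) positions in a DNA string."""
    return {
        "donors": find_dinucleotide(seq, "GT"),
        "acceptors": find_dinucleotide(seq, "AG"),
    }


# Made-up sequence: exon (uppercase), intron (lowercase), exon (uppercase).
example = "ATGGCC" + "gtaagtttcag" + "GATTAA"
sites = candidate_splice_sites(example)
print(sites)
```

In this toy sequence the true donor is the GT that opens the lowercase intron and the true acceptor is the AG that closes it; the other hits show why the dinucleotides alone are not enough.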

How Do You Annotate Genes?
Crude Map
• BLAST amino acid sequence
• Record coordinates

Image Courtesy of BLAST (Altschul, S.F.,
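Recording coordinates for the crude map can be sketched as below. This assumes the BLAST results were saved in the 12-column tabular format (BLAST+ `-outfmt 6`), which the slides do not specify; the `Hit` type, the function names, and the sample rows are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def parse_tabular(text):
    """Parse 12-column tabular BLAST output into Hit records."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9])))
    return hits


def crude_map(hits):
    """Order hits along the query protein; report subject range and strand."""
    rows = []
    for h in sorted(hits, key=lambda h: h.q_start):
        strand = "+" if h.s_start <= h.s_end else "-"
        rows.append((h.q_start, h.q_end,
                     min(h.s_start, h.s_end), max(h.s_start, h.s_end), strand))
    return rows


# Made-up rows for illustration only (not real results).
sample = "\n".join([
    "\t".join(["query_protein", "contig_example", "92.5", "40", "3", "0",
               "41", "80", "2200", "2319", "1e-20", "85.1"]),
    "\t".join(["query_protein", "contig_example", "88.0", "40", "5", "0",
               "1", "40", "1000", "1119", "1e-18", "80.2"]),
])
rows = crude_map(parse_tabular(sample))
for row in rows:
    print(row)
```

Each row gives a stretch of the protein and the approximate contig region it matches, which is the starting point for fine mapping.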

Programs Used

Image Courtesy of UCSC Genome Browser
(Fujita PA, et al. 2011)

BLAST, ab initio prediction programs, RNA sequence data

How do ab initio programs work?
• Use a generalized hidden Markov model (GHMM)
• Emphasize and use certain aspects
• Splice sites
• Open Reading Frames (ORF)
• Previously annotated genes
Image Courtesy of UCSC Genome Browser
(Fujita PA, et al. 2011)
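To give a feel for the idea, here is a toy sketch using a plain two-state hidden Markov model decoded with the Viterbi algorithm. It is a deliberate simplification, not a GHMM: real gene finders add explicit state lengths, splice-site and codon models, and trained parameters. Every probability below is invented for illustration.

```python
import math

STATES = ("E", "I")  # E = exon-like, I = intron-like
START = {"E": 0.9, "I": 0.1}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
# Invented emission probabilities: exon state GC-rich, intron state AT-rich.
EMIT = {
    "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "I": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def viterbi(seq):
    """Return the most probable state path (a string of E/I) for a DNA string."""
    seq = seq.upper()
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best previous state when base t+1 is in state s
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("GCGCGCGCGC"))
print(viterbi("ATATATATAT"))
```

The program labels each base with the state that best explains the whole sequence, which is how composition and signal evidence get combined into one prediction.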

Fine Mapping
• Look for start codon (ATG)

Image Courtesy of UCSC Genome Browser
(Fujita PA, et al. 2011)
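Looking for the start codon can be sketched as a simple scan. This assumes a forward-strand DNA string and the three standard stop codons; the function names and the example sequence are made up. In a multi-exon gene the first exon usually ends at a splice donor before any stop codon, so the reading frame found here is only a first check.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_start_codons(seq):
    """Return 0-based positions of every ATG in the sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def open_reading_frame(seq, start):
    """Return (start, end) from `start` through the first in-frame stop codon.

    `end` is exclusive. Returns None if no in-frame stop codon is found.
    """
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (start, i + 3)
    return None


example = "CCATGGCATGAAATAGGT"  # made-up sequence
starts = find_start_codons(example)
for s in starts:
    print(s, open_reading_frame(example, s))
```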

Fine Mapping (cont.)

Image Courtesy of UCSC Genome Browser
(Fujita PA, et al. 2011)

• Find first splice donor
• GT

Fine Mapping (cont.)

• Find first splice acceptor
• AG

Image Courtesy of UCSC Genome Browser
(Fujita PA, et al. 2011)
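Once exon coordinates have been chosen, a gene model can be checked roughly as follows: confirm that each intron starts with GT and ends with AG, join the exons, and translate with the standard genetic code. This is a minimal sketch assuming 0-based, end-exclusive coordinates on the forward strand; the function names and the example are hypothetical, and non-canonical introns would be reported as problems here.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def check_introns(seq, exons):
    """Return a list of problems for introns lacking GT...AG boundaries."""
    problems = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = seq[left_end:right_start].upper()
        if not intron.startswith("GT"):
            problems.append(f"intron at {left_end} does not start with GT")
        if not intron.endswith("AG"):
            problems.append(f"intron ending at {right_start} does not end with AG")
    return problems


def spliced_cds(seq, exons):
    """Join the exon sequences in order."""
    return "".join(seq[start:end] for start, end in exons).upper()


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


# Made-up gene: two exons (uppercase) around one intron (lowercase).
example = "ATGGCC" + "gtaagtttcag" + "GATTAA"
exons = [(0, 6), (17, 23)]
print(check_introns(example, exons))
print(translate(spliced_cds(example, exons)))
```

A translation that starts with M, ends with a single stop (`*`), and has no internal stops is a useful sanity check before comparing against known proteins.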

Now What?
• Evaluate for conservation
• D. melanogaster proteins
• Other species

• Protein predictions
• Shape
• Function
• Binding site

Image Courtesy of RaptorX Web Server (Morten Källberg, et al. 2012)

• Similarity in protein sequences
• Can tell
• Importance of a gene
• Which areas are important
• If other organisms have similar

Image Courtesy of BLAST (Altschul, S.F.,
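One simple way to put a number on similarity is percent identity over an existing pairwise alignment. The sketch below assumes two already-aligned protein strings of equal length with `-` for gaps, and counts identity over columns where neither sequence has a gap; other definitions use a different denominator. The sequences are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def conserved_columns(aligned_a, aligned_b):
    """Mark identical columns with '|' so conserved stretches stand out."""
    return "".join("|" if x == y and x != "-" else " "
                   for x, y in zip(aligned_a.upper(), aligned_b.upper()))


# Made-up aligned fragments for illustration.
a = "MKT-LLVA"
b = "MKSALLIA"
print(a)
print(conserved_columns(a, b))
print(b)
print(round(percent_identity(a, b), 1))
```

Runs of `|` point to the regions most likely to matter for function, which is the reasoning behind using conservation to judge importance.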

• Function
• High speed synaptic transmission
with calcium ions

• Located on contig 47
• Extremely high conservation
• Function is very important

Image courtesy of The Genomics Education

• Function
• Unknown

• Located on contig51
• Full gene not on contig

• Little conservation
• Function likely not as important

Image courtesy of The Genomics Education

Protein Prediction
• RaptorX
• Protein prediction server
• Run by University of Chicago
• How it works
• Uses known protein sequences and
• Looks for similar sequences
• Builds shapes based upon similarity

Image Courtesy of Online Lectures on Bioinformatics

• Located on contig51
• Large conservation with D.

• Transcribed in
• Malpighian tubules
• Exoskeleton

• Binding sites
• Calcium
• N-acetyl-D-glucosamine

• Potential functions
• Chitin production
• Chitin destruction

Image Courtesy of RaptorX Web Server (Morten Källberg, et al. 2012)

CG31999 (Cont.)

Image Courtesy of RaptorX Web Server (Morten Källberg, et al. 2012)

• Located on contig21
• Moderate conservation with D.

• Transcribed in
• Female ovaries

• Potential functions
• Cell destruction
• Gene suppression and expression
Image Courtesy of RaptorX Web Server (Morten Källberg, et al. 2012)

CG11148 (cont.)

Taken From RaptorX Web Server (Morten Källberg, et al. 2012)

The Point
• Fruit flies serve as a model
• Allow research that applies to humans
• Advance knowledge of other
Image Courtesy of


