Class: B1 _ B6
Student-ID: 22BI13251
Introduction to Ensembl
https://www.youtube.com/watch?time_continue=1442&v=lA2xq3YkWko
Exercise 1 – Panda
(a) Go to the species homepage for Giant Panda. What is the name of the genome
assembly for Panda?
ASM200744v2
(b) Click on More information and statistics. How long is the Panda genome (in bp)?
How many coding genes have been annotated?
Length: 2,444,060,653
Exercise 2 – Zebrafish
Exercise 3 – Mosquitoes
(a) Go to Ensembl Metazoa. How many species of the genus Anopheles are
represented in Ensembl Metazoa?
22
(b) When was the current Anopheles gambiae genome assembly last revised?
2007
Exercise 4 – Bacteria
Go to Ensembl Bacteria and find the species Belliella baltica. How many coding and
non-coding genes does it have?
(a) Find the human MYH9 (myosin, heavy chain 9, non-muscle) gene, and go to the
Gene tab.
On which chromosome and which strand of the genome is this gene located?
Chromosome number 22
Protein 1981aa.
(c) Click on Phenotype at the left side of the page. Are there any diseases
associated with this gene, according to OMIM (Online Mendelian Inheritance in
Man)?
So there are 2.
(c) In the transcript table, click on the transcript ID for MYH9-201, and go to the
Transcript tab.
41.
Are any of the exons completely or partially untranslated?
(a) Search for phenylketonuria from the Ensembl homepage and narrow down your
search to only genes. What gene is associated with this disorder?
Name: PAH
ID: ENSG00000171759
(b) How many protein coding transcripts does this gene have? View all of these in
the transcript comparison view.
6 protein coding transcripts.
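The transcript count can also be checked programmatically. The sketch below assumes a gene record shaped like the JSON returned by the Ensembl REST `lookup` endpoint with `expand=1` (a `Transcript` list whose entries carry a `biotype` field). The sample record and its IDs are hypothetical placeholders, not real PAH data, and `lookup_url` and `count_protein_coding` are illustrative helper names.

```python
def lookup_url(symbol, species="homo_sapiens"):
    """Build a lookup URL for a gene symbol (request it with a JSON content-type header)."""
    return f"https://rest.ensembl.org/lookup/symbol/{species}/{symbol}?expand=1"


def count_protein_coding(gene_record):
    """Count transcripts whose biotype is 'protein_coding' in an expanded gene record."""
    transcripts = gene_record.get("Transcript", [])
    return sum(1 for t in transcripts if t.get("biotype") == "protein_coding")


# Hypothetical record for illustration only; real responses contain many more fields.
sample_record = {
    "display_name": "EXAMPLE",
    "Transcript": [
        {"id": "T1", "biotype": "protein_coding"},
        {"id": "T2", "biotype": "retained_intron"},
        {"id": "T3", "biotype": "protein_coding"},
    ],
}

print(lookup_url("PAH"))
print(count_protein_coding(sample_record))
```

The count from a real response should be compared with the transcript table on the Gene tab, since the two can differ between Ensembl releases.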
(a) What is the current human genome assembly? What are the previous
assemblies?
GRCh38.p13
(b) What is the size of the human genome? How many coding genes (primary
assembly)?
3,096,649,726
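To put the two genome lengths recorded in this worksheet side by side, the following sketch converts the comma-formatted base-pair counts to gigabases and takes their ratio. The helper names `parse_bp` and `to_gb` are illustrative only.

```python
def parse_bp(text):
    """Convert a comma-formatted base-pair count such as '2,444,060,653' to an int."""
    return int(text.replace(",", ""))


def to_gb(bp):
    """Express a base-pair count in gigabases (1 Gb = 1e9 bp)."""
    return bp / 1e9


panda_bp = parse_bp("2,444,060,653")   # Panda genome length from Exercise 1
human_bp = parse_bp("3,096,649,726")   # human genome length from the answer above

print(round(to_gb(panda_bp), 3), "Gb (panda)")
print(round(to_gb(human_bp), 3), "Gb (human)")
print(round(panda_bp / human_bp, 3), "panda/human length ratio")
```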
(c) Search for the gene BRCA2. What is the position of the gene? How many
protein coding transcripts does this gene have?
GRCh38: CM000675.2
Protein coding transcripts: 5.
(d) The same question as question (c), but using the GRCh37 human genome
assembly version.
GRCh37: CM000675.1
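The two answers differ only in the version suffix of the sequence accession. A minimal sketch of how such answer lines could be split into assembly, accession and version is shown below; `parse_answer` is a hypothetical helper, and the input strings are the accessions written in answers (c) and (d).

```python
def parse_answer(line):
    """Split 'ASSEMBLY: ACCESSION.VERSION' into (assembly, accession, version)."""
    assembly, accession = line.split(":", 1)
    accession = accession.strip().rstrip(".")     # drop a trailing full stop, if any
    base, version = accession.rsplit(".", 1)      # the version follows the last dot
    return assembly.strip(), base, int(version)


grch38 = parse_answer("GRCh38:CM000675.2.")
grch37 = parse_answer("GRCh37: CM000675.1")

print(grch38)
print(grch37)
print("same sequence record:", grch38[1] == grch37[1])
print("version increase:", grch38[2] - grch37[2])
```

The shared accession with a higher version number indicates the same sequence record, revised between the two assemblies.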
The SNP rs1738074 in the 5’ UTR of the human TAGAP gene has been identified as a
genetic risk factor for a few diseases.
(b) What is the least frequent genotype for this SNP in the Yoruba (YRI) population
from the 1000 Genomes phase 3?
(c) What is the ancestral allele? Is it conserved in the 90 eutherian mammals EPO-
Extended?
It’s conserved.
(d) With which diseases is this SNP associated? Are there any known risk (or
associated) alleles?
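For part (b), the least frequent genotype is the one with the smallest share of the population's genotype counts. The sketch below shows that calculation. The genotype labels and counts are hypothetical placeholders, not the real YRI values for rs1738074, which have to be read from the population genetics table in Ensembl.

```python
def least_frequent_genotype(counts):
    """Return (genotype, frequency) for the rarest genotype in a dict of counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes counted")
    freqs = {genotype: n / total for genotype, n in counts.items()}
    rarest = min(freqs, key=freqs.get)
    return rarest, freqs[rarest]


# Hypothetical counts for illustration only.
example_counts = {"ref|ref": 60, "ref|alt": 35, "alt|alt": 5}

genotype, frequency = least_frequent_genotype(example_counts)
print(genotype, frequency)
```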
(c) Zoom in on the largest gene EFI27358. How many exons does this gene have?
23 exons.
>EFI27358 cdna:protein_coding
ATGCAGCCAACGCCAGCACCGTCGTCAGCACCTGGTTCCCCTCAACGAACCCAAGCTGAA
CCGGAAATGGAAACACCCTCATATCCACAACCTCCACAGAACGTAGGGACTGCACCATTT
AGCGTCCTGGTCAAACTCTTTGAGAAACTTGCCACGGAACGGAAACAAGAAAGACGAAGA
AAACTTCTGGATGCATGGTTTAGACACTGGCGAAGGGAGAAGGGCTTTGACCTTTACCCA
GTTCTTCGGCTGTTACTACCACAAAAAGACAGAGACCGCGCTGTGTATGGGTTGAAGGAG
AAGAACTTGGCGAAGACCTACATCAAACTCATCCCTCTTGGAATGCGCGACCCAGATGCG
ATCCGCCTCCTGAACTGGAAAAAGCCAACTGAACGCGACGTACTGTGCGAGGTCGTCTCC
AAGCGGTCGTCTGTGATCGAAGGGACTTTGACTATCGACGAACTTAATGAGATCTTGGAC
GATATAGCTAAGAACATGGGCAAATCGGATGTGCAGTCGAAGATTCTGAGGAGGATATAC
AACAATTCGACGGCGGATGAGCAACGGTGGATCATCCGTATTATTCTGAAGGATATGAAC
ATCTCCGTCAAGGAGACCACAGTGTTTGCAGTCTTCCATCCCGATGCACAAGATCTTTAC
AACACCTGCTCGGACCTGAAGAAAGTCGCATGGGAACTTTGGGATCCTTCGCGCCGGCTT
AATGCGAAGGACAAAGAAATCCAAATCTTTCATGCATTCGCTCCTATGTTGTGCAAACGG
CCGACTAGAAAGATAGAAGAGACAGTCAAAGCGATGGGTGGATCCAAATTTATCATTGAG
GAGAAGCTCGATGGTGAACGAATGCAGCTCCATAAGCGAGGCAATGAATACTTCTACTGC
TCACGGAAAGGCAAAGACTATACTTATCTTTATGGAAAGCACATCGGTGCTGGAAGTTTA
ACACCTTTTATCGACTCTGCTTTCGACTCAAGAATTGATGATATCATCCTTGATGGAGAG
ATGCTTGTCTGGGACCCCGTCTCTGAAAGAAACCTTCCATTCGGAACATTGAAGACCGCC
GCACTTGGTAGGTCCAAGAAAGAGAACAACCCCCGACCCTGCTTCAAAGTATTCGACTTG
CTATATCTGAACGGAATGTCCCTTCTCGACAAGACGGTGAAATTTCGTAAGAACAACCTG
CGACACTGCATCAAACCAATCCCTGGCCGGATTGAATTTGTTGAGGAGTACCAAGGCGAA
ACCGCAAATGACATCCGGAAGCGAATGGAGCAAGTCATGGAGAATAGAGGAGAAGGCCTC
GTCATCAAGCATCCCAAGGCAAAATATATCTTGAATGGGAGAAACACCGATTGGATCAAG
GTCAAGCCGGAGTATATGGATAATATGGGTGAAACGGTAGATGTACTCGTCGTTGCCGGA
AACTACGGCAGTGGGAAGAGGGGTGGTGGCGTGTCAACTTTGATTTGTGCCGTCATGGAC
GACCGTCGTCCAGACTCGGACGATGAGCCCAACAGTTTTGTACGTATCGGCACGGGACTG
TCGTTTGCCGACTACGTCTGGGTGAGGAGCAAACCGTGGAAGGTCTGGGACCCTAAGAAT
CCTCCCGAGTTTTTGCAGACTGCGAAGAAAGGTCAGGAGGACAAGGGTGACGTTTATCTG
GAACCAGAAGAGTCCGTAGGCTTAATTTATCGGATACACTCTGATGCTAATGGTGGTTCT
CTTTACTGTGTACGCAGTTCCTTCATTCTGAAAGTCAAGGCAGCGGAAATTACCCCTTCA
GACCAATACCACATGGGATTTACTATGAGGTTCCCCCGTGCATTGGCTATCCGGGATGAT
TTATCTATCGCGGATTGCATGACAGCGACCGGCAAGTATCCCGTATTGGCAGGGATCAGG
ATAACTACAAAGAAACGGAAGACAACAGTAAAGAAGGTCGCACTGTTGCCCGAATACTCT
GGGCCCAACCTGAAGAAAGTCGCTGTCAAAACTGATATATTCAACGGCATGAAGTTTGTC
GTCTTCTCAGATCCCAAGTCACGAACAGGAGAAGCAGACAAGAAGGAGCTCATGAAAACG
ATTCATGCAAATGGTGGAACCTGCTCTCAGATAGTCAATAAGAATTCCGAGGCGATTGTC
ATTTATGGAGGCTCAATAACTCCATATGACCTGAAGTTGGTGATTGATAAAGGGATCCAC
GATGTTATCAAGCCATCGTGGATCACAGATTCTGTGACCCTAGGAGAGCCAGCACCCTTC
AAAAAGAAGTACTTCTTCCACGCCACCGAAGAACGGAAATATGCAGATGAATATAATGAA
GATGACGGCGAGGAAGAGGGGGCGGTACCCAGTGCAGACGAGCAAGAAAGGGACGTTAAA
TCTGGGACTGTCGAGCCCGGGTCTGAGACGGAAGACGAAGACGAAGAGCAGGCTCCGGAA
ATCAAGGAGGAACAGGATGGGGAACTGCACGAATGGTTAAAGGTTGATGACCGGAAATCA
CCAGCGTTGCCCGCTCATGACGAAGAAGATTCTGTAACGGAGGACGATTCCGACAATGCT
GACGTCGCTGACGAGGAAGAACCTGACTTGGACGATTGGTTCCAAGTGAAGGGGGAAACA
GAAGATGAGGGAGCTGGAGCTCTAGCAACAGCATCAAGGCACAGAGAGACAACGCCTGAC
GTCGACGGAGATGTTAAGATGGGGGAGAGTGAAGAGGCTATGGATTACGATCCGGATGTC
ATCTTCAAACACTTCTTTGAAGAAGTCGAAAAACTCATTAAAGATAACGGGGGCAAGATT
GTTGATTTGGACGAGCCAAAGCTAACACACGTTGTGCTTGATAAACGAGATGATAGTCGC
CGGGTTGAGCTCATGAAACGCACATCAAAGTGA
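An exported cDNA like the one above can be sanity-checked by translating it with the standard genetic code. The sketch below assumes the sequence is saved as FASTA text whose header line begins with `>`, and the helper names are illustrative. The toy input joins the first 18 bases and the last 6 bases of the listing, so it is not a real transcript.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def clean_sequence(fasta_text):
    """Drop header lines and whitespace, returning one upper-case sequence string."""
    lines = (ln.strip() for ln in fasta_text.splitlines())
    return "".join(ln for ln in lines if ln and not ln.startswith(">")).upper()


def translate(cdna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    usable = len(cdna) - len(cdna) % 3
    return "".join(CODON_TABLE.get(cdna[i:i + 3], "X") for i in range(0, usable, 3))


def is_complete_orf(cdna):
    """True if the sequence starts with ATG, is a multiple of 3, and has one final stop."""
    if len(cdna) % 3 != 0 or not cdna.startswith("ATG"):
        return False
    protein = translate(cdna)
    return protein.endswith("*") and "*" not in protein[:-1]


toy_fasta = ">toy example\nATGCAGCCAACG\nCCAGCAAAGTGA\n"
toy_cdna = clean_sequence(toy_fasta)
print(translate(toy_cdna))
print(is_complete_orf(toy_cdna))
```

Running the same functions on the full listing would show whether the exported sequence is a complete open reading frame, provided the wrapped lines are joined first.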