
Name: Nguyễn Tú Linh

Class: B1 _ B6

Student-ID: 22BI13251

Introduction to Ensembl

https://www.youtube.com/watch?time_continue=1442&v=lA2xq3YkWko

Exercise 1 – Panda

(a) Go to the species homepage for Giant Panda. What is the name of the genome
assembly for Panda?

ASM200744v2

(b) Click on More information and statistics. How long is the Panda genome (in bp)?
How many coding genes have been annotated?

Length: 2,444,060,653

Number of coding genes: 20,857

Exercise 2 – Zebrafish

What previous assemblies are available for zebrafish?

Exercise 3 – Mosquitoes

(a) Go to Ensembl Metazoa. How many species of the genus Anopheles are
represented in Ensembl Metazoa?

22

(b) When was the current Anopheles gambiae genome assembly last revised?

2007

Exercise 4 – Bacteria

Go to Ensembl Bacteria and find the species Belliella baltica. How many coding and
non-coding genes does it have?

Coding: 3,680; non-coding: 53

Exercise 5 – Exploring the human MYH9 gene

(a) Find the human MYH9 (myosin, heavy chain 9, non-muscle) gene, and go to the
Gene tab.

• On which chromosome and which strand of the genome is this gene located?

Chromosome 22.

Reverse strand, position 36,281,280-36,388,010.


• How many transcripts (splice variants) are there and how many are protein coding?

23 transcripts and 6 of them are protein coding.


• What is the longest transcript that codes for protein, and how long is the protein it
encodes?

Transcript ID: ENST00000685801.1.

Protein length: 1,981 aa.

(b) Click on Phenotype at the left side of the page. Are there any diseases
associated with this gene, according to OMIM (Online Mendelian Inheritance in
Man)?

Yes, there are two: deafness, autosomal dominant 17; and macrothrombocytopenia and
granulocyte inclusions with or without nephritis or sensorineural hearing loss.

(c) In the transcript table, click on the transcript ID for MYH9-201, and go to the
Transcript tab.

• How many exons does it have?

41.

• Are any of the exons completely or partially untranslated?

Completely untranslated: ENSE00001874129

Partially untranslated: ENSE00002732339, ENSE00001230540.


Exercise 6 – Finding a gene associated with a phenotype

Phenylketonuria is a genetic disorder caused by an inability to metabolise phenylalanine
in any body tissue. This results in an accumulation of phenylalanine causing seizures
and mental retardation.

(a) Search for phenylketonuria from the Ensembl homepage and narrow down your
search to only genes. What gene is associated with this disorder?

Name: PAH

ID: ENSG00000171759
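
The record found through the browser search can also be looked up programmatically. The following is a minimal sketch, assuming the public Ensembl REST service at rest.ensembl.org and its lookup-by-ID endpoint; the helper names build_lookup_url and fetch_gene are hypothetical and not part of the exercise, and the network call is defined but not run here.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # assumed public Ensembl REST server


def build_lookup_url(stable_id: str) -> str:
    """Return the lookup URL for an Ensembl stable ID (hypothetical helper)."""
    return f"{REST_SERVER}/lookup/id/{stable_id}?content-type=application/json"


def fetch_gene(stable_id: str) -> dict:
    """Fetch the JSON record for a stable ID; requires network access."""
    with urllib.request.urlopen(build_lookup_url(stable_id), timeout=30) as response:
        return json.load(response)


# URL for PAH, the gene identified in part (a)
pah_url = build_lookup_url("ENSG00000171759")
print(pah_url)
```

Calling fetch_gene("ENSG00000171759") with network access should return a dictionary describing the gene; the exact fields depend on the service and are not assumed here.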

(b) How many protein coding transcripts does this gene have? View all of these in
the transcript comparison view.

6 protein coding transcripts.

(c) What is the OMIM gene identifier for this gene?

The OMIM gene identifier is 612349.

Exercise 7 – Exploring the human genome

(a) What is the current human genome assembly? What are the previous
assemblies?

GRCh38.p13

(b) What is the size of the human genome? How many coding genes are there (primary
assembly)?

3,096,649,726 bp

19,827 coding genes in the primary assembly.

(c) Search for the gene BRCA2. What is the position of the gene? How many
protein coding transcripts does this gene have?

Chromosome 13.

Forward strand, position 32,315,086-32,400,268.

GRCh38:CM000675.2.

Protein coding transcripts: 5.

(d) The same question as question (c), but using the GRCh37 human genome
assembly version.

Chromosome 13.

Forward strand, position 32,889,611-32,973,805.

GRCh37:CM000675.1.

Protein coding transcripts: 3.
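
The BRCA2 coordinates reported for the two assemblies can be compared directly. The sketch below assumes the positions are 1-based and inclusive, as Ensembl displays them; the function names parse_region and span_length are hypothetical.

```python
def parse_region(text: str) -> tuple[int, int]:
    """Parse 'start-end' written with thousands separators into two integers."""
    start, end = (int(part.replace(",", "")) for part in text.split("-"))
    return start, end


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


# Positions copied from the answers to parts (c) and (d)
grch38 = parse_region("32,315,086-32,400,268")
grch37 = parse_region("32,889,611-32,973,805")

print("GRCh38 span:", span_length(*grch38))      # 85183
print("GRCh37 span:", span_length(*grch37))      # 84195
print("Start offset:", grch37[0] - grch38[0])    # 574525
```

The gene start differs by 574,525 bp between the two assemblies, so coordinates from one assembly cannot be reused on the other without conversion. The spans also differ in length, which may reflect differences in the gene annotation between the two releases.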

Exercise 8 – Human population genetics and phenotype data

The SNP rs1738074 in the 5’ UTR of the human TAGAP gene has been identified as a
genetic risk factor for a few diseases.

(a) In which transcripts is this SNP found?

ENST00000326965, ENST00000338313, and ENST00000367066.

(b) What is the least frequent genotype for this SNP in the Yoruba (YRI) population
from the 1000 Genomes phase 3? 

C|C: 0.056 (6)

(c) What is the ancestral allele? Is it conserved in the 90 eutherian mammals
EPO-Extended?

The ancestral allele is T.

It’s conserved.

(d) With which diseases is this SNP associated? Are there any known risk (or
associated) alleles?

Celiac disease, multiple sclerosis, and white blood cell count.

Exercise 9 – Exploring a Coprinopsis cinerea okayama region

(a) Go to the region 7:1400000-1425000 in Coprinopsis cinerea okayama in
Ensembl Fungi.

(b) How many complete genes are found in this region? How many on the forward
and how many on the reverse strand?

7 complete genes: 5 on the forward strand and 2 on the reverse strand.

(c) Zoom in on the largest gene EFI27358. How many exons does this gene have?

23 exons.

(d) Export the cDNA sequence of the transcript variant EFI27358.

EFI27358 cdna:protein_coding
ATGCAGCCAACGCCAGCACCGTCGTCAGCACCTGGTTCCCCTCAACGAACCCAAGCTGAA
CCGGAAATGGAAACACCCTCATATCCACAACCTCCACAGAACGTAGGGACTGCACCATTT
AGCGTCCTGGTCAAACTCTTTGAGAAACTTGCCACGGAACGGAAACAAGAAAGACGAAGA
AAACTTCTGGATGCATGGTTTAGACACTGGCGAAGGGAGAAGGGCTTTGACCTTTACCCA
GTTCTTCGGCTGTTACTACCACAAAAAGACAGAGACCGCGCTGTGTATGGGTTGAAGGAG
AAGAACTTGGCGAAGACCTACATCAAACTCATCCCTCTTGGAATGCGCGACCCAGATGCG
ATCCGCCTCCTGAACTGGAAAAAGCCAACTGAACGCGACGTACTGTGCGAGGTCGTCTCC
AAGCGGTCGTCTGTGATCGAAGGGACTTTGACTATCGACGAACTTAATGAGATCTTGGAC
GATATAGCTAAGAACATGGGCAAATCGGATGTGCAGTCGAAGATTCTGAGGAGGATATAC
AACAATTCGACGGCGGATGAGCAACGGTGGATCATCCGTATTATTCTGAAGGATATGAAC
ATCTCCGTCAAGGAGACCACAGTGTTTGCAGTCTTCCATCCCGATGCACAAGATCTTTAC
AACACCTGCTCGGACCTGAAGAAAGTCGCATGGGAACTTTGGGATCCTTCGCGCCGGCTT
AATGCGAAGGACAAAGAAATCCAAATCTTTCATGCATTCGCTCCTATGTTGTGCAAACGG
CCGACTAGAAAGATAGAAGAGACAGTCAAAGCGATGGGTGGATCCAAATTTATCATTGAG
GAGAAGCTCGATGGTGAACGAATGCAGCTCCATAAGCGAGGCAATGAATACTTCTACTGC
TCACGGAAAGGCAAAGACTATACTTATCTTTATGGAAAGCACATCGGTGCTGGAAGTTTA
ACACCTTTTATCGACTCTGCTTTCGACTCAAGAATTGATGATATCATCCTTGATGGAGAG
ATGCTTGTCTGGGACCCCGTCTCTGAAAGAAACCTTCCATTCGGAACATTGAAGACCGCC
GCACTTGGTAGGTCCAAGAAAGAGAACAACCCCCGACCCTGCTTCAAAGTATTCGACTTG
CTATATCTGAACGGAATGTCCCTTCTCGACAAGACGGTGAAATTTCGTAAGAACAACCTG
CGACACTGCATCAAACCAATCCCTGGCCGGATTGAATTTGTTGAGGAGTACCAAGGCGAA
ACCGCAAATGACATCCGGAAGCGAATGGAGCAAGTCATGGAGAATAGAGGAGAAGGCCTC
GTCATCAAGCATCCCAAGGCAAAATATATCTTGAATGGGAGAAACACCGATTGGATCAAG
GTCAAGCCGGAGTATATGGATAATATGGGTGAAACGGTAGATGTACTCGTCGTTGCCGGA
AACTACGGCAGTGGGAAGAGGGGTGGTGGCGTGTCAACTTTGATTTGTGCCGTCATGGAC
GACCGTCGTCCAGACTCGGACGATGAGCCCAACAGTTTTGTACGTATCGGCACGGGACTG
TCGTTTGCCGACTACGTCTGGGTGAGGAGCAAACCGTGGAAGGTCTGGGACCCTAAGAAT
CCTCCCGAGTTTTTGCAGACTGCGAAGAAAGGTCAGGAGGACAAGGGTGACGTTTATCTG
GAACCAGAAGAGTCCGTAGGCTTAATTTATCGGATACACTCTGATGCTAATGGTGGTTCT
CTTTACTGTGTACGCAGTTCCTTCATTCTGAAAGTCAAGGCAGCGGAAATTACCCCTTCA
GACCAATACCACATGGGATTTACTATGAGGTTCCCCCGTGCATTGGCTATCCGGGATGAT
TTATCTATCGCGGATTGCATGACAGCGACCGGCAAGTATCCCGTATTGGCAGGGATCAGG
ATAACTACAAAGAAACGGAAGACAACAGTAAAGAAGGTCGCACTGTTGCCCGAATACTCT
GGGCCCAACCTGAAGAAAGTCGCTGTCAAAACTGATATATTCAACGGCATGAAGTTTGTC
GTCTTCTCAGATCCCAAGTCACGAACAGGAGAAGCAGACAAGAAGGAGCTCATGAAAACG
ATTCATGCAAATGGTGGAACCTGCTCTCAGATAGTCAATAAGAATTCCGAGGCGATTGTC
ATTTATGGAGGCTCAATAACTCCATATGACCTGAAGTTGGTGATTGATAAAGGGATCCAC
GATGTTATCAAGCCATCGTGGATCACAGATTCTGTGACCCTAGGAGAGCCAGCACCCTTC
AAAAAGAAGTACTTCTTCCACGCCACCGAAGAACGGAAATATGCAGATGAATATAATGAA
GATGACGGCGAGGAAGAGGGGGCGGTACCCAGTGCAGACGAGCAAGAAAGGGACGTTAAA
TCTGGGACTGTCGAGCCCGGGTCTGAGACGGAAGACGAAGACGAAGAGCAGGCTCCGGAA
ATCAAGGAGGAACAGGATGGGGAACTGCACGAATGGTTAAAGGTTGATGACCGGAAATCA
CCAGCGTTGCCCGCTCATGACGAAGAAGATTCTGTAACGGAGGACGATTCCGACAATGCT
GACGTCGCTGACGAGGAAGAACCTGACTTGGACGATTGGTTCCAAGTGAAGGGGGAAACA
GAAGATGAGGGAGCTGGAGCTCTAGCAACAGCATCAAGGCACAGAGAGACAACGCCTGAC
GTCGACGGAGATGTTAAGATGGGGGAGAGTGAAGAGGCTATGGATTACGATCCGGATGTC
ATCTTCAAACACTTCTTTGAAGAAGTCGAAAAACTCATTAAAGATAACGGGGGCAAGATT
GTTGATTTGGACGAGCCAAAGCTAACACACGTTGTGCTTGATAAACGAGATGATAGTCGC
CGGGTTGAGCTCATGAAACGCACATCAAAGTGA
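
An exported sequence like the one above can be sanity-checked before further use. The sketch below assumes the export contains only the coding sequence (the sequence above begins with ATG and ends with the stop codon TGA); read_fasta and check_cds are hypothetical helper names, and the record used here is a toy example, not the EFI27358 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def check_cds(sequence: str) -> dict:
    """Basic sanity checks for a coding sequence."""
    return {
        "length": len(sequence),
        "multiple_of_three": len(sequence) % 3 == 0,
        "starts_with_atg": sequence.startswith("ATG"),
        "ends_with_stop": sequence[-3:] in STOP_CODONS,
        # Codons minus the stop codon; only meaningful if the checks above pass
        "protein_length": len(sequence) // 3 - 1,
    }


# Toy record for illustration only
toy = ">toy cdna:protein_coding\nATGCAGCCA\nACGTGA\n"
header, seq = read_fasta(toy)
print(header, check_cds(seq))
```

Joining the sequence lines before checking matters: a record whose lines were broken during copying still passes, because read_fasta concatenates every line after the header.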
