
Genomics and Proteomics

Madhumitha S
First graded assignment on annotation of the SARS-CoV-2 genome
Semester 7, B.Tech Biotechnology; Reg. No. 121010076
Sequence used in this assignment:

Software: Genscan
Link: http://hollywood.mit.edu/GENSCAN.html

Organism: Vertebrate

Suboptimal exon cutoff (optional): 1.00

Print Options: Predicted peptides only

DNA Sequence: accession number MN908947, version 3
(https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3)
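The same record can also be retrieved programmatically instead of through the browser. The following is a minimal sketch that builds an NCBI E-utilities efetch URL for the accession above. The helper names efetch_url and fetch_fasta are hypothetical, and the download step assumes network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build an efetch URL that returns a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accession: str) -> str:
    """Download the FASTA record (hypothetical helper; needs network access)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")
```

Calling fetch_fasta("MN908947.3") should return the sequence as FASTA text, which can then be pasted into the Genscan or GeneMark form.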

Output:

NO EXONS FOUND AT GIVEN PROBABILITY CUTOFF


Predicted peptide sequence(s):

>/tmp/08_21_20-05:26:32.fasta|GENSCAN_predicted_peptide_1|8673_aa
MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGV
LPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYR
//
SGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQRQ
KKQQTVTLLPAADLDDFSKQLQQSMSSADSTQA

Interpretation:

Genscan predicted a single full-length polypeptide (8673 aa) from this sequence.

Gn.Ex : gene number, exon number

Init = Initial exon (ATG to 5' splice site)

Intr = Internal exon (3' splice site to 5' splice site)

Term = Terminal exon (3' splice site to stop codon)

PlyA = poly-A signal (consensus: AATAAA)

Begin : beginning of exon or signal (numbered on input strand)

End : end point of exon or signal (numbered on input strand)

Len : length of exon or signal (bp)

Fr : reading frame (a forward strand codon ending at x has frame x mod 3)

P : probability of exon (sum over all parses containing exon)
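The frame definition in the legend above can be sketched as a one-line function. The positions used here are hypothetical, chosen only to show the three possible frames; they are not taken from the Genscan output.

```python
def reading_frame(codon_end: int) -> int:
    """Frame of a forward-strand codon whose last base is at codon_end (1-based)."""
    return codon_end % 3


# Hypothetical end positions, one for each possible frame.
for end in (300, 301, 302):
    print(end, reading_frame(end))
```

Exons whose codons end at positions with the same remainder are read in the same frame.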

The exon probability (P) values are low and inconsistent, which means the prediction is not
reliable.

The predicted protein sequence was searched with BLAST against the non-redundant protein sequences database.

This shows that the predicted protein matches the ORF1ab polyprotein of SARS-CoV-2 with 79% query
coverage and 100% identity.

Software: GeneMark
Link: http://exon.gatech.edu/GeneMark/

Program: choose Gene Prediction with Viruses and Phages, then GeneMark.hmm

DNA Sequence: ACCESSION Number MN908947; Version 3

Output: LST

Output Options: Tick all

Output:

GeneMark.hmm PROKARYOTIC (Version 3.26)


Date: Wed Aug 19 15:00:35 2020
Sequence file name: seq.fna
Model file name: GeneMark_hmm_heuristic.mod
RBS: false
Model information: Heuristic_model_for_genetic_code_1_and_GC_38

FASTA definition line: NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
Predicted genes
Gene #  Strand  LeftEnd  RightEnd  Gene Length  Class
     1  +           266     13483        13218      1
     2  +         13810     21555         7746      1
     3  +         21536     25384         3849      1
     4  +         25393     26220          828      1
     5  +         26523     27191          669      1
     6  +         27202     27387          186      1
     7  +         27394     27759          366      1
     8  +         27894     28259          366      1
     9  +         28274     29533         1260      1
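The coordinates in the table above can be cross-checked with a few lines of arithmetic. This sketch assumes that both ends are inclusive 1-based positions and that every predicted gene ends with a stop codon, so the protein length is one less than the number of codons. The names GENES, span_length, and protein_length are illustrative.

```python
# (gene number, left end, right end, reported length), copied from the table.
GENES = [
    (1, 266, 13483, 13218),
    (2, 13810, 21555, 7746),
    (3, 21536, 25384, 3849),
    (4, 25393, 26220, 828),
    (5, 26523, 27191, 669),
    (6, 27202, 27387, 186),
    (7, 27394, 27759, 366),
    (8, 27894, 28259, 366),
    (9, 28274, 29533, 1260),
]


def span_length(left: int, right: int) -> int:
    """Length in nucleotides of an inclusive, 1-based interval."""
    return right - left + 1


def protein_length(nt_length: int) -> int:
    """Amino acids encoded, assuming the last codon is a stop codon."""
    return nt_length // 3 - 1


for gene, left, right, reported in GENES:
    assert span_length(left, right) == reported  # coordinates agree with length
    assert reported % 3 == 0                     # whole number of codons
    print(f"gene_{gene}: {reported} nt -> {protein_length(reported)} aa")
```

For gene 1 and gene 9 this gives 4405 aa and 419 aa, which agrees with the lengths in the GeneMark FASTA headers.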
Predicted gene protein sequences:
>gene_1|GeneMark.hmm|4405_aa|+|266|13483 >NC_045512.2 Severe acute
respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGVLPQLEQPYVFIKRSD
ARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYRK
//

>gene_9|GeneMark.hmm|419_aa|+|28274|29533 >NC_045512.2 Severe acute
respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHG
//
WPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKK
ADETQALPQRQKKQQTVTLLPAADLDDFSKQLQQSMSSADSTQA
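The FASTA headers above pack the gene number, program, length, strand, and coordinates into one pipe-separated token. A minimal sketch of how that token could be split into fields is shown below. The field layout is inferred from the headers in this output only, and the names GeneHeader and parse_header are hypothetical.

```python
from typing import NamedTuple


class GeneHeader(NamedTuple):
    gene_id: str
    program: str
    length: int
    unit: str       # "aa" for protein records, "nt" for nucleotide records
    strand: str
    left_end: int
    right_end: int


def parse_header(line: str) -> GeneHeader:
    """Parse a header such as '>gene_1|GeneMark.hmm|4405_aa|+|266|13483 ...'."""
    # Keep only the first token; the text after the space repeats the
    # definition line of the input sequence.
    token = line.lstrip(">").split()[0]
    gene_id, program, size, strand, left, right = token.split("|")
    length, unit = size.split("_")
    return GeneHeader(gene_id, program, int(length), unit, strand,
                      int(left), int(right))


header = parse_header(
    ">gene_9|GeneMark.hmm|419_aa|+|28274|29533 >NC_045512.2 Severe acute"
)
print(header.gene_id, header.length, header.unit, header.left_end, header.right_end)
```

Parsing the headers this way makes it straightforward to tabulate all nine predictions before submitting them to BLAST.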

Predicted gene nucleotide sequences:

>gene_1|GeneMark.hmm|13218_nt|+|266|13483 >NC_045512.2 Severe acute
respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
ATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGAC
GTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCA
//
>gene_9|GeneMark.hmm|1260_nt|+|28274|29533 >NC_045512.2 Severe acute
respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
ATGTCTGATAATGGACCCCAAAATCAGCGAAATGCACCCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGGC
//
GCCTTACCGCAGAGACAGAAGAAACAGCAAACTGTGACTCTTCTTCCTGCTGCAGATTTGGATGATTTCTCCAAA
CAATTGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGCCTAA

Interpretation:

GeneMark predicted a total of 9 polypeptide sequences.

Each protein sequence was searched with BLAST against the non-redundant protein sequences database.

Gene Number  Matched sequence                                                                 Query cover  % Identity  Accession
Gene_1       ORF1a polyprotein [Severe acute respiratory syndrome coronavirus 2]              100%         100%        YP_009725295.1
Gene_2       orf1ab polyprotein [Severe acute respiratory syndrome coronavirus 2]             100%         100%        QHW06038.1
Gene_3       surface glycoprotein, partial [Severe acute respiratory syndrome coronavirus 2]  100%         99%         CAD0240757.1
Gene_4       Chain A, Protein 3a [Severe acute respiratory syndrome coronavirus 2]            100%         100%        6XDC_A
Gene_5       membrane glycoprotein [Severe acute respiratory syndrome coronavirus 2]          100%         100%        YP_009724393.1
Gene_6       ORF6 protein [Severe acute respiratory syndrome coronavirus 2]                   100%         100%        YP_009724394.1
Gene_7       ORF7a protein [Severe acute respiratory syndrome coronavirus 2]                  100%         100%        YP_009724395.1
Gene_8       ORF8 protein [Severe acute respiratory syndrome coronavirus 2]                   100%         100%        YP_009724396.1
Gene_9       nucleocapsid phosphoprotein [Severe acute respiratory syndrome coronavirus 2]    100%         100%        YP_009724397.2

GeneMark.hmm annotates the SARS-CoV-2 genome more accurately than Genscan.
