
VIETNAM NATIONAL UNIVERSITY

HO CHI MINH INTERNATIONAL UNIVERSITY

BIOINFORMATICS
ASSIGNMENT 2
DATABASE SEARCHING

GROUP 21:
1. Võ Huỳnh Như - BTBTIU18367
2. Cao Sang - BTBTIU18324
3. Lê Thục Đoan Trinh - BTBTIU18254
4. Nguyễn Uyên Y Xuân - BTBTIU18301

Date of submission: October 25, 2020


Question 1: In this problem we will explore the effect of a short
protein query on the BLASTP parameters. Perform a BLASTP
search at NCBI using the following query of just 12 amino acids:
PNLHGLFGRKTG. By default, the parameters are adjusted for
short queries. Inspect the output.
a. What is the E value cut-off?
The E value cut-off is 200000
b. What is the word size?
The word size is 2
c. What is the scoring matrix?
The scoring matrix is PAM30
How do these settings compare to the default parameters?
Parameter     Short-query settings   Default parameters
E-value       200000                 0.05
Word size     2                      6
Matrix        PAM30                  BLOSUM62
Gap costs     Existence: 11          Existence: 9
              Extension: 1           Extension: 1
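To illustrate why a smaller word size helps with a 12-residue query, the sketch below counts the overlapping words that the query contains at word size 2 and at word size 6. The helper name query_words is hypothetical, and the sketch only enumerates exact query words; it does not reproduce the neighborhood-word scoring that BLASTP applies when seeding.

```python
def query_words(query: str, word_size: int) -> list[str]:
    """Return every overlapping word of length word_size in the query."""
    # A query of length L contains L - word_size + 1 overlapping words;
    # range() is empty when the query is shorter than the word size.
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


query = "PNLHGLFGRKTG"  # the 12-residue query from Question 1

short_words = query_words(query, 2)    # word size used for short queries
default_words = query_words(query, 6)  # default word size from the table

print(len(short_words), "words at word size 2")
print(len(default_words), "words at word size 6")
```

With so few residues, the shorter word size gives the search more seeds to start alignments from, which is consistent with the relaxed E-value cut-off listed above.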

Question 2: Use BLAST to search the sequence below in GenBank
with the option of the NR database.
TTTAACTTTAGAGCGAATACCTTTATTCGTTTGATCAGTTCT
AATTACTGCAATTCTACTTCTTCTTTCTTTGCCTGTTCTAGC
AGGGGCTATTACTATACTTTTAACAGATCGAAATTTTAATA
CTTCATTTTTTGACCCTAGAGGGGGAGGAGACCCCATTCTT
TATCAACATCTATTTTGGTTTTTTGGACATCCAGAAGTCTA
CATTCTCATTTTGCCTGGATTTGGTATAATTTCTCATATTAT
TTGCTTTCATACAGGAAAAAAAGAACCATTTGGAAATTTAG
GTATAATTTACGCTATATTAGCTATTGGATTTTTAGGATTTA
TTGTATGAGCCCACCATATATTTACAGTGGGTATAGATGTA
GATACACGAGCATATTTTACAGCTGCTACAATAATTATTGC
CGTTCCTACTGGAATTAAGATTTTCAGATGATTAGCTACTC
TTCATGGGTCTAACATTGAATTTAACTCATCAGTATTATGA
ACACTAGGATTTTTATTTCTATTTACATTAGGAGGACTAAC
AGGAATTATTTTATCAAATTCATCTCTTGATATTATTCTTCA
CGACACTTATTACGTAGTAGCTCATTTTCATTATGTTCTATC
AATAGGAGCAGTTTTTGCAATCATAGGATCAATTACTCATT
GATTTCCCTTGTTTTTTGGTTTAAACATAAACTCCATATGAT
TAAAAATTCAATTTTATACAATATTTATTGGAGTCAATTTAA
CATTTTTCCCACA
a. What is the accession number of the closest match to the query
sequence?
The accession number is ABO75955.1.
b. What is the length of the query sequence?
The length of the query sequence is 759.
c. Is it a nuclear gene or a mitochondrial gene?
The gene is a mitochondrial gene.
d. What is the species identity of the query sequence?
The species to which this gene belongs is Ixodes holocyclus.
e. What is the common name of the species?
Paralysis tick is the common name of the species.
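The query length reported in part b can be checked locally before submitting the sequence. The following is a minimal sketch, assuming the sequence is pasted as a multi-line string; the function names clean_sequence and summarize are hypothetical, and the input shown is a toy sequence, not the assignment query.

```python
from collections import Counter


def clean_sequence(raw: str) -> str:
    """Remove spaces and line breaks, and upper-case the bases."""
    return "".join(raw.split()).upper()


def summarize(raw: str) -> dict:
    """Return the length, base counts and A+T fraction of a pasted sequence."""
    seq = clean_sequence(raw)
    counts = Counter(seq)
    at = counts["A"] + counts["T"]
    return {
        "length": len(seq),
        "counts": dict(counts),
        "at_fraction": at / len(seq) if seq else 0.0,
    }


# Toy input only; paste the full query from Question 2 here to check its length.
toy = """ATGC
AT TA
"""
print(summarize(toy))
```

A high A+T fraction is typical of arthropod mitochondrial DNA, but it is only a rough hint and does not replace the BLAST result used in parts c and d.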

Question 3: Perform a BLASTn search with the accession number
JX989223
a. What is the most informative metric to predict the best alignment
(Score/E-value/Query Cover/Identity)? Explain why it is.
The most informative metrics for predicting the best alignment are the bit
score and the E-value. The E-value (associated with a score S) is the
number of distinct alignments, with a score equivalent to or better than S,
that are expected to occur in a database search by chance. The lower the
E-value, the more significant the match is. The bit score measures sequence
similarity independently of query sequence length and database size and is
normalized based on the raw pairwise alignment score. The higher the bit
score, the more significant the match is. Therefore, we should choose the
hit with the lowest E-value and the highest bit score to obtain the best
alignment.
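The relationship between the two metrics can be sketched with the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score and m and n are the effective lengths of the query and the database. The function name expected_hits and the lengths used below are hypothetical illustration values, not figures taken from the JX989223 search.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2^(-S'), with m = query_len and n = db_len taken as
    effective lengths (an assumption; BLAST applies its own length corrections).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical lengths, chosen only to show the trend.
m, n = 1_000, 1_000_000_000

for bits in (30, 50, 100):
    # Each extra bit halves the E-value for a fixed search space.
    print(bits, expected_hits(bits, m, n))

# The same bit score becomes less significant in a larger database.
print(expected_hits(50, m, 2 * n) / expected_hits(50, m, n))
```

This shows why the bit score can be compared across searches while the E-value cannot: the E-value grows with the size of the search space, whereas the bit score does not depend on it.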
Find the next 3 species that are most similar to Cyprinus carpio
MC1R sequence. List their species names. What do they have in
common in taxonomy?
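The next 3 species that are most similar to the Cyprinus carpio MC1R
sequence are Carassius auratus, Sinocyclocheilus rhinocerous and
Sinocyclocheilus anshuiensis. All three hits have the same E-value (0.0).
In taxonomy, they all belong to the family Cyprinidae (order
Cypriniformes), the same family as Cyprinus carpio.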

Question 4: Use P04637 as the query for a PSI-BLAST search and select
the nr database
Report for P04637
a. Locus name & gene name
The locus as given in the result is XP_018868681. The gene name is
TP53.
b. Type of molecule & its length
The type of molecule is protein and its length is 408 amino acids.
c. Scientific name & common name of organism
Based on results of PSI-BLAST:
The scientific name is Gorilla gorilla gorilla. The common name is western
lowland gorilla.
d. Identify potential name of this protein.
The potential name of this protein is cellular tumor antigen p53 isoform
X1.
