
Assignment 2

Answer Questions 4 and 5 after the lecture on Feb 16.

Question 1: Find the protein sequences of human hemoglobin subunit alpha
(P69905) and human hemoglobin subunit beta (P68871) in the UniProt
database, and use “EMBOSS water” to compute the optimal local alignment of
the two sequences. What are the score and identity of the optimal local alignment
for each of the following similarity matrices: BLOSUM62 and BLOSUM80 (using the
default gap opening penalty of 10 and gap extension penalty of 0.5)? (10 points)

Question 2: Use the BLOSUM matrix construction method to calculate a
substitution matrix based on the following block. (The tables for mutation
frequencies q_{i,j}, the probabilities of amino acids p_i, and the final scoring
table are needed.) (30 points)

DFEE
DEEF
DDDF
EFFF
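
The construction steps can be sketched in Python as a way to check hand-built
tables. This is a minimal sketch under stated assumptions, not necessarily the
exact convention used in lecture: pairs are counted within each column over all
unordered sequence pairs, p_i = q_{i,i} + (sum over j != i of q_{i,j}) / 2,
expected frequencies are p_i^2 on the diagonal and 2 p_i p_j off it, and scores
are 2 * log2(q / e) rounded to the nearest integer. The function name
blosum_from_block is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def blosum_from_block(block):
    """Return (q, p, s) for one ungapped block given as equal-length strings."""
    width = len(block[0])
    assert all(len(row) == width for row in block), "rows must have equal length"

    # Count unordered residue pairs within each column of the block.
    pair_counts = Counter()
    for col in zip(*block):
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    letters = sorted({c for row in block for c in row})

    # Observed pair frequencies q[(i, j)], stored once per unordered pair (i <= j).
    q = {(a, b): pair_counts[(a, b)] / total
         for k, a in enumerate(letters) for b in letters[k:]}

    # Residue probabilities: p_i = q_ii + (sum of q_ij over j != i) / 2.
    p = {a: q[(a, a)]
            + sum(q[tuple(sorted((a, b)))] for b in letters if b != a) / 2
         for a in letters}

    # Log-odds scores in half-bit units (assumed scaling), rounded to integers.
    # A pair that never occurs has no finite log-odds score, so it is left as None.
    s = {}
    for (a, b), q_ab in q.items():
        expected = p[a] ** 2 if a == b else 2 * p[a] * p[b]
        s[(a, b)] = round(2 * log2(q_ab / expected)) if q_ab > 0 else None
    return q, p, s


q, p, s = blosum_from_block(["DFEE", "DEEF", "DDDF", "EFFF"])
for pair in sorted(q):
    print(pair, q[pair], s[pair])
print(p)
```

The q values should sum to 1, and each p_i should equal the fraction of block
positions occupied by that residue, which gives two quick consistency checks.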

Question 3: With the following similarity scoring scheme: score(match) = 1,
score(mismatch) = -1, score(deletion) = -1, score(insertion) = -1, use the local
alignment algorithm (Smith-Waterman algorithm) to fill out a 2-dimensional score
table and use backtracking to find a best LOCAL ALIGNMENT of two sequences
S1 = CGTACGTCT and S2 = ATAGTGA. (The dynamic programming table,
backtracking process and final alignment should be provided.) (30 points)
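
A hand-filled table can be checked against a short Python sketch of the
Smith-Waterman recurrence with a linear gap penalty. The function name
smith_waterman and the tie-breaking order during backtracking (diagonal first,
then a gap in S2, then a gap in S1) are assumptions of this sketch; when several
best local alignments exist, a different tie-breaking order may report a
different but equally good one.

```python
def smith_waterman(s1, s2, match=1, mismatch=-1, gap=-1):
    """Return (best_score, aligned_s1, aligned_s2, table) for a local alignment."""
    rows, cols = len(s1) + 1, len(s2) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay at 0
    best, best_pos = 0, (0, 0)

    # Fill: each cell is the best of 0, diagonal, up (gap in s2), left (gap in s1).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Backtrack from the highest-scoring cell until a cell with score 0 is reached.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2)), H


score, top, bottom, table = smith_waterman("CGTACGTCT", "ATAGTGA")
for row in table:
    print(" ".join(f"{v:2d}" for v in row))
print(score)
print(top)
print(bottom)
```

In this sketch the rows of the table correspond to S1 and the columns to S2;
transposing the table does not change the best score.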

Question 4: Suppose that insertions and deletions are not allowed in sequence
global alignment, and that the scoring scheme of global alignment is:
score(match) = 1 and score(mismatch) = 0. Let X be a letter in random binary
sequences, Prob(X=1) = 0.5, and Prob(X=0) = 0.5.

(1) A query binary sequence 01010010011100 and a database binary sequence
01010010011100 have an alignment

01010010011100
01010010011100

with a similarity score 14. Calculate the p-value for the alignment with score 14.
(7.5 points)

(2) A query binary sequence 01010010011100 and a database binary sequence
01000010011100 have an alignment

01010010011100
01000010011100

with a similarity score 13. Calculate the p-value for the alignment with score 13.
(7.5 points)
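
One common way to set up this calculation, assuming the p-value means the
probability that a random sequence of the same length scores at least as high
as the observed alignment, is a binomial tail: with no gaps, each of the n
positions matches independently with probability p, so the score is
Binomial(n, p). The sketch below follows that assumption; the function name
p_value is hypothetical, and the definition used in lecture should take
precedence if it differs.

```python
from math import comb


def p_value(score, length, p_match):
    """P(S >= score) for S ~ Binomial(length, p_match)."""
    return sum(comb(length, k) * p_match**k * (1 - p_match)**(length - k)
               for k in range(score, length + 1))


# A random binary letter equals a given query letter with probability 0.5.
p_match = 0.5

print(p_value(14, 14, p_match))  # tail probability for score 14 over 14 positions
print(p_value(13, 14, p_match))  # tail probability for score 13 over 14 positions
```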

Question 5: Suppose that insertions and deletions are not allowed in sequence
global alignment, and that the scoring scheme of global alignment is:
score(match) = 1 and score(mismatch) = 0. Let X be a letter in random DNA
sequence, Prob(X='A') = 0.25, Prob(X='C') = 0.25, Prob(X='G') = 0.25, and
Prob(X='T') = 0.25.

(1) A query DNA sequence ACGTACGTA and a database DNA sequence
ACGTAAGTA have an alignment

ACGTACGTA
ACGTAAGTA

with a similarity score 8. Calculate the p-value for the alignment with score 8.
(7.5 points)

(2) A query DNA sequence ACGTACGTA and a database DNA sequence
ACATAAGTA have an alignment

ACGTACGTA
ACATAAGTA

with a similarity score 7. Calculate the p-value for the alignment with score 7.
(7.5 points)
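
For a larger alphabet, the per-position match probability follows from the
letter frequencies: two independent random letters agree with probability equal
to the sum of the squared frequencies. The sketch below derives that
probability for the uniform DNA model and then evaluates the tail probability
exactly with fractions. It assumes the p-value is P(score >= observed) for an
ungapped alignment of independent positions; the names match_probability and
tail_probability are hypothetical.

```python
from fractions import Fraction
from math import comb


def match_probability(freqs):
    """Probability that two independent letters drawn from freqs are equal."""
    return sum(f * f for f in freqs.values())


def tail_probability(score, length, p):
    """Exact P(S >= score) for S ~ Binomial(length, p), with p a Fraction."""
    return sum(comb(length, k) * p**k * (1 - p)**(length - k)
               for k in range(score, length + 1))


dna = {base: Fraction(1, 4) for base in "ACGT"}
p = match_probability(dna)

for observed in (8, 7):
    exact = tail_probability(observed, 9, p)
    print(observed, exact, float(exact))
```

Keeping the result as a fraction avoids rounding and makes it easy to compare
with a hand calculation before converting to a decimal.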
