
Labwork8

Biomedical informatics
University of Ljubljana, Faculty of Electrical
Engineering

dr. Tomaž Vrtovec
Laboratory of Imaging Technologies

Author: Koldo Eizmendi Gallastegui
Registration number: 70081273
Academic year: 2018/2019
University of Ljubljana, Faculty of Electrical Engineering Koldo Eizmendi Gallastegui
Laboratory of Imaging Technologies
Biomedical informatics 2018/2019

LabWork8: Sequence Alignment


The main purpose of this laboratory exercise is to continue working with sequence alignment and to
write a program that finds and displays the optimal alignment.

Question 1

The first question asks for a function that computes the score of the optimal alignment. The
code is shown and explained below.
function oScr=computeScore(iSeqA,iSeqB,iSubS,iSubM,iGapP)
% Score of two aligned sequences of equal length ('-' marks a gap).
oScr=0;
for i=1:length(iSeqA)
    if iSeqA(i)==iSeqB(i)
        % Case 1: equal letters, weight on the diagonal of iSubM.
        idx=strfind(iSubS,iSeqA(i));
        oScr=oScr+iSubM(idx,idx);
    elseif iSeqA(i)=='-' || iSeqB(i)=='-'
        % Case 2: gap in one of the sequences.
        oScr=oScr+iGapP;
    else
        % Case 3: different letters, off-diagonal weight.
        idx=strfind(iSubS,iSeqA(i));
        idx2=strfind(iSubS,iSeqB(i));
        oScr=oScr+iSubM(idx,idx2);
    end
end
end

The inputs of the function are, in order: the optimally aligned sequences a and b (iSeqA and
iSeqB), the symbols in the selected order (iSubS, for instance 'AGCT'), the substitution matrix S
(iSubM) and the gap penalty (iGapP). The output (oScr) is the score of the optimal alignment.
The score is first initialised to 0. The two sequences are then compared letter by letter in a loop
that runs over the length of iSeqA (since the aligned sequences have the same length, the length of
iSeqB could have been used as the upper limit instead).
Three possible scenarios are distinguished in the comparison:
1.- iSeqA(i) and iSeqB(i) are equal.
If both letters are equal, idx is set to the position of that letter in iSubS. Its weight is then
found on the diagonal of iSubM, at (idx,idx).
2.- iSeqA(i) and iSeqB(i) are different and one of them is '-'.
This is the simplest scenario: oScr is updated by adding iGapP.

3.- iSeqA(i) and iSeqB(i) are different and neither of them is '-'.
Two indices are needed in this case, because the weight of a mismatch depends on which two letters
are involved. idx is the position of the first letter in iSubS and idx2 is the position of the second
letter in iSubS. The weight of the mismatch is then iSubM(idx,idx2).
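
The three scenarios can be checked on a small example. The following is a minimal sketch, not the function above: it scores the alignment reported in Question 2 in vectorised form, treating every column that contains '-' as a gap. The substitution weights (match 2, mismatch -1) and the gap penalty (-2) are assumptions chosen for illustration, since the values of S and P are not listed in this report.

```matlab
% Assumed values for illustration only (S and P are not listed in the report):
% match = 2, mismatch = -1, gap penalty = -2.
subS = 'ACGT';
subM = 3 * eye(4) - 1;      % 2 on the diagonal, -1 elsewhere
gapP = -2;

% Aligned sequences reported in Question 2.
seqA = '--ACA';
seqB = 'CGACT';

% Columns in which either sequence has a gap.
isGap = (seqA == '-') | (seqB == '-');

% Row and column indices into subM for the remaining columns.
[~, idxA] = ismember(seqA(~isGap), subS);
[~, idxB] = ismember(seqB(~isGap), subS);

% Substitution weights plus one gap penalty per gapped column.
score = sum(subM(sub2ind(size(subM), idxA, idxB))) + gapP * sum(isGap);
disp(score)
```

Under these assumed values the sketch gives -1, which agrees with the score reported for Question 2.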

Question 2

The second question asks for the optimal alignment of the DNA sequences a = 'ACA' and
b = 'CGACT', using the substitution matrix S and the gap penalty P.
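
A minimal sketch of how such a score matrix might be filled by dynamic programming is given here. It is not the program used for this report. It assumes match 2, mismatch -1 and gap penalty -2 (values not stated in the report), and it keeps the first row and column at zero, as in the matrices reported in this document.

```matlab
% Assumed values (not stated in the report): match 2, mismatch -1, gap -2.
subS = 'ACGT';
subM = 3 * eye(4) - 1;
gapP = -2;

seqA = 'ACA';      % one row per symbol of a
seqB = 'CGACT';    % one column per symbol of b
[~, idxA] = ismember(seqA, subS);
[~, idxB] = ismember(seqB, subS);

% First row and first column stay at zero.
scrM = zeros(length(seqA) + 1, length(seqB) + 1);
for i = 2:size(scrM, 1)
    for j = 2:size(scrM, 2)
        fromDiag = scrM(i-1, j-1) + subM(idxA(i-1), idxB(j-1)); % align a with b
        fromUp   = scrM(i-1, j) + gapP;                         % gap in b
        fromLeft = scrM(i, j-1) + gapP;                         % gap in a
        scrM(i, j) = max([fromDiag, fromUp, fromLeft]);
    end
end
disp(scrM)
```

Recording which of the three candidates gave the maximum in each cell (D, U or L) would yield a trace matrix of the kind shown in this report; how ties are broken is left open in this sketch.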

Score Matrix

scrM =

   0   0   0   0   0   0
   0  -1  -1   2   0  -1
   0   2   0   0   4   2
   0   0   1   2   2   3

Trace Matrix

trcM =

XLLLLL
UDDDLD
UDLUDL
UUDDUD

Optimal trace
seqAopt =
--ACA
seqBopt =
CGACT
Score of the optimal alignment
score =
-1
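
The step from the trace matrix to the aligned sequences can be sketched as follows. This is an illustrative reconstruction, not the program used for this report: it starts in the bottom-right cell of the trace matrix reported above and follows the letters back to 'X', reading D as "align both symbols", U as "symbol of a against a gap" and L as "gap against symbol of b".

```matlab
seqA = 'ACA';
seqB = 'CGACT';
% Trace matrix as reported for Question 2.
trcM = ['XLLLLL'; 'UDDDLD'; 'UDLUDL'; 'UUDDUD'];

% Start in the bottom-right cell and walk back to 'X'.
i = size(trcM, 1);
j = size(trcM, 2);
seqAopt = '';
seqBopt = '';
while trcM(i, j) ~= 'X'
    switch trcM(i, j)
        case 'D'    % diagonal: align a symbol of a with a symbol of b
            seqAopt = [seqA(i-1), seqAopt];
            seqBopt = [seqB(j-1), seqBopt];
            i = i - 1;
            j = j - 1;
        case 'U'    % up: symbol of a against a gap in b
            seqAopt = [seqA(i-1), seqAopt];
            seqBopt = ['-', seqBopt];
            i = i - 1;
        case 'L'    % left: gap in a against a symbol of b
            seqAopt = ['-', seqAopt];
            seqBopt = [seqB(j-1), seqBopt];
            j = j - 1;
    end
end
disp(seqAopt)
disp(seqBopt)
```

For the trace matrix above this walk yields --ACA and CGACT, the alignment reported for this question.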

Question 3

The third question asks for the optimal alignment of the DNA sequences a =
'CTCTAGCATTAG' and b = 'GTGCACCCA', using the substitution matrix S and the gap penalty P.

Score Matrix

scrM =

   0   0   0   0   0   0   0   0   0   0
   0  -1  -1  -1   2   0   2   2   2   0
   0  -1   1  -1   0   1   0   1   1   1
   0  -1  -1   0   1  -1   3   2   3   1
   0  -1   1  -1  -1   0   1   2   1   2
   0  -1  -1   0  -2   1  -1   0   1   3
   0   2   0   1  -1  -1   0  -2  -1   1
   0   0   1  -1   3   1   1   2   0  -1
   0  -1  -1   0   1   5   3   1   1   2
   0  -1   1  -1  -1   3   4   2   0   0
   0  -1   1   0  -2   1   2   3   1  -1
   0  -1  -1   0  -1   0   0   1   2   3
   0   2   0   1  -1  -2  -1  -1   0   1

Trace Matrix

trcM =
XLLLLLLLLL
UDDDDLDDDL
UDDLUDUDDD
UDUDDDDDDL
UDDLDDUDDD
UDUDDDDDDD
UDLDDUDDDU
UUDDDLDDDU
UDDDUDLLDD
UDDLDUDDDD
UDDDDUDDDD
UDUDDDDDDD
UDLDDDDDDD
Optimal trace
seqAopt =
CTCTAGCATTAG
seqBopt =
--GT-GCACCCA
Score of the optimal alignment
score =
-3

Question 4

This question asks for the optimal alignment of the DNA sequences a = 'ACA' and b = 'CGACT',
using the substitution matrix S* and the gap penalty P*.

Score Matrix

scrM =

   0   0   0   0   0   0
   0   0   1   2   2   2
   0   2   2   2   4   4
   0   2   3   4   4   4

Trace Matrix

trcM =

XLLLLL
ULDDLL
UDLLDL
UUDDLL

Optimal trace
seqAopt =
AC-A--
seqBopt =
-CGACT

Score of the optimal alignment


score =

Question 5
This question asks for the optimal alignment of the DNA sequences a = 'CTCTAGCATTAG'
and b = 'GTGCACCCA', using the substitution matrix S* and the gap penalty P*.

Score Matrix

scrM =
   0   0   0   0   0   0   0   0   0   0
   0   0   1   1   2   2   2   2   2   2
   0   0   2   2   2   2   3   3   3   3
   0   0   2   2   4   4   4   5   5   5
   0   0   2   2   4   4   5   5   6   6
   0   1   2   3   4   6   6   6   6   8
   0   2   2   4   4   6   6   6   6   8
   0   2   3   4   6   6   8   8   8   8
   0   2   3   4   6   8   8   8   8  10
   0   2   4   4   6   8   9   9   9  10
   0   2   4   4   6   8   9  10  10  10
   0   2   4   5   6   8   9  10  10  12
   0   2   4   6   6   8   9  10  10  12
Trace Matrix

trcM =
XLLLLLLLLL
ULDLDLDDDL
ULDLDLDDDL
ULULDLDDDL
ULDLULDDDL
UDUDUDLLLD
UDLDLULLLU
UUDUDLDDDL
UUUDUDLLLD
UUDLUUDDDU
UUDLUUDDDL
UUUDUDUULD
UDUDLUUULU
Optimal trace
seqAopt =
CTC-TAGCA-TTAG
seqBopt =
---GT-GCACCCA-
Score of the optimal alignment
score =
12
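
The reported score can be cross-checked against the aligned sequences. The sketch below is only an illustration under an assumed form of S* and P*, which are not listed in this report: match 2, transition (A with G, or C with T) 1, transversion 0, and gap penalty 0. The names isPurine and sameClass are introduced here for the example.

```matlab
% Assumed S* and P* (not listed in the report): match 2, transition 1,
% transversion 0, gap penalty 0.
subS = 'ACGT';
isPurine = ismember(subS, 'AG');               % A and G are purines
sameClass = bsxfun(@eq, isPurine', isPurine);  % true if both purine or both pyrimidine
subM = double(sameClass) + eye(4);             % 2 on the diagonal, 1 within a class, 0 otherwise
gapP = 0;

% Aligned sequences reported for Question 5.
seqA = 'CTC-TAGCA-TTAG';
seqB = '---GT-GCACCCA-';

% Score gapped columns with gapP and the remaining columns with subM.
isGap = (seqA == '-') | (seqB == '-');
[~, idxA] = ismember(seqA(~isGap), subS);
[~, idxB] = ismember(seqB(~isGap), subS);
score = sum(subM(sub2ind(size(subM), idxA, idxB))) + gapP * sum(isGap);
disp(score)
```

With these assumed values the alignment scores 12, in agreement with the result above; a different S* or P* would of course give a different value.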
