
1. 6
2. 5
3. 8582 - 328 - 1 = 8253
4. (328 - 1 + 1) + (8689 - 8582 + 1) + (26872 - 26812 + 1) + (28696 - 28589 + 1) + (30617 - 30581 + 1) = 642
5. 36166 - 35306 = 860
6. (31086 - 30618 + 1) + (35306 - 34510 + 1) = 1266
1266 / 3 (for amino acid codons) - 1 (for stop) = 421

7. Look vertically: names should ideally match up, as long as a sequence of choices can match up with one question's seq.

[Sketch: purines (A, G) and pyrimidines (C, T). Transitions: A <-> G, C <-> T. Transversions: purine <-> pyrimidine (4). Hydrogen bonds: CG 3, AT 2. Outparalogs: duplication before speciation.]
8. Orthologs
9. Inparalogs
10. Black widow -> co-orthologs (A and B, diff. species)
11 .
2

12 .
I

13. C, D, E
14. B
71 %
JaveragSee
100
:
:

15 : irens + Jerome

enetwanepe :

54 !
16 .
L

17 .
(
18 .
E

!
21 .
B
22 .
C+ B

Homo sapiens + Chordata + deuterostomes
23 .

[Sketch: purines (A, G) vs. pyrimidines; transversions]
24 6 .

25 .

Edge I
Go from
longest
:

0 3 + 0 24
.
.

0 2+0
. .
2

=
0 .

45
will end at
up
edge z .
sign te
Template
ddC
BIO 312 - Bioinformatics
Final Exam Study Guide

Casey Ngai
Updated Dec 1, 2021
Given a Genbank formatted gene
annotation of gDNA, calculate from the
provided coordinates:
- The number of introns and exons
- The size of the 5' and 3' UTRs
- The length of the CDS
- The length of the encoded protein
(added by Dr. Rest, 12/9)
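The four calculations listed above can be sketched in Python. This is only a minimal sketch, assuming a plus-strand gene and 1-based inclusive coordinates (GenBank style). The function names and the example coordinates are hypothetical and are not taken from any exam question.

```python
def segment_length(start, end):
    """Length of a 1-based, inclusive interval (GenBank style)."""
    return end - start + 1


def gene_summary(exons, cds):
    """Summarize a plus-strand gene from mRNA exon and CDS (start, end) pairs."""
    exons, cds = sorted(exons), sorted(cds)
    cds_start, cds_end = cds[0][0], cds[-1][1]
    # Introns are the gaps between consecutive exons.
    introns = [(a_end + 1, b_start - 1)
               for (_, a_end), (b_start, _) in zip(exons, exons[1:])]
    # UTRs are the exonic bases that fall outside the CDS.
    utr5 = sum(segment_length(s, min(e, cds_start - 1))
               for s, e in exons if s < cds_start)
    utr3 = sum(segment_length(max(s, cds_end + 1), e)
               for s, e in exons if e > cds_end)
    cds_length = sum(segment_length(s, e) for s, e in cds)
    return {
        "exons": len(exons),
        "introns": len(introns),
        "intron_lengths": [segment_length(s, e) for s, e in introns],
        "utr5": utr5,
        "utr3": utr3,
        "cds_length": cds_length,
        # Divide by 3 for codons, then subtract 1 because the stop codon
        # does not encode an amino acid.
        "protein_length": cds_length // 3 - 1,
    }


# Hypothetical coordinates, for illustration only.
example_exons = [(1, 100), (201, 300), (401, 600)]
example_cds = [(51, 100), (201, 300), (401, 490)]
print(gene_summary(example_exons, example_cds))
```

For a minus-strand gene the 5' and 3' labels would swap, which this sketch does not handle.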
Possible Exam Question

Possible Exam Question:


Given pictures of 2 trees,
“Which is a gene tree and which is a species tree?”

Main purpose of BLAST: to find homologs


Good alignments = homologs
Bad alignments = not homologs
Explicitly Stated
- Know how to read and draw unrooted trees
Rest adds 12/9: this is for a given unrooted tree with 4 tips.
Consider what the answer is for each tree with 5 tips, or 6 tips?
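The slide does not say which quantity "this" refers to. Two standard counts for a fully bifurcating unrooted tree with n tips are the number of branches, 2n - 3 (also the number of places a root could be put), and the number of distinct unrooted topologies, (2n - 5)!!. A minimal sketch, assuming one of these is the quantity meant (function names are hypothetical):

```python
def branches_unrooted(n_tips):
    """Branches in a bifurcating unrooted tree; each is a possible root position."""
    return 2 * n_tips - 3


def unrooted_topologies(n_tips):
    """Number of bifurcating unrooted topologies: (2n - 5)!! = 1 * 3 * ... * (2n - 5)."""
    count = 1
    for k in range(3, 2 * n_tips - 4, 2):
        count *= k
    return count


for tips in (4, 5, 6):
    print(tips, branches_unrooted(tips), unrooted_topologies(tips))
```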
Explicitly Stated

- Know definitions of:
  - Orthologs (one-to-one)
  - Co-orthologs
  - Outparalogs
  - Inparalogs
Explicitly Stated for Exam
- Solving these problems:

3' AUGACCGUA 5' mRNA
5' TACTGGCAT 3' template
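Strand-conversion problems of this kind can be checked with a few lines of Python. A minimal sketch, assuming the template strand is given 5'->3'; the function names and the example template are hypothetical:

```python
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3):
    """Return the mRNA (5'->3') made from a DNA template strand given 5'->3'."""
    # The mRNA is complementary and antiparallel to the template,
    # so the template is read starting from its 3' end.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5_to_3))


def coding_strand(template_5_to_3):
    """The coding strand (5'->3') matches the mRNA, with T in place of U."""
    return transcribe(template_5_to_3).replace("U", "T")


template = "TTACATCAT"  # hypothetical template strand, written 5'->3'
print("mRNA   5'-" + transcribe(template) + "-3'")
print("coding 5'-" + coding_strand(template) + "-3'")
```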
Explicitly Stated Question for Exam
Sum: 150
150 / 2 = 75
Explicitly Stated

- Know how to solve:
  - Needleman-Wunsch
  - Nussinov-Jacobson
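For practice, a Needleman-Wunsch matrix fill and traceback can be sketched as below. This is a minimal sketch, not the course's worked solution: the simple scoring scheme (match +1, mismatch -1, gap -1) is an assumption, and ties in the traceback are broken in favor of the diagonal.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_seq1, aligned_seq2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row hold cumulative gap penalties.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell is the best of diagonal (match/mismatch), up (gap), left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell to the top-left cell.
    aligned1, aligned2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                aligned1.append(seq1[i - 1])
                aligned2.append(seq2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned1)), "".join(reversed(aligned2))


best, top, bottom = needleman_wunsch("GATTACA", "GCATGCU")
print(best)
print(top)
print(bottom)
```

Several alignments can share the best score, so the traceback printed here is one optimal alignment, not necessarily the only one.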
Needleman-Wunsch

[Figure: Needleman-Wunsch scoring matrix and traceback]
CTACT-
TTCT
Explicitly Stated for Exam
- How to solve:

“May ask on the exam” continued
“May ask on the exam”
G and U can bind.
Nussinov-Jacobson

[Figure: Nussinov-Jacobson worked example; sequence fragment CGAACA]
Explicitly Stated

- 4th rule problem
- Use when:
  - Can only use it in the 6th column and onwards
    - Because otherwise you'd be too close to the 0 diagonal
  - The # 2 to the right from the diagonal, the # 3 below
  - Add two values together
  - Largest # (including the numbers from the original matrix) is the final value of the matrix
  - Apply the 4th rule as many times as you can
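The matrix fill, including the step that adds two values together, can be sketched as below. This is a minimal sketch of a standard Nussinov-style recursion, not the slide's exact procedure. The rule numbering in the comments, the allowed pairs (GC, AU, GU), and the `min_loop` parameter are assumptions.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=0):
    """Maximum number of base pairs in `seq` under a Nussinov-style recursion."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    # Fill diagonals outward from the main (all-zero) diagonal.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Rules 1 and 2: leave position i or position j unpaired.
            value = max(best[i + 1][j], best[i][j - 1])
            # Rule 3: pair i with j and add the best of the enclosed region.
            if (seq[i], seq[j]) in PAIRS and j - i > min_loop:
                inner = best[i + 1][j - 1] if i + 1 <= j - 1 else 0
                value = max(value, inner + 1)
            # Rule 4 (bifurcation): split into two parts and add their values.
            for k in range(i + 1, j):
                value = max(value, best[i][k] + best[k + 1][j])
            best[i][j] = value
    return best[0][n - 1]


print(nussinov("GCAU"))       # needs the bifurcation rule to reach its maximum
print(nussinov("GGGAAACCC"))
```

In `GCAU`, pairing the two ends (G with U) gives only one pair, while splitting into `GC` + `AU` gives two, which is why the fourth rule has to be tried.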


30 to 40 questions; 20 min of UPGMA
(will be verified on Monday);
All multiple choice
- Relationship between DNA, RNA, and proteins
- Structure of genes
- Calculating length of introns and exons
- Genetic code
- Needleman-Wunsch
- Nussinov-Jacobson
- UPGMA
- Orthologs and paralogs
- Reconciliation
- Commands
  - ls, cd, less, cat
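Since UPGMA takes a sizeable share of the exam, here is a minimal sketch for checking hand calculations. It assumes the usual size-weighted average linkage and places each new node at half the distance between the merged clusters; the function name and the example distances are hypothetical.

```python
def upgma(names, distances):
    """Cluster `names` by UPGMA.

    `distances` maps (a, b) name pairs to distances. Returns the tree as a
    Newick-like string and the merges as (cluster_a, cluster_b, node_height).
    """
    size = {name: 1 for name in names}
    dist = {frozenset(pair): value for pair, value in distances.items()}
    tree = names[0]
    merges = []
    while len(size) > 1:
        # Join the two closest clusters.
        closest = min(dist, key=dist.get)
        a, b = sorted(closest)
        tree = f"({a},{b})"
        merges.append((a, b, dist[closest] / 2))
        # Distance to every other cluster: average weighted by cluster size.
        for other in size:
            if other in (a, b):
                continue
            dist[frozenset((tree, other))] = (
                size[a] * dist[frozenset((a, other))]
                + size[b] * dist[frozenset((b, other))]
            ) / (size[a] + size[b])
        # Drop all distances that involve the two merged clusters.
        dist = {pair: d for pair, d in dist.items()
                if a not in pair and b not in pair}
        size[tree] = size.pop(a) + size.pop(b)
    return tree, merges


# Hypothetical distance matrix, for illustration only.
example = {("A", "B"): 2, ("C", "D"): 4,
           ("A", "C"): 8, ("A", "D"): 8, ("B", "C"): 8, ("B", "D"): 8}
print(upgma(["A", "B", "C", "D"], example))
```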
Explicitly Stated: Using Simple Scoring
Schemes
Lecture 8 (9/20)
“Not going to really need to know this
until you come to the final exam”
Good luck everyone! :D
[Diagram: coding strand, template strand, and mRNA with codons and anti-codons; exons and introns, CDS vs. non-coding regions]

1 codon -> 3 nucleotides.


pn.in

RNA Primerase

ribosome
CG have 3 H bonds, so they have a higher melting temp.
AT: 2 hydrogen bonds
CG: 3 hydrogen bonds

Wobble: G to U

- 20,000 protein-coding genes in the human genome.
- Less than 2% of the human genome is protein-coding.
- 64 codons
- 20 amino acids
- 3 stop codons
Central Dogma
Stop codons: TAA, TAG, TGA (DNA); UAA, UAG, UGA (RNA)
Coding sequence (CDS): set of open reading frames (ORFs) that encodes a protein
Expect one stop codon every 64/3 = 21 codons
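The stop-codon facts above can be checked with a short scan of one reading frame. A minimal sketch; the function names and the example sequence are hypothetical, and the 64/3 spacing only holds for random sequence in which all codons are equally likely.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, frame=0):
    """Split a DNA sequence into complete codons, starting at `frame` (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def first_stop_index(seq, frame=0):
    """Index (in codons) of the first stop codon in the frame, or None."""
    for index, codon in enumerate(codons(seq, frame)):
        if codon in STOP_CODONS:
            return index
    return None


# 3 of the 64 codons are stops, so random sequence hits one about every 64/3 codons.
expected_spacing = 64 / 3

example = "ATGAAATAGCCC"  # hypothetical sequence: ATG AAA TAG CCC
print(codons(example))
print(first_stop_index(example))
print(round(expected_spacing, 1))
```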
Intron: look at mRNA
5' UTR: from mRNA start to CDS start
3' UTR: from CDS end to mRNA end
* CDS does not include UTR
UTR: before start codon (5') and after stop codon (3')


ls: list / see files
cd: change directory
cp: copy
wc: # of words / # of lines
|: pipe output from one command as input to next
grep: search for patterns

GFF: gene finding format
FASTA
Bedtools: Swiss army knife for genomic analysis
Genome tools: similar to Bedtools
EMBOSS: sequence alignment
Homologs: gene copies that share a common ancestor
Paralogs: result from duplication
Orthologs: result from speciation
Xenologs: result from horizontal gene transfer
Use BLAST to find homologs
- Nucleotide BLAST: search nucleotide database w/ nucleotide query
  - blastn, megablast, discontiguous megablast
- Protein BLAST: search protein database w/ protein query
  - blastp, psi-blast, phi-blast
- blastx: search protein database w/ translated nucleotide query
- tblastn: search translated nucleotide database w/ protein query
- tblastx: search translated nucleotide database w/ translated nucleotide query

HSPs: high-scoring segment pairs
Max score: highest alignment score between query and hit
Total score: sum of HSP scores
Query coverage: amount of query seq. (%) that overlaps the subject sequence
Max identity: highest % identity for a set of aligned seq. to the same subject seq.
E-value: tells which scores are "good scores"
* Higher max score and lower E-value: better the alignment
Identity: two aligned residues are identical
% identity: % of residues that are identical between aligned seq.
- simplest way of scoring an alignment
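Percent identity is easy to compute once two sequences are aligned. A minimal sketch, assuming gaps are written as "-" and that the denominator is the full alignment length (other conventions, such as dividing by the shorter sequence, also exist); the function name and example are hypothetical.

```python
def percent_identity(aligned1, aligned2):
    """Percent of alignment columns in which both sequences have the same residue."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as identical only if neither side is a gap.
    identical = sum(a == b and a != "-" for a, b in zip(aligned1, aligned2))
    return 100 * identical / len(aligned1)


print(round(percent_identity("ACGT-A", "ACCTGA"), 2))
```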
Rule: almost always insert gaps in groups of 3 in protein-coding nucleotide sequences.

Transitions: purine <-> purine (A <-> G) or pyrimidine <-> pyrimidine (C <-> T)
Transversions: purine <-> pyrimidine (4 possible pairs)
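The transition/transversion distinction can be expressed as a small classifier. A minimal sketch for DNA bases only; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(base1, base2):
    """Classify a DNA base change as identical, a transition, or a transversion."""
    for base in (base1, base2):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if base1 == base2:
        return "identical"
    # Same chemical class (purine-purine or pyrimidine-pyrimidine): transition.
    if {base1, base2} <= PURINES or {base1, base2} <= PYRIMIDINES:
        return "transition"
    # Purine <-> pyrimidine: transversion.
    return "transversion"


for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(pair, classify_substitution(*pair))
```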

Alignment score: log likelihood ratio of the alignment

PAM: % Accepted Mutations
-> PAM-1: one substitution per 100 residues
-> PAM-200: 200 mutations, on avg., per 100 resid.

BLOSUM: BLOCKS Substitution Matrix
-> 30% identity groups: distantly related seq. are emphasized
-> 100% identity groups: closely related seq. are emphasized
Newick format example: ((Lizard, Frog), (Human, Dog));
ASCII formatted graphical tree example

Bootstrap: randomly sample all positions (columns in an alignment) with replacement. Some columns can be repeated, but the # of positions is conserved.
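The resampling step can be sketched as follows. This is a minimal sketch of building one bootstrap replicate only (tree building on each replicate is not shown); the function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng=None):
    """Return one bootstrap replicate of an alignment (list of equal-length strings)."""
    rng = rng or random.Random()
    n_columns = len(alignment[0])
    # Draw column indices with replacement; the number of columns stays the same.
    sampled = [rng.randrange(n_columns) for _ in range(n_columns)]
    # Apply the same sampled columns to every sequence so columns stay intact.
    return ["".join(seq[col] for col in sampled) for seq in alignment]


toy_alignment = ["ACGT", "ACGA", "TCGA"]  # hypothetical alignment
for row in bootstrap_alignment(toy_alignment, random.Random(0)):
    print(row)
```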


Domains: functional elements of proteins
- structural modules
- domains are repeatedly found in diverse proteins
Domain family: group of proteins that share a domain sequence/structure
- Domains are classified into families related by common ancestry (homology).
·

Sanger sequencing: form of DNA synthesis
Microarray: visualized as heatmap
- Disadvantages

Needleman-Wunsch: performs global alignment on two sequences.

Alignment -> pairwise distance
Hamming distance: D = n / N
- n = # of sites with differences
- N = # of nucleotides
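The distance above can be computed directly from two aligned sequences of equal length. A minimal sketch; the function names and the example sequences are hypothetical. The raw count of differing sites is returned separately from the proportion n / N.

```python
def count_differences(seq1, seq2):
    """Number of sites at which two equal-length sequences differ (n)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def distance(seq1, seq2):
    """D = n / N: differing sites divided by the number of nucleotides compared."""
    return count_differences(seq1, seq2) / len(seq1)


print(count_differences("ACGTACGT", "ACGAACGA"))
print(distance("ACGTACGT", "ACGAACGA"))
```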
RNA folding -> GC, AU, GU
