
Introduction to Bioinformatics

English Courses for Graduate Students


Dr. rer. nat. Jing Gong


Cancer Research Center
School of Medicine, Shandong University
2011.9.14


Chapter 1
Introduction


About me
• Dr. rer. nat. Jing Gong
• Bachelor's degree in Marine Biology at the Ocean University of China (formerly Qingdao Ocean University)
• Bachelor's, Master's and doctoral degrees in Bioinformatics at the Ludwig-Maximilians-Universität München, Germany
• Affiliation: Cancer Research Center of SDU
• Tel: 0531-88380202
• Email: gongjing@sdu.edu.cn
• Office: Dianjing Building, Rm. 106, Baotuquan Campus


About this course


• Schedule: 2011/9/14 - 2011/10/12, Wednesdays 14:00 - 18:00
• Location: Building 8, first floor, west, Computer Pool
• Homepage: http://1.51.212.243/bioinfo.html
• Table of Contents:
Chapter 1: Introduction
Chapter 2: Databases
Chapter 3: Alignment
Chapter 4: Structure
Chapter 5: Tree


Literature:
1. Jeremy Ramsden, Bioinformatics: An Introduction, 2nd edition, Springer, 2009.
2. Jean-Michel Claverie and Cedric Notredame, Bioinformatics For Dummies, 2nd edition, Wiley, 2007.


Information Page (Chapter 1, 2011/9/14)
Dr. rer. nat. Jing Gong
Affiliation: Cancer Research Center of SDU
Tel: 0531-88380202
Email: gongjing@sdu.edu.cn
Office: Dianjing Building, Rm. 106, Baotuquan Campus
Schedule: 2011/9/14 - 2011/10/12, Wednesdays 14:00 - 18:00
Place: Building 8, first floor, west, Computer Pool
Course Homepage: http://1.51.212.243/bioinfo.html
PubMed: http://www.ncbi.nlm.nih.gov/entrez/
ExPASy: http://expasy.org/
NCBI: http://www.ncbi.nlm.nih.gov/
PIR: http://pir.georgetown.edu

Vocabulary List (Chapter 1, 2011/9/14)
FASTA: FASTA (pronounced FAST-Aye) stands for FAST-ALL, reflecting the fact that it can be used for fast protein comparison or fast nucleotide comparison. The program ……
BLAST: Basic Local Alignment Search Tool. A sequence comparison algorithm optimized for speed, used to search sequence databases for the best local alignments to a query, using substitution matrices and the query sequence ……
Alignment: The result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid similarity. Sequence alignment is used to determine ……


What is Bioinformatics?

biophysics
biohazards
biometrics
biomathematics

biochemistry
bioterrorism

biopotato
bioinformatics


What is Bioinformatics? Interdisciplinary

a biology/medical researcher, just like you
a professional in the pharmaceutical industry
a policeman worrying about DNA testing
a computer scientist developing bio-databases
a consumer concerned about GMOs (Genetically Modified Organisms)
……

What is Bioinformatics?
Definition:
Bioinformatics – the science of collecting and analyzing complex
biological data such as genetic codes. [Oxford Dictionary]
Bioinformatics – the computational branch of molecular biology.
[Bioinformatics for Dummies]

Bioinformatics – the application of computer science and information technology to the field of biology and medicine. [Wikipedia]
Bioinformatics – the science of how information is generated, transmitted, received, and interpreted in biological systems, i.e., the application of information science to biology. [Bioinformatics: An Introduction]
A formal definition?


History of Bioinformatics
In 1809, French biologist Jean Baptiste Lamarck published “Philosophie
Zoologique”. Lamarck stressed two main themes in his biological work:
1. The environment gives rise to changes in animals, i.e. changes
through use and disuse.
2. Life is structured in an orderly manner, and the many different parts of all bodies make the organic movements of animals possible.

Examples: “blind as a mole”, “show your teeth”, “birds have no teeth?”
Jean Baptiste Lamarck
(1744-1829)

In 1859, English naturalist Charles Darwin published “On the Origin of
Species by Means of Natural Selection, or the Preservation of Favoured
Races in the Struggle for Life”.

Charles Darwin
(1809-1882)


In 1866, Austrian scientist Gregor Mendel demonstrated that the inheritance of certain traits in pea plants follows particular patterns, now referred to as the laws of “Mendelian Inheritance”.

Gregor J. Mendel
(1822-1884)

In 1869, Swiss physician and biologist Friedrich Miescher isolated DNA from white blood cells in Felix Hoppe-Seyler's laboratory at the University of Tübingen, Germany.

Nuclei → Nuclein → Nucleic acid → DNA

Friedrich Miescher
(1844-1895)

Thomas Hunt Morgan, an American geneticist, is famous for his experimental research with the fruit fly, through which he established the chromosome theory of heredity. He showed that genes are linked in a series on chromosomes and are responsible for identifiable, hereditary traits. Morgan’s work played a key role in establishing the field of genetics. He received the Nobel Prize in Physiology or Medicine in 1933.

Thomas H. Morgan
(1866-1945)
Nobel Prize 1933

In 1944, American physician and medical researcher Oswald Avery and
his co-workers Colin MacLeod and Maclyn McCarty demonstrated that
DNA is the material of which genes and chromosomes are made.

In the experiment, they destroyed the lipids, ribonucleic acids, carbohydrates, and proteins of the bacterial extract, and transformation still occurred. When they then destroyed the deoxyribonucleic acid, transformation no longer occurred.

Oswald Avery (1877-1955)
Colin MacLeod (1909-1972)
Maclyn McCarty (1911-2005)
In 1950, American biochemist Erwin Chargaff noticed a pattern in the amounts of the four bases: adenine (A), thymine (T), cytosine (C), and guanine (G). He discovered that the amounts of adenine (A) and thymine (T) in DNA were roughly the same, as were the amounts of cytosine (C) and guanine (G). This later became known as Chargaff's rule.

%A = %T and %G = %C

Erwin Chargaff
(1905-2002)
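Chargaff's rule follows from base pairing: every A on one strand faces a T on the other, and every G faces a C. The minimal sketch below assumes a perfectly complementary double-stranded molecule built from an arbitrary example strand, and counts the bases on both strands; the function name is illustrative.

```python
from collections import Counter

PAIR = str.maketrans("ACGT", "TGCA")


def duplex_composition(strand: str) -> dict[str, float]:
    """Percentage of each base in a double-stranded DNA built from `strand`."""
    strand = strand.upper()
    partner = strand.translate(PAIR)[::-1]   # complementary strand, read 5'->3'
    counts = Counter(strand + partner)
    total = sum(counts.values())
    return {base: 100 * counts[base] / total for base in "ACGT"}


percent = duplex_composition("ATGGAAGTATTTAAAGCGCC")
print(percent)
print(percent["A"] == percent["T"], percent["G"] == percent["C"])  # True True
```

Measured DNA only satisfies the rule approximately; the equality here is exact because the model duplex has no mismatches.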

In 1953, James D. Watson and Francis Crick proposed the first correct double-helix model of DNA structure in the journal Nature. Their model was based in part on X-ray diffraction data from Rosalind Franklin and Maurice Wilkins, notably an image taken in 1952.

James Watson (1928-), Nobel Prize 1962
Francis Crick (1916-2004), Nobel Prize 1962
Maurice Wilkins (1916-2004), Nobel Prize 1962
Rosalind Franklin (1920-1958)

The sequence of the 77 nucleotides of a yeast alanine tRNA was determined by the American biochemist Robert W. Holley in 1965. Holley was awarded the 1968 Nobel Prize in Physiology or Medicine for describing the structure of this tRNA, linking DNA and protein synthesis.

Robert W. Holley
(1922-1993)
Nobel Prize 1968

In 1977, Frederick Sanger and colleagues introduced the “dideoxy” chain-termination method for sequencing DNA molecules, also known as the “Sanger method”. For this work, he shared the 1980 Nobel Prize in Chemistry with Walter Gilbert.

The key principle of the Sanger method was the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.

Frederick Sanger (1918-), Nobel Prize 1980
Walter Gilbert (1932-), Nobel Prize 1980
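The chain-termination principle can be illustrated with an idealized simulation: in the reaction containing a given ddNTP, synthesis stops wherever that base is incorporated, so each fragment length reveals the base at that position. Sorting all fragments by length, as a gel does, reads the newly synthesized strand back. This sketch assumes error-free, complete termination and ignores the template/complement relationship; the function names are hypothetical.

```python
def sanger_ladder(synthesized: str) -> dict[str, list[int]]:
    """Fragment lengths produced in each of the four ddNTP reactions (idealized)."""
    ladder = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized.upper(), start=1):
        # In some molecules a ddNTP of this base is incorporated here,
        # terminating the chain at this length.
        ladder[base].append(position)
    return ladder


def read_ladder(ladder: dict[str, list[int]]) -> str:
    """Read the sequence by ordering fragments from shortest to longest."""
    fragments = sorted((length, base) for base, lengths in ladder.items() for length in lengths)
    return "".join(base for _, base in fragments)


ladder = sanger_ladder("GATTACA")
print(ladder)               # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_ladder(ladder))  # GATTACA
```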
Read the protein sequence directly from the DNA sequence!

The central dogma of molecular biology was first articulated by Francis Crick in 1958 and restated in a Nature paper published in 1970.

Francis Crick
(1916-2004)

Marshall Warren Nirenberg shared the 1968 Nobel Prize in Physiology or Medicine with Har Gobind Khorana and Robert W. Holley for "breaking the genetic code" and describing how it operates in protein synthesis.

Marshall Warren Nirenberg (1927-2010), Nobel Prize 1968
Har Gobind Khorana (1922-), Nobel Prize 1968
Robert W. Holley (1922-1993), Nobel Prize 1968


Amino acids are the building blocks of protein.
Protein is a nutrient needed by the human body for growth and maintenance.

Amino acids are made of carbon, hydrogen, oxygen, and nitrogen atoms; some (cysteine and methionine) also contain sulfur.
A protein = C1200H2400O600N300S100

The 20 amino acids:
#    1-letter  3-letter  Name
1    A         Ala       Alanine
2    R         Arg       Arginine
3    N         Asn       Asparagine
4    D         Asp       Aspartic acid
5    C         Cys       Cysteine
6    Q         Gln       Glutamine
7    E         Glu       Glutamic acid
8    G         Gly       Glycine
9    H         His       Histidine
10   I         Ile       Isoleucine
11   L         Leu       Leucine
12   K         Lys       Lysine
13   M         Met       Methionine
14   F         Phe       Phenylalanine
15   P         Pro       Proline
16   S         Ser       Serine
17   T         Thr       Threonine
18   W         Trp       Tryptophan
19   Y         Tyr       Tyrosine
20   V         Val       Valine

A given type of protein always contains the same total number of amino acids, in the same proportions:
insulin = (30 glycines + 44 alanines + 5 tyrosines + 14 glutamines + . . .)

Amino acids are linked together as a chain.
The first amino acid sequence of a protein, insulin, was determined in 1951 by Dr. Sanger.
Frederick Sanger (1918-), Nobel Prize 1958

insulin = MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHL
VEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPL
ALEGSLQKRGIVEQCCTSICSLYQLENYCN
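The composition of a protein can be read directly off its sequence by counting one-letter codes. A minimal sketch using Python's `collections.Counter` on the insulin sequence given above (its three wrapped lines joined); its counts need not match the schematic `insulin = (…)` formula, which reads as an illustration.

```python
from collections import Counter

# Insulin sequence from the slide, wrapped lines joined into one string.
INSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHL"
    "VEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPL"
    "ALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)


def composition(sequence: str) -> Counter:
    """Count how many times each one-letter amino acid code occurs."""
    return Counter(sequence.upper())


counts = composition(INSULIN)
print(len(INSULIN))   # 110 residues in total
print(counts["C"])    # 6 cysteines
print(counts["L"])    # 20 leucines
```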
What Can Bioinformatics Do for You?

✓ Analyzing Protein Sequences
Protein Sequence: MAVLD

The first 3D structure of a protein was determined in 1958 by Drs. Kendrew and Perutz, using the complicated technique of X-ray crystallography.
Max Ferdinand Perutz (1914-2002), Nobel Prize 1962
John Cowdery Kendrew (1917-1997), Nobel Prize 1962
History of Bioinformatics
In 1956, the Symposium on Information Theory in Biology was held in Gatlinburg, USA.
In 1979, GenBank was established at Los Alamos National Laboratory (USA).
In 1982, the nucleotide sequence database of the European Molecular Biology Laboratory (EMBL) was created (Europe).
In 1986, the DNA Data Bank of Japan (DDBJ) began data bank activities at NIG (Japan).
In the early 1990s, the International Nucleotide Sequence Database Collaboration (INSDC) was founded as a cooperation of GenBank, EMBL and DDBJ.
In 1987, the Chinese-American scientist LIN Hua-an first created the word “bioinformatics”. At first he created the word “compbio”, then “bioinformatique”, and then “bio-informatics”. But at that time, e-mail subject lines did not support the hyphen, and so “bioinformatics” was born.
Since at least the late 1980s, the term “bioinformatics” has been used primarily in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing.
The Human Genome Project:
Publicly funded project (James D. Watson & Francis Collins): began 1990, $3 billion; data freely available; 2000: 90%; 2001: 99%; 2003: finished.
Privately funded project (Craig Venter): began 1998, $300 million; patented; 2000: 90%; 2001: 99%; 2003: finished.
President Clinton (2000)

[Figure: sequencers installed in China (Shenzhen, Shanghai, Beijing): Illumina HiSeq 2000 × 137; AB SOLiD™ 4.0 System × 27]
What Can Bioinformatics Do for You?

✓ Analyzing DNA
✓ Analyzing RNA
✓ Analyzing Proteins
✓ Others: Pathway, Bioimaging, etc.

✓ Analyzing DNA
1. Read the DNA sequence:
ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG

2. Decompose it into successive triplets (codons):
ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G . . .

3. Translate each triplet into the corresponding amino acid:
M E V F K A P P I G I STOP
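The three steps above map directly onto code. A minimal sketch, assuming the standard genetic code and an input made only of A, C, G and T; "*" stands for STOP and a trailing incomplete codon (the final G here) is ignored. The table is generated from the usual T, C, A, G ordering of codons.

```python
# Standard genetic code: codons enumerated in T, C, A, G order
# (first base slowest, third base fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Step 2: split DNA into successive triplets; step 3: translate each one."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))  # MEVFKAPPIGI*
```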

[Figure: DNA (ATGGAAGTATTTAA……) → Protein (M E V F K A P …) → Database]

✓ Analyzing RNA

In the context of bioinformatics, there are only two important differences between RNA and DNA:
✓ RNA differs from DNA by one nucleotide: uracil (U) replaces thymine (T).
✓ RNA comes as a single strand.
Even though RNA molecules consist of single strands of nucleotides, their natural urge to pair with complementary sequences is still there.

All transfer RNAs (tRNAs) fold into a shape like a cloverleaf.
Hairpin shapes are the basic elements of RNA secondary structure; they’re made up of loops (the unpaired bases) and stems (the paired regions).
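Stems form because bases in one part of the strand pair with complementary bases further along. A classic way to make this computational is the Nussinov algorithm, which finds the largest number of nested base pairs by dynamic programming. The sketch below is a simplified version, assuming A-U, G-C and G-U wobble pairs and a minimum hairpin loop of 3 unpaired bases; practical structure predictors use energy models instead of simply counting pairs.

```python
# Allowed base pairs: Watson-Crick (A-U, G-C) plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Nussinov algorithm: maximum number of nested base pairs in an RNA.

    A hairpin loop must contain at least `min_loop` unpaired bases.
    """
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within rna[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):            # span = j - i
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]                 # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (rna[i], rna[k]) in PAIRS:      # case 2: base i pairs with base k
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k < j else 0
                    score = max(score, inside + 1 + after)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # 3: a stem of three G-C pairs closing an AAA loop
```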
✓ Analyzing Proteins
The first 3D structure of a protein was determined in 1958 using X-ray crystallography.

Protein Structure Determination:
➢ Experimental Methods: X-ray Crystallography, Nuclear Magnetic Resonance (NMR)
➢ Computational Methods: de novo methods, homology modeling, threading, and ensemble methods
[Figure: Sequence, Structure, Function; software tools: VMD, Maestro, PyMOL]
✓ Analyzing Protein Sequences
Drug Design:
• Virtual Screening
• Docking

Virtual screening involves the rapid in silico assessment of large libraries of chemical structures in order to identify those structures that are most likely to bind to a drug target, typically a protein receptor or enzyme.

Molecular dynamics (MD) is a computer simulation of the physical movements of atoms and molecules.

Supercomputer example: simulating a 500-aa protein for 1 ns (10⁻⁹ s) on 120 cores takes 5 hours.
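At its core, an MD program repeatedly computes the force on every atom and advances positions and velocities by a small time step; velocity Verlet is a common integrator for this. The sketch below applies it to one particle on a spring (F = -kx) instead of a real protein force field, an assumption made only to keep the example small. For scale, at a typical time step of about 2 fs (also an assumption), 1 ns means 500,000 such steps, each over every atom of the protein.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps steps of size dt."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # new position from current velocity and acceleration
        a_new = force(x) / mass              # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity uses the average of old and new acceleration
        a = a_new
    return x, v


K_SPRING, MASS = 1.0, 1.0   # arbitrary units (toy model, not a protein force field)


def spring_force(x):
    """Harmonic restoring force F = -k x."""
    return -K_SPRING * x


x, v = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=MASS, dt=0.01, n_steps=1000)
energy = 0.5 * MASS * v**2 + 0.5 * K_SPRING * x**2
print(x, math.cos(10.0))  # simulated vs. exact position at t = 10
print(energy)             # stays close to the initial energy of 0.5
```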
Leibniz Supercomputing Centre (LRZ), Bavaria:
• Linux Cluster: 2007, 753 nodes, 5,646 cores, 43 TFlop/s
• HLRB II: 2007, 9,728 cores, 62 TFlop/s
• SuperMUC: 2012, 140,000 cores, 3 PFlop/s
Tianhe-1 (天河一号): 2.5 PFlop/s, No. 1 in the world
[Photos: Linux Cluster, HLRB II, SuperMUC]

✓ Others: Pathway, Bioimaging, etc.
[Figure examples: statistical graph, CT, magnetic resonance]

How Do Most People Use Bioinformatics?

✓ Becoming an Instant Expert with PubMed
✓ Retrieving Protein Sequences
✓ Retrieving DNA Sequences
✓ Using BLAST to Compare Your Protein Sequence
✓ Making a Multiple Protein Sequence Alignment with ClustalW
✓ Retrieving a 3D Protein Structure
✓ Becoming an Instant Expert with PubMed

[Cartoon: a researcher hands a gene sequence to a specialist in bioinformatics. “Great! It’s dUTPase.” “But, what’s dUTPase?”]

Search PubMed (http://www.ncbi.nlm.nih.gov/entrez/) for “dUTPase”.

Search by author name, then by author name + topic.

Internal structure of a database record: the information is spread out over separate sections, called fields.
[Figure: a PubMed record with its fields labeled: PubMed ID, Publication Date, Title, Page, Abstract, Laboratory address, Authors]

Search “Down” in the field “Author [AU]”
Search “Down” in the field “Title [TI]”
Search “Down” in the field “Laboratory address [AD]”
Search “Down” everywhere
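The difference between searching one field and searching everywhere can be sketched with records stored as dictionaries of fields. The records and the `search` helper below are hypothetical illustrations of the idea behind PubMed's [AU], [TI] and [AD] tags, not PubMed's implementation; real PubMed matching (tokenization, author-name formats, automatic term mapping) is more sophisticated than this substring test.

```python
# Hypothetical mini-database: each record is a set of named fields.
RECORDS = [
    {"PMID": "101", "AU": "Down A; Lee K", "TI": "Protein folding in yeast", "AD": "Example Institute, London"},
    {"PMID": "102", "AU": "Smith B", "TI": "Down syndrome screening", "AD": "Example University, Boston"},
    {"PMID": "103", "AU": "Wang C", "TI": "dUTPase kinetics", "AD": "Down Street Laboratory, Beijing"},
]


def search(records, term, field=None):
    """Return PMIDs of records containing `term` (case-insensitive).

    If `field` is given (e.g. "AU"), only that field is searched;
    otherwise every field is searched.
    """
    term = term.lower()
    hits = []
    for record in records:
        values = [record[field]] if field else list(record.values())
        if any(term in value.lower() for value in values):
            hits.append(record["PMID"])
    return hits


print(search(RECORDS, "Down", "AU"))  # ['101']
print(search(RECORDS, "Down"))        # ['101', '102', '103']
```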
Using fields to find experts near you (example: “Beijing”):
New Life Science Building, Peking University, Summer Palace Road No. 5, Beijing, P. R. China 100871
Tel: 86-10-6275-5002
Fax: 86-10-6276-2292
Searching PubMed using limits (query: “dUTPase”, http://www.ncbi.nlm.nih.gov/entrez/)

A few more tips about PubMed:
➢ How to get the most out of your query:
• quoted queries (for example, “down syndrome”)
• logical connectors: AND, OR, NOT (for example, dUTPase[TI] OR pyrophosphatase[TI] NOT Smith[AU])
• initials added to proper names (for example, “Abergel C”)
• the PubMed Identifier (the number in the PMID field)
• deselecting the Limit box when starting a new search
• the Related Articles link
➢ What you may not find in PubMed:
• names ranking beyond the 10th place in the author list for older papers (before 1995)
• papers recorded before 1965
• abstracts for most references recorded before 1976
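The same Boolean, field-tagged queries can also be sent to PubMed programmatically through NCBI's E-utilities, which the slides do not cover. The sketch below only builds an ESearch request URL with the standard library; the endpoint and the `db`, `term` and `retmax` parameters are those of NCBI's ESearch utility, while the helper name is illustrative. Actually sending the request (for example with `urllib.request.urlopen`) needs network access and should follow NCBI's usage guidelines.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(query: str, retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL for a PubMed query (no request is sent)."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = pubmed_search_url("dUTPase[TI] OR pyrophosphatase[TI] NOT Smith[AU]")
print(url)
# Decoding the query string gives back the original search term.
print(parse_qs(urlparse(url).query)["term"][0])
```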
✓ Retrieving Protein Sequences (http://expasy.org/)

1. ✓ Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
2. Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).


Search ExPASy for “dUTPase coli”.
[Photo: Prof. Amos Bairoch]


Download formats: Tab (tab-separated, e.g., for Excel) or FASTA.

“Cross-references”
point to data collections
other than UniProtKB.

The “Sequences” section provides you with the actual amino acid sequence of the protein.
Right-click to save this sequence on your Desktop as “P06968.fasta”.
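Instead of saving the file from the browser, the same FASTA record can be downloaded by script. This sketch assumes UniProt's current REST service (rest.uniprot.org), which is not what these slides used; the helper names are illustrative, the Desktop path is an assumption about your machine, and running `save_fasta` needs network access.

```python
from pathlib import Path
from urllib.request import urlopen


def uniprot_fasta_url(accession: str) -> str:
    """URL of the FASTA record for a UniProtKB accession (e.g. P06968)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def save_fasta(accession: str, directory: Path = Path.home() / "Desktop") -> Path:
    """Download the FASTA record and save it as <accession>.fasta in `directory`."""
    with urlopen(uniprot_fasta_url(accession), timeout=30) as response:
        text = response.read().decode("utf-8")
    path = directory / f"{accession}.fasta"
    path.write_text(text)
    return path


print(uniprot_fasta_url("P06968"))  # https://rest.uniprot.org/uniprotkb/P06968.fasta
# save_fasta("P06968")  # uncomment to download (requires network access)
```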
What is FASTA? (Does it have anything to do with pasta?)

FASTA is the name of a popular sequence alignment and database scanning program created by W.R. Pearson and D.J. Lipman in 1988. Its legacy is the FASTA format, which is now ubiquitous in bioinformatics.

The sequence in FASTA format:
>P06968 My_Sequence_Name
ARCGTCRGCKINTANDRGCKINTAND
CKINTANDARCGTCRGCKINTANDRG
CKINTAND

The line starting with > (the definition line) contains a unique identifier followed by an optional short definition. The lines that follow it contain the DNA or protein sequence (in one-letter code) until the next > symbol indicates the beginning of a new sequence.
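To make these format rules concrete, here is a minimal FASTA reader that follows exactly the rules just stated: a > line starts a record, its first word is the identifier and the rest is the optional definition, and the following lines are joined until the next >. It is a sketch; production code would typically use an established library such as Biopython's `SeqIO`.

```python
def parse_fasta(text: str) -> list[tuple[str, str, str]]:
    """Parse FASTA text into (identifier, definition, sequence) tuples."""
    records = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                          # skip blank lines
        if line.startswith(">"):
            if header is not None:            # a new ">" ends the previous record
                records.append((*header, "".join(chunks)))
            parts = line[1:].split(maxsplit=1)
            identifier = parts[0] if parts else ""
            definition = parts[1] if len(parts) > 1 else ""
            header = (identifier, definition)
            chunks = []
        elif header is not None:
            chunks.append(line)               # sequence lines are concatenated
    if header is not None:
        records.append((*header, "".join(chunks)))
    return records


example = """>P06968 My_Sequence_Name
ARCGTCRGCKINTANDRGCKINTAND
CKINTANDARCGTCRGCKINTANDRG
CKINTAND
"""
for identifier, definition, sequence in parse_fasta(example):
    print(identifier, definition, len(sequence))  # P06968 My_Sequence_Name 60
```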
✓ Retrieving DNA Sequences

1. ✓ Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
2. ✓ Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
3. Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.
Search ExPASy (http://expasy.org/) for the accession number “P06968”.

From the UniProtKB entry P06968, jump to the cross-referenced nucleotide sequence record, which is organized into sections:

1. Summary Section

2. Reference Section
……
3. Features Section
• promoter elements
• ribosome binding sites (RBS)
• protein coding segments (CDS)
……
[Figure labels: range of dUTPase, ORF (CDS), ORF translation]

4. Sequence Section
……
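The CDS feature marks an open reading frame (ORF): a stretch that starts with a start codon and runs, codon by codon, to the first in-frame stop codon. The sketch below scans the three forward reading frames for ATG...stop stretches. It assumes the standard start (ATG) and stop (TAA, TAG, TGA) codons and ignores the reverse strand, alternative start codons and nested ORFs, so it is a simplification of what genome annotation tools do; the function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts
    the start codon but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                          # first ATG opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                       # look for the next ATG
    return sorted(orfs)


print(find_orfs("CCATGAAATAGCC"))  # [(2, 11)]
```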
✓ Using BLAST to Compare Sequences

1. ✓ Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
2. ✓ Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
3. ✓ Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.
4. Perform a BLAST search.
What is BLAST?

BLAST (Basic Local Alignment Search Tool) – A sequence comparison algorithm, optimized for speed, used to search sequence databases for optimal local alignments to a query.
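“Local alignment” means finding the best-matching pair of segments rather than aligning two sequences end to end. BLAST itself uses fast heuristics (short word seeds that are then extended), but the exact optimum it approximates is defined by the Smith-Waterman algorithm, sketched below. The scoring scheme (match +2, mismatch -1, linear gap -2) is an illustrative assumption; BLAST uses substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best local alignment of a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap       # a[i-1] aligned to a gap
            left = H[i][j - 1] + gap     # b[j-1] aligned to a gap
            H[i][j] = max(0, diagonal, up, left)  # 0 = start a new local alignment
            best = max(best, H[i][j])
    return best


print(smith_waterman("XXABCXX", "YYABCYY"))  # 6: only the shared segment ABC counts
```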
BLASTn – BLASTn will search a DNA sequence against a DNA databank.
BLASTp – BLASTp will compare a protein sequence against the protein
database of your choice.
BLASTx – BLASTx will translate a nucleic acid sequence in all six reading
frames and compare all these against the protein database of your choice.
BLAST? – BLAST? ……
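BLASTx's “all six reading frames” are the three frames of the given strand plus the three frames of its reverse complement. A self-contained sketch, assuming the standard genetic code ("*" = stop), uppercase A/C/G/T input, and dropping trailing partial codons; the function names are illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate successive codons; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Frames +1, +2, +3 (given strand) and -1, -2, -3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frame_translation("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG").items():
    print(name, protein)
```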
BLAST at NCBI: http://www.ncbi.nlm.nih.gov/

Open “P06968.fasta” on your Desktop (or download it from http://1.51.212.243/P06968.fasta) and paste the sequence into the query box.
Give the search a name.
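The E-value is the number of alignments at least this good that you would expect to find by chance alone; it ranges from 0 upward. An E-value close to 0 indicates a significant match, while an E-value close to (or above) 1 is a warning that the conclusion you might draw from the alignment is NOT reliable.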
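The E-value comes from Karlin-Altschul statistics: for a raw score S, a query of length m and a database of total length n, E = K·m·n·e^(-λS), where λ and K depend on the scoring system. The sketch below computes it together with the equivalent bit-score form E = m·n·2^(-S'). The λ and K values (0.267 and 0.041, often quoted for BLOSUM62 with gap costs 11/1) and the query and database sizes are assumptions for illustration, and BLAST's effective-length corrections are omitted.

```python
import math


def e_value(raw_score: float, m: int, n: int, lam: float, K: float) -> float:
    """Karlin-Altschul expected number of chance hits scoring >= raw_score."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, K: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


lam, K = 0.267, 0.041          # assumed BLOSUM62 (11/1) parameters
m, n = 150, 50_000_000         # hypothetical query and database lengths
for S in (30, 60, 120):
    E = e_value(S, m, n, lam, K)
    same = math.isclose(E, m * n * 2 ** (-bit_score(S, lam, K)))  # bit-score form agrees
    print(S, f"{E:.2e}", same)
```

Higher raw scores give exponentially smaller E-values, which is why strong hits have E-values many orders of magnitude below 1.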

In the result list, click a hit's accession to see the corresponding database entry, or click its score to see the alignment between your query sequence and the matching sequence of the protein that corresponds to this score.

What is Alignment?
Alignment is the result of a comparison of two or more
gene or protein sequences in order to determine their
degree of base or amino acid similarity.

There are two kinds: pairwise alignment (two sequences) and multiple alignment (three or more sequences).
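For a pairwise alignment of two whole sequences, the classic method is the Needleman-Wunsch dynamic-programming algorithm (a global alignment, in contrast to the local alignments BLAST reports). A minimal sketch with an illustrative scoring scheme (match +1, mismatch -1, gap -1) that returns the score and one optimal alignment:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment: return (score, aligned_a, aligned_b)."""

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # F[i][j]: best score for a[:i] vs b[:j]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)   # 2
print(top)     # ACGT
print(bottom)  # A-GT
```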
✓ Making a Multiple Sequence Alignment

1. ✓ Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
2. ✓ Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
3. ✓ Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.
4. ✓ Perform a BLAST search.
5. Perform a multiple alignment.
Multiple alignments are used to:
• Identify sequence positions where specific amino acids really
matter for the structural integrity or the function of a given protein
• Define specific sequence signatures for protein families
• Classify sequences and build evolutionary trees

Tool: ClustalW at PIR (http://pir.georgetown.edu)

Get the sequences from http://1.51.212.243/multi.fasta, then select all and copy them.
Paste the sequences.

Conservation symbols below each alignment column:
* identical
: similar
. related
(blank) different
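The symbols can be computed column by column. The sketch below follows the convention used by Clustal tools, where ":" means every residue in the column belongs to one “strong” similarity group and "." means they all belong to one “weak” group. The group lists are those given in the Clustal documentation, but treat them as an assumption to verify against the version you use; columns containing gaps are simply treated as different here (also an assumption).

```python
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Clustal-style conservation symbol for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "                                   # gap present: treated as different
    if len(residues) == 1:
        return "*"                                   # identical
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"                                   # similar (strong group)
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."                                   # related (weak group)
    return " "                                       # different


def conservation_line(aligned: list[str]) -> str:
    """Symbols for every column of equally long aligned sequences."""
    return "".join(column_symbol("".join(column)) for column in zip(*aligned))


alignment = ["MKVG", "MRIS", "MKLW"]
for row in alignment:
    print(row)
print(conservation_line(alignment))  # "*:: "
```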

[Figure: a conserved region highlighted in the alignment]

✓ Retrieve a protein structure

1. ✓ Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
2. ✓ Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
3. ✓ Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.
4. ✓ Perform a BLAST search.
5. ✓ Perform a multiple alignment.
6. Retrieve a protein structure.
dUTPase: DNA sequence → protein sequence → 3D structure

Query: “Su XD dUTPase”

Interacting with the structure in the Jmol viewer (mouse controls):
• Rotate view: left click and drag
• Zoom: Shift + left click, then drag the mouse up or down; or roll the mouse wheel (middle button)
• Select/deselect residue: left click
• Jmol menu: right-click
[Figure: backbone representation, colored by chain]
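Viewers such as Jmol read structures from PDB-format files, in which each atom is one fixed-width ATOM line. The sketch below extracts the Cα atoms (the backbone trace) grouped by chain. The column positions follow the PDB format specification (atom name in columns 13-16, chain ID in column 22, residue number in 23-26, x/y/z in 31-54); the function name and the example record are illustrative, and structures distributed only as mmCIF need a different parser.

```python
from collections import defaultdict


def backbone_by_chain(pdb_text: str) -> dict[str, list[tuple[int, float, float, float]]]:
    """Map each chain ID to its C-alpha atoms as (residue number, x, y, z)."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue                                  # keep only C-alpha atoms
        chain_id = line[21]
        residue_number = int(line[22:26])
        x, y, z = (float(line[k:k + 8]) for k in (30, 38, 46))
        chains[chain_id].append((residue_number, x, y, z))
    return dict(chains)


# One example ATOM record built with the fixed PDB column widths.
example = f"{'ATOM':<6}{2:>5} {' CA ':<4} {'MET':>3} A{1:>4}    {11.104:>8.3f}{6.134:>8.3f}{-6.504:>8.3f}"
print(backbone_by_chain(example))  # {'A': [(1, 11.104, 6.134, -6.504)]}
```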

