
Bioinformatics for Proteomics

Shu-Hui Chen (陳淑慧)


Department of Chemistry
National Cheng Kung University
What is Proteomics?

Systematic analysis of
All protein sequences
All protein expression patterns
All protein interactions

This involves
Protein identification
Protein quantification
Functional characterization of all proteins
MS-based Bioinformatics
• MS instruments are so far not sensitive enough to resolve the proteins in a biological system based solely on the measured signals.

• MS can, however, acquire sufficient data to map a protein to a database entry, using new computer algorithms to analyze the data.

• MS data can also be used for protein quantification.
The tools of Proteomics

Traditional protein chemistry assay methods struggle to establish identity and quantification.

Identity requires:
Specificity of measurement (precision)
  Mass spectrometry
  MS-based data acquisition algorithm
Database-dependent identification: a reference for comparison
  Protein sequence databases
  Search algorithms
Database-independent identification: no reference
  De novo sequencing
MS-based Protein Identification and Quantification

➢ Mass Mapping

Peptide Sequencing

Protein Quantification
Mass Spectrometry
Protein identified by database mapping
http://www.expasy.org/tools/
www.uniprot.org
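Mass mapping compares measured peptide masses against masses predicted from each database sequence. The following is a minimal sketch of that idea, not the procedure of any particular search engine. It assumes trypsin specificity (cleavage after K or R unless the next residue is P), no missed cleavages, no modifications, and observed values given as neutral monoisotopic masses. The function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added to the sum of residues


def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (assumed rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_masses(observed, peptides, tol_da=0.05):
    """Pair each observed mass with theoretical peptides within tol_da."""
    return [(obs, pep) for obs in observed for pep in peptides
            if abs(peptide_mass(pep) - obs) <= tol_da]


# Hypothetical toy sequence, not taken from the slides.
peptides = tryptic_digest("AGKPLRSTK")
print(peptides)
print(match_masses([334.19], peptides))
```

A real search repeats this for every database entry and ranks the entries by how many observed masses they explain.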
Automated Database Search
Number 1 match: tumor necrosis factor type 1 receptor associated protein TRAP-1 (Mr: 76030.27)
1 RALRRAPALA AVPGGKPILC PRRTTAQLGP RRNPAWSLQA GRLFSTQTAE
51 DKEEPLHSII SSTESVQGST SKHEFQAETK KLLDIVARSL YSEKEVFIRE
101 LISNASDALE KLRHKLVSDG QALPEMEIHL QTNAEKGTIT IQDTGIGMTQ
151 EELVSNLGTI ARSGSKAFLD ALQNQAEASS KIIGQFGVGF YSAFMVADRV
201 EVYSRSAAPG SLGYQWLSDG SGVFEIAEAS GVRTGTKIII HLKSDCKEFS
251 SEARVRDVVT KYSNFVSFPL YLNGRRMNTL QAIWMMDPKD VGEWQHEEFY
301 RYVAQAHDKP RYTLHYKTDA PLNIRSIFYV PDMKPSMFDV SRELGSSVAL
351 YSRKVLIQTK ATDILPKWLR FIRGVVDSED IPLNLSRELL QESALIRKLR
401 DVLQQRLIKF FIDQSKKDAE KYAKFFEDYG LFMREGIVTA TEQEVKEDIA
451 KLLRYESSAL PSGQLTSLSE YASRMRAGTR NIYYLCAPNR HLAEHSPYYE
501 AMKKKDTEVL FCFEQFDELT LLHLREFDKK KLISVETDIV VDHYKEEKFE
551 DRSPAAECLS EKETEELMAW MRNVLGSRVT NVKVTLRLDT HPAMVTVLEM
601 GAARHFLRMQ QLAKTQEERA QLLQPTLEIN PRHALIKKLN HCAQASLAWL
651 SCWWIRYTRT P

Total coverage: 33.4%
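A coverage figure like the one above is usually the percentage of residues in the matched protein that lie inside at least one identified peptide. The sketch below computes it under that assumption. The function name and the toy inputs are hypothetical, and peptides are located by exact substring match.

```python
def sequence_coverage(protein, peptides):
    """Percentage of residues covered by at least one matched peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical toy example: two overlapping peptides cover 5 of 10 residues.
print(sequence_coverage("ACDEFGHIKL", ["ACD", "DEF"]))
```

Overlapping peptides are counted once because coverage is tracked per residue and not per peptide.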


History for MS Searching

1993 MOWSE (Molecular Weight Search), by Pappin and Bleasby

1994 SEQUEST, by Yates and Eng

1996 MOWSE II

1997 MOWSE III

1998 MASCOT, by Matrix Science


Scoring algorithm
Final score = -10 × log10(P),
where P is the absolute probability that the observed match is a random event.

E value (expected value): the number of hits one can expect to see by chance when searching a database of a particular size. A value of zero indicates that no matches would be expected by chance.
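The score formula above can be checked numerically. The sketch below converts between P and the score, and adds a rough estimate of the number of chance hits as P × N for N candidate entries. That last helper is a simplifying assumption for illustration only, not necessarily how a given search engine reports its E value. All function names are hypothetical.

```python
import math


def ion_score(p):
    """Score = -10 * log10(P), with P the probability of a random match."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Invert the score formula: P = 10 ** (-score / 10)."""
    return 10.0 ** (-score / 10.0)


def expected_random_hits(p, n_candidates):
    """Rough E-value estimate, assuming E is approximately P * N."""
    return p * n_candidates


print(ion_score(1e-5))
print(probability_from_score(20))
print(expected_random_hits(1e-5, 1000))
```

Each drop of a factor of ten in P adds 10 to the score, so a larger score means a match that is less likely to be random.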
MS-based Protein Identification and Quantification

Mass Mapping

➢ Peptide Sequencing

Protein Quantification
Tandem Mass Spectrometry (MS/MS)

MS/MS acquisition is controlled by software settings.

Nomenclature used for CID peptide fragmentation
Low energy (eV): Q, TOF, FT

“Bioanalytical Chemistry”, Mikkelsen, S.R., published by John Wiley & Sons, Inc.
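To make the fragmentation nomenclature concrete, the sketch below computes singly charged b- and y-ion m/z values for a peptide, assuming the b and y series that typically dominate low-energy CID spectra. It uses monoisotopic masses and no modifications. The function name and the example peptide are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i = sum of the first i residues + proton
    y_i = sum of the last i residues + water + proton
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


# Hypothetical example peptide.
b, y = fragment_ions("GAK")
print([round(x, 4) for x in b])
print([round(x, 4) for x in y])
```

For this example the y1 ion of the C-terminal lysine comes out near m/z 147.11, a value commonly seen in spectra of tryptic peptides.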
Protein Identification by Database Search
PRIDE website
http://www.ebi.ac.uk/pride
Database-Independent Sequencing: De Novo Sequencing
http://www.bioinfor.com/peaks/tutorials/denovo.html
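De novo sequencing reads residues directly from the spacing between fragment peaks, without a database. The sketch below shows only that core step and is not the algorithm used by PEAKS or any other tool. It assumes the input is a clean ladder of consecutive, singly charged ions from a single series. The function name and the peak list are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.02):
    """Map each gap between consecutive peaks to the residue(s) it fits.

    Isobaric residues (L and I) are reported together as "L/I";
    a gap that fits no residue is reported as "?".
    """
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append("/".join(hits) if hits else "?")
    return residues


# Hypothetical y-ion ladder (y1, y2, y3) for the peptide GAK.
print(read_ladder([147.1128, 218.1499, 275.1714]))
```

Because a y-ion ladder grows from the C-terminus, the residues come out in reverse order relative to the peptide sequence.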
Sequence Tag Approach for Peptide Sequencing

“Bioanalytical Chemistry”, Mikkelsen, S.R., published by John Wiley & Sons, Inc.
MS-based Protein Identification and Quantification

Mass Mapping

Peptide Sequencing

➢ Protein Quantification
MS-based Quantification Methods Using Stable Isotope Labeling

X.J. Li, Institute for Systems Biology, Seattle
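With stable isotope labeling, the same peptide appears as a light and a heavy peak pair, and relative abundance is read from the intensity ratio of the pair. The sketch below illustrates that calculation. Summarizing a protein by the median of its peptide ratios is an assumption made here for simplicity, not a method stated in the slides. The function names and intensities are hypothetical.

```python
from statistics import median


def peptide_ratios(pairs):
    """Heavy/light intensity ratio for each (light, heavy) peak pair.

    Pairs with a non-positive light intensity are skipped.
    """
    return [heavy / light for light, heavy in pairs if light > 0]


def protein_ratio(pairs):
    """Summarize a protein by the median of its peptide ratios (assumed choice)."""
    ratios = peptide_ratios(pairs)
    return median(ratios) if ratios else None


# Hypothetical (light, heavy) intensities for three peptides of one protein.
pairs = [(100.0, 200.0), (50.0, 100.0), (10.0, 30.0)]
print(peptide_ratios(pairs))
print(protein_ratio(pairs))
```

The median is less affected than the mean by a single outlying peptide ratio, which is why it is used in this sketch.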


Exercises
Choose one protein you are interested in.

Go to the bioinformatics portal website http://www.expasy.org/tools/, follow the link to UniProt, and search for your protein.

List the accession number, molecular weight, and amino acid sequence of your protein. Briefly describe the protein's function and its subcellular localization.

Follow the link to the PaxDb proteomic database and show the abundance of the protein.

Follow the link to the PRIDE or MaxQB proteomic database and list the peptide sequences that have been identified by MS for this protein. Include the Mascot peptide score for each identified sequence. If the same peptide was identified multiple times, list the highest score.
