
BTH3752 MBB

Proteomics IIa
Analyzing Protein Sequence
Learning objectives
1. To learn the steps and method in direct protein sequencing
2. To understand the current approach for indirect determination of protein sequences
3. To learn the principle, important components and application of mass spectrometry (MS)

Edman degradation

Elution profile of standard PTH-amino acids

N-terminal Protein/Peptide Sequencing

In practice, peptides sequenced this way cannot be much longer than about 50 residues.

Direct Determination of a Protein Sequence

An Eight Step Strategy:

1. If there is more than one polypeptide chain, separate the chains.
2. Cleave (reduce) disulfide bridges.
3. Determine the composition of each chain.
4. Determine the N- and C-terminal residues.
5. Cleave each chain into smaller fragments and determine the sequence of each fragment.
6. Repeat step 5, using a different cleavage procedure to generate a different set of fragments.
7. Reconstruct the sequence of the protein from the sequences of overlapping fragments.
8. Determine the positions of the disulfide crosslinks.

Current Routes to protein annotation

Obviously, it is easier to just sequence the gene for a protein than to laboriously sequence the protein itself by direct methods.

This is why most (>99%) protein sequences generated today are inferred indirectly from nucleic acid sequencing data.

Today mass spectrometry is one of the most powerful tools for the analysis and annotation of proteins.

Principle of Mass Spectrometry (MS)

Different elements can be uniquely identified by their mass

Principle of Mass Spectrometry (MS)

Different compounds can be uniquely identified by their mass

Butorphanol            MW = 327.1
L-dopa                 MW = 197.2
Ethanol (CH3CH2OH)     MW = 46.1

Principle of Mass Spectrometry (MS)

Accurate determination of molecular mass by measuring the mass-to-charge ratios (m/z) of ions in a vacuum

Used for:

Determining the masses of particles
Determining the elemental composition of a sample or molecule
Elucidating the chemical structures of molecules: nucleic acids, proteins, metabolites and other chemical compounds

In protein analysis, MS is an essential tool for

Determining protein identity (ID)
Determining the sequence and structure of proteins
Detecting protein modifications

Mass Spectrometer
Q Exactive Orbitrap

API QStar Pulsar i

ESI Mass Spec

LTQ Orbitrap XL

Essential Parts of Mass Spectrometer

Ionizer

Ionization: find a way to charge the atoms or molecules in the sample

MALDI (Matrix-Assisted Laser Desorption Ionisation)
ESI (Electrospray Ionisation)

Mass analyser

Sorts the charged atoms or molecules (ions) by their masses (more precisely, by their mass-to-charge ratios)
Ions are placed in a magnetic field or subjected to an electric field, and their speed or radius of curvature, which depends on the mass-to-charge ratio, is measured

TOF (Time of flight)
Quadrupole
Ion trap
Orbitrap

Detector

Detects ions using a microchannel plate or photomultiplier tube
Provides data for calculating the abundance of each ion present

Principal Workflow of MS

Introduce the sample to the instrument
Generate ions in the gas phase
Separate ions on the basis of differences in m/z with a mass analyzer in a vacuum
Detect ions
Generate a mass spectrum that represents the distribution of ions by mass (m/z) in the sample

Application of MS in Protein Analysis

MALDI-TOF for Protein Identification
ESI-MS/MS for Protein Sequencing

Ionization methods: MALDI, ESI
Mass analyzers: TOF, Quadrupole, Ion trap, Orbitrap

MALDI-TOF for Protein Identification

Matrix-assisted laser desorption ionization (MALDI)

Analyte (protein) is mixed with a large excess of matrix (a small organic molecule)
The mixture is irradiated with a short pulse of laser light. The wavelength of the laser matches the absorbance maximum of the matrix.

nicotinic acid: absorption at 266 nm
sinapinic acid: absorption at 337-353 nm
α-cyano-4-hydroxycinnamic acid: absorption at 337-353 nm

The matrix transfers some of its energy to the analyte (leads to ion sputtering)

Ion Mode in MS: Positive or Negative?

If the sample has functional groups that readily accept H+,
e.g. amide and amino groups found in peptides and proteins,
positive ion detection is used (PROTEINS)

If the sample has functional groups that readily lose a proton,
e.g. carboxylic acids and hydroxyls as found in nucleic acids and sugars,
negative ion detection is used (DNA)

MALDI-TOF for Protein Identification

Matrix-assisted laser desorption ionization (MALDI)

MALDI generates spectra that contain predominantly singly charged ions
Positive mode generates ions of M + H
Negative mode generates ions of M - H
Generally more robust, easier to use and maintain, capable of higher throughput
Requires 10 µL of 1 pmol/µL sample

[Figure: MALDI ionization process]

MALDI-TOF for Protein Identification

[Figure: Linear Time Of Flight tube (linear TOF mode): ion source, time of flight, detector]
[Figure: Reflector Time Of Flight tube: ion source, reflector, detector]

In time-of-flight (TOF) analyzers, ionized molecules are accelerated by an electrostatic field and are then ejected through a flight tube under vacuum.
Smaller ions (lower m/z) fly faster than larger ions.
The detector measures the time of flight for each particular ion.
Theoretically, the time to reach the detector depends on the mass (more precisely, the m/z) of that particular ion.
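
The dependence of flight time on mass can be made concrete with a small calculation. The following is a minimal sketch, assuming an ideal linear TOF tube in which an ion of mass m and charge z is accelerated through a potential V and then drifts over a length L, so that t = L * sqrt(m / (2 z e V)). The tube length (1 m) and accelerating voltage (20 kV) are illustrative assumptions, not values from these slides.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms

def flight_time(mz, length_m=1.0, accel_voltage=20000.0):
    """Ideal flight time in seconds for an ion with the given m/z (Da per charge).

    length_m and accel_voltage are assumed example values.
    """
    # m / (z * e) expressed in SI units is mz * DALTON / E_CHARGE
    return length_m * math.sqrt(mz * DALTON / (2.0 * E_CHARGE * accel_voltage))

for mz in (500.0, 1000.0, 4000.0):
    print(f"m/z {mz:7.1f}: {flight_time(mz) * 1e6:.2f} microseconds")
```

Because the time scales with the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector.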

MALDI-TOF for Protein Identification

[Figure: TOF mode with reflectron (Reflector Time Of Flight tube): ion source, reflector, detector, time of flight]

Reflectron
Focuses ions with the same m/z value so that they reach the detector at the same time, even if their initial kinetic energies differ slightly.
Mass measurement is more accurate in reflectron mode than in linear mode.

MALDI-TOF for Protein Identification

Detector

Early detectors used photographic film
Today's detectors (ion channel and electron multipliers) produce electronic signals via secondary electron emission when struck by an ion
Timing mechanisms integrate these signals with scanning voltages to allow the instrument to report which m/z has struck the detector

[Figure: Electron Multiplier (Dynode)]

MALDI-TOF for Protein Identification

Typical Mass Spectrum

[Figure: mass spectrum of dibutylphthalate; x-axis: mass-to-charge ratio (m/z), y-axis: relative abundance]

Characterized by sharp, narrow peaks
X-axis position indicates the m/z ratio of a given ion; for a singly charged ion (e.g. the peak at m/z 149), the m/z is the mass
Height of a peak indicates the relative abundance of a given ion (not reliable for quantitation)

MALDI-TOF for Protein Identification

Peptide Mass Fingerprinting (PMF)

Depends on the fact that if a protein is cut up or fragmented in a known way, the resulting fragments (and their masses) are unique enough to identify the protein
Used to identify protein spots on gels or protein peaks from an HPLC run
Requires a database of known sequences
Uses software to compare observed masses with masses calculated from the database

MALDI-TOF for Protein Identification

Fragmentation of polypeptide

Enzymatic fragmentation: trypsin, chymotrypsin, pepsin, staphylococcal protease
Chemical fragmentation: cyanogen bromide

Trypsin
Most important
Cleaves the peptide bond after positively charged AAs (Lys, Arg)
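
The trypsin rule can be sketched as an in silico digest. This is a simplified illustration, assuming cleavage after every lysine (K) or arginine (R) and ignoring missed cleavages and the commonly cited exception before proline; the function name `tryptic_digest` is hypothetical. The input is the Protein 1 sequence from the fingerprinting example in these slides.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue.upper() in "KR":
            # Cleave after the positively charged residue
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal fragment that does not end in K or R
        fragments.append(sequence[start:])
    return fragments

protein_1 = "acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe"
print(tryptic_digest(protein_1))
```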

Principles of Fingerprinting

Protein 1
Sequence: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
Mass (M+H): 4842.05
Tryptic fragments: acedfhsak, dfqeasdfpk, ivtmeeewendadnfek, qwfe

Protein 2
Sequence: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
Mass (M+H): 4842.05
Tryptic fragments: acek, dfhsadfqeasdfpk, ivtmeeewenk, dadnfeqwfe

Protein 3
Sequence: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Mass (M+H): 4842.05
Tryptic fragments: acedfhsadfqek, asdfpk, ivtmeeewendak, dnfeqwfe

Principles of Fingerprinting

Protein 1: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe, Mass (M+H) 4842.05
Protein 2: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe, Mass (M+H) 4842.05
Protein 3: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe, Mass (M+H) 4842.05

[Figure: Mass Spectrum]

Amino Acid Residue Masses (Monoisotopic Mass)

Glycine        57.02147     Aspartic acid   115.02695
Alanine        71.03712     Glutamine       128.05858
Serine         87.03203     Lysine          128.09497
Proline        97.05277     Glutamic acid   129.0426
Valine         99.06842     Methionine      131.04049
Threonine     101.04768     Histidine       137.05891
Cysteine      103.00919     Phenylalanine   147.06842
Isoleucine    113.08407     Arginine        156.10112
Leucine       113.08407     Tyrosine        163.06333
Asparagine    114.04293     Tryptophan      186.07932
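
The residue masses above can be used to calculate the (M+H) mass of a peptide. A minimal sketch, assuming the usual convention that the peptide mass is the sum of its residue masses plus one water (18.010565 Da) plus one proton (1.007276 Da); these two constants and the function name `peptide_mh` are not from the slides.

```python
# Monoisotopic residue masses from the table above, keyed by one-letter code
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.0426, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER = 18.010565   # assumed monoisotopic mass of H2O
PROTON = 1.007276   # assumed mass of a proton

def peptide_mh(sequence):
    """Monoisotopic (M+H) mass of an unmodified peptide."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence.upper())
    return residues + WATER + PROTON

for fragment in ("acek", "qwfe", "acedfhsak"):
    print(fragment, round(peptide_mh(fragment), 4))
```

The values agree with the calculated masses listed for these tryptic fragments in the PMF database example to within rounding.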

Building a PMF Database

Sequence DB entry: P12345 (Protein 1)
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
Tryptic fragment       Calc. mass
acedfhsak              1007.4251
dfqeasdfpk             1183.5266
ivtmeeewendadnfek      2098.8909
qwfe                   609.2667

Sequence DB entry: P21234 (Protein 2)
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
Tryptic fragment       Calc. mass
acek                   450.2017
dfhsadfqeasdfpk        1740.7501
ivtmeeewenk            1407.6462
dadnfeqwfe             1300.5116

Sequence DB entry: P89212 (Protein 3)
acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Tryptic fragment       Calc. mass
acedfhsadfqek          1526.6211
asdfpk                 664.3300
ivtmeeewendak          1593.7101
dnfeqwfe               1114.4416

Database Search

Database Mass List
450.2017 (P21234)
609.2667 (P12345)
664.3300 (P89212)
1007.4251 (P12345)
1114.4416 (P89212)
1183.5266 (P12345)
1300.5116 (P21234)
1407.6462 (P21234)
1526.6211 (P89212)
1593.7101 (P89212)
1740.7501 (P21234)
2098.8909 (P12345)

Query Masses
450.2201
609.3667
698.3100
1007.5391
1199.4916
2098.9909

Results
2 unknown masses
1 hit on P21234
3 hits on P12345

Conclude the query protein is P12345
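
The search above can be sketched as a simple tolerance match. This is an illustration only, not how Mascot or any real search engine scores hits: each query mass is counted as a hit when it lies within a mass tolerance of a database mass, and the protein with the most hits is reported. The tolerance of 0.2 Da is an assumption chosen for this example; the slides do not state one.

```python
from collections import Counter

DATABASE = [
    (450.2017, "P21234"), (609.2667, "P12345"), (664.3300, "P89212"),
    (1007.4251, "P12345"), (1114.4416, "P89212"), (1183.5266, "P12345"),
    (1300.5116, "P21234"), (1407.6462, "P21234"), (1526.6211, "P89212"),
    (1593.7101, "P89212"), (1740.7501, "P21234"), (2098.8909, "P12345"),
]
QUERY = [450.2201, 609.3667, 698.3100, 1007.5391, 1199.4916, 2098.9909]

def search(query, database, tol=0.2):
    """Count database hits per protein; tol (in Da) is an assumed tolerance."""
    hits = Counter()
    unknown = []
    for q in query:
        matches = [pid for mass, pid in database if abs(mass - q) <= tol]
        if matches:
            hits.update(matches)
        else:
            unknown.append(q)
    return hits, unknown

hits, unknown = search(QUERY, DATABASE)
best = hits.most_common(1)[0][0]
print(dict(hits), "unknown:", unknown, "best match:", best)
```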

[Figure: a search program compares the experimental spectrum (m/z axis, 800 to 2400) with theoretical spectra built from the database of peptide mass lists for P12345, P21234 and P89212 to give the protein ID]

Generating Peptide Mass List

A protein (chaperonin GroEL) derived from a sequence database is digested with a specific protease (trypsin)
546 aa; 60 kDa (57 461 Da); pI = 4.75

>RBME00320 Contig0311_1089618_1091255 EC-mopA 60 KDa chaperonin GroEL


MAAKDVKFGR TAREKMLRGV DILADAVKVT LGPKGRNVVI EKSFGAPRIT KDGVSVAKEV
ELEDKFENMG AQMLREVASK TNDTAGDGTT TATVLGQAIV QEGAKAVAAG MNPMDLKRGI
DLAVNEVVAE LLKKAKKINT SEEVAQVGTI SANGEAEIGK MIAEAMQKVG NEGVITVEEA
KTAETELEVV EGMQFDRGYL SPYFVTNPEK MVADLEDAYI LLHEKKLSNL QALLPVLEAV
VQTSKPLLII AEDVEGEALA TLVVNKLRGG LKIAAVKAPG FGDCRKAMLE DIAILTGGQV
ISEDLGIKLE SVTLDMLGRA KKVSISKENT TIVDGAGQKA EIDARVGQIK QQIEETTSDY
DREKLQERLA KLAGGVAVIR VGGATEVEVK EKKDRVDDAL NATRAAVEEG IVAGGGTALL
RASTKITAKG VNADQEAGIN IVRRAIQAPA RQITTNAGEE ASVIVGKILE NTSETFGYNT
ANGEYGDLIS LGIVDPVKVV RTALQNAASV AGLLITTEAM IAELPKKDAA PAGMPGGMGG
MGGMDF

http://us.expasy.org/tools/peptide-mass.html

Generating Peptide Mass List

Trypsin yields 47 peptides (theoretically) for chaperonin GroEL

Peptide masses in Da:
501.3   533.3   544.3   545.3   614.4   634.3   674.3   675.4   701.4   726.4
822.4   855.5   861.4   879.4   921.5   953.4   974.5   988.5   1000.6  1196.6
1217.6  1228.5  1232.6  1233.7  1249.6  1249.6  1344.7  1455.8  1484.6  1514.8
1582.9  1583.9  1616.8  1726.7  1759.9  1775.9  1790.6  1853.9  1869.9  2286.2
2302.2  2317.2  2419.2  2526.4  2542.4  3329.6  4211.4

http://us.expasy.org/tools/peptide-mass.html

2D-PAGE approach

Multidimensional LC approach

Generalized Protein Identification by MS

Experimental route: protein sample -> fragmented using trypsin -> MALDI-TOF -> spectrum of fragments generated
Database route: database of sequences (e.g. SwissProt, http://www.uniprot.org/) -> artificially trypsinated -> artificial spectra built -> library
MATCH: the experimental spectrum is matched against the library (Mascot)

Peptide Mass Analysis Tools

MS database search program: Mascot

Tutorial
http://www.matrixscience.com/help/pmf_help.html

Data entry
http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=PMF

Search result
http://www.matrixscience.com/cgi/master_results.pl?file=../data/F981122.dat

Search result: protein info
http://www.matrixscience.com/cgi/protein_view.pl?file=..%2Fdata%2FF981122.dat&hit=1&db_idx=1

Peptide Sequence Analysis

Protein                   Peptide Sequence    Mass (M+H)
Human hemoglobin alpha    vgahageygaealer     1529.74
Mouse hemoglobin alpha    igghgaeygaealer     1529.74

In some cases, two peptides are identical in one way (mass), yet are obviously different in another (AA sequence).
Therefore sequencing is required to differentiate peptides with identical mass.
Tandem MS technology allows further peptide fragmentation in order to determine the peptide sequence from fragmentation patterns in MS-MS spectra.

ESI-MS/MS for Protein Sequencing

Electrospray Ionization (ESI)

Liquid containing the analyte is forced through a steel capillary at high voltage to electrostatically disperse the analyte.
Charge is imparted from the rapidly evaporating liquid.

[Figure: capillary at 3-4 kV; droplets containing ions; ions evaporating from the surface of the droplets]

ESI-MS/MS for Protein Sequencing

Electrospray Ionization (ESI)

Can be modified to a nanospray system with flow < 1 µL/min
Very sensitive, requires < 1 picomole of material
Positive ion mode measures (M + H)+ (add formic acid to solvent)
Negative ion mode measures (M - H)- (add ammonia to solvent)
Produces multiply charged ions from proteins and peptides.
Peptides (250-2500 Da) typically exist as a mixture of singly, doubly and triply charged ions, with the doubly charged ion as the dominant form.

ESI-MS analysis of a protein (bovine apomyoglobin)

Multiple-charge signals arise from differently charged forms of the protein
A charge-deconvolution program can convert the multiple-charge signals to one that represents the actual protein mass
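
The idea behind charge deconvolution can be sketched with two adjacent peaks. This is a minimal illustration, assuming both peaks come from the same protein, carry charges z and z+1, and are protonated only; real deconvolution programs use the whole charge envelope. The neutral mass used to generate the example peaks (16951.5 Da) is an illustrative value, not data from these slides.

```python
PROTON = 1.007276  # assumed mass of a proton in Da

def mz(mass, z):
    """m/z of a molecule of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON) / z

def deconvolve_pair(mz_z, mz_z_plus_1):
    """Charge and neutral mass from two adjacent peaks (charges z and z+1).

    From M = z * (mz_z - PROTON) = (z + 1) * (mz_z_plus_1 - PROTON) it follows
    that z = (mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1).
    """
    z = round((mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1))
    mass = z * (mz_z - PROTON)
    return z, mass

example_mass = 16951.5                               # illustrative neutral mass
peak_a, peak_b = mz(example_mass, 15), mz(example_mass, 16)
charge, mass = deconvolve_pair(peak_a, peak_b)
print(charge, round(mass, 1))
```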

ESI-MS analysis of a peptide (DAFLGSFLYEYSR)

Full scan spectrum of the peptide indicating
A singly charged ion [M+H]+ at m/z 1567.9
A doubly charged ion [M+2H]2+ at m/z 784.7

Mass Analyzers for MS/MS Analysis

Quadrupole

The applied voltages affect the trajectory of ions traveling down the flight path
For given dc and ac voltages, only ions of a certain mass-to-charge ratio pass through the quadrupole filter; all other ions are thrown out of their original path

Mass Analyzers for MS/MS Analysis

Ion trap

Ion trapping devices that make use of a three-dimensional quadrupole field to trap and mass-analyze ions
Offer good mass resolving power

Orbitrap

Ion trap consisting of an outer barrel-like electrode and a coaxial inner spindle-like electrode
Trapped ions are separated based on their orbital movement around the spindle.

ESI-MS/MS for Protein Sequencing

Tandem Mass Spectrometry

Purpose is to fragment a selected parent (precursor) ion to provide structural information about a molecule
Also allows mass separation and identification of the amino acids (AA) of compounds in complex mixtures
Uses two or more mass analyzers/filters in series with a collision cell in between
The collision cell is the region where selected ions collide with a gas (He, Ne, Ar), resulting in further fragmentation of the ion

Tandem Mass Configuration

Triple quadrupole (QQQ)
Quadrupole time-of-flight (QTOF)
Ion trap

Ion trap, full-scan mode:
Trapping of ions within the analyzer
Sequential scanning out of ions of differing m/z

Ion trap, MS-MS mode:
Collision-induced dissociation (fragmentation) of a selected ion
Sequential scanning out of product ions derived from fragmentation of the precursor ion.

Peptide Sequencing using Tandem Mass

Fragmentation of peptides occurs in a predictable fashion, mainly at the peptide bonds
The resulting daughter ions have masses that are consistent with known molecular weights of dipeptides, tripeptides, tetrapeptides

Precursor peptide A-V-A-G-C-A-G-A-R (769) enters the collision cell and gives the fragments:

Fragment            Accumulative mass
A-V-A-G-C-A-G-A     601
A-V-A-G-C-A-G       530
A-V-A-G-C-A         473
A-V-A-G-C           402
A-V-A-G             299
A-V-A               242
A-V                 171
A                   72

Fragmentation of Peptide Backbone

Collision-induced dissociation (CID)

Most widely applied fragmentation method in proteomic studies
The peptide ion undergoes collisions with neutral gas molecules
The resulting vibrational energy dissociates amide bonds along the peptide backbone, generating b- and y-type fragment ions.

Other fragmentation methods
Electron-transfer dissociation (ETD)
Electron-capture dissociation (ECD)
Higher-energy collisional dissociation (HCD)

[Figure: nomenclature for fragmentation of peptide ions]

Fragmentation of Peptide Backbone

Collision-induced dissociation (CID)

Possible b- and y-ion fragments for the peptide AVAGCAGAR

b1  A+                    y8  V-A-G-C-A-G-A-R+
b2  A-V+                  y7  A-G-C-A-G-A-R+
b3  A-V-A+                y6  G-C-A-G-A-R+
b4  A-V-A-G+              y5  C-A-G-A-R+
b5  A-V-A-G-C+            y4  A-G-A-R+
b6  A-V-A-G-C-A+          y3  G-A-R+
b7  A-V-A-G-C-A-G+        y2  A-R+
b8  A-V-A-G-C-A-G-A+      y1  R+
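
The masses of the b- and y-ions listed above can be calculated from the residue masses. A minimal sketch, assuming singly charged ions of an unmodified peptide, where a b-ion is the sum of its residues plus a proton and a y-ion is the sum of its residues plus water plus a proton; the constants for water and the proton and the function name `b_y_ions` are assumptions, not from the slides.

```python
# Monoisotopic residue masses for the residues that occur in AVAGCAGAR
RESIDUE_MASS = {"A": 71.03712, "V": 99.06842, "G": 57.02147,
                "C": 103.00919, "R": 156.10112}
WATER = 18.010565   # assumed monoisotopic mass of H2O
PROTON = 1.007276   # assumed mass of a proton

def b_y_ions(peptide):
    """Return lists of singly charged b-ion (b1..) and y-ion (y1..) m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i contains the first i residues (N-terminal fragments)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i contains the last i residues (C-terminal fragments)
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y

b_ions, y_ions = b_y_ions("AVAGCAGAR")
print("b:", [round(m, 2) for m in b_ions])
print("y:", [round(m, 2) for m in y_ions])
```

Rounded to whole numbers, the b-ion series reproduces the accumulative masses 72, 171, 242, 299, 402, 473, 530 and 601 shown for the N-terminal fragments of this peptide.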

Fragmentation of Peptide Backbone

Collision-induced dissociation (CID)

y- and b-ion series describe the same amino acid sequence in two different directions.

[Figure: annotated MS-MS spectrum of the [M+2H]2+ ion of AVAGCAGAR]

MS/MS Analysis of Peptide Mixtures

LC: separation of peptide mixtures
MS: m/z profile of peptide ions
MS/MS: fragments of a peptide ion
Identification of amino acids
MS/MS Analysis of Peptide Mixtures


Automated collection of MS/MS spectra

Matching MS/MS Spectra to Peptide Sequences

SEQUEST database search program

[Figure: experimental MS/MS spectrum (PySpzS5609 #2438, RT: 66.03, Full ms2 729.75@35.00 [190.00-1470.00]); x-axis: m/z, y-axis: relative abundance]

Peptides matching the precursor ion mass:
#1 K.TVLIMELINNVAK.K
#2 L.NAKMELLIDLVKA.Q
#3 E.ELAILMQNNIIGE.N
#4 A.CGPSRQNLLNAMP.S
#5 L.FAPLQEIINGILE.G

CALCULATE: theoretical MS/MS spectra for each candidate peptide
COMPARE: each theoretical spectrum with the experimental MS/MS spectrum
SCORE: rank the candidates by how well they match

SEQUEST Output File

Peptide Mass Analysis Tools

MS/MS database search algorithms
Sequest
Paragon Algorithm
Mascot

Software for Proteomics Study

Proteome Discoverer
Mass informatics platform (Thermo Scientific)
For analysis of qualitative and quantitative proteomics data
Includes BioWorks Browser with both SEQUEST and Mascot as search engines for protein identification

MaxQuant
Quantitative proteomics software package designed for analyzing large mass spectrometric data sets.
Supports Thermo LTQ-Orbitrap mass data analysis with Mascot as a search engine.

Analyst QS
Applied Biosystems/MDS SCIEX
Software to support the proteomics workflow for the QSTAR system
Includes a number of programs, e.g. ProteinPilot with the Paragon algorithm as the search engine for protein identification.

Tandem Mass Spectrometry


Advantages

Fast, gel-free
Determines MW and AA sequence
Can be used on complex mixtures, including low-abundance proteins
Can detect post-translational modifications
High-throughput capability

Disadvantages

Very expensive facility

Hardware: $1000
Setup: $300
1 run: $1000

Requires sequence databases for analysis

Summary of Mass Spectrometry

Create ions
Ionization methods: MALDI, ESI

Separate ions
Mass analyzers:
TOF (MALDI-TOF: MW)
Quadrupole (ESI-QQQ: AA seq; MALDI-QTOF: MW & AA seq)
Ion trap, Orbitrap (AA seq & protein modif.)

Detect ions
Mass spectra

Database search & matching
MS spectra: Mascot search program
MS/MS spectra: Sequest search program

Questions to check for understanding


1. Direct determination of a protein sequence can be thought of as an eight step strategy. What are these eight steps?

2. Name a method commonly used for N-terminal sequencing of a short peptide (not longer than 50 residues).

3. Sequencing the protein itself by direct methods is laborious. How are most protein sequences generated today?

Questions to check for understanding


4. Mass spectrometry (MS) is an analytical method widely used for chemical compound analysis. Briefly describe the:

(a) Principle of MS
(b) Important components of a mass spectrometer
(c) Application of MS in protein analysis

5. What do you understand by Peptide Mass Fingerprinting as a method for protein identification using mass spectrometry?
