
Mass Spectrometry I

Basic Data Processing

Mass spectrometry
A mass spectrometer measures molecular masses. The mass unit is the dalton (Da), defined as 1/12 of the mass of a carbon-12 atom, which is about the mass of one hydrogen atom. If a sample contains a mixture of different molecules, all of their masses are measured simultaneously, so you get a spectrum.

Some Pictures
MALDI-R Q-Tof Micro

FT-ICR

LTQ-Orbitrap

Each peak corresponds to a different type of molecule in your sample


[Figure: mass spectrum, relative intensity (0 to 100) versus m/z (1000 to 3000). Labeled peaks include 1179.41, 1265.62, 1324.60, 1759.93, 1974.94, 2465.20, 2789.22, 2790.22, 2791.23, and 3103.43.]

Peak list (m/z, intensity):
...
2789.22  3597.0
2790.22  5018.0
2791.23  4406.0
2792.23  2868.0
2793.23  1234.0

Three Components of an MS
A typical mass spectrometer contains three components:
Ionizer
Mass analyzer
Detector

The ion source charges the molecules to be measured.

The charge can be negative but is usually positive.
Two common types: MALDI and ESI.
John B. Fenn and Koichi Tanaka: 2002 Nobel Prize in Chemistry for electrospray and MALDI.

The mass analyzer separates ions according to their mass-to-charge ratio (m/z).
Examples: ion trap, TOF, quadrupole, FT-ICR.

The detector detects the ions.

Ionization (1): MALDI


Matrix Assisted Laser Desorption/Ionization

Sample is co-crystallized with matrix (solid)
Formation of singly charged ions
Koichi Tanaka, Nobel Prize 2002
Other ionization methods exist.

Mass Analyzer (1) TOF


Time of flight (TOF).
[Figure: positive ions (+) flying toward the detector.]

The time of flight is proportional to sqrt(m/z).
Other mass analyzers exist.
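The sqrt(m/z) relation can be checked numerically. The sketch below is a minimal idealized model, not an instrument simulation: it assumes every ion gains kinetic energy z*e*V from an accelerating voltage V and then drifts a length D at constant speed. The function name, the 1 m drift length, and the 20 kV voltage are illustrative assumptions, not values from the lecture.

```python
import math

DALTON_KG = 1.66053906660e-27          # mass of 1 Da in kilograms
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one proton in coulombs


def flight_time(mass_da, charge, drift_length_m=1.0, accel_voltage_v=20_000.0):
    """Ideal drift time in seconds for an ion in a linear TOF analyzer.

    Assumed model: kinetic energy z*e*V = (1/2)*m*v^2, then t = D / v.
    The default drift length and voltage are hypothetical.
    """
    mass_kg = mass_da * DALTON_KG
    kinetic_energy_j = charge * ELEMENTARY_CHARGE_C * accel_voltage_v
    speed_m_per_s = math.sqrt(2.0 * kinetic_energy_j / mass_kg)
    return drift_length_m / speed_m_per_s


# Quadrupling m/z doubles the flight time, as sqrt(m/z) predicts.
ratio = flight_time(4000.0, 1) / flight_time(1000.0, 1)
```

In this model, ions with the same m/z arrive together regardless of their individual mass and charge, which is why the analyzer reports m/z rather than mass.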

Putting Them Together


MALDI Time-of-flight

MALDI TOF

Drift region (D)

Average time in TOF: 10^-7 s; average speed: 1-2 x 10^5 km/h

MALDI-TOF Linear

Mass range = 800-200,000

Sensitivity and accuracy decrease rapidly with size!

MALDI-TOF Linear vs Reflectron Mode
Linear = poor resolution due to velocity variation among ions with the same m/z
Reflectron = contact lens for a near-sighted machine!

Reflectron gives much better resolution for masses < 6,000

Protein identification with intact mass


We measure the intact mass of the protein, then search the protein database for a protein with the same mass.
Good idea, but too many proteins have the same mass.
In the rest of the lecture we study more sophisticated methods and why protein ID is important.

Complications
isotopes

widened peaks profile

Centroiding

Another example with lower resolution

Isotopes

Back to Basics
Chemical Composition of Living Matter
27 of 92 natural elements are essential.
Elements in biomolecules (organic matter): H, C, N, O, P, S
These elements represent approximately 92% of dry weight.

Organic matter is organized in "building blocks":
amino acids, which form polypeptides (proteins)
monosaccharides, which form starch and glycogen
nucleic acids (DNA, RNA)

Mass (Weights) of Atoms and Molecules


Element  Nominal mass  Exact mass  Percent abundance  Average mass
C        12            12.00000    98.9%              12.011
         13            13.00335    1.1%
H        1             1.00783     99.98%             1.00794
         2             2.01410     0.02%
O        16            15.99491    99.8%              15.9994
         18            17.9992     0.2%
N        14            14.00307    99.63%             14.0067
         15            15.00011    0.37%
S        32            31.97207    94.93%
         33            32.97146    0.76%
         34            33.96787    4.29%

Exercise

Mass or Molecular Weight of Molecules

Ethyl acetate, C4H8O2: 4 x C-12, 8 x H-1, 2 x O-16

Nominal mass: 48 + 8 + 32 = 88

Monoisotopic mass:
4 x 12.00000 = 48.0000
8 x 1.00783 = 8.0626
2 x 15.99491 = 31.9898
Total = 88.0524

Average mass:
4 x 12.011 + 8 x 1.00794 + 2 x 15.9994 = 48.044 + 8.0635 + 31.9988 = 88.1063
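The three kinds of mass can be computed directly from an elemental composition. The following is a minimal sketch: the exact masses are those of the lightest isotope of each element, the average masses are standard atomic weights entered here as an assumption, and the dictionary and function names are illustrative.

```python
# element -> (exact mass of the lightest isotope, average atomic mass)
# Average values are standard atomic weights, entered here as an assumption.
ATOMS = {
    "C": (12.00000, 12.011),
    "H": (1.00783, 1.00794),
    "O": (15.99491, 15.9994),
    "N": (14.00307, 14.0067),
}


def formula_masses(composition):
    """Return (nominal, monoisotopic, average) mass for a {element: count} dict."""
    nominal = sum(round(ATOMS[el][0]) * n for el, n in composition.items())
    monoisotopic = sum(ATOMS[el][0] * n for el, n in composition.items())
    average = sum(ATOMS[el][1] * n for el, n in composition.items())
    return nominal, monoisotopic, average


# Ethyl acetate, C4H8O2
nominal, monoisotopic, average = formula_masses({"C": 4, "H": 8, "O": 2})
```

The nominal mass is an integer, the monoisotopic mass uses only the lightest isotopes, and the average mass weights every isotope by its natural abundance.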

Amino Acids
There are 20 standard amino acids. All have the same basic structure but different side chains.

Examples:
Glycine (Gly, G): the side chain group is a single H
Arginine (Arg, R)

All the 20 Structures

* Picture copied from Dr. R.J. Huskey's website: http://www.people.virginia.edu/~rjh9u/aminacid.html

Peptides and Proteins


[Figure: the dipeptide GR, glycine (Gly, G) followed by arginine (Arg, R), with the N-terminal, the C-terminal, and the peptide bonds labeled.]

Mass of Amino Acid Residues

Exact Mass of Amino Acid Residues in Proteins

Gly  G   57.02150
Ala  A   71.03720
Gln  Q  128.05860
Lys  K  128.09500
Glu  E  129.04270

Note: Leu (L) = Ile (I) = 113.08410

Amino Acid Table


Source: ionsource.com

AA   Code  Mono.
Gly  G      57.021464
Ala  A      71.037114
Ser  S      87.032029
Pro  P      97.052764
Val  V      99.068414
Thr  T     101.04768
Cys  C     103.00919
Leu  L     113.08406
Ile  I     113.08406
Asn  N     114.04293
Asp  D     115.02694
Gln  Q     128.05858
Lys  K     128.09496
Glu  E     129.04259
Met  M     131.04048
His  H     137.05891
Phe  F     147.06841
Arg  R     156.10111
CMC  -     161.01467
Tyr  Y     163.06333
Trp  W     186.07931

Cysteine

Proteins are often treated so that cysteine becomes carboxyamidomethyl cysteine (CamC) or carboxymethyl cysteine (CmC), in order to break the disulfide bonds. CamC residue mass = 160.03

Mass of Peptides and Proteins

Ala-Ser-Phe (ASF) tripeptide: MW = 71.04 + 87.03 + 147.07 + 18.01 = 323.15 (three residues plus one water)
More precisely: monoisotopic mass 323.1481, average mass 323.3490
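The ASF calculation generalizes to any peptide: sum the residue masses and add one water for the two termini. The sketch below uses the monoisotopic residue masses from the amino acid table; the water mass of 18.010565 and the function name are assumptions added for illustration.

```python
# Monoisotopic residue masses, keyed by one-letter code.
RESIDUE_MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032029, "P": 97.052764,
    "V": 99.068414, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04048, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.010565  # H2O: H on the N-terminal, OH on the C-terminal


def peptide_monoisotopic_mass(sequence):
    """Monoisotopic mass of an unmodified peptide given as one-letter codes."""
    return sum(RESIDUE_MONO[aa] for aa in sequence) + WATER_MONO


asf_mass = peptide_monoisotopic_mass("ASF")
```

Because Leu and Ile have identical residue masses, two peptides that differ only by an L/I swap get the same result from this function.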

In a mass spectrum
Deconvolution adds all the isotopic peaks to the monoisotopic peak, so later processing does not need to worry about isotopes.

[Figure: monoisotopic peak at 323.15, followed by isotope peaks at 324.15 and 325.15.]

Check the difference between adjacent peaks.

ESI and Multiply Charged Ions

Electrospray

Ionization (2) ESI

Electrospray Ionization: Formation of Charged Droplets

Formation of multiply charged ions

Multiply Charged Ions


The same molecule may carry different numbers of charges, and therefore form several peaks in the spectrum.

[Figure: m/z axis with isotope clusters of one molecule at different charge states: (M+1)/1 at 323.15, 324.15, 325.15; (M+2)/2 at 162.08, 162.58, 163.08; and (M+3)/3 at lower m/z.]

For proteins and peptides with positive charges, the charge comes from added protons, each with a mass of approximately 1 dalton. As a result, a molecule with mass M and charge Z gives a peak at (M+Z)/Z.
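The (M+Z)/Z rule is easy to tabulate. This minimal sketch uses the more precise proton mass of 1.007276 Da instead of the rounded value 1; the function names are illustrative, and the example mass is the ASF monoisotopic mass computed earlier in the lecture.

```python
PROTON_MASS = 1.007276  # Da; the slides round this to 1


def mz_of(neutral_mass, charge):
    """m/z of a molecule of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_of(mz, charge):
    """Inverse of mz_of: recover the neutral mass from an observed m/z."""
    return charge * (mz - PROTON_MASS)


# Expected peak positions of ASF (M = 323.1481) at charges 1, 2, and 3.
asf_peaks = {z: mz_of(323.1481, z) for z in (1, 2, 3)}
```

Higher charge states land at lower m/z, so one molecule can appear in several places in an ESI spectrum.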

How to determine charge states?


When the resolution is high enough, use the isotope peaks: they are spaced about 1/Z apart. When it is not, compare the peaks of different charge states.
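For the high-resolution case, the charge can be read off the spacing between neighboring isotope peaks. The sketch below assumes the input is a clean list of m/z values from a single isotope cluster; the function name is illustrative, and real data would first need peak picking and noise handling.

```python
def charge_from_isotope_spacing(cluster_mz):
    """Estimate the charge of one isotope cluster from its peak spacing.

    Assumes at least two peaks, all from the same cluster. Neighboring
    isotope peaks differ by about 1 Da in mass, so about 1/z in m/z.
    """
    peaks = sorted(cluster_mz)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(1.0 / mean_spacing)


z_single = charge_from_isotope_spacing([323.15, 324.15, 325.15])
z_double = charge_from_isotope_spacing([162.08, 162.58, 163.08])
```

A spacing of 1.0 indicates a singly charged ion and a spacing of 0.5 a doubly charged one; at higher charges the peaks crowd together, which is why this approach needs sufficient resolution.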

Exercise

[Figure: spectrum with labeled peaks at m/z 395.73, 396.22, and 397.24.]

Exercise

Exercise
[Figure: (A) multi-charge envelope with peaks at m/z 1211.9, 1304.7, 1413.2, and 1541.9; (B) the same spectrum after the charge-deconvolution algorithm.]
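When isotope peaks are not resolved, two neighboring peaks of a multi-charge envelope are enough to recover the charge. If the peak at higher m/z carries z protons and its lower neighbor carries z+1, then z*(mz_high - p) = (z+1)*(mz_low - p), which gives z = (mz_low - p) / (mz_high - mz_low). The sketch below applies this to the four envelope peaks. It is a simple illustration under that assumption, not the charge-deconvolution algorithm behind the figure, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def charge_and_mass(mz_high, mz_low):
    """Charge of the mz_high peak and the neutral mass it implies.

    Assumes mz_low is the same molecule carrying exactly one more proton.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)


def deconvolute_envelope(envelope_mz):
    """Apply charge_and_mass to each pair of adjacent envelope peaks."""
    peaks = sorted(envelope_mz)
    return [charge_and_mass(hi, lo) for lo, hi in zip(peaks, peaks[1:])]


results = deconvolute_envelope([1211.9, 1304.7, 1413.2, 1541.9])
charges = [z for z, _ in results]
mean_mass = sum(m for _, m in results) / len(results)
```

The adjacent pairs should agree on a single neutral mass to within the measurement error. Averaging the per-pair estimates is one simple way to combine them.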

Baseline

Baseline correction

Convex Hull Method

convex

not convex

Convex Hull

The convex hull used here is the lower one: a chain of line segments through data points such that every data point lies on or above each segment and its extension.

How to calculate convex hull?


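Stack S contains all the data points that form the convex hull so far.
Data point D[i] = (D[i].x, D[i].y), for i = 0, ..., n-1, sorted by increasing x.

Algorithm:
1. S.push(D[0]); S.push(D[1])
2. for i from 2 to n-1
   2.1 while S has at least two points and S.secondtop(), S.top(), D[i] are concave
       (that is, S.top() lies above the segment from S.secondtop() to D[i])
       2.1.1 S.pop()
   2.2 S.push(D[i])
3. return S

[Figure: the points S.secondtop(), S.top(), and D[i].]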
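The stack algorithm translates almost line for line into Python. The sketch below is one possible rendering, assuming the spectrum is a list of (x, y) tuples sorted by strictly increasing x. It also pops points that are collinear with their neighbors, a small deviation from the pseudocode that leaves the hull line unchanged. The baseline subtraction step is an assumed use of the hull: the hull is interpolated at every x and subtracted from the signal.

```python
def cross(o, a, b):
    """Cross product of (a - o) and (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of points sorted by strictly increasing x."""
    stack = []
    for p in points:
        # Pop while the top point lies on or above the segment second-top -> p.
        while len(stack) >= 2 and cross(stack[-2], stack[-1], p) <= 0:
            stack.pop()
        stack.append(p)
    return stack


def subtract_baseline(points):
    """Subtract the linearly interpolated lower hull from every point."""
    hull = lower_hull(points)
    corrected = []
    seg = 0  # index of the hull segment that covers the current x
    for x, y in points:
        while seg + 2 < len(hull) and x > hull[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = hull[seg], hull[seg + 1]
        baseline = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        corrected.append((x, y - baseline))
    return corrected


# A sloping baseline y = 1 + 0.5x with one peak of height 4 at x = 2.
spectrum = [(0, 1.0), (1, 1.5), (2, 6.0), (3, 2.5), (4, 3.0)]
hull = lower_hull(spectrum)
flattened = subtract_baseline(spectrum)
```

After subtraction the baseline points sit at zero and only the peak height above the baseline remains.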

Analyze the convex hull algorithm


Correctness
The algorithm finishes. The output is a convex hull. The proof will be included in an assignment.

Time complexity
O(n) time. Proof: each point is pushed onto the stack exactly once and popped at most once, so there are at most 2n stack operations in total.

Summary of spectrum preprocessing

Baseline correction
Centroiding
Charge recognition and deconvolution
Noise removal
