
A biased look at Biomarkers

Biomarker
Definition:
A biomarker is a substance used as an indicator of a biologic state:
-the existence of living organisms or of a biological process
-a particular disease state

-Proteins
-Nucleic acids
-Metabolites: carbohydrates, lipids, small molecules

Biomarker
Detection of a biomarker for diagnosis:
-by its own properties, e.g. enzymatic activities
-with antibodies: IHC, ELISA

Detection of a biomarker can be:
-Quantitative: a link between the quantity of the marker and disease
-Qualitative: a link between the existence of a marker and disease

Biomarker & Diagnosis


Ideal Marker for diagnosis
-Should have great sensitivity, specificity, and accuracy in reflecting total disease burden.
-A tumor marker should also be prognostic of outcome and treatment.

Biomarker for Screening
-The marker must be highly specific, minimizing false positives and false negatives.
-The marker must clearly reflect the different stages of the disease, including the early ones.
-The marker must be easily detected without complicated medical procedures. Disease markers released into serum and urine are good targets for early screening.
-The method for screening should be cost effective.

Samples for biomarker detection


-Blood, urine, or other body fluid samples
-Tissue samples

Prostate Cancer marker PSA


PSA is a protein normally made in the prostate gland, in ductal cells that make some of the semen. PSA helps to keep the semen liquid. PSA, also known as kallikrein III, seminin, semenogelase, γ-seminoprotein and P-30 antigen, is a glycoprotein and a serine protease.

Prostate Cancer Diagnosis with PSA


Cancer of the prostate does not cause any symptoms until it is locally advanced or metastatic. There is a correlation between elevated PSA and prostate cancer. Detection of PSA is a surrogate for early detection of prostate cancer.

Large screening trials have shown that PSA nearly doubles the rate of detection when combined with other methods. Based on these data, PSA testing was approved by the US FDA for the screening and early detection of prostate cancer.
PSA is also found in the cytoplasm of benign prostate cells.
"I never dreamed that my discovery four decades ago would lead to such a profit-driven public health disaster." -Richard Ablin (inventor of the PSA test)
PSA screening generates ~$1.7 billion annually in the U.S. alone.

Sensitivity = the ability of the test to detect the disease (true positive rate)
Specificity = the likelihood that your test will be normal if you are disease free (true negative rate)
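The two definitions above can be written as simple ratios of counts. The sketch below is only an illustration: the function names and the example counts are hypothetical and do not come from any study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased people the test flags (true positive rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of disease-free people the test clears (true negative rate)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only for illustration.
sens = sensitivity(true_pos=20, false_neg=80)
spec = specificity(true_neg=940, false_pos=60)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```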

A brief aside about Statistics and Probability

-Statistics are the formalization of common sense
-because they have to handle many different situations, they can be really complicated
-they should make you feel really good or really bad about your data
-People are inherently bad at statistics and probability

Case Study:
rate for being HIV positive: 1:10000
false positive rate of HIV test: 1:1000

If I test positive, what is the chance that I am really HIV negative?

What is the chance that I am HIV negative? 0.0001 0.001 0.01 0.1 0.9 0.99 0.9999

Out of 10,000 people tested there will be 1 true positive and about 10 false positives, so after a positive test my chance of being negative is 10/11 ≈ 0.9.
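The 10/11 answer is a direct application of Bayes' rule. A minimal sketch follows; the function name is hypothetical, and it assumes (as the case study implicitly does) that the test detects every true positive, i.e. sensitivity = 1.

```python
def chance_negative_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """Probability that a person who tested positive is actually negative."""
    true_pos = prevalence * sensitivity                     # diseased and flagged
    false_pos = (1.0 - prevalence) * false_positive_rate    # healthy but flagged
    return false_pos / (true_pos + false_pos)


# Case study numbers: prevalence 1:10000, false positive rate 1:1000.
hiv = chance_negative_given_positive(prevalence=1 / 10_000,
                                     false_positive_rate=1 / 1_000)
print(round(hiv, 3))
```

The exact value differs very slightly from 10/11 because the false positive rate applies only to the 9,999 people who are truly negative.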

How about the PSA test?

Rate is 15:10000
False positive rate is 60:1000
For every 15 true positives, there will be 600 false positives!
Chance of being negative = 600/615 ≈ 0.98
Chance of being positive = 15/615 ≈ 0.02 (before the test the chance was 0.0015)
-Is this true?

The test will miss 80% of the true positives (sensitivity = 20%), so only 3 true positives will be detected:
Chance of being negative = 600/603 = 0.995
Chance of being a true positive = 3/603 = 0.005
Follow-up for a positive HIV test is another blood test. Follow-up for a positive PSA test is a tissue biopsy.
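The PSA arithmetic, including the 20% sensitivity adjustment, can be reproduced per 10,000 men tested. This is a sketch under the slide's own approximation (the false positive rate is applied to all 10,000 men rather than only to the disease-free ones); the function name is hypothetical.

```python
def positives_per_10000(rate_per_10000, false_pos_per_1000, sensitivity):
    """Return (detected true positives, false positives) per 10,000 tested."""
    detected_true = rate_per_10000 * sensitivity
    # Slide's approximation: false positive rate applied to everyone tested.
    false_pos = false_pos_per_1000 * 10
    return detected_true, false_pos


tp, fp = positives_per_10000(rate_per_10000=15, false_pos_per_1000=60,
                             sensitivity=0.20)
chance_negative = fp / (tp + fp)
chance_positive = tp / (tp + fp)
print(f"negative: {chance_negative:.3f}, positive: {chance_positive:.3f}")
```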

How good does a Biomarker have to be?

By age 65 the rate of prostate cancer climbs to 8:1000 and the test performs much better. For every 8 true positives, there will be 60 false positives! Chance of being negative = 60/68 = 0.88. Chance of being positive = 8/68 = 0.12 (before the test the chance was 0.008).

How good does a Biomarker have to be?

Prostate cancer is one of the most frequent cancers (15:10000); most cancers are much less frequent (1:10000 to 1:50000), so a biomarker for them would have to be much better than the PSA test. It is currently believed that a new biomarker would need sensitivity and specificity better than 95%.
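To see why prevalence matters so much, the sketch below computes the chance that a positive result is a true positive (the positive predictive value) for a hypothetical marker with exactly 95% sensitivity and 95% specificity at the prevalences quoted above. The function name and the choice of exactly 95% are illustrative assumptions.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive test result is a true positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


prevalences = {"15:10000": 15 / 10_000, "1:10000": 1 / 10_000, "1:50000": 1 / 50_000}
ppv = {label: positive_predictive_value(p, sensitivity=0.95, specificity=0.95)
       for label, p in prevalences.items()}
for label, value in ppv.items():
    print(f"prevalence {label}: PPV = {value:.4f}")
```

Even at 95%/95%, only a few percent of positives are real at prostate-cancer prevalence, and the fraction drops further for rarer cancers.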

Early proteomics-based biomarker work was based on SELDI

SELDI can detect 200-300 features in a sample. It has been used to look for biomarkers in everything from blood to tears.

Early Biomarker work has largely been discredited


-Biomarkers with similar masses kept being rediscovered
-When the proteins were identified, they were abundant serum proteins and were from the same proteins
-Multi-center studies failed to validate the biomarkers in a clinical setting
-Realization that serum and other biofluids are incredibly complex
-Realization that serum and other biofluids are incredibly variable and fragile
-some strong "biomarkers" turned out to be:
  -blood collection tube
  -# of freeze-thaw cycles
  -diet

Key Concept: Proteins vary widely in concentration

A typical biomarker discovery study will take 50 samples per condition. It typically takes 10 samples per condition to have a 90% chance of finding a 2-fold difference. Validation will take 1000s of samples. Finally, the assay will have to be converted to something that can be done in a clinical lab.

PCA or other Clustering is used for Biomarker discovery


Common Serum Markers for Cancer Diagnosis/prognosis


[Table: serum markers (columns) versus cancer types (rows), with an x marking each marker used for a cancer; the individual x positions are not recoverable here.]
Markers: AFP, CEA, CA15-3, CA19-9, CA125, PSA, PSAf, PAP, hTG, HCGb, Ferr, NSE, B2M, A2M
Cancers: Lung, Pancreas, Kidney, Breast, Ovarian, Cervical, Uterine, Prostate, Liver, Gastro, Colon, Bladder, Brain, Leukemia, Myeloma, Thyroid, Testicular

Conclusions
-Biomarker discovery is difficult
  -biofluids are complex
  -biofluids have a high dynamic range
  -biomarkers are usually low abundance
  -even taking proximal fluids typically does not help
  -there is a lot of person-to-person variability
-Most biomarkers will never become clinically relevant
  -statistical standards for diagnostic tools are very high
  -the more prevalent the disease, the better the biomarker will perform
-An MS-based biomarker assay is unlikely, due to the greater analytical performance of antibody-based methods.

-For a biomarker workflow to be meaningful it must be quantitative!

Quantitative Approaches
Stable isotope labeling methods
-add heavy isotopes to one sample so chemically identical compounds are mass shifted
-added to the peptides/proteins using reactive groups
-added to the proteins in vivo using heavy amino acids
-can be multiplexed
Label-free methods
-extracted ion chromatograms
-spectral counting

[Figure: 4700 Reflector Spec #1 MC[BP = 863.4, 3348]. Full spectrum, % Intensity vs. Mass (m/z) from 799.0 to 4013.0, base peak at m/z 863.4279; the inset expands the isotope cluster at m/z 1737.8809, 1738.8808, 1739.8810 and 1740.8808.]

ISOTOPE-CODED AFFINITY TAG (ICAT):


Label protein samples with heavy and light reagent. The reagent contains an affinity tag and heavy or light isotopes:
-Chemically reactive group: forms a covalent bond to the protein or peptide
-Isotope-labeled linker: heavy or light, depending on which isotope is used
-Affinity tag: enables the protein or peptide bearing an ICAT to be isolated by affinity chromatography in a single step

Example of an ICAT Reagent


-Biotin affinity tag: binds tightly to streptavidin-agarose resin
-Linker: the heavy version has deuteriums at the positions marked *; the light version has hydrogens at *
-Reactive group: thiol-reactive group that will bind to Cys

[Figure: structure of the ICAT reagent, with the isotope-labeled positions marked *]

How ICAT works?


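-Lyse & label: one sample with the light reagent, the other with the heavy reagent
-MIX
-Proteolysis (i.e. trypsin)
-Affinity isolation on streptavidin beads
-Quantification by MS: light and heavy forms of the same peptide appear as a pair of peaks
-Identification by MS/MS (example peptide: NH2-EACDPLR-COOH)

[Figure: ICAT workflow, with an MS panel (m/z 550-590) showing the light/heavy pair and an MS/MS panel (m/z 200-600)]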

ICAT Quantitation

ICAT Advantages vs. Disadvantages


Advantages
-Estimates relative protein levels between samples with a reasonable level of accuracy (within 10%)
-Can be used on complex mixtures of proteins
-Cys-specific label reduces sample complexity
-Can set up the mass spectrometer to fragment only those peaks with a certain ratio

Disadvantages
-Yield and non-specificity
-Slight chromatography differences
-Expensive
-Tag fragmentation
-Meaning of relative quantification information
-Cysteine residues may be absent or not accessible to the ICAT reagent

iTRAQ Reagent Design
Isobaric tag (total mass = 145) with three parts:

Reporter group (mass = 114 through 117)
-Charged; retains charge
-Gives strong signature ion in MS/MS
-Gives good b- and y-ion series
-Maintains charge state
-Maintains ionization efficiency of peptide

Balance group (mass = 31 through 28)
-Neutral loss in MS/MS
-Balance changes in concert with reporter mass to maintain total mass of 145

Peptide reactive group (PRG)
-Amine specific (NHS)

[Figure: structure of the isobaric tag, with the MS/MS fragmentation sites marked]

Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Ross, PL., et al., Mol Cell Proteomics 2004 3: 1154-1169.

Isobaric Tagging - General Method (4-Plex)


-Samples S1-S4 are denatured and digested in parallel
-Each sample is labeled with one reagent: S1 with 114-31-PRG, S2 with 115-30-PRG, S3 with 116-29-PRG, S4 with 117-28-PRG
-Mix
-MS: Reporter-Balance-Peptide INTACT, so the 4 samples have identical m/z (example precursor at m/z 1352.84)
-MS/MS: peptide fragments (b and y ions) EQUAL, reporter ions (114, 115, 116, 117) DIFFERENT

[Figure: MS spectrum around m/z 1347-1360 showing the single precursor peak, and MS/MS spectrum (% Intensity vs. Mass (m/z)) with annotated b and y ions and an inset of the reporter-ion region, m/z 111-118]
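Because the four labeled samples share one precursor and differ only in their reporter ions, relative quantitation comes down to comparing the 114-117 reporter intensities within an MS/MS spectrum. The sketch below assumes channel 114 is the reference; the function name and the intensities are hypothetical illustration values.

```python
def reporter_ratios(intensities, reference=114):
    """Ratio of each reporter-ion intensity to the reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {channel: value / ref for channel, value in intensities.items()}


# Hypothetical reporter-ion intensities from one MS/MS spectrum.
spectrum = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}
ratios = reporter_ratios(spectrum)
print(ratios)
```

In practice the ratios from several spectra and peptides of the same protein would be combined before reporting a protein-level ratio.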

Spotfire K-means Clustering of Protein-level Ratios



MS/MS Spectra of a Singly-charged Peptide


[Figure: MS/MS spectrum of the labeled peptide *-TPHPALTEAK-*, % Intensity vs. Mass (m/z) from 9.0 to 1428.0, with the b- and y-ion series annotated. Insets expand the reporter-ion region (114.1, 115.1, 116.1, 117.1) and the b7 (m/z 757-767) and y8 (m/z 869-879) fragment ions.]

Reporter Group Placement: Selection of Quiet Summed Ion Intensity Region (~75,000 Spectra)
[Figure: summed ion intensity (0 to 160,000,000) vs. m/z (0 to 2000)]

Simplified Workflow: (One extra step)


Example: time course labeling
-Samples: Control, Test 1, Test 2, Test 3
-Trypsin digestion
-Label with iTRAQ reagents, one per sample (114, 115, 116, 117): 1 hr, RT, single addition
-MIX
-SCX, then LC-MS/MS analysis: a single 2D LC analysis for the combined samples (4-plex)
-MS/MS: quant and ID

Differential Expression using iTRAQ Reagent Approach


Overexpression of Chaperonin 10, a non-cysteine-containing protein
Peptide: *VLQATVVAVGSGS*K (* = iTRAQ-labeled residue)

[Figure: MS/MS spectrum (m/z 100-900 amu) with b2-b7 and y1-y7 ions annotated; the inset shows the reporter ions at m/z 114-117 for two cancer and two normal samples, with the cancer channels higher.]

iTRAQ Advantages vs. Disadvantages

Advantages
-Estimates relative protein levels between samples with a reasonable level of accuracy (> 10%)
-Can be used on complex mixtures of proteins
-Isobaric, so the tag is only visible in the MS/MS, keeping the precursor scans as clean as possible
-The abundance of the peptides sums together, making analysis of low abundance peptides easier
-Replicates are analyzed on the same LC-MS/MS run, minimizing run-to-run variability

Disadvantages
-Reagent not completely specific
-Expensive
-Does not work on ion trap instruments
-Reporters tend to dominate the spectra
-You have to fragment everything and sort out the iTRAQ reporters later, so the mass spec spends a lot of time analyzing peptides with no quantitative differences

Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)

SILAC Advantages vs. Disadvantages


Advantages
-Estimates relative protein levels between samples with a high level of accuracy (<5%)
-Can be used on complex mixtures of proteins
-Can set up the mass spectrometer to fragment only those peaks with a certain ratio
-Extremely flexible and can be adapted to many systems

Disadvantages
-Labeling may be incomplete
-Urea cycle may cause incorporation of heavy isotopes into other amino acids
-Expensive
-Works best on high resolution instruments

Label-Free Quantitation
All approaches so far require the purchase of isotopically labeled reagents (can be expensive).
-What if you want to compare large numbers of samples (10+)?
-What if you can't afford lots of reagents?
Label-free options:
-Peak/spectral counting
-Peak area comparison (extracted ion chromatograms)

Spectral Counting
Count the number of peptides identified from a protein in each sample. Typically, repeat identifications of the same peptide are not counted. Not accurate at quantifying the magnitude of a change, but can be used to determine whether there is a difference.

In general, a spectral count difference of about 4 peptides is needed in order to be confident that a difference is real. Most proteins in complex mixtures are identified by fewer than 4 peptides.
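The rule of thumb above can be applied mechanically to two tables of per-protein counts. In this sketch the function name, the protein names and the counts are hypothetical, and "about 4" is treated as a hard threshold of 4 purely for illustration.

```python
MIN_COUNT_DIFFERENCE = 4  # rule-of-thumb threshold quoted in the text


def differing_proteins(counts_a, counts_b, min_diff=MIN_COUNT_DIFFERENCE):
    """Return {protein: count_b - count_a} for proteins whose counts differ enough."""
    flagged = {}
    for protein in sorted(set(counts_a) | set(counts_b)):
        # A protein missing from one sample counts as zero peptides there.
        diff = counts_b.get(protein, 0) - counts_a.get(protein, 0)
        if abs(diff) >= min_diff:
            flagged[protein] = diff
    return flagged


# Hypothetical peptide counts per protein in two samples.
sample_a = {"protA": 12, "protB": 3, "protC": 1}
sample_b = {"protA": 5, "protB": 4, "protD": 6}
flagged = differing_proteins(sample_a, sample_b)
print(flagged)
```

Note how protB and protC, identified by only a few peptides, can never reach the threshold, which is the limitation the text points out.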

EIC
(Extracted Ion Chromatogram, also abbreviated XIC)

Measure the intensity of a peak during its elution off the HPLC column and into the mass spectrometer. Measure the area of the peak in the EIC; this is more accurate than taking the peak intensity from one given scan.
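A minimal sketch of the peak-area idea, using trapezoidal integration over the scans in which the ion elutes. The function name and the retention times and intensities are hypothetical; real software would also handle baseline subtraction and peak boundaries.

```python
def peak_area(times, intensities):
    """Trapezoidal area under an extracted ion chromatogram peak."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        mean_height = 0.5 * (intensities[i] + intensities[i - 1])
        area += mean_height * (times[i] - times[i - 1])
    return area


# Hypothetical elution profile: retention time (min) and ion intensity per scan.
times = [10.0, 10.1, 10.2, 10.3, 10.4]
intensities = [0.0, 400.0, 1000.0, 400.0, 0.0]
area = peak_area(times, intensities)
apex = max(intensities)  # the single-scan alternative the text warns against
print(f"area = {area:.1f}, apex intensity = {apex:.1f}")
```

The area uses every scan across the peak, so it is less sensitive to which scan happened to land on the apex.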

emPAI
(Exponentially Modified Protein Abundance Index)
$\mathrm{emPAI} = 10^{\mathrm{PAI}} - 1$, where $\mathrm{PAI} = N_{\mathrm{observed}} / N_{\mathrm{observable}}$
What is an observable peptide? Peptides with a precursor mass between 800 and 2400 Da.
There is a roughly linear relationship between log protein concentration and the fraction of observable peptides that are observed (PAI), in the range of 3-500 fmoles.
If you know how much total protein you analyzed, you can derive absolute abundances.

Ishihama et al. Mol Cell Proteomics (2005) 4(9): 1265-1272
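The emPAI formula is easy to compute once observed and observable peptide counts are known. The sketch below also normalizes each protein's emPAI by the sum over all proteins to get a molar share; that normalization is an assumption about how the absolute estimate would be derived, and the protein names and counts are hypothetical.

```python
def empai(n_observed, n_observable):
    """emPAI = 10**PAI - 1, with PAI = N_observed / N_observable."""
    if n_observable <= 0:
        raise ValueError("a protein needs at least one observable peptide")
    pai = n_observed / n_observable
    return 10 ** pai - 1


def molar_shares(empai_values):
    """Each protein's emPAI as a fraction of the summed emPAI (assumed normalization)."""
    total = sum(empai_values.values())
    return {protein: value / total for protein, value in empai_values.items()}


# Hypothetical (observed, observable) peptide counts per protein.
counts = {"protA": (10, 10), "protB": (5, 10), "protC": (0, 8)}
values = {protein: empai(obs, possible) for protein, (obs, possible) in counts.items()}
shares = molar_shares(values)
print(values)
print(shares)
```

Multiplying each share by the total amount of protein analyzed would then give a rough absolute amount per protein.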

MRM
(Multiple Reaction Monitoring)
Look for a component of a specific mass that when fragmented forms a fragment of another specific mass.

Transition:

precursor m/z 521.7

fragment m/z 757.6

Very sensitive and specific.
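A transition is just a precursor/fragment m/z pair, and a signal counts only if both masses match. The sketch below uses the transition values from the slide; the class name, function name and the 0.5 m/z tolerance are hypothetical illustration choices, not instrument settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    precursor_mz: float
    fragment_mz: float


def matches(transition, precursor_mz, fragment_mz, tolerance=0.5):
    """True only when both the precursor and the fragment fall within tolerance."""
    return (abs(precursor_mz - transition.precursor_mz) <= tolerance
            and abs(fragment_mz - transition.fragment_mz) <= tolerance)


target = Transition(precursor_mz=521.7, fragment_mz=757.6)  # values from the slide
hit = matches(target, 521.8, 757.5)   # both masses within tolerance
miss = matches(target, 521.8, 650.0)  # right precursor, wrong fragment
print(hit, miss)
```

Requiring both masses to match is what gives the method its specificity: an interfering ion with the same precursor mass is rejected unless it also produces the same fragment.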

MRM
-Best performed on a triple quadrupole instrument.
-Scans are very fast, so multiple transition scans can be performed on a chromatographic time-scale.
-Requires a lot of optimization:
  -Verify transitions are reproducible; typically want 2-3 transitions/peptide and 3-4 peptides/protein.
  -Determine the retention time to maximize the number of peptides that can be analyzed per run.
-It is possible to analyze 100s of transitions per hour.
-MRM coupled to isotopically labeled peptides allows for very high sensitivity and high accuracy analysis and can give absolute quantification.
-Once optimized, 1000s of samples can be run in a short time frame.
-Not for discovery! You must already know what you are looking for; this is sometimes referred to as targeted proteomics.

Issues with MS Quantitation Analysis


-Should you use all data for quantitation?
-Minimum peak intensity? Peaks near the noise level will have much higher variability in quantitation accuracy.
-Very intense peaks may be saturated.
-Proteins identified by a single peptide are probably not accurately quantified.
-It is best to ignore sequences with more than one form: PTMs, missed cleavages, etc.
-Multiple charge states should be summed.
-Results are normally reported with a mean and standard deviation.
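Two of the points above, summing charge states and reporting a mean with a standard deviation, can be sketched as follows. The function names, peptide names, intensities and ratios are hypothetical illustration values.

```python
from statistics import mean, stdev


def sum_charge_states(observations):
    """Add up intensities of the same peptide seen at different charge states.

    observations: iterable of (peptide, charge, intensity) tuples.
    """
    totals = {}
    for peptide, _charge, intensity in observations:
        totals[peptide] = totals.get(peptide, 0.0) + intensity
    return totals


def summarize(ratios):
    """Mean and sample standard deviation of a protein's peptide ratios."""
    return mean(ratios), stdev(ratios)


# Hypothetical observations: PEPTIDEA is seen as both 2+ and 3+.
observations = [("PEPTIDEA", 2, 100.0), ("PEPTIDEA", 3, 50.0), ("PEPTIDEB", 2, 80.0)]
totals = sum_charge_states(observations)
avg, sd = summarize([1.0, 2.0, 3.0])  # hypothetical peptide ratios for one protein
print(totals, avg, sd)
```

`stdev` needs at least two ratios, which echoes the warning that a protein quantified from a single peptide has no measurable spread.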

Conclusions
-There are many different ways to quantitate proteomics data.
-Quantitative studies need to be approached carefully, because it is easy to make mistakes.
-No one strategy is best.
-MRM is the most sensitive and accurate, but requires the most optimization and cannot be used for discovery.
