A Biased Look at Biomarkers

Biomarker
Definition:
A biomarker is a substance used as an indicator of a biological state:
- The existence of living organisms or a biological process
- A particular disease state
Biomarkers can be:
- Proteins
- Nucleic acids
- Metabolites: carbohydrates, lipids, small molecules
Detection of biomarkers
Detection for diagnosis relies on:
- Self properties, e.g. enzymatic activities
- Antibodies: IHC, ELISA
Detection can be:
- Quantitative: a link between the quantity of the marker and disease
- Qualitative: a link between the existence of a marker and disease
Biomarker & Diagnosis

Ideal Marker for Diagnosis
- Should have great sensitivity, specificity, and accuracy in reflecting total disease burden.
- A tumor marker should also be prognostic of outcome and treatment.
Biomarker for Screening
- The marker must be highly specific, minimizing false positives and false negatives.
- The marker must clearly reflect the different stages of the disease, including the early ones.
- The marker must be easily detected without complicated medical procedures. Disease markers released into serum and urine are good targets for early screening.
- The screening method should be cost effective.
Samples for biomarker detection

- Blood, urine, or other body fluid samples
- Tissue samples
Prostate Cancer marker PSA

PSA is a protein normally made in the prostate gland, in the ductal cells that make some of the semen. PSA helps to keep the semen liquid. PSA, also known as kallikrein III, seminin, semenogelase, γ-seminoprotein and P-30 antigen, is a glycoprotein and a serine protease.
Prostate Cancer Diagnosis with PSA

Cancer of the prostate does not cause any symptoms until it is locally advanced or metastatic. There is a correlation between elevated PSA and prostate cancer, so detection of PSA is used as a surrogate for early detection of prostate cancer.
Large screening trials have shown that PSA nearly doubles the rate of detection when combined with other methods. Based on these data, PSA testing was approved by the US FDA for the screening and early detection of prostate cancer.
PSA is also found in the cytoplasm of benign prostate cells.
"I never dreamed that my discovery four decades ago would lead to such a profit-driven public health disaster." -Richard Ablin (inventor of the PSA test)
PSA screening generates ~$1.7 billion annually in the U.S. alone.
Sensitivity = the ability of the test to detect the disease (true positive rate)
Specificity = the likelihood that your test will be normal if you are disease free (true negative rate)
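The two definitions above can be written directly in terms of test outcomes. The following is a minimal sketch; the counts are hypothetical and chosen only to illustrate the arithmetic, and the function names are not from the slides.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased people the test flags (true positive rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of disease-free people the test clears (true negative rate)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening result for 1000 people, 100 of whom have the disease.
counts = {"true_pos": 80, "false_neg": 20, "true_neg": 855, "false_pos": 45}

sens = sensitivity(counts["true_pos"], counts["false_neg"])
spec = specificity(counts["true_neg"], counts["false_pos"])
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Note that neither number depends on how common the disease is, which is why they alone do not tell a patient what a positive result means.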
A brief aside about Statistics and Probability
- Statistics are the formalization of common sense
- Because they have to handle many different situations, they can be really complicated
- They should make you feel really good or really bad about your data
- People are inherently bad at statistics and probability
Case Study:
- Rate of being HIV positive: 1:10000
- False positive rate of the HIV test: 1:1000
If I test positive, what is the chance that I am really HIV negative?
0.0001   0.001   0.01   0.1   0.9   0.99   0.9999
Out of 10000 people tested, 1 is a true positive and about 10 (10000 x 1/1000) are false positives. For every 1 true positive there will be 10 false positives, so my chance of being negative is 10/11 (about 0.91).
How about the PSA test?
- Rate is 15:10000
- False positive rate is 60:1000
- For every 15 true positives (per 10000 men tested), there will be 600 false positives!
- Chance of being negative: 600/615 = 0.976
- Chance of being positive: 15/615 = 0.024 (before the test the chance was 0.0015)
Is this true? The test will miss 80% of the true positives (sensitivity = 20%), so only 3 true positives will be detected:
- Chance of being negative: 600/603 = 0.995
- Chance of being a true positive: 3/603 = 0.005
Follow-up for a positive HIV test is another blood test. Follow-up for a positive PSA test is a tissue biopsy.
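The HIV and PSA calculations follow the same pattern, so they can be checked with one small function. This is a sketch of that arithmetic only; the function and variable names are illustrative, and the exact results differ slightly from the rounded slide figures because the false positive rate is applied to the disease-free fraction rather than to everyone.

```python
def chance_negative_given_positive(prevalence: float,
                                   false_positive_rate: float,
                                   sensitivity: float = 1.0) -> float:
    """Probability that a person with a positive test is actually disease free."""
    true_pos = prevalence * sensitivity                    # diseased and detected
    false_pos = (1.0 - prevalence) * false_positive_rate   # healthy but flagged
    return false_pos / (true_pos + false_pos)


# HIV case study: rate 1:10000, false positive rate 1:1000.
hiv = chance_negative_given_positive(1 / 10000, 1 / 1000)

# PSA test: rate 15:10000, false positive rate 60:1000.
psa_perfect = chance_negative_given_positive(15 / 10000, 60 / 1000)
psa_real = chance_negative_given_positive(15 / 10000, 60 / 1000, sensitivity=0.20)

for label, value in [("HIV", hiv),
                     ("PSA, 100% sensitivity", psa_perfect),
                     ("PSA, 20% sensitivity", psa_real)]:
    print(f"{label}: chance of being negative after a positive test = {value:.3f}")
```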
How good does a Biomarker have to be?
By age 65 the rate of prostate cancer climbs to 8:1000 and the test performs much better.
- For every 8 true positives, there will be 60 false positives!
- Chance of being negative: 60/68 = 0.88
- Chance of being positive: 8/68 = 0.12 (before the test the chance was 0.008)
Prostate cancer is one of the most frequent cancers (15:10000). Most cancers are much less frequent (1:10000 to 1:50000), so a biomarker for them would have to be much better than the PSA test. It is currently believed that a new biomarker would need sensitivity and specificity better than 95%.
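To see how demanding the prevalence figures above are, the sketch below evaluates a hypothetical biomarker with exactly 95% sensitivity and 95% specificity at the three disease rates mentioned. The 95%/95% marker is an illustrative assumption, not a real assay.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that a positive test corresponds to real disease."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Disease rates quoted in the text.
prevalences = {"15:10000": 15 / 10000, "1:10000": 1 / 10000, "1:50000": 1 / 50000}

# Hypothetical marker with 95% sensitivity and 95% specificity.
ppv_by_prevalence = {
    label: positive_predictive_value(p, sensitivity=0.95, specificity=0.95)
    for label, p in prevalences.items()
}

for label, ppv in ppv_by_prevalence.items():
    print(f"prevalence {label}: chance a positive is real = {ppv:.4f}")
```

Even at 95%/95%, most positives are false at these rates, which is why the more prevalent the disease, the better a given biomarker appears to perform.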
Early Proteomics-Based Biomarker Work Was Based on SELDI
SELDI can detect 200-300 features in a sample. It has been used to look for biomarkers in everything from blood to tears.
Early Biomarker work has largely been discredited

- Biomarkers with similar masses kept being rediscovered
- When the proteins were identified, they were abundant serum proteins, and the same proteins kept appearing
- Multi-center studies failed to validate the biomarkers in a clinical setting
- Realization that serum and other biofluids are incredibly complex
- Realization that serum and other biofluids are incredibly variable and fragile
- Some strong "biomarkers": blood collection tube, number of freeze-thaw cycles, diet
Key Concept: Proteins vary widely in concentration
A typical biomarker discovery study will take 50 samples per condition. It typically takes 10 samples per condition to have a 90% chance of finding two-fold differences. Validation will take 1000s of samples. Finally, the assay will have to be converted to something that can be done in a clinical lab.
PCA or other Clustering is used for Biomarker discovery
2007
Common Serum Markers for Cancer Diagnosis/prognosis

Markers (rows): AFP, CEA, CA15-3, CA19-9, CA125, PSA, PSAf, PAP, hTG, HCGb, Ferr, NSE, B2M, A2M
Cancer types (columns): Lung, Pancreas, Kidney, Breast, Ovarian, Cervical, Uterine, Prostate, Liver, Gastro, Colon, Bladder, Brain, Leukemia, Myeloma, Thyroid, Testicular
[Table: an x marks each cancer type for which a marker is used; the individual x positions are not recoverable here]
Conclusions
- Biomarker discovery is difficult
  - Biofluids are complex
  - Biofluids have a high dynamic range
  - Biomarkers are usually low abundance
  - Even taking proximal fluids typically does not help
  - There is a lot of person-to-person variability
- Most biomarkers will never become clinically relevant
  - Statistical standards for diagnostic tools are very high
  - The more prevalent the disease, the better the biomarker will perform
- An MS-based biomarker assay is unlikely, due to the greater analytical performance of antibody-based methods
- For a biomarker workflow to be meaningful it must be quantitative!
Quantitative Approaches
Stable isotope labeling methods
- Add heavy isotopes to one sample so chemically identical compounds are mass shifted
- Added to the peptides/proteins using reactive groups
- Added to the proteins in vivo using heavy amino acids
- Can be multiplexed
Label-free methods
- Extracted ion chromatograms
- Spectral counting
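The "mass shifted" idea above can be made concrete with a short calculation. This sketch assumes the heavy isotope is 13C and uses a hypothetical peptide carrying one residue with six 13C atoms; the slides do not specify which label is used, so treat both choices as illustrative.

```python
# Mass difference between one 13C atom and one 12C atom, in Da.
C13_MINUS_C12 = 1.0033548


def heavy_light_mz_shift(n_heavy_atoms: int, charge: int,
                         delta_per_atom: float = C13_MINUS_C12) -> float:
    """Spacing in m/z between the heavy and light forms of the same peptide."""
    return n_heavy_atoms * delta_per_atom / charge


# Hypothetical peptide with six 13C atoms, observed as 1+, 2+ and 3+ ions.
shifts = {z: heavy_light_mz_shift(6, z) for z in (1, 2, 3)}
for z, shift in shifts.items():
    print(f"charge {z}+: heavy peak sits {shift:.4f} m/z above the light peak")
```

The spacing shrinks with charge state, which is one reason closely spaced labels are easier to resolve on higher resolution instruments.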
[Figure: 4700 Reflector Spec #1 MC (BP = 863.4, 3348); peptide mass spectrum over m/z 799-4013 with an inset of the isotope cluster at m/z 1737.88 to 1740.88; axes: Mass (m/z) vs. % Intensity]
ISOTOPE-CODED AFFINITY TAG (ICAT):

Label protein samples with heavy and light reagent. The reagent contains an affinity tag and heavy or light isotopes:
- Chemically reactive group: forms a covalent bond to the protein or peptide
- Isotope-labeled linker: heavy or light, depending on which isotope is used
- Affinity tag: enables the protein or peptide bearing an ICAT to be isolated by affinity chromatography in a single step
Example of an ICAT Reagent

- Affinity tag (biotin): binds tightly to streptavidin-agarose resin
- Linker: the heavy version has deuteriums at the positions marked *; the light version has hydrogens at those positions
- Reactive group: thiol-reactive group that binds to Cys
[Figure: The ICAT Reagent (chemical structure with the * positions marked on the linker)]
How ICAT works?

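- Lyse & label: one sample with the light reagent, the other with the heavy reagent
- Mix
- Proteolysis (e.g. trypsin)
- Affinity isolation on streptavidin beads
- Quantification by MS (light and heavy peaks)
- Identification by MS/MS (example peptide: NH2-EACDPLR-COOH)
[Figure: ICAT Quantitation; MS panel with light and heavy peaks over m/z 550-590 and MS/MS panel over m/z 200-600]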
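The quantification step compares the light and heavy peak of each labeled peptide. Below is a minimal sketch of that comparison; the intensities and the two placeholder peptide names are hypothetical, and summarizing a protein by the median of its peptide ratios is one common choice, not something the slides prescribe.

```python
from statistics import median

# Hypothetical (light, heavy) peak intensities for Cys-containing peptides
# assigned to the same protein. Only EACDPLR appears in the slides.
peptide_pairs = {
    "EACDPLR": (1.2e6, 0.6e6),
    "peptide_B": (8.0e5, 4.4e5),
    "peptide_C": (3.0e5, 1.5e5),
}


def light_to_heavy_ratios(pairs: dict) -> dict:
    """Relative abundance of each peptide: light sample over heavy sample."""
    return {peptide: light / heavy for peptide, (light, heavy) in pairs.items()}


ratios = light_to_heavy_ratios(peptide_pairs)
protein_ratio = median(ratios.values())  # robust to one aberrant peptide

for peptide, ratio in ratios.items():
    print(f"{peptide}: light/heavy = {ratio:.2f}")
print(f"protein-level ratio (median) = {protein_ratio:.2f}")
```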
ICAT Advantages vs. Disadvantages

Advantages:
- Estimates relative protein levels between samples with a reasonable level of accuracy (within 10%)
- Can be used on complex mixtures of proteins
- Cys-specific label reduces sample complexity
- Can set up the mass spectrometer to fragment only those peaks with a certain ratio
Disadvantages:
- Yield and non-specificity
- Slight chromatography differences
- Expensive
- Tag fragmentation
- Meaning of relative quantification information
- Fails when cysteine residues are absent or not accessible to the ICAT reagent
iTRAQ Reagent Design Isobaric Tag

Isobaric tag (total mass = 145), made of three parts:
- Reporter group (mass = 114 through 117), charged, retains charge
  - Gives strong signature ion in MS/MS
  - Gives good b- and y-ion series
  - Maintains charge state
  - Maintains ionization efficiency of peptide
- Balance group (mass = 31 through 28), neutral loss
  - Balance changes in concert with reporter mass to maintain total mass of 145
  - Neutral loss in MS/MS
- Peptide reactive group (PRG): amine specific (NHS)
[Figure: structure of the isobaric tag with the MS/MS fragmentation sites marked]
Ross, P.L., et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 2004 3: 1154-1169.
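The reporter and balance masses quoted above pair up so that every tag has the same total mass. A quick sketch using the nominal masses from the slide makes the bookkeeping explicit; the variable names are illustrative.

```python
# Nominal masses from the reagent design: reporters 114-117, balances 31-28.
REPORTER_MASSES = (114, 115, 116, 117)
BALANCE_MASSES = (31, 30, 29, 28)
TOTAL_TAG_MASS = 145

# Each reporter is paired with the balance group that compensates for it.
tags = list(zip(REPORTER_MASSES, BALANCE_MASSES))
totals = [reporter + balance for reporter, balance in tags]

for (reporter, balance), total in zip(tags, totals):
    print(f"reporter {reporter} + balance {balance} = {total}")

# Isobaric means all four labeled forms of a peptide share one precursor mass.
is_isobaric = all(total == TOTAL_TAG_MASS for total in totals)
print("isobaric:", is_isobaric)
```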
Isobaric Tagging - General Method (4-Plex)

- Samples S1 through S4 are denatured and digested in parallel
- Each digest is labeled with one tag: S1 with 114-31-PRG, S2 with 115-30-PRG, S3 with 116-29-PRG, S4 with 117-28-PRG
- The labeled digests are mixed
- MS: Reporter-Balance-Peptide INTACT, so the 4 samples have identical m/z
- MS/MS: peptide fragments (b and y ions) EQUAL, reporter ions (114, 115, 116, 117) DIFFERENT
[Figure: MS spectrum with a single precursor peak at m/z 1352.84 and its MS/MS spectrum with b and y ions annotated; inset shows the reporter ions at m/z 114-117; axes: Mass (m/z) vs. % Intensity]
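In the 4-plex method, relative quantitation comes from the reporter ion intensities in each MS/MS spectrum. The sketch below expresses every channel relative to one reference channel; the intensities are hypothetical, and using channel 114 as the reference is an assumption made for illustration.

```python
def reporter_ratios(intensities: dict, reference_channel: int = 114) -> dict:
    """Express each reporter ion intensity relative to the reference channel."""
    reference = intensities[reference_channel]
    return {channel: value / reference for channel, value in intensities.items()}


# Hypothetical reporter ion intensities read from one MS/MS spectrum.
spectrum_reporters = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}

channel_ratios = reporter_ratios(spectrum_reporters)
for channel, ratio in channel_ratios.items():
    print(f"{channel}/114 = {ratio:.2f}")
```

Because all four samples contribute to the same precursor, one spectrum yields both the peptide identification and all four relative abundances.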
Spotfire K-means Clustering of Protein-level Ratios

[Figure: K-means clusters of protein-level ratios; panel axis labels: G1L, S, PM]
MS/MS Spectra of a Singly-charged Peptide

[Figure: MS/MS spectrum of the labeled peptide *-TPHPALTEAK-* with b and y ions annotated; insets show the reporter ions at m/z 114.1, 115.1, 116.1 and 117.1 and zooms on the b7 (m/z 757-767) and y8 (m/z 869-879) fragments; axes: Mass (m/z) vs. % Intensity]
Reporter Group Placement: Selection of Quiet Summed Ion Intensity Region (~75,000 Spectra)
[Figure: summed ion intensity (0 to 160,000,000) vs. m/z (0 to 2000)]
Simplified Workflow: (One extra step)

Example: Time course labeling
- Four samples: Control, Test 1, Test 2, Test 3
- Trypsin digestion
- Label with iTRAQ reagents 114, 115, 116 and 117 (1 hr, RT, single addition)
- Mix
- SCX, then a single 2D LC analysis for the combined samples (4-plex)
- LC-MS/MS analysis: quantitation and identification both come from the MS/MS spectra
Differential Expression using iTRAQ Reagent Approach

Overexpression of Chaperonin 10, a non-cysteine-containing protein
Peptide: *VLQATVVAVGSGS*K (* = iTRAQ-labeled residue)
[Figure: MS/MS spectrum with b and y ions annotated over m/z 100-900, and a reporter-ion inset over m/z 114-117 comparing Cancer and Normal samples]
iTRAQ Advantages vs. Disadvantages

Advantages:
- Estimates relative protein levels between samples with a reasonable level of accuracy (> 10%)
- Can be used on complex mixtures of proteins
- Isobaric, so the tag is only visible in the MS/MS, keeping the precursor scans as clean as possible
- The abundances of the peptides from all samples sum together, making analysis of low-abundance peptides easier
- Replicates are analyzed in the same LC-MS/MS run, minimizing run-to-run variability
Disadvantages:
- Reagent not completely specific
- Expensive
- Does not work on ion trap instruments
- Reporters tend to dominate the spectra
- You have to fragment everything and sort out the iTRAQ reporters later, so the mass spec spends a lot of time analyzing peptides with no quantitative differences
Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)
SILAC Advantages vs. Disadvantages

Advantages:
- Estimates relative protein levels between samples with a high level of accuracy (< 5%)
- Can be used on complex mixtures of proteins
- Can set up the mass spectrometer to fragment only those peaks with a certain ratio
- Extremely flexible and can be adapted to many systems
Disadvantages:
- Labeling may be incomplete
- The urea cycle may cause incorporation of heavy isotopes into other amino acids
- Expensive
- Works best on high resolution instruments
Label-Free Quantitation
All approaches so far require the purchase of isotopically labeled reagents, which can be expensive.
- What if you want to compare large numbers of samples (10+)?
- What if you can't afford lots of reagents?
Label-free options:
- Peak/spectral counting
- Peak area comparison (extracted ion chromatograms)
Spectral Counting
Count the number of peptides identified from a protein in each sample. Typically, repeat identifications of the same peptide are not counted. This is not accurate at quantifying the magnitude of a change, but it can be used to determine whether there is a difference.
In general, a spectral count difference of about 4 peptides is needed to be confident that a difference is real. Most proteins in complex mixtures are identified by fewer than 4 peptides.
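The counting rule described above can be sketched in a few lines. The identification lists, protein names and peptide strings are hypothetical; the threshold of 4 is the rule of thumb from the text, not a statistical test.

```python
from collections import defaultdict


def count_unique_peptides(identifications: list) -> dict:
    """Count distinct peptides per protein, ignoring repeat identifications."""
    peptides_by_protein = defaultdict(set)
    for protein, peptide in identifications:
        peptides_by_protein[protein].add(peptide)
    return {protein: len(peps) for protein, peps in peptides_by_protein.items()}


def differing_proteins(counts_a: dict, counts_b: dict, min_difference: int = 4) -> list:
    """Proteins whose counts differ by at least min_difference between samples."""
    proteins = sorted(set(counts_a) | set(counts_b))
    return [p for p in proteins
            if abs(counts_a.get(p, 0) - counts_b.get(p, 0)) >= min_difference]


# Hypothetical (protein, peptide) identifications from two samples.
sample_a = [("P1", "AAK"), ("P1", "AAK"), ("P1", "CCR"), ("P1", "DDK"),
            ("P1", "EER"), ("P1", "FFK"), ("P2", "GGK")]
sample_b = [("P1", "AAK"), ("P2", "GGK"), ("P2", "HHR")]

counts_a = count_unique_peptides(sample_a)
counts_b = count_unique_peptides(sample_b)
changed = differing_proteins(counts_a, counts_b)
print(counts_a, counts_b, changed)
```

In this example P2 cannot be called different at all, which mirrors the point that most proteins have too few peptides for spectral counting to be decisive.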
EIC
(Extracted Ion Chromatogram)

Measure the intensity of a peak during its elution off the HPLC column and into the mass spectrometer. Measure the area of the peak in the EIC (also written XIC). This is more accurate than taking the peak intensity from one given scan.
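A minimal sketch of the peak-area idea follows: pull the intensity of one m/z out of each scan, then integrate over retention time with the trapezoid rule. The scan data, the target m/z and the tolerance are all hypothetical, and real software would also detect peak boundaries and subtract background.

```python
def extract_ion_chromatogram(scans: list, target_mz: float, tolerance: float) -> list:
    """Return (retention_time, intensity) pairs for ions near target_mz."""
    chromatogram = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        chromatogram.append((retention_time, intensity))
    return chromatogram


def peak_area(chromatogram: list) -> float:
    """Integrate intensity over retention time using the trapezoid rule."""
    area = 0.0
    for (t0, i0), (t1, i1) in zip(chromatogram, chromatogram[1:]):
        area += 0.5 * (i0 + i1) * (t1 - t0)
    return area


# Hypothetical scans: (retention time in s, [(m/z, intensity), ...]).
scans = [
    (0.0, [(400.2, 50.0)]),
    (1.0, [(521.7, 100.0), (400.2, 40.0)]),
    (2.0, [(521.7, 300.0)]),
    (3.0, [(521.7, 100.0), (650.3, 20.0)]),
    (4.0, [(650.3, 30.0)]),
]

xic = extract_ion_chromatogram(scans, target_mz=521.7, tolerance=0.05)
area = peak_area(xic)
apex = max(intensity for _, intensity in xic)
print(f"peak area = {area:.1f}, apex intensity = {apex:.1f}")
```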
emPAI
(Exponentially Modified Protein Abundance Index)
$\mathrm{emPAI} = 10^{\mathrm{PAI}} - 1$
where $\mathrm{PAI} = N_{\mathrm{observed}} / N_{\mathrm{observable}}$
What is an observable peptide? Peptides with a precursor mass between 800 and 2400 Da.
There is a roughly linear relationship between log protein concentration and the ratio of observed to observable peptides in the range of 3-500 fmol.
If you know how much total protein you analyzed, you can derive absolute abundances.
Ishihama et al. Mol Cell Proteomics (2005) 4(9): 1265-1272
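The emPAI definition is simple enough to compute directly. In the sketch below the peptide counts are hypothetical, and the normalization to a molar percentage (each emPAI divided by the sum of all emPAI values) is an assumption based on how the index is commonly used, not something stated in the slide.

```python
def empai(n_observed: int, n_observable: int) -> float:
    """Exponentially modified protein abundance index: 10**PAI - 1."""
    pai = n_observed / n_observable
    return 10 ** pai - 1


# Hypothetical (observed, observable) peptide counts per protein.
proteins = {"protein_A": (10, 20), "protein_B": (5, 20), "protein_C": (20, 20)}

empai_values = {name: empai(obs, possible) for name, (obs, possible) in proteins.items()}
total = sum(empai_values.values())

# Assumed normalization: share of each protein in the mixture, in mol %.
mol_percent = {name: 100.0 * value / total for name, value in empai_values.items()}

for name in proteins:
    print(f"{name}: emPAI = {empai_values[name]:.3f}, mol % = {mol_percent[name]:.1f}")
```

Multiplying each share by the known total amount of protein analyzed would then give an estimate of absolute abundance.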
MRM
(Multiple Reaction Monitoring)
Look for a component of a specific mass that, when fragmented, forms a fragment of another specific mass.
Transition:
- precursor m/z 521.7
- fragment m/z 757.6
Very sensitive and specific.
- Best performed on a triple quadrupole instrument
- Scans are very fast, so multiple transition scans can be performed on a chromatographic time-scale
- Requires a lot of optimization:
  - Verify transitions are reproducible; typically want 2-3 transitions/peptide and 3-4 peptides/protein
  - Determine the retention time to maximize the number of peptides that can be analyzed per run
- It is possible to analyze 100s of transitions per hour
- MRM coupled to isotopically labeled peptides allows for very high sensitivity and high accuracy analysis and can give absolute quantification
- Once optimized, 1000s of samples can be run in a short time frame
- Not for discovery! You must already know what you are looking for; this is sometimes referred to as targeted proteomics
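A transition and the absolute quantification step can both be sketched briefly. The transition values are the ones given above; the tolerance, peak areas and spiked amount are hypothetical, and the calculation assumes the labeled standard and the native peptide respond identically in the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    precursor_mz: float
    fragment_mz: float

    def matches(self, precursor: float, fragment: float, tolerance: float = 0.5) -> bool:
        """True when an observed precursor/fragment pair fits this transition."""
        return (abs(precursor - self.precursor_mz) <= tolerance
                and abs(fragment - self.fragment_mz) <= tolerance)


def absolute_amount_fmol(native_area: float, standard_area: float,
                         spiked_standard_fmol: float) -> float:
    """Scale the known amount of labeled standard by the native/standard area ratio."""
    return native_area / standard_area * spiked_standard_fmol


transition = Transition(precursor_mz=521.7, fragment_mz=757.6)
hit = transition.matches(521.8, 757.5)
miss = transition.matches(521.8, 640.0)

# Hypothetical peak areas with 50 fmol of labeled standard spiked in.
amount = absolute_amount_fmol(native_area=4.0e4, standard_area=2.0e4,
                              spiked_standard_fmol=50.0)
print(hit, miss, f"{amount:.1f} fmol")
```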
Issues with MS Quantitation Analysis

- Should you use all data for quantitation?
- Minimum peak intensity? Peaks near the noise level (low signal-to-noise) will have much higher variability in quantitation accuracy.
- Very intense peaks may be saturated.
- Proteins identified by a single peptide are probably not accurately quantified.
- It is best to ignore sequences with more than one form: PTMs, missed cleavages, etc.
- Multiple charge states should be summed.
- Results are normally reported with a mean and standard deviation.
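Several of the points above (a minimum intensity, summing charge states, reporting mean and standard deviation) can be combined in one short sketch. All peptide names, intensities and the cutoff of 10 are hypothetical, and the cutoff is applied to each charge-state observation before summing, which is only one of several reasonable choices.

```python
from collections import defaultdict
from statistics import mean, stdev


def sum_charge_states(observations: list, min_intensity: float) -> dict:
    """Sum intensities per peptide across charge states, skipping weak peaks."""
    totals = defaultdict(float)
    for peptide, _charge, intensity in observations:
        if intensity >= min_intensity:
            totals[peptide] += intensity
    return dict(totals)


# Hypothetical (peptide, charge, intensity) observations for one protein.
sample_a = [("AAK", 2, 400.0), ("AAK", 3, 100.0), ("CCR", 2, 220.0),
            ("DDK", 2, 180.0), ("EER", 2, 5.0)]
sample_b = [("AAK", 2, 250.0), ("CCR", 2, 100.0), ("DDK", 2, 100.0),
            ("EER", 2, 4.0)]

totals_a = sum_charge_states(sample_a, min_intensity=10.0)
totals_b = sum_charge_states(sample_b, min_intensity=10.0)

# Ratio per peptide, using only peptides quantified in both samples.
shared = sorted(set(totals_a) & set(totals_b))
peptide_ratios = [totals_a[p] / totals_b[p] for p in shared]

ratio_mean = mean(peptide_ratios)
ratio_sd = stdev(peptide_ratios)
print(f"A/B ratio = {ratio_mean:.2f} +/- {ratio_sd:.2f} (n = {len(peptide_ratios)} peptides)")
```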
Conclusions
- There are many different ways to quantitate proteomics data
- Quantitative studies need to be approached carefully, because it is easy to make mistakes
- No one strategy is best
- MRM is the most sensitive and accurate, but requires the most optimization and cannot be used for discovery
