Identification of Proteins Through Mass Spectrometry Databases
Molecular Medicine
Proteome: the complete set of proteins in a cell
Current methodologies: 2D gel, protein microarray, fluorescence microscopy, mass spectrometry, chromatography, nuclear magnetic resonance, microfluidics, microchip
RAId_DbS
Sonar (Knexus)
X!Tandem (The GPM)
Mascot
Software search engine
Uses mass spectrometry data
Mascot is unique
Widely used
Freely available on the Matrix Science website
Licence is required for in-house use
Mascot Server
Gives excellent results with peak lists from instruments manufactured by:
In-house use:
Confidentiality
For automation
To add and edit modifications, enzymes, quantitation methods, etc.
Peptide Mass Fingerprint (PMF): uses the molecular masses of the peptides resulting from digestion of a protein by a specific enzyme.
Sequence Query: mass values combined with amino acid sequence or composition data.
MS/MS Ions Search: uninterpreted MS/MS data from a single peptide or from a complete LC-MS/MS run.
Peak picking
Get as many peptide masses as possible in the range 1000 to 3500 Da.
To perform a search
Search a protein database, not EST.
The sequence must be present in the database.
Not good for mixtures.
Start with Swiss-Prot.
A protein hit is significant if its expect value is below 0.05.
Single protein or a complex mixture.
Chromatography is used to regulate the flow of peptides into the mass spectrometer.
Peptides are selected one at a time using the first stage of mass analysis.
Each isolated peptide is then induced to fragment.
The second stage of mass analysis is used to collect an MS/MS spectrum.
Software determines which peptide sequence in the database gives the best match.
The degree of matching is scored.
Peptide molecular ions fragment at preferred locations along the backbone.
The major peaks are b and y ions.
Which ions appear depends on the ionization technique, the mass analyser, and the peptide structure.
If peptides fragmented cleanly, we wouldn't need a database search.
A clean spectrum would show a ladder of peaks for each ion series.
Fragmentation is rarely perfect.
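To make the idea of a b and y ion ladder concrete, here is a minimal sketch that computes singly charged b and y ion m/z values for a peptide. The function name `fragment_ladder` and the small residue table are illustrative assumptions and are not part of Mascot. Only unmodified residues and monoisotopic masses are handled.

```python
# Monoisotopic residue masses (Da) for a few amino acids; extend as needed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
    "F": 147.06841, "R": 156.10111,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ladder(peptide):
    """Return ([b1..b(n-1)], [y1..y(n-1)]) as singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i contains the first i residues (N-terminal fragment).
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_j contains the last j residues plus water (C-terminal fragment).
    y_ions = [sum(masses[-j:]) + WATER + PROTON for j in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ladder("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

In a perfect spectrum, consecutive peaks in either ladder differ by exactly one residue mass, which is what would allow the sequence to be read off directly.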
Results are complicated to report.
The report lists a series of proteins and the peptide matches that have been assigned to them.
The report uses a pop-up window to show the alternative peptide matches.
The top match has a high score.
Without enzyme
Sequence Query
Even when the quality of the spectrum is poor, it is possible to pick out a minimum of four clean peaks.
A few residues of amino acid sequence are interpreted from them.
Mann and Wilm realized that this very short stretch of amino acid sequence might provide sufficient specificity for identification if it was combined with the fragment ion mass values which enclose it, the peptide mass, and the enzyme specificity.
Picking out a good tag requires both luck and experience.
Requires interpretation of the spectrum.
Easier to skip the interpretation step and pass the peak list to the search engine.
Rapid search times
Error tolerant
Search parameters
Name, Email and Search Title
The name and email are saved as a browser cookie.
If Mascot security is enabled, this information is taken from the user database.
The email address is used for sending results.
Databases
Single genome databases Not suitable for PMF cRAP and Contaminants
Database
Choose the right database
In Mascot 2.3 and later, you can select multiple databases.
You cannot mix AA and DNA databases.
Comprehensive database repositories such as NCBI and EBI can be used to download nr, GenBank, Swiss-Prot, EMBL, TrEMBL, etc.
Taxonomy
Speeds up the search
Simpler report
Keep the indexes up to date
Check the stats file for each database.
If the correct protein from the correct species may not be in the database, don't specify a very narrow taxonomy.
Enzyme
First choice
Set allowed missed cleavage sites to zero if the digest is complete.
Choose a setting of 1 or 2 when you're not sure about your sample.
A higher number increases the number of calculated peptide masses.
Use no enzyme only in exceptional cases, and never for PMF.
The enzyme list is user configurable.
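The effect of the missed cleavage setting on the number of calculated peptides can be illustrated with a small in-silico digest. This is only a sketch: the function `tryptic_peptides` is hypothetical, and it uses the common simplified trypsin rule (cleave after K or R, but not before P), which is not necessarily how Mascot defines the enzyme. It relies on `re.split` accepting zero-width matches, which requires Python 3.7 or later.

```python
import re


def tryptic_peptides(sequence, missed_cleavages=0):
    """List peptides from a simplified tryptic digest.

    Cleaves after K or R unless the next residue is P, and joins up to
    `missed_cleavages` adjacent fragments to model partial digestion.
    """
    # Zero-width split points: after K/R, not followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for start in range(len(pieces)):
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end <= len(pieces):
                peptides.append("".join(pieces[start:end]))
    return peptides


if __name__ == "__main__":
    seq = "AGKLLRPEKTT"  # made-up sequence for illustration
    for mc in (0, 1, 2):
        print(mc, tryptic_peptides(seq, mc))
```

Each extra allowed missed cleavage adds more candidate peptides, so the search has more calculated masses to match and random matches become more likely.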
Modifications
Mods that affect a terminus are less of a problem, e.g. Pyro-glu.
Mods that apply to residues with a high fractional abundance and at any position are a big problem, e.g. Phospho (ST).
Modifications
Post-translational: phosphorylation, acetylation
Artifacts: oxidation, acetylation
Derivatization: alkylation of cysteine
Sequence variants: errors, SNPs, other variants
The cysteine modification depends on the alkylation agent: iodoacetamide (carbamidomethyl), iodoacetic acid (carboxymethyl), or MMTS (methylthio).
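The choice of alkylation agent matters because each one adds a different mass to cysteine. The sketch below tabulates the monoisotopic mass shifts, using the standard values for these modifications. The dictionary layout and the helper `alkylated_cys_mass` are illustrative assumptions, not a Mascot interface.

```python
# Alkylation agent -> (modification name, monoisotopic mass shift in Da).
CYS_ALKYLATION = {
    "iodoacetamide": ("Carbamidomethyl", 57.021464),
    "iodoacetic acid": ("Carboxymethyl", 58.005479),
    "MMTS": ("Methylthio", 45.987721),
}
CYS_RESIDUE = 103.009185  # monoisotopic residue mass of cysteine (Da)


def alkylated_cys_mass(agent):
    """Return (modification name, residue mass of the modified cysteine)."""
    name, delta = CYS_ALKYLATION[agent]
    return name, CYS_RESIDUE + delta


if __name__ == "__main__":
    for agent in CYS_ALKYLATION:
        name, mass = alkylated_cys_mass(agent)
        print(f"{agent}: {name}, modified Cys residue = {mass:.4f} Da")
```

Carbamidomethyl and carboxymethyl differ by slightly less than 1 Da, so selecting the wrong one shifts every cysteine-containing peptide by about that amount.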
Phosphorylation
Intact fragments
Neutral loss of HPO3 (80 Da)
Neutral loss of H3PO4 (98 Da)
Protein mass
Mass of the intact protein in kDa.
If this field is left blank, there is no restriction on protein mass.
Slows down the search a little.
Tolerance
Peptide tolerance and MS/MS tolerance
Error window on experimental peptide mass values.
Units: percentage, milli-mass units (mmu), parts per million (ppm), or Daltons (Da).
The protein/peptide view includes a graph of the mass errors for fragment ions.
Specifying too tight a peptide tolerance is a common reason for failing to get a match.
A more appropriate tolerance would be +/- 0.3 in MS/MS.
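Because the tolerance can be given in four different units, it helps to see how each translates into an absolute window in Da for a given mass. The following is a minimal sketch; the function names and the unit strings ("Da", "mmu", "%", "ppm") are assumptions chosen for illustration.

```python
def tolerance_in_da(mass, tolerance, unit):
    """Convert a tolerance in the given unit to an absolute value in Da."""
    if unit == "Da":
        return tolerance
    if unit == "mmu":            # milli-mass units: 1 mmu = 0.001 Da
        return tolerance / 1000.0
    if unit == "%":              # percentage of the mass
        return mass * tolerance / 100.0
    if unit == "ppm":            # parts per million of the mass
        return mass * tolerance / 1e6
    raise ValueError(f"unknown tolerance unit: {unit}")


def within_tolerance(observed, calculated, tolerance, unit):
    """True if the observed mass lies inside the error window."""
    window = tolerance_in_da(calculated, tolerance, unit)
    return abs(observed - calculated) <= window


if __name__ == "__main__":
    print(tolerance_in_da(2000.0, 50, "ppm"))            # window in Da at 2000 Da
    print(within_tolerance(1000.52, 1000.50, 10, "ppm"))  # tight window
    print(within_tolerance(1000.52, 1000.50, 0.3, "Da"))  # wider window
```

A relative unit such as ppm gives a window that grows with mass, whereas Da and mmu give the same window at every mass.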
Mass type
Average or monoisotopic.
Monoisotopic mass uses the most abundant natural isotope of each element; it is the first peak of the isotope distribution.
Average mass is the chemical mass, the centre of gravity of the isotope distribution.
The difference is approximately 0.06%.
If you get this setting wrong, the mass errors will be very large.
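The size of the monoisotopic versus average difference can be checked with a short calculation. This sketch uses a made-up peptide, SAMPLER, and a small table of standard residue masses. The function `peptide_masses` is an illustrative assumption.

```python
# Residue -> (monoisotopic mass, average mass) in Da, for a few amino acids.
RESIDUE = {
    "A": (71.03711, 71.0788),
    "S": (87.03203, 87.0782),
    "P": (97.05276, 97.1167),
    "L": (113.08406, 113.1594),
    "E": (129.04259, 129.1155),
    "M": (131.04049, 131.1926),
    "R": (156.10111, 156.1875),
}
WATER = (18.010565, 18.01528)  # (monoisotopic, average) mass of H2O


def peptide_masses(peptide):
    """Return (monoisotopic, average) neutral mass of an unmodified peptide."""
    mono = sum(RESIDUE[aa][0] for aa in peptide) + WATER[0]
    avg = sum(RESIDUE[aa][1] for aa in peptide) + WATER[1]
    return mono, avg


if __name__ == "__main__":
    mono, avg = peptide_masses("SAMPLER")
    print(f"monoisotopic = {mono:.4f} Da, average = {avg:.4f} Da")
    print(f"relative difference = {100 * (avg - mono) / mono:.3f} %")
```

For a peptide of about 800 Da the two values differ by roughly 0.5 Da, far more than a typical peptide tolerance, which is why the wrong setting produces very large mass errors.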
Charge
Used on the sequence query and MS/MS forms. "1+" always means MH+, "1-" always means M-H-, etc.
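The charge convention can be expressed as a small conversion from an observed m/z value to the neutral mass M, assuming the charge comes only from added protons (positive mode) or removed protons (negative mode). The helper names `parse_charge` and `neutral_mass` are hypothetical.

```python
PROTON = 1.007276  # mass of a proton (Da)


def parse_charge(label):
    """Turn a charge label such as "2+" or "1-" into a signed integer."""
    sign = -1 if label.endswith("-") else 1
    return sign * int(label[:-1])


def neutral_mass(mz, charge):
    """Neutral mass M from m/z for a signed charge.

    Positive mode: m/z = (M + z*H) / z.  Negative mode: m/z = (M - z*H) / z.
    """
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return abs(charge) * mz - charge * PROTON


if __name__ == "__main__":
    for label in ("1+", "2+", "1-"):
        z = parse_charge(label)
        print(label, round(neutral_mass(500.0, z), 6))
```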
Data (PMF)
Mass
The query window is used when there is no data file.
The data format is auto detected.
List of mass values, one per line.
If a second value is present, it is assumed to be intensity.
Any further values on the same line are ignored.
Applied Biosystems Data Explorer (.pkm)
Bruker Analysis AutoXecute data report
Bruker XML
mzData (1.05)
mzML
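The simple list format described above (one mass per line, an optional intensity, anything further ignored) is easy to read programmatically. This is a sketch of such a reader, assuming whitespace-separated values. The function `parse_peak_list` is hypothetical and is not Mascot's own parser.

```python
def parse_peak_list(text):
    """Parse a PMF peak list into (mass, intensity) tuples.

    The first value on each line is the mass, an optional second value is
    the intensity (None when absent), and further values are ignored.
    """
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        mass = float(fields[0])
        intensity = float(fields[1]) if len(fields) > 1 else None
        peaks.append((mass, intensity))
    return peaks


if __name__ == "__main__":
    example = "1234.5 100 ignored\n\n2345.6\n"
    print(parse_peak_list(example))
```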
Data (MS/MS)
Instrument
Type of instrument used to acquire the data. This setting determines which fragment ion series will be used for scoring
Report
Final tip
Beware of
Removing modifications
Selecting spectra or mass values
A list of proteins
Scoring tests whether the match is random or not.
Probability: the chance that the observed match is a random event.
A real match, which is not random, has a very low probability.
Reject anything with a probability greater than a chosen threshold.
The Mascot score is -10 log10(P).
Significance thresholds
For a 1 in 20 chance of a random match among 500000 candidates:
P = 1/(20 x 500000)
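The threshold probability can be turned into a threshold score using the relation score = -10 log10(P). The sketch below assumes a significance level of 0.05 (1 in 20) and treats 500000 as the number of candidates, as in the example above. The function names are illustrative.

```python
import math


def mascot_score(p):
    """Convert a probability P into a score, -10 * log10(P)."""
    return -10.0 * math.log10(p)


def significance_threshold(n_candidates, alpha=0.05):
    """Score above which a match has less than `alpha` chance of being random."""
    return mascot_score(alpha / n_candidates)


if __name__ == "__main__":
    p = 1.0 / (20 * 500000)
    print(f"P = {p:.1e}, threshold score = {significance_threshold(500000):.1f}")
```

With these numbers P is 1e-7, so the threshold score is 70. A larger search space lowers P and raises the score needed for significance.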
Expectation value
The number of times you could expect to get this score or better by chance
E=Pthreshold*(10**((Sthreshold-score)/10))
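The formula translates directly into code. This sketch assumes a threshold probability of 0.05, and the function name `expectation_value` is illustrative.

```python
def expectation_value(score, threshold_score, p_threshold=0.05):
    """E = Pthreshold * 10 ** ((Sthreshold - score) / 10)."""
    return p_threshold * 10 ** ((threshold_score - score) / 10.0)


if __name__ == "__main__":
    for score in (60, 70, 80):
        print(score, expectation_value(score, threshold_score=70))
```

A score exactly at the threshold gives E equal to the threshold probability, and each additional 10 points of score lowers E by a factor of ten.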
Wide range of modifications, SNPs
Relax enzyme specificity
All fixed and variable mods retained
Allow for one additional unsuspected modification
Take query 218: the observed mass difference could correspond to either carbamidomethylation or carboxymethylation at the N-terminus.
Since the sample was alkylated with iodoacetamide, carbamidomethylation is very believable; it is a known artefact of over-alkylation.
Finds new matches by introducing mass shifts.
For confident site localization: Ascore, PTM score, and MD-score.
MD-score: the score difference between the top two matches.
Validation (Decoy)
Decoy
Very simple
Repeat the search against the decoy database.
Matches that are found in the decoy database are false positives.
It isn't useful when there is only a small number of spectra.
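Since every decoy match is a false positive, the ratio of decoy matches to target matches above a score threshold gives a simple estimate of the false discovery rate. This is a sketch under that assumption, using separate target and decoy searches. The function `estimate_fdr` and the example scores are made up for illustration.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as (decoy matches) / (target matches) above a threshold."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return decoys / targets


if __name__ == "__main__":
    target = [80, 75, 60, 40, 35]   # scores from the normal search
    decoy = [36, 20, 10]            # scores from the decoy search
    for threshold in (30, 50):
        print(threshold, estimate_fdr(target, decoy, threshold))
```

With only a handful of spectra the counts are too small for the ratio to be meaningful, which is why the approach is not useful for small data sets.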
Decoy
A utility to create a decoy database.
A reversed or randomised sequence of the same length is automatically generated and tested.
The average amino acid composition of the random sequences is the same.
The matches and scores for the decoy sequences are recorded separately in the result file.
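Both kinds of decoy sequence are simple to generate. The sketch below is an illustration only and is not the Mascot utility: reversing keeps length and composition, and shuffling is used here as one way to randomise, which preserves the composition of each sequence exactly (a stricter condition than matching the average composition).

```python
import random


def reversed_decoy(sequence):
    """Decoy made by reversing the protein sequence."""
    return sequence[::-1]


def shuffled_decoy(sequence, seed=None):
    """Decoy made by shuffling residues; same length and composition."""
    residues = list(sequence)
    random.Random(seed).shuffle(residues)  # seed makes the result repeatable
    return "".join(residues)


if __name__ == "__main__":
    protein = "MKWVTFISLLLLFSSAYSR"  # example sequence for illustration
    print(reversed_decoy(protein))
    print(shuffled_decoy(protein, seed=1))
```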
Mascot Daemon
Mascot Distiller
References
http://www.matrixscience.com
Mikhail M. S., Simone L., Markus B., Manja L., Toby M., Marcus B., Bernard K., The American Society for Biochemistry and Molecular Biology (2011)
Ville R. Koskinen, Patrick A. Emery, David M. Creasy, and John S. Cottrell, Molecular and Cellular Proteomics (2011)
Elias, J. E. and Gygi, S. P., Nature Methods 4, 207-214 (2007)