Software:ProteinProphet
Getting the software
This software is included in the current TPP distribution.
In a nutshell
ProteinProphet is a tool for generating probabilities for protein identifications based on MS/MS data. It builds on the results of PeptideProphet, which validates peptide sequence identifications. The software was originally developed at the Seattle Proteome Center (SPC), part of the Institute for Systems Biology (ISB). ProteinProphet is an integral part of the Trans-Proteomic Pipeline (TPP) software distribution.
More info
Since MS/MS spectra are produced by peptides, and not proteins, an additional statistical model is needed to validate identifications at the protein level. We developed a model that takes as input the list of peptides assigned to MS/MS spectra and the corresponding probabilities that those peptide assignments are correct. Different peptide identifications corresponding to the same protein are combined to estimate the probability that the protein is present in the sample. This protein grouping information is then used to adjust the individual peptide probabilities, which makes the approach more discriminative. We also address the problem that we call degeneracy, which occurs when one peptide corresponds to several different proteins.
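The combination step described above can be illustrated with a minimal sketch. It assumes the simplified rule in which a protein is considered absent only if every one of its peptide assignments is incorrect, so that P = 1 - product of (1 - p_i). The function name is hypothetical, and the sketch deliberately leaves out the adjustment of peptide probabilities and the handling of degenerate peptides that ProteinProphet itself performs.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that at least one peptide assignment is correct.

    Hypothetical helper: a simplified combination rule, not the full
    ProteinProphet model (no NSP adjustment, no degeneracy weights).
    """
    for p in peptide_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # The protein is absent only if every peptide assignment is wrong.
    return 1.0 - prod(1.0 - p for p in peptide_probs)


if __name__ == "__main__":
    # Three peptides assigned to the same protein.
    print(round(protein_probability([0.9, 0.6, 0.4]), 3))
```

Under this rule, several moderately confident peptides can support a protein more strongly than any single one of them, which is why the peptide-level probabilities need to be reliable before they are combined.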
Usage
As of December 12, 2008, the options available to command-line users are:
NOPLOT: do not generate plot png file
NOOCCAM: non-conservative maximum protein list
ICAT: highlight peptide cysteines
GLYC: highlight peptide N-glycosylation motif
MINPROB: PeptideProphet probability threshold (default=0.05)
GROUPWTS: check peptide's total weight in the Protein Group against the threshold (default: check peptide's actual weight against threshold)
ACCURACY: equivalent to MINPROB0
ASAP: compute ASAP ratios for protein entries
(ASAP must have been run previously on interact dataset)
REFRESH: import manual changes to ASAP ratios
(after initially using ASAP option)
NORMPROTLEN: Normalize NSP using Protein Length
PROTLEN: Report Protein Length
INSTANCES: Use Expected Number of Ion Instances to adjust the peptide probabilities prior to NSP adjustment
PROTMW: Get protein mol weights
IPROPHET: input is from iProphet
ASAP_PROPHET: *New and Improved* compute ASAP ratios for protein entries
(ASAP must have been run previously on all input interact datasets with mz/XML raw data format)
DELUDE: do NOT use peptide degeneracy information when assessing proteins
EXCELPEPS: write output tab delim xls file including all peptides
EXCELxx: write output tab delim xls file including all protein (group)s
with minimum probability xx, where xx is a number between 0 and 1
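As a rough illustration of how these options fit together, the sketch below assembles an argument list in Python. Several things here are assumptions rather than facts stated on this page: the executable name ProteinProphet, the ordering (input files, then the output file, then the options), the helper name, and the file names in the usage line. The only syntax taken from the list above is that numeric options carry their value directly after the keyword, as in MINPROB0 or EXCELxx.

```python
# Keyword options copied from the list above (no numeric suffix).
KNOWN_FLAGS = {
    "NOPLOT", "NOOCCAM", "ICAT", "GLYC", "GROUPWTS", "ACCURACY", "ASAP",
    "REFRESH", "NORMPROTLEN", "PROTLEN", "INSTANCES", "PROTMW", "IPROPHET",
    "ASAP_PROPHET", "DELUDE", "EXCELPEPS",
}


def build_proteinprophet_args(inputs, output, minprob=None, excel_min=None, flags=()):
    """Assemble an argument list for a ProteinProphet call.

    Hypothetical helper. Assumed, not confirmed by this page: the
    executable name and the order inputs -> output -> options.
    """
    if not inputs:
        raise ValueError("at least one input file is required")
    unknown = [f for f in flags if f not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown option(s): {unknown}")
    if "IPROPHET" in flags and "INSTANCES" in flags:
        # Per the usage notes, INSTANCES should not be used with iProphet.
        raise ValueError("INSTANCES should not be combined with IPROPHET")

    args = ["ProteinProphet", *inputs, output]
    if minprob is not None:
        # The numeric value is appended to the keyword, as in MINPROB0.
        args.append(f"MINPROB{minprob}")
    if excel_min is not None:
        if not 0.0 <= excel_min <= 1.0:
            raise ValueError("EXCELxx expects a number between 0 and 1")
        args.append(f"EXCEL{excel_min}")
    args.extend(flags)
    return args


if __name__ == "__main__":
    # File names are placeholders.
    print(" ".join(build_proteinprophet_args(
        ["interact.pep.xml"], "interact.prot.xml",
        minprob=0.05, flags=["NORMPROTLEN", "NOPLOT"])))
```

The resulting list could be passed to subprocess.run; check the help text of the installed TPP version before relying on the argument order assumed here.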
Some usage notes:
NORMPROTLEN: ProteinProphet grants higher probabilities to proteins with more identified (sibling) peptides (NSP="number of sibling peptides"). NSP is computed as the sum of the probabilities of the peptides: a protein with three peptides of probabilities 0.9, 0.6, and 0.4 would have NSP=1.9. With NORMPROTLEN, NSP is scaled according to protein length. Use of NORMPROTLEN is recommended.
IPROPHET: iProphet, or InterProphet, is software under development that further processes PeptideProphet output before processing by ProteinProphet. It can be used to combine PeptideProphet results from several experiments and search engines. The IPROPHET option to ProteinProphet should be used if and only if the input pepXML file(s) were created by iProphet. iProphet is not yet released to the public.
INSTANCES: this option is superseded by iProphet and should not be used with iProphet.
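The NSP arithmetic in the NORMPROTLEN note can be reproduced with a short sketch. The first function follows the definition given above (the sum of a protein's peptide probabilities). The second is purely illustrative: this page does not say how NORMPROTLEN scales NSP, so the per-100-residue scaling, the function names, and the protein length used below are all assumptions, chosen only to show why a long protein and a short protein with the same NSP should not be treated alike.

```python
def nsp(peptide_probs):
    """NSP as described above: the sum of a protein's peptide probabilities."""
    return sum(peptide_probs)


def nsp_per_100_residues(peptide_probs, protein_length):
    """Illustrative length scaling (an assumption, not ProteinProphet's rule).

    Expresses NSP per 100 residues so that long proteins, which yield more
    peptides by chance, are not favored automatically.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return nsp(peptide_probs) * 100.0 / protein_length


if __name__ == "__main__":
    probs = [0.9, 0.6, 0.4]
    print(round(nsp(probs), 2))
    # 380 residues is a made-up protein length.
    print(round(nsp_per_100_residues(probs, 380), 2))
```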
Caution for large datasets
protXML output format
Reference
Nesvizhskii AI, Keller A, Kolker E, Aebersold R. (2003) "A statistical model for identifying proteins by tandem mass spectrometry." Anal Chem 75:4646-58

