Software:ProteinProphet
From SPCTools
Getting the software
This software is included in the current TPP distribution.
In a nutshell
ProteinProphet is a tool for computing probabilities for protein identifications based on MS/MS data. It uses results from PeptideProphet, which validates peptide sequence identifications. This software was originally developed at the Seattle Proteome Center (SPC), part of the Institute for Systems Biology (ISB). ProteinProphet is an integral part of the Trans-Proteomic Pipeline (TPP) software distribution.
More info
Since MS/MS spectra are produced by peptides, not proteins, an additional statistical model is needed to validate identifications at the protein level. We developed a model that takes as input the list of peptides assigned to MS/MS spectra, together with the probabilities that those peptide assignments are correct. Peptide identifications corresponding to the same protein are combined to estimate the probability that the protein is present in the sample. This protein grouping information is then used to adjust the individual peptide probabilities, making the approach more discriminative. We also address the problem we call degeneracy, which occurs when one peptide corresponds to several different proteins.
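The combination step can be illustrated with a minimal sketch. The rule used here is an assumption for illustration, not taken from this page: it treats each peptide identification as independent evidence, so the protein is absent only if all of its peptide assignments are wrong, giving P = 1 - Π(1 - w·p). The optional weights are a hypothetical device for sharing a degenerate peptide among several proteins (w = 1 for a peptide unique to the protein). The adjustment of peptide probabilities from protein grouping information is not modeled.

```python
from math import prod


def protein_probability(peptide_probs, weights=None):
    """Combine peptide probabilities into a single protein probability.

    Illustrative assumption: peptide identifications are independent, so
        P = 1 - prod_i (1 - w_i * p_i)
    where w_i in [0, 1] is a hypothetical share of a degenerate peptide
    assigned to this protein (1.0 means the peptide is unique to it).
    """
    if weights is None:
        weights = [1.0] * len(peptide_probs)
    if len(weights) != len(peptide_probs):
        raise ValueError("need exactly one weight per peptide")
    # Probability that every peptide assignment is wrong, then its complement.
    return 1.0 - prod(1.0 - w * p for w, p in zip(weights, peptide_probs))
```

For example, `protein_probability([0.9, 0.6, 0.4])` returns about 0.976 (1 - 0.1 × 0.4 × 0.6): three moderately confident peptides together support the protein more strongly than any one of them alone. Halving the weight of a shared peptide, as in `protein_probability([0.9, 0.8], [1.0, 0.5])`, lowers its contribution (about 0.94 instead of 0.98).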
Usage
As of December 12, 2008, options available for command line users are:
NOPLOT: do not generate the plot PNG file
NOOCCAM: report a non-conservative maximum protein list (do not apply Occam's razor)
ICAT: highlight peptide cysteines
GLYC: highlight the peptide N-glycosylation motif
MINPROB: PeptideProphet probability threshold (default=0.05)
GROUPWTS: check each peptide's total weight in the protein group against the threshold (default: check the peptide's actual weight against the threshold)
ACCURACY: equivalent to MINPROB0
ASAP: compute ASAP ratios for protein entries (ASAP must have been run previously on the interact dataset)
REFRESH: import manual changes to ASAP ratios (after initially using the ASAP option)
NORMPROTLEN: normalize NSP using protein length
PROTLEN: report protein length
INSTANCES: use the expected number of ion instances to adjust the peptide probabilities prior to NSP adjustment
PROTMW: get protein molecular weights
IPROPHET: input is from iProphet
ASAP_PROPHET: *New and Improved* compute ASAP ratios for protein entries (ASAP must have been run previously on all input interact datasets with mz/XML raw data format)
DELUDE: do NOT use peptide degeneracy information when assessing proteins
EXCELPEPS: write a tab-delimited .xls output file including all peptides
EXCELxx: write a tab-delimited .xls output file including all proteins (and protein groups) with minimum probability xx, where xx is a number between 0 and 1
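To show how the option keywords above might be combined, here is a small Python sketch that assembles an argument list. It rests on assumptions not stated on this page: the executable name `ProteinProphet`, an argument order of input pepXML file(s), then output protXML file, then option keywords, and the convention that valued options take their number appended directly to the keyword (as the EXCELxx and MINPROB0 forms suggest). The function name and file names are hypothetical; check the usage message of your installed ProteinProphet before relying on this layout.

```python
# Option keywords from the list above (as of December 12, 2008).
# MINPROB and EXCELxx take a value and are handled separately below.
KNOWN_FLAGS = frozenset({
    "NOPLOT", "NOOCCAM", "ICAT", "GLYC", "GROUPWTS", "ACCURACY", "ASAP",
    "REFRESH", "NORMPROTLEN", "PROTLEN", "INSTANCES", "PROTMW",
    "IPROPHET", "ASAP_PROPHET", "DELUDE", "EXCELPEPS",
})


def _check_prob(name, value):
    """Reject probability thresholds outside [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")


def build_protein_prophet_args(pepxml_inputs, protxml_output, flags=(),
                               minprob=None, excel_minprob=None):
    """Return an argv-style list for a ProteinProphet call (assumed layout).

    Assumed order: executable, input pepXML file(s), output protXML file,
    option keywords; valued options get the number appended directly
    (e.g. MINPROB0.2, EXCEL0.9).
    """
    if not pepxml_inputs:
        raise ValueError("at least one input pepXML file is required")
    flags = list(flags)
    unknown = [f for f in flags if f not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown option(s): {unknown}")
    # The usage notes say INSTANCES should not be used with iProphet input.
    if "INSTANCES" in flags and "IPROPHET" in flags:
        raise ValueError("INSTANCES should not be combined with IPROPHET")

    args = ["ProteinProphet", *pepxml_inputs, protxml_output, *flags]
    if minprob is not None:
        _check_prob("minprob", minprob)
        args.append(f"MINPROB{minprob:g}")
    if excel_minprob is not None:
        _check_prob("excel_minprob", excel_minprob)
        args.append(f"EXCEL{excel_minprob:g}")
    return args
```

For example, `build_protein_prophet_args(["interact.pep.xml"], "interact.prot.xml", ["NORMPROTLEN", "NOPLOT"], minprob=0.2)` returns `['ProteinProphet', 'interact.pep.xml', 'interact.prot.xml', 'NORMPROTLEN', 'NOPLOT', 'MINPROB0.2']`, a list that could be passed to `subprocess.run` from the standard library. Validating keywords against the documented list catches typos before a long run starts.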
Some usage notes:
NORMPROTLEN: ProteinProphet grants higher probabilities to proteins with more identified (sibling) peptides (NSP = "number of sibling peptides"). For a given peptide, NSP is the sum of the probabilities of the other peptides assigned to the same protein: in a protein with three peptides of probabilities 0.9, 0.6, and 0.4, the peptide with probability 0.9 has NSP = 0.6 + 0.4 = 1.0, while the three probabilities together sum to 1.9. With NORMPROTLEN, NSP is scaled according to protein length. Use of NORMPROTLEN is recommended.
IPROPHET: iProphet, or InterProphet, is software under development that further processes PeptideProphet output before it is passed to ProteinProphet. It can be used to combine PeptideProphet results from several experiments and search engines. The IPROPHET option to ProteinProphet should be used if and only if the input pepXML file(s) were created by iProphet. iProphet is not yet released to the public.
INSTANCES: this option is superseded by iProphet and should not be used with iProphet input.
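The NSP arithmetic in the NORMPROTLEN note can be checked with a small sketch. It reports both the protein's summed peptide probability and, for each peptide, the sum over its siblings only (the total minus the peptide's own probability), following the definition of sibling peptides in the reference below. The function name is hypothetical, degeneracy weighting is ignored, and the length scaling applied by NORMPROTLEN is not specified on this page, so it is not attempted.

```python
def sibling_nsp(peptide_probs):
    """Return (total, per-peptide NSP) for the peptides of one protein.

    total: plain sum of the peptide probabilities.
    per-peptide NSP: sum of the probabilities of the *other* peptides,
    i.e. total minus the peptide's own probability (no degeneracy
    weighting and no protein-length normalization).
    """
    total = sum(peptide_probs)
    return total, [total - p for p in peptide_probs]
```

With the probabilities from the note, `sibling_nsp([0.9, 0.6, 0.4])` gives a total of about 1.9 and per-peptide NSP values of about 1.0, 1.3, and 1.5: the least confident peptide has the most sibling support.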
Caution for large datasets
protXML output format
Reference
Nesvizhskii AI, Keller A, Kolker E, Aebersold R. (2003) "A statistical model for identifying proteins by tandem mass spectrometry." Anal Chem 75:4646-58