Software:PeptideSieve

From SPCTools

Peptide Sieve Package & Server

Important Note: PeptideSieve has been returned to public distribution now that it has been validated. Thank you for your patience.


Description

It has been noted [1] that only a handful of a protein’s possible tryptic peptides are consistently observed in proteomics experiments. We call these consistently observed peptides proteotypic peptides. Such peptides have a variety of potential applications in proteomic research: improving the protein identification scoring functions of database search software; providing a panel of reagents for protein quantification; annotating coding sequences in genomes, e.g. the hundreds of sequenced bacterial genomes, some of which are important model organisms in systems biology; and guiding peptide selection in targeted proteomics experiments. Here we present PeptideSieve, an alpha version of a computational tool that predicts a peptide’s proteotypic propensity from its physico-chemical properties. The resulting predictors can accurately identify proteotypic peptides from any protein sequence and offer starting points for a physical model of the factors that govern elements of proteomic workflows such as digestion, chromatography, ionization and fragmentation.

The software consists of a single C++ program. The input is either a FASTA file of protein sequences or a TXT file of peptide sequences. The program reports which of a protein's peptides are most likely to be proteotypic for each of four common experimental designs (PAGE_MALDI, ICAT_ESI, PAGE_ESI and MUDPIT_ESI, as named by the -d option).

Method outline

The program first performs an in silico digest of each protein and then converts each of the resulting peptides into chemical property strings. It then computes a likelihood function that scores how likely each peptide is to be proteotypic. Note that the predictors are specific to particular experimental designs.
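
The digestion step can be illustrated with a minimal Python sketch. It is not PeptideSieve's own code: the cleavage rule (cut after K or R unless the next residue is P) is the conventional trypsin rule and is assumed here, and the function names are hypothetical.

```python
def tryptic_fragments(protein):
    """Split a protein into fully cleaved tryptic fragments.

    Assumed rule (not confirmed for PeptideSieve): cleave after K or R,
    except when the next residue is P.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing fragment that does not end in K or R.
        fragments.append(protein[start:])
    return fragments


def tryptic_digest(protein, max_missed_cleavages=1):
    """Return peptides that span up to max_missed_cleavages skipped sites."""
    fragments = tryptic_fragments(protein)
    peptides = []
    for first in range(len(fragments)):
        # Joining k + 1 adjacent fragments corresponds to k missed cleavages.
        last_exclusive = min(first + max_missed_cleavages + 1, len(fragments))
        for end in range(first + 1, last_exclusive + 1):
            peptides.append("".join(fragments[first:end]))
    return peptides
```

For example, tryptic_digest("AKRPGK") yields AK, AKRPGK and RPGK: the R followed by P is not cleaved, and AKRPGK is the peptide with one missed cleavage.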

Reference

[1] Mallick P, Schirle M, Chen SS, Flory MR, Lee H, Martin D, Ranish J, Raught B, Schmitt R, Werner T, Kuster B, Aebersold R. Computational prediction of proteotypic peptides for quantitative proteomics. Nat Biotechnol. 2007 Jan;25(1):125-31. http://www.nature.com/nbt/journal/v25/n1/abs/nbt1275.html

Getting the software

A new native C++ version (0.51) was released in May 2008. Download the PeptideSieve files (http://sourceforge.net/project/showfiles.php?group_id=69281&package_id=287807) from the Sashimi project at SourceForge. Linux, OS X and Windows binaries (PeptideSieve.linux.i386, PeptideSieve.osx.i386 and PeptideSieve.exe) are available.

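A Windows GUI version (http://tools.proteomecenter.org/software/PeptideSieve/PeptideSieve_v0.51.static.with.GUI.setup.exe), provided by our collaborator Chee-Hong Wong, is also available. It has been updated to PeptideSieve version 0.51.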

Build instructions are available at http://tools.proteomecenter.org/wiki/index.php?title=Software:PeptideSieve:Build

Running the software

PeptideSieve is a command-line utility. Running it without arguments prints the usage instructions:

Usage: peptideSieve [options] [files]
PeptideSieve: Identify Proteotypic Peptides from a FASTA or TXT file.
Version - 0.51
Options:
  -O [ --outputDirectory ] arg                : set output directory
  -e [ --outputExtension ] arg                : set extension for output files
  -o [ --outputFile ] arg                     : output file name if not 
                                              input.extension
  -P [ --propertyFile ] arg (=properties.txt) : set property file
  -f [ --inputFormat ] arg (=FASTA)           : FASTA or TXT, specifying input 
                                              format
  -l [ --minSeqLength ] arg (=6)              : minimum sequence length to 
                                              consider
  -L [ --maxSeqLength ] arg (=40)             : maximum sequence length to 
                                              consider
  -m [ --minMass ] arg (=400)                 : minimum mass to consider
  -M [ --maxMass ] arg (=3000)                : maximum mass to consider
  -c [ --numAllowedMisCleavages ] arg (=1)    : maximum number of miscleavages 
                                              to consider
  -s [ --saveConvertedFile ]                  : save the converted propertyFile
  -h [ --help ]                               : display usage information
  -d [ --experimentalDesign ] arg (=ALL)      : which design to return, either 
                                              NONE, ALL, PAGE_MALDI, ICAT_ESI, 
                                              PAGE_ESI, or MUDPIT_ESI
  -p [ --pValue ] arg (=0.8)                  : only return peptides with p 
                                              values greater than X
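
The length and mass options (-l, -L, -m, -M) amount to a pre-filter on candidate peptides. The Python sketch below shows one way such a filter could work, using the default limits from the usage text. The use of monoisotopic masses and inclusive bounds are assumptions, since PeptideSieve's own mass calculation is not described here, and the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide (assumed mass type)."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def passes_filters(sequence, min_len=6, max_len=40,
                   min_mass=400.0, max_mass=3000.0):
    """Apply the default -l/-L/-m/-M limits; inclusive bounds are assumed."""
    if not min_len <= len(sequence) <= max_len:
        return False
    return min_mass <= peptide_mass(sequence) <= max_mass
```

Under these assumptions, GGGGGG meets the minimum length of 6 but is rejected because its mass (about 360 Da) falls below the 400 Da minimum.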

It is CRITICAL either to place the properties.txt file in the directory from which PeptideSieve is executed or to specify its location with the -P flag. Otherwise, PeptideSieve will behave very strangely.
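
One way to guard against a missing property file is to check for it before launching the program. The following Python sketch builds a command line from flags listed in the usage text (-P, -f, -d, -p). The wrapper function, its defaults and the file name proteins.fasta are hypothetical, and the executable name assumes the Linux binary mentioned above.

```python
import os


def build_command(input_file, executable="PeptideSieve.linux.i386",
                  property_file="properties.txt", input_format="FASTA",
                  design="ALL", p_value=0.8):
    """Return an argument list for PeptideSieve, refusing to proceed
    if the property file cannot be found."""
    if not os.path.isfile(property_file):
        raise FileNotFoundError(
            "property file not found: %s (pass its location via -P)"
            % property_file)
    return [
        executable,
        "-P", property_file,   # location of properties.txt
        "-f", input_format,    # FASTA or TXT
        "-d", design,          # experimental design to report
        "-p", str(p_value),    # only return peptides above this value
        input_file,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True), for example with cmd = build_command("proteins.fasta", design="PAGE_ESI").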


Predictions and Supplementary Materials

AEBERSOLD_TABLE_S1_Observed_Proteins

AEBERSOLD_TABLE_S2_Proteotypic_Peptides

AEBERSOLD_TABLE_S7_PAGE_ESI

AEBERSOLD_TABLE_S8_CZ_MALDI

AEBERSOLD_TABLE_S9_MUDPIT

AEBERSOLD_TABLE_S10_MUDPIT

AEBERSOLD_TABLE_S11_YEAST_PREDICTIONS

AEBERSOLD_TABLE_S12_HUMAN_PREDICTIONS

AEBERSOLD_TABLE_S13_EXTENDED_YEAST_PREDICTIONS

AEBERSOLD_TABLE_S14_EXTENDED_HUMAN_PREDICTIONS
