PABST peptide examples

From SPCTools

Revision as of 17:13, 27 May 2009
Tfarrah (Talk | contribs)
(formatting changes)

PABST is a tool that helps users select the best candidate peptides for mass spectrometric identification of a set of proteins. It merges various data sources and evaluates the results based on user-tunable parameters. The current default parameter weightings are shown below, and the lower sections link to example peptides, with comments to aid development of the selection algorithm.

Run the script with the -h flag, or with no arguments at all, to see the usage statement below. The only required parameter is build_id (the -a flag), which the script uses to determine which atlas build to export peptides from. The default config file is shown below the usage statement; these values are used unless a user-defined config file is supplied. To get a template config file, run the script with the -d flag; an example config file is written to the current working directory and can then be edited as desired.

The config file specifies various sequence attributes, each with an associated score. Each peptide sequence is evaluated against every attribute, and a composite score is obtained by multiplying together the scores of all attributes that match. Scores less than 1 penalize matching sequences; scores greater than 1 reward them. For example, if a sequence contains both a proline and a serine, and the score for each is set to 0.5, then the final score is multiplied by 0.5 * 0.5, or 0.25. Each peptide has two possible sources: empirical data from having been observed in the specified atlas build, and theoretical data from the electronic analysis of the reference database. If the bonus_obs parameter is set to 2, then the empirical (observed) suitability score is multiplied by 2.
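The multiplicative scoring described above can be illustrated with a minimal Python sketch. This is not the code of fetch_best_peptides.pl: the function names and the peptide sequence are hypothetical, each matching attribute is assumed to count once no matter how often it occurs, and only plain residue or dipeptide keys are handled (positional keys such as nE or Xc are ignored).

```python
def composite_score(peptide, weights):
    """Multiply together the scores of all attributes found in the peptide.

    Assumption: an attribute counts once, however many times it occurs.
    Only plain substring keys (e.g. "P", "DG") are handled here.
    """
    score = 1.0
    for motif, weight in weights.items():
        if motif in peptide:
            score *= weight
    return score


def apply_observed_bonus(score, bonus_obs):
    """Scale the empirical (observed) suitability score by bonus_obs."""
    return score * bonus_obs


# Hypothetical peptide containing both a proline (P) and a serine (S).
weights = {"P": 0.5, "S": 0.5}
base = composite_score("ALSPEK", weights)
observed = apply_observed_bonus(base, 2)
print(base, observed)
```

A peptide that matches none of the configured attributes keeps a score of 1.0, so it is neither penalized nor rewarded.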

The script must be run from /net/dblocal/www/html/devTF/sbeams/lib/scripts/PeptideAtlas/.

 usage: fetch_best_peptides.pl -a build_id [ -t outfile -n obs_cutoff -p proteins_file -v -b .3 ]
  -a, --atlas_build    Numeric atlas build ID to query
  -c, --config         Config file defining penalties for various sequence
  -d, --default_config prints an example config file with defaults in CWD,
                       named best_peptide.conf, will not overwrite existing
                       file.  Exits after printing.
  -p, --protein_file   file of protein names, one per line.  Should match
                       biosequence.biosequence_name
  -s, --show_builds    Print info about builds in db
  -b, --bonus_obs      Value by which observed peptide suitability score is
                       augmented relative to theoretical score, default 0.5.
  -t, --tsv_file       print output to specified file rather than stdout
  -n, --n_peptides     number of peptides to return per protein
  -o, --obs_min        Minimum n_obs to consider for observed peptides
  -h, --help           Print usage
  -v, --verbose        Verbose output, prints progress

Default config file:

C       0.3     # Avoid C
D       1       # Slightly penalize D or S in general?
DG      0.5     # Avoid dipeptide DG
DP      0.5     # Avoid dipeptide DP
M       0.3     # Avoid M
NG      0.5     # Avoid dipeptide NG
P       0.5     # Avoid P
QG      0.5     # Avoid dipeptide QG
S       1       # Slightly penalize D or S in general?
W       0.1     # Avoid W
Xc      0.5     # Avoid any C-terminal peptide
max_l   0       # Maximum length for peptide
max_p   1       # Penalty for peptides over max length
min_l   0       # Minimum length for peptide
min_p   1       # Penalty for peptides under min length
nE      0.4     # Avoid N-terminal E
nGPG    0.1     # Avoid nxyG where x or y is P or G
nQ      0.1     # Avoid N-terminal Q
nxxG    0.3     # Avoid nxxG
obs     2       # Bonus for observed peptides, usually > 1
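For readers who want to inspect or compare config files, the layout above (a key, a numeric value, and an optional # comment on each line) could be read with a short Python sketch such as the following. The parse_config function is hypothetical and is not part of the PABST script, which may parse its config differently.

```python
def parse_config(text):
    """Parse 'key value # comment' lines into a dict of floats.

    Assumption: every non-blank line holds exactly one key and one
    numeric value, optionally followed by a '#' comment.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"unexpected config line: {raw_line!r}")
        key, value = fields
        params[key] = float(value)
    return params


# A few lines copied from the default config shown above.
sample = """\
C       0.3     # Avoid C
DG      0.5     # Avoid dipeptide DG
obs     2       # Bonus for observed peptides, usually > 1
"""
params = parse_config(sample)
print(params)
```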


Below are some example proteins for exploring how the various scoring parameters affect the peptides selected.


https://db.systemsbiology.net/devDC/sbeams/cgi/shortURL?key=c87rxtje

Protein: ALCAM, moderate number of observations


https://db.systemsbiology.net/devDC/sbeams/cgi/shortURL?key=xsp03v1h

Protein with tons of observed peptides, lots of them NT or MC.


https://db.systemsbiology.net/devDC/sbeams/cgi/shortURL?key=h5bwnrt2

Protein with many fewer observations

https://db.systemsbiology.net/devDC/sbeams/cgi/shortURL?key=s2kvwg9r

Protein with moderate number of obs, mixed MGL/SGL
