PABST peptide examples
From SPCTools
Revision as of 21:07, 16 September 2009
PABST is a tool that helps users select the best candidate peptides for mass spectrometric identification of a set of proteins. It merges various data sources and evaluates the results based on user-tunable parameters. The current default parameter weightings are shown below, and the later sections link to example peptides, with comments to aid development of the selection algorithm.
The script can be run with the -h flag, or with no arguments at all, to see the usage statement below. The only required parameter is build_id, which the script uses to determine which atlas build to export peptides from. The default config file is shown below the usage statement; these values are used unless a user-defined config file is supplied. To get a template config file, run the script with the -d flag, and an example config file will be written to the current working directory (CWD), where it can be edited as desired.
The config file specifies various sequence attributes, each with an associated score. Each peptide sequence is evaluated for every attribute, and a composite score is computed by multiplying together the scores of the attributes that match. Each peptide has two possible sources: empirical data, from having been observed in the specified atlas build, and theoretical data, from the electronic (in silico) analysis of the reference database. Scores less than 1 penalize matching sequences; scores greater than 1 reward them. For example, if a sequence contains both a proline and a serine, and the score for each is set to 0.5, then the final score is multiplied by 0.5 * 0.5, or 0.25. If the bonus_obs parameter is set to 2, then the empirical (observed) suitability score is multiplied by 2.
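The multiplicative scoring described above can be illustrated with a short Python sketch. It is not taken from fetch_best_peptides.pl: the function names and the peptide sequence are hypothetical, only plain substring attributes (such as P, S, or DG) are handled, and it assumes each matching attribute is applied once no matter how many times it occurs in the sequence.

```python
def composite_score(sequence, scores):
    """Multiply together the scores of every attribute found in the sequence.

    Assumption: an attribute counts once even if it occurs several times.
    Position-specific attributes (nE, nQ, Xc, ...) are not handled here.
    """
    total = 1.0
    for motif, weight in scores.items():
        if motif in sequence:
            total *= weight
    return total


def apply_observed_bonus(score, bonus_obs):
    """Scale an empirical (observed) suitability score by bonus_obs."""
    return score * bonus_obs


# Hypothetical peptide containing both a proline (P) and a serine (S).
scores = {"P": 0.5, "S": 0.5}
base = composite_score("AAPLSK", scores)
print(base)                           # 0.25, i.e. 0.5 * 0.5
print(apply_observed_bonus(base, 2))  # 0.5
```

A sequence that matches no attribute keeps a composite score of 1, so it is neither penalized nor rewarded.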
The script must be run from $SBEAMS/lib/scripts/PeptideAtlas/, where $SBEAMS=/net/dblocal/www/html/<your_dev_area>/sbeams. If you don't have a dev area, use that of someone you know who has recently updated their software (for example dev2, Eric Deutsch's area, or devTF, Terry Farrah's).
usage: fetch_best_peptides.pl -a build_id [ -t outfile -n obs_cutoff -p proteins_file -v -b .3 ]
   -a, --atlas_build      Numeric atlas build ID to query
   -c, --config           Config file defining penalites for various sequence
   -d, --default_config   prints an example config file with defaults in CWD,
                          named best_peptide.conf, will not overwrite existing
                          file. Exits after printing.
   -p, --protein_file     file of protein names, one per line. Should match
                          biosequence.biosequence_name
   -s, --show_builds      Print info about builds in db
   -b, --bonus_obs        Value by which observed peptide suitability score is
                          augmented relative to theoretical score, default 0.5.
   -t, --tsv_file         print output to specified file rather than stdout
   -n, --n_peptides       number of peptides to return per protein
   -o, --obs_min          Minimum n_obs to consider for observed peptides
   -h, --help             Print usage
   -v, --verbose          Verbose output, prints progress
Default config file:
C      0.3   # Avoid C
D      1     # Slightly penalize D or S in general?
DG     0.5   # Avoid dipeptide DG
DP     0.5   # Avoid dipeptide DP
M      0.3   # Avoid M
NG     0.5   # Avoid dipeptide NG
P      0.5   # Avoid P
QG     0.5   # Avoid dipeptide QG
S      1     # Slightly penalize D or S in general?
W      0.1   # Avoid W
Xc     0.5   # Avoid any C-terminal peptide
max_l  0     # Maximum length for peptide
max_p  1     # Penalty for peptides over max length
min_l  0     # Minimum length for peptide
min_p  1     # Penalty for peptides under min length
nE     0.4   # Avoid N-terminal E
nGPG   0.1   # Avoid nxyG where x or y is P or G
nQ     0.1   # Avoid N-terminal Q
nxxG   0.3   # Avoid nxxG
obs    2     # Bonus for observed peptides, usually > 1
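As an illustration of the config layout, here is a minimal Python sketch that reads entries of the form "name value # comment" into a dictionary. The format is inferred from the default file shown above. The function name parse_config is hypothetical, and the actual parser in fetch_best_peptides.pl may behave differently.

```python
def parse_config(text):
    """Parse 'name value # comment' lines into a {name: float} dictionary."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue                         # skip blank or comment-only lines
        key, value = line.split()            # assumes exactly two fields
        settings[key] = float(value)
    return settings


# A few entries copied from the default config file.
SAMPLE = """\
C 0.3 # Avoid C
DG 0.5 # Avoid dipeptide DG
max_l 0 # Maximum length for peptide
obs 2 # Bonus for observed peptides, usually > 1
"""

config = parse_config(SAMPLE)
print(config["C"], config["obs"])
```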
Usage notes
Examples to illustrate various settings
On each of the pages below, scroll down to Cached PABST best peptides. You will need to log in to SBEAMS.
https://db.systemsbiology.net/devDC/sbeams/cgi/shortURL?key=c87rxtje
Protein: ALCAM, moderate number of observations
https://db.systemsbiology.net/devDC/sbeams/cgi/shortURL?key=xsp03v1h
Protein with tons of observed peptides, lots of them NT or MC.
https://db.systemsbiology.net/devDC/sbeams/cgi/shortURL?key=h5bwnrt2
Protein with many fewer observations
https://db.systemsbiology.net/devDC/sbeams/cgi/shortURL?key=s2kvwg9r
Protein with moderate number of obs, mixed MGL/SGL
Looking at empirical or theoretical peptides only
If you want just empirically observed peptides, filter for lines that do not have "na" in the empirical_proteotypic_score and suitability_score columns:
./fetch_best_peptides.pl --atlas_build 162 --bioseq_set 33 | awk '{if ($5!="na" && $6!="na" ) print}'
If you want just theoretical peptides (an in silico digest of an Atlas proteome), filter for lines that do not have "na" in the predicted_suitability_score column:
./fetch_best_peptides.pl --atlas_build 162 --bioseq_set 33 | awk '{ if ($7!="na") print }'
Of course, some peptides are both theoretical and empirically observed.
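For readers who prefer to post-process the output in a script, the two awk filters above can be sketched in Python. This assumes whitespace-separated output with empirical_proteotypic_score and suitability_score in columns 5 and 6 and predicted_suitability_score in column 7, as the awk field numbers imply. The function name and the sample rows are invented for illustration.

```python
def split_by_source(lines):
    """Return (empirical, theoretical) row lists, mirroring the awk filters.

    Unlike awk, lines with fewer than seven fields are skipped.
    """
    empirical, theoretical = [], []
    for line in lines:
        fields = line.split()  # like awk's default: split on runs of whitespace
        if len(fields) < 7:
            continue
        if fields[4] != "na" and fields[5] != "na":  # awk $5 and $6
            empirical.append(fields)
        if fields[6] != "na":                        # awk $7
            theoretical.append(fields)
    return empirical, theoretical


# Invented rows; only the positions of columns 5 to 7 matter here.
rows = [
    "ProtA pep1 x y 0.9 0.8 0.7",  # observed and predicted
    "ProtA pep2 x y na na 0.6",    # predicted only
    "ProtA pep3 x y 0.5 0.4 na",   # observed only
]
empirical, theoretical = split_by_source(rows)
print([r[1] for r in empirical])    # ['pep1', 'pep3']
print([r[1] for r in theoretical])  # ['pep1', 'pep2']
```

A peptide such as pep1 lands in both lists, which matches the remark that some peptides are both theoretical and empirically observed.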