GetTransitionsAPI

From SPCTools

Revision as of 18:09, 27 April 2010
Dcampbel (Talk | contribs)

Current revision

  GetTransitions is a CGI script that lets users query the peptide and transition information stored in the PeptideAtlas/MRM Atlas.
The transitions retrieved are constrained by parameters set by the user.  In a web browser, these can be set interactively as
needed, but the page can also be accessed in an automated fashion using command-line utilities such as wget or curl, or directly from a
program using an appropriate URL-fetching mechanism.  This page describes the parameters that a remote user can set
to obtain transitions.


[Return] to main PABST page.

This section defines the parameters that can be used to refine the transitions retrieved from the Atlas, some required and some optional,
with allowed values following the field name where applicable.  The following section shows the descriptive text provided in the
web UI to help further explain these options.

## Required parameters:
############################################

action  [ QUERY ]

# One of the following is required; typical remote use will involve specifying the organism.
organism_name   [ yeast, mouse, human ] 
pabst_build_id  [ any accessible build_id ]

# One of the following must be set; upload_file requires the POST method and file encoding
protein_name_constraint  
upload_file 

# This param ensures that certain params generally set with user interaction do not limit search results
default_search=1

# Not strictly required, but the page will return dense HTML otherwise.
output_mode  [ tsv xml ]

## Optional parameters
############################################

peptide_sequence_constraint 
peptide_length
empirical_proteotypic_constraint
n_protein_mappings_constraint
n_genome_locations_constraint 
n_highest_intensity_fragment_ions
n_peptides_per_protein 

# Options below affect the PABST peptide scoring algorithm, explained [here].  Current default values are shown.

4H = '1'        5H = '1'        BA = '1'        C = '0.95'      D = '1'
DG = '1'        DP = '1'        Hper = '1'      M = '0.95'      NG = '1'
NxST = '1'      P = '0.95'      QG = '1'        R = '1'         S = '1'
W = '1'         Xc = '1'        max_l = '25'    max_p = '0.2'   min_l = '7'
min_p = '0.2'   nE = '1'        nGPG = '1'      nM = '1'        nQ = '1'
nX = '1'        nxxG = '1'      obs = '2'       ssr_p = '0.5'
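
For automated use, the required parameters listed above can be assembled into a query URL programmatically. The following is a minimal Python sketch, not part of SBEAMS: the helper name build_get_transitions_url is hypothetical, the base URL is copied from the Human SRMAtlas example on this page, and joining parameters with ';' simply mirrors the wget examples. Semicolons inside a value are percent-encoded as %3B so that they are not read as parameter separators.

```python
from urllib.parse import quote

# Base URL copied from the Human SRMAtlas example on this page.
BASE_URL = "https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/GetTransitions"


def build_get_transitions_url(protein_names, organism_name, output_mode="tsv",
                              base_url=BASE_URL, **optional):
    """Assemble a GetTransitions query URL (hypothetical helper)."""
    params = {
        "action": "QUERY",
        "default_search": "1",
        "organism_name": organism_name,
        "output_mode": output_mode,
        # Several protein names share one parameter, separated by semicolons.
        "protein_name_constraint": ";".join(protein_names),
    }
    # Optional parameters, e.g. n_peptides_per_protein=3.
    params.update({key: str(value) for key, value in optional.items()})
    # quote(..., safe="") percent-encodes ';' inside a value as %3B;
    # the key=value pairs themselves are joined with ';'.
    return base_url + "?" + ";".join(
        f"{key}={quote(value, safe='')}" for key, value in params.items()
    )


url = build_get_transitions_url(
    ["YBR002C", "YAL003W"], "Yeast", output_mode="xml",
    n_highest_intensity_fragment_ions=3, n_peptides_per_protein=3,
)
print(url)
```

The resulting string can be passed to wget or curl, or fetched from Python with urllib.request.urlopen.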


This section shows the help text for the various parameters.

protein_name_constraint            Constraint for the Protein Name. '%' is wildcard character; '_' is single character wildcard; character range is like '[a-m]'; multiple entries may be separated with a  semicolon; Use ! for NOT.     
upload_file                        Path to file with list of Protein Names to be uploaded via the web interface (NOTE: if proteins are not found, search defaults to printing all proteins of the selected Atlas build)                   
peptide_sequence_constraint        Constraint for the Peptide Sequence. '%' is wildcard character; '_' is single character wildcard; character range is like '[a-m]'; multiple entries may be separated with a  semicolon; Use ! for NOT. 
peptide_length                     Constraint for the number of amino acids in the sequence.  Allowed syntax: "n", "> n", "< n", "between n and n", "n +- n"
empirical_proteotypic_constraint   Constraint for the empirical proteotypic score for a peptide.  Allowed syntax: "n.n", "> n.n", "< n.n", "between n.n and n.n", "n.n +- n.n"                                                            
n_protein_mappings_constraint      Constraint for number of distinct proteins for this peptide ( >=0 )                                                                                                                                    
n_genome_locations_constraint      Constraint for number of genome locations for this peptide ( >=0 )                                                                                                                                     
n_highest_intensity_fragment_ions  Number of highest-intensity fragment ions to keep per spectrum, default 3
n_peptides_per_protein             Number of peptides to return per protein, default 3                                                                                                                                           
pabst_build_id                     Select desired PABST Build to search, required. 
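
The numeric constraint syntax quoted above for peptide_length and empirical_proteotypic_constraint can be checked on the client before a request is sent. The sketch below is an assumption-laden illustration: the function name matches_documented_syntax is hypothetical, and the patterns only mirror the five forms listed in the help text. The server's own parser may be stricter or more lenient.

```python
import re

# An integer or decimal number, covering both "n" and "n.n" forms.
NUM = r"\d+(?:\.\d+)?"

# One pattern per documented form: n, > n, < n, between n and n, n +- n.
DOCUMENTED_FORMS = [
    re.compile(NUM),
    re.compile(rf"[<>]\s*{NUM}"),
    re.compile(rf"between\s+{NUM}\s+and\s+{NUM}"),
    re.compile(rf"{NUM}\s*\+-\s*{NUM}"),
]


def matches_documented_syntax(text):
    """Return True if text has one of the forms listed in the help text."""
    text = text.strip()
    return any(form.fullmatch(text) for form in DOCUMENTED_FORMS)


for candidate in ["7", "> 7", "between 7 and 25", "10 +- 2", "about 7"]:
    print(candidate, matches_documented_syntax(candidate))
```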

Example invocation using wget, with the resulting transitions (output mode tsv):
wget 'https://db.systemsbiology.net/devDC/sbeams/cgi/PeptideAtlas/GetTransitions?default_search=1;protein_name_constraint=YAL003W;n_highest_intensity_fragment_ions=3;n_peptides_per_protein=3&action=QUERY;output_mode=tsv;organism_name=Yeast' -O YAL003W_transitions.tsv


Protein Pre     Sequence        Fol     Score   Src     Q1_mz   Q1_chg  Q3_mz   Q3_chg  Label   Rank    RI      SSR
YAL003W K       SYIEGTAVSQADVTVFK       A       1.60    IT      907.96  2       994.52  1       y9      1       10000   33.2
YAL003W K       SYIEGTAVSQADVTVFK       A       1.60    IT      907.96  2       1093.59 1       y10     2       4129    33.2
YAL003W K       SYIEGTAVSQADVTVFK       A       1.60    IT      907.96  2       494.30  1       y4      3       2497    33.2
YAL003W K       SIVTLDVKPWDDETNLEEMVANVK        A       1.56    IT      1373.19 2       1059.01 2       y18     1       3874    44.6
YAL003W K       SIVTLDVKPWDDETNLEEMVANVK        A       1.56    IT      1373.19 2       945.43  2       y16     2       1152    44.6
YAL003W K       SIVTLDVKPWDDETNLEEMVANVK        A       1.56    IT      1373.19 2       1009.48 2       y17     3       800     44.6
YAL003W R       WFNHIASK        A       1.43    IT      1002.52 1       555.32  1       y5      1       730     22.9
YAL003W R       WFNHIASK        A       1.43    IT      501.76  2       669.37  1       y6      2       2955    22.9
YAL003W R       WFNHIASK        A       1.43    IT      501.76  2       816.44  1       y7      3       827     22.9
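
Output in this form is straightforward to post-process. As a rough sketch, assuming the columns are tab-separated and the header row matches the one shown above, the following Python groups transitions by protein and peptide sequence. The function name group_transitions is hypothetical, and the sample rows are copied from the yeast output above.

```python
import csv
import io
from collections import defaultdict

# Header and six rows copied from the yeast example output on this page.
ROWS = [
    "Protein Pre Sequence Fol Score Src Q1_mz Q1_chg Q3_mz Q3_chg Label Rank RI SSR",
    "YAL003W K SYIEGTAVSQADVTVFK A 1.60 IT 907.96 2 994.52 1 y9 1 10000 33.2",
    "YAL003W K SYIEGTAVSQADVTVFK A 1.60 IT 907.96 2 1093.59 1 y10 2 4129 33.2",
    "YAL003W K SYIEGTAVSQADVTVFK A 1.60 IT 907.96 2 494.30 1 y4 3 2497 33.2",
    "YAL003W R WFNHIASK A 1.43 IT 1002.52 1 555.32 1 y5 1 730 22.9",
    "YAL003W R WFNHIASK A 1.43 IT 501.76 2 669.37 1 y6 2 2955 22.9",
    "YAL003W R WFNHIASK A 1.43 IT 501.76 2 816.44 1 y7 3 827 22.9",
]
# Rebuild tab-separated text, as assumed for output_mode=tsv.
SAMPLE_TSV = "\n".join("\t".join(row.split()) for row in ROWS) + "\n"


def group_transitions(handle):
    """Map (protein, peptide) to a list of (Q1 m/z, Q3 m/z, ion label)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["Protein"], row["Sequence"])
        grouped[key].append(
            (float(row["Q1_mz"]), float(row["Q3_mz"]), row["Label"])
        )
    return dict(grouped)


transitions = group_transitions(io.StringIO(SAMPLE_TSV))
for (protein, peptide), ions in transitions.items():
    print(protein, peptide, [label for _, _, label in ions])
```

To process a downloaded file instead, open it with open("YAL003W_transitions.tsv", newline="") and pass the file handle to the same function.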

A second example, this time fetching transitions from the Human Complete SRMAtlas:

wget 'https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/GetTransitions?protein_name_constraint=P01258;n_highest_intensity_fragment_ions=3;n_peptides_per_protein=3;apply_action_hidden=&action=QUERY;output_mode=tsv;default_search=1;organism_name=Human' -O Human_transitions.tsv

This output currently includes some HTML; a workaround is to use a tool such as cut (or Excel) to prune the noisy columns, e.g.

cut -f1,3-16 Human_transitions.tsv > Human_transitions_noHTML.tsv
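
Where the pruning has to happen inside a program rather than in the shell, the same field selection can be reproduced in a few lines. This is only a sketch of the cut command above: the helper name keep_fields is hypothetical, and the example line is made-up placeholder data, not real GetTransitions output.

```python
def keep_fields(line, keep):
    """Keep the 1-based field numbers in keep from one tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    # Like cut, silently skip field numbers beyond the end of the line.
    return "\t".join(fields[i - 1] for i in keep if i <= len(fields))


# Same selection as cut -f1,3-16: field 1, then fields 3 through 16.
KEEP = [1, *range(3, 17)]

# Placeholder line; the second field stands in for an HTML-laden column.
example = "P01258\t<a href='x'>link</a>\tPEPTIDEK\t2\n"
print(keep_fields(example, KEEP))
```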


A third example, using xml output mode and demonstrating how to specify multiple protein names by using the URL escape code (percent-encoding) for a semicolon, %3B: YBR002C%3BYAL003W
wget 'https://db.systemsbiology.net/devDC/sbeams/cgi/PeptideAtlas/GetTransitions?protein_name_constraint=YBR002C%3BYAL003W;default_search=1;n_highest_intensity_fragment_ions=3;n_peptides_per_protein=3;apply_action_hidden=&action=QUERY;output_mode=xml;organism_name=Yeast' -O Yeast_transitions.xml 
<?xml version="1.0" standalone="yes"?>
<resultset identifier="unknown">
  <row identifier="0"
    Protein="YAL003W"
    Pre="K"
    Sequence="SYIEGTAVSQADVTVFK"
    Fol="A"
    Score="1.60"
    Src="IT"
    Q1_mz="907.96"
    Q1_chg="2"
    Q3_mz="994.52"
    Q3_chg="1"
    Label="y9"
    Rank="1"
    RI="10000"
    SSR="33.2"
  />
  <row identifier="1"
    Protein="YAL003W"
    Pre="K"
    Sequence="SYIEGTAVSQADVTVFK"
    Fol="A"
    Score="1.60"
    Src="IT"
    Q1_mz="907.96"
    Q1_chg="2"
    Q3_mz="1093.59"
    Q3_chg="1"
    Label="y10"
    Rank="2"
    RI="4129"
    SSR="33.2"
  />

The rest of the file is not shown due to space considerations.
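
Since each transition is a row element whose fields are XML attributes, the xml output can be read with a standard XML parser. The following Python sketch uses the two rows shown above; the closing resultset tag is an assumption, because the listing on this page is truncated, and the function name read_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two rows shown on this page, condensed; the closing </resultset>
# tag is assumed, since the original listing is truncated.
SAMPLE_XML = """<?xml version="1.0" standalone="yes"?>
<resultset identifier="unknown">
  <row identifier="0" Protein="YAL003W" Pre="K"
    Sequence="SYIEGTAVSQADVTVFK" Fol="A" Score="1.60" Src="IT"
    Q1_mz="907.96" Q1_chg="2" Q3_mz="994.52" Q3_chg="1"
    Label="y9" Rank="1" RI="10000" SSR="33.2"
  />
  <row identifier="1" Protein="YAL003W" Pre="K"
    Sequence="SYIEGTAVSQADVTVFK" Fol="A" Score="1.60" Src="IT"
    Q1_mz="907.96" Q1_chg="2" Q3_mz="1093.59" Q3_chg="1"
    Label="y10" Rank="2" RI="4129" SSR="33.2"
  />
</resultset>
"""


def read_rows(xml_text):
    """Return one dict of attribute name to string value per row element."""
    root = ET.fromstring(xml_text)
    return [dict(row.attrib) for row in root.iter("row")]


rows = read_rows(SAMPLE_XML)
for row in rows:
    print(row["Sequence"], row["Label"], row["Q1_mz"], row["Q3_mz"])
```

All attribute values arrive as strings, so numeric fields such as Q1_mz need an explicit float() conversion before use.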
  