TPP:Developer Documentation
From SPCTools
Revision as of 22:00, 25 August 2006 Jtasman (Talk | contribs) ← Previous diff |
Current revision Jwinget (Talk | contribs) (→PeptideProphetParser) |
||
Line 1: | Line 1: | ||
- | This page will contain a tutorial for the TPP geared towards a code developer. | + | ==License== |
- | + | All of the TPP is released under the LGPL license. Some other programs (converters) have different licenses. | |
- | It should contain the minimum necessary mass spec and proteomics info necessary to get someone new to the field started. | + | |
Line 9: | Line 8: | ||
- | ==version control== | + | ==Version control== |
- | repository: sourceforge, project "sashimi" | + | |
- | cvs access: | + | This project is hosted at sourceforge ([http://www.sourceforge.net]) under the "sashimi" project, using SVN. |
- | ==XML Parsing libraries== | ||
- | ===perl=== | + | ===SVN access=== |
- | * xml tree | + | |
- | ===c++=== | + | ====Anonymous access==== |
- | * A. Keller's "tag" system | + | Anyone can check out the code: ... |
- | * J. Tasman code | + | |
+ | ====Developer access==== | ||
+ | You'll need to be a project developer: register at SourceForge and contact one of the main Sashimi developers. | ||
- | ==Overview(from Readme)== | + | See [https://sourceforge.net/svn/?group_id=69281 our SourceForge SVN info page]. |
- | Trans-Proteomic Pipeline (TPP) | + | |
- | Andrew Keller, ISB | + | ==XML Parsing libraries== |
- | 08.11.04 | + | ===perl=== |
+ | * xml tree | ||
+ | ===c++=== | ||
+ | * A. Keller's "tag" system: | ||
+ | All Parser programs (children of Parser.cxx) parse pepXML and protXML with the constraint that all text be enclosed within a 'tag', as in the following example: | ||
+ | <pre><mytag name="akeller"><email address="akeller@systemsbiology.org"/></mytag></pre> | ||
- | ===System Requirements=== | + | NOT ACCEPTABLE are files with text outside of a bracket tag enclosure, such as 'akeller' in the following illegal example: |
+ | <pre><illegaltag>akeller</illegaltag></pre> | ||
- | ====Webserver==== | + | * J. Tasman code |
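The constraint above (no character data outside of a tag) can be checked before a file is handed to a Parser program. The following is only a minimal sketch using Python's standard library; the function name <code>text_outside_tags</code> is made up for this illustration and is not part of the TPP.

```python
import xml.etree.ElementTree as ET


def text_outside_tags(xml_string):
    """Return the tag names of elements that carry non-whitespace character data."""
    offenders = []
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        # elem.text is the character data directly inside the element
        if elem.text and elem.text.strip():
            offenders.append(elem.tag)
        # elem.tail is the character data that follows the element's end tag
        if elem is not root and elem.tail and elem.tail.strip():
            offenders.append(elem.tag)
    return offenders


legal = '<mytag name="akeller"><email address="akeller@systemsbiology.org"/></mytag>'
illegal = '<illegaltag>akeller</illegaltag>'

print(text_outside_tags(legal))    # nothing to report for the attribute-only form
print(text_outside_tags(illegal))  # reports the element holding bare text
```

An empty result means every value lives in an attribute, which is the form the "tag" system expects.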
- | Webserver with access to data directories. See configuration section below. IIS on windows is the most supported configuration. Apache support is in the works. | ||
+ | ==System Requirements== | ||
+ | ===Webserver=== | ||
+ | |||
+ | A webserver with access to the data directories is required; see the Webserver Configuration section below. Apache is the default configuration and is installed and configured on port 1441 automatically as part of the current Cygwin installation. '''IIS on Windows is no longer a supported configuration.''' | ||
===XSLT Processor=== | ===XSLT Processor=== | ||
Line 52: | Line 57: | ||
cases, bypass XSLT altogether. | cases, bypass XSLT altogether. | ||
- | ===Required libraries=== | + | ===Libraries=== |
- | Library requirements: expat-2.0.0 (copied to cvs), boost 1.32, xerces (version?) | + | *expat-2.0.0 (copied to svn tree) |
+ | *boost 1.32 | ||
+ | *xerces (version?) | ||
You will need to install the following C libraries. These libraries are very common on linux distributions. Make | You will need to install the following C libraries. These libraries are very common on linux distributions. Make | ||
sure that you do not already have them before trying to install. If you do need to install, first try to use the standard package system (e.g. RPMs for Fedora linux , cygwin installer for cygwin-- see below, etc.) If you cannot install them via the normal package system for your distribution, go to their website and download directly. | sure that you do not already have them before trying to install. If you do need to install, first try to use the standard package system (e.g. RPMs for Fedora linux , cygwin installer for cygwin-- see below, etc.) If you cannot install them via the normal package system for your distribution, go to their website and download directly. | ||
- | libgd www.boutell.com/gd | + | |
- | libpng www.libpng.org | + | *libgd www.boutell.com/gd |
- | zlib www.gzip.org/zlib | + | *libpng www.libpng.org |
+ | *zlib www.gzip.org/zlib | ||
Windows users should get this by using the Cygwin installer (www.cygwin.com). <b>Make sure to get the devel packages</b>, in order to have the required .h files. | Windows users should get this by using the Cygwin installer (www.cygwin.com). <b>Make sure to get the devel packages</b>, in order to have the required .h files. | ||
+ | ==Building from source== | ||
===Configuration=== | ===Configuration=== | ||
- | ====Building from source==== | + | On [[TPP:Documentation]] you can see other guides for building on specific systems. |
- | + | ||
- | <b>Linux only</b> Skip ahead to Compilation for a Cygwin build. Note that we distribute precompiled binaries in a complete custom cygwin installation. | + | |
+ | For a Cygwin build, skip ahead to Compilation; the following is for ''Linux only''. Note that we distribute precompiled binaries in a complete custom Cygwin installation. | ||
- | =====Edit src/Makefile.incl file===== | + | ====Edit the src/Makefile.incl file==== |
- | #Set the TPP_ROOT variable to the directory where you want to install the TPP, | + | Set the TPP_ROOT variable to the directory where you want to install the TPP. '''Include the trailing '/' when setting the path''', e.g.: |
- | include the trailing '/' when setting the path, e.g. | + | |
TPP_ROOT=/usr/local/tpp/ | TPP_ROOT=/usr/local/tpp/ | ||
- | #Set the TPP_WEB variable to the webserver root relative alias to the TPP, include the trailing '/' when setting the path, e.g. | + | Set the TPP_WEB variable to the webserver root relative alias to the TPP. '''Include the trailing '/' when setting the path''', e.g.: |
TPP_WEB=/tpp/ | TPP_WEB=/tpp/ | ||
- | WARNING: To avoid problems during the installation you MUST include the trailing '/' | + | |
- | when setting the above two paths | + | WARNING: To avoid problems during the installation '''you MUST include the trailing '/' when setting the above two paths'''. |
+ | |||
#Set the XSLT_PROC to the path of the xsltproc executable on your system | #Set the XSLT_PROC to the path of the xsltproc executable on your system | ||
XSLT_PROC=/usr/bin/xsltproc | XSLT_PROC=/usr/bin/xsltproc | ||
- | In the src directory: type 'make configure' (once again, DON'T DO THIS for Windows/Cygwin!!!) | + | In the src directory, type 'make configure' (once again, DON'T DO THIS for Windows/Cygwin!!!) |
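Because a missing trailing '/' on TPP_ROOT or TPP_WEB causes problems during installation, it may be worth checking the edited file before running 'make configure'. The sketch below assumes the settings are written as plain <code>NAME=value</code> lines, as in the examples above; the helper name <code>check_makefile_incl</code> is hypothetical and not part of the TPP build.

```python
import re

# Variables that must end with '/' according to the instructions above.
REQUIRED_SLASH = ("TPP_ROOT", "TPP_WEB")


def check_makefile_incl(text):
    """Return a list of problems found in Makefile.incl-style text."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.match(r"^([A-Za-z_]+)\s*=\s*(.*)$", line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    problems = []
    for name in REQUIRED_SLASH:
        value = settings.get(name)
        if value is None:
            problems.append(name + " is not set")
        elif not value.endswith("/"):
            problems.append(name + " must end with '/': " + value)
    return problems


good = "TPP_ROOT=/usr/local/tpp/\nTPP_WEB=/tpp/\nXSLT_PROC=/usr/bin/xsltproc\n"
bad = "TPP_ROOT=/usr/local/tpp\nTPP_WEB=/tpp/\n"

print(check_makefile_incl(good))
print(check_makefile_incl(bad))
```

To check a real file, pass the contents of src/Makefile.incl to the function.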
- | ===Compilation from source=== | + | ===Compilation=== |
- | ====Linux==== | + | ====Linux==== |
In the src directory: type 'make all' to compile binaries | In the src directory: type 'make all' to compile binaries | ||
Line 100: | Line 108: | ||
The TPP will be installed in the following directory structure by default: | The TPP will be installed in the following directory structure by default: | ||
- | /usr/local/tpp TPP root directory | + | ;/usr/local/tpp : TPP root directory |
- | /usr/local/tpp/cgi-bin CGI-BIN for tpp, contains all web served executables | + | ;/usr/local/tpp/cgi-bin : CGI-BIN for tpp, contains all web served executables |
- | /usr/local/tpp/bin binary directory | + | ;/usr/local/tpp/bin : binary directory |
- | /usr/local/tpp/html contains all non-executable web served objects | + | ;/usr/local/tpp/html : contains all non-executable web served objects |
- | /usr/local/tpp/etc contains miscellaneous configuration files | + | ;/usr/local/tpp/etc : contains miscellaneous configuration files |
- | /usr/local/tpp/schema contains all XML schema files | + | ;/usr/local/tpp/schema: contains all XML schema files |
====Windows/Cygwin==== | ====Windows/Cygwin==== | ||
Line 113: | Line 121: | ||
- | ===Webserver Configuration=== | + | ==Webserver Configuration== |
Ideally, all data directories should be cross-mounted under the webserver root. The webserver should have SSI (server side includes) turned on; if it is not already, activate it as described below. | Ideally, all data directories should be cross-mounted under the webserver root. The webserver should have SSI (server side includes) turned on; if it is not already, activate it as described below. | ||
- | ====Activating SSI on Linux==== | + | ===Activating SSI on Linux=== |
Modify the /etc/httpd.conf file: | Modify the /etc/httpd.conf file: | ||
Line 128: | Line 136: | ||
- | ====Webserver Root==== | + | ===Webserver Root=== |
The environment variable WEBSERVER_ROOT must be set for the program user(s) as well as the webserver. The WEBSERVER_ROOT | The environment variable WEBSERVER_ROOT must be set for the program user(s) as well as the webserver. The WEBSERVER_ROOT | ||
- | should point to the webserver's document root directory (e.g. /home/httpd/html on linux, or ??? on windows ). | + | should point to the webserver's document root directory (e.g. /home/httpd/html on linux/apache, or ??? on windows/IIS ). |
- | + | ===Apache specific webserver configuration=== | |
- | + | ||
- | ====Apache specific webserver configuration==== | + | |
Configure the webserver: | Configure the webserver: | ||
Line 173: | Line 179: | ||
</Directory> | </Directory> | ||
- | ====Windows/Cygwin IIS-specific configuration==== | + | ===Windows/Cygwin Apache-specific configuration=== |
Configure as described in "Post-Install Configuration" on the following page: | Configure as described in "Post-Install Configuration" on the following page: | ||
[http://tools.proteomecenter.org/configure.html] | [http://tools.proteomecenter.org/configure.html] | ||
- | ===Running the TPP=== | + | Apache is installed and configured automatically as part of the Cygwin install. |
+ | |||
+ | ==XML File Validation== | ||
+ | The SAX Validator will use the schema location indicated in the XML file to validate: | ||
+ | SAX2Count -v=always myfile.xml | ||
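When many files need validating, the command above can be wrapped in a small script. This is a sketch only: it assumes the Xerces <code>SAX2Count</code> sample program is on the PATH, and it returns the exit code without interpreting it. The function names are made up for this illustration.

```python
import shutil
import subprocess


def sax2count_command(xml_path):
    """Build the validation command shown above for one file."""
    return ["SAX2Count", "-v=always", xml_path]


def validate(xml_path):
    """Run SAX2Count if it is installed; return its exit code, or None if absent."""
    if shutil.which("SAX2Count") is None:
        return None
    return subprocess.run(sax2count_command(xml_path)).returncode


print(" ".join(sax2count_command("myfile.xml")))
```

Any validation messages are printed by SAX2Count itself, so the wrapper only needs to loop over the files of interest.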
+ | |||
+ | |||
+ | ==Running the TPP== | ||
+ | |||
+ | ===Overview=== | ||
Specific instructions are geared towards command-line usage. Note that the web-based GUI is a very convenient way to do these steps. | Specific instructions are geared towards command-line usage. Note that the web-based GUI is a very convenient way to do these steps. | ||
Line 186: | Line 201: | ||
====Assigning peptide sequences to spectra (using a search engine)==== | ====Assigning peptide sequences to spectra (using a search engine)==== | ||
- | ====Converting search results to pepXML format==== | + | ====Converting search engine results to pepXML format==== |
- | You need to start with a converter to write out search results as 'summary.xml' in pepXML format. | + | Analysis programs begin with a pepXML-format file called 'summary.xml', containing the peptides identified by the search engine. |
+ | |||
*For SEQUEST results, you must specify the sequest.params file used for the search: | *For SEQUEST results, you must specify the sequest.params file used for the search: | ||
Line 231: | Line 247: | ||
Questions? Search the newsgroup first, then post questions. | Questions? Search the newsgroup first, then post questions. | ||
- | ===XML File Validation=== | + | ===Converters to mzXML=== |
- | The SAX Validator will use the schema location indicated in the XML file to validate: | + | |
- | SAX2Count -v=always myfile.xml | + | |
+ | ===Converters to pepXML=== | ||
- | Program Usage: | + | All converted pepXML files reference a standard stylesheet, pepXML_std.xsl, so that the XML file can be viewed directly in a browser. |
- | I. Converters to pepXML | + | ====Sequest2XML==== |
- | Sequest2XML summary.html (-P/full/path/mysequest.params) (-M) (-m) (-a) (-pI) (-Eenzyme) | + | '''retired in favor of Out2XML''' |
- | Converts summary.html to summary.xml in pepXML format. Uses sequest.params file in current directory, unless specified as second argument | + | |
- | -M: MALDI mode: do not include spectrum spot number in mzXML file name | + | |
- | -m: monoisotopic masses (regardless of sequest.params setting) | + | |
- | -a: average masses (regardless of sequest.params setting) | + | |
- | -pI: compute peptide pI values | + | |
- | -Eenzyme: set sample enzyme (default is trypsin, possible values are: nonspecific, chymotrypsin, elastase, gluc, gluc_bicarb, aspn, tca, cnbr, trypsin/cnbr, clostripain, iodosobenzoate, protein_endopeptidase, staph_protease, trypsin_k, trypsin_r) | + | |
- | + | ||
- | Mascot2XML summary.dat -D/full/path/mydatabase.fasta (-pI) (-Eenzyme) | + | |
- | Converts summary.dat to summary.xml in pepXML format. | + | |
- | See Sequest2XML for option definitions | + | |
- | Comet2XML summary.cmt.tar.gz (-Eenzyme) | + | |
- | Converts summary.cmt.tar.gz to summary.xml in pepXML format. | + | |
- | See Sequest2XML for option definitions | + | |
- | All converted pepXML files reference a standard stylesheet pepXML_std.xsl to enable a view of the xml file directly in a browser. | + | Sequest2XML summary.html (-P/full/path/mysequest.params) (-M) (-m) (-a) (-pI) (-Eenzyme) |
+ | Converts summary.html to summary.xml in pepXML format. Uses sequest.params file in current directory, unless specified as second argument | ||
+ | ;OPTIONS | ||
+ | ;-M : MALDI mode: do not include spectrum spot number in mzXML file name | ||
+ | ;-m : monoisotopic masses (regardless of sequest.params setting) | ||
+ | ;-a : average masses (regardless of sequest.params setting) | ||
+ | ;-pI : compute peptide pI values | ||
+ | ;-Eenzyme : set sample enzyme (default is trypsin, possible values are: nonspecific, chymotrypsin, elastase, gluc, gluc_bicarb, aspn, tca, cnbr, trypsin/cnbr, clostripain, iodosobenzoate, protein_endopeptidase, staph_protease, trypsin_k, trypsin_r) | ||
- | II. Peptide-level analyses | + | ====Mascot2XML==== |
- | InteractParser interact.xml file1.xml file2.xml file3.xml ..... | + | Mascot2XML summary.dat -D/full/path/mydatabase.fasta (-pI) (-Eenzyme) |
- | Merges together pepXML files file1.xml, file2.xml, file3.xml .... into interact.xml. Combines all analysis_summary elements, and reindexes spectrum_query elements. Makes a system call to pepxml2html.pl (pepxml2html.pl -file interact.xml) to create stylesheet for viewing interact.xml and interact.shtml in a browser. | + | Converts summary.dat to summary.xml in pepXML format. |
+ | ;OPTIONS | ||
+ | ;See Sequest2XML for option definitions | ||
- | DatabaseParser interact.xml | + | ====Comet2XML==== |
- | Prints the database(s) referenced in pepXML document | + | Comet2XML summary.cmt.tar.gz (-Eenzyme) |
+ | ;OPTIONS | ||
+ | ;See Sequest2XML for option definitions | ||
- | RefreshParser interact.xml /full/path/database.fasta | + | ===Peptide-level analyses=== |
- | Goes into database to find all proteins corresponding to identified peptides and overwrites results to interact.xml | + | |
- | EnzymeDigestionParser interact.xml (-Eenzyme) | + | ====InteractParser==== |
- | Computes number of tolerable termini and number of missed cleavages in dataset using sample enzyme stored in interact.xml unless specified as argument. | + | InteractParser interact.xml file1.xml file2.xml file3.xml ..... |
+ | Merges the pepXML files file1.xml, file2.xml, file3.xml .... into interact.xml. Combines all analysis_summary elements, and reindexes spectrum_query elements. Makes a system call to pepxml2html.pl (pepxml2html.pl -file interact.xml) to create a stylesheet for viewing interact.xml and interact.shtml in a browser. | ||
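The reindexing step can be pictured with a toy example. The sketch below is not the InteractParser implementation: it works on heavily simplified, namespace-free documents, leaves out the combining of analysis_summary elements, and assumes that renumbering starts at 1.

```python
import xml.etree.ElementTree as ET


def merge_and_reindex(documents):
    """Merge toy pepXML-like strings and renumber spectrum_query elements."""
    merged = ET.Element("msms_pipeline_analysis")
    for doc in documents:
        root = ET.fromstring(doc)
        # Move every top-level child of each input under the merged root.
        for child in list(root):
            merged.append(child)
    # Renumber spectrum_query elements consecutively across all inputs.
    for new_index, query in enumerate(merged.iter("spectrum_query"), start=1):
        query.set("index", str(new_index))
    return merged


file1 = ('<msms_pipeline_analysis><msms_run_summary>'
         '<spectrum_query spectrum="a.0001" index="1"/>'
         '<spectrum_query spectrum="a.0002" index="2"/>'
         '</msms_run_summary></msms_pipeline_analysis>')
file2 = ('<msms_pipeline_analysis><msms_run_summary>'
         '<spectrum_query spectrum="b.0001" index="1"/>'
         '</msms_run_summary></msms_pipeline_analysis>')

merged = merge_and_reindex([file1, file2])
print([q.get("index") for q in merged.iter("spectrum_query")])
```

Without the renumbering, both inputs would contribute a query with index 1, which is why the merged file needs fresh indexes.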
- | PeptideProphetParser interact.xml (EXCLUDE) (LEAVE) (ICAT) (NOICAT) (ZERO) (GLYC) (MALDI) (MINPROB=xx) | + | ====DatabaseParser==== |
- | Runs PeptideProphet with options and overwrites results to interact.xml : | + | DatabaseParser interact.xml |
- | EXCLUDE: exclude delta stars (SEQUEST) | + | Prints the database(s) referenced in pepXML document |
- | LEAVE: leave delta star values alone (SEQUEST) | + | |
- | ICAT: use peptide icat info in probability calculation | + | |
- | NOICAT: do not use peptide icat info in probability calculation | + | |
- | ZERO: do not discard any data | + | |
- | GLYC: use peptide NXS/T motif info in probability calculation | + | |
- | MALDI: specify maldi spectra | + | |
- | PI: use pI information | + | |
- | ACCMASS: use accurate mass binning | + | |
- | MINPROB=xx: filter away results with a probability less than xx | + | |
- | EXTRAITRS=xx: specify additional EM iterations | + | |
- | XPressPeptideParser interact.xml (-b) (-n<str>,<num>) (-n<str>,<num>) (-n<str>,<num>) (-L or -H) | + | ====RefreshParser==== |
- | Runs XPRESS with options and overwrites results to interact.xml | + | RefreshParser interact.xml /full/path/database.fasta |
- | Options: | + | Goes into database to find all proteins corresponding to identified peptides and overwrites results to interact.xml |
- | -m<num> change XPRESS mass tolerance (default=1.0) | + | |
- | -l<str> change labeled residues (default='C') | + | |
- | -r<num> change XPRESS residue mass difference (default=9.0) | + | |
- | -n<str>,<num> when specifying multiple isotopic labels, use | + | |
- | this option e.g. -nK,3.0 -nL,3.0 | + | |
- | -r<num> change XPRESS residue mass difference (default=9.0) | + | |
- | -b heavy labeled peptide elutes before light labeled partner | + | |
- | -L for ratio, set/fix light to 1, vary heavy | + | |
- | -H for ratio, set/fix heavy to 1, vary light | + | |
- | ASAPRatioPeptideParser interact.xml (-b) (-l<str>) (-S) (-m<str>) (-F) (-C) | + | ====EnzymeDigestionParser==== |
- | Runs ASAPRatio with options and overwrites results to interact.xml | + | EnzymeDigestionParser interact.xml (-Eenzyme) |
- | Options: | + | Computes number of tolerable termini and number of missed cleavages in dataset using sample enzyme stored in interact.xml unless specified as argument. |
- | -l<str> change labeled residues (default='C') | + | |
- | -b heavy labeled peptide elutes before light labeled partner | + | |
- | -f<num> areaFlag set to num (ratio display option) | + | |
- | -S static modification quantification (i.e. each run is either all light or all heavy) | + | |
- | -F use fixed scan range for light and heavy | + | |
- | -C quantitate only the charge state where the CID was made | + | |
- | -m<str> specified label masses (e.g. M74.325Y125.864), only relevant for static modification quantification | + | |
- | LibraPeptideParser interact.xml -clibra_condition.xml | + | ====PeptideProphetParser==== |
- | Runs LIBRA using channel information specified in libra_condition.xml file and overwrites results to interact.xml | + | PeptideProphetParser interact.xml (EXCLUDE) (LEAVE) (ICAT) (NOICAT) (ZERO) (GLYC) (MALDI) (MINPROB=xx) |
+ | Runs PeptideProphet with options and ''overwrites results to interact.xml'' | ||
+ | ;OPTIONS | ||
+ | ;EXCLUDE: exclude delta stars (SEQUEST) | ||
+ | ;LEAVE: leave delta star values alone (SEQUEST) | ||
+ | ;ICAT: use peptide icat info in probability calculation | ||
+ | ;NOICAT: do not use peptide icat info in probability calculation | ||
+ | ;ZERO: do not discard any data | ||
+ | ;GLYC: use peptide NXS/T motif info in probability calculation | ||
+ | ;MALDI: specify maldi spectra | ||
+ | ;PI: use pI information | ||
+ | ;ACCMASS: use accurate mass binning | ||
+ | ;MINPROB=xx: filter away results with a probability less than xx | ||
+ | ;EXTRAITRS=xx: specify additional EM iterations | ||
+ | ;NONTT: Do not use NTT information | ||
+ | ;CLEVEL=xx: Specify Conservative Level | ||
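For scripted runs it can help to assemble the PeptideProphetParser command line from the option names listed above and to reject anything that is not on the list. This is a sketch under that assumption; the helper <code>peptide_prophet_command</code> is hypothetical and only builds the argument list, it does not run the program.

```python
# Option names taken from the list above.
FLAGS = {"EXCLUDE", "LEAVE", "ICAT", "NOICAT", "ZERO", "GLYC",
         "MALDI", "PI", "ACCMASS", "NONTT"}
VALUED = {"MINPROB", "EXTRAITRS", "CLEVEL"}


def peptide_prophet_command(pepxml, flags=(), **valued):
    """Build a PeptideProphetParser argument list from documented options."""
    args = ["PeptideProphetParser", pepxml]
    for flag in flags:
        if flag not in FLAGS:
            raise ValueError("unknown option: " + flag)
        args.append(flag)
    for name, value in valued.items():
        if name not in VALUED:
            raise ValueError("unknown option: " + name)
        args.append("{}={}".format(name, value))
    return args


print(" ".join(peptide_prophet_command("interact.xml", ["ICAT", "ZERO"], MINPROB=0.05)))
```

Keep in mind that the program overwrites interact.xml, so a script may want to copy the file first.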
- | CompactParser file.xml | + | ====XPressPeptideParser==== |
- | Compacts either pepXML or protXML, combining together start and end tags when no elements are contained between them | + | XPressPeptideParser interact.xml (-b) (-n<str>,<num>) (-n<str>,<num>) (-n<str>,<num>) (-L or -H) |
+ | Runs XPRESS with options and overwrites results to interact.xml | ||
+ | ;OPTIONS | ||
+ | ;-m<num> : change XPRESS mass tolerance (default=1.0) | ||
+ | ;-l<str> : change labeled residues (default='C') | ||
+ | ;-r<num> : change XPRESS residue mass difference (default=9.0) | ||
+ | ;-n<str>,<num> : when specifying multiple isotopic labels, use this option e.g. -nK,3.0 -nL,3.0 | ||
+ | ;-r<num> : change XPRESS residue mass difference (default=9.0) | ||
+ | |||
+ | ;-b : heavy labeled peptide elutes before light labeled partner | ||
+ | |||
+ | ;-L : for ratio, set/fix light to 1, vary heavy | ||
+ | |||
+ | ;-H : for ratio, set/fix heavy to 1, vary light | ||
- | III. Protein-level analyses | + | ====ASAPRatioPeptideParser==== |
- | ProteinProphet.pl '<interact pep prob html file1><interact pep prob html file2>....' <outfile> (ICAT) (GLYC) (XPRESS) (ASAP_PROPHET) (ACCURACY) (ASAP) (REFRESH) (DELUDE) (NOOCCAM) | + | ASAPRatioPeptideParser interact.xml (-b) (-l<str>) (-S) (-m<str>) (-F) (-C) |
- | NOOCCAM: non-conservative maximum protein list | + | Runs ASAPRatio with options and overwrites results to interact.xml |
- | ICAT: highlight peptide cysteines | + | ;OPTIONS |
- | GLYC: highlight peptide N-glycosylation motif | + | ;-l<str> : change labeled residues (default='C') |
- | ACCURACY: min pep prob 0 | + | ;-b : heavy labeled peptide elutes before light labeled partner |
- | ASAP: compute ASAP ratios for protein entries | + | ;-f<num> : areaFlag set to num (ratio display option) |
- | (ASAP must have been run previously on interact dataset) | + | ;-S : static modification quantification (i.e. each run is either all light or all heavy) |
- | REFRESH: import manual changes to ASAP ratios | + | ;-F : use fixed scan range for light and heavy |
- | (after initially using ASAP option) | + | ;-C : quantitate only the charge state where the CID was made |
- | ASAP_PROPHET: *New and Improved* compute ASAP ratios for protein entries | + | ;-m<str> : specified label masses (e.g. M74.325Y125.864), only relevant for static modification quantification |
- | (ASAP must have been run previously on all input interact datasets with mz/XML raw data format) | + | |
- | DELUDE: do NOT use peptide degeneracy information when assessing proteins | + | |
- | HTML: write output to static html page (rather than dynamic shtml) | + | |
- | Other options in conjunction with HTML: | + | |
- | EXCELPEPS: write output tab delim xls file including all peptides | + | |
- | EXCELxx: write output tab delim xls file including all protein (group)s | + | |
- | with minimum probability xx, where xx is a number between 0 and 1 | + | |
- | XPressProteinRatioParser interact-prot.xml | + | ====LibraPeptideParser==== |
- | Runs XPRESS and overwrites results to interact-prot.xml | + | LibraPeptideParser interact.xml -clibra_condition.xml |
+ | Runs LIBRA using channel information specified in libra_condition.xml file and overwrites results to interact.xml | ||
+ | |||
+ | ====CompactParser==== | ||
+ | CompactParser file.xml | ||
+ | Compacts either pepXML or protXML, combining together start and end tags when no elements are contained between them | ||
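The effect of compaction can be illustrated with Python's standard library, which writes childless, text-free elements in the combined form when serializing. This only demonstrates the idea on a small string and is not how CompactParser itself is implemented.

```python
import xml.etree.ElementTree as ET


def compact(xml_string):
    """Re-serialize XML so that empty elements use a single combined tag."""
    root = ET.fromstring(xml_string)
    return ET.tostring(root, encoding="unicode", short_empty_elements=True)


verbose = ('<mytag name="akeller">'
           '<email address="akeller@systemsbiology.org"></email>'
           '</mytag>')

print(compact(verbose))
```

The <code>email</code> element has nothing between its start and end tags, so it is written as one tag; <code>mytag</code> still contains a child and keeps both of its tags.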
+ | |||
+ | ===Protein-level analyses=== | ||
+ | |||
+ | ====ProteinProphet.pl==== | ||
+ | ProteinProphet.pl '<interact pep prob html file1><interact pep prob html file2>....' <outfile> (ICAT) (GLYC) (XPRESS) (ASAP_PROPHET) (ACCURACY) (ASAP) (REFRESH) (DELUDE) (NOOCCAM) | ||
+ | |||
+ | ;OPTIONS | ||
+ | ;NOOCCAM: non-conservative maximum protein list | ||
+ | ;ICAT: highlight peptide cysteines | ||
+ | ;GLYC: highlight peptide N-glycosylation motif | ||
+ | ;ACCURACY: min pep prob 0 | ||
+ | ;ASAP: compute ASAP ratios for protein entries (ASAP must have been run previously on interact dataset) | ||
+ | ;REFRESH: import manual changes to ASAP ratios (after initially using ASAP option) | ||
+ | ;ASAP_PROPHET: *New and Improved* compute ASAP ratios for protein entries (ASAP must have been run previously on all input interact datasets with mz/XML raw data format) | ||
+ | ;DELUDE: do NOT use peptide degeneracy information when assessing proteins | ||
+ | ;HTML: write output to static html page (rather than dynamic shtml) | ||
+ | ;Other options in conjunction with HTML | ||
+ | ;EXCELPEPS: write output tab delim xls file including all peptides | ||
+ | ;EXCELxx: write output tab delim xls file including all protein (group)s with minimum probability xx, where xx is a number between 0 and 1 | ||
+ | |||
+ | ====XPressProteinRatioParser==== | ||
+ | XPressProteinRatioParser interact-prot.xml | ||
+ | Runs XPRESS and overwrites results to interact-prot.xml | ||
- | ASAPRatioProteinRatioParser interact-prot.xml | + | ====ASAPRatioProteinRatioParser==== |
- | Runs ASAPRatio and overwrites results to interact-prot.xml | + | ASAPRatioProteinRatioParser interact-prot.xml |
+ | Runs ASAPRatio and overwrites results to interact-prot.xml | ||
- | LibraProteinRatioParser interact-prot.xml <normalization_channel> | + | ====LibraProteinRatioParser==== |
- | Runs LIBRA using normalizing ratios to normalization_channel and overwrites results to interact-prot.xml | + | LibraProteinRatioParser interact-prot.xml <normalization_channel> |
+ | Runs LIBRA using normalizing ratios to normalization_channel and overwrites results to interact-prot.xml | ||
+ | |||
+ | ===Wrappers=== | ||
+ | |||
+ | ====xinteract==== | ||
+ | xinteract (generaloptions) (-Oprophetoptions) (-Xxpressoptions) (-Aasapoptions) (-L<conditionfile>libraoptions) xmlfile1 xmlfile2 .... | ||
+ | |||
+ | =====options===== | ||
+ | |||
+ | ;generaloptions | ||
+ | ;-Nmyfile.xml : write output to file 'myfile.xml' | ||
+ | ;-nI : do not run Interact (convert to pepXML only) | ||
+ | ;-nP : do not run PeptideProphet | ||
+ | ;-nR : do not get all proteins corresponding to degenerate peptides from database | ||
+ | ;-p0 : do not discard search results with PeptideProphet probabilities below 0.05 | ||
+ | ;-x<num> : number of extra PeptideProphet iterations; default <num>=0 | ||
+ | ;-p<num> : filter results below PeptideProphet probability <num>; default <num>=0.05 | ||
+ | ;-mw : calculate protein molecular weights | ||
+ | ;-MONO : calculate monoisotopic peptide masses during conversion to pepXML | ||
+ | ;-AVE : calculate average peptide masses during conversion to pepXML | ||
+ | ;-eX : specify sample enzyme other than trypsin | ||
+ | :;-eC : specify sample enzyme = Chymotrypsin | ||
+ | :;-eA : specify sample enzyme = AspN | ||
+ | :;-eG : specify sample enzyme = GluC | ||
+ | :;-eB : specify sample enzyme = GluC Bicarb | ||
+ | :;-eM : specify sample enzyme = CNBr | ||
+ | :;-eD : specify sample enzyme = Trypsin/CNBr | ||
+ | :;-e3 : specify sample enzyme = Chymotrypsin/AspN/Trypsin | ||
+ | :;-eE : specify sample enzyme = Elastase | ||
+ | :;-eL : specify sample enzyme = LysN (cuts before K) | ||
+ | :;-eP : specify sample enzyme = LysN Promisc (cuts before KASR) | ||
+ | :;-eN : specify sample enzyme = Nonspecific or None | ||
- | IV. Wrappers | + | For developers: |
+ | ;-t : run regression test against a previously derived result | ||
+ | ;-t! : learn results for regression test | ||
- | xinteract: | + | ---- |
- | usage: xinteract (generaloptions) (-Oprophetoptions) (-Xxpressoptions) (-Aasapoptions) (-L<conditionfile>libraoptions) xmlfile1 xmlfile2 .... | + | ;prophetoptions (following the 'O') |
+ | ;i : use icat information in PeptideProphet | ||
+ | ;f : do not use icat information in PeptideProphet | ||
+ | ;g : use N-glyc motif information in PeptideProphet | ||
+ | ;m : maldi data | ||
+ | ;I : use pI information in PeptideProphet | ||
+ | ;A : use accurate mass binning in PeptideProphet | ||
+ | ;w : warning instead of exit with error if instrument types between runs is different | ||
+ | ;x : exclude all entries with asterisked score values in PeptideProphet | ||
+ | ;l : leave alone all entries with asterisked score values in PeptideProphet | ||
- | generaloptions: | + | ;p : run ProteinProphet afterwards |
- | -Nmyfile.xml [write output to file 'myfile.xml'] | + | ;u : do not assemble protein groups in ProteinProphet analysis |
- | -nI [do not run Interact (convert to pepXML only)] | + | ;s : do not use Occam's Razor in ProteinProphet analysis to derive the simplest protein list to explain observed peptides |
- | -nP [do not run PeptideProphet] | + | |
- | -nR [do not run get all proteins corresponding to degenerate peptides from database] | + | |
- | -p0 [do not discard search results with PeptideProphet probabilities below 0.05] | + | |
- | -x<num> [number of extra PeptideProphet iterations; default <num>=0] | + | |
- | -p<num> [filter results below PeptideProphet probability <num>; default <num>=0.05] | + | |
- | -mw [calculate protein molecular weights] | + | |
- | -MONO [calculate monoisotopic peptide masses during conversion to pepXML] | + | |
- | -AVE [calculate average peptide masses during conversion to pepXML] | + | |
- | -eX [specify sample enzyme other than trypsin] | + | |
- | -eC [specify sample enzyme = Chymotrypsin] | + | |
- | -eA [specify sample enzyme = AspN] | + | |
- | -eG [specify sample enzyme = GluC] | + | |
- | -eB [specify sample enzyme = GluC Bicarb] | + | |
- | -eM [specify sample enzyme = CNBr] | + | |
- | -eD [specify sample enzyme = Trypsin/CNBr] | + | |
- | -e3 [specify sample enzyme = Chymotrypsin/AspN/Trypsin] | + | |
- | -eE [specify sample enzyme = Elastase] | + | |
- | -eL [specify sample enzyme = LysN (cuts before K)] | + | |
- | -eP [specify sample enzyme = LysN Promisc (cuts before KASR)] | + | |
- | -eN [specify sample enzyme = Nonspecific or None] | + | |
- | For developers: | + | ---- |
- | -t [run regression test against a previously derived result] | + | ;xpressoptions (will run XPRESS analysis with any specified options that follow the 'X') |
- | -t! [learn results for regression test] | + | ;-m<num> : change XPRESS mass tolerance (default=1.0) |
+ | ;-l<str> : change labeled residues (default='C') | ||
+ | ;-n<str>,<num> : change XPRESS residue mass difference for <str> to <num> (default=9.0) | ||
+ | ;-b : heavy labeled peptide elutes before light labeled partner | ||
+ | ;-L : for ratio, set/fix light to 1, vary heavy | ||
+ | ;-H : for ratio, set/fix heavy to 1, vary light | ||
- | prophetoptions [following the 'O']: | + | ---- |
- | i [use icat information in PeptideProphet] | + | ;asapoptions (will run ASAPRatio analysis with any specified options that follow the 'A') |
- | f [do not use icat information in PeptideProphet] | + | ;-l<str> : change labeled residues (default='C') |
- | g [use N-glyc motif information in PeptideProphet] | + | ;-b : heavy labeled peptide elutes before light labeled partner |
- | m [maldi data] | + | ;-f<num> : areaFlag set to num (ratio display option) |
- | I [use pI information in PeptideProphet] | + | ;-S : static modification quantification (i.e. each run is either all light or all heavy) |
- | A [use accurate mass binning in PeptideProphet] | + | ;-F : use fixed scan range for light and heavy |
- | w [warning instead of exit with error if instrument types between runs is different] | + | ;-C : quantitate only the charge state where the CID was made |
- | x [exclude all entries with asterisked score values in PeptideProphet] | + | ;-m<str> : specified label masses (e.g. M74.325Y125.864), only relevant for static modification quantification |
- | l [leave alone all entries with asterisked score values in PeptideProphet] | + | |
- | p [run ProteinProphet afterwards] | + | ---- |
- | u [do not assemble protein groups in ProteinProphet analysis] | + | ;libraoptions (will run Libra Quantitation analysis with any specified options that follow the 'L') |
- | s [do not use Occam's Razor in ProteinProphet analysis to | + | ;-<num> : normalization channel (for protein level quantitation) |
- | derive the simplest protein list to explain observed peptides] | + | |
- | xpressoptions [will run XPRESS analysis with any specified options that follow the 'X']: | + | ---- |
- | -m<num> change XPRESS mass tolerance (default=1.0) | + | |
- | -l<str> change labeled residues (default='C') | + | |
- | -n<str>,<num> change XPRESS residue mass difference for <str> to <num> (default=9.0) | + | |
- | -b heavy labeled peptide elutes before light labeled partner | + | |
- | -L for ratio, set/fix light to 1, vary heavy | + | |
- | -H for ratio, set/fix heavy to 1, vary light | + | |
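The option groups are attached directly to their letter: '-N' takes the output file name, and '-O', '-X' and '-A' are each followed immediately by their own options. The sketch below assembles such a command line; the helper <code>xinteract_command</code> and its parameter names are made up for this illustration, and it only builds the argument list.

```python
def xinteract_command(files, output=None, prophet=None, xpress=None, asap=None):
    """Assemble an xinteract argument list.

    prophet, xpress and asap are option strings appended directly to
    -O, -X and -A; pass "" to run XPRESS or ASAPRatio in its default mode.
    """
    args = ["xinteract"]
    if output is not None:
        args.append("-N" + output)
    if xpress is not None:
        args.append("-X" + xpress)
    if asap is not None:
        args.append("-A" + asap)
    if prophet:
        args.append("-O" + prophet)
    args.extend(files)
    return args


# Two invocations taken from this page's examples.
print(" ".join(xinteract_command(["*.xml"], output="data.xml", xpress="", prophet="p")))
print(" ".join(xinteract_command(["file1.xml", "file2.xml"], xpress="-nC,6.0", asap="")))
```

If the list is passed to a process launcher without a shell, a pattern such as *.xml is not expanded, so expand it first (for example with Python's <code>glob</code> module).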
- | asapoptions [will run ASAPRatio analysis with any specified options that follow the 'A']: | + | =====examples===== |
- | -l<str> change labeled residues (default='C') | + | ;xinteract *.xml |
- | -b heavy labeled peptide elutes before light labeled partner | + | :combines together data in all pepXML files into 'interact.xml', then runs PeptideProphet] |
- | -f<num> areaFlag set to num (ratio display option) | + | |
- | -S static modification quantification (i.e. each run is either | + | |
- | all light or all heavy) | + | |
- | -F use fixed scan range for light and heavy | + | |
- | -C quantitate only the charge state where the CID was made | + | |
- | -m<str> specified label masses (e.g. M74.325Y125.864), only relevant for | + | |
- | static modification quantification | + | |
- | libraoptions [will run Libra Quantitation analysis with any specified options that follow the 'L']: | + | ;xinteract -Ndata.xml *.xml |
- | -<num> normalization channel (for protein level quantitation) | + | :same as above, but results are written to 'data.xml' |
- | examples: | + | ;xinteract -Ndata.xml -X -Op *.xml |
- | xinteract *.xml [combines together data in all pepXML files into 'interact.xml', then runs PeptideProphet] | + | :same as above, but run XPRESS analysis in its default mode, then ProteinProphet |
- | xinteract -Ndata.xml *.xml [same as above, but results are written to 'data.xml'] | + | |
- | xinteract -Ndata.xml -X -Op *.xml [same as above, but run XPRESS analysis in its default mode, then | + | |
- | ProteinProphet] | + | |
- | xinteract -X -A file1.xml file2.xml [combines together data in file1.xml and file2.xml into 'interact.xml' | + | |
- | and then runs XPRESS (in its default mode) and ASAPRatio (in its default mode)] | + | |
- | xinteract -X-nC,6.0 -A file1.xml file2.xml [same as above, but specifies that cysteine label has a heavy/light | + | |
- | mass difference of 6.0] | + | |
- | xinteract -X -A-lDE-S file1.xml file2.xml [sampe as above, but specifies for ASAP to run in static mode | + | |
- | with labeled residues D and E] | + | |
- | xinteract -Lmyconditionfile.xml-1 -Op file1.xml file2.xml [run libra quantitiation after PeptideProphet using myconditionfile.xml, and after ProteinProphet normalizing ratios to channel 1 values | + | |
+ | ;xinteract -X -A file1.xml file2.xml | ||
+ | :combines together data in file1.xml and file2.xml into 'interact.xml' and then runs XPRESS (in its default mode) and ASAPRatio (in its default mode) | ||
- | runpropet: | + | ;xinteract -X-nC,6.0 -A file1.xml file2.xml |
- | Runs ProteinProphet on designated 'interact.xml' file(s) | + | :same as above, but specifies that cysteine label has a heavy/light mass difference of 6.0 |
- | How to use runprophet: | + | ;xinteract -X -A-lDE-S file1.xml file2.xml |
+ | :same as above, but specifies for ASAP to run in static mode with labeled residues D and E | ||
- | usage 1: specify input file and options | + | ;xinteract -Lmyconditionfile.xml-1 -Op file1.xml file2.xml |
+ | :run libra quantitiation after PeptideProphet using myconditionfile.xml, and after ProteinProphet normalizing ratios to channel 1 values | ||
- | runprophet -Ooptions <interact file (with probs)> | + | ====runpropet==== |
+ | Runs ProteinProphet on designated 'interact.xml' file(s), where interact.xml is a pepXML-formated output file from PeptideProphet. | ||
- | runs analysis on inputfile with specified options, | + | =====usage 1: specify input file and options===== |
- | writes analysis to inputfile-prot.htm | + | runprophet -Ooptions <interact file (with probs)> |
- | options: | + | |
- | i: icat data (color Cysteines) | + | |
- | g: N-glycosylation data (color NXS/T) | + | |
- | m: multifiles (more than one interact.xml file, must specify outfile | + | |
- | d: delude (do not look up ALL prots | + | |
- | corresponding to degenerate peps) | + | |
- | l: use html input files (pre-TPP) | + | |
- | X: import XPRESS protein ratios | + | |
- | A: import ASAPRatio protein ratios and pvalues | + | |
- | L<num>: import Libra protein ratios normalized to channel <num> | + | |
- | a: import ASAPRatio results present in file | + | |
- | (starting from scratch) | + | |
- | r: update changes made to ASAPRatio results | + | |
- | (previously run using 'a' option) | + | |
- | n: don't use occam's razor for degenerate peps | + | |
- | (get max prot list, including many false positives) | + | |
- | u: do not assemble PROTEIN GROUPS | + | |
- | z: do not include zero probability protein entries in output | + | |
- | H: writes results to static html file (and tab delimited excel file) | + | |
- | P: includes peptides in tab delimited excel file (must accompany 'H') | + | |
- | xx: includes results in tab delimited excel file with minimum | + | |
- | probability xx, where xx is a number between 0 and 1 (must accompany 'H') | + | |
- | example: runprophet -OiXA interact.xml | + | runs analysis on inputfile with specified options and writes analysis to inputfile-prot.htm |
- | (for icat data with mzXML XPRESS and ASAPRatio quantitation information) | + | |
- | example: runprophet -Oia interact.xml | + | ;options: |
- | (for icat data with non-mzXML ASAPRAtio information) | + | ;i: icat data (color Cysteines) |
- | example: runprophet -Oi interact.xml | + | ;g: N-glycosylation data (color NXS/T) |
- | (for icat data) | + | ;m: multifiles (more than one interact.xml file, must specify outfile |
- | example: runprophet -Og interact.xml | + | ;d: delude (do not look up ALL prots corresponding to degenerate peps) |
- | (for N-glycosylated data) | + | ;l: use html input files (pre-TPP) |
- | example: runprophet -OL interact.htm | + | ;X: import XPRESS protein ratios |
- | (for pre-TPP html input file) | + | ;A: import ASAPRatio protein ratios and pvalues |
+ | ;L<num>: import Libra protein ratios normalized to channel <num> | ||
+ | ;a: import ASAPRatio results present in file (starting from scratch) | ||
+ | ;r: update changes made to ASAPRatio results (previously run using 'a' option) | ||
+ | ;n: don't use occam's razor for degenerate peps (get max prot list, including many false positives) | ||
+ | ;u: do not assemble PROTEIN GROUPS | ||
+ | ;z: do not include zero probability protein entries in output | ||
+ | ;H: writes results to static html file (and tab delimited excel file) | ||
+ | ;P: includes peptides in tab delimited excel file (must accompany 'H') | ||
+ | ;xx: includes results in tab delimited excel file with minimum probability xx, where xx is a number between 0 and 1 (must accompany 'H') | ||
- | usage 2: specify input file and use default options | + | ;examples: |
+ | ;runprophet -OiXA interact.xml | ||
+ | :for icat data with mzXML XPRESS and ASAPRatio quantitation information | ||
- | runprophet <interact file (with probs)> | + | ;runprophet -Oia interact.xml |
+ | :for icat data with non-mzXML ASAPRAtio information | ||
- | runs analysis on inputfile using default options, | + | ;runprophet -Oi interact.xml |
- | writes analysis to inputfile-prot.htm | + | :for icat data |
- | example: runprophet interact.xml | + | ;runprophet -Og interact.xml |
- | (writes output to file: interact-prot.shtml) | + | :for N-glycosylated data |
- | usage 3: specify output file | + | ;runprophet -OL interact.htm |
+ | :for pre-TPP html input file | ||
- | runprophet (-Ooptions) <interact file (with probs)> <outputfile> | + | =====usage 2: specify input file and use default options===== |
+ | runprophet <interact file (with probs)> | ||
+ | runs analysis on inputfile using default options and writes analysis to inputfile-prot.htm | ||
- | runs analysis on inputfile (with specified options), | + | ;example: |
- | writes analysis to specified outputfile | + | ;runprophet interact.xml |
+ | :writes output to file: interact-prot.shtml | ||
- | example: runprophet interact.xml protein.shtml | + | =====usage 3: specify output file===== |
- | (writes output to file: protein.shtml) | + | runprophet (-Ooptions) <interact file (with probs)> <outputfile> |
+ | runs analysis on inputfile (with specified options) and writes analysis to specified outputfile | ||
- | usage 4: combine multiple datasets into a single analysis | + | ;example: |
- | runprophet -Om(options) <interactfile1 interactfile2 ...> <outputfile> | + | ;runprophet interact.xml protein.shtml |
+ | :writes output to file: protein.shtml | ||
- | runs analysis on multiple inputfiles (with specified options), | + | =====usage 4: combine multiple datasets into a single analysis===== |
- | writes analysis to specified outputfile | + | |
- | example: runprophet -Oim interact-1.xml interact-2.xml protein.shtml | + | runprophet -Om(options) <interactfile1 interactfile2 ...> <outputfile> |
- | (analyzes interact-1.xml and interact-2.xml icat data | + | |
- | together and writes output to file: protein.shtml) | + | |
- | usage 5: options for static html output | + | runs analysis on multiple inputfiles (with specified options) and writes analysis to specified outputfile |
- | example: runprophet -OHP0.9 interact.xml | + | ;example: runprophet -Oim interact-1.xml interact-2.xml protein.shtml |
- | writes results to static html file, and results with min prob 0.9 | + | |
- | (including peptides) to tab delimited excel file | + | |
+ | ;analyzes interact-1.xml and interact-2.xml icat data together and writes output to file: protein.shtml | ||
- | Note: All Parser programs (children of Parser.cxx) parse pepXML and protXML with the constrain that all text be enclosed within a 'tag' | + | =====usage 5: options for static html output===== |
- | such as the following example: | + | ;example: runprophet -OHP0.9 interact.xml |
- | <mytag name="akeller"> | + | :writes results to static html file, and results with min prob 0.9 (including peptides) to tab delimited excel file |
- | <email address="akeller@systemsbiology.org/> | + | |
- | </mytag> | + | |
- | NOT ACCEPTABLE are files with text outside of a bracket tag enclosure, such as 'akeller' in the following illegal example: | + | ==Author== |
- | <illegaltag>akeller</illegaltag> | + | Much of this text comes from the TPP README file, originally written by A. Keller, 2004. |
+ | Wiki version created by J. Tasman. | ||
- | CREDITS: | + | ==Credits== |
The refreshparser program uses the SPARE Parts by Bruce W. Watson / Loek Cleophas. | The refreshparser program uses the SPARE Parts by Bruce W. Watson / Loek Cleophas. |
Current revision
License
All of the TPP is released under the LGPL license. Some other programs (converters) have different licenses.
Languages
C, C++, perl
Version control
This project is hosted at SourceForge (http://www.sourceforge.net) under the "sashimi" project, using SVN.
SVN access
Anonymous access
Anyone can check out the code: ...
Developer access
You'll need to be a project developer: register at SourceForge and contact one of the main Sashimi developers.
See our SourceForge SVN info page: https://sourceforge.net/svn/?group_id=69281
XML Parsing libraries
perl
- xml tree
c++
- A. Keller's "tag" system:
All Parser programs (children of Parser.cxx) parse pepXML and protXML with the constraint that all text be enclosed within a 'tag', as in the following example:
<mytag name="akeller"><email address="akeller@systemsbiology.org"/></mytag>
NOT ACCEPTABLE are files with text outside of a bracket tag enclosure, such as 'akeller' in the following illegal example:
<illegaltag>akeller</illegaltag>
- J. Tasman code
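The tag constraint described above for the Parser programs can be checked before a file is handed to them. The following is a minimal Python sketch and is not part of the TPP: the function name elements_with_text is hypothetical, and only the standard xml.etree.ElementTree module is used. It reports every element that carries character data outside of a tag.

```python
import xml.etree.ElementTree as ET


def elements_with_text(xml_string):
    """Return the tag names of elements that contain free text.

    An element is reported when it has non-whitespace text directly
    inside it, either before its first child (``text``) or after one
    of its children (``tail``).
    """
    root = ET.fromstring(xml_string)
    offenders = []
    for parent in root.iter():
        has_text = bool(parent.text and parent.text.strip())
        has_tail = any(child.tail and child.tail.strip() for child in parent)
        if has_text or has_tail:
            offenders.append(parent.tag)
    return offenders


if __name__ == "__main__":
    legal = '<mytag name="akeller"><email address="akeller@systemsbiology.org"/></mytag>'
    illegal = "<illegaltag>akeller</illegaltag>"
    print(elements_with_text(legal))
    print(elements_with_text(illegal))
```

An empty list means the document keeps all of its information in tags and attributes, which is what the Parser programs expect.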
System Requirements
Webserver
A webserver with access to the data directories is required. See the configuration section below. Apache support is the default configuration and is installed and configured on port 1441 automatically as part of the current Cygwin installation. IIS on Windows is no longer a supported configuration.
XSLT Processor
The TPP currently relies on an "xsl transform" processor for manipulating XML data.
- One such program, xsltproc, is usually distributed with Linux so you most likely already have it. It should reside in the /usr/bin/ directory. If it is not already on your computer, first try to install it via the standard package system for your distribution, or the cygwin installer for cygwin. Otherwise you can download it for free at: http://xmlsoft.org/XSLT/downloads.html
- Another freely available XSLT processor, Xalan, will also work fine with ProteinProphet. If you use it, just make sure it is installed in a directory already on the library path, or else set the LD_LIBRARY_PATH variable to include its location on the webserver:
Add the LD_LIBRARY_PATH entries to /etc/ld.so.conf, then type: ldconfig -v
The viewing of large xml files is sometimes slow. We are hoping to optimize the stylesheets in the future, and in some cases, bypass XSLT altogether.
Libraries
- expat-2.0.0 (copied to svn tree)
- boost 1.32
- xerces (version?)
You will need to install the following C libraries. These libraries are very common on Linux distributions, so check that you do not already have them before trying to install. If you do need to install them, first try the standard package system (e.g. RPMs for Fedora Linux, or the Cygwin installer for Cygwin; see below). If you cannot install them via the normal package system for your distribution, go to their websites and download them directly.
- libgd www.boutell.com/gd
- libpng www.libpng.org
- zlib www.gzip.org/zlib
Windows users should get this by using the Cygwin installer (www.cygwin.com). Make sure to get the devel packages, in order to have the required .h files.
Building from source
Configuration
On TPP:Documentation you can see other guides for building on specific systems.
For a Cygwin build, skip ahead to Compilation; the following is for Linux only. Note that we distribute precompiled binaries in a complete custom Cygwin installation.
Edit the src/Makefile.incl file
Set the TPP_ROOT variable to the directory where you want to install the TPP. Include the trailing '/' when setting the path, e.g.:
TPP_ROOT=/usr/local/tpp/
Set the TPP_WEB variable to the webserver root relative alias to the TPP. Include the trailing '/' when setting the path, e.g.:
TPP_WEB=/tpp/
WARNING: To avoid problems during the installation, you MUST include the trailing '/' when setting the above two paths.
- Set the XSLT_PROC to the path of the xsltproc executable on your system
XSLT_PROC=/usr/bin/xsltproc
In the src directory, type 'make configure' (once again, DON'T DO THIS for Windows/Cygwin!!!)
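Because a missing trailing '/' on TPP_ROOT or TPP_WEB causes installation problems, it can be worth checking the edited file before running 'make configure'. The following is a small Python sketch, not part of the TPP build: the function name missing_trailing_slash is hypothetical, and it assumes the variables are written as simple NAME=value lines like the examples above.

```python
def missing_trailing_slash(makefile_text, names=("TPP_ROOT", "TPP_WEB")):
    """Return the names of path variables whose value lacks a trailing '/'."""
    problems = []
    for raw_line in makefile_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Tolerate ':=', '?=' and '+=' style assignments as well.
        name = name.rstrip(":?+ ").strip()
        value = value.strip()
        if name in names and not value.endswith("/"):
            problems.append(name)
    return problems


if __name__ == "__main__":
    with open("src/Makefile.incl") as handle:
        print(missing_trailing_slash(handle.read()))
```

An empty list means both paths end in '/'. XSLT_PROC is deliberately not checked, since it names an executable rather than a directory.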
Compilation
Linux
In the src directory: type 'make all' to compile binaries
Windows/Cygwin
In the src directory: type 'make windows' to compile binaries
Installation
Linux
In the src directory: type 'make install' to install all the binaries
The TPP will be installed in the following directory structure by default:
- /usr/local/tpp
- TPP root directory
- /usr/local/tpp/cgi-bin
- CGI-BIN for tpp, contains all web served executables
- /usr/local/tpp/bin
- binary directory
- /usr/local/tpp/html
- contains all non-executable web served objects
- /usr/local/tpp/etc
- contains miscellaneous configuration files
- /usr/local/tpp/schema
- contains all XML schema files
Windows/Cygwin
In the src directory: type 'make install-windows' to install all the binaries
Webserver Configuration
Ideally, all data directories should be cross-mounted under the webserver root. The webserver should have SSI (Server Side Includes) turned on; if it is not already on, activate it as described in the following section.
Activating SSI on Linux
Modify the /etc/httpd.conf file:
In the document root <Directory> section, add +Includes to the end of already existing Options line:
Options +Includes
Uncomment or add in the mod_mime.c section:
AddType text/html .shtml
AddHandler server-parsed .shtml
Then restart web server: /etc/rc.d/init.d/httpd restart
Webserver Root
The environment variable WEBSERVER_ROOT must be set for the program user(s) as well as the webserver. The WEBSERVER_ROOT should point to the webserver's document root directory (e.g. /home/httpd/html on linux/apache, or ??? on windows/IIS ).
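A quick way to confirm the setting for a given user is to inspect the environment directly. The following Python sketch is only an illustration: the function name webserver_root_problem is hypothetical, and it checks nothing beyond the variable being set and pointing to an existing directory.

```python
import os


def webserver_root_problem(environ):
    """Return a description of what is wrong with WEBSERVER_ROOT, or None."""
    root = environ.get("WEBSERVER_ROOT")
    if not root:
        return "WEBSERVER_ROOT is not set"
    if not os.path.isdir(root):
        return "WEBSERVER_ROOT does not point to a directory: " + root
    return None


if __name__ == "__main__":
    problem = webserver_root_problem(os.environ)
    print(problem or "WEBSERVER_ROOT looks usable")
```

Running it as the program user covers only half of the requirement; the webserver's own environment has to be checked separately.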
Apache specific webserver configuration
Configure the webserver:
Add the appropriate web paths to the TPP as described below. If you are using the Apache http server, edit the active 'httpd.conf' file. Add the following Alias and ScriptAlias Directives as described below. Be sure to link to the appropriate tpp-version number.
#
# ISB-Tools Trans Proteomic Pipeline directives
#
Alias /tpp/html "/usr/local/tpp/html"
<Directory "/usr/local/tpp/html">
    AllowOverride None
    Options Includes Indexes FollowSymLinks MultiViews
    Order allow,deny
    Allow from all
</Directory>
<Directory "/usr/local/tpp/schema">
    AllowOverride None
    Options Includes Indexes FollowSymLinks MultiViews
    Order allow,deny
    Allow from all
</Directory>
ScriptAlias /tpp/cgi-bin/ "/usr/local/tpp/cgi-bin/"
<Directory "/usr/local/tpp/cgi-bin">
    AllowOverride AuthConfig Limit
    Options ExecCGI
    Order allow,deny
    Allow from all
    SetEnv WEBSERVER_ROOT /home/httpd/html
</Directory>
Windows/Cygwin Apache-specific configuration
Configure as described in "Post-Install Configuration" on the following page: [2]
Apache is installed and configured automatically as part of the Cygwin install.
XML File Validation
The SAX Validator will use the schema location indicated in the XML file to validate:
SAX2Count -v=always myfile.xml
Running the TPP
Overview
Specific instructions are geared towards command-line usage. Note that the web-based GUI is a very convenient way to do these steps.
Conversion of raw spectroscopy data to mzXML format
Assigning peptide sequences to spectra (using a search engine)
Converting search engine results to pepXML format
Analysis programs begin with a pepXML-format file called 'summary.xml', containing the peptides identified by the search engine.
- For SEQUEST results, you must specify the sequest.params file used for the search:
In the directory with summary.html and summary.mzXML (as well as the SEQUEST results .tgz or subdirectory), type:
Sequest2XML summary.html -Psequest.params
- For Mascot results, you must specify the database used for search:
In the directory with summary.dat and summary.mzXML, type:
Mascot2XML summary.dat -D/full/path/database
You can view the search results by opening the 'summary.xml' file in your browser.
Processing peptide data with the pipeline (using xinteract)
Next, you can run xinteract to apply all or some parts of the pipeline. Type 'xinteract' with no arguments for usage instructions. You can also convert and run the pipeline in one step. See the xinteract instructions for details.
Example
To run the pipeline manually, starting with file1.xml and file2.xml:
- Combine together data from 2 files:
- InteractParser interact.xml file1.xml file2.xml
Peptide Results can be viewed at any point along the analysis by opening the interact.shtml link.
- Run PeptideProphet
PeptideProphetParser interact.xml
- Run XPRESS
XPressPeptideParser interact.xml
- Run ASAPRatio
ASAPRatioPeptideParser interact.xml
- Go into database to retrieve all proteins corresponding to identified peptides
RefreshParser interact.xml /full/path/database
- ProteinProphet
ProteinProphet.pl interact.xml interact-prot.shtml XML_INPUT
From this point on, all analysis is on the output from ProteinProphet: interact-prot.xml. Protein results can be viewed at any point along the analysis by opening the interact-prot.shtml link.
- XPRESS Protein
XPressProteinRatioParser interact-prot.xml
- ASAPRatio Protein
ASAPRatioProteinRatioParser interact-prot.xml
- ASAPRatio Pvalue
ASAPRatioPvalueParser interact-prot.xml
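The manual sequence above can be written down as an ordered list of command lines. The following Python sketch only builds the commands and does not run them: the function name manual_pipeline is hypothetical, and the protein-level XPRESS step uses the program name XPressProteinRatioParser as it is listed under Protein-level analyses.

```python
import os


def manual_pipeline(inputs, database, merged="interact.xml"):
    """Return the manual pipeline as a list of argument lists, in run order."""
    stem = os.path.splitext(merged)[0]
    prot_xml = stem + "-prot.xml"      # ProteinProphet output used downstream
    prot_shtml = stem + "-prot.shtml"  # output file given to ProteinProphet.pl
    return [
        ["InteractParser", merged] + list(inputs),
        ["PeptideProphetParser", merged],
        ["XPressPeptideParser", merged],
        ["ASAPRatioPeptideParser", merged],
        ["RefreshParser", merged, database],
        ["ProteinProphet.pl", merged, prot_shtml, "XML_INPUT"],
        ["XPressProteinRatioParser", prot_xml],
        ["ASAPRatioProteinRatioParser", prot_xml],
        ["ASAPRatioPvalueParser", prot_xml],
    ]


if __name__ == "__main__":
    for argv in manual_pipeline(["file1.xml", "file2.xml"], "/full/path/database"):
        print(" ".join(argv))
```

Each entry could be passed to subprocess.run(argv, check=True) to execute the steps for real, stopping at the first failure. Steps that do not apply to a dataset (for example the quantitation steps for unlabeled data) can simply be dropped from the list.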
Questions? Search the newsgroup first, then post questions.
Converters to mzXML
Converters to pepXML
All converted pepXML files reference a standard stylesheet, pepXML_std.xsl, to enable a view of the xml file directly in a browser.
Sequest2XML
retired in favor of Out2XML
Sequest2XML summary.html (-P/full/path/mysequest.params) (-M) (-m) (-a) (-pI) (-Eenzyme)
Converts summary.html to summary.xml in pepXML format. Uses the sequest.params file in the current directory, unless one is specified as the second argument.
- OPTIONS
- -M
- MALDI mode: do not include spectrum spot number in mzXML file name
- -m
- monoisotopic masses (regardless of sequest.params setting)
- -a
- average masses (regardless of sequest.params setting)
- -pI
- compute peptide pI values
- -Eenzyme
- set sample enzyme (default is trypsin, possible values are: nonspecific, chymotrypsin, elastase, gluc, gluc_bicarb, aspn, tca, cnbr, trypsin/cnbr, clostripain, iodosobenzoate, protein_endopeptidase, staph_protease, trypsin_k, trypsin_r)
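Since the enzyme name is appended directly to -E, a typo is easy to make. The following Python sketch builds a Sequest2XML argument list and rejects enzyme names that are not in the list above. It is an illustration only: the function name sequest2xml_args and its parameters are hypothetical, and treating -m and -a as mutually exclusive is an assumption based on their descriptions. Note that Sequest2XML itself is described above as retired in favor of Out2XML.

```python
SAMPLE_ENZYMES = {
    "trypsin", "nonspecific", "chymotrypsin", "elastase", "gluc",
    "gluc_bicarb", "aspn", "tca", "cnbr", "trypsin/cnbr", "clostripain",
    "iodosobenzoate", "protein_endopeptidase", "staph_protease",
    "trypsin_k", "trypsin_r",
}


def sequest2xml_args(summary_html, params=None, maldi=False, masses=None,
                     compute_pi=False, enzyme=None):
    """Build the argument list for a Sequest2XML call.

    masses is None, "mono" (-m) or "average" (-a).
    """
    if masses not in (None, "mono", "average"):
        raise ValueError("masses must be None, 'mono' or 'average'")
    if enzyme is not None and enzyme not in SAMPLE_ENZYMES:
        raise ValueError("unknown sample enzyme: " + enzyme)
    args = ["Sequest2XML", summary_html]
    if params:
        args.append("-P" + params)
    if maldi:
        args.append("-M")
    if masses == "mono":
        args.append("-m")
    elif masses == "average":
        args.append("-a")
    if compute_pi:
        args.append("-pI")
    if enzyme:
        args.append("-E" + enzyme)
    return args


if __name__ == "__main__":
    print(" ".join(sequest2xml_args("summary.html", params="sequest.params")))
```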
Mascot2XML
Mascot2XML summary.dat -D/full/path/mydatabase.fasta (-pI) (-Eenzyme)
Converts summary.dat to summary.xml in pepXML format.
- OPTIONS
- See Sequest2XML for option definitions
Comet2XML
Comet2XML summary.cmt.tar.gz (-Eenzyme)
- OPTIONS
- See Sequest2XML for option definitions
Peptide-level analyses
InteractParser
InteractParser interact.xml file1.xml file2.xml file3.xml .....
Merges together pepXML files file1.xml, file2.xml, file3.xml .... into interact.xml. Combines all analysis_summary elements and reindexes spectrum_query elements. Makes a system call to pepxml2html.pl (pepxml2html.pl -file interact.xml) to create the stylesheet for viewing interact.xml and interact.shtml in a browser.
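The reindexing step can be illustrated on toy input. The following Python sketch is not InteractParser's implementation: the function name merge_and_reindex is hypothetical, the documents are simplified and namespace-free, and it assumes each spectrum_query element carries its position in an 'index' attribute.

```python
import xml.etree.ElementTree as ET


def merge_and_reindex(documents):
    """Collect spectrum_query elements from several XML strings and
    renumber their 'index' attribute consecutively, starting at 1."""
    merged = ET.Element("merged")
    next_index = 1
    for text in documents:
        root = ET.fromstring(text)
        for query in root.iter("spectrum_query"):
            query.set("index", str(next_index))
            merged.append(query)
            next_index += 1
    return merged


if __name__ == "__main__":
    file1 = '<run><spectrum_query spectrum="a.1" index="1"/><spectrum_query spectrum="a.2" index="2"/></run>'
    file2 = '<run><spectrum_query spectrum="b.1" index="1"/></run>'
    merged = merge_and_reindex([file1, file2])
    print(ET.tostring(merged, encoding="unicode"))
```

Without renumbering, the merged file would contain two queries with index 1, so the index could no longer identify a single spectrum query.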
DatabaseParser
DatabaseParser interact.xml
Prints the database(s) referenced in pepXML document
RefreshParser
RefreshParser interact.xml /full/path/database.fasta
Goes into database to find all proteins corresponding to identified peptides and overwrites results to interact.xml
EnzymeDigestionParser
EnzymeDigestionParser interact.xml (-Eenzyme)
Computes number of tolerable termini and number of missed cleavages in dataset using sample enzyme stored in interact.xml unless specified as argument.
PeptideProphetParser
PeptideProphetParser interact.xml (EXCLUDE) (LEAVE) (ICAT) (NOICAT) (ZERO) (GLYC) (MALDI) (MINPROB=xx)
Runs PeptideProphet with options and overwrites results to interact.xml
- OPTIONS
- EXCLUDE
- exclude delta stars (SEQUEST)
- LEAVE
- leave delta star values alone (SEQUEST)
- ICAT
- use peptide icat info in probability calculation
- NOICAT
- do not use peptide icat info in probability calculation
- ZERO
- do not discard any data
- GLYC
- use peptide NXS/T motif info in probability calculation
- MALDI
- specify maldi spectra
- PI
- use pI information
- ACCMASS
- use accurate mass binning
- MINPROB=xx
- filter away results with a probability less than xx
- EXTRAITRS=xx
- specify additional EM iterations
- NONTT
- Do not use NTT information
- CLEVEL=xx
- Specify Conservative Level
XPressPeptideParser
XPressPeptideParser interact.xml (-b) (-n<str>,<num>) (-n<str>,<num>) (-n<str>,<num>) (-L or -H)
Runs XPRESS with options and overwrites results to interact.xml
- OPTIONS
- -m<num>
- change XPRESS mass tolerance (default=1.0)
- -l<str>
- change labeled residues (default='C')
- -r<num>
- change XPRESS residue mass difference (default=9.0)
- -n<str>,<num>
- when specifying multiple isotopic labels, use this option e.g. -nK,3.0 -nL,3.0
- -b
- heavy labeled peptide elutes before light labeled partner
- -L
- for ratio, set/fix light to 1, vary heavy
- -H
- for ratio, set/fix heavy to 1, vary light
ASAPRatioPeptideParser
ASAPRatioPeptideParser interact.xml (-b) (-l<str>) (-S) (-m<str>) (-F) (-C)
Runs ASAPRatio with options and overwrites results to interact.xml
- OPTIONS
- -l<str>
- change labeled residues (default='C')
- -b
- heavy labeled peptide elutes before light labeled partner
- -f<num>
- areaFlag set to num (ratio display option)
- -S
- static modification quantification (i.e. each run is either all light or all heavy)
- -F
- use fixed scan range for light and heavy
- -C
- quantitate only the charge state where the CID was made
- -m<str>
- specified label masses (e.g. M74.325Y125.864), only relevant for static modification quantification
LibraPeptideParser
LibraPeptideParser interact.xml -clibra_condition.xml
Runs LIBRA using channel information specified in libra_condition.xml file and overwrites results to interact.xml
CompactParser
CompactParser file.xml
Compacts either pepXML or protXML, combining together start and end tags when no elements are contained between them
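The effect of compaction can be shown with a regular expression. The following Python sketch is not CompactParser's implementation: the function name compact is hypothetical, and it assumes attribute values contain no '<' or '>' characters. It rewrites a start tag that is immediately followed by its own end tag (with at most whitespace in between) as a single self-closing tag.

```python
import re

# A start tag that is not already self-closing, optional whitespace,
# then the matching end tag. Group 1 is the tag name, group 2 the attributes.
_EMPTY_PAIR = re.compile(
    r"<([A-Za-z_][\w:.-]*)((?:\s[^<>]*?)?)(?<!/)>\s*</\1\s*>"
)


def compact(xml_text):
    """Replace '<tag ...></tag>' pairs with '<tag .../>'."""
    return _EMPTY_PAIR.sub(r"<\1\2/>", xml_text)


if __name__ == "__main__":
    print(compact('<mytag name="akeller"><email address="akeller@systemsbiology.org"></email></mytag>'))
```

Elements that contain other elements are left untouched, which matches the description: only pairs with nothing between them are combined.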
Protein-level analyses
ProteinProphet.pl
ProteinProphet.pl '<interact pep prob html file1><interact pep prob html file2>....' <outfile> (ICAT) (GLYC) (XPRESS) (ASAP_PROPHET) (ACCURACY) (ASAP) (REFRESH) (DELUDE) (NOOCCAM)
- OPTIONS
- NOOCCAM
- non-conservative maximum protein list
- ICAT
- highlight peptide cysteines
- GLYC
- highlight peptide N-glycosylation motif
- ACCURACY
- min pep prob 0
- ASAP
- compute ASAP ratios for protein entries (ASAP must have been run previously on interact dataset)
- REFRESH
- import manual changes to ASAP ratios (after initially using ASAP option)
- ASAP_PROPHET
- *New and Improved* compute ASAP ratios for protein entries (ASAP must have been run previously on all input interact datasets with mzXML raw data format)
- DELUDE
- do NOT use peptide degeneracy information when assessing proteins
- HTML
- write output to static html page (rather than dynamic shtml)
- Other options in conjunction with HTML
- EXCELPEPS
- write output tab delim xls file including all peptides
- EXCELxx
- write output tab delim xls file including all protein (group)s with minimum probability xx, where xx is a number between 0 and 1
XPressProteinRatioParser
XPressProteinRatioParser interact-prot.xml
Runs XPRESS and overwrites results to interact-prot.xml
ASAPRatioProteinRatioParser
ASAPRatioProteinRatioParser interact-prot.xml
Runs ASAPRatio and overwrites results to interact-prot.xml
LibraProteinRatioParser
LibraProteinRatioParser interact-prot.xml <normalization_channel>
Runs LIBRA, normalizing ratios to normalization_channel, and overwrites results to interact-prot.xml
Wrappers
xinteract
xinteract (generaloptions) (-Oprophetoptions) (-Xxpressoptions) (-Aasapoptions) (-L<conditionfile>libraoptions) xmlfile1 xmlfile2 ....
options
- generaloptions
- -Nmyfile.xml
- write output to file 'myfile.xml'
- -nI
- do not run Interact (convert to pepXML only)
- -nP
- do not run PeptideProphet
- -nR
- do not get all proteins corresponding to degenerate peptides from database
- -p0
- do not discard search results with PeptideProphet probabilities below 0.05
- -x<num>
- number of extra PeptideProphet iterations; default <num>=0
- -p<num>
- filter results below PeptideProphet probability <num>; default <num>=0.05
- -mw
- calculate protein molecular weights
- -MONO
- calculate monoisotopic peptide masses during conversion to pepXML
- -AVE
- calculate average peptide masses during conversion to pepXML
- -eX
- specify sample enzyme other than trypsin
- -eC
- specify sample enzyme = Chymotrypsin
- -eA
- specify sample enzyme = AspN
- -eG
- specify sample enzyme = GluC
- -eB
- specify sample enzyme = GluC Bicarb
- -eM
- specify sample enzyme = CNBr
- -eD
- specify sample enzyme = Trypsin/CNBr
- -e3
- specify sample enzyme = Chymotrypsin/AspN/Trypsin
- -eE
- specify sample enzyme = Elastase
- -eL
- specify sample enzyme = LysN (cuts before K)
- -eP
- specify sample enzyme = LysN Promisc (cuts before KASR)
- -eN
- specify sample enzyme = Nonspecific or None
For developers:
- -t
- run regression test against a previously derived result
- -t!
- learn results for regression test
- prophetoptions (following the 'O')
- i
- use icat information in PeptideProphet
- f
- do not use icat information in PeptideProphet
- g
- use N-glyc motif information in PeptideProphet
- m
- maldi data
- I
- use pI information in PeptideProphet
- A
- use accurate mass binning in PeptideProphet
- w
- warning instead of exit with error if instrument types between runs is different
- x
- exclude all entries with asterisked score values in PeptideProphet
- l
- leave alone all entries with asterisked score values in PeptideProphet
- p
- run ProteinProphet afterwards
- u
- do not assemble protein groups in ProteinProphet analysis
- s
- do not use Occam's Razor in ProteinProphet analysis to derive the simplest protein list to explain observed peptides
- xpressoptions (will run XPRESS analysis with any specified options that follow the 'X')
- -m<num>
- change XPRESS mass tolerance (default=1.0)
- -l<str>
- change labeled residues (default='C')
- -n<str>,<num>
- change XPRESS residue mass difference for <str> to <num> (default=9.0)
- -b
- heavy labeled peptide elutes before light labeled partner
- -L
- for ratio, set/fix light to 1, vary heavy
- -H
- for ratio, set/fix heavy to 1, vary light
- asapoptions (will run ASAPRatio analysis with any specified options that follow the 'A')
- -l<str>
- change labeled residues (default='C')
- -b
- heavy labeled peptide elutes before light labeled partner
- -f<num>
- areaFlag set to num (ratio display option)
- -S
- static modification quantification (i.e. each run is either all light or all heavy)
- -F
- use fixed scan range for light and heavy
- -C
- quantitate only the charge state where the CID was made
- -m<str>
- specified label masses (e.g. M74.325Y125.864), only relevant for static modification quantification
- libraoptions (will run Libra Quantitation analysis with any specified options that follow the 'L')
- -<num>
- normalization channel (for protein level quantitation)
examples
- xinteract *.xml
- combines together data in all pepXML files into 'interact.xml', then runs PeptideProphet
- xinteract -Ndata.xml *.xml
- same as above, but results are written to 'data.xml'
- xinteract -Ndata.xml -X -Op *.xml
- same as above, but run XPRESS analysis in its default mode, then ProteinProphet
- xinteract -X -A file1.xml file2.xml
- combines together data in file1.xml and file2.xml into 'interact.xml' and then runs XPRESS (in its default mode) and ASAPRatio (in its default mode)
- xinteract -X-nC,6.0 -A file1.xml file2.xml
- same as above, but specifies that cysteine label has a heavy/light mass difference of 6.0
- xinteract -X -A-lDE-S file1.xml file2.xml
- same as above, but specifies for ASAP to run in static mode with labeled residues D and E
- xinteract -Lmyconditionfile.xml-1 -Op file1.xml file2.xml
- run Libra quantitation after PeptideProphet using myconditionfile.xml, and after ProteinProphet normalizing ratios to channel 1 values
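The examples above show that the sub-options of each analysis are concatenated directly onto the letter that selects it, with no spaces (for example -X-nC,6.0 or -A-lDE-S). The following Python sketch makes that composition explicit. It is an illustration only: the function name xinteract_args and its parameters are hypothetical, and the option groups are emitted in the order of the usage line.

```python
def xinteract_args(inputs, output=None, prophet="", xpress=None, asap=None,
                   libra=None):
    """Build an xinteract argument list.

    prophet: string of PeptideProphet option letters, e.g. "p".
    xpress, asap: None to skip the analysis, [] for its default mode,
        or a list of sub-options such as ["-nC,6.0"].
    libra: None, or a (condition_file, channel) tuple; channel may be None.
    """
    args = ["xinteract"]
    if output:
        args.append("-N" + output)
    if prophet:
        args.append("-O" + prophet)
    if xpress is not None:
        args.append("-X" + "".join(xpress))
    if asap is not None:
        args.append("-A" + "".join(asap))
    if libra is not None:
        condition_file, channel = libra
        option = "-L" + condition_file
        if channel is not None:
            option += "-" + str(channel)
        args.append(option)
    args.extend(inputs)
    return args


if __name__ == "__main__":
    files = ["file1.xml", "file2.xml"]
    print(" ".join(xinteract_args(files, xpress=["-nC,6.0"], asap=[])))
    print(" ".join(xinteract_args(files, xpress=[], asap=["-lDE", "-S"])))
    print(" ".join(xinteract_args(files, prophet="p",
                                  libra=("myconditionfile.xml", 1))))
```

Keeping each group as one argument matters when the command is built programmatically: passing "-X" and "-nC,6.0" as two separate arguments would not match the form shown in the examples.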
runprophet
Runs ProteinProphet on designated 'interact.xml' file(s), where interact.xml is a pepXML-formatted output file from PeptideProphet.
usage 1: specify input file and options
runprophet -Ooptions <interact file (with probs)>
runs analysis on inputfile with specified options and writes analysis to inputfile-prot.htm
- options
- i
- icat data (color Cysteines)
- g
- N-glycosylation data (color NXS/T)
- m
- multifiles (more than one interact.xml file, must specify outfile)
- d
- delude (do not look up ALL prots corresponding to degenerate peps)
- l
- use html input files (pre-TPP)
- X
- import XPRESS protein ratios
- A
- import ASAPRatio protein ratios and pvalues
- L<num>
- import Libra protein ratios normalized to channel <num>
- a
- import ASAPRatio results present in file (starting from scratch)
- r
- update changes made to ASAPRatio results (previously run using 'a' option)
- n
- don't use occam's razor for degenerate peps (get max prot list, including many false positives)
- u
- do not assemble PROTEIN GROUPS
- z
- do not include zero probability protein entries in output
- H
- writes results to static html file (and tab delimited excel file)
- P
- includes peptides in tab delimited excel file (must accompany 'H')
- xx
- includes results in tab delimited excel file with minimum probability xx, where xx is a number between 0 and 1 (must accompany 'H')
- examples
- runprophet -OiXA interact.xml
- for icat data with mzXML XPRESS and ASAPRatio quantitation information
- runprophet -Oia interact.xml
- for icat data with non-mzXML ASAPRatio information
- runprophet -Oi interact.xml
- for icat data
- runprophet -Og interact.xml
- for N-glycosylated data
- runprophet -Ol interact.htm
- for pre-TPP html input file
usage 2: specify input file and use default options
runprophet <interact file (with probs)>
runs analysis on inputfile using default options and writes analysis to inputfile-prot.htm
- example
- runprophet interact.xml
- writes output to file: interact-prot.shtml
usage 3: specify output file
runprophet (-Ooptions) <interact file (with probs)> <outputfile>
runs analysis on inputfile (with specified options) and writes analysis to specified outputfile
- example
- runprophet interact.xml protein.shtml
- writes output to file: protein.shtml
usage 4: combine multiple datasets into a single analysis
runprophet -Om(options) <interactfile1 interactfile2 ...> <outputfile>
runs analysis on multiple inputfiles (with specified options) and writes analysis to specified outputfile
- example
- runprophet -Oim interact-1.xml interact-2.xml protein.shtml
- analyzes interact-1.xml and interact-2.xml icat data together and writes output to file: protein.shtml
usage 5: options for static html output
- example
- runprophet -OHP0.9 interact.xml
- writes results to static html file, and results with min prob 0.9 (including peptides) to tab delimited excel file
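The usages above imply a few constraints: several input files need the 'm' option and an explicit output file, and 'P' and the minimum probability only make sense together with 'H'. The following Python sketch checks those constraints while building the argument list. It is an illustration only: the function name runprophet_args and its parameters are hypothetical, and runprophet itself may accept or reject other combinations.

```python
def runprophet_args(inputs, options="", min_prob=None, output=None):
    """Build a runprophet argument list from option letters.

    options: string of option letters, e.g. "iXA" or "HP".
    min_prob: optional minimum probability (0 to 1) for the excel file.
    """
    if len(inputs) > 1:
        if "m" not in options:
            raise ValueError("more than one input file requires the 'm' option")
        if output is None:
            raise ValueError("the 'm' option requires an output file")
    if ("P" in options or min_prob is not None) and "H" not in options:
        raise ValueError("'P' and the minimum probability must accompany 'H'")
    if min_prob is not None and not 0 <= min_prob <= 1:
        raise ValueError("minimum probability must be between 0 and 1")

    args = ["runprophet"]
    option_string = options + ("" if min_prob is None else str(min_prob))
    if option_string:
        args.append("-O" + option_string)
    args.extend(inputs)
    if output:
        args.append(output)
    return args


if __name__ == "__main__":
    print(" ".join(runprophet_args(["interact.xml"], "HP", min_prob=0.9)))
    print(" ".join(runprophet_args(["interact-1.xml", "interact-2.xml"], "im",
                                   output="protein.shtml")))
```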
Author
Much of this text comes from the TPP README file, originally written by A. Keller, 2004. Wiki version created by J. Tasman.
Credits
The refreshparser program uses the SPARE Parts by Bruce W. Watson / Loek Cleophas.