TPP:Developer Documentation

From SPCTools

Revision as of 18:30, 19 July 2007

1 License
2 Languages
3 Version control
- 3.1 SVN access
  - 3.1.1 Anonymous access
  - 3.1.2 Developer access
4 XML Parsing libraries
- 4.1 perl
- 4.2 c++
5 System Requirements
6 Building from source
7 Webserver Configuration
8 XML File Validation
9 Running the TPP
10 Author
11 Credits

License

All of the TPP is released under the LGPL license. Some other programs (converters) have different licenses.

Languages

C, C++, perl

Version control

This project is hosted at SourceForge ([1]) under the "sashimi" project, using SVN.

SVN access

Anonymous access

Anyone can check out the code: ...

Developer access

You'll need to be a project developer: register at SourceForge and contact one of the main Sashimi developers.

See [https://sourceforge.net/svn/?group_id=69281 our SourceForge SVN info page].

XML Parsing libraries

perl

xml tree

c++

A. Keller's "tag" system:

All Parser programs (children of Parser.cxx) parse pepXML and protXML with the constraint that all text be enclosed within a 'tag', as in the following example:

<mytag name="akeller"><email address="akeller@systemsbiology.org"/></mytag>

NOT ACCEPTABLE are files with text outside of a bracket tag enclosure, such as 'akeller' in the following illegal example:

<illegaltag>akeller</illegaltag>
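
The constraint can be checked mechanically. The following is a minimal sketch in Python, not part of the TPP: the function name has_only_tag_content is hypothetical, and it uses the standard-library ElementTree parser rather than the Parser.cxx code. It reports whether a document carries any non-whitespace text outside of tag attributes.

```python
import xml.etree.ElementTree as ET


def has_only_tag_content(xml_text):
    """Return True if no element has non-whitespace text or tail content.

    Hypothetical helper: it only illustrates the 'tag' constraint described
    above and is not how the TPP parsers are implemented.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # elem.text is text directly after the start tag;
        # elem.tail is text directly after the end tag.
        if (elem.text and elem.text.strip()) or (elem.tail and elem.tail.strip()):
            return False
    return True
```

Applied to the two examples above, the first form is accepted and the second is rejected.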

J. Tasman code

System Requirements

Webserver

A webserver with access to the data directories is required. See the configuration section below. Apache support is the default configuration and is installed and configured on port 1441 automatically as part of the current Cygwin installation. IIS on Windows is no longer a supported configuration.

XSLT Processor

The TPP currently relies on an "xsl transform" processor for manipulating XML data.

One such program, xsltproc, is usually distributed with Linux, so you most likely already have it. It should reside in the /usr/bin/ directory. If it is not already on your computer, first try to install it via the standard package system for your distribution, or the Cygwin installer for Cygwin. Otherwise you can download it for free at: http://xmlsoft.org/XSLT/downloads.html

Another freely available XSLT processor, Xalan, will also work with ProteinProphet. If you use it, make sure it is installed in a directory already on the library path, or else set the LD_LIBRARY_PATH variable to include its location on the webserver:

Add the library directories to /etc/ld.so.conf
Then type: ldconfig -v

The viewing of large xml files is sometimes slow. We are hoping to optimize the stylesheets in the future, and in some cases, bypass XSLT altogether.

Libraries

expat-2.0.0 (copied to cvs tree)
boost 1.32
xerces (version?)

You will need to install the following C libraries. These libraries are very common on Linux distributions, so check whether you already have them before trying to install. If you do need to install them, first try the standard package system (e.g. RPMs for Fedora Linux, or the Cygwin installer for Cygwin; see below). If you cannot install them via the normal package system for your distribution, go to their websites and download them directly.

libgd www.boutell.com/gd
libpng www.libpng.org
zlib www.gzip.org/zlib

Windows users should get these by using the Cygwin installer (www.cygwin.com). Make sure to get the devel packages, in order to have the required .h files.

Building from source

Configuration

Linux only. Skip ahead to Compilation for a Cygwin build. Note that we distribute precompiled binaries in a complete custom cygwin installation.

Edit the src/Makefile.incl file

Set the TPP_ROOT variable to the directory where you want to install the TPP. Include the trailing '/' when setting the path, e.g.:

TPP_ROOT=/usr/local/tpp/

Set the TPP_WEB variable to the webserver root relative alias to the TPP. Include the trailing '/' when setting the path, e.g.:

TPP_WEB=/tpp/

WARNING: To avoid problems during the installation, you MUST include the trailing '/' when setting the above two paths.

Set the XSLT_PROC to the path of the xsltproc executable on your system

XSLT_PROC=/usr/bin/xsltproc

In the src directory, type 'make configure' (once again, DON'T DO THIS for Windows/Cygwin!!!)
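
Because a missing trailing '/' is easy to overlook, the settings can be checked before running 'make configure'. The following is a minimal sketch in Python under stated assumptions: it treats src/Makefile.incl as simple NAME=value lines, and the function name check_makefile_settings is hypothetical, not part of the TPP build.

```python
def check_makefile_settings(text):
    """Return a list of problems found in Makefile.incl-style text.

    Assumes simple 'NAME=value' (or 'NAME:=value') assignments; real
    Makefile syntax such as includes or conditionals is not handled.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" in line:
            name, value = line.split("=", 1)
            settings[name.strip().rstrip(":?+").strip()] = value.strip()

    problems = []
    for name in ("TPP_ROOT", "TPP_WEB"):
        value = settings.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif not value.endswith("/"):
            problems.append(f"{name}={value} is missing the trailing '/'")
    if "XSLT_PROC" not in settings:
        problems.append("XSLT_PROC is not set")
    return problems
```

An empty list means the three variables discussed above are present and both paths end in '/'.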

Compilation

Linux

In the src directory: type 'make all' to compile binaries

Windows/Cygwin

In the src directory: type 'make windows' to compile binaries

Installation

Linux

In the src directory: type 'make install' to install all the binaries

The TPP will be installed in the following directory structure by default:

/usr/local/tpp: TPP root directory
/usr/local/tpp/cgi-bin: CGI-BIN for tpp, contains all web served executables
/usr/local/tpp/bin: binary directory
/usr/local/tpp/html: contains all non-executable web served objects
/usr/local/tpp/etc: contains miscellaneous configuration files
/usr/local/tpp/schema: contains all XML schema files
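
After 'make install', the layout listed above can be verified with a short script. This is a minimal sketch in Python, assuming the default subdirectory names shown in the list; the function name missing_tpp_dirs is hypothetical.

```python
import os

# Subdirectories expected under the TPP root, taken from the list above.
EXPECTED_SUBDIRS = ("cgi-bin", "bin", "html", "etc", "schema")


def missing_tpp_dirs(tpp_root):
    """Return the expected subdirectories that do not exist under tpp_root."""
    return [
        name
        for name in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(tpp_root, name))
    ]
```

For a default installation, missing_tpp_dirs("/usr/local/tpp/") would be expected to return an empty list.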

Windows/Cygwin

In the src directory: type 'make install-windows' to install all the binaries

Webserver Configuration

Ideally, all data directories should be cross-mounted under the webserver root. The webserver should have SSI (server side includes) turned on. If it is not already on, you can activate it as follows.

Activating SSI on Linux

Modify the /etc/httpd.conf file:

In the document root <Directory> section, add +Includes to the end of already existing Options line:

 Options +Includes

Uncomment or add in the mod_mime.c section:

AddType text/html .shtml
AddHandler server-parsed  .shtml

Then restart web server: /etc/rc.d/init.d/httpd restart

Webserver Root

The environment variable WEBSERVER_ROOT must be set for the program user(s) as well as the webserver. The WEBSERVER_ROOT should point to the webserver's document root directory (e.g. /home/httpd/html on linux/apache, or ??? on windows/IIS ).
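
A quick way to confirm that the variable is visible to a given user is to read it from that user's environment. The following is a minimal sketch in Python; the function name webserver_root is hypothetical and the check is not part of the TPP.

```python
import os


def webserver_root(environ=os.environ):
    """Return WEBSERVER_ROOT, raising if it is unset or not a directory."""
    root = environ.get("WEBSERVER_ROOT")
    if not root:
        raise RuntimeError("WEBSERVER_ROOT is not set")
    if not os.path.isdir(root):
        raise RuntimeError(f"WEBSERVER_ROOT is not a directory: {root}")
    return root
```

Running the same check from a CGI script would show whether the webserver process sees the variable as well.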

Apache specific webserver configuration

Configure the webserver:

Add the appropriate web paths to the TPP as described below. If you are using the Apache http server, edit the active 'httpd.conf' file. Add the following Alias and ScriptAlias Directives as described below. Be sure to link to the appropriate tpp-version number.

#
# ISB-Tools Trans Proteomic Pipeline directives
#


Alias /tpp/html "/usr/local/tpp/html"


<Directory "/usr/local/tpp/html">
   AllowOverride None
   Options Includes Indexes FollowSymLinks MultiViews
   Order allow,deny
   Allow from all
</Directory>


<Directory "/usr/local/tpp/schema">
   AllowOverride None
   Options Includes Indexes FollowSymLinks MultiViews
   Order allow,deny
   Allow from all
</Directory>


ScriptAlias /tpp/cgi-bin/ "/usr/local/tpp/cgi-bin/"


<Directory "/usr/local/tpp/cgi-bin">
   AllowOverride AuthConfig Limit
   Options ExecCGI
   Order allow,deny
   Allow from all
   SetEnv WEBSERVER_ROOT /home/httpd/html
</Directory>

Windows/Cygwin Apache-specific configuration

Configure as described in "Post-Install Configuration" on the following page: [2]

Apache is installed and configured automatically as part of the Cygwin install.

XML File Validation

The SAX Validator will use the schema location indicated in the XML file to validate:

SAX2Count -v=always myfile.xml
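
When several files need validating, the same command can be driven from a script. This is a minimal sketch in Python, assuming SAX2Count is on the PATH; the helper names validation_command and validate are hypothetical, and interpreting the program's output is left to the caller.

```python
import shutil
import subprocess


def validation_command(xml_path):
    """Build the SAX2Count command line shown above for one file."""
    return ["SAX2Count", "-v=always", xml_path]


def validate(xml_path):
    """Run SAX2Count on xml_path and return the CompletedProcess."""
    if shutil.which("SAX2Count") is None:
        raise FileNotFoundError("SAX2Count not found on PATH")
    return subprocess.run(
        validation_command(xml_path), capture_output=True, text=True
    )
```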

Running the TPP

Overview

Specific instructions are geared towards command-line usage. Note that the web-based GUI is a very convenient way to do these steps.

Conversion of raw spectroscopy data to mzXML format

Assigning peptide sequences to spectra (using a search engine)

Converting search engine results to pepXML format

Analysis programs begin with a pepXML-format file called 'summary.xml', containing the peptides identified by the search engine.

For SEQUEST results, you must specify the sequest.params file used for the search:

In directory with summary.html and summary.mzXML (as well as the SEQUEST results .tgz or subdirectory), type:

Sequest2XML summary.html -Psequest.params

For Mascot results, you must specify the database used for search:

In the directory with summary.dat and summary.mzXML, type:

Mascot2XML summary.dat -D/full/path/database

You can view the search results by opening the 'summary.xml' file in your browser.

Processing peptide data with the pipeline (using xinteract)

Next, you can run xinteract to apply all or some parts of the pipeline. Type 'xinteract' with no arguments for usage instructions. You can also convert and run the pipeline in one step. See the xinteract instructions for details.

Example

To run the pipeline manually, starting with file1.xml and file2.xml:

Combine together data from 2 files:
InteractParser interact.xml file1.xml file2.xml

Peptide Results can be viewed at any point along the analysis by opening the interact.shtml link.

Run PeptideProphet

PeptideProphetParser interact.xml

Run XPRESS

XPressPeptideParser interact.xml

Run ASAPRatio

ASAPRatioPeptideParser interact.xml

Go into database to retrieve all proteins corresponding to identified peptides

RefreshParser interact.xml /full/path/database

ProteinProphet

ProteinProphet.pl interact.xml interact-prot.shtml XML_INPUT

From this point on, all analysis is on the output from ProteinProphet: interact-prot.xml. Protein results can be viewed at any point along the analysis by opening the interact-prot.shtml link.

XPRESS Protein

XPressProteinRatioParser interact-prot.xml

ASAPRatio Protein

ASAPRatioProteinRatioParser interact-prot.xml

ASAPRatio Pvalue

ASAPRatioPvalueParser interact-prot.xml
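
The manual sequence above can be collected into a script. The following is a minimal sketch in Python, assuming all TPP programs are on the PATH; the function names are hypothetical, and the XPRESS protein step uses the name XPressProteinRatioParser as given in the program reference on this page.

```python
import subprocess


def manual_pipeline_commands(inputs, database):
    """Return the manual pipeline as a list of argument lists, in order."""
    pep = "interact.xml"
    prot = "interact-prot.xml"
    return [
        ["InteractParser", pep, *inputs],           # combine input files
        ["PeptideProphetParser", pep],
        ["XPressPeptideParser", pep],
        ["ASAPRatioPeptideParser", pep],
        ["RefreshParser", pep, database],           # look up proteins
        ["ProteinProphet.pl", pep, "interact-prot.shtml", "XML_INPUT"],
        ["XPressProteinRatioParser", prot],
        ["ASAPRatioProteinRatioParser", prot],
        ["ASAPRatioPvalueParser", prot],
    ]


def run_manual_pipeline(inputs, database):
    """Run each step in turn, stopping at the first failure."""
    for command in manual_pipeline_commands(inputs, database):
        subprocess.run(command, check=True)
```

Keeping the command list separate from the runner makes it possible to print the steps for inspection before executing anything.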

Questions? Search the newsgroup first, then post questions.

Converters to mzXML

Converters to pepXML

All converted pepXML files reference a standard stylesheet, pepXML_std.xsl, so that the XML file can be viewed directly in a browser.

Sequest2XML

Retired in favor of Out2XML.

Sequest2XML summary.html (-P/full/path/mysequest.params) (-M) (-m) (-a) (-pI) (-Eenzyme)

Converts summary.html to summary.xml in pepXML format. Uses sequest.params file in current directory, unless specified as second argument

OPTIONS
-M: MALDI mode: do not include spectrum spot number in mzXML file name
-m: monoisotopic masses (regardless of sequest.params setting)
-a: average masses (regardless of sequest.params setting)
-pI: compute peptide pI values
-Eenzyme: set sample enzyme (default is trypsin, possible values are: nonspecific, chymotrypsin, elastase, gluc, gluc_bicarb, aspn, tca, cnbr, trypsin/cnbr, clostripain, iodosobenzoate, protein_endopeptidase, staph_protease, trypsin_k, trypsin_r)

Mascot2XML

Mascot2XML summary.dat -D/full/path/mydatabase.fasta (-pI) (-Eenzyme)

Converts summary.dat to summary.xml in pepXML format.

OPTIONS
See Sequest2XML for option definitions

Comet2XML

Comet2XML summary.cmt.tar.gz (-Eenzyme)

OPTIONS
See Sequest2XML for option definitions
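
The choice of converter follows from the search engine's output file. The following is a minimal sketch in Python that picks a converter from the file name, using only the usage lines above; the function name conversion_command and the file-extension dispatch are assumptions made for illustration.

```python
def conversion_command(results_file, database=None, params=None):
    """Build a pepXML conversion command for a search results file.

    Dispatching on the file extension is an assumption for this sketch;
    the TPP itself does not necessarily work this way.
    """
    if results_file.endswith(".cmt.tar.gz"):
        return ["Comet2XML", results_file]
    if results_file.endswith(".dat"):
        if database is None:
            raise ValueError("Mascot2XML requires the search database (-D)")
        return ["Mascot2XML", results_file, "-D" + database]
    if results_file.endswith(".html"):
        command = ["Sequest2XML", results_file]
        if params is not None:
            command.append("-P" + params)
        return command
    raise ValueError(f"no converter known for {results_file}")
```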

Peptide-level analyses

InteractParser

InteractParser interact.xml file1.xml file2.xml file3.xml .....

Merges pepXML files file1.xml, file2.xml, file3.xml .... into interact.xml. Combines all analysis_summary elements, and reindexes spectrum_query elements. Makes a system call to pepxml2html.pl (pepxml2html.pl -file interact.xml) to create a stylesheet for viewing interact.xml and interact.shtml in a browser.
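
The reindexing step can be illustrated on simplified input. This is a minimal sketch in Python, not the InteractParser implementation: it assumes un-namespaced documents, an 'index' attribute on spectrum_query, and a hypothetical 'merged' root element, and it ignores analysis_summary handling.

```python
import xml.etree.ElementTree as ET


def merge_and_reindex(documents):
    """Collect spectrum_query elements from several documents under one
    root, renumbering their 'index' attribute consecutively from 1."""
    merged = ET.Element("merged")  # hypothetical container element
    next_index = 1
    for text in documents:
        root = ET.fromstring(text)
        for query in root.iter("spectrum_query"):
            query.set("index", str(next_index))
            next_index += 1
            merged.append(query)
    return merged
```

Each input file numbers its own queries from the start, so renumbering is what keeps the indices unique in the merged file.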

DatabaseParser

DatabaseParser interact.xml

Prints the database(s) referenced in pepXML document

RefreshParser

RefreshParser interact.xml /full/path/database.fasta

Goes into database to find all proteins corresponding to identified peptides and overwrites results to interact.xml

EnzymeDigestionParser

EnzymeDigestionParser interact.xml (-Eenzyme)

Computes number of tolerable termini and number of missed cleavages in dataset using sample enzyme stored in interact.xml unless specified as argument.
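
To make the two quantities concrete, here is a minimal sketch in Python for trypsin only. It assumes the common rule that trypsin cleaves after K or R unless the next residue is P, and that '-' marks a protein terminus; the function names are hypothetical, and the rules EnzymeDigestionParser actually applies may differ.

```python
def is_tryptic_site(left, right):
    """True if trypsin would cut between residues 'left' and 'right'
    (assumed rule: after K or R, but not before P)."""
    return left in "KR" and right != "P"


def tolerable_termini(prev_aa, peptide, next_aa):
    """Count peptide termini (0 to 2) consistent with tryptic cleavage.

    prev_aa and next_aa are the flanking residues in the protein;
    '-' is assumed to mean the peptide sits at a protein terminus.
    """
    count = 0
    if prev_aa == "-" or is_tryptic_site(prev_aa, peptide[0]):
        count += 1
    if next_aa == "-" or is_tryptic_site(peptide[-1], next_aa):
        count += 1
    return count


def missed_cleavages(peptide):
    """Count internal sites where trypsin could have cut but did not."""
    return sum(
        is_tryptic_site(peptide[i], peptide[i + 1])
        for i in range(len(peptide) - 1)
    )
```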

PeptideProphetParser

PeptideProphetParser interact.xml (EXCLUDE) (LEAVE) (ICAT) (NOICAT) (ZERO) (GLYC) (MALDI) (MINPROB=xx)

Runs PeptideProphet with options and overwrites results to interact.xml

OPTIONS
EXCLUDE: exclude delta stars (SEQUEST)
LEAVE: leave delta star values alone (SEQUEST)
ICAT: use peptide icat info in probability calculation
NOICAT: do not use peptide icat info in probability calculation
ZERO: do not discard any data
GLYC: use peptide NXS/T motif info in probability calculation
MALDI: specify maldi spectra
PI: use pI information
ACCMASS: use accurate mass binning
MINPROB=xx: filter away results with a probability less than xx
EXTRAITRS=xx: specify additional EM iterations

XPressPeptideParser

XPressPeptideParser interact.xml (-b) (-n<str>,<num>) (-n<str>,<num>) (-n<str>,<num>) (-L or -H)

Runs XPRESS with options and overwrites results to interact.xml

OPTIONS
-m<num>: change XPRESS mass tolerance (default=1.0)
-l<str>: change labeled residues (default='C')
-r<num>: change XPRESS residue mass difference (default=9.0)
-n<str>,<num>: when specifying multiple isotopic labels, use this option, e.g. -nK,3.0 -nL,3.0
-b: heavy labeled peptide elutes before light labeled partner
-L: for ratio, set/fix light to 1, vary heavy
-H: for ratio, set/fix heavy to 1, vary light

ASAPRatioPeptideParser

ASAPRatioPeptideParser interact.xml (-b) (-l<str>) (-S) (-m<str>) (-F) (-C)

Runs ASAPRatio with options and overwrites results to interact.xml

OPTIONS
-l<str>: change labeled residues (default='C')
-b: heavy labeled peptide elutes before light labeled partner
-f<num>: areaFlag set to num (ratio display option)
-S: static modification quantification (i.e. each run is either all light or all heavy)
-F: use fixed scan range for light and heavy
-C: quantitate only the charge state where the CID was made
-m<str>: specified label masses (e.g. M74.325Y125.864), only relevant for static modification quantification

LibraPeptideParser

LibraPeptideParser interact.xml -clibra_condition.xml

Runs LIBRA using channel information specified in libra_condition.xml file and overwrites results to interact.xml

CompactParser

CompactParser file.xml

Compacts either pepXML or protXML, combining together start and end tags when no elements are contained between them
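
The transformation can be illustrated with a regular expression. This is a minimal sketch in Python, not the CompactParser implementation: the function name compact is hypothetical, only pairs with nothing at all between the tags are combined, and attribute values containing '<' or '>' are not handled.

```python
import re

# A start tag (name plus optional attributes) immediately followed by
# its matching end tag.
_EMPTY_PAIR = re.compile(r"<([A-Za-z_][\w:.-]*)(\s[^<>]*?)?\s*></\1\s*>")


def compact(xml_text):
    """Rewrite <tag attrs></tag> as <tag attrs/> wherever the pair is empty."""
    return _EMPTY_PAIR.sub(
        lambda m: f"<{m.group(1)}{m.group(2) or ''}/>", xml_text
    )
```

Elements that contain other elements are left untouched, since their start and end tags are not adjacent.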

Protein-level analyses

ProteinProphet.pl

ProteinProphet.pl '<interact pep prob html file1><interact pep prob html file2>....' <outfile> (ICAT) (GLYC) (XPRESS) (ASAP_PROPHET) (ACCURACY) (ASAP) (REFRESH) (DELUDE) (NOOCCAM)

OPTIONS
NOOCCAM: non-conservative maximum protein list
ICAT: highlight peptide cysteines
GLYC: highlight peptide N-glycosylation motif
ACCURACY: min pep prob 0
ASAP: compute ASAP ratios for protein entries (ASAP must have been run previously on interact dataset)
REFRESH: import manual changes to ASAP ratios (after initially using ASAP option)
ASAP_PROPHET: *New and Improved* compute ASAP ratios for protein entries (ASAP must have been run previously on all input interact datasets with mz/XML raw data format)
DELUDE: do NOT use peptide degeneracy information when assessing proteins
HTML: write output to static html page (rather than dynamic shtml)
Other options in conjunction with HTML
EXCELPEPS: write output tab delim xls file including all peptides
EXCELxx: write output tab delim xls file including all protein (group)s with minimum probability xx, where xx is a number between 0 and 1

XPressProteinRatioParser

XPressProteinRatioParser interact-prot.xml

Runs XPRESS and overwrites results to interact-prot.xml

ASAPRatioProteinRatioParser

ASAPRatioProteinRatioParser interact-prot.xml

Runs ASAPRatio and overwrites results to interact-prot.xml

LibraProteinRatioParser

LibraProteinRatioParser interact-prot.xml <normalization_channel>

Runs LIBRA using normalizing ratios to normalization_channel and overwrites results to interact-prot.xml

Wrappers

xinteract

xinteract (generaloptions) (-Oprophetoptions) (-Xxpressoptions) (-Aasapoptions) (-L<conditionfile>libraoptions) xmlfile1 xmlfile2 ....

options

generaloptions

-Nmyfile.xml

write output to file 'myfile.xml'

-nI

do not run Interact (convert to pepXML only)

-nP

do not run PeptideProphet

-nR

do not get all proteins corresponding to degenerate peptides from database

-p0

do not discard search results with PeptideProphet probabilities below 0.05

-x<num>

number of extra PeptideProphet iterations; default <num>=0

-p<num>

filter results below PeptideProphet probability <num>; default <num>=0.05

-mw

calculate protein molecular weights

-MONO

calculate monoisotopic peptide masses during conversion to pepXML

-AVE

calculate average peptide masses during conversion to pepXML

-eX

specify sample enzyme other than trypsin

-eC: specify sample enzyme = Chymotrypsin
-eA: specify sample enzyme = AspN
-eG: specify sample enzyme = GluC
-eB: specify sample enzyme = GluC Bicarb
-eM: specify sample enzyme = CNBr
-eD: specify sample enzyme = Trypsin/CNBr
-e3: specify sample enzyme = Chymotrypsin/AspN/Trypsin
-eE: specify sample enzyme = Elastase
-eL: specify sample enzyme = LysN (cuts before K)
-eP: specify sample enzyme = LysN Promisc (cuts before KASR)
-eN: specify sample enzyme = Nonspecific or None

For developers:

-t: run regression test against a previously derived result
-t!: learn results for regression test

prophetoptions (following the 'O')
i: use icat information in PeptideProphet
f: do not use icat information in PeptideProphet
g: use N-glyc motif information in PeptideProphet
m: maldi data
I: use pI information in PeptideProphet
A: use accurate mass binning in PeptideProphet
w: warn instead of exiting with an error if instrument types differ between runs
x: exclude all entries with asterisked score values in PeptideProphet
l: leave alone all entries with asterisked score values in PeptideProphet

p: run ProteinProphet afterwards
u: do not assemble protein groups in ProteinProphet analysis
s: do not use Occam's Razor in ProteinProphet analysis to derive the simplest protein list to explain observed peptides

xpressoptions (will run XPRESS analysis with any specified options that follow the 'X')
-m<num>: change XPRESS mass tolerance (default=1.0)
-l<str>: change labeled residues (default='C')
-n<str>,<num>: change XPRESS residue mass difference for <str> to <num> (default=9.0)
-b: heavy labeled peptide elutes before light labeled partner
-L: for ratio, set/fix light to 1, vary heavy
-H: for ratio, set/fix heavy to 1, vary light

asapoptions (will run ASAPRatio analysis with any specified options that follow the 'A')
-l<str>: change labeled residues (default='C')
-b: heavy labeled peptide elutes before light labeled partner
-f<num>: areaFlag set to num (ratio display option)
-S: static modification quantification (i.e. each run is either all light or all heavy)
-F: use fixed scan range for light and heavy
-C: quantitate only the charge state where the CID was made
-m<str>: specified label masses (e.g. M74.325Y125.864), only relevant for static modification quantification

libraoptions (will run Libra Quantitation analysis with any specified options that follow the 'L')
-<num>: normalization channel (for protein level quantitation)

examples

xinteract *.xml: combines data in all pepXML files into 'interact.xml', then runs PeptideProphet

xinteract -Ndata.xml *.xml: same as above, but results are written to 'data.xml'

xinteract -Ndata.xml -X -Op *.xml: same as above, but run XPRESS analysis in its default mode, then ProteinProphet

xinteract -X -A file1.xml file2.xml: combines together data in file1.xml and file2.xml into 'interact.xml' and then runs XPRESS (in its default mode) and ASAPRatio (in its default mode)

xinteract -X-nC,6.0 -A file1.xml file2.xml: same as above, but specifies that cysteine label has a heavy/light mass difference of 6.0

xinteract -X -A-lDE-S file1.xml file2.xml: same as above, but specifies for ASAP to run in static mode with labeled residues D and E

xinteract -Lmyconditionfile.xml-1 -Op file1.xml file2.xml: run Libra quantitation after PeptideProphet using myconditionfile.xml, and after ProteinProphet, normalizing ratios to channel 1 values
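
In these examples the sub-options for XPRESS, ASAPRatio and Libra are concatenated directly onto the -X, -A and -L switches with no spaces. The following is a minimal sketch in Python that assembles such command lines; the function name xinteract_command and its parameters are hypothetical, and the option order simply mirrors the examples above.

```python
def xinteract_command(inputs, output=None, xpress=None, asap=None,
                      libra=None, prophet=""):
    """Build an xinteract argument list.

    xpress / asap: list of sub-options such as ["-nC,6.0"]; an empty list
                   requests the analysis in its default mode, None skips it.
    libra:         (condition_file, normalization_channel) or None.
    prophet:       letters following -O, e.g. "p".
    """
    command = ["xinteract"]
    if output:
        command.append("-N" + output)
    if xpress is not None:
        command.append("-X" + "".join(xpress))
    if asap is not None:
        command.append("-A" + "".join(asap))
    if libra is not None:
        condition_file, channel = libra
        command.append(f"-L{condition_file}-{channel}")
    if prophet:
        command.append("-O" + prophet)
    command.extend(inputs)
    return command
```

When such a list is passed to a process launcher without a shell, wildcards like *.xml are not expanded, so the input files would need to be listed explicitly.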

runprophet

Runs ProteinProphet on designated 'interact.xml' file(s), where interact.xml is a pepXML-formatted output file from PeptideProphet.

usage 1: specify input file and options

runprophet -Ooptions <interact file (with probs)>

runs analysis on inputfile with specified options and writes analysis to inputfile-prot.htm

options
i: icat data (color Cysteines)
g: N-glycosylation data (color NXS/T)
m: multifiles (more than one interact.xml file, must specify outfile)
d: delude (do not look up ALL prots corresponding to degenerate peps)
l: use html input files (pre-TPP)
X: import XPRESS protein ratios
A: import ASAPRatio protein ratios and pvalues
L<num>: import Libra protein ratios normalized to channel <num>
a: import ASAPRatio results present in file (starting from scratch)
r: update changes made to ASAPRatio results (previously run using 'a' option)
n: don't use occam's razor for degenerate peps (get max prot list, including many false positives)
u: do not assemble PROTEIN GROUPS
z: do not include zero probability protein entries in output
H: writes results to static html file (and tab delimited excel file)
P: includes peptides in tab delimited excel file (must accompany 'H')
xx: includes results in tab delimited excel file with minimum probability xx, where xx is a number between 0 and 1 (must accompany 'H')

examples
runprophet -OiXA interact.xml: for icat data with mzXML XPRESS and ASAPRatio quantitation information

runprophet -Oia interact.xml: for icat data with non-mzXML ASAPRatio information

runprophet -Oi interact.xml: for icat data

runprophet -Og interact.xml: for N-glycosylated data

runprophet -Ol interact.htm: for pre-TPP html input file

usage 2: specify input file and use default options

runprophet <interact file (with probs)>

runs analysis on inputfile using default options and writes analysis to inputfile-prot.htm

example
runprophet interact.xml: writes output to file: interact-prot.shtml

usage 3: specify output file

runprophet (-Ooptions) <interact file (with probs)> <outputfile>

runs analysis on inputfile (with specified options) and writes analysis to specified outputfile

example

runprophet interact.xml protein.shtml: writes output to file: protein.shtml

usage 4: combine multiple datasets into a single analysis

runprophet -Om(options) <interactfile1 interactfile2 ...> <outputfile>

runs analysis on multiple inputfiles (with specified options) and writes analysis to specified outputfile

example: runprophet -Oim interact-1.xml interact-2.xml protein.shtml

analyzes interact-1.xml and interact-2.xml icat data together and writes output to file: protein.shtml

usage 5: options for static html output

example: runprophet -OHP0.9 interact.xml: writes results to static html file, and results with min prob 0.9 (including peptides) to tab delimited excel file
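
The option string after -O is a run of single letters, optionally ending in a minimum probability. The following is a minimal sketch in Python that composes it and enforces the "must accompany 'H'" rule from the options list; the function name runprophet_options and its parameters are hypothetical.

```python
def runprophet_options(flags="", static_html=False, include_peptides=False,
                       min_prob=None):
    """Compose the -O argument for runprophet.

    flags: other option letters, e.g. "iXA".
    The P letter and the minimum probability are only accepted together
    with H, as stated in the options list.
    """
    if (include_peptides or min_prob is not None) and not static_html:
        raise ValueError("P and the minimum probability must accompany H")
    options = flags
    if static_html:
        options += "H"
    if include_peptides:
        options += "P"
    if min_prob is not None:
        if not 0 <= min_prob <= 1:
            raise ValueError("minimum probability must be between 0 and 1")
        options += str(min_prob)
    return "-O" + options if options else ""
```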

Author

Much of this text comes from the TPP README file, originally written by A. Keller, 2004. Wiki version created by J. Tasman.

Credits

The RefreshParser program uses the SPARE Parts by Bruce W. Watson / Loek Cleophas.

Retrieved from "http://tools.proteomecenter.org/wiki/index.php?title=TPP:Developer_Documentation"
