TPP:4.3.0 Release Notes
Revision as of 20:39, 7 August 2009 by Ntasman (creating 4.3.0 release notes)
Trans-Proteomic Pipeline (TPP) 4.3.0 release notes
(Changes since the last official release, 4.2.1)
We are proud to offer a major update to the Trans-Proteomic Pipeline (TPP) software, release 4.3.0 (4.3 revision). The software is available for Windows and Linux, and unofficially for OS X (from source), from all the usual locations (please see the section below, "Getting the TPP software"). Most users should use the Windows installer, which installs and configures the TPP and other required software (such as a web server). Advanced users who need to customize the TPP, or who run high-throughput experiments, can download the source code. Again, most users don't need to build from source and should use the Windows installer.
TPP web interface (Petunia) updates
- allow recovery of timed-out jobs: the "Your commands are still running." display now has a link, "Click here if your commands have actually completed but browser timed out", which forces the session along to the results display.
- Added pages for ETD-related charge-guessing tools (createChargeFile.pl, mergeCharges.pl; see ETD section below) under new 'Pre-Process' section, and updateAllPaths.pl for moving files on the same system or between systems (under Utilities)
- Allow Libra to normalize on channels 1-8
- Updating TMT-6 isotopic correction factors. Values provided by R. Gundry
- performance fix for large gzipped files - only need to uncompress a little of the file to determine file content type
- when fileChooser is being used to select a search db, don't show files such as pepXML or mass spec data that we know aren't search databases
- Added 12/13C metabolic labeling option to XPress
- add intensity threshold option to MzXML2Search; Added default value of threshold 0.01
- add page for utility to fix and update paths within TPP files (for example, for files moved within the filesystem, or copied over from another computer)
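The gzipped-file performance fix above relies on reading only the start of a compressed file. A minimal Python sketch of that idea follows; it is not the TPP's own code, and the marker table, function name, and 4096-byte window are assumptions made for illustration.

```python
import gzip

# Hypothetical marker table: root-element names mapped to a type label.
MARKERS = {
    b"<mzXML": "mzXML",
    b"<mzML": "mzML",
    b"<msms_pipeline_analysis": "pepXML",
    b"<protein_summary": "protXML",
}

def sniff_gz_type(path, nbytes=4096):
    """Guess the content type of a gzipped file from its first bytes only."""
    with gzip.open(path, "rb") as fh:
        # Only enough of the stream to yield nbytes is decompressed.
        head = fh.read(nbytes)
    if head.lstrip().startswith(b">"):
        return "fasta"  # a FASTA search database starts with a '>' header
    for marker, label in MARKERS.items():
        if marker in head:
            return label
    return "unknown"
```

Because the read is bounded, the cost stays roughly constant however large the compressed file is.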
improved mzXML output, introducing 3.1
- more validating: XML output validates except for a tricky uniqueness constraint on the msInstrument (inherited from earlier mzXML schemas); (the key and keyref depend on optional elements, which may not be valid.)
- introduces optional possibleCharges (see ETD section, below)
improved build scripts
Automatic system configuration scripts have been added to improve building the TPP on the many varieties of linux.
- scripts are now available which install required packages for building the TPP on Ubuntu (8.04, 8.10, 9.04) and Centos (5.2), which hopefully cover the experience for most RedHat- and Debian-derived distributions
- updated Windows build notes
improved linux/mac build testing
- Code is now built on Ubuntu (8.04 32-bit, 8.10 32-bit, 9.04 64-bit), Centos (5.2 64-bit), and Mac OS X (10.5 64-bit) in order to test compilation.
improved support for deploying the TPP on Amazon Web Services EC2 ("cloud computing")
ETD support
- added charge state prediction software
- TPP's X!Tandem modified with new functionality for parsing optional mzXML 3.1 precursorMz "possibleCharges" attribute, a comma-separated list of additional possible precursor charges, as may come from ETD data. (Example: "2,4,7")
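As an illustration of the possibleCharges attribute described above, here is a minimal Python sketch that reads it from a precursorMz element. The XML fragment, its numeric values, and the function name are hypothetical; only the attribute name and the comma-separated format come from these notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical mzXML 3.1 fragment; the values are made up for illustration.
FRAGMENT = (
    '<precursorMz precursorIntensity="1500.0" precursorCharge="3" '
    'possibleCharges="2,4,7">523.2810</precursorMz>'
)

def precursor_charges(elem):
    """Return the primary charge (if any) followed by the additional possible charges."""
    charges = []
    primary = elem.get("precursorCharge")
    if primary:
        charges.append(int(primary))
    # possibleCharges is optional; a missing attribute yields no extra charges.
    for token in elem.get("possibleCharges", "").split(","):
        token = token.strip()
        if token and int(token) not in charges:
            charges.append(int(token))
    return charges

elem = ET.fromstring(FRAGMENT)
print(float(elem.text), precursor_charges(elem))
```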
Mayu
Mayu: adding Lukas Reiter's software to calculate decoy-estimated PSM, peptide, and protein FDR.
- Further notes available at the SPCtools wiki, http://tools.proteomecenter.org/wiki/index.php?title=Software:Mayu
- Mayu publication: "Protein identification false discovery rates for very large proteomics datasets generated by tandem mass spectrometry", http://www.mcponline.org/cgi/reprint/M900317-MCP200v1
MzXML2Search
output is identical to previous working version, except that odta and mgf behavior is changed, and possibleCharges (mzXML 3.1) is taken into account (note, tested.)
- mgf: since we may have more than 8 possible charges, we use the charge state in our TITLE convention, and PEPMASS is calculated from the assumed charge; rather than outputting a list of (fewer than 8) possible charges for one scan, one scan is output per possible charge state.
- odta: multiple charges are written out, not just the first (as was probably a mistake, previously.)
- added minimum peak intensity threshold option for all output modes:
-I<num> where num is a float specifying minimum threshold for peak intensity, default=0
- set default minimum intensity to 0.01 to avoid zero peaks in output
- bugfix: user-specified charge range was mishandled
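The mgf behavior and the intensity threshold described above can be sketched in Python as follows. This is not MzXML2Search's code: the TITLE layout, the function names, the choice to keep peaks at or above the threshold, and writing the observed precursor m/z as PEPMASS are all assumptions made for illustration.

```python
def filter_peaks(peaks, min_intensity=0.01):
    """Drop peaks below the threshold (assumption: peaks equal to it are kept)."""
    return [(mz, inten) for mz, inten in peaks if inten >= min_intensity]

def mgf_entries(base_name, scan, precursor_mz, charges, peaks, min_intensity=0.01):
    """Build one MGF block per possible charge state for a single scan."""
    kept = filter_peaks(peaks, min_intensity)
    blocks = []
    for z in charges:
        lines = [
            "BEGIN IONS",
            # Hypothetical TITLE layout carrying the assumed charge state.
            f"TITLE={base_name}.{scan:05d}.{scan:05d}.{z}",
            # Simplification: the observed precursor m/z is written as-is.
            f"PEPMASS={precursor_mz:.4f}",
            f"CHARGE={z}+",
        ]
        lines += [f"{mz:.4f} {inten:.2f}" for mz, inten in kept]
        lines.append("END IONS")
        blocks.append("\n".join(lines))
    return blocks

for block in mgf_entries("run1", 12, 523.281, [2, 4, 7],
                         [(100.0, 0.0), (200.5, 35.2)]):
    print(block, end="\n\n")
```

Writing one block per charge avoids any limit on how many charges a single entry can list, at the cost of repeating the peak list.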
new utility: add_mz
usage:
This program takes in a text file containing peptide sequence and charge state as the first 2 columns. Output appends additional columns of precursor m/z and number of NxS/T motifs. All Asn in NxS/T motifs are considered glycosylated (+0.984016). All Cys are +57.021464. Met are +15.994915 if indicated as modified in the sequence. Every other modification is ignored! Peptides must contain preceding and trailing amino acids.
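The calculation described above can be sketched in Python as follows. This is not add_mz's code. The "R.PEPTIDE.K" input layout, the "M*" marker for modified Met, and excluding Pro at the x position of the motif are assumptions; the three mass shifts come from the usage text, and the residue masses are standard monoisotopic values.

```python
import re

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276
GLYCO, CYS_MOD, MET_MOD = 0.984016, 57.021464, 15.994915

def precursor_mz(flanked, charge):
    """Return (precursor m/z, motif count) for a peptide written as 'R.PEPTIDE.K'."""
    _prev_aa, body, next_aa = flanked.split(".")
    n_met_mod = body.count("M*")          # hypothetical marker for modified Met
    seq = re.sub(r"[^A-Z]", "", body)     # every other annotation is ignored
    # A motif can end on the trailing residue, hence the flanking amino acids.
    motifs = len(re.findall(r"N(?=[^P][ST])", seq + next_aa))
    mass = sum(MONO[aa] for aa in seq) + WATER
    mass += motifs * GLYCO + seq.count("C") * CYS_MOD + n_met_mod * MET_MOD
    return (mass + charge * PROTON) / charge, motifs

mz, n_motifs = precursor_mz("R.NGSC.K", 2)
print(f"{mz:.4f}\t{n_motifs}")
```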
new utility: calculate_pi
This program will add a pI column to the file.
usage:
calculate_pi inputfile.txt
where inputfile.txt contains peptide sequences, one per row.
new utility: indexmzXML
This program will check an mzXML file's scan index, adding it if missing, or fixing and rewriting it if necessary.
improved utility: updateAllPaths
This program will fix and rewrite paths within TPP files. Useful for updating TPP files moved on the same computer, or from another computer. Can correctly handle unix to Windows path changes.
pepXMLViewer
- added switch to include B in glyco motif (as [NB][^P][ST])
- modified columns displayed to remove esoteric search engine scores if PeptideProphet probability is present
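To show what the glyco motif switch above changes, here is a short Python sketch using the pattern [NB][^P][ST] from the note. The function name is hypothetical, and the assumption that the default motif (switch off) is N[^P][ST] is ours; this is not pepXMLViewer's code.

```python
import re

# Lookaheads are used so that overlapping motifs are all reported.
MOTIF_DEFAULT = re.compile(r"(?=N[^P][ST])")      # assumed default motif
MOTIF_WITH_B = re.compile(r"(?=[NB][^P][ST])")    # pattern quoted in the note

def motif_positions(sequence, include_b=False):
    """Return the 0-based start positions of glyco motifs in a peptide sequence."""
    pattern = MOTIF_WITH_B if include_b else MOTIF_DEFAULT
    return [m.start() for m in pattern.finditer(sequence)]

print(motif_positions("ABGTNVSK"))
print(motif_positions("ABGTNVSK", include_b=True))
```

B is the ambiguity code for Asn or Asp, so including it also flags sites where the residue may be reported as either.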
ramp parser API
- added precursor possibleCharges parsing (originally in MzXML2Search), adds new ScanHeaderStruct members for this data
readmzXML utility
- add -s option to dump peaklist w/o scan header info to stdout for scans specified on command line.
- print out more precision for precursor m/z; remove pause between header & peaks for brief peak list option
ReAdW
- adding option to force precursor m/z determination only from Thermo "filter line"-- do not try to get more accurate mass. Only use this for a reason-- such as a (non-standard) method that is known to record the correct mass *only* in the filter line.
- add sanity check for files with rogue 'accurate' monoisotopic m/z values
trapper
- changes from David Horn of Agilent: fixes for QQQ machines with cycle summing. Technical explanation from D. Horn: # of scans vs # of points in TIC check was failing, as the chromatogram filter was summing cycles. For files with multiple scan methods per cycle, scans were combined; hence difference in chromatogram data points. Cycle summing is now turned off.
massWolf
- improved EXPERIMENTAL centroiding (peak-picking) code, graciously contributed by Dmitrii T.
X!Tandem
- new feature for ETD data: handling of multiple charges from mzXML 3.1 (see the ETD section above)
- compressed output: allow for direct production of gzipped xtandem results (just specify .tandem.gz as output filename extension)
SpectraST
- Enable library import from tab-delimited table
- Fix bug in create option -c_WGT (rarely used)
- Read base_name from msms_run_summary rather than search_summary for constructive alternative path
- Use activation_method and possible_charge when searching; refine decoy searching - consider 2nd annotation
- Add two SILAC mods: 13C(6)15N(3) and 13C(6)15N(1)
- Fix horrible mangled name bug when first amino acid is modified (introduced in r4335)
- Add MRM table format SHOWINFO which prints additional info about each transition (including the score)
- Add new HTML output format; Deprecate seldom-used and clumsy -s_SAV and -s_TGZ option; Add hidden option to output results to different directory
- Parse and use prevAA and nextAA from alternative_protein in pepXML import to maximize NTT
- Allow non-peptide IDs in TSV import; fix some seg faults when ID is non-peptide
- Option to compile without kwset (GPL)
- Read activation method (fragmentation type) and retention time also from mzXML file during library import
- Fix seg fault when searching mzXMLs with possibleCharge (e.g. with charge prediction for ETD)
- Add Ubl modifications; allow distinct mod tokens for isobaric mods (as [string])
- Open fewer files simultaneously during pepXML import; fix minor memory leak
- Enable precursor removal for ETD based on precursor charge state; improve ETD annotation
- Enable search of .sptxt files (treated as .msp files)
modelling
- Correct display error for prot xml file generated directly from the web interface to ProteinProphet results.
- One solution for cases of multiple search_hits with rank 1
- Compute and cache variable bandwidths to save time.
- Guard against division by zero when a datapoint falls exactly on the grid point.
- Tuning OMSSA bandwidth selection for more smoothing
- Tuning iProphet
- updates to handle multiple search_hits with rank 1 for a single spectrum query.
- Do prob adjustment for any number of charge states a spectrum is searched in. Don't assume that each spectrum of unknown charge is either 2+ or 3+ only; use all predicted charges.
- store a flag defining whether the given result is decoy or not.
- Fix bug leading to reporting of probabilities for failed or ignored charge states.
- Guard against inf loop when the initial stddev is zero, e.g. when all ids in the run are decoys.
- Fix inf loop bug caused by wrong array index being updated.
- Store confidence value.
- Extend max charge to +7.
- Adding Lys-C enzyme option to xinteract.
- Auto-reject any X containing peptide
- YABSE Search Engine support
- Tuning OMSSA models.
- Fix problem of un-iterated probabilities getting used.
- Recognize iProphet and show those probabilities when available.
- Show iProphet probabilities and results.
- Pass through analysis_summary tags that are not from PeptideProphet.
- Pass through iProphet probabilities and results when available
- Variable bandwidth selection.
- Tuning discrim val mixture distribution for inspect.
- add a return code to the RT model for auto-disabling the model when correlation is poor. Fix bug of RT model being computed but not used in final probability computation.
- More useful error message when parsing a search result breaks.
- Limit range of pI z-scores to model from -10 to 10
other bugfixes and improvements
- CID Spectrum Display cgi: fix a problem with mingw using web path instead of filesystem path for pngfile
- Out2XML: now tolerant of variable number of header lines in .out files
- ProteinProphet: fixed a divide-by-zero error caused by use of tab+space instead of space after protein name in fasta file (tab was read as part of fasta protein name)
- Sequest2XML: slightly more sophisticated sense of filename extension (so it reports .mzML.gz instead of just .gz)
- getdb.swissprot utility: updated database download link
- libra: more sophisticated sense of mzML/mzXML filetypes to deal with possibility of .gz
- XPRESS: added 13C metabolic labeling support using -O and -P options
- runsearch: more sophisticated sense of mzML/mzXML filetypes to deal with possibility of .gz
- subsetdb: add option to read match/no-match accession strings from files
- tandem2xml: accept gzipped input files
- xinteract: don't do pepxml indexing when running headless (as in CPAS usage)
- building: boost library updated from 1.35.0 to 1.39.0. Includes improved 64-bit support. Tested on Centos 5.2 64-bit and Ubuntu 9.04 64-bit.
- PeptideProphet: Make tool more robust by auto-trimming whitespace from spectrum names. Tandem searches submitted by some users, generated with the method mzXML->mgf->tandem->pepXML, were previously getting an extra space appended to the end of the spectrum name.
- ProteinProphet: Fix bug of indistinguishable_peptides being reported wrong with bad charge and first 2 chars stripped off.
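Several items above (Sequest2XML, libra, runsearch) mention a more careful sense of file extensions when data files may be gzipped. A minimal Python sketch of that idea follows; the function name is hypothetical and this is not the TPP's implementation.

```python
import os

def data_extension(filename):
    """Return the full data extension, e.g. '.mzML.gz' rather than just '.gz'."""
    root, ext = os.path.splitext(filename)
    if ext.lower() == ".gz":
        # Look one level further for the real data type under the .gz suffix.
        inner = os.path.splitext(root)[1]
        return inner + ext
    return ext

for name in ("run1.mzML.gz", "run1.mzXML", "archive.gz"):
    print(name, "->", data_extension(name))
```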
known issues
- mzWiff: charge state information is not recorded in output mzXML
- massWolf: charge state information is not recorded in output mzXML
- trapper: if dual-mode file is processed, the resulting file is possibly always converted as centroid (even if centroid is not selected.)
Getting the TPP software
- Download the TPP version 4.3.0 native windows installer (TPP_Setup_v4_3_JETSTREAM_rev_0.exe) from the Sashimi SourceForge project file release page:
"http://sourceforge.net/projects/sashimi/files/"
- Everyone is encouraged to read and contribute to our wiki, at
"http://tools.proteomecenter.org/wiki/"
- For guides to installing and using our software, please see our wiki:
"http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP"
- For downloading the source code, please go to the following link:
"http://sourceforge.net/projects/sashimi/files/" and find the 4.3.0 source code .zip package;
or, check out the code directly from svn:
"svn co http://sashimi.svn.sourceforge.net/svnroot/sashimi/tags/release_4-3-0"
For building from source, please refer to the readme file in TPP/src as well as the wiki.
The TPP Team: Luis, David, Brian, and Natalie, plus all other developers who contributed to this release from the ISB. Thanks to developers and users from the TPP's user community who provided feedback and code contributions.