Software:SpectraST

From SPCTools

Revision as of 18:52, 16 June 2008
Henrylam (Talk | contribs)
(SpectraST File List Feature)

SpectraST (short for "Spectra Search Tool" and rhymes with "contrast") is a spectral library building and searching tool designed primarily for shotgun proteomics applications. It is developed at the Institute for Systems Biology (ISB), in the research group of Professor Ruedi Aebersold. The main developer is Henry Lam.

The latest version of SpectraST is 3.0, to be released sometime in 2008, pending the publication of a manuscript. It is distributed by ISB under the LGPL license, as a component of the Trans Proteomic Pipeline (TPP) suite of software, which is distributed under the same license. The source code repository is at [1], and the official download site for the Windows installer is at [2]. Currently, the released version (2.0) with TPP can only perform library searching and does not have library creation functionalities.

Introduction to Shotgun Proteomics and Spectral Searching

The goal of proteomics is the systematic identification and quantification of all proteins in a biological system. In one of the most frequently practiced workflows, commonly known as shotgun proteomics, a protein sample of interest is first digested with a proteolytic enzyme (trypsin being the most common) to yield peptides that are amenable to LC-MS/MS analysis. The peptides in the resulting mixture are chromatographically resolved, ionized by techniques such as electrospray ionization (ESI) or matrix-assisted laser desorption ionization (MALDI) before being analyzed by a mass spectrometer. A fraction of the peptide ions are selectively isolated by the mass spectrometer and subjected to collision-induced dissociation (CID), in which the peptide ions are bombarded with noble gas atoms to induce fragmentation. (Other types of fragmentation techniques are also rapidly maturing.) The fragment ions are detected and reported by the mass spectrometer as tandem mass (MS/MS) spectra. Because peptide ions tend to fragment mostly along the peptide backbone in a somewhat predictable manner, the MS/MS spectra contain information that can be used to deduce the peptide sequence.

Traditionally, the inference of the peptide sequence from its characteristic tandem mass spectra is done by sequence (database) searching. In sequence searching, a target protein (or translated DNA) database is used as a reference to generate all possible putative peptide sequences by in silico digestion. The search engines then use various rules to predict the theoretical fragmentation pattern of each of these putative peptides, and compare the experimentally observed MS/MS spectra to these theoretical spectra one-by-one. Presumably, a positive identification is made if the experimental spectrum is sufficiently similar to one of the theoretical spectra. Several popular computational tools developed for this purpose have emerged over the years, each employing different algorithms and heuristics to achieve an acceptable balance of sensitivity and accuracy. Unfortunately, traditional sequence searching is a challenging, error-prone, and computationally expensive exercise. Despite the tremendous improvement in computer hardware and software over the past decade, this step often remains the bottleneck of any given proteomics experiment. The requirement of computational resources is also substantial, limiting the use of this powerful technique to only those research groups that can afford the costly computational infrastructure.

Spectral searching is an alternative approach that promises to address some of the shortcomings of sequence searching. In spectral searching, a spectral library is meticulously compiled from a large collection of previously observed and identified peptide MS/MS spectra. The unknown spectrum can then be identified by comparing it to all the candidates in the spectral library for the best match. This approach has been commonly employed for mass spectrometric analysis of small molecules with great success, but has only become possible for proteomics very recently. The chief difficulty, that of generating enough high-quality experimental spectra for compilation into spectral libraries, has been overcome by the recent explosion of proteomics data and the availability of public data repositories. Several attempts at creating and searching spectral libraries in the context of proteomics have been published within the past year, all demonstrating the tremendous improvement in search speed and the great potential of this method in complementing, if not replacing, sequence searching in many proteomics applications.
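The matching step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SpectraST's actual scoring: the normalized dot product used here is an assumed stand-in similarity measure, the dictionary layout of a library entry (`precursor_mz`, `peaks`, `name`) is hypothetical, and the 3.0 Th tolerance simply mirrors the default precursor m/z tolerance listed in the options table.

```python
import math


def normalized_dot(a, b):
    """Cosine similarity between two binned spectra given as {bin: intensity}."""
    shared = set(a) & set(b)
    num = sum(a[k] * b[k] for k in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return num / den if den else 0.0


def best_match(query_mz, query_peaks, library, tol=3.0):
    """Return (score, name) of the best candidate within tol Th, or None.

    `library` is a list of dicts with the hypothetical keys
    'name', 'precursor_mz' and 'peaks' (a {bin: intensity} mapping).
    """
    # Candidate selection: only entries whose precursor m/z is close enough.
    candidates = [e for e in library if abs(e["precursor_mz"] - query_mz) <= tol]
    if not candidates:
        return None
    # Scoring: compare the query against every candidate, keep the best.
    return max((normalized_dot(query_peaks, e["peaks"]), e["name"]) for e in candidates)
```

Because only library entries near the query's precursor m/z are ever scored, the number of comparisons per query stays small, which is the source of the speed advantage discussed in the next section.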

Advantages of Spectral Searching

1. Speed

Spectral searching benefits from a much reduced search space compared to sequence searching. In spectral searching, only peptide ions that are observed and identified in previous experiments will be included in spectral libraries and considered as candidates, whereas in sequence searching, all putative peptide sequences -- plus all permutations of post-translational modification sites, if specified -- in a protein database are considered. Most of these putative peptide ions considered in sequence searching are never observed in practice for a variety of reasons. With typical search parameters, the search space of spectral searching can be several orders of magnitude smaller. It is therefore not surprising that spectral searching can also be several orders of magnitude faster. SpectraST can achieve a top speed of 0.001 to 0.01 second per query spectrum (against a library of about 50,000 entries) on a modern personal computer. In contrast, SEQUEST, one of the most popular sequence search engines, needs about 5 to 20 seconds per query spectrum (against a human IPI database).

2. Preciseness

Spectral searching compares experimental spectra to experimental spectra; sequence searching compares experimental spectra to theoretical spectra. In general, the theoretical spectra considered in sequence searching are very simplistic (e.g., only including b- and y-type ions, at a fixed intensity), and do not resemble the experimental spectra that they are supposed to match. On the other hand, armed with previously observed experimental spectra compiled into spectral libraries, spectral searching can take full advantage of all spectral features, including actual peak intensities, neutral losses from fragments, and various uncommon or even unknown fragments, to determine the best match. The similarity scoring of spectral searching is therefore more precise, and will generally provide better discrimination between good and bad matches. This usually results in much superior statistics (e.g., sensitivity, false discovery rates) for the search results, compared to sequence searching.

Versions

What's new in SpectraST 3.0

  • Creating libraries from sequence search results
  • Library manipulation
    • Union/Intersect/Subtract operations
    • Consensus/Best-replicate library building
    • Filtering based on criteria
    • Quality filters
  • Importing libraries from X!Hunter and BiblioSpec formats
  • File list feature
  • Logging
  • Lib2HTML utility for visualizing library
  • Monoisotopic mass support
  • Various bug fixes and performance enhancements

What’s new in SpectraST 2.0

  • Binary library format, enabling speed gain
  • Library information and statistics in preambles of .sptxt and .pepidx files
  • Searching of .dta files
  • Detecting homologs in hit list
  • Various bug fixes and performance enhancements

User's Guide

Installing SpectraST

SpectraST is an integral component of the Trans Proteomic Pipeline suite of software. Although it can be used alone without other TPP components, SpectraST users are strongly encouraged to download and install the entire TPP suite, which provides other useful functionalities such as raw data importation, automatic validation of search results, protein inference, and quantification and visualization.

Windows users: SpectraST is available as part of TPP for Windows, which is run in the cygwin (UNIX emulator) environment. Download the cygwin installer and follow installation instructions.

UNIX/LINUX users: Visit the Sashimi project page on SourceForge.net, and download the code as a tarball directly. Compiling, installation and configuration information is available in the README file.

Running SpectraST

SpectraST has two modes, the Create mode and the Search mode. In the former, SpectraST creates a searchable spectral library from various formats to prepare for searching. In the latter, SpectraST takes in unknown spectra and searches each of them against the spectral library.

The simplest way of running SpectraST is from the command line of your UNIX/LINUX or cygwin shell. The general usage is:

spectrast <options> <list of files of appropriate formats>

Options must be separated by spaces, and all begin with a hyphen ('-'). Search mode options always have an 's' following the hyphen; Create mode options have a 'c'. SpectraST will perform the appropriate action based on the options specified, and complain when there are problems interpreting the command statement. The usage statement and a list of options can be viewed by issuing the command spectrast by itself.

Once TPP is installed, SpectraST can also be run from the Petunia web interface, with limited options.

SpectraST Search Mode

SpectraST can perform spectral searching from the following data formats:

  • .mzXML (all versions) format
  • .mzData format
  • .dta (SEQUEST) format, a simple peak list preceded by precursor information
  • NIST (National Institute of Standards and Technology)’s .msp format

It requires a spectral library in SpectraST’s .splib format, which can be created in SpectraST Create Mode.

The results can be written in the following formats:

  • .pepXML format
  • .txt format, a fixed-width column text format
  • .xls format, a tab-delimited column text format

The search mode is initiated with the option -s, or any of the search mode options. For instance, to search the MS/MS spectra in the file foo.mzXML against the spectral library bar.splib, using the parameters specified in the file spectrast.params, the command is simply:

spectrast -sFspectrast.params -sLbar.splib foo.mzXML

Note: If the library is not specified in the parameter file or if the parameter file is not given, then the option -sL is mandatory; otherwise SpectraST will not know which spectral library to use.

In the above, -sF and -sL are search mode options that the user can specify to customize the behavior of SpectraST. The result will be written to a file named foo.<ext> in the same directory, where <ext> specifies the output format (.xml, .txt or .xls).

For a full list of options, see SpectraST Options.

SpectraST Create Mode

Importing Existing Libraries

SpectraST can create a searchable spectral library from the following formats:

  • NIST (National Institute of Standards and Technology)'s .msp format (Download here)
  • X!Hunter's .hlf format [3]
  • BiblioSpec’s .ms2 format [4]

If files with these extensions are supplied, SpectraST simply converts those spectral libraries into a form suitable for SpectraST searches (.splib format). (Note, however, that there is no study on how well SpectraST works with X!Hunter and BiblioSpec libraries.) For instance, to import the NIST yeast consensus library, call the resulting library bar.splib and put it in the directory /dir/, the command is:

spectrast -cN/dir/bar yeast_consensus.msp

When it is done, it produces 5 files in the directory /dir/. The file bar.splib is the library itself; it’s in a binary (machine-readable) format. The file bar.sptxt is a text (human-readable) version of bar.splib. This .sptxt file is of no use to SpectraST; it can be deleted after manual inspection. The files bar.spidx and bar.pepidx are indices on the precursor m/z value and peptide, respectively. Keep the indices and the .splib file in the same directory for SpectraST to function properly. Lastly, a file spectrast.log is also created to document the command executed. Some useful information about the library is printed at the beginning of the bar.sptxt and bar.pepidx.
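As a quick sanity check after a Create-mode run, the file set described above can be verified with a small script. This is a minimal sketch: the helper names are made up for illustration, and the file extensions and the requirement that the .splib file and both indices share a directory are taken from the paragraph above.

```python
from pathlib import Path


def expected_library_files(output_base):
    """List the five files a Create-mode run is described as producing.

    `output_base` is the value given to -cN, e.g. "/dir/bar".
    """
    base = Path(output_base)
    # Append extensions to the name directly, since base names may contain dots.
    files = [base.with_name(base.name + ext) for ext in (".splib", ".sptxt", ".spidx", ".pepidx")]
    files.append(base.parent / "spectrast.log")
    return files


def missing_search_files(output_base):
    """Return the files needed for searching that are not on disk.

    Only the .splib library and its two indices are checked; the .sptxt
    and log files are described as not needed by SpectraST itself.
    """
    base = Path(output_base)
    required = [base.with_name(base.name + ext) for ext in (".splib", ".spidx", ".pepidx")]
    return [p for p in required if not p.exists()]
```

For example, `missing_search_files("/dir/bar")` returning an empty list would indicate that bar.splib, bar.spidx and bar.pepidx are all present in /dir/.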

For a full list of SpectraST options, see SpectraST Options.

Creating Libraries from Sequence Search Results

Note: As per TPP convention, the spectrum query must be named:

<mzXML file name>.<start scan>.<end scan>.<charge>

in the .pepXML file, so that SpectraST knows where to find the corresponding experimental spectrum. (If the .pepXML file is created with TPP tools, this should not be an issue.)
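The naming convention above can be checked or decoded with a short helper. The sketch below is illustrative only: the function and class names are hypothetical, and deriving the spectrum file name by appending ".mzXML" to the base name is an assumption based on the convention as stated.

```python
from typing import NamedTuple


class SpectrumQuery(NamedTuple):
    base_name: str   # mzXML file name without extension
    start_scan: int
    end_scan: int
    charge: int


def parse_query_name(name):
    """Split '<mzXML file name>.<start scan>.<end scan>.<charge>' into fields.

    The split is done from the right, so a base name that itself
    contains dots is preserved. Raises ValueError on malformed names.
    """
    base, start, end, charge = name.rsplit(".", 3)
    return SpectrumQuery(base, int(start), int(end), int(charge))


def mzxml_file_for(name):
    """Assumed location of the experimental spectrum for a query name."""
    return parse_query_name(name).base_name + ".mzXML"
```

For instance, the query name `foo.00123.00123.2` would be read as scan 123 of foo.mzXML with precursor charge 2.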

SpectraST can create a spectral library from a .pepXML file, which contains peptide identifications from a previous shotgun proteomics experiment. For this purpose, it is preferable that the .pepXML has been processed with PeptideProphet, such that all the search hits have probabilities assigned. When importing from a .pepXML file, SpectraST scans through the .pepXML file for confident identifications, and attempts to extract the corresponding experimental spectra from .mzXML files. For instance, the command

spectrast -cNraw -cP0.9 dataset1.xml

will import all peptide identifications with probability at or above 0.9 from the file dataset1.xml, and put them in a library called raw.splib (with the accompanying raw.sptxt, raw.spidx and raw.pepidx files).

For a full list of SpectraST options, see SpectraST Options.

Manipulating SpectraST Libraries

SpectraST can convert one or more .splib libraries to another, performing various operations. For instance, to create a consensus library from all the entries in bar.splib and foo.splib, the command is:

spectrast -cNconsensus -cJU -cAC bar.splib foo.splib

SpectraST will take the union (specified by the option -cJU) of all the entries in bar.splib and foo.splib, and wherever a certain peptide ion is present as multiple entries (replicates), it will coalesce the replicates into a single consensus spectrum (specified by -cAC).

Some additional examples:

spectrast -cNphospho -cf"Mods =~ Phospho" bar.splib

This will screen the library bar.splib for all entries with a phosphorylation modification, and put the phosphopeptides in the library phospho.splib.

spectrast -cNcommon -cJI dataset1.splib dataset2.splib

This will take the intersection of the two libraries dataset1.splib and dataset2.splib, and put all entries of peptide ions that are seen in both files in the library common.splib.
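The combine actions (union, intersection and subtraction, selected with -cJU, -cJI and -cJS) behave like set operations keyed on the peptide ion. The following sketch models a library as a plain dictionary from a peptide ion string such as "PEPTIDEK/2" to a list of spectra; that in-memory representation is an assumption made for illustration and is unrelated to the actual .splib format.

```python
def union(*libs):
    """Include all peptide ions from all libraries, pooling replicates."""
    out = {}
    for lib in libs:
        for ion, spectra in lib.items():
            out.setdefault(ion, []).extend(spectra)
    return out


def intersection(*libs):
    """Keep only peptide ions present in every library."""
    common = set(libs[0]).intersection(*libs[1:])
    return {ion: [s for lib in libs for s in lib[ion]] for ion in sorted(common)}


def subtraction(first, *others):
    """Keep peptide ions of the first library absent from all the others."""
    excluded = set().union(*others)
    return {ion: list(spectra) for ion, spectra in first.items() if ion not in excluded}
```

Note that after a union or intersection a peptide ion may carry several replicate spectra; a build action such as -cAC or -cAB is what reduces them to a single entry.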

spectrast -cNquality -cAQ -cL2 bar.splib

This will apply SpectraST’s quality filters to the library bar.splib; only those entries that pass the first 2 quality filters will be included in the library quality.splib.

For a full list of SpectraST options, see SpectraST Options. For a typical recipe for creating consensus libraries from sequence search results, see Creating Consensus Libraries.

Creating Consensus Libraries

A recipe for creating consensus libraries from TPP-processed sequence search results is detailed here. Consider the following example:

Dataset Identifier | pepXML Files | mzXML Files
Alpha | A-SEQ.xml (SEQUEST results of A1.mzXML), A-MAS.xml (Mascot results of A1.mzXML) | A1.mzXML
Beta | B1.xml (SEQUEST results of B1.mzXML), B2.xml (SEQUEST results of B2.mzXML) | B1.mzXML, B2.mzXML
Gamma | G.xml (combined SEQUEST results of all .mzXML files) | G1.mzXML, G2.mzXML, G3.mzXML

Note: Alternatively, the library building recipe can be encoded in a recipe.list file (see SpectraST File List Feature):
? -cNrawA -cnAlpha
A-SEQ.xml
A-MAS.xml
? -cNrawB -cnBeta
B1.xml
B2.xml
? -cNrawG -cnGamma
G.xml
? -cJU -cAC -cNconsABC
rawA.splib
rawB.splib
rawG.splib
? -cAQ -cNconsABC_Q
consABC.splib
The command spectrast recipe.list will complete the entire library building procedure.

The following commands should be issued in succession:

1. Importing the raw spectra into SpectraST
spectrast -cNrawA -cnAlpha A-SEQ.xml A-MAS.xml
spectrast -cNrawB -cnBeta B1.xml B2.xml
spectrast -cNrawG -cnGamma G.xml

These commands will create the raw libraries rawA.splib, rawB.splib and rawG.splib. Identifications from multiple .pepXML files of the same dataset are imported with the same dataset identifier. The same query with identifications from multiple search engines will be combined intelligently. The probability threshold at or above which identifications are imported can be specified with the option -cP<prob>, which defaults to 0.9. At this stage, replicates of the same peptide ion identification are not yet coalesced into a consensus spectrum. Remember that the .mzXML files must be in the same directories as their corresponding .pepXML files.

2. Creating a consensus spectral library
spectrast -cJU -cAC -cNconsABC raw*.splib

This will combine the three raw libraries, then replace multiple replicates of the same peptide ion identification with a consensus spectrum. Many options are available to fine-tune the algorithm; however, the default parameters are usually adequate.
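One of the tunable parameters, the peak quorum (option -c_QUO, default 0.6), requires a peak to appear in a given fraction of the replicates before it is kept. The sketch below illustrates only that voting idea under simplifying assumptions: each replicate is a hypothetical {m/z bin: intensity} dictionary, peaks are considered the same when they fall in the same bin, and surviving intensities are plainly averaged. SpectraST's real consensus algorithm is not specified here and may align and weight peaks differently.

```python
from collections import defaultdict


def consensus_peaks(replicates, quorum=0.6):
    """Keep peaks seen in at least `quorum` of the replicates.

    `replicates` is a list of {mz_bin: intensity} dicts. The result maps
    each surviving bin to the mean intensity over the replicates that
    actually contain it.
    """
    if not replicates:
        return {}
    needed = quorum * len(replicates)
    seen = defaultdict(list)
    for rep in replicates:
        for mz_bin, intensity in rep.items():
            seen[mz_bin].append(intensity)
    return {
        mz_bin: sum(values) / len(values)
        for mz_bin, values in sorted(seen.items())
        if len(values) >= needed
    }
```

With three replicates and the default quorum, a peak must occur in at least two of them to survive, so a peak seen only once is treated as noise and dropped.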

3. Performing quality control of the consensus spectral library
spectrast -cAQ -cNconsABC_Q consABC.splib

This will run the consensus spectra through SpectraST's quality filters. With the default settings, spectra failing either or both of the first 2 filters will be removed, and spectra failing any of the other filters will be marked. Different quality levels can be set with the options -cL and -cl. It is recommended that a consensus spectral library be subjected to some quality control before it is used in spectral searching, in order to minimize mis-identified and low-quality spectra in the library. These questionable spectra can propagate errors from sequence searching, reduce the discriminating power of the spectral search engine, and induce false positive and false negative hits. The optimal quality level reflects the user's desired compromise between library comprehensiveness and library quality.

SpectraST Parameter File

Note: All options set in the parameter file will be overridden by command-line options, if specified.

SpectraST allows the use of parameter files to simplify the process of spectral library building and searching. Namely, desired options can be specified in a text file and supplied to SpectraST every time the same action is performed, saving the user from having to specify a lengthy list of command-line options. To invoke the parameter files, specify the options -sF<parameter file> and -cF<parameter file> for Search Mode and Create Mode, respectively. Example parameter files are provided below (these are essentially the defaults):

Search Mode: spectrast.params

Create Mode: spectrast_create.params

SpectraST File List Feature

SpectraST allows the user to list the files to be processed in a text file with extension .list. This can be useful when the number of files to be processed is very large, possibly overwhelming the UNIX command line. It is also an easy way to queue up multiple SpectraST tasks and to keep track of them. For example, if the file job.list contains the lines:

# This is a comment line ignored by SpectraST.
? -sLfoo.splib   # '?' signals the start of a new job; options for this job follow the '?'
1.mzXML
2.mzXML

? -sLbar.splib
3.mzXML
4.mzXML
Note: One can mix Search jobs and Create jobs in the same .list file. Command-line options will be overridden by those specified in the .list file with lines preceded by ‘?’.

Then running the command:

spectrast -sFspectrast.params job.list

is equivalent to running

spectrast -sFspectrast.params -sLfoo.splib 1.mzXML 2.mzXML

followed by

spectrast -sFspectrast.params -sLbar.splib 3.mzXML 4.mzXML
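The expansion from a .list file to individual commands can be mimicked with a short parser, which may help when checking a long job file before running it. This is a sketch based only on the rules shown in the example above ('#' starts a comment, '?' starts a new job whose options follow on the same line, other non-empty lines are input files). The function name is hypothetical, file names containing '#' and quoted options containing spaces are not handled, and files listed before the first '?' line are assumed to form a job with no extra options.

```python
def expand_job_list(text, cli_options=()):
    """Turn the contents of a .list file into equivalent command lines."""
    jobs = []  # each job is (options, files)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("?"):
            jobs.append((line[1:].split(), []))  # start a new job
        elif jobs:
            jobs[-1][1].append(line)  # input file for the current job
        else:
            jobs.append(([], [line]))  # file listed before any '?' line
    return [
        " ".join(["spectrast", *cli_options, *options, *files])
        for options, files in jobs
    ]
```

Calling `expand_job_list(open("job.list").read(), ["-sFspectrast.params"])` on the example file reproduces the two equivalent commands shown above.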

SpectraST Options

Commonly used options are shown in bold. The rest are advanced options that should rarely need to be used.

Search Mode Options
Command-line Token | Name in Parameter File | Meaning | Remarks

GENERAL OPTIONS
-s | None | Specify search mode. | Not needed when any other search options are set.
-sF<file> | None | Read search options from <file>. | If <file> is omitted, "spectrast.params" is assumed.
-sL<file> | libraryFile | Specify library file. | Mandatory unless specified in parameter file. <file> must have .splib extension.
-sD<file> | databaseFile | Specify a sequence database file. This will not affect the search in any way, but this information will be included in the output for any downstream data processing. | <file> must have .fasta extension. If not set, SpectraST will try to determine this from the preamble of the library.
-sT<type> | databaseType | Specify the type of the sequence database file. | -sTAA (default) = protein database; -sTDNA = genomic database.
-sR | indexCacheAll | Cache all entries in RAM. Requires a lot of memory (the library will usually be loaded almost in its entirety), but speeds up search for unsorted queries. | Turn on with -sR, off with -sR!. Default is off.
-sS<file> | filterSelectedListFileName | Only search a subset of the query spectra in the search file. Only query spectra with names matching a line of <file> will be searched. | Default is off (search all queries).

CANDIDATE SELECTION AND SCORING OPTIONS
-sM<tol> | indexRetrievalMzTolerance | Specify precursor m/z tolerance in Th. Monoisotopic mass is assumed. | Default is 3.0 Th.
-sA | indexRetrievalUseAverage | Use average mass instead of monoisotopic mass. | Turn on with -sA, off with -sA!. Default is off.
-sC<type> | expectedCysteineMod | Specify the expected kind of cysteine modification. Candidate library entries with a wrong kind of cysteine modification will be ignored. | -sCICAT_cl = cleavable ICAT; -sCICAT_uc = uncleavable ICAT; -sCCAM = carbamidomethyl. Default is off (search all candidates).
-sc | ignoreSpectraWithUnmodCysteine | Ignore any candidate library entries with an unmodified cysteine. | Turn on with -sc, off with -sc!. Default is off.
-s_HOM<rank> | detectHomologs | Detect homologous lower hits up to <rank>. Looks for lower hits homologous to the top hit and adjusts delta accordingly. | Default is 4.
-s_NO1 | ignoreChargeOneLibSpectra | Ignore all library entries with +1 charge state. | Turn on with -s_NO1, off with -s_NO1!. Default is off.
-s_NOS | ignoreAbnormalSpectra | Ignore all spectra which have non-Normal status. | Turn on with -s_NOS, off with -s_NOS!. Default is off.

OUTPUT AND DISPLAY OPTIONS
-sE<ext> | outputExtension | Output format. The search result will be written to a file with the same base name as the search file, with extension <ext>. | -sEtxt = fixed-width text format; -sExls = tab-delimited text format; -sExml (default) or -sEpepXML = .pepXML format.
-s_FV1<thres> | hitListTopHitFvalThreshold | Minimum F value threshold for the top hit. Only top hits having F value greater than <thres> will be printed. | Default is 0.0 (all top hits will be displayed).
-s_FV2<thres> | hitListLowerHitsFvalThreshold | Minimum F value threshold for the lower hits. Only lower hits having F value greater than <thres> will be printed. | Default is 0.45.
-s_SHH | hitListShowHomologs | Always display homologous lower hits regardless of F value. | Turn on with -s_SHH (need -s_HOM on), off with -s_SHH!. Default is on.
-s_SH1 | hitListOnlyTopHit | Only display the top hit for each query. | Turn on with -s_SH1, off with -s_SH1!. Default is on.
-s_SHM | hitListExcludeNoMatch | Do not display queries for which there is no candidate, or the top hit is below the minimum F value threshold specified with -sV. | Turn on with -s_SHM, off with -s_SHM!. Default is on.
-s_SAV | saveSpectra | Save query and matched library spectra as .msp files. | Turn on with -s_SAV, off with -s_SAV!. Default is off.
-s_TGZ | tgzSavedSpectra | Archive the saved query and matched library spectra as .tgz files to save space. | Turn on with -s_TGZ (need -s_SAV on), off with -s_TGZ!. Default is off.

SPECTRUM FILTERING AND PROCESSING OPTIONS
-s_XNP<thres> | filterMinPeakCount | Require minimum number of peaks. All query spectra with fewer than <thres> peaks passing the intensity threshold set with -sP will be removed. | Default is 10.
-s_XMZ<m/z> | filterAllPeaksBelowMz | Remove spectra with almost no peaks above a certain m/z value. All query spectra with 95%+ of the total intensity below <m/z> will be removed. | Default is 520.
-s_XIN<inten> | filterMaxIntensityBelow | Filter query spectra with no peaks with intensity above <inten>. | Default is 0.
-s_CNT<thres> | filterCountPeakIntensityThreshold | Minimum peak intensity for peaks to be counted. Only peaks with intensity above <thres> will be counted to meet the requirement for minimum number of peaks. | Default is 2.01.
-s_RNT<thres> | filterRemovePeakIntensityThreshold | Noise peak threshold. All peaks with intensities below <thres> will be zeroed. | Default is 2.01.
-s_R51<thres> | filterRemoveHuge515Threshold | Remove dominant peak at 515.3 Th. All dominant peaks near 515.3 Th (with intensity greater than <thres> of the total intensity of the spectrum) will be zeroed. | Default is off. Dominant 515.3 Th peaks are a common impurity artifact in cleavable ICAT experiments.
-s_RNP<num> | filterMaxPeaksUsed | Remove all but the top <num> peaks in query spectra. | Default is 150.
-s_RDR<num> | filterMaxDynamicRange | Remove all peaks smaller than 1/<num> of the base (highest) peak in query spectra. | Default is 1000.
-s_MZS<mzpow>, -s_INS<intpow> | peakScalingMzPower, peakScalingIntensityPower | Intensity scaling power with respect to the m/z value and the raw intensity. The scaled intensity will be (m/z)^<mzpow> * (raw intensity)^<intpow>. | Default is <mzpow> = 0.0, <intpow> = 0.5.
-s_UAS<factor> | peakScalingUnassignedPeaks | Scaling factor for unassigned peaks in library spectra. Unassigned peaks in the library spectra will be scaled by <factor>. | Default is 0.1.
-s_BIN<num> | peakBinningNumBinsPerMzUnit | Number of bins per Th. | Default is 1.
-s_NEI<frac> | peakBinningFractionToNeighbor | Fraction of the scaled intensity assigned to neighboring bins. | Default is 0.5.
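Several of the spectrum processing options above are simple arithmetic on the peak list, and a small sketch can make their combined effect concrete. The code below applies the dynamic range filter (-s_RDR), the top-N peak filter (-s_RNP) and the intensity scaling formula (-s_MZS, -s_INS) using the defaults listed in the table. The order in which the steps are applied is an assumption, as is the peak list representation; intensities are assumed to be non-negative.

```python
def preprocess_peaks(peaks, max_peaks=150, dynamic_range=1000.0,
                     mz_power=0.0, intensity_power=0.5):
    """Filter and scale a peak list given as [(mz, intensity), ...].

    Returns the surviving peaks sorted by m/z with scaled intensities.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    # Dynamic range filter: drop peaks smaller than 1/dynamic_range of the base peak.
    kept = [(mz, i) for mz, i in peaks if i >= base / dynamic_range]
    # Keep only the most intense max_peaks peaks.
    kept = sorted(kept, key=lambda p: p[1], reverse=True)[:max_peaks]
    # Scaled intensity = (m/z)^mz_power * (raw intensity)^intensity_power.
    scaled = [(mz, (mz ** mz_power) * (i ** intensity_power)) for mz, i in kept]
    return sorted(scaled)
```

With the default powers (0.0 and 0.5) the scaling reduces to a square root of the raw intensity, which compresses the influence of a few dominant peaks relative to the many weaker ones.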


Create Mode Options
Command-line TokenName in Parameter FileMeaningRemarks
GENERAL OPTIONS (Applicable with any file input)
-cNoneSpecify create mode.Not needed when any other create options are set.
-cF<file>NoneRead create options from file <file>.If <file> is omitted, "spectrast_create.params" is assumed.
-cN<name>outputFileNameSpecify output file name for .splib, .sptxt, .spidx and .pepidx files.If not set, SpectraST will try to construct a sensible name.
-cT<file>useProbTableUse probability table in <file>. Only those peptide ions included in the table will be imported, and their probability adjusted optionally.A probability table is a text file with one peptide ion in the format AC[160]DEFGHIK/2 per line. If a probability is supplied following the peptide ion separated by a tab, it will be used to replace the original probability of that library entry.
-cO<file>useProteinListUse protein list in <file>. Only those peptide ions associated with proteins in the list will be imported.A protein list is a text file with one protein identifier per line. If a number X is supplied following the protein separated by a tab, then at most X peptide ions associated with that protein will be imported. Peptides with more replicates are favored.
-cm<remark>remarkRemark. Add a Remark=<remark> comment to all library entries created.Default is off.
-c_ANNannotatePeaksAnnotate peaks.Turn on with -c_ANN, off with -c_ANN!. Default is on.
-c_BINbinaryFormatWrite library in binary format, which enables quicker search. Turn on with -c_BIN, off with -c_BIN!. Default is on.
-c_DTAwriteDtaFilesWrite all library spectra as .dta files. Turn on with -c_DTA, off with -c_DTA!. Default is off.
-c_PLT<crit>plotSpectraPlot the library spectra as they are created.-c_PLT or -c_PLTALL = Plot every spectrum. -c_PLT<crit> = Plot spectrum when either the Status or the Spec comment value = <crit>.
PEPXML IMPORT OPTIONS (Applicable with .pepXML file input)
-cP<prob>minimumProbabilityToIncludeInclude all spectra identified with probability no less than <prob> in the library.Default is 0.9.
-cn<name>datasetNameSpecify a dataset identifier for the file to be imported.If not set, SpectraST will construct it from the path and the name of the .pepXML file.
-cgsetDeamidatedNXSTSet all asparagines (N) in the motif NX(S/T) as deamidated (N[115]), and all asparagines not in the motif NX(S/T) as unmodified. Use for glycocaptured peptides. Turn on with -cg, off with -cg!. Default is off.
-coaddMzXMLFileToDatasetNameAdd the originating mzXML file name to the dataset identifier. Good for keeping track of the MS run in which the peptide is observed. Turn on with -co, off with -co!. Default is off.
-c_NPK<num>minimumNumPeaksToIncludeExclude spectra of peptide IDs shorter than <num> amino acids.Default is 10.
-c_NAA<thres>minimumDeltaCnToIncludeExclude spectra with fewer than <num> peaks.Default is 6.
-c_DCN<num>minimumNumAAToIncludeExclude spectra with deltaCn smaller than <thres>. Useful for excluding spectra with indiscriminate modification sites. Turn on with -c_DCN, off with -c_DCN!. Default is 0.0.
-c_RNT<thres>rawSpectraNoiseThresholdAbsolute noise filter. Remove noise peaks with intensity below <thres> in imported spectra.Default is 0.0.
-c_RDR<range>rawSpectraMaxDynamicRangeRelative noise filter. Filter out noise peaks with intensity below 1/<range> of that of the highest peak.Default is 100000.0.
LIBRARY MANIPULATION OPTIONS (Applicable with .splib file input)
-cf<pred>filterCriteriaFilter library by criteria. Keep only those entries satisfying the predicate <pred>.<pred> should be in quotes in the form “<attr> <op> <value>”. <attr> can refer to any of the fields and any comment entries. <op> can be ==, !=, <, >, <=, >=, =~ and !~. Multiple predicates can be separated by either & (AND logic) or | (OR logic), but not both. Default is off.
-cJcombineActionCombine action.-cJU = Union (default). Include all the peptide ions in all the files.
-cJI = Intersection. Only include peptide ions that are present in all the files.
-cJS = Subtraction. Only include peptide ions in the first file that are not present in any of the other files.
-cJH = Subtraction of homologs. Only include peptide ions in the first file that do not have any homologs with similar m/z in any of the other files.
-cAbuildActionBuild action.-cAB = Best replicate. Pick the best replicate of each peptide ion.
-cAC = Consensus. Create the consensus spectrum of all replicate spectra of each peptide ion.
-cAQ = Quality filter. Apply quality filters to library.
Default is no build action - all spectra will be included as is.
CONSENSUS SPECTRUM CREATION OPTIONS (Applicable with -cAC option)
-cr<num> | minimumNumReplicates | Minimum number of replicates required for each library entry. Peptide ions with fewer than <num> replicates will be excluded from the library when creating a consensus library. | Default is 1.
-c_DIS | removeDissimilarReplicates | Remove dissimilar replicates before creating consensus spectrum. | Turn on with -c_DIS, off with -c_DIS!. Default is on.
-c_QUO<frac> | peakQuorum | Specify peak quorum: the fraction of all replicates required to contain a certain peak. Peaks not present in enough replicates will be deleted. | Default is 0.6.
-c_XPU<num> | maximumNumPeaksUsed | Maximum number of peaks in each replicate to be considered in creating consensus. Only the <num> most intense peaks will be considered. | Default is 300.
-c_XNR<num> | maximumNumReplicates | Maximum number of replicates used to build consensus spectrum. | Default is 100.
-c_XPK<num> | maximumNumPeaksKept | De-noise single spectra by keeping only the <num> most intense peaks. | Default is 150. Will not affect consensus spectra of more than one replicate.
-c_WGT<score> | replicateWeight | Select the type of score to weigh and rank the replicates. | -c_WGTS (default) = Use a measure of signal-to-noise ratio as the weight.
-c_WGTX = Use a function of the SEQUEST xcorr score as the weight.
-c_WGTP = Use a function of the PeptideProphet probability as the weight.
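The peak quorum (-c_QUO) rule can be pictured with the following Python sketch. It is an illustration only, not the consensus algorithm SpectraST uses: grouping peaks into fixed-width m/z bins, and the bin_width parameter itself, are assumptions introduced here so that "the same peak" has a concrete meaning across replicates.

```python
from collections import Counter


def apply_peak_quorum(replicates, quorum=0.6, bin_width=1.0):
    """Return the m/z bins present in at least `quorum` of the replicates.

    `replicates` is a list of spectra, each given as a list of m/z values.
    Fixed-width binning is an assumption of this sketch.
    """
    if not replicates:
        return []
    counts = Counter()
    for peaks in replicates:
        # Count each bin once per replicate, however many peaks fall in it.
        counts.update({int(mz // bin_width) for mz in peaks})
    total = len(replicates)
    return sorted(b for b, n in counts.items() if n / total >= quorum)
```

With three replicates and the default quorum of 0.6, a peak must appear in at least two of them to survive; a peak seen in only one replicate is dropped.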
BEST REPLICATE SELECTION OPTIONS (Applicable with -cAB option)
-cr<num> | minimumNumReplicates | Minimum number of replicates required for each library entry. Peptide ions with fewer than <num> replicates will be excluded from the library when creating a best-replicate library. | Default is 1.
-c_DIS | removeDissimilarReplicates | Remove dissimilar replicates before selecting best replicate. | Turn on with -c_DIS, off with -c_DIS!. Default is on.
QUALITY FILTER OPTIONS (Applicable with -cAQ option)
-cr<num> | minimumNumReplicates | Replicate quorum. Its value affects behavior of quality filter (see below). | Default is 1.
-cL<level>, -cl<level> | qualityLevelRemove, qualityLevelMark | Specify the stringency of the quality filter. -cL specifies the level for removal, -cl specifies the level for marking. | <level> = 0: No filter.
<level> = 1: Remove/mark impure spectra.
<level> = 2: Also remove/mark spectra with a spectrally similar counterpart in the library that is better.
<level> = 3: Also remove/mark inquorate entries (defined with -cr) that share no peptide sub-sequences with any other entries in the library.
<level> = 4: Also remove/mark all singleton entries.
<level> = 5: Also remove/mark all inquorate entries (defined with -cr).
Default is -cL2, -cl5
-c_QP1 | qualityPenalizeSingletons | Apply stricter thresholds to singleton spectra during quality filters. | Turn on with -c_QP1, off with -c_QP1!. Default is on.
-c_QIP<thres> | qualityImmuneProbThreshold | Specify a probability above which library spectra are immune to quality filters. | Default is 0.999.
-c_QIE | qualityImmuneMultipleEngines | Make spectra identified by multiple sequence search engines immune to quality filters. | Turn on with -c_QIE, off with -c_QIE!. Default is on.
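Because each quality level adds to the ones below it, the combined effect of -cL and -cl can be sketched as a lookup on the lowest level at which an entry is flagged. The Python below is a hedged illustration rather than SpectraST's logic: the problem names are hypothetical labels for the five documented categories, and letting removal take precedence over marking is an assumption.

```python
# Hypothetical labels for the problem categories introduced at levels 1 to 5.
LEVEL_OF_PROBLEM = {
    "impure": 1,
    "has_better_similar_counterpart": 2,
    "inquorate_no_shared_subsequence": 3,
    "singleton": 4,
    "inquorate": 5,
}


def quality_action(problems, remove_level=2, mark_level=5):
    """Return "remove", "mark" or "keep" for an entry with the given problems.

    Defaults follow the documented -cL2, -cl5. Removal is assumed to take
    precedence over marking.
    """
    if not problems:
        return "keep"
    lowest = min(LEVEL_OF_PROBLEM[p] for p in problems)
    if lowest <= remove_level:
        return "remove"
    if lowest <= mark_level:
        return "mark"
    return "keep"
```

Under the defaults, an impure spectrum is removed and a singleton entry is only marked; with both levels set to 0 nothing is filtered.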


Miscellaneous Options
Command-line Token | Name in Parameter File | Meaning | Remarks
-V | None | Verbose mode. More information displayed to console. | Default is off.
-Q | None | Quiet mode. | Default is off.
-L<file> | None | Specify name of log file. | Default is spectrast.log.

Other SpectraST Utilities

Plotspectrast

Plotspectrast is a spectrum viewer designed for SpectraST. It comes as two programs: a CGI that can be launched from a web page (e.g., from PepXMLViewer), and a stand-alone application. They are included in the TPP and no additional installation is necessary.

The most common use of Plotspectrast is for visualization of spectral matches from PepXMLViewer. When displaying SpectraST results, PepXMLViewer provides a link to invoke plotspectrast.cgi for each spectrum query. The query spectrum will be plotted as a "mirror image" of the best-matched library spectrum, enabling the user to quickly assess the quality of the match. Below the plot there is an ion table, and tables listing information about the library spectrum. The legend of the plots and ion table is as follows:

  • Library spectrum
    • Peak color: Red = Selected annotated peaks; Blue = Unannotated peaks
    • Label color: Red = Selected annotated peaks that have matched peaks in the query spectrum; Black = Unmatched peaks
  • Query spectrum
    • Peak color: Red = Peaks that match selected annotated peaks in the library spectrum; Black = Unmatched peaks
  • Ion table
    • Cell color: Red = Ions present in both spectra; Pink = Ions present in the library spectrum only; White = Ions present in neither spectrum

Various controls are available to the left of the plot to customize how the spectra are displayed:

  • X-Range: The range of X axis (the m/z values) displayed
  • MatchTol: The m/z tolerance within which a peak is considered matched between the library and query spectra. This affects the labeling and coloring of the peaks.
  • Y-Zoom: Zooming factor in the Y axis (the peak intensity).
  • BlankPrecRegion: Blank the region around the precursor m/z. (Note: in SpectraST searching, peaks in this region are ignored.)
  • Annotation Options
    • LabelType: Toggles between displaying the ion type, the m/z value, or no label for selected annotated peaks
    • NumPeaks: The number of peaks considered for labeling, from the highest peak down
    • MinInten: The minimum intensity for a peak to be labeled
    • Ions a, b, y (+1, +2, +3): Whether or not to label that particular type of ion of that charge state
    • -H2O/-NH3/-P: Whether or not to label water/ammonia/phosphate neutral loss peaks of fragment ions
    • Prec losses: Whether or not to label neutral loss peaks of the precursor
    • All: Whether or not to label all annotated peaks
    • ColorAll: Whether to color all the annotated peaks regardless of label selection
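The effect of the MatchTol control can be illustrated with a small Python sketch of tolerance-based peak matching. It is not Plotspectrast's code: pairing each library peak with the nearest query peak inside the tolerance is an assumption, and the default tolerance of 1.0 is an arbitrary placeholder rather than the viewer's actual default.

```python
def match_peaks(library_mz, query_mz, tol=1.0):
    """Pair each library peak with the closest query peak within `tol` m/z.

    Returns a list of (library_mz, matched_query_mz_or_None) tuples.
    Nearest-peak pairing is an assumption of this sketch.
    """
    pairs = []
    for lib in library_mz:
        candidates = [q for q in query_mz if abs(q - lib) <= tol]
        best = min(candidates, key=lambda q: abs(q - lib)) if candidates else None
        pairs.append((lib, best))
    return pairs
```

Widening the tolerance turns previously unmatched peaks into matched ones, which is why changing MatchTol changes the labeling and coloring in the plot.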

The stand-alone plotspectrast application produces a static .jpg image in the same directory as the query spectrum file. It has the following usage:

plotspectrast <.splib file> <library file offset> <.mzXML file> <query scan number>

Plots the library spectrum at <library file offset> and the query spectrum of <query scan number> in the .mzXML file. The desired value of <library file offset> can be extracted from the .spidx, .pepidx or .sptxt file (BinaryFileOffset in the Comment field).

plotspectrast <.splib file> <library file offset> <.dta file>

Similar to above, except the query spectrum is in a .dta file.

plotspectrast <.splib file> <library file offset> <.none file>

Plots the library spectrum by itself. It will not actually look for the .none file, but the resulting .jpg file will be named with the same prefix as the .none file and placed in the same directory.

plotspectrast <.msp file of library spectrum> <.msp, .dta, or .none file>

Similar to above, except the library spectrum is given in a .msp file.
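As a sketch of how the first usage form might be scripted, the Python below pulls BinaryFileOffset out of a .sptxt Comment line and assembles the plotspectrast argument list. It assumes the Comment field is a run of space-separated key=value items; the file names in the example are hypothetical, and the helper functions are illustrative, not part of the TPP.

```python
import re


def binary_file_offset(comment_line):
    """Extract the BinaryFileOffset value from a .sptxt Comment line.

    Assumes space-separated key=value items in the Comment field.
    """
    match = re.search(r"\bBinaryFileOffset=(\d+)", comment_line)
    if match is None:
        raise ValueError("no BinaryFileOffset in comment")
    return int(match.group(1))


def plotspectrast_args(splib, offset, query_file, scan=None):
    """Build the argument list for the stand-alone plotspectrast program."""
    args = ["plotspectrast", splib, str(offset), query_file]
    if scan is not None:
        # A scan number is only given when the query is an .mzXML file.
        args.append(str(scan))
    return args
```

The resulting list can be handed to subprocess.run, for example plotspectrast_args("lib.splib", 4096, "run.mzXML", scan=1523) for an .mzXML query, or without scan for a .dta or .none file.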

Lib2HTML

Lib2HTML is an application that converts a SpectraST library into an HTML file for viewing. It is included in the TPP and no additional installation is necessary. In the resulting HTML file, replicates of the same peptide ion will be listed on one row, and links are provided to each replicate to view the spectrum using Plotspectrast. The usage is:

Lib2HTML <options> <full path from webserver root to .splib file>

Options include:

  • -V : Verbose. Displays more information for each entry.
  • -N<num> : Specify the maximum number of replicates displayed for each unique peptide ion. Default is 10.
  • -P<path> : Specify the full path from the webserver root to the plotspectrast.cgi binary.

Developer's Guide

The SpectraST source code contains detailed documentation. More tips for developers who want to modify SpectraST will be available shortly.

Where to Get Help

The SPC Tools Discussion Group: spctools-discuss.googlegroups.com

The SPC Tools Announcement Group: spctools-announce.googlegroups.com

Public spectral libraries are available for download at PeptideAtlas.

External Links


Reference

  • Keller, Andrew, et al. (2005). "A uniform proteomics MS/MS analysis platform utilizing open XML file formats". Molecular Systems Biology 1, 17.
  • Lam, Henry, et al. (2007). "Development and validation of a spectral library searching method for peptide identification from MS/MS". Proteomics 7 (5), 655-667.
  • Craig, Robertson, et al. (2006). "Using annotated peptide mass spectrum libraries for protein identification". Journal of Proteome Research 5 (8), 1843-1849.
  • Frewen, Barbara, et al. (2006). "Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries". Analytical Chemistry 78 (16), 5678-5684.