Software:SpectraST
From SPCTools
Revision as of 18:52, 16 June 2008
SpectraST (short for "Spectra Search Tool" and rhymes with "contrast") is a spectral library building and searching tool designed primarily for shotgun proteomics applications. It is developed at the Institute for Systems Biology (ISB), in the research group of Professor Ruedi Aebersold. The main developer is Henry Lam.
The latest version of SpectraST is 3.0, to be released sometime in 2008, pending the publication of a manuscript. It is distributed by ISB under the LGPL license, as a component of the Trans Proteomic Pipeline (TPP) suite of software, which is distributed under the same license. The source code repository is at [1], and the official download site for the Windows installer is at [2]. Currently, the version released with TPP (2.0) can only perform library searching and has no library creation functionality.
Introduction to Shotgun Proteomics and Spectral Searching
The goal of proteomics is the systematic identification and quantification of all proteins in a biological system. In one of the most frequently practiced workflows, commonly known as shotgun proteomics, a protein sample of interest is first digested with a proteolytic enzyme (trypsin being the most common) to yield peptides that are amenable to LC-MS/MS analysis. The peptides in the resulting mixture are chromatographically resolved, ionized by techniques such as electrospray ionization (ESI) or matrix-assisted laser desorption ionization (MALDI) before being analyzed by a mass spectrometer. A fraction of the peptide ions are selectively isolated by the mass spectrometer and subjected to collision-induced dissociation (CID), in which the peptide ions are bombarded with noble gas atoms to induce fragmentation. (Other types of fragmentation techniques are also rapidly maturing.) The fragment ions are detected and reported by the mass spectrometer as tandem mass (MS/MS) spectra. Because peptide ions tend to fragment mostly along the peptide backbone in a somewhat predictable manner, the MS/MS spectra contain information that can be used to deduce the peptide sequence.
Traditionally, the inference of the peptide sequence from its characteristic tandem mass spectra is done by sequence (database) searching. In sequence searching, a target protein (or translated DNA) database is used as a reference to generate all possible putative peptide sequences by in silico digestion. The search engines then use various rules to predict the theoretical fragmentation pattern of each of these putative peptides, and compare the experimentally observed MS/MS spectra to these theoretical spectra one-by-one. Presumably, a positive identification is made if the experimental spectrum is sufficiently similar to one of the theoretical spectra. Several popular computational tools developed for this purpose have emerged over the years, each employing different algorithms and heuristics to achieve an acceptable balance of sensitivity and accuracy. Unfortunately, traditional sequence searching is a challenging, error-prone, and computationally expensive exercise. Despite the tremendous improvement in computer hardware and software over the past decade, this step often remains the bottleneck of any given proteomics experiment. The requirement of computational resources is also substantial, limiting the use of this powerful technique to only those research groups that can afford the costly computational infrastructure.
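The in silico digestion step described above can be illustrated with a minimal Python sketch. It assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P) and no missed cleavages. The function name `tryptic_digest` and the example sequence are hypothetical; real search engines apply many more rules (missed cleavages, modifications, mass filters).

```python
def tryptic_digest(protein: str, min_length: int = 1) -> list[str]:
    """Split a protein sequence into fully tryptic peptides (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        # Cleave after K or R, except when the following residue is proline.
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # remaining C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the cleavage rule.
    print(tryptic_digest("MKWVTFISLLRPAGKDEFR"))
```

Every peptide produced this way becomes a candidate whose theoretical spectrum must be scored, which is why the search space of sequence searching grows so quickly.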
Spectral searching is an alternative approach that promises to address some of the shortcomings of sequence searching. In spectral searching, a spectral library is meticulously compiled from a large collection of previously observed and identified peptide MS/MS spectra. The unknown spectrum can then be identified by comparing it to all the candidates in the spectral library for the best match. This approach has been commonly employed for mass spectrometric analysis of small molecules with great success, but has only become possible for proteomics very recently. The chief difficulty, that of generating enough high-quality experimental spectra for compilation into spectral libraries, has been overcome by the recent explosion of proteomics data and the availability of public data repositories. Several attempts at creating and searching spectral libraries in the context of proteomics have been published within the past year, all demonstrating the tremendous improvement in search speed and the great potential of this method in complementing, if not replacing, sequence searching in many proteomics applications.
Advantages of Spectral Searching
1. Speed
Spectral searching benefits from a much reduced search space compared to sequence searching. In spectral searching, only peptide ions that were observed and identified in previous experiments are included in spectral libraries and considered as candidates, whereas in sequence searching, all putative peptide sequences in a protein database (plus all permutations of post-translational modification sites, if specified) are considered. Most of the putative peptide ions considered in sequence searching are never observed in practice for a variety of reasons. With typical search parameters, the search space of spectral searching can be several orders of magnitude smaller. It is therefore not surprising that spectral searching can also be several orders of magnitude faster. SpectraST can achieve a top speed of 0.001 to 0.01 second per query spectrum (against a library of about 50,000 entries) on a modern personal computer. In contrast, SEQUEST, one of the most popular sequence search engines, needs about 5 to 20 seconds per query spectrum (against a human IPI database).
2. Preciseness
Spectral searching compares experimental spectra to experimental spectra; sequence searching compares experimental spectra to theoretical spectra. In general, the theoretical spectra considered in sequence searching are very simplistic (e.g., only including b- and y-type ions, at a fixed intensity), and do not resemble the experimental spectra that they are supposed to match. On the other hand, armed with previously observed experimental spectra compiled into spectral libraries, spectral searching can take full advantage of all spectral features, including actual peak intensities, neutral losses from fragments, and various uncommon or even unknown fragments, to determine the best match. The similarity scoring of spectral searching is therefore more precise, and will generally provide better discrimination between good and bad matches. This usually results in much superior statistics (e.g., sensitivity, false discovery rates) for the search results, compared to sequence searching.
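The idea of comparing two experimental spectra peak by peak, actual intensities included, can be sketched with a normalized dot product over unit-width m/z bins. This is a generic similarity measure chosen for illustration, not a description of SpectraST's actual scoring function; the function names and the toy peak lists in the usage note are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins of the given width."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return bins


def dot_product_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product (cosine) of two binned spectra, between 0 and 1."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins populated in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `dot_product_similarity(query, library_entry)` with two lists of `(m/z, intensity)` pairs returns a value near 1 when both the peak positions and the relative intensities agree, which is information a fixed-intensity theoretical spectrum cannot provide.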
Versions
What's new in SpectraST 3.0
- Creating libraries from sequence search results
- Library manipulation
- Union/Intersect/Subtract operations
- Consensus/Best-replicate library building
- Filtering based on criteria
- Quality filters
- Importing libraries from X!Hunter and BiblioSpec formats
- File list feature
- Logging
- Lib2HTML utility for visualizing library
- Monoisotopic mass support
- Various bug fixes and performance enhancements
What’s new in SpectraST 2.0
- Binary library format, enabling speed gain
- Library information and statistics in preambles of .sptxt and .pepidx files
- Searching of .dta files
- Detecting homologs in hit list
- Various bug fixes and performance enhancements
User's Guide
Installing SpectraST
SpectraST is an integral component of the Trans Proteomic Pipeline suite of software. Although it can be used alone without other TPP components, SpectraST users are strongly encouraged to download and install the entire TPP suite, which provides other useful functionalities such as raw data importation, automatic validation of search results, protein inference, and quantification and visualization.
Windows users: SpectraST is available as part of TPP for Windows, which is run in the cygwin (UNIX emulator) environment. Download the cygwin installer and follow installation instructions.
UNIX/LINUX users: Visit the Sashimi project page on SourceForge.net, and download the code as a tarball directly. Compiling, installation and configuration information is available in the README file.
Running SpectraST
SpectraST has two modes, the Create mode and the Search mode. In the former, SpectraST creates a searchable spectral library from various formats to prepare for searching. In the latter, SpectraST takes in unknown spectra and searches each of them against the spectral library.
The simplest way of running SpectraST is from the command line of your UNIX/LINUX or cygwin shell. The general usage is:
spectrast <options> <list of files of appropriate formats>
Options must be separated by spaces, and all begin with a hyphen ('-'). Search mode options always have an 's' following the hyphen; Create mode options have a 'c'. SpectraST will perform the appropriate action based on the options specified, and complain when there are problems interpreting the command statement. The usage statement and a list of options can be viewed by issuing the command spectrast by itself.
Once TPP is installed, SpectraST can also be run from the Petunia web interface, with limited options.
SpectraST Search Mode
SpectraST can perform spectral searching from the following data formats:
- .mzXML (all versions) format
- .mzData format
- .dta (SEQUEST) format, a simple peak list preceded by precursor information
- NIST (National Institute of Standards and Technology)’s .msp format
It requires a spectral library in SpectraST’s .splib format, which can be created in SpectraST Create Mode.
The results can be outputted to the following formats:
- .pepXML format
- .txt format, a fixed-width column text format
- .xls format, a tab-delimited column text format
The search mode is initiated with the option -s, or any of the search mode options. For instance, to search the MS/MS spectra in the file foo.mzXML against the spectral library bar.splib, using the parameters specified in the file spectrast.params, the command is simply:
spectrast -sFspectrast.params -sLbar.splib foo.mzXML
In the above, -sF and -sL are search mode options that the user can specify to customize the behavior of SpectraST. SpectraST will search all the MS/MS spectra in the file foo.mzXML against the spectral library bar.splib, using the parameters specified in the file spectrast.params. The result will be written to a file named foo.<ext> in the same directory, where <ext> specifies the output format (.xml, .txt or .xls).
For a full list of options, see SpectraST Options.
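When many searches are scripted, the command line shown in this section can be assembled programmatically. The sketch below only builds the argument list from options documented on this page (-sF, -sL and -sE); the helper name `build_search_command` is hypothetical, and actually launching the process assumes a `spectrast` executable on the PATH.

```python
def build_search_command(library, query_files, params_file=None, output_ext=None):
    """Return the argument list for a SpectraST search, without running it."""
    if not library.endswith(".splib"):
        raise ValueError("library file must have the .splib extension")
    command = ["spectrast"]
    if params_file:
        command.append(f"-sF{params_file}")   # read search options from a file
    command.append(f"-sL{library}")           # spectral library to search
    if output_ext:
        command.append(f"-sE{output_ext}")    # output format, e.g. "xls"
    command.extend(query_files)
    return command


if __name__ == "__main__":
    cmd = build_search_command("bar.splib", ["foo.mzXML"], params_file="spectrast.params")
    print(" ".join(cmd))
    # To run for real (assumes spectrast is installed and on the PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building a list of arguments rather than a single string avoids shell quoting problems when file names contain spaces.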
SpectraST Create Mode
Importing Existing Libraries
SpectraST can create a searchable spectral library from the following formats:
- NIST (National Institute of Standards and Technology)'s .msp format (Download here)
- X!Hunter's .hlf format [3]
- BiblioSpec’s .ms2 format [4]
If files of these extensions are supplied, SpectraST simply converts those spectral libraries into a form suitable for SpectraST searches (.splib format). (Note however that there is no study on how well SpectraST works with X!Hunter and BiblioSpec libraries.) For instance, to import the NIST yeast consensus library, call the resulting library bar.splib and put it in the directory /dir/, the command is:
spectrast -cN/dir/bar yeast_consensus.msp
When it is done, it produces 5 files in the directory /dir/. The file bar.splib is the library itself; it is in a binary (machine-readable) format. The file bar.sptxt is a text (human-readable) version of bar.splib. This .sptxt file is of no use to SpectraST; it can be deleted after manual inspection. The files bar.spidx and bar.pepidx are indices on the precursor m/z value and peptide, respectively. Keep the indices and the .splib file in the same directory for SpectraST to function properly. Lastly, a file spectrast.log is also created to document the command executed. Some useful information about the library is printed at the beginning of the bar.sptxt and bar.pepidx files.
For a full list of SpectraST options, see SpectraST Options.
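Because the .splib file and its two indices must stay together, a small pre-flight check before searching can catch a library that was copied incompletely. The following is a sketch only; the helper name `missing_library_files` is hypothetical, and the list of required extensions is taken from the description above.

```python
from pathlib import Path

# The library itself plus the two indices that must sit next to it.
REQUIRED_EXTENSIONS = (".splib", ".spidx", ".pepidx")


def missing_library_files(splib_path):
    """Return the required files that are absent next to the given .splib file."""
    base = Path(splib_path).with_suffix("")  # e.g. /dir/bar.splib -> /dir/bar
    candidates = [base.with_name(base.name + ext) for ext in REQUIRED_EXTENSIONS]
    return [str(path) for path in candidates if not path.is_file()]
```

An empty list means the library and both indices are present in the same directory; the optional .sptxt file is deliberately not checked.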
Creating Libraries from Sequence Search Results
SpectraST can create a spectral library from a .pepXML file, which contains peptide identifications from a previous shotgun proteomics experiment. For this purpose, it is preferable that the .pepXML has been processed with PeptideProphet, such that all the search hits have probabilities assigned. When importing from a .pepXML file, SpectraST scans through the .pepXML file for confident identifications, and attempts to extract the corresponding experimental spectra from .mzXML files. For instance, the command
spectrast -cNraw -cP0.9 dataset1.xml
will import all peptide identifications with probability at or above 0.9 from the file dataset1.xml, and put them in a library called raw.splib (with the accompanying raw.sptxt, raw.spidx and raw.pepidx files).
For a full list of SpectraST options, see SpectraST Options.
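The selection rule applied by -cP ("at or above" the threshold) can be sketched as a simple filter. The record layout below is hypothetical and stands in for identifications parsed from a .pepXML file; the parsing itself and the extraction of spectra from .mzXML files are omitted.

```python
from typing import NamedTuple


class Identification(NamedTuple):
    spectrum: str       # query spectrum name
    peptide_ion: str    # peptide with charge, e.g. "AC[160]DEFGHIK/2"
    probability: float  # PeptideProphet probability


def select_confident(hits, min_probability=0.9):
    """Keep identifications whose probability is at or above the threshold."""
    return [hit for hit in hits if hit.probability >= min_probability]
```

Raising the threshold trades library comprehensiveness for fewer incorrect identifications entering the library.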
Manipulating SpectraST Libraries
SpectraST can convert one or more .splib libraries to another, performing various operations. For instance, to create a consensus library from all the entries in bar.splib and foo.splib, the command is:
spectrast -cNconsensus -cJU -cAC bar.splib foo.splib
SpectraST will take the union (specified by the option -cJU) of all the entries in bar.splib and foo.splib, and wherever a certain peptide ion is present as multiple entries (replicates), it will coalesce the replicates into a single consensus spectrum (specified by -cAC).
Some additional examples:
spectrast -cNphospho -cf"Mods =~ Phospho" bar.splib
This will screen the library bar.splib for all entries with a phosphorylation modification, and put the phosphopeptides in the library phospho.splib.
spectrast -cNcommon -cJI dataset1.splib dataset2.splib
This will take the intersection of the two libraries dataset1.splib and dataset2.splib, and put all entries of peptide ions that are seen in both files in the library common.splib.
spectrast -cNquality -cAQ -cL2 bar.splib
This will apply SpectraST's quality filters to the library bar.splib; only those entries that pass the first 2 quality filters will be included in the library quality.splib.
For a full list of SpectraST options, see SpectraST Options. For a typical recipe for creating consensus libraries from sequence search results, see Creating Consensus Libraries.
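The combine actions (-cJU, -cJI, -cJS) behave like set operations keyed on the peptide ion. The sketch below models a library as a Python dict mapping a peptide ion string to its list of replicate spectra; this in-memory layout and the function names are hypothetical and say nothing about the .splib format. It also assumes that an intersection keeps the replicates from every input library.

```python
def union(*libraries):
    """All peptide ions from all libraries; replicates are pooled per ion."""
    combined = {}
    for library in libraries:
        for ion, replicates in library.items():
            combined.setdefault(ion, []).extend(replicates)
    return combined


def intersection(*libraries):
    """Only peptide ions present in every library (replicates pooled, by assumption)."""
    common = set(libraries[0]).intersection(*libraries[1:])
    return {ion: reps for ion, reps in union(*libraries).items() if ion in common}


def subtraction(first, *others):
    """Peptide ions of the first library that appear in none of the others."""
    excluded = set().union(*others)
    return {ion: list(reps) for ion, reps in first.items() if ion not in excluded}
```

A build action such as -cAC would then operate on each pooled list of replicates to produce one consensus spectrum per peptide ion.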
Creating Consensus Libraries
A recipe for creating consensus libraries from TPP-processed sequence search results is detailed here. Consider the following example:
Dataset Identifier | pepXML Files | mzXML Files |
---|---|---|
Alpha | A-SEQ.xml (SEQUEST results of A1.mzXML), A-MAS.xml (Mascot results of A1.mzXML) | A1.mzXML |
Beta | B1.xml (SEQUEST results of B1.mzXML), B2.xml (SEQUEST results of B2.mzXML) | B1.mzXML, B2.mzXML |
Gamma | G.xml (combined SEQUEST results of all .mzXML files) | G1.mzXML, G2.mzXML, G3.mzXML |
The following commands should be issued in succession:
1. Importing the raw spectra into SpectraST
spectrast -cNrawA -cnAlpha A-SEQ.xml A-MAS.xml
spectrast -cNrawB -cnBeta B1.xml B2.xml
spectrast -cNrawG -cnGamma G.xml
These commands will create the raw libraries rawA.splib, rawB.splib and rawG.splib. Identifications from multiple .pepXML files of the same dataset are imported with the same dataset identifier. Identifications of the same query from multiple search engines will be combined intelligently. The probability threshold at or above which identifications are imported can be specified with the option -cP<prob>, which defaults to 0.9. This step does not yet coalesce replicates of the same peptide ion identification into a consensus spectrum. Remember that the .mzXML files must be in the same directories as their corresponding .pepXML files.
2. Creating a consensus spectral library
spectrast -cJU -cAC -cNconsABC raw*.splib
This will combine the three raw libraries, then replace multiple replicates of the same peptide ion identification with a consensus spectrum. Many options are available to fine-tune the algorithm; however, the default parameters are usually adequate.
3. Performing quality control of the consensus spectral library
spectrast -cAQ -cNconsABC_Q consABC.splib
This will run the consensus spectra through SpectraST's quality filters. With the default settings, spectra failing either or both of the first 2 filters will be removed, and spectra failing any of the other filters will be marked. Different quality levels can be set with the options -cL and -cl. It is recommended that a consensus spectral library be subjected to some quality control before it is used in spectral searching, in order to minimize mis-identified and low-quality spectra in the library. These questionable spectra can propagate errors from sequence searching, reduce the discriminating power of the spectral search engine, and induce false positive and false negative hits. The optimal quality level reflects the user's desired compromise between library comprehensiveness and library quality.
SpectraST Parameter File
SpectraST allows the use of parameter files to simplify the process of spectral library building and searching. Namely, desired options can be specified in a text file and supplied to SpectraST every time the same action is performed, saving the user from having to specify a lengthy list of command-line options. To invoke the parameter files, specify the options -sF<parameter file> and -cF<parameter file> for Search Mode and Create Mode, respectively. Example parameter files are provided below (these are essentially the defaults):
Search Mode: spectrast.params
Create Mode: spectrast_create.params
SpectraST File List Feature
SpectraST allows the user to list the files to be processed in a text file with extension .list. This can be useful when the number of files to be processed is very large, possibly overwhelming the UNIX command line. It is also an easy way to queue up multiple SpectraST tasks and to keep track of them. For example, if the file job.list contains the lines:
# This is a comment line ignored by SpectraST.
? -sLfoo.splib # '?' signals the start of a new job; options for this job follow the '?'
1.mzXML
2.mzXML
? -sLbar.splib
3.mzXML
4.mzXML
Note: One can mix Search jobs and Create jobs in the same .list file. Command-line options will be overridden by those specified in the .list file with lines preceded by '?'.
Then running the command:
spectrast -sFspectrast.params job.list
is equivalent to running
spectrast -sFspectrast.params -sLfoo.splib 1.mzXML 2.mzXML
followed by
spectrast -sFspectrast.params -sLbar.splib 3.mzXML 4.mzXML
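The equivalence above can be made concrete with a small Python sketch that expands the lines of a .list file into one command per job. It is an illustration of the described behavior, not SpectraST's own parser: the function name `expand_job_list` is hypothetical, comments are assumed to run from '#' to the end of the line, every job is assumed to start with a '?' line, and option precedence between the command line and the .list file is not modeled.

```python
import shlex


def expand_job_list(lines, command_line_options):
    """Expand .list file lines into one spectrast argument list per job."""
    jobs = []
    current = None
    for raw in lines:
        # Drop comments (assumed to start at '#') and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("?"):
            # '?' starts a new job; the rest of the line holds its options.
            current = {"options": shlex.split(line[1:]), "files": []}
            jobs.append(current)
        elif current is not None:
            current["files"].append(line)
    return [
        ["spectrast", *command_line_options, *job["options"], *job["files"]]
        for job in jobs
    ]
```

Calling `expand_job_list(open("job.list"), ["-sFspectrast.params"])` on the example file would yield the two command lines shown above, one per '?' line.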
SpectraST Options
Commonly used options are shown in bold. The rest are advanced options that should rarely need to be used.
Command-line Token | Name in Parameter File | Meaning | Remarks |
---|---|---|---|
GENERAL OPTIONS | |||
-s | None | Specify search mode. | Not needed when any other search options are set. |
-sF<file> | None | Read search options from <file>. | If <file> is omitted, “spectrast.params” is assumed |
-sL<file> | libraryFile | Specify library file. | Mandatory unless specified in parameter file. <file> must have .splib extension. |
-sD<file> | databaseFile | Specify a sequence database file. This will not affect the search in any way, but this information will be included in the output for any downstream data processing. | <file> must have .fasta extension. If not set, SpectraST will try to determine this from the preamble of the library. |
-sT<type> | databaseType | Specify the type of the sequence database file. | -sTAA (default) = protein database -sTDNA = genomic database. |
-sR | indexCacheAll | Cache all entries in RAM. Requires a lot of memory (the library will usually be loaded almost in its entirety), but speeds up search for unsorted queries. | Turn on with -sR, off with -sR!. Default is off. |
-sS<file> | filterSelectedListFileName | Only search a subset of the query spectra in the search file. Only query spectra with names matching a line of <file> will be searched. | Default is off (search all queries). |
CANDIDATE SELECTION AND SCORING OPTIONS | |||
-sM<tol> | indexRetrievalMzTolerance | Specify precursor m/z tolerance in Th. Monoisotopic mass is assumed. | Default is 3.0 Th. |
-sA | indexRetrievalUseAverage | Use average mass instead of monoisotopic mass. | Turn on with -sA, off with -sA!. Default is off. |
-sC<type> | expectedCysteineMod | Specify the expected kind of cysteine modification. Those candidate library entries with a wrong kind of cysteine modification will be ignored. | -sCICAT_cl = cleavable ICAT -sCICAT_uc = uncleavable ICAT -sCCAM = Carbamidomethyl. Default is off (search all candidates). |
-sc | ignoreSpectraWithUnmodCysteine | Ignore any candidate library entries with an unmodified cysteine. | Turn on with -sc, off with -sc!. Default is off. |
-s_HOM<rank> | detectHomologs | Detect homologous lower hits up to <rank>. Looks for lower hits homologous to the top hit and adjusts delta accordingly. | Default is 4. |
-s_NO1 | ignoreChargeOneLibSpectra | Ignore all library entries with +1 charge state. | Turn on with -s_NO1, off with -s_NO1!. Default is off. |
-s_NOS | ignoreAbnormalSpectra | Ignore all spectra which have non-Normal status. | Turn on with -s_NOS, off with -s_NOS!. Default is off. |
OUTPUT AND DISPLAY OPTIONS | |||
-sE<ext> | outputExtension | Output format. The search result will be written to a file with the same base name as the search file, with extension <ext>. | -sEtxt = Fixed-width text format; -sExls = Tab-delimited text format; -sExml (default) or -sEpepXML = .pepXML format. |
-s_FV1<thres> | hitListTopHitFvalThreshold | Minimum F value threshold for the top hit. Only top hits having F value greater than <thres> will be printed. | Default = 0.0 (all top hits will be displayed) |
-s_FV2<thres> | hitListLowerHitsFvalThreshold | Minimum F value threshold for the lower hits. Only lower hits having F value greater than <thres> will be printed. | Default = 0.45 |
-s_SHH | hitListShowHomologs | Always display homologous lower hits regardless of F value. | Turn on with -s_SHH (need -s_HOM on), off with -s_SHH!. Default is on. |
-s_SH1 | hitListOnlyTopHit | Only display the top hit for each query. | Turn on with -s_SH1, off with -s_SH1!. Default is on. |
-s_SHM | hitListExcludeNoMatch | Do not display queries for which there is no candidate, or the top hit is below the minimum F value threshold specified with -s_FV1. | Turn on with -s_SHM, off with -s_SHM!. Default is on. |
-s_SAV | saveSpectra | Save query and matched library spectra as .msp files. | Turn on with -s_SAV, off with -s_SAV!. Default is off. |
-s_TGZ | tgzSavedSpectra | Archive the saved query and matched library spectra as .tgz files to save space. | Turn on with -s_TGZ (need -s_SAV on), off with -s_TGZ!. Default is off. |
SPECTRUM FILTERING AND PROCESSING OPTIONS | |||
-s_XNP<thres> | filterMinPeakCount | Require minimum number of peaks. All query spectra with fewer than <thres> peaks passing the intensity threshold set with -s_CNT will be removed. | Default is 10. |
-s_XMZ<m/z> | filterAllPeaksBelowMz | Remove spectra with almost no peaks above a certain m/z value. All query spectra with 95%+ of the total intensity below <m/z> will be removed. | Default is 520. |
-s_XIN<inten> | filterMaxIntensityBelow | Filter query spectra with no peaks with intensity above <inten>. | Default is 0. |
-s_CNT<thres> | filterCountPeakIntensityThreshold | Minimum peak intensity for peaks to be counted. Only peaks with intensity above <thres> will be counted to meet the requirement for minimum number of peaks. | Default is 2.01 |
-s_RNT<thres> | filterRemovePeakIntensityThreshold | Noise peak threshold. All peaks with intensities below <thres> will be zeroed. | Default is 2.01 |
-s_R51<thres> | filterRemoveHuge515Threshold | Remove dominant peak at 515.3 Th. All dominant peaks near 515.3 Th (with intensity greater than <thres> of the total intensity of the spectrum) will be zeroed. | Default is off. Dominant 515.3 Th peaks are a common impurity artifact in cleavable ICAT experiments. |
-s_RNP<num> | filterMaxPeaksUsed | Remove all but the top <num> peaks in query spectra. | Default is 150. |
-s_RDR<num> | filterMaxDynamicRange | Remove all peaks smaller than 1/<num> of the base (highest) peak in query spectra. | Default is 1000. |
-s_MZS<mzpow>, -s_INS<intpow> | peakScalingMzPower, peakScalingIntensityPower | Intensity scaling power with respect to the m/z value and the raw intensity. The scaled intensity will be (m/z)^<mzpow> * (raw intensity)^<intpow> | Default is <mzpow> = 0.0, <intpow> = 0.5. |
-s_UAS<factor> | peakScalingUnassignedPeaks | Scaling factor for unassigned peaks in library spectra. Unassigned peaks in the library spectra will be scaled by <factor>. | Default is 0.1. |
-s_BIN<num> | peakBinningNumBinsPerMzUnit | Number of bins per Th. | Default is 1. |
-s_NEI<frac> | peakBinningFractionToNeighbor | Fraction of the scaled intensity assigned to neighboring bins. | Default is 0.5. |
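The peak scaling and binning options at the end of the table above can be illustrated with a short Python sketch. The scaling formula is the one given for -s_MZS and -s_INS; how exactly intensity is shared with neighboring bins is not specified on this page, so the sketch assumes that each of the two adjacent bins receives the -s_NEI fraction of the scaled intensity. The function names are hypothetical.

```python
from collections import defaultdict


def scale_intensity(mz, intensity, mz_power=0.0, intensity_power=0.5):
    """Scaled intensity = (m/z)^mz_power * (raw intensity)^intensity_power."""
    return (mz ** mz_power) * (intensity ** intensity_power)


def bin_peaks(peaks, bins_per_th=1, fraction_to_neighbor=0.5):
    """Bin scaled peaks; neighbors get a fraction of each peak (assumed behavior)."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        scaled = scale_intensity(mz, intensity)
        index = int(mz * bins_per_th)
        bins[index] += scaled
        # Assumption: each adjacent bin receives the given fraction of the
        # scaled intensity, which tolerates small m/z shifts between spectra.
        bins[index - 1] += fraction_to_neighbor * scaled
        bins[index + 1] += fraction_to_neighbor * scaled
    return dict(bins)
```

With the default powers (0.0 and 0.5), scaling reduces to a square root of the raw intensity, which dampens the influence of a few dominant peaks on the similarity score.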
Command-line Token | Name in Parameter File | Meaning | Remarks |
---|---|---|---|
GENERAL OPTIONS (Applicable with any file input) | |||
-c | None | Specify create mode. | Not needed when any other create options are set. |
-cF<file> | None | Read create options from file <file>. | If <file> is omitted, "spectrast_create.params" is assumed. |
-cN<name> | outputFileName | Specify output file name for .splib, .sptxt, .spidx and .pepidx files. | If not set, SpectraST will try to construct a sensible name. |
-cT<file> | useProbTable | Use probability table in <file>. Only those peptide ions included in the table will be imported, and their probability adjusted optionally. | A probability table is a text file with one peptide ion in the format AC[160]DEFGHIK/2 per line. If a probability is supplied following the peptide ion separated by a tab, it will be used to replace the original probability of that library entry. |
-cO<file> | useProteinList | Use protein list in <file>. Only those peptide ions associated with proteins in the list will be imported. | A protein list is a text file with one protein identifier per line. If a number X is supplied following the protein separated by a tab, then at most X peptide ions associated with that protein will be imported. Peptides with more replicates are favored. |
-cm<remark> | remark | Remark. Add a Remark=<remark> comment to all library entries created. | Default is off. |
-c_ANN | annotatePeaks | Annotate peaks. | Turn on with -c_ANN, off with -c_ANN!. Default is on. |
-c_BIN | binaryFormat | Write library in binary format, which enables quicker search. | Turn on with -c_BIN, off with -c_BIN!. Default is on. |
-c_DTA | writeDtaFiles | Write all library spectra as .dta files. | Turn on with -c_DTA, off with -c_DTA!. Default is off. |
-c_PLT<crit> | plotSpectra | Plot the library spectra as they are created. | -c_PLT or -c_PLTALL = Plot every spectrum. -c_PLT<crit> = Plot spectrum when either the Status or the Spec comment value = <crit>. |
PEPXML IMPORT OPTIONS (Applicable with .pepXML file input) | |||
-cP<prob> | minimumProbabilityToInclude | Include all spectra identified with probability no less than <prob> in the library. | Default is 0.9. |
-cn<name> | datasetName | Specify a dataset identifier for the file to be imported. | If not set, SpectraST will construct it from the path and the name of the .pepXML file. |
-cg | setDeamidatedNXST | Set all asparagines (N) in the motif NX(S/T) as deamidated (N[115]), and all asparagines not in the motif NX(S/T) as unmodified. Use for glycocaptured peptides. | Turn on with -cg, off with -cg!. Default is off. |
-co | addMzXMLFileToDatasetName | Add the originating mzXML file name to the dataset identifier. Good for keeping track of the MS run in which the peptide is observed. | Turn on with -co, off with -co!. Default is off. |
-c_NPK<num> | minimumNumPeaksToInclude | Exclude spectra with fewer than <num> peaks. | Default is 10. |
-c_NAA<num> | minimumNumAAToInclude | Exclude spectra of peptide IDs shorter than <num> amino acids. | Default is 6. |
-c_DCN<thres> | minimumDeltaCnToInclude | Exclude spectra with deltaCn smaller than <thres>. Useful for excluding spectra with indiscriminate modification sites. | Turn on with -c_DCN, off with -c_DCN!. Default is 0.0. |
-c_RNT<thres> | rawSpectraNoiseThreshold | Absolute noise filter. Remove noise peaks with intensity below <thres> in imported spectra. | Default is 0.0. |
-c_RDR<range> | rawSpectraMaxDynamicRange | Relative noise filter. Filter out noise peaks with intensity below 1/<range> of that of the highest peak. | Default is 100000.0. |
LIBRARY MANIPULATION OPTIONS (Applicable with .splib file input) | |||
-cf<pred> | filterCriteria | Filter library by criteria. Keep only those entries satisfying the predicate <pred>. | <pred> should be in quotes in the form “<attr> <op> <value>”. <attr> can refer to any of the fields and any comment entries. <op> can be ==, !=, <, >, <=, >=, =~ and !~. Multiple predicates can be separated by either & (AND logic) or | (OR logic), but not both. Default is off. |
-cJ | combineAction | Combine action. | -cJU = Union (default). Include all the peptide ions in all the files. -cJI = Intersection. Only include peptide ions that are present in all the files. -cJS = Subtraction. Only include peptide ions in the first file that are not present in any of the other files. -cJH = Subtraction of homologs. Only include peptide ions in the first file that do not have any homologs with similar m/z in any of the other files. |
-cA | buildAction | Build action. | -cAB = Best replicate. Pick the best replicate of each peptide ion. -cAC = Consensus. Create the consensus spectrum of all replicate spectra of each peptide ion. -cAQ = Quality filter. Apply quality filters to library. Default is no build action - all spectra will be included as is. |
CONSENSUS SPECTRUM CREATION OPTIONS (Applicable with -cAC option) | |||
-cr<num> | minimumNumReplicates | Minimum number of replicates required for each library entry. Peptide ions with fewer than <num> replicates will be excluded from library when creating consensus library. | Default is 1. |
-c_DIS | removeDissimilarReplicates | Remove dissimilar replicates before creating consensus spectrum. | Turn on with -c_DIS, off with -c_DIS!. Default is on. |
-c_QUO<frac> | peakQuorum | Specify peak quorum: the fraction of all replicates required to contain a certain peak. Peaks not present in enough replicates will be deleted. | Default is 0.6. |
-c_XPU<num> | maximumNumPeaksUsed | Maximum number of peaks in each replicate to be considered in creating consensus. Only the <num> most intense peaks will be considered. | Default is 300. |
-c_XNR<num> | maximumNumReplicates | Maximum number of replicates used to build consensus spectrum. | Default is 100. |
-c_XPK<num> | maximumNumPeaksKept | De-noise single spectra by keeping only the most intense <num> peaks. | Default is 150. Will not affect consensus spectra of more than one replicate. |
-c_WGT<score> | replicateWeight | Select the type of score to weigh and rank the replicates. | -c_WGTS (default) = Use a measure of signal-to-noise ratio as the weight. -c_WGTX = Use a function of the SEQUEST xcorr score as the weight. -c_WGTP = Use a function of the PeptideProphet probability as the weight. |
BEST REPLICATE SELECTION OPTIONS (Applicable with -cAB option) | |||
-cr<num> | minimumNumReplicates | Minimum number of replicates required for each library entry. Peptide ions with fewer than <num> replicates will be excluded from library when creating best-replicate library. | Default is 1. |
-c_DIS | removeDissimilarReplicates | Remove dissimilar replicates before selecting best replicate. | Turn on with -c_DIS, off with -c_DIS!. Default is on. |
QUALITY FILTER OPTIONS (Applicable with -cAQ option) | |||
-cr<num> | minimumNumReplicates | Replicate quorum. Its value affects behavior of quality filter (see below). | Default is 1. |
-cL<level>, -cl<level> | qualityLevelRemove, qualityLevelMark | Specify the stringency of the quality filter. -cL specifies the level for removal, -cl specifies the level for marking. | <level> = 0: No filter. <level> = 1: Remove/mark impure spectra. <level> = 2: Also remove/mark spectra with a spectrally similar counterpart in the library that is better. <level> = 3: Also remove/mark inquorate entries (defined with -cr) that share no peptide sub-sequences with any other entries in the library. <level> = 4: Also remove/mark all singleton entries. <level> = 5: Also remove/mark all inquorate entries (defined with -cr). Default is -cL2, -cl5 |
-c_QP1 | qualityPenalizeSingletons | Apply stricter thresholds to singleton spectra during quality filters. | Turn on with -c_QP1, off with -c_QP1!. Default is on. |
-c_QIP<thres> | qualityImmuneProbThreshold | Specify a probability above which library spectra are immune to quality filters. | Default is 0.999. |
-c_QIE | qualityImmuneMultipleEngines | Make spectra identified by multiple sequence search engines immune to quality filters. | Turn on with -c_QIE, off with -c_QIE!. Default is on. |
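The peak quorum idea behind -c_QUO can be illustrated with a short Python sketch. This is not SpectraST's actual implementation: the function name, the fixed-width m/z binning used to decide that two peaks "match", and the averaging of intensities are all assumptions made for illustration only.

```python
from collections import defaultdict


def apply_peak_quorum(replicates, quorum=0.6, bin_width=1.0):
    """Keep only m/z bins that occur in at least `quorum` of the replicates.

    replicates: list of spectra, each a list of (mz, intensity) pairs.
    bin_width:  hypothetical m/z bin size used to match peaks across replicates.
    Returns a list of (bin_center_mz, mean_intensity), sorted by m/z.
    """
    if not replicates:
        return []

    # bin index -> {replicate index: strongest intensity seen in that bin}
    bins = defaultdict(dict)
    for rep_idx, spectrum in enumerate(replicates):
        for mz, intensity in spectrum:
            b = int(mz // bin_width)
            previous = bins[b].get(rep_idx, 0.0)
            bins[b][rep_idx] = max(previous, intensity)

    needed = quorum * len(replicates)
    kept = []
    for b, hits in sorted(bins.items()):
        # A peak survives only if enough replicates contain it.
        if len(hits) >= needed:
            mean_intensity = sum(hits.values()) / len(hits)
            kept.append(((b + 0.5) * bin_width, mean_intensity))
    return kept


if __name__ == "__main__":
    # Three made-up replicates of one peptide ion.
    replicates = [
        [(100.2, 10.0), (250.1, 5.0), (300.5, 4.0)],
        [(100.4, 20.0), (300.6, 6.0)],
        [(100.3, 30.0)],
    ]
    for mz, intensity in apply_peak_quorum(replicates, quorum=0.6):
        print(mz, intensity)
```

With three replicates and a quorum of 0.6, a peak must appear in at least two of them, so a peak seen in only one replicate is dropped.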
Command-line Token | Name in Parameter File | Meaning | Remarks |
---|---|---|---|
-V | None | Verbose mode. More information displayed to console. | Default is off. |
-Q | None | Quiet mode. | Default is off. |
-L<file> | None | Specify name of log file. | Default is spectrast.log. |
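When SpectraST is driven from a script, the tokens in the tables above can be assembled programmatically. The following Python sketch is a hypothetical helper (the function and parameter names are not part of SpectraST) that uses only tokens documented above, including the trailing "!" convention for turning a switch off. The program name and input files are deliberately left out; they would be supplied as described elsewhere on this page.

```python
def build_create_tokens(build_action=None, min_replicates=None,
                        peak_quorum=None, remove_dissimilar=None,
                        verbose=False, log_file=None):
    """Return a list of command-line tokens for a library creation job.

    build_action:      "B" (best replicate), "C" (consensus) or "Q" (quality filter).
    min_replicates:    value for -cr<num>.
    peak_quorum:       value for -c_QUO<frac>.
    remove_dissimilar: True -> -c_DIS, False -> -c_DIS!, None -> leave default.
    verbose:           add -V when True.
    log_file:          value for -L<file>.
    """
    tokens = []
    if build_action is not None:
        if build_action not in ("B", "C", "Q"):
            raise ValueError("build_action must be 'B', 'C' or 'Q'")
        tokens.append("-cA" + build_action)
    if min_replicates is not None:
        tokens.append(f"-cr{min_replicates}")
    if peak_quorum is not None:
        tokens.append(f"-c_QUO{peak_quorum}")
    if remove_dissimilar is not None:
        # Switches are turned off by appending "!" to the token.
        tokens.append("-c_DIS" if remove_dissimilar else "-c_DIS!")
    if verbose:
        tokens.append("-V")
    if log_file:
        tokens.append(f"-L{log_file}")
    return tokens


if __name__ == "__main__":
    print(" ".join(build_create_tokens("C", min_replicates=2, peak_quorum=0.7,
                                       remove_dissimilar=False, verbose=True,
                                       log_file="build.log")))
```

Leaving an argument at None omits the token entirely, so SpectraST's own default applies.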
Other SpectraST Utilities
Plotspectrast
Plotspectrast is a spectrum viewer designed for SpectraST. It comes as two programs: a CGI that can be launched from a web page (e.g., from PepXMLViewer), and a stand-alone application. They are included in the TPP and no additional installation is necessary.
The most common use of Plotspectrast is for visualization of spectral matches from PepXMLViewer. When displaying SpectraST results, PepXMLViewer provides a link to invoke plotspectrast.cgi for each spectrum query. The query spectrum will be plotted as a "mirror image" of the best-matched library spectrum, enabling the user to quickly assess the quality of the match. Below the plot there is an ion table, and tables listing information about the library spectrum. The legend of the plots and ion table is as follows:
Various controls are available to the left of the plot to customize how the spectra are displayed:
The stand-alone plotspectrast application produces a static .jpg image in the same directory as the query spectrum file. It has the following usage:
plotspectrast <.splib file> <library file offset> <.mzXML file> <query scan number>
Plots the library spectrum at <library file offset> and the query spectrum of <query scan number> in the .mzXML file. The desired value of <library file offset> can be extracted from the .spidx, .pepidx or .sptxt file (BinaryFileOffset in the Comment field).
plotspectrast <.splib file> <library file offset> <.dta file>
Similar to above, except the query spectrum is in a .dta file.
plotspectrast <.splib file> <library file offset> <.none file>
Plots the library spectrum by itself. It will not actually look for the .none file, but the resulting .jpg file will be named with the same prefix as the .none file and placed in the same directory.
plotspectrast <.msp file of library spectrum> <.msp, .dta, or .none file>
Similar to above, except the library spectrum is given in a .msp file.
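Looking up the library file offset by hand can be tedious, so a small script may help. The Python sketch below assumes the offset appears in the Comment field as a `BinaryFileOffset=<number>` key-value pair; that exact layout, the sample comment line, and both function names are assumptions for illustration. The argument order follows the usage lines above.

```python
import re

# Assumed layout: "BinaryFileOffset=<digits>" somewhere in the Comment field.
_OFFSET_RE = re.compile(r"\bBinaryFileOffset=(\d+)")


def find_library_offset(comment_line):
    """Return the library file offset found in a Comment line, or None."""
    match = _OFFSET_RE.search(comment_line)
    return int(match.group(1)) if match else None


def plotspectrast_args(splib, offset, query_file, scan=None):
    """Build the argument list for the stand-alone plotspectrast program.

    The scan number is only appended when given, i.e. for .mzXML queries.
    """
    args = ["plotspectrast", splib, str(offset), query_file]
    if scan is not None:
        args.append(str(scan))
    return args


if __name__ == "__main__":
    # Made-up Comment line for demonstration purposes.
    comment = "Comment: Mods=0 BinaryFileOffset=4021 Nreps=3/3"
    offset = find_library_offset(comment)
    print(plotspectrast_args("lib.splib", offset, "run.mzXML", scan=1500))
```

The resulting list can be passed to `subprocess.run` on a machine where plotspectrast is installed.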
Lib2HTML
Lib2HTML is an application that converts a SpectraST library into an HTML file for viewing. It is included in the TPP and no additional installation is necessary. In the resulting HTML file, replicates of the same peptide ion will be listed on one row, and links are provided to each replicate to view the spectrum using Plotspectrast. The usage is:
Lib2HTML <options> <full path from webserver root to .splib file>
Options include:
- -V: Verbose. Display more information for each entry.
- -N<num>: Specify the maximum number of replicates displayed for each unique peptide ion. Default is 10.
- -P<path>: Specify the full path from the webserver root to the plotspectrast.cgi binary.
Developer's Guide
The SpectraST source code contains detailed documentation. More tips for developers who want to modify SpectraST will be available shortly.
Where to Get Help
The SPC Tools Discussion Group: spctools-discuss.googlegroups.com
The SPC Tools Announcement Group: spctools-announce.googlegroups.com
Public spectral libraries are available for download at PeptideAtlas
External Links
- Institute for Systems Biology
- Sashimi Project on SourceForge.net
- Seattle Proteome Center Software
- spctools-discuss.googlegroups.com
- spctools-announce.googlegroups.com
- PeptideAtlas's Spectral Library Page
- GPM's X!Hunter Project
- BiblioSpec Project at University of Washington
Reference
- Keller, Andrew, et al. (2005). "A uniform proteomics MS/MS analysis platform utilizing open XML file formats". Molecular Systems Biology 1, 17. Full text
- Lam, Henry, et al. (2007). "Development and validation of a spectral library searching method for peptide identification from MS/MS". Proteomics 7 (5), 655-667. Abstract
- Craig, Robertson, et al. (2006). "Using annotated peptide mass spectrum libraries for protein identification". Journal of Proteome Research 5 (8), 1843-1849. Abstract
- Frewen, Barbara, et al. (2006). "Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries". Analytical Chemistry 78 (16), 5678-5684. Abstract