Expert search and TPP usage
From SPCTools
Preparation
Know the following about your data:
- which amino acid modifications are ubiquitous, and what are their weights?
- which amino acid modifications are present only some of the time?
- how complete was the digest?
- was the data analyzed on a high mass accuracy instrument?
Data storage
If data is to ultimately be stored in SBEAMS, it should be placed in the following directory structure (Feb. 2009 note -- this disk is nearly full and new data should not be moved there):
/regis/sbeams/archive/<investigator>/<project>/<experiment_tag>/<search_descriptor>
For example:
/regis/sbeams/archive/youngah/HsUrine/HsNormFemUrine_163A/XTK_Hs3.38
<investigator> should be the name of the person who generated the data, in the format used in SBEAMS. If that investigator is not registered in SBEAMS, they should be registered.
<search_descriptor> should be a three-letter abbreviation for the search engine, underscore, and a brief descriptor of the database searched.
Raw data and mzXML files should be stored in the <experiment_tag> directory. mzXML files should be symlinked (use ln -s) into each <search_descriptor> directory. Search results and TPP results should be stored in the <search_descriptor> directories.
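The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper names archive_dir and search_descriptor are hypothetical and are not part of SBEAMS or the TPP. The archive root and the example values are the ones given above.

```python
from pathlib import PurePosixPath

# Archive root as given in the directory structure above.
ARCHIVE_ROOT = PurePosixPath("/regis/sbeams/archive")


def search_descriptor(engine_abbrev, database_tag):
    """Build <search_descriptor>: three-letter engine code, underscore, database tag."""
    if len(engine_abbrev) != 3:
        raise ValueError("search engine abbreviation should be three letters")
    return f"{engine_abbrev}_{database_tag}"


def archive_dir(investigator, project, experiment_tag, descriptor=None):
    """Return the experiment directory, or a search directory if a descriptor is given."""
    path = ARCHIVE_ROOT / investigator / project / experiment_tag
    return path / descriptor if descriptor else path


# Reproduces the example path shown above.
print(archive_dir("youngah", "HsUrine", "HsNormFemUrine_163A",
                  search_descriptor("XTK", "Hs3.38")))
```

Calling archive_dir without a descriptor gives the directory where raw data and mzXML files belong.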
Moving XML files
Paths are hardcoded within pepXML and protXML files. If you move these files, you must run the following script in order for the files to work properly with the TPP:
/sbeams/bin/updateAllPaths.pl *.xml *.xls *.shtml
Searching
Go to the directory for your experiment and set up a generic search parameters file. This will be automatically adapted for SEQUEST and X!Tandem using a nifty script that Abhishek Pratap wrote in 2008.
# copy a generic search parameter file and edit to suit your data
cp /sbeams/bin/params/search.params .
vi search.params
X!Tandem-K
We use a modification of the publicly-available X!Tandem search engine called X!Tandem-K. It uses a significantly different scoring algorithm, K-score.
# see above for example <search_descriptor>
setenv SEARCHDIR <search_descriptor>
mkdir $SEARCHDIR
foreach file ( *.mzXML )
ln -s ../$file $SEARCHDIR/$file
end
cd $SEARCHDIR
/sbeams/bin/params/createEngineSpecificParams.pl --config_file ../search.params --output tandem.params
# edit tandem.params to conduct a search appropriate for your data
vi tandem.params
echo " -OdA -dDECOY_ -E<experiment_tag>" > xinteract.params
/sbeams/bin/tandem/runtandemsearch *.mzXML
(TPP is run automatically on the search results using whatever is in xinteract.params, and the whole log is emailed to you.)
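For anyone scripting the setup outside csh, the linking step in the commands above can be sketched in Python. This is a minimal sketch, not a replacement for the site scripts. The function name link_mzxml is hypothetical. It creates the search directory and a relative symlink (../file) for each mzXML file, as the foreach loop does.

```python
import os
from pathlib import Path
from typing import List


def link_mzxml(experiment_dir: Path, search_dir_name: str) -> List[Path]:
    """Create <search_dir_name> under experiment_dir and symlink every *.mzXML into it."""
    search_dir = experiment_dir / search_dir_name
    search_dir.mkdir(exist_ok=True)
    links = []
    for mzxml in sorted(experiment_dir.glob("*.mzXML")):
        link = search_dir / mzxml.name
        # Relative target, like `ln -s ../$file $SEARCHDIR/$file`, so the link
        # stays valid as long as the two directories keep their relative layout.
        if not link.is_symlink() and not link.exists():
            os.symlink(Path("..") / mzxml.name, link)
        links.append(link)
    return links


# Example call (hypothetical directory name):
# link_mzxml(Path.cwd(), "XTK_Hs3.38")
```

Relative links, unlike absolute ones, keep working if the whole experiment directory is later moved.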
X!Tandem parameters
See [1] for a description of X!Tandem parameters.
A major choice you must make is whether to do a one-pass search with generous criteria (allowing semi-tryptic matches, modifications, and missed cleavages), or whether to do a two-pass search, the first pass with stricter criteria, and the second pass with generous criteria but only searching those proteins that were matched in the first pass. The two-pass method is called "refine" mode. It is much faster, but violates some of the assumptions made in the TPP and therefore may give slightly less accurate TPP results.
Example parameters for a one-pass search allowing semi-tryptic cleavage:
<note type="input" label="protein, cleavage semi">yes</note>
<note type="input" label="refine">no</note>
Example parameters for a search using refine mode, allowing semi-tryptic cleavage only in the second pass:
<note type="input" label="refine">yes</note>
<note type="input" label="refine, cleavage semi">yes</note>
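To check which of the two strategies a given tandem.params file actually requests, the notes can be read with a short script. The following is a sketch under stated assumptions: the function name summarize_search is hypothetical, the notes are assumed to sit inside the usual <bioml> root element of an X!Tandem parameter file, and a missing label is treated as "no".

```python
import xml.etree.ElementTree as ET


def summarize_search(params_xml: str) -> dict:
    """Report number of passes and where semi-tryptic cleavage is allowed."""
    root = ET.fromstring(params_xml)
    # Collect <note type="input" label="...">value</note> entries by label.
    notes = {
        note.get("label"): (note.text or "").strip().lower()
        for note in root.iter("note")
        if note.get("type") == "input"
    }
    refine = notes.get("refine") == "yes"  # assumption: absent means "no"
    return {
        "passes": 2 if refine else 1,
        "semi_first_pass": notes.get("protein, cleavage semi") == "yes",
        "semi_second_pass": refine and notes.get("refine, cleavage semi") == "yes",
    }


one_pass = """<bioml>
<note type="input" label="protein, cleavage semi">yes</note>
<note type="input" label="refine">no</note>
</bioml>"""

refine_mode = """<bioml>
<note type="input" label="refine">yes</note>
<note type="input" label="refine, cleavage semi">yes</note>
</bioml>"""

print(summarize_search(one_pass))
print(summarize_search(refine_mode))
```

For a file on disk, pass its contents in, for example summarize_search(open("tandem.params").read()).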
xinteract (TPP) parameters for X!Tandem searches
For a listing of all xinteract parameters, type xinteract | more.
Here is a sample xinteract.params file for an X!Tandem search using decoys; it works with and without refinement:
-OdA -dDECOY_ -EYoungAhFem1912
Key:
- -O (letter, not digit) introduces options for PeptideProphet
- d reports decoy hits with a computed probability based on the model learned
- A says that you have high mass accuracy data that was searched using mono-isotopic masses
- -dDECOY_ use decoy hits to pin down the negative distribution; DECOY_ is the decoy identifier prefix
- -E is the experiment tag
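The key above can be applied mechanically to an xinteract.params line. The sketch below is only a reading aid built from that key: the function name describe_xinteract_params is hypothetical, it is not how xinteract itself parses its arguments, and any option not listed in the key is reported as unknown rather than guessed.

```python
# Meanings taken from the key above; other PeptideProphet letters are not covered here.
PEPTIDEPROPHET_OPTIONS = {
    "d": "report decoy hits with a computed probability",
    "A": "high mass accuracy data searched with monoisotopic masses",
}


def describe_xinteract_params(line: str) -> dict:
    """Split an xinteract.params line into the parts explained in the key."""
    summary = {
        "peptideprophet_options": [],
        "decoy_prefix": None,
        "experiment_tag": None,
        "other": [],
    }
    for token in line.split():
        if token.startswith("-O"):
            # Each letter after -O is a separate PeptideProphet option.
            summary["peptideprophet_options"] = [
                (letter, PEPTIDEPROPHET_OPTIONS.get(letter, "not covered by the key"))
                for letter in token[2:]
            ]
        elif token.startswith("-d"):
            summary["decoy_prefix"] = token[2:]
        elif token.startswith("-E"):
            summary["experiment_tag"] = token[2:]
        else:
            summary["other"].append(token)
    return summary


print(describe_xinteract_params("-OdA -dDECOY_ -EYoungAhFem1912"))
```

Note that the lowercase d inside -OdA (a PeptideProphet option) and the standalone -dDECOY_ (the decoy prefix) are different options, which is why the sketch handles them in separate branches.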
raw notes from Eric, November 2008
If you want to rerun the TPP on a dataset, do:
/bin/rm interact*
/bin/rm zztandem*
setenv TESTDEVPATH /tools/bin/TPP/tpp-dshteynb/bin
/sbeams/bin/tandem/finishtandemsearch --nowait *.mzXML >& zztandempostprocessing.log
(the --nowait is required so that the finisher doesn't wait around for the .done files)
Note that runtandemsearch and finishtandemsearch will also run on *.mgf or *.mzData or *.pkl
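Because the rerun procedure starts by deleting files, it can be worth listing what the two rm commands would match before running them. The following is a hedged dry-run sketch: the function name files_to_remove is hypothetical, and it only lists matches for the interact* and zztandem* patterns used above without deleting anything.

```python
from pathlib import Path
from typing import List, Sequence


def files_to_remove(directory: Path,
                    patterns: Sequence[str] = ("interact*", "zztandem*")) -> List[Path]:
    """List the files that `rm interact*` and `rm zztandem*` would match."""
    found = set()
    for pattern in patterns:
        for path in directory.glob(pattern):
            # Plain rm does not remove directories, so skip them here too.
            if path.is_file() or path.is_symlink():
                found.add(path)
    return sorted(found)


# Example call for the current search directory:
# for path in files_to_remove(Path.cwd()):
#     print(path.name)
```

For a SpectraST search directory, the same check applies with patterns=("interact*", "zzpost*").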
SpectraST:
- To process 1 or more mzXML files, do this:
setenv SEARCHDIR SST_HsNISTIT2.0_aDECOY1
mkdir $SEARCHDIR
foreach file ( *.mzXML )
ln -s ../$file $SEARCHDIR/$file
end
cd $SEARCHDIR
cp /regis/sbeams/bin/spectrast/spectrast.params .
vi spectrast.params
echo "-OPNMd -dDECOY" > xinteract.params
setenv TESTDEVPATH /tools/bin/TPP/tpp-dshteynb/bin
/sbeams/bin/spectrast/runspectrast *.mzXML
(TPP is run automatically on the search results using whatever is in xinteract.params, and the whole log is emailed to you.)
If you want to rerun the TPP on a dataset, do:
/bin/rm interact*
/bin/rm zzpost*
setenv TESTDEVPATH /tools/bin/TPP/tpp-dshteynb/bin
/sbeams/bin/spectrast/finishspectrastsearch --nowait *.mzXML >& zzpostprocessing.log
(the --nowait is required so that the finisher doesn't wait around for the .done files)
Notes:
- For X!Tandem, let’s not use the semi-parametric model.
- For SpectraST, let’s DO use the semi-parametric model.
- iProphet and MHT confidence scores are good. Let’s use them. Confidence scores are conservative but good.
- I think LOGPROBS is a bust. I’m not recommending them at the moment.