Expert search and TPP usage

From SPCTools

Revision as of 18:43, 23 January 2009, by Tfarrah

Data storage

If data is to ultimately be stored in SBEAMS, it should be placed in the following directory structure:

/regis/sbeams/archive/<investigator>/<project>/<experiment>/<search_descriptor>

For example:

/regis/sbeams/archive/youngah/HsUrine/HsNormFemUrine_163A/XTK_Hs3.38

<investigator> should be the name of the person who generated the data, in the format used in SBEAMS. An investigator who is not yet registered in SBEAMS should be registered first.

<search_descriptor> should be a three-letter abbreviation for the search engine, followed by an underscore and a brief descriptor of the database searched, as in XTK_Hs3.38 above.

Raw data and mzXML files should be stored in the <experiment> directory. mzXML files should be symlinked (use ln -s) to each <search_descriptor> directory. Search results and TPP results should be stored in the <search_descriptor> directories.
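
The layout above can be sketched as follows. This is a minimal illustration in POSIX sh (the commands elsewhere on this page are csh), and it builds the tree under a scratch directory rather than under /regis/sbeams/archive. The file names run1.mzXML and run2.mzXML are placeholders, not real data.

```shell
#!/bin/sh
# Sketch only: builds the archive layout under a temporary directory
# (assumption: a stand-in for /regis/sbeams/archive, so nothing real is touched).
set -eu

ARCHIVE_ROOT=$(mktemp -d)
INVESTIGATOR=youngah
PROJECT=HsUrine
EXPERIMENT=HsNormFemUrine_163A
SEARCH_DESCRIPTOR=XTK_Hs3.38

EXPERIMENT_DIR="$ARCHIVE_ROOT/$INVESTIGATOR/$PROJECT/$EXPERIMENT"
SEARCH_DIR="$EXPERIMENT_DIR/$SEARCH_DESCRIPTOR"
mkdir -p "$SEARCH_DIR"

# Placeholder mzXML files (hypothetical names); raw data and mzXML files
# live in the experiment directory.
touch "$EXPERIMENT_DIR/run1.mzXML" "$EXPERIMENT_DIR/run2.mzXML"

# Symlink each mzXML file into the search directory. The relative target
# (../name) keeps the links valid if the experiment directory is moved.
for file in "$EXPERIMENT_DIR"/*.mzXML; do
    name=$(basename "$file")
    ln -s "../$name" "$SEARCH_DIR/$name"
done

ls -l "$SEARCH_DIR"
```

Search results and TPP results would then be written alongside the symlinks in the search directory.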

Searching

Go to the directory for your experiment.

# copy a generic search parameter file and edit to suit your data
cp /sbeams/bin/params/search.params .
vi search.params

X!Tandem-K

We use X!Tandem-K, a modification of the publicly available X!Tandem search engine. It uses a significantly different scoring algorithm, K-score.

# see above for example <search_descriptor>
setenv SEARCHDIR <search_descriptor>
mkdir $SEARCHDIR
foreach file ( *.mzXML )
  ln -s ../$file $SEARCHDIR/$file
end
cd $SEARCHDIR
/sbeams/bin/params/createEngineSpecificParams.pl --config_file ../search.params --output tandem.params
# edit tandem.params to conduct a search appropriate for your data
vi tandem.params
echo "-ON" > xinteract.params
/sbeams/bin/tandem/runtandemsearch *.mzXML

(The TPP is run automatically on the search results, using the options in xinteract.params, and the full log is emailed to you.)

X!Tandem parameters

See http://www.thegpm.org/TANDEM/api/ for a description of X!Tandem parameters.

A major choice you must make is between a one-pass and a two-pass search. A one-pass search uses generous criteria throughout (allowing semi-tryptic matches, modifications, and missed cleavages). A two-pass search uses stricter criteria in the first pass, then generous criteria in the second pass, which searches only those proteins that were matched in the first pass. The two-pass method is called "refine" mode. It is much faster, but it violates some of the assumptions made in the TPP and may therefore give slightly less accurate TPP results.

Example parameters for a one-pass search allowing semi-tryptic cleavage:

<note type="input" label="protein, cleavage semi">yes</note>
<note type="input" label="refine">no</note>

Example parameters for a search using refine mode, allowing semi-tryptic cleavage only in the second pass:

<note type="input" label="refine">yes</note>
<note type="input" label="refine, cleavage semi">yes</note>
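
Before launching a search, it can be worth confirming which mode a parameter file actually selects. The following is a rough POSIX sh sketch, assuming the file uses the <note> format shown above. The sample file it writes is illustrative only, and get_param is a hypothetical helper, not part of X!Tandem or the TPP.

```shell
#!/bin/sh
# Sketch only: reads the refine settings back out of a parameter file.
set -eu

# Write a small sample file (assumption: stands in for a real tandem.params).
PARAMS=$(mktemp)
cat > "$PARAMS" <<'EOF'
<?xml version="1.0"?>
<bioml>
  <note type="input" label="refine">yes</note>
  <note type="input" label="refine, cleavage semi">yes</note>
</bioml>
EOF

# Hypothetical helper: print the value of the note whose label matches exactly.
get_param() {
    sed -n 's|.*label="'"$1"'">\([^<]*\)</note>.*|\1|p' "$PARAMS"
}

REFINE=$(get_param "refine")
SEMI=$(get_param "refine, cleavage semi")

if [ "$REFINE" = "yes" ]; then
    MODE="two-pass (refine)"
else
    MODE="one-pass"
fi

echo "search mode: $MODE; semi-tryptic in second pass: $SEMI"
```

Matching on the full label matters here, because "refine" is also a prefix of "refine, cleavage semi".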

Raw notes from Eric, November 2008

If you want to rerun the TPP on a dataset, do:

/bin/rm interact*
/bin/rm zztandem*
setenv TESTDEVPATH /tools/bin/TPP/tpp-dshteynb/bin
/sbeams/bin/tandem/finishtandemsearch --nowait *.mzXML >& zztandempostprocessing.log

(The --nowait option is required so that the finisher does not wait around for the .done files.)
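
Since the two rm commands use wildcards, it may be worth previewing what they match before deleting anything. The following POSIX sh sketch does this in a scratch directory. All file names in it are placeholders chosen for illustration, and it does not call finishtandemsearch.

```shell
#!/bin/sh
# Sketch only: preview, then remove, the files matched by the cleanup globs.
set -eu

WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Placeholder files (hypothetical names) standing in for a finished search.
touch interact.pep.xml interact.prot.xml zztandem.log run1.mzXML run1.tandem

# Preview: list what each glob matches without deleting anything.
for pattern in 'interact*' 'zztandem*'; do
    for f in $pattern; do
        if [ -e "$f" ]; then
            echo "would remove: $f"
        fi
    done
done

# Delete the old TPP output and logs; search input and results are untouched.
rm -f interact* zztandem*

echo "remaining files:"
ls
```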


Note that runtandemsearch and finishtandemsearch will also run on *.mgf, *.mzData, or *.pkl files.

SpectraST

To process one or more mzXML files, do this:

setenv SEARCHDIR SST_HsNISTIT2.0_aDECOY1
mkdir $SEARCHDIR
foreach file ( *.mzXML )
  ln -s ../$file $SEARCHDIR/$file
end
cd $SEARCHDIR
cp /regis/sbeams/bin/spectrast/spectrast.params .
# edit spectrast.params to conduct a search appropriate for your data
vi spectrast.params
echo "-OPNMd -dDECOY" > xinteract.params
setenv TESTDEVPATH /tools/bin/TPP/tpp-dshteynb/bin
/sbeams/bin/spectrast/runspectrast *.mzXML

(The TPP is run automatically on the search results, using the options in xinteract.params, and the full log is emailed to you.)


If you want to rerun the TPP on a dataset, do:

/bin/rm interact*
/bin/rm zzpost*
setenv TESTDEVPATH /tools/bin/TPP/tpp-dshteynb/bin
/sbeams/bin/spectrast/finishspectrastsearch --nowait *.mzXML >& zzpostprocessing.log

(The --nowait option is required so that the finisher does not wait around for the .done files.)

Notes:

- For X!Tandem, let’s not use the semi-parametric model.
- For SpectraST, let’s DO use the semi-parametric model.
- iProphet and MHT confidence scores are good. Let’s use them. The confidence scores are conservative but good.
- I think LOGPROBS is a bust. I’m not recommending it at the moment.
