Processing glycopeptide data
From SPCTools
Revision as of 22:23, 20 February 2009 by Tfarrah
Revision as of 22:54, 20 February 2009 by Tfarrah
Revision as of 22:54, 20 February 2009
Raw notes from Dave Campbell email to tfarrah on Jan. 20, 2009
You can look here for a search that was done with sequest and xtandem. Xtandem params are the same as usual except that the target database is ipi.HUMAN.v3.38_forwdecoy_nxst.fasta:
/regis/sbeams/archive/jwatts/HsGlycoPlasma35indiv/HsGlycoPlasma35indiv
Pertinent scripts and other useful files are in ~tfarrah/alt_nxst. I took these out of context, so there might be some unforeseen issues. If they don't work out of the box just let me know and I'll help you troubleshoot.
The basic method entails searching against a modified db in which every NXS/T is replaced by BXS/T (except for NPS/T and a few other exceptions). B stands for a D that's been substituted in. We then run the search with a static modification on B in sequest.params (to make it the same weight as a D), back-convert the results (substituting Ns for all Bs?), and process as normal (including refresh-parsing against the original db). We modified the method a little to use 'B' (avg of D and N) because our version of Sequest was limited in its ability to accept non-standard amino acids. If you are running xtandem you might want to consider whether there is a better way to do this. I've outlined the files in the archive below; let me know if you have questions.
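To make the substitution concrete, here is a minimal Python sketch of the forward and back conversion. It is not the code in make_nxst_db.pl or changeback.pl; the function names are made up for illustration, and it only implements the NPS/T exception, not the "few other exceptions" mentioned above. Back-conversion as "every B becomes N" is an assumption taken from the open question in the notes. The FASTA helper joins wrapped sequence lines before matching, since a motif can straddle a line break.

```python
import re

# N followed by any residue except P, then S or T. The lookahead leaves the
# next two residues unconsumed, so overlapping sequons (e.g. "NNST") are all hit.
SEQUON_N = re.compile(r"N(?=[^P][ST])")


def to_nxst(sequence):
    """Replace the N of each N-X-S/T sequon (X != P) with B."""
    return SEQUON_N.sub("B", sequence)


def from_nxst(sequence):
    """Back-convert by turning every B into N (assumed behaviour).

    Note: this would also rewrite any B (Asx) already present in the
    original database.
    """
    return sequence.replace("B", "N")


def convert_fasta(lines):
    """Yield (header, converted_sequence) pairs from FASTA lines.

    Sequence lines belonging to one entry are joined before conversion.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, to_nxst("".join(chunks))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, to_nxst("".join(chunks))


if __name__ == "__main__":
    for name, seq in convert_fasta([">example", "AAN", "GTKNPSR"]):
        print(name)
        print(seq)
```

The lookahead is the main design choice here: a pattern that consumed the whole three-residue motif would skip the second N in a run like NNST.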
Atwood-York_GlycopeptideSearchStrategy.pdf - original paper this is based on
nxst_conversion_recipe.txt - README file for this process. Note the perl -pi -e step; it must be done.
sequest.params.ft - modified sequest.params file, the salient line is shown below:
add_B_avg_NandD = 0.4920 ; added to B - avg. 114.5962, mono. 114.53494
make_nxst_db.pl - script to convert database; assumes each sequence is all on one line, otherwise the regex might need tweaking.
batch_convert.sh - script to translate batch of xml files.
changeback.pl - perl script called by batch script above to back-substitute files.
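As a sanity check on the add_B_avg_NandD line in sequest.params.ft, the short Python sketch below redoes the arithmetic. The residue masses for N and D are standard textbook values, not taken from the archive, and the variable names are only illustrative. It shows that B starts at the average of N and D and that adding 0.4920 brings it to (approximately) the mass of D.

```python
# Standard residue masses in Da for Asn (N) and Asp (D); assumed values,
# not read from any file in the archive.
N_AVG, D_AVG = 114.1038, 115.0886
N_MONO, D_MONO = 114.04293, 115.02694

# Static modification from the add_B_avg_NandD line in sequest.params.ft.
B_SHIFT = 0.4920

# B is defined as the average of N and D.
b_avg = (N_AVG + D_AVG) / 2
b_mono = (N_MONO + D_MONO) / 2

# After the static modification, B should weigh about the same as D.
print(f"average:      B = {b_avg:.4f}, B + shift = {b_avg + B_SHIFT:.4f}, D = {D_AVG:.4f}")
print(f"monoisotopic: B = {b_mono:.5f}, B + shift = {b_mono + B_SHIFT:.5f}, D = {D_MONO:.5f}")
```

The shift is half of the N to D mass difference, which is why the same 0.4920 works for both the average and monoisotopic columns to within rounding.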