Processing glycopeptide data
From SPCTools
Background
We sometimes want to assay the glycoproteins in a sample. Because they are low in abundance, we first remove non-glycopeptides from the digested sample. N-linked glycosylation occurs at the asparagine (N) of an N[^P][ST] motif (N, followed by anything but P, followed by S or T). However, not all such motifs are glycosylated, and since a peptide may contain more than one such motif, we need to distinguish glycosylated motifs from non-glycosylated ones. We therefore remove the carbohydrate moiety with PNGase F (something we want to do anyway to reduce sample complexity). PNGase F also converts the formerly glycosylated N to a D, effectively tagging that N.
One might then search with an optional modification on N with mass equal to the N-to-D mass difference (about +0.984 Da). But this would allow N-to-D substitution at Ns that are not part of a glyco motif. So instead we substitute J or B for the N in every glyco motif in the database, and allow an optional modification only at J (or B).
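As a rough illustration of the motif described above, the Python sketch below lists the Ns in a peptide that sit in an N[^P][ST] motif, i.e. the only positions where an N-to-D change should be considered. The function name and the example peptide are made up for illustration and are not taken from the scripts mentioned on this page. A lookahead is used so that overlapping motifs are each reported.

```python
import re

# N, then any residue except P, then S or T. The lookahead keeps the match
# to the N alone, so overlapping motifs (e.g. NNSS) are all found.
GLYCO_N = re.compile(r"N(?=[^P][ST])")

def glyco_sites(seq):
    """Return 0-based positions of every N that starts an N[^P][ST] motif."""
    return [m.start() for m in GLYCO_N.finditer(seq)]

# Hypothetical peptide: NGS matches, NPT does not, NNS and NSS overlap.
peptide = "AANGSKNPTNNSS"
sites = glyco_sites(peptide)
print(sites)  # -> [2, 9, 10]
```

The N at position 6 is skipped because it is followed by P, so a search restricted to the reported positions would not consider an N-to-D change there.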
Outline of analysis procedure
- Create a target database in which the N of every glyco site is swapped for a J or B. Which letter to use depends on downstream software limitations and on whether the original sequence database already contains Bs (B is a standard but uncommon abbreviation for "N or D, not distinguishable").
- Search glyco data against this DB
- In search results, change J/B back to N
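The first and last steps of this outline can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the scripts referred to on this page: the function names, the demo entries, and the choice to raise an error when the placeholder letter already occurs in a sequence are all illustrative. It joins wrapped sequence lines before matching, so a motif split across two FASTA lines is still found, and it does not implement any site exceptions beyond N-P-S/T.

```python
import re

# N of an N[^P][ST] motif; the lookahead leaves the next two residues
# unconsumed so overlapping motifs are all converted.
MOTIF_N = re.compile(r"N(?=[^P][ST])")

def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def convert_db(lines, tag="B"):
    """Return FASTA lines with the N of every glyco motif replaced by `tag`."""
    out = []
    for header, seq in read_fasta(lines):
        if tag in seq:
            # Assumption of this sketch: refuse ambiguous input rather than guess.
            raise ValueError(f"{header}: sequence already contains {tag}")
        out.extend([header, MOTIF_N.sub(tag, seq)])
    return out

def change_back(peptide, tag="B"):
    """Restore the original residue in a peptide taken from search results."""
    return peptide.replace(tag, "N")

# Hypothetical two-entry database; the first motif spans a line break.
demo = [">p1 hypothetical entry", "MKN", "GSANPTK", ">p2", "NNSS"]
converted = convert_db(demo)
print(converted)
print(change_back(converted[1]))
```

Choosing J instead of B only requires passing `tag="J"`, provided the search engine accepts that letter.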
Notes on how to analyze
An example search done with Sequest and X!Tandem is in the directory below. The X!Tandem parameters are the same as usual except that the target database is ipi.HUMAN.v3.38_forwdecoy_nxst.fasta:
/regis/sbeams/archive/jwatts/HsGlycoPlasma35indiv/HsGlycoPlasma35indiv
Pertinent scripts and other useful files are in ~tfarrah/alt_nxst. They are out of context here, so there might be some unforeseen issues. If they don't work out of the box, just let me know and I'll help you troubleshoot.
The basic method entails searching against a modified db in which every NXS/T is replaced by BXS/T (except for NPS/T and a few other exceptions). B stands in for a D that's been substituted in. We run the search with a static modification on B in sequest.params (to make it the same weight as a D), then back-convert the results (presumably by substituting N for every B) and process as normal, including refresh-parsing against the original db. We modified the method a little to use 'B' (avg of D and N) because our version of Sequest was limited in its ability to accept non-standard amino acids. If you are running X!Tandem you might want to consider whether there is a better way to do this. The files in the archive are outlined below; let me know if you have questions.
Atwood-York_GlycopeptideSearchStrategy.pdf - original paper this is based on
nxst_conversion_recipe.txt - README file for this process. Note the perl -pi -e step; it must not be skipped.
sequest.params.ft - modified sequest.params file, the salient line is shown below:
add_B_avg_NandD = 0.4920 ; added to B - avg. 114.5962, mono. 114.53494
make_nxst_db.pl - script to convert database; assumes each sequence is all on one line, otherwise the regex might need tweaking.
batch_convert.sh - script to translate a batch of xml files.
changeback.pl - perl script called by the batch script above to back-substitute files.
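The 0.4920 offset in the add_B_avg_NandD line can be checked with a little arithmetic. The sketch below assumes the standard monoisotopic residue masses of N and D, typed in by hand rather than read from sequest.params.ft, and shows that adding the offset to the N/D average brings B up to the mass of D.

```python
# Standard monoisotopic residue masses (assumed values, in Da).
N_MONO = 114.04293  # asparagine residue
D_MONO = 115.02694  # aspartic acid residue

# Default mass of B when treated as the average of N and D.
b_mono = (N_MONO + D_MONO) / 2

# Static modification needed so that B weighs the same as D.
shift = D_MONO - b_mono

print(f"B (mono) = {b_mono:.4f}, shift to D = {shift:.4f}")
```

The shift is half of the N-to-D mass difference, which is why it is roughly 0.492 rather than 0.984.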