Terry's blog
Terry's research blog
October 8, 2008
I keep thinking the date must be a little later than it actually is. Like, I thought today should be the 9th. At the same time, it seems not so long ago that 2008 felt like a very new year, and it was hard to imagine it would ever feel otherwise.
My task for the next 11 months here at ISB is to reduce the false positive rate in PeptideAtlas. We want to catalog as many proteins as possible while minimizing inclusion of proteins that aren't really in the sample. Eric Deutsch and I have had several discussions about this, starting at my interview in May of this year. This is a place for me to record what I've learned from Eric, and to record ideas of my own and those I've received from others.
Ideas for reducing the false positive rate in PeptideAtlas
- Discard singletons (proteins represented by only a single spectrum)
- Require a much higher probability cutoff for singletons (e.g. 0.99 instead of 0.90)
- Require a much higher probability cutoff for all protein identifications (e.g. 0.99 instead of 0.90)
- Set a fixed FDR, say 0.1%, and set probability cutoffs accordingly
- Local FDRs should match the global FDR
- Use Henry Lam's SpectraST quality filter
- Cutoff of 0.99 for all nobs (cryptic incomplete note)
- Make use of this observation: the decoy-estimated FDR is much smaller than the FDR obtained by averaging the probabilities of all (peptide?) identifications and subtracting that average from 1. This suggests that either the decoy-estimated FDR is too small or the probabilities are too small.
- Recalculate probabilities for short peptides based on ((# short decoys) / (total hits to short peptides))
- Do the search engines make use of LC retention times? If not, try using them.
- Look at peak intensity. Searches and TPP do not look at this.
Spectrum features to watch for
- Big unidentified peaks are bad (possible exception: peak just to left of precursor m/z -- confirm with expert)
- Consecutive identified peaks are good. Breaks in this are bad.
- Important identifications should be on big peaks, not small ones. Searches and TPP do not take into account peak intensity.
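
The spectrum features above could be turned into simple numbers along these lines. This is only a rough sketch under my own assumptions: peaks are (m/z, intensity) pairs, the theoretical fragment m/z values for the candidate peptide are already computed and listed in series order, and a fixed 0.5 m/z tolerance is used for matching. None of the names come from the search engines or TPP.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical layout for this sketch


def match_peaks(peaks: List[Peak], fragment_mzs: List[float],
                tol: float = 0.5) -> List[float]:
    """For each theoretical fragment, the intensity of the strongest peak
    within tol of it, or 0.0 if no peak matches."""
    matched = []
    for frag in fragment_mzs:
        hits = [inten for mz, inten in peaks if abs(mz - frag) <= tol]
        matched.append(max(hits) if hits else 0.0)
    return matched


def longest_consecutive_run(matched: List[float]) -> int:
    """Length of the longest unbroken run of matched fragments (in series order)."""
    best = run = 0
    for inten in matched:
        run = run + 1 if inten > 0 else 0
        best = max(best, run)
    return best


def largest_unidentified_peak(peaks: List[Peak], fragment_mzs: List[float],
                              tol: float = 0.5) -> float:
    """Intensity of the biggest unmatched peak as a fraction of the base peak."""
    if not peaks:
        return 0.0
    base = max(inten for _, inten in peaks)
    if base <= 0:
        return 0.0
    unmatched = [inten for mz, inten in peaks
                 if all(abs(mz - frag) > tol for frag in fragment_mzs)]
    return max(unmatched) / base if unmatched else 0.0


if __name__ == "__main__":
    # Made-up spectrum and fragment list, purely for illustration.
    peaks = [(100.0, 50.0), (200.0, 80.0), (300.0, 10.0), (450.0, 100.0)]
    fragments = [100.1, 200.2, 300.0, 400.0, 500.0]
    matched = match_peaks(peaks, fragments)
    print(matched)
    print(longest_consecutive_run(matched))
    print(largest_unidentified_peak(peaks, fragments))
```

A long consecutive run and a small largest-unidentified-peak fraction would both count in favor of an identification. The possible exception for a peak just to the left of the precursor m/z is not handled here.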