PeptideAtlas Pipeline Retool 2012/13

From SPCTools

10/29/12:

Why is the build pipeline so complex and time consuming? What are we trying to do?

00. Start with iProphet (iPro) and ProteinProphet (ProtPro) results, refreshed to the reference DB

=> We can skip refresh if ref DB same as search DB

01. For each expt, create a PAidentlist template file containing all PSMs above P=0.4; the template can be reused for multiple builds on the same data.

=> Why does this take so long?
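As a rough illustration of the filtering in step 01, the sketch below keeps only PSMs whose probability is above 0.4. The column names (`spectrum`, `peptide`, `probability`) and the function `filter_psms` are hypothetical; the real PAidentlist layout is not described in these notes.

```python
import csv
import io

MIN_PROB = 0.4  # threshold from step 01


def filter_psms(rows, min_prob=MIN_PROB):
    """Yield PSM rows whose probability is above min_prob."""
    for row in rows:
        if float(row["probability"]) > min_prob:
            yield row


# Tiny in-memory example; the column names are assumptions, not the real format.
example_tsv = (
    "spectrum\tpeptide\tprobability\n"
    "s1\tPEPTIDEK\t0.95\n"
    "s2\tAAAK\t0.10\n"
    "s3\tLLLR\t0.62\n"
)
reader = csv.DictReader(io.StringIO(example_tsv), delimiter="\t")
kept = list(filter_psms(reader))
```

Because the filter is a generator, it streams rows one at a time and never holds a whole experiment in memory.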

NEW: Pre-run steps 01 and 01a (below) using a roughly estimated PSM FDR. Then calculate the PSM FDRs needed for Silver and Gold using David's formula.

NEW: Perform steps 01a through 06 in parallel for Gold (prot FDR=1%) and Silver (prot FDR=5%) builds.
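One possible way to launch the Gold and Silver builds side by side is sketched below. `run_build` is a hypothetical stand-in for steps 01a through 06; a thread pool is assumed to be sufficient because the real work would presumably run in external programs.

```python
from concurrent.futures import ThreadPoolExecutor

# Protein FDR targets from the note above.
BUILDS = {"Gold": 0.01, "Silver": 0.05}


def run_build(name, protein_fdr):
    """Hypothetical stand-in for steps 01a through 06 of one build."""
    # A real implementation would launch the pipeline steps here.
    return f"{name} build requested at protein FDR {protein_fdr:.0%}"


def run_builds_in_parallel(builds):
    """Submit one task per build and wait for all of them to finish."""
    with ThreadPoolExecutor(max_workers=len(builds)) as pool:
        futures = {
            name: pool.submit(run_build, name, fdr)
            for name, fdr in builds.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


results = run_builds_in_parallel(BUILDS)
```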

01a. Make combined, sorted PAidentlist file, filtering using a roughly estimated PSM FDR.

NEW: create another file, peptide_probs.tsv, saving the highest probability for each stripped peptide. Sort by descending probability.

=> Can we speed the sorting? Currently we use unix sort.

=> What do we need APD files for?
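The peptide_probs.tsv idea from step 01a can be sketched as follows: strip modifications from each peptide, keep the highest probability seen per stripped sequence, and sort by descending probability. The bracketed-mass notation handled by `strip_peptide` (e.g. `C[160]`) and all function names are assumptions for illustration.

```python
import re


def strip_peptide(modified_seq):
    """Drop bracketed modification masses and terminal markers.

    For example, 'n[43]PEPC[160]K' becomes 'PEPCK'.
    """
    no_mods = re.sub(r"\[[^\]]*\]", "", modified_seq)
    return re.sub(r"[^A-Z]", "", no_mods)


def best_peptide_probs(psms):
    """Return (stripped_peptide, best_probability) pairs, highest first."""
    best = {}
    for seq, prob in psms:
        stripped = strip_peptide(seq)
        if prob > best.get(stripped, -1.0):
            best[stripped] = prob
    # Sort by descending probability; break ties alphabetically.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


def to_tsv(rows):
    """Format the pairs as tab-separated lines."""
    return "".join(f"{pep}\t{prob}\n" for pep, prob in rows)


psms = [("PEPC[160]K", 0.8), ("PEPCK", 0.99), ("n[43]AAAK", 0.9)]
peptide_probs = best_peptide_probs(psms)
tsv_text = to_tsv(peptide_probs)
```

A single pass with a dictionary avoids sorting the full PSM list just to find per-peptide maxima; only the much smaller peptide table is sorted.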

01b. Create special filtered pepXML for each expt., then run ProtPro on all expts combined

Remove step 02 of the pipeline (creating the biosequence set)

02a,b,sc. Compile protein identifications and estimate protein concentrations

NEW: Using peptide_probs.tsv, gather peptides mapping to identified proteins down to 1% peptide FDR.

NEW: Using the PAidentlist file, gather PSMs matching those peps down to 1% PSM FDR.
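Gathering entries "down to 1% FDR" from a probability-sorted list could look like the sketch below. It assumes the common probability-based estimate in which the expected number of false entries among the top k is the sum of (1 - p) over those entries; this is not necessarily the estimate the pipeline uses, and it is not David's formula. The name `count_within_fdr` is hypothetical.

```python
def count_within_fdr(probs_desc, max_fdr):
    """Return how many top entries can be kept with estimated FDR <= max_fdr.

    probs_desc must already be sorted by descending probability.
    """
    expected_false = 0.0
    keep = 0
    for rank, p in enumerate(probs_desc, start=1):
        expected_false += 1.0 - p
        # Estimated FDR among the top `rank` entries.
        if expected_false / rank <= max_fdr:
            keep = rank
    return keep


# Probabilities as they might appear in peptide_probs.tsv (invented values).
probs = [1.0, 0.99, 0.99, 0.5]
n_at_1pct = count_within_fdr(probs, 0.01)
n_at_20pct = count_within_fdr(probs, 0.20)
```

The same function could be applied first to peptide probabilities and then to the PSMs matching the retained peptides, each with its own FDR target.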

03. Map peptides to reference DB => peptide_mapping.tsv

05. Calculate chromosomal coordinates => coordinate_mapping.txt

06. Make a list of unmappable peptides
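Step 06 can be thought of as a set difference between the peptides in the build and the peptides that received a mapping in step 03. A minimal sketch, assuming both are available as plain lists of stripped sequences (the function name is hypothetical):

```python
def unmappable_peptides(build_peptides, mapped_peptides):
    """Return build peptides that have no entry among the mapped peptides."""
    mapped = set(mapped_peptides)
    return sorted(p for p in set(build_peptides) if p not in mapped)


build = ["PEPCK", "AAAK", "LLLR", "AAAK"]
mapped = ["PEPCK", "LLLR"]
unmapped = unmappable_peptides(build, mapped)
```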

07. Compute statistics on peps, prots in the build.

08. Build SpectraST library from build

Retrieved from "http://tools.proteomecenter.org/wiki/index.php?title=PeptideAtlas_Pipeline_Retool_2012/13"

