Building Peptide Atlas

From SPCTools

PeptideAtlas is an online database of peptides identified in tandem MS experiments. Raw MS data are gathered from labs around the world and processed uniformly as described below. Data from a single species, or sometimes a subproteome such as plasma or liver, are combined into subsections of PeptideAtlas called builds. PeptideAtlas is implemented in the SBEAMS database. Another page outlines the body of software associated with PeptideAtlas.

The atlas build process is wild and woolly and unstable. There are several caveats having to do with special cases, and several steps that must be run manually. The whole process is outlined below. Your other steadfast companion on the atlas build journey will be the atlas build recipe, which covers the same steps but provides actual Unix command-line invocations that you can cut and paste. When these documents fail you, as they will, feel free to consult the PeptideAtlas team at ISB, currently comprising Zhi Sun, Dave Campbell, team leader and PeptideAtlas creator Eric Deutsch, and the author of this document, Terry Farrah.

October, 2012: We are re-tooling the pipeline. Notes are here!

1 Copy build recipe and follow it
2 Start with one or more PeptideProphet output files (pepXML) for each experiment in each project.
3 Post-process any glyco search results.
4 Register projects and experiments using SBEAMS interface.
5 Run iProphet
- 5.1 If you ran multiple search engines for each experiment, combine per experiment using iProphet
6 Build a biosequence set for your species if necessary
7 Refresh all iProphet output files to biosequence set
8 Run ProteinProphet on the iProphet file for each experiment.
9 Obtain search batch IDs for each experiment.
10 Ensure that the file $PIPELINE/etc/protid_priorities.csv contains protein identifier prioritization for your species
11 Run step01 of the PeptideAtlas build pipeline
- 11.1 Step 01: Gather peptides, load into DB, and update probabilities
12 Do some steps manually
13 Run steps 02 through 08 of the build pipeline
14 Define Atlas build via SBEAMS
15 Load data into build
16 Some other step that Eric wants us to add
17 To Do
- 17.1 Compiling protein information for existing atlases
18 Deprecated stuff
- 18.1 Deprecated functionality that may be useful again someday
  - 18.1.1 Combine all pepXML files for project using iProphet, then run ProteinProphet

Contents

Copy build recipe and follow it

Start with one or more PeptideProphet output files (pepXML) for each experiment in each project.
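Before going further, it can help to confirm that every experiment actually has a pepXML file to work from. The following is a minimal sketch and not part of the build recipe: it assumes a hypothetical layout of one directory per project with one subdirectory per experiment, and it assumes pepXML files end in `.pep.xml` or `.pepXML`. Both assumptions, and the function names, are illustrative and should be adjusted to your data.

```python
from pathlib import Path

# Assumed file suffixes for pepXML output; adjust to your naming convention.
PEPXML_SUFFIXES = (".pep.xml", ".pepXML")


def find_pepxml(experiment_dir):
    """Return the sorted pepXML files found directly inside one experiment directory."""
    return sorted(
        p for p in Path(experiment_dir).iterdir()
        if p.is_file() and p.name.endswith(PEPXML_SUFFIXES)
    )


def experiments_missing_pepxml(project_dir):
    """Return names of experiment subdirectories that contain no pepXML file.

    Assumes (hypothetically) one subdirectory per experiment under project_dir.
    """
    missing = []
    for exp in sorted(Path(project_dir).iterdir()):
        if exp.is_dir() and not find_pepxml(exp):
            missing.append(exp.name)
    return missing
```

Calling `experiments_missing_pepxml` on a project directory returns the experiments that still need a PeptideProphet run; an empty list means every experiment has at least one pepXML file.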

Post-process any glyco search results.

Register projects and experiments using SBEAMS interface.

Run iProphet

If you ran multiple search engines for each experiment, combine per experiment using iProphet

Build a biosequence set for your species if necessary

Refresh all iProphet output files to biosequence set

Run ProteinProphet on the iProphet file for each experiment.

Obtain search batch IDs for each experiment.

Ensure that the file $PIPELINE/etc/protid_priorities.csv contains protein identifier prioritization for your species
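A quick check along these lines can catch a missing entry before the pipeline runs. It is only a sketch: the column layout of `protid_priorities.csv` is not described on this page, so the code makes no assumption about columns and simply looks for the species name in any cell, case-insensitively. The function names are hypothetical, and a match in any cell is a weak signal that still deserves a look at the file itself.

```python
import csv
import os
from pathlib import Path


def priorities_path(pipeline_root=None):
    """Build the path to protid_priorities.csv under the pipeline root.

    Falls back to the PIPELINE environment variable when no root is given.
    """
    root = pipeline_root or os.environ.get("PIPELINE", "")
    return Path(root) / "etc" / "protid_priorities.csv"


def species_in_priorities(csv_path, species):
    """Return True if any cell of the CSV file mentions the species name.

    The file's column layout is not assumed; every cell is searched
    case-insensitively for the species string.
    """
    needle = species.strip().lower()
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if any(needle in cell.lower() for cell in row):
                return True
    return False
```

For example, `species_in_priorities(priorities_path(), "Human")` reads the file under `$PIPELINE/etc` and reports whether the word "Human" appears anywhere in it; the species string here is just an example value.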

Run step01 of the PeptideAtlas build pipeline

Step 01: Gather peptides, load into DB, and update probabilities

Do some steps manually

If any experiments were contaminated with proteins from another organism

Optional: Extract a preliminary covering protein list from combined PAidentlist

Create a special filtered pepXML for each experiment

Create Master ProteinProphet file

If any protein IDs are overly long (SpectraST builds only)

Run steps 02 through 08 of the build pipeline

Step 02: Download latest fasta files from web for reference DB (also called biosequence set)

Step 02a: Compile protein identifications

Step 02sc: Estimate protein concentrations

Step 02b: Post process protein identification information

Step 03: Map peptides to reference DB

Step 04: Count mapped peptides; print in different format

Step 05: Calculate chromosomal coordinates

Step 06: Make a list of unmappable peptides

Step 07: Compile statistics on the peptides and proteins in the build

Step 08: Build a SpectraST library from the build

Define Atlas build via SBEAMS

Load data into build

Load peptides

Load protein identifications

Build search key

Update empirical proteotypic scores

Load spectra and spectrum IDs

Update statistics

Some other step that Eric wants us to add

To Do

Compiling protein information for existing atlases

Deprecated stuff

Deprecated functionality that may be useful again someday

Combine all pepXML files for project using iProphet, then run ProteinProphet
