Building PeptideAtlas
Here is how I built a human urine PeptideAtlas in October 2008. The details of how to do each step are found in regis.systemsbiology.net:~tfarrah/PeptideAtlasBuild/HumanUrine_2008-09.notes.
- Start with one or more pepXML files for each experiment in each project. A project is a set of related experiments. For example, a project may study proteins found in normal and diseased liver, and may include 4 experiments: tissue from two normal patients and from two diseased patients. The pepXML files should be created by searching the spectra with a database search engine such as SEQUEST, X!Tandem, or SpectraST, then validating the hits using PeptideProphet.
- Register projects and experiments in PeptideAtlas using the SBEAMS interface.
- Obtain search batch IDs for each experiment and create an experiments list.
- Run ProteinProphet on all of the pepXML files together to create a single protXML file.
- Run the PeptideAtlas build pipeline. The scripts are in /net/db/projects/PeptideAtlas/pipeline/run_scripts. Each script ultimately calls /net/db/projects/PeptideAtlas/pipeline/run_scripts/run_Master_current.csh, which contains the core of the pipeline.
- Gather all peptides
- Download the latest FASTA files from the web for the reference DB (also called the biosequence set)
- Map peptides to proteins in reference DB
- Get chromosomal coordinates
- Compile statistics on the peptides and proteins in the build
- Build a SpectraST library from the build (optional)
- Load the reference DB (biosequence set) if a new one is needed
- Define Atlas build via SBEAMS
- Load data into build
- Build search key
- Update empirical proteotypic scores
- Load spectra and spectrum IDs
- Update statistics
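
To make the "map peptides to proteins in reference DB" and "compile statistics" steps more concrete, here is a minimal Python sketch of the general idea. It is not the pipeline's code: the real work happens in run_Master_current.csh and the scripts it calls, and I am not reproducing their logic here. The function names (read_fasta, map_peptides), the protein accessions, and the sequences are all hypothetical, and the mapping is a plain exact substring match, which is only an assumption about how such a step could work.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {accession: sequence}."""
    sequences = {}
    accession = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if accession is not None:
                sequences[accession] = "".join(chunks)
            # Assumption: the first whitespace-delimited token is the accession.
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if accession is not None:
        sequences[accession] = "".join(chunks)
    return sequences


def map_peptides(peptides, proteins):
    """Return {peptide: sorted accessions whose sequence contains the peptide}."""
    mapping = {}
    for peptide in peptides:
        hits = [acc for acc, seq in proteins.items() if peptide in seq]
        mapping[peptide] = sorted(hits)
    return mapping


# Hypothetical reference DB (biosequence set) with two made-up proteins.
EXAMPLE_FASTA = """\
>PROT_A hypothetical protein A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>PROT_B hypothetical protein B
MSHFSRQLEEKAAAYIAKQR
""".splitlines()

# Hypothetical peptides, standing in for the output of "Gather all peptides".
EXAMPLE_PEPTIDES = ["AYIAKQR", "SHFSRQLEER", "GGGGGK"]

proteins = read_fasta(EXAMPLE_FASTA)
mapping = map_peptides(EXAMPLE_PEPTIDES, proteins)

# Simple build-level statistics derived from the mapping.
n_mapped = sum(1 for hits in mapping.values() if hits)
n_unmapped = sum(1 for hits in mapping.values() if not hits)
n_single_protein = sum(1 for hits in mapping.values() if len(hits) == 1)

for peptide, hits in mapping.items():
    print(peptide, ",".join(hits) if hits else "(unmapped)")
print("mapped:", n_mapped, "unmapped:", n_unmapped,
      "mapping to a single protein:", n_single_protein)
```

The nested loop is quadratic in the worst case, which is fine for a toy example but would be slow against a full reference DB; an indexed lookup would be the obvious replacement. The sketch also ignores isoleucine/leucine ambiguity and enzymatic cleavage context.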