Building Peptide Atlas
From SPCTools
Here is how I built a human urine PeptideAtlas in October 2008.
The details of how to do each step are found in regis.systemsbiology.net:~tfarrah/PeptideAtlasBuild/HumanUrine_2008-09.notes.
Most steps are carried out via mimas/db, under /net/db/projects/PeptideAtlas.
Start with one or more pepXML files for each experiment in each project.
A project is a set of related experiments. For example, a project may study proteins found in normal and diseased liver, and may include 4 experiments: tissue from two normal patients and from two diseased patients. The pepXML files should be created by searching the spectra with a database search engine such as SEQUEST, X!Tandem, or SpectraST, then validating the hits using PeptideProphet and, optionally, iProphet.
A project may at some point also represent the iProphet combined results of several projects by several researchers, searched by several different search engines.
Referencing files with wildcards is easier if all the pepXML files reside at the same level of the directory tree. If you have to move files to achieve this, adjust the paths within them using /sbeams/bin/updateAllPaths.pl *.xml *.xls *.shtml.
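As a quick check before relying on wildcards, the sketch below reports the directory depths at which pepXML files occur. The function name pepxml_depths is hypothetical, and it assumes the files are named *.pep.xml (as in interact-prob.pep.xml); it is only an illustration, not part of SBEAMS or the TPP.

```shell
# Hypothetical helper: print each distinct directory depth at which pepXML
# files occur under a root directory. A single line of output means all the
# files sit at the same level, so one wildcard pattern can match them all.
# Assumption: pepXML files are named *.pep.xml.
pepxml_depths() {
    root=${1:-.}
    find "$root" -type f -name '*.pep.xml' |
        awk -F/ '{ print NF - 1 }' |   # count the slashes in each path
        sort -un                       # keep each depth once
}
```

Run it on the directory that holds your project data. If it prints more than one number, some files sit deeper than others and may need to be moved.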
Register projects and experiments using the SBEAMS interface.
- Go to db.systemsbiology.net.
- Log in to SBEAMS.
- Click the "My Projects" or "Accessible Projects" tab, then click "Add new project" at the bottom.
- Fill out the fields. The owner of the project should be the experimenter who created the data, or the computational biologist who combined the data using iProphet. The project tag should match the name of the subdirectory in /sbeams/archive/<project_owner> that contains the data.
- To register experiments, go to "Accessible Projects", then click the PROTEOMICS button next to your project.
Obtain search batch IDs for each experiment and create an experiments list.
If loading iProphet results, you must touch sequest.params (that is, create an empty file by that name) to fool load_proteomics_experiment.pl into thinking your directory is OK.
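The sequest.params workaround can be sketched as a small shell function. The name ensure_sequest_params is hypothetical, and the sketch assumes the loader only needs the file to be present in the experiment directory. Unlike a bare touch, it leaves an existing file alone.

```shell
# Hypothetical helper: create an empty sequest.params in an experiment
# directory if one is not already there. An existing file is not modified.
# Assumption: the loader only checks that the file is present.
ensure_sequest_params() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ ! -e "$dir/sequest.params" ]; then
        touch "$dir/sequest.params"
    fi
}
```

Call it with the directory that holds the iProphet results before running the load script.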
Run ProteinProphet on all pepXML files combined to create a single protXML file
If using iProphet to combine searches
If combining multiple experiments using iProphet, first run iProphet on all pepXML files. Make sure you have enough disk space. I used the following command:
iProphet /regis/sbeams/archive/{phaller,youngah}/*Urine*/*/{SEQ,XTK,SPC}*/interact-prob.pep.xml interact-combined.iprophet.pep.xml
If combining multiple search results per experiment using iProphet, first run iProphet separately for each experiment, on all of that experiment's pepXML files.
Caution: iProphet can consume a lot of memory, especially when combining many pepXML files. Note from 11/19/08: you may wish to run it on regis9.
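Before a long iProphet run, it can help to confirm that the wildcard matched the files you expect and that the output directory has free space. The sketch below is one way to do that; the function names check_inputs and free_kb are hypothetical and not part of the TPP.

```shell
# Hypothetical pre-flight helpers for a long iProphet run.

# Report every argument that is not a non-empty regular file.
# Returns 0 only if at least one file was given and all of them pass.
# A wildcard that matches nothing is normally passed through literally,
# so it shows up here as a missing file.
check_inputs() {
    if [ "$#" -eq 0 ]; then
        echo "no input files given" >&2
        return 1
    fi
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Print the free space, in kilobytes, on the filesystem holding a directory
# (default: the current directory). Uses the portable df -P output format,
# where the fourth column of the second line is the available space.
free_kb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}
```

For example, pass check_inputs the same wildcard pattern you intend to give iProphet, and run free_kb on the directory where interact-combined.iprophet.pep.xml will be written. Patterns containing braces, such as {SEQ,XTK,SPC}, need a shell that supports brace expansion (bash or csh, for instance).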
For all Atlas builds, with or without iProphet
Run ProteinProphet on the desired pepXML file(s) to create a single protXML file.
Run the PeptideAtlas build "pipeline".
Scripts can be found in /net/db/projects/PeptideAtlas/pipeline/run_scripts. Each script ultimately calls /net/db/projects/PeptideAtlas/pipeline/run_scripts/run_Master_current.csh, which contains the core of the pipeline.