Building a PeptideAtlas
From SPCTools
Here is how I built a human urine PeptideAtlas in December 2008.
Detailed instructions for each step are in the recipe file mimas.systemsbiology.net:/net/db/projects/PeptideAtlas/pipeline/recipes/HumanUrine_iProphet_2008-10.notes. Most of the work takes place on mimas/db, in /net/db/projects/PeptideAtlas.
Start with one or more pepXML search results files for each experiment in each project.
A project is a set of related experiments. For example, a project studying proteins in normal and diseased liver might include four experiments: tissue from two normal patients and from two diseased patients. The pepXML files should be created by searching the spectra with a search engine such as SEQUEST or X!Tandem (sequence database search) or SpectraST (spectral library search), then validating the hits with PeptideProphet. If you are going to combine search results using iProphet (see below), do not run iProphet on each set of search results individually. To avoid this, run xinteract manually, because scripts such as runtandemsearch automatically run iProphet after PeptideProphet.
Searching databases containing decoys is recommended to provide a reference point for evaluating the FDR (false discovery rate) of the final Atlas. As of fall 2008, spectral libraries containing decoys are available for SpectraST searching.
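One common way to use decoys is to count the decoy hits among the identifications that pass your probability cutoff and divide by the number of target hits. The function below is a minimal sketch of that estimate and is not part of the Atlas pipeline: the name estimate_fdr is hypothetical, it reads one protein accession per line on stdin, and it assumes decoy entries carry the prefix DECOY_ (substitute whatever prefix your database or library actually uses).

```bash
# Hypothetical helper: rough decoy-based FDR estimate (decoy hits / target hits).
# stdin: one protein accession per line, for hits above your cutoff.
# $1: decoy prefix; "DECOY_" is an assumption, not a pipeline convention.
estimate_fdr() {
  local prefix="${1:-DECOY_}"
  awk -v p="$prefix" '
    NF == 0           { next }            # skip blank lines
    index($0, p) == 1 { decoy++; next }   # accession starts with the decoy prefix
                      { target++ }        # everything else counts as a target hit
    END {
      if (target == 0) { print "NA"; exit }
      printf "%.4f\n", decoy / target
    }'
}
```

For example, `printf 'P1\nDECOY_P2\nP3\n' | estimate_fdr` prints 0.5000. This simple ratio assumes the decoy portion of the database is the same size as the target portion.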
Referencing files with wildcards is easier if all the pepXML files reside at the same level of the directory tree. If you have to move files to achieve this, adjust the paths recorded within them using
$ /sbeams/bin/updateAllPaths.pl *.xml *.xls *.shtml
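Before relying on wildcards, you can check whether the pepXML files really are all at the same level. This is a minimal sketch; the function name pepxml_depths is hypothetical. It prints each distinct path depth (number of /-separated components) at which a *.pep.xml file occurs under the given root, so a single line of output means the layout is uniform.

```bash
# Hypothetical helper: list the distinct depths at which pepXML files occur.
# One line of output means all pepXML files are at the same level of the tree.
pepxml_depths() {
  local root="${1:-.}"
  find "$root" -type f -name '*.pep.xml' \
    | awk -F/ '{ print NF }' \
    | sort -un
}
```

Run it from the project directory, e.g. `pepxml_depths .`; more than one line of output means some files need to be moved and their paths updated as described above.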
Run iProphet and ProteinProphet
If you ran multiple search engines for each experiment, combine the results for each experiment using iProphet.
Create a directory, parallel to the search results directories, named iProphet. Then, for example,
$ iProphet ../{XTK,SPC,SEQ}*/interact-prob.pep.xml interact-combined.pep.xml
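One caveat with the brace-and-wildcard form: in bash (without nullglob set), if an experiment was not searched with one of the engines, the unmatched pattern is passed to the command as literal text. The sketch below, with the hypothetical function name iprophet_cmd_for_experiment, collects only the interact-prob.pep.xml files that actually exist under the XTK*, SPC* and SEQ* subdirectories of one experiment and prints, without running, the corresponding iProphet command. The directory naming follows the example above and is otherwise an assumption.

```bash
# Hypothetical helper: print (not run) the iProphet command for one experiment,
# listing only the per-engine interact-prob.pep.xml files that actually exist.
iprophet_cmd_for_experiment() {
  local exp="$1" f
  local -a inputs=()
  for f in "$exp"/XTK*/interact-prob.pep.xml \
           "$exp"/SPC*/interact-prob.pep.xml \
           "$exp"/SEQ*/interact-prob.pep.xml; do
    # An unmatched glob stays literal, so keep only real files.
    if [ -f "$f" ]; then
      inputs+=("$f")
    fi
  done
  if [ "${#inputs[@]}" -eq 0 ]; then
    echo "no interact-prob.pep.xml found under $exp" >&2
    return 1
  fi
  echo "iProphet ${inputs[*]} $exp/iProphet/interact-combined.pep.xml"
}
```

Looping this over all experiment directories and reviewing the printed commands before running them makes it easy to confirm that each experiment is combined exactly once.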
Be sure that the input pepXML files were not already processed by iProphet -- you don't want to run iProphet twice. Caution: if you ran an automated post-processing script such as runtandemsearch (which calls finishtandemsearch), iProphet may already have been run automatically.
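To check for an earlier iProphet run, you can search each input file for iProphet's entry in the pepXML analysis records. This sketch assumes that iProphet marks its output with analysis="interprophet" (the name the TPP uses for iProphet's analysis_summary element); verify this against one of your own iProphet output files. The function names already_iprophet and check_iprophet_inputs are hypothetical.

```bash
# Hypothetical helper: succeed if the pepXML file already contains an
# iProphet (analysis="interprophet") analysis record.
already_iprophet() {
  grep -q 'analysis="interprophet"' "$1"
}

# Hypothetical helper: warn about every input that was already iProphet-processed.
# Returns non-zero if any such file is found.
check_iprophet_inputs() {
  local f status=0
  for f in "$@"; do
    if already_iprophet "$f"; then
      echo "WARNING: $f was already processed by iProphet" >&2
      status=1
    fi
  done
  return "$status"
}
```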
The resulting pepXML files will be used to generate final peptide probabilities in the "gather all peptides" step of the Atlas build process below.
Combine all pepXML files for the project using iProphet, then run ProteinProphet
First, create a directory for your project in your data area, which has ample disk space. Run on regis9, which has ample memory.
$ ssh regis9
$ cd /regis/data3/tfarrah/search
$ mkdir HsUrine; cd HsUrine; mkdir MultipleExps; cd MultipleExps; mkdir iProphet; cd iProphet
$ iProphet /regis/sbeams/archive/{phaller,youngah}/*Urine*/*/{XTK,SPC,SEQ}*/interact-prob.pep.xml interact-combined.pep.xml
$ ProteinProphet interact-combined.pep.xml interact-combined.prot.xml NORMPROTLEN PROTLEN MININDEP0.2 IPROPHET >& ProteinProphet.out
If there are many pepXML files, or they are large, combining them all in a single iProphet run may not be feasible. In that case, run iProphet on the experiments in batches, then run ProteinProphet on all the resulting pepXML files together. Consult David S. for advice.
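The batching approach can be sketched as follows. This is only an illustration under stated assumptions: the function name print_batched_cmds and the output names batch1.pep.xml, batch2.pep.xml and so on are hypothetical, the files are split purely by count (in practice you would keep each experiment's files together in one batch), and the ProteinProphet options are copied from the command above. It prints the commands rather than running them.

```bash
# Hypothetical helper: print (not run) iProphet commands for batches of
# $1 pepXML files, then one ProteinProphet command over all batch outputs.
print_batched_cmds() {
  local size="$1"; shift
  if [ "$size" -lt 1 ]; then
    echo "batch size must be >= 1" >&2
    return 1
  fi
  local batch=1 n=0 f
  local -a cur=() outs=()
  for f in "$@"; do
    cur+=("$f")
    n=$((n + 1))
    if [ "$n" -eq "$size" ]; then          # batch is full: emit it
      echo "iProphet ${cur[*]} batch${batch}.pep.xml"
      outs+=("batch${batch}.pep.xml")
      batch=$((batch + 1)); n=0; cur=()
    fi
  done
  if [ "$n" -gt 0 ]; then                  # leftover partial batch
    echo "iProphet ${cur[*]} batch${batch}.pep.xml"
    outs+=("batch${batch}.pep.xml")
  fi
  echo "ProteinProphet ${outs[*]} interact-combined.prot.xml NORMPROTLEN PROTLEN MININDEP0.2 IPROPHET"
}
```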
Register projects and experiments using the SBEAMS interface.
- Go to db.systemsbiology.net.
- Log in to SBEAMS.
- Click the "My Projects" or "Accessible Projects" tab, then click "Add new project" at the bottom.
- Fill out the fields. The project owner should be the experimenter who created the data, or the computational biologist who combined the data using iProphet. The project tag should match the name of the subdirectory of /sbeams/archive/<project_owner> that contains the data.
- To register experiments, go to "Accessible Projects", then click the PROTEOMICS button next to your project.
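Because the project tag must match the data directory name, a quick check before registering can catch a mismatch early. A minimal sketch; check_project_tag is a hypothetical name, and the archive root defaults to /sbeams/archive as described above but can be overridden with a third argument.

```bash
# Hypothetical helper: confirm that <archive_root>/<owner>/<tag> exists,
# i.e. that the SBEAMS project tag matches the data directory name.
check_project_tag() {
  local owner="$1" tag="$2" root="${3:-/sbeams/archive}"
  if [ -d "$root/$owner/$tag" ]; then
    echo "ok: $root/$owner/$tag"
  else
    echo "missing: $root/$owner/$tag" >&2
    return 1
  fi
}
```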
Obtain search batch IDs for each experiment and create an experiments list.
I have found that load_proteomics_experiment.pl expects to find a sequest.params file in each search directory. For non-SEQUEST searches and for iProphet results, create an empty one in the search directory using
$ touch sequest.params
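With many search directories, the touch can be applied in one pass. A sketch only: add_missing_sequest_params is a hypothetical name, and it assumes that each search (or iProphet) directory can be recognized by containing a file matching interact*.pep.xml; adjust the pattern if your layout differs. Existing sequest.params files are left untouched.

```bash
# Hypothetical helper: create an empty sequest.params in every directory that
# holds an interact*.pep.xml file but does not yet have a sequest.params.
add_missing_sequest_params() {
  local root="${1:-.}"
  find "$root" -type f -name 'interact*.pep.xml' | while IFS= read -r f; do
    d=$(dirname "$f")
    if [ ! -e "$d/sequest.params" ]; then
      touch "$d/sequest.params"
      echo "created $d/sequest.params"
    fi
  done
}
```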
Run the PeptideAtlas build "pipeline".
The scripts are in /net/db/projects/PeptideAtlas/pipeline/run_scripts. Each script ultimately calls /net/db/projects/PeptideAtlas/pipeline/run_scripts/run_Master_current.csh, which contains the core of the pipeline.
Gather all peptides and update probabilities
step01. Calls createPipelineInput.pl via pipeline/bin/PeptideFilesGenerator.pm. Creates an "identlist" file for each pepXML file in Experiments.list, plus a combined file. The identlist is a simple text file (one line per record) containing all the relevant pepXML and protXML information for each peptide identification. The probabilities, and possibly some other information, are massaged a bit; in particular, peptide probabilities are adjusted using the protXML probabilities as a guide. An identlist template file, containing only the unmassaged pepXML information, is also created and cached in the same directory as each pepXML file for use in future builds.
Download the latest FASTA files from the web for the reference DB (also called the biosequence set)
step02.
Map peptides to proteins in reference DB
Get chromosomal coordinates
Compile statistics on the peptides and proteins in the build
step07. Results are written to /net/db/projects/PeptideAtlas/pipeline/output/HumanUrine_2008-09_Ens49/analysis/analysis.out. This file contains instructions for the following manual tasks, all of which are optional (as of 01/12/08, I believe):
- creating an experiment contribution plot. A plot for the web Atlas is generated automatically, but these instructions tell you how to create one appropriate for PowerPoint or printing.
- creating an amino acid abundance plot. Untested by Terry.
- updating the proteotypic peptide database and generating plots of it. The instructions for this are incomplete; details are in Terry's email.
Build a SpectraST library from the build
Load the reference DB (biosequence set) if a new one is needed
Define Atlas build via SBEAMS
Load data into build
Enter the commands below on the command line on mimas. Full usage, including the desired options, is given in the recipe.
Load data
$SBEAMS/lib/scripts/PeptideAtlas/load_atlas_build.pl
Build search key
$SBEAMS/lib/scripts/PeptideAtlas/rebuildKeySearch.pl
Update empirical proteotypic scores
$SBEAMS/lib/scripts/PeptideAtlas/updateProteotypicScores.pl
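The three commands are listed in order; a sketch that runs them in that order and stops at the first failure is shown below. It is a minimal illustration only: run_atlas_load is a hypothetical function, and LOAD_ARGS, KEY_ARGS and SCORE_ARGS are hypothetical placeholder bash arrays that you must fill with the options given in the recipe (no options are assumed here).

```bash
# Hypothetical wrapper: run the three loading scripts in the order listed,
# stopping at the first one that fails. LOAD_ARGS, KEY_ARGS and SCORE_ARGS
# are placeholder arrays; fill them with the options from the recipe.
run_atlas_load() {
  local dir="$SBEAMS/lib/scripts/PeptideAtlas"
  "$dir/load_atlas_build.pl"        "${LOAD_ARGS[@]}"  || return 1
  "$dir/rebuildKeySearch.pl"        "${KEY_ARGS[@]}"   || return 1
  "$dir/updateProteotypicScores.pl" "${SCORE_ARGS[@]}" || return 1
}
```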