Building Peptide Atlas

From SPCTools

Revision as of 18:44, 19 December 2008

Here is how I built a human urine PeptideAtlas in December 2008.

The details of how to do each step are in mimas.systemsbiology.net:/net/db/projects/PeptideAtlas/pipeline/recipes/HumanUrine_iProphet_2008-10.notes. Most of the work takes place on mimas/db, in /net/db/projects/PeptideAtlas.

1 Start with one or more pepXML files for each experiment in each project.
2 If you ran multiple search engines for each experiment, combine using iProphet.
3 Combine all pepXML files using iProphet, then run ProteinProphet
4 Register projects and experiments using SBEAMS interface.
5 Obtain search batch IDs for each experiment and create an experiments list.
6 Run PeptideAtlas build "pipeline".
7 Load the reference DB (biosequence set) if a new one is needed
8 Define Atlas build via SBEAMS
9 Load data into build

Start with one or more pepXML files for each experiment in each project.

A project is a set of related experiments. For example, a project studying proteins in normal and diseased liver might include four experiments: tissue from two normal patients and tissue from two diseased patients. The pepXML files should be created by searching the spectra with a sequence database search engine such as SEQUEST or X!Tandem, or a spectral library search engine such as SpectraST, and then validating the hits with PeptideProphet. iProphet should not be run on each experiment at this stage.

Referencing files with wildcards is easier if all the pepXML files reside at the same level of the directory tree. If you have to move files to achieve this, update the paths recorded inside them with /sbeams/bin/updateAllPaths.pl *.xml *.xls *.shtml.
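One way to check the "same level" condition before relying on wildcards is sketched below in bash. It assumes the PeptideProphet outputs are named interact-prob.pep.xml, as in the commands on this page; pepxml_depths and same_level are hypothetical helper names, not part of SBEAMS or the TPP.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of SBEAMS or the TPP).
# Assumes PeptideProphet outputs are named interact-prob.pep.xml.

# Print "depth<TAB>path" for every interact-prob.pep.xml under a root,
# where depth is the number of "/"-separated path components.
pepxml_depths() {
  local root="${1:-.}"
  find "$root" -type f -name 'interact-prob.pep.xml' |
    awk -F/ '{ printf "%d\t%s\n", NF, $0 }' |
    sort -n
}

# Succeed only if all matching files share exactly one depth
# (also fails when no files are found).
same_level() {
  local n
  n=$(pepxml_depths "${1:-.}" | cut -f1 | sort -u | awk 'END { print NR }')
  [ "$n" -eq 1 ]
}
```

For example, `same_level /regis/sbeams/archive/phaller || pepxml_depths /regis/sbeams/archive/phaller` lists the files with their depths when they are not all at one level.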

If you ran multiple search engines for each experiment, combine using iProphet.

Create a directory named iProphet, parallel to the search results directories, and run iProphet from within it. For example:

$ iProphet ../{XTK,SPC,SEQ}*/interact-prob.pep.xml interact-combined.pep.xml
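When there are many experiments, this per-experiment step can be scripted. The bash sketch below assumes each experiment directory contains search-result subdirectories whose names start with XTK, SPC or SEQ, each holding interact-prob.pep.xml; it reuses only the iProphet command form shown above (input files followed by the output file). combine_experiment and the DRYRUN variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine one experiment's PeptideProphet results with
# iProphet, using the same command form as above (inputs, then output file).
# Assumed layout: <expdir>/XTK*/, <expdir>/SPC*/, <expdir>/SEQ*/ each hold
# interact-prob.pep.xml. Set DRYRUN=1 to print the command instead of running it.
combine_experiment() {
  local expdir="${1%/}"
  local inputs=() prefix f
  for prefix in XTK SPC SEQ; do
    for f in "$expdir"/"$prefix"*/interact-prob.pep.xml; do
      # An unmatched glob stays literal, so keep only files that exist.
      if [ -e "$f" ]; then
        # Paths are made relative to the new iProphet/ directory.
        inputs+=("../${f#"$expdir"/}")
      fi
    done
  done
  if [ "${#inputs[@]}" -eq 0 ]; then
    echo "no interact-prob.pep.xml inputs under $expdir" >&2
    return 1
  fi
  mkdir -p "$expdir/iProphet"
  if [ "${DRYRUN:-0}" = 1 ]; then
    echo "cd $expdir/iProphet && iProphet ${inputs[*]} interact-combined.pep.xml"
  else
    (cd "$expdir/iProphet" && iProphet "${inputs[@]}" interact-combined.pep.xml)
  fi
}
```

Running it with DRYRUN=1 first, e.g. `for d in */; do DRYRUN=1 combine_experiment "$d"; done`, lets you review every command before committing memory and disk to the real runs.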

Combine all pepXML files using iProphet, then run ProteinProphet

First create a directory for your project in your data area, where there is ample disk space. Run on regis9, which has ample memory.

$ ssh regis9
$ cd /regis/data3/tfarrah/search
$ mkdir HsUrine; cd HsUrine; mkdir MultipleExps; cd MultipleExps; mkdir iProphet; cd iProphet
$ iProphet /regis/sbeams/archive/{phaller,youngah}/*Urine*/*/{XTK,SPC,SEQ}*/interact-prob.pep.xml interact-combined.pep.xml
$ ProteinProphet interact-combined.pep.xml interact-combined.prot.xml NORMPROTLEN PROTLEN MININDEP0.2 IPROPHET >& ProteinProphet.out
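Because the cross-project glob above is long and a large iProphet run is expensive, it may be worth listing what the glob actually matches before starting. The bash sketch below mirrors that pattern and keeps the csh brace-expansion order (owner, then XTK, SPC, SEQ); ARCHIVE (defaulting to /regis/sbeams/archive), list_urine_inputs and count_by_owner are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the glob
#   $ARCHIVE/{owner...}/*Urine*/*/{XTK,SPC,SEQ}*/interact-prob.pep.xml
# ARCHIVE defaults to /regis/sbeams/archive; owners are given as arguments.
list_urine_inputs() {
  local archive="${ARCHIVE:-/regis/sbeams/archive}" owner prefix f
  for owner in "$@"; do
    for prefix in XTK SPC SEQ; do
      for f in "$archive/$owner"/*Urine*/*/"$prefix"*/interact-prob.pep.xml; do
        # Skip the literal pattern left behind when nothing matches.
        if [ -e "$f" ]; then printf '%s\n' "$f"; fi
      done
    done
  done
}

# Print "owner<TAB>count" for each owner, to sanity-check the match.
count_by_owner() {
  local owner
  for owner in "$@"; do
    printf '%s\t%s\n' "$owner" "$(list_urine_inputs "$owner" | awk 'END { print NR }')"
  done
}
```

For example, `count_by_owner phaller youngah` prints one owner and file count per line; a zero count usually means a directory name does not fit the pattern.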

Register projects and experiments using SBEAMS interface.

Go to db.systemsbiology.net.
Log in to SBEAMS.
Click the "My Projects" or "Accessible Projects" tab, then click "Add new project" at the bottom.
Fill out the fields. The project owner should be the experimenter who created the data, or the computational biologist who combined the data using iProphet. The project tag should match the name of the subdirectory of /sbeams/archive/<project_owner> that contains the data.
To register experiments, go to "Accessible Projects", then click the PROTEOMICS button next to your project.

Obtain search batch IDs for each experiment and create an experiments list.

If you are loading iProphet results, you must touch sequest.params (create an empty file with that name) in the results directory so that load_proteomics_experiment.pl accepts the directory as valid.
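The sequest.params workaround can be applied to several result directories at once. This bash sketch assumes the iProphet results to be loaded sit in directories named iProphet, as in the layout above; touch_sequest_params is a hypothetical helper. Because touch only updates the timestamp of an existing file, any real sequest.params already present is left unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put an (empty) sequest.params in every directory named
# iProphet under a root, so load_proteomics_experiment.pl accepts it.
# touch creates the file if absent and only updates the timestamp otherwise.
touch_sequest_params() {
  local root="${1:-.}" d
  while IFS= read -r d; do
    touch "$d/sequest.params"
    # Report each file so you can see which directories were handled.
    printf '%s\n' "$d/sequest.params"
  done < <(find "$root" -type d -name iProphet)
}
```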

Run PeptideAtlas build "pipeline".

Scripts can be found in /net/db/projects/PeptideAtlas/pipeline/run_scripts. Each script ultimately calls /net/db/projects/PeptideAtlas/pipeline/run_scripts/run_Master_current.csh, and this is where the meat of the pipeline resides.

Gather all peptides

step01. Calls createPipelineInput.pl via pipeline/bin/PeptideFilesGenerator.pm. Creates an "identlist" file for each pepXML file listed in Experiments.list, plus a combined file. An identlist file is a simple text file, one line per record, containing all the relevant pepXML and protXML information for each peptide identification. The probabilities, and possibly some other values, are adjusted along the way; in particular, peptide probabilities are adjusted using the protXML probabilities as a guide. An identlist template file, containing only the unadjusted pepXML information, is also created and cached in the same directory as each pepXML file for use in future builds.

Download the latest FASTA files from the web for the reference DB (also called the biosequence set)

step02.

Map peptides to proteins in reference DB

Get chromosomal coordinates

Compile statistics on the peptides and proteins in the build

step07. Results are written to /net/db/projects/PeptideAtlas/pipeline/output/HumanUrine_2008-09_Ens49/analysis/analysis.out. This file contains instructions for doing the following manually:

creating an experiment contribution plot
creating an amino acid abundance plot
updating the proteotypic peptide database and generating plots of it.

Build a SpectraST library from the build (optional)

Load the reference DB (biosequence set) if a new one is needed

Define Atlas build via SBEAMS

Load data into build

The commands below are entered on the command line on mimas. Full usage, including the desired options, is given in the recipe.

Load data

$SBEAMS/lib/scripts/PeptideAtlas/load_atlas_build.pl

Build search key

$SBEAMS/lib/scripts/PeptideAtlas/rebuildKeySearch.pl
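Before running the loaders, a quick pre-flight check can catch an unset $SBEAMS or a missing script. This bash sketch only checks that the two script paths named above exist and runs neither of them; check_loaders is a hypothetical name, and the actual loader options still come from the recipe.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm $SBEAMS is set and that the two
# loader scripts named above exist. It does not run either script.
check_loaders() {
  if [ -z "${SBEAMS:-}" ]; then
    echo "SBEAMS is not set" >&2
    return 1
  fi
  local s missing=0
  for s in load_atlas_build.pl rebuildKeySearch.pl; do
    if [ ! -f "$SBEAMS/lib/scripts/PeptideAtlas/$s" ]; then
      echo "missing: $SBEAMS/lib/scripts/PeptideAtlas/$s" >&2
      missing=1
    fi
  done
  return "$missing"
}
```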

Update empirical proteotypic scores

Load spectra and spectrum IDs

Update statistics

Retrieved from "http://tools.proteomecenter.org/wiki/index.php?title=Building_Peptide_Atlas"
