Building Peptide Atlas

From SPCTools

Revision as of 19:45, 27 October 2008

Here is how I built a human urine PeptideAtlas in October 2008. The details of how to do each step are found in regis.systemsbiology.net:~tfarrah/PeptideAtlasBuild/HumanUrine_2008-09.notes.

Start with one or more pepXML files for each experiment in each project.

A project is a set of related experiments. For example, a project may study proteins found in normal and diseased liver, and may include 4 experiments: tissue from two normal patients and from two diseased patients. The pepXML files should be created by searching the spectra with a database search engine such as SEQUEST, X!Tandem, or SpectraST, then validating the hits using PeptideProphet and, optionally, iProphet.

A project may at some point also represent the iProphet combined results of several projects by several researchers, searched by several different search engines.

Register projects and experiments using the SBEAMS interface.

  • Go to db.systemsbiology.net.
  • Log in to SBEAMS.
  • Click the "My Projects" or "Accessible Projects" tab, then click "Add new project" at the bottom.
  • Fill out the fields. The owner of the project should be the experimenter who created the data, or the computational biologist who combined the data using iProphet. The project tag should match the name of the subdirectory in /sbeams/archive/<project_owner> that contains the data.
  • To register experiments, go to "Accessible Projects", then click the SBEAMS button next to your project.

Obtain search batch IDs for each experiment and create an experiments list.

If you are loading iProphet results, you must touch sequest.params in your data directory so that load_proteomics_experiment.pl accepts the directory as valid.
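
The touch step can be sketched in Python as follows. This is only an illustration: the function name and the directory argument are hypothetical, and only the file name sequest.params comes from the notes above.

```python
from pathlib import Path


def touch_sequest_params(experiment_dir: str) -> Path:
    """Create an empty sequest.params in experiment_dir if none exists.

    experiment_dir is a placeholder for the directory holding the
    iProphet results; it is an assumption, not a path from the notes.
    """
    params = Path(experiment_dir) / "sequest.params"
    # touch() creates an empty file, or only updates the timestamp
    # of a file that is already there.
    params.touch(exist_ok=True)
    return params
```

Call it once per experiment directory before running load_proteomics_experiment.pl, for example touch_sequest_params("/path/to/experiment_dir") with the real path substituted.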

Run ProteinProphet on all pepXML files combined to create a single protXML file.
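
A minimal sketch of this step in Python, assuming the ProteinProphet executable is on the PATH and that it takes the input pepXML files followed by the output protXML file as its last argument (check the usage message of your installed TPP version). The helper names and file names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def protein_prophet_command(pepxml_files: Sequence[str], protxml_out: str) -> List[str]:
    """Build the argument list: all pepXML inputs first, protXML output last."""
    if not pepxml_files:
        raise ValueError("at least one pepXML file is required")
    return ["ProteinProphet", *pepxml_files, protxml_out]


def run_protein_prophet(pepxml_files: Sequence[str], protxml_out: str) -> None:
    """Run ProteinProphet once over every pepXML file in the build."""
    # Assumption: the executable is named ProteinProphet and is on PATH.
    # check=True raises CalledProcessError if the program exits non-zero.
    subprocess.run(protein_prophet_command(pepxml_files, protxml_out), check=True)
```

Passing every experiment's pepXML file in one call is what yields a single combined protXML file, rather than one per experiment.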

Run the PeptideAtlas build "pipeline".

Scripts can be found in /net/db/projects/PeptideAtlas/pipeline/run_scripts. Each script ultimately calls /net/db/projects/PeptideAtlas/pipeline/run_scripts/run_Master_current.csh, and this is where the meat of the pipeline resides.

Gather all peptides

Download latest fasta files from web for reference DB (also called biosequence set)

Map peptides to proteins in reference DB
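
To illustrate what mapping peptides to a reference DB involves, here is a minimal sketch that uses exact substring matching against a FASTA file. It is not the method used by the pipeline scripts, which these notes do not describe; the function names and the convention of taking the first word of the FASTA header as the accession are assumptions.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {accession: sequence}.

    The accession is taken to be the first word after '>' (an assumption).
    """
    parts: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            parts[current] = []
        elif current is not None:
            parts[current].append(line)
    return {acc: "".join(chunks) for acc, chunks in parts.items()}


def map_peptides(peptides: Iterable[str], proteins: Dict[str, str]) -> Dict[str, List[str]]:
    """For each peptide, list the accessions whose sequence contains it exactly."""
    return {
        pep: [acc for acc, seq in proteins.items() if pep in seq]
        for pep in peptides
    }
```

A peptide that maps to more than one accession is shared between proteins, and a peptide with an empty list is absent from the reference DB.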

Get chromosomal coordinates

Compile statistics on the peptides and proteins in the build

Build a SpectraST library from the build (optional)

Load the reference DB (biosequence set) if new one is needed

Define Atlas build via SBEAMS

Load data into build

Build search key

Update empirical proteotypic scores

Load spectra and spectrum IDs

Update statistics
