TPP Tutorial v1

From SPCTools

Revision as of 22:12, 6 July 2007

Trans Proteomic Pipeline (TPP) Tutorial

TPP V3.2.1, 2007. Note: Screenshots may vary from the TPP build you are using because the application is in development.

This document was originally assembled and edited by Bryan Prazen of Insilicos.


Introduction

This tutorial covers the use of the Trans Proteomic Pipeline (TPP) to identify and quantify proteins from LC-tandem MS data. The data used in this tutorial has already been searched with SEQUEST (Thermo Finnigan). Although the tutorial should be helpful to anyone interested in statistical identification and quantitative analysis of proteins by mass spectrometry, it was designed for scientists who already run SEQUEST searches on their tandem mass spectrometry data and would like to take their analysis a step further. It shows how to run the TPP tools so that searched data can be statistically evaluated, quantified and organized. The focus is on applying TPP; the bioinformatics behind the included tools is only briefly touched on.

About Trans-Proteomic Pipeline

Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for LC/MS/MS proteomics data. TPP includes modules for validating database search results, quantifying isotopically labeled samples, and validating protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. Its XML backbone allows uniform analysis of LC/MS/MS data generated by a wide variety of mass spectrometer types and assigned to peptides by a wide variety of database search engines.

(Image: Pipeline_Tutorial.jpg)

Systems Requirements

This tutorial does not require a search engine; searched data is provided. A computer running Windows XP or Windows 2000 is required. TPP builds are currently distributed for the Cygwin environment on Windows, for Linux, and for native Windows; this tutorial focuses on TPP running in Cygwin on Windows. A web browser such as Firefox or Internet Explorer is also required. Including the TPP software, the tutorial requires about 900MB of hard drive space: TPP itself takes approximately 190MB, and the remaining space is needed to store and manipulate the data.

For TPP analysis of your own data, remember that TPP requires mass spectrometer data to be saved in mzXML or mzData format. mzXML and mzData are instrument-independent data formats used by data analysis software such as TPP and by data repositories. mzXML was developed by the Institute for Systems Biology, and mzData was developed by the HUPO PSI standards group. Unfortunately, storing data in both the manufacturer-specific format and an instrument-independent format will require more than twice as much storage space.

About this Tutorial

This guide uses the following typographical conventions: Bold is used to indicate commands or steps that the user must complete. Small italics are used for notes that contain information that is not required to complete this tutorial.

Who Should Use this Tutorial?

This tutorial is written for anyone who has a general interest in learning one method to identify and quantify peptides and proteins using mass spectrometry. We have tried to write it so that the user does not need extensive knowledge of proteomics, biology, chemistry, mass spectrometry, or software engineering. The tutorial does not require any software or data that is not easily available on the web, and it does not require any previous experience with the analysis of mass spectrometric data. It should also be useful to those who are very familiar with proteomics data analysis but have little experience with TPP.

Getting Started

Downloading and Installing TPP

Information on installing and downloading the Windows Cygwin distribution of TPP can be found at: TPP:Windows_Cygwin_Installation

Getting familiar with Cygwin

The TPP GUI nearly eliminates the need to use the command line, but this section is included because the Windows Cygwin distribution runs in a Cygwin environment, and we do not want you to be totally lost if at some point you must step out of the GUI and type some commands.

Cygwin is a Linux-like environment for Windows. It consists of a library that emulates the Linux programming interface and a collection of tools that provide a Linux look and feel. Cygwin is aimed mainly at porting software that runs on Linux systems to Windows with only minor changes. Cygwin is installed along with TPP during the installation procedure.

After installing the TPP, the Cygwin Bash Shell can be found under Cygwin in the Windows start menu. If you are a little familiar with Linux, Unix, or DOS you will not have any problem running TPP from Cygwin.

Below is a short list of commands for the Cygwin shell that will help you find your way around the Cygwin environment.

  • ls: lists the files in a directory
  • man: displays the reference manual page for a command
  • cd: changes directory; cd .. moves up one directory level
  • chmod: changes the permissions of a file

To copy text from the Cygwin shell, first highlight it with the mouse. Then press Return, or right-click the Cygwin shell window's title bar and select Edit, then Copy. To paste text, put the cursor in the desired location, right-click the window's title bar and select Edit, then Paste.

Wildcards

* and ? are wildcard characters in the Cygwin shell. The ? wildcard matches exactly one character.

For example the command

ls raft4???.html

lists all the .html files in the directory whose names start with raft4 and have exactly three characters between the ‘4’ and the ‘.’.

The * wildcard is more general. It matches any number of characters, including none, except that it will not match a period at the start of a name.

ls raft4041.*

Lists all the files that start with ‘raft4041.’. Wildcards can be used in most Cygwin shell commands.
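A quick way to see both wildcards at work is to try them on a few throwaway files. The sketch below assumes only a bash shell such as the Cygwin Bash Shell; the file names are made up for illustration and are not part of the tutorial data.

```shell
#!/usr/bin/env bash
# Try the ? and * wildcards on throwaway files (names are made up).
scratch=$(mktemp -d)
(cd "$scratch" && touch raft4041.html raft4243.html raft40.html raft4041.xml .hidden)

# ? matches exactly one character: raft4 + any three characters + .html
question_matches=$(cd "$scratch" && ls raft4???.html)

# * matches any number of characters, including none
star_matches=$(cd "$scratch" && ls raft4041.*)

# A bare * does not match names that begin with a period, so .hidden is skipped
all_visible=$(cd "$scratch" && echo *)

printf '%s\n' "raft4???.html:" "$question_matches" "raft4041.*:" "$star_matches" "*:" "$all_visible"
rm -rf "$scratch"
```

Note that raft40.html is not listed by raft4???.html, because only one character follows the ‘4’.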

Typing man followed by a command name displays the manual entry for that command. For example, man ls explains the ls command in more detail than you will ever want to know. Use the spacebar to page through the manual and the q key to quit.

Directory path delimiter

Directory path delimiters can be a confusing part of using Cygwin. Cygwin separates directories in a path with /, while DOS and Windows use \. For instance, the Windows directory C:\Inetpub\wwwroot\tutorial is /cygdrive/c/Inetpub/wwwroot/tutorial in the Cygwin shell. Many directory-related tasks, such as moving or deleting files, can be done in either Windows or Cygwin. In this tutorial the Cygwin commands are written out to help you become familiar with the Cygwin shell.
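Cygwin ships a cygpath utility that does this conversion for you (cygpath -u turns a Windows path into a Cygwin path, cygpath -w does the reverse). The sketch below only spells out the mapping rule as a small bash function; the function name win_to_cygwin_path is hypothetical, and it assumes Cygwin's default /cygdrive prefix and a path that starts with a drive letter.

```shell
#!/usr/bin/env bash
# Hypothetical helper showing how a Windows path maps to a Cygwin path.
# Assumes the default /cygdrive prefix; on Cygwin, `cygpath -u` does this for real.
win_to_cygwin_path() {
    local win_path=$1
    local drive=${win_path%%:*}     # text before the colon, e.g. "C"
    local rest=${win_path#*:}       # text after it, e.g. "\Inetpub\wwwroot"
    drive=$(printf '%s' "$drive" | tr '[:upper:]' '[:lower:]')
    rest=${rest//\\//}              # turn every backslash into a slash
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

win_to_cygwin_path 'C:\Inetpub\wwwroot\tutorial'   # prints /cygdrive/c/Inetpub/wwwroot/tutorial
```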

If you run into trouble with Cygwin, a Cygwin user guide is available on the Cygwin web site.

NOTE: Many Cygwin tools that might enhance your Cygwin experience are not included in the TPP install. They can be downloaded from the Cygwin web site (www.cygwin.com) or obtained by selecting a site other than tools.proteomecenter.org when running the Cygwin setup program. Installing new tools should not cause problems with TPP, but be cautious when updating tools that came with the TPP installation. For instance, Gnuplot comes in a few flavors, and the one installed with TPP is not the package distributed with Cygwin and has a slightly different syntax. Installing the Cygwin Gnuplot package will likely break the graph images generated by TPP.

Setting up an account

The TPP GUI comes with one user account. This account has ‘guest’ as both the user name and password. Below are instructions for making another account.


Open the Cygwin shell and type:

cd /cygdrive/c/Inetpub/tpp-bin/users/
mkdir tutorial
cd tutorial

Then type:

crypt isbTPPspc TPP > .password

You have just created the password ‘TPP’ for the user ‘tutorial’.

In order to add a different username, create a tpp-bin/users/NEWUSER/ directory and run crypt isbTPPspc NEWPASSWORD > .password from this directory. In these examples "isbTPPspc" is the crypt key. This can be changed.
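The steps above can be wrapped in a small bash function if you need to create several accounts. This is a sketch, not part of TPP: the function add_tpp_user and the variables TPP_USERS_DIR and CRYPT_KEY are hypothetical names, the default users path is taken from the commands above, and it assumes the TPP crypt program is on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the account-creation steps above.
# Assumptions: TPP_USERS_DIR is tpp-bin/users, and TPP's `crypt` is on the PATH.
TPP_USERS_DIR=${TPP_USERS_DIR:-/cygdrive/c/Inetpub/tpp-bin/users}
CRYPT_KEY=${CRYPT_KEY:-isbTPPspc}

add_tpp_user() {
    local user=$1 password=$2
    if [ -z "$user" ] || [ -z "$password" ]; then
        echo "usage: add_tpp_user USER PASSWORD" >&2
        return 1
    fi
    local dir="$TPP_USERS_DIR/$user"
    mkdir -p "$dir" || return 1
    # Same as: crypt KEY PASSWORD > .password, run inside the user's directory.
    # Remove the file if crypt fails, so no empty .password is left behind.
    crypt "$CRYPT_KEY" "$password" > "$dir/.password" || { rm -f "$dir/.password"; return 1; }
}
```

For example, add_tpp_user tutorial TPP repeats the steps above for the ‘tutorial’ account.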

Tutorial Data

Getting the Tutorial Data

This tutorial uses a data set containing proteins that co-purified with lipid raft plasma membrane domains isolated from control and stimulated Jurkat human T cells. The analysis of similar data can be found in:

“The Application of New Software Tools to Quantitative Protein Profiling Via Isotope-coded Affinity Tag (ICAT) and Tandem Mass Spectrometry: II. Evaluation of Tandem Mass Spectrometry Methodologies for Large-Scale Protein Analysis, and the Application of Statistical Tools for Data Analysis and Interpretation.” Priska D. von Haller, Eugene Yi, Samuel Donohoe, Kelly Vaughn, Andrew Keller, Alexey I. Nesvizhskii, Jimmy Eng, Xiao-jun Li, David R. Goodlett, Ruedi Aebersold, and Julian D. Watts. Mol Cell Proteomics 2003, 2: 428-442.

The data used in this tutorial is not the data described in the publication, but it was collected by the same scientists using the same sample preparation and mass spectrometry procedures. The analysis was done on an LCQ Classic. The samples were ICAT labeled (Old-ICAT, light tag = d0 442, heavy tag = d8 450), separated by cation exchange chromatography, purified on avidin cartridges, separated by μLC, and measured by MS/MS. The tandem mass spectra were then searched with SEQUEST, and this tutorial begins with the SEQUEST results. Only a portion of the data from the raft experiment is used, to save time and hard drive space. Because the data has already been searched, you do not need a SEQUEST license on the computer used for this tutorial.


Download the data from:

www.insilicos.com/spctools/spctools/data/tutorial.tgz

Create the directory C:\Inetpub\wwwroot\ISB\data\tutorial, then tell your browser to save the file there.

Unpacking and Storing the TPP Tutorial Data

It is important that all the data that is analyzed with the TPP be stored in specific locations. The TPP can only see data that is located under the C:\Inetpub\wwwroot directory.

If you have WinZip on your computer, open tutorial.tgz with WinZip and extract the contents to the C:\Inetpub\wwwroot\ISB\data\tutorial directory. After the data has been extracted successfully, delete the tutorial.tgz file.

If you do not have WinZip, open a Cygwin shell from the Windows start menu and type:

cd /cygdrive/c/Inetpub/wwwroot/ISB/data/tutorial
tar xzvf tutorial.tgz

At this point you can delete the compressed file. Type:

rm tutorial.tgz
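If you prefer to script the unpacking, the sketch below runs the same cd, tar and rm commands but deletes the archive only when tar reports success, so a damaged download is not silently thrown away. The function name unpack_tutorial is hypothetical; its default directory is the one used in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack an archive in place, then delete it only on success.
unpack_tutorial() {
    local data_dir=${1:-/cygdrive/c/Inetpub/wwwroot/ISB/data/tutorial}
    local archive=${2:-tutorial.tgz}
    # The subshell keeps the caller's working directory unchanged.
    # rm runs only if both cd and tar succeed.
    ( cd "$data_dir" && tar xzvf "$archive" && rm "$archive" )
}
```

Running unpack_tutorial with no arguments performs the three commands above on tutorial.tgz.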

For this tutorial and future data analysis, all data should be stored in C:\Inetpub\wwwroot\ISB\data\. Each experiment can be stored in its own folder at this location, like our tutorial folder.

You should now have a folder named ‘tutorial’ which contains mzXML data for 6 LC runs, folders that contain the .out and .dta files, a sequest.params file and a folder containing a FASTA database.

NOTE: To analyze data from your own experiments you will need to search the data, compress the search results and convert the raw data to mzXML format. These steps are covered in the last section, Beyond this Tutorial.

Moving the dbase folder to C:\cygwin\

I suspect that you can do this in the Windows environment easily enough, but to get familiar with Cygwin let’s do it there. Open a Cygwin shell from the Windows start menu and type the following commands:

cd /cygdrive/c/Inetpub/wwwroot/ISB/data/tutorial
mv dbase /cygdrive/c/cygwin/

Many problems with TPP are associated with file permissions and these problems seem to be very machine dependent. We will start by changing the permissions of your data folder. Type the following command in the Cygwin shell:

chmod -R 777 /cygdrive/c/Inetpub/wwwroot/ISB/data/tutorial
chmod -R 777 /cygdrive/c/cygwin/dbase

For other permission-related problems, run the same command on the affected directory.
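The move and the permission changes can be combined into one function that is safe to run twice (the move is skipped if dbase has already been moved). This is a sketch with a hypothetical function name, prepare_tutorial_dirs, and the default paths used above. Keep in mind that chmod -R 777, as used in this tutorial, makes every file readable, writable and executable by all users.

```shell
#!/usr/bin/env bash
# Hypothetical helper combining the dbase move and the chmod commands above.
prepare_tutorial_dirs() {
    local data_dir=${1:-/cygdrive/c/Inetpub/wwwroot/ISB/data/tutorial}
    local cygwin_root=${2:-/cygdrive/c/cygwin}
    # Move dbase only if it is still in the tutorial folder.
    if [ -d "$data_dir/dbase" ]; then
        mv "$data_dir/dbase" "$cygwin_root/" || return 1
    fi
    # Same permission changes as the two chmod commands above.
    chmod -R 777 "$data_dir" "$cygwin_root/dbase"
}
```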

SEQUEST data analysis

Creating Summary HTML Files

Before we start with the GUI, we need to run one program outside it. The GUI assumes that the SEQUEST search will be run from the GUI, and because we decided not to require SEQUEST for this tutorial, we will first convert the tutorial's search results to a format the GUI can read, using a shell command.

SEQUEST or TurboSEQUEST produces an individual .out file for each tandem mass spectrum acquired in a liquid chromatography (LC) run. The first step in analyzing the tutorial results is therefore to collect the results from each LC separation: the Out2Summary program collates the .out files into a single HTML file per LC separation. The original raft data contains 24 separate LC separations; for speed and portability, this tutorial analyzes only 6 of them. The data from these 6 separations will be combined and analyzed as a single experiment.

The first step in the analysis is to change the directory in the Cygwin shell to your working directory for the tutorial. Type or copy the following command into the Cygwin shell.

cd /cygdrive/c/Inetpub/wwwroot/ISB/data/tutorial

NOTE: If you have not strayed from these instructions you will already be in the directory.
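Before running Out2Summary, it can help to confirm that every LC run has a folder of .out files. The sketch below is a hypothetical helper, check_run_folders, not a TPP tool; it assumes each run's .out files sit directly inside a folder named after the run (raft4041, raft4243, ...), which is what the out2summary commands in this section imply.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: each run folder exists and holds .out files.
check_run_folders() {
    local run count status=0
    for run in "$@"; do
        if [ ! -d "$run" ]; then
            echo "missing folder: $run" >&2
            status=1
            continue
        fi
        # Count .out files directly inside the run folder
        # (tr strips the padding some wc implementations add).
        count=$(find "$run" -maxdepth 1 -name '*.out' | wc -l | tr -d ' ')
        echo "$run: $count .out files"
        if [ "$count" -eq 0 ]; then
            echo "no .out files in $run" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run it from the tutorial directory as check_run_folders raft4041 raft4243 raft4445 raft4647 raft4849 raft5051; it prints a count per run and returns a non-zero status if any folder is missing or empty.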

Out2Summary must be run for each LC separation. Type or copy (yes, you can copy multiple commands at once):

out2summary raft4041 > raft4041.html
out2summary raft4243 > raft4243.html
out2summary raft4445 > raft4445.html
out2summary raft4647 > raft4647.html
out2summary raft4849 > raft4849.html
out2summary raft5051 > raft5051.html

(Image: Out2summary.jpg)

This process will take a few minutes.
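The six commands above follow one pattern, so they can also be written as a loop. This sketch assumes you are in the tutorial directory and that out2summary is on your PATH, as in the TPP Cygwin installation; summarize_runs is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run out2summary once per LC run, as in the commands above.
summarize_runs() {
    local run
    for run in "$@"; do
        echo "Summarizing $run ..."
        # Same as: out2summary raftXXXX > raftXXXX.html
        out2summary "$run" > "$run.html" || return 1
    done
}
```

With the function defined, summarize_runs raft4041 raft4243 raft4445 raft4647 raft4849 raft5051 produces the same six .html files. Stopping at the first failure makes it easier to spot which run caused a problem.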

NOTE: The “>” operator redirects output that would otherwise go to the screen into the file named raftXXXX.html.

NOTE: In future analyses, if you want the instrument information to be passed to the TPP tools, the base name of each .html file must match the base name of the corresponding mzXML file, as in the commands above.
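A quick way to check this naming rule is to look for an mzXML file with the same base name as each summary. The sketch below is a hypothetical helper, check_base_names; it assumes the mzXML files use the .mzXML extension and sit in the same directory as the .html summaries.

```shell
#!/usr/bin/env bash
# Hypothetical check: every summary .html has an mzXML file with the same base name.
check_base_names() {
    local html base status=0
    if [ "$#" -eq 0 ]; then
        echo "usage: check_base_names FILE.html ..." >&2
        return 2
    fi
    for html in "$@"; do
        base=${html%.html}          # raft4041.html -> raft4041
        if [ -e "$base.mzXML" ]; then
            echo "ok: $html matches $base.mzXML"
        else
            echo "no matching $base.mzXML for $html" >&2
            status=1
        fi
    done
    return "$status"
}
```

For this tutorial, run check_base_names raft*.html from the tutorial directory.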