TPP AMZTPP:PetuniaTutorial


Introduction

About Tutorial

This tutorial is written for anyone interested in extending the computational resources available to the Trans-Proteomic Pipeline (TPP) via Amazon Web Services (AWS). It walks the user through the steps of using TPP's Petunia web interface to set up the proper Amazon credentials and submit multiple searches on AWS. Readers should already be familiar with the TPP's web interface and its usage.

System requirements and TPP versions

In order to execute this tutorial you must have already installed TPP (version 4.6.3 or greater) on a Windows system. You may also need to install amztpp, the command-line tool for managing the Amazon Web Services resources used by TPP. Guides for installing both are available at:

Readers should also be aware that executing this tutorial will incur some AWS charges. The exact amount will vary based on usage, but should be small, on the order of $1-$4 USD.

Tutorial

Step 1: Getting Your Amazon credentials

In order for the TPP to have access to Amazon Web Services you must provide your AWS credentials, which confirm that you are who you say you are and that you have permission to do what you are trying to do. These credentials are known as Amazon's access and secret keys. They are used to make secure REST or Query protocol requests to any AWS service API.

You can create new keys using the Amazon Security Credentials page. Your access keys are displayed under the Access Keys section in the Credentials section of the page. Secret keys are no longer displayed. If you've previously created an access/secret key pair and have forgotten the secret key, you will need to generate a new key pair. For more information about setting up your AWS credentials please see Where's my secret access key?
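
If you'd like to confirm that a key pair works before handing it to TPP, the sketch below shows one way to do so with the boto3 Python library. This is purely illustrative and not part of TPP; the placeholder key strings are hypothetical and boto3 must be installed separately (pip install boto3).

  # Minimal sketch: verify an AWS access/secret key pair outside of TPP.
  import boto3

  ACCESS_KEY = "AKIA..."  # hypothetical placeholder for your access key
  SECRET_KEY = "..."      # hypothetical placeholder for your secret key

  sts = boto3.client(
      "sts",
      aws_access_key_id=ACCESS_KEY,
      aws_secret_access_key=SECRET_KEY,
  )

  # GetCallerIdentity succeeds for any valid key pair and reports the
  # account and user the keys belong to.
  print(sts.get_caller_identity())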

Step 2: Registering your Amazon Credentials in TPP

Log into the Petunia web interface of TPP by double-clicking on the Trans-Proteomic Pipeline flower icon on your Desktop or through the Start menu. Alternatively, you can open a browser window to the following URL: http://localhost/tpp-bin/tpp_gui.pl . If this is a new TPP installation you can log in with guest as both the user name and password.

Once logged in, click on Account > Cloud in the menu bar at the top of the page. This should take you to the cluster details form. Open the "Register Amazon EC2 Account" section if it isn't already open. In it are two form fields, one for your access key and one for your secret key. Cut & paste both values into the fields provided and click on the "Verify and Use Keys" button. If all goes well the page should refresh and you should see an additional status section added to the page containing details on your current Amazon Web Services usage.

Step 3: Download and install the tutorial data

For this tutorial we'll be using the same dataset used by the TPP tutorial. This is a SILAC-labeled yeast dataset comprising 2 runs on a high mass-accuracy Orbitrap instrument, along with a yeast database appended with decoys. We also include parameter files for the InsPecT, X!Tandem, MyriMatch, and OMSSA search engines for MS/MS identification. You can install it by:

  • Downloading the mzML files, parameter files and database from Sourceforge (92.2 MB).
  • Unpacking the demo archive using 7-Zip, StuffIt, unzip or a similar program, setting the destination directory to C:\Inetpub\wwwroot\ISB\data

If you've successfully installed the demo set you should have a new folder at C:\Inetpub\wwwroot\ISB\data\demoAMZTPP.
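
If you'd like to double-check the unpacked contents, the short illustrative Python snippet below (not part of the tutorial materials) lists what landed in the demo folder:

  # Illustrative check: confirm the demo files landed in the default
  # TPP data directory. Adjust DEMO if you unpacked somewhere else.
  from pathlib import Path

  DEMO = Path(r"C:\Inetpub\wwwroot\ISB\data\demoAMZTPP")

  if not DEMO.is_dir():
      raise SystemExit(f"demo folder not found: {DEMO}")

  # Expect two mzML files plus the search engine parameter files.
  for entry in sorted(DEMO.iterdir()):
      print(entry.name)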

Please note that this tutorial assumes that you are running a default TPP installation on a Windows system; if you are using a different system, please adjust the parameter files and file locations accordingly.

Step 4: Search data with X!Tandem

A custom version of the popular open-source search engine X!Tandem is bundled and installed with the TPP. It has been modified from the original distribution by adding the K-Score scoring function, developed by a team at the Fred Hutchinson Cancer Research Center.

  • First, make sure that Tandem is selected as the analysis pipeline.
  • Click the Database Search tab under Analysis Pipeline to access the X!Tandem search interface.
  • Under Specify mzXML Input Files, click Add Files and select the two mzML files present in the demoAMZTPP directory as input files for database searching.
  • Similarly, under Specify Tandem Parameters File choose the Tandem parameters file called tandem.xml located in the same directory.
This file defines the database search parameters that override the full set of default settings referenced in the file isb_default_input.
In this example, the mass tolerance is set to -2.1 Da to 4.1 Da, and the residue modification mass is set to 57.021464@C. A wide mass tolerance is used to include all the spectra with a precursor m/z off by one or more isotopic separations; the high accuracy achieved by the instrument is then modeled by PeptideProphet with the accurate mass model. A sketch of what these overrides look like appears after this list.
For more information, please see the TANDEM documentation.
  • Next, select a sequence database to search against. Navigate up to the dbase directory in the File Chooser, and select the database file yeast_orfs_all_REV.20060126.short.fasta.
  • Lastly, select the "on_Amazon_cloud" option next to the Run Tandem Search button and click the button to launch the searches.
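
The exact contents of the demo's tandem.xml may differ, but X!Tandem override files use a simple note-based XML format, so the settings described above would look roughly like the following sketch (the default-parameters path shown is a placeholder for whatever the demo file actually references):

  <?xml version="1.0"?>
  <bioml>
    <!-- Path to the full default settings; the demo references isb_default_input. -->
    <note type="input" label="list path, default parameters">isb_default_input.xml</note>
    <!-- Wide precursor tolerance: -2.1 Da to +4.1 Da around the parent mass. -->
    <note type="input" label="spectrum, parent monoisotopic mass error minus">2.1</note>
    <note type="input" label="spectrum, parent monoisotopic mass error plus">4.1</note>
    <note type="input" label="spectrum, parent monoisotopic mass error units">Daltons</note>
    <!-- Fixed carbamidomethylation of cysteine. -->
    <note type="input" label="residue, modification mass">57.021464@C</note>
  </bioml>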

Step 5: Monitoring the progress of the searches

After submitting the searches you should be presented with a list of jobs ordered by submission date. The bottom of this list will have details about the last submitted job. You should see messages about queuing the upload of the input files for both of the searches. Next navigate to Account > Amazon Cloud using the main menu bar. In the details of this page you should see that the background process is running, that there are 4 message queues, and you may even see that some files have been uploaded to Amazon's Simple Storage Service (S3).

The background process is a Windows task which manages the interaction of your computer with Amazon Web Services. It periodically checks the 4 queues (shown in the details) for messages. If it sees an upload message it will grab it and upload the files listed in the message. When all the files have been uploaded it will queue a message in the service queue and launch a new compute instance (if necessary) on Amazon's Elastic Compute Cloud (EC2) to process the data. As compute instances boot they themselves periodically poll the service queue for messages. If they encounter one they'll grab the message, download the files, and run the appropriate program on the input. When the program finishes, the results are uploaded to S3 and a new message is queued in the download queue so that your local background task can download the results and complete the jobs.
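
The following Python sketch illustrates the general pattern of one polling pass of such a background task, using the boto3 library. It is not amztpp's actual implementation; the queue names, bucket name, and message format are all hypothetical.

  # Illustrative only -- NOT amztpp's actual code. Queue names, bucket,
  # and message format are hypothetical stand-ins for the real ones.
  import json
  import boto3

  sqs = boto3.resource("sqs")
  s3 = boto3.client("s3")

  upload_q = sqs.get_queue_by_name(QueueName="tpp-upload")    # hypothetical
  service_q = sqs.get_queue_by_name(QueueName="tpp-service")  # hypothetical
  BUCKET = "my-tpp-bucket"                                    # hypothetical

  # One polling pass: push any queued uploads to S3, then notify the
  # compute instances that there is work in the service queue.
  for msg in upload_q.receive_messages(WaitTimeSeconds=5):
      job = json.loads(msg.body)  # e.g. {"files": [...], "program": "tandem"}
      for path in job["files"]:
          s3.upload_file(path, BUCKET, path)        # upload each input file
      service_q.send_message(MessageBody=msg.body)  # hand the job to a worker
      msg.delete()  # remove the handled message from the upload queue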

You can watch this entire process execute by refreshing your browser window. The counts of files in the various queues will change over time.

Step 6: Visualize your results

After the searches complete you should be able to visualize the results using the TPP's standard viewers, for example by opening the resulting pepXML files from the Petunia interface.
