TPP AMZTPP:PetuniaTutorial

From SPCTools

Revision as of 23:28, 27 December 2013

Introduction

About Tutorial

This tutorial is written for anyone interested in extending the computational resources available to the Trans-Proteomic Pipeline (TPP) via Amazon Web Services (AWS). It walks the user through using TPP's Petunia web interface to set up Amazon credentials and submit multiple searches on AWS. Readers should already be familiar with the TPP's web interface and usage.

System requirements and TPP versions

In order to execute this tutorial you must have already installed TPP (version 4.6.3 or greater) on a Windows system. You may also need to install amztpp, the command line tool for managing Amazon Web Services resources used by TPP. Guides for installing both are available at:

Readers should also be aware that executing this tutorial will incur some AWS charges. The exact amount of these charges will vary based on usage but should be on the order of $1-$4 USD.

Tutorial

Step 1: Getting Your Amazon credentials

In order for the TPP to have access to Amazon Web Services you must provide your AWS credentials, which confirm that you are who you say you are and that you have permission to do what you are trying to do. These credentials are known as your Amazon access key and secret key. The keys are used to make secure REST or Query protocol requests to any AWS service API.

You can create new keys using the Amazon Security Credentials page. Your access keys are displayed under the Access Keys section in the Credentials section of the page. Secret keys are no longer displayed after they are created. If you have previously created an access/secret key pair and have forgotten the secret key, you will need to generate a new key pair. For more information about setting up your AWS credentials please see Where's my secret access key?
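To illustrate why both keys are needed, here is a minimal sketch of the general idea behind signed requests: the access key identifies the account, while the secret key is used to compute a signature and is never sent. This is not the actual AWS signing procedure, which involves more steps; the function names, key values and request text below are hypothetical placeholders.

```python
import hashlib
import hmac


def sign_request(secret_key: str, string_to_sign: str) -> str:
    """Return a hex HMAC-SHA256 signature of the request text."""
    return hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def verify_request(secret_key: str, string_to_sign: str, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(secret_key, string_to_sign)
    return hmac.compare_digest(expected, signature)


# Hypothetical placeholder values, not real credentials.
ACCESS_KEY = "EXAMPLE-ACCESS-KEY"
SECRET_KEY = "example-secret-key"

# The access key travels with the request; the secret key only signs it.
request_text = "GET\n/example-bucket/run1.mzML\n" + ACCESS_KEY
signature = sign_request(SECRET_KEY, request_text)
```

A receiver that also knows the secret key can call verify_request to confirm that the request was not altered, which is why a lost secret key cannot be recovered and must be regenerated.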

Step 2: Registering your Amazon Credentials in TPP

Log in to the Petunia web interface of TPP by double-clicking the Trans-Proteomic Pipeline flower icon on your Desktop or through the Start menu. Alternatively, you can point a browser window to the following URL: http://localhost/tpp-bin/tpp_gui.pl . If this is a new TPP installation, you can log in with guest as both the user name and the password.

Once logged in, click on Account > Amazon Cloud in the menu bar at the top of the page. This should take you to the clusters details form. Open the "Register Amazon EC2 Account" section if it isn't already open. It contains two form fields, one for your access key and one for your secret key. Copy and paste both values into the fields provided and click the "Verify and Use Keys" button. If all goes well the page should refresh, and you should see an additional status section containing details on your current Amazon Web Services resources.

Step 3: Download and install the tutorial data

For this tutorial we'll be using the same dataset used by the TPP tutorial. This is a SILAC-labeled Yeast dataset comprised of 2 runs on a high mass-accuracy Orbitrap instrument, along with a Yeast database appended with decoys. We also include parameter files for inspect, tandem, myrimatch and omssa for MS/MS identification. You can install it by:

  • Downloading the mzML files, parameter files and database from Sourceforge (92.2 MB).
  • Unpacking the demo archive using 7zip, Stuffit, unzip or a similar program, with the destination directory set to C:\Inetpub\wwwroot\ISB\data

If you've successfully installed the demo set you should have a new folder at C:\Inetpub\wwwroot\ISB\data\demoAMZTPP.
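If you prefer to check the installation from a script, the following is a minimal sketch. It only looks for what this tutorial mentions (two mzML files and tandem.xml in the demo folder); the function name is hypothetical and the exact contents of the archive may differ.

```python
from pathlib import Path


def missing_demo_items(demo_dir: str) -> list:
    """List what appears to be missing from the unpacked demo folder."""
    root = Path(demo_dir)
    if not root.is_dir():
        return [f"folder not found: {root}"]

    problems = []
    # The tutorial searches two mzML input files.
    mzml_files = [p for p in root.iterdir() if p.suffix.lower() == ".mzml"]
    if len(mzml_files) < 2:
        problems.append(f"expected 2 mzML files, found {len(mzml_files)}")
    # The X!Tandem parameters file used in Step 4.
    if not (root / "tandem.xml").is_file():
        problems.append("tandem.xml not found")
    return problems
```

For a default installation you would call missing_demo_items(r"C:\Inetpub\wwwroot\ISB\data\demoAMZTPP"); an empty list means the files used in this tutorial were found.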

Please note that this tutorial assumes that you are running a default TPP installation on a Windows system; if you are using a different system, please adjust the parameters files and file locations accordingly.

Step 4: Search data with X!Tandem

A custom version of the popular open-source search engine X!Tandem is bundled and installed with the TPP. It has been modified from the original distribution by adding the K-Score scoring function, developed by a team at the Fred Hutchinson Cancer Research Center.

  • First, make sure that Tandem is selected as the analysis pipeline.
  • Click the Database Search tab under Analysis Pipeline to access the X!Tandem search interface.
  • Under Specify mzXML Input Files, click Add Files and select the two mzML files present in the demoAMZTPP directory as input files for database searching.
  • Similarly, under Specify Tandem Parameters File choose the Tandem parameters file called tandem.xml located in the same directory.
This file defines the database search parameters that override the full set of default settings referenced in the file isb_default_input.
In this example, the mass tolerance is set to -2.1 Da to 4.1 Da, and the residue modification mass is set to 57.021464@C. A wide mass tolerance is used to include all the spectra with precursor m/z off by one or more isotopic separations; the high accuracy achieved by the instrument is then modeled by PeptideProphet with the accurate mass model.
For more information, please go to TANDEM
  • Next, select a sequence database to search against. Navigate up to the dbase directory in the File Chooser, and select the database file yeast_orfs_all_REV.20060126.short.fasta.
  • Lastly, select the "on_Amazon_cloud" option next to the Run Tandem Search button and click the button to launch the searches.
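The asymmetric mass tolerance and the modification setting described in the list above can be illustrated with a small sketch. It is not X!Tandem code: the function names are hypothetical, and the sign convention (observed minus theoretical precursor mass) is an assumption made for illustration.

```python
def parse_modification(spec: str) -> tuple:
    """Split a spec such as '57.021464@C' into (mass_shift, residue)."""
    mass_text, residue = spec.split("@")
    return float(mass_text), residue


def within_precursor_window(observed, theoretical, minus=2.1, plus=4.1):
    """True if observed - theoretical lies within [-minus, +plus] Da.

    Assumed convention: the window is applied to the difference
    between the observed and the theoretical precursor mass.
    """
    delta = observed - theoretical
    return -minus <= delta <= plus


mass_shift, residue = parse_modification("57.021464@C")
```

With these defaults a precursor measured 3 Da above the theoretical mass is still accepted, while one measured 3 Da below it is rejected, which is how a wide window can admit precursors picked a few isotopic peaks too high.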

Step 5: Monitoring the progress of the searches

After submitting the searches you should be presented with a list of jobs ordered by submission date. The bottom of this list has details about the last submitted job. You should see messages about queuing the upload of the input files for both searches. Next, navigate to Account > Amazon Cloud using the main menu bar. In the details of this page you should see that the background process is running and that there are 4 message queues, and you may even see that some files have been uploaded to Amazon's Simple Storage Service (S3).

The background process is a Windows task which manages the interaction of your computer with Amazon Web Services. It periodically checks the 4 queues (shown in the details) for messages. If it sees an upload message it will grab it and upload the files listed in the message. When all the files have been uploaded it will queue a message in the service queue and, if necessary, launch a new compute instance on Amazon's Elastic Compute Cloud (EC2) to process the data. Once compute instances have booted, they periodically check the service queue for messages. If they find one they grab the message, download the files and run the appropriate program on the input. When the program finishes, the results are uploaded to S3 and a new message is queued into the download queue so that your local background task can download the results and complete the jobs.
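The message flow described above can be sketched as a small in-memory simulation. This is only an illustration and not the amztpp implementation: the queues are plain Python deques, S3 is a dictionary, and the function names and job labels are hypothetical.

```python
from collections import deque


def new_queues():
    """Create the four queues named in the text, all initially empty."""
    return {name: deque() for name in ("upload", "service", "download", "done")}


def local_background_step(queues, s3):
    """One pass of the local background task."""
    if queues["upload"]:
        job = queues["upload"].popleft()
        s3[job + "/input"] = "input files for " + job   # upload inputs
        queues["service"].append(job)                   # ask the cloud to run it
    if queues["download"]:
        job = queues["download"].popleft()
        _results = s3[job + "/results"]                 # download results
        queues["done"].append(job)                      # job is complete


def cloud_instance_step(queues, s3):
    """One pass of a compute instance polling the service queue."""
    if queues["service"]:
        job = queues["service"].popleft()
        s3[job + "/results"] = "search results for " + job
        queues["download"].append(job)


def run_until_done(jobs, max_rounds=100):
    """Run both sides until only the done queue holds messages."""
    queues, s3, history = new_queues(), {}, []
    queues["upload"].extend(jobs)
    for _ in range(max_rounds):
        history.append({name: len(q) for name, q in queues.items()})
        if not any(queues[n] for n in ("upload", "service", "download")):
            break
        local_background_step(queues, s3)
        cloud_instance_step(queues, s3)
    return queues, history


queues, history = run_until_done(["search-1", "search-2"])
```

Each entry in history is a snapshot of the queue counts, similar in spirit to what you see when you refresh the Amazon Cloud page while the searches run.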

You can watch this entire process execute by refreshing your browser window. The counts of files in the various queues will change over time.

Step 6: Visualize your results

Your searches are complete when the done queue is the only queue that still contains messages. At this point, if everything has gone correctly, your search results have been downloaded from Amazon S3 and are available for browsing using the TPP tools in Petunia. To view the pepXML files using Petunia, navigate to C:\Inetpub\wwwroot\ISB\data\demoAMZTPP. Click on either one of the pepXML files to view the search results. From here you can click on a peptide in the list to see its spectrum.
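If you would rather inspect the results from a script than in Petunia, the sketch below lists the top-ranked peptide for each spectrum in a pepXML document. The sample document is hand-written and heavily trimmed for illustration, so real files contain many more elements and attributes; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def top_hits(pepxml_text: str) -> list:
    """Return (spectrum, peptide) pairs for every rank-1 search hit."""
    root = ET.fromstring(pepxml_text)
    hits = []
    for query in root.iter():
        if local_name(query.tag) != "spectrum_query":
            continue
        for hit in query.iter():
            if local_name(hit.tag) == "search_hit" and hit.get("hit_rank") == "1":
                hits.append((query.get("spectrum"), hit.get("peptide")))
    return hits


# Hand-made, minimal sample; not output from the tutorial searches.
SAMPLE = """<msms_pipeline_analysis xmlns="http://regis-web.systemsbiology.net/pepXML">
  <msms_run_summary>
    <spectrum_query spectrum="run1.00001.00001.2" assumed_charge="2">
      <search_result>
        <search_hit hit_rank="1" peptide="PEPTIDEK"/>
        <search_hit hit_rank="2" peptide="PEPTLDEK"/>
      </search_result>
    </spectrum_query>
  </msms_run_summary>
</msms_pipeline_analysis>"""
```

To use it on a downloaded result, read the file's text and pass it to top_hits in place of SAMPLE.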

Step 7: Cleanup

Once you are satisfied with your results, it's important to clean up your files on Amazon to avoid additional charges, because the original input data and results remain in Amazon S3 until you perform this last step. Using Petunia, navigate to Account > Amazon Cloud and click the button "Shutdown all instances and delete all data". This ensures that any remaining instances are terminated and all data stored in Amazon S3 is removed. Note that you can click this button at any time to stop Amazon processing and remove your data.

Next Steps

For further information on using Amazon cloud services, see the other tutorials and documentation on this website, or post your questions to the TPP mailing list, the spctools-discuss discussion group.
