TPP Amazon Machine Images

From SPCTools

Revision as of 19:05, 11 May 2011

Starting with TPP 4.4.1, the TPP group provides pre-built Amazon Machine Images (AMIs) with the latest TPP software installed, making it even easier to perform proteomics data analysis. These images can be used with the TPP Web Application (TWA), with the TPP AWS high-performance computing tools, with your own in-house applications, or as a base for your own EC2 images. They are based on the latest Ubuntu EC2 public images and include features such as persistent storage in S3- or EBS-backed filesystems and Wine-based conversion of MS/MS files.

Overview

SPCTools now provides a number of Amazon Machine Images (AMIs) based on the official public Ubuntu images made available at [1]. In addition to having TPP installed, these images contain the following open source software:

  • OMSSA - The Open Mass Spectrometry Search Algorithm
  • InsPecT - A tool for fast and accurate identification of post-translationally modified peptides from tandem mass spectra
  • MyriMatch - Highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis

Amazon EC2 AMIs


Zone       TPP Version  Ubuntu Version      Release   server 32-bit  server 64-bit
us-east-1  4.4.1        Ubuntu 11.04 Natty  20110415  n/a            ami-1ef50977 (instance-store)

User's Guide

There are many good guides already written on how to use Amazon Machine Images (AMIs) in EC2. Here are just a few:

Official Guide
Getting Started With Amazon EC2

Notes

Filesystem Layout

The Ubuntu images were created with a small root partition, and all remaining available disk space is mounted as /mnt. Therefore the following directories were created for TPP data:

/mnt/tppdata/local - Used for local data storage. Anything placed here will be lost when the instance is stopped.
/mnt/tppdata/s3 - Mount point for S3. See note below for more information.
/mnt/tppdata/ebs - Mount point for elastic block store (EBS). See note below for more information.

The Petunia web interface is configured to use /mnt/tppdata as its top level so that users can browse and manipulate data on the instance.
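
As a rough illustration of this layout, the sketch below reports whether each of the three data directories exists and whether something is mounted on it. The function name tpp_storage_status and its output format are assumptions made for this example; they are not part of the TPP image.

```shell
# Sketch: report the state of each TPP data directory under a given root.
# The default root /mnt/tppdata comes from the layout described above; the
# function name and the output wording are illustrative assumptions.
tpp_storage_status() {
    local root="${1:-/mnt/tppdata}"
    local name dir
    for name in local s3 ebs; do
        dir="$root/$name"
        if [ ! -d "$dir" ]; then
            echo "$name: missing"
        elif mountpoint -q "$dir"; then
            # A filesystem (for example an S3 bucket or EBS volume) is mounted here
            echo "$name: mounted"
        else
            # Plain directory on the instance's own disk
            echo "$name: not mounted"
        fi
    done
}
```

Calling tpp_storage_status with no argument inspects /mnt/tppdata; passing another directory is useful for trying the function outside an instance.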

S3 Persistence

Since the TPP image comes with s3fs pre-installed, it is possible to mount an S3 bucket as a local filesystem to get persistent storage of data in S3. To use this feature, provide the bucket name and your AWS credentials in the userdata field when starting the instance.
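
The userdata format is not described here, so the following is only a sketch of the manual equivalent: mounting a bucket on /mnt/tppdata/s3 with s3fs from a shell on the instance. It assumes an s3fs version that supports the passwd_file option and a credentials file containing a single line of the form ACCESS_KEY_ID:SECRET_ACCESS_KEY. The function name mount_tpp_s3 and the DRY_RUN switch are hypothetical helpers for this example.

```shell
# Sketch: mount an S3 bucket on the TPP S3 mount point by hand with s3fs.
# Assumptions: s3fs accepts "-o passwd_file=...", and the credentials file
# holds one line "ACCESS_KEY_ID:SECRET_ACCESS_KEY" readable only by its owner.
mount_tpp_s3() {
    local bucket="$1"
    local mount_point="${2:-/mnt/tppdata/s3}"
    local passwd_file="${3:-$HOME/.passwd-s3fs}"
    if [ -z "$bucket" ]; then
        echo "usage: mount_tpp_s3 <bucket> [mount-point] [passwd-file]" >&2
        return 1
    fi
    if [ -n "$DRY_RUN" ]; then
        # Only show the command that would be run
        echo "s3fs $bucket $mount_point -o passwd_file=$passwd_file"
        return 0
    fi
    s3fs "$bucket" "$mount_point" -o passwd_file="$passwd_file"
}
```

For example, DRY_RUN=1 mount_tpp_s3 <your-bucket> prints the s3fs command without running it, which is a cheap way to check the arguments first.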

EBS Persistence

TBD

Developer's Guide

Building

The easiest way to build an image is to build it from an already existing image ("rebundling"). TPP images are therefore built from the official public images provided and supported by the Ubuntu community. The process is fairly straightforward:

  1. Find the AMI-ID of the latest Ubuntu community image for the zone you want to use. The simplest way is to use the AMI locator tool at http://cloud.ubuntu.com/ami. For TPP images, filter the AMI list by amd64 architecture (64-bit) and instance-store block store, then choose the release you want to use and note the AMI-ID.
  2. Start a new EC2 instance with the AMI-ID from the previous step. You can do this using either the AWS console web application at http://console.aws.amazon.com or the command line tool ec2-run-instances if you have the EC2 API tools (ec2-api-tools) installed. When you start the instance, make sure that you use a security group that has port 22 and port 80 open and that you specify a key pair so that you can log into the instance. Once the instance is running, its public domain name can be found using either the console or the command ec2-describe-instances.
  3. Copy your certificate and private key to the /tmp directory of your instance. (scp is your friend here)
  4. Using either ssh or PuTTY and your key, log into your EC2 instance. You'll then need to set up a few environment variables that are used by various scripts and AWS tools:
    export AWS_USER_ID=<your-value>
    export AWS_ACCESS_KEY_ID=<your-value>
    export AWS_SECRET_ACCESS_KEY=<your-value>
    export EC2_CERT=/tmp/<your-value>
    export EC2_PRIVATE_KEY=/tmp/<your-value>
    export TPP_VERSION=4.4.1
  5. Download and run the provided scripts to install, configure, and publish the new TPP image.
    cd /tmp
    export SVN=https://sashimi.svn.sourceforge.net/svnroot/sashimi/trunk/trans_proteomic_pipeline/extern/hpctools
    wget $SVN/ec2/setup_ec2_image.sh
    wget $SVN/ec2/bundle_ec2_image.sh
    wget $SVN/ec2/publish_ec2_image.sh
    sudo -E bash /tmp/setup_ec2_image.sh
    sudo -E bash /tmp/bundle_ec2_image.sh
    sudo -E bash /tmp/publish_ec2_image.sh
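
Because the scripts in step 5 rely on the variables from step 4 being passed through sudo -E, a quick pre-flight check can save a failed bundling run. The sketch below is an assumption-laden helper, not part of the provided scripts: the function name check_tpp_env is hypothetical, and it only checks that each variable is exported and that the two file variables point at readable files.

```shell
# Sketch: confirm the variables from step 4 are exported before running the
# scripts in step 5. The function name check_tpp_env is illustrative only.
check_tpp_env() {
    local missing=0 var path
    for var in AWS_USER_ID AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
               EC2_CERT EC2_PRIVATE_KEY TPP_VERSION; do
        # printenv only sees exported variables, which is what sudo -E passes on
        if [ -z "$(printenv "$var")" ]; then
            echo "missing: $var" >&2
            missing=1
        fi
    done
    for var in EC2_CERT EC2_PRIVATE_KEY; do
        # These two must name files that were copied to the instance in step 3
        path=$(printenv "$var" || true)
        if [ -n "$path" ] && [ ! -r "$path" ]; then
            echo "not readable: $var=$path" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns a non-zero status if anything is missing, so it can be chained in front of the first script, for example check_tpp_env && sudo -E bash /tmp/setup_ec2_image.sh.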


A note about AWS key pairs, user IDs, private keys, and certificates: it can be confusing which is needed for each step, and the documentation often uses generic names for them. The key point when setting the environment variables is that EC2_CERT and EC2_PRIVATE_KEY are the X.509 certificate and its associated private key. They are neither the key pair used to log into an EC2 instance nor the access keys used for API calls. More information about which keys are used where can be found in the Amazon documentation.
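
One cheap way to catch a mix-up is to check that the file named by EC2_CERT is actually a PEM-encoded certificate rather than a key file. The sketch below only looks for the PEM certificate header, so it is a sanity check and not a validation of the certificate; the function name looks_like_x509_cert is hypothetical.

```shell
# Sketch: check that a file looks like a PEM-encoded X.509 certificate by
# searching for the PEM certificate header. This does not validate the
# certificate; it only helps catch a key file supplied as EC2_CERT.
looks_like_x509_cert() {
    grep -q -- '-----BEGIN CERTIFICATE-----' "$1" 2>/dev/null
}
```

For example, looks_like_x509_cert "$EC2_CERT" || echo "EC2_CERT does not look like a certificate" flags the most common mistake before bundling starts.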


Publishing

Updating Amazon Pages

Amazon maintains a list of publicly available AMIs at http://aws.amazon.com/amis/. Submitting to this list is unfortunately a manual process, done through the web form at http://aws.amazon.com/amis/submit. To keep submissions consistent, please copy and paste the template below into the Amazon form, filling in the appropriate values.

First Name:       <your first name>
Last Name:        <your last name>
Contact Email:    spctools-discuss@googlegroups.com
AMI Title:        Trans-Proteomic Pipeline (Linux 64-bit <version e.g. 4.4.1>)
AMI Manifest:     <S3 path to manifest file>
<AMI ID's>
License:          Public
Operating System: Unix/Linux
Summary Text:     Official image for the Trans-Proteomic Pipeline (TPP <version>)
Description: 
Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for the analysis of LC/MS/MS proteomics data. TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. The XML backbone of this pipeline enables a uniform analysis for LC/MS/MS data generated by a wide variety of mass spectrometer types, and assigned peptides using a wide variety of database search engines.

Naming Conventions

Manifest Naming

The current suggested scheme for naming manifests is to use the default prefix/names assigned by the EC2 tools and place them in a "folder" whose name follows the pattern "TPP-<version>-<date>", where <version> is the version of TPP and <date> is a date in the format YYYYMMDD. An optional serial number (.1, .2, ...) can be appended to the date if necessary. These "folders" should be placed in the correct S3 bucket for the region (see next section).

For example, the name spctools-images-us/TPP-4.4.1-20110403/manifest.xml references an image with TPP 4.4.1 installed, built on 3 April 2011.
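
The naming scheme can be sketched as a small helper that assembles the manifest path from its parts. The function name manifest_path is hypothetical, and the file name manifest.xml simply follows the example above; the name your EC2 tools actually assign may differ.

```shell
# Sketch: build a manifest path following the naming scheme described above.
# Arguments: bucket, TPP version, date as YYYYMMDD (defaults to today), and
# an optional serial number. The file name manifest.xml mirrors the example
# in the text and is an assumption about what the EC2 tools produce.
manifest_path() {
    local bucket="$1" version="$2" stamp="$3" serial="$4"
    if [ -z "$stamp" ]; then
        stamp=$(date +%Y%m%d)
    fi
    local folder="TPP-$version-$stamp"
    if [ -n "$serial" ]; then
        # Optional serial distinguishes several builds made on the same date
        folder="$folder.$serial"
    fi
    echo "$bucket/$folder/manifest.xml"
}
```

Calling manifest_path spctools-images-us 4.4.1 20110403 reproduces the example name above, and adding a fourth argument of 1 yields a folder named TPP-4.4.1-20110403.1.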

S3 Buckets

The following buckets have been (or will be) created in each region for storing SPCTools TPP images. Each bucket should have a suffix indicating which region the bucket is in:

  • spctools-images-us
  • spctools-images-us-west-1
  • spctools-images-eu

The following additional buckets have been created, primarily to reserve them:

  • spctools
