TPP Amazon Machine Images

From SPCTools

Revision as of 00:31, 22 November 2014

Starting with TPP 4.4.1, the TPP developers and contributors provide pre-built Amazon Machine Images (AMIs) with the latest TPP software installed, making it easier to perform proteomics data analysis. These images can be used with the TPP Web Launcher for Amazon (TWA), with the TPP command line AWS high performance computing tools, or as a base for your own AMIs. The images are based on the latest official Ubuntu EC2 public images, and the most recent versions include features such as persistent storage in S3 using s3sync, a "deadman" switch that automatically shuts the instance down after a period of time, and automatic formatting and mounting of an EBS volume on attachment.

Amazon Machine Images (AMI)

These images have been built from the official Ubuntu AMIs and include TPP along with useful open source packages and bioinformatics software. Please see the README for each AMI for more information on its contents and versions.


Zone       TPP Version  Ubuntu Version                 Release     32-bit  64-bit        Notes
us-west-2  4.8.0        Ubuntu 14.04 LTS               20141022.3  n/a     ami-23337d13  README
us-west-2  4.7.1        Ubuntu 14.04 LTS               20140527    n/a     ami-ecc9a3dc  README
us-west-2  4.7.0        Ubuntu 13.10 Saucy Salamander  20140320    n/a     ami-2c11781c  README
us-west-2  4.6.3        Ubuntu 12.04 LTS Precise       20130725    n/a     ami-4fb0227f  README
us-west-2  4.6.2        Ubuntu 12.04 LTS Precise       20130219    n/a     ami-0c26ac3c  README
us-west-2  4.6.1        Ubuntu 12.04 LTS Precise       20121218    n/a     ami-ae2da59e
us-west-2  4.6.0        Ubuntu 12.04 LTS Precise       20120828    n/a     ami-e4e46ad4
us-west-2  4.5.2        Ubuntu 12.04 LTS Precise       20120423    n/a     ami-1a860a2a
us-west-2  4.5.1        Ubuntu 11.04 Natty             20120209    n/a     ami-5678f566
us-west-2  4.5.0        Ubuntu 11.04 Natty             20110415    n/a     ami-0e8a073e
us-east-1  4.4.1        Ubuntu 11.04 Natty             20110810    n/a     ami-21d51448

User's Guide

Many good guides have already been written on how to use Amazon Machine Images (AMIs) in EC2. Here are just a few:

Official Guide
Getting Started With Amazon EC2

Image Details

Filesystem Layout

The Ubuntu images were created with a small root partition, and all remaining available disk space is mounted as /mnt. The following directories were therefore created for TPP data:

/mnt/tppdata/local - Used for local data storage. Anything placed here will be lost when the instance is stopped.
/mnt/tppdata/s3 - Mount point for S3. See note below for more information.
/mnt/tppdata/ebs - Mount point for elastic block store (EBS). See note below for more information.

The Petunia web interface is configured to use /mnt/tppdata as its top level so that users can browse and manipulate data on the instance.
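As a quick way to inspect the layout described above, the following is a minimal sketch of a shell function that reports the state of the three data directories. The function name tpp_data_status is hypothetical (it is not part of the images), and the mount check assumes the util-linux mountpoint command is available; if it is not, mounted directories are simply reported as plain directories.

```shell
# Hypothetical helper: report the state of the TPP data directories.
# Usage: tpp_data_status [base_dir]   (base_dir defaults to /mnt/tppdata)
tpp_data_status() {
    base=${1:-/mnt/tppdata}
    for name in local s3 ebs; do
        dir="$base/$name"
        if [ ! -d "$dir" ]; then
            # The directory does not exist at all.
            echo "$name: missing"
        elif command -v mountpoint >/dev/null 2>&1 && mountpoint -q "$dir"; then
            # Something (for example an EBS volume) is mounted here.
            echo "$name: mounted"
        else
            # The directory exists but nothing is mounted on it.
            echo "$name: directory"
        fi
    done
}
```

Calling tpp_data_status with no argument checks /mnt/tppdata; passing another base directory is mainly useful for trying the function out away from an instance.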

S3 Persistence

Starting with TPP 4.6.3, the AMI comes with two Upstart scripts, tpp-s3-get.conf and ttp-s3-put.conf, which provide the option of using S3 to store your data. When enabled, these scripts run s3sync to download the file hierarchy stored in /mnt/tppdata/s3 on system boot and to upload it on system shutdown.

Another option is s3fs, which comes pre-installed on some of the older TPP images. With this tool it is possible to mount an S3 bucket as a local filesystem, which provides a degree of persistent storage of data in S3. While it works, we don't recommend relying on this mechanism: S3 is not a true filesystem, and you will encounter significant performance problems and missing capabilities. For this reason s3fs is no longer provided on the images starting with TPP 4.6.3.

EBS Persistence

TBD

Developer's Guide

Building

The easiest way to build an image is to build it from an already existing image ("rebundling"). TPP images are therefore built from the official public images provided and supported by the Ubuntu community. The process is fairly straightforward:

  1. Find the AMI-ID of the latest Ubuntu community image for the zone you want to use. The simplest way is to use the AMI locator tool found at http://cloud.ubuntu.com/ami. For TPP images, filter the AMI list by amd64 architecture (64 bit) and instance-store block store, then choose the release you want to use and note the AMI-ID.
  2. Start up a new EC2 instance with the AMI-ID from the previous step. You can do this using either the AWS console web application at http://console.aws.amazon.com or the command line tool ec2-run-instances if you happen to have installed the ec2-api-tools. When you start the image, make sure that you use a security group that has port 22 and port 80 open and that you specify a key pair so that you can actually log into the instance. Once the instance is running, its public domain name can be found using either the console or the command ec2-describe-instances.
  3. Copy your certificate and private key to the /tmp directory of your instance. (scp is your friend here)
  4. Using either ssh or PuTTY and the private key of your key pair, log into your EC2 instance. You'll then need to set up a few environment variables that are used by various scripts and AWS tools:
    export AWS_USER_ID=<your-value>
    export AWS_ACCESS_KEY_ID=<your-value>
    export AWS_SECRET_ACCESS_KEY=<your-value>
    export EC2_CERT=/tmp/<your-value>
    export EC2_PRIVATE_KEY=/tmp/<your-value>
    export TPP_VERSION=4.6.3
  5. Download and run the provided scripts to install, configure, and publish the new TPP image.
    cd /tmp
    export SVN_EC2="https://svn.code.sf.net/p/sashimi/code/trunk/trans_proteomic_pipeline/extern/hpctools/ec2"
    wget $SVN_EC2/setup_ec2_image.sh
    wget $SVN_EC2/bundle_ec2_image.sh
    wget $SVN_EC2/rnaseq_ec2_image.sh
    wget $SVN_EC2/publish_ec2_image.sh
    sudo -E bash /tmp/setup_ec2_image.sh 2>&1 | tee setup.out
    sudo -E bash /tmp/rnaseq_ec2_image.sh 2>&1 | tee rnaseq.out
    sudo -E bash /tmp/bundle_ec2_image.sh 2>&1 | tee bundle.out
    sudo -E bash /tmp/publish_ec2_image.sh 2>&1 | tee publish.out
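Because the scripts in step 5 rely on the variables exported in step 4, it can help to verify them before starting a long build. The following is a minimal sketch of such a preflight check; the function name check_build_env is hypothetical and is not one of the provided scripts. It only confirms that each variable is non-empty and that the certificate and private key paths point to existing files.

```shell
# Hypothetical preflight check for the build environment (not part of the TPP scripts).
check_build_env() {
    missing=""
    for name in AWS_USER_ID AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
                EC2_CERT EC2_PRIVATE_KEY TPP_VERSION; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    # The certificate and private key must have been copied to the instance.
    for file in "$EC2_CERT" "$EC2_PRIVATE_KEY"; do
        if [ ! -f "$file" ]; then
            echo "not a file: $file"
            return 1
        fi
    done
    echo "ok"
}
```

One way to use it is to run check_build_env right after the exports in step 4 and continue only if it prints "ok".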


A little note about AWS key pairs, user IDs, private keys, and certificates. It can get quite confusing which is needed for each step, all the more so because the documentation often uses generic names for them. The key point when setting the environment variables is that EC2_CERT and EC2_PRIVATE_KEY are the X.509 certificate and its associated private key. These are not the key pair used to log into an EC2 instance, nor are they the access keys used for signing API calls. More information about which keys are used where can be found in the Amazon documentation.
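One way to confirm that the files named by EC2_CERT and EC2_PRIVATE_KEY really belong together is to compare the public key embedded in the certificate with the public key derived from the private key. The sketch below assumes the openssl command line tool is installed and that both files are PEM encoded; the function name cert_matches_key is hypothetical.

```shell
# Hypothetical check: does an X.509 certificate belong to a given private key?
# Usage: cert_matches_key <certificate.pem> <private-key.pem>
# Returns 0 on a match, 1 on a mismatch, 2 if either file cannot be read.
cert_matches_key() {
    # Public key embedded in the certificate.
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    # Public key derived from the private key.
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

For example, cert_matches_key "$EC2_CERT" "$EC2_PRIVATE_KEY" && echo "certificate and key match" reports a match without printing any key material.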


Publishing

Updating Amazon Pages

Amazon maintains a list of publicly available AMIs at http://aws.amazon.com/amis/. Submitting to this list is unfortunately a manual process, done through the web form at http://aws.amazon.com/amis/submit. To keep submissions consistent, please copy and paste the template below into the Amazon form, filling in the appropriate values.

First Name:       <your first name>
Last Name:        <your last name>
Contact Email:    spctools-discuss@googlegroups.com
AMI Title:        Trans-Proteomic Pipeline (Linux 64-bit <version e.g. 4.4.1>)
AMI Manifest:     <S3 path to manifest file>
<AMI ID's>
License:          Public
Operating System: Unix/Linux
Summary Text:     Official image for the Trans-Proteomic Pipeline (TPP <version>)
Description: 
Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for the analysis of LC/MS/MS proteomics data. TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. The XML backbone of this pipeline enables a uniform analysis for LC/MS/MS data generated by a wide variety of mass spectrometer types, and assigned peptides using a wide variety of database search engines.

Naming Conventions

Manifest Naming

The current suggested schema for naming manifests is to use the default prefix/names assigned by the ec2 tools and place them in a "folder" whose name follows the schema "TPP-<version>-<date>", where version is the version of TPP and date is a date indicator in the format YYYYMMDD. An optional serial number [.1,.2,...] can be appended to the YYYYMMDD date if necessary. These "folders" should be placed in the correct S3 bucket by region (see next section).

For example, the name spctools-images-us/TPP-4.4.1-20110403/manifest.xml references an image with TPP 4.4.1 installed, built on April 3, 2011.
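The naming schema can be sketched as a small shell function that assembles the manifest path from its parts. The function name tpp_manifest_path is hypothetical, and the file name manifest.xml is taken from the example above rather than from the ec2 tools themselves.

```shell
# Hypothetical helper: build the S3 path of a manifest from the naming schema.
# Usage: tpp_manifest_path <bucket> <tpp-version> <YYYYMMDD> [serial]
tpp_manifest_path() {
    bucket=$1
    version=$2
    date=$3
    serial=${4:-}
    # The date indicator must be exactly eight digits (YYYYMMDD).
    case $date in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "date must be YYYYMMDD: $date" >&2; return 1 ;;
    esac
    folder="TPP-$version-$date"
    # Append the optional serial number, e.g. ".2".
    if [ -n "$serial" ]; then
        folder="$folder.$serial"
    fi
    echo "$bucket/$folder/manifest.xml"
}
```

Calling tpp_manifest_path spctools-images-us 4.4.1 20110403 prints spctools-images-us/TPP-4.4.1-20110403/manifest.xml.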

S3 Buckets

The following buckets have been (or will be) created in each region for storing SPCTools TPP images. Each bucket should have a suffix indicating which region the bucket is in:

  • spctools-images-us
  • spctools-images-us-west-1
  • spctools-images-us-west-2
  • spctools-images-eu

The following additional buckets have been created, primarily to reserve them:

  • spctools
