TPP Amazon Machine Images
From SPCTools
Starting with TPP 4.4.1, the TPP developers and contributors provide pre-built Amazon Machine Images (AMIs) with the latest TPP software installed, making it easier to perform proteomics data analysis. These images can be used with the TPP Web Launcher for Amazon (TWA), the TPP command line AWS high performance computing tools, or even as a base for your own AMIs. The images are based on the latest official Ubuntu EC2 public images, and the most recent versions include features such as persistent storage in S3 using s3sync, a "deadman" switch that automatically shuts the instance down after a period of time, and automatic formatting and mounting of an EBS volume on attachment.
Amazon Machine Images (AMI)
These images have been built from the official Ubuntu AMIs and include TPP along with other useful open source packages and bioinformatics software. Please see the README for each AMI for more information on its contents and versions.
Zone | TPP Version | Ubuntu Version | Release | 32-bit | 64-bit | Notes |
---|---|---|---|---|---|---|
us-west-2 | 4.8.0 | Ubuntu 14.04 LTS | 20141022.3 | n/a | ami-23337d13 | README |
us-west-2 | 4.7.1 | Ubuntu 14.04 LTS | 20140527 | n/a | ami-ecc9a3dc | README |
us-west-2 | 4.7.0 | Ubuntu 13.10 Saucy Salamander | 20140320 | n/a | ami-2c11781c | README |
us-west-2 | 4.6.3 | Ubuntu 12.04 LTS Precise | 20130725 | n/a | ami-4fb0227f | README |
us-west-2 | 4.6.2 | Ubuntu 12.04 LTS Precise | 20130219 | n/a | ami-0c26ac3c | README |
us-west-2 | 4.6.1 | Ubuntu 12.04 LTS Precise | 20121218 | n/a | ami-ae2da59e | |
us-west-2 | 4.6.0 | Ubuntu 12.04 LTS Precise | 20120828 | n/a | ami-e4e46ad4 | |
us-west-2 | 4.5.2 | Ubuntu 12.04 LTS Precise | 20120423 | n/a | ami-1a860a2a | |
us-west-2 | 4.5.1 | Ubuntu 11.04 Natty | 20120209 | n/a | ami-5678f566 | |
us-west-2 | 4.5.0 | Ubuntu 11.04 Natty | 20110415 | n/a | ami-0e8a073e | |
us-east-1 | 4.4.1 | Ubuntu 11.04 Natty | 20110810 | n/a | ami-21d51448 | |
User's Guide
There are many good guides already written on how to use Amazon Machine Images (AMI) in the EC2 product. Here are just a few:
Image Details
Filesystem Layout
The Ubuntu images were created with a small root partition, and all remaining available disk space is mounted as /mnt. The following directories were therefore created for TPP data:
- /mnt/tppdata/local - Used for local data storage. Anything placed here will be lost when the instance is stopped.
- /mnt/tppdata/s3 - Mount point for S3. See note below for more information.
- /mnt/tppdata/ebs - Mount point for elastic block store (EBS). See note below for more information.
The Petunia web interface is configured to use /mnt/tppdata as its top level so that users can browse and manipulate data on the instance.
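As a quick sanity check on a running instance, the layout above can be verified with a short shell sketch. The function names here are hypothetical and not part of the TPP images; the optional argument for an alternate root exists only so the check can be tried outside an instance.

```shell
# Hypothetical helper: print the three TPP data directories described above.
# The optional first argument overrides the root (default: /mnt/tppdata).
tpp_data_dirs() {
    local root="${1:-/mnt/tppdata}"
    local sub
    for sub in local s3 ebs; do
        printf '%s/%s\n' "$root" "$sub"
    done
}

# Read-only check: report which directories exist.
# Returns non-zero if any of them is missing.
check_tpp_layout() {
    local missing=0
    local dir
    for dir in $(tpp_data_dirs "$1"); do
        if [ -d "$dir" ]; then
            echo "ok      $dir"
        else
            echo "missing $dir"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_tpp_layout` with no argument inspects /mnt/tppdata. Keep in mind that anything under the local directory is lost when the instance is stopped.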
S3 Persistence
Starting with TPP 4.6.3, the AMI comes with two Upstart scripts, tpp-s3-get.conf and tpp-s3-put.conf, which provide the option of using S3 to store your data. When enabled, these scripts run s3sync to download the file hierarchy stored in /mnt/tppdata/s3 on system boot, or upload it on system shutdown.
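The direction of each transfer can be pictured with the small sketch below. It only prints the source and destination for the "get" (boot) and "put" (shutdown) cases and does not invoke s3sync, whose options are not reproduced here. The function name, the bucket argument, and the `s3://` notation are illustrative assumptions, not what the Upstart scripts actually contain.

```shell
# Local side of the sync, as described above.
S3_LOCAL_DIR="/mnt/tppdata/s3"

# Hypothetical helper: print "source -> destination" for a sync direction.
#   get = download on system boot
#   put = upload on system shutdown
s3_sync_plan() {
    local direction="$1"
    local bucket="$2"
    case "$direction" in
        get) printf '%s -> %s\n' "s3://$bucket/" "$S3_LOCAL_DIR/" ;;
        put) printf '%s -> %s\n' "$S3_LOCAL_DIR/" "s3://$bucket/" ;;
        *)   echo "usage: s3_sync_plan get|put BUCKET" >&2; return 2 ;;
    esac
}
```

For example, `s3_sync_plan put my-bucket` prints the upload direction for a bucket named my-bucket (a placeholder name).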
Another option is s3fs, which comes pre-installed on some of the older TPP images. With this tool it is possible to mount an S3 bucket as a local filesystem, which provides a degree of persistent storage of data in S3. While it works, we don't recommend relying on this mechanism: S3 is not a *true* filesystem, and you will encounter significant performance problems and missing capabilities. For this reason, starting with TPP 4.6.3 it is no longer provided on the images.
EBS Persistence
TBD
Developer's Guide
Building
The easiest way to build an image is to build it from an already existing image ("rebundling"). TPP images are therefore built from the official public images provided and supported by the Ubuntu community. The process is fairly straightforward:
- Find the AMI-ID of the latest Ubuntu community image for the zone you want to use. The simplest way is to use the AMI locator tool found at http://cloud.ubuntu.com/ami. For TPP images, filter the AMI list by amd64 architecture (64 bit) and instance-store block store, then choose the release you want to use and note the AMI-ID.
- Start up a new EC2 instance with the AMI-ID from the previous step. You can do this using either the AWS console web application at http://console.aws.amazon.com or the command line tool ec2-run-instances if you happen to have the EC2 API tools installed. When you start the image, make sure that you use a security group that has port 22 and port 80 open and that you specify a key pair so that you can actually log into the instance. Once the instance is running, the public domain name can be found using either the console or the command ec2-describe-instances.
- Copy your certificate and private key to the /tmp directory of your instance. (scp is your friend here)
- Using either ssh or PuTTY and the private key of your key pair, log into your EC2 instance. You'll then need to set up a few environment variables that are used by various scripts and AWS tools:
- export AWS_USER_ID=<your-value>
- export AWS_ACCESS_KEY_ID=<your-value>
- export AWS_SECRET_ACCESS_KEY=<your-value>
- export EC2_CERT=/tmp/<your-value>
- export EC2_PRIVATE_KEY=/tmp/<your-value>
- export TPP_VERSION=4.6.3
- Download and run the provided scripts to install, configure, and publish the new TPP image.
- cd /tmp
- export SVN_EC2="https://svn.code.sf.net/p/sashimi/code/trunk/trans_proteomic_pipeline/extern/hpctools/ec2"
- wget $SVN_EC2/setup_ec2_image.sh
- wget $SVN_EC2/bundle_ec2_image.sh
- wget $SVN_EC2/rnaseq_ec2_image.sh
- wget $SVN_EC2/publish_ec2_image.sh
- sudo -E bash /tmp/setup_ec2_image.sh 2>&1 | tee setup.out
- sudo -E bash /tmp/rnaseq_ec2_image.sh 2>&1 | tee rnaseq.out
- sudo -E bash /tmp/bundle_ec2_image.sh 2>&1 | tee bundle.out
- sudo -E bash /tmp/publish_ec2_image.sh 2>&1 | tee publish.out
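Because the scripts are run with `sudo -E`, which preserves the caller's environment, a missing variable is easy to overlook until a script fails partway through. The following is a minimal pre-flight sketch, assuming the six variables listed in the steps above are the ones that matter; the function name is hypothetical and not part of the provided scripts.

```shell
# Variables named in the build steps above.
REQUIRED_VARS="AWS_USER_ID AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY EC2_CERT EC2_PRIVATE_KEY TPP_VERSION"

# Hypothetical helper: print the name of every required variable that is
# unset or empty. Returns non-zero if at least one is missing.
missing_build_vars() {
    local name value status=0
    for name in $REQUIRED_VARS; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}
```

A possible use is `missing_build_vars || echo "set the variables above before running setup_ec2_image.sh"`. The check only confirms that the variables are non-empty; it does not validate their values.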
A little note about AWS key pairs, user IDs, private keys, and certificates: it can be confusing which one is needed for each step, all the more so because the documentation often uses generic names for them. The key point when setting the environment variables is that EC2_CERT and EC2_PRIVATE_KEY are the X.509 certificate and its associated private key. These are not the key pair used to log into an EC2 instance, nor are they the access keys used for signing API calls. More information about which keys are used where can be found in the Amazon documentation.
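One common slip is pointing EC2_CERT at a private key file or the reverse. The sketch below inspects the PEM header of a file to tell a certificate from a private key. It is only a rough check under the assumption that both files are PEM encoded: it cannot tell an X.509 signing key from an SSH key pair file, since both may carry a private key header. The function name is hypothetical.

```shell
# Hypothetical helper: classify a PEM file by its header line.
# Prints "certificate", "private-key", or "unknown".
pem_kind() {
    local file="$1"
    if grep -q -e '-----BEGIN CERTIFICATE-----' "$file"; then
        echo certificate
    elif grep -Eq -e '-----BEGIN (RSA )?PRIVATE KEY-----' "$file"; then
        echo private-key
    else
        echo unknown
    fi
}
```

With the variables from the build steps set, `pem_kind "$EC2_CERT"` would be expected to print certificate and `pem_kind "$EC2_PRIVATE_KEY"` to print private-key.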
Publishing
Updating Amazon Pages
Amazon maintains a list of publicly available AMIs at http://aws.amazon.com/amis/. Submitting to this list is unfortunately a manual process, done through the web form at http://aws.amazon.com/amis/submit. To keep submissions consistent, please copy and paste from the template below (filling in the appropriate values) into the Amazon form.
- First Name: <your first name>
- Last Name: <your last name>
- Contact Email: spctools-discuss@googlegroups.com
- AMI Title: Trans-Proteomic Pipeline (Linux 64-bit <version e.g. 4.4.1>)
- AMI Manifest: <S3 path to manifest file> <AMI ID's>
- License: Public
- Operating System: Unix/Linux
- Summary Text: Official image for the Trans-Proteomic Pipeline (TPP <version>)
- Description: Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for the analysis of LC/MS/MS proteomics data. TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. The XML backbone of this pipeline enables a uniform analysis for LC/MS/MS data generated by a wide variety of mass spectrometer types, and assigned peptides using a wide variety of database search engines.
Naming Conventions
Manifest Naming
The current suggested scheme for naming manifests is to use the default prefix/names assigned by the EC2 tools and place them in a "folder" whose name follows the pattern "TPP-<version>-<date>", where version is the version of TPP and date is a date indicator in the format YYYYMMDD. An optional serial number [.1,.2,...] can be appended to the YYYYMMDD date if necessary. These "folders" should be placed in the correct S3 bucket by region (see next section).
For example, the name spctools-images-us/TPP-4.4.1-20110403/manifest.xml references an image with TPP 4.4.1 installed, built on April 3, 2011.
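The naming convention can be expressed as a small shell function. This is a sketch of the pattern described above, not a script shipped with TPP; the function name is hypothetical, and the only validation is that the date is eight digits.

```shell
# Hypothetical helper: build the manifest "folder" name
#   TPP-<version>-<date>[.<serial>]
# Arguments: TPP version, date as YYYYMMDD, optional serial number.
tpp_manifest_folder() {
    local version="$1"
    local date="$2"
    local serial="${3:-}"
    case "$date" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "date must be YYYYMMDD: $date" >&2; return 1 ;;
    esac
    if [ -n "$serial" ]; then
        printf 'TPP-%s-%s.%s\n' "$version" "$date" "$serial"
    else
        printf 'TPP-%s-%s\n' "$version" "$date"
    fi
}
```

For instance, `tpp_manifest_folder 4.4.1 20110403` yields the folder name used in the example path above, and passing a third argument such as 3 appends the serial as `.3`.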
S3 Buckets
The following buckets have been (or will be) created in each region for storing SPCTools TPP images. Each bucket should have a suffix indicating which region the bucket is in:
- spctools-images-us
- spctools-images-us-west-1
- spctools-images-us-west-2
- spctools-images-eu
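Picking the bucket for a region can be sketched as a simple lookup. The mapping of the unsuffixed "us" bucket to us-east-1 and of the "eu" bucket to eu-west-1 is an assumption based on the bucket names and is not stated on this page; the function name is hypothetical.

```shell
# Hypothetical helper: print the SPCTools image bucket for an EC2 region.
tpp_image_bucket() {
    case "$1" in
        us-east-1) echo spctools-images-us ;;        # assumed mapping
        us-west-1) echo spctools-images-us-west-1 ;;
        us-west-2) echo spctools-images-us-west-2 ;;
        eu-west-1) echo spctools-images-eu ;;        # assumed mapping
        *) echo "no TPP image bucket known for region: $1" >&2; return 1 ;;
    esac
}
```

Regions without a bucket in the list above are rejected with a non-zero status rather than guessed.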
The following additional buckets have been created, primarily to reserve them:
- spctools