TPP Amazon Machine Images
From SPCTools
Revision as of 18:43, 11 May 2011 by JoeS
Starting with TPP 4.4.1, the TPP group provides pre-built Amazon Machine Images (AMIs) with the latest TPP software installed, making it easier to perform proteomics data analysis. The images are configured for use with the TPP Web Application (TWA), with the TPP AWS high-performance computing tools, with your own in-house applications, or as a base for your own EC2 images. They are based on the latest Ubuntu EC2 public images and include features such as persistent storage on S3- or EBS-backed filesystems and Wine-based conversion of MS/MS files.
Details
This is a Unix/Linux instance-store-backed 64-bit image based on the Ubuntu 10.10 "Maverick" public image (ami-08f40561). It also contains the following open source software:
- OMSSA Version 2.1.9
- InsPecT Version 20101012
- MyriMatch Version 2.0.85
- ProteoWizard's msconvert (Windows version)
Trans-Proteomic Pipeline AMIs for Amazon EC2
Zone | TPP Version | Ubuntu Version | Release | server 32-bit | server 64-bit |
---|---|---|---|---|---|
us-east-1 | 4.4.1 | Ubuntu 11.04 Natty | 20110415 | n/a | [ami-1ef50977](https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-1ef50977) instance-store |
User's Guide
Many good guides already cover how to use Amazon Machine Images (AMIs) with EC2. Here are just a few:
Notes
Filesystem Layout
The Ubuntu images were created with a small root partition, and all remaining available disk space is mounted as /mnt. The following directories were therefore created for TPP data:
- /mnt/tppdata/local - Used for local data storage. Anything placed here will be lost when the instance is stopped.
- /mnt/tppdata/s3 - Mount point for S3. See note below for more information.
- /mnt/tppdata/ebs - Mount point for elastic block store (EBS). See note below for more information.
The Petunia web interface is configured to use /mnt/tppdata as its top level so that users can browse and manipulate data on the instance.
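Because anything under /mnt/tppdata/local is lost when the instance is stopped, results usually need to be copied to one of the persistent mount points first. A minimal sketch of that step follows; `persist_local_data` is a hypothetical helper name, not something shipped on the image, and it assumes the destination is already mounted.

```shell
# Sketch only: persist_local_data is a hypothetical helper, not part of the TPP image.
# It copies everything from the ephemeral data directory into a persistent mount.
persist_local_data() {
    local src="${1:-/mnt/tppdata/local}"   # ephemeral instance storage
    local dest="${2:-/mnt/tppdata/s3}"     # persistent mount point (S3 or EBS)

    if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
        echo "persist_local_data: missing directory: $src or $dest" >&2
        return 1
    fi

    # "$src/." copies the contents of the directory (including dotfiles)
    # rather than the directory itself.
    cp -R "$src/." "$dest/"
}

# Example (run on the instance before stopping it):
#   persist_local_data /mnt/tppdata/local /mnt/tppdata/ebs
```

The defaults mirror the directory layout described above; pass explicit paths to copy into a subdirectory instead.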
S3 Persistence
Since the TPP image comes with s3fs pre-installed, it is possible to mount an S3 bucket as a local filesystem to get persistent storage of data in S3. To use this feature, provide the bucket name and your AWS credentials in the user-data field when starting the instance.
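The user-data format expected by the image is not described here. For reference, a bucket can also be mounted by hand with s3fs. The sketch below assumes the installed s3fs accepts the `passwd_file` option and the `ACCESS_KEY:SECRET_KEY` credentials file format used by s3fs-fuse; the option name may differ in the s3fs version on the image. Both helper functions are hypothetical, and the second only prints the mount command so it can be reviewed before running it with sudo.

```shell
# Sketch only: both helpers are hypothetical and not part of the TPP image.

# Write an s3fs credentials file in ACCESS_KEY:SECRET_KEY format,
# readable only by its owner.
write_s3fs_passwd() {
    local passwd_file="$1" access_key="$2" secret_key="$3"
    ( umask 077; printf '%s:%s\n' "$access_key" "$secret_key" > "$passwd_file" )
    chmod 600 "$passwd_file"
}

# Print (do not run) the s3fs command that would mount the bucket.
# The passwd_file option is an assumption about the installed s3fs version.
s3fs_mount_command() {
    local bucket="$1" passwd_file="$2" mount_point="${3:-/mnt/tppdata/s3}"
    printf 's3fs %s %s -o passwd_file=%s\n' "$bucket" "$mount_point" "$passwd_file"
}

# Example:
#   write_s3fs_passwd /tmp/passwd-s3fs "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"
#   s3fs_mount_command my-bucket /tmp/passwd-s3fs
```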
EBS Persistence
TBD
Developer's Guide
Building
The easiest way to build an image is to start from an existing image ("rebundling"), so TPP images are built from the official public images provided and supported by the Ubuntu community. The process is fairly straightforward:
- Find the AMI-ID of the latest Ubuntu community image for the zone you want to use. The simplest way is to use the AMI locator tool at http://cloud.ubuntu.com/ami. For TPP images, filter the AMI list by amd64 architecture (64-bit) and instance-store block store, then choose the release you want to use and note the AMI-ID.
- Start a new EC2 instance from the AMI-ID found in the previous step. You can do this with either the AWS console web application at http://console.aws.amazon.com or the command-line tool ec2-run-instances if you have the EC2 API tools installed. When you start the instance, make sure to use a security group that has ports 22 and 80 open and to specify a key pair so that you can log in to the instance. Once the instance is running, its public domain name can be found using either the console or the command ec2-describe-instances.
- Copy your certificate and private key to the /tmp directory of your instance (scp is your friend here).
- Using ssh or PuTTY and your key, log in to your EC2 instance. You'll then need to set up a few environment variables that are used by various scripts and AWS tools:
    export AWS_USER_ID=<your-value>
    export AWS_ACCESS_KEY_ID=<your-value>
    export AWS_SECRET_ACCESS_KEY=<your-value>
    export EC2_CERT=/tmp/<your-value>
    export EC2_PRIVATE_KEY=/tmp/<your-value>
    export TPP_VERSION=4.4.1
- Download and run the provided scripts to install, configure, and publish the new TPP image.
    cd /tmp
    export SVN=https://sashimi.svn.sourceforge.net/svnroot/sashimi/trunk/trans_proteomic_pipeline/extern/hpctools
    wget $SVN/ec2/setup_ec2_image.sh
    wget $SVN/ec2/bundle_ec2_image.sh
    wget $SVN/ec2/publish_ec2_image.sh
    sudo -E bash /tmp/setup_ec2_image.sh
    sudo -E bash /tmp/bundle_ec2_image.sh
    sudo -E bash /tmp/publish_ec2_image.sh
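The setup, bundle, and publish scripts rely on the environment variables exported earlier (which is why they are run with sudo -E), so a missing value tends to surface late in the process. A small pre-flight check along the following lines can catch that before anything is run. `check_build_env` is a hypothetical helper, not one of the provided scripts, and the list of required variables is simply the set shown in the steps above.

```shell
# Sketch only: check_build_env is a hypothetical helper, not one of the
# provided TPP scripts. It verifies the variables listed in the build steps.
check_build_env() {
    local missing=0 name value
    for name in AWS_USER_ID AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
                EC2_CERT EC2_PRIVATE_KEY TPP_VERSION; do
        eval "value=\${$name:-}"          # look up the variable by name
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done

    # The certificate and private key must also point at readable files.
    for name in EC2_CERT EC2_PRIVATE_KEY; do
        eval "value=\${$name:-}"
        if [ -n "$value" ] && [ ! -r "$value" ]; then
            echo "not readable: $name=$value" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_build_env && sudo -E bash /tmp/setup_ec2_image.sh
```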
Publishing
Updating Amazon Pages
Amazon maintains a list of publicly available AMIs at http://aws.amazon.com/amis/. Submitting to this list is unfortunately a manual process, done through the web form at http://aws.amazon.com/amis/submit. To keep submissions consistent, please copy and paste from the template below (filling in the appropriate values) into the Amazon form.
First Name: <your first name>
Last Name: <your last name>
Contact Email: spctools-discuss@googlegroups.com
AMI Title: Trans-Proteomic Pipeline (Linux 64-bit <version e.g. 4.4.1>)
AMI Manifest: <S3 path to manifest file> <AMI ID's>
License: Public
Operating System: Unix/Linux
Summary Text: Official image for the Trans-Proteomic Pipeline (TPP <version>)
Description: Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for the analysis of LC/MS/MS proteomics data. TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. The XML backbone of this pipeline enables a uniform analysis for LC/MS/MS data generated by a wide variety of mass spectrometer types, and assigned peptides using a wide variety of database search engines.
Naming Conventions
Manifest Naming
The currently suggested scheme for naming manifests is to keep the default prefix and names assigned by the EC2 tools and place them in a "folder" named "TPP-<version>-<date>", where <version> is the TPP version and <date> is the date in YYYYMMDD format. An optional serial number (.1, .2, ...) can be appended to the date if necessary. These "folders" should be placed in the correct S3 bucket for the region (see next section).
For example, the name spctools-images-us/TPP-4.4.1-20110403/manifest.xml refers to an image with TPP 4.4.1 installed, built on 3 April 2011.
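The naming scheme can be captured in a few lines of shell, which helps keep manifest paths consistent across builds. This is only a sketch: `manifest_path` is a hypothetical helper, and the file name manifest.xml is taken from the example above rather than from the EC2 tools' actual default.

```shell
# Sketch only: manifest_path is a hypothetical helper.
# Usage: manifest_path <bucket> <tpp-version> <YYYYMMDD> [serial]
manifest_path() {
    local bucket="$1" version="$2" date="$3" serial="${4:-}"
    local folder

    # The date indicator must be exactly eight digits (YYYYMMDD).
    case "$date" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "manifest_path: date must be YYYYMMDD" >&2; return 1 ;;
    esac

    folder="TPP-${version}-${date}"
    if [ -n "$serial" ]; then
        folder="${folder}.${serial}"      # optional serial, e.g. .1, .2
    fi
    printf '%s/%s/manifest.xml\n' "$bucket" "$folder"
}

# Example:
#   manifest_path spctools-images-us 4.4.1 20110403
```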
S3 Buckets
The following buckets have been (or will be) created in each region for storing SPCTools TPP images. Each bucket should have a suffix indicating which region the bucket is in:
- spctools-images-us
- spctools-images-us-west-1
- spctools-images-eu
The following additional buckets have been created, primarily to reserve them:
- spctools
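When scripting uploads, the region buckets listed above can be selected with a simple lookup. The mapping below is an assumption: this page lists the bucket names but does not say which EC2 region each one serves, so us-east-1 is paired with spctools-images-us and eu-west-1 with spctools-images-eu by inference, and the us-west-1 bucket is assumed to follow the same spctools-images- pattern as the others. `bucket_for_region` is a hypothetical helper.

```shell
# Sketch only: bucket_for_region is a hypothetical helper, and the
# region-to-bucket pairing is inferred from the bucket name suffixes.
bucket_for_region() {
    case "$1" in
        us-east-1) echo "spctools-images-us" ;;
        us-west-1) echo "spctools-images-us-west-1" ;;
        eu-west-1) echo "spctools-images-eu" ;;
        *)
            echo "bucket_for_region: no bucket known for region '$1'" >&2
            return 1
            ;;
    esac
}

# Example:
#   bucket_for_region us-east-1
```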