AMZTPP:FAQ

From SPCTools

Revision as of 18:44, 16 April 2012

Frequently asked questions (FAQ) about amztpp usage. If your question is answered neither here nor in the documentation, please ask for help in the spctools discussion group: spctools-discuss.googlegroups.com

Amazon S3

1. Why is my data stored in S3 for a search?

It is certainly possible to architect a system that launches new EC2 instances and copies input data directly to them, bypassing the upload to and download from S3. However, using S3 has several advantages: it helps when searches need to be repeated or run with multiple search engines, it makes the reliability of cloud instances and data transfers easier to manage, and it supports fault recovery when using EC2 spot instances.


2. Do you support using Amazon's Reduced Redundancy Storage (RRS)?

The current version of amztpp does not support Amazon's Reduced Redundancy Storage (RRS). Support is being considered for a future release, since RRS shows some promise for reducing storage costs.

Amazon EC2

1. What is the default EC2 instance type?

Tests show that the m1.xlarge type currently offers the best balance between execution time and cost. This was determined by running a sample mzML file on each EC2 instance type and averaging the time of three X!Tandem runs with the same parameters and database. For each run, the number of threads was set equal to the number of virtual cores specified for that instance type.
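The comparison described above comes down to averaging the run times for each instance type and weighing them against the hourly price. The following is a minimal Python sketch of that arithmetic, not the script used for the tests: the function name summarize_runs is hypothetical, the cost is assumed to be pro-rated by the second (actual EC2 billing rules may differ), and the numbers in the usage lines are placeholders rather than measurements.

```python
from statistics import mean


def summarize_runs(run_times_sec, hourly_price_usd):
    """Return (mean run time in seconds, estimated cost of one run in USD).

    Assumption: cost is pro-rated linearly from the hourly price.
    """
    avg_sec = mean(run_times_sec)
    cost_usd = avg_sec / 3600.0 * hourly_price_usd
    return avg_sec, cost_usd


# Placeholder values for illustration only, not benchmark results.
avg_sec, cost_usd = summarize_runs([100.0, 200.0, 300.0], hourly_price_usd=3.6)
print(f"mean time: {avg_sec:.1f} s, estimated cost: ${cost_usd:.2f}")
```

Running this for each instance type gives one (time, cost) pair per type, which can then be compared side by side.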

2. How does amztpp decide it's time to start another EC2 instance?

Currently the scheduling algorithm in amztpp is very basic. Whenever the upload of a new service completes, the background process checks whether it is time to launch a new EC2 instance. It first checks that the number of running instances is less than the maximum allowed (default 10, configurable using the --max flag). It then compares the number of pending services with the number of currently running instances, and finally with the running total of instances ever started. This last check accounts for the overallocation of EC2 instances and their subsequent timeout and termination. The assumption is that if 10 nodes were started at some point but only 5 are running now, no additional nodes should be started unless the number of pending services is actually over 10.
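The three checks can be sketched as a single predicate. This is a minimal Python illustration, not amztpp's own code: the names SchedulerState and should_launch are hypothetical, and the comparison directions are an assumption taken from the closing example, in which a new instance is warranted only once the pending count exceeds the number of instances ever started.

```python
from dataclasses import dataclass


@dataclass
class SchedulerState:
    running: int             # instances currently running
    pending: int             # services uploaded but not yet processed
    total_started: int       # running total of instances ever started
    max_instances: int = 10  # default limit, changed with --max


def should_launch(state: SchedulerState) -> bool:
    """Return True if another EC2 instance should be started (assumed logic)."""
    # Check 1: stay under the configured instance limit.
    if state.running >= state.max_instances:
        return False
    # Check 2: pending work must exceed the instances already running.
    if state.pending <= state.running:
        return False
    # Check 3: pending work must exceed everything ever started, so that
    # instances that timed out from overallocation are not simply replaced.
    if state.pending <= state.total_started:
        return False
    return True


# The FAQ's example: 10 nodes started at some point, 5 still running.
print(should_launch(SchedulerState(running=5, pending=8, total_started=10)))
print(should_launch(SchedulerState(running=5, pending=11, total_started=10)))
```

With 8 pending services the predicate declines to launch, and with 11 it allows one more instance, matching the "over 10" condition in the example.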
