AMZTPP:FAQ
From SPCTools
Frequently asked questions (FAQ) about amztpp usage. If your question is answered neither here nor in the documentation, please ask for help in the spctools discussion group, spctools-discuss.googlegroups.com.
Amazon S3
Why is my data stored in S3 for a search?
It is certainly possible to architect a software system that launches new EC2 instances and copies input data directly to them, bypassing the upload to and download from S3. But using S3 has advantages: it helps when searches need to be repeated (or when multiple search engines are in use), it makes the unreliability of cloud instances and data transfers easier to manage, and it provides fault recovery when running on EC2 spot instances.
Do you support using Amazon's Reduced Redundancy Storage (RRS)?
The current version of amztpp does not support Amazon's Reduced Redundancy Storage (RRS). It is being considered for a future release, since it shows some promise for reducing storage costs.
Amazon EC2
What is the default EC2 instance type?
Tests show that the m1.xlarge type currently offers the best balance between execution time and cost. This was determined by searching a sample mzML file on each of the EC2 instance types and averaging the time of three X!Tandem executions that used the same parameters and database. For each run, the number of threads was set equal to the number of virtual cores specified for that EC2 type.
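The averaging step described above can be sketched as follows. This is only an illustration: the function name `summarize_runs`, its parameters, and the assumption that cost is prorated from an hourly price are all hypothetical and are not taken from amztpp or from the original benchmark.

```python
from statistics import mean


def summarize_runs(run_seconds, hourly_price_usd):
    """Summarize repeated search runs on one instance type.

    run_seconds: wall-clock times (in seconds) of the repeated runs,
        e.g. three X!Tandem executions with identical parameters.
    hourly_price_usd: price per instance-hour (caller-supplied).

    Returns (mean_seconds, cost_per_run_usd). The cost is prorated
    from the hourly price, which is an assumption of this sketch and
    not necessarily how the instance is actually billed.
    """
    if not run_seconds:
        raise ValueError("at least one run time is required")
    mean_seconds = mean(run_seconds)
    cost_per_run_usd = mean_seconds / 3600 * hourly_price_usd
    return mean_seconds, cost_per_run_usd
```

Calling this once per instance type with measured times and current prices gives one (time, cost) pair per type, which can then be compared side by side.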
How does amztpp decide it's time to initiate another EC2 instance?
Currently the scheduling algorithm in amztpp is very basic. Whenever the upload of a new service is completed, the background process checks whether it is time to launch a new EC2 instance. It first checks that the number of running instances is less than the maximum number of running instances allowed (default 10, configurable using the --max flag). It then checks that the number of pending services exceeds the number of currently running instances. Finally, it checks that the number pending exceeds the running total of instances ever started. This last check accounts for the overallocation of EC2 instances and their subsequent timeout and termination. The assumption is that if 10 nodes were started at some point but only 5 are running now, no additional ones should be started unless the number of pending services is actually over 10.
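The three checks can be sketched as a single decision function. This is a minimal illustration, not the amztpp source: the function and parameter names are hypothetical, and the comparisons are read in the direction implied by the 10-node example (launch only when pending work exceeds what has already been provisioned).

```python
def should_launch_instance(running, pending, total_started, max_running=10):
    """Decide whether to start one more EC2 instance.

    running: number of instances currently running
    pending: number of services waiting to be processed
    total_started: running total of instances ever started
    max_running: cap on concurrent instances (the --max flag, default 10)
    """
    # Check 1: never exceed the configured maximum.
    if running >= max_running:
        return False
    # Check 2: the running instances can already cover the pending work.
    if pending <= running:
        return False
    # Check 3: guard against overallocation. Instances that were started
    # earlier and have since timed out still count against new launches.
    if pending <= total_started:
        return False
    return True
```

With 10 instances ever started and 5 still running, `should_launch_instance(5, 8, 10)` returns `False`, while `should_launch_instance(5, 11, 10)` returns `True`.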
Do EC2 instances automatically stop, and under what criteria?
Built into the TPP Amazon Machine Images are several levels of protection to ensure that you don't have EC2 instances running longer than necessary. The first level of protection is a "deadman" switch built using Ubuntu's Upstart init daemon. This simple script starts at instance startup and will shut the system down after 55 minutes. When the amztpp server process, which handles services, is started, it cancels the deadman process. If the amztpp server process stops unexpectedly at any time, the deadman is restarted and the system will shut down in 5 minutes. When the amztpp process quits normally, because it found nothing to do within its predetermined timeout, the deadman is set again and the system is shut down in 1 minute.
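The deadman behavior amounts to a small table of events and shutdown delays. The sketch below models only that table; the event names and the `shutdown_deadline` helper are hypothetical, and the real mechanism is an Upstart job on the machine image, not Python code.

```python
from enum import Enum


class Event(Enum):
    BOOT = "boot"                          # instance has just started
    SERVER_STARTED = "server_started"      # amztpp server process is up
    SERVER_CRASHED = "server_crashed"      # server stopped unexpectedly
    SERVER_IDLE_EXIT = "server_idle_exit"  # server quit after finding no work


# Minutes until shutdown after each event; None means the deadman
# timer is cancelled. The delays are the ones stated in the text.
SHUTDOWN_DELAY_MINUTES = {
    Event.BOOT: 55,
    Event.SERVER_STARTED: None,
    Event.SERVER_CRASHED: 5,
    Event.SERVER_IDLE_EXIT: 1,
}


def shutdown_deadline(event, now_minutes):
    """Return the time (in minutes) at which the instance would shut
    down after `event`, or None if the deadman timer is cancelled."""
    delay = SHUTDOWN_DELAY_MINUTES[event]
    if delay is None:
        return None
    return now_minutes + delay
```

For example, a server crash 120 minutes after boot gives `shutdown_deadline(Event.SERVER_CRASHED, 120) == 125`.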
Users of amztpp should be aware that even with these precautions it is still possible for instances to be left running for long periods of time. In the end, it's your responsibility to ensure that you are using only the Amazon Web Services resources that you expect to use.