AMZTPP:FAQ
From SPCTools
Revision as of 18:42, 1 May 2012 by JoeS
Frequently asked questions (FAQ) about amztpp usage. If your question is answered neither here nor in the documentation, please ask for help in the spctools discussion group, spctools-discuss.googlegroups.com (http://groups.google.com/group/spctools-discuss).
General
Where does amztpp save your AWS credentials?
On Linux, your secret credentials file is located in your home directory, at ~/.awssecret. Under Windows it is saved as .awssecrets in your "My Data" folder. The actual location of that folder may vary depending on your operating system version and setup; under Windows 7 on our systems it happened to be ~\AppData\Local. On either platform you can override the default location by setting the AWS_CREDENTIAL_FILE environment variable to the path of the file to use.
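The lookup order described above can be illustrated with a minimal sketch. This is not amztpp's own code: the function name credentials_path and its parameters are hypothetical, and only the AWS_CREDENTIAL_FILE override and the Linux default are modeled, since the Windows folder depends on the system setup.

```python
import os
from pathlib import Path


def credentials_path(env=None, home=None):
    """Return the credentials file path (hypothetical helper, not part of amztpp).

    The AWS_CREDENTIAL_FILE variable wins when it is set; otherwise the
    Linux default ~/.awssecret is returned. The Windows default is not
    modeled because its folder varies between systems.
    """
    env = os.environ if env is None else env
    override = env.get("AWS_CREDENTIAL_FILE")
    if override:
        return Path(override)
    home = Path.home() if home is None else Path(home)
    return home / ".awssecret"
```

Calling credentials_path() with no arguments reads the real environment and home directory; passing env and home explicitly makes the behavior easy to check.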
What does the realclean command of amztpp really do?
The amztpp realclean command sends a termination request to any active EC2 instances, deletes all messages and queues in SQS, and removes all files stored in S3, including the S3 bucket that holds them.
After running the realclean command will I incur any more charges?
While the amztpp realclean command makes every attempt to remove all usage of AWS services, it's not possible to guarantee that every operation has ceased and every resource has been removed. It is highly advised that you check after issuing this command, using both the status command and the AWS console, to ensure that all data has been removed and that there are no lingering EC2 instances or SQS messages.
Amazon S3
Why is my data stored in S3 for a search?
It is certainly possible to architect a software system that launches new EC2 instances and copies input data directly to the instance, bypassing the step of uploading to and downloading from S3. But using S3 provides several advantages: it helps when searches need to be repeated (or multiple engines are in use), it helps manage the reliability of cloud instances and data transfers, and it supports fault recovery when using EC2 spot instances.
Do you support using Amazon's Reduced Redundancy Storage (RRS)?
The current version of amztpp does not support Amazon's Reduced Redundancy Storage (RRS). It is being considered for a future release because it shows some promise in reducing storage costs.
Amazon EC2
What is the default EC2 instance type?
Tests show that the optimal balance between execution time and cost is currently the m1.xlarge type. This was determined by running a sample mzML file on the various EC2 types and taking the average time of three executions of X!Tandem with the same parameters and database. The number of threads for each of these runs was set to be equal to the number of virtual cores specified for each EC2 type.
How does amztpp decide it's time to initiate another EC2 instance?
Currently the scheduling algorithm in amztpp is very basic. Whenever the upload of a new service is completed, the background process checks whether it's time to launch a new EC2 instance. If the number of running instances is already at the maximum allowed (default 1), then no new instances will be started. Otherwise, another instance is launched if the number of pending services is greater than the number of previously pending services and the number of active services is equal to the number of running instances. The purpose of checking against the previous number of pending services is to allocate more instances only when the number of pending services is increasing. If it is decreasing or holding steady, the assumption is that you have allocated enough instances to process the data at the rate you are uploading (submitting) it.
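The decision rule above can be written out as a small sketch. This only illustrates the described conditions and is not taken from the amztpp source; the function name should_launch and all parameter names are hypothetical.

```python
def should_launch(running, active, pending, prev_pending, max_instances=1):
    """Decide whether to start one more EC2 instance (illustrative only).

    running       -- number of EC2 instances currently running
    active        -- number of services currently being processed
    pending       -- number of services waiting now
    prev_pending  -- number of services waiting at the previous check
    max_instances -- cap on running instances (the FAQ states a default of 1)
    """
    # Never exceed the configured maximum.
    if running >= max_instances:
        return False
    # Launch only when the backlog is growing and every running
    # instance is already busy with a service.
    return pending > prev_pending and active == running
```

For example, with a maximum of 4, two busy instances, and a backlog that grew from 3 to 5, the rule asks for another instance; if the backlog held steady at 5, it would not.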
Do EC2 instances automatically stop, and under what criteria?
Built into the TPP Amazon Machine Images are several levels of protection to ensure that you don't have EC2 instances running longer than necessary. The first level of protection is a "deadman" switch built using Ubuntu's upstart init daemon. This simple script starts at instance startup and will shut the system down after 55 minutes. When the amztpp server process, which handles services, is started, it cancels the deadman process. If at any time the amztpp server process stops unexpectedly, the deadman is restarted and the system will shut down in 5 minutes. When the amztpp process quits normally, because it found nothing to do before its predetermined timeout expired, the deadman is set again and the system shuts down in 1 minute.
Users of amztpp should be aware that even with these precautions it is still possible for instances to be left running for long periods of time. In the end it's your responsibility to ensure that you are using only the Amazon Web Services resources that you expect to use.
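The deadman timings described above can be summarized as a lookup from lifecycle event to shutdown delay. This is only a sketch of the stated behavior, not the upstart script itself; the Event names and the deadman_delay function are hypothetical.

```python
from enum import Enum


class Event(Enum):
    """Lifecycle events named for illustration; not amztpp identifiers."""
    BOOT = "boot"
    SERVER_STARTED = "server_started"
    SERVER_CRASHED = "server_crashed"
    SERVER_IDLE_EXIT = "server_idle_exit"


# Minutes until shutdown once the deadman is armed by each event,
# as stated in the FAQ text.
SHUTDOWN_DELAY_MINUTES = {
    Event.BOOT: 55,
    Event.SERVER_CRASHED: 5,
    Event.SERVER_IDLE_EXIT: 1,
}


def deadman_delay(event):
    """Return minutes until shutdown, or None when the deadman is cancelled."""
    # SERVER_STARTED is absent from the table: starting the server
    # cancels the deadman, so no shutdown is scheduled.
    return SHUTDOWN_DELAY_MINUTES.get(event)
```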