Guest post by AWS Container Hero Tung Nguyen. Tung is the president and founder of BoltOps, a consulting company focused on cloud infrastructure and software on AWS. He also enjoys writing for the BoltOps Nuts and Bolts blog.
EC2 Spot Instances allow me to use spare compute capacity at a steep discount. Using Amazon ECS with Spot Instances is probably one of the best ways to run my workloads on AWS. By using Spot Instances, I can save 50–90% on Amazon EC2 instances. You would think that folks would jump at a huge opportunity like a black Friday sales special. However, most folks either seem to not know about Spot Instances or are hesitant. This may be due to some fallacies about Spot.
With the Spot model, AWS can remove instances at any time. It can be due to a maintenance upgrade; high demand for that instance type; older instance type; or for any reason whatsoever.
Hence the first fear and fallacy that people quickly point out with Spot:
What do you mean that the instance can be replaced at any time? Oh no, that must mean that within 20 minutes of launching the instance, it gets killed.
I felt the same way too initially. The actual Spot Instance Advisor website states:
The average frequency of interruption across all Regions and instance types is less than 5%.
From my own usage, I have seen instances run for weeks. Need proof? Here’s a screenshot from an instance in one of our production clusters.
If you’re wondering how many days that is….
Yes, that is 228 continuous days. You might not get these same long uptimes, but it disproves the fallacy that Spot Instances are usually interrupted within 20 minutes from launch.
With Spot Instances, I place a single request for a specific instance in a specific Availability Zone. With Spot Fleets, instead of requesting a single instance type, I can ask for a variety of instance types that meet my requirements. For many workloads, as long as the CPU and RAM are close enough, many instance types do just fine.
So, I can spread my instance bets across instance types and multiple zones with Spot Fleets. Using Spot Fleets dramatically makes the system more robust on top of the already mentioned low interruption rate. Also, I can run an On-Demand cluster to provide additional safeguard capacity.
This is one of my favorite ways to run workloads because it gives me a scalable system at a ridiculously low cost. The technologies are such a great fit together that one might think they were built for each other.
These are the necessary pieces I need for building an ECS cluster on top of Spot Fleet. I use the two-minute warning to call ECS connection draining, and ECS automatically moves containers to another instance in the fleet.
Here’s a CloudFormation template that achieves this: ecs-ec2-spot-fleet. Because the focus is on understanding Spot Fleets, the VPC is designed to be simple.
The template specifies two instance types in the Spot Fleet: t3.small and t3.medium with 2 GB and 4 GB of RAM, respectively. The template weights the t3.medium twice as much as the t3.small. Essentially, the Spot Fleet TargetCapacity value equals the total RAM to provision for the ECS cluster. So if I specify 8, the Spot Fleet service might provision four t3.small instances or two t3.medium instances. The cluster adds up to at least 8 GB of RAM.
To launch the stack run, I run the following command:
aws cloudformation create-stack --stack-name ecs-spot-demo --template-body file://ecs-spot-demo.yml --capabilities CAPABILITY_IAM
The CloudFormation stack launches container instances and registers them to an ECS cluster
named development
by default. I can change this with the EcsCluster parameter. For more information on the parameters, see the README and the template source.
When I deploy the application, the deploy tool creates the ECS cluster itself. Here are the Spot Instances in the EC2 console.
After the Spot cluster is up, I can deploy a demo app on it. I wrote a tool called Ufo that is useful for these tasks:
Docker should be installed as a prerequisite. First, I create an ECR repo and set some variables:
ECR_REPO=$(aws ecr create-repository --repository-name demo/sinatra | jq -r '.repository.repositoryUri')
VPC_ID=$(aws ec2 describe-vpcs --filters Name=tag:Name,Values="demo vpc" | jq -r '.Vpcs[].VpcId')
Now I’m ready to clone the demo repo and deploy a sample app to ECS with ufo.
git clone https://github.com/tongueroo/demo-ufo.git demo
cd demo
ufo init --image $ECR_REPO --vpc-id $VPC_ID
ufo current --service demo-web
ufo ship # deploys to ECS on the Spot Fleet cluster
Here’s the ECS service running:
I then grab the Elastic Load Balancing endpoint from the console or with ufo ps.
$ ufo ps
Elb: develop-Elb-12LHJWU4TH3Q8-597605736.us-west-2.elb.amazonaws.com
$
Now I test with curl:
$ curl develop-Elb-12LHJWU4TH3Q8-597605736.us-west-2.elb.amazonaws.com
42
The application returns “42,” the meaning of life, successfully. That’s it! I now have an application running on ECS with Spot Fleet Instances.
One additional advantage of using Spot is that it encourages me to think about my architecture in a highly available manner. The Spot “constraints” ironically result in much better sleep at night as the system must be designed to be self-healing.
Hopefully, this post opens the world of running ECS on Spot Instances to you. It’s a core of part of the systems that BoltOps has been running on its own production system and for customers. I still get excited about the setup today. If you’re interested in Spot architectures, contact me at BoltOps.
One last note: Auto Scaling groups also support running multiple instance types and purchase options. Jeff mentions in his post that weight support is planned for a future release. That’s exciting, as it may streamline the usage of Spot with ECS even further.
Source: AWS News