Deploying many sites in ECS using one ALB
My current go-to deployment strategy is AWS Elastic Container Service (ECS) using Fargate behind an Application Load Balancer (ALB). Each site is its own stateless Docker container, persisting dynamic data in RDS and/or S3. When I make a change, I build a new container image, push it to ECR, create a new task definition revision, and ECS deploys the updated site for me.
I've now set this up a couple of times, and each time I struggle to recollect all the steps along the way, so it's high time I wrote it down so that I can look it up next time. And now that I understand it a bit better, I was also able to consolidate my infrastructure, since my original approach wasn't necessarily the most cost-efficient setup.
Aside from remembering/reverse-engineering all the pieces needed, the part I always got stuck on was the apparent catch-22 of the load balancer wanting a target group, the target group wanting an IP, and the ECS service wanting the load balancer to exist before it provides said IP.
Setting up the Application Load Balancer (ALB)
We're going to be using a single Application Load Balancer for all the ECS-hosted sites and will be hosting them under `https://`, with `http://` redirecting to the former.
Multi-domain certificate
Before we can create the load balancer we need a certificate that covers all the domains we want to host. This certificate can be swapped out later, so the list of domains defined here is not set in stone. AWS lets you generate multi-domain and wildcard certificates directly via Certificate Manager, which allows us to host different domains behind the load balancer's single endpoint rather than requiring a separate IP and certificate for each domain.
- Go to Certificate Manager
- Select Request
- Request a public certificate
- Add all the domains and sub-domains that the load balancer will handle
  - You can provide up to 10 different domains in one certificate by default, i.e. 10 different sites with one Application Load Balancer. You can request more, but I haven't tried that yet.
  - `foo.com` and `*.foo.com` have to be declared separately, so that might cut you down to 5 different sites.
- Keep the remaining settings as they are
- Follow the instructions for verifying each domain by setting up the appropriate `CNAME`s in DNS
The only caveat is that the certificate common name (`CN`) is the first domain that you declare, i.e. if you create it for `foo.com`, `bar.com`, and `baz.com`, visitors looking at the certificate on the latter two will see `foo.com` as the name on the certificate, which, depending on the sites you are hosting, may be a deal breaker.
Security Groups
We only want the containers to accept traffic from the ALB, so we need a pair of security groups: the first defining who can talk to the ALB, and the second defining that only the ALB's group can talk to the containers.
- Go to EC2
- Select Security Groups from Network & Security
- Select Create Security Group
- Create a security group for the load balancer itself
  - Inbound: ports 80 & 443 from anywhere
  - Outbound: default anywhere rule
- Create a security group for the ECS services, reachable only by the load balancer
  - Inbound: port 8000 (or whatever port your Docker containers listen for HTTP on) with the source being the previous security group
  - Outbound: default anywhere rule
I run the container with a public IP (actually required so the task can pull env files from S3 when there is no NAT gateway) and add a Home-only security group that opens up the container to my IP, so that I can sanity-check it without the load balancer.
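For reference, the inbound rule on the service security group boils down to a single ingress permission that names the ALB's group as the source. A sketch of that rule in the JSON shape the EC2 API uses (e.g. for `aws ec2 authorize-security-group-ingress --ip-permissions`; the group ID is a placeholder and port 8000 matches the example above):

```json
[
  {
    "IpProtocol": "tcp",
    "FromPort": 8000,
    "ToPort": 8000,
    "UserIdGroupPairs": [
      {
        "GroupId": "<alb-security-group-id>",
        "Description": "HTTP from the load balancer only"
      }
    ]
  }
]
```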
Creating the ALB
The ALB, once configured, will provide us with an AWS-based DNS name that we will use as the target of the `CNAME`s for the domains we're hosting. When someone hits a host name that we haven't set up routing for, we want to return a 503 rather than expose one of the sites at a non-canonical host. Unfortunately, the console does not let you define a static response as the initial default action, but requires a Target Group instead, so we will create a target group for the first site we want to host and then change the default rule afterwards.
- Go to EC2
- Select Load Balancers from Load Balancing
- Select Create Load Balancer
- Pick Application Load Balancer
- Internet facing
- Remove the default security group and add the ALB group created above
- Select `HTTPS` as the listener
- Create an IP Target Group for the first site to be hosted
  - Name it for the first site to be hosted
  - Select HTTP (not HTTPS) and the port that the Docker container listens on
    - The ALB terminates HTTPS and everything inside our network is just HTTP
  - Define a healthcheck path on the container
    - Some path on the container that returns 200 when things are OK
  - Click `Create` without specifying an IP
- Add the ACM certificate created above
- Click `Create` for the ALB
- Add an `HTTP` listener to the ALB
  - Set Routing Action to Redirect to URL
  - Keep the `URI Parts` and `HTTPS` target defaults
- Modify the default `HTTPS` rule
  - Select the `HTTPS` listener
  - Check the `default` rule
  - Select Edit Rule from Actions
  - Set Routing Action to Return fixed response
  - Set Response Code to `503` and Content type to `text/plain`
  - Add `No such site` as the Response body
  - Save the rule
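If you ever script this instead of clicking through the console, the fixed-response default action corresponds roughly to this JSON in the elbv2 API (e.g. what you would pass to `aws elbv2 modify-listener --default-actions`; the body text is whatever you chose above):

```json
[
  {
    "Type": "fixed-response",
    "FixedResponseConfig": {
      "StatusCode": "503",
      "ContentType": "text/plain",
      "MessageBody": "No such site"
    }
  }
]
```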
The load balancer is now ready to direct traffic to the sites we create in ECS.
Creating Services
Each site is a Docker container running in ECS (or, if the container handles multi-tenancy, one container may serve multiple sites). This gives us immutable infrastructure rather than servers we update as we make changes. If a container is compromised, it is simply terminated and replaced. Ideally the container is configured to require no local changes at runtime, even for ephemeral state, so that we can use a read-only file system, further reducing the chance of a compromise. This assumes, of course, that you do not allow executable code to be stored in your persistence layer.
Add a routing rule to the load balancer
- Go to EC2
- Select Load Balancers from Load Balancing
- Select the load balancer
- Select the `HTTPS` listener
- Select `Add Rule`
- Click `Next`
- Add a `Host header` condition for the target domain
- `Confirm` and click `Next`
- Set Routing action to `Forward to target groups`
- Create a target group as described above, or pick the previously created target group if this is the first site we're setting up
- Pick some priority and click `Next`
  - Priority doesn't matter, since all our rules are host header rules that either match or don't
- Click `Create`
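Should you script these rules rather than click through the console, each one amounts to a host-header condition plus a forward action. A sketch of their shape in the elbv2 API (the domain and target group ARN are placeholders):

```json
{
  "Conditions": [
    {
      "Field": "host-header",
      "HostHeaderConfig": { "Values": ["sample-site.example.com"] }
    }
  ],
  "Actions": [
    {
      "Type": "forward",
      "TargetGroupArn": "<target-group-arn>"
    }
  ]
}
```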
Upload Application Container to ECR
- Go to ECR
- Create a private repository for the application
- Upload the container (see View Push Commands for details)
Create IAM Policies & Roles
Every service requires two roles:
- A Task Role
  - providing the permissions that your application needs
  - e.g. appropriate S3 permissions for persistence
  - for illustration, we'll grant full S3 access to the publicly readable bucket that the site will use for its media storage
- A Task Execution Role
  - providing permission to run the task
  - since we'll use env files, we'll need appropriate S3 access to the private bucket we keep env files in
Public S3 Policy
This assumes that your site uses S3 for some file storage that is exposed publicly; if it doesn't, skip to the next section. Further assuming a bucket `sample-site-public` in which objects can be public, we create a policy so that our container can write files to it.
- Go to IAM
- Select Policies
- Click Create Policy
- Select the JSON policy editor and add the bucket access policy (see the sketch after this list)
- Set the name to `sample-site.s3.editor`
- Create the policy
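A minimal sketch of such a policy, granting full S3 access to the media bucket (assuming the `sample-site-public` bucket from above; you may well want to narrow the actions for your own setup):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::sample-site-public",
        "arn:aws:s3:::sample-site-public/*"
      ]
    }
  ]
}
```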
Task Role
- Go to IAM
- Select Roles
- Select AWS Service as Trusted Entity Type
- Select Elastic Container Service as Use case
- Select Elastic Container Service Task
- Attach `sample-site.s3.editor` to the role (if you are using S3 for public file storage)
- Set the name to `sample-site.task`
- Create role
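Choosing Elastic Container Service Task as the use case sets up the role's trust policy for you; it amounts to letting ECS tasks assume the role, roughly:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```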
Private S3 Policy
Assuming a private bucket `sample-site-private`, we create a policy so that our task can read our environment file `prod.env`. We use separate buckets so that we can't accidentally expose our secrets with a rogue permission change on the media bucket.
- Go to IAM
- Select Policies
- Click Create Policy
- Select the JSON policy editor and add the env file read policy (see the sketch after this list)
- Set the name to `sample-site.task-files`
- Create the policy
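A minimal sketch of the read policy (assuming the `sample-site-private` bucket and `prod.env` file from above; as I understand it, ECS also wants `s3:GetBucketLocation` on the bucket when pulling env files):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::sample-site-private/prod.env"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetBucketLocation",
      "Resource": "arn:aws:s3:::sample-site-private"
    }
  ]
}
```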
Task Execution Role
- Go to IAM
- Select Roles
- Select AWS Service as Trusted Entity Type
- Select Elastic Container Service as Use case
- Select Elastic Container Service Task
- Attach the `AmazonECSTaskExecutionRolePolicy` policy
- Attach the `sample-site.task-files` policy
- Set the name to `sample-site.task-runner`
- Create role
Create Task Definition
Note
The assumption is that there is already a cluster you can create services in. If not, create one using `FARGATE`, just for simplicity. Considering this setup is primarily for low-traffic sites, even the minimum `FARGATE` resources seem like a waste, so despite what the console says about EC2 being for high-throughput work, I may benefit from buying a single small EC2 instance and using it to run multiple low-traffic ECS services. But that's a future topic. For now, all services are containers running in `FARGATE`.
- Go to ECS
- Select Task definitions
- Click Create new task definition
- Select Launch Type `FARGATE`
- Select `sample-site.task` for the Task role
- Select `sample-site.task-runner` for the Task execution role
- Under Container details, link to your application container
- Under Port Mappings, expose the port that the container listens on
- Click Add environment file
  - Set Location to `arn:aws:s3:::sample-site-private/prod.env`
  - More versatile than env vars in the definition, and safer if the env contains secrets
  - Alternatively, one could use Parameter Store. I'll write that up once I figure it out.
- Create task definition
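The console assembles the full task definition, but for orientation, the container portion of the resulting JSON ends up looking roughly like this (the image URI is a placeholder; the port and ARN match the examples above):

```json
{
  "name": "sample-site",
  "image": "<account-id>.dkr.ecr.<region>.amazonaws.com/sample-site:latest",
  "portMappings": [
    { "containerPort": 8000, "protocol": "tcp" }
  ],
  "environmentFiles": [
    { "value": "arn:aws:s3:::sample-site-private/prod.env", "type": "s3" }
  ]
}
```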
Create Service
- Go to ECS
- Select Clusters
- Select your cluster
- Click Create in the Services tab
- Leave the Capacity provider strategy at its default
- Leave Application Type as Service
- Select the Task definition created above
- Under Networking
- Remove default group
- Add the service specific security group created above
- (Optional) Add a Home only security group for direct access
- Under Load balancing, select previously created ALB
- Pick HTTPS listener
- Pick previously created Target Group
- As the service is deployed, it will automatically fill in targets in the target group
- Update DNS to point the host name at the ALB's DNS name via a `CNAME`
Add more sites
Repeat the Creating Services instructions to add additional sites to the ALB.
What's next
The above setup is for low-traffic sites. I use it to run a couple of WordPress blogs and some other code projects. If there were more than rudimentary traffic, it would be time to revisit the deployment types and possibly add replicas. But the nice thing is that you can easily scale this up and down or add elastic scaling rules, and deploying code changes is just a matter of pushing containers and creating new revisions of task definitions.
The next thing I do want to take a look at is setting up an EC2 instance as my launch target instead of `FARGATE`, because even at 0.25 vCPU my current sites are over-provisioned and cost me more than the dedicated EC2 instance (with its Elastic IP) that I used to manually update code on before.