Deploying many sites in ECS using one ALB
My current go-to deployment strategy is AWS Elastic Container Service (ECS) using Fargate behind an Application Load Balancer (ALB). Each site is its own stateless Docker container, persisting dynamic data in RDS and/or S3. When I make a change, I build a new container image, push it to ECR, create a new task definition revision, and ECS deploys the updated site for me.
I've now set this up a couple of times, and each time I struggle to recollect all the steps along the way, so it's high time I wrote it down so that I can look it up next time. And now that I understand it a bit better, I was also able to consolidate my infrastructure, since my original approach wasn't necessarily the most cost-efficient setup.
Aside from remembering/reverse-engineering all the pieces needed, the part I always got stuck on was the apparent catch-22 of the load balancer wanting a target group, the target group wanting an IP, and the ECS service wanting the load balancer to exist before it provides said IP.
Setting up the Application Load Balancer (ALB)
We're going to be using a single Application Load Balancer for all the ECS-hosted sites and will be hosting them under `https://`, with `http://` redirecting to the former.
Multi-domain certificate
Before we can create the load balancer we need a certificate that covers all the domains we want to host. This certificate can be swapped out later, so the list of domains defined here is not set in stone. AWS lets you generate multi-domain and wildcard certificates directly via Certificate Manager, which allows us to host different domains behind the load balancer's single endpoint rather than requiring a separate IP and certificate for each domain.
- Go to Certificate Manager
- Select Request
- Request a public certificate
- Add all the domains and sub-domains that the load balancer will handle
  - You can provide up to 10 different domains in one certificate by default, i.e. 10 different sites with one Application Load Balancer. You can request more, but I haven't tried that yet.
  - `foo.com` and `*.foo.com` have to be declared separately, so that might cut you down to 5 different sites.
- Keep the remaining settings as they are
- Follow the instructions for verifying each domain by setting up the appropriate `CNAME`s in DNS
The only caveat is that the certificate common name (`CN`) is the first domain that you declare, i.e. if you create it for `foo.com`, `bar.com`, and `baz.com`, visitors looking at the certificate on the latter two will see `foo.com` as the name on the certificate, which, depending on the sites you are hosting, may be a deal breaker.
Security Groups
We only want the containers to accept traffic from the ALB, so we need a pair of security groups: the first defining who can talk to the ALB, and the second defining that only the ALB's group can talk to the containers.
- Go to EC2
- Select Security Groups from Network & Security
- Select Create Security Group
- Create a security group for the load balancer itself
  - Inbound: ports 80 & 443 from anywhere
  - Outbound: default anywhere rule
- Create a security group for the ECS services, reachable only by the load balancer
  - Inbound: port 8000 (or whatever port your Docker containers listen for HTTP on) with the source being the previous security group
  - Outbound: default anywhere rule
I run the container with a public IP (actually required so the task can pull env files from S3 when there is no NAT gateway) and add a Home-only security group that opens up the container to my IP, so that I can sanity-check it without the load balancer.
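For reference, the inbound rule on the service security group boils down to a single ingress permission that names the ALB's group as the source. A sketch of that rule in the JSON shape the EC2 API uses (e.g. for `aws ec2 authorize-security-group-ingress --ip-permissions`; the group ID is a placeholder and port 8000 matches the example above):

```json
[
  {
    "IpProtocol": "tcp",
    "FromPort": 8000,
    "ToPort": 8000,
    "UserIdGroupPairs": [
      {
        "GroupId": "<alb-security-group-id>",
        "Description": "HTTP from the load balancer only"
      }
    ]
  }
]
```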
Creating the ALB
The ALB, once configured, will provide us with an AWS-based DNS name that we will use as the target of the `CNAME`s for the domains we're hosting. When someone hits a host name that we haven't set up routing for, we want to return a 503 rather than expose one of the sites at a non-canonical host. Unfortunately, the console does not let you define a static response as the initial default action, but requires a Target Group instead, so we will create a target group for the first site we want to host and then change the default rule afterwards.
- Go to EC2
- Select Load Balancers from Load Balancing
- Select Create Load Balancer
- Pick Application Load Balancer
- Internet facing
- Remove the default security group and add the ALB group created above
- Select `HTTPS` as the listener
- Create an IP Target Group for the first site to be hosted
  - Name it for the first site to be hosted
  - Select HTTP (not HTTPS) and the port that the Docker container listens on
    - The ALB terminates HTTPS and everything inside our network is just HTTP
  - Define a healthcheck path on the container
    - Some path on the container that returns 200 when things are OK
  - Click `Create` without specifying an IP
- Add the ACM certificate created above
- Click `Create` for the ALB
- Add an `HTTP` listener to the ALB
  - Set Routing Action to Redirect to URL
  - Keep the `URI Parts` and `HTTPS` target defaults
- Modify the default `HTTPS` rule
  - Select the `HTTPS` listener
  - Check the `default` rule
  - Select Edit Rule from Actions
  - Set Routing Action to Return fixed response
  - Set Response Code to `503` and Content type to `text/plain`
  - Add `No such site` as the Response body
  - Save the rule
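If you ever script this instead of clicking through the console, the fixed-response default action corresponds roughly to this JSON in the elbv2 API (e.g. what you would pass to `aws elbv2 modify-listener --default-actions`; the body text is whatever you chose above):

```json
[
  {
    "Type": "fixed-response",
    "FixedResponseConfig": {
      "StatusCode": "503",
      "ContentType": "text/plain",
      "MessageBody": "No such site"
    }
  }
]
```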
The load balancer is now ready to direct traffic to the sites we create in ECS.
Creating Services
Each site is a Docker container running in ECS (or, if the container handles multi-tenancy, one container may serve multiple sites). This gives us immutable infrastructure rather than servers we update as we make changes. If a container is compromised, it is simply terminated and replaced. Ideally the container is configured to require no local changes at runtime, even for ephemeral state, so that we can use a read-only file system, further reducing the chance of a compromise. This assumes, of course, that you do not allow executable code to be stored in your persistence layer.
Add a routing rule to the load balancer
- Go to EC2
- Select Load Balancers from Load Balancing
- Select the load balancer
- Select the `HTTPS` listener
- Select `Add Rule`
- Click `Next`
- Add a `Host header` condition for the target domain
- `Confirm` and click `Next`
- Set Routing action to `Forward to target groups`
- Create a target group as described above, or pick the previously created target group if this is the first site we're setting up
- Pick some priority and click `Next`
  - Priority doesn't matter, since all our rules are host header rules that either match or don't
- Click `Create`
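Should you script these rules rather than click through the console, each one amounts to a host-header condition plus a forward action. A sketch of their shape in the elbv2 API (the domain and target group ARN are placeholders):

```json
{
  "Conditions": [
    {
      "Field": "host-header",
      "HostHeaderConfig": { "Values": ["sample-site.example.com"] }
    }
  ],
  "Actions": [
    {
      "Type": "forward",
      "TargetGroupArn": "<target-group-arn>"
    }
  ]
}
```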
Upload Application Container to ECR
- Go to ECR
- Create a private repository for the application
- Upload the container (see View Push Commands for details)
Create IAM Policies & Roles
Every service requires two roles:
- A Task Role
  - providing the permissions that your application needs
  - e.g. appropriate S3 permissions for persistence
  - for illustration, we'll grant full S3 access to the publicly readable bucket that the site will use for its media storage
- A Task Execution Role
  - providing permission to run the task
  - since we'll use env files, we'll need appropriate S3 access to the private bucket we keep env files in
Public S3 Policy
This assumes that your site uses S3 for some file storage that is exposed publicly; if it doesn't, skip to the next section. Further assuming a bucket `sample-site-public` in which objects can be public, we create a policy so that our container can write files to it.
- Go to IAM
- Select Policies
- Click Create Policy
- Select the JSON policy editor and add the bucket access policy (see the sketch after this list)
- Set the name to `sample-site.s3.editor`
- Create the policy
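A minimal sketch of such a policy, granting full S3 access to the media bucket (assuming the `sample-site-public` bucket from above; you may well want to narrow the actions for your own setup):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::sample-site-public",
        "arn:aws:s3:::sample-site-public/*"
      ]
    }
  ]
}
```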
Task Role
- Go to IAM
- Select Roles
- Select AWS Service as Trusted Entity Type
- Select Elastic Container Service as Use case
- Select Elastic Container Service Task
- Attach `sample-site.s3.editor` to the role (if you are using S3 for public file storage)
- Set the name to `sample-site.task`
- Create role
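Choosing Elastic Container Service Task as the use case sets up the role's trust policy for you; it amounts to letting ECS tasks assume the role, roughly:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```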
Private S3 Policy
Assuming a private bucket `sample-site-private`, we create a policy so that our task can read our environment file `prod.env`. We use separate buckets so that we can't accidentally expose our secrets with a rogue permission change on the media bucket.
- Go to IAM
- Select Policies
- Click Create Policy
- Select the JSON policy editor and add the env file read policy (see the sketch after this list)
- Set the name to `sample-site.task-files`
- Create the policy
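A minimal sketch of the read policy (assuming the `sample-site-private` bucket and `prod.env` file from above; as I understand it, ECS also wants `s3:GetBucketLocation` on the bucket when pulling env files):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::sample-site-private/prod.env"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetBucketLocation",
      "Resource": "arn:aws:s3:::sample-site-private"
    }
  ]
}
```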
Task Execution Role
- Go to IAM
- Select Roles
- Select AWS Service as Trusted Entity Type
- Select Elastic Container Service as Use case
- Select Elastic Container Service Task
- Attach the `AmazonECSTaskExecutionRolePolicy` policy
- Attach the `sample-site.task-files` policy
- Set the name to `sample-site.task-runner`
- Create role
Create Task Definition
Note
The assumption is that there is already a cluster you can create services in. If not, create one using `FARGATE`, just for simplicity. Considering this setup is primarily for low-traffic sites, even the minimum `FARGATE` resources seem like a waste, so despite what the console says about EC2 being for high-throughput work, I may benefit from buying a single small EC2 instance and using it to run multiple low-traffic ECS services. But that's a future topic. For now, all services are containers running in `FARGATE`.
- Go to ECS
- Select Task definitions
- Click Create new task definition
- Select Launch Type `FARGATE`
- Select `sample-site.task` for the Task role
- Select `sample-site.task-runner` for the Task execution role
- Under Container details, link to your application container
- Under Port Mappings, expose the port that the container listens on
- Click Add environment file
  - Set Location to `arn:aws:s3:::sample-site-private/prod.env`
  - More versatile than env vars in the definition, and safer if the env contains secrets
  - Alternatively, one could use Parameter Store. I'll write that up once I figure it out.
- Create task definition
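The console assembles the full task definition, but for orientation, the container portion of the resulting JSON ends up looking roughly like this (the image URI is a placeholder; the port and ARN match the examples above):

```json
{
  "name": "sample-site",
  "image": "<account-id>.dkr.ecr.<region>.amazonaws.com/sample-site:latest",
  "portMappings": [
    { "containerPort": 8000, "protocol": "tcp" }
  ],
  "environmentFiles": [
    { "value": "arn:aws:s3:::sample-site-private/prod.env", "type": "s3" }
  ]
}
```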
Create Service
- Go to ECS
- Select Clusters
- Select your cluster
- Click Create in the Services tab
- Leave the Capacity provider strategy at its default
- Leave Application Type as Service
- Select the Task definition created above
- Under Networking
- Remove default group
- Add the service specific security group created above
- (Optional) Add a Home only security group for direct access
- Under Load balancing, select previously created ALB
- Pick HTTPS listener
- Pick previously created Target Group
- As the service is deployed, it will automatically fill in targets in the target group
- Update DNS to point the host name at the ALB's DNS name via a `CNAME`
Add more sites
Repeat the Creating Services instructions to add additional sites to the ALB.
What's next
The above setup is for low-traffic sites. I use it to run a couple of WordPress blogs and some other code projects. If there were more than rudimentary traffic, it would be time to revisit the deployment types and possibly add replicas. But the nice thing is that you can easily scale this up and down or add elastic scaling rules, and deploying code changes is just a matter of pushing containers and creating new revisions of task definitions.
The next thing I do want to take a look at is setting up an EC2 instance as my launch target instead of `FARGATE`, because even at 0.25 vCPU my current sites are over-provisioned and cost me more than the dedicated EC2 instance (with its Elastic IP) that I used to manually update code on before.