Deploying an AWS ECS Cluster of EC2 Instances With Terraform

This project will utilize two major cloud computing tools.
Terraform is an infrastructure orchestration tool (also known as "infrastructure as code", or IaC). Using Terraform, you declare every piece of your infrastructure once, in static files, which lets you deploy and destroy cloud infrastructure easily, make incremental changes, roll back, version the infrastructure, and so on.
AWS ECS is Amazon's solution for deploying and managing a fleet of virtual machines running containers. Under the hood, ECS builds on AWS's well-known EC2 virtual machines, CloudWatch for monitoring them, auto scaling groups (for provisioning and deprovisioning machines depending on the current load of the cluster) and, most importantly, Docker as the containerization engine.
Here’s what’s to be done:

Within a VPC there's an autoscaling group of EC2 instances. ECS manages starting tasks on those EC2 instances, based on Docker images stored in the ECR container registry. Each EC2 instance hosts a worker that writes to an RDS MySQL database. The EC2 instances and the MySQL instance are in different security groups.
We need to provision some building blocks:
- a VPC with a public subnet as an isolated pool for our resources
- an Internet Gateway to reach the outside world
- security groups for RDS MySQL and for the EC2 instances
- an auto-scaling group for the ECS cluster, with a launch configuration
- an RDS MySQL instance
- an ECR container registry
- an ECS cluster with a task and a service definition
The Terraform Part
To start with Terraform we first need to install it. Just follow the steps in this document: https://www.terraform.io/downloads.html
Verify the installation by typing:
$ terraform --version
Terraform v0.13.4
With Terraform (in this case version 0.13.4) we can provision cloud architecture by writing code in a configuration language. Here that language is HCL, the HashiCorp Configuration Language.
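If you have never seen HCL before, it consists mostly of blocks and key = value arguments. A minimal, purely illustrative example (this bucket is not part of our project):
# A resource block: a type ("aws_s3_bucket"), a local name ("example") and arguments inside.
resource "aws_s3_bucket" "example" {
  bucket = "my-illustrative-bucket-name"
}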
Terraform state
Before writing the first line of our code, let's focus on understanding what the Terraform state is.
The state is a kind of snapshot of the architecture: Terraform needs to know what was provisioned, which resources were created, track changes, and so on.
All that information is written either to a local file, terraform.tfstate, or to a remote location. The code is generally shared between members of a team, so keeping a local state file is never a good idea; we want to keep the state in a remote destination. When working with AWS, that destination is S3.
This is the first thing that we need to code: tell Terraform that the state will be remote and kept in S3 (terraform.tf):
terraform {
  backend "s3" {
    bucket = "terraformeksproject"
    key    = "state.tfstate"
  }
}
Terraform will keep the state in an S3 bucket under a state.tfstate key. For that to happen we need to set three environment variables:
$ export AWS_SECRET_ACCESS_KEY=...
$ export AWS_ACCESS_KEY_ID=..
$ export AWS_DEFAULT_REGION=...
These credentials can be found or created in the AWS IAM console, in the "My security credentials" section. Both the access keys and the region must be set as environment variables if we want to keep the remote state.
VPC
provider "aws" {}
resource "aws_vpc" "vpc" {
cidr_block = "10.0.0.0/24"
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Name = "Terraform VPC"
}
}
Terraform needs to know which API it should interact with; here we say it will be AWS. The list of available providers can be found here: https://www.terraform.io/docs/providers/index.html
The provider section has no parameters because we've already provided the credentials needed to communicate with the AWS API as environment variables, in order to have a remote Terraform state (it is also possible to set them with provider parameters, though).
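For completeness, an explicitly configured provider block could look roughly like this (the region and profile values below are placeholders, not part of this project):
provider "aws" {
  region  = "eu-west-1"   # placeholder region
  profile = "my-profile"  # a named profile from ~/.aws/credentials
}
Keeping the credentials in environment variables, as we did, still works fine alongside the remote state backend.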
The resource block of type aws_vpc named vpc creates a Virtual Private Cloud, a logically isolated virtual network. When creating a VPC we must provide a range of IPv4 addresses. It's the primary CIDR block for the VPC and the only required parameter.
The parameters enable_dns_support and enable_dns_hostnames are required if we want to provision a publicly accessible database in our VPC (and we do).
Internet gateway
To allow communication between instances in our VPC and the internet, we need to create an Internet gateway.
resource "aws_internet_gateway" "internet_gateway" {
vpc_id = aws_vpc.vpc.id
}
The only required parameter is the id of the previously created VPC, obtained with aws_vpc.vpc.id. This is the Terraform way of reaching resource details: resource_type.resource_name.attribute.
Subnet
Within the VPC let’s add a public subnet:
resource "aws_subnet" "pub_subnet" {
vpc_id = aws_vpc.vpc.id
cidr_block = "10.1.0.0/22"
}
To create a subnet we need to provide the VPC id and a CIDR block that fits inside the VPC's address range. Additionally we can specify an availability zone, but it's not required.
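If we did want to pin the subnet to a zone, it is a single extra argument; a variant of the same resource (the zone name is only an example and must belong to the configured region):
resource "aws_subnet" "pub_subnet" {
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = "10.0.0.0/25"
  availability_zone = "eu-west-1a"  # optional; pins the subnet to one zone
}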
Route Table
A route table lets us set up rules that determine where network traffic from our subnets is directed. Let's create a new, custom one, just to show how it can be used and associated with a subnet.
resource "aws_route_table" "public" {
vpc_id = aws_vpc.vpc.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.internet_gateway.id
}
}
resource "aws_route_table_association" "route_table_association" {
subnet_id = aws_subnet.pub_subnet.id
route_table_id = aws_route_table.public.id
}
What we did is create a route table for our VPC that directs all traffic (0.0.0.0/0) to the internet gateway, and associate this route table with our public subnet. Every subnet in a VPC has to be associated with a route table.
Security Groups
Security groups work like firewalls for the instances (whereas a network ACL works like a firewall for the whole VPC). Because we allow all traffic from the internet to and from the VPC, we should set some rules to secure the instances themselves.
We will have two kinds of instances in our VPC, the EC2s of the cluster and the RDS MySQL instance, therefore we need to create two security groups.
resource "aws_security_group" "ecs_sg" {
vpc_id = aws_vpc.vpc.id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "rds_sg" {
vpc_id = aws_vpc.vpc.id
ingress {
protocol = "tcp"
from_port = 3306
to_port = 3306
cidr_blocks = ["0.0.0.0/0"]
security_groups = [aws_security_group.ecs_sg.id]
}
egress {
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
The first security group is for the EC2 instances that will live in the ECS cluster. Inbound traffic is narrowed to two ports: 22 for SSH and 443 for the HTTPS traffic needed to pull the Docker image from ECR.
The second security group is for RDS and opens just one port, 3306, the default port for MySQL. Inbound traffic is also allowed from the ECS security group, which means that the application living on the EC2 instances in the cluster will be able to reach MySQL.
Inbound traffic is otherwise allowed from anywhere on the internet (CIDR block 0.0.0.0/0). In a real-life case it should be restricted, for example to the IP ranges of a specific VPN.
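As a sketch, narrowing SSH down to a known address range would look like this (the CIDR below is a documentation placeholder; substitute your own office or VPN range):
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.0/24"]  # placeholder range; replace with your VPN/office CIDR
  }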
This ends the networking part of our architecture. Now it's time for the autoscaling group of EC2 instances for the ECS cluster.
Autoscaling Group
An autoscaling group is a collection of EC2 instances. The number of those instances is determined by scaling policies. We will create the autoscaling group using a launch configuration.
Before we launch container instances and register them into a cluster, we have to create an IAM role for those instances to use when they are launched:
data "aws_iam_policy_document" "ecs_agent" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ec2.amazonaws.com"]
}
}
}
resource "aws_iam_role" "ecs_agent" {
name = "ecs-agent"
assume_role_policy = data.aws_iam_policy_document.ecs_agent.json
}
resource "aws_iam_role_policy_attachment" "ecs_agent" {
role = "aws_iam_role.ecs_agent.name"
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
resource "aws_iam_instance_profile" "ecs_agent" {
name = "ecs-agent"
role = aws_iam_role.ecs_agent.name
}
Having the IAM role, we can create the launch configuration and the autoscaling group:
resource "aws_launch_configuration" "ecs_launch_config" {
image_id = "ami-094d4d00fd7462815"
iam_instance_profile = aws_iam_instance_profile.ecs_agent.name
security_groups = [aws_security_group.ecs_sg.id]
user_data = "#!/bin/bash\necho ECS_CLUSTER=my-cluster >> /etc/ecs/ecs.config"
instance_type = "t2.micro"
}
resource "aws_autoscaling_group" "failure_analysis_ecs_asg" {
name = "asg"
vpc_zone_identifier = [aws_subnet.pub_subnet.id]
launch_configuration = aws_launch_configuration.ecs_launch_config.name
desired_capacity = 2
min_size = 1
max_size = 10
health_check_grace_period = 300
health_check_type = "EC2"
}
If we want the instances to join a specific, named ECS cluster, we have to put that information into user_data; otherwise they will be launched into the default cluster.
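A possible variant, so that the cluster name cannot drift apart between the two resources, is to interpolate it from the aws_ecs_cluster resource defined later in this article (this is just a sketch of the same user_data line):
  user_data = "#!/bin/bash\necho ECS_CLUSTER=${aws_ecs_cluster.ecs_cluster.name} >> /etc/ecs/ecs.config"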
Basic scaling information is described by the aws_autoscaling_group parameters. A scaling policy still has to be attached to the group; a sketch of one is shown below.
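A target-tracking policy is the simplest option; a rough sketch (the resource name and the 60% CPU target are arbitrary choices) could look like this:
resource "aws_autoscaling_policy" "ecs_cpu_policy" {
  name                   = "ecs-cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.failure_analysis_ecs_asg.name
  policy_type            = "TargetTrackingScaling"

  # Keep average CPU across the group around 60% by adding or removing instances.
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60
  }
}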
With the autoscaling group set up, we are ready to launch our instances and the database.
Database Instance
Having prepared the subnet and the security group for RDS, there is one more thing to cover before launching the database instance. To provision a database we need to follow some rules:
- Our VPC has to have DNS hostnames and DNS resolution enabled (we did that while creating the VPC).
- Our VPC has to have a DB subnet group (that is about to happen).
- Our VPC has to have a security group that allows access to the DB instance.
Let’s create the missing piece:
resource "aws_db_subnet_group" "db_subnet_group" {
subnet_ids = [aws_subnet.pub_subnet.id]
}
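One caveat: AWS requires a DB subnet group to span at least two availability zones, and multi_az = true additionally needs a standby in a second zone, while so far we have a single subnet. A minimal sketch of a second public subnet that could be added to the group (the resource name, CIDR and zone below are assumptions; the zone just has to differ from the first subnet's, so pinning both subnets to explicit zones is the safe approach):
resource "aws_subnet" "pub_subnet_b" {
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = "10.0.0.128/25"  # the second half of the VPC's 10.0.0.0/24 range
  availability_zone = "eu-west-1b"     # any zone different from the first subnet's
}
Its id would then be added to subnet_ids above, and to the route table association and the autoscaling group's vpc_zone_identifier if the EC2 instances should use it as well.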
And the database instance itself:
resource "aws_db_instance" "mysql" {
identifier = "mysql"
allocated_storage = 5
backup_retention_period = 2
backup_window = "01:00-01:30"
maintenance_window = "sun:03:00-sun:03:30"
multi_az = true
engine = "mysql"
engine_version = "5.7"
instance_class = "db.t2.micro"
name = "worker_db"
username = "worker"
password = "worker"
port = "3306"
db_subnet_group_name = aws_db_subnet_group.db_subnet_group.id
vpc_security_group_ids = [aws_security_group.rds_sg.id, aws_security_group.ecs_sg.id]
skip_final_snapshot = true
final_snapshot_identifier = "worker-final"
publicly_accessible = true
}
All the parameters are more or less self-explanatory. If we want our database to be publicly accessible, we have to set the publicly_accessible parameter to true.
Elastic Container Service
ECS is a scalable container orchestration service that lets us run and scale Dockerized applications on AWS.
To launch such an application we need to pull its image from a repository. For that we will use ECR: we can push images there and use them when launching tasks on the EC2 instances within our cluster:
resource "aws_ecr_repository" "worker" {
name = "worker"
}
And the ECS cluster itself:
resource "aws_ecs_cluster" "ecs_cluster" {
name = "my-cluster"
}
The cluster name is important here, as we used it previously while defining the launch configuration. This is where the newly created EC2 instances will live.
To launch a Dockerized application we need to create a task: a set of simple instructions understood by the ECS cluster. The task is a JSON definition that can be kept in a separate file:
[
  {
    "essential": true,
    "memory": 512,
    "name": "worker",
    "cpu": 2,
    "image": "${REPOSITORY_URL}:latest",
    "environment": []
  }
]
In the JSON file we define which image will be used, via the REPOSITORY_URL template variable provided by a template_file data source, tagged with latest. 512 MB of RAM and 2 CPU units are enough to run the application on EC2.
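The template_file data source that renders this JSON could look like the following, assuming the JSON above is saved as task_definition.json next to the Terraform files (the file name is an assumption):
data "template_file" "task_definition_template" {
  template = file("task_definition.json")

  vars = {
    REPOSITORY_URL = aws_ecr_repository.worker.repository_url
  }
}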
Having this prepared, we can create the Terraform resource for the task definition:
resource "aws_ecs_task_definition" "task_definition" {
family = "worker"
container_definitions = data.template_file.task_definition_template.rendered
}
The family parameter is required; it represents the unique name of our task definition.
The last thing that binds the cluster and the task together is an ECS service. The service guarantees that the desired number of tasks is running at all times:
resource "aws_ecs_service" "worker" {
name = "worker"
cluster = aws_ecs_cluster.ecs_cluster.id
task_definition = aws_ecs_task_definition.task_definition.arn
desired_count = 2
}
This ends the Terraform description of the architecture.
There's just one more thing left to code. We need to output the provisioned components in order to use them in the worker application.
We need to know URLs for:
- ECR repository
- MySQL host
Terraform provides the output block for that. We can print to the console any attribute of any provisioned component.
output "mysql_endpoint" {
value = aws_db_instance.mysql.endpoint
}
output "ecr_repository_worker_endpoint" {
value = aws_ecr_repository.worker.repository_url
}
Applying the changes
First we need to initialize the working directory that contains the Terraform files by typing terraform init. This command installs the needed provider plugins and does a basic validation of the configuration.
Follow up with terraform plan to review what will be created.
Finding that you’re receiving an error?

You need to manually create theS3bucket through the aws console, making sure to edit terraform.tf
with the correct bucket name.
If everything is fine, we can run terraform apply to finally provision the desired infrastructure.