Terraform vs Cloudformation - a Pragmatic Comparison

Managing infrastructure in the cloud used to be like uploading your code to FTP servers in the bad old days before version control. Thankfully, infrastructure-as-code changes all that. Here we compare the two most popular tools, CloudFormation and Terraform, on their merits and drawbacks, taking a pragmatic view rather than staging a direct competition.

Why (Almost) All Comparisons Of CloudFormation and Terraform Are Unfair - and Wrong

At first glance, CloudFormation is just a bunch of JSON or YAML files, while Terraform is much more expressive, feature-rich, and easier to work with. But those who assume this makes Terraform the better choice have got the wrong end of the stick, and this post explains why.

Serverless Is the Future

Are containers the future? Everyone seems to think so. However, their development and popularity mirror those of another technology that came before, one that was eventually used to enable something even greater. Containers too will enable something greater - the serverless model.

Shrink the size of an AMI

I know what you're thinking. "I wish I could create my own AMIs (Amazon Machine Images). Not the traditional way - spawn an EC2 instance, make changes, then hit the EC2 API to use that instance to create a new AMI from it - but actually build a filesystem from scratch in an empty volume, and use that volume to create a bootable AMI."

I mean, who HASN'T thought that before?

On a serious note, there are a few reasons why the traditional process for creating AMIs may not be suitable:

  1. Shrinking AMI size - since the base AMIs provided by Amazon are 8 GB in size, the traditional process can only create AMIs of 8 GB or larger - you can increase the size of the new AMI, but not shrink it. The problem with large AMIs is that they take a long time to spawn volumes from, and those volumes take a long time to turn into a new AMI.
  2. Using a new OS or distribution - if you want to create an AMI for an operating system or Linux distribution that AWS does not provide a base AMI for, you have to build one from scratch.
  3. Fewer filesystem image "layers" - every AMI can be created from another one, and can itself be used to create other AMIs. However, Amazon does not store a brand new image for each one; instead it works out the differences between parent and child, and the child stores only the 'changes' from its parent. Eventually, if you want to work out the actual data in your AMI that will become a new volume, you have to trudge through all of its parents for the 'unchanged' bits that any AMI in the chain inherited and did not change. Creating a brand new AMI from an empty volume may be more efficient, but of course it will cost more, as you are no longer paying to store just the changes.

Shrinking an AMI might be a useful exercise for some, but has a few really annoying and obscure steps. One way to do it is described below.

Create a New EC2 Instance, Making Sure It Is Reachable By SSH

Create a new basic EC2 instance based on any Linux distribution you want (like Ubuntu). It will only be used to SSH into and copy files from one place to another.

Make ABSOLUTELY SURE you create it in a subnet in the first availability zone, the one ending in the letter 'a'. Do NOT leave the 'Subnet' option set to 'No preference'. Also make sure you check the 'Auto-Assign Public IP' box. All volumes you create will have to be in the same availability zone, so choosing the first one, ending in the letter 'a', makes sure of this.

Ensure you choose a suitable SSH key when asked, so that you can SSH to it. Also, create a new security group (or choose an existing security group) that allows you SSH access to it from your location (it is OK to use 0.0.0.0/0 as the source IP, as this instance will not be around for very long).
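
If you prefer the command line, a rough equivalent of this instance launch with the AWS CLI looks something like the sketch below. The AMI ID, subnet ID, security group ID, and key pair name are all placeholders; substitute your own, and make sure the subnet really is in the 'a' availability zone.

# Launch a throwaway instance in a subnet in the 'a' availability zone
# (ami-xxxxxxxx, subnet-aaaaaaaa, sg-xxxxxxxx and my-ssh-key are placeholders)
$ aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t2.micro \
    --key-name my-ssh-key \
    --security-group-ids sg-xxxxxxxx \
    --subnet-id subnet-aaaaaaaa \
    --associate-public-ip-address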

Create a New Volume From the AMI You Want To Shrink

We want to create a new volume from which to copy all our files, so we will create it from an existing AMI.
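
One way to do this (sketched here with the AWS CLI; the AMI ID and availability zone are placeholders) is to look up the snapshot behind the AMI's root device and create a volume from that snapshot, in the same availability zone as the instance:

# Find the snapshot that backs the AMI's root device
$ aws ec2 describe-images --image-ids ami-xxxxxxxx \
    --query 'Images[0].BlockDeviceMappings[0].Ebs.SnapshotId' --output text
snap-xxxxxxxx

# Create the 'source-volume' from that snapshot, in the same AZ as the instance
$ aws ec2 create-volume --snapshot-id snap-xxxxxxxx --availability-zone eu-west-1a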

Create a New Empty Volume That Will Become Your New Shrunken AMI
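
This is just a blank EBS volume, smaller than the original (4 GB here is only an example), in the same availability zone as the instance. A rough AWS CLI sketch, with placeholder values:

# Create the empty 'target-volume' - pick a size big enough for your files
$ aws ec2 create-volume --size 4 --volume-type gp2 --availability-zone eu-west-1a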

Copy Files From the Source Volume to the Target Volume Using the EC2 Instance

Attach the 'source-volume' to the instance from the Volumes page of the AWS console, setting Device to /dev/sdf. Repeat for the 'target-volume', but this time change Device to /dev/sdg.
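
The same attachment can be done with the AWS CLI; the volume and instance IDs below are placeholders:

# Attach the source and target volumes to the instance
$ aws ec2 attach-volume --volume-id vol-aaaaaaaa --instance-id i-xxxxxxxx --device /dev/sdf
$ aws ec2 attach-volume --volume-id vol-bbbbbbbb --instance-id i-xxxxxxxx --device /dev/sdg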

Then, SSH into your instance, and mount both volumes:

$ sudo mkdir -p /mnt/source /mnt/target
$ sudo mount /dev/xvdf1 /mnt/source
$ sudo parted -s -a optimal /dev/xvdg mktable msdos mkpart primary 0% 100% toggle 1 boot # Partition target volume
$ sudo mkfs.ext4 /dev/xvdg1 # Make filesystem on target volume
...
Creating filesystem with 524032 4k blocks and 131072 inodes
Filesystem UUID: 49c66eaa-2def-4689-bf18-4b8426ee6cb6
...
$ sudo e2label /dev/xvdg1 cloudimg-rootfs
$ sudo mount /dev/xvdg1 /mnt/target

BE SURE TO TAKE NOTE OF THE FILESYSTEM UUID OF THE NEW PARTITION ON THE TARGET VOLUME, as printed out by mkfs.ext4! It is very important and we will need it later.

Copy Files Across To The New Volume

Copy across all files from the source volume to the target volume. Obviously, the target volume should be of sufficient size to accommodate all the files! If not, you will have to create a new one that does have space.

$ sudo tar -C /mnt/source -c . | sudo tar -C /mnt/target -xv

Now, we need to change the filesystem UUID in the copied files to the new one. Find the old one by inspecting the old filesystem, and then use sed to change that old one to the new one in /boot/grub/grub.cfg:

$ sudo blkid -s UUID -o value /dev/xvdf1
567ab888-a3b5-43d4-a92a-f594e8653924

# Note we use the new UUID here, as output by mkfs.ext4 above
$ sudo sed -i -e 's/567ab888-a3b5-43d4-a92a-f594e8653924/49c66eaa-2def-4689-bf18-4b8426ee6cb6/g' /mnt/target/boot/grub/grub.cfg

Now we run 'grub-install' to install the boot loader on the volume:

$ sudo grub-install --root-directory=/mnt/target /dev/xvdg

Now we should remove the volumes from the instance. First we unmount them:

$ sudo umount /mnt/source /mnt/target

Then we detach them both from the instance, from the Volumes page of the AWS console.
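
If you are doing this from the command line instead, detaching looks like this (the volume IDs are placeholders):

# Detach both volumes from the instance
$ aws ec2 detach-volume --volume-id vol-aaaaaaaa
$ aws ec2 detach-volume --volume-id vol-bbbbbbbb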

Create a New AMI From The Smaller Volume

Create a snapshot from the target volume. Select it on the Volumes page, and choose Actions -> Create Snapshot. Enter 'smaller-ami' for Name, and leave Description blank. Click the snapshot ID link that now gets shown.
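
The CLI equivalent is a single command (the volume ID is a placeholder); snapshot creation is asynchronous, so you still have to wait for it to complete:

# Snapshot the target volume - this becomes the basis of the new AMI
$ aws ec2 create-snapshot --volume-id vol-bbbbbbbb --description "smaller-ami"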

Once the snapshot finishes creating (the spinner next to it stops spinning), select the snapshot, and choose Actions -> Create Image. Choose 'Hardware-assisted virtualization' for 'Virtualization type', and fill in a name of your choice (such as 'smaller-ami' - we can reuse the name of the snapshot). Leave all other options as the defaults, and click 'Create'. The AMI ID should be displayed; clicking it will take you to the AMI page showing its creation progress.
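
For reference, the same registration can be done with the AWS CLI. This is a sketch only - the snapshot ID is a placeholder, and the root device name assumed here is /dev/sda1:

# Register a new HVM AMI backed by the snapshot of the target volume
$ aws ec2 register-image \
    --name smaller-ami \
    --virtualization-type hvm \
    --root-device-name /dev/sda1 \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"SnapshotId":"snap-xxxxxxxx"}}]'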

Now, if you create a new instance using that AMI, it should boot up just like a normal instance, but with a smaller virtual disk attached to it by default!

Summary

These instructions are quite complex, covering a lot of ground from AWS volume and snapshot management to Linux partition management, and frankly they are tricky enough to carry out that it is best to automate them. But sometimes there really is a need to shrink volumes, and hopefully you will find them helpful if you come across that need.

If you were following along to test the process out, please remember to delete all instances, AMIs, volumes, and snapshots created, as otherwise Amazon will continue to bill you.
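
A hedged sketch of that clean-up with the AWS CLI - all IDs are placeholders for the resources you actually created:

# Tidy up everything created during this exercise
$ aws ec2 terminate-instances --instance-ids i-xxxxxxxx
$ aws ec2 deregister-image --image-id ami-yyyyyyyy        # only if you no longer need the new AMI
$ aws ec2 delete-snapshot --snapshot-id snap-xxxxxxxx     # the snapshot behind it must be deleted separately
$ aws ec2 delete-volume --volume-id vol-aaaaaaaa
$ aws ec2 delete-volume --volume-id vol-bbbbbbbb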

Using Ansible to spawn a VM and then perform tasks on it

When working with VM instances in the cloud, it is often necessary to spawn one and provision it with certain packages. It would be very cumbersome to expect an engineer to do this manually every time.

Ansible provides modules for provisioning VM instances with various cloud service providers, as well as modules for performing tasks on a server. So the order of play starts to become clear:

  1. Provision a server
  2. SSH into it
  3. Perform provisioning operations

Server? What server?!

Now it is easy to do 2) and 3) with Ansible - almost every example you will see is based on that. So one would assume that a playbook that looks something like this would be sufficient:

- hosts: localhost
  tasks:
    - name: "Spawn a server"
      ec2:
        instance_type: t2.micro
        image: "ami-8b8c57f8"
        region: eu-west-1
        wait: yes
        state: present
        instance_tags:
          Name: test_server

- hosts: tag_Name_test_server
  become: true
  tasks:
    - name: "Install postgresql"
      yum:
        name: postgresql
        state: present
        update_cache: true

But there is a problem when we bring spawning a server into the mix: the inventory. Normally, the inventory - the list of servers and metadata about them, like IP addresses - is loaded when Ansible first runs. (See the Ansible documentation for more details about Ansible's inventory.) This means that when we come to connecting to the server (the task that installs postgresql), Ansible will not know about it, since it loaded its inventory before the server even existed (i.e. at the beginning of the playbook).

The solution is the following task:

- meta: refresh_inventory

This causes Ansible to reload its inventory, so from that point onwards its inventory will contain the state of any new servers that were created since the playbook began.

I said, what server?!

Now, unfortunately, adding the meta: refresh_inventory task between the spawning and provisioning tasks is not all that is required. That is because it takes a while for a cloud instance to boot up after it is spawned, but Ansible will try to connect to it straight away and fail, because the SSH server has not started yet. The wait_for module to the rescue.

By adding a task like the following, Ansible will (as the name suggests) wait for a connection on the given port. The search_regex parameter causes it to wait not just until the port is open, but until the server responds on that port with a string matching the regex, which is what we need in this case:

- hosts: tag_Name_test_server
  become: false
  gather_facts: no
  tasks:
    - name: "Wait for server port 22 to appear"
      local_action:
        module: wait_for
        host: "{{ inventory_hostname }}"
        port: 22
        delay: 15
        timeout: 120
        search_regex: "OpenSSH"

Note the following:

local_action is used while running the task against the host so that we can look up the hostname of the host.

become (previously known in Ansible as 'sudo') is set to false as, since we are performing the task on the local host, we do not need/want to try to sudo to the root user.

gather_facts is set to no - gathering facts tells Ansible to connect to the host and look up some metadata like IP addresses and number of CPUs, which would completely defeat the purpose of what we are trying to do in the first place.

Here is the entire playbook. We are also adding a predefined SSH key and creating a security group for it so we can SSH to the host.

- hosts: localhost
  tasks:
    - name: "Create security group to allow SSH access"
      ec2_group:
        name: "test-server"
        description: "test server firewall"
        region: eu-west-1
        state: present
        rules:
          - proto: tcp
            from_port: 22
            to_port: 22
            cidr_ip: 0.0.0.0/0
        rules_egress:
          - proto: all
            cidr_ip: 0.0.0.0/0

    - name: "Spawn a server"
      ec2:
        instance_type: t2.micro
        image: "ami-8b8c57f8"
        groups: "test-server"
        region: eu-west-1
        wait: yes
        key_name: "My SSH key"
        state: present
        instance_tags:
          Name: test_server

    - name: "Refresh inventory"
      meta: refresh_inventory

- hosts: tag_Name_test_server
  become: false
  gather_facts: no
  tasks:
    - name: "Wait for server port 22 to appear"
      local_action:
        module: wait_for
        host: "{{ inventory_hostname }}"
        port: 22
        delay: 15
        timeout: 120
        search_regex: "OpenSSH"

- hosts: tag_Name_test_server
  become: true
  user: ec2-user
  tasks:
    - name: "Install postgresql"
      yum:
        name: postgresql
        state: present
        update_cache: true
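
To actually run this, the playbook needs a dynamic inventory so that the tag_Name_test_server group exists; in older Ansible versions this was typically the ec2.py dynamic inventory script. A hedged example invocation - the playbook filename is just an example, and it assumes ec2.py (and its ec2.ini configuration) is present, executable, and set up for your region, with AWS credentials available in the environment:

# Run the playbook with the EC2 dynamic inventory script so that
# tag_Name_test_server resolves to the instance we just spawned
$ ansible-playbook -i ec2.py spawn_and_provision.yml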