Terraform vs CloudFormation - a Pragmatic Comparison

If I ever see anyone managing their infrastructure manually, instead of using an infrastructure-as-code tool, I lose it. It is like pushing your code changes to a shared FTP server. Good luck reverting any changes if you mess up, or deploying an identical infrastructure in a separate environment for some realistic testing.

The infrastructure-as-code tools, CloudFormation and Terraform, have existed for a couple of years now, and are thankfully gaining enough traction to make the above scenario rarer and rarer.

But which is better? There is only one way to find out...

No. This is not a competition! All tools have their own pros and cons, and which tool you choose must take these into account based on your unique situation.

Also, I will not be looking at plain CloudFormation. I will be looking at the CloudFormation tools that run it for you, and let you define resources using a proper programming language, as described in the article Why (Almost) All Comparisons Of CloudFormation and Terraform Are Unfair - and Wrong.

The two CloudFormation tools I recommend are StackMaster, based on Ruby, and Stacker, based on Python. There are others, including Go and node.js ones, but I have not looked at those so cannot comment on them. Please leave a comment telling me how they work if you have used them!

Differences

Cloud Provider Support

The first and obvious difference is cloud provider support. Not using AWS? Well, there is only one real choice - Terraform.

If you are using AWS, you get several benefits from using CloudFormation:

  1. If you have a support plan, it covers CloudFormation already. Terraform has an "enterprise" version that also provides support, but costs extra.
  2. AWS services spawn like rabbits. If you want to use a new one, you have to wait for either CloudFormation or Terraform to support it (not QUITE true - see Custom Resources below). CloudFormation will probably support it earlier - possibly even at launch - as both come from the same company.
  3. Close integration into the rest of their services. For example, CodePipeline - the AWS continuous integration and delivery tool - has built-in support for deploying CloudFormation changesets (their version of a "diff", or a Terraform plan), and CloudTrail, their auditing tool, tracks CloudFormation usage.
    Obviously Terraform is not so well integrated, and is simply another tool that runs on your cloud platform. You have to do all the work to integrate it into the rest of the services.

Managed Service vs Command-line Tool

CloudFormation is a "Software-as-a-service" (SaaS). The CLI tool invokes it over an API and outputs the result. Since AWS take care of applying your changes for you, you never have to worry about more than one person applying changes at the same time, or the machine running it losing power or internet connectivity. Therefore it is MUCH safer for collaboration. As noted above, technical support for it is included in your AWS support plan.

Terraform is just a command-line tool. You could create your own service to run Terraform, but that is extra work. If you subscribe to Terraform Enterprise, you get access to a SaaS that will run Terraform for you over an API just like CloudFormation, but that costs extra (then again, so does the operational complexity of running your own service!).

Starting a New Project and General Usage

It has to be said - Terraform is a lot easier to get going with, and is a much more friendly tool out of the box. You need a single file with a few lines in it, and you can start putting your infrastructure in code. This is probably one of the reasons for its popularity.

Now, the AWS CLI CloudFormation tool cannot differentiate between "create" and "update": you have to know whether the stack already exists and call the matching command. This would normally be a real annoyance. However, as previously noted, we are comparing tools like StackMaster or Stacker, not the AWS CLI tool, with Terraform.
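To illustrate the kind of decision a wrapper tool makes for you, here is a minimal sketch of "create or update" logic. It is not taken from StackMaster or Stacker. The `client` argument is assumed to behave like `boto3.client("cloudformation")`, and detecting a missing stack by looking for "does not exist" in the error message is an assumption of this sketch rather than a documented contract.

```python
def create_or_update(client, stack_name, template_body):
    """Create the stack if it is missing, otherwise update it.

    `client` is assumed to expose describe_stacks, create_stack and
    update_stack with boto3-style keyword arguments.
    """
    try:
        client.describe_stacks(StackName=stack_name)
        exists = True
    except Exception as error:
        # Assumption: a missing stack is reported with "does not exist"
        # in the error message; any other error is re-raised untouched.
        if "does not exist" not in str(error):
            raise
        exists = False

    if exists:
        client.update_stack(StackName=stack_name, TemplateBody=template_body)
        return "update"

    client.create_stack(StackName=stack_name, TemplateBody=template_body)
    return "create"
```

Because the function only relies on three method names, it can be exercised against a small fake client before pointing it at a real account.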

None of the CloudFormation tools are particularly easy to get going with from scratch. There is a lot of setting up to do - firstly by learning how they work, then creating your first stack template, then using that to define your stack, then setting the S3 bucket to apply your templates from - before you have your first resource.

That said, once you create your first project, running it becomes routine and no more difficult than Terraform. Also, after you have your first project, you simply have to clone it to get over the starting hurdle.

Proper Programming Language vs A Set of Fancy Configuration Files

I have spoken at length about the benefits of considering CloudFormation from one of the many tools that wrap it in a proper programming language. See Why (Almost) All Comparisons Of CloudFormation and Terraform Are Unfair - and Wrong.

To briefly reiterate - when you have the full power of a programming language behind you, you can create abstractions which are much more flexible and elegant than you could with Terraform's rather crude constructs. Terraform code is just a set of configuration files, after all.

Of course, there is an argument that infrastructure-as-code should be as simple as possible, and this extra power gives you the ability to shoot yourself in the foot - but too much repetition due to the tool of choice not giving you the ability to abstract out similar constructs is also a huge source of bugs and maintenance headaches.
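As a rough illustration of that kind of abstraction, the sketch below generates a CloudFormation template from plain Python dictionaries, producing one subnet per availability zone from a single function. It uses neither StackMaster nor Stacker; the function name, zone names and CIDR ranges are made up for the example.

```python
import ipaddress
import json


def subnet_resources(vpc_cidr, zones, vpc_ref="Vpc"):
    """Return one AWS::EC2::Subnet resource per availability zone."""
    network = ipaddress.ip_network(vpc_cidr)
    # Split the VPC range into /24 blocks and hand one to each zone.
    blocks = network.subnets(new_prefix=24)
    resources = {}
    for index, (zone, block) in enumerate(zip(zones, blocks)):
        resources["Subnet%d" % index] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": vpc_ref},
                "CidrBlock": str(block),
                "AvailabilityZone": zone,
            },
        }
    return resources


# Hypothetical zones and ranges, chosen only for illustration.
zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
resources = {
    "Vpc": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}},
}
resources.update(subnet_resources("10.0.0.0/16", zones))
template = {"Resources": resources}

print(json.dumps(template, indent=2))
```

Adding a fourth zone is a one-word change to the list, where a hand-written template would need another copied block.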

State Management Flexibility - Or Lack Thereof

Terraform lets you import existing resources into its state. It lets you rename resources, or move them between states. It lets you take things out of state, so it no longer knows about them. It lets you push state from one place to another, e.g. from local disk to S3.

CloudFormation has no such flexibility. The inability to import resources, in particular, is extremely annoying in real-world usage, where you often have to slowly integrate your infrastructure-as-code tool into your environment, and it must live alongside existing resources that it does not manage.

You may consider Terraform just because you have a bunch of existing resources you need to start managing and cannot afford the downtime of recreating them in an infrastructure-as-code tool.

Custom Resources

This is such a powerful feature that for any large project, I consider it a must.

Custom resources in CloudFormation allow stacks to call Lambda functions with a 'create', 'update' or 'delete' flag when the resource is created, updated or deleted. Parameters you pass to the custom resource in your CloudFormation template are passed to the Lambda. The Lambda's response is then returned to CloudFormation.
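A minimal sketch of such a Lambda function follows. The event fields (`RequestType`, `ResponseURL`, `StackId`, `RequestId`, `LogicalResourceId`, `ResourceProperties`) and the response fields follow the custom resource request and response format as I understand it. The `lookup_image` helper, the `Channel` property and the placeholder image ID it returns are hypothetical, and the `send` parameter exists only so the handler can be tried without a network call.

```python
import json
import urllib.request


def build_response(event, status, data=None, reason=None):
    """Build the JSON body sent back to CloudFormation."""
    body = {
        "Status": status,  # "SUCCESS" or "FAILED"
        "PhysicalResourceId": event["LogicalResourceId"],
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }
    if reason:
        body["Reason"] = reason
    return body


def lookup_image(properties):
    # Hypothetical lookup returning a placeholder, not a real image ID.
    return {"ImageId": "ami-for-" + properties.get("Channel", "stable")}


def send_response(url, body):
    """PUT the response body to the URL supplied in the event."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(url, data=payload, method="PUT")
    request.add_header("Content-Type", "")
    urllib.request.urlopen(request)


def handler(event, context, send=send_response):
    """Lambda entry point: act on the request type, then report back."""
    try:
        if event["RequestType"] in ("Create", "Update"):
            data = lookup_image(event.get("ResourceProperties", {}))
        else:  # "Delete": nothing to clean up in this sketch
            data = {}
        body = build_response(event, "SUCCESS", data)
    except Exception as error:
        body = build_response(event, "FAILED", reason=str(error))
    send(event["ResponseURL"], body)
    return body
```

Whatever ends up under `Data` is what the template can then read back from the custom resource, which is how a run-time lookup feeds into the rest of the stack.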

This allows you to do some pretty funky things, like dynamic lookups of machine or container images at run-time, supporting resources or features that have no native support in CloudFormation yet, building an abstraction around a group of resources that you wish to manage as a single unit, doing advanced zero-downtime or blue/green deployments, or even managing resources outside of AWS itself, like in another cloud.

Terraform has nothing of the sort - you are stuck with what HashiCorp decide to make it support. It is usually enough, but when you need a custom resource or some additional programmability, you really miss it.

Rollbacks and Zero Downtime Deployments

CloudFormation tries very hard to never leave your stacks in a broken state. This is especially true for clusters of servers or containers - a deployment will usually involve bringing up a new set of backend servers or containers, waiting for their health-checks to pass so they start handling all future requests, and then waiting for the old servers or containers to finish handling any existing requests. Only then are the old ones deleted.

If anything should go wrong in this process, like new backends not starting or not responding to health-checks in a timely manner, then the old servers or containers are kept where they are and the new ones are deleted. This rolling release concept is ridiculously powerful for keeping things running without downtime.
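The decision described above can be reduced to a toy model. This is only a sketch of the idea, not how CloudFormation implements it: the function name, the backend names and the `is_healthy` callback are all invented for illustration, and real deployments also involve timeouts and connection draining.

```python
def rolling_deploy(old_backends, new_backends, is_healthy):
    """Return the backends left serving traffic, plus the outcome.

    The new backends replace the old ones only if every one of them
    passes its health check; otherwise the old set is kept untouched.
    """
    if new_backends and all(is_healthy(backend) for backend in new_backends):
        # Healthy: traffic moves to the new set and the old set is retired.
        return list(new_backends), "deployed"
    # Unhealthy: discard the new set and keep serving from the old one.
    return list(old_backends), "rolled back"


# One new backend fails its check, so the old set stays in service.
serving, outcome = rolling_deploy(
    ["old-1", "old-2"],
    ["new-1", "new-2"],
    is_healthy=lambda backend: backend != "new-2",
)
print(serving, outcome)
```

The point of the model is that the old set is never touched until the new set has proven itself, which is what makes the failure case a non-event for users.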

Terraform believes in leaving things in a broken state if something should go wrong, and not rolling back even if it can. This is an ideological issue, as they think it is the right thing to do - infrastructure-as-code should be simple, they will claim, and more complexity should have you break out into a programming language manipulating your resources over an API. Yet they have no support for extensibility, like using custom resources or plugins.

It also does not do any health-checking of new backends before replacing the old ones, so you will not know if your deploy actually succeeded or not without doing additional checking and rolling back yourself.

Tooling Ecosystem

CloudFormation tools, since they all interact with the CloudFormation service, can inter-operate with each other. You could use AWS SAM or "The Serverless Project" for your serverless functions. You could use StackMaster, Stacker, or plain CloudFormation for your infrastructure. You could use any combination of the above, and it could all be roped together with basic CloudFormation concepts like stacks, changesets, and exports. This makes it much easier to use a specialized tool for a particular job or to migrate from one tool to another.
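Exports are what make this mixing possible: one stack publishes a value under a name, and any other stack, whichever tool produced it, can read it with `Fn::ImportValue`. The sketch below shows two templates as Python dictionaries, plus a small check that every import has a matching export. The stack contents, the export name `network-VpcId` and the helper functions are assumptions made up for this example.

```python
network_stack = {
    "Resources": {
        "Vpc": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}},
    },
    "Outputs": {
        # Publish the VPC ID under a name other stacks can import.
        "VpcId": {"Value": {"Ref": "Vpc"}, "Export": {"Name": "network-VpcId"}},
    },
}

app_stack = {
    "Resources": {
        "AppSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Application servers",
                "VpcId": {"Fn::ImportValue": "network-VpcId"},
            },
        },
    },
}


def exported_names(template):
    """Collect the export names declared in a template's Outputs."""
    outputs = template.get("Outputs", {}).values()
    return {out["Export"]["Name"] for out in outputs if "Export" in out}


def imported_names(node):
    """Recursively collect every name used with Fn::ImportValue."""
    names = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Fn::ImportValue" and isinstance(value, str):
                names.add(value)
            else:
                names |= imported_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= imported_names(item)
    return names


# An empty set means every import is satisfied by an export.
missing = imported_names(app_stack) - exported_names(network_stack)
print(missing)
```

Neither template cares how the other was written, which is why the network stack could come from one tool and the application stack from another.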

If using Terraform, you use Terraform. Or you bypass it by interacting with the CLI or API directly.

Factors For Making a Decision

When making a decision, several factors need to be considered:

The AWS Factor

If you are not using AWS, then your decision is easy - Terraform. If you are, then you must consider the other differences.

Little or Big

For a small project or a small team, it probably does not matter too much. But most benefits of CloudFormation tools really kick in when a project gets larger in size and scope. I would choose StackMaster or Stacker for any large project just for their ability to create abstractions using proper programming language constructs and custom resources.

Existing Infrastructure-As-Code

If there is substantial infrastructure-as-code already developed in a particular tool, then it is also a no-brainer. Use what the project already uses. Both tools are great - far better than doing it any other way. Again, in this case, the decision may already have been made for you.

Existing Resources

Similarly, if there are any existing resources that need to be imported, the decision is made for you as well. Only Terraform lets you import existing resources and manage them afterwards. Of course, CloudFormation users could simply delete them and re-create them as CloudFormation resources if they wanted to.

Existing Team Expertise

Perhaps it is best to stick to what a team already knows. However, decent engineers should not be afraid to move to other tools that are similar in nature to what they already know, so I do not believe that this is too important a point.

Conclusion

As I have already said, the decision on which to use must be based on a number of factors. However, it seems to boil down to this: if you will benefit from the more elegant abstractions that a CloudFormation tool will provide, use one of them. If you have existing resources to import, or work outside of AWS, use Terraform.

Due to this, and also because both tools are very good at their jobs, it is not helpful to declare a "winner". Any use of infrastructure-as-code is a win, and use of either tool will bring much productivity to a team. However, I hope I have provided a more pragmatic comparison of the two tools to help you choose.

Please leave a comment below letting us know if there is anything you feel I missed, or which tool you decided to go with.