What Is a Canary Deployment?

You have a new feature ready to release to your customers. Ideally, you want the experience to be frictionless and without downtime. There are a few ways to achieve this, and a canary deployment is one of them. 

A canary deployment is a controlled rollout strategy: you push changes to a small subset of users first, then release them to the rest of the user base once you're confident they work. It's one of the more sophisticated deployment strategies organizations can adopt to reduce risk when releasing new software and maintain a better experience for end users. This pattern allows you to:

  • Limit or even avoid downtime altogether
  • Gather feedback on how the new code behaves in your production environment
  • Roll back easily in the event of bugs or performance degradation
  • Release with confidence

Why canaries? The term is inspired by the canaries that once served as an early warning for carbon monoxide and other toxic gases in coal mines. The implications aren’t quite as grim for canary deployments: the small group you roll changes out to first serves as your “canary” for identifying risks and errors in production (and that’s the worst of it).

Before we get deeper into the details of how canary deployments work, let’s back up a bit and look at the other deployment and release strategies employed by software companies. 

Four common deployment strategies

Big bang deployments

Traditionally, software development teams might work for months on a new feature or piece of functionality, culminating in a big release event when it was rolled out to everyone at once with some fanfare. Big bang releases aren’t necessarily a byproduct of Waterfall development, but the two are commonly linked. The challenge with big bangs is that there’s a lot of pressure to meet the deadline, and infrequent releases create pressure to shoehorn more code into the upcoming release, because there won’t be another opportunity for some time. These aren’t ideal conditions for shipping stable, well-tested code.

Rolling deployments

In this model you maintain one production environment, consisting of multiple servers (or cloud instances), with a load balancer usually routing traffic between them. Each server has a copy of the application, and your IT operators update them in a staggered fashion, targeting specific instances for the new release first. Some users will be exposed to the new version of the app, while others will interact with the existing one. This allows you to observe how the new version performs and gather feedback before deploying to the rest of the servers. In the event of an error, operators can reroute traffic to the servers running the working version of your application until the issue is resolved. Rolling deployments are relatively simple to implement and to roll back if necessary, but they can take a long time, and you need to support both the old and new versions throughout.
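
As a rough sketch of the mechanics, assuming hypothetical `deploy_to`, `health_check`, and `roll_back` helpers in place of your real deployment tooling, a staggered rollout might look something like this in Python:

```python
import time

SERVERS = ["app-1", "app-2", "app-3", "app-4"]

# Hypothetical stand-ins for your real deployment and monitoring tooling.
def deploy_to(server: str, version: str) -> None:
    print(f"updating {server} to {version}")

def health_check(server: str) -> bool:
    return True  # in reality: query your monitoring system

def roll_back(servers: list[str]) -> None:
    print(f"rolling back {servers} and rerouting traffic to healthy servers")

def rolling_deploy(version: str, batch_size: int = 1, soak_seconds: int = 300) -> None:
    """Update servers in small batches, pausing to watch each batch before continuing."""
    for i in range(0, len(SERVERS), batch_size):
        batch = SERVERS[i:i + batch_size]
        for server in batch:
            deploy_to(server, version)
        time.sleep(soak_seconds)  # let real traffic hit the updated batch
        if not all(health_check(s) for s in batch):
            roll_back(batch)
            raise RuntimeError(f"Rollout of {version} halted at {batch}")

rolling_deploy("v2.1.0", soak_seconds=1)  # short soak time just for the example
```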

Blue green deployments

Blue green deployments are a relatively safe way to update applications in production with zero downtime. Unlike rolling deployments with their single production environment, here two identical production environments sit behind a load balancer. The “blue” environment is live and receives user traffic. The “green” environment receives constant updates from your continuous integration server, allowing you to test the new behavior in production without exposing it to users. When you’ve validated that the green environment is stable and everything works, you route users over to it instead. You can roll back to the blue environment if you run into any issues. Blue green deployments are safer and less stressful, but there is a cost to maintaining two production environments, and the more complex your infrastructure, the harder it can be to manage your routing.
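
The heart of the pattern is a single routing flip once the idle environment checks out. Here is a minimal Python sketch, where the environment details and the `smoke_test` helper are assumptions rather than any particular tool's API:

```python
# Two identical environments; "live" records which one currently receives user traffic.
environments = {
    "blue": {"version": "v1.4.2", "url": "https://blue.internal.example.com"},
    "green": {"version": "v1.5.0", "url": "https://green.internal.example.com"},
}
live = "blue"

def smoke_test(env: dict) -> bool:
    """Stand-in for whatever validation you run against the idle environment."""
    return True

def promote(candidate: str) -> str:
    """Point user traffic at the candidate environment once it passes validation."""
    global live
    if not smoke_test(environments[candidate]):
        raise RuntimeError(f"{candidate} failed validation; {live} stays live")
    previous, live = live, candidate
    print(f"traffic now routed to {candidate}; {previous} kept warm for rollback")
    return previous

promote("green")  # green goes live; blue remains available for instant rollback
```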

Canary deployments

In a canary deployment, your operations or DevOps team rolls out the new version of your application to a specific group or percentage of users before gradually releasing it to everyone. Canary deployments are sometimes also referred to as phased or incremental rollouts. While rolling deployments deploy the new version to specific servers, canary deployments target a subset of users to receive the update first. Canary releases can be thought of as a modern take on beta testing: you might target a group of users who have opted in to trying new functionality, with the understanding that the experience might be buggy at first. Some organizations roll out the canary version to their employees to test and validate first, before migrating all customers to the new version.

There are other controlled rollout strategies, such as percentage rollouts and ring deployments. What they all have in common is deploying to a subset of users to test the new release first; the main difference between these strategies is how that group is selected or what the users in that group have in common. Some companies roll out region by region, or a premium feature deployment may be targeted at premium customers only.  
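
Whichever way the group is chosen, the selection usually needs to be consistent, so the same user keeps seeing the same version across requests. Here's a minimal Python sketch of deterministic percentage bucketing; the 10% threshold and function names are illustrative, not taken from any particular tool:

```python
import hashlib

CANARY_PERCENT = 10  # start small, then ramp up as confidence grows

def in_canary(user_id: str, percent: int = CANARY_PERCENT) -> bool:
    """Deterministically bucket a user: the same user_id always gets the same answer."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # map the hash to a bucket from 0-99
    return bucket < percent

def choose_version(user_id: str) -> str:
    """Route a request based on the cohort the user falls into."""
    return "canary" if in_canary(user_id) else "stable"

print(choose_version("user-42"))  # stable or canary, but always the same for user-42
```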

Canary deployments vs A/B testing

Canary deployments bear some resemblance to A/B testing in that they both involve releasing something new to a select group. They serve different use cases though: while canary releases are used to monitor the performance of new code in production and detect problems, A/B testing helps companies understand users’ reactions to a change (i.e. does the new feature meet their expectations?).

Benefits of canary deployment

With canary deployments:

  • You’re testing in production. As sophisticated as your local or staging environments are, they can’t fully replicate the behavior of your new feature in production. Canary deployments offer a safe way to test in production while minimizing impact to users. 
  • You can get early feedback from real users on performance, bugs, and user experience before exposing all your users to the new version. In some cases you may choose to iterate on the new version based on user feedback before releasing to the rest of your user base. 
  • You can also roll back more quickly in the event of a significant error, since you’ve limited the blast radius to a small group. 
  • If you’re using feature flags (which we’ll explore more below), the infrastructure requirements to implement this model are minimal.
  • These advantages enable you to deploy with more confidence and less stress because the stakes are lower.
  • Rolling out a new feature to a small number of users can also generate demand and excitement from the rest of your users ahead of release, if members of the early-access group like the new feature and promote it to their networks.

Drawbacks of canary deployment

  • Two environments to maintain: This won’t be a big leap if you’ve been doing blue green deployments, but otherwise you’re doubling your production environments, which increases cost and complexity. You can keep infrastructure costs lower by maintaining a smaller version of production for your canaries, since only a small percentage of users will be routed to them.
  • Technical challenges: Routing a subset of users to the new version can be complex to configure. It can be made easier with feature flags (more on these below), but these need to be managed carefully, especially at scale. 
  • Support for multiple versions: As with rolling deployments, this strategy requires support for both the old version and new version of your application while you’re rolling out. 
  • You can’t always control upgrade timing: If your application is installed on your users’ computers or other devices, you won’t necessarily know or be able to control when the change to the new version takes place.
  • Delayed failures: Some errors don’t make themselves known on a user’s first exposure to the new version, so something might look good in the canary deployment and still fail when it’s rolled out to everyone. Daniel Reynaud writes about this and other challenges in “Canaries in Practice,” which dives into more technical detail on implementing canary deployments and what to be aware of.

How to tell if you’re ready for a canary deployment strategy

There’s no one-size-fits-all deployment process, so here are some considerations to take into account when deciding what’s right for your organization.

Resources

Some deployment patterns require more resources than others. Any model that involves more than one instance of your production environment and backend is going to be more expensive to maintain.

Architecture

The complexity of your application’s architecture also has an impact on how difficult and resource intensive it is to replicate production. Microservices architectures (for all their other benefits) are particularly challenging to replicate.

Scale

If your user base is small, rolling out to a small percentage of users may not give you a sufficient sample size to determine the success of the canary (and the cost of implementing canary deployments probably isn’t worth the return).

Confidence

If you’re confident in the new version and the changes you’re shipping aren’t risky, the overhead of setting up canary deployments may outweigh any potential benefits.

Your current deployment strategy

If you’re already maintaining a second instance of production and evaluating metrics in a blue green deployment style, you may find it relatively simple to shift to canary deployment. It’s a much bigger jump from big bang deployments. You might want to consider an iterative approach, where you start introducing elements that support canary deployments and work your way up to them. 

We’ll explore some of those elements below.

Before you get started with canary deployment

If you’ve weighed up your options and have decided canary deployment is right for you, there are a few things to make sure you have in place before you get going.

Metrics and monitoring

How will you know if the canary succeeded? You could wait for customer feedback to trickle (or snowball) in, but it’s better to proactively monitor your application and validate the canary version’s performance metrics to decide whether to continue rolling out or roll back. This approach is sometimes called “canary analysis.” First, decide which application health metrics you’ll pay attention to, whether that’s response times, error rates, CPU load, or business metrics such as session length or conversions. Your baseline comes from the live, known-good version of your application; then you’ll need to decide the thresholds for deviations that trigger a rollback. Of course, you also need to make sure you are actually tracking those metrics and have visibility into them. Prometheus and Grafana are a popular combination for collecting metrics and presenting them in a dashboard.
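
To make that comparison step concrete, here's a rough Python sketch of a canary-analysis check. The metric names, sample values, and thresholds are invented for the example; in a real setup these numbers would come from your monitoring stack (for example, from Prometheus queries):

```python
# Baseline numbers come from the known-good version; canary numbers from the new one.
baseline = {"error_rate": 0.004, "p95_latency_ms": 180}
canary = {"error_rate": 0.005, "p95_latency_ms": 195}

# Maximum acceptable deviation from the baseline, expressed as a ratio per metric.
thresholds = {"error_rate": 1.5, "p95_latency_ms": 1.2}

def evaluate_canary(baseline: dict, canary: dict, thresholds: dict) -> str:
    """Return 'promote' if every metric stays within its threshold, else 'rollback'."""
    for metric, max_ratio in thresholds.items():
        if canary[metric] > baseline[metric] * max_ratio:
            print(f"{metric}: {canary[metric]} exceeds {baseline[metric]} x {max_ratio}")
            return "rollback"
    return "promote"

print(evaluate_canary(baseline, canary, thresholds))  # promote
```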

Continuous integration

You will need a reliable deployment pipeline to run canary deployments, which means having continuous integration and continuous deployment (CI/CD) in place. This ensures you’re ready to roll back swiftly at the first sign of degraded performance or errors in your canary. Your source code management system may have built-in CI/CD (such as GitHub Actions or GitLab CI/CD), or you can integrate an external tool (such as CircleCI or Jenkins).

A load balancer

A load balancer is a networking component that distributes traffic across your servers. In a canary deployment, this is the piece that decides which traffic goes to your canary and which goes to the current version of your application. Load balancers come in both hardware and software forms, and you may already be using one or more to ensure even distribution of traffic to your servers. You may be familiar with NGINX, a popular web server and reverse proxy that is widely used as a load balancer; AWS offers its cloud-based Elastic Load Balancing service.
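
Under the hood, the routing decision is a weighted split. NGINX and cloud load balancers express this in their own configuration, but the logic looks roughly like this Python sketch (the 5% weight and upstream addresses are made-up examples):

```python
import random

UPSTREAMS = {
    "stable": "http://app-stable.internal:8080",
    "canary": "http://app-canary.internal:8080",
}
CANARY_WEIGHT = 0.05  # send roughly 5% of traffic to the canary

def pick_upstream() -> str:
    """Randomly assign each request, in proportion to the configured weight."""
    return UPSTREAMS["canary"] if random.random() < CANARY_WEIGHT else UPSTREAMS["stable"]

# Sanity check: out of 10,000 simulated requests, roughly 500 should hit the canary.
hits = sum(pick_upstream() == UPSTREAMS["canary"] for _ in range(10_000))
print(f"canary received {hits} of 10000 requests")
```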

Kubernetes

Kubernetes, the container orchestration platform, enables more than just canary deployments; it can be used for blue green and rolling deployments too. Kubernetes isn’t the only option for staging rollouts (see load balancers above), but it’s commonly used together with Docker for managing deployments of containerized applications.

The next level: Feature flags for canary deployments

Feature flags (or feature toggles as they’re sometimes called) are used in software development and delivery to enable or disable functionality without deploying new code. 

“Imagine you have a feature flag tool that handles your canary deployments for you. So in other words you just do a standard rolling deployment, but all of your new features are turned off completely. Then on the feature level you can say ‘Alright, this risky feature, I’m going to turn it on for 1 percent.’ So, all your servers have been upgraded, but the feature that you’re testing has not been.”— Jonathan Hall, DevOps advocate, Adventures in DevOps

Using feature flags together with your deployments enables you to decouple the deployment of your code from the release of a new feature. It might sound more complex, because you effectively have two steps to manage: deploying the code, and then releasing the feature. However, this strategy allows for even more granular control over who has access to what, and it can help you avoid hasty and stressful rollbacks, since a problematic new feature can be disabled quickly by simply flipping a switch.
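
As a rough illustration of that decoupling, here's a minimal Python sketch of a percentage-based feature flag check. The in-memory `flags` dict stands in for a real feature flag service, and the flag and function names are invented for the example:

```python
import hashlib

# The flag's state lives outside the deployed code, so it can change without a deploy.
flags = {
    "risky-new-checkout": {"enabled": True, "rollout_percent": 1},  # on for 1% of users
}

def flag_is_on(flag_key: str, user_id: str) -> bool:
    """Evaluate a flag for a user: a kill switch plus a deterministic percentage rollout."""
    flag = flags.get(flag_key)
    if not flag or not flag["enabled"]:  # flip "enabled" to False to turn it off instantly
        return False
    bucket = int(hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_percent"]

def checkout(user_id: str) -> str:
    if flag_is_on("risky-new-checkout", user_id):
        return "new checkout flow"  # the code is deployed everywhere, enabled for a few
    return "existing checkout flow"

print(checkout("user-42"))
```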

Using feature flags together with canary deployments paves the way for progressive delivery, a concept that builds on continuous integration and continuous delivery (CI/CD). The core tenets of progressive delivery are: 

  1. Release progressions: Controlled releases such as canary deployments or incremental rollouts reduce risk and ensure the quality of each new feature.
  2. Progressive delegation: Giving control over the feature to the team best equipped to manage it at each stage of the development and delivery processes empowers team members and reduces blockers. 

Release progressions are possible with canary deployments alone; you don’t strictly need feature flags to achieve them, although flags do make it easier. Progressive delegation is where feature flags really shine: by removing the need to deploy new code in order to release a feature, they make it much easier for teams outside of engineering to manage the feature when it’s their turn.

Wrapping up

Canary deployments are sophisticated, but if done correctly they can relieve your teams of stress around releases and give your users a smooth and consistent experience.
