
Launch Day Workflows

Rich Manalang, LaunchDarkly

What’s your Launch Day Workflow? Have you automated anything to improve that workflow? Do you use any LaunchDarkly integrations or have you built any yourself? This session will dive deep into what happens before and after you flip that toggle. We’ll explore integrations we’ve built to help you keep track of that feature once it’s out the door. And we’ll also discuss workflows and integrations that we’re working on to improve the quality, accountability, and compliance of rolling out features.


Rich Manalang

Rich is a Principal Developer Advocate at LaunchDarkly. He is a passionate product engineer keenly interested in truly making things simpler. He spent most of his tech career building useful (and not so useful) products at Trello, Atlassian, Oracle, and PeopleSoft.

(mellow music) - I've spent the past six months working with a small team at LaunchDarkly to bring you 15 new integrations. In the next 10 minutes, I'd like to share with you how we did it. Good morning, good afternoon, good evening, wherever you may be. I hope you're enjoying Trajectory so far. My name is Rich Manalang. I'm a Developer Advocate here at LaunchDarkly. Since December, I've worked very closely with our small integrations team. No, not this one. I'm sorry, I couldn't find a team photo in time. But we built a platform that makes it faster and easier to create and launch new integrations. Prior to having this new platform, our integrations were coupled very tightly to our core application code. This meant that every time we wanted to spin up a new integration, it was a development effort measured in weeks, if not months. As most integrations go, there's usually a pattern. We extracted these patterns and built a simple framework to describe each integration's set of capabilities in a configuration file. So now when we want to create an integration, or a partner wants to integrate with us, we just create a new configuration file that describes the capabilities we want the integration to provide. This new framework makes it super easy to get a new integration built and deployed; it now takes longer to write the documentation than to build the integration itself. But beyond that, what I really want to share with you today is how we launch these integrations to you all.
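As an aside, to give a flavor of that configuration-file approach, here's a hypothetical, heavily simplified capability description written as a Python dict. The field names and endpoint are illustrative assumptions only, not LaunchDarkly's actual integration manifest schema.

```python
# Hypothetical, simplified sketch of a declarative integration description.
# Field names and the endpoint URL are illustrative; they are not
# LaunchDarkly's real manifest schema.
integration = {
    "name": "Example Monitoring Tool",
    "categories": ["monitoring"],
    "capabilities": {
        # Forward flag-change audit events to the partner's API.
        "audit_log_events": {
            "endpoint": "https://api.example.com/events",
            "method": "POST",
        },
    },
}
```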

There are three interesting parts to how we use LaunchDarkly on the integrations team. Let's dive into each one of these in detail. Let's start with operational flags. Think of operational flags as permanent flags that you can use as application or system configuration. Most applications might have dedicated user or account settings or personalization features built into the app. Well, did you know that you can use LaunchDarkly as a lightweight configuration store rather than building that into your own app? For example, here's how we launched each integration using a feature flag. Here we're looking at the flag that enables or disables an integration. Instead of creating one feature flag for each integration, we have one flag that allows us to enable or disable individual integrations. The way we do this is with targeting rules on our feature flag. The first rule enables all of the integrations in the list; anything in this list will be available to all of our customers. Let's take a look at the next rule. Just a few weeks ago, we released a new version of our New Relic integration and deprecated the older version. However, we still have users using the older integration and wanted to make sure that we didn't break it for them. So we created this second rule to make sure that they still have access to the older integration while making it unavailable for everyone else. The last rule in this list is for when we're working on a new integration and want to test it out internally: we make sure we target ourselves first, which you can see from the rule matching our email domain. This is how we can test the integration in production. There are lots of other interesting use cases for operational flags. For example, rate limiting: instead of hard-coding API rate limits into your code, you can configure them dynamically with a flag. Or take logging levels. Most logging levels are hard-coded, so if you need to debug an issue in production, upping your log levels can be helpful, but normally that requires a restart of your services. Because LaunchDarkly can stream changes to your service, there's no need. Or managing releases: some teams use blue-green deployments to manage switching from one deployment version to another. Feature flags can do this too, but instead of switching your entire code base, you can choose which part of your code to switch. Or lastly, load shedding. Imagine your service is getting DDoS'd. You can set up a flag to shed some of that load from your service. For example, you can turn off non-essential features that contribute to that load, which would allow your app to recover much faster.
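Here's a minimal sketch of what the operational-flag pattern can look like in code, assuming the LaunchDarkly Python server-side SDK. The flag keys, context attributes, and variation values are illustrative assumptions, not the exact flags from the talk.

```python
# Minimal sketch of "operational flags" with the LaunchDarkly Python SDK.
# Flag keys and values below are illustrative assumptions.
import logging

import ldclient
from ldclient import Context
from ldclient.config import Config

ldclient.set_config(Config("YOUR_SDK_KEY"))
client = ldclient.get()

# An evaluation context with an email attribute, so a targeting rule can
# match our own email domain first ("test in production" internally).
ctx = Context.builder("acct-123").set("email", "dev@launchdarkly.com").build()

# One flag whose variation is the list of integrations enabled for this
# context, driven by targeting rules instead of one boolean flag each.
enabled = client.variation("enabled-integrations", ctx, [])
if "new-relic-v2" in enabled:
    print("show the New Relic v2 integration in the UI")

# The same pattern covers runtime configuration, e.g. log levels: change the
# flag and the new level streams to the service, no restart required.
level_name = client.variation("service-log-level", ctx, "INFO")
logging.getLogger().setLevel(getattr(logging, level_name, logging.INFO))
```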

Now let's talk about scheduling. Sometimes we launch integrations in real time, but sometimes that's inconvenient or impractical; sometimes we need to do a coordinated launch with a partner. In those cases, we schedule our flag changes. You might've heard that we recently released a feature that lets you schedule changes to flags; it's now available to everyone. Using this feature, we can tailor exactly when we want an integration to go live. Here are some other neat ways you can use scheduled changes. For instance, if you wanted to give your support engineers temporary elevated access so that they can diagnose a customer issue, you could use scheduled changes. You can also use it to auto-expire a customer's product trial, say, if you wanted to give a customer an extra two weeks to keep trying the product. You can also use it to automatically enable or disable maintenance windows. Lots of different ways you could use scheduled changes.

Lastly, let's talk about automation. Well, scheduling is a form of automation, but I'd like to talk about a new feature that we just released called Triggers. A common pattern for us at LaunchDarkly is that when we turn a feature on, we take a look at the impact of that new code in production. I'm sure everyone does this. We do this by looking at our Datadog dashboards. What we're looking for are any elevated error rates or adverse effects on performance as a result of that new feature going live. Seeing changes to feature flags in context with production metrics is possible with our Datadog integration. If you don't use Datadog, don't worry; we have similar integrations with a bunch of other vendors that do very similar things. Typically the engineers who built the feature bear the responsibility of monitoring it in production. With Triggers, we've now made it possible to attach a circuit breaker to a metric event in your monitoring or observability tool. For us, we have a Datadog metric alert wired to turn a flag off when something bad happens.
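Mechanically, a trigger is just a unique URL that LaunchDarkly generates for a flag; a POST to that URL performs the configured change (for us, turning the flag off). Here's a rough sketch of a handler an alerting tool could call; the trigger URL is a placeholder and the alert hook is hypothetical.

```python
# Rough sketch: invoking a LaunchDarkly trigger from an alert handler.
# The URL below is a placeholder; use the one LaunchDarkly generates when
# you add a trigger to your flag. The on_error_rate_alert hook is hypothetical.
import urllib.request

TRIGGER_URL = "https://app.launchdarkly.com/..."  # paste your flag's generated trigger URL


def on_error_rate_alert():
    """Called by the monitoring tool's webhook when error rates spike."""
    req = urllib.request.Request(TRIGGER_URL, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        print("trigger invoked, HTTP status:", resp.status)
```

In practice, a tool like Datadog can POST to the trigger URL directly from its own alert or webhook configuration; the sketch above just shows that the trigger is nothing more than an incoming POST.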

Here's an example. Imagine you're about to perform an infrastructure migration: you're moving from a self-hosted Postgres database to an Amazon RDS instance, and you're managing this migration with a feature flag. Well, I don't know about you, but performing a live migration is like in-flight refueling of a passenger airplane. You can screw it up very easily if you don't plan it well. Luckily for us, LaunchDarkly feature flags are a good fit for orchestrating these types of critical migrations. So let's say you've performed most of the migration and you're ready to switch over to RDS, and you've got a feature flag ready to roll out to 100% of your users. Currently it's deployed to only 5% of your users, performance looks pretty good, and there are few or no errors. So you decide to switch it over to 100%. This flag has been configured with a circuit breaker that will trip if your performance falls below your desired thresholds. If performance degrades after switching over, your monitoring tool will trigger the circuit breaker and kill the flag. Now, let's assume you did this right: if you're performing dual writes, you should be safe, because the system will just fall back to your self-hosted Postgres database without any users noticing any issues (there's a rough sketch of this dual-write pattern after the transcript). Certainly, automating all feature flags to react to external events is not something you'll want to do all of the time, but there are a handful of events you'll probably want to automate to make your system more resilient. It's hard not to appreciate when your system repairs itself, saving you from getting woken up in the middle of the night. Well, those are the three things I wanted to tell you about. Again, my name is Rich Manalang, and thank you for listening to my talk. (mellow music)
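Tying the pieces together, here's a rough sketch of the flag-gated read path with dual writes from the migration example, reusing the client and ctx from the earlier operational-flag sketch. The two dicts stand in for real database connections, and the "use-rds" flag key is an illustrative assumption.

```python
# Rough sketch of a flag-gated datastore migration with dual writes.
# "client" and "ctx" are the LaunchDarkly client and context from the earlier
# sketch; the dicts below stand in for real database connections.
self_hosted_pg = {}  # stand-in for the existing self-hosted Postgres
rds = {}             # stand-in for the new Amazon RDS instance


def save_record(record_id, record):
    # Dual writes: while the migration is in flight, write to both stores so
    # the system can fall back safely if the circuit breaker kills the flag.
    self_hosted_pg[record_id] = record
    rds[record_id] = record


def load_record(record_id):
    # Reads follow the flag: ramp "use-rds" from 5% to 100%. If the Datadog
    # alert fires the trigger and turns the flag off, reads revert instantly.
    if client.variation("use-rds", ctx, False):
        return rds.get(record_id)
    return self_hosted_pg.get(record_id)
```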