A view into the complexity of a feature management system
If you're here, you likely want to learn more about feature management, or at least feature flagging in general. Many teams use some form of feature flagging in their applications without realizing it. These approaches, while useful and valid in many cases, can quickly spiral into high complexity.
Building this on your own takes away from your team's core focus: innovating and keeping your applications running smoothly. Ultimately, most customers want a few key capabilities:
- Innovate within their applications faster
- Make software releases a non-major IT event
- Recover quickly from failures, reducing the risk of deployments
- And most importantly, control user experiences within their application
Many teams turn to feature flag-driven development as a way to accomplish many of these goals, and in their attempts to achieve them, end up stitching together a solution on their own that quickly becomes unmanageable.
We see this often starting small as configuration files, and growing into many other high-touch solutions. Let’s get into what “building” might look like, and check out some pseudo-code.
The basis for Build
If we were building our own feature flagging from scratch, we might start with something like the code below:
### config.env
DEV_VAR="A Clever Title for the DEV Website"
### index.js
<h1>${DEV_VAR}</h1>
In this approach we have two files (in all likelihood A LOT more, but stay with me on this): one that defines our variables, and another that renders those variables in some way. This is usually configured at build time, and changing a value in production typically requires generating a new build (since "hot reloading" is usually a dev-mode feature). In our case, we just have dev, so our complexity grows when we want to manage both environments, becoming something like…
### dev.config.env
TITLE_VAR="A Clever Title for the DEV Website"
### prod.config.env
TITLE_VAR="A Clever Title for the PROD Website"
### index.js
<h1>${TITLE_VAR}</h1>
We now have three files, two of which are config files—and we likely configure which one to use as part of our build process. Our complexity is increasing. Now, what if we wanted to also factor in a completely new configuration of our component? The pseudo-code version of this might look like…
### dev.config.env
TITLE_VAR="A Clever Title for the DEV Website"
NEW_HEADER=TRUE
### prod.config.env
TITLE_VAR="A Clever Title for the PROD Website"
NEW_HEADER=FALSE
### index.js
if (NEW_HEADER) {
  <h1 className="cool-css-style">${TITLE_VAR}</h1>
} else {
  <h1>${TITLE_VAR}</h1>
}
We’ve now augmented the code to display the new header with some improved styling when the dev mode is on (since we’re just testing), and in prod builds we leave the stable mode enabled. It’s a binary on and off.
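To make the pattern concrete, the build-time toggle above can be sketched as runnable Node code. This is a minimal illustration rather than the article's actual files: the variable names mirror the pseudo-code, and "rendering" is reduced to returning an HTML string.

```javascript
// Minimal sketch of the build-time toggle above. In a real build,
// TITLE_VAR and NEW_HEADER would come from whichever .env file was
// baked in when the bundle was generated.
const config = {
  TITLE_VAR: process.env.TITLE_VAR || "A Clever Title for the DEV Website",
  NEW_HEADER: process.env.NEW_HEADER === "TRUE",
};

// Render the header based on the flag. Changing the flag means a new
// build and a new deployment, which is exactly the pain point here.
function renderHeader({ TITLE_VAR, NEW_HEADER }) {
  return NEW_HEADER
    ? `<h1 class="cool-css-style">${TITLE_VAR}</h1>`
    : `<h1>${TITLE_VAR}</h1>`;
}

console.log(renderHeader(config));
```

Note that the decision is frozen at build time: there is no way to flip `NEW_HEADER` for a running application without shipping again.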
There are a few things to consider in this example:
- This is a VERY simple frontend example. What if we’re handling API connectivity as well and want to change those configurations? What if there’s a database involved that we want to switch?
- These are all either on or off—there’s no concept of targeting specific contexts (users, devices, locations) here. We could start in on that, but consider how much the complexity grows when we start weaving those concepts in as well
- Largely, we’re doing some basic text changes here—consider the complexity as this grows
- What if we want to measure the consumption of these different features? What if we want to integrate with other platforms?
These bullet points (and many more) form the basis of where complexity starts to skyrocket, and if you continue down this path, you start engineering your own feature management platform.
The basis for Buy
I have a saying I use often: "anything is possible with code and time." While the statement is true, I can't stress the time portion of this example enough.
Building your own solution for the scenario above is going to become wildly complex and extremely time-consuming, not just from an initial design and build perspective, but also in terms of long-term maintenance. These homegrown solutions are going to follow you around in your career like luggage. Have you ever noticed how you can't seem to get rid of luggage? Build-it-yourself projects often end up this way.
In this next section, we’re going to step through a few of the ways this complexity becomes untenable, and how ultimately it’s better to consume these features from a system that's designed to manage it at scale.
Release targeting and request context
Simple feature flagging, config-file-based toggles, or reading from an environment store to set features are all fairly easy when you simply want a "thing" on or off. That thing might be a database configuration for backend use cases, a v2 of your API, or a new set of configurations for your website.
The real power comes when you want to control WHO or WHAT is getting a specific feature release configuration. Within LaunchDarkly we refer to this as a context, the context of your request.
- Do you want macOS devices to be in the earliest pilot?
- Do you want your East Coast users to receive a new database configuration?
- Do you want your development team to receive the new configuration but only when they are on the West Coast?
This targeting functionality is a core component (and arguably should be a baseline expectation) in the LaunchDarkly platform. Building this on your own is extremely hard to accomplish, not just from a core functionality standpoint but also from an iteration standpoint. Rule complexity grows in layers the more you add onto it during development. Evaluating the context of your request, targeting feature configurations from that context, and iterating on that configuration is something you are going to do often.
Building a targeting engine that lets you target features to different audience segments requires a serious amount of engineering labor.
Let’s expand our example a bit: Cody (Hi!) is on our development team, so we want to release new features to him as early as possible. The team is testing out a new replicated database configuration, and the primary instance is in New York, where Cody is traveling to. We want to test this database configuration in this scenario—but nowhere else. A pseudo-code targeting example of this might look like...
if (user == "Cody") and (location == "New York") serve (new database)
You might be able to handle that in an isolated “build” scenario, but what happens when we want to expand beyond Cody to the rest of the development team? When everything is working well, how do we expand that to our early access users? What if we only want to test against 20% of that group?
Building the logic to handle these pivots and growths—as well as reductions when a failure happens—is extremely complex. The consequences of doing it wrong can be exponentially bad.
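To make that complexity concrete, here is a minimal sketch of what a homegrown targeting evaluator might look like. Everything in it (the rule shape, the context fields, the `evaluate` function) is invented for illustration; a production engine would also need rule precedence, percentage rollouts, segment management, and safe fallbacks when evaluation fails.

```javascript
// Hypothetical rule shape: each rule matches attributes of the request
// context and serves a variation. None of this is a real API.
const rules = [
  {
    // "Cody in New York gets the new database" from the example above.
    match: (ctx) => ctx.user === "Cody" && ctx.location === "New York",
    serve: "new-database",
  },
];

const defaultVariation = "stable-database";

// Evaluate the context against each rule in order; first match wins,
// otherwise fall back to the stable default.
function evaluate(rules, ctx, fallback) {
  for (const rule of rules) {
    if (rule.match(ctx)) return rule.serve;
  }
  return fallback;
}

console.log(evaluate(rules, { user: "Cody", location: "New York" }, defaultVariation));
```

Even this toy version hints at the problem: expanding to "the whole dev team," then "early access users," then "20% of that group" means new rule types, ordering semantics, and rollback behavior, all of which you now own.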
Consuming this capability means you don't have to build all of this logic, error catching, and fallback processing on your own.
Global reach
We talk about LaunchDarkly's global architecture often, and with that in mind, consider how distributed the world has become. Remote work is the new normal. Furthermore, people are connecting from a myriad of device types, and from locations with varying connection quality.
You could opt to skip building for this consideration and store all your configurations remotely, which means increased latency for your end users (from a loading perspective, an application performance perspective, or both). You'll have mixed performance across devices in this case.
You could devote an entire business unit to architecting a feature flag system that reliably supports a global userbase.
If you decide you DO want to build your own solution for this, you’ll have to manage your own CDN configuration to distribute your feature configurations globally. You’ll need to configure ways for your applications to take this into account—and manage the cost and complexity of this portion of the solution independently. A couple of additional considerations:
- How does this factor in for client vs. server-side code?
- How does this factor in for new languages or frameworks you might adopt?
- How are you protecting your data from leakage?
Building for global reach on your own is going to increase your cost and complexity significantly. User experience matters to everyone, so poor performance isn't something you can simply ignore. Architecting this kind of solution yourself has the potential to become an entirely new business unit's worth of effort within your organization.
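One common first step teams sketch for this problem is a local flag cache with a hardcoded safe default for when the remote store is unreachable. The function names and cache below are illustrative assumptions, not a real SDK, and they only scratch the surface: invalidation, streaming updates, and CDN distribution are all still missing.

```javascript
// Illustrative local flag cache with safe defaults. A real solution
// still needs invalidation, streaming updates, and global distribution.
const defaults = { "new-header": false };
const cache = new Map();

// fetchRemoteFlags is a stand-in for a network call to your flag store.
async function getFlag(key, fetchRemoteFlags) {
  if (cache.has(key)) return cache.get(key);
  try {
    const flags = await fetchRemoteFlags();
    for (const [k, v] of Object.entries(flags)) cache.set(k, v);
    return cache.has(key) ? cache.get(key) : defaults[key];
  } catch (err) {
    // Remote store unreachable: fall back to the safe default.
    return defaults[key];
  }
}
```

Notice the design choice this forces on you: every flag needs a safe default, and deciding what "safe" means per flag, per environment, is itself ongoing maintenance work.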
Once again, this is functionality that's better to consume than to build on your own.
Release strategies: The release and rollback effect
Most of the earlier examples I provided would have resulted in a rebuild. A rebuild means a new pipeline run most likely, and an entirely new deployment each time. Do you want to change who’s targeted? Do a new deployment. Do you want to disable it if something breaks? Redeploy the code with the flags disabled.
In a software development space that's shifting to continuous EVERYTHING, this lack of flexibility is quickly going to become painful.
A solution that lets you deploy the code, with the feature completely disabled, and have full control over its rollout gives you incredible flexibility. Is your database code ready but the frontend isn’t? Ship the code, and enable the database functionality for specific development team users for testing—while the frontend stays disabled for everyone. Did a problem present itself in the configuration? Disable it immediately without a push.
Are you happy with the changes? Configure segments of users and begin a progressive rollout of the change to those user cohorts. Increase or decrease the rate as needed.
All of these capabilities are built into the core of LaunchDarkly—you don’t need to engineer the functionality to use it.
Measure, integrate, and extend
Measuring the experiences of your users is something we often try to do with observability tools, but those tools are frequently ill-equipped to understand how a feature's actual functionality is being consumed. Maybe you want to run an A/B test against two versions of an application and measure the performance against control groups? Experimentation within LaunchDarkly lets product teams get actual data on the way their features are being consumed, without the development teams having to engineer these measurement methods from scratch.
What about when you want to tie your solution into existing platforms your teams use? Building these integrations is (once again) extremely time-consuming and complex. Furthermore, once you build it, you own it. Any breakage along the way becomes part of your operational processes to care for and feed moving forward. New API released? You have to go account for that within your platform. These are capabilities you should consume, not build on your own.
Have I convinced you of the complexity yet?
Obviously, working for LaunchDarkly, there's an easy perception to address: of course I'm going to argue for buying a solution. That being said, I've also experienced what it's like to build solutions that become so complex they end up needing dedicated teams to maintain them. Building these solutions can often patch a short-term problem, but they can create long-term impacts and technical debt that are hard to escape.
Ultimately, we want teams to be able to focus on building software that their customers love. Too often in the DevOps and software development space, we focus on patching gaps by building our own duct tape. The way you release software to your customers shouldn’t need to be a homegrown solution. You should be able to focus on building YOUR product.
Want to see it in action? My team runs a monthly workshop where we show you LaunchDarkly hands on. Come register, and spend an hour with us getting into LaunchDarkly, building a real application, and shipping with our platform!