This guide discusses some of the challenges of testing code that uses feature flags and provides recommendations for how to address those challenges.
In a traditional development process, a quality assurance (QA) team tests on staging, then deploys to production. However, staging is never an exact replica of production. If something works on staging but breaks in production, the QA team has to run all of its tests again. A more efficient approach is to skip or minimize testing on staging and test in production from the beginning.
Some of our most successful customers are able to deploy to production multiple times per day because their QA and user acceptance testing (UAT) teams can validate functionality in a real production environment before exposing that functionality to the rest of their user base. When you use feature flags to expose your feature to test contexts only, if your QA or UAT team finds a bug, there is no impact on any other contexts and there is no need to do a full rollback. After your feature is working for your test contexts, you can begin an incremental rollout to the rest of your audience.
In addition to manual testing in production, this guide discusses multiple types of automated tests you can run while developing your code, including unit testing, mock testing, integration testing, and end-to-end testing. We also provide example unit tests, example Cypress tests, and a mock to use with the Jest test runner and the React Web SDK.
To complete this guide, you must have the following prerequisites:
To complete this guide, you should understand the following concepts:
Flag variations let you use one flag to serve multiple variations of a feature to different contexts simultaneously. There is no limit to the number of variations you can add to a flag, making them useful for testing scenarios. To learn more, read Creating flag variations.
A fallback value is the variation your app serves to a context if it can’t evaluate a flag, such as if your app can’t connect to LaunchDarkly. You can view the fallback value on a flag’s Targeting tab.
Configure your flags’ fallback values within your SDK.
Configure your SDK: Evaluating flags
Flag targeting lets you control which contexts receive which variation of a feature flag. Targeting is useful when testing your code in production because it lets you target only specific test contexts and exclude your general context population. To learn more, read Target with flags.
You can also use the REST API: Creating flags using the API
This guide discusses the following test types:
It is a common misconception that when testing code that uses feature flags, you must test every possible combination of every flag variation with every other flag, in both the on and off state. The number of combinations this would require quickly becomes unrealistic, because it grows exponentially with the number of flags. This is called the “combinatorial explosion myth.” In reality, you do not need to test every combination of feature flags, and attempting to do so is rarely feasible. Instead, choose a defined set of scenarios to limit the number of different flag states to test.
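To see why exhaustive testing does not scale, the following sketch counts the flag states an exhaustive test suite would have to cover. The helper name `countFlagCombinations` and the flag counts are illustrative assumptions, not part of any LaunchDarkly SDK.

```javascript
// Hypothetical helper: multiplies the number of variations per flag to get
// the total number of distinct flag states an exhaustive test would cover.
function countFlagCombinations(variationCounts) {
  return variationCounts.reduce((total, count) => total * count, 1);
}

// Twenty boolean flags, each with two variations.
const twentyBooleanFlags = new Array(20).fill(2);
console.log(countFlagCombinations(twentyBooleanFlags)); // 1048576
```

A defined set of scenarios, by contrast, grows only with the number of scenarios you choose to cover.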
Consider testing the following flag combinations:
You can write unit tests for individual pieces of code in your application and test them independently. Because unit tests assess discrete pieces of functionality, wrapping that functionality in feature flags doesn’t affect these tests.
Avoid connecting to LaunchDarkly when running unit tests, because you can’t guarantee that the flag values your tests receive will be consistent from run to run. You can use a mock test instead, which prevents variation in the values you’re using. To learn more, read Mock tests.
You can also use a test data source to mock the behavior of a LaunchDarkly SDK so it has predictable behavior when evaluating flags. However, not all SDKs support using a test data source.
Configure your SDK: Test data sources
In this example, we’re switching from MySQL to Elasticsearch as the search engine. The migration from the old to the new search engine is controlled by a feature flag.
The code for sending queries to each search engine is separated into dedicated functions. This makes it easier to exercise the dedicated logic for each search engine in a unit test, without needing to mock the feature flag result.
Here’s the code we’re testing:
This unit test checks that the correct search function is being called depending on the flag variations of true for the new engine and false for the old engine:
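A minimal sketch of such a test, together with a stand-in for the code under test, might look like the following. All names (`search`, `searchMysql`, `searchElasticsearch`, `createStubEngines`) are hypothetical, and the checks use plain functions rather than a specific test runner. The flag value is passed in as an argument, so no SDK or mock is involved.

```javascript
// Hypothetical code under test: routes a query to one of two engine-specific
// functions based on a boolean flag value supplied by the caller.
function search(query, useNewEngine, engines) {
  return useNewEngine
    ? engines.searchElasticsearch(query)
    : engines.searchMysql(query);
}

// Builds stub engines that record which function was called with which query.
function createStubEngines() {
  const calls = [];
  return {
    calls,
    searchMysql: (query) => { calls.push(['mysql', query]); return []; },
    searchElasticsearch: (query) => { calls.push(['elasticsearch', query]); return []; },
  };
}

// Flag variation true: the new engine should handle the query.
const whenOn = createStubEngines();
search('shoes', true, whenOn);
console.log(JSON.stringify(whenOn.calls)); // [["elasticsearch","shoes"]]

// Flag variation false: the old engine should handle the query.
const whenOff = createStubEngines();
search('shoes', false, whenOff);
console.log(JSON.stringify(whenOff.calls)); // [["mysql","shoes"]]
```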
Mock tests are a type of unit test that lets you test your code within a simulated version of an internal or external service. The simulation isolates the behavior of your code so you can focus on the code being tested, and not on the behavior of the service.
We provide a mock to use with the Jest test runner and the React Web SDK, available in our Jest mock GitHub repo. Although this mock is Jest-specific, it is simple enough that you can use it with other JavaScript frameworks as well. To learn more about using the mock, read Unit testing with Jest.
As you expand out from testing a unit to testing the whole system, you can start running integration and end-to-end tests. The presence of feature flags matters with these types of tests because the state of a flag could violate your test assertions.
When running these types of tests, your code should use the real LaunchDarkly SDK to evaluate flags. However, it should use consistent flag configurations for the test.
To keep your input consistent, you should use one of the following methods to get values from your SDK.
Reading flags from a file is our recommended approach for testing with feature flags without connecting to LaunchDarkly. With this method, you configure your SDK to read from a file instead of connecting to LaunchDarkly. Reading from a file allows you to run tests in a local environment without connecting to an external network. However, only server-side SDKs support using flag files.
Configure your SDK: Reading flags from a file
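As a sketch of how a test setup might produce such a file, the following writes a JSON flag file that uses the `flagValues` property, which maps each flag key to the value every context receives. The flag keys, the file name, and the `writeFlagFile` helper are assumptions for illustration. Pointing the SDK at the file is SDK-specific and is not shown here.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: writes a flag file into a fresh temp directory and
// returns its path.
function writeFlagFile(flagValues) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ld-flags-'));
  const filePath = path.join(dir, 'flags.json');
  fs.writeFileSync(filePath, JSON.stringify({ flagValues }, null, 2));
  return filePath;
}

// Example flag keys and values; replace them with the flags your tests need.
const flagFilePath = writeFlagFile({
  'new-search-engine': true,
  'results-per-page': 20,
});
console.log(fs.readFileSync(flagFilePath, 'utf8'));
```

Because each call creates its own directory, concurrent test runs can each use a separate file.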
Using a dedicated LaunchDarkly environment is an easy option to set up because it requires no code changes. However, it depends on the environment configuration staying static, unless testers or automated testing infrastructure deliberately change it. For each test that requires a specific set of flag values, you must configure the environment correctly. An automated test runner can change the configuration before running each test, but this adds a delay. If multiple test runners are invoked concurrently, their configuration changes will conflict with each other and cause errors.
Using a dedicated LaunchDarkly environment with targeting is similar to the previous option, but the flags in the environment are configured with targeting rules so that they return different values depending on the context object you provide in the code. This allows you to run an entire set of tests with different flag values for each without needing to reconfigure any flags during the test run. This also means that multiple concurrent test runs will not conflict with each other.
To learn more, read Target with flags.
To ensure the correct use of targeting in test runs, the test must meet these requirements:
There are many options for setting up a context object to reflect the needed flag values for the current test.
Here are some examples:
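For instance, one option is to encode the scenario under test as a context attribute and give each test run its own context key. In the sketch below, `buildTestContext`, the `testScenario` attribute, and the key format are assumptions. The targeting rules in your test environment would need to match on whichever attribute you choose.

```javascript
// Hypothetical helper: builds a context for one scenario in one test run.
// The unique key keeps concurrent runs from sharing individual targets, and
// `testScenario` is a custom attribute that targeting rules can match on.
function buildTestContext(runId, scenario) {
  return {
    kind: 'user',
    key: `test-${runId}-${scenario}`,
    testScenario: scenario,
  };
}

const context = buildTestContext('run-42', 'new-search-on');
console.log(context.key); // test-run-42-new-search-on
```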
The bootstrapping feature lets you provide browser-based SDKs with an initial set of flag values.
Configure your SDK: Bootstrapping
Using the Relay Proxy in offline mode lets the Relay Proxy run as a separate component, loading flag values from a file without connecting to LaunchDarkly. We do not generally recommend this method because it is more complicated to implement. However, it can be useful in situations where you are already considering the use of the Relay Proxy and would like to configure your test environment to mirror production. To learn more, read Offline mode.
Instead of building a real LaunchDarkly client, you can put a wrapper around your SDK that determines whether flag values are evaluated normally or come from a fixed, predefined state. Using wrappers around a LaunchDarkly SDK is a common practice that provides benefits such as standardization and extending the SDK’s capabilities through the API. To learn more about wrappers, read Use cases for SDK wrappers.
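A minimal sketch of this idea follows. It assumes the wrapped client exposes a synchronous `variation(flagKey, fallback)` method. Evaluation signatures differ between SDKs, so treat the `FlagWrapper` class and its options as hypothetical.

```javascript
// Hypothetical wrapper: when fixed values are supplied it returns them and
// never touches the SDK; otherwise it delegates to the wrapped client.
class FlagWrapper {
  constructor({ client = null, fixedValues = null } = {}) {
    this.client = client;
    this.fixedValues = fixedValues;
  }

  variation(flagKey, fallback) {
    if (this.fixedValues) {
      // Unknown flags fall back to the caller's fallback value.
      return Object.prototype.hasOwnProperty.call(this.fixedValues, flagKey)
        ? this.fixedValues[flagKey]
        : fallback;
    }
    return this.client.variation(flagKey, fallback);
  }
}

// In a test, supply fixed values instead of a real client.
const flags = new FlagWrapper({ fixedValues: { 'new-search-engine': true } });
console.log(flags.variation('new-search-engine', false)); // true
console.log(flags.variation('unknown-flag', 'fallback')); // fallback
```

Application code calls the wrapper in both cases, so switching between fixed and real values requires no changes outside the place where the wrapper is constructed.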
Cypress is a JavaScript end-to-end testing framework for front-end web applications. This example explains how to create a Cypress plugin to modify flags that control the instances of LaunchDarkly you test with Cypress.
Cypress tests can fail if you don’t have the right flag settings in your testing instance. You may not be able to differentiate between failures caused by flag settings, general flakiness, or real problems. You can use this plugin to ensure that you don’t get failures due to flag settings by targeting the test context in the correct flags before running tests.
To begin, create the following custom Cypress command: setContextFlags(flags: LDFlagSet).
This command:
Here is an example:
Next, create clearContextFlags(flags: LDFlagSet).
This command:
Here is an example:
You can use this plugin to write Cypress tests that cover all configurations of your LaunchDarkly instance, instead of running Cypress tests against fixed configurations.
For a real-world example of using Cypress for end-to-end testing, read Gleb Bahmutov’s blog post Control LaunchDarkly From Cypress Tests.
Manual testing within your LaunchDarkly instance is useful because you can be in control of the flags in your test. However, if multiple people are working within the same environment when testing, they can accidentally overlap and affect each other’s tests.
LaunchDarkly provides a command line interface (CLI) so that you can set up and manage your feature flags, account members, projects, environments, teams, and other resources. The LaunchDarkly CLI also includes a dev-server command that you can use to start a local server, retrieve flag values from a LaunchDarkly source environment, and update those flag values locally. This means you can test your code locally, without having to coordinate with other developers in your organization who are using the same LaunchDarkly source environment.
To learn more, read Using the LaunchDarkly CLI for local testing.
To avoid overlap between multiple manual tests, you can use multiple LaunchDarkly test environments. You can create a unique environment for each QA tester, or create an environment for each QA test run that you delete at the end of the test. In either case, environments are free and easy to create and copy.
The downside to multiple test environments is that you have to carefully manage your SDK keys because each environment has its own key.
You can also use the REST API: Create environment
You can use the LaunchDarkly REST API with test environments to set context targets and flag statuses. This method uses fewer environments and more targeted testing, which lets multiple people run tests concurrently. To use this method, you must coordinate how targeting is set up for testers.
Targeting can work in two different ways:
To learn more, read our REST API documentation.
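As a hedged sketch, the helper below builds a request body that adds one tester’s context key to a flag’s individual targets. It assumes the semantic patch format for flag updates with an `addTargets` instruction, so confirm the exact fields against the REST API documentation before relying on it. The helper name and all keys and IDs are placeholders, and sending the request is not shown.

```javascript
// Hypothetical helper: builds the body for a flag update that targets one
// context with one variation in one environment. The field names follow the
// semantic patch format as an assumption; verify them in the API reference.
function buildAddTargetBody(environmentKey, contextKey, variationId) {
  return {
    environmentKey,
    instructions: [
      {
        kind: 'addTargets',
        contextKind: 'user',
        values: [contextKey],
        variationId,
      },
    ],
  };
}

// Placeholder values: substitute your own environment key, context key,
// and variation ID.
const body = buildAddTargetBody('test', 'qa-tester-1', 'VARIATION_ID_PLACEHOLDER');
console.log(JSON.stringify(body, null, 2));
```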
When you begin manual testing within your LaunchDarkly production instance, testing activity will appear in flag statuses, graphs, application performance management (APM) tools, and so on. You may want to filter out testing activity so your statistics aren’t affected. The best way to do this is to configure your SDK not to transmit events from tests. If you can’t configure your SDK to do this, then you may be able to configure the receiving service to ignore test events.
Filtering out test event data is a good use case for SDK wrappers. You can control whether the code sends events or not by putting the wrapper into a different mode, and then control that mode with a feature flag.
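The sketch below illustrates one way such a wrapper could gate events. It assumes the wrapped client exposes a `track(eventKey, data)` method, and the `EventGate` class and event key are hypothetical. The `sendEvents` value could come from a flag evaluation or from test configuration.

```javascript
// Hypothetical wrapper: forwards events to the wrapped client only when
// event sending is enabled.
class EventGate {
  constructor(client, sendEvents) {
    this.client = client;
    this.sendEvents = sendEvents;
  }

  track(eventKey, data) {
    if (!this.sendEvents) return false; // dropped, so test traffic stays out of statistics
    this.client.track(eventKey, data);
    return true;
  }
}

// A stand-in client that records what it receives.
const sent = [];
const fakeClient = { track: (eventKey, data) => sent.push([eventKey, data]) };

const duringTests = new EventGate(fakeClient, false);
duringTests.track('search-performed', { query: 'shoes' });
console.log(sent.length); // 0
```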
Feature flag-driven development empowers your organization to deploy at a faster pace with less risk. It also adds complexity when you test code that uses feature flags. This guide covered various types of tests you can run in production, and the advantages and drawbacks of each.
Here are some blog posts for further reading: