This guide explains best practices for building resilient applications with LaunchDarkly SDKs. It primarily uses examples from the React SDK, but the patterns are broadly applicable to other SDKs.
The internet is a massive distributed system that is only growing in complexity. Software engineers are under more pressure than ever to ship new features quickly. Speed is important, but so is reliability. Even well-architected systems can rely on external dependencies, and those dependencies can sometimes be unpredictable. Using a few deliberate patterns, you can build applications that stay resilient and responsive when parts of your system experience issues.
When it comes to building applications that are always available in any situation, there are four guiding principles to consider:
This guide is generalizable to any of the LaunchDarkly SDKs and any infrastructure. Wherever we use variation() or allFlags(), substitute in the matching method for your specific SDK.
Your answers to the following questions can help clarify whether to implement the recommendations below:
The following sections explain different strategies you can use to help improve resilience in your application.
“Initialization” means that an SDK has connected to the service and is ready to evaluate flags. There are legitimate reasons to temporarily block initialization for specific use cases, for instance, to reduce flicker while running experiments. This ensures that users are not exposed to multiple test variants while loading your application.
However, as a rule, we recommend that you never block or close your application if initialization is unsuccessful.
If you call a flag evaluation method such as variation() before initialization completes, or after initialization fails, LaunchDarkly uses the default flag values that you have supplied.
We strongly recommend that you implement this in your SDK, as it is the most effective method for increasing resilience.
Some considerations include:
For the React Web SDK, consider the tradeoffs of using withLDProvider versus asyncWithLDProvider to initialize your client. asyncWithLDProvider blocks rendering your application until after initialization is complete, while withLDProvider renders the application prior to initializing.
Set initialization timeouts, after which your application should proceed regardless of success. Recommended values are 100–500ms for client-side SDKs and 1–5s for server-side SDKs.
For the React Web SDK, set your initialization timeout in your ProviderConfig, which passes the value to the JavaScript SDK’s waitForInitialization method under the hood. For other JavaScript-based SDKs, use the built-in waitForInitialization method with a timeout provided.
Here is an example code snippet for the React Web SDK:
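The following is a minimal sketch of such a configuration, not a definitive implementation. It declares a local `ProviderConfigSketch` type so that the block stands alone; in a real application you would pass the object to `asyncWithLDProvider` from the React Web SDK. The client-side ID and context are placeholders, `msToSeconds` is a hypothetical helper, and the assumption that `timeout` is expressed in seconds should be checked against the `ProviderConfig` reference for your SDK version.

```typescript
// Local stand-in for the subset of the React Web SDK's ProviderConfig used here.
// Assumption: `timeout` is expressed in seconds, as in waitForInitialization.
interface ProviderConfigSketch {
  clientSideID: string;
  context: { kind: string; key: string };
  timeout: number;
}

// Hypothetical helper: convert a millisecond budget to seconds.
function msToSeconds(ms: number): number {
  return ms / 1000;
}

const providerConfig: ProviderConfigSketch = {
  clientSideID: "your-client-side-id", // placeholder
  context: { kind: "user", key: "example-user-key" }, // placeholder
  timeout: msToSeconds(500), // upper end of the 100 to 500 ms client-side range
};

// In an application (not executed in this sketch):
//   const LDProvider = await asyncWithLDProvider(providerConfig);
//   root.render(<LDProvider><App /></LDProvider>);
```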
For a server-side SDK example, here is how to set a timeout using the Node.js (server-side) SDK’s waitForInitialization method:
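The sketch below shows one way to apply the timeout and then proceed regardless of the outcome. `InitializableClient` and `initOrProceed` are hypothetical names introduced for illustration; the interface mirrors only the `waitForInitialization({ timeout })` call, with the timeout in seconds, and assumes that the returned promise rejects when the timeout elapses.

```typescript
// Minimal shape of the client used by this sketch. It mirrors only the
// waitForInitialization call, which takes a timeout in seconds.
interface InitializableClient {
  waitForInitialization(options: { timeout: number }): Promise<unknown>;
}

// Wait up to `timeoutSeconds`, then continue either way.
// Resolves to true if the client initialized, or false if the application
// is proceeding on fallback values.
async function initOrProceed(
  client: InitializableClient,
  timeoutSeconds: number,
): Promise<boolean> {
  try {
    await client.waitForInitialization({ timeout: timeoutSeconds });
    return true;
  } catch (err) {
    // Do not block or exit: flag evaluations return their fallback values
    // until the SDK manages to connect.
    console.warn("LaunchDarkly not initialized; continuing with fallback values", err);
    return false;
  }
}
```

With the real SDK, `client` would be the object returned by the SDK's `init` function, and a call such as `await initOrProceed(client, 5)` would sit in your service's startup path.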
Teams are often concerned that letting users access their application with fallback flag values will create issues.
We recommend intentionally setting fallback flag values, regularly reviewing your coded fallback values to keep them current with your rollouts, and regularly cleaning up flags to remove outdated flags with outdated fallbacks.
As a general rule, consider setting your fallback values to stable behavior matching your application’s current working state. Essentially, use values that keep things running. For high-security or compliance-related areas, falling back to more restrictive behavior instead is a good practice.
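One way to keep fallback values intentional and easy to review is to hold them in a single canonical map and let values returned by the SDK overwrite them. The sketch below assumes hypothetical flag keys and a hypothetical `resolveFlags` helper; it is an illustration of the pattern, not part of any SDK.

```typescript
type FlagValue = boolean | string | number;
type FlagMap = Record<string, FlagValue>;

// Single source of truth for coded fallbacks. Flag keys are hypothetical.
const FALLBACK_FLAGS: FlagMap = {
  "new-checkout-flow": false, // matches the current working state
  "max-upload-mb": 25,
  "require-mfa-for-export": true, // compliance-sensitive: restrictive fallback
};

// Start from the fallbacks and let any values returned by the SDK
// (for example from allFlags() or useFlags()) overwrite them.
function resolveFlags(returned: FlagMap | undefined): FlagMap {
  return { ...FALLBACK_FLAGS, ...(returned ?? {}) };
}
```

Because every flag your application reads appears in one place, reviewing fallbacks against current rollouts becomes a review of a single file.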
Some considerations include:
For variation() methods, always pass a fallback value, such as variation(flagKey, defaultValue), and treat this as authoritative when the SDK isn’t ready.
If you use allFlags() or useFlags() and no bootstrapping is available, use a canonical fallbacks map that can be overwritten by returned values.
LaunchDarkly’s client-side SDKs can initialize using flag values that have been provided externally. The source of these values is controlled by the bootstrap configuration option.
There are two built-in sources that can be used to bootstrap these last-known values:
localStorage object
After the client-side SDK has been bootstrapped with these initial values, it will attempt to connect to LaunchDarkly to pull updated values. If it is unable to connect, it will continue to use these bootstrapped values until it connects successfully.
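As a rough sketch, choosing between the two sources can be reduced to one small function. `chooseBootstrap` and `buildClientOptions` are hypothetical helpers, and the sketch assumes that your SDK's bootstrap option accepts either an object of flag values or the string "localStorage"; confirm this against the reference for your SDK version.

```typescript
type BootstrapFlags = Record<string, unknown>;
type BootstrapSource = BootstrapFlags | "localStorage";

// `injected` stands for whatever the server rendered into the page,
// for example window.__LD_FLAGS__. It may be missing or malformed.
function chooseBootstrap(injected: unknown): BootstrapSource {
  const isFlagObject =
    typeof injected === "object" && injected !== null && !Array.isArray(injected);
  // Prefer server-provided values; otherwise use values cached in localStorage.
  return isFlagObject ? (injected as BootstrapFlags) : "localStorage";
}

// Options object that would be passed to the client-side SDK.
function buildClientOptions(injected: unknown): { bootstrap: BootstrapSource } {
  return { bootstrap: chooseBootstrap(injected) };
}
```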
Some considerations include:
window.__LD_FLAGS__ = { ... } or using an endpoint that returns the JSON.
clientSideOnly = true, which ensures that only your client-side-available flags are returned. This prevents potentially sensitive server-side flags from being exposed.
localStorage and let the SDK handle everything for you. In client-side JavaScript-based SDKs that are version 4.x or greater, this will be the default behavior.
Sometimes part of the LaunchDarkly network is unreachable while other parts remain reachable, such as when the streaming endpoints are down while the polling endpoints are up. In these cases, we recommend having a quick, low-overhead method in place to control your SDK’s configuration without needing to do a full deploy that runs through your full CI/CD pipeline.
The default behavior requires you to edit the configuration in your code, re-deploy that code, and restart your SDK to make the change effective. However, it’s not always feasible to deploy. Instead, we recommend mapping environment variables to SDK options and then using those environment variables as authoritative values to populate your SDK’s configuration. This way, you can easily re-configure your SDK in response to external circumstances without needing to do a full deploy.
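The mapping might look like the sketch below. The option names in `SdkOptionsSketch` follow those used by the Node.js (server-side) SDK; substitute the equivalents for your SDK. The `LD_SEND_EVENTS`, `LD_POLL_INTERVAL_SECONDS`, and `LD_RELAY_URL` variable names and the `optionsFromEnv` helper are assumptions made for illustration.

```typescript
type LdMode = "streaming" | "polling" | "offline";

// Subset of SDK options used by this sketch.
interface SdkOptionsSketch {
  stream: boolean;
  offline: boolean;
  sendEvents: boolean;
  pollInterval?: number; // seconds
  baseUri?: string;
  streamUri?: string;
  eventsUri?: string;
}

function optionsFromEnv(env: Record<string, string | undefined>): SdkOptionsSketch {
  const raw = (env.LD_MODE ?? "streaming").toLowerCase();
  // Unknown values fall back to streaming rather than failing startup.
  const mode: LdMode = raw === "polling" || raw === "offline" ? raw : "streaming";

  const options: SdkOptionsSketch = {
    stream: mode === "streaming",
    offline: mode === "offline",
    sendEvents: env.LD_SEND_EVENTS !== "false",
  };

  // Only apply a polling interval when polling is selected and the value is valid.
  const pollSeconds = Number(env.LD_POLL_INTERVAL_SECONDS);
  if (mode === "polling" && Number.isFinite(pollSeconds) && pollSeconds > 0) {
    options.pollInterval = pollSeconds;
  }

  // Point all endpoints at a Relay Proxy when one is configured.
  const relayUrl = env.LD_RELAY_URL;
  if (relayUrl) {
    options.baseUri = relayUrl;
    options.streamUri = relayUrl;
    options.eventsUri = relayUrl;
  }
  return options;
}
```

At startup you would pass `optionsFromEnv(process.env)` to your SDK's initialization call, so that switching from streaming to polling requires only an environment change and a restart.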
Some considerations include:
LD_MODE with values streaming, polling, and offline. Map this to the configuration options in your SDK. You could also map other options, like polling interval, events send status, Relay Proxy URL, and so on.
We recommend that you use persistent data stores only in conjunction with the Relay Proxy and an infinite cache TTL. If you use persistent data stores alone, this can actually decrease resiliency.
The Relay Proxy is a lightweight service that can proxy all of your server-side SDK connections into a single long-lived connection to LaunchDarkly. Additionally, it can serve last-known flag values to the server-side SDKs without needing to connect to LaunchDarkly. Just point your server-side SDKs at your Relay Proxy. When paired with a persistent data store, a durable database storing last-known flag values outside of cache, this combination creates cold-start resilience when cached values are lost, even if LaunchDarkly is temporarily unreachable.
Some considerations include:
-1s. This is especially useful in cases where your persistent data store becomes temporarily unavailable. An infinite TTL will allow the Relay Proxy to continue serving flags to SDK clients, while updating the cache if it receives any flag updates from LaunchDarkly. When the persistent data store becomes available again, the Relay Proxy will write the cache contents back to the database. Avoid restarting the Relay Proxy while your data store is unavailable, as the Relay Proxy will only read from the database upon service startup.
localTtl key in your ld-relay.conf or by passing in the CACHE_TTL environment variable when starting the Relay Proxy with --from-env. More information is available within ld-relay/docs/persistent-storage.md.
initTimeout duration has passed. For more information, read initTimeout in ld-relay/docs/configuration.md.
ignoreConnectionErrors to true. If you set it to false, the Relay Proxy will not start once the timeout passes if LaunchDarkly is unreachable.
More information on the Relay Proxy and persistent data stores can be found in our docs:
In the event that LaunchDarkly becomes unreachable, you don’t want to scramble to make untested configuration changes and deploys that may not solve your problems. We recommend that you keep your Relay Proxies hot and actively receiving some small proportion of your server-side SDK traffic at all times, so that you can be ready to scale up for burst traffic at any time.
Some considerations include:
In many cases, analytics event loss is not critical as long as flags are delivering the correct values. However, when moving beyond general flag evaluation into more data-driven use cases, like Experimentation or guarded releases, dropped events can produce untrustworthy results and force you to restart experiments. While the Relay Proxy can queue incoming events from the SDKs and batch send them to LaunchDarkly, this capability is not designed for extended network interruptions. It is meant to pool events from many connections and send them to LaunchDarkly over a single connection.
Instead, we recommend standing up a dedicated events pipeline to handle ingestion, storing, and replaying of events. This would become the events endpoint for your SDKs and live between them and LaunchDarkly.
Some options are Vector, Fluent Bit, or Kafka.
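To make the store-and-replay idea concrete, here is a conceptual, in-memory sketch. `ReplayBuffer` is a hypothetical class written for illustration only; a production pipeline would rely on durable storage in one of the tools above rather than on process memory.

```typescript
// Opaque event payload as received from an SDK.
type EventPayload = Record<string, unknown>;

class ReplayBuffer {
  private pending: EventPayload[] = [];

  // Ingest: accept the event immediately, regardless of upstream health.
  ingest(event: EventPayload): void {
    this.pending.push(event);
  }

  // Replay: forward events in order. Stop at the first failure and keep the
  // remaining events for the next attempt. Returns the number delivered.
  replay(forward: (event: EventPayload) => boolean): number {
    let delivered = 0;
    while (this.pending.length > 0) {
      if (!forward(this.pending[0])) break;
      this.pending.shift();
      delivered++;
    }
    return delivered;
  }

  // Number of events still waiting to be delivered.
  get size(): number {
    return this.pending.length;
  }
}
```

The key property is that ingestion never depends on the upstream being reachable, and nothing is discarded until a forward attempt succeeds.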
This guide explained best practices for building resilient applications with LaunchDarkly SDKs. These practices help ensure consistent flag delivery and reliable behavior, even when parts of your system experience issues.