Enable self-healing systems with runtime controls

This guide shows the LaunchDarkly patterns that let your system detect degradation and respond automatically to mitigate failures. These responses include rolling back code, swapping AI models, and rerouting traffic, all without a human in the loop.

Self-healing is when a system detects a problem and works to correct it without human intervention. By the end of this guide, you will understand a remediation pattern for problems with both traditional code releases and AI agent deployments.

Prerequisites

To complete this guide, you must have the following:

  • A LaunchDarkly account.
  • A LaunchDarkly SDK installed and initialized in your application. To learn more, read SDK.
  • A basic understanding of feature flags and flag targeting in LaunchDarkly. To learn more, read Targeting.

How self-healing works in LaunchDarkly

Self-healing relies on two capabilities working together:

  • Runtime control: your code and agents run behind feature flags or AgentControl configs, so you can change their behavior instantly without a redeploy.
  • Automated remediation: metrics-based thresholds or monitoring integrations trigger flag changes automatically when degradation is detected.

LaunchDarkly provides two paths depending on what you control:

| Path | What you control | Related LaunchDarkly features |
| --- | --- | --- |
| CodeControl | Traditional application code, services, infrastructure behavior | Feature flags, Guarded releases, flag automation |
| AgentControl | AI agents, LLM prompts, model selection, routing logic | AgentControl configs, model evaluation, automated config updates |

Work through the path that fits your use case, or both.

CodeControl: Self-healing for application code

This section describes how to facilitate self-healing in your app code.

Step 1: Wrap every significant change in a feature flag

A key organizing principle is to wrap any code change you want runtime control over in a feature flag. This is a requirement for automated remediation because you can only roll back changes you control with a flag.

Here is an example of a payment-processor change wrapped in a feature flag:

Python
# Evaluate the flag for the current context before new code runs.
# Assumes ld_client is an initialized SDK client and context is the evaluation context.
if ld_client.variation("new-payment-processor", context, False):
    result = new_payment_processor.charge(order)
else:
    result = legacy_payment_processor.charge(order)

Every behavior you want to auto-remediate must be behind a flag. When you create a flag, you also define what should happen if you have to revert the change or if LaunchDarkly is unavailable. This is called your flag’s fallback value. If your code runs without a fallback value defined, you have nothing to roll back to if something goes wrong.

To learn more, read Fallback value.
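
To make the role of the fallback value concrete, here is a minimal sketch that uses a hypothetical in-memory stand-in for the SDK client. `StubFlagClient`, `choose_processor`, and the context shape are assumptions for illustration and are not part of the LaunchDarkly SDK. The sketch assumes the third argument to `variation` is the value returned when the flag cannot be evaluated, as in the example above.

```python
class StubFlagClient:
    """Hypothetical stand-in for an SDK client, for illustration only."""

    def __init__(self, flags=None, available=True):
        self._flags = flags or {}
        self._available = available

    def variation(self, key, context, fallback):
        # Return the fallback when the flag service is unreachable
        # or the flag does not exist.
        if not self._available or key not in self._flags:
            return fallback
        return self._flags[key]


def choose_processor(client, context):
    # False is the fallback: if evaluation fails, stay on the legacy path.
    if client.variation("new-payment-processor", context, False):
        return "new"
    return "legacy"


context = {"key": "user-123"}  # hypothetical context shape
healthy = StubFlagClient(flags={"new-payment-processor": True})
outage = StubFlagClient(flags={"new-payment-processor": True}, available=False)

healthy_path = choose_processor(healthy, context)
outage_path = choose_processor(outage, context)
```

Choosing `False` as the fallback means an outage sends traffic to the legacy processor, which is the known-good path in this example.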

Step 2: Define your guardrails

In LaunchDarkly, connect your flag to the metrics that matter for this change. Define the thresholds that indicate a problem, such as error rate, latency p99, conversion drop, or a custom business metric.

Configure this in LaunchDarkly, on your flag’s Guarded releases page. Set the following:

  • The metric to monitor, such as error rate or latency.
  • The threshold that triggers a remediation action.
  • The action to take when the threshold is crossed. For example, you might want to roll the change back to 0%, disable the flag, or notify an on-call engineering team.
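
The three settings above can be pictured as a small data structure. The following is a sketch only: the `Guardrail` class, the `Action` names, and the threshold values are hypothetical and do not reflect LaunchDarkly's configuration format, which you set in the UI.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ROLL_BACK = "roll_back"
    DISABLE_FLAG = "disable_flag"
    NOTIFY_ON_CALL = "notify_on_call"


@dataclass(frozen=True)
class Guardrail:
    metric: str                   # the metric to monitor
    threshold: float              # the value that triggers remediation
    action: Action                # what to do when the threshold is crossed
    higher_is_worse: bool = True  # False for metrics like conversion rate

    def is_breached(self, observed: float) -> bool:
        if self.higher_is_worse:
            return observed > self.threshold
        return observed < self.threshold


# Illustrative values only; pick thresholds that fit your own service.
guardrails = [
    Guardrail("error_rate", 0.02, Action.ROLL_BACK),
    Guardrail("latency_p99_ms", 800.0, Action.NOTIFY_ON_CALL),
    Guardrail("conversion_rate", 0.03, Action.DISABLE_FLAG, higher_is_worse=False),
]


def breached_actions(observed):
    """Return the actions for every guardrail breached by the observed metrics."""
    return [
        g.action
        for g in guardrails
        if g.metric in observed and g.is_breached(observed[g.metric])
    ]
```

Recording the direction of each metric matters: an error rate is a problem when it rises, while a conversion rate is a problem when it falls.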

Plan availability: Guarded releases is available to customers on Business and Enterprise plans. To learn more, read Guarded releases.

Step 3: Roll out gradually and observe

Release the change gradually with percentage rollouts, rather than a full release to 100% of your user base. Start at a small percentage of traffic, such as 5% to 10%, and let LaunchDarkly observe the metrics you connected in Step 2 before you expand the release to a larger audience.
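
For intuition about how a percentage rollout can assign users consistently, here is an illustrative hash-based bucketing sketch. It is an assumption for teaching purposes, not LaunchDarkly's actual bucketing algorithm, and `in_rollout` is a hypothetical helper.

```python
import hashlib


def in_rollout(flag_key, context_key, percent):
    """Return True if this context falls inside the rollout percentage.

    The same flag and context always map to the same bucket, so a user
    who is included at 5% stays included when the rollout grows to 10%.
    """
    digest = hashlib.sha256(f"{flag_key}:{context_key}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000  # bucket in the range 0..9999
    return bucket < percent * 100


users = [f"user-{i}" for i in range(1000)]
at_5 = {u for u in users if in_rollout("new-payment-processor", u, 5)}
at_10 = {u for u in users if in_rollout("new-payment-processor", u, 10)}
```

Because buckets are stable, expanding the percentage only adds users. It never moves someone who already has the new behavior back to the old one.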

If a threshold you defined is crossed at any rollout stage, LaunchDarkly halts the rollout and takes the remediation action you configured in Step 2.
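
The halt-and-remediate behavior can be sketched as a simple loop. This is a simplified model under stated assumptions: `run_guarded_rollout` and `observe_error_rate` are hypothetical names, the only metric is error rate, and the only remediation is rolling back to 0%. LaunchDarkly performs the real monitoring for you.

```python
def run_guarded_rollout(stages, observe_error_rate, max_error_rate):
    """Advance through rollout stages and roll back to 0% on a breach.

    stages: increasing rollout percentages, such as [5, 25, 50, 100]
    observe_error_rate: callable that takes a percentage and returns
        the error rate seen at that stage
    Returns the final rollout percentage and the observation history.
    """
    current = 0
    history = []
    for percent in stages:
        error_rate = observe_error_rate(percent)
        history.append((percent, error_rate))
        if error_rate > max_error_rate:
            return 0, history  # halt and roll back
        current = percent
    return current, history


def degrading(percent):
    # Simulated metric: the change misbehaves once it reaches 50% of traffic.
    return 0.01 if percent < 50 else 0.08


final_percent, history = run_guarded_rollout(
    [5, 25, 50, 100], degrading, max_error_rate=0.02
)
```

In this run the rollout passes the 5% and 25% stages, breaches the threshold at 50%, and never reaches 100%.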

Rolling out a change incrementally behind a feature flag, so that the change either succeeds or is remediated automatically, is how LaunchDarkly helps you release code safely.

AgentControl: Self-healing for AI agents

This section describes how to facilitate self-healing when using AI agents.

Step 1: Define your agent’s behavior in AgentControl configs

Move your agent’s prompts, model selection, and routing configuration into LaunchDarkly AgentControl configs instead of hardcoding them. This gives you runtime control over what the agent does without requiring a redeploy.

Here is an example of retrieving the active AgentControl config in Python:

Python
# Fetch the active AgentControl config for this agent.
# Assumes aiclient, llm_client, context, and default_config are defined elsewhere.
ai_config = aiclient.agent_config("support-agent-config", context, default_config)

response = llm_client.complete(
    model=ai_config["model"],
    prompt=ai_config["system_prompt"],
    user_input=user_message,
)
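
The example above passes a `default_config` without showing it. Here is one possible shape, assuming the config behaves like a dictionary with `model` and `system_prompt` keys as the example suggests. The values and the `resolve_config` helper are hypothetical and are not part of any LaunchDarkly SDK.

```python
# Hypothetical known-good defaults used when no config can be retrieved.
DEFAULT_CONFIG = {
    "model": "known-good-model",
    "system_prompt": "You are a helpful support agent.",
}


def resolve_config(fetched, default):
    """Fill in any missing or empty keys from the default config."""
    resolved = dict(default)
    if fetched:
        resolved.update({k: v for k, v in fetched.items() if v})
    return resolved


partial = resolve_config({"model": "new-model", "system_prompt": ""}, DEFAULT_CONFIG)
missing = resolve_config(None, DEFAULT_CONFIG)
```

Keeping a complete default in code means the agent always has a model and a prompt to run with, even if the retrieved config is partial.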

If you want to simulate the outcomes of different inputs on different models, create a playground. You can upload a dataset or adjust different thresholds and prompt options to configure results.

To learn more, read Playgrounds.

Step 2: Use judges to evaluate live performance

Judges are automated evaluators that score your agent’s behavior on the dimensions that matter. You can use pre-defined judges, or bring your own custom judges to gauge dimensions like quality, cost, latency, correctness, or safety. Connect these evaluations to your AgentControl config as metrics.

To learn more, read Judges.

AgentControl uses these evaluation scores to trigger automated remediation, just like CodeControl uses metric thresholds.
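
One way to think about turning evaluation scores into a trigger is a rolling average compared against a minimum. The sketch below is an assumption about how such a check could work, not a description of how AgentControl computes it. `RollingScore` and its parameters are hypothetical.

```python
from collections import deque
from statistics import mean


class RollingScore:
    """Track recent judge scores and flag sustained degradation."""

    def __init__(self, window, min_score):
        self._scores = deque(maxlen=window)  # keeps only the latest scores
        self._window = window
        self._min_score = min_score

    def record(self, score):
        self._scores.append(score)

    def is_degraded(self):
        # Wait for a full window so one bad response does not trigger remediation.
        if len(self._scores) < self._window:
            return False
        return mean(self._scores) < self._min_score


tracker = RollingScore(window=3, min_score=0.7)
for score in (0.9, 0.9, 0.9):
    tracker.record(score)
healthy = tracker.is_degraded()

for score in (0.2, 0.3):
    tracker.record(score)
degraded = tracker.is_degraded()
```

Averaging over a window trades reaction speed for stability: a larger window reacts more slowly but is less likely to fire on a single outlier.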

Step 3: Configure automated rerouting

Define what LaunchDarkly should do when evaluation scores degrade. For example, you could:

  • Swap to a fallback model.
  • Roll back to a different prompt configuration that you know is good.
  • Reroute the request to a different agent path.

Configure these responses in your AgentControl config’s automation settings in the LaunchDarkly UI.
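
The rerouting options above amount to an ordered list of alternatives. As a rough sketch, assuming each candidate config has a `key` and that degraded configs are tracked in a set, selection could look like this. `select_config` and the config contents are hypothetical.

```python
def select_config(candidates, degraded_keys):
    """Return the first candidate whose key is not marked degraded.

    candidates: configs in order of preference, primary first
    degraded_keys: keys of configs whose evaluation scores have degraded
    """
    for config in candidates:
        if config["key"] not in degraded_keys:
            return config
    # Everything is degraded: fall back to the last, most conservative option.
    return candidates[-1]


candidates = [
    {"key": "primary", "model": "model-a", "system_prompt": "current prompt"},
    {"key": "fallback", "model": "model-b", "system_prompt": "known-good prompt"},
]

normal = select_config(candidates, set())
rerouted = select_config(candidates, {"primary"})
```

Ordering the list by preference keeps the policy easy to read: remediation moves down the list, and recovery moves back up once a config is no longer marked degraded.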

Verify your remediation loop

Before you rely on automated remediation in production, test the full loop. Here’s how:

  1. Trigger a threshold violation in a non-production environment by injecting errors, simulating latency, or degrading evaluation scores.
  2. Confirm LaunchDarkly detects the threshold crossing.
  3. Confirm the configured remediation action executes by checking that the flag rolls back, the model swaps, or traffic reroutes.
  4. Confirm your application responds to the flag change without requiring a redeploy.

If remediation doesn’t fire, check that your metrics source is connected and reporting, and that your flag evaluation uses the correct context.
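
The four verification steps above can be rehearsed locally before you try them against a real environment. The sketch below uses a hypothetical in-memory flag store and monitor. `FakeFlagStore`, `monitor`, and `handle_request` are stand-ins for illustration and do not call LaunchDarkly.

```python
class FakeFlagStore:
    """Hypothetical in-memory flag store for rehearsing the loop."""

    def __init__(self):
        self.flags = {"new-payment-processor": True}
        self.events = []

    def variation(self, key, fallback):
        return self.flags.get(key, fallback)

    def turn_off(self, key, reason):
        self.flags[key] = False
        self.events.append((key, reason))


def monitor(store, key, outcomes, max_error_rate):
    """Turn the flag off if the error rate in outcomes exceeds the threshold."""
    if not outcomes:
        return False
    error_rate = outcomes.count("error") / len(outcomes)
    if error_rate > max_error_rate:
        store.turn_off(key, f"error_rate={error_rate:.2f}")
        return True
    return False


def handle_request(store):
    # The application re-evaluates the flag on every request.
    return "new" if store.variation("new-payment-processor", False) else "legacy"


store = FakeFlagStore()
before = handle_request(store)

injected = ["ok"] * 6 + ["error"] * 4  # step 1: inject errors
fired = monitor(store, "new-payment-processor", injected, max_error_rate=0.05)  # steps 2 and 3
after = handle_request(store)  # step 4: no redeploy needed
```

The important property is that `handle_request` changes behavior between the two calls without any code change, which is what step 4 asks you to confirm in your own application.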

Next steps

To continue, explore the following topics:

  • Guarded releases to configure automated rollback thresholds.
  • AgentControl for full agent runtime control.
  • Metrics to feed signals into your remediation logic.
  • Targeting to control rollout scope before automating remediation.