This guide shows how to use LaunchDarkly AgentControl configs to define prompt and model variations, evaluate them against real production traffic, and route to the best-performing option without a redeploy.
To complete this guide, you need the following:
LaunchDarkly treats AI configuration the same way it treats feature releases. A behavior loop using LaunchDarkly AgentControl features lets you iterate on AI model use in production with measurement and control at every stage.
The optimization loop has three phases:
In LaunchDarkly, create an AgentControl config for the agent or feature you want to optimize. Add variations for each configuration you want to compare, such as different models, prompts, temperature settings, or any parameter that affects performance or cost. To learn more, read AgentControl.
Do not specify a config variation in your code. Keeping your application code agnostic as to which variation a config serves lets you adjust behavior using config variation changes, while your code only executes the instructions it is given.
Start with the axes that have the biggest cost or quality impact, such as model selection and system prompt. Add parameter tuning in later iterations.
Before you send any production traffic to new variations, validate them against a benchmark dataset. LaunchDarkly’s offline evaluation tools let you run evaluations with defined judges, such as quality, cost, correctness, and safety, against each variation. This lets you surface differences in outputs and reconfigure the config variations if necessary without any real users involved.
Use this phase to eliminate variations that are clearly underperforming. This saves you cost and bad user experience when real users encounter your config variations in production.
To learn more, read Datasets.
Roll your surviving variations out to a percentage of real traffic with LaunchDarkly’s Experimentation. Define the metrics that constitute a win for your use case:
LaunchDarkly measures these metrics for each variation across real production traffic and displays results as they accumulate. To learn more, read Proving ROI with data-driven AI agent experiments.
When a variation reaches statistical significance, promote it as the new baseline. In LaunchDarkly, this updates the AgentControl config without a code change or redeploy. Traffic routes to the winning configuration immediately.
Don’t treat promotion as the end of the loop. Schedule the next iteration. New model releases, prompt refinements, and changed usage patterns are all reasons to run the cycle again.
Set a regular review cadence, such as monthly or after any significant change in traffic volume or user behavior, to keep your configs optimized as conditions evolve.
To continue, explore the following topics: