Agent Optimization: Discover better agent configurations automatically

Developers improve agents through iteration: change something, run a test, see whether it feels better, repeat. The process has a ceiling, though. You can only test configurations you thought to try, and each change starts from whatever you've already built. The best configuration you find through manual iteration is the best one you imagined, not necessarily the best one that exists. What if there were a way to improve agents automatically?

Agent Optimization is now available in beta in AgentControl for eligible customers. You start by defining acceptance criteria for your agent: what a good response looks like, how it should be structured, and what it should and shouldn't do. From there, Agent Optimization generates combinations of models, prompts, and hyperparameters to test, uses a judge model to evaluate each iteration against those criteria, and surfaces the configuration that performed best. The exploration isn't bounded by configurations anyone thought to try.
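To make the loop concrete, here is a minimal sketch of the general idea: sample candidate configurations, score each one with a judge against acceptance criteria, and keep the best. It is not AgentControl's implementation or API. The search strategy here is plain random search over a small grid, chosen only for illustration, and every name (`Config`, `run_agent`, `judge`, `optimize`, the model names and prompts) is hypothetical. The "agent" and "judge" are deterministic stand-ins so the example runs without any model calls.

```python
import itertools
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Config:
    """One candidate configuration (hypothetical fields)."""
    model: str
    prompt: str
    temperature: float


# Hypothetical search space: placeholder model names, prompts, and temperatures.
MODELS = ["model-small", "model-large"]
PROMPTS = ["Answer the customer's question.", "Start with the category, then explain."]
TEMPERATURES = [0.0, 0.7]
CATEGORIES = {"BILLING", "TECHNICAL", "ACCOUNT"}


def run_agent(config: Config, query: str) -> str:
    # Stand-in for a real agent call: the category-first prompt yields a
    # category-first answer; the generic prompt yields conversational preamble.
    if config.prompt.startswith("Start with the category"):
        return "BILLING: " + query
    return "Sure, happy to help! " + query


def judge(response: str, criteria: list[Callable[[str], bool]]) -> float:
    # Stand-in for a judge model: the fraction of criteria the response meets.
    return sum(check(response) for check in criteria) / len(criteria)


def optimize(queries: list[str], criteria, budget: int, seed: int = 0):
    # Random search: sample up to `budget` configurations from the full grid,
    # score each one on every query, and keep the highest mean score.
    rng = random.Random(seed)
    grid = [Config(*combo) for combo in itertools.product(MODELS, PROMPTS, TEMPERATURES)]
    results = {}
    for config in rng.sample(grid, k=min(budget, len(grid))):
        results[config] = mean(judge(run_agent(config, q), criteria) for q in queries)
    best = max(results, key=results.get)
    return best, results[best]


# Acceptance criteria expressed as simple checks (illustrative only).
def starts_with_category(response: str) -> bool:
    return response.split(":", 1)[0] in CATEGORIES


def no_preamble(response: str) -> bool:
    return not response.lower().startswith(("sure", "hello"))


best, score = optimize(["I was charged twice"], [starts_with_category, no_preamble], budget=8)
print(best.prompt, score)
```

In a real system the judge would be a model grading free-form criteria and the candidate generator would be smarter than random sampling; the sketch only shows how criteria, candidates, and scores fit together.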

A team building a customer support triage agent defines its acceptance criteria: accurate classification of support queries, and responses that open with the category rather than with preamble. The team triggers an optimization run, and three iterations later the winning configuration scores 0.95 on both acceptance criteria, latency has dropped from 8 seconds to 3.4 seconds (more than halved), and cost per run has fallen as well. The system found a configuration that is equally accurate, substantially faster, and cheaper. Manual iteration might eventually have reached the same result, or it might have sacrificed quality trying to get there.

Exploring the configuration space

Two optimization modes shape how the exploration runs. Exploratory mode suits agents with open-ended or unpredictable input spaces, where the goal is to map behavior across a wide range of inputs rather than to validate against known outputs. Expected Output mode suits agents whose correct behavior is already defined by input/output pairs, where the goal is to improve performance without regressing on what already works.
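The difference between the two modes comes down to what a candidate is measured against. The sketch below is a hypothetical illustration, not how AgentControl scores runs: `exploratory_score` has no reference answers and checks responses only against acceptance criteria, while `expected_output_check` compares against known input/output pairs and rejects any candidate that gets a pair wrong that the current baseline got right. Exact string matching stands in for whatever comparison a judge model would actually perform, and all function and agent names are assumptions.

```python
from typing import Callable

Agent = Callable[[str], str]
Check = Callable[[str], bool]

CATEGORIES = {"BILLING", "TECHNICAL", "ACCOUNT"}


def exploratory_score(agent: Agent, inputs: list[str], criteria: list[Check]) -> float:
    # No reference answers: each response is scored only against the criteria.
    scores = []
    for x in inputs:
        response = agent(x)
        scores.append(sum(check(response) for check in criteria) / len(criteria))
    return sum(scores) / len(scores)


def expected_output_check(candidate: Agent, baseline: Agent,
                          pairs: list[tuple[str, str]]) -> tuple[bool, float]:
    # Reference answers exist; exact match stands in for a judge comparison.
    baseline_ok = {i for i, (x, y) in enumerate(pairs) if baseline(x) == y}
    candidate_ok = {i for i, (x, y) in enumerate(pairs) if candidate(x) == y}
    # No regression: every pair the baseline got right must still be right.
    return baseline_ok <= candidate_ok, len(candidate_ok) / len(pairs)


# Hypothetical agents for illustration.
def baseline(q: str) -> str:
    return "BILLING" if "refund" in q.lower() else "TECHNICAL"


def candidate(q: str) -> str:
    if "refund" in q.lower():
        return "BILLING"
    if "password" in q.lower():
        return "ACCOUNT"
    return "TECHNICAL"


def regressing(q: str) -> str:
    return "ACCOUNT"


pairs = [
    ("Please refund my order", "BILLING"),
    ("The app crashes on login", "TECHNICAL"),
    ("I forgot my password", "ACCOUNT"),
]
print(expected_output_check(candidate, baseline, pairs))   # (True, 1.0)
print(expected_output_check(regressing, baseline, pairs))  # regresses on two pairs

tagged = lambda q: f"{candidate(q)}: {q}"
criteria = [lambda r: r.split(":", 1)[0] in CATEGORIES, lambda r: len(r) < 200]
print(exploratory_score(tagged, ["hello??", "my card was declined twice"], criteria))
```

The subset check is what encodes "without regressing": a candidate can score higher overall and still be rejected if it breaks a case the baseline already handled.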

From optimization to production

When a run surfaces a winning configuration, it can be promoted directly from the results view to a config. The configuration that passed optimization ships as a variation, and from that point, the rest of the AgentControl system applies: guarded rollout to ramp traffic progressively, online quality scoring through AI Insights to track how it performs in production, and the same controls that govern any other configuration change.
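A guarded rollout generally means raising the share of traffic on a new variation in stages and stopping if quality degrades. The sketch below shows that generic pattern only; it does not describe AgentControl's rollout mechanism, and `serves_new_variation`, `guarded_rollout`, the stage percentages, and the quality threshold are all hypothetical. Users are bucketed with a stable hash so that raising the percentage only adds users to the new variation rather than reshuffling who sees it.

```python
import hashlib
from typing import Callable


def serves_new_variation(user_id: str, percent: int) -> bool:
    # Stable bucketing: the same user always lands in the same bucket (0-99),
    # so raising `percent` only adds users and never reshuffles existing ones.
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent


def guarded_rollout(stages: list[int], observed_quality: Callable[[int], float],
                    threshold: float) -> tuple[int, str]:
    # Walk through increasing traffic percentages; roll back to 0% at the
    # first stage where the online quality score falls below the threshold.
    for percent in stages:
        if observed_quality(percent) < threshold:
            return 0, f"rolled back at {percent}%"
    return stages[-1], "fully rolled out"


# Hypothetical quality signals for illustration.
print(guarded_rollout([5, 25, 50, 100], lambda p: 0.95, 0.9))
# (100, 'fully rolled out')
print(guarded_rollout([5, 25, 50, 100], lambda p: 0.95 if p < 50 else 0.85, 0.9))
# (0, 'rolled back at 50%')
```

In practice the quality signal would come from production scoring rather than a lambda, but the shape is the same: each stage is a checkpoint, and a failing checkpoint halts the ramp.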

This is where the offline optimization run connects to the production system. The configuration that performed best in a controlled environment is now tested against real traffic, and those production results show how the agent actually behaves. That path, from controlled evaluation to production measurement, is what AgentControl is building toward. Today, teams trigger the run and review the results. The longer-term goal is agents that surface their own improvement opportunities from what happens in production and apply them, without waiting on a manual iteration cycle.