Code can pass every test, clear every review, and still cause an incident the moment it hits production. That’s because “correctness” is a property of code in isolation. Production-readiness, on the other hand, is a property of code in context: under real load, against real users, interacting with systems that weren't part of the test suite. Most tooling checks only the first. AI-powered SRE agents are getting very good at identifying when something is wrong in production. What they haven't solved, however, and what most teams have dramatically underinvested in, is what happens after the agent knows.
An agent that can identify a problem but can't act on it is, at best, a sophisticated pager. And while most teams have invested heavily in the detection side, they have treated the action side as an afterthought. That's a mistake. The agents actually changing how teams operate are the ones that surface the right lever at the right moment, recommending what to do before a problem ships and after one lands, so that when something goes wrong, the human in the loop isn't starting from scratch. The AWS DevOps Agent is a good example of what that looks like in practice, and it's a big part of why we built the LaunchDarkly MCP Server to connect to it.
The tool consistently used to accomplish this is the feature flag. That's not a coincidence. Flags have always been one of the most precise levers available in production. They are instant, reversible, and work at runtime, with no redeployment required. With flags, teams can contain behavioral regressions in a specific user segment while everything else keeps running normally, or they can wrap a risky change in a flag before it ships, so there is a kill switch the moment something goes wrong. You or your agent can disable a misbehaving feature in milliseconds without touching the codebase. Nothing else in the operations toolkit gives you that combination of speed and granularity with the scale and reliability of a proven enterprise-grade platform.
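To make the kill-switch idea concrete, here is a minimal sketch in plain Python. It is not the LaunchDarkly SDK: the `Flag` class, the segment names, and the `checkout` function are all hypothetical, and a real platform would evaluate the flag through its own SDK rather than an in-memory object. The sketch only illustrates the two levers described above: containing a regression in one segment, and turning a feature off for everyone without a redeploy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical in-memory flag, for illustration only."""
    key: str
    enabled: bool = True  # global kill switch
    disabled_segments: set[str] = field(default_factory=set)

    def is_on(self, segment: str) -> bool:
        # The kill switch wins; otherwise only contained segments are off.
        return self.enabled and segment not in self.disabled_segments


def checkout(flag: Flag, segment: str) -> str:
    # The risky code path runs only while the flag evaluates to on.
    return "new-checkout" if flag.is_on(segment) else "legacy-checkout"


flag = Flag(key="new-checkout-flow")
flag.disabled_segments.add("eu-beta")  # contain a regression in one segment
print(checkout(flag, "eu-beta"))       # legacy-checkout
print(checkout(flag, "us-general"))    # new-checkout
flag.enabled = False                   # flip the kill switch for everyone
print(checkout(flag, "us-general"))    # legacy-checkout
```

The point of the design is that both changes are data changes, not code changes, which is why they can take effect at runtime.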
AI has made that precision more necessary than it has ever been. Teams ship more code faster, but with less deliberate human review at each step. The surface area of what can go wrong in production at any given moment is vastly larger than it used to be. And with that, the nature of failures has changed. It's not always a crash or an error spike. It's a model returning subtly worse outputs. It’s programmatic drift. It's an agent behaving outside its expected parameters. These are the kinds of subtle shifts that don't show up in error rates but absolutely show up in customer experience. Traditional incident response tools weren't built for that class of problem. Flags, along with LaunchDarkly's best-in-industry goal-seeking experimentation, robust observability, and operational control with AgentControl and Guarded Releases, were.
The value of a flag recommendation depends almost entirely on how quickly the agent can surface it. When something goes wrong, the clock is running. Every minute spent correlating signals, identifying the likely cause, and figuring out which flag to toggle is a minute of customer impact. What changes with a well-connected SRE agent isn't whether a human makes the final call; it's how fast the right information reaches them. The agent collapses the gap between “something's wrong” and “here's exactly what to do about it.” The human executes with confidence rather than uncertainty.
The LaunchDarkly MCP Server is how we make that possible. It exposes LaunchDarkly flag operations through the Model Context Protocol (MCP), so any MCP-compatible agent can interact with your full flag infrastructure without custom integration work. Through the LaunchDarkly MCP Server, agents can query flag state, identify the relevant flag for a given issue, surface a toggle recommendation with full context for the operator, and provide a complete timestamped audit trail of every change. All of it is available within the same workflow where the agent identified the issue: no context switching, no separate dashboard, and no approval chain that breaks the agent's momentum.
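For readers who have not looked at MCP on the wire, the sketch below builds the kind of message an agent sends when it invokes a server tool. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. Everything specific to LaunchDarkly here is an assumption for illustration: the tool name `get-flag` and the argument names `projectKey` and `flagKey` are placeholders, so check the server's own tool listing for the real ones.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to carry an id.
_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# "get-flag" and its argument names are hypothetical placeholders.
request = tool_call(
    "get-flag",
    {"projectKey": "default", "flagKey": "new-checkout-flow"},
)
print(request)
```

In practice an MCP client library handles this framing; the sketch is only meant to show that a flag query is an ordinary structured request the agent can make inline.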
The LaunchDarkly MCP documentation has complete setup and usage information for enabling this functionality in your account.
With the LaunchDarkly MCP Server connected to the AWS DevOps Agent, the integration works across two distinct workflows. During release management, a release readiness review identifies higher-risk changes, such as policy violations, unsafe access-control expansions, and dependency risks, and recommends wrapping them in a LaunchDarkly feature flag before they ship. The developer gets a recommendation with context: why the change is flagged as high-risk and what flag configuration would give the team a kill switch if something goes wrong in production.
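As a rough illustration of the release-management workflow, the sketch below maps a review finding to a flag recommendation. It is not the agent's actual decision logic. The category identifiers, the `Finding` and `FlagRecommendation` types, the `guard-` key prefix, and the choice to ship the change off by default are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Risk categories named in the text; the identifiers are hypothetical.
HIGH_RISK = {"policy_violation", "access_control_expansion", "dependency_risk"}


@dataclass
class Finding:
    change_id: str
    category: str
    detail: str


@dataclass
class FlagRecommendation:
    flag_key: str
    default_on: bool
    reason: str


def recommend_flag(finding: Finding) -> Optional[FlagRecommendation]:
    """Return a flag suggestion for high-risk findings, or None otherwise."""
    if finding.category not in HIGH_RISK:
        return None
    return FlagRecommendation(
        flag_key=f"guard-{finding.change_id}",
        default_on=False,  # assumption: ship dark, enable deliberately
        reason=f"{finding.category}: {finding.detail}",
    )


rec = recommend_flag(
    Finding("pr-1234", "access_control_expansion", "role gains wildcard write")
)
print(rec)
```

The recommendation carries its own reason string, so the developer sees why the change was flagged alongside the suggested configuration.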
During incident response, the agent takes a different posture. Rather than reaching immediately for a full rollback, which is blunt, disruptive, and slow, the agent queries the MCP Server, when the situation requires it, to identify relevant flags and recommends disabling them as a first-line containment option. The operator gets the right flag, the right context, and a clear recommended action. A targeted flag toggle can contain the blast radius of a behavioral issue in milliseconds, buying the team time to diagnose and fix the root cause without taking down the whole system. That keeps a software problem from becoming a customer problem. When the fix is ready, the agent can propose a phased re-enablement plan rather than a single all-or-nothing restore. Together, the LaunchDarkly MCP Server and the AWS DevOps Agent help teams achieve higher uptime and availability, along with easy zero-downtime changes and operational control.
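A phased re-enablement plan can be as simple as an increasing schedule of traffic percentages with a hold period between steps. The sketch below is one hypothetical shape for such a plan. The `Phase` type, the default steps of 5, 25, 50, and 100 percent, and the 15-minute hold are illustrative values, not recommendations from LaunchDarkly or AWS.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    percent: int       # share of traffic with the flag back on
    hold_minutes: int  # how long to watch metrics before the next step


def reenablement_plan(steps=(5, 25, 50, 100), hold_minutes=15) -> list[Phase]:
    """Build an increasing rollout schedule that ends at 100 percent."""
    if (
        not steps
        or list(steps) != sorted(set(steps))  # strictly increasing, no repeats
        or steps[0] <= 0
        or steps[-1] != 100
    ):
        raise ValueError("steps must be positive, strictly increasing, and end at 100")
    # The final phase needs no hold: the flag is fully restored.
    return [Phase(p, 0 if p == 100 else hold_minutes) for p in steps]


for phase in reenablement_plan():
    print(f"{phase.percent:>3}% for {phase.hold_minutes} min")
```

Each hold is the window in which the operator, or the agent, checks that the original symptom has not returned before widening exposure.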
For a detailed walkthrough of the decision logic behind both workflows, including a sample skill, see the AWS companion post.

