
AgentControl CI/CD Pipeline: Automated Quality Gates and Safe Deployment


Published November 10th, 2025


by Scarlett Attensil

Your deployment shouldn’t fail because a config is misconfigured. And you shouldn’t wait until production rollout to discover your new prompt performs worse than the old one.

This CI/CD pipeline, implemented with GitHub Actions, catches config issues before they break your deployment and tests prompt changes against your golden dataset before you start a guarded release.

With AgentControl, you get:

  • Instant rollback via UI (no redeploy needed)
  • Real-time config updates (change models, prompts, thresholds without code changes)
  • Progressive rollout with targeting rules and percentage-based deployment

Those controls manage safe deployment after merge. This CI/CD pipeline adds quality gates before merge, so you catch config errors and quality regressions in PRs instead of production.

This is a conceptual guide. For hands-on setup instructions to run this pipeline locally or add it to your project, see the ld-aic-cicd repository for installation, usage examples, and detailed documentation.

What This Pipeline Does

1. Validate that AgentControl configs exist in LaunchDarkly

You’ll see a table in your terminal showing which configs are properly set up:

┌─────────────────┬──────────┬───────────────┬───────────┬───────┐
│ Config Key      │ Status   │ Model         │ Provider  │ Tools │
├─────────────────┼──────────┼───────────────┼───────────┼───────┤
│ security-agent  │ ✅ Valid │ claude-3-5-h… │ Anthropic │ 0     │
│ support-agent   │ ✅ Valid │ gpt-4o        │ OpenAI    │ 2     │
└─────────────────┴──────────┴───────────────┴───────────┴───────┘

2. Test AI response quality against your golden dataset

You’ll see terminal output showing how each config variation performs across different user contexts:

Judge Evaluation Results (Variation Comparison)
┌──────────────────┬───────────┬─────────┬─────────────┐
│ Variation        │ Avg Score │ Min/Max │ Avg Latency │
├──────────────────┼───────────┼─────────┼─────────────┤
│ premium-users    │ 9.2/10    │ 8.5/9.8 │ 1200ms      │
│ free-tier        │ 8.8/10    │ 7.5/9.5 │ 950ms       │
│ enterprise       │ 9.5/10    │ 9.0/9.9 │ 1850ms      │
└──────────────────┴───────────┴─────────┴─────────────┘

3. Sync config defaults

Keep your code’s fallback defaults in sync with production. A nightly job detects drift and creates a PR when configs change in LaunchDarkly.

[Figure: Automated sync job creating a PR when production configs drift from code defaults]

4. Deploy safely with LaunchDarkly’s guardrails

After merge, you’ll use dark launch (test with internal users first) and guarded rollouts (automatic quality monitoring) to gradually increase traffic. If quality degrades, LaunchDarkly alerts you or pauses the rollout.

How the Pipeline Works

Stage 1: Validation

The pipeline starts by verifying that your AgentControl configs actually exist in LaunchDarkly and are configured correctly. This catches basic setup issues early, before they reach your test environment or production.

Validation scans your codebase for config references (like ai_configs.* or ai_client.agent(key="security-agent")), then checks LaunchDarkly to confirm:

  • The config key exists in your project
  • It’s enabled (not disabled or archived)
  • Required fields are present (model, provider, instructions)
  • The configuration is well-formed

If validation fails, your CI/CD pipeline stops here. This prevents deploying code that references missing or broken AgentControl configs.
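The scan-then-check flow described above can be sketched roughly as follows. This is a minimal illustration, not the ld-aic-cicd implementation: the regular expression, the helper names (find_config_keys, validate_config), and the shape of the remote_configs dictionary are all assumptions, and the dictionary stands in for data that the real pipeline would fetch from LaunchDarkly.

```python
import re

# Hypothetical pattern: matches calls such as ai_client.agent(key="security-agent").
# The real scanner may recognize other reference styles as well.
CONFIG_KEY_PATTERN = re.compile(r"""\.agent\(\s*key\s*=\s*["']([\w.-]+)["']""")

# Required fields as listed in this guide.
REQUIRED_FIELDS = ("model", "provider", "instructions")


def find_config_keys(source: str) -> set[str]:
    """Return every config key referenced in a piece of source code."""
    return set(CONFIG_KEY_PATTERN.findall(source))


def validate_config(key: str, remote: dict[str, dict]) -> list[str]:
    """Return a list of problems for one key; an empty list means valid."""
    config = remote.get(key)
    if config is None:
        return [f"{key}: not found in project"]
    problems = []
    if not config.get("enabled", False):
        problems.append(f"{key}: disabled or archived")
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"{key}: missing required field '{field}'")
    return problems


# Stand-in for data fetched from LaunchDarkly (the shape is an assumption).
remote_configs = {
    "security-agent": {
        "enabled": True,
        "model": "example-model",
        "provider": "Anthropic",
        "instructions": "Review the request for security issues.",
    },
}

source = '''
agent = ai_client.agent(key="security-agent")
other = ai_client.agent(key="support-agent")
'''

problems = []
for key in sorted(find_config_keys(source)):
    problems.extend(validate_config(key, remote_configs))

# A non-empty list is what would fail the CI step.
print(problems)
```

Returning a list of problems rather than raising on the first one lets a single CI run report every broken reference at once.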

Stage 2: Quality Testing

The pipeline tests all config variations across different user contexts (premium users, free-tier, enterprise, etc.) using an LLM-as-judge. This produces quality scores, latency metrics, and variation comparisons to see which config performs best for which user segment.

The testing stage evaluates responses against your quality thresholds: accuracy scores, error rates, and latency limits. The pipeline blocks your PR if quality falls below thresholds. Once tests pass, the pipeline moves to syncing defaults and preparing for deployment.
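As a rough sketch of how such a gate could work, the example below groups judge results by variation, computes the summary columns, and lists the variations that miss a threshold. The CaseResult type, the threshold values, and the sample scores are invented for illustration; they are not taken from the ld-aic-cicd repository or from real test runs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    variation: str
    score: float       # judge score on a 0-10 scale
    latency_ms: float


# Hypothetical thresholds; choose values that suit your application.
MIN_AVG_SCORE = 8.0
MAX_AVG_LATENCY_MS = 2000.0


def summarize(results: list[CaseResult]) -> dict[str, dict[str, float]]:
    """Group results by variation and compute the summary columns."""
    grouped: dict[str, list[CaseResult]] = {}
    for result in results:
        grouped.setdefault(result.variation, []).append(result)
    return {
        name: {
            "avg_score": mean(r.score for r in rows),
            "min_score": min(r.score for r in rows),
            "max_score": max(r.score for r in rows),
            "avg_latency_ms": mean(r.latency_ms for r in rows),
        }
        for name, rows in grouped.items()
    }


def failing_variations(summary: dict[str, dict[str, float]]) -> list[str]:
    """Return the variations that fall below the quality thresholds."""
    return sorted(
        name
        for name, stats in summary.items()
        if stats["avg_score"] < MIN_AVG_SCORE
        or stats["avg_latency_ms"] > MAX_AVG_LATENCY_MS
    )


# Made-up sample results, two test cases per variation.
results = [
    CaseResult("premium-users", 9.0, 1100.0),
    CaseResult("premium-users", 9.5, 1300.0),
    CaseResult("free-tier", 7.0, 900.0),
    CaseResult("free-tier", 8.0, 1000.0),
]

summary = summarize(results)
failed = failing_variations(summary)

# Any entry here would block the PR.
print(failed)
```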

Stage 3: Config Sync and Drift Detection

An important part of this pipeline is keeping your code’s default config values in sync with what’s actually running in LaunchDarkly production. This serves two purposes:

  • Runtime fallback when LaunchDarkly is unavailable
  • Drift detection when production configs change

First, the pipeline generates a Python module (configs/production_defaults.py) that you commit to git. This becomes your source of truth for default values. Then, a nightly sync job checks for drift by comparing your code’s defaults against LaunchDarkly production. When someone changes a config in production, the sync job detects the difference and creates a PR to update your code.

Why this matters: Your application imports these defaults as fallback behavior when LaunchDarkly is unavailable. Keeping them in sync with production means your fallback behavior matches production, not stale values from weeks ago. The drift detection also keeps your team aware of production changes.
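Both purposes can be illustrated with a short sketch. Everything here is an assumption made for the example: the CODE_DEFAULTS dictionary stands in for the generated configs/production_defaults.py module (whose real structure may differ), the placeholder model names are invented, and detect_drift and config_with_fallback are hypothetical helpers rather than functions from the repository.

```python
from typing import Any, Callable

Config = dict[str, Any]

# Stand-in for the committed defaults module.
CODE_DEFAULTS: dict[str, Config] = {
    "support-agent": {"model": "gpt-4o", "provider": "OpenAI"},
    "security-agent": {"model": "model-a", "provider": "Anthropic"},
}


def detect_drift(
    code: dict[str, Config], production: dict[str, Config]
) -> dict[str, dict[str, tuple]]:
    """Map each drifted key to {field: (code_value, production_value)}."""
    drift = {}
    for key in sorted(set(code) | set(production)):
        old, new = code.get(key, {}), production.get(key, {})
        changed = {
            field: (old.get(field), new.get(field))
            for field in sorted(set(old) | set(new))
            if old.get(field) != new.get(field)
        }
        if changed:
            drift[key] = changed
    return drift


def config_with_fallback(key: str, fetch: Callable[[str], Config]) -> Config:
    """Use the live config when reachable, else the committed default."""
    try:
        return fetch(key)
    except ConnectionError:
        return CODE_DEFAULTS[key]


# Pretend someone changed the security agent's model in production.
production = {
    "support-agent": {"model": "gpt-4o", "provider": "OpenAI"},
    "security-agent": {"model": "model-b", "provider": "Anthropic"},
}
drift = detect_drift(CODE_DEFAULTS, production)
print(drift)


def unavailable(key: str) -> Config:
    """Simulates LaunchDarkly being unreachable."""
    raise ConnectionError("LaunchDarkly unreachable")


fallback = config_with_fallback("support-agent", unavailable)
print(fallback)
```

A non-empty drift result is the signal a nightly job could use to open a PR that updates the defaults.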

Stage 4: Safe Deployment

After tests pass and the PR merges, LaunchDarkly’s built-in safety controls manage the deployment through four phases: deploy at 0% rollout, dark launch with internal users, guarded rollout with guardrails, and instant rollback if needed.

Deploy at 0% rollout

Your code goes to production, but the new config serves no traffic initially. Setting the rollout percentage to 0% in LaunchDarkly lets you verify deployment succeeded before exposing it to users.

Dark launch with internal users

LaunchDarkly targeting rules enable testing with real production traffic, starting with internal users and beta testers. For example, targeting rules like user.email contains "@yourcompany.com" or user.segment = "beta_testers" serve the new config only to specific groups. This catches issues that only appear with actual user interactions.
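To make the two example rules concrete, here is a plain-Python approximation of their effect. It is only a mental model: in practice you define targeting rules in LaunchDarkly and the SDK evaluates them, so the function names and the variation labels below are hypothetical.

```python
def matches_dark_launch(user: dict) -> bool:
    """Mimic the two example rules: internal email domain or beta segment."""
    email = user.get("email", "")
    return "@yourcompany.com" in email or user.get("segment") == "beta_testers"


def choose_variation(user: dict) -> str:
    """Serve the new config only to users matched by the dark-launch rules."""
    return "new-config" if matches_dark_launch(user) else "current-config"


internal = {"email": "dev@yourcompany.com"}
beta = {"email": "someone@example.com", "segment": "beta_testers"}
public = {"email": "someone@example.com", "segment": "free"}

for user in (internal, beta, public):
    print(user["email"], choose_variation(user))
```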

Guarded rollout with release guardrail metrics

Traffic increases gradually (0% → 1% → 10% → 50% → 100%) while release guardrail metrics monitor quality at each stage. Configure metrics for error rate, latency, and satisfaction in LaunchDarkly, and they’ll automatically track each rollout stage. If quality degrades, LaunchDarkly alerts you or pauses the rollout.
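The decision made at each stage can be pictured with the sketch below. LaunchDarkly performs this monitoring for you, so the code is only an illustration of the logic: the Metrics type, the limit values, and the next_step helper are assumptions, not part of any LaunchDarkly API.

```python
from dataclasses import dataclass

# Rollout percentages from this guide.
ROLLOUT_STAGES = [0, 1, 10, 50, 100]


@dataclass
class Metrics:
    error_rate: float      # fraction of failed requests
    latency_ms: float
    satisfaction: float    # fraction of positive feedback


# Hypothetical guardrail limits.
MAX_ERROR_RATE = 0.02
MAX_LATENCY_MS = 2000.0
MIN_SATISFACTION = 0.80


def healthy(m: Metrics) -> bool:
    """True when every guardrail metric is within its limit."""
    return (
        m.error_rate <= MAX_ERROR_RATE
        and m.latency_ms <= MAX_LATENCY_MS
        and m.satisfaction >= MIN_SATISFACTION
    )


def next_step(current_pct: int, m: Metrics) -> tuple[str, int]:
    """Decide whether to pause, advance to the next stage, or finish."""
    if not healthy(m):
        return ("pause", current_pct)
    index = ROLLOUT_STAGES.index(current_pct)
    if index == len(ROLLOUT_STAGES) - 1:
        return ("complete", current_pct)
    return ("advance", ROLLOUT_STAGES[index + 1])


good = Metrics(error_rate=0.01, latency_ms=1200.0, satisfaction=0.90)
degraded = Metrics(error_rate=0.05, latency_ms=1200.0, satisfaction=0.90)

print(next_step(1, good))
print(next_step(10, degraded))
```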

[Figure: Guarded release with guardrail metrics monitoring quality during rollout]

Instant rollback

If issues arise, LaunchDarkly’s UI enables instant rollback without redeploying. Turn off the flag, reduce percentage to 0%, or modify targeting rules. Changes take effect in seconds.

Integration with GitHub Actions

This entire pipeline runs automatically in your GitHub repository via GitHub Actions. When you open a PR, GitHub Actions executes the validation and testing stages. The results appear directly in your PR checks.

[Figure: GitHub Actions workflow showing validation and testing steps with pass/fail status]

Each PR displays:

  • Validation results (which configs were checked)
  • Test results (quality scores, latency, variation comparison)
  • Pass/fail status for each quality gate

The workflow uses repository secrets (LD_API_KEY, LD_SDK_KEY, etc.) to connect to LaunchDarkly and execute the checks.
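Because GitHub Actions exposes repository secrets to a step as environment variables, a script can fail fast when one is missing. The sketch below assumes only the two secrets named above are required (the "etc." suggests the real workflow may need more), and the missing_secrets and main helpers are hypothetical.

```python
import os
from typing import Mapping, Optional

# Secret names taken from this guide; the real workflow may need others.
REQUIRED_SECRETS = ("LD_API_KEY", "LD_SDK_KEY")


def missing_secrets(env: Mapping[str, str]) -> list[str]:
    """Return the names of required secrets that are unset or empty."""
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


def main(env: Optional[Mapping[str, str]] = None) -> int:
    """Return a process exit code: 0 when all secrets are present, else 1."""
    env = os.environ if env is None else env
    missing = missing_secrets(env)
    if missing:
        print("Missing secrets: " + ", ".join(missing))
        return 1
    print("All required secrets are set.")
    return 0


# Example with an explicit environment so the result is predictable.
exit_code = main({"LD_API_KEY": "example-value"})
print(exit_code)
```

In a workflow step, passing the result of main() to sys.exit would mark the check as failed whenever a secret is absent.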

What You Get

This pipeline gives you the confidence to ship AI changes fast without sacrificing quality.

Before this CI/CD pipeline:

  • Manual review of AI outputs doesn’t scale.
  • Quality regressions slip into production.
  • No systematic way to test across user segments and variations.
  • Config changes go straight to prod without validation.

With this CI/CD pipeline:

  • Automated quality gates catch broken configs before merge.
  • LLM-as-judge tests all variations systematically (requires test data covering all config variations).
  • Validation in PR checks ensures configs exist and are well-formed.
  • Drift detection keeps your code defaults in sync with production.

Combined with LaunchDarkly’s AgentControl, instant rollback, progressive rollout, and release guardrails, you get the speed to iterate on AI features quickly with a safety net to catch issues before they reach customers.

Ready to implement? Visit the ld-aic-cicd repository to get started. The repository includes installation instructions, workflow templates, test data examples, and complete documentation.

Additional Details

For detailed implementation guidance, see the ld-aic-cicd repository documentation covering:

  • Choosing Evaluators: Direct evaluator (unit testing for single configs) vs. HTTP evaluator (integration testing for multi-agent systems), plus custom evaluators for specialized AI architectures
  • Test Data Format: Creating golden datasets with evaluation criteria, context attributes, reference responses, and performance constraints
  • Drift Detection: Syncing production configs to code defaults, nightly drift detection workflows, and handling config changes
  • Function Calling: Tool support across OpenAI, Anthropic, and Gemini providers with schema validation
  • Troubleshooting: Common issues with configs, latency, judge variability, and rate limits