Building Framework-Agnostic AI Swarms: Compare LangGraph, Strands, and OpenAI Swarm


Published February 4, 2026

by Scarlett Attensil

Newer features are available with AgentControl

This tutorial was published in early February 2026, before LaunchDarkly shipped agent graphs. Agent graphs are the LaunchDarkly-specific way to define multi-agent topology that all three orchestrators in this tutorial can consume. The framework-agnostic pattern below still works, but for new builds you may also want to use:

  • Agent graphs: Agent graphs let you externalize the topology itself, not just the per-agent configs, so swapping orchestrators doesn’t redefine the graph
  • Offline evaluations and Datasets: Compare orchestrator outputs against a saved reference set, not just live runs
  • Prompt snippets: Share common system-prompt fragments across the three orchestrators’ agent configs
  • Manual LLM span tracing: Instrument per-orchestrator overhead beyond what auto-tracing captures

To learn more, read AgentControl.

If you’ve ever run the same app in multiple environments, you know the pain of duplicated configuration. Agent swarms have the same problem: the moment you try multiple orchestrators (LangGraph, Strands, OpenAI Swarm), your agent definitions start living in different formats. Prompts drift. Model settings drift. A “small behavior tweak” turns into archaeology across repos.

AI behavior isn’t code. Prompts aren’t functions. They change too often, and too experimentally, to be hard-wired into orchestrator code. AgentControl lets you treat agent definitions like shared configuration instead. Define them once, store them centrally, and let any orchestrator fetch them. Update a prompt or model setting in the LaunchDarkly UI, and the new version rolls out without a redeploy.

Start your free trial

Ready to build framework-agnostic AI swarms? Start your 14-day free trial of LaunchDarkly to follow along with this tutorial. No credit card required.

The problem: Research gap analysis across multiple papers

When analyzing academic literature, researchers face a daunting task: reading dozens of papers to identify patterns, spot contradictions, and find unexplored opportunities. A single LLM call can summarize papers, but it produces a monolithic analysis you can’t trace, refine, or trust for critical decisions.

The challenge compounds when you need to:

  • Identify methodological patterns across 12+ papers without missing subtle connections
  • Detect contradictory findings that might invalidate assumptions
  • Discover research gaps that represent genuine opportunities, not just oversight

This is where specialized agents excel - each focused on one aspect of the analysis, building on each other’s work.

In this tutorial, we’ll build a 3-agent research analysis swarm that solves this problem by dividing the work:

| Agent | Role | Output |
| --- | --- | --- |
| Approach Analyzer | Clusters methodological themes across papers | “Papers 1, 4, 7 use reinforcement learning; Papers 2, 5 use symbolic methods” |
| Contradiction Detector | Finds conflicting claims between papers | “Paper 3 claims X improves performance; Paper 8 shows X degrades it” |
| Gap Synthesizer | Identifies unexplored research directions | “No papers combine approach A with dataset B; potential opportunity” |

We’ll implement this swarm across three different orchestrators (LangGraph, Strands, and OpenAI Swarm), demonstrating how AgentControl enables:

  • Framework-agnostic agent definitions: Define agents once in LaunchDarkly, use them everywhere
  • Per-agent observability: Track tokens, latency, and costs for each agent individually - catch silent failures when agents skip execution
  • Dynamic swarm composition: Add/remove agents from the swarm or switch models without touching code

Why use a swarm?

Research gap analysis requires different skills: clustering methodological patterns, detecting contradictions, and synthesizing opportunities. With a swarm, each agent handles one aspect and produces artifacts the next agent builds on. You can track tokens, latency, and cost per agent. You can catch silent failures when an agent skips execution. And when something goes wrong, you know exactly where.

Technical requirements

Before implementing the swarm, ensure you have:

  • LaunchDarkly account with AgentControl enabled (see quickstart guide)
  • API keys for Anthropic Claude or OpenAI GPT-4 (check supported models)
  • Python 3.11+ for running orchestrators
  • Basic understanding of agent systems (review LangGraph agents tutorial if needed)

The complete implementation is available at GitHub - AI Orchestrators.

The architecture: how LaunchDarkly powers framework-agnostic swarms

The swarm architecture has three layers: dynamic agent configuration, per-agent tracking, and custom metrics for cost attribution. Here’s how they work together.

LangGraph swarm architecture showing configuration fetch, agent interactions with Command-based handoffs, and dual metrics tracking to AgentControl trends

The diagram shows LangGraph’s implementation, but Strands and OpenAI Swarm follow the same pattern with their own handoff mechanisms. The key components are:

  1. Configuration Fetch: The orchestrator queries LaunchDarkly’s API to dynamically discover all agent configurations, avoiding hardcoded agent definitions
  2. Agent Graph: Three specialized agents (Approach Analyzer, Contradiction Detector, Gap Synthesizer) connected through explicit handoff mechanisms
  3. Metrics Collection: Each agent execution captures tokens, duration, and cost metrics through both the config tracker and custom metrics API
  4. Dual Dashboard Views: The same metrics appear in the AgentControl trends dashboard (for individual agent monitoring) and as custom metrics (for cost comparison across orchestrators)

Three layers of framework-agnostic swarms

1. AgentControl configs for Dynamic Agent Configuration

Each config stores:

  • Agent key, display name, and model selection
  • System instructions and tool definitions

Your orchestrator code queries LaunchDarkly for “all enabled agent configs” and builds the swarm dynamically. No hardcoded agent names.

2. Per-Agent Tracking with AI SDK

LaunchDarkly’s AI SDK provides tracking through config evaluations. You get a fresh tracker for each agent, then track tokens, duration, and success/failure. These metrics flow to the config monitoring dashboard automatically.

config monitoring dashboard showing per-agent token usage, duration, and success rates across multiple runs

This tracking catches silent failures - when agents skip execution or produce minimal output. Step 4 shows the implementation patterns for each framework.
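
As a rough illustration of what catching a silent failure can look like once per-agent metrics exist, the sketch below compares the agents you expected to run against the metrics actually recorded. The `AgentRun` shape, the `find_silent_failures` helper, and the 50-token threshold are assumptions made for this example, not part of the LaunchDarkly SDK or the tutorial repo.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """Per-agent metrics collected after one swarm execution (illustrative shape)."""
    agent_key: str
    output_tokens: int


def find_silent_failures(expected_agents, runs, min_output_tokens=50):
    """Return {agent_key: reason} for agents that skipped or barely produced output."""
    by_key = {run.agent_key: run for run in runs}
    problems = {}
    for key in expected_agents:
        run = by_key.get(key)
        if run is None:
            # The agent never reported metrics, so it most likely never executed.
            problems[key] = "skipped: no metrics recorded"
        elif run.output_tokens < min_output_tokens:
            problems[key] = f"minimal output: {run.output_tokens} tokens"
    return problems


expected = ["approach-analyzer", "contradiction-detector", "gap-synthesizer"]
runs = [
    AgentRun("approach-analyzer", output_tokens=1800),
    AgentRun("gap-synthesizer", output_tokens=12),
]
failures = find_silent_failures(expected, runs)
for key, reason in failures.items():
    print(f"{key}: {reason}")
```

The threshold is a judgment call: tune it to the smallest output you would accept from each agent.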

3. Custom Metrics for Cost Attribution

Per-agent tracking shows performance, but for cost comparisons across orchestrators you need custom metrics. These let you query by orchestrator, compare costs across frameworks, and identify anomalies.
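
To make cost attribution concrete, here is a minimal sketch that turns per-agent token counts into a cost per orchestrator. The model names and prices are placeholders invented for the example, and the record shape is an assumption. In the Python server SDK, a number like this can typically be sent as a custom metric through the client's `track` call with a `metric_value`; check the SDK reference for the exact usage.

```python
# Hypothetical prices in USD per million tokens; substitute your providers' real rates.
PRICES_PER_MTOK = {
    "example-model-a": {"input": 3.00, "output": 15.00},
    "example-model-b": {"input": 1.25, "output": 10.00},
}


def estimate_cost(model, input_tokens, output_tokens, prices=PRICES_PER_MTOK):
    """Estimate the USD cost of one agent execution from its token counts."""
    rate = prices[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


def cost_by_orchestrator(records):
    """Sum estimated cost across agent executions, grouped by orchestrator."""
    totals = {}
    for rec in records:
        cost = estimate_cost(rec["model"], rec["input_tokens"], rec["output_tokens"])
        totals[rec["orchestrator"]] = totals.get(rec["orchestrator"], 0.0) + cost
    return totals


records = [
    {"orchestrator": "langgraph", "agent": "approach-analyzer",
     "model": "example-model-a", "input_tokens": 20_000, "output_tokens": 4_000},
    {"orchestrator": "langgraph", "agent": "gap-synthesizer",
     "model": "example-model-a", "input_tokens": 10_000, "output_tokens": 2_000},
    {"orchestrator": "openai_swarm", "agent": "approach-analyzer",
     "model": "example-model-b", "input_tokens": 20_000, "output_tokens": 4_000},
]
totals = cost_by_orchestrator(records)
for name, usd in totals.items():
    print(f"{name}: ${usd:.3f}")
```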

With the architecture covered, let’s build the swarm. We’ll download research papers, set up the project, bootstrap agent configs in LaunchDarkly, implement per-agent tracking, and run the swarm across all three orchestrators.

Step 1: Download research papers

First, you need papers to analyze. The scripts/download_papers.py script queries ArXiv with narrow, category-specific searches to ensure focused results.

python scripts/download_papers.py

The script presents pre-configured narrow research topics:

# From orchestration/scripts/download_papers.py:164-189
topics = {
    "1": {
        "name": "Chain-of-thought prompting in LLMs",
        "query": "cat:cs.CL AND (chain-of-thought OR CoT) AND reasoning",
        "years": 2
    },
    "2": {
        "name": "Retrieval-augmented generation (RAG)",
        "query": "cat:cs.CL AND (retrieval-augmented OR RAG) AND generation",
        "years": 2
    },
    "3": {
        "name": "Emergent communication in multi-agent RL",
        "query": "cat:cs.MA AND (emergent communication OR language emergence)",
        "years": 5
    },
    "4": {
        "name": "Few-shot prompting for code generation",
        "query": "cat:cs.SE AND few-shot AND code generation",
        "years": 2
    },
    "5": {
        "name": "Vision-language model grounding",
        "query": "cat:cs.CV AND vision-language AND grounding",
        "years": 2
    }
}

These topics are intentionally narrow: Each uses ArXiv categories (cat:cs.CL, cat:cs.MA) to limit scope. Boolean AND operators ensure papers match all criteria. 2-5 year windows prevent overwhelming the analysis.

For even narrower custom queries, combine categories with specific techniques like cat:cs.CL AND chain-of-thought AND mathematical AND reasoning for CoT math only, cat:cs.MA AND emergent AND (referential OR compositional) for specific emergence types, or cat:cs.SE AND few-shot AND (Python OR JavaScript) AND test generation for language-specific code generation.
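
The narrow queries above all follow one shape: a category, a set of terms joined with AND, and an optional group of alternatives joined with OR. A small helper can assemble them consistently. `build_arxiv_query` is a hypothetical name used only for this sketch; it is not a function from the download script.

```python
def build_arxiv_query(category, required_terms, any_of=None):
    """Compose a query string: category AND every required term AND (any alternative)."""
    parts = [f"cat:{category}"]
    parts.extend(required_terms)
    if any_of:
        # Alternatives are grouped in parentheses so OR binds before AND.
        parts.append("(" + " OR ".join(any_of) + ")")
    return " AND ".join(parts)


cot_math = build_arxiv_query("cs.CL", ["chain-of-thought", "mathematical", "reasoning"])
emergence = build_arxiv_query("cs.MA", ["emergent"], any_of=["referential", "compositional"])
print(cot_math)
print(emergence)
```

Both results reproduce the example queries quoted in the paragraph above.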

The script saves papers to data/gap_analysis_papers.json with this structure:

[
  {
    "id": "2409.02645v2",
    "title": "Emergent Language: A Survey and Taxonomy",
    "authors": "Jannik Peters, Constantin Waubert de Puiseau, ...",
    "published": "2024-09-04",
    "category": "cs.MA",
    "abstract": "The field of emergent language represents...",
    "introduction": "Language emergence has been explored...",
    "conclusion": "This paper provides a comprehensive review..."
  }
]

Why this format: Each paper includes ~2-3K characters of text (abstract + intro + conclusion), which is enough for analysis but won’t overflow context windows. For 12 papers, you’re looking at ~30K characters (~7.5K tokens) of input.
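
The size estimate above is simple arithmetic, sketched here so you can rerun it on your own download. The 4-characters-per-token ratio is a rough heuristic implied by the figures in the text (about 30K characters to about 7.5K tokens), not a tokenizer measurement. The `estimate_input_size` helper and the synthetic papers are illustrative assumptions.

```python
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and text


def estimate_input_size(papers, fields=("abstract", "introduction", "conclusion")):
    """Return (total_characters, approximate_tokens) for the text fields sent to agents."""
    total_chars = sum(len(paper.get(field, "")) for paper in papers for field in fields)
    return total_chars, total_chars // CHARS_PER_TOKEN


# Synthetic stand-in for data/gap_analysis_papers.json: 12 papers, 2,500 characters each.
papers = [
    {"abstract": "a" * 500, "introduction": "i" * 1000, "conclusion": "c" * 1000}
    for _ in range(12)
]
chars, tokens = estimate_input_size(papers)
print(f"{chars} characters, about {tokens} tokens")
```

To check a real download, load the JSON file with `json.load` and pass the resulting list in place of the synthetic `papers`.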

You now have 12 papers saved locally. Next, we’ll configure LaunchDarkly credentials and install the orchestration frameworks.

Step 2: Set up your multi-orchestrator project

Environment setup

For help getting your SDK and API keys, see the API access tokens guide and SDK key management.

# .env file
LD_SDK_KEY=sdk-xxxxx  # Get from LaunchDarkly project settings
LD_API_KEY=api-xxxxx  # Create at Account settings → Authorization
LAUNCHDARKLY_PROJECT_KEY=orchestrator-agents

# Model API keys
ANTHROPIC_API_KEY=sk-ant-xxxxx
OPENAI_API_KEY=sk-xxxxx

Install dependencies

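python -m venv .venv
source .venv/bin/activate

# LaunchDarkly SDKs (imported as ldclient and ldai) - see [Python SDK docs](/sdk/server-side/python)
pip install launchdarkly-server-sdk launchdarkly-server-sdk-ai python-dotenv arxiv PyPDF2 requests

# Orchestration frameworks (Strands is published as strands-agents; OpenAI Swarm installs from GitHub)
# If these differ from the repo's requirements.txt, prefer that file
pip install strands-agents langgraph
pip install git+https://github.com/openai/swarm.git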

For more on the LaunchDarkly AI SDK, see the AI SDK documentation.

Your environment is configured and dependencies are installed. Next, we’ll use the bootstrap script to automatically create all three agent configs in LaunchDarkly.

Step 3: Bootstrap agent configs with the manifest

The orchestration repo includes a complete bootstrap system that automatically creates all agent configurations, tools, and variations in LaunchDarkly. This is much faster and more reliable than manual setup.

Understanding the bootstrap system

The bootstrap process uses a YAML manifest to define:

  1. Tools - Functions agents can call (fetch_paper_section, handoff_to_agent, etc.)
  2. Agent Configs - Three specialized agents with their roles and instructions
  3. Variations - Multiple model options (Anthropic Claude vs OpenAI GPT)
  4. Targeting Rules - Which orchestrators get which models

Run the bootstrap script

# From the orchestration repo root
cd ai-orchestrators

# Run bootstrap with the research gap manifest
python scripts/launchdarkly/bootstrap.py

# You'll see:
╔═══════════════════════════════════════════════════════╗
║ AI Agent Orchestrator - LaunchDarkly Bootstrap ║
╚═══════════════════════════════════════════════════════╝

Available manifests:
  1. Research Gap Analysis (research_gap_manifest.yaml)

Select manifest or press Enter for default: [Enter]

📦 Project: orchestrator-agents
🌍 Environment: production

🛠️ Creating paper analysis tools...
  ✓ Tool 'extract_key_sections' created
  ✓ Tool 'fetch_paper_section' created
  ✓ Tool 'handoff_to_agent' created
  ...

🤖 Creating AI agent configs...
  ✓ config 'approach-analyzer' created
  ✓ config 'contradiction-detector' created
  ✓ config 'gap-synthesizer' created

✨ Bootstrap complete!

What gets created

The bootstrap script creates the three agents described earlier (Approach Analyzer, Contradiction Detector, Gap Synthesizer), each with swarm-aware instructions and handoff tools.

Verify in LaunchDarkly dashboard

After bootstrap completes:

  1. Go to your AgentControl dashboard at https://app.launchdarkly.com/<your-project-key>/<your-environment-key>/ai-configs
  2. You’ll see all three agent configs created
  3. Each config has:
    • Two variations (Claude and OpenAI models)
    • Proper tools configured
    • Detailed swarm-aware instructions
    • Targeting rules for orchestrator-specific routing

How variations and targeting work

Each agent has two variations in the manifest:

# Example from approach-analyzer agent
variations:
  - key: "analyzer-claude"
    name: "Approach Analyzer Claude"
    modelConfig:
      provider: "anthropic"
      modelId: "claude-sonnet-4-5"
    tools: ["handoff_to_agent", "cluster_approaches"]
    instructions: |
      [Agent instructions here]

  - key: "analyzer-openai"
    name: "Approach Analyzer OpenAI"
    modelConfig:
      provider: "openai"
      modelId: "gpt-5"
    tools: ["handoff_to_agent", "cluster_approaches"]
    instructions: |
      [Same instructions, different model]

targeting:
  rules:
    - variation: "analyzer-openai"
      clauses:
        - attribute: "orchestrator"
          op: "in"
          values: ["openai_swarm", "openai-swarm"]
  defaultVariation: "analyzer-claude"

When an orchestrator requests this agent:

  1. Context includes orchestrator attribute: context = create_context(execution_id, orchestrator="openai_swarm")
  2. LaunchDarkly evaluates targeting rules: If orchestrator is “openai_swarm” or “openai-swarm”, use OpenAI variation
  3. Otherwise use default: Claude variation for all other orchestrators

This lets you:

  • Use OpenAI models when running OpenAI Swarm (native compatibility)
  • Use Claude for other orchestrators
  • A/B test models by adjusting targeting rules
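
To make the rule evaluation concrete, here is a small simulation of the logic described above: the first rule whose clauses all match wins, otherwise the default variation is served. This is an illustration of the concept only. LaunchDarkly evaluates targeting inside its SDK, and `resolve_variation` is a hypothetical helper that handles just the `in` operator used in the manifest.

```python
def resolve_variation(context_attrs, rules, default_variation):
    """Return the variation of the first rule whose clauses all match, else the default."""
    for rule in rules:
        if all(
            clause["op"] == "in"
            and context_attrs.get(clause["attribute"]) in clause["values"]
            for clause in rule["clauses"]
        ):
            return rule["variation"]
    return default_variation


# Mirrors the targeting section of the manifest shown earlier.
rules = [
    {
        "variation": "analyzer-openai",
        "clauses": [
            {"attribute": "orchestrator", "op": "in",
             "values": ["openai_swarm", "openai-swarm"]}
        ],
    }
]

for orchestrator in ("openai_swarm", "langgraph", "strands"):
    chosen = resolve_variation({"orchestrator": orchestrator}, rules, "analyzer-claude")
    print(f"{orchestrator} -> {chosen}")
```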

Customize agent behavior

After bootstrap, you can adjust agents in the LaunchDarkly UI without code changes. Switch between Claude, GPT-4, or other supported providers. Refine instructions for better handoffs. Control which agents are included in the swarm through targeting rules. Test different prompts or models side-by-side with experiments.

Your three agents are now configured in LaunchDarkly. Next, we’ll implement tracking so you can monitor tokens, latency, and cost for each agent individually.

Step 4: Implement per-agent tracking

The orchestration repository demonstrates per-agent tracking across all three frameworks. First, you need to fetch agent configurations from LaunchDarkly:

Fetching agent configurations dynamically

from datetime import datetime

from shared.launchdarkly import (
    init_launchdarkly_clients,
    fetch_agent_configs_from_api,
    create_context,
    build_agent_requests,
)

# Initialize LaunchDarkly clients
ld_client, ai_client = init_launchdarkly_clients()

# Fetch agent list from LaunchDarkly API (not hardcoded!)
items = fetch_agent_configs_from_api()
print(f"Found {len(items)} config(s) in LaunchDarkly")

# Create execution context; the orchestrator attribute drives targeting rules
execution_id = f"langgraph-{datetime.now().strftime('%Y%m%d_%H%M%S')}"
context = create_context(execution_id, orchestrator="langgraph")

# Build requests for all agents
agent_requests, agent_metadata = build_agent_requests(items)

# Fetch all configs in one call
configs = ai_client.agent_configs(agent_requests, context)

# Keep only agents whose config is enabled for this context
enabled_agents = []
for item in items:
    config = configs.get(item["key"])
    if config and config.enabled:
        enabled_agents.append({
            "key": item["key"],
            "name": item["name"],
            "config": config,
            "model": config.model.name if config.model else "claude-sonnet-4-5"
        })

print(f"✓ Found {len(enabled_agents)} configured agent configs")

Pattern 1: Native framework metrics (Strands)

Strands provides accumulated_usage on each node result after execution:

# From orchestrators/strands/run_gap_analysis.py:418-424
if agent_key in per_agent_metrics:
    usage = node_result.accumulated_usage or {}
    input_tokens, output_tokens = extract_usage_tokens(usage)
    total_tokens = input_tokens + output_tokens

View full Strands implementation

Pattern 2: Message-based tracking (LangGraph)

LangGraph attaches usage_metadata to messages, requiring post-execution iteration:

# From orchestrators/langgraph/run_gap_analysis.py:442-446
if hasattr(msg, "usage_metadata") and msg.usage_metadata:
    usage_data = msg.usage_metadata
    input_tokens = usage_data.get("input_tokens", 0) or usage_data.get("prompt_tokens", 0)
    output_tokens = usage_data.get("output_tokens", 0) or usage_data.get("completion_tokens", 0)
    has_usage = True

View full LangGraph implementation
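
The excerpt above handles a single message. A minimal sketch of the surrounding post-execution loop might look like the following. It uses stand-in message objects so it runs without LangGraph installed, and `sum_message_usage` is a hypothetical helper rather than code from the repo.

```python
from types import SimpleNamespace


def sum_message_usage(messages):
    """Total input/output tokens across messages that carry usage_metadata."""
    input_total = output_total = 0
    for msg in messages:
        usage = getattr(msg, "usage_metadata", None)
        if not usage:
            continue  # messages without usage data contribute nothing
        input_total += usage.get("input_tokens", 0) or usage.get("prompt_tokens", 0)
        output_total += usage.get("output_tokens", 0) or usage.get("completion_tokens", 0)
    return input_total, output_total


# Stand-in messages; a real run would pass the messages from the graph's final state.
messages = [
    SimpleNamespace(content="Analyze these papers"),
    SimpleNamespace(content="...", usage_metadata={"input_tokens": 1200, "output_tokens": 300}),
    SimpleNamespace(content="...", usage_metadata={"prompt_tokens": 800, "completion_tokens": 150}),
    SimpleNamespace(content="tool result", usage_metadata=None),
]
totals = sum_message_usage(messages)
print(totals)
```

To attribute usage per agent rather than per run, you would group the messages by whichever field identifies the producing agent in your graph before summing.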

Pattern 3: Interception-based tracking (OpenAI Swarm)

OpenAI Swarm doesn’t aggregate per-agent metrics, requiring interception of completion calls:

# From orchestrators/openai_swarm/run_gap_analysis.py:369-387
original_get_chat_completion = client.get_chat_completion

def tracked_get_chat_completion(agent, history, context_variables, model_override, stream, debug):
    start_call = time.time()
    completion = original_get_chat_completion(
        agent=agent,
        history=history,
        context_variables=context_variables,
        model_override=model_override,
        stream=stream,
        debug=debug,
    )
    duration = time.time() - start_call
    agent_key = key_by_name.get(agent.name, agent.name)
    usage = getattr(completion, "usage", None)
    if usage:
        input_tokens = int(getattr(usage, "prompt_tokens", 0))
        output_tokens = int(getattr(usage, "completion_tokens", 0))
        total_tokens = int(getattr(usage, "total_tokens", input_tokens + output_tokens))

View full OpenAI Swarm implementation
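
The interception pattern is easier to see end to end in a self-contained form. The sketch below wraps a method on a stand-in client, records per-agent usage, and returns the original completion unchanged. `FakeClient` and `install_usage_tracking` are invented for this example, and the simplified two-argument signature is an assumption; the real Swarm method takes the additional parameters shown in the excerpt above.

```python
import time
from types import SimpleNamespace


class FakeClient:
    """Stand-in for an orchestrator client; not the real Swarm class."""

    def get_chat_completion(self, agent, history):
        usage = SimpleNamespace(prompt_tokens=100, completion_tokens=25, total_tokens=125)
        return SimpleNamespace(usage=usage, text=f"reply from {agent.name}")


def install_usage_tracking(client, metrics):
    """Wrap client.get_chat_completion so every call records per-agent usage."""
    original = client.get_chat_completion

    def tracked(agent, history):
        start = time.perf_counter()
        completion = original(agent, history)
        duration = time.perf_counter() - start
        usage = getattr(completion, "usage", None)
        entry = metrics.setdefault(agent.name, {"calls": 0, "tokens": 0, "seconds": 0.0})
        entry["calls"] += 1
        entry["seconds"] += duration
        if usage:
            entry["tokens"] += int(getattr(usage, "total_tokens", 0))
        return completion  # hand the untouched completion back to the caller

    client.get_chat_completion = tracked


metrics = {}
client = FakeClient()
install_usage_tracking(client, metrics)
agent = SimpleNamespace(name="Gap Synthesizer")
result = client.get_chat_completion(agent, history=[])
client.get_chat_completion(agent, history=[])
print(metrics["Gap Synthesizer"]["calls"], metrics["Gap Synthesizer"]["tokens"])
```

Returning the completion from the wrapper matters: the orchestrator still depends on it, so a wrapper that only measures and forgets to return would break the run.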

Critical: Provider token field names differ

Each provider uses different field names: Anthropic uses input_tokens/output_tokens, OpenAI uses prompt_tokens/completion_tokens, and some frameworks use camelCase (inputTokens). The implementations use fallback chains to handle all formats.
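
A fallback chain of this kind can be sketched in a few lines. The repo's own helper (`extract_usage_tokens` in the Strands excerpt) is not shown in this tutorial, so `normalize_usage` below is a hypothetical stand-in that covers the three naming styles mentioned above for both dict-style and attribute-style usage objects.

```python
from types import SimpleNamespace


def normalize_usage(usage):
    """Return (input_tokens, output_tokens) from a dict or object using any known naming."""
    def read(*names):
        for name in names:
            value = usage.get(name) if isinstance(usage, dict) else getattr(usage, name, None)
            if value:  # skip missing, None, and zero values, then try the next name
                return int(value)
        return 0

    if usage is None:
        return 0, 0
    return (
        read("input_tokens", "prompt_tokens", "inputTokens"),
        read("output_tokens", "completion_tokens", "outputTokens"),
    )


anthropic_style = normalize_usage({"input_tokens": 10, "output_tokens": 5})
openai_style = normalize_usage(SimpleNamespace(prompt_tokens=7, completion_tokens=3))
camel_style = normalize_usage({"inputTokens": 4, "outputTokens": 2})
print(anthropic_style, openai_style, camel_style)
```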

You can now capture tokens, latency, and cost for each agent. Next, we’ll run the swarm across LangGraph, Strands, and OpenAI Swarm to see how they perform with the same agent definitions.

Step 5: Run multiple orchestrators and track results

The repository includes scripts to run all three orchestrators and analyze their performance:

# Run all orchestrators 5 times each
./scripts/run_swarm_benchmark.sh sequential 5

# Analyze the results
python scripts/analyze_benchmark_results.py

Quick start recap
  1. Configure env: Create .env with SDK keys
  2. Install deps: pip install -r requirements.txt
  3. Download papers: python scripts/download_papers.py
  4. Bootstrap agents: python scripts/launchdarkly/bootstrap.py
  5. Configure targeting: Set default variation for each agent in LaunchDarkly UI
  6. Test run: python orchestrators/strands/run_gap_analysis.py

Troubleshooting: If you see “No enabled agents found,” check that each agent has a default variation set in the Targeting tab.

Now that you’ve run the swarm across all three orchestrators, let’s look at how they differ in approach and performance.

Comparing orchestrator approaches to swarms

All three frameworks support multi-agent workflows; they just disagree on who decides what happens next.

Key differences

| Aspect | Strands | LangGraph | OpenAI Swarm |
| --- | --- | --- | --- |
| Routing | Framework-managed | Graph-based | Function return |
| Handoff API | Tool call (automatic) | Command object | Return Agent object |
| Boilerplate | Low | Medium | Medium |
| Control | Low (black box) | High (explicit graph) | High (manual impl) |
| Debugging | Hard (why didn’t agent run?) | Easy (graph trace) | Hard (silent failures) |
| Per-Agent Metrics | Built-in | Wrapper required | Interception required |

View full implementations: Strands | LangGraph | OpenAI Swarm

The LaunchDarkly advantage: By defining agents externally, you can implement swarms across all three frameworks and compare their approaches with the same agent definitions.

Performance comparison (9 runs: 3 datasets × 3 orchestrators)

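| Metric | OpenAI Swarm | Strands | LangGraph |
| --- | --- | --- | --- |
| Avg Time | 2.9 min | 5.7 min | 8.0 min |
| Tokens | 67K | 99K | 89K |
| Speed | 385 tok/s | 287 tok/s | 186 tok/s |
| Report Size | 13KB | 32KB | 67KB |
| Variance | ±1.05 min | ±1.38 min | ±0.21 min |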

Key insight (based on limited sample): Fastest ≠ best. OpenAI Swarm was nearly 3x faster than LangGraph (2.9 vs. 8.0 minutes) but produced reports about 80% smaller (13KB vs. 67KB). LangGraph had the lowest variance and most comprehensive outputs despite slower execution.
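
The derived figures can be rechecked from the table's own numbers. The sketch below recomputes throughput, the speed ratio, and the report-size reduction. Because the table's inputs are rounded, the recomputed throughput lands within a few tok/s of the published values rather than matching them exactly.

```python
# Values copied from the performance table above.
results = {
    "OpenAI Swarm": {"minutes": 2.9, "tokens": 67_000, "report_kb": 13},
    "Strands": {"minutes": 5.7, "tokens": 99_000, "report_kb": 32},
    "LangGraph": {"minutes": 8.0, "tokens": 89_000, "report_kb": 67},
}


def tokens_per_second(row):
    """Throughput = total tokens divided by wall-clock seconds."""
    return row["tokens"] / (row["minutes"] * 60)


throughput = {name: tokens_per_second(row) for name, row in results.items()}
speedup = results["LangGraph"]["minutes"] / results["OpenAI Swarm"]["minutes"]
size_reduction = 1 - results["OpenAI Swarm"]["report_kb"] / results["LangGraph"]["report_kb"]

for name, value in throughput.items():
    print(f"{name}: {value:.0f} tok/s")
print(f"OpenAI Swarm vs LangGraph: {speedup:.2f}x faster, reports {size_reduction:.0%} smaller")
```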

Performance comparison graphs showing execution time, token usage, and processing speed across all three orchestrators

Example reports: See the outputs

  • LangGraph (60-70KB): Emergent | Theorem | Self-Improvement
  • Strands (30-35KB): Emergent | Theorem | Self-Improvement
  • OpenAI Swarm (10-15KB): Emergent | Theorem | Self-Improvement

Report size variation demonstrates why per-agent tracking matters - you need to know when agents produce minimal output.

Conclusion

The orchestrator you choose determines how agents coordinate, but it shouldn’t lock you into a single framework. By defining agents in LaunchDarkly and fetching them at runtime, you can run the same swarm across LangGraph, Strands, and OpenAI Swarm without duplicating configuration or watching prompts drift between repos.

The performance differences are real. OpenAI Swarm is fastest, LangGraph produces the most comprehensive outputs, and Strands offers the simplest setup. But you only discover these tradeoffs if you can track each agent individually and catch silent failures when they happen.

Swarms cost more than single LLM calls. The payoff is traceable reasoning you can audit, refine, and trust.

The full implementation is available on GitHub - AI Orchestrators. Clone the repo and run the same swarm across all three orchestrators. To get started with AgentControl, follow the quickstart guide.

Related tutorials

  • Beyond n8n for Workflow Automation: Agent Graphs - Externalize the orchestration topology to LaunchDarkly so swapping frameworks doesn’t redefine the graph
  • Build AgentControl configs with Agent Skills - Generate the agent configs in this comparison from natural-language prompts
  • Build a LangGraph Multi-Agent system in 20 Minutes - The LangGraph variant of the swarm pattern, in depth
  • Evaluate LLM code generation with LLM-as-judge evaluators - Add per-agent quality measurement to whichever orchestrator you choose
  • Proving ROI with data-driven AI agent experiments - A/B test orchestrator choices to prove which one wins for your workload