Building Framework-Agnostic AI Swarms: Comparing LangGraph, Strands, and OpenAI Swarm

Published February 4, 2026

by Scarlett Attensil

If you’ve ever run the same app in multiple environments, you know the pain of duplicated configuration. Agent swarms have the same problem: the moment you try multiple orchestrators (LangGraph, Strands, OpenAI Swarm), your agent definitions start living in different formats. Prompts drift. Model settings drift. A “small behavior tweak” turns into archaeology across repos.

AI behavior isn’t code. Prompts aren’t functions. They change too often, and too experimentally, to be hard-wired into orchestrator code. LaunchDarkly AI Configs lets you treat agent definitions like shared configuration instead. Define them once, store them centrally, and let any orchestrator fetch them. Update a prompt or model setting in the LaunchDarkly UI, and the new version rolls out without a redeploy.

Start your free trial

Ready to build framework-agnostic AI swarms? Start your 14-day free trial of LaunchDarkly to follow along with this tutorial. No credit card required.

The problem: Research gap analysis across multiple papers

When analyzing academic literature, researchers face a daunting task: reading dozens of papers to identify patterns, spot contradictions, and find unexplored opportunities. A single LLM call can summarize papers, but it produces a monolithic analysis you can’t trace, refine, or trust for critical decisions.

The challenge compounds when you need to:

  • Identify methodological patterns across 12+ papers without missing subtle connections
  • Detect contradictory findings that might invalidate assumptions
  • Discover research gaps that represent genuine opportunities, not just oversight

This is where specialized agents excel - each focused on one aspect of the analysis, building on each other’s work.

In this tutorial, we’ll build a 3-agent research analysis swarm that solves this problem by dividing the work:

| Agent | Role | Output |
| --- | --- | --- |
| Approach Analyzer | Clusters methodological themes across papers | "Papers 1, 4, 7 use reinforcement learning; Papers 2, 5 use symbolic methods" |
| Contradiction Detector | Finds conflicting claims between papers | "Paper 3 claims X improves performance; Paper 8 shows X degrades it" |
| Gap Synthesizer | Identifies unexplored research directions | "No papers combine approach A with dataset B; potential opportunity" |

We’ll implement this swarm across three different orchestrators (LangGraph, Strands, and OpenAI Swarm), demonstrating how LaunchDarkly AI Configs enable:

  • Framework-agnostic agent definitions: Define agents once in LaunchDarkly, use them everywhere
  • Per-agent observability: Track tokens, latency, and costs for each agent individually - catch silent failures when agents skip execution
  • Dynamic swarm composition: Add/remove agents from the swarm or switch models without touching code

Why use a swarm?

Research gap analysis requires different skills: clustering methodological patterns, detecting contradictions, and synthesizing opportunities. With a swarm, each agent handles one aspect and produces artifacts the next agent builds on. You can track tokens, latency, and cost per agent. You can catch silent failures when an agent skips execution. And when something goes wrong, you know exactly where.

Technical requirements

Before implementing the swarm, ensure you have:

  • A LaunchDarkly account with AI Configs (a free trial works)
  • A LaunchDarkly SDK key and an API access token for your project
  • Python 3 with venv and pip
  • Anthropic and OpenAI API keys for the model calls

The complete implementation is available at GitHub - AI Orchestrators.

The architecture: how LaunchDarkly powers framework-agnostic swarms

The swarm architecture has three layers: dynamic agent configuration, per-agent tracking, and custom metrics for cost attribution. Here’s how they work together.

LangGraph swarm architecture showing LaunchDarkly configuration fetch, agent interactions with Command-based handoffs, and dual metrics tracking to both AI Config Trends and Product Analytics dashboards

The diagram shows LangGraph’s implementation, but Strands and OpenAI Swarm follow the same pattern with their own handoff mechanisms. The key components are:

  1. Configuration Fetch: The orchestrator queries LaunchDarkly’s API to dynamically discover all agent configurations, avoiding hardcoded agent definitions
  2. Agent Graph: Three specialized agents (Approach Analyzer, Contradiction Detector, Gap Synthesizer) connected through explicit handoff mechanisms
  3. Metrics Collection: Each agent execution captures tokens, duration, and cost metrics through both the AI Config tracker and custom metrics API
  4. Dual Dashboard Views: The same metrics appear in both the AI Config Trends dashboard (for individual agent monitoring) and Product Analytics (for cross-orchestrator comparison)

Three layers of framework-agnostic swarms

1. AI Config for Dynamic Agent Configuration

Each AI Config stores:

  • Agent key, display name, and model selection
  • System instructions and tool definitions

Your orchestrator code queries LaunchDarkly for “all enabled agent configs” and builds the swarm dynamically. No hardcoded agent names.

2. Per-Agent Tracking with AI SDK

LaunchDarkly’s AI SDK provides tracking through config evaluations. You get a fresh tracker for each agent, then track tokens, duration, and success/failure. These metrics flow to the AI Config Monitoring dashboard automatically.
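
To make the pattern concrete, here is a minimal sketch of per-agent tracking. It assumes each evaluated agent config exposes a tracker with track_duration, track_tokens, and track_success methods, and that TokenUsage is importable from ldai.tracker; exact names and return shapes can vary by AI SDK version, so check the docs against your installed release. run_agent is a hypothetical helper, not part of the repo.

import time

from ldai.tracker import TokenUsage  # assumption: module path may differ by ldai version

# Illustrative sketch, not the repo's code. `configs` and `context` come from the
# dynamic fetch shown in Step 4; run_agent is a hypothetical helper that calls the
# model and reports token counts.
config = configs.get("approach-analyzer")
tracker = config.tracker  # assumption: each evaluated agent config exposes its tracker

start = time.time()
output, input_tokens, output_tokens = run_agent(config, papers)

tracker.track_duration(int((time.time() - start) * 1000))  # duration in milliseconds
tracker.track_tokens(TokenUsage(
    total=input_tokens + output_tokens,
    input=input_tokens,
    output=output_tokens,
))
tracker.track_success()  # call tracker.track_error() instead if the agent failed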

AI Config monitoring dashboard showing per-agent token usage, duration, and success rates across multiple runs

This tracking catches silent failures - when agents skip execution or produce minimal output. Step 4 shows the implementation patterns for each framework.

3. Custom Metrics for Cost Attribution

Per-agent tracking shows performance, but for cost comparisons across orchestrators you need custom metrics. These let you query by orchestrator, compare costs across frameworks, and identify anomalies.
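
Here is a sketch of the idea using the standard LDClient.track call and the event names referenced later in the dashboard section (agent_execution_cost, agent_execution_tokens, agent_execution_duration). The per-token prices are placeholders, not the repo's actual rates.

# Hedged sketch: send per-agent cost/token/duration events as custom metrics.
# `ld_client` and `context` come from the setup in Step 4; prices are illustrative only.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # placeholder $/token, adjust for your model
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # placeholder $/token

def track_agent_cost(ld_client, context, orchestrator, agent_key,
                     input_tokens, output_tokens, duration_s):
    cost = input_tokens * PRICE_PER_INPUT_TOKEN + output_tokens * PRICE_PER_OUTPUT_TOKEN
    data = {"orchestrator": orchestrator, "agent": agent_key}

    # LDClient.track(event_name, context, data=None, metric_value=None)
    ld_client.track("agent_execution_cost", context, data, metric_value=cost)
    ld_client.track("agent_execution_tokens", context, data,
                    metric_value=input_tokens + output_tokens)
    ld_client.track("agent_execution_duration", context, data, metric_value=duration_s)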

With the architecture covered, let’s build the swarm. We’ll download research papers, set up the project, bootstrap agent configs in LaunchDarkly, implement per-agent tracking, and run the swarm across all three orchestrators.

Step 1: Download research papers

First, you need papers to analyze. The scripts/download_papers.py script queries ArXiv with narrow, category-specific searches to ensure focused results.

$python scripts/download_papers.py

The script presents pre-configured narrow research topics:

# From orchestration/scripts/download_papers.py:164-189
topics = {
    "1": {
        "name": "Chain-of-thought prompting in LLMs",
        "query": "cat:cs.CL AND (chain-of-thought OR CoT) AND reasoning",
        "years": 2
    },
    "2": {
        "name": "Retrieval-augmented generation (RAG)",
        "query": "cat:cs.CL AND (retrieval-augmented OR RAG) AND generation",
        "years": 2
    },
    "3": {
        "name": "Emergent communication in multi-agent RL",
        "query": "cat:cs.MA AND (emergent communication OR language emergence)",
        "years": 5
    },
    "4": {
        "name": "Few-shot prompting for code generation",
        "query": "cat:cs.SE AND few-shot AND code generation",
        "years": 2
    },
    "5": {
        "name": "Vision-language model grounding",
        "query": "cat:cs.CV AND vision-language AND grounding",
        "years": 2
    }
}

These topics are intentionally narrow: Each uses ArXiv categories (cat:cs.CL, cat:cs.MA) to limit scope. Boolean AND operators ensure papers match all criteria. 2-5 year windows prevent overwhelming the analysis.

For even narrower custom queries, combine categories with specific techniques like cat:cs.CL AND chain-of-thought AND mathematical AND reasoning for CoT math only, cat:cs.MA AND emergent AND (referential OR compositional) for specific emergence types, or cat:cs.SE AND few-shot AND (Python OR JavaScript) AND test generation for language-specific code generation.
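
If you want to experiment with a custom query outside the script's presets, a quick sketch with the arxiv package (installed in Step 2) might look like the following. The repo's downloader does more on top of this (section extraction, year windows), so treat it as a starting point.

import arxiv

# Hedged sketch of a custom narrow query; download_papers.py layers section
# extraction and date filtering on top of a search like this.
search = arxiv.Search(
    query="cat:cs.CL AND chain-of-thought AND mathematical AND reasoning",
    max_results=12,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

for result in arxiv.Client().results(search):
    print(result.published.date(), result.title)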

The script saves papers to data/gap_analysis_papers.json with this structure:

[
  {
    "id": "2409.02645v2",
    "title": "Emergent Language: A Survey and Taxonomy",
    "authors": "Jannik Peters, Constantin Waubert de Puiseau, ...",
    "published": "2024-09-04",
    "category": "cs.MA",
    "abstract": "The field of emergent language represents...",
    "introduction": "Language emergence has been explored...",
    "conclusion": "This paper provides a comprehensive review..."
  }
]

Why this format: Each paper includes ~2-3K characters of text (abstract + intro + conclusion), which is enough for analysis but won’t overflow context windows. For 12 papers, you’re looking at ~30K characters (~7.5K tokens) of input.
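
Under those assumptions, a quick back-of-the-envelope check on your own download is easy; the ~4 characters per token ratio is a rough heuristic, not an exact count.

import json

# Rough context-size estimate for the downloaded papers (~4 chars/token heuristic).
with open("data/gap_analysis_papers.json") as f:
    papers = json.load(f)

total_chars = sum(
    len(p.get("abstract", "")) + len(p.get("introduction", "")) + len(p.get("conclusion", ""))
    for p in papers
)
print(f"{len(papers)} papers, {total_chars:,} characters, ~{total_chars // 4:,} tokens")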

You now have 12 papers saved locally. Next, we’ll configure LaunchDarkly credentials and install the orchestration frameworks.

Step 2: Set up your multi-orchestrator project

Environment setup

For help getting your SDK and API keys, see the API access tokens guide and SDK key management.

$# .env file
$LD_SDK_KEY=sdk-xxxxx # Get from LaunchDarkly project settings
$LD_API_KEY=api-xxxxx # Create at Account settings → Authorization
$LAUNCHDARKLY_PROJECT_KEY=orchestrator-agents
$
$# Model API keys
$ANTHROPIC_API_KEY=sk-ant-xxxxx
$OPENAI_API_KEY=sk-xxxxx

Install dependencies

$python -m venv .venv
$source .venv/bin/activate
$
$# LaunchDarkly SDKs - see [Python SDK docs](/sdk/server-side/python)
$pip install ldai ldclient python-dotenv arxiv PyPDF2 requests
$
$# Orchestration frameworks
$pip install strands-sdk langgraph swarm

For more on the LaunchDarkly AI SDK, see the AI SDK documentation.

Your environment is configured and dependencies are installed. Next, we’ll use the bootstrap script to automatically create all three agent configs in LaunchDarkly.

Step 3: Bootstrap agent configs with the manifest

The orchestration repo includes a complete bootstrap system that automatically creates all agent configurations, tools, and variations in LaunchDarkly. This is much faster and more reliable than manual setup.

Understanding the bootstrap system

The bootstrap process uses a YAML manifest to define:

  1. Tools - Functions agents can call (fetch_paper_section, handoff_to_agent, etc.)
  2. Agent Configs - Three specialized agents with their roles and instructions
  3. Variations - Multiple model options (Anthropic Claude vs OpenAI GPT)
  4. Targeting Rules - Which orchestrators get which models

Run the bootstrap script

$# From the orchestration repo root
$cd ai-orchestrators
$
$# Run bootstrap with the research gap manifest
$python scripts/launchdarkly/bootstrap.py
$
$# You'll see:
>╔═══════════════════════════════════════════════════════╗
>║ AI Agent Orchestrator - LaunchDarkly Bootstrap ║
>╚═══════════════════════════════════════════════════════╝
>
>Available manifests:
> 1. Research Gap Analysis (research_gap_manifest.yaml)
>
>Select manifest or press Enter for default: [Enter]
>
>📦 Project: orchestrator-agents
>🌍 Environment: production
>
>🛠️ Creating paper analysis tools...
> ✓ Tool 'extract_key_sections' created
> ✓ Tool 'fetch_paper_section' created
> ✓ Tool 'handoff_to_agent' created
> ...
>
>🤖 Creating AI agent configs...
> ✓ AI Config 'approach-analyzer' created
> ✓ AI Config 'contradiction-detector' created
> ✓ AI Config 'gap-synthesizer' created
>
>✨ Bootstrap complete!

What gets created

The bootstrap script creates the three agents described earlier (Approach Analyzer, Contradiction Detector, Gap Synthesizer), each with swarm-aware instructions and handoff tools.

Verify in LaunchDarkly dashboard

After bootstrap completes:

  1. Go to your LaunchDarkly AI Configs dashboard at https://app.launchdarkly.com/<your-project-key>/<your-environment-key>/ai-configs
  2. You’ll see all three agent configs created
  3. Each config has two model variations (Claude and OpenAI), swarm-aware instructions, and attached handoff tools

How variations and targeting work

Each agent has two variations in the manifest:

# Example from approach-analyzer agent
variations:
  - key: "analyzer-claude"
    name: "Approach Analyzer Claude"
    modelConfig:
      provider: "anthropic"
      modelId: "claude-sonnet-4-5"
    tools: ["handoff_to_agent", "cluster_approaches"]
    instructions: |
      [Agent instructions here]

  - key: "analyzer-openai"
    name: "Approach Analyzer OpenAI"
    modelConfig:
      provider: "openai"
      modelId: "gpt-5"
    tools: ["handoff_to_agent", "cluster_approaches"]
    instructions: |
      [Same instructions, different model]

targeting:
  rules:
    - variation: "analyzer-openai"
      clauses:
        - attribute: "orchestrator"
          op: "in"
          values: ["openai_swarm", "openai-swarm"]
  defaultVariation: "analyzer-claude"

When an orchestrator requests this agent:

  1. Context includes orchestrator attribute: context = create_context(execution_id, orchestrator="openai_swarm")
  2. LaunchDarkly evaluates targeting rules: If orchestrator is “openai_swarm” or “openai-swarm”, use OpenAI variation
  3. Otherwise use default: Claude variation for all other orchestrators

This lets you:

  • Use OpenAI models when running OpenAI Swarm (native compatibility)
  • Use Claude for other orchestrators
  • A/B test models by adjusting targeting rules
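
For illustration, here is one plausible shape for the repo's create_context helper used in step 1 above, built on the Python server SDK's Context builder. The actual helper in shared/launchdarkly may set additional attributes.

from ldclient import Context

def create_context(execution_id: str, orchestrator: str) -> Context:
    """Hypothetical sketch of the repo's create_context helper: the `orchestrator`
    attribute is what the targeting rule above matches against."""
    return (
        Context.builder(execution_id)
        .kind("user")
        .set("orchestrator", orchestrator)
        .build()
    )

# Evaluating approach-analyzer with this context returns the OpenAI variation
# for openai_swarm runs and the Claude default for everything else.
context = create_context("openai-swarm-20260204_120000", orchestrator="openai_swarm")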

Customize agent behavior

After bootstrap, you can adjust agents in the LaunchDarkly UI without code changes. Switch between Claude, GPT, or other supported models. Refine instructions for better handoffs. Control which agents are included in the swarm through targeting rules. Test different prompts or models side-by-side with experiments.

Your three agents are now configured in LaunchDarkly. Next, we’ll implement tracking so you can monitor tokens, latency, and cost for each agent individually.

Step 4: Implement per-agent tracking

The orchestration repository demonstrates per-agent tracking across all three frameworks. First, you need to fetch agent configurations from LaunchDarkly:

Fetching agent configurations dynamically

from datetime import datetime

from shared.launchdarkly import (
    init_launchdarkly_clients,
    fetch_agent_configs_from_api,
    create_context,
    build_agent_requests
)

# Initialize LaunchDarkly clients
ld_client, ai_client = init_launchdarkly_clients()

# Fetch agent list from LaunchDarkly API (not hardcoded!)
items = fetch_agent_configs_from_api()
print(f"Found {len(items)} AI config(s) in LaunchDarkly")

# Create execution context
execution_id = f"langgraph-{datetime.now().strftime('%Y%m%d_%H%M%S')}"
context = create_context(execution_id, orchestrator="langgraph")

# Build requests for all agents
agent_requests, agent_metadata = build_agent_requests(items)

# Fetch all configs in one call
configs = ai_client.agent_configs(agent_requests, context)

# Process agents with configured variations
enabled_agents = []
for item in items:
    config = configs.get(item["key"])
    if config and config.enabled:
        enabled_agents.append({
            "key": item["key"],
            "name": item["name"],
            "config": config,
            "model": config.model.name if config.model else "claude-sonnet-4-5"
        })

print(f"✓ Found {len(enabled_agents)} configured agent configs")

Pattern 1: Native framework metrics (Strands)

Strands provides accumulated_usage on each node result after execution:

# From orchestrators/strands/run_gap_analysis.py:418-424
if agent_key in per_agent_metrics:
    usage = node_result.accumulated_usage or {}
    input_tokens, output_tokens = extract_usage_tokens(usage)
    total_tokens = input_tokens + output_tokens

View full Strands implementation

Pattern 2: Message-based tracking (LangGraph)

LangGraph attaches usage_metadata to messages, requiring post-execution iteration:

# From orchestrators/langgraph/run_gap_analysis.py:442-446
if hasattr(msg, "usage_metadata") and msg.usage_metadata:
    usage_data = msg.usage_metadata
    input_tokens = usage_data.get("input_tokens", 0) or usage_data.get("prompt_tokens", 0)
    output_tokens = usage_data.get("output_tokens", 0) or usage_data.get("completion_tokens", 0)
    has_usage = True

View full LangGraph implementation

Pattern 3: Interception-based tracking (OpenAI Swarm)

OpenAI Swarm doesn’t aggregate per-agent metrics, requiring interception of completion calls:

# From orchestrators/openai_swarm/run_gap_analysis.py:369-387
original_get_chat_completion = client.get_chat_completion

def tracked_get_chat_completion(agent, history, context_variables, model_override, stream, debug):
    start_call = time.time()
    completion = original_get_chat_completion(
        agent=agent,
        history=history,
        context_variables=context_variables,
        model_override=model_override,
        stream=stream,
        debug=debug,
    )
    duration = time.time() - start_call
    agent_key = key_by_name.get(agent.name, agent.name)
    usage = getattr(completion, "usage", None)
    if usage:
        input_tokens = int(getattr(usage, "prompt_tokens", 0))
        output_tokens = int(getattr(usage, "completion_tokens", 0))
        total_tokens = int(getattr(usage, "total_tokens", input_tokens + output_tokens))
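
The excerpt stops before the wrapper records anything or gets installed. An illustrative continuation (not a verbatim excerpt from the repo) would accumulate the numbers, return the completion, and then monkey-patch the client:

        # Illustrative continuation: record what we measured, assuming the script
        # keeps a per_agent_metrics dict like the Strands example does.
        metrics = per_agent_metrics.setdefault(
            agent_key, {"tokens": 0, "duration": 0.0, "calls": 0}
        )
        metrics["tokens"] += total_tokens
        metrics["duration"] += duration
        metrics["calls"] += 1

    return completion  # Swarm still needs the original return value

# Install the wrapper so every completion call flows through the tracking code.
client.get_chat_completion = tracked_get_chat_completion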

View full OpenAI Swarm implementation

Critical: Provider token field names differ

Each provider uses different field names: Anthropic uses input_tokens/output_tokens, OpenAI uses prompt_tokens/completion_tokens, and some frameworks use camelCase (inputTokens). The implementations use fallback chains to handle all formats.
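
The extract_usage_tokens helper referenced in the Strands excerpt is one place this shows up; a hedged sketch of that kind of fallback chain (the repo's version may differ) looks like this:

def extract_usage_tokens(usage) -> tuple[int, int]:
    """Illustrative fallback chain for provider-specific token fields;
    the repo's helper may cover additional formats."""
    def read(*names):
        for name in names:
            value = usage.get(name) if isinstance(usage, dict) else getattr(usage, name, None)
            if value:
                return int(value)
        return 0

    input_tokens = read("input_tokens", "prompt_tokens", "inputTokens")
    output_tokens = read("output_tokens", "completion_tokens", "outputTokens")
    return input_tokens, output_tokens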

You can now capture tokens, latency, and cost for each agent. Next, we’ll run the swarm across LangGraph, Strands, and OpenAI Swarm to see how they perform with the same agent definitions.

Step 5: Run multiple orchestrators and track results

The repository includes scripts to run all three orchestrators and analyze their performance:

$# Run all orchestrators 5 times each
$./scripts/run_swarm_benchmark.sh sequential 5
$
$# Analyze the results
$python scripts/analyze_benchmark_results.py

Quick start recap

  1. Configure env: Create .env with SDK keys
  2. Install deps: pip install -r requirements.txt
  3. Download papers: python scripts/download_papers.py
  4. Bootstrap agents: python scripts/launchdarkly/bootstrap.py
  5. Configure targeting: Set default variation for each agent in LaunchDarkly UI
  6. Test run: python orchestrators/strands/run_gap_analysis.py

Troubleshooting: If you see “No enabled agents found,” check that each agent has a default variation set in the Targeting tab.

Now that you’ve run the swarm across all three orchestrators, let’s look at how they differ in approach and performance.

Comparing orchestrator approaches to swarms

All three frameworks support multi-agent workflows; they just disagree on who decides what happens next.

Key differences

| Aspect | Strands | LangGraph | OpenAI Swarm |
| --- | --- | --- | --- |
| Routing | Framework-managed | Graph-based | Function return |
| Handoff API | Tool call (automatic) | Command object | Return Agent object |
| Boilerplate | Low | Medium | Medium |
| Control | Low (black box) | High (explicit graph) | High (manual implementation) |
| Debugging | Hard (why didn't agent run?) | Easy (graph trace) | Hard (silent failures) |
| Per-agent metrics | Built-in | Wrapper required | Interception required |

View full implementations: Strands | LangGraph | OpenAI Swarm
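
To make the "Handoff API" row concrete, here are minimal sketches of the two explicit handoff styles, based on the public LangGraph and OpenAI Swarm APIs (Strands routes between agents for you, so there's no equivalent call to show). Agent names match the bootstrapped configs; run_approach_analysis is a hypothetical helper.

# LangGraph: a node returns a Command that names the next node in the graph.
from langgraph.types import Command

def approach_analyzer_node(state):
    analysis = run_approach_analysis(state)  # hypothetical helper
    return Command(
        goto="contradiction-detector",
        update={"approach_analysis": analysis},
    )

# OpenAI Swarm: a tool function that returns another Agent triggers the handoff.
from swarm import Agent

contradiction_detector = Agent(name="Contradiction Detector", instructions="...")

def handoff_to_contradiction_detector():
    """Hand the conversation to the contradiction detector."""
    return contradiction_detector

approach_analyzer = Agent(
    name="Approach Analyzer",
    instructions="...",
    functions=[handoff_to_contradiction_detector],
)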

The LaunchDarkly advantage: By defining agents externally, you can implement swarms across all three frameworks and compare their approaches with the same agent definitions.

Performance comparison (9 runs: 3 datasets × 3 orchestrators)

| Metric | OpenAI Swarm | Strands | LangGraph |
| --- | --- | --- | --- |
| Avg time | 2.9 min | 5.7 min | 8.0 min |
| Tokens | 67K | 99K | 89K |
| Speed | 385 tok/s | 287 tok/s | 186 tok/s |
| Report size | 13 KB | 32 KB | 67 KB |
| Variance | ±1.05 min | ±1.38 min | ±0.21 min |

Key insight (based on limited sample): Fastest ≠ best. OpenAI Swarm was 3x faster but produced reports 80% smaller than LangGraph. LangGraph had the lowest variance and most comprehensive outputs despite slower execution.

Performance comparison graphs showing execution time, token usage, and processing speed across all three orchestrators

Example reports: See the outputs

Report size variation demonstrates why per-agent tracking matters - you need to know when agents produce minimal output.

Go further: Set up a product analytics dashboard

If you’ll be running multiple benchmarks, set up a custom dashboard to track trends:

  1. Go to Metrics → Custom metrics
  2. Select agent_execution_cost and click Create dashboard
  3. Add charts for agent_execution_tokens and agent_execution_duration
  4. Add filters for orchestrator and agent to compare frameworks
  5. Save as “Swarm Cost Comparison”

Custom dashboard showing cost, token usage, and duration metrics across all three orchestrators

The dashboard automatically aggregates metrics from all your runs, showing:

  • Cost by orchestrator: Which framework is most expensive
  • Per-agent token usage: Which agents consume the most resources
  • Execution time trends: Performance consistency across runs
  • Silent failures: Agents that didn’t execute (0 tokens)

Product analytics dashboard showing aggregated metrics across all orchestrator runs with filters and breakdowns

Conclusion

The orchestrator you choose determines how agents coordinate, but it shouldn’t lock you into a single framework. By defining agents in LaunchDarkly and fetching them at runtime, you can run the same swarm across LangGraph, Strands, and OpenAI Swarm without duplicating configuration or watching prompts drift between repos.

The performance differences are real. OpenAI Swarm is fastest, LangGraph produces the most comprehensive outputs, and Strands offers the simplest setup. But you only discover these tradeoffs if you can track each agent individually and catch silent failures when they happen.

Swarms cost more than single LLM calls. The payoff is traceable reasoning you can audit, refine, and trust.

The full implementation is available on GitHub - AI Orchestrators. Clone the repo and run the same swarm across all three orchestrators. To get started with LaunchDarkly AI Configs, follow the quickstart guide.