Building Framework-Agnostic AI Swarms: Comparing LangGraph, Strands, and OpenAI Swarm

Published February 4, 2026

by Scarlett Attensil

If you’ve ever run the same app in multiple environments, you know the pain of duplicated configuration. Agent swarms have the same problem: the moment you try multiple orchestrators (LangGraph, Strands, OpenAI Swarm), your agent definitions start living in different formats. Prompts drift. Model settings drift. A “small behavior tweak” turns into archaeology across repos.

AI behavior isn’t code. Prompts aren’t functions. They change too often, and too experimentally, to be hard-wired into orchestrator code. LaunchDarkly AI Configs lets you treat agent definitions like shared configuration instead. Define them once, store them centrally, and let any orchestrator fetch them. Update a prompt or model setting in the LaunchDarkly UI, and the new version rolls out without a redeploy.

Start your free trial

Ready to build framework-agnostic AI swarms? Start your 14-day free trial of LaunchDarkly to follow along with this tutorial. No credit card required.

The problem: Research gap analysis across multiple papers

When analyzing academic literature, researchers face a daunting task: reading dozens of papers to identify patterns, spot contradictions, and find unexplored opportunities. A single LLM call can summarize papers, but it produces a monolithic analysis you can’t trace, refine, or trust for critical decisions.

The challenge compounds when you need to:

  • Identify methodological patterns across 12+ papers without missing subtle connections
  • Detect contradictory findings that might invalidate assumptions
  • Discover research gaps that represent genuine opportunities, not just oversight

This is where specialized agents excel - each focused on one aspect of the analysis, building on each other’s work.

In this tutorial, we’ll build a 3-agent research analysis swarm that solves this problem by dividing the work:

| Agent | Role | Output |
| --- | --- | --- |
| Approach Analyzer | Clusters methodological themes across papers | "Papers 1, 4, 7 use reinforcement learning; Papers 2, 5 use symbolic methods" |
| Contradiction Detector | Finds conflicting claims between papers | "Paper 3 claims X improves performance; Paper 8 shows X degrades it" |
| Gap Synthesizer | Identifies unexplored research directions | "No papers combine approach A with dataset B; potential opportunity" |

We’ll implement this swarm across three different orchestrators (LangGraph, Strands, and OpenAI Swarm), demonstrating how LaunchDarkly AI Configs enable:

  • Framework-agnostic agent definitions: Define agents once in LaunchDarkly, use them everywhere
  • Per-agent observability: Track tokens, latency, and costs for each agent individually - catch silent failures when agents skip execution
  • Dynamic swarm composition: Add/remove agents from the swarm or switch models without touching code

Why use a swarm?

Research gap analysis requires different skills: clustering methodological patterns, detecting contradictions, and synthesizing opportunities. With a swarm, each agent handles one aspect and produces artifacts the next agent builds on. You can track tokens, latency, and cost per agent. You can catch silent failures when an agent skips execution. And when something goes wrong, you know exactly where.

Technical requirements

Before implementing the swarm, ensure you have:

  • A LaunchDarkly account with AI Configs (a free trial works)
  • A LaunchDarkly SDK key and an API access token for your project
  • Python 3 with venv and pip
  • Anthropic and OpenAI API keys for the model calls

The complete implementation is available at GitHub - AI Orchestrators.

The architecture: how LaunchDarkly powers framework-agnostic swarms

The swarm architecture has three layers: dynamic agent configuration, per-agent tracking, and custom metrics for cost attribution. Here’s how they work together.

LangGraph swarm architecture showing LaunchDarkly configuration fetch, agent interactions with Command-based handoffs, and dual metrics tracking to both AI Config Trends and Product Analytics dashboards

The diagram shows LangGraph’s implementation, but Strands and OpenAI Swarm follow the same pattern with their own handoff mechanisms. The key components are:

  1. Configuration Fetch: The orchestrator queries LaunchDarkly’s API to dynamically discover all agent configurations, avoiding hardcoded agent definitions
  2. Agent Graph: Three specialized agents (Approach Analyzer, Contradiction Detector, Gap Synthesizer) connected through explicit handoff mechanisms
  3. Metrics Collection: Each agent execution captures tokens, duration, and cost metrics through both the AI Config tracker and custom metrics API
  4. Dual Dashboard Views: The same metrics appear in both the AI Config Trends dashboard (for individual agent monitoring) and Product Analytics (for cross-orchestrator comparison)

Three layers of framework-agnostic swarms

1. AI Config for Dynamic Agent Configuration

Each AI Config stores:

  • Agent key, display name, and model selection
  • System instructions and tool definitions

Your orchestrator code queries LaunchDarkly for “all enabled agent configs” and builds the swarm dynamically. No hardcoded agent names.

2. Per-Agent Tracking with AI SDK

LaunchDarkly’s AI SDK provides tracking through config evaluations. You get a fresh tracker for each agent, then track tokens, duration, and success/failure. These metrics flow to the AI Config Monitoring dashboard automatically.
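
To make the pattern concrete, here is a minimal sketch of per-agent tracking. It assumes each evaluated agent config exposes a tracker with track_duration, track_tokens, and track_success methods, and that TokenUsage is importable from ldai.tracker; exact names and return shapes can vary by AI SDK version, so check the docs against your installed release. run_agent is a hypothetical helper, not part of the repo.

import time

from ldai.tracker import TokenUsage  # assumption: module path may differ by ldai version

# Illustrative sketch, not the repo's code. `configs` and `context` come from the
# dynamic fetch shown in Step 4; run_agent is a hypothetical helper that calls the
# model and reports token counts.
config = configs.get("approach-analyzer")
tracker = config.tracker  # assumption: each evaluated agent config exposes its tracker

start = time.time()
output, input_tokens, output_tokens = run_agent(config, papers)

tracker.track_duration(int((time.time() - start) * 1000))  # duration in milliseconds
tracker.track_tokens(TokenUsage(
    total=input_tokens + output_tokens,
    input=input_tokens,
    output=output_tokens,
))
tracker.track_success()  # call tracker.track_error() instead if the agent failed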

AI Config monitoring dashboard showing per-agent token usage, duration, and success rates across multiple runs

This tracking catches silent failures - when agents skip execution or produce minimal output. Step 4 shows the implementation patterns for each framework.

3. Custom Metrics for Cost Attribution

Per-agent tracking shows performance, but for cost comparisons across orchestrators you need custom metrics. These let you query by orchestrator, compare costs across frameworks, and identify anomalies.
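
Here is a sketch of the idea using the standard LDClient.track call and the event names referenced later in the dashboard section (agent_execution_cost, agent_execution_tokens, agent_execution_duration). The per-token prices are placeholders, not the repo's actual rates.

# Hedged sketch: send per-agent cost/token/duration events as custom metrics.
# `ld_client` and `context` come from the setup in Step 4; prices are illustrative only.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # placeholder $/token, adjust for your model
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # placeholder $/token

def track_agent_cost(ld_client, context, orchestrator, agent_key,
                     input_tokens, output_tokens, duration_s):
    cost = input_tokens * PRICE_PER_INPUT_TOKEN + output_tokens * PRICE_PER_OUTPUT_TOKEN
    data = {"orchestrator": orchestrator, "agent": agent_key}

    # LDClient.track(event_name, context, data=None, metric_value=None)
    ld_client.track("agent_execution_cost", context, data, metric_value=cost)
    ld_client.track("agent_execution_tokens", context, data,
                    metric_value=input_tokens + output_tokens)
    ld_client.track("agent_execution_duration", context, data, metric_value=duration_s)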

With the architecture covered, let’s build the swarm. We’ll download research papers, set up the project, bootstrap agent configs in LaunchDarkly, implement per-agent tracking, and run the swarm across all three orchestrators.

Step 1: Download research papers

First, you need papers to analyze. The scripts/download_papers.py script queries ArXiv with narrow, category-specific searches to ensure focused results.

$python scripts/download_papers.py

The script presents pre-configured narrow research topics:

# From orchestration/scripts/download_papers.py:164-189
topics = {
    "1": {
        "name": "Chain-of-thought prompting in LLMs",
        "query": "cat:cs.CL AND (chain-of-thought OR CoT) AND reasoning",
        "years": 2
    },
    "2": {
        "name": "Retrieval-augmented generation (RAG)",
        "query": "cat:cs.CL AND (retrieval-augmented OR RAG) AND generation",
        "years": 2
    },
    "3": {
        "name": "Emergent communication in multi-agent RL",
        "query": "cat:cs.MA AND (emergent communication OR language emergence)",
        "years": 5
    },
    "4": {
        "name": "Few-shot prompting for code generation",
        "query": "cat:cs.SE AND few-shot AND code generation",
        "years": 2
    },
    "5": {
        "name": "Vision-language model grounding",
        "query": "cat:cs.CV AND vision-language AND grounding",
        "years": 2
    }
}

These topics are intentionally narrow: Each uses ArXiv categories (cat:cs.CL, cat:cs.MA) to limit scope. Boolean AND operators ensure papers match all criteria. 2-5 year windows prevent overwhelming the analysis.

For even narrower custom queries, combine categories with specific techniques like cat:cs.CL AND chain-of-thought AND mathematical AND reasoning for CoT math only, cat:cs.MA AND emergent AND (referential OR compositional) for specific emergence types, or cat:cs.SE AND few-shot AND (Python OR JavaScript) AND test generation for language-specific code generation.
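
If you want to experiment with a custom query outside the script's presets, a quick sketch with the arxiv package (installed in Step 2) might look like the following. The repo's downloader does more on top of this (section extraction, year windows), so treat it as a starting point.

import arxiv

# Hedged sketch of a custom narrow query; download_papers.py layers section
# extraction and date filtering on top of a search like this.
search = arxiv.Search(
    query="cat:cs.CL AND chain-of-thought AND mathematical AND reasoning",
    max_results=12,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

for result in arxiv.Client().results(search):
    print(result.published.date(), result.title)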

The script saves papers to data/gap_analysis_papers.json with this structure:

[
  {
    "id": "2409.02645v2",
    "title": "Emergent Language: A Survey and Taxonomy",
    "authors": "Jannik Peters, Constantin Waubert de Puiseau, ...",
    "published": "2024-09-04",
    "category": "cs.MA",
    "abstract": "The field of emergent language represents...",
    "introduction": "Language emergence has been explored...",
    "conclusion": "This paper provides a comprehensive review..."
  }
]

Why this format: Each paper includes ~2-3K characters of text (abstract + intro + conclusion), which is enough for analysis but won’t overflow context windows. For 12 papers, you’re looking at ~30K characters (~7.5K tokens) of input.
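
Under those assumptions, a quick back-of-the-envelope check on your own download is easy; the ~4 characters per token ratio is a rough heuristic, not an exact count.

import json

# Rough context-size estimate for the downloaded papers (~4 chars/token heuristic).
with open("data/gap_analysis_papers.json") as f:
    papers = json.load(f)

total_chars = sum(
    len(p.get("abstract", "")) + len(p.get("introduction", "")) + len(p.get("conclusion", ""))
    for p in papers
)
print(f"{len(papers)} papers, {total_chars:,} characters, ~{total_chars // 4:,} tokens")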

You now have 12 papers saved locally. Next, we’ll configure LaunchDarkly credentials and install the orchestration frameworks.

Step 2: Set up your multi-orchestrator project

Environment setup

For help getting your SDK and API keys, see the API access tokens guide and SDK key management.

$# .env file
$LD_SDK_KEY=sdk-xxxxx # Get from LaunchDarkly project settings
$LD_API_KEY=api-xxxxx # Create at Account settings → Authorization
$LAUNCHDARKLY_PROJECT_KEY=orchestrator-agents
$
$# Model API keys
$ANTHROPIC_API_KEY=sk-ant-xxxxx
$OPENAI_API_KEY=sk-xxxxx

Install dependencies

$python -m venv .venv
$source .venv/bin/activate
$
$# LaunchDarkly SDKs - see [Python SDK docs](/sdk/server-side/python)
$pip install ldai ldclient python-dotenv arxiv PyPDF2 requests
$
$# Orchestration frameworks
$pip install strands-sdk langgraph swarm

For more on the LaunchDarkly AI SDK, see the AI SDK documentation.

Your environment is configured and dependencies are installed. Next, we’ll use the bootstrap script to automatically create all three agent configs in LaunchDarkly.

Step 3: Bootstrap agent configs with the manifest

The orchestration repo includes a complete bootstrap system that automatically creates all agent configurations, tools, and variations in LaunchDarkly. This is much faster and more reliable than manual setup.

Understanding the bootstrap system

The bootstrap process uses a YAML manifest to define:

  1. Tools - Functions agents can call (fetch_paper_section, handoff_to_agent, etc.)
  2. Agent Configs - Three specialized agents with their roles and instructions
  3. Variations - Multiple model options (Anthropic Claude vs OpenAI GPT)
  4. Targeting Rules - Which orchestrators get which models

Run the bootstrap script

$# From the orchestration repo root
$cd ai-orchestrators
$
$# Run bootstrap with the research gap manifest
$python scripts/launchdarkly/bootstrap.py
$
$# You'll see:
>╔═══════════════════════════════════════════════════════╗
>║ AI Agent Orchestrator - LaunchDarkly Bootstrap ║
>╚═══════════════════════════════════════════════════════╝
>
>Available manifests:
> 1. Research Gap Analysis (research_gap_manifest.yaml)
>
>Select manifest or press Enter for default: [Enter]
>
>📦 Project: orchestrator-agents
>🌍 Environment: production
>
>🛠️ Creating paper analysis tools...
> ✓ Tool 'extract_key_sections' created
> ✓ Tool 'fetch_paper_section' created
> ✓ Tool 'handoff_to_agent' created
> ...
>
>🤖 Creating AI agent configs...
> ✓ AI Config 'approach-analyzer' created
> ✓ AI Config 'contradiction-detector' created
> ✓ AI Config 'gap-synthesizer' created
>
>✨ Bootstrap complete!

What gets created

The bootstrap script creates the three agents described earlier (Approach Analyzer, Contradiction Detector, Gap Synthesizer), each with swarm-aware instructions and handoff tools.

Verify in LaunchDarkly dashboard

After bootstrap completes:

  1. Go to your LaunchDarkly AI Configs dashboard at https://app.launchdarkly.com/<your-project-key>/<your-environment-key>/ai-configs
  2. You’ll see all three agent configs created
  3. Each config has two model variations (Claude and OpenAI), swarm-aware instructions, and attached handoff tools

How variations and targeting work

Each agent has two variations in the manifest:

# Example from approach-analyzer agent
variations:
  - key: "analyzer-claude"
    name: "Approach Analyzer Claude"
    modelConfig:
      provider: "anthropic"
      modelId: "claude-sonnet-4-5"
    tools: ["handoff_to_agent", "cluster_approaches"]
    instructions: |
      [Agent instructions here]

  - key: "analyzer-openai"
    name: "Approach Analyzer OpenAI"
    modelConfig:
      provider: "openai"
      modelId: "gpt-5"
    tools: ["handoff_to_agent", "cluster_approaches"]
    instructions: |
      [Same instructions, different model]

targeting:
  rules:
    - variation: "analyzer-openai"
      clauses:
        - attribute: "orchestrator"
          op: "in"
          values: ["openai_swarm", "openai-swarm"]
  defaultVariation: "analyzer-claude"

When an orchestrator requests this agent:

  1. Context includes orchestrator attribute: context = create_context(execution_id, orchestrator="openai_swarm")
  2. LaunchDarkly evaluates targeting rules: If orchestrator is “openai_swarm” or “openai-swarm”, use OpenAI variation
  3. Otherwise use default: Claude variation for all other orchestrators

This lets you:

  • Use OpenAI models when running OpenAI Swarm (native compatibility)
  • Use Claude for other orchestrators
  • A/B test models by adjusting targeting rules
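
For illustration, here is one plausible shape for the repo's create_context helper used in step 1 above, built on the Python server SDK's Context builder. The actual helper in shared/launchdarkly may set additional attributes.

from ldclient import Context

def create_context(execution_id: str, orchestrator: str) -> Context:
    """Hypothetical sketch of the repo's create_context helper: the `orchestrator`
    attribute is what the targeting rule above matches against."""
    return (
        Context.builder(execution_id)
        .kind("user")
        .set("orchestrator", orchestrator)
        .build()
    )

# Evaluating approach-analyzer with this context returns the OpenAI variation
# for openai_swarm runs and the Claude default for everything else.
context = create_context("openai-swarm-20260204_120000", orchestrator="openai_swarm")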

Customize agent behavior

After bootstrap, you can adjust agents in the LaunchDarkly UI without code changes. Switch between Claude, GPT, or other supported models. Refine instructions for better handoffs. Control which agents are included in the swarm through targeting rules. Test different prompts or models side-by-side with experiments.

Your three agents are now configured in LaunchDarkly. Next, we’ll implement tracking so you can monitor tokens, latency, and cost for each agent individually.

Step 4: Implement per-agent tracking

The orchestration repository demonstrates per-agent tracking across all three frameworks. First, you need to fetch agent configurations from LaunchDarkly:

Fetching agent configurations dynamically

from datetime import datetime

from shared.launchdarkly import (
    init_launchdarkly_clients,
    fetch_agent_configs_from_api,
    create_context,
    build_agent_requests
)

# Initialize LaunchDarkly clients
ld_client, ai_client = init_launchdarkly_clients()

# Fetch agent list from LaunchDarkly API (not hardcoded!)
items = fetch_agent_configs_from_api()
print(f"Found {len(items)} AI config(s) in LaunchDarkly")

# Create execution context
execution_id = f"langgraph-{datetime.now().strftime('%Y%m%d_%H%M%S')}"
context = create_context(execution_id, orchestrator="langgraph")

# Build requests for all agents
agent_requests, agent_metadata = build_agent_requests(items)

# Fetch all configs in one call
configs = ai_client.agent_configs(agent_requests, context)

# Process agents with configured variations
enabled_agents = []
for item in items:
    config = configs.get(item["key"])
    if config and config.enabled:
        enabled_agents.append({
            "key": item["key"],
            "name": item["name"],
            "config": config,
            "model": config.model.name if config.model else "claude-sonnet-4-5"
        })

print(f"✓ Found {len(enabled_agents)} configured agent configs")

Pattern 1: Native framework metrics (Strands)

Strands provides accumulated_usage on each node result after execution:

# From orchestrators/strands/run_gap_analysis.py:418-424
if agent_key in per_agent_metrics:
    usage = node_result.accumulated_usage or {}
    input_tokens, output_tokens = extract_usage_tokens(usage)
    total_tokens = input_tokens + output_tokens

View full Strands implementation

Pattern 2: Message-based tracking (LangGraph)

LangGraph attaches usage_metadata to messages, requiring post-execution iteration:

# From orchestrators/langgraph/run_gap_analysis.py:442-446
if hasattr(msg, "usage_metadata") and msg.usage_metadata:
    usage_data = msg.usage_metadata
    input_tokens = usage_data.get("input_tokens", 0) or usage_data.get("prompt_tokens", 0)
    output_tokens = usage_data.get("output_tokens", 0) or usage_data.get("completion_tokens", 0)
    has_usage = True

View full LangGraph implementation

Pattern 3: Interception-based tracking (OpenAI Swarm)

OpenAI Swarm doesn’t aggregate per-agent metrics, requiring interception of completion calls:

# From orchestrators/openai_swarm/run_gap_analysis.py:369-387
original_get_chat_completion = client.get_chat_completion

def tracked_get_chat_completion(agent, history, context_variables, model_override, stream, debug):
    start_call = time.time()
    completion = original_get_chat_completion(
        agent=agent,
        history=history,
        context_variables=context_variables,
        model_override=model_override,
        stream=stream,
        debug=debug,
    )
    duration = time.time() - start_call
    agent_key = key_by_name.get(agent.name, agent.name)
    usage = getattr(completion, "usage", None)
    if usage:
        input_tokens = int(getattr(usage, "prompt_tokens", 0))
        output_tokens = int(getattr(usage, "completion_tokens", 0))
        total_tokens = int(getattr(usage, "total_tokens", input_tokens + output_tokens))
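
The excerpt stops before the wrapper records anything or gets installed. An illustrative continuation (not a verbatim excerpt from the repo) would accumulate the numbers, return the completion, and then monkey-patch the client:

        # Illustrative continuation: record what we measured, assuming the script
        # keeps a per_agent_metrics dict like the Strands example does.
        metrics = per_agent_metrics.setdefault(
            agent_key, {"tokens": 0, "duration": 0.0, "calls": 0}
        )
        metrics["tokens"] += total_tokens
        metrics["duration"] += duration
        metrics["calls"] += 1

    return completion  # Swarm still needs the original return value

# Install the wrapper so every completion call flows through the tracking code.
client.get_chat_completion = tracked_get_chat_completion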

View full OpenAI Swarm implementation

Critical: Provider token field names differ

Each provider uses different field names: Anthropic uses input_tokens/output_tokens, OpenAI uses prompt_tokens/completion_tokens, and some frameworks use camelCase (inputTokens). The implementations use fallback chains to handle all formats.
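
The extract_usage_tokens helper referenced in the Strands excerpt is one place this shows up; a hedged sketch of that kind of fallback chain (the repo's version may differ) looks like this:

def extract_usage_tokens(usage) -> tuple[int, int]:
    """Illustrative fallback chain for provider-specific token fields;
    the repo's helper may cover additional formats."""
    def read(*names):
        for name in names:
            value = usage.get(name) if isinstance(usage, dict) else getattr(usage, name, None)
            if value:
                return int(value)
        return 0

    input_tokens = read("input_tokens", "prompt_tokens", "inputTokens")
    output_tokens = read("output_tokens", "completion_tokens", "outputTokens")
    return input_tokens, output_tokens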

You can now capture tokens, latency, and cost for each agent. Next, we’ll run the swarm across LangGraph, Strands, and OpenAI Swarm to see how they perform with the same agent definitions.

Step 5: Run multiple orchestrators and track results

The repository includes scripts to run all three orchestrators and analyze their performance:

$# Run all orchestrators 5 times each
$./scripts/run_swarm_benchmark.sh sequential 5
$
$# Analyze the results
$python scripts/analyze_benchmark_results.py

Quick start recap

  1. Configure env: Create .env with SDK keys
  2. Install deps: pip install -r requirements.txt
  3. Download papers: python scripts/download_papers.py
  4. Bootstrap agents: python scripts/launchdarkly/bootstrap.py
  5. Configure targeting: Set default variation for each agent in LaunchDarkly UI
  6. Test run: python orchestrators/strands/run_gap_analysis.py

Troubleshooting: If you see “No enabled agents found,” check that each agent has a default variation set in the Targeting tab.

Now that you’ve run the swarm across all three orchestrators, let’s look at how they differ in approach and performance.

Comparing orchestrator approaches to swarms

All three frameworks support multi-agent workflows; they just disagree on who decides what happens next.

Key differences

| Aspect | Strands | LangGraph | OpenAI Swarm |
| --- | --- | --- | --- |
| Routing | Framework-managed | Graph-based | Function return |
| Handoff API | Tool call (automatic) | Command object | Return Agent object |
| Boilerplate | Low | Medium | Medium |
| Control | Low (black box) | High (explicit graph) | High (manual implementation) |
| Debugging | Hard (why didn't agent run?) | Easy (graph trace) | Hard (silent failures) |
| Per-agent metrics | Built-in | Wrapper required | Interception required |

View full implementations: Strands | LangGraph | OpenAI Swarm
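
To make the "Handoff API" row concrete, here are minimal sketches of the two explicit handoff styles, based on the public LangGraph and OpenAI Swarm APIs (Strands routes between agents for you, so there's no equivalent call to show). Agent names match the bootstrapped configs; run_approach_analysis is a hypothetical helper.

# LangGraph: a node returns a Command that names the next node in the graph.
from langgraph.types import Command

def approach_analyzer_node(state):
    analysis = run_approach_analysis(state)  # hypothetical helper
    return Command(
        goto="contradiction-detector",
        update={"approach_analysis": analysis},
    )

# OpenAI Swarm: a tool function that returns another Agent triggers the handoff.
from swarm import Agent

contradiction_detector = Agent(name="Contradiction Detector", instructions="...")

def handoff_to_contradiction_detector():
    """Hand the conversation to the contradiction detector."""
    return contradiction_detector

approach_analyzer = Agent(
    name="Approach Analyzer",
    instructions="...",
    functions=[handoff_to_contradiction_detector],
)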

The LaunchDarkly advantage: By defining agents externally, you can implement swarms across all three frameworks and compare their approaches with the same agent definitions.

Performance comparison (9 runs: 3 datasets × 3 orchestrators)

| Metric | OpenAI Swarm | Strands | LangGraph |
| --- | --- | --- | --- |
| Avg time | 2.9 min | 5.7 min | 8.0 min |
| Tokens | 67K | 99K | 89K |
| Speed | 385 tok/s | 287 tok/s | 186 tok/s |
| Report size | 13 KB | 32 KB | 67 KB |
| Variance | ±1.05 min | ±1.38 min | ±0.21 min |

Key insight (based on limited sample): Fastest ≠ best. OpenAI Swarm was 3x faster but produced reports 80% smaller than LangGraph. LangGraph had the lowest variance and most comprehensive outputs despite slower execution.

Performance comparison graphs showing execution time, token usage, and processing speed across all three orchestrators

Example reports: See the outputs

Report size variation demonstrates why per-agent tracking matters - you need to know when agents produce minimal output.

Go further: Set up a product analytics dashboard

If you’ll be running multiple benchmarks, set up a custom dashboard to track trends:

  1. Go to Metrics → Custom metrics
  2. Select agent_execution_cost and click Create dashboard
  3. Add charts for agent_execution_tokens and agent_execution_duration
  4. Add filters for orchestrator and agent to compare frameworks
  5. Save as “Swarm Cost Comparison”

Custom dashboard showing cost, token usage, and duration metrics across all three orchestrators

The dashboard automatically aggregates metrics from all your runs, showing:

  • Cost by orchestrator: Which framework is most expensive
  • Per-agent token usage: Which agents consume the most resources
  • Execution time trends: Performance consistency across runs
  • Silent failures: Agents that didn’t execute (0 tokens)

Product analytics dashboard showing aggregated metrics across all orchestrator runs with filters and breakdowns

Conclusion

The orchestrator you choose determines how agents coordinate, but it shouldn’t lock you into a single framework. By defining agents in LaunchDarkly and fetching them at runtime, you can run the same swarm across LangGraph, Strands, and OpenAI Swarm without duplicating configuration or watching prompts drift between repos.

The performance differences are real. OpenAI Swarm is fastest, LangGraph produces the most comprehensive outputs, and Strands offers the simplest setup. But you only discover these tradeoffs if you can track each agent individually and catch silent failures when they happen.

Swarms cost more than single LLM calls. The payoff is traceable reasoning you can audit, refine, and trust.

The full implementation is available on GitHub - AI Orchestrators. Clone the repo and run the same swarm across all three orchestrators. To get started with LaunchDarkly AI Configs, follow the quickstart guide.