
Build a production LLM data extraction pipeline with AgentControl and Vercel AI Gateway


Published January 9th, 2026


by Scarlett Attensil

Newer features are available with AgentControl

This tutorial was published in January 2026, before LaunchDarkly shipped several features that extend the extraction-pipeline pattern shown below. The walkthrough still works, but for new builds you may also want to use:

  • Prompt snippets: Reusable prompt fragments so you can share schema definitions across configs without copy-paste drift
  • Offline evaluations and Datasets: Validate every prompt or schema change against a saved reference set of transcripts before promoting
  • Custom judges: Score the structured outputs against domain-specific quality criteria, such as field completeness or schema adherence

To learn more, read AgentControl.

Every conversation contains signals your ML models need. Customer calls reveal buying intent. Support tickets expose product friction. Interview transcripts capture technical depth. The problem? Those signals are buried in thousands of words of unstructured text.

This tutorial shows you how to build a data extraction pipeline that turns messy transcripts into structured JSON - using AgentControl to control everything (models, prompts, schemas) without redeploying.

What you’ll build: A pipeline that extracts 40-60+ structured fields from any transcript - sentiment scores, engagement metrics, binary signals, text statistics - all instantly configurable through LaunchDarkly’s UI.

The key insight: When you discover that customer_question_count predicts engagement better than talk time, or that Opus 4.5 handles technical jargon better than GPT-5.2, you can update your extraction logic immediately. No PR, no deploy, no waiting.

Ready to build? Clone the complete example repository to start extracting structured data from your transcripts in minutes.

The problem with unstructured text

Your organization has valuable signals buried in text such as customer conversations, support tickets, interview transcripts, and product reviews. Gong, Chorus, and other conversation intelligence platforms are excellent for their designed purpose, but you need something different: extracting specific features for your ML models, with a schema you control completely.

What you typically have:

"Yeah, so we've been looking at different solutions. The other vendor's
pricing was reasonable but their timeline was concerning. We need this
rolled out before Q3..."

What your models need:

```json
{
  "alternatives_mentioned": true,
  "pricing_sentiment": 0.3,
  "timeline_mentioned": true,
  "urgency_score": 0.8,
  "decision_timeframe": "Q3"
}
```
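To make the target shape concrete, here is a minimal TypeScript sketch that types the example record above and flags score fields outside a 0 to 1 range. The `CallFeatures` interface and `outOfRangeScores` helper are hypothetical, and the 0 to 1 scale is an assumption based on the example values, not taken from the repository's schemas.

```typescript
// Hypothetical shape of one extracted record, mirroring the example JSON above.
interface CallFeatures {
  alternatives_mentioned: boolean;
  pricing_sentiment: number; // assumed 0..1 scale
  timeline_mentioned: boolean;
  urgency_score: number; // assumed 0..1 scale
  decision_timeframe: string | null;
}

// Return the names of score fields outside the assumed 0..1 range (NaN counts as out of range).
function outOfRangeScores(record: CallFeatures): string[] {
  const scores: Array<[string, number]> = [
    ["pricing_sentiment", record.pricing_sentiment],
    ["urgency_score", record.urgency_score],
  ];
  return scores
    .filter(([, value]) => !(value >= 0 && value <= 1))
    .map(([name]) => name);
}

const example: CallFeatures = {
  alternatives_mentioned: true,
  pricing_sentiment: 0.3,
  timeline_mentioned: true,
  urgency_score: 0.8,
  decision_timeframe: "Q3",
};
console.log(outOfRangeScores(example)); // []
```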

How the pieces fit together

```text
┌─────────────┐      ┌───────────────────┐      ┌──────────────┐
│  Your Text  │ ---> │ Vercel AI Gateway │ ---> │     LLM      │
└─────────────┘      │   (unified API)   │      │ (GPT/Claude) │
                     └───────────────────┘      └──────────────┘
                               ↑                       │
                               │                       ↓
                     ┌───────────────────┐      ┌──────────────┐
                     │   LaunchDarkly    │      │  Structured  │
                     │      config       │      │     JSON     │
                     │ (model, prompts,  │      └──────────────┘
                     │  6 tool schemas)  │
                     └───────────────────┘
```

The AI model automatically selects the most appropriate extraction schema (prospecting, discovery, demo, proposal, technical, or customer success) based on the transcript content.

What stays in LaunchDarkly:

  • Model selection (GPT-5.2, Opus 4.5, Gemini 3)
  • System and user prompts
  • All 6 extraction tool schemas (40-60+ fields each)
  • Temperature and other parameters
  • Intelligent tool selection logic
  • Targeting rules for different use cases

What stays in your code:

  • Reading input files
  • Passing transcript text
  • Writing output CSV/JSON
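Because the code side only reads files, passes text, and writes results, the output step can stay small. Below is a minimal sketch of turning extracted records into CSV, assuming flat records (strings, numbers, booleans, null). The `toCsv` and `escapeCsv` helpers are hypothetical, not the repository's implementation; they quote values in RFC 4180 style so commas, quotes, and line breaks in extracted text don't break columns.

```typescript
type CsvValue = string | number | boolean | null | undefined;
type FlatRecord = Record<string, CsvValue>;

// Quote a value if it contains a comma, double quote, or line break; double inner quotes.
function escapeCsv(value: CsvValue): string {
  if (value === null || value === undefined) return "";
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Header is the union of keys across records, so records extracted with
// different tools (different field sets) can share one file.
function toCsv(records: FlatRecord[]): string {
  const columns: string[] = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  const lines = [columns.map(escapeCsv).join(",")];
  for (const record of records) {
    lines.push(columns.map((col) => escapeCsv(record[col])).join(","));
  }
  return lines.join("\n");
}

console.log(
  toCsv([
    { file: "call-1.txt", urgency_score: 0.8, decision_timeframe: "Q3" },
    { file: "call-2.txt", churn_risk: 0.2, note: 'said "maybe, later"' },
  ])
);
```

Columns missing from a record are left empty, which keeps every row aligned with the header.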

Why this approach

Change schemas in 2 minutes, not 2 sprints

This separation matters when you discover issues in production. For example, if the AI model is selecting the wrong tool for certain transcript types, you can adjust the prompt or model in LaunchDarkly right away. It takes about 2 minutes in the UI instead of a hotfix deploy.

Key benefits:

  • Instant schema updates: Add fields to any of the 6 tools when you discover new predictive signals
  • A/B testing models: Test Google Gemini 3 vs Anthropic Claude Opus 4.5 to see which selects tools more accurately
  • Smart tool selection: AI model automatically chooses between prospecting, discovery, demo, proposal, technical, or customer success schemas
  • Privacy compliance: Different configurations for different regions using targeting rules - GDPR-compliant schemas for European customers
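Targeting rules match on attributes of the context you pass to the SDK, so region-based schemas come down to including a region attribute in that context. Below is a minimal sketch; the `region` and `documentType` attribute names, the `extraction-job` context kind, and the `buildExtractionContext` helper are all hypothetical. The local `ExtractionContext` type only approximates the SDK's `LDContext` (a single-kind context with `kind`, `key`, and custom attributes) so the snippet runs without the SDK installed.

```typescript
// Approximation of a single-kind LaunchDarkly context: kind, key, plus custom attributes.
interface ExtractionContext {
  kind: string;
  key: string;
  [attribute: string]: string | number | boolean;
}

// Partial list of EU country codes, for illustration only.
const EU_COUNTRIES = new Set(["DE", "FR", "IE", "NL", "ES", "IT"]);

// Hypothetical helper: one context per extraction job, carrying the attributes
// your targeting rules match on (for example, "region is eu" -> GDPR-safe schema).
function buildExtractionContext(
  jobId: string,
  countryCode: string,
  documentType: string
): ExtractionContext {
  return {
    kind: "extraction-job", // hypothetical custom context kind
    key: jobId,
    region: EU_COUNTRIES.has(countryCode.toUpperCase()) ? "eu" : "other",
    documentType,
  };
}

const ctx = buildExtractionContext("job-42", "de", "sales-call");
console.log(ctx.region); // eu
```

A targeting rule such as "region is one of eu" can then serve a variation with privacy-safe tool schemas, while the pipeline code stays the same.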

One gateway, all the models

Vercel brings specific advantages for data extraction workloads:

  • Server-sent events for real-time progress: Processing 1,000 transcripts? Stream progress updates to your UI as each completes using the Vercel AI SDK
  • Automatic scaling: From 1 transcript to 10,000 - Vercel functions scale without configuration
  • Built-in reliability: Automatic retries and failover when LLM providers have issues
  • OIDC tokens on deployment: No API keys in production - Vercel handles auth automatically
  • Unified LLM access: One endpoint for OpenAI GPT-5.2, Anthropic Claude Opus 4.5, and Google Gemini 3 models through Vercel AI Gateway
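Server-sent events are a simple text protocol: each event is one or more `data:` lines followed by a blank line, optionally with an `event:` line naming the event type. Here is a minimal sketch of encoding per-transcript progress updates; the `ProgressUpdate` shape and `encodeSseEvent` helper are hypothetical and are not the Vercel AI SDK's own streaming helpers.

```typescript
// Hypothetical progress payload emitted once per transcript.
interface ProgressUpdate {
  file: string;
  status: "queued" | "done" | "error";
  completed: number;
  total: number;
}

// Encode one SSE message. JSON.stringify never emits raw newlines,
// so a single `data:` line is always enough here.
function encodeSseEvent(eventName: string, payload: ProgressUpdate): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// In a route handler you might enqueue these strings (UTF-8 encoded) into a
// ReadableStream and return it with "Content-Type: text/event-stream".
const chunk = encodeSseEvent("progress", {
  file: "call-1.txt",
  status: "done",
  completed: 1,
  total: 3,
});
console.log(JSON.stringify(chunk));
```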

Real-time progress showing extraction status for batch processing

Deploy once, then tune everything through LaunchDarkly while Vercel handles the infrastructure.

Complete setup

Prerequisites

  • Node.js 18+ and basic TypeScript knowledge
  • LaunchDarkly account with AgentControl enabled (quickstart guide)
  • Vercel account (free tier works) or API keys for local development
  • 100+ transcripts or documents to process (any format)

Quick start

Clone the complete example to skip setup and start extracting immediately:

```bash
git clone https://github.com/launchdarkly-labs/scarlett-feature-extraction.git
cd scarlett-feature-extraction
```

Install dependencies

```bash
# Node.js dependencies for the extraction pipeline
npm install @launchdarkly/node-server-sdk @launchdarkly/server-sdk-ai @launchdarkly/server-sdk-ai-vercel
npm install ai @ai-sdk/openai
```

SDK documentation: The LaunchDarkly Node.js AI SDK reference, including the Vercel provider, covers all SDK features.

For ML model training (optional), set up a Python environment:

```bash
python3 -m venv venv && source venv/bin/activate
pip install catboost scikit-learn pandas numpy joblib requests python-dotenv
```

Note: For Python-based pipelines, see the LaunchDarkly Python AI SDK (it doesn't include a Vercel provider).

Configure environment

```bash
# 1. Configure environment variables
cp .env.example .env
# Add your keys to .env:
# LAUNCHDARKLY_SDK_KEY=sdk-xxxxx
# LD_API_KEY=api-xxxxx          (for bootstrap only)
# LD_PROJECT_KEY=your-project   (for bootstrap only)

# 2. Get Vercel AI Gateway token (for local development)
npx vercel env pull   # Generates VERCEL_OIDC_TOKEN; refresh every 12 hours

# 3. Bootstrap config (one-time)
python bootstrap/create_unified_config.py
```

Environment variables explained:

  • LAUNCHDARKLY_SDK_KEY: Runtime SDK key for feature flags
  • AI_GATEWAY_API_KEY: Required for Vercel deployments
  • VERCEL_OIDC_TOKEN: Auto-generated for local development via npx vercel env pull
  • LD_API_KEY & LD_PROJECT_KEY: Only for bootstrap script to create configs
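The variables above split into runtime keys and bootstrap-only keys. A small startup check can fail fast with a clear message instead of surfacing as an authentication error partway through a batch. This is a minimal sketch; `resolveRuntimeConfig` is hypothetical, and it mirrors the `VERCEL_OIDC_TOKEN || AI_GATEWAY_API_KEY` fallback used in the core implementation of this tutorial.

```typescript
interface RuntimeConfig {
  sdkKey: string;
  gatewayToken: string;
}

// Validate runtime variables. LD_API_KEY and LD_PROJECT_KEY are deliberately
// not required here: only the one-time bootstrap script uses them.
function resolveRuntimeConfig(env: Record<string, string | undefined>): RuntimeConfig {
  const sdkKey = env.LAUNCHDARKLY_SDK_KEY;
  if (!sdkKey) {
    throw new Error("LAUNCHDARKLY_SDK_KEY is not set");
  }
  // Prefer the OIDC token (pulled locally via `npx vercel env pull`),
  // then fall back to a gateway API key.
  const gatewayToken = env.VERCEL_OIDC_TOKEN || env.AI_GATEWAY_API_KEY;
  if (!gatewayToken) {
    throw new Error("Set VERCEL_OIDC_TOKEN (run `npx vercel env pull`) or AI_GATEWAY_API_KEY");
  }
  return { sdkKey, gatewayToken };
}

// Usage at startup: const config = resolveRuntimeConfig(process.env);
```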

Project structure

```text
your-project/
├── bootstrap/
│   └── create_unified_config.py   # Creates config (run once)
├── lib/
│   ├── pipeline.ts                # Extraction pipeline
│   └── launchdarkly-client.ts     # LaunchDarkly AI SDK integration
├── app/
│   ├── page.tsx                   # Upload UI
│   └── api/
│       └── extract-stream/        # API endpoint for extraction
├── LAUNCHDARKLY_TOOLS.json        # Your field schemas (customize this!)
└── .env                           # Your API keys
```

Configuration

Where the extraction schemas live

The field definitions and extraction schemas are stored in two key locations:

  1. LAUNCHDARKLY_TOOLS.json - Contains all field schemas:

    • 40 core fields shared across all tools (sentiment scores, engagement metrics, etc.)
    • Tool-specific fields for each document type
    • This file defines what data you’ll extract
  2. bootstrap/create_unified_config.py - Sets up LaunchDarkly:

    • Reads the schemas from LAUNCHDARKLY_TOOLS.json
    • Creates the config in LaunchDarkly named transcript-extraction-unified
    • Attaches all 6 extraction tools to this single config
    • Run once: python bootstrap/create_unified_config.py
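Before running the bootstrap script, it can help to sanity-check the schema file. The sketch below counts properties per tool, assuming each top-level entry follows the `variation_x.function.parameters.properties` nesting shown in the customization example in this tutorial. The real file may contain additional keys, and `countFieldsPerTool` is a hypothetical helper.

```typescript
// Assumed layout of one entry in LAUNCHDARKLY_TOOLS.json, following the
// variation_x -> function -> parameters -> properties nesting.
interface ToolEntry {
  function?: {
    name?: string;
    parameters?: { properties?: Record<string, unknown> };
  };
}

// Count schema fields per top-level entry; entries without that nesting report 0.
function countFieldsPerTool(tools: Record<string, ToolEntry>): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const [key, entry] of Object.entries(tools)) {
    counts[key] = Object.keys(entry.function?.parameters?.properties ?? {}).length;
  }
  return counts;
}

// Example (Node.js):
// import { readFileSync } from "node:fs";
// const tools = JSON.parse(readFileSync("LAUNCHDARKLY_TOOLS.json", "utf8"));
// console.table(countFieldsPerTool(tools));
```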

The 6 extraction tools

Each tool is defined in LAUNCHDARKLY_TOOLS.json with specific fields for different call types:

Prospecting tool (extract_prospecting_features - 43 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json → variation_a_prospecting
  • Use case: First contact, cold outreach, gatekeeper conversations
  • Example fields: gatekeeper_encountered, callback_scheduled, interest_level

Discovery tool (extract_discovery_features - 48 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json → variation_b_discovery
  • Use case: Qualification calls, needs assessment
  • Example fields: budget_confirmed, authority_level, qualification_score

Demo tool (extract_demo_features - 58 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json → variation_c_demo
  • Use case: Product demonstrations, feature walkthroughs
  • Example fields: customer_wow_moments, demo_effectiveness_score, trial_requested

Proposal tool (extract_proposal_features - 53 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json → variation_d_proposal
  • Use case: Pricing discussions, contract negotiations
  • Example fields: close_probability, discount_requested, blockers_to_close

Technical tool (extract_technical_features - 63 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json → variation_e_technical
  • Use case: Architecture reviews, technical deep-dives
  • Example fields: technical_fit_score, technical_risk_score, scalability_concerns

Customer Success tool (extract_customer_success_features - 53 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json → variation_f_customer_success
  • Use case: QBRs, renewal discussions, expansion opportunities
  • Example fields: account_health_score, renewal_likelihood, churn_risk
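When the model picks one of the six tools, the tool name itself tells you the call type, which is worth storing alongside the extracted fields. Here is a minimal sketch using the tool names and variation keys listed above. The `ToolCall` shape is an assumption loosely modeled on tool-call results (a tool name plus parsed arguments), and `labelExtraction` is hypothetical.

```typescript
// Tool names and variation keys as listed above.
const TOOL_TO_VARIATION = {
  extract_prospecting_features: "variation_a_prospecting",
  extract_discovery_features: "variation_b_discovery",
  extract_demo_features: "variation_c_demo",
  extract_proposal_features: "variation_d_proposal",
  extract_technical_features: "variation_e_technical",
  extract_customer_success_features: "variation_f_customer_success",
} as const;

type ToolName = keyof typeof TOOL_TO_VARIATION;

// Assumed shape of a tool call: the chosen tool's name plus its parsed arguments.
interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

function isToolName(name: string): name is ToolName {
  return Object.prototype.hasOwnProperty.call(TOOL_TO_VARIATION, name);
}

// Attach the selected tool and its variation key to the extracted fields,
// rejecting tool names the config doesn't define.
function labelExtraction(call: ToolCall): Record<string, unknown> {
  const name = call.toolName;
  if (!isToolName(name)) {
    throw new Error(`Unexpected tool: ${name}`);
  }
  return { ...call.args, selected_tool: name, variation: TOOL_TO_VARIATION[name] };
}
```

Storing `selected_tool` per record also makes it easy to check later whether the model picked the schema you expected.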

ML Preview: These extracted fields become features for predictive models - Part 2 will show you how to train models that predict deal outcomes, customer churn, and more using the data you extract here.

How to customize the schemas

You have two options for customizing your extraction schemas:

Option 1: Edit directly in LaunchDarkly UI

  1. Navigate to LaunchDarkly → AgentControl → transcript-extraction-unified
  2. Click on any tool (e.g., “Extract Prospecting Features”)
  3. Edit the JSON schema directly in the UI - add/remove fields instantly
  4. Save changes - they apply immediately, no deployment needed!

This is perfect for quick iterations - discover a new predictive signal? Add it in 30 seconds.

Option 2: Update via code

  1. Edit LAUNCHDARKLY_TOOLS.json:
```jsonc
// Add a new field to prospecting tool:
"variation_a_prospecting": {
  "function": {
    "parameters": {
      "properties": {
        // Add your custom field here:
        "competitor_switching_intent": {
          "type": "boolean",
          "description": "Intent to switch from competitor"
        }
      }
    }
  }
}
```
  2. Re-run bootstrap to update LaunchDarkly:
```bash
python bootstrap/create_unified_config.py
```
  3. Your extraction automatically uses the new schema - no code changes needed!
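If you script schema changes rather than hand-editing the file, a small helper keeps the nesting right. This is a minimal sketch that returns an updated copy of the tools object with one new property, assuming the same `function.parameters.properties` layout as the snippet above. `addFieldToTool` and `FieldSchema` are hypothetical, and you would still re-run the bootstrap script afterwards.

```typescript
interface FieldSchema {
  type: "boolean" | "number" | "integer" | "string";
  description: string;
}

type ToolsFile = Record<string, any>;

// Return a copy of the tools object with `fieldName` added to one tool's
// properties. Throws if the tool is missing or the field already exists.
function addFieldToTool(
  tools: ToolsFile,
  variationKey: string,
  fieldName: string,
  field: FieldSchema
): ToolsFile {
  const properties = tools[variationKey]?.function?.parameters?.properties;
  if (!properties) {
    throw new Error(`${variationKey} has no function.parameters.properties`);
  }
  if (fieldName in properties) {
    throw new Error(`${fieldName} already exists on ${variationKey}`);
  }
  // Deep copy via JSON round-trip (fine for plain JSON data) so the input isn't mutated.
  const updated: ToolsFile = JSON.parse(JSON.stringify(tools));
  updated[variationKey].function.parameters.properties[fieldName] = field;
  return updated;
}

// Usage (Node.js):
// const next = addFieldToTool(tools, "variation_a_prospecting", "competitor_switching_intent",
//   { type: "boolean", description: "Intent to switch from competitor" });
// writeFileSync("LAUNCHDARKLY_TOOLS.json", JSON.stringify(next, null, 2));
```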

Start with 100 transcripts to validate your tools. You’ll likely learn more from the first 100 extractions than weeks of planning. Watch which tools the AI model selects and refine the prompts if it’s choosing incorrectly.
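To watch which tools the model selects across a first batch, a simple tally is enough. Here is a minimal sketch, assuming each result records the chosen tool in a `selected_tool` field (a hypothetical column name, not necessarily what the repository writes); `tallyToolSelections` is also hypothetical.

```typescript
interface ExtractionResult {
  file: string;
  selected_tool: string;
}

// Count how often each tool was selected and return rows with count and share,
// sorted by count descending, so skew toward one schema is easy to spot.
function tallyToolSelections(
  results: ExtractionResult[]
): Array<{ tool: string; count: number; share: number }> {
  const counts = new Map<string, number>();
  for (const r of results) {
    counts.set(r.selected_tool, (counts.get(r.selected_tool) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([tool, count]) => ({ tool, count, share: count / results.length }))
    .sort((a, b) => b.count - a.count);
}
```

If one tool dominates a batch you expected to be mixed, that is usually a prompt problem worth fixing in the config.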

Implementation & usage

Core implementation

Here’s the extraction implementation:

```typescript
// Based on github.com/launchdarkly-labs/scarlett-feature-extraction
// lib/launchdarkly-client.ts:101-198 (simplified)
import * as ld from "@launchdarkly/node-server-sdk";
import { initAi, LDAIClient } from "@launchdarkly/server-sdk-ai";
import { createOpenAI } from "@ai-sdk/openai";
import { generateObject, jsonSchema } from "ai";

export class LaunchDarklyAIClient {
  private ldClient!: ld.LDClient;
  private aiClient!: LDAIClient;

  async initialize(): Promise<void> {
    this.ldClient = ld.init(process.env.LAUNCHDARKLY_SDK_KEY!);
    await this.ldClient.waitForInitialization();
    this.aiClient = initAi(this.ldClient);
  }

  async extractStructuredFeatures(params: {
    configKey: string; // "transcript-extraction-unified"
    context: ld.LDContext;
    transcript: string;
  }): Promise<Record<string, unknown>> {
    // Get config with model, prompts, and all 6 extraction tools
    const aiConfig = await this.aiClient.completionConfig(
      params.configKey,
      params.context,
      { enabled: false }
    );
    if (!aiConfig.enabled || !aiConfig.model || !aiConfig.provider) {
      throw new Error(`Config "${params.configKey}" is disabled or missing a model`);
    }

    // Simplified: use the schema from the first tool (all share core fields)
    const tools = (aiConfig.model.parameters?.tools ?? []) as Array<{ parameters: object }>;
    if (tools.length === 0) {
      throw new Error(`Config "${params.configKey}" has no extraction tools`);
    }
    const schema = tools[0].parameters;

    // Create Vercel AI Gateway client (OpenAI-compatible endpoint)
    const vercelGateway = createOpenAI({
      baseURL: "https://ai-gateway.vercel.sh/v1",
      apiKey: process.env.VERCEL_OIDC_TOKEN || process.env.AI_GATEWAY_API_KEY,
    });

    // Use the provider and model from the config ("<provider>/<model>")
    const model = vercelGateway.chat(
      `${aiConfig.provider.name}/${aiConfig.model.name}`
    );

    // The system prompt also comes from the config
    const system = (aiConfig.messages ?? [])
      .filter((m) => m.role === "system")
      .map((m) => m.content)
      .join("\n");

    // Generate an object that conforms to the tool's JSON schema
    const { object } = await generateObject({
      model,
      schema: jsonSchema<Record<string, unknown>>(
        schema as Parameters<typeof jsonSchema>[0]
      ),
      system: system || undefined,
      prompt: `Transcript:\n\n${params.transcript}`,
    });
    return object;
  }
}
```
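Models occasionally omit fields even with a schema attached, so it can be worth checking each result against the schema's `required` list before writing it out. Here is a minimal sketch, assuming the tool schemas are standard JSON Schema objects with optional `required` and `properties` keys. `findSchemaGaps` and `matchesType` are hypothetical helpers, and this is a shallow check, not a full JSON Schema validator (a library such as Ajv covers the full specification).

```typescript
type SchemaType = string | string[];

interface ObjectSchema {
  properties?: Record<string, { type?: SchemaType }>;
  required?: string[];
}

// Check a value against a simple JSON Schema "type" (or a list of types).
function matchesType(value: unknown, type: SchemaType): boolean {
  if (Array.isArray(type)) return type.some((t) => matchesType(value, t));
  switch (type) {
    case "integer": return Number.isInteger(value);
    case "number": return typeof value === "number";
    case "string": return typeof value === "string";
    case "boolean": return typeof value === "boolean";
    case "array": return Array.isArray(value);
    case "object": return typeof value === "object" && value !== null && !Array.isArray(value);
    case "null": return value === null;
    default: return true; // unrecognized types are not checked
  }
}

// Shallow check: required keys that are missing, and present keys whose type
// doesn't match. Nested schemas are not inspected.
function findSchemaGaps(
  schema: ObjectSchema,
  value: Record<string, unknown>
): { missing: string[]; wrongType: string[] } {
  const missing = (schema.required ?? []).filter((key) => !(key in value));
  const wrongType = Object.entries(schema.properties ?? {})
    .filter(([key, def]) => key in value && def.type !== undefined && !matchesType(value[key], def.type))
    .map(([key]) => key);
  return { missing, wrongType };
}
```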

How to run the extraction pipeline

Once your schemas are configured, running extraction is straightforward:

```bash
# 1. Start the development server
npm run dev

# 2. Open http://localhost:3000
# 3. Upload transcript files (.txt or .md)
# 4. Click "Extract Features"
# 5. Download the CSV with all extracted fields
```

The AI model automatically selects the right extraction tool based on the transcript content.
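For batches of hundreds of transcripts, you usually want a few extractions in flight at once without hitting provider rate limits. Here is a minimal sketch of a concurrency-limited runner; `mapWithConcurrency` and `Outcome` are hypothetical, and the limit you can sustain depends on your gateway and provider quotas. Per-item errors are captured instead of aborting the whole batch.

```typescript
// Outcome of one item: the worker's value, or the error message it threw.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: string };

// Run `worker` over `items` with at most `limit` calls in flight.
// Results keep input order; a failing item doesn't abort the batch.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<Outcome<R>[]> {
  const results: Outcome<R>[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so lanes never share an index
      try {
        results[index] = { ok: true, value: await worker(items[index], index) };
      } catch (err) {
        results[index] = {
          ok: false,
          error: err instanceof Error ? err.message : String(err),
        };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

For example, you could pass the uploaded transcripts with a limit of 4 and wrap `extractStructuredFeatures` as the worker, then write the successful outcomes to CSV and log the failures for a retry pass.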

Beyond sales calls - what else you can extract

This architecture works for any unstructured data extraction need. The pipeline code never changes - just the configuration.

Support ticket analysis

  • Extract urgency scores, issue categories, product areas, customer effort scores
  • Route urgent tickets to detailed schemas, low-priority to streamlined ones
  • ML applications: Predict escalation likelihood, estimate resolution time, auto-assign to right team, identify product bugs from patterns

Customer review mining

  • Pull out feature mentions with sentiment, competitor comparisons, recommendation likelihood
  • Different product lines can use different configs with tailored extraction tools
  • ML applications: Predict NPS scores, identify feature requests, forecast churn from negative patterns, cluster customers by satisfaction drivers

Interview transcript processing

  • Extract technical competency signals, communication clarity, culture fit indicators
  • Different roles need different schemas - handle this through LaunchDarkly targeting rules
  • ML applications: Predict candidate success probability, identify skill gaps, score cultural alignment, reduce hiring bias through standardized signals

Medical consultation transcripts

  • Extract symptoms, treatment discussions, medication mentions, follow-up requirements
  • Redact PII before extraction to support HIPAA compliance
  • ML applications: Predict readmission risk, identify medication adherence issues, flag potential diagnoses for review, optimize appointment scheduling

Legal document analysis

  • Extract contract terms, risk clauses, obligations, and deadlines
  • Route different document types (NDAs, MSAs, employment contracts) to specialized schemas
  • ML applications: Assess contract risk scores, identify non-standard terms, predict negotiation outcomes, flag compliance issues

Earnings call transcripts

  • Extract forward-looking statements, financial metrics, competitive positioning
  • Capture management sentiment and guidance changes
  • ML applications: Predict stock price movements, identify leadership confidence levels, detect financial health indicators, compare guidance to historical accuracy

Privacy and sensitive data

  • Add PII detection step before extraction - scan for emails, phone numbers, SSNs, names
  • Either redact ([REDACTED]) or skip extraction based on compliance requirements
  • Routing: Use LaunchDarkly targeting by geography so EU transcripts get privacy-safe schemas and other regions get full extraction
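A regex pass can catch the structured identifiers in that list before text reaches the model. Here is a minimal sketch for emails, US-style phone numbers, and US Social Security numbers. The patterns are deliberately simple and will miss some formats, and names can't be matched reliably with regular expressions, so they need a dedicated PII or named-entity detection step. `redactPii` and `PII_PATTERNS` are hypothetical.

```typescript
// Ordered so SSNs are replaced before the broader phone pattern runs.
const PII_PATTERNS: Array<[label: string, pattern: RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["SSN", /\b\d{3}-\d{2}-\d{4}\b/g],
  ["PHONE", /(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/g],
];

// Replace matches with [REDACTED] and count each kind, so the pipeline can
// choose between redacting and skipping the document entirely.
function redactPii(text: string): { text: string; counts: Record<string, number> } {
  const counts: Record<string, number> = {};
  let output = text;
  for (const [label, pattern] of PII_PATTERNS) {
    output = output.replace(pattern, () => {
      counts[label] = (counts[label] ?? 0) + 1;
      return "[REDACTED]";
    });
  }
  return { text: output, counts };
}
```

The counts make a skip policy straightforward: for example, skip extraction whenever any SSN is found, and redact otherwise.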

When to use this approach

Use this when:

  • Processing 100-10,000 documents monthly
  • Schema needs frequent iteration
  • Different document types need different treatment
  • You’re bootstrapping training data

Skip this when:

  • Processing millions of documents (use traditional NLP)
  • Schema is fixed and proven
  • You need sub-second latency
  • Documents follow strict templates

What’s next

In Part 2, I’ll show how to use these extracted features for ML models. The challenge: sparse outcomes (most deals don’t close, most candidates aren’t hired). I’ll demonstrate a zero-inflated regression approach that actually works with real-world data.

Ready to build? Start with your messiest transcripts - that’s where you’ll learn what features really matter.

Further reading

  • CI/CD for AgentControl - Automate config deployments with version control
  • Multi-agent systems with LangGraph - Build complex AI workflows
  • Smart AI Agent Targeting with MCP Tools - Route extraction jobs to different schemas by user segment or geography
  • Evaluate LLM code generation with LLM-as-judge evaluators - Custom judges for grading specialized outputs
  • Build AgentControl configs with Agent Skills - Spin up new extraction configs from your coding assistant