SDK data handling behavior | LaunchDarkly

Overview

This topic explains how the observability SDKs handle retries, buffering, memory usage, and data loss. Understanding these behaviors can help you tune your SDK configuration, estimate resource usage, and troubleshoot missing data.

Retry policies

The observability SDKs use retry logic to handle transient network failures when exporting telemetry data (traces, logs, and metrics) and session replay data. Retry behavior varies by SDK and data type.

Traces, logs, and metrics

For OpenTelemetry data (traces, logs, and metrics), most SDKs rely on the default retry behavior built into the OpenTelemetry exporter libraries. The default OpenTelemetry retry policy uses exponential backoff with a maximum of 5 retry attempts.

The following table summarizes the retry behavior per SDK:

SDK	Protocol	Retry strategy	Max retries	Notes
JavaScript (browser)	HTTP/Protobuf	Custom exponential backoff	3	Base delay 1s, formula: `1000 + 500 * 2^n` ms
Node.js (server-side)	HTTP/Protobuf	OTel SDK default	5	Export timeout: 5s
Python	gRPC	OTel SDK default (gRPC built-in)	5	Export timeout: 5s
Go	HTTP/Protobuf	OTel SDK default	5	Export timeout: 30s
.NET (server-side)	HTTP/Protobuf	OTel SDK default	5	Export timeout: 30s
Ruby	HTTP/Protobuf	OTel SDK default	5	Uses gzip compression
React Native	HTTP/Protobuf	OTel SDK default	5	Export timeout: 5s
Android	HTTP/Protobuf	OTel SDK default	5	Export timeout: 5s

The browser SDK uses a custom `OTLPTraceExporterBrowserWithXhrRetry` implementation instead of the standard OTel browser exporter. The custom exporter sends data with `XMLHttpRequest` to work around known issues with the `sendBeacon` API stalling in certain browsers.
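The browser backoff formula from the table can be tabulated with a few lines of Python. This is a minimal sketch rather than the SDK's own code: it assumes `n` is a zero-based retry index, and `browser_retry_delay_ms` is a hypothetical helper name used only for illustration.

```python
def browser_retry_delay_ms(n: int) -> int:
    """Delay before retry n, using the documented formula 1000 + 500 * 2^n ms.

    Assumption: n is zero-based (n = 0 for the first retry).
    """
    if n < 0:
        raise ValueError("retry index must be non-negative")
    return 1000 + 500 * 2 ** n


MAX_RETRIES = 3  # browser value from the retry table

delays = [browser_retry_delay_ms(n) for n in range(MAX_RETRIES)]
print(delays)       # [1500, 2000, 3000]
print(sum(delays))  # 6500
```

Under that assumption, an export that fails all three retries spends about 6.5 seconds waiting between attempts, not counting the time each request takes.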

Session replay

Session replay data uses separate retry logic from OTel telemetry. The following table summarizes session replay retry behavior:

SDK	Max retries	Backoff strategy	Upload timeout
JavaScript (browser)	5	Exponential backoff	15 seconds
Android	Unlimited (with backoff cap)	Exponential: `2.0 * 2^(n-1)` seconds, max 60s, 20% jitter	Per-batch
React Native	Inherits OTel default	Exponential backoff	5 seconds

For the Android SDK, the session replay transport uses a dedicated BatchWorker with per-exporter backoff tracking. Failed exports trigger exponential backoff that starts at 2 seconds, is capped at 60 seconds, and applies 20% jitter. Successful exports clear the backoff state for that exporter.
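The Android replay backoff schedule can be sketched as follows. This is an illustration, not the SDK's implementation: it assumes the attempt number `n` is 1-based, that jitter is a uniform adjustment of plus or minus 20%, and that jitter is applied after the 60-second cap. The function name `replay_backoff_seconds` is hypothetical.

```python
import random
from typing import Optional


def replay_backoff_seconds(n: int, rng: Optional[random.Random] = None) -> float:
    """Backoff before retry n (1-based): 2.0 * 2^(n-1) seconds, capped at 60.

    Assumptions: jitter is uniform +/-20% and is applied after the cap.
    Pass rng=None to get the deterministic base delay without jitter.
    """
    if n < 1:
        raise ValueError("attempt number must be >= 1")
    # Clamp the exponent so very large n cannot overflow; the cap applies anyway.
    base = min(2.0 * 2 ** min(n - 1, 16), 60.0)
    if rng is None:
        return base
    return base * (1.0 + rng.uniform(-0.2, 0.2))


print([replay_backoff_seconds(n) for n in range(1, 8)])
# [2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Without jitter, the delay reaches the 60-second cap on the sixth consecutive failure and stays there until an export succeeds.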

Memory usage and buffering

Each SDK maintains in-memory queues for telemetry data that has been recorded but not yet exported. Understanding these queue sizes helps you estimate the maximum memory the SDK may consume.

Queue sizes by SDK

The following table shows the maximum queue sizes and batch export sizes for each SDK. When queues are full, new data is dropped.

SDK	Max queue size	Max export batch size	Batch interval
JavaScript (browser)	2,048 spans	1,024 spans	30 seconds
Node.js (server-side)	1,048,576 bytes (1 MB)	1,048,576 bytes (1 MB)	5 seconds
Python	1,048,576 bytes (1 MB)	131,072 bytes (128 KB)	5 seconds
Go	2,048 logs (OTel defaults for spans)	512 logs (OTel defaults for spans)	1 second (traces), 5 seconds (metrics)
.NET (server-side)	10,000 items	10,000 items	5 seconds
Ruby	1,024 items	128 items	1 second
React Native	100 items	10 items	500 milliseconds
Android	100 items (OTel), 5,000,000 cost units (replay)	10 items (OTel)	1 second

Node.js and Python use byte-based limits

The Node.js and Python SDKs use byte-based queue limits (1 MB) rather than item counts. This means the number of items that fit in the queue depends on the size of each span or log record.
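As a rough illustration of how record size changes the capacity of a byte-limited queue, the sketch below divides the 1 MB limit by an average record size. It ignores any per-record overhead the SDK may add, and `items_that_fit` is a hypothetical helper, not an SDK function. The sample sizes are the low and high ends of the typical span sizes listed in this topic.

```python
QUEUE_LIMIT_BYTES = 1_048_576  # 1 MB, the documented byte-based limit


def items_that_fit(avg_record_bytes: int, limit_bytes: int = QUEUE_LIMIT_BYTES) -> int:
    """Whole records that fit in a byte-limited queue, ignoring per-record overhead."""
    if avg_record_bytes <= 0:
        raise ValueError("average record size must be positive")
    return limit_bytes // avg_record_bytes


for size in (200, 500):
    print(size, items_that_fit(size))
# 200 5242
# 500 2097
```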

Android uses cost-based queue management

The Android SDK uses a cost-based queue for session replay data. Each event payload has an associated cost, and the queue enforces a total cost limit of 5,000,000 units with a per-exporter limit of 2,500,000 units. This prevents large payloads from consuming a disproportionate share of the queue.
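The two limits can be illustrated with a small cost-bounded queue. This is a sketch under stated assumptions, not the Android SDK's code: the class and method names are hypothetical, the way a payload's cost is computed is left to the caller, and the sketch rejects a new item whenever admitting it would exceed either limit.

```python
from collections import defaultdict, deque

TOTAL_COST_LIMIT = 5_000_000         # documented total limit
PER_EXPORTER_COST_LIMIT = 2_500_000  # documented per-exporter limit


class CostBoundedQueue:
    """Hypothetical queue that admits items by cost rather than by count."""

    def __init__(self, total_limit=TOTAL_COST_LIMIT,
                 per_exporter_limit=PER_EXPORTER_COST_LIMIT):
        self.total_limit = total_limit
        self.per_exporter_limit = per_exporter_limit
        self.total_cost = 0
        self.cost_by_exporter = defaultdict(int)
        self.items = deque()

    def offer(self, exporter: str, payload: object, cost: int) -> bool:
        """Enqueue the payload, or return False (drop it) if a limit would be exceeded."""
        if self.total_cost + cost > self.total_limit:
            return False
        if self.cost_by_exporter[exporter] + cost > self.per_exporter_limit:
            return False
        self.items.append((exporter, payload, cost))
        self.total_cost += cost
        self.cost_by_exporter[exporter] += cost
        return True

    def poll(self):
        """Remove the oldest item and release its cost; returns None when empty."""
        if not self.items:
            return None
        exporter, payload, cost = self.items.popleft()
        self.total_cost -= cost
        self.cost_by_exporter[exporter] -= cost
        return exporter, payload


queue = CostBoundedQueue()
print(queue.offer("replay", "snapshot-1", 2_000_000))  # True
print(queue.offer("replay", "snapshot-2", 1_000_000))  # False: per-exporter limit
print(queue.offer("events", "batch-1", 2_500_000))     # True
print(queue.offer("other", "batch-2", 1_000_000))      # False: total limit
```

In this sketch, the second offer is rejected even though the queue as a whole has room, which is how a per-exporter limit keeps one exporter from filling the shared queue.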

Estimating maximum memory usage

To estimate the theoretical maximum memory an SDK’s telemetry buffers may consume, multiply the max queue size by the average size of a span or log record in your application. Typical sizes:

A simple span with a few attributes: 200-500 bytes
A log record with a message and attributes: 100-300 bytes
A metric data point: 50-150 bytes

For example, the .NET SDK with a max queue size of 10,000 items and an average span size of 400 bytes would use approximately 4 MB for the trace queue at capacity. The total across the trace, log, and metric queues would be at most roughly three times that, assuming each signal has its own queue of the same size.
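The same arithmetic can be written as a short calculation. This is a sketch: `estimate_buffer_bytes` is a hypothetical helper, and the per-signal record sizes are illustrative mid-range values taken from the typical sizes listed above, with each signal assumed to have its own 10,000-item queue.

```python
def estimate_buffer_bytes(max_queue_items: int, avg_record_bytes: int) -> int:
    """Theoretical maximum bytes held by one full queue."""
    return max_queue_items * avg_record_bytes


MAX_QUEUE_ITEMS = 10_000  # .NET value from the queue size table

# Assumed average record sizes in bytes (illustrative, within the typical ranges).
avg_sizes = {"traces": 400, "logs": 200, "metrics": 100}

per_signal = {
    signal: estimate_buffer_bytes(MAX_QUEUE_ITEMS, size)
    for signal, size in avg_sizes.items()
}
print(per_signal["traces"])      # 4000000
print(sum(per_signal.values()))  # 7000000
```

With these assumed sizes the combined estimate is about 7 MB, below three times the trace queue, because log records and metric data points are typically smaller than spans.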

Data drop behavior

When a queue reaches capacity, the SDK must drop incoming data. The behavior when this occurs varies by SDK:

SDK	Drop behavior	Logging
JavaScript (browser)	Silently dropped	No warning logged
Node.js (server-side)	Silently dropped	No warning logged
Python	Silently dropped	No warning logged
Go	Silently dropped	No warning logged
.NET (server-side)	Silently dropped	No warning logged
Ruby	Silently dropped	No warning logged
React Native	Dropped with warning	Logs `"Exceeded event queue capacity"` once when capacity is first exceeded, tracks dropped event count internally
Android	Dropped with error log	Logs `"Dropping N items"` with the exporter class name
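The "dropped with warning" behavior in the React Native row can be pictured with a small bounded queue. This is a hypothetical sketch of the pattern, not the SDK's source: the class name, method names, and logger name are invented for illustration, and only the log message and the 100-item capacity come from the tables in this topic.

```python
import logging
from collections import deque

logger = logging.getLogger("telemetry_queue_sketch")  # hypothetical logger name


class BoundedEventQueue:
    """Hypothetical fixed-capacity queue that drops new events when full."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.events = deque()
        self.dropped_count = 0  # running count of dropped events
        self._warned = False    # ensures the warning is logged only once

    def add(self, event) -> bool:
        """Enqueue the event, or drop it and return False when the queue is full."""
        if len(self.events) >= self.capacity:
            self.dropped_count += 1
            if not self._warned:
                logger.warning("Exceeded event queue capacity")
                self._warned = True
            return False
        self.events.append(event)
        return True


queue = BoundedEventQueue(capacity=100)
accepted = sum(queue.add(i) for i in range(150))
print(accepted, queue.dropped_count)  # 100 50
```

A silently dropping SDK would follow the same pattern without the warning, which is why drops in those SDKs are not visible in application logs.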

Most SDKs drop data silently

If expected telemetry data is missing in LaunchDarkly, your application may be generating data faster than the SDK can export it and exceeding the SDK’s queue capacity. This is most likely with the Ruby (1,024 items), React Native (100 items), and Android (100 items) SDKs, which have smaller queue sizes.

To mitigate data loss:

Ensure your application is not generating more telemetry data than the SDK can export within the batch interval.
Use ingestion filters to reduce the volume of low-value signals.
For server-side SDKs, consider using an OpenTelemetry Collector as an intermediary to handle buffering and retries.
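To check the first point, you can compare your application's telemetry rate with a rough export ceiling derived from the queue size table. The sketch below is a conservative estimate, not a description of SDK internals: it assumes exactly one full batch is exported per batch interval, whereas a batch processor may export sooner when a batch fills. Both function names are hypothetical.

```python
import math


def max_sustained_rate(batch_size: int, batch_interval_s: float) -> float:
    """Items per second drained if one full batch is exported per interval."""
    return batch_size / batch_interval_s


def seconds_until_full(queue_size: int, batch_size: int,
                       batch_interval_s: float, produce_rate: float) -> float:
    """Time until an empty queue fills, or math.inf if exports keep up."""
    drain_rate = max_sustained_rate(batch_size, batch_interval_s)
    if produce_rate <= drain_rate:
        return math.inf
    return queue_size / (produce_rate - drain_rate)


# Values from the queue size table: React Native and Ruby.
print(max_sustained_rate(10, 0.5))   # 20.0
print(max_sustained_rate(128, 1.0))  # 128.0

# React Native queue (100 items) with an application emitting 50 items per second.
print(round(seconds_until_full(100, 10, 0.5, 50.0), 2))  # 3.33
```

Under these assumptions, a React Native application that emits 50 items per second would begin dropping data after a little over three seconds of sustained load.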