SDK data handling behavior

Overview

This topic explains how the observability SDKs handle retries, buffering, memory usage, and data loss. Understanding these behaviors can help you tune your SDK configuration, estimate resource usage, and troubleshoot missing data.

Retry policies

The observability SDKs use retry logic to handle transient network failures when exporting telemetry data (traces, logs, and metrics) and session replay data. Retry behavior varies by SDK and data type.

Traces, logs, and metrics

For OpenTelemetry data (traces, logs, and metrics), most SDKs rely on the default retry behavior built into the OpenTelemetry exporter libraries. The default OpenTelemetry retry policy uses exponential backoff with a maximum of 5 retry attempts.

The following table summarizes the retry behavior per SDK:

| SDK | Protocol | Retry strategy | Max retries | Notes |
|---|---|---|---|---|
| JavaScript (browser) | HTTP/Protobuf | Custom exponential backoff | 3 | Base delay 1s, formula: 1000 + 500 * 2^n ms |
| Node.js (server-side) | HTTP/Protobuf | OTel SDK default | 5 | Export timeout: 5s |
| Python | gRPC | OTel SDK default (gRPC built-in) | 5 | Export timeout: 5s |
| Go | HTTP/Protobuf | OTel SDK default | 5 | Export timeout: 30s |
| .NET (server-side) | HTTP/Protobuf | OTel SDK default | 5 | Export timeout: 30s |
| Ruby | HTTP/Protobuf | OTel SDK default | 5 | Uses gzip compression |
| React Native | HTTP/Protobuf | OTel SDK default | 5 | Export timeout: 5s |
| Android | HTTP/Protobuf | OTel SDK default | 5 | Export timeout: 5s |

The browser SDK uses a custom OTLPTraceExporterBrowserWithXhrRetry implementation instead of the standard OTel browser exporter. The custom exporter sends data with XMLHttpRequest rather than the sendBeacon API, which works around known issues with sendBeacon stalling in certain browsers.
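
To make the browser backoff formula from the table concrete, the following sketch evaluates 1000 + 500 * 2^n ms for each of the 3 retries. It is an illustration of the arithmetic only, not the SDK's code. The function name is hypothetical, and the sketch assumes that n starts at 0 for the first retry, which the table does not state.

```python
MAX_RETRIES = 3  # max retries listed for the JavaScript (browser) SDK


def browser_retry_delay_ms(n: int) -> int:
    """Delay before retry n, per the table's formula 1000 + 500 * 2^n ms."""
    return 1000 + 500 * 2 ** n


# Assumption: n is zero-based (n = 0 for the first retry).
delays_ms = [browser_retry_delay_ms(n) for n in range(MAX_RETRIES)]
total_wait_ms = sum(delays_ms)

print(delays_ms)      # [1500, 2000, 3000]
print(total_wait_ms)  # 6500
```

Under that assumption, a batch that fails every attempt is given up after about 6.5 seconds of cumulative backoff, not counting the time spent on the requests themselves.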

Session replay

Session replay data uses separate retry logic from OTel telemetry. The following table summarizes session replay retry behavior:

| SDK | Max retries | Backoff strategy | Upload timeout |
|---|---|---|---|
| JavaScript (browser) | 5 | Exponential backoff | 15 seconds |
| Android | Unlimited (with backoff cap) | Exponential: 2.0 * 2^(n-1) seconds, max 60s, 20% jitter | Per-batch |
| React Native | Inherits OTel default | Exponential backoff | 5 seconds |

For the Android SDK, the session replay transport uses a dedicated BatchWorker with per-exporter backoff tracking. A failed export triggers exponential backoff that starts at 2 seconds, is capped at 60 seconds, and includes 20% jitter. A successful export clears the backoff state for that exporter.
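
The Android replay backoff described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the function name is hypothetical, n is assumed to be the count of consecutive failures starting at 1, and the jitter is assumed to be symmetric (plus or minus 20%) and applied after the 60-second cap. The source states only the formula, the cap, and the jitter percentage.

```python
import random
from typing import Optional

BASE_SECONDS = 2.0      # starting delay from the table
MAX_SECONDS = 60.0      # backoff cap from the table
JITTER_FRACTION = 0.2   # 20% jitter from the table


def replay_backoff_seconds(n: int, rng: Optional[random.Random] = None) -> float:
    """Backoff after the n-th consecutive failure (n >= 1).

    Without an rng the un-jittered delay is returned, which makes the
    schedule easy to inspect.
    """
    capped = min(BASE_SECONDS * 2 ** (n - 1), MAX_SECONDS)
    if rng is None:
        return capped
    # Assumption: jitter is symmetric and applied after the cap.
    return capped * (1 + rng.uniform(-JITTER_FRACTION, JITTER_FRACTION))


schedule = [replay_backoff_seconds(n) for n in range(1, 8)]
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

The delay reaches the 60-second cap on the sixth consecutive failure and stays there, which is what makes an unlimited retry count safe for this transport.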

Memory usage and buffering

Each SDK maintains in-memory queues for telemetry data that has been recorded but not yet exported. Understanding these queue sizes helps you estimate the maximum memory the SDK may consume.

Queue sizes by SDK

The following table shows the maximum queue sizes and batch export sizes for each SDK. When queues are full, new data is dropped.

| SDK | Max queue size | Max export batch size | Batch interval |
|---|---|---|---|
| JavaScript (browser) | 2,048 spans | 1,024 spans | 30 seconds |
| Node.js (server-side) | 1,048,576 (1M) | 1,048,576 (1M) | 5 seconds |
| Python | 1,048,576 bytes (1 MB) | 131,072 bytes (128 KB) | 5 seconds |
| Go | 2,048 logs (OTel defaults for spans) | 512 logs (OTel defaults for spans) | 1 second (traces), 5 seconds (metrics) |
| .NET (server-side) | 10,000 items | 10,000 items | 5 seconds |
| Ruby | 1,024 items | 128 items | 1 second |
| React Native | 100 items | 10 items | 500 milliseconds |
| Android | 100 items (OTel), 5,000,000 cost units (replay) | 10 items (OTel) | 1 second |

Node.js and Python use byte-based limits

The Node.js and Python SDKs use byte-based queue limits (1 MB) rather than item counts. This means the number of items that fit in the queue depends on the size of each span or log record.
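
As a rough illustration of how record size changes the capacity of a byte-based queue, the sketch below divides the 1 MB limit by a few average record sizes. The function name is hypothetical, the 2,000-byte case is an invented example of a span with many attributes, and the sketch assumes the limit is measured against the size of each record with no per-item overhead.

```python
QUEUE_LIMIT_BYTES = 1_048_576  # 1 MB queue limit from the table


def items_that_fit(avg_item_bytes: int, limit_bytes: int = QUEUE_LIMIT_BYTES) -> int:
    """Approximate number of records that fit in a byte-limited queue."""
    return limit_bytes // avg_item_bytes


# 200 and 500 bytes bracket a simple span; 2,000 bytes is a hypothetical large span.
capacity_by_size = {size: items_that_fit(size) for size in (200, 500, 2000)}
print(capacity_by_size)  # {200: 5242, 500: 2097, 2000: 524}
```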

Android uses cost-based queue management

The Android SDK uses a cost-based queue for session replay data. Each event payload has an associated cost, and the queue enforces a total cost limit of 5,000,000 units with a per-exporter limit of 2,500,000 units. This prevents large payloads from consuming a disproportionate share of the queue.
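
The two limits described above can be sketched as a queue that rejects a payload when accepting it would exceed either the total budget or the budget of its exporter. This is an illustrative model only: the class and method names are hypothetical, and the source does not say how the cost of a payload is computed, so the caller supplies it here.

```python
from collections import deque

TOTAL_COST_LIMIT = 5_000_000         # total cost limit from the text
PER_EXPORTER_COST_LIMIT = 2_500_000  # per-exporter cost limit from the text


class CostBoundedQueue:
    """Hypothetical model of a queue bounded by cost rather than item count."""

    def __init__(self) -> None:
        self.items = deque()
        self.total_cost = 0
        self.cost_by_exporter = {}

    def offer(self, exporter: str, payload: bytes, cost: int) -> bool:
        """Enqueue the payload, or return False if either limit would be exceeded."""
        exporter_cost = self.cost_by_exporter.get(exporter, 0)
        if self.total_cost + cost > TOTAL_COST_LIMIT:
            return False
        if exporter_cost + cost > PER_EXPORTER_COST_LIMIT:
            return False
        self.items.append((exporter, payload, cost))
        self.total_cost += cost
        self.cost_by_exporter[exporter] = exporter_cost + cost
        return True

    def poll(self):
        """Remove the oldest payload and release its cost; None if empty."""
        if not self.items:
            return None
        exporter, payload, cost = self.items.popleft()
        self.total_cost -= cost
        self.cost_by_exporter[exporter] -= cost
        return exporter, payload
```

In this model a single exporter can never hold more than half of the total budget, so one source of large payloads cannot starve the others.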

Estimating maximum memory usage

To estimate the theoretical maximum memory an SDK’s telemetry buffers may consume, multiply the max queue size by the average size of a span or log record in your application. Typical sizes:

  • A simple span with a few attributes: 200-500 bytes
  • A log record with a message and attributes: 100-300 bytes
  • A metric data point: 50-150 bytes

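For example, the .NET SDK with a max queue size of 10,000 items and an average span size of 400 bytes would use approximately 4 MB for the trace queue at capacity. The total across the traces, logs, and metrics queues would be at most roughly three times that, assuming each queue holds the same number of items and that log records and metric data points are no larger than spans.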
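
The estimate is a single multiplication, sketched here for the .NET figures. The function name is hypothetical, and the per-signal sizes for logs and metrics (200 and 100 bytes) are illustrative values picked from the typical ranges listed earlier, not measurements. The sketch also assumes that each signal has its own 10,000-item queue.

```python
def estimate_queue_bytes(max_queue_items: int, avg_item_bytes: int) -> int:
    """Upper bound on buffered payload size for one queue at capacity."""
    return max_queue_items * avg_item_bytes


DOTNET_MAX_QUEUE_ITEMS = 10_000  # .NET max queue size from the table

# Illustrative average sizes in bytes, chosen from the typical ranges above.
avg_bytes_by_signal = {"traces": 400, "logs": 200, "metrics": 100}

estimate_by_signal = {
    signal: estimate_queue_bytes(DOTNET_MAX_QUEUE_ITEMS, size)
    for signal, size in avg_bytes_by_signal.items()
}
total_bytes = sum(estimate_by_signal.values())

print(estimate_by_signal["traces"])  # 4000000 bytes, about 4 MB
print(total_bytes)                   # 7000000 bytes, about 7 MB
```

These figures cover only the buffered payloads. Actual process memory is higher because of object overhead in the language runtime.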

Data drop behavior

When a queue reaches capacity, the SDK must drop incoming data. The behavior when this occurs varies by SDK:

| SDK | Drop behavior | Logging |
|---|---|---|
| JavaScript (browser) | Silently dropped | No warning logged |
| Node.js (server-side) | Silently dropped | No warning logged |
| Python | Silently dropped | No warning logged |
| Go | Silently dropped | No warning logged |
| .NET (server-side) | Silently dropped | No warning logged |
| Ruby | Silently dropped | No warning logged |
| React Native | Dropped with warning | Logs "Exceeded event queue capacity" once when capacity is first exceeded, tracks dropped event count internally |
| Android | Dropped with error log | Logs "Dropping N items" with the exporter class name |

Most SDKs drop data silently

If expected telemetry data is missing in LaunchDarkly, a high-throughput application may be exceeding the SDK’s queue capacity. This is most likely with the Ruby (1,024 items), React Native (100 items), or Android (100 items) SDKs, which have smaller queue sizes.
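
The drop-on-full behavior can be pictured with a small bounded queue. The sketch below models the React Native pattern from the table: new events are rejected once the queue is full, a warning is logged only the first time, and a counter tracks the drops. It is not SDK code, and the class, attribute, and logger names are hypothetical.

```python
import logging
from collections import deque

logger = logging.getLogger("telemetry_queue_sketch")  # hypothetical logger name


class BoundedEventQueue:
    """Hypothetical queue that drops new events once capacity is reached."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.events = deque()
        self.dropped_count = 0
        self._warned = False

    def add(self, event) -> bool:
        """Enqueue the event, or drop it and return False when the queue is full."""
        if len(self.events) >= self.capacity:
            self.dropped_count += 1
            if not self._warned:
                # Warn only once, the first time capacity is exceeded.
                logger.warning("Exceeded event queue capacity")
                self._warned = True
            return False
        self.events.append(event)
        return True


queue = BoundedEventQueue(capacity=100)  # React Native max queue size
accepted = sum(queue.add(i) for i in range(130))
print(accepted, queue.dropped_count)  # 100 30
```

An SDK that drops silently behaves the same way minus the warning and the counter, which is why missing data is often the only visible symptom.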

To mitigate data loss:

  • Ensure your application is not generating more telemetry data than the SDK can export within the batch interval.
  • Use ingestion filters to reduce the volume of low-value signals.
  • For server-side SDKs, consider using an OpenTelemetry Collector as an intermediary to handle buffering and retries.
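
As a rough way to check the first mitigation, the sketch below compares an application's event rate against a conservative drain rate. It assumes the SDK exports at most one batch of the maximum size per batch interval. Real batch processors may export more often, so treat the result as a pessimistic estimate. The function names and the incoming rates are hypothetical.

```python
from typing import Optional


def sustained_items_per_second(batch_size: int, interval_seconds: float) -> float:
    """Drain rate if exactly one full batch is exported per interval."""
    return batch_size / interval_seconds


def seconds_until_full(
    queue_size: int,
    incoming_per_second: float,
    batch_size: int,
    interval_seconds: float,
) -> Optional[float]:
    """Time for an empty queue to fill, or None if the drain keeps up."""
    net_rate = incoming_per_second - sustained_items_per_second(
        batch_size, interval_seconds
    )
    if net_rate <= 0:
        return None
    return queue_size / net_rate


# React Native figures from the queue table: 100 items, batches of 10, 500 ms interval.
drain_rate = sustained_items_per_second(10, 0.5)
print(drain_rate)                             # 20.0 items per second
print(seconds_until_full(100, 15, 10, 0.5))   # None: the drain keeps up
print(seconds_until_full(100, 50, 10, 0.5))   # roughly 3.3 seconds until drops begin
```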