SDK data handling behavior
Overview
This topic explains how the observability SDKs handle retries, buffering, memory usage, and data loss. Understanding these behaviors can help you tune your SDK configuration, estimate resource usage, and troubleshoot missing data.
Retry policies
The observability SDKs use retry logic to handle transient network failures when exporting telemetry data (traces, logs, and metrics) and session replay data. Retry behavior varies by SDK and data type.
Traces, logs, and metrics
For OpenTelemetry data (traces, logs, and metrics), most SDKs rely on the default retry behavior built into the OpenTelemetry exporter libraries. The default OpenTelemetry retry policy uses exponential backoff with a maximum of 5 retry attempts.
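As a rough illustration of the policy described above, the following Python sketch retries a failed export with exponentially growing delays and gives up after 5 retries. The function name `export_with_retry`, the `send` callback, and the delay values (1 second initial delay, 1.5x multiplier, 5 second cap) are assumptions made for this example, not values taken from any SDK.

```python
import time
from typing import Callable


def export_with_retry(
    send: Callable[[], bool],
    max_retries: int = 5,          # from the text: maximum of 5 retry attempts
    initial_delay: float = 1.0,    # assumed value, for illustration only
    multiplier: float = 1.5,       # assumed value, for illustration only
    max_delay: float = 5.0,        # assumed value, for illustration only
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() once, then retry up to max_retries times with growing delays.

    Returns True as soon as send() succeeds, or False once retries run out.
    """
    delay = initial_delay
    for attempt in range(max_retries + 1):
        if send():
            return True
        if attempt == max_retries:
            break  # retries exhausted: the batch is abandoned
        sleep(min(delay, max_delay))
        delay *= multiplier
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. A production exporter would typically also add jitter and retry only on errors that are known to be transient.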
The following table summarizes the retry behavior per SDK:
The browser SDK uses a custom OTLPTraceExporterBrowserWithXhrRetry implementation instead of the standard OTel browser exporter. It sends data with XMLHttpRequest to work around known issues with the sendBeacon API stalling in certain browsers.
Session replay
Session replay data uses separate retry logic from OTel telemetry. The following table summarizes session replay retry behavior:
For the Android SDK, the session replay transport uses a dedicated BatchWorker that tracks backoff per exporter. A failed export triggers exponential backoff that starts at 2 seconds and is capped at 60 seconds, with 20% jitter. A successful export clears the backoff state for that exporter.
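The per-exporter backoff described above could be modeled along the following lines. This is a Python sketch rather than the Android implementation: the class name `ExporterBackoff`, the doubling factor, and the choice to apply jitter symmetrically after the 60 second cap are all assumptions. Only the 2 second start, the 60 second maximum, the 20% jitter, and the reset on success come from the text.

```python
import random
from typing import Dict, Optional


class ExporterBackoff:
    """Tracks an independent backoff delay for each exporter (hypothetical helper)."""

    INITIAL_DELAY_S = 2.0   # from the text
    MAX_DELAY_S = 60.0      # from the text
    JITTER = 0.2            # from the text: 20% jitter

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._delays: Dict[str, float] = {}
        self._rng = rng or random.Random()

    def record_failure(self, exporter: str) -> float:
        """Return how long to wait before this exporter's next attempt."""
        previous = self._delays.get(exporter)
        if previous is None:
            base = self.INITIAL_DELAY_S
        else:
            # Assumption: the delay doubles on each consecutive failure.
            base = min(previous * 2, self.MAX_DELAY_S)
        self._delays[exporter] = base
        # Assumption: jitter is applied after the cap, so the returned
        # delay can be up to 20% above or below the base delay.
        return base * (1 + self._rng.uniform(-self.JITTER, self.JITTER))

    def record_success(self, exporter: str) -> None:
        """Clear the backoff state so the next failure starts at 2 seconds."""
        self._delays.pop(exporter, None)
```

Keeping the state keyed by exporter means one failing destination does not slow down exports to a healthy one.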
Memory usage and buffering
Each SDK maintains in-memory queues for telemetry data that has been recorded but not yet exported. Understanding these queue sizes helps you estimate the maximum memory the SDK may consume.
Queue sizes by SDK
The following table shows the maximum queue sizes and batch export sizes for each SDK. When queues are full, new data is dropped.
Node.js and Python use byte-based limits
The Node.js and Python SDKs use byte-based queue limits (1 MB) rather than item counts. This means the number of items that fit in the queue depends on the size of each span or log record.
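To make the difference from item-count limits concrete, here is a minimal Python sketch of a byte-bounded queue that drops new data when full. The class `ByteBoundedQueue` and its methods are hypothetical and do not reflect the SDKs' internals. The sketch also assumes that 1 MB means 1,000,000 bytes and that the size of a record is the length of its serialized payload.

```python
from collections import deque
from typing import Deque, List


class ByteBoundedQueue:
    """A FIFO queue limited by total payload bytes instead of item count."""

    def __init__(self, max_bytes: int = 1_000_000) -> None:  # assumed: 1 MB = 1,000,000 bytes
        self._max_bytes = max_bytes
        self._items: Deque[bytes] = deque()
        self.used_bytes = 0
        self.dropped = 0

    def offer(self, payload: bytes) -> bool:
        """Enqueue the payload, or drop it if it would exceed the byte limit."""
        if self.used_bytes + len(payload) > self._max_bytes:
            self.dropped += 1
            return False
        self._items.append(payload)
        self.used_bytes += len(payload)
        return True

    def drain(self, max_items: int) -> List[bytes]:
        """Remove and return up to max_items payloads for export."""
        batch: List[bytes] = []
        while self._items and len(batch) < max_items:
            payload = self._items.popleft()
            self.used_bytes -= len(payload)
            batch.append(payload)
        return batch
```

With this model, a 1,000,000-byte limit holds 2,500 spans of 400 bytes each but only 500 spans of 2,000 bytes each, which is why large attributes reduce the effective queue depth.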
Android uses cost-based queue management
The Android SDK uses a cost-based queue for session replay data. Each event payload has an associated cost, and the queue enforces a total cost limit of 5,000,000 units with a per-exporter limit of 2,500,000 units. This prevents large payloads from consuming a disproportionate share of the queue.
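The two limits described above can be sketched as follows. This is an illustrative Python model, not the Android SDK's code: the class `CostBoundedQueue`, its method names, and the way cost is passed in by the caller are assumptions. Only the 5,000,000 total limit and the 2,500,000 per-exporter limit come from the text.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class CostBoundedQueue:
    """Admits payloads only while both the total and per-exporter cost limits hold."""

    def __init__(
        self,
        total_limit: int = 5_000_000,         # from the text
        per_exporter_limit: int = 2_500_000,  # from the text
    ) -> None:
        self._total_limit = total_limit
        self._per_exporter_limit = per_exporter_limit
        self._queues: Dict[str, Deque[Tuple[Any, int]]] = {}
        self._exporter_cost: Dict[str, int] = {}
        self.total_cost = 0

    def offer(self, exporter: str, payload: Any, cost: int) -> bool:
        """Enqueue the payload for an exporter, or reject it if a limit would be exceeded."""
        exporter_cost = self._exporter_cost.get(exporter, 0)
        if exporter_cost + cost > self._per_exporter_limit:
            return False  # this exporter already holds its share of the queue
        if self.total_cost + cost > self._total_limit:
            return False  # the queue as a whole is full
        self._queues.setdefault(exporter, deque()).append((payload, cost))
        self._exporter_cost[exporter] = exporter_cost + cost
        self.total_cost += cost
        return True

    def poll(self, exporter: str) -> Optional[Any]:
        """Remove and return the oldest payload for an exporter, releasing its cost."""
        queue = self._queues.get(exporter)
        if not queue:
            return None
        payload, cost = queue.popleft()
        self._exporter_cost[exporter] -= cost
        self.total_cost -= cost
        return payload
```

Because the per-exporter limit is half of the total limit, a single exporter with large payloads can fill at most half of the queue in this model.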
Estimating maximum memory usage
To estimate the theoretical maximum memory an SDK’s telemetry buffers may consume, multiply the max queue size by the average size of a span or log record in your application. Typical sizes:
- A simple span with a few attributes: 200-500 bytes
- A log record with a message and attributes: 100-300 bytes
- A metric data point: 50-150 bytes
For example, the .NET SDK with a max queue size of 10,000 items and an average span size of 400 bytes would use approximately 4 MB for the trace queue at capacity. If the logs and metrics queues have the same capacity and similar record sizes, the total across all three queues would be roughly three times that, or about 12 MB.
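The arithmetic in this example can be reproduced with a few lines of Python. The helper name `estimate_queue_bytes` is hypothetical. The total assumes that the traces, logs, and metrics queues each hold 10,000 items averaging 400 bytes, which gives an upper bound because log records and metric data points are often smaller than spans.

```python
def estimate_queue_bytes(max_queue_items: int, avg_item_bytes: int) -> int:
    """Theoretical maximum buffer size: queue capacity times average item size."""
    return max_queue_items * avg_item_bytes


# .NET example from the text: 10,000 items at 400 bytes per span.
trace_bytes = estimate_queue_bytes(10_000, 400)
print(f"trace queue: {trace_bytes / 1_000_000:.1f} MB")  # trace queue: 4.0 MB

# Assumption: three queues (traces, logs, metrics) of the same capacity and item size.
total_bytes = 3 * trace_bytes
print(f"all queues:  {total_bytes / 1_000_000:.1f} MB")  # all queues:  12.0 MB
```

These figures cover only the serialized payload size. In-memory object overhead in the host runtime can make actual usage higher.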
Data drop behavior
When a queue reaches capacity, the SDK must drop incoming data. The behavior when this occurs varies by SDK:
Most SDKs drop data silently
If you are not seeing expected telemetry data in LaunchDarkly, your application may be generating telemetry faster than the SDK can export it and exceeding the SDK’s queue capacity. This is most likely to occur with the Ruby (1,024 items), React Native (100 items), or Android (100 items) SDKs, which have smaller queue sizes.
To mitigate data loss:
- Ensure your application is not generating more telemetry data than the SDK can export within the batch interval.
- Use ingestion filters to reduce the volume of low-value signals.
- For server-side SDKs, consider using an OpenTelemetry Collector as an intermediary to handle buffering and retries.
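To check the first mitigation above, you can compare your application's telemetry rate against a rough export rate. The Python sketch below uses a deliberately simplified model in which the SDK exports one batch per interval. The function names, and the batch size and interval passed to them, are hypothetical, so substitute the values from your own SDK configuration.

```python
from typing import Optional


def sustained_export_rate(batch_size: int, interval_s: float) -> float:
    """Items per second the SDK can export, assuming one batch per interval."""
    return batch_size / interval_s


def seconds_until_drops(
    queue_size: int,
    produce_rate: float,
    batch_size: int,
    interval_s: float,
) -> Optional[float]:
    """Estimate how long an empty queue takes to fill, or None if it never fills."""
    surplus = produce_rate - sustained_export_rate(batch_size, interval_s)
    if surplus <= 0:
        return None  # exports keep up with production
    return queue_size / surplus


# Hypothetical settings: a 1,024-item queue, 512-item batches every 5 seconds.
print(seconds_until_drops(1024, produce_rate=50, batch_size=512, interval_s=5.0))
print(seconds_until_drops(1024, produce_rate=200, batch_size=512, interval_s=5.0))
```

A result of `None` means the modeled export rate keeps up with production. A number is the approximate time before the queue fills and new data starts to be dropped. Real batch processors may also export early when a full batch is available, so treat the result as a conservative estimate.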