For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Sign inTry it free
DocsGuidesSDKsIntegrationsAPI docsTutorialsFlagship blog
DocsGuidesSDKsIntegrationsAPI docsTutorialsFlagship blog
  • Flagship blog
    • 52 Blog Posts, Claude, 3 Prompts, Under an Hour
    • Shipping from Oakland: An Observability Hackathon Recap
    • Day 12 | New Year, New Observability
    • Day 11 | What engineering teams really want from Observability
    • Day 10 | Why observability and feature flags go together like milk and cookies
    • Day 9 | The Three Ghosts Haunting Your AI This Holiday Season
    • Day 8 | Observable Multi-Modal Agentic Systems
    • Day 7 | SLOs that actually drive decisions
    • Day 6 | Stop cardinality from stealing your cloud budget
    • Day 5 | Using a Popular Tidying Method to Consolidate Your Observability Stack
    • Day 4 | Tracing the impact of feature flags in your Node.js app
    • Day 3 | Zero-Config Observability with OpenTelemetry
    • Day 2 | Why AI agents need three layers of observability
    • Day 1 | Observability Under the Tree: What Changed in 2025
    • 5 takeaways from my first PyCon JP conference
    • Dungeons & Downtimes: XP gained from our adventure
    • Reverse Proxy for custom domains
    • Adventures in dogfooding: Guarded Releases
    • A quick tool for npm package scanning
    • My DEF CON 33 experience
    • Make every launch a big deal
    • Fun with JS streams
    • Moonshots XXII: Hack to the Future recap
    • A tale of three rate limiters
    • My good friend Claude
    • My approach to React app architecture in 2025
    • Data isolation with ClickHouse row policies
    • Ingest and Visualization for OpenTelemetry Metrics
    • Alert Evaluations: Incremental Merges in ClickHouse
    • Optimizing ClickHouse: The Tactics That Worked for Us
    • Migrating from OpenSearch to ClickHouse
    • Revamping Privacy Mode: A Better Way to Obfuscate Sensitive Data
    • An open-source session replay benchmark
    • LLM-based Grouping of Errors
    • Building GitHub Enhanced Stacktraces
    • Vercel Edge Runtime Support
    • Finding Interesting Sessions with Markov Chains
    • Building Logging Integrations at LaunchDarkly
    • The Network Request Details Panel
    • Using Github as a Headless CMS
    • Your Source Maps Should Be Public
    • Supporting Outside Contributions at LaunchDarkly
    • Managing our design tokens at LaunchDarkly
    • Our Commitment to OpenTelemetry
    • The 5 Best Logging Libraries for Ruby
    • InfluxDB: Visualizing Millions of Customers' Metrics using a Time Series Database
    • 8 Tips to Help You Maximize Chrome DevTools
    • The Debugging Process and Techniques for Web Applications (Part 2/2)
    • 5 Best Node.js Logging Libraries
    • What are rage clicks and how to detect them
    • 5 Best Practices for Maintaining a Clean ReactJS App
    • Is Kafka the Key? The Evolution of LaunchDarkly's Ingest
    • What Is Full Stack Monitoring and How Does It Work?
    • The beauty of contact-first API design
    • What is Frontend Monitoring and What Tools Help You Do It?
    • 5 strategies to monitor the health of your web application
    • Configuring OpenSearch for a Write-Heavy Workload
    • Maximizing Our Machines: Worker Pools At LaunchDarkly
Sign inTry it free
LogoLogo
On this page
  • Data Store Experimentation
  • Timeseries DBs to the Rescue
  • Performance with Continuous Queries (Tasks)
  • Cardinality Constraints
  • Bucket per Project
  • Insightful Info
Flagship blog

InfluxDB: Visualizing Millions of Customers’ Metrics using a Time Series Database

Was this page helpful?
Previous

8 Tips to Help You Maximize Chrome DevTools

Next
Built with

Published January 6, 2023

portrait of Vadim Korolik.

by Vadim Korolik

Data Store Experimentation

These days, it’s hard to imagine building or managing a production full stack app without metrics. Frontend network request timing is just as important as the recording of backend timings. Processing data in asynchronous workers means troubleshooting performance bottlenecks is only possible with insights into what parts of the code are slow.

At LaunchDarkly, we provide customers with the ability to record and visualize arbitrary numeric metrics from their full stack apps. We use the functionality in our own app to record everything from the latency of certain UI operations to the duration of sessions processed in the backend.

LaunchDarkly dashboard visualizing arbitrary numeric metrics from a full stack application.

LaunchDarkly dashboard visualizing arbitrary numeric metrics from a full stack application.

However, designing a solution that would support querying millions of metrics turned out more challenging than we expected. The problem is unavoidable with unbounded data like metrics as there is no limit to how many points can be produced in a small timestamp.

In their simplest form, metrics consist of key-value pairs with a timestamp. Additional tags are stored as columns, which can be normalized to allow dynamic tag creation. With this model, you can imagine that writing the data to the database was pretty simple, as was finding the metrics for a user’s session.

Writing queries in PostgreSQL was also easy due to our team’s familiarity with it. Developing queries was quick and seemed promising for production. As long as indexes are properly configured on the datetime column, querying for a metric can be particularly efficient. An example query that fetches a 30 day window of metrics, averaged daily looks something like:

SELECT date_trunc(‘day’, created_at) as day, AVG(value) FROM metrics WHERE created_at > NOW() - INTERVAL ‘90 days’ AND created_at < NOW() - INTERVAL ‘60 days’ AND project_id = 123 AND name = ‘my_metric’ GROUP BY 1 ORDER BY 1

An astute reader would notice though that to render a timeline view like shown above, she needs more than a single queried datapoint. In a 30 day timeline, all datapoints in that interval must be retrieved by the database and aggregated. If there are millions of data points daily, a query like this might take a noticeable amount of time to evaluate and provide noticeable loading delay to our customer, impacting the overall experience of using our app.

Another issue with storing such data in a SQL database comes with expiring old data. It may seem trivial to write a `DELETE` query on a range of data using the same index, but such an operation will likely lock writes to the table until it can complete, causing performance constraints on the critical write path of the backend.

Timeseries DBs to the Rescue

Quickly, our engineering team realized the need to use a timeseries database to store and query the metrics data.

Thankfully, InfluxDB (and most other timeseries databases) allow automatic time-based expiration of data, without the need for complex partitioning strategies required to solve the problem in SQL DBs.

Performance with Continuous Queries (Tasks)

provides a bound on the number of points per time range

InfluxDB continuous queries providing a bound on the number of points per time range.

InfluxDB continuous queries providing a bound on the number of points per time range.

Cardinality Constraints

One challenge that we faced with using InfluxDB is the concept of cardinality limits. In short, the unique tags of each data point in a given bucket multiply, making it very easy for the value to grow. More precisely, the cardinality is defined as the number of unique measurement, tag set, and field key combinations in an InfluxDB bucket.

Because we were submitting certain string metrics from our users as tag values on influx measurements, the number of unique values for the tags were unbounded. For example, we would report a network request’s latency as a measurement with the URL of the request as one of the tags. Since our customers make many network requests and the URLs were often generated and unique, this value quickly grew beyond the allowed limit of 1M.

Bucket per Project

A solution that has since worked for us was to split the data so that each customer project would correspond to an influx bucket. Because the number of unique values only multiply within a bucket, the cardinality growth becomes much more manageable and can stay within the allowed limits. However, this solution doesn’t come without some downsides. Unfortunately, influx tasks typically operate on a single bucket. Having a task per bucket results in some additional implementation hurdles as well as added costs from running additional queries.

Insightful Info

Visualizing and reporting on metrics from our customers’ full stack applications allows us to help provide performance insights, solve nuanced regressions, and analyze the efficiency of their infrastructure all from LaunchDarkly. With our session replay functionality, you can see exactly how errors impact your users in real-time. Plus, access full, language-specific stack traces to get to the root of any issue. The console and network tabs provide all the necessary tools for debugging, and with precise timing, you’ll be able to understand exactly when and how bugs occur. Don’t let errors slow you down - try LaunchDarkly today and take your debugging to the next level.