For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Sign inTry it free
DocsGuidesSDKsIntegrationsAPI docsTutorialsFlagship blog
DocsGuidesSDKsIntegrationsAPI docsTutorialsFlagship blog
  • Flagship blog
    • 52 Blog Posts, Claude, 3 Prompts, Under an Hour
    • Shipping from Oakland: An Observability Hackathon Recap
    • Day 12 | New Year, New Observability
    • Day 11 | What engineering teams really want from Observability
    • Day 10 | Why observability and feature flags go together like milk and cookies
    • Day 9 | The Three Ghosts Haunting Your AI This Holiday Season
    • Day 8 | Observable Multi-Modal Agentic Systems
    • Day 7 | SLOs that actually drive decisions
    • Day 6 | Stop cardinality from stealing your cloud budget
    • Day 5 | Using a Popular Tidying Method to Consolidate Your Observability Stack
    • Day 4 | Tracing the impact of feature flags in your Node.js app
    • Day 3 | Zero-Config Observability with OpenTelemetry
    • Day 2 | Why AI agents need three layers of observability
    • Day 1 | Observability Under the Tree: What Changed in 2025
    • 5 takeaways from my first PyCon JP conference
    • Dungeons & Downtimes: XP gained from our adventure
    • Reverse Proxy for custom domains
    • Adventures in dogfooding: Guarded Releases
    • A quick tool for npm package scanning
    • My DEF CON 33 experience
    • Make every launch a big deal
    • Fun with JS streams
    • Moonshots XXII: Hack to the Future recap
    • A tale of three rate limiters
    • My good friend Claude
    • My approach to React app architecture in 2025
    • Data isolation with ClickHouse row policies
    • Ingest and Visualization for OpenTelemetry Metrics
    • Alert Evaluations: Incremental Merges in ClickHouse
    • Optimizing ClickHouse: The Tactics That Worked for Us
    • Migrating from OpenSearch to ClickHouse
    • Revamping Privacy Mode: A Better Way to Obfuscate Sensitive Data
    • An open-source session replay benchmark
    • LLM-based Grouping of Errors
    • Building GitHub Enhanced Stacktraces
    • Vercel Edge Runtime Support
    • Finding Interesting Sessions with Markov Chains
    • Building Logging Integrations at LaunchDarkly
    • The Network Request Details Panel
    • Using Github as a Headless CMS
    • Your Source Maps Should Be Public
    • Supporting Outside Contributions at LaunchDarkly
    • Managing our design tokens at LaunchDarkly
    • Our Commitment to OpenTelemetry
    • The 5 Best Logging Libraries for Ruby
    • InfluxDB: Visualizing Millions of Customers' Metrics using a Time Series Database
    • 8 Tips to Help You Maximize Chrome DevTools
    • The Debugging Process and Techniques for Web Applications (Part 2/2)
    • 5 Best Node.js Logging Libraries
    • What are rage clicks and how to detect them
    • 5 Best Practices for Maintaining a Clean ReactJS App
    • Is Kafka the Key? The Evolution of LaunchDarkly's Ingest
    • What Is Full Stack Monitoring and How Does It Work?
    • The beauty of contact-first API design
    • What is Frontend Monitoring and What Tools Help You Do It?
    • 5 strategies to monitor the health of your web application
    • Configuring OpenSearch for a Write-Heavy Workload
    • Maximizing Our Machines: Worker Pools At LaunchDarkly
Sign inTry it free
LogoLogo
On this page
  • Defining a “page”
  • Normalizing our data
  • Scoring interesting sessions
Flagship blog

Finding Interesting Sessions with Markov Chains

Was this page helpful?
Previous

Building Logging Integrations at LaunchDarkly

Next
Built with

Published July 20, 2023

portrait of Zane Mayberry.

by Zane Mayberry

At LaunchDarkly, there’s always been an unwritten goal to identify interesting user sessions to make it easier for our customers to understand their users.

One way to analyze a session is to look at the user’s journey, summarized by the order of page visits and the time spent per page, the thought being that sessions with unusual user journeys would uncover insights into how users may be experiencing frustration or using the app in an unexpected way.

This blog post covers the logic that we use to analyze the user journeys of each session and surface the interesting ones.

Defining a “page”

First, in order to define a user journey as “the order of page visits”, we need to first define a “page”. The LaunchDarkly client records browser navigation events, so we can get the URLs and timestamps for each page visit. Conceptually, a page is a bit different from a URL, since one or many URLs may map to a single page. An app may also use URL normalization rules (e.g. case insensitivity, or removing trailing slashes).

For most apps, query parameters don’t change which page is rendered, just which content is displayed or how that content is rendered. Similarly, for dynamic routing, a routing scheme could include one or more resource IDs in the URLs’ paths. For example, in LaunchDarkly, errors details can be viewed at app.highlight.io/\{project-id\}/errors/\{error-id\}, but regardless of the project-id and error-id in the URL, we render the same “error details page” with different content displayed.

Normalizing our data

To reduce noise, we wanted to apply some normalization steps before saving a session’s user journey. A web app may have some redirection logic, such that between two page visits, there is an intermediate visit to another page which isn’t meaningful to the user. This doesn’t really impact the user journey, so we can just discard the intermediate page. To try our best to group URLs together as pages with data, we need to identify resource IDs in URLs - to do this, we wanted an algorithm that would split up a URL and identify which parts are likely resource IDs so we could remove them. There isn’t a single way to accomplish this, but we ended up splitting the URL paths on their slashes, and handling each part with a few heuristics that work pretty well for our use case:

  • If the part contains a number, treat the part as an ID.
  • Split the part on capitals or separator characters (’-’, ‘_’, ‘~’, ‘+’). If any of these doesn’t contain a vowel or contains more than 5 sequential consonants, treat the part as an ID.

There are a few cases where this heuristic won’t work. IDs generated from dictionary words (e.g. “diamond-hamburger-board”) will be treated as separate pages. And any static pages with numbers or strings of consonants (e.g. in acronyms) will be treated as IDs. Both of these situations won’t have a huge impact for most apps, but will add some noise to the results. A better approach may look back at previous journeys to merge similar URLs, but this can have issues too, e.g. for handling new pages.

Scoring interesting sessions

After applying these normalization steps, we can enumerate all of the state transitions in a user journey. For each step in the journey, we can calculate the bigram probability of the next url given the current url using all of the other sessions we’ve seen, creating a Markov chain. An interesting session should be one where the state transitions are less probable. For a LaunchDarkly user session, an example journey with its corresponding probabilities looks like this:

pagenext_pageprobability
START/{id-1}/sessions/{id-2}0.226
/{id-1}/sessions/{id-2}/{id-1}/errors/{id-2}0.0215
/{id-1}/errors/{id-2}/{id-1}/errors/{id-2}/instances/{id-3}0.0515
/{id-1}/errors/{id-2}/instances/{id-3}/{id-1}/sessions0.0606
/{id-1}/sessions/{id-1}/sessions/{id-2}0.479
/{id-1}/sessions/{id-2}END0.688

We can calculate the probability of the entire session by multiplying each step’s probability together. In the above example, we get a total probability of 5.01x10^-6. This is the probability of that exact user journey happening, assuming at each step that the next page was randomly chosen based only on the current page. We could calculate this probability for all sessions, then find the sessions with the lowest probability.

As fun as this sounds, this has a couple of issues:

  • Longer sessions will be favored, because probabilities are multiplied together and the probability of every step cannot be greater than 1.
  • Navigating among pages with many links to other pages will cause lower scores, as the expected probability is lower. For example, for a current page, if there are five other pages a user can visit and each is equally interesting (equal probability), the step’s probability will be .2 regardless of which page is chosen.

To solve these issues, we can normalize the probabilities such that the expected value for each state transition is 1. For the first issue, regardless of how many transitions are made, the expected value of an entire session is 1. For the second issue, all transitions are scored as 1 instead of .2. High probability steps will now have a normalized score greater than 1, and the total score of a session will be less than 1 if it’s more interesting and greater than 1 if it’s less interesting. Applying this normalization to the previous example looks like this:

pagenext_pagenormalized
START/{id-1}/sessions/{id-2}1.53
/{id-1}/sessions/{id-2}/{id-1}/errors/{id-2}0.0424
/{id-1}/errors/{id-2}/{id-1}/errors/{id-2}/instances/{id-3}0.275
/{id-1}/errors/{id-2}/instances/{id-3}/{id-1}/sessions0.221
/{id-1}/sessions/{id-1}/sessions/{id-2}1.77
/{id-1}/sessions/{id-2}END1.34

The query for generating these normalized scores:

1with frequencies as (
2 select page, next_page, count(*)
3 from user_journey_steps
4 where created_at > now() - interval '30 days'
5 and project_id = 1
6 group by 1, 2)
7select u.index, u.page, u.next_page,
8 sum(case when f.page = u.page and f.next_page = u.next_page then count else 0 end)
9 * (sum(case when f.page = u.page then count else 0 end))
10 / sum((case when f.page = u.page then count else 0 end) ^ 2) as normalized
11from frequencies f
12inner join user_journey_steps u
13on f.page = u.page or f.next_page = u.next_page
14where u.session_id = 259500567
15group by u.session_id, index
16order by u.session_id, index

Finally, we can calculate this score when a session is completed and save the value so that we can sort on it to find the most interesting sessions later on. We’re excited to use this logic across the LaunchDarkly application, and as a first step, we’re using this score for our new Session Insights email digest. This weekly summary of the most interesting sessions aims to help LaunchDarkly users find interesting behaviors in their apps they might not yet be aware of.