Multi-armed bandit results | LaunchDarkly

Overview

This topic explains how to read the results of a multi-armed bandit. Multi-armed bandits automatically shift traffic to the leading variation over time, so you do not need to actively monitor them or manually ship the leading variation. However, you can still use the Results tab to view the performance of each variation and to understand how the multi-armed bandit is reallocating traffic.

You can run a multi-armed bandit indefinitely, allowing it to reallocate traffic as needed. If you don’t want to run a multi-armed bandit indefinitely, we recommend that you stop it when a single variation’s probability to be best reaches 99% or higher.
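
As a rough illustration of this stopping guideline, the sketch below checks whether any variation's probability to be best has reached the threshold. The function name, the dictionary of probabilities, and the variation names are all hypothetical. This is not a LaunchDarkly API, and the values are assumed to be read from the Results tab.

```python
def variation_to_stop_on(prob_to_be_best, threshold=0.99):
    """Return the variation whose probability to be best meets the
    threshold, or None if no variation qualifies yet."""
    leader = max(prob_to_be_best, key=prob_to_be_best.get)
    if prob_to_be_best[leader] >= threshold:
        return leader
    return None


# Hypothetical values read from the "Cumulative results" chart.
clear_winner = {"control": 0.004, "blue-button": 0.993, "green-button": 0.003}
still_learning = {"control": 0.30, "blue-button": 0.55, "green-button": 0.15}

print(variation_to_stop_on(clear_winner))
print(variation_to_stop_on(still_learning))
```

In the first case one variation has passed 99%, so stopping is reasonable. In the second case the function returns `None`, and the multi-armed bandit can keep reallocating traffic.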

If you are running a multi-armed bandit on a time-boxed feature, such as a holiday promotion, then you can stop the multi-armed bandit iteration when the promotion is over.

Summary

The “Summary” section displays the multi-armed bandit’s optimization goal and key takeaways.

Cumulative exposures

The “Cumulative exposures” section displays how many contexts have encountered each variation over time. Hover on the chart to display a breakdown of how many contexts encountered each variation on a specific date and time.

Cumulative results

The “Cumulative results” chart displays the leading variation, and, depending on the metric type, columns with information about probability to be best, conversion rate, conversions, mean, and exposures.

Leading variation

The leading variation is the variation that currently has the highest probability to be best. As the multi-armed bandit runs, it reallocates traffic to the leading variation at the frequency you specified when you created it.

The current leading variation is indicated on the “Cumulative results” chart:

Figure: A multi-armed bandit cumulative results chart.

Probability density

The probability density chart displays the distribution of the results for the metric. Click Show probability density chart to display the chart, and Hide probability density chart to hide it.

The horizontal x-axis displays the unit of the metric included in the experiment. For example, if the metric is measuring revenue, the unit might be dollars, or if the metric is measuring website latency, the unit might be milliseconds.

If the metric measures something you want to increase, such as revenue or account sign-ups, then the farther to the right the curve is, the better. The variation whose curve is farthest to the right has the highest value for that metric.

If the metric measures something you want to decrease, such as website latency, then the farther to the left the curve is, the better. The variation whose curve is farthest to the left has the lowest value for that metric.

The width of a curve along the x-axis reflects the credible interval. A narrower curve means the variation's results fall within a smaller range of values, so you can be more confident in that variation's likely performance.
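
To make the link between curve width and the credible interval concrete, here is a minimal sketch that computes a central credible interval from samples of two curves. The 90% interval mass, the normal-shaped curves, and the numbers are assumptions chosen for illustration, not LaunchDarkly's actual calculation.

```python
import random


def credible_interval(samples, mass=0.90):
    """Central credible interval: the range holding `mass` of the samples,
    with equal probability cut from each tail."""
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    lo_idx = round(tail * (len(ordered) - 1))
    hi_idx = round((1.0 - tail) * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


rng = random.Random(0)
# Two hypothetical curves centered on the same value: one narrow, one wide.
narrow = [rng.gauss(50.0, 1.0) for _ in range(20_000)]
wide = [rng.gauss(50.0, 5.0) for _ in range(20_000)]

narrow_lo, narrow_hi = credible_interval(narrow)
wide_lo, wide_hi = credible_interval(wide)
print(f"narrow curve: {narrow_lo:.1f} to {narrow_hi:.1f}")
print(f"wide curve:   {wide_lo:.1f} to {wide_hi:.1f}")
```

Both curves peak at the same place, but the narrow one yields a much tighter interval, which is why a narrower curve supports more confidence in the variation's likely performance.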

The vertical y-axis measures probability density. The higher the curve is at a given point on the x-axis, the more probable it is that the metric's value is close to that number.

Probability to be best

The probability to be best for a variation is the likelihood that it outperforms all other variations for a specific metric. For multi-armed bandits, the variation with the highest probability to be best is considered the leading variation.
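
One common way to estimate a probability to be best is Monte Carlo sampling: draw a plausible value for each variation from its posterior distribution many times, and count how often each variation comes out on top. The sketch below assumes a binary conversion metric, a uniform Beta(1, 1) prior, and a metric where higher is better. The function name and the counts are hypothetical, and this is not necessarily how LaunchDarkly computes the value.

```python
import random


def probability_to_be_best(conversions, exposures, draws=20_000, seed=0):
    """Estimate each variation's probability to be best by Monte Carlo.

    Assumes a uniform Beta(1, 1) prior, so each variation's posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    names = list(conversions)
    wins = {name: 0 for name in names}
    for _ in range(draws):
        # Draw one plausible conversion rate per variation from its posterior.
        sampled = {
            name: rng.betavariate(
                1 + conversions[name],
                1 + exposures[name] - conversions[name],
            )
            for name in names
        }
        # Credit the variation with the highest sampled rate in this draw.
        wins[max(sampled, key=sampled.get)] += 1
    return {name: wins[name] / draws for name in names}


# Hypothetical counts for three variations.
conversions = {"control": 100, "variant-a": 130, "variant-b": 110}
exposures = {"control": 1000, "variant-a": 1000, "variant-b": 1000}

probs = probability_to_be_best(conversions, exposures)
leading = max(probs, key=probs.get)
print(probs, "leading:", leading)
```

The estimated probabilities sum to 1 across variations, and the variation with the largest share is the leading variation. For a metric you want to decrease, the same idea applies with `min` in place of `max` inside the loop.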

The “Cumulative results” chart displays additional columns depending on the metric type you used in the multi-armed bandit. The sections below describe the columns for each metric type.

Binary conversion metrics

Binary conversion metrics include:

Custom conversion binary metrics
Clicked or tapped metrics using the Occurrence option
Page viewed metrics using the Occurrence option

Conversion rate

The value for each unit in a binary conversion metric can be either 1 or 0. A value of 1 means the conversion occurred, such as a user viewing a web page, or submitting a form. A value of 0 means no conversion occurred.

The conversion rate column displays the percentage of units with at least one conversion that you should expect in this experiment, based on the data collected so far. For example, it might show the percentage of users you can expect to click at least once.
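
As a sketch of what an expected conversion rate "based on the data collected so far" can look like, the example below takes the posterior mean of a Beta-Binomial model. The uniform Beta(1, 1) prior, the function name, and the counts are assumptions for illustration. LaunchDarkly's actual prior and model may differ.

```python
def expected_conversion_rate(conversions, exposures, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean of a Beta-Binomial model, as a fraction between 0 and 1."""
    alpha = prior_alpha + conversions
    beta = prior_beta + (exposures - conversions)
    return alpha / (alpha + beta)


# Hypothetical counts: 130 converting contexts out of 1,000 exposures.
rate = expected_conversion_rate(conversions=130, exposures=1000)
print(f"{rate:.2%}")
```

With little data the estimate stays close to the prior, and with more exposures it approaches the raw ratio of conversions to exposures.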

Conversions

The conversions column displays the total number of users or other contexts that had at least one conversion.

Total exposures

The total exposures column displays the total number of contexts that encountered the metric as part of the multi-armed bandit.

Count conversion and numeric metrics

Count conversion and numeric metrics include:

Custom conversion count metrics
Numeric metrics
Clicked or tapped metrics using the Count option
Page viewed metrics using the Count option

Posterior mean

The posterior mean is the variation’s average numeric value that you should expect in this experiment, based on the data collected so far.

All of the data in the results table are based on a posterior distribution, which is the combination of the collected data and our prior beliefs about that data. To learn more about posterior distributions, read Frequentist and Bayesian modeling.
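
To illustrate how a posterior combines collected data with a prior belief, here is a minimal sketch using the textbook normal model with known variance. The model choice, the function name, and every number are assumptions for illustration, not LaunchDarkly's implementation.

```python
def normal_posterior(prior_mean, prior_var, sample_mean, sample_var, n):
    """Posterior mean and variance for a normal mean with known variance.

    Precision (1 / variance) weights decide how far the data pulls the
    estimate away from the prior.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n / sample_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * sample_mean
    )
    return post_mean, post_var


# Hypothetical prior belief: about 40 units per context, held loosely.
few = normal_posterior(40.0, 100.0, sample_mean=50.0, sample_var=400.0, n=4)
many = normal_posterior(40.0, 100.0, sample_mean=50.0, sample_var=400.0, n=4000)
print("after 4 observations:   ", few)
print("after 4000 observations:", many)
```

With only a few observations the posterior mean sits between the prior mean and the observed mean. As observations accumulate, the data dominates and the posterior variance shrinks.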

LaunchDarkly automatically performs checks on the results data, to make sure that actual context traffic matches the allocation you set. To learn more, read Understanding sample ratios.

Total value

The total value is the sum of all the values returned by a numeric metric.

Total exposures

The total exposures column displays the total number of contexts that encountered the metric as part of the multi-armed bandit.