Regression thresholds for guarded rollouts

This topic includes advanced concepts

This section includes an explanation of advanced statistical concepts. We provide them for informational purposes, but you do not need to understand these concepts to use guarded rollouts.

Overview

This topic explains how to determine a custom regression threshold for a guarded rollout.

Guarded rollouts availability

All LaunchDarkly accounts include a limited number of guarded rollouts. Use these to evaluate the feature in real-world deployments.

For each metric you use in a guarded rollout, you can use the LaunchDarkly default regression threshold or a custom threshold:

  • The Default option uses LaunchDarkly’s standard regression threshold. This option is appropriate for most guarded rollouts.
  • The Set custom thresholds option lets you specify a custom threshold. This option requires familiarity with statistical analysis concepts, which are explained below.

Minimum context requirement for guarded rollouts

A new flag variation must be evaluated by a minimum number of contexts during each step of a guarded rollout. If this minimum sample size for guarded rollouts isn’t met, LaunchDarkly automatically rolls back the change.

Regression thresholds

When you create a guarded rollout, the regression threshold represents the level of underperformance you’re willing to tolerate in the new variation you’re rolling out, called the “treatment group,” as compared to the current variation, called the “control group.” The threshold value ranges from 0% to 100%.

Here’s what the values mean:

  • Thresholds closer to 0%: A more conservative approach with minimal tolerance for performance drops. This indicates little tolerance for risk that the treatment group performs worse than the control group.
  • Thresholds closer to 100%: A higher tolerance for potential regressions before taking action. This indicates a higher tolerance for risk, meaning you are more willing to accept the possibility of the treatment group performing worse than the control group.

If you want to catch even small regressions, set the threshold close to 0%. If you’re comfortable with some performance degradation, increase the threshold above 0%.

How regression thresholds work

Regression thresholds are relative, not absolute. They compare the performance of the treatment variation to the control variation.

LaunchDarkly considers a regression to have occurred when the relative difference between the treatment and control exceeds your specified threshold:

Calculation of regression threshold
( (New - Original) / Original ) > threshold

Where:

  • New is the metric value for the treatment variation
  • Original is the metric value for the control variation

Example

If the control group’s error rate is 5% and the regression threshold is set to 10%, LaunchDarkly detects a regression when the treatment group’s error rate exceeds 5.5%, as calculated here:

Calculation of regression rate
5% × (1 + 0.10) = 5.5%
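
The calculation above can be sketched in a few lines of Python. This is an illustration only: it assumes a lower-is-better metric such as error rate, the function name `exceeds_threshold` is hypothetical and not part of any LaunchDarkly SDK, and it compares point values rather than performing the probability-based evaluation LaunchDarkly applies.

```python
def exceeds_threshold(new: float, original: float, threshold: float) -> bool:
    """Return True when (new - original) / original > threshold.

    Hypothetical helper for illustration; assumes a lower-is-better metric
    and a non-zero control value.
    """
    if original == 0:
        raise ValueError("control value must be non-zero")
    relative_difference = (new - original) / original
    return relative_difference > threshold


# Control error rate 5%, regression threshold 10%, so the boundary is 5.5%.
print(exceeds_threshold(0.056, 0.05, 0.10))  # treatment at 5.6%
print(exceeds_threshold(0.054, 0.05, 0.10))  # treatment at 5.4%
```

The first call prints `True` because 5.6% is above the 5.5% boundary, and the second prints `False` because 5.4% is below it.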

LaunchDarkly uses a “probability to be worse” model to evaluate this condition. This means LaunchDarkly calculates the likelihood that the treatment variation performs worse than the control variation by more than the specified threshold. If that probability exceeds 95%, LaunchDarkly detects a regression.

Risk tolerance and threshold selection

The regression threshold acts as a tuning parameter for your risk tolerance:

  • Use lower thresholds to detect regressions quickly and be more conservative.
  • Use higher thresholds to allow for more variability in metric performance before detecting a regression.

Common misconceptions

For a deeper explanation of how regression thresholds work in practice, read the blog post Defining regression thresholds for guarded rollouts.

Regression thresholds can be misunderstood. Here are some common incorrect interpretations:

  • “If I set the regression threshold to 10%, LaunchDarkly will detect a regression when the metric reaches 10%.” This is false.
  • “A 10% threshold means the treatment must be 10% of the control value to count as a regression.” This is false.

Correct interpretation:

  • “A 10% regression threshold means the treatment group must be more than 10% worse relative to the control group.”

For example, if the control group’s error rate is 5%, a regression is detected only if the treatment group’s error rate exceeds 5.5%.

Direction of improvement and the concept of “worse”

Metrics may have different “success criteria,” defined in the metric configuration:

  • For metrics where lower is better (for example: latency, error rate), an increase may indicate a regression.
  • For metrics where higher is better (for example: conversions, revenue), a decrease may indicate a regression.

The regression threshold defines how much deviation from the control you’re willing to accept.
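
As a rough sketch of how the direction of improvement changes what “worse” means, the following Python computes the metric value beyond which the treatment would count as worse than allowed. The function names and the `lower_is_better` flag are hypothetical and chosen for illustration; they do not correspond to a LaunchDarkly API, and the comparison uses point values only.

```python
import math


def regression_boundary(control: float, threshold: float, lower_is_better: bool) -> float:
    """Metric value beyond which the treatment is worse than the threshold allows."""
    if lower_is_better:
        # Worse means higher: allow up to control * (1 + threshold).
        return control * (1 + threshold)
    # Worse means lower: allow down to control * (1 - threshold).
    return control * (1 - threshold)


def is_worse_than_allowed(treatment: float, control: float,
                          threshold: float, lower_is_better: bool) -> bool:
    """Compare a treatment value against the direction-aware boundary."""
    boundary = regression_boundary(control, threshold, lower_is_better)
    if lower_is_better:
        return treatment > boundary
    return treatment < boundary


# Error rate (lower is better): control 5%, threshold 10% -> boundary near 5.5%.
print(math.isclose(regression_boundary(0.05, 0.10, True), 0.055))
# Conversion rate (higher is better): control 2%, threshold 10% -> boundary near 1.8%.
print(math.isclose(regression_boundary(0.02, 0.10, False), 0.018))
```

Both comparisons print `True`. The same 10% threshold moves the boundary upward for an error rate and downward for a conversion rate.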

LaunchDarkly uses a “probability to be worse” method to detect regressions. This represents the likelihood that the treatment variation is worse than the control variation by more than the allowed threshold. If this probability exceeds 95%, LaunchDarkly detects a regression.

Custom threshold calculation methods

LaunchDarkly evaluates regressions differently based on the type of metric and its analysis method.

Average analysis method

For metrics that use the “Average” analysis method, you can set the threshold to a reference value. LaunchDarkly compares this value to the relative difference between the treatment and control groups to detect regressions.

Percentile analysis method

For metrics that use a percentile analysis method, the threshold is used as a reference value for evaluating differences in confidence intervals.

  • If the metric’s success criterion is “higher is better”: LaunchDarkly detects a regression when the treatment group’s upper bound is sufficiently lower than the control group’s lower bound, based on the threshold and the control’s percentile estimate. This is calculated as a ratio: the difference between the treatment’s upper bound and the control’s lower bound, relative to the control’s percentile estimate.

  • If the metric’s success criterion is “lower is better”: LaunchDarkly detects a regression when the treatment group’s lower bound is sufficiently higher than the control group’s upper bound, based on the threshold and the control’s percentile estimate. This is calculated as a ratio: the difference between the treatment’s lower bound and the control’s upper bound, relative to the control’s percentile estimate.

Example: A metric using the “Average” analysis method

Imagine you are using a metric with the “Average” analysis method in a guarded rollout. The true conversion rate of the control group is 2%, and you set the regression threshold to 10%.

In this example, LaunchDarkly detects a regression when $P(\text{true conversion rate of treatment group} < 2\% \cdot (100\% - 10\%)) = P(\text{true conversion rate of treatment group} < 1.8\%) > 95\%$.

In other words, LaunchDarkly detects a regression when there is more than a 95% probability that the true, unknown conversion rate of the treatment group is smaller than 1.8%, given the evidence provided by the observed data. The threshold applies to the relative difference between the treatment’s mean and the control’s mean.
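
To make the probability statement more concrete, here is a minimal Monte Carlo sketch in Python. It rests on assumptions that are not stated in this topic: a uniform Beta(1, 1) prior on the treatment group’s conversion rate, a control rate treated as known at 2%, and made-up counts of conversions and contexts. The function `probability_below` is hypothetical, and the model LaunchDarkly actually uses may differ.

```python
import random


def probability_below(conversions: int, contexts: int, cutoff: float,
                      draws: int = 20_000, seed: int = 0) -> float:
    """Estimate P(true treatment rate < cutoff) by sampling a Beta posterior.

    Assumption: a uniform Beta(1, 1) prior on the conversion rate. This is an
    illustrative model, not necessarily the one LaunchDarkly uses.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    alpha = 1 + conversions
    beta = 1 + contexts - conversions
    below = sum(rng.betavariate(alpha, beta) < cutoff for _ in range(draws))
    return below / draws


control_rate = 0.02   # treated as known, as in the example above
threshold = 0.10      # 10% regression threshold
cutoff = control_rate * (1 - threshold)  # about 1.8%

# Hypothetical observed data: 100 conversions out of 10,000 contexts (1%).
p_worse = probability_below(conversions=100, contexts=10_000, cutoff=cutoff)
print(p_worse > 0.95)
```

With these made-up counts, almost all of the posterior mass sits below 1.8%, so the check prints `True`. With 200 conversions out of 10,000 contexts, most of the mass sits above 1.8% and the same check would not signal a regression.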

Example: A metric using the “Percentile” analysis method

Imagine you are using a metric with the “Percentile” analysis method in a guarded rollout. The control group’s percentile estimate is 1,000, and you set the regression threshold to 2%.

In this example, the regression detection behavior is as follows:

  • If the metric’s success criterion is “lower is better,” LaunchDarkly detects a regression when $\text{Treatment CI lower} - \text{Control CI upper} > 2\% \cdot 1000 = 20$.
  • If the metric’s success criterion is “higher is better,” LaunchDarkly detects a regression when $\text{Control CI lower} - \text{Treatment CI upper} > 2\% \cdot 1000 = 20$.
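
The two conditions above can be written as a small Python check. This is a sketch under stated assumptions: the `Interval` type, the `percentile_regression` function, and the confidence interval values are all hypothetical and chosen only to illustrate the comparison. How LaunchDarkly computes the intervals themselves is not covered here.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A confidence interval for a percentile estimate (hypothetical type)."""
    lower: float
    upper: float


def percentile_regression(control_ci: Interval, treatment_ci: Interval,
                          control_estimate: float, threshold: float,
                          lower_is_better: bool) -> bool:
    """Return True when the gap between intervals exceeds threshold * control estimate."""
    margin = threshold * control_estimate
    if lower_is_better:
        # Treatment is worse when its lower bound sits above the control's upper bound.
        gap = treatment_ci.lower - control_ci.upper
    else:
        # Treatment is worse when its upper bound sits below the control's lower bound.
        gap = control_ci.lower - treatment_ci.upper
    return gap > margin


control = Interval(lower=980, upper=1020)    # made-up interval around 1,000
treatment = Interval(lower=1050, upper=1090)  # made-up treatment interval

# Lower is better: gap is 1050 - 1020 = 30, margin is 2% of 1000 = 20.
print(percentile_regression(control, treatment, 1000, 0.02, lower_is_better=True))
```

This prints `True` because the gap of 30 exceeds the margin of 20. A treatment interval of 1030 to 1070 would leave a gap of only 10 and would not count as a regression.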