Regression thresholds for guarded rollouts
This topic includes advanced concepts
This section includes an explanation of advanced statistical concepts. We provide them for informational purposes, but you do not need to understand these concepts to use guarded rollouts.
Overview
This topic explains how to determine a custom regression threshold for a guarded rollout.
Guarded rollouts availability
All LaunchDarkly accounts include a limited number of guarded rollouts. Use these to evaluate the feature in real-world deployments.
For each metric you use in a guarded rollout, you can use the LaunchDarkly default regression threshold or a custom threshold:
- The Default option uses LaunchDarkly’s standard regression threshold, appropriate for most use cases.
- The Set custom thresholds option lets you define how much underperformance you’re willing to tolerate. This option requires familiarity with statistical concepts, which are explained below.
Minimum context requirement for guarded rollouts
A new flag variation must be evaluated by a minimum number of contexts during each step of a guarded rollout. If this minimum sample size isn’t met, LaunchDarkly automatically rolls back the change.
Regression thresholds
When you create a guarded rollout, the regression threshold represents the level of underperformance you’re willing to tolerate in the new variation you’re rolling out, called the “treatment,” as compared to the original variation, called the “control.” The threshold value ranges from 0% to 100%.
Here’s what the values mean:
- Thresholds closer to 0%: A more conservative approach with minimal tolerance for performance drops. This indicates little tolerance for risk that the new variation performs worse than the original.
- Thresholds closer to 100%: A less conservative approach with greater tolerance for potential regressions. This indicates greater tolerance for risk, meaning you are more willing to accept the possibility of the new variation performing worse than the original.
If you want to prioritize detecting even small regressions, set the threshold lower. If you’re comfortable with some performance degradation, you can increase the threshold to a value greater than 0%.
How regression thresholds work
Regression thresholds are relative, not absolute. They compare the performance of the new variation to the original variation.
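LaunchDarkly considers a regression to have occurred when the relative difference between the new and original variations exceeds your specified threshold. For a metric where lower is better:

$$\frac{\text{New} - \text{Original}}{\text{Original}} > \text{Threshold}$$

Where:
- New is the metric value for the new variation
- Original is the metric value for the original variation

For a metric where higher is better, the comparison runs in the opposite direction: a regression occurs when the new value falls below the original by more than the threshold.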
LaunchDarkly compares the relative difference between the two values. For example, if the original error rate is 5%, a 10% threshold means the new variation must exceed 5.5% before it’s flagged as a regression.
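The relative comparison can be sketched in a few lines of Python. This is an illustrative sketch only, not LaunchDarkly's implementation: the function names `relative_difference` and `exceeds_threshold` are hypothetical, the metric is assumed to be one where lower is better, and the check compares point values without the probability calculation that LaunchDarkly applies.

```python
def relative_difference(new: float, original: float) -> float:
    """Relative change of the new variation's metric versus the original's."""
    if original == 0:
        raise ValueError("relative difference is undefined when the original value is 0")
    return (new - original) / original


def exceeds_threshold(new: float, original: float, threshold: float) -> bool:
    """True when a lower-is-better metric worsened by more than the threshold."""
    return relative_difference(new, original) > threshold


original_error_rate = 0.05  # 5% error rate for the original variation
threshold = 0.10            # 10% regression threshold

# 5.4% is 8% worse than 5%, which is within the 10% threshold.
print(exceeds_threshold(0.054, original_error_rate, threshold))  # False
# 5.6% is 12% worse than 5%, which is beyond the 10% threshold.
print(exceeds_threshold(0.056, original_error_rate, threshold))  # True
```

The same 10% threshold would translate to a different absolute cutoff for a different original value, which is what makes the threshold relative rather than absolute.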
Example
Consider an error rate metric, where lower is better. If the original variation’s error rate is 5% and the regression threshold is set to 10%, LaunchDarkly detects a regression when the new error rate exceeds 5.5% with high probability, as calculated here:

$$5\% \times (1 + 10\%) = 5.5\%$$
LaunchDarkly uses a “probability to be worse” model to evaluate this condition. This means LaunchDarkly calculates the likelihood that the new variation performs worse than the original variation by more than the specified threshold. If that probability exceeds 95%, LaunchDarkly detects a regression.
Risk tolerance and threshold selection
The regression threshold acts as a tuning parameter for your risk tolerance:
- Use lower thresholds to detect regressions quickly and be more conservative.
- Use higher thresholds to allow for more variability in metric performance before detecting a regression.
Common misconceptions
For a deeper explanation of how regression thresholds work in practice, read the blog post Defining regression thresholds for guarded rollouts.
Regression thresholds can be misunderstood. Consider an error rate metric, where lower is better. Here are some common incorrect interpretations:
- “If I set the regression threshold to 10%, LaunchDarkly will detect a regression when the metric reaches 10%.” This is false. The threshold is not an absolute value of the metric.
- “A 10% threshold means the new variation must be 10% of the original value to count as a regression.” This is false. The threshold is a relative change from the original value, not a fraction of it.
Correct interpretation:
- “A 10% regression threshold means the new variation must be more than 10% worse relative to the original variation.”
For example, if the original variation’s error rate is 5%, a regression is detected only if the new variation’s error rate exceeds 5.5%.
Direction of improvement and the concept of “worse”
Metrics may have different “success criteria,” defined in the metric configuration:
- For metrics where lower is better (for example: latency, error rate), an increase may indicate a regression.
- For metrics where higher is better (for example: conversions, revenue), a decrease may indicate a regression.
The regression threshold defines how much deviation from the original variation you’re willing to accept.
LaunchDarkly uses a “probability to be worse” method to detect regressions. This represents the likelihood that the new variation is worse than the original variation by more than the allowed threshold. If this probability exceeds 95%, LaunchDarkly detects a regression.
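To make the role of direction concrete, the following sketch converts a relative threshold into the absolute metric value beyond which the new variation counts as “worse.” The function name `regression_cutoff` and the `lower_is_better` parameter are hypothetical, chosen for illustration. They are not part of any LaunchDarkly API, and the sketch leaves out the probability step.

```python
def regression_cutoff(original: float, threshold: float, lower_is_better: bool) -> float:
    """Metric value beyond which the new variation is 'worse' by more than the threshold."""
    if lower_is_better:
        # An increase is a regression, so the cutoff sits above the original value.
        return original * (1 + threshold)
    # A decrease is a regression, so the cutoff sits below the original value.
    return original * (1 - threshold)


# Lower is better: a 5% error rate with a 10% threshold.
print(f"{regression_cutoff(0.05, 0.10, lower_is_better=True):.4f}")   # 0.0550
# Higher is better: a 2% conversion rate with a 10% threshold.
print(f"{regression_cutoff(0.02, 0.10, lower_is_better=False):.4f}")  # 0.0180
```

With the same 10% threshold, the cutoff lies above the original value for an error rate and below it for a conversion rate.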
Custom threshold calculation methods
LaunchDarkly uses a Bayesian statistical model to assess the probability of a new variation performing worse than the original variation. You can customize the threshold if needed. LaunchDarkly uses the custom threshold to determine what constitutes “worse” by comparing the relative difference between the new and original variations. Then, LaunchDarkly calculates the probability of the new variation being worse. If this probability surpasses 95%, LaunchDarkly identifies a regression.
For examples of customized thresholds with metrics using different analysis methods, read Analysis method.
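The probability calculation described above can be approximated with a small simulation. The sketch below is not LaunchDarkly's model, whose priors and internals this topic does not describe. It assumes a rate metric such as error rate, uniform Beta(1, 1) priors on each variation's true rate, and Monte Carlo sampling from the resulting Beta posteriors. The function name `probability_to_be_worse` and all of its parameters are hypothetical.

```python
import random


def probability_to_be_worse(
    orig_events: int,
    orig_n: int,
    new_events: int,
    new_n: int,
    threshold: float,
    lower_is_better: bool = True,
    draws: int = 20_000,
    seed: int = 0,
) -> float:
    """Estimate P(new is worse than original by more than `threshold` | data).

    Assumption: each variation's true rate has a Beta(1, 1) prior, so its
    posterior is Beta(1 + events, 1 + non-events).
    """
    rng = random.Random(seed)
    worse = 0
    for _ in range(draws):
        p_orig = rng.betavariate(1 + orig_events, 1 + orig_n - orig_events)
        p_new = rng.betavariate(1 + new_events, 1 + new_n - new_events)
        relative_diff = (p_new - p_orig) / p_orig
        if lower_is_better:
            is_worse = relative_diff > threshold
        else:
            is_worse = relative_diff < -threshold
        worse += is_worse
    return worse / draws


# Original: 500 errors in 10,000 requests (5%). New: 800 errors in 10,000 requests (8%).
prob = probability_to_be_worse(500, 10_000, 800, 10_000, threshold=0.10)
print(prob, "regression detected" if prob > 0.95 else "no regression detected")
```

With these made-up counts the observed relative difference is far above the 10% threshold, so the estimated probability is close to 1 and crosses the 95% mark. If the new variation had 510 errors instead, the estimate would fall well below 95% and no regression would be detected.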
Example: A metric using the “Average” analysis method
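Imagine you are using a metric with the “Average” analysis method in a guarded rollout, and the metric represents conversion rate, so higher is better. The true conversion rate of the original variation is 2%, and you set the regression threshold to 10%.
In this example, LaunchDarkly detects a regression when:

$$P\left(\frac{\mu_{\text{new}} - \mu_{\text{original}}}{\mu_{\text{original}}} < -10\% \;\middle|\; \text{data}\right) > 95\%$$

Here, $\mu_{\text{new}}$ and $\mu_{\text{original}}$ are the true means of the new and original variations. With $\mu_{\text{original}} = 2\%$, the condition inside the probability is equivalent to $\mu_{\text{new}} < 2\% \times (1 - 10\%) = 1.8\%$.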
In other words, LaunchDarkly detects a regression when there is a 95% probability that the true, unknown conversion rate of the new variation would be smaller than 1.8%, given the evidence provided by the observed data. The threshold is with respect to the relative difference of the new variation’s mean and original variation’s mean.
Example: A metric using the “Percentile” analysis method
This example follows the same structure as the “Average” analysis method example, but applies to a percentile-based metric where lower values are better.
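Imagine you are using a metric with the “Percentile” analysis method in a guarded rollout that measures the 90th percentile of latency, so lower is better. The 90th percentile of the original variation is 1,000 ms, and you set the regression threshold to 10%.
In this example, LaunchDarkly detects a regression when:

$$P\left(\frac{q_{\text{new}} - q_{\text{original}}}{q_{\text{original}}} > 10\% \;\middle|\; \text{data}\right) > 95\%$$

Here, $q_{\text{new}}$ and $q_{\text{original}}$ are the true 90th percentiles of the new and original variations. With $q_{\text{original}} = 1{,}000$ ms, the condition inside the probability is equivalent to $q_{\text{new}} > 1{,}000 \times (1 + 10\%) = 1{,}100$ ms.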
In other words, LaunchDarkly detects a regression when there is a 95% probability that the new variation’s true 90th percentile latency exceeds 1,100 ms, based on the observed data. The threshold is with respect to the relative difference of the new variation’s 90th percentile and original variation’s 90th percentile.