Frequentist experiment results
Overview
This topic explains how to read and use the results table of a frequentist experiment.
Filter options
You can filter an experiment’s results table by metric, variation, or attribute value.
Metrics filter
When you create an experiment, you can add one or more metrics for the experiment to measure. If an experiment is measuring more than one metric, you can filter the results to view only selected metrics. You cannot add metrics to an experiment after you create it.
To filter your results by metric, click All metrics and select the metrics you want to view. The results table updates to show you results from only the metrics you selected.
Variations filter
To filter your results by variation, click All variations and select the variations you want to view. The results table updates to show you results from only the variations you selected.
Attributes filter
When you create an experiment, you can designate one or more context attributes as filterable for that experiment. If a context attribute is filterable, you can filter the experiment results by that attribute’s values. For example, if you designate the “Country” attribute as filterable, then you can narrow your results to users with a “Country” attribute value of Canada.
You cannot designate additional context attributes as filterable after you create an experiment.
To filter your results by context attribute value:
- Click All attributes. A list of context attributes appears.
- Hover on the context attribute you want to filter by. A list of available values for that attribute appears.
- Search for and select the attribute values you want to view.
The results table updates to show you results from only contexts with the attribute values that you selected.
Visualization options
The results table has the following visualization options:
- Forest plot
- Relative difference
- Arm averages
Forest plot
The forest plot displays the confidence interval for each treatment variation with respect to the control variation.
Each horizontal bar shows the likely range of values for the average of the metric you’re computing. The bar extends to the lower and upper limits of that confidence interval. The midpoint of the bar is the observed mean value for the metric.
When the confidence interval no longer overlaps the 0% vertical line on the forest plot, that means the difference between the control variation and the treatment variation is statistically significant. The bar for the variation then displays green or red, depending on whether the variation is performing better or worse than the control.
The forest plot is the most efficient summary of how much variability there is in your experiment data and how close to statistical significance the comparisons are. The forest plot should be your main decision-making tool when deciding on a winning variation for frequentist experiments.
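The reading rule for the forest plot can be sketched in a few lines of Python. This is an illustration only, not LaunchDarkly's implementation: the function name `classify_interval`, the `higher_is_better` parameter, and the choice to treat an interval that touches exactly 0 as still overlapping are all assumptions made for the example.

```python
def classify_interval(lower: float, upper: float, higher_is_better: bool = True) -> str:
    """Classify a confidence interval for the relative difference vs. the control.

    lower and upper are the interval limits as fractions (0.05 means +5%).
    """
    # The interval still overlaps the 0% line: no significant difference yet.
    if lower <= 0.0 <= upper:
        return "not significant"
    # The whole interval sits on one side of 0%.
    treatment_is_higher = lower > 0.0
    return "better" if treatment_is_higher == higher_is_better else "worse"


print(classify_interval(-0.02, 0.08))                          # not significant
print(classify_interval(0.01, 0.09))                           # better
print(classify_interval(0.01, 0.09, higher_is_better=False))   # worse
```

In this sketch, "better" and "worse" correspond to the green and red bars described above.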
Relative difference
The relative difference graph displays a time series of the relative difference between the treatment variation and the control. This graph is helpful for investigating trends in relative differences over time.
Arm averages
The arm averages graph displays the average value over time for each variation. This graph is useful for investigating trends that impact all experiment variations equally over time.
Results chart data
This section explains the data that displays in the experiment results chart.
Graph
The graph column displays the forest plot, relative difference, or arm averages graph, depending on the visualization option you have selected. Click on the graph to view an enlarged version.
P-value
The p-value is the probability of observing a difference at least as large as the one measured between a treatment variation and the control variation if there were no actual difference in performance between the two variations. The assumption that there is no actual difference between two groups is called the null hypothesis. A small p-value means the observed data would be unlikely if the null hypothesis were true, which is evidence for rejecting it.
The default p-value threshold for LaunchDarkly experiments is 0.05, but you can set a different threshold when you create an experiment.
To interpret a p-value against the default threshold:
- Generally, a p-value of less than or equal to 0.05 suggests that the observed difference reflects an actual difference in performance between the treatment variation and the control variation, and is unlikely to be due to random chance alone.
- A p-value of more than 0.05 means there is not enough evidence to conclude that the treatment variation and the control variation perform differently. The observed difference may be a fluctuation due to underlying noise in the data.
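To make the comparison against the threshold concrete, here is a minimal sketch that computes a two-sided p-value for a binary conversion metric using a textbook pooled two-proportion z-test. This page does not state which statistical test LaunchDarkly uses, so the test choice, the function name `two_proportion_p_value`, and the sample numbers are assumptions for illustration only.

```python
import math


def two_proportion_p_value(conv_c: int, n_c: int, conv_t: int, n_t: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test (illustrative only).

    conv_c / n_c: conversions and units in the control variation.
    conv_t / n_t: conversions and units in the treatment variation.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    # Under the null hypothesis both variations share one conversion rate.
    pooled = (conv_c + conv_t) / (n_c + n_t)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / std_err
    # P(|Z| >= |z|) for a standard normal variable Z.
    return math.erfc(abs(z) / math.sqrt(2))


THRESHOLD = 0.05  # the default threshold described above
p = two_proportion_p_value(conv_c=100, n_c=1000, conv_t=130, n_t=1000)
print(f"p-value = {p:.4f}, significant = {p <= THRESHOLD}")
```

With these hypothetical counts the p-value falls below 0.05, so the difference would be treated as statistically significant under the default threshold.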
P-values in funnel optimization experiments
In funnel optimization experiments, LaunchDarkly calculates the p-value for each step in the funnel, but the final metric in the funnel is the metric you should use to decide the winning variation for the experiment as a whole.
LaunchDarkly includes all end users that reach the last step in a funnel in the experiment’s winning variation calculations, even if an end user skipped some steps in the funnel. For example, if your funnel metric group has four steps, and an end user takes step 1, skips step 2, then takes steps 3 and 4, the experiment still considers the end user to have completed the funnel and includes them in the calculations for the winning variation.
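The completion rule above can be expressed as a short sketch. The data shape (a set of step numbers per end user) and the function name `completed_funnel` are hypothetical, chosen only to illustrate that reaching the last step is what counts.

```python
from typing import Dict, Set


def completed_funnel(steps_taken: Set[int], last_step: int) -> bool:
    """An end user completes the funnel if they reached the last step,
    regardless of which earlier steps they took or skipped."""
    return last_step in steps_taken


# Hypothetical end users in a four-step funnel.
users: Dict[str, Set[int]] = {
    "user-a": {1, 2, 3, 4},  # took every step
    "user-b": {1, 3, 4},     # skipped step 2, still reached step 4
    "user-c": {1, 2},        # dropped out before the last step
}

completers = [name for name, steps in users.items() if completed_funnel(steps, last_step=4)]
print(completers)  # ['user-a', 'user-b']
```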
Conversion rate (conversion metrics only)
The conversion rate column displays for all conversion metrics. Examples of conversions include clicking on a button or entering information into a form.
Conversion metrics can be one of two types: count or binary.
Conversion rates in count conversion metrics
The value for each unit in a count conversion metric can be zero or any positive whole number. The value equals the number of times the conversion occurred. For example, a value of 3 means the user clicked on a button three times.
The aggregated statistic for count conversion metrics is the average number of conversions across all units in the metric. For example, the average number of times users clicked on a button.
Count conversion metrics include:
- Clicked or tapped metrics using the Count option
- Custom conversion count metrics
- Page viewed metrics using the Count option
Conversion rates in binary conversion metrics
The value for each unit in a binary conversion metric can be either 1 or 0. A value of 1 means the conversion occurred, such as a user viewing a web page, or submitting a form. A value of 0 means no conversion occurred.
The aggregated statistic for binary conversion metrics is the percentage of units with at least one conversion. For example, the percentage of users who clicked at least once.
Binary conversion metrics include:
- Clicked or tapped metrics using the Occurrence option
- Custom conversion binary metrics
- Page viewed metrics using the Occurrence option
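The difference between the two aggregated statistics is easiest to see when both are computed from the same data. The following sketch uses hypothetical per-user conversion counts and hypothetical helper names (`count_conversion_rate`, `binary_conversion_rate`). It illustrates the definitions above and is not LaunchDarkly's implementation.

```python
from typing import Iterable, List


def count_conversion_rate(values: Iterable[int]) -> float:
    """Count metric: average number of conversions across all units."""
    values = list(values)
    return sum(values) / len(values)


def binary_conversion_rate(values: Iterable[int]) -> float:
    """Binary metric: share of units with at least one conversion."""
    values = list(values)
    return sum(1 for v in values if v > 0) / len(values)


# Hypothetical number of button clicks recorded for four users.
clicks_per_user: List[int] = [3, 0, 1, 0]

print(count_conversion_rate(clicks_per_user))   # 1.0 (average clicks per user)
print(binary_conversion_rate(clicks_per_user))  # 0.5 (half the users clicked at least once)
```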
For funnel optimization experiments, the conversion rate includes all end users who completed the step, even if they didn’t complete a previous step in the funnel. LaunchDarkly calculates the conversion rate for each step in the funnel by dividing the number of end users who completed that step by the total number of end users who started the funnel. LaunchDarkly considers all end users in the experiment for whom the SDK has sent a flag evaluation event as having started the funnel.
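As a sketch of the per-step calculation described above, the following divides each step's completions by the number of end users who started the funnel. The function name `funnel_step_rates` and the numbers are hypothetical.

```python
from typing import List


def funnel_step_rates(started: int, completed_per_step: List[int]) -> List[float]:
    """Conversion rate per funnel step: completions of that step divided by
    the number of end users who started the funnel."""
    return [completed / started for completed in completed_per_step]


# Hypothetical funnel: 1,000 end users started, then completions for steps 1 to 4.
rates = funnel_step_rates(started=1000, completed_per_step=[800, 450, 500, 200])
print(rates)  # [0.8, 0.45, 0.5, 0.2]
```

Because end users who skipped an earlier step still count toward later steps, a later step can show a higher rate than the step before it, as step 3 does in this example.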
Conversions (conversion metrics only)
The conversions column displays the total for a conversion metric:
- For custom conversion count metrics, it is the total number of conversions.
- For custom conversion binary metrics, it is the total number of users or other contexts that had at least one conversion.
Relative difference
The relative difference from the control variation measures how much a metric in the treatment variation differs from the control variation, expressed as a proportion of the control’s estimated value.
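As a quick worked example of this definition, the sketch below computes the relative difference as the treatment value minus the control value, divided by the control value. The function name `relative_difference` and the sample rates are hypothetical.

```python
def relative_difference(control: float, treatment: float) -> float:
    """Difference between treatment and control as a proportion of the control value."""
    if control == 0:
        raise ValueError("relative difference is undefined when the control value is 0")
    return (treatment - control) / control


# Hypothetical conversion rates: 10% for the control, 13% for the treatment.
print(f"{relative_difference(0.10, 0.13):.1%}")  # 30.0%
```

A treatment rate of 13% against a control rate of 10% is a 3 percentage point absolute difference, but a 30% relative difference.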
Mean (numeric and custom conversion count metrics only)
The mean is the average value you should expect for the variation in this experiment, based on the data collected so far. The mean displays only for numeric and custom conversion count metrics.
Total value (numeric metrics only)
The total value is the sum of all the values returned by a numeric metric.
Exposures
The exposures column displays how many unique contexts encountered each variation of the experiment.
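The word "unique" matters here: a context that encounters the same variation several times counts once. The following sketch shows that counting rule on a hypothetical list of `(context_key, variation)` pairs. The event shape and the function name `count_exposures` are assumptions for illustration, not LaunchDarkly's event format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_exposures(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count unique context keys per variation from (context_key, variation) pairs."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for context_key, variation in events:
        seen[variation].add(context_key)  # repeat encounters do not add to the set
    return {variation: len(keys) for variation, keys in seen.items()}


# Hypothetical events: ctx-1 encounters the control variation twice.
events = [
    ("ctx-1", "control"),
    ("ctx-2", "treatment"),
    ("ctx-1", "control"),
    ("ctx-3", "treatment"),
]
print(count_exposures(events))  # {'control': 1, 'treatment': 2}
```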
To learn more about troubleshooting if your experiment hasn’t received any metric events, read Experimentation Results page status: “This metric has never received an event for this iteration”.