Creating A/A tests
Overview
This topic explains how to set up and configure an A/A test in LaunchDarkly. A/A tests are a type of feature change experiment that splits users into different, but identical, variations. When you run an A/A test, you compare two groups receiving the same product experience. This lets you validate that your experiment setup is working as intended and that your metrics are tracking events as expected, which builds trust in your experiment results.
Configuring an A/A test requires several steps:
- Creating the flag or AI Config and its variations,
- Creating one or more metrics,
- Building the A/A test,
- Turning on the flag or AI Config, and
- Starting an iteration.
These steps are explained in detail below.
Prerequisites
Before you build an A/A test, you should read about and understand the following concepts:
Create flags or AI Configs
To begin an A/A test, create a flag or AI Config with two variations. These variations must be defined identically to each other in your codebase.
A/A test variations must be identical
To run a successful A/A test, the two variations you use must be identical. Ensure that both variations are defined the same way in your code.
If you want to run an A/A test on a flag or AI Config with more than two variations, then at least two of the variations must be identical. Any additional variations can be unique. When you create the A/A test, you will not assign any traffic to those additional, unique variations.
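How you define the identical variations depends on your SDK and language. The sketch below is one illustration, assuming the Node.js server-side SDK (`@launchdarkly/node-server-sdk`) and a hypothetical boolean flag named `checkout-aa-test`; both branches intentionally call the same function so the two groups get an identical experience.

```typescript
// Minimal sketch, assuming the Node.js server-side SDK and a hypothetical
// boolean flag 'checkout-aa-test' with two variations (true and false).
import { init } from '@launchdarkly/node-server-sdk';

const client = init(process.env.LAUNCHDARKLY_SDK_KEY ?? '');

// The experience served to everyone, regardless of variation.
function renderCheckout(): string {
  return 'existing checkout page';
}

async function handleCheckout(userKey: string): Promise<string> {
  await client.waitForInitialization({ timeout: 10 });

  const context = { kind: 'user', key: userKey };

  // Evaluating the flag records an exposure for the A/A test.
  const showVariationB = await client.variation('checkout-aa-test', context, false);

  // Both branches are deliberately identical: that is what makes this an A/A test.
  if (showVariationB) {
    return renderCheckout();
  } else {
    return renderCheckout();
  }
}
```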
You do not need to toggle on the flag before you create an A/A test, but you do have to toggle on the flag before you start an experiment iteration. AI Configs are on by default.
To learn more, read Creating new flags, Creating flag variations, Create AI Configs, and Create and manage AI Config variations.
Limitations
You cannot run an A/A test on a flag if:
- the flag has an active guarded rollout
- the flag has an active progressive rollout
- the flag is in a running Data Export experiment
- the flag is in a running warehouse native experiment
- the flag is a migration flag
You can build and run multiple funnel optimization experiments, feature change experiments, and A/A tests on the same flag or AI Config as long as there is only one running experiment per rule. You cannot run multiple experiments on the same rule at the same time.
Create metrics
Metrics measure audience behaviors affected by your flags and AI Configs. You can use metrics to track all kinds of things, from how often end users access a URL to how long that URL takes to load a page. You can reuse metrics in multiple A/A tests, or create new ones for your A/A test.
To learn how to create your own new metric, read Metrics. LaunchDarkly also automatically creates metrics for AI Configs. To learn more, read Metrics generated from AI SDK events.
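For example, custom conversion and numeric metrics listen for events your application sends with the SDK's track call. The sketch below assumes the Node.js server-side SDK and two hypothetical metrics with event keys `checkout-completed` (conversion) and `page-load-ms` (numeric); the event key you pass to `track()` must match the event key configured on the metric.

```typescript
// Minimal sketch: sending events that custom metrics can track.
// The event keys and values here are hypothetical.
import { init } from '@launchdarkly/node-server-sdk';

const client = init(process.env.LAUNCHDARKLY_SDK_KEY ?? '');

async function recordCheckoutEvents(userKey: string, loadTimeMs: number): Promise<void> {
  await client.waitForInitialization({ timeout: 10 });

  const context = { kind: 'user', key: userKey };

  // Conversion metric: the event occurrence alone is what gets measured.
  client.track('checkout-completed', context);

  // Numeric metric: pass the measured value as the metric value argument.
  client.track('page-load-ms', context, undefined, loadTimeMs);

  // Flush so queued events are delivered before a short-lived process exits.
  await client.flush();
}
```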
Percentile analysis methods for Experimentation are in beta
The default metric analysis method is “Average.” The use of percentile analysis methods with LaunchDarkly experiments is in beta. If you use a metric with a percentile analysis method in an experiment with a large audience, the experiment results tab may take longer to load, or it may time out and display an error message. Percentile analysis methods are also not compatible with CUPED adjustments.
You can use one or more metrics or standard metric groups in A/A tests. However, you cannot use funnel metric groups in A/A tests. To learn more, read Metric groups.
Build A/A tests
You can view all of the experiments in your environment on the Experiments list.
To build an A/A test:
1. Click Create and choose Experiment. The “Create experiment” dialog appears.
2. Enter an A/A test Name.
3. Enter a Hypothesis.
4. Click Create experiment. The experiment Design tab appears.
5. Select the Feature change experiment type.
6. Check Run as A/A test.
7. Choose a context kind to Randomize by.
8. Select one or more Metrics or metric groups.
    - A list of environments displays. It shows which environments have received events for these metrics. If no environments are receiving events, check that your SDKs are configured correctly.
    - (Optional) If you have added multiple metrics and want to change the primary metric, hover on the metric name and click primary.
    - Click Create to create and use a new metric or new standard metric group.
9. Choose a Flag or AI Config to use in the A/A test. The flag or AI Config must have two identical variations.
    - Click Create flag or Create AI Config to create and use a new flag or AI Config.
10. Choose a targeting rule for the Experiment audience.
    - If you want to restrict your A/A test audience to only contexts with certain attributes, create a targeting rule on the flag or AI Config you include in the A/A test and run the A/A test on that rule.
    - If you don’t want to restrict the audience for your A/A test, run the A/A test on the default rule. If the flag or AI Config doesn’t have any targeting rules, the default rule will be the only option.
11. Choose the Variation served to users outside this experiment. Contexts that match the selected targeting rule but are not in the A/A test will receive this variation.
12. Select the Sample size for the A/A test. This is the percentage of all of the contexts that match the A/A test’s targeting rule that you want to include in the A/A test.
    - For A/A tests on flags or AI Configs with two variations, leave the variations split at 50%/50%.
    - If your flag or AI Config has more than two variations, click Edit to update the variation split. Assign 50% to each of the two identical variations, and 0% to any other variations.
13. Click Save audience split.
14. Select a variation to serve as the Control.
15. Select a Statistical approach of Bayesian or frequentist.
    - If you selected a statistical approach of Bayesian, select a preset or Custom success threshold.
    - If you selected a statistical approach of frequentist, select:
        - a Significance level.
        - a one-sided or two-sided Direction of hypothesis test.
Statistical approach options
You can select a statistical approach of Bayesian or frequentist. Each approach includes one or more analysis options.
We recommend Bayesian when you have a small sample size of less than a thousand contexts, and we recommend frequentist when you have a larger sample size of a thousand or more.
The Bayesian options include:
- Threshold:
    - 90% probability to beat control is the standard success threshold, but you can raise the threshold to 95% or 99% if you want to be more confident in your A/A test results.
    - You can lower the threshold to less than 90% using the Custom option. We recommend a lower threshold only when you are experimenting on non-critical parts of your app and are less concerned with determining a clear winning variation.
The frequentist options include:
- Significance level:
    - A p-value of 0.05 is the standard significance level, but you can lower the level to 0.01 or raise it to 0.10, depending on whether you need to be more or less confident in your results. A lower significance level means that you can be more confident in your winning variation.
    - You can raise the significance level to more than 0.10 using the Custom option. We recommend a higher significance level only when you are experimenting on non-critical parts of your app and are less concerned with determining a clear winning variation.
- Direction of hypothesis test:
    - Two-sided: We recommend two-sided when you’re in doubt about whether the difference between the control and the treatment variations will be negative or positive, and want to look for indications of statistical significance in both directions.
    - One-sided: We recommend one-sided when you feel confident that the difference between the control and treatment variations will be either negative or positive, and want to look for indications of statistical significance only in one direction.
To learn more, read Bayesian versus frequentist statistics.
16. (Optional) If you want to be able to filter your A/A test results by attribute, click Advanced, then select up to five context attributes to filter results by.
17. Scroll to the top of the page and click Save.
If needed, you can save your in-progress A/A test design to finish later. To save your design, click Save at the top of the creation screen. Your in-progress A/A test design is saved and appears on the Experiments list. To finish building the A/A test, click on the A/A test’s name and continue editing.
After you have created your A/A test, the next step is to toggle on the flag. AI Configs are on by default. Then, you can start an iteration.
You can also use the REST API: Create experiment
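If you script experiment setup, the sketch below shows the general shape of a call to the create experiment endpoint. Treat the URL path, headers, and body fields as assumptions to verify against the Create experiment API documentation; in particular, the iteration object also needs metrics, treatments, and flag details that are omitted here.

```typescript
// Hedged sketch: creating an experiment through the REST API.
// Project key, environment key, and body fields are illustrative placeholders;
// see the Create experiment API docs for the authoritative request schema.
const API_TOKEN = process.env.LD_API_TOKEN ?? '';

async function createAATest(): Promise<unknown> {
  const response = await fetch(
    'https://app.launchdarkly.com/api/v2/projects/my-project/environments/production/experiments',
    {
      method: 'POST',
      headers: {
        Authorization: API_TOKEN,
        'Content-Type': 'application/json',
        // Experiment endpoints are part of the beta API surface.
        'LD-API-Version': 'beta',
      },
      body: JSON.stringify({
        name: 'Checkout A/A test',
        key: 'checkout-aa-test',
        iteration: {
          hypothesis: 'Both identical variations perform the same.',
          // Metrics, treatments, and flag configuration omitted; see the API docs.
        },
      }),
    },
  );

  if (!response.ok) {
    throw new Error(`Create experiment failed: ${response.status}`);
  }
  return response.json();
}
```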
Turn on flags or AI Configs
For an A/A test to begin recording data, the flag or AI Config used in the A/A test must be on. Targeting rules for AI Configs are on by default. To learn how to turn targeting rules on for flags, read Turning flags on and off.
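If you manage flags programmatically instead, one way to turn targeting on is the flag update endpoint with a semantic patch instruction, sketched below; the project, environment, and flag keys are placeholders, and you should confirm the request format against the Update feature flag API documentation.

```typescript
// Hedged sketch: turning a flag's targeting on in one environment via a
// semantic patch. Project, environment, and flag keys are placeholders.
const API_TOKEN = process.env.LD_API_TOKEN ?? '';

async function turnFlagOn(): Promise<unknown> {
  const response = await fetch(
    'https://app.launchdarkly.com/api/v2/flags/my-project/checkout-aa-test',
    {
      method: 'PATCH',
      headers: {
        Authorization: API_TOKEN,
        // Semantic patch requests use this content type.
        'Content-Type': 'application/json; domain-model=launchdarkly.semanticpatch',
      },
      body: JSON.stringify({
        environmentKey: 'production',
        instructions: [{ kind: 'turnFlagOn' }],
      }),
    },
  );

  if (!response.ok) {
    throw new Error(`Turn flag on failed: ${response.status}`);
  }
  return response.json();
}
```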
Start A/A test iterations
After you create an A/A test and toggle on the flag, you can start an A/A test iteration in one or more environments.
To start an A/A test iteration:
1. Navigate to the Experiments list.
2. Click on the environment section containing the A/A test you want to start.
    - If the environment you need isn’t visible, click the + next to the list of environment sections. Search for the environment you want, and select it from the list.
3. Click on the name of the A/A test you want to start an iteration for. The Design tab appears.
4. Click Start.
5. Repeat steps 1-4 for each environment you want to start an iteration in.
A/A test iterations allow you to record A/A tests in individual blocks of time. To ensure accurate A/A test results, when you make changes that impact an A/A test, LaunchDarkly starts a new iteration of the A/A test.
To learn more about starting and stopping iterations, read Starting and stopping experiment iterations.
You can also use the REST API: Create iteration