This section includes an explanation of advanced statistical concepts. We provide them for informational purposes, but you do not need to understand these concepts to use CUPED for covariate adjustment.
This guide explains the methodology and usage of CUPED (Controlled Experiments Using Pre-Experiment Data) for covariate adjustment in LaunchDarkly Experimentation results.
Covariate adjustment refers to the use of variables unaffected by treatment, known as covariates, to improve the precision and accuracy of experiment results.
In mainstream statistics, covariate adjustment is typically performed using Fisher’s (1932) analysis of covariance (ANCOVA) model. In the context of online experimentation, Deng et al. (2013) introduced CUPED, which can be thought of as a special case of ANCOVA with the pre-period version of the modeled outcome as a single covariate.
In this guide, we use the terms covariate adjustment, analysis of covariance (ANCOVA), and CUPED interchangeably.
In a randomized experiment, there are three types of variables defined for each experiment unit, such as a “user” in a user-randomized experiment: the treatment, which is the variation assigned to the unit, outcomes, and covariates.
Outcomes are post-treatment variables. These are variables potentially affected by the treatment, or measured after the treatment is assigned. An example is revenue measured after the user enters the experiment.
Covariates must be pre-treatment variables, which are variables measured before the treatment is assigned, or variables unaffected by treatment. Examples include revenue measured before a user enters the experiment, which is measured before treatment, and gender, which is unaffected by the treatment.
The goal of covariate adjustment is to improve the measurement of an experiment outcome, such as experiment revenue, through the use of prognostic covariates, which are covariates predictive of the outcome. Pre-experiment revenue is an example of a prognostic covariate, because it is typically predictive of experiment revenue. The ANCOVA model, and CUPED in particular, does this by leveraging the correlation, which is the strength of the linear relationship, between an outcome and a set of covariates to improve measurement precision and accuracy.
We illustrate this with a simple example of an outcome $Y$, such as experiment revenue, and a covariate $X$, such as pre-experiment revenue.
In this example, there is a strong linear relationship between them for both treatment and control variations, shown in the scatter plot on the left below:

Predicting the observations in the treatment and control variations with, respectively, the sample means $\bar{Y}_T$ and $\bar{Y}_C$ results in a large variance for the errors, as illustrated in the plot on the right above.
However, we can leverage the linear relationship between $Y$ and $X$ by predicting the observations in the treatment and control variations with, respectively, the regression predictions $\hat{\alpha}_T + \hat{\beta}_T X$ and $\hat{\alpha}_C + \hat{\beta}_C X$, as shown in the scatter plot on the left below:

This results in smaller variance for the errors, as shown in the density plot on the right above. The above two scatter plots were inspired by those shown in Huitema (2011).
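The correlation, which is the strength of the linear relationship between the outcome $Y$ and the covariate $X$, determines how much the error variance is reduced. The larger the correlation, the larger the variance reduction.
Specifically, if we denote the original error variance estimates for the two variations by, respectively, $s_{Y,T}^2$ and $s_{Y,C}^2$, and the new error variance estimates using CUPED by, respectively, $\hat{\sigma}_T^2$ and $\hat{\sigma}_C^2$, and the outcome-covariate correlations by, respectively, $r_T$ and $r_C$, then the following holds approximately:

$$\hat{\sigma}_T^2 \approx s_{Y,T}^2 \, (1 - r_T^2), \qquad \hat{\sigma}_C^2 \approx s_{Y,C}^2 \, (1 - r_C^2)$$

The proportional reduction in error variance is approximately the square of the correlations:

$$\frac{s_{Y,T}^2 - \hat{\sigma}_T^2}{s_{Y,T}^2} \approx r_T^2, \qquad \frac{s_{Y,C}^2 - \hat{\sigma}_C^2}{s_{Y,C}^2} \approx r_C^2$$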
If the correlations in both variations are 0.5, the error variance will be reduced by about 25%, and if they are 0.7, the error variance will be reduced by about 49%. The proportional reduction in the error variance translates to about the same proportional reduction in the variance of the experiment lift estimate, which translates to the same proportional reduction in experiment duration on average. Therefore, when the correlations are 0.5, the experiment duration will be reduced by as much as 25% on average, and when they are 0.7, the experiment duration will be reduced by as much as 49% on average. In other words, a correlation of 0.7 can cut experiment duration nearly in half.
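The relationship between correlation and error variance can be checked numerically. The following is a minimal sketch on simulated data, not the production implementation: the helper name `summarize`, the simulated revenue values, and the seed are all assumptions made for illustration. It fits a one-covariate regression and compares the residual variance with the raw outcome variance.

```python
import math
import random
from statistics import fmean


def summarize(x, y):
    """Return (correlation, raw outcome variance, residual variance after regressing y on x)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    beta = sxy / sxx              # least-squares slope
    alpha = my - beta * mx        # least-squares intercept
    sse = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    # Same n - 1 denominator for both variances so the ratio isolates the effect of r.
    return r, syy / (n - 1), sse / (n - 1)


# Hypothetical single variation: pre-period revenue x and experiment revenue y.
random.seed(0)
x = [random.gauss(10, 2) for _ in range(5000)]
y = [1.0 + 0.8 * xi + random.gauss(0, 1.6) for xi in x]

r, raw_var, resid_var = summarize(x, y)
reduction = 1 - resid_var / raw_var
print(f"correlation={r:.3f}  r^2={r * r:.3f}  variance reduction={reduction:.3f}")
```

In this sketch the simulated correlation is close to 0.7 by construction, so the printed variance reduction should sit near one half.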
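In addition to reducing the variance of lift estimates, CUPED applies an adjustment to the sample means $\bar{Y}_T$ and $\bar{Y}_C$ to produce the following covariate-adjusted means:

$$\hat{\mu}_T = \bar{Y}_T - \hat{\beta}_T (\bar{X}_T - \bar{X}), \qquad \hat{\mu}_C = \bar{Y}_C - \hat{\beta}_C (\bar{X}_C - \bar{X})$$

where $\hat{\beta}_T$ and $\hat{\beta}_C$ are the estimated regression slopes, $\bar{X}_T$ and $\bar{X}_C$ denote the covariate means for, respectively, the treatment and control variations, and $\bar{X}$ denotes the covariate mean over all experiment variations. Although the unadjusted means $\bar{Y}_T$ and $\bar{Y}_C$ are unbiased estimators of the variation averages over many realizations of the experiment, for a specific experiment there could be some conditional bias. Conditional bias may occur due to the random imbalances between the treatment and control variation covariate means $\bar{X}_T$ and $\bar{X}_C$. As long as the linear regression model is correct, the adjustments $\hat{\beta}_T (\bar{X}_T - \bar{X})$ and $\hat{\beta}_C (\bar{X}_C - \bar{X})$ control for these imbalances and remove the conditional bias.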
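To make the adjustment concrete, here is a small sketch on toy data. The function names `slope` and `adjusted_mean` and all the numbers are hypothetical. The data are built so that the treatment variation happens to have a higher pre-period covariate mean, which inflates the unadjusted difference. Each adjusted mean is the sample mean minus the slope times the covariate imbalance.

```python
from statistics import fmean


def slope(x, y):
    """Least-squares slope of y on x for a single variation."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def adjusted_mean(x, y, x_bar_all):
    """Sample mean of y minus slope times (variation covariate mean - overall covariate mean)."""
    return fmean(y) - slope(x, y) * (fmean(x) - x_bar_all)


# Toy data (assumed): control follows y = x + 1, treatment follows y = x + 2,
# so the built-in effect is 1, but treatment units have larger pre-period values.
x_c, y_c = [9.0, 12.0, 8.0, 11.0, 10.0], [10.0, 13.0, 9.0, 12.0, 11.0]
x_t, y_t = [12.0, 15.0, 11.0, 14.0, 13.0], [14.0, 17.0, 13.0, 16.0, 15.0]

x_bar_all = fmean(x_c + x_t)  # covariate mean over all units in both variations
raw_diff = fmean(y_t) - fmean(y_c)
adj_t = adjusted_mean(x_t, y_t, x_bar_all)
adj_c = adjusted_mean(x_c, y_c, x_bar_all)
adj_diff = adj_t - adj_c

print(f"unadjusted difference: {raw_diff}")
print(f"adjusted difference:   {adj_diff}")
```

The noise-free data are deliberately artificial. With real data the adjustment removes the part of the difference explained by the covariate imbalance, but sampling noise remains.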
In this section we discuss the scope and model for the CUPED implementation in the LaunchDarkly Experimentation product.
CUPED is available for experiments when the following criteria have been met:
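The covariate adjustment model implemented is characterized by the following two features:
1. Separate covariate slopes and error variances for each experiment variation, which is the ANCOVA 3 model described in the advanced topics later in this guide.
2. A single covariate, which is the pre-period version of the modeled outcome.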
Besides giving us the most general model, another advantage of Feature 1 is that we can implement the ANCOVA model by fitting separate linear regression models by variation. This means we fit one for each experiment variation, which simplifies implementation. One advantage of Feature 2 is that we can fit the linear regression models using simple analytical formulas without needing to use specialized statistical software for linear regression. Combining Features 1 and 2 yields a very simple SQL implementation that you can apply to big data with computational efficiency.
Some may express concern about our using only one covariate in the model when we could potentially include more. In practice, using only the single pre-period covariate is advantageous from both the data collection and model fit points of view:
The pre-period covariate is measured over a seven-day lookback window before the start of the experiment. Precedent for using only seven days is established by the implementation of the PrePost model for covariate adjustment for YouTube experiments, mentioned in Soriano (2019), which is the basis for our implementation.
There is also a tradeoff between using shorter versus longer windows in terms of relevance versus sufficiency. Shorter windows may have more relevance due to the recency of the information measured, but may not have captured all the information to optimize the outcome-covariate correlation. Longer windows capture more information, but risk including irrelevant information from older events, which may decrease the outcome-covariate correlation.
Each LaunchDarkly experiment indicates whether CUPED is enabled or disabled on its Results tab above the “Exposures” graph.

The CUPED statuses indicate:
For those interested, we will cover some advanced topics in the following sections.
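For a two-variation experiment, you can formulate the ANCOVA 3 model implemented at LaunchDarkly as a single model. For example:

$$Y_i = \alpha_{Z_i} + \beta_{Z_i} X_i + \varepsilon_i, \qquad \mathrm{E}[\varepsilon_i] = 0, \qquad \mathrm{Var}(\varepsilon_i) = \sigma_{Z_i}^2$$

where $Z_i = T$ if unit $i$ is in the treatment variation and $Z_i = C$ if unit $i$ is in the control variation. Here $\alpha_T$ and $\alpha_C$ are the intercepts, $\beta_T$ and $\beta_C$ are the covariate slopes, and $\sigma_T^2$ and $\sigma_C^2$ are the error variances.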
The original ANCOVA model introduced by Fisher (1932) makes the following assumptions:
1. Equal covariate slopes across variations, that is, $\beta_T = \beta_C$.
2. Equal error variances across variations, that is, $\sigma_T^2 = \sigma_C^2$.
Yang and Tsiatis (2001) referred to this original model as the ANCOVA 1 model. If we remove Assumption 1 to allow for unequal covariate slopes, that is, allowing for $\beta_T \neq \beta_C$, then we have what Yang and Tsiatis (2001) call the ANCOVA 2 model, also known as Lin’s (2013) model or the ANHECOVA (ANalysis of HEterogeneous COVAriance) model of Ye et al. (2023).
However, in practice it can be convenient to relax Assumption 2 in addition to Assumption 1, which allows for unequal error variances, that is, $\sigma_T^2 \neq \sigma_C^2$. This gives us what we call the ANCOVA 3 model.
This can be implemented in two ways:
1. Fitting a single model that allows for unequal error variances, for example with the nlme::gls function in R.
2. Fitting separate linear regression models, one for each experiment variation.
Fitting separate models has the advantage of fitting very simple regression models when there is only one covariate. This makes for a simple SQL implementation without leveraging additional software, which improves computational efficiency, especially on big data. We give an example of a simple SQL implementation of the ANCOVA 3 model in the section SQL Implementation.
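The separate-models approach can be sketched in a few lines. This is an illustration under assumptions, not LaunchDarkly's code: the function name `fit_simple_regression`, the dictionary layout, and the toy values are all made up. Each variation gets its own intercept, slope, and residual variance from closed-form one-covariate formulas, which is what allows unequal slopes and unequal error variances.

```python
from statistics import fmean


def fit_simple_regression(x, y):
    """Closed-form least-squares fit of y on one covariate x for a single variation."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    sse = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    return {"alpha": alpha, "beta": beta, "sigma2": sse / (n - 2)}


# Toy data (assumed): covariate x and outcome y per variation.
data = {
    "control": ([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0]),
    "treatment": ([1.0, 2.0, 3.0, 4.0], [3.0, 7.0, 8.0, 12.0]),
}

# One independent fit per variation: nothing is shared between the two models.
fits = {variation: fit_simple_regression(x, y) for variation, (x, y) in data.items()}

for variation, fit in fits.items():
    print(variation, {k: round(v, 4) for k, v in fit.items()})
```

Because no parameter is shared across variations, the same function scales to experiments with more than two variations by adding entries to the dictionary.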
In a comparative study, whether a randomized experiment or an observational study, the goal is to perform causal inference, which includes estimating the causal effect of a treatment, for example, the causal effect of a new product feature on revenue.
Under the Neyman-Rubin potential outcomes framework for causal inference, we begin with individual potential outcomes (IPOs) $Y_i(T)$ and $Y_i(C)$ for, respectively, receiving the treatment and not receiving the treatment, for each individual $i$. The individual treatment effect (ITE) for individual $i$ is given by:

$$\tau_i = Y_i(T) - Y_i(C)$$

One estimand for the causal effect of treatment is the average treatment effect (ATE), which is the average of the ITEs over all $n$ individuals:

$$\tau = \frac{1}{n} \sum_{i=1}^{n} \tau_i = \mu_T - \mu_C$$

This is the difference between the average potential outcomes (APOs) $\mu_T = \frac{1}{n} \sum_{i=1}^{n} Y_i(T)$ and $\mu_C = \frac{1}{n} \sum_{i=1}^{n} Y_i(C)$ of receiving and not receiving the treatment, respectively. An alternate causal estimand is the relative average treatment effect (RATE):

$$\mathrm{RATE} = \frac{\mu_T - \mu_C}{\mu_C}$$
In the LaunchDarkly Experimentation product, we estimate the APO for each experiment variation for every combination of analysis time, experiment iteration, metric, and attribute. We then perform causal inference based on estimating the RATE for each treatment variation versus control.
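As a worked illustration of these estimands, the sketch below computes the ITEs, ATE, APOs, and RATE from a hypothetical table in which both potential outcomes are known for every unit. In a real experiment only one potential outcome is observed per unit, which is why the APOs have to be estimated. All values and variable names are assumptions.

```python
from statistics import fmean

# Hypothetical potential outcomes for four units (never jointly observable in practice).
y_if_treated = [12.0, 9.0, 15.0, 10.0]
y_if_control = [10.0, 8.0, 12.0, 10.0]

# Individual treatment effects: difference of the two potential outcomes per unit.
ites = [t - c for t, c in zip(y_if_treated, y_if_control)]

# Average treatment effect: mean of the ITEs, equal to the difference of the APOs.
ate = fmean(ites)
apo_treatment = fmean(y_if_treated)
apo_control = fmean(y_if_control)

# Relative average treatment effect: ATE expressed relative to the control APO.
rate = (apo_treatment - apo_control) / apo_control

print(f"ITEs={ites}  ATE={ate}  RATE={rate:.2%}")
```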
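To perform causal inference, we first estimate the IPOs by their respective linear regression predictions for the treatment and control variations using the ANCOVA 3 model described earlier:

$$\hat{Y}_i(T) = \hat{\alpha}_T + \hat{\beta}_T X_i, \qquad \hat{Y}_i(C) = \hat{\alpha}_C + \hat{\beta}_C X_i$$

The APOs are estimated by averaging the estimated IPOs over all available units. In this case, the units are in both the treatment and control variations:

$$\hat{\mu}_T = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i(T) = \hat{\alpha}_T + \hat{\beta}_T \bar{X}, \qquad \hat{\mu}_C = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i(C) = \hat{\alpha}_C + \hat{\beta}_C \bar{X}$$

where $n$ is the total number of units and $\bar{X}$ denotes the average of the covariate over all units in both variations. Because the linear regression models have only one predictor, the estimated regression intercepts are given by:

$$\hat{\alpha}_T = \bar{Y}_T - \hat{\beta}_T \bar{X}_T, \qquad \hat{\alpha}_C = \bar{Y}_C - \hat{\beta}_C \bar{X}_C$$

Therefore, the estimated APOs are given by:

$$\hat{\mu}_T = \bar{Y}_T - \hat{\beta}_T (\bar{X}_T - \bar{X}), \qquad \hat{\mu}_C = \bar{Y}_C - \hat{\beta}_C (\bar{X}_C - \bar{X})$$

We refer to $\hat{\mu}_T$ and $\hat{\mu}_C$ as covariate-adjusted means. They are the unadjusted sample means $\bar{Y}_T$ and $\bar{Y}_C$, minus the adjustments $\hat{\beta}_T (\bar{X}_T - \bar{X})$ and $\hat{\beta}_C (\bar{X}_C - \bar{X})$. This removes conditional bias due to the random imbalances between the covariate means $\bar{X}_T$ and $\bar{X}_C$ for the treatment and control variations, respectively.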
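You can compute the estimated regression slopes with the following formulas:

$$\hat{\beta}_T = r_T \, \frac{s_{Y,T}}{s_{X,T}}, \qquad \hat{\beta}_C = r_C \, \frac{s_{Y,C}}{s_{X,C}}$$

where $r_T$ and $r_C$ are the sample correlations between the outcome and the covariate, $s_{Y,T}$ and $s_{Y,C}$ are the sample standard deviations of the outcome, and $s_{X,T}$ and $s_{X,C}$ are the sample standard deviations of the covariate, for the treatment and control variations, respectively.
We can show that the estimated SEs for the covariate-adjusted means for both the treatment and control variations are:

$$\widehat{\mathrm{SE}}(\hat{\mu}_T) = \sqrt{ s_{Y,T}^2 \, (1 - r_T^2) \, \frac{n_T - 1}{n_T - 2} \left[ \frac{1}{n_T} + \frac{(\bar{X}_T - \bar{X})^2}{(n_T - 1) \, s_{X,T}^2} \right] }, \qquad \widehat{\mathrm{SE}}(\hat{\mu}_C) = \sqrt{ s_{Y,C}^2 \, (1 - r_C^2) \, \frac{n_C - 1}{n_C - 2} \left[ \frac{1}{n_C} + \frac{(\bar{X}_C - \bar{X})^2}{(n_C - 1) \, s_{X,C}^2} \right] }$$

where $n_T$ and $n_C$ are the sample sizes for the treatment and control variations, respectively.
When the sample sizes $n_T$ and $n_C$ are large and the imbalances $\bar{X}_T - \bar{X}$ and $\bar{X}_C - \bar{X}$ are negligible, the above SEs reduce to the following:

$$\widehat{\mathrm{SE}}(\hat{\mu}_T) \approx \sqrt{ \frac{s_{Y,T}^2 \, (1 - r_T^2)}{n_T} }, \qquad \widehat{\mathrm{SE}}(\hat{\mu}_C) \approx \sqrt{ \frac{s_{Y,C}^2 \, (1 - r_C^2)}{n_C} }$$

Therefore, the proportional variance reduction for each is approximately equal to the squared correlation for the variation, as we showed earlier:

$$1 - \frac{\widehat{\mathrm{SE}}(\hat{\mu}_T)^2}{s_{Y,T}^2 / n_T} \approx r_T^2, \qquad 1 - \frac{\widehat{\mathrm{SE}}(\hat{\mu}_C)^2}{s_{Y,C}^2 / n_C} \approx r_C^2$$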
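The large-sample behavior can be checked with a short numeric sketch. It assumes the standard simple-linear-regression formula for the SE of a prediction at the overall covariate mean, treating that overall mean as fixed. The function names and all input values are hypothetical, and the exact production formula may differ.

```python
import math


def adjusted_mean_se(n, s_y, s_x, r, imbalance):
    """SE of a covariate-adjusted mean under the assumed regression-prediction formula.

    imbalance is the variation covariate mean minus the overall covariate mean.
    """
    resid_var = s_y ** 2 * (1 - r ** 2) * (n - 1) / (n - 2)
    return math.sqrt(resid_var * (1 / n + imbalance ** 2 / ((n - 1) * s_x ** 2)))


def large_sample_se(n, s_y, r):
    """Approximation for large n and negligible covariate imbalance."""
    return s_y * math.sqrt((1 - r ** 2) / n)


# Hypothetical summary statistics for one variation.
n, s_y, s_x, r = 10_000, 20.0, 15.0, 0.7

se_unadjusted = s_y / math.sqrt(n)
se_full = adjusted_mean_se(n, s_y, s_x, r, imbalance=0.1)
se_approx = large_sample_se(n, s_y, r)

# Proportional variance reduction relative to the unadjusted mean.
reduction = 1 - (se_approx / se_unadjusted) ** 2

print(f"unadjusted SE={se_unadjusted:.4f}  adjusted SE={se_full:.4f}  approx SE={se_approx:.4f}")
print(f"variance reduction={reduction:.2f}")
```

With these inputs the full and approximate SEs agree closely, and the variance reduction equals the squared correlation.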
For frequentist estimates, the estimates of the APOs are the above covariate-adjusted means $\hat{\mu}_T$ and $\hat{\mu}_C$. In the Bayesian model, the APO estimates are regularized using empirical Bayes priors. To learn more, read Statistical methodology for Bayesian experiments and Statistical methodology for frequentist experiments.
The Bayesian results without covariate adjustment through CUPED continue to use the normal-normal model for custom conversion count and custom numeric continuous metrics and the beta-binomial model for custom conversion binary, clicked or tapped, and page viewed metrics. However, the Bayesian results with covariate adjustment through CUPED will use the normal-normal model for all metrics using the “average” analysis method, including custom conversion binary metrics. Under this model, we assume the following prior distribution for the parameter $\mu_k$ estimated in variation $k$:

$$\mu_k \sim N(\mu_0, \sigma_0^2)$$

For details on the prior mean $\mu_0$ and prior variance $\sigma_0^2$, read Statistical methodology for Bayesian experiments.
LaunchDarkly provides a frequentist estimate $\hat{\mu}_k$ and its estimated standard error $\widehat{\mathrm{SE}}(\hat{\mu}_k)$. For the non-CUPED results, the estimate is the sample mean. For CUPED results, the estimate is the covariate-adjusted mean, with details provided in the previous section.
We define precision as the inverse of the variance, which is equivalent to the inverse of the squared standard error. Therefore, the estimated precisions of the prior distribution and the frequentist estimate are, respectively:

$$p_0 = \frac{1}{\sigma_0^2}, \qquad p_k = \frac{1}{\widehat{\mathrm{SE}}(\hat{\mu}_k)^2}$$

Define the following precision sum and weight:

$$p = p_0 + p_k, \qquad w_k = \frac{p_k}{p}$$

Then the posterior distribution of the estimated parameter is given by:

$$\mu_k \mid \text{data} \sim N\!\left( w_k \, \hat{\mu}_k + (1 - w_k) \, \mu_0, \; \frac{1}{p} \right)$$

where the posterior mean is given by the precision-weighted average of the frequentist estimate $\hat{\mu}_k$ and the prior mean $\mu_0$, and the posterior variance is the inverse of the sum of the frequentist estimate precision $p_k$ and the prior precision $p_0$.
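The precision-weighted update described above can be sketched as follows. The function name `normal_normal_posterior` and the input values are hypothetical. The prior mean and standard deviation are simply passed in, because how the empirical Bayes prior is estimated is outside the scope of this sketch.

```python
import math


def normal_normal_posterior(estimate, se, prior_mean, prior_sd):
    """Posterior mean and SD for a normal prior combined with a normal estimate."""
    prior_precision = 1 / prior_sd ** 2      # inverse of the prior variance
    estimate_precision = 1 / se ** 2         # inverse of the squared standard error
    precision_sum = prior_precision + estimate_precision
    weight = estimate_precision / precision_sum
    # Precision-weighted average of the frequentist estimate and the prior mean.
    posterior_mean = weight * estimate + (1 - weight) * prior_mean
    # Posterior variance is the inverse of the precision sum.
    posterior_sd = math.sqrt(1 / precision_sum)
    return posterior_mean, posterior_sd


# Hypothetical inputs: a covariate-adjusted mean of 10 with SE 1, and a N(8, 2^2) prior.
post_mean, post_sd = normal_normal_posterior(
    estimate=10.0, se=1.0, prior_mean=8.0, prior_sd=2.0
)
print(f"posterior mean={post_mean:.3f}  posterior sd={post_sd:.3f}")
```

Because CUPED typically shrinks the standard error of the frequentist estimate, the estimate's precision grows and the posterior mean moves closer to the covariate-adjusted mean and further from the prior mean.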
Here is an example SQL implementation of the ANCOVA 3 model for covariate adjustment to demonstrate its simplicity.
Assume that we have fields y and x in a table named UnitTable, which is aggregated by experiment units, with fields for analysis time, experiment, metric, segment, and variation. The following simple query produces non-CUPED and CUPED estimates with corresponding SEs aggregated by combinations of analysis time, experiment, metric, segment, and variation:
The BasicStats common table expression (CTE) produces the following aggregated statistics needed to compute the unadjusted and covariate-adjusted means for each combination of analysis time, experiment, metric, segment, and variation:
The outer query takes the aggregated statistics from the BasicStats CTE to compute the unadjusted and covariate-adjusted means and their SEs using the formulas we derived in the “Covariate-adjusted means” section.
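To illustrate the shape of such a query, the following sketch runs a `BasicStats`-style CTE against an in-memory SQLite database from Python. It is an illustration under assumptions rather than the production query: the column names, the extra `Overall` CTE for the covariate mean across variations, the toy rows, and the SE formula (the standard simple-regression prediction SE, treating the overall covariate mean as fixed) are all choices made for this example.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Register SQRT in case this SQLite build lacks the optional math functions.
conn.create_function("SQRT", 1, math.sqrt)
conn.execute(
    "CREATE TABLE UnitTable (analysis_time TEXT, experiment TEXT, metric TEXT,"
    " segment TEXT, variation TEXT, y REAL, x REAL)"
)

# Toy unit-level rows (assumed): x is the pre-period covariate, y is the outcome.
toy_units = {
    "control": ([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0]),
    "treatment": ([2.0, 3.0, 4.0, 5.0], [3.0, 7.0, 8.0, 12.0]),
}
for variation, (xs, ys) in toy_units.items():
    conn.executemany(
        "INSERT INTO UnitTable VALUES ('day7', 'exp1', 'revenue', 'all', ?, ?, ?)",
        [(variation, y, x) for x, y in zip(xs, ys)],
    )

QUERY = """
WITH BasicStats AS (
    -- Per-variation counts, means, variances, and outcome-covariate covariance.
    SELECT analysis_time, experiment, metric, segment, variation,
           COUNT(*) AS n,
           AVG(y) AS y_mean,
           AVG(x) AS x_mean,
           (SUM(y * y) - COUNT(*) * AVG(y) * AVG(y)) / (COUNT(*) - 1.0) AS y_var,
           (SUM(x * x) - COUNT(*) * AVG(x) * AVG(x)) / (COUNT(*) - 1.0) AS x_var,
           (SUM(x * y) - COUNT(*) * AVG(x) * AVG(y)) / (COUNT(*) - 1.0) AS xy_cov
    FROM UnitTable
    GROUP BY analysis_time, experiment, metric, segment, variation
),
Overall AS (
    -- Covariate mean over all units in all variations.
    SELECT analysis_time, experiment, metric, segment, AVG(x) AS x_mean_all
    FROM UnitTable
    GROUP BY analysis_time, experiment, metric, segment
)
SELECT analysis_time, experiment, metric, segment, variation,
       y_mean AS mean_unadjusted,
       SQRT(y_var / n) AS se_unadjusted,
       y_mean - (xy_cov / x_var) * (x_mean - x_mean_all) AS mean_cuped,
       SQRT((y_var - xy_cov * xy_cov / x_var) * (n - 1.0) / (n - 2.0)
            * (1.0 / n + (x_mean - x_mean_all) * (x_mean - x_mean_all)
               / ((n - 1.0) * x_var))) AS se_cuped
FROM BasicStats
JOIN Overall USING (analysis_time, experiment, metric, segment)
ORDER BY variation
"""

results = {row["variation"]: dict(row) for row in conn.execute(QUERY)}
for variation, stats in results.items():
    print(variation, stats)
```

The sum-of-squares shortcuts keep the query to a single pass over the table, which is the appeal on large data. They can lose numerical precision when means are large relative to the spread, so a production query may prefer the warehouse's built-in variance and covariance aggregates where available.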
Deng, Alex, Ya Xu, Ron Kohavi, and Toby Walker (2013). “Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data.” WSDM’13, Rome, Italy.
Fisher, Ronald A. (1932). Statistical Methods for Research Workers. Oliver and Boyd. Edinburgh, 4th ed.
Huitema, Bradley (2011). Analysis of Covariance and Alternatives: Statistical Methods for Experiments, Quasi-Experiments, and Single-Case Studies, 2nd ed. Wiley.
Lin, Winston (2013). “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique.” Annals of Applied Statistics, 7(1): 295-318.
Soriano, Jacopo (2019). “Percent Change Estimation in Large Scale Online Experiments.” https://arxiv.org/pdf/1711.00562.pdf.
Yang, Li and Anastasios A. Tsiatis. (2001). “Efficiency Study of Estimators for a Treatment Effect in a Pretest-posttest Trial.” American Statistician, 55: 314-321.
Ye, Ting, Jun Shao, Yanyao Yi, and Qingyuan Zhao (2023). “Toward Better Practice of Covariate Adjustment in Analyzing Randomized Clinical Trials.” Journal of the American Statistical Association, 118(544): 2370-2382.