Datasets
Overview
This topic explains how to create, manage, and use datasets for offline evaluations. Datasets define the inputs AgentControl uses to evaluate model behavior before release.
Offline evaluations run config variations or LLM inputs against uploaded datasets, then score the resulting outputs using criteria such as built-in scorers or judges defined as AgentControl configs. Datasets support repeatable evaluation workflows: you can reuse the same dataset across multiple evaluation runs to compare variations and validate changes before rollout.
Prepare your dataset
A dataset is a file in CSV or JSONL format. Each row represents a single evaluation task that LaunchDarkly evaluates independently during a run.
Each row can include the following fields:
- input: The prompt or request sent to the model.
- expected_output: The ideal or target output for the associated input. After you complete an evaluation, you can use this field to compare what you expected against what the model actually returns.
- context: Supporting information provided alongside the input, such as retrieved documents or tool responses.
- variables: Named values that populate placeholders in your config prompt templates at runtime.
LaunchDarkly generates one model output per row and evaluates it using the criteria you configure.
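To make the role of the variables field concrete, the following Python sketch fills placeholders in a prompt template from one row's variables. It is an illustration only: the `{{name}}` placeholder syntax, the `render_prompt` helper, and the sample values are assumptions, not the product's implementation.

```python
import re


def render_prompt(template: str, variables: dict) -> str:
    """Fill {{name}} placeholders from a row's variables.

    The double-brace syntax is an assumption made for this sketch.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Fail loudly so a missing variable is caught before an evaluation run.
            raise KeyError(f"no value for placeholder {name!r}")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical template and variables for a single dataset row.
template = "You are a support agent for {{product_name}}. Answer in {{language}}."
rendered = render_prompt(template, {"product_name": "Example App", "language": "English"})
print(rendered)
```

Raising an error for a missing variable, rather than leaving the placeholder in place, is a design choice in this sketch; it surfaces incomplete rows early.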
Example dataset row
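As an illustration, the following Python sketch builds one row and serializes it as a single JSONL line. The field names come from this topic; the sample values, and the choice to represent context as a list of strings and variables as an object, are assumptions rather than a documented schema.

```python
import json

# Illustrative row only: the field names come from this topic, but the values
# and the shapes of "context" and "variables" are assumptions.
row = {
    "input": "What is the refund window for annual plans?",
    "expected_output": "Annual plans can be refunded within 30 days of purchase.",
    "context": ["Refund policy: annual plans are refundable within 30 days."],
    "variables": {"product_name": "Example App"},
}

# JSONL stores one JSON object per line. json.dumps emits no raw newlines
# by default, so the result is safe to write as a single line.
line = json.dumps(row, ensure_ascii=False)
print(line)
```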
Use this structure to compare generated outputs against expected results for known scenarios.
Upload datasets
To use a dataset in an offline evaluation, upload a CSV or JSONL file.
- Navigate to the Library page.
- Select the Datasets tab.
- Click Upload dataset. The “Upload dataset” modal opens.
- (Optional) Enter a Name for the dataset. If you don’t specify a name, the dataset will use the same name as the file you upload.
- Drag and drop or click to select your dataset file.
- Click Save dataset.
After you upload a dataset, LaunchDarkly validates and processes the file for use in evaluation runs. This includes validating the file format, detecting the dataset schema, and enforcing row and size limits. LaunchDarkly also computes a dataset hash for deduplication and stores dataset metadata.
When validation completes, the dataset “Status” field in the Datasets tab updates to “ready.” The dataset is now available for evaluation runs.
If validation fails, an error appears. Correct the problem in the file, then upload the dataset again.
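You can catch some problems before uploading by running similar checks locally. The following Python sketch mirrors the steps described above for a JSONL file: it checks that each line parses as a JSON object, collects the field names as a rough schema, enforces a row limit, and computes a content hash. The `precheck_jsonl` helper, the `max_rows` default, and the use of SHA-256 are assumptions; this topic does not state the actual limits or hash algorithm.

```python
import hashlib
import json


def precheck_jsonl(data: bytes, max_rows: int = 1000) -> dict:
    """Run local sanity checks on JSONL bytes before uploading.

    max_rows is a placeholder; the real row and size limits are not stated here.
    """
    lines = [ln for ln in data.decode("utf-8").splitlines() if ln.strip()]
    if not lines:
        raise ValueError("dataset has no rows")
    if len(lines) > max_rows:
        raise ValueError(f"{len(lines)} rows exceeds the assumed limit of {max_rows}")

    fields = set()
    for number, ln in enumerate(lines, start=1):
        try:
            row = json.loads(ln)
        except json.JSONDecodeError as err:
            raise ValueError(f"row {number} is not valid JSON: {err}") from err
        if not isinstance(row, dict):
            raise ValueError(f"row {number} is not a JSON object")
        fields |= set(row)  # The union of keys acts as a rough detected schema.

    return {
        "rows": len(lines),
        "fields": sorted(fields),
        # SHA-256 is an assumption; any stable content hash can flag duplicate files.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


sample = b'{"input": "Hi", "expected_output": "Hello"}\n{"input": "Bye"}\n'
report = precheck_jsonl(sample)
print(report["rows"], report["fields"])
```

Passing this check does not guarantee that the upload succeeds, because the server-side validation may apply rules this sketch does not know about.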
Datasets in evaluations
When you configure an offline evaluation, you select a dataset to use as input. To view evaluations that have used a dataset:
- Navigate to the Library page and select the Datasets tab.
- Find the dataset for which you wish to view more information.
- Click the three-dot overflow menu and choose View evaluations. The “Evaluations” page opens.
Manage datasets
After you upload a dataset, it appears in the AgentControl library. You can now use it in evaluation runs.
As you use the dataset, the “# of Evaluations” column in the Datasets tab updates to show how many evaluations have used that dataset.
Delete datasets
You can delete a dataset if you no longer need it. Here’s how:
- Navigate to the Library page and select the Datasets tab.
- Find the dataset you want to delete.
- Click the three-dot overflow menu and choose Delete. A confirmation message appears.
- Verify that you wish to delete the dataset and click Delete dataset.