Workflows

Introduction

ReSim workflows organize collections of test suites for CI/CD pipelines. Instead of triggering individual test suites separately, you group related test suites into a workflow and run them as a single unit. Where a single batch answers "how does this build perform on these experiences?", a workflow answers "how does this version of our software perform across all the test suites we care about?" and produces one batch per enabled test suite in a single operation.

Why use workflows?

ReSim is frequently used with CI/CD platforms like GitHub Actions and GitLab CI to trigger test suites on pull requests, releases, and scheduled runs (e.g., nightly tests). While you can trigger individual test suites directly, this approach has limitations:

Configuration Management: Adding new test suites or temporarily disabling tests requires updating CI workflow files and pushing changes
Complexity: Managing multiple test suite triggers across different CI events becomes unwieldy
Flexibility: Making runtime changes to test configurations requires code changes

How workflows help

Workflows solve these problems by providing:

Named Collections: Group related test suites (e.g., "nightly", "regression", "smoke tests") under meaningful names
Runtime Configuration: Enable/disable test suites or add new ones through the ReSim UI without code changes
Simplified CI: Trigger entire test collections with a single command
Multi-System Testing: Test several systems (e.g., a planner and a perception stack) in one run by supplying one build per system
Flexible Management: Update test suite configurations independently of your CI pipeline

Key concepts

Concept	Description
Workflow	A project-scoped collection of test suites, each marked enabled or disabled. Has a name, description, and an optional CI link.
Workflow test suite	A membership record tying a test suite to a workflow, with an `enabled` flag. Disabling a suite keeps it associated with the workflow but skips it at run time.
Workflow run	One execution of a workflow: you supply one build per system covered by the workflow's enabled suites, and ReSim launches one batch per enabled suite, pairing each suite with the build for its system.

Workflows build on a couple of core ReSim concepts: every test suite targets exactly one system, and every build belongs to exactly one system. (See Core concepts for the full definitions.)

Because each test suite targets a system, a workflow that contains suites for several different systems implicitly requires a build for each of those systems at run time. The API exposes this as the workflow's requiredSystems. A workflow can span multiple systems without any special configuration; the build requirement only materializes when you run it.
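To make the rule concrete, the following sketch derives the set of systems that need a build from a list of suites. The suites.txt file, its three-column layout, and the system assigned to slow-soak-tests are hypothetical and exist only for this illustration; ReSim computes requiredSystems for you.

```shell
# Hypothetical mapping, one suite per line: "<suite> <system> <enabled>".
cat > suites.txt <<'EOF'
planner-regression planner true
perception-smoke perception true
slow-soak-tests planner false
EOF

# Keep the systems of enabled suites and de-duplicate them:
# a run needs exactly one build for each system in this list.
required=$(awk '$3 == "true" { print $2 }' suites.txt | sort -u)
echo "$required"
```

Disabled suites do not contribute to the list, which is why enabling or disabling a suite can change the builds a run needs.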

Example use case

Consider a nightly CI job that runs progression, regression, and smoke tests. Instead of managing three separate test suite triggers, you can:

Create a "Nightly Tests" workflow containing all three test suites
Configure your CI to run: resim workflows runs create --project <project> --workflow "Nightly Tests" --build <build-id> (one --build per system)
Later, add new test suites or disable progression tests entirely through the ReSim UI
Your CI pipeline remains unchanged while test configurations evolve

Workflow management

The resim workflows command group covers the full lifecycle: create, list, get, and update manage workflows themselves, and the runs subcommands (create, list, get, supervise) manage workflow runs.

Creating workflows

Create a new workflow with the create command:

Bash

resim workflows create \
  --project "my-project" \
  --name "nightly-regression" \
  --description "Full regression: planner + perception" \
  --ci-link "https://github.com/myorg/myrepo/actions/workflows/nightly.yml" \
  --suites '[
    {"testSuite": "planner-regression", "enabled": true},
    {"testSuite": "perception-smoke", "enabled": true},
    {"testSuite": "slow-soak-tests", "enabled": false}
  ]'

Required Parameters:

--project: The name or ID of the project to create the workflow in
--name: The name of the workflow
--description: A description of the workflow
--suites OR --suites-file: JSON array of test suites (exactly one required)

Optional Parameters:

--ci-link: A link to the CI pipeline that triggers this workflow (e.g., a GitHub Actions URL)

Test Suite Configuration:

You can specify test suites in two ways:

Inline JSON (using --suites):

Bash

--suites '[{"testSuite": "planner-regression", "enabled": true}, {"testSuite": "perception-smoke", "enabled": false}]'

JSON File (using --suites-file):

Bash

--suites-file ./workflow-suites.json

The testSuite field accepts either a test suite UUID or its name. Test suite names are unique identifiers, so you can use whichever is more convenient — names are often more readable and maintainable.

Listing workflows

List all workflows in a project:

Bash

resim workflows list --project "my-project"

This command returns a summary of each workflow including:

Workflow ID and name
Description
CI workflow link (if set)
Associated test suites with their enabled/disabled status

Getting workflow details

Retrieve detailed information about a specific workflow:

Bash

resim workflows get --project "my-project" --workflow "nightly-regression"

You can specify the workflow by either name or UUID. The output includes the workflow's suites and their enabled states.

Updating workflows

Update an existing workflow:

Bash

resim workflows update \
  --project "my-project" \
  --workflow "nightly-regression" \
  --name "nightly-regression-v2" \
  --description "Updated description" \
  --ci-link "https://new-ci-link.com"

Optional Parameters:

--name: New name for the workflow
--description: New description
--ci-link: New CI workflow link
--suites OR --suites-file: Replace all test suites (see create command for format)

You must provide at least one update parameter, and the --suites and --suites-file flags are mutually exclusive.

Suite updates are declarative. When you pass --suites or --suites-file, the CLI diffs the supplied list against the workflow's current state and adds, removes, and toggles suites so that the workflow matches the full list you provided. Omitting a suite from the list removes it from the workflow.
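Before sending a declarative update, it can help to preview which suites the new list would drop. The sketch below compares two plain lists of suite names with comm. The file names and the lidar-smoke suite are hypothetical, and the current list is hard-coded here; in practice you would take it from the output of resim workflows get. This only imitates the effect of the diff and is not how the CLI implements it.

```shell
# Hypothetical suite-name lists, one name per line. comm requires sorted input.
printf '%s\n' planner-regression perception-smoke slow-soak-tests | sort > current.txt
printf '%s\n' planner-regression perception-smoke lidar-smoke | sort > desired.txt

# -23 keeps lines found only in current.txt (suites the update would remove).
removed=$(comm -23 current.txt desired.txt)
# -13 keeps lines found only in desired.txt (suites the update would add).
added=$(comm -13 current.txt desired.txt)

echo "would be removed: $removed"
echo "would be added:   $added"
```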

Workflow runs

Once you have a workflow defined, you can execute it to run all enabled test suites. A workflow run launches one batch per enabled suite, pairing each suite with the build for its system.

Creating workflow runs

Bash

resim workflows runs create --project <project> --workflow <name-or-id> [build flags...]

A run needs one build per system covered by the workflow's enabled suites. There are three ways to supply builds, exactly one of which must be used:

1. Repeatable `--build` (simple multi-build)

Bash

resim workflows runs create \
  --project my-project \
  --workflow nightly-regression \
  --build 11111111-1111-1111-1111-111111111111 \
  --build 22222222-2222-2222-2222-222222222222

Each --build takes a build ID; the system each build covers is derived server-side from the build itself, so no system needs to be named on the command line. With this form, the run-level flags apply to every build in the run:

--parameter <name>=<value> (repeatable): parameter overrides passed to each batch
--pool-labels <label,...>: where to run; labels are a logical AND
--allowable-failure-percent <0-100>: maximum percentage of tests that may hit an execution error while still computing aggregate metrics and counting the run as complete

Parameters can be specified multiple times or comma-separated:

Bash

--parameter "env=production,debug=false"
# or
--parameter "env=production" --parameter "debug=false"
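When overrides live in a file, the repeated form is easy to generate. This sketch, assuming a hypothetical params.txt with one name=value pair per line, collects the flags in the shell's positional parameters so values are passed through without re-splitting. The final echo only prints the command instead of running it.

```shell
# Hypothetical overrides file, one name=value pair per line.
printf '%s\n' 'env=production' 'debug=false' > params.txt

# POSIX sh has no arrays, so accumulate the flags in "$@".
set --
while IFS= read -r kv; do
  if [ -n "$kv" ]; then
    set -- "$@" --parameter "$kv"
  fi
done < params.txt

# Dry run: show the flags that would be appended to the real command.
echo "resim workflows runs create ... $*"
```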

2. `--builds` / `--builds-file` JSON (per-build configuration)

When different systems need different parameters, pool labels, or failure tolerances, describe each build as a JSON object:

Bash

resim workflows runs create \
  --project my-project \
  --workflow nightly-regression \
  --builds '[
    {
      "buildID": "11111111-1111-1111-1111-111111111111",
      "parameters": {"speed": "fast"},
      "poolLabels": ["gpu", "large"],
      "allowableFailurePercent": 10
    },
    {"buildID": "22222222-2222-2222-2222-222222222222"}
  ]'

(--builds-file path/to/builds.json is equivalent.)

All entry fields except buildID are optional and apply only to that build's batches. Because configuration is per-entry here, the run-level --parameter, --pool-labels, and --allowable-failure-percent flags cannot be combined with --builds/--builds-file.
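In CI the build IDs usually arrive in shell variables, and splicing them into inline JSON takes awkward quoting. One option, sketched here with placeholder IDs and an assumed file name builds.json, is to write the file with a here-document and pass it with --builds-file.

```shell
# Placeholder IDs; in CI these would come from `resim builds create`.
PERCEPTION_BUILD="11111111-1111-1111-1111-111111111111"
PLANNER_BUILD="22222222-2222-2222-2222-222222222222"

# An unquoted heredoc delimiter lets the shell expand the variables
# inside the JSON without nested quotes.
cat > builds.json <<EOF
[
  {
    "buildID": "${PERCEPTION_BUILD}",
    "poolLabels": ["gpu", "large"],
    "allowableFailurePercent": 10
  },
  {"buildID": "${PLANNER_BUILD}"}
]
EOF

cat builds.json
# Then: resim workflows runs create --project my-project \
#         --workflow nightly-regression --builds-file builds.json
```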

3. `--build-id` (deprecated)

The legacy single-build flag still works — it is converted internally to a one-entry builds list — but prints a deprecation warning and is hidden from --help. Use --build or --builds instead, even for single-system workflows.

Validation rules

The CLI rejects the following before calling the API:

More than one build mechanism at once (e.g. --build together with --builds)
Run-level override flags combined with --builds/--builds-file
Duplicate build IDs, malformed UUIDs, the reserved pool label resim, and allowableFailurePercent outside 0–100

The API additionally rejects:

Two builds resolving to the same system (HTTP 400)
An enabled suite whose system has no matching build (HTTP 422) — the run is not partially created

If you supply a build whose system is not covered by any enabled suite, the run is still created but that build is simply not exercised; the response reports it under unusedBuilds and the CLI prints a warning. This lets CI pipelines send their full set of builds without tracking exactly which suites are currently enabled.
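If a pipeline assembles build IDs from several jobs, a local pre-check can catch two of the client-side errors listed above (malformed UUIDs and duplicate build IDs) before the CLI is invoked. The helper names is_uuid and check_builds are hypothetical, and this is a sketch of the same rules rather than the CLI's own validation.

```shell
# Succeeds if the argument looks like an 8-4-4-4-12 hexadecimal UUID.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Succeeds only if every build ID is well-formed and none is repeated.
check_builds() {
  for id in "$@"; do
    is_uuid "$id" || { echo "malformed UUID: $id" >&2; return 1; }
  done
  # uniq -d prints one copy of each line that occurs more than once.
  dupes=$(printf '%s\n' "$@" | sort | uniq -d)
  [ -z "$dupes" ] || { echo "duplicate build ID: $dupes" >&2; return 1; }
}

check_builds \
  11111111-1111-1111-1111-111111111111 \
  22222222-2222-2222-2222-222222222222 && echo "build IDs look valid"
```

The server-side rules (two builds resolving to the same system, or an enabled suite without a build) cannot be checked this way because they depend on data only the API has.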

Other run flags

--account <username>: associate a CI/CD platform account with the run (otherwise inferred from CI environment variables)
--github: machine-readable output (workflow_run_id=<uuid>) for GitHub Actions
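The --github output is a single key=value line, so the ID can be extracted with plain parameter expansion. The sketch below uses a simulated output line with a placeholder UUID. Appending the line to the file named by GITHUB_OUTPUT is standard GitHub Actions behavior for exposing step outputs; the guard lets the snippet run outside Actions as well.

```shell
# Simulated output of `resim workflows runs create ... --github`.
line="workflow_run_id=33333333-3333-3333-3333-333333333333"

# Strip everything up to and including the first "=".
run_id=${line#*=}
echo "$run_id"

# Inside GitHub Actions, key=value lines appended to $GITHUB_OUTPUT
# become outputs of the current step.
if [ -n "${GITHUB_OUTPUT:-}" ]; then
  echo "$line" >> "$GITHUB_OUTPUT"
fi
```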

Listing workflow runs

List all runs for a specific workflow (paginated):

Bash

resim workflows runs list --project "my-project" --workflow "nightly-regression"

Getting workflow run details

Bash

resim workflows runs get \
  --project "my-project" \
  --workflow "nightly-regression" \
  --run-id "run-uuid-here"

This prints one entry per test suite in the run:

JSON

[
  {
    "testSuiteID": "aaaaaaaa-...",
    "systemID": "bbbbbbbb-...",
    "buildID": "11111111-...",
    "batchID": "cccccccc-...",
    "batchURL": "https://app.resim.ai/projects/.../batches/cccccccc-..."
  }
]

systemID is always present (copied from the test suite at run creation)
buildID shows which build was paired with this suite — important for multi-system runs
batchID/batchURL are omitted for suites that produced no batch: suites disabled at run time, or suites with no active experiences (recorded for traceability, but nothing is launched)

resim workflows runs get --slack instead emits a Slack webhook payload summarizing every batch in the run.

Supervising runs from CI

resim workflows runs supervise waits for every batch in a run to finish, rerunning failed tests, and exits with a code reflecting the worst final batch state — suitable as the gating step of a CI job:

Bash

resim workflows runs supervise \
  --project my-project \
  --workflow nightly-regression \
  --run-id <uuid> \
  --max-rerun-attempts 2 \
  --rerun-max-failure-percent 50 \
  --rerun-on-states Error,Blocker \
  --wait-timeout 2h

All batches are supervised in parallel. Tests whose conflated status matches --rerun-on-states are rerun up to --max-rerun-attempts times (skipped if more than --rerun-max-failure-percent of jobs failed). The exit code is derived from each batch's conflated status, filtered to --fail-on-states if set, otherwise to --rerun-on-states:

Exit code	Meaning
0	All batches complete (no remaining jobs in the fail filter)
1	Internal CLI error
2	ERROR (orchestration error or unresolved ERROR jobs)
5	Cancelled
6	Timed out
7	Unresolved BLOCKER jobs
8	Unresolved WARNING jobs

Suites that produced no batch are skipped. --quiet suppresses informational logging; --poll-every controls the polling interval.
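In a CI script it is often useful to turn the exit code into a readable message before failing the job. This sketch maps the codes from the table above; the function name describe_exit is hypothetical, and the code variable is set by hand so the snippet runs without the CLI.

```shell
# Translate a supervise exit code into a short message.
describe_exit() {
  case "$1" in
    0) echo "all batches complete" ;;
    1) echo "internal CLI error" ;;
    2) echo "error (orchestration error or unresolved ERROR jobs)" ;;
    5) echo "cancelled" ;;
    6) echo "timed out" ;;
    7) echo "unresolved BLOCKER jobs" ;;
    8) echo "unresolved WARNING jobs" ;;
    *) echo "unrecognized exit code $1" ;;
  esac
}

# Typical use right after the supervise command:
#   resim workflows runs supervise ... ; code=$?
code=7   # stand-in value for illustration
echo "supervise result: $(describe_exit "$code")"
```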

For supervising a single batch rather than a whole workflow run, see re-running batches.

Typical CI flow

flowchart TD
    ci[CI pipeline triggers on commit] --> buildA["resim builds create (system A image)"]
    ci --> buildB["resim builds create (system B image)"]
    buildA --> runCreate["resim workflows runs create --build idA --build idB"]
    buildB --> runCreate
    runCreate --> batchA[Batch per enabled suite of system A]
    runCreate --> batchB[Batch per enabled suite of system B]
    batchA --> supervise["resim workflows runs supervise (reruns + exit code)"]
    batchB --> supervise
    supervise --> gate[CI job passes or fails on exit code]

CI builds and pushes one image per system, then registers each with resim builds create --system ... --image ... (or --build-spec for compose-based builds), capturing the build IDs via --github.
resim workflows runs create launches the workflow with one --build per system (or --builds JSON for per-system configuration), capturing workflow_run_id.
resim workflows runs supervise blocks until all batches settle and gates the pipeline on the exit code.

Examples

Example 1: Creating a multi-system nightly workflow

Bash

# Create a workflow covering both the planner and the perception stack
resim workflows create \
  --project "my-robot-project" \
  --name "Nightly Tests" \
  --description "Nightly regression across planner and perception" \
  --ci-link "https://github.com/myorg/robot-repo/actions/workflows/nightly.yml" \
  --suites-file ./nightly-suites.json

Where nightly-suites.json contains:

JSON

[
  {"testSuite": "planner-regression", "enabled": true},
  {"testSuite": "perception-smoke", "enabled": true},
  {"testSuite": "slow-soak-tests", "enabled": false}
]

Because planner-regression and perception-smoke target different systems, every run of this workflow will require one build for each of those systems.

Example 2: Running and supervising a workflow in CI

Bash

# Register a build for each system, capturing the IDs
PLANNER_BUILD=$(resim builds create --project "my-robot-project" \
  --system "planner" --image "$PLANNER_IMAGE" --branch "$BRANCH" \
  --version "$GIT_SHA" --github | cut -d= -f2)
PERCEPTION_BUILD=$(resim builds create --project "my-robot-project" \
  --system "perception" --image "$PERCEPTION_IMAGE" --branch "$BRANCH" \
  --version "$GIT_SHA" --github | cut -d= -f2)

# Launch the workflow run with one build per system
RUN_ID=$(resim workflows runs create \
  --project "my-robot-project" \
  --workflow "Nightly Tests" \
  --build "$PLANNER_BUILD" \
  --build "$PERCEPTION_BUILD" \
  --parameter "test_environment=staging" \
  --allowable-failure-percent 10 \
  --github | cut -d= -f2)

# Block until all batches finish, rerunning flaky failures, and gate CI on the result
resim workflows runs supervise \
  --project "my-robot-project" \
  --workflow "Nightly Tests" \
  --run-id "$RUN_ID" \
  --max-rerun-attempts 2 \
  --rerun-max-failure-percent 50 \
  --rerun-on-states "Error,Blocker" \
  --wait-timeout 2h

Example 3: Per-build configuration with `--builds`

Bash

# Perception batches need GPU pools and tolerate some flakiness;
# the planner runs with defaults. Assumes PERCEPTION_BUILD and
# PLANNER_BUILD hold build IDs (see Example 2).
resim workflows runs create \
  --project "my-robot-project" \
  --workflow "Nightly Tests" \
  --builds '[
    {
      "buildID": "'"$PERCEPTION_BUILD"'",
      "poolLabels": ["gpu", "large"],
      "allowableFailurePercent": 10
    },
    {"buildID": "'"$PLANNER_BUILD"'"}
  ]'

Example 4: Updating workflow configuration

Bash

# Remove the slow-soak-tests suite and update the description. The suite
# list is declarative: any suite omitted here is removed from the workflow.
resim workflows update \
  --project "my-robot-project" \
  --workflow "Nightly Tests" \
  --description "Nightly regression and smoke tests (soak tests moved to weekly)" \
  --suites '[{"testSuite": "planner-regression", "enabled": true}, {"testSuite": "perception-smoke", "enabled": true}]'

API notes

For integrations that call the API directly rather than through the CLI:

createWorkflowRunInput accepts either the deprecated top-level buildID (with run-level parameters/poolLabels/allowableFailurePercent) or the preferred builds array of {buildID, parameters, poolLabels, allowableFailurePercent} — never both (HTTP 400). New integrations should always use builds, even for single-system workflows.
With builds, the deprecated run-level parameters/poolLabels/allowableFailurePercent must be omitted; configure each entry instead.
The workflowRun response carries workflowRunTestSuites[] with per-suite systemID, buildID, and batchID, plus unusedBuilds[] for supplied-but-unneeded builds. The top-level workflowRun.buildID is a deprecated convenience echo populated only when every enabled suite used the same build.
The workflow resource exposes requiredSystems[] — the systems covered by its enabled suites — which callers can use to determine which builds a run will need.

Best practices

Workflow organization

Use Descriptive Names: Choose workflow names that clearly indicate their purpose (e.g., "Nightly Regression", "PR Smoke Tests", "Release Validation")
Group Related Tests: Organize test suites logically by functionality, environment, or execution frequency — a single workflow can span multiple systems
Document with Descriptions: Provide clear descriptions explaining what each workflow tests and when it should be used

CI/CD integration

Link CI Workflows: Use the --ci-link parameter to connect ReSim workflows with your CI pipeline for easy navigation
Send All Your Builds: It is safe for CI to pass every build it produces — builds whose system is not covered by any enabled suite are reported as unused rather than causing an error
Gate on supervise: Use resim workflows runs supervise as the final step of a CI job so the pipeline passes or fails based on test outcomes
Use Environment Variables: Leverage CI environment variables for dynamic parameters like build IDs and test environments
Set Appropriate Failure Thresholds: Use --allowable-failure-percent to tolerate a bounded share of tests hitting execution errors in non-critical workflows

Test suite management

Start with Core Tests: Begin with essential test suites enabled, then add optional tests as needed
Use Test Suite Names: Prefer test suite names over UUIDs for better readability and maintainability
Use JSON Files: For complex configurations, use --suites-file instead of inline JSON for better maintainability
Remember Updates Are Declarative: When updating suites, supply the complete desired list — omitted suites are removed
Regular Review: Periodically review and update workflow configurations to ensure they remain relevant

Parameter management

Consistent Naming: Use consistent parameter naming conventions across your workflows
Per-Build Configuration: When systems need different parameters or pool labels, use --builds JSON instead of run-level flags
Environment-Specific Values: Use parameters to customize test behavior for different environments
Document Parameters: Keep track of what parameters each workflow expects and their valid values

Monitoring and maintenance

Track Workflow Runs: Regularly review workflow run results to identify patterns and issues
Update Descriptions: Keep workflow descriptions current as test suites evolve
Archive Unused Workflows: Remove or archive workflows that are no longer needed to keep your project organized
