Workflows
Introduction
ReSim workflows provide a powerful way to organize and manage collections of test suites for CI/CD pipelines. Instead of managing individual test suites separately, workflows allow you to group related test suites together and run them as a single unit. Where a single batch answers "how does this build perform on these experiences?", a workflow answers "how does this version of our software perform across all the test suites we care about?" — producing one batch per enabled test suite in a single operation.
Why use workflows?
ReSim is frequently used with CI/CD platforms like GitHub Actions and GitLab CI to trigger test suites on pull requests, releases, and scheduled runs (e.g., nightly tests). While you can trigger individual test suites directly, this approach has limitations:
- Configuration Management: Adding new test suites or temporarily disabling tests requires updating CI workflow files and pushing changes
- Complexity: Managing multiple test suite triggers across different CI events becomes unwieldy
- Flexibility: Making runtime changes to test configurations requires code changes
How workflows help
Workflows solve these problems by providing:
- Named Collections: Group related test suites (e.g., "nightly", "regression", "smoke tests") under meaningful names
- Runtime Configuration: Enable/disable test suites or add new ones through the ReSim UI without code changes
- Simplified CI: Trigger entire test collections with a single command
- Multi-System Testing: Test several systems (e.g., a planner and a perception stack) in one run by supplying one build per system
- Flexible Management: Update test suite configurations independently of your CI pipeline
Key concepts
| Concept | Description |
|---|---|
| Workflow | A project-scoped collection of test suites, each marked enabled or disabled. Has a name, description, and an optional CI link. |
| Workflow test suite | A membership record tying a test suite to a workflow, with an enabled flag. Disabling a suite keeps it associated with the workflow but skips it at run time. |
| Workflow run | One execution of a workflow: you supply one build per system covered by the workflow's enabled suites, and ReSim launches one batch per enabled suite, pairing each suite with the build for its system. |
Workflows build on a couple of core ReSim concepts: every test suite targets exactly one system, and every build belongs to exactly one system. (See Core concepts for the full definitions.)
Because each test suite targets a system, a workflow that contains suites for several different systems implicitly requires a build for each of those systems at run time. The API exposes this as the workflow's requiredSystems. A workflow can span multiple systems without any special configuration; the build requirement only materializes when you run it.
Example use case
Consider a nightly CI job that runs progression, regression, and smoke tests. Instead of managing three separate test suite triggers, you can:
- Create a "Nightly Tests" workflow containing all three test suites
- Configure your CI to run:
resim workflows runs create --workflow "Nightly Tests"
- Later, add new test suites or disable progression tests entirely through the ReSim UI
- Your CI pipeline remains unchanged while test configurations evolve
Workflow management
ReSim provides comprehensive CLI commands for managing workflows. You can create, update, list, and retrieve workflows, as well as manage workflow runs.
Creating workflows
Create a new workflow with the create command:
resim workflows create \
--project "my-project" \
--name "nightly-regression" \
--description "Full regression: planner + perception" \
--ci-link "https://github.com/myorg/myrepo/actions/workflows/nightly.yml" \
--suites '[
{"testSuite": "planner-regression", "enabled": true},
{"testSuite": "perception-smoke", "enabled": true},
{"testSuite": "slow-soak-tests", "enabled": false}
]'
Required Parameters:
- --project: The name or ID of the project to create the workflow in
- --name: The name of the workflow
- --description: A description of the workflow
- --suites OR --suites-file: JSON array of test suites (exactly one required)
Optional Parameters:
--ci-link: A link to the CI pipeline that triggers this workflow (e.g., a GitHub Actions URL)
Test Suite Configuration:
You can specify test suites in two ways:
- Inline JSON (using --suites):
--suites '[{"testSuite": "planner-regression", "enabled": true}, {"testSuite": "perception-smoke", "enabled": false}]'
- JSON File (using --suites-file):
--suites-file ./workflow-suites.json
The testSuite field accepts either a test suite UUID or its name. Test suite names are unique identifiers, so you can use whichever is more convenient — names are often more readable and maintainable.
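As a minimal sketch, a CI script could generate the suites file before passing it with --suites-file. The suite names are the ones used above; the temporary-directory location and the variable names are assumptions made here for illustration.

```shell
# Write a suites file for use with --suites-file.
# Assumption: the file lives in a temp directory; any readable path works.
suites_file="${TMPDIR:-/tmp}/workflow-suites.json"

cat > "$suites_file" <<'EOF'
[
  {"testSuite": "planner-regression", "enabled": true},
  {"testSuite": "perception-smoke", "enabled": false}
]
EOF

# Sanity check: count the suite entries that were written.
suite_count=$(grep -c '"testSuite"' "$suites_file")
echo "wrote $suite_count suites to $suites_file"
```

The resulting path can then be given to resim workflows create or resim workflows update via --suites-file.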
Listing workflows
List all workflows in a project:
resim workflows list --project "my-project"
This command returns a summary of each workflow including:
- Workflow ID and name
- Description
- CI workflow link (if set)
- Associated test suites with their enabled/disabled status
Getting workflow details
Retrieve detailed information about a specific workflow:
resim workflows get --project "my-project" --workflow "nightly-regression"
You can specify the workflow by either name or UUID. The output includes the workflow's suites and their enabled states.
Updating workflows
Update an existing workflow:
resim workflows update \
--project "my-project" \
--workflow "nightly-regression" \
--name "nightly-regression-v2" \
--description "Updated description" \
--ci-link "https://new-ci-link.com"
Optional Parameters:
- --name: New name for the workflow
- --description: New description
- --ci-link: New CI workflow link
- --suites OR --suites-file: Replace all test suites (see create command for format)
You must provide at least one update parameter, and the --suites and --suites-file flags are mutually exclusive.
Suite updates are declarative. When you pass --suites or --suites-file, the CLI diffs the supplied list against the workflow's current state and adds, removes, and toggles suites so that the workflow matches the full list you provided. Omitting a suite from the list removes it from the workflow.
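To make the declarative behaviour concrete, the sketch below computes which suites would be added or removed when a new list replaces the current one. It only illustrates the semantics described above and is not the CLI's implementation; suite_diff is a hypothetical helper, lidar-regression is a made-up suite name, and enabled/disabled toggles are not modelled.

```shell
# Illustration only: which suites would a declarative update add or remove?
# Usage: suite_diff "CURRENT NAMES" "DESIRED NAMES"
# Each argument is a space-separated list; names must not contain whitespace.
suite_diff() {
  cur=$(mktemp) && des=$(mktemp) || return 1
  # One name per line, sorted, as comm requires.
  printf '%s\n' $1 | sort -u > "$cur"
  printf '%s\n' $2 | sort -u > "$des"
  # comm -13 keeps lines found only in DESIRED (suites to add).
  comm -13 "$cur" "$des" | sed 's/^/add /'
  # comm -23 keeps lines found only in CURRENT (suites to remove).
  comm -23 "$cur" "$des" | sed 's/^/remove /'
  rm -f "$cur" "$des"
}

suite_diff "planner-regression perception-smoke slow-soak-tests" \
           "planner-regression perception-smoke lidar-regression"
# prints:
#   add lidar-regression
#   remove slow-soak-tests
```

Running a check like this against the output of resim workflows get before an update can catch a suite that was left out of the new list by accident.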
Workflow runs
Once you have a workflow defined, you can execute it to run all enabled test suites. A workflow run launches one batch per enabled suite, pairing each suite with the build for its system.
Creating workflow runs
resim workflows runs create --project <project> --workflow <name-or-id> [build flags...]
A run needs one build per system covered by the workflow's enabled suites. There are three ways to supply builds, exactly one of which must be used:
1. Repeatable --build (simple multi-build)
resim workflows runs create \
--project my-project \
--workflow nightly-regression \
--build 11111111-1111-1111-1111-111111111111 \
--build 22222222-2222-2222-2222-222222222222
Each --build is a build ID; the system each build covers is derived server-side from the build itself, so no system needs to be named on the command line. With this form, the run-level flags apply to every build in the run:
- --parameter <name>=<value> (repeatable): parameter overrides passed to each batch
- --pool-labels <label,...>: where to run; labels are a logical AND
- --allowable-failure-percent <0-100>: maximum percentage of tests that may hit an execution error while still computing aggregate metrics and counting the run as complete
Parameters can be specified multiple times or comma-separated:
--parameter "env=production,debug=false"
# or
--parameter "env=production" --parameter "debug=false"
2. --builds / --builds-file JSON (per-build configuration)
When different systems need different parameters, pool labels, or failure tolerances, describe each build as a JSON object:
resim workflows runs create \
--project my-project \
--workflow nightly-regression \
--builds '[
{
"buildID": "11111111-1111-1111-1111-111111111111",
"parameters": {"speed": "fast"},
"poolLabels": ["gpu", "large"],
"allowableFailurePercent": 10
},
{"buildID": "22222222-2222-2222-2222-222222222222"}
]'
(--builds-file path/to/builds.json is equivalent.)
All entry fields except buildID are optional and apply only to that build's batches. Because configuration is per-entry here, the run-level --parameter, --pool-labels, and --allowable-failure-percent flags cannot be combined with --builds/--builds-file.
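When the build IDs come from shell variables, generating the JSON can be less error-prone than splicing variables into a quoted literal. The following is a minimal sketch; builds_json is a hypothetical helper name, and it emits only the required buildID field for each entry.

```shell
# builds_json ID... prints a JSON array with one {"buildID": ...} object per argument.
# Assumption: IDs are plain UUID strings, so no JSON escaping is needed.
builds_json() {
  sep=''
  printf '['
  for id in "$@"; do
    printf '%s{"buildID": "%s"}' "$sep" "$id"
    sep=', '
  done
  printf ']\n'
}

builds_json 11111111-1111-1111-1111-111111111111 \
            22222222-2222-2222-2222-222222222222
# prints: [{"buildID": "11111111-1111-1111-1111-111111111111"}, {"buildID": "22222222-2222-2222-2222-222222222222"}]
```

The output can be passed as the value of --builds, or redirected to a file for --builds-file. Entries that need parameters, poolLabels, or allowableFailurePercent would still have to be written out by hand or with a JSON tool.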
3. --build-id (deprecated)
The legacy single-build flag still works — it is converted internally to a one-entry builds list — but prints a deprecation warning and is hidden from --help. Use --build or --builds instead, even for single-system workflows.
Validation rules
The CLI rejects, before calling the API:
- More than one build mechanism at once (e.g. --build together with --builds)
- Run-level override flags combined with --builds/--builds-file
- Duplicate build IDs, malformed UUIDs, the reserved pool label resim, and allowableFailurePercent outside 0–100
The API additionally rejects:
- Two builds resolving to the same system (HTTP 400)
- An enabled suite whose system has no matching build (HTTP 422) — the run is not partially created
If you supply a build whose system is not covered by any enabled suite, the run is still created but that build is simply not exercised; the response reports it under unusedBuilds and the CLI prints a warning. This lets CI pipelines send their full set of builds without tracking exactly which suites are currently enabled.
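A CI script that assembles build IDs from several earlier steps may want to fail fast with its own message before invoking the CLI. The sketch below mirrors two of the client-side rules listed above (malformed UUIDs and duplicate build IDs). It is a hypothetical pre-flight helper, not the CLI's own validation code, and check_build_ids is a name chosen here.

```shell
# check_build_ids ID... returns 0 if every argument looks like a UUID
# and no ID appears twice; otherwise prints a message and returns 1.
check_build_ids() {
  seen=''
  for id in "$@"; do
    # 8-4-4-4-12 hexadecimal groups
    if ! printf '%s\n' "$id" |
        grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
      echo "malformed UUID: $id" >&2
      return 1
    fi
    # Space-delimited membership test against the IDs seen so far.
    case " $seen " in
      *" $id "*) echo "duplicate build ID: $id" >&2; return 1 ;;
    esac
    seen="$seen $id"
  done
}

if check_build_ids 11111111-1111-1111-1111-111111111111 \
                   22222222-2222-2222-2222-222222222222; then
  echo "build IDs look valid"
fi
```

The server-side rules (two builds resolving to the same system, or an enabled suite with no matching build) cannot be checked this way, because they depend on data only the API has.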
Other run flags
- --account <username>: associate a CI/CD platform account with the run (otherwise inferred from CI environment variables)
- --github: machine-readable output (workflow_run_id=<uuid>) for GitHub Actions
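A script can pull the run ID out of the --github output before handing it to later steps. This sketch assumes the output contains a line of exactly the form workflow_run_id=<uuid>, as described above; extract_run_id is a hypothetical helper and the sample UUID is a placeholder.

```shell
# extract_run_id reads CLI output on stdin and prints the value of the
# workflow_run_id line, ignoring any other lines.
extract_run_id() {
  sed -n 's/^workflow_run_id=//p'
}

# Simulated CLI output; the UUID is a placeholder, not a real run.
sample_output='workflow_run_id=33333333-3333-3333-3333-333333333333'
RUN_ID=$(printf '%s\n' "$sample_output" | extract_run_id)
echo "$RUN_ID"
# prints: 33333333-3333-3333-3333-333333333333
```

Compared with cut -d= -f2, matching on the key name means unrelated lines in the output are skipped instead of being captured.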
Listing workflow runs
List all runs for a specific workflow (paginated):
resim workflows runs list --project "my-project" --workflow "nightly-regression"
Getting workflow run details
resim workflows runs get \
--project "my-project" \
--workflow "nightly-regression" \
--run-id "run-uuid-here"
This prints one entry per test suite in the run:
[
{
"testSuiteID": "aaaaaaaa-...",
"systemID": "bbbbbbbb-...",
"buildID": "11111111-...",
"batchID": "cccccccc-...",
"batchURL": "https://app.resim.ai/projects/.../batches/cccccccc-..."
}
]
- systemID is always present (copied from the test suite at run creation)
- buildID shows which build was paired with this suite, which is important for multi-system runs
- batchID/batchURL are omitted for suites that produced no batch: suites disabled at run time, or suites with no active experiences (recorded for traceability, but nothing is launched)
resim workflows runs get --slack instead emits a Slack webhook payload summarizing every batch in the run.
Supervising runs from CI
resim workflows runs supervise waits for every batch in a run to finish, rerunning failed tests, and exits with a code reflecting the worst final batch state — suitable as the gating step of a CI job:
resim workflows runs supervise \
--project my-project \
--workflow nightly-regression \
--run-id <uuid> \
--max-rerun-attempts 2 \
--rerun-max-failure-percent 50 \
--rerun-on-states Error,Blocker \
--wait-timeout 2h
All batches are supervised in parallel. Tests whose conflated status matches --rerun-on-states are rerun up to --max-rerun-attempts times (skipped if more than --rerun-max-failure-percent of jobs failed). The exit code is derived from each batch's conflated status, filtered to --fail-on-states if set, otherwise to --rerun-on-states:
| Exit code | Meaning |
|---|---|
| 0 | All batches complete (no remaining jobs in the fail filter) |
| 1 | Internal CLI error |
| 2 | ERROR (orchestration error or unresolved ERROR jobs) |
| 5 | Cancelled |
| 6 | Timed out |
| 7 | Unresolved BLOCKER jobs |
| 8 | Unresolved WARNING jobs |
Suites that produced no batch are skipped. --quiet suppresses informational logging; --poll-every controls the polling interval.
For supervising a single batch rather than a whole workflow run, see re-running batches.
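In a CI job it can help to translate the supervise exit code into a readable message before failing the step. The mapping below restates the exit code table above; describe_supervise_exit is a hypothetical helper name, and the wording of the messages is chosen here.

```shell
# describe_supervise_exit CODE prints a short description of a
# "resim workflows runs supervise" exit code, following the table above.
describe_supervise_exit() {
  case "$1" in
    0) echo "all batches complete" ;;
    1) echo "internal CLI error" ;;
    2) echo "orchestration error or unresolved ERROR jobs" ;;
    5) echo "cancelled" ;;
    6) echo "timed out" ;;
    7) echo "unresolved BLOCKER jobs" ;;
    8) echo "unresolved WARNING jobs" ;;
    *) echo "unrecognized exit code $1" ;;
  esac
}

describe_supervise_exit 7
# prints: unresolved BLOCKER jobs
```

A typical use is to run the supervise command, save $? in a variable, print the description, and then exit with the saved code so the CI job still fails or passes on the original result.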
Typical CI flow
flowchart TD
ci[CI pipeline triggers on commit] --> buildA["resim builds create (system A image)"]
ci --> buildB["resim builds create (system B image)"]
buildA --> runCreate["resim workflows runs create --build idA --build idB"]
buildB --> runCreate
runCreate --> batchA[Batch per enabled suite of system A]
runCreate --> batchB[Batch per enabled suite of system B]
batchA --> supervise["resim workflows runs supervise (reruns + exit code)"]
batchB --> supervise
supervise --> gate[CI job passes or fails on exit code]
- CI builds and pushes one image per system, then registers each with resim builds create --system ... --image ... (or --build-spec for compose-based builds), capturing the build IDs via --github.
- resim workflows runs create launches the workflow with one --build per system (or --builds JSON for per-system configuration), capturing workflow_run_id.
- resim workflows runs supervise blocks until all batches settle and gates the pipeline on the exit code.
Examples
Example 1: Creating a multi-system nightly workflow
# Create a workflow covering both the planner and the perception stack
resim workflows create \
--project "my-robot-project" \
--name "Nightly Tests" \
--description "Nightly regression across planner and perception" \
--ci-link "https://github.com/myorg/robot-repo/actions/workflows/nightly.yml" \
--suites-file ./nightly-suites.json
Where nightly-suites.json contains:
[
{"testSuite": "planner-regression", "enabled": true},
{"testSuite": "perception-smoke", "enabled": true},
{"testSuite": "slow-soak-tests", "enabled": false}
]
Because planner-regression and perception-smoke target different systems, every run of this workflow will require one build for each of those systems.
Example 2: Running and supervising a workflow in CI
# Register a build for each system, capturing the IDs
PLANNER_BUILD=$(resim builds create --project "my-robot-project" \
--system "planner" --image "$PLANNER_IMAGE" --branch "$BRANCH" \
--version "$GIT_SHA" --github | cut -d= -f2)
PERCEPTION_BUILD=$(resim builds create --project "my-robot-project" \
--system "perception" --image "$PERCEPTION_IMAGE" --branch "$BRANCH" \
--version "$GIT_SHA" --github | cut -d= -f2)
# Launch the workflow run with one build per system
RUN_ID=$(resim workflows runs create \
--project "my-robot-project" \
--workflow "Nightly Tests" \
--build "$PLANNER_BUILD" \
--build "$PERCEPTION_BUILD" \
--parameter "test_environment=staging" \
--allowable-failure-percent 10 \
--github | cut -d= -f2)
# Block until all batches finish, rerunning flaky failures, and gate CI on the result
resim workflows runs supervise \
--project "my-robot-project" \
--workflow "Nightly Tests" \
--run-id "$RUN_ID" \
--max-rerun-attempts 2 \
--rerun-max-failure-percent 50 \
--rerun-on-states "Error,Blocker" \
--wait-timeout 2h
Example 3: Per-build configuration with --builds
# Perception batches need GPU pools and tolerate some flakiness;
# the planner runs with defaults
resim workflows runs create \
--project "my-robot-project" \
--workflow "Nightly Tests" \
--builds '[
{
"buildID": "'"$PERCEPTION_BUILD"'",
"poolLabels": ["gpu", "large"],
"allowableFailurePercent": 10
},
{"buildID": "'"$PLANNER_BUILD"'"}
]'
Example 4: Updating workflow configuration
# Drop the slow-soak-tests suite and update the description. The suite list is
# declarative: any suite omitted here is removed from the workflow.
resim workflows update \
--project "my-robot-project" \
--workflow "Nightly Tests" \
--description "Nightly regression and smoke tests (performance tests moved to weekly)" \
--suites '[{"testSuite": "planner-regression", "enabled": true}, {"testSuite": "perception-smoke", "enabled": true}]'
API notes
For integrations that call the API directly rather than through the CLI:
- createWorkflowRunInput accepts either the deprecated top-level buildID (with run-level parameters/poolLabels/allowableFailurePercent) or the preferred builds array of {buildID, parameters, poolLabels, allowableFailurePercent}, but never both (HTTP 400). New integrations should always use builds, even for single-system workflows.
- With builds, the deprecated run-level parameters/poolLabels/allowableFailurePercent must be omitted; configure each entry instead.
- The workflowRun response carries workflowRunTestSuites[] with per-suite systemID, buildID, and batchID, plus unusedBuilds[] for supplied-but-unneeded builds. The top-level workflowRun.buildID is a deprecated convenience echo populated only when every enabled suite used the same build.
- The workflow resource exposes requiredSystems[] (the systems covered by its enabled suites), which callers can use to determine which builds a run will need.
Best practices
Workflow organization
- Use Descriptive Names: Choose workflow names that clearly indicate their purpose (e.g., "Nightly Regression", "PR Smoke Tests", "Release Validation")
- Group Related Tests: Organize test suites logically by functionality, environment, or execution frequency — a single workflow can span multiple systems
- Document with Descriptions: Provide clear descriptions explaining what each workflow tests and when it should be used
CI/CD integration
- Link CI Workflows: Use the --ci-link parameter to connect ReSim workflows with your CI pipeline for easy navigation
- Send All Your Builds: It is safe for CI to pass every build it produces; builds whose system is not covered by any enabled suite are reported as unused rather than causing an error
- Gate on supervise: Use resim workflows runs supervise as the final step of a CI job so the pipeline passes or fails based on test outcomes
- Use Environment Variables: Leverage CI environment variables for dynamic parameters like build IDs and test environments
- Set Appropriate Failure Thresholds: Use --allowable-failure-percent to tolerate a bounded share of tests hitting execution errors in non-critical workflows
Test suite management
- Start with Core Tests: Begin with essential test suites enabled, then add optional tests as needed
- Use Test Suite Names: Prefer test suite names over UUIDs for better readability and maintainability
- Use JSON Files: For complex configurations, use --suites-file instead of inline JSON for better maintainability
- Remember Updates Are Declarative: When updating suites, supply the complete desired list; omitted suites are removed
- Regular Review: Periodically review and update workflow configurations to ensure they remain relevant
Parameter management
- Consistent Naming: Use consistent parameter naming conventions across your workflows
- Per-Build Configuration: When systems need different parameters or pool labels, use --builds JSON instead of run-level flags
- Environment-Specific Values: Use parameters to customize test behavior for different environments
- Document Parameters: Keep track of what parameters each workflow expects and their valid values
Monitoring and maintenance
- Track Workflow Runs: Regularly review workflow run results to identify patterns and issues
- Update Descriptions: Keep workflow descriptions current as test suites evolve
- Archive Unused Workflows: Remove or archive workflows that are no longer needed to keep your project organized