
Introduction

How ReSim Works in a Nutshell

ReSim is a web app for scheduling, running, and extracting insights from software integration tests run at scale in the cloud, and from hardware-in-the-loop tests run on premises. The core of our testing interface is the Docker image, so any integration test that can be bundled into one or more Docker images can be run in ReSim. Therefore, Isaac Sim, Gazebo, and even custom in-house simulators can readily be used for integration testing in ReSim. Often, such simulation tests are run over a large set of inputs (a.k.a. “scenarios”), which ReSim calls “experiences”. Typically, these are stored in a customer-managed S3 bucket, and each is exposed to the test container via a volume mount when it’s run. Given this testing interface, the idiomatic ReSim recipe follows a relatively logical progression:

  • Create a Docker image that contains your test. Its entrypoint should run your test based on the inputs in /tmp/resim/inputs. Push it to a registry like AWS ECR or Google Artifact Registry. This registry can be private, but ReSim must have access in order to pull your test and run it. Use ReSim’s CLI to inform our app about your image; in ReSim jargon, this image is called a “build”. Multiple images that work in tandem can also be used if your application needs them.

  • Put the inputs for your test in an S3 bucket. Again, this can be private, but ReSim needs access to pull the inputs when it’s time to run a test. Use ReSim’s CLI to inform our app about the S3 prefix of your inputs. In the ReSim database, this information, along with other metadata, is called an “experience”.

  • In the ReSim app, the ReSim CLI, or an automated CI workflow, select a set of such experiences (i.e. S3 prefixes) to run and a build (i.e. Docker image) to run them with. ReSim then takes care of marshaling the needed resources, scheduling and running the tests, collecting their results, and informing the user via multiple avenues (e.g. Slack, GitHub, GitLab) when they are complete. Test logs can be downloaded through the UI or CLI.
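As a concrete illustration of the first step, a minimal test image might look like the following Dockerfile sketch. Only the /tmp/resim/inputs mount point comes from the recipe above; the base image, entrypoint script name, and installed dependencies are assumptions for illustration and should be adapted to your stack.

```dockerfile
# Illustrative sketch of a ReSim test image; adapt to your own stack.
FROM ubuntu:22.04

# Install whatever your test needs (simulator, robot stack, tools, etc.).
# The entrypoint script name here is hypothetical.
COPY run_test.sh /usr/local/bin/run_test.sh
RUN chmod +x /usr/local/bin/run_test.sh

# ReSim volume-mounts the experience's files at /tmp/resim/inputs when the
# container runs, so the entrypoint should read its scenario from there.
ENTRYPOINT ["/usr/local/bin/run_test.sh"]
```

After a `docker build` and `docker push` to a registry ReSim can access (e.g. AWS ECR or Google Artifact Registry), register the image as a build with the ReSim CLI; see the CLI reference for the exact command and flags.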

In practice, many variants of this recipe exist (e.g. the experiences don’t have to live in S3, or they can have multiple S3 prefixes): the above sequence represents only the most basic configuration. ReSim also includes many features that build on this workflow, including customizable metrics (using a very similar approach called a “metrics build”), parameter sweeps, hardware-in-the-loop testing, durable test suites (collections of experiences and custom metrics), longitudinal reporting, and more. The above recipe is the core concept upon which these features are built, and such features are generally designed to make the recipe useful for more varied robotics workflows. For instance, parameter sweeps allow you to run a build on a set of experiences with multiple parameter sets, so a robotics engineer can experiment and find the best possible parameters for their robot’s stack. The documentation below dives into the details of setting up your builds and experiences and running your first ReSim tests, but as long as you keep the above recipe in mind, the logic behind it should generally be clear.

ReSim Results Dashboard

Why use simulation for CI?

Developing autonomous systems is hard. Because we work with complex hardware and software systems, it is very difficult to know the effects of any particular change to the robot's software. Really, we are working with systems of systems, where a change to one subsystem can have knock-on effects on the systems that depend on it. The easiest solution to this problem is to try out each proposed change on the physical robot to see whether it improves or degrades the robot's performance. As performance improves, this can also be a very high-fidelity way of verifying that the robot meets the requirements associated with its goals and environment. After all, we are testing each version of the software with the actual hardware. Unfortunately, there are a few reasons why real-world testing alone does not work well for many applications:

  • Complex Environments: For many real-world robotics applications like self-driving or drone delivery, the robot is required to operate in an unstructured environment where unexpected agents and conditions may be encountered at any time. This imposes so many requirements on the robot that it becomes impossible to check them all on the real robot, either because:
    • Certain requirements can only be demonstrated in rare events which cannot be reliably encountered for testing purposes.
    • It takes too long to sequentially test the combinatorially large set of requirements on a single robot, and it's often too expensive to buy enough robots to test the requirements in parallel.
  • Safety: For certain applications (e.g. crewed spaceflight or self-driving), it can be quite dangerous to test every requirement out in the real world. A failure in such cases could cause severe injury or death.
  • Cost: Even in cases where the environment is safe and relatively structured, checking requirements manually can be costly, requiring the time and effort of highly specialized personnel to conduct tests and interpret the results. In addition, such testing usually involves high latency (e.g. it takes 1+ business days to get metrics and feedback on a proposed change), which slows down autonomy development overall.

As a result of these issues, most robotics development efforts only verify a very small fraction of the complete set of robot requirements on each proposed change. This can yield a pernicious pattern where a change fixing one issue in a robot's behavior degrades its behavior in another, potentially less visible, way. Furthermore, even when this isn't the case, engineers are discouraged from trying out experimental improvements if they think they could degrade some unrelated behavior, which stifles innovation.

In our view, one of the most powerful applications of simulation is to ameliorate this issue through simulation testing in continuous integration. This involves creating a set of "blocking" simulation scenarios in which a simulated model of the robot running its software (or even individual subsystems) is expected to always succeed. Then, each proposed change (e.g. Pull Request) to the robot software is tested against these scenarios, and if any fail, the change cannot land. This allows many different requirements (including those that would be difficult or unsafe to test in the real world) to be checked on every single change, which gives engineers the confidence they need to try out more experiments and accelerate their development.

In addition to this, broader sets of tests can be run at a regular cadence to assess the performance of the system over time and get holistic information about its behavior. This gives engineers and managers the information needed to formulate and execute an effective development strategy.

Practically, performing these tests quickly requires the parallelism afforded by cloud computing. The ReSim app allows users to quickly and easily set up such a continuous integration workflow in a few steps:

  1. Making a set of scenarios (which we call "experiences") that we want to run on every change (or Pull Request) or on a regular cadence.
  2. Packaging our robot simulation code into a Docker container to be easily run on the cloud.
  3. Registering our scenarios and robot simulation code with the ReSim app so it can run them.
  4. Setting up a continuous integration action (e.g. a GitHub Action) to enable automated triggering of the simulations and blocking when they fail.
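As a sketch of step 4, a CI workflow might look like the following, assuming GitHub Actions. The workflow name, registry, secret names, and the ReSim CLI invocations shown here are illustrative assumptions, not exact syntax; consult ReSim's CI integration guide and CLI reference for the real commands and flags.

```yaml
# Illustrative sketch only: job names, secrets, and CLI flags are assumptions.
name: resim-simulation-tests
on: pull_request

jobs:
  sim-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build and push the test image for this commit.
      # The registry secret is hypothetical; use your own ECR/Artifact Registry.
      - run: |
          docker build -t "$REGISTRY/robot-sim:${{ github.sha }}" .
          docker push "$REGISTRY/robot-sim:${{ github.sha }}"
        env:
          REGISTRY: ${{ secrets.CONTAINER_REGISTRY }}
      # Register the build and launch the blocking experiences via the ReSim CLI.
      # (Placeholder commands; see ReSim's docs for the exact invocation.)
      - run: |
          resim builds create --image "$REGISTRY/robot-sim:${{ github.sha }}"
          resim batches create --build-id "$BUILD_ID" --experiences blocking-set
        env:
          REGISTRY: ${{ secrets.CONTAINER_REGISTRY }}
```

Marking the `sim-tests` job as a required status check in the repository settings is what makes the simulations "blocking": a failed batch prevents the Pull Request from merging.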

The subsequent articles will cover in detail how to accomplish these steps. While such simulations definitely do not obviate the need for rigorous real-world testing, they allow developers to minimize breakages and make improvements more quickly and confidently.

How do I start?

You will need to set up a few things in order to get working with your own tests. Following this guide will get you set up.

We advise keeping our Core Concepts guide handy, as we use some specific definitions.

Once you're set up, we have a number of User Guides to walk you through the features.