Overview
One of ReSim's core products is a comprehensive and growing metrics framework for evaluating the performance of embodied AI applications, such as robots. It lets users display information from simulation runs of their embodied AI system in an online dashboard, so that judgments can be made about the system's performance. This information can be compared between versions of the system on different software branches, or longitudinally along a single branch.
Metrics exist at three levels:
- Test Metrics: Metrics computed per test (i.e. per simulation), based on outputs from the simulation such as log data and other artifacts. An example is a precision-recall curve across a set of input data.
- Batch Metrics: Aggregations of multiple test metrics, for example the average accuracy across all the tests in your batch.
- Test Suite Report Metrics: Metrics that help you assess how the performance of your system has evolved over time, for example average accuracy over time.
All three levels of metrics are written using the same system, the ResimMetricsWriter, although the input data of course differs between the three.
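For orientation, here is a minimal sketch of writing a single test metric. The module paths and builder methods shown (ResimMetricsWriter, add_scalar_metric, with_value, and so on) follow the open-source framework, but treat them as illustrative; the Metrics Writer page is the authoritative reference.

```python
import uuid

# Module paths and builder methods follow the open-source framework, but are
# illustrative here; see the Metrics Writer page for the authoritative API.
from resim.metrics.python.metrics_writer import ResimMetricsWriter
from resim.metrics.python.metrics_utils import MetricImportance, MetricStatus

# One writer per job (test); the ID ties the metrics to the simulation run.
writer = ResimMetricsWriter(uuid.uuid4())

# A single scalar test metric, e.g. computed from this test's log data.
(
    writer.add_scalar_metric("average_accuracy")
    .with_description("Mean detection accuracy for this test")
    .with_value(0.93)
    .with_status(MetricStatus.PASSED_METRIC_STATUS)
    .with_importance(MetricImportance.HIGH_IMPORTANCE)
)
```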
The basic mechanism in the ReSim app is a metrics build, which is a Docker image that wraps any metrics code you write. The image is run by the ReSim app after every simulation, after every batch, and on demand as part of report generation.
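As a rough sketch of the shape of a metrics build's entrypoint, the example below reads a simulation log from an input directory, computes metrics, and writes the serialized metrics proto to an output path. The input and output paths here are assumptions for illustration; the Metrics Builds page documents the actual data contract.

```python
import uuid
from pathlib import Path

from resim.metrics.python.metrics_writer import ResimMetricsWriter

# These paths are assumptions for illustration only; the Metrics Builds page
# documents the data contract the ReSim app uses when it runs your image.
INPUT_LOG = Path("/tmp/resim/inputs/logs/experience.log")
OUTPUT_PROTO = Path("/tmp/resim/outputs/metrics.binproto")


def main() -> None:
    writer = ResimMetricsWriter(uuid.uuid4())

    # Your analysis goes here: parse INPUT_LOG and register whatever numbers,
    # curves, or plots are relevant on the writer.

    # Serialize the collected metrics and write them where the app expects.
    metrics_proto = writer.write()
    OUTPUT_PROTO.write_bytes(metrics_proto.metrics_msg.SerializeToString())


if __name__ == "__main__":
    main()
```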
It is ReSim's core philosophy to allow maximum flexibility for engineers to display the analysis that is most relevant to them. We aim to achieve this in three ways:
- Open Source: The entirety of our metrics framework is open source to ensure transparency for those using ReSim to test and evaluate their embodied AI system.
- Determine your own aggregations: Rather than the ReSim web app providing fixed, limited aggregations of the results of a set of tests (batch metrics), or fixed longitudinal reports (test suite report metrics), the ReSim metrics framework lets you write code to decide how to aggregate. ReSim also provides some sensible default metrics to get users started.
- Standard plotting libraries: While ReSim supports a range of custom metric styles, described in Metric Types, any Plotly chart can also be wrapped with our metrics metadata and displayed using Plotly.JS (see the sketch below).
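To illustrate that last point, the sketch below wraps an ordinary Plotly figure as a metric. The add_plotly_metric and with_plotly_data calls follow the open-source framework but are shown here as illustration; Metric Types covers the full set of supported styles.

```python
import uuid

import plotly.express as px

from resim.metrics.python.metrics_writer import ResimMetricsWriter

writer = ResimMetricsWriter(uuid.uuid4())

# Any Plotly figure works; a trivial scatter plot stands in for a chart
# built from your simulation's log data.
fig = px.scatter(x=[0.0, 1.0, 2.0], y=[0.1, 0.9, 0.4])

# Wrap the figure's JSON representation with ReSim's metric metadata so the
# dashboard can render it with Plotly.JS.
(
    writer.add_plotly_metric("custom_scatter")
    .with_description("An arbitrary Plotly chart rendered in the dashboard")
    .with_plotly_data(fig.to_json())
)
```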
To allow you to use this powerful framework, we recommend working through the docs in this order:
Contents
- Metrics Builds: How to make a metrics build to run your metrics, and the data contract it expects.
- Metrics Data: How to extract and represent data from your logs, ready to visualize in a metric.
- Metric Types: A summary of the types of metrics supported in the dashboard, and how to use them.
- Metrics Writer: How to write metrics so that they appear in the dashboard.
- Events: A summary of events, the assignment of metrics to events, and how to use them.
- Batch Metrics: How to compute batch metrics.
- Test Suite Reports: How to compute test suite reports and their metrics.