Metrics 2.0 - Getting Started

Introduction

Metrics 2.0 is a major update to the Metrics framework that provides a more flexible and powerful way to collect, plot, and track data in ReSim. At its core, Metrics 2.0 works by ingesting data from various sources in your system (ROS topics, log files, or custom instrumentation) into a centralized data lake. When you emit data during a test run, it is automatically captured and stored in this data lake. After each run completes, the system queries the data lake using SQL queries you've defined to generate metrics, visualizations, and pass/fail status checks. The data lake persists over time, so you can continue to explore, query, and visualize your test data long after the run has finished, enabling post-hoc analysis and trend tracking across releases.

Why use Metrics 2.0?

The data lake architecture provides several key benefits. Because your data persists, you're not limited to viewing only the metrics you defined before the test ran. You can return to the ReSim UI days, weeks, or months later to explore the raw data, write new queries, create new visualizations, and answer questions you didn't anticipate when you first ran the test. This makes Metrics 2.0 ideal for debugging unexpected failures, conducting retrospective analysis, and iterating on your metrics definitions without needing to re-run expensive tests.

Branch isolation

Metrics configurations are tied to specific branches in your project. This means your main branch can maintain a stable, well-tested set of metrics and data schemas, while development branches can experiment with new metrics, modify schemas, or test breaking changes without affecting production workflows. Each branch maintains its own independent metrics configuration, allowing engineers to iterate safely and merge changes only when they're ready.

What you'll create

To use Metrics, you'll create a single configuration file (.resim/metrics/config.resim.yaml) that defines three key components:

  1. Topics — The schema for data you'll emit from your system (e.g., robot velocity, localization error, goal status). Topics define the structure and types of the data flowing into the data lake.

  2. Metrics — SQL queries that transform your emitted data into visualizations and scalar values. Metrics can be test-level (computed per individual test run) or batch-level (aggregated across multiple test runs). Metrics can have status checks - optional thresholds attached to metrics that determine pass/fail/warning states. These allow you to automatically flag tests that miss performance requirements.

  3. Metrics sets — Collections of metrics that are run against a set of tests, helping you to validate performance and learn about robot behavior.

Metrics 2.0 data flow

The rest of this guide will walk you through each of these components step by step, starting with identifying the data you want to collect.

Identifying Key Data

The first step to adopting ReSim’s Metrics framework is to identify the key data that you want to collect, plot, and track over time in the ReSim Platform. To start with, we’d recommend simple data that is easy to plot, e.g. your robot’s velocity over time. Identify where this data comes from; in our example, we’ll be reading it from our robot’s /odom topic.

Writing the schema

Firstly, we strongly recommend installing the ReSim VSCode Extension (or from Open VSX Registry for VSCode forks); it provides language support for ReSim config files and will help you avoid mistakes when emitting your data!

Look at the structure of the data you want to emit to ReSim; in our example case, we’re looking at the Odometry.twist.twist.linear data, which is a Vector3 representing the velocity of the robot along the x, y and z axes. We can write a ReSim config for this like so:

# required version header
version: 1

topics:
  odom_linear_velocity:
    schema:
      x: float
      y: float
      z: float

To ease your experience with writing queries, we recommend flattening nested data structures into their own topics. Additionally, we don’t support slash characters / in topic names (sorry!). Use the content assist from the VSCode extension to explore the available data types. In-depth descriptions are provided by the extension on hover.
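As a rough illustration of the flattening advice, the sketch below splits a nested message (represented as a plain dict) into one flat record per nested structure, with slash-free names. The helper `split_into_topics` and the `odom_twist` prefix are hypothetical and not part of the ReSim API; the only rules taken from this guide are that nested data is easier to query when flattened and that topic names cannot contain `/`.

```python
def split_into_topics(name: str, message: dict) -> dict:
    """Split a nested dict into {topic_name: flat_fields} (hypothetical helper)."""
    topics = {}
    # Scalar fields at this level belong to the topic named `name`.
    leaves = {k: v for k, v in message.items() if not isinstance(v, dict)}
    if leaves:
        topics[name] = leaves
    # Each nested dict becomes its own topic, joined with underscores (never slashes).
    for key, value in message.items():
        if isinstance(value, dict):
            topics.update(split_into_topics(f"{name}_{key}", value))
    return topics


# A dict standing in for the twist part of an Odometry message.
twist = {
    "linear": {"x": 1.0, "y": 2.0, "z": 3.0},
    "angular": {"x": 0.0, "y": 0.0, "z": 0.1},
}
topics = split_into_topics("odom_twist", twist)
for topic_name, fields in topics.items():
    print(topic_name, fields)
```

Each resulting name would then need its own entry under topics: in the config, with a schema matching the flat fields.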

The config file should be saved to .resim/metrics/config.resim.yaml in your repository. You’re also welcome to save it to another path - when using the ReSim CLI to start a batch, simply pass --metrics-config-path=<YOUR_PATH> as an option. The extension will only recognise files with the .resim.yaml suffix, so follow that pattern.

Emitting your data

There are two approaches to take here; you can either emit the data live from your stack or post-hoc when your test is complete. We’ll discuss both here.

The emit API is primarily used through the Emitter class. You can pass it a config_path and an output_path. output_path is where we will write your emissions file; by default it is /tmp/resim/outputs/emissions.resim.jsonl. When config_path is specified and the file is visible to the application, we will validate the types & shape of the data being emitted to ensure it will be ingested by the ReSim platform correctly. See below for an example:

from resim.metrics.python.emissions import Emitter

emitter = Emitter(config_path=".resim/metrics/config.resim.yaml")
emitter.emit("odom_linear_velocity", {"x": 1.0, "y": 2.0, "z": 3.0}, timestamp=0)
emitter.emit("odom_linear_velocity", {"x": 1.1, "y": 1.9, "z": 0.0}, timestamp=500000000)
# rejected: the values are strings, but the schema declares x, y and z as floats
emitter.emit("odom_linear_velocity", {"x": "left", "y": "right", "z": "up"}, timestamp=1000000000)

Under the hood, we append these messages to your output_path. In this example, the lines would look like this:

{"$metadata": {"topic": "odom_linear_velocity", "timestamp": 0}, "$data": {"x": 1.0, "y": 2.0, "z": 3.0}}
{"$metadata": {"topic": "odom_linear_velocity", "timestamp": 500000000}, "$data": {"x": 1.1, "y": 1.9, "z": 0.0}}
...
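Because the emissions file is plain JSON Lines, it can be inspected with a few lines of standard-library Python before it is uploaded. The sketch below counts records per topic; the function name `count_emissions_by_topic` is made up for this example, and the only assumption about the format is the `$metadata` / `$data` layout shown above.

```python
import json
from collections import Counter


def count_emissions_by_topic(lines):
    """Count emission records per topic from an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record["$metadata"]["topic"]] += 1
    return counts


# The two lines from the example above.
sample = [
    '{"$metadata": {"topic": "odom_linear_velocity", "timestamp": 0}, "$data": {"x": 1.0, "y": 2.0, "z": 3.0}}',
    '{"$metadata": {"topic": "odom_linear_velocity", "timestamp": 500000000}, "$data": {"x": 1.1, "y": 1.9, "z": 0.0}}',
]
counts = count_emissions_by_topic(sample)
print(dict(counts))
```

To check a real file, pass an open file handle instead of the `sample` list.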

Stack Emissions

Here’s a more fully fleshed-out code example of a stack-based emission pattern. Continuing with our ROS example, we will create a new node for our stack, MetricsEmitter, which will receive messages from the /odom ROS topic and emit them to the odom_linear_velocity ReSim topic.

from rclpy.node import Node
from resim.metrics.python.emissions import Emitter
from nav_msgs.msg import Odometry
from builtin_interfaces.msg import Time as MsgTime
from rclpy.time import Time
from typing import Optional

class MetricsEmitter(Node):
    def __init__(self):
        super().__init__('metrics_emitter')
        self.emitter = Emitter(config_path=".resim/metrics/config.resim.yaml")
        # Time of the first /odom message; emitted timestamps are relative to it.
        self.first_timestamp: Optional[Time] = None

        self.odom_subscriber = self.create_subscription(Odometry, '/odom', self.odom_callback, 10)

    def get_relative_timestamp(self, msg_time: Optional[MsgTime] = None) -> Optional[int]:
        # There is no reference point until the first message has been received.
        if self.first_timestamp is None:
            return None

        current_time: Time
        if msg_time is not None:
            current_time = Time.from_msg(msg_time)
        else:
            current_time = self.get_clock().now()

        # Nanoseconds elapsed since the first message.
        return current_time.nanoseconds - self.first_timestamp.nanoseconds

    def odom_callback(self, msg: Odometry):
        if self.first_timestamp is None:
            self.first_timestamp = Time.from_msg(msg.header.stamp)

        self.emitter.emit("odom_linear_velocity", {
            "x": msg.twist.twist.linear.x,
            "y": msg.twist.twist.linear.y,
            "z": msg.twist.twist.linear.z,
        }, timestamp=self.get_relative_timestamp(msg.header.stamp))

Post-hoc Emissions

If you’re already recording log files from your simulations, you might want to take advantage of ReSim’s metrics builds. We can follow much the same pattern, but instead read the data from log files like MCAPs or rosbags. See a developed example below using ROS:

from resim.metrics.python.emissions import Emitter
from rosidl_runtime_py.utilities import get_message
import rosbag2_py
from pathlib import Path
import rclpy.serialization

def emit_velocity_data(emitter: Emitter, input_bag: Path):
    reader = rosbag2_py.SequentialReader()
    reader.open(
        rosbag2_py.StorageOptions(uri=str(input_bag), storage_id="mcap"),
        rosbag2_py.ConverterOptions(
            input_serialization_format="cdr", output_serialization_format="cdr"
        ),
    )

    # Create a dictionary mapping topic names to their types
    topic_type_map = {topic.name: topic.type for topic in reader.get_all_topics_and_types()}
    odom_topic = "/odom"

    if odom_topic not in topic_type_map:
        raise ValueError(f"topic {odom_topic} not in bag")

    msg_type = get_message(topic_type_map[odom_topic])

    while reader.has_next():
        # timestamp is the time recorded in the bag, in nanoseconds
        topic, data, timestamp = reader.read_next()
        if topic == odom_topic:
            # Deserialize the raw bytes into an Odometry message and emit every sample
            msg = rclpy.serialization.deserialize_message(data, msg_type)
            emitter.emit('odom_linear_velocity', {
                'x': msg.twist.twist.linear.x,
                'y': msg.twist.twist.linear.y,
                'z': msg.twist.twist.linear.z,
            }, timestamp=timestamp)

if __name__ == "__main__":
    input_bag = Path("/tmp/resim/inputs/logs/record.mcap")
    with Emitter(config_path=".resim/metrics/config.resim.yaml") as emitter:
        emit_velocity_data(emitter, input_bag)

Series Emissions

If you’re emitting a series of data from a single call, there is an additional helper function - Emitter.emit_series. This will spread the data from this single emit call to multiple emission lines, to aid with query simplicity. For the initial example above, this can be written as:

emitter = Emitter(config_path=".resim/metrics/config.resim.yaml")
emitter.emit_series("odom_linear_velocity", {
 "x": [1.0, 1.1, 1.1, 1.2],
 "y": [2.0, 1.9, 1.8, 1.7],
 "z": [3.0, 3.0, 3.0, 3.0]
}, timestamps=[0, 500000000, 1000000000, 1500000000])
# Equivalent to
emitter.emit("odom_linear_velocity", {"x": 1.0, "y": 2.0, "z": 3.0}, timestamp=0)
emitter.emit("odom_linear_velocity", {"x": 1.1, "y": 1.9, "z": 3.0}, timestamp=500000000)
emitter.emit("odom_linear_velocity", {"x": 1.1, "y": 1.8, "z": 3.0}, timestamp=1000000000)
emitter.emit("odom_linear_velocity", {"x": 1.2, "y": 1.7, "z": 3.0}, timestamp=1500000000)

The config file remains the same as the original example, as the resulting emissions match the existing definition of the topic. The function will validate the types of the data as it is emitted & ensure all series are of the same length.
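To make the expansion concrete, here is a small stand-alone sketch of how a column-oriented series can be turned into one record per timestamp. It is not the library's implementation: `expand_series` is a hypothetical name, and requiring the timestamps list to match the series length is an assumption of this sketch.

```python
def expand_series(data: dict, timestamps: list) -> list:
    """Turn {"x": [...], "y": [...]} plus timestamps into [(row_dict, timestamp), ...]."""
    lengths = {len(values) for values in data.values()} | {len(timestamps)}
    if len(lengths) != 1:
        raise ValueError("all series and the timestamps list must have the same length")
    rows = []
    for i, timestamp in enumerate(timestamps):
        # Pick the i-th value of every series to build one emission.
        rows.append(({key: values[i] for key, values in data.items()}, timestamp))
    return rows


rows = expand_series(
    {
        "x": [1.0, 1.1, 1.1, 1.2],
        "y": [2.0, 1.9, 1.8, 1.7],
        "z": [3.0, 3.0, 3.0, 3.0],
    },
    timestamps=[0, 500000000, 1000000000, 1500000000],
)
for row, timestamp in rows:
    print(timestamp, row)
```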

Event Emissions

ReSim supports special types of emissions called “events”. These cause an event to be registered for your job at the given timestamp, which will appear in the Events tab in the ReSim app.

Each event refers to a specific point in time - such as reaching a goal, or when a system stop occurs. Metrics can be attached to each event by including them in the emission to help you understand the system state at that time. Events are emitted using the emitter.emit_event method - see the example below:

topics:
  # ...
  goal_reached:
    event: true
    schema:
      name: string
      description: string
      status: status
      tags: string[]
      metrics: metric[]

emitter = Emitter(config_path=".resim/metrics/config.resim.yaml")
emitter.emit_event({
    "name": "Goal 1 Reached",
    "description": "Robot successfully reached goal number 1",
    "status": "PASSED",
    "tags": ["navigation"],
    "metrics": [{
        "name": "Stereo View",
        "type": "image",
        "value": ["goal_1.png"],
        "status": "PASSED"
    }]
}, event=True, timestamp=1750000000)

The structure of the metric[] object is defined as

"metrics": [
  {
    "name": "string",
    "description": "optional",
    "status": "optional. PASSED | FAIL_BLOCK | etc.",
    "type": "scalar | image | plotly | text",
    "value": "the value of the metric. Changes based on the type."
    // examples:

    // Scalar and text metrics are treated the same way.
    // "type": "scalar" | "text",
    // "value": "2.37",

    // "type": "image",
    // "value": ["1.jpg", "2.jpg"]

    // "type": "plotly"
    // "value": "{ a valid plotly blob }", i.e. the output of fig.to_json()
  }
]

Custom emissions API

If Python isn’t your jam, you’re more than welcome to write your own emitter in the language of your choice. The minimal implementation of the emit API (without validation) in Python is here. If you’d like official support for your language of choice, please reach out to our ProServe team and we’ll look into it!
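For orientation, the core of an emitter is small: append one JSON object per line in the `$metadata` / `$data` layout shown earlier. The sketch below does only that, using the Python standard library, and is meant as a starting point for porting to another language rather than as the official implementation. The function name `emit` and its parameter order are choices made for this example; events and validation are not covered.

```python
import json
import os
import tempfile


def emit(path: str, topic: str, data: dict, timestamp: int) -> None:
    """Append one emission as a JSON line (no validation)."""
    record = {"$metadata": {"topic": topic, "timestamp": timestamp}, "$data": data}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


# Write to a temporary directory so the example does not touch real outputs.
output_path = os.path.join(tempfile.mkdtemp(), "emissions.resim.jsonl")
emit(output_path, "odom_linear_velocity", {"x": 1.0, "y": 2.0, "z": 3.0}, 0)
emit(output_path, "odom_linear_velocity", {"x": 1.1, "y": 1.9, "z": 0.0}, 500000000)

with open(output_path, encoding="utf-8") as f:
    written = f.read().splitlines()
print("\n".join(written))
```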

Metrics development

Adding your first metric

We recommend developing your first couple of metrics using debug dashboards. The first step is to create an empty metrics set at the end of your config file:

...
metrics sets:
  My Metrics: 
    metrics: []

Next, create the debug dashboard using the ReSim CLI - see the linked docs for installation instructions. Once you've generated an emissions file, it can be ingested using the resim metrics debug command, invoked like so:

resim metrics debug --project "metrics-demo" --emissions-file "emissions.resim.jsonl" --metrics-config-path ".resim/metrics/config.resim.yaml" --metrics-set "My Metrics"

After a short time, this will print a URL to the debug dashboard which is linked to the config and emissions file provided. From here, you can start to develop your metrics. See the video below for a demonstration.

Once you're happy with your metrics, you can export the current state of the dashboard by clicking the "Export Config" button. This will download a zip file containing the config file and any custom templates configured. You can then replace your local config file with this - you may need to do some formatting & reordering to get it how you like it.

You can also develop your metrics in the config file directly, and then rerun resim metrics debug to test them. See the metric templates below for more details. Once you've added a metric in the config file, make sure that it's included in your metrics set:

...
metrics:
  Robot Speed:
    type: test
    description: Robot speed over time
    query_string: |
      SELECT 
        'Speed', 
        timestamp / 1E9 AS "Time (s)", 
        SQRT(POWER(x, 2) + POWER(y, 2)) AS "Speed (m/s)" 
      FROM odom_linear_velocity;
    template_type: system
    template: line

metrics sets:
  My Metrics:
    metrics:
      - Robot Speed

And then you can view it in the debug dashboard by running:

resim metrics debug --project "metrics-demo" --emissions-file "emissions.resim.jsonl" --metrics-config-path ".resim/metrics/config.resim.yaml" --metrics-set "My Metrics"
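If you want a quick local check of a query's shape before opening the dashboard, one option is to run it against an in-memory SQLite table. This is only a stand-in: SQLite is not the engine ReSim uses, so dialect differences are possible, and the table layout below (a timestamp column plus the schema fields) is an assumption based on the queries in this guide. SQRT and POWER are registered by hand because not every SQLite build includes them.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register the math functions explicitly; some SQLite builds do not ship them.
conn.create_function("SQRT", 1, math.sqrt)
conn.create_function("POWER", 2, math.pow)

# Assumed table layout: one row per emission, timestamp in nanoseconds.
conn.execute("CREATE TABLE odom_linear_velocity (timestamp INTEGER, x REAL, y REAL, z REAL)")
conn.executemany(
    "INSERT INTO odom_linear_velocity VALUES (?, ?, ?, ?)",
    [(0, 1.0, 2.0, 3.0), (500_000_000, 1.1, 1.9, 0.0)],
)

query = """
SELECT
  'Speed',
  timestamp / 1E9 AS "Time (s)",
  SQRT(POWER(x, 2) + POWER(y, 2)) AS "Speed (m/s)"
FROM odom_linear_velocity;
"""
cursor = conn.execute(query)
columns = [description[0] for description in cursor.description]
rows = sorted(cursor.fetchall(), key=lambda row: row[1])

print(columns)
for row in rows:
    print(row)
```

The second and third column names are the ones the line template would use as axis titles.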

Image Metrics

At this time, image metrics are not supported in the debug dashboard as they have a more complex ingestion process. Comment them out in your config in development, and add them back in when you're ready to test them with a real batch.

Status checks

You can make a metric affect pass/fail by adding a status block. Status checks run a separate query that is evaluated against threshold values: if the query returns any rows, the metric is marked as blocking (or warning).

Under status you configure:

  • query_string — A SQL query with exactly one ? parameter. The ? is replaced with the block or warn threshold value when the check runs (e.g. HAVING COUNT(*) < ? or WHERE value > ?).
  • block — Threshold value for a blocking failure. If the status query returns any rows when run with this value, the job is marked as failed (blocker).
  • warn — Optional. Threshold value for a warning. If the query returns any rows when run with this value, the metric is marked as a warning instead of a pass.

Example: require at least 3 goals to be reached; otherwise the metric blocks. The status query returns a row when the count is below the threshold, so we use HAVING COUNT(*) < ? with block: 3:

Time to reach final goal:
  type: test
  description: Time between receiving first goal and reaching final goal. Blocks if less than 3 goals are reached.
  query_string: |
    SELECT
      CASE
        WHEN (SELECT COUNT(*) FROM time_to_goal) != 3 THEN 75.0
        ELSE sum(time_s)
      END
    FROM time_to_goal;
  template_type: system
  template: scalar
  status:
    query_string: SELECT '1' FROM time_to_goal HAVING COUNT(*) < ?
    block: 3
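The decision logic described above can be sketched in a few lines. This is an illustration of the documented behaviour, not the platform's code: the status query is represented by a plain Python callable, the function names are hypothetical, and checking block before warn is an assumption of this sketch. The status labels are borrowed from the values listed for job_conflated_status.

```python
from typing import Callable, Optional, Sequence


def evaluate_status(
    run_status_query: Callable[[float], Sequence],
    block: float,
    warn: Optional[float] = None,
) -> str:
    """Return a status label given a callable that runs the status query for a threshold."""
    # Any row returned for the block threshold means a blocking failure.
    if run_status_query(block):
        return "BLOCKER"
    # Otherwise, any row returned for the warn threshold means a warning.
    if warn is not None and run_status_query(warn):
        return "WARNING"
    return "PASSED"


def make_goal_count_check(goals_reached: int) -> Callable[[float], Sequence]:
    """Stand-in for: SELECT '1' FROM time_to_goal HAVING COUNT(*) < ?"""
    return lambda threshold: [("1",)] if goals_reached < threshold else []


print(evaluate_status(make_goal_count_check(2), block=3))
print(evaluate_status(make_goal_count_check(3), block=3))
print(evaluate_status(make_goal_count_check(3), block=3, warn=4))
```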

Running your first Metrics 2.0 batch

Once you’re happy with your metrics set, you can give your stack its first run in ReSim. The first step is to register your config with the ReSim platform:

resim metrics sync --project "metrics-demo" --branch "metrics-setup"

Each branch in your project has its own copy of the config file, which might result in lots of CLI calls to re-register config files. To help with this, at the time of triggering a batch your config can automatically be synced to your branch:

resim suite run --project "metrics-demo" \
  --test-suite "Nightly Suite" \
  --build-id "dc71ebee-1601-4d98-aab4-d4bdecfafca1" \
  --sync-metrics-config \
  --metrics-set "My Metrics"

With that, your batch should be running. Wait for it to complete, and the metrics you have configured will be shown automatically in the ReSim app. If you want to make more changes, you can simply repeat the process of exporting the config and replacing your local config file.

Changing your metrics

If you make changes to your metrics on a completed batch, the changes will be saved for the instance of metrics you are currently viewing. For example, if you make a change to the metrics for a single job, it will not be reflected on other jobs in the batch or going forward until you synchronise the config file manually.

Authoring batch metrics

When defining a metric in the config file, you may have spotted the type: test definition. This tells the metrics system where the metric should be displayed; in this case, it is shown as a test result. The other option is type: batch, which is shown as a batch metric and is used to create aggregate metrics across an entire batch. The type also changes which data is available to the metric's query. This makes it easy to author high-level metrics such as "What was the average speed of the robot across all jobs in the batch?", which can be defined as:

Average Speed:
  type: batch
  description: Average speed of the robot across all jobs in the batch.
  query_string: SELECT AVG(speed) FROM robot_speed
  template_type: system
  template: scalar

Along with the data you have emitted, you may also want to join against the metadata table to get more human-readable information. See the metadata schema below for more details.

Dashboards

Work in Progress

Dashboards are a work in progress. These docs will be added soon!

Metric Templates

Each metric uses a template that defines how its query results are rendered. The following system templates are available; in all cases the query runs against your emitted topic data, and column order (and for charts, column aliases) determine layout and axis titles.

Line Chart

The line template expects exactly three columns in order: a series identifier, then the x value, then the y value. Each row is [series_name, x_value, y_value]. Different series_name values produce multiple series on the same chart. The axis titles are taken from the column names at positions 2 and 3, so use descriptive aliases with units (e.g. "Time (s)", "Speed (m/s)"). Emitted timestamps are in nanoseconds; divide by 1E9 to show seconds.

Example config:

Distance to Goal:
  type: test
  description: Distance to each goal over time, measured from odometry. Each line represents a different goal.
  query_string: | 
    SELECT 
      goal_name as group_name, 
      timestamp / 1E9 AS "Time (s)", 
      distance_m as "Distance to Goal (m)" 
    FROM goal_distance;
  template_type: system
  template: line

Metrics 2.0 Line Chart
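To picture how [series_name, x_value, y_value] rows become chart lines, the sketch below groups rows by their first column. It only illustrates the row shape the template expects; the function name `group_into_series` and the sample values are made up, and the real renderer may work differently.

```python
from collections import defaultdict


def group_into_series(rows):
    """Group (series_name, x, y) rows into {series_name: {"x": [...], "y": [...]}}."""
    series = defaultdict(lambda: {"x": [], "y": []})
    for name, x, y in rows:
        series[name]["x"].append(x)
        series[name]["y"].append(y)
    return dict(series)


# Rows shaped like the output of the Distance to Goal query (values are illustrative).
rows = [
    ("goal_1", 0.0, 4.0),
    ("goal_1", 0.5, 3.5),
    ("goal_2", 0.0, 9.0),
    ("goal_2", 0.5, 8.6),
]
series = group_into_series(rows)
for name, points in series.items():
    print(name, points)
```

Two distinct values in the first column yield two lines on the same chart.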

Bar Chart

The bar template uses the same shape as the line template: three columns [series, x, y] and rows [series_name, x_value, y_value]. It is well suited to aggregated or categorical comparisons (e.g. per experience, per goal).

Example config:

Maximum Localization Error:
  type: batch
  description: Maximum position difference between odometry and AMCL localization across all experiences.
  query_string: |
    SELECT 
      'Maximum', 
      m.experience_name as "Experience Name", 
      MAX(position_diff_m) as "Maximum Localization Error (m)" 
    FROM pose_difference pd 
    JOIN metadata m on pd.job_id = m.job_id 
    GROUP BY m.experience_name ORDER BY m.experience_name;
  template_type: system
  template: bar

Metrics 2.0 Bar Chart

Table

The table template accepts any number of columns and any number of rows. Column names become the table headers and each row is rendered as a table row. Use it for summary statistics, key-value pairs, or any tabular output.

Example config:

AMCL Covariance Calibration:
  type: test
  description: Evaluates whether AMCL's uncertainty estimates are realistic.
  query_string: |
    SELECT "Metric", "Value" FROM (
      SELECT 1 as sort_order, 'Total Samples' as "Metric", CAST(COUNT(*) AS VARCHAR) as "Value"
      FROM covariance_accuracy
      UNION ALL
      SELECT 2 as sort_order, 'Within 1σ (target: ~68%)' as "Metric", CONCAT(CAST(ROUND(100.0 * AVG(CASE WHEN within_1_sigma = 1 THEN 1.0 ELSE 0.0 END), 1) AS VARCHAR), '%') as "Value"
      FROM covariance_accuracy
      ORDER BY sort_order
    );
  template_type: system
  template: table

Metrics 2.0 Table

Scalar

The scalar template expects the query to return a single value. The renderer uses the first row and first column. Optional status thresholds can block or warn based on a separate query (e.g. checking that a count meets a minimum). You can also provide units, which will be displayed after the value.

Example config:

Time to reach final goal:
  type: test
  description: Time between receiving first goal and reaching final goal.
  query_string: |
    SELECT
      CASE
        WHEN (SELECT COUNT(*) FROM time_to_goal) != 3 THEN 75.0
        ELSE sum(time_s)
      END
    FROM time_to_goal;
  template_type: system
  template: scalar
  units: "s"
  status:
    query_string: SELECT '1' FROM time_to_goal HAVING COUNT(*) < ?
    block: 3

Metrics 2.0 Scalar

Image

The image template expects one column whose values are image identifiers (e.g. filenames or paths) available in the job logs. Each row produces one image in the metric output.

Example config:

Stereo Camera Feed:
  type: test
  description: Camera feed from navigation, sped up 2x, starting after first goal received.
  query_string: SELECT filename FROM camera_gif
  template_type: system
  template: image

Metrics 2.0 Image Metric

State Timeline

The state timeline template shows discrete states over time for one or more systems. The query must return exactly three columns in order: system identifier, timestamp, and state name. Each row is [system_name, timestamp, state_name]. Use different system_name values to show multiple systems. Consecutive rows with the same state are merged into segments. Timestamps are in nanoseconds and are converted to elapsed seconds from the first timestamp for the x-axis; the x-axis title is "Elapsed time (s)" and the y-axis title comes from the alias you give the system column. Each unique state is assigned a color from the ReSim palette.

Example config:

Goal Status:
  type: test
  description: Status of the goals over time.
  query_string: |
    SELECT 
      'Navigation' as "System", 
      timestamp, 
      state 
    FROM goal_status;
  template_type: system
  template: state_timeline

Metrics 2.0 State Timeline
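The merging of consecutive rows can be pictured with the following sketch, which converts [system_name, timestamp, state_name] rows into segments measured in elapsed seconds. It is an approximation for illustration only: `build_segments` is a hypothetical name, and ending each segment at the timestamp where the next state begins is an assumption about how the chart is drawn.

```python
def build_segments(rows):
    """rows: (system, timestamp_ns, state). Returns {system: [(start_s, end_s, state), ...]}."""
    if not rows:
        return {}
    first = min(timestamp for _, timestamp, _ in rows)
    by_system = {}
    for system, timestamp, state in sorted(rows, key=lambda row: row[1]):
        elapsed = (timestamp - first) / 1e9  # nanoseconds to elapsed seconds
        segments = by_system.setdefault(system, [])
        if segments and segments[-1][2] == state:
            # Same state as the previous row: extend the current segment.
            segments[-1] = (segments[-1][0], elapsed, state)
            continue
        if segments:
            # State changed: close the previous segment at the transition time.
            segments[-1] = (segments[-1][0], elapsed, segments[-1][2])
        segments.append((elapsed, elapsed, state))
    return by_system


rows = [
    ("Navigation", 0, "IDLE"),
    ("Navigation", 1_000_000_000, "NAVIGATING"),
    ("Navigation", 2_000_000_000, "NAVIGATING"),
    ("Navigation", 5_000_000_000, "REACHED"),
]
timeline = build_segments(rows)
for segment in timeline["Navigation"]:
    print(segment)
```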

Custom templates

Work in Progress

Custom template docs will be added soon! If you need support, reach out to the ReSim team.

Metadata Schema

When you SELECT * on your topics, you will notice that some additional metadata columns, such as batch_id and job_id, are returned. These can be used to join against a special metadata table, which contains information about your build, experience, and more. The metadata table has one row per job in the batch, and batch metrics (type: batch) query across all rows. This lets a batch metric answer questions such as:

  • what was the min, average, and max speed, for each experience in my test set?
  • how did the performance of this metric vary across sunny, cloudy, and rainy test scenarios?

An example batch metric:

“How does the min, avg, and max speed vary across each experience?”

SELECT m.experience_name, MIN(speed), AVG(speed), MAX(speed)
FROM drone_speed d
JOIN metadata m on d.job_id = m.job_id
GROUP BY m.experience_name;
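The join can be tried out on toy data with an in-memory SQLite database, as in the sketch below. SQLite is only a stand-in for the platform's query engine, the two tables contain just the columns the query needs, and the job IDs, experience names and speeds are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-ins for an emitted topic table and the metadata table.
conn.execute("CREATE TABLE drone_speed (job_id TEXT, speed REAL)")
conn.execute("CREATE TABLE metadata (job_id TEXT, experience_name TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("job-1", "sunny"), ("job-2", "rainy")],
)
conn.executemany(
    "INSERT INTO drone_speed VALUES (?, ?)",
    [("job-1", 2.0), ("job-1", 4.0), ("job-2", 1.0), ("job-2", 2.0)],
)

query = """
SELECT m.experience_name, MIN(speed), AVG(speed), MAX(speed)
FROM drone_speed d
JOIN metadata m on d.job_id = m.job_id
GROUP BY m.experience_name;
"""
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
```

Each output row holds one experience name followed by its minimum, average and maximum speed.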

If you want to see all metadata available you can run this query:

SELECT * FROM metadata

Column reference

| Column | Type | Description / Observed values |
| --- | --- | --- |
| org_id | string | Organisation identifier, e.g. "resim.ai" |
| batch_id | string (UUID) | ID of the batch this job belongs to |
| job_id | string (UUID) | Unique identifier for the individual job |
| project_id | string (UUID) | ReSim project ID |
| branch_id | string (UUID) | Branch ID within the project |
| branch_name | string | Human-readable branch name, e.g. "main" |
| build_id | string (UUID) | ID of the build under test |
| build_name | string | Human-readable build name |
| build_creation_timestamp | timestamp | When the build was created |
| experience_id | string (UUID) | ID of the experience (scenario) for this job |
| experience_name | string | Human-readable experience name |
| experience_tag_ids | string[] | UUIDs of the experience tags attached to this job |
| experience_tag_names | string[] | Names of the experience tags attached to this job |
| test_suite_id | string (UUID) | ID of the test suite |
| test_suite_name | string | Human-readable test suite name |
| test_suite_revision | int | Revision number of the test suite |
| job_status | string | Raw execution status, e.g. 'SUCCEEDED', 'FAILED' |
| job_metrics_status | string | Metrics evaluation status, e.g. 'PASSED', 'FAILED' |
| job_conflated_status | string | Combined status used for pass/fail decisions, e.g. 'PASSED', 'ERROR', 'BLOCKER', 'WARNING' |
| custom_field_keys | string[] | Custom field names set on the job (may be []) |
| custom_field_values | string[] | Custom field values, parallel to custom_field_keys (may be []) |
| time | timestamp | When the relevant job started |

Array column usage

experience_tag_names and experience_tag_ids are Trino arrays. Common patterns:

-- Check membership
WHERE contains(experience_tag_names, 'my_tag')

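-- Filter to tags matching a prefix, take the first result (1-based indexing).
-- element_at returns NULL when nothing matches, whereas the [1] subscript fails on an empty array.
element_at(filter(experience_tag_names, x -> starts_with(x, 'my_prefix_')), 1)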

-- Conditional count across jobs
count_if(job_conflated_status = 'PASSED')

-- Group jobs by a tag prefix
WITH classified AS (
  SELECT
    element_at(filter(experience_tag_names, x -> starts_with(x, 'my_prefix_')), 1) AS group_name,
    job_conflated_status
  FROM metadata
  WHERE contains(experience_tag_names, 'my_tag')
)
SELECT group_name, count_if(job_conflated_status = 'PASSED') AS passed
FROM classified
WHERE group_name IS NOT NULL
GROUP BY group_name
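For readers less familiar with Trino's array functions, the grouping query above can be mirrored in plain Python. The sketch below is only meant to clarify what the query computes; the rows are invented, and the helper names `first_with_prefix` and `passed_per_group` are hypothetical.

```python
from collections import Counter

# Toy rows shaped like the metadata table (values are made up for illustration).
jobs = [
    {"experience_tag_names": ["my_tag", "my_prefix_urban"], "job_conflated_status": "PASSED"},
    {"experience_tag_names": ["my_tag", "my_prefix_urban"], "job_conflated_status": "BLOCKER"},
    {"experience_tag_names": ["my_tag", "my_prefix_rural"], "job_conflated_status": "PASSED"},
    {"experience_tag_names": ["other_tag"], "job_conflated_status": "PASSED"},
]


def first_with_prefix(tags, prefix):
    """Return the first tag starting with prefix, or None if there is no match."""
    return next((tag for tag in tags if tag.startswith(prefix)), None)


def passed_per_group(rows, required_tag, prefix):
    """Count PASSED jobs per group, where the group is the first tag with the prefix."""
    passed = Counter()
    for row in rows:
        tags = row["experience_tag_names"]
        if required_tag not in tags:  # WHERE contains(experience_tag_names, ...)
            continue
        group = first_with_prefix(tags, prefix)
        if group is None:  # WHERE group_name IS NOT NULL
            continue
        passed[group] += int(row["job_conflated_status"] == "PASSED")  # count_if(...)
    return dict(passed)


result = passed_per_group(jobs, "my_tag", "my_prefix_")
print(result)
```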