# The Metrics Writer
To make authoring metrics easier, we provide a writer that outputs all of your metrics in our protobuf format without making you deal with protobuf or any details of the file format. We call this the `ResimMetricsWriter`.
## Overall usage
An example usage is as follows:
```python
from resim.metrics.proto.validate_metrics_proto import validate_job_metrics
from resim.metrics.python.metrics_writer import ResimMetricsWriter

# Make the metrics writer! (JOB_ID is the UUID of the job these metrics belong to.)
metrics_writer = ResimMetricsWriter(JOB_ID)

# TODO: Add your metrics or metrics data here!
metrics_writer.add_metrics_data(...)
metrics_writer.add_metric(...)
# END TODO!

metrics_output = metrics_writer.write()  # This gives you the protobuf to output!
validate_job_metrics(metrics_output.metrics_msg)  # This should validate, if you wrote valid metrics!

# Finally, write that message to file as metrics.binproto.
with open("/tmp/resim/outputs/metrics.binproto", "wb") as f:
    f.write(metrics_output.metrics_msg.SerializeToString())
```
For transparency, we return the output as a protobuf message for you to write to file (rather than writing it to file directly): in our experience it can be useful to inspect the output protobuf, for example by dumping it as text protobuf so you can read it manually.
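For instance, one way to dump a human-readable copy next to the binary file is protobuf's standard `text_format` module (the `.txtproto` path here is just an example):

```python
from google.protobuf import text_format

# Dump a human-readable text-format copy of the metrics message for manual inspection.
with open("/tmp/resim/outputs/metrics.txtproto", "w") as f:
    f.write(text_format.MessageToString(metrics_output.metrics_msg))
```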
## How to write metrics
The easiest way to write a new metric to the `ResimMetricsWriter` is to use the fluent API provided by the `Metric` class. Below is an example that builds a `DoubleOverTimeMetric` using the fluent API; for more information on this metric type, see the Metrics Types docs.
```python
# Example metrics data - not necessarily written to metrics writer
TIMESTAMPS: MetricsData = ...
ERROR_INDEXED_BY_TIMESTAMPS: MetricsData = ...
STATUSES_INDEXED_BY_TIMESTAMPS: MetricsData = ...

# Example write
(
    metrics_writer
    .add_double_over_time_metric("Localization error")  # Type and name specified here
    .with_description("Accumulated error in localization over time")
    .append_doubles_over_time_data(ERROR_INDEXED_BY_TIMESTAMPS)  # Append a single data series, as we only want to plot one
    .append_statuses_over_time_data(STATUSES_INDEXED_BY_TIMESTAMPS)  # Append associated statuses
    .with_failure_definitions([DoubleFailureDefinition(fails_below=0.0, fails_above=1.0)])
    .with_start_time(Timestamp(secs=0))
    .with_end_time(Timestamp(secs=10))
    .with_y_axis_name("Error (%)")
    .with_legend_series_names(["Error"])
    .with_status(MetricStatus.PASSED_METRIC_STATUS)
    .with_importance(MetricImportance.HIGH_IMPORTANCE)
    .with_tag("err_count", "32")
    .with_should_display(True)
    .with_blocking(False)
)
```
This will write a double over time metric to our output, with all the desired data and properties. As a rule of thumb, most of the parameters described in the Metrics Types docs can be set through the fluent API by prepending `with_` (or `append_`, for list-valued fields).
NB: Notice that the `MetricsData` we used does not necessarily have to be written to the metrics writer itself. Provided the top-level metric referencing that data is written to the metrics writer, any associated data will also be written. For example, even the `TIMESTAMPS` data will (transitively) be written to our output file by the code above.
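To make that relationship concrete, the data above could have been wired together roughly as follows. `SeriesMetricsData`, its import path, and its keyword arguments are assumptions about the metrics library rather than something defined on this page:

```python
import numpy as np

from resim.metrics.python.metrics import SeriesMetricsData  # assumed import path
from resim.metrics.python.metrics_utils import Timestamp    # assumed import path

# Hypothetical construction: the error series declares TIMESTAMPS as its index.
TIMESTAMPS = SeriesMetricsData(
    name="Timestamps",
    series=np.array([Timestamp(secs=s) for s in range(10)]),
)
ERROR_INDEXED_BY_TIMESTAMPS = SeriesMetricsData(
    name="Localization error",
    series=np.random.default_rng(0).random(10),
    unit="%",
    index=TIMESTAMPS,
)

# Because the metric added earlier references ERROR_INDEXED_BY_TIMESTAMPS, and that
# series references TIMESTAMPS as its index, both end up in the output without any
# explicit add_metrics_data() calls.
```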
## Writing metrics data
You can also manually write `MetricsData` to the `metrics_writer` using:

```python
metrics_writer.add_metrics_data(TIMESTAMPS)
```
This is useful because you may want to write data that is not immediately referenced by any metric, for example data consumed by batch metrics. The `MetricsData` class also has a simple fluent API to make this easier.
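As a sketch of that fluent style (the `with_series` and `with_unit` setters shown here are assumptions about `SeriesMetricsData`; check the class for its exact API):

```python
import numpy as np

from resim.metrics.python.metrics import SeriesMetricsData  # assumed import path

# Hypothetical standalone series, built with fluent setters and registered directly,
# so it is written to the output even though no metric references it yet.
speed_data = (
    SeriesMetricsData(name="Vehicle speed")
    .with_series(np.array([0.0, 1.2, 2.4]))  # assumed setter
    .with_unit("m/s")                        # assumed setter
)
metrics_writer.add_metrics_data(speed_data)
```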
## Overriding the metrics status
By default, the `writer.write()` function constructs a protobuf message and computes an overall job status from the individual metrics. The logic is fairly simple (see the sketch after this list):
- if any metric is a blocking failure (`FAIL_BLOCK_METRIC_STATUS`), the overall job is considered a blocking failure;
- otherwise, if any metric is a warning (`FAIL_WARN_METRIC_STATUS`), the overall job is considered a warning failure;
- otherwise, the overall job is considered a pass.
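In code, those rules amount to roughly the following; this is an illustration of the precedence, not the writer's actual implementation:

```python
from resim.metrics.python.metrics_utils import MetricStatus  # assumed import path


def rollup_job_status(metric_statuses: list[MetricStatus]) -> MetricStatus:
    # Blocking failures dominate, then warnings; otherwise the job passes.
    if MetricStatus.FAIL_BLOCK_METRIC_STATUS in metric_statuses:
        return MetricStatus.FAIL_BLOCK_METRIC_STATUS
    if MetricStatus.FAIL_WARN_METRIC_STATUS in metric_statuses:
        return MetricStatus.FAIL_WARN_METRIC_STATUS
    return MetricStatus.PASSED_METRIC_STATUS
```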
It is, however, possible to override that calculation with `writer.write(metrics_status_override=FAIL_BLOCK_METRIC_STATUS)`, or whatever status you wish. This is particularly useful if, for example, you would like to ignore warnings.
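For example, to force a blocking failure regardless of what the individual metrics reported (the `MetricStatus` import path is an assumption; adjust it to wherever the enum lives in your setup):

```python
from resim.metrics.python.metrics_utils import MetricStatus  # assumed import path

# Override the computed job status when writing out the metrics message.
metrics_output = metrics_writer.write(
    metrics_status_override=MetricStatus.FAIL_BLOCK_METRIC_STATUS
)
```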
## Validating the metrics writer output
It's important to validate your output, to check that it is a valid protobuf message that our system can plot. We currently provide this through the `validate_job_metrics` function in `resim.metrics.proto.validate_metrics_proto`. Note that this function is called on the output metrics message of the metrics writer (i.e. `metrics_output.metrics_msg`), not on the metrics writer itself.
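If you want to double-check the file you wrote to disk, you can also parse it back and validate the parsed message. The `metrics_pb2` module name and the `JobMetrics` message used here are assumptions about the generated protobuf code, so verify them against your installation:

```python
from resim.metrics.proto import metrics_pb2  # assumed generated module
from resim.metrics.proto.validate_metrics_proto import validate_job_metrics

# Re-read the serialized file and validate the parsed message, not the writer.
with open("/tmp/resim/outputs/metrics.binproto", "rb") as f:
    job_metrics = metrics_pb2.JobMetrics.FromString(f.read())
validate_job_metrics(job_metrics)
```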