How to augment a Prometheus query result with the labels of another metric that may not always be there

published Jan 14, 2023

This PromQL join trick does not appear to be documented anywhere (that I could find).

The quest

Suppose you have two Prometheus metrics that share a subset of labels:

- series from the first metric give you a value
- series from the second metric have labels you'd like to use in your query results

Augmenting the first series with labels from the second metric is normally trivial to accomplish in Prometheus.

Let's take a typical example from node exporter. Here are two data points, with their labelsets, for the temperature of two systems:

node_hwmon_temp_celsius{chip="i2c_6_6_002f", instance="penny", job="node", location="home", sensor="temp5"}
node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", instance="roxanne", job="node", location="home", sensor="temp3"}

Suppose that, to these results, you'd like to add the human-readable label of each sensor. Handily for you, there is a metric named node_hwmon_sensor_label which has exactly what you want, in a label named label. Here's an example:

node_hwmon_sensor_label{chip="pci0000:00_0000:00:18_3", instance="roxanne", job="node", label="tccd1", location="home", sensor="temp3"}

Cool, huh? So now all you gotta do is the typical left join query:

# PromQL query
node_hwmon_temp_celsius + on (instance, chip, sensor) group_left(label) (0 * node_hwmon_sensor_label)

What we did here is simply snag the label named label from node_hwmon_sensor_label and add it to the labelset of the metric node_hwmon_temp_celsius. Multiplying the right-hand side by 0 means the addition leaves the temperature value untouched. This works because there is a 1:1 correspondence (on instance, chip and sensor) between the metric on the left and the metric on the right.
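
To make the matching rule concrete, here is a toy Python model of that query (a sketch, not Prometheus code). Series are plain (labels, value) pairs, match_key and group_left_join are names I made up for the illustration, the temperature of 41.5 is invented, and the sensor label metric is assumed to have the value 1.

```python
# Toy model of:
#   temp + on (instance, chip, sensor) group_left(label) (0 * sensor_label)
# Nothing here is a Prometheus API; all names are hypothetical.

ON = ("instance", "chip", "sensor")


def match_key(labels, on=ON):
    """Project a labelset onto the labels listed in on(...)."""
    return tuple(labels[name] for name in on)


def group_left_join(left, right, copy=("label",)):
    """Add 0 * right to each left sample and copy the group_left labels over."""
    right_by_key = {match_key(labels): (labels, value) for labels, value in right}
    result = []
    for labels, value in left:
        partner = right_by_key.get(match_key(labels))
        if partner is None:
            continue  # no partner on the right: nothing is emitted
        right_labels, right_value = partner
        merged = dict(labels)
        for name in copy:
            merged[name] = right_labels[name]
        result.append((merged, value + 0 * right_value))
    return result


roxanne = {
    "chip": "pci0000:00_0000:00:18_3",
    "instance": "roxanne",
    "job": "node",
    "location": "home",
    "sensor": "temp3",
}
temps = [(roxanne, 41.5)]  # invented temperature
sensor_labels = [({**roxanne, "label": "tccd1"}, 1)]  # value assumed to be 1

joined = group_left_join(temps, sensor_labels)
print(joined)
```

The joined sample keeps the temperature as its value and gains the label copied from the right-hand side.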

Or is there?

The snag

Well, not so fast. When you do this query (at least with the above data points) you'll clearly see that label appears in the labelset of the result, but you'll also note some of your temperatures are missing!

{chip="pci0000:00_0000:00:18_3", instance="roxanne", job="node", label="tccd1", location="home", sensor="temp3"}

What! What happened? Simple, actually. After a bit of digging, you realize there is no node_hwmon_sensor_label series for the instance penny with a matching chip and sensor.

In sum: our assumption of a 1:1 correspondence is false. There is at most one node_hwmon_sensor_label for each node_hwmon_temp_celsius, but there may very well just be zero. Since joins will exclude any datapoint with a labelset that doesn't match both sides, your metrics are gone.

You can see how that's a problem in an alert query, right? Some key data points are missing from your query result. What happens if the CPU in penny cooks to death? You would not find out through Prometheus, that's what happens.
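
If you want to see which series are affected, a rough Python sketch of the comparison looks like this. It lists the temperature labelsets whose (instance, chip, sensor) key has no partner among the sensor label series, which is roughly what node_hwmon_temp_celsius unless on (instance, chip, sensor) node_hwmon_sensor_label would show you in Prometheus. The helper names match_key and unmatched are made up for the illustration.

```python
# Toy model: find left-hand labelsets with no partner on the right.
# All helper names are hypothetical; this is not a Prometheus API.

ON = ("instance", "chip", "sensor")


def match_key(labels):
    """Project a labelset onto the labels used for matching."""
    return tuple(labels[name] for name in ON)


def unmatched(left, right):
    """Return the left labelsets whose match key never appears on the right."""
    right_keys = {match_key(labels) for labels in right}
    return [labels for labels in left if match_key(labels) not in right_keys]


temps = [
    {"chip": "i2c_6_6_002f", "instance": "penny", "job": "node",
     "location": "home", "sensor": "temp5"},
    {"chip": "pci0000:00_0000:00:18_3", "instance": "roxanne", "job": "node",
     "location": "home", "sensor": "temp3"},
]
sensor_labels = [
    {"chip": "pci0000:00_0000:00:18_3", "instance": "roxanne", "job": "node",
     "label": "tccd1", "location": "home", "sensor": "temp3"},
]

missing = unmatched(temps, sensor_labels)
for labels in missing:
    print(labels["instance"], labels["chip"], labels["sensor"])
```

With the two data points from above, the only labelset left over is the one for penny, which is exactly the temperature that vanished from the join.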

The fix

It's actually fairly easy to fix, if you are willing to pay for a small increase in complexity of your query. I'll first show the query that fixes the issue, then I'll explain. Here goes:

# PromQL query
node_hwmon_temp_celsius
+ on (instance, chip, sensor) group_left(label) (
  0 * label_replace(
    (
      label_join(node_hwmon_temp_celsius, "label", "-", "chip", "sensor")
      unless ignoring(label) (node_hwmon_sensor_label)
      or node_hwmon_sensor_label
    ),
    "__name__", "node_hwmon_sensor_label", "__name__", ".*"
  )
)

OK, so what are we doing, explained in English? We have replaced the right-hand side of the join (the expression that gets multiplied by 0) with a query that goes as follows:

- From the temperature metric, make me a fake label named label out of the chip and the sensor labels...
  - ...unless there is already a sensor label metric that matches the labelset of the temperature metric.
- Or simply return the real sensor label metric.
  - This "or" works because the "unless" part of the query produces metrics with fake labels only when there are no metrics with real labels.
- Finally (in the outermost part of the query) replace the __name__ of the resulting metric with node_hwmon_sensor_label, so that all returned metrics have a consistent name.

In effect, what we are doing here is making Prometheus construct a totally fake node_hwmon_sensor_label whenever there isn't one to match a node_hwmon_temp_celsius. So, from maybe one, maybe zero node_hwmon_sensor_label for each node_hwmon_temp_celsius, now we've gone to a guaranteed 1:1 match between the two sides of the join.

And with that, the new query's result is clear:

{chip="pci0000:00_0000:00:18_3", instance="roxanne", job="node", label="tccd1", location="home", sensor="temp3"}
{chip="i2c_6_6_002f", instance="penny", job="node", label="i2c_6_6_002f-temp5", location="home", sensor="temp5"}

Note how the result for penny now contains a fake label with value i2c_6_6_002f-temp5, even though clearly there is no corresponding node_hwmon_sensor_label for this instance's sensor and chip.
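
The same reasoning can be checked outside Prometheus with a small Python sketch of the right-hand side. It is only a model of the unless / or logic under my own assumptions: with_fallback_labels and the other helper names are invented, the temperatures are made up, and the real sensor label metric is assumed to have the value 1.

```python
# Toy model of:
#   label_join(temp, "label", "-", "chip", "sensor")
#   unless ignoring(label) (sensor_label)
#   or sensor_label
# All helper names are hypothetical; this is not a Prometheus API.

ON = ("instance", "chip", "sensor")


def match_key(labels):
    """Project a labelset onto the labels used by the outer join."""
    return tuple(labels[name] for name in ON)


def key_ignoring_label(labels):
    """Compare on every label except "label", like ignoring(label)."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != "label"))


def with_fallback_labels(temps, sensor_labels):
    """Real sensor labels, plus a fake one for each temperature lacking one."""
    real_keys = {key_ignoring_label(labels) for labels, _ in sensor_labels}
    fakes = []
    for labels, value in temps:
        if key_ignoring_label(labels) in real_keys:
            continue  # "unless": a real label exists, so no fake is built
        fake = dict(labels)
        fake["label"] = "-".join([labels["chip"], labels["sensor"]])
        fakes.append((fake, value))
    # "or": keep the fakes and add every real sensor label series
    return fakes + list(sensor_labels)


penny = {"chip": "i2c_6_6_002f", "instance": "penny", "job": "node",
         "location": "home", "sensor": "temp5"}
roxanne = {"chip": "pci0000:00_0000:00:18_3", "instance": "roxanne",
           "job": "node", "location": "home", "sensor": "temp3"}

temps = [(penny, 38.0), (roxanne, 41.5)]  # invented temperatures
sensor_labels = [({**roxanne, "label": "tccd1"}, 1)]  # value assumed to be 1

right_side = with_fallback_labels(temps, sensor_labels)
for labels, _ in right_side:
    print(labels["instance"], labels["label"])
```

Every temperature series now has exactly one partner on the right-hand side, so the join has nothing left to drop.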

Nifty, eh?
