How to augment a Prometheus query result with the labels of another metric that may not always be there
This PromQL join trick does not appear to be documented anywhere (that I could find).
The quest
Suppose you have two Prometheus metrics that share a subset of labels:
- series from the first metric give you a value
- series from the second metric have labels you'd like to use in your query results
Augmenting the first series with labels from the second metric is normally trivial to accomplish in Prometheus.
Let's take a typical example from node exporter. Here are two data points, with their labelsets, for the temperature of two systems:
node_hwmon_temp_celsius{chip="i2c_6_6_002f", instance="penny", job="node", location="home", sensor="temp5"}
node_hwmon_temp_celsius{chip="pci0000:00_0000:00:18_3", instance="roxanne", job="node", location="home", sensor="temp3"}
Suppose that, to these results, you'd like to add each sensor's label (such as tccd1). Handily for you, there is a metric named node_hwmon_sensor_label which has exactly what you want. Here's an example:
node_hwmon_sensor_label{chip="pci0000:00_0000:00:18_3", instance="roxanne", job="node", label="tccd1", location="home", sensor="temp3"}
Cool, huh? So now all you gotta do is the typical left join query:
# PromQL query
node_hwmon_temp_celsius + on (instance, chip, sensor) group_left(label) (0 * node_hwmon_sensor_label)
What we did here is simply snag the label named "label" from node_hwmon_sensor_label and add it to the labelset of the metric node_hwmon_temp_celsius. Multiplying the right-hand side by 0 means the addition leaves the temperature value untouched. This works because there is a 1:1 correspondence (on instance, chip and sensor) between the metric on the left and the metric on the right.
Or is there?
The snag
Well, not so fast. When you do this query (at least with the above data points) you'll clearly see that "label" appears in the labelset of the result, but you'll also note some of your temperatures are missing!
{chip="pci0000:00_0000:00:18_3", instance="roxanne", job="node", label="tccd1", location="home", sensor="temp3"}
What! What happened? Simple, actually. After a bit of digging, you realize there is no node_hwmon_sensor_label series for the instance penny with a matching chip and sensor.
In sum: our assumption of a 1:1 correspondence is false. There is at most one node_hwmon_sensor_label for each node_hwmon_temp_celsius, but there may very well be zero. Since the join drops any data point whose labelset doesn't find a match on both sides, your metrics are gone.
You can see how that's a problem in an alert query, right? Some key data points are missing from your query result. What happens if the CPU in penny cooks to death? You would not find out through Prometheus, that's what happens.
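If you want to see which temperature series would be dropped this way, a small diagnostic query can help. This is only a sketch that reuses the two metric names and the three matching labels from the example above; it assumes nothing beyond those:

```promql
# PromQL query
# Temperature series that have NO matching node_hwmon_sensor_label
node_hwmon_temp_celsius
unless on (instance, chip, sensor) node_hwmon_sensor_label
```

With the two sample data points shown earlier, this should return only the temp5 series from penny, since roxanne's temp3 sensor does have a label series.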
The fix
It's actually fairly easy to fix, if you are willing to pay for a small increase in complexity of your query. I'll first show the query that fixes the issue, then I'll explain. Here goes:
# PromQL query
node_hwmon_temp_celsius
+ on (instance, chip, sensor) group_left(label) (
  0 * label_replace(
    (
      label_join(node_hwmon_temp_celsius, "label", "-", "chip", "sensor")
        unless ignoring(label) (node_hwmon_sensor_label)
      or node_hwmon_sensor_label
    ),
    "__name__", "node_hwmon_sensor_label", "__name__", ".*"
  )
)
OK, so what are we doing, explained in English? We have replaced the right-hand side of the join (the expression that gets multiplied by 0) with a query that goes as follows:
- From the temperature metric, make me a fake "label" label out of the chip and the sensor labels...
  - ...unless there is already a sensor label metric that matches the labelset of the temperature metric
- Or simply return the real sensor label metric.
  - This "or" works because the "unless" part of the query produces metrics with fake labels only whenever there are no metrics with real labels.
- Finally (in the outermost part of the query) replace the __name__ of the resulting metric with node_hwmon_sensor_label, so that all returned metrics have a consistent name.
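To see what each piece contributes, the inner expression can be built up one step at a time. The sketch below only rearranges sub-expressions of the query above; run each step as a separate query, since they are shown together purely for comparison:

```promql
# Step 1: every temperature series, with a fake "label" built as <chip>-<sensor>
label_join(node_hwmon_temp_celsius, "label", "-", "chip", "sensor")

# Step 2: keep only the fake-labelled series that have no real counterpart
label_join(node_hwmon_temp_celsius, "label", "-", "chip", "sensor")
  unless ignoring(label) (node_hwmon_sensor_label)

# Step 3: add the real sensor label series back in
(
  label_join(node_hwmon_temp_celsius, "label", "-", "chip", "sensor")
    unless ignoring(label) (node_hwmon_sensor_label)
)
or node_hwmon_sensor_label
```

With the two sample data points from earlier, step 1 should return a series for both penny and roxanne, step 2 should keep only penny's (roxanne has a real label series, so its fake one is filtered out), and step 3 should return penny's fake series plus roxanne's real one.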
In effect, what we are doing here is making Prometheus construct a totally fake node_hwmon_sensor_label whenever there isn't one to match a node_hwmon_temp_celsius. So, from maybe one, maybe zero node_hwmon_sensor_label for each node_hwmon_temp_celsius, now we've gone to a guaranteed 1:1 match between the two sides of the join.
And with that, the new query's result is clear:
{chip="pci0000:00_0000:00:18_3", instance="roxanne", job="node", label="tccd1", location="home", sensor="temp3"}
{chip="i2c_6_6_002f", instance="penny", job="node", label="i2c_6_6_002f-temp5", location="home", sensor="temp5"}
Note how the result for penny now contains a fake "label" with value i2c_6_6_002f-temp5, even though clearly there is no corresponding node_hwmon_sensor_label for this instance's sensor and chip.
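Since the worry was about alerts, here is a rough sketch of how the fixed join might sit inside an alert expression. The 90 degree threshold is an arbitrary placeholder chosen for illustration, not a recommendation, and is not part of the original query:

```promql
# PromQL query (the threshold of 90 is a made-up example value)
(
  node_hwmon_temp_celsius
  + on (instance, chip, sensor) group_left(label) (
    0 * label_replace(
      (
        label_join(node_hwmon_temp_celsius, "label", "-", "chip", "sensor")
          unless ignoring(label) (node_hwmon_sensor_label)
        or node_hwmon_sensor_label
      ),
      "__name__", "node_hwmon_sensor_label", "__name__", ".*"
    )
  )
) > 90
```

Because the comparison filters the joined result, every temperature series is evaluated against the threshold, whether its "label" is real or fake.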
Nifty, eh?