compute: support labelling spot instance as boolean #263

jjo · 2024-07-22T18:50:54Z

What

Support a flag such as --compute.spot-as-boolean-label=LABEL_NAME (e.g. with LABEL_NAME=is_spot) to set such an additional boolean label on spot instances.
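As an illustration only, here is a minimal Go sketch of how such an opt-in flag could behave. Everything except the flag name and the is_spot example is an assumption: the thread shows no exporter code, and the helper addSpotLabel is hypothetical. It assumes the decision is derived from the existing price_tier label. The sketch uses Go's standard flag package, which accepts the --name=value form.

```go
package main

import (
	"flag"
	"fmt"
)

// addSpotLabel is a HYPOTHETICAL helper. It returns a copy of labels with an
// extra boolean label named labelName: "true" when price_tier is "spot",
// "false" otherwise. An empty labelName means the flag is unset, so the
// labels are returned unchanged.
func addSpotLabel(labels map[string]string, labelName string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	if labelName == "" {
		return out
	}
	if labels["price_tier"] == "spot" {
		out[labelName] = "true"
	} else {
		out[labelName] = "false"
	}
	return out
}

func main() {
	// Flag name taken from the proposal; default "" keeps the feature off.
	fs := flag.NewFlagSet("exporter", flag.ExitOnError)
	spotLabel := fs.String("compute.spot-as-boolean-label", "",
		"if set, add a boolean label with this name to compute metrics")
	_ = fs.Parse([]string{"--compute.spot-as-boolean-label=is_spot"})

	labels := map[string]string{"price_tier": "spot"}
	fmt.Println(addSpotLabel(labels, *spotLabel)) // map[is_spot:true price_tier:spot]
}
```

Making the label name configurable lets each deployment match its existing recording rules (spot, is_spot, ...), at the cost of an extra operational toggle.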

Why

Running on a spot instance is a choice made by the "upper"-level workloads (by setting nodeSelectors/taints/labels that trigger the Cluster Autoscaler (CA) to allocate such specific nodes to them), with those workloads explicitly aware of the rewards and risks of spot. This is quite different from the on-demand vs. reserved instance (RI) distinction (granted, RIs are the only ones guaranteed to be available).

At Grafana Labs, our recording rules have grown to use spot={true|false}, which now forces us to use the relabeling trick below (actually adding a label) to set it:

(
  label_replace(<CSP>_instance_cpu_usd_per_core_hour{price_tier="spot"}, "spot", "true", "", "")
  or
  label_replace(<CSP>_instance_cpu_usd_per_core_hour{price_tier!="spot"}, "spot", "false", "", "")
)

The above needs to be done for every CSP you may run on (especially if you have a centralized TSDB). Note also that this relabeling has to visit every time series (which matters when the TSDB is already under pressure), creating two sets of time series only to then OR them.


logyball · 2024-07-22T21:14:14Z

To me, this issue seems like a fine one to add. It does expand the set of labels on compute instance metrics by one per metric, but it doesn't add cardinality (its value follows directly from price_tier), and it will probably be convenient down the line. To be honest, I don't even think we need this as a config; we can just add a spot boolean label where:

ondemand and reserved instances (and any other types that cloud providers decide to create in the future) set it to false
spot instances set it to true
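The rule above can be sketched as a single Go function. This is a minimal illustration under assumptions: spotValue is a hypothetical name, and the tier strings other than "spot" (which appears in the price_tier="spot" query above) are example values, not a confirmed list of what the exporter emits.

```go
package main

import "fmt"

// spotValue is a HYPOTHETICAL helper implementing the rule above:
// spot instances get "true"; on-demand, reserved, and any tier a cloud
// provider might introduce later fall through to "false".
func spotValue(priceTier string) string {
	if priceTier == "spot" {
		return "true"
	}
	return "false"
}

func main() {
	// "some-future-tier" stands in for a tier that does not exist yet.
	for _, tier := range []string{"spot", "ondemand", "reserved", "some-future-tier"} {
		fmt.Printf("price_tier=%q -> spot=%q\n", tier, spotValue(tier))
	}
}
```

Matching only "spot" and defaulting everything else to false means a newly introduced tier needs no code change to get a valid label, which is the behavior described for future instance types.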

Pokom · 2024-07-24T12:45:13Z

@jjo and I synced up on this yesterday and came to the agreement that the best thing to do here is to simply add a new label called spot that is a boolean value. This label is more of a relic of how we internally compute our TCO metrics, but there's potential value for others as well.

The rationale for a new label is:

Minimal overhead added to each time series
Reduced complexity by avoiding an operational toggle

jjo · 2024-09-02T19:21:56Z

Closing after syncing with @the-it on fully embracing price_tier instead.

jjo mentioned this issue Jul 23, 2024

compute: allow "instance" label name to be set (behind a flag) #266

Closed

jjo closed this as completed Sep 2, 2024
