compute: support labelling spot instance as boolean #263

Closed
jjo opened this issue Jul 22, 2024 · 3 comments

Comments

@jjo
Contributor

jjo commented Jul 22, 2024

What

Support a flag such as --compute.spot-as-boolean-label=LABEL_NAME (e.g. with LABEL_NAME=is_spot) to set an additional boolean label on compute metrics.

Why

Running on a Spot instance is a choice made by the "upper"-level workloads (by setting the nodeSelectors/taints/labels that trigger CA to allocate such nodes for them), and those workloads explicitly know the reward/risk trade-offs of spot. This is quite different from ondemand vs RIs (granted, the latter are the only ones guaranteed to be available).

At Grafana Labs, our recording rules have grown to rely on spot={true|false}, which now forces us to use the relabelling trick below (really label-adding) to set it:

(
  label_replace(<CSP>_instance_cpu_usd_per_core_hour{price_tier="spot"}, "spot", "true", "", "")
  or
  label_replace(<CSP>_instance_cpu_usd_per_core_hour{price_tier!="spot"}, "spot", "false", "", "")
)

The above needs to be done for every <CSP> you may run (especially if you have a centralized TSDB). Note also that you need to visit every time series to perform this relabelling (adding to TSDB pressure), creating the two sets of time series and then OR-ing them.
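To make the cost of the query-side trick concrete, here is a rough Go sketch of what the expression above does to each series: two selection passes, one label added per series, then a union. The `Series` type and the function names are hypothetical and only model the behaviour; this is not how Prometheus implements it.

```go
package main

import "fmt"

// Series is a hypothetical stand-in for the label set of one time series.
type Series map[string]string

// withLabel returns a copy of s with one extra label, mirroring what
// label_replace(v, name, value, "", "") does to each series in v.
func withLabel(s Series, name, value string) Series {
	out := make(Series, len(s)+1)
	for k, v := range s {
		out[k] = v
	}
	out[name] = value
	return out
}

// addSpotLabel mimics the query: select price_tier="spot" and tag it "true",
// select price_tier!="spot" and tag it "false", then union the two sets.
func addSpotLabel(in []Series) []Series {
	var spot, rest []Series
	for _, s := range in { // first pass: {price_tier="spot"}
		if s["price_tier"] == "spot" {
			spot = append(spot, withLabel(s, "spot", "true"))
		}
	}
	for _, s := range in { // second pass: {price_tier!="spot"}
		if s["price_tier"] != "spot" {
			rest = append(rest, withLabel(s, "spot", "false"))
		}
	}
	return append(spot, rest...) // the "or" of two disjoint sets
}

func main() {
	in := []Series{
		{"instance": "a", "price_tier": "spot"},
		{"instance": "b", "price_tier": "ondemand"},
	}
	for _, s := range addSpotLabel(in) {
		fmt.Println(s["instance"], s["price_tier"], s["spot"])
	}
}
```

Every input series is visited and copied at query time, which is the overhead the request wants to avoid by having the exporter emit the label directly.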

@logyball
Contributor

logyball commented Jul 22, 2024

This feels like a fine thing to add. It does expand the set of labels for compute instances by one per metric, but it doesn't add cardinality, and it will probably be convenient down the line. I don't even think we need this as a config option, to be honest; we can just straight-up add a spot boolean where:

  • ondemand and reserved instances (and any other types that cloud providers decide to create in the future) set it to false
  • spot instances set it to true
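The rule in the two bullets above could be sketched exporter-side roughly as follows. This is a minimal illustration, not code from the project: `spotLabelValue` is a hypothetical helper, and the "reserved" tier string is an assumed value (only "spot" and "ondemand" appear in this thread).

```go
package main

import (
	"fmt"
	"strconv"
)

// spotLabelValue is a hypothetical helper that derives the proposed boolean
// "spot" label from an instance's price tier. Only the "spot" tier maps to
// "true"; every other tier, including ones that do not exist yet, maps to
// "false".
func spotLabelValue(priceTier string) string {
	return strconv.FormatBool(priceTier == "spot")
}

func main() {
	// "reserved" is an assumed tier name used purely for illustration.
	for _, tier := range []string{"spot", "ondemand", "reserved"} {
		fmt.Printf("price_tier=%q spot=%q\n", tier, spotLabelValue(tier))
	}
}
```

Because the value is a pure function of price_tier, each existing series gains exactly one label value and no new series are created, which is why cardinality stays the same.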

@Pokom
Contributor

Pokom commented Jul 24, 2024

@jjo and I synced up on this yesterday and came to the agreement that the best thing to do here is to simply add a new label called spot that is a boolean value. This label is more of a relic of how we internally compute our TCO metrics, but there's potential value for others as well.

The rationale for a new label is:

  1. Minimal overhead added to each time series
  2. Reduced complexity by avoiding an operational toggle

@jjo
Contributor Author

jjo commented Sep 2, 2024

Closing after syncing with @the-it on fully embracing price_tier instead.

jjo closed this as completed Sep 2, 2024