Epic: Create the pool of pre-allocated VMs #115

Closed
2 of 3 tasks
vadim2404 opened this issue Mar 31, 2023 · 7 comments

vadim2404 commented Mar 31, 2023

Motivation

To reduce start-up time for VMs (currently, it's ~4 seconds), we need to have several pre-warmed VMs and be able to change min/max boundaries at runtime.

DoD

Start-up time for 99% of cases is < 1 sec

Tasks:

Related epics/tasks/PRs:

@vadim2404 vadim2404 added the t/Epic Issue type: Epic label Mar 31, 2023
@vadim2404 vadim2404 added this to the 2023/04 milestone Mar 31, 2023
@vadim2404 vadim2404 modified the milestones: 2023/04, 2023/05 Mar 31, 2023
ololobus (Member) commented Mar 31, 2023

I don't think that we have to do it specifically for VMs only. There are at least two things that prevent doing it only on the autoscaler / neonvm side.

  1. Compute versions: they are picked based on storage node versions, so only the control-plane knows which version of the pre-created compute is needed. After a storage release there will always be pre-created computes left with a stale version, so the control-plane needs to tear them down and spawn fresh ones.

  2. compute_ctl needs a JWT and an id (compute_id) to be able to get the spec from the control-plane. These two are also known only to the control-plane, as is the future compute_id<->endpoint_id binding.

Thus, as I imagined it and discussed primarily with @tychoish and @kelvich, the pool of pre-created 'whatever' (pods, VMs, or any other custom resource in the future) is maintained by the control-plane. From the compute we only need an interface to 'wake it up' and notify it that it now serves some particular timeline. I added some PoC compute_ctl code in this PR: neondatabase/neon#3923
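
For illustration only, here is a minimal Go sketch of that 'wake up' flow from the control-plane side. The endpoint path, port, headers, and payload fields are assumptions made for the sketch, not the actual compute_ctl API from the PoC PR:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// attachRequest is a hypothetical payload the control-plane could send to a
// pre-created compute to bind it to a concrete endpoint / timeline. The field
// names are illustrative, not the real compute_ctl API.
type attachRequest struct {
	ComputeID  string `json:"compute_id"`
	EndpointID string `json:"endpoint_id"`
	TimelineID string `json:"timeline_id"`
}

// wakeUpCompute notifies a pre-created compute that it now serves a particular
// timeline. The "/attach" path and the use of a bearer JWT are assumptions.
func wakeUpCompute(computeAddr, jwt string, req attachRequest) error {
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	httpReq, err := http.NewRequest(http.MethodPost,
		fmt.Sprintf("http://%s/attach", computeAddr), bytes.NewReader(body))
	if err != nil {
		return err
	}
	httpReq.Header.Set("Authorization", "Bearer "+jwt)
	httpReq.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status from compute: %s", resp.Status)
	}
	return nil
}

func main() {
	// Placeholder values; in reality the control-plane would pick a free
	// compute from the pool and use its real address, JWT, and ids.
	if err := wakeUpCompute("10.0.0.5:3080", "<jwt>", attachRequest{
		ComputeID:  "compute-abc",
		EndpointID: "ep-xyz",
		TimelineID: "1234",
	}); err != nil {
		fmt.Println("wake-up failed:", err)
	}
}
```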

So the only required part from the autoscaler perspective is:

be able to change min/max boundaries at runtime

And as we discussed with @sharnoff, we will likely need to switch to using labels / annotations for that, since those are usually the only things one can change on k8s objects without re-creation / restart.
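
A minimal sketch of that approach with client-go, assuming an in-cluster config; the NeonVM GroupVersionResource and the annotation keys below are placeholders, not the project's actual names:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes this runs inside the cluster (e.g. in an operator or
	// control-plane pod); use clientcmd with a kubeconfig otherwise.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed GroupVersionResource for the NeonVM custom resource; adjust to
	// whatever the real CRD registers.
	gvr := schema.GroupVersionResource{
		Group:    "vm.neon.tech",
		Version:  "v1",
		Resource: "virtualmachines",
	}

	// Hypothetical annotation keys for the min/max bounds; the real keys would
	// be whatever the autoscaler agrees to watch. A merge patch on
	// metadata.annotations does not recreate or restart the object.
	patch := []byte(`{"metadata":{"annotations":{"autoscaling.example/min-cu":"1","autoscaling.example/max-cu":"4"}}}`)

	_, err = client.Resource(gvr).Namespace("default").Patch(
		context.TODO(), "example-vm", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("updated autoscaling bounds without restarting the VM")
}
```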

As for the DoD

Start-up time for 99% of cases is < 1 sec

I don't think it should be in the Epic, but rather in the PReq, as it depends on way too many code paths in all components. E.g. we already have some p99 outliers shortly after pageserver restarts. The control-plane could be sloppy and do many sub-optimal movements. And so on.

That said, I don't think that this target is realistic for a first iteration. Something like p90 or even p80 is more realistic, as any really huge spike from Hasura will likely hit the k8s node autoscaler and p99 will be ~20s-1min.

@vadim2404 (Author)

We discussed this item with you, and it requires some work on our side as well; therefore the epic was created.

be able to change min/max boundaries at runtime
If that's enough, then who would be against it? :)

That said, I don't think that this target is realistic for a first iteration. Something like p90 or even p80 is more realistic, as any really huge spike from Hasura will likely hit the k8s node autoscaler and p99 will be ~20s-1min.

I don't think that we need to stop after the first iteration. The ultimate goal is to have a really quick start-up for computes, so I don't want to discount the ultimate goal from the very beginning.

@sharnoff (Member)

Was going to unassign myself because it seemed like the remaining work was in https://github.com/neondatabase/cloud/pull/4741, but given this:

That said, I don't think that this target is realistic for a first iteration. Something like p90 or even p80 is more realistic, as any really huge spike from Hasura will likely hit the k8s node autoscaler and p99 will be ~20s-1min.

I don't think that we need to stop after the first iteration. The ultimate goal is to have a really quick start-up for computes, so I don't want to discount the ultimate goal from the very beginning.

I added #77 to the task list, and am keeping myself assigned :)

However: I think the DoD does not match the issue title, so one of them should probably change.

@ololobus (Member)

@sharnoff am I right that we can now change autoscaling limits on NeonVM without restart?

@sharnoff (Member)

In theory we can, but there are still some bugs that make it unusable in practice (#249, #252). Happy to prioritize those if it'd be useful. So far, my understanding was that the compute pool would just be creating VMs with their target sizes for now.

@ololobus (Member)

the compute pool would just be creating VMs with their target sizes for now

If we do a pool matrix for all (or even some) combinations of min/max, there will be just too many dimensions. I think that we will end up with the standard free-tier size (1/4) for the beginning. We can gather stats on the most common compute sizes for pods (obviously 1/4) and VMs (not sure).

So I think that we absolutely need this for moving everyone to VMs (that's where start time is much worse), but in terms of min/max, VMs are not worse than pods now (we cannot patch pods without a restart either), so I don't think it's urgent.

cc @tychoish @nikitakalyanov just in case

@sharnoff (Member)

Closing as completed because it's (broadly) been implemented on the control plane side, which was the last remaining item here. See also https://github.com/neondatabase/cloud/pull/4741#issuecomment-1644267920

@stepashka stepashka added the c/compute Component: compute label Jun 21, 2024