
instancestorage: asynchronously pre-allocate instance IDs in sql_instances #90427

Merged

Conversation

jaylim-crl
Collaborator

@jaylim-crl jaylim-crl commented Oct 21, 2022

Related to #85737.

Previously, when a SQL pod started up, we allocated an instance ID for it
synchronously. This can be a problem in a multi-region setup because it would
involve a read and a write across all regions. This commit is part of the
work to make the system.sql_instances table REGIONAL BY ROW to reduce cold
start times. Here, we asynchronously pre-allocate instance IDs, and the
startup process then claims an ID for the SQL pod. Once the sql_instances
table is made REGIONAL BY ROW, we can update the logic to pre-allocate for
every region, and only perform a regional lookup when claiming an ID.

At the same time, we now use a cached reader of the sqlLivenessProvider
with the SQL instance storage. This is fine since the cached reader errs on
the conservative side, returning true from IsAlive when it does not know. The
only downside is that instance rows linger in the sql_instances table longer,
but that is fine since they will eventually be reclaimed once the cache is
updated.

Epic: None

Release note (sql change): The system.sql_instances table now includes
pre-allocated ID entries, where all the fields except id will be NULL.


@jaylim-crl
Collaborator Author

Making this a draft for now. Will need to clean up tests. Note that this change should be backwards compatible since empty rows are essentially instance rows with expired sessions.

@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch 4 times, most recently from b4f8c41 to daed016 Compare October 31, 2022 04:03
@jaylim-crl jaylim-crl marked this pull request as ready for review October 31, 2022 08:53
@jaylim-crl jaylim-crl requested review from a team as code owners October 31, 2022 08:53
Contributor

@ajwerner ajwerner left a comment


I want to give this a closer read, but it's a nice change.

Reviewed 3 of 10 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jaylim-crl and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 38 at r1 (raw file):

	// AllocateLoopFrequency is the frequency at which the background task
	// that allocates instance rows will run.
	AllocateLoopFrequency = 30 * time.Second

Make this a cluster setting. Also, when you use this, jitter it +/- 15%. Lastly, I think you can increase the duration substantially. If this ran once every 10 minutes I think that that would be fine. If the worry is that SQL pods don't last that long, we could use a shortage of IDs when we claim as a signal to kick the loop to run. Like if we find that there are fewer than some number in the current region, then we could kick off the loop.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 202 at r1 (raw file):

	}

	// We are done. All the requested IDs have already been pre-allocated.

Another possible design approach here might be to:

  1. Kick off the task to re-assign IDs before we attempt to get one, but have it wait before running.
  2. Return an error in this case that results in this code poking that loop to attempt to reclaim some IDs, and maybe have it notify back when it has done so
  3. Just retry to get an ID in the local region.

The advantage to that approach is that there only then needs to be one piece of code to re-assign IDs. WDYT?


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 345 at r1 (raw file):

		defer timer.Stop()

		// Fire timer immediately.

I don't think it's a good idea to start the reallocation immediately upon launch. We should wait a bit.

Code quote:

		// Fire timer immediately.
		timer.Reset(0)

@jaylim-crl jaylim-crl requested a review from ajwerner November 1, 2022 17:48
Collaborator Author

@jaylim-crl jaylim-crl left a comment


TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 38 at r1 (raw file):

Previously, ajwerner wrote…

Make this a cluster setting. Also, when you use this, jitter it +/- 15%. Lastly, I think you can increase the duration substantially. If this ran once every 10 minutes I think that that would be fine. If the worry is that SQL pods don't last that long, we could use a shortage of IDs when we claim as a signal to kick the loop to run. Like if we find that there are fewer than some number in the current region, then we could kick off the loop.

That's acceptable. I have this as a TODO above, but I'll just do it in this PR.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 202 at r1 (raw file):

Previously, ajwerner wrote…

Another possible design approach here might be to:

  1. Kick off the task to re-assign IDs before we attempt to get one, but have it wait before running.
  2. Return an error in this case that results in this code poking that loop to attempt to reclaim some IDs, and maybe have it notify back when it has done so
  3. Just retry to get an ID in the local region.

The advantage to that approach is that there only then needs to be one piece of code to re-assign IDs. WDYT?

I thought about this, and I think one disadvantage to that approach is that we'd incur an additional read in the local region, which I was trying to avoid. The idea behind the current approach in this PR is that instance IDs are mainly allocated asynchronously, but if none are available, we allocate synchronously rather than making callers wait on the background loop. I'd prefer this approach to keep waiting minimal.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 345 at r1 (raw file):

Previously, ajwerner wrote…

I don't think it's a good idea to start the reallocation immediately upon launch. We should wait a bit.

How long is a bit here? If we did the other suggestion of triggering the allocation loop when the number of IDs is lower than a certain threshold during claiming, would that be acceptable? (In theory that's somewhat similar to starting the reallocation immediately.) The reason behind this is that we want to make sure the IDs are mostly allocated during the resumption of the first SQL pod right after tenant creation, and we definitely don't want to keep the first pod around for 10 minutes.

Collaborator

@jeffswenson jeffswenson left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 202 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

I thought about this, and I think one disadvantage to that approach is that we'd incur an additional read in the local region, which I was trying to avoid. The idea behind the current approach in this PR is that instance IDs are mainly allocated asynchronously, but if none are available, we allocate synchronously rather than making callers wait on the background loop. I'd prefer this approach to keep waiting minimal.

I like Andrew's suggestion of triggering pre-allocation if there are insufficient IDs and waiting for the pre-allocation to complete if there are no available IDs. It moves all ID allocation to a single code path, and it gracefully ensures there are likely to be pre-allocated IDs if we start many pods in a short time period. We don't have to poll for pre-allocation to complete: we could pass a channel to the background goroutine and have it close the channel once pre-allocation is done. So there is really no extra waiting introduced by using the dedicated pre-allocation logic.

Contributor

@ajwerner ajwerner left a comment


Related to a learning from last week, we should work to make this subsystem avoid intents and avoid long-running transactions. I think we want to change the reclaim loop to not hold any locks for any duration. To do this, I think the reclaim loop, to the extent possible, should first scan to find candidates to reclaim and then use a separate transaction to atomically perform the claim. One possible approach to pull that off would be to cput the write to allocate the ID to a region and have that atomically broadcast the same deletion to all regions.

The subtlety of making that transaction "good" is more motivation to have just one piece of code responsible for handing out IDs to regions.

I do think it'd be a good idea to pre-emptively kick off this logic soon after startup if we detect that there are not many IDs available in the current region. One possibility is to do something probabilistic, such that the maximal duration we'll wait is inversely proportional to the number of remaining IDs. That may be too fancy. So long as we ensure that the protocol doesn't run a risk of dangerous contention during execution, I'm more on board with accepting concurrent execution of these loops.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson)

@jeffswenson
Collaborator

jeffswenson commented Nov 2, 2022

Related to a learning from last week, we should work to make this subsystem avoid intents and avoid long-running transactions. I think we want to change the reclaim loop to not hold any locks for any duration. To do this, I think the reclaim loop, to the extent possible, should first scan to find candidates to reclaim and then use a separate transaction to atomically perform the claim. One possible approach to pull that off would be to cput the write to allocate the ID to a region and have that atomically broadcast the same deletion to all regions.

I was thinking about this last night, and instance ID allocation has a lower risk of livelock compared to sqlliveness. The sql_instances table has no heartbeats, so there are fewer transactions to conflict with the background job. I agree 10 minutes is a better background period for this, but I'm skeptical we need to get too fancy.

@jeffswenson
Collaborator

Something else to consider: as long as claiming an ID takes precedence over the background job, it is okay for the background job to acquire locks. So we can probably sidestep the risk by using transaction priorities.

@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch from daed016 to b0b3376 Compare November 4, 2022 11:22
Collaborator Author

@jaylim-crl jaylim-crl left a comment


TFTRs! This PR is ready for another round of review. The transaction that claims IDs is now marked as high priority.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 38 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

That's acceptable. I have this as a TODO above, but I'll just do it in this PR.

Done.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 202 at r1 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

I like Andrew's suggestion of triggering pre-allocation if there are insufficient IDs and waiting for the pre-allocation to complete if there are no available IDs. It moves all ID allocation to a single code path, and it gracefully ensures there are likely to be pre-allocated IDs if we start many pods in a short time period. We don't have to poll for pre-allocation to complete: we could pass a channel to the background goroutine and have it close the channel once pre-allocation is done. So there is really no extra waiting introduced by using the dedicated pre-allocation logic.

Done. We now trigger pre-allocation if there are insufficient IDs during an assignment, and retry accordingly. All ID reclamation operations were also moved into one place.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 345 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

How long is a bit here? If we did the other suggestion of triggering the allocation loop when the number of IDs is lower than a certain threshold during claiming, would that be acceptable? (In theory that's somewhat similar to starting the reallocation immediately.) The reason behind this is that we want to make sure the IDs are mostly allocated during the resumption of the first SQL pod right after tenant creation, and we definitely don't want to keep the first pod around for 10 minutes.

Removed the immediate firing of timer. We will do this automatically when there are insufficient IDs during an assignment.

@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch 3 times, most recently from 6f972c8 to ed4742e Compare November 4, 2022 14:15
@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch from ed4742e to a8c1aec Compare November 7, 2022 17:07
Collaborator

@jeffswenson jeffswenson left a comment


LGTM

Contributor

@ajwerner ajwerner left a comment


:lgtm: mod the minor note to back off in case we keep failing to find an ID.

Reviewed 1 of 9 files at r2, 2 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jaylim-crl and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 187 at r3 (raw file):

	}
	var err error
	for {

Can you add backoff to this loop given it could potentially spin? The situation in which I see it spinning is if there are no rows and the cached reader takes time to find out that sessions are dead.

@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch from a8c1aec to f4dc2a0 Compare November 7, 2022 21:53
Collaborator Author

@jaylim-crl jaylim-crl left a comment


TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 187 at r3 (raw file):

Previously, ajwerner wrote…

Can you add backoff to this loop given it could potentially spin? The situation in which I see it spinning is if there are no rows and the cached reader takes time to find out that sessions are dead.

Done.

@jaylim-crl
Collaborator Author

bors r=JeffSwenson,ajwerner

Failures for TestDockerCLI and TestComposeGSS are unrelated.

@craig
Contributor

craig bot commented Nov 8, 2022

Build succeeded:

@craig craig bot merged commit d8ebd29 into cockroachdb:master Nov 8, 2022
@jaylim-crl jaylim-crl deleted the jay/221021-sql-instances-preallocator branch November 30, 2022 16:55