
instancestorage: asynchronously pre-allocate instance IDs in sql_instances #90427

Merged

Conversation

jaylim-crl
Collaborator

@jaylim-crl jaylim-crl commented Oct 21, 2022

Related to #85737.

Previously, when a SQL pod started up, we allocated an instance ID for it
synchronously. This can be a problem in a multi-region setup because it would
involve a read and a write across all regions. This commit is part of the
work to make the system.sql_instances table REGIONAL BY ROW to reduce cold
start times. Here, we asynchronously pre-allocate instance IDs, and the
startup process then claims an ID for the SQL pod. Once the sql_instances
table is made REGIONAL BY ROW, we can update the logic to pre-allocate for
every region, and only perform a regional lookup when claiming an ID.

At the same time, we now use a cached reader of the sqlLivenessProvider
with the SQL instance storage. This is fine since the cached reader errs on
the conservative side, returning true from IsAlive when it does not know. The
only downside is that instance rows linger in the sql_instances table longer,
but that is fine since they will eventually be reclaimed once the cache is
updated.

Epic: None

Release note (sql change): The system.sql_instances table now includes
pre-allocated ID entries, where all the fields except id will be NULL.


@jaylim-crl
Collaborator Author

Making this a draft for now. Will need to clean up tests. Note that this change should be backwards compatible since empty rows are essentially instance rows with expired sessions.

@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch 4 times, most recently from b4f8c41 to daed016 Compare October 31, 2022 04:03
@jaylim-crl jaylim-crl marked this pull request as ready for review October 31, 2022 08:53
@jaylim-crl jaylim-crl requested review from a team as code owners October 31, 2022 08:53
Contributor

@ajwerner ajwerner left a comment


I want to give this a closer read, but it's a nice change.

Reviewed 3 of 10 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jaylim-crl and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 38 at r1 (raw file):

	// AllocateLoopFrequency is the frequency at which the background task
	// that allocates instance rows will run.
	AllocateLoopFrequency = 30 * time.Second

Make this a cluster setting. Also, when you use this, jitter it +/- 15%. Lastly, I think you can increase the duration substantially. If this ran once every 10 minutes I think that that would be fine. If the worry is that SQL pods don't last that long, we could use a shortage of IDs when we claim as a signal to kick the loop to run. Like if we find that there are fewer than some number in the current region, then we could kick off the loop.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 202 at r1 (raw file):

	}

	// We are done. All the requested IDs have already been pre-allocated.

Another possible design approach here might be to:

  1. Kick off the task to re-assign IDs before we attempt to get one, but have it wait before running.
  2. Return an error in this case that results in this code poking that loop to attempt to reclaim some IDs, and maybe have it notify back when it has done so
  3. Just retry to get an ID in the local region.

The advantage to that approach is that there only then needs to be one piece of code to re-assign IDs. WDYT?


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 345 at r1 (raw file):

		defer timer.Stop()

		// Fire timer immediately.

I don't think it's a good idea to start the reallocation immediately upon launch. We should wait a bit.

Code quote:

		// Fire timer immediately.
		timer.Reset(0)

@jaylim-crl jaylim-crl requested a review from ajwerner November 1, 2022 17:48
Collaborator Author

@jaylim-crl jaylim-crl left a comment


TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 38 at r1 (raw file):

Previously, ajwerner wrote…

Make this a cluster setting. Also, when you use this, jitter it +/- 15%. Lastly, I think you can increase the duration substantially. If this ran once every 10 minutes I think that that would be fine. If the worry is that SQL pods don't last that long, we could use a shortage of IDs when we claim as a signal to kick the loop to run. Like if we find that there are fewer than some number in the current region, then we could kick off the loop.

That's acceptable. I have this as a TODO above, but I'll just do it in this PR.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 202 at r1 (raw file):

Previously, ajwerner wrote…

Another possible design approach here might be to:

  1. Kick off the task to re-assign IDs before we attempt to get one, but have it wait before running.
  2. Return an error in this case that results in this code poking that loop to attempt to reclaim some IDs, and maybe have it notify back when it has done so
  3. Just retry to get an ID in the local region.

The advantage to that approach is that there only then needs to be one piece of code to re-assign IDs. WDYT?

I thought about this, and I think one disadvantage to that approach is that we'd incur an additional read in the local region, which I was trying to avoid. The idea behind the current approach in this PR is that instance IDs are mainly allocated asynchronously, but if none are available, we allocate synchronously rather than making callers wait on the background loop. I'd prefer this approach to keep waiting minimal.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 345 at r1 (raw file):

Previously, ajwerner wrote…

I don't think it's a good idea to start the reallocation immediately upon launch. We should wait a bit.

How long is a bit here? If we did the other suggestion of triggering the allocation loop when the number of IDs is lower than a certain threshold during claiming, would that be acceptable? (In theory that's somewhat similar to starting the reallocation immediately.) The reason behind this is that we want to make sure the IDs are mostly allocated during the resumption of the first SQL pod right after tenant creation, and we definitely don't want to keep the first pod around for 10 minutes.

Collaborator

@jeffswenson jeffswenson left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 202 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

I thought about this, and I think one disadvantage to that approach is that we'd incur an additional read in the local region, which I was trying to avoid. The idea behind the current approach in this PR is that instance IDs are mainly allocated asynchronously, but if none are available, we allocate synchronously rather than making callers wait on the background loop. I'd prefer this approach to keep waiting minimal.

I like Andrew's suggestion of triggering pre-allocation if there are insufficient IDs and waiting for the pre-allocation to complete if there are no available IDs. It moves all ID allocation to a single code path, and it gracefully ensures there are likely to be pre-allocated IDs if we start many pods in a short time period. We don't have to poll for pre-allocation to complete: we could pass a channel to the background goroutine and have it close the channel once pre-allocation is done. So there is really no extra waiting introduced by using the dedicated pre-allocation logic.

Contributor

@ajwerner ajwerner left a comment


Related to a learning from last week, we should work to make this subsystem avoid intents and avoid long-running transactions. I think we want to change the reclaim loop to not hold any locks for any duration. To do this, I think the reclaim loop, to the extent possible, should first scan to find candidates to reclaim and then use a separate transaction to atomically perform the claim. One possible approach to pull that off would be to cput the write to allocate the ID to a region and have that atomically broadcast the same deletion to all regions.

The subtlety of making that transaction "good" is more motivation to have just one piece of code responsible for handing out IDs to regions.

I do think it'd be a good idea to pre-emptively kick off this logic soon after startup if we detect that there are not many IDs available in the current region. One possibility is to do something probabilistic, such that the maximal duration we'll wait is inversely proportional to the number of remaining IDs. That may be too fancy. So long as we ensure that the protocol doesn't run a risk of dangerous contention during execution, I'm more on board with accepting concurrent execution of these loops.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson)

@jeffswenson
Collaborator

jeffswenson commented Nov 2, 2022

Related to a learning from last week, we should work to make this subsystem avoid intents and avoid long-running transactions. I think we want to change the reclaim loop to not hold any locks for any duration. To do this, I think the reclaim loop, to the extent possible, should first scan to find candidates to reclaim and then use a separate transaction to atomically perform the claim. One possible approach to pull that off would be to cput the write to allocate the ID to a region and have that atomically broadcast the same deletion to all regions.

I was thinking about this last night, and instance ID allocation has a lower risk of livelock compared to sqlliveness. The sql_instances table has no heartbeats, so there are fewer transactions to conflict with the background job. I agree 10 minutes is a better background period for this, but I'm skeptical we need to get too fancy.

@jeffswenson
Collaborator

Something else to consider: as long as claiming an ID takes precedence over the background job, it is okay for the background job to acquire locks. So we can probably sidestep the risk by using transaction priorities.

@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch from daed016 to b0b3376 Compare November 4, 2022 11:22
Collaborator Author

@jaylim-crl jaylim-crl left a comment


TFTRs! This PR is ready for another round of review. The transaction that claims IDs is now marked as high priority.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 38 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

That's acceptable. I have this as a TODO above, but I'll just do it in this PR.

Done.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 202 at r1 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

I like Andrew's suggestion of triggering pre-allocation if there are insufficient IDs and waiting for the pre-allocation to complete if there are no available IDs. It moves all ID allocation to a single code path, and it gracefully ensures there are likely to be pre-allocated IDs if we start many pods in a short time period. We don't have to poll for pre-allocation to complete: we could pass a channel to the background goroutine and have it close the channel once pre-allocation is done. So there is really no extra waiting introduced by using the dedicated pre-allocation logic.

Done. We now trigger pre-allocation if there are insufficient IDs during an assignment, and retry accordingly. All ID reclamation operations were also moved into one place.


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 345 at r1 (raw file):

Previously, jaylim-crl (Jay Lim) wrote…

How long is a bit here? If we did the other suggestion of triggering the allocation loop when the number of IDs is lower than a certain threshold during claiming, would that be acceptable? (In theory that's somewhat similar to starting the reallocation immediately.) The reason behind this is that we want to make sure the IDs are mostly allocated during the resumption of the first SQL pod right after tenant creation, and we definitely don't want to keep the first pod around for 10 minutes.

Removed the immediate firing of timer. We will do this automatically when there are insufficient IDs during an assignment.

@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch 3 times, most recently from 6f972c8 to ed4742e Compare November 4, 2022 14:15
@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch from ed4742e to a8c1aec Compare November 7, 2022 17:07
Collaborator

@jeffswenson jeffswenson left a comment


LGTM

Contributor

@ajwerner ajwerner left a comment


:lgtm: mod the minor note to back off in case we keep failing to find an ID.

Reviewed 1 of 9 files at r2, 2 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jaylim-crl and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 187 at r3 (raw file):

	}
	var err error
	for {

Can you add backoff to this loop given it could potentially spin? The situation in which I see it spinning is if there are no rows and the cached reader takes time to find out that sessions are dead.

@jaylim-crl jaylim-crl force-pushed the jay/221021-sql-instances-preallocator branch from a8c1aec to f4dc2a0 Compare November 7, 2022 21:53
Collaborator Author

@jaylim-crl jaylim-crl left a comment


TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancestorage.go line 187 at r3 (raw file):

Previously, ajwerner wrote…

Can you add backoff to this loop given it could potentially spin? The situation in which I see it spinning is if there are no rows and the cached reader takes time to find out that sessions are dead.

Done.

@jaylim-crl
Collaborator Author

bors r=JeffSwenson,ajwerner

Failures for TestDockerCLI and TestComposeGSS are unrelated.

@craig
Contributor

craig bot commented Nov 8, 2022

Build succeeded:

@craig craig bot merged commit d8ebd29 into cockroachdb:master Nov 8, 2022
@jaylim-crl jaylim-crl deleted the jay/221021-sql-instances-preallocator branch November 30, 2022 16:55