sql: allow the in-memory version to be the next fence version #99967
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Do you mean individual migrations, or all migrations that happen when we upgrade between two released binaries? I believe the former is what is relevant here, and as I mentioned in #99894, I don't see migrations taking anywhere near 2 minutes around the timestamp of the failure 🤔. Another thing is that the "show cluster setting" retry loop has basically no backoff, right? So wouldn't it be unlikely that we're hitting a fence on every attempt for 2 minutes?
All migrations. The deal is that at the start of a migration, we move the in-memory version to the next fence version (but we don't write the fence). Then when the migration completes, we move the in-memory version and then write to KV.
The point being that while migrations are running, it is likely that the in-memory version is at the fence.
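To make the comparison concrete, here is a minimal sketch of the relaxed check described above. It is not the code from this PR: the `Version` type and the helpers `isFence`, `isNextFenceOf`, and `versionsEquivalent` are hypothetical names, and the sketch assumes that ordinary versions carry an even `Internal` component while the fence leading to the next version is the odd value directly above the persisted one.

```go
package main

// Version is a hypothetical, simplified stand-in for a cluster version.
type Version struct {
	Major, Minor, Internal int32
}

// isFence reports whether v is a fence version.
// Assumption: ordinary versions have an even Internal component and
// fence versions have an odd one.
func isFence(v Version) bool {
	return v.Internal%2 == 1
}

// isNextFenceOf reports whether local is the fence that immediately
// follows stored, i.e. stored with Internal incremented by one.
func isNextFenceOf(local, stored Version) bool {
	if !isFence(local) {
		return false
	}
	predecessor := local
	predecessor.Internal--
	return predecessor == stored
}

// versionsEquivalent accepts either an exact match or an in-memory
// version that has been pushed to the not-yet-persisted fence.
func versionsEquivalent(local, stored Version) bool {
	return local == stored || isNextFenceOf(local, stored)
}
```

Under these assumptions, a stored version with `Internal` 4 is treated as equivalent to an in-memory version with `Internal` 4 or 5, while 6 and 7 still count as a mismatch, so the wait loop only tolerates the single fence step that a running migration introduces.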
cockroach/pkg/sql/show_cluster_setting.go Line 59 in a961061
What's funky is I don't see the upgrade migrations taking very long on master now.
pkg/sql/show_cluster_setting.go
@@ -94,7 +94,8 @@ func (p *planner) getCurrentEncodedVersionSettingValue(
 	}

 	localRawVal := []byte(s.Get(&st.SV))
-	if !bytes.Equal(localRawVal, kvRawVal) {
+	if !bytes.Equal(localRawVal, kvRawVal) &&
+		!localIsNextFence(localRawVal, kvRawVal) {
Perhaps add a comment here explaining what you explain in the commit message.
@irfansharif I added more structure, improved the error message, and added testing. PTAL
In `SHOW CLUSTER SETTING version` we wait for the stored version to match the in-memory version. The in-memory version gets pushed to the fence, but the fence is not persisted, so while migrations are ongoing we won't see these values match. In practice this is a problem these days because the migrations take a long time.
Epic: none
Informs cockroachdb#99894
Release note (bug fix): Fixed a bug which could cause `SHOW CLUSTER SETTING version` to hang and return an opaque error while cluster finalization is ongoing.
For your amusement, here's ChatGPT's summary of this change as a short poem:
bors r+
Build failed (retrying...):
Build failed (retrying...):
Build succeeded:
In `SHOW CLUSTER SETTING version` we wait for the stored version to match the in-memory version. The in-memory version gets pushed to the fence, but the fence is not persisted, so while migrations are ongoing we won't see these values match. In practice this is a problem these days because the migrations take a long time.
Epic: none
Informs #99894
Release note: None