loqrecovery: support mixed version recovery #96811
I think we need to clarify some of the semantics here before I do a complete review. See comments.
Besides the comments: currently, a 23.1 binary in a mixed 22.2/23.1 cluster will generate a plan that's incompatible with a 22.2 cluster, right? 22.2 used a slightly different structure for the collected info. We should make sure we generate something compatible with the released 22.2 versions in these cases.
We also need to take care with the JSON (un)marshalling. There are several pitfalls here:
- If I'm understanding it correctly, the JSON marshaller will currently choke on unknown fields. How do we ensure that a future binary won't emit fields that an older binary will choke on? Is `EmitDefaults: false` sufficient?
- Unlike Protobuf, we must guarantee stability of the field names -- it's easy for someone to miss this and think that changing a field name is ok. At the very least, we need tests that ensure JSON encoding stability, comments on relevant structs.
- Similarly, we use `EnumsAsInts: false`, which means that unlike the Protobuf protocol we can't change enum names without breaking backwards compatibility.
This only matters for replica info collection and plan generation locally on the client, right? Let's make sure we're certain what the implications are for these and their included messages.
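The pitfalls listed above could be pinned down with a small stability test. The following is only a sketch using the standard library's `encoding/json` as a stand-in for the marshaller used in the PR; the `replicaInfo` struct, its field names, and the helper functions are hypothetical and not taken from the loqrecovery code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// replicaInfo is a hypothetical stand-in for a collected-info message.
// Its JSON tags are part of the file format: renaming one breaks older binaries.
type replicaInfo struct {
	NodeID  int    `json:"nodeId"`
	StoreID int    `json:"storeId"`
	Status  string `json:"status,omitempty"` // enum encoded by name, so the names are frozen too
}

// encode marshals the struct; empty optional fields are omitted.
func encode(info replicaInfo) (string, error) {
	b, err := json.Marshal(info)
	return string(b), err
}

// decodeStrict rejects unknown fields, imitating a strict unmarshaller
// in an older binary.
func decodeStrict(data []byte) (replicaInfo, error) {
	var info replicaInfo
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&info)
	return info, err
}

func main() {
	// Golden check: the encoded form must not change between releases.
	got, err := encode(replicaInfo{NodeID: 1, StoreID: 2})
	fmt.Println(got == `{"nodeId":1,"storeId":2}`, err)

	// A field emitted by a hypothetical newer binary is rejected.
	_, err = decodeStrict([]byte(`{"nodeId":1,"storeId":2,"newField":true}`))
	fmt.Println(err != nil)
}
```

A golden-string comparison like this fails as soon as someone renames a field or an enum value, which is the kind of guard the review asks for.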
I put comments about our versioning approach in version.go, where the version validation functions live, since that seems a better place than scattering them across the places where they are used. We will maintain the old JSON format for clusters that are prior to 23.1. There are some workarounds to make things work on master, which is at 22.2 until the release. Otherwise master would behave as "legacy" and only use the old format and features. Those bits can go away after the branch is cut.
Changed to use the standard version gate throughout. One thing that I think differs between the CLI and intra-cluster RPCs is that the cluster won't version gate data if the CLI has a lower version. The RPC will still allow the CLI to talk to the cluster if the CLI is above the min version. This is not enough for us when collecting data, so I kept an explicit check that the data returned by the cluster is compatible with the CLI.
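A rough sketch of what such an explicit check on collected data might look like. The `version` type, `checkCollectedInfo`, and the lower bound on supported versions are all assumptions made for illustration; the actual validation functions live in version.go and may differ.

```go
package main

import "fmt"

// version is a hypothetical two-part version, not the real CockroachDB type.
type version struct{ Major, Minor int }

// less reports whether v is strictly older than o.
func (v version) less(o version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

func (v version) String() string { return fmt.Sprintf("%d.%d", v.Major, v.Minor) }

// checkCollectedInfo verifies that replica info returned by the cluster can be
// understood by the CLI binary: it must not be newer than the CLI and, as an
// assumed extra bound, not older than the oldest version the CLI supports.
func checkCollectedInfo(cli, minSupported, data version) error {
	if cli.less(data) {
		return fmt.Errorf("replica info version %s is newer than CLI version %s", data, cli)
	}
	if data.less(minSupported) {
		return fmt.Errorf("replica info version %s is older than minimum supported %s", data, minSupported)
	}
	return nil
}

func main() {
	cli, minSupported := version{23, 1}, version{22, 2}
	fmt.Println(checkCollectedInfo(cli, minSupported, version{22, 2})) // older data: accepted
	fmt.Println(checkCollectedInfo(cli, minSupported, version{23, 2})) // newer data: rejected
}
```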
Nice, this ended up being pretty clean.
bors r=erikgrinaker
Build succeeded.
This commit adds mixed version support for the half-online loss of quorum recovery service and CLI tools.
This change allows users to run LOQ recovery in partially upgraded clusters by tracking the version that generated the data and producing recovery plans with the identical version, so that versions can be verified at every step of recovery.
The general rule is that you can use data from the cluster only if it is not newer than the binary version, to avoid new information being dropped. This rule applies to the planning process, where the planner must understand the replica info, and also to the cockroach node that applies the plan, which must have been created by an equal or lower version. An additional restriction on the planner is that it must preserve the version in the plan and must not use any new features if the processed info is older than the binary version. This is no different from what version gates do in cockroach.
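The rule can be illustrated with a minimal sketch. All names here (`version`, `plan`, `makePlan`, `canApply`, and the 23.1 feature gate constant) are hypothetical and chosen only to show the shape of the checks, not the PR's actual implementation.

```go
package main

import "fmt"

// version is a hypothetical two-part version, not the real CockroachDB type.
type version struct{ Major, Minor int }

// less reports whether v is strictly older than o.
func (v version) less(o version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// newFeatureGate is an assumed version at which a new plan feature appears.
var newFeatureGate = version{23, 1}

// plan is a hypothetical recovery plan that records the version of its input.
type plan struct {
	Version         version
	UsesNewFeatures bool
}

// makePlan refuses replica info newer than the planner binary, stamps the plan
// with the info's version, and enables new features only if the info's version
// has reached the gate.
func makePlan(binary, info version) (plan, error) {
	if binary.less(info) {
		return plan{}, fmt.Errorf("replica info %v is newer than planner binary %v", info, binary)
	}
	return plan{Version: info, UsesNewFeatures: !info.less(newFeatureGate)}, nil
}

// canApply reports whether a node may apply the plan: the plan must have been
// created at an equal or lower version than the node's binary.
func canApply(node version, p plan) bool {
	return !node.less(p.Version)
}

func main() {
	p, err := makePlan(version{23, 1}, version{22, 2})
	fmt.Println(p, err)
	fmt.Println(canApply(version{22, 2}, p)) // a 22.2 node can apply a 22.2 plan
}
```

Stamping the plan with the version of the collected info, rather than the planner's own version, is what lets a newer binary produce a plan that older nodes in a partially upgraded cluster can still verify.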
Release note: None
Fixes #95344