Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: normalize versions in multitenant-upgrade roachtest #95904

Merged
merged 1 commit into from
Feb 2, 2023

Conversation

healthy-pod
Copy link
Contributor

@healthy-pod healthy-pod commented Jan 26, 2023

If the multitenant-upgrade roachtest uses a mix
of release/non-release binaries, it may be using versions
that are technically the same but fail to confirm that because
version in test binaries are incremented by 1M.

This code change fixes the issue by normalizing versions before
comparing them.

Closes #95648

Epic: none
Release note: None

@healthy-pod healthy-pod requested review from knz and postamar January 26, 2023 01:57
@blathers-crl
Copy link

blathers-crl bot commented Jan 26, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@healthy-pod healthy-pod force-pushed the fix-tlv branch 4 times, most recently from c2918b8 to dec3327 Compare January 26, 2023 04:48
@healthy-pod healthy-pod requested a review from a team as a code owner January 26, 2023 04:48
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, but I think we're going to need @postamar's eyes on this too. I know the spirit of what his PR meant to accomplish, but the details are important and subtle.

@postamar could you please take a look?

@@ -141,7 +139,6 @@ func (p *planner) createTenantInternal(
// Use the previous major version to create the tenant and bootstrap it
// just like the previous major version binary would, using hardcoded
// initial values.
tenantVersion.Version = clusterversion.ByKey(minVersion)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this change does subtly alter what we're doing here. Before your change, we'd always initialize the tenant to a fully valid cluster version (as opposed to a cluster version which may be mid-way between two version, in the case of an in-progres upgrade). I'm not certain what's the right thing to do here, but I'd think that we wouldn't want to initialize to intermediate versions as we'd need to initialize the catalogs to that intermediate state as well, which seems difficult.

// Returns three versions :
// - v0 corresponds to the bootstrapped version of the tenant,
// - v1, v2 correspond to adjacent releases.
// Returns two versions v0, v1, v2 which correspond to adjacent releases. v0 will
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change future-proof? If, for example, we let MinSupportedVerion trail a few releases, would this comment still be correct? I'd imagine that the test would still do what we'd want it to do, though maybe not if we wanted to test all of the upgrade stages in between v0 and v2.

@healthy-pod
Copy link
Contributor Author

I agree maybe this is not the right change, maybe we should make the changes to how we test instead? I am looking for suggestions but I will leave 2 points on why I made this change to further explain how this is just a testing issue and not an actual correctness issue:

  1. passing a version override using testing knobs or by setting version when creating testing cluster settings is no longer effective because the changes in createTenantInternal run after that (ofcourse) and do not check for any overrides. Ideally, minVersion and curVersion in createTenantInternal should have the testing overrides values if an override was passed.
  2. checking for version equality between two servers where a server is started with a crdb_test-built binary and the other isn't will return different versions even if they are technically the same. I am thinking of adding another version equality function that works around this by incrementing/decrementing one of the versions by 1e6 and returning true if they become equal.

@healthy-pod healthy-pod requested a review from a team as a code owner January 30, 2023 04:05
@healthy-pod healthy-pod requested review from srosenberg and smg260 and removed request for a team January 30, 2023 04:05
@healthy-pod healthy-pod force-pushed the fix-tlv branch 3 times, most recently from aaa4bed to 452c5ee Compare January 30, 2023 04:13
@healthy-pod healthy-pod changed the title sql: ensure TLV is consistent with SLV roachtest: normalize versions in multitenant-upgrade roachtest Jan 30, 2023
@healthy-pod
Copy link
Contributor Author

When multiple previous releases are supported, my previous code change becomes incorrect. I removed my changes to tenant_creation and just made the test ignore the difference between normal versions and versions from test binaries by normalizing them.

Ready for another pass.

@healthy-pod healthy-pod requested a review from ajstorm January 30, 2023 04:32
@renatolabs
Copy link
Contributor

Drive-by comment about upgrade testing here: If we decide to merge this PR as it currently is, I'd suggest adding a note/TODO/issue to revert it at some point. I think we can agree that asserting on specific cluster versions makes for a stronger test (and having or not having the version offset is part of that); more importantly, we want to eventually live in a world where most upgrade roachtests don't have to deal with version offsetting (#92608).

I'd also make a small correction here that version offsetting does not depend on the crdb_test build flag; offsetting happens if the binary is not considered a release binary:

// developmentBranch must true on the main development branch but should be set
// to false on a release branch once the set of versions becomes append-only and
// associated upgrade implementations are frozen. It can be forced to true via
// an env var even on a release branch, to allow running a release binary in a
// dev cluster.
var developmentBranch = true || envutil.EnvOrDefaultBool("COCKROACH_FORCE_DEV_VERSION", false)

@healthy-pod
Copy link
Contributor Author

@renatolabs

I'd suggest adding a note/TODO/issue to revert it at some point

What should the trigger of this revert be? Is there something expected after which we won't need to normalize versions?

I'd also make a small correction here that version offsetting does not depend on the crdb_test build flag; offsetting happens if the binary is not considered a release binary

👍

@renatolabs
Copy link
Contributor

What should the trigger of this revert be?

Once #92608 is closed.

@healthy-pod healthy-pod force-pushed the fix-tlv branch 2 times, most recently from 52b9f19 to 1cb4e12 Compare January 31, 2023 21:57
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change much better than the initial one, but it's not clear to me that it's what we want long-term. I'd think that the long-term fix here is to ensure that the create tenant code respects the version overrides contained in the testing knobs. That way we won't need the offset ignoring code. I'm OK to merge this now if we plan to add the testing knobs fix in a separate PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @healthy-pod, @knz, @postamar, @smg260, and @srosenberg)


pkg/cmd/roachtest/tests/multitenant_upgrade.go line 315 at r2 (raw file):

) {
	normalizeVersion := func(v roachpb.Version) roachpb.Version {
		if (v.Major - clusterversion.DevOffset) >= 0 {

Nit: it might beeasier to understand this code if it was written as:

if v.Major > clusterversion.DevOffset {

Copy link
Contributor Author

@healthy-pod healthy-pod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added the tenant logical version overrides in #94973, this is not related to that. Previously (ie before #94774) we used to set the tenant logical version to be exactly the storage logical version. As in, we read the SLV and set the TLV to that. Now this is not true, it's either the SBV if it's equal to the SLV or otherwise the min version the storage binary can use to create a tenant. What happens here is:

current binary = non-release binary with version offset
predecessor binary = release binary without version offset

  • storage starts with predecessor binary and its logical version is predecessor binary so SLV doesn't have offset
  • storage upgrades to current binary but doesn't finalize so SLV is still without offset
  • storage binary now though is current binary which is non-release and when it "calculates" the tenant creation version (instead of what used to happen previously which was just copying SLV) it will be the min supported version of current binary which happens to be predecessor binary but with offset because current binary is not a release binary even if storage didn't finalize yet.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @postamar, @smg260, and @srosenberg)

If the multitenant-upgrade roachtest uses a mix
of release/non-release binaries, it may be using versions
that are technically the same but fail to confirm that because
version in test binaries are incremented by 1M.

This code change fixes the issue by normalizing versions before
comparing them.

Closes cockroachdb#95648

Epic: none
Release note: None
@healthy-pod
Copy link
Contributor Author

bors r=ajstorm

@craig craig bot merged commit 22244a7 into cockroachdb:master Feb 2, 2023
@craig
Copy link
Contributor

craig bot commented Feb 2, 2023

Build succeeded:

renatolabs added a commit to renatolabs/cockroach that referenced this pull request Feb 24, 2023
This commit adds a new environment variable that can be used by tests
to force a development binary to treated as a release binary:
`COCKROACH_TESTING_FORCE_RELEASE_BRANCH`.

Previously, upgrade roachtests were setting the
`COCKROACH_UPGRADE_TO_DEV_VERSION` variable to allow `master` to be
deployed during tests. The main downside of this approach is that it
is not the logic that real cluster would run when upgrading. In
addition, it could lead to upgrades that seem to go "backwards", which
can be quite confusing.

With this change, upgrade roachtests bypass the version offsetting
logic and are treated as the "next release" during tests.

This commit also reverts cockroachdb#95904, as the workaround introduced there is
no longer necessary.

Resolves: cockroachdb#92608.

Epic: CRDB-19321

Release note: None
craig bot pushed a commit that referenced this pull request Mar 3, 2023
97595: clusterversion: allow tests to pretend to be a release branch r=srosenberg a=renatolabs

This commit adds a new environment variable that can be used by tests to force a development binary to treated as a release binary: `COCKROACH_TESTING_FORCE_RELEASE_BRANCH`.

Previously, upgrade roachtests were setting the
`COCKROACH_UPGRADE_TO_DEV_VERSION` variable to allow `master` to be deployed during tests. The main downside of this approach is that it is not the logic that real cluster would run when upgrading. In addition, it could lead to upgrades that seem to go "backwards", which can be quite confusing.

With this change, upgrade roachtests bypass the version offsetting logic and are treated as the "next release" during tests.

This commit also reverts #95904, as the workaround introduced there is no longer necessary.

Resolves: #92608.

Epic: CRDB-19321

Release note: None

Co-authored-by: Renato Costa <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: multitenant-upgrade failed
4 participants