Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.2: sql: fix mixed-version behaviour of create_tenant() #94792

Merged

Conversation

postamar
Copy link

@postamar postamar commented Jan 5, 2023

Previously, running create_tenant on the latest binary in
a mixed-version cluster would bootstrap the system schema using the
logic from the latest binary, in which all system database migration
upgrade steps have been "baked in".

This might not be compatible with the the tenant cluster's version which
is (correctly) set to the active host cluster version, the idea there
being that tenant clusters versions cannot be greater than host cluster
versions.

This commit fixes this by version-gating the tenant cluster system
schema bootstrapping, and using hard-coded values when bootstrapping
into the old version.

Informs #94773.

Release note: None
Release justification: fixes a serious bug

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@postamar postamar changed the title bootstrap: add string representation of initial values release=22.2: bootstrap: add string representation of initial values Jan 5, 2023
@postamar postamar changed the title release=22.2: bootstrap: add string representation of initial values release-22.2: bootstrap: add string representation of initial values Jan 5, 2023
@postamar postamar changed the title release-22.2: bootstrap: add string representation of initial values release-22.2: sql: fix mixed-version behaviour of create_tenant() Jan 5, 2023
@postamar postamar force-pushed the cherry-picked-94774-on-222 branch 3 times, most recently from dd0c7bd to a56de4c Compare January 6, 2023 17:35
@postamar postamar marked this pull request as ready for review January 6, 2023 18:25
@postamar postamar requested a review from a team January 6, 2023 18:25
@postamar postamar requested a review from a team as a code owner January 6, 2023 18:25
@postamar postamar requested a review from rafiss January 6, 2023 18:25
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on the overall architecture, but I have some concerns.

I'll also want to see the pre-computed metadata bytes in a PR against master.

Reviewed 5 of 5 files at r1, 8 of 8 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)


pkg/sql/catalog/bootstrap/metadata.go line 268 at r1 (raw file):

	// Build the string representation.
	var sb strings.Builder
	for _, r := range records {

I'd like to question the wisdom of baking an ad-hoc encoding format here. Would JSON or protobuf be an option here?


pkg/startupmigrations/migrations.go line 788 at r2 (raw file):

	        `
	err := r.execAsRootWithRetry(ctx, "addRootUser", upsertRootStmt, username.RootUser)
	if pgerror.GetPGCode(err) == pgcode.UndefinedColumn {

I am not comfortable with this approach - if we change anything later in the query above that introduces a mistake in a column name, this will fire here.

My recommendation instead:

  • change the statement above into an UPSERT without user_id
  • after that upsert succeeds, unconditionally execute a 2nd statement that adds user ID 1 for the root user if it does not exist yet (UPDATE system.users SET user_id = 1 WHERE username = $1 AND user_id IS NULL) and allow that 2nd statement to fail.

pkg/startupmigrations/migrations.go line 812 at r2 (raw file):

		// schema, the user_id column doesn't exist yet and version gates are
		// useless at this juncture.
		const upsertAdminStmtV221 = `

ditto

Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @postamar, and @rafiss)


pkg/startupmigrations/migrations.go line 788 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I am not comfortable with this approach - if we change anything later in the query above that introduces a mistake in a column name, this will fire here.

My recommendation instead:

  • change the statement above into an UPSERT without user_id
  • after that upsert succeeds, unconditionally execute a 2nd statement that adds user ID 1 for the root user if it does not exist yet (UPDATE system.users SET user_id = 1 WHERE username = $1 AND user_id IS NULL) and allow that 2nd statement to fail.

Do we care about transactionality here? Because those would be two separate operations.

@postamar
Copy link
Author

postamar commented Jan 6, 2023

I am not comfortable with this approach - if we change anything later in the query above that introduces a mistake in a column name, this will fire here.

I like the idea of an UPSERT followed by an UPDATE. I don't share your concern about changes in the query. These queries just don't change much. I can't think of why we'd want to change them in release-22.2. In master they've been replaced by permanent upgrades, which are much nicer to deal with.

See #94774 for the changes on master. Over there we have a similar but different challenge in that we need to insert into system.jobs to create the permanent upgrade job but we don't know what version we're at.

@postamar postamar force-pushed the cherry-picked-94774-on-222 branch from a56de4c to 228f74d Compare January 6, 2023 19:35
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. In master they've been replaced by permanent upgrades, which are much nicer to deal with.

oh that's right. then we shouldn't be too concerned.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi, @postamar, and @rafiss)

@postamar
Copy link
Author

postamar commented Jan 6, 2023

Do we care about transactionality here? Because those would be two separate operations.

I don't think so, but I may be wrong.

@knz
Copy link
Contributor

knz commented Jan 6, 2023

I'd love to see what @rafiss thinks.

Marius Posta added 2 commits January 6, 2023 15:01
This commit adds InitialValuesToString and InitialValuesFromString which
convert KV datums and KV split keys from a MetadataSchema to a string
representation and vice-versa. The tenant prefix is omitted from the
string representation.

This functionality exists to allow bootstrapping clusters using the
values and splits as generated by an earlier cockroach binary.

Informs cockroachdb#94773.

Release note: None
Previously, running create_tenant on the latest binary in
a mixed-version cluster would bootstrap the system schema using the
logic from the latest binary, in which all system database migration
upgrade steps have been "baked in".

This might not be compatible with the the tenant cluster's version which
is (correctly) set to the active host cluster version, the idea there
being that tenant clusters versions cannot be greater than host cluster
versions.

This commit fixes this by version-gating the tenant cluster system
schema bootstrapping, and using hard-coded values when bootstrapping
into the old version.

Informs cockroachdb#94773.

Release note: None
@postamar postamar force-pushed the cherry-picked-94774-on-222 branch from 228f74d to 64a402f Compare January 6, 2023 20:01
@postamar
Copy link
Author

postamar commented Jan 6, 2023

@knz I've had to give up doing UPSERT+UPDATE in this branch, because the new column in the users table is not nullable. So the way I see it I'm stuck with my original approach. That being said, UPSERT+UPDATE is what I've done in master.

@knz
Copy link
Contributor

knz commented Jan 6, 2023

thanks that works.

Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r1, 3 of 11 files at r3, 8 of 8 files at r5, 8 of 8 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for taking this on. my comments are all non-blocking

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar)


pkg/sql/catalog/bootstrap/metadata.go line 282 at r5 (raw file):

	for i, r := range s {
		if i == 0 {
			sb.WriteRune('[')

not sure if it matters, but if s is empty, then this [ will be omitted incorrectly (but the ] will still be written below)


pkg/sql/catalog/bootstrap/previous_release.go line 58 at r6 (raw file):

	// nonSystem221 is the hard-coded string representation of the
	// initial values for a non-system tenant as produced by the 22.1
	// release binary.

could be useful to leave instructions on how this was generated, for future readers


pkg/startupmigrations/migrations.go line 788 at r6 (raw file):

	        `
	err := r.execAsRootWithRetry(ctx, "addRootUser", upsertRootStmt, username.RootUser)
	if pgerror.GetPGCode(err) == pgcode.UndefinedColumn {

i don't feel super strongly, but it would be more robust to do introspection on the table, and use that to decide which UPSERT statement to execute. like with select column_name, is_nullable from [show columns from system.users] where column_name = 'user_id';

for the sake of expedience, i don't feel like my suggestion is necessary


pkg/ccl/kvccl/kvtenantccl/tenant_upgrade_test.go line 176 at r6 (raw file):

}

// Returns two versions v0, v1, v2 which correspond to adjacent releases.

seems like it returns 3 versions

@postamar postamar merged commit d8e309b into cockroachdb:release-22.2 Jan 6, 2023
@postamar postamar deleted the cherry-picked-94774-on-222 branch January 6, 2023 22:45
@postamar
Copy link
Author

postamar commented Jan 6, 2023

Thanks for the review Rafi!

[ will be omitted

will fix in master

could be useful to leave instructions on how this was generated, for future readers

will do in master

it would be more robust to do introspection on the table

Completely agree. I'm leaving things as they are because startupmigrations are a dead end but in master I'm hopeful that we can:

  • either get the tenant's bootstrapped version to be initialized in a way that allows it to be used in the permanent migrations;
  • or do some information_schema queries or something which are known to be always be cheap (no extra round trips!) to do exactly what you suggest in the permanent upgrade txn.

seems like it returns 3 versions

whoops!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants