Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: block session for database name cache updates #21900

Merged

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Jan 29, 2018

Before this patch, if one did something like:
ALTER DATABASE foo RENAME TO bar
SELECT * from foo

the SELECT might or might not fail.

Similarly:

ALTER DATABASE foo RENAME TO bar
CREATE DATABASE foo

the CREATE might or might not succeed.

This is because database names are cached and the cache is updated
asynchonously, through gossip. The situation is different for table
names, as that cache is integrated with the table leasing mechanism.

This patch makes it so that the SQL session on which the a database is
dropped/renamed is blocked until the database cache is updated to
reflect the availability of the name. The blocking happens when the
transaction (implicit or explicit) of the DDL statement commits.
Name resolution inside the txn (if any), is not changed in this path.
Presumably that works fine because the session maintains a per-txn list
of uncommitted databases.
Of course, this all refers strictly to the session doing the DROP/RENAME
DDL. Any other session is not affected, and their behavior remains as
erratic as before.

The changes described above were already essentially in effect in our
logic tests, through a testing-only mechanism - "testingVerifyMetadata".
This patch removes most of that mechanism, and reworks and elevates a
small set of its uses beyond testing:

We've had this weird testing facility that we've had since time
immemorial - DDL statements would provide a callback that verifies the
state of the table descriptor, as seen by the Executor after running the
statement. The idea, I think, was to verify two things: a) that the
descriptor is not mutated too much "synchronously" (as part of running
the DDL) - instead it should be mutated by asynchronous jobs. Second, to
verify that these async jobs eventually run.

The mechanism was super weird, leading to intractable code that I don't
want to port forward to the new connExecutor. I doubt that the test
facility was ever useful, beyond the database cache updates discussed
above.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei
Copy link
Contributor Author

At least one logic test becomes flaky with this change because I think logic tests were abusing this mechanism for blocking a bit such that gossip propagates updated descriptors and some caches are invalidated. E.g. renaming a database and immediately verifying that the database doesn't exist is now flaky. Will figure something to do.

@nvanbenschoten
Copy link
Member

:lgtm: in spirit, but if removing this breaks tests then it sounds like it is needed until we replace it with something else. It looks like testingVerifyMetadata was initially added in 1cc9644. Can we now remove sql/verify.go entirely?


Review status: 0 of 13 files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@andreimatei andreimatei force-pushed the testing-verify-metadata branch from 9c8156c to a992c13 Compare January 30, 2018 23:30
@andreimatei andreimatei changed the title sql: delete TestingVerifyMetadata sql: block session for database name cache updates Jan 30, 2018
@andreimatei
Copy link
Contributor Author

@nvanbenschoten I've kept a subset of the uses of testingVerifyMetadata for database name cache purposes. PTAL

@vivekmenezes this is the best I could come up with. All the other facilities we have in the area, like the uncommitted databases you were mentioning, are per txn, so they don't really apply to blocking different transaction.

@vivekmenezes
Copy link
Contributor

Not confident in how this all works. Can you restructure it to block after all the schema changers have executed? Can you copy all the uncommitted databases from the session to construct the list of names it needs to wait for?

@andreimatei andreimatei force-pushed the testing-verify-metadata branch from a992c13 to 36f9d80 Compare January 31, 2018 20:28
@andreimatei andreimatei requested a review from a team January 31, 2018 20:28
@andreimatei andreimatei force-pushed the testing-verify-metadata branch from 36f9d80 to d1c49f7 Compare January 31, 2018 20:48
@andreimatei
Copy link
Contributor Author

I've rewritten the thing in a different way - now the blocking is encapsulated with the tableCollection . PTAL


Review status: 0 of 18 files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Reviewed 18 of 18 files at r1.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sql/backfill.go, line 284 at r1 (raw file):

				defer func() {
					if err := tc.releaseTables(ctx, dontBlockForDBCacheUpdate); err != nil {
						log.Warningf(ctx, "error releasing tables: err")

I don't think this is what you want.


pkg/sql/table.go, line 294 at r1 (raw file):

	// Can be nil if releaseTables() is only called with the
	// dontBlockForDBCacheUpdate option.
	dbCacheSubscriber dbCacheSubscriber

Put this next to databaseCache.


pkg/sql/table.go, line 506 at r1 (raw file):

					desc, err := dc.getCachedDatabaseDesc(uc.name)
					if err == nil {
						// If the database name still exists but it not references another

s/it/is/


pkg/sql/table.go, line 514 at r1 (raw file):

						return true, nil
					}
					if !strings.Contains(err.Error(), "does not exist") {

Let's get rid of this string matching. It's pretty fragile.


Comments from Reviewable

@andreimatei andreimatei force-pushed the testing-verify-metadata branch from d1c49f7 to 8cdb81c Compare February 1, 2018 21:17
@andreimatei
Copy link
Contributor Author

switched to a sync.Map inside the cache. Also added a comment about how the cache sharing and wiping is funky.


Review status: 12 of 18 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


pkg/sql/backfill.go, line 284 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think this is what you want.

lolo


pkg/sql/table.go, line 294 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Put this next to databaseCache.

Done.


pkg/sql/table.go, line 506 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/it/is/

error was at the next word


pkg/sql/table.go, line 514 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's get rid of this string matching. It's pretty fragile.

Done.


Comments from Reviewable

@andreimatei andreimatei force-pushed the testing-verify-metadata branch from 8cdb81c to 9b5b1d5 Compare February 1, 2018 21:26
@nvanbenschoten
Copy link
Member

:lgtm: once CI builds. Please stress pkg/sql a bit before merging to make sure this doesn't introduce any flakes.


Reviewed 6 of 6 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Feb 2, 2018

:lgtm:

Copy link
Contributor

@vivekmenezes vivekmenezes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely!!!! You have excelled yourself

)

func (tc *TableCollection) addUncommittedDatabase(name string, id sqlbase.ID, action dbAction) {
db := uncommittedDatabase{name: name, id: id, dropped: action == dbDropped}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you could store the action in uncommittedDatabase as opposed to the boolean.

Before this patch, if one did something like:
ALTER DATABASE foo RENAME TO bar
SELECT * from foo

the SELECT might or might not fail.

Similarly:

ALTER DATABASE foo RENAME TO bar
CREATE DATABASE foo

the CREATE might or might not succeed.

This is because database names are cached and the cache is updated
asynchonously, through gossip. The situation is different for table
names, as that cache is integrated with the table leasing mechanism.

This patch makes it so that the SQL session on which the a database is
dropped/renamed is blocked until the database cache is updated to
reflect the availability of the name. The blocking happens when the
transaction (implicit or explicit) of the DDL statement commits.
Name resolution inside the txn (if any), is not changed in this path.
Presumably that works fine because the session maintains a per-txn list
of uncommitted databases.
Of course, this all refers strictly to the session doing the DROP/RENAME
DDL. Any other session is not affected, and their behavior remains as
erratic as before.

This works by having the TableCollection optionally block for updates to
the database cache to arrive once a txn is committed, as such ensuring
that the session that ran the DDL is only unblocked to run subsequent
statements once the cache is copacetic.

The changes described above were already essentially in effect in our
logic tests, through a testing-only mechanism - "testingVerifyMetadata".
This patch removes that mechanism:
We've had this weird testing facility that we've had since time
immemorial - DDL statements would provide a callback that verifies the
state of the table descriptor, as seen by the Executor after running the
statement. The idea, I think, was to verify two things: a) that the
descriptor is not mutated too much "synchronously" (as part of running
the DDL) - instead it should be mutated by asynchronous jobs. Second, to
verify that these async jobs eventually run.

The mechanism was super weird, leading to intractable code that I don't
want to port forward to the new connExecutor. I doubt that the test
facility was ever useful, beyond the database cache updates discussed
above.

Release note: the results for database creation, dropping or renaming
statement are now block only delivered to the client once it's
guaranteed that the client in question can use the new database (or the
new name of the database).
@andreimatei andreimatei force-pushed the testing-verify-metadata branch from 9b5b1d5 to 21f8373 Compare February 2, 2018 21:50
@andreimatei
Copy link
Contributor Author

TFTRs


Review status: 14 of 18 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


pkg/sql/table.go, line 546 at r2 (raw file):

Previously, vivekmenezes wrote…

I suppose you could store the action in uncommittedDatabase as opposed to the boolean.

my current policy (which has been known to change) is to prefer enums to bools for function arguments because they're self-documenting at call sites, but prefer bool fields to enums because they're sometimes easier to use (in an if statement) and the documentation argument doesn't quite apply to fields.
I wish there were a google styleguide to tell me what to do.


Comments from Reviewable

@andreimatei
Copy link
Contributor Author

stressed a bunch


Review status: 14 of 18 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@andreimatei andreimatei merged commit 7813e06 into cockroachdb:master Feb 2, 2018
@andreimatei andreimatei deleted the testing-verify-metadata branch February 2, 2018 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants