-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: block session for database name cache updates #21900
sql: block session for database name cache updates #21900
Conversation
At least one logic test becomes flaky with this change because I think logic tests were abusing this mechanism for blocking a bit such that gossip propagates updated descriptors and some caches are invalidated. E.g. renaming a database and immediately verifying that the database doesn't exist is now flaky. Will figure something to do. |
Review status: 0 of 13 files reviewed at latest revision, all discussions resolved. Comments from Reviewable |
9c8156c
to
a992c13
Compare
@nvanbenschoten I've kept a subset of the uses of testingVerifyMetadata for database name cache purposes. PTAL @vivekmenezes this is the best I could come up with. All the other facilities we have in the area, like the uncommitted databases you were mentioning, are per txn, so they don't really apply to blocking different transaction. |
Not confident in how this all works. Can you restructure it to block after all the schema changers have executed? Can you copy all the uncommitted databases from the session to construct the list of names it needs to wait for? |
a992c13
to
36f9d80
Compare
36f9d80
to
d1c49f7
Compare
I've rewritten the thing in a different way - now the blocking is encapsulated with the Review status: 0 of 18 files reviewed at latest revision, all discussions resolved. Comments from Reviewable |
Reviewed 18 of 18 files at r1. pkg/sql/backfill.go, line 284 at r1 (raw file):
I don't think this is what you want. pkg/sql/table.go, line 294 at r1 (raw file):
Put this next to pkg/sql/table.go, line 506 at r1 (raw file):
pkg/sql/table.go, line 514 at r1 (raw file):
Let's get rid of this string matching. It's pretty fragile. Comments from Reviewable |
d1c49f7
to
8cdb81c
Compare
switched to a sync.Map inside the cache. Also added a comment about how the cache sharing and wiping is funky. Review status: 12 of 18 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending. pkg/sql/backfill.go, line 284 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
lolo pkg/sql/table.go, line 294 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/sql/table.go, line 506 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
error was at the next word pkg/sql/table.go, line 514 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. Comments from Reviewable |
8cdb81c
to
9b5b1d5
Compare
Reviewed 6 of 6 files at r2. Comments from Reviewable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely!!!! You have excelled yourself
) | ||
|
||
func (tc *TableCollection) addUncommittedDatabase(name string, id sqlbase.ID, action dbAction) { | ||
db := uncommittedDatabase{name: name, id: id, dropped: action == dbDropped} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you could store the action in uncommittedDatabase as opposed to the boolean.
Before this patch, if one did something like: ALTER DATABASE foo RENAME TO bar SELECT * from foo the SELECT might or might not fail. Similarly: ALTER DATABASE foo RENAME TO bar CREATE DATABASE foo the CREATE might or might not succeed. This is because database names are cached and the cache is updated asynchonously, through gossip. The situation is different for table names, as that cache is integrated with the table leasing mechanism. This patch makes it so that the SQL session on which the a database is dropped/renamed is blocked until the database cache is updated to reflect the availability of the name. The blocking happens when the transaction (implicit or explicit) of the DDL statement commits. Name resolution inside the txn (if any), is not changed in this path. Presumably that works fine because the session maintains a per-txn list of uncommitted databases. Of course, this all refers strictly to the session doing the DROP/RENAME DDL. Any other session is not affected, and their behavior remains as erratic as before. This works by having the TableCollection optionally block for updates to the database cache to arrive once a txn is committed, as such ensuring that the session that ran the DDL is only unblocked to run subsequent statements once the cache is copacetic. The changes described above were already essentially in effect in our logic tests, through a testing-only mechanism - "testingVerifyMetadata". This patch removes that mechanism: We've had this weird testing facility that we've had since time immemorial - DDL statements would provide a callback that verifies the state of the table descriptor, as seen by the Executor after running the statement. The idea, I think, was to verify two things: a) that the descriptor is not mutated too much "synchronously" (as part of running the DDL) - instead it should be mutated by asynchronous jobs. Second, to verify that these async jobs eventually run. The mechanism was super weird, leading to intractable code that I don't want to port forward to the new connExecutor. I doubt that the test facility was ever useful, beyond the database cache updates discussed above. Release note: the results for database creation, dropping or renaming statement are now block only delivered to the client once it's guaranteed that the client in question can use the new database (or the new name of the database).
9b5b1d5
to
21f8373
Compare
TFTRs Review status: 14 of 18 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. pkg/sql/table.go, line 546 at r2 (raw file): Previously, vivekmenezes wrote…
my current policy (which has been known to change) is to prefer enums to bools for function arguments because they're self-documenting at call sites, but prefer bool fields to enums because they're sometimes easier to use (in an Comments from Reviewable |
stressed a bunch Review status: 14 of 18 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful. Comments from Reviewable |
Before this patch, if one did something like:
ALTER DATABASE foo RENAME TO bar
SELECT * from foo
the SELECT might or might not fail.
Similarly:
ALTER DATABASE foo RENAME TO bar
CREATE DATABASE foo
the CREATE might or might not succeed.
This is because database names are cached and the cache is updated
asynchonously, through gossip. The situation is different for table
names, as that cache is integrated with the table leasing mechanism.
This patch makes it so that the SQL session on which the a database is
dropped/renamed is blocked until the database cache is updated to
reflect the availability of the name. The blocking happens when the
transaction (implicit or explicit) of the DDL statement commits.
Name resolution inside the txn (if any), is not changed in this path.
Presumably that works fine because the session maintains a per-txn list
of uncommitted databases.
Of course, this all refers strictly to the session doing the DROP/RENAME
DDL. Any other session is not affected, and their behavior remains as
erratic as before.
The changes described above were already essentially in effect in our
logic tests, through a testing-only mechanism - "testingVerifyMetadata".
This patch removes most of that mechanism, and reworks and elevates a
small set of its uses beyond testing:
We've had this weird testing facility that we've had since time
immemorial - DDL statements would provide a callback that verifies the
state of the table descriptor, as seen by the Executor after running the
statement. The idea, I think, was to verify two things: a) that the
descriptor is not mutated too much "synchronously" (as part of running
the DDL) - instead it should be mutated by asynchronous jobs. Second, to
verify that these async jobs eventually run.
The mechanism was super weird, leading to intractable code that I don't
want to port forward to the new connExecutor. I doubt that the test
facility was ever useful, beyond the database cache updates discussed
above.
Release note: None