[Index Backfill] Failing to update index permissions causes index to get stuck #4930

Closed
frozenspider opened this issue Jun 30, 2020 · 5 comments
Labels
kind/bug This issue is a bug

Comments

@frozenspider
Contributor

frozenspider commented Jun 30, 2020

I've added this snippet:

--- src/yb/master/backfill_index.cc	(revision be6b1d633b3cbebed15f4b69bb913cfbac48904f)
+++ src/yb/master/backfill_index.cc	(date 1593555973572)
@@ -204,6 +204,9 @@
     TRACE("Locking indexed table");
     auto l = indexed_table->LockForWrite();
     auto &indexed_table_data = *l->mutable_data();
+    if (rand() % 100 > 50) {
+      indexed_table_data.pb.set_version(indexed_table_data.pb.version() + 1);
+    }
     if (current_version && *current_version != indexed_table_data.pb.version()) {
       LOG(INFO) << "The table schema version "
                 << "seems to have already been updated to " << indexed_table_data.pb.version()

in UpdateIndexPermission to simulate a table schema version mismatch (the injected branch fires about half the time). When the mismatch happens, the index backfill process is interrupted and never resumed.
The YCQL command eventually times out:

ycqlsh> CREATE INDEX indexed_i ON k.indexed(v);
ServerError: Server Error. Timed out waiting for Table Creation
CREATE INDEX indexed_i ON k.indexed(v);
                           ^^^^^^^^
 (ql error -2)

yb-master also sees the indexed table as ALTERING (screenshot not preserved).
This affects both YCQL and YSQL.
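
To make the failure mode concrete, here is a minimal, self-contained C++ sketch of the pattern described above. It is not the YugabyteDB code: `IndexedTableModel`, `UpdatePermissionModel`, `alter_in_flight`, and `BumpWithoutAlterLeavesTableStuck` are hypothetical names, and the model assumes a single backfill stage for brevity. It only illustrates that an early return on a version mismatch, with no alter actually outstanding, leaves the table in ALTERING with nothing left to continue the work.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified model; these are NOT the real YugabyteDB types.
enum class TableState { kRunning, kAltering };

struct IndexedTableModel {
  uint32_t version = 0;
  TableState state = TableState::kAltering;  // a backfill stage is pending
  bool alter_in_flight = false;              // true only if a real alter was sent
};

// Mirrors the shape of the version check in the diff: if the caller's view of
// the schema version is stale, return early and assume someone else continues.
// Returns true if this call advanced the table, false if it bailed out.
bool UpdatePermissionModel(IndexedTableModel& table,
                           std::optional<uint32_t> current_version) {
  if (current_version && *current_version != table.version) {
    return false;  // early return: no state change, no error surfaced
  }
  ++table.version;
  table.state = TableState::kRunning;  // assumption: single stage, for brevity
  return true;
}

// Applies the artificial bump from the diff (version + 1, no alter sent),
// then runs the update. Returns true if the table ends up stuck.
bool BumpWithoutAlterLeavesTableStuck() {
  IndexedTableModel table;
  const uint32_t seen_version = table.version;
  table.version += 1;  // the injected bump
  assert(!table.alter_in_flight);  // nothing was actually sent
  const bool advanced = UpdatePermissionModel(table, seen_version);
  return !advanced && !table.alter_in_flight &&
         table.state == TableState::kAltering;
}
```

Under these assumptions, `BumpWithoutAlterLeavesTableStuck()` returns true, which matches the ALTERING state seen in yb-master.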

@jaki
Contributor

jaki commented Jul 1, 2020

I don't follow where the snippet goes (can you format it as a diff with enough context lines?) or how it simulates a table schema version mismatch. Is it realistic to be off by one on the indexed table version? Or is it an unrealistic way of simulating a very slow alter schema response RPC from tserver to master?

@frozenspider
Contributor Author

I've changed the snippet to a diff.
As for the simulation, I don't believe it's realistic. I couldn't reproduce the mismatch simply by using the TEST_slowdown_backfill_alter_table_rpcs_ms flag and running a CREATE TABLE concurrently; I assume there's some implicit leader lock taken for MultiStageAlterTable. This, however, is just a convenient example to demonstrate that erroneous statuses aren't handled correctly.

@amitanandaiyer
Contributor

I don't think this diff is realistic.

     auto &indexed_table_data = *l->mutable_data();
+    if (rand() % 100 > 50) {
+      indexed_table_data.pb.set_version(indexed_table_data.pb.version() + 1);
+    }

seems pretty artificial to me. The idea of

   if (current_version && *current_version != indexed_table_data.pb.version()) {
       LOG(INFO) << "The table schema version "
                 << "seems to have already been updated to " << indexed_table_data.pb.version()

is that if an alter has been sent or is in progress for a version > current_version (say indexed_table_data.pb.version() + 1), as you are trying to simulate, the remaining work will be picked up when the "alter table" for that version (i.e. indexed_table_data.pb.version() + 1) completes.

If you aren't actually sending an Alter ... but only incrementing the schema version to force that branch, I don't think it is expected to work.
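
A rough way to picture that handoff, as a self-contained C++ sketch. Everything here is an assumption made for illustration: `BackfillModel`, `SendAlter`, `BumpVersionOnly`, `OnAlterCompleted`, and `WillResume` are hypothetical names, not functions in the master code. The point is only that the remaining work is driven by the completion of a real alter, so a bare version bump has no completion event to resume from.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical model of the handoff; names are illustrative only.
class BackfillModel {
 public:
  // Records that a real alter for `version` was sent to the tablet servers.
  void SendAlter(uint32_t version) {
    assert(version > version_);  // alters only move the version forward
    version_ = version;
    pending_alters_.insert(version);
  }

  // Bumps the version without sending anything, like the injected snippet.
  void BumpVersionOnly() { ++version_; }

  // Called when the alter for `version` finishes. This is the hook that picks
  // up the remaining work; it does nothing if no such alter was ever sent.
  void OnAlterCompleted(uint32_t version) {
    if (pending_alters_.erase(version) > 0) {
      ++stages_done_;
    }
  }

  // True if an outstanding alter exists for the current version, i.e. some
  // future completion will resume the work.
  bool WillResume() const { return pending_alters_.count(version_) > 0; }

  uint32_t version() const { return version_; }
  int stages_done() const { return stages_done_; }

 private:
  uint32_t version_ = 0;
  int stages_done_ = 0;
  std::set<uint32_t> pending_alters_;
};
```

In this model, calling `SendAlter(1)` makes `WillResume()` true and a later `OnAlterCompleted(1)` advances the work, whereas after `BumpVersionOnly()` the version is also 1 but `WillResume()` is false and no completion ever arrives.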

@amitanandaiyer
Contributor

If you want to exercise that path, something more realistic would be to create multiple indexes simultaneously on the table.

That way they can race against each other and cause the version to be bumped unexpectedly.
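
The race suggested here can be sketched deterministically, without threads, by fixing one possible interleaving of two index creations. This is a hypothetical model (`TableModel`, `IndexCreator`, and `SecondCreatorSeesMismatch` are made-up names, and real index creation involves far more than a version counter): both creators read the same version, the first one commits and bumps it, and the second then finds a version it did not expect.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the table is reduced to its schema version.
struct TableModel {
  uint32_t version = 0;
};

// One in-progress index creation with its own view of the table version.
struct IndexCreator {
  uint32_t seen_version = 0;

  void Read(const TableModel& table) { seen_version = table.version; }

  // Commits only if nobody changed the version since Read().
  // Returns true on success, false on a version mismatch.
  bool TryCommit(TableModel& table) {
    if (seen_version != table.version) {
      return false;
    }
    ++table.version;
    return true;
  }
};

// Plays out one interleaving: A and B both read, then A commits before B.
// Returns true when B observes the unexpected version bump caused by A.
bool SecondCreatorSeesMismatch() {
  TableModel table;
  IndexCreator a;
  IndexCreator b;
  a.Read(table);
  b.Read(table);
  const bool a_committed = a.TryCommit(table);
  const bool b_committed = b.TryCommit(table);
  assert(table.version == 1);  // only A's bump was applied
  return a_committed && !b_committed;
}
```

Unlike the injected bump, the mismatch here is caused by a real concurrent change, so in this model there is an owner (creator A) whose alter is responsible for the newer version.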

@frozenspider
Contributor Author

Closing this as not being applicable right now.
However, proper error handling should be added to that code path at some point.
