Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/client: SystemConfigTrigger() fails to gossip configuration changes if EndTransaction is not executed on system range #7570

Closed
mrtracy opened this issue Jun 30, 2016 · 8 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@mrtracy
Copy link
Contributor

mrtracy commented Jun 30, 2016

client.Transaction has a setting SystemConfigTrigger(), which is set whenever the SystemDB is modified (for example, when adding a new table). This has the effect of immediately gossiping the new system configuration when the transaction is committed (during EndTransaction command).

However, this will only work if the EndTransaction command is executed on the range containing SystemDB; in the case of a "mixed" transaction which updates both SystemDB and other non-system tables, the new configuration will only be gossiped if the transaction record is located on the SystemDB range. Note that this situation currently depends on the first key modified in a transaction; if the first key modified is in the SystemDB then it works correct, otherwise it will fail.

Note that the configuration change is eventually gossiped whenever the leader lease for the SystemDB range is refreshed or changed.

I have added an log.Errorc() message which occurs when SystemConfigTrigger is set for an EndTransaction statement that executes on a non-system range; this should allow us to discover any other situations where this is occurring. I have also added comment explanations on SetSystemConfigTrigger(), and in replica.runCommitTrigger.

This issue is to track any additional documentation or functionality changes needed to address this unexpected behavior.

mrtracy pushed a commit to mrtracy/cockroach that referenced this issue Jun 30, 2016
Commit adds three new event log types:
+ "Alter Table", logged for every ALTER TABLE DDL statement.
+ "Finish Schema Change", logged when an asynchronous schema change is
completed.
+ "Reverse Schema Change", logged when an asynchronous schema change encounters
an error (such as a constraint violation during backfill) and is rolled back.

This Commit encountered an interesting, perhaps unintended effect with
`SystemConfigTrigger`: in a "mixed" transaction which updates both system and
non-system keys, the system changes will only be gossiped if the transaction
record is located on the system range. In other words, "mixed" transactions will
only work correctly if the system keys are modified first. Issue cockroachdb#7570 had been
logged to track this unexpected behavior; this commit has also added a number of
comments and a logged error to address this.
@petermattis petermattis changed the title SystemConfigTrigger() fails to gossip configuration changes if EndTransaction is not executed on system range internal/client: SystemConfigTrigger() fails to gossip configuration changes if EndTransaction is not executed on system range Jul 1, 2016
@tbg
Copy link
Member

tbg commented Jul 1, 2016

Why does SetSystemConfigTrigger exist again? I was under the impression that its purpose is being called "anywhere" from sql without additional knowledge about the txn. But if the txn must be anchored to the system span for that to have the desired effect, then that contact is extremely fraught. Could we make SetSystemConfigTrigger return an error whenever a) the txn proto doesn't have a key set, or b) that key isn't clearly a member of the system config span? That precludes 1PC transactions with that side effect, but I doubt we have those (or that the ones we have really need to be 1PC - and if there is one, it could still send the manual EndTransactionRequest, which I would find preferable generally anyway but accept doesn't really sit well with sql's usage).
Additionally, can we error on EndTransaction that has this side effect but is on the wrong range?

@tamird
Copy link
Contributor

tamird commented Jul 1, 2016

Related: #2502.

@petermattis
Copy link
Collaborator

SetSystemConfigTrigger exists so that a node can gossip the system config whenever a change is made to one of the system config tables. See sqlbase.IsSystemConfigID. The system config tables are currently:

    SystemDatabaseID  = 1
    NamespaceTableID  = 2
    DescriptorTableID = 3
    UsersTableID      = 4
    ZonesTableID      = 5

It would be less error prone to move the burden of knowing when to gossip the system config table to the server. We know how to extract the table ID from any key and do so to look up zone configs outside of sql. Determining if a key being modified corresponds to a system config table would be straightforward. The question is where should this check be done server side so that it robustly captures all mutations to system config keys.

@tbg
Copy link
Member

tbg commented Jul 2, 2016

I think it's more reliable to do what I suggest above because the issue depends on the history of the transaction prior to doing system config stuff, and so we're far more likely to catch this early, for example the following code

txn := something // with some unknown history
txn.SetSystemConfigTrigger()
txn.Put(system-config-key, "something")
txn.Commit()

will work fine with what you're suggesting except when txn is previously used for a non-systemdb write, whereas my suggestion would catch it always (since in the other case, the txn isn't anchored yet, which catches the error regardless).

@petermattis
Copy link
Collaborator

You're correct. I can't think of any easy way to deal with transactions for which the transaction key does not lie on the system config range. We can't rely on intent resolution happening in any deterministic time frame.

mrtracy pushed a commit to mrtracy/cockroach that referenced this issue Jul 7, 2016
Commit adds three new event log types:
+ "Alter Table", logged for every ALTER TABLE DDL statement.
+ "Finish Schema Change", logged when an asynchronous schema change is
completed.
+ "Reverse Schema Change", logged when an asynchronous schema change encounters
an error (such as a constraint violation during backfill) and is rolled back.

This Commit encountered an interesting, perhaps unintended effect with
`SystemConfigTrigger`: in a "mixed" transaction which updates both system and
non-system keys, the system changes will only be gossiped if the transaction
record is located on the system range. In other words, "mixed" transactions will
only work correctly if the system keys are modified first. Issue cockroachdb#7570 had been
logged to track this unexpected behavior; this commit has also added a number of
comments and a logged error to address this.
@petermattis petermattis modified the milestone: Q3 Jul 11, 2016
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Mar 24, 2017
Not all schema changes require extra work after executing
in a transaction. For example CREATE TABLE with foreign
keys goes into effect immediately.

This change affects schema changes that require extra work
to complete after the transaction they are in is committed.
e.g. CREATE INDEX, DROP TABLE, etc. This change affects these
types of schema changes in the following way:

1. any statement in the same txn, following a schema change
that requires further work, is disallowed. Since the schema
change is not really complete, running such a statement makes
no sense.
2. A schema change that requires further work is allowed at
the end of a transaction IIF the preceeding statements in
the transaction are READ_ONLY. This is because the schema
change needs to anchor the transaction on the system range,
and it is unable to do so thanks to cockroachdb#7570

fixes cockroachdb#14280
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Apr 4, 2017
Not all schema changes require extra work after executing
in a transaction. For example CREATE TABLE without foreign
keys goes into effect immediately.

This change affects schema changes that require extra work
after the transaction they are in is committed.
e.g. CREATE INDEX, DROP TABLE, etc. This change affects these
types of schema changes in the following way:

1. any statement in the same txn following a schema change
that requires further work, is disallowed. Since the schema
change is not really complete, running a statement after
it makes no sense.
2. A schema change that requires further work is allowed at
the end of a transaction IIF the preceeding statements in
the transaction are READ_ONLY. This is because the schema
change needs to anchor the transaction on the system range,
and it is unable to do so thanks to cockroachdb#7570

fixes cockroachdb#14280
@petermattis petermattis modified the milestone: Q3 Aug 3, 2017
@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-bulkio-schema-changes labels May 14, 2018
@tbg
Copy link
Member

tbg commented May 14, 2018

Looks like the reasonable thing here is to fail EndTransaction if it has a system config trigger but is not on a system config range.

@petermattis
Copy link
Collaborator

Note that since this issue was filed we've adjusted the internal/client API to make it much less error prone. In particular, Txn.SetSystemConfigTrigger also manually sets the txn anchor key and errors out if the txn already has an anchor key. I'm not sure if we need to do anything else to provide additional safety.

@tbg
Copy link
Member

tbg commented May 14, 2018

👍 thanks for the heads up.

@tbg tbg closed this as completed May 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

No branches or pull requests

5 participants