sql: a txn might use a table descriptor lease from the future #6773
Comments
@andreimatei ping on this one. It's almost a year old. Is this going to be addressed for 1.0? |
I'd like to keep this open so we can eventually fix it, but it need not be fixed for 1.0 |
The concern here is with using table descriptors from the transaction's future. Such a descriptor is perfectly legal in the sense that it is one of the two legal ones. So what happens here?

ADD INDEX: In the particular case of a brand new index present in the descriptor from the future, the transaction might decide to use the index. If it reads the index, one of two things can happen: either the index was last updated in the past, in which case the transaction reads the data, or the data was written in the transaction's future, in which case the transaction gets pushed.

ADD COLUMN: In the case of a new column becoming active in the future table descriptor, the transaction might again attempt to read the column, and if it reads data from the future, it will simply get pushed.

DROP INDEX: If the index has been deleted in the future descriptor, the SQL transaction will ignore the index and not touch the index data.

DROP COLUMN: In the case of a column, the column data will be ignored.

In summary, the schema change leaves the database in a consistent state at a particular timestamp (which, by the way, is not the timestamp of the table descriptor: earlier on it ran the backfill in separate transactions, each of which could also have used a timestamp greater than that of the transaction finalizing the schema change). The important point is that using the table descriptor involves either reading data that can push a transaction (ADD) or completely ignoring data (DROP). |
@andreimatei please close this issue if you are satisfied with this answer |
In case you're wondering why a reader gets pushed by a writer, from the design doc: "Reader encounters write intent or value with newer timestamp in the near future: In this case, we have to be careful. The newer intent may, in absolute terms, have happened in our read's past if the clock of the writer is ahead of the node serving the values. In that case, we would need to take this value into account, but we just don't know. Hence the transaction restarts, using instead a future timestamp (but remembering a maximum timestamp used to limit the uncertainty window to the maximum clock skew). In fact, this is optimized further; see the details under "choosing a time stamp" below."

So I believe the problem is this: a transaction does some reads on some keys, acquiring an ancient timestamp, waits for a bit, and then runs a SQL statement that uses a table lease from the future. The window of uncertainty doesn't apply here, so the read doesn't get pushed when it attempts to read the index, for example. Let me write up a test that illustrates this problem. The solution is to actually push this transaction based on the lease timestamp. |
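To make the quoted rule concrete, here is a minimal Go sketch of how a reader might classify a value it encounters. The `Timestamp` type, the `classifyRead` function, and the outcome names are hypothetical simplifications for illustration, not CockroachDB's actual API. The third case is the one at stake in this issue: a value far beyond the uncertainty window is simply invisible, so the reader is not pushed.

```go
package main

import "fmt"

// Timestamp is a simplified logical timestamp (hypothetical; real hybrid
// logical clock timestamps carry more structure than a single integer).
type Timestamp int64

// ReadOutcome describes what a reader does with a value it encounters.
type ReadOutcome int

const (
	Visible ReadOutcome = iota // value is at or below the read timestamp
	Restart                    // value falls inside the uncertainty window
	Ignore                     // value is beyond the uncertainty window
)

// classifyRead applies the rule from the design doc quote: a value newer
// than the read timestamp but no newer than maxTS (read timestamp plus the
// maximum clock skew) may have happened in the reader's past, so the reader
// restarts. Anything newer than maxTS is not seen and causes no restart.
func classifyRead(readTS, maxTS, valueTS Timestamp) ReadOutcome {
	switch {
	case valueTS <= readTS:
		return Visible
	case valueTS <= maxTS:
		return Restart
	default:
		return Ignore
	}
}

func main() {
	const readTS, maxTS = 100, 110
	fmt.Println("old value visible:", classifyRead(readTS, maxTS, 90) == Visible)
	fmt.Println("near-future value restarts:", classifyRead(readTS, maxTS, 105) == Restart)
	// An "ancient" transaction reading index data written long after it
	// started: no restart happens, which is the gap described above.
	fmt.Println("far-future value ignored:", classifyRead(readTS, maxTS, 500) == Ignore)
}
```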
related to #6774 |
What I would like to understand is fundamental: what is a table lease with expiration E? Is it a capability for txns to commit with a timestamp < E (i.e. the ability to do writes with ts < E using this version of the descriptor)? If so, shouldn't leases also have a start timestamp T (i.e. shouldn't we enforce that writes produced using this version of the descriptor don't have a timestamp < T)? Or are leases a capability to use a version of a descriptor in absolute time (i.e. a lease with expiration E means that no transaction, regardless of its timestamp, can use the descriptor if it is started in absolute time after E)? Or is it both? I think we currently use leases as a combination of the two interpretations, but I suspect that's not right. |
@andreimatei a lease is valid from the time the lease descriptor version was written in the DB (start time) to the time the lease expires (expiration time). For sure we are not considering the start time in transactions, and that needs to be fixed. Luckily we can use the timestamp as the start time and that should suffice. In terms of expiration time, I agree we could build something that handles time uncertainty perfectly, but I feel that for the 1.0 release we can just stop using the lease some time, say 30 seconds, before the real expiration. This will allow us to make this work with a simple implementation without making it thorny for no strong reason. |
The comment below is irrelevant and can be ignored; read the next comments. As far as considering using the start time of a lease when using a lease,
|
Looks like to make this perfect we also need to insert a
The above guarantees that one cannot use a lease from an old version. |
To summarize, the new strategy guarantees that a lease can be acquired only on the latest version of a table (fix for #6774).
|
We also need to figure out how to push transactions which cannot use the latest lease, which is the fix for this issue. |
Hmmmm, looks like what I'm suggesting above will still not work :-( |
I might have my own suggestions, but first I'd like to hear an answer to my question from before (maybe you've answered it and I'm just failing to parse): what is a lease supposed to be? Is it about txn timestamp, the wallclock when a txn starts and ends, both, or neither? |
So here is another proposal on how we can acquire and use table leases properly: Lease acquisition:
Lease use: The node acquires the lease at a particular timestamp T3 (the timestamp at which the transaction representing the node inserts the lease). Assume the table descriptor of the lease was written at timestamp Tc. Another transaction is allowed to use the lease only if its timestamp T4 is >= Tc (note that T3 >= Tc). The other condition is that the transaction timestamp satisfies T4 < T3 + expiration interval - delta. I'm not sure what this delta should be (perhaps the CLOCK_OFFSET), but we can set it to something much higher, like 10 seconds. If the transaction timestamp cannot meet this criterion, the transaction is aborted, gets a new timestamp, perhaps even sees a new lease, and makes progress.

Another detail worth adding here is the question of when to declare new schema elements as consistent. We currently upgrade a schema element (a new index, for example) to be world readable when the backfill is complete and the new descriptor has been written, exposing the schema element as PUBLIC with a new version for the descriptor at T5. This is mildly inaccurate. With the backfill now happening in a distributed way with multiple transactions, you can have a situation in which one backfill transaction writes itself out a small delta into the future, at timestamp T5 + delta. Therefore a read on the table at T6, where T5 < T6 < T5 + delta, will see an inconsistent index. While we believe the transaction will get aborted on seeing the write at T5 + delta, there is a need for linearizability here, so we cannot depend on pure serializability. Linearizability is attained by inserting a CLOCK_OFFSET delay between the time the backfill is complete and the time the schema element is declared consistent. |
The proposals all seem like significant changes. Is there a simpler approach which records the timestamp at which a lease was acquired and acquires another one if the requesting transaction has a lower timestamp? |
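As an illustration of what that simpler approach might look like, here is a minimal Go sketch of a cache that remembers the timestamp at which each lease was acquired and bypasses the cached entry when the requesting transaction is older. `LeaseCache`, `cachedLease`, and the `acquire` callback are hypothetical names invented for this sketch; they are not the LeaseManager's real interface.

```go
package main

import "fmt"

// Timestamp is a simplified logical timestamp (hypothetical).
type Timestamp int64

// cachedLease records which descriptor version a lease covers and the
// timestamp at which the lease was acquired.
type cachedLease struct {
	Version    int
	AcquiredAt Timestamp
}

// LeaseCache is a hypothetical per-node cache keyed by table name.
// acquire stands in for reading the lease from the database.
type LeaseCache struct {
	leases  map[string]cachedLease
	acquire func(table string, ts Timestamp) cachedLease
}

// NewLeaseCache builds an empty cache around the given acquire function.
func NewLeaseCache(acquire func(string, Timestamp) cachedLease) *LeaseCache {
	return &LeaseCache{leases: map[string]cachedLease{}, acquire: acquire}
}

// Get returns the cached lease only if it was acquired at or before the
// requesting transaction's timestamp. Otherwise it acquires another lease
// for that transaction and leaves any newer cached entry in place.
func (c *LeaseCache) Get(table string, txnTS Timestamp) cachedLease {
	cached, ok := c.leases[table]
	if ok && cached.AcquiredAt <= txnTS {
		return cached
	}
	fresh := c.acquire(table, txnTS)
	if !ok {
		c.leases[table] = fresh
	}
	return fresh
}

func main() {
	acquisitions := 0
	cache := NewLeaseCache(func(table string, ts Timestamp) cachedLease {
		acquisitions++
		return cachedLease{Version: 1, AcquiredAt: ts}
	})

	cache.Get("t", 200) // miss: acquires at 200
	cache.Get("t", 250) // hit: cached lease is not from this txn's future
	cache.Get("t", 100) // older txn: cached lease is from its future, so re-acquire
	fmt.Println("acquisitions:", acquisitions)
}
```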
Table leases are owned by the node and should be acquired outside of the context of the user's transaction. This prevents old user transactions from being used to acquire leases. Also prevent a transaction from using a lease on a table descriptor version written after the transaction timestamp. Restart a transaction whenever it sees a future table lease. The database name -> database id lookup is no longer done using the user transaction. fixes cockroachdb#6774 fixes cockroachdb#6773
moving to 1.1. the fix is too risky for little benefit |
Table leases are owned by the node and are acquired outside of the context of a user's transaction. This prevents old user transactions from being used to acquire leases. Also prevent a transaction from using a lease on a table descriptor version written after the transaction timestamp. Restart a transaction whenever it sees a future table lease. The database name -> database id lookup is no longer done using the user transaction. Schema changes made during a transaction are cached and used by future commands during the same transaction. Future commands use the cached schema and do not attempt to acquire a new lease on a table descriptor. fixes cockroachdb#6774 fixes cockroachdb#6773
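The rule described in this commit message (do not use a descriptor version written after the transaction timestamp, and restart instead) can be sketched as follows. The `TableDescriptor` fields, `checkDescriptor`, and `errRetryFutureLease` are hypothetical stand-ins chosen for this sketch; the actual change may be structured quite differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Timestamp is a simplified logical timestamp (hypothetical).
type Timestamp int64

// TableDescriptor is a hypothetical, pared-down descriptor.
type TableDescriptor struct {
	Name             string
	Version          int
	ModificationTime Timestamp // when this descriptor version was written
}

// errRetryFutureLease signals that the caller should restart the
// transaction at a newer timestamp.
var errRetryFutureLease = errors.New("table descriptor is from the transaction's future; restart")

// checkDescriptor rejects a descriptor version that was written after the
// transaction's timestamp.
func checkDescriptor(desc TableDescriptor, txnTS Timestamp) error {
	if desc.ModificationTime > txnTS {
		return errRetryFutureLease
	}
	return nil
}

func main() {
	desc := TableDescriptor{Name: "t", Version: 2, ModificationTime: 150}
	txnTS := Timestamp(100)

	if err := checkDescriptor(desc, txnTS); err != nil {
		fmt.Println("restarting:", err)
		txnTS = 200 // hypothetical fresh timestamp picked up on restart
	}
	fmt.Println("usable after restart:", checkDescriptor(desc, txnTS) == nil)
}
```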
The SQL planner acquires leases before letting a txn use a table descriptor. The leases come from the LeaseManager, which maintains a cache of leases. On a miss, a lease is acquired from the database in the context of the current txn, which is good. On a hit, the lease might have been previously acquired by a txn with a higher timestamp than the current one. That's no bueno, since the current transaction will essentially use a version of the descriptor from its future: for example, it might use an index for which it is unable to read the data. We should not allow txns to use leases obtained by txns with higher commit timestamps.