Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: a txn might use a table descriptor lease from the future #6773

Closed
andreimatei opened this issue May 18, 2016 · 24 comments
Closed

sql: a txn might use a table descriptor lease from the future #6773

andreimatei opened this issue May 18, 2016 · 24 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Milestone

Comments

@andreimatei
Copy link
Contributor

The sql planner acquires leases before letting a txn use a table descriptor. The leases comes from the LeaseManager, which maintains a cache of leases. On a miss, a lease is acquired from the database in the context of the current txn - that's good. On a hit, the lease might have been previously acquired by a txn with a higher timestamp then the current one; that's no bueno since the current transaction will essentially use a version of the descriptor from its future - for example it might use an index for which its unable to read the data.
We should not allow txns to use leases obtained by txns higher commit timestamps.

@andreimatei andreimatei added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 18, 2016
@andreimatei andreimatei added this to the 1.0 milestone May 18, 2016
@andreimatei andreimatei self-assigned this May 18, 2016
@vivekmenezes vivekmenezes assigned danhhz and andreimatei and unassigned andreimatei and danhhz May 19, 2016
@spencerkimball
Copy link
Member

@andreimatei ping on this one. It's almost a year old. Is this going to be addressed for 1.0?

@vivekmenezes vivekmenezes modified the milestones: Later, 1.0 Mar 31, 2017
@vivekmenezes
Copy link
Contributor

I'd like to keep this open so we can eventually fix it, but it need not be fixed for 1.0

@vivekmenezes
Copy link
Contributor

The concern here is with using table descriptors from the transactions future. This is a perfectly legal descriptor in the sense that it is one of the two legal ones. So what happens here?

ADD INDEX

In the particular case of using a brand new index present in the descriptor from the future, the transaction might decide to use the index. If it reads the index, two things can happen: Either the index was last updated in the past, in which case the transaction reads the data, or the data was written in the transactions future, in which case the transaction gets pushed.

ADD COLUMN

In the case of a new column becoming active in the future table descriptor, again if the transaction might attempt to read the column and if it reads data from the future, it will simply get pushed.

DROP INDEX

If the index has been deleted in the future decriptor, the sql transaction will ignore the index and not touch the index data.

DROP COLUMN

In the case of a column, the column data will be ignored.

So in summary the schema change is leaving the database in a consistent state at a particular timestamp (which BTW is not the timestamp of the table descriptor, earlier on it ran the backfill in separate transactions, each of those could have also used a timestamp greater than the timestamp of the transaction finalizing the schema change). The important point is that using the table descriptor involves reading data that can push a transaction (ADD), or completely ignore data (DROP)

@vivekmenezes
Copy link
Contributor

@andreimatei please close this issue if you are satisfied with this answer

@vivekmenezes
Copy link
Contributor

In case you're wondering why a reader gets pushed by a writer, from the design doc:

"Reader encounters write intent or value with newer timestamp in the near future: In this case, we have to be careful. The newer intent may, in absolute terms, have happened in our read's past if the clock of the writer is ahead of the node serving the values. In that case, we would need to take this value into account, but we just don't know. Hence the transaction restarts, using instead a future timestamp (but remembering a maximum timestamp used to limit the uncertainty window to the maximum clock skew). In fact, this is optimized further; see the details under "choosing a time stamp" below.
"

So I believe the problem is a transaction does some reads on some keys acquires an ancient timestamp and then waits for a bit, and then it runs a SQL statement that uses a table lease from the future. The window of uncertainty doesn't apply here, so the read doesn't get pushed when it attempts to read the index for example.

Let me write up a test that illustrates this problem. The solution is to actually push this transaction based on lease timestamp.

@vivekmenezes
Copy link
Contributor

related to #6774

@vivekmenezes vivekmenezes modified the milestones: 1.0, Later Apr 6, 2017
@vivekmenezes vivekmenezes self-assigned this Apr 6, 2017
@andreimatei
Copy link
Contributor Author

What I would like to understand is fundamental: what is a table lease with expiration E? Is it a capability for txns to commit with a timestamp < E (i.e. the ability to do writes using with ts < E using this version of the descriptor)? If so, shouldn't they also have a start timestamp T (i.e. shouldn't we enforce that writes that have been produced using this version of the descriptor don't have a timestamp < T)?
If this interpretation of the lease is right, then why do we ever check if a lease is about to expire (i.e. lease.hasSomeLifeLeft()? What does it even mean for a lease to expire?
And separately from expiration, shouldn't there be a stasis to account for clock skew, like there is for range leases?

Or are leases a capability to use a version of a descriptor in absolute time (i.e. a lease with expiration E means that no transaction, regardless of its timestamp, cannot use the descriptor if it is started in absolute time after E)?

Or is it both?

I think currently we use lease as a combination of the two interpretations, but I suspect that's not right.

@vivekmenezes
Copy link
Contributor

@andreimatei a lease is valid from the time the lease descriptor version was written in the DB (start time) to the time the lease expires (expiration time). For sure we are not considering the start time in transactions and that needs to be fixed. Luckily we can use the timestamp as the start time and that should suffice. In terms of expiration time, I agree we can make something perfect that takes care of time uncertainty in a perfect way but I feel that for the 1.0 release we can just expire the use of the lease like say 30 seconds before the real expiration. This will allow us to make this work with a simple implementation without making it thorny for no strong reason.

@vivekmenezes
Copy link
Contributor

vivekmenezes commented Apr 6, 2017

The comment below is irrelevant and can be ignored, read the next comments

As far as considering using the start time of a lease when using a lease,

  • Always create new leases on the latest two versions of the table descriptor:
  1. We need to first decouple the use of gaining a lease using a transaction except when the transaction involves a CREATE TABLE for the leased table. The lease acquisition logic should use its own transaction in all other cases.
  2. Furthermore, when incrementing the lease we should not just count the number of leases on the prior version - 1, but the count on all versions prior to version -1
  • The txn using a lease should check that its timestamp >= start time of the lease, or else abort

@vivekmenezes
Copy link
Contributor

Looks like to make this perfect we also need to insert a cluster lease for every version of the table with no expiration time.

  1. Every time the lease version is incremented, the old cluster lease at the old version is deleted and a new cluster lease at the new version is created.
  2. every time a lease is acquired, the cluster lease at that version is also read.

The above guarantees that one cannot use a lease from an old version.

@vivekmenezes
Copy link
Contributor

vivekmenezes commented Apr 6, 2017

So to summarize the new strategy that guarantees that a lease only on the latest version of a table can be acquired. (fix for #6774)

  1. Each table owns a cluster lease in the lease table with its current version. This lease is an expired lease and thus cannot be used by a transaction.
  2. Every time the lease version is incremented, a new cluster lease at the new version is created without deleting the old cluster lease. This allows a lease acquirer to check if it is acquiring a lease on the latest version of the descriptor using 3. below
  3. lease acquisition involves inserting an entry for the lease in the lease table and checking that a cluster lease for version+1 doesn't exists.

@vivekmenezes
Copy link
Contributor

We also need to figure out how to push transactions which cannot use the latest lease, which is the fix for this issue.

@vivekmenezes
Copy link
Contributor

Hmmmm, looks like what I'm suggesting above will still not work :-(

@andreimatei
Copy link
Contributor Author

I might have my own suggestions, but first I'd like to hear an answer to my question from before (maybe you've answered it and I'm just failing to parse): what is a lease supposed to be? Is it about txn timestamp, the wallclock when a txn starts and ends, both, or neither?

@vivekmenezes
Copy link
Contributor

vivekmenezes commented Apr 7, 2017

So here is another proposal on how we can acquire and use table leases properly:

Lease acquisition:

  1. Move from a request transaction acquired lease to a node acquired one: This will ensure that the lease truly belong to the node and matches up with the requirement that it be acquired in the "present"
  2. Since a lease is acquired in the "present" it will compete directly with transactions modifying the table descriptor. A transaction modifying the table descriptor runs a scan over previous leases and increments the lease version thus populating the read timestamp cache, which can push the lease acquiring transaction. This makes lease acquisition safe, in the sense that a lease will safely be acquired at the current version or the next version (version + 1) depending on when the lease acquisition runs, but never at version -1. A successful acquisition at version -1 at time T1, will mean that the transaction trying to increment the version to version + 1 at T2 > T1 will see the lease. And conversely if the lease is incremented at T2 to version + 1, and the lease acquisition tries to create a lease at version -1 at T1 later, it will be blocked by the timestamp cache as long as T2 - cache-window < T1 < T2 . The key here is to get T1 to obey this, which means not introducing any unnecessary delays in the lease acquisition transaction (and hence the need to move to node based acquisitions).
  3. If we are not confident in (T2 - cache-window < T1 < T2) we could consider adding artificial delays such as guarantees that a version cannot be incremented for a set period of time (it needs to be baked for a short time). But I feel confident we don't need to do that here.

lease use:

The lease acquisition made by the node, is made at a particular timestamp T3 (the timestamp the transaction representing the node inserts the lease). Assume the table descriptor of the lease was written at timestamp Tc. Another transaction is allowed to use a lease only if its timestamp T4 is >= Tc (Note T3 >= Tc). The other condition is that the transaction timestamp T4 < T4 + expiration interval - delta . I'm not sure what this delta should be (perhaps the CLOCK_OFFSET) but we can set this to something much higher like 10 seconds. If the transaction timestamp cannot meet this criterion, it is aborted, gets a new timestamp and even perhaps sees a new lease and makes progress.

Another detail worth adding here is the question of when to declare new schema elements as consistent: We currently upgrade a schema element to be world readable (new index for example), when the backfill is complete and the new descriptor has been written exposing the schema element as PUBLIC with a new version for the descriptor at T5. This is mildly inaccurate. With the backfill now happening in a distributed way with multiple transactions, you can have a situation in which one backfill transaction writes itself out at a small delta into the future at timestamp T5 + delta. Therefore a read on the table at T6, where T5 < T6 < T5 + delta will see an inconsistent index. While we believe the transaction on seeing the write at T5 + delta will get aborted there is a need for linearizability here so we cannot depend on pure serializability. Linearizability is attained by inserting a CLOCK_OFFSET delay between the time the backfill is complete and the schema element is declared consistent.

@petermattis
Copy link
Collaborator

The proposals all seem like significant changes. Is there a simpler approach which records the timestamp at which a lease was acquired and acquires another one if the requesting transaction has a lower timestamp?

vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Apr 24, 2017
table leases are owned by the node and should be acquired
outside of the context of the user's transaction. This prevents
old user transactions to be used in acquiring leases.

Also prevent a transaction from using a lease on a table
descriptor version written after the transaction timestamp.
restart transaction whenever it sees a future table lease.

database name -> database id lookup is no longer done
using the user transaction.

fixes cockroachdb#6774
fixes cockroachdb#6773
@vivekmenezes vivekmenezes modified the milestones: 1.0, 1.1 Apr 24, 2017
@vivekmenezes vivekmenezes modified the milestones: 1.1, 1.0 Apr 26, 2017
@vivekmenezes
Copy link
Contributor

moving to 1.1. the fix is too risky for little benefit

vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue May 9, 2017
table leases are owned by the node and should be acquired
outside of the context of the user's transaction. This prevents
old user transactions to be used in acquiring leases.

Also prevent a transaction from using a lease on a table
descriptor version written after the transaction timestamp.
restart transaction whenever it sees a future table lease.

database name -> database id lookup is no longer done
using the user transaction.

fixes cockroachdb#6774
fixes cockroachdb#6773
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue May 10, 2017
table leases are owned by the node and should be acquired
outside of the context of the user's transaction. This prevents
old user transactions to be used in acquiring leases.

Also prevent a transaction from using a lease on a table
descriptor version written after the transaction timestamp.
restart transaction whenever it sees a future table lease.

database name -> database id lookup is no longer done
using the user transaction.

fixes cockroachdb#6774
fixes cockroachdb#6773
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue May 15, 2017
table leases are owned by the node and are  acquired
outside of the context of a user's transaction. This prevents
old user transactions to be used in acquiring leases.

Also prevent a transaction from using a lease on a table
descriptor version written after the transaction timestamp.
Restart transaction whenever it sees a future table lease.

database name -> database id lookup is no longer done
using the user transaction.

Schema changes made during a transaction are cached and
used by future commands during the same transaction. Future
commands use the cached schema and do not attempt to
acquire a new lease on a table descriptor.

fixes cockroachdb#6774
fixes cockroachdb#6773
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue May 23, 2017
table leases are owned by the node and are  acquired
outside of the context of a user's transaction. This prevents
old user transactions to be used in acquiring leases.

Also prevent a transaction from using a lease on a table
descriptor version written after the transaction timestamp.
Restart transaction whenever it sees a future table lease.

database name -> database id lookup is no longer done
using the user transaction.

Schema changes made during a transaction are cached and
used by future commands during the same transaction. Future
commands use the cached schema and do not attempt to
acquire a new lease on a table descriptor.

fixes cockroachdb#6774
fixes cockroachdb#6773
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue May 23, 2017
table leases are owned by the node and are  acquired
outside of the context of a user's transaction. This prevents
old user transactions to be used in acquiring leases.

Also prevent a transaction from using a lease on a table
descriptor version written after the transaction timestamp.
Restart transaction whenever it sees a future table lease.

database name -> database id lookup is no longer done
using the user transaction.

Schema changes made during a transaction are cached and
used by future commands during the same transaction. Future
commands use the cached schema and do not attempt to
acquire a new lease on a table descriptor.

fixes cockroachdb#6774
fixes cockroachdb#6773
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue May 24, 2017
table leases are owned by the node and are  acquired
outside of the context of a user's transaction. This prevents
old user transactions to be used in acquiring leases.

Also prevent a transaction from using a lease on a table
descriptor version written after the transaction timestamp.
Restart transaction whenever it sees a future table lease.

database name -> database id lookup is no longer done
using the user transaction.

Schema changes made during a transaction are cached and
used by future commands during the same transaction. Future
commands use the cached schema and do not attempt to
acquire a new lease on a table descriptor.

fixes cockroachdb#6774
fixes cockroachdb#6773
@vivekmenezes
Copy link
Contributor

#17307

@vivekmenezes
Copy link
Contributor

#16842 mostly fixes this issue.

#17378 is the last outstanding fix for this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

No branches or pull requests

7 participants