
kv: intent resolution for high priority pushes should not wait for batching #50390

Closed
kocoten1992 opened this issue Jun 18, 2020 · 9 comments · Fixed by #103265
Assignees
nvanbenschoten
Labels
  • A-kv-transactions: Relating to MVCC and the transactional model.
  • A-read-committed: Related to the introduction of Read Committed
  • C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
  • O-community: Originated from the community
  • T-kv: KV Team

Comments

@kocoten1992

kocoten1992 commented Jun 18, 2020

Describe the problem

When a row is locked with select ... for update, other queries that touch that row become significantly more expensive; example below:

To Reproduce

  1. Start a fresh node (./cockroach start-single-node --insecure) and create the test data:
CREATE TABLE test_table (
  id uuid NOT NULL,
  PRIMARY KEY (id)
);

INSERT INTO test_table (id) values ('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa');
INSERT INTO test_table (id) values ('bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb');
  2. Open two cockroach shells:
# shell 1
begin;
root@:26257/test_database  OPEN> select * from test_table where id = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' for update;
                   id
----------------------------------------
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
(1 row)

Time: 766.309µs
# shell 2
begin priority high;
root@:26257/default  OPEN> select * from test_table where id = 'bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb' for update;
                   id
----------------------------------------
  bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
(1 row)

Time: 1.265911ms

root@:26257/default  OPEN> select * from test_table where id = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' for update;
                   id
----------------------------------------
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
(1 row)

Time: 14.729765ms

Notice that the select time is significantly higher.

I don't know how to get around this. Maybe we could have a priority-read select that skips all of the checking?

Because of this, our select for update kind of works, but not truly.

Environment:

  • CockroachDB version: v20.1.1

P.S.: another example:

# shell 1
begin priority low;
root@:26257/default  OPEN> select * from test_table where id = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' for update;
                   id
----------------------------------------
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
(1 row)

Time: 1.081118ms
# shell 2
# no need transaction here
root@:26257/default> select * from test_table where id = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa';           
                   id
----------------------------------------
  aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
(1 row)

Time: 19.77373ms

Jira issue: CRDB-4129

Epic CRDB-25218

@blathers-crl

blathers-crl bot commented Jun 18, 2020

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @ricardocrdb (member of the technical support engineering team)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.

blathers-crl bot added the C-bug, O-community, and X-blathers-oncall labels Jun 18, 2020
@yuzefovich
Member

cc @nvanbenschoten

@kocoten1992
Author

I forgot to mention: begin read only; does not help. Maybe it would make sense for read-only transactions to skip this?

@riking

riking commented Jun 19, 2020

This looks like the intended behavior of SELECT .. FOR UPDATE? The entire point of the statement is to take a write lock on the row, because you're about to update it. If other reads didn't wait for you to finish, they would end up with aborted transactions.

And yes, SELECT .. FOR UPDATE should be rejected during a read-only transaction.

@kocoten1992
Author

For some peculiar flow control, what you really want is both select .. for update and normal select working together.

With PostgreSQL, select .. for update (and even select .. for share) does not interfere with a normal select, so we can achieve maximum query throughput with normal selects.

With CockroachDB, select .. for update does interfere with a normal select. On the other hand, we have something better than Postgres: transaction priorities. Unfortunately, in the current state, normal select performance is still affected.

It's all about query throughput; I hope I made myself clear.

@riking

riking commented Jun 20, 2020

Right, so the delay is because the read-only client needs to find the transaction record of the FOR UPDATE client, see that it hasn't committed yet, bump the other transaction's timestamp so that it will happen after the read, and then return.

I think this means that the SELECT FOR UPDATE is causing read-only transactions to perform a write.

The good news is that any other concurrent reads for the same key should be satisfied by the same timestamp bump?
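To make the push mechanism described above concrete, here is a toy Go sketch; it is not CockroachDB's actual code, and every name in it is made up for illustration. It only models the idea that a pending writer's timestamp gets bumped above the reader's timestamp so the read can return without waiting:

```go
package main

import "fmt"

// txnRecord is a highly simplified stand-in for a transaction record.
type txnRecord struct {
	id        string
	timestamp int64 // simplified logical timestamp
	committed bool
}

// pushTimestampAbove models the push described above: if the writer is still
// pending, its timestamp is bumped above the reader's timestamp so the writer
// will commit after the read and the read does not have to wait for it. The
// update to the transaction record is the write mentioned in the comments.
func pushTimestampAbove(writer *txnRecord, readTS int64) {
	if writer.committed {
		return // nothing to push; the committed value is visible as written
	}
	if writer.timestamp <= readTS {
		writer.timestamp = readTS + 1
	}
}

func main() {
	writer := &txnRecord{id: "select-for-update-txn", timestamp: 10}
	const readTS = 15
	pushTimestampAbove(writer, readTS)
	fmt.Printf("writer pushed to ts=%d; read at ts=%d can return\n", writer.timestamp, readTS)
	// A second concurrent read at a timestamp <= 16 finds the writer already
	// pushed and does not need to bump the record again.
}
```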

@kocoten1992
Author

The good news is that any other concurrent reads for the same key should be satisfied by the same timestamp bump?

Are you sure about that, @riking?

Here is my test. I opened 3 cockroach clients:
shell 1: begin with normal priority, then select ... for update
shell 2: begin priority high, hiccup
shell 3: begin priority high, still a hiccup

If we are talking about calling the same select in the same transaction, that is not what I am after.

@nvanbenschoten
Member

Hi @kocoten1992, thanks for the report and the reproduction steps. Like @riking mentioned, the delay is partly because the high priority transaction needs to perform a write to the other transaction's record. In my testing on my Mac, this was responsible for about 8ms. Do you mind measuring the duration of an INSERT statement in your test environment? As an aside, this 8ms was slower than I was expecting until I was reminded of golang/go#26650.

In practice, you will only hit this delay if the first transaction has heartbeat its transaction record, which begins after the transaction has been alive for 1s (see DefaultTxnHeartbeatInterval). If not, the push is a fully in-memory operation which does not require a disk write and should take less than 1ms. Are you holding transactions open for longer than this in your application?

The other ~5ms appears to be coming from a delay during intent resolution. CockroachDB attempts to batch intent resolution across multiple transactions, which is effective in avoiding redundant pushes and reducing the aggregate number of disk writes due to contention handling. In practice, this is a latency/throughput trade-off. However, I think there's a strong argument that high-priority transactions should not wait for this delay, so we should be able to remove this remaining 5ms. Do you mind if I repurpose this issue to track that change?
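For illustration only, here is a small, self-contained Go sketch of the batching trade-off described above. This is not CockroachDB's RequestBatcher; the names, the struct shapes, and the 10ms constant are stand-ins. Requests normally wait up to a batch-wait interval so they can be flushed together, while a latency-sensitive caller bypasses the wait and pays no batching delay:

```go
package main

import (
	"fmt"
	"time"
)

// batchWait stands in for the ~10ms batching window discussed above; the
// value here is illustrative only.
const batchWait = 10 * time.Millisecond

type request struct {
	key  string
	done chan struct{}
}

// batcher buffers requests and flushes them together when the batch-wait
// timer fires, mimicking the latency/throughput trade-off described above.
type batcher struct {
	incoming chan request
}

func newBatcher() *batcher {
	b := &batcher{incoming: make(chan request)}
	go b.loop()
	return b
}

func (b *batcher) loop() {
	var pending []request
	var timer <-chan time.Time
	for {
		select {
		case r := <-b.incoming:
			pending = append(pending, r)
			if timer == nil {
				timer = time.After(batchWait)
			}
		case <-timer:
			for _, r := range pending {
				close(r.done) // pretend the batched resolution RPC happens here
			}
			pending, timer = nil, nil
		}
	}
}

// resolve submits one request. With sendImmediately set (a stand-in for the
// behavior this issue asks for), it skips the batcher and "sends" right away,
// giving up batching in exchange for lower latency.
func (b *batcher) resolve(key string, sendImmediately bool) time.Duration {
	start := time.Now()
	if sendImmediately {
		return time.Since(start) // direct send, no batching delay
	}
	r := request{key: key, done: make(chan struct{})}
	b.incoming <- r
	<-r.done
	return time.Since(start)
}

func main() {
	b := newBatcher()
	fmt.Println("batched wait:   ", b.resolve("k1", false)) // roughly batchWait
	fmt.Println("immediate wait: ", b.resolve("k1", true))  // near zero
}
```

Running this prints a batched latency close to the batch-wait interval and an immediate latency near zero, which mirrors the batching-induced portion of the delay discussed above.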

@kocoten1992
Author

kocoten1992 commented Jun 22, 2020

I don't mind at all, and thank you!

P.S.: my testing method was wrong (I opened a cockroachdb client and let the transaction live for longer than 1s), thanks for letting me know.

P.S.: what about read-only transactions (begin read only)? Should they skip everything and read immediately without checking row locks?

nvanbenschoten changed the title from "[select for update] cost is too high" to "kv: intent resolution for high priority pushes should not wait for batching" Jun 30, 2020
nvanbenschoten added the A-kv-transactions label Jun 30, 2020
nvanbenschoten self-assigned this Jun 30, 2020
jlinder added the T-kv label Jun 16, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 14, 2023
kv: resolve conflicting intents immediately for latency-sensitive requests

Fixes cockroachdb#50390.
Related to cockroachdb#60585.
Related to cockroachdb#103126.

Closes cockroachdb#64500, which was an earlier attempt to solve this issue using a similar
approach. This commit addresses the comments on that PR, which focused on the
pagination of intent resolution when bypassing the request batcher. We still
don't try to solve the pagination problem, and instead limit the cases where
intent resolution bypasses the request batcher to those where pagination is not
necessary.

This commit adds a new `sendImmediately` option to the `IntentResolver`
`ResolveOptions`, which instructs the `IntentResolver` to send the provided
intent resolution requests immediately, instead of adding them to a batch and
waiting up to 10ms (defaultIntentResolutionBatchWait) for that batch to fill up
with other intent resolution requests. This can be used to avoid any
batching-induced latency and should be used only by foreground traffic that is
willing to trade off some throughput for lower latency.

The commit then sets this flag for single-key intent resolution requests initiated
by the `lockTableWaiter`. Unlike most calls into the `IntentResolver`, which are
performed by background tasks that are more than happy to trade increased
latency for increased throughput, the `lockTableWaiter` calls into the
`IntentResolver` on behalf of a foreground operation. It wants intent resolution
to complete as soon as possible, so it is willing to trade reduced throughput
for reduced latency.

I tested this out by writing 10,000 different intents in a normal-priority transaction
and then scanning over the table using a high-priority transaction. The test was
performed on a 3-node roachprod cluster to demonstrate the effect with realistic
network and disk latencies.
```
-- session 1
CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 10000);

-- session 2
BEGIN PRIORITY HIGH; SELECT count(*) FROM keys;
```

Without this change, the scan took 70.078s. With this change, the scan took
15.958s. This 78% speed up checks out. Each encountered intent is resolved
serially (see cockroachdb#103126), so the per-intent latency drops from 7ms to 1.6ms. This
improvement by about 5ms agrees with the `defaultIntentResolutionBatchIdle`,
which delays each resolution request that passes through a RequestBatcher. With
this change, these resolve intent requests are issued immediately and this delay
is not experienced.

While this is a significant improvement and will be important for Read Committed
transactions after cockroachdb#102014, this is still not quite enough to resolve cockroachdb#103126.
For that, we need to batch the resolve intent requests themselves using a form
of deferred intent resolution like we added in cockroachdb#49218 (for finalized transactions).

A similar improvement is seen for scans that encounter many abandoned intents
from many different transactions. This scenario bypasses the deferred intent
resolution path added in cockroachdb#49218, because the intents are each written by
different transactions. The speedup for this scenario was presented in cockroachdb#64500.

Release note (performance improvement): SQL statements that must clean up
intents from many different previously abandoned transactions now do so
moderately more efficiently.
nvanbenschoten added the A-read-committed label May 14, 2023
craig bot pushed a commit that referenced this issue May 18, 2023
103265: kv: resolve conflicting intents immediately for latency-sensitive requests r=nvanbenschoten a=nvanbenschoten



103362: sql: validate primary / secondary region localities at end of txn r=fqazi a=fqazi

Previously, if a database was restored with skip_localities, there was no way to modify this database to set the primary region since validation would kick in too early during the statement. This meant fixing the regions in a restored database was impossible if the primary region was no longer valid. To address this, this patch, delays locality validation till the end of the transaction.

Fixes: #103290

Release note (bug fix): SET PRIMARY REGION and SET SECONDARY REGION did not validate transactionally, which could prevent cleaning up removed regions after a restore.

103373: concurrency: small refactors in preparation for reservation removal r=nvanbenschoten a=arulajmani

See individual commits for details. 

Informs: #103361

103538: go.mod: bump Pebble to 6f2788660198, rework shared storage wrapper r=RaduBerinde a=RaduBerinde

6f278866 shared: improve interface for more efficient reading
9eb2c407 db: log events to testing.T in unit tests
f32e7dc6 db: add reserved Pebblev4 sstable format
5a6b91b8 objstorage: improve test and add read ahead test
2bc4319e objstorage: remove genericFileReadable
8143ffb9 objstorage: fix readaheadState initialization
06d08888 db: add Compact.Duration metric
e7213de0 db: add Uptime metric
e9005aed db: don't delete files during ingest application
222b43ec internal/arenaskl: fix Skiplist doc typo

Release note: None
Epic: none

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
craig bot closed this as completed in 3fc29ad May 18, 2023