
bounded staleness: v22.1.5 returns error - cannot use bounded staleness for DISTRIBUTE #85288

Closed
sheaffej opened this issue Jul 29, 2022 · 1 comment · Fixed by #85423
Labels: C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-sql-queries (SQL Queries Team)


@sheaffej (Collaborator) commented Jul 29, 2022

Describe the problem

When performing a bounded staleness read on a table whose voting replicas have lost a majority (the nodes are down or network-isolated), v22.1.5 returns an error, whereas the same query succeeds on v21.2.13.
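
For reference, the two bounded staleness read forms look like this (a minimal sketch with a hypothetical table `t` and primary key column `pk`, not the repro schema below):

```
-- Tolerate results up to a bounded age behind the current time:
SELECT * FROM t AS OF SYSTEM TIME with_max_staleness('10s') WHERE pk = 1;

-- Or require results no older than a specific timestamp:
SELECT * FROM t AS OF SYSTEM TIME with_min_timestamp(now() - '10s'::INTERVAL) WHERE pk = 1;
```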

In v21.2.13, I can perform a bounded staleness read on rides (in a surviving region):

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
                   id                  |     start_time      |      end_time
---------------------------------------+---------------------+----------------------
  00000000-0000-4000-8000-000000000000 | 2018-12-21 03:04:05 | 2018-12-23 08:04:05
(1 row)

But in v22.1.5, I get this error:

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
ERROR: unimplemented: cannot use bounded staleness for DISTRIBUTE
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/67562/v22.1

To Reproduce

Create a multi-region cluster, set the database to the ZONE survival goal, and make the rides table a REGIONAL table. Wait until any rebalancing has completed, then fail all nodes in the database's primary region. Connect to a surviving node and attempt a bounded staleness read.

Create the cluster:

# Change based on your username
CLUSTER=j4-movr-mr-demo

LIFETIME="12h0m0s"
MACHINE="n2-standard-4"
# RELEASE="v21.2.13"
RELEASE="v22.1.5"
ZONES="us-west3-b,us-west1-b,us-east4-b,us-east1-b"
NODES=12

roachprod create ${CLUSTER} \
   -n ${NODES} \
   --gce-machine-type=${MACHINE} \
   --gce-zones=${ZONES} \
   --geo \
   --lifetime=${LIFETIME}

roachprod stage ${CLUSTER} release ${RELEASE}
roachprod start ${CLUSTER}:1-12 --args "--max-offset 250ms"
roachprod adminurl ${CLUSTER}:1 --open
roachprod status ${CLUSTER}

# Install an enterprise license and set any preferred cluster settings
echo "
SET CLUSTER SETTING cluster.organization = 'J4 CRL Demo';
SET CLUSTER SETTING enterprise.license = '---->get-your-own-key<----';
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '4g'; 
SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '4g';
SET CLUSTER SETTING server.time_until_store_dead = '1m15s';
" | roachprod sql $CLUSTER:1

Create the database and tables:

DROP DATABASE IF EXISTS movr_demo;
CREATE DATABASE movr_demo;

ALTER DATABASE movr_demo SET PRIMARY REGION = "us-west1"; 
ALTER DATABASE movr_demo ADD REGION "us-west3"; 
ALTER DATABASE movr_demo ADD REGION "us-east4";
ALTER DATABASE movr_demo ADD REGION "us-east1";

-- Just being explicit here
ALTER DATABASE movr_demo SURVIVE ZONE FAILURE;

CREATE TABLE movr_demo.public.rides
(
    id uuid PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
    city STRING,
    vehicle_city STRING,
    rider_id uuid,
    vehicle_id uuid,
    start_address STRING,
    end_address STRING,
    start_time timestamp,
    end_time timestamp,
    revenue DECIMAL
)
;

IMPORT INTO movr_demo.public.rides
CSV DATA (
    'workload:///csv/movr/rides?infer-crdb-region-column=true&multi-region=false&num-histories=1000&num-promo-codes=1000&num-ranges=9&num-rides=100000&num-users=1000&num-vehicles=5000&row-end=100000&row-start=0&seed=1&survive=az&version=1.0.0'
) WITH "nullif" = 'NULL'
;

-- Just being explicit here
ALTER TABLE movr_demo.public.rides SET LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;

After the cluster has settled and replica movement has completed, fail the three nodes in the primary region:

roachprod stop $CLUSTER:4-6

Observe that the ranges become unavailable.

Connect to a surviving node (e.g. us-east1):

roachprod sql $CLUSTER:10

Then run the bounded staleness query. This ride ID is always present in the generated data set, so if you ran the steps above exactly, the query should find the row.

select id, start_time, end_time 
from rides as of system time with_max_staleness('1h') 
where id = '00000000-0000-4000-8000-000000000000';

In v21.2.13, the query succeeds. In v22.1.5, the query returns the error:

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
ERROR: unimplemented: cannot use bounded staleness for DISTRIBUTE
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/67562/v22.1

Jira issue: CRDB-18155

sheaffej added the C-bug and T-sql-queries labels Jul 29, 2022
sheaffej changed the title from "bounded staleness: v22.2.5 returns error - cannot use bounded staleness for DISTRIBUTE" to "bounded staleness: v22.1.5 returns error - cannot use bounded staleness for DISTRIBUTE" Jul 29, 2022
rytaft self-assigned this Aug 1, 2022
@rytaft (Collaborator) commented Aug 1, 2022

Thanks for the info, @sheaffej! This should be pretty easy to fix -- PR coming shortly.

craig bot pushed a commit that referenced this issue Aug 1, 2022
85075: externalconn,cloud: add GCP KMS support to external connections r=benbardin,rhu713 a=adityamaru

In #84931 we taught the `ExternalStorage` infrastructure to
recognize the `external` URI scheme. In this change we do the same
but for the `KMS` infrastructure.

Concretely, a user is now able to create an external connection
object that represents a GCP KMS URI. This can be done using the
`CREATE EXTERNAL CONNECTION` syntax. The user is then able to point
an operation in CockroachDB such as BACKUP, RESTORE, or SHOW BACKUP to that
KMS using an `external` URI. For example:

```
CREATE EXTERNAL CONNECTION backup AS 'nodelocal://1/foo';
CREATE EXTERNAL CONNECTION backupkms AS 'gs:///cmk?AUTH=implicit';

BACKUP INTO 'external://backup' WITH kms='external://backupkms';
```
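
As an illustrative follow-up, assuming a backup has already been taken into that collection, a restore can reference the same external connections:

```
-- Restore the most recent backup from the external connection, decrypting
-- with the KMS that is referenced through its own external connection.
RESTORE FROM LATEST IN 'external://backup' WITH kms = 'external://backupkms';
```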

Under the hood, we implement the `ConnectionDetails` interface
for GCP KMS. This allows us to persist a row in the `external_connections`
table when the object is created, and to `Dial` the underlying resource
when the object is being used. The interfaces had to be tweaked slightly
to accommodate the fact that they are now being implemented by two
different infrastructures, `ExternalStorage` and `KMS`. This is an
expected evolution, and will pave the way for the third and final piece of
support, changefeed `Sinks`. A large portion of this diff is just plumbing the
appropriate environments through the backup/restore code.

This diff also adds KMS-specific tests to `pkg/ccl/cloudccl/gcp` and tweaks
our nightly job to run these tests with the configured credentials.

Informs: #84753

Release note (sql change): GCP KMS can be represented as an External Connection
object that can be used during a backup or restore via the `external` URI.

85351: opt: simplify contradictory filters to false r=mgartner a=mgartner

The `SimplifySelectFilters` normalization rule now transforms filters to
`false` if any of the filter items are contradictions. This allows
filters to be simplified in more cases, which can trigger other rules.
This can eliminate non-leakproof expressions in filters and allow
`SimplifyZeroCardinalityGroup` to fire.

For example, consider the query:

    SELECT a, b FROM t WHERE a > 0 AND a < 0 AND 1/b = 1

Prior to this commit, it could not be simplified to an empty values
clause because the non-leakproof division operator prevented
`SimplifyZeroCardinalityGroup` from applying. The normalized plan would
be:

    select
     ├── columns: a:1 b:2
     ├── cardinality: [0 - 0]
     ├── immutable
     ├── scan t
     │    └── columns: a:1 b:2
     └── filters
          ├── (a:1 > 0) AND (a:1 < 0) [constraints=(contradiction)]
          └── (1 / b:2) = 1 [outer=(2), immutable]

Now, `SimplifySelectFilters` eliminates the division operator, making
the expression leakproof, and allowing `SimplifyZeroCardinalityGroup` to
apply. The normalized plan is now:

    values
     ├── columns: a:1!null b:2!null
     ├── cardinality: [0 - 0]
     ├── key: ()
     └── fd: ()-->(1-2)

Release note (performance improvement): The optimizer can detect
contradictory filters in more cases, leading to more efficient query
plans.
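
As an illustrative check (reusing the table and columns from the example above), the simplified plan can be inspected directly:

```
-- With this change, the normalized plan collapses to the zero-row values
-- expression shown above rather than a select over a full scan.
EXPLAIN (OPT) SELECT a, b FROM t WHERE a > 0 AND a < 0 AND 1/b = 1;
```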

85364: storage: improve range key clearing r=jbowens a=erikgrinaker

This patch set improves clearing of range keys, with the overall goal of avoiding dropping Pebble range tombstones unnecessarily, in particular across range key spans that will most often be empty. The main affected components are:

* `ClearRange` gRPC method: similarly to point keys, it now uses an iterator to clear individual range keys for small data sets. For larger data sets, it will use Pebble range tombstones to clear point or range keys separately, but only if any such keys exist.

* Raft snapshots
  * Unreplicated data: no change, we only write a Pebble range tombstone across the point key span, since we don't expect range keys here.
  * Subsumed replicas' range-local state: iterates and clears any point/range keys (individually or ranged).
  * Snapshot SSTs: every SST (for each key span in the Raft range) unconditionally has a Pebble range tombstone at the bottom, both across point keys and range keys.

* Raft splits: on RHS conflict, iterates and clears any RHS point/range keys (individually or ranged).

* Raft merges: for RHS range-local data, iterates and clears any RHS point/range keys (individually or ranged).

* Raft replica removal: iterates and clears any RHS point/range keys (individually or ranged).

* Raft log truncation: no change, only writes tombstones across point keys.

The iffiest bit is the unconditional Pebble range tombstones at the bottom of Raft snapshot SSTs, both because they are written unconditionally and because they will be fairly common. `@jbowens` Do you consider this problematic? Specifically, this bit here:

https://github.com/cockroachdb/cockroach/blob/b0e76dc38bdc893ee3ba42131f9718fc855a1dc6/pkg/kv/kvserver/store_snapshot.go#L183-L185

Resolves #83032.

---

**storage: add point/range key params to `Writer.ClearRawRange()`**

This patch adds boolean parameters for point/range keys to
`Writer.ClearRawRange()`. When true, it will unconditionally write
Pebble range tombstones across the point/range key spans. This gives the
caller better control over when to write tombstones, to avoid writing
them unnecessarily.

This is a behavioral change, because the previous `Engine` and `Batch`
implementations would only drop Pebble range key tombstones if there
were range keys present. Later commits will introduce higher-level
functionality to avoid writing these tombstones unnecessarily and adapt
callers.

Release note: None

**storage: clear individual range keys in `ClearMVCCIteratorRange`**

Previously, `Writer.ClearMVCCIteratorRange` would use a Pebble
`RANGEKEYDEL` to clear all range keys within the span. This went against
the intent of the method, which is to iterate across the span and
clear individual range keys. This patch changes it to do the latter.

Parameters are also added to control whether to clear point and/or range
keys.

Release note: None
  
**storage: remove `Writer.ClearAllRangeKeys`**

This patch removes `Writer.ClearAllRangeKeys()`, which was used to clear
all range keys in a span using a single `RANGEKEYDEL` tombstone.
`ClearRawRange` should be used instead.

This also removes some clumsy `Engine` logic that attempted to detect
and avoid writing range key tombstones if there were no range keys,
which required additional metadata tracking in batches and such. The
responsibility for this is instead placed on callers. It was never
possible to handle this sufficiently inside the writers anyway, because
of e.g. `SSTWriter` which does not have access to a reader.

Release note: None

**storage: add `Writer.ClearEngineRangeKey()`**

This patch adds `Writer.ClearEngineRangeKey()`, and makes
`ClearMVCCRangeKey()` call through to it. It also takes the opportunity
to consolidate the put logic in a similar fashion, by calling through to
the engine method.

Release note: None
  
**storage: improve `ClearRangeWithHeuristic` for range keys**

This patch improves `ClearRangeWithHeuristic()` to take individual,
optional thresholds for point and/or range keys, controlling when to use
`RANGEDEL` or `RANGEKEYDEL` instead of individual tombstones. The Raft
replica clearing code has been adapted to take advantage of this, and in
particular, will no longer drop Pebble range key tombstones unless there
are actual range keys present.

Release note: None

**kvserver/batcheval: improve range key clears in `ClearRange`**

This patch improves range key clearing in `ClearRange`, by using
`ClearRangeWithHeuristic` to only write Pebble range key tombstones if
there are any range keys present.

Release note: None
  
**storage: add point/range key params to `Writer.ClearMVCCRange`**

This patch adds point/range key parameters to `Writer.ClearMVCCRange()`
controlling whether to clear the key type. This can be used by callers
to avoid dropping unnecessary tombstones, particularly across range key
spans that are expected to be empty most of the time.

Release note: None

85412: logictest: temporarily disable workmem randomization in SQLLite tests r=yuzefovich a=yuzefovich

This commit restores 2bdcab7 that was
lost during the mega logic test refactor.

Fixes: #85377.

Release note: None

85423: opt: allow auto-commit and bounded staleness with distribute operator r=rytaft a=rytaft

**opt: allow auto-commit with distribute operator**

Prior to this commit, we disallowed auto-commit when a plan had a
distribute enforcer. However, this was an oversight, as the distribute
operator in the optimizer is currently a no-op, and should not
impact the ability to perform auto-commit.

Fixes #85043

Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that
could cause the optimizer to not use auto-commit for some mutations in
multi-region clusters when it should have done so. This could cause
poor query performance.
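
One illustrative way to check this (with a hypothetical table `t(k, v)`) is to look for the `auto commit` annotation on the mutation in the EXPLAIN output:

```
-- The insert node should be annotated with "auto commit" when the
-- single-statement implicit-transaction fast path applies.
EXPLAIN INSERT INTO t (k, v) VALUES (1, 'a');
```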

**opt: allow bounded staleness with distribute operator**

Prior to this commit, we disallowed bounded staleness when a plan had a
distribute enforcer. However, this was an oversight, as the distribute
operator in the optimizer is currently a no-op, and should not
impact the ability to perform reads with bounded staleness.

Fixes #85288

Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that
could cause the optimizer to reject valid bounded staleness queries with
the error "unimplemented: cannot use bounded staleness for DISTRIBUTE".
This has now been fixed.
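
With this fix, the bounded staleness query from the reproduction in this issue should return the row again instead of the DISTRIBUTE error:

```
SELECT id, start_time, end_time
FROM movr_demo.public.rides
AS OF SYSTEM TIME with_max_staleness('1h')
WHERE id = '00000000-0000-4000-8000-000000000000';
```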

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
craig bot closed this as completed in 1a1cf77 Aug 1, 2022
blathers-crl bot pushed a commit that referenced this issue Aug 1, 2022
Prior to this commit, we disallowed bounded staleness when a plan had a
distribute enforcer. However, this was an oversight, as the distribute
operator in the optimizer is currently a no-op, and should not
impact the ability to perform reads with bounded staleness.

Fixes #85288

Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that
could cause the optimizer to reject valid bounded staleness queries with
the error "unimplemented: cannot use bounded staleness for DISTRIBUTE".
This has now been fixed.
mgartner moved this to Done in SQL Queries Jul 24, 2023