
bounded staleness: v22.1.5 returns error - cannot use bounded staleness for DISTRIBUTE #85288

Closed
sheaffej opened this issue Jul 29, 2022 · 1 comment · Fixed by #85423
Labels: C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-sql-queries (SQL Queries Team)


@sheaffej (Collaborator) commented Jul 29, 2022

Describe the problem

When performing a bounded staleness read on a table whose voting replicas have lost a majority (the nodes are down or network-isolated), v22.1.5 returns an error, whereas the same query succeeds on v21.2.13.
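
For reference, the two bounded staleness read forms look like this (a minimal sketch with a hypothetical table `t` and primary key column `pk`, not the repro schema below):

```
-- Tolerate results up to a bounded age behind the current time:
SELECT * FROM t AS OF SYSTEM TIME with_max_staleness('10s') WHERE pk = 1;

-- Or require results no older than a specific timestamp:
SELECT * FROM t AS OF SYSTEM TIME with_min_timestamp(now() - '10s'::INTERVAL) WHERE pk = 1;
```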

In v21.2.13, I can perform a bounded staleness read on rides (in a surviving region):

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
                   id                  |     start_time      |      end_time
---------------------------------------+---------------------+----------------------
  00000000-0000-4000-8000-000000000000 | 2018-12-21 03:04:05 | 2018-12-23 08:04:05
(1 row)

But in v22.1.5, I get this error:

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
ERROR: unimplemented: cannot use bounded staleness for DISTRIBUTE
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/67562/v22.1

To Reproduce

Create a multi-region cluster, set the database to the ZONE survival goal, and make the rides table a REGIONAL table. Wait until any rebalancing has completed, then fail all nodes in the database's primary region. Connect to a surviving node and attempt a bounded staleness read.

Create the cluster:

# Change based on your username
CLUSTER=j4-movr-mr-demo

LIFETIME="12h0m0s"
MACHINE="n2-standard-4"
# RELEASE="v21.2.13"
RELEASE="v22.1.5"
ZONES="us-west3-b,us-west1-b,us-east4-b,us-east1-b"
NODES=12

roachprod create ${CLUSTER} \
   -n ${NODES} \
   --gce-machine-type=${MACHINE} \
   --gce-zones=${ZONES} \
   --geo \
   --lifetime=${LIFETIME}

roachprod stage ${CLUSTER} release ${RELEASE}
roachprod start ${CLUSTER}:1-12 --args "--max-offset 250ms"
roachprod adminurl ${CLUSTER}:1 --open
roachprod status ${CLUSTER}

# Install an enterprise license and set any preferred cluster settings
echo "
SET CLUSTER SETTING cluster.organization = 'J4 CRL Demo';
SET CLUSTER SETTING enterprise.license = '---->get-your-own-key<----';
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '4g'; 
SET CLUSTER SETTING kv.snapshot_recovery.max_rate = '4g';
SET CLUSTER SETTING server.time_until_store_dead = '1m15s';
" | roachprod sql $CLUSTER:1

Create the database and tables:

DROP DATABASE IF EXISTS movr_demo;
CREATE DATABASE movr_demo;

ALTER DATABASE movr_demo SET PRIMARY REGION = "us-west1"; 
ALTER DATABASE movr_demo ADD REGION "us-west3"; 
ALTER DATABASE movr_demo ADD REGION "us-east4";
ALTER DATABASE movr_demo ADD REGION "us-east1";

-- Just being explicit here
ALTER DATABASE movr_demo SURVIVE ZONE FAILURE;

CREATE TABLE movr_demo.public.rides
(
    id uuid PRIMARY KEY NOT NULL DEFAULT gen_random_uuid(),
    city STRING,
    vehicle_city STRING,
    rider_id uuid,
    vehicle_id uuid,
    start_address STRING,
    end_address STRING,
    start_time timestamp,
    end_time timestamp,
    revenue DECIMAL
)
;

IMPORT INTO movr_demo.public.rides
CSV DATA (
    'workload:///csv/movr/rides?infer-crdb-region-column=true&multi-region=false&num-histories=1000&num-promo-codes=1000&num-ranges=9&num-rides=100000&num-users=1000&num-vehicles=5000&row-end=100000&row-start=0&seed=1&survive=az&version=1.0.0'
) WITH "nullif" = 'NULL'
;

-- Just being explicit here
ALTER TABLE movr_demo.public.rides SET LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;

After the cluster has settled and replica movement has completed, fail the three nodes in the primary region:

roachprod stop $CLUSTER:4-6

Observe that the ranges become unavailable.

Connect to a surviving node (e.g. us-east1):

roachprod sql $CLUSTER:10

Then run the bounded staleness query. This ride ID is always present in the generated data set, so if you ran the steps above exactly, the query should find the row.

select id, start_time, end_time 
from rides as of system time with_max_staleness('1h') 
where id = '00000000-0000-4000-8000-000000000000';

In v21.2.13, the query succeeds. In v22.1.5, the query returns the error:

root@localhost:26257/movr_demo> select id, start_time, end_time from rides as of system time with_max_staleness('1h') where id = '00000000-0000-4000-8000-000000000000';
ERROR: unimplemented: cannot use bounded staleness for DISTRIBUTE
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://go.crdb.dev/issue-v/67562/v22.1

Jira issue: CRDB-18155

sheaffej added the C-bug and T-sql-queries labels Jul 29, 2022
sheaffej changed the title from "bounded staleness: v22.2.5 returns error - cannot use bounded staleness for DISTRIBUTE" to "bounded staleness: v22.1.5 returns error - cannot use bounded staleness for DISTRIBUTE" Jul 29, 2022
rytaft self-assigned this Aug 1, 2022
@rytaft (Collaborator) commented Aug 1, 2022

Thanks for the info, @sheaffej! This should be pretty easy to fix -- PR coming shortly.

craig bot pushed a commit that referenced this issue Aug 1, 2022
85075: externalconn,cloud: add GCP KMS support to external connections r=benbardin,rhu713 a=adityamaru

In #84931 we taught the `ExternalStorage` infrastructure to
recognize the `external` URI scheme. In this change we do the same
but for the `KMS` infrastructure.

Concretely, a user is now able to create an external connection
object that represents a GCP KMS URI. This can be done using the
`CREATE EXTERNAL CONNECTION` syntax. The user is then able to point
an operation in CockroachDB such as BACKUP, RESTORE, or SHOW BACKUP to that
KMS using an `external` URI. For example:

```
CREATE EXTERNAL CONNECTION backup AS 'nodelocal://1/foo';
CREATE EXTERNAL CONNECTION backupkms AS 'gs:///cmk?AUTH=implicit';

BACKUP INTO 'external://backup' WITH kms='external://backupkms';
```
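
As an illustrative follow-up, assuming a backup has already been taken into that collection, a restore can reference the same external connections:

```
-- Restore the most recent backup from the external connection, decrypting
-- with the KMS that is referenced through its own external connection.
RESTORE FROM LATEST IN 'external://backup' WITH kms = 'external://backupkms';
```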

Under the hood, we implement the `ConnectionDetails` interface
for GCP KMS. This allows us to persist a row in the `external_connections`
table when the object is created, and to `Dial` the underlying resource
when the object is being used. The interfaces had to be tweaked slightly
to accommodate the fact that they are now being implemented by two
different infrastructures, `ExternalStorage` and `KMS`. This is an
expected evolution, and will pave the way for the third and final piece of
support, changefeed `Sinks`. A large portion of this diff is just plumbing the
appropriate environments through the backup/restore code.

This diff also adds KMS-specific tests to `pkg/ccl/cloudccl/gcp` and tweaks
our nightly job to run these tests with the configured credentials.

Informs: #84753

Release note (sql change): GCP KMS can be represented as an External Connection
object that can be used during a backup or restore via the `external` URI.

85351: opt: simplify contradictory filters to false r=mgartner a=mgartner

The `SimplifySelectFilters` normalization rule now transforms filters to
`false` if any of the filter items are contradictions. This allows
filters to be simplified in more cases, which can trigger other rules.
This can eliminate non-leakproof expressions in filters and allow
`SimplifyZeroCardinalityGroup` to fire.

For example, consider the query:

    SELECT a, b FROM t WHERE a > 0 AND a < 0 AND 1/b = 1

Prior to this commit, it could not be simplified to an empty values
clause because the non-leakproof division operator prevented
`SimplifyZeroCardinalityGroup` from applying. The normalized plan would
be:

    select
     ├── columns: a:1 b:2
     ├── cardinality: [0 - 0]
     ├── immutable
     ├── scan t
     │    └── columns: a:1 b:2
     └── filters
          ├── (a:1 > 0) AND (a:1 < 0) [constraints=(contradiction)]
          └── (1 / b:2) = 1 [outer=(2), immutable]

Now, `SimplifySelectFilters` eliminates the division operator, making
the expression leakproof, and allowing `SimplifyZeroCardinalityGroup` to
apply. The normalized plan is now:

    values
     ├── columns: a:1!null b:2!null
     ├── cardinality: [0 - 0]
     ├── key: ()
     └── fd: ()-->(1-2)

Release note (performance improvement): The optimizer can detect
contradictory filters in more cases, leading to more efficient query
plans.
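
As an illustrative check (reusing the table and columns from the example above), the simplified plan can be inspected directly:

```
-- With this change, the normalized plan collapses to the zero-row values
-- expression shown above rather than a select over a full scan.
EXPLAIN (OPT) SELECT a, b FROM t WHERE a > 0 AND a < 0 AND 1/b = 1;
```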

85364: storage: improve range key clearing r=jbowens a=erikgrinaker

This patch set improves clearing of range keys, with the overall goal of avoiding dropping Pebble range tombstones unnecessarily, in particular across range key spans that will most often be empty. The main affected components are:

* `ClearRange` gRPC method: similarly to point keys, it now uses an iterator to clear individual range keys for small data sets. For larger data sets, it will use Pebble range tombstones to clear point or range keys separately, but only if any such keys exist.

* Raft snapshots
  * Unreplicated data: no change, we only write a Pebble range tombstone across the point key span, since we don't expect range keys here.
  * Subsumed replicas' range-local state: iterates and clears any point/range keys (individually or ranged).
  * Snapshot SSTs: every SST (for each key span in the Raft range) unconditionally has a Pebble range tombstone at the bottom, both across point keys and range keys.

* Raft splits: on RHS conflict, iterates and clears any RHS point/range keys (individually or ranged).

* Raft merges: for RHS range-local data, iterates and clears any RHS point/range keys (individually or ranged).

* Raft replica removal: iterates and clears any RHS point/range keys (individually or ranged).

* Raft log truncation: no change, only writes tombstones across point keys.

The iffiest bit is the unconditional Pebble range tombstones at the bottom of Raft snapshot SSTs, both because they are written unconditionally and because they will be fairly common. `@jbowens` Do you consider this problematic? Specifically, this bit here:

https://github.com/cockroachdb/cockroach/blob/b0e76dc38bdc893ee3ba42131f9718fc855a1dc6/pkg/kv/kvserver/store_snapshot.go#L183-L185

Resolves #83032.

---

**storage: add point/range key params to `Writer.ClearRawRange()`**

This patch adds boolean parameters for point/range keys to
`Writer.ClearRawRange()`. When true, it will unconditionally write
Pebble range tombstones across the point/range key spans. This gives the
caller better control over when to write tombstones, to avoid writing
them unnecessarily.

This is a behavioral change, because the previous `Engine` and `Batch`
implementations would only drop Pebble range key tombstones if there
were range keys present. Later commits will introduce higher-level
functionality to avoid writing these tombstones unnecessarily and adapt
callers.

Release note: None

**storage: clear individual range keys in `ClearMVCCIteratorRange`**

Previously, `Writer.ClearMVCCIteratorRange` would use a Pebble
`RANGEKEYDEL` to clear all range keys within the span. This went against
the intent of the method, which is to iterate across the span and
clear individual range keys. This patch changes it to do the latter.

Parameters are also added to control whether to clear point and/or range
keys.

Release note: None
  
**storage: remove `Writer.ClearAllRangeKeys`**

This patch removes `Writer.ClearAllRangeKeys()`, which was used to clear
all range keys in a span using a single `RANGEKEYDEL` tombstone.
`ClearRawRange` should be used instead.

This also removes some clumsy `Engine` logic that attempted to detect
and avoid writing range key tombstones if there were no range keys,
which required additional metadata tracking in batches and such. The
responsibility for this is instead placed on callers. It was never
possible to handle this sufficiently inside the writers anyway, because
of e.g. `SSTWriter` which does not have access to a reader.

Release note: None

**storage: add `Writer.ClearEngineRangeKey()`**

This patch adds `Writer.ClearEngineRangeKey()`, and makes
`ClearMVCCRangeKey()` call through to it. It also takes the opportunity
to consolidate the put logic in a similar fashion, by calling through to
the engine method.

Release note: None
  
**storage: improve `ClearRangeWithHeuristic` for range keys**

This patch improves `ClearRangeWithHeuristic()` to take individual,
optional thresholds for point and/or range keys, controlling when to use
`RANGEDEL` or `RANGEKEYDEL` instead of individual tombstones. The Raft
replica clearing code has been adapted to take advantage of this, and in
particular, will no longer drop Pebble range key tombstones unless there
are actual range keys present.

Release note: None

**kvserver/batcheval: improve range key clears in `ClearRange`**

This patch improves range key clearing in `ClearRange`, by using
`ClearRangeWithHeuristic` to only write Pebble range key tombstones if
there are any range keys present.

Release note: None
  
**storage: add point/range key params to `Writer.ClearMVCCRange`**

This patch adds point/range key parameters to `Writer.ClearMVCCRange()`
controlling whether to clear the key type. This can be used by callers
to avoid dropping unnecessary tombstones, particularly across range key
spans that are expected to be empty most of the time.

Release note: None

85412: logictest: temporarily disable workmem randomization in SQLLite tests r=yuzefovich a=yuzefovich

This commit restores 2bdcab7 that was
lost during the mega logic test refactor.

Fixes: #85377.

Release note: None

85423: opt: allow auto-commit and bounded staleness with distribute operator r=rytaft a=rytaft

**opt: allow auto-commit with distribute operator**

Prior to this commit, we disallowed auto-commit when a plan had a
distribute enforcer. However, this was an oversight, as the distribute
operator in the optimizer is currently a no-op, and should not
impact the ability to perform auto-commit.

Fixes #85043

Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that
could cause the optimizer to not use auto-commit for some mutations in
multi-region clusters when it should have done so. This could cause
poor query performance.
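
One illustrative way to check this (with a hypothetical table `t(k, v)`) is to look for the `auto commit` annotation on the mutation in the EXPLAIN output:

```
-- The insert node should be annotated with "auto commit" when the
-- single-statement implicit-transaction fast path applies.
EXPLAIN INSERT INTO t (k, v) VALUES (1, 'a');
```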

**opt: allow bounded staleness with distribute operator**

Prior to this commit, we disallowed bounded staleness when a plan had a
distribute enforcer. However, this was an oversight, as the distribute
operator in the optimizer is currently a no-op, and should not
impact the ability to perform reads with bounded staleness.

Fixes #85288

Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that
could cause the optimizer to reject valid bounded staleness queries with
the error "unimplemented: cannot use bounded staleness for DISTRIBUTE".
This has now been fixed.
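
With this fix, the bounded staleness query from the reproduction in this issue should return the row again instead of the DISTRIBUTE error:

```
SELECT id, start_time, end_time
FROM movr_demo.public.rides
AS OF SYSTEM TIME with_max_staleness('1h')
WHERE id = '00000000-0000-4000-8000-000000000000';
```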

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
craig bot closed this as completed in 1a1cf77 Aug 1, 2022
blathers-crl bot pushed a commit that referenced this issue Aug 1, 2022
Prior to this commit, we disallowed bounded staleness when a plan had a
distribute enforcer. However, this was an oversight, as the distribute
operator in the optimizer is currently a no-op, and should not
impact the ability to perform reads with bounded staleness.

Fixes #85288

Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that
could cause the optimizer to reject valid bounded staleness queries with
the error "unimplemented: cannot use bounded staleness for DISTRIBUTE".
This has now been fixed.
mgartner moved this to Done in SQL Queries Jul 24, 2023