external: introduce an external:// URI scheme that is recognized by ExternalStorage, Sink and KMS #84753

Closed
adityamaru opened this issue Jul 20, 2022 · 2 comments
Assignees: adityamaru
Labels: A-disaster-recovery, C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are an exception), T-disaster-recovery

Comments

@adityamaru
Contributor

adityamaru commented Jul 20, 2022

As discussed in #84209, External Connections are going to be addressable using an `external` scheme URI that contains the object's name as its host. This URI will be recognized by the ExternalStorage, KMS, and Sink interfaces in CockroachDB today, thereby allowing all bulk operations and changefeeds to reference an External Connection when running. Roughly, we will (see the sketch after this list):

  • Parse the external URI to get the external connection name.
  • Locate the row for that object in the system.external_connections table and unmarshal its details.
  • Dial() into the underlying external resource and return a handle to the caller.
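
A minimal Go sketch of that flow, assuming hypothetical helper names; the real parsing, system-table lookup, and `Dial()` plumbing live in the ExternalStorage/KMS/Sink registries and look different:

```
package main

import (
	"fmt"
	"net/url"
)

// connectionRow stands in for a row of system.external_connections; the real
// table stores marshalled connection details rather than a plain URI string.
type connectionRow struct {
	name        string
	resourceURI string
}

// lookupConnection is a stand-in for locating the named row in
// system.external_connections and unmarshalling its details.
func lookupConnection(name string) (connectionRow, error) {
	fake := map[string]connectionRow{
		"foo": {name: "foo", resourceURI: "nodelocal://1/foo"},
	}
	row, ok := fake[name]
	if !ok {
		return connectionRow{}, fmt.Errorf("external connection %q not found", name)
	}
	return row, nil
}

// resolveExternalURI sketches the three steps above: parse the external://
// URI, locate the row for the named object, and hand the underlying URI to
// whatever Dial()s the real resource.
func resolveExternalURI(uri string) (string, error) {
	parsed, err := url.Parse(uri)
	if err != nil {
		return "", err
	}
	if parsed.Scheme != "external" {
		return "", fmt.Errorf("expected an external:// URI, got scheme %q", parsed.Scheme)
	}
	// The object's name is carried in the host component.
	row, err := lookupConnection(parsed.Host)
	if err != nil {
		return "", err
	}
	return row.resourceURI, nil // the caller would Dial() this and get back a handle
}

func main() {
	target, err := resolveExternalURI("external://foo")
	if err != nil {
		panic(err)
	}
	fmt.Println("would Dial():", target)
}
```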

Jira issue: CRDB-17845

Epic CRDB-15001


adityamaru added the C-enhancement and A-disaster-recovery labels on Jul 20, 2022
adityamaru self-assigned this on Jul 20, 2022

blathers-crl bot commented Jul 20, 2022

cc @cockroachdb/bulk-io

adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 22, 2022
In cockroachdb#84310 we added the ability to create an external connection
to represent a `nodelocal` endpoint. This diff is the second piece
of the puzzle that allows systems in CRDB to interact with the
external connection object.

We introduce an `external` URI scheme to the `ExternalStorage`
registry. URIs with an `external` scheme are required to contain a
host component referring to the unique name of an existing external
connection object. Optionally, the URI can also contain a path
component that will be appended to the endpoint that was specified
at the time the external connection object was created. This is necessary
for operations such as backup and restore that read/write to subdirectories
in the endpoint provided by the user. A nice UX win is the ability to
have a single external connection object for the base bucket, and then
interact with all the subdirectories without having to create an object
for each directory. In the future we may want to clamp down on this, and
allow the user to specify which objects permit subdirectory access.

The `external://<object-name>/<optional-path>` URI is parsed and the
underlying object is fetched from the `system.external_connections` table.
The resource represented by the object is then `Dial()`ed to return an
`ExternalStorage` handle that can be used to read, write, list etc. With
this change, all bulk operations and CDC are able to use external connections
to represent a `nodelocal` endpoint. For example, a backup can now be run as:

```
CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';
BACKUP INTO 'external://foo';
RESTORE FROM LATEST IN 'external://foo';
```
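
For the optional path component, here is a rough sketch of how it might be
combined with the endpoint stored for the object; the helper name and the
joining behavior are assumptions for illustration, not the actual
implementation:

```
package main

import (
	"fmt"
	"net/url"
	"path"
)

// appendExternalPath appends the path component of an external:// URI to the
// endpoint that was stored when the external connection object was created.
func appendExternalPath(baseURI, externalURI string) (string, error) {
	base, err := url.Parse(baseURI)
	if err != nil {
		return "", err
	}
	ext, err := url.Parse(externalURI)
	if err != nil {
		return "", err
	}
	// The host of the external URI names the connection object; only its path
	// component is carried over onto the stored endpoint.
	base.Path = path.Join(base.Path, ext.Path)
	return base.String(), nil
}

func main() {
	// e.g. BACKUP INTO 'external://foo/2022/07' with foo -> nodelocal://1/foo.
	resolved, err := appendExternalPath("nodelocal://1/foo", "external://foo/2022/07")
	if err != nil {
		panic(err)
	}
	fmt.Println(resolved) // nodelocal://1/foo/2022/07
}
```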

Gradually, we will add support for all other external storage endpoints
as well.

Note that we do not register limiter settings for the `external` provider's
`ExternalStorage` interface, nor do we wrap it with an ioRecorder. This is
because the underlying resource of the external connection object will already
have its own registered limiters and recorder.

Informs: cockroachdb#84753

Release note (sql change): Bulk operations and CDC will accept an `external`
scheme URI that points to a previously created External Connection object.
These operations can then interact with the underlying resource represented by
the object as they did before.
craig bot pushed a commit that referenced this issue Jul 27, 2022
84931: externalconn,nodelocal: add `external` ExternalStorage provider r=adityamaru a=adityamaru

In #84310 we added the ability to create an external connection
to represent a `nodelocal` endpoint. This diff is the second piece
of the puzzle that allows systems in CRDB to interact with the
external connection object.

We introduce an `external` URI scheme to the `ExternalStorage`
registry. URIs with an `external` scheme are required to contain a
host component referring to the unique name of an existing external
connection object. Optionally, the URI can also contain a path
component that will be appended to the endpoint that was specified
at the time the external connection object was created. This is necessary
for operations such as backup and restore that read/write to subdirectories
in the endpoint provided by the user. A nice UX win is the ability to
have a single external connection object for the base bucket, and then
interact with all the subdirectories without having to create an object
for each directory. In the future we may want to clamp down on this, and
allow the user to specify which objects permit subdirectory access.

The `external://<object-name>/<optional-path>` URI is parsed and the
underlying object is fetched from the `system.external_connections` table.
The resource represented by the object is then `Dial()`ed to return an
`ExternalStorage` handle that can be used to read, write, list etc. With
this change, all bulk operations and CDC are able to use external connections
to represent a `nodelocal` endpoint. For example, a backup can now be run as:

```
CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';
BACKUP INTO 'external://foo';
RESTORE FROM LATEST IN 'external://foo';
```

Gradually, we will add support for all other external storage endpoints
as well.

Note that we do not register limiter settings for the `external` provider's
`ExternalStorage` interface, nor do we wrap it with an ioRecorder. This is
because the underlying resource of the external connection object will already
have its own registered limiters and recorder.

Informs: #84753

Release note (sql change): Bulk operations and CDC will accept an `external`
scheme URI that points to a previously created External Connection object.
These operations can then interact with the underlying resource represented by
the object as they did before.

85041: sql: better logging about the optimizer as well as some cancellation checks r=yuzefovich a=yuzefovich

**sql: add logging about the optimizer into the trace**

This commit derives a separate tracing span on the main query path that
covers the whole optimizer as well as adds a few logging messages around
the optbuilder and the optimization. It also audits some of the already
present logs to be written down before performing possibly long
operations.

Release note: None

**xform: check for context cancellation**

Previously, the optimizer never checked for context cancellation,
which could lead to surprising behavior - e.g. if the statement timeout
occurred in the middle of the optimization, we would still finish the
whole optimization phase only to realize that we were canceled at
execution time.

This situation is now improved by periodically checking for the context
cancellation every time a group is optimized by `optimizeGroup`. It's
still possible for the optimizer to continue running long after
cancellation if there is a long-running procedure at a deeper level.
For example, a long-running, individual invocation of an exploration
rule will not halt due to context cancellation. However,
the cancellation improvement in this commit should halt long-running
optimization in most cases.
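
A generic sketch of the cancellation-check pattern described above, using
illustrative names rather than the optimizer's actual code:

```
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// optimizeGroups checks the context on every iteration so that a statement
// timeout can interrupt a long optimization instead of being noticed only at
// execution time.
func optimizeGroups(ctx context.Context, groups []string) error {
	for _, g := range groups {
		if err := ctx.Err(); err != nil {
			return fmt.Errorf("optimization interrupted before group %s: %w", g, err)
		}
		time.Sleep(10 * time.Millisecond) // stand-in for optimizing one group
	}
	return nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Millisecond)
	defer cancel()
	err := optimizeGroups(ctx, []string{"g1", "g2", "g3", "g4"})
	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("halted early:", err)
	}
}
```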

Fixes: #70245.
Addresses: #70314.

Release note: None

85101: kvcoord: remove an artifact of unsetting WriteTooOld flag r=yuzefovich a=yuzefovich

It seems like during the 20.1 release cycle we made a change so that
errors don't carry the `WriteTooOld` flag set to `true`; now only the
BatchResponse can have that. We kept the ability to unset that flag
so that we wouldn't run into any issues in a mixed-version scenario, but
that unsetting behavior is no longer needed and is removed in this
commit.

My particular interest in removing this is because it is the only
modification of the `SetupFlowRequest` proto (which carries the `Txn`
proto inside) that occurs during the distributed query setup, and I want
to have more parallelization there.

Release note: None

85118: roachpb: remove `RangeDescriptor.deprecated_generation_comparable` r=pavelkalinnikov,andreimatei a=erikgrinaker

This patch removes the unused field `deprecated_generation_comparable`
from `RangeDescriptor`. The preparatory work for this was done in 20.2,
see a578719.

Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 28, 2022
In cockroachdb#84931 we taught the `ExternalStorage` infrastructure to
recognize the `external` URI scheme. In this change we do the same
but for the `KMS` infrastructure.

Concretely, a user is now able to create an external connection
object that represents a GCP KMS URI. This can be done using the
`CREATE EXTERNAL CONNECTION` syntax. The user is then able to point
an operation in CockroachDB such as BACKUP, RESTORE, or SHOW BACKUP at that
KMS using an `external` URI. For example:

```
CREATE EXTERNAL CONNECTION backup AS 'nodelocal://1/foo';
CREATE EXTERNAL CONNECTION backupkms AS 'gs:///cmk?AUTH=implicit';

BACKUP INTO 'external://backup' WITH kms='external://backupkms';
```

Under the hood, we implement the `ConnectionDetails` interface
for GCP KMS. This allows us to persist a row in the `external_connections`
table when the object is created, and to `Dial` the underlying resource
when the object is being used. The interfaces had to be tweaked slightly
to accommodate the fact that they are now being implemented by two
different infrastructures, `ExternalStorage` and `KMS`. This is an
expected evolution, and will pave the way for our third and final piece
of support: changefeed `Sinks`. A large portion of this diff is just plumbing
the appropriate environments throughout the backup/restore code.
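
To make the split of responsibilities concrete, here is an illustrative,
assumed shape for such an interface; it is not the actual `ConnectionDetails`
definition from the codebase:

```
package main

import "fmt"

// connectionDetails captures the two responsibilities described above:
// describe what gets persisted at CREATE time, and Dial() the resource later.
type connectionDetails interface {
	// ConnectionType tags the row written to system.external_connections.
	ConnectionType() string
	// Dial opens the underlying resource; for a KMS it returns a KMS handle.
	Dial() (kms, error)
}

// kms is a stand-in for the KMS interface (encrypt/decrypt with a master key).
type kms interface {
	Encrypt(plaintext []byte) ([]byte, error)
}

// gcpKMSDetails is a hypothetical implementation for a GCP KMS URI.
type gcpKMSDetails struct {
	uri string // e.g. gs:///cmk?AUTH=implicit
}

func (d gcpKMSDetails) ConnectionType() string { return "gcp-kms" }

func (d gcpKMSDetails) Dial() (kms, error) {
	// A real implementation would authenticate against GCP KMS here.
	return nil, fmt.Errorf("dialing %s is not implemented in this sketch", d.uri)
}

func main() {
	var d connectionDetails = gcpKMSDetails{uri: "gs:///cmk?AUTH=implicit"}
	fmt.Println(d.ConnectionType())
}
```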

This diff also adds KMS specific tests to `pkg/ccl/cloudccl/gcp` and tweaks
our nightly job to run these tests with the configured credentials.

Informs: cockroachdb#84753

Release note (sql change): GCP KMS can be represented as an External Connection
object that can be used during a backup or restore using an `external` URI.
adityamaru added a commit to adityamaru/cockroach that referenced this issue Aug 1, 2022
This change adds the ability to create an external connection that
represents a `kafka` sink. It also teaches changefeeds to recognize
the `external` scheme URI.

When creating an external connection that represents a kafka sink, we
run validation on the passed-in URI before persisting it in the system
table. This PR does not add any `WITH` options to the create statement
but in the near future we will want to allow the user to pass in certain
sink/resource specific configurations that will apply to all users of the
external connection object. For example, a kafka JSON config.
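
As a rough illustration of such create-time validation, the sketch below
checks a kafka URI before it would be persisted; the accepted parameters are
placeholders, not the sink's real option list:

```
package main

import (
	"fmt"
	"net/url"
)

// validateKafkaSinkURI rejects obviously malformed kafka URIs and unknown
// query parameters before the external connection row is written.
func validateKafkaSinkURI(uri string) error {
	u, err := url.Parse(uri)
	if err != nil {
		return err
	}
	if u.Scheme != "kafka" {
		return fmt.Errorf("expected a kafka:// URI, got scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("kafka URI must specify a broker address")
	}
	allowed := map[string]bool{"topic_prefix": true, "tls_enabled": true} // illustrative
	for param := range u.Query() {
		if !allowed[param] {
			return fmt.Errorf("unknown query parameter %q", param)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateKafkaSinkURI("kafka://broker:9092?topic_prefix=bar")) // <nil>
	fmt.Println(validateKafkaSinkURI("kafka://broker:9092?bogus=1"))          // error
}
```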

A changefeed can now be run to an `external` scheme URI that points to
an existing external connection object. Since we can only represent kafka
sinks as external connections as of now, the statement will only accept the
options that are valid for a kafka sink.

Informs: cockroachdb#84753

Release note (sql change): Users can now `CREATE EXTERNAL CONNECTION`
to represent a `kafka` sink. Subsequently, users can run
`CREATE CHANGEFEED` with an `external://<external-connection-object-name>`
URI as the sink to use the kafka resource represented by the
external connection object.
craig bot pushed a commit that referenced this issue Aug 1, 2022
85075: externalconn,cloud: add GCP KMS support to external connections r=benbardin,rhu713 a=adityamaru

In #84931 we taught the `ExternalStorage` infrastructure to
recognize the `external` URI scheme. In this change we do the same
but for the `KMS` infrastructure.

Concretely, a user is now able to create an external connection
object that represents a GCP KMS URI. This can be done using the
`CREATE EXTERNAL CONNECTION` syntax. The user is then able to point
an operation in CockroachDB such as BACKUP, RESTORE, or SHOW BACKUP at that
KMS using an `external` URI. For example:

```
CREATE EXTERNAL CONNECTION backup AS 'nodelocal://1/foo';
CREATE EXTERNAL CONNECTION backupkms AS 'gs:///cmk?AUTH=implicit';

BACKUP INTO 'external://backup' WITH kms='external://backupkms';
```

Under the hood, we implement the `ConnectionDetails` interface
for GCP KMS. This allows us to persist a row in the `external_connections`
table when the object is created, and to `Dial` the underlying resource
when the object is being used. The interfaces had to be tweaked slightly
to accommodate the fact that they are now being implemented by two
different infrastructures, `ExternalStorage` and `KMS`. This is an
expected evolution, and will pave the way for our third and final piece
of support: changefeed `Sinks`. A large portion of this diff is just plumbing
the appropriate environments throughout the backup/restore code.

This diff also adds KMS specific tests to `pkg/ccl/cloudccl/gcp` and tweaks
our nightly job to run these tests with the configured credentials.

Informs: #84753

Release note (sql change): GCP KMS can be represented as an External Connection
object that can be used during a backup or restore using an `external` URI.

85351: opt: simplify contradictory filters to false r=mgartner a=mgartner

The `SimplifySelectFilters` normalization rule now transforms filters to
`false` if any of the filter items are contradictions. This allows
filters to be simplified in more cases, which can trigger other rules.
This can eliminate non-leakproof expressions in filters and allow
`SimplifyZeroCardinalityGroup` to fire.

For example, consider the query:

    SELECT a, b FROM t WHERE a > 0 AND a < 0 AND 1/b = 1

Prior to this commit, it could not be simplified to an empty values
clause because the non-leakproof division operator prevents
SimplifyZeroCardinalityGroup from applying. The normalized plan would
be:

    select
     ├── columns: a:1 b:2
     ├── cardinality: [0 - 0]
     ├── immutable
     ├── scan t
     │    └── columns: a:1 b:2
     └── filters
          ├── (a:1 > 0) AND (a:1 < 0) [constraints=(contradiction)]
          └── (1 / b:2) = 1 [outer=(2), immutable]

Now, `SimplifySelectFilters` eliminates the division operator, making
the expression leakproof, and allowing `SimplifyZeroCardinalityGroup` to
apply. The normalized plan is now:

    values
     ├── columns: a:1!null b:2!null
     ├── cardinality: [0 - 0]
     ├── key: ()
     └── fd: ()-->(1-2)

Release note (performance improvement): The optimizer can detect
contradictory filters in more cases, leading to more efficient query
plans.

85364: storage: improve range key clearing r=jbowens a=erikgrinaker

This patch set improves clearing of range keys, with the overall goal of avoiding dropping Pebble range tombstones unnecessarily, in particular across range key spans that will most often be empty. The main affected components are:

* `ClearRange` gRPC method: similarly to point keys, it now uses an iterator to clear individual range keys for small data sets. For larger data sets, it will use Pebble range tombstones to clear point or range keys separately, but only if any such keys exist.

* Raft snapshots
  * Unreplicated data: no change, we only write a Pebble range tombstone across the point key span, since we don't expect range keys here.
  * Subsumed replicas' range-local state: iterates and clears any point/range keys (individually or ranged).
  * Snapshot SSTs: every SST (for each key span in the Raft range) unconditionally has a Pebble range tombstone at the bottom, both across point keys and range keys.

* Raft splits: on RHS conflict, iterates and clears any RHS point/range keys (individually or ranged).

* Raft merges: for RHS range-local data, iterates and clears any RHS point/range keys (individually or ranged).

* Raft replica removal: iterates and clears any RHS point/range keys (individually or ranged).

* Raft log truncation: no change, only writes tombstones across point keys.

The iffiest bit is the unconditional Pebble range tombstones at the bottom of Raft snapshot SSTs, both because they're unconditional, and also because they will be fairly common. `@jbowens` Do you consider this problematic? Specifically, this bit here:

https://github.com/cockroachdb/cockroach/blob/b0e76dc38bdc893ee3ba42131f9718fc855a1dc6/pkg/kv/kvserver/store_snapshot.go#L183-L185

Resolves #83032.

---

**storage: add point/range key params to `Writer.ClearRawRange()`**

This patch adds boolean parameters for point/range keys to
`Writer.ClearRawRange()`. When true, it will unconditionally write
Pebble range tombstones across the point/range key spans. This gives the
caller better control over when to write tombstones, to avoid writing
them unnecessarily.

This is a behavioral change, because the previous `Engine` and `Batch`
implementations would only drop Pebble range key tombstones if there
were range keys present. Later commits will introduce higher-level
functionality to avoid writing these tombstones unnecessarily and adapt
callers.

Release note: None

**storage: clear individual range keys in `ClearMVCCIteratorRange`**

Previously, `Writer.ClearMVCCIteratorRange` would use a Pebble
`RANGEKEYDEL` to clear all range keys within the span. This went against
the intent of the method, which is to iterate across the span and
clear individual range keys. This patch changes it to do the latter.

Parameters are also added to control whether to clear point and/or range
keys.

Release note: None
  
**storage: remove `Writer.ClearAllRangeKeys`**

This patch removes `Writer.ClearAllRangeKeys()`, which was used to clear
all range keys in a span using a single `RANGEKEYDEL` tombstone.
`ClearRawRange` should be used instead.

This also removes some clumsy `Engine` logic that attempted to detect
and avoid writing range key tombstones if there were no range keys,
which required additional metadata tracking in batches and such. The
responsibility for this is instead placed on callers. It was never
possible to handle this sufficiently inside the writers anyway, because
of e.g. `SSTWriter` which does not have access to a reader.

Release note: None

**storage: add `Writer.ClearEngineRangeKey()`**

This patch adds `Writer.ClearEngineRangeKey()`, and makes
`ClearMVCCRangeKey()` call through to it. It also takes the opportunity
to consolidate the put logic in a similar fashion, by calling through to
the engine method.

Release note: None
  
**storage: improve `ClearRangeWithHeuristic` for range keys**

This patch improves `ClearRangeWithHeuristic()` to take individual,
optional thresholds for point and/or range keys, controlling when to use
`RANGEDEL` or `RANGEKEYDEL` instead of individual tombstones. The Raft
replica clearing code has been adapted to take advantage of this, and in
particular, will no longer drop Pebble range key tombstones unless there
are actual range keys present.
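
A toy sketch of the threshold heuristic, with hypothetical names; the real
helper operates on storage readers and writers rather than string slices:

```
package main

import "fmt"

// clearWithHeuristic clears keys individually while the span holds fewer keys
// than the threshold, and falls back to a single ranged tombstone otherwise.
// An empty span writes nothing at all.
func clearWithHeuristic(keys []string, threshold int, clearOne func(string), clearAll func()) {
	if threshold > 0 && len(keys) < threshold {
		for _, k := range keys {
			clearOne(k) // point-wise delete, avoids dropping a ranged tombstone
		}
		return
	}
	if len(keys) > 0 {
		clearAll() // single RANGEDEL/RANGEKEYDEL-style tombstone over the span
	}
}

func main() {
	keys := []string{"a", "b", "c"}
	clearWithHeuristic(keys, 10,
		func(k string) { fmt.Println("delete", k) },
		func() { fmt.Println("write ranged tombstone") },
	)
}
```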

Release note: None

**kvserver/batcheval: improve range key clears in `ClearRange`**

This patch improves range key clearing in `ClearRange`, by using
`ClearRangeWithHeuristic` to only write Pebble range key tombstones if
there are any range keys present.

Release note: None
  
**storage: add point/range key params to `Writer.ClearMVCCRange`**

This patch adds point/range key parameters to `Writer.ClearMVCCRange()`
controlling whether to clear the key type. This can be used by callers
to avoid dropping unnecessary tombstones, particularly across range key
spans that are expected to be empty most of the time.

Release note: None

85412: logictest: temporarily disable workmem randomization in SQLLite tests r=yuzefovich a=yuzefovich

This commit restores 2bdcab7 that was
lost during the mega logic test refactor.

Fixes: #85377.

Release note: None

85423: opt: allow auto-commit and bounded staleness with distribute operator r=rytaft a=rytaft

**opt: allow auto-commit with distribute operator**

Prior to this commit, we disallowed auto-commit when a plan had a
distribute enforcer. However, this was an oversight, as the distribute
operator in the optimizer is currently a no-op, and should not
impact the ability to perform auto-commit.

Fixes #85043

Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that
could cause the optimizer to not use auto-commit for some mutations in
multi-region clusters when it should have done so. This could cause
poor query performance.

**opt: allow bounded staleness with distribute operator**

Prior to this commit, we disallowed bounded staleness when a plan had a
distribute enforcer. However, this was an oversight, as the distribute
operator in the optimizer is currently a no-op, and should not
impact the ability to perform reads with bounded staleness.

Fixes #85288

Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that
could cause the optimizer to reject valid bounded staleness queries with
the error "unimplemented: cannot use bounded staleness for DISTRIBUTE".
This has now been fixed.

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
adityamaru added a commit to adityamaru/cockroach that referenced this issue Aug 5, 2022
This change registers s3 as a URI that can be represented as
an External Connection. Most notably, we take a page from the
CDC book and switch the s3 parse function to check for invalid
parameters and configurations. This allows us to catch certain
misconfigurations at the time we create the external connection.

Informs: cockroachdb#84753

Release note (sql change): Users can now `CREATE EXTERNAL CONNECTION`
to represent an s3 URI.
craig bot pushed a commit that referenced this issue Aug 10, 2022
84923:  importer: replace Clear/RevertRange calls with DeleteRange in IMPORT rollback r=dt,erikgrinaker a=msbutler

This patch replaces the ClearRange and RevertRange calls with MVCC DeleteRange
during IMPORT rollbacks.

In the predicate-based DeleteRange call, the client's provided table span is
partitioned in order to process the partitions in parallel. The new
bulkio.import.predicate_delete_range_parallelism setting defines the number of
workers that can issue DeleteRange requests concurrently, and the new
bulkio.import.predicate_delete_range_batch_size setting defines the number of
ranges of data to include in a single DeleteRange request.
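
The sketch below shows the batching-plus-parallelism shape of that rollback;
it is illustrative only, and the importer's real code issues KV DeleteRange
requests rather than printing:

```
package main

import (
	"fmt"
	"sync"
)

// deleteRangesInParallel splits the work into batches and lets a fixed number
// of workers issue one DeleteRange-style request per batch, mirroring the
// batch-size and parallelism settings described above.
func deleteRangesInParallel(ranges []string, batchSize, parallelism int) {
	batches := make(chan []string)
	var wg sync.WaitGroup
	for i := 0; i < parallelism; i++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for batch := range batches {
				// Stand-in for one DeleteRange request covering the batch.
				fmt.Printf("worker %d: DeleteRange over %v\n", worker, batch)
			}
		}(i)
	}
	for start := 0; start < len(ranges); start += batchSize {
		end := start + batchSize
		if end > len(ranges) {
			end = len(ranges)
		}
		batches <- ranges[start:end]
	}
	close(batches)
	wg.Wait()
}

func main() {
	deleteRangesInParallel([]string{"r1", "r2", "r3", "r4", "r5"}, 2, 2)
}
```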

Release note: none

85410: externalconn,changefeedccl: support kafka in external connections r=adityamaru a=adityamaru

This change adds the ability to create an external connection that
represents a `kafka` sink. It also teaches changefeeds to recognize
the `external` scheme URI.

When creating an external connection that represents a kafka sink, we
run validation on the passed-in URI before persisting it in the system
table. This PR does not add any `WITH` options to the create statement
but in the near future we will want to allow the user to pass in certain
sink/resource specific configurations that will apply to all users of the
external connection object. For example, a kafka JSON config.

A changefeed can now be run to an `external` scheme URI that points to
an existing external connection object. Since we can only represent kafka
sinks as external connections as of now, the statement will only accept the
options that are valid for a kafka sink.

Informs: #84753

Release note (sql change): Users can now `CREATE EXTERNAL CONNECTION`
to represent a `kafka` sink. Subsequently, users can run
`CREATE CHANGEFEED` with an `external://<external-connection-object-name>`
URI as the sink to use the kafka resource represented by the
external connection object.

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
craig bot pushed a commit that referenced this issue Aug 10, 2022
85680: amazon,externalconn: add s3 support to External Connections r=adityamaru a=adityamaru

This change registers s3 as a URI that can be represented as
an External Connection. Most notably, we take a page from the
CDC book and switch the s3 parse function to check for invalid
parameters and configurations. This allows us to catch certain
misconfigurations at the time we create the external connection.

Informs: #84753

Release note (sql change): Users can now `CREATE EXTERNAL CONNECTION`
to represent an s3 URI.

85849: sql/catalog/tabledesc: relaxed validation for virtual col in SUFFIX cols r=Xiang-Gu a=Xiang-Gu

One of the validation rule says that "we don't allow virtual columns to
be in the SUFFIX columns of a secondary index", except for one case:
`ALTER PRIMARY KYE USING HASH`, where the implicitly created virtual,
shard column, will need to appear in the SUFFIX columns of the
implicitly created unique, secondary index of the old PK key columns (
which btw is a CRDB unique feature).

The validation has logic to exempt us from this special case, but it's
written specifically for the legacy schema changer. Namely, it uses the
canonical `PrimaryKeySwap` mutation type as the signal, but we don't have
that in the declarative schema changer. This PR addresses the issue and
allows the validation logic to also exempt the exact same case
in the declarative schema changer.

Release note: None

85862: ui: new insights table component r=maryliag a=maryliag

Creation of the base insights table, which can be used on both the
schema insights and statement details pages.

This commit only introduces the table with the different types; the
proper cell formatting for each type still needs to be added.

Part of #83783

Release note: None

85865: rowexec: use the sorter's own context r=yuzefovich a=yuzefovich

Release note: none

85866: streamingccl, rowexec: correctly mark eventStream as "streaming" r=yuzefovich a=yuzefovich

This commit fixes an incomplete solution from
9bb5d30, which attempted to mark the
`eventStream` generator builtin as "streaming" so that the columnarizer
on top of the `projectSetProcessor` would not buffer anything. As found
by Steven, the solution in that commit was incomplete since the
generators array is not populated at the time the `MustBeStreaming`
check is performed. This commit fixes that oversight by using
a different way of propagating the property: via the function
properties.

Release note: None

85885: kvserver: add queue size metric to RaftTransport r=tbg a=pavelkalinnikov

Currently, there is a `RaftEnqueuePending` metric in `StoreMetrics` which exposes
the `RaftTransport` outgoing queue size. However, this is a per-`Store` metric, so
it duplicates the same variable across all the stores. The metric should be
tracked on a per-node/transport basis.

This PR introduces a per-transport `SendQueueSize` metric to replace the old
`RaftEnqueuePending`. It also does the necessary plumbing to include it in
the exported metrics, since it's the first metric in `RaftTransport`.

<img width="1350" alt="Screenshot 2022-08-10 at 15 37 47" src="https://user-images.githubusercontent.com/3757441/183929666-0f6523d7-5877-4f2f-8460-4f013f87966d.png">

Touches #83917

Release note: None

85906: eval: don't always ignore error from ResolveFunctionByOID r=chengxiong-ruan a=rafiss

This addresses a comment from the review of #85656.

Release note: None

85914: systemschema_test: fix timestamp stripping regex r=postamar a=postamar

The test in this package was flaky because hlc.Timestamp values inside
descriptors (modification time, for instance) with a logical component
would be improperly redacted. This commit fixes this.

Fixes #85799.

Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
@adityamaru
Contributor Author

This is now done; the remaining work of registering more providers is being tracked in #84228.
