external: introduce an external:// URI scheme that is recognized by ExternalStorage, Sink and KMS #84753
Labels: A-disaster-recovery, C-enhancement (solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception), T-disaster-recovery
Comments
As discussed in #84209, External Connections are going to be addressable using an `external` scheme URI that contains the object's name as its host. This URI will be recognized by the `ExternalStorage`, `KMS` and `Sink` interfaces in CockroachDB today, thereby allowing all bulk operations and changefeeds to reference an External Connection when running. Roughly, we will:

1. Parse the `external` URI to get the external connection name.
2. Fetch the corresponding row from the `system.external_connections` table and unmarshal its details.
3. `Dial()` into the underlying external resource and return a handle to the caller.

Jira issue: CRDB-17845

Epic CRDB-15001
cc @cockroachdb/bulk-io
adityamaru added a commit to adityamaru/cockroach that referenced this issue on Jul 22, 2022
In cockroachdb#84310 we added the ability to create an external connection to represent a `nodelocal` endpoint. This diff is the second piece of the puzzle that allows systems in CRDB to interact with the external connection object.

We introduce an `external` URI scheme to the `ExternalStorage` registry. URIs with an `external` scheme are required to contain a host component referring to the unique name of an existing external connection object. Optionally, the URI can also contain a path component that will be appended to the endpoint that was specified at the time the external connection object was created. This is necessary for operations such as backup and restore that read/write to subdirectories in the endpoint provided by the user. A nice UX win is the ability to have a single external connection object for the base bucket, and then interact with all the subdirectories without having to create an object for each directory. In the future we may want to clamp down on this, and allow the user to specify which objects permit subdirectory access.

The `external://<object-name>/<optional-path>` URI is parsed and the underlying object is fetched from the `system.external_connections` table. The resource represented by the object is then `Dial()`ed to return an `ExternalStorage` handle that can be used to read, write, list, etc. With this change all bulk operations and CDC are able to use external connections to represent a `nodelocal` endpoint. For example, a backup can now be run as:

```
CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';
BACKUP INTO 'external://foo';
RESTORE FROM LATEST IN 'external://foo';
```

Gradually, we will add support for all other external storage endpoints as well.

Note, we do not register limiter settings for the `external` provider's `ExternalStorage` interface, nor do we wrap it with an ioRecorder. This is because the underlying resource of the external connection object will already have its registered limiters and recorder.

Informs: cockroachdb#84753

Release note (sql change): Bulk operations and CDC will accept an `external` scheme URI that points to a previously created External Connection object. These operations can then interact with the underlying resource represented by the object as they did before.
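The parse, fetch, and Dial flow this commit describes is easy to sketch. Below is a minimal, illustrative Go program; `connectionRow`, `lookupConnection`, and `resolveExternalURI` are hypothetical stand-ins for the actual registry and system-table plumbing, not names from the CockroachDB source:

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// connectionRow is a hypothetical stand-in for a row in the
// system.external_connections table.
type connectionRow struct {
	name        string
	endpointURI string
}

// lookupConnection simulates fetching the named external connection;
// the real implementation reads the system.external_connections table.
func lookupConnection(name string) (connectionRow, error) {
	fake := map[string]connectionRow{
		"foo": {name: "foo", endpointURI: "nodelocal://1/foo"},
	}
	row, ok := fake[name]
	if !ok {
		return connectionRow{}, fmt.Errorf("external connection %q not found", name)
	}
	return row, nil
}

// resolveExternalURI parses external://<object-name>/<optional-path>,
// fetches the underlying connection, and appends the optional path to
// the stored endpoint, mirroring the behavior described above.
func resolveExternalURI(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "external" {
		return "", fmt.Errorf("expected external scheme, got %q", u.Scheme)
	}
	if u.Host == "" {
		return "", fmt.Errorf("external URI must name a connection in its host component")
	}
	row, err := lookupConnection(u.Host)
	if err != nil {
		return "", err
	}
	endpoint, err := url.Parse(row.endpointURI)
	if err != nil {
		return "", err
	}
	// Append the optional subdirectory to the stored endpoint's path.
	endpoint.Path = path.Join(endpoint.Path, u.Path)
	return endpoint.String(), nil
}

func main() {
	resolved, err := resolveExternalURI("external://foo/2022/backups")
	if err != nil {
		panic(err)
	}
	fmt.Println(resolved) // nodelocal://1/foo/2022/backups
}
```

Running it resolves `external://foo/2022/backups` against the stored `nodelocal://1/foo` endpoint, showing why a single connection object can serve every subdirectory under the base bucket.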
craig bot pushed a commit that referenced this issue on Jul 27, 2022
84931: externalconn,nodelocal: add `external` ExternalStorage provider r=adityamaru a=adityamaru

In #84310 we added the ability to create an external connection to represent a `nodelocal` endpoint. This diff is the second piece of the puzzle that allows systems in CRDB to interact with the external connection object. We introduce an `external` URI scheme to the `ExternalStorage` registry. URIs with an `external` scheme are required to contain a host component referring to the unique name of an existing external connection object. Optionally, the URI can also contain a path component that will be appended to the endpoint that was specified at the time the external connection object was created. This is necessary for operations such as backup and restore that read/write to subdirectories in the endpoint provided by the user. A nice UX win is the ability to have a single external connection object for the base bucket, and then interact with all the subdirectories without having to create an object for each directory. In the future we may want to clamp down on this, and allow the user to specify which objects permit subdirectory access.

The `external://<object-name>/<optional-path>` URI is parsed and the underlying object is fetched from the `system.external_connections` table. The resource represented by the object is then `Dial()`ed to return an `ExternalStorage` handle that can be used to read, write, list, etc. With this change all bulk operations and CDC are able to use external connections to represent a `nodelocal` endpoint. For example, a backup can now be run as:

```
CREATE EXTERNAL CONNECTION foo AS 'nodelocal://1/foo';
BACKUP INTO 'external://foo';
RESTORE FROM LATEST IN 'external://foo';
```

Gradually, we will add support for all other external storage endpoints as well. Note, we do not register limiter settings for the `external` provider's `ExternalStorage` interface, nor do we wrap it with an ioRecorder. This is because the underlying resource of the external connection object will already have its registered limiters and recorder.

Informs: #84753

Release note (sql change): Bulk operations and CDC will accept an `external` scheme URI that points to a previously created External Connection object. These operations can then interact with the underlying resource represented by the object as they did before.

85041: sql: better logging about the optimizer as well as some cancellation checks r=yuzefovich a=yuzefovich

**sql: add logging about the optimizer into the trace**

This commit derives a separate tracing span on the main query path that covers the whole optimizer, as well as adds a few logging messages around the optbuilder and the optimization. It also audits some of the already present logs to be written down before performing possibly long operations. Release note: None

**xform: check for context cancellation**

Previously, the optimizer never checked for context cancellation, which could lead to surprising behavior: e.g. if the statement timeout occurred in the middle of the optimization, we would still finish the whole optimization phase only to realize that we were canceled at execution time. This situation is now improved by periodically checking for context cancellation every time a group is optimized by `optimizeGroup`. It's still possible for the optimizer to continue running long after cancellation if there is a long-running procedure at a deeper level. For example, a long-running, individual invocation of an exploration rule will not halt due to context cancellation. However, the cancellation improvement in this commit should halt long-running optimization in most cases. Fixes: #70245. Addresses: #70314. Release note: None

85101: kvcoord: remove an artifact of unsetting WriteTooOld flag r=yuzefovich a=yuzefovich

It seems that during the 20.1 release cycle we made a change so that errors don't carry the `WriteTooOld` flag set to `true`, and now only the BatchResponse can have that. We kept the ability to unset that flag so that in a mixed-version scenario we wouldn't run into any issues, but that unsetting behavior is no longer needed and is removed in this commit. My particular interest in removing this is that it is the only modification of the `SetupFlowRequest` proto (which carries the `Txn` proto inside) that occurs during the distributed query setup, and I want to have more parallelization there. Release note: None

85118: roachpb: remove `RangeDescriptor.deprecated_generation_comparable` r=pavelkalinnikov,andreimatei a=erikgrinaker

This patch removes the unused field `deprecated_generation_comparable` from `RangeDescriptor`. The preparatory work for this was done in 20.2, see a578719. Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
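The cancellation change in #85041 boils down to polling the context at a natural unit of work. Here is a minimal sketch of that pattern, assuming a toy recursive optimizer; only the name `optimizeGroup` echoes the commit message, and the rest is illustrative rather than CockroachDB's actual `xform` code:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type group struct {
	children []*group
}

// optimizeGroup sketches the pattern from the commit message: every
// time a group is optimized, poll the context so a statement timeout
// or client disconnect halts the search instead of being noticed only
// at execution time.
func optimizeGroup(ctx context.Context, g *group) error {
	if err := ctx.Err(); err != nil {
		return err // canceled or deadline exceeded
	}
	// ... explore and implement expressions for this group ...
	for _, child := range g.children {
		if err := optimizeGroup(ctx, child); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()
	time.Sleep(2 * time.Millisecond) // simulate the timeout firing mid-optimization
	root := &group{children: []*group{{}, {}}}
	fmt.Println(optimizeGroup(ctx, root)) // context deadline exceeded
}
```

As the commit notes, a check at this granularity still cannot interrupt a single long-running rule invocation; it only bounds how much work happens between checks.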
adityamaru added a commit to adityamaru/cockroach that referenced this issue on Jul 28, 2022
In cockroachdb#84931 we taught the `ExternalStorage` infrastructure to recognize the `external` URI scheme. In this change we do the same but for the `KMS` infrastructure.

Concretely, a user is now able to create an external connection object that represents a GCP KMS URI. This can be done using the `CREATE EXTERNAL CONNECTION` syntax. The user is then able to point an operation in CockroachDB such as BACKUP, RESTORE, or SHOW BACKUP to that KMS using an `external` URI. For example:

```
CREATE EXTERNAL CONNECTION backup AS 'nodelocal://1/foo';
CREATE EXTERNAL CONNECTION backupkms AS 'gs:///cmk?AUTH=implicit';
BACKUP INTO 'external://backup' WITH kms='external://backupkms';
```

Under the hood, we implement the `ConnectionDetails` interface for GCP KMS. This allows us to persist a row in the `external_connections` table when the object is created, and to `Dial` the underlying resource when the object is being used. The interfaces had to be tweaked slightly to accommodate the fact that they are now being implemented by two different infrastructures, `ExternalStorage` and `KMS`. This is an expected evolution, and will pave the way for our third and final target, changefeed `Sinks`.

A large portion of this diff is just plumbing the appropriate environments through the backup/restore code. This diff also adds KMS-specific tests to `pkg/ccl/cloudccl/gcp` and tweaks our nightly job to run these tests with the configured credentials.

Informs: cockroachdb#84753

Release note (sql change): GCP KMS can be represented as an External Connection object that can be used during a backup or restore using the `external` URI.
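The two responsibilities this commit describes for `ConnectionDetails` (persist a representation on create, `Dial` the resource on use) can be sketched as a small Go interface. This is an illustration of the shape only; the actual CockroachDB interface and its GCP KMS implementation differ, and every name here is hypothetical:

```go
package main

import "fmt"

// externalResource is a hypothetical stand-in for the handle a Dial
// returns (an ExternalStorage or KMS in the real code).
type externalResource interface {
	Kind() string
}

// connectionDetails sketches the two responsibilities described in
// the commit message: persist a representation of the connection when
// it is created, and Dial the underlying resource when it is used.
type connectionDetails interface {
	// Marshaled returns what gets written to system.external_connections.
	Marshaled() string
	// Dial opens the underlying resource.
	Dial() (externalResource, error)
}

type gcpKMSDetails struct {
	uri string // e.g. gs:///cmk?AUTH=implicit
}

func (d gcpKMSDetails) Marshaled() string { return d.uri }

type gcpKMS struct{}

func (gcpKMS) Kind() string { return "gcp-kms" }

func (d gcpKMSDetails) Dial() (externalResource, error) {
	// The real implementation would construct a KMS client from the stored URI.
	return gcpKMS{}, nil
}

func main() {
	var details connectionDetails = gcpKMSDetails{uri: "gs:///cmk?AUTH=implicit"}
	res, err := details.Dial()
	if err != nil {
		panic(err)
	}
	fmt.Println(details.Marshaled(), "->", res.Kind())
}
```

One interface shared by `ExternalStorage`, `KMS`, and later `Sink` implementations is what lets a single `external://` scheme front all three infrastructures.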
adityamaru added a commit to adityamaru/cockroach that referenced this issue on Aug 1, 2022
This change adds the ability to create an external connection that represents a `kafka` sink. It also teaches changefeeds to recognize the `external` scheme URI.

When creating an external connection that represents a kafka sink, we run validation on the passed-in URI before persisting it in the system table. This PR does not add any `WITH` options to the create statement, but in the near future we will want to allow the user to pass in certain sink/resource-specific configurations that will apply to all users of the external connection object, for example a kafka JSON config.

A changefeed can now be run to an `external` scheme URI that points to an existing external connection object. Since we can only represent kafka sinks as external connections as of now, the statement will only accept the options that are valid for a kafka sink.

Informs: cockroachdb#84753

Release note (sql change): Users can now `CREATE EXTERNAL CONNECTION` to represent a `kafka` sink. Subsequently, users can run `CREATE CHANGEFEED` with an `external://<external-connection-object-name>` URI as the sink to use the kafka resource represented by the external connection object.
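The validate-before-persist behavior described above can be sketched as follows. The checks and function names are hypothetical and much simpler than the real changefeed sink validation; the point is only that a malformed URI fails at `CREATE EXTERNAL CONNECTION` time, before any row is written:

```go
package main

import (
	"fmt"
	"net/url"
)

// validateKafkaURI sketches the create-time validation the commit
// message describes: reject a malformed sink URI before anything is
// persisted to the system table. The checks here are illustrative.
func validateKafkaURI(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "kafka" {
		return fmt.Errorf("expected kafka scheme, got %q", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("kafka URI must specify a broker address")
	}
	return nil
}

func createExternalConnection(name, raw string) error {
	if err := validateKafkaURI(raw); err != nil {
		return fmt.Errorf("invalid connection %q: %w", name, err)
	}
	// Only now would a row be written to system.external_connections.
	fmt.Printf("persisted %q -> %s\n", name, raw)
	return nil
}

func main() {
	_ = createExternalConnection("kafka1", "kafka://broker:9092")
	fmt.Println(createExternalConnection("bad", "kafka://"))
}
```

Failing fast here is the whole UX point: a typo surfaces when the connection object is created, not when the first changefeed tries to emit to it.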
craig bot pushed a commit that referenced this issue on Aug 1, 2022
85075: externalconn,cloud: add GCP KMS support to external connections r=benbardin,rhu713 a=adityamaru

In #84931 we taught the `ExternalStorage` infrastructure to recognize the `external` URI scheme. In this change we do the same but for the `KMS` infrastructure. Concretely, a user is now able to create an external connection object that represents a GCP KMS URI. This can be done using the `CREATE EXTERNAL CONNECTION` syntax. The user is then able to point an operation in CockroachDB such as BACKUP, RESTORE, or SHOW BACKUP to that KMS using an `external` URI. For example:

```
CREATE EXTERNAL CONNECTION backup AS 'nodelocal://1/foo';
CREATE EXTERNAL CONNECTION backupkms AS 'gs:///cmk?AUTH=implicit';
BACKUP INTO 'external://backup' WITH kms='external://backupkms';
```

Under the hood, we implement the `ConnectionDetails` interface for GCP KMS. This allows us to persist a row in the `external_connections` table when the object is created, and to `Dial` the underlying resource when the object is being used. The interfaces had to be tweaked slightly to accommodate the fact that they are now being implemented by two different infrastructures, `ExternalStorage` and `KMS`. This is an expected evolution, and will pave the way for our third and final target, changefeed `Sinks`. A large portion of this diff is just plumbing the appropriate environments through the backup/restore code. This diff also adds KMS-specific tests to `pkg/ccl/cloudccl/gcp` and tweaks our nightly job to run these tests with the configured credentials.

Informs: #84753

Release note (sql change): GCP KMS can be represented as an External Connection object that can be used during a backup or restore using the `external` URI.

85351: opt: simplify contradictory filters to false r=mgartner a=mgartner

The `SimplifySelectFilters` normalization rule now transforms filters to `false` if any of the filter items are contradictions. This allows filters to be simplified in more cases, which can trigger other rules. This can eliminate non-leakproof expressions in filters and allow `SimplifyZeroCardinalityGroup` to fire. For example, consider the query:

```
SELECT a, b FROM t WHERE a > 0 AND a < 0 AND 1/b = 1
```

Prior to this commit, it could not be simplified to an empty values clause because the non-leakproof division operator prevents `SimplifyZeroCardinalityGroup` from applying. The normalized plan would be:

```
select
 ├── columns: a:1 b:2
 ├── cardinality: [0 - 0]
 ├── immutable
 ├── scan t
 │    └── columns: a:1 b:2
 └── filters
      ├── (a:1 > 0) AND (a:1 < 0) [constraints=(contradiction)]
      └── (1 / b:2) = 1 [outer=(2), immutable]
```

Now, `SimplifySelectFilters` eliminates the division operator, making the expression leakproof, and allowing `SimplifyZeroCardinalityGroup` to apply. The normalized plan is now:

```
values
 ├── columns: a:1!null b:2!null
 ├── cardinality: [0 - 0]
 ├── key: ()
 └── fd: ()-->(1-2)
```

Release note (performance improvement): The optimizer can detect contradictory filters in more cases, leading to more efficient query plans.

85364: storage: improve range key clearing r=jbowens a=erikgrinaker

This patch set improves clearing of range keys, with the overall goal of avoiding dropping Pebble range tombstones unnecessarily, in particular across range key spans that will most often be empty. The main affected components are:

* `ClearRange` gRPC method: similarly to point keys, it now uses an iterator to clear individual range keys for small data sets. For larger data sets, it will use Pebble range tombstones to clear point or range keys separately, but only if any such keys exist.
* Raft snapshots:
  * Unreplicated data: no change, we only write a Pebble range tombstone across the point key span, since we don't expect range keys here.
  * Subsumed replicas' range-local state: iterates and clears any point/range keys (individually or ranged).
  * Snapshot SSTs: every SST (for each key span in the Raft range) unconditionally has a Pebble range tombstone at the bottom, both across point keys and range keys.
* Raft splits: on RHS conflict, iterates and clears any RHS point/range keys (individually or ranged).
* Raft merges: for RHS range-local data, iterates and clears any RHS point/range keys (individually or ranged).
* Raft replica removal: iterates and clears any RHS point/range keys (individually or ranged).
* Raft log truncation: no change, only writes tombstones across point keys.

The iffiest bit is the unconditional Pebble range tombstones at the bottom of Raft snapshot SSTs, both because they're unconditional, and also because they will be fairly common. `@jbowens` Do you consider this problematic? Specifically, this bit here: https://github.com/cockroachdb/cockroach/blob/b0e76dc38bdc893ee3ba42131f9718fc855a1dc6/pkg/kv/kvserver/store_snapshot.go#L183-L185

Resolves #83032.

---

**storage: add point/range key params to `Writer.ClearRawRange()`**

This patch adds boolean parameters for point/range keys to `Writer.ClearRawRange()`. When true, it will unconditionally write Pebble range tombstones across the point/range key spans. This gives the caller better control over when to write tombstones, to avoid writing them unnecessarily. This is a behavioral change, because the previous `Engine` and `Batch` implementations would only drop Pebble range key tombstones if there were range keys present. Later commits will introduce higher-level functionality to avoid writing these tombstones unnecessarily and adapt callers. Release note: None

**storage: clear individual range keys in `ClearMVCCIteratorRange`**

Previously, `Writer.ClearMVCCIteratorRange` would use a Pebble `RANGEKEYDEL` to clear all range keys within the span. This went against the intent of the method, which is to iterate across the span and clear individual range keys. This patch changes it to do the latter. Parameters are also added to control whether to clear point and/or range keys. Release note: None

**storage: remove `Writer.ClearAllRangeKeys`**

This patch removes `Writer.ClearAllRangeKeys()`, which was used to clear all range keys in a span using a single `RANGEKEYDEL` tombstone. `ClearRawRange` should be used instead. This also removes some clumsy `Engine` logic that attempted to detect and avoid writing range key tombstones if there were no range keys, which required additional metadata tracking in batches and such. The responsibility for this is instead placed on callers. It was never possible to handle this sufficiently inside the writers anyway, because of e.g. `SSTWriter`, which does not have access to a reader. Release note: None

**storage: add `Writer.ClearEngineRangeKey()`**

This patch adds `Writer.ClearEngineRangeKey()`, and makes `ClearMVCCRangeKey()` call through to it. It also takes the opportunity to consolidate the put logic in a similar fashion, by calling through to the engine method. Release note: None

**storage: improve `ClearRangeWithHeuristic` for range keys**

This patch improves `ClearRangeWithHeuristic()` to take individual, optional thresholds for point and/or range keys, controlling when to use `RANGEDEL` or `RANGEKEYDEL` instead of individual tombstones. The Raft replica clearing code has been adapted to take advantage of this, and in particular, will no longer drop Pebble range key tombstones unless there are actual range keys present. Release note: None

**kvserver/batcheval: improve range key clears in `ClearRange`**

This patch improves range key clearing in `ClearRange`, by using `ClearRangeWithHeuristic` to only write Pebble range key tombstones if there are any range keys present. Release note: None

**storage: add point/range key params to `Writer.ClearMVCCRange`**

This patch adds point/range key parameters to `Writer.ClearMVCCRange()` controlling whether to clear the key type. This can be used by callers to avoid dropping unnecessary tombstones, particularly across range key spans that are expected to be empty most of the time. Release note: None

85412: logictest: temporarily disable workmem randomization in SQLLite tests r=yuzefovich a=yuzefovich

This commit restores 2bdcab7, which was lost during the mega logic test refactor. Fixes: #85377. Release note: None

85423: opt: allow auto-commit and bounded staleness with distribute operator r=rytaft a=rytaft

**opt: allow auto-commit with distribute operator**

Prior to this commit, we disallowed auto-commit when a plan had a distribute enforcer. However, this was an oversight, as the distribute operator in the optimizer is currently a no-op, and should not impact the ability to perform auto-commit. Fixes #85043. Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that could cause the optimizer to not use auto-commit for some mutations in multi-region clusters when it should have done so. This could cause poor query performance.

**opt: allow bounded staleness with distribute operator**

Prior to this commit, we disallowed bounded staleness when a plan had a distribute enforcer. However, this was an oversight, as the distribute operator in the optimizer is currently a no-op, and should not impact the ability to perform reads with bounded staleness. Fixes #85288. Release note (bug fix): Fixed a bug that was introduced in 22.1.0 that could cause the optimizer to reject valid bounded staleness queries with the error "unimplemented: cannot use bounded staleness for DISTRIBUTE". This has now been fixed.

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
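The heuristic at the heart of `ClearRangeWithHeuristic`, as described in #85364, is: write nothing when the span is empty, individual tombstones below a threshold, and a single ranged tombstone above it. A toy Go sketch of that decision, with strings standing in for engine keys and tombstone operations (the real function operates on storage readers and writers):

```go
package main

import "fmt"

// clearRangeWithHeuristic sketches the idea in the storage commits
// above: count the keys in a span, clear a small set with individual
// tombstones, fall back to a single ranged tombstone for a large set,
// and write nothing at all when the span is empty. keys stands in for
// an iterator over the span.
func clearRangeWithHeuristic(keys []string, threshold int) []string {
	if len(keys) == 0 {
		return nil // avoid dropping an unnecessary ranged tombstone
	}
	if len(keys) < threshold {
		ops := make([]string, 0, len(keys))
		for _, k := range keys {
			ops = append(ops, "DEL "+k)
		}
		return ops
	}
	return []string{fmt.Sprintf("RANGEDEL [%s, %s]", keys[0], keys[len(keys)-1])}
}

func main() {
	fmt.Println(clearRangeWithHeuristic(nil, 64))                    // []
	fmt.Println(clearRangeWithHeuristic([]string{"a", "b"}, 64))     // [DEL a DEL b]
	fmt.Println(clearRangeWithHeuristic([]string{"a", "b", "c"}, 2)) // [RANGEDEL [a, c]]
}
```

The empty-span short-circuit is the key point of the patch set: range key spans are usually empty, so unconditional ranged tombstones would mostly be wasted work for the LSM.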
adityamaru added a commit to adityamaru/cockroach that referenced this issue on Aug 5, 2022
This change registers s3 as a URI that can be represented as an External Connection. Most notably, we take a page from the CDC book and switch the s3 parse function to check for invalid parameters and configurations. This allows us to catch certain misconfigurations at the time we create the external connection.

Informs: cockroachdb#84753

Release note (sql change): Users can now `CREATE EXTERNAL CONNECTION` to represent an s3 URI.
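Checking for invalid parameters at parse time, as this commit does for s3, typically means validating every query parameter against an allowlist. A hedged sketch of that idea, with a deliberately tiny, illustrative allowlist that is not the full set of s3 options:

```go
package main

import (
	"fmt"
	"net/url"
)

// parseS3URI sketches the stricter parsing the commit describes:
// every query parameter is checked against an allowlist so that a
// misspelled or unsupported option fails at CREATE EXTERNAL CONNECTION
// time rather than at first use.
func parseS3URI(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	known := map[string]bool{
		"AWS_ACCESS_KEY_ID":     true,
		"AWS_SECRET_ACCESS_KEY": true,
		"AUTH":                  true,
	}
	for param := range u.Query() {
		if !known[param] {
			return fmt.Errorf("unknown s3 parameter %q", param)
		}
	}
	return nil
}

func main() {
	fmt.Println(parseS3URI("s3://bucket/path?AUTH=implicit"))      // <nil>
	fmt.Println(parseS3URI("s3://bucket/path?AUTH_TYPO=implicit")) // unknown s3 parameter "AUTH_TYPO"
}
```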
craig bot pushed a commit that referenced this issue on Aug 10, 2022
84923: importer: replace Clear/RevertRange calls with DeleteRange in IMPORT rollback r=dt,erikgrinaker a=msbutler

This patch replaces the ClearRange and RevertRange calls with MVCC DeleteRange during IMPORT rollbacks. In the predicate-based DeleteRange call, the client's provided table span is partitioned in order to process the partitions in parallel. The new bulkio.import.predicate_delete_range_parallelism setting defines the number of workers that can issue DeleteRange requests concurrently, and the new bulkio.import.predicate_delete_range_batch_size setting defines the number of ranges of data to include in a single DeleteRange request. Release note: none

85410: externalconn,changefeedccl: support kafka in external connections r=adityamaru a=adityamaru

This change adds the ability to create an external connection that represents a `kafka` sink. It also teaches changefeeds to recognize the `external` scheme URI. When creating an external connection that represents a kafka sink, we run validation on the passed-in URI before persisting it in the system table. This PR does not add any `WITH` options to the create statement, but in the near future we will want to allow the user to pass in certain sink/resource-specific configurations that will apply to all users of the external connection object, for example a kafka JSON config. A changefeed can now be run to an `external` scheme URI that points to an existing external connection object. Since we can only represent kafka sinks as external connections as of now, the statement will only accept the options that are valid for a kafka sink.

Informs: #84753

Release note (sql change): Users can now `CREATE EXTERNAL CONNECTION` to represent a `kafka` sink. Subsequently, users can run `CREATE CHANGEFEED` with an `external://<external-connection-object-name>` URI as the sink to use the kafka resource represented by the external connection object.

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
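The parallelism described in #84923 (partition the table span, then fan out DeleteRange requests across workers) can be illustrated with a toy partitioner. Integer spans stand in for key spans here; the cluster settings named above are only referenced conceptually:

```go
package main

import "fmt"

type span struct{ start, end int }

// partitionSpan sketches the fan-out the import-rollback commit
// describes: split the table span into roughly equal parts so a pool
// of workers can issue DeleteRange requests concurrently. Real spans
// are key ranges, not integers; this shows only the partitioning idea.
func partitionSpan(s span, parts int) []span {
	width := (s.end - s.start + parts - 1) / parts // ceiling division
	var out []span
	for lo := s.start; lo < s.end; lo += width {
		hi := lo + width
		if hi > s.end {
			hi = s.end
		}
		out = append(out, span{lo, hi})
	}
	return out
}

func main() {
	// With parallelism 4, a span of 10 units becomes 4 chunks of <=3.
	fmt.Println(partitionSpan(span{0, 10}, 4)) // [{0 3} {3 6} {6 9} {9 10}]
}
```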
craig bot pushed a commit that referenced this issue on Aug 10, 2022
85680: amazon,externalconn: add s3 support to External Connections r=adityamaru a=adityamaru

This change registers s3 as a URI that can be represented as an External Connection. Most notably, we take a page from the CDC book and switch the s3 parse function to check for invalid parameters and configurations. This allows us to catch certain misconfigurations at the time we create the external connection.

Informs: #84753

Release note (sql change): Users can now `CREATE EXTERNAL CONNECTION` to represent an s3 URI.

85849: sql/catalog/tabledesc: relaxed validation for virtual col in SUFFIX cols r=Xiang-Gu a=Xiang-Gu

One of the validation rules says that "we don't allow virtual columns to be in the SUFFIX columns of a secondary index", except for one case: `ALTER PRIMARY KEY USING HASH`, where the implicitly created virtual shard column will need to appear in the SUFFIX columns of the implicitly created unique secondary index over the old PK key columns (which, incidentally, is a CRDB-unique feature). The validation has logic to exempt us from this special case, but it's written specifically for the legacy schema changer. Namely, it uses the canonical `PrimaryKeySwap` mutation type as the signal, but we don't have that in the declarative schema changer. This PR addresses this issue and allows the validation logic to also exempt the exact same case in the declarative schema changer. Release note: None

85862: ui: new insights table component r=maryliag a=maryliag

Creation of the base insights table, which can be used on both the schema insights and statement details pages. This commit only introduces the table, with the different types; the proper cell formatting for each type is still needed. Part of #83783. Release note: None

85865: rowexec: use the sorter's own context r=yuzefovich a=yuzefovich

Release note: none

85866: streamingccl, rowexec: correctly mark eventStream as "streaming" r=yuzefovich a=yuzefovich

This commit fixes an incomplete solution of 9bb5d30, which attempted to mark the `eventStream` generator builtin as "streaming" so that the columnarizer on top of the `projectSetProcessor` would not buffer anything. As found by Steven, the solution in that commit was incomplete, since the generators array is not populated at the time the `MustBeStreaming` check is performed. This commit fixes that oversight by using a different way of propagating the property: via the function properties. Release note: None

85885: kvserver: add queue size metric to RaftTransport r=tbg a=pavelkalinnikov

Currently, there is a `RaftEnqueuePending` metric in `StoreMetrics` which exposes the `RaftTransport` outgoing queue size. However, this is a per-`Store` metric, so it duplicates the same variable across all the stores. The metric should be tracked on a per-node/transport basis. This PR introduces a per-transport `SendQueueSize` metric to replace the old `RaftEnqueuePending`. It also does the necessary plumbing to include it in the exported metrics, since it's the first metric in `RaftTransport`.

[Screenshot: SendQueueSize metric graph, 2022-08-10]

Touches #83917. Release note: None

85906: eval: don't always ignore error from ResolveFunctionByOID r=chengxiong-ruan a=rafiss

This addresses a comment from the review of #85656. Release note: None

85914: systemschema_test: fix timestamp stripping regex r=postamar a=postamar

The test in this package was flaky because hlc.Timestamp values inside descriptors (modification time, for instance) with a logical component would be improperly redacted. This commit fixes this. Fixes #85799. Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
This is now done. Continued registration of more providers is being tracked in #84228.