From 1d8b73775c363692008b9f14a6b7820b8da46656 Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Mon, 20 Jan 2025 14:25:03 +0000 Subject: [PATCH 1/7] DOC-4744 updated SQL Server preparation guide --- .../data-pipelines/prepare-dbs/sql-server.md | 183 +++++++++++------- 1 file changed, 114 insertions(+), 69 deletions(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/sql-server.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/sql-server.md index 8f1123e25..682d049c6 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/sql-server.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/sql-server.md @@ -16,66 +16,82 @@ type: integration weight: 2 --- -To prepare your SQL Server database for Debezium, you must first run a query to -enable CDC globally and then separately enable CDC for each table you want to +To prepare your SQL Server database for Debezium, you must first create a dedicated Debezium user, +run a script to enable CDC globally and then separately enable CDC for each table you want to capture. You need administrator privileges to do this. Once you enable CDC, it captures all of the INSERT, UPDATE, and DELETE operations -on your chosen tables. The Debezium connector can then emit these events to -[Kafka topics](https://kafka.apache.org/intro#intro_concepts_and_terms). +on your chosen tables. The Debezium connector can then emit these events to RDI. -## 1. Enable CDC on the database +## 1. Create a Debezium user -There are two system stored procedures to enable CDC (you need -administrator privileges to run these). Use `sys.sp_cdc_enable_db` -to enable CDC for the whole database and then -You can run the procedure with SQL Server Management Studio or with -Transact-SQL. - -Before running the procedure, ensure that: - -- You are a member of the `sysadmin` fixed server role for the SQL Server. -- You are a `db_owner` of the database. 
-- The SQL Server Agent is running.
-
-Then, follow the steps below to enable CDC:
+We strongly recommend that you create a dedicated Debezium user for the connection between RDI
+and the source database. If you use an existing user instead, ensure that the required
+permissions are granted and that the user is added to the CDC role.
 
-1. From the **View** menu in SQL Server Management Studio, click **Template Explorer**.
+1. Create a user with the Transact-SQL below:
 
-1. In the Template Browser, expand **SQL Server Templates**.
+   ```sql
+   USE master
+   GO
+   CREATE LOGIN MyUser WITH PASSWORD = 'My_Password'
+   GO
+   USE MyDB
+   GO
+   CREATE USER MyUser FOR LOGIN MyUser
+   GO
+   ```
 
-1. Expand **Change Data Capture > Configuration** and then click **Enable Database for CDC**.
+   Replace `MyUser`, `My_Password`, and `MyDB` with your chosen values.
 
-1. In the template, replace the database name in the `USE` statement with the name of the
-   database where you want to enable CDC. For example, if your database was called
-   `myDB`, the template would be:
+1. Grant the required permissions:
 
    ```sql
+   USE master
+   GO
+   GRANT VIEW SERVER STATE TO MyUser
+   GO
    USE MyDB
    GO
-   EXEC sys.sp_cdc_enable_db
+   EXEC sp_addrolemember N'db_datareader', N'MyUser'
    GO
    ```
 
-1. Run the stored procedure `sys.sp_cdc_enable_db` to enable CDC for the database.
+## 2. Enable CDC on the database
 
-When you enable CDC for the database, it creates a schema called `cdc` and also
-a CDC user, metadata tables, and other system objects.
+There are two system stored procedures to enable CDC (you need
+administrator privileges to run these). Use `sys.sp_cdc_enable_db`
+to enable CDC for the whole database and then `sys.sp_cdc_enable_table` to enable CDC for individual tables.
+
+Before running the procedures, ensure that:
 
-Keep the **Change Data Capture > Configuration** foldout open in the Template Explorer
-because you will need it to enable CDC on the individual tables next.
+- You are a member of the `sysadmin` fixed server role for the SQL Server.
+- You are a `db_owner` of the database.
+- The SQL Server Agent is running.
+
+Then, assuming your database is called `MyDB`, run the script below to enable CDC:
 
-## 2. Enable CDC for the tables you want to capture
 
+```sql
+USE MyDB
+GO
+EXEC sys.sp_cdc_enable_db
+GO
+```
 
-You must also enable CDC on the tables you want Debezium to capture using the
-following steps (again, you need administrator privileges for this):
+{{< note >}}For SQL Server on AWS RDS, you must use a different stored procedure:
+```sql
+EXEC msdb.dbo.rds_cdc_enable_db 'MyDB'
+GO
+```
+{{< /note >}}
+
+When you enable CDC for the database, it creates a schema called `cdc` and also
+a CDC user, metadata tables, and other system objects.
 
-1. With the **Change Data Capture > Configuration** foldout still open in the
-   Template Explorer, select **Enable Table Specifying Filegroup Option**.
+## 3. Enable CDC for the tables you want to capture
 
-1. In the template, replace the table name in the USE statement with the name of
-   the table you want to capture. For example, if your table was called `MyTable`
-   then the template would look like the following:
+1. Enable CDC on each table you want Debezium to capture by running the
+following commands (again, you need administrator privileges for this):
 
    ```sql
    USE MyDB
    GO
    EXEC sys.sp_cdc_enable_table
@@ -85,38 +101,34 @@ following steps (again, you need administrator privileges for this):
       @source_schema = N'dbo',
       @source_name = N'MyTable',
       @role_name = N'MyRole',
-      @filegroup_name = N'MyDB_CT',
       @supports_net_changes = 0
    GO
    ```
+
+   Repeat this for every table you want to capture.
+
+   {{< note >}}The value for `@role_name` can’t be a fixed database role, such as `db_datareader`.
+   Specifying a new name will create a corresponding database role that has full access to the
+   captured change data.
+   {{< /note >}}
 
-1. Run the stored procedure `sys.sp_cdc_enable_table` to enable CDC for
-   the table.
+1.
Add the Debezium user to the CDC role: -1. Repeat steps 1 to 3 for every table you want to capture. + ```sql + USE MyDB + GO + EXEC sp_addrolemember N'MyRole', N'MyUser' + GO + ``` -## 3. Check that you have access to the CDC table +## 4. Check that you have access to the CDC table You can use another stored procedure `sys.sp_cdc_help_change_data_capture` to query the CDC information for the database and check you have enabled -it correctly. Before doing this, check that: - -* You have `SELECT` permission on all of the captured columns of the capture instance. - If you are a member of the `db_owner` database role then you can view information for - all of the defined capture instances. -* You are a member of any gating roles that are defined for the table that the query includes. - -Follow the steps below to run `sys.sp_cdc_help_change_data_capture`: - -1. From the **View** menu in SQL Server Management Studio, click **Object Explorer**. - -1. From the Object Explorer, expand **Databases**, and then expand your database - object, for example, `MyDB`. - -1. Expand **Programmability > Stored Procedures > System Stored Procedures**. +it correctly. To do this, connect as the Debezium user you created previously (`MyUser`). 1. Run the `sys.sp_cdc_help_change_data_capture` stored procedure to query - the table. For example, if your database was called `MyDB` then you would + the CDC configuration. For example, if your database was called `MyDB` then you would run the following: ```sql @@ -131,14 +143,31 @@ Follow the steps below to run `sys.sp_cdc_help_change_data_capture`: access. If the result is empty then you should check that you have privileges to access both the capture instance and the CDC tables. -## SQL Server on Azure +### Troubleshooting -You can also use the Debezium SQL Server connector with SQL Server on Azure. 
-See Microsoft's guide to
-[configuring SQL Server on Azure for CDC with Debezium](https://learn.microsoft.com/en-us/samples/azure-samples/azure-sql-db-change-stream-debezium/azure-sql%2D%2Dsql-server-change-stream-with-debezium/)
-for more information.
+If no changes are being captured, it might mean that the SQL Server Agent is not running. You can check for this using the SQL query shown below:
+
+```sql
+IF EXISTS (SELECT 1
+           FROM master.dbo.sysprocesses
+           WHERE program_name = N'SQLAgent - Generic Refresher')
+BEGIN
+    SELECT @@SERVERNAME AS 'InstanceName', 1 AS 'SQLServerAgentRunning'
+END
+ELSE
+BEGIN
+    SELECT @@SERVERNAME AS 'InstanceName', 0 AS 'SQLServerAgentRunning'
+END
+```
+
+If the query returns a result of 0, you need to start SQL Server Agent using the following commands:
 
-### SQL Server capture job agent configuration parameters
+```sql
+EXEC xp_servicecontrol N'START', N'SQLServerAGENT';
+GO
+```
+
+## SQL Server capture job agent configuration parameters
 
 In SQL Server, the parameters that control the behavior of the capture job agent
 are defined in the SQL Server table `msdb.dbo.cdc_jobs`. If you experience performance
@@ -169,6 +198,13 @@ of the Debezium SQL Server connector:
 See the SQL Server documentation for more information about capture agent
 parameters.
 
+## SQL Server on Azure
+
+You can also use the Debezium SQL Server connector with SQL Server on Azure.
+See Microsoft's guide to
+[configuring SQL Server on Azure for CDC with Debezium](https://learn.microsoft.com/en-us/samples/azure-samples/azure-sql-db-change-stream-debezium/azure-sql%2D%2Dsql-server-change-stream-with-debezium/)
+for more information.
+
 ## Handling changes to the schema
 
 RDI can't adapt automatically when you change the schema of a CDC table in SQL Server. For example,
@@ -186,19 +222,28 @@ documentation for further details.
 1.
Create a new capture table for the updated source table by running the `sys.sp_cdc_enable_table` stored procedure with a new, unique value for the parameter `@capture_instance`. For example, if the old value - was `dbo_customers`, you could replace it with `dbo_customers_v2`: + was `dbo_MyTable`, you could replace it with `dbo_MyTable_v2` (you can see the existing values by running + stored procedure `sys.sp_cdc_help_change_data_capture`): ```sql - EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'customers', @role_name = NULL, @supports_net_changes = 0, @capture_instance = 'dbo_customers_v2'; + EXEC sys.sp_cdc_enable_table + @source_schema = N'dbo', + @source_name = N'MyTable', + @role_name = N'MyRole', + @capture_instance = N'dbo_MyTable_v2', + @supports_net_changes = 0 GO ``` 1. When Debezium starts streaming from the new capture table, drop the old capture table by running the `sys.sp_cdc_disable_table` stored procedure with the parameter `@capture_instance` set to the old - capture instance name, `dbo_customers`: + capture instance name, `dbo_MyTable`: ```sql - EXEC sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'dbo_customers', @capture_instance = 'dbo_customers'; + EXEC sys.sp_cdc_disable_table + @source_schema = N'dbo', + @source_name = N'MyTable', + @capture_instance = N'dbo_MyTable' GO ``` From 2509dbd76a941aa38b59772a0f3a11d0809a419b Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Tue, 21 Jan 2025 09:23:53 +0000 Subject: [PATCH 2/7] DOC-4744 initial bits of reformatting --- .../reference/config-yaml-reference.md | 275 +++++++++++++++--- 1 file changed, 228 insertions(+), 47 deletions(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index d7d617d37..5ecf42ede 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ 
b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -1,72 +1,253 @@ --- Title: Redis Data Integration configuration file linkTitle: RDI configuration file -description: Redis Data Integration configuration file reference +description: Reference for the RDI `config.yaml` file weight: 10 alwaysopen: false categories: ["redis-di"] aliases: /integrate/redis-data-integration/ingest/reference/config-yaml-reference/ --- -**Properties** +## Top level objects -|Name|Type|Description|Required| -|----|----|-----------|--------| -|[**sources**](#sources)
(Source collectors)|`object`||| -|[**processors**](#processors)
(Configuration details of Redis Data Integration Processors)|`object`, `null`||| -|[**targets**](#targets)
(Target connections)|`object`||| +These objects define the sections at the root level of `config.yaml`. - -## sources: Source collectors +### Properties -**Additional Properties** +| Name | Type | Description | Required | +| -- | -- | -- | -- | +| [**sources**](#sources) | `object` | Source collectors || +| [**processors**](#processors)| `object`, `null` | RDI Processors || +| [**targets**](#targets) | `object` | Target connections || -|Name|Type|Description|Required| -|----|----|-----------|--------| +## sources: Source collectors {#sources} - -## processors: Configuration details of Redis Data Integration Processors +Each source database type has its own connector, but the basic configuration properties are +the same for all databases. -**Properties** +See the Debezium documentation for more information about the specific connectors: + +- [MySQL/MariaDB](https://debezium.io/documentation/reference/stable/connectors/mysql.html) +- [Oracle](https://debezium.io/documentation/reference/stable/connectors/oracle.html) +- [PostgreSQL](https://debezium.io/documentation/reference/stable/connectors/postgresql.html) +- [SQL Server](https://debezium.io/documentation/reference/stable/connectors/sqlserver.html) + +### Essential properties + +[**connection:**](#connection)
+|Name|Type|Default|Source Databases|Description|
+|--|--|--|--|--|
+|host|string| |MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The address of the database instance.|
+|port|int||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The port of the database instance.|
+|database|string||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The name of the database from which to stream the changes. For SQL Server, you can define the database as a comma-separated list of the SQL Server database names from which to stream the changes.|
+|database.pdb.name|string|ORCLPDB1|Oracle|The name of the [Oracle Pluggable Database](https://docs.oracle.com/en/database/oracle/oracle-database/19/riwin/about-pluggable-databases-in-oracle-rac.html) that the connector captures changes from. For a non-CDB installation, do not specify this property.|
+|database.encrypt|boolean|false|SQLServer|If SSL is enabled for a SQL Server database, enable SSL by setting the value of this property to true.|
+|database.server.id|int|1|MySQL|A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster.|
+|database.url|string||Oracle|Specifies the raw database JDBC URL. Use this property to provide flexibility in defining the database connection. Valid values include raw TNS names and RAC connection strings.|
+|topic.prefix|string|rdi|MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.|
+
+### Advanced properties
+
+[**Sink:**](#sink)
+|Name|Type|Default|Description|
+|--|--|--|--|
+| redis.null.key | string | default | Redis does not support the notion of data without a key, so this string is used as the key for records that have no primary key. |
+| redis.null.value | string | default | Redis does not support the notion of null payloads, as is the case with tombstone events. This string is used as the value for records without a payload. |
+| redis.batch.size | int | 500 | Number of change records to insert in a single batch write (pipelined transaction).|
+| redis.memory.limit.mb | int | 300 | The connector stops sending events when the Redis database size exceeds this threshold.|
+| redis.wait.enabled | boolean | false | If Redis is configured with a replica shard, this lets you verify that the data has been written to the replica. |
+| redis.wait.timeout.ms | int | 1000 | Defines the timeout in milliseconds when waiting for the replica. |
+| redis.wait.retry.enabled | boolean | false | Enables retry on wait for replica failure.|
+| redis.wait.retry.delay.ms | int | 1000 | Defines the delay of the retry on wait for replica failure. |
+| redis.retry.initial.delay.ms | int | 300 | Initial retry delay when encountering Redis connection or OOM issues. This value is doubled on every retry but won’t exceed `redis.retry.max.delay.ms`. |
+| redis.retry.max.delay.ms | int | 10000 | Max delay when encountering Redis connection or OOM issues. |
+
+[**Source:**](#source)
+|Name|Type|Default|Source Databases|Description|
+|--|--|--|--|--|
+|snapshot.mode|string|initial|MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the mode that the connector uses to take snapshots of a captured table.|
+|topic.prefix|string|rdi|MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.|
+|database.exclude.list|string||MariaDB, MySQL|An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases.|
+|schema.exclude.list|string||Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match the names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do not specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. |
+|table.exclude.list|string||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured. The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. |
+| column.exclude.list | string | | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional, comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. |
+|snapshot.select.statement.overrides|String||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the table rows to include in a snapshot. Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.|
+|lob.enabled|boolean|false|Oracle|Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.|
+|unavailable.value.placeholder|string|\_\_debezium_unavailable_value|Oracle|Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database.|
+
+### Using queries in the initial snapshot (relevant for MySQL, Oracle, PostgreSQL and SQLServer)
+
+- If you want a snapshot to include only a subset of the rows in a table, add the property `snapshot.select.statement.overrides` with a comma-separated list of [fully-qualified table names](#fully-qualified-table-name). The list should include every table for which you want to add a `SELECT` statement.
+
+- **For each table in the list above, add a further configuration property** that specifies the `SELECT` statement for the connector to run on the table when it takes a snapshot.
+
+  The specified `SELECT` statement determines the subset of table rows to include in the snapshot.
+
+  Use the following format to specify the name of this `SELECT` statement property:
+
+  - Oracle, SQLServer, PostgreSQL: `snapshot.select.statement.overrides: <schemaName>.<tableName>`
+  - MySQL: `snapshot.select.statement.overrides: <databaseName>.<tableName>`
+
+- Add the list of columns you want to include in the `SELECT` statement using fully-qualified names.
Each column should be specified in the configuration as shown below: + + ```yaml + tables: + schema_name.table_name: # For MySQL: use database_name.table_name + columns: + - column_name1 # Each column on a new line + - column_name2 + - column_name3 + ``` + +- To capture all columns from a table, use empty curly braces `{}` instead of listing individual columns: + + ```yaml + tables: + schema_name.table_name: {} # Captures all columns + ``` + +### Example + +To select the columns `CustomerId`, `FirstName` and `LastName` from `customer` table and join it with `invoice` table in order to get customers with total invoices greater than 8000, we need to add the following properties to the `config.yaml` file: + +```yaml +tables: + chinook.customer: + columns: + - CustomerID + - FirstName + - LastName + +advanced: + source: + snapshot.select.statement.overrides: chinook.customer + snapshot.select.statement.overrides.chinook.customer: | + SELECT c.CustomerId, c.FirstName, c.LastName + FROM chinook.customer c + INNER JOIN chinook.invoice inv + ON c.CustomerId = inv.CustomerId + WHERE inv.total > 8000 +``` + +### Form custom message key(s) for change event records + +- By default, Debezium uses the primary key column(s) of a table as the message key for records that it emits. + In place of the default, or to specify a key for tables that lack a primary key, you can configure custom message keys based on one or more columns. + +- To establish a custom message key for a table, list the table followed by the column to use as the message key. Each list entry takes the following format: + + ```yaml + # To include entries for multiple tables, simply add each table with its corresponding columns and keys under the 'tables' field. 
+   tables:
+     <databaseName>.<tableName>:
+       columns:
+         - <columnName> # List of columns to include
+       keys:
+         - <columnName> # Column(s) to be used as the primary key
+   ```
+
+   Notes:
+
+   - When specifying columns in the `keys` field, ensure that these same columns are also listed under the `columns` field in your configuration.
+   - There is no limit to the number of columns that can be used to create custom message keys. However, it’s best to use the minimum required number of columns to specify a unique key.
+
+### Fully-qualified table name
+
+In this document, we refer to the fully-qualified table name as `<databaseName>.<tableName>` for MySQL, or as `<schemaName>.<tableName>` for Oracle, SQLServer, and PostgreSQL:
+
+| Database Type | Fully-qualified Table Name |
+| -- | -- |
+| Oracle, SQLServer, PostgreSQL | `<schemaName>.<tableName>` |
+| MySQL | `<databaseName>.<tableName>` |
+
+{{< note >}}You can specify the fully-qualified table name `<databaseName>.<tableName>` as
+a regular expression instead of providing the full name of the `databaseName` and `tableName`.
+{{< /note >}}
+
+### Examples
+
+- The primary key of the tables `customer` and `employee` is `ID`.
+
+  To establish custom message keys based on `FirstName` and `LastName` for the tables `customer` and `employee`, add the following block to the `config.yaml` file:
+
+  ```yaml
+  tables:
+    # Sync a specific table with all its columns:
+    chinook.customer:
+      columns:
+        - ID
+        - FirstName
+        - LastName
+        - Company
+        - Address
+        - Email
+      keys:
+        - FirstName
+        - LastName
+    chinook.employee:
+      columns:
+        - ID
+        - FirstName
+        - LastName
+        - ReportsTo
+        - Address
+        - City
+        - State
+      keys:
+        - FirstName
+        - LastName
+  ```
+
+## processors: RDI processors {#processors}
+
+### Properties
 
 |Name|Type|Description|Required|
-|----|----|-----------|--------|
-|**on\_failed\_retry\_interval**
(Interval \(in seconds\) on which to perform retry on failure)|`integer`, `string`|Default: `5`
Pattern: ``^\${.*}$``
Minimum: `1`
|| -|**read\_batch\_size**
(The batch size for reading data from source database)|`integer`, `string`|Default: `2000`
Pattern: ``^\${.*}$``
Minimum: `1`
|| -|**debezium\_lob\_encoded\_placeholder**
(Enable Debezium LOB placeholders)|`string`|Default: `"X19kZWJleml1bV91bmF2YWlsYWJsZV92YWx1ZQ=="`
|| +|--|--|--|--| +|**on_failed_retry_interval**
(Interval \(in seconds\) on which to perform retry on failure)|`integer`, `string`|Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**read_batch_size**
(The batch size for reading data from source database)|`integer`, `string`|Default: `2000`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**debezium_lob_encoded_placeholder**
(Enable Debezium LOB placeholders)|`string`|Default: `"X19kZWJleml1bV91bmF2YWlsYWJsZV92YWx1ZQ=="`
|| |**dedup**
(Enable deduplication mechanism)|`boolean`|Default: `false`
|| -|**dedup\_max\_size**
(Max size of the deduplication set)|`integer`|Default: `1024`
Minimum: `1`
|| -|**dedup\_strategy**
(Deduplication strategy: reject \- reject messages\(dlq\), ignore \- ignore messages)|`string`|(DEPRECATED)
Property 'dedup_strategy' is now deprecated. The only supported strategy is 'ignore'. Please remove from the configuration.
Default: `"ignore"`
Enum: `"reject"`, `"ignore"`
|| -|**duration**
(Time \(in ms\) after which data will be read from stream even if read\_batch\_size was not reached)|`integer`, `string`|Default: `100`
Pattern: ``^\${.*}$``
Minimum: `1`
|| -|**write\_batch\_size**
(The batch size for writing data to target Redis database\. Should be less or equal to the read\_batch\_size)|`integer`, `string`|Default: `200`
Pattern: ``^\${.*}$``
Minimum: `1`
|| -|**error\_handling**
(Error handling strategy: ignore \- skip, dlq \- store rejected messages in a dead letter queue)|`string`|Default: `"dlq"`
Pattern: ``^\${.*}$|ignore|dlq``
|| -|**dlq\_max\_messages**
(Dead letter queue max messages per stream)|`integer`, `string`|Default: `1000`
Pattern: ``^\${.*}$``
Minimum: `1`
|| -|**target\_data\_type**
(Target data type: hash/json \- RedisJSON module must be in use in the target DB)|`string`|Default: `"hash"`
Pattern: ``^\${.*}$|hash|json``
|| -|**json\_update\_strategy**
(Target update strategy: replace/merge \- RedisJSON module must be in use in the target DB)|`string`|(DEPRECATED)
Property 'json_update_strategy' will be deprecated in future releases. Use 'on_update' job-level property to define the json update strategy.
Default: `"replace"`
Pattern: ``^\${.*}$|replace|merge``
|| -|**initial\_sync\_processes**
(Number of processes RDI Engine creates to process the initial sync with the source)|`integer`, `string`|Default: `4`
Pattern: ``^\${.*}$``
Minimum: `1`
Maximum: `32`
|| -|**idle\_sleep\_time\_ms**
(Idle sleep time \(in milliseconds\) between batches)|`integer`, `string`|Default: `200`
Pattern: ``^\${.*}$``
Minimum: `1`
Maximum: `999999`
|| -|**idle\_streams\_check\_interval\_ms**
(Interval \(in milliseconds\) for checking new streams when the stream processor is idling)|`integer`, `string`|Default: `1000`
Pattern: ``^\${.*}$``
Minimum: `1`
Maximum: `999999`
|| -|**busy\_streams\_check\_interval\_ms**
(Interval \(in milliseconds\) for checking new streams when the stream processor is busy)|`integer`, `string`|Default: `5000`
Pattern: ``^\${.*}$``
Minimum: `1`
Maximum: `999999`
|| -|**wait\_enabled**
(Checks if the data has been written to the replica shard)|`boolean`|Default: `false`
|| -|**wait\_timeout**
(Timeout in milliseconds when checking write to the replica shard)|`integer`, `string`|Default: `1000`
Pattern: ``^\${.*}$``
Minimum: `1`
|| -|**retry\_on\_replica\_failure**
(Ensures that the data has been written to the replica shard and keeps retrying if not)|`boolean`|Default: `true`
|| - -**Additional Properties:** not allowed - -## targets: Target connections +|**dedup_max_size**
(Max size of the deduplication set)|`integer`|Default: `1024`
Minimum: `1`
|| +|**dedup_strategy**
(Deduplication strategy: reject \- reject messages\(dlq\), ignore \- ignore messages)|`string`|(DEPRECATED)
Property 'dedup_strategy' is now deprecated. The only supported strategy is 'ignore'. Please remove from the configuration.
Default: `"ignore"`
Enum: `"reject"`, `"ignore"`
|| +|**duration**
(Time \(in ms\) after which data will be read from stream even if read_batch_size was not reached)|`integer`, `string`|Default: `100`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**write_batch_size**
(The batch size for writing data to target Redis database\. Should be less or equal to the read_batch_size)|`integer`, `string`|Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**error_handling**
(Error handling strategy: ignore \- skip, dlq \- store rejected messages in a dead letter queue)|`string`|Default: `"dlq"`
Pattern: `^\${.*}$|ignore|dlq`
|| +|**dlq_max_messages**
(Dead letter queue max messages per stream)|`integer`, `string`|Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**target_data_type**
(Target data type: hash/json \- RedisJSON module must be in use in the target DB)|`string`|Default: `"hash"`
Pattern: `^\${.*}$|hash|json`
|| +|**json_update_strategy**
(Target update strategy: replace/merge \- RedisJSON module must be in use in the target DB)|`string`|(DEPRECATED)
Property 'json_update_strategy' will be deprecated in future releases. Use 'on_update' job-level property to define the json update strategy.
Default: `"replace"`
Pattern: `^\${.*}$|replace|merge`
|| +|**initial_sync_processes**
(Number of processes RDI Engine creates to process the initial sync with the source)|`integer`, `string`|Default: `4`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `32`
|| +|**idle_sleep_time_ms**
(Idle sleep time \(in milliseconds\) between batches)|`integer`, `string`|Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +|**idle_streams_check_interval_ms**
(Interval \(in milliseconds\) for checking new streams when the stream processor is idling)|`integer`, `string`|Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +|**busy_streams_check_interval_ms**
(Interval \(in milliseconds\) for checking new streams when the stream processor is busy)|`integer`, `string`|Default: `5000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +|**wait_enabled**
(Checks if the data has been written to the replica shard)|`boolean`|Default: `false`
|| +|**wait_timeout**
(Timeout in milliseconds when checking write to the replica shard)|`integer`, `string`|Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| +|**retry_on_replica_failure**
(Ensures that the data has been written to the replica shard and keeps retrying if not)|`boolean`|Default: `true`
|| +| | + +### Additional properties + +Not allowed + +## targets: Target connections {#targets} **Properties** -|Name|Type|Description|Required| -|----|----|-----------|--------| -|[**connection**](#targetsconnection)
(Connection details)|`object`|||
+| [**connection**](#targetsconnection) | `object` | Connection details | |
 
-### targets\.connection: Connection details
+### targets\.connection: Connection details {#targetsconnection}
 
 **Properties (Pattern)**
 
-|Name|Type|Description|Required|
-|----|----|-----------|--------|
-|**\.\***||||
-|**additionalProperties**||||
+| Name | Type | Description | |
+| -- | -- | -- | -- |
+| host | string | Host of the Redis database to which Redis Data Integration will write the processed data. | |
+| port | int | Port of the Redis database to which Redis Data Integration will write the processed data. | |
+| user | string | User of the Redis database to which Redis Data Integration will write the processed data. Omit this field to use the default user. | |
+| password | string | Password for the Redis target database. | |
+| key | string | Path to the private key file, if you are using SSL/TLS. | |
+| key_password | string | Password for the private key file, if you are using SSL/TLS. | |
+| cert | string | Path to the client certificate file, if you are using SSL/TLS. | |
+| cacert | string | Path to the CA certificate file, if you are using SSL/TLS.
| From 21de3eeab96b1982007dde7509241fa93ed8fd49 Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Tue, 21 Jan 2025 09:24:34 +0000 Subject: [PATCH 3/7] DOC-4744 more reformatting --- .../redis-data-integration/reference/config-yaml-reference.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index 5ecf42ede..c5826fa4f 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -231,7 +231,7 @@ Not allowed ## targets: Target connections {#targets} -**Properties** +## Properties | Name | Type | Description | Required | | -- | -- | -- | -- | @@ -239,7 +239,7 @@ Not allowed ### targets\.connection: Connection details {#targetsconnection} -**Properties (Pattern)** +### Properties (Pattern)ß | Name | Type | Description | | | -- | -- | -- | -- | From 0fb2a67672ae8d80c795bf010b300dea3d6d56bf Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Tue, 21 Jan 2025 11:12:36 +0000 Subject: [PATCH 4/7] DOC-4744 made property name formatting consistent --- .../reference/config-yaml-reference.md | 132 +++++++++--------- 1 file changed, 67 insertions(+), 65 deletions(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index c5826fa4f..a79e73414 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -16,9 +16,9 @@ These objects define the sections at the root level of `config.yaml`. 
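Assembled into YAML, the target connection properties in the table above might look like the following sketch. This is only an illustration: the target name, host, port, credentials, and file paths are placeholders, not values from this reference, and the TLS entries are needed only for SSL/TLS connections.

```yaml
targets:
  target:                       # target name is a placeholder
    connection:
      host: redis.example.com   # placeholder host of the target Redis database
      port: 12000               # placeholder port of the target Redis database
      user: rdi-user            # omit when using the default user
      password: rdi-password    # placeholder password
      # Uncomment the following lines if you are using SSL/TLS:
      # key: /path/to/client.key
      # key_password: key-passphrase
      # cert: /path/to/client.crt
      # cacert: /path/to/ca.crt
```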
| Name | Type | Description | Required | | -- | -- | -- | -- | -| [**sources**](#sources) | `object` | Source collectors || -| [**processors**](#processors)| `object`, `null` | RDI Processors || -| [**targets**](#targets) | `object` | Target connections || +| [`sources`](#sources) | `object` | Source collectors || +| [`processors`](#processors)| `object`, `null` | RDI Processors || +| [`targets`](#targets) | `object` | Target connections || ## sources: Source collectors {#sources} @@ -34,46 +34,49 @@ See the Debezium documentation for more information about the specific connector ### Essential properties -[**connection:**](#connection)
+#### `connection` + |Name|Type|Default|Source Databases|Description| |--|--|--|--|--| -|host|string| |MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The address of the database instance.| -|port| int||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | The port of the database instance.| -|database|string||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The name of the database from which to stream the changes. For `SQL Server` you can define the database as comma-separated list of the SQL Server database names from which to stream the changes.| -|database.pdb.name|string|ORCLPDB1|Oracle|The name of the [Oracle Pluggable Database](https://docs.oracle.com/en/database/oracle/oracle-database/19/riwin/about-pluggable-databases-in-oracle-rac.html) that the connector captures changes from. For non-CDB installation, do not specify this property.| -|database.encrypt|boolean|false|MySQL|If SSL is enabled for a SQL Server database, enable SSL by setting the value of this property to true. | -|database.server.id|int|1|MySQL|A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster.| -|database.url|string||Oracle|Specifies the raw database JDBC URL. Use this property to provide flexibility in defining that database connection. Valid values include raw TNS names and RAC connection strings.| -|topic.prefix|string|rdi|MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | A prefix for all topic names that receive events emitted by this connector.| +| `host` |string| |MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The address of the database instance.| +| `port` | int||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | The port of the database instance.| +| `database` |string||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The name of the database from which to stream the changes. 
For `SQL Server`, you can define the database as a comma-separated list of the SQL Server database names from which to stream the changes.|
+#### `sink` + |Name|Type|Default|Description| |--|--|--|--| -| redis.null.key | string | default | Redis does not support the notion of data without key, so this string will be used as key for records without primary key. | -| redis.null.value | string | default | Redis does not support the notion of null payloads, as is the case with tombstone events. This string will be used as value for records without a payload. | -| redis.batch.size | int | 500 | Number of change records to insert in a single batch write (Pipelined transaction).| -| redis.memory.limit.mb | int | 300 | The connector stops sending events when Redis size exceeds this threshold.| -| redis.wait.enabled | boolean | false | In case Redis is configured with a replica shard, this allows to verify that the data has been written to the replica. | -| redis.wait.timeout.ms | int | 1000 | Defines the timeout in milliseconds when waiting for replica. | -| redis.wait.retry.enabled | boolean | false | Enables retry on wait for replica failure.| -| redis.wait.retry.delay.ms | int | 1000 | Defines the delay of retry on wait for replica failure. | -| redis.retry.initial.delay.ms | int | 300 | Initial retry delay when encountering Redis connection or OOM issues. This value will be doubled upon every retry but won’t exceed `redis.retry.max.delay.ms`. | -| redis.retry.max.delay.ms | int | 10000 | Max delay when encountering Redis connection or OOM issues. | - -[**Source:**](#source)
+| `redis.null.key` | string | default | Redis does not support the notion of data without key, so this string will be used as key for records without primary key. | +| `redis.null.value` | string | default | Redis does not support the notion of null payloads, as is the case with tombstone events. This string will be used as value for records without a payload. | +| `redis.batch.size` | int | 500 | Number of change records to insert in a single batch write (Pipelined transaction).| +| `redis.memory.limit.mb` | int | 300 | The connector stops sending events when Redis size exceeds this threshold.| +| `redis.wait.enabled` | boolean | false | In case Redis is configured with a replica shard, this allows to verify that the data has been written to the replica. | +| `redis.wait.timeout.ms` | int | 1000 | Defines the timeout in milliseconds when waiting for replica. | +| `redis.wait.retry.enabled` | boolean | false | Enables retry on wait for replica failure.| +| `redis.wait.retry.delay.ms` | int | 1000 | Defines the delay of retry on wait for replica failure. | +| `redis.retry.initial.delay.ms` | int | 300 | Initial retry delay when encountering Redis connection or OOM issues. This value will be doubled upon every retry but won’t exceed `redis.retry.max.delay.ms`. | +| `redis.retry.max.delay.ms` | int | 10000 | Max delay when encountering Redis connection or OOM issues. | + +## `source` + |Name|Type|Default|Source Databases|Description| |--|--|--|--|--| -|snapshot.mode|string|initial|MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the mode that the connector uses to take snapshots of a captured table.| -|topic.prefix|string|rdi|MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.| -|database.exclude.list|string||MariaDB, MySQL|An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. 
The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases.| -|schema.exclude.list|string||Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do no specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. | -|table.exclude.list|string||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured; The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. | -| column.exclude.list | string| | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. | -|snapshot.select.statement.overrides|String||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the table rows to include in a snapshot. Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. 
It does not apply to events that the connector reads from the log.| -|log.enabled|boolean|false|Oracle|Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.| -|unavailable.value.placeholder|\_\_debezium_unavailable_value |Oracle|Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database.| +| `snapshot.mode` |string|initial|MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the mode that the connector uses to take snapshots of a captured table.| +| `topic.prefix` |string|rdi|MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.| +| `database.exclude.list` |string||MariaDB, MySQL|An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases.| +| `schema.exclude.list` |string||Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do no specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. | +| `table.exclude.list` |string||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured; The connector captures all tables that are not included in `table.exclude.list`. 
Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. | +| `column.exclude.list` | string| | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. | +| `snapshot.select.statement.overrides` |String||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the table rows to include in a snapshot. Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.| +| `log.enabled` |boolean|false|Oracle|Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.| +| `unavailable.value.placeholder` |\_\_debezium_unavailable_value |Oracle|Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database.| ### Using queries in the initial snapshot (relevant for MySQL, Oracle, PostgreSQL and SQLServer) @@ -204,26 +207,25 @@ a regular expression instead of providing the full name of the `databaseName` an |Name|Type|Description|Required| |--|--|--|--| -|**on_failed_retry_interval**
(Interval \(in seconds\) on which to perform retry on failure)|`integer`, `string`|Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`
|| -|**read_batch_size**
(The batch size for reading data from source database)|`integer`, `string`|Default: `2000`
Pattern: `^\${.*}$`
Minimum: `1`
|| -|**debezium_lob_encoded_placeholder**
(Enable Debezium LOB placeholders)|`string`|Default: `"X19kZWJleml1bV91bmF2YWlsYWJsZV92YWx1ZQ=="`
|| -|**dedup**
(Enable deduplication mechanism)|`boolean`|Default: `false`
|| -|**dedup_max_size**
(Max size of the deduplication set)|`integer`|Default: `1024`
Minimum: `1`
|| -|**dedup_strategy**
(Deduplication strategy: reject \- reject messages\(dlq\), ignore \- ignore messages)|`string`|(DEPRECATED)
Property 'dedup_strategy' is now deprecated. The only supported strategy is 'ignore'. Please remove from the configuration.
Default: `"ignore"`
Enum: `"reject"`, `"ignore"`
|| -|**duration**
(Time \(in ms\) after which data will be read from stream even if read_batch_size was not reached)|`integer`, `string`|Default: `100`
Pattern: `^\${.*}$`
Minimum: `1`
|| -|**write_batch_size**
(The batch size for writing data to target Redis database\. Should be less or equal to the read_batch_size)|`integer`, `string`|Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
|| -|**error_handling**
(Error handling strategy: ignore \- skip, dlq \- store rejected messages in a dead letter queue)|`string`|Default: `"dlq"`
Pattern: `^\${.*}$|ignore|dlq`
|| -|**dlq_max_messages**
(Dead letter queue max messages per stream)|`integer`, `string`|Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| -|**target_data_type**
(Target data type: hash/json \- RedisJSON module must be in use in the target DB)|`string`|Default: `"hash"`
Pattern: `^\${.*}$|hash|json`
|| -|**json_update_strategy**
(Target update strategy: replace/merge \- RedisJSON module must be in use in the target DB)|`string`|(DEPRECATED)
Property 'json_update_strategy' will be deprecated in future releases. Use 'on_update' job-level property to define the json update strategy.
Default: `"replace"`
Pattern: `^\${.*}$|replace|merge`
|| -|**initial_sync_processes**
(Number of processes RDI Engine creates to process the initial sync with the source)|`integer`, `string`|Default: `4`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `32`
|| -|**idle_sleep_time_ms**
(Idle sleep time \(in milliseconds\) between batches)|`integer`, `string`|Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| -|**idle_streams_check_interval_ms**
(Interval \(in milliseconds\) for checking new streams when the stream processor is idling)|`integer`, `string`|Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| -|**busy_streams_check_interval_ms**
(Interval \(in milliseconds\) for checking new streams when the stream processor is busy)|`integer`, `string`|Default: `5000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| -|**wait_enabled**
(Checks if the data has been written to the replica shard)|`boolean`|Default: `false`
|| -|**wait_timeout**
(Timeout in milliseconds when checking write to the replica shard)|`integer`, `string`|Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| -|**retry_on_replica_failure**
(Ensures that the data has been written to the replica shard and keeps retrying if not)|`boolean`|Default: `true`
|| -| | +| `on_failed_retry_interval` |`integer`, `string`| Interval \(in seconds\) on which to perform retry on failure.
Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`|| +| `read_batch_size` |`integer`, `string`| Batch size for reading data from the source database.
Default: `2000`
Pattern: `^\${.*}$`
Minimum: `1`|| +| `debezium_lob_encoded_placeholder` |`string`| Enable Debezium LOB placeholders.
Default: `"X19kZWJleml1bV91bmF2YWlsYWJsZV92YWx1ZQ=="`|| +| `dedup` |`boolean`| Enable deduplication mechanism.
Default: `false`
|| +| `dedup_max_size` |`integer`| Max size of the deduplication set.
Default: `1024`
Minimum: `1`
|| +| `dedup_strategy` |`string`| Deduplication strategy: reject \- reject messages\(dlq\), ignore \- ignore messages.
(DEPRECATED)
Property 'dedup_strategy' is deprecated. The only supported strategy is 'ignore'; remove this property from the configuration.<br/>
Default: `"ignore"`
Enum: `"reject"`, `"ignore"`
|| +| `duration` |`integer`, `string`| Time (in ms) after which data will be read from stream even if read_batch_size was not reached.
Default: `100`
Pattern: `^\${.*}$`
Minimum: `1`
|| +| `write_batch_size` |`integer`, `string`| The batch size for writing data to target Redis database\. Should be less or equal to the read_batch_size.
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
|| +| `error_handling` |`string`| Error handling strategy: ignore \- skip, dlq \- store rejected messages in a dead letter queue.
Default: `"dlq"`
Pattern: `^\${.*}$|ignore|dlq`
|| +| `dlq_max_messages` |`integer`, `string`| Dead letter queue max messages per stream.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| +| `target_data_type` |`string`| Target data type: hash/json \- RedisJSON module must be in use in the target DB.
Default: `"hash"`
Pattern: `^\${.*}$|hash|json`
|| +| `json_update_strategy` |`string`| Target update strategy: replace/merge \- RedisJSON module must be in use in the target DB.
(DEPRECATED)
Property 'json_update_strategy' will be deprecated in future releases. Use the 'on_update' job-level property to define the JSON update strategy.<br/>
Default: `"replace"`
Pattern: `^\${.*}$|replace|merge`
|| +| `initial_sync_processes` |`integer`, `string`| Number of processes RDI Engine creates to process the initial sync with the source.
Default: `4`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `32`
|| +| `idle_sleep_time_ms` |`integer`, `string`| Idle sleep time \(in milliseconds\) between batches.
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +| `idle_streams_check_interval_ms` |`integer`, `string`| Interval \(in milliseconds\) for checking new streams when the stream processor is idling.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +| `busy_streams_check_interval_ms` |`integer`, `string`| Interval \(in milliseconds\) for checking new streams when the stream processor is busy.
Default: `5000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| +| `wait_enabled` |`boolean`| Checks if the data has been written to the replica shard.
Default: `false`
|| +| `wait_timeout` |`integer`, `string`| Timeout in milliseconds when checking write to the replica shard.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| +| `retry_on_replica_failure` |`boolean`| Ensures that the data has been written to the replica shard and keeps retrying if not.
Default: `true`
|| ### Additional properties @@ -235,19 +237,19 @@ Not allowed | Name | Type | Description | Required | | -- | -- | -- | -- | -| [**connection**](#targetsconnection) | `object` | Connection details | | +| [`connection`](#targetsconnection) | `object` | Connection details | | -### targets\.connection: Connection details {#targetsconnection} +### `targets.connection`: Connection details {#targetsconnection} -### Properties (Pattern)ß +### Properties (Pattern) | Name | Type | Description | | | -- | -- | -- | -- | -| host | string | Host of the Redis database to which Redis Data Integration will write the processed data. | -| port | int | Port for the Redis database to which Redis Data Integration will write the processed data. | | -| user | string | User of the Redis database to which Redis Data Integration will write the processed data. Uncomment if not using default user. | -| password | string | Password for Redis target database. | -| key | string | uncomment the following lines if you are using SSL/TLS. | -| key_password | string | uncomment the following lines if you are using SSL/TLS. | -| cert | string | uncomment the following lines if you are using SSL/TLS. | -| cacert | string | uncomment the following lines if you are using SSL/TLS. | +| `host` | string | Host of the Redis database to which Redis Data Integration will write the processed data. | +| `port` | int | Port for the Redis database to which Redis Data Integration will write the processed data. | | +| `user` | string | User of the Redis database to which Redis Data Integration will write the processed data. Uncomment if not using default user. | +| `password` | string | Password for Redis target database. | +| `key` | string | uncomment the following lines if you are using SSL/TLS. | +| `key_password` | string | uncomment the following lines if you are using SSL/TLS. | +| `cert` | string | uncomment the following lines if you are using SSL/TLS. 
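The processor settings above can be collected into the `processors` section of `config.yaml`. The following sketch simply restates a few of the documented defaults for illustration; every value shown is the default, so in practice you would include only the properties you want to change.

```yaml
processors:
  on_failed_retry_interval: 5   # seconds between retries on failure (default)
  read_batch_size: 2000         # records read from the source per batch (default)
  duration: 100                 # ms before a partial batch is read anyway (default)
  write_batch_size: 200         # must be <= read_batch_size (default)
  error_handling: dlq           # "ignore" to skip, "dlq" to keep rejected messages (default)
  target_data_type: hash        # "json" requires the RedisJSON module in the target DB (default)
```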
| +| `cacert` | string | uncomment the following lines if you are using SSL/TLS. | From 95faf738a9ce32bfaf3ddf41a41e737cc51a5e06 Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Tue, 21 Jan 2025 11:47:48 +0000 Subject: [PATCH 5/7] DOC-4744 removed default table columns for better use of space --- .../reference/config-yaml-reference.md | 76 +++++++++---------- 1 file changed, 38 insertions(+), 38 deletions(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index a79e73414..369a0a7fe 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -14,11 +14,11 @@ These objects define the sections at the root level of `config.yaml`. ### Properties -| Name | Type | Description | Required | -| -- | -- | -- | -- | -| [`sources`](#sources) | `object` | Source collectors || -| [`processors`](#processors)| `object`, `null` | RDI Processors || -| [`targets`](#targets) | `object` | Target connections || +| Name | Type | Description | +| -- | -- | -- | +| [`sources`](#sources) | `object` | Source collectors | +| [`processors`](#processors)| `object`, `null` | RDI Processors | +| [`targets`](#targets) | `object` | Target connections | ## sources: Source collectors {#sources} @@ -36,47 +36,47 @@ See the Debezium documentation for more information about the specific connector #### `connection` -|Name|Type|Default|Source Databases|Description| -|--|--|--|--|--| -| `host` |string| |MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The address of the database instance.| -| `port` | int||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | The port of the database instance.| -| `database` |string||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The name of the database from which to stream the changes. 
For `SQL Server`, you can define the database as a comma-separated list of the SQL Server database names from which to stream the changes.|
Default: "ORCLPDB1"| +| `database.encrypt` |boolean| MySQL|If SSL is enabled for a SQL Server database, enable SSL by setting the value of this property to true.
Default: false | +| `database.server.id` |int| MySQL|A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster.
Default: 1| +| `database.url` |string| Oracle|Specifies the raw database JDBC URL. Use this property to provide flexibility in defining that database connection. Valid values include raw TNS names and RAC connection strings.| +| `topic.prefix` |string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | A prefix for all topic names that receive events emitted by this connector.
Default: "rdi" | ### Advanced properties #### `sink` -|Name|Type|Default|Description| -|--|--|--|--| -| `redis.null.key` | string | default | Redis does not support the notion of data without key, so this string will be used as key for records without primary key. | -| `redis.null.value` | string | default | Redis does not support the notion of null payloads, as is the case with tombstone events. This string will be used as value for records without a payload. | -| `redis.batch.size` | int | 500 | Number of change records to insert in a single batch write (Pipelined transaction).| -| `redis.memory.limit.mb` | int | 300 | The connector stops sending events when Redis size exceeds this threshold.| -| `redis.wait.enabled` | boolean | false | In case Redis is configured with a replica shard, this allows to verify that the data has been written to the replica. | -| `redis.wait.timeout.ms` | int | 1000 | Defines the timeout in milliseconds when waiting for replica. | -| `redis.wait.retry.enabled` | boolean | false | Enables retry on wait for replica failure.| -| `redis.wait.retry.delay.ms` | int | 1000 | Defines the delay of retry on wait for replica failure. | -| `redis.retry.initial.delay.ms` | int | 300 | Initial retry delay when encountering Redis connection or OOM issues. This value will be doubled upon every retry but won’t exceed `redis.retry.max.delay.ms`. | -| `redis.retry.max.delay.ms` | int | 10000 | Max delay when encountering Redis connection or OOM issues. | +|Name|Type|Description| +|--|--|--| +| `redis.null.key` | string | Redis does not support the notion of data without key, so this string will be used as key for records without primary key.
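To show how the essential connection properties fit together, here is a hedged sketch of a `sources` entry in `config.yaml`. The source name, host, and database name are placeholders, not values from this reference; only the properties documented in the table above are used.

```yaml
sources:
  mysql-source:                 # source name is a placeholder
    connection:
      host: mysql.example.com   # placeholder address of the database instance
      port: 3306                # port of the database instance
      database: chinook         # placeholder database to stream changes from
```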
Default: "default" | +| `redis.null.value` | string | Redis does not support the notion of null payloads, as is the case with tombstone events. This string will be used as value for records without a payload.
Default: "default" | +| `redis.batch.size` | int | Number of change records to insert in a single batch write (Pipelined transaction).
Default: 500 | +| `redis.memory.limit.mb` | int | The connector stops sending events when Redis size exceeds this threshold.
Default: 300 | +| `redis.wait.enabled` | boolean | In case Redis is configured with a replica shard, this allows to verify that the data has been written to the replica.
Default: false | +| `redis.wait.timeout.ms` | int | Defines the timeout in milliseconds when waiting for replica.
Default: 1000 | +| `redis.wait.retry.enabled` | boolean | Enables retry on wait for replica failure.
Default: false | +| `redis.wait.retry.delay.ms` | int | Defines the delay of retry on wait for replica failure.
Default: 1000 | +| `redis.retry.initial.delay.ms` | int | Initial retry delay when encountering Redis connection or OOM issues. This value will be doubled upon every retry but won’t exceed `redis.retry.max.delay.ms`.
Default: 300 | +| `redis.retry.max.delay.ms` | int | Max delay when encountering Redis connection or OOM issues.
Default: 10000 | ## `source` -|Name|Type|Default|Source Databases|Description| -|--|--|--|--|--| -| `snapshot.mode` |string|initial|MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the mode that the connector uses to take snapshots of a captured table.| -| `topic.prefix` |string|rdi|MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.| -| `database.exclude.list` |string||MariaDB, MySQL|An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases.| -| `schema.exclude.list` |string||Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do no specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. | -| `table.exclude.list` |string||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured; The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. | -| `column.exclude.list` | string| | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. 
Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. | -| `snapshot.select.statement.overrides` |String||MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the table rows to include in a snapshot. Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.| -| `log.enabled` |boolean|false|Oracle|Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.| -| `unavailable.value.placeholder` |\_\_debezium_unavailable_value |Oracle|Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database.| +|Name|Type|Source Databases|Description| +|--|--|--|--| +| `snapshot.mode` |string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the mode that the connector uses to take snapshots of a captured table.
Default: "initial" | +| `topic.prefix` |string| MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.
Default: "rdi" | +| `database.exclude.list` |string| MariaDB, MySQL|An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases.| +| `schema.exclude.list` |string| Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do no specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. | +| `table.exclude.list` |string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured; The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. | +| `column.exclude.list` | string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. | +| `snapshot.select.statement.overrides` |String| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the table rows to include in a snapshot. 
Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.| +| `log.enabled` |boolean| Oracle|Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.
Default: false | +| `unavailable.value.placeholder` | Special | Oracle|Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database (this has the type `__debezium_unavailable_value`).| ### Using queries in the initial snapshot (relevant for MySQL, Oracle, PostgreSQL and SQLServer) From e35a43fc75c4a1cb806116aa370f61dd0a8ff8ec Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Tue, 21 Jan 2025 13:03:20 +0000 Subject: [PATCH 6/7] DOC-4744 formatted type names consistently --- .../reference/config-yaml-reference.md | 80 +++++++++---------- 1 file changed, 40 insertions(+), 40 deletions(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index 369a0a7fe..28730a02e 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -8,7 +8,7 @@ categories: ["redis-di"] aliases: /integrate/redis-data-integration/ingest/reference/config-yaml-reference/ --- -## Top level objects +## Top level objects These objects define the sections at the root level of `config.yaml`. @@ -38,14 +38,14 @@ See the Debezium documentation for more information about the specific connector |Name|Type|Source Databases|Description| |--|--|--|--| -| `host` |string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The address of the database instance.| -| `port` | int| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | The port of the database instance.| -| `database` |string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The name of the database from which to stream the changes. 
For `SQL Server` you can define the database as comma-separated list of the SQL Server database names from which to stream the changes.| -| `database.pdb.name` |string| Oracle|The name of the [Oracle Pluggable Database](https://docs.oracle.com/en/database/oracle/oracle-database/19/riwin/about-pluggable-databases-in-oracle-rac.html) that the connector captures changes from. For non-CDB installation, do not specify this property.
Default: "ORCLPDB1"| -| `database.encrypt` |boolean| MySQL|If SSL is enabled for a SQL Server database, enable SSL by setting the value of this property to true.
Default: false | -| `database.server.id` |int| MySQL|A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster.
Default: 1| -| `database.url` |string| Oracle|Specifies the raw database JDBC URL. Use this property to provide flexibility in defining that database connection. Valid values include raw TNS names and RAC connection strings.| -| `topic.prefix` |string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | A prefix for all topic names that receive events emitted by this connector.
Default: "rdi" | +| `host` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The address of the database instance.| +| `port` | `integer` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | The port of the database instance.| +| `database` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The name of the database from which to stream the changes. For `SQL Server` you can define the database as comma-separated list of the SQL Server database names from which to stream the changes.| +| `database.pdb.name` | `string` | Oracle|The name of the [Oracle Pluggable Database](https://docs.oracle.com/en/database/oracle/oracle-database/19/riwin/about-pluggable-databases-in-oracle-rac.html) that the connector captures changes from. For non-CDB installation, do not specify this property.
Default: "ORCLPDB1"| +| `database.encrypt` | `string` | MySQL|If SSL is enabled for a SQL Server database, enable SSL by setting the value of this property to true.
Default: false | +| `database.server.id` | `integer` | MySQL|A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster.
Default: 1| +| `database.url` | `string` | Oracle|Specifies the raw database JDBC URL. Use this property to provide flexibility in defining that database connection. Valid values include raw TNS names and RAC connection strings.| +| `topic.prefix` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | A prefix for all topic names that receive events emitted by this connector.
Default: "rdi" | ### Advanced properties @@ -53,29 +53,29 @@ See the Debezium documentation for more information about the specific connector |Name|Type|Description| |--|--|--| -| `redis.null.key` | string | Redis does not support the notion of data without key, so this string will be used as key for records without primary key.
Default: "default" | -| `redis.null.value` | string | Redis does not support the notion of null payloads, as is the case with tombstone events. This string will be used as value for records without a payload.
Default: "default" | -| `redis.batch.size` | int | Number of change records to insert in a single batch write (Pipelined transaction).
Default: 500 | -| `redis.memory.limit.mb` | int | The connector stops sending events when Redis size exceeds this threshold.
Default: 300 | -| `redis.wait.enabled` | boolean | In case Redis is configured with a replica shard, this allows to verify that the data has been written to the replica.
Default: false | -| `redis.wait.timeout.ms` | int | Defines the timeout in milliseconds when waiting for replica.
Default: 1000 | -| `redis.wait.retry.enabled` | boolean | Enables retry on wait for replica failure.
Default: false | -| `redis.wait.retry.delay.ms` | int | Defines the delay of retry on wait for replica failure.
Default: 1000 | -| `redis.retry.initial.delay.ms` | int | Initial retry delay when encountering Redis connection or OOM issues. This value will be doubled upon every retry but won’t exceed `redis.retry.max.delay.ms`.
Default: 300 | -| `redis.retry.max.delay.ms` | int | Max delay when encountering Redis connection or OOM issues.
Default: 10000 | +| `redis.null.key` | `string` | Redis does not support the notion of data without key, so this string will be used as key for records without primary key.
Default: "default" | +| `redis.null.value` | `string` | Redis does not support the notion of null payloads, as is the case with tombstone events. This string will be used as value for records without a payload.
Default: "default" | +| `redis.batch.size` | `integer` | Number of change records to insert in a single batch write (Pipelined transaction).
Default: 500 | +| `redis.memory.limit.mb` | `integer` | The connector stops sending events when Redis size exceeds this threshold.
Default: 300 | +| `redis.wait.enabled` | `string` | In case Redis is configured with a replica shard, this allows to verify that the data has been written to the replica.
Default: false | +| `redis.wait.timeout.ms` | `integer` | Defines the timeout in milliseconds when waiting for replica.
Default: 1000 | +| `redis.wait.retry.enabled` | `string` | Enables retry on wait for replica failure.
Default: false | +| `redis.wait.retry.delay.ms` | `integer` | Defines the delay of retry on wait for replica failure.
Default: 1000 | +| `redis.retry.initial.delay.ms` | `integer` | Initial retry delay when encountering Redis connection or OOM issues. This value will be doubled upon every retry but won’t exceed `redis.retry.max.delay.ms`.
Default: 300 | +| `redis.retry.max.delay.ms` | `integer` | Max delay when encountering Redis connection or OOM issues.
Default: 10000 | ## `source` |Name|Type|Source Databases|Description| |--|--|--|--| -| `snapshot.mode` |string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the mode that the connector uses to take snapshots of a captured table.
Default: "initial" | -| `topic.prefix` |string| MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.
Default: "rdi" | -| `database.exclude.list` |string| MariaDB, MySQL|An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases.| -| `schema.exclude.list` |string| Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do no specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. | -| `table.exclude.list` |string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured; The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. | -| `column.exclude.list` | string| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. | -| `snapshot.select.statement.overrides` |String| MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the table rows to include in a snapshot. 
Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.| -| `log.enabled` |boolean| Oracle|Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.
Default: false | +| `snapshot.mode` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the mode that the connector uses to take snapshots of a captured table.
Default: "initial" | +| `topic.prefix` | `string` | MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.
Default: "rdi" | +| `database.exclude.list` | `string` | MariaDB, MySQL|An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases.| +| `schema.exclude.list` | `string` | Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do no specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. | +| `table.exclude.list` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured; The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. | +| `column.exclude.list` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. | +| `snapshot.select.statement.overrides` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the table rows to include in a snapshot. 
Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.| +| `log.enabled` | `string` | Oracle|Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.
Default: false | | `unavailable.value.placeholder` | Special | Oracle|Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database (this has the type `__debezium_unavailable_value`).| ### Using queries in the initial snapshot (relevant for MySQL, Oracle, PostgreSQL and SQLServer) @@ -235,21 +235,21 @@ Not allowed ## Properties -| Name | Type | Description | Required | -| -- | -- | -- | -- | -| [`connection`](#targetsconnection) | `object` | Connection details | | +| Name | Type | Description | +| -- | -- | -- | +| [`connection`](#targetsconnection) | `object` | Connection details | ### `targets.connection`: Connection details {#targetsconnection} ### Properties (Pattern) -| Name | Type | Description | | -| -- | -- | -- | -- | -| `host` | string | Host of the Redis database to which Redis Data Integration will write the processed data. | -| `port` | int | Port for the Redis database to which Redis Data Integration will write the processed data. | | -| `user` | string | User of the Redis database to which Redis Data Integration will write the processed data. Uncomment if not using default user. | -| `password` | string | Password for Redis target database. | -| `key` | string | uncomment the following lines if you are using SSL/TLS. | -| `key_password` | string | uncomment the following lines if you are using SSL/TLS. | -| `cert` | string | uncomment the following lines if you are using SSL/TLS. | -| `cacert` | string | uncomment the following lines if you are using SSL/TLS. | +| Name | Type | Description | +| -- | -- | -- | +| `host` | `string` | Host of the Redis database to which Redis Data Integration will write the processed data. | +| `port` | `integer` | Port for the Redis database to which Redis Data Integration will write the processed data. | +| `user` | `string` | User of the Redis database to which Redis Data Integration will write the processed data. 
Uncomment if not using default user. | +| `password` | `string` | Password for Redis target database. | +| `key` | `string` | uncomment the following lines if you are using SSL/TLS. | +| `key_password` | `string` | uncomment the following lines if you are using SSL/TLS. | +| `cert` | `string` | uncomment the following lines if you are using SSL/TLS. | +| `cacert` | `string` | uncomment the following lines if you are using SSL/TLS. | From 57a11ecd447a3da37dcaf24200d9f94ecc99971c Mon Sep 17 00:00:00 2001 From: Andy Stark Date: Tue, 21 Jan 2025 15:08:29 +0000 Subject: [PATCH 7/7] DOC-4744 rewrite property descriptions --- .../reference/config-yaml-reference.md | 130 +++++++++--------- 1 file changed, 65 insertions(+), 65 deletions(-) diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md index 28730a02e..f09191e03 100644 --- a/content/integrate/redis-data-integration/reference/config-yaml-reference.md +++ b/content/integrate/redis-data-integration/reference/config-yaml-reference.md @@ -20,7 +20,7 @@ These objects define the sections at the root level of `config.yaml`. | [`processors`](#processors)| `object`, `null` | RDI Processors | | [`targets`](#targets) | `object` | Target connections | -## sources: Source collectors {#sources} +## `sources`: Source collectors {#sources} Each source database type has its own connector, but the basic configuration properties are the same for all databases. 
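For orientation, a minimal `sources` entry might look like the following sketch. This is an illustrative fragment only: the collector name `my-source` and all connection values are placeholder assumptions, not values taken from this reference, and property placement follows the `connection` table described in this section.

```yaml
sources:
  my-source:
    connection:
      host: my-db-host.example.com
      port: 5432
      database: mydatabase
```

The property tables in this section describe each of these fields, along with the advanced `sink` and `source` properties that tune the collector's behavior.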
@@ -36,47 +36,47 @@ See the Debezium documentation for more information about the specific connector #### `connection` -|Name|Type|Source Databases|Description| -|--|--|--|--| -| `host` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The address of the database instance.| -| `port` | `integer` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | The port of the database instance.| -| `database` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|The name of the database from which to stream the changes. For `SQL Server` you can define the database as comma-separated list of the SQL Server database names from which to stream the changes.| -| `database.pdb.name` | `string` | Oracle|The name of the [Oracle Pluggable Database](https://docs.oracle.com/en/database/oracle/oracle-database/19/riwin/about-pluggable-databases-in-oracle-rac.html) that the connector captures changes from. For non-CDB installation, do not specify this property.
Default: "ORCLPDB1"| -| `database.encrypt` | `string` | MySQL|If SSL is enabled for a SQL Server database, enable SSL by setting the value of this property to true.
Default: false | -| `database.server.id` | `integer` | MySQL|A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster.
Default: 1| -| `database.url` | `string` | Oracle|Specifies the raw database JDBC URL. Use this property to provide flexibility in defining that database connection. Valid values include raw TNS names and RAC connection strings.| -| `topic.prefix` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | A prefix for all topic names that receive events emitted by this connector.
Default: "rdi" | +| Name | Type | Source Databases | Description | +| -- | -- | -- | -- | +| `host` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer| The IP address of the database instance. | +| `port` | `integer` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | The port of the database instance. | +| `database` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer| The name of the database to capture changes from. For `SQL Server` you can define this as comma-separated list of database names. | +| `database.pdb.name` | `string` | Oracle |The name of the [Oracle Pluggable Database](https://docs.oracle.com/en/database/oracle/oracle-database/19/riwin/about-pluggable-databases-in-oracle-rac.html) that the connector captures changes from. Do not specify this property for a non-CDB installation.
Default: `"ORCLPDB1"` |
| `database.encrypt` | `boolean` | SQLServer | If SSL is enabled for your SQL Server database, you should also enable SSL in RDI by setting the value of this property to `true`.
Default: `false` | +| `database.server.id` | `integer` | MySQL | Numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster.
Default: 1| +| `database.url` | `string` | Oracle | Specifies the raw database JDBC URL. Use this property to provide flexibility in defining the database connection. Valid values include raw TNS names and RAC connection strings.| +| `topic.prefix` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | A prefix for all topic names that receive events emitted by this connector.
Default: `"rdi"` | ### Advanced properties #### `sink` -|Name|Type|Description| -|--|--|--| -| `redis.null.key` | `string` | Redis does not support the notion of data without key, so this string will be used as key for records without primary key.
Default: "default" | -| `redis.null.value` | `string` | Redis does not support the notion of null payloads, as is the case with tombstone events. This string will be used as value for records without a payload.
Default: "default" | -| `redis.batch.size` | `integer` | Number of change records to insert in a single batch write (Pipelined transaction).
Default: 500 | -| `redis.memory.limit.mb` | `integer` | The connector stops sending events when Redis size exceeds this threshold.
Default: 300 | -| `redis.wait.enabled` | `string` | In case Redis is configured with a replica shard, this allows to verify that the data has been written to the replica.
Default: false | -| `redis.wait.timeout.ms` | `integer` | Defines the timeout in milliseconds when waiting for replica.
Default: 1000 | -| `redis.wait.retry.enabled` | `string` | Enables retry on wait for replica failure.
Default: false | -| `redis.wait.retry.delay.ms` | `integer` | Defines the delay of retry on wait for replica failure.
Default: 1000 | -| `redis.retry.initial.delay.ms` | `integer` | Initial retry delay when encountering Redis connection or OOM issues. This value will be doubled upon every retry but won’t exceed `redis.retry.max.delay.ms`.
Default: 300 | -| `redis.retry.max.delay.ms` | `integer` | Max delay when encountering Redis connection or OOM issues.
Default: 10000 | - -## `source` - -|Name|Type|Source Databases|Description| +| Name | Type | Description | +| -- | -- | -- | +| `redis.null.key` | `string` | Redis does not allow data objects without keys. This string will be used as the key for records that don't have a primary key.
Default: `"default"` | +| `redis.null.value` | `string` | Redis does not allow null object values (these occur with tombstone events, for example). This string will be used as the value for records without a payload.
Default: `"default"` | +| `redis.batch.size` | `integer` | Number of change records to insert in a single batch write (pipelined transaction).
Default: `500` | +| `redis.memory.limit.mb` | `integer` | The connector stops sending events when the Redis database size exceeds this size (in MB).
Default: `300` |
| `redis.wait.enabled` | `boolean` | If Redis is configured with a replica shard, this lets you verify that the data has been written to the replica.
Default: `false` | +| `redis.wait.timeout.ms` | `integer` | Defines the timeout in milliseconds when waiting for the replica.
Default: `1000` |
| `redis.wait.retry.enabled` | `boolean` | Enables retry on wait for replica failure.
Default: `false` | +| `redis.wait.retry.delay.ms` | `integer` | Defines the delay for retry on wait for replica failure.
Default: `1000` | +| `redis.retry.initial.delay.ms` | `integer` | Initial retry delay (in milliseconds) when encountering Redis connection or OOM issues. This value will be doubled upon every retry but won’t exceed `redis.retry.max.delay.ms`.
Default: `300` | +| `redis.retry.max.delay.ms` | `integer` | Maximum delay (in milliseconds) when encountering Redis connection or OOM issues.
Default: `10000` | + +#### `source` + +| Name | Type | Source Databases | Description | |--|--|--|--| -| `snapshot.mode` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the mode that the connector uses to take snapshots of a captured table.
Default: "initial" | -| `topic.prefix` | `string` | MySQL, Oracle, PostgreSQL, SQLServer|A prefix for all topic names that receive events emitted by this connector.
Default: "rdi" | -| `database.exclude.list` | `string` | MariaDB, MySQL|An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases.| -| `schema.exclude.list` | `string` | Oracle, PostgreSQL, SQLServer|An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do no specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. | -| `table.exclude.list` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured; The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. | +| `snapshot.mode` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | Specifies the mode that the connector uses to take snapshots of a captured table.
Default: `"initial"` | +| `topic.prefix` | `string` | MySQL, Oracle, PostgreSQL, SQLServer| A prefix for all topic names that receive events emitted by this connector.
Default: `"rdi"` |
| `database.exclude.list` | `string` | MariaDB, MySQL | An optional, comma-separated list of regular expressions that match the names of databases for which you do not want to capture changes. The connector captures changes in any database whose name is not included in `database.exclude.list`. Do not specify the `database` field in the `connection` configuration if you are using the `database.exclude.list` property to filter out databases. |
| `schema.exclude.list` | `string` | Oracle, PostgreSQL, SQLServer | An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. The connector captures changes in any schema whose name is not included in `schema.exclude.list`. Do not specify the `schemas` section if you are using the `schema.exclude.list` property to filter out schemas. |
| `table.exclude.list` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional comma-separated list of regular expressions that match fully-qualified table identifiers for the tables that you want to exclude from being captured. The connector captures all tables that are not included in `table.exclude.list`. Do not specify the `tables` block in the configuration if you are using the `table.exclude.list` property to filter out tables. |
| `column.exclude.list` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not specify the `columns` block in the configuration if you are using the `column.exclude.list` property to filter out columns. |
| `snapshot.select.statement.overrides` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer|Specifies the table rows to include in a snapshot. 
Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.| -| `log.enabled` | `string` | Oracle|Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.
Default: false |
| `unavailable.value.placeholder` | Special | Oracle|Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database (this has the type `__debezium_unavailable_value`).|
| `snapshot.select.statement.overrides` | `string` | MariaDB, MySQL, Oracle, PostgreSQL, SQLServer |Specifies the table rows to include in a snapshot. Use this property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log. |
| `log.enabled` | `boolean` | Oracle | Enables capturing and serialization of large object (CLOB, NCLOB, and BLOB) column values in change events.
Default: `false` | +| `unavailable.value.placeholder` | Special | Oracle | Specifies the constant that the connector provides to indicate that the original value is unchanged and not provided by the database (this has the type `__debezium_unavailable_value`). | ### Using queries in the initial snapshot (relevant for MySQL, Oracle, PostgreSQL and SQLServer) @@ -201,37 +201,37 @@ a regular expression instead of providing the full name of the `databaseName` an - LastName ``` -## processors: RDI processors {#processors} +## `processors`: RDI processors {#processors} ### Properties -|Name|Type|Description|Required| -|--|--|--|--| -| `on_failed_retry_interval` |`integer`, `string`| Interval \(in seconds\) on which to perform retry on failure.
Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`|| -| `read_batch_size` |`integer`, `string`| Batch size for reading data from the source database.
Default: `2000`
Pattern: `^\${.*}$`
Minimum: `1`|| -| `debezium_lob_encoded_placeholder` |`string`| Enable Debezium LOB placeholders.
Default: `"X19kZWJleml1bV91bmF2YWlsYWJsZV92YWx1ZQ=="`|| +| Name | Type | Description | +| -- | -- | -- | +| `on_failed_retry_interval` |`integer`, `string`| Interval (in seconds) between attempts to retry on failure.
Default: `5`
Pattern: `^\${.*}$`
Minimum: `1`| +| `read_batch_size` |`integer`, `string`| Batch size for reading data from the source database.
Default: `2000`
Pattern: `^\${.*}$`
Minimum: `1`| +| `debezium_lob_encoded_placeholder` |`string`| Enable Debezium LOB placeholders.
Default: `"X19kZWJleml1bV91bmF2YWlsYWJsZV92YWx1ZQ=="`| | `dedup` |`boolean`| Enable deduplication mechanism.
Default: `false`
|| -| `dedup_max_size` |`integer`| Max size of the deduplication set.
Default: `1024`
Minimum: `1`
|| -| `dedup_strategy` |`string`| Deduplication strategy: reject \- reject messages\(dlq\), ignore \- ignore messages.
(DEPRECATED)
Property 'dedup_strategy' is now deprecated. The only supported strategy is 'ignore'. Please remove from the configuration.
Default: `"ignore"`
Enum: `"reject"`, `"ignore"`
|| -| `duration` |`integer`, `string`| Time (in ms) after which data will be read from stream even if read_batch_size was not reached.
Default: `100`
Pattern: `^\${.*}$`
Minimum: `1`
|| -| `write_batch_size` |`integer`, `string`| The batch size for writing data to target Redis database\. Should be less or equal to the read_batch_size.
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
|| -| `error_handling` |`string`| Error handling strategy: ignore \- skip, dlq \- store rejected messages in a dead letter queue.
Default: `"dlq"`
Pattern: `^\${.*}$|ignore|dlq`
|| -| `dlq_max_messages` |`integer`, `string`| Dead letter queue max messages per stream.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| -| `target_data_type` |`string`| Target data type: hash/json \- RedisJSON module must be in use in the target DB.
Default: `"hash"`
Pattern: `^\${.*}$|hash|json`
|| -| `json_update_strategy` |`string`| Target update strategy: replace/merge \- RedisJSON module must be in use in the target DB.
(DEPRECATED)
Property 'json_update_strategy' will be deprecated in future releases. Use 'on_update' job-level property to define the json update strategy.
Default: `"replace"`
Pattern: `^\${.*}$|replace|merge`
|| -| `initial_sync_processes` |`integer`, `string`| Number of processes RDI Engine creates to process the initial sync with the source.
Default: `4`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `32`
|| -| `idle_sleep_time_ms` |`integer`, `string`| Idle sleep time \(in milliseconds\) between batches.
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| -| `idle_streams_check_interval_ms` |`integer`, `string`| Interval \(in milliseconds\) for checking new streams when the stream processor is idling.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| -| `busy_streams_check_interval_ms` |`integer`, `string`| Interval \(in milliseconds\) for checking new streams when the stream processor is busy.
Default: `5000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
|| -| `wait_enabled` |`boolean`| Checks if the data has been written to the replica shard.
Default: `false`
|| -| `wait_timeout` |`integer`, `string`| Timeout in milliseconds when checking write to the replica shard.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
|| -| `retry_on_replica_failure` |`boolean`| Ensures that the data has been written to the replica shard and keeps retrying if not.
Default: `true`
|| +| `dedup_max_size` |`integer`| Maximum size of the deduplication set.
Default: `1024`
Minimum: `1`
| +| `dedup_strategy` |`string`| Deduplication strategy: `reject` - reject duplicate messages (send them to the dead letter queue); `ignore` - ignore duplicate messages.<br>
(DEPRECATED)
The property `dedup_strategy` is now deprecated. The only supported strategy is `ignore`. Please remove this property from your configuration.<br>
Default: `"ignore"`
Enum: `"reject"`, `"ignore"`
| +| `duration` |`integer`, `string`| Time (in ms) after which data will be read from the stream even if `read_batch_size` has not been reached.<br>
Default: `100`
Pattern: `^\${.*}$`
Minimum: `1`
| +| `write_batch_size` |`integer`, `string`| The batch size for writing data to the target Redis database. This must be less than or equal to `read_batch_size`.<br>
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
| +| `error_handling` |`string`| Error handling strategy: `ignore` - skip rejected messages; `dlq` - store rejected messages in a dead letter queue.<br>
Default: `"dlq"`
Pattern: `^\${.*}$\|ignore\|dlq`
| +| `dlq_max_messages` |`integer`, `string`| Maximum number of messages per stream in the dead letter queue.<br>
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
| +| `target_data_type` |`string`| Target data type: `hash`/`json` (the RedisJSON module must be enabled in the target database to use JSON).
Default: `"hash"`
Pattern: `^\${.*}$\|hash\|json`
| +| `json_update_strategy` |`string`| Target update strategy: `replace`/`merge` (the RedisJSON module must be enabled in the target database to use JSON).<br>
(DEPRECATED)
The property `json_update_strategy` will be deprecated in future releases. Use the job-level property `on_update` to define the JSON update strategy.
Default: `"replace"`
Pattern: `^\${.*}$\|replace\|merge`
| +| `initial_sync_processes` |`integer`, `string`| Number of processes the RDI Engine creates to process the initial sync with the source.
Default: `4`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `32`
| +| `idle_sleep_time_ms` |`integer`, `string`| Idle sleep time (in milliseconds) between batches.
Default: `200`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
| +| `idle_streams_check_interval_ms` |`integer`, `string`| Interval (in milliseconds) for checking new streams when the stream processor is idling.
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
| +| `busy_streams_check_interval_ms` |`integer`, `string`| Interval (in milliseconds) for checking new streams when the stream processor is busy.
Default: `5000`
Pattern: `^\${.*}$`
Minimum: `1`
Maximum: `999999`
| +| `wait_enabled` |`boolean`| Checks whether the data has been written to the replica shard.<br>
Default: `false`
| +| `wait_timeout` |`integer`, `string`| Timeout (in milliseconds) when checking that data has been written to the replica shard.<br>
Default: `1000`
Pattern: `^\${.*}$`
Minimum: `1`
| +| `retry_on_replica_failure` |`boolean`| Ensures that the data has been written to the replica shard and keeps retrying if not.
Default: `true`
|

### Additional properties

Not allowed

-## targets: Target connections {#targets}
+## `targets`: Target connections {#targets}

## Properties

@@ -241,15 +241,15 @@ Not allowed

### `targets.connection`: Connection details {#targetsconnection}

-### Properties (Pattern)
+### Properties

| Name | Type | Description |
| -- | -- | -- |
-| `host` | `string` | Host of the Redis database to which Redis Data Integration will write the processed data. |
-| `port` | `integer` | Port for the Redis database to which Redis Data Integration will write the processed data. |
-| `user` | `string` | User of the Redis database to which Redis Data Integration will write the processed data. Uncomment if not using default user. |
+| `host` | `string` | Hostname or IP address of the Redis database where RDI will write the processed data. |
+| `port` | `integer` | Port of the Redis database where RDI will write the processed data. |
+| `user` | `string` | User of the Redis database where RDI will write the processed data. Uncomment this if you are not using the default user. |
| `password` | `string` | Password for Redis target database. |
-| `key` | `string` | uncomment the following lines if you are using SSL/TLS. |
-| `key_password` | `string` | uncomment the following lines if you are using SSL/TLS. |
-| `cert` | `string` | uncomment the following lines if you are using SSL/TLS. |
-| `cacert` | `string` | uncomment the following lines if you are using SSL/TLS. |
+| `key` | `string` | Private key file for the TLS client certificate. Uncomment this line if you are using SSL/TLS. |
+| `key_password` | `string` | Password for the private key file. Uncomment this line if you are using SSL/TLS. |
+| `cert` | `string` | Client certificate file for TLS. Uncomment this line if you are using SSL/TLS. |
+| `cacert` | `string` | CA certificate file used to verify the server for TLS. Uncomment this line if you are using SSL/TLS. |
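
### Example

A minimal sketch of how the `processors` and `targets` sections described above might fit together in an RDI `config.yaml`. The target name `my-target`, the host name, port, user, and the `${TARGET_DB_PASSWORD}` variable are illustrative placeholders, and only a few of the documented properties are shown:

```yaml
processors:
  on_failed_retry_interval: 5   # seconds between retry attempts after a failure
  read_batch_size: 2000         # batch size for reading from the source database
  write_batch_size: 200         # must be less than or equal to read_batch_size
  error_handling: dlq           # store rejected messages in a dead letter queue
  target_data_type: hash        # use json only if RedisJSON is enabled in the target

targets:
  my-target:                    # placeholder target name
    connection:
      type: redis
      host: redis-12000.example.com   # placeholder host
      port: 12000
      user: rdi-user                  # omit to use the default user
      password: ${TARGET_DB_PASSWORD} # placeholder variable reference
      # key: ...                      # uncomment these four lines for SSL/TLS
      # key_password: ...
      # cert: ...
      # cacert: ...
```

Properties typed as `integer`, `string` in the tables above accept either a literal value or a `${...}` variable reference, which is what the documented pattern `^\${.*}$` matches.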