forked from apache/seatunnel
[Feature][Connector-V2] [Hudi]Add hudi sink connector (apache#4405)
Showing 28 changed files with 2,021 additions and 762 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
# Hudi | ||
|
||
> Hudi sink connector | ||
## Description | ||
|
||
Used to write data to Hudi. | ||
|
||
## Key features | ||
|
||
- [x] [exactly-once](../../concept/connector-v2-features.md) | ||
- [x] [cdc](../../concept/connector-v2-features.md) | ||
|
||
## Options | ||
|
||
| name | type | required | default value | | ||
|----------------------------|--------|----------|---------------| | ||
| table_name | string | yes | - | | ||
| table_dfs_path | string | yes | - | | ||
| conf_files_path | string | no | - | | ||
| record_key_fields | string | no | - | | ||
| partition_fields | string | no | - | | ||
| table_type | enum | no | copy_on_write | | ||
| op_type | enum | no | insert | | ||
| batch_interval_ms | Int | no | 1000 | | ||
| insert_shuffle_parallelism | Int | no | 2 | | ||
| upsert_shuffle_parallelism | Int | no | 2 | | ||
| min_commits_to_keep | Int | no | 20 | | ||
| max_commits_to_keep | Int | no | 30 | | ||
| common-options | config | no | - | | ||
|
||
### table_name [string] | ||
|
||
`table_name` The name of the Hudi table. | ||
|
||
### table_dfs_path [string] | ||
|
||
`table_dfs_path` The DFS root path of the Hudi table, such as 'hdfs://nameservice/data/hudi/hudi_table/'. | ||
|
||
### table_type [enum] | ||
|
||
`table_type` The type of the Hudi table. The value is 'copy_on_write' or 'merge_on_read'. | ||
|
||
### conf_files_path [string] | ||
|
||
`conf_files_path` A semicolon-separated list of local paths to the environment configuration files, used to initialize the HDFS client that reads the Hudi table files. For example: '/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml'. | ||
|
||
### op_type [enum] | ||
|
||
`op_type` The operation type used when writing to the Hudi table. The value is 'insert', 'upsert' or 'bulk_insert'. | ||
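
As a sketch of how `op_type` combines with the other options, the fragment below configures an upsert into a merge-on-read table. The table name, the `id` key column and the `dt` partition column are hypothetical, and the single-column value format for `record_key_fields` and `partition_fields` is an assumption, since this page does not describe those two options.

```hocon
sink {
  Hudi {
    # Hypothetical table name and path; replace with your own.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    table_type = "merge_on_read"
    op_type = "upsert"
    # Assumed format: a single column name per option.
    record_key_fields = "id"
    partition_fields = "dt"
  }
}
```

Hudi matches incoming rows to existing ones by record key during an upsert, which is why a key field is set alongside `op_type = "upsert"` here.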
|
||
### batch_interval_ms [Int] | ||
|
||
`batch_interval_ms` The interval, in milliseconds, between batch writes to the Hudi table. | ||
|
||
### insert_shuffle_parallelism [Int] | ||
|
||
`insert_shuffle_parallelism` The parallelism used when inserting data into the Hudi table. | ||
|
||
### upsert_shuffle_parallelism [Int] | ||
|
||
`upsert_shuffle_parallelism` The parallelism used when upserting data into the Hudi table. | ||
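
The write-tuning options can be set together. The fragment below is a minimal sketch that overrides the defaults from the options table (1000 ms, 2 and 2); the chosen values and the table name are purely illustrative, not recommendations.

```hocon
sink {
  Hudi {
    # Hypothetical table name and path; replace with your own.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    op_type = "upsert"
    # Illustrative values: write a batch every 5 seconds instead of every second.
    batch_interval_ms = 5000
    # Illustrative values: raise both shuffle parallelism settings from the default of 2.
    insert_shuffle_parallelism = 4
    upsert_shuffle_parallelism = 4
  }
}
```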
|
||
### min_commits_to_keep [Int] | ||
|
||
`min_commits_to_keep` The minimum number of commits to keep for the Hudi table. | ||
|
||
### max_commits_to_keep [Int] | ||
|
||
`max_commits_to_keep` The maximum number of commits to keep for the Hudi table. | ||
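
The two retention options are usually set as a pair. The sketch below assumes they are passed through to Hudi's archival settings (`hoodie.keep.min.commits` and `hoodie.keep.max.commits`), which this page does not confirm; under that assumption, the minimum should stay below the maximum, as the defaults of 20 and 30 do. The values and table name are illustrative.

```hocon
sink {
  Hudi {
    # Hypothetical table name and path; replace with your own.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    # Illustrative values; keep min_commits_to_keep lower than max_commits_to_keep.
    min_commits_to_keep = 10
    max_commits_to_keep = 15
  }
}
```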
|
||
### common options | ||
|
||
Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details. | ||
|
||
## Examples | ||
|
||
```hocon | ||
sink { | ||
  Hudi { | ||
    # table_name is required; the value here is illustrative | ||
    table_name = "hudi_table" | ||
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/" | ||
    table_type = "copy_on_write" | ||
    conf_files_path = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml" | ||
    use.kerberos = true | ||
    kerberos.principal = "test_user@xxx" | ||
    kerberos.principal.file = "/home/test/test_user.keytab" | ||
  } | ||
} | ||
``` | ||
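
For context, here is a sketch of a complete job that feeds the Hudi sink from SeaTunnel's `FakeSource` test connector. The `env` keys, the two-column schema and the table name are assumptions for illustration and may need adjusting for your SeaTunnel version.

```hocon
env {
  # Assumed env settings for a small batch job.
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    # Generate a handful of rows with a hypothetical two-column schema.
    row.num = 16
    schema = {
      fields {
        id = "int"
        name = "string"
      }
    }
  }
}

sink {
  Hudi {
    # Hypothetical table name and path; replace with your own.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    table_type = "copy_on_write"
    op_type = "insert"
  }
}
```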
|
||
## Changelog | ||
|
||
### 2.2.0-beta 2022-09-26 | ||
|
||
- Add Hudi Source Connector | ||
|
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
# Hudi | ||
|
||
> Hudi sink connector | ||
## Description | ||
|
||
Used to write data to Hudi. | ||
|
||
## Key features | ||
|
||
- [x] [exactly-once](../../concept/connector-v2-features.md) | ||
- [x] [cdc](../../concept/connector-v2-features.md) | ||
|
||
## Options | ||
|
||
| name                       | type   | required | default value | | ||
|----------------------------|--------|----------|---------------| | ||
| table_name                 | string | yes      | -             | | ||
| table_dfs_path             | string | yes      | -             | | ||
| conf_files_path            | string | no       | -             | | ||
| record_key_fields          | string | no       | -             | | ||
| partition_fields           | string | no       | -             | | ||
| table_type                 | enum   | no       | copy_on_write | | ||
| op_type                    | enum   | no       | insert        | | ||
| batch_interval_ms          | Int    | no       | 1000          | | ||
| insert_shuffle_parallelism | Int    | no       | 2             | | ||
| upsert_shuffle_parallelism | Int    | no       | 2             | | ||
| min_commits_to_keep        | Int    | no       | 20            | | ||
| max_commits_to_keep        | Int    | no       | 30            | | ||
| common-options             | config | no       | -             | | ||
|
||
### table_name [string] | ||
|
||
`table_name` The name of the Hudi table. | ||
|
||
### table_dfs_path [string] | ||
|
||
`table_dfs_path` The DFS root path of the Hudi table, such as "hdfs://nameservice/data/hudi/hudi_table/". | ||
|
||
### table_type [enum] | ||
|
||
`table_type` The type of the Hudi table. | ||
|
||
### conf_files_path [string] | ||
|
||
`conf_files_path` A list of local paths to the environment configuration files, used to initialize the HDFS client that reads the Hudi table files. For example: "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml". | ||
|
||
### op_type [enum] | ||
|
||
`op_type` The operation type used when writing to the Hudi table. The value can be 'insert', 'upsert' or 'bulk_insert'. | ||
|
||
### batch_interval_ms [Int] | ||
|
||
`batch_interval_ms` The interval between batch writes to the Hudi table. | ||
|
||
### insert_shuffle_parallelism [Int] | ||
|
||
`insert_shuffle_parallelism` The parallelism used when inserting data into the Hudi table. | ||
|
||
### upsert_shuffle_parallelism [Int] | ||
|
||
`upsert_shuffle_parallelism` The parallelism used when upserting data into the Hudi table. | ||
|
||
### min_commits_to_keep [Int] | ||
|
||
`min_commits_to_keep` The minimum number of commits to keep for the Hudi table. | ||
|
||
### max_commits_to_keep [Int] | ||
|
||
`max_commits_to_keep` The maximum number of commits to keep for the Hudi table. | ||
|
||
### common options | ||
|
||
Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details. | ||
|
||
## Examples | ||
|
||
```hocon | ||
sink { | ||
  Hudi { | ||
    # table_name is required; the value here is illustrative | ||
    table_name = "hudi_table" | ||
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/" | ||
    table_type = "copy_on_write" | ||
    conf_files_path = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml" | ||
    use.kerberos = true | ||
    kerberos.principal = "test_user@xxx" | ||
    kerberos.principal.file = "/home/test/test_user.keytab" | ||
  } | ||
} | ||
``` | ||
|