[Feature][Connector-V2] [Hudi]Add hudi sink connector (apache#4405)
liugddx authored Jul 9, 2024
1 parent d663398 commit dc271dc
Showing 28 changed files with 2,021 additions and 762 deletions.
1 change: 0 additions & 1 deletion docs/en/Connector-v2-release-state.md
@@ -38,7 +38,6 @@ SeaTunnel uses a grading system for connectors to help you understand what to ex
| [Hive](connector-v2/source/Hive.md) | Source | GA | 2.2.0-beta |
| [Http](connector-v2/sink/Http.md) | Sink | Beta | 2.2.0-beta |
| [Http](connector-v2/source/Http.md) | Source | Beta | 2.2.0-beta |
| [Hudi](connector-v2/source/Hudi.md) | Source | Beta | 2.2.0-beta |
| [Iceberg](connector-v2/source/Iceberg.md) | Source | Beta | 2.2.0-beta |
| [InfluxDB](connector-v2/sink/InfluxDB.md) | Sink | Beta | 2.3.0 |
| [InfluxDB](connector-v2/source/InfluxDB.md) | Source | Beta | 2.3.0-beta |
98 changes: 98 additions & 0 deletions docs/en/connector-v2/sink/Hudi.md
@@ -0,0 +1,98 @@
# Hudi

> Hudi sink connector

## Description

Used to write data to Hudi.

## Key features

- [x] [exactly-once](../../concept/connector-v2-features.md)
- [x] [cdc](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
|----------------------------|--------|----------|---------------|
| table_name | string | yes | - |
| table_dfs_path | string | yes | - |
| conf_files_path | string | no | - |
| record_key_fields | string | no | - |
| partition_fields | string | no | - |
| table_type | enum | no | copy_on_write |
| op_type | enum | no | insert |
| batch_interval_ms | Int | no | 1000 |
| insert_shuffle_parallelism | Int | no | 2 |
| upsert_shuffle_parallelism | Int | no | 2 |
| min_commits_to_keep | Int | no | 20 |
| max_commits_to_keep | Int | no | 30 |
| common-options | config | no | - |

### table_name [string]

`table_name` The name of the Hudi table.

### table_dfs_path [string]

`table_dfs_path` The DFS root path of the Hudi table, such as 'hdfs://nameservice/data/hudi/hudi_table/'.

### table_type [enum]

`table_type` The type of the Hudi table. The value is either 'copy_on_write' or 'merge_on_read'.
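
As a rough illustration of how `table_type` might be set, the fragment below sketches a sink block that selects a merge-on-read table. The table name and path are placeholder values, and the block assumes the connector is referenced as `Hudi` inside a `sink` block; treat it as a sketch rather than a verified configuration.

```hocon
sink {
  Hudi {
    # Placeholder table name and path; replace with your own values.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    # One of the two documented values: copy_on_write (default) or merge_on_read.
    table_type = "merge_on_read"
  }
}
```

In Hudi generally, copy-on-write rewrites data files at write time, while merge-on-read appends updates to log files and merges them later, so the choice is a trade-off between write cost and read cost.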

### conf_files_path [string]

`conf_files_path` A semicolon-separated list of environment configuration file paths (local paths), used to initialize the HDFS client for the Hudi table files. For example: '/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml'.

### op_type [enum]

`op_type` The write operation applied to the Hudi table. The value is 'insert', 'upsert', or 'bulk_insert'.

### batch_interval_ms [Int]

`batch_interval_ms` The interval, in milliseconds, between batch writes to the Hudi table.

### insert_shuffle_parallelism [Int]

`insert_shuffle_parallelism` The parallelism used when inserting data into the Hudi table.

### upsert_shuffle_parallelism [Int]

`upsert_shuffle_parallelism` The parallelism used when upserting data into the Hudi table.
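
The fragment below is a minimal sketch of how `op_type` and `upsert_shuffle_parallelism` could be combined for an upsert workload. The column name `id` is hypothetical, and the assumption that `record_key_fields` names the column identifying a record is not stated in this document; the parallelism value is arbitrary.

```hocon
sink {
  Hudi {
    # Placeholder table name and path; replace with your own values.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    # Upsert instead of the default insert.
    op_type = "upsert"
    # Hypothetical column name; assumed to identify a record uniquely.
    record_key_fields = "id"
    # The documented default is 2; 4 is an arbitrary illustrative value.
    upsert_shuffle_parallelism = 4
  }
}
```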

### min_commits_to_keep [Int]

`min_commits_to_keep` The minimum number of commits to keep for the Hudi table.

### max_commits_to_keep [Int]

`max_commits_to_keep` The maximum number of commits to keep for the Hudi table.
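
To make the two retention options concrete, here is a hedged sketch that raises both above their documented defaults (20 and 30). The specific values are illustrative only, and how the connector passes them on to Hudi is not described in this document.

```hocon
sink {
  Hudi {
    # Placeholder table name and path; replace with your own values.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    # Keep more commits than the documented defaults (20 and 30).
    # Illustrative values; the minimum is kept below the maximum.
    min_commits_to_keep = 30
    max_commits_to_keep = 40
  }
}
```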

### common options

Sink plugin common parameters; please refer to [Sink Common Options](common-options.md) for details.

## Examples

```hocon
sink {
  Hudi {
    # Required; example value.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    table_type = "copy_on_write"
    conf_files_path = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
    # Kerberos settings; these keys are not listed in the options table above.
    use.kerberos = true
    kerberos.principal = "test_user@xxx"
    kerberos.principal.file = "/home/test/test_user.keytab"
  }
}
```
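
For context, a complete job might look roughly like the sketch below, which feeds a few generated rows into the Hudi sink. It assumes the `FakeSource` connector is available and that the `env` keys shown are valid for your SeaTunnel version; the schema, table name, and path are hypothetical.

```hocon
env {
  # Assumed env keys; names can differ between SeaTunnel versions.
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # FakeSource generates test rows; the schema below is hypothetical.
  FakeSource {
    row.num = 16
    schema = {
      fields {
        id = "int"
        name = "string"
      }
    }
  }
}

sink {
  Hudi {
    # Placeholder table name and path; replace with your own values.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    # The remaining values match the documented defaults.
    table_type = "copy_on_write"
    op_type = "insert"
    batch_interval_ms = 1000
  }
}
```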

## Changelog

### 2.2.0-beta 2022-09-26

- Add Hudi Source Connector

90 changes: 0 additions & 90 deletions docs/en/connector-v2/source/Hudi.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/zh/Connector-v2-release-state.md
@@ -38,7 +38,6 @@ SeaTunnel uses a connector grading system to help you understand what to expect from a connector:
| [Hive](../en/connector-v2/source/Hive.md) | Source | GA | 2.2.0-beta |
| [Http](connector-v2/sink/Http.md) | Sink | Beta | 2.2.0-beta |
| [Http](../en/connector-v2/source/Http.md) | Source | Beta | 2.2.0-beta |
| [Hudi](../en/connector-v2/source/Hudi.md) | Source | Beta | 2.2.0-beta |
| [Iceberg](../en/connector-v2/source/Iceberg.md) | Source | Beta | 2.2.0-beta |
| [InfluxDB](../en/connector-v2/sink/InfluxDB.md) | Sink | Beta | 2.3.0 |
| [InfluxDB](../en/connector-v2/source/InfluxDB.md) | Source | Beta | 2.3.0-beta |
92 changes: 92 additions & 0 deletions docs/zh/connector-v2/sink/Hudi.md
@@ -0,0 +1,92 @@
# Hudi

> Hudi sink connector

## Description

Used to write data to Hudi.

## Key features

- [x] [exactly-once](../../concept/connector-v2-features.md)
- [x] [cdc](../../concept/connector-v2-features.md)

## Options

| name                       | type   | required | default value |
|----------------------------|--------|----------|---------------|
| table_name                 | string | yes      | -             |
| table_dfs_path             | string | yes      | -             |
| conf_files_path            | string | no       | -             |
| record_key_fields          | string | no       | -             |
| partition_fields           | string | no       | -             |
| table_type                 | enum   | no       | copy_on_write |
| op_type                    | enum   | no       | insert        |
| batch_interval_ms          | Int    | no       | 1000          |
| insert_shuffle_parallelism | Int    | no       | 2             |
| upsert_shuffle_parallelism | Int    | no       | 2             |
| min_commits_to_keep        | Int    | no       | 20            |
| max_commits_to_keep        | Int    | no       | 30            |
| common-options             | config | no       | -             |

### table_name [string]

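`table_name` The name of the Hudi table.

### table_dfs_path [string]

`table_dfs_path` The DFS root path of the Hudi table, such as "hdfs://nameservice/data/hudi/hudi_table/".

### table_type [enum]

`table_type` The type of the Hudi table.

### conf_files_path [string]

`conf_files_path` A list of environment configuration file paths (local paths), used to initialize the HDFS client for the Hudi table files. For example: "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml".

### op_type [enum]

`op_type` The write operation applied to the Hudi table. The value is 'insert', 'upsert', or 'bulk_insert'.

### batch_interval_ms [Int]

`batch_interval_ms` The interval between batch writes to the Hudi table.

### insert_shuffle_parallelism [Int]

`insert_shuffle_parallelism` The parallelism used when inserting data into the Hudi table.

### upsert_shuffle_parallelism [Int]

`upsert_shuffle_parallelism` The parallelism used when upserting data into the Hudi table.

### min_commits_to_keep [Int]

`min_commits_to_keep` The minimum number of commits to keep for the Hudi table.

### max_commits_to_keep [Int]

`max_commits_to_keep` The maximum number of commits to keep for the Hudi table.

### common options

Sink plugin common parameters; please refer to [Sink Common Options](common-options.md) for details.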

## Examples

```hocon
sink {
  Hudi {
    # Required; example value.
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
    table_type = "copy_on_write"
    conf_files_path = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
    # Kerberos settings; these keys are not listed in the options table above.
    use.kerberos = true
    kerberos.principal = "test_user@xxx"
    kerberos.principal.file = "/home/test/test_user.keytab"
  }
}
```

2 changes: 1 addition & 1 deletion plugin-mapping.properties
@@ -52,7 +52,6 @@ seatunnel.sink.OssJindoFile = connector-file-jindo-oss
seatunnel.source.CosFile = connector-file-cos
seatunnel.sink.CosFile = connector-file-cos
seatunnel.source.Pulsar = connector-pulsar
seatunnel.source.Hudi = connector-hudi
seatunnel.sink.DingTalk = connector-dingtalk
seatunnel.source.Elasticsearch = connector-elasticsearch
seatunnel.sink.Elasticsearch = connector-elasticsearch
@@ -119,6 +118,7 @@ seatunnel.source.AmazonSqs = connector-amazonsqs
seatunnel.sink.AmazonSqs = connector-amazonsqs
seatunnel.source.Paimon = connector-paimon
seatunnel.sink.Paimon = connector-paimon
seatunnel.sink.hudi = connector-hudi
seatunnel.sink.Druid = connector-druid
seatunnel.source.Easysearch = connector-easysearch
seatunnel.sink.Easysearch = connector-easysearch