[Improve][Doc] Add file_filter_pattern example to doc #7922

Merged on Oct 29, 2024 (23 commits)
baa7cb8
[Fix][Doc] Fix LocalFile doc (#7887)
YOMO-Lee Oct 22, 2024
415a510
Merge branch 'apache:dev' into dev
YOMO-Lee Oct 23, 2024
9088a0d
Merge branch 'apache:dev' into dev
YOMO-Lee Oct 25, 2024
4276681
[Fix][Connector-V2][ClickHouse] Fix ClickHouse Bug (#7897)
YOMO-Lee Oct 25, 2024
1b80667
[Fix][Connector-V2][ClickHouse] Fix ClickHouse Bug (#7897)
YOMO-Lee Oct 25, 2024
e64b8a6
[Fix][Connector-V2][ClickHouse] Fix ClickHouse Bug (#7897)
YOMO-Lee Oct 26, 2024
42e5919
[Fix][Doc] Fix LocalFile doc (#7887)
YOMO-Lee Oct 26, 2024
8b13c97
Merge branch 'apache:dev' into dev
YOMO-Lee Oct 26, 2024
2e9162d
[Fix][Doc] Fix LocalFile doc (#7887)
YOMO-Lee Oct 26, 2024
f5073f6
Merge remote-tracking branch 'origin/dev' into dev
YOMO-Lee Oct 26, 2024
1dcf78e
Merge branch 'apache:dev' into dev
YOMO-Lee Oct 26, 2024
e564e7f
Revert "[Fix][Doc] Fix LocalFile doc (#7887)"
YOMO-Lee Oct 26, 2024
c5bcdf7
Revert "[Fix][Connector-V2][ClickHouse] Fix ClickHouse Bug (#7897)"
YOMO-Lee Oct 26, 2024
d02a01b
Revert "[Fix][Connector-V2][ClickHouse] Fix ClickHouse Bug (#7897)"
YOMO-Lee Oct 26, 2024
52ee377
Revert "[Fix][Connector-V2][ClickHouse] Fix ClickHouse Bug (#7897)"
YOMO-Lee Oct 26, 2024
0062ba4
[Fix][Doc] Fix LocalFile Doc
YOMO-Lee Oct 28, 2024
eae7b14
[Fix][DOC] LocalFile doc optimize
YOMO-Lee Oct 29, 2024
d542c11
[Fix][DOC] LocalFile doc optimize
YOMO-Lee Oct 29, 2024
867a840
[Fix][DOC] Additional explanation for the file_filter_pattern parameter
YOMO-Lee Oct 29, 2024
c1e8f09
[Fix][DOC] Additional explanation for the file_filter_pattern parameter
YOMO-Lee Oct 29, 2024
9b14797
[Fix][DOC] Additional explanation for the file_filter_pattern parameter
YOMO-Lee Oct 29, 2024
4df08ad
Merge remote-tracking branch 'origin/LocalFile' into LocalFile
YOMO-Lee Oct 29, 2024
9dc6fe7
update
Hisoka-X Oct 29, 2024
80 changes: 78 additions & 2 deletions docs/en/connector-v2/source/CosFile.md
@@ -45,7 +45,7 @@ To use this connector you need put hadoop-cos-{hadoop.version}-{version}.jar and

## Options

| name | type | required | default value |
|---------------------------|---------|----------|---------------------|
| path | string | yes | - |
| file_format_type | string | yes | - |
@@ -64,7 +64,7 @@ To use this connector you need put hadoop-cos-{hadoop.version}-{version}.jar and
| sheet_name | string | no | - |
| xml_row_tag | string | no | - |
| xml_use_attr_format | boolean | no | - |
| file_filter_pattern | string | no | - |
| compress_codec | string | no | none |
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 |
@@ -275,6 +275,55 @@ Specifies Whether to process data using the tag attribute format.

Filter pattern, which is used to filter files.

The pattern follows standard regular expressions. For details, see https://en.wikipedia.org/wiki/Regular_expression.
Some examples follow.

File Structure Example:
```
/data/seatunnel/20241001/report.txt
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
/data/seatunnel/20241012/logo.png
```
Matching Rules Example:

**Example 1**: *Match all .txt files under /data/seatunnel/20241001*. Regular expression:
```
/data/seatunnel/20241001/.*\.txt
```
The result of this example matching is:
```
/data/seatunnel/20241001/report.txt
```
**Example 2**: *Match all files starting with abc*. Regular expression:
```
/data/seatunnel/202410\d*/abc.*
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
```
**Example 3**: *Match all files under /data/seatunnel/20241007 starting with abc whose fourth character is either h or g*. Regular expression:
```
/data/seatunnel/20241007/abc[hg].*
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
```
**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*. Regular expression:
```
/data/seatunnel/202410\d*/.*\.csv
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
```
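
Patterns like these can be checked offline before they go into a job. The following is a minimal sketch, not the connector's implementation: it assumes the pattern has to match the entire path (Python's `re.fullmatch`), and the helper name `filter_paths` is hypothetical.

```python
import re

# Paths copied from the file structure example above.
PATHS = [
    "/data/seatunnel/20241001/report.txt",
    "/data/seatunnel/20241007/abch202410.csv",
    "/data/seatunnel/20241002/abcg202410.csv",
    "/data/seatunnel/20241005/old_data.csv",
    "/data/seatunnel/20241012/logo.png",
]


def filter_paths(pattern, paths):
    """Return the paths that the pattern matches in full (hypothetical helper)."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.fullmatch(p)]


# Example 1: .txt files under 20241001.
txt_files = filter_paths(r"/data/seatunnel/20241001/.*\.txt", PATHS)
# Example 4: .csv files under any folder whose name starts with 202410.
csv_files = filter_paths(r"/data/seatunnel/202410\d*/.*\.csv", PATHS)

print(txt_files)
print(csv_files)
```

Under that assumption, the two lists agree with the results shown for Example 1 and Example 4.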

### compress_codec [string]

The compress codec of files. The supported details are as follows:
@@ -372,6 +421,33 @@ sink {

```

### Filter File

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  CosFile {
    bucket = "cosn://seatunnel-test-1259587829"
    secret_id = "xxxxxxxxxxxxxxxxxxx"
    secret_key = "xxxxxxxxxxxxxxxxxxx"
    region = "ap-chengdu"
    path = "/seatunnel/read/binary/"
    file_format_type = "binary"
    // file example abcD2024.csv
    file_filter_pattern = "abc[DX]*.*"
  }
}

sink {
  Console {
  }
}
```
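
Read as a regular expression, `abc[DX]*.*` means `abc`, then zero or more `D` or `X` characters, then anything. The sketch below checks that reading in Python. It assumes the pattern is applied to the bare file name and must match it in full; whether the connector matches the name or the whole path is not confirmed here. Only the first file name comes from the config comment, the others are made up.

```python
import re

# Pattern copied from the job above.
pattern = re.compile(r"abc[DX]*.*")

# Hypothetical file names; only the first appears in the config comment.
names = ["abcD2024.csv", "abcX.bin", "abc2024.csv", "xyz2024.csv"]

# Keep the names that the pattern matches in full.
matched = [n for n in names if pattern.fullmatch(n)]
print(matched)
```

Because `[DX]*` also accepts zero repetitions, `abc2024.csv` passes the filter. To require exactly one `D` or `X` after `abc`, a pattern such as `abc[DX].*` would be needed.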

## Changelog

### next version
80 changes: 80 additions & 0 deletions docs/en/connector-v2/source/FtpFile.md
@@ -84,6 +84,59 @@ The target ftp password is required

The source file path.

### file_filter_pattern [string]

Filter pattern, which is used to filter files.

The pattern follows standard regular expressions. For details, see https://en.wikipedia.org/wiki/Regular_expression.
Some examples follow.

File Structure Example:
```
/data/seatunnel/20241001/report.txt
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
/data/seatunnel/20241012/logo.png
```
Matching Rules Example:

**Example 1**: *Match all .txt files under /data/seatunnel/20241001*. Regular expression:
```
/data/seatunnel/20241001/.*\.txt
```
The result of this example matching is:
```
/data/seatunnel/20241001/report.txt
```
**Example 2**: *Match all files starting with abc*. Regular expression:
```
/data/seatunnel/202410\d*/abc.*
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
```
**Example 3**: *Match all files under /data/seatunnel/20241007 starting with abc whose fourth character is either h or g*. Regular expression:
```
/data/seatunnel/20241007/abc[hg].*
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
```
**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*. Regular expression:
```
/data/seatunnel/202410\d*/.*\.csv
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
```
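
A detail worth knowing when writing character classes: inside square brackets a comma is a literal character, so a class written as `[h,g]` accepts `h`, `,` or `g`, whereas `[hg]` accepts only `h` or `g`. The sketch below shows the difference in Python, assuming full-match semantics on the file name; the name `abc,202410.csv` is made up for illustration.

```python
import re

with_comma = re.compile(r"abc[h,g].*")    # class holds h, a literal comma, and g
without_comma = re.compile(r"abc[hg].*")  # class holds only h and g

# The last name is hypothetical; it exists only to expose the comma.
names = ["abch202410.csv", "abcg202410.csv", "abc,202410.csv"]

matched_with_comma = [n for n in names if with_comma.fullmatch(n)]
matched_without_comma = [n for n in names if without_comma.fullmatch(n)]

print(matched_with_comma)
print(matched_without_comma)
```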

### file_format_type [string]

File type. The following file types are supported:
@@ -400,6 +453,33 @@ sink {

```

### Filter File

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FtpFile {
    host = "192.168.31.48"
    port = 21
    user = tyrantlucifer
    password = tianchao
    path = "/seatunnel/read/binary/"
    file_format_type = "binary"
    // file example abcD2024.csv
    file_filter_pattern = "abc[DX]*.*"
  }
}

sink {
  Console {
  }
}
```
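
If a pattern has to cover the full path rather than only the file name, one way to build it is to join the directory and a file-name pattern. This is a sketch under that assumption, not a statement of how the connector behaves; `name_pattern` and the candidate paths are hypothetical.

```python
import re

directory = "/seatunnel/read/binary/"  # value of `path` in the job above
name_pattern = r"abc[DX].*\.csv"       # hypothetical, stricter file-name pattern

# re.escape keeps characters in the directory from being read as regex syntax.
full_path_pattern = re.escape(directory) + name_pattern

# Hypothetical paths used to exercise the combined pattern.
candidates = [
    "/seatunnel/read/binary/abcD2024.csv",
    "/seatunnel/read/binary/abc2024.csv",
    "/seatunnel/other/abcD2024.csv",
]
matched = [p for p in candidates if re.fullmatch(full_path_pattern, p)]
print(matched)
```

Only the first candidate passes: the second lacks the `D` or `X` after `abc`, and the third sits in a different directory.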

## Changelog

### 2.2.0-beta 2022-09-26
79 changes: 78 additions & 1 deletion docs/en/connector-v2/source/HdfsFile.md
@@ -41,7 +41,7 @@ Read data from hdfs file system.

## Source Options

| Name | Type | Required | Default | Description |
|---------------------------|---------|----------|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| path | string | yes | - | The source file path. |
| file_format_type | string | yes | - | Supported file types: `text` `csv` `parquet` `orc` `json` `excel` `xml` `binary`. Note that the final file name ends with the file_format's suffix; the suffix of a text file is `txt`. |
@@ -62,6 +62,7 @@ Read data from hdfs file system.
| sheet_name | string | no | - | The sheet of the workbook to read; only used when file_format is excel. |
| xml_row_tag | string | no | - | Specifies the tag name of the data rows within the XML file, only used when file_format is xml. |
| xml_use_attr_format | boolean | no | - | Specifies whether to process data using the tag attribute format, only used when file_format is xml. |
| file_filter_pattern | string | no | - | Filter pattern, which is used to filter files. |
| compress_codec | string | no | none | The compress codec of files |
| archive_compress_codec | string | no | none | |
| encoding | string | no | UTF-8 | |
@@ -71,6 +72,59 @@ Read data from hdfs file system.

The **delimiter** parameter will be deprecated after version 2.3.5; please use **field_delimiter** instead.

### file_filter_pattern [string]

Filter pattern, which is used to filter files.

The pattern follows standard regular expressions. For details, see https://en.wikipedia.org/wiki/Regular_expression.
Some examples follow.

File Structure Example:
```
/data/seatunnel/20241001/report.txt
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
/data/seatunnel/20241012/logo.png
```
Matching Rules Example:

**Example 1**: *Match all .txt files under /data/seatunnel/20241001*. Regular expression:
```
/data/seatunnel/20241001/.*\.txt
```
The result of this example matching is:
```
/data/seatunnel/20241001/report.txt
```
**Example 2**: *Match all files starting with abc*. Regular expression:
```
/data/seatunnel/202410\d*/abc.*
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
```
**Example 3**: *Match all files under /data/seatunnel/20241007 starting with abc whose fourth character is either h or g*. Regular expression:
```
/data/seatunnel/20241007/abc[hg].*
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
```
**Example 4**: *Match third-level folders starting with 202410 and files ending with .csv*. Regular expression:
```
/data/seatunnel/202410\d*/.*\.csv
```
The result of this example matching is:
```
/data/seatunnel/20241007/abch202410.csv
/data/seatunnel/20241002/abcg202410.csv
/data/seatunnel/20241005/old_data.csv
```
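
Whether a pattern is compared with the whole path, with the file name only, or searched for anywhere in the path changes which files pass. The sketch below contrasts the three modes in Python with the short pattern `abc.*`. Which mode a given connector version uses is not stated here, so treat the choice as an assumption to verify.

```python
import re
from pathlib import PurePosixPath

# Paths taken from the file structure example above.
paths = [
    "/data/seatunnel/20241007/abch202410.csv",
    "/data/seatunnel/20241002/abcg202410.csv",
    "/data/seatunnel/20241005/old_data.csv",
]
pattern = re.compile(r"abc.*")

# Full match against the whole path: nothing passes, every path starts with "/data".
whole_path = [p for p in paths if pattern.fullmatch(p)]
# Full match against the file name only: both abc files pass.
name_only = [p for p in paths if pattern.fullmatch(PurePosixPath(p).name)]
# Search anywhere in the path: both abc files pass as well.
anywhere = [p for p in paths if pattern.search(p)]

print(whole_path, name_only, anywhere, sep="\n")
```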

### compress_codec [string]

The compress codec of files. The supported details are as follows:
@@ -146,3 +200,26 @@ sink {
}
```

### Filter File

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  HdfsFile {
    path = "/apps/hive/demo/student"
    file_format_type = "json"
    fs.defaultFS = "hdfs://namenode001"
    // file example abcD2024.csv
    file_filter_pattern = "abc[DX]*.*"
  }
}

sink {
  Console {
  }
}
```
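
In `abc[DX]*.*` the dot is unescaped, so it stands for any character rather than a literal period. When a pattern is meant to pin a file extension, the dot needs a backslash. A small Python sketch of the difference, with made-up file names and full-match semantics assumed:

```python
import re

# Hypothetical file names; only the first has a real ".txt" extension.
names = ["report.txt", "report_txt", "reporttxt"]

# Unescaped dot: any single character may precede "txt".
unescaped = [n for n in names if re.fullmatch(r".*.txt", n)]
# Escaped dot: a literal period must precede "txt".
escaped = [n for n in names if re.fullmatch(r".*\.txt", n)]

print(unescaped)
print(escaped)
```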