[Improve][Doc] Add file_filter_pattern example to doc #7922

Merged
23 commits merged into apache:dev on Oct 29, 2024

Conversation

YOMO-Lee
Contributor

Added file filtering instructions to the LocalFile connector documentation.

YOMO-Lee and others added 15 commits October 22, 2024 17:28
Supplement and optimize the description of the LocalFile connector on filtering files
[(#7887)](#7887)
1. When the ClickHouse connector runs with multiple parallelism, the task finishes extraction but cannot be stopped normally
[(#7897)](#7897)

2. Added E2E test cases for this issue [(#7897)](#7897)

3. Local developers who want to observe **Job Progress Information** in a timely manner need to modify the following configuration file. The configuration in config does not take effect.
```
seatunnel engine/seatunnel-engineer-common/src/main/resources/seatunnely.yaml
```
Continue to optimize the document about filtering files and add some examples
[(#7887)](#7887)
Member

@Hisoka-X Hisoka-X left a comment

Thanks @YOMO-Lee ! I left some comments.

@@ -254,6 +254,72 @@ Specifies Whether to process data using the tag attribute format.

Filter pattern, which used for filtering files.

The filtering format is similar to wildcard matching file names in Linux.
Member

We should not give users an ambiguous description. Please state directly that we use Java regular expressions.

Comment on lines 259 to 270
| Wildcard | Meaning | Example |
|--------------|--------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| * | Match 0 or more characters | f* &emsp;&ensp;&emsp; Any file starting with f<br/>b*.txt &emsp; Any file starting with b, any character in the middle, and ending with. txt |
| [] | Match a single character in parentheses | [abc]* &emsp; A file that starts with any one of the characters a, b, or c |
| ? | Match any single character | f?.txt &emsp; Any file starting with 'f' followed by a character and ending with '. txt' |
| [!] | Match any single character not in parentheses | [!abc]* &emsp; Any file that does not start with abc |
| [a-z] | Match any single character from a to z | [a-z]* &emsp; Any file starting with a to z |
| {a,b,c}/a..z | When separated by commas, it represents individual characters<br/>When separated by two dots, represents continuous characters | {a,b,c}* &emsp; Files starting with any character from abc<br/>{a..Z}* &emsp;&ensp; Files starting with any character from a to z |

However, it should be noted that unlike Linux wildcard characters, when encountering file suffixes, the middle dot cannot be omitted.

For example, `abc20241022.csv`, the normal Linux wildcard `abc*` is sufficient, but here we need to use `abc*.*` , Pay attention to a point in the middle.
Member

Please replace this part with a link to https://en.wikipedia.org/wiki/Regular_expression and let users learn regular expressions on their own.
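
To illustrate why the wildcard description is misleading, here is a minimal sketch. It assumes the connector compiles `file_filter_pattern` with `java.util.regex.Pattern` and requires the whole file name to match; that matching mode is an assumption not confirmed in this thread, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class FilterPatternSketch {
    // Hypothetical helper: true only if the regex matches the entire file name.
    // Whole-string matching (Matcher.matches) is an assumption about the connector.
    static boolean accepts(String regex, String fileName) {
        return Pattern.compile(regex).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String name = "abc20241022.csv";

        // As a regex, "abc*" means "ab" followed by zero or more "c",
        // so it does not cover the rest of the name.
        System.out.println(accepts("abc*", name));        // false

        // "abc*.*" matches only because ".*" means "any characters",
        // not because of a literal dot before the suffix.
        System.out.println(accepts("abc*.*", name));      // true

        // A more explicit regex: prefix "abc", anything, then a literal ".csv".
        System.out.println(accepts("abc.*\\.csv", name)); // true
    }
}
```

Under these assumptions the `abc*.*` example works by accident of regex semantics, which supports describing the option as a regular expression rather than a wildcard.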

Comment on lines 274 to 287
report.txt
notes.txt
input.csv
abch20241022.csv
abcw20241022.csv
abcx20241022.csv
abcq20241022.csv
abcg20241022.csv
abcv20241022.csv
abcb20241022.csv
old_data.csv
logo.png
script.sh
helpers.sh
Member

Please add some file paths, so the examples match more than just file names.
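
A rough sketch of what path-based examples could look like, again assuming whole-string matching with `java.util.regex.Pattern`. The directory layout, class name, and method name are hypothetical; only the file names come from the list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PathFilterSketch {
    // Keep only the paths that the regex matches in full.
    // Whole-string matching is an assumption about the connector.
    static List<String> filter(String regex, List<String> paths) {
        Pattern pattern = Pattern.compile(regex);
        return paths.stream()
                .filter(p -> pattern.matcher(p).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical directory layout, for illustration only.
        List<String> paths = Arrays.asList(
                "/data/seatunnel/20241007/report.txt",
                "/data/seatunnel/20241007/abch20241022.csv",
                "/data/seatunnel/20241012/logo.png",
                "/data/seatunnel/20241012/old_data.csv");

        // Every .csv file, regardless of directory.
        System.out.println(filter(".*\\.csv", paths));

        // Every file under the 20241007 directory.
        System.out.println(filter("/data/seatunnel/20241007/.*", paths));
    }
}
```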

Optimize the description of Regex
@@ -254,6 +254,54 @@ Specifies Whether to process data using the tag attribute format.

Filter pattern, which used for filtering files.

The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression. learn it
Member

Suggested change
The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression. learn it
The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression.

@@ -254,6 +254,54 @@ Specifies Whether to process data using the tag attribute format.

Filter pattern, which used for filtering files.

The pattern follows standard regular expressions. For details, please refer to https://en.wikipedia.org/wiki/Regular_expression. learn it

File Structure Example:
Member

@Hisoka-X Hisoka-X Oct 28, 2024

Suggested change
File Structure Example:
There are some examples.
File Structure Example:

Optimize document structure
Optimize document structure
@YOMO-Lee YOMO-Lee requested a review from Hisoka-X October 29, 2024 04:04
@YOMO-Lee
Contributor Author

@Hisoka-X Please review this

Please provide a description of all connectors that support the file_filter_pattern parameter
Added file_filter_pattern descriptions for the following file connectors:
CosFile (en), OssFile (en), OssJindoFile (en), HdfsFile (en)
Added file_filter_pattern descriptions for the following file connectors:
FtpFile (en), SftpFile (en), S3File (en), HdfsFile (zh)
@Hisoka-X changed the title from "[Fix] LocalFile doc optimize (#7887)" to "[Improve][Doc] Add file_filter_pattern example to doc" on Oct 29, 2024
@YOMO-Lee
Contributor Author

@zhilinli123 please review

@hailin0 hailin0 merged commit a2590e8 into apache:dev Oct 29, 2024
6 checks passed
4 participants