
Commit

Merge remote-tracking branch 'upstream/dev' into add-kerberos-e2e
zhangshenghang committed Nov 23, 2024
2 parents a5902a4 + b6e7a42 commit b3a3136
Showing 22 changed files with 1,064 additions and 21 deletions.
1 change: 1 addition & 0 deletions docs/en/connector-v2/source/CosFile.md
@@ -343,6 +343,7 @@ The compress codec of archive files and the details that supported as the follow
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
| GZ | txt,json,xml | .gz |
| NONE | all | .* |

### encoding [string]
1 change: 1 addition & 0 deletions docs/en/connector-v2/source/FtpFile.md
@@ -328,6 +328,7 @@ The compress codec of archive files and the details that supported as the follow
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
| GZ | txt,json,xml | .gz |
| NONE | all | .* |

### encoding [string]
1 change: 1 addition & 0 deletions docs/en/connector-v2/source/HdfsFile.md
@@ -144,6 +144,7 @@ The compress codec of archive files and the details that supported as the follow
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
| GZ | txt,json,xml | .gz |
| NONE | all | .* |

### encoding [string]
3 changes: 3 additions & 0 deletions docs/en/connector-v2/source/LocalFile.md
@@ -322,6 +322,7 @@ The compress codec of archive files and the details that supported as the follow
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
| GZ | txt,json,xml | .gz |
| NONE | all | .* |
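
As a quick illustration of the newly listed GZ codec, a minimal source sketch (the option names other than `archive_compress_codec` are assumptions based on the LocalFile connector docs, and the path mirrors the e2e test layout added in this commit):

```hocon
source {
  LocalFile {
    # Assumed options for illustration; the key point is the GZ archive codec.
    path = "/seatunnel/read/gz/txt/single"
    file_format_type = "text"
    archive_compress_codec = "GZ"
  }
}
```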

### encoding [string]
@@ -490,4 +491,6 @@ sink {
- [BugFix] Fix the bug of incorrect path in windows environment ([2980](https://github.com/apache/seatunnel/pull/2980))
- [Improve] Support extract partition from SeaTunnelRow fields ([3085](https://github.com/apache/seatunnel/pull/3085))
- [Improve] Support parse field from file path ([2985](https://github.com/apache/seatunnel/pull/2985))
### 2.3.9-beta 2024-11-12
- [Improve] Support parse field from file path ([8019](https://github.com/apache/seatunnel/issues/8019))

1 change: 1 addition & 0 deletions docs/en/connector-v2/source/OssJindoFile.md
@@ -335,6 +335,7 @@ The compress codec of archive files and the details that supported as the follow
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
| GZ | txt,json,xml | .gz |
| NONE | all | .* |

### encoding [string]
1 change: 1 addition & 0 deletions docs/en/connector-v2/source/S3File.md
@@ -299,6 +299,7 @@ The compress codec of archive files and the details that supported as the follow
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
| GZ | txt,json,xml | .gz |
| NONE | all | .* |

### encoding [string]
13 changes: 7 additions & 6 deletions docs/en/connector-v2/source/SftpFile.md
@@ -235,11 +235,12 @@ The compress codec of files and the details that supported as the following show
The compress codec of archive files and the details that supported as the following shown:

| archive_compress_codec | file_format | archive_compress_suffix |
|------------------------|--------------------|-------------------------|
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
| NONE | all | .* |
|--------------------|--------------------|---------------------|
| ZIP | txt,json,excel,xml | .zip |
| TAR | txt,json,excel,xml | .tar |
| TAR_GZ | txt,json,excel,xml | .tar.gz |
| GZ | txt,json,xml | .gz |
| NONE | all | .* |

### encoding [string]

@@ -384,4 +385,4 @@ sink {
Console {
}
}
```
```
40 changes: 37 additions & 3 deletions docs/en/seatunnel-engine/rest-api-v2.md
@@ -384,15 +384,18 @@ When we can't get the job info, the response will be:

#### Parameters

> | name | type | data type | description |
> | name | type | data type | description |
> |----------------------|----------|-----------|-----------------------------------|
> | jobId | optional | string | job id |
> | jobName | optional | string | job name |
> | isStartWithSavePoint | optional | string | if job is started with save point |
> | format               | optional | string    | config format, supports json and hocon, default json |
#### Body

```json
You can choose json or hocon to pass the request body.
The json format example:
``` json
{
"env": {
"job.mode": "batch"
@@ -421,6 +424,37 @@ When we can't get the job info, the response will be:
]
}
```
The hocon format example:
``` hocon
env {
job.mode = "batch"
}
source {
FakeSource {
result_table_name = "fake"
row.num = 100
schema = {
fields {
name = "string"
age = "int"
card = "int"
}
}
}
}
transform {
}
sink {
Console {
source_table_name = "fake"
}
}
```
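
For context, a minimal client-side sketch of posting either body to the submit endpoint (the host, port, path, and content type below are assumptions, not taken from this page; the `format=hocon` query parameter follows the parameter table above):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SubmitJobSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical address; replace host, port, and path with your REST API v2 endpoint.
        String url = "http://localhost:8080/submit-job?format=hocon";
        // job.conf holds the hocon body shown above.
        String body = new String(Files.readAllBytes(Paths.get("job.conf")), StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "text/plain") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```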


#### Responses

@@ -810,4 +844,4 @@ Returns a list of logs from the requested node.
To get a list of logs from the current node: `http://localhost:5801/log`
To get the content of a log file: `http://localhost:5801/log/job-898380162133917698.log`

</details>
</details>
2 changes: 1 addition & 1 deletion docs/zh/seatunnel-engine/rest-api-v1.md
@@ -2,7 +2,7 @@
sidebar_position: 11
---

# RESTful API
# RESTful API V1

:::caution warn

39 changes: 36 additions & 3 deletions docs/zh/seatunnel-engine/rest-api-v2.md
@@ -2,7 +2,7 @@
sidebar_position: 12
---

# RESTful API
# RESTful API V2

SeaTunnel has a monitoring API that can be used to query the status and statistics of running jobs, as well as recently finished jobs. The monitoring API is RESTful; it accepts HTTP requests and responds with JSON data.

@@ -380,14 +380,17 @@ seatunnel:

#### Parameters

> | name                 | required | type   | description                       |
> |----------------------|----------|--------|-----------------------------------|
> | name                 | required | type   | description                       |
> |----------------------|----------|-----------------------------------|-----------------------------------|
> | jobId | optional | string | job id |
> | jobName | optional | string | job name |
> | isStartWithSavePoint | optional | string | if job is started with save point |
> | format               | optional | string | config format, supports json and hocon, default json |
#### Request Body

You can choose json or hocon to pass the request body.
Json request example:
```json
{
"env": {
@@ -418,6 +421,36 @@ seatunnel:
}
```

Hocon request example:
```hocon
env {
job.mode = "batch"
}
source {
FakeSource {
result_table_name = "fake"
row.num = 100
schema = {
fields {
name = "string"
age = "int"
card = "int"
}
}
}
}
transform {
}
sink {
Console {
source_table_name = "fake"
}
}
```
#### Responses

```json
@@ -35,6 +35,7 @@ public enum ArchiveCompressFormat {
ZIP(".zip"),
TAR(".tar"),
TAR_GZ(".tar.gz"),
GZ(".gz"),
;
private final String archiveCompressCodec;

@@ -238,6 +238,11 @@ protected void resolveArchiveCompressedInputStream(
}
}
break;
case GZ:
GzipCompressorInputStream gzipIn =
new GzipCompressorInputStream(hadoopFileSystemProxy.getInputStream(path));
readProcess(path, tableId, output, copyInputStream(gzipIn), partitionsMap, path);
break;
case NONE:
readProcess(
path,
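As a standalone illustration of the decompression pattern used in the GZ branch above (the local file name is a placeholder; the connector reads its stream from the Hadoop file system proxy, and GzipCompressorInputStream comes from the commons-compress dependency the code already uses):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public class GzipReadSketch {
    public static void main(String[] args) throws Exception {
        // Wrap the raw stream in GzipCompressorInputStream and read the decompressed text,
        // mirroring how the GZ case hands a decompressed stream to readProcess.
        try (GzipCompressorInputStream gzIn =
                        new GzipCompressorInputStream(
                                Files.newInputStream(Paths.get("e2e-txt-gz.gz")));
                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(gzIn, StandardCharsets.UTF_8))) {
            reader.lines().forEach(System.out::println);
        }
    }
}
```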
@@ -50,6 +50,8 @@

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
@@ -149,6 +151,13 @@ public class LocalFileIT extends TestSuiteBase {
"/seatunnel/read/tar_gz/txt/multifile/multiTarGz.tar.gz",
container);

Path txtGz =
convertToGzFile(
Lists.newArrayList(ContainerUtil.getResourcesFile("/text/e2e.txt")),
"e2e-txt-gz");
ContainerUtil.copyFileIntoContainers(
txtGz, "/seatunnel/read/gz/txt/single/e2e-txt-gz.gz", container);

Path jsonZip =
convertToZipFile(
Lists.newArrayList(
@@ -168,6 +177,14 @@ public class LocalFileIT extends TestSuiteBase {
"/seatunnel/read/zip/json/multifile/multiJson.zip",
container);

Path jsonGz =
convertToGzFile(
Lists.newArrayList(
ContainerUtil.getResourcesFile("/json/e2e.json")),
"e2e-json-gz");
ContainerUtil.copyFileIntoContainers(
jsonGz, "/seatunnel/read/gz/json/single/e2e-json-gz.gz", container);

ContainerUtil.copyFileIntoContainers(
"/text/e2e_gbk.txt",
"/seatunnel/read/encoding/text/e2e_gbk.txt",
@@ -193,6 +210,13 @@ public class LocalFileIT extends TestSuiteBase {
ContainerUtil.copyFileIntoContainers(
xmlZip, "/seatunnel/read/zip/xml/single/e2e-xml.zip", container);

Path xmlGz =
convertToGzFile(
Lists.newArrayList(ContainerUtil.getResourcesFile("/xml/e2e.xml")),
"e2e-xml-gz");
ContainerUtil.copyFileIntoContainers(
xmlGz, "/seatunnel/read/gz/xml/single/e2e-xml-gz.gz", container);

Path txtLzo = convertToLzoFile(ContainerUtil.getResourcesFile("/text/e2e.txt"));
ContainerUtil.copyFileIntoContainers(
txtLzo, "/seatunnel/read/lzo_text/e2e.txt", container);
@@ -313,6 +337,7 @@ public void testLocalFileReadAndWrite(TestContainer container)
/** Compressed file test */
// test read single local text file with zip compression
helper.execute("/text/local_file_zip_text_to_assert.conf");
helper.execute("/text/local_file_gz_text_to_assert.conf");
// test read multi local text file with zip compression
helper.execute("/text/local_file_multi_zip_text_to_assert.conf");
// test read single local text file with tar compression
@@ -325,10 +350,12 @@ public void testLocalFileReadAndWrite(TestContainer container)
helper.execute("/text/local_file_multi_tar_gz_text_to_assert.conf");
// test read single local json file with zip compression
helper.execute("/json/local_file_json_zip_to_assert.conf");
helper.execute("/json/local_file_json_gz_to_assert.conf");
// test read multi local json file with zip compression
helper.execute("/json/local_file_json_multi_zip_to_assert.conf");
// test read single local xml file with zip compression
helper.execute("/xml/local_file_zip_xml_to_assert.conf");
helper.execute("/xml/local_file_gz_xml_to_assert.conf");
// test read single local excel file with zip compression
helper.execute("/excel/local_excel_zip_to_assert.conf");
// test read multi local excel file with zip compression
@@ -551,4 +578,29 @@ public FileVisitResult visitFile(

return tarGzFilePath;
}

public Path convertToGzFile(List<File> files, String name) throws IOException {
if (files == null || files.isEmpty()) {
throw new IllegalArgumentException("File list is empty or invalid");
}

File firstFile = files.get(0);
Path gzFilePath = Paths.get(firstFile.getParent(), String.format("%s.gz", name));

try (FileInputStream fis = new FileInputStream(firstFile);
FileOutputStream fos = new FileOutputStream(gzFilePath.toFile());
GZIPOutputStream gzos = new GZIPOutputStream(fos)) {

byte[] buffer = new byte[2048];
int length;

while ((length = fis.read(buffer)) > 0) {
gzos.write(buffer, 0, length);
}
gzos.finish();
} catch (IOException e) {
e.printStackTrace();
}
return gzFilePath;
}
}
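
For completeness, a small round-trip sketch (separate from the committed test, with placeholder paths) showing how output from a GZIPOutputStream-based helper such as convertToGzFile can be verified with java.util.zip:

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class GzRoundTripCheck {
    public static void main(String[] args) throws Exception {
        Path original = Paths.get("e2e.txt");     // placeholder: the uncompressed source file
        Path gz = Paths.get("e2e-txt-gz.gz");     // placeholder: the file produced by convertToGzFile
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            String decompressed = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            String expected = new String(Files.readAllBytes(original), StandardCharsets.UTF_8);
            System.out.println("round-trip matches: " + decompressed.equals(expected));
        }
    }
}
```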