
[Fix][Kafka] Fix in kafka streaming mode can not read incremental data #7871

Merged

merged 6 commits into apache:dev on Nov 16, 2024

Conversation

Carl-Zhou-CN
Member

@Carl-Zhou-CN Carl-Zhou-CN commented Oct 18, 2024

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Current test

Check list

@liunaijie liunaijie changed the title [Bugfix][Kafka] In kafak flow mode, stop offse should be Long.MAX_VALUE [Bugfix][Kafka] In kafka flow mode, stop offse should be Long.MAX_VALUE Oct 18, 2024
@hailin0
Member

hailin0 commented Oct 18, 2024

Please add test cases

@Hisoka-X Hisoka-X added this to the 2.3.9 milestone Oct 28, 2024
@weipengfei-sj

weipengfei-sj commented Oct 31, 2024

(screenshot of the affected code)

When running tasks in streaming mode, an empty map is returned, and a NullPointerException will occur at this location.

It also needs to be adjusted to the following format:

split.setEndOffset(
        latestOffsets.getOrDefault(
                split.getTopicPartition(), Long.MAX_VALUE));

@Carl-Zhou-CN
Member Author

> When running tasks in streaming mode, an empty map is returned, and a NullPointerException will occur at this location. It also needs to be adjusted to use `latestOffsets.getOrDefault(split.getTopicPartition(), Long.MAX_VALUE)`.

After the above modification, are there any remaining problems?


if (isStreamingMode) {
    return Collections.emptyMap();
}
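The failure mode discussed above can be reproduced with a minimal, self-contained sketch (class and method names here are hypothetical stand-ins, not SeaTunnel's actual API): when the enumerator returns an empty offsets map in streaming mode, `Map.get` yields `null`, and unboxing it into a `long` throws a NullPointerException, while `getOrDefault` falls back to `Long.MAX_VALUE` so the streaming split effectively has no stop offset.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of the getOrDefault fix (hypothetical names, not SeaTunnel's API). */
public class EndOffsetSketch {

    // Stand-in for a topic-partition key.
    record TopicPartition(String topic, int partition) {}

    /**
     * Buggy variant: Map.get returns null for a missing key, and unboxing
     * null into long throws a NullPointerException.
     */
    static long endOffsetBuggy(Map<TopicPartition, Long> latestOffsets, TopicPartition tp) {
        return latestOffsets.get(tp); // NPE when the map is empty
    }

    /** Fixed variant: a missing key falls back to Long.MAX_VALUE (no stop offset). */
    static long endOffsetFixed(Map<TopicPartition, Long> latestOffsets, TopicPartition tp) {
        return latestOffsets.getOrDefault(tp, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        // In streaming mode the enumerator returns an empty offsets map.
        Map<TopicPartition, Long> latestOffsets = new HashMap<>();
        TopicPartition tp = new TopicPartition("topic_1", 0);
        System.out.println(endOffsetFixed(latestOffsets, tp)); // prints 9223372036854775807
    }
}
```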
Member

@Carl-Zhou-CN The fix is LGTM.

But I have a question about Kafka batch mode: there is no option to set an end offset, so how does it stop in batch mode?

Member Author


In batch processing mode, the latest offset of each partition is captured when the split is created, and consumption stops once it reaches that offset.
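The batch-mode termination described above can be sketched as follows (hypothetical names, not SeaTunnel's actual reader): the end offset is snapshotted at split-creation time, so records appended to the partition after planning are not consumed by that job.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of batch-mode stop behavior (hypothetical names, not SeaTunnel's API). */
public class BatchStopSketch {

    /**
     * Consume offsets from the log until reaching the end offset that was
     * snapshotted when the split was created; later records are ignored.
     */
    static List<Long> readBatch(List<Long> log, long endOffsetAtPlanning) {
        List<Long> consumed = new ArrayList<>();
        for (long offset : log) {
            if (offset >= endOffsetAtPlanning) {
                break; // snapshot reached: the batch job finishes
            }
            consumed.add(offset);
        }
        return consumed;
    }

    public static void main(String[] args) {
        // Offsets 3 and 4 were appended after the split was created.
        List<Long> log = List.of(0L, 1L, 2L, 3L, 4L);
        System.out.println(readBatch(log, 3)); // prints [0, 1, 2]
    }
}
```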

Member


OK, it looks like we need to update the doc; I did not find any related notes.

Member Author


I'll be adding test cases and documentation by the end of the week.

Member


(screenshot of the relevant code)

It only consumes data whose offset is less than the stop offset. If this record is in the middle of a poll result, how do we commit the offset?

Member Author


Only the offset of the record currently being consumed is committed.
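The mid-poll case raised above can be sketched like this (hypothetical names, not SeaTunnel's actual reader logic): records at or beyond the stop offset are skipped even if they arrived in the same poll, and the committed position is the offset after the last record actually consumed.

```java
import java.util.List;

/** Sketch of stopping mid-poll at the stop offset (hypothetical names). */
public class StopOffsetPollSketch {

    record Record(long offset, String value) {}

    /**
     * Consume only records below stopOffset from one poll result and return
     * the position to commit (last consumed offset + 1), or -1 if nothing
     * was consumed.
     */
    static long consumeUpTo(List<Record> poll, long stopOffset) {
        long lastConsumed = -1;
        for (Record r : poll) {
            if (r.offset() >= stopOffset) {
                break; // stop offset reached mid-poll; skip the rest
            }
            // process r ...
            lastConsumed = r.offset();
        }
        return lastConsumed < 0 ? -1 : lastConsumed + 1;
    }

    public static void main(String[] args) {
        List<Record> poll = List.of(
                new Record(5, "a"), new Record(6, "b"), new Record(7, "c"));
        // Stop offset 7: records 5 and 6 are consumed, record 7 is skipped.
        System.out.println(consumeUpTo(poll, 7)); // prints 7
    }
}
```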

liunaijie
liunaijie previously approved these changes Nov 14, 2024
@liunaijie
Member

> When running tasks in streaming mode, an empty map is returned, and a NullPointerException will occur at this location. It also needs to be adjusted to use `latestOffsets.getOrDefault(split.getTopicPartition(), Long.MAX_VALUE)`.

Good catch. @Carl-Zhou-CN, we also need to update here.

Or, in streaming mode, could we return Long.MAX_VALUE instead of an empty map?

@Carl-Zhou-CN
Copy link
Member Author

> Or, in streaming mode, could we return Long.MAX_VALUE instead of an empty map?

Yes, that was overlooked.

Hisoka-X
Hisoka-X previously approved these changes Nov 15, 2024
Member

@Hisoka-X Hisoka-X left a comment


LGTM.

@@ -59,6 +59,7 @@ They can be downloaded via install-plugin.sh or from the Maven central repositor
### Simple

> This example reads the data of kafka's topic_1, topic_2, topic_3 and prints it to the client. And if you have not yet installed and deployed SeaTunnel, you need to follow the instructions in [Install SeaTunnel](../../start-v2/locally/deployment.md) to install and deploy SeaTunnel. And then follow the instructions in [Quick Start With SeaTunnel Engine](../../start-v2/locally/quick-start-seatunnel-engine.md) to run this job.
> In batch mode, it will consume continuously until it reaches the maximum offset.
Member


Suggest moving this hint to https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/source/kafka.md?plain=1#L14

And "the maximum offset" is not clear; in batch mode, it will stop when it reaches the offset captured at startup.

Member Author


This PR does not alter the behavior of the Kafka source. I believe adding it above would not make a significant difference, and it would disrupt the structure of the documentation.

Member Author


But I updated the description to explain the process.

@liunaijie
Member

LGTM

@Hisoka-X Hisoka-X changed the title [Bugfix][Kafka] In kafka flow mode, stop offse should be Long.MAX_VALUE [Fix][Kafka] Fix in kafka streaming mode can not read incremental data Nov 16, 2024
@Hisoka-X Hisoka-X merged commit a0eeeb9 into apache:dev Nov 16, 2024
7 checks passed
fcb-xiaobo pushed a commit to fcb-xiaobo/seatunnel that referenced this pull request Nov 18, 2024
hawk9821 pushed a commit to hawk9821/seatunnel that referenced this pull request Nov 18, 2024