Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streaming processing supports nullable datatype #594

Conversation

lizhou1111
Copy link
Contributor

@lizhou1111 lizhou1111 commented Mar 8, 2024

This pr is for supporting nullable data type in streaming processing.Actually proton had already supported the nullable data type, but it had some bugs. Full test report:Nullable datatype in streaming processing test report.

Bug 1: group by window_start and nullable column.
For example:

drop stream if exists test;
create stream test(id nullable(int), value int);
select id, window_start, max(value) from tumble(test, 5s) group by window_start, id;
-- No matter what data you insert in any speed it won't output any result

It won’t output any result.

Bug 2: when you create external stream with nullable column, in some data type it will lose data.
For example:

drop stream if exists test_external1;
create external stream if not exists test_external1(
	int8_c int8,
	ipv4_c nullable(ipv4)
) settings type = 'kafka', brokers = 'redpanda-1:9092', topic = 'test', data_format='JSONEachRow';

select * from test_external1;
-- rpk topic produce test
-- this data is ok
{"int8_c":1,"ipv4_c":"192.168.168.168"}

-- this data will be discard
{"int8_c":2,"ipv4_c":""}

-- But if you just don't have this field, it is ok, proton will fill this field with null.
{"int8_c":3}

-- proton output:
┌─int8_c─┬─ipv4_c──────────┐
│      1192.168.168.168 │
└────────┴─────────────────┘
┌─int8_c─┬─ipv4_c─┐
│      3 │ ᴺᵁᴸᴸ   │
└────────┴────────┘

Current progress:

  • Bug1 fixing
  • Bug2 fixing

For bug2, if a columns are nullable, there are 4 situation about inserting data in redpanda

  1. valid data({"key" : 1})
  2. insert null({"key" : null})
  3. insert just empty field({"key" : ""})
  4. insert invalid data({"key" : "some invalid data"})

In the former proton, proton will discard the whole message for 3 and 4(bug 2), now it only happen in 4(Maybe this shouldn't happen either?).

@lizhou1111 lizhou1111 self-assigned this Mar 8, 2024
@lizhou1111 lizhou1111 force-pushed the feature/issue-422-streaming-processing-supports-nullable-datatype branch 3 times, most recently from f03ffbd to a14d8be Compare March 11, 2024 08:41
@lizhou1111 lizhou1111 force-pushed the feature/issue-422-streaming-processing-supports-nullable-datatype branch from a14d8be to 2450404 Compare March 12, 2024 02:47
@lizhou1111 lizhou1111 marked this pull request as ready for review March 12, 2024 02:47
@lizhou1111 lizhou1111 requested a review from yl-lisen March 12, 2024 02:47
@lizhou1111 lizhou1111 added the enhancement New feature or request label Mar 12, 2024
@lizhou1111 lizhou1111 force-pushed the feature/issue-422-streaming-processing-supports-nullable-datatype branch from 2450404 to 3f82357 Compare March 15, 2024 04:27
Copy link
Collaborator

@yl-lisen yl-lisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, don't close #422 , it's an initial PR , we can merge it first
Next, we need do more nullable type supporting or checking for functions / other processing via following PRs

@lizhou1111 lizhou1111 merged commit 9619a57 into develop Mar 15, 2024
21 checks passed
@lizhou1111 lizhou1111 deleted the feature/issue-422-streaming-processing-supports-nullable-datatype branch March 15, 2024 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants