
RFC: Stream Executor with Emit on Window Close Semantics #51

Merged: 15 commits into main, Mar 1, 2024

Conversation

@st1page (Contributor) commented on Feb 14, 2023

No description provided.

@fuyufjh (Member) left a comment

The latter half of this document does not conform to Markdown format. (It requires 2 line breaks to open a new paragraph.)


### GroupAgg

If there is a watermark in the group key, the EMIT ON WINDOW CLOSE GroupAgg calculates the aggregation with the same logic as the normal stream GroupAgg, but it only emits the complete part of the aggregation result. A simple idea is to add a SortExecutor after a normal stream GroupAgg; notice that the stream GroupAgg's result table and the SortExecutor's state table have exactly the same schema and meaning.
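A minimal in-memory sketch of that idea, assuming a BTreeMap-backed buffer keyed by the watermarked column; the names (`SortBuffer`, `on_watermark`) are illustrative, not RisingWave's actual interface:

```rust
use std::collections::BTreeMap;

/// Sketch of the SortExecutor idea: buffer agg results ordered by the
/// watermarked group-key column and release only the complete prefix.
/// `WmKey` is the watermarked column (e.g. a window start timestamp);
/// `Row` stands in for a finished aggregation result row.
struct SortBuffer<WmKey: Ord, Row> {
    buffer: BTreeMap<WmKey, Vec<Row>>,
}

impl<WmKey: Ord, Row> SortBuffer<WmKey, Row> {
    fn new() -> Self {
        Self { buffer: BTreeMap::new() }
    }

    /// The upstream stream GroupAgg pushes (updated) results here.
    fn insert(&mut self, key: WmKey, row: Row) {
        self.buffer.entry(key).or_default().push(row);
    }

    /// On a watermark, every row strictly below it is complete: emit it
    /// downstream and drop it from the buffer, so the emitted part of
    /// the aggregation result never changes again.
    fn on_watermark(&mut self, watermark: &WmKey) -> Vec<Row> {
        // split_off returns the keys >= watermark (the incomplete part).
        let incomplete = self.buffer.split_off(watermark);
        let complete = std::mem::replace(&mut self.buffer, incomplete);
        complete.into_values().flatten().collect()
    }
}
```

In the real executor the buffer would of course be backed by the state table mentioned above for fault tolerance; the map here only illustrates the emit-on-watermark contract.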
(Member) commented

It seems unacceptable to me to use 2 caches in memory, which effectively cuts the available memory in half and doubles the IO cost of a cache miss.

@fuyufjh (Member) commented on Feb 21, 2023

I think the SortBufferCache can be omitted here. It isn't worth maintaining a full copy of the rows in memory just for emitting them when a window event is received.

@st1page (Contributor, Author) commented

Yes, that makes sense here: every record will be read just once and then deleted soon after, so the cache is meaningless here.
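To illustrate the cache-less design agreed on here, a sketch under the assumption of a hypothetical `StateTable` interface (the real storage API differs): on a watermark, the flush path scans the state table directly.

```rust
/// Hypothetical stand-in for the executor's state storage; this only
/// illustrates the access pattern, not RisingWave's real storage API.
trait StateTable {
    type Row;
    /// Ordered scan of all rows whose sort key is strictly below `watermark`.
    fn scan_below(&self, watermark: i64) -> Vec<Self::Row>;
    /// Delete the same range after it has been emitted.
    fn delete_below(&mut self, watermark: i64);
}

/// Flush on watermark without a SortBufferCache: each row is read exactly
/// once and deleted right away, so a read cache could never get a second
/// hit and is therefore omitted.
fn flush_on_watermark<T: StateTable>(table: &mut T, watermark: i64) -> Vec<T::Row> {
    let complete = table.scan_below(watermark);
    table.delete_below(watermark);
    complete
}
```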

@fuyufjh (Member) left a comment

It is like the window join in Flink.
When there are watermarks on the first join key of both sides, we can use two SortBuffers (table and cache) to implement a sort-merge join; the SortBuffer's `peek` method is used here, as in the sketch below.
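A hedged sketch of that sort-merge join, again modeling the SortBuffer with a BTreeMap; `peek`, `pop_min`, and the inner-join behavior are illustrative assumptions, not the executor's real interface:

```rust
use std::collections::BTreeMap;

/// Watermark-ordered buffer of one input side, rows grouped by the
/// watermarked join key.
struct SortBuffer<K: Ord, R> {
    rows: BTreeMap<K, Vec<R>>,
}

impl<K: Ord + Clone, R> SortBuffer<K, R> {
    /// Smallest buffered join key, if any.
    fn peek(&self) -> Option<&K> {
        self.rows.keys().next()
    }
    fn pop_min(&mut self) -> Option<(K, Vec<R>)> {
        self.rows.pop_first()
    }
}

/// Join the complete prefixes of both sides, i.e. everything strictly
/// below the combined watermark min(left_wm, right_wm).
fn sort_merge_join<K: Ord + Clone, R: Clone>(
    left: &mut SortBuffer<K, R>,
    right: &mut SortBuffer<K, R>,
    watermark: &K,
) -> Vec<(R, R)> {
    let mut out = Vec::new();
    while let (Some(lk), Some(rk)) = (left.peek().cloned(), right.peek().cloned()) {
        if &lk >= watermark || &rk >= watermark {
            break; // rows at or above the watermark may still get matches
        }
        if lk < rk {
            left.pop_min(); // no partner on the right: drop (inner join)
        } else if rk < lk {
            right.pop_min();
        } else {
            // Equal keys: emit the cross product of the two row groups.
            let (_, lrows) = left.pop_min().unwrap();
            let (_, rrows) = right.pop_min().unwrap();
            for l in &lrows {
                for r in &rrows {
                    out.push((l.clone(), r.clone()));
                }
            }
        }
    }
    out
}
```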

### IntervalJoin
(Member) commented

See #32 (comment)

In short, Flink puts the join key before the time column, rather than keeping 2 copies of the data. I think we may start with that first, so that it can hopefully reuse the source code of the normal version of the interval join operator.
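A small sketch of the key layout that comment describes, with illustrative types (`u64` join key, `i64` event time): ordering the state by (join key, event time) makes a lookup for one key's time window a single prefix range scan, with no second time-ordered copy of the data.

```rust
use std::collections::BTreeMap;

// Illustrative types; the real executor encodes keys into the state store.
type JoinKey = u64;
type EventTime = i64;
type Row = String;

/// With the state ordered by (join_key, event_time), all matches for one
/// join key within an event-time window form one contiguous range scan.
fn matches_in_window(
    state: &BTreeMap<(JoinKey, EventTime), Row>,
    key: JoinKey,
    lower: EventTime,
    upper: EventTime,
) -> Vec<&Row> {
    state
        .range((key, lower)..=(key, upper))
        .map(|(_, row)| row)
        .collect()
}
```

The presumable tradeoff is that watermark-driven state cleanup can no longer be one contiguous scan over a time prefix, but the layout matches the normal interval join's state, which is what makes the code reuse plausible.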

@chenzl25 (Contributor) commented

With emit on close semantics, it seems we can implement event-time temporal join as well, because these semantics ensure that all the data we need is complete.


@xxchan changed the title from "RFC: Stream Executor with Emit on Close Semantics" to "RFC: Stream Executor with Emit on Window Close Semantics" on Apr 12, 2023

@st1page merged commit 8439676 into main on Mar 1, 2024