
[Umbrella] Flink Engine Improvement and Quality Assurance #2100

Open
4 of 9 tasks
yaooqinn opened this issue Mar 11, 2022 · 15 comments

Comments

@yaooqinn (Member) commented Mar 11, 2022

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the proposal

We introduced the Flink engine in #1322.

In this ticket, we collect feedback, improvements, and bugfixes, aiming to make it production-ready.

Task list

Bugs

Improvements

Documentation

Brainstorming

Misc

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
@SteNicholas (Member) commented:

@yaooqinn, the module label should be flink, not hive.

@yaooqinn (Member, Author) commented:

> @yaooqinn, the module label should be flink, not hive.

oops..

@link3280 (Contributor) commented:

@yaooqinn shall we make this a KPIP and let the corresponding issues follow the naming pattern like [SUBTASK][KPIP-X]?

@yaooqinn (Member, Author) commented:

I am not sure we can propose a KPIP given the current status of this ticket, which does not seem to meet the requirements for a KPIP.

In fact, we should not create subtasks under KPIP-2 as it has been resolved. [SUBTASK][#2100] may be enough?

@link3280 (Contributor) commented:

@yaooqinn LGTM

zhaomin1423 added commits to zhaomin1423/kyuubi that referenced this issue (May 2–5, 2022)
pan3793 pushed a commit that referenced this issue May 23, 2022
### _Why are the changes needed?_
Currently, Flink uses its legacy data type system in CollectSink, but it will soon move to the new type system (see https://issues.apache.org/jira/browse/FLINK-12251). Kyuubi should adapt to the new data type system beforehand.

This PR supports StringData in Flink.

This is a subtask of #2100 .
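To illustrate the kind of adaptation the PR describes, here is a minimal, hypothetical sketch of converting row values under both type systems. `StringDataStub` is a stand-in written for this example; the real class is `org.apache.flink.table.data.StringData` in Flink's `flink-table-common` module, and the method names here are illustrative, not Kyuubi's actual API.

```java
import java.nio.charset.StandardCharsets;

public class TypeAdaptSketch {
    // Stand-in for Flink's internal binary string representation
    // (hypothetical stub; the real class is StringData).
    static final class StringDataStub {
        private final byte[] bytes;
        StringDataStub(String s) { this.bytes = s.getBytes(StandardCharsets.UTF_8); }
        @Override public String toString() { return new String(bytes, StandardCharsets.UTF_8); }
    }

    // Convert a column value to the string sent back to the client,
    // accepting both the legacy type (java.lang.String) and the new
    // internal representation.
    static String toClientString(Object value) {
        if (value instanceof String) return (String) value;            // legacy type system
        if (value instanceof StringDataStub) return value.toString();  // new type system
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(toClientString("plain"));
        System.out.println(toClientString(new StringDataStub("internal")));
    }
}
```

The point of handling both branches is that the engine keeps working regardless of which type system the Flink runtime hands it.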

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #2718 from link3280/KYUUBI-2405.


951b20a [Paul Lin] [KYUUBI#2405] Optimize code style
9236083 [Paul Lin] [KYUUBI#2405] Simplify sampling code
8708fa8 [Paul Lin] [KYUUBI#2405] Update comments
773d860 [Paul Lin] [KYUUBI#2405] Fix index out of range when sampling
b087b41 [Paul Lin] [KYUUBI#2405] Update externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/schema/RowSet.scala
dfeeda9 [Paul Lin] [KYUUBI#2405] Fix index out of range when result set is empty
e627e5f [Paul Lin] [KYUUBI#2405] Support Flink StringData Data Type

Authored-by: Paul Lin <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
@yaooqinn yaooqinn unpinned this issue Jul 14, 2022
@yaooqinn yaooqinn modified the milestones: v1.6.0, v1.7.0 Dec 6, 2022
@pan3793 pan3793 modified the milestones: v1.7.0, v1.8.0 Feb 7, 2023
@pan3793 (Member) commented Feb 7, 2023:

Postponed to 1.8, because this feature is not under rapid development and is not expected to be completed in a short time.

@waywtdcc (Contributor) commented:

Can the JDBC interface obtain results from asynchronous real-time tasks while they run? Can this be done? @pan3793

@pan3793 (Member) commented Mar 18, 2023:

@waywtdcc technically, I don't think there is any blocker in the Kyuubi framework: the JDBC driver retrieves results from the Kyuubi Server in mini-batches, and we do a similar thing in Spark, called incremental collection.

So it could work if the Flink engine can return the streaming data as an Iterator.

cc the Flink experts @SteNicholas @link3280 @yanghua
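The mini-batch retrieval described above can be sketched in plain Java: the client sees an ordinary `Iterator`, while rows are pulled from a source in fixed-size batches, analogous to a JDBC fetch size. `RowSource` and `incrementalIterator` are names invented for this sketch, not Kyuubi's actual API.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class MiniBatchFetch {
    interface RowSource {
        // Returns up to maxRows rows starting at offset; empty when exhausted.
        List<String> fetch(long offset, int maxRows);
    }

    // Wrap a batched source in a plain Iterator that fetches lazily.
    static Iterator<String> incrementalIterator(RowSource source, int batchSize) {
        return new Iterator<String>() {
            private long offset = 0;
            private Iterator<String> batch = Collections.emptyIterator();

            @Override public boolean hasNext() {
                if (batch.hasNext()) return true;
                List<String> next = source.fetch(offset, batchSize);
                offset += next.size();
                batch = next.iterator();
                return batch.hasNext();
            }

            @Override public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return batch.next();
            }
        };
    }
}
```

With this shape, a streaming engine only needs to make `fetch` block until the next batch of rows is available for the client to consume results incrementally.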

@pan3793 (Member) commented Mar 18, 2023:

@waywtdcc are you using Flink 1.14? Actually, the Kyuubi community is going to add support for Flink 1.17 and drop support for Flink 1.14, because of the lack of developer resources.

It would be great if you could share more about your use cases, challenges, and expectations for the Kyuubi Flink engine :)

@waywtdcc (Contributor) commented:

> @waywtdcc are you using Flink 1.14? Actually, the Kyuubi community is going to add support for Flink 1.17 and drop support for Flink 1.14, because of the lack of developer resources.
>
> It would be great if you could share more about your use cases, challenges, and expectations for the Kyuubi Flink engine :)

We use Flink 1.14 for data synchronization and real-time computing.

@waywtdcc (Contributor) commented:

> @waywtdcc technically, I don't think there is any blocker in the Kyuubi framework: the JDBC driver retrieves results from the Kyuubi Server in mini-batches, and we do a similar thing in Spark, called incremental collection.
>
> So it could work if the Flink engine can return the streaming data as an Iterator.
>
> cc the Flink experts @SteNicholas @link3280 @yanghua

Ok, I see. So what if I need to get the list of historical checkpoints, or stop a job after taking a savepoint?

@pan3793 (Member) commented Mar 20, 2023:

All you need to do is construct a proper FetchIterator on the Flink engine side.
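As a rough illustration of what such an iterator involves, here is a simplified stand-in that tracks its position so the server can re-fetch from an absolute offset. This is a sketch over an in-memory array; the method names approximate the idea, not the exact Kyuubi `FetchIterator` API, and a streaming version would pull from the Flink job instead of an array.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified engine-side fetch iterator with position tracking
// (illustrative only; not the exact Kyuubi API).
public class ArrayFetchIterator<T> implements Iterator<T> {
    private final T[] rows;
    private long position = 0;

    public ArrayFetchIterator(T[] rows) { this.rows = rows; }

    // Reposition to an absolute offset, e.g. when a client re-requests rows.
    public void fetchAbsolute(long pos) {
        position = Math.max(0, Math.min(pos, rows.length));
    }

    public long getPosition() { return position; }

    @Override public boolean hasNext() { return position < rows.length; }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return rows[(int) position++];
    }
}
```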

@link3280 (Contributor) commented:

> @waywtdcc technically, I don't think there is any blocker in the Kyuubi framework: the JDBC driver retrieves results from the Kyuubi Server in mini-batches, and we do a similar thing in Spark, called incremental collection.
> So it could work if the Flink engine can return the streaming data as an Iterator.
> cc the Flink experts @SteNicholas @link3280 @yanghua

> Ok, I see. So what if I need to get the list of historical checkpoints, or stop a job after taking a savepoint?

@waywtdcc There are ongoing efforts in Flink to improve savepoint management via SQL (see FLIP-222 for details). Kyuubi will support these statements once they are available.

@waywtdcc (Contributor) commented:

If we add a jar package, how can we execute a particular method from that jar?

@waywtdcc (Contributor) commented:

> All you need to do is construct a proper FetchIterator on the Flink engine side.

Yes, we also need to get the resulting data in a streaming manner.

@pan3793 pan3793 removed this from the v1.8.0 milestone Nov 6, 2023