Questions related to watermarking of Iceberg source #9138
Comments
@jonathf: Always good to know if somebody is interested in a feature, so feel free to ask your questions!
I will create a backport PR soon, which will add the feature to Flink 1.15, and Flink 1.16 too.
The only server-side component of Iceberg is the REST Catalog implementation, which is not affected by this change (and you might use a different Catalog anyway). So a Flink dependency upgrade is enough.
You need column metrics for your tables, so please make sure that the file statistics are there for the given column (you might need to rewrite the files, if they are missing). If the metrics are there, then just enable the watermark generation using If you have missing statistics, you will get the following exception:
|
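To illustrate why the file statistics matter, here is a minimal sketch. It is not Iceberg's actual implementation. It assumes the watermark for a split is derived from the lower bound recorded for the watermark column in each data file. The names `DataFileStats` and `split_watermark`, the file paths, and the epoch-millisecond unit are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DataFileStats:
    """Hypothetical stand-in for the column metrics of one data file."""
    path: str
    # Lower bound (epoch millis) of the watermark column, or None when
    # the file was written without metrics for that column.
    ts_lower_bound_ms: Optional[int]


def split_watermark(files: Sequence[DataFileStats]) -> int:
    """Return the smallest lower bound across the files of one split."""
    if not files:
        raise ValueError("a split needs at least one data file")
    missing = [f.path for f in files if f.ts_lower_bound_ms is None]
    if missing:
        # Without statistics there is nothing to derive a watermark from.
        raise ValueError(f"missing column statistics for: {missing}")
    return min(f.ts_lower_bound_ms for f in files)


files = [
    DataFileStats("a.parquet", 1_700_000_060_000),
    DataFileStats("b.parquet", 1_700_000_000_000),
]
watermark = split_watermark(files)
```

In this sketch a file without a recorded lower bound fails fast, which is the situation where rewriting the files to add the missing metrics would be needed.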
I am glad to hear that! And thank you for the swift and in-depth answer.
We are on an AWS managed Flink application, which currently only supports versions up to 1.15. Do you know if or when AWS will update its support?
Good to know.
Thank you. I am not familiar with column metrics in Flink. Is there documentation you can point me to so I can read up on the topic? |
#9139 is the backport PR.
Sadly, I have no information about this. I would ask on the Iceberg user list or Slack channel, as there are several AWS maintainers in the community.
These are not Flink column metrics. They are Iceberg column metrics. By default, metrics collection is turned on, but it can be fine-tuned with these configs:
|
Okay, understood. I misunderstood the column metrics. It looks like metrics are off on our end, but it seems they can easily be added through AWS Glue. We will definitely give it a try (with a full rewrite). Our watermark column is of type timestamp, inherited from Kafka through Flink before being output to Iceberg. Can I assume this is interpreted correctly? |
Yes, that's the plan |
Great! So to ensure I understand you correctly: |
Once #9139 is merged, you can use it to compile your own version of iceberg-flink-runtime, but it will not be officially supported by the community, because 1.15 support for this feature will likely never be released. |
@jonathf: Do you have any more questions? Could we close this issue? |
One last question: |
There is a link on the web page: https://iceberg.apache.org/ which is hard to miss 😄 |
Heh, I tried googling it, and it sent me in the wrong direction. Thanks for all your help @pvary. |
Query engine
Flink 1.15.2
Question
@pvary, congratulations on getting #8553 merged! It has been interesting to follow the progress of the PR.
I know that it is still going to take a little time before the next release, but I have some practical questions for when it is released:
I apologize if I am too quick on the trigger and these questions will be answered in e.g. the docs soon.