[FEA] Take into account org.apache.spark.timeZone in Parquet/Avro from Spark 3.2
#9632
Comments
Scope for this issue: throw an exception at runtime if we detect a non-UTC timezone. Also, make sure to file a separate issue to track fixing this once we have full timezone support enabled.
Reading Parquet now checks the timezone. Additional work is only needed for Avro.
I tested Avro in a round-trip integration test. Based on that test, our GPU Avro scan does not support timestamp or date types yet. So we should probably narrow the scope to Parquet only and file another ticket to track the Avro part. We can revisit it once there is customer demand for Avro; that work may require timestamp and date support in the Avro reader as a prerequisite.
It seems that we don't support date/time in Avro yet, so I'm fine with closing this; we can reopen it or file a new issue when it is needed.
@ttnghia Do we have a PR linked to this JIRA?
Yes, the config has been checked since PR #9631.
From Spark 3.2, a new metadata key, org.apache.spark.timeZone, is written into the output Parquet/Avro file. That metadata is used to check and rebase datetime values, so we need to check it while rebasing datetime values. In particular, we need to throw an exception if the file was written in a timezone other than UTC.
Ref: apache/spark#34973
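A minimal sketch of the requested check, written in Python for brevity rather than in the plugin's own language. It assumes the file's key/value footer metadata has already been read into a plain dict of strings; the function name `check_writer_timezone`, the exception class `UnsupportedTimeZoneError`, and the set of zone IDs treated as UTC are all hypothetical. Passing through files that lack the key (for example, files written before Spark 3.2) is also an assumption of this sketch, not behavior confirmed by the issue.

```python
from typing import Mapping, Optional

# Metadata key named in this issue (written by Spark 3.2+).
SPARK_TIMEZONE_KEY = "org.apache.spark.timeZone"

# Hypothetical: zone IDs this sketch treats as equivalent to UTC.
UTC_ALIASES = {"UTC", "Etc/UTC", "Z", "+00:00"}


class UnsupportedTimeZoneError(Exception):
    """Hypothetical error raised when a file was written in a non-UTC time zone."""


def check_writer_timezone(footer_metadata: Mapping[str, str]) -> Optional[str]:
    """Return the writer's time zone ID, or None if the key is absent.

    Raises UnsupportedTimeZoneError if the key is present but not UTC.
    """
    tz = footer_metadata.get(SPARK_TIMEZONE_KEY)
    if tz is None:
        # Assumption: files without the key are not checked by this sketch.
        return None
    if tz.strip() not in UTC_ALIASES:
        raise UnsupportedTimeZoneError(
            f"file was written with {SPARK_TIMEZONE_KEY}={tz!r}; only UTC is supported"
        )
    return tz
```

For example, `check_writer_timezone({"org.apache.spark.timeZone": "UTC"})` returns `"UTC"`, while a file written with `America/Los_Angeles` raises `UnsupportedTimeZoneError`. Failing fast at read time keeps non-UTC files from being silently rebased with the wrong zone until full timezone support exists.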