Allow ORC tests to run with wider range of timestamp input #6545
Signed-off-by: Nghia Truong <[email protected]>
Changes here look OK, although there are more tests to clean up after the ORC timestamp fix. See get_orc_timestamp_gen in orc_test.py and _restricted_timestamp in hive_write_test.py. This can be done in a followup PR, but it seems very relevant to the theme of this PR.
Sure. I'll update all these related tests too.
I was working on the ORC timestamp tests as indicated by @jlowe and discovered a new bug where the day number is read/written incorrectly (rapidsai/cudf#11691). Thus, this PR should be merged as-is without the suggested changes. They will be addressed later, once rapidsai/cudf#11691 is fixed.
build
CI failed with unrelated tests:
build
It appears that the new issue is actually an older issue related to the Gregorian calendar switch. It was previously avoided by not generating timestamps before the Gregorian calendar conversion; for example, see https://github.com/NVIDIA/spark-rapids/blob/branch-21.08/integration_tests/src/main/python/orc_test.py#L51 from 22.08. I'm fine with not fixing it here, but if we leave it for a followup, we need an issue to track it.
Do we have an issue tracking that, either in cudf or here?
Yes, #131 for the ORC reader and #139 for the ORC writer. This limitation is also documented in the compatibility document.
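For readers following along, the workaround mentioned above (not generating timestamps before the Gregorian calendar switch) can be sketched in plain Python. This is only an illustration under stated assumptions: `gen_restricted_timestamps`, `SAFE_START`, and `SAFE_END` are hypothetical names, not the actual generators in the integration tests, and the 1590 lower bound is taken from the discussion in this thread.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical bounds for illustration. 1590 is safely after the
# October 1582 Gregorian calendar switch; the upper bound is arbitrary.
SAFE_START = datetime(1590, 1, 1, tzinfo=timezone.utc)
SAFE_END = datetime(2100, 1, 1, tzinfo=timezone.utc)


def gen_restricted_timestamps(n, seed=0, start=SAFE_START, end=SAFE_END):
    """Return n pseudo-random UTC timestamps in [start, end].

    A fixed seed keeps the generated data reproducible between runs.
    """
    rng = random.Random(seed)
    # Total width of the allowed range, in whole microseconds.
    span_us = (end - start) // timedelta(microseconds=1)
    return [
        start + timedelta(microseconds=rng.randint(0, span_us))
        for _ in range(n)
    ]


if __name__ == "__main__":
    for ts in gen_restricted_timestamps(3, seed=42):
        print(ts.isoformat())
```

Because the lower bound is a parameter, widening the input range later (the goal of this PR) would only require changing `start` rather than rewriting the generator.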
I set orc_gen to use only timestamps after 1590 but still see failures, so this should be a new bug:
This time they are failures with
Converting this into a draft since there are still failures that may need a cudf fix.
orc_test.py#test_read_with_more_columns
build
This reverses the changes for orc_test.py#test_read_with_more_columns that were made in #6286 due to a bug in cudf's ORC reader. The bug is fixed by rapidsai/cudf#11586. Closes #6312.
Depends on:
CI will not pass until they are all merged and spark-rapids-jni is updated with them.