Skip fastparquet timestamp tests when plugin cannot read/write timestamps #9831
Fixes #9776.
The tests in `fastparquet_compatibility_test.py` check for compatibility between Apache Spark, the Spark RAPIDS plugin, and fastparquet. In particular:

- `test_reading_file_written_by_spark_cpu` checks if timestamp columns written with Apache Spark are read similarly with fastparquet and the plugin.
- `test_reading_file_written_with_gpu` checks if timestamps written with the plugin are read the same on Apache Spark and fastparquet.

If the timezone is not set to `UTC`, and the system timezone isn't `UTC` either, the plugin falls back to CPU for read/write of Parquet timestamp columns. This would cause the above tests not to run: the plugin can neither read nor write timestamps on GPU.

Further, fastparquet seems to interpret timestamps written from Spark as being in `UTC`, regardless of the timezone settings. So on non-UTC timezones, Apache Spark and fastparquet get different results for the same input.

For the two reasons above, it is best to only run the three-way timestamp comparison tests in setups with `UTC` timezone.

This commit skips the timestamp tests described above, when a non-UTC timezone is detected.
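
For illustration only, the skip condition could look roughly like the sketch below. This is not the code in this PR: `is_utc` and `should_skip_timestamp_tests` are hypothetical helpers, and the sketch assumes the effective timezone is the session setting when one is given, falling back to the system timezone otherwise.

```python
from typing import Optional

# Names treated as UTC in this sketch (an assumption, not an exhaustive list).
_UTC_ALIASES = {"UTC", "ETC/UTC", "Z"}


def is_utc(tz_name: Optional[str]) -> bool:
    """Return True if tz_name is one of the UTC aliases above (case-insensitive)."""
    return tz_name is not None and tz_name.strip().upper() in _UTC_ALIASES


def should_skip_timestamp_tests(session_tz: Optional[str], system_tz: str) -> bool:
    """Hypothetical skip condition for the three-way timestamp tests.

    Assumption: an explicitly configured session timezone takes precedence;
    when none is set, the system timezone applies.
    """
    effective_tz = session_tz if session_tz else system_tz
    return not is_utc(effective_tz)


# No session timezone set, non-UTC system timezone: skip the timestamp tests.
print(should_skip_timestamp_tests(None, "America/Los_Angeles"))
# Session timezone explicitly set to UTC: run the timestamp tests.
print(should_skip_timestamp_tests("UTC", "Asia/Tokyo"))
```

A boolean like this could serve as the condition for `pytest.mark.skipif` on the timestamp tests, so that the non-timestamp compatibility tests still run in every timezone.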