Support fine-grained timezone checker instead of type based [databricks] #9719
```diff
@@ -565,13 +565,14 @@ def test_csv_read_count(spark_tmp_path):
     assert_gpu_and_cpu_row_counts_equal(lambda spark: spark.read.csv(data_path),
             conf = {'spark.rapids.sql.explain': 'ALL'})

-@allow_non_gpu('FileSourceScanExec', 'ProjectExec', 'CollectLimitExec', 'DeserializeToObjectExec', *non_utc_allow)
+@allow_non_gpu('FileSourceScanExec', 'ProjectExec', 'CollectLimitExec', 'DeserializeToObjectExec')
 @pytest.mark.skipif(is_before_spark_341(), reason='`TIMESTAMP_NTZ` is only supported in PySpark 341+')
 @pytest.mark.parametrize('date_format', csv_supported_date_formats)
 @pytest.mark.parametrize('ts_part', csv_supported_ts_parts)
 @pytest.mark.parametrize("timestamp_type", [
     pytest.param('TIMESTAMP_LTZ', marks=pytest.mark.xfail(is_spark_350_or_later(), reason="https://github.com/NVIDIA/spark-rapids/issues/9325")),
     "TIMESTAMP_NTZ"])
+@pytest.mark.xfail(is_not_utc(), reason='Timezone is not supported for csv format as https://github.com/NVIDIA/spark-rapids/issues/9653.')
 def test_csv_infer_schema_timestamp_ntz_v1(spark_tmp_path, date_format, ts_part, timestamp_type):
     csv_infer_schema_timestamp_ntz(spark_tmp_path, date_format, ts_part, timestamp_type, 'csv', 'FileSourceScanExec')
```

```diff
@@ -622,9 +623,9 @@ def do_read(spark):
                                 non_exist_classes = cpu_scan_class,
                                 conf = conf)

-@allow_non_gpu('FileSourceScanExec', 'CollectLimitExec', 'DeserializeToObjectExec', *non_utc_allow)
+@allow_non_gpu('FileSourceScanExec', 'CollectLimitExec', 'DeserializeToObjectExec')
 @pytest.mark.skipif(is_before_spark_340(), reason='`preferDate` is only supported in Spark 340+')
+@pytest.mark.xfail(is_not_utc(), reason='Timezone is not supported for csv format as https://github.com/NVIDIA/spark-rapids/issues/9653.')
 def test_csv_prefer_date_with_infer_schema(spark_tmp_path):
     # start date "0001-01-02" required due to: https://github.com/NVIDIA/spark-rapids/issues/5606
     data_gens = [byte_gen, short_gen, int_gen, long_gen, boolean_gen, timestamp_gen, DateGen(start=date(1, 1, 2))]
```

Reviewer: Same here. Why xfail?

Author: Same reason as above. The `csv_infer_schema_timestamp_ntz` helper contains a class-existence capture assert in addition to the `allow_non_utc` case. Since other test cases already cover the timezone behavior for CSV, we xfail this one instead of making the original test too complex.

Reviewer: So then how does #9653 cover that? I don't see CSV or JSON mentioned in there at all, and I don't see any mention of `TIMESTAMP_NTZ` either. I just want to be 100% sure that we don't end up dropping something and getting data corruption that we missed because of an xfail. I would rather see a complex test, or at a minimum an issue filed to do the test correctly.

Author: Updated by refining the test cases.
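For readers unfamiliar with the marker used in both hunks: a conditional `pytest.mark.xfail` evaluates its condition once, when the decorator is applied, so the test still must pass in UTC runs and is only reported as XFAIL (rather than FAIL) under other timezones. A minimal sketch follows; the `is_not_utc` helper here is a stand-in assumption (the real spark-rapids helper inspects the configured Spark session timezone, not an environment variable):

```python
import os
import pytest

# Stand-in for the plugin's is_not_utc() helper -- an assumption for this
# sketch only; the real helper checks the Spark session timezone setting.
def is_not_utc():
    return os.environ.get("TZ", "UTC") not in ("UTC", "Etc/UTC")

# The condition is evaluated at decoration time: under UTC the test is a
# normal (must-pass) test; under any other timezone a failure is reported
# as XFAIL instead of FAIL.
@pytest.mark.xfail(is_not_utc(),
                   reason='Timezone is not supported for csv format as '
                          'https://github.com/NVIDIA/spark-rapids/issues/9653.')
def test_csv_timezone_sensitive():
    pass
```

Note the contrast with `@allow_non_gpu(...)`: the allow-list permits specific operators to fall back to the CPU without failing the run, while `xfail` marks the entire test as expected to fail under the given condition.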
Reviewer: Why is this going back to an xfail instead of `allow_non_utc`?

Author: The `csv_infer_schema_timestamp_ntz` helper contains a class-existence capture assert in addition to the `allow_non_utc` case. Since other test cases already cover the timezone behavior for CSV, we xfail this one instead of making the original test too complex.
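The "class-existence capture assert" the author refers to can be pictured as follows. This is a hypothetical sketch, not the actual `csv_infer_schema_timestamp_ntz` helper (its name, signature, and class names here are illustrative): because the helper asserts that a particular scan class appears in the captured plan, merely allow-listing CPU fallback operators via `allow_non_gpu` would not stop the assert from firing under a non-UTC timezone, which is why the whole test is marked xfail instead.

```python
# Hypothetical sketch of the structure described in the comment above;
# names and signatures are illustrative, not the spark-rapids helpers.
def check_scan_class(plan_classes, expected_scan_class):
    """Class-existence capture assert: fails when CPU fallback replaced
    the expected scan, regardless of any allow_non_gpu allow-list."""
    assert expected_scan_class in plan_classes, (
        f"{expected_scan_class} not captured in plan: {plan_classes}")

# Under UTC the GPU scan is present in the plan and the assert passes.
check_scan_class(["GpuFileSourceScanExec", "ProjectExec"],
                 "GpuFileSourceScanExec")
```

Under a non-UTC timezone the scan falls back to `FileSourceScanExec`, so the assert would fail even with the operator allow-listed, making the xfail the simpler choice.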