Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output sqllogictests with arrow display rather than CSV writer #4578

Merged
merged 2 commits into from
Dec 11, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Dec 10, 2022

Which issue does this PR close?

re #4460

Rationale for this change

A follow up from #4547 which parsed CSV writer output

After #4547, the sqllogictest code writes RecordBatches to csv and then reparses it. This can lead to misleading results if the data had ' ' in it, for example.

What changes are included in this PR?

  1. Generate Vec<Vec<String>> directly
  2. Use arrow::util::display for string display rather than the writer. This is what pretty_format_batches uses underneath so should be more consistent with existing tests
  3. Check that all output RecordBatches have the same schema

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Dec 10, 2022
@@ -52,7 +52,7 @@ SELECT var_pop(c2) FROM aggregate_test_100
query R
SELECT var_pop(c6) FROM aggregate_test_100
----
2.615633434202189e37
26156334342021890000000000000000000000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these differences are due the difference in arrow::util::display::array_value_to_string and the CSV writer. I think the changes are an improvement

@alamb alamb force-pushed the alamb/sqllogic_test_less_parse branch from 89c6b3b to 596b3e1 Compare December 10, 2022 11:53
@xudong963 xudong963 self-requested a review December 10, 2022 11:55
Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clear, thanks @alamb

})
.collect();
// Convert to normal string representation
let s = arrow::util::display::array_value_to_string(col, row)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following is more simple and reduces assignment. (85~97)

Suggested change
let s = arrow::util::display::array_value_to_string(col, row)
let mut s = arrow::util::display::array_value_to_string(col, row)
.map_err(DFSqlLogicTestError::Arrow)?;
// apply subsequent normalization depending on type
if matches!(col.data_type(), DataType::Utf8 | DataType::LargeUtf8) && s.is_empty() {
// All empty strings are replaced with this value
s = "(empty)".to_string()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea -- done in d0e7db9

@alamb alamb changed the title Output sqllogictests with arrow display Output sqllogictests with arrow display rather than CSV writer Dec 11, 2022
@xudong963 xudong963 merged commit fb85049 into apache:master Dec 11, 2022
@alamb alamb deleted the alamb/sqllogic_test_less_parse branch August 8, 2023 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants