Output sqllogictests with arrow display rather than CSV writer #4578
@@ -52,7 +52,7 @@ SELECT var_pop(c2) FROM aggregate_test_100
 query R
 SELECT var_pop(c6) FROM aggregate_test_100
 ----
-2.615633434202189e37
+26156334342021890000000000000000000000
These differences are due to the difference between arrow::util::display::array_value_to_string
and the CSV writer. I think the changes are an improvement.
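To make the two renderings in the diff above concrete, here is a minimal sketch using only the Rust standard library. It assumes, as the new expected output suggests, that the arrow display path ends up with Rust's plain `Display` formatting for `f64`; the `{:e}` form is used only to reproduce the scientific-notation shape of the old output and is not a claim about how the CSV writer is implemented. `render_both` is a hypothetical helper name.

```rust
/// Hypothetical helper: render one f64 in plain and in scientific notation.
fn render_both(v: f64) -> (String, String) {
    // `Display` for f64 never switches to exponent notation.
    let plain = format!("{}", v);
    // `LowerExp` always uses exponent notation.
    let sci = format!("{:e}", v);
    (plain, sci)
}

fn main() {
    let (plain, sci) = render_both(2.615633434202189e37);
    println!("plain: {}", plain);
    println!("sci:   {}", sci);
    // Both strings denote the same value; only the presentation differs.
    assert_eq!(plain.parse::<f64>().unwrap(), sci.parse::<f64>().unwrap());
}
```

Either string parses back to the same `f64`, so the change in the expected test output is purely one of presentation.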
Very clear, thanks @alamb
})
.collect();
// Convert to normal string representation
let s = arrow::util::display::array_value_to_string(col, row)
I think the following is simpler and avoids an extra assignment (lines 85~97):
-let s = arrow::util::display::array_value_to_string(col, row)
+let mut s = arrow::util::display::array_value_to_string(col, row)
+    .map_err(DFSqlLogicTestError::Arrow)?;
+// apply subsequent normalization depending on type
+if matches!(col.data_type(), DataType::Utf8 | DataType::LargeUtf8) && s.is_empty() {
+    // All empty strings are replaced with this value
+    s = "(empty)".to_string()
+}
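The normalization in this suggestion can be tried in isolation. The following is a rough, dependency-free sketch: `ColumnType` is a hypothetical stand-in for arrow's `DataType` (only the variants needed here), and `normalize_cell` is an illustrative name, not a function from the PR.

```rust
/// Hypothetical stand-in for arrow's `DataType`; only a few variants are modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Utf8,
    LargeUtf8,
    Float64,
}

/// Replace empty string cells with the "(empty)" marker, leaving
/// every other cell untouched (illustrative helper, not from the PR).
fn normalize_cell(mut s: String, ty: ColumnType) -> String {
    if matches!(ty, ColumnType::Utf8 | ColumnType::LargeUtf8) && s.is_empty() {
        // All empty strings are replaced with this value
        s = "(empty)".to_string();
    }
    s
}

fn main() {
    println!("{}", normalize_cell(String::new(), ColumnType::Utf8));
    println!("{}", normalize_cell("abc".to_string(), ColumnType::LargeUtf8));
    println!("{}", normalize_cell("1.5".to_string(), ColumnType::Float64));
}
```

Making `s` mutable and overwriting it only in the string case keeps a single binding, which is the point of the suggestion.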
Good idea -- done in d0e7db9
Which issue does this PR close?
re #4460
Rationale for this change
A follow-up to #4547, which parsed CSV writer output.
After #4547, the sqllogictest code writes RecordBatches to CSV and then reparses the output. This can lead to misleading results if the data had ' ' in it, for example.
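As a toy illustration of why a write-then-reparse round trip can mislead, consider the sketch below. It is not the code from #4547 and does not use the arrow CSV writer; the delimiter (a comma) and both function names are assumptions chosen for illustration. The point is only that once cells are flattened into one delimited line, a cell containing the delimiter can no longer be told apart from two cells.

```rust
/// Toy round trip (assumption, not the #4547 code): join cells into one
/// comma-separated line, then split the line back into cells.
fn naive_csv_round_trip(row: &[&str]) -> Vec<String> {
    let line = row.join(",");
    line.split(',').map(|s| s.to_string()).collect()
}

/// Direct conversion: each cell becomes a String without any intermediate text format.
fn direct(row: &[&str]) -> Vec<String> {
    row.iter().map(|s| s.to_string()).collect()
}

fn main() {
    let row = ["a,b", "c"];
    // The round trip splits the first cell in two; the direct path keeps it whole.
    println!("round trip: {:?}", naive_csv_round_trip(&row));
    println!("direct:     {:?}", direct(&row));
}
```

Converting values to strings directly avoids the intermediate text format altogether, so cell boundaries never have to be recovered by parsing.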
What changes are included in this PR?
- Produce Vec<Vec<String>> directly
- Use arrow::util::display for string display rather than the writer. This is what pretty_format_batches uses underneath, so it should be more consistent with existing tests
Are these changes tested?
Are there any user-facing changes?