Empty strings in CSV files aren't being interpreted as null when using a Dictionary(_, Utf8)
#12041
Comments
take
@alamb shouldn't the CSV reader also throw an error, because "bob" is not a valid dictionary?
I agree the discrepancy between Utf8 and Dictionary looks like a bug
I think
@alamb before I file an issue against arrow-csv: why is "bob" a valid value for DictionaryArray? Don't you need a key and a value for a dictionary?
I think a more precise version of this would be "bob" is a valid value for
@alamb I have a reproducer for Arrow CSV; see https://github.com/apache/arrow-rs/compare/main...edmondop:arrow-rs:datafusion-12041?expand=1 However, I don't know if we should change the behavior of the Arrow CSV reader to return null for empty nullable dictionaries. I filed apache/arrow-rs#6821 to track the discussion.
Pending the merge and release of apache/arrow-rs#6830 in the upstream arrow-rs dependency.
Describe the bug
Related to #7797
Empty strings in CSV files aren't being interpreted as null when using a Dictionary(_, Utf8) column.
To Reproduce
Create a simple input.csv file like this:
Run the following code:
Expected behavior
I was expecting the output to look like this:
But the full dataset is returned instead:
Additional context
Tested on v41.0.0
Replace DataType::Dictionary(Box::new(DataType::UInt8), Box::new(DataType::Utf8)) with DataType::Utf8 and it works.
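For readers wondering how a plain string such as "bob" can be a valid value for a dictionary column, and what "empty string becomes null" would mean there, here is a minimal std-only sketch. It is a simplified model, not arrow-rs code: `DictColumn`, `encode`, and `decode` are hypothetical names, the sample fields are made up, and treating every empty field as null is the behavior this issue asks for rather than a confirmed description of the reader. The idea is that each distinct string is stored once in `values`, and every row holds only a small integer key (here `u8`, mirroring Dictionary(UInt8, Utf8)) or a null.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a dictionary-encoded string column.
/// Hypothetical type for illustration; not arrow's DictionaryArray.
#[derive(Debug, PartialEq)]
struct DictColumn {
    /// Distinct non-empty strings, each stored once.
    values: Vec<String>,
    /// One entry per row: an index into `values`, or None for null.
    keys: Vec<Option<u8>>,
}

/// Dictionary-encode raw CSV fields, mapping empty fields to null keys.
/// Returns None if there are more distinct values than a u8 key can address.
fn encode(fields: &[&str]) -> Option<DictColumn> {
    let mut values: Vec<String> = Vec::new();
    let mut index: HashMap<&str, u8> = HashMap::new();
    let mut keys: Vec<Option<u8>> = Vec::with_capacity(fields.len());

    for &field in fields {
        // Assumption: an empty field is read as null, as it is for a Utf8 column.
        if field.is_empty() {
            keys.push(None);
            continue;
        }
        let key = match index.get(field) {
            // Value already in the dictionary: reuse its key.
            Some(&k) => k,
            // New value: append it and hand out the next key.
            None => {
                if values.len() > u8::MAX as usize {
                    return None; // key space exhausted
                }
                let k = values.len() as u8;
                values.push(field.to_string());
                index.insert(field, k);
                k
            }
        };
        keys.push(Some(key));
    }
    Some(DictColumn { values, keys })
}

/// Expand the column back into one optional string per row.
fn decode(col: &DictColumn) -> Vec<Option<&str>> {
    col.keys
        .iter()
        .map(|&k| k.map(|i| col.values[i as usize].as_str()))
        .collect()
}
```

Under this model, encoding the fields "alice", "", "bob", "alice" stores only two strings and four keys, with the second row null. So the caller never supplies a key/value pair: the key is assigned by the encoder, and the string itself is the only input.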