Support postgres NUMERIC type fields #10
I'd say the problem is that pg2parquet always writes decimals as the fixed_len_byte_array Parquet type, even if the data would fit into Int64 or Int32. That was easier to implement, since I can write all precisions as a single data type, and it was supported by the tools that I tried to use. I'll add support for writing decimals as Int64 for better compatibility (it should also be more efficient). You'll then have to reduce the precision, since the default is 128 bits AFAIK.
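For context, the Parquet format specification lets a DECIMAL column be backed by INT32 (precision up to 9), INT64 (precision up to 18), or FIXED_LEN_BYTE_ARRAY (any precision, given enough bytes). An n-byte signed two's-complement integer can hold any value with up to floor(log10(2^(8n-1) - 1)) digits, which is why a 128-bit (16-byte) decimal corresponds to precision 38 and why switching to Int64 means reducing the precision to 18 or less. The sketch below computes that mapping; `max_decimal_digits` and `choose_physical_type` are hypothetical helpers, not pg2parquet code.

```python
def max_decimal_digits(num_bytes: int) -> int:
    """Largest precision p such that every p-digit integer fits in
    num_bytes of signed two's complement."""
    largest = 2 ** (8 * num_bytes - 1) - 1
    # The top digit position is only partly usable, so subtract one.
    return len(str(largest)) - 1


def choose_physical_type(precision: int) -> tuple[str, int]:
    """Pick the narrowest Parquet physical type for a DECIMAL of this precision.
    Hypothetical helper: prefers INT32/INT64 for reader compatibility."""
    if precision < 1:
        raise ValueError("precision must be >= 1")
    if precision <= max_decimal_digits(4):
        return ("INT32", 4)
    if precision <= max_decimal_digits(8):
        return ("INT64", 8)
    # Otherwise grow a fixed-length byte array until the precision fits.
    num_bytes = 9
    while max_decimal_digits(num_bytes) < precision:
        num_bytes += 1
    return ("FIXED_LEN_BYTE_ARRAY", num_bytes)


for p in (9, 18, 19, 38):
    print(p, choose_physical_type(p))
```

Note that the specification also allows INT64 or FIXED_LEN_BYTE_ARRAY for small precisions; picking the narrowest type is one reasonable policy, not a requirement.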
In the meantime, you can convert the numeric columns to float64 (I think it's called float8 in Postgres; for example, `select value::float8 from my_table`).
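Casting to `float8` rounds each value to the nearest 64-bit IEEE 754 double, so NUMERIC values with more than about 15 significant digits can change. Python's `float` is the same kind of double, so the sketch below uses it to approximate the effect of the cast; it models only the rounding, not Postgres's own text formatting, and `survives_float8` is a hypothetical helper.

```python
from decimal import Decimal


def survives_float8(value: str) -> bool:
    """Check whether a decimal string comes back unchanged after being
    rounded to a 64-bit IEEE 754 double (what Postgres calls float8)."""
    as_double = float(value)        # round to the nearest double
    shortest = repr(as_double)      # shortest string that maps back to that double
    return Decimal(shortest) == Decimal(value)


for v in ["0.1", "123456.789", "9007199254740993", "12345678901234567.89"]:
    print(v, survives_float8(v))
```

The last two values have more significant digits than a double can represent exactly, so they come back slightly different.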
I think I get it. So the issue is that I have these values stored as postgres
I have released a new version (I have also added another option Last option is to use
Thanks for your work on this! I'll give the new version a try!
Glad to help; let me know how it went. I'm now reconsidering which of these options should be the default. Float64 seems like the most universal choice, while String would be the safest in terms of loss of precision 🤔 I'd prioritize compatibility and ease of use: this tool is for exports, not backups (at least I hope I'm not responsible for some broken backup 😅)
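To make the trade-off between the options concrete, the sketch below checks, for a single NUMERIC value, whether it can be recovered exactly from a string, from a float64, and from an Int64-backed decimal with a fixed scale. The scale of 9 mirrors the 9-digit shift mentioned later in this thread and is an assumption here; `lossless_encodings` is a hypothetical helper, not part of pg2parquet.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1


def lossless_encodings(value: str, scale: int = 9) -> dict[str, bool]:
    """Report which export encodings can reproduce `value` exactly."""
    d = Decimal(value)
    # String: the digits are written out as text, so they always read back exactly.
    as_string = Decimal(str(d)) == d
    # Float64: the value is rounded to the nearest IEEE 754 double.
    as_float64 = Decimal(repr(float(d))) == d
    # Int64 decimal: the unscaled integer (value * 10**scale) must be whole
    # and must fit in a signed 64-bit integer.
    unscaled = d.scaleb(scale)
    as_int64 = unscaled == unscaled.to_integral_value() and abs(unscaled) <= INT64_MAX
    return {"string": as_string, "float64": as_float64, "int64_decimal": as_int64}


for v in ["123.456", "12345678901.5", "0.0000000001", "9007199254740993"]:
    print(v, lossless_encodings(v))
```

With a scale of 9, an Int64 decimal overflows above roughly 9.2 billion and cannot hold digits beyond the ninth decimal place, while float64 loses exactness once a value needs more significant digits than a double provides. Only the string form is exact for every input.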
No, you should see a decimal number. Internally, though, it is stored as an integer with the decimal point "shifted" by 9 digits. I tried it out, and it works for me (pg2parquet -> parquet2arrow -> pyarrow). I think it's most likely a bug or unsupported feature in the Arrow library you use: it loads the integer without understanding the metadata saying it's actually a decimal. However, if you want, we can rule out the possibility of pg2parquet misbehaving in your environment. This file, testtable.zip, contains a Parquet and an Arrow file that are "correct" (I hope; it works in pyarrow, at least). Could you try loading it with your Arrow loader to see whether it really doesn't support decimals?
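The "shifted integer" storage works as follows: a Parquet DECIMAL stores an unscaled integer plus a scale in the schema, and for FIXED_LEN_BYTE_ARRAY columns that integer is big-endian two's complement. A reader that ignores the DECIMAL annotation shows the raw integer, which matches the symptom described above. The sketch below decodes one value; `decode_decimal` is a hypothetical helper, and the 16-byte width and scale of 9 are assumptions chosen to match this thread.

```python
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode one Parquet DECIMAL value stored as FIXED_LEN_BYTE_ARRAY:
    a big-endian two's-complement unscaled integer, divided by 10**scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# A 16-byte (128-bit) cell holding the unscaled integer 1234567890.
raw = (1234567890).to_bytes(16, byteorder="big", signed=True)
print(int.from_bytes(raw, "big", signed=True))  # what a reader ignoring the annotation shows
print(decode_decimal(raw, scale=9))             # the intended value, 1.234567890
```

For INT32- or INT64-backed decimals the physical value is already an integer, and the same division by 10**scale applies.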
Perfect, I guess I'll just make this the default; decimals seem to be quite broken, and each time in a different way 🤦
When I use pyarrow, I get:
DuckDB sees the same values (it can only read Parquet, not Arrow, AFAIK).
Forgive this question if it makes no sense, but I am using this package to export a sample dataset, which I then want to convert to the Apache Arrow format using the parquet2arrow library.
When I convert the file, I get an error saying that it can't find the Arrow schema, which is a little weird.
The error comes from this line.
I've made a reproduction here: https://github.com/mhkeller/convert-parquet-repro
When I read the parquet file in Tad, it appears just fine, but I'm wondering if the arrow conversion needs additional information that could be included by this package when the file is written.
I saw this issue, but I'm not sure why such a simple data file would not yet be supported, so I have a feeling that isn't the main problem. The types in the iris file are fairly basic.
Thanks for the help.
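One candidate for the "additional information" asked about above is the `ARROW:schema` key-value entry that Arrow's own Parquet writer (used by pyarrow) stores in the file footer. Whether parquet2arrow actually requires that key is an assumption; the reply earlier in this thread points to a reader-side limitation instead. The sketch below follows the documented Parquet layout (`PAR1` magic, Thrift-encoded footer, 4-byte little-endian footer length, `PAR1` magic) and then does a crude byte search for the key. It is a heuristic rather than a Thrift parser, and `read_footer` and `has_arrow_schema_key` are hypothetical helpers.

```python
import struct


def read_footer(data: bytes) -> bytes:
    """Return the Thrift-encoded FileMetaData from a complete Parquet file's bytes."""
    if len(data) < 12 or data[:4] != b"PAR1" or data[-4:] != b"PAR1":
        raise ValueError("not a Parquet file (missing PAR1 magic)")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return data[-8 - footer_len:-8]


def has_arrow_schema_key(data: bytes) -> bool:
    """Heuristic: Thrift writes key-value metadata strings as raw bytes, so the
    key appears verbatim if the writer added it. A column literally named
    'ARROW:schema' would give a false positive."""
    return b"ARROW:schema" in read_footer(data)
```

Usage would be something like `has_arrow_schema_key(open(path, "rb").read())`, where `path` points at the exported file. If pyarrow is installed, `pyarrow.parquet.read_metadata(path).metadata` returns the key-value metadata as a dict (or `None`) and is the more reliable check.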