Parquet reader thread errors do not make query fail #1767

Closed
andygrove opened this issue Feb 6, 2022 · 2 comments · Fixed by #1837
Labels
bug Something isn't working datafusion Changes in the datafusion crate help wanted Extra attention is needed


andygrove (Member) commented Feb 6, 2022

Describe the bug
The Parquet reader thread can terminate due to an error, but this does not make the query fail. The thread only prints the error with println!, and the query then completes as if nothing went wrong, as in the output below.

Parquet reader thread terminated due to error: Execution("Failed to map column projection for field l_orderkey. Incompatible data types Int32 and Int64") for files: [PartitionedFile { file_meta: FileMeta { sized_file: SizedFile { path: "/mnt/data/tpch/sf100-parquet//lineitem/part-13.parquet", size: 1165760107 }, last_modified: Some(2022-02-05T18:23:00.416225870Z) }, partition_values: [] }]
Query 3 iteration 1 took 5452.6 ms and returned 0 rows

To Reproduce
Run one of the unsupported TPC-H queries, or run a query against invalid Parquet files.

Expected behavior
The query should fail and report the reader's error, instead of completing and returning 0 rows.
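A minimal sketch of the expected behavior, using only std threads and channels rather than DataFusion's actual reader: the reader thread sends its error down the same channel as the data instead of printing it, so the consumer ends up with an Err rather than an empty result. Batch, ReadError, read_file, spawn_reader and collect_all are hypothetical names for illustration; the real reader yields Arrow RecordBatches and uses DataFusion's own error type.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical stand-in for an Arrow RecordBatch.
type Batch = Vec<i64>;

/// Hypothetical stand-in for DataFusion's error type.
#[derive(Debug, PartialEq)]
enum ReadError {
    Execution(String),
}

/// Hypothetical per-file read: fails when the file's column type differs
/// from the table schema (Int64), mimicking the error in the report.
fn read_file(file_type: &str) -> Result<Vec<Batch>, ReadError> {
    if file_type != "Int64" {
        return Err(ReadError::Execution(format!(
            "Failed to map column projection for field l_orderkey. \
             Incompatible data types {} and Int64",
            file_type
        )));
    }
    Ok(vec![vec![1, 2, 3]])
}

/// Reader thread: forwards batches AND errors on the channel instead of
/// printing the error and silently closing the stream.
fn spawn_reader(file_types: Vec<&'static str>) -> Receiver<Result<Batch, ReadError>> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for file_type in file_types {
            match read_file(file_type) {
                Ok(batches) => {
                    for batch in batches {
                        // Stop early if the consumer has gone away.
                        if tx.send(Ok(batch)).is_err() {
                            return;
                        }
                    }
                }
                Err(e) => {
                    // The error becomes an item of the stream, so the query sees it.
                    let _ = tx.send(Err(e));
                    return;
                }
            }
        }
    });
    rx
}

/// Consumer: collecting into a Result stops at the first Err, so the
/// "query" fails instead of returning whatever rows arrived before the error.
fn collect_all(rx: Receiver<Result<Batch, ReadError>>) -> Result<Vec<Batch>, ReadError> {
    rx.into_iter().collect()
}
```

With one matching file, collect_all returns the batch; adding a file whose column is Int32 makes the whole result an Err carrying the "Incompatible data types Int32 and Int64" message. Treating the error as a stream item, rather than a side effect of the reader thread, is what keeps the failure attached to the query.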

Additional context
None

@andygrove andygrove added the bug Something isn't working label Feb 6, 2022
@andygrove andygrove added the help wanted Extra attention is needed label Feb 15, 2022
@alamb alamb added the datafusion Changes in the datafusion crate label Feb 15, 2022
alamb (Contributor) commented Feb 15, 2022

I think this comes from https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/file_format/parquet.rs#L230

We can probably adopt an approach similar to the one taken in RepartitionExec (https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/repartition.rs#L412-L449): add an extra task that handles a panic (the join handle returning an error) and sends an Error, as @andygrove suggests.
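The "extra task" idea can be sketched with std threads, assuming a worker that streams values over a channel: a second monitor thread joins the worker and, if join() reports a panic, sends an error on the same channel. Without the monitor, a panicking worker just drops its sender and the consumer sees a normal end of stream (for example "0 rows"). StreamError, panic_message and spawn_with_monitor are hypothetical names; DataFusion's RepartitionExec is async and may be structured differently.

```rust
use std::any::Any;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stream error type (not DataFusion's).
#[derive(Debug, PartialEq)]
enum StreamError {
    Execution(String),
}

/// Turns a panic payload into a readable message.
/// A panic payload is usually a &str or a String.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic".to_string()
    }
}

/// Runs `work` on a worker thread plus a monitor thread that joins it.
/// If the worker panics, the monitor turns the panic into an Err on the stream.
fn spawn_with_monitor<F>(work: F) -> Receiver<Result<i64, StreamError>>
where
    F: FnOnce(Sender<Result<i64, StreamError>>) + Send + 'static,
{
    let (tx, rx) = channel();
    let worker_tx = tx.clone();
    let worker = thread::spawn(move || work(worker_tx));
    thread::spawn(move || {
        // join() returns Err(payload) if the worker panicked.
        if let Err(payload) = worker.join() {
            let msg = format!("reader thread panicked: {}", panic_message(&*payload));
            let _ = tx.send(Err(StreamError::Execution(msg)));
        }
        // `tx` is dropped here, so the stream only closes after the monitor is done.
    });
    rx
}
```

Because the monitor sends only after join() returns, any values the worker sent before panicking arrive first and the error arrives last. In async code the same shape would await the task's join handle instead (a tokio JoinHandle, for instance, resolves to an Err when the task panics).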

alamb (Contributor) commented Feb 15, 2022

I'll take a stab at this

Also, as noted by @Dandandan / @andygrove in the Slack channel, #1617 may make this redundant anyway.
