Add protobuf serialisation for Arrow Map types #5358

Closed
ahmedriza opened this issue Feb 21, 2023 · 0 comments · Fixed by #5359
Labels
enhancement New feature or request

Comments

ahmedriza commented Feb 21, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, the protobuf serialisation does not support Arrow Map types. As a result, any Ballista query against a source such as a Parquet file whose schema contains Map columns fails to serialise the Arrow schema, so the whole query fails, regardless of whether it actually reads the Map columns.
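To illustrate why the failure does not depend on which columns the query reads, here is a minimal sketch. It uses simplified stand-in types, not the real arrow-rs or datafusion-proto API, and `encode_type` and `encode_schema` are hypothetical names. The assumption is that the schema encoder visits every field, so a single unsupported type fails the whole schema.

```rust
// Simplified stand-ins for Arrow's types; NOT the real arrow-rs definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    Int64,
    // The real Arrow Map type carries an entries field and a sorted flag;
    // both are elided here because only the "unsupported" case matters.
    Map,
}

pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

/// Hypothetical per-type encoder: returns a one-byte tag for supported
/// types and an error for anything it does not know how to encode.
pub fn encode_type(dt: &DataType) -> Result<u8, String> {
    match dt {
        DataType::Utf8 => Ok(1),
        DataType::Int64 => Ok(2),
        DataType::Map => Err("The Map data type is not yet supported".to_string()),
    }
}

/// Hypothetical schema encoder: encodes every field in order and stops at
/// the first error, so one unsupported column rejects the whole schema.
pub fn encode_schema(fields: &[Field]) -> Result<Vec<u8>, String> {
    fields.iter().map(|f| encode_type(&f.data_type)).collect()
}
```

With a schema of `name: Utf8` and `properties: Map`, `encode_schema` returns an error even if the plan only ever projects `name`.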

Describe the solution you'd like

Add the missing functionality to serialise Arrow Map data types to datafusion/proto/proto/datafusion.proto and the related serialisers.
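As a rough sketch of what such a serialiser has to carry: in arrow-rs the Map type is `DataType::Map(Box<Field>, bool)`, that is, an entries field (a struct of key and value) plus a keys-sorted flag. The code below models this with simplified local types. `ProtoType`, `ProtoField`, and the conversion functions are hypothetical stand-ins for generated protobuf messages, not the actual contents of datafusion.proto.

```rust
// Simplified model of the relevant Arrow types (local stand-ins).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    Struct(Vec<Field>),
    // Entries field (a struct of key/value) plus the keys-sorted flag.
    Map(Box<Field>, bool),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

// Hypothetical wire-side messages, shaped like generated protobuf structs.
#[derive(Debug, Clone, PartialEq)]
pub enum ProtoType {
    Utf8,
    Struct(Vec<ProtoField>),
    Map { field_type: Box<ProtoField>, keys_sorted: bool },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ProtoField {
    pub name: String,
    pub arrow_type: ProtoType,
    pub nullable: bool,
}

/// Converts a type to its wire form, recursing into nested fields.
pub fn type_to_proto(dt: &DataType) -> ProtoType {
    match dt {
        DataType::Utf8 => ProtoType::Utf8,
        DataType::Struct(fields) => {
            ProtoType::Struct(fields.iter().map(field_to_proto).collect())
        }
        DataType::Map(entries, keys_sorted) => ProtoType::Map {
            field_type: Box::new(field_to_proto(entries)),
            keys_sorted: *keys_sorted,
        },
    }
}

pub fn field_to_proto(f: &Field) -> ProtoField {
    ProtoField {
        name: f.name.clone(),
        arrow_type: type_to_proto(&f.data_type),
        nullable: f.nullable,
    }
}

/// Inverse of `type_to_proto`.
pub fn type_from_proto(pt: &ProtoType) -> DataType {
    match pt {
        ProtoType::Utf8 => DataType::Utf8,
        ProtoType::Struct(fields) => {
            DataType::Struct(fields.iter().map(field_from_proto).collect())
        }
        ProtoType::Map { field_type, keys_sorted } => {
            DataType::Map(Box::new(field_from_proto(field_type)), *keys_sorted)
        }
    }
}

pub fn field_from_proto(f: &ProtoField) -> Field {
    Field {
        name: f.name.clone(),
        data_type: type_from_proto(&f.arrow_type),
        nullable: f.nullable,
    }
}

/// Builds a string-to-string Map type, like the `properties` column.
pub fn string_map_type() -> DataType {
    let utf8 = |name: &str, nullable: bool| Field {
        name: name.to_string(),
        data_type: DataType::Utf8,
        nullable,
    };
    let entries = Field {
        name: "entries".to_string(),
        data_type: DataType::Struct(vec![utf8("key", false), utf8("value", true)]),
        nullable: false,
    };
    DataType::Map(Box::new(entries), false)
}
```

The property worth testing in a real implementation is the round trip: converting `string_map_type()` to the wire form and back should yield an equal type, including the nested entries field and the keys-sorted flag.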

Describe alternatives you've considered

I could not see any alternatives.

Additional context
Using this parquet file:
map.parquet.gz

and the following code:

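use ballista::prelude::*;
use datafusion::prelude::ParquetReadOptions;

async fn ballista_query() {
    // Start an in-process (standalone) Ballista context with 10 concurrent tasks.
    let config = BallistaConfig::new().unwrap();
    let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
    // The schema of map.parquet contains a Map column (`properties`).
    ctx.register_parquet("t", "map.parquet", ParquetReadOptions::default()).await.unwrap();
    // Running this query triggers the serialisation error shown below.
    let df = ctx.sql("select * from t").await.unwrap();
    df.show().await.unwrap();
}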

will fail with:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Internal("failed to serialize logical plan: Plan(\"General error: Proto serialization error: The Map data type is not yet supported\")")',

A correct execution should produce:

+------+------------------------------+
| name | properties                   |
+------+------------------------------+
| foo  | {symbol: FOO, currency: GBP} |
| bar  | {symbol: BAR, currency: USD} |
| baz  | {symbol: BAZ, currency: EUR} |
+------+------------------------------+