Increase the coupling of Ibis and the DataFusion Backend #32

mesejo · 2024-05-08T09:17:43Z

As a library, Ibis made the design choice of staying highly decoupled from each backend-specific implementation, which makes sense because it simplifies maintenance. Since LETSQL caters to only one backend, we should increase the coupling with DataFusion.

This is impactful for the following reasons:

- ~~Access all functionality. For example, the unnest function is available as part of the LogicalPlan and exposed to DataFrames, but not through SQL (or it is hidden there)~~
- Drop sqlglot as a dependency. This will reduce the size of the library and the number of dependencies, and it avoids relying on a library governed by a single individual/company.
- Drop generic implementations, such as `to_parquet`, that may miss out on DataFusion performance improvements (see benchmarks). The current `to_parquet` implementation uses the generic `to_pyarrow_batches` in conjunction with `pyarrow.parquet.ParquetWriter`.
- Less important: generating the LogicalPlan directly will probably improve processing speed, if only marginally. Currently we construct a SQL string via sqlglot only to parse (deconstruct) it back into the LogicalPlan.
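The string round trip described in the last point can be illustrated with a toy model. Everything below is hypothetical: `Query` stands in for an Ibis expression, `Scan`/`Filter`/`Project` stand in for LogicalPlan nodes, and the regex-based `parse_sql` handles only this one query shape. None of it is sqlglot's or DataFusion's API. Both paths yield the same plan, but the first one serializes to SQL and parses it back.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for an Ibis expression: SELECT cols FROM t WHERE col op value."""
    table: str
    columns: tuple
    column: str
    op: str
    value: int


# Hypothetical stand-ins for LogicalPlan nodes.
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: object
    column: str
    op: str
    value: int


@dataclass(frozen=True)
class Project:
    child: object
    columns: tuple


def to_sql(q: Query) -> str:
    """Round-trip step 1: render the expression as a SQL string."""
    return f"SELECT {', '.join(q.columns)} FROM {q.table} WHERE {q.column} {q.op} {q.value}"


_SELECT = re.compile(r"SELECT (.+) FROM (\w+) WHERE (\w+) (<=|>=|<|>|=) (-?\d+)")


def parse_sql(sql: str) -> Project:
    """Round-trip step 2: parse the string back into plan nodes (toy parser)."""
    m = _SELECT.fullmatch(sql)
    if m is None:
        raise ValueError(f"unsupported query: {sql!r}")
    cols, table, column, op, value = m.groups()
    columns = tuple(c.strip() for c in cols.split(","))
    return Project(Filter(Scan(table), column, op, int(value)), columns)


def build_plan(q: Query) -> Project:
    """Direct path: map the expression onto plan nodes with no string in between."""
    return Project(Filter(Scan(q.table), q.column, q.op, q.value), q.columns)


q = Query(table="t", columns=("id", "name"), column="id", op=">", value=1)
via_sql = parse_sql(to_sql(q))
direct = build_plan(q)
```

In the real backend, the direct path would correspond to building DataFusion plan objects (or DataFrame calls) from the Ibis expression, skipping both the sqlglot rendering and DataFusion's SQL parsing.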


mesejo changed the title ~~Generate the LogicalPlan directly from Ibis Expr~~ Increase the coupling of Ibis and the DataFusion Backend May 8, 2024

mesejo closed this as completed Feb 5, 2025
