Ibis as a library made the design choice of being highly decoupled from each backend-specific implementation, which makes sense because it simplifies maintenance. Since LETSQL only caters to one backend, we should increase the coupling with DataFusion.
This is impactful for the following reasons:
Access all functionality. For example, the unnest function is available as part of the LogicalPlan and exposed to dataframes, but it is not available through SQL (or it is hidden).
Drop sqlglot as a dependency. This will reduce the size of the library and the number of dependencies, and it avoids relying on a library that is governed by a single individual/company.
Drop generic implementations such as to_parquet that may be missing out on DataFusion performance improvements (see benchmarks). The implementation of to_parquet uses a generic to_pyarrow_batches in conjunction with pyarrow.parquet.ParquetWriter.
Less important: directly generating the LogicalPlan will probably improve processing speed, though only marginally. Currently we construct a string via sqlglot only to parse (deconstruct) it into the LogicalPlan.
mesejo changed the title from "Generate the LogicalPlan directly from Ibis Expr" to "Increase the coupling of Ibis and the DataFusion Backend" on May 8, 2024