Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(build system): consolidate YAML/SQL serialization #525

Open
wants to merge 37 commits into
base: main
Choose a base branch
from

Conversation

hussainsultan
Copy link
Contributor

@hussainsultan hussainsultan commented Feb 8, 2025

This PR adds roundtrip translation of expr to and from YAML and defines what the build artifacts should look like.

  • Build Outputs & Config:
    • Added metadata.json and BuildConfig support (with YAML alias/anchor disabling)
    • Updated BuildManager to generate SQL plans and support proper SQL formatting
  • YAML Serialization:
    kwargs_tuple in profile so it’s written as a key/value pair in YAML
    • Split deferred reads and SQL into dedicated YAML files (e.g. deferred_reads.yaml, sql.yaml)
    • Consolidated redundant YAML schemas in definitions field and added a schema_ref
  • CachedNode:
    Note: CachedNode is not written in sql.yaml or deferred_reads.yaml
    • only caching with SourceStorage is supported
    • Improved error handling and adjusted tests (e.g., unmarking xfail tests for deferred reads)

XFail Cases:
tests/test_compiler.py::test_ibis_compiler
tests/test_letsql_ops.py::{test_memtable, test_memtable_cache}
Reason: MemTable is not serializable
tests/test_udf.py::test_built_in_udf
Reason: UDFs do not have the same memory address when pickled/unpickled

Build Directory Structure:
├── c0907dab80b0
│  ├── c0907dab80b0.sql
│  ├── deferred_reads.yaml
│  ├── expr.yaml
│  ├── metadata.json
│  ├── profiles.yaml
│  └── sql.yaml

Handle in follow-up PRs:

  • Add _register.py for code-based UDF serialization
  • Write cache data to deferred_reads?
  • Support pyarrow UDF

Copy link

codspeed-hq bot commented Feb 8, 2025

CodSpeed Performance Report

Merging #525 will not alter performance

Comparing feat/yaml-compiler (6972299) with main (cf5dc1a)

Summary

✅ 2 untouched benchmarks

@hussainsultan hussainsultan force-pushed the feat/yaml-compiler branch 2 times, most recently from 1a5df96 to 2714aa0 Compare February 9, 2025 13:23
@hussainsultan hussainsultan force-pushed the feat/yaml-compiler branch 4 times, most recently from 57d9da0 to 8a25426 Compare February 17, 2025 19:19
Copy link

codecov bot commented Feb 17, 2025

Codecov Report

Attention: Patch coverage is 53.57143% with 39 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
python/letsql/vendor/ibis/backends/__init__.py 51.85% 39 Missing ⚠️
Files with missing lines Coverage Δ
python/letsql/config.py 69.51% <100.00%> (-30.49%) ⬇️
python/letsql/vendor/ibis/backends/__init__.py 48.87% <51.85%> (-20.44%) ⬇️

... and 189 files with indirect coverage changes

@hussainsultan hussainsultan force-pushed the feat/yaml-compiler branch 4 times, most recently from e692ba3 to d9278bf Compare February 22, 2025 17:10
@hussainsultan hussainsultan changed the title feat: add ibis-yaml compiler feat(build system): consolidate enhancements and improve YAML/SQL serialization Feb 22, 2025
@hussainsultan hussainsultan changed the title feat(build system): consolidate enhancements and improve YAML/SQL serialization feat(build system): consolidate and add YAML/SQL serialization Feb 22, 2025
@hussainsultan hussainsultan changed the title feat(build system): consolidate and add YAML/SQL serialization feat(build system): consolidate YAML/SQL serialization Feb 22, 2025
def compile_expr(self, expr: ir.Expr) -> str:
expr_hash = self.artifact_store.get_expr_hash(expr)

backends = find_all_sources(expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hussainsultan This would break with new profiles implementation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants