Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake Dialect #9486

Closed
jdunkerley opened this issue Mar 19, 2024 · 24 comments · Fixed by #10612
Closed

Snowflake Dialect #9486

jdunkerley opened this issue Mar 19, 2024 · 24 comments · Fixed by #10612
Assignees
Labels
-libs Libraries: New libraries to be implemented
Milestone

Comments

@jdunkerley
Copy link
Member

jdunkerley commented Mar 19, 2024

  • Add the full dialect support for Snowflake and restore Postgres dialect to as before.
  • Also need to support reading a "BigInt" column with values that are too large.
    • Snowflake report Number(38,0) as a BigInt (it has no integer types).
    • We should read as Long unless it exceed in which case we should flip to BigInteger and read as that.
  • Depending on state of Postgres read decimal should ensure we can read decimal.
  • Remove the Snowflake_Spec tests.
  • Have a new Snowflake_Tests project running what we think are correct from the Table_Tests.
@jdunkerley jdunkerley self-assigned this Mar 19, 2024
@jdunkerley jdunkerley converted this from a draft issue Mar 19, 2024
@jdunkerley jdunkerley added the -libs Libraries: New libraries to be implemented label Mar 19, 2024
@jdunkerley jdunkerley added this to the Beta Release milestone May 14, 2024
@jdunkerley jdunkerley moved this from 📤 Backlog to ❓New in Issues Board May 14, 2024
@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board May 14, 2024
@jdunkerley jdunkerley assigned radeusgd and unassigned jdunkerley Jun 10, 2024
@jdunkerley
Copy link
Member Author

jdunkerley commented Jun 10, 2024

https://github.com/enso-org/enso/tree/wip/jd/snowflake-dialect
WIP branch.

Note: Nothing == Nothing in Snowflake is ... True.

GitHub
Hybrid visual and textual functional programming. Contribute to enso-org/enso development by creating an account on GitHub.

@radeusgd radeusgd moved this from 📤 Backlog to 🔧 Implementation in Issues Board Jun 11, 2024
@enso-bot
Copy link

enso-bot bot commented Jun 12, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-06-11):

Progress: Recoinessance on Snowflake Dialect. Fixed compilation errors. Moved tests to separate project. Allow secrets in config, prepare confign and readme, Debugging why alraedy fixed Aprache Arrow errors is haunting me - apparently not fixed if you run through the launcher. It should be finished by 2024-06-18.

Next Day: Next day I will be working on the same task. Fix Snowflake running thru launcher. See which tests are working, see how to split into 2 PRs and what basic stuff needs to work for 1st iteration. Check if no regressions in Postgres.

@enso-bot
Copy link

enso-bot bot commented Jun 12, 2024

Radosław Waśko reports a new STANDUP for today (2024-06-12):

Progress: Fixed running Snowflake with launcher. Fixed most critical errors. Now the test suite completes (with many failures though). Prepared a first iteration PR - we are at not worse stage than before, but split and can run the tests nicely. It should be finished by 2024-06-18.

Next Day: Next day I will be working on the same task. Work on aligning the tests and fixing mismatches in implementation. Try to extract common logic between backends where applicable to avoid duplication.

mergify bot pushed a commit that referenced this issue Jun 13, 2024
- Part of #9486
- Building on top of initial work by @jdunkerley and finishing it
- Reverted the changes to the Postgres_Dialect from last Snowflake work and split the Snowflake_Dialect into a separate module.
- Moved from `rounding_decimal_places_not_allowed_for_floats` to `supports_float_round_decimal_places` (as too confusing).
- Added Snowflake_Dialect type.
- Extracted `Snowflake_Spec` into separate `Snowflake_Tests`
- It imports the common tests from `Table_Tests`.
- Some initial adaptations to make the snowflake dialect not-crash.
- Adding `Internals_Access` proxy to allow external implementations to access our internal data structures without directly exposing them to users. Users should not use these.
- Adding profiling of SQL to check performance.
@enso-bot
Copy link

enso-bot bot commented Jun 14, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-06-13):

Progress: Investigating what we can do to make Snowflake tests faster. Increasing some timeouts in project-manager tests to make Types PR pass. Merged both pending PRs. Initial work on StaticModuleScope. It should be finished by 2024-06-18.

Next Day: Next day I will be working on the same task. Continue work on Types for a bit. Then continue work on Snowflake.

@enso-bot
Copy link

enso-bot bot commented Jun 18, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-06-17):

Progress: Speeding up tests by sharing connection. Avoid allocating tables in some tests - instead we build a table by constructing a query containing literal values - thus we avoid a CREATE query which had quite a bit of overhead. Reduced number of tests run by default (additional tests enabled by env var) from 1449 to 984. Seems there is 30-50% time reduction - the new tests take ~33 minutes to run, it used to be 45-60 mins. It should be finished by 2024-06-18.

Next Day: Next day I will be working on the same task. Continue work on the tests, start to actually fix stuff as there's still a lot of failures to handle...

@enso-bot
Copy link

enso-bot bot commented Jun 19, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-06-18):

Progress: Fixing aggregate tests, add row number and many others. Adding ordering, fixing dialect etc. Fixed literal table problem in Snowflake - yesterday's gains are not as impressive anymore as there were some failures before. It should be finished by 2024-06-18.

Next Day: Next day I will be working on the same task. Continue work on the tests and fixes.

@enso-bot
Copy link

enso-bot bot commented Jun 19, 2024

Radosław Waśko reports a new 🔴 DELAY for today (2024-06-19):

Summary: There is 10 days delay in implementation of the Snowflake Dialect (#9486) task.
It will cause 3 days delay for the delivery of this weekly plan.

7 days is my time off + at least 3 more days to finish the integration.

Delay Cause: Lots of small edge cases to handle, lots of tests need updating becaue of Subtle differences in Snowflake. Moreover iteration time is very slow because Snowflake is running remote and the query latency (network + server speed) is much larger than for local backends.

@enso-bot
Copy link

enso-bot bot commented Jun 19, 2024

Radosław Waśko reports a new STANDUP for today (2024-06-19):

Progress: Continuing work on fixing tests. Fixing Column Operations. Fixing also what I broke in other backend's tests while fixing Snowflake. Created a PR with a snapshot of my changes. Remaining changes in another PR. It should be finished by 2024-06-28.

Next Day: Next day I will be working on the same task. Continue work on the tests and fixes.

mergify bot pushed a commit that referenced this issue Jun 27, 2024
- Part of #9486
- Fixing our tests to not rely on deterministic ordering of created Tables in Database backends
- Before, SQLite and Postgres used to mostly return rows in the order they were inserted in, but Snowflake does not.
- Fixing various parts of Snowflake dialect.
@enso-bot
Copy link

enso-bot bot commented Jul 1, 2024

Radosław Waśko reports a new STANDUP for the provided date (2024-06-28):

Progress: Added a few more tests in the Types task. Updated Snowlake tests after order_by renamed to sort. Improving message in Test framework for easier work. Fix a bug that if no network connection Snowflake tests used to succeed with 0 tests run instead of failing. Investigating IFF operator in Snowflake, adding tests for literal table variant. It should be finished by 2024-06-28.

Next Day: Next day I will be working on the same task. Finish investigating the IFF problem. See what needs fixing next. Try gathering an overview.

@enso-bot
Copy link

enso-bot bot commented Jul 3, 2024

Radosław Waśko reports a new 🔴 DELAY for yesterday (2024-07-02):

Summary: There is 10 days delay in implementation of the Snowflake Dialect (#9486) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Iteration time on Snowflake is extremely long as it runs ~50-100x slower than other DBs for small queries because of the network latency and other kinds of overhead. Lots of subtle differences from Postgres dialect, each needing some custom handling. This is a very large task.

@enso-bot
Copy link

enso-bot bot commented Jul 3, 2024

Radosław Waśko reports a new STANDUP for the provided date (2024-07-01):

Progress: Fixed COUNT aggregations. Sped up tests by sharing tables and connection even more. Fix shortest/longest. Investigate first/last, created some followup tickets. It should be finished by 2024-07-08.

Next Day: Next day I will be working on the same task. Continue.

@enso-bot
Copy link

enso-bot bot commented Jul 3, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-02):

Progress: Implemented date/time support in Snowflake. Updated type mapping to reflect true state of things. Introduced notion of implicit conversions that do not trigger Inexact_Type_Coercion warning. It should be finished by 2024-07-08.

Next Day: Next day I will be working on the same task. Work on date/time operations.

@enso-bot
Copy link

enso-bot bot commented Jul 4, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-03):

Progress: Implemented most date/time operation variants for Snowflake. Created ticket for inconsistencie. WIP trying a fix for the type inference problem. It should be finished by 2024-07-08.

Next Day: Next day I will be working on the same task. Continue. Small cloud ticket. Types?

@enso-bot
Copy link

enso-bot bot commented Jul 5, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-04):

Progress: Investigating and fixing value type inference problems. Modified some problematic operations. Trying to batch round spec as it's extremely slow. It should be finished by 2024-07-08.

Next Day: Next day I will be working on the same task. Finish round batching. Small cloud ticket. Types work.

@enso-bot
Copy link

enso-bot bot commented Jul 8, 2024

Radosław Waśko reports a new STANDUP for the provided date (2024-07-05):

Progress: Prepared a PR for the Snowflake dialect to synchronize my changes. Some small fixes needed before it's mergable. Still further work needed on the dialect (at least one more PR expected). Working on Round batching - currently it did not improve performance because we are fetching types in N queries. Working on Types: introduced the No_Such_Method static Warning. Prototype for resolving the type hierarchy (Any). Investigating why pass data are randomly missing - very weird bug - if I print the pass data in error branch then they are present and there's no errors and if I don't print they are missing... It should be finished by 2024-07-08.

Next Day: Next day I will be working on the same task. Fix issues for PR (tests for downloading Integer column in Snowflake). Batched types fetch (quite big performance win expected).

@enso-bot
Copy link

enso-bot bot commented Jul 10, 2024

Radosław Waśko reports a new 🔴 DELAY for today (2024-07-10):

Summary: There is 14 days delay in implementation of the Snowflake Dialect (#9486) task.
It will cause 0 days delay for the delivery of this weekly plan.

Delay Cause: Still more time needed as this has many parts.

@enso-bot
Copy link

enso-bot bot commented Jul 10, 2024

Radosław Waśko reports a new STANDUP for the provided date (2024-07-08):

Progress: Fixed so that we don't do additional queries when fetching a Table. This fixed round batching to be actually 2-6x faster now. Hoped for overall improvement but total performance of test suite seemed to strangely be unaffected. Added tests checking that a NUMBER(38, 0) column is materialized as Integer (not Decimal) column if the values fit. Testing the Cloud PR - encountered a regression - cannot check if it was introduced by the recent changes or happened before because the Cloud API changes broke develop so I have no state to compare against. It should be finished by 2024-07-22.

Next Day: Next day I will be working on the same task. Investigate and fix the Image write regression. Implement the Integer materialization in Snowflake to be able to get the PR through.

mergify bot pushed a commit that referenced this issue Jul 10, 2024
- Related to #9486
- Fixes types in literal tables that are used throughout the tests
- Tries to makes testing faster by disabling some edge cases, trying batching some queries, re-using the main connection and trying to re-use tables more
- Implements date/time type mapping and operations for Snowflake
- Updates type mapping to correctly reflect what Snowflake does
- Disables warnings for Integer->Decimal coercion as that's too annoying and implicitly understood in Snowflake
- Allows to select a Decimal column with `..By_Type ..Integer` (only in Snowflake backend) because the Decimal column there is its 'de-facto' Integer column replacement.
@enso-bot
Copy link

enso-bot bot commented Jul 11, 2024

Radosław Waśko reports a new STANDUP for the provided date (2024-07-08):

Progress: Investigating Cloud file problems together with Paweł. Got the Snowflake 3 PR merged. Prepared 4th PR - Implemented the Integer materialization. It should be finished by 2024-07-22.

Next Day: Next day I will be working on the same task. Continue fixing Snowflake tests.

@radeusgd radeusgd mentioned this issue Jul 11, 2024
4 tasks
mergify bot pushed a commit that referenced this issue Jul 11, 2024
…er type, other type mapping tests (#10518)

- Related to #9486
- Ensures that even though an integer column in Snowflake is represented by `Decimal` type, if the values are small enough, they are materialized as `Integer`.
- If the values are larger, they are still read in as `Decimal`.
- Adds tests for some other `Decimal` edge cases (various precisions and scales), and for `Float`.
@enso-bot
Copy link

enso-bot bot commented Jul 12, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-11):

Progress: Further progress on fixing Snowflake tests. Created pt. 5 PR - fixing division / modulo, expressions, filter, missing values tests, checking joins; removed unnecessary warnings when materializing to Decimal to Integer column (reverse variant of the implicit conversion). It should be finished by 2024-07-22.

Next Day: Next day I will be working on the #9812 task. Work on Types.

mergify bot pushed a commit that referenced this issue Jul 16, 2024
- Related to #9486
- Batching of expression tests
- Fixing arithmetic by simplifying `%` and `/` operations
- Trying to share some more tables, sometimes improving performance sometimes not really
- Adding sorting and other fixes to tests to make them pass: Missing_Values_Spec, Filter_Spec, Map_Spec
- Fixing warnings related to materialization of Decimal->Integer, thus fixing Join_Spec.
@enso-bot
Copy link

enso-bot bot commented Jul 16, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-15):

Progress: Summarising a big picture what is left to do in Snowflake: union, replace/merge, distinct, table upload (some issues with DDL transactions need a workaround), some small nits. Fixed some easier issues with table upload. It should be finished by 2024-07-24.

Next Day: Next day I will be working on the same task. Continue Snowflake work - work on some small issues in column operations, then union.

@enso-bot
Copy link

enso-bot bot commented Jul 17, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-16):

Progress: Column Ops down to 0 failures. Fixing Union tests. It should be finished by 2024-07-24.

Next Day: Next day I will be working on the same task. Continue

@enso-bot
Copy link

enso-bot bot commented Jul 18, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-17):

Progress: Fixed Union tests. Implemented Distinct for Snowflake, Fixed Merge tests. Put up a PR. Some work on types. It should be finished by 2024-07-24.

Next Day: Next day I will be working on the same task. Continue Snowflake / types stuff.

@enso-bot
Copy link

enso-bot bot commented Jul 19, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-18):

Progress: Got the PR reviewed but tests were failing, mostly due to changes to should_contain_the_same_elements_as. Fixing these tests. Brainstorm with Ad on variants of should_contain_the_same_elements_as - renamed it to should_equal_ignoring_order and handled ignoring cardinality simply by using distinct. Fixed the tests. Got one remaining test failure to investigate. It should be finished by 2024-07-24.

Next Day: Next day I will be working on the same task. Investigate the bug. Merge the PR. Fix or pend remaining Snowflake failures in next PR.

@enso-bot
Copy link

enso-bot bot commented Jul 19, 2024

Radosław Waśko reports a new STANDUP for today (2024-07-19):

Progress: Found and reported the bug with weird interaction of ... and warnings (#10605). Fixed remaining failures - the 6th PR should be ready but it's failing on CI due to #10610 - maybe some clean run will get lucky... In the meantime fixed or marked pending all remaining tests - 7th final Snowflake PR also ready - #10612 and created follow up tickets to address remaining stuff: table upload / update #10609 and count distinct on boolean columns bug #10611. It should be finished by 2024-07-24.

Next Day: Next day I will be working on the same task. Getting the pending PRs merged. Plan what to do next.

@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Jul 19, 2024
mergify bot pushed a commit that referenced this issue Jul 19, 2024
)

- Part of #9486
- Fixes `Table.union`, `merge` and `distinct` tests
- Replaces `distinct_on` in `Context` that was actually a Postgres specific addition leaking into the base with a more abstract `Context_Extension` mechanism.
- This allows us to implement the Snowflake-specific `DISTINCT` using `QUALIFY`.
@mergify mergify bot closed this as completed in #10612 Jul 23, 2024
@mergify mergify bot closed this as completed in ba56f8e Jul 23, 2024
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants