Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement transactional Table Upload / Update / Delete Rows for Snowflake backend #10609

Closed
3 tasks
radeusgd opened this issue Jul 19, 2024 · 6 comments · Fixed by #10738
Closed
3 tasks

Implement transactional Table Upload / Update / Delete Rows for Snowflake backend #10609

radeusgd opened this issue Jul 19, 2024 · 6 comments · Fixed by #10738
Assignees
Labels
-libs Libraries: New libraries to be implemented l-db-connector Libraries: database connectors l-db-write Libraries: database writer p-high Should be completed in the next sprint

Comments

@radeusgd
Copy link
Member

radeusgd commented Jul 19, 2024

  • (For August release) Possibly disable this for Snowflake (If inside a transaction)
  • Warnings for this case (if this can be done non-destructively)
  • Implement alternate transaction method

Snowflake is a first Database we support that has dataDefinitionCausesTransactionCommit() return true, meaning that we cannot make a DDL statement be a part of a bigger transaction.

Our Upload / Update rows mechanisms were relying on using DDL statements in transactions and on transaction roll-back also deleting such tables created within a transaction. In Snowflake that does not work.

We need to implement another strategy for creating / updating tables that can work in such environment.
It probably needs to split the action into 3 phases:

  1. create all needed tables,
  2. run the actual UPDATE transaction,
  3. and if the transaction failed and was rolled-back - delete the tables created in step (1).

This is still less robust than combining the whole operation in a single transaction - but we've got to deal with what we can use. However, because of that - we do not want the new strategy to replace the 'old' fully transactional one. Instead, we want to abstract away the common parts (creating something like DDL_Transaction_Strategy) and provide to the Upload_Table code two possible strategies. If the transactional one is available for a given database it should be preferred, falling back to the one described above. Ideally the main table update logic should be the same and only delegate to the DDL strategy.

@radeusgd radeusgd self-assigned this Jul 19, 2024
@radeusgd radeusgd added -libs Libraries: New libraries to be implemented l-db-connector Libraries: database connectors l-db-write Libraries: database writer p-high Should be completed in the next sprint labels Jul 19, 2024
@radeusgd
Copy link
Member Author

Assigning high priority as without it Snowflake update_rows may behave in unexpected ways and leave garbage behind.

If we want to ship proper Snowflake support before this is implemented, we should probably disable update_rows for Snowflake, or at least attach a warning.

@enso-bot enso-bot bot mentioned this issue Jul 19, 2024
mergify bot pushed a commit that referenced this issue Jul 23, 2024
- Closes #9486
- All tests are succeeding or marked pending
- Created follow up tickets for things that still need to be addressed, including:
- Fixing upload / table update #10609
- Fixing `Count_Distinct` on Boolean columns #10611
- Running the tests on CI is not part of this PR - to be addressed separately
@GregoryTravis GregoryTravis moved this from ❓New to 📤 Backlog in Issues Board Jul 23, 2024
@GregoryTravis GregoryTravis added this to the 2024-08 Release milestone Jul 23, 2024
@radeusgd radeusgd moved this from 📤 Backlog to 🔧 Implementation in Issues Board Jul 29, 2024
@enso-bot
Copy link

enso-bot bot commented Jul 30, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-29):

Progress: Investigating the IDE problems on my machine. Created PR #10708 that fixes propagation of some FS errors that used to be swallowed by accident. Found out the root cause - I had Controlled Folder Access feature of Windows Defender enabled and it was preventing writes to Documents folder. Started libs/engine discussion on what we can do about it. Started work on Snowflake transactional upload - refactored current upload logic to make it easier to work with. It should be finished by 2024-08-07.

Next Day: Next day I will be working on the same task. Continue - analyze the 'shape' of each transaction and see what kind of abstraction could work to make it work for Snowflake's constraints.

@enso-bot
Copy link

enso-bot bot commented Jul 31, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-30):

Progress: Designing how to approach the Snowflake transaction constraints. It should be finished by 2024-08-07.

Next Day: Next day I will be working on the same task. Continue - implement the 'transaction with tables' and DDL check.

mergify bot pushed a commit that referenced this issue Jul 31, 2024
…r submodules (#10713)

- First step of #10609 - before I actually modify it, I decided I need to refactor the `Upload_Table` logic as it was quite convoluted. Doing this as a separate PR for easier review. A big 600+ line file was replaced by several smaller ones grouped by topics.
- Practically no changes apart from moving stuff into separate modules.
- One small change - added `Missing_Argument` to `SQL_Query` as I noticed that lack of defaults was giving rise to confusing errors when working with `query` in the GUI.

Before:
![image](https://github.com/user-attachments/assets/b586caec-f25c-406e-be5a-d402f10feb86)
After:
![image](https://github.com/user-attachments/assets/6b1d4206-05b1-4587-b3e1-43ca95ea7c2e)
![image](https://github.com/user-attachments/assets/58c098bd-db0c-4ee2-823c-bf5c9e758ce4)
@enso-bot
Copy link

enso-bot bot commented Aug 1, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-31):

Progress: Implemented the 'transaction with tables'. Added a check for DDL in transactions that prevents undesirable behaviour (accidental commit). Lost quite some time on #7117 bug when testing it. It should be finished by 2024-08-07.

Next Day: Next day I will be working on the same task. Integrate 'transaction with tables' with current operations.

@enso-bot
Copy link

enso-bot bot commented Aug 2, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-08-01):

Progress: Integrated Create and Select_Into, basic tests passing. It should be finished by 2024-08-07.

Next Day: Next day I will be working on the same task. Continue - Update and Delete.

@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Aug 5, 2024
@enso-bot
Copy link

enso-bot bot commented Aug 6, 2024

Radosław Waśko reports a new STANDUP for the provided date (2024-08-02):

Progress: Prepared the Snowflake transactions PR for review - finished remaining operations and fixed tests. It should be finished by 2024-08-07.

Next Day: Next day I will be working on the #10716 task. Next task - investigate suggestions issue

@mergify mergify bot closed this as completed in #10738 Aug 6, 2024
@mergify mergify bot closed this as completed in 3fd1464 Aug 6, 2024
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented l-db-connector Libraries: database connectors l-db-write Libraries: database writer p-high Should be completed in the next sprint
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants