Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sqllab] add retries for stop_query #8139

Merged

Conversation

serenajiang
Copy link
Contributor

@serenajiang serenajiang commented Aug 29, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Used backoff module to add retries for stop_query, which was having issues related to lock timeouts and rollbacks. Also reworked get_query to use backoff as well.

get_query behavior is the same as before.

stop_query attempts to stop the query five times. If, during the process of stopping, the query completes, it will not set the status of the query to stop. If it fails all 5 retries, it will throw an error (probably the same lock timeout error, but without the stuff about rollbacks because there are now rollbacks).

TEST PLAN

tox passes. stop_query works as expected locally, but it's hard to reproduce this error, so I can't really tell if it errors less.

REVIEWERS

@graceguo-supercat

@codecov-io
Copy link

codecov-io commented Aug 29, 2019

Codecov Report

Merging #8139 into master will increase coverage by 0.14%.
The diff coverage is 42.85%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #8139      +/-   ##
=========================================
+ Coverage   66.06%   66.2%   +0.14%     
=========================================
  Files         479     479              
  Lines       22930   22929       -1     
  Branches     2524    2524              
=========================================
+ Hits        15148   15180      +32     
+ Misses       7648    7615      -33     
  Partials      134     134
Impacted Files Coverage Δ
superset/views/core.py 71.53% <33.33%> (+0.27%) ⬆️
superset/sql_lab.py 77.51% <50%> (+0.04%) ⬆️
superset/assets/src/chart/chartAction.js 49.63% <0%> (ø) ⬆️
superset/models/core.py 81.76% <0%> (+0.64%) ⬆️
superset/db_engine_specs/mysql.py 92.15% <0%> (+49.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee24539...96564d6. Read the comment docs.

@villebro
Copy link
Member

@smacker is this related to #7611 ?

@smacker
Copy link
Contributor

smacker commented Aug 30, 2019

@villebro nope. This change handles errors on superset side (main DB with superset's state). While #7611 interacts with data source connections. Let me give an example:

  1. You keep your superset state (including queries) in postgresql
  2. You have huge MySQL DB which you use to generate data for charts and you query it using SQLLab in async mode
  3. Your query makes a very expensive calculation like joining several tables with billions of rows without proper where statement (maybe by a mistake)
  4. You realize the query is taking too long or it's just wrong and you stop it
  5. Status of the query is changed in postgresql and UI works correctly. But celery job is still running in the background because the job itself doesn't have any "cancel" method except SQLLAB_TIMEOUT. The only place where the check for stop happens is between queries. So as long as the query already started on MySQL side, it will be consuming resources on mysql server while you don't need the result of the query anymore
  6. [sqllab] Cancel database query on stop #7611 adds a hook to say to MySQL "I don't need this query anymore, please stop it".

You can reproduce it by running a long query, pressing stop in UI and then checking SHOW PROCESSLIST on your DB server.

P.S. I used MySQL as an example, it would work the same for any other data source

@graceguo-supercat
Copy link

This PR is trying to fix the errors like this:

This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (_mysql_exceptions.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction')[SQL: UPDATE query SET status=%s, changed_on=%s WHERE query.id = %s][parameters: ('stopped', datetime.datetime(2019, 8, 30, 17, 33, 25, 363177), 4180054)](Background on this error at: http://sqlalche.me/e/e3q8) (Background on this error at: http://sqlalche.me/e/7s2a)

@serenajiang serenajiang force-pushed the serena-stop-query-retries branch from 6d0574a to be825a4 Compare August 30, 2019 23:22
@serenajiang serenajiang changed the title [sqllab] add retries for stop_query [WIP] add retries for stop_query Aug 30, 2019
@serenajiang serenajiang force-pushed the serena-stop-query-retries branch from be825a4 to 210021f Compare August 30, 2019 23:58
@serenajiang serenajiang changed the title [WIP] add retries for stop_query [sqllab] add retries for stop_query Aug 31, 2019
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense and added logic looks solid. Also much cleaner and easier to refine down the line with backoff if need to move to exponential backoff or similar as opposed to using homegrown retry logic. LGTM!

@villebro
Copy link
Member

@serenajiang this needs rebasing.

Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple nits and a couple double checks, but this looks awesome otherwise. Thanks for taking the time to refactor this using backoff too!

@graceguo-supercat graceguo-supercat merged commit 00257b9 into apache:master Sep 3, 2019
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.35.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 0.35.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants