Feature Request: dynamic cut-over threshold in Online DDL #17123

Closed
shlomi-noach opened this issue Oct 31, 2024 · 0 comments · Fixed by #17126
Labels: Component: Online DDL (vitess/native/gh-ost/pt-osc)

Feature Description

Online DDL's cut-over threshold value serves two purposes: it determines whether a migration is ready to cut over, and it sets the timeout for the cut-over operation, as follows:

  • One of the indicators that a migration is ready to cut over is the vreplication lag (which is different from replication lag, but is certainly correlated with it and can be caused by it). If vreplication lag is higher than the cut-over threshold, we say the migration is not ready to complete.
  • Once we enter the cut-over phase, there are several operations that we time out: the locks on the table, the rename statement, and the wrapping query buffering. The value used for those timeouts is (based on) the cut-over threshold.
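The two roles of the threshold can be sketched as follows. This is a minimal illustration, not Vitess code: the function names are hypothetical, and using the threshold unchanged for every timeout is an assumption, since the description only says the timeouts are "based on" it.

```python
from datetime import timedelta


def lag_allows_cut_over(vreplication_lag: timedelta, threshold: timedelta) -> bool:
    """Return False when vreplication lag is higher than the threshold.

    Lag is only one of several readiness indicators; the others are not
    modeled here.
    """
    return vreplication_lag <= threshold


def cut_over_timeouts(threshold: timedelta) -> dict:
    """Derive the cut-over timeouts from the threshold.

    Assumption: each timeout equals the threshold. The real derivation
    may scale or pad the value.
    """
    return {
        "table_lock": threshold,
        "rename": threshold,
        "query_buffering": threshold,
    }
```

With a 10s threshold, a lag of 3s would pass the lag check and a lag of 12s would not, while all three timeouts would be set from the same 10s value.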

Just to get this out of the way: it makes sense to use that same value for both cases, as they closely correlate to each other.

Now, there is a default cut-over threshold, hard-coded as 10s. The user is then able to supply a different value via a DDL strategy flag, e.g. --cut-over-threshold=15s.
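To make the default-versus-override behavior concrete, here is a rough sketch of resolving the threshold from a DDL strategy string. The helper names and the simplified duration parser (integer values with ms, s, or m units only) are assumptions for illustration and do not reflect how Vitess actually parses the strategy.

```python
import re
import shlex
from datetime import timedelta

# Default taken from the description above: 10 seconds.
DEFAULT_CUT_OVER_THRESHOLD = timedelta(seconds=10)

# Simplified duration units; real duration strings support more forms.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '15s' or '1m' (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def cut_over_threshold_from_strategy(ddl_strategy: str) -> timedelta:
    """Return the threshold given in the strategy flags, else the default."""
    for token in shlex.split(ddl_strategy):
        if token.startswith("--cut-over-threshold="):
            return parse_duration(token.split("=", 1)[1])
    return DEFAULT_CUT_OVER_THRESHOLD
```

For example, `cut_over_threshold_from_strategy("vitess --cut-over-threshold=15s")` yields 15 seconds, while a bare `"vitess"` strategy falls back to the 10s default.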

But as things stand, that value is then constant for the duration of the migration. Sometimes, you see a migration struggling to complete under load. There's a variety of techniques to help it get through: just let it retry; throttle it and then unthrottle it at a less busy time; force completion (kill blocking queries). However, we also want to add the ability to modify the cut-over threshold on a running migration.

This would be achieved by a query such as ALTER VITESS_MIGRATION '<uuid>' CUTOVER_THRESHOLD='15s'. We should limit the cut-over threshold to reasonable values. I'd say 5s would be the bare minimum, and 30s is pushing it at the upper limit (the effect could be a table being locked for 30s).
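A client-side sketch of the proposed bounds check might look like the following. It assumes the 5s and 30s limits suggested above and the statement syntax as proposed in this request; the function name and the UUID format check are hypothetical, and the syntax that ships may differ.

```python
import re

# Bounds suggested in this request; not confirmed final values.
MIN_CUT_OVER_THRESHOLD_SECONDS = 5
MAX_CUT_OVER_THRESHOLD_SECONDS = 30

# Assumed UUID shape: hex groups joined by '-' or '_'.
_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}([-_][0-9a-fA-F]{4}){3}[-_][0-9a-fA-F]{12}"
)


def build_cut_over_threshold_statement(uuid: str, seconds: int) -> str:
    """Build the proposed ALTER VITESS_MIGRATION statement after validation."""
    if not _UUID_RE.fullmatch(uuid):
        raise ValueError(f"not a migration UUID: {uuid!r}")
    if not (MIN_CUT_OVER_THRESHOLD_SECONDS <= seconds <= MAX_CUT_OVER_THRESHOLD_SECONDS):
        raise ValueError(
            f"cut-over threshold must be between "
            f"{MIN_CUT_OVER_THRESHOLD_SECONDS}s and {MAX_CUT_OVER_THRESHOLD_SECONDS}s"
        )
    return f"ALTER VITESS_MIGRATION '{uuid}' CUTOVER_THRESHOLD='{seconds}s'"
```

Rejecting out-of-range values before the statement is issued keeps a mistyped value, such as 300s, from turning into a table lock of that length.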

Use Case(s)

Dynamic control over Online DDL migrations, solving ongoing cut-over issues.
