Stuck retrying Migration to 21.2-56: "populate RangeAppliedState.RaftAppliedIndexTerm for all ranges" #81961
Hello, I am Blathers. I am here to help you get the issue triaged. Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here. I was unable to automatically find someone to ping. If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
10th retry now...
@wzrdtales How many total ranges do you have in this cluster? Judging from that screenshot we're churning through ~3800 ranges per minute. With the estimate of 3-4 days, are you saying this cluster has 16,416,000+ ranges? I'm more interested in the retry loop itself than in how long each attempt takes. Do you happen to have logs from around where these retries occur? If it's because of timeouts in applying the Migrate request, I wonder if we should bump
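For reference, the 16,416,000+ figure quoted above is a straightforward back-of-the-envelope extrapolation from the observed throughput. A quick sketch of that arithmetic (the 3800 ranges/min rate and the 3-day lower bound both come from the comment above; nothing else is assumed):

```python
# Back-of-the-envelope check of the range-count estimate quoted above.
RANGES_PER_MIN = 3800   # observed migration throughput from the screenshot
DAYS = 3                # lower bound of the quoted 3-4 day estimate

ranges_processed = RANGES_PER_MIN * 60 * 24 * DAYS
print(ranges_processed)  # 16416000 ranges in 3 days at that rate
```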
It is partially intended, though the retry behavior is not. The slow pace is deliberate: internal migrations are paced so that they are non-disruptive to foreground traffic. We want to avoid a thundering herd of work on upgrades.
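The pacing idea described above can be illustrated with a minimal sketch. To be clear, this is not CockroachDB's actual implementation — just a generic rate-limited batch loop, where `batch_size` and `delay_s` are hypothetical knobs standing in for whatever internal pacing parameters the migration framework uses:

```python
import time

def migrate_ranges_paced(range_ids, batch_size=64, delay_s=0.0):
    """Process ranges in small batches, sleeping between batches so the
    migration never floods the cluster with a thundering herd of work.
    Illustrative only; batch_size and delay_s are made-up parameters."""
    done = 0
    for i in range(0, len(range_ids), batch_size):
        batch = range_ids[i:i + batch_size]
        # ... issue the per-range migration work for `batch` here ...
        done += len(batch)
        time.sleep(delay_s)  # pacing: trade total runtime for low impact
    return done
```

The trade-off is exactly the one in this thread: a small batch size with a non-zero delay keeps foreground impact low, at the cost of a migration that can run for hours or days on clusters with large range counts.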
I was calculating the whole runtime; it ended up needing around 24 hours in total, I think. It resolved on its own, but in a weird status that left us worried (as you noticed :))
Not yet that big, "only" 300k+ ranges.
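A rough sanity check, assuming the ~3800 ranges/min throughput quoted earlier in the thread also applies here: a single pass over 300k ranges would take only about 80 minutes, so the ~24-hour total runtime reported above would be dominated by the repeated retries rather than by the raw range count. This is an assumption-laden estimate, not something stated in the thread:

```python
# Rough estimate: one full pass over the cluster's ranges at the
# throughput quoted earlier in the thread (an assumed, not measured, rate).
RANGES = 300_000        # "only 300k+ ranges" per the comment above
RANGES_PER_MIN = 3800   # throughput quoted earlier in the thread

single_pass_min = RANGES / RANGES_PER_MIN
print(f"~{single_pass_min:.0f} minutes per full pass")  # ~79 minutes
```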
I'm glad it resolved; I'll close this issue. I'll add a suggestion to #72931 to let operators control the pacing here more directly, if they want to speed up the migrate parallelism for clusters with large range counts while keeping an eye on foreground impact. I'll also note that some of these migrations are intended to be long-running and to operate over the timescales you're observing: #48843.
Describe the problem
After upgrading to 22.1, several migrations ran through fine. It has been stuck on 21.2-56 for 10 hours now, however, and doesn't finish. (Retrying for the 9th time now.)
To Reproduce
Upgrade to 22.1
On kubernetes.
Jira issue: CRDB-16140