Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: limit the bandwidth used for snapshots #14718

Merged

Conversation

petermattis
Copy link
Collaborator

Limit the bandwidth used for snapshots. Preemptive snapshots are
throttled to 2 MB/sec (COCKROACH_PREEMPTIVE_SNAPSHOT_RATE) and Raft
snapshots are throttled to 4 MB/sec (COCKROACH_RAFT_SNAPSHOT_RATE). The
effect of limiting the bandwidth is that a preemptive snapshot for a 64
MB range will take ~32s to send and a Raft snapshot will take ~16s. The
benefit is a much smaller impact on foreground traffic.

Fixes #10972

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis
Copy link
Collaborator Author

I'm not sure how to test this other than manually on our test clusters. See the graphs in #10972.

@petermattis petermattis force-pushed the pmattis/rate-limit-snapshots branch from 28ce6e9 to 816434a Compare April 7, 2017 21:29
@bdarnell
Copy link
Contributor

In #13687 we wanted to increase the bandwidth usable by snapshots. We're still better than before (we were limited to 1MB/sec by GRPC's defaults), but it feels to me like we're going about this the wrong way by tweaking limits in both directions. What we really need is some sort of prioritization scheme that would allow snapshots to use the bandwidth only if it's not needed for other stuff. But I don't have any concrete suggestions so maybe we should go ahead and do this anyway.

Should raft snapshots perhaps have a much higher rate (or be unlimited), since they always indicate a replica falling behind and therefore a range being in danger?


Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@petermattis
Copy link
Collaborator Author

What we really need is some sort of prioritization scheme that would allow snapshots to use the bandwidth only if it's not needed for other stuff. But I don't have any concrete suggestions so maybe we should go ahead and do this anyway.

I had the same thought this weekend, but haven't had a good idea for a mechanism to de-prioritize snapshot traffic. I think we should go ahead with this PR since it is simple and clear in effect. I'll file an issue that a better mechanism should be developed here.

Should raft snapshots perhaps have a much higher rate (or be unlimited), since they always indicate a replica falling behind and therefore a range being in danger?

The 4 MB/sec default for Raft snapshots was arbitrary. I'd be fine bumping this up to 8 MB/sec. I'm mildly anxious about making it unlimited as the evidence right now is that can overwhelm a gRPC connection.

Limit the bandwidth used for snapshots. Preemptive snapshots are
throttled to 2 MB/sec (COCKROACH_PREEMPTIVE_SNAPSHOT_RATE) and Raft
snapshots are throttled to 8 MB/sec (COCKROACH_RAFT_SNAPSHOT_RATE). The
effect of limiting the bandwidth is that a preemptive snapshot for a 64
MB range will take ~32s to send and a Raft snapshot will take ~8s. The
benefit is a much smaller impact on foreground traffic.

Fixes cockroachdb#10972
@petermattis petermattis force-pushed the pmattis/rate-limit-snapshots branch from 816434a to be834a3 Compare April 10, 2017 12:35
@tamird
Copy link
Contributor

tamird commented Apr 10, 2017 via email

@petermattis
Copy link
Collaborator Author

Should we file an issue upstream? Seems like an ability to specify priority would be useful for many users of grpc.

I'm chatting with the gRPC folks today. I'll bring this up with them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants