Implement follower rate limiting for file restore #37449
Conversation
This is related to elastic#35975. This commit implements rate limiting on the follower side using the `RateLimitingInputStream`.
Pinging @elastic/es-distributed
This implements rate limiting on the follower side. However, it is mostly a POC for us to talk about, because I'm not sure this is what we want. I implemented it in the same way we implement it in peer recovery. In particular, on the read side each individual operation is atomic, but the operations are not atomic in combination. If two messages are being received at the same time, both can cross the rate threshold before either pauses, so I'm not sure if I should be following that strategy. We do the same thing when writing the normal recovery chunks, so maybe that is what we want? It just seems to me that when concurrent restores are happening, this allows more bytes to be sent than we intended.
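To make the concern concrete, here is a minimal, self-contained sketch (illustrative only; a plain threshold constant and a sleep stand in for the Lucene `RateLimiter` calls): each counter update is atomic on its own, but the accumulate-check-reset-pause sequence is not, so two streams reading concurrently can both cross the threshold before either one pauses.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch, not the PR code: MIN_PAUSE_CHECK_BYTES and the sleep stand in
// for RateLimiter#getMinPauseCheckBytes() and RateLimiter#pause(bytes).
public class RateLimitRaceSketch {

    private static final long MIN_PAUSE_CHECK_BYTES = 1024;
    private static final AtomicLong bytesSinceLastPause = new AtomicLong();

    // Called by each restore stream after it reads a chunk.
    static void maybePause(int bytesRead) throws InterruptedException {
        long bytesSincePause = bytesSinceLastPause.addAndGet(bytesRead);
        if (bytesSincePause > MIN_PAUSE_CHECK_BYTES) {
            // Each call below is atomic by itself, but between the check above and the
            // pause below another stream can read and account more bytes unthrottled.
            bytesSinceLastPause.addAndGet(-bytesSincePause);
            Thread.sleep(10); // pretend to wait out the rate limit
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable stream = () -> {
            for (int i = 0; i < 100; i++) {
                try {
                    maybePause(512); // two streams racing: both can pass before a pause happens
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        };
        Thread a = new Thread(stream);
        Thread b = new Thread(stream);
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```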
Let's use the same approach here as for peer recovery, recovery from remote, and snapshot restore, where the rate limiter is used in a concurrent setting. If there are problems with it (like the one you pointed out), we can fix it (for all of peer recovery / remote recovery / snapshot restore) in a follow-up. We will also need a setting here to make the rate limiter configurable for CCR.
@@ -336,7 +342,7 @@ public void close() {
    }
}

private static class RestoreFileInputStream extends InputStream {
Why remove the `static` here?
In order to access the rate limiter field. We also need to add an `AtomicLong` field for bytes written that needs to be shared with the input streams.
I replicated this logic in the repository. It might be possible to wrap the rate limiter and the atomic long in a single shared class. Also, I added a test, although this is inherently kind of hard to test. The test starts a cluster with a low rate limit, indexes some documents, and then checks that rate limiting is applied. As the test starts and stops multiple clusters, it takes about 10 seconds to run. And it also depends on "time" (if writes were very delayed for some reason, the rate limiting might not kick in). Thoughts on the best testing approach?
 * Max bytes a follower node can recover per second.
 */
public static final Setting<ByteSizeValue> FOLLOWER_RECOVERY_MAX_BYTES_READ_PER_SECOND =
    Setting.byteSizeSetting("ccr.follower.recovery.max_bytes_per_sec", new ByteSizeValue(40, ByteSizeUnit.MB),
Let's use the same setting for rate-limiting inbound as well as outbound CCR recovery traffic. This means removing the "follower" part from the setting name. Perhaps `ccr.indices.recovery.max_bytes_per_sec`.
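For illustration, the renamed setting might be declared along these lines. This is a sketch: the 40 MB default is carried over from the diff above, and the `Dynamic`/`NodeScope` flags are assumptions (the dynamic part is discussed further down in this thread), not the exact merged code.

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

// Hypothetical declaration of the renamed, dynamically updatable setting.
public final class CcrRecoverySettingSketch {

    /** Max bytes per second for CCR recovery traffic, inbound and outbound. */
    public static final Setting<ByteSizeValue> RECOVERY_MAX_BYTES_PER_SECOND =
        Setting.byteSizeSetting(
            "ccr.indices.recovery.max_bytes_per_sec",
            new ByteSizeValue(40, ByteSizeUnit.MB),
            Setting.Property.Dynamic,
            Setting.Property.NodeScope);
}
```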
let's also add a bullet point to the meta issue to document the newly introduced settings
public CcrRepository(RepositoryMetaData metadata, Client client, CcrLicenseChecker ccrLicenseChecker, Settings settings) {
    super(settings);
    this.metadata = metadata;
    assert metadata.name().startsWith(NAME_PREFIX) : "CcrRepository metadata.name() must start with: " + NAME_PREFIX;
    this.remoteClusterAlias = Strings.split(metadata.name(), NAME_PREFIX)[1];
    this.ccrLicenseChecker = ccrLicenseChecker;
    this.client = client;
    this.rateLimiter = new RateLimiter.SimpleRateLimiter(FOLLOWER_RECOVERY_MAX_BYTES_READ_PER_SECOND.get(settings).getMbFrac());
This setting is dynamic, so let's allow it to be adjusted dynamically by registering a settings update consumer (see `RecoverySettings` for how it's done). We can use a `CcrSettings` object to store the latest value, similar to the `RecoverySettings` object. You can create the `CcrSettings` instance in the `Ccr` class and do the registration with `ClusterSettings` in `createComponents`; the cluster settings can be accessed from `clusterService.getClusterSettings()`.
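A sketch of the suggested wiring, reusing the setting sketched above; the class name `CcrSettingsSketch` and the method shapes are assumptions, not the merged implementation.

```java
import org.apache.lucene.store.RateLimiter;
import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.ByteSizeValue;

// Hypothetical CcrSettings-style holder: keeps the current value of the dynamic setting
// (the one sketched earlier in this thread) and pushes updates into the shared limiter.
public class CcrSettingsSketch {

    private final RateLimiter.SimpleRateLimiter rateLimiter;

    public CcrSettingsSketch(Settings settings, ClusterSettings clusterSettings) {
        this.rateLimiter = new RateLimiter.SimpleRateLimiter(
            CcrRecoverySettingSketch.RECOVERY_MAX_BYTES_PER_SECOND.get(settings).getMbFrac());
        // Re-applied whenever the cluster setting changes; no node restart required.
        clusterSettings.addSettingsUpdateConsumer(
            CcrRecoverySettingSketch.RECOVERY_MAX_BYTES_PER_SECOND, this::setMaxBytesPerSecond);
    }

    private void setMaxBytesPerSecond(ByteSizeValue maxBytesPerSecond) {
        rateLimiter.setMBPerSec(maxBytesPerSecond.getMbFrac());
    }

    public RateLimiter rateLimiter() {
        return rateLimiter;
    }
}
```

The instance would be created in the CCR plugin's `createComponents`, passing `clusterService.getClusterSettings()`, and the repository would take its rate limiter from this shared object instead of constructing its own.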
I registered a listener for the setting change.
@@ -235,6 +238,60 @@ public void testDocsAreRecovered() throws Exception {
    thread.join();
}

public void testRateLimitingIsEmployed() throws Exception {
    restartClustersWithSettings(Settings.builder().put(CcrSettings.FOLLOWER_RECOVERY_MAX_BYTES_READ_PER_SECOND.getKey(),
By making the setting truly dynamic, we won't need the restarts here, which will greatly speed up the tests. See SharedClusterSnapshotRestoreIT#testThrottling for how it's tested for snapshot / restore.
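Sketched below is roughly what the restart-free variant could look like inside the existing integration test; the setting key, the `followerClient()` helper, and the final assertion are assumptions modeled on SharedClusterSnapshotRestoreIT#testThrottling, not the exact merged test.

```java
// Hypothetical fragment for inside testRateLimitingIsEmployed(): tighten the dynamic
// limit via the cluster settings API instead of restarting the clusters.
ClusterUpdateSettingsRequest throttleRequest = new ClusterUpdateSettingsRequest();
throttleRequest.transientSettings(Settings.builder()
    .put("ccr.indices.recovery.max_bytes_per_sec", "1kb")
    .build());
assertAcked(followerClient().admin().cluster().updateSettings(throttleRequest).actionGet());

// ... index documents on the leader and trigger the restore on the follower ...

// Finally, assert that the follower-side repository reports a non-zero restore throttle
// time, mirroring SharedClusterSnapshotRestoreIT#testThrottling.
```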
I did the same setup and assertion as that test.
@@ -389,5 +402,17 @@ public int read(byte[] bytes, int off, int len) throws IOException {

    return bytesReceived;
}

private void maybePause(int bytesRequested) {
just to make sure, this is the same code as in PeerRecoveryTargetService. Let's have a follow-up to refactor it into a class that is reusable, and use it everywhere (peer recovery source, target as well as snapshot + restore).
I did have to extract a `CombinedRateLimiter`. Otherwise, the different repositories for different remote clusters would not share the rate limiter. However, I added a meta issue to ensure that we use the same code base for this and peer recovery. So we can work out the issues in that follow-up?
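For illustration, a wrapper along these lines (names and API are assumptions, not the actual `CombinedRateLimiter` that was merged) bundles the Lucene `RateLimiter` with the shared byte counter so every remote-cluster repository can be handed the same instance.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.store.RateLimiter;
import org.elasticsearch.common.unit.ByteSizeValue;

// Illustrative sketch: one instance can be shared by the repositories of all remote
// clusters, so their restore streams draw from a single shared budget.
public class SharedRestoreRateLimiterSketch {

    private final RateLimiter.SimpleRateLimiter rateLimiter;
    private final AtomicLong bytesSinceLastPause = new AtomicLong();

    public SharedRestoreRateLimiterSketch(ByteSizeValue maxBytesPerSec) {
        this.rateLimiter = new RateLimiter.SimpleRateLimiter(maxBytesPerSec.getMbFrac());
    }

    /** Accounts for bytes read by any stream and pauses once the shared threshold is crossed. */
    public void maybePause(int bytesRead) throws IOException {
        long bytesSincePause = bytesSinceLastPause.addAndGet(bytesRead);
        if (bytesSincePause > rateLimiter.getMinPauseCheckBytes()) {
            bytesSinceLastPause.addAndGet(-bytesSincePause);
            rateLimiter.pause(bytesSincePause);
        }
    }

    /** Target for a dynamic settings update consumer. */
    public void setMBPerSec(ByteSizeValue maxBytesPerSec) {
        rateLimiter.setMBPerSec(maxBytesPerSec.getMbFrac());
    }
}
```

Because a single instance is injected into every repository, restores from different remote clusters count against the same limit, which is the sharing problem the extraction addresses.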
Ok, sounds good. Let's do this in a follow-up. Also, we should do a little more testing at that point, to make sure it's behaving the correct way under concurrency. I'm OK with changing the behavior of the blobstore repository if it fixes broken concurrent behavior.
Should be fixed by making this setting dynamically updateable, which will not require cluster restarts.
I added a meta task to combine the different code usages.
Done.
I've left some minor comments
if (bytesSincePause > rateLimiter.getMinPauseCheckBytes()) {
    // Time to pause
    bytesSinceLastPause.addAndGet(-bytesSincePause);
    return rateLimiter.pause(bytesSincePause);
should we have this always return a non-negative number?
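One way to do that (a sketch of the idea based on the fragment above, not the committed change): clamp the limiter's result and return zero on the fast path, so callers that accumulate throttle time never see a negative value.

```java
// Sketch only: normalize the pause result so the method never reports a negative value.
private long maybePause(int bytesRequested) throws IOException {
    long bytesSincePause = bytesSinceLastPause.addAndGet(bytesRequested);
    if (bytesSincePause > rateLimiter.getMinPauseCheckBytes()) {
        // Time to pause
        bytesSinceLastPause.addAndGet(-bytesSincePause);
        // Defensive clamp: never hand a negative pause time back to callers.
        return Math.max(0L, rateLimiter.pause(bytesSincePause));
    }
    return 0L;
}
```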
Made your requested changes, @ywelsch.
LGTM
Jenkins run Gradle build tests 1
Commit elastic#37535 removed an internal restore request in favor of the RestoreSnapshotRequest. Commit elastic#37449 added a new test that used the internal restore request. This commit modifies the new test to use the RestoreSnapshotRequest.
This is related to #35975. This commit implements rate limiting on the follower side using a new class `CombinedRateLimiter`.