
Add upper limit for scroll expiry #26448

Merged

jimczi merged 7 commits into elastic:master from feature/scroll_max_keep_alive on Sep 6, 2017

Conversation


@jimczi jimczi commented Aug 30, 2017

Add upper limit for scroll expiry

This change adds a dynamic cluster setting named search.max_keep_alive.
It is used as an upper limit for scroll expiry time in scroll queries and defaults to 1 day.
For throttling purposes, reindex automatically overrides the default scroll expiry time and could be affected by this change, which is why we use an insanely high default value for the moment (1 day).
This change also ensures that the existing setting search.default_keep_alive is always smaller than search.max_keep_alive.

Relates #11511
Fixes #23268
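The two settings interact in a simple way: each per-request keep-alive is checked against the maximum, and the default itself must never exceed the maximum. A minimal, self-contained sketch of that logic (class and method names are illustrative, not the actual Elasticsearch code; times are in milliseconds here for simplicity):

```java
public class KeepAliveLimits {

    /** Rejects a requested scroll keep-alive above the configured upper bound. */
    static long checkKeepAlive(long requestedMs, long maxKeepAliveMs) {
        if (requestedMs > maxKeepAliveMs) {
            throw new IllegalArgumentException(
                "keep alive (" + requestedMs + "ms) is too large; the limit is set by "
                    + "the search.max_keep_alive cluster setting (" + maxKeepAliveMs + "ms)");
        }
        return requestedMs;
    }

    /** Mirrors the check that search.default_keep_alive stays below search.max_keep_alive. */
    static void validateDefaultAgainstMax(long defaultKeepAliveMs, long maxKeepAliveMs) {
        if (defaultKeepAliveMs > maxKeepAliveMs) {
            throw new IllegalArgumentException(
                "search.default_keep_alive must be smaller than search.max_keep_alive");
        }
    }
}
```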

jimczi added 3 commits August 30, 2017 21:27
This change adds a dynamic cluster setting named `search.max_keep_alive`.
It is used as an upper limit for scroll expiry time in scroll queries and defaults to 1 hour.
This change also ensures that the existing setting `search.default_keep_alive` is always smaller than `search.max_keep_alive`.

Relates elastic#11511
@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories >feature v6.1.0 v7.0.0 labels Aug 30, 2017

jimczi commented Aug 31, 2017

Tests for _reindex fail with the default value of 1 day. We test that _reindex can be paused (by setting requests_per_second to 0.0..1), and this creates a very big expiry time for the internal scroll request.
I pushed a commit that bounds the maximum value for _reindex expiry time to 1h. This is just a workaround to make sure that _reindex does not wait too long between two scroll requests. @nik9000 I can open a separate PR for this but the workaround is pretty simple and not intrusive:
dc8238b
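The workaround amounts to capping the keep-alive that reindex computes for its internal scroll request. A rough sketch of the idea (names are illustrative; dc8238b is the actual commit):

```java
public class ReindexScrollKeepAlive {
    // The cap from the workaround: never ask the scroll to stay alive longer than 1h.
    static final long MAX_KEEP_ALIVE_NANOS = 3_600L * 1_000_000_000L;

    /** Scroll keep-alive covers batch processing plus the throttle wait, capped at 1h. */
    static long keepAliveNanos(long processingNanos, long throttleWaitNanos) {
        return Math.min(MAX_KEEP_ALIVE_NANOS, processingNanos + throttleWaitNanos);
    }
}
```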

*/
    public synchronized <A, B> void addSettingsUpdateConsumer(Setting<A> a, Setting<B> b, BiConsumer<A, B> consumer) {
        addSettingsUpdateConsumer(a, b, consumer, (i, j) -> {});


remove empty line?


public void validate(Integer a, Integer b) {
    if (Integer.signum(a) != Integer.signum(b)) {
        throw new IllegalArgumentException("boom");
    }
}

💥
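The validator in this test runs against both values before either update is applied; the `(i, j) -> {}` in the three-argument overload is simply a no-op validator. A stripped-down sketch of the pattern (illustrative, not the actual Elasticsearch settings code):

```java
import java.util.function.BiConsumer;

public class PairedSettingUpdater<A, B> {
    private final BiConsumer<A, B> validator;
    private final BiConsumer<A, B> consumer;

    public PairedSettingUpdater(BiConsumer<A, B> validator, BiConsumer<A, B> consumer) {
        this.validator = validator;
        this.consumer = consumer;
    }

    /** Validates the pair first; the consumer only runs if validation passes. */
    public void apply(A a, B b) {
        validator.accept(a, b);
        consumer.accept(a, b);
    }
}
```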

@nik9000 nik9000 left a comment

I expected some reindex integration tests to fail as well.....

/**
* Maximum wait time allowed for throttling.
*/
private static final long MAX_THROTTLE_WAIT_TIME = TimeUnit.HOURS.toNanos(1);

I'd probably use a TimeValue instead. That way you never have to wonder "is this in nanos or millis?"
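The ambiguity goes away once the unit travels with the value. Here is the same cap expressed with `java.time.Duration` as a stand-in for Elasticsearch's `TimeValue` (an analogy, not the actual code):

```java
import java.time.Duration;

public class ThrottleWait {
    // The unit is part of the type, so callers never wonder "nanos or millis?".
    static final Duration MAX_THROTTLE_WAIT = Duration.ofHours(1);

    /** Clamps a computed wait into [0, MAX_THROTTLE_WAIT]. */
    static Duration clamp(Duration wait) {
        if (wait.isNegative()) {
            return Duration.ZERO;
        }
        return wait.compareTo(MAX_THROTTLE_WAIT) > 0 ? MAX_THROTTLE_WAIT : wait;
    }
}
```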

@@ -189,7 +195,8 @@ public void delayPrepareBulkRequest(ThreadPool threadPool, TimeValue lastBatchSt

 public TimeValue throttleWaitTime(TimeValue lastBatchStartTime, TimeValue now, int lastBatchSize) {
     long earliestNextBatchStartTime = now.nanos() + (long) perfectlyThrottledBatchTime(lastBatchSize);
-    return timeValueNanos(max(0, earliestNextBatchStartTime - System.nanoTime()));
+    long waitTime = min(MAX_THROTTLE_WAIT_TIME, max(0, earliestNextBatchStartTime - System.nanoTime()));

I like limiting this here but think maybe we ought to also enforce it at request start and rethrottle time. We can work backwards from the batch size to reject requests that had a requests_per_second that is too small, I think.
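Working backwards is straightforward arithmetic: at `requests_per_second` r, a batch of n documents implies a wait of n / r seconds before the next scroll request, so any rate whose implied wait exceeds the cap could be rejected up front. A sketch of that check (illustrative; this validation is not part of this PR):

```java
public class RethrottleValidation {
    static final double MAX_WAIT_SECONDS = 3600.0; // the 1h throttle cap

    /** Rejects a rate whose implied inter-batch wait exceeds the cap. */
    static void checkRequestsPerSecond(int batchSize, float requestsPerSecond) {
        if (requestsPerSecond <= 0) {
            throw new IllegalArgumentException("requests_per_second must be positive");
        }
        double impliedWaitSeconds = batchSize / (double) requestsPerSecond;
        if (impliedWaitSeconds > MAX_WAIT_SECONDS) {
            throw new IllegalArgumentException(
                "requests_per_second is too small: a batch of " + batchSize
                    + " implies waiting " + impliedWaitSeconds + "s between scroll requests");
        }
    }
}
```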


I wonder if we should just do that in a followup.


I think it might be nice to use the actual value of the setting but that can be a "reindex only" followup thing, I think.


nik9000 commented Sep 5, 2017

I expected some reindex integration tests to fail as well.....

Talked to Jim - I was wrong. No tests should fail because we never assert that we wait as long as we say we do. We have an integration test in reindex that starts the request very slowly and then rethrottles it to speed it up, but one hour is slow enough for it. If we modified reindex to fail user-side requests that set the limit too long, then the test would fail. We probably should do that as a separate change to reindex after this is merged, just so no one is surprised about the sleep.

@jimczi jimczi merged commit 0c799ee into elastic:master Sep 6, 2017
@jimczi jimczi deleted the feature/scroll_max_keep_alive branch September 6, 2017 08:06
jimczi added a commit that referenced this pull request Sep 7, 2017
This change adds a dynamic cluster setting named `search.max_keep_alive`.
It is used as an upper limit for scroll expiry time in scroll queries and defaults to 1 hour.
This change also ensures that the existing setting `search.default_keep_alive` is always smaller than `search.max_keep_alive`.

Relates #11511

* check style

* add skip for bwc

* iter

* Add a maximum throttle wait time of 1h for reindex

* review

* remove empty line
jimczi added a commit that referenced this pull request Sep 7, 2017
jimczi added a commit that referenced this pull request Sep 7, 2017
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019