Retry metric store for more transport errors #538

Merged
dliappis merged 5 commits into elastic:master from the retry-metric-store-for-502-503-504 branch on Jul 24, 2018

Conversation

dliappis
Contributor

Improve the resiliency of Rally when using an Elasticsearch remote metrics store by retrying certain transport errors.

Also enhance the existing tests to check that retries, debug output, and sleeps are executed properly.

@dliappis dliappis changed the title Retry metric store for 502 503 504 Retry metric store for more transport errors Jul 23, 2018
@dliappis dliappis added enhancement Improves the status quo :Metrics How metrics are stored, calculated or aggregated labels Jul 23, 2018
@dliappis dliappis added this to the 1.0.1 milestone Jul 23, 2018
Member

@danielmitterdorfer danielmitterdorfer left a comment

Thanks for the improvement! Looks fine in general, I just left a few ideas for you to ponder.


# should return on first success
operation = mock.Mock(side_effect=[gateway_timeout, gateway_timeout, "success", gateway_timeout])
operation = mock.Mock(side_effect=[bad_gateway, bad_gateway, "success", bad_gateway])
Member

Maybe randomize and vary the error type here? I think then we could also reduce the number of individual test cases?
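
One way the randomization suggested here could look is sketched below. It assumes a stand-in exception class and a tiny retry loop so that the example runs on its own; `FakeTransportError`, `retry`, `RETRIABLE` and `random_failures_then_success` are hypothetical names for illustration, not Rally's actual code.

```python
import random
from unittest import mock


class FakeTransportError(Exception):
    """Hypothetical stand-in for a transport error that carries an HTTP status."""

    def __init__(self, status_code):
        super().__init__(status_code)
        self.status_code = status_code


# Status codes treated as retriable in this sketch (taken from the diff under review).
RETRIABLE = (502, 503, 504, 429)


def retry(operation, max_attempts=10):
    """Minimal retry loop, present only to exercise the mock below."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except FakeTransportError as e:
            # Give up on non-retriable codes or when attempts are exhausted.
            if e.status_code not in RETRIABLE or attempt == max_attempts:
                raise


def random_failures_then_success(rng, max_failures=5):
    """Build a side_effect list: N random retriable errors, then a success."""
    failures = [FakeTransportError(rng.choice(RETRIABLE))
                for _ in range(rng.randint(1, max_failures))]
    # The trailing error must never be reached: we return on first success.
    return failures + ["success", FakeTransportError(rng.choice(RETRIABLE))]


rng = random.Random()
side_effect = random_failures_then_success(rng)
operation = mock.Mock(side_effect=side_effect)
result = retry(operation)
```

Because both the error type and the number of failures vary per run, a single test like this could stand in for the separate per-status-code cases. Asserting on `operation.call_count` confirms that the loop stops at the first success.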

elif e.status_code == 429 and execution_count < max_execution_count:
self.logger.debug("Execution rejected in attempt [%d/%d].", execution_count, max_execution_count)
time.sleep(3)
retriable_responses_with_sleep = {502: 1, 503: 1, 504: 1, 429: 3}
Member

I think it would also be ok if we settle on a single waiting period. Wdyt?
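
A minimal sketch of retrying with a single waiting period, assuming a fixed 3 second wait for every retriable status code. `FakeTransportError`, `guarded` and the `sleep` parameter are hypothetical and only illustrate the idea; the default attempt count is an arbitrary choice, and none of this is the code in this PR.

```python
import logging
import time

logger = logging.getLogger(__name__)


class FakeTransportError(Exception):
    """Hypothetical stand-in for a transport error that carries an HTTP status."""

    def __init__(self, status_code):
        super().__init__(status_code)
        self.status_code = status_code


# One set of retriable codes instead of a per-code sleep mapping.
RETRIABLE_STATUS_CODES = frozenset({502, 503, 504, 429})


def guarded(operation, max_execution_count=11, sleep_seconds=3, sleep=time.sleep):
    """Call operation, retrying retriable errors after a fixed wait."""
    for execution_count in range(1, max_execution_count + 1):
        try:
            return operation()
        except FakeTransportError as e:
            retriable = e.status_code in RETRIABLE_STATUS_CODES
            if retriable and execution_count < max_execution_count:
                logger.debug(
                    "Transport error (code: %d) in attempt [%d/%d]. Sleeping for [%d] seconds.",
                    e.status_code, execution_count, max_execution_count, sleep_seconds)
                sleep(sleep_seconds)
            else:
                # Non-retriable code, or no attempts left: surface the error.
                raise
```

Passing `sleep` as a parameter is a design choice for this sketch: a test can substitute a recording function and check the waits without actually sleeping.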

Dockerfile Outdated
@@ -25,6 +25,7 @@ ENV HOME /home/${NEW_USER}
ENV PYENV_ROOT=/home/${NEW_USER}/.pyenv
ENV PATH=$PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
ENV JAVA_HOME=/opt/jdk-10
ENV JAVA10_HOME=/opt/jdk-10
Member

This seems unrelated?

Contributor Author

Yes, this came from commit 9ddb393, which fixes the integration tests in Docker. I'll pull it out and submit it separately.

dliappis added 2 commits July 24, 2018 10:26
and switch to fixed wait time between attempts (3s).
This will be addressed separately outside of the PR.
@dliappis
Contributor Author

@danielmitterdorfer Thanks for the review! I changed the logic, as discussed, to use randomized tests and fixed retry periods. Would appreciate another look when you have time.

Member

@danielmitterdorfer danielmitterdorfer left a comment

The changes look great. Thanks for the PR! LGTM.

@dliappis dliappis merged commit 7724526 into elastic:master Jul 24, 2018
@dliappis dliappis deleted the retry-metric-store-for-502-503-504 branch July 24, 2018 09:12
@dliappis
Contributor Author

Thank you for the review @danielmitterdorfer !

Labels
enhancement Improves the status quo :Metrics How metrics are stored, calculated or aggregated

2 participants