Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout for waiting on compactor to become ACTIVE in the ring. #4262

Merged
merged 17 commits into from
Jul 5, 2021

Conversation

ac1214
Copy link
Contributor

@ac1214 ac1214 commented Jun 4, 2021

Signed-off-by: Albert [email protected]

What this PR does:
Addresses #3914
Which issue(s) this PR fixes:
Fixes #3914

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Given this PR tries to address #3914, I think a timeout is just fine. You can test a deadline on the context passed to ring.WaitInstanceState and that should be enough.

@ac1214 ac1214 changed the title add MaxRetries to WaitInstanceState Add timeout for waiting on compactor to become ACTIVE in the ring. Jun 4, 2021
@ac1214 ac1214 requested a review from pracucci June 4, 2021 16:31
Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It's broadly ok, but a couple of style points.

It looks to me like the tests you've added test similar code, rather than the code you added. If I'm right, could you test the actual code?

What is the user experience if the timeout is hit? Do they see anything in the logs to let them know what happened?

pkg/compactor/compactor.go Outdated Show resolved Hide resolved
pkg/ring/ring.go Outdated Show resolved Hide resolved
@ac1214
Copy link
Contributor Author

ac1214 commented Jun 22, 2021

Thanks for the PR! It's broadly ok, but a couple of style points.

It looks to me like the tests you've added test similar code, rather than the code you added. If I'm right, could you test the actual code?

What is the user experience if the timeout is hit? Do they see anything in the logs to let them know what happened?

I added tests to actually test the starting of the compactor. I also added some logging as to make it clear to why the compactor failed to start.

@ac1214 ac1214 requested a review from bboreham June 22, 2021 19:09
@bboreham
Copy link
Contributor

Sorry but we have begun the process of cutting a new release; please rebase from master and move your CHANGELOG entry to the top under ## master / unreleased

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job and thanks for addressing my feedback. Way easier than the initial version. I left few last comments 🙏

pkg/compactor/compactor.go Outdated Show resolved Hide resolved
pkg/compactor/compactor_ring.go Outdated Show resolved Hide resolved
pkg/compactor/compactor_ring.go Outdated Show resolved Hide resolved
pkg/compactor/compactor_test.go Outdated Show resolved Hide resolved
pkg/ring/util_test.go Outdated Show resolved Hide resolved
@ac1214 ac1214 requested a review from pracucci June 30, 2021 17:35
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo a nit)

CHANGELOG.md Outdated Show resolved Hide resolved
@ac1214
Copy link
Contributor Author

ac1214 commented Jul 1, 2021

Would I be able to get a second review on this PR and get the workflow triggered again? @bboreham
Thanks!

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Marco Pracucci <[email protected]>
@pracucci pracucci enabled auto-merge (squash) July 5, 2021 15:17
@pracucci pracucci merged commit b4af68a into cortexproject:master Jul 5, 2021
alvinlin123 pushed a commit to ac1214/cortex that referenced this pull request Jan 14, 2022
…ortexproject#4262)

* add MaxRetries to WaitInstanceState

Signed-off-by: Albert <[email protected]>

* update CHANGELOG.md

Signed-off-by: Albert <[email protected]>

* Add timeout for waiting on compactor to become ACTIVE in the ring.

Signed-off-by: Albert <[email protected]>

* add MaxRetries variable back to WaitInstanceState

Signed-off-by: Albert <[email protected]>

* Fix linting issues

Signed-off-by: Albert <[email protected]>

* Remove duplicate entry from changelog

Signed-off-by: Albert <[email protected]>

* Address PR comments and set timeout to be configurable

Signed-off-by: Albert <[email protected]>

* Address PR comments and fix tests

Signed-off-by: Albert <[email protected]>

* Update unit tests

Signed-off-by: Albert <[email protected]>

* Update changelog and fix linting

Signed-off-by: Albert <[email protected]>

* Fixed CHANGELOG entry order

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Albert <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Signed-off-by: Alvin Lin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

compactor went into tailspin
3 participants