improve informer test #594
Conversation
Force-pushed from 2a2748f to c23dfac
Force-pushed from 8a3c96b to a819708
  @waiter.run
rescue ThreadError # rubocop:disable Lint/SuppressedException
end
@waiter.join
Oh I think I see — previously, leftover waiter thread(s) could execute @watcher.finish at any time, possibly interrupting unrelated watchers from later tests, right?

But if we somehow (e.g. 'ERROR') come here early during sleep(@reconcile_timeout), we might block for a long time — default 15min right? Here, what we're potentially blocking is the existing worker thread's loop, which won't be directly felt by the app but can cause significant gaps in the watching?
It would have killed a random new waiter before (not really in tests, since they all build their own informer).

Added a Thread.pass to fix the race condition (don't think it really happens, since the watch is a http request so it will take some time, but it's cheap so 🤷).
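For context, a minimal sketch of the lost-wakeup race that Thread.pass makes less likely (names and the 900-second value are illustrative, not the actual Informer internals):

waiter = Thread.new do
  sleep(900) # stand-in for sleep(@reconcile_timeout) followed by @watcher.finish
end

Thread.pass # give the waiter a chance to actually reach sleep first

# ... the watch runs here; once it returns, wake the waiter:
begin
  waiter.run # if the waiter had not reached sleep yet, this wakeup is lost
             # and the later join would block for the full timeout
rescue ThreadError
  # waiter already finished on its own
end
waiter.join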
rescue ThreadError
  # thread was already dead
end
thread.join
Similar question about @waiter possibly blocking for many minutes. Here it'd block the app calling stop_worker, right?
- I'm thinking there is a subordinate relationship between worker thread -> single watch_to_update_cache run -> waiter thread. 🤔 What if waiter threads did not refer to the current instance variable (@watcher.finish) but closed over a lexically scoped reference to their watcher? Would it then be safe to leak unfinished waiter threads without .join-ing them? Would it be a good idea?
- Waiter threads don't do much — are they safe to .kill? Or is there another way to interrupt a sleep() early?
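For illustration, a hypothetical sketch of what the lexically scoped option could look like (start_waiter and the surrounding structure are made up, not the actual Informer code):

def start_waiter
  watcher = @watcher # capture a lexically scoped reference
  @waiter = Thread.new do
    sleep(@reconcile_timeout)
    # a leaked waiter can only finish the watcher it was created for,
    # even if @watcher has since been replaced by a newer watch run
    watcher.finish
  end
end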
"Or is there another way to interrupt a sleep() early?"

You can use Concurrent::Event with a wait timeout to implement an interruptible sleep.
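For illustration, a minimal sketch of such an interruptible sleep, assuming the concurrent-ruby gem is available (names and the 900-second timeout are placeholders, not the Informer's actual fields):

require 'concurrent'

stop = Concurrent::Event.new

waiter = Thread.new do
  # wait returns true if stop.set was called, false if the timeout elapsed,
  # so the thread can exit promptly either way instead of sleeping blindly
  timed_out = !stop.wait(900) # stand-in for sleep(@reconcile_timeout)
  puts 'reconcile timeout reached' if timed_out # stand-in for @watcher.finish
end

stop.set    # stopping early wakes the waiter immediately
waiter.join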
We killed them before, but that made the tests brittle, so I like the joining since it makes anything that goes wrong more obvious (and re-raises exceptions too).
From my testing, @watcher.each can get blocked on the response stream. Even closing the http_client will not unblock the response body stream. In that case, join on the waiter thread that runs @watcher.each internally will never return.
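For illustration, if a thread can end up permanently blocked like that, Thread#join with a timeout is one way to bound the wait (a sketch, not what this PR does; the 5-second limit is arbitrary, and `thread` stands in for the stuck thread above):

if thread.join(5).nil? # join returns nil if the thread is still alive after 5s
  # last resort: kill it; this can leak the underlying HTTP socket
  thread.kill
  thread.join
end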
@agrare I don't know if you have been following the Kubeclient::Informer class on master branch, but perhaps it's interesting for future use in manageiq (?). Either way, you may have useful experience to share here from how manageiq is killing & restarting watches... That's a long way of saying "please review" 😉

[P.S. entirely out-of-scope here, but there is the unfinished business of #488. The blocker there was we didn't know how to implement watch
Hey @cben! No, I hadn't seen this. It definitely looks intriguing, though we try not to cache inventory to keep our memory usage down, so I doubt we'd use it directly, but I can definitely see how this would be useful if your code were only interacting with kubeclient. There definitely could be a more user-friendly interface around watches, since it seems we are solving the same issues.

I notice a few minor differences in how we handle this compared to the Informer implementation. I believe we pulled this logic from the kubevirt/cnv ManageIQ provider, which was contributed by the kubevirt team. I'm not saying this is better or worse than the Informer implementation, just noting the differences.

Yes we do use

What is the purpose of the waiter thread? It looks like it'll stop the watch after 15 minutes?
Force-pushed from a819708 to e5cda0a
I think this is a good step forward, so I'd prefer to merge and then iterate further if there are still open issues.
Sorry, missed your reply entirely. FWIW, I'm still somewhat concerned about a call to

And I'm certainly happy about CI looking reliably green now 🎉 Merging.
ensure
  sleep(1) # do not overwhelm the api-server if we are somehow broken
end
break if @stopped
Why not change the loop to until @stopped then?
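For illustration, the two loop shapes being discussed (a sketch, not the exact source; watch_to_update_cache stands in for the worker body):

# current shape: @stopped is checked after each iteration,
# so the body always runs at least once
loop do
  begin
    watch_to_update_cache
  ensure
    sleep(1) # do not overwhelm the api-server if we are somehow broken
  end
  break if @stopped
end

# suggested shape: @stopped is checked before each iteration instead
until @stopped
  begin
    watch_to_update_cache
  ensure
    sleep(1)
  end
end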
Making all the threads terminate and no longer killing things ... 3 green CI runs 🤞