Daemonset sleep period/restart causes confusing logs #970
There are multiple k8s issues about run-once DaemonSet behavior, but nothing seems to be happening on them. Those issues suggested that a new supported restart policy may eventually be added.
I think that we can just loop the sleep forever instead of sleeping for 1h. Is there a downside to that? It just changes the systemd-logs startup command from a single fixed sleep to an infinite sleep loop, as in the sketch below.

@zubron what do you think? Any downside to that?
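A minimal sketch of the change; `do_work` here is a stand-in for the plugin's actual log-gathering step, not its real script name:

```sh
# Before: the container exits after one hour, and because a DaemonSet
# must use restartPolicy=Always, the pod restarts and resubmits results.
do_work && sleep 3600

# After: sleep in an infinite loop so the container never exits on its own.
do_work && while true; do sleep 3600; done
```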
@johnSchnake I just took a look through that issue and some of the other duplicate issues. This approach seems fine and looks like what most other people are doing anyway. I don't know how else we could work around this. Even if support for a new restart policy was introduced, it would be quite a few release cycles before we could ever make use of it given that we need to support older cluster versions.
We currently sleep for 1h after the logs are gathered, but some test runs take multiple hours, which leads to this plugin getting restarted and trying to resubmit logs. This causes confusing messages in the logs, since the original plugin pod is gone and the only one that exists gets a 409 error from submitting duplicate results.

Fixes #970

Signed-off-by: John Schnake <[email protected]>
What steps did you take and what happened:
Run an e2e+systemdlogs run. If the e2e tests take a long time (multiple hours), the systemdlogs plugin will gather logs, sleep for an hour, and shut down (causing a restart, since a DaemonSet has to have restartPolicy=Always); then it gathers logs again and the server rejects them as duplicates.
This can be really confusing in error cases, since the logs make it unclear where results came from, when good results got processed, why retries were occurring, etc.
That was one of the issues hitting #969.
What did you expect to happen:
I want the daemonsets to be able to be run-once with restartPolicy=Never, e.g. a job that starts on every node. Kubernetes just doesn't have that as of now, so we've fallen back to `do work && sleep 3600`. We need to reconsider how that integrates with the aggregator and logging to avoid the confusion. Maybe even on startup it can somehow check whether it has already reported results itself and exit if it has; a sketch of that idea follows below. Could that cause worse problems at some point?
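One possible shape for that guard, sketched with a hypothetical marker file. Nothing like this exists in the plugin today, and `do_work` again stands in for the gather-and-submit step:

```sh
#!/bin/sh
# Hypothetical marker recording that results were already submitted.
# It would need to live on a volume (e.g. an emptyDir) to survive a
# container restart; the container's own filesystem is reset on restart.
DONE_MARKER=/results/.submitted

# Only gather and submit if we haven't already done so.
if [ ! -f "$DONE_MARKER" ]; then
  do_work && touch "$DONE_MARKER"
fi

# Sleep forever so the container never exits and triggers a restart.
while true; do sleep 3600; done
```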
Anything else you would like to add:
The problem in the logs is that the flow goes like this:

1. The plugin pod gathers logs and submits results to the aggregator.
2. It sleeps for 1h and then exits, so the DaemonSet restarts it.
3. The new pod gathers logs again and resubmits; the aggregator rejects the duplicate with a 409.

It makes it unclear when, and by whom, results were ever submitted.