Do not report problems until kube-apiserver is ready #295
Comments
We've discussed 3 ways to solve the problem.
I prefer (3), which is the easy way to solve the problem without any side effects on the existing behavior.
Posting the comments from @Random-Liu in our offline discussion.
I will follow the advice and go with option (1).
In #288, we changed NPD to run custom plugins on startup. I hoped this would allow NPD to always report an event immediately when the cluster is just created, no matter how big the `invoke_interval` is.
However, this will not always work due to its interaction with kube-apiserver. What I observed during cluster creation is below.
Unable to write event: 'Post https://x.x.x.x/api/v1/namespaces/default/events: dial tcp 34.68.6.201:443: connect: connection refused' (may retry after sleeping)
events is forbidden: User "system:node-problem-detector" cannot create resource "events" in API group "" in the namespace "default"' (will not retry!)
There is a small window between (3) and (5): if the event is rejected during that interval, it will never be resent.
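The difference between the two log lines above can be illustrated with a simplified model. This is only a sketch of the behavior described here, not the event library's actual code: `ConnectionRefused`, `Forbidden`, and `send_with_retry` are hypothetical names, and the retry limit is an arbitrary assumption.

```python
class ConnectionRefused(Exception):
    """Stand-in for a transport error: kube-apiserver is not reachable yet."""


class Forbidden(Exception):
    """Stand-in for an authorization error: permissions are not in place yet."""


def send_with_retry(post, event, max_attempts=3):
    """Try to deliver `event` via `post`; return True if it was delivered.

    Connection errors are retried ("may retry after sleeping"), while a
    permission error drops the event immediately ("will not retry!").
    """
    for _ in range(max_attempts):
        try:
            post(event)
            return True
        except ConnectionRefused:
            continue  # transient: try again on the next attempt
        except Forbidden:
            return False  # treated as permanent: the event is lost
    return False
```

In this model, an event that first hits a refused connection and then succeeds is delivered, but an event that hits a permission error at any point is dropped even if a later attempt would have succeeded.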
Changing the event library to always retry on permission errors may or may not make sense. But what we can do in NPD is introduce a configurable `initial_delay` for custom plugins. In this case, I can configure it to 1m with `invoke_interval` still being 6h. The plugin will then first run 1m after NPD starts.

/cc @wangzhen127 @Random-Liu