informer library for shared watching with retries and updated list #494
Minor questions aside, this is great ❤️
end

def fill_cache
  reply = @client.get_entities(nil, @resource_name, raw: true, resource_version: '0')
Will listing with version "0" return a .metadata.resourceVersion we can watch from? "0" means "any version is OK", which allows the server to return a cached response, but I vaguely remember hearing that this might yield a version so old that watching from it returns 410 (?)
Additionally, when List+Watch restarts, this might lead to replaying a state + events that are older than we already observed?
https://kubernetes.io/docs/reference/using-api/api-concepts/#the-resourceversion-parameter doesn't exactly answer this, but does say about supplying resourceVersion="0" to watch directly:
Get State and Start at Any: Warning: Watches initialize this way may return arbitrarily stale data! Please review this semantic before using it, and favor the other semantics where possible. Start a watch at any resource version, the most recent resource version available is preferred, but not required; any starting resource version is allowed. It is possible for the watch to start at a much older resource version that the client has previously observed, particularly in high availability configurations, due to partitions or stale caches. Clients that cannot tolerate this should not start a watch with this semantic. To establish initial state, the watch begins with synthetic "Added" events for all resources instances that exist at the starting resource version. All following watch events are for all changes that occurred after the resource version the watch started at.
- if the version was too old, it would result in a restart of the list/watch loop, so we should be good (never saw this happening)
- watch does not receive the 0, but what the list returned ... but it could end up in a replay since the list might have gotten old data ... so ideally we'd keep track of the last resourceVersion we saw and then do something with that (with a "latest" approach we'd lose some events though)
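To make the flow discussed above concrete, here is a minimal sketch of a list-then-watch loop that starts each watch from the version the list returned, restarts on an ERROR notice, and records the last resourceVersion it saw. `FakeClient`, its `list` and `watch` methods, the `rounds:` parameter, and keying the cache by uid are all assumptions made for illustration; they are not the kubeclient API or the code in this PR. The sketch only records the last seen version and does not compare versions, since Kubernetes documents resourceVersion as an opaque string.

```ruby
# Hypothetical stand-in for a Kubernetes client; not the kubeclient API.
class FakeClient
  attr_reader :watch_calls

  def initialize
    @watch_calls = []
    @lists = 0
  end

  # Returns a hash shaped like a raw Kubernetes list response.
  def list
    @lists += 1
    version = (@lists * 100).to_s
    {
      metadata: { resourceVersion: version },
      items: [{ metadata: { uid: 'a', resourceVersion: version } }]
    }
  end

  # Yields watch notices; the first watch fails as if the version had expired.
  def watch(resource_version)
    @watch_calls << resource_version
    if @watch_calls.size == 1
      yield({ type: 'ERROR', object: { code: 410 } })
    else
      yield({ type: 'MODIFIED', object: { metadata: { uid: 'a', resourceVersion: '201' } } })
    end
  end
end

# Runs list + watch for a bounded number of rounds so the example terminates.
def list_and_watch(client, rounds:)
  cache = {}
  last_seen = nil
  rounds.times do
    reply = client.list
    cache = reply[:items].to_h { |item| [item[:metadata][:uid], item] }
    # The watch starts from the version the list returned, never from "0".
    last_seen = reply[:metadata][:resourceVersion]
    client.watch(last_seen) do |notice|
      break if notice[:type] == 'ERROR' # e.g. 410 Gone: re-list and re-watch

      object = notice[:object]
      cache[object[:metadata][:uid]] = object
      last_seen = object[:metadata][:resourceVersion]
    end
  end
  [cache, last_seen]
end
```

What to do with `last_seen` after a restart (for example, detecting that a fresh list is older than what was already observed) is left open here, as it is in the discussion.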
lib/kubeclient/informer.rb
Outdated
case notice[:type]
when 'ADDED', 'MODIFIED' then @cache[cache_key(notice[:object])] = notice[:object]
when 'DELETED' then @cache.delete(cache_key(notice[:object]))
when 'ERROR' then break # restart
- for 410 (Gone / Expired), this will restart the fill_cache -> watch_to_update_cache loop ✔️
- can any other errors be interesting for user?
no idea, I never had trouble with just ignoring :D
lib/kubeclient/informer.rb
Outdated
when 'ADDED', 'MODIFIED' then @cache[cache_key(notice[:object])] = notice[:object]
when 'DELETED' then @cache.delete(cache_key(notice[:object]))
when 'ERROR' then break # restart
else raise "Unsupported event type #{notice[:type]}"
If this happens (shouldn't), it'd raise in the worker thread. Would ruby even print it?
might be most consistent to ignore it so the worker does not crash ...
signaling back is kinda tricky, that's what I'm planning to use a logger for
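On the question of whether Ruby would print it: since Ruby 2.5, `Thread.report_on_exception` defaults to true, so an unhandled exception in a worker thread is reported on stderr when that thread terminates. A rough sketch of the log-and-ignore alternative suggested above might look like the following. The `apply_notice` helper, its `:ok` / `:restart` return values, and the uid-based cache key are hypothetical names chosen for this example, not the code that was merged.

```ruby
require 'logger'
require 'stringio'

# Applies one watch notice to the cache. Returns :restart when the caller
# should re-list, :ok otherwise. Unknown types are logged instead of raised,
# so the worker thread keeps running.
def apply_notice(cache, notice, logger)
  object = notice[:object]
  case notice[:type]
  when 'ADDED', 'MODIFIED'
    cache[object[:metadata][:uid]] = object # assumption: uid is the cache key
    :ok
  when 'DELETED'
    cache.delete(object[:metadata][:uid])
    :ok
  when 'ERROR'
    :restart
  else
    logger.warn("Unsupported event type #{notice[:type]}")
    :ok
  end
end

# Usage: log to an in-memory buffer so the warning can be inspected.
log_output = StringIO.new
logger = Logger.new(log_output)
cache = {}
apply_notice(cache, { type: 'ADDED', object: { metadata: { uid: 'a' } } }, logger)
apply_notice(cache, { type: 'BOOKMARK', object: { metadata: { uid: 'a' } } }, logger)
```

Returning a symbol rather than calling `break` inside the helper keeps the restart decision with the loop that owns the watch.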
still any showstoppers?
[we're awaiting a baby any time now, so my response times will be more erratic than they already were.] One thought I had is this actually consists of several parts:
In theory they could be split, one might want restarts without keeping a cache in RAM (or only a subset of cache). LGTM 👍 Can you check the test failure?
I think I got the tests fixed now ... truffle-ruby had some timing issues, so had to disable 1 test there
LGTM. I'm still uncertain about using resourceVersion "0" for list, but we can revise that later if needed.
fixes #456