-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(pubsub): ordering keys #26
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on the original PR, a system test would be very good to have, although that can be added separately.
There is a possibility that the code temporarily deviates from the FlowControl.max_bytes
limit in certain usage scenarios - which are believed to be atypical - but I'd like to hear from @kamalaboulhosn for the final ruling on that.
The PR still needs changes from the latest master
, GitHub reports that it's out of date and not mergeable yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like pubsub/google/cloud/pubsub_v1/subscriber/_protocol/messages_on_hold.py didn't get pulled over.
@plamut I'm actually okay with the deviation in this way. Here is the way I think about it: Max bytes flow control is to save the client from OOMing when the messages could vary, but the size of the message is the overwhelmingly interesting thing that takes up memory. Given that that message is already in memory in the client library and buffered, no more memory is being used to send it to the user callback. If the interesting factor in memory utilization is state that has to be loaded due to the message, e.g., have to load a large record from a database or a file from GCS, then the number of messages is the better thing to use for flow control. For that flow control, I believe we won't deliver more messages unless we can. |
@kamalaboulhosn I hadn't thought of that max-bytes limit as primarily an OOM-protection feature. I'll keep that in mind. I actually just now tried the code changes that would need to be made to prevent this overflow. It's not much of a change, but it adds more ifs-ands-and-buts to the design, making it harder to understand. Ie: the invariant that keys in the "pending ordering keys" dict all have messages in flight is diluted: now, there's an extra possibility that the key is waiting to be "activated". In any case, I prefer to keep things simple for now, because it's only gonna get more complicated when we do an optimization pass and fix bugs. |
Going to release a version of the library before merging this branch. |
we have been seeing this new assert trigger in production https://github.com/googleapis/python-pubsub/pull/26/files#diff-a9838428343bb40f2ac9b6cf2beb9f77R332-R335 I feel like if there is an error that it should try and create a new batch rather than hit that assert. given the rest of the code if I am not mistaken the assert should possible be moved down below
so that it will return None and create a new batch. |
Fixes #6