ConnectionRecovery recovers models before consumers #1076
Comments
I would say that considering a recovering channel to be useful is generally wrong, but there is no realistic way for us to block all other operations (that would be highly surprising after all these years).
The "fast reply consumer" message comes from the Direct Reply-to mechanism. It's not a real consumer and it does not use a real queue. It relies on a convention in the channel state, so when channels are restarted, that state is completely lost. All this happens on the RabbitMQ node's end.
Yes, but from the client's point of view it's an active consumer that needs to be recovered before you can publish an RPC successfully. I want to have a look and check whether something could be done to improve it.
This is now fixed on the 6.x branch and on main. Would it be possible to do a bugfix release for 6.x and then close this issue?
I'll try to do it today. There were some pipeline changes forced upon us (e.g. expiring signing keys) so I cannot be certain releasing would just work.
We had a go at doing a release, but it required rebuilding an internally used Docker image. Hopefully we can finalise it later this week. Sorry about the wait.
Would it be possible to give a rough timeframe? (We're planning a release of our product, in which we hit this issue.)
I have reproduced a state in our application where the connection recovery leads to an unrecoverable error.
Situation:
Now the fun begins...
While the recovery is within these lines here, an RPC message is published.
Since the models have already been recovered (Line 420), the publish actually succeeds, but the server will respond with
PRECONDITION_FAILED - fast reply consumer does not exist
because the consumer for the replies has not yet been recovered (Line 424). The outcome is that the channel is closed and rendered unusable, while the connection recovery succeeds.
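To make the window concrete, here is a toy model of the ordering described above. It is only a sketch: `ToyChannel`, `recover` and `PreconditionFailed` are hypothetical names invented for illustration and are not part of the client, and the broker's behaviour is reduced to a single flag.

```python
class PreconditionFailed(Exception):
    """Stands in for the broker's channel-level PRECONDITION_FAILED error."""


class ToyChannel:
    """Hypothetical stand-in for a channel; not the client's real class."""

    def __init__(self):
        self.open = False
        self.has_reply_consumer = False

    def publish_rpc(self):
        if not self.open:
            raise RuntimeError("channel is closed")
        if not self.has_reply_consumer:
            # A channel-level error closes the channel.
            self.open = False
            raise PreconditionFailed("fast reply consumer does not exist")
        return "published"


def recover(channel, between_steps=None):
    """Recover the model first, then its consumers (the order reported above)."""
    channel.open = True                      # step 1: model recovered
    if between_steps is not None:
        between_steps(channel)               # window in which the app may publish
    if channel.open:
        channel.has_reply_consumer = True    # step 2: reply consumer recovered
    return True                              # connection recovery still "succeeds"
```

Calling `recover(channel, lambda ch: ch.publish_rpc())` raises `PreconditionFailed` and leaves `channel.open` as `False`, while `recover(channel)` with no publish in the window leaves the channel usable.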
So the question is, should the AutorecoveringModel already be usable again prior to the recovery being finished?
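Until that is decided, an application could hold back its own publishes while a recovery is in progress. A minimal sketch of such a gate follows; `RecoveryGate` is a hypothetical helper, not something the client provides, and wiring `recovery_started` and `recovery_finished` to the client's shutdown and recovery notifications is assumed rather than shown.

```python
import threading


class RecoveryGate:
    """Hypothetical application-side helper: holds publishers back during recovery."""

    def __init__(self):
        self._ready = threading.Event()
        self._ready.set()                # no recovery in progress initially

    def recovery_started(self):
        self._ready.clear()              # publishers will now wait

    def recovery_finished(self):
        self._ready.set()                # release any waiting publishers

    def publish(self, do_publish, timeout=5.0):
        # Wait until recovery (models and consumers) has completed.
        if not self._ready.wait(timeout):
            raise TimeoutError("recovery did not finish in time")
        return do_publish()
```

This only narrows the window: a publish that starts just before `recovery_started` is called can still slip through.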
This is somewhat related to #1061, as it touches a similar area.
If a temporary model restored the topology, then at least the time between the recovery of the models and the recovery of the consumers would be shortened. (The race would still be possible, though.)