ConnectionRecovery recovers models before consumers #1076

Closed
bollhals opened this issue Aug 26, 2021 · 7 comments

@bollhals
Contributor

I have reproduced a state in our application where connection recovery leads to an unrecoverable error.

Situation:

  • 1 AutorecoveringConnection + Channel
  • The connection dropped (the cable was unplugged)
  • The recovery procedure starts its loop to recover
  • The cable is plugged in again
  • The connection can be established again, and recovery starts.

Now the fun begins...
While the recovery is executing these lines, an RPC message is published.

Since the models have already been recovered (Line 420), the publish actually succeeds, but the server responds with "PRECONDITION_FAILED - fast reply consumer does not exist" because the consumer for the replies has not yet been recovered (Line 424).

The outcome is that the channel is closed and rendered unusable, while the connection recovery itself succeeds.
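To make the ordering problem above concrete, here is a minimal toy simulation in Python. It is only a sketch: `ToyChannel` and `run_recovery` are hypothetical names invented for illustration, and nothing here is the client's actual recovery code. It assumes recovery happens in two steps (models first, consumers second) and that a publish arriving between the two steps triggers the server error and closes the channel.

```python
class ToyChannel:
    """Toy stand-in for a recovering channel (hypothetical, not a client class)."""

    def __init__(self):
        self.is_open = False
        self.reply_consumer_registered = False

    def recover_model(self):
        # Step 1 of recovery: the channel is reopened and accepts publishes,
        # but no consumer has been re-registered yet.
        self.is_open = True
        self.reply_consumer_registered = False

    def recover_consumers(self):
        # Step 2 of recovery: consumers come back, but only on an open channel.
        if self.is_open:
            self.reply_consumer_registered = True

    def publish_rpc(self):
        if not self.is_open:
            return "channel closed"
        if not self.reply_consumer_registered:
            # Models the server's PRECONDITION_FAILED reply, which closes the channel.
            self.is_open = False
            return "PRECONDITION_FAILED"
        return "ok"


def run_recovery(publish_during_gap):
    """Run the two recovery steps, optionally publishing between them."""
    channel = ToyChannel()
    channel.recover_model()
    results = []
    if publish_during_gap:
        results.append(channel.publish_rpc())
    channel.recover_consumers()
    results.append(channel.publish_rpc())
    return channel.is_open, results
```

In this model, `run_recovery(False)` leaves the channel open with a successful publish, while `run_recovery(True)` leaves it closed: the publish in the gap fails, and every later publish fails too.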

So the question is, should the AutorecoveringModel already be usable again prior to the recovery being finished?

This is somewhat related to #1061, as it touches a similar area.
If a temporary model restored the topology, then at least the time between the recovery of the models and the recovery of the consumers would be shortened. (The race would still be possible, though.)

@michaelklishin
Member

michaelklishin commented Aug 26, 2021

I would say that considering a recovering channel to be usable would be generally wrong, but there is no realistic way for us to block all other operations (that would be highly surprising after all these years).

@michaelklishin
Member

The "fast reply consumer" is a message from the Direct Reply-to mechanism. It's not a real consumer and it does not use a real queue. It relies on a convention in the channel state, so as channels are restarted, that state is completely lost. All this happens on the RabbitMQ node end.
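A rough way to picture this is a per-channel flag on the server side. The sketch below is a toy model in Python, not RabbitMQ's implementation: `ToyServerChannel` and its methods are hypothetical, and the only details taken from Direct Reply-to itself are the pseudo-queue name `amq.rabbitmq.reply-to` and the error text quoted in this issue.

```python
REPLY_TO = "amq.rabbitmq.reply-to"  # pseudo-queue name used by Direct Reply-to


class ToyServerChannel:
    """Toy model of per-channel state on the node (hypothetical)."""

    def __init__(self):
        # The state lives with the channel, so a fresh channel starts without it.
        self.fast_reply_consumer = False

    def basic_consume(self, queue):
        # Consuming from the pseudo-queue only flips a flag; no real queue exists.
        if queue == REPLY_TO:
            self.fast_reply_consumer = True

    def basic_publish(self, reply_to):
        # A publish that asks for a direct reply requires the flag on this channel.
        if reply_to == REPLY_TO and not self.fast_reply_consumer:
            raise RuntimeError(
                "PRECONDITION_FAILED - fast reply consumer does not exist"
            )
        return "routed"


# Before the connection drops: consume first, then publish.
old_channel = ToyServerChannel()
old_channel.basic_consume(REPLY_TO)
old_channel.basic_publish(REPLY_TO)

# After a restart the channel is a new object, so the flag is gone until
# the client consumes from the pseudo-queue again.
new_channel = ToyServerChannel()
```

Calling `new_channel.basic_publish(REPLY_TO)` at this point raises the error, which is the situation a client hits if it publishes before its reply consumer has been recovered.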

@bollhals
Contributor Author

> The "fast reply consumer" is a message from the Direct Reply-to mechanism. It's not a real consumer and it does not use a real queue. It relies on a convention in the channel state, so as channels are restarted, that state is completely lost. All this happens on the RabbitMQ node end.

Yes, but from the client's point of view it's an active consumer that needs to be recovered before you can publish an RPC successfully. I want to have a look and check whether there's something that could be done to improve it.
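Until the client handles this itself, one possible application-side mitigation is to hold back publishes while recovery is in progress. The Python sketch below shows the idea with a `threading.Event`. `RecoveryGate` is a hypothetical helper, not part of any RabbitMQ client, and it assumes the application can call `recovery_started()` and `recovery_finished()` from whatever notifications its client library exposes.

```python
import threading


class RecoveryGate:
    """Hypothetical application-side gate; not part of any RabbitMQ client."""

    def __init__(self):
        self._ready = threading.Event()
        self._ready.set()  # no recovery in progress at start

    def recovery_started(self):
        # Call when the connection drops or recovery begins.
        self._ready.clear()

    def recovery_finished(self):
        # Call only after consumers (including the reply consumer) are back.
        self._ready.set()

    def publish(self, send, timeout=5.0):
        # Block until recovery has finished, then run the caller's publish.
        if not self._ready.wait(timeout):
            raise TimeoutError("recovery did not finish in time")
        return send()
```

This narrows the window but does not remove it: the connection can still drop between the `wait` returning and `send` running, so the caller still needs to handle a failed publish.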

@bollhals
Contributor Author

bollhals commented Sep 6, 2021

This is now fixed on the 6.x branch and on main.

Would it be possible to do a bugfix release for 6.x and then close this issue?

@michaelklishin
Member

michaelklishin commented Sep 6, 2021

I'll try to do it today. There were some pipeline changes forced upon us (e.g. expiring signing keys) so I cannot be certain releasing would just work.

@michaelklishin
Member

We had a go at doing a release; it required rebuilding an internally used Docker image. Hopefully we can finalise it later this week. Sorry about the wait.

@bollhals
Contributor Author

Would it be possible to give a rough timeframe? (We're planning a release of our product, in which we hit this issue.)

@lukebakken self-assigned this Nov 18, 2023

3 participants