Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed CRC check should not panic #368

Closed
christian-ehmig opened this issue Dec 17, 2024 · 7 comments · Fixed by #378
Closed

Failed CRC check should not panic #368

christian-ehmig opened this issue Dec 17, 2024 · 7 comments · Fixed by #378
Labels
bug Something isn't working

Comments

@christian-ehmig
Copy link

Describe the bug

If a CRC check fails

panic("Error during CRC")
panics. Shouldn't this be handeled in a better way?

Any user of your stream client would need to recover() from such a situation.

We are using the "reliable client" and provoke some crash scenarios like taking down one rmq cluster node. While the reconnect seems to work, an enabled CRC check panics in such situations and therefore stops all consumers.

Reproduction steps

  1. Provoke a CRC checksum error for consumers

Expected behavior

Log the error - reconnect / reset message buffer etc.

Additional context

No response

@christian-ehmig christian-ehmig added the bug Something isn't working label Dec 17, 2024
@Gsantomaggio
Copy link
Member

Hi @christian-ehmig
Thank you for reporting the issue.

CRC failure should be a race condition.

  • Do you think you could reproduce the issue?
  • Can you add more information about your environment?

@Gsantomaggio
Copy link
Member

@hiimjako, Any thoughts on that?

@hiimjako
Copy link
Collaborator

I agree that the client shouldn't panic.
I'm not sure about the root cause of the CRC issue, whether it's an actual message problem or some sort of race condition. However, I think we could remove the panic, log the problem and reset the status as suggested.

@christian-ehmig
Copy link
Author

christian-ehmig commented Dec 18, 2024

Thanks for your quick response on this. I can try to reproduce this locally with a docker-compose setup, producers and consumers. However, in our test case, we use a single consumer on a stream queue via a single connection. I doubt it's related to a race condition.

The CRC check fails if we "unplug the cable" (kill the virtual machine) of one RMQ cluster node (3 in total).

@Gsantomaggio
Copy link
Member

What I had in mind for the .NET client was a policy in case CRC fails.
Like:

  • Ignore
  • Close the consumer
  • Close the connection ( with all the consumers )

I need to think a bit about that but I agree that panic is too much

@Gsantomaggio
Copy link
Member

@christian-ehmig FYI: at the moment, we are focusing on #367
As soon as possible, we will be back. Feel free to propose a PR if you have a fix in mind. thank you

@Gsantomaggio
Copy link
Member

We can close the TCP client when the CRC is corrupted, hoping that is a temporary problem.
That is better than panic.
We will do it for the 1.5

Gsantomaggio added a commit that referenced this issue Jan 27, 2025
The producer is blocked when the limit is reached. It is possible to configure the
limit using the producerOption.QueueSize setting.

Closes: #373

Close the TCP client connection in case of CRC fail. The panic is removed.
Closes: #368

Signed-off-by: Gabriele Santomaggio <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants