Failed CRC check should not panic #368

christian-ehmig · 2024-12-17T16:37:15Z

Describe the bug

If a CRC check fails, the client panics at rabbitmq-stream-go-client/pkg/stream/server_frame.go, line 359 in ab4d470:

panic("Error during CRC")

Shouldn't this be handled in a better way?

Any user of your stream client would need to recover() from such a situation.

We are using the "reliable client" and provoking crash scenarios such as taking down one RabbitMQ cluster node. While the reconnect seems to work, an enabled CRC check panics in such situations and therefore stops all consumers.

Reproduction steps

Provoke a CRC checksum error for consumers

Expected behavior

Log the error - reconnect / reset message buffer etc.
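For illustration only, the expected behavior could look roughly like the sketch below: the CRC check returns an error that the caller can log and react to, instead of panicking. The names `verifyChunk` and `ErrCRCMismatch` are hypothetical and not part of the client, and the use of CRC-32 (IEEE) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"hash/crc32"
)

// ErrCRCMismatch is a hypothetical sentinel error, not part of the client API.
var ErrCRCMismatch = errors.New("crc32 mismatch")

// verifyChunk compares the CRC-32 (IEEE, assumed) of payload with the
// expected checksum and returns an error on mismatch instead of panicking.
func verifyChunk(payload []byte, expected uint32) error {
	if got := crc32.ChecksumIEEE(payload); got != expected {
		return fmt.Errorf("%w: got %#x, want %#x", ErrCRCMismatch, got, expected)
	}
	return nil
}

func main() {
	payload := []byte("hello stream")
	sum := crc32.ChecksumIEEE(payload)

	// Matching checksum: no error.
	fmt.Println(verifyChunk(payload, sum) == nil)

	// Corrupted checksum: the caller gets an error it can log and handle.
	err := verifyChunk(payload, sum+1)
	fmt.Println(errors.Is(err, ErrCRCMismatch))
}
```

With an error value, the caller decides whether to log, reconnect, or reset its buffers, rather than every user having to guard against a panic.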

Additional context

No response

Gsantomaggio · 2024-12-17T21:31:54Z

Hi @christian-ehmig
Thank you for reporting the issue.

A CRC failure is most likely caused by a race condition.

Do you think you could reproduce the issue?
Can you add more information about your environment?

Gsantomaggio · 2024-12-18T07:59:32Z

@hiimjako, Any thoughts on that?

hiimjako · 2024-12-18T09:25:44Z

I agree that the client shouldn't panic.
I'm not sure about the root cause of the CRC issue, whether it's an actual message problem or some sort of race condition. However, I think we could remove the panic, log the problem and reset the status as suggested.

christian-ehmig · 2024-12-18T13:01:02Z

Thanks for your quick response on this. I can try to reproduce this locally with a docker-compose setup, producers and consumers. However, in our test case, we use a single consumer on a stream queue via a single connection. I doubt it's related to a race condition.

The CRC check fails if we "unplug the cable" (kill the virtual machine) of one RMQ cluster node (3 in total).

Gsantomaggio · 2024-12-18T14:01:18Z

What I had in mind for the .NET client was a policy in case CRC fails.
Like:

Ignore
Close the consumer
Close the connection ( with all the consumers )

I need to think about that a bit more, but I agree that a panic is too much.
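A minimal sketch of how such a policy could be modeled in Go. Every name here (`CRCPolicy`, `handleCRCFailure`, the `fake` closer) is hypothetical and only illustrates the three options listed above. It is not the client's API.

```go
package main

import (
	"fmt"
	"io"
)

// CRCPolicy is a hypothetical setting describing what to do on a CRC failure.
type CRCPolicy int

const (
	CRCIgnore          CRCPolicy = iota // skip the failure and continue
	CRCCloseConsumer                    // close only the affected consumer
	CRCCloseConnection                  // close the connection and all its consumers
)

// handleCRCFailure applies the chosen policy to the affected consumer or connection.
func handleCRCFailure(p CRCPolicy, consumer, conn io.Closer) error {
	switch p {
	case CRCIgnore:
		return nil
	case CRCCloseConsumer:
		return consumer.Close()
	case CRCCloseConnection:
		return conn.Close()
	default:
		return fmt.Errorf("unknown CRC policy %d", p)
	}
}

// fake is a stand-in for a consumer or a connection in this sketch.
type fake struct{ closed bool }

func (f *fake) Close() error { f.closed = true; return nil }

func main() {
	consumer, conn := &fake{}, &fake{}
	if err := handleCRCFailure(CRCCloseConsumer, consumer, conn); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(consumer.closed, conn.closed)
}
```

Keeping the policy as a plain value would let it be set per consumer or per environment without changing the frame-handling code.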

Gsantomaggio · 2024-12-19T07:32:51Z

@christian-ehmig FYI: at the moment, we are focusing on #367.
We will get back to this as soon as possible. Feel free to propose a PR if you have a fix in mind. Thank you.

Gsantomaggio · 2025-01-20T09:15:39Z

We can close the TCP client when the CRC check fails, in the hope that it is a temporary problem.
That is better than a panic.
We will do it for version 1.5.
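As a rough sketch of that idea (not the actual change), a reader can close the connection and return an error when the checksum does not match. The frame layout used here (4-byte length, 4-byte CRC-32, payload) and the names `readFrame`, `writeFrame`, `errCRC`, and `demo` are assumptions made for the example. They do not describe the real stream protocol.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"io"
	"net"
)

// errCRC is a hypothetical error returned when the checksum does not match.
var errCRC = errors.New("crc mismatch, connection closed")

// readFrame reads one frame in an assumed layout: 4-byte big-endian length,
// 4-byte CRC-32 (IEEE), then the payload. On a CRC mismatch it closes the
// connection and returns an error instead of panicking.
func readFrame(conn net.Conn) ([]byte, error) {
	var hdr [8]byte
	if _, err := io.ReadFull(conn, hdr[:]); err != nil {
		return nil, err
	}
	size := binary.BigEndian.Uint32(hdr[0:4])
	want := binary.BigEndian.Uint32(hdr[4:8])
	payload := make([]byte, size) // a real reader should bound size first
	if _, err := io.ReadFull(conn, payload); err != nil {
		return nil, err
	}
	if crc32.ChecksumIEEE(payload) != want {
		conn.Close() // drop the connection so reconnect logic can take over
		return nil, errCRC
	}
	return payload, nil
}

// writeFrame writes a frame in the same assumed layout with the given CRC.
func writeFrame(w io.Writer, payload []byte, crc uint32) error {
	buf := make([]byte, 8+len(payload))
	binary.BigEndian.PutUint32(buf[0:4], uint32(len(payload)))
	binary.BigEndian.PutUint32(buf[4:8], crc)
	copy(buf[8:], payload)
	_, err := w.Write(buf)
	return err
}

// demo sends a frame with a deliberately wrong CRC over an in-memory pipe.
func demo() (frameErr, readErr error) {
	client, server := net.Pipe()
	defer server.Close()
	payload := []byte("data")
	go writeFrame(server, payload, crc32.ChecksumIEEE(payload)+1)
	_, frameErr = readFrame(client)
	// The connection was closed by readFrame, so further reads fail.
	_, readErr = client.Read(make([]byte, 1))
	return frameErr, readErr
}

func main() {
	frameErr, readErr := demo()
	fmt.Println("frame error:", frameErr)
	fmt.Println("read after close:", readErr)
}
```

Closing the socket turns the corrupted frame into an ordinary disconnect, which a reconnecting client can already handle.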

The producer is blocked when the limit is reached. It is possible to configure the limit using the producerOption.QueueSize setting.
Closes: #373

Close the TCP client connection in case of CRC fail. The panic is removed.
Closes: #368

Signed-off-by: Gabriele Santomaggio <[email protected]>

christian-ehmig added the "bug" (Something isn't working) label Dec 17, 2024

Gsantomaggio mentioned this issue Jan 27, 2025

Add limit to the unconfirmed messages #378

Merged

Gsantomaggio closed this as completed in #378 Jan 27, 2025

Gsantomaggio closed this as completed in 7a3d780 Jan 27, 2025
