Allow customizing AckDeadline to allow for long-running sets of retries #4549

Merged (1 commit) on Sep 20, 2024

Conversation

evankanderson
Member

Summary

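Watermill SQL maintains an internal timeout on message delivery and redelivers any message whose handling takes longer than that timeout. On its own this feels like good infrastructure practice, but it interferes with the related Watermill middleware for retries and poison-message handling: a long-running set of retries can outlast the timeout, so the message is redelivered while it is still being retried. The timeout is controlled by the AckDeadline setting, which is documented as follows: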

AckDeadline is the time to wait for acking a message.
If message is not acked within this time, it will be nacked and re-delivered.

When messages are read in bulk, this time is calculated for each message separately.

If you want to disable ack deadline, set it to 0.
Warning: when ack deadline is disabled, messages which are not acked may block PostgreSQL subscriber from reading new messages
due to not increasing pg_snapshot_xmin(pg_current_snapshot()) value.

Must be non-negative. Nil value defaults to 30s.

Fixes #4483

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Tested manually with https://github.com/eleftherias/msg-queue-repro.git to reproduce the issue and identify the setting that controls it, then started Minder to check that the new setting was accepted. I did not write a unit test with a timeout to verify this setting; verification was manual, in the reproduction repo.

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@coveralls

Coverage Status

coverage: 53.458% (-0.002%) from 53.46%
when pulling c4ff2e4 on evankanderson:fix-ackdeadline
into 275579a on stacklok:main.

@evankanderson evankanderson merged commit 6317e0c into mindersec:main Sep 20, 2024
22 checks passed
Development

Successfully merging this pull request may close these issues.

Messages can be repeatedly added into the queue on failure
4 participants