[fix][client] fix retry topic with exclusive mode. #23859

Merged

thetumbled
Member

@thetumbled thetumbled commented Jan 17, 2025

Motivation

  • The retry topic relies on the delayed message delivery feature. Once a user calls reconsumeLater for a message, the message is produced as a delayed message to the corresponding retry topic.
  • Delayed delivery only works with the Shared and Key_Shared subscription types. With the Exclusive or Failover types, a delayed message is dispatched immediately.

Based on the analysis above, the consumer of the retry topic must use a Shared or Key_Shared subscription; otherwise the retry delay has no effect.
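The reasoning above can be illustrated with a toy model. This is a minimal sketch, not Pulsar broker code: the class `DelayedDispatchSketch`, its `dispatch` method, and the message payloads are all hypothetical names invented for illustration. It assumes only what the description states, namely that Shared and Key_Shared honor a message's delay while Exclusive and Failover dispatch it immediately.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Pulsar code) of delay handling per subscription type. */
class DelayedDispatchSketch {

    enum SubscriptionType { EXCLUSIVE, FAILOVER, SHARED, KEY_SHARED }

    /** A message with a deliver-at timestamp in milliseconds; 0 means "no delay". */
    static final class Message {
        final String payload;
        final long deliverAtMillis;

        Message(String payload, long deliverAtMillis) {
            this.payload = payload;
            this.deliverAtMillis = deliverAtMillis;
        }
    }

    /** Assumption taken from the PR description: only Shared and Key_Shared honor the delay. */
    static boolean honorsDelay(SubscriptionType type) {
        return type == SubscriptionType.SHARED || type == SubscriptionType.KEY_SHARED;
    }

    /** Returns the payloads this toy dispatcher would hand to the consumer at time nowMillis. */
    static List<String> dispatch(List<Message> backlog, SubscriptionType type, long nowMillis) {
        List<String> delivered = new ArrayList<>();
        for (Message m : backlog) {
            boolean stillDelayed = honorsDelay(type) && m.deliverAtMillis > nowMillis;
            if (!stillDelayed) {
                delivered.add(m.payload);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        // "retry-me" stands in for a message re-published by reconsumeLater with a 10 s delay.
        List<Message> retryTopic = List.of(
                new Message("retry-me", 10_000),
                new Message("no-delay", 0));

        // At t = 1 s the Shared subscription holds the delayed message back.
        System.out.println(dispatch(retryTopic, SubscriptionType.SHARED, 1_000));    // [no-delay]
        // The Exclusive subscription ignores the delay, so the retry happens at once.
        System.out.println(dispatch(retryTopic, SubscriptionType.EXCLUSIVE, 1_000)); // [retry-me, no-delay]
    }
}
```

In this model, a retry consumer on an Exclusive subscription sees the message again immediately, which is the behavior the PR sets out to prevent.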

Modifications

Restrict the subscription type of the retry topic to Shared.
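One way to picture this restriction is the sketch below. It is not the code from this PR: `RetrySubscriptionSketch`, `typeFor`, `plan`, and the topic names are hypothetical, and the real change lives in the client (the coverage report lists `ConsumerBase.java`). The sketch only assumes that the retry topic is subscribed as Shared while the original topic keeps whatever type the user requested.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of choosing a subscription type per topic; not the PR's implementation. */
class RetrySubscriptionSketch {

    enum SubscriptionType { EXCLUSIVE, FAILOVER, SHARED, KEY_SHARED }

    /** The retry topic is always Shared; any other topic keeps the requested type. */
    static SubscriptionType typeFor(boolean isRetryTopic, SubscriptionType requested) {
        return isRetryTopic ? SubscriptionType.SHARED : requested;
    }

    /** Builds the topic-to-type plan for a consumer with retry enabled. */
    static Map<String, SubscriptionType> plan(String topic, String retryTopic,
                                              SubscriptionType requested) {
        Map<String, SubscriptionType> plan = new LinkedHashMap<>();
        plan.put(topic, typeFor(false, requested));
        plan.put(retryTopic, typeFor(true, requested));
        return plan;
    }

    public static void main(String[] args) {
        // Topic names are placeholders chosen for this example only.
        Map<String, SubscriptionType> plan =
                plan("orders", "orders-retry", SubscriptionType.EXCLUSIVE);
        plan.forEach((topic, type) -> System.out.println(topic + " -> " + type));
    }
}
```

With this shape, a user can still consume the original topic in Exclusive mode, and only the retry topic is forced onto a type that honors the delay.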

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-required label (Your PR changes impact docs and you will update later.) Jan 17, 2025
@thetumbled thetumbled self-assigned this Jan 17, 2025
@thetumbled thetumbled added the type/bug label (The PR fixed a bug or issue reported a bug) Jan 17, 2025
@thetumbled
Member Author

And I have a question: why don't delayed messages support the Exclusive type? Is it something that can't be implemented, or a pending task?

@thetumbled thetumbled changed the title from "[fix][client] fix retry letter with exclusive mode." to "[fix][client] fix retry topic with exclusive mode." Jan 17, 2025
@lhotari
Member

lhotari commented Jan 17, 2025

And I have a question: why don't delayed messages support the Exclusive type? Is it something that can't be implemented, or a pending task?

@thetumbled This question has been asked multiple times before, and I didn't search for previous answers. I think one of the reasons is that the semantics of delayed messages haven't been defined and designed for the Failover or Exclusive types.
There are also technical reasons from an implementation perspective. Naively adding support for delayed messages would result in a lot of code duplication in the dispatcher implementation classes. Avoiding code duplication would require refactoring the dispatcher code, and there's currently a lot of resistance to doing that. I noticed when working on PIP-379 that there's a preference to minimize refactoring due to the risk of regressions. The state management in dispatchers isn't that great, and bugs pop up when a subtle change is made. I'm currently investigating #23845, and these are some of my observations about the dispatcher code. It's fairly hard to grasp how it works due to the multiple levels of state that impact the execution. Multiple levels of state result in a vast state space, which isn't great. A better approach would be to minimize the state space. That could be achieved with a simpler execution model that addresses the different states that exist due to various hacks added to work around issues in the original implementation.

@thetumbled
Member Author

thetumbled commented Jan 22, 2025

I agree about the complexity of the dispatcher. This module is too complicated to master, and it is hard to guarantee that nothing breaks when a seemingly good PR is merged.
It's not only the dispatcher module; other modules, such as the consumer and producer in the client SDK, are also incredibly complex. Pulsar has a great many features, which is an advantage over other message queues but also a disadvantage, because it results in many issues.

@codecov-commenter

codecov-commenter commented Feb 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.19%. Comparing base (bbc6224) to head (462ea73).
Report is 924 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #23859      +/-   ##
============================================
+ Coverage     73.57%   74.19%   +0.61%     
+ Complexity    32624    31892     -732     
============================================
  Files          1877     1853      -24     
  Lines        139502   143872    +4370     
  Branches      15299    16350    +1051     
============================================
+ Hits         102638   106740    +4102     
+ Misses        28908    28739     -169     
- Partials       7956     8393     +437     
Flag Coverage Δ
inttests 26.75% <100.00%> (+2.17%) ⬆️
systests 23.20% <33.33%> (-1.12%) ⬇️
unittests 73.71% <100.00%> (+0.86%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...va/org/apache/pulsar/client/impl/ConsumerBase.java 75.72% <100.00%> (+1.59%) ⬆️

... and 1040 files with indirect coverage changes

@BewareMyPower BewareMyPower merged commit 5a59ab7 into apache:master Feb 18, 2025
52 checks passed
lhotari pushed a commit that referenced this pull request Feb 19, 2025
lhotari pushed a commit that referenced this pull request Feb 19, 2025
lhotari pushed a commit that referenced this pull request Feb 19, 2025
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 19, 2025
(cherry picked from commit 5a59ab7)
(cherry picked from commit 8411eef)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Feb 20, 2025
(cherry picked from commit 5a59ab7)
(cherry picked from commit e1d838c)
6 participants