Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Introduce the last sent position to fix message ordering issues in Key_Shared (PIP-282) #21953

Merged
merged 2 commits into from
Jul 19, 2024

Conversation

equanz
Copy link
Contributor

@equanz equanz commented Jan 23, 2024

PIP: #20776

Motivation

Key_Shared has a mechanism called the "recently joined consumers" to keep message ordering.
However, currently, it doesn't care about some corner cases.
More specifically, we found two out-of-order issues cased by:

  1. [issue-1] The race condition in the "recently joined consumers", where consumers can be added before finishing reading and dispatching messages from ledgers.
  2. [issue-2] Messages could be added to messagesToRedeliver without consumer-side operations such as unacknowledgement.

We should care about these cases in Key_Shared subscription.
(For more details, please see the PIP document.)

Modifications

  1. Introduce the new position (the lastSentPosition and individuallySentPositions) and use them in PersistentStickyKeyDispatcherMultipleConsumers.
  2. Rename the consumer stats readPositionWhenJoining to lastSentPositionWhenJoining.
  3. Modify the subscription stats (consumersAfterMarkDeletePosition) of the definition.

(For more details, please see the PIP document.)

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Unit tests for the lastSentPosition and individuallySentPositions
  • Test for guaranteed message ordering in corner cases

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: equanz#6

Copy link
Contributor

@nkurihar nkurihar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

(In our-company we cherry-picked this change and so far no issues have been identified. However, it is better if you could give it another review for further validation.)

@hanmz
Copy link
Contributor

hanmz commented Feb 18, 2024

LGTM

@poorbarcode
Copy link
Contributor

/pulsarbot rerun-failure-checks

@codecov-commenter
Copy link

codecov-commenter commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 87.93103% with 14 lines in your changes are missing coverage. Please review.

Project coverage is 73.90%. Comparing base (bbc6224) to head (c3b318a).
Report is 179 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #21953      +/-   ##
============================================
+ Coverage     73.57%   73.90%   +0.33%     
- Complexity    32624    33094     +470     
============================================
  Files          1877     1885       +8     
  Lines        139502   140522    +1020     
  Branches      15299    15466     +167     
============================================
+ Hits         102638   103857    +1219     
+ Misses        28908    28616     -292     
- Partials       7956     8049      +93     
Flag Coverage Δ
inttests 26.69% <44.82%> (+2.11%) ⬆️
systests 24.45% <45.68%> (+0.13%) ⬆️
unittests 73.18% <87.93%> (+0.34%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java 79.83% <100.00%> (+0.53%) ⬆️
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 81.01% <ø> (+0.35%) ⬆️
...ava/org/apache/pulsar/broker/service/Consumer.java 86.66% <100.00%> (ø)
.../common/policies/data/stats/ConsumerStatsImpl.java 97.61% <100.00%> (ø)
...mon/policies/data/stats/SubscriptionStatsImpl.java 94.21% <100.00%> (+0.09%) ⬆️
...ker/service/persistent/PersistentSubscription.java 77.18% <92.85%> (+0.49%) ⬆️
...ersistentStickyKeyDispatcherMultipleConsumers.java 87.02% <85.55%> (+1.38%) ⬆️

... and 243 files with indirect coverage changes

@poorbarcode
Copy link
Contributor

@equanz Sorry for the late review, I will finish the review tomorrow.

@equanz
Copy link
Contributor Author

equanz commented Mar 7, 2024

Hi @poorbarcode.
Thank you for your review. What is the current status?

@poorbarcode
Copy link
Contributor

Hi @poorbarcode.
Thank you for your review. What is the current status?

Sorry, I am busy for other things before, I will done this review soon

@equanz equanz force-pushed the pip_282_to_master branch from 43a2dc3 to c3b318a Compare April 23, 2024 05:33
@equanz
Copy link
Contributor Author

equanz commented Apr 23, 2024

Rebased to fix conflicts.
BTW, can someone review this? I've been trying to fix the issue for a year now... (cf. #20179)

@poorbarcode
Copy link
Contributor

@codelipenghui Could you take a look?

@coderzc coderzc removed this from the 3.3.0 milestone May 8, 2024
@coderzc coderzc added this to the 3.4.0 milestone May 8, 2024
@equanz
Copy link
Contributor Author

equanz commented May 21, 2024

@codelipenghui @poorbarcode

Looking forward to your reviews.

@codelipenghui
Copy link
Contributor

@equanz @poorbarcode Sorry for the late response, I will review this week.

@equanz
Copy link
Contributor Author

equanz commented Jun 11, 2024

Hi @codelipenghui . Do you have any comments?

@codelipenghui
Copy link
Contributor

Hi @codelipenghui . Do you have any comments?

Sorry, I'm struggling in other tasks. Please go ahead to merge the PR. @poorbarcode

@poorbarcode
Copy link
Contributor

@equanz Could you please handle the conflicts?

@equanz equanz force-pushed the pip_282_to_master branch from c3b318a to 8b427b3 Compare June 18, 2024 08:09
@equanz
Copy link
Contributor Author

equanz commented Jun 18, 2024

/pulsarbot rerun-failure-checks

@equanz
Copy link
Contributor Author

equanz commented Jun 18, 2024

@poorbarcode Addressed. PTAL

@equanz
Copy link
Contributor Author

equanz commented Jul 9, 2024

@poorbarcode Could you take a look?

@lhotari
Copy link
Member

lhotari commented Aug 22, 2024

@equanz do you have a chance to update the documentation in https://pulsar.apache.org/docs/next/concepts-messaging/#preserving-order-of-processing to reflect the PIP-282 changes?
it seems that "Once the connection has been established, the broker will record the current read position and associate it with the new consumer. The read position is a marker indicating that messages have been dispatched to the consumers up to this point, and after it, no messages have been dispatched yet. The broker will start delivering messages to the new consumer only when all messages up to the read position have been acknowledged." would have to be updated in https://github.com/apache/pulsar-site/blob/main/docs/concepts-messaging.md?plain=1#L797-L801.
/cc @poorbarcode @codelipenghui

@equanz
Copy link
Contributor Author

equanz commented Aug 22, 2024

I forgot it. I'll fix it.

@equanz
Copy link
Contributor Author

equanz commented Aug 22, 2024

Here is a fix of the docs.
apache/pulsar-site#953

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-required Your PR changes impact docs and you will update later. ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants