[improve][client] Refactor SchemaHash to reduce call of hashFunction in SchemaHash #17948
pulsar-common/src/main/java/org/apache/pulsar/common/protocol/schema/SchemaHash.java
One more change could be:
public SchemaHash getSchemaHash() {
    return schemaHash == null ? SchemaHash.of(new byte[0], null) : schemaHash;
}
to avoid the
A great suggestion.
@merlimat Please help review this PR again.
I approve this patch, but we cannot cherry-pick it to released branches.
We are changing the internal API of SchemaInfoImpl.
Unfortunately, Pulsar users are very creative and they could have used it.
If we find a way to keep the internal builder API compatible, we can cherry-pick it.
@eolivelli I think this PR will not break the old API. Usage like the following is still available for users:
SCHEMA_INFO = new SchemaInfoImpl()
    .setName("Boolean")
    .setType(SchemaType.BOOLEAN)
    .setSchema(new byte[0]);
If users create If the compatibility issue you mentioned is that, I think we can update
LGTM
great work
/pulsarbot run-failure-checks
LGTM
@AnonHxy Please update the PR title so that the semantic check passes. [improve][client] should be accepted.
The title check doesn't seem to accept the "[issue-xxx]" part.
…in SchemaHash (apache#17948) (cherry picked from commit 2b5b92c) (cherry picked from commit 11b5df7)
Fixes #17931
Motivation
PR #17049 introduced a significant performance regression in publish throughput. The root cause is that HashFunction#hashBytes(byte[]) takes a lot of CPU time, so we need to add a cache for SchemaHash. For details, see line 97:
pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MessageImpl.java
Lines 85 to 99 in 8d13ff8
From the flame graph below, we can see that hashFunction takes a lot of CPU time.
The test result for publish throughput (msgs/s) is below:
Modifications
SchemaHash
.SchemaHash#of
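To illustrate the caching idea from the Motivation, here is a minimal, self-contained sketch. It is not the actual patch: the class names SchemaHashCacheSketch and SchemaInfoSketch are hypothetical, and the JDK's SHA-256 MessageDigest is assumed as a stand-in for the real hash function. The point it shows is that the hash is computed once when the schema holder is created, so repeated lookups on the publish path do not call the hash function again.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.atomic.AtomicInteger;

public class SchemaHashCacheSketch {

    // Counts how many times the expensive hash is actually computed.
    static final AtomicInteger HASH_CALLS = new AtomicInteger();

    // Stand-in for the expensive hash function (assumption: JDK SHA-256).
    static byte[] expensiveHash(byte[] schema) {
        HASH_CALLS.incrementAndGet();
        try {
            return MessageDigest.getInstance("SHA-256").digest(schema);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical immutable schema holder: the hash is computed once, in the constructor.
    static final class SchemaInfoSketch {
        private final byte[] schema;
        private final byte[] cachedHash;

        SchemaInfoSketch(byte[] schema) {
            // Treat a null schema as empty, mirroring the new byte[0] default seen in the review.
            this.schema = schema == null ? new byte[0] : schema.clone();
            this.cachedHash = expensiveHash(this.schema);
        }

        // Returns a copy of the cached hash; no hashing happens here.
        byte[] getSchemaHash() {
            return cachedHash.clone();
        }
    }

    public static void main(String[] args) {
        SchemaInfoSketch info =
                new SchemaInfoSketch("example-schema".getBytes(StandardCharsets.UTF_8));
        // Simulate many messages asking for the schema hash.
        for (int i = 0; i < 1000; i++) {
            info.getSchemaHash();
        }
        System.out.println("hash computations: " + HASH_CALLS.get());
    }
}
```

Computing the hash eagerly in the constructor keeps the holder immutable and thread-safe without locking; a lazily computed field would also work but needs care around visibility between threads.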
Verifying this change
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: AnonHxy#8