Enable key value byte stitching in PulsarMessageBatch #8897
Codecov Report
Coverage Diff

| Metric | master | #8897 | +/- |
|---|---|---|---|
| Coverage | 69.60% | 62.99% | -6.62% |
| Complexity | 4997 | 4865 | -132 |
| Files | 1806 | 1770 | -36 |
| Lines | 94202 | 92714 | -1488 |
| Branches | 14050 | 13943 | -107 |
| Hits | 65571 | 58404 | -7167 |
| Misses | 24072 | 30071 | +5999 |
| Partials | 4559 | 4239 | -320 |
 * Stitch key and value bytes together using a simple format:
 * 4 bytes for key length + key bytes + 4 bytes for value length + value bytes
 */
private byte[] stitchKeyValue(byte[] keyBytes, byte[] valueBytes) {
I feel like this is the case for other stream connectors too. I don't think the decoders have access to the message key or message headers today.
A more elegant approach may be to use MessageBatch<StreamMessageType>, where StreamMessageType can contain payload and metadata. But I suspect you want to avoid a more invasive code change?
Yep, that particular change will touch too many classes.
This change LGTM, but you might want to call out here, and tag for release notes, that existing decoders won't work when the flag is enabled, along with guidelines on how to use this in that case.
...ion/pinot-pulsar/src/main/java/org/apache/pinot/plugin/stream/pulsar/PulsarMessageBatch.java
Done
Add a config flag to the Pulsar stream connector that enables stitching of key and value byte arrays in PulsarMessageBatch. This matters when ingesting from a Pulsar topic with a key-value schema, where the message data includes only the value bytes and the key bytes have to be retrieved separately.
Stitching lets higher layers (e.g. the decoder) access both key and value in the same byte array.
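The stitched layout described in the code comment (4 bytes for key length + key bytes + 4 bytes for value length + value bytes) can be sketched as below. This is a minimal illustration, not the PR's implementation: the class name `KeyValueStitcher` is hypothetical, and the sketch assumes the length prefixes are written as big-endian ints, which is the `ByteBuffer` default.

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating the stitched layout; not the PR's actual class.
class KeyValueStitcher {

    // Layout: [4-byte key length][key bytes][4-byte value length][value bytes].
    // Assumption: lengths are big-endian ints (ByteBuffer's default byte order).
    static byte[] stitch(byte[] keyBytes, byte[] valueBytes) {
        int totalLength = Integer.BYTES + keyBytes.length + Integer.BYTES + valueBytes.length;
        ByteBuffer buffer = ByteBuffer.allocate(totalLength);
        buffer.putInt(keyBytes.length);
        buffer.put(keyBytes);
        buffer.putInt(valueBytes.length);
        buffer.put(valueBytes);
        // allocate() returns a heap buffer, so array() is the full stitched payload.
        return buffer.array();
    }
}
```

The two length prefixes add a fixed 8 bytes per message, and they let a decoder recover both parts without knowing anything about how the key or value is encoded.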
Custom Decoder
When this flag is enabled, a custom decoder is needed to extract a Pinot GenericRow from the incoming stitched byte array. The decoder first splits the array back into the individual key and value byte arrays, then uses a corresponding decoder (e.g. Avro or JSON) on each to extract fields.
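As a rough sketch of the first step a custom decoder would take, the code below splits a stitched payload back into key and value byte arrays. The class name `StitchedKeyValue` and its `parse` method are hypothetical, and the sketch assumes the layout of 4 bytes for key length + key bytes + 4 bytes for value length + value bytes, with big-endian length prefixes. Decoding the recovered bytes (Avro, JSON) and populating a GenericRow are left out.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the two halves of a stitched payload.
class StitchedKeyValue {
    final byte[] key;
    final byte[] value;

    private StitchedKeyValue(byte[] key, byte[] value) {
        this.key = key;
        this.value = value;
    }

    // Assumed layout: [4-byte key length][key][4-byte value length][value], big-endian.
    static StitchedKeyValue parse(byte[] stitched) {
        ByteBuffer buffer = ByteBuffer.wrap(stitched);
        byte[] key = readSegment(buffer);
        byte[] value = readSegment(buffer);
        return new StitchedKeyValue(key, value);
    }

    // Reads one length-prefixed segment, rejecting truncated or corrupt input.
    private static byte[] readSegment(ByteBuffer buffer) {
        if (buffer.remaining() < Integer.BYTES) {
            throw new IllegalArgumentException("Truncated length prefix");
        }
        int length = buffer.getInt();
        if (length < 0 || length > buffer.remaining()) {
            throw new IllegalArgumentException("Invalid segment length: " + length);
        }
        byte[] segment = new byte[length];
        buffer.get(segment);
        return segment;
    }
}
```

A custom decoder could call `StitchedKeyValue.parse(payload)` and then hand `key` and `value` to whichever format-specific decoders match the topic's key and value schemas.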
Release Notes
Added support for stitching key and value bytes together in PulsarMessageBatch, controlled by a config flag. When the flag is enabled, existing decoders will not work on the stitched payload and a custom decoder is required.