Single Object Encoding codecs #511

Open
kimgr opened this issue Mar 22, 2025 · 3 comments
Comments

@kimgr
Contributor

kimgr commented Mar 22, 2025

As mentioned in #489, we consume and produce Avro Single Object Encoding (SOE), a minimal framing that carries a schema identifier with each record. We have numerous half-baked implementations of the encoding and decoding, and I wanted to look into upstreaming them.

The Java decoder implementation is fairly dynamic, and looks up a decoder by schema fingerprint in the header, using a resolver. The resolver is a bit like the Registry already in your library, but much simpler (mostly just a GetSchema(fingerprint []byte) Schema).
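For readers unfamiliar with the framing: per the Avro specification, an SOE message is the two-byte marker `C3 01`, then the 8-byte little-endian CRC-64-AVRO fingerprint of the writer schema, then the Avro binary payload. The sketch below shows roughly how a resolver-based lookup could sit on top of that header. It is an illustration under stated assumptions, not code from hamba/avro or from this issue: `Schema` is a stand-in string type, `MapResolver`, `SplitHeader` and `Resolve` are hypothetical names, and the error return on `GetSchema` is an addition to the signature quoted above.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"errors"
	"fmt"
)

// Schema is a stand-in for a parsed Avro schema (assumption: just the schema JSON).
type Schema string

// SchemaResolver follows the GetSchema(fingerprint []byte) Schema idea from the
// issue; the extra error result is an assumption made for this sketch.
type SchemaResolver interface {
	GetSchema(fingerprint []byte) (Schema, error)
}

// MapResolver is a hypothetical in-memory resolver keyed by hex-encoded fingerprint.
type MapResolver map[string]Schema

func (m MapResolver) GetSchema(fingerprint []byte) (Schema, error) {
	s, ok := m[hex.EncodeToString(fingerprint)]
	if !ok {
		return "", fmt.Errorf("unknown schema fingerprint %x", fingerprint)
	}
	return s, nil
}

// soeMagic is the two-byte marker that starts every SOE message.
var soeMagic = []byte{0xC3, 0x01}

// headerLen is the marker (2 bytes) plus the fingerprint (8 bytes).
const headerLen = 10

var ErrBadHeader = errors.New("soe: missing or invalid header")

// SplitHeader checks the marker and returns the fingerprint and the payload.
func SplitHeader(data []byte) (fingerprint, payload []byte, err error) {
	if len(data) < headerLen || !bytes.HasPrefix(data, soeMagic) {
		return nil, nil, ErrBadHeader
	}
	return data[2:headerLen], data[headerLen:], nil
}

// Resolve looks up the writer schema for one message and returns it with the payload.
func Resolve(r SchemaResolver, data []byte) (Schema, []byte, error) {
	fp, payload, err := SplitHeader(data)
	if err != nil {
		return "", nil, err
	}
	schema, err := r.GetSchema(fp)
	if err != nil {
		return "", nil, err
	}
	return schema, payload, nil
}

func main() {
	// Made-up fingerprint for illustration, not a real CRC-64-AVRO value.
	fp := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	resolver := MapResolver{hex.EncodeToString(fp): `"string"`}

	// Avro binary for the string "hi": zig-zag length 2 encodes as 0x04.
	msg := append([]byte{}, soeMagic...)
	msg = append(msg, fp...)
	msg = append(msg, 0x04, 'h', 'i')

	schema, payload, err := Resolve(resolver, msg)
	fmt.Println(schema, payload, err)
}
```

A real decoder would hand the resolved schema and the payload to an Avro decoder; that step is left out here to avoid guessing at library calls.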

I think it would be useful to have a few flavors of encoders/decoders:

  • Fixed schema: assumes all records being coded have the same schema, and can unmarshal with or without verification
  • Dynamic schema: like the Java implementation, looks up a schema for each record from an abstract registry and uses that schema to decode. Encoding might either not be supported in this style, or might use a fixed schema.
  • Fully generic, based on types generated by avrogen: like the fixed-schema flavor, but fully type safe, as avrogen attaches the schema to the type
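To make the fixed-schema flavor concrete, here is a minimal sketch of framing with optional verification. Everything in it is hypothetical: `FixedCodec`, `Wrap`, `Unwrap` and the `Verify` flag are illustrative names, "verification" is assumed to mean checking the marker and fingerprint in the header, and the payload is treated as already-encoded Avro bytes.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// soeMagic is the two-byte SOE marker defined by the Avro specification.
var soeMagic = [2]byte{0xC3, 0x01}

// FixedCodec frames payloads that all share one schema (hypothetical type).
type FixedCodec struct {
	Fingerprint [8]byte // fingerprint of the single schema, computed elsewhere
	Verify      bool    // when true, Unwrap checks marker and fingerprint
}

// Wrap prepends the 10-byte SOE header to an already-encoded Avro payload.
func (c FixedCodec) Wrap(payload []byte) []byte {
	out := make([]byte, 0, 10+len(payload))
	out = append(out, soeMagic[:]...)
	out = append(out, c.Fingerprint[:]...)
	return append(out, payload...)
}

// Unwrap strips the header. Without verification it only checks the length.
func (c FixedCodec) Unwrap(data []byte) ([]byte, error) {
	if len(data) < 10 {
		return nil, errors.New("soe: message shorter than header")
	}
	if c.Verify {
		if !bytes.Equal(data[:2], soeMagic[:]) {
			return nil, errors.New("soe: bad marker")
		}
		if !bytes.Equal(data[2:10], c.Fingerprint[:]) {
			return nil, errors.New("soe: fingerprint mismatch")
		}
	}
	return data[10:], nil
}

func main() {
	// Made-up fingerprint for illustration only.
	codec := FixedCodec{Fingerprint: [8]byte{1, 2, 3, 4, 5, 6, 7, 8}, Verify: true}

	framed := codec.Wrap([]byte{0x04, 'h', 'i'})
	payload, err := codec.Unwrap(framed)
	fmt.Println(payload, err)
}
```

Skipping verification saves a comparison per record but silently accepts data written with a different schema, so it only seems reasonable when the producer is fully trusted.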

Seeing how the SOE header is super-simple, I feel like it would be nice to be able to decorate existing Marshal/Unmarshal/Encode/Decode interfaces somehow. I don't have a design clear in my head for that (possibly some kind of config option that creates decorated API/Encoder/Decoder implementations).
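One possible shape for such a decoration, sketched with plain function types rather than any real hamba/avro interface: take an existing marshal/unmarshal pair and return a pair that adds and checks the SOE header. `MarshalFunc`, `UnmarshalFunc` and `WithSOE` are hypothetical names, and `rawMarshal`/`rawUnmarshal` are toy stand-ins that copy string bytes instead of producing Avro binary.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// MarshalFunc and UnmarshalFunc are hypothetical stand-ins for an existing codec API.
type MarshalFunc func(v any) ([]byte, error)
type UnmarshalFunc func(data []byte, v any) error

// WithSOE decorates a marshal/unmarshal pair so that output carries the SOE
// header for the given fingerprint and input is checked against it.
func WithSOE(fp [8]byte, marshal MarshalFunc, unmarshal UnmarshalFunc) (MarshalFunc, UnmarshalFunc) {
	// Header is the marker C3 01 followed by the 8-byte fingerprint.
	header := append([]byte{0xC3, 0x01}, fp[:]...)

	wrapped := func(v any) ([]byte, error) {
		body, err := marshal(v)
		if err != nil {
			return nil, err
		}
		out := make([]byte, 0, len(header)+len(body))
		out = append(out, header...)
		return append(out, body...), nil
	}

	unwrapped := func(data []byte, v any) error {
		if len(data) < len(header) || !bytes.Equal(data[:len(header)], header) {
			return errors.New("soe: header missing or schema fingerprint mismatch")
		}
		return unmarshal(data[len(header):], v)
	}
	return wrapped, unwrapped
}

// rawMarshal is a toy codec for the demo: it copies string bytes, not Avro binary.
func rawMarshal(v any) ([]byte, error) {
	s, ok := v.(string)
	if !ok {
		return nil, fmt.Errorf("rawMarshal: unsupported type %T", v)
	}
	return []byte(s), nil
}

// rawUnmarshal is the inverse of rawMarshal and expects a *string target.
func rawUnmarshal(data []byte, v any) error {
	p, ok := v.(*string)
	if !ok {
		return fmt.Errorf("rawUnmarshal: unsupported type %T", v)
	}
	*p = string(data)
	return nil
}

func main() {
	// Made-up fingerprint for illustration only.
	marshal, unmarshal := WithSOE([8]byte{1, 2, 3, 4, 5, 6, 7, 8}, rawMarshal, rawUnmarshal)

	data, err := marshal("hello")
	if err != nil {
		panic(err)
	}
	var out string
	err = unmarshal(data, &out)
	fmt.Println(out, err)
}
```

The same wrapping idea would apply to streaming Encoder/Decoder types, though how it should hook into the library's configuration is exactly the open design question raised here.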

As I mentioned up-front, we have the code to do most of those things, but not in a form that fits neatly into hamba/avro -- so...

  • would you consider including something for SOE?
  • do the different encoder/decoder flavors make sense/seem useful?
  • any thoughts on how to compose SOE coding with the existing codecs?

Thanks!

@nrwiersma
Member

Hi,

SOE would be considered, but not in the main package. All other extra encodings currently sit in separate packages, each leveraging the main package, so SOE would fit that pattern. It would also simplify the question of how to design the API, as the package can be single-purpose.

@kimgr
Contributor Author

kimgr commented Mar 23, 2025

Makes sense, thanks.

Looks like a dynamic decoder would be almost line-for-line identical to registry.Decoder, except that it would use a different schema resolver signature.

I'll see if I can sketch something out.

@kimgr
Contributor Author

kimgr commented Mar 25, 2025

Posted #514 -- let's continue talking there.
