Single Object Encoding codecs #511

kimgr · 2025-03-22T15:55:18Z

As mentioned in #489, we consume and produce Avro Single Object Encoding (SOE), a minimal framing that carries a schema fingerprint with each record. We have numerous half-baked implementations of this encoding/decoding, and I wanted to look into upstreaming them.
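For readers unfamiliar with the framing, here is a minimal sketch of the header as the Avro specification describes it: a two-byte marker (C3 01), then the 8-byte little-endian CRC-64-AVRO fingerprint of the writer schema, then the Avro binary payload. The function names AppendHeader and SplitHeader are made up for illustration and are not part of hamba/avro.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// headerLen is the 2-byte marker plus the 8-byte schema fingerprint.
const headerLen = 10

// AppendHeader (hypothetical name) appends the SOE marker 0xC3 0x01 and the
// little-endian fingerprint to dst. The Avro binary payload follows it.
func AppendHeader(dst []byte, fingerprint uint64) []byte {
	var hdr [headerLen]byte
	hdr[0], hdr[1] = 0xC3, 0x01
	binary.LittleEndian.PutUint64(hdr[2:], fingerprint)
	return append(dst, hdr[:]...)
}

// SplitHeader (hypothetical name) validates the marker and returns the
// fingerprint together with the remaining Avro binary payload.
func SplitHeader(msg []byte) (fingerprint uint64, payload []byte, err error) {
	if len(msg) < headerLen {
		return 0, nil, errors.New("soe: message shorter than header")
	}
	if msg[0] != 0xC3 || msg[1] != 0x01 {
		return 0, nil, errors.New("soe: missing C3 01 marker")
	}
	return binary.LittleEndian.Uint64(msg[2:headerLen]), msg[headerLen:], nil
}
```

Computing the fingerprint itself (CRC-64-AVRO over the schema's canonical form) is left out here; the sketch only covers the framing.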

The Java decoder implementation is fairly dynamic, and looks up a decoder by schema fingerprint in the header, using a resolver. The resolver is a bit like the Registry already in your library, but much simpler (mostly just a GetSchema(fingerprint []byte) Schema).
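To make the resolver idea concrete, a rough sketch of what I have in mind, backed by an in-memory map. Schema here is only a placeholder for the library's schema type, MapResolver and rawSchema are illustrative names, and the extra error return is my assumption on top of the signature above.

```go
package main

import "fmt"

// Schema is a placeholder standing in for the library's schema type.
type Schema interface{ String() string }

// rawSchema is a toy Schema that just holds the schema JSON.
type rawSchema string

func (r rawSchema) String() string { return string(r) }

// Resolver looks up a schema by the fingerprint found in the SOE header.
// The error return is an assumption; the signature discussed above
// returns only a Schema.
type Resolver interface {
	GetSchema(fingerprint []byte) (Schema, error)
}

// MapResolver (hypothetical) resolves schemas from a fixed in-memory set,
// keyed by the 8-byte fingerprint.
type MapResolver map[[8]byte]Schema

// GetSchema implements Resolver.
func (m MapResolver) GetSchema(fingerprint []byte) (Schema, error) {
	if len(fingerprint) != 8 {
		return nil, fmt.Errorf("soe: fingerprint must be 8 bytes, got %d", len(fingerprint))
	}
	var key [8]byte
	copy(key[:], fingerprint)
	s, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("soe: unknown fingerprint %x", fingerprint)
	}
	return s, nil
}
```

A resolver backed by a remote registry could satisfy the same interface, which is what keeps the decoder itself simple.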

I think it would be useful to have a few flavors of encoders/decoders:

- Fixed schema: assumes all records being coded have the same schema, and can unmarshal with or without verification.
- Dynamic schema: like the Java implementation, looks up a schema for each record from an abstract registry, and uses said schema to decode. Encoding might either not be supported in this style, or might use a fixed schema.
- Fully generic, based on types generated by avrogen: this is like the fixed schema, but is fully type safe, as avrogen attaches the schema to the type.
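A rough sketch of how the first two flavors might differ on the decoding side. Every name here (PayloadDecoder, FixedDecoder, DynamicDecoder) is hypothetical, and PayloadDecoder stands in for whatever schema-bound decoding the library would provide. The avrogen-based flavor is not shown.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// PayloadDecoder (hypothetical) decodes an Avro binary payload into v.
type PayloadDecoder func(payload []byte, v any) error

// splitHeader checks the C3 01 marker and returns fingerprint and payload.
func splitHeader(msg []byte) (uint64, []byte, error) {
	if len(msg) < 10 || msg[0] != 0xC3 || msg[1] != 0x01 {
		return 0, nil, errors.New("soe: invalid header")
	}
	return binary.LittleEndian.Uint64(msg[2:10]), msg[10:], nil
}

// FixedDecoder assumes every record was written with one known schema.
type FixedDecoder struct {
	Fingerprint uint64 // fingerprint of the expected schema
	Verify      bool   // when false, the header fingerprint is ignored
	Decode      PayloadDecoder
}

// Unmarshal strips the header, optionally verifies it, and decodes.
func (d FixedDecoder) Unmarshal(msg []byte, v any) error {
	fp, payload, err := splitHeader(msg)
	if err != nil {
		return err
	}
	if d.Verify && fp != d.Fingerprint {
		return fmt.Errorf("soe: fingerprint %016x does not match expected %016x", fp, d.Fingerprint)
	}
	return d.Decode(payload, v)
}

// DynamicDecoder picks a decoder per record based on the header.
type DynamicDecoder struct {
	Lookup func(fingerprint uint64) (PayloadDecoder, bool)
}

// Unmarshal resolves the decoder for this record's schema, then decodes.
func (d DynamicDecoder) Unmarshal(msg []byte, v any) error {
	fp, payload, err := splitHeader(msg)
	if err != nil {
		return err
	}
	dec, ok := d.Lookup(fp)
	if !ok {
		return fmt.Errorf("soe: no schema for fingerprint %016x", fp)
	}
	return dec(payload, v)
}
```

The two types share everything except how they go from fingerprint to decoder, which suggests the header handling could live in one place.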

Seeing how the SOE header is super-simple, I feel like it would be nice to be able to decorate existing Marshal/Unmarshal/Encode/Decode interfaces somehow. I don't have a design clear in my head for that (possibly some kind of config option that creates decorated API/Encoder/Decoder implementations).

As I mentioned up-front, we have the code to do most of those things, but not in a form that fits neatly into hamba/avro -- so...

- Would you consider including something for SOE?
- Do the different encoder/decoder flavors make sense/seem useful?
- Any thoughts on how to compose SOE coding with the existing codecs?

Thanks!


nrwiersma · 2025-03-22T18:53:24Z

Hi,

SOE would be considered, but not in the main package. All other extra encoding currently sits in separate packages, all leveraging the main package, so this would fit that pattern. It would also simplify the issue of how to design that API, as it can be single purpose.

kimgr · 2025-03-23T11:40:55Z

Makes sense, thanks.

Looks like a dynamic decoder would be almost line-for-line identical with registry.Decoder, except using a different schema resolver signature.

I'll see if I can sketch something out.

kimgr · 2025-03-25T20:07:20Z

Posted #514 -- let's continue talking there.
