Some interesting formations have appeared in downstream code as well as our own: codecs share some common patterns and common segments of code, and then have a fairly small section that specializes or diverges and makes each codec unique.
In large part this is due to the use of the refmt Token/TokenSource/TokenSink types. (Which is interesting, because I also think the way we currently expose some of those interface details is not great; but apparently it does have some virtues, and maybe I shouldn't be so hasty in wanting to rip it out or conceal it. Another issue will be made for that discussion, anyway.)

This is probably best shown by full example: go-ipld-prime/codec/dagjson/marshal.go, lines 78 to 155 in 3500324. The tail of that snippet:

```go
			// ...
			return fmt.Errorf("schemafree link emission only supported by this codec for CID type links")
		}
	default:
		panic("unreachable")
	}
}
```
The similarity is... almost 100%, as you can see. There's just a tiiiny divergence in one of the cases in one of the functions which introduces some custom logic (in this example, it peeks at the concrete types and treats some of them a little special).
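To make the shared shape concrete, here's a minimal sketch of the skeleton that nearly every codec ends up repeating. All the types below are simplified stand-ins, not go-ipld-prime's actual Node nor refmt's actual TokenSink; only the control flow is the point.

```go
package main

import "fmt"

// Simplified stand-ins for the data model node and the token sink.
// The real interfaces are much richer, but the switch below is the
// part every codec shares.
type Kind int

const (
	Kind_String Kind = iota
	Kind_Link
)

type Node interface {
	Kind() Kind
	AsString() (string, error)
	AsLink() (interface{}, error)
}

type TokenSink interface {
	EmitString(s string) error
}

// marshal is the near-identical part: a switch over node kinds, each
// arm emitting tokens. The divergence lives in a single arm; in
// dag-json's case, the link arm peeks at the concrete link type.
func marshal(n Node, sink TokenSink) error {
	switch n.Kind() {
	case Kind_String:
		s, err := n.AsString()
		if err != nil {
			return err
		}
		return sink.EmitString(s)
	case Kind_Link:
		lnk, err := n.AsLink()
		if err != nil {
			return err
		}
		switch lnk.(type) {
		// a real codec would special-case its known link types here,
		// e.g. a `case cidlink.Link:` arm emitting its custom form.
		default:
			return fmt.Errorf("schemafree link emission only supported by this codec for CID type links")
		}
	default:
		panic("unreachable")
	}
}

// A toy node and sink, just so the sketch runs end to end.
type stringNode string

func (s stringNode) Kind() Kind                   { return Kind_String }
func (s stringNode) AsString() (string, error)    { return string(s), nil }
func (s stringNode) AsLink() (interface{}, error) { return nil, fmt.Errorf("not a link") }

type printSink struct{}

func (printSink) EmitString(s string) error { fmt.Printf("%q\n", s); return nil }

func main() {
	if err := marshal(stringNode("hello"), printSink{}); err != nil {
		fmt.Println("error:", err)
	}
}
```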
Looking into the details a bit more...
Normally I'd say a project "shouldn't" need to do something like this: IPLD Schemas already offer a lot of ways to tune the isomorphism between logical data and a serialization-ready data model view of the same.
But in this example, there's already a serialization -- one that was itself described with IPLD Schemas -- and what this developer wanted was another, distinct serialization that does not have the same token stream, which happens to serve a debug/human-readability purpose. (It's the "not the same token stream" detail that makes switching to another known multicodec unsuitable for the goal here. I think this is probably rare, but I'll let it fly unchallenged for the sake of this discussion.)
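To illustrate what "not the same token stream" means here, consider a map containing one link. The two renderings below are made up for illustration (the second is not any real codec's output):

```
dag-json:  {"payload":{"/":"bafy..."}}

debug:     map {
             payload: <link bafy...>
           }
```

Both describe the same data model contents, but they don't tokenize the same way: the first walks the link as serial map structure, while the second wants one bespoke token for it. That's why simply switching to another known multicodec wasn't suitable here.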
I'm perfectly happy with this outcome: the developer writes a custom codec, and uses it for their human-readable output presentation, and that's great.
This custom codec is not, and cannot be, a multicodec. It's reaching around for data that's not in the data model! If the function can't be specified in terms of the data model, and needs some other "special sauce" to be defined, then it's not a multicodec -- there'd never be a context-free way to morph the serial data back into the data model, and thus it doesn't meet our criteria for multicodecs.
There are a couple of interesting things about this:
I think it's interesting that we came up with a scenario where walking a thing as tokens was useful (and it happened to pretty much produce a codec... just not a multicodec).
I think it's also interesting that we came up with a scenario where it would've been helpful to plug together some handler functions for various token kinds (together with a bunch of default handlers for every token kind we didn't have a special behavior in mind for) and synthesize a whole codec function out of it.
So.
Maybe... maybe having an API for traversals that's based on streams of tokens... is actually a useful idea, and something we should keep supporting. (Probably in a much-refactored form compared to the present one, but nonetheless.)
And maybe... having some kind of build-a-codec gadget, which handles the token case-switching for you and composes the callbacks you give it (...while doing all the other fiddly bits, like memory budgeting, for you) could actually be useful. (We wouldn't necessarily use it for the core codecs, for performance reasons, but it could be plenty helpful for building custom prettyprinters or suchlike.)
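To pin down what such a gadget might look like, here's a rough sketch. Every name below is hypothetical, not a proposed API, and it processes a pre-tokenized slice rather than a real stream just to keep it short:

```go
package main

import "fmt"

// You register callbacks for the token kinds you care about; the
// builder supplies pass-through defaults for the rest and hands back
// one complete stream-processing function.

type Token struct {
	Kind string // e.g. "map-open", "string", "link"
	Str  string // payload, where relevant
}

type Emitter func(tk Token) error
type Handler func(tk Token, emit Emitter) error

type CodecBuilder struct {
	handlers map[string]Handler
}

func NewCodecBuilder() *CodecBuilder {
	return &CodecBuilder{handlers: map[string]Handler{}}
}

// On registers a custom handler for one token kind, replacing the default.
func (b *CodecBuilder) On(kind string, h Handler) *CodecBuilder {
	b.handlers[kind] = h
	return b
}

// Build composes the registered handlers with a pass-through default.
// (A real version would also own the fiddly bits: memory budgeting,
// depth limits, error wrapping, and so on.)
func (b *CodecBuilder) Build() func(tokens []Token, emit Emitter) error {
	return func(tokens []Token, emit Emitter) error {
		for _, tk := range tokens {
			h, ok := b.handlers[tk.Kind]
			if !ok {
				h = func(tk Token, emit Emitter) error { return emit(tk) }
			}
			if err := h(tk, emit); err != nil {
				return err
			}
		}
		return nil
	}
}

func main() {
	// A prettyprinter that only overrides how links are rendered.
	pretty := NewCodecBuilder().
		On("link", func(tk Token, emit Emitter) error {
			return emit(Token{Kind: "string", Str: "<link " + tk.Str + ">"})
		}).
		Build()

	out := func(tk Token) error { fmt.Printf("%s: %s\n", tk.Kind, tk.Str); return nil }
	_ = pretty([]Token{{"string", "hello"}, {"link", "bafy..."}}, out)
}
```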
Or maybe this is all a bit too much. :) It's probably best to sit on this idea until another example use case comes along. Anyway, the notes are here now.
I think the evolution of codecs like this one is likely 'away from what is supported', such that it'll be hard to make a general framework that's performant, useful, and flexible enough to work for all cases.
Having a string or other fast path for type enumeration seems useful (versus doing type assertions, which seem like they're going to get expensive).
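For example, a hypothetical sketch of the contrast (not refmt's actual token layout): a token carrying an explicit kind tag lets dispatch be a cheap integer switch, whereas an interface{}-shaped token forces a type assertion on every step, and boxing the payload may allocate.

```go
package main

import "fmt"

// Tagged: the kind is an explicit enum field, so dispatch is a cheap
// integer comparison. (A string tag would work the same way.)
type TokenKind uint8

const (
	TokString TokenKind = iota
	TokInt
)

type TaggedToken struct {
	Kind TokenKind
	Str  string
	Int  int64
}

func handleTagged(tk TaggedToken) {
	switch tk.Kind {
	case TokString:
		fmt.Println("string:", tk.Str)
	case TokInt:
		fmt.Println("int:", tk.Int)
	}
}

// Boxed: the token is just an interface{} value, so every dispatch is
// a type assertion, and storing scalars in it may allocate.
func handleBoxed(tk interface{}) {
	switch v := tk.(type) {
	case string:
		fmt.Println("string:", v)
	case int64:
		fmt.Println("int:", v)
	}
}

func main() {
	handleTagged(TaggedToken{Kind: TokInt, Int: 42})
	handleBoxed(int64(42))
}
```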
I suspect the other half of this, which will be intertwined with it but which also breaks down the 'stream of tokens' interface level, is going to be custom semantics about when to load/follow links when performing traversals.
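A sketch of what I mean, with hypothetical names throughout (real link loading would involve fetching and decoding blocks, which is stubbed out here): the traversal takes a policy callback that decides, per link, whether to load and descend.

```go
package main

import "fmt"

type Link struct{ CID string }

type Node struct {
	Link     *Link // non-nil if this node is a link
	Children []*Node
	Value    string
}

// FollowPolicy decides whether a link should be loaded and traversed.
type FollowPolicy func(lnk Link, depth int) bool

// loadLink stands in for fetching and decoding the linked block.
func loadLink(lnk Link) *Node {
	return &Node{Value: "contents of " + lnk.CID}
}

func traverse(n *Node, depth int, follow FollowPolicy, visit func(*Node, int)) {
	visit(n, depth)
	if n.Link != nil {
		if !follow(*n.Link, depth) {
			return // policy said: don't load this block
		}
		traverse(loadLink(*n.Link), depth+1, follow, visit)
		return
	}
	for _, c := range n.Children {
		traverse(c, depth+1, follow, visit)
	}
}

func main() {
	root := &Node{Children: []*Node{
		{Value: "plain"},
		{Link: &Link{CID: "bafy..."}},
	}}
	// e.g. follow links only up to depth 1
	shallow := func(lnk Link, depth int) bool { return depth <= 1 }
	traverse(root, 0, shallow, func(n *Node, depth int) {
		fmt.Printf("%*s%s\n", depth*2, "", n.Value)
	})
}
```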