Tendermint data structures and serialization #654
I was able to generate JSON structs (or Serialization types, as the issue calls it) from the OpenAPI definition in Tendermint Go. We need to decide how/where we want to store them. Currently, I've created a

The OpenAPI generator creates a full crate for the json structs (including a

The generation is also done differently. Instead of a
Nice! 👍
Wouldn't it be better, in the long run, to have a single crate (e.g.
That's what I thought originally, but the OpenAPI compiler is quite opinionated: it generates a full directory structure of a complete crate.
The last two steps make a normally automated process somewhat manual/cumbersome. We could try coming up with an automated method (especially for the lib.rs generation) but I feel it's weird. If you like this idea, I'll look into generating the lib.rs for protobuf files automatically. The real problem is not automation but that all files are overwritten by OpenAPI during generation, because it populates the whole crate folder. In the prost generation we only overwrite the "prost" folder within the source code, which is designated for this automated generation.

Edit: this also makes it impossible to update the JSON structs without recreating the protobuf structs. Not unusable but definitely inconvenient.
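For comparison, the prost side only touches a designated output folder because the build script pins one. A minimal `build.rs` configuration sketch (the paths here are illustrative, not the actual tendermint-rs layout):

```rust
// build.rs — regenerates types into src/prost only; the rest of the
// crate (lib.rs, hand-written modules) is never overwritten.
fn main() -> std::io::Result<()> {
    prost_build::Config::new()
        .out_dir("src/prost") // only this folder is replaced on regeneration
        .compile_protos(&["proto/types.proto"], &["proto/"])
}
```

The OpenAPI generator has no equivalent of `out_dir` scoped to a subfolder, which is what makes its output clobber the whole crate.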
I've created a diagram from all the public structs in tendermint-rs that are serializable: https://imgur.com/a/YhWa7Hb

### Current state

We have three different types of serialization implemented for our structs.

#### green: implicit serialization using annotations

This is the basic thinking in serde for serialization: annotate your struct with a derive and possibly a few modifiers (like renaming some fields or skipping a few for serialization) and let the serialization take place recursively: all fields are serialized through their own serialization methods. The problem is that this way no validation is done for the fields during deserialization.

#### yellow: explicit serialization using custom derive

This is the advanced thinking in serde for serialization: a custom implementation of the Serialize and Deserialize traits. The benefit is that validation can be implemented for the struct, but in a lot of cases the code is copied over from other structs implementing similar serialization/deserialization. The problem is that the serialization is "hidden" together with the struct: it's not clear if it's implemented or if the serialization code could be reused for other structs. Also, this method is explicit: if a field in the struct has its own serialization/deserialization implemented, this method will ignore it and use the default serialization/deserialization instead. (For example, a hexstring->Vec deserialization might serialize into a bytes array instead of a string.)

#### blue: explicit serialization using raw types / JSON types

This is the advanced thinking we would like to get to in this issue. The struct is serialized/deserialized using TryFrom/Into traits, through a more open JSON struct. This has the benefit that it requires minimal annotations on the domain struct, and it does validation of the incoming data during conversion. Current implementation is using the protobuf structs defined in the

### Desired end-state
### Ongoing work
I propose we take these last few items as takeaways and make this issue an "ongoing work". Possibly we can close it and refer back to it any time we implement new serialization or touch current serialization.
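The "blue" raw-type approach can be sketched in plain Rust. The type names below are hypothetical; in a real implementation the serde derives would sit on the raw type only, and the domain type stays annotation-free:

```rust
use std::convert::TryFrom;

// Hypothetical raw/JSON-facing type: loosely typed, no invariants.
// In a real implementation this is where #[derive(Serialize, Deserialize)]
// would go.
#[derive(Debug, Clone)]
pub struct RawHeight {
    pub value: i64,
}

// Domain type: guarantees the height is non-negative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Height(u64);

impl TryFrom<RawHeight> for Height {
    type Error = String;

    // Validation happens here, during conversion, not in serde code.
    fn try_from(raw: RawHeight) -> Result<Self, Self::Error> {
        u64::try_from(raw.value)
            .map(Height)
            .map_err(|_| format!("negative height: {}", raw.value))
    }
}

impl From<Height> for RawHeight {
    fn from(h: Height) -> Self {
        RawHeight { value: h.0 as i64 }
    }
}
```

Deserialization then becomes "deserialize the raw type, then `try_into()` the domain type", so invalid data is rejected in exactly one place.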
We have a request to add JSON support to cosmos-sdk-rs: cosmos/cosmos-rust#83 Any advice you can give on the best way to do that in a greenfield capacity? Is there anything we can/should reuse from tendermint-rs?
That's a tough one to answer. Our experience with Protobuf/JSON serialization has involved far more labour than we initially anticipated. The core challenges with serialization are best understood when thinking about the problem from different users' perspectives:
Since we want to fulfill all these needs, we ideally want to use the best of both Protobuf and JSON. It's unfortunate (and strange) that Protobuf doesn't meet all of these needs. Another issue is that, in

Ultimately such a situation lends itself to maintaining two distinct sets of data structures just for serialization (one set for Protobuf and another for JSON), as well as mappings between the two types. Or at least mappings from each set to our domain types and then back again. With domain types, we would thus end up with 3 sets of data structures we need to maintain. It's a lot of code (and a lot of slog work) if you have lots of data structures, but it's probably the best route to meet all of our users' needs.
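The three sets of data structures and the mappings between them can be sketched roughly like this (all type names are hypothetical; in practice the Protobuf-facing type would be prost-generated and the JSON-facing one would carry the serde derives):

```rust
use std::convert::TryFrom;

// 1. Domain type: validated, used throughout the codebase.
#[derive(Debug, Clone, PartialEq)]
pub struct AppHash(Vec<u8>);

// 2. Protobuf-facing type: raw bytes on the wire.
pub struct ProtoAppHash {
    pub value: Vec<u8>,
}

// 3. JSON-facing type: uppercase hex string, as RPC consumers expect.
pub struct JsonAppHash {
    pub value: String,
}

impl TryFrom<ProtoAppHash> for AppHash {
    type Error = String;
    fn try_from(p: ProtoAppHash) -> Result<Self, Self::Error> {
        Ok(AppHash(p.value)) // real code would validate length, etc.
    }
}

impl TryFrom<JsonAppHash> for AppHash {
    type Error = String;
    // Sketch: assumes ASCII input; a real impl would use a hex crate.
    fn try_from(j: JsonAppHash) -> Result<Self, Self::Error> {
        if j.value.len() % 2 != 0 {
            return Err("odd-length hex string".into());
        }
        let bytes = (0..j.value.len())
            .step_by(2)
            .map(|i| u8::from_str_radix(&j.value[i..i + 2], 16))
            .collect::<Result<Vec<u8>, _>>()
            .map_err(|e| e.to_string())?;
        Ok(AppHash(bytes))
    }
}

impl From<AppHash> for JsonAppHash {
    fn from(h: AppHash) -> Self {
        let value = h.0.iter().map(|b| format!("{:02X}", b)).collect();
        JsonAppHash { value }
    }
}
```

Multiply this by every serializable struct and the maintenance cost becomes clear, which is exactly the trade-off described above.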
The same problem exists in ibc-rs: for a lot of structures derived from protobuf descriptions, serde is customized to produce human-friendly JSON. It would be tedious to implement serde by hand, or maintain a copy of every struct and define conversions, just to get serde and other trait implementations that are not supported out of the box in the types generated by prost-build. I see two ways to ultimately resolve this.

### The pbjson-types approach: single crate, all(?) included

Each proto module gets a "canonical" Rust crate with all necessary additional trait implementations derived or otherwise auto-generated. Domain types, in many cases, can embed the generated struct types to easily derive the implementations.

Advantages:
Disadvantages:
### The prost-types approach: slim prost crate, possible sidecar generators

The "canonical" line of generated crates (in our stack it's

Advantages:
Disadvantages:
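For reference, the pbjson-types route above is driven from a build script. As far as I understand pbjson-build's documented usage, the generation step looks roughly like this (the descriptor path and package name are illustrative):

```rust
// build.rs sketch: generate serde Serialize/Deserialize impls for
// prost-generated types from a protobuf FileDescriptorSet, which is
// assumed to have been produced earlier (e.g. via prost-build's
// file_descriptor_set_path option).
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let descriptor_set = std::fs::read("proto/descriptor.bin")?;
    pbjson_build::Builder::new()
        .register_descriptors(&descriptor_set)?
        .build(&[".tendermint"])?;
    Ok(())
}
```

The generated serde impls follow the canonical protobuf JSON mapping, which is what makes the output format consistent across crates that adopt this approach.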
@mzabaluev This write-up is very helpful. On the side of the
@xla If I understand this right, it potentially results in vertical stacks of incompatible, partially redundant "proto+" types, as exemplified by prost-wkt vs pbjson-types. Maybe we can at least agree on one set of richly generated types for Tendermint/Cosmos-SDK/IBC and use that throughout, but then the maintenance responsibilities on the "proto+" crates should be spelled out and include all the rich functionality required by the dependent projects. In our case that would mean, for example, that serialization generated in cosmos-rust has to correspond to the JSON format choices made by ibc-rs. |
Radical idea: have a single repo just for the
We have no JSON support at all as yet, so we're totally open to supporting whatever conventions ibc-rs wants to use. |
This approach is sensible and is in line with my angle of a shared compiler stack. @mzabaluev Following through with the consolidation would result in consistent, non-redundant, compatible types and would address your concerns. @thanethomson Keen to hear your thoughts on this? |
The format choices were made to work with files used by the relayer. They may be somewhat ad-hoc and not necessarily even follow a consistent convention; I can't tell at present if this serialization is universally usable (which is, to say, a common problem with serde). More importantly, outsourcing this format away from ibc-rs would complicate the maintenance loop for Hermes. Consider the situation when we decide to make a breaking change in the schema. Currently it only requires a version bump for the relayer and its libraries, easy to manage in ibc-rs. If this schema is maintained in cosmos-rust or the future all-protos repository, we'd have to wait for a new major version there, which looks like an unnecessary entanglement; after all, it's just the relayer's own file format.
Versioning this will be... interesting. Would there be a synchronized release cycle for changes in Tendermint, Cosmos SDK, and IBC specifications? Somewhat less radical, I think, would be to have a repo with a
See also #1128 for an additional complication 🙂

I've been thinking about this for a while now and it's still not totally obvious what the best possible solution here is. It's probably a good idea to break the problem down into sub-problems.

### Problems

#### Problem 1: Protobuf file (
@thanethomson You were probably referring to my comment in an ibc-rs status update meeting yesterday. That only reflected on the current difficulties I encountered while working on https://github.com/informalsystems/ibc-rs/pull/2213. If each of the projects whose .proto files we require for the relayer's own code generation published those files as BSR modules, our work could be made much easier. I have filed an issue about that: cosmos/ibc-go#1345
I think it should be possible to link two different versions of the
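Cargo can indeed link two semver-incompatible versions of one crate under distinct names via the `package` rename key. A hypothetical manifest sketch (the version numbers are made up for illustration):

```toml
[dependencies]
# Two incompatible releases of the same proto crate, linked side by side
# under different local names.
tendermint-proto-v034 = { package = "tendermint-proto", version = "0.23" }
tendermint-proto-latest = { package = "tendermint-proto", version = "0.28" }
```

Each then appears in code under its renamed identifier, so conversion shims between the two generated type sets can live in one crate.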
I support this. Perhaps, for better ergonomics towards "ordinary" Tendermint developers, you don't need a sub-path for the latest supported version and common code; only the modules supporting older versions would need to be named (and feature-gated).

For code generated by prost-build, though, there is no way around generating a separate set of types from the .proto sources of every distinct Tendermint version.
+1 for this, especially if the Tendermint version(s) in use were gated by crate features. It seems like this would also make it possible to remove the |
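A feature-gated, versioned layout could look roughly like this. The module contents are hypothetical, and both versions are compiled here for illustration; in a real crate each older module would sit behind a Cargo feature:

```rust
// Hypothetical proto crate exposing several Tendermint versions side by side.
// In a real crate each versioned module would be gated, e.g.
// #[cfg(feature = "v0_34")].
pub mod v0_34 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Header {
        pub chain_id: String,
    }
}

pub mod v0_37 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Header {
        pub chain_id: String,
        pub app_hash: Vec<u8>,
    }
}

// The unversioned alias tracks the latest supported version, so "ordinary"
// users never have to spell out a version sub-path.
pub type LatestHeader = v0_37::Header;
```

Dropping support for an old Tendermint version then means deleting one module and one feature, without touching the unversioned API surface.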
After working through #639, we're pretty convinced that it's best to maintain 2 categories of data structures:
Previously we've mixed our domain types and serialization types when it comes to JSON, and we'd like to split them up. A rough plan to implement this would be the following:

- … the `tendermint` crate.
- `TryFrom` converters to corresponding domain types. These should probably be in a single crate together (we could rename the `proto` crate at some point to something more generic, and feature-guard each set of data structures).