Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to convert Image formats #44

Closed
andremarianiello opened this issue Mar 27, 2020 · 27 comments · Fixed by #87
Closed

Add functionality to convert Image formats #44

andremarianiello opened this issue Mar 27, 2020 · 27 comments · Fixed by #87
Labels
Feature New feature or request

Comments

@andremarianiello
Copy link

I am dealing with a set of proto files from a couple legacy repos that I cannot change. I can successfully build an image from these protos, but I cannot generate stubs from them with protoc. To generate stubs correctly I would have to write some nasty commands to automatically fixup the proto files before building the image. However, if I could build a JSON image, modify that really easily with jq, and then convert that to a binary image with buf to use as input to protoc, I wouldn't have to resort to hack-ily sed-ing the proto files. Is this possible?

P.S. Converting binary images to JSON would also be useful for introspection, but that is not necessary for me right now.

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

It's a pretty easy thing to do internally actually, but need to investigate how easy it is to add to buf. Will update this issue with more details in a bit.

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

Can you try out #45?

To install:

go get github.com/bufbuild/buf/cmd/buf@image-build-all-inputs

To convert from binary to JSON:

$ buf image build --input input_json_file.json -o output_binary_file.bin

To convert from JSON to binary:

$ buf image build --input input_binary_file.bin --o output_json_file.json

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

One note:

The JSON to binary to JSON round trip will produce the same logical result, but the binary to JSON to binary round trip may not in the case of custom options. This is likely OK, as we're not supporting this conversion as a means to keep such data, but it needs to be documented.

So, these two will be equal (except for some empty array issues, which are basically a function of jsonpb and likely won't be fixed):

$ buf image build -o -#format=json
$ buf image build -o -#format=json | buf image build --input -#format=json -o - | buf image build --input - -o -#format=json

But these two may not be:

$ buf image build -o -
$ buf image build -o - | buf image build --input - -o -#format=json | buf image build --input -#format=json -o -

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

One more note: If you're doing manual conversion of things within the FileDescriptorProtos, that could get really ugly. Just proceed with caution, but one note is that the SourceCodeInfo will need to be completely thrown out the window, so make sure you do --exclude-source-info on your resulting image.

@bufdev bufdev added the Feature New feature or request label Mar 27, 2020
@andremarianiello
Copy link
Author

andremarianiello commented Mar 27, 2020

At first pass it seems to work great! There are some small things I noticed which may or may not matter to you.

When I roundtrip a json image from json->bin->json, the json is slightly different at the end than it was at the beginning. Small example:

5520,5521c5516
<             "leadingComments": " list of <string, string, string> namespace, key, value triplets\n",
<             "leadingDetachedComments": []
---
>             "leadingComments": " list of <string, string, string> namespace, key, value triplets\n"

and

61742d61337
<             "path": [],

It seems like empty arrays get omitted during the round trip. I assume this is just a byproduct of how the pb->json logic is configured to output empty fields, and thus should have no impact on the binary image, but I could see it tripping up automation that deals with these, especially if people want to commit JSON images to git. Do you think this behavioral quirk should be fixed?

Also, thanks for the tip about the source info, I will keep that in mind.

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

Yea the comment above addresses this - this is just a function of jsonpb, can't really clean that up, and it's the same functionally. Long-term, this would be something we'd want to address though - doing a "perfect" round trip (i.e. same shasum effectively) is important. The custom options present enough of an issue anyways, though.

Regardless, if you're using JSON images for i.e. code generation, I'd recommend explicitly NOT doing that - they're lossy on custom options anyways (which doesn't matter for breaking change detection or linting, but would matter for code generation).

Is this still valuable?

@andremarianiello
Copy link
Author

jsonpb actually allows you to configure that behavior IIRC via the EmitDefaults flag (if the issue is what I think it is)

Regardless, if you're using JSON images for i.e. code generation, I'd recommend explicitly NOT doing that

I was planning on converting the JSON image to binary, and then generating code from the binary image. Is that what you are saying I shouldn't do?

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

The EmitDefaults doesn't actually have an effect on empty arrays, those aren't considered defaults so to say - note that in both the single and the round trip case, the output is run through a jsonpb.Marshaler before being written, and it's the same jsonpb.Marshaler that has EmitDefaults to off.

As of now, I can't say I'd recommend it - the issue is that JSON images are lossy on custom options. So if you have any custom options in your setup, the JSON image won't have them. For linting and breaking change detection, JSON images are fine, but as of now, they aren't great for that. Something we need to look into.

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

Actually correct that - it does appear to have an effect on empty arrays. But it's a nil vs empty issue, which comes up a lot in these cases - it's just a tough one, and not sure I want to emit defaults for everything to get around it.

@andremarianiello
Copy link
Author

I don't believe that these protos have any custom options, but it is good to be aware of. Thanks for all your help with this. I haven't tested my whole process (modifying the JSON image, converting to binary, code gen), but if there is an issue with that it is mine, not yours. Thanks for all your help with this - looks good.

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

Do you mind if we sit on this for a little bit and think about the long-term implications of merging this? Ie are you OK to use it on the branch for now? Or would that be problematic? Thinking through this, we're slightly concerned that allowing images to be input to buf image commands might cause trouble, namely with the lossy nature of JSON images (actually all images are lossy https://buf.build/docs/lint-checkers#formatting ) - while it's fine for input to buf check commands, if we are extending Images for the upcoming Buf Schema Registry, we probably always want to start with sources.

@andremarianiello
Copy link
Author

Not ideal for me, but I understand why you want to be careful about this. Using this branch is acceptable.

@bufdev
Copy link
Member

bufdev commented Mar 27, 2020

We'll look into this more and try to get back to you as soon as possible. We appreciate it, sorry for the inconvenience.

I would highly recommend keeping things in binary Images regardless, if you're planning to use these for code generation - so maybe use this to get everything into binary after you do your seds? :-) Ie reconvert back to binary once you're done, basically.

Another note, if you're planning on pushing these into code generation: SourceCodeInfo matters, it's how you get all your comments in your generated files, so since you'll be effectively invalidating SourceCodeInfo by doing these manual edits to the FileDescriptorProtos, you may be in trouble anyways.

@bufdev
Copy link
Member

bufdev commented Mar 28, 2020

Just an update: We're actually going to look to fix the lossy nature (relative to binary) of JSON images - it's possible already, but even easier with v2 of golang/protobuf. We'll merge this once we can do the roundtrip properly.

@andremarianiello
Copy link
Author

That's great news! I'm a big fan of the new Go protobuf api.

@bufdev
Copy link
Member

bufdev commented May 9, 2020

Hey @andremarianiello - is this still something that is needed? The short is that we're not sure we want to necessarily support conversion in this manner, at least for now. Let us know.

@andremarianiello
Copy link
Author

I am using the JSON to binary functionality from the branch you made for my production build. It is a fairly complicated build, but doing the manipulations I needed to do with jq was much simpler and more robust that operating on the source directly. Even if you decide not to never include that functionality in master, I expect to be using your branch for a long time. I do not currently have an alternative.

@bufdev
Copy link
Member

bufdev commented May 9, 2020

I'd recommend forking in this case - we may support JSON to binary conversion in the future, we're not sure we can commit to this with the long-term implications it has (re: previous comment #44 (comment)) - right now it's incorrect due to the lossy nature of FileDescriptorSets. Can you let me know when you've forked?

@andremarianiello
Copy link
Author

Forking complete. Thanks for the branch.

Could you elaborate more on "the lossy nature of golang/protobuf"? I am interested in learning what these issues are and whether or not you believe them able to be eventually overcome. I am aware of missing custom options. Is there more?

@bufdev
Copy link
Member

bufdev commented May 9, 2020

Whoops, I meant "the lossy nature of FileDescriptorSets" (edited for clarity).

The biggest thing is leading detached comments:

https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto#L881

I forget exactly which one of these is dropped, but if you have:

// one

// two

// three
message Foo {} // four

Either // one, // two, or both are dropped from the resulting FileDescriptorSet.

Sorry for the deleting of the branch here - we definitely want to get to this at some point, we just have some bigger fish to fry at this exact moment, and it'll be tough to keep that branch up to date. Feel free to get in touch at [email protected] and happy to discuss more.

Going to keep this issue open so we can consider it for the future!

@bufdev bufdev changed the title Is there a way to convert a JSON image to a binary image? Add functionality to convert Image formats May 9, 2020
@bufdev
Copy link
Member

bufdev commented May 10, 2020

Hey @andremarianiello, see comment on #50 - it turns out that no matter what, custom options are dropped from JSON - I've verified this in the golang/protobuf code as well, although I have filed an issue to see if this can be changed (probably not). Just thought I'd give you the heads up.

@bufdev
Copy link
Member

bufdev commented Jun 21, 2020

Update: it turns out that custom options can be added to JSON, and we'll be adding something along the lines of buf image convert in the near future.

@bufdev
Copy link
Member

bufdev commented Jun 22, 2020

See #87

@bufdev
Copy link
Member

bufdev commented Jun 22, 2020

Released as 0.18.0.

@andremarianiello
Copy link
Author

Thanks so much! I'm surprised this actually got done. I thought that custom options were just not possible in JSON images. What changed?

@bufdev
Copy link
Member

bufdev commented Jun 27, 2020

I was wrong :-) You can have custom options in JSON :-)

@andremarianiello
Copy link
Author

The best kind of mistake!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants