-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON (de)serialization #49
Conversation
Thanks again for everything. I'll do my best to check it out ASAP. |
Just saying that i haven't forgotten this, just trying to find some time to do it. |
Started reviewing it. You're still not forgotten :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good start for something great. There are a few comments already placed in the review here, but i'll add this:
- How is OneOf case handled? Could you add a test (and code to make it work) ?
- I'm not a huge fan of using std.json "leaky" code without it being explicit to the end user. I think we need to use non-leaky methods and return a
Parsed(T)
, and at worst propose leaky json_decode on a separate interface (that would require ArenaAllocator instead of just Allocator).
Overall this is very good and my requests do not change the fact that i'm very thankful for your time and investment in this.
Made some commits. All unions are handled by std.json internally, see here and here - it just worked out of the box, made some tests to prove it. About using "leaky" versions of std.json (de)serialization functions - the only reason I chose them is because I didn't get it to work with non-leaky alternatives) I just remember than some tests blamed about memory leaks (std.testing.allocator is able to catch leaks) and some code didn't even compile, don't remember why tbh. I can try to explore it again with a fresh look. And thanks for feedback, even the late one) |
This is great. I'll have some fun with it and might even add some suggestions before approving it, but overall i'm positive this will be merged soon. Thanks again for your work. |
LUL i just switched to a non-leaky function for JSON decoding and everything works :) There are no memory leaks or compile errors that I've encountered before. |
@librolibro I added a few commits directly, i'll let you look at them and tell me if they're ok for you. According to the json spec for protobuf We did not have a specific information for bytes, we had a shortcut to handle it just like strings. These last commits do not change the zig-facing interface of using ManagedStrings for bytes fields, but add the information needed in the I'll look tomorrow into the Map type, because i'm pretty sure we are not correct with the spec too (we handle it as a list of entries). |
Everything looks clean at the first glance, good job at that! (@embedfile usage was nice, didn't know about it :)) Another thing that comes to my mind is the ability to not add fields into JSON string if their value is equal to default - e.g. Python's implementation of google.protobuf.json_format.MessageToJson comes with including_default_value_fields argument which if False by default. While specs say that
and it's not a strict requirement it may be worth implementing this and since the message field iteration was already made in this code and not in std.json depths it shouldn't be that hard to do. P.S. Here's even more things that might be worth implementing |
Since this project uses ManagedString structs (instead of []const u8), I also need to add jsonParse/jsonStringify method for it
Previously it was made by expecting the type name to be "ArrayListAligned" which was not good since user-defined message could have the same name as built-in ArrayList() struct.
The problem is that std.meta.eql can't properly compare all types (e.g. if struct contains pointer it just compares pointer value itself - that's how it compares ArrayLists). With this "ultimate comparer" function tests looks more clean and less - although the function itself is not that simple and might some tests :) Maybe it'll be useful somewhere outside tests_json.zig (e.g. in other tests)
bf06693
to
ad352b9
Compare
I took the liberty to rebase after #51 was merged, there was a small conflict. |
We need to set ourselves some clear goals for this PR. Required TODO
@librolibro if you start any of these subjects, tell me so we avoid doing it twice. Worth analyzing in separate issues (that will be created after the merge of this PR)
|
In theory you don't need to: a base64 encoder should be able to work on (multiples of) 3-byte pieces. const str = value.getSlice();
try jws.valueStart();
try jws.stream.writeByte('\"');
var temp_out: [5]u8;
var i=0;
while (i < str.len) : (i+=3) {
const s = base64.standard.Encoder.encode(&temp_out, str[i..i+3]);
try jws.stream.writeAll(s);
}
try jws.stream.writeByte('\"');
jws.valueDone(); |
Could have worked indeed. I managed to make it in one pass, that should be good enough. |
repeated bytes, oneof=[NaN or +/-inf] tests fails as expected (NaN in arrays and oneof=.Bytes would probably fail as well), I'll make some changes to support these, for now there are failing tests to not cover this problem. P.S. For every type we're handling by itself (except ManagedString and MessageMixins) we need to paste almost the same code in three places: single field handling, repeated field handling and also oneof handling. P.P.S. Is there a way to disable workflow runs temporarily? I certainly know that tests will fail because I wrote them to fail :) Or maybe I shouldn't push until I won't make it good |
Not single bytes/numeric fields supported now, but also repeated ones and oneofs
It's actually me that's starting them manually sometimes, just tell me when you feel they should run. |
Looks like all types are being handled correctly whether it's single/repeated/oneof field (need more tests for sure)
For now it looks like all (supported ATM) types are processed as expected. I think the current JSON code base is messy IMO - especially in case of function naming. I think it's better to refactor all JSON code according to your code style and not to Zig's one (snake_case for functions instead of camelCase etc.) to avoid more refactor for now. Maybe it's time to move JSON code to separate file because there's a lot of things copied from std.json + things implemented by you and me (400+ lines of code). Also more tests ... For things like deserializing integer/float from JSON string (whether it's single, repeated of inside of oneof) etc. P.S. Since it's your project it's for you to decide whether you need to refactor all of your code to completely follow Zig style guide or not. It's better when all parts of the code are homogeneous, how it's a bit messy) |
Please run another workflow for this PR. I want to check if tests will pass now of Windows (there were strange fails, it was probably about different line endings, but I'm not 100% sure) |
Looks like it's not fixed yet. Don't bother too much, i'll take the time to look into it tuesday. Thank you for all your work already. |
Now .gitattributes work as expected (for some reason leading ./ was the problem and I missed it). If Windows build failures are about line endings it might fix the problem now :) $ git check-attr --all -- tests/json_data/fixed_sizes/camelCase_1.json
tests/json_data/fixed_sizes/camelCase_1.json: text: set
tests/json_data/fixed_sizes/camelCase_1.json: eol: lf |
Last major thing wasn't made yet is For now I don't know how to vary parser's behavior (omit default fields or not, serialize using camelCase or snake_case etc.), there's no able to pass extra state into std.json inner functions and hence into our custom jsonParse/jsonStringify implementations. I have couple "possibly good" ideas how to make it, I would try these after default fields workaround, maybe in the weekend. |
This will be good enough for a good "basic" json encoding/decoding. I'll create issues for everything mentionned in #49 (comment) So we can do bit by bit everything right. Thanks a lot for your involvment ! |
@librolibro the following issues have been created, so PR can be done to target one subject at a time: |
I was on the way to push some commits :) Earlier I thought that Github PR's can be reopened with merge up-to-date master with fork's branch, and then do it again for next issue (does Github allows this?)... But separate issues/PRs might be a cleaner solution. |
Woops sorry! I'd really prefer if we split it up, it will be cleaner to handle. |
Closes #32
This one works without extra boilerplate code (some tests included) and I kinda like the solution, but still ...
To get things done I reimplemented a part of source code (see innerParse function in std/json/static.zig, the part with the .Struct case body). Since few internal non-public was needed i just copied it (eww).
Zig is on its early stages and standard library often changes - and that's why I partly don't like this PR because there might be lots or refactors (for example, standard token parser might change)
Feel free to judge it)