JSON (de)serialization #49

librolibro · 2024-06-30T15:05:08Z

Closes #32

This one works without extra boilerplate code (some tests included) and I kinda like the solution, but still ...

To get things done I reimplemented a part of source code (see innerParse function in std/json/static.zig, the part with the .Struct case body). Since few internal non-public was needed i just copied it (eww).

Zig is on its early stages and standard library often changes - and that's why I partly don't like this PR because there might be lots or refactors (for example, standard token parser might change)

Feel free to judge it)

Arwalk · 2024-07-02T05:25:37Z

Thanks again for everything. I'll do my best to check it out ASAP.

Arwalk · 2024-07-12T07:53:07Z

Just saying that i haven't forgotten this, just trying to find some time to do it.

Arwalk · 2024-07-22T08:11:58Z

Started reviewing it. You're still not forgotten :)

Arwalk

This is a very good start for something great. There are a few comments already placed in the review here, but i'll add this:

How is OneOf case handled? Could you add a test (and code to make it work) ?
I'm not a huge fan of using std.json "leaky" code without it being explicit to the end user. I think we need to use non-leaky methods and return a Parsed(T), and at worst propose leaky json_decode on a separate interface (that would require ArenaAllocator instead of just Allocator).

Overall this is very good and my requests do not change the fact that i'm very thankful for your time and investment in this.

src/protobuf.zig

librolibro · 2024-07-22T16:01:23Z

Made some commits.

All unions are handled by std.json internally, see here and here - it just worked out of the box, made some tests to prove it.

About using "leaky" versions of std.json (de)serialization functions - the only reason I chose them is because I didn't get it to work with non-leaky alternatives) I just remember than some tests blamed about memory leaks (std.testing.allocator is able to catch leaks) and some code didn't even compile, don't remember why tbh. I can try to explore it again with a fresh look.

And thanks for feedback, even the late one)

Arwalk · 2024-07-22T21:30:14Z

This is great. I'll have some fun with it and might even add some suggestions before approving it, but overall i'm positive this will be merged soon. Thanks again for your work.

librolibro · 2024-07-24T13:57:39Z

LUL i just switched to a non-leaky function for JSON decoding and everything works :)

There are no memory leaks or compile errors that I've encountered before.

Arwalk · 2024-07-24T19:55:27Z

@librolibro I added a few commits directly, i'll let you look at them and tell me if they're ok for you. According to the json spec for protobuf bytes types are supposed to be base64 encoded.

We did not have a specific information for bytes, we had a shortcut to handle it just like strings. These last commits do not change the zig-facing interface of using ManagedStrings for bytes fields, but add the information needed in the _desc_table so that json encoding/decoding know when to do the the base64 soup.

I'll look tomorrow into the Map type, because i'm pretty sure we are not correct with the spec too (we handle it as a list of entries).

librolibro · 2024-07-25T03:04:22Z

Everything looks clean at the first glance, good job at that! (@embedfile usage was nice, didn't know about it :))

Another thing that comes to my mind is the ability to not add fields into JSON string if their value is equal to default - e.g. Python's implementation of google.protobuf.json_format.MessageToJson comes with including_default_value_fields argument which if False by default. While specs say that

A proto3 JSON implementation MAY provide the following options

and it's not a strict requirement it may be worth implementing this and since the message field iteration was already made in this code and not in std.json depths it shouldn't be that hard to do.

P.S. Here's even more things that might be worth implementing

Since this project uses ManagedString structs (instead of []const u8), I also need to add jsonParse/jsonStringify method for it

Previously it was made by expecting the type name to be "ArrayListAligned" which was not good since user-defined message could have the same name as built-in ArrayList() struct.

The problem is that std.meta.eql can't properly compare all types (e.g. if struct contains pointer it just compares pointer value itself - that's how it compares ArrayLists). With this "ultimate comparer" function tests looks more clean and less - although the function itself is not that simple and might some tests :) Maybe it'll be useful somewhere outside tests_json.zig (e.g. in other tests)

Arwalk · 2024-07-25T06:39:55Z

I took the liberty to rebase after #51 was merged, there was a small conflict.

Arwalk · 2024-07-25T07:01:09Z

PR workflow should pass now

james-callahan · 2024-07-31T21:48:48Z

I'm gonna check if there are ways to do that, but i'm not sure i have access to the slices in the json encoding api. In the end, it's not dramatic if this stays as is, if you're already working with json encoding/decoding you're probably not too worried about a pair of temporary allocations.

Edit: nevermind, found a way in 1c8ab07 . That require rewriting some of the code inside the json writer, we might have to make a request to make some methods public in the std.

In theory you don't need to: a base64 encoder should be able to work on (multiples of) 3-byte pieces.
So e.g. do:

  const str = value.getSlice();
  try jws.valueStart();
  try jws.stream.writeByte('\"');
  var temp_out: [5]u8;
  var i=0;
  while (i < str.len) : (i+=3) {
    const s = base64.standard.Encoder.encode(&temp_out, str[i..i+3]);
    try jws.stream.writeAll(s);  
  }
  try jws.stream.writeByte('\"');
  jws.valueDone();

Arwalk · 2024-07-31T22:19:05Z

I'm gonna check if there are ways to do that, but i'm not sure i have access to the slices in the json encoding api. In the end, it's not dramatic if this stays as is, if you're already working with json encoding/decoding you're probably not too worried about a pair of temporary allocations.
Edit: nevermind, found a way in 1c8ab07 . That require rewriting some of the code inside the json writer, we might have to make a request to make some methods public in the std.

In theory you don't need to: a base64 encoder should be able to work on (multiples of) 3-byte pieces. So e.g. do:
  const str = value.getSlice();
  try jws.valueStart();
  try jws.stream.writeByte('\"');
  var temp_out: [5]u8;
  var i=0;
  while (i < str.len) : (i+=3) {
    const s = base64.standard.Encoder.encode(&temp_out, str[i..i+3]);
    try jws.stream.writeAll(s);  
  }
  try jws.stream.writeByte('\"');
  jws.valueDone();

Could have worked indeed. I managed to make it in one pass, that should be good enough.

librolibro · 2024-08-01T14:30:34Z

repeated bytes, oneof=[NaN or +/-inf] tests fails as expected (NaN in arrays and oneof=.Bytes would probably fail as well), I'll make some changes to support these, for now there are failing tests to not cover this problem.

P.S. For every type we're handling by itself (except ManagedString and MessageMixins) we need to paste almost the same code in three places: single field handling, repeated field handling and also oneof handling.

P.P.S. Is there a way to disable workflow runs temporarily? I certainly know that tests will fail because I wrote them to fail :) Or maybe I shouldn't push until I won't make it good

Not single bytes/numeric fields supported now, but also repeated ones and oneofs

Arwalk · 2024-08-02T08:04:32Z

P.P.S. Is there a way to disable workflow runs temporarily? I certainly know that tests will fail because I wrote them to fail :) Or maybe I shouldn't push until I won't make it good

It's actually me that's starting them manually sometimes, just tell me when you feel they should run.

Looks like all types are being handled correctly whether it's single/repeated/oneof field (need more tests for sure)

librolibro · 2024-08-03T15:22:50Z

For now it looks like all (supported ATM) types are processed as expected. I think the current JSON code base is messy IMO - especially in case of function naming. I think it's better to refactor all JSON code according to your code style and not to Zig's one (snake_case for functions instead of camelCase etc.) to avoid more refactor for now. Maybe it's time to move JSON code to separate file because there's a lot of things copied from std.json + things implemented by you and me (400+ lines of code).

Also more tests ... For things like deserializing integer/float from JSON string (whether it's single, repeated of inside of oneof) etc.

P.S. Since it's your project it's for you to decide whether you need to refactor all of your code to completely follow Zig style guide or not. It's better when all parts of the code are homogeneous, how it's a bit messy)

librolibro · 2024-08-04T14:45:28Z

Please run another workflow for this PR. I want to check if tests will pass now of Windows (there were strange fails, it was probably about different line endings, but I'm not 100% sure)

Arwalk · 2024-08-04T20:05:51Z

Looks like it's not fixed yet. Don't bother too much, i'll take the time to look into it tuesday. Thank you for all your work already.

librolibro · 2024-08-05T13:45:19Z

Now .gitattributes work as expected (for some reason leading ./ was the problem and I missed it). If Windows build failures are about line endings it might fix the problem now :)

$ git check-attr --all -- tests/json_data/fixed_sizes/camelCase_1.json
tests/json_data/fixed_sizes/camelCase_1.json: text: set
tests/json_data/fixed_sizes/camelCase_1.json: eol: lf

librolibro · 2024-08-05T13:57:57Z

Last major thing wasn't made yet is Always emit fields without presence described here + tests for cases when not all fields are present in the input JSON string (I think there's MessageMixins.init call missed and these tests would fail).

For now I don't know how to vary parser's behavior (omit default fields or not, serialize using camelCase or snake_case etc.), there's no able to pass extra state into std.json inner functions and hence into our custom jsonParse/jsonStringify implementations. I have couple "possibly good" ideas how to make it, I would try these after default fields workaround, maybe in the weekend.

Arwalk · 2024-08-06T13:43:38Z

This will be good enough for a good "basic" json encoding/decoding. I'll create issues for everything mentionned in #49 (comment) So we can do bit by bit everything right.

Thanks a lot for your involvment !

Arwalk · 2024-08-06T13:51:50Z

@librolibro the following issues have been created, so PR can be done to target one subject at a time:

librolibro · 2024-08-06T14:03:34Z

I was on the way to push some commits :) Earlier I thought that Github PR's can be reopened with merge up-to-date master with fork's branch, and then do it again for next issue (does Github allows this?)... But separate issues/PRs might be a cleaner solution.

Arwalk · 2024-08-06T14:10:00Z

Woops sorry! I'd really prefer if we split it up, it will be cleaner to handle.

Arwalk requested changes Jul 22, 2024

View reviewed changes

src/protobuf.zig Show resolved Hide resolved

src/protobuf.zig Show resolved Hide resolved

src/protobuf.zig Outdated Show resolved Hide resolved

src/protobuf.zig Show resolved Hide resolved

src/protobuf.zig Show resolved Hide resolved

Arwalk self-assigned this Jul 24, 2024

librolibro and others added 16 commits July 25, 2024 08:38

Json (de)serialization (w/o string fields support) ...

dbb3ee8

Since this project uses ManagedString structs (instead of []const u8), I also need to add jsonParse/jsonStringify method for it

ManagedString support for json (de)serialization

26d57df

more tests

c96bd4a

More tests

e8e15d0

Less repetitions

d4b7f3b

Additional comments for jsonParse/jsonStringify

7f91bb5

List detection refactor ...

c6a5feb

Previously it was made by expecting the type name to be "ArrayListAligned" which was not good since user-defined message could have the same name as built-in ArrayList() struct.

Oneof tests

0432dc7

parseFromSliceLeaky -> parseFromSlice

afc9d81

added test for bytes

f4e172a

added bytes type for differenciation

0e16472

generator on par with the rest now

0242410

decode base64 done

6a8168d

encode base64 done

f908344

formatting

ad352b9

Arwalk force-pushed the json_encoding_and_decoding branch from bf06693 to ad352b9 Compare July 25, 2024 06:39

Arwalk added this to the v1.1.0 milestone Jul 25, 2024

librolibro added 5 commits July 31, 2024 23:11

camelCase conversion in comptime ...

5011f15

PR workflow should pass now

Test refactor

979b0ec

Test refactor

09f3a9e

Get rid of unnecessary imports

f90ab99

Some test TODOs to make

9e3f79e

Arwalk and others added 2 commits August 1, 2024 00:26

proper order of operation for base64 encoding.

914b574

WIP: Added tests that were expected to fail

3dd1227

librolibro added 3 commits August 1, 2024 21:46

Tests for oneof=.Bytes (fails as expected)

04f7a09

Tests for repeated NaNs/infs

5fa086d

fix(json): Proper bytes and numeric serialization ...

ee16034

Not single bytes/numeric fields supported now, but also repeated ones and oneofs

fix(json): Proper bytes and numeric serialization ...

1879100

Looks like all types are being handled correctly whether it's single/repeated/oneof field (need more tests for sure)

librolibro added 3 commits August 4, 2024 21:32

Test for integer/float fields (string->numeric conversion)

da6c772

Oops ... Forgot to commit these files

791a17b

.gitattributes added (probably will fix workflow on Windows)

bd21749

Now gitattributes really applies to JSON files

12d7936

Arwalk approved these changes Aug 6, 2024

View reviewed changes

Arwalk merged commit 9115a73 into Arwalk:master Aug 6, 2024
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JSON (de)serialization #49

JSON (de)serialization #49

librolibro commented Jun 30, 2024 •

edited

Loading

Arwalk commented Jul 2, 2024

Arwalk commented Jul 12, 2024

Arwalk commented Jul 22, 2024

Arwalk left a comment

librolibro commented Jul 22, 2024

Arwalk commented Jul 22, 2024 •

edited

Loading

librolibro commented Jul 24, 2024

Arwalk commented Jul 24, 2024

librolibro commented Jul 25, 2024 •

edited

Loading

Arwalk commented Jul 25, 2024

Arwalk commented Jul 25, 2024 •

edited

Loading

james-callahan commented Jul 31, 2024

Arwalk commented Jul 31, 2024

librolibro commented Aug 1, 2024 •

edited

Loading

Arwalk commented Aug 2, 2024

librolibro commented Aug 3, 2024 •

edited

Loading

librolibro commented Aug 4, 2024

Arwalk commented Aug 4, 2024

librolibro commented Aug 5, 2024

librolibro commented Aug 5, 2024

Arwalk commented Aug 6, 2024

Arwalk commented Aug 6, 2024

librolibro commented Aug 6, 2024

Arwalk commented Aug 6, 2024

JSON (de)serialization #49

JSON (de)serialization #49

Conversation

librolibro commented Jun 30, 2024 • edited Loading

Arwalk commented Jul 2, 2024

Arwalk commented Jul 12, 2024

Arwalk commented Jul 22, 2024

Arwalk left a comment

Choose a reason for hiding this comment

librolibro commented Jul 22, 2024

Arwalk commented Jul 22, 2024 • edited Loading

librolibro commented Jul 24, 2024

Arwalk commented Jul 24, 2024

librolibro commented Jul 25, 2024 • edited Loading

Arwalk commented Jul 25, 2024

Arwalk commented Jul 25, 2024 • edited Loading

Required TODO

Worth analyzing in separate issues (that will be created after the merge of this PR)

james-callahan commented Jul 31, 2024

Arwalk commented Jul 31, 2024

librolibro commented Aug 1, 2024 • edited Loading

Arwalk commented Aug 2, 2024

librolibro commented Aug 3, 2024 • edited Loading

librolibro commented Aug 4, 2024

Arwalk commented Aug 4, 2024

librolibro commented Aug 5, 2024

librolibro commented Aug 5, 2024

Arwalk commented Aug 6, 2024

Arwalk commented Aug 6, 2024

librolibro commented Aug 6, 2024

Arwalk commented Aug 6, 2024

librolibro commented Jun 30, 2024 •

edited

Loading

Arwalk commented Jul 22, 2024 •

edited

Loading

librolibro commented Jul 25, 2024 •

edited

Loading

Arwalk commented Jul 25, 2024 •

edited

Loading

librolibro commented Aug 1, 2024 •

edited

Loading

librolibro commented Aug 3, 2024 •

edited

Loading