Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

there's a funky smell and it's coming from command output encoding #599

Closed
phritz opened this issue Jun 23, 2018 · 15 comments · Fixed by #3975
Closed

there's a funky smell and it's coming from command output encoding #599

phritz opened this issue Jun 23, 2018 · 15 comments · Fixed by #3975
Assignees
Labels
C-bug Category: This is a bug

Comments

@phritz
Copy link
Contributor

phritz commented Jun 23, 2018

OK I'm sorry / not sorry to have to hand this off as I go on vacation. As part of #587 which you can see in #598 I uncovered some funky smells in our output encoding. See in particular https://github.com/filecoin-project/go-filecoin/pull/598/files#diff-e77b7ad9d6713425351aae9703d0d36bR16

First, take note of #266 which is almost certainly related to these problems. The issues specifically are:

  1. Block.Miner doesn't make it through the nutty series of encodings in dag_daemon_test. The only reason this test passes is that the genesis block doesn't have a miner address set. Anyway, this call returns an error that is presently ignored, if you capture and assert no error on it you can see it: https://github.com/filecoin-project/go-filecoin/blob/master/commands/dag_daemon_test.go#L45

  2. This test also embodies the following pattern. It gets a block two ways and then checks that they are equal. It gets the block first via Block => json => Block: it calls chain ls which json encodes and outputs a block from memory. Then it dag gets the cid from that block which results in Block => IPLD node => CBOR => IPLD node => json => IPLD node => Block. Walking you through that: the block gets turned into an ipld node which is written as cbor in the block store (Block => IPLD node => CBOR) and then dag get CBOR decodes it into an IPLD node and json encodes it and outputs it (Block => IPLD node => CBOR => IPLD node => json). Then the test json decodes back to the ipld node (Block => IPLD node => CBOR => IPLD node => json => IPLD node) and then decodes the ipld node as a block (bringing us all the way to Block => IPLD node => CBOR => IPLD node => json => IPLD node => Block). There was an issue with this test when implemented Uint64 so I wanted to write a test in block_test that specifically checked whether the block survived this sequence of transformations. I wrote the test, but it fails with the same error as above. However! The dag_daemon_test which does the should-be-logically-identical transformation passes. So, that's mystery number 2, possibly related to mystery 1.

  3. I mentioned we json encode Uint64's to strings. We have to do this because of the way that dag_daemon_test's (old, outdated) cbor implementation works: https://github.com/ipfs/go-ipld-cbor/blob/master/node.go#L82. It uses json for a reason I don't understand, it was probably expedient. The ipld node contains bytes for Uint64's so unless we define custom json marshaling it will fail because without custom Uint64 marshaling it expects that Uint64's are marshaled as numbers (not bytes). It's a bummer to have to json encode Uint64's to a string (the LEB128 bytes are base64-encoded and js-quoted) because that makes the json output of Uint64's in commands opaque. Ick.

  4. I'd like to understand what we're shooting for output encoding wise. The sensible thing to me and i think(?) what we have been shooting for in general is to have exactly two types of output encoding for commands.

  • human-readable output. This is a json encoding of the model object (eg, Block), meant for consumption by humans, and should never be read by machines because of that
  • machine-readable output. This should be something close to if not identical to the storage format, so call it raw CBOR. (I'm not sure why we json-encode it? seems like that's a transport concern).

Is that right? If so we need a pass over the code to make sure it is what we're doing.

OK, whew, popping back up. I think the place to start is with #266, now that it is clear that things are not well in this area of the code. Note that everything that is not dag_daemon_test uses this fork of go-ipld-cbor: https://github.com/filecoin-project/go-ipld-cbor. Once dag_daemon_test has been moved to use QmRiRJhn427YVuufBEHofLreKWNw7P7BWNq86Sb9kzqdbd like the rest of the codebase then tackling 1 and 2 above makes sense. Then once they are fixed or understood it would be amazing to look at 3, whether we can json marshal to a number (sees like we should since the new version of go-ipld-cbor uses refmt and not some hacky series of json marshals).

cc @whyrusleeping @laser @aboodman

Edit: clarified 2 based on feedback from @travisperson in #598 (review)

@phritz phritz added the C-bug Category: This is a bug label Jun 23, 2018
@dignifiedquire
Copy link
Contributor

note, JSON does not support the full u64 range so we will have to use sth not number there always. This is not the case for cbor though.

@phritz
Copy link
Contributor Author

phritz commented Jun 26, 2018 via email

@dignifiedquire
Copy link
Contributor

dignifiedquire commented Jun 26, 2018 via email

@phritz
Copy link
Contributor Author

phritz commented Jun 26, 2018 via email

@phritz
Copy link
Contributor Author

phritz commented Jun 26, 2018 via email

@aboodman
Copy link
Contributor

I don't consider JSON output human-readable, and I think that is why in some cases we have a third output encoding, text.

Is it possible that:

a) we can ditch the json encoding entirely, or
b) make json encoding some entirely orthogonal tool that converts to/from our canonical binary encoding

@travisperson
Copy link
Contributor

When converting a Block to an IPLD Node, it does not encode the cids under stateRoot and parents correctly to json. The cids do show up in the cbor bytes, and are present in the ipldNode.

0001-test-Block-CID-IPLD-Encoding.patch.txt (running off of master, bc46155)

=== RUN   TestBlockMarshalling
=== RUN   TestBlockMarshalling/block_to_json_to_map
=== RUN   TestBlockMarshalling/block_to_ipld_to_json_to_map
=== RUN   TestBlockMarshalling/block_to_cbor_to_ipld_to_json_to_map
--- FAIL: TestBlockMarshalling (0.00s)
    --- PASS: TestBlockMarshalling/block_to_json_to_map (0.00s)
    --- FAIL: TestBlockMarshalling/block_to_ipld_to_json_to_map (0.00s)
    	block_test.go:36: 
    			Error Trace:	block_test.go:36
    			            				block_test.go:73
    			Error:      	Should be true
    			Test:       	TestBlockMarshalling/block_to_ipld_to_json_to_map
    			Messages:   	blockMap stateRoot has no link
    	block_test.go:48: 
    			Error Trace:	block_test.go:48
    			            				block_test.go:74
    			Error:      	Should be true
    			Test:       	TestBlockMarshalling/block_to_ipld_to_json_to_map
    			Messages:   	ipldMap first parent has no link
    --- FAIL: TestBlockMarshalling/block_to_cbor_to_ipld_to_json_to_map (0.00s)
    	block_test.go:36: 
    			Error Trace:	block_test.go:36
    			            				block_test.go:97
    			Error:      	Should be true
    			Test:       	TestBlockMarshalling/block_to_cbor_to_ipld_to_json_to_map
    			Messages:   	blockMap stateRoot has no link
    	block_test.go:48: 
    			Error Trace:	block_test.go:48
    			            				block_test.go:98
    			Error:      	Should be true
    			Test:       	TestBlockMarshalling/block_to_cbor_to_ipld_to_json_to_map
    			Messages:   	ipldMap first parent has no link
FAIL
exit status 1
FAIL	github.com/filecoin-project/go-filecoin/types	0.012s

A bit of digging it appears there is something weird happening when the ipld node is marshaled it self.

https://github.com/filecoin-project/go-ipld-cbor/blob/62a8360668af46a079fd448235a6597b9528b4d7/node.go#L450-L457

The out map from convertToJsonIsh has the cid values, but after the call to json.Marshal they are no longer there, but are replaced by an empty object.

The go-cid package (QmcZfnkapfECQGcLZaf9B79NRg7cRa9EnZh4LSbkCzwNvY) used by go-cbor-ipld (QmRiRJhn427YVuufBEHofLreKWNw7P7BWNq86Sb9kzqdbd), does not appear to be used when encoding the cid.

https://github.com/ipfs/go-cid/blob/master/cid.go#L413-L415

Does this all seam related? If the ipld node can't be encoded to json it makes sense that it can't be decoded. Which I believe might explain #288 not working with the new cbor package.

@travisperson
Copy link
Contributor

  1. Block.Miner doesn't make it through the nutty series of encodings in dag_daemon_test. The only reason this test passes is that the genesis block doesn't have a miner address set. Anyway, this call returns an error that is presently ignored, if you capture and assert no error on it you can see it:

The error is/was something along the lines of unmarshal error: cannot assign <f:0> to uint8 field.

This is because json.Marshal treats all numbers as float64 (per json spec). The issue with this is when doing a little dance of taking an Address (currently encoded to an array of 22 json numbers), and decode it back into an interface, the true type is []float64.

This can be demonstrated by the following playground code

https://play.golang.org/p/2LHLEj588Gc

@travisperson
Copy link
Contributor

With regard to:

When converting a Block to an IPLD Node, it does not encode the cids under stateRoot and parents correctly to json.
#599 (comment)

This really only has to do with going from an IPLD Node to JSON. (Probably will effect point 2)

This issue is due to a quirk of reflect in golang I believe. In go-cid, the MarshalJSON is only on the pointer receiver. When decoding cbor bytes into a structure, refmt assigns a CID value, not a pointer. This results in the ipld node not being able to marshal directly to json.

Example of quirk https://play.golang.org/p/sOeVJYiv84x

@travisperson
Copy link
Contributor

I believe this branch (https://github.com/filecoin-project/go-filecoin/tree/test/dag-daemon-test-cbor) will resolve #266. It makes Address types encode to their string value instead of a byte array. However, it still fails to pass tests due to the added assertion after the DecodeInto.

It fails with unmarshal error: cannot assign <f:1337> to uint64 field which is a result of JSON representing numbers as float64. Though that issue might be resolve by #587.

The big issue is that cbor.FromJSON is really a poor way to deal with things. All json numbers are float64 values. This means that the ipld node returned is full of float64 values, which can't be decoded into things.

@travisperson
Copy link
Contributor

I updated the test/dag-daemon-test-cbor branch to not use FromJSON at all by just going straight from the JSON output into the Block. I don't see a reason not too at the moment (unless it's desired for the test), and it makes the test pass.

phritz added a commit that referenced this issue Jul 11, 2018
Fixes #587

In which we convert the uint64's to a new type, Uint64 which we teach refmt to serialize in LEB128. Why you may ask does this PR marshal Uint64 to json as a string instead of a number? See #599
@phritz phritz self-assigned this Jul 11, 2018
@phritz
Copy link
Contributor Author

phritz commented Jul 11, 2018

@travisperson thanks for jumping in, really helpful. There are multiple overlapping issues here I think I'm close to wrapping my head around. However could you clarify for me what you mean by:

The big issue is that cbor.FromJSON is really a poor way to deal with things. All json numbers are float64 values. This means that the ipld node returned is full of float64 values, which can't be decoded into things.

Is there a fundamental reason that we couldn't try to make the float64's fit into the thing they're unmarshaled into eg in https://github.com/polydawn/refmt/blob/0ceb50a9198a7771378f4d712cfdf1b99121dfbb/obj/unmarshalBuiltins.go#L62, or is it just that that work hasn't been done yet, or is it something else?

@phritz
Copy link
Contributor Author

phritz commented Jul 11, 2018

OK, this boils down to at least the following set of issues, many of which were discovered by @travisperson above. I'll update this comment as I address them:

STATUS July'18: waiting for ipfs/go-ipld-cbor#29 to land which brings go-ipfs to our version of cbor.

@travisperson
Copy link
Contributor

Is there a fundamental reason that we couldn't try to make the float64's fit into the thing they're unmarshaled into ... is it just that that work hasn't been done yet, or is it something else?

Immediately I can't really think of one. It might be sane to attempt to check for the precision of that floating values in this step (https://github.com/polydawn/refmt/blob/0ceb50a9198a7771378f4d712cfdf1b99121dfbb/obj/unmarshalBuiltins.go#L80), and attempt to fit the value. I think it would definitely work, but not sure if it's a great idea for the library. Might be worth opening an issue and get the authors thoughts.

@anorth anorth added the A-api label Jul 22, 2019
@anorth
Copy link
Member

anorth commented Jul 23, 2019

I hope that this issue becomes redundant with well-defined encoding in #3098 and the CLI using that API directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants