Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider alternative to asCID signaling mechanism #213

Closed
Gozala opened this issue Oct 16, 2022 · 7 comments · Fixed by #217
Closed

Consider alternative to asCID signaling mechanism #213

Gozala opened this issue Oct 16, 2022 · 7 comments · Fixed by #217
Labels

Comments

@Gozala
Copy link
Contributor

Gozala commented Oct 16, 2022

As per #212 making asCID enumerable creates a problem, it also seems to be motivating factor for some of the work in #211.

For the background me and @mikeal have chosen cid.asCID === cid as a signaling mechanism that value is CID with following properties:

  1. Signal is retained when transferring value across the JS threads / realms because structural cloning respects self referentiality.
  2. It has very low collision chance - It is very unlikely that codec will be passed a data where cid.asCID === cid is true, while user was meant to pass something unrelated.

At the time we also considered following alternative cid['/'] === cid.bytes (which was inspired by DAG-JSON spec, but ultimately chose cid.asCID === cid because it seemed like it had even less collision opportunity.

Maybe given recent problems it is worth considering switch to cid['/'] === cid.bytes which would address those problems

@achingbrain
Copy link
Member

Pulling the below out of #214 (comment) as it's relevant here:

After looking at dag-json spec I'm realizing that / supposed to point to CID string not bytes so maybe this is not ideal. However general point still is:

  1. We need an own (signalling) property with a very low collision probability.
  2. This property should be retained across JS realms (implying it can not be symbol or getter)

So basically cid.wellKnownPropertyName === cid.bytes could check both boxes above and harder it is to type property name less likely it is to collide.

@achingbrain
Copy link
Member

achingbrain commented Oct 18, 2022

Does wellKnownPropertyName have to reference bytes? If the property name is '/', can the value be a string instead to be consistent with dag-json?

If consistency with dag-json doesn't add any value, and wellKnownPropertyName has to have a low probability of collision, why not call it something really unidiomatic like ___cid_bytes___?

That said, I've experimented with cid['/'] === cid.bytes locally and it does seem to solve the problem, so I'm happy with that if other people are.

achingbrain added a commit that referenced this issue Oct 18, 2022
As per #212 making
`asCID` enumerable breaks tests where modules don't handle self-referential
data properly.

As proposed in #213
this swaps `cid.CID === cid` for `cid['/'] === cid.bytes` as a mechanism
to tell consumers that the object in question is a `CID` which lets
them write CBOR with the correct tags, for example.

Fixes #212
Closes #213
@achingbrain
Copy link
Member

I've put my cid['/'] === cid.bytes experiment up in #217

achingbrain added a commit that referenced this issue Oct 18, 2022
As per #212 making
`asCID` enumerable breaks tests where modules don't handle self-referential
data properly.

As proposed in #213
this swaps `cid.CID === cid` for `cid['/'] === cid.bytes` as a mechanism
to tell consumers that the object in question is a `CID` which lets
them write CBOR with the correct tags, for example.

Fixes #212
Closes #213
@achingbrain
Copy link
Member

@Gozala do you think you'll have time to look at this? It's blocking updating multiformats in @ipld/dag-cbor, @ipld/dag-json and @ipld/car, which is in turn blocking ipfs-unixfs, ipfs-repo and ipfs itself and I'd really like to get a new version shipped in time for IPFS Camp.

@Gozala
Copy link
Contributor Author

Gozala commented Oct 19, 2022

@Gozala do you think you'll have time to look at this? It's blocking updating multiformats in @ipld/dag-cbor, @ipld/dag-json and @ipld/car, which is in turn blocking ipfs-unixfs, ipfs-repo and ipfs itself and I'd really like to get a new version shipped in time for IPFS Camp.

I have reviewed #217 and provided my feedback. I'd love all those things to upgrade to new multiformats, yet I wonder if there is a way we can unblock that without having to rush this.

I have tried to capture some of the problems I see with the way CID / Link interface is defined and provided sketch of the alternative that I think would work better. It also seems more aligned with discussions that took place elsewhere. I would love to at least get your and @rvagg input on that.

It also may be worth considering name different from ['/'] because it used to be that JS engines were not as well optimized for such property access and that again may have unfortunate performance implications.

If consistency with dag-json doesn't add any value, and wellKnownPropertyName has to have a low probability of collision, why not call it something really unidiomatic like cid_bytes?

I think that is reasonable as well, although I do think that representing CIDs simply as boxed Uint8Array's e.g. { CID: Uint8Array } may have low enough collision probability and does seem to make more sense than having two fields pointing to same Uint8Array.

Finally I also have been thinking of a way to avoid having to box Uint8Arrays as per #214 (comment) and if such solution is found I think that would be the best of all.

@achingbrain
Copy link
Member

It also may be worth considering name different from ['/'] because it used to be that JS engines were not as well optimized for such property access and that again may have unfortunate performance implications.

I ran a quick test, just accessing / and __cid on an object in a loop 10e9 times and could see no difference, so good news there.

#214 sounds good for further discussion.

github-actions bot pushed a commit that referenced this issue Oct 19, 2022
## [10.0.2](v10.0.1...v10.0.2) (2022-10-19)

### Bug Fixes

* use slash as flag that an object is a CID ([#217](#217)) ([1cec619](1cec619)), closes [#212](#212) [#213](#213)

### Trivial Changes

* **no-release:** rename varint test file so it is run ([#209](#209)) ([e32fe47](e32fe47))
* remove unnecessary dev deps ([#218](#218)) ([a43ffff](a43ffff))
@github-actions
Copy link

🎉 This issue has been resolved in version 10.0.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants