-
Notifications
You must be signed in to change notification settings - Fork 41
fix: make Block().* return correct ABI based ipld.ErrNotFound errors #156
Conversation
e5b3ccf
to
507bb8c
Compare
507bb8c
to
c15609a
Compare
abyfy_errors.go
Outdated
|
||
// This file handle parsing and returning the correct ABI based errors from error messages | ||
//lint:ignore ST1008 this function is not using the error as a mean to return failure but it massages it to return the correct type | ||
func abyfyIpldNotFound(msg string) (error, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems pretty complicated and brittle. Does every impl of this HTTP client have to do this? Should we consider changing the API to return structured errors, or do this server side, so clients don't have to resort to these contortions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some summarizing from the issue as discussed in https://discord.com/channels/806902334369824788/956547968385822730/957037148853395496.
IIUC from the interface tests previously we did have some weak contract on the CoreAPI that functions like Block().Get() and Block().Rm() also needed to return things like blockstore.ErrNotFound (or really that the error message would be something like blockstore: block not found ipfs/interface-go-ipfs-core@01ee941. Now we've broken the weak contract by changing the text and are figuring out the best way to restore it.
Does every impl of this HTTP client have to do this?
Not necessarily, it depends what they're trying to do. In this case to for an HTTP API to be compatible with interface-go-ipfs-core
's CoreAPI interfaces it needs to do some sort of error checking to figure out what type of error it received. Maybe we could drop that requirement, but idk that we have to.
Should we consider changing the API to return structured errors
You're right that it might be worth changing the HTTP API to have some contract on the types of errors it returns. I think for me the question boils down to the value tradeoffs here, i.e. the number of server and client implementations of this API that need to care about errors vs how much effort it is to return structured errors. Given the low number of servers and the low number of clients that care strongly about the error types it seems like we can probably not do this right now.
or do this server side, so clients don't have to resort to these contortions?
We can make some of this easier server side, although currently there are two implementers of this API (go-ipfs and js-ipfs) with most other IPFS implementations choosing to not implement this HTTP API at all. Some contortions will still be necessary because go-ipfs and js-ipfs don't have the same error messages at the moment (especially since we just changed them in go-ipfs with resolving ipfs/kubo#7074).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK thanks. For reference, here are some active community client impls (other than Go and JS):
- https://github.com/ipfs-shipyard/py-ipfs-http-client
- https://github.com/ferristseng/rust-ipfs-api
- https://github.com/vasild/cpp-ipfs-http-client
- https://github.com/Peergos/Peergos/blob/00e2a8d121473002fa9bdc1ca3f90ecf7f9d33e8/src/peergos/shared/storage/ContentAddressedStorage.java#L257
I don't know how these are used and whether they care about this or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 with @aschmahmann
Should we consider changing the API to return structured errors, or do this server side, so clients don't have to resort to these contortions?
I belive what surfaced from our discussion (see adin's discord link) is that we can do that later.
Right now it's fine because there is only one error type that needs that treatment. But I think that if we spread that pattern in the future I think we will use a list of error IDs (or automatically generated ones).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how these are used and whether they care about this or not.
My thought is that grep.app would've shown more code caring about the previous error text if code built around those clients cared. Since there doesn't seem to be much there it appears to be safe to change the error text and that others aren't relying on knowing the error types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, here are some active community client impls
I don't think it matters to them, as this is required if the http clients want to be compatible with the CoreAPI objects (extended weak contract, if you only use errors to show your users error messages you don't care about that for example).
Object which doesn't exists in other languages and even if it does (maybe JS ?) it doesn't know about ipld.ErrNotFound
.
abyfy_errors.go
Outdated
|
||
// This file handle parsing and returning the correct ABI based errors from error messages | ||
//lint:ignore ST1008 this function is not using the error as a mean to return failure but it massages it to return the correct type | ||
func abyfyIpldNotFound(msg string) (error, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some summarizing from the issue as discussed in https://discord.com/channels/806902334369824788/956547968385822730/957037148853395496.
IIUC from the interface tests previously we did have some weak contract on the CoreAPI that functions like Block().Get() and Block().Rm() also needed to return things like blockstore.ErrNotFound (or really that the error message would be something like blockstore: block not found ipfs/interface-go-ipfs-core@01ee941. Now we've broken the weak contract by changing the text and are figuring out the best way to restore it.
Does every impl of this HTTP client have to do this?
Not necessarily, it depends what they're trying to do. In this case to for an HTTP API to be compatible with interface-go-ipfs-core
's CoreAPI interfaces it needs to do some sort of error checking to figure out what type of error it received. Maybe we could drop that requirement, but idk that we have to.
Should we consider changing the API to return structured errors
You're right that it might be worth changing the HTTP API to have some contract on the types of errors it returns. I think for me the question boils down to the value tradeoffs here, i.e. the number of server and client implementations of this API that need to care about errors vs how much effort it is to return structured errors. Given the low number of servers and the low number of clients that care strongly about the error types it seems like we can probably not do this right now.
or do this server side, so clients don't have to resort to these contortions?
We can make some of this easier server side, although currently there are two implementers of this API (go-ipfs and js-ipfs) with most other IPFS implementations choosing to not implement this HTTP API at all. Some contortions will still be necessary because go-ipfs and js-ipfs don't have the same error messages at the moment (especially since we just changed them in go-ipfs with resolving ipfs/kubo#7074).
adc28ba
to
f3052c4
Compare
f3052c4
to
48b2867
Compare
errors.go
Outdated
// Assume that CIDs only contain a-zA-Z0-9 characters. | ||
// This is true because go-ipld-format use go-cid#Cid.String which use base{3{2,6},58}. | ||
postIndex = strings.IndexFunc(msgPostKey, notAsciiLetterOrDigits) | ||
if postIndex < 0 { | ||
postIndex = len(msgPostKey) | ||
} | ||
|
||
var err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the complexity brought by this? Can't we just assume the CID is separated by a space, instead? (if we have to assume anything at all)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a space seems incomplete, I would like to include "
too.
And at this point instead of searching a list of things I want to break on, I thought it was simpler to do a list of things I do not want to break on.
That is very static code that should never need an update (unless this parsing code is removed) and has a ~100% test coverage. So it's not really adding maintainment costs.
Edit: this code could be made simpler by removing the LUT, however not breaking on a-zA-Z0-9 seems usefull to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code could be made simpler by removing the LUT
Yes, that.
however not breaking on a-zA-Z0-9 seems usefull to me.
I'm not sure how this keeps something from not breaking, will need to take a closer look. In general hard-coding a basic alphanumeric detector along with its tests sounds like unneeded complexity added, if not for maintaining costs (though more code seems to always imply that) at least for my selfish reviewing costs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this keeps something from not breaking
I don't meant breaking code, I meant breaking as control flow, breaking out of the IndexFunc
loop.
Since you are two with gus raising the same point #156 (comment) I've replaced it by breaking on " \t\n\r\v\f;\"
" (whitespaces or semicolon (produced by multierr when concatenating) or "
for %q
and %#v
).
go.mod
Outdated
|
||
replace github.com/ipfs/go-ipld-format => github.com/Jorropo/go-ipld-format v0.3.2-0.20220330014726-942265d1aca7 | ||
|
||
replace github.com/ipfs/interface-go-ipfs-core => github.com/Jorropo/interface-go-ipfs-core v0.6.2-0.20220331215619-b98f8571cf6b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW from https://discord.com/channels/806902334369824788/956547968385822730/959649999988338762
The
go-ipfs
andgo-ipfs-http-cient
PRs are circularly dependent, how you want to handle that ?
The tests works on my machine when I use both branches together (unhelpfull as it is).Do I make temporary replaces and custom checkout that I'll remove before merging so you can see a green CI light ?
An other option is that sincego-ipfs
's dependency is only in a testing workflow, we can mergego-ipfs
with a custom checkout there, that unblocksgo-ipfs-http-client
merge it and then do a new PR that just cleanupgo-ipfs
's checkout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay that makes sense, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect CI runs to fully pass at this point, or is the current test failure expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect CI runs to fully pass at this point, or is the current test failure expected?
It is the one expected. And it works on my machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay SGTM, the go-ipfs PR looks like it's ready to go right? So can we go ahead and remove the replace directives then so we can merge this?
errors.go
Outdated
// Assume that CIDs only contain a-zA-Z0-9 characters. | ||
// This is true because go-ipld-format use go-cid#Cid.String which use base{3{2,6},58}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a brittle assumption to make, as there already exist multibase encodings (and additional proposals) outside of this range. Would prefer not to make assumptions about the CID encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've replaced by breaking on " \t\n\r\v\f;\"
" (whitespaces or semicolon (produced by multierr when concatenating) or "
for %q
and %#v
). Hopefully no CID we will see will include thoses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see so this is actually just a bug on the server side, because there doesn't seem to be a way to parse all valid CIDs from these error strings, am I understanding that correctly?
We should add some documentation about the fact that this only supports a subset of valid CIDs, and also open an issue about this bug.
Anyway, this seems like the least-awful approach. Thanks for taking the time to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see so this is actually just a bug on the server side, because there doesn't seem to be a way to parse all valid CIDs from these error strings, am I understanding that correctly?
Currently AFAIK, no multibase that we support use thoses codepoints so it's valid. If you use a CID that use thoses codepoints the client wouldn't know how to parse it anyway (without bumping it's go-multibase dep).
The "proper" fix, is encoding ipld.ErrNotFound.Cid
in a separate whatever field (LEB, json field, ...) so we can unambiguously know where they terminate and fetch them correctly, but that fixing an issue that doesn't exists yet since no CID we know how to parse use that (I would like to see a test failing before fixing it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra chars the encodings have outside 0-9A-Za-z
:
- base32pad:
=
- base32padupper:
=
- base32hexpadupper:
=
- base64pad:
=
,+
,/
- ex
f00003E00003F00
=>MAAA+AAA/AA==
- ex
- base64urlpad:
=
,-
,_
- ex
f00003E00003F00
=>UAAA-AAA_AA==
- ex
- base64:
+
,/
- ex
f00003E00003F00
=>mAAA+AAA/AA
- ex
- base64url:
-
,_
- ex
f00003E00003F00
=>uAAA-AAA_AA
- ex
If a new encoding is added, ideally it would just work (would be the case if the cid were properly delimited rather than guessing about delimiters, so we could just blindly pass it to go-multibase
). Second-best would be that an existing test would fail when we add support for the new encoding (doesn't seem to be the case here?). Least ideal is that this parsing code gets confused and parses the CID incorrectly (how it's currently implemented). I'm not sure what the implications are of that...annoying for users at the least, potentially insecure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra chars the encodings have outside
0-9A-Za-z
I meant " \t\n\r\v\f;\"
".
parsing code gets confused and parses the CID incorrectly
It parse incorrectly see that the CID is invalid (either header is truncated or multihash length doesn't match) and giveup and use a text error instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially insecure
I'll change it to return a blockstoreNotFoundMatchingIPLDErrNotFound
instead of a text error when it gives up (the difference is that ipld.IsNotFound(blockstoreNotFoundMatchingIPLDErrNotFound{})
return true
). However the CID will still not be recoverable (so if someone expects to reads the CID out of that they can't).
But no code currently do that (because in the past only text errors was there) and I don't plan anyone to write such code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we decoding CIDs in the first place? Is that part of the spirit of what we're testing here? It seems what we want to identify is the type of error returned (and sadly we need to do that by string manipulation), but we already know what CID we're requesting (it's our test) so we can anticipate the the CID string and just match against it.
48b2867
to
5635257
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy this no longer uses the alphanumeric detection scheme.
This release include the new ipld.ErrNotFound error support in the block API.
5635257
to
d7208ce
Compare
Suggested version: Changes in diff --git a/go.mod b/go.mod
index 3ef46c5..ce3ed95 100644
--- a/go.mod
+++ b/go.mod
@@ -5,11 +5,11 @@ require (
github.com/ipfs/go-cid v0.0.7
github.com/ipfs/go-ipfs-cmds v0.6.0
github.com/ipfs/go-ipfs-files v0.0.8
- github.com/ipfs/go-ipld-format v0.2.0
- github.com/ipfs/go-merkledag v0.4.0
+ github.com/ipfs/go-ipld-format v0.4.0
+ github.com/ipfs/go-merkledag v0.6.0
github.com/ipfs/go-path v0.1.1
github.com/ipfs/go-unixfs v0.2.5
- github.com/ipfs/interface-go-ipfs-core v0.5.2
+ github.com/ipfs/interface-go-ipfs-core v0.6.2
github.com/ipfs/iptb v1.4.0
github.com/ipfs/iptb-plugins v0.3.0
github.com/libp2p/go-libp2p-core v0.8.6
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
34cc489 looks good.
…rNotFound fix: make Block().* return correct ABI based ipld.ErrNotFound errors This commit was moved from ipfs/go-ipfs-http-client@fdbee7c
This fixes the tests for ipfs/kubo#8815
There are probably other APIs that needs the same treatment however we do not have failing tests for them yet.
Blocking PR and replace rules to be removed: