Skip to content
This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

fix: give one minute timeouts to function calls instead of block retrievals #44

Merged
merged 3 commits into from
Sep 27, 2021

Conversation

aschmahmann
Copy link
Contributor

This allows us to get around issues with shared IPLD ADL contexts such as ipfs/kubo#8420

…ievals to get around issues with shared IPLD ADL contexts
Copy link
Member

@warpfork warpfork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually grok how these functions fit together, or who the caller of the exported functions here is, or what their expectations are, so my review should be considered low-value.

However, it does look to me like -- depending on what callers actually expect from these methods -- in some of the cases, the current diff might not be hoisting the context initialization high enough up the call stack.

(Maybe it'll work fine, in some cases, or even in "all" cases considering how this is used "in practice". I don't know. If so, it still probably suggests a weirdly close coupling of this library to a specific single application that uses it. I'm not gonna try to police that or change that today; just sayin'.)

In general I wonder if this defer cancel() behavior is a useful idea. Would anything bad happen if we just dropped it? And I continue to doubt that timeouts being introduced by a library at all are a sensible idea, even under the alluring guise of "defense in depth".

Comment on lines +75 to +78
// create a new cancellable session
ctx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, because the method's return is (cid.Cid, []string, error) -- there's no Node which might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it says "i'm cancelled", so it's good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems to be the one that's used by go-ipfs and caused the reported error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps for all the other ones aside from this we just remove the timeout.

@Stebalien you were thinking that leaving these timeouts in might make debugging changes here easier (#36 (comment)). Any objection to leaving this one in and removing the rest?

The change here means we have 1min to resolve the function instead of 1min per block, but that should be way more than enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember why that had anything to do with tracking down bugs... (might have been WRT keeping changes minimal?).

I'm generally against timeouts in places like this where there's no reasonable "we should have received a response by X time" so I'd rather remove all of the hard-coded timeouts like this eventually.

Comment on lines 139 to 142
// create a new cancellable session
ctx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably still going to exhibit roughly the same behaviour as before, because it's still cancelling by the time the method returns, and the method's return is (ipld.Node, ipld.Link, error) -- emphasis on that it returns a Node, and so that might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it immediately says "i'm cancelled".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There don't seem to be too many users of this. There's go-ipfs' https://github.com/ipfs/go-ipfs/blob/92854db7aed4424fad117ceb4e13f64a80ff348b/fuse/readonly/readonly_unix.go#L70

From the way it's used there it doesn't seem to be expecting to return a HAMT. @hannahhoward can you confirm that a HAMT pointing to a HAMT won't cause issues with the above code?

The function also seems to be used https://github.com/filecoin-project/lotus/blob/4fc78bf72037ada0cb3edd0f551172428ddbc9e8/node/impl/full/chain.go#L593, which exhibits the concerns you were indicating. It doesn't deal with UnixFS HAMTs, but ... that's not a great feeling. Should be fine if we drop the timeouts per #44 (comment) though.

Comment on lines 183 to 186
// create a new cancellable session
ctx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably still going to exhibit roughly the same behaviour as before, because it's still cancelling by the time the method returns, and the method's return is ([]ipld.Node, error) -- emphasis on that it returns a Node, and so that might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it immediately says "i'm cancelled".

Copy link
Contributor Author

@aschmahmann aschmahmann Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if ResolvePathComponents or ResolveLinks are used anywhere. https://grep.app/ doesn't show any uses. We can probably drop the timeouts here (as mentioned in #44 (comment))

Comment on lines +125 to +127
//
// Note: if/when the context is cancelled or expires then if a multi-block ADL node is returned then it may not be
// possible to load certain values.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so I've removed the other 3 timeouts but left this one in as it's at least plausible.

I'm still concerned that the other functions all return nodes where cancelling the context prevents the nodes from being traversable. However, the caller is able to control if/when this happens so it's less of a problem. I added a comment calling out the difficulty.

cc @warpfork any better ideas?

Copy link
Contributor

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM by me. I definitely feel this is a really weird place to have timeouts in general.

@aschmahmann aschmahmann merged commit 7031c42 into master Sep 27, 2021
@aschmahmann aschmahmann deleted the fix/adl-context-cancel branch September 27, 2021 16:56
@aschmahmann aschmahmann mentioned this pull request Sep 30, 2021
62 tasks
Jorropo pushed a commit to ipfs/go-libipfs-rapide that referenced this pull request Mar 23, 2023
fix: give one minute timeouts to function calls instead of block retrievals

This commit was moved from ipfs/go-path@7031c42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants