Skip to content
This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

fix: give one minute timeouts to function calls instead of block retrievals #44

Merged
merged 3 commits into from
Sep 27, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 12 additions & 4 deletions resolver/resolver.go
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,10 @@ func (r *Resolver) ResolveToLastNode(ctx context.Context, fpath path.Path) (cid.
// create a selector to traverse and match all path segments
pathSelector := pathAllSelector(p[:len(p)-1])

// create a new cancellable session
ctx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()

Comment on lines +75 to +78
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, because the method's return is (cid.Cid, []string, error) -- there's no Node which might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it says "i'm cancelled", so it's good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems to be the one that's used by go-ipfs and caused the reported error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps for all the other ones aside from this we just remove the timeout.

@Stebalien you were thinking that leaving these timeouts in might make debugging changes here easier (#36 (comment)). Any objection to leaving this one in and removing the rest?

The change here means we have 1min to resolve the function instead of 1min per block, but that should be way more than enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember why that had anything to do with tracking down bugs... (might have been WRT keeping changes minimal?).

I'm generally against timeouts in places like this where there's no reasonable "we should have received a response by X time" so I'd rather remove all of the hard-coded timeouts like this eventually.

// resolve node before last path segment
nodes, lastCid, depth, err := r.resolveNodes(ctx, c, pathSelector)
if err != nil {
Expand Down Expand Up @@ -132,6 +136,10 @@ func (r *Resolver) ResolvePath(ctx context.Context, fpath path.Path) (ipld.Node,
// create a selector to traverse all path segments but only match the last
pathSelector := pathLeafSelector(p)

// create a new cancellable session
ctx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably still going to exhibit roughly the same behaviour as before, because it's still cancelling by the time the method returns, and the method's return is (ipld.Node, ipld.Link, error) -- emphasis on that it returns a Node, and so that might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it immediately says "i'm cancelled".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There don't seem to be too many users of this. There's go-ipfs' https://github.com/ipfs/go-ipfs/blob/92854db7aed4424fad117ceb4e13f64a80ff348b/fuse/readonly/readonly_unix.go#L70

From the way it's used there it doesn't seem to be expecting to return a HAMT. @hannahhoward can you confirm that a HAMT pointing to a HAMT won't cause issues with the above code?

The function also seems to be used https://github.com/filecoin-project/lotus/blob/4fc78bf72037ada0cb3edd0f551172428ddbc9e8/node/impl/full/chain.go#L593, which exhibits the concerns you were indicating. It doesn't deal with UnixFS HAMTs, but ... that's not a great feeling. Should be fine if we drop the timeouts per #44 (comment) though.

nodes, c, _, err := r.resolveNodes(ctx, c, pathSelector)
if err != nil {
return nil, nil, err
Expand Down Expand Up @@ -172,6 +180,10 @@ func (r *Resolver) ResolvePathComponents(ctx context.Context, fpath path.Path) (
// create a selector to traverse and match all path segments
pathSelector := pathAllSelector(p)

// create a new cancellable session
ctx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably still going to exhibit roughly the same behaviour as before, because it's still cancelling by the time the method returns, and the method's return is ([]ipld.Node, error) -- emphasis on that it returns a Node, and so that might be a HAMT or Unixfs node or other ADL that people might ask things of later and be surprised if it immediately says "i'm cancelled".

Copy link
Contributor Author

@aschmahmann aschmahmann Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if ResolvePathComponents or ResolveLinks are used anywhere. https://grep.app/ doesn't show any uses. We can probably drop the timeouts here (as mentioned in #44 (comment))

nodes, _, _, err := r.resolveNodes(ctx, c, pathSelector)
if err != nil {
evt.Append(logging.LoggableMap{"error": err.Error()})
Expand Down Expand Up @@ -218,10 +230,6 @@ func (r *Resolver) ResolveLinks(ctx context.Context, ndd ipld.Node, names []stri
// Finds nodes matching the selector starting with a cid. Returns the matched nodes, the cid of the block containing
// the last node, and the depth of the last node within its block (root is depth 0).
func (r *Resolver) resolveNodes(ctx context.Context, c cid.Cid, sel ipld.Node) ([]ipld.Node, cid.Cid, int, error) {
// create a new cancellable session
ctx, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()

session := r.FetcherFactory.NewSession(ctx)

// traverse selector
Expand Down