Fix #3783: Improve IsPinned() lookups for indirect pins #3809

Merged (1 commit) on Mar 23, 2017

@hsanjuan (Contributor) commented Mar 21, 2017

This avoids revisiting already-searched branches and cuts down
the IsPinned() check times considerably when recursive pins
share big underlying DAGs. It uses the same mechanism as merkledag.EnumerateChildren().
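
As a rough illustration of the mechanism, here is a minimal sketch on a toy DAG. The `dag` map, the `visitedSet` type, the `isIndirectlyPinned` helper and the fetch counter are assumptions made for this example only; the real code works on `*cid.Cid` values and resolves links through `mdag.LinkService`.

```go
package main

import "fmt"

// dag is a toy stand-in for a DAG service: node ID -> child IDs.
// (Hypothetical; not the real mdag.LinkService.)
type dag map[string][]string

// visitedSet mimics the behaviour the patch relies on:
// Visit returns true only the first time an ID is seen.
type visitedSet map[string]struct{}

func (s visitedSet) Visit(id string) bool {
	if _, seen := s[id]; seen {
		return false
	}
	s[id] = struct{}{}
	return true
}

// hasChild reports whether child is reachable from root. Branches for
// which visit returns false are skipped because they were already searched.
// fetches counts how many nodes had their links read.
func hasChild(d dag, root, child string, visit func(string) bool, fetches *int) bool {
	*fetches++
	for _, link := range d[root] {
		if link == child {
			return true
		}
		if visit(link) && hasChild(d, link, child, visit, fetches) {
			return true
		}
	}
	return false
}

// isIndirectlyPinned checks every recursive pin for the same target,
// sharing one visited set across all of them.
func isIndirectlyPinned(d dag, pins []string, target string) (bool, int) {
	visited := visitedSet{}
	fetches := 0
	for _, p := range pins {
		if hasChild(d, p, target, visited.Visit, &fetches) {
			return true, fetches
		}
	}
	return false, fetches
}

func main() {
	// Two recursive pins that share the same subtree.
	d := dag{
		"pin1":   {"shared"},
		"pin2":   {"shared"},
		"shared": {"x", "y"},
	}
	found, fetches := isIndirectlyPinned(d, []string{"pin1", "pin2"}, "missing")
	fmt.Println(found, fetches) // prints "false 5"
}
```

Without the shared set, each pin would walk the shared subtree again (8 link reads instead of 5 in this toy case); the saving grows with the size of the shared DAG.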

Here are some benchmarks with a randomly selected list of 10 of my pins (and bad pins):

This runs ipfs pin ls $pin for each of the listed pins and measures how long each call takes.

With 0.4.7:

Time for good pin QmSUuF4bnRBRAmaRzJcXHAu7z7tdD8sx1c5hU5kuvdPuDV: 0:00.55elapsed   0
Time for bad pin QmSUuF4bnRBRAmaRzJcXHAu7z7tdD8sx1c5hU5kuvdPuD1: 0:07.84elapsed   1
Time for good pin QmPcd2ypbzvh5ew6DBw9yCR8HBPWDJrLXVEw5DFntKWemk: 0:01.82elapsed   0
Time for bad pin QmPcd2ypbzvh5ew6DBw9yCR8HBPWDJrLXVEw5DFntKWem1: 0:07.19elapsed   1
Time for good pin QmYihqmU45xvKGBW1cGHCSF1Qxe5DTXzYqwFNEuBCvsR4Y: 0:00.23elapsed   0
Time for bad pin QmYihqmU45xvKGBW1cGHCSF1Qxe5DTXzYqwFNEuBCvsR41: 0:07.38elapsed   1
Time for good pin QmSP2Ef89DKFSrJi9T3b8bEeUA4fQSfR9AosKD7QEoXn3q: 0:00.31elapsed   0
Time for bad pin QmSP2Ef89DKFSrJi9T3b8bEeUA4fQSfR9AosKD7QEoXn31: 0:08.11elapsed   1
Time for good pin QmfJ2jZD4ULrbpJyoqxXnetckcwwHrGZMZfwdwKmfdZ4VE: 0:00.55elapsed   0
Time for bad pin QmfJ2jZD4ULrbpJyoqxXnetckcwwHrGZMZfwdwKmfdZ4V1: 0:08.26elapsed   1

With this patch:

Time for good pin QmSUuF4bnRBRAmaRzJcXHAu7z7tdD8sx1c5hU5kuvdPuDV: 0:00.69elapsed   0
Time for bad pin QmSUuF4bnRBRAmaRzJcXHAu7z7tdD8sx1c5hU5kuvdPuD1: 0:01.59elapsed   1
Time for good pin QmPcd2ypbzvh5ew6DBw9yCR8HBPWDJrLXVEw5DFntKWemk: 0:01.59elapsed   0
Time for bad pin QmPcd2ypbzvh5ew6DBw9yCR8HBPWDJrLXVEw5DFntKWem1: 0:01.80elapsed   1
Time for good pin QmYihqmU45xvKGBW1cGHCSF1Qxe5DTXzYqwFNEuBCvsR4Y: 0:00.31elapsed   0
Time for bad pin QmYihqmU45xvKGBW1cGHCSF1Qxe5DTXzYqwFNEuBCvsR41: 0:01.82elapsed   1
Time for good pin QmSP2Ef89DKFSrJi9T3b8bEeUA4fQSfR9AosKD7QEoXn3q: 0:00.40elapsed   0
Time for bad pin QmSP2Ef89DKFSrJi9T3b8bEeUA4fQSfR9AosKD7QEoXn31: 0:01.80elapsed   1
Time for good pin QmfJ2jZD4ULrbpJyoqxXnetckcwwHrGZMZfwdwKmfdZ4VE: 0:00.61elapsed   0
Time for bad pin QmfJ2jZD4ULrbpJyoqxXnetckcwwHrGZMZfwdwKmfdZ4V1: 0:01.78elapsed   1
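
To summarise the two listings, the sketch below averages the elapsed times copied from them. The grouping into "good" and "bad" slices and the `mean` helper are only for illustration. Averaged this way, the bad-pin lookups drop from about 7.76 s to about 1.76 s (roughly 4.4x), while the good-pin lookups stay at roughly 0.7 s.

```go
package main

import "fmt"

// Elapsed seconds copied from the two listings above.
var (
	badBefore  = []float64{7.84, 7.19, 7.38, 8.11, 8.26} // 0.4.7
	badAfter   = []float64{1.59, 1.80, 1.82, 1.80, 1.78} // with this patch
	goodBefore = []float64{0.55, 1.82, 0.23, 0.31, 0.55} // 0.4.7
	goodAfter  = []float64{0.69, 1.59, 0.31, 0.40, 0.61} // with this patch
)

// mean returns the arithmetic mean of xs (xs must be non-empty).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	fmt.Printf("bad pins:  %.2fs -> %.2fs (%.1fx faster)\n",
		mean(badBefore), mean(badAfter), mean(badBefore)/mean(badAfter))
	fmt.Printf("good pins: %.2fs -> %.2fs\n", mean(goodBefore), mean(goodAfter))
}
```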

@hsanjuan added the kind/enhancement label Mar 21, 2017
@hsanjuan self-assigned this Mar 21, 2017
@hsanjuan requested a review from whyrusleeping March 21, 2017 11:34
@hsanjuan added the status/in-progress and need/review labels and removed the status/in-progress label Mar 21, 2017
@whyrusleeping (Member) left a comment

One comment. It would be good to add a test that verifies my concern one way or the other.

  for _, rc := range p.recursePin.Keys() {
-     has, err := hasChild(p.dserv, rc, c)
+     has, err := hasChild(p.dserv, rc, c, visitedSet.Visit)
Member

I don't think we can reuse the same visitedSet across pin checks, unless we pass all the pins we're checking into the call at once. Otherwise we might mark a subtree as visited that contains the next pin we're checking.

Member

Oh wait a second, I misread. 'c' is the thing we're looking for, and it remains the same throughout the entire operation. This LGTM then.
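
The distinction discussed here can be shown with a small sketch on a toy DAG. The `dag` type and the `search` and `anyPinHas` helpers are hypothetical and only mirror the shape of the real code. Sharing the visited set is sound while the target stays the same, because a branch marked as visited has already been searched for that target; reusing a stale set for a different target would skip branches that were never searched for it.

```go
package main

import "fmt"

// dag is a toy DAG for illustration: node ID -> child IDs.
type dag map[string][]string

// search reports whether target is reachable from root, skipping any
// node already recorded in visited.
func search(d dag, root, target string, visited map[string]bool) bool {
	for _, link := range d[root] {
		if link == target {
			return true
		}
		if visited[link] {
			continue // branch was already searched
		}
		visited[link] = true
		if search(d, link, target, visited) {
			return true
		}
	}
	return false
}

// anyPinHas checks each pin in turn using the caller-supplied visited set.
func anyPinHas(d dag, pins []string, target string, visited map[string]bool) bool {
	for _, p := range pins {
		if search(d, p, target, visited) {
			return true
		}
	}
	return false
}

func main() {
	d := dag{"pinA": {"mid"}, "pinB": {"mid"}, "mid": {"leaf"}}
	pins := []string{"pinA", "pinB"}

	shared := map[string]bool{}
	// One target, one set shared across pins: correct result.
	fmt.Println(anyPinHas(d, pins, "absent", shared)) // false
	// Stale set reused for a different target: "mid" is skipped, "leaf" is missed.
	fmt.Println(anyPinHas(d, pins, "leaf", shared)) // false (wrong)
	// Fresh set for the new target: "leaf" is found.
	fmt.Println(anyPinHas(d, pins, "leaf", map[string]bool{})) // true
}
```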

Contributor Author

@whyrusleeping well, don't you need to revoke the "request changes" review or something? I have no idea if that is possible though.

Member

Hrm... I think it wants me to submit another review with 'changes approved'.

@hoenirvili (Contributor) left a comment

Thanks for submitting the patch. It looks very good, and any change is welcome. Other than the type thing that I've mentioned, this should be OK.

- func hasChild(ds mdag.LinkService, root *cid.Cid, child *cid.Cid) (bool, error) {
+ // hasChild recursively looks for a Cid among the children of a root Cid.
+ // The visit function can be used to shortcut already-visited branches.
+ func hasChild(ds mdag.LinkService, root *cid.Cid, child *cid.Cid, visit func(*cid.Cid) bool) (bool, error) {
Contributor

Why not make a type with the signature func(*cid.Cid) bool?

Contributor Author

Because it creates an extra indirection that does not make it any easier to figure out what this is doing than having the definition directly as a parameter. Moreover, this is the only parameter in the module that uses that signature, it is not hard to read, and it is called from a single place, which clearly shows that the expected usage is to pass the cid.Set.Visit function.
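
For comparison, the two options under discussion can be sketched side by side. Everything here is hypothetical (`visitFunc`, `idSet`, `countInline`, `countNamed`, and string IDs in place of `*cid.Cid`). In Go, a method value such as `set.Visit` is assignable to either form, so the choice affects readability rather than behaviour.

```go
package main

import "fmt"

// visitFunc is a hypothetical named type for the callback signature.
type visitFunc func(id string) bool

// idSet is a toy visited set; Visit returns true only on first sight of id.
type idSet map[string]struct{}

func (s idSet) Visit(id string) bool {
	if _, ok := s[id]; ok {
		return false
	}
	s[id] = struct{}{}
	return true
}

// countInline takes the callback with its signature spelled out inline.
func countInline(ids []string, visit func(string) bool) int {
	n := 0
	for _, id := range ids {
		if visit(id) {
			n++
		}
	}
	return n
}

// countNamed takes the same callback through the named type.
func countNamed(ids []string, visit visitFunc) int {
	return countInline(ids, visit)
}

func main() {
	ids := []string{"a", "b", "a"}
	fmt.Println(countInline(ids, idSet{}.Visit)) // 2
	fmt.Println(countNamed(ids, idSet{}.Visit))  // 2
}
```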

Contributor

At least change the hasChild func format to something like this.

func hasChild(
	ds mdag.LinkService,
	root *cid.Cid,
	child *cid.Cid,
	visit func(*cid.Cid) bool,
) (bool, error) {
	// body
}

This should be documented somewhere to enforce a maximum number of columns a func declaration can span.

Contributor

I do not believe this is done anywhere else in the IPFS code base.

Member

Yeah, in general I don't mind having the function declaration be long. Simply breaking at the most convenient place onto the next line is fine. Breaking it out into separate lines per parameter brings back unpleasant memories of really really old C code.

Contributor Author

I don't like going over 80 chars, but if the codebase style keeps declarations on a single line, I'll leave it as it is for the moment because I like consistency.

@hsanjuan
Contributor Author

I'll add an extra test with a shared tree

@kevina requested review from kevina and removed request for kevina March 21, 2017 22:03
@hsanjuan force-pushed the faster-pin-ls branch 2 times, most recently from 5632d9a to 41e3656 March 22, 2017 16:52
@hsanjuan
Contributor Author

@whyrusleeping @hoenirvili I have added an extra test.

pin/pin_test.go Outdated

// Create node B and add A3 as child
b, _ := randNode()
err = b.AddNodeLink("mychild", aNodes[3])
Contributor

Can you make all error checks one line, like if err := f(); err != nil { /* code */ }?
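
The two error-checking styles can be compared in a short sketch. `addLink` is a hypothetical stand-in for a call such as `b.AddNodeLink(...)`, and neither function is taken from the test file. The one-line form scopes `err` to the `if` statement, which avoids reusing a stale `err` variable further down.

```go
package main

import (
	"errors"
	"fmt"
)

// addLink is a hypothetical stand-in for a call that can fail.
func addLink(name string) error {
	if name == "" {
		return errors.New("empty link name")
	}
	return nil
}

// twoStep assigns the error first and checks it on the next line.
func twoStep(name string) error {
	err := addLink(name)
	if err != nil {
		return err
	}
	return nil
}

// oneLine declares and checks the error in the if statement itself,
// so err is not visible outside the if block.
func oneLine(name string) error {
	if err := addLink(name); err != nil {
		return err
	}
	return nil
}

func main() {
	fmt.Println(twoStep("mychild"), oneLine("mychild"))
	fmt.Println(twoStep(""), oneLine(""))
}
```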

Contributor Author

fixed

Member

@hsanjuan I think you changed the error checks in the wrong test, lol

pin/pin_test.go Outdated
assertPinned(t, p, bk, "B should be pinned")

// Unpin A5 recursively
err = p.Unpin(ctx, aKeys[5], true)
Member

check this error (and the one below)

Contributor Author

fixed!

@whyrusleeping
Member

@hsanjuan One more comment about changing the error checking in the wrong test

@hsanjuan
Contributor Author

@whyrusleeping sorry! I have fixed it. I honestly hope I haven't broken anything else.

This avoids revisiting already-searched branches and cuts down
the IsPinned() check times considerably when recursive pins
share big underlying DAGs.

A test has been added which double-checks that lookups for pinned and
unpinned items respond as expected with shared branches.

License: MIT
Signed-off-by: Hector Sanjuan <[email protected]>
@whyrusleeping merged commit 4e2e537 into master Mar 23, 2017
@whyrusleeping deleted the faster-pin-ls branch March 23, 2017 03:54