Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report progress during pin add #3671

Merged
merged 1 commit into from
Mar 2, 2017
Merged

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Feb 10, 2017

Closes #3624

License: MIT
Signed-off-by: Kevin Atkinson [email protected]

@kevina kevina added the status/in-progress In progress label Feb 10, 2017
@kevina
Copy link
Contributor Author

kevina commented Feb 10, 2017

@whyrusleeping let me know if this is along the lines of what you are thinking.

Also, how would we test this output, you said to output something every x time. But that will be hard to test as the amount of time an operation takes is anything but deterministic.

@whyrusleeping
Copy link
Member

Looking pretty good so far 👍 A couple important things to keep in mind:

  • if the progress flag isnt passed, we should keep the same exact behaviour we have now. This means not returning a channel in the default (non-progress) case.
  • The progress tracking stuff should be very easily testable on its own in the merkledag package, just generate some graphs and call enumerateChildren on those graphs with the progress tracker and verify the output looks correct.
  • I'm thinking we can be a little more granular than just incrementing by sets of links. Maybe we keep track of two numbers: total nodes processed, and the total number of nodes we think there are (this second number will obviously grow as we go down the graph, but thats fine)

@kevina
Copy link
Contributor Author

kevina commented Feb 10, 2017

if the progress flag isnt passed, we should keep the same exact behaviour we have now. This means not returning a channel in the default (non-progress) case.

I would think changing the type of the output based on flag would be a bad idea.

I'm thinking we can be a little more granular than just incrementing by sets of links. Maybe we keep track of two numbers: total nodes processed, and the total number of nodes we think there are (this second number will obviously grow as we go down the graph, but thats fine)

Okay. To avoid performance problems was trying to avoid updating the number after each call to visit. However, if don't think that will be a problem (I am mainly concerned about the locking overhead) I can update the number one node at a time.

I will add a second number for links so far, but nor sure how useful it will be.

@whyrusleeping
Copy link
Member

I would think changing the type of the output based on flag would be a bad idea.

The type remains the same, it just enables a streaming mode for progress output.

To avoid performance problems...

The cost of fetching and unmarshaling each node is going to vastly exceed the tiny cost of the mutex here (its not even a R/W Lock). If it ends up being an issue we could even switch to atomic integer operations

@kevina
Copy link
Contributor Author

kevina commented Feb 10, 2017

@whyrusleeping

The type remains the same, it just enables a streaming mode for progress output.

The type of the object return will change from a *AddPinOutput to a chan <-interface{} to a depending on if --progress was specified. Is that okay?

To avoid performance problems...

Okay.

@whyrusleeping
Copy link
Member

Yeah, changing between a chan and the concrete type there is fine. The type field in the command object will still stay the same

@kevina
Copy link
Contributor Author

kevina commented Feb 11, 2017

@whyrusleeping okay, I increased the granularity and avoided using a channel when --progress is not specified. The type had to change from PinOutput to AddPinOutput (not sure how to avoid it), but the API should in theory have stayed the same if I understand how it work as the Progress field should be admitted if it is "empty" i.e. 0.

One question, should we continue to use the Marshaler to report progress or switch to PostRun?

@whyrusleeping
Copy link
Member

@kevina Does it work correctly implemented in the marshaler?

@kevina
Copy link
Contributor Author

kevina commented Feb 12, 2017

@kevina Does it work correctly implemented in the marshaler?

@whyrusleeping I'm sorry I couldn't parse that sentence.

Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test for the progress tracker in merkledag_test.go with a few different structures. One running on a single node with no children, one running on a fairly large graph, and maybe one running on a linked list of nodes.

Also change the timer to something more like 500ms

@kevina
Copy link
Contributor Author

kevina commented Feb 12, 2017

@whyrusleeping

Add a test for the progress tracker in merkledag_test.go with a few different structures. One running on a single node with no children, one running on a fairly large graph, and maybe one running on a linked list of nodes.

I was planning to, but got stuck on the best way to create some random structures. Any hints on how to do thie?

Also change the timer to something more like 500ms

How about one second? The current time was only for manually testing.

@whyrusleeping
Copy link
Member

@kevina something like this should work: https://ipfs.io/ipfs/QmRsQHn9yV1VgGU3fEkBtpnwjVgKkrRXFKGS2cRoJMyG4m

@kevina
Copy link
Contributor Author

kevina commented Feb 12, 2017

Okay thanks that helped. I will try to write the tests, and finish this P.R, later today or tomorrow.

@kevina kevina force-pushed the kevina/pin-add-progress branch from 3f02784 to 995c4ec Compare February 13, 2017 04:02
@kevina kevina changed the title WIP: Report progress during pin add Report progress during pin add Feb 13, 2017
@kevina kevina added need/review Needs a review and removed status/in-progress In progress labels Feb 13, 2017
@kevina
Copy link
Contributor Author

kevina commented Feb 13, 2017

Okay, assuming tests pass I consider this done.

Copy link
Member

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, mostly looks good. This will be great to have


r, ok := res.Output().(*AddPinOutput)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use a type switch here:

switch out := res.Output().(type) {
case *AddPinOutput:
case <chan interface{}:
default:
}

return
}

res.SetOutput(&PinOutput{added})
v := new(dag.ProgressTracker)
ctx := context.WithValue(req.Context(), "progress", v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a helper function on the progress tracker to do this for us nicely:

v := new(ProgressTracker)
ctx := v.DeriveContext(req.Context())

if r.Progress == -1 || r.Pins != nil {
added = r.Pins
} else {
if progressLine {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the if progressLine stuff for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use \r to replace write over the current line with a new progress line, but I only want to do that if the last line to stderr was a progress line

@@ -377,18 +377,39 @@ func EnumerateChildren(ctx context.Context, ds LinkService, root *cid.Cid, visit
} else if err != nil {
return err
}
v, _ := ctx.Value("progress").(*ProgressTracker)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the context does not have a progress tracker in it, I think this may panic (type asserting a nil value?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, cool. Makes sense now that i think about it

go func() {
defer close(out)
for {
timer := time.NewTimer(500 * time.Millisecond)
Copy link
Contributor

@hsanjuan hsanjuan Feb 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using a time.Ticker? Also, you are recreating timers and only the last one will stop so you are accumulating resources without free-ing them... you could use Timer.Reset() at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timers should be GCed unless I am missing something. Timer.Stop() is always called.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hsanjuan Good catch.

We should be making this timer outside of the for loop.
The way things are, we are making a new timer every loop, and they all tick every 500ms, and then they don't get closed until the goroutine returns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timers should be GCed
You're right, it's actually tickers not timers, but it's still generating garbage which needs to be GCed when it could easily not do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't think it makes any difference. Nevertheless I just pushed a commit to move the creation of the timer to outside the loop at the expense of slightly more code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes a huge difference.

Every call to NewTimer creates an entry in the runtimes timer heap: https://golang.org/src/time/sleep.go?s=2220:2252#L84
https://golang.org/src/runtime/time.go#L94

They build up and generate events until you close them. Do not make this mistake, it causes weird side effects

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @whyrusleeping,

I did make a mistake, but the mistake I think you were pointing out:

They build up and generate events until you close them. Do not make this mistake, it causes weird side effects

And calling stopTimer which I thought I did should removes them from the heap, https://golang.org/src/runtime/time.go#L71.

This mistake I made was assuming that the defer in a loop would fire before the next iteration. This is how a similar construct would of worked in C++. It seams I was wrong and go takes a more simple minded approach. The defer does not fire until the end of the function as demonstrated in this code (https://play.golang.org/p/zKTz1Kc0K_)

func main() {
	for i := 0; i < 10; i++ { 
		defer fmt.Println("ByBy")
		println("Hello")
	}
}

I personally think go is wrong here, but that is not something I wish to have an extended discussion on. I will just know to keep in mind how defer works in go and try not to make similar assumptions based on how C++ works, as go is obviously not C++.

res.SetOutput((<-chan interface{})(out))
go func() {
duration := 500 * time.Millisecond
timer := time.NewTimer(duration)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably makes sense to use time.Ticker here so you don't have to reset it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping Okay done. Please double check that the ticker will be correctly garbage collected.

@whyrusleeping whyrusleeping requested a review from Kubuxu February 14, 2017 06:08
@kevina kevina force-pushed the kevina/pin-add-progress branch from d045e89 to 582dcb6 Compare February 14, 2017 06:28
@whyrusleeping
Copy link
Member

This LGTM, Just waiting on a 👍 from @Kubuxu now

@Kubuxu
Copy link
Member

Kubuxu commented Feb 14, 2017

I am afraid this will break API implementations, someone might not be expecting that he will get objects with Progress instead of Pins field set.

This is why it is important to spec those API as if we stated that you should filter the NDJSON stream for value you need it would be OK, but right now, anything can happen.


I will do proper CR tomorrow.

@Kubuxu
Copy link
Member

Kubuxu commented Feb 14, 2017

Also I am not happy with testing of this code, I know it can be hard but it should be possible.

https://codecov.io/gh/ipfs/go-ipfs/compare/2e116b40eb7ea41564c3371b0b3552c9791b59d1...582dcb64e37b014592677683c6d6fc1209d34159/diff

@kevina
Copy link
Contributor Author

kevina commented Feb 15, 2017

@Kubuxu

Also I am not happy with testing of this code

Is that also showing coverage from the sharness tests? I added some test which should at least cover most of the new code and some of that might not have been covered before.

@Kubuxu
Copy link
Member

Kubuxu commented Feb 15, 2017

Ahh, it wasn't rebased to master. Master has sharness test coverage collection.

Could you rebase it?

@kevina kevina force-pushed the kevina/pin-add-progress branch from 582dcb6 to e530d19 Compare February 15, 2017 01:02
@kevina
Copy link
Contributor Author

kevina commented Feb 15, 2017

Could you rebase it?

Done.

@kevina
Copy link
Contributor Author

kevina commented Feb 15, 2017

I am afraid this will break API implementations, someone might not be expecting that he will get objects with Progress instead of Pins field set.

If I did things correctly. That will only happen if --progress is used. If not the API should in theory be the same, although I have not tested that because I not sure how.

@kevina
Copy link
Contributor Author

kevina commented Feb 15, 2017

@Kubuxu the coverage tests are green now.

Copy link
Member

@Kubuxu Kubuxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs rebase but LGTM, sorry for being late.

@kevina
Copy link
Contributor Author

kevina commented Feb 16, 2017

Actually once it rebased the progress indicator will no longer work due to the fact that FetchGraph now calls EnumerateChildrenAsync. I will fix that by moving the progress reporting into the Visit callback and also make sure to always show the progress indicator to make it easier to test for.

Expect something in the next few hours, end of day at the latest.

@Kubuxu Kubuxu added this to the Ipfs 0.4.7 milestone Feb 16, 2017
@kevina kevina force-pushed the kevina/pin-add-progress branch from e530d19 to e3fb71e Compare February 16, 2017 20:21
@kevina
Copy link
Contributor Author

kevina commented Feb 16, 2017

Okay I rebased and made some changes. You might want to look at the last two commits. Note the earlier commits are technically non-functional due to the change of FetchGraph now calling EnumerateChildrenASync instead of EnumerateChildren. I can rebase/squash my commits if you think it will be best.

@whyrusleeping
Copy link
Member

@kevina Yeah, squashing the broken commits would be good. We try hard to make sure every commit builds and works.

@Kubuxu re: breaking api concerns, The default value here is false, so unless the user (or api client) explicitly passes --progress=true, then nothing changes for them.

@Kubuxu
Copy link
Member

Kubuxu commented Feb 17, 2017

Yeah, didn't notice it at first. So LGTM

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
@kevina kevina force-pushed the kevina/pin-add-progress branch from e3fb71e to cdd29c2 Compare February 17, 2017 21:19
@kevina
Copy link
Contributor Author

kevina commented Feb 17, 2017

Okay. Just squashed and rebased.

@whyrusleeping whyrusleeping merged commit 34a94c2 into master Mar 2, 2017
@whyrusleeping whyrusleeping deleted the kevina/pin-add-progress branch March 2, 2017 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/review Needs a review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants