Ticket checking for expected consensus #482

aboodman · 2018-05-23T07:39:39Z

Fixes #384

aboodman · 2018-05-24T09:26:28Z

types/block.go

@@ -52,17 +55,6 @@ func (b *Block) Cid() *cid.Cid {
 	return b.ToNode().Cid()
 }

-// AddParent sets the parent pointer of the receiver to the argument if it
-// is a valid assignment, else returns an error.
-func (b *Block) AddParent(p Block) error {


I had to move this because it's not convenient to validate parent height here anymore. Also, not really sure it makes sense to be doing this validation at such a low-level on such a general data structure.

aboodman · 2018-05-24T09:30:54Z

mining/worker.go


 // Mine does the actual work. It's the implementation of worker.mine.
-func Mine(ctx context.Context, input Input, blockGenerator BlockGenerator, doSomeWork DoSomeWorkFunc, outCh chan<- Output) {
+func Mine(ctx context.Context, input Input, nullBlockTimer NullBlockTimerFunc, blockGenerator BlockGenerator, createPoST DoSomeWorkFunc, outCh chan<- Output) {


I'm not thrilled about the proliferation of these callback functions... I have had some ideas since getting my hands a little dirtier, but they seem out of scope for this PR:

1/ I think just naming better helps a lot. I renamed doSomeWork to createPoST because I think that is the role it is actually playing. If we complete the rename of the type (in a separate PR) and do other similar work to clean up the descriptiveness of all these names and organize them better, that will help a bit.

2/ A much bigger sledgehammer we could use is to make at least some of these functions global variables, and then replace them temporarily during tests. There's a nice pattern for this using defer. That would get rid of all the parameters passed everywhere and type definitions, and isolate the remaining complexity in tests. A bonus would be that jump-to-definition in editors would work again. This has the disadvantage, though, that we won't be able to parallelize tests as aggressively as if we hadn't done it. I'm not sure to what extent we care about that.
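For readers unfamiliar with the defer-based pattern mentioned in 2/, here is a minimal sketch. It assumes a package-level function variable named isWinningTicket (the name appears later in this thread, the body here is invented) and a hypothetical helper swapTicketCheck typed for that one variable; it is not the code from this PR.

```go
package main

import "fmt"

// isWinningTicket stands in for a package-level dependency that production
// code calls directly. The name is borrowed from this PR; the body is made up.
var isWinningTicket = func(ticket []byte) bool {
	return len(ticket) > 0 && ticket[0] == 0
}

// swapTicketCheck installs replacement and returns a function that restores
// the previous value. Hypothetical helper, typed for this one variable.
func swapTicketCheck(replacement func([]byte) bool) (restore func()) {
	old := isWinningTicket
	isWinningTicket = replacement
	return func() { isWinningTicket = old }
}

// checkDuringSwap reports what isWinningTicket returns while an always-win
// stub is installed.
func checkDuringSwap(ticket []byte) bool {
	// The trailing () calls swapTicketCheck immediately and defers only the
	// restore function it returns.
	defer swapTicketCheck(func([]byte) bool { return true })()
	return isWinningTicket(ticket)
}

func main() {
	ticket := []byte{0xff}
	fmt.Println(checkDuringSwap(ticket)) // stub is active inside the call
	fmt.Println(isWinningTicket(ticket)) // original is back afterwards
}
```

Because the stub lives in shared package state, two tests that swap the same variable cannot safely run at the same time, which is the parallelism cost described above.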

I like both ideas. Better naming is always good IMO (assuming of course you can come up with better names in a reasonable amount of time). And your idea 2 is great; it seems like it would improve code readability and debuggability pretty nicely.

1 && 2 SGTM. Re 2 I love that pattern because it's easy to follow, but have been avoiding it due to previous feedback about sullying global namespace and public interfaces. Will bring it back when it looks like the right tool.

I tested it out in the latest PR. See changes to worker_test.go.

Swap is cool. If we start using it, we'll need to watch out for goroutines escaping the dynamic scope of the swap. Mentioning this mostly as a note to self since this will be a new Go pattern for me.

aboodman · 2018-05-24T09:32:39Z

mining/worker.go

+}
+
+// How long the node's mining Worker should sleep to simulate mining.
+const mineSleepTime = time.Millisecond * 10


I reduced this significantly because (a) I needed to move it to in front of when the block is mined to better adhere to the design/spec, and also because (b) now with EC we tend to attempt mining a lot more times in each test (thus making tests painfully slow without changing this).

Cool. FWIW the fact that it was sleeping that long in tests was basically a bug, we should have been using a nop function in tests when constructing the mining worker here: https://github.com/filecoin-project/go-filecoin/blob/master/node/node.go#L198. Given trajectory though def simpler to just lower the sleep time.

But don't we want the sleep in place under normal circumstances, when you run like go-filecoin daemon? I think that codepath is used in both places.

You are correct. I was assuming that our trajectory was to have a real mechanism that slows us down -- either waiting for the next epoch or actually doing proofs -- some time soon. I guess I don't know when that is actually going to happen so yes you are right, continuing to sleep on the normal code path is probably good.

whyrusleeping · 2018-05-24T11:32:35Z

mining/block_generator.go

@@ -23,7 +23,7 @@ type GetStateTree func(context.Context, *cid.Cid) (state.Tree, error)

 // BlockGenerator is the primary interface for blockGenerator.
 type BlockGenerator interface {
-	Generate(context.Context, *types.Block, types.Address) (*types.Block, error)
+	Generate(ctx context.Context, block *types.Block, ticket types.Signature, nullBlockCount uint64, addr types.Address) (*types.Block, error)


i would either make these names better, or leave them off. block is unclear, should maybe be parent or basis

whyrusleeping · 2018-05-24T11:44:22Z

mining/worker.go

+	// Find the smallest ticket from parent set
+	var smallest types.Signature
+	for _, v := range parents {
+		if smallest == nil || bytes.Compare(v.Ticket, smallest) < 0 {


mental note: We should have a set of test vectors for this to make sure all implementations are sorting correctly

Not sure if you noticed this, but there's a small set of them in the unit test we could start with: https://github.com/filecoin-project/go-filecoin/pull/482/files#diff-a0c70e0ee68f1661f66db8fcbbabacf4R213. I was able to (ab)use the official NIST test vectors for this purpose. But generally speaking yeah, we're going to want/need a pretty extensive conformance test that covers the entire protocol.

whyrusleeping · 2018-05-24T11:46:41Z

mining/worker.go

+			break
+		}
+
+		nullBlockTimer()


still need a TODO to check if we've gotten a better block from the network in the meantime.

as written, this will make every miner keep trying an increasing number of null blocks until they win, implying every miner will submit a block at every epoch. Which is fine for now, but obviously not optimal in the future

Whoops, I think there's actually a minor bug there. When a better block comes in, Start() will cancel the current goroutine executing Mine() and spawn a new one. So we will start mining on the new block. But, I neglect to check for cancellation in this loop in the case where there's no winning ticket, so this instance of Mine() will also keep going until it finds a number of null blocks it wins for.

I'll write tests that exercise this -- it's the last TODO I had.

phritz

This all LGTM

phritz · 2018-05-24T19:27:44Z

mining/worker.go

-	// doSomeWork is a function like Sleep() that we call to simulate mining.
-	doSomeWork DoSomeWorkFunc
-	mine       mineFunc
+	createPoST     DoSomeWorkFunc // TODO: rename createPoSTFunc?


Sure, you should rename as you see fit here and anywhere else.

PR already too big.

phritz · 2018-05-24T20:08:10Z

mining/worker.go

 	}
+
+	buf := make([]byte, 4)
+	n := binary.PutUvarint(buf, uint64(nullBlockCount))


Maybe make it easy for the person doing the varint implementation to know that they should make a change here:

// TODO replace varint encoding as part of #340

phritz · 2018-05-24T20:17:45Z

mining/block_generator.go

@@ -104,15 +104,17 @@ func (b blockGenerator) Generate(ctx context.Context, baseBlock *types.Block, re
 		return nil, err
 	}

+	if blockHeight != baseBlock.Height+nullBlockCount+1 {


Might be a little more clear if blockHeight was set right before this check instead of a dozen lines above. When you do that though seems like the utility of the check comes into question, so maybe worth a note about what we're trying to accomplish (a backstop bc the relationship isn't enforced anywhere else).

aboodman

Esteemed reviewers, PTAL, I believe this is ready to land.

aboodman · 2018-05-25T23:22:20Z

commands/payment_channel_daemon_test.go

@@ -138,7 +138,7 @@ func TestPaymentChannelRedeemSuccess(t *testing.T) {
 func TestPaymentChannelReclaimSuccess(t *testing.T) {
 	payer := &address.TestAddress
 	target := &address.TestAddress2
-	eol := types.NewBlockHeight(5)
+	eol := types.NewBlockHeight(20)


This is suuuper ghetto. We're going to have to think about how to write this kind of test now that mine once means (and I think we want it to continue to mean) mine until you get a block.

@acruikshank

I don't fully understand the scope of the issue but we should feel empowered to modify mine once's behavior to hit a winning ticket immediately in the case of tests like this if it is useful. Or have a new command that does. If all we want is a block and we don't care about the actual consensus, we can circumvent consensus.

That you are having to futz with the numbers here points to a potential issue with the design. It's not entirely clear to me how a client and a miner will negotiate the eol such that the miner is guaranteed enough time to redeem their vouchers without the client having to tie up funds for an excessive amount of time before they can be reclaimed. There will be a point in time (point in block height?) where the client can create and deliver a voucher, but the miner will have, at best, a probabilistic chance of being able to redeem it.

Yeah, in tests like this we can make it so mining always succeeds.

phritz · 2018-05-26T00:12:36Z

commands/payment_channel_daemon_test.go

@@ -138,7 +138,7 @@ func TestPaymentChannelRedeemSuccess(t *testing.T) {
 func TestPaymentChannelReclaimSuccess(t *testing.T) {
 	payer := &address.TestAddress
 	target := &address.TestAddress2
-	eol := types.NewBlockHeight(5)
+	eol := types.NewBlockHeight(20)


I don't fully understand the scope of the issue but we should feel empowered to modify mine once's behavior to hit a winning ticket immediately in the case of tests like this if it is useful. Or have a new command that does. If all we want is a block and we don't care about the actual consensus, we can circumvent consensus.

aboodman · 2018-05-26T00:30:24Z

I don't fully understand the scope of the issue but we should feel empowered to modify mine once's behavior to hit a winning ticket immediately in the case of tests like this if it is useful. Or have a new command that does. If all we want is a block and we don't care about the actual consensus, we can circumvent consensus.

Yeah, that would be easy to implement technically, but I was concerned about how it would interact with consensus. What would it mean to have this feature in the deployed client? I guess it would mean you mine a block and just nobody agrees? It fails consensus? If that is OK, I'm happy to implement it.

aboodman · 2018-05-26T01:28:31Z

Can you say more about that?

On Fri, May 25, 2018, 6:19 PM clkunzang wrote:

Swap is cool. If we start using it, we'll need to watch out for goroutines escaping the dynamic scope of the swap. Mentioning this mostly as a note to self since this will be a new Go pattern for me.

porcuquine · 2018-05-26T02:16:21Z

@aboodman, Sure -- as an example (for shape, not substance), consider your usage:

(func() {
		defer swap.Swap(&isWinningTicket, everyThirdWinningTicket())()
		go Mine(ctx, input, nullBlockImmediately, mockBg, func() { workCount++ }, outCh)
		r = <-outCh
	})()

In this case, it happens to be fine because the relevant work of Mine has definitely completed by the time the function scope ends. (This is because of the blocking receive into r.)

However, knowing that this is the case requires knowing what's going on. Since the test involves many more actions, it's conceivable that we are relying on Mine to do something else after sending to outCh. (I know in your example, that's not the case -- but suppose this had happened earlier in a sequence of action/assertions within a single test.)

Or, maybe Mine does continue to do something else that we wouldn't care about if isWinningTicket were still swapped. But after it sends to outCh the first time, the inner function will return and isWinningTicket will not be swapped anymore. So it's now possible that the still-running (if it is -- this is just an example) goroutine will call isWinningTicket again, but this time get the original/default/snapshot value of the function and call it.

This is manageable, but it does require reasoning a lot more about the concurrency. It also means that changes to any function which spawns a goroutine need to be considered carefully lest they upset a test somewhere (which indirectly brings a call to that function into dynamic scope).
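The escape scenario can be made concrete with a small, deterministic sketch. Everything here is hypothetical: worker stands in for a goroutine like Mine that outlives the swap, and channels are used to force the ordering so the example has no data race. The second call made by the goroutine lands after the swap scope has ended and sees the original function.

```go
package main

import "fmt"

// isWinningTicket is the swappable package-level function (body invented).
var isWinningTicket = func() bool { return false }

// worker calls isWinningTicket twice: once right away, and once more after
// it is told to continue. It stands in for a goroutine that keeps running.
func worker(out chan<- bool, again <-chan struct{}) {
	out <- isWinningTicket()
	<-again
	out <- isWinningTicket()
}

// observeAcrossSwap starts worker inside a swap scope and reports what the
// goroutine saw inside the scope and after the scope ended.
func observeAcrossSwap() (inside, after bool) {
	out := make(chan bool)
	again := make(chan struct{})
	func() {
		old := isWinningTicket
		isWinningTicket = func() bool { return true }
		defer func() { isWinningTicket = old }()

		go worker(out, again)
		inside = <-out // blocking receive, like r = <-outCh in the test
	}() // the swap is undone here, but worker is still alive

	close(again)
	after = <-out
	return inside, after
}

func main() {
	inside, after := observeAcrossSwap()
	fmt.Println("inside swap scope:", inside)
	fmt.Println("after swap scope: ", after)
}
```

In a real test the ordering would not be forced by channels like this, so which function the goroutine sees could vary from run to run.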

Because of all these issues, I might suggest that we split the difference. We could implement a test wrapper abstraction providing a syntactically convenient way to run a single self-contained test within the dynamic scope of the swapped re-bindings. This would reduce the discipline required in trying to use very fine-grained re-bindings like in your example.

Not directly related, but on the same topic and going back to your comment about parallel tests. I'm actually not sure of exactly when Go's testing framework believes it has the right to run tests in parallel. If we begin adopting this, someone should find that out and document it in the usage rules.

Basically, I think Swap is cool (and I like the black reflect magic in its implementation) -- but I think it comes with a lot of potential pitfalls. We can mitigate that by going all-in and formalizing some patterns which might be a bit more limited than all we can dream up but might also prevent us from getting into some nightmarish scenarios involving non-deterministic tests. If we end up in that quicksand, we'll find it hard to get out and wish we hadn't let it happen.

I'm quite interested to hear about how you've used this in the past and what kind of patterns/harnesses you adopted to stay safe. Maybe this warrants a discussion issue of its own.

porcuquine · 2018-05-26T02:47:23Z

@aboodman Sorry to have started turning your PR into a discussion on a tangent. Let's take the discussion to an issue: #499

Feel free to delete my long comment, which I copied there, if you like.

aboodman · 2018-05-26T02:47:57Z


Agree with all of this. I'm not sure how useful Swap() will be in practice anyway because it really only helps for replacing global variables which aren't that common. And besides the issues you brought up, what we have here is basically monkey-patching, with all the complexity that entails.

I'm quite interested to hear about how you've used this in the past and what kind of patterns/harnesses you adopted to stay safe. Maybe this warrants a discussion issue of its own.

I haven't used it in the past. I frequently like to "try things on for size" to see how they feel. It's sometimes hard to speculate about how useful something will or won't be, or how the tradeoffs will feel, without hands on experience - at least for me. Plus it's just fun to try new approaches. In this case, this has the one dramatic advantage of letting jump-to-definition continue to work, so I wanted to try it out. After using this here and gaining more experience with our tests, I'm leaning more toward just a traditional software engineering solution of re-organizing the dependencies into interfaces with consistent names and code organization, and I bet that'll be enough. But now at least the tool is there and we can all play with it to gain experience.


whyrusleeping · 2018-05-28T05:49:00Z

mining/worker.go

+			// let the successful run proceed unless the context is explicitly canceled.
+			if ctx.Err() == nil {
+				outCh <- NewOutput(next, err)
+			}


we should definitely log something if a context cancellation causes us to throw away a good block. I can't think of too many cases where we would want to actually throw away a correctly mined block.

Added some logging and made the existing TODO more assertive.

whyrusleeping · 2018-05-28T05:59:53Z

util/chk/check.go

+// True panics if a boolean condition is not true.
+func True(cond bool, formatAndArgs ...interface{}) {
+	var msg string
+	if len(formatAndArgs) == 0 {


why not put this inside the if !cond {? That way it only allocates when it's going to immediately cease to exist

Nice catch. As a result of this comment, I went looking for ways to test this and found testing.AllocsPerRun. I also decided that it made more sense to remove chk.Equals() and chk.Nil() as those would also implicitly allocate and it's not burdensome to just inline those checks in the calls to chk.True().

whyrusleeping · 2018-05-28T06:03:15Z

util/swap/swap.go

+//
+// The destination value must be a pointer and the new value must be a value
+// that can be assigned to the destination pointer.
+func Swap(dst, new interface{}) (unswap func()) {
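For orientation, one way a function with this signature could be written using reflect. This is a guess at the general approach for illustration, not the implementation in util/swap/swap.go; the parameter is named newVal here and the panic message is invented.

```go
package main

import (
	"fmt"
	"reflect"
)

// Swap stores newVal in the variable dst points to and returns a function
// that puts the previous value back.
func Swap(dst, newVal interface{}) (unswap func()) {
	ptr := reflect.ValueOf(dst)
	if ptr.Kind() != reflect.Ptr || ptr.IsNil() {
		panic("swap: dst must be a non-nil pointer")
	}
	target := ptr.Elem()

	// Copy the current value into fresh storage so that overwriting the
	// target does not also overwrite the saved copy.
	old := reflect.New(target.Type()).Elem()
	old.Set(target)

	// Set panics if newVal is not assignable to the target's type.
	target.Set(reflect.ValueOf(newVal))
	return func() { target.Set(old) }
}

func main() {
	age := 25
	fn := func() int { return 1 }

	func() {
		defer Swap(&age, 30)()
		defer Swap(&fn, func() int { return 2 })()
		fmt.Println(age, fn()) // swapped values
	}()
	fmt.Println(age, fn()) // originals restored
}
```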


whyrusleeping · 2018-05-28T06:03:52Z

util/swap/swap_test.go

+
+	age := 25
+	fn := func() int { return 1 }
+	(func() {


the extra parens mean... what exactly?

Whoops. I don't know why gofmt didn't call me out for that.

aboodman

@whyrusleeping ptal. Assuming you are good, can you also please click the 'approve' button.

aboodman · 2018-05-28T16:36:19Z

util/chk/check.go

+// True panics if a boolean condition is not true.
+func True(cond bool, formatAndArgs ...interface{}) {
+	var msg string
+	if len(formatAndArgs) == 0 {


Nice catch. As a result of this comment, I went looking for ways to test this and found testing.AllocsPerRun. I also decided that it made more sense to remove chk.Equals() and chk.Nil() as those would also implicitly allocate and it's not burdensome to just inline those checks in the calls to chk.True().

aboodman · 2018-05-28T16:37:02Z

util/swap/swap_test.go

+
+	age := 25
+	fn := func() int { return 1 }
+	(func() {


Whoops. I don't know why gofmt didn't call me out for that.

dignifiedquire · 2018-05-29T11:30:14Z

is approval from @whyrusleeping the last thing missing to get this merged?


aboodman changed the title ~~WIP: Implement ticket checking for expected consensus~~ WIP: Ticket checking for expected consensus May 23, 2018

aboodman force-pushed the feat/ec/4 branch 4 times, most recently from 754fc99 to b745608 Compare May 24, 2018 09:18

aboodman changed the title ~~WIP: Ticket checking for expected consensus~~ Ticket checking for expected consensus May 24, 2018

aboodman commented May 24, 2018

aboodman requested review from phritz and whyrusleeping May 24, 2018 09:33

whyrusleeping reviewed May 24, 2018

phritz reviewed May 24, 2018

mishmosh added this to the Sprint 11 milestone May 24, 2018

aboodman force-pushed the feat/ec/4 branch 2 times, most recently from 472e539 to 892c8e7 Compare May 25, 2018 23:24

aboodman commented May 25, 2018

aboodman force-pushed the feat/ec/4 branch 4 times, most recently from 9138c72 to 559ec2f Compare May 25, 2018 23:48

phritz approved these changes May 26, 2018

porcuquine mentioned this pull request May 26, 2018

Swap functions for tests #499

Closed

whyrusleeping reviewed May 28, 2018

aboodman commented May 28, 2018

whyrusleeping approved these changes May 29, 2018

aboodman added 2 commits May 29, 2018 08:51

Introduce two utilities packages: chk (invariants) and swap (for stubbing global vars in tests)

8c92721

Implement ticket checking for expected consensus

39aed1a

Fixes #384

aboodman force-pushed the feat/ec/4 branch from 889cf14 to 39aed1a Compare May 29, 2018 15:52

aboodman merged commit d6e2d71 into master May 29, 2018

aboodman deleted the feat/ec/4 branch May 29, 2018 17:00

This was referenced May 30, 2018

Fix mine-once to work with tests more reliably post-EC #504

Closed

Introduce --force-winning-ticket on mining once subcommand. #520

Closed
