Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ticket checking for expected consensus #482

Merged
merged 2 commits into from
May 29, 2018
Merged

Ticket checking for expected consensus #482

merged 2 commits into from
May 29, 2018

Conversation

aboodman
Copy link
Contributor

@aboodman aboodman commented May 23, 2018

Fixes #384

@aboodman aboodman changed the title WIP: Implement ticket checking for expected consensus WIP: Ticket checking for expected consensus May 23, 2018
@aboodman aboodman force-pushed the feat/ec/4 branch 4 times, most recently from 754fc99 to b745608 Compare May 24, 2018 09:18
@aboodman aboodman changed the title WIP: Ticket checking for expected consensus Ticket checking for expected consensus May 24, 2018
@@ -52,17 +55,6 @@ func (b *Block) Cid() *cid.Cid {
return b.ToNode().Cid()
}

// AddParent sets the parent pointer of the receiver to the argument if it
// is a valid assignment, else returns an error.
func (b *Block) AddParent(p Block) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to move this because it's not convenient to validate parent height here anymore. Also, not really sure it makes sense to be doing this validation at such a low-level on such a general data structure.


// Mine does the actual work. It's the implementation of worker.mine.
func Mine(ctx context.Context, input Input, blockGenerator BlockGenerator, doSomeWork DoSomeWorkFunc, outCh chan<- Output) {
func Mine(ctx context.Context, input Input, nullBlockTimer NullBlockTimerFunc, blockGenerator BlockGenerator, createPoST DoSomeWorkFunc, outCh chan<- Output) {
Copy link
Contributor Author

@aboodman aboodman May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not thrilled about the proliferation of these callback functions... I have had some ideas since getting my hands a little dirtier, but they seem out of scope for this PR:

1/ I think just naming better helps a lot. I renamed doSomeWork to createPoST because I think that is the role it is actually playing. If we complete the rename of the type (in a separate PR) and do other similar work to clean up the descriptiveness of all these names and organize them better, that will help a bit.

2/ A much bigger sledgehammer we could use is to make at least some of these functions global variables, and then replace them temporarily during tests. There's a nice pattern for this using defer. That would get rid of all the parameters passed everywhere and type definitions, and isolate the remaining complexity in tests. A bonus would be that jump-to-definition in editors would work again. This has the disadvantage, though, that we won't be able to parallelize tests as aggressively as if we hadn't done it. I'm not sure to what extent we care about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like both ideas, better naming is always good IMO (assuming of course you can come up with better names in a reasonable amount of time). And your idea 2 is great, seems like it would improve code readability and debugability pretty nicely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 && 2 SGTM. Re 2 I love that pattern because it's easy to follow, but have been avoiding it due to previous feedback about sullying global namespace and public interfaces. Will bring it back when it looks like the right tool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it out in the lastest PR. See changes to worker_test.go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap is cool. If we start using it, we'll need to watch out for goroutines escaping the dynamic scope of the swap. Mentioning this mostly as a note to self since this will be a new Go pattern for me.

}

// How long the node's mining Worker should sleep to simulate mining.
const mineSleepTime = time.Millisecond * 10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reduced this significantly because (a) I needed to move it to in front of when the block is mined to better adhere to the design/spec, and also because (b) now with EC we tend to attempt mining a lot more times in each test (thus making tests painfully slow without changing this).

Copy link
Contributor

@phritz phritz May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. FWIW the fact that it was sleeping that long in tests was basically a bug, we should have been using a nop function in tests when constructing the mining worker here: https://github.com/filecoin-project/go-filecoin/blob/master/node/node.go#L198. Given trajectory though def simpler to just lower the sleep time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But don't we want the sleep in place under normal circumstances, when you run like go-filecoin daemon? I think that codepath is used in both places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. I was assuming that our trajectory was to have a real mechanism that slows us down -- either waiting for the next epoch or actually doing proofs -- some time soon. I guess I don't know when that is actually going to happen so yes you are right, continuing to sleep on the normal code path is probably good.

@aboodman aboodman requested review from phritz and whyrusleeping May 24, 2018 09:33
@@ -23,7 +23,7 @@ type GetStateTree func(context.Context, *cid.Cid) (state.Tree, error)

// BlockGenerator is the primary interface for blockGenerator.
type BlockGenerator interface {
Generate(context.Context, *types.Block, types.Address) (*types.Block, error)
Generate(ctx context.Context, block *types.Block, ticket types.Signature, nullBlockCount uint64, addr types.Address) (*types.Block, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would either make these names better, or leave them off. block is unclear, should maybe be parent or basis

// Find the smallest ticket from parent set
var smallest types.Signature
for _, v := range parents {
if smallest == nil || bytes.Compare(v.Ticket, smallest) < 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mental note: We should have a set of test vectors for this to make sure all implementations are sorting correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you noticed this, but there's a small set of them in the unit test we could start with: https://github.com/filecoin-project/go-filecoin/pull/482/files#diff-a0c70e0ee68f1661f66db8fcbbabacf4R213. I was able to (ab)use the official NIST test vectors for this purpose. But generally speaking yeah, we're going to want/need a pretty extensive conformance test that covers the entire protocol.

break
}

nullBlockTimer()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still need a TODO to check if we've gotten a better block from the network in the meantime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as written, this will make every miner keep trying an increasing number of null blocks until they win, implying every miner will submit a block at every epoch. Which is fine for now, but obviously not optimal in the future

Copy link
Contributor Author

@aboodman aboodman May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I think there's actually a minor bug there. When a better block comes in, Start() will cancel the current goroutine executing Mine() and spawn a new one. So we will start mining on the new block. But, I neglect to check for cancellation in this loop in the case where there's no winning ticket, so this instance of Mine() will also keep going until it finds a number of null blocks it wins for.

I'll write tests that exercise this -- it's the last TODO I had.

Copy link
Contributor

@phritz phritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all LGTM

// doSomeWork is a function like Sleep() that we call to simulate mining.
doSomeWork DoSomeWorkFunc
mine mineFunc
createPoST DoSomeWorkFunc // TODO: rename createPoSTFunc?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, you should rename as you see fit here and anywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR already too big.

mining/worker.go Outdated
}

buf := make([]byte, 4)
n := binary.PutUvarint(buf, uint64(nullBlockCount))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it easy for the person doing the varint implementation to know that they should make a change here:

// TODO replace varint encoding as part of #340

@@ -104,15 +104,17 @@ func (b blockGenerator) Generate(ctx context.Context, baseBlock *types.Block, re
return nil, err
}

if blockHeight != baseBlock.Height+nullBlockCount+1 {
Copy link
Contributor

@phritz phritz May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a little more clear if blockHeight was set right before this check instead of a dozen lines above. When you do that though seems like the utility of the check comes into question, so maybe worth a note about what we're trying to accomplish (a backstop bc the relationship isn't enforced anywhere else).

@mishmosh mishmosh added this to the Sprint 11 milestone May 24, 2018
@aboodman aboodman force-pushed the feat/ec/4 branch 2 times, most recently from 472e539 to 892c8e7 Compare May 25, 2018 23:24
Copy link
Contributor Author

@aboodman aboodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Esteemed reviewers, PTAL, I believe this is ready to land.

@@ -138,7 +138,7 @@ func TestPaymentChannelRedeemSuccess(t *testing.T) {
func TestPaymentChannelReclaimSuccess(t *testing.T) {
payer := &address.TestAddress
target := &address.TestAddress2
eol := types.NewBlockHeight(5)
eol := types.NewBlockHeight(20)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suuuper ghetto. We're going to have to think about how to write this kind of test now that mine once means (and I think we want it to continue to mean) mine until you get a block.

@acruikshank

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the scope of the issue but we should feel empowered to modify mine once's behavior to hit a winning ticket immediately in the case of tests like this if it is useful. Or have a new command that does. If all we want is a block and we don't care about the actual consensus, we can circumvent consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That you are having to futz with the numbers here points to a potential issue with the design. It's not entirely clear to me how a client and a miner will negotiate the eol such that the miner is guaranteed enough time to redeem their vouchers without the client having to tie up funds for an excessive amount of time before they can be reclaimed. There will be a point in time (point in block height?) where the client can create and deliver a voucher, but the miner will have, at best, a probabilistic chance of being able to redeem it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in tests like this we can make it so mining always succeeds.

@aboodman aboodman force-pushed the feat/ec/4 branch 4 times, most recently from 9138c72 to 559ec2f Compare May 25, 2018 23:48
@@ -138,7 +138,7 @@ func TestPaymentChannelRedeemSuccess(t *testing.T) {
func TestPaymentChannelReclaimSuccess(t *testing.T) {
payer := &address.TestAddress
target := &address.TestAddress2
eol := types.NewBlockHeight(5)
eol := types.NewBlockHeight(20)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the scope of the issue but we should feel empowered to modify mine once's behavior to hit a winning ticket immediately in the case of tests like this if it is useful. Or have a new command that does. If all we want is a block and we don't care about the actual consensus, we can circumvent consensus.

@aboodman
Copy link
Contributor Author

aboodman commented May 26, 2018

I don't fully understand the scope of the issue but we should feel empowered to modify mine once's behavior to hit a winning ticket immediately in the case of tests like this if it is useful. Or have a new command that does. If all we want is a block and we don't care about the actual consensus, we can circumvent consensus.

Yeah, that would be easy to implement technically, but I was concerned about how it would interact with consensus. What would it mean to have this feature in the deployed client? I guess it would mean you mine a block and just nobody agrees? It fails consensus? If that is OK, I'm happy to implement it.

@aboodman
Copy link
Contributor Author

aboodman commented May 26, 2018 via email

@porcuquine
Copy link
Contributor

porcuquine commented May 26, 2018

@aboodman, Sure -- as an example (for shape, not substance), consider your usage:

(func() {
		defer swap.Swap(&isWinningTicket, everyThirdWinningTicket())()
		go Mine(ctx, input, nullBlockImmediately, mockBg, func() { workCount++ }, outCh)
		r = <-outCh
	})()

In this case, it happens to be fine because the relevant work of Mine has definitely completed by the time the function scope ends. (This is because of the blocking receive into r.)

However, knowing that this is the case requires knowing what's going on. Since the test involves many more actions, it's conceivable that we are relying on Mine to do something else after sending to outCh. (I know in your example, that's not the case -- but suppose this had happened earlier in a sequence of action/assertions within a single test.)

Or, maybe Mine does continue to do something else that we wouldn't care about if isWinningTicket were still swapped. But after it sends to outCh the first time, the inner function will return and isWinningTicket will not be swapped anymore. So it's now possible that the still-running (if it is -- this is just an example) goroutine will call isWinningTicket again, but this time get the original/default/snapshot value of the function and call it.

This is manageable, but it does require reasoning a lot more about the concurrency. It also means that changes to any function which spawns a goroutine need to be considered carefully lest they upset a test somewhere (which indirectly brings a call to that function into dynamic scope).

Because of all these issues, I might suggest that we split the difference. We could implement a test wrapper abstraction providing a syntactically convenient way to run a single self-contained test within the dynamic scope of the swapped re-bindings. This would reduce the discipline required in trying to use very fine-grained re-bindings like in your example.

Not directly related, but on the same topic and going back to your comment about parallel tests. I'm actually not sure of exactly when Go's testing framework believes it has the right to run tests in parallel. If we begin adopting this, someone should find that out and document it in the usage rules.

Basically, I think Swap is cool (and I like the black reflect magic in its implementation) -- but I think it comes with a lot of potential pitfalls. We can mitigate that by going all-in and formalizing some patterns which might be a bit more limited than all we can dream up but might also prevent us from getting into some nightmarish scenarios involving non-deterministic tests. If we end up in that quicksand, we'll find it hard to get out and wish we hadn't let it happen.

I'm quite interested to hear about how you've used this in the past and what kind of patterns/harnesses you adopted to stay safe. Maybe this warrants a discussion issue of its own.

@porcuquine
Copy link
Contributor

@aboodman Sorry to have started turning your PR into a discussion on a tangent. Let's take the discussion to an issue: #499

Feel free to delete my long comment, which I copied there, if you like.

@aboodman
Copy link
Contributor Author

aboodman commented May 26, 2018 via email

mining/worker.go Outdated
// let the successful run proceed unless the context is explicitly canceled.
if ctx.Err() == nil {
outCh <- NewOutput(next, err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should definitely log something if a context cancellation causes us to throw away a good block. I can't think of too many cases where we would want to actually throw away a correctly mined block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some logging and made the existing TODO more assertive.

// True panics if a boolean condition is not true.
func True(cond bool, formatAndArgs ...interface{}) {
var msg string
if len(formatAndArgs) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not put this inside the if !cond {? That way it only allocates when its going to immediately cease to exist

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. As a result of this comment, I went looking for ways to test this and found testing.AllocsPerRun. I also decided that it made more sense to remove chk.Equals() and chk.Nil() as those would also implicitly allocate and it's not burdensome to just inline those checks in the calls to chk.True().

//
// The destination value must be a pointer and the new value must be a value
// that can be assigned to the destination pointer.
func Swap(dst, new interface{}) (unswap func()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hot 🔥


age := 25
fn := func() int { return 1 }
(func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the extra parens mean... what exactly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. I don't know why gofmt didn't call me out for that.

Copy link
Contributor Author

@aboodman aboodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whyrusleeping ptal. Assuming you are good, can you also please click the 'approve' button.

// True panics if a boolean condition is not true.
func True(cond bool, formatAndArgs ...interface{}) {
var msg string
if len(formatAndArgs) == 0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. As a result of this comment, I went looking for ways to test this and found testing.AllocsPerRun. I also decided that it made more sense to remove chk.Equals() and chk.Nil() as those would also implicitly allocate and it's not burdensome to just inline those checks in the calls to chk.True().


age := 25
fn := func() int { return 1 }
(func() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. I don't know why gofmt didn't call me out for that.

@dignifiedquire
Copy link
Contributor

is approval from @whyrusleeping the last thing missing to get this merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants