This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

[WIP] remove waitpub, export publish #47

Closed
wants to merge 1 commit into from


schomatis
Contributor

Fixes #38.

@ghost ghost assigned schomatis Dec 21, 2018
@ghost ghost added the status/in-progress In progress label Dec 21, 2018
@schomatis schomatis force-pushed the fix/republisher/remove-waitpub branch 2 times, most recently from 5739f11 to 50b6144 Compare December 21, 2018 21:12
@schomatis schomatis requested a review from Stebalien December 21, 2018 21:12
@schomatis
Contributor Author

@Stebalien This is a rough sketch of my proposal in #38. Could you please take a look and tell me what you think?

This patch doesn't guarantee that different PublishNow calls will actually run in the order they were made (but this isn't guaranteed in the current code anyway, since the order of Update calls isn't enforced).

@nitishm nitishm left a comment

/LGTM


valueLock sync.Mutex
valueToPublish cid.Cid
lastValuePublished cid.Cid
valueToPublish *cid.Cid

Why is this a pointer? Can the CID be modified (elsewhere) while we are waiting to publish?

Ah, I think I see how this is handled in PublishNow() with the extractedValue == nil check.

Contributor Author

Can the CID be modified (elsewhere) while we are waiting to publish?

It shouldn't be: the Update API hasn't changed, and we still make a local copy.

Member

This should just be a cid.Cid and should be set to cid.Undef when "nil". (probably should have been cid.Nil but I didn't win that argument).

Contributor Author

Yes, I thought of that at first, but we do use cid.Undef to issue publish orders, at least in some tests,

go-mfs/repub_test.go

Lines 43 to 45 in 4fb6dc4

go func() {
	for {
		rp.Update(cid.Undef)

so I think it violates the semantics I would expect of a nil value.

I agree that we should fix that and provide a cid.Nil, but in the meantime I don't see the harm in implementing the nil with a pointer for an internal variable.

Member

The pointer is fine but shouldn't be necessary. Really, we should fix that test: passing the "Undef" CID should be equivalent to passing a "nil" CID (and passing a nil CID to rp.Update doesn't make sense).
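
To make the cid.Undef suggestion concrete, here is a minimal sketch of a pending-value slot that uses the zero value as the "nothing pending" marker instead of a pointer. The `Cid` type below is a hypothetical stand-in for cid.Cid (so the sketch has no external dependencies), and the `pending` type, `Take` method and `Defined` helper are names invented for illustration, not code from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// Cid is a hypothetical stand-in for cid.Cid, reduced to a string so
// the sketch builds without go-cid.
type Cid struct{ s string }

// Undef plays the role of cid.Undef: the zero value means "no CID".
var Undef = Cid{}

// Defined reports whether c holds an actual value.
func (c Cid) Defined() bool { return c != Undef }

// pending holds at most one value waiting to be published.
type pending struct {
	mu    sync.Mutex
	value Cid // Undef when nothing is waiting
}

// Update records c as the next value to publish.
func (p *pending) Update(c Cid) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.value = c
}

// Take returns the pending value and resets the slot to Undef.
// The boolean is false when nothing was waiting.
func (p *pending) Take() (Cid, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	c := p.value
	p.value = Undef
	return c, c.Defined()
}

func main() {
	var p pending
	_, ok := p.Take()
	fmt.Println(ok) // prints false: nothing pending yet

	p.Update(Cid{"QmExample"})
	c, ok := p.Take()
	fmt.Println(c.s, ok) // prints QmExample true
}
```

With this representation, `Update(Undef)` cannot be told apart from "nothing pending", which is the concern raised above about the test that calls rp.Update(cid.Undef) to trigger publishes.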

Member

@Stebalien Stebalien left a comment

This looks like the right approach. However, I'd expect PublishNow to only return after the latest value has been published (even if there's a concurrent publish).


if err != nil {
return err
if extractedValue == nil {
return nil
Member

A concurrent call won't actually wait. We may need a sync.RWMutex here.

@schomatis
Contributor Author

However, I'd expect PublishNow to only return after the latest value has been published (even if there's a concurrent publish).

Yes, that seems fair, but WaitPub didn't do that, and my main objective here is to simplify the code, not redefine behavior. Maybe PublishNow is a misleading name; would PublishCurrentValue be better?

@Stebalien
Member

Yes, that seems fair, but WaitPub didn't do that, and my main objective here is to simplify the code, not redefine behavior. Maybe PublishNow is a misleading name; would PublishCurrentValue be better?

If one changes a file and then calls WaitPub, WaitPub is guaranteed to not return until that change has been published. Of course, given multiple calls to WaitPub, one (or more) of these calls may block waiting for yet another change (that's the issue you're fixing).

Here, given two commands that call PublishNow at the same time, one command will return early (before the publish happens). That means PublishNow, as implemented, isn't useful as a replacement for WaitPub.
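
The early return described above can be reproduced in a small sketch. It assumes a PublishNow that extracts and clears the pending value under the lock and then calls pubfunc outside it, which is how the patch is described in this thread; the `republisher` type, its string values and the `demo` helper are hypothetical simplifications, not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
)

// republisher is a hypothetical reduction of the patched Republisher.
type republisher struct {
	mu      sync.Mutex
	value   *string // nil when nothing is pending
	pubfunc func(string) error
}

func (rp *republisher) Update(v string) {
	rp.mu.Lock()
	defer rp.mu.Unlock()
	rp.value = &v
}

// PublishNow extracts and clears the pending value under the lock, then
// publishes outside the lock.
func (rp *republisher) PublishNow() error {
	rp.mu.Lock()
	extracted := rp.value
	rp.value = nil
	rp.mu.Unlock()

	if extracted == nil {
		return nil // nothing pending: returns without waiting for anyone
	}
	return rp.pubfunc(*extracted)
}

// demo reports whether a second PublishNow call returned while the
// first call's publish was still in flight.
func demo() bool {
	started := make(chan struct{})
	release := make(chan struct{})
	published := false

	rp := &republisher{pubfunc: func(string) error {
		close(started)
		<-release // simulate a slow publish
		published = true
		return nil
	}}
	rp.Update("x")

	done := make(chan struct{})
	go func() {
		defer close(done)
		rp.PublishNow() // first caller: owns the publish of "x"
	}()

	<-started
	rp.PublishNow() // second caller: finds nil and returns immediately
	returnedEarly := !published

	close(release)
	<-done
	return returnedEarly
}

func main() {
	fmt.Println("second call returned before the publish finished:", demo()) // prints true
}
```

The second caller sees an empty slot and returns although "x" has not been published yet, so it cannot rely on PublishNow the way it could rely on WaitPub.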

@@ -139,17 +113,20 @@ func (rp *Republisher) Run() {

// Wrapper function around the user-defined `pubfunc`. It publishes
// the (last) `valueToPublish` set and registers it in `lastValuePublished`.
func (rp *Republisher) publish(ctx context.Context) error {
// TODO: Allow passing a value to `PublishNow` which supersedes the
// internal `valueToPublish`.
Member

I'm not sure if we want to allow this. Users shouldn't swap out the MFS root using the republisher.

Contributor Author

I'm not sure I understand the comment. What do you mean by swap out?

The TODO (that I don't think I worded correctly) was aiming at adding an optional argument that would replace the Update(newCid); PublishNow(); call pair with just a PublishNow(newCid) call.

Member

Got it. Yeah, that makes sense.

(context: I keep thinking we're exposing the republisher to the user)

@schomatis
Contributor Author

Here, given two commands that call PublishNow at the same time, one command will return early (before the publish happens). That means PublishNow, as implemented, isn't useful as a replacement for WaitPub.

Good point. This is actually a simple (but important) mistake on my part: I should have followed the pubfunc structure of taking the lock twice, first to extract the value to publish and then, only after pubfunc is called, to mark it as published (valueToPublish = nil). Does that sound fair to you?

@schomatis schomatis force-pushed the fix/republisher/remove-waitpub branch from 50b6144 to 89de69a Compare January 5, 2019 03:28
@Stebalien
Member

does that sound fair to you?

Are you referring to your latest change? That's going to overwrite potentially unpublished values. Also, we really shouldn't be allowing the user to invoke pubfunc multiple times in parallel (both for thread safety and to prevent logical races where we publish values out of order). In the past, this was protected by the loop.
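
A minimal sketch of the overwrite problem, assuming the "take the lock twice" shape described earlier in the thread (read the value, call pubfunc, then set valueToPublish to nil). The `republisher` type and both method names are hypothetical. The second method, `publishCompare`, is one possible guard added purely for contrast and is not something proposed in this PR; neither variant prevents pubfunc from running in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

type republisher struct {
	mu      sync.Mutex
	value   *string // nil when nothing is pending
	pubfunc func(string) error
}

func (rp *republisher) Update(v string) {
	rp.mu.Lock()
	defer rp.mu.Unlock()
	rp.value = &v // each call stores a fresh pointer
}

// publishTwoLock reads the pending value under the lock, publishes it,
// then takes the lock again to clear the slot unconditionally.
func (rp *republisher) publishTwoLock() error {
	rp.mu.Lock()
	extracted := rp.value
	rp.mu.Unlock()
	if extracted == nil {
		return nil
	}
	if err := rp.pubfunc(*extracted); err != nil {
		return err
	}
	rp.mu.Lock()
	rp.value = nil // also discards any value set while pubfunc ran
	rp.mu.Unlock()
	return nil
}

// publishCompare clears the slot only if it still holds the pointer
// that was just published.
func (rp *republisher) publishCompare() error {
	rp.mu.Lock()
	extracted := rp.value
	rp.mu.Unlock()
	if extracted == nil {
		return nil
	}
	if err := rp.pubfunc(*extracted); err != nil {
		return err
	}
	rp.mu.Lock()
	if rp.value == extracted {
		rp.value = nil
	}
	rp.mu.Unlock()
	return nil
}

// lostUpdate publishes "x" while an Update("y") arrives mid-publish and
// reports what was published and whether anything is still pending.
func lostUpdate(publish func(*republisher) error) (published []string, pending bool) {
	rp := &republisher{}
	rp.pubfunc = func(v string) error {
		published = append(published, v)
		if v == "x" {
			rp.Update("y") // stands in for a concurrent Update
		}
		return nil
	}
	rp.Update("x")
	publish(rp)
	return published, rp.value != nil
}

func main() {
	_, pending := lostUpdate((*republisher).publishTwoLock)
	fmt.Println("two-lock keeps y pending:", pending) // prints false: y was dropped
	_, pending = lostUpdate((*republisher).publishCompare)
	fmt.Println("compare keeps y pending:", pending) // prints true
}
```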

@schomatis
Contributor Author

That's going to overwrite potentially unpublished values. Also, we really shouldn't be allowing the user to invoke pubfunc multiple times in parallel (both for thread safety and to prevent logical races where we publish values out of order). In the past, this was protected by the loop.

If I understand you correctly (please correct me if not) the two issues to study are:

  1. Is the user-supplied PubFunc function thread-safe? This is what I was wondering in "WaitPub may wait more than necessary" #38 (comment), but from your following comments I assumed this wasn't a problem. If I misunderstood that (sorry!) and we can't risk calling it in parallel, then we should just close this PR, which would have no way of working.

That's going to overwrite potentially unpublished values.

prevent logical races where we publish values out of order

This is actually what motivated this PR in the first place. I think the previous examples you mentioned focus on multiple WaitPub calls, but from what I understand of the MFS API, the user doesn't call WaitPub in isolation but only as a complement to Update (to ensure pubfunc was actually called when WaitPub returns). In that setup I don't see this logical race being prevented: two simultaneous calls to Update(); WaitPub() don't guarantee that pubfunc is called for both updated values. Most likely one will overwrite the other (because of our short timer logic) and only one publish operation will happen. PublishNow does not fix that, but it makes it (IMO) much more explicit, giving visibility to what are otherwise easy-to-overlook bugs like #38. Putting the pubfunc in a loop gives the impression that we do an orderly publish of updated values when I think in fact we don't.
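
The "one will overwrite the other" point can be seen with a single-slot holder. This is only a sketch: the `slot` type, `flush` and `publishedAfter` are hypothetical names, and the republisher's timer logic is left out, so two updates followed by one flush stand in for two updates landing inside the same timer window.

```go
package main

import (
	"fmt"
	"sync"
)

// slot is a hypothetical single-value holder standing in for the
// republisher's valueToPublish field.
type slot struct {
	mu    sync.Mutex
	value *string
}

func (s *slot) Update(v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.value = &v // replaces whatever was pending
}

// flush publishes the pending value, if any, and clears the slot.
func (s *slot) flush(pubfunc func(string)) {
	s.mu.Lock()
	v := s.value
	s.value = nil
	s.mu.Unlock()
	if v != nil {
		pubfunc(*v)
	}
}

// publishedAfter applies the updates in order, flushes once, and
// returns every value that reached pubfunc.
func publishedAfter(updates ...string) []string {
	var s slot
	var out []string
	for _, u := range updates {
		s.Update(u)
	}
	s.flush(func(v string) { out = append(out, v) })
	return out
}

func main() {
	fmt.Println(publishedAfter("a", "b")) // prints [b]
}
```

Only the later value reaches pubfunc; the earlier one is never published on its own.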


Anyway, if my assessment of either of those points is wrong, let's close this PR. Since I don't have much time left for proper fixes and redesigns, my only objective at the moment is to simplify the code so that the next one to come along (hopefully not you :) gets a clearer perspective of what the code does (and doesn't do) and how that could be improved upon. I'll try to implement any suggestions toward that end during next week.

@Stebalien
Member

This is what I was wondering in #38 (comment), but from your following comments I assumed this wasn't a problem, if I misunderstood that (sorry!) and we can't risk calling it in parallel then we should just close this PR, which has no way of working then.

It's probably thread-safe (although we shouldn't assume that); however, that doesn't matter. If the user calls Update(x) and then Update(y) in a single thread, y should win. With the new code, x could win (and could even prevent y from being published) given a concurrent PublishNow() call.

This is actually what motivated this PR in the first place.

Given two simultaneous calls to Update, one will always trump the other. However, given two sequential calls to Update with interleaved calls to WaitPub and/or PublishNow, the last published value should always win. That's the real issue here (the end-user can't currently call Update anyways).
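
One way to get "the last value wins" together with "a concurrent caller waits" is to serialize publishes with a dedicated lock held across pubfunc, while a second lock guards only the pending value so that Update never blocks on a publish. The sketch below is an assumption about what such a design could look like; the `republisher` type, the `pubLock` field and the helper `secondCallerSawPublish` are hypothetical, and this is not the approach taken in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

type republisher struct {
	pubLock sync.Mutex // held for the whole publish; serializes pubfunc calls
	mu      sync.Mutex // guards value only
	value   *string    // nil when nothing is pending
	pubfunc func(string) error
}

func (rp *republisher) Update(v string) {
	rp.mu.Lock()
	defer rp.mu.Unlock()
	rp.value = &v
}

// PublishNow returns only once no publish is in flight and the value
// that was pending when it ran has been handed to pubfunc.
func (rp *republisher) PublishNow() error {
	rp.pubLock.Lock()
	defer rp.pubLock.Unlock()

	rp.mu.Lock()
	extracted := rp.value
	rp.mu.Unlock()
	if extracted == nil {
		return nil // we hold pubLock, so any earlier publish has finished
	}
	if err := rp.pubfunc(*extracted); err != nil {
		return err
	}
	rp.mu.Lock()
	if rp.value == extracted { // keep a newer value set during the publish
		rp.value = nil
	}
	rp.mu.Unlock()
	return nil
}

// secondCallerSawPublish reports whether a concurrent PublishNow call
// returned only after the in-flight publish had completed.
func secondCallerSawPublish() bool {
	started := make(chan struct{})
	release := make(chan struct{})
	published := false

	rp := &republisher{pubfunc: func(string) error {
		close(started)
		<-release // simulate a slow publish
		published = true
		return nil
	}}
	rp.Update("x")

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { defer wg.Done(); rp.PublishNow() }()
	<-started

	saw := false
	wg.Add(1)
	go func() {
		defer wg.Done()
		rp.PublishNow() // blocks on pubLock until the first publish is done
		saw = published
	}()
	close(release)
	wg.Wait()
	return saw
}

func main() {
	fmt.Println("second caller waited for the publish:", secondCallerSawPublish()) // prints true
}
```

The cost is that every PublishNow caller waits for any publish already in flight, since the publish lock is held for the whole duration of pubfunc.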

Anyway, if my assessment of either of those points is wrong, let's close this PR. Since I don't have much time left for proper fixes and redesigns, my only objective at the moment is to simplify the code so that the next one to come along (hopefully not you :) gets a clearer perspective of what the code does (and doesn't do) and how that could be improved upon. I'll try to implement any suggestions toward that end during next week.

Fixing #38 shouldn't require a redesign.

@schomatis
Contributor Author

However, given two sequential calls to Update with interleaved calls to WaitPub and/or PublishNow, the last published value should always win.

If the user calls Update(x) and then Update(y) in a single thread, y should win. With the new code, x could win (and could even prevent y from being published) given a concurrent PublishNow() call.

I'm not sure I'm following. With this patch, would sequential calls to Update(x); PublishNow(); Update(y); PublishNow(); not respect the order? (If so, could you expand on why?)


Fixing #38 shouldn't require a redesign.

Agreed. What I meant is that I want to diminish the technical debt that I think helps bugs like #38 go unnoticed, and that requires a redesign, I think.

@schomatis
Contributor Author

(If so, could you expand on why?)

Ok, I think I get it: the loop in Run doesn't play nice with independent PublishNow() calls from the user.

@schomatis
Contributor Author

Holding the lock throughout pubfunc seems like too much, closing then.

@schomatis schomatis closed this Jan 8, 2019
@ghost ghost removed the status/in-progress In progress label Jan 8, 2019
@schomatis schomatis deleted the fix/republisher/remove-waitpub branch January 8, 2019 00:45