fetch vs take! on message channels #855

Closed
Affie opened this issue Aug 24, 2020 · 10 comments

Comments

@Affie
Member

Affie commented Aug 24, 2020

Currently, parametric works with a "take! and save" model (messages are saved for later use and debugging). (Or it used to; it was broken and is restored with #857.)

CSM works with fetch and conditions, with the channels stored in the source clique ("pull").
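To make the difference between the two models concrete, here is a minimal sketch using only Base Julia. The channel and the message string are made up for illustration and are not taken from IncrementalInference.jl. `fetch` returns the first item without removing it, so it can be read repeatedly, while `take!` removes it.

```julia
# A buffered channel standing in for one message edge (illustrative only).
ch = Channel{String}(1)
put!(ch, "up message")

# fetch model: the message stays in the channel and can be read again.
a = fetch(ch)
b = fetch(ch)
stillthere = isready(ch)

# take! model: the message is consumed by the first reader.
c = take!(ch)
emptied = !isready(ch)
```

Note that `fetch` is only supported on buffered channels, which is one reason the sketch uses a buffer size of 1.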

Part of #459

@Affie
Member Author

Affie commented Aug 25, 2020

The "take! and save" approach in parametric is based on:

  • The "4-stroke" state machine with messages going all the way for initialization also (leaves <--> root)
    • Based on asynchronous vs synchronous message passing algorithms.
    • The approach can also work with turning around quicker, as long as the "up-down-up-down" order remains. The only reason for this is that it greatly reduces complexity.
  • Therefore, cliques only calculate messages once per stroke.
    • DF, "4-stroke" is still in its initial phase, and it is unclear whether it will actually work in all cases (I hope and believe it probably will). However, the current tree init in CSM is not yet 4-stroke (it uses the local-cascading model). Until then, and since nonGaussian is the primary purpose, with current non-init ParametricCSM still being relatively simpler than the existing initing-CSM, I think PCSM should be changed to a fetch model (to match CSM for consolidation purposes) regardless of whether the tree init solution eventually turns out to be a 4-stroke solution. Don't jump the gun by building a 4-stroke parametric solution before consolidation with CSM is done, and future 4-stroke work must under no circumstances be allowed to re-break the consolidation between CSM and PCSM -- I have been feeling the pain (mostly alone) working to consolidate the two CSMs and as a group we have to move past this phase. The hard-core objective right now is to consolidate PCSM with CSM before anything else and then keep them consolidated; 4-stroke only happens after consolidation is complete. If CSM has a bad design, I would like/ask for some help in fixing that please, rather than being the only one maintaining CSM. There is plenty of work to do in CSM and I don't want a situation where CSM and PCSM are constantly mismatched.
    • Regarding uncertainty in 4-stroke, remember that nonGaussian is significantly more general than parametric, and that many initialization cases that CSM can do will never be possible with PCSM -- i.e. CSM kinda has to lead the development, otherwise we build bad assumptions into CSM that are difficult to remove / undo later. I'm super eager to get back to the tree init stuff, but unfortunately we need to eat our vegetables first before we can get back to the fun problems like tree init -- we have to do the hard work now, no easy way around that, and let's fix bad design in CSM to reduce the long term maintenance load of CSM+PCSM (and whichever other solvers follow).
  • I wanted to rely on the channels alone for synchronisation and wait for all children to send a message.
    • DF, same for both take! and fetch architectures, so this one should be good to go in all cases. CSM upward pass (fetch model) is already a 95% match on this requirement. Downward will hopefully be better consolidated soon, and that is only a labor/consolidation limitation (not a design limitation).
    • JT, by synchronisation I also mean something to wait on, and not just the clique status. Currently, CSM needs conditions to wait on. The channels are basically just wrappers for convenience around conditions (with a buffer). So it still uses 2 "channels".
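The idea of relying on channels alone to wait for all children can be sketched as follows. This is a toy under stated assumptions, not the CSM implementation: the child names, the `(from, status)` tuple used as a message, and the `upchannels` dictionary are all hypothetical stand-ins for the real clique and `LikelihoodMessage` types.

```julia
children = (:c1, :c2, :c3)

# One buffered channel per child edge (child -> parent); hypothetical layout.
upchannels = Dict(c => Channel{Tuple{Symbol,Symbol}}(1) for c in children)

# Each child sends its up message from its own task, whenever it is done.
for (c, ch) in upchannels
    @async put!(ch, (c, :upsolved))
end

# The parent needs no extra Condition: take! blocks on each edge until that
# child has sent, so the comprehension only finishes once all have arrived.
received = [take!(upchannels[c]) for c in children]
```

Because each channel is buffered, a child that finishes before the parent starts waiting does not block and its message is not lost.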

Ref

See the single upMsg::Channel{LikelihoodMessage} that is used exclusively for synchronization -- there might be a few places in CSM that are not yet using the channel for getting clique status, but that is part of the CSM cleanup that follows consolidation:

upMsgChannel::Channel{LikelihoodMessage}

@dehann
Member

dehann commented Sep 25, 2020

synchronisation I also mean to wait on

For tree init the status needs to be checked multiple times, and later messages need to be fetched more than once -- this makes the take! on one channel only harder, I think. I was not able to resolve tree init without switching to a fetch model with a Condition to allow a wait cycle again.

channels are basically just wrappers for convenience around conditions (with a buffer)

Yes and no, Conditions are edge triggered while Channels are level triggered. It's discussed somewhere in the Julia docs.
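A small sketch of that distinction, using only Base `Condition` and `Channel` (the `:upsolved` symbol is just an illustrative payload): a `notify` issued while nobody is waiting wakes no task and leaves nothing behind, whereas a message `put!` into a buffered channel stays available for a receiver that arrives later.

```julia
# Edge triggered: notify only affects tasks that are already waiting.
cond = Condition()
woken = notify(cond)        # nobody is waiting yet, so nothing is woken

# Level triggered: the buffered channel holds the message until it is taken.
ch = Channel{Symbol}(1)
put!(ch, :upsolved)         # sender runs before any receiver exists
held = isready(ch)
latereceiver = @async take!(ch)
got = fetch(latereceiver)   # the late receiver still gets the message
```

With the `Condition`, a task that calls `wait(cond)` after this point would block until some later `notify`, which is why a condition-based design usually pairs the wait with a status check.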

You'll see in CSM there are states that are slowXYZ. This is usually a wait on a condition with checks for some kind of status, and also short loops in the CSM should certain criteria be met. This is sometimes required for tree init, where a hard block until upsolved is not always possible, or when siblings need to decide on a down init order (that's in the cascading model).

@Affie
Member Author

Affie commented Sep 29, 2020

Yes and no, Conditions are edge triggered while Channels are level triggered

Yes, you are correct. For me, that is an advantage: with level triggering, cliques don't need to be waiting at the moment the message is sent.

@dehann
Member

dehann commented Sep 30, 2020

Just confirming, for the take model you send a copy of the down message to each child?

@Affie
Member Author

Affie commented Sep 30, 2020

Yes. As you know there is a channel per edge. Currently for down the same message is sent on all down channels. I considered a possible future need to send a specific message to only one child, but that didn’t seem necessary.
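As a rough illustration of one channel per edge with the same down message sent on every down channel, consider the sketch below. The child names, the `Vector{Float64}` payload, and the explicit `copy` are assumptions made for the example; the thread confirms that each child receives the message but does not show how the copy is made.

```julia
# One down channel per child edge (parent -> child); hypothetical layout.
downchannels = Dict(c => Channel{Vector{Float64}}(1) for c in (:left, :right))

downmsg = [1.0, 2.0]

# Send the same content on every down channel. Copying (an assumption here)
# means a child can modify what it received without affecting its siblings.
for ch in values(downchannels)
    put!(ch, copy(downmsg))
end

left = take!(downchannels[:left])
right = take!(downchannels[:right])
left[1] = 99.0   # only the left child's copy changes
```

Sending a message to only one child would then amount to calling `put!` on that child's channel alone.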

@Affie
Member Author

Affie commented Sep 30, 2020

The cliques are scheduled by the repeating pattern:

  • ....
  • send up
  • wait for down
  • do some down stuff
  • send down
  • wait for up
  • do some up stuff
  • ....
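One round of this pattern can be sketched on a hypothetical three-clique tree (one root, two leaves). Everything here is illustrative: the integer "messages", the `runleaf` and `runroot` helpers, and the sum used as "up stuff" are stand-ins, not the CSM state functions.

```julia
leaves = (:a, :b)
up   = Dict(l => Channel{Int}(1) for l in leaves)   # leaf -> root edges
down = Dict(l => Channel{Int}(1) for l in leaves)   # root -> leaf edges

# A leaf starts the up pass: send up, wait for down, do some down stuff.
function runleaf(l, value)
    put!(up[l], value)        # send up
    total = take!(down[l])    # wait for down
    return total - value      # "down stuff": contribution of the rest of the tree
end

# The root turns the pass around: wait for up, do some up stuff, send down.
function runroot()
    total = sum(take!(up[l]) for l in leaves)   # wait for up from every child
    for l in leaves
        put!(down[l], total)                    # send down
    end
    return total
end

# Each clique runs as its own task; the channels alone order the work.
ta = @async runleaf(:a, 1)
tb = @async runleaf(:b, 2)
troot = @async runroot()

results = (fetch(troot), fetch(ta), fetch(tb))
```

Repeating the same send/wait pairs in a loop would give the "up-down-up-down" schedule, with each clique computing its message once per stroke.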

EDIT: DF adding to IIF wiki for CSM Storyboards:
https://github.com/JuliaRobotics/IncrementalInference.jl/wiki/CSM-Storyboards

@Affie
Member Author

Affie commented Oct 7, 2020

WIP take! storyboard [image]

take! starting UP sequence

  • Do all prep (MarginalizedRecycle, IncrementalRecycle, BuildSubgraph, etc)
  • Wait for up message if needed (leaves start)
  • Try up init
    • If priors try init
    • Elseif linear on manifold (TODO, calculate for differential only)
    • Else can't init
  • Send message up
  • Wait for down message if needed (root start)
  • Branch for down
    • If all children up-solved down solve
    • else try down init
  • Try down init
  • Send message down
  • Wait for up message if needed (leaves start)

... If cliques can now be initialized with the new messages, upsolve starts; otherwise the init sequence is continued until all are initialized.

  • Do upsolve
  • Send message up
  • Wait for down message if needed (root start)
  • Downsolve
  • Send message down
  • Update from subgraph and exit CSM

@dehann
Member

dehann commented Oct 15, 2020

Ah great thank you, I added a link from IIF Wiki CSM-Storyboard above.

@dehann
Member

dehann commented Oct 15, 2020

Will be interesting to see this expand if cascaded down init returns as shortcut over x-stroke (if that were to happen again).

@dehann dehann added the vote label Oct 18, 2020
@dehann
Member

dehann commented Oct 18, 2020

Think we are +2 for the take!-only model.

@Affie can confirm

@dehann dehann closed this as completed Oct 18, 2020