Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide if msg channels in BTND or edges only #675

Closed
1 of 2 tasks
dehann opened this issue Apr 4, 2020 · 18 comments
Closed
1 of 2 tasks

Decide if msg channels in BTND or edges only #675

dehann opened this issue Apr 4, 2020 · 18 comments

Comments

@dehann
Copy link
Member

dehann commented Apr 4, 2020

EDIT

CONCLUSION

#675 (comment)


Original Post

Following from #459, we should decide if message channels between cliques are stored in either csmc or tree edges.

Related decisions:

Part of the decision is whether the tree should be read-only during inference.


WIP:

@dehann
Copy link
Member Author

dehann commented Apr 4, 2020

I have a preference for storing message channels inside CSMCs rather than on tree edges, loosely discussed in #459. Main discussion transferred here from #532 :

Hi, i don’t completely understand. How do you want to solve a clique in isolation? Do you manually send the messages?

That was a bit unclear sorry -- I'm thinking of the recordcliqs interface of CSM/FSM which records state machine steps in the hist::Vector container. The dream here is to be able to store/restore each individual step of CSM with enough context to run in isolation. For example, think about optimizing or refactoring CSM: if we can store one transition from a working sample, then we can debug and change individual steps simultaneously across all applications.

My thinking around messages in cliques or edges is inspired by this "clique in isolation" or "individual CSM steps" way of thinking. The knee jerk reaction is that a clique is a self contained unit with all necessary messages etc. -- if you have the clique at a particular point in time then (hopefully) everything else downstream becomes easier (maybe).

My reservation surrounding messages on edges comes from now having to store/restore not just the clique, but also its edges. Furthermore, the edges must now be duplicated for this or that clique, and synchronization between two cliques becomes harder and harder to track. Hence the knee jerk. That said, there are limits to how "defensive" the coding patterns should be... A step-by-step CSM is/would be a great developer's helper, but not a replacement for end-to-end testing. Hence the relent to, just build it on edges and preemptively add NOTES as might seem relevant.

There is a looming refactoring here on tree messages, as you already know. This refactoring has slipped into the new year and likely best suited to do part of the treeinit effort. I will likely have to punt the message refactoring another couple of weeks given my immediate schedule, but super excited to get that part cleaned up!

A clique will block waiting for a message to be received or send.

Agreed, that sounds like the right behaviour.

Currently I use normal julia channels so you can just put and take from the channels without duplication needed.

Agreed on Julia ::Channel, however, the duplication I referred to above is from a different use-case origin -- i.e. developer's helper.

While building the treeinit code (as it stands now) I was forced to add additional ::Condition signals to the tree+CSM structure too :-P. This has to do with cases where Siblings cannot run independent of each other any more, but need to share information through the parent, which in turn introduces the additional signaling requirements.

@dehann dehann changed the title Store message channels in CSMC or tree edges Store message channels in CSMC or tree Apr 4, 2020
@dehann dehann changed the title Store message channels in CSMC or tree Store message channels in CSMC or tree edges/BTND Apr 4, 2020
@dehann dehann changed the title Store message channels in CSMC or tree edges/BTND Store msg channels in CSMC or tree edges/BTND Apr 4, 2020
@dehann
Copy link
Member Author

dehann commented Apr 4, 2020

functions like treeProductUp do not have access to CSMC, since this is a debugging function that occurs separate of solver (nominally post-inference to debug a weird internal result).

@Affie
Copy link
Member

Affie commented Apr 4, 2020

We should maybe make the distinction between the message channels and the messages itself.
Message storage options are in clique or CSMC.
For me, the message channels only makes sense in the tree or cliques (with my preference in the tree as you know), not in CSMC.

@dehann
Copy link
Member Author

dehann commented Apr 6, 2020

(with my preference in the tree as you know), not in CSMC.

Ah, I had it that your preference was not in the tree, maybe make the tree read only?

My preference currently is not to store this separately in the tree but rather in either cliques or csmc (#675 (comment))... let me chew on this a little more.

Nonparametric is currently storing the message channels in the tree cliques (BTND):
https://github.com/JuliaRobotics/IncrementalInference.jl/blob/master/src/JunctionTreeTypes.jl#L259-L262

Perhaps parametric is currently storing messages in either tree

messages::Dict{Int, NamedTuple{(:upMsg, :downMsg),Tuple{Channel{LikelihoodMessage},Channel{LikelihoodMessage}}}}

or CSMC?
https://github.com/JuliaRobotics/IncrementalInference.jl/blob/master/src/JunctionTreeTypes.jl#L203-L204

@Affie
Copy link
Member

Affie commented Apr 6, 2020

Perhaps parametric is currently storing messages in either tree

see comment about making a distinction between channels and stored messages: #675 (comment)

@dehann dehann modified the milestones: v0.11.2, v0.11.3, v0.11.4 Apr 17, 2020
@dehann dehann modified the milestones: v0.11.4, v0.0.x Jun 9, 2020
@dehann
Copy link
Member Author

dehann commented Aug 19, 2020

Think we ended up deciding opposite to much of what was said earlier: that msgs must not be stored in CSMC, but somewhere on the tree. It's not super important whether Messages are on the edges or in BTND. Currently the implementation calls for all messages to be stored inside the BTND, and can eventually move to edges if needed.

@dehann dehann modified the milestones: v0.0.x, v0.16.0 Aug 19, 2020
@dehann dehann changed the title Store msg channels in CSMC or tree edges/BTND Store msg channels in BTND Aug 19, 2020
@Affie
Copy link
Member

Affie commented Aug 25, 2020

1

I find it a bit hard to follow the msg channels in BTND. I think I have it figured out, is this correct (hopefully it can help in the future if it is):
image

@Affie
Copy link
Member

Affie commented Aug 25, 2020

2

For completeness here is the approach taken in parametric:
image

It looks like the biggest difference is every clique has its own down channel. So it's not as straight forward to just point it to a different location.

@Affie
Copy link
Member

Affie commented Aug 25, 2020

3

This is what I thought was happening, and another possibility:
image

@dehann
Copy link
Member Author

dehann commented Aug 26, 2020

While i consume the figures (thanks), I really don't see the hold up between msg channels on edges or in BTND: somewhere in the code a CSM execution asks "GetUpwardChannel(csmc)::Channel{LikelihoodMessage}" and elsewhere another piece of code asks for "GetDownwardChannel(csmc)::Channel{LikelihoodMessage}" ... what does it matter where the channel references are stored, as long as csmc has that pointer? And in both cases csmc does contain references to both the ::BayesTree and all ::BTND, so it really should not matter?

I'll work through the figures when i get a chance, perhaps there is something I'm missing.

Regardless, to aid consolidation for #459, I'd ask that we please store the messages in BTND since CSM is that much bigger than ParametricCSM. It is much harder to change CSM than ParametricCSM. The two structures should be identical and this seems to be a growing pain during consolidation. The only reason this would be hard is if the "GetMsgChannel(csmc)" functions were not possible, and I think they are easily possible?

After consolidation, we can move the references to message channels from BTND to edges (if we end up choosing to do that in the end).

@Affie
Copy link
Member

Affie commented Aug 27, 2020

In short:

  • 1 is if I follow your "pull" definition, it shares a down channel and has unused channels - I'm still not sure if this is how it is/(will be)
  • 2 and 3 are basically the same except for unused channels in the root (3)

@dehann
Copy link
Member Author

dehann commented Aug 28, 2020

Hi @Affie, CSM is currently halfway between 1 and 3, and I'm working towards making it only be like 1.

CSM used to be more like 3, but after #459 closes it will only be like 1.

@dehann dehann modified the milestones: v0.16.0, v0.16.1, v0.17.0 Sep 18, 2020
@dehann dehann changed the title Store msg channels in BTND Decide if msg channels in BTND or edges only Oct 20, 2020
@dehann
Copy link
Member Author

dehann commented Oct 20, 2020

Hi @Affie , to bring this to a decision -- are we going to do messages in channels that are not stored in either BTND or CSMC, but rather a separate MessageChannels store in BayesTree? If so then BayesTree will in future have to be able to work with subset of cliques (not critical at this stage).

We also need a 100% determinstic way to ensure that step by step FSM sandbox repeat calls of of CSM states are exact replicates of the situations at that instant. Since csmc.tree is deepcopied, I think that should be fine?

@Affie
Copy link
Member

Affie commented Oct 20, 2020

I don't think it matters that much. The only requirement that I have is to have a down channel per child. This simplifies sequencing of the cliques and can be used in the future to only send a message to one of the child branches.

So I'm consolidating to use the edge messages as in 2 for now. As 1 only has one down channel. My previous function refTreeMessageChannelsFromBTND! actually uses 3 and not 1, as it uses incoming edges (so children)

We also need a 100% determinstic way to ensure that step by step FSM sandbox repeat calls of of CSM states are exact replicates of the situations at that instant. Since csmc.tree is deepcopied, I think that should be fine?

Remember the channels are not buffered in take!, so technically the tree will not hold any state information if channels are stored in it. waitforup and waitfordown is dependant on the channels, but it's not likely to be debugged in separation as it depends on messages from other cliques. After that, the message is stored in the clique and can only be changed on the next cycle (up/down)

For me, the best of both solution at this stage is to make copies of the channels in the CSMC.
so you would have an up-channels and a down-channels vector. That way you don't need to have copies of the parent and child cliques in your CSMC. As the rule is to only communicate through the up and down channels.

@dehann
Copy link
Member Author

dehann commented Oct 21, 2020

agree 100% with only communicate through channels as hard requirement.

down channel each child sounds good too, no issues there.

did not connect the dots that channel length 0 means tree doesn't have state -- that's very good i think. in old design i had to sometimes reset channels manually. your approach sounds better.

Duplicating channels for debug restep with FSM should be fine, as long as that copying is not the place where things break. if you're happy i'd say hi for it. or avoid if easy, whatever makes the most sense.

so design decision on this issue then is model 2 only (old language "pull model") with take-only channel length 0. If you are happy with that @Affie we can close this issue as soon as able.

@dehann
Copy link
Member Author

dehann commented Oct 21, 2020

ie channels definely not in BTND.

@Affie
Copy link
Member

Affie commented Oct 22, 2020

Ok, so we can reevaluate in the future:
If we remove the tree from CSMC and only keep a clique, then we can copy the channels from the tree to the CSMC instead of the full parent and child cliques.
It works by reference so the only difference will be with debugging as deep-copies are made.

@dehann
Copy link
Member Author

dehann commented Oct 22, 2020

JIp, as long as the deepcopy point is not some weird place where multithreaded timing etc becomes hard to debug -- sometimes nice to just have full deepcopy of the object as is. Guess we can fix with atomics if that comes up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants