Documenting CSM and inference steps for debugging #443
Note that when using
which shows the belief messages passed up or down in each case added as |
Is there a way to debug with a failed solve (where it errors and doesn't return)? |
Hi @Affie, the best I have been able to do so far is as follows. First, find where the logs are going to land:
@show getLogPath(fg)
Then solve with all debug options:
getSolverParams(fg).dbg = true
getSolverParams(fg).async = true
tree, smt, hist = solveTree!(fg, verbose=true, recordcliqs=ls(fg));
Verbose is very useful, but it only prints to the console, so you can get an idea of how far each CSM gets and in which function the error might occur. With this you will then get the process 1 context back while the solver works in the background. Do not start a second solve while the first is still working in the background. Once the failures occur, there are three things to do:
One more thing you can do is limit the number of iterations for all CSMs:
getSolverParams(fg).limititers = 13
This forces early termination of each CSM, so you get the csmc object nearest the failure. It is actually possible to get even closer, one CSM at a time:
fg = generateCanonicalFG_Hexagonal()
@show getLogPath(fg)
tree = wipeBuildNewTree!(fg)
drawTree(tree, show=true)
getSolverParams(fg).dbg = true
getSolverParams(fg).async = true
getSolverParams(fg).limititers = 13
cliqtask = solveCliq!(fg, tree, :x2, verbose=true, recordcliq=true, async=true)
# wait for that task to finish
sleep(10)
getSolverParams(fg).limititers = 9
cliqtask = solveCliq!(fg, tree, :x0, verbose=true, recordcliq=true, async=true)
... |
Actually there is one more thing you can do:
using RoME
fg = generateCanonicalFG_Hexagonal()
tree, smt, hist = solveTree!(fg, recordcliqs=ls(fg));
printCliqHistorySequential(hist)
semicolon is recommended :-)
|
The most pressing developer tool here is better management of how CSM terminates on error. The two options I have come up with are (I am sure there are more):
see #758 |
reminder that the revamped |
Another trick that doesn't come up too often, but nonetheless:
getSolverParams(fg) |> typeof |> fieldnames |
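For anyone unfamiliar with that trick, here is a minimal sketch in plain Julia. `DemoParams` is a hypothetical stand-in I made up for illustration; it is not the real solver params type and its fields are only borrowed from the options mentioned in this thread.

```julia
# DemoParams is a hypothetical stand-in for the object returned by getSolverParams(fg)
Base.@kwdef mutable struct DemoParams
    dbg::Bool = false
    async::Bool = false
    limititers::Int = 500
end

params = DemoParams()

# same pipeline as in the comment above: object -> its type -> the field names
names = params |> typeof |> fieldnames

# print every option next to its current value
for n in names
    println(n, " = ", getfield(params, n))
end
```

The same loop should work on any struct, so it is a quick way to see which options exist before setting them.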
Also worth stating again here, the purpose of
# either from
hist4 = hist[4]
# or
hist4 = fetch(cliqtask4)
# the next function that would have been called at step 11
fnc11! = hist4[11][3]
csmc11_ = deepcopy(hist4[11][4])
# must reset the condition
getSolveCondition(csmc11_.cliq) = Condition()
# now redo that step (likely with debugger)
fnc12_ = fnc11!(csmc11_)
# note the csmc11_ object will now be changed after fnc11!
## The design intends
fnc13_ = fnc12_(csmc11_)
# ... |
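To make the replay idea concrete, here is a toy sketch in plain Julia, not the IIF implementation. It assumes only what the snippet above suggests: each state function mutates a data object and returns the next state function, and the history keeps a deep copy of the data taken before each step. `ToyData`, `stateA`, `stateB` and `runWithHistory` are all hypothetical names, and the tuple layout here is shorter than the real history entries.

```julia
# hypothetical data object carried through the state machine
mutable struct ToyData
    count::Int
end

# each state mutates the data and returns the next state (or nothing to stop)
stateA(d) = (d.count += 1; stateB)
stateB(d) = (d.count *= 2; d.count > 10 ? nothing : stateA)

# run the machine, recording (step, next function, snapshot of data before the step)
function runWithHistory(start, data)
    hist = Tuple{Int,Function,ToyData}[]
    next = start
    step = 0
    while next !== nothing
        step += 1
        push!(hist, (step, next, deepcopy(data)))
        next = next(data)
    end
    return hist
end

hist = runWithHistory(stateA, ToyData(0))

# replay step 4 in isolation, working on a copy so the history stays intact
fnc4 = hist[4][2]
data4 = deepcopy(hist[4][3])
fnc5 = fnc4(data4)
```

Because the replay works on a `deepcopy`, the same step can be repeated as often as needed, for example under a debugger.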
added a new tool
|
Snapshot example of hex init script: |
Commits 426e98f, 1dbe967, 030d75b add a new swim lane type CSM printout:
julia> IIF.printCSMHistoryLogical(hists)
| x0 | x4 | l1 | x1 | x5 | x3
----+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------
1 | 1 testCliqC null | 10 testCliqC null | 21 testCliqC null | 32 testCliqC null | 43 testCliqC null | 54 testCliqC null
2 | 2 testCliqC null | 11 testCliqC null | 22 testCliqC null | 33 testCliqC null | 44 testCliqC null | 55 testCliqC null
3 | 3 isCliqUpS null | 12 isCliqUpS null | 23 isCliqUpS null | 34 isCliqUpS null | 45 isCliqUpS null | 56 isCliqUpS null
4 | 4 buildCliq null | 13 buildCliq null | 24 buildCliq null | 35 buildCliq null | 46 buildCliq null | 57 buildCliq null
5 | 5 canCliqMa null | 14 canCliqMa null | 25 canCliqMa null | 36 canCliqMa null | 47 canCliqMa null | 58 canCliqMa null
6 | 6 blockUnti null | 15 blockUnti null | 26 blockUnti null | 37 blockUnti null | 48 blockUnti null | 59 blockUnti null
7 | 7 trafficRe null | 16 trafficRe null | 27 trafficRe null | 38 trafficRe null | 49 trafficRe null | 60 trafficRe null
8 | 8 determine null | 17 checkIfCl null | 28 checkIfCl null | 39 checkIfCl null | 50 checkIfCl null | 61 checkIfCl null
9 | 9 childCliq null | 18 blockCliq null | 29 blockCliq null | 40 blockCliq null | 51 blockCliq null | 62 blockCliq null
10 | 103 trafficRe null | 19 determine null | 30 determine null | 41 determine null | 52 determine null | 63 determine null
11 | 104 determine null | 20 childCliq null | 31 towardUpO null | 42 towardUpO null | 53 towardUpO null | 64 towardUpO null
12 | 105 towardUpO null | 85 trafficRe null | 65 attemptCl null | 68 attemptCl null | 71 attemptCl null | 74 attemptCl null
... |
as of FSM v0.2.8 a new |
Just copying a recent example here for the record:
using Caesar, RoME
fg = generateCanonicalFG_Hexagonal(graphinit=false)
getSolverParams(fg).treeinit = true
getSolverParams(fg).graphinit = false
getSolverParams(fg).limititers = 100
getSolverParams(fg).drawtree = true
getSolverParams(fg).showtree = true
getSolverParams(fg).dbg = true
getSolverParams(fg).async = true
# limitcliqs = [:x0=>8;:x4=>12;:l1=>21;:x1=>21;:x5=>50;:x3=>50] # breaks
# limitcliqs = [:x0=>8;:x4=>13;:l1=>21;:x1=>21;:x5=>60;:x3=>60] # 50 # doesnt break, blocks
# injectDelayBefore=[2=>(canCliqMargRecycle_StateMachine=>5); ] # step 8
# injectDelayBefore=[5=>(canCliqMargRecycle_StateMachine=>5); ]
# injectDelayBefore=[6=>(towardUpOrDwnSolve_StateMachine=>10); ]
# injectDelayBefore = nothing
mkpath(getLogPath(fg))
verbosefid = open(joinLogPath(fg, "csmVerbose.log"),"w")
# verbosefid = stdout
tree, smt, hists = solveTree!(fg, recordcliqs=ls(fg), verbose=true, verbosefid=verbosefid, timeout=50 ) #, timeout=40 , injectDelayBefore=injectDelayBefore ) #, limititercliqs=limitcliqs);
flush(verbosefid)
close(verbosefid)
# for .async = true (because .drawTree=true)
smt[7] |> x->schedule(x, InterruptException(), error=true)
open(joinLogPath(fg, "csmLogicalReconstructMax.log"),"w") do io
IIF.reconstructCSMHistoryLogical(getLogPath(fg), fid=io)
end
# async case
fetchCliqHistoryAll!(smt, hists)
open(joinLogPath(fg, "csmSequ.log"),"w") do fid
printCliqHistorySequential(hists, nothing, fid)
end
open(joinLogPath(fg, "csmLogi.log"),"w") do fid
printCSMHistoryLogical(hists, fid)
end
# printCliqHistorySequential(hists)
# printCliqHistorySequential(hists, 1=>10)
# printCliqHistorySequential(hists, [1,4,6]=>11:15)
# printCliqHistorySequential(hists, [1=>9:16; 2=>20:34; 4=>29:34])
# printCliqHistorySequential(hists, [5=>12:21;6=>12:21])
printCSMHistoryLogical(hists)
# also see dbg logs at this path for more info
# @show getLogPath(fg)
using Images
csmAnimateSideBySide(tree, hists, encode=true, nvenc=true, show=true) |
Enable clique history recording and provide smtasks, then retrieve with
smtasks = Task[]
solveTree!(fg; smtasks, verbose=true, timeout=10, recordcliqs=ls(fg));
hists = fetchCliqHistoryAll!(smtasks);
We can automate it further in the future. |
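The pattern above (pass in an empty task vector, let the solver fill it, fetch the results afterwards) can be sketched in plain Julia. This is only an illustration under my own assumptions: `runWorkers!` is a hypothetical stand-in for the solver call, and the shape of the returned histories is invented. The `f(; smtasks)` keyword shorthand needs Julia 1.5 or newer.

```julia
# hypothetical stand-in for a solver that spawns one task per clique
function runWorkers!(n; smtasks::Vector{Task}=Task[])
    for i in 1:n
        t = @async begin
            # each worker returns its own "history": a vector of (step, label)
            [(s, "worker$(i)_step$(s)") for s in 1:i]
        end
        push!(smtasks, t)
    end
    return nothing
end

# the caller owns the task vector, so it keeps a handle on every worker
smtasks = Task[]
runWorkers!(3; smtasks)

# fetch blocks until each task is done and returns its result
hists = Dict(i => fetch(t) for (i, t) in enumerate(smtasks))
```

Keeping the task vector on the caller's side is what makes it possible to inspect or interrupt the workers even when the call itself never returns cleanly.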
Will the internal IncrementalInference.jl/src/SolverAPI.jl Line 533 in 8bede36
|
For now, it has to be called after the solve with I don't know about the commented out call. |
Everything logged with |
Another useful workflow for debugging CSM freezes is:
getSolverParams(fg).async = true
smtasks = Task[]
solveTree!(fg; smtasks)
Then if it hangs call:
IIF.throwIntExcToAllTasks(smtasks) |
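As a rough illustration of what interrupting hung tasks looks like in plain Julia, here is a sketch that uses the same `schedule(task, InterruptException(), error=true)` call that appears earlier in this thread. It is not the IIF implementation; `interruptAll` is a hypothetical helper and the blocked tasks are simulated with a `Condition` that nobody notifies.

```julia
# simulate hung CSM tasks: each one blocks on a Condition that is never notified
cond = Condition()
smtasks = Task[]
for i in 1:3
    t = @async wait(cond)
    push!(smtasks, t)
end
sleep(0.1)  # give the tasks time to start and block

# hypothetical helper: raise InterruptException inside every unfinished task
function interruptAll(tasks)
    for t in tasks
        istaskdone(t) || schedule(t, InterruptException(), error=true)
    end
    return nothing
end

interruptAll(smtasks)

# wait for the tasks to unwind; wait rethrows the failure, which we ignore here
for t in smtasks
    try
        wait(t)
    catch
    end
end
```

After this every task should be in the failed state, so the REPL is free again and the recorded histories or logs can be inspected.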
|
EDIT, some of these code snippets are out of date.
Console
solveTree! will always generate content at getLogPath(fg) location. getSolverParams(fg).dbg
.recordcliqs=[:x1;:x2;...] will save the CSM history to RAM.
Animate CSM Video
Can also make concurrent video of CSM at work:
Also see csmAnimate's own documentation.
EDIT
new Caesar.writevideo function directly engages with ffmpeg, which is easier:
https://github.com/JuliaRobotics/Caesar.jl/blob/b514d18cbb6cc3b5557f67617ed08918c4260a26/src/images/images.jl#L6-L50
Also see example of debug process in #754