Incremental data-store generation #3202
Conversation
Force-pushed: 2dbda8c → fa35518
Force-pushed: 3b73f44 → 8235d2d
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #3202      +/-   ##
==========================================
+ Coverage   83.51%   83.62%   +0.11%
==========================================
  Files         162      162
  Lines       17900    18004     +104
  Branches     3899     3929      +30
==========================================
+ Hits        14949    15056     +107
+ Misses       2951     2948       -3
```
Continue to review full report at Codecov.
Force-pushed: 8235d2d → ce798a5
A minor comment on style.
In general, it would be nice to have a few more comments on the complex calculations being introduced here.
I guess you'll also add some tests for the change as well?
I second that: a comment wherever the purpose of a code block is not obvious by inspection (to someone who did not write it!).
Approach seems OK.
A few quick points on efficiency, which is quite important in this code given the load it will put on the workflow; none of these comments are really blockers though.
@oliver-sanders, @matthewrmshin Here's the missing piece of the puzzle: this is protobuf world, so the rules are a little different... (fortunately/unfortunately)
Another thing: I think I may need to change the approach somewhat... I'm a little worried about the edge case where suites leave the initial cycle point hanging around... So I may need to gather all the cycles in the pool, and prune and/or generate edges and nodes based on that... Thoughts?
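For concreteness, a minimal sketch of that gather-and-diff idea (all names here are hypothetical, not the actual cylc-flow attributes; real cycle points are cylc point objects, not strings):

```python
def diff_points(pool_points, store_points):
    """Set differences between the task pool and the data store."""
    to_generate = pool_points - store_points  # cycles newly in the pool
    to_prune = store_points - pool_points     # cycles that left the pool
    return to_generate, to_prune

# e.g. cycle 1 has left the pool, cycle 4 entered via runahead:
print(diff_points({"2", "3", "4"}, {"1", "2", "3"}))
# ({'4'}, {'1'})
```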
Not sure; the current GUI will display the task in the initial cycle, attaching it to the active tasks via a scissor node.
Found a flaw in this approach...
The graph edges can't be generated incrementally; the start point would have to be the ICP (although there may be some logic in there somewhere to help with this)... It would be nice to be able to continue the edge generation from where it stopped previously. Will keep looking into this...
Looking at the old GUI, it used to ask for the entire graph: cylc-flow/lib/cylc/gui/updater_graph.py line 284 at 0cc4b46.
It made a call to …, using the oldest cycle point from the …
Although in the case of ICP offset graphs without ICP dependencies, this could still be wrong, because your start point would act as the ICP... So the way forward might be to: …
Then I'll test the ICP offset scenario.
OK, I think I have a solution; in the … Quite a simple change, and it appears to be quite efficient.
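If the "simple change" is along these lines, deriving the edge-generation window from the pool instead of hard-coding the ICP might look like this toy version (integer points for simplicity; not the actual code):

```python
def edge_window(pool_points):
    """Start/stop cycle points for (re)generating graph edges.

    Starting from the oldest point still in the pool keeps ICP
    dependencies alive while the ICP remains, and lets generation
    move forward once it is gone.
    """
    return min(pool_points), max(pool_points)

print(edge_window({2, 3, 4}))  # (2, 4)
print(edge_window({1, 2}))     # (1, 2) -- ICP still in the pool
```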
Force-pushed: 31e7197 → a79aa6c
OK, significant improvements made:
This change means the "edge" case of ICP dependencies is now handled correctly. Overall I'm pretty happy with this update scheme; it will be way more efficient than before (even though a lot of logic has been added), because it only generates, updates, and/or prunes what's needed, when it's needed.
Next: add functional tests. There are a few other PRs to follow this one: …
Anyway, input/responses welcome.
Force-pushed: a79aa6c → e877f77
Force-pushed: f6f0db4 → cef34d5
Force-pushed: cef34d5 → e806040
Just need to quickly see if I can debug it and watch the pool being re-populated instead of re-created, but I've finished reviewing the code.
Great work adding the exclusions for coverage/analysis of the protobuf-generated files. And great work on the tests too! 🎉
Left some comments for typos and a question, but none of them are blockers for merging, and they can be fixed afterwards.
Tested with the sibling cylc-uiserver PR; at least for Cylc UI it's transparent, and everything works the same as before (but must be better/faster now).
Thanks!
Oh, a silly question, but should we be logging the traffic between client & server via zmq? The messages are hard to read and seem to pollute the logs more than they help, e.g. … Maybe if the message was nicely formatted it could be helpful? I'm running the suite with …
I wouldn't do it... it's in protobuf-serialised format... that's a dump of the data into the log... If anything we could print some metadata about the request/response, but definitely not the data.
Thanks @dwsutherland. I'm not sure where the logging is being done, but we can look into that later (I just wanted to check with others here, as I would otherwise definitely forget it).
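A rough sketch of the metadata-only alternative mentioned above (the function and logger name are hypothetical; `ByteSize()` is part of the generated protobuf Python API):

```python
import logging

LOG = logging.getLogger("suite")

def log_reply(endpoint, pb_msg):
    """Log what was sent, not the serialised payload itself."""
    LOG.debug(
        "zmq reply: endpoint=%s type=%s size=%dB",
        endpoint, type(pb_msg).__name__, pb_msg.ByteSize(),
    )
```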
@dwsutherland great work!
I set a breakpoint in `scheduler.update_data_structure`, and later was taken to `WsDataMgr.increment_graph_elements`.
From what I could understand, you are basically calculating the differences twice: first the difference of what was added to the `Scheduler.pool` and now needs to be added to the pool used for workflows, and then finally the difference for the set of tasks pruned, which are removed from the workflow pool too.
That sets `has_updated` to `True`, and when that happens `WsDataMgr.update_workflow` is called, where it copies the current `Workflow` (used by resolvers to send the GraphQL query response). I didn't understand why you had to copy the `Workflow` rather than simply update its fields, but there must be a reason, and it seemed to work fine, so approved! 🎉
And a photo of the debugger in action [screenshot]: here is the second time `scheduler.run` was executed, after a runahead task was added to the Scheduler pool and now needs to be incrementally added to the workflow.
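In other words, something like this double set-difference (illustrative names only, not the actual method bodies). One plausible reason for copying the `Workflow` rather than mutating it in place is that resolvers then always read a complete snapshot, never a half-updated message; protobuf's real `CopyFrom` call makes that snapshot cheap:

```python
def increment(pool_task_ids, store_task_ids):
    """Diff the scheduler pool against the data-store pool, both ways."""
    added = pool_task_ids - store_task_ids   # generate these elements
    pruned = store_task_ids - pool_task_ids  # prune these elements
    return added, pruned, bool(added or pruned)

added, pruned, has_updated = increment({"a.1", "b.1"}, {"a.1", "c.1"})
print(added, pruned, has_updated)  # {'b.1'} {'c.1'} True
# if has_updated: snapshot.CopyFrom(workflow)  # protobuf message copy
```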
- Unit tests added
- Sorting query arg ID delimiter var
- Functional tests added
- Checksum to stamp
- Re-initialise data-store on reload
- Use external id with store keys
- Graph sort bug fix
This is only done in … Perhaps the time has come to turn it off, now the comms layer is tried and tested.
Force-pushed: 3743a2b → 2354f89
Some profile results showing the …
So we can see the Cylc-8: …
So the … giving a grand total of … (as expected, with there being only one cycle point included in the data store)… wins: 57% faster!
Note: simulation runs don't include the job pool... However, that would just be more memory and a list comprehension on updated tasks (so maybe …)
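For anyone wanting to reproduce this kind of comparison, a generic way to profile a run with the standard library (the workload below is just a stand-in for a simulation-mode suite, not the actual benchmark):

```python
import cProfile
import pstats

def workload():
    # stand-in for a simulation-mode suite run
    sum(i * i for i in range(10 ** 6))

cProfile.run("workload()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```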
OK, all good now 👍
That seems to be four approvals 🥇
For the record, master (without state-summary) was way slower before this merge: …
Incremental data-store generation wins (as expected).
The current data-store generation is inefficient (as was the old `state_summary_mgr.py`), in that it generates an entirely new data-store on every loop where any change has been flagged.
The update/creation has been granulated into the following actions/methods: … `point_strings`. These are used within the `scheduler` in implementation with: …
There was a bug in the job pool preventing job element deletion; this has also been fixed.
This change has the following impacts: …
Closes #3261
Closes #3212
Requirements check-list:
- I have read CONTRIBUTING.md and added my name as a Code Contributor.