
Incremental data-store generation #3202

Merged: 4 commits, Aug 20, 2019

Conversation

@dwsutherland (Member) commented Jun 29, 2019

The current data-store generation is inefficient (as was the old state_summary_mgr.py), in that it generates an entirely new data-store on every loop iteration in which any change has been flagged.
The update/creation has been broken down into the following actions/methods:

  1. Creation of workflow/taskdef/familydef data elements.
  2. Generation of graph edges and ghost-nodes/ghost-instances/ghost-proxies (containing def data and tree relationships) for each new cycle point in the pool.
  3. Update/population of [ghost] task instance fields with dynamic data (i.e. status info) for given pool tasks.
  4. Update/population of [ghost] family instance fields with dynamic data (i.e. status info) according to given cycle points (i.e. those cycle points of the updated tasks).
  5. Update/population of workflow (data element) fields with dynamic data (i.e. status info).
  6. Pruning of data-store elements (edges/nodes etc.) and of items in fields, on or with reference to a given list of point strings.

These are used within the scheduler as follows (see the sketch after this list):

  • Initiation of the data-store on run/restart/reload, using (1)-(5).
  • Incrementing of graph elements and pruning of old elements using (2), (5), and (6), worked out from the head and tail of the task pool respectively (the workflow element is updated afterwards with new state totals etc.).
  • Population/updating of the dynamic fields of task/family instances and then of the workflow element ((3), (4), then (5)), for tasks flagged as updated by the workflow.
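A rough sketch of how this might look in the scheduler's main loop (the ws_data_mgr method names are real, as seen in the profiling further down, but the pool access and update-flag plumbing here are illustrative assumptions rather than the actual implementation):

def update_data_structure(schd):
    # (2) + (6): generate elements for any new cycle points in the pool,
    # and prune points that have dropped out of it
    schd.ws_data_mgr.increment_graph_elements()
    # tasks flagged as updated since the last loop iteration
    updated = [t for t in schd.pool.get_all_tasks() if t.state.is_updated]
    if updated:
        # (3): dynamic fields of the updated task instances
        schd.ws_data_mgr.update_task_proxies(updated)
        # (4): family instances at the cycle points of those tasks
        schd.ws_data_mgr.update_family_proxies(
            sorted({str(t.point) for t in updated}))
        # (5): workflow-level fields (state totals etc.) last
        schd.ws_data_mgr.update_workflow()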

There was also a bug in the job pool preventing job element deletion; this has been fixed.

This change has the following impacts:

  • A significant reduction of load on the workflow, mainly from not having to generate the entire graph on every change (and from fixing that job deletion bug).
  • The data-store is used in both the WS and the UI Server as the driving data for GraphQL, so this change lays the groundwork for implementing an incremental data sync between the WS and UIS data stores.

Closes #3261
Closes #3212

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Appropriate tests are included (unit and/or functional).
  • No change log entry required (e.g. change is small or internal only).
  • No documentation update required for this change.

@dwsutherland force-pushed the increment-data-store branch from 2dbda8c to fa35518 on June 29, 2019
@dwsutherland force-pushed the increment-data-store branch 2 times, most recently from 3b73f44 to 8235d2d on June 29, 2019
@codecov bot commented Jun 29, 2019

Codecov Report

Merging #3202 into master will increase coverage by 0.11%.
The diff coverage is 98.26%.


@@            Coverage Diff             @@
##           master    #3202      +/-   ##
==========================================
+ Coverage   83.51%   83.62%   +0.11%     
==========================================
  Files         162      162              
  Lines       17900    18004     +104     
  Branches     3899     3929      +30     
==========================================
+ Hits        14949    15056     +107     
+ Misses       2951     2948       -3
Impacted Files Coverage Δ
cylc/flow/job_pool.py 84.5% <100%> (+1.4%) ⬆️
cylc/flow/scheduler.py 82.09% <100%> (+0.04%) ⬆️
cylc/flow/config.py 84.58% <75%> (-0.13%) ⬇️
cylc/flow/ws_data_mgr.py 95.89% <99.04%> (+1.29%) ⬆️
cylc/flow/task_pool.py 89.02% <0%> (-0.51%) ⬇️
cylc/flow/task_job_mgr.py 84.88% <0%> (-0.42%) ⬇️
cylc/flow/cycling/iso8601.py 87.87% <0%> (+0.19%) ⬆️
cylc/flow/cycling/__init__.py 76.47% <0%> (+0.53%) ⬆️
cylc/flow/loggingutil.py 92.17% <0%> (+0.86%) ⬆️
... and 1 more


@matthewrmshin (Contributor) left a comment:

A minor comment on style.

In general, it would be nice to have a bit more comments on the complex calculations being introduced here.

I guess you'll also add some tests for the change?

@hjoliver (Member) commented Jul 9, 2019

In general, it would be nice to have a bit more comments on the complex calculations being introduced here.

I second that: a comment wherever the purpose of a code block is not obvious by inspection (to someone who did not write it!).

@oliver-sanders (Member) left a comment:

Approach seems OK.

A few quick points on efficiency, which is quite important in this code due to the load it will put on the workflow; none of these comments are really blockers though.

@dwsutherland (Member, Author) commented Jul 10, 2019

@oliver-sanders, @matthewrmshin

Here's the missing piece of the puzzle:
https://developers.google.com/protocol-buffers/docs/reference/python-generated#repeated-message-fields

This is protobuf world, so the rules are a little different (fortunately/unfortunately).
I'll explore a little more to see if there's anything beyond what's in the docs we can do, but I'm not sure these repeated fields behave like "normal" lists (etc.).
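For reference, a minimal sketch of the repeated-message-field rules from those docs, using a standard well-known type rather than Cylc's own messages (the semantics shown are the documented protobuf ones, nothing Cylc-specific):

from google.protobuf.struct_pb2 import ListValue, Value

lv = ListValue()
# lv.values = [Value()]  # not allowed: assigning to a repeated field raises AttributeError
val = lv.values.add()    # append a new empty Value and get a reference to it
val.string_value = 'foo'
lv.values.extend([Value(number_value=1)])  # copy existing messages in
del lv.values[:]         # clearing is done via slice deletion

So mutation is via add/extend/slice-deletion rather than plain list assignment.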

@dwsutherland (Member, Author) commented

Another thing: I think I may need to change the approach somewhat. I'm a little worried about the edge case where suites leave the initial cycle point hanging around, so I may need to gather all the cycles in the pool and prune and/or generate edges and nodes based on that.

Thoughts?

@oliver-sanders (Member) commented

Thoughts?

Not sure; the current GUI will display the task in the initial cycle, attaching it to the active tasks via a scissor node.

@matthewrmshin matthewrmshin added this to the cylc-8.0a1 milestone Jul 16, 2019
@dwsutherland dwsutherland mentioned this pull request Jul 17, 2019
@dwsutherland (Member, Author) commented Jul 19, 2019

Found a flaw in this approach...
Graph definitions often:

  • do things at the initial cycle point (including future dependencies on it)
  • have offsets from the ICP, e.g. [[[+P1D/PT6H]]]

So the graph edges can't be generated incrementally; the start point would have to be the ICP (although there may be some logic in there somewhere to help with this). It would be nice to be able to continue the edge generation from where it stopped previously.

Will keep looking into this...

@dwsutherland (Member, Author) commented Jul 19, 2019

Looking at the old GUI, it used to ask for the entire graph:

def update_gui(self):

It made a call to get_graph_raw, i.e. (7.8.x):

curl -v -s -u cylc:$(cat /niwa/test/ecox_test/cylc-run/nztide/.service/passphrase) --digest --cookie-jar cookietemp --anyauth --insecure 'https://ec-cylc01.kupe.niwa.co.nz:43089/get_graph_raw?start_point_string=20190718T00&stop_point_string=20190719T00'

Using the oldest cycle point from get_latest_state, i.e. (7.8.x):

$ curl -v -s -u cylc:$(cat /niwa/test/ecox_test/cylc-run/nztide/.service/passphrase) --digest --cookie-jar cookietemp --anyauth --insecure 'https://ec-cylc01.kupe.niwa.co.nz:43089/get_latest_state'

Although in the case of ICP-offset graphs without ICP dependencies, this could still be wrong, because your start point would act as the ICP.

So the way forward might be to:

  • generate edges from the oldest cycle in the pool (like the old GUI)
  • only add edges whose source or target is in the pool.

Then I'll test the ICP offset scenario.

@dwsutherland (Member, Author) commented Jul 19, 2019

OK, I think I have a solution: in the get_graph_edges method (config.py) I set it to always use the ICP as the reference point. This is used along with an offset to obtain the correct source cycle point, and the sequence point only needs to begin at the desired/given start point.

Quite a simple change, and it appears to be quite efficient.
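Roughly the idea (a sketch only, with assumed sequence/offset helper names; the real get_graph_edges logic in config.py is more involved):

edges = []
# iterate sequence points from the requested start point, but keep offsets
# anchored to the ICP-based sequence so they resolve the same way
# regardless of where iteration begins
point = sequence.get_first_point(start_point)
while point is not None and point <= stop_point:
    for left, right, offset in edge_defs:
        # the source point is derived via its offset, so starting
        # mid-graph no longer shifts ICP-offset sources
        source_point = point - offset if offset else point
        edges.append(((left, source_point), (right, point)))
    point = sequence.get_next_point(point)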

@dwsutherland force-pushed the increment-data-store branch 2 times, most recently from 31e7197 to a79aa6c on July 20, 2019
@dwsutherland (Member, Author) commented Jul 20, 2019

OK, significant improvements made:

  • Graph edge generation now uses the ICP as the relative point for ICP offsets (IMO broken in the old get_graph_raw).
  • Pruning by cycle point where neither the source nor target point of an edge is in the task pool point set.
  • Other changes: xtriggers now show in the edge source, data element updates are more robust, job element deletion is done in the data manager pruning (for availability with non-task-pool tasks/nodes), etc.
  • More comments made throughout the code.

This change means the "edge" case of ICP dependencies is now handled correctly.

Overall I'm pretty happy with this update scheme; it will be far more efficient than before (even though a lot of logic has been added), because it only generates, updates, and/or prunes what's needed, when it's needed.

Next: Add functional tests
(this change is quite urgent, but I'll try to get this done quickly)

There are a few other PRs to follow this one:

  • Add an insert_absent option to the trigger and reset state commands, because for ghosts or other nodes the user won't know they aren't in the pool (should nodes not in the pool revert to ghost state? i.e. for suicides).
  • Get more information about xtriggers into the data store/schema, in particular treating suite_state xtriggers and polling tasks as real edges (for a multi-workflow graph view etc.).

Anyway, input/responses welcome.

@oliver-sanders (Member) left a comment:

I'm not in the best position to approve this, never having worked with the graph generation logic; however, at a cursory glance all looks well.

I haven't profiled the generation methods to see the speedup as that's a fairly lengthy piece of work.


@kinow (Member) left a comment:

I just need to quickly see if I can debug it and watch the pool being re-populated instead of re-created, but I have finished reviewing the code.

Great work adding the exclusions for coverage/analysis of the protobuf generated files. And great work on the tests too! 🎉

Left some comments for typos and a question, but none of them are blockers for merging, and they can be fixed afterwards.

Tested with the sibling PR of cylc-uiserver, and at least for the Cylc UI it's transparent: everything works the same as before (but should be better/faster now).

Thanks!


@kinow (Member) commented Aug 16, 2019

Oh, a silly question, but should we be logging the traffic between client and server via zmq? The messages are hard to read and seem to pollute the logs more than help, e.g.

[screenshot: protobuf-serialised zmq messages cluttering the suite log]

Maybe if the message were nicely formatted it could be helpful? I'm running the suite with cylc run --no-detach --verbose --debug five, as I normally do.

@dwsutherland (Member, Author) commented

Oh, a silly question, but should we be logging the traffic between client & server via zmq? The messages are hard to read and seem to pollute the logs more than help, e.g.

I wouldn't do it; it's in protobuf serialised format, so that's a dump of the data into the log. If anything we could print some metadata about the request/response, but definitely not the data.
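e.g. something like this (hypothetical, assuming LOG is the suite logger; just to illustrate metadata-only logging):

# log the endpoint and payload size rather than the serialised protobuf body
LOG.debug('zmq:send %s (%d bytes)', endpoint, len(message))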

@kinow
Copy link
Member

kinow commented Aug 16, 2019

Thanks @dwsutherland. I'm not sure where the logging is being done, but we can look into that later (I just wanted to check with others here, as I would otherwise definitely forget it).

@dwsutherland dwsutherland requested a review from kinow August 16, 2019 03:30
@kinow (Member) left a comment:

@dwsutherland great work!

I set a breakpoint in scheduler.update_data_structure, and later was taken into WsDataMgr.increment_graph_elements.

From what I could understand, you are basically calculating the differences twice: first the difference of what was added to the Scheduler.pool and now needs to be added to the pool used for workflows, and then the set of tasks pruned, which are removed from the workflow pool too.

That sets has_updated to True; when that happens, WsDataMgr.update_workflow is called, where it copies the current Workflow (used by resolvers to send the GraphQL query response). I didn't understand why you had to copy the Workflow rather than simply update its fields, but there must be a reason, and it seemed to work fine, so approved! 🎉
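In other words, the incremental step boils down to set arithmetic on cycle points, something like (names assumed, not the actual code):

old_points = set(data_mgr.pool_points)
new_points = {str(t.point) for t in schd.pool.get_all_tasks()}
to_add = new_points - old_points    # points needing new ghost nodes/edges
to_prune = old_points - new_points  # points to prune from the data store
has_updated = bool(to_add or to_prune)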

And a photo of the debugger in action: here the second iteration of scheduler.run was executing, after a runahead task was added to the Scheduler pool and now needs to be incrementally added to the workflow.


Commits pushed:

  • Unit tests added
  • Sorting query arg
  • ID delimiter var
  • Functional tests added
  • checksum to stamp
  • re-initialise data-store on reload
  • Use external id with store keys
  • graph sort bug fix
@oliver-sanders (Member) commented

Oh, a silly question, but should we be logging the traffic between client & server via zmq? The messages are hard to read and seem to pollute the logs more than help,

This is only done in --debug mode.

Perhaps the time has come to turn it off now the comms layer is tried and tested.

@dwsutherland (Member, Author) commented Aug 19, 2019

Some profile results showing the data-store generation being faster than the state summary, for one cycle of a cut-down version of the complex suite:
Cylc-7: state-summary
Showing the relevant parts:

(cylc7) sutherlander@cortex-vbox:~$ /usr/bin/time -v cylc7 run --no-detach --profile --mode=simulation complex2
            ._.                                                       
            | |            The Cylc Suite Engine [7.8.3-46-gd66ae]    
._____._. ._| |_____.           Copyright (C) 2008-2019 NIWA          
| .___| | | | | .___|   & British Crown (Met Office) & Contributors.  
| !___| !_! | | !___.  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____!  This program comes with ABSOLUTELY NO WARRANTY;
      .___! |          see `cylc warranty`.  It is free software, you 
      !_____!           are welcome to redistribute it under certain  
PROFILE: Memory: 39724 KiB: scheduler.py: start configure
PROFILE: Memory: 39728 KiB: scheduler.py: before load_suiterc
PROFILE: Memory: 39728 KiB: config.py:config.py: start init config
PROFILE: Memory: 39728 KiB: config.py: before RawSuiteConfig init
PROFILE: Memory: 43412 KiB: config.py: after RawSuiteConfig init
PROFILE: Memory: 43412 KiB: config.py: before get(sparse=True
PROFILE: Memory: 43412 KiB: config.py: after get(sparse=True)
PROFILE: Memory: 43412 KiB: config.py: before _expand_runtime
PROFILE: Memory: 44168 KiB: config.py: after _expand_runtime
PROFILE: Memory: 44168 KiB: config.py: before compute_family_tree
PROFILE: Memory: 44180 KiB: config.py: after compute_family_tree
PROFILE: Memory: 44180 KiB: config.py: before inheritance
PROFILE: Memory: 46048 KiB: config.py: after inheritance
PROFILE: Memory: 46048 KiB: config.py: before get(sparse=False)
PROFILE: Memory: 47848 KiB: config.py: after get(sparse=False)
PROFILE: Memory: 47992 KiB: config.py: before load_graph()
PROFILE: Memory: 50104 KiB: config.py: after load_graph()
PROFILE: Memory: 50668 KiB: config.py: end init config
PROFILE: Memory: 50824 KiB: scheduler.py: after load_suiterc
2019-08-19T11:10:14Z INFO - Suite server: url=https://cortex-vbox:43064/ pid=18497
2019-08-19T11:10:14Z INFO - Run: (re)start=0 log=1
2019-08-19T11:10:14Z INFO - Cylc version: 7.8.3-46-gd66ae
2019-08-19T11:10:14Z INFO - Run mode: simulation
2019-08-19T11:10:14Z INFO - Initial point: 20150219T0000Z
2019-08-19T11:10:14Z INFO - Final point: 20150219T0000Z
PROFILE: Memory: 51892 KiB: scheduler.py: before load_tasks
2019-08-19T11:10:14Z INFO - Cold Start 20150219T0000Z
PROFILE: Memory: 54516 KiB: scheduler.py: after load_tasks
PROFILE: Memory: 55024 KiB: scheduler.py: end configure
PROFILE: Memory: 55028 KiB: scheduler.py: begin run while loop
.
.
.
2019-08-19T11:10:15Z INFO - PROFILE: scheduler loop dt (s): 1: 0.193 5: 0.193 15: 0.193
2019-08-19T11:10:15Z INFO - PROFILE: CPU %: 1: 81.5 5: 81.5 15: 81.5
PROFILE: Memory: 56440 KiB: scheduler.py: loop #0: 2019-08-19T11:10:15Z
.
.
.
2019-08-19T11:11:15Z INFO - PROFILE: scheduler loop dt (s): 1: 0.036 5: 0.039 15: 0.039
2019-08-19T11:11:15Z INFO - PROFILE: CPU %: 1: 16.4 5: 17.5 15: 17.5
PROFILE: Memory: 56968 KiB: scheduler.py: loop #60: 2019-08-19T11:11:15Z
.
.
.
2019-08-19T11:12:15Z INFO - PROFILE: scheduler loop dt (s): 1: 0.046 5: 0.043 15: 0.043
2019-08-19T11:12:15Z INFO - PROFILE: CPU %: 1: 6.8 5: 12.2 15: 12.2
PROFILE: Memory: 58284 KiB: scheduler.py: loop #120: 2019-08-19T11:12:15Z
.
.
.
2019-08-19T11:13:15Z INFO - PROFILE: scheduler loop dt (s): 1: 0.037 5: 0.041 15: 0.041
2019-08-19T11:13:15Z INFO - PROFILE: CPU %: 1: 5.7 5: 10.0 15: 10.0
PROFILE: Memory: 58552 KiB: scheduler.py: loop #180: 2019-08-19T11:13:15Z
2019-08-19T11:13:20Z INFO - [archive_logs.20150219T0000Z] status=running: (received)succeeded at 2019-08-19T11:13:20Z for job(00)
PROFILE: Memory: 58552 KiB: scheduler.py: end main loop (total loops 185): 2019-08-19T11:13:20Z
2019-08-19T11:13:20Z INFO - Suite shutting down - AUTOMATIC
2019-08-19T11:13:21Z INFO - DONE
         3482314 function calls (3396961 primitive calls) in 186.486 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.045    0.045  185.679  185.679 /home/sutherlander/repos/hilary/cylc-flow/lib/cylc/scheduler.py:1576(run)
      187  175.382    0.938  175.382    0.938 {time.sleep}

      187    0.027    0.000    1.498    0.008 /home/sutherlander/repos/hilary/cylc-flow/lib/cylc/scheduler.py:1642(update_state_summary)

       41    0.169    0.004    1.018    0.025 /home/sutherlander/repos/hilary/cylc-flow/lib/cylc/state_summary_mgr.py:50(update)

       41    0.047    0.001    0.406    0.010 /home/sutherlander/repos/hilary/cylc-flow/lib/cylc/state_summary_mgr.py:172(_get_tasks_info)
    16585    0.147    0.000    0.339    0.000 /home/sutherlander/repos/hilary/cylc-flow/lib/cylc/task_proxy.py:314(get_state_summary)

    45264    0.081    0.000    0.276    0.000 /home/sutherlander/repos/hilary/cylc-flow/lib/cylc/state_summary_mgr.py:138(<genexpr>)
	Command being timed: "cylc7 run --no-detach --profile --mode=simulation complex2"
	User time (seconds): 9.99
	System time (seconds): 2.70
	Percent of CPU this job got: 6%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:10.36
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 58684
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 455107
	Voluntary context switches: 2147
	Involuntary context switches: 2273
	Swaps: 0
	File system inputs: 0
	File system outputs: 68984
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

So we can see the state_summary update method running for 1.018 s in total, with a significant amount of the time due to looping over all the pool tasks calling get_state_summary (as expected).

Cylc-8: data-store
Showing the relevant parts:

(cylc-flow) sutherlander@cortex-vbox:~$ /usr/bin/time -v cylc run --no-detach --profile --mode=simulation complex2
            ._.                                                       
            | |                 The Cylc Suite Engine [8.0a0]         
._____._. ._| |_____.           Copyright (C) 2008-2019 NIWA          
| .___| | | | | .___|   & British Crown (Met Office) & Contributors.  
| !___| !_! | | !___.  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____!  This program comes with ABSOLUTELY NO WARRANTY.
      .___! |              It is free software, you are welcome to    
      !_____!             redistribute it under certain conditions;   
                        see `COPYING' in the Cylc source distribution. 
                                                                       
PROFILE: Memory: 43000 KiB: scheduler.py: start configure
PROFILE: Memory: 43028 KiB: scheduler.py: before load_suiterc
PROFILE: Memory: 43028 KiB: config.py:config.py: start init config
PROFILE: Memory: 43028 KiB: config.py: before RawSuiteConfig init
2019-08-19T22:47:06+12:00 WARNING - deprecated graph items were automatically upgraded in 'suite definition':
2019-08-19T22:47:06+12:00 WARNING -  * (8.0.0) [scheduling][dependencies][X][graph] -> [scheduling][graph][X] - for X in:
	R1
	T00
	T00, T06, T12, T18
	T00, T12
	T06
	T06, T18
	T12
	T18
PROFILE: Memory: 46836 KiB: config.py: after RawSuiteConfig init
PROFILE: Memory: 46836 KiB: config.py: before get(sparse=True
PROFILE: Memory: 46836 KiB: config.py: after get(sparse=True)
PROFILE: Memory: 46836 KiB: config.py: before _expand_runtime
PROFILE: Memory: 47152 KiB: config.py: after _expand_runtime
PROFILE: Memory: 47152 KiB: config.py: before compute_family_tree
PROFILE: Memory: 47152 KiB: config.py: after compute_family_tree
PROFILE: Memory: 47152 KiB: config.py: before inheritance
PROFILE: Memory: 48076 KiB: config.py: after inheritance
PROFILE: Memory: 48076 KiB: config.py: before get(sparse=False)
PROFILE: Memory: 48996 KiB: config.py: after get(sparse=False)
PROFILE: Memory: 49040 KiB: config.py: before load_graph()
PROFILE: Memory: 50968 KiB: config.py: after load_graph()
PROFILE: Memory: 51172 KiB: config.py: end init config
PROFILE: Memory: 51348 KiB: scheduler.py: after load_suiterc
2019-08-19T10:47:08Z INFO - Suite server: url=tcp://cortex-vbox:43061/ pid=18133
2019-08-19T10:47:08Z INFO - Run: (re)start=0 log=1
2019-08-19T10:47:08Z INFO - Cylc version: 8.0a0
2019-08-19T10:47:08Z INFO - Run mode: simulation
2019-08-19T10:47:08Z INFO - Initial point: 20150219T0000Z
2019-08-19T10:47:08Z INFO - Final point: 20150219T0000Z
PROFILE: Memory: 52240 KiB: scheduler.py: before load_tasks
2019-08-19T10:47:08Z INFO - Cold Start 20150219T0000Z
PROFILE: Memory: 53760 KiB: scheduler.py: after load_tasks
PROFILE: Memory: 54220 KiB: scheduler.py: end configure
PROFILE: Memory: 54232 KiB: scheduler.py: begin run while loop
.
.
.
2019-08-19T10:47:08Z INFO - PROFILE: scheduler loop dt (s): 1: 0.231 5: 0.231 15: 0.231
2019-08-19T10:47:08Z INFO - PROFILE: CPU %: 1: 86.0 5: 86.0 15: 86.0
PROFILE: Memory: 59644 KiB: scheduler.py: loop #0: 2019-08-19T10:47:08Z
.
.
.
2019-08-19T10:48:09Z INFO - PROFILE: scheduler loop dt (s): 1: 0.030 5: 0.033 15: 0.033
2019-08-19T10:48:09Z INFO - PROFILE: CPU %: 1: 14.8 5: 17.1 15: 17.1
PROFILE: Memory: 59680 KiB: scheduler.py: loop #61: 2019-08-19T10:48:09Z
.
.
.
2019-08-19T10:49:10Z INFO - PROFILE: scheduler loop dt (s): 1: 0.050 5: 0.042 15: 0.042
2019-08-19T10:49:10Z INFO - PROFILE: CPU %: 1: 6.0 5: 11.7 15: 11.7
PROFILE: Memory: 60240 KiB: scheduler.py: loop #121: 2019-08-19T10:49:10Z
.
.
.
2019-08-19T10:50:10Z INFO - PROFILE: scheduler loop dt (s): 1: 0.035 5: 0.039 15: 0.039
2019-08-19T10:50:10Z INFO - PROFILE: CPU %: 1: 5.3 5: 9.6 15: 9.6
PROFILE: Memory: 60384 KiB: scheduler.py: loop #181: 2019-08-19T10:50:10Z
2019-08-19T10:50:13Z INFO - [archive_logs.20150219T0000Z] status=running: (received)succeeded at 2019-08-19T10:50:13Z for job(00)
PROFILE: Memory: 60384 KiB: scheduler.py: end main loop (total loops 184): 2019-08-19T10:50:13Z
2019-08-19T10:50:13Z INFO - Suite shutting down - AUTOMATIC
2019-08-19T10:50:14Z INFO - DONE
         3212688 function calls (3207722 primitive calls) in 185.622 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.045    0.045  184.684  184.684 /home/sutherlander/repos/cylc-flow/cylc/flow/scheduler.py:1514(run)
      184  174.876    0.950  174.876    0.950 {built-in method time.sleep}
      186    0.004    0.000    0.891    0.005 /home/sutherlander/repos/cylc-flow/cylc/flow/scheduler.py:1587(update_data_structure)

       46    0.012    0.000    0.316    0.007 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:821(update_dynamic_elements)
       46    0.038    0.001    0.145    0.003 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:642(update_task_proxies)
       46    0.068    0.001    0.092    0.002 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:708(update_family_proxies)
       47    0.001    0.000    0.082    0.002 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:588(increment_graph_elements)
        1    0.015    0.015    0.080    0.080 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:393(generate_graph_elements)
       47    0.003    0.000    0.067    0.001 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:770(update_workflow_statuses)
        1    0.000    0.000    0.045    0.045 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:573(initiate_data_model)
        1    0.035    0.035    0.045    0.045 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:168(generate_definition_elements)
      246    0.018    0.000    0.019    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:284(generate_ghost_task)
      994    0.002    0.000    0.004    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:67(task_mean_elapsed_time)
        1    0.003    0.003    0.004    0.004 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:322(generate_ghost_families)
      727    0.003    0.000    0.003    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:693(<listcomp>)
        1    0.000    0.000    0.001    0.001 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:799(update_workflow)
      836    0.001    0.000    0.001    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:228(<listcomp>)
      773    0.001    0.000    0.001    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:829(<genexpr>)
      246    0.000    0.000    0.000    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:315(<listcomp>)
        1    0.000    0.000    0.000    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:257(<listcomp>)
        1    0.000    0.000    0.000    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:494(<listcomp>)
      727    0.000    0.000    0.000    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:679(<listcomp>)
       52    0.000    0.000    0.000    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:380(<listcomp>)
	Command being timed: "cylc run --no-detach --profile --mode=simulation complex2"
	User time (seconds): 9.13
	System time (seconds): 2.37
	Percent of CPU this job got: 6%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:08.46
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 60736
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 351270
	Voluntary context switches: 1901
	Involuntary context switches: 2559
	Swaps: 0
	File system inputs: 0
	File system outputs: 75136
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

So update_data_structure (the update_state_summary equivalent) is called by the scheduler and updates the data store using:

       46    0.012    0.000    0.316    0.007 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:821(update_dynamic_elements)
       47    0.001    0.000    0.082    0.002 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:588(increment_graph_elements)

That's 0.4 seconds so far, but you should probably add in the initialisation calls:

        1    0.000    0.000    0.045    0.045 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:573(initiate_data_model)

Giving a grand total of 0.443 seconds. The graph edge generation only happened once:

        1    0.015    0.015    0.080    0.080 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:393(generate_graph_elements)

(as expected, with there being only one cycle point; this is included in increment_graph_elements). It will only run once for every new cycle point.

data store wins: 57% faster!
(0.443 vs 1.018 seconds)

Note: simulation runs don't include the job pool; however, that would just be more memory and a list comprehension over updated tasks (so maybe 0.01 seconds more).

@hjoliver (Member) left a comment:

OK, all good now 👍

@hjoliver (Member) commented

That seems to be four approvals 🥇

@hjoliver hjoliver merged commit e439a1d into cylc:master Aug 20, 2019
@dwsutherland (Member, Author) commented

For the record, master (without the state summary) was much slower before this merge:

(cylc-flow) sutherlander@cortex-vbox:~$ /usr/bin/time -v cylc run --no-detach --profile --mode=simulation complex2
            ._.                                                       
            | |                 The Cylc Suite Engine [8.0a0]         
._____._. ._| |_____.           Copyright (C) 2008-2019 NIWA          
| .___| | | | | .___|   & British Crown (Met Office) & Contributors.  
| !___| !_! | | !___.  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
!_____!___. |_!_____!  This program comes with ABSOLUTELY NO WARRANTY.
      .___! |              It is free software, you are welcome to    
      !_____!             redistribute it under certain conditions;   
                        see `COPYING' in the Cylc source distribution. 
                                                                       
PROFILE: Memory: 43432 KiB: scheduler.py: start configure
PROFILE: Memory: 43436 KiB: scheduler.py: before load_suiterc
PROFILE: Memory: 43436 KiB: config.py:config.py: start init config
PROFILE: Memory: 43436 KiB: config.py: before RawSuiteConfig init
2019-08-20T18:48:07+12:00 WARNING - deprecated graph items were automatically upgraded in 'suite definition':
2019-08-20T18:48:07+12:00 WARNING -  * (8.0.0) [scheduling][dependencies][X][graph] -> [scheduling][graph][X] - for X in:
	R1
	T00
	T00, T06, T12, T18
	T00, T12
	T06
	T06, T18
	T12
	T18
PROFILE: Memory: 46872 KiB: config.py: after RawSuiteConfig init
PROFILE: Memory: 46872 KiB: config.py: before get(sparse=True
PROFILE: Memory: 46872 KiB: config.py: after get(sparse=True)
PROFILE: Memory: 46872 KiB: config.py: before _expand_runtime
PROFILE: Memory: 47172 KiB: config.py: after _expand_runtime
PROFILE: Memory: 47172 KiB: config.py: before compute_family_tree
PROFILE: Memory: 47172 KiB: config.py: after compute_family_tree
PROFILE: Memory: 47172 KiB: config.py: before inheritance
PROFILE: Memory: 48092 KiB: config.py: after inheritance
PROFILE: Memory: 48092 KiB: config.py: before get(sparse=False)
PROFILE: Memory: 49016 KiB: config.py: after get(sparse=False)
PROFILE: Memory: 49052 KiB: config.py: before load_graph()
PROFILE: Memory: 50980 KiB: config.py: after load_graph()
PROFILE: Memory: 51176 KiB: config.py: end init config
PROFILE: Memory: 51344 KiB: scheduler.py: after load_suiterc
2019-08-20T06:48:09Z INFO - Suite server: url=tcp://cortex-vbox:43072/ pid=3115
2019-08-20T06:48:09Z INFO - Run: (re)start=0 log=1
2019-08-20T06:48:09Z INFO - Cylc version: 8.0a0
2019-08-20T06:48:09Z INFO - Run mode: simulation
2019-08-20T06:48:09Z INFO - Initial point: 20150219T0000Z
2019-08-20T06:48:09Z INFO - Final point: 20150219T0000Z
PROFILE: Memory: 52236 KiB: scheduler.py: before load_tasks
2019-08-20T06:48:09Z INFO - Cold Start 20150219T0000Z
PROFILE: Memory: 53764 KiB: scheduler.py: after load_tasks
PROFILE: Memory: 54228 KiB: scheduler.py: end configure
PROFILE: Memory: 54244 KiB: scheduler.py: begin run while loop
.
.
.
2019-08-20T06:48:10Z INFO - PROFILE: scheduler loop dt (s): 1: 0.267 5: 0.267 15: 0.267
2019-08-20T06:48:10Z INFO - PROFILE: CPU %: 1: 89.3 5: 89.3 15: 89.3
PROFILE: Memory: 59576 KiB: scheduler.py: loop #0: 2019-08-20T06:48:10Z
.
.
.
2019-08-20T06:49:10Z INFO - PROFILE: scheduler loop dt (s): 1: 0.061 5: 0.064 15: 0.064
2019-08-20T06:49:10Z INFO - PROFILE: CPU %: 1: 17.2 5: 18.3 15: 18.3
PROFILE: Memory: 60904 KiB: scheduler.py: loop #60: 2019-08-20T06:49:10Z
.
.
.
2019-08-20T06:50:10Z INFO - PROFILE: scheduler loop dt (s): 1: 0.079 5: 0.071 15: 0.071
2019-08-20T06:50:10Z INFO - PROFILE: CPU %: 1: 9.0 5: 13.7 15: 13.7
PROFILE: Memory: 61780 KiB: scheduler.py: loop #120: 2019-08-20T06:50:10Z
.
.
.
PROFILE: Memory: 61952 KiB: scheduler.py: loop #181: 2019-08-20T06:51:11Z
2019-08-20T06:51:14Z INFO - [archive_logs.20150219T0000Z] status=running: (received)succeeded at 2019-08-20T06:51:14Z for job(00)
PROFILE: Memory: 61952 KiB: scheduler.py: end main loop (total loops 184): 2019-08-20T06:51:15Z
2019-08-20T06:51:15Z INFO - Suite shutting down - AUTOMATIC
2019-08-20T06:51:16Z INFO - DONE
         5371608 function calls (5336357 primitive calls) in 186.351 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.049    0.049  185.368  185.368 /home/sutherlander/repos/cylc-flow/cylc/flow/scheduler.py:1516(run)
      184  170.483    0.927  170.483    0.927 {built-in method time.sleep}

      186    0.004    0.000    5.512    0.030 /home/sutherlander/repos/cylc-flow/cylc/flow/scheduler.py:1583(update_data_structure)
       41    0.005    0.000    5.134    0.125 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:63(initiate_data_model)

       41    0.469    0.011    1.906    0.046 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:257(generate_graph_elements)

       41    0.357    0.009    1.552    0.038 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:334(update_task_proxies)

       41    1.032    0.025    1.533    0.037 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:71(generate_definition_elements)

    10086    0.337    0.000    0.367    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:180(_generate_ghost_task)

       41    0.159    0.004    0.185    0.005 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:204(_generate_ghost_families)

       41    0.056    0.001    0.075    0.002 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:368(update_family_proxies)

       41    0.015    0.000    0.063    0.002 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:414(update_workflow)

    34276    0.034    0.000    0.034    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:129(<listcomp>)
       41    0.022    0.001    0.031    0.001 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:153(<listcomp>)
       41    0.020    0.000    0.020    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:170(<listcomp>)
       41    0.018    0.000    0.018    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:315(<listcomp>)
    10086    0.015    0.000    0.015    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:198(<listcomp>)
       41    0.011    0.000    0.011    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:169(<listcomp>)
       41    0.008    0.000    0.008    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:447(<listcomp>)
     9608    0.002    0.000    0.002    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:353(<listcomp>)
       41    0.001    0.000    0.001    0.000 /home/sutherlander/repos/cylc-flow/cylc/flow/ws_data_mgr.py:449(<listcomp>)
	Command being timed: "cylc run --no-detach --profile --mode=simulation complex2"
	User time (seconds): 13.85
	System time (seconds): 2.99
	Percent of CPU this job got: 8%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:09.20
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 62276
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 409991
	Voluntary context switches: 2059
	Involuntary context switches: 3669
	Swaps: 0
	File system inputs: 8
	File system outputs: 66936
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Incremental data-store generation wins:
5.134 vs 0.443 seconds

(as expected)

@oliver-sanders oliver-sanders added the efficiency For notable efficiency improvements label Aug 23, 2019
Labels: efficiency (For notable efficiency improvements)

Successfully merging this pull request may close these issues:

  • Extra self parameter in mutations
  • Bug in mutation if there is no owner

5 participants