Execution dependency extension #1213
Interesting to see proper graph resolution, although not the easiest to read 😆 I just pushed all dependencies not already in my execution list into it, after pushing in their dependencies. Simpler than general algorithms because we know what will end up last! No circular detection there though. Anyway, there's a bunch of comments in line
var orig_execute = codecell.CodeCell.prototype.execute; // get original cell execute function
CodeCell.prototype.execute = function (stop_on_error) {
    var root_tags = this.metadata.tags || [];
    if(root_tags != [] && root_tags.some(tag => /=>.*/.test(tag))) { // if the root cell contains any dependencies, resolve dependency tree...
JavaScript doesn't treat empty arrays as falsy, like Python, and doesn't compare their contents, so `[] != []` always evaluates to true. I suspect you wanted something like `root_tags.length < 1`?
I was not sure if `this.metadata.tags` would be `[]` or something like `null`/`undefined`. That is where my JavaScript knowledge ends. But I see that, since I either have `[...]` or `[]`, it is defined and I can ask for its length.
Exactly. `this.metadata.tags` may often be `undefined`, e.g. if no tags have been set. In fact, in a trickier world, it could be anything that'll fit into the notebook JSON, but in practice we can assume that it will be either an array of strings, or undefined (if it's not, something more significant has gone wrong with the notebook somewhere, and this extension breaking is likely to be the least of anyone's worries).
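A minimal sketch of the point made in this thread: two array literals are distinct objects, so comparing them with `!=` is always true, and a length check is the reliable way to test for emptiness. The helper names `getTags` and `hasDependencies` are hypothetical, and the anchored regex is an assumption (the reviewed code uses the unanchored `/=>.*/`).

```javascript
// Two array literals are distinct objects, so this comparison is always true,
// regardless of their contents.
const looseNotEqual = ([] != []);      // true
const emptyIsTruthy = Boolean([]);     // true: an empty array is not falsy

// Hypothetical helper: normalise whatever is stored under metadata.tags.
// Anything that is not an array (undefined, null, ...) becomes [].
function getTags(metadata) {
    const tags = metadata && metadata.tags;
    return Array.isArray(tags) ? tags : [];
}

// Hypothetical helper: does the cell declare at least one "=>id" dependency tag?
// some() on an empty array returns false, so no separate emptiness test is needed.
function hasDependencies(metadata) {
    return getTags(metadata).some(tag => /^=>.+/.test(tag));
}
```

Because `some()` already returns `false` for an empty array, the emptiness check in the original condition could probably be dropped entirely.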
var tags = cell.metadata.tags || [];
var identities = tags.filter(tag => /#.*/.test(tag)).map(tag => tag.substring(1)); // ...get all identities and drop the #
if(cell === root_cell && !tags.some(tag => /#.*/.test(tag))) {
    identities.push("DD27AE1D138027D0D7AB824FD0DDDC61367D5CCA4AAB42CE50840762B053764D"); // ...generate an id for the root cell for internal usage
rather than hard-coding, you could use root_cell.cell_id
identified_cells.forEach(function (cell) { // ...get all identified cells (the ones that have at least one #tag)
    var tags = cell.metadata.tags || [];
    var identities = tags.filter(tag => /#.*/.test(tag)).map(tag => tag.substring(1)); // ...get all identities and drop the #
    if(cell === root_cell && !tags.some(tag => /#.*/.test(tag))) {
rather than testing `!tags.some(...` you can use `identities.length < 1`
might also be worth renaming `identities` to something like `required_ids` for better readability
A single cell can have multiple identities, that is why it is called identities. I do not see why they would be required_ids. Below I just add an id, because the root cell does not need an id since it could be the one that is independent (the user is just working on that one and would not understand why it needs an id to run.).
apologies, I misunderstood, for some reason I thought this was a list of dependencies' `id`s. No rename necessary 👍
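The two suggestions from the threads above (use `cell_id` instead of a hard-coded string, and test `identities.length < 1`) could be combined roughly as follows. This is only a sketch: `getIdentities` is a hypothetical helper, and the plain objects in the usage lines stand in for real notebook cells, which carry a `cell_id` property as mentioned in the review.

```javascript
// Hypothetical helper: collect the "#id" tags of a cell and, for the root cell
// only, fall back to its session-unique cell_id when it has no "#id" tag.
function getIdentities(cell, rootCell) {
    const tags = (cell.metadata && cell.metadata.tags) || [];
    const identities = tags
        .filter(tag => /^#.+/.test(tag))   // keep tags of the form "#something"
        .map(tag => tag.substring(1));     // drop the leading "#"
    if (cell === rootCell && identities.length < 1) {
        identities.push(cell.cell_id);     // internal id; the user never has to tag the root
    }
    return identities;
}

// Stand-in cells for illustration only.
const taggedCell = { cell_id: 'abc123', metadata: { tags: ['#A', '#B', '=>C'] } };
const untaggedRoot = { cell_id: 'def456', metadata: {} };
```

With these stand-ins, `getIdentities(untaggedRoot, untaggedRoot)` yields the cell's own `cell_id`, while a tagged cell keeps its user-chosen identities.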
var deps = tags.filter(tag => /=>.*/.test(tag)).map(tag => tag.substring(2)); // ...get all dependencies and drop the =>
identities.forEach(function (id) {
    cell_map[id] = cell;
this line implicitly means only a single cell can have each `#some_id` tag. Is this what you want? It could make sense to have a few cells with the same tag, so do something like
cell_map[id] = cell_map[id] || [];
cell_map[id].push(cell);
Alternatively, if you don't intend to support this, at least let the user know that adding multiple cells with the same identifier tag won't work correctly in the readme.
Yeah, as far as I thought about this, I wanted to work with unique ids, so one id for one cell. In the one id for multiple cells case, I would not be sure what is the right order to execute them and even if they are all independent from each other, they might have different dependencies, so I would have to account for each of them separately. But you gave me a nice idea. Is the cell_id you mentioned above unique within the notebook's context? Then I could use that for uniqueness and allow multiple cells to have the same hashtag id. Maybe I will get to this in a later version.
> Is the cell_id you mentioned above unique within the notebook's context? Then I could use that for uniqueness and allow multiple cells to have the same hashtag id.

Yes, it should be unique for the current session. See notebook/static/notebook/js/cell.js#L101 for where it gets assigned, and notebook/static/base/js/utils.js#L206-L220 for the implementation
> I would not be sure what is the right order to execute them

I think it would be reasonable to assume they ought to be executed in the order in which they appear in the notebook?

> they might have different dependencies, so I would have to account for each of them separately

Sort of, or they can be treated as essentially an aggregate (dependencies of this tag are simply the collected dependencies of all its constituent cells).
Both sound reasonable. I will look into this as soon as the rest works.
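The aggregate idea discussed in this thread might look roughly like the sketch below: each `#id` maps to a list of cells in notebook order, and the dependencies of an id are the union of the `=>` tags of all cells carrying it. `buildTagIndex`, `cellMap` and `depMap` are hypothetical names, and the input is assumed to be the notebook's cells in display order.

```javascript
// Hypothetical helper: index cells by "#id" tag, allowing several cells per id.
// cells is assumed to be in notebook order, so each list preserves that order.
function buildTagIndex(cells) {
    const cellMap = {};  // id -> array of cells tagged "#id"
    const depMap = {};   // id -> Set of ids required by any of those cells
    cells.forEach(cell => {
        const tags = (cell.metadata && cell.metadata.tags) || [];
        const ids = tags.filter(t => /^#.+/.test(t)).map(t => t.substring(1));
        const deps = tags.filter(t => /^=>.+/.test(t)).map(t => t.substring(2));
        ids.forEach(id => {
            cellMap[id] = cellMap[id] || [];
            cellMap[id].push(cell);
            depMap[id] = depMap[id] || new Set();
            deps.forEach(dep => depMap[id].add(dep));  // collect deps of all constituent cells
        });
    });
    return { cellMap: cellMap, depMap: depMap };
}
```

Executing an id then means executing every cell in `cellMap[id]` in order, after everything in `depMap[id]` has run.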
}

if(processed_nodes >= Object.keys(dep_graph).length) {
    console.error('There is a circular dependency in your execute dependencies!');
you could do an `alert` here, or alternatively use the notebook dialog for something prettier. See jupyter_contrib_nbextensions/src/jupyter_contrib_nbextensions/nbextensions/init_cell/main.js, lines 106 to 115 in a4544ec:
if (!Jupyter.notebook.trusted) {
    dialog.modal({
        title : 'Initialization cells in untrusted notebook',
        body : 'This notebook is not trusted, so initialization cells will not be automatically run on kernel load. You can still run them manually, though.',
        buttons: {'OK': {'class' : 'btn-primary'}},
        notebook: Jupyter.notebook,
        keyboard_manager: Jupyter.keyboard_manager,
    });
    return;
}
In fact, it probably makes sense for this extension to refuse to execute dependencies in untrusted notebooks in the same way that init_cell does...
Sounds great, I will add it.
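Applied to this extension, the same dialog pattern could report a circular dependency and gate execution on the notebook's trust state. The sketch below mirrors the options used in the `init_cell` excerpt; `warnCircularDependency` and `canRunDependencies` are hypothetical names, the message text is made up, and `dialog` and `Jupyter` are passed in as parameters so the functions can be exercised outside the notebook.

```javascript
// Hypothetical helper: show a modal instead of only logging to the console.
// dialog and Jupyter are assumed to be the objects used in the init_cell excerpt.
function warnCircularDependency(dialog, Jupyter, cycleIds) {
    dialog.modal({
        title: 'Circular execution dependency',
        body: 'These cell ids depend on each other, so no dependencies were run: ' + cycleIds.join(', '),
        buttons: {'OK': {'class': 'btn-primary'}},
        notebook: Jupyter.notebook,
        keyboard_manager: Jupyter.keyboard_manager,
    });
}

// Hypothetical helper: only resolve and run dependencies in trusted notebooks,
// in the same spirit as init_cell.
function canRunDependencies(Jupyter) {
    return Boolean(Jupyter.notebook.trusted);
}
```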
console.log("Map processing order to cells...", processing_order)
var dependency_cells = processing_order.map(id =>cell_map[id]); // ...get dependent cells by their id
console.log("Execute cells..", dependency_cells)
dependency_cells.forEach(function (cell) { cell.execute(stop_on_error); }); // ...execute all dependent cells in order
I think this may be the main problem you're seeing. Since you've called `cell.execute`, this will call your patched version, so look for dependencies again, etc. Instead, you should almost certainly be doing
dependency_cells.forEach(cell => orig_execute.call(cell, stop_on_error));
Oh, that is how this works! Thanks for pointing it out to me!
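To make the re-entry problem concrete, here is a small self-contained sketch of the patching pattern. The `CodeCell` constructor below is a stand-in, not the notebook's class, and each cell simply carries a precomputed `deps` list (an assumption for illustration; the extension resolves the order from tags).

```javascript
// Stand-in for the notebook's CodeCell, just enough to show the pattern.
function CodeCell(name, deps) {
    this.name = name;
    this.deps = deps || [];   // assumed: dependencies already resolved, in execution order
}
const executed = [];          // records the order in which cells actually run
CodeCell.prototype.execute = function (stop_on_error) {
    executed.push(this.name);
};

// Keep a reference to the unpatched method before replacing it.
const orig_execute = CodeCell.prototype.execute;

CodeCell.prototype.execute = function (stop_on_error) {
    // Calling cell.execute(...) here would re-enter this patched function for
    // every dependency. Going through orig_execute runs each cell exactly once.
    this.deps.forEach(cell => orig_execute.call(cell, stop_on_error));
    orig_execute.call(this, stop_on_error);
};
```

Using `.call(cell, ...)` matters because `orig_execute` is a detached function reference; without it, `this` inside the original method would not be the cell.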
}
orig_execute.call(this, stop_on_error); // execute original cell execute function
};
console.log('[exec_deps] loaded');
this should use the updated name of execution_dependencies
Writing extensive notebooks can become very complicated since many cells act as stepping stones to produce intermediate results for later cells. Thus, it becomes tedious to keep track of the cells that have to be run in order to run a certain cell. This extension simplifies handling the execution dependencies by introducing tag annotations to identify each cell and indicate a dependency on others. This improves on the current state which requires remembering all dependencies by heart or annotating the cells in the comments.
might be worth noting that dependencies are definitely executed, rather than say only being executed once per kernel session. This may be important for cells which take a long time to execute...
True, I should add that. Thanks for reviewing my code, I am very grateful for that! I really like jupyter notebooks and I am very interested in contributing more extensions in the future.
No problem, happy to help 😄 And of course, we'd be happy to have anything you think might be useful to include here 😉
One more thing, how do I know about compatibility? I currently have all compatibilities in, but all I know is that it is compatible with 5.x.
627b3cc to cec798a
cec798a to e99b02a
It does not work yet on my larger example. This needs some more time.
For compatibility, I just check whether the functions I'm calling exist in the notebook of the given version (4.0.0 for 4.x, etc). You can do this quite conveniently on github by looking at the notebook repo's tags, e.g. https://github.com/jupyter/notebook/tree/4.0.0/notebook/static. In general though, this nbextension isn't using too many fancy notebook methods that have changed recently, like multiple selections, so I'd guess it's compatible with 4.x & 5.x. Probably the most likely thing to break 3.x might be the dialog but even that may be fine. However, I wouldn't worry much about 3.x because I haven't heard of anyone using it for quite some time now...
Sure, will hold off merge till you're ready
25e02f0 to 8ab463d
Fixed some problems with processing if the dependencies are only a subgraph within many completely unrelated cells. I will keep on my roadmap:
2a18c0f to 59b6a9d
59b6a9d to 509c10a
Looks like it works now! I used it for the rest of the day and it did not break. Ready to be pulled!
Great, thanks. Could you add a note to the changelog under the github master section, just noting the new extension? Apologies, I should have mentioned this before 🤦♂️
Of course I can.
There you go. Btw, I am working on the improvement to run a dependent cell only if it has not been run before or if it has been modified since its last run. Are there any flags that tell me whether (and when) a cell was last run and whether (and when) it was last modified? Maybe I am looking in the wrong places, because I cannot find them. I will open a new pull request as soon as I am ready with the new changes.

Edit: Also, the changelog's pull request links should be updated; they are all invalid except for mine. They are missing the repository name.

P.S. I made a contribution page for you because I liked the way things were going with my first contribution. Maybe review it to see whether that is how you like your contributions: Create A new Notebook Extension
Thanks :)
No, there aren't in the main notebook. There's the ExecuteTime nbextension, but it doesn't record modifications, doesn't keep track of which kernel was running, and only takes into account the current browser frontend. The lack of such a feature is, I think at least in part, by design. The kernel is intended to be frontend-agnostic. That is to say, it should not care (or expect to be able to tell) whether it is being sent commands by a notebook, a terminal, a qt console, your own custom client, whatever. As a result, the kernel has no notion of cells (although I guess one could consider each ipython

However, in many (most?) cases, people are using kernels which are only connected to a single notebook frontend, and in such cases, we could get away with a relatively simple model where the nbextension could record cells' ids as they are executed, remove them from the executed list if they get modified or the kernel restarts, and thereby check whether they need re-executing, say if they use
🤦♂️ d'oh I just fixed this for the links for the 3.x releases, then clearly managed to make the same mistake again :( thanks for pointing this out 👍, I'll add a PR to fix these.
Hmm. The lint build seems to fail because isort has changed its mind about ordering imports again. That can be safely ignored. The docs build seems to fail on a bunch of links which look like they should be fine, so I've restarted it to check. I don't think it's anything to do with your edit, which looks fine to me.
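The simple single-frontend model described in this comment (record cell ids on execution, forget them on modification or kernel restart) could be sketched as a small bookkeeping class. `ExecutionTracker` and its method names are hypothetical; which notebook events would call each method is left open here, since that wiring is not covered in the discussion.

```javascript
// Hypothetical bookkeeping for "has this cell run since it last changed?".
// Only valid under the assumption that a single notebook frontend drives the kernel.
class ExecutionTracker {
    constructor() {
        this.executed = new Set();   // ids of cells that ran and were not modified since
    }
    markExecuted(cellId) {
        this.executed.add(cellId);
    }
    markModified(cellId) {
        this.executed.delete(cellId);   // an edited cell has to run again
    }
    reset() {
        this.executed.clear();          // e.g. after a kernel restart, nothing is current
    }
    needsExecution(cellId) {
        return !this.executed.has(cellId);
    }
}
```

A dependency resolver could then skip every dependency for which `needsExecution` returns `false`, which would address the concern about long-running cells being re-executed each time.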
I guess you mean this to be a
var processing_queue = [root_cell_id];
var processed_nodes = 0;
var in_degree = {}; // ...collect in-degrees of nodes
while(processing_queue.length > 0 && processed_nodes < Object.keys(dep_graph).length) {// ...stay processing deps while the queue contains nodes and the processed nodes are below total node quantity
this loop never increments `processed_nodes`, so gets stuck for circular dependencies...
Added it to the code. Ready for merging.
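For comparison, a compact way to get both the ordering and the cycle check is Kahn's algorithm restricted to the part of the graph reachable from the root cell. This is a sketch under assumptions, not the extension's actual code: `resolveOrder` is a hypothetical name, and `depGraph` is assumed to map each id to the array of ids it depends on.

```javascript
// Hypothetical resolver: returns ids in execution order (dependencies first),
// or throws if the subgraph reachable from rootId contains a cycle.
function resolveOrder(depGraph, rootId) {
    // 1. Collect only the nodes reachable from the root; unrelated cells are ignored.
    const reachable = new Set();
    const stack = [rootId];
    while (stack.length > 0) {
        const id = stack.pop();
        if (reachable.has(id)) { continue; }
        reachable.add(id);
        (depGraph[id] || []).forEach(dep => stack.push(dep));
    }

    // 2. Count unmet dependencies per node and record the reverse edges.
    const pending = {};
    const dependents = {};
    reachable.forEach(id => { pending[id] = 0; dependents[id] = []; });
    reachable.forEach(id => {
        new Set(depGraph[id] || []).forEach(dep => {
            pending[id] += 1;
            dependents[dep].push(id);
        });
    });

    // 3. Repeatedly emit nodes whose dependencies have all been emitted.
    const queue = Array.from(reachable).filter(id => pending[id] === 0);
    const order = [];
    while (queue.length > 0) {
        const id = queue.shift();
        order.push(id);   // every iteration processes exactly one node, so the loop terminates
        dependents[id].forEach(next => {
            pending[next] -= 1;
            if (pending[next] === 0) { queue.push(next); }
        });
    }

    // 4. Nodes on a cycle never reach zero pending dependencies.
    if (order.length < reachable.size) {
        throw new Error('There is a circular dependency in your execute dependencies!');
    }
    return order;
}
```

Because the loop only ever dequeues nodes with no unmet dependencies, a cycle shows up as a shortfall in `order.length` rather than as an endless loop.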
Yeah!
I am implementing the extension for custom execution dependencies as discussed in #1193. It seems to work and properly resolves the dependency tree (it even handles circular deps in a simple way). However, in the end it collects all cells in the order found by the topological sort and tries to run them, but then the JavaScript crashes and the script starts to run anew.
DISCLAIMER: I have limited experience with JavaScript, so there might be quirky stuff in the code which comes from my experience as a Java developer. But the problem does not seem to be in that part; it seems to be somewhere else.
Any idea what is wrong? My second question is whether there is a neat way to tell the user that a circular dependency is preventing the dependencies from running. For now it just prints to the console.
To test it, simply create two Python cells and add a #A tag to the first and a =>A tag to the second. Then run the second and look into the developer console. There you can see that it restarts the script after it prints "Execute cells.."