Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batching and ubiquitous dtypes #144

Merged
merged 139 commits into from
Jan 11, 2023
Merged

Batching and ubiquitous dtypes #144

merged 139 commits into from
Jan 11, 2023

Conversation

liamhuber
Copy link
Member

@liamhuber liamhuber commented Dec 2, 2022

A wrapper node that allows pure data nodes to internally iterate over batches of input data, producing batches of output data.

For nodes that have an exec port, this will be more complicated. In general the interface of pyiron (which is heavily state-based) and ironflow (which strives to be as functional as possible) gets really tough when we start having nodes that run/read/parse pyiron jobs. So here the goal is just to get it working for pure the purely functional data nodes, and then probably do some re-organization before trying to hit the job-running nodes.

This wound up being a complete rework of the DType system, including adding types to all ports by default, and introducing new constraints one which ports can connect to which. Inside this new framework, the ports can be toggled to represent 1D "batched" data of their original type. Toggling is done in the controller panel, and batch status is easily seen in the graph by ALL CAPS labels.

Glaringly, there is no Matrix DType, only 1D List types. This is the biggest weakness, but the solution in broad strokes is pretty clear: introduce a new dtype for matrices with a dimensions field, and then have the current List be a special dimension=1 case for this.

I also had to really relax in the input types for plotting. Even though batching handles a lot of the nitty gritty, we still wind up doing some vector operations like transposition, slicing, and indexing. The right solution is to have these sorts of nodes pass type (well, valid_class) information through themselves, but that's not implemented yet.

How does this compare to other tools? The most mainstream graphical scripting language I can think of is the Unreal engine's "blueprints". Looking at their control structures docs, they don't offer anything like what we have here. IMO that is both a yellow flag that what we are doing is a bad idea, and a nice asset that we can offer this sort of streamlining.
Of course, an explicit for-loop node is still necessary for nested looping, or looping over dimensions with different length, but I do feel that batching covers the most common scenarios with quite some elegance.

Value matching looks for a list-like object instead of directly looking at the value itself.
@github-actions
Copy link

github-actions bot commented Dec 2, 2022

Binder 👈 Launch a binder notebook on branch pyiron/ironflow/batching

@coveralls
Copy link

coveralls commented Dec 2, 2022

Pull Request Test Coverage Report for Build 3622372693

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 196 of 298 (65.77%) changed or added relevant lines in 10 files are covered.
  • 189 unchanged lines in 7 files lost coverage.
  • Overall coverage increased (+0.009%) to 63.679%

Changes Missing Coverage Covered Lines Changed/Added Lines %
ironflow/model/flow.py 9 10 90.0%
ironflow/model/port.py 11 13 84.62%
ironflow/model/node.py 13 16 81.25%
ironflow/gui/workflows/boxes/node_interface/representation.py 0 4 0.0%
ironflow/gui/workflows/boxes/node_interface/control.py 3 38 7.89%
ironflow/nodes/pyiron/atomistics_nodes.py 68 125 54.4%
Files with Coverage Reduction New Missed Lines %
ironflow/model/flow.py 1 89.47%
ironflow/model/dtypes.py 1 98.84%
ironflow/gui/workflows/canvas_widgets/ports.py 1 66.67%
ironflow/gui/workflows/boxes/node_interface/representation.py 3 41.54%
ironflow/model/node.py 24 75.76%
ironflow/gui/workflows/boxes/node_interface/control.py 29 28.32%
ironflow/nodes/pyiron/atomistics_nodes.py 130 54.1%
Totals Coverage Status
Change from base Build 3604660161: 0.009%
Covered Lines: 2316
Relevant Lines: 3637

💛 - Coveralls

Also for a couple references to github.com/pyiron/pyiron/tree/master, which worked because github kindly redirects us, but should still be main now.
A bespoke node causes all sorts of headaches let's switch over to making ports batchable and having a different update method under the hood
@liamhuber
Copy link
Member Author

Actually wrapping nodes in a super-node was hell. The attack I'm trying now is to make individual ports "batchable", so that the same node UI can be used, and we just have a little toggle to say whether it's expecting batched or straight input.

This basically delegates the complexity to the back-end, where the node needs to be able to handle updates with (new) and without (standard) any batched input. Still working on this part...

This is just a proof of concept, needs to be refactored and pulled up to the parent class, with contingencies in place in case the node doesn't support batched running (which should also be reflected in the UI, disallowing the user from marking input as batched).
In anticipation of batching nodes that run jobs
Ironflow IO ports always have dtype, but sometimes it's None, so we can make this check a bit more specific
chr is a method not a type anyhow
Doesn't work for un-typed (output) ports yet! Probably need a new `Untyped` dtype for that, which would also allow me to get rid of a bunch of `if dtype is not None` checks
@liamhuber
Copy link
Member Author

liamhuber commented Dec 12, 2022

The parent "batchable" class now exists for pure-data nodes (i.e. those that don't run a pyiron job under the hood), and have been applied to the linspace and structure nodes so far.

  • Introduce a dtype.Untyped so that untyped output can still be registered as batched or not, and take advantage of the ALL CAPS labelling for batched ports
  • Apply the new batchable parent to the rest of the pyiron data nodes

Then the remaining big step is to apply the batchable parent to the Calc family of classes which run jobs.

@liamhuber liamhuber removed the stale label Jan 10, 2023
@liamhuber
Copy link
Member Author

Error is just the numpy.float error propagating in from pyiron_atomistics, so I'll run black and re-run the tests once that's fixed.

Traceback (most recent call last):
[1213](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1237)
  File "/home/runner/work/ironflow/ironflow/tests/unit/test_gui.py", line 98, in test_user_node_registration
[1214](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1238)
    gui = GUI('foo')
[1215](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1239)
  File "/home/runner/work/ironflow/ironflow/ironflow/gui/gui.py", line 64, in __init__
[1216](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1240)
    super().__init__(
[1217](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1241)
  File "/home/runner/work/ironflow/ironflow/ironflow/model/model.py", line 43, in __init__
[1218](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1242)
    from ironflow.nodes.pyiron import atomistics_nodes
[1219](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1243)
  File "/home/runner/work/ironflow/ironflow/ironflow/nodes/pyiron/atomistics_nodes.py", line 34, in <module>
[1220](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1244)
    from pyiron_atomistics.lammps.lammps import Lammps
[1221](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1245)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/lammps.py", line 5, in <module>
[1222](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1246)
    from pyiron_atomistics.lammps.interactive import LammpsInteractive
[1223](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1247)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/interactive.py", line 13, in <module>
[1224](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1248)
    from pyiron_atomistics.lammps.base import LammpsBase, _check_ortho_prism
[1225](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1249)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/base.py", line 26, in <module>
[1226](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1250)
    from pyiron_atomistics.lammps.control import LammpsControl
[1227](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1251)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/control.py", line 11, in <module>
[1228](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1252)
    from pyiron_atomistics.lammps.units import LAMMPS_UNIT_CONVERSIONS
[1229](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1253)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/units.py", line 160, in <module>
[1230](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1254)
    dtype_dict[v] = np.float
[1231](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1255)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/numpy/__init__.py", line 284, in __getattr__
[1232](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1256)
    raise AttributeError("module {!r} has no attribute "
[1233](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1257)
AttributeError: module 'numpy' has no attribute 'float'

@liamhuber liamhuber added the format_black Invoke black formatting CI label Jan 11, 2023
@liamhuber liamhuber added the run_coverage Triggers the full test suite and coverage label Jan 11, 2023
@liamhuber
Copy link
Member Author

Coverage is just complaining because of access rights (coveralls.exception.CoverallsException: Not on Travis or CircleCI. You have to provide either repo_token in .coveralls.yml or set the COVERALLS_REPO_TOKEN env var.). In fact, I've already fought with this before and followed the relevant advice.

I'm going to just merge and see if it's a branch-vs-main access issue somehow. Otherwise the conventional wisdom over on the coveralls github issue is to try deleting the repo on the coveralls side and re-add it.

@liamhuber liamhuber merged commit c5e3670 into main Jan 11, 2023
@liamhuber liamhuber deleted the batching branch January 11, 2023 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format_black Invoke black formatting CI run_coverage Triggers the full test suite and coverage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants