Batching and ubiquitous dtypes #144

liamhuber · 2022-12-02T22:54:04Z

~~A wrapper node that allows pure data nodes to internally iterate over batches of input data, producing batches of output data.~~

For nodes that have an exec port, this will be more complicated. In general the interface of pyiron (which is heavily state-based) and ironflow (which strives to be as functional as possible) gets really tough when we start having nodes that run/read/parse pyiron jobs. So here the goal is just to get it working for pure the purely functional data nodes, and then probably do some re-organization before trying to hit the job-running nodes.

This wound up being a complete rework of the DType system, including adding types to all ports by default, and introducing new constraints one which ports can connect to which. Inside this new framework, the ports can be toggled to represent 1D "batched" data of their original type. Toggling is done in the controller panel, and batch status is easily seen in the graph by ALL CAPS labels.

Glaringly, there is no Matrix DType, only 1D List types. This is the biggest weakness, but the solution in broad strokes is pretty clear: introduce a new dtype for matrices with a dimensions field, and then have the current List be a special dimension=1 case for this.

I also had to really relax in the input types for plotting. Even though batching handles a lot of the nitty gritty, we still wind up doing some vector operations like transposition, slicing, and indexing. The right solution is to have these sorts of nodes pass type (well, valid_class) information through themselves, but that's not implemented yet.

How does this compare to other tools? The most mainstream graphical scripting language I can think of is the Unreal engine's "blueprints". Looking at their control structures docs, they don't offer anything like what we have here. IMO that is both a yellow flag that what we are doing is a bad idea, and a nice asset that we can offer this sort of streamlining.
Of course, an explicit for-loop node is still necessary for nested looping, or looping over dimensions with different length, but I do feel that batching covers the most common scenarios with quite some elegance.

Value matching looks for a list-like object instead of directly looking at the value itself.

github-actions · 2022-12-02T22:54:16Z

👈 Launch a binder notebook on branch pyiron/ironflow/batching

coveralls · 2022-12-02T22:56:34Z

Pull Request Test Coverage Report for Build 3622372693

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

196 of 298 (65.77%) changed or added relevant lines in 10 files are covered.
189 unchanged lines in 7 files lost coverage.
Overall coverage increased (+0.009%) to 63.679%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
ironflow/model/flow.py	9	10	90.0%
ironflow/model/port.py	11	13	84.62%
ironflow/model/node.py	13	16	81.25%
ironflow/gui/workflows/boxes/node_interface/representation.py	0	4	0.0%
ironflow/gui/workflows/boxes/node_interface/control.py	3	38	7.89%
ironflow/nodes/pyiron/atomistics_nodes.py	68	125	54.4%

Files with Coverage Reduction	New Missed Lines	%
ironflow/model/flow.py	1	89.47%
ironflow/model/dtypes.py	1	98.84%
ironflow/gui/workflows/canvas_widgets/ports.py	1	66.67%
ironflow/gui/workflows/boxes/node_interface/representation.py	3	41.54%
ironflow/model/node.py	24	75.76%
ironflow/gui/workflows/boxes/node_interface/control.py	29	28.32%
ironflow/nodes/pyiron/atomistics_nodes.py	130	54.1%

Totals
Change from base Build 3604660161:	0.009%
Covered Lines:	2316
Relevant Lines:	3637

💛 - Coveralls

Also for a couple references to github.com/pyiron/pyiron/tree/master, which worked because github kindly redirects us, but should still be main now.

A bespoke node causes all sorts of headaches let's switch over to making ports batchable and having a different update method under the hood

liamhuber · 2022-12-08T21:09:15Z

Actually wrapping nodes in a super-node was hell. The attack I'm trying now is to make individual ports "batchable", so that the same node UI can be used, and we just have a little toggle to say whether it's expecting batched or straight input.

This basically delegates the complexity to the back-end, where the node needs to be able to handle updates with (new) and without (standard) any batched input. Still working on this part...

This is just a proof of concept, needs to be refactored and pulled up to the parent class, with contingencies in place in case the node doesn't support batched running (which should also be reflected in the UI, disallowing the user from marking input as batched).

In anticipation of batching nodes that run jobs

Ironflow IO ports always have dtype, but sometimes it's None, so we can make this check a bit more specific

chr is a method not a type anyhow

Doesn't work for un-typed (output) ports yet! Probably need a new `Untyped` dtype for that, which would also allow me to get rid of a bunch of `if dtype is not None` checks

liamhuber · 2022-12-12T20:28:07Z

The parent "batchable" class now exists for pure-data nodes (i.e. those that don't run a pyiron job under the hood), and have been applied to the linspace and structure nodes so far.

Introduce a dtype.Untyped so that untyped output can still be registered as batched or not, and take advantage of the ALL CAPS labelling for batched ports
Apply the new batchable parent to the rest of the pyiron data nodes

Then the remaining big step is to apply the batchable parent to the Calc family of classes which run jobs.

Choice behaves differently by spec now

It was only ever set to the default value anyhow

So it will play well with manipulated arrays

liamhuber · 2023-01-10T23:12:28Z

Error is just the numpy.float error propagating in from pyiron_atomistics, so I'll run black and re-run the tests once that's fixed.

Traceback (most recent call last):
[1213](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1237)
  File "/home/runner/work/ironflow/ironflow/tests/unit/test_gui.py", line 98, in test_user_node_registration
[1214](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1238)
    gui = GUI('foo')
[1215](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1239)
  File "/home/runner/work/ironflow/ironflow/ironflow/gui/gui.py", line 64, in __init__
[1216](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1240)
    super().__init__(
[1217](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1241)
  File "/home/runner/work/ironflow/ironflow/ironflow/model/model.py", line 43, in __init__
[1218](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1242)
    from ironflow.nodes.pyiron import atomistics_nodes
[1219](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1243)
  File "/home/runner/work/ironflow/ironflow/ironflow/nodes/pyiron/atomistics_nodes.py", line 34, in <module>
[1220](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1244)
    from pyiron_atomistics.lammps.lammps import Lammps
[1221](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1245)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/lammps.py", line 5, in <module>
[1222](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1246)
    from pyiron_atomistics.lammps.interactive import LammpsInteractive
[1223](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1247)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/interactive.py", line 13, in <module>
[1224](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1248)
    from pyiron_atomistics.lammps.base import LammpsBase, _check_ortho_prism
[1225](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1249)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/base.py", line 26, in <module>
[1226](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1250)
    from pyiron_atomistics.lammps.control import LammpsControl
[1227](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1251)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/control.py", line 11, in <module>
[1228](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1252)
    from pyiron_atomistics.lammps.units import LAMMPS_UNIT_CONVERSIONS
[1229](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1253)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/pyiron_atomistics/lammps/units.py", line 160, in <module>
[1230](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1254)
    dtype_dict[v] = np.float
[1231](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1255)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.9/site-packages/numpy/__init__.py", line 284, in __getattr__
[1232](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1256)
    raise AttributeError("module {!r} has no attribute "
[1233](https://github.com/pyiron/ironflow/actions/runs/3887984210/jobs/6634855396#step:3:1257)
AttributeError: module 'numpy' has no attribute 'float'

liamhuber · 2023-01-11T16:40:35Z

Coverage is just complaining because of access rights (coveralls.exception.CoverallsException: Not on Travis or CircleCI. You have to provide either repo_token in .coveralls.yml or set the COVERALLS_REPO_TOKEN env var.). In fact, I've already fought with this before and followed the relevant advice.

I'm going to just merge and see if it's a branch-vs-main access issue somehow. Otherwise the conventional wisdom over on the coveralls github issue is to try deleting the repo on the coveralls side and re-add it.

liamhuber added 2 commits December 2, 2022 12:47

Introduce batched data

4c8b957

Value matching looks for a list-like object instead of directly looking at the value itself.

Extend dtypes with numpy classes

f3bb18f

liamhuber added 6 commits December 5, 2022 09:15

Rename master to main

d33d706

Also for a couple references to github.com/pyiron/pyiron/tree/master, which worked because github kindly redirects us, but should still be main now.

Merge branch 'main' into batching

f021415

Remove batch node

8386c47

A bespoke node causes all sorts of headaches let's switch over to making ports batchable and having a different update method under the hood

Make batching a property of dtypes

69a29c4

Add a control button for batching input

c9412e3

Add more info messages

9cac304

liamhuber added 17 commits December 9, 2022 10:26

Add item access for the ports

6f5fcf6

Update control panel on (un)batching

617fdb3

Adjust formatting

b0887ce

Refactor: extract batching as a parent class

26ffaa5

Only allow batching for nodes that inherit from the batching node

307343a

Add another abstraction layer

e5ba9ad

In anticipation of batching nodes that run jobs

Add error catching

5e0e8a6

Apply batching parent to all the structure nodes

feb7cc4

Allow output to be un-typed for the purpose of setting batchedness

3830ccf

Clear outputs on invalid input

f2c1950

Fix attribtue error checking

13ee50f

Ironflow IO ports always have dtype, but sometimes it's None, so we can make this check a bit more specific

Make sure node updates happen on batch status changes

f29cfe9

Get rid of Char

cdca9b9

chr is a method not a type anyhow

Get batch matching tests working with None values

80ce548

🐛 Fix typo

03f91f4

Label batched data upper-case on nodes

dfa013c

Doesn't work for un-typed (output) ports yet! Probably need a new `Untyped` dtype for that, which would also allow me to get rid of a bunch of `if dtype is not None` checks

Add a new untyped dtype

d80d943

liamhuber added 21 commits January 9, 2023 13:45

Refactor: slide methods

5e598ba

Revise Data tests

4f9127c

Make Data conform to spec

caa98da

Add helper function

9eb9efb

Make Choice conform to spec

bb1470b

Fix cross-type test

8712be2

Choice behaves differently by spec now

Rename, extract method, change signature

755b643

Rename method

f39f76e

Don't specify size

b34726b

It was only ever set to the default value anyhow

🐛 sin not sun

c83b6cc

Actually transpose now

96b35b9

Ensure that 1D vectors do get transposed into 2d matrix space

24021b0

Apply new types to pyiron nodes

f7b8071

Allow List default to be None

b97d9d0

Make note of a useful bug

c4dc2c4

Refactor: slide

603422a

Relax QuickPlot typing

bd2f712

So it will play well with manipulated arrays

Add node for selecting by index

eeef6ca

Change selector defaults

fd48165

🐛 always return something

5b97d31

Relax plotting input type constraints even further

2b89eea

liamhuber removed the stale label Jan 10, 2023

Update example to use batching

8a90767

liamhuber added the format_black Invoke black formatting CI label Jan 11, 2023

Format black

55ce1e0

liamhuber added the run_coverage Triggers the full test suite and coverage label Jan 11, 2023

liamhuber merged commit c5e3670 into main Jan 11, 2023

liamhuber deleted the batching branch January 11, 2023 16:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Batching and ubiquitous dtypes #144

Batching and ubiquitous dtypes #144

liamhuber commented Dec 2, 2022 •

edited

Loading

github-actions bot commented Dec 2, 2022

coveralls commented Dec 2, 2022 •

edited

Loading

liamhuber commented Dec 8, 2022

liamhuber commented Dec 12, 2022 •

edited

Loading

liamhuber commented Jan 10, 2023

liamhuber commented Jan 11, 2023

Batching and ubiquitous dtypes #144

Batching and ubiquitous dtypes #144

Conversation

liamhuber commented Dec 2, 2022 • edited Loading

github-actions bot commented Dec 2, 2022

coveralls commented Dec 2, 2022 • edited Loading

Pull Request Test Coverage Report for Build 3622372693

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

liamhuber commented Dec 8, 2022

liamhuber commented Dec 12, 2022 • edited Loading

liamhuber commented Jan 10, 2023

liamhuber commented Jan 11, 2023

liamhuber commented Dec 2, 2022 •

edited

Loading

coveralls commented Dec 2, 2022 •

edited

Loading

liamhuber commented Dec 12, 2022 •

edited

Loading