[REF] Decision tree modularization #592

tsalo · 2020-08-10T18:47:08Z

Closes #403.

Changes proposed in this pull request:

- The decision tree steps in selection are modularized into new functions and classes.
- decision_tree_class.py contains the new class for initializing and running a decision tree.
- selection/data/ contains JSON files that define a simpler, conservative decision tree and the full MEICA v2.7 decision tree.
- Most functions that define each decision tree step are in selection_nodes.py.
- Some minor changes to the decision tree steps might break the existing tests.
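
To illustrate the general idea of a decision tree defined in a JSON file and run step by step, here is a minimal sketch. It is not the code in this PR: the JSON schema, the node function names (`reject_if_greater`, `accept_remaining`), the `NODE_FUNCTIONS` registry, and `run_tree` are all hypothetical, and the real files in selection/data/ and the functions in selection_nodes.py may be structured differently.

```python
import json

# Hypothetical tree specification. The real JSON files may use another schema.
TREE_JSON = """
{
  "tree_id": "toy_conservative",
  "nodes": [
    {"function": "reject_if_greater",
     "parameters": {"left": "rho", "right": "kappa"}},
    {"function": "accept_remaining", "parameters": {}}
  ]
}
"""


def reject_if_greater(components, left, right):
    """Reject unclassified components whose `left` metric exceeds `right`."""
    for comp in components:
        if comp["classification"] == "unclassified" and comp[left] > comp[right]:
            comp["classification"] = "rejected"


def accept_remaining(components):
    """Accept every component that is still unclassified."""
    for comp in components:
        if comp["classification"] == "unclassified":
            comp["classification"] = "accepted"


# Registry mapping the names used in the JSON file to Python functions.
NODE_FUNCTIONS = {
    "reject_if_greater": reject_if_greater,
    "accept_remaining": accept_remaining,
}


def run_tree(tree_json, components):
    """Apply each node of the tree, in order, to the component table."""
    tree = json.loads(tree_json)
    for node in tree["nodes"]:
        func = NODE_FUNCTIONS[node["function"]]
        func(components, **node["parameters"])
    return components


components = [
    {"kappa": 80.0, "rho": 20.0, "classification": "unclassified"},
    {"kappa": 15.0, "rho": 60.0, "classification": "unclassified"},
]
labels = [c["classification"] for c in run_tree(TREE_JSON, components)]
```

In this toy layout, adding a new tree means writing a new JSON file, and adding a new step means registering one more function, which is the kind of separation the list above describes.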

Known issues:

- Some functions in the MEICA decision tree haven't been run yet because they require linking to metric calculation functions. Those functions and all of PR [REF] Modularize metric calculation #447 are already merged into this code to make that possible.
- The function documentation isn't being linked to or generated in the ReadTheDocs API section. The documentation also has some gaps. In particular, the documentation for the class structure is not done, and there should be a guide that explains the JSON files for the decision trees and lists rules for creating a new decision tree (e.g., even though it is possible, it is not advised to create a decision tree that can reclassify accepted or rejected components once they've received those labels).
- More checks and warnings can be added to various functions. For example, there could be a warning if an accepted or rejected component is reclassified at a later step.
- There is repetitive code at the beginning of each decision tree function. That could be moved into an initialization function (or all the functions in selection_nodes.py could become a class, but that might be more hassle than it's worth). Neither change would make the code much shorter, but either would make it easier to know what is expected of each function.
- Some of the functions for the later steps in the decision tree are ugly (e.g., highvariance_highmeanmetricrank_highkapparatio). These are all functions that tweak metrics, threshold each of them separately, and use the intersection of those thresholded values to make a decision. I could, in theory, write a multiple-threshold comparison function that would be more generally useful. The downside is that it would either break the current way decision nodes are set up or push many more details and function calls into the decision node specification in the JSON file. On balance, I think these long functions with explanations of what they do are better than adding complexity to the JSON files, but this deserves a bit more thought. I'm also hoping we'll move away from decision trees with messy joint measurements, so that the currently messy functions become legacy functions maintained for a rough replication of the MEICA method without spawning too many additional messy functions.
- No testing functions have been written for this code yet.
All the other improvements that aren't coming to mind right now.
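
For reference, the multiple-threshold comparison and the reclassification warning mentioned in the known issues could look roughly like the sketch below. This is only an illustration under assumptions: `multi_threshold`, `relabel`, the `(metric, operator, threshold)` condition format, and the metric names are hypothetical and are not taken from selection_nodes.py.

```python
import operator
import warnings

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
FINAL_LABELS = {"accepted", "rejected"}


def multi_threshold(metrics, conditions):
    """Return indices of components that satisfy every condition.

    Each condition is a (metric_name, operator_string, threshold) tuple.
    Metrics are thresholded separately and the results are intersected.
    """
    n_comps = len(next(iter(metrics.values())))
    selected = set(range(n_comps))
    for name, op, threshold in conditions:
        passing = {
            i for i, value in enumerate(metrics[name]) if OPS[op](value, threshold)
        }
        selected &= passing
    return sorted(selected)


def relabel(labels, indices, new_label):
    """Return a relabeled copy, warning when a final label is overwritten."""
    updated = list(labels)
    for i in indices:
        if updated[i] in FINAL_LABELS and updated[i] != new_label:
            warnings.warn(
                f"Component {i} reclassified from {updated[i]} to {new_label}"
            )
        updated[i] = new_label
    return updated


# Toy data: three components, two hypothetical metrics.
metrics = {"variance": [5.0, 0.5, 6.0], "kappa_ratio": [3.0, 4.0, 0.2]}
labels = ["accepted", "unclassified", "unclassified"]

hits = multi_threshold(
    metrics, [("variance", ">", 1.0), ("kappa_ratio", ">", 1.0)]
)

# Capture warnings so the reclassification of component 0 can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_labels = relabel(labels, hits, "rejected")
```

The trade-off raised above is visible here: a general function like this keeps the Python short, but the list of conditions would have to live in the JSON node specification.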

tsalo · 2020-08-21T15:12:04Z

Update from today's call: @handwerkerd will figure out the functionality we need, then I will figure out whether Pydra can already provide it or whether the Pydra devs are willing to add it.

Also, as long as the jsons mesh well with Pydra (and thus would be stable across a custom-to-Pydra transition), we could move forward with the internal stuff as-is and shift to Pydra at a later date.

tsalo · 2020-09-08T16:52:17Z

@handwerkerd have you had a chance to look into the functionality we would need Pydra to support?

emdupre · 2020-09-08T16:58:38Z

> the functionality we would need Pydra to support?

Sorry, I know I missed the last call, but I'm a little unclear why we'll need Pydra at all. So knowing a bit about the functionality we're looking for would be really helpful!

jbteves · 2020-09-08T17:06:01Z

@emdupre we were debating whether we wanted to make the workflow Pydra-compatible, but I don't recall reaching any conclusion on whether it was worth it, other than that it's not worth it unless the Pydra folks can help.

tsalo · 2020-09-08T17:10:07Z

> Sorry, I know I missed the last call, but I'm a little unclear why we'll need Pydra at all.

We can make it work without Pydra, but the modularization we want is essentially a standard workflow, and I'd rather use a stable workflow engine than try to cobble together one ourselves. I think using a custom "engine" to build workflows will only make things harder down the line (e.g., with debugging and training new contributors), so we should only use one if (1) an existing general-purpose engine won't work for our needs or (2) existing engines have too many requirements to be worth it (as with nipype).

handwerkerd · 2020-09-08T17:41:53Z

I haven't had a chance to look yet.
As @tsalo notes, it would probably be faster to keep the code as-is, but using a more generalized pipeline deployment might be better for long-term maintenance and adding features. That said, the big questions we left the last meeting with were: 1. What functionality does tedana need that isn't currently in Pydra? 2. Is it realistic to ask them to add the needed functionality to Pydra? If yes to both, then we consider Pydra in the short term. If no, we go forward as-is for now.

tsalo · 2021-07-16T15:55:36Z

Should we close this in favor of #756?

jbteves · 2021-07-16T15:56:26Z

I believe so. Any objections @handwerkerd ?

handwerkerd · 2021-07-16T17:02:33Z

No objections.

jbteves and others added 30 commits November 6, 2019 14:42

Adds integration and five echo skipping

13a54d4

Style fixes

1bb1f18

Updates config for CircleCI

08069d4

Merge remote-tracking branch 'upstream/master' into feature/pytest_skip_integration

3f50216

Attempts to fix YML

067ac2b

[TEST] Update Dockerfile to match new integr tests

888de0f

[TEST] Fixes integration tests in Docker image

7209897

[FIX] Remove intermediate IO files

5d2b65a

Resolves merge conflict, adds output check

df6261f

Some fixes

2210aa7

[TEST] Updates dev_tool testing infra

efd9e7a

[TEST] Fixes pytest integration path checking

494bc63

[TEST] CircleCI uses Docker image to run tests

95dffda

[FIX] Minor dev_tool issues for CircleCI

3b966fa

[TEST] Use variable for integration test filename

796a0d2

Attempts to fix CircleCI style check

769f4b7

Merge remote-tracking branch 'origin/feature/pytest_skip_integration' into feature/pytest_skip_integration

a4e17a8

Revert "Attempts to fix CircleCI style check"

08a3ce3

This reverts commit 769f4b7.

Attempt to fix tput call

8769d0b

Adds checkout to code in YML

20086cb

[TEST] Integration tests run in parallel

3b1f4f6

[TEST] Separate data downloads from Docker build

86b0a9b

[TEST] Update integration test data path

17ce437

[TEST] CircleCI uses good Docker

0226e03

[TEST] No version check in circleci

39fdae1

[TEST] Checkout for get_data / style check

07e18e5

Attempts to fix integration test inclusion

2ab23bd

Merge remote-tracking branch 'origin/feature/pytest_skip_integration' into feature/pytest_skip_integration

31b790a

[TEST] Checkout for get_data / style check

74551a0

[FIX] Fix circleci config hopefully

a129019

jbteves mentioned this pull request Sep 8, 2020

Topics for the September 2020 Developers Call #604

Closed

jbteves mentioned this pull request Oct 7, 2020

[REF] Modularize metric calculation #591

Merged

stale bot added the stale label Dec 8, 2020

jbteves mentioned this pull request Dec 8, 2020

Revisit component sorting within metric calculation #564

Closed

Base automatically changed from master to main February 1, 2021 23:57

stale bot removed the stale label Feb 1, 2021

ME-ICA deleted a comment from stale bot Feb 6, 2021

handwerkerd mentioned this pull request Mar 22, 2021

Incorporate type hints when possible #704

Open

jbteves added the breaking change label (Will make a non-trivial change to outputs) and removed the output-change label Apr 19, 2021

jbteves mentioned this pull request Jul 15, 2021

[REF] Decision Tree Modularization #756

Merged

22 tasks

jbteves closed this Jul 16, 2021
