Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REF] Decision tree modularization #592

Closed
wants to merge 91 commits into from

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented Aug 10, 2020

Closes #403 .

Changes proposed in this pull request:

  • The decision tree steps in selection are modularized in new functions and classes
  • decision_tree_class.py contains the new class for initializing and running a decision tree
  • selection/data/ contains json files that define a simpler, conservative decision tree and the full MEICA v2.7 decision tree
    -Most functions that define each decision tree step are in selection_nodes.py
    -There are some minor changes in decision tree steps that might result in breaking the existing tests

Known issues:

  • Some functions in the MEICA decision tree haven't been run yet because they require linking to metric calculation functions. Those functions and all of PR [REF] Modularize metric calculation #447 is already merged into this code, to make that possible.
  • The function documentation isn't being linked to and generated in the ReadTheDocs API section. The documentation also has some gaps. In particular, the documentation for the class structure is not done and there should be a guide explaining the json files for the decision trees and listing rules for creating a new decision tree (i.e. Even though it is possible, it is not advised to create a decision tree that can re-classify accepted or rejected components once they've received those labels)
  • More checks and warnings can be added to various functions. For example, there can be a warning if an accepted or rejected component is reclassified at a later step
  • There is repetitive code at the beginning of each decision tree function. That can be changed into an initialization function (or all the functions in selection_nodes.py could become a class, but that might be more hassle than it's worth). Either of these changes probably wouldn't make the code much shorter, but it would make it easier to know what is expected for each function
  • Some of the functions for the later steps in the decision tree are ugly (i.e. highvariance_highmeanmetricrank_highkapparatio ). These are all functions that tweak metrics, threshold each of them separately, and use the intersection of those thresholded values to make a decision. I can theoretically make a multiple threshold comparison function that would be more generally useful. The minus is that would either break the current way decision nodes are set up, or push a lot more details and function calls into a decision node specification in the json file. On balance, I think these long functions with explanations of what they do are better than adding complexity to the json files, but this deserves a bit more thought. Also, I'm hoping we'll move away from decision trees with some messy joint measurements so I hope the currently messy functions will become legacy functions to maintain a rough replication of the MEICA method, but won't spawn too many additional messy functions.
  • No testing functions have been written for this code yet
  • All the other improvements that aren't coming to mind right now.

jbteves and others added 30 commits November 6, 2019 14:42
@tsalo
Copy link
Member Author

tsalo commented Aug 21, 2020

Update from today's call: @handwerkerd will figure out the functionality we need, then I will figure out if Pydra can do them or if the Pydra devs are willing to add them to Pydra.

Also, as long as the jsons mesh well with Pydra (and thus would be stable across a custom-to-Pydra transition), we could move forward with the internal stuff as-is and shift to Pydra at a later date.

@tsalo
Copy link
Member Author

tsalo commented Sep 8, 2020

@handwerkerd have you had a chance to look into the functionality we would need Pydra to support?

@emdupre
Copy link
Member

emdupre commented Sep 8, 2020

the functionality we would need Pydra to support?

Sorry, I know I missed the last call, but I'm a little unclear why we'll need Pydra at all. So knowing a bit about the functionality we're looking for would be really helpful !

@jbteves
Copy link
Collaborator

jbteves commented Sep 8, 2020

@emdupre we were debating if we wanted to make the workflow pydra-compatible, but I don't recall reaching any conclusions on whether it was worth it, other than unless Pydra folks can help it's not worth it.

@tsalo
Copy link
Member Author

tsalo commented Sep 8, 2020

Sorry, I know I missed the last call, but I'm a little unclear why we'll need Pydra at all.

We can make it work without Pydra, but the modularization we want is essentially a standard workflow, and I'd rather use a stable workflow engine than try to cobble together one ourselves. I think using a custom "engine" to build workflows will only make things harder down the line (e.g., with debugging and training new contributors), so we should only use one if (1) an existing general-purpose engine won't work for our needs or (2) existing engines have too many requirements to be worth it (as with nipype).

@handwerkerd
Copy link
Member

I haven't had a chance to look yet.
As @tsalo notes, it would probably be faster to keep the code as-is, but it might be better for long-term maintenance and adding features, if we use a more generalized pipeline deployment. That said, the big question we left the last meeting with with was: 1. What functionality does tedana need that isn't currently in Pydra; 2. Is it realistic to ask them to add needed functionality to Pydra? If yes to both, then we consider Pydra in the short term. If no, we go forward, as is for now.

@stale stale bot added the stale label Dec 8, 2020
Base automatically changed from master to main February 1, 2021 23:57
@stale stale bot removed the stale label Feb 1, 2021
@ME-ICA ME-ICA deleted a comment from stale bot Feb 6, 2021
@handwerkerd handwerkerd added effort: high More than 40h total work enhancement issues describing possible enhancements to the project impact: high Enhancement or functionality improvement that will affect most users output-change priority: medium Should get addressed soon refactoring issues proposing/requesting changes to the code which do not impact behavior labels Mar 22, 2021
@jbteves jbteves added breaking change WIll make a non-trivial change to outputs and removed output-change labels Apr 19, 2021
@jbteves jbteves mentioned this pull request Jul 15, 2021
22 tasks
@tsalo
Copy link
Member Author

tsalo commented Jul 16, 2021

Should we close this in favor of #756?

@jbteves
Copy link
Collaborator

jbteves commented Jul 16, 2021

I believe so. Any objections @handwerkerd ?

@handwerkerd
Copy link
Member

No objections.

@jbteves jbteves closed this Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change WIll make a non-trivial change to outputs effort: high More than 40h total work enhancement issues describing possible enhancements to the project impact: high Enhancement or functionality improvement that will affect most users priority: medium Should get addressed soon refactoring issues proposing/requesting changes to the code which do not impact behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Modularize the component selection decision tree
5 participants