[REF] Decision Tree Modularization #756

jbteves · 2021-07-15T22:58:09Z

Closes #403, closes #808, closes #809, closes #889, closes #892, closes #936, closes #931, closes #927
Supersedes #592

Changes proposed in this pull request:
See #592
This replaces the inflexible decision tree in tedica.py with a modular structure that will allow for multiple default and user-defined decision trees along with a more interpretable and flexible system for tracking and understanding the results.
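As a rough illustration of what "modular" could mean here, the sketch below treats a decision tree as plain data: a list of nodes, each naming a function and its parameters, run in order. It is not the code in this PR. The node functions, the "nodes" and "parameters" keys, the threshold, and the metric values are all hypothetical; only the idea of a tree made of swappable steps comes from the description above.

```python
# Hypothetical node functions: each receives the component metrics and the
# current labels, and returns an updated copy of the labels.
def reject_low_kappa(metrics, labels, threshold):
    return {
        comp: ("rejected" if metrics[comp]["kappa"] < threshold else label)
        for comp, label in labels.items()
    }


def accept_unclassified(metrics, labels):
    return {
        comp: ("accepted" if label == "unclassified" else label)
        for comp, label in labels.items()
    }


# Registry mapping the names used in a tree to the callables above.
NODE_FUNCTIONS = {
    "reject_low_kappa": reject_low_kappa,
    "accept_unclassified": accept_unclassified,
}

# A tree is just data, so a different tree can reuse the same functions.
tree = {
    "nodes": [
        {"functionname": "reject_low_kappa", "parameters": {"threshold": 20}},
        {"functionname": "accept_unclassified", "parameters": {}},
    ]
}


def run_tree(tree, metrics):
    """Apply every node in order, starting from all-unclassified labels."""
    labels = {comp: "unclassified" for comp in metrics}
    for node in tree["nodes"]:
        func = NODE_FUNCTIONS[node["functionname"]]
        labels = func(metrics, labels, **node["parameters"])
    return labels


metrics = {0: {"kappa": 45.0}, 1: {"kappa": 12.0}}  # made-up values
labels = run_tree(tree, metrics)
print(labels)
```

Because the tree is data rather than hard-coded control flow, swapping or reordering steps only means editing the list of nodes.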

Noteworthy implemented features / changes:

Remaining work where we can really use more help

Once the mini-tools are created, have multiple people run on multiple datasets to make sure it runs, gives plausible results, and users with various levels of expertise understand what’s happening.
- If you want to help with this, leave a comment or contact @handwerkerd
- Does Main match the output of the kundu decision tree?
- Is the minimal decision tree reliably more conservative? The minimal tree should accept some components that were rejected by the kundu tree. It’s possible the kundu tree will accept a few low variance components that are rejected by the minimal tree.
After the documentation is cleaned up, we will need both developers and non-developer users to read it for clarity.

There are several improvements that aren’t necessary before merging this PR, which are opened as standalone issues:

Improve summaries and visualizations of results with modularized decision tree #888: For built-in tedana reports, better use classification tags to allow for dynamic coloring or selection of components. This would be better done by someone who is a more experienced bokeh coder.
Improve summaries and visualizations of results with modularized decision tree #888: Write code to automatically create a flow chart or other summary/visualization for a decision tree. This can also be used to make a run-specific visualization that shows how components were reclassified at each node of the decision tree.
harmonize terminology across codebase #919: For example, consistently using component_table vs comptable and reducing places where tags with capitalization differences can cause problems.
Refactor the tedana.py workflow to better use modularized code #920
Check metrics exist before running tree. Possibly calculate metrics from tree #921
Add check of a component metric is used that includes n/a values #922
Divergence between older MEICA component selection and tedana #929: We decided to merge this PR and then make a separate PR with a change that will cause a divergence between this PR and the current Main but will line up better with the older MEICA.
Re-do interactive reports demo in documentation #942

Discussed but not going to open an issue unless others have specific use-cases planned for this:

ica_reclassify.py currently just works for manually changing accepted and rejected classifications. Either that function or another can be used to run a follow-up decision tree. Figure out if there are potential use-cases for this functionality and then update code.

tsalo

I have a couple of initial comments. Also, it looks like tedana/selection/DecisionTree.py and tedana/selection/decision_tree_class.py are duplicates.

tsalo · 2021-07-16T16:36:08Z

tedana/resources/decision_trees/kundu.json

+    "info": "Following the full decision tree designed by Prantik Kundu",
+    "report": "This is based on the minimal criteria of the original MEICA decision tree without the more agressive noise removal steps",
+    "refs": "Kundu 2013",
+    "necessary_metrics": [


I think just metrics would be cleaner here.

Thoughts @handwerkerd ?

There are two terms used in the code: necessary_metrics and used_metrics. necessary_metrics are declared up-front. With this input, we can take a tree, make a list of all necessary metrics, and calculate only those metrics. used_metrics is added to as metrics are used by the decision tree. At the end, there is a check to make sure no used metrics were undeclared in necessary_metrics.

The other reason I used this terminology is, if we eventually do set up the code to calculate metrics based on a decision tree, then we can have necessary_metrics that are used in the tree and something like additional_metrics which should be calculated, but not used.

I'm not wed to this exact terminology, and am open to ideas for better descriptive terms, but I think metrics alone is insufficient.
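The bookkeeping described in this reply can be sketched roughly as follows. This is only an illustration, not the code in this PR: the tree layout, the per-node "metrics" key, the node names, and the metric names are hypothetical. Only the terms necessary_metrics and used_metrics come from the discussion.

```python
def check_used_metrics(necessary_metrics, used_metrics):
    """Return the metrics that were used but never declared up-front."""
    return sorted(set(used_metrics) - set(necessary_metrics))


# Hypothetical tree: metrics are declared once, and each node lists what it reads.
tree = {
    "necessary_metrics": ["kappa", "rho"],
    "nodes": [
        {"functionname": "node_a", "metrics": ["kappa", "rho"]},
        {"functionname": "node_b", "metrics": ["variance explained"]},
    ],
}

used_metrics = set()
for node in tree["nodes"]:
    # A real node would classify components here; this sketch only records
    # which metrics the node touched.
    used_metrics.update(node["metrics"])

# Final check: anything used but not declared is reported.
undeclared = check_used_metrics(tree["necessary_metrics"], used_metrics)
print(undeclared)
```

Declaring the metrics up-front is what would let a workflow compute only what a given tree needs, while the final check catches trees whose declaration has drifted from what the nodes actually read.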

handwerkerd · 2021-11-12T20:01:54Z

@ME-ICA/tedana-devs Josh and I have been working on this and there's been lots of progress. I updated the initial comment to keep track of what's done and what still needs to get done. The big accomplishment is that the minimal decision tree is fully functional with all the structural and functionality changes that were recently discussed.
Function docstrings are similarly updated (though I'm sure they won't all render perfectly) and I wrote a document explaining the whole process at: https://github.com/jbteves/tedana/blob/JT_DTM/docs/building_decision_trees.rst

As you can see, there's still a good bit to do, including making the functions underlying the kundu decision tree functional again. This obviously isn't ready for a full review, but feedback is welcome.

eurunuela

I have just gone through the docs page as requested in our last meeting and have made some comments mostly related to typos.

The text is clear to me, but I might be biased after you explained the ideas behind the modularization in our last meeting.

I'll have a look again once the code is ready for reviews.

docs/building_decision_trees.rst

handwerkerd · 2022-03-17T15:58:02Z

Here's the poll for scheduling a decision tree walk-through meeting sometime in the next two weeks: https://doodle.com/meeting/participate/id/9b6wn5Ld I've already limited to times I could be available.
@tsalo @eurunuela @notZaki & @jbteves all expressed interest in attending.

eurunuela · 2022-04-01T10:14:57Z

I was reading the docs again and I think it would be super helpful if every time we mention, e.g., necessary_metrics or functionname, we linked those keywords to where they are in the minimal decision tree. This way we direct users to an example, which will probably help them understand the docs better.

jbteves · 2022-04-12T19:30:19Z

@handwerkerd had to overwrite your changes for simplicity for mmix/black formatting, sorry.

jbteves · 2022-04-13T14:36:46Z

@eurunuela as an FYI I have created a new class, InputHarvester, that reads an OutputGenerator's "registry" of files from a previous run. You can use this to get all of the information you might want from a tedana run. See "tedana_reclassify.py" for an example of how this is done.

jbteves · 2022-04-13T14:38:05Z

More general FYI: this is a huge breaking change, but basically the harder I tried to put --manacc into this framework, the worse it looked, so I gave up and created a new workflow, tedana_reclassify, that has --manacc and --manrej options. It appears to work locally. I can't tell about CI because of the jinja issue.

codecov · 2022-08-09T14:47:14Z

Codecov Report

Patch coverage: 96.18% and project coverage change: -4.34 ⚠️

Comparison is base (fb6e255) 93.30% compared to head (8dcdafa) 88.97%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #756      +/-   ##
==========================================
- Coverage   93.30%   88.97%   -4.34%     
==========================================
  Files          28       27       -1     
  Lines        2346     3373    +1027     
  Branches        0      617     +617     
==========================================
+ Hits         2189     3001     +812     
- Misses        157      226      +69     
- Partials        0      146     +146

Impacted Files	Coverage Δ
tedana/decomposition/pca.py	`76.61% <ø> (-12.91%)`	⬇️
tedana/utils.py	`94.59% <ø> (-2.71%)`	⬇️
tedana/reporting/html_report.py	`91.39% <60.00%> (-8.61%)`	⬇️
tedana/reporting/static_figures.py	`96.34% <66.66%> (-2.45%)`	⬇️
tedana/docs.py	`77.35% <77.35%> (ø)`
tedana/reporting/dynamic_figures.py	`96.05% <81.25%> (-3.95%)`	⬇️
tedana/workflows/tedana.py	`80.95% <85.71%> (-8.68%)`	⬇️
tedana/io.py	`87.37% <87.35%> (-6.64%)`	⬇️
tedana/workflows/ica_reclassify.py	`97.79% <97.79%> (ø)`
tedana/selection/component_selector.py	`99.01% <99.01%> (ø)`
... and 7 more

... and 14 files with indirect coverage changes


* Cleans up how testing datasets are downloaded within test_integration.py. In Main & the current JT_DTM each dataset is downloaded in a slightly different way and the five-echo data are downloaded twice.
* Added `data_for_testing_info` which gives the file hash location and local directory name for each of the four files we download. All tests are updated to use this function.
* The local copy of testing data will now go into the `.testing_data_cache` subdirectory
* The downloaded testing data will be in separate directories from the outputs so the downloaded directories can be completely static
* When `download_test_data` is called, it will first download the metadata json to see if the last updated copy on osf.io is newer than the downloaded version and will only download if osf has a newer file. Downloading the metadata will happen frequently, but it will hopefully be fast.
* The logger is now used to give a warning if osf.io cannot be accessed, but it will still run using cached data
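The "only download if osf has a newer file" rule in this commit message could look something like the sketch below. It is not the actual `download_test_data` implementation: the function name `needs_download`, the use of ISO 8601 timestamp strings, and the way an unreachable server is represented (as None) are assumptions made for illustration.

```python
from datetime import datetime


def needs_download(remote_modified, local_modified):
    """Decide whether the cached copy should be refreshed.

    remote_modified: ISO 8601 timestamp from the remote metadata, or None if
        the server could not be reached.
    local_modified: ISO 8601 timestamp of the cached copy, or None if nothing
        has been downloaded yet.
    """
    if remote_modified is None:
        # Server unreachable: keep using the cache (a caller would log a warning).
        return False
    if local_modified is None:
        # Nothing cached yet, so a download is required.
        return True
    remote = datetime.fromisoformat(remote_modified)
    local = datetime.fromisoformat(local_modified)
    return remote > local


print(needs_download("2023-03-02T00:00:00", "2023-03-01T00:00:00"))
print(needs_download(None, "2023-03-01T00:00:00"))
```

Keeping the decision in a small pure function like this makes the caching rule easy to test without any network access.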

* add pandas version check >= 1.5.2 and mod behavior (#938)
* add version check and mod behavior if pandas >= 1.5.2 to prevent error in writing csv
* formatting
* adding P. Molfese
---------
Co-authored-by: Molfese <[email protected]>
* readded InputHarvester and expanduser
* fixed handler base_dir path
* mixing matrix file always in registry
---------
Co-authored-by: Peter J. Molfese <[email protected]>
Co-authored-by: Molfese <[email protected]>
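A version check of this kind might be sketched as below. It assumes, without confirmation from this thread, that the check guards the `to_csv` keyword rename that also appears in this PR's commit list (line_terminator --> lineterminator). The helper name `newline_kwarg` is hypothetical, and the 1.5.2 threshold is simply the one quoted in the commit message.

```python
import re


def newline_kwarg(pandas_version, newline="\n"):
    """Return the to_csv keyword argument for the line ending as a dict."""
    # Keep only the leading numeric release, e.g. "2.0.0rc1" -> (2, 0, 0).
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", pandas_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {pandas_version!r}")
    release = tuple(int(part or 0) for part in match.groups())
    # Threshold taken from the commit message; older versions use the old name.
    name = "lineterminator" if release >= (1, 5, 2) else "line_terminator"
    return {name: newline}


# Intended use (requires pandas, so it is left as a comment):
#   df.to_csv(path, sep="\t", **newline_kwarg(pd.__version__))
print(newline_kwarg("1.5.2"))
print(newline_kwarg("1.4.4"))
```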

handwerkerd · 2023-05-11T13:56:37Z

@ME-ICA/tedana-devs Today at 2:00PM EST (Your time zone), we are planning to release the last version of tedana before the major refactor (v0.0.13) and then merge this PR and release the more modularized version (v23.0.0).

Since this is way-too-many years in the making, we'll do this over zoom. Come join the "fun" at https://nih.zoomgov.com/j/1612837388?pwd%3DK1drdXVkK0xER1hEbkNzbUljQ0ZoUT09&sa=D&source=calendar&usd=2&usg=AOvVaw1cvaBfqiQE-iNPsBQwHmk5
Meeting ID: 161 283 7388
Passcode: 153769

tsalo

LGTM!

jbteves marked this pull request as draft July 15, 2021 22:58

tsalo mentioned this pull request Jul 16, 2021

[REF] Decision tree modularization #592

Closed

tsalo reviewed Jul 16, 2021

handwerkerd mentioned this pull request Sep 24, 2021

Give elbow criteria a closer look #810

Open

handwerkerd mentioned this pull request Nov 12, 2021

November Developers' call. THURSDAY, 11/18 #826

Closed

handwerkerd mentioned this pull request Nov 17, 2021

Run getelbow only if there are enough components #809

Closed

eurunuela reviewed Dec 3, 2021

handwerkerd mentioned this pull request Feb 11, 2022

Print optimal number of maPCA components and plot optimization curves #839

Merged

handwerkerd mentioned this pull request Mar 30, 2022

Decision tree modularization walk through #864

Closed

tsalo reviewed Mar 31, 2022

docs/building_decision_trees.rst Outdated

tsalo reviewed Mar 31, 2022

tedana/resources/config/outputs.json Outdated

handwerkerd mentioned this pull request Apr 1, 2022

Orthogonalizing rejected components permanently alters the ICA mixing matrix #868

Closed

Joshua Teves and others added 9 commits August 8, 2022 16:21

Decision tree refactor with minimal and kundu

4b8e555

Fix commented-out tedana workflow

7663afb

Appease the style checker

1284ff5

All tremble before the mighty linter

bfbc509

Actually fix incorrect style checker issue

f01a9a9

Unfix another style checker error

29b6fba

Attempt to make Black happy, even though it does not actually say wha…

ac34882

…t's wrong

ran black

9e8159a

Merge pull request #10 from handwerkerd/DTM_BlackFix

c67bfa8

ran black

handwerkerd and others added 2 commits August 10, 2022 14:37

Added elbows to reports

fd4abf2

Merge pull request #11 from handwerkerd/AddElbowsToReport

8935063

Fixed rho threshold error and added elbows to reports

handwerkerd and others added 14 commits February 28, 2023 14:34

Change to TestLGR.info

2e45e8d

Fixing high variance classification mess (#34)

8acf185

* Added dec_reclassify_high_var_comps plus
* clarified diff btwn rho_kundu and _liberal thresh
* Clarified docs for minimal tree

Replace versioneer with hatch (#35)

a2a30fa

* Update gitignore.
* Delete _version.py
* Adopt new packaging.
* Ignore the _version.py file.

Fix CI (#36)

7129854

* Base the cache on pyproject.toml, not setup.cfg.
* Also drop use of setup.py in publishing action.

Add flake8-pyproject as a requirement. (#37)

df57b56

Try fixing coverage. (#38)

f7cf821

Improving ica_reclassify (#39)

ac52721

* ica_reclassify docs now rendering in usage.html
* moves file parsing to ica_reclassify_workflow
* added error checks and tests

Drop Python 3.6 and 3.7 support (#40)

29eee66

* Drop Python 3.6 and 3.7 support.
* line_terminator --> lineterminator

added mixm to 4echo test (#43)

a3dc1c2

Updating Contributor Information (#41)

3e48d9f

* Some contributor updates
* Added doc to Marco

Added flow charts and some text (#44)

1e7ee6e

* Added flow charts and some text
* Finished flow charts and text.
Co-authored-by: marco7877 <[email protected]>
---------
Co-authored-by: marco7877 <[email protected]>

RTDfix (#45)

5921237

handwerkerd previously approved these changes May 5, 2023

Update documentation (#46)

26ff954

* Update docs.
* Update docs/building_decision_trees.rst
Co-authored-by: Dan Handwerker <[email protected]>
---------
Co-authored-by: Dan Handwerker <[email protected]>

handwerkerd dismissed their stale review via 26ff954 May 8, 2023 14:43

This was referenced May 9, 2023

Adding elbow & variance information to the reports #764

Open

Re-do interactive reports demo in documentation #942

Closed

Output docs on one page (#47)

bc1cf40

* Output docs on one page
* added new multi-echo lectures

handwerkerd approved these changes May 10, 2023

Merge branch 'main' into JT_DTM

8dcdafa

tsalo approved these changes May 11, 2023

tsalo merged commit 86a8139 into ME-ICA:main May 11, 2023

This was referenced May 17, 2023

custom sections for napoleon output files #910

Closed

add_na_to_rationale: Add n/a to accepted components rationale #907

Closed

Add Rica documentation #883

Closed

Use custom docstring section for generated files #828

Closed

tsalo mentioned this pull request Apr 21, 2024

Minimum image regression relies on distinction between "accepted" and "ignored" components #1085

Closed


tsalo left a comment

tsalo Jul 16, 2021

jbteves Nov 18, 2022

handwerkerd Nov 23, 2022
