
Modularize the component selection decision tree #403

Closed
handwerkerd opened this issue Sep 27, 2019 · 14 comments · Fixed by #756
Labels: discussion (issues that still need to be discussed), TE-dependence (issues related to TE dependence metrics and component selection)

Comments

@handwerkerd
Member

We are planning to modularize the code for the metrics calculated on each component and the decision tree for accepting/rejecting/ignoring components at the 2019 Hackathon (#373). This issue is a place to focus any discussion on how we'll do this.

Next Steps

  • Make a general framework for how the modularization will work
  • Reorganize 58% of the code in tedana
@handwerkerd added the discussion label Sep 27, 2019
@handwerkerd
Member Author

I was talking about this with @afni-dglen and he made a suggestion for one issue that was bothering me. I wanted to document that suggestion while it's on my mind. Most of the metrics can easily be calculated modularly for a list of components. Some thresholds and steps in the decision tree, however, use what I call ranking metrics: take another metric, rank all the components by that value, and then make a threshold decision based on their ranks. This gets tricky to modularize because sometimes these ranks are calculated across all components and sometimes across only a subset of components, depending on what happened earlier in the decision tree, so they can't be completely modularized away from what will happen in the tree.

The suggestion is that all threshold and ranking functions have an input to specify which components to operate on, based on classification. That is, calculate the threshold on "all components", "only unclassified components", etc. To do this, we might need to add some intermediate classification labels (e.g. "high variance components"), but that might not be a bad thing.
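As a rough sketch of what that interface could look like (the function, column, and label names below are hypothetical placeholders, not existing tedana code, and it assumes the component table is a pandas DataFrame):

import pandas as pd


def rank_threshold(comptable, metric='kappa', decide_comps='unclassified', n_keep=2):
    """Hypothetical sketch of a ranking/threshold step restricted by classification."""
    if decide_comps == 'all components':
        selected = comptable.index
    else:
        selected = comptable.index[comptable['classification'] == decide_comps]
    # Rank only the selected components by the metric (highest value = rank 1)
    ranks = comptable.loc[selected, metric].rank(ascending=False)
    return ranks[ranks <= n_keep].index


# toy usage: keep the two highest-kappa components among the unclassified ones
comptable = pd.DataFrame({
    'kappa': [80., 45., 120., 60.],
    'classification': ['unclassified', 'rejected', 'unclassified', 'unclassified'],
})
print(rank_threshold(comptable, decide_comps='unclassified', n_keep=2).tolist())  # [0, 2]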

@handwerkerd
Member Author

Another comment from @afni-dglen was that, in the current decision tree, we only record the final decision in the tree before accept/reject/ignore. This becomes an issue if people can build their own decision trees using the more modular functions. For that case, we might want to track every node in the decision tree that a component touched, instead of only the final node. Attaching an ID to each node wouldn't be hard, but swapping out the current final label for a path of steps might be a bit more work.
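A minimal sketch of that bookkeeping, assuming a pandas component table; the column name and node IDs are made up for illustration:

import pandas as pd

# Hypothetical: each component keeps the ordered list of decision-tree nodes
# that touched it, rather than only the final label.
comptable = pd.DataFrame({
    'classification': ['accepted', 'rejected', 'ignored'],
    'node_path': ['node01', 'node01;node04', 'node01;node04;node07'],
})

# When a node evaluates some components, append its ID to their paths.
touched = [0, 2]  # indices of components this node operated on
comptable.loc[touched, 'node_path'] += ';node09'
print(comptable['node_path'].tolist())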

@handwerkerd changed the title from "Modularize the component selection decision Tree" to "Modularize the component selection decision tree" Sep 27, 2019
@afni-dglen

First, both suggestions formally came from @mrneont (Paul Taylor). The suggestion was actually to rank all components at the beginning, before any decisions. Then, at the time of any decision in the tree, operate only on the rankings of components that have some other required property. The rankings do not need to be recomputed because they are still relative to each other within the list. So, taking the list of (provisionally) accepted components below as an example, if one wants the top two components, then comp4 and comp5 are accepted.

comp4 rank=1 accept
comp5 rank=3 accept
comp6 rank=2 reject
comp7 rank=4 accept
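To make that concrete, here is a small sketch of selecting the top two among only the accepted components without recomputing ranks (assuming a pandas table holding the ranks from a single up-front ranking of all components):

import pandas as pd

comps = pd.DataFrame(
    {'rank': [1, 3, 2, 4],
     'classification': ['accept', 'accept', 'reject', 'accept']},
    index=['comp4', 'comp5', 'comp6', 'comp7'])

# Operate only on the provisionally accepted components; the ranks computed
# up front stay valid because they are still relative to each other.
accepted = comps[comps['classification'] == 'accept']
top_two = accepted.nsmallest(2, 'rank')
print(top_two.index.tolist())  # ['comp4', 'comp5']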

@jbteves
Collaborator

jbteves commented Sep 27, 2019

@afni-dglen and @handwerkerd: So to clarify, the discussion centered around tracking the classification globally (and adding some classifications) so that we have some states or codes associated with certain properties, as well as the accept/reject/ignore decision? So for a simpler example, we might have:

| Item | Classification | Decision  |
|------|----------------|-----------|
| 1    | Ice Cream      | Eat       |
| 2    | Mud            | Don't Eat |
| 3    | Rocks          | Don't Eat |

Should we also then add additional classifications, so that an item can be multiple types? This seems like a plausible long-term problem. So then we might want the ability to have something like

| Item | Classification(s)    | Decision  |
|------|----------------------|-----------|
| 1    | Ice Cream, Vanilla   | Don't Eat |
| 2    | Ice Cream, Chocolate | Eat       |
| 3    | Mud                  | Don't Eat |
| 4    | Rocks                | Don't Eat |
| 5    | Mud, Rocks           | Don't Eat |

(I hope if I use this example people are more likely to read, digest, and understand than if I say components and use real classifications).

I like this idea, if I'm understanding it correctly, because it would then be easier to change which classifications lead to acceptance/rejection and to permute different choices to see their outcomes.
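Continuing the toy example, one way to get that flexibility is to keep the classification-to-decision rules as plain data so they can be swapped or permuted; a sketch (all names made up):

# The reject rules are just data; permuting them changes the outcome
# without touching the classification code itself.
reject_if = {'Mud', 'Rocks', 'Vanilla'}

def decide(classifications):
    """Return 'Eat' unless any classification is on the reject list."""
    return "Don't Eat" if reject_if & set(classifications) else 'Eat'

print(decide(['Ice Cream', 'Chocolate']))  # Eat
print(decide(['Ice Cream', 'Vanilla']))    # Don't Eat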

@handwerkerd
Member Author

@jbteves That's about right. One downside of this approach is that every function that can alter the decision for an item would need a unique identifier. This might over-extend the current number-based system (the codes in https://tedana.readthedocs.io/en/latest/outputs.html#component-tables), but I don't see an option that wouldn't potentially over-extend that system.

@jbteves
Collaborator

jbteves commented Sep 27, 2019

I don't think that would over-extend it; it just makes things messy. We could use enums or named tuples or something to make the code legible. It just means that, in addition to adding a test, any classifier function would also need to add any new values it creates to the allowable values in the enum/tuple. That way an invalid value will also trigger an interpreter error, which would help any debugging around those methods. Thoughts @emdupre @tsalo ?
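A minimal sketch of the enum idea (the state names here are illustrative, not an agreed-upon set):

from enum import Enum


class ComponentState(Enum):
    # Any classifier function that introduces a new state would add it here,
    # so the allowable values live in one place.
    UNCLASSIFIED = 'unclassified'
    HIGH_VARIANCE = 'high variance'
    ACCEPTED = 'accepted'
    REJECTED = 'rejected'
    IGNORED = 'ignored'


# Looking up an unknown state fails loudly instead of silently passing through:
ComponentState('high variance')   # fine
# ComponentState('hgih variance') # would raise ValueError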

@afni-dglen

The unique codes in the component tables may be useful as a simple counting system, but I think they won't generalize well with even small additions or changes to the processing. Named processes, as @jbteves suggests, seem like a better idea. This isn't required for processing, but keeping a history or provenance that way may be useful for later processing or record keeping.

@tsalo
Member

tsalo commented Oct 1, 2019

Multiple metrics are calculated on subsets of components at certain stages in the decision tree. Is the proposed method able to work with those?

@jbteves
Collaborator

jbteves commented Oct 1, 2019

Yes, I think the idea is that you could basically select components in certain "states" and only use those. So for example, you could take all components classified as "noise," and analyze only those. It's just that we're proposing to do the following:

  1. Make each function that performs classification add a state to components that meet its classification spec
  2. Make a component table which can hold multiple states for each component
  3. Name the states rather than using "codes." Though you can use an enum to have it both ways, really.

@tsalo added the denoising and TE-dependence labels and removed the denoising label Oct 4, 2019
@tsalo
Member

tsalo commented Oct 12, 2019

Okay, does this pseudocode fit with the proposed approach, or are there elements that I'm missing?

import numpy as np


def selection_function(comptable, data, mixing_matrix):
    """Identify "hazelnut" items, limited to vanilla ice creams.

    Hazelnut items shall be rejected, as they exhibit TE-independence.
    """
    required_states = ['ice cream', 'vanilla']
    selection_states = ['hazelnut']

    # restrict this node to components that already carry all required states
    has_required = comptable['states'].apply(
        lambda states: all(rs in states.split(', ') for rs in required_states))
    node_index = comptable.loc[has_required].index

    node_comptable = comptable.loc[node_index]
    node_mixing_matrix = mixing_matrix[node_index, :]
    updated_comptable = comptable.copy()

    # internal metric calculation step
    # In this case we're getting the total value across each component's time series,
    # which is nonsense since they're all normalized and the sum would be 0.
    node_metric = np.sum(node_mixing_matrix, axis=1)

    # selection/classification step based on component table
    sel_index = node_comptable.loc[node_metric > 5.].index

    # update states and decisions: append the new state(s) and reject
    updated_comptable.loc[sel_index, 'states'] += ', ' + ', '.join(selection_states)
    updated_comptable.loc[sel_index, 'classification'] = 'rejected'
    return updated_comptable

EDIT: And I guess the selection trees would be something like:

def selection_tree_01(comptable, mmix, catd):
    # these functions happen to calculate metrics
    comptable = selection_function_01(comptable, mmix, catd)
    comptable = selection_function_02(comptable, mmix, catd)
    # this function only uses states to make decisions
    comptable = selection_function_03(comptable)
    return comptable

@handwerkerd
Member Author

I’m trying to get things ready for the modularization effort at the hackathon. I’ve sketched out issues and ideas for the full decision tree pipeline in this narrative document: https://docs.google.com/document/d/1zhbDGxNVyzSBIsncBDIO7l15VqXREIlgHb5il7RGrPg/edit?usp=sharing
I also took my two decision tree flow charts and added the locations in the code where each step happens (see attached). The document isn’t pretty, and I’m going to try to sketch out pseudocode more explicitly before the hackathon, but this seemed like the right point to share what I’ve been working on with @tsalo and others.
Tedana_ICAandComparisonMetrics_Flowchart_v2_WithCurrentCodeLocations.pdf
Tedana_ICADecisionTree_Flowchart_WithCurrentCodeLocations.pdf

@stale

stale bot commented Jan 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana :tada:!

The stale bot added the stale label Jan 27, 2020
@handwerkerd removed the stale label Jan 27, 2020
@jbteves
Collaborator

jbteves commented Jan 27, 2020

We're working on it, stale-bot, we promise.

@stale

stale bot commented Apr 26, 2020

This issue has been automatically marked as stale because it has not had any activity in 90 days. It will be closed in 600 days if no further activity occurs. Thank you for your contributions to tedana :tada:!
