Modularize the component selection decision tree #403
I was talking about this with @afni-dglen and he made a suggestion for one issue that was bothering me. I wanted to document that suggestion while it's on my mind. Most of the metrics can easily be calculated modularly for a list of components. Some thresholds and steps in the decision tree use what I call ranking metrics, which means taking another metric, ranking all the components by that value, and then making a threshold decision based on their ranks. This gets tricky to modularize because sometimes these ranks are calculated across all components and sometimes across only a subset, depending on what was calculated earlier in the decision tree; thus they can't be completely modularized away from what will happen in the decision tree. The suggestion is that all threshold and ranking functions have an input to specify which components to operate on based on classification. That is, calculate the threshold on "all components," "only unclassified components," etc. To do this, we might need to add some intermediate classification labels (e.g. "high variance components"), but that might not be a bad thing.
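As a rough sketch of that suggestion (the function name, column names, and labels below are hypothetical, not tedana's actual API), a thresholding step could take a parameter naming which classification it operates on:

import pandas as pd


def rank_variance_threshold(comptable, decide_comps="unclassified", n_keep=2):
    # Hypothetical ranking step that only operates on a chosen classification.
    # decide_comps selects which components the threshold applies to,
    # e.g. "all" or "unclassified"; everything else is left untouched.
    if decide_comps == "all":
        mask = pd.Series(True, index=comptable.index)
    else:
        mask = comptable["classification"] == decide_comps
    # Rank only the selected components by a (hypothetical) variance column
    ranks = comptable.loc[mask, "variance explained"].rank(ascending=False)
    flagged = ranks[ranks <= n_keep].index
    comptable.loc[flagged, "classification"] = "high variance"
    return comptable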
Another comment from @afni-dglen was that, in the current decision tree, we are only recording the final decision in the tree before accept/reject/ignore. This becomes an issue if people can make their own decision trees using the more modularized functions. In that case, we might want to track each node in a decision tree that was touched by a component, instead of only the final node. Attaching an ID to each node wouldn't be hard, but swapping out the current final label for a path of steps might be a bit more work.
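A minimal sketch of that path-tracking idea, assuming a hypothetical node_path column that every decision node appends its ID to:

def apply_node(comptable, node_id, selected_index, new_label):
    # Record every decision-tree node that touched a component, not just the
    # final label. "node_path" is a hypothetical string column such as
    # "node01,node04"; "classification" still carries the current label.
    comptable.loc[selected_index, "classification"] = new_label
    comptable.loc[selected_index, "node_path"] += "," + node_id
    return comptable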
First, both suggestions formally came from @mrneont (Paul Taylor). The suggestion actually was to rank all components at the beginning, before any decisions. Then, at the time of any decision in the tree, operate only on the rankings that have some other required properties. The rankings do not need to be recomputed because they are still relative to each other within the list. So, looking at the list of (provisionally) accepted components below as an example, if one wants the top two components, then comp4 and comp5 are accepted.

comp4 rank=1 accept
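A small illustration of that idea (the ranks for the other components are made up here): ranks are computed once up front, and the decision only filters the provisionally accepted subset.

import pandas as pd

comptable = pd.DataFrame(
    {"rank": [4, 3, 5, 1, 2],
     "classification": ["rejected", "rejected", "provisionally accepted",
                        "provisionally accepted", "provisionally accepted"]},
    index=["comp1", "comp2", "comp3", "comp4", "comp5"],
)
candidates = comptable[comptable["classification"] == "provisionally accepted"]
# The precomputed ranks stay valid relative to each other, so taking the top
# two candidates (comp4 and comp5 here) needs no recomputation.
accepted = candidates.nsmallest(2, "rank").index
comptable.loc[accepted, "classification"] = "accepted"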
@afni-dglen and @handwerkerd: So to clarify, the discussion centered around tracking the classification globally (and adding some classifications), so that we have some states or codes associated with certain properties as well as the accept/reject/ignore decision? So for a simpler example, we might have:
Should we also then add additional classifications, so that an item can be of multiple types? This seems like a plausible long-term problem. So then we might want the ability to have something like:
(I hope that if I use this example people are more likely to read, digest, and understand than if I say components and use real classifications.) I like this idea, if I'm understanding it correctly, because it could then be easier to change which classifications should lead to acceptance/rejection and to permute different choices to see their outcomes.
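A toy sketch of what multiple labels could enable, in the spirit of the ice cream analogy (all names here are invented): each item carries several state labels, and the mapping from states to decisions becomes a small, swappable rule.

import pandas as pd

items = pd.DataFrame(
    {"states": ["ice cream, vanilla", "ice cream, vanilla, hazelnut", "sorbet, lemon"],
     "classification": ["accepted", "accepted", "accepted"]},
    index=["item1", "item2", "item3"],
)

# Changing which states imply rejection is a one-line tweak, which makes it
# easy to permute different choices and compare the outcomes.
reject_states = {"hazelnut"}
rejects = items["states"].apply(lambda s: bool(reject_states & set(s.split(", "))))
items.loc[rejects, "classification"] = "rejected"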
@jbteves That's about right. One minus of this approach is that every function that has the ability to alter the decision for an item would need a unique identifier. This might over-extend the current number-based system (the codes in https://tedana.readthedocs.io/en/latest/outputs.html#component-tables), but I don't see an option that wouldn't potentially over-extend that system.
I don't think that would over-extend; it just makes things messy. We could use enums or named tuples or something to make the code legible. It just means that, in addition to adding a test, any new classifier function also needs to add the values it might create to the allowable values in the enum/tuple. That way an invalid value will also trigger an interpreter error, which would help with any debugging around those methods. Thoughts @emdupre @tsalo ?
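For instance, a sketch of the enum idea (the class name and codes here are invented for illustration; tedana's real codes are in the component table docs linked above):

from enum import Enum, unique


@unique
class RationaleCode(Enum):
    # Illustrative registry of classification codes. Each classifier function
    # would register the codes it can emit here, so any unregistered value
    # written to the component table fails loudly at the point of misuse.
    HIGH_KAPPA = "I001"
    LOW_RHO = "I002"
    HIGH_VARIANCE = "I003"


RationaleCode("I002")      # a registered code: returns RationaleCode.LOW_RHO
# RationaleCode("I999")    # an unregistered code: raises ValueError immediately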
The unique codes in the component tables may be useful as a simple counting system, but I think they won't generalize well with even small additions or changes to the processing. Named processes, as @jbteves suggests, seem like a better idea. This isn't required for processing, but keeping a history or provenance that way may be useful for later processing or record keeping.
Multiple metrics are calculated on subsets of components at certain stages in the decision tree. Is the proposed method able to work with those?
Yes, I think the idea is that you could basically select components in certain "states" and only use those. So, for example, you could take all components classified as "noise" and analyze only those. It's just that we're proposing to select components based on their current states, calculate metrics only on that subset, and then update states and classifications from there.
Okay, does this pseudocode fit with the proposed approach, or are there elements that I'm missing?

import numpy as np


def selection_function(comptable, data, mixing_matrix):
    """Identifies "hazelnut" items, limited to vanilla ice creams.

    Hazelnut items shall be rejected, as they exhibit TE-independence.
    """
    __required_states = ['ice cream', 'vanilla']
    __selection_states = ['hazelnut']
    # restrict this node to components whose states include all required states
    required = comptable['states'].apply(
        lambda states: all(rs in states.split(', ') for rs in __required_states))
    node_index = comptable.loc[required].index
    node_comptable = comptable.loc[node_index]
    node_mixing_matrix = mixing_matrix[node_index, :]
    updated_comptable = comptable.copy()
    # internal metric calculation step
    # In this case we're getting the total value across each component's time series,
    # which is nonsense since they're all normalized and the sum would be 0.
    node_metric = np.sum(node_mixing_matrix, axis=1)
    # selection/classification step based on the component table
    sel_index = node_comptable.loc[node_metric > 5.].index
    # update states and decisions
    updated_comptable.loc[sel_index, 'states'] += ', ' + ', '.join(__selection_states)
    updated_comptable.loc[sel_index, 'classification'] = 'rejected'
    return updated_comptable

EDIT: And I guess the selection trees would be something like:

def selection_tree_01(comptable, mmix, catd):
    # these functions happen to calculate metrics
    comptable = selection_function_01(comptable, mmix, catd)
    comptable = selection_function_02(comptable, mmix, catd)
    # this function only uses states to make decisions
    comptable = selection_function_03(comptable)
    return comptable
I'm trying to get things ready for the modularization effort at the hackathon. I've sketched out issues and ideas for the full decision tree pipeline. This narrative document is at: https://docs.google.com/document/d/1zhbDGxNVyzSBIsncBDIO7l15VqXREIlgHb5il7RGrPg/edit?usp=sharing
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana :tada:!
We're working on it, stale-bot, we promise. |
This issue has been automatically marked as stale because it has not had any activity in 90 days. It will be closed in 600 days if no further activity occurs. Thank you for your contributions to tedana :tada:!
We are planning to modularize the code for the metrics calculated on each component and the decision tree for accepting/rejecting/ignoring components at the 2019 Hackathon (#373). This issue is a place to focus any discussion on how we'll do this.
Next Steps