-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP, ENH] Add ability to read surface .gii files #22
Conversation
Modified some code in utils.py to be more readable and to accept various input types (e.g., lists of data files, arrays, etc.). In the process, changed a some doc-strings, modified the print statements to be logging statements, and made a few comments for identification of things that need to be changed to better integrate surface files.
In the process of adding gifti support, but breaking EVERYTHING. Ensuring that all stages of `tedana` do not require spatial information about the input data. Making minor aesthetic and stylistic updates as I go through the code.
Everything is still very broken as I'm working to through to fix all the functions to work with 3D arrays (samples x echos x time). I found the place where things start to diverge from the previous versions, but it seems to be due to numerical instabilities? Unclear. Will figure out later!
More major overhauls in the process of supporting GIFTI files. The `niwrite` functions has been killed to give way to `filewrite`. Lots of stylistic changes in the proces of integrating this throughout `tedana.interfaces.tedana`, but mostly just significant new functionality in `tedana.utils`.
Patches: * csstepdata saved as JSON rather than a text dump * Removes unused imports in test.utils to fix linting errors
I realized my original comment vastly understated the amount of refactoring that I'm doing to integrate gifti support. This is going to be a bit of a beast of a PR, so perhaps some reviews along the way might be helpful to ensuring that I'm on the right track? I think that my refactoring of |
Patch errors
Added incredibly basic doc-strings to most of the functions in tedana.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a few comments on where I think things could be clarified in utils
, but overall that module looks good !
tedana/utils/utils.py
Outdated
Unmasked `data` array | ||
""" | ||
|
||
out = np.zeros((mask.shape + data.shape[1:])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this is not out = np.zeros(mask.shape + data.shape[1:])
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope! Good catch -- will fix in the next commit.
tedana/utils/utils.py
Outdated
|
||
Parameters | ||
---------- | ||
data : (M x E x T) array_like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This notation is slightly confusing. Are M
and S
of equal size ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the confusion; I'll beef up the doc-string here. M
refers to the number of True
values in the mask that had been applied to the data.
tedana/utils/utils.py
Outdated
Masked `data`, where `S` is samples, `E` is echoes, and `T` is time | ||
""" | ||
|
||
if mask is not None and not type(data) == type(mask): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we confirm that they're of the same shape, not just the same type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually removed almost all the calls to fmask()
in the main tedana.py
interface; since we're always working with data in (S x E x T)
dimensionality, it became obsolete because masking with a boolean array of shape (S,)
was as simple as data[mask]
. Because mask
in the main interface is generated from the data, I think we can remove this function entirely.
tedana/utils/utils.py
Outdated
# use nilearn for other files | ||
data = check_niimg(data) | ||
if mask is not None: | ||
# TODO: check that this uses same order to flatten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same order as what ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was a holdover from when load_data()
and load_image()
were explicitly reshaping the input data with order=F
, which I've since removed. I had wanted to make sure that nilearn's apply_mask()
function was doing the same flattening -- but since that's no longer applicable I'll remove this. And I'll try to be more explicit with my code-comments in the future, too.
tedana/utils/utils.py
Outdated
ndarray | ||
Array of shape (nx, ny, nz[, Ne[, nt]]) | ||
fdata : (X x Y x M x T) np.ndarray | ||
Z-concatenated multi-echo data array, where M is Z * number of echos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we drop zcat compatibility (as discussed in #26), will we drop this function as well ? For now, it's probably worth keeping, but thinking ahead....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The uncat2echos
function is actually no longer needed, I think -- even with retained zcat compatibility! A combination of load_data()
and filewrite()
should be enough to ensure we can handle everything. I'll delete this code to remove any confusion!
tedana/utils/utils.py
Outdated
return fdata, ref_img | ||
|
||
|
||
def makeadmask(data, minimum=True, getsum=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename this? I'm still not sure what ad
in makeadmask
stands for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely can be renamed. I believe it stood for make adapative mask, but since that's clear to exactly no one, a more pythonic naming convention (make_adaptive_mask()
) might be more applicable? I'm happy to do whatever works best!
else: | ||
nt = 1 | ||
return np.reshape(data, (nx, ny, nz, nt), order='F') | ||
if isinstance(data, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we raise an error for the else
condition, here? I'm not sure how it could be triggered, but just in case....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worth some discussion about how we envision people using the tedana interface. As it stands, the primary mechanism for doing that is through argparse
in cli/run.py
; if that's the case, data
will always be a list. However, if the goal is to make this a bit more modular, then we should probably add in some other checks.
Right now even if data
is a str
pointing to a nifti file this function should still work (without an explicit else
clause). But it might be good to enforce some requirements! I'm open to comments/suggestions and will be happy to build them in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree there's a broader discussion to have about how to use the tedana interface-- I'll punt on that for now. It sounds like this can handle the two most likely cases, so we can revisit when it comes time to talk about how we expect people to run the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear how necessary this is, but I'm fairly sure some of the current component selection criteria use spatial information. For example, in tedana.py, see the current code starting at line 652: "Step 2: Compute 3-D spatial FFT of Beta maps to detect high-spatial
frequency artifacts"
The simple solution would be to just skip 3-D space dependent criteria with gii files. I suspect that won't cause much harm, but it would require a some close examination of results processed as volumes vs surfaces.
My hunch is that the spatial transforms and interpolations across voxels to go from a volume to a surface might cause some undesired issues with echo-dependent estimations, but there might also be some issues with non-trivial transforms & interpolations in volume space as well. I also don't have a good intuitive sense of how tedana classification mistakes would look in surface space, so there might need to be some thought on how to best visualize results to help users identify issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, and thanks for raising it! Yes, there are going to have to be some differences between the pipelines for volumetric + surface data. Thankfully, having CircleCI running will ensure that none of the changes I'm making in this PR impact the volumetric results in any way, for now.
A lot of this PR is actually just refactoring to ensure that data can be seamlessly passed between functions without regard for the type of data being passed. This will hopefully be useful later in making the code more modular.
As for visualization of surface-based results: I completely agree. I think there will have to be some significant testing to ensure that tedana
is actually suitable for surface data, given the limited spatial information. Hopefully this PR will be the first step in that process!
Also, other updates to `selcomps()` to hopefully get it passing.
To aid in merging #22, where we've collapsed the xyz dimensions into an S dimension. Tracing back on a numpy core bug with large arrays: numpy/numpy#8869 (comment)
It passes! Which means this should be ready for review. 🎉 I haven't actually tested whether surface data can be run through yet, but it's in a much better place to incorporate those data. I think it might be beneficial to merge this in and then work out the surface bugs (and add some tests to ensure they work) afterwards. Note: in the process of making these changes this I experienced an error wherein taking the mean of a large, float32 numpy array gave varying results across Python instances, despite the underlying data being exactly the same to an incredibly conservative tolerance. Because of this, this PR includes a (very slight) breaking change; the CI was updated to reflect these new (hopefully more accurate) outputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fantastic ! Some minor questions, but I think this is ready.
tedana/interfaces/t2smap.py
Outdated
x = np.column_stack([np.ones(echo), [-te for te in tes[:echo]]]) | ||
X = np.repeat(x, n_vols, axis=0) | ||
|
||
beta, res, rank, sing = np.linalg.lstsq(X, B) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these variables were added back in ? Previously it was
beta, _, _, _ = np.linalg.lstsq(X, B)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No! And indeed, I think a more parsimonious solution than either of the above would probably be:
beta = np.linalg.lstsq(X, B)[0]
Not sure my rationale here, so I'll change it to reflect this.
tedana/interfaces/tedana.py
Outdated
@@ -63,145 +68,122 @@ def do_svm(X_train, y_train, X_test, svmtype=0): | |||
elif svmtype == 2: | |||
clf = svm.SVC(kernel='linear', probability=True) | |||
else: | |||
raise ValueError('Input svmtype not in range (3)') | |||
raise ValueError('Input svmtype not in [1, 2, 3]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't the possible types [0, 1, 2]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Will fix asap.
tedana/interfaces/tedana.py
Outdated
infile = '__clin.nii.gz' | ||
infile = filewrite(unmask(data, mask), '__clin.nii.gz', ref_img, gzip=True) | ||
|
||
# FIXME: ideally no calls to os.system!!! (or AFNI, for that matter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should open an issue for this 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! See #32.
tedana/interfaces/tedana.py
Outdated
mu = catd.mean(axis=-1) | ||
tes = np.reshape(tes, (n_echoes, 1)) | ||
fmin, fmid, fmax = getfbounds(n_echoes) | ||
mu = catd.mean(axis=-1, dtype=float) # BUG: THIS IS THE BAD PLACE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As much as I love / hate this code comment, do you think the explanation in the PR is enough to remove it for now ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can definitely be removed since there is a trace of the rationale behind changing it. It might be worth following numpy development and reverting this change eventually, but I think that will be further down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can definitely be removed since there is a trace of the rationale behind changing it. It might be worth following numpy development and reverting this change eventually, but I think that will be further down the line.
Minor changes to address review comments for PR #22
LGTM ! Merging ! |
Working to add the ability to seamlessly accept gifti files in addition to nifti files.
This is still a long ways from ready, but I want to use circle testing to ensure I don't screw up anything in the process of making changes. As I'm modifying the functions to accept surface files, I'm making some stylistic changes along the way and beefing up doc-strings, as applicable.