
[WIP, ENH] Add ability to read surface .gii files #22

Merged: 21 commits merged into ME-ICA:master on May 10, 2018

Conversation

rmarkello (Member)

Working to add the ability to seamlessly accept gifti files in addition to nifti files.

This is still a long way from ready, but I want to use CircleCI testing to ensure I don't screw anything up in the process of making changes. As I modify the functions to accept surface files, I'm making some stylistic changes along the way and beefing up doc-strings as applicable.

Modified some code in utils.py to be more readable and to accept various input types (e.g., lists of data files, arrays, etc.). In the process, changed some doc-strings, converted the print statements to logging statements, and added a few comments identifying things that need to change to better integrate surface files.
In the process of adding gifti support, but breaking EVERYTHING. Ensuring that
all stages of `tedana` do not require spatial information about the input data.
Making minor aesthetic and stylistic updates as I go through the code.
rmarkello and others added 7 commits May 4, 2018 01:04
Everything is still very broken as I'm working through fixing all the functions to work with 3D arrays (samples x echoes x time). I found the place where things start to diverge from the previous versions, but it seems to be due to numerical instabilities? Unclear. Will figure out later!
More major overhauls in the process of supporting GIFTI files. The `niwrite` function has been killed in favor of `filewrite`. Lots of stylistic changes in the process of integrating this throughout `tedana.interfaces.tedana`, but mostly just significant new functionality in `tedana.utils`.
Patches:
* csstepdata saved as JSON rather than a text dump
* Removes unused imports in test.utils to fix linting errors

rmarkello (Member Author)

I realized my original comment vastly understated the amount of refactoring that I'm doing to integrate gifti support.

This is going to be a bit of a beast of a PR, so perhaps some reviews along the way might be helpful for ensuring that I'm on the right track? I think that my refactoring of utils.py is mostly done (though there will be a few minor tweaks to accommodate L/R hemisphere denotations for gifti files), if someone would like to start there!

emdupre and others added 3 commits May 7, 2018 15:35
Added incredibly basic doc-strings to most of the functions in tedana.py

emdupre (Member) left a comment

I made a few comments on where I think things could be clarified in utils, but overall that module looks good !

Unmasked `data` array
"""

out = np.zeros((mask.shape + data.shape[1:]))

Member

Is there a reason this is not out = np.zeros(mask.shape + data.shape[1:]) ?

Member Author

Nope! Good catch -- will fix in the next commit.
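
For anyone following along, a minimal sketch of what this `unmask` call is meant to do; the shapes below are made up, and this is not necessarily the exact implementation in the PR:

```python
import numpy as np

def unmask(data, mask):
    """Sketch: restore masked (M x E x T) data to the full (S x E x T) shape."""
    out = np.zeros(mask.shape + data.shape[1:], dtype=data.dtype)  # no extra parens needed
    out[mask] = data
    return out

# Hypothetical example: 10 samples, 6 of which survive a boolean mask
mask = np.zeros(10, dtype=bool)
mask[:6] = True
full = unmask(np.random.rand(6, 3, 5), mask)
assert full.shape == (10, 3, 5)
```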


Parameters
----------
data : (M x E x T) array_like

Member

This notation is slightly confusing. Are M and S of equal size ?

Member Author

Apologies for the confusion; I'll beef up the doc-string here. M refers to the number of True values in the mask that had been applied to the data.

Masked `data`, where `S` is samples, `E` is echoes, and `T` is time
"""

if mask is not None and not type(data) == type(mask):

Member

Should we confirm that they're of the same shape, not just the same type ?

rmarkello (Member Author), May 9, 2018

I'd actually removed almost all the calls to fmask() in the main tedana.py interface; since we're always working with data in (S x E x T) dimensionality, it became obsolete because masking with a boolean array of shape (S,) was as simple as data[mask]. Because mask in the main interface is generated from the data, I think we can remove this function entirely.
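
Concretely, the pattern that makes `fmask()` unnecessary (the shapes here are hypothetical):

```python
import numpy as np

S, E, T = 1000, 3, 200                  # samples x echoes x time
data = np.random.rand(S, E, T)
mask = data[:, 0, 0] > 0.5              # boolean array of shape (S,)

masked = data[mask]                     # (mask.sum() x E x T) -- no fmask() wrapper needed
assert masked.shape == (int(mask.sum()), E, T)
```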

# use nilearn for other files
data = check_niimg(data)
if mask is not None:
# TODO: check that this uses same order to flatten

Member

Same order as what ?

Member Author

This comment was a holdover from when load_data() and load_image() were explicitly reshaping the input data with order=F, which I've since removed. I had wanted to make sure that nilearn's apply_mask() function was doing the same flattening -- but since that's no longer applicable I'll remove this. And I'll try to be more explicit with my code-comments in the future, too.
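
For future readers, a tiny illustration of why the flattening order was worth checking in the first place (the arrays are made up):

```python
import numpy as np

vol = np.arange(24).reshape(2, 3, 4)      # stand-in for a small 3D image

flat_c = vol.reshape(-1, order='C')       # row-major flattening (numpy's default)
flat_f = vol.reshape(-1, order='F')       # column-major flattening (the old order='F' code)

# The two orderings visit the voxels in different sequences, so mixing a
# Fortran-order reshape with a C-order masking routine would misalign samples.
assert not np.array_equal(flat_c, flat_f)
```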

ndarray
Array of shape (nx, ny, nz[, Ne[, nt]])
fdata : (X x Y x M x T) np.ndarray
Z-concatenated multi-echo data array, where M is Z * number of echos

Member

If we drop zcat compatibility (as discussed in #26), will we drop this function as well ? For now, it's probably worth keeping, but thinking ahead....

Member Author

The uncat2echos function is actually no longer needed, I think -- even with retained zcat compatibility! A combination of load_data() and filewrite() should be enough to ensure we can handle everything. I'll delete this code to remove any confusion!

return fdata, ref_img


def makeadmask(data, minimum=True, getsum=False):

Member

Should we rename this? I'm still not sure what ad in makeadmask stands for.

Member Author

Definitely can be renamed. I believe it stood for make adaptive mask, but since that's clear to exactly no one, a more pythonic naming convention (make_adaptive_mask()) might be more appropriate? I'm happy to do whatever works best!
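
For readers unfamiliar with the term, a rough sketch of the idea behind an adaptive mask; the threshold and the handling of `minimum`/`getsum` below are illustrative placeholders, not tedana's actual criteria:

```python
import numpy as np

def make_adaptive_mask(data, minimum=True, getsum=False):
    """Illustrative only: count, per sample, how many echoes have usable signal."""
    echo_means = data.mean(axis=-1)                        # (S x E) mean signal per echo
    thresh = 0.1 * echo_means.max(axis=0, keepdims=True)   # arbitrary per-echo cutoff
    adaptive = (echo_means > thresh).sum(axis=-1)          # echoes retained per sample
    mask = (adaptive >= 1) if minimum else (adaptive == data.shape[1])
    return (mask, adaptive) if getsum else mask
```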

else:
nt = 1
return np.reshape(data, (nx, ny, nz, nt), order='F')
if isinstance(data, list):

Member

Should we raise an error for the else condition, here? I'm not sure how it could be triggered, but just in case....

Member Author

This is worth some discussion about how we envision people using the tedana interface. As it stands, the primary mechanism for doing that is through argparse in cli/run.py; if that's the case, data will always be a list. However, if the goal is to make this a bit more modular, then we should probably add in some other checks.

Right now even if data is a str pointing to a nifti file this function should still work (without an explicit else clause). But it might be good to enforce some requirements! I'm open to comments/suggestions and will be happy to build them in.
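
If we do decide to enforce requirements, even a small explicit check would do; something along these lines, where the function name and error text are placeholders:

```python
import numpy as np

def _check_input(data):
    """Sketch of an explicit input check for load_data-style functions."""
    if isinstance(data, list):
        return data                      # per-echo file list, e.g. from the CLI
    if isinstance(data, (str, np.ndarray)):
        return data                      # single filepath or in-memory array
    raise TypeError('Expected a list of filepaths, a single filepath, or an '
                    'array, got {!r}'.format(type(data)))
```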

Member

Yes, I agree there's a broader discussion to have about how to use the tedana interface-- I'll punt on that for now. It sounds like this can handle the two most likely cases, so we can revisit when it comes time to talk about how we expect people to run the code.

Member

It's unclear how necessary this is, but I'm fairly sure some of the current component selection criteria use spatial information. For example, in tedana.py, see the current code starting at line 652: "Step 2: Compute 3-D spatial FFT of Beta maps to detect high-spatial frequency artifacts".

The simple solution would be to just skip 3-D space-dependent criteria with gii files. I suspect that won't cause much harm, but it would require some close examination of results processed as volumes vs. surfaces.

My hunch is that the spatial transforms and interpolations across voxels needed to go from a volume to a surface might cause some undesired issues with echo-dependent estimations, though non-trivial transforms and interpolations in volume space could cause issues as well. I also don't have a good intuitive sense of how tedana classification mistakes would look in surface space, so some thought may be needed on how best to visualize results to help users identify issues.
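
One possible shape for the "skip spatial criteria" idea, purely as a sketch (the helper name and the extension check are hypothetical):

```python
def _is_surface(fname):
    """Illustrative check: treat GIFTI inputs as lacking 3-D voxel geometry."""
    return fname.endswith('.gii')

# Hypothetical use inside component selection:
# if _is_surface(input_fname):
#     skip the 3-D spatial FFT artifact metric and rely on non-spatial criteria
# else:
#     compute the spatial FFT of the beta maps as before
```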

Member Author

Good point, and thanks for raising it! Yes, there are going to have to be some differences between the pipelines for volumetric + surface data. Thankfully, having CircleCI running will ensure that none of the changes I'm making in this PR impact the volumetric results in any way, for now.

A lot of this PR is actually just refactoring to ensure that data can be seamlessly passed between functions without regard for the type of data being passed. This will hopefully be useful later in making the code more modular.

As for visualization of surface-based results: I completely agree. I think there will have to be some significant testing to ensure that tedana is actually suitable for surface data, given the limited spatial information. Hopefully this PR will be the first step in that process!

emdupre added a commit that referenced this pull request May 10, 2018
To aid in merging #22, where we've collapsed the xyz dimensions into an S dimension. Tracing back to a numpy core bug with large arrays: numpy/numpy#8869 (comment)

rmarkello (Member Author)

It passes! Which means this should be ready for review. 🎉

I haven't actually tested whether surface data can be run through yet, but it's in a much better place to incorporate those data. I think it might be beneficial to merge this in and then work out the surface bugs (and add some tests to ensure they work) afterwards.

Note: in the process of making these changes I ran into an issue wherein taking the mean of a large float32 numpy array gave varying results across Python instances, despite the underlying data being identical to an extremely conservative tolerance. Because of this, the PR includes a (very slight) breaking change; the CI was updated to reflect the new (hopefully more accurate) outputs.
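
A small illustration of the accumulation issue and the workaround used here (the array below is made up):

```python
import numpy as np

rng = np.random.RandomState(0)
catd = (rng.rand(5000, 3, 200) + 100).astype(np.float32)   # hypothetical (S x E x T) data

mu32 = catd.mean(axis=-1)                 # float32 accumulation
mu64 = catd.mean(axis=-1, dtype=float)    # explicit float64 accumulator, as in this PR

# The float32 result drifts slightly from the float64 one; the exact error depends on
# array size and memory layout, which is presumably why results could differ across runs.
print(np.abs(mu32 - mu64).max())
```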

emdupre (Member) left a comment

Looks fantastic ! Some minor questions, but I think this is ready.

x = np.column_stack([np.ones(echo), [-te for te in tes[:echo]]])
X = np.repeat(x, n_vols, axis=0)

beta, res, rank, sing = np.linalg.lstsq(X, B)

Member

Is there a reason these variables were added back in ? Previously it was
beta, _, _, _ = np.linalg.lstsq(X, B)

Member Author

No! And indeed, I think a more parsimonious solution than either of the above would probably be:

beta = np.linalg.lstsq(X, B)[0]

Not sure what my rationale was here, so I'll change it to reflect this.
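
For completeness, a self-contained version of the fit being discussed; the echo times and data are made up, and the S0/R2* reading assumes the usual log-linear monoexponential model rather than anything verified against this PR:

```python
import numpy as np

tes = [14.5, 38.5, 62.5]                              # illustrative echo times (ms)
n_vols, echo = 5, len(tes)

x = np.column_stack([np.ones(echo), [-te for te in tes[:echo]]])
X = np.repeat(x, n_vols, axis=0)                      # (echo * n_vols) x 2 design matrix
B = np.log(np.random.rand(X.shape[0], 10) + 1.0)      # log-signal for 10 hypothetical voxels

beta = np.linalg.lstsq(X, B, rcond=None)[0]           # keep only the coefficients
s0, r2s = np.exp(beta[0]), beta[1]                    # exp(intercept) -> S0, slope -> R2*
```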

@@ -63,145 +68,122 @@ def do_svm(X_train, y_train, X_test, svmtype=0):
elif svmtype == 2:
clf = svm.SVC(kernel='linear', probability=True)
else:
raise ValueError('Input svmtype not in range (3)')
raise ValueError('Input svmtype not in [1, 2, 3]')

Member

Aren't the possible types [0, 1, 2] ?

Member Author

Yes! Will fix asap.

infile = '__clin.nii.gz'
infile = filewrite(unmask(data, mask), '__clin.nii.gz', ref_img, gzip=True)

# FIXME: ideally no calls to os.system!!! (or AFNI, for that matter)

Member

We should open an issue for this 😄

Member Author

Done! See #32.

mu = catd.mean(axis=-1)
tes = np.reshape(tes, (n_echoes, 1))
fmin, fmid, fmax = getfbounds(n_echoes)
mu = catd.mean(axis=-1, dtype=float) # BUG: THIS IS THE BAD PLACE

Member

As much as I love / hate this code comment, do you think the explanation in the PR is enough to remove it for now ?

Member Author

Yes, this can definitely be removed since there is a trace of the rationale behind changing it. It might be worth following numpy development and reverting this change eventually, but I think that will be further down the line.

Minor changes to address review comments for PR #22

emdupre (Member) commented May 10, 2018

LGTM ! Merging !

@emdupre emdupre merged commit 99120cc into ME-ICA:master May 10, 2018