
remove unnecessary copy of large data #995

Merged 1 commit into ME-ICA:main from fix/mem_usage on Nov 6, 2023

Conversation

@bpinsard (Contributor) commented Nov 1, 2023

Changes proposed in this pull request:

  • remove a copy of the data that seems unnecessary, which cuts memory usage by ~30%
  • the data is not modified afterward; it is copied through subsetting in the following lines (line 362)

The investigation of memory usage originates from nipreps/fmriprep#3125.
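
To illustrate the change, here is a minimal sketch with hypothetical names (not the actual tedana/decay.py code): the copy can be dropped because the boolean-mask subsetting that follows already allocates a new array, so the caller's data is never modified in place.

```python
import numpy as np

def fit_decay(data, mask):
    # Before this PR (hypothetical): data = data.copy()  # duplicated the full array in memory
    # The copy is unnecessary: boolean-mask indexing returns a new array,
    # so the caller's `data` is never modified in place.
    data_masked = data[mask, :]
    return data_masked.mean(axis=-1)
```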


codecov bot commented Nov 2, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b139386) 89.54% compared to head (a0c9aae) 89.54%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #995      +/-   ##
==========================================
- Coverage   89.54%   89.54%   -0.01%     
==========================================
  Files          26       26              
  Lines        3396     3395       -1     
  Branches      619      619              
==========================================
- Hits         3041     3040       -1     
  Misses        207      207              
  Partials      148      148              
Files Coverage Δ
tedana/decay.py 93.79% <ø> (-0.05%) ⬇️

☔ View full report in Codecov by Sentry.

@effigies (Contributor) commented Nov 6, 2023

@tsalo It looks like you added this .copy() in #468, but there doesn't seem to be any discussion that indicates it's necessary. My reading is that all downstream uses should be reading, not writing this, so I think this PR is a sensible optimization, but I'm definitely not well-versed in Tedana code.

@tsalo (Member) left a comment

Good catch. Thanks @bpinsard.

@effigies you're right, data isn't modified in place in that function, so there's no need to copy it. Not sure why I added the copy.

@tsalo requested a review from @handwerkerd on November 6, 2023
@handwerkerd (Member) left a comment

Thank you for noticing and correcting this!

This feels like a case where the edited data was once being passed out of the function, but that's not happening now, and, going through past versions, I don't see where that might have been happening.

@bpinsard (Contributor, Author) commented Nov 6, 2023

Thanks for your feedback! It was an easy catch when I quickly mem-profiled the code; the use of float64 is definitely making the workflow memory intensive. I guess there could be further optimization.
Is float64 used only for scipy.optimize.curve_fit? If so, maybe converting to float64 could be deferred to when it's called.
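
As a hedged sketch of that deferral idea (assumed monoexponential model and hypothetical names, not the actual tedana call site), the bulk arrays could stay in float32 and only the small per-voxel slice be promoted right before the double-precision fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def monoexp(tes, s0, r2s):
    # Simple monoexponential decay model: S(TE) = S0 * exp(-TE * R2*)
    return s0 * np.exp(-tes * r2s)

def fit_voxel(tes, voxel_ts):
    # Promote only this voxel's time series to float64 just before the fit;
    # the full data array upstream can remain float32.
    xdata = np.asarray(tes, dtype=np.float64)
    ydata = np.asarray(voxel_ts, dtype=np.float64)
    popt, _ = curve_fit(monoexp, xdata, ydata, p0=(float(ydata.max()), 20.0))
    return popt
```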

@handwerkerd (Member) commented

Barring essential needs, I'd lean towards keeping things as float64, because the code is doing fits and divisions in multiple places, and rounding away the final bit of information has been shown to non-trivially reduce precision. Having float64 means our data are reliably stable at a precision above float32. There is a lowmem option that I don't particularly like, but maybe we can change that to run with float32.
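
A toy illustration of the precision concern (not tedana code; the exact drift depends on the data and the reduction order):

```python
import numpy as np

x = np.full(1_000_000, 0.1)

# Accumulating in float64 stays essentially at the exact value ...
print(np.sum(x, dtype=np.float64))
# ... while the same reduction carried out in float32 typically drifts
# in the last few significant digits.
print(np.sum(x.astype(np.float32), dtype=np.float32))
```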

FWIW, there's another standing memory issue ( #856 ). It's possible this fix addresses that issue, but we also identified another place where we can slightly reduce memory needs (see final comment). I'm not sure how much more memory that would save, but probably still worth implementing.

Also, this PR has two approvals and can be merged. @bpinsard, as the person who created the PR, you are welcome to hit squash and merge.

@bpinsard (Contributor, Author) commented Nov 6, 2023

> Barring essential needs, I'd lean towards keeping things as float64, because the code is doing fits and divisions in multiple places, and rounding away the final bit of information has been shown to non-trivially reduce precision. Having float64 means our data are reliably stable at a precision above float32. There is a lowmem option that I don't particularly like, but maybe we can change that to run with float32.

I think #856 mainly originates from too low an estimate of memory requirements passed to nipype in fmriprep. I hit such errors even with 45G of RAM. Maybe if someone familiar with the tedana internals knows how many times the data get transformed/duplicated, we could try to figure out a better heuristic. That said, the current PR might bring the estimate in line with what is actually required.
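
A hypothetical sketch of such a heuristic (the float64 promotion factor and copy count are assumptions, not measured tedana values; gzipped NIfTI inputs would also make the on-disk size an underestimate):

```python
import os

def estimate_tedana_mem_gb(echo_files, n_copies=3, mem_dtype_bytes=8, disk_dtype_bytes=2):
    # Rough upper bound: on-disk int16 echo data promoted to float64 in memory,
    # held in a few simultaneous internal copies during the workflow.
    disk_bytes = sum(os.path.getsize(f) for f in echo_files)
    in_mem_bytes = disk_bytes * (mem_dtype_bytes / disk_dtype_bytes) * n_copies
    return in_mem_bytes / 1024**3
```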

I don't think I have permissions to merge PRs on that repo.

@handwerkerd merged commit 2cb05ab into ME-ICA:main on Nov 6, 2023
@bpinsard deleted the fix/mem_usage branch on November 6, 2023
@handwerkerd (Member) commented

@bpinsard I'm working on a tedana abstract for OHBM 2024. People who contributed in the last year are welcome as co-authors. If you want to be a co-author, let me know so that I can make sure I have the appropriate info to include.

@tsalo (Member) commented Nov 22, 2023

@all-contributors please add @bpinsard for code.

all-contributors bot (Contributor) commented

@tsalo

I've put up a pull request to add @bpinsard! 🎉

@handwerkerd (Member) commented

@all-contributors please add @bpinsard for code.

all-contributors bot (Contributor) commented

@handwerkerd

@bpinsard already contributed before to code
