replace xarray median and mean with numpy #419

Kasra-Shirvanian · 2025-02-16T23:51:25Z

This PR improves the performance of an example script by replacing xarray's mean() and median() functions with numpy's nanmean() and nanmedian(). The xarray functions apply reductions iteratively over subbranches, making them inefficient for simple statistical calculations. Additionally, the script was redundantly computing mean and median values twice (once for the plot and again for the legend).

Description

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?

This change does not affect the main functionality of the toolbox but significantly improves the performance of the example script. It provides a better user experience for those trying out the examples or adapting them to their own datasets, leading to a better impression of the project.

What does this PR do?

Replaces xarray's mean() and median() with numpy.nanmean() and numpy.nanmedian().
Removes the redundant mean and median calculations in the legend by reusing the values already computed for the plot.
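For readers who want to see the shape of the change, here is a minimal sketch. It is not the actual diff from this PR: the `speed` array, its values, and the `legend_labels` list are hypothetical stand-ins for the example script's data and plotting code.

```python
import numpy as np
import xarray as xr

# Hypothetical 1D trace with one missing sample (not the example's real data).
speed = xr.DataArray(
    np.array([1.0, 2.0, np.nan, 4.0, 10.0]),
    dims="time",
    name="speed",
)

# Compute each statistic once with the NaN-aware numpy functions.
mean_speed = float(np.nanmean(speed.values))
median_speed = float(np.nanmedian(speed.values))

# Reuse the stored values for the legend instead of recomputing them.
legend_labels = [
    f"mean = {mean_speed:.2f}",
    f"median = {median_speed:.2f}",
]

# xarray skips NaNs by default for float data, so both approaches agree.
assert np.isclose(mean_speed, float(speed.mean()))
assert np.isclose(median_speed, float(speed.median()))
```

Storing the two values in variables is what removes the duplicate computation; the switch to numpy only changes how each value is obtained.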

How has this PR been tested?

It has been tested on my local machine.

Performance Improvement:

Before: Median computation took 13.41 seconds
After: Median computation now takes 0.00025 seconds

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

sonarqubecloud · 2025-02-16T23:51:53Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


codecov · 2025-02-17T10:00:59Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.83%. Comparing base (4269319) to head (4cedd64).
Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #419   +/-   ##
=======================================
  Coverage   99.83%   99.83%           
=======================================
  Files          22       22           
  Lines        1219     1219           
=======================================
  Hits         1217     1217           
  Misses          2        2


niksirbi · 2025-02-17T10:48:59Z

> The xarray functions apply reductions iteratively over subbranches, making them inefficient for simple statistical calculations.

Thanks for drawing attention to this @Kasra-Shirvanian.

This PR prompted me to time the execution of numpy vs xarray ways of computing the mean and median on my machine. I found that you are right, numpy is faster, by a factor ranging from ~1.4 to ~5 (at least on the small dataset we use in this example).

Some outputs from my tests (run with IPython on an Apple silicon M2 Mac; each reported time is the average of 1000 executions):

xarray mean: 0.094241 s
numpy mean: 0.066624 s
numpy is ~1.41 times faster

xarray median: 0.235287 s
numpy median: 0.094634 s
numpy is ~2.49 times faster
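A comparison like this can be reproduced with the standard-library `timeit` module. The sketch below is an assumption about how such a benchmark could be set up, not the code used for the numbers above: the synthetic `data` array, its size, and `n_runs` are placeholders, and timings will differ by machine and dataset.

```python
import timeit

import numpy as np
import xarray as xr

# Synthetic 1D data with some NaNs (a placeholder for the example dataset).
rng = np.random.default_rng(0)
values = rng.normal(size=10_000)
values[::50] = np.nan
data = xr.DataArray(values, dims="time")

n_runs = 100  # number of repetitions averaged per measurement


def time_per_call(func):
    """Return the average wall-clock time of one call to func, in seconds."""
    return timeit.timeit(func, number=n_runs) / n_runs


timings = {
    "xarray mean": time_per_call(lambda: data.mean()),
    "numpy mean": time_per_call(lambda: np.nanmean(data.values)),
    "xarray median": time_per_call(lambda: data.median()),
    "numpy median": time_per_call(lambda: np.nanmedian(data.values)),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f} s")
```

No expected output is shown because the absolute times and the resulting speed-up factors depend on hardware, library versions, and array size.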

For some context, the purpose of the examples is mainly educational, and part of their aim is to get our users familiar with xarray syntax and methods. That's why we often reach for xarray methods despite the performance penalty. Another reason for preferring xarray is that it makes the code in the examples more readable, but this is more relevant for cases where statistics (like mean and median) are computed over a particular dimension. For example, data.mean(dim="keypoints") is more explicit than np.nanmean(data, axis=2), at least in my opinion.
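To make the readability point concrete, here is a small sketch comparing the two calls. The array shape and the dimension names other than "keypoints" are assumptions chosen for illustration; they are not taken from the example script.

```python
import numpy as np
import xarray as xr

# Hypothetical position data; the dimension order is assumed so that
# "keypoints" happens to sit at axis 2.
raw = np.arange(24, dtype=float).reshape(2, 2, 3, 2)
raw[0, 0, 1, 0] = np.nan  # one missing sample
position = xr.DataArray(
    raw,
    dims=("time", "space", "keypoints", "individuals"),
)

# xarray: the reduced dimension is named, and NaNs are skipped by default.
by_name = position.mean(dim="keypoints")

# numpy: the reader has to know that axis 2 is the keypoints axis.
by_axis = np.nanmean(position.values, axis=2)

# The axis number can be looked up from the name if it is needed.
axis_from_name = position.get_axis_num("keypoints")

assert by_name.dims == ("time", "space", "individuals")
assert np.allclose(by_name.values, by_axis)
```

The xarray result also keeps the remaining dimension names, whereas the numpy result is a plain array whose axes have to be tracked by hand.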

That said, the case you've highlighted only concerns 1D data, and perhaps there is value in also using numpy in some of our examples, because it highlights the fact that you can call numpy functions directly on xarray objects.

> Additionally, the script was redundantly computing mean and median values twice (once for the plot and again for the legend).

You are right about that.

All in all, I'm happy to merge this PR. Feel free to mark it as "Ready for review" despite the failing action. That failure is unrelated to this PR.

niksirbi · 2025-02-17T10:55:47Z

> That failure is unrelated to this PR.

Actually, just re-running the failing GitHub action fixed the problem. There was a problematic URL that now seems to be working again.

niksirbi

This is ready to merge from my point of view @Kasra-Shirvanian. Congrats on your first contribution to movement 🎉

replace xarray median and mean with numpy

4cedd64

Kasra-Shirvanian marked this pull request as draft February 16, 2025 23:52

Kasra-Shirvanian marked this pull request as ready for review February 17, 2025 01:38

Kasra-Shirvanian marked this pull request as draft February 17, 2025 01:38

niksirbi marked this pull request as ready for review February 17, 2025 15:18

niksirbi approved these changes Feb 17, 2025


niksirbi added this pull request to the merge queue Feb 17, 2025

Merged via the queue into neuroinformatics-unit:main with commit c51ed75 Feb 17, 2025
18 checks passed
