There appears to be a performance bottleneck in raw_to_rawnc. The docstring notes that it is slow for large datasets. Because this function converts many small files one at a time, it looks like a good candidate for multiprocessing.
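To make the multiprocessing idea concrete, here is a rough sketch of the pattern, not the actual pyglider code. `convert_one_file` and `convert_all` are hypothetical names, and the per-file work is a placeholder (it just upper-cases text) standing in for whatever raw_to_rawnc does to each raw file. It assumes the files can be converted independently of each other.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def convert_one_file(in_path, out_dir):
    """Hypothetical stand-in for the per-file conversion done in raw_to_rawnc."""
    in_path = Path(in_path)
    out_path = Path(out_dir) / (in_path.name + ".converted")
    # Placeholder work: the real step would parse the raw file and write netCDF.
    out_path.write_text(in_path.read_text().upper())
    return str(out_path)


def convert_all(in_paths, out_dir, max_workers=None,
                executor_cls=ProcessPoolExecutor):
    """Convert each input file as its own task and return the output paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Sort so the result order is deterministic regardless of directory order.
    in_paths = sorted(str(p) for p in in_paths)
    out_dirs = [str(out_dir)] * len(in_paths)
    with executor_cls(max_workers=max_workers) as pool:
        # map() yields results in input order, so outputs line up with in_paths.
        return list(pool.map(convert_one_file, in_paths, out_dirs))
```

Something like `convert_all(Path("raw").glob("*"), "rawnc")` would then spread the files over the available cores (the directory names here are made up). On platforms that spawn rather than fork worker processes, the call needs to sit under an `if __name__ == "__main__":` guard. Whether this helps in practice depends on how much of the time is per-file parsing versus disk I/O.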
Update: I've recently been introduced to polars, which looks like an ideal solution for this. It's parallel by default and works effectively as a replacement for pandas. I'll work up a PR to use polars in place of pandas for the seaexplorer raw_to_rawnc step.
We also introduced a method to subsample the data, which removes the redundant data the Alseamars put out by default. Not sure if that helps fix the problem, and that's not to say we shouldn't also consider polars.