
Improvements for max_mem #12

Open
1 of 2 tasks
rabernat opened this issue Jul 13, 2020 · 3 comments

@rabernat (Member) commented Jul 13, 2020

Currently we require max_mem to be a user-specified integer (in bytes). It would be better to do the following:

  • try to auto-discover max_mem by examining the dask Client object
  • accept different units (e.g. '1 GB')
@TomAugspurger (Member) commented:

Will we have dask as a required dependency? If so then we can use dask.utils.parse_bytes. Otherwise we can copy it here: https://github.com/dask/dask/blob/ffc8881f84c3ba7a60ca3f10e4c6be1f889937fa/dask/utils.py#L1185-L1236
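If dask ends up being optional, a stand-in along these lines could be vendored instead. This is a minimal sketch, not dask's actual implementation (`parse_mem` and `_UNITS` are hypothetical names); it only handles the common decimal (kB, MB, ...) and binary (KiB, MiB, ...) suffixes:

```python
import re

# Hypothetical minimal stand-in for dask.utils.parse_bytes.
# Maps unit suffixes to their byte multipliers (decimal and binary).
_UNITS = {
    "": 1, "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12,
    "kib": 2**10, "mib": 2**20, "gib": 2**30, "tib": 2**40,
}

def parse_mem(s) -> int:
    """Convert a string like '1 GB' or '512 MiB' (or a plain number) to bytes."""
    if isinstance(s, (int, float)):
        return int(s)
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]*)\s*", s)
    if match is None:
        raise ValueError(f"Could not parse byte string: {s!r}")
    number, unit = match.groups()
    try:
        multiplier = _UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"Unknown unit: {unit!r}") from None
    return int(float(number) * multiplier)
```

For example, `parse_mem("1 GB")` gives `1_000_000_000` and `parse_mem("512 MiB")` gives `536870912`, matching parse_bytes' convention of treating "GB" as decimal and "GiB" as binary.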

@rabernat (Member, Author) commented:

I think so. Right now, dask is the only supported scheduler, so the package would be useless without it.

Have you seen this? Someone already got it working with pywren: https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/33

But that still uses dask.

@bzah commented Mar 9, 2022

I wrote a little something to auto-discover max_mem.
It only works with the dask default and distributed schedulers, not with other executors, which is why I don't think it's worth a PR as is.
However, it might be useful for people coming across this issue, as I was.

edit: made distributed optional

import psutil

def infer_memory_limit(factor: float = 0.9) -> int:
    if factor > 1 or factor < 0:
        raise ValueError(f"factor was {factor}, but it must be between 0 and 1.")
    try:
        import distributed
        max_sys_mem = (
            distributed.get_client()
            .submit(lambda: distributed.get_worker().memory_limit)
            .result()
        )
    except (ValueError, ImportError):
        # Assumes default scheduler is used (psutil must be available)
        max_sys_mem = psutil.virtual_memory().total
    return int(factor * max_sys_mem)

I have included a factor multiplication after reading this discussion: #54 (comment)
Beware, the default factor 0.9 might be a bit optimistic.

It can be used like this:

safe_mem_factor = 0.7
max_mem = infer_memory_limit(safe_mem_factor)
rechunk(source_array, target_chunks_dict, max_mem, target_store, temp_store=temp_store)
