[BUG] Memory error when cudf is used in python multiprocessing Pool #5515

achinta · 2020-06-19T11:54:27Z

cuDF 0.14 raises an error when used inside a Python multiprocessing Pool, whereas the same code works in version 0.12. Here is the code to reproduce:

import cudf
import pandas as pd
from multiprocessing import Pool

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

# Parallelize the method calls
with Pool(2) as pool:
    pool.map(get_df, [1,2])

The error is:

MemoryError                               Traceback (most recent call last)
<ipython-input-1-757e5676c563> in <module>
     11 
     12 with Pool(2) as pool:
---> 13     pool.map(get_df, [1,2])

~/miniconda3/envs/gpu/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    264         in a list that is returned.
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 
    268     def starmap(self, func, iterable, chunksize=None):

~/miniconda3/envs/gpu/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp66: cudaErrorInitializationError initialization error

cudf was installed using Anaconda on bare metal. I am attaching the outputs of cudf/print_env.sh
print_env_12.txt
print_env_14.txt


shwina · 2020-06-19T16:07:16Z

Looks related to the new memory resource bindings; will investigate.

shwina · 2020-06-19T18:11:03Z

Thanks for reporting. This is likely due to the call to fork(), which will attempt to share the CUDA context created in the parent process with the child processes. One fix is to use the spawn start method instead:

import cudf
import pandas as pd
from multiprocessing import get_context

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

if __name__ == "__main__":
    ctx = get_context("spawn")

    # Parallelize the method calls
    with ctx.Pool(2) as pool:
        print(pool.map(get_df, [1,2]))

Does that help with your problem?

achinta · 2020-06-20T16:49:26Z

No. The error message is

AttributeError: Can't get attribute 'get_df' on <module '__main__' (built-in)>

shwina · 2020-06-22T11:21:59Z

Hmm, how are you running this test? Interactively with IPython/Jupyter or invoking it as a script?

philtrade · 2020-06-23T04:19:34Z

MemoryError                               Traceback (most recent call last)
<ipython-input-1-757e5676c563> in <module>

Looks like this was run in IPython.

No. The error message is

AttributeError: Can't get attribute 'get_df' on <module '__main__' (built-in)>

This looks like the known limitation (the 2nd gray box in the link) of Python's multiprocessing when used interactively with IPython.

When launched as a script, @shwina's suggestion shouldn't hit the AttributeError, should it?

I'm not sure about the original error message below, though: is it related to the use of fork vs. spawn, or to something else?

MemoryError: std::bad_alloc: CUDA error at: /conda/conda-bld/librmm_1591196551527/work/include/rmm/mr/device/cuda_memory_resource.hpp66: cudaErrorInitializationError initialization error
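
A sketch of one possible way around the interactive limitation, assuming the worker can live in its own file: a function defined in a notebook's `__main__` cannot be looked up by spawned workers, whereas a function in a regular module can be imported by them. The module name `cudf_workers` and the helper `import_worker` are hypothetical, and the `ImportError` fallback exists only so the sketch also runs on a machine without cuDF installed.

```python
import importlib
import sys
import tempfile
from multiprocessing import get_context
from pathlib import Path

# Hypothetical module name and contents (not from the thread). The fallback
# branch only lets the sketch run where cuDF or pandas is not installed.
MODULE_NAME = "cudf_workers"
MODULE_SOURCE = '''\
def get_df(idx):
    try:
        import cudf
        import pandas as pd
    except ImportError:
        return {"a": [1, 2], "b": [3, 4]}
    pdf = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    return cudf.from_pandas(pdf)
'''


def import_worker(directory):
    """Write the worker module into `directory` and return its get_df."""
    directory = str(directory)
    Path(directory, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
    if directory not in sys.path:
        # The spawn start method forwards the parent's sys.path to workers.
        sys.path.insert(0, directory)
    importlib.invalidate_caches()
    return importlib.import_module(MODULE_NAME).get_df


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        get_df = import_worker(tmp)
        # get_df belongs to cudf_workers rather than __main__, so each
        # spawned worker can import it by name.
        with get_context("spawn").Pool(2) as pool:
            print(pool.map(get_df, [1, 2]))
```

In a notebook it is usually simpler to save the module next to the notebook by hand and write `from cudf_workers import get_df`; the temporary directory here only keeps the sketch self-contained.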

philtrade · 2020-06-23T07:01:49Z

... This is likely due to the call to fork(), which will attempt to share the CUDA context created in the parent process. One fix is to use spawn() instead:

import cudf
import pandas as pd
from multiprocessing import get_context

def get_df(idx):
    pdf = pd.DataFrame({
        'a':[1,2],
        'b':[3,4]
    })
    return cudf.from_pandas(pdf)

if __name__ == "__main__":
    ctx = get_context("spawn")

    # Parallelize the method calls
    with ctx.Pool(2) as pool:
        print(pool.map(get_df, [1,2]))

Verified that this works when run as a script. With the default fork start method, it hits the initialization error.
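
For code that calls `multiprocessing.Pool` directly and cannot easily pass a context object around, a possible alternative is to make spawn the process-wide default. This is a minimal sketch, not something tested in this thread; the helper name `use_spawn_globally` is hypothetical.

```python
import multiprocessing as mp


def use_spawn_globally():
    """Make spawn the default start method for this process."""
    # force=True replaces a start method that was already fixed, which
    # helps when the code is re-run in a long-lived session.
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # After this call, a plain multiprocessing.Pool(2) starts its workers
    # with spawn instead of fork.
    print(use_spawn_globally())
```

Worker functions still need to be importable by the child processes, so the interactive-session limitation discussed above applies here as well.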

achinta · 2020-06-24T10:12:24Z

@shwina your suggestion works fine when run as a script. Thanks.

shwina · 2020-06-24T12:05:58Z

Thanks for letting us know!

achinta added the Needs Triage and bug labels Jun 19, 2020

shwina self-assigned this Jun 19, 2020

kkraus14 added the Python label and removed the Needs Triage label Jun 19, 2020

shwina closed this as completed Jun 24, 2020

lmmx mentioned this issue Aug 16, 2021

[FEA] Multiprocessing support (to avoid CUDA OOM due to memory spikes) #9042

Closed
