
raise cannot pickle '_thread.lock' object error when using train_loader #4

Open
YSXXXXXXX opened this issue Oct 11, 2023 · 5 comments
Labels: bug (Something isn't working) · help wanted (Extra attention is needed)

Comments

@YSXXXXXXX

Describe the bug
Hi, when I run the transworld_exp.py file, the following error occurs:

Traceback (most recent call last):
  File "C:\TransWorldNG\transworld\transworld_exp.py", line 252, in <module>
    run(args.scenario,args.train_data, args.training_step, args.pred_step, args.hid_dim, args.n_head, args.n_layer, device)
  File "C:\TransWorldNG\transworld\transworld_exp.py", line 201, in run
    loss_lst = train(timestamps, graph, batch_size, num_workers, encoder, generator, veh_route, loss_fcn, optimizer, logger, device)
  File "C:\TransWorldNG\transworld\transworld_exp.py", line 40, in train
    for i, (cur_graphs, next_graphs) in enumerate(train_loader): 
  File "D:\anaconda\envs\TransWorldNG\lib\site-packages\torch\utils\data\dataloader.py", line 444, in __iter__
    return self._get_iterator()
  File "D:\anaconda\envs\TransWorldNG\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\anaconda\envs\TransWorldNG\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "D:\anaconda\envs\TransWorldNG\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\anaconda\envs\TransWorldNG\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\anaconda\envs\TransWorldNG\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "D:\anaconda\envs\TransWorldNG\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\anaconda\envs\TransWorldNG\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object

To Reproduce
Run the Python script TransWorldNG\transworld\transworld_exp.py.

Expected behavior
train_loader should return the sampled current- and next-timestamp graphs (line 40).

Desktop:

  • OS: Windows11
  • python: 3.9.11

Possible solution
I noticed that w.start() appears in the Python traceback, so I checked the objects contained in the args parameter (see the following statement from torch's dataloader.py) and found that self._collate_fn cannot be pickled.

w = multiprocessing_context.Process(
    target=_utils.worker._worker_loop,
    args=(self._dataset_kind, self._dataset, index_queue,
        self._worker_result_queue, self._workers_done_event,
        self._auto_collation, self._collate_fn, self._drop_last,
        self._base_seed, self._worker_init_fn, i, self._num_workers,
        self._persistent_workers, self._shared_seed))
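To confirm which of these objects is the offender, a small diagnostic sketch (not part of the repository; it assumes the train_loader built in transworld_exp.py is in scope) is to pickle them one by one:

import pickle

# Try to pickle the objects that are sent to each worker process,
# mirroring the args tuple above; whichever raises TypeError is the
# source of the '_thread.lock' error.
candidates = {
    "dataset": train_loader.dataset,
    "collate_fn": train_loader.collate_fn,
}
for name, obj in candidates.items():
    try:
        pickle.dumps(obj)
        print(f"{name}: picklable")
    except TypeError as exc:
        print(f"{name}: NOT picklable ({exc})")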

This error may be associated with the Node class, which uses queue.Queue (queue.Queue holds a thread lock internally). I think collections.deque could be a suitable replacement.
For more information, please see TypeError: can't pickle _thread.lock objects and Python Multiprocessing Pool.map Causes Error in __new__.
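As a minimal, standalone illustration of why this swap could help (the classes below are hypothetical stand-ins, not the project's Node), an object holding a queue.Queue fails to pickle while one holding a collections.deque does not:

import pickle
import queue
from collections import deque

class NodeWithQueue:
    # Hypothetical stand-in for a class storing a queue.Queue, which
    # contains a _thread.lock and therefore cannot be pickled.
    def __init__(self):
        self.buffer = queue.Queue()

class NodeWithDeque:
    # Same idea backed by collections.deque, which pickles fine.
    def __init__(self):
        self.buffer = deque()

try:
    pickle.dumps(NodeWithQueue())
except TypeError as exc:
    print("queue.Queue version:", exc)   # cannot pickle '_thread.lock' object

print("deque version pickles to", len(pickle.dumps(NodeWithDeque())), "bytes")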

YSXXXXXXX added the bug (Something isn't working) and help wanted (Extra attention is needed) labels on Oct 11, 2023
@lovelybirds
Collaborator

It seems you're encountering an issue with multiprocessing. The error occurs when you use the DataLoader with multiple workers (num_workers > 0), which requires pickling the objects that are handed to each worker process.

This can sometimes be influenced by the Python version in use. First, check your Python version; we are using Python 3.9. Second, a potential workaround is to set the number of workers to 0 on line 166 of TransWorldNG\transworld\transworld_exp.py, which disables multiprocessing for data loading. After that, you should be able to narrow down the cause of the problem.
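A sketch of that workaround (the arguments here are generic placeholders, not the exact call on line 166):

from torch.utils.data import DataLoader

# With num_workers=0 the batches are loaded in the main process, so
# nothing is pickled and sent to worker processes, and the
# '_thread.lock' error from worker spawning cannot occur.
train_loader = DataLoader(
    train_dataset,          # placeholder: the project's dataset object
    batch_size=batch_size,  # placeholder: the configured batch size
    shuffle=True,
    num_workers=0,          # disable multiprocessing for data loading
    collate_fn=collate_fn,  # placeholder: the project's collate function
)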

@YSXXXXXXX
Author

YSXXXXXXX commented Oct 18, 2023

Hi, lovelybirds,
Thank you for your reply.
I agree with your second suggestion. As for the first one, as you can see in the issue description, my Python version is also 3.9. I think queue.Queue (transworld/game/core/node.py, line 9) may be unsuitable for multiple workers.
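If the Node class only needs single-process FIFO buffering, a possible replacement (an illustrative sketch under that assumption; the attribute and method names are not copied from node.py) maps the queue.Queue calls onto deque operations:

from collections import deque

class Node:
    # Illustrative Node with a picklable message buffer: deque has no
    # internal lock, so instances can be sent to DataLoader workers.
    def __init__(self):
        self.messages = deque()

    def put(self, item):
        self.messages.append(item)      # was queue.Queue.put(item)

    def get(self):
        return self.messages.popleft()  # was queue.Queue.get(); raises IndexError if empty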

@nudtdyk

nudtdyk commented Oct 25, 2023

Hi, lovelybirds,
When I set the number of workers to 0, there is another problem:
[screenshot: integer division by zero raised by batch // num_workers on line 218]

@lovelybirds
Collaborator

Hi nudtdyk,

I noticed that the error on line 218 comes from batch // num_workers. My apologies for suggesting num_workers = 0; please try num_workers = 1 to avoid the integer division by zero.
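If line 218 computes something like batch // num_workers, a defensive sketch (illustrative, not the exact code in transworld_exp.py) that also tolerates num_workers = 0 would be:

# Clamp the divisor so single-process loading (num_workers = 0) does not
# raise ZeroDivisionError; with num_workers >= 1 the result is unchanged.
per_worker_batch = batch_size // max(num_workers, 1)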

We're in the process of creating a Docker environment with the same configuration, which should help prevent such issues in the future.

@nudtdyk

nudtdyk commented Oct 25, 2023

Hi, lovelybirds,
Thank you for your reply.
But if I set num_workers = 1, I encounter the same problem that YSXXXXXXX reported.
