How to train more GPUs #12

Open
luoyq6 opened this issue Nov 2, 2022 · 3 comments

@luoyq6

luoyq6 commented Nov 2, 2022

When I try to train on multiple GPUs, the following error occurs:
Traceback (most recent call last):
  File "train1.py", line 240, in
    rois_label, correlation_feat2, rois_label2 = model(im_data, im_info, gt_boxes, num_boxes, support_ims)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    return self.gather(outputs, self.output_device)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 180, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 78, in gather
    res = gather_map(outputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 73, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 56, in forward
    assert all(i.device.type != 'cpu' for i in inputs), (
AssertionError: Gather function not implemented for CPU tensors

@CrapbagMo

Hi, have you solved it yet?

@luoyq6
Author

luoyq6 commented Nov 15, 2022

"Hi, have you solved it yet?"
No, not yet.

@infinity7428
Owner

This seems to be caused by Meta-CLM. I have modified the code to enable multi-GPU training.
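
For anyone hitting the same assertion: the traceback above ends in the gather step of `nn.DataParallel`, which refuses to combine outputs if any tensor returned by `forward` is on the CPU. The following is a minimal debugging sketch, not the fix that was applied in this repository. The helper name `find_cpu_outputs` is hypothetical, and it is duck-typed (anything with a `.device.type` attribute is treated as a tensor) so that it runs without PyTorch or a GPU. It walks a nested output of tuples, lists, and dicts and reports which entries are on the CPU.

```python
from types import SimpleNamespace


def find_cpu_outputs(outputs, path="outputs"):
    """Return the paths of all tensor-like leaves whose device type is 'cpu'.

    Hypothetical helper. Duck-typed: any object with a `.device.type`
    attribute counts as a tensor, so torch.Tensor objects work without
    importing torch here.
    """
    device = getattr(outputs, "device", None)
    if device is not None and hasattr(device, "type"):
        return [path] if device.type == "cpu" else []
    if isinstance(outputs, dict):
        items = [(f"{path}[{key!r}]", value) for key, value in outputs.items()]
    elif isinstance(outputs, (list, tuple)):
        items = [(f"{path}[{i}]", value) for i, value in enumerate(outputs)]
    else:
        return []  # plain Python values (None, float, ...) are not tensors
    found = []
    for child_path, value in items:
        found.extend(find_cpu_outputs(value, child_path))
    return found


if __name__ == "__main__":
    # Stand-ins for tensors so the sketch runs without PyTorch or a GPU.
    gpu = SimpleNamespace(device=SimpleNamespace(type="cuda"))
    cpu = SimpleNamespace(device=SimpleNamespace(type="cpu"))
    print(find_cpu_outputs((gpu, [gpu, cpu], {"loss": cpu})))
```

One way to use it, assuming the model is wrapped in `nn.DataParallel`: run the unwrapped network (available as `model.module`) on a single batch that has already been moved to a GPU, and pass the returned tuple to the helper. Any path it reports points to an output that is created on the CPU inside `forward` and would need to be created on, or moved to, the same device as the inputs.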
