
Initial multigpu support #121

Closed

Conversation

@alexpolichroniadis commented Mar 5, 2019

The trainer works with nn.DataParallel. I haven't updated the test.py code yet, but it should be straightforward. I added a new dataset class that implements the standard PyTorch Dataset interface, so it can be fed into a torch.utils.data.DataLoader to get the benefits of multi-threaded loading. Shuffling is still handled by the dataset class, hence the batch size of 1 and shuffle=False in the DataLoader.

I'll be happy to hear your comments and thanks for your work!
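
A minimal sketch of the pattern described above (not the actual class from this PR; the names `PreBatchedDataset` and `samples` are placeholders): the Dataset assembles and shuffles whole batches itself, so the surrounding DataLoader runs with batch_size=1 and shuffle=False and contributes only the multi-threaded loading.

```python
import random
import torch
from torch.utils.data import Dataset, DataLoader

class PreBatchedDataset(Dataset):  # hypothetical name, for illustration only
    """Yields whole, pre-assembled batches; shuffling happens inside the dataset."""

    def __init__(self, samples, batch_size=16):
        self.samples = samples          # list of (image_tensor, target_tensor) pairs
        self.batch_size = batch_size

    def __len__(self):
        return (len(self.samples) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, index):
        if index == 0:                  # reshuffle at the start of each pass
            random.shuffle(self.samples)
        chunk = self.samples[index * self.batch_size:(index + 1) * self.batch_size]
        imgs = torch.stack([img for img, _ in chunk])
        targets = torch.cat([t for _, t in chunk])
        return imgs, targets

# Dummy data just to make the sketch runnable.
samples = [(torch.zeros(3, 416, 416), torch.zeros(1, 6)) for _ in range(64)]

# batch_size=1 because each item is already a full batch; shuffle=False because the
# dataset shuffles itself; num_workers is what provides the threading benefit.
loader = DataLoader(PreBatchedDataset(samples), batch_size=1, shuffle=False, num_workers=2)

for imgs, targets in loader:
    imgs, targets = imgs.squeeze(0), targets.squeeze(0)  # drop the extra dim added by the loader
    break
```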

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Updated the YOLOv3 detection and training process for better configuration and performance.

📊 Key Changes

  • Data Configuration Adjustments: Modified paths for training and validation datasets in coco.data configuration.
  • Customizable Data: Added --data-cfg argument in detect.py to specify the data configuration file.
  • Model Predictions: Updated models.py to refine grid size calculations and loss computations for better performance and readability.
  • Improved Data Loading: Transitioned to PyTorch DataLoader in test.py and train.py for more efficient, multi-threaded data loading.
  • Optimization Tweaks: Adjusted optimization steps in train.py and fixed loss logging for enhanced training stability.
  • Multi-GPU Support: Addressed multi-GPU usage in the training script, allowing the model and loss to replicate across devices (see the sketch after this list).
  • Torch Utils: Improved device selection and reporting in torch_utils.py.
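
As referenced in the Multi-GPU Support item above, here is a minimal, hedged sketch of how a model is typically wrapped with torch.nn.DataParallel for this kind of training; the tiny placeholder model stands in for YOLOv3, and the exact integration in train.py may differ.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())  # placeholder for the real model

if torch.cuda.device_count() > 1:
    # DataParallel replicates the module across the visible GPUs, splits each input
    # batch among them, and gathers the outputs back on the default device.
    model = nn.DataParallel(model)

model = model.to(device).train()

imgs = torch.randn(8, 3, 64, 64, device=device)
preds = model(imgs)  # the forward pass is parallelized when multiple GPUs are visible
```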

🎯 Purpose & Impact

  • These changes aim to improve code clarity, expand configurability, provide better support for various dataset structures, and enhance overall performance.
  • Users benefit from more streamlined data loading and training, the ability to customize data sources, and potentially higher accuracy and faster convergence due to refined loss computations and model updates.
  • Multi-GPU support scales training to more powerful setups, accelerating research and development.
  • Overall, the PR improves the user experience for both researchers and practitioners working with YOLOv3 object detection.

@glenn-jocher
Member

@alexpolichroniadis thanks a lot! I'll review this as soon as I can.

@alexpolichroniadis
Author

I have made the full pipeline operational now and will be opening a PR for that soon (or updating this one). Some things in the current PR are broken outside train.py (like loss reporting, test.py, etc.), but they are fixed in the new version. Will get that to you soon.

/Alex

@alexpolichroniadis
Author

Should be okay now! Take a look.

/Alex

@longxianlei

```python
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
model.to(device).train()
```

I have 8 GPUs and made 4 of them visible, then wrapped the model in DataParallel across those GPUs. But when I run train.py I get:

```
inter_area = torch.min(box1, box2).prod(2)
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'other'
```

Does the code not support multi-GPU training yet?
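
The RuntimeError above is the usual symptom of mixing a CUDA tensor with a CPU tensor in one element-wise op; it typically means the targets (or anchors) were built on the CPU while the predictions live on the GPU. A minimal sketch of the common remedy, not necessarily the exact fix applied in this repository, using the hypothetical helper `inter_area_wh`:

```python
import torch

def inter_area_wh(box1, box2):
    """Width/height intersection area for broadcastable (..., 2) tensors of (w, h)."""
    # Moving box2 onto box1's device avoids
    # "Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor".
    box2 = box2.to(box1.device)
    return torch.min(box1, box2).prod(-1)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
box1 = torch.rand(4, 1, 2, device=device)  # e.g. anchors/predictions on the GPU
box2 = torch.rand(1, 8, 2)                 # e.g. targets assembled on the CPU
print(inter_area_wh(box1, box2).shape)     # torch.Size([4, 8]), regardless of where box2 was created
```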

@glenn-jocher mentioned this pull request Mar 17, 2019
@longxianlei

You merged the multi-GPU version and the single-GPU version into one file?
Well done! Thank you.

@glenn-jocher
Copy link
Member

@longxianlei yes, I merged the two branches. I added multithreaded support in the DataLoader (#141), and training is now much faster. The latest benchmarks are below.

https://cloud.google.com/deep-learning-vm/
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: 1-4 x NVIDIA Tesla P100
HDD: 100 GB SSD

| GPUs (P100) | batch size (images) | speed (s/batch) | COCO epoch (min/epoch) |
| --- | --- | --- | --- |
| 1 | 16 | 0.39 s | 48 min |
| 2 | 32 | 0.48 s | 29 min |
| 4 | 64 | 0.65 s | 20 min |
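
As a rough reading of the table, throughput in images per second is batch size divided by seconds per batch, which shows the scaling across GPUs is strong but not perfectly linear:

```python
# Rough throughput implied by the benchmark table above.
rows = [(1, 16, 0.39), (2, 32, 0.48), (4, 64, 0.65)]  # (GPUs, batch size, s/batch)
for gpus, batch, s_per_batch in rows:
    print(f"{gpus} x P100: {batch / s_per_batch:.0f} img/s")
# -> ~41, ~67, ~98 img/s
```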

@peterhsu2018

@glenn-jocher
Hi sir, is there any support for multi-GPU? I tried the link https://github.com/ultralytics/yolov3/tree/multi_gpu, but it seems to be invalid.
Thanks

@glenn-jocher
Member

@peterhsu2018 yes, the master branch supports multi-GPU training.

@peterhsu2018

@glenn-jocher Thank you so much!

@glenn-jocher
Member

@peterhsu2018 you're welcome! If you have any other questions, feel free to ask. Thank you for your understanding.
