
Initial multigpu support #121

Closed

Conversation

@alexpolichroniadis commented Mar 5, 2019

The trainer works with nn.DataParallel. I haven't updated the test.py code yet, but it should be straightforward. I added a new dataset class that implements the standard PyTorch Dataset interface, so it can be fed into a torch.utils.data.DataLoader to get the benefits of multi-threaded loading. Shuffling is still handled by the dataset class, hence the batch size of 1 and shuffle=False in the DataLoader.

I'll be happy to hear your comments and thanks for your work!
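
A minimal sketch of the pattern described above (not the actual class from this PR; the names `PreBatchedDataset` and `samples` are placeholders): the Dataset assembles and shuffles whole batches itself, so the surrounding DataLoader runs with batch_size=1 and shuffle=False and contributes only the multi-threaded loading.

```python
import random
import torch
from torch.utils.data import Dataset, DataLoader

class PreBatchedDataset(Dataset):  # hypothetical name, for illustration only
    """Yields whole, pre-assembled batches; shuffling happens inside the dataset."""

    def __init__(self, samples, batch_size=16):
        self.samples = samples          # list of (image_tensor, target_tensor) pairs
        self.batch_size = batch_size

    def __len__(self):
        return (len(self.samples) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, index):
        if index == 0:                  # reshuffle at the start of each pass
            random.shuffle(self.samples)
        chunk = self.samples[index * self.batch_size:(index + 1) * self.batch_size]
        imgs = torch.stack([img for img, _ in chunk])
        targets = torch.cat([t for _, t in chunk])
        return imgs, targets

# Dummy data just to make the sketch runnable.
samples = [(torch.zeros(3, 416, 416), torch.zeros(1, 6)) for _ in range(64)]

# batch_size=1 because each item is already a full batch; shuffle=False because the
# dataset shuffles itself; num_workers is what provides the threading benefit.
loader = DataLoader(PreBatchedDataset(samples), batch_size=1, shuffle=False, num_workers=2)

for imgs, targets in loader:
    imgs, targets = imgs.squeeze(0), targets.squeeze(0)  # drop the extra dim added by the loader
    break
```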

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Updated the YOLOv3 detection and training process for better configuration and performance.

📊 Key Changes

  • Data Configuration Adjustments: Modified paths for training and validation datasets in coco.data configuration.
  • Customizable Data: Added --data-cfg argument in detect.py to specify the data configuration file.
  • Model Predictions: Updated models.py to refine grid size calculations and loss computations for better performance and readability.
  • Improved Data Loading: Transitioned to PyTorch DataLoader in test.py and train.py for more efficient, multi-threaded data loading.
  • Optimization Tweaks: Adjusted optimization steps in train.py and fixed loss logging for enhanced training stability.
  • Multi-GPU Support: Addressed multi-GPU usage in the training script, allowing the model and loss to replicate across devices (see the sketch after this list).
  • Torch Utils: Improved device selection and reporting in torch_utils.py.
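
As referenced in the Multi-GPU Support item above, here is a minimal, hedged sketch of how a model is typically wrapped with torch.nn.DataParallel for this kind of training; the tiny placeholder model stands in for YOLOv3, and the exact integration in train.py may differ.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())  # placeholder for the real model

if torch.cuda.device_count() > 1:
    # DataParallel replicates the module across the visible GPUs, splits each input
    # batch among them, and gathers the outputs back on the default device.
    model = nn.DataParallel(model)

model = model.to(device).train()

imgs = torch.randn(8, 3, 64, 64, device=device)
preds = model(imgs)  # the forward pass is parallelized when multiple GPUs are visible
```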

🎯 Purpose & Impact

  • These changes aim to improve code clarity, expand configurability, provide better support for various dataset structures, and enhance overall performance.
  • Users benefit from more streamlined data loading and training, the ability to customize data sources, and potentially higher accuracy and faster convergence due to refined loss computations and model updates.
  • Multi-GPU support scales training to more powerful setups, accelerating research and development.
  • Overall, the PR improves the user experience for both researchers and practitioners working with YOLOv3 object detection.

@glenn-jocher
Member

@alexpolichroniadis thanks a lot! I'll review this as soon as I can.

@alexpolichroniadis
Author

I have made the full pipeline operational now and will be opening a PR for that soon (or updating this one). Some things in the current PR are broken outside train.py (like loss reporting, test.py, etc.), but they are fixed in the new version. Will get that to you soon.

/Alex

@alexpolichroniadis
Author

Should be okay now! Take a look.

/Alex

@longxianlei

```python
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
model.to(device).train()
```

I have 8 GPUs and made 4 of them visible, then wrapped the model in DataParallel across those GPUs. But when I run train.py I get:

```
inter_area = torch.min(box1, box2).prod(2)
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'other'
```

Does the code not support multi-GPU training yet?
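
The RuntimeError above is the usual symptom of mixing a CUDA tensor with a CPU tensor in one element-wise op; it typically means the targets (or anchors) were built on the CPU while the predictions live on the GPU. A minimal sketch of the common remedy, not necessarily the exact fix applied in this repository, using the hypothetical helper `inter_area_wh`:

```python
import torch

def inter_area_wh(box1, box2):
    """Width/height intersection area for broadcastable (..., 2) tensors of (w, h)."""
    # Moving box2 onto box1's device avoids
    # "Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor".
    box2 = box2.to(box1.device)
    return torch.min(box1, box2).prod(-1)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
box1 = torch.rand(4, 1, 2, device=device)  # e.g. anchors/predictions on the GPU
box2 = torch.rand(1, 8, 2)                 # e.g. targets assembled on the CPU
print(inter_area_wh(box1, box2).shape)     # torch.Size([4, 8]), regardless of where box2 was created
```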

@glenn-jocher mentioned this pull request Mar 17, 2019
@longxianlei

You merged the multi-GPU version and the single-GPU version into one file?
Well done! Thank you.

@glenn-jocher
Copy link
Member

@longxianlei yes, I merged the two branches. I added multithreaded support in the DataLoader (#141), and training is now much faster. The latest benchmarks are below.

https://cloud.google.com/deep-learning-vm/
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: 1-4 x NVIDIA Tesla P100
HDD: 100 GB SSD

| GPUs (P100) | batch size (images) | speed (s/batch) | COCO epoch (min/epoch) |
| --- | --- | --- | --- |
| 1 | 16 | 0.39 s | 48 min |
| 2 | 32 | 0.48 s | 29 min |
| 4 | 64 | 0.65 s | 20 min |
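
As a rough reading of the table, throughput in images per second is batch size divided by seconds per batch, which shows the scaling across GPUs is strong but not perfectly linear:

```python
# Rough throughput implied by the benchmark table above.
rows = [(1, 16, 0.39), (2, 32, 0.48), (4, 64, 0.65)]  # (GPUs, batch size, s/batch)
for gpus, batch, s_per_batch in rows:
    print(f"{gpus} x P100: {batch / s_per_batch:.0f} img/s")
# -> ~41, ~67, ~98 img/s
```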

@peterhsu2018

@glenn-jocher
Hi sir, is there any support for multi-GPU? I tried the link https://github.com/ultralytics/yolov3/tree/multi_gpu, but it seems to be invalid.
Thanks

@glenn-jocher
Member

@peterhsu2018 yes, the master branch supports multi-GPU training.

@peterhsu2018

@glenn-jocher Thank you so much!

@glenn-jocher
Member

@peterhsu2018 you're welcome! If you have any other questions, feel free to ask. Thank you for your understanding.
