
multi_gpu #135

Merged
merged 62 commits on Mar 17, 2019

Conversation

glenn-jocher (Member) commented Mar 17, 2019

For more information see issue #21.

We started a multi_gpu branch (https://github.com/ultralytics/yolov3/tree/multi_gpu), with a secondary goal of trying out a different loss approach: selecting a single anchor from the 9 available for each target. The new loss produced significantly worse results, so the current method of selecting one anchor from each yolo layer appears correct. In the process we did get multi_gpu operational, though not with the expected speedups. We did not attempt a multithreaded PyTorch dataloader, nor PIL in place of OpenCV, as we found both of these slower in our single-GPU profiling last year.
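
For context, a minimal sketch of the nn.DataParallel wrapping that makes a PyTorch model run on multiple GPUs; the stand-in model below is a placeholder, not the repo's Darknet class:

```python
import torch
import torch.nn as nn

# Stand-in model; the repo's Darknet(cfg) is an nn.Module and wraps the same way.
model = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Replicate the model across all visible GPUs. The input batch is split along dim 0,
# so batch_size should be a multiple of torch.cuda.device_count().
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

imgs = torch.randn(8, 3, 416, 416, device=device)  # dummy batch
preds = model(imgs)  # forward pass is scattered across GPUs and gathered on cuda:0
```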

We don't have multi-GPU machines on premises, so we tested this with GCP Deep Learning VMs. We used batch_size=26 (the max that 1 P100 can handle) times the number of GPUs. All other training settings were defaults. We timed the fastest batch out of the first 30. Results are below for our branch and PR #121. In both cases the speedups were very poor. It's possible the I/O ops were constrained by GCP due to the limited SSD size; we will try again with a larger SSD, but we wanted to get these results out here for feedback. If anyone has another repo or PR we can compare against, please let us know!

https://cloud.google.com/deep-learning-vm/
Machine type: n1-highmem-4 (4 vCPUs, 26 GB memory)
CPU platform: Intel Skylake
GPUs: 1-4 x NVIDIA Tesla P100
HDD: 500 GB SSD

| GPUs (P100) | batch_size (images) | yolov3/tree/multi_gpu (s/batch) | yolov3/pull/121 (s/batch) |
|---|---|---|---|
| 1 | 26 | 0.91 | 1.05 |
| 2 | 52 | 1.60 | 1.76 |
| 4 | 104 | 2.26 | 2.81 |
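
For reference, a rough sketch of the timing method described above (fastest batch out of the first 30), using a stand-in model, data, and loss rather than the repo's actual training loop:

```python
import time
import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = nn.Conv2d(3, 8, 3, padding=1).to(device)        # stand-in for Darknet(cfg)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

best = float('inf')
for i in range(30):                                      # time the first 30 batches only
    imgs = torch.randn(4, 3, 416, 416, device=device)    # stand-in batch
    if device.type == 'cuda':
        torch.cuda.synchronize()                         # finish prior GPU work before timing
    t0 = time.time()

    loss = model(imgs).mean()                            # stand-in for the YOLO loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    if device.type == 'cuda':
        torch.cuda.synchronize()                         # wait for this batch to finish on GPU
    best = min(best, time.time() - t0)

print(f'fastest of first 30 batches: {best:.2f}s')
```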

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced YOLOv3 model code and testing suite.

📊 Key Changes

  • Removed unused dependencies and streamlined YOLOLayer and forward method for efficiency.
  • Simplified loss computation during training.
  • Enhanced forward method to support ONNX export.
  • Improved performance by ensuring device compatibility during grid creation (see the sketch after this list).
  • Refactored the testing script test.py for clarity.
  • Cleaned up the train.py script, including the removal of unused variables.
  • Updated dataset handling for better indexing and performance.
  • Improved util scripts and functions for maintainability.
  • General code cleanup and refactoring for readability and performance.
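
As an illustration of the grid/device bullet above, here is a minimal sketch of building YOLO grid offsets on the same device as the incoming prediction tensor. The function name `create_grids` and the shapes are illustrative assumptions, not the exact code in models.py:

```python
import torch

def create_grids(ng, device):
    # Build a (1, 1, ng, ng, 2) tensor of x/y cell offsets on the requested device.
    yv, xv = torch.meshgrid(torch.arange(ng), torch.arange(ng))
    return torch.stack((xv, yv), dim=2).view(1, 1, ng, ng, 2).float().to(device)

# Usage: the offsets live on the same device as the YOLO layer output,
# so decoding box centers involves no cross-device tensor ops.
p = torch.zeros(1, 3, 13, 13, 85)             # dummy 13x13 YOLO layer output
grid_xy = create_grids(13, p.device)
xy = torch.sigmoid(p[..., :2]) + grid_xy      # box centers in grid units
```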

🎯 Purpose & Impact

  • 🏎 Speeds up the model’s performance and reduces memory footprint.
  • 🧹 Code cleanup improves maintainability and sets the stage for future features.
  • 🤖 Better ONNX support prepares the model for broader deployment possibilities.
  • 🌍 Device compatibility adjustments ensure more consistent behavior across different computing environments.
  • ✅ Simplified testing and training scripts contribute to a smoother workflow for users setting up and evaluating models.
  • Overall, these changes lead to an improved user and developer experience, making it easier to use and contribute to the project.

glenn-jocher (Member Author) commented Mar 19, 2019

Updated times with batch_size=24, and a comparison to an existing study.

https://cloud.google.com/deep-learning-vm/
Machine type: n1-highmem-4 (4 vCPUs, 26 GB memory)
CPU platform: Intel Skylake
GPUs: 1-4 x NVIDIA Tesla P100
HDD: 100 GB SSD

| GPUs (P100) | batch_size (images) | 613ce1b (s/batch) | COCO epoch (min/epoch) |
|---|---|---|---|
| 1 | 24 | 0.84 | 70 |
| 2 | 48 | 1.27 | 53 |
| 4 | 96 | 2.11 | 44 |
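
As a rough consistency check on these epoch times, epoch duration should be about (images / batch_size) x s/batch. Assuming the roughly 117k-image COCO trainvalno5k split this repo trains on (an assumption here, not stated in the comment):

```python
n_images = 117264  # approximate COCO trainvalno5k size (assumption)
for bs, s_per_batch in [(24, 0.84), (48, 1.27), (96, 2.11)]:
    minutes = n_images / bs * s_per_batch / 60
    print(f'batch_size={bs}: ~{minutes:.0f} min/epoch')
# prints roughly 68, 52, and 43 min, consistent with the 70/53/44 min measured above
```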

Comparison results from https://github.com/ilkarman/DeepLearningFrameworks:

[Screenshot: multi-GPU scaling comparison, 2019-03-19]

glenn-jocher (Member Author) commented

@alexpolichroniadis, @longxianlei, @LightToYang Great news! Lack of multithreading in the dataloader was slowing down multi-GPU training significantly (#141). I reimplemented support for DataLoader multithreading, and speeds have improved greatly (more than double in some cases). The new test results are below for the latest commit.

https://cloud.google.com/deep-learning-vm/
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: 1-4 x NVIDIA Tesla P100
HDD: 100 GB SSD

| GPUs (P100) | batch_size (images) | speed (s/batch) | COCO epoch (min/epoch) |
|---|---|---|---|
| 1 | 16 | 0.39 | 48 |
| 2 | 32 | 0.48 | 29 |
| 4 | 64 | 0.65 | 20 |
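
For reference, a minimal sketch of the multithreaded loading described above using torch.utils.data.DataLoader; the `DummyImages` dataset, batch size, and `num_workers=4` are placeholders rather than the exact values used in datasets.py:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DummyImages(Dataset):
    """Stand-in for the repo's image/label dataset."""
    def __len__(self):
        return 1000
    def __getitem__(self, i):
        return torch.randn(3, 416, 416), torch.zeros(1, 5)  # image, targets

loader = DataLoader(
    DummyImages(),
    batch_size=16,
    shuffle=True,
    num_workers=4,    # worker processes load/augment batches in parallel with the GPU
    pin_memory=True,  # speeds up host-to-GPU copies
)

for imgs, targets in loader:
    pass  # training step goes here
```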

chrizandr pushed a commit to chrizandr/yolov3 that referenced this pull request Aug 19, 2019