multi_gpu #135
Conversation
Updated times with https://cloud.google.com/deep-learning-vm/
Comparison results from https://github.com/ilkarman/DeepLearningFrameworks
@alexpolichroniadis, @longxianlei, @LightToYang Great news! Lack of multithreading in the dataloader was slowing down multi-GPU significantly (#141). I reimplemented support for DataLoader multithreading, and speeds have improved greatly (more than double in some cases). The new test results are below for the latest commit, tested on https://cloud.google.com/deep-learning-vm/.
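For reference, here is a minimal sketch of what multi-worker loading looks like with a plain PyTorch DataLoader; the dataset class, batch size, and worker count below are placeholders rather than the repo's actual LoadImagesAndLabels setup (and strictly speaking, num_workers spawns worker processes rather than threads):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DummyImages(Dataset):
    """Placeholder dataset returning random 416x416 'images' and empty labels."""
    def __len__(self):
        return 1024
    def __getitem__(self, idx):
        return torch.randn(3, 416, 416), torch.zeros(1)

if __name__ == '__main__':
    # num_workers > 0 moves image loading/augmentation into background worker
    # processes so the GPUs are not starved waiting on the main process.
    loader = DataLoader(DummyImages(),
                        batch_size=26 * max(torch.cuda.device_count(), 1),
                        shuffle=True,
                        num_workers=4,        # the key change discussed above
                        pin_memory=True)

    for imgs, labels in loader:
        pass  # forward/backward pass would go here
```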
For more information see issue #21.
We started a multi_gpu branch (https://github.com/ultralytics/yolov3/tree/multi_gpu), with a secondary goal of trying out a different loss approach: selecting a single anchor from the 9 available for each target. The new loss produced significantly worse results, so it appears the current method of selecting one anchor from each yolo layer is correct. In the process we did get multi_gpu operational, though not with the expected speedups. We did not attempt to use a multithreaded PyTorch dataloader, nor PIL in place of OpenCV, as we found both of these slower in our single-GPU profiling last year.
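To make the two assignment strategies concrete, here is a minimal, hypothetical sketch (not the repo's actual build_targets code) contrasting choosing the single best of all 9 anchors per target against choosing the best anchor within each of the 3 yolo layers, using a width/height IoU:

```python
import torch

def wh_iou(wh1, wh2):
    """IoU of boxes that share a center, computed from width/height only."""
    wh1 = wh1[:, None, :]                          # (N, 1, 2)
    wh2 = wh2[None, :, :]                          # (1, M, 2)
    inter = torch.min(wh1, wh2).prod(dim=2)        # (N, M)
    return inter / (wh1.prod(dim=2) + wh2.prod(dim=2) - inter)

# Standard YOLOv3 anchors, 3 per yolo layer (values in pixels).
anchors = torch.tensor([[10, 13], [16, 30], [33, 23],        # layer 0 (small)
                        [30, 61], [62, 45], [59, 119],       # layer 1 (medium)
                        [116, 90], [156, 198], [373, 326]],  # layer 2 (large)
                       dtype=torch.float32)
targets_wh = torch.tensor([[25., 40.], [150., 120.]])        # example target sizes

iou = wh_iou(targets_wh, anchors)                            # (n_targets, 9)

# Strategy tried in the multi_gpu branch: one anchor per target from all 9,
# so each target is built into exactly one yolo layer.
best_of_9 = iou.argmax(dim=1)                                # indices into 0..8

# Current strategy: best anchor within each yolo layer,
# so each target contributes a positive sample to all 3 layers.
best_per_layer = iou.view(-1, 3, 3).argmax(dim=2)            # (n_targets, 3)

print(best_of_9, best_per_layer, sep='\n')
```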
We don't have multi-GPU machines on premises, so we tested this with GCP Deep Learning VMs (https://cloud.google.com/deep-learning-vm/). We used batch_size=26 (the max that 1 P100 can handle) times the number of GPUs; all other training settings were defaults. We selected the fastest batch out of the first 30 for timing purposes (see the sketch after the machine specs below). Results are below for our branch and the #121 PR. In both cases the speedups were very poor. It's possible the IO ops were constrained by GCP due to the limited SSD size; we will try again with a larger SSD, but we wanted to get these results out here for feedback. If anyone has another repo or PR we can compare against, please let us know!
Machine type: n1-highmem-4 (4 vCPUs, 26 GB memory)
CPU platform: Intel Skylake
GPUs: 1-4 x NVIDIA Tesla P100
HDD: 500 GB SSD
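For context, below is a rough sketch of the benchmarking loop described above, assuming a plain nn.DataParallel setup with a placeholder model (the real training script is more involved); batch size scales as 26 × GPU count and the fastest of the first 30 batches is kept:

```python
import time
import torch
import torch.nn as nn

num_gpus = torch.cuda.device_count()
device = torch.device('cuda' if num_gpus else 'cpu')
batch_size = 26 * max(num_gpus, 1)       # 26 images per P100, scaled by GPU count

# Placeholder model standing in for Darknet/YOLOv3.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1)).to(device)
if num_gpus > 1:
    model = nn.DataParallel(model)

fastest = float('inf')
for _ in range(30):                      # time the first 30 batches, keep the fastest
    imgs = torch.randn(batch_size, 3, 416, 416, device=device)
    if num_gpus:
        torch.cuda.synchronize()
    t0 = time.time()
    model.zero_grad()
    loss = model(imgs).mean()            # dummy loss
    loss.backward()
    if num_gpus:
        torch.cuda.synchronize()
    fastest = min(fastest, time.time() - t0)

print(f'{num_gpus} GPU(s), batch_size={batch_size}, fastest batch: {fastest:.3f} s')
```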
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced YOLOv3 model code and testing suite.
📊 Key Changes
- Changes to test.py for clarity.
- Changes to the train.py script, including the removal of unused variables.
🎯 Purpose & Impact