Initial multigpu support #121
@alexpolichroniadis thanks a lot! I'll review this as soon as I can.
I have made the full pipeline operational now and will be opening a PR for that soon (or updating this one). Some things with the current PR are broken outside train.py, but are fixed in the new version. (Like loss reporting, test.py etc.) Will get that to you soon. /Alex
Should be okay now! Take a look. /Alex
'os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"'
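A minimal sketch of how the line above is typically used, assuming the goal is to restrict training to physical GPUs 4 through 7. The helper name `restrict_visible_gpus` is hypothetical and not part of this repository. The variable has to be set before the first CUDA call in the process, so the safest place is before `torch` is imported.

```python
import os


def restrict_visible_gpus(device_ids):
    """Hypothetical helper: expose only the given physical GPUs to this process.

    Must run before the first CUDA call (safest: before importing torch).
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


restrict_visible_gpus([4, 5, 6, 7])
# Inside this process the four visible GPUs are renumbered from zero,
# so they are addressed as cuda:0 through cuda:3.
```

Because of the renumbering, any device indices passed to the training code afterwards refer to positions in this list, not to the physical GPU numbers.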
Did you merge the multi-GPU version and the single-GPU version into one file?
@longxianlei yes I merged the two branches. I added multithreaded support in the DataLoader (#141), and now the times are much faster. The latest benchmarks are here. https://cloud.google.com/deep-learning-vm/
@glenn-jocher
@peterhsu2018 yes, master branch supports multi GPU.
@glenn-jocher Thank you so much!
@peterhsu2018 you're welcome! If you have any other questions, feel free to ask. Thank you for your understanding.
The trainer works with DataParallel. I haven't updated the test.py code yet, but it should be straightforward. I added a new dataset class that implements the standard PyTorch dataset interface, so I can feed it into a torch.utils.data.DataLoader and get the benefits of threading. Batching and shuffling are still handled by the dataset class, hence the batch size of 1 and shuffle=False in the dataloader.
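To illustrate the arrangement described above, here is a rough sketch, not the code from this PR. `PreBatchedDataset`, `make_loader`, `wrap_model` and `iterate` are hypothetical names, and random tensors stand in for real images. Each dataset item is already a whole shuffled batch, so the `DataLoader` runs with `batch_size=1` and `shuffle=False` and only contributes its workers (note that `num_workers` starts worker processes rather than threads).

```python
import random

import torch
from torch.utils.data import DataLoader, Dataset


class PreBatchedDataset(Dataset):
    """Hypothetical dataset whose items are already whole, shuffled batches."""

    def __init__(self, num_samples=64, batch_size=16, img_size=32, shuffle=True):
        # Random tensors stand in for images; a real dataset would load files.
        self.images = torch.rand(num_samples, 3, img_size, img_size)
        self.labels = torch.arange(num_samples)
        order = list(range(num_samples))
        if shuffle:
            # Shuffling lives in the dataset, not in the DataLoader.
            # This sketch shuffles once; a real class might reshuffle per epoch.
            random.shuffle(order)
        self.batches = [order[i:i + batch_size]
                        for i in range(0, num_samples, batch_size)]

    def __len__(self):
        return len(self.batches)

    def __getitem__(self, index):
        idx = torch.tensor(self.batches[index])
        return self.images[idx], self.labels[idx]


def make_loader(dataset, num_workers=0):
    # batch_size=1 and shuffle=False: the loader only adds parallel loading.
    return DataLoader(dataset, batch_size=1, shuffle=False,
                      num_workers=num_workers)


def wrap_model(model):
    # DataParallel splits each input batch along dim 0 across the visible GPUs.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    if torch.cuda.device_count() > 1:
        model = torch.nn.DataParallel(model)
    return model


def iterate(loader):
    for imgs, labels in loader:
        # The default collate adds a leading dimension of size 1; drop it.
        yield imgs.squeeze(0), labels.squeeze(0)
```

With this layout the effective batch size is whatever the dataset class produces, so it should be chosen as a multiple of the GPU count if DataParallel is expected to split it evenly.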
I'll be happy to hear your comments and thanks for your work!
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Updated the YOLOv3 detection and training process for better configuration and performance.
📊 Key Changes
coco.data configuration.
--data-cfg argument in detect.py to specify the data configuration file.
models.py to refine grid size calculations and loss computations for better performance and readability.
DataLoader in test.py and train.py for more efficient, multi-threaded data loading.
train.py and fixed loss logging for enhanced training stability.
torch_utils.py.
🎯 Purpose & Impact