Dynamic batch size support for TensorRT #8526
Conversation
@democat3457 thanks for the PR! This is very interesting. I'll try to test independently today. Two questions:
@glenn-jocher have you been able to test this?
@democat3457 I tested this PR in Colab but got an error. Could you take a look please?

```
!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export
```
The issue had to do with an integer division rounding down to 0; it should be fixed now.
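For context, a minimal illustration of the kind of failure described above (the exact expression in export.py may differ; `batch` and `opt_batch` here are hypothetical names):

```python
# Hypothetical sketch of the reported bug: with the default batch size of 1,
# integer division yields a 0-sized batch dimension for the optimization
# profile's "opt" shape, which TensorRT rejects.
batch = 1
opt_batch = batch // 2           # == 0 -> invalid profile dimension
opt_batch = max(1, batch // 2)   # the fix: clamp to at least 1
```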
@democat3457 thanks! I'll retest

@glenn-jocher have you been able to retest?

@democat3457 thanks for the reminder, testing now!
@democat3457 PR fails on batch-size 2 inference. To reproduce:

```
!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export
```

```python
# PyTorch Hub
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
```
@glenn-jocher this is because you exported with a (default) max batch size of 1, but tried to use a batch size of 2 when inferencing. TensorRT requires a maximum batch size to properly do dynamic batches, so the `--batch-size` argument at export time sets the maximum batch size the dynamic engine will accept.

A warning is now displayed if `--dynamic` is used with the default batch size of 1.
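Per the commit "Warn users if they use batch-size=1 with dynamic" in the merge log below, the guard presumably looks something like this sketch (`dynamic`, `batch_size`, `LOGGER`, and `prefix` are assumed names, not copied from export.py):

```python
# Hedged sketch of the export-time guard: a dynamic engine's maximum batch
# size comes from the --batch-size argument, so exporting with the default
# of 1 makes the dynamic axis useless.
if dynamic and batch_size == 1:
    LOGGER.warning(f'{prefix} WARNING: --dynamic model requires a maximum --batch-size '
                   f'argument; the exported engine will be limited to batch size 1')
```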
Ah got it! I'll try again following your updates above.
@democat3457 I retested with --batch-size 16 during export and two images batched during inference, but I get a new error now in Colab:

```
!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic --batch-size 16  # export
```

```python
# PyTorch Hub
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
```

Error on PyTorch Hub inference:
@glenn-jocher the issue is with the loaded repo: `torch.hub.load('ultralytics/yolov5', ...)` pulls the upstream repo, which doesn't include this PR's changes. I fixed that line to load from this PR's branch, `democat3457/yolov5:patch-1`.

After reloading the runtime and re-running the script, I get this:
@democat3457 oh of course, a beginner mistake on my part. Thanks for reviewing.
This works now:

```
!git clone https://github.com/democat3457/yolov5 -b patch-1  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # install TRT
!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic  # export
```

```python
# PyTorch Hub
import torch

# Model
model = torch.hub.load('democat3457/yolov5:patch-1', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()
```
@democat3457 PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐
* Dynamic batch size support for TensorRT
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Update export.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Fix optimization profile when batch size is 1
* Warn users if they use batch-size=1 with dynamic
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* More descriptive assertion error
* Fix syntax
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* pre-commit formatting sucked
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Update export.py

Co-authored-by: Colin Wong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <[email protected]>
@democat3457
To run inference on normal (static) TRT models, I pad the rest of the batch with np.zeros to fill it up to the fixed batch size (dynamic models do not need this).
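For readers following along, here is a minimal sketch of that padding approach, assuming a static engine built with a fixed batch size of 16 (`ENGINE_BATCH` and `pad_batch` are illustrative names, not YOLOv5 API):

```python
import numpy as np

ENGINE_BATCH = 16  # fixed batch size the static engine was built with

def pad_batch(images: np.ndarray) -> np.ndarray:
    """Zero-pad a partial (n, 3, H, W) batch up to the engine's fixed batch size."""
    n = images.shape[0]
    if n >= ENGINE_BATCH:
        return images[:ENGINE_BATCH]
    pad = np.zeros((ENGINE_BATCH - n, *images.shape[1:]), dtype=images.dtype)
    return np.concatenate([images, pad], axis=0)

# e.g. 7 real images -> batch of 16; outputs for the 9 zero images are discarded
batch = pad_batch(np.random.rand(7, 3, 640, 640).astype(np.float32))
assert batch.shape[0] == ENGINE_BATCH
```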
@youngjae-avikus sure
@democat3457 Then what can be said about the reason for conducting the experimental procedure above, and what do the results suggest?
A correction to your point a: the normal model was run with 16 (9+7) total images, not 32 images (though all of those batches were padded with zeros). This experimental procedure shows that dynamic models do seem to let process time scale proportionally with batch size without sacrificing any base performance.
@democat3457
Not regardless of dynamic or normal model, no. Process time is proportional to batch size only when using a dynamic model, not a normal (static) one.
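One rough way to check this proportionality claim is to time inference at several batch sizes and compare per-image cost. This is a sketch only: it assumes the Colab setup from earlier in this thread (repo cloned, working directory `yolov5`, engine exported with `--dynamic --batch-size 16`), and absolute numbers will vary by GPU:

```python
import time

import torch

# Load the exported TensorRT engine via PyTorch Hub, as in the examples above
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')
img = 'data/images/zidane.jpg'  # sample image bundled with the yolov5 repo

for bs in (1, 2, 4, 8, 16):
    imgs = [img] * bs        # duplicate one image to form a batch
    model(imgs)              # warm-up run, excluded from timing
    t0 = time.perf_counter()
    model(imgs)
    dt = time.perf_counter() - t0
    print(f'batch={bs:2d}  total={dt * 1e3:7.1f} ms  per-image={dt / bs * 1e3:6.1f} ms')
```

With a dynamic engine, total time should grow roughly in proportion to the batch size (flat per-image cost); with a static engine padded to its fixed batch, total time stays roughly constant no matter how many real images the batch contains.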
@democat3457 if you would like to further discuss testing methodologies or performance characteristics, please feel free to do so.
Are there any additional benchmarks in particular that you want me to discuss? (I was under the impression everything in this thread was fairly sufficient.)
@democat3457 it looks like all the necessary benchmarks have been covered in this thread. If you have any other questions or need further assistance with anything else, please feel free to ask!
TensorRT supports dynamic input shapes by setting an optimization profile and resizing the binding input size on the fly. This PR exposes that option when exporting to TensorRT and resizes the binding input size during detection.

Tested by exporting both dynamic and non-dynamic TensorRT models and running them through DetectMultiBackend.
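For readers unfamiliar with the TensorRT mechanism, here is a minimal sketch of the two halves. This is illustrative, not the exact export.py/DetectMultiBackend code; the input name `'images'` and 640x640 shape follow YOLOv5 conventions, and the TensorRT 8.x Python API is assumed:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
# ... populate `network`, e.g. by parsing the exported ONNX model ...

# Export side: declare min/opt/max input shapes in an optimization profile
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
max_batch = 16  # taken from --batch-size at export time
profile.set_shape('images',                               # YOLOv5 input binding
                  (1, 3, 640, 640),                       # min shape
                  (max(1, max_batch // 2), 3, 640, 640),  # opt shape
                  (max_batch, 3, 640, 640))               # max shape
config.add_optimization_profile(profile)

# Inference side: resize the binding to the actual batch before execution,
# e.g. inside DetectMultiBackend:
#   context.set_binding_shape(0, (batch, 3, 640, 640))
```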
🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary
Improved TensorRT model export support with dynamic axes and better input handling.

📊 Key Changes
- Updated the `export_engine` function to accept a `dynamic` flag.
- `export_engine` now uses an internal prefix variable.

🎯 Purpose & Impact