Remaining BBox kernel perf optimizations #6896

datumbox · 2022-11-03T11:22:05Z

Some of the optimizations highlighted in #6872.

datumbox · 2022-11-03T11:22:48Z

torchvision/prototype/transforms/functional/_geometry.py

+    w_ratio = new_width / old_width
+    h_ratio = new_height / old_height
+    ratios = torch.tensor([w_ratio, h_ratio, w_ratio, h_ratio], device=bounding_box.device)
    return (
-        bounding_box.reshape(-1, 2, 2).mul(ratios).to(bounding_box.dtype).reshape(bounding_box.shape),
+        bounding_box.mul(ratios).to(bounding_box.dtype),
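To see why dropping the reshape is safe, here is a minimal sketch comparing the two formulations on a small input. The function names `resize_boxes_old` and `resize_boxes_new` and the `(height, width)` size tuples are illustrative assumptions, not the torchvision API. The two-element `ratios` tensor in the old version is also an assumption, since the diff above does not show how it was built.

```python
import torch


def resize_boxes_old(boxes, old_size, new_size):
    # Assumed pre-change form: view each box as two (x, y) points and
    # broadcast a two-element (w_ratio, h_ratio) tensor over them.
    old_height, old_width = old_size
    new_height, new_width = new_size
    ratios = torch.tensor(
        [new_width / old_width, new_height / old_height], device=boxes.device
    )
    return boxes.reshape(-1, 2, 2).mul(ratios).to(boxes.dtype).reshape(boxes.shape)


def resize_boxes_new(boxes, old_size, new_size):
    # Form from the diff: a four-element ratio tensor matches the
    # (x1, y1, x2, y2) layout directly, so no reshape is needed.
    old_height, old_width = old_size
    new_height, new_width = new_size
    w_ratio = new_width / old_width
    h_ratio = new_height / old_height
    ratios = torch.tensor([w_ratio, h_ratio, w_ratio, h_ratio], device=boxes.device)
    return boxes.mul(ratios).to(boxes.dtype)


boxes = torch.tensor([[10.0, 20.0, 30.0, 40.0], [0.0, 0.0, 50.0, 100.0]])
old_out = resize_boxes_old(boxes, old_size=(100, 200), new_size=(50, 400))
new_out = resize_boxes_new(boxes, old_size=(100, 200), new_size=(50, 400))
print(torch.equal(old_out, new_out))
```

Both versions do the same element-wise multiplication; the new one only avoids the two `reshape` calls, which is where the saving comes from.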


Improvement:

[------------ resize cpu torch.float32 ------------]
                |      old       |      new
1 threads: -----------------------------------------
      (128, 4)  |  13 (+- 0) us  |   8 (+- 0) us
6 threads: -----------------------------------------
      (128, 4)  |  13 (+- 0) us  |   8 (+- 0) us

Times are in microseconds (us).

[----------- resize cuda torch.float32 ------------]
                |      old       |      new
1 threads: -----------------------------------------
      (128, 4)  |  37 (+- 0) us  |  31 (+- 0) us
6 threads: -----------------------------------------
      (128, 4)  |  37 (+- 0) us  |  31 (+- 0) us

Times are in microseconds (us).

[------------- resize cpu torch.uint8 -------------]
                |      old       |      new
1 threads: -----------------------------------------
      (128, 4)  |  19 (+- 0) us  |  13 (+- 0) us
6 threads: -----------------------------------------
      (128, 4)  |  19 (+- 0) us  |  13 (+- 0) us

Times are in microseconds (us).

[------------ resize cuda torch.uint8 -------------]
                |      old       |      new
1 threads: -----------------------------------------
      (128, 4)  |  45 (+- 0) us  |  39 (+- 0) us
6 threads: -----------------------------------------
      (128, 4)  |  45 (+- 0) us  |  39 (+- 1) us

Times are in microseconds (us).
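The layout of these tables resembles the output of `torch.utils.benchmark.Compare`, although the thread does not say how the numbers were produced. As a rough sketch of how a similar CPU comparison could be run, assuming that tool and using hypothetical stand-in functions `resize_old` and `resize_new` rather than the real torchvision kernels:

```python
import torch
from torch.utils import benchmark


def resize_old(boxes, w_ratio, h_ratio):
    # Stand-in for the pre-change kernel (reshape, scale, reshape back).
    ratios = torch.tensor([w_ratio, h_ratio], device=boxes.device)
    return boxes.reshape(-1, 2, 2).mul(ratios).to(boxes.dtype).reshape(boxes.shape)


def resize_new(boxes, w_ratio, h_ratio):
    # Stand-in for the post-change kernel (single broadcasted multiply).
    ratios = torch.tensor([w_ratio, h_ratio, w_ratio, h_ratio], device=boxes.device)
    return boxes.mul(ratios).to(boxes.dtype)


def benchmark_resize(min_run_time=0.05):
    # Same input shape as in the tables: 128 boxes in (x1, y1, x2, y2) format.
    boxes = torch.rand(128, 4) * 100
    results = []
    for name, fn in [("old", resize_old), ("new", resize_new)]:
        timer = benchmark.Timer(
            stmt="fn(boxes, 2.0, 0.5)",
            globals={"fn": fn, "boxes": boxes},
            label="resize cpu torch.float32",
            sub_label=str(tuple(boxes.shape)),
            description=name,
            num_threads=1,
        )
        results.append(timer.blocked_autorange(min_run_time=min_run_time))
    return results


results = benchmark_resize()
benchmark.Compare(results).print()
```

Absolute timings depend on hardware and PyTorch version, so only the relative difference between the two columns is meaningful.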

torchvision/prototype/transforms/functional/_geometry.py

vfdev-5 · 2022-11-03T11:39:01Z

Maybe we can merge this after #6879.

vfdev-5

Nice optim for resize, thanks @datumbox

datumbox · 2022-11-03T11:43:46Z

@vfdev-5 I just pushed a couple of untested opts. Could you check again which you think are safe? I'll do benchmarks after we confirm which ones we want in.

torchvision/prototype/transforms/functional/_geometry.py

torchvision/transforms/functional_tensor.py

vfdev-5 · 2022-11-03T11:45:43Z

I'll cherry-pick the ones that make sense for elastic. Thanks for the pointers!

datumbox · 2022-11-03T12:35:26Z

torchvision/prototype/transforms/functional/_geometry.py

@@ -388,8 +389,7 @@ def _affine_bounding_box_xyxy(
        new_points = torch.matmul(points, transposed_affine_matrix)
        tr, _ = torch.min(new_points, dim=0, keepdim=True)
        # Translate bounding boxes
-        out_bboxes[:, 0::2] = out_bboxes[:, 0::2] - tr[:, 0]
-        out_bboxes[:, 1::2] = out_bboxes[:, 1::2] - tr[:, 1]
+        out_bboxes.sub_(tr.repeat((1, 2)))
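The single in-place subtraction works because `tr` has shape `(1, 2)` (from `torch.min(..., dim=0, keepdim=True)`), so `tr.repeat((1, 2))` yields `[tx, ty, tx, ty]`, which broadcasts over every `(x1, y1, x2, y2)` row. A minimal sketch checking that the two forms agree, with hypothetical helper names `translate_old` and `translate_new` and made-up input values:

```python
import torch


def translate_old(out_bboxes, tr):
    # Two strided assignments: x columns (0, 2) and y columns (1, 3).
    out = out_bboxes.clone()
    out[:, 0::2] = out[:, 0::2] - tr[:, 0]
    out[:, 1::2] = out[:, 1::2] - tr[:, 1]
    return out


def translate_new(out_bboxes, tr):
    # One in-place subtraction with the offset tiled to (1, 4).
    out = out_bboxes.clone()
    out.sub_(tr.repeat((1, 2)))
    return out


# Illustrative transformed corner points; tr is their per-axis minimum.
new_points = torch.tensor([[-5.0, 2.0], [3.0, -7.0], [1.0, 1.0]])
tr, _ = torch.min(new_points, dim=0, keepdim=True)

bboxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [2.0, 3.0, 4.0, 5.0]])
old_out = translate_old(bboxes, tr)
new_out = translate_new(bboxes, tr)
print(torch.equal(old_out, new_out))
```

The helpers clone their input only so the comparison leaves `bboxes` untouched; the change in the diff mutates `out_bboxes` directly.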


Improvement for both changes:

[-------------------- bbox_rotate cpu -------------------]
                     |      False       |       True
1 threads: ----------------------------------------------
      torch.float32  |  265 (+- 40) us  |  225 (+-  2) us
      torch.float64  |  261 (+-  1) us  |  241 (+-  1) us
      torch.int32    |  258 (+-  1) us  |  239 (+-  2) us
      torch.int64    |  260 (+-  1) us  |  239 (+-  1) us
6 threads: ----------------------------------------------
      torch.float32  |  466 (+- 10) us  |  405 (+- 20) us
      torch.float64  |  483 (+- 10) us  |  422 (+- 55) us
      torch.int32    |  479 (+- 10) us  |  420 (+- 10) us
      torch.int64    |  482 (+- 18) us  |  422 (+- 10) us

Times are in microseconds (us).

[-------------------- bbox_rotate cpu -------------------]
                     |      False       |       True
1 threads: ----------------------------------------------
      torch.float32  |  498 (+- 46) us  |  432 (+-  0) us
      torch.float64  |  489 (+-  1) us  |  446 (+-  0) us
      torch.int32    |  503 (+-  0) us  |  459 (+-  3) us
      torch.int64    |  504 (+-  3) us  |  458 (+-  0) us
6 threads: ----------------------------------------------
      torch.float32  |  573 (+-  2) us  |  530 (+-  0) us
      torch.float64  |  600 (+- 20) us  |  554 (+- 20) us
      torch.int32    |  609 (+- 20) us  |  560 (+- 10) us
      torch.int64    |  598 (+- 58) us  |  563 (+- 10) us

Times are in microseconds (us).

vfdev-5

LGTM, thanks @datumbox

Summary:
* Bbox resize optimization
* Other (untested) optimizations on `_affine_bounding_box_xyxy` and `elastic_bounding_box`.
* fix conflict
* Reverting changes on elastic
* revert one more change
* Further improvement

Reviewed By: datumbox

Differential Revision: D41020550

fbshipit-source-id: dfd1f2d91490b45176f1976bcec1fc99248f8587

Bbox resize optimization

6ecb927

facebook-github-bot added the cla signed label Nov 3, 2022

datumbox commented Nov 3, 2022

pmeier reviewed Nov 3, 2022

torchvision/prototype/transforms/functional/_geometry.py (outdated)

vfdev-5 reviewed Nov 3, 2022

torchvision/prototype/transforms/functional/_geometry.py (outdated)

vfdev-5 approved these changes Nov 3, 2022

Other (untested) optimizations on `_affine_bounding_box_xyxy` and `elastic_bounding_box`.

cef0a23

datumbox requested a review from vfdev-5 November 3, 2022 11:43

datumbox commented Nov 3, 2022

torchvision/prototype/transforms/functional/_geometry.py (outdated)

torchvision/transforms/functional_tensor.py (outdated)

datumbox and others added 5 commits November 3, 2022 11:46

fix conflict

8a657a9

Merge branch 'main' into prototype/bbox_speedups

2ed8031

Reverting changes on elastic

27605e2

revert one more change

a2b9681

Further improvement

fd7f0d5

datumbox commented Nov 3, 2022

datumbox changed the title ~~[WIP] Remaining BBox kernel perf optimizations~~ Remaining BBox kernel perf optimizations Nov 3, 2022

datumbox added the `module: transforms`, `Perf` (for performance improvements), and `prototype` labels Nov 3, 2022

vfdev-5 approved these changes Nov 3, 2022

Merge branch 'main' into prototype/bbox_speedups

b270b12

datumbox merged commit f1b840d into pytorch:main Nov 3, 2022

datumbox deleted the prototype/bbox_speedups branch November 3, 2022 13:07
