[proto] Optimized functional pad op for bboxes + tests #6890

Merged
merged 16 commits into from
Nov 3, 2022

@vfdev-5 (Collaborator) commented on Nov 2, 2022:

Pad:

[------- pad_bounding_box cpu BoundingBoxFormat.XYXY --------]
            |  pad_bounding_box_old v2  |  pad_bounding_box v2
1 threads: ---------------------------------------------------
      (4,)  |             80            |           10        
6 threads: ---------------------------------------------------
      (4,)  |             84            |           10        

Times are in microseconds (us).

[------- pad_bounding_box cpu BoundingBoxFormat.XYWH --------]
            |  pad_bounding_box_old v2  |  pad_bounding_box v2
1 threads: ---------------------------------------------------
      (4,)  |             50            |           10        
6 threads: ---------------------------------------------------
      (4,)  |             50            |           10        

Times are in microseconds (us).

[------ pad_bounding_box cpu BoundingBoxFormat.CXCYWH -------]
            |  pad_bounding_box_old v2  |  pad_bounding_box v2
1 threads: ---------------------------------------------------
      (4,)  |             50            |           10        
6 threads: ---------------------------------------------------
      (4,)  |             50            |           10        

Times are in microseconds (us).

[------- pad_bounding_box cuda BoundingBoxFormat.XYXY -------]
            |  pad_bounding_box_old v2  |  pad_bounding_box v2
1 threads: ---------------------------------------------------
      (4,)  |            150            |           48        
6 threads: ---------------------------------------------------
      (4,)  |            147            |           48        

Times are in microseconds (us).

[------- pad_bounding_box cuda BoundingBoxFormat.XYWH -------]
            |  pad_bounding_box_old v2  |  pad_bounding_box v2
1 threads: ---------------------------------------------------
      (4,)  |             91            |           47        
6 threads: ---------------------------------------------------
      (4,)  |             91            |           47        

Times are in microseconds (us).

[------ pad_bounding_box cuda BoundingBoxFormat.CXCYWH ------]
            |  pad_bounding_box_old v2  |  pad_bounding_box v2
1 threads: ---------------------------------------------------
      (4,)  |            91.3           |           48        
6 threads: ---------------------------------------------------
      (4,)  |            92.5           |           47        

Times are in microseconds (us).
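Constant padding only translates boxes by `(left, top)`, so it can be applied in each of the three formats with one broadcast addition and no format conversion. A minimal sketch, assuming a `(left, top, right, bottom)` padding order; `pad_boxes_sketch` and its string format names are hypothetical and are not the code merged in this PR:

```python
import torch


def pad_boxes_sketch(boxes, box_format, spatial_size, padding):
    """Hypothetical helper: shift boxes for a constant pad without format conversion.

    Assumes ``padding`` is (left, top, right, bottom) and ``spatial_size`` is (height, width).
    """
    left, top, right, bottom = padding
    if box_format == "XYXY":
        # Both corners move by (left, top).
        offset = [left, top, left, top]
    elif box_format in ("XYWH", "CXCYWH"):
        # Only the anchor point (top-left corner or center) moves;
        # width and height are unchanged by padding.
        offset = [left, top, 0, 0]
    else:
        raise ValueError(f"Unsupported box format: {box_format}")
    # Create the offset with the boxes' dtype and device so a single
    # broadcast addition works for CPU and CUDA inputs alike.
    offset = torch.tensor(offset, dtype=boxes.dtype, device=boxes.device)
    height, width = spatial_size
    new_size = (height + top + bottom, width + left + right)
    return boxes + offset, new_size


boxes = torch.tensor([[10.0, 20.0, 30.0, 60.0]])  # one XYXY box
padded, size = pad_boxes_sketch(boxes, "XYXY", (100, 200), (5, 7, 3, 2))
print(padded, size)
```

The same box expressed in CXCYWH, `[20, 40, 20, 40]`, becomes `[25, 47, 20, 40]` under this padding, which is exactly the CXCYWH form of the padded XYXY box; the zero offset on the width and height columns is why one offset covers both XYWH and CXCYWH.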

Crop (replaced a clone followed by in-place updates with a single op):

[-------- crop_bounding_box cpu BoundingBoxFormat.XYXY --------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |             50             |           10         
6 threads: -----------------------------------------------------
      (4,)  |             50             |           10         

Times are in microseconds (us).

[-------- crop_bounding_box cpu BoundingBoxFormat.XYWH --------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |             97             |           10         
6 threads: -----------------------------------------------------
      (4,)  |            100             |           10         

Times are in microseconds (us).

[------- crop_bounding_box cpu BoundingBoxFormat.CXCYWH -------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |            100             |           10         
6 threads: -----------------------------------------------------
      (4,)  |            100             |           10         

Times are in microseconds (us).

[------- crop_bounding_box cuda BoundingBoxFormat.XYXY --------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |            117             |           47         
6 threads: -----------------------------------------------------
      (4,)  |            118             |           48         

Times are in microseconds (us).

[------- crop_bounding_box cuda BoundingBoxFormat.XYWH --------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |            195             |           47         
6 threads: -----------------------------------------------------
      (4,)  |            196             |           47         

Times are in microseconds (us).

[------ crop_bounding_box cuda BoundingBoxFormat.CXCYWH -------]
            |  crop_bounding_box_old v2  |  crop_bounding_box v2
1 threads: -----------------------------------------------------
      (4,)  |            290             |           48         
6 threads: -----------------------------------------------------
      (4,)  |            292             |           47         

Times are in microseconds (us).
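The crop note contrasts two ways of shifting boxes by the crop origin. The sketch below assumes the older path copied the tensor and then updated coordinate slices in place (the PR only says "clone+inplace"); it shows both variants and checks that they agree. Both function names are hypothetical, and neither is the torchvision implementation:

```python
import torch


def crop_boxes_clone_inplace(boxes, box_format, top, left):
    # Hypothetical "old-style" variant: copy first, then update slices in place.
    out = boxes.clone()
    if box_format == "XYXY":
        out[..., 0::2] -= left  # x1 and x2
        out[..., 1::2] -= top   # y1 and y2
    else:  # XYWH / CXCYWH: only the anchor point moves
        out[..., 0] -= left
        out[..., 1] -= top
    return out


def crop_boxes_single_op(boxes, box_format, top, left):
    # Hypothetical "new-style" variant: one out-of-place subtraction
    # of a broadcast offset built on the boxes' dtype and device.
    sub = [left, top, left, top] if box_format == "XYXY" else [left, top, 0, 0]
    return boxes - torch.tensor(sub, dtype=boxes.dtype, device=boxes.device)


boxes = torch.tensor([[10.0, 20.0, 30.0, 60.0], [0.0, 5.0, 8.0, 9.0]])
for fmt in ("XYXY", "XYWH", "CXCYWH"):
    a = crop_boxes_clone_inplace(boxes, fmt, top=4, left=3)
    b = crop_boxes_single_op(boxes, fmt, top=4, left=3)
    print(fmt, torch.equal(a, b))
```

The single-op variant issues one subtraction (plus creating the small offset tensor), while the clone-based one issues a copy and then one in-place update per slice. For tiny inputs such as the `(4,)`-shaped boxes benchmarked here, per-op overhead plausibly dominates, so fewer ops is the main lever.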

cc @datumbox @bjuncek @pmeier
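The tables above use the output layout of `torch.utils.benchmark.Compare`. The PR does not include its benchmark script, so the following is only a sketch of how a comparison of this shape can be produced on CPU. The two stand-in kernels are hypothetical and are not the torchvision implementations, and the timings it prints will differ from the ones reported here:

```python
import torch
from torch.utils import benchmark


def pad_clone_inplace(boxes, left, top):
    # Hypothetical stand-in for an "old" kernel: copy, then in-place slice updates.
    out = boxes.clone()
    out[..., 0::2] += left
    out[..., 1::2] += top
    return out


def pad_offset_add(boxes, offset):
    # Hypothetical stand-in for a "new" kernel: one broadcast addition.
    return boxes + offset


boxes = torch.rand(4)  # shape (4,): a single XYXY box, as in the tables
offset = torch.tensor([5.0, 7.0, 5.0, 7.0])
env = {
    "pad_clone_inplace": pad_clone_inplace,
    "pad_offset_add": pad_offset_add,
    "boxes": boxes,
    "offset": offset,
}

results = []
for num_threads in (1, 6):  # mirrors the "1 threads" / "6 threads" rows
    for description, stmt in (
        ("clone + in-place", "pad_clone_inplace(boxes, 5.0, 7.0)"),
        ("single op", "pad_offset_add(boxes, offset)"),
    ):
        timer = benchmark.Timer(
            stmt=stmt,
            globals=env,
            num_threads=num_threads,
            label="pad sketch cpu XYXY",
            sub_label=str(tuple(boxes.shape)),
            description=description,
        )
        results.append(timer.blocked_autorange(min_run_time=0.2))

benchmark.Compare(results).print()
```

`Compare` groups measurements by `label`, uses `sub_label` for rows, `description` for columns and `num_threads` for the per-thread sections, which is how the tables above are organized.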

@datumbox (Contributor) left a comment:

LGTM, thanks!

@vfdev-5 added the labels "module: transforms", "Perf" (for performance improvements) and "prototype" on Nov 2, 2022
@vfdev-5 merged commit 79ca506 into pytorch:main on Nov 3, 2022
@vfdev-5 deleted the proto-speedup-pad-bboxes branch on November 3, 2022, 10:58
facebook-github-bot pushed a commit that referenced this pull request on Nov 4, 2022
Summary:
* [proto] Speed-up crop on bboxes and tests

* Fix linter

* Update _geometry.py

* Fixed device issue

* Revert changes in test/prototype_transforms_kernel_infos.py

* Fixed failing correctness tests

* [proto] Optimized functional pad op for bboxes + tests

* Renamed copy-pasted variable name

* Code update

* Fixes according to the review

Reviewed By: datumbox

Differential Revision: D41020556

fbshipit-source-id: c000d993c671f04160aada9b0e04a32742bdd9cf