[proto] Small optim for perspective op on images #6907

Merged (7 commits) on Nov 4, 2022

@vfdev-5 vfdev-5 commented Nov 4, 2022

  • Small optimization for the perspective op on images.
  • Reverted the concat + single matmul trick on bboxes, as it gives no speed-up there and causes a slowdown on images.
[----------------------------------- Perspective_image_tensor cpu ----------------------------------]
                                   |  perspective_image_tensor_old v2  |  perspective_image_tensor v2
1 threads: ------------------------------------------------------------------------------------------
      (3, 400, 500) torch.uint8    |                4.8                |              4.64           
      (3, 400, 500) torch.float32  |                4.2                |              4.14           
6 threads: ------------------------------------------------------------------------------------------
      (3, 400, 500) torch.uint8    |                3.7                |              3.67           
      (3, 400, 500) torch.float32  |                3.6                |              3.52           

Times are in milliseconds (ms).

[---------------------------------- Perspective_image_tensor cuda ----------------------------------]
                                   |  perspective_image_tensor_old v2  |  perspective_image_tensor v2
1 threads: ------------------------------------------------------------------------------------------
      (3, 400, 500) torch.uint8    |                633                |              585            
      (3, 400, 500) torch.float32  |                549                |              543            
6 threads: ------------------------------------------------------------------------------------------
      (3, 400, 500) torch.uint8    |                594                |              586            
      (3, 400, 500) torch.float32  |                551                |              545            

Times are in microseconds (us).

cc @datumbox @bjuncek @pmeier

Contributor

@datumbox datumbox left a comment

Thanks @vfdev-5, just 2 questions. Looks good overall.

base_grid[..., 1].copy_(y_grid)
base_grid[..., 2].fill_(1)

rescaled_theta1 = theta1.transpose(1, 2) / torch.tensor([0.5 * ow, 0.5 * oh], dtype=dtype, device=device)

Contributor

Can we do in-place division?
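
For context, a minimal sketch of what in-place division could look like for the line above. The coefficient values and the sizes `ow`, `oh` are made up for illustration; only `transpose`, `/` and `div_` mirror the snippet under review. Because `transpose` returns a view, `div_` writes into the storage of the tensor it was taken from, so this is only safe when that tensor is not needed afterwards.

```python
import torch

ow, oh = 500, 400  # hypothetical output width and height

# Made-up coefficients with shape (1, 2, 3); not the values used in the PR.
theta1 = torch.tensor([[[1.0, 0.1, 5.0], [0.2, 1.0, -3.0]]])
scale = torch.tensor([0.5 * ow, 0.5 * oh])

# Out-of-place: allocates a new tensor for the result.
expected = theta1.transpose(1, 2) / scale

# In-place: div_ writes through the transposed view into the underlying storage.
# A clone is used here only so that theta1 stays intact for the comparison.
work = theta1.clone()
rescaled = work.transpose(1, 2).div_(scale)

print(torch.allclose(rescaled, expected))
```

The in-place variant avoids one temporary allocation. Whether that is measurable for a tensor this small would have to be checked with a benchmark.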

Comment on lines +1053 to +1054
numer_points = torch.matmul(points, theta1.T)
denom_points = torch.matmul(points, theta2.T)

Contributor

Just to understand, previously when we measured this specific change for bboxes it didn't improve the speed?

Collaborator Author

I measured all changes together: concat+single matmul + inplace + aminmax
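
For readers unfamiliar with the last item: `torch.aminmax` returns the minimum and the maximum along a dimension in a single call, instead of two separate reductions. A minimal sketch on made-up data, assuming a layout of four (x, y) corner points per box; that layout is an assumption for illustration and is not taken from this PR.

```python
import torch

# One hypothetical box given as four transformed corner points, shape (N, 4, 2).
corners = torch.tensor([[[10.0, 20.0], [50.0, 15.0], [55.0, 60.0], [12.0, 70.0]]])

# Single call that yields both reductions over the corner dimension.
mins, maxs = torch.aminmax(corners, dim=1)

# Axis-aligned box enclosing the corners, as (x_min, y_min, x_max, y_max).
boxes = torch.cat([mins, maxs], dim=1)
print(boxes)
```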

Collaborator

IIUC, it improved the speed for bounding boxes, but the PR didn't measure the impact on images. Since bounding box tensors are usually a lot smaller than images, the perf regression for images was not noticed there.

Collaborator Author

There was no perf regression at all, since the image path was still using the previous implementation. For bboxes of shape (1000, 4), the "concat+single matmul" trick on its own does not bring any speed-up on CPU. While working on images (this PR), I saw that the same trick even brings a slowdown.
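
To make the two variants being compared concrete, here is a minimal sketch of one plausible reading of the "concat+single matmul" trick next to the two separate matmuls quoted earlier in this thread. The tensors are random placeholders and the variable names besides `points`, `theta1` and `theta2` are hypothetical.

```python
import torch

torch.manual_seed(0)
points = torch.rand(1000, 3)  # placeholder homogeneous points (x, y, 1)
theta1 = torch.rand(2, 3)     # placeholder numerator coefficients
theta2 = torch.rand(2, 3)     # placeholder denominator coefficients

# Variant A: two separate matmuls, as in the quoted diff lines.
numer_a = torch.matmul(points, theta1.T)
denom_a = torch.matmul(points, theta2.T)

# Variant B: stack both coefficient matrices, do one matmul, then split.
theta12 = torch.cat([theta1, theta2], dim=0)   # shape (4, 3)
both = torch.matmul(points, theta12.T)         # shape (1000, 4)
numer_b, denom_b = both[:, :2], both[:, 2:]

print(torch.allclose(numer_a, numer_b, atol=1e-6))
print(torch.allclose(denom_a, denom_b, atol=1e-6))
```

Both variants compute the same values, so the choice is purely about speed. As reported in this thread, the single-matmul form gave no gain on CPU for boxes of shape (1000, 4) and was slower for images.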

Contributor

OK, if you measured it now and you know it doesn't slow us down, it's fine by me.

Contributor

@datumbox datumbox left a comment

LGTM, feel free to merge after testing the in-place division.

@vfdev-5 vfdev-5 merged commit 3d10c8a into pytorch:main Nov 4, 2022
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
Summary:
* [proto] small optim for perspective op on images, reverted concat trick on bboxes

* revert unrelated changes

* PR review updates

* PR review change

Reviewed By: NicolasHug

Differential Revision: D41265184

fbshipit-source-id: 12073a164180b2ed392dd455106f6411bab9a317