
Remove addressed workaround in ResizeV2 #7606

Merged — 3 commits merged into pytorch:main on May 22, 2023

Conversation

NicolasHug (Member) commented May 19, 2023:

This PR removes a workaround that is no longer needed: the original problem was fixed in torch core in pytorch/pytorch#101136.

I can confirm that the same stress-tests from #7557 (review) still pass properly (and that those tests were hitting the workaround code).

cc @vfdev-5

@NicolasHug NicolasHug requested a review from vfdev-5 May 19, 2023 13:11
pytorch-bot bot commented May 19, 2023:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7606

Note: Links to docs will display an error until the docs builds have been completed.

❌ 26 New Failures as of commit 017cdfe

This comment was automatically generated by Dr. CI and updates every 15 minutes.

  -elif interpolation == InterpolationMode.BILINEAR and image.device.type == "cpu":
  +elif (
  +    interpolation == InterpolationMode.BILINEAR
  +    and image.is_cpu
NicolasHug (Member, author) commented:

Just a nit from me; I thought it was simpler than image.device.type == "cpu", but I can put it back.

pmeier (Collaborator) commented:

I can attest that this actually works, but

  1. I've never seen it used anywhere.
  2. It is undocumented. That looks like a bug though, since is_cuda and is_meta are documented.

Thus, I would prefer to leave it as is, but no strong opinion. How did you learn about it?

NicolasHug (Member, author) commented:

  torchvision/transforms/v2/functional/_geometry.py:195: error: "Tensor" has no
  attribute "is_cpu"  [attr-defined]
                  and image.is_cpu
                      ^~~~~~~~~~~~
  Found 1 error in 1 file (checked 236 source files)
  Error: Process completed with exit code 1.

bwhahahaha.
Anyway.

> How did you learn about it?

I was reminded that is_cuda exists, so I guessed that is_cpu should exist as well. It's used in the torch core code-base, but sparsely. I'll revert anyway to avoid fighting mypy.
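As a side note, the two spellings are equivalent at runtime; a minimal sketch in plain torch (nothing torchvision-specific):

```python
import torch

# Minimal sketch: Tensor.is_cpu and device.type == "cpu" agree.
# (is_cpu exists at runtime but, at the time of this PR, was missing
# from the type stubs, which is why mypy rejected it.)
t = torch.empty(3)

print(t.device.type == "cpu")  # True for a default CPU tensor
print(t.is_cpu)                # same answer, shorter spelling
```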

NicolasHug (Member, author) commented:

I wonder whether this is actually what we want.

We're going from this on main:

transform               median
--------------------  --------
PILToTensor                292
RandomResizedCrop          432
RandomHorizontalFlip        82
ConvertDtype                84
Normalize                  595
--------------------  --------
Total                     1472

To this on this PR:

transform               median
--------------------  --------
PILToTensor                270
RandomResizedCrop          534
RandomHorizontalFlip        67
ConvertDtype                77
Normalize                  137
--------------------  --------
Total                     1090

RandomResizedCrop being slightly slower and Normalize being significantly faster shows that the output is allocated as CF now.

I find that a bit surprising because, from offline discussions with @vfdev-5, I thought the output would be preserved as CL. Isn't that what test_memory_format_consistency_resize_image_tensor() is supposed to enforce as well?
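For reference, the memory-format behaviour under discussion can be probed directly with plain torch. This is a sketch, not the torchvision code path, and the sizes are made up; for a genuinely batched (N > 1) channels_last input the format is preserved, while the ambiguous N == 1 "fake batch" case is what the removed workaround was papering over:

```python
import torch
import torch.nn.functional as F

# Sketch: probe whether interpolate() keeps the input's memory format.
x = torch.rand(2, 3, 32, 32).to(memory_format=torch.channels_last)
y = F.interpolate(x, size=(16, 16), mode="bilinear", antialias=True)

# For N > 1 the channels_last layout survives the resize.
print(y.is_contiguous(memory_format=torch.channels_last))
```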

pmeier (Collaborator) approved these changes:

Thanks Nicolas!


  # uint8 dtype support for bilinear mode is limited to cpu and
  # according to our benchmarks non-AVX CPUs should prefer u8->f32->interpolate->u8 path
  if "AVX2" in torch.backends.cpu.get_cpu_capability():
pmeier (Collaborator) commented:

This could have been in the original elif already, or am I missing something?

NicolasHug (Member, author) commented May 22, 2023:

Yes, there's nothing before or after that block, so it's logically the same.

vfdev-5 (Collaborator) commented May 22, 2023:

For visibility, here is what happens in this PR:

  • PIL to Tensor returns a CL-like 3D tensor.
  • RandomCrop makes it non-contiguous CL.
  • In Resize we have:

        strides = image.stride()
        if image.is_contiguous(memory_format=torch.channels_last) and image.shape[0] == 1 and numel != strides[0]:
            # There is a weird behaviour in torch core where the output tensor of `interpolate()` can be allocated as
            # contiguous even though the input is un-ambiguously channels_last (https://github.com/pytorch/pytorch/issues/68430).
            # In particular this happens for the typical torchvision use-case of single CHW images where we fake the batch dim
            # to become 1CHW. Below, we restride those tensors to trick torch core into properly allocating the output as
            # channels_last, thus preserving the memory format of the input. This is not just for format consistency:
            # for uint8 bilinear images, this also avoids an extra copy (re-packing) of the output and saves time.
            # TODO: when https://github.com/pytorch/pytorch/issues/68430 is fixed (possibly by https://github.com/pytorch/pytorch/pull/100373),
            # we should be able to remove this hack.
            new_strides = list(strides)
            new_strides[0] = numel
            image = image.as_strided((1, num_channels, old_height, old_width), new_strides)

    which should hint torch core into allocating the output with the same memory format. This does not work here: because of the slicing, the input fails the image.is_contiguous(memory_format=torch.channels_last) check, so the output is CF. And since the output is CF, the upsample AVX code copies the input into CL format (https://github.com/pytorch/pytorch/blob/4f2c007a1b5170c2aa0d47e388ff9e07c7a7d354/aten/src/ATen/native/cpu/UpSampleKernelAVXAntialias.h#L323-L331), so there is an additional call to pack_rgb.
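The failing check can be reproduced with a few lines of plain torch (a sketch of the scenario above: a PIL-style HWC buffer viewed as CHW, then cropped; sizes are made up):

```python
import torch

# A PIL-backed image is HWC in memory; PILToTensor permutes it to CHW,
# so the faked 1CHW batch reports channels_last contiguity.
img = torch.rand(32, 32, 3).permute(2, 0, 1)  # CHW view of HWC data
print(img.unsqueeze(0).is_contiguous(memory_format=torch.channels_last))  # True

# Cropping (as RandomCrop does) slices the view; the strides no longer
# match channels_last, so the restride hack in Resize never fires and
# interpolate() allocates a contiguous (CF) output.
crop = img[:, 4:28, 4:28]
print(crop.unsqueeze(0).is_contiguous(memory_format=torch.channels_last))  # False
```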

vfdev-5 (Collaborator) approved these changes:

LGTM, let's merge it and think of another way to avoid the memory format change later.

@NicolasHug NicolasHug merged commit 6ccc712 into pytorch:main May 22, 2023
github-actions bot commented:

Hey @NicolasHug!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request May 23, 2023
Reviewed By: vmoens

Differential Revision: D46071408

fbshipit-source-id: 8216a893fc11741260c6c741bfa609cbe4a31a54