
[NOMERGE] drop in transforms v2 into the v1 tests #7159

Draft
wants to merge 42 commits into main

Conversation

pmeier
Collaborator

@pmeier commented Feb 1, 2023

This PR should serve as a migration test to check whether we messed something up in transforms v2 that was enforced by the v1 tests. If CI is green here, we can be reasonably confident that v2 is actually a drop-in replacement for v1.

@NicolasHug
Member

I worked on making all the test_transforms.py tests pass in 7e032ad

Most of the time the differences are trivial and not worth bothering about, like different error messages. But there are some occurrences that require our attention. I marked them all as TODOs in the PR, but the main ones are (see the sketch after this list):

  • we don't support get_params() on instances
  • we don't support (or maybe we just don't warn) passing ints as interpolation parameter
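For illustration, here is a minimal sketch of the two v1 patterns in question (v1 namespace, made-up input image; the exact transform is just an example):

import torch
from torchvision import transforms

img = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8)

# 1) get_params() called on an instance: in v1 it is a staticmethod, so
#    accessing it through an instance works.
t = transforms.RandomResizedCrop(32)
i, j, h, w = t.get_params(img, scale=[0.08, 1.0], ratio=[0.75, 1.333])

# 2) A plain int as interpolation: deprecated in v1, but still accepted
#    with a warning (2 maps to InterpolationMode.BILINEAR).
t = transforms.RandomResizedCrop(32, interpolation=2)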

@pmeier
Collaborator Author

pmeier commented Feb 6, 2023

  • we don't support get_params() on instances

This is an oversight on my side. Fix in #7177.

  • we don't support (or maybe we just don't warn) passing ints as interpolation parameter

We intentionally did not port that behavior since it is deprecated and scheduled for removal in 0.15, i.e. the upcoming release:

# Backward compatibility with integer value
if isinstance(interpolation, int):
    warnings.warn(
        "Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. "
        "Please use InterpolationMode enum."
    )
    interpolation = _interpolation_modes_from_int(interpolation)

I've opened #7176 to handle this as well as all the other scheduled removals.

@NicolasHug
Member

Thanks a lot Philip. Regarding int support, we need to make sure we raise a proper error now then.
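For illustration, a minimal sketch of what such a guard could look like (hypothetical; the helper name _check_interpolation is made up and this is not the actual v2 code):

from torchvision.transforms import InterpolationMode


def _check_interpolation(interpolation):
    # Hypothetical guard: reject the deprecated integer values with a clear
    # error instead of silently converting them as v1 did.
    if isinstance(interpolation, int):
        raise TypeError(
            "Argument 'interpolation' of type int is no longer supported; "
            "use torchvision.transforms.InterpolationMode instead."
        )
    return interpolation


_check_interpolation(InterpolationMode.BILINEAR)  # passes through
_check_interpolation(2)  # raises TypeError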

@NicolasHug
Member

NicolasHug commented Feb 7, 2023

I made test_functional_tensor.py pass in 99bc85f. There are 2 important issues:

  • We do not support ints or tuples for fill in torchscript.
  • ElasticTransform seems completely broken (no idea why though).

I'll move on to test/test_transforms_tensor.py now

@NicolasHug
Member

NicolasHug commented Feb 7, 2023

OK, just took care of test/test_transforms_tensor.py. The common theme to most of the failures is that t(img) != scripted(t)(img). Because the jit transforms are actually the v1 version, this is mostly a symptom of the fact that t_V1(img) != t_V2(img).

For most transforms this is OK and this is probably because either:

  • for random transforms, sampling is performed differently between v1 and v2
  • there are other minor implementation differences between v1 and v2

Either way, we should double check all of those marked as TODO to make sure these are expected.

One thing to note though is that the assertion that t(img) == scripted(t)(img) really doesn't hold anymore. For random transforms the difference can be very significant (99% pixel difference), and can even lead to images of different shape (see rotate). This is somewhat unexpected as this isn't how the V1 transforms behave. But I feel like this is something we can live with, as long as we document it.
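For reference, a rough sketch of the kind of eager-vs-scripted comparison these tests perform (the transform, seed, and input are illustrative only):

import torch
from torchvision import transforms

t = transforms.RandomRotation(degrees=30, expand=True)
scripted = torch.jit.script(t)

img = torch.randint(0, 256, (3, 48, 48), dtype=torch.uint8)

torch.manual_seed(0)
out_eager = t(img)
torch.manual_seed(0)
out_scripted = scripted(img)

# In this PR the eager path is v2 while the scripted path stays on v1, so the
# two outputs can differ in values and, with expand=True, even in shape,
# because the parameter sampling no longer matches.
print(out_eager.shape, out_scripted.shape)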

@pmeier
Collaborator Author

pmeier commented Feb 8, 2023

  • We do not support ints or tuples for fill in torchscript.

This just seems like wrong annotations. We didn't change anything about the functionality, so reverting back to whatever v1 has should do the trick. As for why our v2 tests are green: I designed our tests along the annotations of the functions, failing to realize how wrong they actually are. Meaning, we never tested tuples, hence the green CI:

def get_fills(*, num_channels, dtype, vector=True):
    yield None
    max_value = get_max_value(dtype)
    # This intentionally gives us a float and an int scalar fill value
    yield max_value / 2
    yield max_value
    if not vector:
        return
    if dtype.is_floating_point:
        yield [0.1 + c / 10 for c in range(num_channels)]
    else:
        yield [12.0 + c for c in range(num_channels)]


  • ElasticTransform seems completely broken (no idea why though).

It should not just be ElasticTransform, but rather all transforms that internally use grid_sample. #6945 ported _apply_grid_transform while simultaneously doing some refactorings. The relevant one here is #6945 (comment):

[...] Important note is that the input image must be float. This is because the handling/casting can be done more efficiently outside of the method.

In practice this meant dropping the

img, need_cast, need_squeeze, out_dtype = _cast_squeeze_in(img, [grid.dtype])

helper.

The error you were seeing was about the image and the grid not having the same dtype, which was handled by this helper. We do have some dtype conversion as well in v2

output = _apply_grid_transform(image if fp else image.to(torch.float32), grid, interpolation.value, fill=fill)

but this doesn't account for floating point inputs that are not float32, since grid is float32 unconditionally.
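A minimal, self-contained illustration of the failure mode (shapes and values are made up; the cast shown at the end is just one possible fix, not the actual patch):

import torch

# grid_sample requires input and grid to share a dtype; if the grid is built as
# float32 unconditionally, a float64 image slips through the check above.
image = torch.rand(1, 3, 8, 8, dtype=torch.float64)
grid = torch.rand(1, 8, 8, 2, dtype=torch.float32) * 2 - 1

try:
    torch.nn.functional.grid_sample(image, grid, align_corners=False)
except RuntimeError as exc:
    print(exc)  # complains about mismatched dtypes

# Sketch of a fix: cast the grid to the image's floating point dtype.
out = torch.nn.functional.grid_sample(image, grid.to(image.dtype), align_corners=False)
print(out.dtype)  # torch.float64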

As for why our v2 tests didn't catch this: to reduce time-to-signal, we initially only wanted to test against uint8 and float32, so that is the default (used almost everywhere) when creating images:

dtypes=(torch.float32, torch.uint8),

for image_loader in make_image_loaders(sizes=["random"]):

I've opened #7195 to see if there are more bugs like this lurking in v2.

#6945 was mostly about performance, and its top comment shows significant improvements for uint8 inputs. @vfdev-5, would you have time to investigate whether these improvements come from removing the helpers from _apply_grid_transform or from something else?


For random transforms the difference can be very significant (99% pixel difference)

IIUC, the cases where you experienced that are just the family of AA transforms, right? That is expected, since we actually changed the sampling strategy there. This is why they have separate consistency tests.

and can even lead to images of different shape (see rotate)

For RandomRotation, the parameter sampling is identical.

Edit: My assessment below is wrong. See #7159 (comment) for the actual reason.

What you are hitting here are slight differences in how the new expanded size is computed. Again, #6945 introduced some performance optimizations that can apparently manifest as a ±1. Thus, we are seeing errors such as:

AssertionError: The values for attribute 'shape' do not match: torch.Size([3, 48, 60]) != torch.Size([3, 48, 59]).

Apart from that, the test passes without issues. IMO, we don't need to fix anything here, since I hope no one depends on the exact shape of an output that changes based on random sampling (with expand=True the output shape depends on the sampled angle).
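Just to spell out what "how the new expanded size is computed" refers to, here is an illustrative sketch (not the torchvision implementation, and per the edit above not the actual cause here): with expand=True the output canvas is the bounding box of the four rotated corners, and the choice of rounding for that box is exactly where two implementations can disagree by one pixel.

import math

def expanded_rotation_size(height, width, angle_degrees):
    # Bounding box of the rotated image; ceil vs. round here can shift the
    # result by one pixel.
    theta = math.radians(angle_degrees)
    cos, sin = abs(math.cos(theta)), abs(math.sin(theta))
    new_w = int(math.ceil(width * cos + height * sin))
    new_h = int(math.ceil(width * sin + height * cos))
    return new_h, new_w

print(expanded_rotation_size(48, 48, 33.0))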

test/test_transforms_tensor.py
Comment on lines 835 to 837
# Looks like our error message is buggy?
# with pytest.raises(RuntimeError, match="cannot call a value of type 'Tensor'"):
#     torch.jit.script(t)
Collaborator Author

I'm not sure what this test is supposed to do. I guess it tries to JIT script a Compose and checks that it fails? Failing is indeed what should happen, per the docstring:

.. note::
    In order to script the transformations, please use ``torch.nn.Sequential`` as below.

    >>> transforms = torch.nn.Sequential(
    >>>     transforms.CenterCrop(10),
    >>>     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    >>> )
    >>> scripted_transforms = torch.jit.script(transforms)

No idea why we are checking for a specific message here. The message is what you get if you try to script something without annotations. In that case JIT just assumes every variable is a torch.Tensor. Thus,

for t in self.transforms:
    img = t(img)

just fails, because tensors cannot be called.

We should just drop the check for the specific message here and be done with it.
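For context, a small self-contained sketch of the behavior this test covers (v1 namespace; the exact error message is an implementation detail, which is why matching on it is brittle):

import torch
import torch.nn as nn
from torchvision import transforms

# Scripting works through nn.Sequential, as the docstring above recommends...
pipeline = nn.Sequential(
    transforms.CenterCrop(10),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
)
scripted = torch.jit.script(pipeline)

# ...while scripting a Compose fails, because its un-annotated loop makes JIT
# treat every transform as a Tensor and then try to call it.
try:
    torch.jit.script(transforms.Compose([transforms.CenterCrop(10)]))
except RuntimeError as exc:
    print(exc)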

test/test_transforms_tensor.py
@pmeier mentioned this pull request Feb 15, 2023
Comment on lines +658 to +659
# TODO: None are passing - is it just because of randomness?
return
Collaborator Author

We should look into that. None carries a special meaning I think.

Collaborator

None is the default value for fill in AA: in the AA code we call the functional ops directly, and those have fill=None by default.

test/test_transforms.py
Comment on lines 1880 to 1882
     t = transforms.RandomRotation(0, fill=None)
-    assert t.fill == 0
+    # TODO: BC-break - do we care?
+    assert t.fill == defaultdict()
Collaborator Author

@vfdev-5 and I discussed offline and our answer is no. As explained in #6517, fill=None and fill=0 are not the same. However, some v1 transforms (the three highlighted here) simply coerce None into 0:

if fill is None:
    fill = 0

AFAIK this behavior is undocumented. In v2 we pass through None values:

# Fill = 0 is not equivalent to None, https://github.com/pytorch/vision/issues/6517
# So, we can't reassign fill to 0
# if fill is None:
#     fill = 0
if fill is None:
    return fill

Note that in both cases, fill=None is not the default value

def __init__(self, degrees, interpolation=InterpolationMode.NEAREST, expand=False, center=None, fill=0):

fill: Union[datapoints.FillType, Dict[Type, datapoints.FillType]] = 0,

So v1 silently ignores user input, while v2 doesn't.

Collaborator

Right now, this assertion should be

assert t.fill is None

@@ -579,7 +620,8 @@ def test_convert_image_dtype_save_load(tmpdir):


@pytest.mark.parametrize("device", cpu_and_gpu())
@pytest.mark.parametrize("policy", [policy for policy in T.AutoAugmentPolicy])
# TODO: Why are there failures only for CIFAR10??
Collaborator

@vfdev-5 commented Feb 16, 2023

I'll check that.

By the way, enabling the same CIFAR10 policy in the prototype tests also shows a failure. Maybe there is a bug in the code port.

Member

(it might have been fixed since then, I'm not sure)

Collaborator

@vfdev-5 commented Feb 16, 2023

No, it is failing both here and in the prototype tests.

On main it does not fail on CIFAR10 here, but it does fail in the modified prototype tests with CIFAR10.


@@ -1214,7 +1219,7 @@ def test_rotate():
     x = np.zeros((100, 100, 3), dtype=np.uint8)
     x[40, 40] = [255, 255, 255]

-    with pytest.raises(TypeError, match=r"img should be PIL Image"):
+    with pytest.raises(TypeError, match=r"Input can either"):
Collaborator

Right now it should be

    with pytest.raises(TypeError, match=r"(Input can either|supports inputs of type)"):
        F.rotate(x, 10)
