Support all integer and floating point dtypes in prototype transform kernels? #6840
Comments
I'm in favour of doing this. It will allow users to cast to float32 first, which might be beneficial in cases where the uint8 kernels are slower.
Before closing this ticket, we should make the following proposed optimization on AugMix: vision/torchvision/prototype/transforms/_auto_augment.py Lines 509 to 513 in e1f464b
Edit: I checked the above and it doesn't actually improve the speed, so we should just remove the comment.
@pmeier I think the last thing to consider before closing the ticket is how
Thoughts on this? Edit: I issued a PR for this at #6874
The standard rule for dtype support for images and videos is: floating point tensors in [0.0, 1.0] and integer tensors in [0, torch.iinfo(dtype).max]. (This is currently under review, since there were a few cases where this was not true or simply not handled; see "Don't hardcode 255 unless uint8 is enforced" #6825.) However, we currently have two kernels that only support uint8 images or videos:
vision/torchvision/prototype/transforms/functional/_color.py
Lines 373 to 375 in c84dbfa
vision/torchvision/transforms/functional_tensor.py
Lines 788 to 789 in c84dbfa
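The rule above can be written down as a tiny helper. This is a hypothetical sketch (`max_value` is not an existing torchvision function), just to make the convention concrete:

```python
import torch

def max_value(dtype: torch.dtype) -> float:
    # Hypothetical helper encoding the rule stated above:
    # floating point images live in [0.0, 1.0],
    # integer images in [0, torch.iinfo(dtype).max].
    if dtype.is_floating_point:
        return 1.0
    return float(torch.iinfo(dtype).max)
```

So `max_value(torch.uint8)` is 255 and `max_value(torch.float32)` is 1.0, which is why hardcoding 255 in a kernel silently assumes uint8 inputs.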
This also holds for transforms v1, so this is not a problem of the new API. One consequence of this is that AA transforms are only supported for uint8 images:
vision/torchvision/transforms/autoaugment.py
Lines 104 to 107 in c84dbfa
since both
vision/torchvision/transforms/autoaugment.py
Lines 76 to 77 in c84dbfa
and
vision/torchvision/transforms/autoaugment.py
Lines 82 to 83 in c84dbfa
are used.
One possible way of mitigating this is to simply have a convert_dtype(image, torch.uint8) at the beginning and to convert back after the computation. That is probably needed for equalize, since we recently switched away from the histogram ops of torch towards our "custom" implementation to enable batch processing (#6757). However, this relies on the input being an integer tensor and, in its current form, even on uint8 due to some hardcoded constants.
For posterize I think it is fairly easy to provide the same functionality for float inputs directly, without going through a dtype conversion first.
cc @vfdev-5 @datumbox @bjuncek
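The convert-to-uint8-and-back round trip suggested above for equalize could look roughly like this. A minimal sketch in plain torch: `call_with_uint8` is a hypothetical wrapper (not a torchvision API), and float inputs are assumed to be in [0.0, 1.0] per the rule stated earlier:

```python
import torch

def call_with_uint8(kernel, image: torch.Tensor) -> torch.Tensor:
    # Hypothetical wrapper: run a uint8-only kernel on other dtypes by
    # converting to uint8 first and converting back after the computation.
    orig_dtype = image.dtype
    if orig_dtype == torch.uint8:
        return kernel(image)
    if orig_dtype.is_floating_point:
        # floating point images are assumed to be in [0.0, 1.0]
        as_uint8 = image.mul(255).round_().clamp_(0, 255).to(torch.uint8)
        return kernel(as_uint8).to(orig_dtype).div_(255)
    raise NotImplementedError(f"no conversion rule for {orig_dtype}")

# usage: a toy uint8-only "invert" kernel applied to a float32 image
img = torch.rand(3, 4, 4)
inverted = call_with_uint8(lambda t: 255 - t, img)
```

The obvious cost is that float inputs get quantized to 256 levels on the way through, which is exactly why a native float path is preferable where it is easy to provide.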
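A direct float posterize along these lines might look as follows. This is a hypothetical sketch (`posterize_float` is not an existing kernel), assuming float inputs in [0.0, 1.0]; it snaps each value to the floor of one of `2**bits` equal-width buckets, mirroring the bit-masking the uint8 kernel performs:

```python
import torch

def posterize_float(image: torch.Tensor, bits: int) -> torch.Tensor:
    # Hypothetical sketch: posterize a float image in [0.0, 1.0] directly,
    # without a round trip through uint8. Keeping only the top `bits` bits
    # of a uint8 value corresponds to snapping a float value down to the
    # nearest multiple of 1 / 2**bits.
    levels = 2 ** bits
    return image.mul(levels).floor_().clamp_(max=levels - 1).div_(levels)

# usage: bits=1 leaves only two output values, 0.0 and 0.5
out = posterize_float(torch.tensor([0.0, 0.3, 0.9, 1.0]), 1)
```

Since this never leaves floating point, it works for float16/float32/float64 alike and avoids the quantization loss of a dtype conversion.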