
[feature request] transforms for object detection #3286

Open

ydcjeff opened this issue Jan 24, 2021 · 2 comments

ydcjeff (Contributor) commented Jan 24, 2021

🚀 Feature

I would like to start adding/supporting transforms (both functional and class forms) for object detection. I know I can take some of them from the references folder, but it would be nice to have them out of the box. Here are a few basic transforms I would like to add first:

  • RandomHorizontalFlipWithBBox
  • RandomVerticalFlipWithBBox
  • LetterBox

Pitch

All of the above transforms accept two arguments when they are called. This breaks compatibility with Compose and nn.Sequential, but aren't we currently writing custom Compose or nn.Sequential classes anyway? So I think it is fine to start introducing the two-argument transforms that detection, segmentation, etc. need, and let users write a custom Compose or nn.Sequential that calls the transforms the way they like.
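For example, a minimal custom Compose that threads the (img, target) pair through each transform could look like the sketch below (the name ComposeWithTarget is illustrative; the references/detection folder has a similar helper):

class ComposeWithTarget:
    """Chain transforms whose __call__/forward takes and returns (img, target)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img, target):
        for t in self.transforms:
            img, target = t(img, target)
        return img, target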

Additional context

Current code:

import random
from typing import Tuple, Union

import numpy as np
import torch
from PIL import Image
from torch import Tensor, nn
from torchvision import transforms as T
from torchvision.transforms import functional as FT


class RandomHorizontalFlipWithBBox(nn.Module):
    def __init__(self, prob: float = 0.5):
        super().__init__()
        self.prob = prob

    def forward(self, img, target):
        if random.random() < self.prob:
            width = img.width
            # Boxes are [xmin, ymin, xmax, ymax]: mirror the x coordinates.
            xmin, xmax = target[..., 0], target[..., 2]
            diff = abs(xmax - xmin)
            target[..., 0] = width - xmin - diff  # = width - xmax
            target[..., 2] = width - xmax + diff  # = width - xmin
            return FT.hflip(img), target
        return img, target

    def __repr__(self):
        return self.__class__.__name__ + "(p={})".format(self.prob)


class RandomVerticalFlipWithBBox(nn.Module):
    def __init__(self, prob: float = 0.5):
        super().__init__()
        self.prob = prob

    def forward(self, img, target):
        if random.random() < self.prob:
            height = img.height
            # Mirror the y coordinates.
            ymin, ymax = target[..., 1], target[..., 3]
            diff = abs(ymax - ymin)
            target[..., 1] = height - ymin - diff  # = height - ymax
            target[..., 3] = height - ymax + diff  # = height - ymin
            return FT.vflip(img), target
        return img, target

    def __repr__(self):
        return self.__class__.__name__ + "(p={})".format(self.prob)


class LetterBox(nn.Module):
    """
    Apply a letterbox transform to an image and its bounding box target.

    Args:
        size (int or tuple of int): the size of the transformed image.
    """

    def __init__(self, size: Union[int, Tuple[int, int]]):
        super().__init__()
        self.size = size
        if isinstance(size, int):
            self.size = (size, size)

    def forward(self, img: Image.Image, target: Union[np.ndarray, Tensor]):
        """
        Args:
            img (PIL Image): Image to be transformed.
            target (np.ndarray or Tensor): bounding box target to be transformed.

        Returns:
            tuple: (image, target)
        """
        old_width, old_height = img.size
        width, height = self.size

        # Resize while keeping the aspect ratio.
        ratio = min(width / old_width, height / old_height)
        new_width = int(old_width * ratio)
        new_height = int(old_height * ratio)
        img = T.functional.resize(img, (new_height, new_width))

        # Split the leftover space into (nearly) equal padding on each side.
        pad_x = (width - new_width) * 0.5
        pad_y = (height - new_height) * 0.5
        left, right = round(pad_x + 0.1), round(pad_x - 0.1)
        top, bottom = round(pad_y + 0.1), round(pad_y - 0.1)
        padding = (left, top, right, bottom)
        img = T.functional.pad(img, padding, 255 // 2)

        # All box coordinates shift by the left/top padding offset.
        if isinstance(target, torch.Tensor):
            target[..., 0] = torch.round(ratio * target[..., 0]) + left
            target[..., 1] = torch.round(ratio * target[..., 1]) + top
            target[..., 2] = torch.round(ratio * target[..., 2]) + left
            target[..., 3] = torch.round(ratio * target[..., 3]) + top
        elif isinstance(target, np.ndarray):
            target[..., 0] = np.rint(ratio * target[..., 0]) + left
            target[..., 1] = np.rint(ratio * target[..., 1]) + top
            target[..., 2] = np.rint(ratio * target[..., 2]) + left
            target[..., 3] = np.rint(ratio * target[..., 3]) + top
        return img, target

    def __repr__(self):
        return self.__class__.__name__ + f"({self.size})"
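
For illustration, here is how the transforms above could be chained with the ComposeWithTarget sketch from the pitch (boxes are assumed to be [xmin, ymin, xmax, ymax] in absolute pixels; the sizes are made up):

import torch
from PIL import Image

transforms = ComposeWithTarget([
    RandomHorizontalFlipWithBBox(prob=0.5),
    RandomVerticalFlipWithBBox(prob=0.5),
    LetterBox(416),
])

img = Image.new("RGB", (640, 480))
boxes = torch.tensor([[100.0, 50.0, 300.0, 200.0]])  # one box per row
img, boxes = transforms(img, boxes)
print(img.size, boxes)  # (416, 416) plus the letterboxed coordinates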

Thank you!

cc @vfdev-5, @fmassa

fmassa (Member) commented Jan 25, 2021

Hi,

Thanks for opening this issue.

This has been on our radar for a while already, but we never really managed to find the right balance between simplicity and generality in the API.
For example, the API you propose wouldn't be enough if we wanted to work on image + boxes + keypoints, or even image + segmentation map, so we would need a number of repeated implementations to cover the models in torchvision.
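To make the trade-off concrete, one way to generalize (purely illustrative, not an actual torchvision API) would be to pass all annotations in a dict and let each transform handle the keys it understands:

import random

import torch.nn as nn
import torchvision.transforms.functional as F


class RandomHorizontalFlipDict(nn.Module):
    # Illustrative sketch: flip the image plus whichever target types are present.
    def __init__(self, prob: float = 0.5):
        super().__init__()
        self.prob = prob

    def forward(self, img, target: dict):
        if random.random() < self.prob:
            width = img.width
            img = F.hflip(img)
            if "boxes" in target:  # [N, 4] boxes as [xmin, ymin, xmax, ymax]
                boxes = target["boxes"].clone()
                boxes[:, [0, 2]] = width - target["boxes"][:, [2, 0]]
                target["boxes"] = boxes
            if "masks" in target:  # [N, H, W] segmentation masks
                target["masks"] = target["masks"].flip(-1)
            if "keypoints" in target:  # [..., (x, y, ...)] keypoints
                kps = target["keypoints"].clone()
                kps[..., 0] = width - kps[..., 0]
                target["keypoints"] = kps
        return img, target

The downside is exactly the one mentioned above: every transform has to know about every target type, which is part of what makes the API hard to get right.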

For an earlier attempt at the API, see #1406 and the discussion within.

I would love to hear your thoughts on this.

zhiqwang (Contributor) commented Feb 11, 2021

FYI, it seems that the existing batch_images in GeneralizedRCNNTransform plays the same role as the proposed LetterBox here:

def batch_images(self, images, size_divisible=32):
    # type: (List[Tensor], int) -> Tensor
    if torchvision._is_tracing():
        # batch_images() does not export well to ONNX
        # call _onnx_batch_images() instead
        return self._onnx_batch_images(images, size_divisible)
    max_size = self.max_by_axis([list(img.shape) for img in images])
    stride = float(size_divisible)
    max_size = list(max_size)
    max_size[1] = int(math.ceil(float(max_size[1]) / stride) * stride)
    max_size[2] = int(math.ceil(float(max_size[2]) / stride) * stride)
    batch_shape = [len(images)] + max_size
    batched_imgs = images[0].new_full(batch_shape, 0)
    for img, pad_img in zip(images, batched_imgs):
        pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
    return batched_imgs
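
As a quick illustration of its behavior (assuming the usual GeneralizedRCNNTransform constructor arguments), the method zero-pads a list of differently sized CHW tensors to a common shape rounded up to the size_divisible stride:

import torch
from torchvision.models.detection.transform import GeneralizedRCNNTransform

transform = GeneralizedRCNNTransform(
    min_size=800, max_size=1333,
    image_mean=[0.485, 0.456, 0.406], image_std=[0.229, 0.224, 0.225],
)

images = [torch.rand(3, 300, 400), torch.rand(3, 500, 350)]  # mixed sizes
batched = transform.batch_images(images, size_divisible=32)
print(batched.shape)  # torch.Size([2, 3, 512, 416]); 500 -> 512, 400 -> 416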
